Monday, March 11, 2013

Postgress ORDER BY - how random?

Recently I ran into a surprising bug (a feature?) of Postgress SQL. It looks like under certain circumstances the order of rows in the result set of the SELECT statement can change between runs. In other words you run it once and then run it again, the order of records in these two result sets will be different.

To add some specifics: The SELECT statement in question has an ORDER BY clause. Ordering is done by a string field. The sort order has to be case insensitive, so that identical phrases sit next to each other regardless of the case. Obviously that if the result set contains several rows with the string field value differing only in string case, the order of these rows in the result set is arbitrary, which I totally expected and am completely fine with.

What I did not expect is that this order can change between 2 runs of the query.

It does not happen in a simple SELECT ... ORDER BY though. I ran into this feature/bug while working with paging using OFFSET/LIMIT. It so happened that in my result set I had 2 records: one with the sorting key 'Account' and another one with sorting key 'ACCOUNT'. In the result set without OFFSET/LIMIT clauses the 'Account' record as placed in position 130, while the 'ACCOUNT' record was sitting in position 131. This is fine. I do not care between the 2 which one comes first.

But when I started to read this result set in chunks of 10 something weird happened. The 'Account' record as expected was the last one in the result set of the statement with the OFFSET=120, LIMIT=10.

What was not expected is for it to also show up as the first record of the next chunk OFFSET=130, LIMIT=10. The proper owner of the position 131 - the 'ACCOUNT' record was nowhere to be found (in result sets). As a result of this problem one of my records was processed twice, while another one was skipped.

The workaround for this problem was simple - I appended the primary key to the sort field. This way I made the sorting key unique without altering the desired order of the records in the result set

No comments:

Post a Comment