Thursday 6 May 2010

Strings: sorted

Sorting data… I don’t mean complex bespoke types that you’ve written – I mean inbuilt types like “string”. You probably wouldn’t think too long about trying to sort strings – you just expect them to work.

Oddly enough, this wasn’t always the case. OK, it is an edge case, but string comparison wasn’t guaranteed to be transitive. By this I mean that if we’ve got three values A, B and C, and we’ve tested that A comes before B and B comes before C, then you might reasonably deduce that A comes before C. This is a key requirement for sorting (and has always been documented as such), yet string comparison didn’t always honour it!
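To make the requirement concrete: a comparison is transitive if, for every A, B and C, "A before B" and "B before C" together imply "A before C". Below is a minimal brute-force sketch that checks this over a handful of sample values. `TransitivityCheck` and `FindViolation` are hypothetical names used only for illustration, not part of the .NET Framework, and the use of `Tuple` assumes .NET 4.0 or later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not a framework API): brute-force transitivity check.
public static class TransitivityCheck
{
    // Returns the first triple (a, b, c) where a < b and b < c,
    // but a is NOT < c; returns null if every triple in the sample is consistent.
    public static Tuple<T, T, T> FindViolation<T>(IList<T> items, Comparison<T> compare)
    {
        foreach (T a in items)
            foreach (T b in items)
                foreach (T c in items)
                {
                    // Transitivity demands compare(a, c) < 0 whenever both premises hold.
                    if (compare(a, b) < 0 && compare(b, c) < 0 && compare(a, c) >= 0)
                        return Tuple.Create(a, b, c);
                }
        return null;
    }
}
```

Passing the three strings from the sample later in this post with `(x, y) => x.CompareTo(y)` would, going by the results reported there for .NET < 4.0, be expected to flag a violation. The triple loop is O(n³), so it is only sensible for small samples.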

Why is this bad? At best, your data can end up in random-looking sort orders. At worst, a “sort” operation may loop forever, attempting to shuffle data that is teasing it.

Even though this oddity was rare, it was enough of a problem to come up in the wild, way back when usenet was in vogue (in other news: Microsoft is shutting down their usenet farm… is that news to anyone?).

It was news to me, but while discussing this curio somebody observed that it is now fixed in .NET 4.0. Because of the nature of the fix, the code in the Connect article (above) needs re-ordering to show it working and not working as you flip between framework versions; here’s the updated sample:

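    // string.CompareTo(string) performs a culture-sensitive comparison (current culture)
    string s1 = "-0.67:-0.33:0.33";
    string s2 = "-0.67:0.33:-0.33";
    string s3 = "0.67:-0.33:0.33";

    Console.WriteLine(s1.CompareTo(s2)); // .NET < 4.0: -1; .NET 4.0: 1
    Console.WriteLine(s2.CompareTo(s3)); // .NET < 4.0: -1; .NET 4.0: 1
    Console.WriteLine(s1.CompareTo(s3)); // .NET < 4.0:  1; .NET 4.0: 1
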
In .NET < 4.0 this shows “-1 -1 1”: s1 sorts before s2 and s2 before s3, yet s1 sorts after s3, which breaks transitivity (a fail). In .NET 4.0 it shows “1 1 1”, which is consistent (a pass). I tried a range of combinations and couldn’t make it misbehave. A small matter, maybe, but I’m much happier knowing that this has finally been tucked up in bed.