Sunday 30 May 2010

When to optimize?

Prompted by a tweet earlier, I thought I’d share some thoughts on optimization. In code, there are diverging views here, varying between the extremes:

  • Forget the performance; write an awesome application, then see (perhaps profile, perhaps just use it) where it sucks
  • Performance rules! Give yourself an ulcer trying to eke out every last picosecond of CPU and nibble of RAM!

We've all been there. Obviously there is some pragmatic middle ground where you need to design an app deliberately to allow performance, without stressing over every line of code, every stack-frame, etc.

Is this juxtaposition valid?

I suspect it is. In my day job I’m an app developer: what matters is shipping an app that works, is usable, is clear, doesn’t corrupt the database, doesn’t e-mail real clients from the test server*, etc. Performance is important, but management aren’t going to be impressed with “I shaved 3ms/cycle from the ‘frobing’ loop! oh, yes, but the login page is still broken”.

But my “for kicks” OSS alter-ego is a library developer. Shipping your library with great features is important, but library developers are (in my experience at least) far more interested in all the micro-optimisations. And this makes sense; the people using your library are trusting you to do it well – they’re too busy getting the login page working. And for a library developer, getting these details right is part of how you deliver the best product.

Summary

So next time you hear somebody worrying about whether to use string.Concat(string[]), string.Concat(object[]) or StringBuilder: it could be random micro-optimisation gone wild (and please do kick them); but maybe, just maybe, you’re dealing with a closet library developer.
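For the curious, here's a minimal sketch of the three options such a person might be agonising over (the values are made up purely for illustration):

using System;
using System.Text;

string[] parts = { "a", "b", "c" };

// string.Concat(string[]): joins the strings directly, no formatting involved
string viaStrings = string.Concat(parts);

// string.Concat(object[]): each argument is converted via ToString() first
string viaObjects = string.Concat((object)1, (object)2, (object)3);

// StringBuilder: typically earns its keep when appending inside a loop
var sb = new StringBuilder();
foreach (string part in parts) sb.Append(part);
string viaBuilder = sb.ToString();

Console.WriteLine("{0} {1} {2}", viaStrings, viaObjects, viaBuilder);

(For most app code the difference is noise; for a heavily-used library, it might just matter.)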

*=yes, I’ve accidentally done that. Once. Fortunately we caught it quickly, and the data wasn’t rude or anything. Lawsuit avoided.

Thursday 6 May 2010

Strings: sorted

Sorting data… I don’t mean complex bespoke types that you’ve written – I mean inbuilt types like “string”. You probably wouldn’t think too long about trying to sort strings – you just expect them to work.

Oddly enough, this wasn't always the case. OK, it is an edge case, but string comparison wasn't guaranteed to be transitive. By this I mean that if we've got three values A, B and C, and we've tested that A comes before B, and B comes before C, then you might reasonably deduce that A comes before C. This is a key requirement for sorting (and has always been documented as such). But it didn't always hold!

Why is this bad? In the best case, your data can end up in random-looking sort orders. In the worst case, a "sort" operation may loop forever, attempting to shuffle data that is teasing it.
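To see why, here's a deliberately contrived sketch (the comparer and its name are invented for illustration; this has nothing to do with string itself): a comparer that contradicts itself, handed to a standard sort.

using System;
using System.Collections.Generic;

// A deliberately intransitive comparer: rock < paper, paper < scissors,
// but scissors < rock, so no consistent ordering exists.
class RockPaperScissorsComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        if (x == y) return 0;
        if ((x == "rock" && y == "paper") ||
            (x == "paper" && y == "scissors") ||
            (x == "scissors" && y == "rock")) return -1;
        return 1;
    }
}

class Program
{
    static void Main()
    {
        var data = new List<string> { "scissors", "rock", "paper", "rock", "scissors" };
        // Depending on the framework's sort implementation, this can produce an
        // arbitrary-looking order or complain outright, because the comparer
        // keeps contradicting itself.
        data.Sort(new RockPaperScissorsComparer());
        Console.WriteLine(string.Join(", ", data.ToArray()));
    }
}

The string bug was far subtler than that, of course, but the failure mode is the same: the sort has no consistent answer to converge on.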

Even though this oddity was rare, it was enough of a problem to come up in the wild, way back when usenet was in vogue (in other news: Microsoft is shutting down their usenet farm… is that news to anyone?).

It was news to me, but while discussing this curio, somebody observed that it is now fixed in .NET 4.0. The nature of the fix means that the code in the Connect article (above) needs re-ordering to show it working and not-working as you flip between framework versions; here's the updated sample:

string s1 = "-0.67:-0.33:0.33";
string s2 = "-0.67:0.33:-0.33";
string s3 = "0.67:-0.33:0.33";

Console.WriteLine(s1.CompareTo(s2));
Console.WriteLine(s2.CompareTo(s3));
Console.WriteLine(s1.CompareTo(s3));


In .NET < 4.0 this shows "-1 -1 1" (this is a fail). In .NET 4.0, it shows "1 1 1" (a pass). I tried a range of combinations and couldn't make it misbehave. A small matter, maybe, but I'm much happier knowing that this has finally been tucked up in bed.