Tuesday 17 November 2009

LINQ Query Syntax – a whistlestop

Based on some feedback (both comments and e-mail) to my last entry, it seems obvious to me that a number of readers aren’t familiar with the way that LINQ maps query syntax to real code.

So I thought I’d offer a very brief taster. For real detail here, good resources include:

So hang on – I said this was a whistlestop, but this isn’t a short blog. You see, I never said this was a simple area. Go read one of the above and hopefully it will make more sense… but we’ll muddle on and see if anything sticks…

What is LINQ query syntax?

At the spec level, it isn’t quite what many people think it is… it knows nothing about what it is doing. It doesn’t have any knowledge of “IEnumerable”, “IQueryable”, etc – it is just a set of rules that map keywords like “from”, “select”, “join” etc into instance-style method calls using lambdas.
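To make that concrete, here is a minimal sketch of the same query written twice: once with keywords, and once as the method calls the compiler rewrites it into. The class and method names are mine, purely for illustration; with "using System.Linq" in scope the calls happen to bind to the "Enumerable" extension methods, but that binding happens after the translation, not as part of it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class QuerySyntaxDemo
{
    // Query syntax: just keywords, no mention of any method.
    public static List<int> WithQuerySyntax(IEnumerable<int> source)
    {
        var query = from x in source
                    where x > 1
                    select x * 2;
        return query.ToList();
    }

    // The equivalent instance-style method calls with lambdas.
    // "where" became Where(...), "select" became Select(...).
    public static List<int> WithMethodCalls(IEnumerable<int> source)
    {
        var query = source.Where(x => x > 1).Select(x => x * 2);
        return query.ToList();
    }
}
```

Both methods should return the same list for the same input, since after translation they are the same code.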

The rules for this are… “complicated”, for example:

A query expression with a second from clause followed by something other than a select clause:

from x1 in e1
from x2 in e2
…

is translated into

from * in ( e1 ) . SelectMany( x1 => e2 , ( x1 , x2 ) => new { x1 , x2 } )
…

Don’t worry – you don’t usually need the details; I didn’t use the spec to write my custom “SelectMany”, for example. A bit of guesswork, and a quick check against the standard “Enumerable.SelectMany” got me going.

So what does my code look like?

As always, Reflector will be useful (if you crank down the optimisations, as discussed here), but let's pick apart the example from yesterday:

[image: the query-syntax example from the previous post]

The first “from” is largely inert; perhaps the most important thing it does is propose a name (and optionally type) for the first variable* “path”.

*=I’m using the term loosely.

The additional “from” clauses are more interesting. Each one becomes a “SelectMany” call (see quote above), but in order to keep “path” in scope, the compiler uses the overload that accepts two lambdas: one to select the data (“File.OpenRead(path)” etc), and another to create a new type representing both “path” and the new data. It isn’t necessarily very obvious, but it looks something like:

[image: the same query rewritten as explicit method calls]

(The “x1” / “x2” identifiers are introduced by the compiler)
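As a rough, self-contained illustration of that two-lambda "SelectMany" shape, here is a sketch that avoids real file I/O. "GetLines" is a hypothetical stand-in for opening and reading a file, and the "t" parameter name is mine; the compiler uses an unpronounceable generated name for the combined value. This is not the code from yesterday's post, just the same pattern in miniature.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SelectManyDemo
{
    // Hypothetical stand-in for reading a file: maps a "path" to some lines.
    static IEnumerable<string> GetLines(string path)
    {
        return path == "a.txt" ? new[] { "one", "" } : new[] { "two" };
    }

    // Two "from" clauses followed by something other than "select".
    public static List<string> WithQuerySyntax(IEnumerable<string> paths)
    {
        var query = from path in paths
                    from line in GetLines(path)
                    where line.Length > 0
                    select path + ": " + line;
        return query.ToList();
    }

    // Hand-written equivalent: the second lambda of SelectMany bundles
    // "path" and "line" into an anonymous type so both stay in scope
    // for the Where and Select calls that follow.
    public static List<string> WithMethodCalls(IEnumerable<string> paths)
    {
        var query = paths
            .SelectMany(path => GetLines(path), (path, line) => new { path, line })
            .Where(t => t.line.Length > 0)
            .Select(t => t.path + ": " + t.line);
        return query.ToList();
    }
}
```

Note how "path" is still usable in the final projection even though it was introduced two clauses earlier; that is all the anonymous type is for.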

So where do the methods come from?

Here’s the interesting thing: the LINQ spec doesn’t care! It could be an instance Select (etc) method declared on the actual type that happens to be “in play”; it could be a custom extension method provided after-the-fact. The actual method is resolved by regular C# member resolution rules. A few interesting examples of alternative implementations include:

At one point I had an F#-style async implementation using LINQ query syntax, but… well, I wasn’t very happy with it – sometimes you can push too hard ;-p

Or (as in the example with IDisposable) you can just offer some new code for specific cases. As long as the C# compiler can resolve a preferred implementation, you’re in business.
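To show how little the compiler cares, here is a sketch of a type that has nothing to do with "IEnumerable" but still works with query syntax, because it declares instance "Select" and "SelectMany" methods of the right shape. "Maybe<T>" is a hypothetical type invented for this example (it is not one of the implementations mentioned above), and note there is no "using System.Linq" anywhere.

```csharp
using System;

// Hypothetical container: holds one value, or nothing. Not IEnumerable.
sealed class Maybe<T>
{
    public readonly bool HasValue;
    public readonly T Value;

    public Maybe() { }                                   // the "nothing" case
    public Maybe(T value) { HasValue = true; Value = value; }

    // Picked up by a plain "select" clause.
    public Maybe<TResult> Select<TResult>(Func<T, TResult> selector)
    {
        return HasValue ? new Maybe<TResult>(selector(Value)) : new Maybe<TResult>();
    }

    // Picked up by a second "from" clause: the same two-lambda shape
    // as the Enumerable.SelectMany overload.
    public Maybe<TResult> SelectMany<TOther, TResult>(
        Func<T, Maybe<TOther>> otherSelector,
        Func<T, TOther, TResult> resultSelector)
    {
        if (!HasValue) return new Maybe<TResult>();
        Maybe<TOther> other = otherSelector(Value);
        return other.HasValue
            ? new Maybe<TResult>(resultSelector(Value, other.Value))
            : new Maybe<TResult>();
    }
}

static class MaybeDemo
{
    public static Maybe<int> Add(Maybe<int> left, Maybe<int> right)
    {
        // Translated to: left.SelectMany(a => right, (a, b) => a + b)
        return from a in left
               from b in right
               select a + b;
    }
}
```

Adding two populated values gives a populated result; if either side is empty, so is the result. Regular member resolution found the instance methods, so no extension methods were needed.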

Summary

  • LINQ query syntax = translation to instance-style method calls using lambdas
  • You can write your own methods for it to call
  • It isn’t at all limited to things like IEnumerable
  • Since it uses lambdas, it can be delegate-based or Expression-based

Go crazy; try your own query syntax implementation today ;-p