Friday, 20 February 2009

Pragmatic LINQ

LINQ, especially when talking to a database (or other back-end), arguably creates a bit of a tricky mess for proper application design.


  • I'm strongly in favour of proper unit testing
  • As such, I like the repository pattern, since this allows me to mock my DAL
  • In reality, I don't really need true POCO support; if I start needing my app to work with MySQL, I have bigger problems than POCO
  • I like that LINQ-to-SQL / EF can maintain a basic object model for me
  • Since it doesn't buy me anything, I don't really want to declare and maintain a "pure" (i.e. separate from the DAL) object model - I'm generally content to use partial classes to extend the generated classes
  • For use in things like MVC, I want to know when data access happens; I don't want my view messing with lazy navigation properties
  • I'm content (at the minute) to have a DAL that includes the LINQ-generated classes, the repository interface, and a class implementing that interface, and to think of it as "my object model assembly that happens (by coincidence) to contain a default repository implementation" (try saying that twice...)
  • While I'm happy to know that LINQ to SQL supports POCO usage, the tooling doesn't make it easy. For now, I'm content to use attributed classes, and get on with the job...
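To make the repository/mocking point concrete, here is a minimal C# sketch. Every name in it is hypothetical: `Order` stands in for a LINQ-to-SQL generated entity (extended with a partial class, as above), `IOrderRepository` is the repository interface, `FakeOrderRepository` is an in-memory implementation for unit tests, and `OrderReport` is some code under test.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a designer-generated LINQ-to-SQL entity.
public partial class Order
{
    public int OrderId { get; set; }
    public int CustomerId { get; set; }
    public decimal Total { get; set; }
}

// Extending the generated class with a partial class, rather than a separate "pure" model.
public partial class Order
{
    public bool IsLarge { get { return Total >= 1000m; } }
}

// The repository interface: the only data-access surface callers see.
public interface IOrderRepository
{
    Order GetOrder(int orderId);
    IList<Order> GetOrdersForCustomer(int customerId);
}

// In-memory fake used by unit tests in place of the database-backed implementation.
public class FakeOrderRepository : IOrderRepository
{
    private readonly List<Order> orders;

    public FakeOrderRepository(IEnumerable<Order> orders)
    {
        this.orders = orders.ToList();
    }

    public Order GetOrder(int orderId)
    {
        return orders.SingleOrDefault(o => o.OrderId == orderId);
    }

    public IList<Order> GetOrdersForCustomer(int customerId)
    {
        return orders.Where(o => o.CustomerId == customerId).ToList();
    }
}

// Code under test depends only on the interface, so it can run against the fake.
public class OrderReport
{
    private readonly IOrderRepository repository;

    public OrderReport(IOrderRepository repository)
    {
        this.repository = repository;
    }

    public int CountLargeOrders(int customerId)
    {
        return repository.GetOrdersForCustomer(customerId).Count(o => o.IsLarge);
    }
}
```

Because `OrderReport` only sees the interface, a test can hand it the fake, while production code hands it the real LINQ-to-SQL-backed implementation.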

So what?

Well, the first consequence is that, if you want the code to be testable in any real sense, not much of the LINQ machinery can make it into the public API. For example:

  • the data-context - has nothing to do with external callers in a repository implementation
  • navigation properties - very quickly start crossing aggregates and/or causing lazy behaviour (not good if your context no longer exists)

My current thinking is that the data-context and most navigation properties should be marked as internal to the data layer (trivial to do in either the designer or the dbml). This means that your repository implementation can use the navigation properties to construct interesting queries, but the public API surfaced back to the caller doesn't include them. If the caller gets an order from the order repository and wants details about the customer: tough - go and ask the customer repository. There is an edge-case for tightly coupled data (order header/details being a prime example), where it might be prudent to leave the navigation property public - but since both sides are in the same aggregate, this isn't a problem.
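A rough sketch of the "internal navigation property" idea, under some loud assumptions: plain classes and lists stand in for the designer-generated entities, their `EntityRef<T>` associations and the data-context's `Table<T>` properties; the names (`Customer`, `Order`, `ShopDataContext`, `OrderRepository`) are hypothetical; and since `internal` only hides members from *other* assemblies, it assumes the DAL lives in its own assembly.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public int CustomerId { get; set; }
    public string Name { get; set; }
    public string Region { get; set; }
}

public class Order
{
    public int OrderId { get; set; }
    public int CustomerId { get; set; }   // plain key value: fine to expose publicly
    public decimal Total { get; set; }

    // Navigation property marked internal: usable inside the DAL assembly to
    // build queries, but invisible to callers in other assemblies.
    internal Customer Customer { get; set; }
}

// Stand-in for the generated data-context, also internal to the DAL.
internal class ShopDataContext
{
    public List<Customer> Customers { get; } = new List<Customer>();
    public List<Order> Orders { get; } = new List<Order>();
}

public interface IOrderRepository
{
    IList<Order> GetOrdersInRegion(string region);
}

public class OrderRepository : IOrderRepository
{
    private readonly ShopDataContext ctx;

    // Internal: callers are assumed to obtain the repository from a factory inside the DAL.
    internal OrderRepository(ShopDataContext ctx)
    {
        this.ctx = ctx;
    }

    public IList<Order> GetOrdersInRegion(string region)
    {
        // Inside the DAL, the navigation property keeps the query simple...
        return ctx.Orders.Where(o => o.Customer.Region == region).ToList();
        // ...but the Order objects handed back expose no path to Customer.
    }
}
```

A caller in another assembly can call `GetOrdersInRegion`, but cannot write `order.Customer` or touch `ShopDataContext`; for customer details it has to go to a customer repository.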

It isn't whiter than white - but it seems pretty workable:

  • The repository interface protects our data-context from abuse; we know what use-cases we are supporting, and that they have been tested
  • The lack of navigation properties means we know when data access happens: when the caller asks the repository
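Keeping that promise also depends on the repository materialising its results (e.g. with `ToList()`) rather than handing back a deferred query that runs whenever the caller happens to enumerate it. The sketch below is illustrative only: a hypothetical `CountingTable<T>` stands in for a database table and counts how often it is enumerated, as a rough proxy for round-trips, and `ProductRepository` is likewise made up.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a table: counts enumerations as a proxy for
// "how many times did we hit the database?"
public class CountingTable<T> : IEnumerable<T>
{
    private readonly List<T> rows;
    public int Enumerations { get; private set; }

    public CountingTable(IEnumerable<T> rows)
    {
        this.rows = rows.ToList();
    }

    public IEnumerator<T> GetEnumerator()
    {
        Enumerations++;
        return rows.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public class ProductRepository
{
    private readonly CountingTable<string> products;

    public ProductRepository(CountingTable<string> products)
    {
        this.products = products;
    }

    // Deferred: nothing runs here; each enumeration by the caller re-runs the query.
    public IEnumerable<string> GetProductsDeferred(string prefix)
    {
        return products.Where(p => p.StartsWith(prefix, StringComparison.Ordinal));
    }

    // Materialised: the query runs exactly once, inside the repository call.
    public IList<string> GetProducts(string prefix)
    {
        return products.Where(p => p.StartsWith(prefix, StringComparison.Ordinal)).ToList();
    }
}
```

With the deferred version, a view that iterates the result twice triggers two "queries", possibly after the data-context has gone; with the materialised version, the cost is paid once, at the repository call.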

Where I'm more mixed is on the subject of allowing the caller to pass Expression arguments to repository methods. The ability to pass in an Expression<Func<T,bool>> predicate is powerful, but risky: we can't validate that our DAL covers all use cases (the caller might use method calls that have no SQL mapping, etc.), and we can't be 100% sure what TSQL is going to execute in reality (even small composed expressions can radically change the generated TSQL). I think I'd need to evaluate that on a need-by-need basis.
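If you do accept such predicates, one partial mitigation is to inspect the expression before running it. This is an assumption of mine, not something LINQ-to-SQL provides: a hypothetical `MethodCallGuard` (an `ExpressionVisitor`) rejects any method call not on a small allow-list, matched crudely by declaring type and method name, ignoring overloads. An in-memory list stands in for the table, which also shows the testing problem: an in-memory fake would happily execute the hypothetical `OrderRules.IsPriority`, whereas a SQL provider could not translate it. It does nothing about the second concern, the shape of the generated TSQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Order
{
    public int OrderId { get; set; }
    public decimal Total { get; set; }
    public string Reference { get; set; }
}

// A method the SQL provider knows nothing about (hypothetical).
public static class OrderRules
{
    public static bool IsPriority(Order o) { return o.Total > 1000m; }
}

// Hypothetical guard: records every method call not on the allow-list.
public class MethodCallGuard : ExpressionVisitor
{
    // Keyed on "DeclaringType.MethodName"; overloads are not distinguished.
    private static readonly HashSet<string> Allowed = new HashSet<string>
    {
        "System.String.StartsWith",
        "System.String.Contains",
    };

    public List<string> Rejected { get; } = new List<string>();

    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        string key = node.Method.DeclaringType?.FullName + "." + node.Method.Name;
        if (!Allowed.Contains(key)) Rejected.Add(key);
        return base.VisitMethodCall(node);
    }
}

public class OrderRepository
{
    private readonly List<Order> orders; // in-memory stand-in for the real table

    public OrderRepository(IEnumerable<Order> orders)
    {
        this.orders = orders.ToList();
    }

    public IList<Order> Find(Expression<Func<Order, bool>> predicate)
    {
        var guard = new MethodCallGuard();
        guard.Visit(predicate);
        if (guard.Rejected.Count > 0)
            throw new NotSupportedException("Untranslatable call(s): " + string.Join(", ", guard.Rejected));

        // Against a real IQueryable<Order>, this is where the expression would become TSQL.
        return orders.AsQueryable().Where(predicate).ToList();
    }
}
```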

I don't know if I'm saying anything a: obvious, or b: stupid... but that is my thinking...