Friday, 20 February 2009

Pragmatic LINQ

LINQ, especially when talking to a database (or other back-end), arguably creates a bit of a tricky mess for proper application design.


  • I'm strongly in favour of proper unit testing
  • As such, I like the repository pattern, since this allows me to mock my DAL
  • In reality, I don't really need true POCO support; if I start needing my app to work with MySQL, I have bigger problems than POCO
  • I like that LINQ-to-SQL / EF can maintain a basic object model for me
  • Since it doesn't buy me anything, I don't really want to declare and maintain a "pure" (i.e. separate to the DAL) object model - I'm generally content to use partial classes to extend the classes we have generated
  • For use in things like MVC, I want to know when data access happens; I don't want my view messing with lazy navigation properties
  • I'm content (at the minute) to have a DAL that includes the LINQ-generated classes, the repository interface, and a class implementing that interface, and think of it as "my object model assembly that happens (by coincidence) to contain a default repository implementation" (try saying that twice...)
  • While I'm happy to know that LINQ to SQL supports POCO usage, the tooling doesn't make it easy. For now, I'm content to use attributed classes, and get on with the job...

So what?

Well, the first impact this has is: if you want it to be testable in any real sense, there isn't a lot of the LINQ stuff that can make it into the public API. For example:

  • the data-context - has nothing to do with external callers in a repository implementation
  • navigation properties - very quickly start crossing aggregates and/or causing lazy behaviour (not good if your context no longer exists)

My current thinking with this is that the data-context and most navigation properties should be marked as internal to the data layer (trivial either in the designer or dbml views). This means that your repository implementation can use the navigation properties to construct interesting queries, but the public API surfaced back to the caller doesn't include them. If the caller gets an order from the order repository, and wants details about the customer: tough - go and ask the customer repository. There is an edge-case for tightly coupled data (order header/details being a prime example), where it might be prudent to leave the navigation property public - but this is in the same aggregate, so this isn't a problem.
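A minimal sketch of that shape, assuming hypothetical Order and Customer entities (in a real project these would be the designer-generated partial classes, with the navigation property's access set to internal). The in-memory repositories are illustrative stand-ins of the kind a unit test might use; none of these names come from a real schema.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical entities standing in for the generated classes.
public partial class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public partial class Order
{
    public int Id { get; set; }
    public int CustomerId { get; set; }   // the plain key stays public

    // Navigation property: visible to the repository implementation
    // in this assembly, but not to external callers.
    internal Customer Customer { get; set; }
}

// The public API: no data-context, no navigation properties.
public interface IOrderRepository
{
    Order GetById(int orderId);
}

public interface ICustomerRepository
{
    Customer GetById(int customerId);
}

// In-memory fakes (assumed, for tests); a default implementation
// would wrap the data-context instead.
public class InMemoryOrderRepository : IOrderRepository
{
    private readonly List<Order> orders;
    public InMemoryOrderRepository(IEnumerable<Order> seed) { orders = seed.ToList(); }
    public Order GetById(int orderId) { return orders.Single(o => o.Id == orderId); }
}

public class InMemoryCustomerRepository : ICustomerRepository
{
    private readonly List<Customer> customers;
    public InMemoryCustomerRepository(IEnumerable<Customer> seed) { customers = seed.ToList(); }
    public Customer GetById(int customerId) { return customers.Single(c => c.Id == customerId); }
}

public static class OrderReport
{
    // The caller wants customer details, so it asks the customer repository;
    // both data accesses are explicit.
    public static string Describe(int orderId, IOrderRepository orders, ICustomerRepository customers)
    {
        Order order = orders.GetById(orderId);
        Customer customer = customers.GetById(order.CustomerId);
        return "Order " + order.Id + " for " + customer.Name;
    }
}
```

The caller in OrderReport.Describe makes two visible repository calls rather than touching order.Customer, which is the trade-off being described: slightly more code at the call site, in exchange for knowing exactly where data access occurs.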

It isn't whiter than white - but it seems pretty workable:

  • The repository interface protects our data-context from abuse; we know what use-cases we are supporting, and that they have been tested
  • The lack of navigation properties means we know when data access happens: it is when the caller asks the repository

Where I'm undecided is whether to allow the caller to use Expression arguments on method calls. The ability to pass in an Expression<Func<T,bool>>-based predicate is powerful, but risky: we can't validate that our DAL covers all use cases (due to unmapped method calls etc), and we can't be 100% sure what TSQL is going to execute in reality (even small composed expressions can radically change the generated TSQL). I think I'd need to evaluate that on a need-by-need basis.
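To make the risk concrete, here is a rough sketch contrasting an open-ended predicate method with a closed, use-case-specific one. The Order class, the repository interface and the Pricing.IsLarge helper are all hypothetical names invented for illustration; the in-memory implementation is an assumption, not how a LINQ-to-SQL-backed repository would be written.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Order
{
    public int Id { get; set; }
    public int CustomerId { get; set; }
    public decimal Total { get; set; }
}

public interface IOrderRepository
{
    // Open-ended: the caller decides the shape of the query.
    IList<Order> Find(Expression<Func<Order, bool>> predicate);

    // Closed: one known use-case that can be tested against the real back-end.
    IList<Order> GetByCustomer(int customerId);
}

// In-memory fake (assumed, for tests). LINQ-to-Objects compiles and runs
// any predicate, so it cannot reveal translation problems.
public class InMemoryOrderRepository : IOrderRepository
{
    private readonly List<Order> orders;
    public InMemoryOrderRepository(IEnumerable<Order> seed) { orders = seed.ToList(); }

    public IList<Order> Find(Expression<Func<Order, bool>> predicate)
    {
        return orders.AsQueryable().Where(predicate).ToList();
    }

    public IList<Order> GetByCustomer(int customerId)
    {
        return orders.Where(o => o.CustomerId == customerId).ToList();
    }
}

public static class Pricing
{
    // An ordinary C# method with no mapping to SQL.
    public static bool IsLarge(Order order) { return order.Total > 1000m; }
}
```

A call such as repo.Find(o => Pricing.IsLarge(o)) passes against the fake, but the same expression handed to a LINQ-to-SQL query would be expected to fail at run time with a NotSupportedException, because the provider has no translation for the method. That gap between what the tests prove and what the database does is the reason for deciding this case by case.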

I don't know if I'm saying anything a: obvious, or b: stupid... but that is my thinking...


Chris said...

While I agree with most of what you are saying here, there is one point that I'd like to make in favor of POCOs. Using entities that are created by Linq to SQL means that, in order to add functionality (e.g. methods) to my core domain objects, I have to define them in my Data Access Layer (or have my domain layer inherit from the data layer), which is backwards from what I want - I'd rather my data layer depended on the domain layer, so there is no possibility of bleeding data access into the domain.

Because of this, I feel like a lot of my projects using Linq to SQL are ending up with Anemic Domain Syndrome.

Marc Gravell said...

Yes, it is a tricky balancing act. In a perfect world DDD with a separate mapping would be nice - the trick is getting everything tied together with the minimum fuss.

It isn't a simple problem to solve, indeed; either approach has compromises.

manwood said...

Interesting thoughts. Would you mind explaining a bit more about navigation properties and why you hide them? In my (perhaps naive) understanding these are one to one or one to many associations (right?), and I find it useful to be able to access a graph of objects... although having said that, I have often wondered to what extent you 'fill a graph' when pulling objects from the repository - are you just saying 'be as restrictive as possible'? Or am I not getting something?

Marc Gravell said...

With L2S, they are typically lazy, which means we can't prove much about our system without full end-to-end testing, since the "view" might expand them - or try to, against a disposed data-context.

More, even if it works, we want our external caller to understand that they are accessing data that may not be loaded. In some ways, I'd rather return a separate DTO for the composite data (that might be composed of the different *actual* entities).
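A rough sketch of what such a composite DTO might look like, assuming hypothetical Order and Customer entities and a hypothetical OrderSummary type. The in-memory reader below uses LINQ-to-Objects purely for illustration; a data-context-backed version would run an equivalent projection inside the DAL.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Order
{
    public int Id { get; set; }
    public int CustomerId { get; set; }
    public decimal Total { get; set; }
}

// Hypothetical DTO: a flat, fully loaded snapshot with nothing lazy on it.
public class OrderSummary
{
    public int OrderId { get; set; }
    public string CustomerName { get; set; }
    public decimal Total { get; set; }
}

public interface IOrderSummaryReader
{
    OrderSummary GetSummary(int orderId);
}

// In-memory stand-in (assumed) for a repository that composes the DTO.
public class InMemoryOrderSummaryReader : IOrderSummaryReader
{
    private readonly List<Order> orders;
    private readonly List<Customer> customers;

    public InMemoryOrderSummaryReader(IEnumerable<Order> orders, IEnumerable<Customer> customers)
    {
        this.orders = orders.ToList();
        this.customers = customers.ToList();
    }

    public OrderSummary GetSummary(int orderId)
    {
        // Everything the caller needs is fetched here, in one explicit step.
        return (from o in orders
                join c in customers on o.CustomerId equals c.Id
                where o.Id == orderId
                select new OrderSummary
                {
                    OrderId = o.Id,
                    CustomerName = c.Name,
                    Total = o.Total
                }).SingleOrDefault();
    }
}
```

Because OrderSummary holds only plain values, a view can read every property without triggering any further data access, whether or not the data-context still exists.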

Duncan said...

I'm all for POCO in the Repository, and wrapping everything LINQy internal to this interface - it's how I've done it before.

However, recent findings make me wonder if wrapping the DataContext internal to Repos is a good idea - the DataContext is the Unit of Work after all - so what do you do if you want to manage transactions over a number of repos?

Or is this the whole point of your post on "Pragmatic LINQ"? :)

Marc Gravell said...

Well, at the simplest level, SqlTransaction or TransactionScope would suffice. But I think that question applies regardless of the specific repo internals (i.e. I don't think it is specific to DataContext).
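As a sketch of the TransactionScope option, assuming two hypothetical repositories (IOrderRepository and IAuditRepository) and a hypothetical OrderService that coordinates them. The in-memory implementations are stand-ins that do not actually enlist in the transaction; they are only there so the shape of the calling code is visible.

```csharp
using System.Collections.Generic;
using System.Transactions;

public class Order
{
    public int Id { get; set; }
}

public interface IOrderRepository { void Add(Order order); }
public interface IAuditRepository { void Record(string message); }

// In-memory stand-ins (assumed). Real implementations would open their own
// connections, which can enlist in the ambient transaction.
public class InMemoryOrderRepository : IOrderRepository
{
    public readonly List<Order> Items = new List<Order>();
    public void Add(Order order) { Items.Add(order); }
}

public class InMemoryAuditRepository : IAuditRepository
{
    public readonly List<string> Messages = new List<string>();
    public void Record(string message) { Messages.Add(message); }
}

public class OrderService
{
    private readonly IOrderRepository orders;
    private readonly IAuditRepository audit;

    public OrderService(IOrderRepository orders, IAuditRepository audit)
    {
        this.orders = orders;
        this.audit = audit;
    }

    public void PlaceOrder(Order order)
    {
        // The scope spans both repositories without either exposing its internals.
        using (var scope = new TransactionScope())
        {
            orders.Add(order);
            audit.Record("Order placed: " + order.Id);

            // Without Complete(), disposing the scope rolls the transaction back.
            scope.Complete();
        }
    }
}
```

One caveat worth checking in practice: if the repositories open separate database connections inside the scope, the transaction may be escalated to a distributed one, which has its own configuration and performance implications.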