Friday, 27 February 2009

Summit's up

No, not a bug report. But on Sunday begins the Global MVP Summit 2009 - an opportunity to nag (ahem, liaise) with the various Microsoft product teams, and to catch up with the geeks at large.

My first MVP Summit, and I'm quite looking forward to it. My only regret is that I don't expect to be able to blog about anything said there... oh well - it is a small price to pay.

Back in a few ;-p

Tuesday, 24 February 2009

What C# 4.0 covariance *doesn't* do

On a number of occasions, I've seen people talking about the old covariance problem - in particular, given:

    abstract class Fruit { }
    class Apple : Fruit { }

    void Foo(List<Fruit> fruit) { /* do some things */ }

How do I call Foo with the list of apples (List<Apple>) that I have?

    List<Apple> apples = ...
    Foo(apples); // compile error

The error is: Argument '1': cannot convert from 'System.Collections.Generic.List<Program.Apple>' to 'System.Collections.Generic.List<Program.Fruit>'.

There is no implicit conversion (covariance) between list-of-subclass and list-of-superclass. In particular, contrast this to arrays:

    void Foo(Fruit[] fruit) { /* do some things */ }
    Apple[] apples = ...
    Foo(apples); // compiles and executes (within limitations)

The difference is that .NET arrays (of reference-type objects) are covariant, but prior to .NET 4.0, nothing else is. And the existing array covariance is a bit of a bodge to make some things work; it is risky (for example, Foo could happily try to add (via the indexer) a Banana to the array; this would compile fine but would explode at execution - so compile-time checking goes out the window).
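The array bodge takes only a few lines to demonstrate; storing a Banana through a Fruit[] reference compiles happily, but the runtime rejects it (the Banana type is my addition to the earlier example):

```csharp
using System;

abstract class Fruit { }
class Apple : Fruit { }
class Banana : Fruit { }

static class ArrayCovarianceDemo {
    // returns true if the runtime rejected the bogus store
    public static bool TryAddBanana() {
        Fruit[] fruit = new Apple[1];    // legal: arrays are covariant
        try {
            fruit[0] = new Banana();     // compiles fine...
            return false;
        } catch (ArrayTypeMismatchException) {
            return true;                 // ...but explodes at execution
        }
    }
    static void Main() {
        Console.WriteLine(TryAddBanana());
    }
}
```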

So here's the problem: repeatedly, I've seen people claim that ".NET 4.0 / C# 4.0 covariance will make the above (lists) work" - this is both wrong and completely misses the point that you can already do this in .NET 2.0 / C# 2.0.

Why is it wrong?

  1. .NET 4.0 / C# 4.0 only applies to generic interfaces and generic delegate declarations; it does not apply to concrete types (OK, we could re-declare the method to accept an IList<Fruit>...)
  2. It is limited to strict "out" usage (compare to contravariance which is limited to strict "in" usage); lists cannot be either, so lists will not feature any covariance/contravariance features in 4.0

Note that if we only used IEnumerable<Fruit>, then it would work, since IEnumerable<T> is pure "out", and is covariant in 4.0 - confused yet?
For the general case, I'm assuming that Foo needs a list (i.e. it is going to add things, use the indexer, etc).
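As a sketch of the IEnumerable<Fruit> case (this needs the C# 4.0 compiler; the method here is mine, purely for illustration) - because we only ever read from the sequence, the covariant conversion from List<Apple> is allowed:

```csharp
using System;
using System.Collections.Generic;

abstract class Fruit { }
class Apple : Fruit { }

static class EnumerableDemo {
    // pure "out" usage: we only ever read from the sequence
    public static int CountFruit(IEnumerable<Fruit> fruit) {
        int count = 0;
        foreach (Fruit f in fruit) count++;
        return count;
    }
    static void Main() {
        List<Apple> apples = new List<Apple> { new Apple(), new Apple() };
        // covariant conversion: compiles in C# 4.0, not in 2.0/3.0
        Console.WriteLine(EnumerableDemo.CountFruit(apples));
    }
}
```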

Huh? What is in/out usage?

To get this working in 4.0, you can now optionally mark generic-type-arguments as either "in" or "out" (not both). An "in" is something that is only ever passed in via the interface/delegate (for example, "void Add(T arg);"), and an "out" is something that is only ever passed out via the interface/delegate (for example, "T Current {get;}"). [caveat: the "out" modifier against method arguments doesn't satisfy "out" covariance usage, since it is really just a special "ref" scenario].

This is set against the generic-type-argument directly, and is verified by the compiler; for example, IEnumerable<T> and Func<TArg1, TResult> are now:

    IEnumerable<out T>
    Func<in TArg1, out TResult>

The BCL team have had the fun job of updating many existing interfaces / delegates to make this work!
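The compiler verification can be sketched with a home-grown covariant interface (all the names here are mine, for illustration only):

```csharp
abstract class Fruit { }
class Apple : Fruit { }

// "out T": the compiler verifies that T only ever leaves the interface
interface ISource<out T> {
    T Next();
    // void Add(T item);  // would not compile: T in an input ("in") position
}
class AppleSource : ISource<Apple> {
    public Apple Next() { return new Apple(); }
}
static class VarianceDemo {
    public static Fruit NextFruit() {
        ISource<Fruit> source = new AppleSource(); // covariant conversion
        return source.Next();
    }
}
```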

OK, so lists don't work; what can we do?

Generics. We simply don't need covariance in this case. Instead of thinking "list of fruit", think "list of things, where those things are fruit", and you are getting close (for the caller's convenience, we'll also switch to the IList<T> interface too, but this is unrelated):

    void Foo<T>(IList<T> fruit) where T : Fruit { /* do some things */ }

This is now a generic method that will accept any (interface-based) list-of-T, where T is a fruit. We don't have to modify the calling code - our original Foo(apples) works fine (the compiler infers Foo<Apple>). More: the compiler now checks that we only attempt to add T or subclasses of T to the list, so the problem found with arrays goes away. Of course, you need to start thinking in terms of T, which is tricky... for example, if you need to create fruit, you might want an additional generic constraint:

    void Foo<T>(IList<T> fruit) where T : Fruit, new()
    {
        T newFruit = new T();
        // other things
        fruit.Add(newFruit);
    }

Generics. Your friend since 2.0; use them ;-p

Monday, 23 February 2009

Get busy in the community

Do you make use of the user-groups near you? No, not web-sites, but the real world? If not, perhaps you should? They are a fantastic opportunity both to get up to speed on technology with the help of experts, and for networking. And maybe think about speaking about your own areas of interest/expertise.

I normally pop along to NxtGenUG (Oxford), but with a new baby that has been a bit tricky lately, so I was surprised to see a new user group more local to me - GL.net. So in the unlikely event that you live local to Gloucester, UK - why not pop along? I've also signed up to jump on the soap box in March, talking about the Expression API, which I plan to repeat at DDD South West.

Now; geography probably dictates that you can ignore all those links... but why not take the opportunity to double-check what groups might be near you? Go on... step outside into the daylight; it might not hurt ;-p

Friday, 20 February 2009

Pragmatic LINQ

LINQ, especially when talking to a database (or other back-end), arguably creates a bit of a tricky mess for proper application design.

Consider:

  • I'm strongly in favour of proper unit testing
  • As such, I like the repository pattern, since this allows me to mock my DAL
  • In reality, I don't really need true POCO support; if I start needing my app to work with MySQL, I have bigger problems than POCO
  • I like that LINQ-to-SQL / EF can maintain a basic object model for me
  • Since it doesn't buy me anything, I don't really want to declare and maintain a "pure" (i.e. separate to the DAL) object model - I'm generally content to use partial classes to extend the classes we have generated
  • For use in things like MVC, I want to know when data access happens; I don't want my view messing with lazy navigation properties
  • I'm content (at the minute) to have a DAL that includes the LINQ-generated classes, the repository interface, and a class implementing that interface, and think of it as "my object model assembly that happens (by coincidence) to contain a default repository implementation" (try saying that twice...)
  • While I'm happy to know that LINQ to SQL supports POCO usage, the tooling doesn't make it easy. For now, I'm content to use attributed classes, and get on with the job...

So what?

Well, the first impact this has is: if you want it to be testable in any real sense, there isn't a lot of the LINQ stuff that can make it into the public API. For example:

  • the data-context - has nothing to do with external callers in a repository implementation
  • navigation properties - very quickly start crossing aggregates and/or causing lazy behaviour (not good if your context no longer exists)

My current thinking with this is that the data-context and most navigation properties should be marked as internal to the data layer (trivial either in the designer or dbml views). This means that your repository implementation can use the navigation properties to construct interesting queries, but the public API surfaced back to the caller doesn't include them. If the caller gets an order from the order repository, and wants details about the customer: tough - go and ask the customer repository. There is an edge-case for tightly coupled data (order header/details being a prime example), where it might be prudent to leave the navigation property public - but that is within the same aggregate, so it isn't a problem.
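A sketch of the shape I mean (these are hypothetical hand-written classes, not the LINQ-generated ones; the point is the accessibility choices): navigation that crosses aggregates stays internal, while same-aggregate navigation stays public:

```csharp
using System.Collections.Generic;

public class Customer {
    public int CustomerId { get; set; }
}
public class OrderDetail {
    public int Quantity { get; set; }
}
public class Order {
    public int OrderId { get; set; }
    public int CustomerId { get; set; }          // the key stays public
    internal Customer Customer { get; set; }     // crosses an aggregate: internal
    private readonly IList<OrderDetail> details = new List<OrderDetail>();
    public IList<OrderDetail> Details {          // header/details: same aggregate,
        get { return details; }                  // so this one stays public
    }
}
```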

It isn't whiter than white - but it seems pretty workable:

  • The repository interface protects our data-context from abuse; we know what use-cases we are supporting, and that they have been tested
  • The lack of navigation properties means we know when data access happens: it is when the caller asks the repository

The bit where I'm a bit mixed is on the subject of allowing the caller to use Expression arguments on method calls; the ability to pass in an Expression<Func<T, bool>>-based predicate is powerful, but risky: we can't validate that our DAL covers all use cases (due to unmapped method calls etc), and we can't be 100% sure what TSQL is going to execute in reality (even small composed expressions can radically change the TSQL expression). I think I'd need to evaluate that on a case-by-case basis.
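A sketch of the trade-off (the repository contract and names here are hypothetical; an in-memory implementation stands in for the real DAL):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Order {
    public int OrderId { get; set; }
    public int CustomerId { get; set; }
}
// hypothetical contract: convenient for the caller, but the DAL can no
// longer enumerate (or test) every query shape it might be asked to run
public interface IOrderRepository {
    IList<Order> Find(Expression<Func<Order, bool>> predicate);
}
public class InMemoryOrderRepository : IOrderRepository {
    private readonly List<Order> orders = new List<Order>();
    public void Add(Order order) { orders.Add(order); }
    public IList<Order> Find(Expression<Func<Order, bool>> predicate) {
        // a real implementation would translate the expression to TSQL;
        // here we simply compile and run it in memory
        return orders.Where(predicate.Compile()).ToList();
    }
}
```

The call-site reads naturally - `repo.Find(o => o.CustomerId == 123)` - but against a real database, what TSQL that produces depends entirely on what the caller put in the predicate.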

I don't know if I'm saying anything a: obvious, or b: stupid... but that is my thinking...

LINQ to SQL - not quite dead yet...

I've happily gone on-record to say that, in their current forms, I believe LINQ to SQL is a more useful tool than Entity Framework. Obviously, with the planned road-map prioritising the latter (hopefully back-filling the missing features that shipped with LINQ to SQL), this puts me on the back foot - so no doubt at some point (in the 4.0 era) I'll need to re-check my position.

However, I was very happy to get a "connect" e-mail today, telling me that they've fixed a bug I reported in LINQ to SQL, and that it would ship with 4.0; conclusive proof that it isn't completely side-lined, and continues to be a supported product (albeit with a reduced development effort). The only downside is that this makes it much harder to know when to make the switch... but I won't begrudge Microsoft that.

For now, at least, I breathe the contented sigh of the developer who knows that their data-access works today and (in theory) tomorrow.

Thursday, 12 February 2009

Fun with field-like events

UPDATE: this all changes in 4.0; full details here.

Field-like events; a great compiler convenience, but sometimes a pain. To recap, a field-like event is where you let the compiler write the add/remove methods:

    public event EventHandler Foo;

All well and good... the C# compiler creates a backing field, and add/remove accessor methods - however, the C# specification also dictates that the accessor methods will be synchronized. Unfortunately, the ECMA and MS specs disagree how. The ECMA spec maintains that the "how" is unimportant (an implementation detail) - the MS spec dictates "this" for instance methods, "typeof(TheClass)" for static methods. But if you follow the ECMA spec, there is no reliable way of independently using the same lock - you simply can't guarantee what it is (the C# spec doesn't mention [MethodImpl], so using this would also be making assumptions).

Aside: best practice is not to lock on either "this" or a Type - since in both cases you can't guarantee who else might be using the same lock.

Of course, in most cases this is irrelevant. Most classes simply don't need thread safety, and it is pure overkill. However, I was dealing with a case earlier where thread-safety was important (it is a class for simplifying fork/join operations). For convenience, I wanted to provide both regular event accessors and a fluent API to do the same - i.e.

    class Bar {
        public event EventHandler Foo;
        public Bar AddFoo(EventHandler handler) {
            Foo += handler;
            return this;
        }
        // snip
    }

with fluent-API usage:

    new Bar().AddFoo(handler).Fork(action).Fork(action).Join();

So what broke? The C# spec also dictates that inside the type, all access goes directly to the field. This means that the usage inside the AddFoo method is not synchronized. This is bad. So what can we do? My first thought was to use a nested class (since this is then a different type):

    class Bar {
        public event EventHandler Foo;
        public Bar AddFoo(EventHandler handler) {
            BarUtil.AddFoo(this, handler);
            return this;
        }
        static class BarUtil {
            internal static void AddFoo(
                Bar bar, EventHandler handler)
            {
                bar.Foo += handler;
            }
        }
        // snip
    }

Unfortunately, it turns out (by inspection) that this still uses the field directly, so isn't synchronized. If we make it non-nested, it finally works correctly - but then we're getting into the position where it simplifies things to just have an extension method:

    class Bar {
        public event EventHandler Foo;
        // snip
    }
    static class BarUtil {
        public static Bar AddFoo(
            this Bar bar, EventHandler handler)
        {
            bar.Foo += handler;
            return bar;
        }
    }

As it happens, I decided to just side-step the whole debacle instead and do the locking myself...
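Doing the locking yourself means declaring the event explicitly, so that the accessors, the fluent method, and the raise all share one private lock (a sketch; RaiseFoo is my addition - the real class raises the event internally, and also has the Fork/Join members snipped here):

```csharp
using System;

class Bar {
    private readonly object eventLock = new object();
    private EventHandler foo; // backing field: only ever touched under the lock
    public event EventHandler Foo {
        add { lock (eventLock) { foo += value; } }
        remove { lock (eventLock) { foo -= value; } }
    }
    public Bar AddFoo(EventHandler handler) {
        lock (eventLock) { foo += handler; } // same lock as the accessors
        return this;
    }
    public void RaiseFoo() {
        EventHandler handler;
        lock (eventLock) { handler = foo; } // snapshot under the lock
        if (handler != null) handler(this, EventArgs.Empty);
    }
}
```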

Summary: field-like events; unnecessary synchronization when you don't need thread-safety, and highly questionable synchronization when you do need it...

Wednesday, 11 February 2009

Async without the pain

I've seen a number of questions about async operations recently, which (coupled with some notes in a book I'm proof-reading) have made me think a lot about the subject. I'll be honest: I often don't use async IO correctly, simply because it isn't friendly.

Basically, async is hard in .NET - at least to get right. One common alternative to avoid this pain is to use the synchronous version of code, but on a pool thread - i.e.

    ThreadPool.QueueUserWorkItem(delegate { /* do stuff */ });

However, while this is fine for many cases, it uses threads... we'd much prefer to use completion ports etc, which usually means using the proper Begin/End methods that many IO wrappers provide.

So why is this a problem? Simply: you need to mess with IAsyncResult instances, exception handling, etc. It soon gets messy. But are we missing a trick? Why can't we wrap this using functional programming?

For example, consider the following:

    static void Main() {
        HttpWebRequest req = (HttpWebRequest)
            WebRequest.Create("http://www.google.com/");
        RunAsync<WebResponse>(
            req.BeginGetResponse, req.EndGetResponse,
            ProcessResponse);
        Console.WriteLine("Running...");
        Console.ReadLine();
    }
    static void ProcessResponse(WebResponse resp) {
        using (StreamReader reader = new
                StreamReader(resp.GetResponseStream())) {
            Console.WriteLine(reader.ReadToEnd());
        }
    }

That doesn't look scary at all; we've hidden all the grungy details behind the opaque RunAsync method, using delegates - but we're using the proper (IO completion-based) async handlers. Here's the RunAsync method(s) - a little trickier, perhaps, but we only need to write it once - the point is that it can be used for any async Begin/End pattern (although we'd probably need to add a few overloads for common method signatures):

    static void RunAsync<T>(
        Func<AsyncCallback, object, IAsyncResult> begin,
        Func<IAsyncResult, T> end,
        Action<T> callback) {
        RunAsync<T>(begin, end, callback, null);
    }
    static void RunAsync<T>(
        Func<AsyncCallback, object, IAsyncResult> begin,
        Func<IAsyncResult, T> end,
        Action<T> callback,
        Action<Exception> exceptionHandler) {
        begin(ar => {
            T result;
            try {
                result = end(ar);
            } catch (Exception ex) {
                if (exceptionHandler != null) {
                    exceptionHandler(ex);
                }
                return;
            }
            callback(result);
        }, null);
    }

We could probably also do something similar using a fluent API, but to be honest the above makes it simple enough for me to use...

All of which will be handy when/if I finally get around to writing an RPC client/server for protobuf-net...

-----

UPDATE: further work has shown that having two actions (result and exception) is ugly; a far more useful pattern is to take a single action: Action<Func<T>> (for methods with return values) or Action<Action> (for void methods). The idea is that the original caller can invoke this function to get either the value or the exception (thrown):


    public static void RunAsync<T>(
        Func<AsyncCallback, object, IAsyncResult> begin,
        Func<IAsyncResult, T> end,
        Action<Func<T>> callback) {
        begin(ar => {
            T result;
            try {
                result = end(ar); // ensure end called
                callback(() => result);
            } catch (Exception ex) {
                callback(() => { throw ex; });
            }
        }, null);
    }

    static void ProcessResponse(Func<WebResponse> result) {
        WebResponse resp = result();
        using (StreamReader reader = new
                StreamReader(resp.GetResponseStream())) {
            Console.WriteLine(reader.ReadToEnd());
        }
    }

This is illustrated further here.