Tuesday, 24 February 2009

What C# 4.0 covariance *doesn't* do

On a number of occasions, I've seen people talking about the old covariance problem - in particular, given:

    abstract class Fruit { }
    class Apple : Fruit { }

    void Foo(List<Fruit> fruit) { /* do some things */ }

How do I call Foo with the list of apples (List<Apple>) that I have?

    List<Apple> apples = ...
    Foo(apples); // compile error

The error is: Argument '1': cannot convert from 'System.Collections.Generic.List<Program.Apple>' to 'System.Collections.Generic.List<Program.Fruit>'.

There is no implicit conversion (covariance) between list-of-subclass and list-of-superclass. In particular, contrast this to arrays:

    void Foo(Fruit[] fruit) { /* do some things */ }
    Apple[] apples = ...
    Foo(apples); // compiles and executes (within limitations)

The difference is that .NET arrays (of reference types) are covariant, but prior to .NET 4.0, nothing else is. And the existing array covariance is a bit of a bodge to make some things work; it is risky (for example, Foo could happily try to store (via the indexer) a Banana in the array; this would compile fine but would throw an ArrayTypeMismatchException at execution - so compile-time checking goes out the window).
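To make that risk concrete, here is a minimal sketch of the array case. The Banana class and the ArrayCovarianceDemo wrapper are my own illustrative additions, not part of the original snippet; the rest follows the Fruit/Apple types above.

```csharp
using System;

abstract class Fruit { }
class Apple : Fruit { }
class Banana : Fruit { }   // hypothetical extra fruit, for illustration only

static class ArrayCovarianceDemo
{
    // Accepts any Fruit[]; thanks to array covariance that includes Apple[].
    static void Foo(Fruit[] fruit)
    {
        // Compiles fine: a Banana is a Fruit.
        // The runtime checks the array's real element type on every store.
        fruit[0] = new Banana();
    }

    static void Main()
    {
        Apple[] apples = new Apple[1];
        try
        {
            Foo(apples); // allowed by the compiler
        }
        catch (ArrayTypeMismatchException ex)
        {
            // The store into an Apple[] is rejected at execution time.
            Console.WriteLine(ex.GetType().Name); // prints "ArrayTypeMismatchException"
        }
    }
}
```

Nothing here is flagged at compile time; the failure only shows up when the store actually executes.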

So here's the problem: repeatedly, I've seen people claim that ".NET 4.0 / C# 4.0 covariance will make the above (lists) work" - this is both wrong and completely misses the point that you can already do this in .NET 2.0 / C# 2.0.

Why is it wrong?

  1. .NET 4.0 / C# 4.0 variance only applies to generic interfaces and generic delegate declarations; it does not apply to concrete types such as List<T> (OK, we could re-declare the method to accept an IList<Fruit>...)
  2. It is limited to strict "out" usage (compare to contravariance, which is limited to strict "in" usage); IList<T> uses T in both directions (Add takes a T, the indexer returns one), so it can be neither, and lists will not feature any covariance/contravariance features in 4.0

Note that if we only used IEnumerable<Fruit>, then it would work, since IEnumerable<T> is pure "out", and is covariant in 4.0 - confused yet?
For the general case, I'm assuming that Foo needs a list (i.e. it is going to add things, use the indexer, etc).
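For the read-only case, a small sketch of what does work in 4.0 might look like this. The Count method and the demo class are hypothetical names of mine; it assumes you are compiling with C# 4.0 / .NET 4.0 or later, since on earlier versions the call is a compile error.

```csharp
using System;
using System.Collections.Generic;

abstract class Fruit { }
class Apple : Fruit { }

static class EnumerableCovarianceDemo
{
    // Only reads from the sequence, so IEnumerable<Fruit> is enough.
    static int Count(IEnumerable<Fruit> fruit)
    {
        int n = 0;
        foreach (Fruit f in fruit) n++;
        return n;
    }

    static void Main()
    {
        List<Apple> apples = new List<Apple> { new Apple(), new Apple() };

        // List<Apple> implements IEnumerable<Apple>, which converts to
        // IEnumerable<Fruit> because IEnumerable<T> is covariant in 4.0.
        Console.WriteLine(Count(apples)); // prints 2
    }
}
```

Note that the conversion happens at the interface level; List<Apple> itself is still not a List<Fruit>.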

Huh? What is in/out usage?

To get this working in 4.0, you can now optionally mark the generic type parameters of an interface or delegate as either "in" or "out" (not both). An "in" is something that is only ever passed in via the interface/delegate (for example, "void Add(T arg);"), and an "out" is something that is only ever passed out via the interface/delegate (for example, "T Current {get;}"). [caveat: the "out" modifier against method arguments doesn't satisfy "out" covariance usage, since it is really just a special "ref" scenario].

This is set against the generic type parameter directly, and is verified by the compiler; for example, IEnumerable<T> and Func<T, TResult> are now:

    IEnumerable<out T>
    Func<in T, out TResult>

The BCL team have had the fun job of updating many existing interfaces / delegates to make this work!
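The same modifiers can be applied to your own interfaces. The following is a rough sketch only; IProducer, IConsumer, AppleTree and FruitBin are made-up names for illustration, not BCL types.

```csharp
using System;

abstract class Fruit { }
class Apple : Fruit { }

// "out": T only ever leaves the interface, so it can be covariant.
interface IProducer<out T> { T Produce(); }

// "in": T only ever enters the interface, so it can be contravariant.
interface IConsumer<in T> { void Consume(T item); }

class AppleTree : IProducer<Apple>
{
    public Apple Produce() { return new Apple(); }
}

class FruitBin : IConsumer<Fruit>
{
    public void Consume(Fruit item)
    {
        Console.WriteLine("Consumed " + item.GetType().Name);
    }
}

static class VarianceDemo
{
    static void Main()
    {
        // Covariance: a producer of apples is a producer of fruit.
        IProducer<Fruit> producer = new AppleTree();
        Fruit fruit = producer.Produce();

        // Contravariance: something that consumes any fruit can consume apples.
        IConsumer<Apple> consumer = new FruitBin();
        consumer.Consume(new Apple()); // prints "Consumed Apple"
    }
}
```

If IProducer<out T> also declared a method taking a T as an argument, the compiler would reject the "out" annotation; that is the check described above.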

OK, so lists don't work; what can we do?

Generics. We simply don't need covariance in this case. Instead of thinking "list of fruit", think "list of things, where those things are fruit", and you are getting close (for the caller's convenience, we'll also switch to the IList<T> interface too, but this is unrelated):

    void Foo<T>(IList<T> fruit) where T : Fruit { /* do some things */ }

This is now a generic method that will accept any (interface-based) list-of-T, where T is a fruit. We don't have to modify the calling code - our original Foo(apples) works fine (the compiler infers Foo<Apple>). More: the compiler now checks that we only attempt to add T or subclasses of T to the list, so the problem found with arrays goes away. Of course, you need to start thinking in terms of T, which is tricky... for example, if you need to create fruit, you might want an additional generic constraint:

    void Foo<T>(IList<T> fruit) where T : Fruit, new()
    {
        T newFruit = new T();
        // other things
        fruit.Add(newFruit);
    }
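Putting the pieces together, a complete program along these lines might look like the sketch below. The GenericFooDemo wrapper and the Main method are assumptions of mine to make it self-contained; it needs nothing from 4.0 and should compile on any compiler with generics and collection initializers.

```csharp
using System;
using System.Collections.Generic;

abstract class Fruit { }
class Apple : Fruit { }

static class GenericFooDemo
{
    // T must be a Fruit with a public parameterless constructor.
    static void Foo<T>(IList<T> fruit) where T : Fruit, new()
    {
        T newFruit = new T();
        fruit.Add(newFruit); // type-safe: we can only add a T to an IList<T>
    }

    static void Main()
    {
        List<Apple> apples = new List<Apple> { new Apple() };

        // No explicit type argument needed; the compiler infers Foo<Apple>.
        Foo(apples);

        Console.WriteLine(apples.Count); // prints 2
    }
}
```

Because the list is an IList<Apple> inside Foo<Apple>, there is no way to sneak a Banana into it, which is exactly the guarantee the array version lacked.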

Generics. Your friend since 2.0; use them ;-p