Sunday, 8 September 2013

Fun with immutable collections

 

Yesterday, somebody asked me whether I was going to add protobuf-net support for the Microsoft Immutable Collections package. So far, I’ve been following their announcements, but not really dabbling with them too much – but, it looks pretty close to complete, so I guess I’d better take a peek.

Context: What is Microsoft Immutable Collections?

The Immutable Collections package includes a set of new externally immutable collection types. In particular, having immutable types makes it much easier to avoid issues with threading and concurrency, which is ever-increasingly important whether that is because you want to exploit all the cores on a powerful box, or because the framework you are using demands that you you do as much as you possibly can in asynchronous methods. Of course, it also makes it harder to mess up values unexpectedly (some method in the 7th level of code hell unexpectedly calling Clear(), for example)– so win/win.

So how does that impact serializers?

The up-side of all of this is that you can expose now-immutable collections without needing to worry about changes; the down-side is that code that depends on changes will need some love – and in particular, this is bad news for things like serializers. The “serialize” step isn’t hugely impacted – we can still walk the data writing the elements to the output stream – but the “deserialize” step is shot down in flames. A basic deserialize-a-list implementation is something like:

IList collection = (IList)(value
?? new SomeDefaultCollectionType());
while(CanReadNextSubItem())
{
object newItem = ReadNextSubItem();
collection.Add(newItem);
}
return collection;

(note that the actual code is likely to be more complex to account for the various edge-cases and optimisations that most people really don’t want to ever have to know about)


The problem here is that the immutable collections kinda claim that they can support this – raising exceptions at runtime. So: we need to identify the immutable collections and accommodate them. It would be ironic if I said “nope, I don’t want to support that” – because in most protobuf implementations the tool-generated objects and collections are inherently immutable.


What is the immutable collection API?


Obviously, the new / Add() approach isn’t going to work. If we take a look at the package, there are basically two usage patterns in play. Here’s two different ways of creating an immutable list (the other collection types are pretty similar):


Example 1

var list = ImmutableList.Create();
list = list.Add("first");
list = list.Add("second");
list = list.Add("third");

Example 2

var builder = ImmutableList.CreateBuilder();
builder.Add("first");
builder.Add("second");
builder.Add("third");
var list = builder.ToImmutable();

The first example shows how you might make occasional changes; note that list becomes a different value after each call, which is how it achieves immutability – the old collection (which could be still being used elsewhere) remains unchanged. Optimisations help make this as inexpensive as possible, hopefully allowing the caller not to need to know much about it.


The second examples shows how you might implement it if you know you are doing a block of changes – basically, the builder API allows it to avoid the overheads of immutability for all the intermediate steps – only worrying about that when we call ToImmutable();


Looking at these, then, it feels like the most obvious contender for a serializer is to try to identify the builder API. This will also avoid confusion with some pre-existing list implementations that have Add() methods that return a non-void result.


The implementation


I don’t like hard-coding to specific interfaces - at least, not from the meta-programming layer. There are various reasons for this, including:



  • I don’t want to force an additional reference on people
  • I don’t want to be tied to a particular version of an in-progress API
  • I don’t want to explicitly implement versions for every immutable collection that gets added, and its interface-based twin
  • sometimes, the meta-programming layer is executing against an entirely different framework (for example, generating a dll that will execute on Windows Phone) – external dependencies are thus to be avoided like the plague!

so I prefer to identify the pattern – not the specific implementation.


If you take a look at the API in the current library, we can see:



  • concrete types like ImmutableList<T>
  • which implemenents IEnumerable<T>, possibly with a custom iterator type which may have a paired IImutableList<T> interface
  • and which implements IReadOnlyCollection<T>
  • which has a utility class ImmutableList, with methods like CreateBuilder()
  • where the builder has Add(), ToImmutable(), and possibly AddRange() methods

Which is actually a pretty specific set of stuff to identify a pattern. Note that the custom iterator type is a handy optimisation used by ImmutableArray<T> (which is a struct) to avoid boxing; fortunately protobuf-net already knows all about custom iterators, so we don’t have to change a single line of the serialize code. What I do, then, is:



  • firstly – everything here is meta-programming – identifying the pattern via reflection, and then typically baking IL to execute it as fast as possible
  • assume the existing code recognises that it is vaguely list-like
  • then we check that it is a generic type that implements some IReadOnlyCollection<T>
  • if it implements that, it might be a candidate – so we look for the non-generic utility type of the same name in the same namespace, taking into account the interface abstractions (so IImutableSet<T> actually maps to ImmutableHashSet – that is the only hard-coded exception I needed to add)
  • resolve the CreateBuilder() method, and identify the methods available (Add(), AddRange(), ToImmutable()
  • generate an deserialize loop using those methods

So comparing back to our original loop, we now have (assuming we don’t care about existing values, which makes it trickier) something like:

var builder = ImmutableList.CreateBuilder<SomeType>();
while(CanReadNextSubItem())
{
SomeType newItem = ReadNextSubItem();
builder.Add(newItem);
}
return builder.ToImmutable();

Conclusion


I hope that has served as an interesting introduction into the immutable types, but coming at it from the angle of “identifying the abstract pattern” rather than “working with the API directly”. The trunk of protobuf-net (r666) now has support for immutable lists, arrays, dictionaries, hash-sets, sorted-sets, sorted-dictionaries - and all of their interface twins. But all in a single pattern recognition block.

3 comments:

Max said...

Nice! One thing that I found surprising: ImmutableList doeesn't appear to be covariant, unlike IEnumerable.

E.g. revisiting the the example you gave in your 2009 blog post (http://marcgravell.blogspot.com/2009/02/what-c-40-covariance-doesn-do.html), using System.Collections.Immutable:

abstract class Fruit { }
class Apple : Fruit { }

void Foo(ImmutableList fruit) { /* do some things */ }


Now this still throws a compiler error:

var apples = ImmutableList.Create();
Foo(apples); // compile error

I wonder why ... can you think of a reason why ImmutableList would not be covariant, i.e. ImmutableList?

Marc Gravell said...

@Max firstly, covariance only applies to interfaces and delegates - not classes; however, IImutableList-of-T *also* can't be immutable, because it has methods that would violate the compiler rules. Yes, we know that it doesn't actually violate them - but the compiler doesn't attempt to understand the intent - just the rules.

James World said...

You said that foreach would use duck-typing ahead of IEnumerable.

This much is true, but the compiler will note that String.Split returns a string array, and will use the implicit reference conversion to IEnumerable to work the foreach.

(See section 8.8.4 in the C# 5 language specification)