Sunday 9 April 2017

Exploring Tuples as a Library Author

Exploring Tuples as a Library Author

tl;dr:

This works in an upcoming version of dapper; we like it:

// very lazy dapper query 
var users = db.Query<(int id, string name)>(
    "select Id, Name from Users").AsList();

And this works in an upcoming release of protobuf-net:

// poor man's serialization contract
var token = Serializer.Deserialize<(DateTime dob, string key)>(source);

But - all is not necessarily as it seems. Here we will discuss the nuances of C# 7 tuple usage, and look at the limitations and restrictions that the implementation imposes. In doing so, we can look at what this means for library authors, and what expectations consumers should have.

What is new in the world of tuples?

One of the fun new tools in C# 7 is language-level tuple support. Sure, we’ve had the Tuple<...> family for ages, but that forces you to use Item1, Item2 (etc) naming, and it is a class - so involves allocations, but… it works:

// System.Tuple usage
var tuple = Tuple.Create(123, "abc");
int foo = tuple.Item1; // 123
string bar = tuple.Item2; // "abc"

The ever-terser syntax for read-only properties makes it not too hard to write your own manual tuples, but it isn’t necessarily the best use of your time (especially if you want to do it well - the code is less trivial than people imagine); a simple example might look like:

// manual tuple struct definition
struct MyType
{
    public MyType(int id, string name)
    {
        Id = id;
        Name = name;
    }
    public int Id { get; }
    public string Name { get; }
}

Maneagable, but… not very friendly if you’re just trying to hack a few values around in close proximity to each-other.

More recently, the ValueTuple<...> family has been added on nuget; this solves the allocation problem, but still leaves us with names like Item1, Item2:

// System.ValueTuple usage
var tuple = ValueTuple.Create(123, "abc");
int foo = tuple.Item1; // 123
string bar = tuple.Item2; // "abc"

The new thing, then, is language support to prettify this. As an illustration, we can re-write that as:

// language tuple usage
var tuple = (id: 23, name: "abc");
int foo = tuple.id; // 123
string bar = tuple.name; // "abc"

This allows us to conveniently represent some named pieces of data. It looks broadly similar to anonymous types:

// anonymous type
var obj = new { Id = 23, Name = "abc" };
int foo = obj.Id; // 123
string bar = obj.name; // "abc"

But it is implemented very differently, as we’ll see. Anonymous types are very limited in scope - bacause you can’t say their name, you can’t expose them on any API boundary (even internal methods). Tuples, however, don’t have this limitation - they can be used in parameters and return types:

// language tuples on method signatures
public static (int x, int y) DoSomething(
    string foo, (int id, DateTime when) bar) {...}

And you can even express them in generics:

// language tuples as generic type parameters
var points = new List<(int x, int y)>();

How are language tuples implemented?

Given how similar they look to anonymous types, you’d be forgiven for assuming that they were implemented similarly - perhaps defining a struct instead of a class. Let’s test that by decompiling a basic console exe:

// test console using language tuples
static class Program
{
    static void Main()
    {
        var tuple = (id: 23, name: "abc");
        System.Console.WriteLine(tuple.name); // "abc"
    }
}

We’ll compile that in release mode and look at the IL:

ildasm mytest.exe

and find the Main() method - we see:

.method private hidebysig static void  Main() cil managed
{
.entrypoint
// Code size       23 (0x17)
.maxstack  8
IL_0000:  ldc.i4.s   23
IL_0002:  ldstr      "abc"
IL_0007:  newobj     instance void valuetype [System.ValueTuple]System.ValueTuple`2<int32,string>::.ctor(!0, !1)
IL_000c:  ldfld      !1 valuetype [System.ValueTuple]System.ValueTuple`2<int32,string>::Item2
IL_0011:  call       void [mscorlib]System.Console::WriteLine(string)
IL_0016:  ret
} // end of method Program::Main

I don’t expect everyone to be able to read IL, but what you won’t see in there is any mention of id or name - and there are no additional types in the assembly (just Program). Instead, it is using System.ValueTuple<int,string>. The IL we have here is exactly the same as we would get if we had compiled:

// test console using System.ValueTuple<...>
static class Program
{
    static void Main()
    {
        var tuple = new System.ValueTuple<int, string>(23, "abc");
        System.Console.WriteLine(tuple.Item2); // "abc"
    }
}

This is by design and is always the case - which is why you need a reference to System.ValueTuple in order to use language tuples. In fact, to make things more fun, you’re *still allowed to use the Item1 etc names:

// accessing a tuple via the implementation names
var tuple = (id: 23, name: "abc");
System.Console.WriteLine(tuple.Item2); // "abc"

Things would obviously get very confusing if you had used Item2 as a name in your tuple syntax - the compiler has enough sense to check that if you declare Item2, it must be in the second spot; and a few other well-known method names are blocked too, but generally: you’ll be fine.

But if it is just ValueTuple<...>, how does it work on the public API?

After all, this works:

// declare a method that uses tuples on the public API
public static class TypeInAssemblyA
{
    public static List<(int x, int y)> GetCoords((string key, string value) foo)
    {
        return new[] { (1, 2) }.ToList();
    }
}
// consume a method that uses tuples on the public API
static class TypeInAssemblyB
{
    static void ConsumeData()
    {
        foreach (var pair in TypeInAssemblyA.GetCoords(("abc", "def")))
        {
            System.Console.WriteLine($"{pair.x}, {pair.y}");
        }
    }
}

So the logical names (x and y) are clearly being conveyed somehow, but this is still ValueTuple<...>. If we look at the IL again, we see the trick in the top of the declaration of GetCoords:

.method public hidebysig static class [mscorlib]System.Collections.Generic.List`1<valuetype [System.ValueTuple]System.ValueTuple`2<int32,int32>> 
        GetCoords(valuetype [System.ValueTuple]System.ValueTuple`2<string,string> foo) cil managed
{
.param [0]
.custom instance void [System.ValueTuple]System.Runtime.CompilerServices.TupleElementNamesAttribute::.ctor(string[])
= ( 01 00 02 00 00 00 01 78 01 79 00 00 )
// .......x.y..
.param [1]
.custom instance void [System.ValueTuple]System.Runtime.CompilerServices.TupleElementNamesAttribute::.ctor(string[])
= ( 01 00 02 00 00 00 03 6B 65 79 05 76 61 6C 75 65 00 00 )
// .......key.value

Don’t worry if you’re not an IL expert - they key point is that it declares attributes ([TupleElementNames(...)]) against the parameter and return type that include the names, essentially as though we had done:

// equivalent: use tuple element name attributes explicitly
[return: TupleElementNames(new[] { "x", "y}" })]
public static List<ValueTuple<int, int>> GetCoords(
    [TupleElementNames(new[] {"key","value"})]
    ValueTuple<string,string> foo)
{
    return new[] { (1, 2) }.ToList();
}

(the C# 7 compiler doesn’t let us do this manually, though - it tells us to use tuple syntax instead)

This tells us once and for all that the names are nothing to do with the type, and are either removed completely, or exposed simply as decorations via attributes (when used on an API).

But tuples can be arbitrarily complex

Note that we don’t just need to consider returning flat tuples - the individual elements of tuples can themselves be tuples, or involve tuples (lists of tuples, arrays of tuples, tuple fields, etc). To be honest I don’t expect to see this much in the wild (if you’re getting this complex, you should probably define your own type), but this works:

// expose nested tuples
public static (int x, (int y, (float a, float b))[] coords) Nested()
    => (1, new[] { (2, (3.0f, 4.0f)) });

which is described as:

.param [0]
.custom instance void [System.ValueTuple]System.Runtime.CompilerServices.TupleElementNamesAttribute::.ctor(string[])
= ( 01 00 06 00 00 00 01 78 06 63 6F 6F 72 64 73 01 79 FF 01 61 01 62 00 00 )
// .......x.coords.y..a.b..

which is the equivalent of:

 [return: TupleElementNames(new[] {
     "x", "coords", "y", null, "a", "b" })]

In truth, I don’t know the significance of the null, or how exactly the mapping works in terms of position in the compound tuple vs position in the array.

EDIT: oops, silly me - the null is because I didn't name the element described by (float a, float b) - yes, that's allowed.

We can get names out of an API - can we push them in?

It is very common in code that uses libraries to create an insteance and then push it into an API either as object or via generics (<T>). For example:

var val = (id: 123, name: "Fred");
SomeLibrary.SomeObjectMethod(val);
SomeLibrary.SomeGenericMethod(val); // <T> inferred

with (for a simple example:

public static void SomeGenericMethod<T>(T val) {
    Console.WriteLine(typeof(T).Name);
    foreach(var field in typeof(T).GetFields())
        Console.WriteLine(field.Name);
}

public static void SomeObjectMethod(object val)
{
    Console.WriteLine(val.GetType().Name);
    foreach (var field in val.GetType().GetFields())
        Console.WriteLine(field.Name);
}

If we run this, we see:

ValueTuple`2
Item1
Item2
ValueTuple`2
Item1
Item2

This output confirms that once again we’re seeing ValueTuple<...> in play. As before, let’s look at the relevant part of the IL for these two calls (note I added some other code, not shown, to force the value into locals for simplicity):

ldloc.0
box        valuetype [System.ValueTuple]System.ValueTuple`2<int32,string>
call       void SomeLibrary::SomeObjectMethod(object)

ldloc.0
call       void SomeLibrary::SomeGenericMethod<valuetype [System.ValueTuple]System.ValueTuple`2<int32,string>>(!!0)

The first call converts the tuple to object by “boxing” it; the second call uses generics without boxing. But neither of them include anything whatsoever about the names. This is consistent with what we saw by dumping the fields, and gives us the summary statement:

Tuple names pass outwards, never inwards

Tuple names can be passed outwards (names declared in the public signature of a method can be used in code that consumes that method), but can not be passed inwards (names declared in one method will never be available to general library code that consumes those values). This has significant implications in a wide range of scenarios that we normally expect to be able to access names via reflection: the names simply won’t exist. For example, if we serialized a tuple to JSON, we would expect to see Item1, Item2, etc instead of the names we thought we declared. And there is nothing whatsoever the library author can do to get the original names. It would be nice if there was something akin to the method caller attributes ([CallerMemberName] etc), that library authors could attach to an optional parameter to receive the dummy names in some structured form (one part per generic type parameter), but: I can see that it would be very hard to define the semantics of this in the general case (it could be a generic method on a generic type nested in a generic type, for example).

So what can library authors do?

OK, so names are off the table. But that doesn’t mean that tuples are completely useless for use with library code. We still have position. And if a library can do something sensible just using the position semantics, then it doesn’t matter that the library doesn’t know the names that the caller had in mind.

Let’s take 2 examples; protobuf-net and dapper.

protobuf-net

This is a binary serialization library that follows the protocol buffers (protobuf) serialization format. One of the ways that protobuf achieves efficent encoding is that it doesn’t care about names. At all (at least, in the binary protocol). All it knows is that there is an integer in field 1, a utf8 string in field 2, and an array of floats in field 3 (for example). In this case, we don’t lose anything when we see ValueTuple<...> - all we need to do is to make sure that the library recognises the pattern and knows how to infer the contract from the type (so Item1 maps to field 1, Item2 maps to field 2, etc).

As it happens, protobuf-net already had code to recognise a general family of tuple patterns, so this very nearly worked for free without any changes. The only thing that tripped it up was that historically the “is this a tuple” check made the assumption that any sensible tuple would be immutable. As it happens, ValueTuple<...> is implemented as fully mutable (with mutable public fields). By softening that check a little bit, protobuf-net now works with any such tuple it sees.

dapper

This is an ADO.NET helper utility, that makes it really really simple to correctly execute parameterized commands and queries, and populate records from the results. In terms of the latter, when executing Query<T>(...), it executes the ADO.NET query, then maps the results by name into the fields and properties of T, one instance of T per row received. As we’ve discovered before, we are never going to able to respect the caller’s names when using C# 7 tuples, and it is unlikely that people will conveniently call their columns Item1, Item2, etc. That means that it probably isn’t going to be useful to use value tuples with domain entity objects, but… if you’re using value tuples for your domain entity objects you already need a stiff talking to.

However! There is another very common query scenario with dapper; ad-hoc local queries that get trivial data. In a lot of these cases, it really isn’t worth going to the trouble of declaring a custom type to receive the results, so dapper supports dynamic as a “meh, it’ll work” helper:

// query dapper via "dynamic"
int id = ...
var row = conn.QuerySingle("select Name, LastUpdated from Records where Id=@id", new {id});
string name = row.Name; // "dynamic"
DateTime lastUpdated = row.LastUpdated;
// use name and lastUpdated

This works, but: it involves the DLR (relatively expensive compared to direct C#), and has a few other overheads. What we can do is change dapper to recognise that names aren’t useful in ValueTuple<...>, but rely instead on the column position:

// query dapper via C# 7 tuples
int id = ...
var row = conn.QuerySingle<(string name, DateTime lastUpdated)>(
     "select Name, LastUpdated from Records where Id=@id", new {id});
// use row.name and row.lastUpdated

That’s actually pretty handy for a wide range of scenarios where we’re going to consume the data immediately adjacent to the query. And because they are value types we don’t even pay heap allocation costs. I wouldn’t use it for populating real domain objects, though.

dapper: what about parameters?

you’ll notice that we also have a new {id} anonymous type usage to pass parameters into dapper. There are a number of reasons that I’ve left this alone and have made no attempt to support value tuples:

  • in ADO.NET, most providers support names, and some important providers (SqlClient for SQL Server for example) do not support positional parameters; while we could in theory expose a positional parameter syntax in the API, the use would be somewhat limited
  • at the moment, the parameters are passed as object; if we pass a C# tuple it would be boxed, which means an allocation - and at that point, we might as well have just used an anonymous type object (which would have conveyed names)
  • if you’re thinking “make the method generic so the parameter is passed as T to avoid boxing”, the problem is that the method is already generic in the return type, and in C# when calling a generic method you can either supply all the generic types explicitly, or let the compiler infer all of them; there is no middle ground. Making the parameters a generic would be inconvenient in all cases, and impossible in others - we cannot name the T used in code that uses anonymous types, precisely because it is anonymous. There are hacks around this, but: nothing great

All in all, I’m happy to just say “use anonymous types for the parameters; use tuples for output rows” - it is a pragmatic compromise that covers the huge majority of useful real-world scenarios. I have added code to detect ValueTuple<...> being used in the parameters and raise an appropriate exception guiding the user on the problem an how to fix it.

Summary

We’ve explored C# 7 tuples and their implementation. Armed with the implementation details, we’ve explored what that means for code that consumes C# 7 tuples - both upstream and downstream. We’ve discussed the limitations for use with libary code, but we’ve also seen that they still have plenty of uses with library code in some scenarios. If I’ve missed anything, let me know on twitter!

Addendum

After publishing, Daniel Crabtree very rightly notes that the Item* fields only go up to Item7, and that additional fields are stored in the Rest field which can itself be a tuple. You can read more about that in his post here.