Saturday, 29 April 2017

StackExchange.Redis and Redis 4.0 Modules

StackExchange.Redis and Redis Modules

This is largely a brain-dump of my plans for Redis 4.0 Modules and the StackExchange.Redis client library.

Redis 4.0 is in RC 3, which is great for folks interested in Redis. As the primary maintainer of StackExchange.Redis, new releases also bring me some extra work in terms of checking whether there are new features that I need to incorporate into the client library. Some client libraries expose a very raw API surface, leaving the individual commands etc to the caller - this has the advantagee of simplicity, but it has disadvantages too:

  • it presents a higher barrier to entry, as users need to learn the redis command semantics
  • it prevents the library offering any special-case optimizations or guidance
  • it makes it hard to ensure that key-based sharding is being implemented correctly (as to do that you need to know with certainty which tokens are keys vs values vs command semantics)
  • it is hard to optimize the API

For all these reasons, StackExchange.Redis has historically offered a more method-per-command experience, allowing full intellisense, identification of keys, helper enums for options, santity checking of operands, and various scenario-specific optimizations / fallback strategies. And ... if that isn't enough, you can always use hack by using Lua to do things at the server directly.

Along comes Modules

A key feature in Redis 4.0 is the introduction of modules. This allows anyone to write a module that does something interesting and useful that they want to run inside Redis, and load that module into their Redis server - then invoke it using whatever exotic commands they choose. If you're interested in Redis, you should go check it out! There's already a gallery of useful modules started by Redis Labs - things like JSON support, Machine Learning, or Search - with an option to submit your own modules to the community.

Clearly, my old approach of "manually update the API when new releases come out" doesn't scale to the advent of modules, and saying "use Lua to run them" is ... ungainly. We need a different approach.

Adding Execute / ExecuteAsync

As a result, in an upcoming (not yet released) version, the plan is to add some new methods to StackExchange.Redis to allow more direct and raw access to the pipe; for example the rejson module adds a JSON.GET command that takes a key to an existing JSON value, and a path inside that json - we can invoke this via:

string foo = (string)db.Execute(
    "JSON.GET", key, "[1].foo");

(there's a similar ExecuteAsync method)

The return value of these methods is the flexible RedisResult type that the Lua API already exposes, which handles all the expected scenarios of primitives, arrays, etc. The parameters are simply a string command name, and a params object[] of everything else - with appropriate handling of the types you're likely to use with redis commands (string, int, double, etc). It also recognises parameters typed as RedisKey and uses them for routing / sharding purposes as necessary.

The key from all of this is that it should be easy to quickly hook into any modules that you write or want to consume.

What about more graceful handling for well-known modules?

My hope here is that or well-known but non-trivial modules, "someone" (maybe me, maybe the wider community) will be able to write helper methods as C# extension methods against the client library, and package them as module-specific NuGet packages; for example, a package could add:

public static RedisValue JsonGet(this IDatabase db, RedisKey key,
    string path = ".", CommandFlags flags = CommandFlags.None)
{
    return (RedisValue)db.Execute("JSON.GET",
        new object[] { key, path }, flags);
}

to expose raw json functionality, or could choose to add serialization / deserialization into the mix too:

public static T JsonGet<T>(this IDatabase db, RedisKey key,
    string path = ".", CommandFlags flags = CommandFlags.None)
{
    byte[] bytes = (byte[])db.Execute("JSON.GET",
        new object[] { key, path }, flags);
    using (var ms = new MemoryStream(bytes))
    {
        return SomeJsonSerializer.Deserialize<T>(ms);
    }
}

The slight wrinkle here is that it is still using the Execute[Async] API; as a general-purpose API it is very convenient and flexible, but slightly more expensive than it absolutely needs to be. But being realistic, it is probably fine for 95% of use-cases, so: let's get that shipped and iterate from there.

I'd like to add a second API specifically intended for extensions like this (more direct, less allocations, etc), but a: ideally I'd want to ensure that I can subsequently tie it cleanly into the "pipelines" concept (which is currently just a corefxlab dream, without a known ETA for "real" .NET), and b: it would be good to gauge interest and uptake before spending any time doing this.

But what should consumers target?

This also makes "strong naming" rear it's ugly head. I'm not going to opine on strong naming here - the discussion is not very interesting and has been done to death. Tl,dr: currently, there are two packages for the client library - strong named and not strong named. It would be sucky if there was a mix of external extensions targeting one, the other, or both. The mid range plan is to make a breaking package change and re-deploy StackExchange.Redis (which currently is not strong-named) as: strong-named. The StackExchange.Redis.StrongName would be essentially retired, although I guess it could be an empty package with a StackExchange.Redis dependency for convenience purposes, possibly populated entirely by [assembly:TypeForwardedTo(...)] markers. I'm open to better ideas, of course!

So that's "The Plan"

If you have strong views, hit me on twitter (@marcgravell), or log an issue and we can discuss it.

Sunday, 23 April 2017

Spans and ref part 2 : spans

Spans and ref part 2 : spans

In part 1, we looked at ref locals and ref return, and hinted at a connection to “spans”; this time we’re going to take a deeper look at what this connection might be, and how we can use make use of it.

Disclaimer

I’m mostly on the outside of this - looking in at the public artefacts, playing with the API etc - maybe the odd PR or issue report. It is entirely possible that I’ve misunderstood some things, and it is possible that things will change between now and general availability.

What are spans?

By spans, I mean System.Span<T>, which is part of .NET Core, living in the System.Memory assembly. It is also available for .NET via the System.Memory package. But please note: it is a loaded gun to use at the moment - you can currently compile code that has undefined behavior, and which may not compile at some point in the future. Although to be fair, to get into any of the terrible scenarios you need to use the unsafe keyword, at which point you already said “I take full responsibility for everything that goes wrong here”. I’ll discuss this more below, but I wanted to mention that at the top in case you stop reading and don’t get to that important point.

Note that some of the code in this post uses unreleased features; I’m using:

<PackageReference Include="System.Memory"
    Version="4.4.0-preview1-25219-04" />
<PackageReference Include="System.Runtime.CompilerServices.Unsafe"
    Version="4.4.0-preview1-25219-04" />

Obviously all bets are off with preview code; things may change.

Why do spans need to exist?

We saw previously how ref T can be used similarly to pointers (T*) to represent a reference to a single value. Basically, anything that allows us to talk about complex scenarios without needing pointers is a good thing. But: representing a single value is not the only use-case of pointers. The much more common scenario for pointers is for talking about a range of contiguous data, usually when paired with a count of the elements.

At the most basic level, a Span<T> represents a strongly typed contiguous chunk of elements of type T with a known and enforced length. In many ways, very comparable to an array (T[]) or segment ArraySegment<T>) - but… more. They also provide safe (by which I mean: not unsafe in the C# sense) access to features that would previously have required pointers (T*).

I’m probably missing a few things here, but the most immediate features are:

  • provide a unified type system over all contiguous memory, including: arrays, unmanaged pointers, stack pointers, fixed / pinned pointers to managed data, and references into the interior of values
  • allow type coercion for primitives and value-types
  • work with generics (unlike pointers, which don’t)
  • respect garbage collection (GC) semantics by using references instead of pointers (the GC only walks references)

Now: if none of the above sounds like things you ever need to do, then great: you probably won’t ever need to use Span<T> - and that’s perfectly OK. Most application code will never need to use these features. Ultimately, these tools are designed for lower level code (usually: library code) that is performance critical. That said, there are some great uses in regular code, that we’ll get onto.

But… what is a span?

OK, OK. Conceptually, a Span<T> can be thought of as a reference and a length:

public struct Span<T> {
    ref T _reference;
    int _length;
    public ref T this[int index] { get {...} }
}

with a cousin:

public struct ReadOnlySpan<T> {
    ref T _reference;
    int _length;
    public T this[int index] { get {...} }
}

You would be perfectly correct to complain “but… but… in the last part you said no ref fields!”. That’s fair, but I did say conceptually. At least… for now!

Spans as ranges of an array

As a completely trivial (and rather pointless) example, we can see how we can use a Span<T> very similarly to how we might have used a T[]:

void ArrayExample() {
    byte[] data = new byte[1024];
    // not shown: populate data
    ProcessData(data);
}
void ProcessData(Span<byte> span) {
    for (int i = 0; i < span.Length; i++) {
        DoSomething(span[i]);
    }
}

Here we implicitly convert the byte[] to Span<byte> when calling the method, but at this point you would still be justified in being underwhelmed - we could have done everything here with just an array.

Similarly, we can talk about just a portion of the array:

void ArrayExample() {
    byte[] data = new byte[1024];
    // not shown: populate data
    ProcessData(new Span<byte>(data, 10, 512));
}
void ProcessData(Span<byte> span) {
    for (int i = 0; i < span.Length; i++) {
        DoSomething(span[i]);
    }
}

And again you could observe that we could have just used ArraySegment<T>. Actually, let’s be realistic: very few people use ArraySegment<T> - but we could have just passed int offset and int count as additional parameters, it would have worked fine. But I mentioned pointers earlier…

Spans as ranges of pointers

The second way we can use Span<T> is over a pointer; which could be any of:

  • a stackalloc pointer for a small value that we want to work on without allocating an array
  • a managed array that we previously fixed
  • a managed array that we previously pinned with GCHandle.Alloc
  • a fixed-sized buffer that we previously fixed
  • the contents of a string that we previously fixed
  • a coerced pointer from any of the above (I’ll explain what this means below)
  • a chunk of unmanaged memory obtained with Marshal.AllocHGlobal or any other unmanaged memory API
  • etc

All of these will necessarily involve unsafe, but: we’ll tread carefully! Let’s have a look at a stackalloc example (stackalloc is where you obtain a chunk of data directly on the call-stack):

void StackAllocExample() {
    unsafe {
        byte* data = stackalloc byte[128];
        var span = new Span<byte>(data, 128);
        // not shown: populate data / span
        ProcessData(span);
    }
}
void ProcessData(Span<byte> span) {
    for (int i = 0; i < span.Length; i++) {
        DoSomething(span[i]);
    }
}

That’s… actually pretty huge! We just used the exact same processing code to handle an array and a pointer, and we didn’t need to use unsafe (except in the code that initially obtained the pointer). This opens up a huge range of possibilities, especially for things like network IO and serialization. Even better, it means that we can do all of the above with a “zero copy” mentality: rather than having managed code writing to a byte[] that later gets copied to some unmanaged chunk (for whatever IO we need), we can write directly to the unmanaged memory via a Span<T>.

Slice and dice

A very common scenario when working with buffers and buffer segments is the need to sub-divide the buffer. Span<T> makes this easy via the Slice() method, best illustrated by an example:

void ProcessData(Span<byte> span) {
    while(span.Length > 0) {
        // first byte is single-byte length-prefix
        int len = span[0];

        // process the next "len" bytes
        ProcessChunk(span.Slice(1, len));

        // move forward len+1 bytes
        span = span.Slice(len + 1);
    }
}

This isn’t something we couldn’t do other ways, but it is very convenient here. Importantly, we haven’t allocated anything here - there’s no “new array” or similar - we just have a reference to a different part of the existing range, and / or a different length.

Coercion

A more interesting example is coercion; this is something that you can do with pointers, but is very hard to do with arrays. A classic scenario here would be IO / serialization: you have a chunk of bytes, and at some point in that data you need to treat the data as fixed-size int, float, double, etc data. In the world of pointers, you just… do that:

byte* raw = ...
float* floats = (float*)raw;
float x = floats[0], y = floats[1]; // consume 8 bytes

With arrays, there is no direct way to do this; you’d either need to use unsafe hacks, or you can use BitConverter if the types you need are supported. But this is easy with Span<T>:

Span<byte> raw = ...
var floats = raw.NonPortableCast<byte, float>();
float x = floats[0], y = floats[1]; // consume 8 bytes

Not only can we do it, but we have the added advantage that it has correctly tracked the end range for us during the conversion - we will find that floats.Length is equal to raw.Length / 4 (since each float requires 4 bytes). The important thing to realise here is that we haven’t copied any data - we’re still looking at the exact same place in memory - but instead of treating it as a ref byte, we’re treating it as a ref float.

Except… better!

We observed that with pointers we could coerce from byte* to float*. That’s fine, but you can’t use pointers with all types. Span<T> has much stronger support here. A particularly interesting illustration is SIMD, which is exposed in .NET via Vector<T>. A vexing limitation of pointers is that we cannot talk about a Vector<float>* pointer (for example). This means that we can’t use pointer coercion as a convenient way of reading and writing SIMD vectors (you’ll usually have to use Unsafe.Read<T> and Unsafe.Write<T> instead). But we can coerce directly to Vector<T> from a span! Here’s an example that might come up in things like applying the web-sockets xor mask to a received frame’s payload:

void ApplyXor(Span<byte> span, uint mask) {
    if(Vector.IsHardwareAccelerated) {
        // apply the mask to SIMD-width bytes at a time
        var vectorMask = new Vector<uint>(mask);
        var typed = span.NonPortableCast<byte, Vector<uint>>();
        for (int i = 0; i < typed.Length; i++) {
            typed[i] ^= vectorMask;
        }
        // move past that data (might be a few bytes left)
        span = span.Slice(Vector<uint>.Count * typed.Length);
    }
    // not shown - finish any remaining data 
}

That’s pretty minimal code for vectorizing something; it is especially nice that we didn’t even need to do the math to figure out the vectorizable range - typed.Length did everything we wanted. It would be premature for me to know for sure, but I’m also hopeful that these 0-Span<T>.Length loops will also elide the bounds check in the same way that array access from 0-T[].Length elides the bounds check.

And readonly too!

Pointers are notoriously permissive; if you have a pointer: you can do anything. You can use fixed to obtain the char* pointer inside a string: if you change the data via the pointer, the string now has different contents. string is not immutable if you allow unsafe: nothing is immutable if you allow unsafe. But just as we can obtain a Span<T>, we can also get a ReadOnlySpan<T>. If you only expect a method to read the data, you can give them a ReadOnlySpan<T>.

Zero-cost substrings

In the “corefxlab” preview code, there’s a method-group with signatures along the lines of:

 public static  ReadOnlySpan<char> Slice(this string text, ...)

(where the overloads allow an initial range to be specified). This gives us a ReadOnlySpan<char> that points directly at a range inside the string. If we want a substring, we can just Slice() again and again - with zero allocations and zero string copying - we just have different spans over the same data. A rich set of APIs already exists in the corefxlab code for working with this type of string-like data. If you do a lot of text processing, this could have some really interesting aspects.

This all sounds too good to be true - what’s the catch?

Here’s the gotcha: in order to have the appropriate correctness guarantees when discussing something that could be a managed object, could be data on the stack, or could be unmanaged data, we run into very similar problems that make it impossible to store a ref T local as a field. Remember that a Span<T> is conceptually a ref T (reference) and int (length) - well: we still need to obey the rules imposed by that “conceptually”. For a trivial example of how we can get in a mess, we can tweak our stackalloc example:

private Span<byte> _span;
unsafe void StackAllocExample() {
    byte* data = stackalloc byte[128];
    _span = new Span<byte>(data, 128);
    ...
}
void SomeWhileLater() {
    ProcessData(_span);
}

Where does _span refer to in SomeWhileLater? I can’t tell you. We get into similar problems with anything that used fixed to get a pointer - the pointer is only guaranteed to make sense inside the fixed. Conceptually the issue is not restricted to pointers - it would apply equally if we could initialize Span<T> directly with a ref T constuctor:

private Span<SomeStruct> _span;
void StackRefExample() {
    var val = new SomeStruct(123, 456);
    _span = new Span<SomeStruct>(ref val);
    // ^^^ hypothetical span of length 1
}

We didn’t even need unsafe to break things this time. No such constructor currently exists, very wisely!

We should be OK if we only ever use managed heap objects (arrays, etc) to initialize a Span<T>, but the entire point of Span<T> is to provide feature parity between things like arrays and pointers while making it hard to shoot yourself in the foot.

In addition to this, we also need to worry about atomicity. The runtime and language guarantee that a single reference can be read atomically (in one CPU instruction), but it makes no guarantees about anything larger. If we have a reference and a length, we start getting into very complex issues around “torn” values (an invalid pair of the reference and length that didn’t actually exist, due to two threads squabbling). A torn value is vexing at the best of times, but in this case it would lead to valid-looking code accessing unexpected memory - a very bad thing.

The stackalloc example above is a perfect example of code that will compile without complaint today, but will end very very badly - although we used unsafe, so: self-inflicted. But this and the atomicity issue are both illustrations of why we have…

The Important Big Rule Of Spans

Span<T> has undefined behavior off the stack. And in the future: may not be allowed off the stack at all - this means no fields, no arrays, no boxing, etc. In the same way that ref T only has defined behavior on the stack (locals, parameters, return values) - so Span<T> only has defined behavior on the stack. You are not meant to ever put a Span<T> in a field (including all those times when things look like locals but are actually fields, that I touched on last time). An immediate consequence of this is that atomicity is no longer an issue: each stack is specific to a single thread; if our value can’t escape the stack, then two threads can’t have competing reads and writes.

There’s some in-progress discussion on how the rules for this requirement should work, but it looks like the concept of a “ref-like” stack-only type is being introduced. ref T as a field would be ref-like, and Span<T> would be ref-like. Any ref-like type would only be valid directly on the stack, or as an instance field (not a static field) on a ref-like type. If I had to speculate at syntax, I’d expect this to look something like:

public ref struct Span<T> {
    ref T _reference;
    int _length;
    public ref T this[int index] { get {...} }
}

Emphasis: this syntax is pure speculation based on the historic reluctance to introduce new keywords, but the ref struct here denotes a ref-like type. It could also be done via attributes or a range of other ideas, but note that we’re now allowed to embed the ref-like ref T field. Additionally, the compiler and runtime would verify that Span<T> is never used illegally as a field or in an array etc. Notionally, we could also do this for our own types that shouldn’t escape the stack, if we have similar semantics but Span<T> doesn’t represent our scenario.

Thinking back to the StackRefExample, if we wanted to safely support usage like:

var val = new SomeStruct(123, 456);
var span = new Span<SomeStruct>(ref val); // local, not field

then presumably it could work, but we’d have to have similar logic about returning ref-like types as currently exists for ref return, further complicated by the fact that we don’t have the single-assignment guarantee - we can reassign a Span<T>. If ref-like types work in the general case, then the logic about passing and returning such a value needs ironing out. And that’s complex. I’m very happy to defer to Vladimir Sadov on this!

EDIT: to clarify - it is only the pair of ref T and length (together known as a span, Span<T> or ReadOnlySpan<T>) that need to stay on the stack; the memory that we're spanning can be anywhere - and will often be part of a regular array (T[]) on the managed heap. It could also be a reference to the unmanaged heap, or to a separate part of the current stack.

So how am I meant to work with spans?

Sure, not everything is on the stack.

This isn’t as much of a limitation as it sounds. Instead of storing the Span<T> itself, you just need to store something that can manifest a span. For example, if you’re actually using arrays you might have a type that contains an ArraySegment<T>, but which has a property:

public Span<T> Span { get { ... } }

As long as you can switch into Span<T> mode when you’re inside an appropriate method, all is good.

For a more unified model, the corefxlab code contains the Buffer<T> concept, but it is still very much a work in progress. We’ll have to see how it shakes out in time.

Wait… why so much ref previously?

We covered a lot of ref details - you might feel cheated. Well, partly we needed that information to understand the stack-only semantics of Span<T>. But there’s more! Span<T> also exposes the ref T directly via the aptly named DangerousGetPinnableReference() method. This is a ref return, and allows us to do any of:

  • store the ref return into a ref local and work with it
  • pass the ref return as a ref or out parameter to another method
  • use fixed to convert the ref to a pointer (preventing GC movement at the same time)

The latter option means that not only can we get from unsafe to Span<T>, but we can go the other direction if we need:

fixed(byte* ptr = &span.DangerousGetPinnableReference())
{ ... }

If I can get a ref, can I escape the bounds?

The DangerousGetPinnableReference() method give us back a ref to the start of the range, comparable to how a T* pointer refers to the start of a range in pointer terms. So: can we use this to get around the range constraints? Well… yes… ish:

ref int somewhere = ref Unsafe.Add(
    ref span.DangerousGetPinnableReference(), 5000);

This cheeky duo gives us a reference to whatever is 5000-integers ahead of the span we were thinking of. It might still be part of our data (if we have a large array, for example), or it might be something completely random. But the sharp eyed might have noticed some key words in that expression… “Unsafe...” and “Dangerous...”. If you keep sprinting past signs with words like that on: expect to hit rocks. There’s nothing here that you couldn’t already do with unsafe code, note.

Doing crazy things with unmanaged memory

Sometimes you need to use unmanaged memory - this could be because of memory / collection issues, or could be because of interfacing with unmanaged systems - I use it in CUDA work, for example, where the CUDA driver has to allocate the memory in a special way to get optimal performance. Historically, working with unmanaged memory is hard - you will be using pointers all the time. But we can simplify everything by using spans. Here’s our dummy type that we will store in unmanaged memory:

// could be explict layout to match external definition
struct SomeType
{
    public SomeType(int id, DateTime creationDate)
    {
        Id = id;
        _creationDate = creationDate.ToEpochTime();
        // ...
    }
    public int Id { get; }
    private long _creationDate;
    public DateTime CreationDate => _creationDate.FromEpochTime();
    // ...
    public override string ToString()
        => $"{Id}: {CreationDate}, ...";
}

We’ll need to allocate some memory and ensure it is collected, usually via a finalizer in a wrapper class:

unsafe class UnmanagedStuff : IDisposable
{
    private SomeType* ptr;
    public UnmanagedStuff(int count)
    {
        ptr = (SomeType*) Marshal.AllocHGlobal(
            sizeof(SomeType) * count).ToPointer();
    }
    ~UnmanagedStuff() { Dispose(false); }
    public void Dispose() => Dispose(true);
    private void Dispose(bool disposing)
    {
        if(disposing) GC.SuppressFinalize(this);
        var ip = new IntPtr(ptr);
        if (ip != IntPtr.Zero) Marshal.Release(ip);
        ptr = default(SomeType*);
    }
}

The wrapper type needs to know about the pointers, so is going to be unsafe - but does the rest of the code need to? Sure, we could add an indexer that uses Unsafe.Read / Unsafe.Write to access individual elements, but that means copying the data constantly, which is probably not what we want - and it doesn’t help us represent ranges. But spans do: we can return a span of the data (perhaps via a Slice() API):

public Span<SomeType> Slice(int offset, int count)
    => new Span<SomeType>(ptr + offset, count);
// ^^^ not shown: validate range first

And we can consume this pretty naturally without unsafe:

// "stuff" is our UnmanagedStuff object
// easily talk about a slice of unmanaged data
var slice = stuff.Slice(5, 10);
slice[0] = new SomeType(123, DateTime.Now);                

// (separate slices work)
slice = stuff.Slice(0, 25);
Console.WriteLine(slice[5]); // 123: 23/04/2017 09:09:51, ...

If we want to talk about individual elements (rather than a range), then a ref local (via a ref return) is what we want; we could use the DangerousGetPinnableReference() API on a Span<T> for this, but in this case it is probably easier just to use Unsafe directly:

public ref SomeType this[int index]
    => ref Unsafe.AsRef<SomeType>(ptr + index);
// ^^^ not shown: validate range first 

We can consume this with similar ease:

// talk about a *reference* to unmanaged data
ref SomeType item = ref stuff[5];
Console.WriteLine(item); // 123: 23/04/2017 09:09:51, ...
item = new SomeType(42, new DateTime(2016, 1, 8));

// prove that updated *inside* the slice
Console.WriteLine(slice[5]); // 42: 08/01/2016 00:00:00, ...

And now from any code, we can talk directly to the unmanaged memory simply by passing it in as a ref parameter - it will never be copied, just dereferenced. If you want to talk about an isolated copy or store a copy as a field, then you can dereference, but that is easy:

SomeType isolated = item;

If you’ve ever worked with unmanaged memory from C#, this is a huge difference - and opens up a whole range of interesting scenarios for allocation-free systems without requiring the entire codebase to be unsafe. For context, in an allocation-free system, the lifetime of a set of data is strictly defined by some unit of work - processing an inbound request, for example. This means we don’t need reference tracking and garbage collection (and GC pauses can hurt high performance systems), so instead we simply take some slabs of memory, work from them (incrementing counters as we consume space), and then when we’ve finished the request we just set all the counters back to zero and we’re ready for the next request, no mess. Spans and ref locals and ref return make this friendly, even in the unmanaged memory scenario. The only caveat being - once again: Span<T> and ref T cannot legally escape the stack. But as we’ve seen, we can expose on-demand a Span<T> or ref T - so it isn’t a burden.

Summary

Spans; they’re very powerful if you need that kind of thing. And they force a range of new concepts into C#, giving us all the combined strong points of arrays, pointers, references and generics - with very few of the pain points. If you don’t care about pointers, buffers, etc - you probably won’t need to learn about spans. But if you do, they’re awesome. The amount of effort the .NET folks (and the community, but mostly Microsoft) have made making this span concept so rich and powerful is huge - it impacts the compiler, the JIT, the runtime, and multiple libraries both pre-existing and brand new. And it impacts both .NET and .NET Core. As someone who works a lot in the areas affected by spans and ref - it is also hugely appreciated. Good things are coming.

Saturday, 22 April 2017

Spans and ref part 1 : ref

Spans and ref part 1 : ref

One of the new features in C# 7 is by-reference (ref) return values and locals. This is a complex topic to explain, but a good example of why we might want this is “spans” (Span<T>). I don’t have any inside knowledge on the design meetings, but I’d go further and speculate that if Span<T> wasn’t a thing, the ref-changes wouldn’t have happened, so it makes sense to consider them together. Most of the fun things you can do with ref returns / locals start to make a lot more sense when we look at Span<T> - which we’ll do in part 2, but first we need to remind ourselves what ref means, and explore the new ref changes.

ref returns and locals

There’s a reason that ref (and cousin out) aren’t used extensively on many APIs: they are hard to fully understand. A lot of people will describe them in terms of “changes being visible”, but that is just a side-effect, not the meaning. I don’t mean this as a criticism: it isn’t necessary for every C# developer to have a deep knowledge of the inner workings of these things.

But consider the following common question:

void PassByRef()
{
    int i = 42;
    IncrementByRef(ref i);
    // what does this line print, and why?
    Console.WriteLine(i);
}
void IncrementByRef(ref int x)
{
    x = x + 1; // increment
}

Most developers will be able to correctly understand that this will output “43”, but asking them exactly what happened can reveal very different levels of understanding. The short summary is that a reference to the variable i was passed to IncrementByRef; all the code in IncrementByRef that looks like it is reading / writing to the parameter is actually dereferencing the parameter at each stage. This is clearer if we write it in unsafe pointer code instead:

unsafe void PassByPointer()
{
    int i = 42;
    IncrementByPointer(&i);
    // what does this line print, and why?
    Console.WriteLine(i);
}
unsafe void IncrementByPointer(int* x)
{
    *x = *x + 1; // increment
}

Here we can clearly see the “take a reference to” operation (&) and “dereference” (*) operations, but there’s a lot of problems with pointers:

  • the garbage collector (GC) refuses to even try to walk pointers, which means you need to be very careful to only access memory that won’t move (unmanaged memory, stack memory, or pinned managed objects)
  • pointer arithmetic makes it trivially possible to access adjacent memory without any bounds checking
  • it forces us to use unsafe, which makes it very easy to make subtle but major bugs that cause just about any level of silliness imaginable
  • pointers only work for a small subset of types - essentially primitives and structs composed of primitives

The point of ref parameters is to get the best of both worlds. ref is essentially just like pointers, but with enough restrictions to stop us getting into messes. The aditional sanity checks and restrictions mean that the IL knows enough about the meaning for the GC to sensibly be able to navigate them without getting confused, so we don’t need to worry about the reference suddenly being meaningless - and since we can’t do anything too silly, we don’t need to drop to unsafe. And it should work for any regular type.

But, historically, this ability to add automatic dereferencing and talk about ref has been restricted to method parameters; no fields, no locals, and no return values.

ref locals

The first change in C#7 allows us to talk about automatically dereferenced ref items as local (method) variables. In the same way that a ref parameter is denoted by a ref prefix before the type, so it is with ref locals, with the added bonus that ref var is legal:

void ByRefLocal()
{
    int i = 42;
    ref var x = ref i;
    x = x + 1;
    // what does this line print, and why?
    Console.WriteLine(i);
}

This prints “43” for exactly the same reasons as before - the only difference is that we now have a syntax to express ref when talking about locals. Previously, we would have to have added an additional method to switch to ref semantics for a local. One slight peculiarity here is that ref locals must be assigned a value at the point of declaration - and we can only assign it a value at this point. Any further attempt to assign a value to a ref local is interpreted as a dereferencing assignment - the *x = in our pointer example.

This ability is nice, but it isn’t very useful until we combine it with…

ref return

A much more interesting and powerful addition in C# 7 is ref returns. As the name suggests, this allows us to return a ref value from a method. We can capture this value into a ref local as long as we include an additional ref just before the value to make it very clear that we don’t want to dereference - which is the regular behaviour whenever touching a ref parameter or local:

ref int GetArrayReference(int[] items, int index)
    => ref items[index];

void IncrementInsideArrayByRef()
{
    int[] values = { 1, 2, 3 };

    ref int item = ref GetArrayReference(values, 1);
    IncrementByRef(ref item);
    // what does this line print, and why?
    Console.WriteLine(string.Join(",", values));
}

Here the GetArrayReference method provides the caller a ref to inside the array. Note that the ability to get a ref into an array is not by itself new - this has always worked:

IncrementByRef(ref values[item]);

The bit that is new and different is only the ability to return a ref value.

Since we increment a ref int that refers to the array index 1 (the second element), the result is “1,3,3”.

Note that we don’t need to capture the ref value before we use it - we can also pass a ref return result directly into a ref or out parameter:

IncrementByRef(ref GetArrayReference(values, 1));

Are there restrictions on what we can ref return?

Yes, yes there are. Figuring out the rules on what can and can’t be safely returned as ref without letting the author get into an accidental ugly mess of landmines is probably why it has never been supported in the past. We’ve seen that we can return ref references into arrays. And we’ve seen that we can take a ref of a local variable. But a local only exists in the context of the current stack-frame: very bad things would happen if we could ref return a ref to a local - the caller would have a ref to a position outside the active stack, which would be undefined behavior:

ref int ReturnInvalidStackReference()
{
    int i = 32;
    return ref i; // can't do this
}
void WhatHappensHere()
{
    ref int v = ref ReturnInvalidStackReference();
    CallSomeOtherMethods(); // to use the stack
    int i = v; // dereference the ref
}

You’ll be relieved to know that the compiler doesn’t let us do this - the compiler is very strict to ensure that if we want to ref return something, then it must demonstrably refer to a safe value. Put very simply: as long as the assignment doesn’t involve a ref to a local, we’ll be fine. return ref i; clearly involves the local i, so can’t be returned. The return ref expression is inspected for safety; each part of the expression must be safe. This includes any ref parameters that we have passed into any method calls:

ref int ReturnInvalidStackReference()
{
    int j = 42;
    return ref DoSomething(ref j);
}

This might look like a confusing restriction, but note that DoSomething could be implemented as:

ref int DoSomething(ref int evil) => ref evil;

which would expose the ref j stack reference to the caller of ReturnInvalidStackReference, so any such possibility is excluded. The implementation here is pretty solid, so if it refuses to let you ref return something, you’ve probably attempted something that looks too much like you’re involving locals of the current method.

But no ref fields

We have ref parameters, ref locals and ref returns. However, there is no currently support for ref fields (instance or static variables). Specifically, this is not legal:

struct Foo {
    ref int _reference;
}

or:

class Foo {
    ref int _reference;
}

The reason for this again relates to undefined behaviour and escaping the stack-frame. If we could put a ref into a field, then we are at grave danger of proving invalid access (at some point later on) to a position in memory that now means something completely unrelated to what it meant when we took the ref. Strictly speaking we can prove that ref fields would be valid if the assigned value comes from inside an object, and we’ll discuss another safe scenario in part 2,but currently the rule is simple: no ref fields.

When is a local not a local?

This has some additional consequences for a number of code concepts that look like locals, but which are actually fields; for example:

  • locals in an iterator block (yield return)
  • locals in an async method
  • captured variables in lambdas, anonymous methods, and LINQ syntax comprehensions

All of these situations are - and for similar reasons (with the added bonus of issues of lifetime) - the scenarios where you can’t use ref or out parameters or unsafe, so basically: if you can’t use ref, out parameters or unsafe, you won’t be able to use ref locals or ref return either.

One additional scenario, though, is tuples: as I discussed previously, tuples are secretly implemented as fields on the ValueTuple<...> family. So: no ref values in tuples.

Summary

This should give you enough to start understanding what ref locals and ref returns are, but for them really start to make sense we need a concrete example. And we get that in “spans”, coming up next!

Sunday, 9 April 2017

Exploring Tuples as a Library Author

Exploring Tuples as a Library Author

tl;dr:

This works in an upcoming version of dapper; we like it:

// very lazy dapper query 
var users = db.Query<(int id, string name)>(
    "select Id, Name from Users").AsList();

And this works in an upcoming release of protobuf-net:

// poor man's serialization contract
var token = Serializer.Deserialize<(DateTime dob, string key)>(source);

But - all is not necessarily as it seems. Here we will discuss the nuances of C# 7 tuple usage, and look at the limitations and restrictions that the implementation imposes. In doing so, we can look at what this means for library authors, and what expectations consumers should have.

What is new in the world of tuples?

One of the fun new tools in C# 7 is language-level tuple support. Sure, we’ve had the Tuple<...> family for ages, but that forces you to use Item1, Item2 (etc) naming, and it is a class - so involves allocations, but… it works:

// System.Tuple usage
var tuple = Tuple.Create(123, "abc");
int foo = tuple.Item1; // 123
string bar = tuple.Item2; // "abc"

The ever-terser syntax for read-only properties makes it not too hard to write your own manual tuples, but it isn’t necessarily the best use of your time (especially if you want to do it well - the code is less trivial than people imagine); a simple example might look like:

// manual tuple struct definition
struct MyType
{
    public MyType(int id, string name)
    {
        Id = id;
        Name = name;
    }
    public int Id { get; }
    public string Name { get; }
}

Maneagable, but… not very friendly if you’re just trying to hack a few values around in close proximity to each-other.

More recently, the ValueTuple<...> family has been added on nuget; this solves the allocation problem, but still leaves us with names like Item1, Item2:

// System.ValueTuple usage
var tuple = ValueTuple.Create(123, "abc");
int foo = tuple.Item1; // 123
string bar = tuple.Item2; // "abc"

The new thing, then, is language support to prettify this. As an illustration, we can re-write that as:

// language tuple usage
var tuple = (id: 23, name: "abc");
int foo = tuple.id; // 123
string bar = tuple.name; // "abc"

This allows us to conveniently represent some named pieces of data. It looks broadly similar to anonymous types:

// anonymous type
var obj = new { Id = 23, Name = "abc" };
int foo = obj.Id; // 123
string bar = obj.name; // "abc"

But it is implemented very differently, as we’ll see. Anonymous types are very limited in scope - bacause you can’t say their name, you can’t expose them on any API boundary (even internal methods). Tuples, however, don’t have this limitation - they can be used in parameters and return types:

// language tuples on method signatures
public static (int x, int y) DoSomething(
    string foo, (int id, DateTime when) bar) {...}

And you can even express them in generics:

// language tuples as generic type parameters
var points = new List<(int x, int y)>();

How are language tuples implemented?

Given how similar they look to anonymous types, you’d be forgiven for assuming that they were implemented similarly - perhaps defining a struct instead of a class. Let’s test that by decompiling a basic console exe:

// test console using language tuples
static class Program
{
    static void Main()
    {
        var tuple = (id: 23, name: "abc");
        System.Console.WriteLine(tuple.name); // "abc"
    }
}

We’ll compile that in release mode and look at the IL:

ildasm mytest.exe

and find the Main() method - we see:

.method private hidebysig static void  Main() cil managed
{
.entrypoint
// Code size       23 (0x17)
.maxstack  8
IL_0000:  ldc.i4.s   23
IL_0002:  ldstr      "abc"
IL_0007:  newobj     instance void valuetype [System.ValueTuple]System.ValueTuple`2<int32,string>::.ctor(!0, !1)
IL_000c:  ldfld      !1 valuetype [System.ValueTuple]System.ValueTuple`2<int32,string>::Item2
IL_0011:  call       void [mscorlib]System.Console::WriteLine(string)
IL_0016:  ret
} // end of method Program::Main

I don’t expect everyone to be able to read IL, but what you won’t see in there is any mention of id or name - and there are no additional types in the assembly (just Program). Instead, it is using System.ValueTuple<int,string>. The IL we have here is exactly the same as we would get if we had compiled:

// test console using System.ValueTuple<...>
static class Program
{
    static void Main()
    {
        var tuple = new System.ValueTuple<int, string>(23, "abc");
        System.Console.WriteLine(tuple.Item2); // "abc"
    }
}

This is by design and is always the case - which is why you need a reference to System.ValueTuple in order to use language tuples. In fact, to make things more fun, you’re *still allowed to use the Item1 etc names:

// accessing a tuple via the implementation names
var tuple = (id: 23, name: "abc");
System.Console.WriteLine(tuple.Item2); // "abc"

Things would obviously get very confusing if you had used Item2 as a name in your tuple syntax - the compiler has enough sense to check that if you declare Item2, it must be in the second spot; and a few other well-known method names are blocked too, but generally: you’ll be fine.

But if it is just ValueTuple<...>, how does it work on the public API?

After all, this works:

// declare a method that uses tuples on the public API
public static class TypeInAssemblyA
{
    public static List<(int x, int y)> GetCoords((string key, string value) foo)
    {
        return new[] { (1, 2) }.ToList();
    }
}
// consume a method that uses tuples on the public API
static class TypeInAssemblyB
{
    static void ConsumeData()
    {
        foreach (var pair in TypeInAssemblyA.GetCoords(("abc", "def")))
        {
            System.Console.WriteLine($"{pair.x}, {pair.y}");
        }
    }
}

So the logical names (x and y) are clearly being conveyed somehow, but this is still ValueTuple<...>. If we look at the IL again, we see the trick in the top of the declaration of GetCoords:

.method public hidebysig static class [mscorlib]System.Collections.Generic.List`1<valuetype [System.ValueTuple]System.ValueTuple`2<int32,int32>> 
        GetCoords(valuetype [System.ValueTuple]System.ValueTuple`2<string,string> foo) cil managed
{
.param [0]
.custom instance void [System.ValueTuple]System.Runtime.CompilerServices.TupleElementNamesAttribute::.ctor(string[])
= ( 01 00 02 00 00 00 01 78 01 79 00 00 )
// .......x.y..
.param [1]
.custom instance void [System.ValueTuple]System.Runtime.CompilerServices.TupleElementNamesAttribute::.ctor(string[])
= ( 01 00 02 00 00 00 03 6B 65 79 05 76 61 6C 75 65 00 00 )
// .......key.value

Don’t worry if you’re not an IL expert - they key point is that it declares attributes ([TupleElementNames(...)]) against the parameter and return type that include the names, essentially as though we had done:

// equivalent: use tuple element name attributes explicitly
[return: TupleElementNames(new[] { "x", "y}" })]
public static List<ValueTuple<int, int>> GetCoords(
    [TupleElementNames(new[] {"key","value"})]
    ValueTuple<string,string> foo)
{
    return new[] { (1, 2) }.ToList();
}

(the C# 7 compiler doesn’t let us do this manually, though - it tells us to use tuple syntax instead)

This tells us once and for all that the names are nothing to do with the type, and are either removed completely, or exposed simply as decorations via attributes (when used on an API).

But tuples can be arbitrarily complex

Note that we don’t just need to consider returning flat tuples - the individual elements of tuples can themselves be tuples, or involve tuples (lists of tuples, arrays of tuples, tuple fields, etc). To be honest I don’t expect to see this much in the wild (if you’re getting this complex, you should probably define your own type), but this works:

// expose nested tuples
public static (int x, (int y, (float a, float b))[] coords) Nested()
    => (1, new[] { (2, (3.0f, 4.0f)) });

which is described as:

.param [0]
.custom instance void [System.ValueTuple]System.Runtime.CompilerServices.TupleElementNamesAttribute::.ctor(string[])
= ( 01 00 06 00 00 00 01 78 06 63 6F 6F 72 64 73 01 79 FF 01 61 01 62 00 00 )
// .......x.coords.y..a.b..

which is the equivalent of:

 [return: TupleElementNames(new[] {
     "x", "coords", "y", null, "a", "b" })]

In truth, I don’t know the significance of the null, or how exactly the mapping works in terms of position in the compound tuple vs position in the array.

EDIT: oops, silly me - the null is because I didn't name the element described by (float a, float b) - yes, that's allowed.

We can get names out of an API - can we push them in?

It is very common in code that uses libraries to create an insteance and then push it into an API either as object or via generics (<T>). For example:

var val = (id: 123, name: "Fred");
SomeLibrary.SomeObjectMethod(val);
SomeLibrary.SomeGenericMethod(val); // <T> inferred

with (for a simple example:

public static void SomeGenericMethod<T>(T val) {
    Console.WriteLine(typeof(T).Name);
    foreach(var field in typeof(T).GetFields())
        Console.WriteLine(field.Name);
}

public static void SomeObjectMethod(object val)
{
    Console.WriteLine(val.GetType().Name);
    foreach (var field in val.GetType().GetFields())
        Console.WriteLine(field.Name);
}

If we run this, we see:

ValueTuple`2
Item1
Item2
ValueTuple`2
Item1
Item2

This output confirms that once again we’re seeing ValueTuple<...> in play. As before, let’s look at the relevant part of the IL for these two calls (note I added some other code, not shown, to force the value into locals for simplicity):

ldloc.0
box        valuetype [System.ValueTuple]System.ValueTuple`2<int32,string>
call       void SomeLibrary::SomeObjectMethod(object)

ldloc.0
call       void SomeLibrary::SomeGenericMethod<valuetype [System.ValueTuple]System.ValueTuple`2<int32,string>>(!!0)

The first call converts the tuple to object by “boxing” it; the second call uses generics without boxing. But neither of them include anything whatsoever about the names. This is consistent with what we saw by dumping the fields, and gives us the summary statement:

Tuple names pass outwards, never inwards

Tuple names can be passed outwards (names declared in the public signature of a method can be used in code that consumes that method), but can not be passed inwards (names declared in one method will never be available to general library code that consumes those values). This has significant implications in a wide range of scenarios that we normally expect to be able to access names via reflection: the names simply won’t exist. For example, if we serialized a tuple to JSON, we would expect to see Item1, Item2, etc instead of the names we thought we declared. And there is nothing whatsoever the library author can do to get the original names. It would be nice if there was something akin to the method caller attributes ([CallerMemberName] etc), that library authors could attach to an optional parameter to receive the dummy names in some structured form (one part per generic type parameter), but: I can see that it would be very hard to define the semantics of this in the general case (it could be a generic method on a generic type nested in a generic type, for example).

So what can library authors do?

OK, so names are off the table. But that doesn’t mean that tuples are completely useless for use with library code. We still have position. And if a library can do something sensible just using the position semantics, then it doesn’t matter that the library doesn’t know the names that the caller had in mind.

Let’s take 2 examples; protobuf-net and dapper.

protobuf-net

This is a binary serialization library that follows the protocol buffers (protobuf) serialization format. One of the ways that protobuf achieves efficent encoding is that it doesn’t care about names. At all (at least, in the binary protocol). All it knows is that there is an integer in field 1, a utf8 string in field 2, and an array of floats in field 3 (for example). In this case, we don’t lose anything when we see ValueTuple<...> - all we need to do is to make sure that the library recognises the pattern and knows how to infer the contract from the type (so Item1 maps to field 1, Item2 maps to field 2, etc).

As it happens, protobuf-net already had code to recognise a general family of tuple patterns, so this very nearly worked for free without any changes. The only thing that tripped it up was that historically the “is this a tuple” check made the assumption that any sensible tuple would be immutable. As it happens, ValueTuple<...> is implemented as fully mutable (with mutable public fields). By softening that check a little bit, protobuf-net now works with any such tuple it sees.

dapper

This is an ADO.NET helper utility, that makes it really really simple to correctly execute parameterized commands and queries, and populate records from the results. In terms of the latter, when executing Query<T>(...), it executes the ADO.NET query, then maps the results by name into the fields and properties of T, one instance of T per row received. As we’ve discovered before, we are never going to able to respect the caller’s names when using C# 7 tuples, and it is unlikely that people will conveniently call their columns Item1, Item2, etc. That means that it probably isn’t going to be useful to use value tuples with domain entity objects, but… if you’re using value tuples for your domain entity objects you already need a stiff talking to.

However! There is another very common query scenario with dapper; ad-hoc local queries that get trivial data. In a lot of these cases, it really isn’t worth going to the trouble of declaring a custom type to receive the results, so dapper supports dynamic as a “meh, it’ll work” helper:

// query dapper via "dynamic"
int id = ...
var row = conn.QuerySingle("select Name, LastUpdated from Records where Id=@id", new {id});
string name = row.Name; // "dynamic"
DateTime lastUpdated = row.LastUpdated;
// use name and lastUpdated

This works, but: it involves the DLR (relatively expensive compared to direct C#), and has a few other overheads. What we can do is change dapper to recognise that names aren’t useful in ValueTuple<...>, but rely instead on the column position:

// query dapper via C# 7 tuples
int id = ...
var row = conn.QuerySingle<(string name, DateTime lastUpdated)>(
     "select Name, LastUpdated from Records where Id=@id", new {id});
// use row.name and row.lastUpdated

That’s actually pretty handy for a wide range of scenarios where we’re going to consume the data immediately adjacent to the query. And because they are value types we don’t even pay heap allocation costs. I wouldn’t use it for populating real domain objects, though.

dapper: what about parameters?

you’ll notice that we also have a new {id} anonymous type usage to pass parameters into dapper. There are a number of reasons that I’ve left this alone and have made no attempt to support value tuples:

  • in ADO.NET, most providers support names, and some important providers (SqlClient for SQL Server for example) do not support positional parameters; while we could in theory expose a positional parameter syntax in the API, the use would be somewhat limited
  • at the moment, the parameters are passed as object; if we pass a C# tuple it would be boxed, which means an allocation - and at that point, we might as well have just used an anonymous type object (which would have conveyed names)
  • if you’re thinking “make the method generic so the parameter is passed as T to avoid boxing”, the problem is that the method is already generic in the return type, and in C# when calling a generic method you can either supply all the generic types explicitly, or let the compiler infer all of them; there is no middle ground. Making the parameters a generic would be inconvenient in all cases, and impossible in others - we cannot name the T used in code that uses anonymous types, precisely because it is anonymous. There are hacks around this, but: nothing great

All in all, I’m happy to just say “use anonymous types for the parameters; use tuples for output rows” - it is a pragmatic compromise that covers the huge majority of useful real-world scenarios. I have added code to detect ValueTuple<...> being used in the parameters and raise an appropriate exception guiding the user on the problem an how to fix it.

Summary

We’ve explored C# 7 tuples and their implementation. Armed with the implementation details, we’ve explored what that means for code that consumes C# 7 tuples - both upstream and downstream. We’ve discussed the limitations for use with libary code, but we’ve also seen that they still have plenty of uses with library code in some scenarios. If I’ve missed anything, let me know on twitter!

Addendum

After publishing, Daniel Crabtree very rightly notes that the Item* fields only go up to Item7, and that additional fields are stored in the Rest field which can itself be a tuple. You can read more about that in his post here.