Spans and `ref` part 2 : spans

In part 1, we looked at ref locals and ref return, and hinted at a connection to “spans”; this time we’re going to take a deeper look at what this connection might be, and how we can make use of it.

Disclaimer

I’m mostly on the outside of this - looking in at the public artefacts, playing with the API etc - maybe the odd PR or issue report. It is entirely possible that I’ve misunderstood some things, and it is possible that things will change between now and general availability.

What are spans?

By spans, I mean System.Span<T>, which is part of .NET Core, living in the System.Memory assembly. It is also available for the full .NET Framework via the System.Memory package. But please note: it is a loaded gun to use at the moment - you can currently compile code that has undefined behavior, and which may not compile at some point in the future. Although to be fair, to get into any of the terrible scenarios you need to use the unsafe keyword, at which point you have already said “I take full responsibility for everything that goes wrong here”. I’ll discuss this more below, but I wanted to mention that at the top in case you stop reading and don’t get to that important point.

Note that some of the code in this post uses unreleased features; I’m using:

<PackageReference Include="System.Memory"
    Version="4.4.0-preview1-25219-04" />
<PackageReference Include="System.Runtime.CompilerServices.Unsafe"
    Version="4.4.0-preview1-25219-04" />

Obviously all bets are off with preview code; things may change.

Why do spans need to exist?

We saw previously how ref T can be used similarly to pointers (T*) to represent a reference to a single value. Basically, anything that allows us to talk about complex scenarios without needing pointers is a good thing. But: representing a single value is not the only use-case of pointers. The much more common scenario for pointers is for talking about a range of contiguous data, usually when paired with a count of the elements.

At the most basic level, a Span<T> represents a strongly typed contiguous chunk of elements of type T with a known and enforced length. In many ways, it is very comparable to an array (T[]) or segment (ArraySegment<T>) - but… more. Spans also provide safe (by which I mean: not unsafe in the C# sense) access to features that would previously have required pointers (T*).

I’m probably missing a few things here, but the most immediate features are:

provide a unified type system over all contiguous memory, including: arrays, unmanaged pointers, stack pointers, fixed / pinned pointers to managed data, and references into the interior of values
allow type coercion for primitives and value-types
work with generics (unlike pointers, which don’t)
respect garbage collection (GC) semantics by using references instead of pointers (the GC only walks references)

Now: if none of the above sounds like things you ever need to do, then great: you probably won’t ever need to use Span<T> - and that’s perfectly OK. Most application code will never need to use these features. Ultimately, these tools are designed for lower level code (usually: library code) that is performance critical. That said, there are some great uses in regular code, that we’ll get onto.

But… what is a span?

OK, OK. Conceptually, a Span<T> can be thought of as a reference and a length:

public struct Span<T> {
    ref T _reference;
    int _length;
    public ref T this[int index] { get {...} }
}

with a cousin:

public struct ReadOnlySpan<T> {
    ref T _reference;
    int _length;
    public T this[int index] { get {...} }
}

You would be perfectly correct to complain “but… but… in the last part you said no ref fields!”. That’s fair, but I did say conceptually. At least… for now!

Spans as ranges of an array

As a completely trivial (and rather pointless) example, we can see how we can use a Span<T> very similarly to how we might have used a T[]:

void ArrayExample() {
    byte[] data = new byte[1024];
    // not shown: populate data
    ProcessData(data);
}
void ProcessData(Span<byte> span) {
    for (int i = 0; i < span.Length; i++) {
        DoSomething(span[i]);
    }
}

Here we implicitly convert the byte[] to Span<byte> when calling the method, but at this point you would still be justified in being underwhelmed - we could have done everything here with just an array.

Similarly, we can talk about just a portion of the array:

void ArrayExample() {
    byte[] data = new byte[1024];
    // not shown: populate data
    ProcessData(new Span<byte>(data, 10, 512));
}
void ProcessData(Span<byte> span) {
    for (int i = 0; i < span.Length; i++) {
        DoSomething(span[i]);
    }
}

And again you could observe that we could have just used ArraySegment<T>. Actually, let’s be realistic: very few people use ArraySegment<T> - but we could have just passed int offset and int count as additional parameters, it would have worked fine. But I mentioned pointers earlier…

Spans as ranges of pointers

The second way we can use Span<T> is over a pointer; which could be any of:

a stackalloc pointer for a small value that we want to work on without allocating an array
a managed array that we previously fixed
a managed array that we previously pinned with GCHandle.Alloc
a fixed-sized buffer that we previously fixed
the contents of a string that we previously fixed
a coerced pointer from any of the above (I’ll explain what this means below)
a chunk of unmanaged memory obtained with Marshal.AllocHGlobal or any other unmanaged memory API
etc

All of these will necessarily involve unsafe, but: we’ll tread carefully! Let’s have a look at a stackalloc example (stackalloc is where you obtain a chunk of data directly on the call-stack):

void StackAllocExample() {
    unsafe {
        byte* data = stackalloc byte[128];
        var span = new Span<byte>(data, 128);
        // not shown: populate data / span
        ProcessData(span);
    }
}
void ProcessData(Span<byte> span) {
    for (int i = 0; i < span.Length; i++) {
        DoSomething(span[i]);
    }
}

That’s… actually pretty huge! We just used the exact same processing code to handle an array and a pointer, and we didn’t need to use unsafe (except in the code that initially obtained the pointer). This opens up a huge range of possibilities, especially for things like network IO and serialization. Even better, it means that we can do all of the above with a “zero copy” mentality: rather than having managed code writing to a byte[] that later gets copied to some unmanaged chunk (for whatever IO we need), we can write directly to the unmanaged memory via a Span<T>.

Slice and dice

A very common scenario when working with buffers and buffer segments is the need to sub-divide the buffer. Span<T> makes this easy via the Slice() method, best illustrated by an example:

void ProcessData(Span<byte> span) {
    while(span.Length > 0) {
        // first byte is single-byte length-prefix
        int len = span[0];

        // process the next "len" bytes
        ProcessChunk(span.Slice(1, len));

        // move forward len+1 bytes
        span = span.Slice(len + 1);
    }
}

We could do this in other ways, but it is very convenient here. Importantly, we haven’t allocated anything here - there’s no “new array” or similar - we just have a reference to a different part of the existing range, and / or a different length.
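To make the slicing loop above concrete, here is a minimal runnable sketch. The names LengthPrefixDemo and SumChunks are hypothetical, and summing each chunk is just a stand-in for whatever ProcessChunk would really do; it also adds a truncation check that the snippet above leaves out.

```csharp
using System;
using System.Collections.Generic;

static class LengthPrefixDemo
{
    // Walks a buffer of [len][payload...] records and returns the sum
    // of each payload (a stand-in for real per-chunk processing).
    public static List<int> SumChunks(Span<byte> span)
    {
        var sums = new List<int>();
        while (span.Length > 0)
        {
            // first byte is a single-byte length-prefix
            int len = span[0];
            if (len > span.Length - 1)
                throw new ArgumentException("Truncated chunk.");

            // a view over the next "len" bytes; nothing is copied
            Span<byte> chunk = span.Slice(1, len);
            int sum = 0;
            for (int i = 0; i < chunk.Length; i++) sum += chunk[i];
            sums.Add(sum);

            // move forward len+1 bytes
            span = span.Slice(len + 1);
        }
        return sums;
    }

    static void Main()
    {
        byte[] data = { 2, 10, 20, 0, 3, 1, 2, 3 };
        Console.WriteLine(string.Join(",", SumChunks(data))); // 30,0,6
    }
}
```

The only allocation here is the List<int> used to report results; every chunk is just a different window onto the same array.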

Coercion

A more interesting example is coercion; this is something that you can do with pointers, but is very hard to do with arrays. A classic scenario here would be IO / serialization: you have a chunk of bytes, and at some point in that data you need to treat the data as fixed-size int, float, double, etc data. In the world of pointers, you just… do that:

byte* raw = ...
float* floats = (float*)raw;
float x = floats[0], y = floats[1]; // consume 8 bytes

With arrays, there is no direct way to do this; you’d either need to use unsafe hacks, or you can use BitConverter if the types you need are supported. But this is easy with Span<T>:

Span<byte> raw = ...
var floats = raw.NonPortableCast<byte, float>();
float x = floats[0], y = floats[1]; // consume 8 bytes

Not only can we do it, but we have the added advantage that it has correctly tracked the end range for us during the conversion - we will find that floats.Length is equal to raw.Length / 4 (since each float requires 4 bytes). The important thing to realise here is that we haven’t copied any data - we’re still looking at the exact same place in memory - but instead of treating it as a ref byte, we’re treating it as a ref float.
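Here is a small runnable sketch of the same coercion. NonPortableCast is the name in the preview package I'm using; this sketch instead assumes the MemoryMarshal.Cast<TFrom, TTo> method from System.Runtime.InteropServices, which is where the equivalent operation lives in released versions of System.Memory. CoercionDemo and AsFloats are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;

static class CoercionDemo
{
    // Reinterprets the bytes as floats without copying anything.
    // Assumption: MemoryMarshal.Cast stands in for the preview's NonPortableCast.
    public static Span<float> AsFloats(Span<byte> raw)
        => MemoryMarshal.Cast<byte, float>(raw);

    static void Main()
    {
        byte[] raw = new byte[10];          // room for 2 floats plus 2 spare bytes
        Span<float> floats = AsFloats(raw);
        Console.WriteLine(floats.Length);   // 2 (10 / 4, rounded down)

        floats[0] = 1.5f;                   // writes straight through to the byte[]
        Console.WriteLine(BitConverter.ToSingle(raw, 0) == 1.5f); // True
    }
}
```

Note how the two spare bytes simply fall off the end of the float view: the length tracking is done for us.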

Except… better!

We observed that with pointers we could coerce from byte* to float*. That’s fine, but you can’t use pointers with all types. Span<T> has much stronger support here. A particularly interesting illustration is SIMD, which is exposed in .NET via Vector<T>. A vexing limitation of pointers is that we cannot talk about a Vector<float>* pointer (for example). This means that we can’t use pointer coercion as a convenient way of reading and writing SIMD vectors (you’ll usually have to use Unsafe.Read<T> and Unsafe.Write<T> instead). But we can coerce directly to Vector<T> from a span! Here’s an example that might come up in things like applying the web-sockets xor mask to a received frame’s payload:

void ApplyXor(Span<byte> span, uint mask) {
    if(Vector.IsHardwareAccelerated) {
        // apply the mask to SIMD-width bytes at a time
        var vectorMask = new Vector<uint>(mask);
        var typed = span.NonPortableCast<byte, Vector<uint>>();
        for (int i = 0; i < typed.Length; i++) {
            typed[i] ^= vectorMask;
        }
        // move past that data (might be a few bytes left);
        // each vector covers Vector<byte>.Count bytes
        span = span.Slice(Vector<byte>.Count * typed.Length);
    }
    // not shown - finish any remaining data 
}

That’s pretty minimal code for vectorizing something; it is especially nice that we didn’t even need to do the math to figure out the vectorizable range - typed.Length did everything we wanted. It is too early for me to know for sure, but I’m hopeful that these 0-Span<T>.Length loops will elide the bounds check in the same way that array access from 0-T[].Length does.
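For completeness, here is a sketch of the whole method including the "not shown" tail. It assumes MemoryMarshal.Cast (from released System.Memory) in place of the preview's NonPortableCast, advances by Vector<byte>.Count bytes per vector (the number of bytes each Vector<uint> occupies), and finishes the leftovers one byte at a time. XorDemo is a hypothetical name, and the byte-wise tail is just one simple way to finish.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

static class XorDemo
{
    public static void ApplyXor(Span<byte> span, uint mask)
    {
        if (Vector.IsHardwareAccelerated)
        {
            // apply the mask to SIMD-width bytes at a time
            var vectorMask = new Vector<uint>(mask);
            // Assumption: MemoryMarshal.Cast replaces the preview's NonPortableCast.
            Span<Vector<uint>> typed = MemoryMarshal.Cast<byte, Vector<uint>>(span);
            for (int i = 0; i < typed.Length; i++)
            {
                typed[i] ^= vectorMask;
            }
            // each vector covers Vector<byte>.Count bytes
            span = span.Slice(Vector<byte>.Count * typed.Length);
        }

        // Finish the tail byte-by-byte, using the mask's in-memory byte order.
        // The vector width is a multiple of 4 bytes, so the tail starts at mask byte 0.
        // (This allocates a tiny array; fine for a sketch.)
        byte[] maskBytes = BitConverter.GetBytes(mask);
        for (int i = 0; i < span.Length; i++)
        {
            span[i] ^= maskBytes[i & 3];
        }
    }

    static void Main()
    {
        byte[] data = new byte[100];
        for (int i = 0; i < data.Length; i++) data[i] = (byte)i;
        ApplyXor(data, 0xA1B2C3D4);
        ApplyXor(data, 0xA1B2C3D4);   // XOR twice restores the input
        Console.WriteLine(data[99]);  // 99
    }
}
```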

And readonly too!

Pointers are notoriously permissive; if you have a pointer: you can do anything. You can use fixed to obtain the char* pointer inside a string: if you change the data via the pointer, the string now has different contents. string is not immutable if you allow unsafe: nothing is immutable if you allow unsafe. But just as we can obtain a Span<T>, we can also get a ReadOnlySpan<T>. If you only expect a method to read the data, you can give them a ReadOnlySpan<T>.

Zero-cost substrings

In the “corefxlab” preview code, there’s a method-group with signatures along the lines of:

public static ReadOnlySpan<char> Slice(this string text, ...)

(where the overloads allow an initial range to be specified). This gives us a ReadOnlySpan<char> that points directly at a range inside the string. If we want a substring, we can just Slice() again and again - with zero allocations and zero string copying - we just have different spans over the same data. A rich set of APIs already exists in the corefxlab code for working with this type of string-like data. If you do a lot of text processing, this could have some really interesting aspects.
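As a rough illustration of allocation-free substring work, here is a sketch that assumes the AsSpan() extension method on string from released versions of System.Memory, rather than the preview Slice(this string ...) overloads mentioned above. SubstringDemo and KeyOf are hypothetical names.

```csharp
using System;

static class SubstringDemo
{
    // Returns the text before the first ':' as a view over the same characters;
    // no new string is created.
    public static ReadOnlySpan<char> KeyOf(ReadOnlySpan<char> line)
    {
        int colon = line.IndexOf(':');
        return colon < 0 ? line : line.Slice(0, colon);
    }

    static void Main()
    {
        // Assumption: AsSpan() stands in for the preview's string Slice() extension.
        ReadOnlySpan<char> line = "Content-Length: 42".AsSpan();
        ReadOnlySpan<char> key = KeyOf(line);
        Console.WriteLine(key.Length);      // 14
        Console.WriteLine(key.ToString());  // Content-Length (a string is allocated only here)
    }
}
```

Because the span is read-only, an attempt like key[0] = 'x' simply won't compile, which is exactly the protection a string needs.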

This all sounds too good to be true - what’s the catch?

Here’s the gotcha: in order to have the appropriate correctness guarantees when discussing something that could be a managed object, could be data on the stack, or could be unmanaged data, we run into problems very similar to those that make it impossible to store a ref T local as a field. Remember that a Span<T> is conceptually a ref T (reference) and int (length) - well: we still need to obey the rules imposed by that “conceptually”. For a trivial example of how we can get in a mess, we can tweak our stackalloc example:

private Span<byte> _span;
unsafe void StackAllocExample() {
    byte* data = stackalloc byte[128];
    _span = new Span<byte>(data, 128);
    ...
}
void SomeWhileLater() {
    ProcessData(_span);
}

Where does _span refer to in SomeWhileLater? I can’t tell you. We get into similar problems with anything that used fixed to get a pointer - the pointer is only guaranteed to make sense inside the fixed. Conceptually the issue is not restricted to pointers - it would apply equally if we could initialize Span<T> directly with a ref T constructor:

private Span<SomeStruct> _span;
void StackRefExample() {
    var val = new SomeStruct(123, 456);
    _span = new Span<SomeStruct>(ref val);
    // ^^^ hypothetical span of length 1
}

We didn’t even need unsafe to break things this time. No such constructor currently exists, very wisely!

We should be OK if we only ever use managed heap objects (arrays, etc) to initialize a Span<T>, but the entire point of Span<T> is to provide feature parity between things like arrays and pointers while making it hard to shoot yourself in the foot.

In addition to this, we also need to worry about atomicity. The runtime and language guarantee that a single reference can be read atomically (in one CPU instruction), but it makes no guarantees about anything larger. If we have a reference and a length, we start getting into very complex issues around “torn” values (an invalid pair of the reference and length that didn’t actually exist, due to two threads squabbling). A torn value is vexing at the best of times, but in this case it would lead to valid-looking code accessing unexpected memory - a very bad thing.

The stackalloc example above is a perfect example of code that will compile without complaint today, but will end very very badly - although we used unsafe, so: self-inflicted. But this and the atomicity issue are both illustrations of why we have…

The Important Big Rule Of Spans

Span<T> has undefined behavior off the stack. And in the future: may not be allowed off the stack at all - this means no fields, no arrays, no boxing, etc. In the same way that ref T only has defined behavior on the stack (locals, parameters, return values) - so Span<T> only has defined behavior on the stack. You are not meant to ever put a Span<T> in a field (including all those times when things look like locals but are actually fields, that I touched on last time). An immediate consequence of this is that atomicity is no longer an issue: each stack is specific to a single thread; if our value can’t escape the stack, then two threads can’t have competing reads and writes.

There’s some in-progress discussion on how the rules for this requirement should work, but it looks like the concept of a “ref-like” stack-only type is being introduced. ref T as a field would be ref-like, and Span<T> would be ref-like. Any ref-like type would only be valid directly on the stack, or as an instance field (not a static field) on a ref-like type. If I had to speculate at syntax, I’d expect this to look something like:

public ref struct Span<T> {
    ref T _reference;
    int _length;
    public ref T this[int index] { get {...} }
}

Emphasis: this syntax is pure speculation based on the historic reluctance to introduce new keywords, but the ref struct here denotes a ref-like type. It could also be done via attributes or a range of other ideas, but note that we’re now allowed to embed the ref-like ref T field. Additionally, the compiler and runtime would verify that Span<T> is never used illegally as a field or in an array etc. Notionally, we could also do this for our own types that shouldn’t escape the stack, if we have similar semantics but Span<T> doesn’t represent our scenario.
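To show what "our own types that shouldn’t escape the stack" might look like, here is a sketch that assumes a compiler which accepts the ref struct form (C# 7.2 and later do). ByteReader and RefStructDemo are hypothetical names; the point is only that a Span<byte> instance field is permitted because the containing type is itself ref-like.

```csharp
using System;

// Hypothetical stack-only reader. Holding a Span<byte> field is only legal
// because this type is itself declared as a ref struct
// (assumption: a compiler with ref struct support).
ref struct ByteReader
{
    private Span<byte> _remaining;

    public ByteReader(Span<byte> data) { _remaining = data; }

    // Reads one byte and advances; returns false when nothing is left.
    public bool TryReadByte(out byte value)
    {
        if (_remaining.Length == 0) { value = 0; return false; }
        value = _remaining[0];
        _remaining = _remaining.Slice(1);
        return true;
    }
}

static class RefStructDemo
{
    public static int Sum(byte[] data)
    {
        var reader = new ByteReader(data); // a local: fine
        int total = 0;
        while (reader.TryReadByte(out byte b)) total += b;
        return total;
    }

    static void Main()
    {
        Console.WriteLine(Sum(new byte[] { 1, 2, 3 })); // 6
    }
}
```

Trying to declare a ByteReader field on an ordinary class, or to box it, is rejected by such a compiler, which is the enforcement described above.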

Thinking back to the StackRefExample, if we wanted to safely support usage like:

var val = new SomeStruct(123, 456);
var span = new Span<SomeStruct>(ref val); // local, not field

then presumably it could work, but we’d have to have similar logic about returning ref-like types as currently exists for ref return, further complicated by the fact that we don’t have the single-assignment guarantee - we can reassign a Span<T>. If ref-like types work in the general case, then the logic about passing and returning such a value needs ironing out. And that’s complex. I’m very happy to defer to Vladimir Sadov on this!

EDIT: to clarify - it is only the pair of ref T and length (together known as a span, Span<T> or ReadOnlySpan<T>) that need to stay on the stack; the memory that we're spanning can be anywhere - and will often be part of a regular array (T[]) on the managed heap. It could also be a reference to the unmanaged heap, or to a separate part of the current stack.

So how am I meant to work with spans?

Sure, not everything is on the stack.

This isn’t as much of a limitation as it sounds. Instead of storing the Span<T> itself, you just need to store something that can manifest a span. For example, if you’re actually using arrays you might have a type that contains an ArraySegment<T>, but which has a property:

public Span<T> Span { get { ... } }

As long as you can switch into Span<T> mode when you’re inside an appropriate method, all is good.
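A minimal sketch of that "store something that can manifest a span" idea, using an ArraySegment<T> as the stored state. BufferHolder<T> is a hypothetical type; the Span<T>(T[] array, int start, int length) constructor is the same one used in the array examples earlier.

```csharp
using System;

// Hypothetical holder: safe to keep in a field or on the heap because it
// stores an ArraySegment<T>, never the Span<T> itself.
sealed class BufferHolder<T>
{
    private readonly ArraySegment<T> _segment;

    public BufferHolder(T[] array, int offset, int count)
    {
        _segment = new ArraySegment<T>(array, offset, count);
    }

    // Manifest the span on demand; the span itself only ever lives on the stack.
    public Span<T> Span
        => new Span<T>(_segment.Array, _segment.Offset, _segment.Count);
}

static class HolderDemo
{
    static void Main()
    {
        var array = new int[10];
        var holder = new BufferHolder<int>(array, 2, 3);

        Span<int> span = holder.Span;
        span[0] = 42;                    // element 0 of the span is array[2]

        Console.WriteLine(span.Length);  // 3
        Console.WriteLine(array[2]);     // 42
    }
}
```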

For a more unified model, the corefxlab code contains the Buffer<T> concept, but it is still very much a work in progress. We’ll have to see how it shakes out in time.

Wait… why so much `ref` previously?

We covered a lot of ref details - you might feel cheated. Well, partly we needed that information to understand the stack-only semantics of Span<T>. But there’s more! Span<T> also exposes the ref T directly via the aptly named DangerousGetPinnableReference() method. This is a ref return, and allows us to do any of:

store the ref return into a ref local and work with it
pass the ref return as a ref or out parameter to another method
use fixed to convert the ref to a pointer (preventing GC movement at the same time)

The latter option means that not only can we get from unsafe to Span<T>, but we can go the other direction if we need:

fixed(byte* ptr = &span.DangerousGetPinnableReference())
{ ... }

If I can get a `ref`, can I escape the bounds?

The DangerousGetPinnableReference() method gives us back a ref to the start of the range, comparable to how a T* pointer refers to the start of a range in pointer terms. So: can we use this to get around the range constraints? Well… yes… ish:

ref int somewhere = ref Unsafe.Add(
    ref span.DangerousGetPinnableReference(), 5000);

This cheeky duo gives us a reference to whatever is 5000-integers ahead of the span we were thinking of. It might still be part of our data (if we have a large array, for example), or it might be something completely random. But the sharp eyed might have noticed some key words in that expression… “Unsafe...” and “Dangerous...”. If you keep sprinting past signs with words like that on: expect to hit rocks. There’s nothing here that you couldn’t already do with unsafe code, note.

Doing crazy things with unmanaged memory

Sometimes you need to use unmanaged memory - this could be because of memory / collection issues, or could be because of interfacing with unmanaged systems - I use it in CUDA work, for example, where the CUDA driver has to allocate the memory in a special way to get optimal performance. Historically, working with unmanaged memory is hard - you will be using pointers all the time. But we can simplify everything by using spans. Here’s our dummy type that we will store in unmanaged memory:

// could be explicit layout to match external definition
struct SomeType
{
    public SomeType(int id, DateTime creationDate)
    {
        Id = id;
        _creationDate = creationDate.ToEpochTime();
        // ...
    }
    public int Id { get; }
    private long _creationDate;
    public DateTime CreationDate => _creationDate.FromEpochTime();
    // ...
    public override string ToString()
        => $"{Id}: {CreationDate}, ...";
}

We’ll need to allocate some memory and ensure it is released, usually via a finalizer in a wrapper class:

unsafe class UnmanagedStuff : IDisposable
{
    private SomeType* ptr;
    public UnmanagedStuff(int count)
    {
        ptr = (SomeType*) Marshal.AllocHGlobal(
            sizeof(SomeType) * count).ToPointer();
    }
    ~UnmanagedStuff() { Dispose(false); }
    public void Dispose() => Dispose(true);
    private void Dispose(bool disposing)
    {
        if(disposing) GC.SuppressFinalize(this);
        var ip = new IntPtr(ptr);
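        // memory from AllocHGlobal must be freed with FreeHGlobal
        // (Marshal.Release is for COM interface pointers)
        if (ip != IntPtr.Zero) Marshal.FreeHGlobal(ip);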
        ptr = default(SomeType*);
    }
}

The wrapper type needs to know about the pointers, so is going to be unsafe - but does the rest of the code need to? Sure, we could add an indexer that uses Unsafe.Read / Unsafe.Write to access individual elements, but that means copying the data constantly, which is probably not what we want - and it doesn’t help us represent ranges. But spans do: we can return a span of the data (perhaps via a Slice() API):

public Span<SomeType> Slice(int offset, int count)
    => new Span<SomeType>(ptr + offset, count);
// ^^^ not shown: validate range first

And we can consume this pretty naturally without unsafe:

// "stuff" is our UnmanagedStuff object
// easily talk about a slice of unmanaged data
var slice = stuff.Slice(5, 10);
slice[0] = new SomeType(123, DateTime.Now);                

// (separate slices work)
slice = stuff.Slice(0, 25);
Console.WriteLine(slice[5]); // 123: 23/04/2017 09:09:51, ...

If we want to talk about individual elements (rather than a range), then a ref local (via a ref return) is what we want; we could use the DangerousGetPinnableReference() API on a Span<T> for this, but in this case it is probably easier just to use Unsafe directly:

public ref SomeType this[int index]
    => ref Unsafe.AsRef<SomeType>(ptr + index);
// ^^^ not shown: validate range first

We can consume this with similar ease:

// talk about a *reference* to unmanaged data
ref SomeType item = ref stuff[5];
Console.WriteLine(item); // 123: 23/04/2017 09:09:51, ...
item = new SomeType(42, new DateTime(2016, 1, 8));

// prove that updated *inside* the slice
Console.WriteLine(slice[5]); // 42: 08/01/2016 00:00:00, ...

And now from any code, we can talk directly to the unmanaged memory simply by passing it in as a ref parameter - it will never be copied, just dereferenced. If you want to talk about an isolated copy or store a copy as a field, then you can dereference, but that is easy:

SomeType isolated = item;

If you’ve ever worked with unmanaged memory from C#, this is a huge difference - and opens up a whole range of interesting scenarios for allocation-free systems without requiring the entire codebase to be unsafe. For context, in an allocation-free system, the lifetime of a set of data is strictly defined by some unit of work - processing an inbound request, for example. This means we don’t need reference tracking and garbage collection (and GC pauses can hurt high performance systems), so instead we simply take some slabs of memory, work from them (incrementing counters as we consume space), and then when we’ve finished the request we just set all the counters back to zero and we’re ready for the next request, no mess. Spans and ref locals and ref return make this friendly, even in the unmanaged memory scenario. The only caveat being - once again: Span<T> and ref T cannot legally escape the stack. But as we’ve seen, we can expose on-demand a Span<T> or ref T - so it isn’t a burden.
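The "slabs and counters" idea can be sketched in a few lines. To keep the sketch free of unsafe, this hypothetical Slab type is backed by a managed byte[] rather than unmanaged memory, but the shape is the same: hand out spans by bumping a counter, then reset the counter when the unit of work ends.

```csharp
using System;

// Hypothetical per-request slab. Assumption: a managed byte[] stands in for
// the unmanaged block so that no unsafe code is needed.
sealed class Slab
{
    private readonly byte[] _buffer;
    private int _used;

    public Slab(int capacity) { _buffer = new byte[capacity]; }

    public int Used => _used;

    // Hands out the next "count" bytes by incrementing a counter;
    // no per-call allocation.
    public Span<byte> Take(int count)
    {
        if (count < 0 || count > _buffer.Length - _used)
            throw new InvalidOperationException("Slab exhausted.");
        var span = new Span<byte>(_buffer, _used, count);
        _used += count;
        return span;
    }

    // End of the unit of work: everything handed out is considered dead.
    public void Reset() => _used = 0;
}

static class SlabDemo
{
    static void Main()
    {
        var slab = new Slab(1024);
        Span<byte> a = slab.Take(16);
        Span<byte> b = slab.Take(32);
        a[0] = 1;
        b[0] = 2;
        Console.WriteLine(slab.Used); // 48

        slab.Reset();                 // ready for the next request
        Console.WriteLine(slab.Used); // 0
    }
}
```

Nothing stops a caller from using a span after Reset(), so the discipline of "spans don’t outlive the request" is still on us; keeping them on the stack makes that much easier to get right.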

Summary

Spans; they’re very powerful if you need that kind of thing. And they force a range of new concepts into C#, giving us all the combined strong points of arrays, pointers, references and generics - with very few of the pain points. If you don’t care about pointers, buffers, etc - you probably won’t need to learn about spans. But if you do, they’re awesome. The amount of effort the .NET folks (and the community, but mostly Microsoft) have made making this span concept so rich and powerful is huge - it impacts the compiler, the JIT, the runtime, and multiple libraries both pre-existing and brand new. And it impacts both .NET and .NET Core. As someone who works a lot in the areas affected by spans and ref - it is also hugely appreciated. Good things are coming.

Code, code and more code.

Sunday, 23 April 2017