Monday, 18 May 2020

Multi-path cancellation; a tale of two codependent async enumerators

Disclaimer: I'll be honest: many of the concepts in this post are a bit more advanced - some viewer caution is advised! It touches on concurrent linked async enumerators that share a termination condition by combining multiple CancellationTokens.


Something that I've been looking at recently - in the context of gRPC (and protobuf-net.Grpc in particular) - is the complex story of duplex data pipes. A full-duplex connection is a connection between two nodes, but instead of being request-response, either node can send messages at any time. There's still a notional "client" and "server", but that is purely a feature of which node was sat listening for connection attempts vs which node reached out and established a connection. Shaping a duplex API is much more complex than shaping a request-response API, and frankly: a lot of the details around timing are hard.

So: I had the idea that maybe we can reshape everything at the library level, and offer the consumer something more familiar. It makes an interesting (to me, at least) worked example of cancellation in practice. So; let's start with an imaginary transport API (the thing that is happening underneath) - let's say that we have:

  • a client establishes a connection (we're not going to worry about how)
  • there is a SendAsync method that sends a message from the client to the server
  • there is a TryReceiveAsync method that awaits a message from the server (reporting true if a message was fetched, and false if the server has indicated that it won't ever be sending any more)
  • additionally, the server controls data flow termination; if the server indicates that it has sent the last message, the client should not send any more

something like (where TRequest is the data-type being sent from the client to the server, and TResponse is the data-type expected from the server to the client):

interface ITransport<TRequest, TResponse> : IAsyncDisposable
{
    ValueTask SendAsync(TRequest request,
        CancellationToken cancellationToken);

    ValueTask<(bool Success, TResponse Message)> TryReceiveAsync(
        CancellationToken cancellationToken);
}

This API doesn't look all that complicated - it looks like (if we ignore connection etc for the moment) we can just create a couple of loops, and expose the data via enumerators - presumably starting the SendAsync via Task.Run or similar so it is on a parallel flow:

ITransport<TRequest, TResponse> transport;
public async IAsyncEnumerable<TResponse> ReceiveAsync(
    [EnumeratorCancellation] CancellationToken cancellationToken)
{
    while (true)
    {
        var (success, message) =
            await transport.TryReceiveAsync(cancellationToken);
        if (!success) break;
        yield return message;
    }
}

public async ValueTask SendAsync(
    IAsyncEnumerable<TRequest> data,
    CancellationToken cancellationToken)
{
    await foreach (var message in data
        .WithCancellation(cancellationToken))
    {
        await transport.SendAsync(message, cancellationToken);
    }
}

and it looks like we're all set for cancellation - we can pass in an external cancellation-token to both methods, and we're set. Right?

Well, it is a bit more complex than that, and the above doesn't take into consideration that these two flows are codependent. In particular, a big concern is that we don't want to leave the producer (the thing pumping SendAsync) still running in any scenario where the connection is doomed. There are actually many more cancellation paths than we might think:

  1. we might have supplied an external cancellation-token to both methods, and this token may have triggered
  2. the consumer of ReceiveAsync (the thing iterating it) might have supplied a cancellation-token to GetAsyncEnumerator (via WithCancellation), and this token may have been triggered (we looked at this last time)
  3. we could have faulted in our send/receive code
  4. the consumer of ReceiveAsync may have decided not to take all the data - that might be because of some async simile of Enumerable.Take(), or it could be because they faulted when processing a message they had received
  5. the producer in SendAsync may have faulted

All of these scenarios essentially signify termination of the connection, so we need to be able to encompass all of these scenarios in some way that allows us to communicate the problem between the send and receive path. In a word, we want our own CancellationTokenSource.

There's a lot going on here; more than we can reasonably expect consumers to do each and every time they use the API, so this is a perfect scenario for a library method. Let's imagine that we want to encompass all this complexity in a simple single library API that the consumer can access - something like:

public IAsyncEnumerable<TResponse> Duplex(
    IAsyncEnumerable<TRequest> request,
    CancellationToken cancellationToken = default);

This:

  • allows them to pass in a producer
  • optionally allows them to pass in an external cancellation-token
  • makes an async feed of responses available to them

Their usage might be something like:

await foreach (MyResponse item in client.Duplex(ProducerAsync()))
{
    Console.WriteLine(item);
}

where their ProducerAsync() method is (just "because"):

async IAsyncEnumerable<MyRequest> ProducerAsync(
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    for (int i = 0; i < 100; i++)
    {
        yield return new MyRequest(i);
        await Task.Delay(100, cancellationToken);
    }
}

As I discussed in The anatomy of async iterators (aka await, foreach, yield), our call to ProducerAsync() doesn't actually do much yet - it just hands us a place-holder that can be enumerated later, and it is the act of enumerating it that actually invokes the code. Very important point, that.

So; what can our Duplex code do? It already needs to think about at least 2 different kinds of cancellation:

  • the external token that was passed into cancellationToken
  • the potentially different token that could be passed into GetAsyncEnumerator() when it is consumed

but we know from our thoughts earlier that we also have a bunch of other ways of cancelling. We can do something clever here. Recall how the compiler usually combines the above two tokens for us? Well, if we do that ourselves, then instead of getting just a CancellationToken, we find ourselves with a CancellationTokenSource, which gives us lots of control:

public IAsyncEnumerable<TResponse> Duplex(
    IAsyncEnumerable<TRequest> request,
    CancellationToken cancellationToken = default)
    => DuplexImpl(transport, request, cancellationToken);

private async static IAsyncEnumerable<TResponse> DuplexImpl(
    ITransport<TRequest, TResponse> transport,
    IAsyncEnumerable<TRequest> request,
    CancellationToken externalToken,
    [EnumeratorCancellation] CancellationToken enumeratorToken = default)
{
    using var allDone = CancellationTokenSource.CreateLinkedTokenSource(
            externalToken, enumeratorToken);
    // ... todo
}

Our DuplexImpl method here allows the enumerator cancellation to be provided, but (importantly) kept separate from the original external token; this means that it won't yet be combined, and we can do that ourselves using CancellationTokenSource.CreateLinkedTokenSource - much like the compiler would have done for us, but: now we have a CancellationTokenSource that we can cancel when we choose. This means that we can use allDone.Token in all the places we want to ask "are we done yet?", and we're considering everything.

For starters, let's handle the scenario where the consumer doesn't take all the data (out of choice, or because of a fault). We want to trigger allDone however we exit DuplexImpl. Fortunately, the way that iterator blocks are implemented makes this simple (and we're already using it here, via using): recall (from the previous blog post) that foreach and await foreach both (usually) include a using block that invokes Dispose/DisposeAsync on the enumerator instance? Well: anything we put in a finally essentially relocates to that Dispose/DisposeAsync. The upshot of this is that triggering the cancellation token when the consumer is done with us is trivial:

using var allDone = CancellationTokenSource.CreateLinkedTokenSource(
        externalToken, enumeratorToken);
try
{
    // ... todo
}
finally
{   // cancel allDone however we exit
    allDone.Cancel();
}

The next step is to get our producer working - that's our SendAsync code. Because this is duplex, it doesn't have any bearing on the incoming messages, so we'll start that as a completely separate code-path via Task.Run, but we can make it such that if the producer or send faults, it stops the entire show; so if we look just at our // ... todo code, we can add:

var send = Task.Run(async () =>
{
    try
    {
        await foreach (var message in
            request.WithCancellation(allDone.Token))
        {
            await transport.SendAsync(message, allDone.Token);
        }
    }
    catch
    {   // trigger cancellation if send faults
        allDone.Cancel();
        throw;
    }
}, allDone.Token);

// ... todo: receive

await send; // observe send outcome

This starts a parallel operation that consumes the data from our producer, but notice that we're using allDone.Token to pass our combined cancellation knowledge to the producer. This is very subtle, because it represents a cancellation state that didn't even conceptually exist at the time ProducerAsync() was originally invoked. The fact that GetAsyncEnumerator is deferred has allowed us to give it something much more useful, and as long as ProducerAsync() uses the cancellation-token appropriately, it can now be fully aware of the life-cycle of the composite duplex operation.

This just leaves our receive code, which is more or less like it was originally, but again: using allDone.Token:

while (true)
{
    var (success, message) = await transport.TryReceiveAsync(allDone.Token);
    if (!success) break;
    yield return message;
}

// the server's last message stops everything
allDone.Cancel();

Putting all this together gives us a non-trivial library function:

private async static IAsyncEnumerable<TResponse> DuplexImpl(
    ITransport<TRequest, TResponse> transport,
    IAsyncEnumerable<TRequest> request,
    CancellationToken externalToken,
    [EnumeratorCancellation] CancellationToken enumeratorToken = default)
{
    using var allDone = CancellationTokenSource.CreateLinkedTokenSource(
        externalToken, enumeratorToken);
    try
    {
        var send = Task.Run(async () =>
        {
            try
            {
                await foreach (var message in
                    request.WithCancellation(allDone.Token))
                {
                    await transport.SendAsync(message, allDone.Token);
                }
            }
            catch
            {   // trigger cancellation if send faults
                allDone.Cancel();
                throw;
            }
        }, allDone.Token);

        while (true)
        {
            var (success, message) = await transport.TryReceiveAsync(allDone.Token);
            if (!success) break;
            yield return message;
        }

        // the server's last message stops everything
        allDone.Cancel();

        await send; // observe send outcome
    }
    finally
    {   // cancel allDone however we exit
        allDone.Cancel();
    }
}

The key points here being:

  • both the external token and the enumerator token contribute to allDone
  • the transport-level send and receive code uses allDone.Token
  • the producer enumeration uses allDone.Token
  • however we exit our enumerator, allDone is cancelled
    • if transport-receive faults, allDone is cancelled
    • if the consumer terminates early, allDone is cancelled
  • when we receive the last message from the server, allDone is cancelled
  • if the producer or transport-send faults, allDone is cancelled

The one thing it doesn't support well is people using GetAsyncEnumerator() directly and not disposing it. That comes under the heading of "using the API incorrectly", and is self-inflicted.
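
For illustration (a hand-rolled sketch, not anything from the real API surface), that misuse looks something like this - the enumerator is never disposed, so the finally in DuplexImpl never runs and allDone is never cancelled:

var iter = client.Duplex(ProducerAsync()).GetAsyncEnumerator();
if (await iter.MoveNextAsync())
{
    Console.WriteLine(iter.Current);
}
// no DisposeAsync() - the finally block in DuplexImpl (and hence the cancellation) never fires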

A side note on ConfigureAwait(false); by default await includes a check on SynchronizationContext.Current; in addition to meaning an extra context-switch, in the case of UI applications this may mean running code on the UI thread that does not need to run on the UI thread. Library code usually does not require this (it isn't as though we're updating form controls here, so we don't need thread-affinity). As such, in library code, it is common to use .ConfigureAwait(false) basically everywhere that you see an await - which bypasses this mechanism. I have not included that in the code above, for readability, but: you should imagine it being there :) By contrast, in application code, you should usually default to just using await without ConfigureAwait, unless you know you're writing something that doesn't need sync-context.
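
To make that concrete, here is the receive loop from DuplexImpl again, written the way it would typically look in library code (same logic, just with the awaits configured):

while (true)
{
    var (success, message) = await transport.TryReceiveAsync(allDone.Token).ConfigureAwait(false);
    if (!success) break;
    yield return message;
}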

I hope this has been a useful delve into some of the more complex things you can do with cancellation-tokens, and how you can combine them to represent codependent exit conditions.

Thursday, 14 May 2020

The anatomy of async iterators (aka await, foreach, yield)

Here I'm going to discuss the mechanisms and concepts relating to async iterators in C# - with the hope of both demystifying them a bit, and also showing how we can use some of the more advanced (but slightly hidden) features. I'm going to give some illustrations of what happens under the hood, but note: these are illustrations, not the literal generated expansion - this is deliberately to help show what is conceptually happening, so if I ignore some subtle implementation detail: that's not accidental. As always, if you want to see the actual code, tools like https://sharplab.io/ are awesome (just change the "Results" view to "C#" and paste the code you're interested in onto the left).

Iterators in the sync world

Before we discuss async iterators, let's start by recapping iterators. Many folks may already be familiar with all of this, but hey: it helps to set the scene. More importantly, it is useful to allow us to compare and contrast later when we look at how async changes things. So: we know that we can write a foreach loop (over a sequence) of the form:

foreach (var item in SomeSource(42))
{
    Console.WriteLine(item);
}

and for each item that SomeSource returns, we'll get a line in the console. SomeSource could be returning a fully buffered set of data (like a List<string>):

IEnumerable<string> SomeSource(int x)
{
    var list = new List<string>();
    for (int i = 0; i < 5; i++)
        list.Add($"result from SomeSource, x={x}, result {i}");
    return list;
}

but a problem here is that this requires SomeSource to run to completion before we get even the first result, which could take a lot of time and memory - and is just generally restrictive. Often, when we're trying to represent a sequence, it may be unbounded, or at least: open-ended - for example, we could be pulling data from a remote work queue, where a: we only want to be holding one pending item at a time, and b: it may not have a logical "end". It turns out that C#'s definition of a "sequence" (for the purposes of foreach) is fine with this. Instead of returning a list, we can write an iterator block:

IEnumerable<string> SomeSource(int x)
{
    for (int i = 0; i < 5; i++)
        yield return $"result from SomeSource, x={x}, result {i}";
}

This works similarly, but there are some fundamental differences - most noticeably: we don't ever have a buffer - we just make one element available at a time. To understand how this can work, it is useful to take another look at our foreach; the compiler interprets foreach as something like the following:

using (var iter = SomeSource(42).GetEnumerator())
{
    while (iter.MoveNext())
    {
        var item = iter.Current;
        Console.WriteLine(item);
    }
}

We have to be a little loose in our phrasing here, because foreach isn't actually tied to IEnumerable<T> - it is duck-typed against an API shape instead; the using may or may not be there, for example. But fundamentally, the compiler calls GetEnumerator() on the expression passed to foreach, then creates a while loop checking MoveNext() (which answers "is there more data?" and advances the mechanism in the success case), then accesses the Current property (which exposes the element we advanced to). As an aside, historically (prior to C# 5) the compiler used to scope item outside of the while loop, which might sound innocent, but it was the source of absolutely no end of confusion, code errors, and questions on Stack Overflow (think "captured variables").
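
As a quick illustration of why that old scoping caused so much grief (a minimal sketch, purely for flavour):

var actions = new List<Action>();
foreach (var item in new[] { 1, 2, 3 })
{
    actions.Add(() => Console.WriteLine(item)); // captures the loop variable
}
foreach (var action in actions)
{
    action(); // C# 5 onwards: prints 1, 2, 3; with the old scoping it printed 3, 3, 3
}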

So; hopefully you can see in the above how the consumer can access an unbounded forwards-only sequence via this MoveNext() / Current approach; but how does that get implemented? Iterator blocks (anything involving the yield keyword) are actually incredibly complex, so I'm going to take a lot of liberties here, but what is going on is similar to:

IEnumerable<string> SomeSource(int x)
    => new GeneratedEnumerable(x);

class GeneratedEnumerable : IEnumerable<string>
{
    private int x;
    public GeneratedEnumerable(int x)
        => this.x = x;

    public IEnumerator<string> GetEnumerator()
        => new GeneratedEnumerator(x);

    // non-generic fallback
    IEnumerator IEnumerable.GetEnumerator()
        => GetEnumerator();
}

class GeneratedEnumerator : IEnumerator<string>
{
    private int x, i;
    public GeneratedEnumerator(int x)
        => this.x = x;

    public string Current { get; private set; }

    // non-generic fallback
    object IEnumerator.Current => Current;

    // if we had "finally" code, it would go here
    public void Dispose() { }

    // our "advance" logic
    public bool MoveNext()
    {
        if (i < 5)
        {
            Current = $"result from SomeSource, x={x}, result {i}";
            i++;
            return true;
        }
        else
        {
            return false;
        }
    }

    // this API is essentially deprecated and never used
    void IEnumerator.Reset() => throw new NotSupportedException();
}

Let's tear this apart:

  • firstly, we need some object to represent IEnumerable<T>, but we also need to understand that IEnumerable<T> and IEnumerator<T> (as returned from GetEnumerator()) are different APIs; in the generated version there is a lot of overlap and they can share an instance, but to help discuss it, I've kept the two concepts separate.
  • when we call SomeSource, we create our GeneratedEnumerable which stores the state (x) that was passed to SomeSource, and exposes the required IEnumerable<T> API
  • later (and it could be much later), when the caller iterates (foreach) the data, GetEnumerator() is invoked, which calls into our GeneratedEnumerator to act as the cursor over the data
  • our MoveNext() logic implements the same for loop conceptually, but one step per call to MoveNext(); if there is more data, Current is assigned with the thing we would have passed to yield return
  • note that there is also a yield break C# keyword, which terminates iteration; this would essentially be return false in the generated expansion
  • note that there are some nuanced differences in my hand-written version that the C# compiler needs to deal with; for example, what happens if I change x in my enumerator code (MoveNext()), and then later iterate the data a second time - what is the value of x? emphasis: I don't care about this nuance for this discussion!

Hopefully this gives enough of a flavor to understand foreach and iterators (yield) - now let's get onto the more interesting bit: async.

Why do we need async iterators?

The above works great in a synchronous world, but a lot of .NET work is now favoring async/await, in particular to improve server scalability. The big problem in the above code is the bool MoveNext(). This is explicitly synchronous. If the thing it is doing takes some time, we'll be blocking a thread, and blocking a thread is increasingly anathema to us. In the context of our earlier "remote work queue" example, there might not be anything there for seconds, minutes, hours. We really don't want to block threads for that kind of time! The closest we can do without async iterators is to fetch the data asynchronously, but buffered - for example:

async Task<List<string>> SomeSource(int x) {...}

But this doesn't have the same semantics - and is getting back into buffering. Assuming we don't want to fetch everything in one go, to get around this we'd eventually end up implementing some kind of "async batch loop" monstrosity that effectively re-implements foreach using manual ugly code, negating the reasons that foreach even exists. To address this, C# and the BCL have recently added support for async iterators, yay! The new APIs (which are available down to net461 and netstandard2.0 via NuGet) are:

public interface IAsyncEnumerable<out T>
{
    IAsyncEnumerator<T> GetAsyncEnumerator(CancellationToken cancellationToken = default);
}
public interface IAsyncEnumerator<out T> : IAsyncDisposable
{
    T Current { get; }
    ValueTask<bool> MoveNextAsync();
}
public interface IAsyncDisposable
{
    ValueTask DisposeAsync();
}

Let's look at our example again, this time: with added async; we'll look at the consumer first (the code doing the foreach), so for now, let's imagine that we have:

IAsyncEnumerable<string> SomeSourceAsync(int x)
    => throw new NotImplementedException();

and focus on the loop; C# now has the await foreach concept, so we can do:

await foreach (var item in SomeSourceAsync(42))
{
    Console.WriteLine(item);
}

and the compiler interprets this as something similar to:

await using (var iter = SomeSourceAsync(42).GetAsyncEnumerator())
{
    while (await iter.MoveNextAsync())
    {
        var item = iter.Current;
        Console.WriteLine(item);
    }
}

(note that await using is similar to using, but DisposeAsync() is called and awaited, instead of Dispose() - even cleanup code can be asynchronous!)

The key point here is that this is actually pretty similar to our sync version, just with added await. Ultimately, however, the moment we add await the entire body is ripped apart by the compiler and rewritten as an asynchronous state machine. That isn't the topic of this article, so I'm not even going to try and cover how await is implemented behind the scenes. For today "a miracle happens" will suffice for that. The observant might also be wondering "wait, but what about cancellation?" - don't worry, we'll get there!

So what about our enumerator? Along with await foreach, we can also now write async iterators with yield; for example, we could do:

async IAsyncEnumerable<string> SomeSourceAsync(int x)
{
    for (int i = 0; i < 5; i++)
    {
        await Task.Delay(100); // simulate async something
        yield return $"result from SomeSource, x={x}, result {i}";
    }
}

In real code, we could now be consuming data from a remote source asynchronously, and we have a very effective mechanism for expressing open sequences of asynchronous data. In particular, remember that the await iter.MoveNextAsync() might complete synchronously, so if data is available immediately, there is no context switch. We can imagine, for example, an iterator block that requests data from a remote server in pages, and yield return each record of the data in the current page (making it available immediately), only doing an await when it needs to fetch the next page.
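
As a rough sketch of that idea (Page, Record and FetchPageAsync here are hypothetical, just to show the shape):

async IAsyncEnumerable<Record> GetRecordsAsync()
{
    var page = await FetchPageAsync(pageNumber: 0);
    while (true)
    {
        foreach (var record in page.Records)
            yield return record; // already in memory; the consumer's await completes synchronously

        if (page.IsLast) yield break;
        page = await FetchPageAsync(page.Number + 1); // only await when we need the next page
    }
}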

Behind the scenes, the compiler generates types to implement the IAsyncEnumerable<T> and IAsyncEnumerator<T> pieces, but this time they are even more obtuse, owing to the async/await restructuring. I do not intend to try and cover those here - it is my hope instead that we wave a hand and say "you know that expansion we wrote by hand earlier? like that, but with more async". However, there is a very important topic that we have overlooked, and that we should cover: cancellation.

But what about cancellation?

Most async APIs support cancellation via a CancellationToken, and this is no exception; look back up to IAsyncEnumerable<T> and you'll see that it can be passed into the GetAsyncEnumerator() method. But if we're not writing the loop by hand, how do we do this? This is achieved via WithCancellation, similarly to how ConfigureAwait can be used to configure await - and indeed, there's even a ConfigureAwait we can use too! For example, we could do (showing both config options in action here):

await foreach (var item in SomeSourceAsync(42)
    .WithCancellation(cancellationToken).ConfigureAwait(false))
{
    Console.WriteLine(item);
}

which would be semantically equivalent to:

var iter = SomeSourceAsync(42).GetAsyncEnumerator(cancellationToken);
await using (iter.ConfigureAwait(false))
{
    while (await iter.MoveNextAsync().ConfigureAwait(false))
    {
        var item = iter.Current;
        Console.WriteLine(item);
    }
}

(I've had to split the iter local out to illustrate that the ConfigureAwait applies to the DisposeAsync() too - via await iter.DisposeAsync().ConfigureAwait(false) in a finally)

So; now we can pass a CancellationToken into our iterator... but - how can we use it? That's where things get even more fun! The naive way to do this would be to think along the lines of "I can't take a CancellationToken until GetAsyncEnumerator is called, so... perhaps I can create a type to hold the state until I get to that point, and create an iterator block on the GetAsyncEnumerator method" - something like:

// this is unnecessary; do not copy this!
IAsyncEnumerable<string> SomeSourceAsync(int x)
    => new SomeSourceEnumerable(x);
class SomeSourceEnumerable : IAsyncEnumerable<string>
{
    private int x;
    public SomeSourceEnumerable(int x)
        => this.x = x;

    public async IAsyncEnumerator<string> GetAsyncEnumerator(
        CancellationToken cancellationToken = default)
    {
        for (int i = 0; i < 5; i++)
        {
            await Task.Delay(100, cancellationToken); // simulate async something
            yield return $"result from SomeSource, x={x}, result {i}";
        }
    }
}

The above works. If a CancellationToken is passed in via WithCancellation, our iterator will be cancelled at the correct time - including during the Task.Delay; we could also check IsCancellationRequested or call ThrowIfCancellationRequested() at any point in our iterator block, and all the right things would happen. But; we're making life hard for ourselves - the compiler can do this for us, via [EnumeratorCancellation]. We could also just have:

async IAsyncEnumerable<string> SomeSourceAsync(int x,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    for (int i = 0; i < 5; i++)
    {
        await Task.Delay(100, cancellationToken); // simulate async something
        yield return $"result from SomeSource, x={x}, result {i}";
    }
}

This works similarly to our approach above - our cancellationToken parameter makes the token from GetAsyncEnumerator() (via WithCancellation) available to our iterator block, and we haven't had to create any dummy types. There is one slight nuance, though... we've changed the signature of SomeSourceAsync by adding a parameter. The code we had above still compiles because the parameter is optional. But this prompts the question: what happens if I passed one in? For example, what are the differences between:

// option A - no cancellation
await foreach (var item in SomeSourceAsync(42))

// option B - cancellation via WithCancellation
await foreach (var item in SomeSourceAsync(42).WithCancellation(cancellationToken))

// option C - cancellation via SomeSourceAsync
await foreach (var item in SomeSourceAsync(42, cancellationToken))

// option D - cancellation via both
await foreach (var item in SomeSourceAsync(42, cancellationToken).WithCancellation(cancellationToken))

// option E - cancellation via both with different tokens
await foreach (var item in SomeSourceAsync(42, tokenA).WithCancellation(tokenB))

The answer is that the right thing happens: it doesn't matter which API you use - if a cancellation token is provided, it will be respected. If you pass two different tokens, then when either token is cancelled, it will be considered cancelled. What happens is that the original token passed via the parameter is stored as a field on the generated enumerable type, and when GetAsyncEnumerator is called, the parameter to GetAsyncEnumerator and the field are inspected. If they are both genuine but different cancellable tokens, CancellationTokenSource.CreateLinkedTokenSource is used to create a combined token (you can think of CreateLinkedTokenSource as the cancellation version of Task.WhenAny); otherwise, if either is genuine and cancellable, it is used. The result is that when you write an async cancellable iterator, you don't need to worry too much about whether the caller used the API directly vs indirectly.
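
As a rough sketch (illustrative only - not the literal generated code), the selection logic is along these lines:

static CancellationToken Combine(CancellationToken fieldToken, CancellationToken parameterToken,
    out CancellationTokenSource linked)
{
    linked = null;
    if (fieldToken == parameterToken || !parameterToken.CanBeCanceled)
        return fieldToken;      // same token, or only the stored token can actually cancel
    if (!fieldToken.CanBeCanceled)
        return parameterToken;  // only the GetAsyncEnumerator token can actually cancel
    // both are genuine, different, cancellable tokens: link them
    linked = CancellationTokenSource.CreateLinkedTokenSource(fieldToken, parameterToken);
    return linked.Token;
}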

You might be more concerned by the fact that we've changed the signature, however; in that case, a neat trick is to use two methods - one without the token that is for consumers, and one with the token for the actual implementation:

public IAsyncEnumerable<string> SomeSourceAsync(int x)
    => SomeSourceImplAsync(x);

private async IAsyncEnumerable<string> SomeSourceImplAsync(int x,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    for (int i = 0; i < 5; i++)
    {
        await Task.Delay(100, cancellationToken); // simulate async something
        yield return $"result from SomeSource, x={x}, result {i}";
    }
}

This would seem an ideal candidate for a "local function", but unfortunately at the current time, parameters on local functions are not allowed to be decorated with attributes. It is my hope that the language / compiler folks take pity on us, and allow us to do (in the future) something more like:

public IAsyncEnumerable<string> SomeSourceAsync(int x)
{
    return Impl();

    // this does not compile today
    async IAsyncEnumerable<string> Impl(
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        for (int i = 0; i < 5; i++)
        {
            await Task.Delay(100, cancellationToken); // simulate async something
            yield return $"result from SomeSource, x={x}, result {i}";
        }
    }
}

or the equivalent using static local functions, which is usually my preference to avoid any surprises in how capture works. The good news is that this works in the preview language versions, but that is not a guarantee that it will "land".

Summary

So; that's how you can implement and use async iterators in C# now. We've looked at both the consumer and producer versions of iterators, for both synchronous and asynchronous code paths, and looked at various ways of accessing cancellation of asynchronous iterators. There is a lot going on here, but: hopefully it is useful and meaningful.

Wednesday, 4 March 2020

Why do I rag on BinaryFormatter?

tl;dr: seriously, stop using BinaryFormatter

The other evening, in the context of protobuf-net.Grpc, someone asked me whether it was possible to use BinaryFormatter as the marshaller. This isn't an unreasonable question, especially as protobuf-net.Grpc is designed to allow you to swap out the marshaller (gRPC is usually used with protobuf, but it isn't restricted to that as long as both ends understand what they're dealing with).

This made me realise that while I've spent over a decade telling people variants of "don't use BinaryFormatter", I don't think I've ever collated the reasons in one place. I suspect that many people think I'm being self-serving by saying this - after all it is so easy to use BinaryFormatter, and I'm not exactly a disinterested observer when it comes to serialization tools.

So! I thought I'd take this opportunity to put together my thoughts and reasons in one place, while also providing a "custom marshaller" example for protobuf-net.Grpc. Because "reasons", I've done this as comments in the example, but I present them below. There are four sections, but if you aren't sold by the time you've finished the first ("Security") section, then frankly: I give up. Everything beyond that first section is just decoration!

So; if you're still using BinaryFormatter, I implore you: please just stop.

And without further embellishment, I present my thesis. If I missed anything, please let me know and we can add more. But again, no more should be needed.

Tuesday, 7 January 2020

.NET Core, .NET 5; the exodus of .NET Framework?

tl;dr: opinion: ongoing .NET Framework support for F/OSS libraries may quickly start evaporating, and this should be a consideration in migration planning.


First, a clarification of terms, because they matter:

  • .NET Framework - the original .NET, the one that ships on Windows and only on Windows; the current (and probably final) version of .NET Framework is 4.8
  • .NET Core - the evolution of .NET, that is not tied to the OS as much, with slightly different feature sets, and where most of the Microsoft .NET effort has been for the last few years; .NET Core 3.1 shipped recently
  • .NET Standard - an API definition (not implementation - akin to an interface) that allows a library to target a range of platforms in a single build, i.e. by targeting .NET Standard 2.0 a library can in theory run equivalently on .NET Core 3 and .NET Framework 4.6.2 (ish...) and others (Mono, Unity, etc), without needing to target each individually
  • .NET 5 - the next version of .NET Core; the naming deliberately emphasizes that there isn't a two-pronged development future consisting of "Framework" and "Core", but just one - this one - which isn't "Core" in the "minimal" sense, but is in fact now a very rich and powerful runtime; .NET 4 was avoided to prevent versioning confusion between .NET 4.* and .NET Framework 4.* (and again, to emphasize that this is the future direction of .NET, including if you are currently on .NET Framework)

The first thing we must be clear about, in case it isn't 100% clear from the above, is that .NET Framework is legacy: it is complete. There isn't going to be a .NET Framework 4.9 or a .NET Framework 5. There might be some critical security fixes, but there aren't going to be feature additions, unless those additions come from out-of-band NuGet (etc) packages that just happened to work on .NET Framework on the first (or maybe second) try.

I commented on Twitter yesterday about my perceptions on the status of this, and how we (the .NET community) should look at the landscape; it goes without saying that I'm merely opining here - I'm not a spokesperson for Microsoft, but I am a library author and consumer, and I work extensively in the .NET space. Other views and conclusions are possible! But: I wanted to take the time to write up a more long-form version of what I see, with space to give reasons and discuss consequences.

What I said yesterday

The short version is: I expect that 2020 will see a lot of library authors giving serious consideration as to whether to continue shipping .NET Framework support on new library versions. There are lots of reasons for this, including:

  • increasing feature gaps making it increasingly expensive to support multiple frameworks, either via compatibility shims or framework-dependent feature sets
  • as more and more library authors complete their own migrations to .NET Core, the effort required to support a framework that they aren't using increases:
    • bugs don't get spotted until they've shipped to consumers
    • a lot of knowledge of "the old framework" needs to be retained and kept in mind - a particular issue with new contributors who might never have used that framework (and yes, there are some huge gotchas)
    • there are often two (or more) code implementations to support
    • builds are more complicated than necessary (requiring either Windows or the build-pack), and take longer
    • tests take longer and require Windows
    • packages and dependency trees are larger than necessary
  • not all new language features are equal citizens on down-level frameworks
    • some features, such as default interface methods, will not work on down-level frameworks
    • some important features like C# 8 nullability are in a weird middle ground where some bits kinda work sometimes most of the time except when they don't
    • some, like IAsyncEnumerable<T> may have compatibility shims, but that only allows minimal support on library surfaces, since of course many framework level pieces to produce or consume such will be missing
  • some APIs are fundamentally brittle on .NET Framework, especially when multi-targeting, with the breaks happening only at run-time (they are not obvious at build, and may not be obvious until a very specific code-path is hit, which might be a long time after initial deployment); a lot of this comes down to the assembly loader and assembly-binding-redirects (a problem that simply does not exist in .NET Core / .NET 5)
    • if you want to see a library author cry, mention System.ValueTuple, System.Numerics.Vectors, or System.Runtime.CompilerServices.Unsafe. Why? Because they are deployment nightmares if you are targeting multiple platforms, because .NET Framework makes a complete pig's ear of them; you can just about fix it up with assembly-binding-redirects some of the time, but the tooling will not and can not do this for you, which is pure pain for a library author
    • recall that .NET Framework is "complete"; the loader isn't going to be fixed (also, nobody wants to touch it); alternatively, it could be said that the loader has already been fixed; the fix is called .NET Core / .NET 5
  • a lot of recent performance-focused APIs are not available on .NET Framework, or perform very differently (which is almost the worst possible outcome for performance-focused APIs!); for example:
    • concurrency: a lot of async APIs designed for highly concurrent systems (servers, in particular) will be simply missing on .NET Framework, or may be implemented via async-over-sync / sync-over-async, which significantly changes the characteristics
    • allocations: there are a lot of new APIs designed to avoid allocations, typically in library code related to IO, data-processing etc - things like Span<T>; the framework APIs to interact with these directly often won't exist on .NET Framework, forcing dual code paths, but even when they do, .NET Framework uses a different (and less optimal) Span<T> implementation, and the JIT lacks the knowledge to make Span<T> be magical; you can hack over some of the API gaps using pointer-based APIs when they exist, but then you might be tempted to use Unsafe.*, which as already mentioned: wants to kill you
    • processing: one of the most powerful new toolkits in .NET for CPU-focused work is access to SIMD and CPU intrinsics; both of these work especially well when mixed with spans, due to the ability to coerce between spans and vectors - but we just saw how Span<T> is problematic; full CPU intrinsics are only available on .NET Core / .NET 5, but you can still get a lot done by using Vector<T> which allows SIMD on .NET Framework... except I already mentioned that System.Numerics.Vectors is one of the trifecta of doom - so yes, you can use it, but: brace yourself.
    • now consider that a lot of libraries - including Microsoft libraries on NuGet, and F/OSS libraries - are starting to make more and more use of these features for performance, and you start to see how brittle things get, and it often won't be the library author that sees the problem.
  • as .NET Core / .NET 5 expand our ability to reach more OSes, we already have enough permutations of configurations to worry about.
  • often, the issues here may not be just down to a library, but may be due to interactions of multiple libraries (or indeed, conflicting dependencies of multiple libraries), so the issues may be unique to specific deployments.

How about just offering patch support, not feature support?

So the theory here is that we can throw our hands in the air, and declare "no new features in the .NET Framework version - but we'll bugfix". This sounds great to the consumer, but... it isn't really very enticing to the maintainer. In reality, this means branching at some point, and now ... what happens? We still retain all of the build, test, deploy problems (although now we might need completely different build/CI tools for each), but now we have two versions of the code that are drifting apart; we need to keep all the old things in our mind for support, and when we bugfix the current code, we might also need to backport that bug into a branch that uses very different code, and test that. On a platform that the library maintainers aren't using.

F/OSS isn't free; it is paid for by the maintainers. When proposing something like the above, we need to be very clear about whose time we are committing, and why we feel entitled to commit it. Fundamentally, I don't think that option scales very well. At some point, I think it becomes increasingly necessary to think of .NET Framework in the same way that we have thought of .NET 1.* for a very long time - it is interesting to know that it exists, but the longer you stay stuck on that island, the harder life is going to become for you.

In particular, to spell it out explicitly; I expect a number of libraries will start rebasing to .NET Standard 2.1 and .NET Core 3.0 or 3.1 as their minimum versions, carving off .NET Framework. The choice of .NET Standard 2.1 here isn't necessarily "because we want to use APIs only available in 2.1", but is instead: "because we actively don't want .NET Framework trying to run this, and .NET Framework thinks, often mistakenly, that it works with .NET Standard 2.0" (again, emphasis here is that .NET Framework 4.6.2 only sort of implements .NET Standard 2.0, and even when it does, it drags in a large dependency graph; this is partly resolved if you also target .NET Framework 4.7.2, but your list of TFMs is now growing even further).

So what happens to .NET Framework folks?

I totally get that a lot of people will be stuck on .NET Framework for the foreseeable future. Hell, a lot of our code at Stack Overflow is still .NET Framework (we're working through migration). I completely understand and empathize with all the familiar topics of service lifetimes, SLAs, budgets, clients, contracts/legals, and all of those things.

Just like nobody is coming to take .NET Framework off your machine, nobody is coming to take F/OSS libraries either. What I'm saying is that a time may come - and it is getting closer on the horizon - when you just won't get updates. The library you have today will continue working, and will still be on NuGet, but there won't be feature updates, and very few (for the reasons above) bug fixes.

I know I've spoken about open source funding before, but: at some point, if your business genuinely needs additional support on .NET Framework where it is going to create significant extra work (see: everything above) for the maintainers, perhaps at some point this is simply a supply-chain issue, and one solution is to sponsor that work and the ongoing support. Another option may be to fork the project yourself at the point where you're stuck, and maintain all the changes there, perhaps even supporting the other folks using that level. If you're thinking "but that sounds like a lot of effort": congratulations, you're right - it is! That's why it isn't already being done. All such work is zero sum; time spent on the additional work needed to support .NET Framework is time not being spent actually developing the library for what the maintainer wants and needs, and: it is their time being spent.

Conclusion

A lot of what I've discussed here is opinion; I can't say for sure how it will play out, but I think it is a very real (and IMO likely) possibility. As such, I think it is just one facet of the matrix you should be considering in terms of "should we, or when should we, look to migrate to .NET Core / .NET 5"; key point: .NET Core 3.1 is a LTS release, so frankly, there's absolutely no better time than now. Is migrating work? Yes, it is. But staying put also presents challenges, and I do not believe that .NET Framework consumers can reasonably expect the status-quo of F/OSS support (for .NET Framework) to continue.

(the Twitter thread)

Friday, 23 August 2019

Prefer ValueTask to Task, always; and don't await twice

Preamble - not a part 2

A little while ago I blogged here and I set it up to be a "continues..." style post. I haven't had the energy to continue it in that context, and this fact was putting me off concluding the post. I then realised: the thing that matters isn't some overarching narrative structure, but that I get my ideas down. So: I'm aborting any attempt at making this post a continuation, and just focusing on the content!

Prefer ValueTask[<T>] to Task[<T>], always.

There's been a lot of confusion over when to use Task[<T>] vs ValueTask[<T>] (note: I'm going to drop the [<T>] from now on; just pretend they're there when you see Task / ValueTask etc).

Context: what are Task and ValueTask?

In case you don't know, Task and ValueTask are the two primary implementations of "awaitable" types in .NET; "awaitable" here means that there is a duck-typed signature that allows the compiler to turn this:

int i = await obj.SomeMethodAsync();

into something like this:

var awaiter = obj.SomeMethodAsync().GetAwaiter();
if (!awaiter.IsCompleted)
{
    // voodoo here that schedules a
    // continuation that resumes here
    // once the result becomes available
}
int i = awaiter.GetResult();

Task is the original and most well known API, since it shipped with the TPL, but it means that an object allocation is necessary even for scenarios where it turns out that it was already available, i.e. awaiter.IsCompleted returned true. The ValueTask value-type (struct) acts as a hybrid result that can represent an already completed result without allocating or an incomplete pending operation. You can implement your own custom awaitables, but it isn't common.
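
For example (a minimal sketch - the cache here is entirely hypothetical), a method can hand back an already-known value with no allocation, only falling back to a Task<T> when it genuinely needs to be asynchronous:

private int? cachedValue;

public ValueTask<int> GetValueAsync()
{
    if (cachedValue is int value)
        return new ValueTask<int>(value); // completed synchronously, no allocation
    return new ValueTask<int>(FetchAndCacheAsync()); // incomplete: backed by a Task<int>
}

private async Task<int> FetchAndCacheAsync()
{
    await Task.Delay(100); // simulate fetching the value asynchronously
    cachedValue = 42;
    return cachedValue.Value;
}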

When to choose each, the incorrect version

If you'd asked me a while back about when to choose each, I might have incorrectly said something like:

Use Task when something is usually or always going to be genuinely asynchronous, i.e. not immediately complete; use ValueTask when something is usually or always going to be synchronous, i.e. the value will be known inline; also use ValueTask in a polymorphic scenario (virtual, interface) where you can't know the answer.

The logic behind this incorrect statement is that if something is incomplete, your ValueTask is going to end up being backed by a Task anyway, but without the extra indirection and false promise of ValueTask. This is incorrect, though, because it is based on the premise that a ValueTask is a composite of "known result (T)" and "Task". In fact, ValueTask is also a composite of a third thing: IValueTaskSource[<T>].

What is IValueTaskSource[<T>]?

IValueTaskSource is an abstraction that allows you to represent the logical behaviour of a task separately to the result itself. That's a little vague, so an example:

IValueTaskSource<int> someSource = // ...
short token = // ...
var vt = new ValueTask<int>(someSource, token);
// ...
int i = await vt;

This now functions like you'd expect from an awaitable, but even in the incomplete/asynchronous case the logic about how everything works is now down to whatever implements the interface - it does not need to be backed by a Task. You might be thinking:

ah, but we still need an instance of whatever is implementing the interface, and we're treating it as a reference, so: we're still going to allocate; what's the point? what have you gained?

And that's when I need to point out the short token. This little gem allows us to use the same interface instance with multiple value-tasks, and have them know the difference. There are two ways you could use this:

  • keep the state for multiple asynchronous operations concurrently, using the token to pick the correct state (presumably from a vector)
  • keep a single piece of state for multiple consecutive operations, using the token to guarantee that we're talking about the correct one

The second is actually by far the more common implementation, and in fact is now included in the BCL for you to make direct use of - see ManualResetValueTaskSourceCore<T>.
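
To give a feel for the shape of that (a minimal sketch only - a real implementation needs to think carefully about thread-safety and when it is safe to reset), an IValueTaskSource<T> backed by ManualResetValueTaskSourceCore<T> looks something like:

// ManualResetValueTaskSourceCore<T> lives in System.Threading.Tasks.Sources
class ReusableValueTaskSource : IValueTaskSource<int>
{
    // mutable struct that does the heavy lifting; deliberately not readonly
    private ManualResetValueTaskSourceCore<int> core;

    // hand out a value-task tied to the current version (the "token")
    public ValueTask<int> AwaitNextAsync() => new ValueTask<int>(this, core.Version);

    // the producer calls this when the answer is known
    public void Complete(int result) => core.SetResult(result);

    // prepare for the next consecutive operation; old tokens become invalid
    public void Reset() => core.Reset();

    int IValueTaskSource<int>.GetResult(short token) => core.GetResult(token);
    ValueTaskSourceStatus IValueTaskSource<int>.GetStatus(short token) => core.GetStatus(token);
    void IValueTaskSource<int>.OnCompleted(Action<object> continuation, object state,
        short token, ValueTaskSourceOnCompletedFlags flags)
        => core.OnCompleted(continuation, state, token, flags);
}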

So what? How does this help me?

OK; so - we've seen that this alternative exists. There are two ways that people commonly author awaitable APIs today:

  • using TaskCompletionSource<T> and handing the caller the .Task (perhaps wrapped in a ValueTask), and calling TrySetResult etc when we want to trigger completion
  • using async and await, having the compiler generate all the machinery behind the scenes - noting that this currently involves creating a Task in the incomplete case, even for ValueTask methods (because it has to come from somewhere)

Hopefully you can see that if we have ValueTask available to us it is relatively easy to substitute in a ManualResetValueTaskSourceCore backer, allowing us to reuse the same IValueTaskSource instance multiple times, avoiding lots of allocations. But: there's an important caveat - it changes the API. No, really. Let's take a stroll to discuss how...

Don't await twice

Right now, the following code works - assuming the result is backed by either a fixed T or a Task<T>:

var pending = obj.SomeMethodAsync();
int i = await pending;
// ...
int j = await pending;

You'll get the same answer from each await, unsurprisingly - but the actual operation (the method) is only performed once. But: if we switch to ManualResetValueTaskSourceCore, we should only assume that each token is valid exactly once; once we've awaited the result, the entire point is that the backing implementation is free to re-use that IValueTaskSource with a different token for another consumer. That means that the code shown above is no longer legal, and we should expect that the second await can now throw an exception about the token being incorrect.

This is a pretty rare thing to see in code, so personally I'm OK with saying "tough; await once only". Think of it in human terms; this is like a manager going to someone's desk and saying:

Hi, I need the answer to (some topical question); do you know that now? if so, tell me now; otherwise, when you have the answer, bring it (somewhere) and nudge me.

All fine and reasonable so far; our office hero didn't know the answer right away, so they went away and got it, took it where instructed and handed the answer to the manager.

20 minutes later (or 2 days later), the manager stops by their desk again:

Hey, give me that answer

At this point, our hero might reasonably say

Boss, I already gave it you; I only printed it out once - you have the copy; I deal with lots of requests each day, and I can't even remember what you asked about, let alone what the answer was; if you've forgotten the answer, that's on you - feel free to ask again, it's all billable

This is kinda how I anthropomorphize ValueTask, especially in the context of IValueTaskSource. So key point: don't await twice. Treat the results of awaitables exactly the same as you would the result of any other expression: if you are going to need the value twice, store it in a local when you first fetch it.
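
In other words, the earlier snippet becomes:

var pending = obj.SomeMethodAsync();
int i = await pending; // await exactly once
// ...
int j = i;             // reuse the stored result, rather than awaiting again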

How else can we benefit from IValueTaskSource?

So; we've seen how we can manually use an IValueTaskSource to efficiently issue ValueTask awaitable results; but if we use async/await, in the incomplete / asynchronous case the compiler is still going to be generating a Task - and also generating a bunch of other state boxes associated with the continuation voodoo. But... it doesn't have to! A while ago I did some playing in this area that resulted in "Pooled Await"; I'm not going to go into details about this here, and for reasons that will become clear in a moment, I don't recommend switching to this, but the short version is: you can write a method that behaves exactly like a ValueTask awaitable method (including async), but the library makes the compiler generate different code that uses IValueTaskSource to avoid the Task allocation, and uses state machine boxing to reduce the other allocations. It works pretty well, but as you might expect, it has the above caveat about awaiting things more than once.

So; why am I saying don't leap at this? That's because the BCL folks are also now playing in this space, as evidenced by this PR, which has pretty much the exact same feature set, but the advantages of:

  • being written by people who really, really understand async
  • it not adding any dependencies - it would just work out of the box for ValueTask awaitables

If that happens, then a lot of asynchronous code will magically get less allocatey all at once. I know this is something they've discussed in the past, so maybe my "Pooled Await" stuff gave them the metaphorical kick to go and take another look at implementing it for real; or maybe it was just a timing coincidence.

For both my own implementation and the BCL version, it can't do all the magic if you return Task - for best results, a ValueTask is needed (although "Pooled Await" still reuses the state-machine boxes for Task APIs)

Conclusion

So, going back to the earlier question of when to use Task vs ValueTask, IMO the answer is now obvious:

Use ValueTask[<T>], unless you absolutely can't because the existing API is Task[<T>], and even then: at least consider an API break

And also keep in mind:

Only await any single awaitable expression once

If we put those two things together, libraries and the BCL are free to work miracles in the background to improve performance without the caller needing to care.

Thursday, 21 February 2019

Fun with the Spiral of Death

Subtitled: "a cautionary tale of SemaphoreSlim", an adventure in two parts:

  • In part 1 I want to discuss a very fun series of problems we had in some asynchronous code - where "fun" here means "I took Stack Overflow offline, again". Partly because it is a fun story, but mostly because I think there's some really useful learning points in there for general adventures in asynchronicity
  • In part 2 I want to look at some of the implementation details of our eventual fix, which covers some slightly more advanced themes around how to implement awaitable code in non-trivial scenarios

I took Stack Overflow offline, again

As a side note: many of the themes here run hand-in-hand with David and Damian's recent presentation "Why your ASP.NET Core application won't scale" at NDC; if you haven't seen it yet: go watch it - in particular everything around "the application works fine until it suddenly doesn't" and "don't sync-over-async or async-over-sync".

A lot of this journey relates to our migration of StackExchange.Redis to use "pipelines", the new IO layer in .NET (previously discussed here, here, here, and here - I love me some pipelines). One of the key design choices in StackExchange.Redis is for the library to implement multiplexing to allow multiple concurrent calling threads to communicate over the same underlying socket to the server; this keeps the socket count low while also helping to reduce packet fragmentation, but it means that we need to do some synchronization around how the many caller threads access the underlying socket.

Before the pipeline migration, this code was basically synchronous (it was a bit more complex, but… that's close enough), and the "write an actual command" code could be expressed (if we take some liberties for readability) as below:

readonly object syncLock = new object(); // single writer

void WriteMessage(Message message)
{
    bool haveLock = false;
    try
    {
        Monitor.TryEnter(syncLock, timeout, ref haveLock);
        if (!haveLock) ThrowTimeout();

        ActuallyWriteTheThing(message);
        Flush();
    }
    finally
    {
        if (haveLock) Monitor.Exit(syncLock);
    }
}

This is a fairly normal style of coding - the try/finally/Monitor/haveLock code here is just a standard implementation of "lock with a timeout", so all this really does is:

  • try to acquire exclusive access to the socket, guarded by syncLock
  • if successful, write and flush

All reasonable. But then we moved to pipelines, and one of the defining features of the pipelines implementation is that key steps in it are async. You might assume that it is the write that is async - but since you write to a buffer pool, this isn't actually the case - it's the flush that is async. The flush in pipelines achieves a few different things:

  • if necessary, it activates the consumer that is pulling work from the pipe and sending it to the next step (a socket in our case)
  • it provides back-pressure to the provider (WriteMessage in this case), so that if the consumer is falling behind and there's too much backlog, we can slow down the provider (in an asynchronous way) so we don't get unbounded buffer growth

All very neat.
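
To make that concrete, here's a minimal sketch (not the library's actual code) of a write against the pipelines API: the copy into the buffer pool is synchronous, and it is FlushAsync that activates the consumer and applies back-pressure:

using System.IO.Pipelines;

static async ValueTask WriteAsync(PipeWriter writer, ReadOnlyMemory<byte> payload)
{
    // synchronous: rent space from the buffer pool and copy the payload in
    payload.Span.CopyTo(writer.GetSpan(payload.Length));
    writer.Advance(payload.Length);

    // asynchronous: this is where the consumer gets activated, and where
    // back-pressure can make us (asynchronously) wait if there's a backlog
    FlushResult result = await writer.FlushAsync();
    if (result.IsCompleted)
    {
        // the reader has finished; the producer should stop writing
        return;
    }
}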

But switching from synchronous code to an API that uses async is not always trivial - async begets async, and once you start going async, it all goes async. So… I did a bad thing; I was lazy, and figured "hey, flush will almost always complete synchronously anyway; we can probably get away with a sync-over-async here" (narrator: they didn't get away with it).

So; what I did was something like:

readonly object syncLock = new object(); // single writer

void WriteMessage(Message message)
{
    bool haveLock = false;
    try
    {
        Monitor.TryEnter(syncLock, timeout, ref haveLock);
        if (!haveLock) ThrowTimeout();

        ActuallyWriteTheThing(message);
        FlushSync();
    }
    finally
    {
        if (haveLock) Monitor.Exit(syncLock);
    }
}

void FlushSync() // evil hack, DO NOT USE
{
    var flush = FlushAsync();
    if (!flush.IsCompletedSuccessfully)
    {
        // sync-over-async: block this thread until the flush completes
        flush.Wait();
    }
}

The IsCompletedSuccessfully here is a check you can use on many task-like (awaitable) results to see if it completed synchronously and without faulting; if it did, you're safe to access the .Result (etc.) and it will all be available already - a good trick for avoiding the async state-machine complications in high-throughput code (typically library code, not application code). The bad bit is the .Wait(…) when it isn't already completed - this is a sync-over-async.
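
For contrast, the legitimate version of that trick looks something like the following sketch (assuming, as with pipelines, that FlushAsync() returns a ValueTask<FlushResult>): take the synchronous fast-path when the flush has already completed, and otherwise genuinely await rather than block:

ValueTask FlushCore()
{
    var flush = FlushAsync();
    if (flush.IsCompletedSuccessfully)
    {
        // already complete; observe the result (so exceptions/pooling behave correctly)
        flush.GetAwaiter().GetResult();
        return default; // a completed ValueTask - no state machine needed
    }
    // not complete: switch to a real await on a separate async path
    return Awaited(flush);

    static async ValueTask Awaited(ValueTask<FlushResult> pending)
        => await pending;
}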

What happened next?

A key thing to keep in mind is that StackExchange.Redis exposes both synchronous and asynchronous APIs - i.e. there are twin methods, for example:

  • RedisValue StringGet(RedisKey key)
  • Task<RedisValue> StringGetAsync(RedisKey key)

Internally they are implemented very differently so that they both get the job done with the minimum of fuss and overhead, but they were both calling into the same WriteMessage at some point. Actually, never afraid to double-down on the anti-patterns, this means that for the async callers, they were effectively doing async-over-sync-over-async; ouch.

The WriteMessage code above is used from both the synchronous and asynchronous call paths. As it happens, our internal application codebase mostly uses the synchronous paths (we're gradually adding more async, but we need to complete our in-progress transition from .NET Framework to .NET Core before we can do so more extensively), and on the synchronous paths you were always going to be blocked anyway, so from the perspective of synchronous callers, there's not really that much wrong with the above. It does what it promises: execute synchronously.

The problem here comes from asynchronous callers, who thought they were calling StringGetAsync, and their thread got blocked. The golden rule of async is: don't block an async caller unless you really, really have to. We broke this rule, and we had reports from users about big thread-jams with async call paths all stuck at WriteMessage, because one thread had paused for the flush, and all the other threads were trying to obtain the lock.

Note: the problem here isn't that "a backlog happened, and we had to delay" - that's just business as normal. That happens, especially when you need mutex-like semantics. The problem is that we blocked the worker threads (although we did at least have the good grace to include a timeout), which under heavy load caused thread-pool starvation and a cascading failure (again: watch the video above).

So what should we have done in theory?

Given that we have both synchronous and asynchronous call-paths, what we should do is have two versions of the write code:

  • void WriteMessage(Message message)
  • ValueTask WriteMessageAsync(Message message)

but we get into immediate problems when we talk about our locking mechanism. We can see this more clearly if we use a simple lock rather than the more complex Monitor usage above - the following does not compile:

async ValueTask Foo()
{
    lock (syncLock)
    {
        // CS1996 - Cannot await in the body of a lock statement
        await Task.Delay(SomeAmount);
    }
}

The reason this doesn't work is that lock (aka Monitor) is thread-oriented. You need to [Try]Enter (take the lock) and Exit (release the lock) the constrained region from the same thread. But the moment you await, you're saying "this might continue synchronously, or it might resume later on a different thread". This actually has two consequences:

  • it would mean that when we try to release the lock, it will fail because the resuming thread probably won't actually have it
  • when we await, we're releasing the current thread back to do whatever else needs doing… which could actually end up calling back into Foo… and Monitor is "re-entrant", meaning: if you have the lock once, you can actually lock again successfully (it maintains a counter internally), which means that code in a completely unrelated execution context could incorrectly end up inside the lock, before we've resumed from the await and logically released it
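
To make both of those failure modes concrete, here's a sketch (purely illustrative, reusing the placeholder names from the earlier code) of what the manual-Monitor version would look like with an await in the middle; unlike lock, this compiles - it is just wrong:

async ValueTask WriteMessageBrokenAsync(Message message)
{
    bool haveLock = false;
    try
    {
        Monitor.TryEnter(syncLock, timeout, ref haveLock);
        if (!haveLock) ThrowTimeout();

        ActuallyWriteTheThing(message);
        await FlushAsync(); // we may resume on a *different* thread...
    }
    finally
    {
        // ...so this Exit can throw SynchronizationLockException; and while we
        // were away, a re-entrant caller on the original thread could have
        // "acquired" the lock and started writing in the middle of our work
        if (haveLock) Monitor.Exit(syncLock);
    }
}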

As a side note, it is worth knowing that the compiler only spots this (CS1996) if you use lock; if you use manual Monitor code (because you need timeouts), it won't warn you - as the sketch above shows, it compiles without complaint, so you just need to know not to do this (which is perhaps, by itself, good motivation for "lock with timeout" as a language feature). Fortunately, I did know not to do this, and I moved to the next most obvious locking primitive: SemaphoreSlim.

A semaphore is like Monitor, but instead of being thread-based, it is purely counter-based. Theoretically you can use a semaphore to say "no more than 5 in here", but in reality it is often used as a mutex by saying "no more than 1". SemaphoreSlim is particularly enticing because it has both synchronous and asynchronous APIs, allowing us to split our code in two fairly neatly:

readonly SemaphoreSlim singleWriter
    = new SemaphoreSlim(1); // single writer

void WriteMessage(Message message)
{
    if (!singleWriter.Wait(timeout))
        ThrowTimeout();
    try
    {
        ActuallyWriteTheThing(message);
        FlushSync(); // our hack from before
    }
    finally
    {
        singleWriter.Release();
    }
}

async ValueTask WriteMessageAsync(Message message)
{
    if (!await singleWriter.WaitAsync(timeout))
        ThrowTimeout();
    try
    {
        ActuallyWriteTheThing(message);
        await FlushAsync();
    }
    finally
    {
        singleWriter.Release();
    }
}

This looks broadly similar to what we had before; the new SemaphoreSlim(1) initializes the semaphore with a limit of 1, i.e. a mutex. The synchronous path works mostly like it always did, but the asynchronous path (now used by the asynchronous callers) correctly releases worker threads back to wherever worker threads go - either when they can't get the lock yet, or when they are waiting (or rather: awaiting) on the flush. We still have the sync-over-async in the sync path, but that's not really a problem in this case - and we've completely fixed the async path. Short of removing or optionally disabling the sync path (an idea I'm putting serious thought into, as an opt-in thing), that's probably about as good as we can get.

This looks like it should work, and the chances are that this would have completely solved the problems being seen by our consumers with heavily asynchronous workloads. But one of the nice things about working at Stack Overflow is that I have an opportunity to dogfood library releases under Stack Overflow load (which isn't "big big" by any stretch, but it is comfortably big enough to give me confidence that the library isn't pathologically broken). So, we dropped the above changes into production (after testing etc.), and: BOOM!

We went down.

What happened there?

Fortunately, we were lucky enough to manage to grab some process dumps from the production servers in their death-throes before we stood them back up (with the older version of the library), and the stack-traces in the doomed processes were very interesting; they are pretty verbose, but something that kept recurring (note: I've inverted and summarised this trace for readability):

WriteMessage
…
System.Threading.SemaphoreSlim.Wait
…
System.Threading.SemaphoreSlim.WaitAsync
…
KERNELBASE!WaitForMultipleObjectsEx

This was the case for 650+ threads - almost all of them; and critically, no-one actually had the lock - nobody was doing anything useful. The semaphore had somehow failed to activate the lucky winner of the conch.

What actually went wrong?

Looking at it, our synchronous WriteMessage implementation, when calling Wait on the semaphore, was calling into WaitAsync, and then blocking at the kernel for the object. Despite looking odd, this by itself isn't actually a terrible idea. It turns out that SemaphoreSlim has different strategies that it uses internally:

  • if you're just using synchronous Wait calls, it can handle everything using regular synchronous code and synchronous blocking
  • if you're just using WaitAsync, because it wants to release the caller promptly, it needs to maintain a queue (actually a linked-list) of waiting callers as Task<bool>; when you release something, it takes the next item from one end of the list, and reactivates (TrySetResult) that caller
  • if you're using a mixture of Wait and WaitAsync, if it can't get access immediately, then it uses the WaitAsync approach so that the Wait and WaitAsync consumers are in the same queue - otherwise you'd have two separate queues and it gets very confusing and unpredictable

Now this seems fine, but it turns out that the way it was using TrySetResult was… problematic. It wasn't using TrySetResult directly, but instead was enqueuing a work item to do the TrySetResult. There's actually a good - albeit now legacy - reason for this: thread stealing, another problem I've had to contend with many times.

When you call TrySetResult etc. on a Task<T> (usually via TaskCompletionSource<T>), it is possible (likely, even) that the async continuation is going to run immediately and inline on the thread that called TrySetResult. This is something you need to be really careful about - it can lead to dedicated IO threads somehow ending up serving web requests; or more generally: just … not doing what you expected. But in the scenario presented we got into a "spiral of death": due to a very brief blip from the FlushAsync, our workers had got stuck in the Wait->WaitAsync path, and the very thing that was meant to unblock everything: needed a worker. To release (resource) you need more of (resource), and (resource) is currently exhausted. It is almost impossible to recover from that situation due to the growth limits on workers, and the servers became increasingly unstable until they stopped working completely.

This is clearly a dangerous scenario, so we reported it as an issue, and amazingly within a day Stephen Toub had a surprisingly minimal and elegant fix for SemaphoreSlim. The commit message (and code changes themselves) explain it in more depth, but by configuring the queued tasks with the TaskCreationOptions.RunContinuationsAsynchronously flag, it means the "release" code can call TrySetResult directly, without needing an extra worker as an intermediary. In the specific case where the only thing waiting on the task is a synchronous Wait, the task code already has specific detection to unblock that scenario directly without needing a worker at all, and in the genuine async/await case, we just end up with the actual work going to the queue, rather than the "call TrySetResult" going to the queue. Tidy!
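
If you haven't seen TaskCreationOptions.RunContinuationsAsynchronously before, here's a minimal sketch (nothing to do with the SemaphoreSlim internals - just the flag itself) of the difference it makes when completing a TaskCompletionSource<T>:

var inline = new TaskCompletionSource<bool>();
var deferred = new TaskCompletionSource<bool>(
    TaskCreationOptions.RunContinuationsAsynchronously);

// imagine other code is already awaiting inline.Task and deferred.Task...

inline.TrySetResult(true);   // the awaiting continuation may run right here,
                             // inline on whatever thread called TrySetResult
deferred.TrySetResult(true); // the continuation is scheduled instead, so the
                             // completing thread isn't hijacked to run it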

But that isn't the end of the story

It would be nice to say "all's well that ends well; bug in SemaphoreSlim fixed", but it isn't as easy as that. The fix for SemaphoreSlim has been merged, but a) that won't help "us" until the next .NET Framework service release, and b) as library authors, we can't rely on which service releases are on our consumers' machines. We need a fix that works reliably everywhere. So whilst it is great to know that our pain has improved things for future users of SemaphoreSlim, we needed something more immediate and framework-independent. So that's when I went away and created a bespoke synchronous/asynchronous MutexSlim that we are now using in StackExchange.Redis.

It is amazing how much simpler things become if you limit yourself to "0 or 1 people in the pool", so it wasn't actually that much work; but: I thought I knew a lot about async/await, yet in writing MutexSlim I dove deeper into that topic than I have usually had to; and in the second part I'll talk about some of what I learned.