Tuesday, 7 January 2020

.NET Core, .NET 5; the exodus of .NET Framework?

tl;dr: opinion: ongoing .NET Framework support for F/OSS libraries may quickly start evaporating, and this should be a consideration in migration planning.


First, a clarification of terms, because they matter:

  • .NET Framework - the original .NET, the one that ships on Windows and only on Windows; the current (and probably final) version of .NET Framework is 4.8
  • .NET Core - the evolution of .NET, that is not tied to the OS as much, with slightly different feature sets, and where most of the Microsoft .NET effort has been for the last few years; .NET Core 3.1 shipped recently
  • .NET Standard - an API definition (not implementation - akin to an interface) that allows a library to target a range of platforms in a single build, i.e. by targeting .NET Standard 2.0 a library can in theory run equivalently on .NET Core 3 and .NET Framework 4.6.2 (ish...) and others (Mono, Unity, etc), without needing to target each individually
  • .NET 5 - the next version of .NET Core; the naming deliberately emphasizes that there isn't a two-pronged development future consisting of "Framework" and "Core", but just one - this one - which isn't "Core" in the "minimal" sense, but is in fact now a very rich and powerful runtime; .NET 4 was avoided to prevent versioning confusion between .NET 4.* and .NET Framework 4.* (and again, to emphasize that this is the future direction of .NET, including if you are currently on .NET Framework)

The first thing we must be clear about, in case it isn't 100% clear from the above, is that .NET Framework is complete (or "legacy", if you prefer). There isn't going to be a .NET Framework 4.9 or a .NET Framework 5. There might be some critical security fixes, but there aren't going to be feature additions, unless those additions come from out-of-band NuGet (etc) packages that just happened to work on .NET Framework on the first (or maybe second) try.

I commented on Twitter yesterday about my perceptions on the status of this, and how we (the .NET community) should look at the landscape; it goes without saying that I'm merely opining here - I'm not a spokesperson for Microsoft, but I am a library author and consumer, and I work extensively in the .NET space. Other views and conclusions are possible! But: I wanted to take the time to write up a more long-form version of what I see, with space to give reasons and discuss consequences.

What I said yesterday

The short version is: I expect that 2020 will see a lot of library authors giving serious consideration as to whether to continue shipping .NET Framework support on new library versions. There are lots of reasons for this, including:

  • growing feature gaps making it increasingly expensive to support multiple frameworks, either via compatibility shims or framework-dependent feature sets
  • as more and more library authors complete their own migrations to .NET Core, the effort required to support a framework that they aren't using increases:
    • bugs don't get spotted until they've shipped to consumers
    • a lot of knowledge of "the old framework" needs to be retained and kept in mind - a particular issue with new contributors who might never have used that framework (and yes, there are some huge gotchas)
    • there are often two (or more) code implementations to support
    • builds are more complicated than necessary (requiring either Windows or the build-pack), and take longer
    • tests take longer and require Windows
    • packages and dependency trees are larger than necessary
  • not all new language features are equal citizens on down-level frameworks
    • some features, such as default interface methods, will not work on down-level frameworks
    • some important features like C# 8 nullability are in a weird middle ground where some bits kinda work sometimes, most of the time, except when they don't
    • some, like IAsyncEnumerable<T>, may have compatibility shims, but that only allows minimal support on library surfaces, since of course many framework-level pieces to produce or consume such will be missing
  • some APIs are fundamentally brittle on .NET Framework, especially when multi-targeting, with the breaks happening only at run-time (they are not obvious at build, and may not be obvious until a very specific code-path is hit, which might be a long time after initial deployment); a lot of this comes down to the assembly loader and assembly-binding-redirects (a problem that simply does not exist in .NET Core / .NET 5)
    • if you want to see a library author cry, mention System.ValueTuple, System.Numerics.Vectors, or System.Runtime.CompilerServices.Unsafe. Why? Because they are deployment nightmares if you are targeting multiple platforms, because .NET Framework makes a complete pig's ear of them; you can just about fix it up with assembly-binding-redirects some of the time, but the tooling will not and can not do this for you, which is pure pain for a library author
    • recall that .NET Framework is "complete"; the loader isn't going to be fixed (also, nobody wants to touch it); alternatively, it could be said that the loader has already been fixed; the fix is called .NET Core / .NET 5
  • a lot of recent performance-focused APIs are not available on .NET Framework, or perform very differently (which is almost the worst possible outcome for performance-focused APIs!); for example:
    • concurrency: a lot of async APIs designed for highly concurrent systems (servers, in particular) will be simply missing on .NET Framework, or may be implemented via async-over-sync / sync-over-async, which significantly changes the characteristics
    • allocations: there are a lot of new APIs designed to avoid allocations, typically in library code related to IO, data-processing etc - things like Span<T>; the APIs to interact with the framework directly via these often won't exist on .NET Framework, forcing dual code paths (see the sketch after this list), but even when they do, .NET Framework uses a different (and less optimal) Span<T> implementation, and the JIT lacks the knowledge to make Span<T> be magical; you can hack over some of the API gaps using pointer-based APIs when they exist, but then you might be tempted to use Unsafe.*, which as already mentioned: wants to kill you
    • processing: one of the most powerful new toolkits in .NET for CPU-focused work is access to SIMD and CPU intrinsics; both of these work especially well when mixed with spans, due to the ability to coerce between spans and vectors - but we just saw how Span<T> is problematic; full CPU intrinsics are only available on .NET Core / .NET 5, but you can still get a lot done by using Vector<T> which allows SIMD on .NET Framework... except I already mentioned that System.Numerics.Vectors is one of the trifecta of doom - so yes, you can use it, but: brace yourself.
    • now consider that a lot of libraries - including Microsoft libraries on NuGet, and F/OSS libraries - are starting to make more and more use of these features for performance, and you start to see how brittle things get, and it often won't be the library author that sees the problem.
  • as .NET Core / .NET 5 expand our ability to reach more OSes, we already have enough permutations of configurations to worry about.
  • often, the issues here may not be just down to a library, but may be due to interactions of multiple libraries (or indeed, conflicting dependencies of multiple libraries), so the issues may be unique to specific deployments.
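
To make the "dual implementations" point concrete, here is a minimal sketch (type and member names invented for illustration; assumes using System and using System.Buffers) of the kind of fork that multi-targeting forces on a library author - the span-based Stream overload simply doesn't exist on .NET Framework, so that build needs a fallback:

#if NETCOREAPP
    public int Read(Span<byte> buffer)
        => _stream.Read(buffer); // span-based overload: no intermediate array
#else
    public int Read(Span<byte> buffer)
    {
        // no Stream.Read(Span<byte>) on .NET Framework: rent an array, copy, return it
        byte[] oversized = ArrayPool<byte>.Shared.Rent(buffer.Length);
        try
        {
            int bytes = _stream.Read(oversized, 0, buffer.Length);
            new Span<byte>(oversized, 0, bytes).CopyTo(buffer);
            return bytes;
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(oversized);
        }
    }
#endif

Every fork like this is double the code to test, and a place where the two builds can quietly drift apart in behaviour.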

How about just offering patch support, not feature support?

So the theory here is that we can throw our hands in the air, and declare "no new features in the .NET Framework version - but we'll bugfix". This sounds great to the consumer, but... it isn't really very enticing to the maintainer. In reality, this means branching at some point, and now ... what happens? We still retain all of the build, test, deploy problems (although now we might need completely different build/CI tools for each), but now we have two versions of the code that are drifting apart; we need to keep all the old things in our mind for support, and when we bugfix the current code, we might also need to backport that bug into a branch that uses very different code, and test that. On a platform that the library maintainers aren't using.

F/OSS isn't free; it is paid for by the maintainers. When proposing something like the above, we need to be very clear about whose time we are committing, and why we feel entitled to commit it. Fundamentally, I don't think that option scales very well. At some point, I think it becomes increasingly necessary to think of .NET Framework in the same way that we have thought of .NET 1.* for a very long time - it is interesting to know that it exists, but the longer you stay stuck on that island, the harder life is going to become for you.

In particular, to spell it out explicitly; I expect a number of libraries will start rebasing to .NET Standard 2.1 and .NET Core 3.0 or 3.1 as their minimum versions, carving off .NET Framework. The choice of .NET Standard 2.1 here isn't necessarily "because we want to use APIs only available in 2.1", but is instead: "because we actively don't want .NET Framework trying to run this, and .NET Framework thinks, often mistakenly, that it works with .NET Standard 2.0" (again, emphasis here is that .NET Framework 4.6.2 only sort of implements .NET Standard 2.0, and even when it does, it drags in a large dependency graph; this is partly resolved if you also target .NET Framework 4.7.2, but your list of TFMs is now growing even further).

So what happens to .NET Framework folks?

I totally get that a lot of people will be stuck on .NET Framework for the foreseeable future. Hell, a lot of our code at Stack Overflow is still .NET Framework (we're working through migration). I completely understand and empathize with all the familiar topics of service lifetimes, SLAs, budgets, clients, contracts/legals, and all of those things.

Just like nobody is coming to take .NET Framework off your machine, nobody is coming to take F/OSS libraries either. What I'm saying is that a time may come - and it is getting closer on the horizon - when you just won't get updates. The library you have today will continue working, and will still be on NuGet, but there won't be feature updates, and very few (for the reasons above) bug fixes.

I know I've spoken about open source funding before, but: at some point, if your business genuinely needs additional support on .NET Framework where it is going to create significant extra work (see: everything above) for the maintainers, perhaps at some point this is simply a supply-chain issue, and one solution is to sponsor that work and the ongoing support. Another option may be to fork the project yourself at the point where you're stuck, and maintain all the changes there, perhaps even supporting the other folks using that level. If you're thinking "but that sounds like a lot of effort": congratulations, you're right - it is! That's why it isn't already being done. All such work is zero sum; time spent on the additional work needed to support .NET Framework is time not being spent actually developing the library for what the maintainer wants and needs, and: it is their time being spent.

Conclusion

A lot of what I've discussed here is opinion; I can't say for sure how it will play out, but I think it is a very real (and IMO likely) possibility. As such, I think it is just one facet of the matrix you should be considering in terms of "should we, or when should we, look to migrate to .NET Core / .NET 5"; key point: .NET Core 3.1 is an LTS release, so frankly, there's absolutely no better time than now. Is migrating work? Yes, it is. But staying put also presents challenges, and I do not believe that .NET Framework consumers can reasonably expect the status-quo of F/OSS support (for .NET Framework) to continue.

(the Twitter thread)

Friday, 23 August 2019

Prefer ValueTask to Task, always; and don't await twice

Preamble - not a part 2

A little while ago I blogged here and I set it up to be a "continues..." style post. I haven't had the energy to continue it in that context, and this fact was putting me off concluding the post. I then realised: the thing that matters isn't some overarching narrative structure, but that I get my ideas down. So: I'm aborting any attempt at making this post a continuation, and just focusing on the content!

Prefer ValueTask[<T>] to Task[<T>], always.

There's been a lot of confusion over when to use Task[<T>] vs ValueTask[<T>] (note: I'm going to drop the [<T>] from now on; just pretend they're there when you see Task / ValueTask etc).

Context: what are Task and ValueTask?

In case you don't know, Task and ValueTask are the two primary implementations of "awaitable" types in .NET; "awaitable" here means that there is a duck-typed signature that allows the compiler to turn this:

int i = await obj.SomeMethodAsync();

into something like this:

var awaiter = obj.SomeMethodAsync().GetAwaiter();
if (!awaiter.IsCompleted)
{
    // voodoo here that schedules a
    // continuation that resumes here
    // once the result becomes available
}
int i = awaiter.GetResult();

Task is the original and most well-known API, since it shipped with the TPL, but it means that an object allocation is necessary even for scenarios where it turns out that the result was already available, i.e. awaiter.IsCompleted returned true. The ValueTask value-type (struct) acts as a hybrid result that can represent either an already-completed result (without allocating) or an incomplete pending operation. You can implement your own custom awaitables, but it isn't common.

When to choose each, the incorrect version

If you'd asked me a while back about when to choose each, I might have incorrectly said something like:

Use Task when something is usually or always going to be genuinely asynchronous, i.e. not immediately complete; use ValueTask when something is usually or always going to be synchronous, i.e. the value will be known inline; also use ValueTask in a polymorphic scenario (virtual, interface) where you can't know the answer.

The logic behind this incorrect statement is that if something is incomplete, your ValueTask is going to end up being backed by a Task anyway, but without the extra indirection and false promise of ValueTask. This is incorrect, though, because it is based on the premise that a ValueTask is a composite of "known result (T)" and "Task". In fact, ValueTask is also a composite of a third thing: IValueTaskSource[<T>].
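
To make that concrete, here is a quick sketch of the three shapes a ValueTask<int> can take (someTask, someSource and token here are placeholders, not real APIs):

var known   = new ValueTask<int>(42);                // result already known: no allocation
var wrapped = new ValueTask<int>(someTask);          // backed by a Task<int>
var sourced = new ValueTask<int>(someSource, token); // backed by an IValueTaskSource<int>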

What is IValueTaskSource[<T>]?

IValueTaskSource is an abstraction that allows you to represent the logical behaviour of a task separately to the result itself. That's a little vague, so an example:

IValueTaskSource<int> someSource = // ...
short token = // ...
var vt = new ValueTask<int>(someSource, token);
// ...
int i = await vt;

This now functions like you'd expect from an awaitable, but even in the incomplete/asynchronous case the logic about how everything works is now down to whatever implements the interface - it does not need to be backed by a Task. You might be thinking:

ah, but we still need an instance of whatever is implementing the interface, and we're treating it as a reference, so: we're still going to allocate; what's the point? what have you gained?

And that's when I need to point out the short token. This little gem allows us to use the same interface instance with multiple value-tasks, and have them know the difference. There are two ways you could use this:

  • keep the state for multiple asynchronous operations concurrently, using the token to pick the correct state (presumably from a vector)
  • keep a single piece of state for multiple consecutive operations, using the token to guarantee that we're talking about the correct one

The second is actually by far the more common implementation, and in fact is now included in the BCL for you to make direct use of - see ManualResetValueTaskSourceCore<T>.
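
To show the shape of that, here is a minimal sketch (error and cancellation handling omitted; this is an illustration, not the real BCL internals) of a reusable source built on ManualResetValueTaskSourceCore<T>:

using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Sources;

sealed class ReusableSource : IValueTaskSource<int>
{
    private ManualResetValueTaskSourceCore<int> _core; // mutable struct; deliberately not readonly

    // hand out a value-task tied to the *current* token
    public ValueTask<int> AwaitNext() => new ValueTask<int>(this, _core.Version);

    // the producer calls this when the answer is ready
    public void SetResult(int result) => _core.SetResult(result);

    int IValueTaskSource<int>.GetResult(short token)
    {
        int result = _core.GetResult(token); // throws if the token is stale
        _core.Reset(); // invalidate the old token, ready for the next consumer
        return result;
    }
    ValueTaskSourceStatus IValueTaskSource<int>.GetStatus(short token)
        => _core.GetStatus(token);
    void IValueTaskSource<int>.OnCompleted(Action<object> continuation,
        object state, short token, ValueTaskSourceOnCompletedFlags flags)
        => _core.OnCompleted(continuation, state, token, flags);
}

Note the Reset() in GetResult: that is what invalidates the old token so the instance can be handed out again - which is exactly where the caveat below comes from.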

So what? How does this help me?

OK; so - we've seen that this alternative exists. There are two ways that people commonly author awaitable APIs today:

  • using TaskCompletionSource<T> and handing the caller the .Task (perhaps wrapped in a ValueTask), and calling TrySetResult etc when we want to trigger completion (see the sketch after this list)
  • using async and await, having the compiler generate all the machinery behind the scenes - noting that this currently involves creating a Task in the incomplete case, even for ValueTask methods (because it has to come from somewhere)
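
The first of those, as a minimal sketch (SomeOperationAsync is invented for illustration):

ValueTask<int> SomeOperationAsync()
{
    var tcs = new TaskCompletionSource<int>();
    // ... start work that will later call tcs.TrySetResult(...) ...
    return new ValueTask<int>(tcs.Task); // a ValueTask wrapping the Task
}

Note that in the incomplete case this allocates a Task per call - which is precisely the cost an IValueTaskSource-backed implementation can avoid.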

Hopefully you can see that if we have ValueTask available to us it is relatively easy to substitute in a ManualResetValueTaskSourceCore backer, allowing us to reuse the same IValueTaskSource instance multiple times, avoiding lots of allocations. But: there's an important caveat - it changes the API. No, really. Let's take a stroll to discuss how...

Don't await twice

Right now, the following code works - assuming the result is backed by either a fixed T or a Task<T>:

var pending = obj.SomeMethodAsync();
int i = await pending;
// ...
int j = await pending;

You'll get the same answer from each await, unsurprisingly - but the actual operation (the method) is only performed once. But: if we switch to ManualResetValueTaskSourceCore, we should only assume that each token is valid exactly once; once we've awaited the result, the entire point is that the backing implementation is free to re-use that IValueTaskSource with a different token for another consumer. That means that the code shown above is no longer legal, and we should expect that the second await can now throw an exception about the token being incorrect.

This is a pretty rare thing to see in code, so personally I'm OK with saying "tough; await once only". Think of it in human terms; this is like a manager going to someone's desk and saying:

Hi, I need the answer to (some topical question); do you know that now? if so, tell me now; otherwise, when you have the answer, bring it (somewhere) and nudge me.

All fine and reasonable so far; our office hero didn't know the answer right away, so they went away and got it, took it where instructed and handed the answer to the manager.

20 minutes later (or 2 days later), the manager stops by their desk again:

Hey, give me that answer

At this point, our hero might reasonably say

Boss, I already gave it you; I only printed it out once - you have the copy; I deal with lots of requests each day, and I can't even remember what you asked about, let alone what the answer was; if you've forgotten the answer, that's on you - feel free to ask again, it's all billable

This is kinda how I anthropomorphize ValueTask, especially in the context of IValueTaskSource. So key point: don't await twice. Treat the results of awaitables exactly the same as you would the result of any other expression: if you are going to need the value twice, store it in a local when you first fetch it.
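
In code terms, that just means:

int i = await obj.SomeMethodAsync(); // await exactly once...
// ... and from here, re-use i as much as you like; it's just a value now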

How else can we benefit from IValueTaskSource?

So; we've seen how we can manually use an IValueTaskSource to efficiently issue ValueTask awaitable results; but if we use async/await, in the incomplete / asynchronous case the compiler is still going to be generating a Task - and also generating a bunch of other state boxes associated with the continuation voodoo. But.. it doesn't have to! A while ago I did some playing in this area that resulted in "Pooled Await"; I'm not going to go into details about this here, and for reasons that will become clear in a moment, I don't recommend switching to this, but the short version is: you can write a method that behaves exactly like a ValueTask awaitable method (including async), but the library makes the compiler generate different code that uses IValueTaskSource to avoid the Task allocation, and uses state machine boxing to reduce the other allocations. It works pretty well, but as you might expect, it has the above caveat about awaiting things more than once.

So; why am I saying don't leap at this? That's because the BCL folks are also now playing in this space, as evidenced by this PR, which has pretty much the exact same feature set, but with the advantages of:

  • being written by people who really, really understand async
  • it not adding any dependencies - it would just work out of the box for ValueTask awaitables

If that happens, then a lot of asynchronous code will magically get less allocatey all at once. I know this is something they've discussed in the past, so maybe my "Pooled Await" stuff gave them the metaphorical kick to go and take another look at implementing it for real; or maybe it was just a timing coincidence.

For both my own implementation and the BCL version, it can't do all the magic if you return Task - for best results, a ValueTask is needed (although "Pooled Await" still reuses the state-machine boxes for Task APIs).

Conclusion

So, going back to the earlier question of when to use Task vs ValueTask, IMO the answer is now obvious:

Use ValueTask[<T>], unless you absolutely can't because the existing API is Task[<T>], and even then: at least consider an API break

And also keep in mind:

Only await any single awaitable expression once

If we put those two things together, libraries and the BCL are free to work miracles in the background to improve performance without the caller needing to care.

Thursday, 21 February 2019

Fun with the Spiral of Death

Subtitled: "a cautionary tale of SemaphoreSlim", an adventure in two parts:

  • In part 1 I want to discuss a very fun series of problems we had in some asynchronous code - where "fun" here means "I took Stack Overflow offline, again". Partly because it is a fun story, but mostly because I think there's some really useful learning points in there for general adventures in asynchronicity
  • In part 2 I want to look at some of the implementation details of our eventual fix, which covers some slightly more advanced themes around how to implement awaitable code in non-trivial scenarios

I took Stack Overflow offline, again

As a side note: many of the themes here run hand-in-hand with David and Damian's recent presentation "Why your ASP.NET Core application won't scale" at NDC; if you haven't seen it yet: go watch it - in particular everything around "the application works fine until it suddenly doesn't" and "don't sync-over-async or async-over-sync".

A lot of this journey relates to our migration of StackExchange.Redis to use "pipelines", the new IO layer in .NET (previously discussed here, here, here, and here - I love me some pipelines). One of the key design choices in StackExchange.Redis is for the library to implement multiplexing to allow multiple concurrent calling threads to communicate over the same underlying socket to the server; this keeps the socket count low while also helping to reduce packet fragmentation, but it means that we need to do some synchronization around how the many caller threads access the underlying socket.

Before the pipeline migration, this code was basically synchronous (it was a bit more complex, but… that's close enough), and the "write an actual command" code could be expressed (if we take some liberties for readability) as below:

readonly object syncLock = new object(); // single writer

void WriteMessage(Message message)
{
    bool haveLock = false;
    try
    {
        Monitor.TryEnter(syncLock, timeout, ref haveLock);
        if (!haveLock) ThrowTimeout();

        ActuallyWriteTheThing(message);
        Flush();
    }
    finally
    {
        if (haveLock) Monitor.Exit(syncLock);
    }
}

This is a fairly normal style of coding - the try/finally/Monitor/haveLock code here is just a standard implementation of "lock with a timeout", so all this really does is:

  • try to acquire exclusive access to the socket, guarded by syncLock
  • if successful, write and flush

All reasonable. But then we moved to pipelines, and one of the defining features of the pipelines implementation is that key steps in it are async. You might assume that it is the write that is async - but since you write to a buffer pool, this isn't actually the case - it's the flush that is async. The flush in pipelines achieves a few different things:

  • if necessary, it activates the consumer that is pulling work from the pipe and sending it to the next step (a socket in our case)
  • it provides back-pressure to the provider (WriteMessage in this case), so that if the consumer is falling behind and there's too much backlog, we can slow down the provider (in an asynchronous way) so we don't get unbounded buffer growth

All very neat.

But switching from synchronous code to an API that uses async is not always trivial - async begets async, and once you start going async, it all goes async. So… I did a bad thing; I was lazy, and figured "hey, flush will almost always complete synchronously anyway; we can probably get away with a sync-over-async here" (narrator: they didn't get away with it).

So; what I did was something like:

readonly object syncLock = new object(); // single writer

void WriteMessage(Message message)
{
    bool haveLock = false;
    try
    {
        Monitor.TryEnter(syncLock, timeout, ref haveLock);
        if (!haveLock) ThrowTimeout();

        ActuallyWriteTheThing(message);
        FlushSync();
    }
    finally
    {
        if (haveLock) Monitor.Exit(syncLock);
    }
}

void FlushSync() // evil hack, DO NOT USE
{
    var flush = FlushAsync();
    if (!flush.IsCompletedSuccessfully)
    {
        flush.Wait();
    }
}

The IsCompletedSuccessfully here is a check you can use on many task-like (awaitable) results to see if it completed synchronously and without faulting; if it did, you're safe to access the .Result (etc.) and it will all be available already - a good trick for avoiding the async state-machine complications in high-throughput code (typically library code, not application code). The bad bit is the .Wait(…) when it isn't already completed - this is a sync-over-async.

What happened next?

A key thing to keep in mind is that StackExchange.Redis exposes both synchronous and asynchronous APIs - i.e. there are twin methods, for example:

  • RedisValue StringGet(RedisKey key)
  • Task<RedisValue> StringGetAsync(RedisKey key)

Internally they are implemented very differently so that they both get the job done with the minimum of fuss and overhead, but they were both calling into the same WriteMessage at some point. Actually, never afraid to double-down on the anti-patterns, this means that for the async callers, they were effectively doing async-over-sync-over-async; ouch.

The WriteMessage code above is used from both the synchronous and asynchronous call paths. As it happens, much of our existing application codebase uses the synchronous paths (we're gradually adding more async, but we need to complete our in-progress transition from .NET Framework to .NET Core to be able to do it more extensively), and on the synchronous paths you were always going to be blocked anyway, so from the perspective of synchronous callers, there's not really that much wrong with the above. It does what it promises: execute synchronously.

The problem here comes from asynchronous callers, who thought they were calling StringGetAsync, and their thread got blocked. The golden rule of async is: don't block an async caller unless you really, really have to. We broke this rule, and we had reports from users about big thread-jams with async call paths all stuck at WriteMessage, because one thread had paused for the flush, and all the other threads were trying to obtain the lock.

Note: the problem here isn't that "a backlog happened, and we had to delay" - that's just business as normal. That happens, especially when you need mutex-like semantics. The problem is that we blocked the worker threads (although we did at least have the good grace to include a timeout), which under heavy load caused thread-pool starvation and a cascading failure (again: watch the video above).

So what should we have done in theory?

Given that we have both synchronous and asynchronous call-paths, what we should do is have two versions of the write code:

  • void WriteMessage(Message message)
  • ValueTask WriteMessageAsync(Message message)

but we get into immediate problems when we talk about our locking mechanism. We can see this more clearly if we use a simple lock rather than the more complex Monitor usage above - the following does not compile:

async ValueTask Foo()
{
    lock (syncLock)
    {
        // CS1996 - Cannot await in the body of a lock statement
        await Task.Delay(SomeAmount);
    }
}

The reason this doesn't work is that lock (aka Monitor) is thread-oriented. You need to [Try]Enter (take the lock) and Exit (release the lock) the constrained region from the same thread. But the moment you await, you're saying "this might continue synchronously, or it might resume later on a different thread". This actually has two consequences:

  • it would mean that when we try to release the lock, it will fail because the resuming thread probably won't actually have it
  • when we await, we're releasing the current thread back to do whatever else needs doing… which could actually end up calling back into Foo… and Monitor is "re-entrant", meaning: if you have the lock once, you can actually lock again successfully (it maintains a counter internally), which means that code in a completely unrelated execution context could incorrectly end up inside the lock, before we've resumed from the await and logically released it

As a side note, it is worth knowing that the compiler only spots this (CS1996) if you use lock; if you use manual Monitor code (because of timeouts), it won't warn you - you just need to know not to do this (which perhaps by itself is good motivation for "lock with timeout" as a language feature). Fortunately, I did know not to do this - and I moved to the next most obvious locking primitive: SemaphoreSlim. A semaphore is like Monitor, but instead of being thread-based, it is purely counter-based. Theoretically you can use a semaphore to say "no more than 5 in here", but in reality it is often used as a mutex by saying "no more than 1". SemaphoreSlim is particularly enticing because it has both synchronous and asynchronous APIs, allowing us to split our code in two fairly neatly:

readonly SemaphoreSlim singleWriter
    = new SemaphoreSlim(1); // single writer

void WriteMessage(Message message)
{
    if (!singleWriter.Wait(timeout))
        ThrowTimeout();
    try
    {
        ActuallyWriteTheThing(message);
        FlushSync(); // our hack from before
    }
    finally
    {
        singleWriter.Release();
    }
}

async ValueTask WriteMessageAsync(Message message)
{
    if (!await singleWriter.WaitAsync(timeout))
        ThrowTimeout();
    try
    {
        ActuallyWriteTheThing(message);
        await FlushAsync();
    }
    finally
    {
        singleWriter.Release();
    }
}

This looks broadly similar to what we had before; the new SemaphoreSlim(1) initializes the semaphore with a limit of 1, i.e. a mutex. In the synchronous path, it works mostly like it always did, but the asynchronous path (now used by the asynchronous callers) now correctly releases worker threads back to wherever worker threads go - when either they can't get the lock yet, or when they are waiting (or rather: awaiting) on the flush. We still have the sync-over-async in the sync path, but that's not really a problem in this case - but we've completely fixed the async path. Short of removing or optionally disabling the sync path (which is an idea I'm putting serious thought into doing, as an opt-in thing), that's probably about as good as we can get.

This looks like it should work, and the chances are that this would have completely solved the problems being seen by our consumers with heavily asynchronous workloads. But one of the nice things about working at Stack Overflow is that I have an opportunity to dogfood library releases under Stack Overflow load (which isn't "big big" by any stretch, but it is comfortably big enough to give me confidence that the library isn't pathologically broken). So, we dropped the above changes into production (after testing etc.), and: BOOM!

We went down.

What happened there?

Fortunately, we were lucky enough to manage to grab some process dumps from the production servers in their death-throes before we stood them back up (with the older version of the library), and the stack-traces in the doomed processes were very interesting; they are pretty verbose, but something that kept recurring (note: I've inverted and summarised this trace for readability):

WriteMessage
…
System.Threading.SemaphoreSlim.Wait
…
System.Threading.SemaphoreSlim.WaitAsync
…
KERNELBASE!WaitForMultipleObjectsEx

This was the case for 650+ threads - almost all of them; and critically, no-one actually had the lock - nobody was doing anything useful. The semaphore had, in an edge case, failed to activate the lucky winner of the conch.

What actually went wrong?

Looking at it, our synchronous WriteMessage implementation, when calling Wait on the semaphore, was calling into WaitAsync, and then blocking at the kernel for the object. Despite looking odd, this by itself isn't actually a terrible idea. It turns out that SemaphoreSlim has different strategies that it uses internally:

  • if you're just using synchronous Wait calls, it can handle everything using regular synchronous code and synchronous blocks
  • if you're just using WaitAsync, because it wants to release the caller promptly, it needs to maintain a queue (actually a linked-list) of waiting callers as Task<bool>; when you release something, it takes the next item from one end of the list, and reactivates (TrySetResult) that caller
  • if you're using a mixture of Wait and WaitAsync, if it can't get access immediately, then it uses the WaitAsync approach so that the Wait and WaitAsync consumers are in the same queue - otherwise you'd have two separate queues and it gets very confusing and unpredictable

Now this seems fine, but it turns out that the way it was using TrySetResult was… problematic. It wasn't using TrySetResult directly, but instead was enqueuing a work item to do the TrySetResult. There's actually a good - albeit now legacy - reason for this: thread stealing, another problem I've had to contend with many times.

When you call TrySetResult etc. on a Task<T> (usually via TaskCompletionSource<T>), it is possible (likely, even) that the async continuation is going to run immediately and inline on the thread that called TrySetResult. This is something you need to be really careful about - it can lead to dedicated IO threads somehow ending up serving web requests; or more generally: just … not doing what you expected. But in the scenario presented we got into a "spiral of death": due to a very brief blip from the FlushAsync, our workers had got stuck in the Wait->WaitAsync path, and the very thing that was meant to unblock everything: needed a worker. To release (resource) you need more of (resource), and (resource) is currently exhausted. It is almost impossible to recover from that situation due to the growth limits on workers, and the servers became increasingly unstable until they stopped working completely.

This is clearly a dangerous scenario, so we reported it as an issue, and amazingly within a day Stephen Toub had a surprisingly minimal and elegant fix for SemaphoreSlim. The commit message (and code changes themselves) explain it in more depth, but by configuring the queued tasks with the TaskCreationOptions.RunContinuationsAsynchronously flag, it means the "release" code can call TrySetResult directly, without needing an extra worker as an intermediary. In the specific case where the only thing waiting on the task is a synchronous Wait, the task code already has specific detection to unblock that scenario directly without needing a worker at all, and in the genuine async/await case, we just end up with the actual work going to the queue, rather than the "call TrySetResult" going to the queue. Tidy!
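
For reference, the pattern in that fix looks like this (a general sketch of the flag in question, not the actual SemaphoreSlim internals):

var waiter = new TaskCompletionSource<bool>(
    TaskCreationOptions.RunContinuationsAsynchronously);
// later, on the thread doing the release:
waiter.TrySetResult(true); // continuations are scheduled, not run inline,
                           // so the releasing thread is never hijacked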

But that isn't the end of the story

It would be nice to say "all's well that ends well; bug in SemaphoreSlim fixed", but it isn't as easy as that. The fix for SemaphoreSlim has been merged, but a) that won't help "us" until the next .NET Framework service release, and b) as library authors, we can't rely on which service releases are on our consumers' machines. We need a fix that works reliably everywhere. So whilst it is great to know that our pain has improved things for future users of SemaphoreSlim, we needed something more immediate and framework-independent. So that's when I went away and created a bespoke synchronous/asynchronous MutexSlim that we are now using in StackExchange.Redis.

It is amazing how much simpler things become if you limit yourself to "0 or 1 people in the pool", so it wasn't actually that much work; but: I thought I knew a lot about async/await, yet in writing MutexSlim I dove deeper into that topic than I have usually had to; and in the second part I'll talk about some of what I learned.

Thursday, 6 December 2018

A Thanksgiving Carol

Normally I write about programming topics (usually .NET); today I'm going to veer very far from that track - and talk about society, mental health, individual and corporate responsibility, and personal relationships. I genuinely hope you hear me out, but if that isn't your thing ... well, then you probably need to read it more than most. I could try a clever reverse psychology trick to oblige you to see it through, but you'd see straight through it... or would you?

My apologies in advance if I seem to be on a negative tone through much of this - I'm pulling no punches in something that has been a quite deep - and painful - personal journey and realisation. I assure you that it ends much more positively than the body might suggest. Maybe for me this is mostly cathartic self-indulgence and rambling, but.. it's my personal blog and I get to do that if I want. But if it makes even one person think for a few minutes, it has been well worth my time.

So; on with the real title:

Technology is Outpacing our Individual and Societal Health

This week, I've identified hugely with that famous (infamous?) festive favorite: Ebenezer Scrooge (humbug!). Not the usury part - but instead:

  • the familiar story of spending a long time making choices that cause harm
  • having some catastrophic event or events bring everything into focus
  • having a genuine yet painful inspection of those past (and present) choices
  • consideration of what those choices mean for the future
  • undergoing a fundamental transformation, a realignment of priorities and thinking, that should lead to a much happier future
  • actively walking that path with actions, not just hollow words

See, I got heavy and personal! Let's see how deep this rabbit hole goes. How to start...


Recently I nearly destroyed my marriage and a relationship of nearly 25 years.

As opening lines go, it isn't quite up there with "Marley was dead: to begin with.", but it's all I've got. It wasn't anything huge and obvious like an affair or a huge violent argument. What I did was to make - over an extended period of time - a series of bad choices about my relationship with technology.

The reality of the era is that we are absolutely surrounded by technology - it encroaches and invades on every aspect of our lives, and it has progressed so fast that we haven't really had time to figure out where "healthy" lies. I must immediately stress that I don't say this to absolve myself of responsibility; we're adults, and we must own the choices that we make, even if we make those choices in an environment that normalises them. So what do I mean?

Ultimately, the heart of my personal failings here stems from how easy - and tempting - it can be to lose ourselves in a digital world. We live in such a hyper-connected time, surrounded by flashing instant updates at every turn. It is alarmingly easy to confuse the signals that this electronic phantom universe provides, prioritising them over the real world in front of us. I'm sure we can all relate to seeing a group of people out together, whether at a bar, a meal, or some other social gathering - and seeing the mobile phones come out regularly. Don't get me started on the idiots who think they can drive while distracted by a phone. I'm certainly guilty of occasionally "parenting" by observing the digital-tablet-infused face of one of my children, by half-watching them over the top of a mobile. And I'd be lying if I said I'd never treated my marriage with the same over-familiarity bordering on contempt.

The digital world is so easy and tempting. Everything is immediate and easy. The real world takes effort, work, and time. When I was growing up, "allow 28 days for delivery" was a mantra; today, if something physical won't arrive within 28 hours we look at alternative vendors; for purely virtual items, we'd get twitchy and worried if it took 28 minutes.

I've reached the conclusion that among other things, I was - for want of a better word - in an addictive and unhealthy relationship with the internet. The internet is amazing and brilliant - and I'm not proposing we need to nuke it from orbit, but it is at our great peril that we think that it is always (or ever) without harm. We have grown complacent, when we should be treating it with respect and, yes, at times: fear - or at least concern.

We build a global platform for communicating data - all the shared collective knowledge and wisdom of the world past and present, and how do we choose to use it? If only it was "sharing cat pics", maybe the world would be a better place. Instead, as people, we mostly seem to use it for either validating ourselves in echo chambers (tip: nothing useful is ever achieved by listening to people you already agree with), or getting into angry anonymous rows with strangers. Either triggers a spurt of rewarding chemicals to the brain, but they're both usually entirely empty of any real achievement. If only that was the only mine to avoid.

Perverse Incentives and Eroded Psychological Walls

Again, I want to keep emphasizing that no personal responsibility is voided, but we haven't arrived at this place in isolation. At risk of sounding like a radical anti-capitalist (I'm not - really), corporate interests are actively averse to us having a healthy relationship with the internet. One way this materializes is in the notion of "engagement". Now; "engagement" by itself isn't an unreasonable measure, but as with most measures: the moment that we start treating it as a target, all hell breaks loose.

Because all genuine inspections should start at home, I'll start by talking about Stack Overflow. We have a measure there, on a user's profile page: consecutive days visited. We're not monsters, so we only display this on your own profile, but: I can only see negative things about this number. On its own, it adds nothing (not least: you can't compare it to anything), but: I know that at some point in history I cared about that number. I would try to do something, anything to not lose this number, including checking in while on family holidays. And here's the thing: the more you maintain it, the more it feels to lose. It is purely a psychological thing, but... when thinking about it, I can't think of a single positive use of this number. The only thing it does is encourage wholly harmful behaviours. I love our users, and I want them to be happy, rested, and healthy. Making users not want to go even 24 hours without checking in with us - well, that doesn't seem good to me. If anything, it sounds like a great way to cause burnout and frustration. I would love to start a conversation internally about whether we should just nuke that number entirely - or if anything, use it to prompt a user "hey, we really love you, but ... maybe take a day off? we'll see you next week!". As a counterpoint to that: we actively enforce a daily "rep cap", which I think is hugely positive thing towards sensible and healthy usage; I just want to call that out for balance and fairness.

Now consider: in the grand scheme of things: we're cuddly kittens. Just think what the Facebooks, Googles, etc are doing with psychological factors to drive "engagement". We've already seen the disclosures about Facebook's manipulation of feeds to drive specific responses. Corporations are often perversely incentivized to be at odds with healthy engagement. We can see this most clearly in sectors like gambling, pornography, gaming (especially in-game/in-app purchases, "pay to win"), drugs (whether legal or illicit), "psychics" (deal with the air-quotes) etc. Healthy customers are all well and good, but you make most of your money from the customers with unhealthy relationships. The idea of fast-eroding virtual "credit" is rife. If I can pick another example: I used to play quite a bit of Elite: Dangerous; I stopped playing around the time of the "Powerplay" update, which involved a mechanic around "merits" with a steep decay cycle: if you didn't play significant amounts of grind every week (without fail): you'd basically always have zero merits. This is far from unusual in today's games, especially where an online component exists. I've seen YouTube content creators talking about how they strongly feel that if they don't publish on a hard schedule, their channel tanks - and it doesn't even matter whether they're right: their behaviour is driven by the perception, not cold reality (whatever it may be).

I now accept that I had developed some unhealthy relationships with the internet. It hugely impacted my relationships at home, both in quality and quantity. I would either be unavailable, or when I was available, I would be... distracted. Checking my phone way too often - essentially not really present, except in the "meat" sense. Over time, this eroded things. Important things.

And yet as a society we've normalized it.

Let's look at some of the worst examples from above - gambling, pornography, drugs, etc: it used to be that if you had a proclivity in those directions, there would be some psychological or physical barrier: you'd need to go to the book-maker or casino, or that seedy corner-shop, or find a dealer. Now we have all of those things in our pocket, 24/7, offering anonymous instant access to the best and worst of everything the world has to offer. How would you know that your colleague has a gambling problem, when placing a bet looks identical to responding to a work email? As if that wasn't enough, we've even invented new ways of paying - "crypto-currency" - the key purposes of which are (in no particular order) "to ensure we don't get caught" and "to burn electricity pointlessly". There is possibly some third option about "decentralization" (is that just another word for "crowd-sourced money-laundering"? I can't decide), but I think we all know that in reality for most regular crypto-currency users this is a very far third option; it is probably more important for the organised criminals using it, but... that's another topic.

We Need to Maintain Vigilance

I wouldn't be saying all this if I thought it was all doom. I do think we've reached a fundamentally unhealthy place with technology; maybe we've been over-indulging in an initial excited binge, but: we really need to get over it and see where we're harming and being harmed. We don't need to obsess over our phones - those little notifications mean almost nothing. I'm absolutely not saying that I'm detaching myself from the internet, but I am treating it with a lot more respect - and caution. I'm actively limiting the times that I engage to times that I am comfortable with. There are very few things that are important enough to need your constant attention; things can wait. For most things: if it is genuinely urgent, someone will simply call you. I've completely and irrevocably blocked my access to a range of locations that (upon introspection) I found myself over-using, but which weren't helping me as a person - again, hollow validation like echo-chambers and empty arguments. I can limit my usage of things like "twitter" to useful professional interactions, not the uglier side of twitter politics. And I can ensure that in the time I spend with my family: I'm actually there. In mind and person, not just body. I've completely removed technology from the bedroom - and no, I'm not being crude there - there is a lot of important and useful discussion and just closeness time to be had there, without touching on more ... "intimate" topics. You really, really don't need to check your inbox while on the toilet - nobody deserves that; just leave the phone outside.

I got lucky; whatever problems I had, I was able to identify, isolate, and work through before they caused total destruction - and I need to be thankful for the support and patience of my wife. But it was genuinely close, and I need to acknowledge that. I'm happier today - and closer to my wife - than I have been in a long long time, mostly through my own casual fault. I'm cautious that the next person might not be so lucky. I'm also terrified about the upcoming generation of children who have very little baseline to compare to. What, for them, is "normal"? How much time at school and home are we dedicating to teaching these impressionable youths successful tactics for navigating the internet, and what that means for their "real" life? I think we can agree that when we hear of "Fortnite", "kids" and "rehab" being used in the same sentence: something is wrong somewhere.

Maybe somewhere along the line we (culture) threw the baby out with the bathwater. I'm not at all a religious person, but if I look at most established religions with that atheistic lens, I have to acknowledge that among the superstition: there are some good wisdoms about leading a good and healthy life - whether by way of moral codes (that vary hugely by religion), or by instilling a sense of personal accountability and responsibility, or by the simple act of finding time to sit quietly - regularly - and be honestly contemplative. To consider the consequences of our actions, even - perhaps especially - when we haven't had to do so directly. Humility, patience, empathy. I know in the past I've been somewhat dismissive of even non-theistic meditation, but: I suspect that it is something that I might now be in a position to appreciate.

To re-state: I'm OK; I am (and in terms of my marriage: we are) in a much better, stronger, healthier place than I (we) have been in a long time. I've had my Thanksgiving Miracle, and I've come out the other side with a renewed energy, and some fundamentally altered opinions. I'm interested in your thoughts here, but I'm not opening comments; again - we've made it too easy and anonymous! If you want to email me on this, please do (marc.gravell at gmail.com - if you could use "Thanksgiving Carol" in the subject, that'd really help me organize my inbox); I may respond, but I won't guarantee it, and I certainly won't guarantee an immediate response. I'm also deliciously conscious of the apparent irony of my blogging about the harms of the internet. But: if - as Joel assures me - "Developers are Writing the Script for the Future" - we need to start being a bit more outspoken about what that script says, and calling out when some measure of "success" of a product or service is likely impactful to healthy usage.

Closing: technology is great, the internet is great; but: we need to treat them with respect, and use them in sensible moderation. And pay lots more attention to the real world.

Saturday, 8 September 2018

Monotoolism

One Tool To Rule Them All

A recent twitter thread reminded me of a trope that I see frequently as a library author (and just as a general observer) - let’s call it “monotoolism”.

Examples of this might be:

  • “wait, you’re still using ‘LINQ to SQL’? I thought you were using ‘Dapper’?”
  • “Google’s protobuf implementation provides opinionated JSON parsing, but my JSON doesn’t fit that layout - how do I get the library to parse my layout?”
  • “how do I parse HTML with a regular expression?”
  • etc

The common theme here being the expectation that once you have one tool in a codebase that fits a particular category: that’s it - there is one and only one tool against each category; one “data access tool”, one “string parsing tool”, etc.

This has always irked me. I understand where people are coming from - they don’t want an explosion of different tools to have to worry about:

  • they don’t want an overly complex dependency tree
  • they don’t want to have to check licensing / compliance etc against a huge number of libraries
  • they don’t want to have to train everyone to use a plethora of tools
  • etc

It absolutely makes sense to minimize the dependency count, and to remove unnecessary library overlap. But the key word in that sentence: “unnecessary” - and I mean that in a fairly loose sense: you can use the handle of a screwdriver to drive in a nail if you try hard enough, but it is much easier (and you get a better outcome) if you use a hammer. I think I’d include a hammer as a “necessary” tool alongside a set of screwdrivers if you’re doing any form of construction (but is that a metric or imperial hammer?).

I often see people either expressing frustration that their chosen “one tool to rule them all” can’t do tangentially-related-feature-X, or bending their code massively out of shape to try to make it do it; sometimes they even succeed, which is even scarier as a library author - because now there’s some completely undesigned-for, unspecified, undocumented and just unknown usage in the wild (quite possibly abusing reflection to push buttons that aren’t exposed) that the library author is going to get yelled at when it breaks.

XKCD: Workflow

It is OK to use more than one tool!

Yes, it is desirable to minimize the number of unnecessary tools. But: it is OK to use more than one tool. Expected, even. You absolutely should be wary of uncontrolled tool propagation, but I strongly advocate against being too aggressive with rebukes along the lines of:

We already have a tool that does something kinda like that; can you just torture the tool and the domain model a bit and see if it works well enough to just about work?

Remember, the options here are:

  1. two (or more) different tools, each used in their intended way, closely following their respective documented examples in ways that are “obviously right” and which it is easy to ask questions of the library authors or the community
  2. one single tool, tortured and warped beyond recognition, looking nothing like… anything, where even the tool’s authors can’t understand what you’re doing (let alone why, and they’re probably too afraid to ask), where you’re the only usage like that, ever, and where your “elegant hack” might stop working in the next minor revision, because it wasn’t a tested scenario

I prefer “1”. It’ll keep your model cleaner. It’ll keep your relationship with the tool more successful. Yes, it will mean that you occasionally need more than one tool listed in a particular box. Deal with it! If the tool really is complex enough that this is problematic, just move the ugly complexity behind some abstraction, then only a limited number of people need to worry about how it works.

Always use the right tool for the job.

Thursday, 2 August 2018

protobuf-net, August 2018 update

An update on what's happening with protobuf-net

Headline: .proto processing now works directly from dotnet build and MSBuild, without any need for DSL processing steps; and - new shiny things in the future.


I haven't spoken about protobuf-net for quite a while, but: it is very much alive and active. However, I really should do a catch-up, and I'm really excited about where we are.

Level 100 primer, if you don't know what "protobuf" is

"protobuf" is Protocol Buffers, Google's cross-platform/language/OS/etc serialization format (and associated tools). It is primarily a dense binary format, but a JSON variant also exists. A lot of Google's public and private APIs are protobuf, but it is used widely outside of Google too.

The data/schema is often described via a custom DSL, .proto - which comes in 2 versions (proto2 and proto3). They both describe the same binary format.

Google provide implementations for a range of platforms including C# (note: "proto3" only), but ... I kinda find the "DSL first, always" approach limiting (I like the flexibility of "code first"), and ... the Google implementation is "Google-idiomatic", rather than ".NET idiomatic".

Hence protobuf-net exists; it is a fast/dense binary serializer that implements the protobuf specification, but which is .NET-idiomatic, and allows either code-first or DSL-first. I use it a lot.

Historically, it was biased towards "code first", with the "DSL first" tools a viable but more awkward option.
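
To give a flavour of the "code first" side, here's a minimal sketch (the Person type is invented for this example; Serializer is protobuf-net's static entry point, in the ProtoBuf namespace):

using System.IO;
using ProtoBuf;

[ProtoContract]
public class Person
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public string Name { get; set; }
}

// serialize and deserialize without any .proto file in sight
using (var ms = new MemoryStream())
{
    Serializer.Serialize(ms, new Person { Id = 123, Name = "Fred" });
    ms.Position = 0;
    var clone = Serializer.Deserialize<Person>(ms);
}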

What's changed lately?

Bespoke managed DSL parser

Just over a year ago now, back in 2.3.0, I released a new set of DSL parsing tools. In the past, protobuf-net's tooling (protogen) made use of Google's protoc tool - a binary executable that processes .proto files, but this was incredibly awkward to deploy between platforms. Essentially, the tools would probably work on Windows, but that was about it. This wasn't a great option going forward, so I implemented a completely bespoke 100% managed-code parser and code-generator that didn't depend on protoc at all. protogen was reborn (and it works with both "proto2" and "proto3"), but it lacked a good deployment route.

Playground website

Next, I threw together protogen.marcgravell.com. This is an ASP.NET Core web app that uses the same library code as protogen, but in an interactive web app. This makes for a pretty easy way to play with .proto files, including a code-editor and code generator. It also hosts protoc, if you prefer that - and includes a wide range of Google's API definitions available as imports. This is a very easy way of working with casual .proto usage, and it provides a download location for the standalone protogen tools. It isn't going to win any UI awards, but it works. It even includes a decoder, if you want to understand serialized protobuf data.

Global tools

Having a download for the command-line tools is a great step forward, but ... it is still a lot of hassle. If only there were a way of installing managed-code developer tools in a convenient way. Well, there is: .NET "global tools"; so, a few months ago I added protobuf-net.Protogen. As a "global tool", this can be installed once via

dotnet tool install --global protobuf-net.Protogen

and then protogen will be available anywhere, as a development tool. Impressively, "global tools" work across operating systems, so the exact same package will also work on Linux (and presumably Mac). This starts to make .proto very friendly to work with, as a developer.
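
Once installed, code generation from a schema is a one-liner; the switch below assumes protogen's protoc-style arguments (check protogen --help if in doubt):

protogen --csharp_out=. mymessages.proto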

Build tools

I'm going to be frank and honest: MSBuild scares the bejeezus out of me. I don't understand .targets files, etc. It is a huge blind-spot of mine, but I've made my peace with that reality. So... I was genuinely delighted to receive a pull request from Mark Pflug that fills in the gaps! What this adds is protobuf-net.MSBuild - tools that tweak the build process from dotnet build and MSBuild. What this means is that you can just install protobuf-net.MSBuild into a project, and it automatically runs the .proto → C# code-generation steps as part of build. This means you can just maintain your .proto files without any need to generate the C# as a separate step. You can still extend the partial types in the usual way. All you need to do is make sure the .proto files are in the project. It even includes the common Google import additions for free (without any extra files required), so: if you know what a .google.protobuf.Timestamp is - know that it'll work without you having to add the relevant .proto file manually (although you still need the import statement).
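
As a tiny illustration of the "extend the partial types" point - assuming the DSL declared a message Customer with a name field (invented for this example) - the generator emits partial classes, so you can add your own members in a separate file:

// your own file, alongside the generated code
public partial class Customer
{
    // Name here comes from the generated half of the partial class
    public override string ToString() => $"Customer: {Name}";
}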

I can't overstate how awesome I think these tools are, and how much friendlier it makes the "DSL first" scenario. Finally, protobuf-net can use .proto as a truly first class experience. Thanks again, Mark Pflug!

What next?

That's where we are today, but: to give an update on my plans and priorities going forwards...

Spans and Pipelines

You might have noticed me talking about these a little lately; I've done lots of research to look at what protobuf-net might do with these, but it is probably time to start looking at doing it "for real". The first step there is getting some real timings on the performance difference between a few different approaches.

AOT

In particular, platforms that don't allow IL-emit. This helps consumers like UWP, Unity, iOS, etc. They currently (usually) work with protobuf-net, but via huge compromises. To do better, we need to radically overhaul how we approach those platforms. I see two viable avenues to explore there.

  1. we can enhance the .proto codegen (the bits that protobuf-net.MSBuild just made tons better), to include generation of the actual serialization code

  2. we can implement Roslyn-based tools that pull apart code-first usage to understand the model, and emit the serialization code at build time

All of these are going to keep me busy into the foreseeable!

Monday, 30 July 2018

Pipe Dreams, part 3.1

Pipelines - a guided tour of the new IO API in .NET, part 3.1

(part 1, part 2, part 3)

After part 3, I got some great feedback - mostly requests to clarify things that I touched on, but could do with further explanation. Rather than make part 3 even longer, I want to address those here! Yay, more words!


Isn't ArrayPoolOwner<T> doing the same thing as MemoryPool<T>? Why don't you just use MemoryPool<T> ?

Great question! I didn't actually mention MemoryPool<T>, so I'd better introduce it.

MemoryPool<T> is an abstract base type that offers an API of the form:

public abstract class MemoryPool<T> : IDisposable
{
    public abstract IMemoryOwner<T> Rent(int minBufferSize = -1);
    // not shown: a few unrelated details
}

As you can see, this Rent() method looks exactly like what we were looking for before - it takes a size and returns an IMemoryOwner<T> (to provide a Memory<T>), with it being returned from whence it came upon disposal.

MemoryPool<T> also has a default implementation (public static MemoryPool<T> Shared { get; }), which returns a MemoryPool<T> that is based on the ArrayPool<T> (i.e. ArrayMemoryPool<T>). The Rent() method returns an ArrayMemoryPoolBuffer, which looks remarkably like the thing that I called ArrayPoolOwner<T>.

So: a very valid question would be: "Marc, didn't you just re-invent the default memory pool?". The answer is "no", but it is for a very subtle reason that I probably should have expounded upon at the time.

The problem is in the name minBufferSize; well... not really the name, but the consequence. What this means is: when you Rent() from the default MemoryPool<T>.Shared, the .Memory that you get back will be over-sized. Often this isn't a problem, but in our case we really want the .Memory to represent the actual number of bytes that were sent (even if we are, behind the scenes, using a larger array from the pool to contain it).
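
A trivial way to see this (the 16 is illustrative - the exact number depends on the pool's internal bucket sizes):

using (var lease = MemoryPool<byte>.Shared.Rent(10))
{
    // typically prints 16, not the 10 we asked for, because the
    // underlying array pool rounds up to its nearest bucket size
    Console.WriteLine(lease.Memory.Length);
}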

We could use an extension method on arbitrary memory pools to wrap potentially oversized memory, i.e.

public static IMemoryOwner<T> RentRightSized<T>(
    this MemoryPool<T> pool, int size)
{
    var leased = pool.Rent(size);
    if (leased.Memory.Length == size)
        return leased; // already exactly the right size; no wrapper needed
    return new RightSizeWrapper<T>(leased, size);
}
class RightSizeWrapper<T> : IMemoryOwner<T>
{
    public RightSizeWrapper(
        IMemoryOwner<T> inner, int length)
    {
        _inner = inner;
        _length = length;
    }
    private readonly IMemoryOwner<T> _inner;
    private readonly int _length;
    // disposing the wrapper returns the underlying lease to the pool
    public void Dispose() => _inner.Dispose();
    // only expose the number of elements that were actually requested
    public Memory<T> Memory
        => _inner.Memory.Slice(0, _length);
}

but... this would mean allocating two objects for most leases - one for the actual lease, and one for the thing that fixes the length. So, since we only really care about the array-pool here, it is preferable IMO to cut out the extra layer, and write our own right-sized implementation from scratch.

So: that's the difference in the reasoning and implementation. As a side note, though: it prompts the question as to whether I should refactor my API to actually implement the MemoryPool<T> API.


You might not want to complete with success if the cancellation token is cancelled

This is in relation to the while in the read loop:

while (!cancellationToken.IsCancellationRequested)
{...}

The more typical expectation for cancellation is for it to throw with a cancellation exception of some kind; therefore, if it is cancelled, I might want to reflect that.

This is very valid feedback! Perhaps the most practical fix here is simply to use while (true) and let the subsequent await reader.ReadAsync(cancellationToken) worry about what cancellation should look like.
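
In sketch form (reader is the PipeReader from the series; ProcessBuffer is a placeholder for the frame handling, not the post's actual code):

while (true)
{
    // if the token fires, this await surfaces the cancellation as an
    // exception, in the way callers of an async method expect
    ReadResult result = await reader.ReadAsync(cancellationToken);
    ReadOnlySequence<byte> buffer = result.Buffer;

    ProcessBuffer(ref buffer); // placeholder

    // tell the pipe what we consumed and what we examined
    reader.AdvanceTo(buffer.Start, buffer.End);

    if (result.IsCompleted) break; // the writer has finished
}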


You should clarify about testing the result in async "sync path" scenarios

In my aside about async uglification (optimizing when we expect it to be synchronous in most cases), I omitted to talk about getting results from the pseudo-awaited operation. Usually, this comes down to calling .Result on an awaitable (something like a Task<T> or ValueTask<T>), or .GetResult() on an awaiter (the thing you get from .GetAwaiter()). I haven't done it in the example because in async terms this would simply have been an await theThing; usage, not a var local = await theThing; usage; but you can if you need that.

I must, however, clarify a few points that perhaps weren't clear:

  • you should not (usually) try to access the .Result of a task unless you know that it has already completed
  • knowing that it has completed isn't enough to know that it has completed successfully; if you only test "is completed", you can use .GetResult() on the awaiter to check for exceptions while also fetching the result (which you can then discard if you like)
  • in my case, I'm taking a shortcut by checking IsCompletedSuccessfully; this exists on ValueTask[<T>] (and on Task[<T>] in .NET Core 2.0, else you can check .Status == TaskStatus.RanToCompletion) - which is only true in the "completed without an exception" case
  • because of expectations around how exceptions on async operations are wrapped and surfaced, it is almost always preferable to just switch into the async flow if you know a task has faulted, and just await it; the compiler knows how to get the exception out in the most suitable way, so: let it do the hard work (the sketch below pulls these bullets together)
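
In sketch form, the overall shape is something like this (a sketch only: WriteCoreAsync and OnSent are invented for the example, not code from the series):

ValueTask<int> SendAsync()
{
    var pending = WriteCoreAsync(); // hypothetical inner operation
    if (pending.IsCompletedSuccessfully)
    {
        int bytes = pending.Result; // safe: completed without fault
        OnSent(bytes);              // placeholder follow-up work
        return new ValueTask<int>(bytes);
    }
    // incomplete or faulted: switch into the async flow and let the
    // compiler surface any exception in the usual way
    return Awaited(pending);

    async ValueTask<int> Awaited(ValueTask<int> task)
    {
        int bytes = await task;
        OnSent(bytes);
        return bytes;
    }
}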

You should explain more about ValueTask[<T>] vs Task[<T>] - not many people understand them well

OK! Many moons ago, Task<T> became a thing, and all was well. Task<T> actually happened long before C# had any kind of support for async/await, and the main scenarios it was concerned about were genuinely asynchronous - it was expected that the answer would not be immediately available. So, the overhead of allocating a placeholder object was fine and dandy, dandy and fine.

As the usage of Task<T> grew, and the language support came into effect, it started to become clear that there were many cases where:

  • the operation would often be available immediately (think: caches, buffered data, uncontested locking primitives, etc)
  • it was being used inside a tight loop, or just at high frequency - i.e. something that happens thousands of times a second (file IO, network IO, synchronization over a collection, etc)

When you put those two things together, you find yourself allocating large numbers of objects for something that was only rarely actually asynchronous (so: when there wasn't data available in the socket, or the file buffer was empty, or the lock was contested). For some scenarios, there are pre-completed reusable task instances available (such as Task.CompletedTask, and inbuilt handling for some low integers), but this doesn't help if the return value is outside this very limited set. To help avoid the allocations in the general case, ValueTask[<T>] was born. A ValueTask[<T>] is a struct that implements the "awaitable" pattern (a duck-typed pattern, like foreach, but that's a story for another day), that essentially contains two fields:

  • a T if the value was known immediately (obviously not needed for the untyped ValueTask case)
  • a Task<T> if the value is pending and the answer depends on the result of the incomplete operation

That means that if the value is known now, no Task[<T>] (and no corresponding TaskCompletionSource<T>) ever needs to be allocated - we just throw back the struct, it gets unwrapped by the async pattern, and life is good. Only in the case where the operation is actually asynchronous does an object need to be allocated.
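
To make that concrete, a contrived producer (the _buffer/_offset/_count fields and FetchAndReadAsync are invented for this example):

public ValueTask<byte> ReadByteAsync()
{
    if (_offset < _count)
    {
        // synchronous hot path: no Task, no allocation
        return new ValueTask<byte>(_buffer[_offset++]);
    }
    // genuinely asynchronous cold path: a Task gets allocated,
    // but only when we actually have to wait
    return new ValueTask<byte>(FetchAndReadAsync());
}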

Now, there are three common views on what we should do with this:

  1. always expose Task[<T>], regardless of whether it is likely to be synchronous
  2. expose Task[<T>] if we know it will be async, expose ValueTask[<T>] if we think it may be synchronous
  3. always expose ValueTask[<T>]

Frankly, the only valid reason to use 1 is because your API surface was baked and fixed back before ValueTask[<T>] existed.

The choice between 2 and 3 is interesting; what we're actually talking about there is an implementation detail, so a good case could be argued for 3, allowing you to amaze yourself later if you find a way of doing something synchronously (where it was previously asynchronous), without breaking the API. I went for 2 in the code shown, but it would be something I'd be willing to change without much prodding.

You should also note that there is actually a fourth option: use custom awaitables (meaning: a custom type that implements the "awaitable" duck-typed pattern). This is an advanced topic, and needs very careful consideration. I'm not even going to give examples of how to do that, but it is worth noting that ReadAsync and FlushAsync ("pipelines" methods that we've used extensively here) do return custom awaitables. You'd need to really, really understand your reasons before going down that path, though.


I spotted a bug in your "next message number" code

Yes, the code shown in the post can generate two messages with id 1, after 4-billion-something messages:

messageId = ++_nextMessageId;
if (messageId == 0) messageId = 1;

Note that I didn't increment _nextMessageId when I dodged the sentinel (zero). There's also a very small chance that a previous message from 4-billion-ago still hasn't been replied to. Both of these are fixed in the "real" code.
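
For the first issue, one plausible shape of fix (a sketch - not necessarily what the "real" code does - with _awaitingReply standing in for whatever tracks in-flight messages):

do
{
    messageId = ++_nextMessageId;
} while (messageId == 0 || _awaitingReply.ContainsKey(messageId));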


You might be leaking your lease around the TrySetResult

In the original blog code, I had

tcs?.TrySetResult(payload.Lease());

If tcs is not null (via the "Elvis operator"), this allocates a lease and then invokes TrySetResult. However, TrySetResult can return false - meaning: it couldn't do that, because the underlying task was already completed in some other way (perhaps we added timeout code). The only time we should consider that we have successfully transferred ownership of the lease to the task is if it returns true. The real code fixes this, ensuring that it is disposed in all cases except where TrySetResult returns true.
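
In sketch form, the ownership-aware version looks something like this (the real code may be structured differently):

if (tcs != null)
{
    var lease = payload.Lease();
    if (!tcs.TrySetResult(lease))
    {
        // the task had already completed some other way; ownership
        // did not transfer, so we must release the lease ourselves
        lease.Dispose();
    }
}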


What about incremental frame parsers?

In my discussion of handling the frame, I was using an approach that processed a frame either in its entirety, or not at all. This is not the only option, and you can consume any amount of the frame that you want, as long as you write code to track the internal state. For example, if you are parsing HTTP, you could parse the HTTP headers into some container as long as you have at least one entire HTTP header name/value pair (without requiring all the headers to start parsing). Similarly, you could consume some of the payload (perhaps writing what you have so far to disk). In both cases, you would simply need to Advance past the bit that you consider consumed, and update your own state object.

So yes, that is absolutely possible - even highly desirable in some cases. In some cases it is highly desirable not to start until you've got everything. Remember that parsing often means taking the data from a streamed representation, and pushing it into a model representation - you might actually need more memory for the model representation (especially if the source data is compressed or simply stored in a dense binary format). An advantage of incremental parsing is that when the last few bytes dribble in, you might already have done most of the parsing work - allowing you to overlap pre-processing the data with data receive - rather than "buffer, buffer, buffer; right - now start parsing it".

However, in the case I was discussing: the header was 8 bytes, so there's not much point trying to over-optimize; if we don't have an entire header now, we'll most likely have a complete header when the next packet arrives, or we'll never have an entire header. Likewise, because we want to hand the entire payload to the consumer as a single chunk, we need all the data.

We could actually lease the target array as soon as we know the size, and start copying data into that buffer, releasing the source buffers as we go. We're not actually gaining much by this - we're simply exchanging data in one buffer for the same amount of data in another buffer - but we're actually exposing ourselves to an attack vector: a malicious (or buggy) client can send a message-header that claims to be sending a large amount of data (maybe 1GiB), then just ... keep the socket open and not send anything more. In this scenario, the client has sent almost nothing (maybe just 8 bytes!), but they've chewed up a lot of server memory. Now imagine they do this from 10, 100, or 1000 parallel connections - and you can see how they've achieved disproportionate harm to our server, for almost zero cost at the client. There are two pragmatic fixes for this:

  • Put an upper limit on the message size, and put an upper limit on the connections from a single endpoint
  • Make the client pay their dues: if they claim to be sending a large message (which may indeed have legitimate uses): don't lease any expensive resources until they've actually sent that much (which is what the code as-implemented achieves; the sketch below shows the shape of that check)
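
To illustrate that second bullet, a frame-reader shape (my own sketch: the 8-byte header with a trailing 4-byte little-endian length is illustrative, not the post's actual protocol; ReadOnlySequence<byte> lives in System.Buffers, BinaryPrimitives in System.Buffers.Binary):

static bool TryParseFrame(
    ref ReadOnlySequence<byte> buffer,
    out ReadOnlySequence<byte> payload)
{
    payload = default;
    if (buffer.Length < 8) return false; // not even a full header yet

    Span<byte> header = stackalloc byte[8];
    buffer.Slice(0, 8).CopyTo(header);
    int length = BinaryPrimitives.ReadInt32LittleEndian(header.Slice(4));

    // the client hasn't "paid their dues" yet: don't lease or copy
    // anything - just wait for more data to arrive
    if (buffer.Length < 8 + (long)length) return false;

    payload = buffer.Slice(8, length);
    buffer = buffer.Slice(payload.End); // move past this frame
    return true;
}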

Emphasis: your choice of frame parsing strategy is entirely contextual, and you can play with other implementations.


So: those are the amendments. I hope they are useful. A huge "thanks" to the people who are keeping me honest here, including Shane Grüling, David Fowler, and Nick Craver.