Friday, 20 May 2022

Unusual optimizations; ref foreach and ref returns

A really interesting feature quietly slipped into C# 7.3 - interesting to me, at least - but which I’ve seen almost no noise about. As I’ve said many times before: I have niche interests - I spend a lot of time in library code, or acting in a consulting capacity on performance tuning application code - so in both capacities, I tend to look at performance tweaks that aren’t usually needed, but when they are: they’re glorious. As I say: I haven’t seen this discussed a lot, so: “be the change you want to see” - here’s my attempt to sell you on the glory of ref foreach.

Because I know folks have a short attention span, I’ll start with the money shot:

|                   Method |       Mean |  Gen 0 | Allocated |
|------------------------- |-----------:|-------:|----------:|
|          ListForEachLoop | 2,724.7 ns |      - |         - |
|         ArrayForEachLoop |   972.2 ns |      - |         - |
|        CustomForEachLoop |   987.2 ns |      - |         - |
|              ListForLoop | 1,201.3 ns |      - |         - |
|             ArrayForLoop |   593.0 ns |      - |         - |
|            CustomForLoop |   596.2 ns |      - |         - |
|              ListLinqSum | 7,057.1 ns | 0.0076 |      80 B |
|             ArrayLinqSum | 4,832.7 ns |      - |      32 B |
|        ListForEachMethod | 2,070.6 ns | 0.0114 |      88 B |
|       ListRefForeachLoop |   586.2 ns |      - |         - |
|          ListSpanForLoop |   590.3 ns |      - |         - |
|      ArrayRefForeachLoop |   574.1 ns |      - |         - |
|     CustomRefForeachLoop |   581.0 ns |      - |         - |
|    CustomSpanForeachLoop |   816.1 ns |      - |         - |
| CustomSpanRefForeachLoop |   592.2 ns |      - |         - |

With the point being: I want to sell you on those sub-600 nanosecond versions, rather than the multi-microsecond versions of the same operation.

What the hell is ref foreach?

First, simple recap: let’s consider:

foreach (var someValue in someSequence)
{
    someValue.DoSomething();
}

The details here may vary depending on what someSequence is, but conceptually, what this is doing is reading each value from someSequence into a local variable someValue, and calling the DoSomething() method on each. If the type of someValue is a reference-type (i.e. a class or interface), then each “value” in the sequence is just that: a reference - so we’re not really moving much data around here, just a pointer.

When this gets interesting is: what if the type of someValue is a struct? And in particular, what if it is a heckin’ chonka of a struct? (and yes, there are some interesting scenarios where struct is useful outside of simple data types, especially if we enforce readonly struct to prevent ourselves from shooting our own feet off) In that case, copying the value out of the sequence can be a significant operation (if we do it often enough to care). Historically, the foreach syntax has an inbuilt implementation for some types (arrays, etc), falling back to a duck-typed pattern that relies on a bool MoveNext() and SomeType Current {get;} pair (often, but not exclusively, provided via IEnumerator<T>) - so the “return the entire value” is baked into the old signature (via the Current property).
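To make this concrete, the examples that follow imagine something like this (purely illustrative - the only detail that matters is that it is big enough that copying it hurts):

public readonly struct SomeType
{
    // enough state that every copy moves a meaningful amount of data
    private readonly long _a, _b, _c, _d, _e, _f, _g, _h; // 64 bytes
    public void DoSomething() { /* per-element work */ }
}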

What if we could avoid that?

For arrays: we already can!

Let’s consider that someSequence is explicitly typed as an array. It is very tempting to think that foreach and for over the array work the same - i.e. the same foreach as above, compared to for:

for(int i = 0 ; i < someArray.Length ; i++)
{
    someArray[i].DoSomething();
}

But: if we run both of those through sharplab, we can see that they compile differently; in C#, the difference is that foreach is basically:

SomeType someType = someArray[index];
someType.DoSomething();

which fetches the entire value out of the array, whereas for is:

someArray[index].DoSomething();

Now, you might be looking at that and thinking “aren’t they the same thing?”, and the simple answer is: “no, no they are not”. You see, there are two ways of accessing values inside an array; you can copy the data out (ldelem in IL, which returns the value at the index), or you can access the data directly inside the array (ldelema in IL, which returns the address at the index). Ironically, we need an address to call the DoSomething() method, so for the foreach version, this actually becomes three steps: “copy out the value from the index, store the value to a local, get the address of a local” - instead of just “get the address of the index”; or in IL:

IL_0006: ldloc.0 // the array
IL_0007: ldloc.1 // the index
IL_0008: ldelem SomeType // read value out from array:index
IL_000d: stloc.2 // store in local
IL_000e: ldloca.s 2 // get address of local
IL_0010: call instance void SomeType::DoSomething() // invoke method

vs

IL_0004: ldarg.0 // the array
IL_0005: ldloc.0 // the index
IL_0006: ldelema SomeType // get address of array:index
IL_000b: call instance void SomeType::DoSomething() // invoke method

So by using for here, not only have we avoided copying the entire value, but we’ve dodged a few extra operations too! Nice. Depending on the size of the value being iterated (again, think “chunky struct” here), using for rather than foreach on an array (making sure you snapshot the array into a local, so the JIT can elide bounds checks) can make a significant difference!
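The “snapshot” trick is simply copying the array reference into a local before the loop; because a local (unlike a field) cannot change between iterations, the JIT can prove the index is always in range and drop the per-element bounds check:

// snapshot the array reference so the JIT can elide bounds checks
var snapshot = someArray;
for (int i = 0; i < snapshot.Length; i++)
{
    snapshot[i].DoSomething();
}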

But: that’s arrays, and we aren’t always interested in arrays.

But how does that help me outside arrays?

You might reasonably be thinking “great, but I don’t want to just hand arrays around” - after all, they give me no ability to protect the data, and they’re inconvenient for sizing - you can’t add/remove, short of creating a second array and copying all the data. This is where C# 7.3 takes a huge flex; it introduces a few key things here:

  • C# 7.0 adds ref return values from custom methods including indexers, and ref local values (so you don’t need to use them immediately as a return value)
  • C# 7.2 adds ref readonly to most places where ref might be used (and readonly struct, which often applies here)
  • C# 7.3 adds ref (and ref readonly) as foreach L-values (i.e. the iterator value corresponding to .Current)

Note that with ref, the caller can mutate the data in-place, which is not always wanted; ref readonly signals that we don’t want that to happen, hence why it is so often matched with readonly struct (to avoid having to make defensive copies of data), but as a warning: readonly is always a guideline, not a rule; a suitably motivated caller can convert a ref readonly to a ref, and can convert a ReadOnlySpan<T> to a Span<T>, and convert any of the above to an unmanaged T* pointer (at which point you can forget about all safety); this is not a bug, but a simple reality: everything is mutable if you try hard enough.
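As a minimal sketch of those building blocks in action (SomeTypeBag here is invented purely for illustration):

public class SomeTypeBag
{
    private readonly SomeType[] _items = new SomeType[16];

    // C# 7.0: a ref-returning indexer, exposing the element in-place
    public ref SomeType this[int index] => ref _items[index];
}

// ...then, given a SomeTypeBag bag:
ref SomeType item = ref bag[3]; // C# 7.0: a ref local
item.DoSomething(); // operates directly on the data inside the array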

These language features provide the building blocks - especially, but not exclusively, when combined with Span<T>; Span<T> (and the twin, ReadOnlySpan<T>) provide unified access to arbitrary data, which could be a slice of an array, but could be anything else - with the usual .Length, indexer (this[int index]) and foreach support you’d expect, with some additional compiler and JIT tricks (much like with arrays) to make them fly. Since spans are naturally optimized, one of the first things we can do - if we don’t want to deal with arrays - is: deal with spans instead! This is sometimes a little hard to fit into existing systems without drastically refactoring the code, but more recently (.NET 5+), we get helper methods like CollectionsMarshal.AsSpan, which gives us the sized span of the data underpinning a List<T>. This is only useful transiently (as any Add/Remove on the list will render the span broken - the length will be wrong, and it may now even point to the wrong array instance, if the list had to re-size the underlying data), but when used correctly, it allows us to access the data in situ rather than having to go via the indexer or iterator (both of which copy out the entire value at each position). For example:

foreach (ref var tmp in CollectionsMarshal.AsSpan(someList))
{   // also works identically with "ref readonly var", since this is
    // a readonly struct
    tmp.DoSomething();
}

Our use of ref var tmp with foreach here means that the L-value (tmp) is a managed pointer to the data - not the data itself; we have avoided copying the overweight value-type, and called the method in-place.

If you look carefully, the indexer on a span is not T this[int index], but rather: ref T this[int index] (or ref readonly T this[int index] for ReadOnlySpan<T>), so we can also use a for loop, and avoid copying the data at any point:

var span = CollectionsMarshal.AsSpan(someList);
for (int i = 0; i < span.Length; i++) { span[i].DoSomething(); }

Generalizing this

Sometimes, spans aren’t viable either - for whatever reason. The good news is: we can do the exact same thing with our own types, in two ways:

  1. we can write our own types with an indexer that returns a ref or ref readonly managed pointer to the real data
  2. we can write our own iterator types with a ref or ref readonly return value on Current; this won’t satisfy IEnumerator<T>, but the compiler isn’t limited to IEnumerator<T>, and if you’re writing a custom iterator (rather than using a yield return iterator block): you’re probably using a custom value-type iterator and avoiding the interface to make sure it never gets boxed accidentally, so: nothing is lost!

Purely for illustration (you wouldn’t do this - you’d just use ReadOnlySpan<T>), a very simple custom iterator could be something like:

public struct Enumerator
{
    private readonly SomeStruct[] _array;
    private int _index;

    internal Enumerator(SomeStruct[] array)
    {
        _array = array;
        _index = -1;
    }

    public bool MoveNext()
        => ++_index < _array.Length;

    public ref readonly SomeStruct Current
        => ref _array[_index];
}

which would provide foreach access almost as good as a direct span. If the caller uses foreach (var tmp in ...) rather than foreach (ref readonly var tmp in ...), then the compiler will simply de-reference the value for the caller, which it would have done anyway in the old-style foreach, so: once again: no harm.
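For completeness - since foreach binds to a duck-typed GetEnumerator() rather than to any interface - the matching collection type could be as simple as (again, purely illustrative):

public readonly struct SomeStructCollection
{
    private readonly SomeStruct[] _array;
    public SomeStructCollection(SomeStruct[] array) => _array = array;

    // no IEnumerable<T> needed; foreach binds to this by pattern
    public Enumerator GetEnumerator() => new Enumerator(_array);
}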

Summary

In modern C#, we have a range of tricks that can help in certain niche scenarios relating to sequences of - in particular - value types. These scenarios don’t apply to everyone, and that’s fine. If you never need to use any of the above: that’s great, and good luck to you. But when you do need them, they are incredibly powerful and versatile, and a valuable tool in the optimizer’s toolbox.

The benchmark code used for the table at the start of the post is included here.

Tuesday, 22 February 2022

Migrating from Redis-64 to Memurai

or alternatively:

How did updating to .NET 6 break the ASP.NET redis cache for some users?

Whereby I present the history of Redis-64, along with options and motivations for Redis-64 users on Windows to consider updating their redis via Memurai.

Running Redis on Windows, 2022 edition; replacing Redis-64

A funny thing happened recently; after updating to .NET 6, some StackExchange.Redis users started reporting that redis was not working from their web applications. A relatively small number, so: not an endemic fail - but also far from zero. As you might hope, we took a look, and pieced together that what was actually happening here was:

  • a part of ASP.NET allows using redis as a cache
  • historically, this used the HMSET redis command (which sets multiple hash fields, in contrast to HSET, which originally set a single hash field)
  • in redis 4.0 (July 2017), HSET was made variadic and thus functionally identical to HMSET - and so HMSET was marked “deprecated” (although it still works)
  • respecting the “deprecated” marker, .NET 6 (Nov 2021) included a change to switch from HMSET to HSET, thinking that the number of people below redis 4.0 should be negligible
  • and it turned out not to be!

This problem was reported and the relevant code has now been fixed to support both variants, but we need to take a step further and understand why a non-trivial number of users are more than 7 years behind on servicing. After a bit more probing, it is my understanding that for a lot of the affected users, the answer is simple: they are using Redis-64.
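For context, here is what the overlap looks like on a 4.0+ server (an illustrative session; the (integer) 0 reply from HSET is the number of new fields created - both fields already existed by then):

127.0.0.1:6379> HMSET user:1 name Ada role admin
OK
127.0.0.1:6379> HSET user:1 name Ada role admin
(integer) 0

On a pre-4.0 server, that second command instead fails with a wrong-number-of-arguments error - which is exactly what bit the affected users.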

What is (was) Redis-64?

Historically, the main redis project has only supported linux usage. There are some particular nuances of how redis is implemented (using fork-based replication and persistence with copy-on-write semantics, for example) that don’t make for a direct “just recompile the code and it works the same” nirvana. Way back around the redis 2.6 era (2013), Microsoft (in the guise of MSOpenTech) released a viable Windows-compatible fork, under the name Redis-64 (May 2013). This fork was kept up to date through redis 2.8 and some 3.0 releases, but the development was ultimately dropped some time in 2016, leaving redis 3.0 as the last MSOpenTech redis implementation. There was also a Redis-32 variant for x86 usage, although this was even more short-lived, staying at 2.6.

I’m all in favor of a wide variety of good quality tools and options. If you want to run a redis server as part of a Windows installation, you should be able to do that! This could be because you already have Windows servers and administrative experience, and want a single OS deployment; it could be because you don’t want the additional overheads/complications of virtualization/container technologies. It could be because you’re primarily doing development on a Windows machine, and it is convenient. Clearly, Redis-64 was an attractive option to many people who want to run redis natively on Windows; I know we used it (in addition to redis on linux) when I worked with Stack Overflow.

Running outdated software is a risk

Ultimately, being stuck with a server that is based on 2015/2016 starts to present a few problems:

  1. you need to live with long-known and long-fixed bugs and other problems (including any well-known security vulnerabilities)
  2. you don’t get to use up-to-date features and capabilities
  3. you might start dropping off the support horizon of 3rd party libraries and tools

That third point is what happened with ASP.NET in .NET 6, but the other points also stand; the “modules” (redis 4.x) and “streams” (redis 5.x) features come to mind immediately - both have huge utility.

So: if you’re currently using Redis-64, how can we resolve this, without completely changing our infrastructure?

Shout-out: Memurai

The simplest way out of this corner is, in my opinion: Memurai, by Janea Systems. So: what is Memurai? To put it simply: Memurai is a redis 5 compatible fork of redis that runs natively on Windows. That means you get a wide range of more recent redis fixes and features. Fortunately, it is a breeze to install, with options for nuget, choco/cinst, winget, winstall and an installer. This means that you can get started with a Memurai development installation immediately.

The obsolete Redis-64 nuget package also now carries a link to Memurai in the “Suggested Alternatives”, which is encouraging. To be transparent: I need to emphasize - Memurai is a commercial offering with a free developer edition. If we look at how Redis-64 ultimately stagnated, I view this as a strength: it means that someone has a vested interest in making sure that the product continues to evolve and be supported, now and into the future.

Working with Memurai

As previously noted: installation is quick and simple, but so is working with it. The command-line tools change nominally; instead of redis-cli, we have memurai-cli; instead of redis-server we have memurai. However, they work exactly as you expect and will be immediately familiar to anyone who has used redis. At the server level, Memurai exposes the exact same protocol and API surface as a vanilla redis server, meaning any existing redis-compatible tools and clients should work without problem:

c:\Code>memurai-cli
127.0.0.1:6379> get foo
(nil)
127.0.0.1:6379> set foo bar
OK
127.0.0.1:6379> get foo
"bar"
127.0.0.1:6379>

(note that redis-cli would have worked identically)

At the metadata level, you may notice that info server reports some additional entries:

127.0.0.1:6379> info server
# Server
memurai_edition:Memurai Developer
memurai_version:2.0.5
redis_version:5.0.14
...

The redis_version entry is present so that client libraries and applications expecting this entry can understand the features available, so this is effectively the redis API compatibility level; the memurai_version and memurai_edition give specific Memurai information, if you need it - but other than those additions (and extra rows are expected here), everything works as you would expect. For example, we can use any pre-existing redis client to talk to the server:

using StackExchange.Redis;

// connect to local redis, default port
using var conn = await ConnectionMultiplexer.ConnectAsync("127.0.0.1");
var db = conn.GetDatabase();

// reset and populate some data
await db.KeyDeleteAsync("mykey");
for (int i = 1; i <= 20; i++)
{
    await db.StringIncrementAsync("mykey", i);
}

// fetch and display
var sum = (int)await db.StringGetAsync("mykey");
Console.WriteLine(sum); // writes: 210

Configuring the server works exactly like it does for redis - the config file works the same, although the example template is named differently:

c:\Code>where memurai
C:\Program Files\Memurai\memurai.exe

c:\Code>dir "C:\Program Files\Memurai\*.conf" /B
memurai.conf

Summary

Putting this all together: if you’re currently choosing Redis-64 to run a redis server natively on Windows, then Memurai might make a very appealing option - certainly more appealing than remaining on the long-obsolete Redis-64. All of your existing redis knowledge continues to apply, but you get a wide range of features that were added to redis after Redis-64 was last maintained. Are there other ways of running redis on Windows? Absolutely. But for people in the Redis-64 zone, it looks like a good option.

Monday, 3 May 2021

Is the era of reflection-heavy C# libraries at an end?

I’m going to talk about reflection-heavy libraries; I will describe the scenario I’m talking about - as it is commonly used today, the status quo, giving a brief overview of the pros and cons of this, and then present the case that times have changed, and with new language and runtime features: it may be time to challenge our way of thinking about this kind of library.


I’m a code-first kind of developer; I love the inner-loop experience of being able to tweak some C# types and immediately have everything work, and I hate having to mess in external DSLs or configuration files (protobuf/xml/json/yaml/etc). Over the last almost-two-decades, I’ve selected or written libraries that allow me to work that way. And, to be fair, this seems to be a pretty common way of working in .NET.

What this means in reality is that we tend to have libraries where a lot of magic happens at runtime, based either on the various <T> for generic APIs, or via GetType() on objects that are passed in. Consider the following examples:

Json.NET

// from  https://www.newtonsoft.com/json
Product product = new Product();
product.Name = "Apple";
product.Expiry = new DateTime(2008, 12, 28);
product.Sizes = new string[] { "Small" };

string json = JsonConvert.SerializeObject(product);

Dapper

var producer = "Megacorp, Inc.";
var products = connection.Query<Product>(@"
    select Id, Name, Expiry
    from Products
    where Producer = @producer",
    new { producer }).AsList();

I won’t try to give an exhaustive list, but there are a myriad of libraries - by Microsoft and 3rd parties, for a wide range of purposes - that fundamentally fall into the camp of:

At runtime, given some Type: check the local library type cache; if we haven’t seen that Type before: perform a ton of reflection code to understand the model, produce a strategy to implement the library features on that model, and then expose some simplified API that invokes that strategy.

Behind the scenes, this might be “simple” naive reflection (PropertyInfo.GetValue(), etc), or it might use the Expression API or the ref-emit API (mainly: ILGenerator) to write runtime methods directly, or it might generate C# that it then runs through the compiler (XmlSerializer used to work this way, and may well still do so).
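As a rough sketch of the difference between the naive and Expression flavors (hand-rolled, and simplified to a single property read; real libraries cache these delegates per-type):

using System;
using System.Linq.Expressions;
using System.Reflection;

static class MemberReader
{
    // naive reflection: simple and works everywhere, but boxes and is slow
    public static object Read(object target, PropertyInfo property)
        => property.GetValue(target);

    // Expression flavor: pay to build a typed delegate once; every
    // subsequent read is then just a delegate invoke
    public static Func<T, TValue> Build<T, TValue>(string propertyName)
    {
        var target = Expression.Parameter(typeof(T), "target");
        var body = Expression.Property(target, propertyName);
        return Expression.Lambda<Func<T, TValue>>(body, target).Compile();
    }
}

// usage (names here are from the Json.NET example above):
// var getName = MemberReader.Build<Product, string>(nameof(Product.Name));
// string name = getName(product); // no per-call reflection cost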

This provides a pretty reasonable experience for the consumer; their code just works, and - sure, the library does a lot of work behind the scenes, but the library authors usually invest a decent amount of time into trying to minimize that so you aren’t paying the reflection costs every time.

So what is the problem?

For many cases: this is fine - we’ve certainly lived well-enough with it for the last however-many years; but: times change. In particular, a few things have become increasingly significant in the last few years:

  • async/await

    Increasing demands of highly performant massively parallel code (think: “web servers”, for example) have made async/await hugely important; from the consumer perspective, it is easy to think that this is mostly a “sprinkle in some async/await keywords” (OK, I’m glossing over a lot of nuance here), but behind the scenes, the compiler is doing a lot - like a real lot of work for us. If you’re in the Expression or ILGenerator mind-set, switching fully to async/await is virtually impossible - it is just too much. At best, you can end up with some async shell library code that calls into some generated Func<...>/Action<...> code, but that assumes that the context-switch points (i.e. the places where you’d want to await etc) can be conveniently mapped to that split - and there is no guarantee that a reflection-heavy library can even be carved up in this way.

  • AOT platforms

    At the other end of the spectrum, we have AOT devices - think “Xamarin”, “Unity”, etc. Running on a small device can mean that you have reduced computational power, so you start noticing the time it takes to inspect models at runtime - but they also often have deliberately restricted runtimes that prevent runtime code generation. This means that you can probably get away with the naive reflection approach (which is relatively slow), but you won’t be able to emit optimized code via ILGenerator; the Expression approach is a nice compromise here, in that it will optimize when it can, but use naive reflection when that isn’t possible - but you still end up paying the performance cost.

  • Linkers

    Another feature of AOT device scenarios is that they often involve trimmed deployments via a pruning linker, but “Single file deployment and executable” deployments are now a “thing” for regular .NET 5 / .NET 6+. This brings a few problems:

    1. we need to work very hard to convince the linker not to remove things that our library is going to need to use at runtime, despite the fact that they aren’t used if you scan the assembly in isolation
    2. our reflection-heavy library often needs to consider every problematic edge scenario that could exist, ever, so it appears to touch far more than any single run actually uses - in reality, most runs are just going to be asking “do I need to consider this? oh, nope, that’s fine” - but because the library appears to touch it, the linker retains it all
    3. we thus find ourselves fighting the linker’s tendency to remove everything we need while simultaneously retaining everything that doesn’t apply to our scenario
  • Cold start

    It is easy to think of applications as having a relatively long duration, so: cold-start performance doesn’t matter. Now consider things like “Azure functions”, or other environments where our code is invoked for a very brief time, as-needed (often on massively shared infrastructure); in this scenario, cold-start performance translates directly (almost linearly) to throughput, and thus real money

  • Runtime error discovery

    One of the problems with having the library do all the model analysis at runtime is that you don’t get feedback on your code until you run it; and sure, you can (and should) write unit/integration tests that push your model through the library in every way you can think of, but: things get missed. This means that code that compiled cleanly blows up at runtime, for reasons that should be knowable - an “obvious” attribute misconfiguration, for example.

  • Magic code

    Magic is bad. By which I mean: if I said to you “there’s going to be some code running in your application, that doesn’t exist anywhere - it can’t be seen on GitHub, or in your source-code, or in the IDE, or in the assembly IL, or anywhere, and by the way it probably uses lots of really gnarly unusual IL, but trust me it is totally legit” - you might get a little worried; but that is exactly what all of these libraries do. I’m not being hyperbolic here; I’ve personally received bug-reports from the JIT god (AndyAyersMS) because my generated IL used ever so slightly the wrong pointer type in one place, which worked fine almost always, except when it didn’t and exploded the runtime.

There is a different way we can do all of this

Everything above is a side-effect of the tools that have been available to us - when the only tool you’ve had for years has been a hammer, you get used to thinking in terms of nails. For “code first”, that really meant “reflection”, which meant “runtime”. Reflection-based library authors aren’t ignorant of these problems, and for a long time now have been talking to the framework and language teams about options. As the above problem scenarios have become increasingly important, we’ve recently been graced with new features in Roslyn (the C# / VB compiler engine), i.e. “generators”. So: what are generators?

Imagine you could take your reflection-based analysis code, and inject it way earlier - in the build pipe, so when your library consumer is building their code (whether they’re using Visual Studio, or dotnet build or whatever else), you get given the compiler’s view of the code (the types, the methods, etc), and at that point you have the chance to add your own code (note: purely additive - you aren’t allowed to change the existing code), and have that additional code included in the build. That: would be a generator. This solves most of the problems we’ve discussed:

  • async: our generated code can use async/await, and we can just let the regular compiler worry about what that means - we don’t need to get our hands dirty
  • AOT: all of the actual code needed at runtime exists in the assemblies we ship - nothing needs to be generated at runtime
  • linkers: the required code is now much more obvious, because: it exists in the assembly; conversely, because we can consider all the problematic edge scenarios during build, the workarounds needed for those niche scenarios don’t get included when they’re not needed, and nor do their dependency chains
  • cold start: we now don’t need to do any model inspection or generation at runtime: it is already done during build
  • error discovery: our generator doubles as a Roslyn analyzer; it can emit warnings and errors during build if it finds something suspicious in our model
  • magic code: the consumer can see the generated code in the IDE, or the final IL in the assembly

If you’re thinking “this sounds great!”, you’d be right. It is a huge step towards addressing the problems described above.

What does a generator look like for a consumer?

From the “I’m an application developer, just make things work for me” perspective, using a generator firstly means adding a build-time package; for example, to add DapperAOT (which is purely experimental at this point, don’t get too excited), we would add (to our csproj):

<ItemGroup>
    <PackageReference Include="Dapper.AOT"
                      Version="0.0.8" PrivateAssets="all"
                      IncludeAssets="runtime;build;native;contentfiles;analyzers;buildtransitive" />
</ItemGroup>

This package isn’t part of what gets shipped in our application - it just gets hooked into the build pipe. Then we need to follow the library’s instructions on what is needed! In many cases, I would expect the library to self-discover scenarios where it needs to get involved, but as with any library, there might be special methods we need to call, or attributes we need to add, to make the magic happen. For example, with DapperAOT I’m thinking of having the consumer declare their intent via partial methods in a partial type:

[Command(@"select * from Customers where Id=@id and Region=@region")]
[SingleRow(SingleRowKind.FirstOrDefault)] // entirely optional; this is
    // to influence what happens when zero/multiple rows returned
public static partial Customer GetCustomer(
    DbConnection connection, int id, string region);

If you haven’t seen this partial usage before, this is an “extended partial method” in C# 9, which basically means partial methods can now be accessible, have return values, out parameters, etc - the caveat is that somewhere the compiler expects to find another matching half of the partial method that provides an implementation. Our generator can detect the above dangling partial method, and add the implementation in the generated code. This generated code is then available in the IDE, either by stepping into the code as usual, or in the solution explorer:

[Screenshot: the solution explorer, expanding (the project) > Dependencies > Analyzers > Dapper.AOT > Dapper.CoreAnalysis.CommandGenerator > Dapper.generated.cs]

and as a code file:

[Screenshot: the generated code file Dapper.generated.cs, declaring the GetCustomer method]
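Since screenshots don’t reproduce well here, a hand-written approximation of what that generated half might look like (illustrative only - this is not the actual DapperAOT output, and the ReadCustomer helper is invented; the real generator would emit the column-to-member mapping too):

public static partial Customer GetCustomer(
    DbConnection connection, int id, string region)
{
    using var command = connection.CreateCommand();
    command.CommandText =
        "select * from Customers where Id=@id and Region=@region";

    var p = command.CreateParameter();
    p.ParameterName = "@id";
    p.Value = id;
    command.Parameters.Add(p);

    p = command.CreateParameter();
    p.ParameterName = "@region";
    p.Value = region;
    command.Parameters.Add(p);

    using var reader = command.ExecuteReader();
    // [SingleRow(FirstOrDefault)]: zero rows yields null; extra rows ignored
    return reader.Read() ? ReadCustomer(reader) : null;
}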

Other libraries may choose other approaches, perhaps using module initializers to register some specific type handlers into a lightweight library, that handle expected known types (as discovered during build); or it could detect API calls that don’t resolve, and add them (either via partial types, or extension methods) - like a custom dynamic type, but where the convention-based APIs are very real, but generated automatically during build. But the theme remains: from the consumer perspective, everything just works, and is now more discoverable.

What does a generator look like for a library author?

Things are a little more complicated for the library author; the Roslyn semantic tree is similar to the kind of model you get at runtime - but it isn’t the same model; in particular, you’re not working with Type any more, you’re working with ITypeSymbol or (perhaps more commonly) INamedTypeSymbol. That’s because the type system that you’re inspecting is not the same as the type system that you’re running on - it could be for an entirely different framework, for example. But if you’re already used to complex reflection analysis, most things are pretty obvious. It isn’t very hard, honest. Mostly, this involves:

  1. implementing ISourceGenerator (and marking that type with [Generator])
  2. implementing ISyntaxReceiver to capture candidate nodes you might want to look at later
  3. implementing ISourceGenerator.Initialize to register your ISyntaxReceiver
  4. implementing ISourceGenerator.Execute to perform whatever logic you need against the nodes you captured
  5. calling context.AddSource some number of times to add whatever file(s) you need
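In skeleton form, those five steps look something like this (a minimal sketch only - all of the actual analysis is elided):

using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using System.Collections.Generic;

[Generator]
public sealed class MyGenerator : ISourceGenerator
{
    // step 3: register our syntax receiver
    public void Initialize(GeneratorInitializationContext context)
        => context.RegisterForSyntaxNotifications(static () => new Receiver());

    // step 4: process the captured nodes
    public void Execute(GeneratorExecutionContext context)
    {
        if (context.SyntaxReceiver is not Receiver receiver) return;
        foreach (var candidate in receiver.Candidates)
        {
            // use context.Compilation.GetSemanticModel(...) to inspect each
            // candidate properly, building up whatever source we need to emit
        }
        // step 5: add the generated file(s)
        context.AddSource("MyGenerated.cs", "// generated code goes here");
    }

    // step 2: capture candidate nodes cheaply, syntax-only
    private sealed class Receiver : ISyntaxReceiver
    {
        public List<MethodDeclarationSyntax> Candidates { get; } = new();

        public void OnVisitSyntaxNode(SyntaxNode node)
        {
            // e.g. dangling partial methods: declarations with no body
            if (node is MethodDeclarationSyntax method
                && method.Body is null && method.ExpressionBody is null)
            {
                Candidates.Add(method);
            }
        }
    }
}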

I’m not going to give a full lesson on “how to write a generator” - I’m mostly trying to set the scene for why you might want to consider this, but there is a Source Generators Cookbook that covers a lot, or I humbly submit that the DapperAOT code might be interesting (I am not suggesting that it does everything the best way, but: it kinda works, and shows input-source-file-based unit testing etc).

This all sounds too good to be true? What is the catch?

Nothing is free. There’s a few gotchas here.

  1. It is a lot of re-work; if you have an existing non-trivial library, this represents a lot of effort
  2. You may also need to re-work your main library, perhaps splitting the “reflection aware” code and “making things work” code into two separate pieces, with the generator-based approach only needing the latter half
  3. Some scenarios may be hard to detect reliably during code analysis - where your code is seven layers down in generic types and methods, for example, it may be hard to discover all of the original types that are passed into your library; and if it is just object: even harder; we may need to consider this when designing APIs, or provide fallback mechanisms to educate the generator (for example, [model:GenerateJsonSerializerFor(typeof(Customer))])
  4. There are some things we can’t do in C# that we can do in IL; change readonly fields, call init/get-only properties, bypass accessibility, etc; in some cases, we might be able to generate internal constructors in another partial class (for example), that allows us to sneak past those boundaries, but in some other cases (where the type being used isn’t part of the current compilation, because it comes from a package reference) it might simply be that we can’t offer the exact same features (or need to use a fallback reflection scenario)
  5. It is C# specific (edit: C# and VB, my mistake!); this is a huuuuuge “but”, and I can hear the F#, VB, etc developers gnashing their teeth already; there’s a very nuanced conversation here about whether the advantages I’ve covered outweigh the disadvantages of not being able to offer the same features on all .NET platforms
  6. It needs up-to-date build tools, which may limit adoption (note: this does not mean we can only use generators when building against .NET 6 etc)
  7. We have less flexibility to configure things at runtime; in practice, this isn’t usually a problem as long as we can actually configure it, which can be done at build-time using attributes (and by using [Conditional(...)] on our configuration attributes, we don’t even need to include them in the final assembly - they can be used by the generator and then discarded by the compiler)

That said, there’s also some great upsides - during build we have access to information that doesn’t exist in the reflection model, for example the name parts of value-tuples (which are exposed outwards via attributes, but not inwards; libraries are inwards, from this perspective), and more reliable nullability annotation data when calling generic APIs with nullability.

Summary

I genuinely think we should be embracing generators and reducing or removing completely our reliance on runtime reflection emit code. I say this as someone who has built a pretty successful niche as an expert in those areas, and would have to start again with the new tools - I see the benefits, despite the work and wrinkles. Not only that, I think there is an opportunity here (with things like “extended partial methods” etc) to make our application code even more expressive, rather than having to worry about dancing around library implementation details.

But I welcome competing thoughts!

Monday, 18 May 2020

Multi-path cancellation; a tale of two codependent async enumerators

Disclaimer: I'll be honest: many of the concepts in this post are a bit more advanced - some viewer caution is advised! It touches on concurrent linked async enumerators that share a termination condition by combining multiple CancellationTokens.


Something that I've been looking at recently - in the context of gRPC (and protobuf-net.Grpc in particular) - is the complex story of duplex data pipes. A full-duplex connection is a connection between two nodes, but instead of being request-response, either node can send messages at any time. There's still a notional "client" and "server", but that is purely a feature of which node was sat listening for connection attempts vs which node reached out and established a connection. Shaping a duplex API is much more complex than shaping a request-response API, and frankly: a lot of the details around timing are hard.

So: I had the idea that maybe we can reshape everything at the library level, and offer the consumer something more familiar. It makes an interesting (to me, at least) worked example of cancellation in practice. So; let's start with an imaginary transport API (the thing that is happening underneath) - let's say that we have:

  • a client establishes a connection (we're not going to worry about how)
  • there is a SendAsync method that sends a message from the client to the server
  • there is a TryReceiveAsync method that attempts to await a message from the server (this will report true if a message could be fetched, and false if the server has indicated that it won't ever be sending any more)
  • additionally, the server controls data flow termination; if the server indicates that it has sent the last message, the client should not send any more

something like (where TRequest is the data-type being sent from the client to the server, and TResponse is the data-type expected from the server to the client):

interface ITransport<TRequest, TResponse> : IAsyncDisposable
{
    ValueTask SendAsync(TRequest request,
        CancellationToken cancellationToken);

    ValueTask<(bool Success, TResponse Message)> TryReceiveAsync(
        CancellationToken cancellationToken);
}

This API doesn't look all that complicated - it looks like (if we ignore connection etc for the moment) we can just create a couple of loops, and expose the data via enumerators - presumably starting the SendAsync via Task.Run or similar so it is on a parallel flow:

ITransport<TRequest, TResponse> transport;
public async IAsyncEnumerable<TResponse> ReceiveAsync(
    [EnumeratorCancellation] CancellationToken cancellationToken)
{
    while (true)
    {
        var (success, message) =
            await transport.TryReceiveAsync(cancellationToken);
        if (!success) break;
        yield return message;
    }
}

public async ValueTask SendAsync(
    IAsyncEnumerable<TRequest> data,
    CancellationToken cancellationToken)
{
    await foreach (var message in data
        .WithCancellation(cancellationToken))
    {
        await transport.SendAsync(message, cancellationToken);
    }
}

and it looks like we're all set for cancellation - we can pass in an external cancellation-token to both methods, and we're set. Right?

Well, it is a bit more complex than that, and the above doesn't take into consideration that these two flows are codependent. In particular, a big concern is that we don't want to leave the producer (the thing pumping SendAsync) still running in any scenario where the connection is doomed. There are actually many more cancellation paths than we might think:

  1. we might have supplied an external cancellation-token to both methods, and this token may have triggered
  2. the consumer of ReceiveAsync (the thing iterating it) might have supplied a cancellation-token to GetAsyncEnumerator (via WithCancellation), and this token may have been triggered (we looked at this last time)
  3. we could have faulted in our send/receive code
  4. the consumer of ReceiveAsync may have decided not to take all the data - that might be because of some async simile of Enumerable.Take(), or it could be because they faulted when processing a message they had received
  5. the producer in SendAsync may have faulted

All of these scenarios essentially signify termination of the connection, so we need to be able to encompass all of these scenarios in some way that allows us to communicate the problem between the send and receive path. In a word, we want our own CancellationTokenSource.

There's a lot going on here; more than we can reasonably expect consumers to do each and every time they use the API, so this is a perfect scenario for a library method. Let's imagine that we want to encompass all this complexity in a simple single library API that the consumer can access - something like:

public IAsyncEnumerable<TResponse> Duplex(
    IAsyncEnumerable<TRequest> request,
    CancellationToken cancellationToken = default);

This:

  • allows them to pass in a producer
  • optionally allows them to pass in an external cancellation-token
  • makes an async feed of responses available to them

Their usage might be something like:

await foreach (MyResponse item in client.Duplex(ProducerAsync()))
{
    Console.WriteLine(item);
}

where their ProducerAsync() method is (just "because"):

async IAsyncEnumerable<MyRequest> ProducerAsync(
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    for (int i = 0; i < 100; i++)
    {
        yield return new MyRequest(i);
        await Task.Delay(100, cancellationToken);
    }
}

As I discussed in The anatomy of async iterators (aka await, foreach, yield), our call to ProducerAsync() doesn't actually do much yet - this just hands a place-holder that can be enumerated later, and it is the act of enumerating it that actually invokes the code. Very important point, that.

So; what can our Duplex code do? It already needs to think about at least 2 different kinds of cancellation:

  • the external token that was passed into cancellationToken
  • the potentially different token that could be passed into GetAsyncEnumerator() when it is consumed

but we know from our thoughts earlier that we also have a bunch of other ways of cancelling. We can do something clever here. Recall how the compiler usually combines the above two tokens for us? Well, if we do that ourselves, then instead of getting just a CancellationToken, we find ourselves with a CancellationTokenSource, which gives us lots of control:

public IAsyncEnumerable<TResponse> Duplex(
    IAsyncEnumerable<TRequest> request,
    CancellationToken cancellationToken = default)
    => DuplexImpl(transport, request, cancellationToken);

private async static IAsyncEnumerable<TResponse> DuplexImpl(
    ITransport<TRequest, TResponse> transport,
    IAsyncEnumerable<TRequest> request,
    CancellationToken externalToken,
    [EnumeratorCancellation] CancellationToken enumeratorToken = default)
{
    using var allDone = CancellationTokenSource.CreateLinkedTokenSource(
            externalToken, enumeratorToken);
    // ... todo
}

Our DuplexImpl method here allows the enumerator cancellation to be provided, but (importantly) kept separate from the original external token; this means that it won't yet be combined, and we can do that ourselves using CancellationTokenSource.CreateLinkedTokenSource - much like the compiler would have done for us, but: now we have a CancellationTokenSource that we can cancel when we choose. This means that we can use allDone.Token in all the places we want to ask "are we done yet?", and we're considering everything.

For starters, let's handle the scenario where the consumer doesn't take all the data (out of choice, or because of a fault). We want to trigger allDone however we exit DuplexImpl. Fortunately, the way that iterator blocks are implemented makes this simple (and we're already using it here, via using): recall (from the previous blog post) that foreach and await foreach both (usually) include a using block that invokes Dispose/DisposeAsync on the enumerator instance? Well: anything we put in a finally essentially relocates to that Dispose/DisposeAsync. The upshot of this is that triggering the cancellation token when the consumer is done with us is trivial:

using var allDone = CancellationTokenSource.CreateLinkedTokenSource(
        externalToken, enumeratorToken);
try
{
    // ... todo
}
finally
{   // cancel allDone however we exit
    allDone.Cancel();
}

The next step is to get our producer working - that's our SendAsync code. Because this is duplex, it doesn't have any bearing on the incoming messages, so we'll start that as a completely separate code-path via Task.Run, but we can make it such that if the producer or send faults, it stops the entire show; so if we look just at our // ... todo code, we can add:

var send = Task.Run(async () =>
{
    try
    {
        await foreach (var message in
            request.WithCancellation(allDone.Token))
        {
            await transport.SendAsync(message, allDone.Token);
        }
    }
    catch
    {   // trigger cancellation if send faults
        allDone.Cancel();
        throw;
    }
}, allDone.Token);

// ... todo: receive

await send; // observe send outcome

This starts a parallel operation that consumes the data from our producer, but notice that we're using allDone.Token to pass our combined cancellation knowledge to the producer. This is very subtle, because it represents a cancellation state that didn't even conceptually exist at the time ProducerAsync() was originally invoked. The fact that GetAsyncEnumerator is deferred has allowed us to give it something much more useful, and as long as ProducerAsync() uses the cancellation-token appropriately, it can now be fully aware of the life-cycle of the composite duplex operation.

This just leaves our receive code, which is more or less like it was originally, but again: using allDone.Token:

while (true)
{
    var (success, message) = await transport.TryReceiveAsync(allDone.Token);
    if (!success) break;
    yield return message;
}

// the server's last message stops everything
allDone.Cancel();

Putting all this together gives us a non-trivial library function:

private async static IAsyncEnumerable<TResponse> DuplexImpl(
    ITransport<TRequest, TResponse> transport,
    IAsyncEnumerable<TRequest> request,
    CancellationToken externalToken,
    [EnumeratorCancellation] CancellationToken enumeratorToken = default)
{
    using var allDone = CancellationTokenSource.CreateLinkedTokenSource(
        externalToken, enumeratorToken);
    try
    {
        var send = Task.Run(async () =>
        {
            try
            {
                await foreach (var message in
                    request.WithCancellation(allDone.Token))
                {
                    await transport.SendAsync(message, allDone.Token);
                }
            }
            catch
            {   // trigger cancellation if send faults
                allDone.Cancel();
                throw;
            }
        }, allDone.Token);

        while (true)
        {
            var (success, message) = await transport.TryReceiveAsync(allDone.Token);
            if (!success) break;
            yield return message;
        }

        // the server's last message stops everything
        allDone.Cancel();

        await send; // observe send outcome
    }
    finally
    {   // cancel allDone however we exit
        allDone.Cancel();
    }
}

The key points here being:

  • both the external token and the enumerator token contribute to allDone
  • the transport-level send and receive code uses allDone.Token
  • the producer enumeration uses allDone.Token
  • however we exit our enumerator, allDone is cancelled
    • if transport-receive faults, allDone is cancelled
    • if the consumer terminates early, allDone is cancelled
  • when we receive the last message from the server, allDone is cancelled
  • if the producer or transport-send faults, allDone is cancelled

The one thing it doesn't support well is people using GetAsyncEnumerator() directly and not disposing it. That comes under the heading of "using the API incorrectly", and is self-inflicted.

A side note on ConfigureAwait(false); by default await includes a check on SynchronizationContext.Current; in addition to meaning an extra context-switch, in the case of UI applications this may mean running code on the UI thread that does not need to run on the UI thread. Library code usually does not require this (it isn't as though we're updating form controls here, so we don't need thread-affinity). As such, in library code, it is common to use .ConfigureAwait(false) basically everywhere that you see an await - which bypasses this mechanism. I have not included that in the code above, for readability, but: you should imagine it being there :) By contrast, in application code, you should usually default to just using await without ConfigureAwait, unless you know you're writing something that doesn't need sync-context.
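For example, the send loop from DuplexImpl would, in library-grade code, actually read:

await foreach (var message in request
    .WithCancellation(allDone.Token).ConfigureAwait(false))
{
    await transport.SendAsync(message, allDone.Token)
        .ConfigureAwait(false);
}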

I hope this has been a useful delve into some of the more complex things you can do with cancellation-tokens, and how you can combine them to represent codependent exit conditions.

Thursday, 14 May 2020

The anatomy of async iterators (aka await, foreach, yield)

Here I'm going to discuss the mechanisms and concepts relating to async iterators in C# - with the hope of both demystifying them a bit, and also showing how we can use some of the more advanced (but slightly hidden) features. I'm going to give some illustrations of what happens under the hood, but note: these are illustrations, not the literal generated expansion - this is deliberately to help show what is conceptually happening, so if I ignore some subtle implementation detail: that's not accidental. As always, if you want to see the actual code, tools like https://sharplab.io/ are awesome (just change the "Results" view to "C#" and paste the code you're interested in onto the left).

Iterators in the sync world

Before we discuss async iterators, let's start by recapping iterators. Many folks may already be familiar with all of this, but hey: it helps to set the scene. More importantly, it is useful to allow us to compare and contrast later when we look at how async changes things. So: we know that we can write a foreach loop (over a sequence) of the form:

foreach (var item in SomeSource(42))
{
    Console.WriteLine(item);
}

and for each item that SomeSource returns, we'll get a line in the console. SomeSource could be returning a fully buffered set of data (like a List<string>):

IEnumerable<string> SomeSource(int x)
{
    var list = new List<string>();
    for (int i = 0; i < 5; i++)
        list.Add($"result from SomeSource, x={x}, result {i}");
    return list;
}

but a problem here is that this requires SomeSource to run to completion before we get even the first result, which could take a lot of time and memory - and is just generally restrictive. Often, when we're trying to represent a sequence, it may be unbounded, or at least: open-ended - for example, we could be pulling data from a remote work queue, where a: we only want to be holding one pending item at a time, and b: it may not have a logical "end". It turns out that C#'s definition of a "sequence" (for the purposes of foreach) is fine with this. Instead of returning a list, we can write an iterator block:

IEnumerable<string> SomeSource(int x)
{
    for (int i = 0; i < 5; i++)
        yield return $"result from SomeSource, x={x}, result {i}";
}

This works similarly, but there are some fundamental differences - most noticeably: we don't ever have a buffer - we just make one element available at a time. To understand how this can work, it is useful to take another look at our foreach; the compiler interprets foreach as something like the following:

using (var iter = SomeSource(42).GetEnumerator())
{
    while (iter.MoveNext())
    {
        var item = iter.Current;
        Console.WriteLine(item);
    }
}

We have to be a little loose in our phrasing here, because foreach isn't actually tied to IEnumerable<T> - it is duck-typed against an API shape instead; the using may or may not be there, for example. But fundamentally, the compiler calls GetEnumerator() on the expression passed to foreach, then creates a while loop checking MoveNext() (which defines "is there more data?" and advances the mechanism in the success case), then accesses the Current property (which exposes the element we advanced to). As an aside, historically (prior to C# 5) the compiler used to scope item outside of the while loop, which might sound innocent, but it was the source of absolutely no end of confusion, code errors, and questions on Stack Overflow (think "captured variables").
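(To see why that scoping mattered - with the old scoping, every lambda below captured the same single variable:)

var actions = new List<Action>();
foreach (var item in new[] { 1, 2, 3 })
{
    actions.Add(() => Console.WriteLine(item));
}
foreach (var action in actions) action();
// C# 5 onwards: prints 1, 2, 3
// before C# 5 (item scoped outside the loop): printed 3, 3, 3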

So; hopefully you can see in the above how the consumer can access an unbounded forwards-only sequence via this MoveNext() / Current approach; but how does that get implemented? Iterator blocks (anything involving the yield keyword) are actually incredibly complex, so I'm going to take a lot of liberties here, but what is going on is similar to:

IEnumerable<string> SomeSource(int x)
    => new GeneratedEnumerable(x);

class GeneratedEnumerable : IEnumerable<string>
{
    private int x;
    public GeneratedEnumerable(int x)
        => this.x = x;

    public IEnumerator<string> GetEnumerator()
        => new GeneratedEnumerator(x);

    // non-generic fallback
    IEnumerator IEnumerable.GetEnumerator()
        => GetEnumerator();
}

class GeneratedEnumerator : IEnumerator<string>
{
    private int x, i;
    public GeneratedEnumerator(int x)
        => this.x = x;

    public string Current { get; private set; }

    // non-generic fallback
    object IEnumerator.Current => Current;

    // if we had "finally" code, it would go here
    public void Dispose() { }

    // our "advance" logic
    public bool MoveNext()
    {
        if (i < 5)
        {
            Current = $"result from SomeSource, x={x}, result {i}";
            i++;
            return true;
        }
        else
        {
            return false;
        }
    }

    // this API is essentially deprecated and never used
    void IEnumerator.Reset() => throw new NotSupportedException();
}

Let's tear this apart:

  • firstly, we need some object to represent IEnumerable<T>, but we also need to understand that IEnumerable<T> and IEnumerator<T> (as returned from GetEnumerator()) are different APIs; in the generated version there is a lot of overlap and they can share an instance, but to help discuss it, I've kept the two concepts separate.
  • when we call SomeSource, we create our GeneratedEnumerable which stores the state (x) that was passed to SomeSource, and exposes the required IEnumerable<T> API
  • later (and it could be much later), when the caller iterates (foreach) the data, GetEnumerator() is invoked, which calls into our GeneratedEnumerator to act as the cursor over the data
  • our MoveNext() logic implements the same for loop conceptually, but one step per call to MoveNext(); if there is more data, Current is assigned with the thing we would have passed to yield return
  • note that there is also a yield break C# keyword, which terminates iteration; this would essentially be return false in the generated expansion
  • note that there are some nuanced differences in my hand-written version that the C# compiler needs to deal with; for example, what happens if I change x in my enumerator code (MoveNext()), and then later iterate the data a second time - what is the value of x? emphasis: I don't care about this nuance for this discussion!

Hopefully this gives enough of a flavor to understand foreach and iterators (yield) - now let's get onto the more interesting bit: async.

Why do we need async iterators?

The above works great in a synchronous world, but a lot of .NET work is now favoring async/await, in particular to improve server scalability. The big problem in the above code is the bool MoveNext(). This is explicitly synchronous. If the thing it is doing takes some time, we'll be blocking a thread, and blocking a thread is increasingly anathema to us. In the context of our earlier "remote work queue" example, there might not be anything there for seconds, minutes, hours. We really don't want to block threads for that kind of time! The closest we can do without async iterators is to fetch the data asynchronously, but buffered - for example:

async Task<List<string>> SomeSource(int x) {...}

But this is not the same semantics - and is getting back into buffering. Assuming we don't want to fetch everything in one go, to get around this we'd eventually end up implementing some kind of "async batch loop" monstrosity that effectively re-implements foreach using manual ugly code, negating the reasons that foreach even exists. To address this, C# and the BCL have recently added support for async iterators, yay! The new APIs (which are available down to net461 and netstandard20 via NuGet) are:

public interface IAsyncEnumerable<out T>
{
    IAsyncEnumerator<T> GetAsyncEnumerator(CancellationToken cancellationToken = default);
}
public interface IAsyncEnumerator<out T> : IAsyncDisposable
{
    T Current { get; }
    ValueTask<bool> MoveNextAsync();
}
public interface IAsyncDisposable
{
    ValueTask DisposeAsync();
}

Let's look at our example again, this time: with added async; we'll look at the consumer first (the code doing the foreach), so for now, let's imagine that we have:

IAsyncEnumerable<string> SomeSourceAsync(int x)
    => throw new NotImplementedException();

and focus on the loop; C# now has the await foreach concept, so we can do:

await foreach (var item in SomeSourceAsync(42))
{
    Console.WriteLine(item);
}

and the compiler interprets this as something similar to:

await using (var iter = SomeSourceAsync(42).GetAsyncEnumerator())
{
    while (await iter.MoveNextAsync())
    {
        var item = iter.Current;
        Console.WriteLine(item);
    }
}

(note that await using is similar to using, but DisposeAsync() is called and awaited, instead of Dispose() - even cleanup code can be asynchronous!)

The key point here is that this is actually pretty similar to our sync version, just with added await. Ultimately, however, the moment we add await the entire body is ripped apart by the compiler and rewritten as an asynchronous state machine. That isn't the topic of this article, so I'm not even going to try and cover how await is implemented behind the scenes. For today "a miracle happens" will suffice for that. The observant might also be wondering "wait, but what about cancellation?" - don't worry, we'll get there!

So what about our enumerator? Along with await foreach, we can also now write async iterators with yield; for example, we could do:

async IAsyncEnumerable<string> SomeSourceAsync(int x)
{
    for (int i = 0; i < 5; i++)
    {
        await Task.Delay(100); // simulate async something
        yield return $"result from SomeSource, x={x}, result {i}";
    }
}

In real code, we could now be consuming data from a remote source asynchronously, and we have a very effective mechanism for expressing open-ended sequences of asynchronous data. In particular, remember that the await iter.MoveNextAsync() might complete synchronously, so if data is available immediately, there is no context switch. We can imagine, for example, an iterator block that requests data from a remote server in pages, and yield returns each record in the current page (making it available immediately), only doing an await when it needs to fetch the next page.
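
To make that concrete - a sketch, assuming a hypothetical FetchPageAsync(int page) helper that fetches one page of records (it isn't part of the original example):

async IAsyncEnumerable<string> AllRecordsAsync()
{
    int page = 0;
    while (true)
    {
        // we only pay for an await when we actually need more data
        List<string> batch = await FetchPageAsync(page++);
        if (batch.Count == 0) yield break; // no more pages: stop iterating

        foreach (var record in batch)
        {
            // records in the current page are yielded without awaiting, so the
            // consumer's await MoveNextAsync() completes synchronously here
            yield return record;
        }
    }
}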

Behind the scenes, the compiler generates types to implement the IAsyncEnumerable<T> and IAsyncEnumerator<T> pieces, but this time they are even more obtuse, owing to the async/await restructuring. I do not intend to try and cover those here - it is my hope instead that we wave a hand and say "you know that expansion we wrote by hand earlier? like that, but with more async". However, there is a very important topic that we have overlooked, and that we should cover: cancellation.

But what about cancellation?

Most async APIs support cancellation via a CancellationToken, and this is no exception; look back up to IAsyncEnumerable<T> and you'll see that it can be passed into the GetAsyncEnumerator() method. But if we're not writing the loop by hand, how do we do this? This is achieved via WithCancellation, similarly to how ConfigureAwait can be used to configure await - and indeed, there's even a ConfigureAwait we can use too! For example, we could do (showing both config options in action here):

await foreach (var item in SomeSourceAsync(42)
    .WithCancellation(cancellationToken).ConfigureAwait(false))
{
    Console.WriteLine(item);
}

which would be semantically equivalent to:

var iter = SomeSourceAsync(42).GetAsyncEnumerator(cancellationToken);
await using (iter.ConfigureAwait(false))
{
    while (await iter.MoveNextAsync().ConfigureAwait(false))
    {
        var item = iter.Current;
        Console.WriteLine(item);
    }
}

(I've had to split the iter local out to illustrate that the ConfigureAwait applies to the DisposeAsync() too - via await iter.DisposeAsync().ConfigureAwait(false) in a finally)
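
Fully desugared (approximately), that is:

var iter = SomeSourceAsync(42).GetAsyncEnumerator(cancellationToken);
try
{
    while (await iter.MoveNextAsync().ConfigureAwait(false))
    {
        var item = iter.Current;
        Console.WriteLine(item);
    }
}
finally
{
    // the ConfigureAwait applies to the asynchronous cleanup, too
    await iter.DisposeAsync().ConfigureAwait(false);
}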

So; now we can pass a CancellationToken into our iterator... but - how can we use it? That's where things get even more fun! The naive way to do this would be to think along the lines of "I can't take a CancellationToken until GetAsyncEnumerator is called, so... perhaps I can create a type to hold the state until I get to that point, and create an iterator block on the GetAsyncEnumerator method" - something like:

// this is unnecessary; do not copy this!
IAsyncEnumerable<string> SomeSourceAsync(int x)
    => new SomeSourceEnumerable(x);
class SomeSourceEnumerable : IAsyncEnumerable<string>
{
    private int x;
    public SomeSourceEnumerable(int x)
        => this.x = x;

    public async IAsyncEnumerator<string> GetAsyncEnumerator(
        CancellationToken cancellationToken = default)
    {
        for (int i = 0; i < 5; i++)
        {
            await Task.Delay(100, cancellationToken); // simulate async something
            yield return $"result from SomeSource, x={x}, result {i}";
        }
    }
}

The above works. If a CancellationToken is passed in via WithCancellation, our iterator will be cancelled at the correct time - including during the Task.Delay; we could also check IsCancellationRequested or call ThrowIfCancellationRequested() at any point in our iterator block, and all the right things would happen. But; we're making life hard for ourselves - the compiler can do this for us, via [EnumeratorCancellation]. We could also just have:

async IAsyncEnumerable<string> SomeSourceAsync(int x,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    for (int i = 0; i < 5; i++)
    {
        await Task.Delay(100, cancellationToken); // simulate async something
        yield return $"result from SomeSource, x={x}, result {i}";
    }
}

This works similarly to our approach above - our cancellationToken parameter makes the token from GetAsyncEnumerator() (via WithCancellation) available to our iterator block, and we haven't had to create any dummy types. There is one slight nuance, though... we've changed the signature of SomeSourceAsync by adding a parameter. The code we had above still compiles because the parameter is optional. But this prompts the question: what happens if I passed one in? For example, what are the differences between:

// option A - no cancellation
await foreach (var item in SomeSourceAsync(42))

// option B - cancellation via WithCancellation
await foreach (var item in SomeSourceAsync(42).WithCancellation(cancellationToken))

// option C - cancellation via SomeSourceAsync
await foreach (var item in SomeSourceAsync(42, cancellationToken))

// option D - cancellation via both
await foreach (var item in SomeSourceAsync(42, cancellationToken).WithCancellation(cancellationToken))

// option E - cancellation via both with different tokens
await foreach (var item in SomeSourceAsync(42, tokenA).WithCancellation(tokenB))

The answer is that the right thing happens: it doesn't matter which API you use - if a cancellation token is provided, it will be respected. If you pass two different tokens, then cancelling either token cancels the iteration. What happens is that the token passed via the parameter is stored as a field on the generated enumerable type, and when GetAsyncEnumerator is called, the parameter to GetAsyncEnumerator and the field are inspected. If they are both genuine but different cancellable tokens, CancellationTokenSource.CreateLinkedTokenSource is used to create a combined token (you can think of CreateLinkedTokenSource as the cancellation version of Task.WhenAny); otherwise, if either is genuine and cancellable, it is used. The result is that when you write an async cancellable iterator, you don't need to worry too much about whether the caller used the API directly vs indirectly.
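
As a simplified sketch of that selection logic (not the literal generated code - for example, the real machinery also disposes the linked source when the enumerator is disposed):

static CancellationToken SelectToken(CancellationToken viaParameter,
    CancellationToken viaGetAsyncEnumerator, out CancellationTokenSource linked)
{
    linked = null;
    if (viaParameter.CanBeCanceled && viaGetAsyncEnumerator.CanBeCanceled
        && viaParameter != viaGetAsyncEnumerator)
    {
        // two different cancellable tokens: link them, so that cancelling
        // either one cancels the combined token
        linked = CancellationTokenSource.CreateLinkedTokenSource(
            viaParameter, viaGetAsyncEnumerator);
        return linked.Token;
    }
    // otherwise, use whichever (if either) can actually be cancelled
    return viaParameter.CanBeCanceled ? viaParameter : viaGetAsyncEnumerator;
}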

You might be more concerned by the fact that we've changed the signature, however; in that case, a neat trick is to use two methods - one without the token that is for consumers, and one with the token for the actual implementation:

public IAsyncEnumerable<string> SomeSourceAsync(int x)
    => SomeSourceImplAsync(x);

private async IAsyncEnumerable<string> SomeSourceImplAsync(int x,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    for (int i = 0; i < 5; i++)
    {
        await Task.Delay(100, cancellationToken); // simulate async something
        yield return $"result from SomeSource, x={x}, result {i}";
    }
}
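
The consumer only ever sees the clean signature, yet cancellation still reaches our iterator block: WithCancellation hands the token to GetAsyncEnumerator(), and [EnumeratorCancellation] surfaces it as the cancellationToken parameter - for example:

await foreach (var item in SomeSourceAsync(42).WithCancellation(cancellationToken))
{
    Console.WriteLine(item); // cancelling the token aborts the Task.Delay inside the iterator
}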

This would seem an ideal candidate for a "local function", but unfortunately at the current time, parameters on local functions are not allowed to be decorated with attributes. It is my hope that the language / compiler folks take pity on us, and allow us to do (in the future) something more like:

public IAsyncEnumerable<string> SomeSourceAsync(int x)
{
    return Impl();

    // this does not compile today
    async IAsyncEnumerable<string> Impl(
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        for (int i = 0; i < 5; i++)
        {
            await Task.Delay(100, cancellationToken); // simulate async something
            yield return $"result from SomeSource, x={x}, result {i}";
        }
    }
}

or the equivalent using static local functions, which is usually my preference to avoid any surprises in how capture works (a static local function cannot capture x, so it would be passed in explicitly - see the sketch below). The good news is that this works in the preview language versions, but that is not a guarantee that it will "land".
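
Such a static version might look something like this (equally hypothetical - it relies on the same not-yet-legal attribute placement):

public IAsyncEnumerable<string> SomeSourceAsync(int x)
{
    return Impl(x);

    // x is passed as a parameter, because a static local function cannot capture it
    static async IAsyncEnumerable<string> Impl(int x,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        for (int i = 0; i < 5; i++)
        {
            await Task.Delay(100, cancellationToken); // simulate async something
            yield return $"result from SomeSource, x={x}, result {i}";
        }
    }
}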

Summary

So; that's how you can implement and use async iterators in C# now. We've looked at both the consumer and producer versions of iterators, for both synchronous and asynchronous code paths, and looked at various ways of accessing cancellation of asynchronous iterators. There is a lot going on here, but: hopefully it is useful and meaningful.