A really interesting feature quietly slipped into C# 7.3 - interesting to me, at least - but which I’ve seen almost no noise about. As I’ve said many times before: I have niche interests - I spend a lot of time in library code, or acting in a consulting capacity on performance tuning application code - so in both capacities, I tend to look at performance tweaks that aren’t usually needed, but when they are: they’re glorious. As I say: I haven’t seen this discussed a lot, so: “be the change you want to see” - here’s my attempt to sell you on the glory of ref foreach
.
Because I know folks have a short attention span, I’ll start with the money shot:
Method | Mean | Gen 0 | Allocated |
---|---|---|---|
ListForEachLoop | 2,724.7 ns | - | - |
ArrayForEachLoop | 972.2 ns | - | - |
CustomForEachLoop | 987.2 ns | - | - |
ListForLoop | 1,201.3 ns | - | - |
ArrayForLoop | 593.0 ns | - | - |
CustomForLoop | 596.2 ns | - | - |
ListLinqSum | 7,057.1 ns | 0.0076 | 80 B |
ArrayLinqSum | 4,832.7 ns | - | 32 B |
ListForEachMethod | 2,070.6 ns | 0.0114 | 88 B |
ListRefForeachLoop | 586.2 ns | - | - |
ListSpanForLoop | 590.3 ns | - | - |
ArrayRefForeachLoop | 574.1 ns | - | - |
CustomRefForeachLoop | 581.0 ns | - | - |
CustomSpanForeachLoop | 816.1 ns | - | - |
CustomSpanRefForeachLoop | 592.2 ns | - | - |
With the point being: I want to sell you on those sub-600 nanosecond versions, rather than the multi-microsecond versions of the same operation.
What the hell is ref foreach
?
First, simple recap: let’s consider:
foreach (var someValue in someSequence)
{
someValue.DoSomething();
}
The details here may vary depending on what someSequence
is, but conceptually, what this is doing is reading each value from someSequence
into a local variable someValue
, and calling the DoSomething()
method on each. If the type of someValue
is a reference-type (i.e. a class
or interface
), then each “value” in the sequence is just that: a reference - so we’re not really moving much data around here, just a pointer.
When this gets interesting is: what if the type of someValue
is a struct
? And in particular, what if it is a heckin’ chonka of a struct
? (and yes, there are some interesting scenarios where struct
is useful outside of simple data types, especially if we enforce readonly struct
to prevent ourselves from shooting our own feet off) In that case, copying the value out of the sequence can be a singificant operation (if we do it often enough to care). Historically, the foreach
syntax has an inbuilt implementation for some types (arrays, etc), falling back to a duck-typed pattern that relies on a bool MoveNext()
and SomeType Current {get;}
pair (often, but not exclusively, provided via IEnumerator<T>
) - so the “return the entire value” is baked into the old signature (via the Current
property).
What if we could avoid that?
For arrays: we already can!
Let’s consider that someSequence
is explicitly typed as an array. It is very tempting to think that foreach
and for
over the array work the same - i.e. the same foreach
as above, compared to for
:
for(int i = 0 ; i < someArray.Length ; i++)
{
someArray[i].DoSomething();
}
But: if we run both of those through sharplab, we can see that they compile differently; in C#, the difference is that foreach
is basically:
SomeType someType = someArray[index];
someType.DoSomething();
which fetches the entire value out of the array, where as for
is:
someArray[index].DoSomething();
Now, you might be looking at that and thinking “aren’t they the same thing?”, and the simple answer is: “no, no they are not”. You see, there are two ways of accessing values inside an array; you can copy the data out (ldelem
in IL, which returns the value at the index), or you can access the data directly inside the array (ldelema
in IL, which returns the address at the index). Ironically, we need an address to call the DoSomething()
method, so for the foreach
version, this actually becomes three steps: “copy out the value from the index, store the value to a local, get the address of a local” - instead of just “get the address of the index”; or in IL:
IL_0006: ldloc.0 // the array
IL_0007: ldloc.1 // the index
IL_0008: ldelem SomeType // read value out from array:index
IL_000d: stloc.2 // store in local
IL_000e: ldloca.s 2 // get address of local
IL_0010: call instance void SomeType::DoSomething() // invoke method
vs
IL_0004: ldarg.0 // the array
IL_0005: ldloc.0 // the index
IL_0006: ldelema SomeType // get address of array:index
IL_000b: call instance void SomeType::DoSomething() // invoke method
So by using for
here, not only have we avoided copying the entire value, but we’ve dodged a few extra operations too! Nice. Depending on the size of the value being iterated (again, think “chunky struct” here), using for
rather than foreach
on an array (making sure you snapshot the value to elide bounds checks) can make a significant difference!
But: that’s arrays, and we aren’t always interested in arrays.
But how does that help me outside arrays?
You might reasonably be thinking “great, but I don’t want to just hand arrays around” - after all, they give me no ability to protect the data, and they’re inconvenient for sizing - you can’t add/remove, short of creating a second array and copying all the data. This is where C# 7.3 takes a huge flex; it introduces a few key things here:
- C# 7.0 adds
ref
return values from custom methods including indexers, andref
local values (so you don’t need to use them immediately as a return value) - C# 7.2 adds
ref readonly
to most places whereref
might be used (andreadonly struct
, which often applies here) - C# 7.3 adds
ref
(andref readonly
) asforeach
L-values (i.e. the iterator value corresponding to.Current
)
Note that with ref
, the caller can mutate the data in-place, which is not always wanted; ref readonly
signals that we don’t want that to happen, hence why it is so often matched with readonly struct
(to avoid having to make defensive copies of data), but as a warning: readonly
is always a guideline, not a rule; a suitably motivated caller can convert a ref readonly
to a ref
, and can convert a ReadOnlySpan<T>
to a Span<T>
, and convert any of the above to an unmanaged T*
pointer (at which point you can forget about all safety); this is not a bug, but a simple reality: everything is mutable if you try hard enough.
These languages features provide the building blocks - especially, but not exclusively, when combined with Span<T>
; Span<T>
(and the twin, ReadOnlySpan<T>
) provide unified access to arbitrary data, which could be a slice of an array, but could be anything else - with the usual .Length
, indexer (this[int index]
) and foreach
support you’d expect, with some additional compiler and JIT tricks (much like with arrays) to make them fly. Since spans are naturally optimized, one of the first things we can do - if we don’t want to deal with arrays - is: deal with spans instead! This is sometimes a little hard to fit into existing systems without drastically refactoring the code, but more recently (.NET 5+), we get helper methods like CollectionsMarshal.AsSpan
, which gives us the sized span of the data underpinning a List<T>
. This is only useful transiently (as any Add
/Remove
on the list will render the span broken - the length will be wrong, and it may now even point to the wrong array instance, if the list had to re-size the underlying data), but when used correctly, it allows us to access the data in situ rather than having to go via the indexer or iterator (both of which copy out the entire value at each position). For example:
foreach (ref var tmp in CollectionsMarshal.AsSpan(someList))
{ // also works identically with "ref readonly var", since this is
// a readonly struct
tmp.DoSomething();
}
Our use of ref var tmp
with foreach
here means that the L-value (tmp
) is a managed pointer to the data - not the data itself; we have avoided copying the overweight value-type, and called the method in-place.
If you look carefully, the indexer on a span is not T this[int index]
, but rather: ref T this[int index]
(or ref readonly T this[int index]
for ReadOnlySpan<T>
), so we can also use a for
loop, and avoid copying the data at any point:
var span = CollectionsMarshal.AsSpan(someList);
for (int i = 0; i < span.Length; i++)
{
span[i].DoSomething();
}
Generalizing this
Sometimes, spans aren’t viable either - for whatever reason. The good news is: we can do the exact same thing with our own types, in two ways:
- we can write our own types with an indexer that returns a
ref
orref readonly
managed pointer to the real data - we can write our own iterator types with a
ref
orref readonly
return value onCurrent
; this won’t satisfyIEnumerator<T>
, but the compiler isn’t limited toIEnumerator<T>
, and if you’re writing a custom iterator (rather than using ayield return
iterator block): you’re probably using a custom value-type iterator and avoiding the interface to make sure it never gets boxed accidentally, so: nothing is lost!
Purely for illustration (you wouldn’t do this - you’d just use ReadOnlySpan<T>
), a very simple custom iterator could be something like:
public struct Enumerator
{
private readonly SomeStruct[] _array;
private int _index;
internal Enumerator(SomeStruct[] array)
{
_array = array;
_index = -1;
}
public bool MoveNext()
=> ++_index < _array.Length;
public ref readonly SomeStruct Current
=> ref _array[_index];
}
which would provide foreach
access almost as good as a direct span. If the caller uses foreach (var tmp in ...)
rather than foreach(ref readonly var tmp in ...)
, then the compiler will simply de-reference the value for the caller, which it would have done anyway in the old-style foreach
, so: once again: no harm.
Summary
In modern C#, we have a range of tricks that can help in certain niche scenarios relating to sequences of - in particular - value types. These scenarios don’t apply to everyone, and that’s fine. If you never need to use any of the above: that’s great, and good luck to you. But when you do need them, they are incredibly powerful and versatile, and a valuable tool in the optimizer’s toolbox.
The benchamrk code used for the table at the start of the post is included here.