Spans and ref
part 1 : ref
One of the new features in C# 7 is by-reference (ref) return values and locals. This is a complex topic to explain, but a good example of why we might want this is “spans” (Span<T>). I don’t have any inside knowledge on the design meetings, but I’d go further and speculate that if Span<T> wasn’t a thing, the ref changes wouldn’t have happened, so it makes sense to consider them together. Most of the fun things you can do with ref returns / locals start to make a lot more sense when we look at Span<T>, which we’ll do in part 2, but first we need to remind ourselves what ref means, and explore the new ref changes.
ref returns and locals
There’s a reason that ref (and its cousin out) aren’t used extensively on many APIs: they are hard to fully understand. A lot of people will describe them in terms of “changes being visible”, but that is just a side-effect, not the meaning. I don’t mean this as a criticism: it isn’t necessary for every C# developer to have a deep knowledge of the inner workings of these things.
But consider the following common question:
void PassByRef()
{
    int i = 42;
    IncrementByRef(ref i);
    // what does this line print, and why?
    Console.WriteLine(i);
}

void IncrementByRef(ref int x)
{
    x = x + 1; // increment
}
Most developers will be able to correctly understand that this will output “43”, but asking them exactly what happened can reveal very different levels of understanding. The short summary is that a reference to the variable i was passed to IncrementByRef; all the code in IncrementByRef that looks like it is reading / writing to the parameter is actually dereferencing the parameter at each stage. This is clearer if we write it in unsafe pointer code instead:
unsafe void PassByPointer()
{
    int i = 42;
    IncrementByPointer(&i);
    // what does this line print, and why?
    Console.WriteLine(i);
}

unsafe void IncrementByPointer(int* x)
{
    *x = *x + 1; // increment
}
Here we can clearly see the “take a reference to” (&) and “dereference” (*) operations, but there are a lot of problems with pointers:
- the garbage collector (GC) refuses to even try to walk pointers, which means you need to be very careful to only access memory that won’t move (unmanaged memory, stack memory, or pinned managed objects)
- pointer arithmetic makes it trivially possible to access adjacent memory without any bounds checking
- it forces us to use unsafe, which makes it very easy to make subtle but major bugs that cause just about any level of silliness imaginable
- pointers only work for a small subset of types - essentially primitives and structs composed of primitives
The point of ref parameters is to get the best of both worlds. ref is essentially just like pointers, but with enough restrictions to stop us getting into messes. The additional sanity checks and restrictions mean that the IL knows enough about the meaning for the GC to sensibly be able to navigate them without getting confused, so we don’t need to worry about the reference suddenly being meaningless - and since we can’t do anything too silly, we don’t need to drop to unsafe. And it should work for any regular type.
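To make the “any regular type” point concrete, here is a minimal sketch using a struct that contains a string, which falls outside the "primitives and structs composed of primitives" subset that pointers are limited to. The Person type and the Rename method are hypothetical names invented for this illustration.

```csharp
using System;

static class RefAnyTypeDemo
{
    // Hypothetical struct that holds a managed reference (a string),
    // so it is not one of the pointer-friendly types described above.
    public struct Person
    {
        public string Name;
        public int Age;
    }

    // Every read / write of p is a dereference of the caller's variable.
    public static void Rename(ref Person p, string newName)
    {
        p.Name = newName;
        p.Age += 1;
    }

    public static void Main()
    {
        var person = new Person { Name = "Alice", Age = 30 };
        Rename(ref person, "Bob");
        Console.WriteLine(person.Name + " " + person.Age); // prints "Bob 31"
    }
}
```

No unsafe is needed here, and the changes land in the caller's person variable rather than in a copy.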
But, historically, this ability to add automatic dereferencing and talk about ref has been restricted to method parameters; no fields, no locals, and no return values.
ref locals
The first change in C# 7 allows us to talk about automatically dereferenced ref items as local (method) variables. In the same way that a ref parameter is denoted by a ref prefix before the type, so it is with ref locals, with the added bonus that ref var is legal:
void ByRefLocal()
{
    int i = 42;
    ref var x = ref i;
    x = x + 1;
    // what does this line print, and why?
    Console.WriteLine(i);
}
This prints “43” for exactly the same reasons as before - the only difference is that we now have a syntax to express ref when talking about locals. Previously, we would have had to add an additional method to switch to ref semantics for a local. One slight peculiarity here is that a ref local must be assigned a value at the point of declaration - and we can only assign it a value at this point. Any further attempt to assign a value to a ref local is interpreted as a dereferencing assignment - the *x = in our pointer example.
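A small sketch of that last point, under the C# 7 rules described above: once x has been bound to i, a plain assignment to x copies a value into i rather than making x refer to something else. The class and method names are made up for the example.

```csharp
using System;

static class RefLocalAssignDemo
{
    public static int[] Run()
    {
        int i = 42;
        int j = 100;
        ref int x = ref i; // x is bound to i at declaration
        x = j;             // dereferencing assignment: copies 100 into i
        j = 7;             // has no effect on i; x never referred to j
        x = x + 1;         // i becomes 101
        return new[] { i, j };
    }

    public static void Main()
    {
        int[] result = Run();
        Console.WriteLine(result[0] + "," + result[1]); // prints "101,7"
    }
}
```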
This ability is nice, but it isn’t very useful until we combine it with…
ref return
A much more interesting and powerful addition in C# 7 is ref returns. As the name suggests, this allows us to return a ref value from a method. We can capture this value into a ref local as long as we include an additional ref just before the value to make it very clear that we don’t want to dereference - which is the regular behaviour whenever touching a ref parameter or local:
ref int GetArrayReference(int[] items, int index)
    => ref items[index];

void IncrementInsideArrayByRef()
{
    int[] values = { 1, 2, 3 };
    ref int item = ref GetArrayReference(values, 1);
    IncrementByRef(ref item);
    // what does this line print, and why?
    Console.WriteLine(string.Join(",", values));
}
Here the GetArrayReference method provides the caller a ref to inside the array. Note that the ability to get a ref into an array is not by itself new - this has always worked:
IncrementByRef(ref values[1]);
The bit that is new and different is only the ability to return a ref value.
Since we increment a ref int that refers to the array index 1 (the second element), the result is “1,3,3”.
Note that we don’t need to capture the ref value before we use it - we can also pass a ref return result directly into a ref or out parameter:
IncrementByRef(ref GetArrayReference(values, 1));
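The difference between capturing a ref return with and without the extra ref is easiest to see with a struct element. This is a sketch only; Point and GetPointReference are hypothetical names that mirror GetArrayReference above.

```csharp
using System;

static class RefReturnStructDemo
{
    public struct Point
    {
        public int X;
        public int Y;
    }

    // Same shape as GetArrayReference, but for an array of structs.
    public static ref Point GetPointReference(Point[] points, int index)
        => ref points[index];

    public static void Main()
    {
        var points = new Point[2];

        // Without ref: the result is dereferenced and we get a copy.
        Point copy = GetPointReference(points, 0);
        copy.X = 10; // changes only the local copy

        // With ref: the local is an alias for the array element itself.
        ref Point alias = ref GetPointReference(points, 1);
        alias.X = 20; // changes points[1] in place

        Console.WriteLine(points[0].X + "," + points[1].X); // prints "0,20"
    }
}
```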
Are there restrictions on what we can ref return?
Yes, yes there are. Figuring out the rules on what can and can’t be safely returned as ref without letting the author get into an accidental ugly mess of landmines is probably why it has never been supported in the past. We’ve seen that we can return ref references into arrays. And we’ve seen that we can take a ref of a local variable. But a local only exists in the context of the current stack-frame: very bad things would happen if we could ref return a ref to a local - the caller would have a ref to a position outside the active stack, which would be undefined behavior:
ref int ReturnInvalidStackReference()
{
    int i = 32;
    return ref i; // can't do this
}

void WhatHappensHere()
{
    ref int v = ref ReturnInvalidStackReference();
    CallSomeOtherMethods(); // to use the stack
    int i = v; // dereference the ref
}
You’ll be relieved to know that the compiler doesn’t let us do this - the compiler is very strict to ensure that if we want to ref return something, then it must demonstrably refer to a safe value. Put very simply: as long as the assignment doesn’t involve a ref to a local, we’ll be fine. return ref i; clearly involves the local i, so can’t be returned. The return ref expression is inspected for safety; each part of the expression must be safe. This includes any ref parameters that we have passed into any method calls:
ref int ReturnInvalidStackReference()
{
    int j = 42;
    return ref DoSomething(ref j);
}
This might look like a confusing restriction, but note that DoSomething could be implemented as:
ref int DoSomething(ref int evil) => ref evil;
which would expose the ref j stack reference to the caller of ReturnInvalidStackReference, so any such possibility is excluded. The implementation here is pretty solid, so if it refuses to let you ref return something, you’ve probably attempted something that looks too much like you’re involving locals of the current method.
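For contrast, here is a sketch of a ref return that does pass the safety check: the method returns one of its own ref parameters, so the storage belongs to the caller and no local of the returning method is involved. Larger is a hypothetical helper, not something from a library.

```csharp
using System;

static class SafeRefReturnDemo
{
    // Returning a ref parameter is fine: the caller supplied the storage,
    // so it is still valid after this method returns.
    public static ref int Larger(ref int a, ref int b)
    {
        if (a >= b) return ref a;
        return ref b;
    }

    public static void Main()
    {
        int left = 5, right = 9;
        // The refs to our locals are used here but never returned from Main.
        ref int biggest = ref Larger(ref left, ref right);
        biggest = 0; // writes through to 'right'
        Console.WriteLine(left + "," + right); // prints "5,0"
    }
}
```

Main is allowed to hold a ref to its own locals; what it could not do is ref return the result of Larger(ref left, ref right), for exactly the reason shown with DoSomething.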
But no ref fields
We have ref parameters, ref locals and ref returns. However, there is currently no support for ref fields (instance or static variables). Specifically, this is not legal:
struct Foo {
    ref int _reference;
}
or:
class Foo {
    ref int _reference;
}
The reason for this again relates to undefined behaviour and escaping the stack-frame. If we could put a ref into a field, then we would be in grave danger of providing invalid access (at some point later on) to a position in memory that now means something completely unrelated to what it meant when we took the ref. Strictly speaking, we could prove that ref fields would be valid if the assigned value comes from inside an object, and we’ll discuss another safe scenario in part 2, but currently the rule is simple: no ref fields.
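One way to live with this rule, sketched here as an assumption rather than a recommended pattern, is to store the owning object plus enough information to find the location again, and hand out the ref on demand. ArraySlot is a hypothetical type invented for the example.

```csharp
using System;

static class NoRefFieldDemo
{
    // Hypothetical stand-in for a ref field: remember the array and the
    // index, and produce the ref only when asked.
    public struct ArraySlot
    {
        private readonly int[] _items;
        private readonly int _index;

        public ArraySlot(int[] items, int index)
        {
            _items = items;
            _index = index;
        }

        // A ref-returning property; the ref points into the array (heap),
        // not into this struct.
        public ref int Value => ref _items[_index];
    }

    public static void Main()
    {
        int[] values = { 1, 2, 3 };
        var slot = new ArraySlot(values, 1);
        slot.Value += 10; // writes through to values[1]
        Console.WriteLine(string.Join(",", values)); // prints "1,12,3"
    }
}
```

This only covers the "inside an object" case; it does nothing for a ref to a stack local, which is the dangerous scenario.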
When is a local not a local?
This has some additional consequences for a number of code concepts that look like locals, but which are actually fields; for example:
- locals in an iterator block (yield return)
- locals in an async method
- captured variables in lambdas, anonymous methods, and LINQ syntax comprehensions
All of these are, for similar reasons (with the added bonus of lifetime issues), the same scenarios where you can’t use ref or out parameters or unsafe. So basically: if you can’t use ref / out parameters or unsafe, you won’t be able to use ref locals or ref return either.
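A common way around this, shown here as a sketch under the C# 7 rules discussed in this post, is to keep the ref work inside an ordinary helper method and call that from the iterator block. IncrementAt and IncrementAll are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

static class RefInIteratorDemo
{
    // Ordinary method: its locals are real locals, so a ref local is fine.
    public static int IncrementAt(int[] items, int index)
    {
        ref int slot = ref items[index];
        slot = slot + 1;
        return slot; // returns the value, not the ref
    }

    // Iterator block: its "locals" are secretly fields, so the ref local
    // lives in the helper above rather than here.
    public static IEnumerable<int> IncrementAll(int[] items)
    {
        for (int i = 0; i < items.Length; i++)
        {
            yield return IncrementAt(items, i);
        }
    }

    public static void Main()
    {
        int[] values = { 1, 2, 3 };
        foreach (int v in IncrementAll(values))
        {
            // enumerating runs the iterator, which updates the array
        }
        Console.WriteLine(string.Join(",", values)); // prints "2,3,4"
    }
}
```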
One additional scenario, though, is tuples: as I discussed previously, tuples are secretly implemented as fields on the ValueTuple<...> family. So: no ref values in tuples.
Summary
This should give you enough to start understanding what ref locals and ref returns are, but for them to really start to make sense we need a concrete example. And we get that in “spans”, coming up next!