Friday, 15 November 2013

Allocation, Allocation, Allocation

Foreword

If you read nothing else in this post, please at least read the bit about params; thanks. I’m going to talk about a number of things that cause unforeseen allocations, and then discuss possible options for avoiding them. Some of the options presented are intentionally exotic and esoteric, and are not intended to be used directly – they are for the purposes of discussion only.

Additional: Blogger and LiveWriter seem to be squabbling about the layout; my apologies for any spacing issues

Back story

Today I rant about allocations; “but objects are cheap!” I hear you cry – and indeed you’d be right. So is an individual coffee bean, yet coffee is one of the most highly traded commodities in the world (next to crude oil, IIRC). Any Java joke in here is coincidental. Use something routinely enough, and it is amazing how it adds up. On 64-bit, even an empty object is pretty big.
.NET has a first class garbage collector, and it is constantly improving, but on a busy system memory concerns can be very noticeable and very real. Adding more memory to the box helps a bit, but then when garbage collection does eventually happen, it can be even more noticeable.
So; I get very excited about allocations. Perhaps ridiculously so. I want to be very clear: I don’t mind using objects, and I don’t mind using memory. Simply – I want to be using them for useful work. And a lot of things… just aren’t. Collecting a few hundred generation-zero objects might be cheap, but do you know what is cheaper? Not having to. Perhaps my perspective is also skewed a bit by the fact that I work on a lot of library / utility code, and have reached the conclusion that no library should ever be the cause of unexpected / unnecessary overhead. Application code is a bit different: in application code you should be writing domain logic about your application – that is useful work and is free to do whatever it wants.
Enough background; let’s look at an example; don’t worry, we’ll pick an easy one…

string prefix = // something variable
foreach (var line in File.ReadAllLines(somePath))
foreach (var token in line.Split(';').Where(s => s.StartsWith(prefix)))
{
    Console.WriteLine(token);
}

Fairly innocent looking file processing; but let’s pull it apart for allocations – I’ll list them first, and discuss each in turn:


  • ReadAllLines creates a vector containing each line, all at once
  • We get an enumerator over the vector of lines
  • We get a capture-context over prefix
  • Then for every line:

    • We create and populate a new vector of characters, length 1 – for the Split call
    • Split allocates a vector of the results
    • The first iteration only (in this case, but can vary) creates a delegate to the hoisted predicate on the capture-context (the lambda)
    • Where allocates an enumerable/iterator (some fancy details make the same object function as enumerable and iterator most of the time)

Which might not sound too bad at all, but if you know (by usage, profiling, memory dumps etc) that this code is used all the time, then you really might want to think about how many of these are actually useful.

The obvious but uninteresting problems


There are no prizes for spotting these…

ReadAllLines


Yeah, not much to say here other than: use ReadLines instead, which gives you a spooling iterator over the lines, rather than reading everything up front. Not what I want to focus on today, really.
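For completeness, the change is purely at the call-site – swap one method name and the loop now spools lazily instead of buffering every line:

```csharp
// ReadAllLines: allocates one big string[] up front, holding every line
// ReadLines: a lazy iterator - only the current line is held at a time
foreach (var line in File.ReadLines(somePath))
{
    // process line
}
```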

foreach over a vector


You need to be a little careful over-interpreting foreach on a vector – I suspect the JIT might do more than you might imagine here, so I’m not going to get too excited about this. If we did still have a vector (see above), then a naked for might be cheaper, but I wouldn’t rely on it – again, not my focus.
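For reference, the naked for mentioned above – if we genuinely had a vector in hand – is simply:

```csharp
string[] lines = File.ReadAllLines(somePath);
// indexed access over a vector; no enumerator object in sight,
// and the JIT can elide the bounds checks in this exact pattern
for (int i = 0; i < lines.Length; i++)
{
    Console.WriteLine(lines[i]);
}
```

But as noted: don’t over-interpret this – measure before rewriting loops on this basis.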

The capture-context and delegate


Did that lambda really make the code easier to read than, say, an if test (with either a nested block or a continue)? LINQ is great, but can be over-used… and in non-trivial code the captures can be surprisingly nuanced, often requiring a new capture-context and delegate instance on every iteration.
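The if/continue version described above, which needs neither a capture-context nor a delegate (the Split call is unchanged here – that allocation is addressed separately below):

```csharp
foreach (var token in line.Split(';'))
{
    // no lambda: no capture-context, no delegate instance
    if (!token.StartsWith(prefix)) continue;
    Console.WriteLine(token);
}
```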

The less obvious, but more interesting (IMO) problems


There are also no prizes for spotting these, but give yourself a quiet cheer if you did.

The params array to Split

This is one that often gets overlooked: the overload of Split being used here takes a params char[]; there is no overload that takes a single char. Most usages of Split that I see pass only a single char (or maybe two). Since .NET vectors are mutable, the compiler doesn’t trust the code inside the method not to change the values, so it explicitly does not hoist the array away onto a static field somewhere. It would be nice if it could, but there are a lot of “const correctness” problems that would need to be resolved for that to work – so it needs to allocate a new vector every single call – often inside loops within loops. In the interim, I posit that there are some sorely needed missing overloads here! I strongly recommend that you do a search for “.Split(“ in your code and see how many times you use it – especially in loops. I tend to create a utility class (imaginatively named StringSplits), and use things like
line.Split(StringSplits.Semicolon), where that is defined as static readonly char[] Semicolon = {';'};
This change is so staggeringly simple, but removes so many allocations that are entirely overhead. Imagine what your CPU could be doing instead of collecting all those vectors!
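Spelled out in full, the utility class is trivially small (StringSplits is the name used above; extend it with whichever separators your code actually uses):

```csharp
static class StringSplits
{
    // allocated once, at type initialization; reused by every Split call
    public static readonly char[] Semicolon = { ';' },
                                  Comma = { ',' },
                                  SemicolonOrComma = { ';', ',' };
}
```

Usage is then `line.Split(StringSplits.Semicolon)` – no per-call char[] allocation, and no change in behaviour.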

The result of Split

Yet another vector. But really… do we need that? A lot of uses of Split are going to involve things like foreach, so maybe an IEnumerable<string> (presumably via an iterator block) would be more efficient. But, then we still need to allocate an enumerator etc. Or do we? A lot of people mistakenly think that foreach is tied to the IEnumerable[<T>]/IEnumerator[<T>] interfaces. It is indeed true that the compiler can use these, but actually the compiler looks first for a duck-typed API:


  • is there a method called GetEnumerator() that returns some result…

    • …that has a bool MoveNext() method
    • …and a .Current property with a get accessor?

If so: that is what it uses. Historically, this allowed for typed iterators that didn’t need to box their values, but it also allows the iterator itself to be a value-type. This is actually very common – List&lt;T&gt; and Dictionary&lt;TKey,TValue&gt; use this approach – which means when you iterate a List&lt;T&gt; directly, there are zero allocations. If, however, you use the IEnumerable[&lt;T&gt;]/IEnumerator[&lt;T&gt;] APIs, you force the iterator to become boxed. And keep in mind that LINQ uses the IEnumerable[&lt;T&gt;]/IEnumerator[&lt;T&gt;] APIs – so code like our lazy Where shown above can have multiple unexpected consequences.
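To make the boxing concrete: List&lt;T&gt;.Enumerator is a struct, so the two loops below behave very differently at the allocator even though they look the same:

```csharp
using System.Collections.Generic;

class BoxingDemo
{
    static void Main()
    {
        var list = new List<int> { 1, 2, 3 };

        // duck-typed: uses List<int>.Enumerator, a struct - zero allocations
        foreach (var x in list) { }

        // via the interface: the struct enumerator gets boxed onto the heap
        IEnumerable<int> seq = list;
        foreach (var x in seq) { }
    }
}
```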

If we wanted to, we could write a Split2 (naming is hard) extension method that took a single char, and returned a value-typed enumerator / iterator; then this entire chunk would have zero overhead allocations. It would still have the allocations for the substrings, but that is useful data, not overhead. Unfortunately the compiler doesn’t help us at all here – if you want to write value-typed iterators you need to do it all by hand, which is quite tedious. I’ve tested it locally and it works great, but I’m not actually presenting this as a serious “thing to do” – the intent here is more to get people thinking about what is possible.
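Purely as an illustration (Split2 is the hypothetical name from the text, not a real API, and SplitEnumerable/SplitEnumerator are names invented here), the hand-rolled version might look something like this – note that GetEnumerator, MoveNext and Current exist solely to satisfy the compiler’s duck-typing:

```csharp
static class StringExtensions
{
    public static SplitEnumerable Split2(this string value, char separator)
    {
        return new SplitEnumerable(value, separator);
    }
}

// value-type enumerable; foreach over this allocates nothing on the heap
struct SplitEnumerable
{
    private readonly string value;
    private readonly char separator;
    public SplitEnumerable(string value, char separator)
    {
        this.value = value;
        this.separator = separator;
    }
    public SplitEnumerator GetEnumerator() // duck-typed; no interface needed
    {
        return new SplitEnumerator(value, separator);
    }
}

struct SplitEnumerator
{
    private readonly string value;
    private readonly char separator;
    private int offset;
    private string current;
    public SplitEnumerator(string value, char separator)
    {
        this.value = value;
        this.separator = separator;
        offset = 0;
        current = null;
    }
    public string Current { get { return current; } }
    public bool MoveNext()
    {
        if (offset > value.Length) return false;
        int index = value.IndexOf(separator, offset);
        if (index < 0) index = value.Length; // final token
        current = value.Substring(offset, index - offset); // the useful data
        offset = index + 1;
        return true;
    }
}
```

With this in place, `foreach (var token in line.Split2(';'))` allocates only the substrings themselves.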

What about the substrings?


Another interesting feature of Split is that everything it produces is pure redundant duplication. If I split “abc,def,ghi” by commas, then I now have 4 strings, 3 of which are exact substrings of the original. Since strings in .NET are immutable (at least, to the public), it is alarming how close this is to being ideal, while being so far away in reality. Imagine if there was a Substring struct – a tuple consisting of a string reference, the offset to the first character, and length – with equality and comparison support etc. Unfortunately, without direct support from the BCL by things like StringBuilder, TextWriter, etc (so that they could copy in values without creating a new string) it wouldn’t be very useful. What would be even nicer is if we didn’t need such a thing in the first place, and could simply use string itself to represent internal sub-strings – however, unfortunately the internal implementation details don’t really allow for that: it isn’t the case that the string type contains a pointer to the actual data; rather, the string instance is magically variable-sized, and is the data. Again, I don’t have a magic wand here, other than highlighting where hidden allocations can come from.
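A minimal sketch of such a struct, purely to illustrate the idea (this is hypothetical – nothing in the BCL would know about it, which is exactly the limitation described above):

```csharp
using System;

// hypothetical: a zero-copy view over part of an existing string
struct Substring : IEquatable<Substring>
{
    private readonly string value;  // the original, shared string
    private readonly int offset, length;
    public Substring(string value, int offset, int length)
    {
        this.value = value;
        this.offset = offset;
        this.length = length;
    }
    public int Length { get { return length; } }
    public char this[int index] { get { return value[offset + index]; } }
    public bool Equals(Substring other)
    {
        if (length != other.length) return false;
        for (int i = 0; i < length; i++)
            if (this[i] != other[i]) return false;
        return true;
    }
    // materializing still allocates - the whole point is to defer this
    public override string ToString()
    {
        return value.Substring(offset, length);
    }
}
```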

Why do I care about all of this?


Because often, memory profiling shows that vast amounts of our memory are swamped with silly, unnecessary allocations. Of particular note in some recent Stack Exchange memory dumps were massive numbers of string[] next to the same recurring set of strings. After some digging, we recognised them from [OutputCache] parameters. Some of these we could remove ourselves by tweaking our GetVaryByCustomString; in our case, we were actually mainly interested in “does the tokenized string contain this token, under usual split logic“ – and of course you don’t actually need to split the string at all to do that – you can just check for containment, paying special attention to the previous/next characters. Some more of these allocations were coming from ASP.NET MVC (and I hope to send a pull-request over shortly); and yet more are coming from raw ASP.NET internals (Page, IIRC – which has the low-level output caching implementation). The overheads we were seeing were from Split, and pretty much all of it is avoidable if the code is written with a view to allocations. Since we make extensive use of caching, we were perhaps paying disproportionately here.
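That containment check – is a given token present in a delimited string, without splitting it – can be done with IndexOf plus boundary checks on the neighbouring characters; a sketch (ContainsToken is an illustrative name, not the exact code we shipped):

```csharp
static bool ContainsToken(string value, string token, char delimiter)
{
    if (string.IsNullOrEmpty(value) || string.IsNullOrEmpty(token))
        return false;

    int index = 0;
    while ((index = value.IndexOf(token, index)) >= 0)
    {
        // a hit only counts if bounded by start/end of the string,
        // or by the delimiter, on both sides
        bool startOk = index == 0 || value[index - 1] == delimiter;
        int end = index + token.Length;
        bool endOk = end == value.Length || value[end] == delimiter;
        if (startOk && endOk) return true;
        index++;
    }
    return false;
}
```

So `ContainsToken("abc;def;ghi", "def", ';')` matches, while `"abcdef;ghi"` does not – and no string[] or substrings were allocated along the way.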

But if your server’s memory is being dominated by unnecessary overhead: that is memory that can’t be used for interesting data, like questions and answers.

Oh, and the final code


string prefix = // something variable
foreach (var line in File.ReadLines(somePath))
foreach (var token in line.Split2(';'))
{
    if(token.StartsWith(prefix))
    {
        Console.WriteLine(token);
    }
}

5 comments:

Victor Kornov said...

Oh, and the difference the final code made compared to the original one?

Marc Gravell said...

@Victor minimal overhead allocations (just the readlines iterator). This wasn't the actual code from the memory dumps, so I can't give you exact numbers - but the main per-iteration overheads here were a char[], a string[] and a "where" enumerator. My point is: these per-row overheads can add up surprisingly quickly on a hot path of a busy system.

Benjol said...

I remember being infuriated/baffled by the fact I had no way of getting round these allocations generated by a call to a params in some system function (can't quite remember which one now).

That aside, were you aware that Google is serving "Live Chat with Beautiful Russian Women" in your sidebar? ;)

Ian said...

A RegEx that does the matching and splitting in one go operating over a stream may also be a good solution. Then only matching groups will get a string allocated for them.

Anonymous said...

Great post! What tools do you use for memory profiling?