Monday, 24 October 2011

Assault by GC

TL;DR;

We had performance spikes, which we eased with some insane use of structs.

History

For a while now, we had been seeing a problem in the Stack Exchange engine where we would see regular and predictable stalls in performance. So much so that our sysadmins blogged about it back here. Annoyingly, we could only see this problem from the outside (i.e. haproxy logging, etc) – and to cut a very long story short, these stalls were due to garbage collection (GC).

We had a problem, you see, with some particularly large and long-lived sets of data (that we use for some crazy-complex performance-related code), that would hobble the server periodically.

Short aside on Server GC

The regular .NET framework has a generational GC – meaning: new objects are allocated in “generation 0” (GEN-0). When it chooses, the system scans GEN-0 and finds (by walking references from the so-called “roots”) which (if any) of the objects are still in use by your application. Most objects have very short lives, and it can very quickly and efficiently reclaim the space from GEN-0; any objects that survive move to GEN-1. GEN-1 works the same (more-or-less), but is swept less often – moving any survivors into GEN-2. GEN-2 is the final lurking place for all your long-lived data – it is swept least often, and is the most expensive to check (especially if it gets big).

Until .NET 4.5 rolls into town with a background server GC, checking GEN-2 is a “stop the world” event – it (usually briefly) pauses your code, and does what it needs to. Now imagine you have a huge set of objects which will never be available to collect (because you will always be using them) all sat in GEN-2. What does that look like? Well, using StackExchange Data Explorer to analyse our haproxy logs, it looks a bit like this:

[chart: haproxy response times over time – mostly fast and flat, punctuated by regular tall spikes]

I’ve omitted the numbers, as they don’t matter; but to interpret it: normally the server is ticking along with nice fast response times, then WHAM! A big spike (which we have correlated with GC) just hammers the response times.

So what to do?

We take performance very seriously at Stack Exchange, so as you can imagine this got a bit of attention. The obvious answer of “don’t keep that data”, while a valid suggestion, would have hurt overall performance considerably, so we needed to find a way to remove or reduce these stalls while keeping the data.

Our initial efforts focused on removing things like unnecessary copy/extend operations on the data, which helped some, but didn’t really make a step-change. Eventually, we concluded…

Break all the rules

Important: the following is merely a discussion of what has helped us. This is not a magic bullet, and should only be applied to some very limited scenarios after you have profiled and you know what you are doing and why. And it helps if you are just a little bit crazy.

First, a brief overview – imagine you are holding Customer objects in memory for an extended period; you have a lot of them in a List<Customer>, and occasionally add more (with suitable threading protection, etc). Further, you have some pre-calculated subsets of the data (perhaps by region) – so a number of smaller List<Customer>. That is entirely inaccurate, but sets the scene ;p

After exhausting alternatives, what we did was:

  • change Customer from a class to a struct (only within this crazy code)
  • change the main store from a List<Customer> to a Customer[]
  • change the subsets from List<Customer> to List<int>, specifically the offset into the main Customer[]
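To make that concrete, here is a minimal sketch of the new shape – the Customer fields and names below are invented purely for illustration (the real data is quite different):

using System.Collections.Generic;

// illustration only: a small struct with no object references
struct Customer {
    public int Id;
    public decimal Balance;
    // ...more value-type fields
}

class CustomerStore {
    private Customer[] customers = new Customer[0];   // the main store: one big array
    private List<int> northRegion = new List<int>();  // a subset: offsets into the array, not copies

    public decimal GetNorthRegionBalance(int n) {
        // index straight into the array; the value is read in-situ, not copied out
        return customers[northRegion[n]].Balance;
    }
}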

eek; so what does that do for us?

  • the main data is now a single object (the array) on the "large object heap", rather than lots of small objects
  • by using direct access into an array (rather than a list indexer) we can access the data in-situ, so we are not constantly copying Customer values on the stack
  • the int offsets for subsets are essential to make sure we don't have multiple separate values of each record, but on x64 using int offsets rather than references also means our subsets suddenly take half the memory

Note that for some of the more complex code (applying predicates etc), we also had to switch to by-ref passing, i.e.

void SomethingComplex(ref Customer customer) {...}
...
int custIndex = ...
SomethingComplex(ref customers[custIndex]);

This again is accessing the Customer value in-situ inside the array, rather than copying it.

To finish off the bad-practice sheet, we also had some crazy changes to replace some reference data inside the record with fixed-size buffers inside the value (avoiding an array on the heap per item), and some corresponding unsafe code to query it (including the rarely-used stackalloc), but that code is probably a bit complex to cover properly - essentially: we removed all the objects/references from this data. And after removing the objects, there is nothing for GC to look at.
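For the curious, the general shape is something like the following – a heavily simplified sketch with invented fields (the real code does rather more):

// needs "allow unsafe code" enabled on the project; fields invented for illustration
unsafe struct CustomerRecord {
    public int Id;
    public fixed char RegionCode[8]; // inline fixed-size buffer: no separate char[] object per record
}

static unsafe class RecordQueries {
    static bool IsInRegion(ref CustomerRecord record, char prefix) {
        fixed (char* region = record.RegionCode) { // pin while reading the inline buffer
            return region[0] == prefix;
        }
        // stackalloc can provide scratch space in the same spirit, e.g.:
        // char* scratch = stackalloc char[8];
    }
}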

Impact

It helped! Here’s the “after”, at the same scales:

[chart: the same response-time view after the changes – the spikes are smaller and much less frequent]

As you can see, there are still a few vertical spikes (which still tie into GC), but they are much less dense, and less tall. Basically, the server is no longer tied up in knots. The code is a bit less OOP than we might usually do, but it is a: constrained to this very specific scenario (this is absolutely not a carte-blanche “make everything a struct”), and b: for understood reasons.

Plus, it was pretty interesting (oh dear; I really, really need to get out more). Enjoy.

Disclaimers

  • You need to be really careful messing with this – especially with updates etc, since your records are certainly over-size for atomic read/write, and you don’t want callers seeing torn data.
  • This is not about “struct is faster” – please don’t let that be what you take away; this is a very specific scenario.
  • You need to be really careful to avoid copying fat values on the stack.
  • This code really is well outside of the normal bell-curve. Indeed, it is quite reminiscent of XNA-style C# (for games, where GC hurts the frame-rate).

Credit

A lot of input, data-analysis, scheming, etc here is also due to Sam Saffron and Kyle Brandt; any stupid/wrong bits are mine alone.

Friday, 7 October 2011

The trouble with tuples

A little while ago, I mentioned how new tuple handling in protobuf-net meant that likely-looking tuples could now be handled automatically, and even better this meant I could remove my hacky KeyValuePair<,> handling. All was well in the world – after all, what could go wrong?

Answer: Mono.

Actually, I’m not criticizing the Mono implementation really, but simply: Mono has a subtly different implementation of KeyValuePair<,> from Microsoft’s – nothing huge; there simply exist some set accessors (private ones, visible here). And the library was pretty fussy – if there was a set accessor, the type wouldn’t be treated as a sure-thing tuple.

This is now fixed in r447, but: if you are using Mono and your dictionaries stopped serializing – then honestly and humbly: sorry about that.

And if you are a library author – watch out: sometimes the simplest most subtle differences between implementations can kill you.

Monday, 15 August 2011

Automatic serialization; what’s in a tuple?

I recently had a lot of reason (read: a serialization snafu) to think about tuples in the context of serialization. Historically, protobuf-net has focused on mutable types, which are convenient to manipulate while processing a protobuf stream (setting fields/properties on a per-field basis). However, if you think about it, tuples have an implicit obvious positional “contract”, so it makes a lot of sense to serialize them automatically. If I write:

var tuple = Tuple.Create(123, "abc");

then it doesn’t take a lot of initiative to think of tuple as an implicit contract with two fields:

  • field 1 is an integer with value 123
  • field 2 is a string with value “abc”

Since it is deeply immutable, at the moment we would need to either abuse reflection to mutate private fields, or write a mutable surrogate type for serialization (with conversion operators), and tell protobuf-net about the surrogate.
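For context, the surrogate route looks something like this – a sketch with names of my own, using the v2 type-model registration call:

using System;
using ProtoBuf;
using ProtoBuf.Meta;

[ProtoContract]
class TupleSurrogate { // a mutable stand-in for Tuple<int, string>
    [ProtoMember(1)] public int Value1;
    [ProtoMember(2)] public string Value2;

    public static implicit operator Tuple<int, string>(TupleSurrogate value) {
        return value == null ? null : Tuple.Create(value.Value1, value.Value2);
    }
    public static implicit operator TupleSurrogate(Tuple<int, string> value) {
        return value == null ? null
            : new TupleSurrogate { Value1 = value.Item1, Value2 = value.Item2 };
    }
}

static class SurrogateSetup {
    public static void Configure() {
        // tell the model to read/write Tuple<int,string> via the surrogate
        RuntimeTypeModel.Default.Add(typeof(Tuple<int, string>), false)
            .SetSurrogate(typeof(TupleSurrogate));
    }
}

Wouldn’t it be nice if protobuf-net could make this leap for us?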

Well, after a Sunday-night hack it now (in the source code) does.

The rules are:

  • it must not already be marked as an explicit contract
  • only public fields / properties are considered
  • any public fields (spit) must be readonly
  • any public properties must have a get but not a set (on the public API, at least)
  • there must be exactly one interesting constructor, with parameters that are a case-insensitive match for each field/property in some order (i.e. there must be an obvious 1:1 mapping between members and constructor parameter names)
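For example, an invented type shaped like this ticks every box above:

// no explicit contract attributes; properties are get-only on the public API;
// the single constructor's parameters are a case-insensitive match for the properties
public class Coordinate {
    public int X { get; private set; }
    public int Y { get; private set; }

    public Coordinate(int x, int y) {
        X = x;
        Y = y;
    }
}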

If all of the above conditions are met then it is now capable of behaving as you might hope and expect, deducing the contract and using the chosen constructor to rehydrate the objects. Which is nice! As a few side-benefits:

  • this completely removes the need for the existing KeyValuePairSurrogate<,>, which conveniently meets all of the above requirements
  • it also works for C# anonymous types if we want, since they too have an implicit positional contract (I am not convinced this is significant, but it may have uses)

This should make it into the next deploy, once I’m sure there are no obvious failures in my assumptions.

Just one more thing, sir…

While I’m on the subject of serialization (which, to be fair, I often am) – I have now also completed some changes to use RuntimeHelpers.GetHashCode() for reference-tracking (serialization). This lets me construct a reference-preserving hash-based lookup to minimise the cost of checking whether an object has already been seen (and if so, fetch the existing token). Wins all round.
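The idea, roughly speaking (a sketch of the concept, not the library’s actual code):

using System.Collections.Generic;
using System.Runtime.CompilerServices;

// compare keys by reference identity, ignoring any overridden Equals/GetHashCode
class ReferenceComparer : IEqualityComparer<object> {
    bool IEqualityComparer<object>.Equals(object x, object y) {
        return ReferenceEquals(x, y);
    }
    int IEqualityComparer<object>.GetHashCode(object obj) {
        return RuntimeHelpers.GetHashCode(obj);
    }
}

// usage: map each object already written to its existing token
// var knownObjects = new Dictionary<object, int>(new ReferenceComparer());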

Friday, 12 August 2011

Shifting expectations: non-integer shift operators in C#

In C#, it is quite nice that the shift operators (<< and >>) have their second argument constrained to int, avoiding the oft-confusing piped C++ usage:

cout << "abc" << "def";

But wait a minute! The above line is actually C#, in an all-C# project. OK, I cheated a little (I always do…) but genuinely! This little nugget comes from a stackoverflow post that really piqued my curiosity. All credit here goes to vcsjones, but indeed the line in the documentation about restricting to int (and the return type, etc) is about declaring the operator – not consuming it – so it is seemingly valid for the C# dynamic implementation to use it quite happily.

In fact, the main cheat I used here was simply hiding the assignment of the result, since that is still required. Here’s the full evil:

using System;
using System.Dynamic;
using System.Linq.Expressions;

class Evil : DynamicObject {
    static void Main() {
        dynamic cout = new Evil();
        var hacketyHackHack =
            cout << "abc" << "def";
    }
    public override bool TryBinaryOperation(
        BinaryOperationBinder binder,
        object arg, out object result) {
        switch(binder.Operation) {
            case ExpressionType.LeftShift:
            case ExpressionType.LeftShiftAssign:
                // or whatever you want to do
                Console.WriteLine(arg);
                result = this;
                return true;
        }
        return base.TryBinaryOperation(
            binder, arg, out result);
    }
}

I've seen more evil things (subverting new() via ContextBoundObject is still my favorite evil), but quite dastardly, IMO!

Additional: grrr! I don't know why blogger hates me so much; here it is on pastie.org.

Monday, 4 July 2011

BookSleeve, transactions, hashes, and flukes

Sometimes, you just get lucky.

tl;dr; version: BookSleeve now has transactions and Hashes; new version on nuget

Transactions

For some time now I’ve been meaning to add multi/exec support to BookSleeve (which is how redis manages transactions). The main reason it wasn’t there already is simply: we don’t use them within StackExchange.

To recap, because BookSleeve is designed to be entirely asynchronous, the API almost-always returns some kind of Task or (more often) Task-of-T; from there you can hook a “continue-with” handler, you can elect to wait for a result (or error), or you can just drop it on the floor (fire-and-forget).
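For example (conn and db here are just as in the transaction example further down; GetString returns a Task<string>), the caller gets to choose how to consume the result:

var pending = conn.GetString(db, "foo");   // nothing blocks here

// continue-with: react when the value arrives
pending.ContinueWith(t => Console.WriteLine(t.Result));

// ...or wait: block until the result (or error) is available
string value = conn.Wait(pending);

// ...or simply never look at it: fire-and-forget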

Over the weekend, I finally got around to looking at multi/exec. The interesting thing here is that once you’ve issued a MULTI, redis changes the output – returning “QUEUED” for pretty much every command. If I were using a synchronous API, I’d be pretty much scuppered at this point; I’d need to duplicate the entire (quite large) API to provide a way of using it. I’d love to say it was by-design (I’d be lying), but the existing asynchronous API made this a breeze (relatively speaking) – all I had to do was wrap the messages in a decorator that expects to see “QUEUED”, and then (after the EXEC) run the already-existing logic against the wrapped message. Or more visually:

conn.Remove(db, "foo"); // just to reset
using(var tran = conn.CreateTransaction())
{ // deliberately ignoring INCRBY here
tran.Increment(db, "foo");
tran.Increment(db, "foo");
var val = tran.GetString(db, "foo");

tran.Execute(); // this *still* returns a Task

Assert.AreEqual("2", conn.Wait(val));
}

Here all the operations happen as an atomic (but pipelined) unit, allowing for more complex integrity conditions. Note that Execute() is still asynchronous (returning a Task), so you can still carry on doing useful work while the network and the redis server do some thinking (although redis is shockingly quick).

So what is CreateTransaction?

Because a BookSleeve connection is a thread-safe multiplexer, it is not a good idea to start sending messages down the wire until we have everything we need. If we did that, we’d have to block all the other clients, which is not a good thing. Instead, CreateTransaction() creates a staging area to build commands (using exactly the same API) and capture future results. Then, when Execute() is called the buffered commands are assembled into a MULTI/EXEC unit and sent down in a contiguous block (the multiplexer will send all of these together, obviously).

As an added bonus, we can also use the Task cancellation metaphor to cleanly handle discarding the transaction without execution.

The only disadvantage of the multiplexer approach here is that it makes it pretty hard to use WATCH/UNWATCH, since we wouldn’t be sure what we are watching. Such is life; I’ll think some more on this.

Hashes

Another tool we don’t use much at StackExchange is hashes; however, for your convenience this is now fully implemented in BookSleeve. And since we now have transactions, we can solve the awkward variadic/non-variadic issue of HDEL between 2.2+ and previous versions; if BookSleeve detects a 2.2+ server it will automatically use the variadic version, otherwise it will automatically use a transaction (or enlist in the existing transaction). Sweet!

Go play

A new build is on nuget; I hope you find it useful.

Wednesday, 8 June 2011

Profiling with knobs on

A little while ago, I mentioned the core part of the profiling tool we use at stackexchange – our mini-profiler. Well, I’m delighted to say that with the input from Jarrod and Sam this tool has been evolving at an alarming rate, and has now gained:

  • full integration with our bespoke SQL profiler, which is now also included
    • basically, this is a custom ADO.NET DbConnection that you wrap around your existing connection, and it adds logging
  • an awesome UI (rather than html comments), including a much clearer timeline
  • full support for AJAX activity
  • some very basic highlighting of duplicated queries, etc
  • webform support
  • and a ton of other features and enhancements

To illustrate with Jarrod’s image from the project page:

[screenshot: the mini-profiler UI and timeline]

The custom connection has been exercised extensively with dapper-dot-net, LINQ-to-SQL and direct ADO.NET, but should also work with Entity Framework (via ObjectContextUtils, provided) and hopefully any other DB frameworks built on top of ADO.NET.
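The wiring itself is tiny – roughly this (from memory, so treat the exact type and namespace names as approximate for your version):

using System.Data.Common;
using System.Data.SqlClient;
using MvcMiniProfiler;
using MvcMiniProfiler.Data;

static class ConnectionFactory {
    public static DbConnection GetOpenConnection(string connectionString) {
        DbConnection cnn = new SqlConnection(connectionString);
        // wrap the real connection; commands executed through the wrapper are
        // timed and logged against the current profiling session
        cnn = new ProfiledDbConnection(cnn, MiniProfiler.Current);
        cnn.Open();
        return cnn;
    }
}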

We use it constantly. It is perhaps our primary tool both for understanding problems and for keeping a constant eye on the system as we use it – since it is running (for our developers) constantly, 24×7.

All too often, such fine-grained profiling is just “there’s a problem, ask a DBA to attach a profiler to the production server for 2 minutes while I validate something”.

Enjoy.

http://code.google.com/p/mvc-mini-profiler/

Wednesday, 25 May 2011

So soon, beta 2

I’m delighted by the level of response to my protobuf-net v2 beta; I have been kept really busy with a range of questions, feature requests and bug fixes (almost exclusively limited to the new features).

So, before I annoy people with too many “fixed in source” comments, I thought I’d better re-deploy. Beta 2 gives the same as beta 1, but with:

  • reference-tracked objects
    • root object now included (BREAKING CHANGE)
    • fixed false-positive recursion issue
  • full type metadata
    • support custom formatting/parsing
  • interface serialization
    • support serialization from interfaces
    • support known-implementations of interfaces
    • support default concrete type
  • WCF
    • expose WCF types on public API to allow custom models
  • misc
    • support shadow set-methods
    • fix IL glitch
    • fix cold-start thread-race condition
    • fix missing method-forwarding from legacy non-generic API

In addition to a very encouraging level of interest, I’m also pleased that things like the interface-based support mentioned above were pretty effortless to push through the v2 type-model. These are things that had been proposed previously, but there was just no way to push them into the codebase without hideous (and cumulative) hackery. With v2, it was pretty much a breeze.

But for the v2 fun, see the project download page

And please, keep pestering me ;p

Note that the BREAKING CHANGE marked above only applies to data serialized with the new full-graph AsReference option. No v1 data is impacted, but if you have stored data from earlier v2 alpha/beta code, please drop me a line and I can advise.