Wednesday, 10 October 2012

Multiplexed transactions in BookSleeve

UPDATE

BookSleeve has now been succeeded by StackExchange.Redis, for lots of reasons. The API and intent is similar, but the changes are significant enough that we had to reboot. All further development will be in StackExchange.Redis, not BookSleeve.

ORIGINAL CONTENT

Short version, for the impatient

BookSleeve now has support for transactions using WATCH and pre-conditions. If you don’t know what BookSleeve or redis is, this won’t be of interest to you.

Background

We use redis extensively as part of our caching (and a few other things) at Stack Exchange, and as part of doing that efficiently, we put together BookSleeve, which is a high-performance multiplexer over a redis connection. In English, what that means is: you can share a single redis connection from lots of different threads, and it will worry about delivering the commands efficiently and shipping results back to the appropriate callers, without 200 commands having to pay the penalty of 200 round-trips of latency. Redis even includes transaction support for complex operations; however, previously I’ve always been a bit stumped as to how to fit this into BookSleeve…
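
To make that concrete, a rough sketch of multiplexed usage from the caller’s side looks something like this (a minimal illustration, not our production code – connection setup is simplified and exact member names may differ slightly between BookSleeve versions):

using System.Collections.Generic;
using System.Threading.Tasks;
using BookSleeve;

// one shared connection for the whole application
var conn = new RedisConnection("localhost");
conn.Open().Wait(); // blocking here is fine at startup

// 200 logical operations, potentially from many different threads; the multiplexer
// pipelines them over the single connection rather than paying 200 separate round-trips
var pending = new List<Task<string>>();
for (int i = 0; i < 200; i++)
{
    conn.Strings.Set(0, "key:" + i, "value:" + i);
    pending.Add(conn.Strings.GetString(0, "key:" + i));
}
Task.WaitAll(pending.ToArray());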

How do transactions work in redis?

You see, redis transactions are not like SQL transactions. A transaction is defined by a MULTI/EXEC block, with everything between the MULTI and the EXEC being processed as a single atomic unit (meaning: redis doesn’t process any requests from other connections during that time - don’t worry if that sounds draconian, redis manages extraordinarily well on a single thread anyway). That’s pretty easy to fit into a multiplexer, as we can just have a structure to buffer up commands, and send them when complete – so that already exists in BookSleeve:

using (var tran = conn.CreateTransaction())
{
    tran.Strings.Set(0, "foo", "abc");
    tran.Strings.Set(0, "bar", "def");
    tran.Execute();
}

But here’s the gotcha: between the MULTI and the EXEC, you don’t get results – you only get results when you commit the transaction. This means you can’t easily request information in the middle of a transaction and then decide what to do next based on it – which makes a lot of sense, since while you’re doing that thinking and network IO, every other connection would be stalled. A sane decision, really.
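
For example, a raw session looks something like this – every command inside the block is simply queued, and the replies only come back when the EXEC runs:

MULTI
{reply: OK}
SET SomeKeyName abc
{reply: QUEUED}
GET SomeKeyName
{reply: QUEUED}
EXEC
{reply: the results of the SET and the GET, in order}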

To get around this limitation, redis transactions actually allow a bit more cleverness… you can issue a WATCH command against keys of your choice, and then if somebody changes that key, your transaction automatically gets killed. So, a typical cycle might be:

WATCH SomeKeyName
GET SomeKeyName
{some test on the value back in your client code, then}
MULTI
SET SomeKeyName SomeNewValue
EXEC
{or if you decided you didn't like it after all}
UNWATCH

And importantly, because of the WATCH, if another connection comes along and changes SomeKeyName, then the EXEC does nothing, and returns a reply that robustly indicates the cancellation. This actually allows a lot of very subtle usage, while allowing everything to happen at the server as a single unit of work.
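
For example, with two connections racing (shown interleaved; the exact reply formatting depends on your client):

connection A: WATCH SomeKeyName
connection A: GET SomeKeyName
connection B: SET SomeKeyName SomeOtherValue
{B sneaks in before A commits}
connection A: MULTI
connection A: SET SomeKeyName SomeNewValue
connection A: EXEC
{returns a null reply: the transaction was cancelled, nothing was written}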

So what is the problem? And what did you do?

The painful bit here is simply: WATCH doesn’t work in a multiplexer. As soon as one caller issues a WATCH, either you need to stop accepting work from other callers, or you no longer have any idea what each caller thinks it is watching (which is very important). Another approach would be to somehow have the user write callbacks that happen as part of a transaction, but what I absolutely don’t want to do is expose anything in BookSleeve that inadvertently allows one caller to break everybody by being too slow – so callbacks: not really an option. So for a long time, BookSleeve only made use of WATCH etc internally, and didn’t expose it to the caller. But that has finally changed!

I’ve just committed my first stab at improving this, basically by implementing a number of pre-canned pre-conditions that can be enforced on a transaction. The “pre-canned” part is there to avoid opening up massive performance problems, but in reality they are very simple. For example, a previous internal usage of WATCH was to take an exclusive lock (by marking a key as in-use). Until today, that was about 100 lines of complex code that needed to know about the full gory details of the redis protocol – whereas now it is just:

TaskCompletionSource<bool> result = new TaskCompletionSource<bool>();

using (var tran = CreateTransaction())
{
    tran.AddCondition(Condition.KeyNotExists(db, key));
    tran.Strings.Set(db, key, value, expirySeconds);
    tran.Execute().ContinueWith(takeLockContinuation);
}
return result.Task;

where takeLockContinuation is just a static delegate instance that takes the Task<bool> result from the Execute call, and sets that into result (plus handling task faults, cancellation, etc). Crucially, the condition (Condition.KeyNotExists in this case) takes care of both the WATCH steps, and the actual validation / enforcement. Basically, if there are pre-conditions (there can be multiple), then they are all sent to the server, and only if they all return the expected results is the MULTI / EXEC block sent – otherwise an UNWATCH is sent instead. Simple, elegant (IMO), and efficient.
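
For the curious, such a continuation might look roughly like this – a hypothetical sketch rather than the actual BookSleeve code; note that to keep the delegate static (no captured locals), the TaskCompletionSource would be passed as the state argument to the ContinueWith overload that accepts one:

// hypothetical sketch; the real takeLockContinuation may differ
static readonly Action<Task<bool>, object> takeLockContinuation = (task, state) =>
{
    var source = (TaskCompletionSource<bool>)state;
    if (task.IsFaulted) source.TrySetException(task.Exception.InnerExceptions);
    else if (task.IsCanceled) source.TrySetCanceled();
    else source.TrySetResult(task.Result); // true: conditions held, lock taken
                                           // false: a condition failed, so the transaction was cancelled
};
// usage with the state overload: tran.Execute().ContinueWith(takeLockContinuation, result);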

The available pre-conditions are pretty limited at the moment, but I’ll try to add all the obvious ones I can think of (exists/not-exists for keys/hashes, equals/not-equals for keys/hashes, maybe a few inequalities, maybe something involving ttl). But finally I have a WATCH implementation that isn’t entirely sucky for a multiplexer. At the moment this is in the code repository only; I'll push to NuGet when I have a few more pre-conditions implemented.