Friday 14 November 2014

Efficiency: the work you don’t do…

Part 1 of 2; here’s where I talk about a premise; in part 2 I pester you for help and involvement
Q: what's faster than doing {anything}?
A: not doing it
I spend a lot of time playing with raw data, whether it is serialization protocols (protobuf-net), materialization (dapper) or network api protocols (redis, web-sockets, etc). I don't claim to be the world's gift to binary, but I like to think I can "hold my own" in this area. And one thing that I've learned over and over again is: at the application level, sure: do what you want - you're not going to upset the runtime / etc - but at the library level (in these areas, and others) you often need to do quite a bit of work to minimize CPU and memory (allocation / collection) overhead.
In the real world, we have competing needs. We want to process large amounts of data, but we don't want to pay the price. What to do?

Get me a bowl of Cap’n Proto…

A little while ago I became aware of Cap'n Proto. You might suspect from the name that it is tangentially related to Protocol Buffers (protobuf), and you would be right: it it the brain-child of Kenton Varda, one of the main originators of protobuf. So at this point you're probably thinking "another serialization protocol? boooring", but stick with me - it gets more interesting! And not just because it describes itself as a “cerealization protocol” (badum tish?). Rather than simply describing a serialization format, Cap'n Proto instead describes a general way of mapping structured data to raw memory. And here's the fun bit: in a way that is also suitable for live objects.
Compare and contrast:
  • load (or stream) the source data through memory, applying complex formatting rules, constructing a tree (or graph) of managed objects, before handing the root object back to the caller
  • load the source data into memory, and... that's it
Cap'n Proto is the second of these; it is designed to be fully usable (read and write) in the raw form, so if you can load the data, your work is done.

So how does that work?

Basically, it describes a “message” of data as a series of one or more “segments” (which can be of different sizes), with the objects densely packed inside a segment. Each object is split into a number of “data words” (this is where your ints, bools, floats, etc, get stored), and a number of “pointers”, where “pointer” here just means a few bits to describe the nature of the data, and either a relative offset to another chunk in the same segment, or an absolute offset into a different segment. Visually:
We can then make an object model that sits on top of that, where each object just has a reference to the segment, an absolute offset to the first word of our data, and a few bits to describe the shape, and we’re done. This format works both on disk and for live interactive objects, so the disk format is exactly the in-memory format. Zero processing. Note that there’s a lot of up-front processing that defines how an object with 2 bools, an int, a float, a string, and another bool (that was added in a later revision of your schema) should be laid out in the data words, but ultimately that is a solved problem.

Nothing is free

Of course everything has a price; the cost in this case is that traversing from object to object involves parsing the pointer offsets, but this is all heavily optimized - it just means the actual data is one level abstracted. Indeed, most of the work in shimming between the two can be done via basic bitwise operations (“and”, “shift”, etc). Plus of course, you only pay this miniscule cost for data you actually touch. Ignoring data is effectively entirely free. So all we need to do is load the data into memory. Now consider: many modern OSes offer "memory mapped files" to act as an OS-optimized way of treating a flat file as raw memory. This holds some very interesting possibilities for insanely efficient processing of arbitrarily large files.

I couldn't stay away

In my defence, I did have a genuine business need, but it was something I'd been thinking about for a while; so I took the plunge and started a .NET / C# implementation of Cap'n Proto, paying lots of attention to effeciency - and in particular allocations (the "pointers" to the underlying data are implemented as value types, with the schema-generated entities also existing as value types that simply surround a "pointer", making the defined members more conveniently accessible). This means that not only is there no real processing involved in parsing the data, but there are also no allocations - and no allocations means nothing to collect. Essentially, we can use the “message” as a localized memory manager, with no .NET managed reference-types for our data – just structs that refer to the formatted data inside the “message” – and then dispose of the entire “message” in one go, rather than having lots of object references. Consider also that while a segment could be a simple byte[] (or perhaps ulong[]), it could also be implemented using entirely unmanaged memory, avoiding the “Large Object Heap” entirely (we kinda need to use unmanaged memory if we want to talk to memory mapped files). A bit “inner platform”, but I think we’ll survive.

Sounds interesting! How do I use it?

Well, that’s the gotcha; the core is stable, but - I have lots more to do, and this is where I need your help. I think there's lots of potential here for a genuinely powerful and versatile piece of .NET tooling. In my next blog entry, I'm going to be describing what the gaps are and inviting community involvement.