Monday, 15 August 2011

Automatic serialization; what’s in a tuple?

I recently had a lot of reason (read: a serialization snafu) to think about tuples in the context of serialization. Historically, protobuf-net has focused on mutable types, which are convenient to manipulate while processing a protobuf stream (setting fields/properties on a per-field basis). However, if you think about it, tuples have an implicit obvious positional “contract”, so it makes a lot of sense to serialize them automatically. If I write:

var tuple = Tuple.Create(123, "abc");

then it doesn’t take a lot of initiative to think of tuple as an implicit contract with two fields:

  • field 1 is an integer with value 123
  • field 2 is a string with value “abc”

Since it is deeply immutable, at the moment we would need to either abuse reflection to mutate private fields, or write a mutable surrogate type for serialization, with conversion operators, and tell protobuf-net about the surrogate. Wouldn’t it be nice if protobuf-net could make this leap for us?

Well, after a Sunday-night hack it now (in the source code) does.

The rules are:

  • it must not already be marked as an explicit contract
  • only public fields / properties are considered
  • any public fields (spit) must be readonly
  • any public properties must have a get but not a set (on the public API, at least)
  • there must be exactly one interesting constructor, with parameters that are a case-insensitive match for each field/property in some order (i.e. there must be an obvious 1:1 mapping between members and constructor parameter names)

If all of the above conditions are met then it is now capable of behaving as you might hope and expect, deducing the contract and using the chosen constructor to rehydrate the objects. Which is nice! As a few side-benefits:

  • this completely removes the need for the existing KeyValuePairSurrogate<,>, which conveniently meets all of the above requirements
  • it also works for C# anonymous types if we want, since they too have an implicit positional contract (I am not convinced this is significant, but it may have uses)

This should make it into the next deploy, once I’m sure there are no obvious failures in my assumptions.

Just one more thing, sir…

While I’m on the subject of serialization (which, to be fair, I often am) – I have now also completed some changes to use RuntimeHelpers.GetHashCode()for reference-tracking (serialization). This lets met construct a reference-preserving hash-based lookup to minimise the cost of checking whether an object has already been seen (and if so, fetch the existing token). Wins all round.

Friday, 12 August 2011

Shifting expectations: non-integer shift operators in C#

In C#, it is quite nice that the shift operators (<< and >>) have had their second argument constrained to int arguments, avoiding the oft-confusing piped C++ usage:

cout << "abc" << "def";

But wait a minute! The above line is actually C#, in an all-C# project. OK, I cheated a little (I always do…) but genuinely! This little nugget comes from a stackoverflow post that really piqued my curiosity. All credit here goes to vcsjones, but indeed the line in the documentation about restricting to int (and the return type, etc) is about declaring the operator – not consuming it – so it is seemingly valid for the C# dynamic implementation to use it quite happily.

In fact, the main cheat I used here was simply hiding the assignment of the result, since that is still required. Here’s the full evil:

  using System;

using System.Dynamic;
using System.Linq.Expressions;

class Evil : DynamicObject {
static void Main() {
dynamic cout = new Evil();
var hacketyHackHack =
cout << "abc" << "def";
}
public override bool TryBinaryOperation(
BinaryOperationBinder binder,
object arg, out object result) {
switch(binder.Operation) {
case ExpressionType.LeftShift:
case ExpressionType.LeftShiftAssign:
// or whatever you want to do
Console.WriteLine(arg);
result = this;
return true;
}
return base.TryBinaryOperation(
binder, arg, out result);
}
}

I've seen more evil things (subverting new() via ContextBoundObject is still my favorite evil), but quite dastardly, IMO!

Additional: grrr! I don't know why blogger hates me so much; here it is on pastie.org.

Monday, 4 July 2011

BookSleeve, transactions, hashes, and flukes

Sometimes, you just get lucky.

tl;dr; version : BookSleeve now has transactions and Hashes; new version on nuget

Transactions

For some time now I’ve been meaning to add multi/exec support to BookSleeve (which is how redis manages transactions). The main reason it wasn’t there already is simply: we don’t use them within StackExchange.

To recap, because BookSleeve is designed to be entirely asynchronous, the API almost-always returns some kind of Task or (more often) Task-of-T; from there you can hook a “continue-with” handler, you can elect to wait for a result (or error), or you can just drop it on the floor (fire-and-forget).

Over the weekend, I finally got around to looking at multi/exec. The interesting thing here is that once you’ve issued a MULTI, redis changes the output – return “QUEUED” for pretty much every command. If I was using a synchronous API, I’d be pretty much scuppered at this point; I’d need to duplicate the entire (quite large) API to provide a way of using it. I’d love to say it was by-design (I’d be lying), but the existing asynchronous API made this a breeze (relatively speaking) – all I had to do was wrap the messages in a decorator that expects to see “QUEUED”, and then (after the EXEC) run the already-existing logic against the wrapped message. Or more visually:

conn.Remove(db, "foo"); // just to reset
using(var tran = conn.CreateTransaction())
{ // deliberately ignoring INCRBY here
tran.Increment(db, "foo");
tran.Increment(db, "foo");
var val = tran.GetString(db, "foo");

tran.Execute(); // this *still* returns a Task

Assert.AreEqual("2", conn.Wait(val));
}

Here all the operations happen as an atomic (but pipelined) unit, allowing for more complex integrity conditions. Note that Execute() is still asynchronous (returning a Task), so you can still carry on doing useful work while the network and redis server does some thinking (although redis is shockingly quick).

So what is CreateTransaction?

Because a BookSleeve connection is a thread-safe multiplexer, it is not a good idea to start sending messages down the wire until we have everything we need. If we did that, we’d have to block all the other clients, which is not a good thing. Instead, CreateTransaction() creates a staging area to build commands (using exactly the same API) and capture future results. Then, when Execute() is called the buffered commands are assembled into a MULTI/EXEC unit and sent down in a contiguous block (the multiplexer will send all of these together, obviously).

As an added bonus, we can also use the Task cancellation metaphor to cleanly handle discarding the transaction without execution.

The only disadvantage of the multiplexer approach here is that it makes it pretty hard to use WATCH/UNWATCH, since we wouldn’t be sure what we are watching. Such is life; I’ll think some more on this.

Hashes

Another tool we don’t use much at StackExchange is hashes; however, for your convenience this is now fully implemented in BookSleeve. And since we now have transactions, we can solve the awkward varadic/non-varadic issue of HDEL between 2.2+ and previous versions; if BookSleeve detects a 2.2+ server it will automatically use the varadic version, otherwise it will automatically use a transaction (or enlist in the existing transaction). Sweet!

Go play

A new build is on nuget; I hope you find it useful.

Wednesday, 8 June 2011

Profiling with knobs on

A little while ago, I mentioned the core part of the profiling tool we use at stackexchange – our mini-profiler. Well, I’m delighted to say that with the input from Jarrod and Sam this tool has been evolving at an alarming rate, and has now gained:

  • full integration with our bespoke SQL profiler, which is now also included
    • basically, this is a custom ADO.NET DbConnection that you wrap around your existing connection, and it adds logging
  • an awesome UI (rather than html comments), including a much clearer timeline
  • full support for AJAX activity
  • some very basic highlighting of duplicated queries, etc
  • webform support
  • and a ton of other features and enhancements

To illustrate with Jarrod’s image from the project page:

miniprofiler

The custom connection has been exercised extensively with dapper-dot-net, LINQ-to-SQL and direct ADO.NET, but should also work with Entity Framework (via ObjectContextUtils, provided) and hopefully any other DB frameworks built on top of ADO.NET.

We use it constantly. It is perhaps our primary tool in both understanding problems and just keeping a constant eye on the system while we use it – since this is running (for our developers) constantly, 24×7.

All too often such fine-grain profiling is just “there’s a problem, ask a DBA to attach a profiler to the production server for 2 minutes while I validate something”.

Enjoy.

http://code.google.com/p/mvc-mini-profiler/

Wednesday, 25 May 2011

So soon, beta 2

I’m delighted by the level of response to my protobuf-net v2 beta; I have been kept really busy with a range of questions, feature requests and bug fixes (almost exclusively limited to the new features).

So, before I annoy people with too many “fixed in source” comments, I thought I’d better re-deploy. Beta 2 gives the same as beta 1, but with:

  • reference-tracked objects
    • root object now included (BREAKING CHANGE)
    • fixed false-positive recursion issue
  • full type metadata
    • support custom formatting/parsing
  • interface serialization
    • support serialization from interfaces
    • support known-implementations of interfaces
    • support default concrete type
  • WCF
    • expose WCF types on public API to allow custom models
  • misc
    • support shadow set-methods
    • fix IL glitch
    • fix cold-start thread-race condition
    • fix missing method-forwarding from legacy non-generic API

In addition to a very encouraging level of interest, I’m also pleased that things like the interface-based support mentioned above were pretty effortless to push through the v2 type-model. These are things that had been proposed previously, but there was just no way to push them into the codebase without hideous (and cumulative) hackery. With v2, it was pretty much a breeze.

But for v2 fun see the project download page

And please, keep pestering me ;p

Note that the BREAKING CHANGE marked above only applies to data serialized with the new full-graph AsReference option. No v1 data is impacted, but if you have stored data from earlier v2 alpha/beta code, please drop me a line and I can advise.

Thursday, 19 May 2011

protobuf-net v2, beta

It has been a long time in the coming, I know. I make no excuses, but various other things have meant that it didn’t get much input for a while…

But! I’m happy to say that I’ve just pushed a beta download up onto the project site

So… this v2… what is it?

Basically, it is still the core protobuf stream, but against all the safe advice of my employer I rewrote the entire core. For many reasons:

  • the original design grew organically, and ideas that seemed good turned to make for some problematic code later
  • the vast overuse of generics (and in particular, generics at runtime via reflection) was really hurting some platforms
  • there were many things the design could not readily support – structs, runtime models, pre-built serialization dlls, usage on mobile devices, etc
  • the code path at runtime was just not as minimal as I would like
  • and a myriad of other things

So… I rewrote it. Fortunately I had a barrage of integration tests, which grew yet further during this exercise. The way I figured, I had 2 choices here:

  • go with a richer code-generator to use as a build task (not runtime)
  • go crazy with meta-programming

After some thought, I opted for the latter. In particular, this would (in my reasoning) give me lots of flexibility for keeping things runtime based (where you are able to, at least), and would be a good opportunity to have some fun and learn IL emit to dangerous levels (which, incidentally, turns out to be very valuable – hence mine and Sam’s work on dapper-dot-net).

So what is new?

I don’t claim this list is exhaustive:

  • the ability to define models at runtime (also useful for working with types you don’t control) (example)
  • the ability to have separate models for the same types
  • pre-generation to dll
  • runtime performance improvements
  • support for structs
  • support for immutable objects via surrogates (example)
  • the ability to run on mobile platforms (unity, wp7 – and perhaps moot for a while: MonoDroid, MonoTouch – but presumably Xamarin)
  • the ability to serialize object graphs preserving references (example)
  • the ability to work with types not known in advance (same)
  • probably about 40 other things that slip my mind

I’ll try to go into each of these in detail when I can

And what isn’t there yet?

This is beta; at some point I had to make a milestone cut – but some things aren’t quite complete yet; these are not usually needed (hint: I’m happy to use protobuf-net v2 on my employer’s code-base, so I have confidence in it, and confidence that the core paths have been smoke-tested)

But things that definitely aren’t there in the beta

  • WCF hooks
  • VS tooling (it is just the core runtime dlls)
  • round-trip safe / “extension” fields
  • extraction of .proto (schema) files from a model
  • my build / deploy script needs an overhaul
  • tons of documentation / examples
  • tooling to make the “mobile” story easier (but; it should work – it is being used in a few, for example)

But; have at it

If you want to play with it, go ahead. I only ask that if something behaves unexpectedly, drop me a line before declaring loudly “it sux, dude!”. It might just need some guidance, or maybe even a code fix (I’m far from perfect).

Likewise, any feature suggestions, etc; let me know.

Wednesday, 27 April 2011

Completion tasks - the easy way

One of the good things about doing things in public – people point out when you’ve missed a trick.

Just the other day, I was moaning about how Task seemed too tightly coupled to schedulers, and wouldn’t it be great if you could have a Task that just allowed you to indicate success/failure – I even went so far as to write some experimental code to do just that.

So you can predict what comes next; it already existed – I just didn’t know about it. I give you TaskCompletionSource<T>, which has SetResult(…), SetException(…) and a Task property. Everything I need (except maybe support for result-free tasks, but I can work around that).

And wouldn’t you believe it, the TPL folks shine again, easily doubling my lousy performance, and trebling (or more) the performance compared to “faking it” via RunSynchronously:

Future (uncontested): 1976ms // my offering
Task (uncontested): 4149ms // TPL Task via RunSynchronously
Source (uncontested): 765ms // via TaskCompletionSource
Future (contested): 5608ms
Task (contested): 6982ms
Source (contested): 2389ms


I am hugely indebted to Sam Jack, who corrected me via a comment on the blog entry. Cheers!