Wednesday 16 September 2009

protobuf-net vs NetDataContractSerializer

In a user-group meeting yesterday, I was talking briefly (in a “grok” session) about “protocol buffers” (protobuf-net in particular), and I was asked a question about the comparison to NetDataContractSerializer. I didn’t have the answer to hand (I only had numbers vs DataContractSerializer), so promised to find out…

First; note that the key difference between DataContractSerializer and NetDataContractSerializer is (as I understand it) that NetDataContractSerializer includes more type metadata. This makes it possible to do a few things involving “object” and unanticipated subclasses (which can be useful), but renders it .NET-specific (and less version-tolerant).

The test

As per my existing test rig, I test 1000 iterations of a moderately large extract from the Northwind sample database (into some pretty-standard POCO types); I simply extended this rig to include NetDataContractSerializer. For completeness, I have 3 variants of the tests, each tested separately against a different set of serializers:

  1. LINQ-to-SQL generated classes with serialization enabled at the data-context; suitable for protobuf-net, DataContractSerializer and DataContractJsonSerializer
  2. as the first test, but with the LINQ-to-SQL bits hacked out and [Serializable] added (to make it suitable for the core binary .NET serializers); suitable for protobuf-net, NetDataContractSerializer, DataContractSerializer, DataContractJsonSerializer, BinaryFormatter and XmlSerializer
  3. as the second test, but with ISerializable and IXmlSerializable implemented to support BinaryFormatter and XmlSerializer via protobuf-net
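
To make the shape of the rig concrete, here is a rough sketch of the same idea: serialize once to get the size, then time a loop of serializations and a loop of deserializations. This is not the actual rig (which is .NET and runs against the Northwind extract). It is Python, with the standard-library json and pickle modules standing in for the serializers listed above, and a made-up payload in place of the real data. The measure helper and every field name are assumptions for illustration only.

```python
import json
import pickle
import time


def measure(name, serialize, deserialize, payload, iterations=1000):
    """Report payload size plus total time for `iterations` writes and reads."""
    blob = serialize(payload)  # one pass up front, just to measure the size

    start = time.perf_counter()
    for _ in range(iterations):
        serialize(payload)
    serialize_ms = (time.perf_counter() - start) * 1000

    start = time.perf_counter()
    for _ in range(iterations):
        clone = deserialize(blob)
    deserialize_ms = (time.perf_counter() - start) * 1000

    # A fast serializer that loses data is no use; check the round trip.
    assert clone == payload
    return {
        "name": name,
        "size": len(blob),
        "serialize_ms": serialize_ms,
        "deserialize_ms": deserialize_ms,
    }


# Hypothetical stand-in data; the real rig uses an extract from Northwind.
payload = [
    {
        "order_id": i,
        "customer": "ALFKI",
        "lines": [{"product_id": j, "quantity": j + 1} for j in range(5)],
    }
    for i in range(100)
]

results = [
    measure(
        "json",
        lambda obj: json.dumps(obj).encode("utf-8"),
        lambda data: json.loads(data.decode("utf-8")),
        payload,
    ),
    measure("pickle", pickle.dumps, pickle.loads, payload),
]

for r in results:
    print(f'{r["name"]:<8}{r["size"]:>10,} bytes'
          f'{r["serialize_ms"]:>12.1f} ms{r["deserialize_ms"]:>12.1f} ms')
```

The absolute numbers from a sketch like this mean nothing for the .NET serializers; the point is only that every serializer is fed the same data, the same number of times, and is measured the same way.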

The numbers and analysis

Since only the second test supports NetDataContractSerializer, that is the one I’ll focus on (with results from the 3rd test included for interest only, marked *); here are the numbers:

Serializer                  Size (bytes)  Serialize (ms)  Deserialize (ms)
protobuf-net                     133,010          10,769            23,511
NetDataContractSerializer        992,203          29,343            91,453
DataContractSerializer           772,406          16,272            66,755
DataContractJsonSerializer       490,425          29,604           135,125
BinaryFormatter                  276,366          67,151            56,253
XmlSerializer                  1,043,137          29,257            38,528
BinaryFormatter*                 133,167          12,095            24,653
XmlSerializer*                   177,410          12,189            31,827

Some interesting points; firstly, for my sample data it is a myth that WCF’s binary type-centric serializer (NetDataContractSerializer) is going to improve things over XML (DataContractSerializer). Both in terms of bandwidth and processing time it is worse on every front: 992,203 vs 772,406 bytes, 29,343 vs 16,272 ms to serialize, and 91,453 vs 66,755 ms to deserialize. The biggest advantage of NetDataContractSerializer (in this case) is that the type model is more flexible in terms of interfaces, “object” and unknown subclasses; but this also makes it entirely non-interoperable. Of course, some details of the SOAP envelope used by the binary type-centric WCF transport may be more efficient, but I’m looking only at payload for these tests.

Secondly (and fortunately for me) it yet again proves that Google know how to do data transport; with protobuf-net being the clear winner for both bandwidth and processing time.
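
The ratios behind those two claims are easy to check from the table. A quick sketch, using Python purely as a calculator: the figures are copied from the test-2 rows above, and the variable names are mine.

```python
# (size in bytes, serialize ms, deserialize ms), copied from the test-2 rows.
results = {
    "protobuf-net": (133_010, 10_769, 23_511),
    "NetDataContractSerializer": (992_203, 29_343, 91_453),
    "DataContractSerializer": (772_406, 16_272, 66_755),
    "DataContractJsonSerializer": (490_425, 29_604, 135_125),
    "BinaryFormatter": (276_366, 67_151, 56_253),
    "XmlSerializer": (1_043_137, 29_257, 38_528),
}

# Express every row as a multiple of the protobuf-net row.
baseline = results["protobuf-net"]
ratios = {
    name: tuple(round(value / base, 2) for value, base in zip(row, baseline))
    for name, row in results.items()
}

# Is NetDataContractSerializer worse than DataContractSerializer in all columns?
ndcs_worse_everywhere = all(
    n > d
    for n, d in zip(
        results["NetDataContractSerializer"], results["DataContractSerializer"]
    )
)

# Does protobuf-net have the smallest value in every column?
protobuf_best_everywhere = all(
    baseline[i] == min(row[i] for row in results.values()) for i in range(3)
)

for name, (size, ser, deser) in ratios.items():
    print(f"{name:<28}{size:>7.2f}x{ser:>7.2f}x{deser:>7.2f}x")
print("NetDataContractSerializer worse on every front:", ndcs_worse_everywhere)
print("protobuf-net best on every front:", protobuf_best_everywhere)
```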

But for me, another point that I’d like to highlight is versatility and compatibility; the same classes can be used happily by a wide range of serializers (above), and presumably a few others too. For the record, the entities in the above test don’t even have any protobuf-net-specific attributes (the outer-most wrapper does have them, but only because at some point in the distant past I wanted a way to compare/contrast some protocol-buffers-specific details, which can only be controlled via the custom attributes).

Other thoughts

As with all performance tests, your specific environment and data may be an important factor. But based on the above numbers it may be reasonable to assume that protobuf-net stands a good chance of holding its own.

Finally; following some unrelated conversations at the same user-group (they kept me pretty busy), I stress that tight serialization is only part of the story:

  • It can help reduce bandwidth costs and CPU costs associated with (de)serialization, but it won’t help if latency is your issue; protobuf-net makes no claim to change the speed of light.
  • While protocol buffers may be supported on a range of platforms, it is by no means ubiquitous; if your biggest demand is portability, perhaps use SOAP/WSDL (or offer both SOAP and protobuf-net on separate endpoints).
  • By itself it won’t solve all your disconnected woes; although it can presumably be used in tandem with any message-queue based solution you might dream of.
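
To put the latency point in numbers, here is a back-of-envelope sketch (Python, again just as a calculator). It uses a deliberately crude model: total time is a fixed latency plus payload size divided by bandwidth. The two payload sizes come from the table above; the 10 Mbit/s link and the 5 ms and 200 ms latencies are invented for illustration, and total_ms is a hypothetical helper.

```python
def total_ms(size_bytes, latency_ms, bandwidth_bits_per_s):
    """Fixed latency plus the time to push the payload down the wire."""
    transfer_ms = size_bytes * 8 / bandwidth_bits_per_s * 1000
    return latency_ms + transfer_ms


BANDWIDTH = 10_000_000  # assumed: a 10 Mbit/s link

# Payload sizes in bytes, taken from the results table.
SIZES = {"protobuf-net": 133_010, "DataContractSerializer": 772_406}

savings = {}
for latency_ms in (5, 200):  # assumed: a nearby server vs a distant one
    for name, size in SIZES.items():
        print(f"latency {latency_ms:>3} ms  {name:<24}"
              f"{total_ms(size, latency_ms, BANDWIDTH):>8.1f} ms")
    # What the smaller payload buys you at this latency.
    savings[latency_ms] = (
        total_ms(SIZES["DataContractSerializer"], latency_ms, BANDWIDTH)
        - total_ms(SIZES["protobuf-net"], latency_ms, BANDWIDTH)
    )

print("saved by the smaller payload:", {k: round(v, 1) for k, v in savings.items()})
```

Under this model the smaller payload saves the same amount of transfer time whatever the latency, and the latency term itself never shrinks; for small messages over a slow round trip, that fixed term is most of the bill.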