Updated: Billy referred me to the binary remoting format for DataTable; stats for BinaryFormatter now include both xml and binary.
This week, someone asked for the fastest way to send a lot of data, currently only available in a DataTable (as xml), over the wire.
My immediate thought was to simply GZIP the xml and have done with it, but I was intrigued… can we do better? In particular (just for a complete surprise) I was wondering whether protobuf-net could be used to write them. Now don’t get me wrong; I’m not a big supporter of DataTable, but they are still alive in the wild. And protobuf-net is a general-purpose serializer, so… why not try to be helpful, eh?
v1 won’t help at all with this, since the object-model just isn’t primed for extension, but in v2 we have many options. I thought I’d look at some typical data (SalesOrderDetail from AdventureWorks2008R2; 121317 rows and 11 columns), comparing the inbuilt xml support, BinaryFormatter support, and protobuf-net. And each of those uncompressed, via gzip, and via deflate.
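To make the comparison matrix concrete, here is a minimal sketch of the inbuilt xml case measured uncompressed, via gzip, and via deflate. It is not the actual test rig: the `Measure` helper, the small stand-in table, and its column names are hypothetical, and only the write half is timed. It assumes nothing beyond the standard `DataTable.WriteXml`, `GZipStream`, and `DeflateStream` APIs.

```csharp
using System;
using System.Data;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;

static class XmlCompressionSketch
{
    // Hypothetical helper: write the table as xml through an optional
    // compression wrapper, then report elapsed time and output size.
    static void Measure(string label, DataTable table, Func<Stream, Stream> wrap)
    {
        var buffer = new MemoryStream();
        var watch = Stopwatch.StartNew();
        // Disposing the wrapper flushes the compressor's final block
        // (and closes the underlying buffer).
        using (Stream target = wrap(buffer))
        {
            table.WriteXml(target, XmlWriteMode.WriteSchema);
        }
        watch.Stop();
        // ToArray() is still valid on a closed MemoryStream; Length is not.
        int size = buffer.ToArray().Length;
        Console.WriteLine("{0,-8} {1,6}ms {2,12:N0} bytes",
            label, watch.ElapsedMilliseconds, size);
    }

    static void Main()
    {
        // Stand-in data (assumption); the post uses SalesOrderDetail.
        // WriteXml requires the table to have a name.
        var table = new DataTable("StandIn");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Text", typeof(string));
        for (int i = 0; i < 1000; i++)
        {
            table.Rows.Add(i, "ABC-123");
        }

        Measure("vanilla", table, s => s);
        Measure("gzip", table, s => new GZipStream(s, CompressionMode.Compress));
        Measure("deflate", table, s => new DeflateStream(s, CompressionMode.Compress));
    }
}
```

Passing the wrapper in as a `Func<Stream, Stream>` keeps the serializer call identical across all three runs, so any difference in time or size comes from the compression layer alone.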
At this point my new friend Lucy (right) is pestering me for a walk, but… it worked; purely experimental, but committed (r356). Results shown below:
Table loaded 11 cols 121317 rows
DataTable (xml) (vanilla) 2269ms/6039ms
DataTable (xml) (gzip) 4881ms/6714ms
DataTable (xml) (deflate) 4475ms/6351ms
BinaryFormatter (rf:xml) (vanilla) 3710ms/6623ms
BinaryFormatter (rf:xml) (gzip) 6879ms/8312ms
BinaryFormatter (rf:xml) (deflate) 5979ms/7472ms
BinaryFormatter (rf:binary) (vanilla) 2006ms/3366ms
BinaryFormatter (rf:binary) (gzip) 3332ms/4267ms
BinaryFormatter (rf:binary) (deflate) 3216ms/4130ms
protobuf-net v2 (vanilla) 316ms/773ms
protobuf-net v2 (gzip) 932ms/1096ms
protobuf-net v2 (deflate) 872ms/1050ms
Since “walkies” beckons, I’ll be brief:
- the inbuilt xml and BinaryFormatter (as xml) are both very large natively and compress pretty well, but with noticeable additional CPU costs
- BinaryFormatter using the binary encoding does much less work, getting (before compression) into the same size-bracket that the xml encoding only managed with compression
- protobuf-net is much faster in terms of CPU, but slightly larger than the gzip xml output
- unusually for protobuf, it compresses quite well; presumably there is enough text in there to make it worthwhile. Compressed, it takes roughly half the size and much less CPU (compared to gzip xml)
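The "rf:xml" and "rf:binary" rows differ in the table's remoting format. As a rough sketch of how that switch is made (not the benchmark code itself), the snippet below sets `DataTable.RemotingFormat` before handing the table to `BinaryFormatter`. The `SerializedSize` helper and the stand-in data are hypothetical, and the sketch assumes .NET Framework, since `BinaryFormatter` is obsolete and may be unavailable on recent .NET versions.

```csharp
using System;
using System.Data;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

static class RemotingFormatSketch
{
    // Hypothetical helper: serialize the table with BinaryFormatter using
    // the given remoting format and return the payload size in bytes.
    static long SerializedSize(DataTable table, SerializationFormat format)
    {
        // Xml (the default) embeds the table as xml text inside the
        // BinaryFormatter payload; Binary writes the values directly.
        table.RemotingFormat = format;
        using (var buffer = new MemoryStream())
        {
            new BinaryFormatter().Serialize(buffer, table);
            return buffer.Length;
        }
    }

    static void Main()
    {
        // Stand-in data (assumption); the post uses SalesOrderDetail.
        var table = new DataTable("StandIn");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Text", typeof(string));
        for (int i = 0; i < 1000; i++)
        {
            table.Rows.Add(i, "ABC-123");
        }

        Console.WriteLine("rf:xml    {0:N0} bytes",
            SerializedSize(table, SerializationFormat.Xml));
        Console.WriteLine("rf:binary {0:N0} bytes",
            SerializedSize(table, SerializationFormat.Binary));
    }
}
```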
Overall the current experimental code is exceptionally rough and ready, mainly as an investigation. But… should I pursue this? Do people still care about DataTable enough? Or should they just use gzip/xml in this scenario?