Thursday 2 August 2018

protobuf-net, August 2018 update

An update on what's happening with protobuf-net

Headline: .proto processing now works directly from dotnet build and MSBuild, without any need for separate DSL processing steps; and: new shiny things in the future.


I haven't spoken about protobuf-net for quite a while, but it is very much alive and active - so I really should do a catch-up, and I'm really excited about where we are.

Level 100 primer, if you don't know what "protobuf" is

"protobuf" is Protocol Buffers, Google's cross-platform/language/OS/etc serialization format (and associated tools). It is primarily a dense binary format, but a JSON variant also exists. A lot of Google's public and private APIs are protobuf, but it is used widely outside of Google too.

The data/schema is often described via a custom DSL, .proto - which comes in 2 versions (proto2 and proto3). They both describe the same binary format.

Google provide implementations for a range of platforms including C# (note: "proto3" only), but ... I kinda find the "DSL first, always" approach limiting (I like the flexibility of "code first"), and ... the Google implementation is "Google-idiomatic", rather than ".NET idiomatic".

Hence protobuf-net exists; it is a fast/dense binary serializer that implements the protobuf specification, but which is .NET-idiomatic, and allows either code-first or DSL-first. I use it a lot.
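
To give a flavour of the code-first side, here's a minimal sketch - the Order type and its members are invented purely for illustration, not taken from any real schema:

using ProtoBuf;
using System.IO;

// a hypothetical code-first model; no .proto file is needed -
// the attributes tell protobuf-net how members map to field numbers
[ProtoContract]
public class Order
{
    [ProtoMember(1)]
    public int Id { get; set; }

    [ProtoMember(2)]
    public string CustomerName { get; set; }
}

public static class Demo
{
    public static void Main()
    {
        var order = new Order { Id = 123, CustomerName = "Fred" };

        using (var ms = new MemoryStream())
        {
            // serialize to the dense protobuf binary format
            Serializer.Serialize(ms, order);
            ms.Position = 0;

            // ...and back again
            var clone = Serializer.Deserialize<Order>(ms);
        }

        // if you want the equivalent .proto schema, you can ask for it
        string proto = Serializer.GetProto<Order>();
    }
}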

Historically, it was biased towards "code first", with the "DSL first" tools a viable but more awkward option.

What's changed lately?

Bespoke managed DSL parser

Just over a year ago now, back in 2.3.0, I released a new set of DSL parsing tools. In the past, protobuf-net's tooling (protogen) made use of Google's protoc tool - a binary executable that processes .proto files - but this was incredibly awkward to deploy across platforms. Essentially, the tools would probably work on Windows, but that was about it. This wasn't a great option going forward, so I implemented a completely bespoke 100% managed-code parser and code-generator that didn't depend on protoc at all. protogen was reborn (and it works with both "proto2" and "proto3"), but it lacked a good deployment route.

Playground website

Next, I threw together protogen.marcgravell.com. This is an ASP.NET Core web app that uses the same library code as protogen, but in an interactive web app. This makes for a pretty easy way to play with .proto files, including a code-editor and code generator. It also hosts protoc, if you prefer that - and includes a wide range of Google's API definitions available as imports. This is a very easy way of working with casual .proto usage, and it provides a download location for the standalone protogen tools. It isn't going to win any UI awards, but it works. It even includes a decoder, if you want to understand serialized protobuf data.

Global tools

Having a download for the command-line tools is a great step forward, but ... it is still a lot of hassle. If only there were a way of installing managed-code developer tools in a convenient way. Well, there is: .NET "global tools"; so, a few months ago I added protobuf-net.Protogen. As a "global tool", this can be installed once via

dotnet tool install --global protobuf-net.Protogen

and then protogen will be available anywhere, as a development tool. Impressively, "global tools" work across operating systems, so the exact same package will also work on Linux (and presumably Mac). This starts to make .proto very friendly to work with, as a developer.
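
For example, to generate C# from a schema (the file name here is hypothetical, and protogen broadly follows protoc's argument conventions - so the exact switches may vary with your setup):

protogen --csharp_out=. mymessages.proto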

Build tools

I'm going to be frank and honest: MSBuild scares the bejeezus out of me. I don't understand .targets files, etc. It is a huge blind-spot of mine, but I've made my peace with that reality. So... I was genuinely delighted to receive a pull request from Mark Pflug that fills in the gaps! What this adds is protobuf-net.MSBuild - tools that tweak the build process from dotnet build and MSBuild. What this means is that you can just install protobuf-net.MSBuild into a project, and it automatically runs the .proto → C# code-generation steps as part of build. This means you can just maintain your .proto files without any need to generate the C# as a separate step. You can still extend the partial types in the usual way. All you need to do is make sure the .proto files are in the project. It even includes the common Google import additions for free (without any extra files required), so: if you know what a .google.protobuf.Timestamp is - know that it'll work without you having to add the relevant .proto file manually (although you still need the import statement).
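
As a sketch of what "extend the partial types" looks like in practice (the Order type, its namespace, and its Id/Lines members are hypothetical here, standing in for whatever your .proto files actually declare):

// OrderExtensions.cs - your own code, sitting alongside the .proto file;
// the other half of this partial class is generated during the build
namespace MyApp.Messages
{
    public partial class Order
    {
        // members added here are not part of the serialization contract -
        // they just ride alongside the generated Id and Lines members
        public bool IsEmpty => Lines == null || Lines.Count == 0;

        public override string ToString()
            => $"Order {Id} ({(Lines == null ? 0 : Lines.Count)} lines)";
    }
}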

I can't overstate how awesome I think these tools are, and how much friendlier they make the "DSL first" scenario. Finally, protobuf-net can use .proto as a truly first-class experience. Thanks again, Mark Pflug!

What next?

That's where we are today, but: to give an update on my plans and priorities going forwards...

Spans and Pipelines

You might have noticed me talking about these a little lately; I've done lots of research into what protobuf-net might do with them, but it is probably time to start looking at doing it "for real". The first step there is getting some real timings on the performance difference between a few different approaches.

AOT

In particular: platforms that don't allow IL-emit. This helps consumers like UWP, Unity, iOS, etc. These platforms currently work with protobuf-net, but only via huge compromises. To do better, we need to radically overhaul how we approach them. I see two viable avenues to explore there:

  1. we can enhance the .proto codegen (the bits that protobuf-net.MSBuild just made tons better), to include generation of the actual serialization code

  2. we can implement Roslyn-based tools that pull apart code-first usage to understand the model, and emit the serialization code at build time

All of these are going to keep me busy into the foreseeable!