Friday, 14 November 2014

Episode 2; a cry for help


Part 2 of 2; in part 1 I talked about Cap’n Proto; here’s where you get to be a hero!
In my last post, I talked about the fun I've been having starting a Cap'n Proto spike for .NET/C#. Well, here's the thing: I don't scale. I already have an absurd number of OSS projects needing my time, plus a full time job, plus a family, including a child with "special needs", plus I help run my local school (no, really - I'm legally accountable for the education of 240 small people; damn!), and a number of speaking engagements (see you at London NDC or WCDC ?).

I don't have infinite time. What I don’t need is yet another project where I’m the primary contributor, struggling to give it anything like the time it deserves. So here's me asking if anybody in my reach wants to get involved and help me take it from a barely-usable shell, into a first rate tool. Don't get me wrong: the important word in "barely usable" is "usable", but it could be so much better.
 

Remind me: why do you care about this?


Let’s summarize the key points that make Cap’n Proto interesting:
  • next to zero processing
  • next to zero allocations (even for a rich object model)
  • possibility to exploit OS-level concepts like memory mapped files (actually, this is already implemented) for high performance
  • multi-platform
  • version tolerant
  • interesting concepts like “unions” for overlapping data
Although if I’m honest, pure geek curiosity was enough for me. The simple “nobody else has done it” was my lure.

So what needs doing?


While the core is usable, there’s a whole pile of stuff that could be done to make it into a much more useful tool. Conveniently, a lot of these are fairly independent, and could be sliced off very easily if a few other people wanted to get involved in an area of particular interest. But; here’s the my high-level ideas:

  • Schema parsing: this is currently a major PITA, since the current "capnp" tool only really works / compiles for Linux. There is a plan in the core Cap'n Proto project to get this working on MinGW (for Windows), but it would be nice to have a full .NET parser - I'm thinking "Irony"-based, although I'm not precious about the implementation
  • Offset processing: related to schema parsing, we need to compute the byte offsets of a parsed schema. This is another job that the "capnp" tool does currently. Basically, the idea is to look at all of the defined fields, and assign byte offsets to each, taking into account some fairly complicated "group" and "union" rules
  • Code generation: I have some working code-gen, but it is crude and "least viable". It feels like this could be done much better. In particular I'm thinking "devenv" tooling, whether that means T4 or some kind of VS plugin, ideally trivially deployed (NuGet or similar) - so some experience making Visual Studio behave would be great. Of course, it would be great if it worked on Mono too – I don’t know what that means for choices like T4.
  • Code-first: “schemas? we don't need no stinking schemas!”; here I mean the work to build a schema model from pre-existing types, presumably via attributes – or perhaps combining an unattributed POCO model with a regular schema
  • POCO serializer: the existing proof-of-concept works via generated types; however, building on the code-first work, it is entirely feasible to write a "regular" serializer that talks in terms of some pre-existing POCO model, but uses the library api to read/write the objects per the wire format
  • RPC: yes, Cap'n Proto defines an RPC stack. No I haven't even looked at it. It would be great if somebody did, though
  • Packed encoding: the specification defines an alternative "packed" representation for data, that requires some extra processing during load/save, but removes some redundant data; not currently implemented
  • Testing: I'm the worst possible person to test my own code – too “close” to it. I should note that I have a suite of tests related to my actual needs that aren't in then open source repo (I’ll try and migrate many of them), but: more would be great
  • Multi-platform projects: for example, an iOS / Windows Store version probably needs to use less (well, zero) of the “unsafe” code (mostly there for efficiency); does it compile / run on Mono? I don’t know.
  • Proper performance testing; I'm casually happy with it, but some dedicated love would be great
  • Much more compatibility testing against the other implementations
  • Documentation; yeah, telling people how to use it helps
  • And probably lots more stuff I'm forgetting

Easy budget planning


Conveniently, I have a budget that can be represented using no bits of data! See how well we're doing already? I can offer nothing except my thanks, and the satisfaction of having fun hacking at some fun stuff (caveat: for small values of "fun"), and maybe some community visibility if it takes off. Which it might not. I'm more than happy to open up commit access to the repo (I can always revert :p) - although I'd rather keep more control over NuGet (more risk to innocents).
So... anyone interested? Is my offer of work for zero pay and limited reward appealing? Maybe you’re looking to get involved in OSS, but haven’t found an interesting project. Maybe you work for a company that has an interest in being able to process large amounts of data with minimum overheads.

Besides, it could be fun ;p

If you think you want to get involved, drop me an email (marc.gravell@gmail.com) and we'll see if we can get things going!