Friday, 14 November 2014

Episode 2; a cry for help


Part 2 of 2; in part 1 I talked about Cap’n Proto; here’s where you get to be a hero!
In my last post, I talked about the fun I've been having starting a Cap'n Proto spike for .NET/C#. Well, here's the thing: I don't scale. I already have an absurd number of OSS projects needing my time, plus a full-time job, plus a family, including a child with "special needs", plus I help run my local school (no, really - I'm legally accountable for the education of 240 small people; damn!), and a number of speaking engagements (see you at London NDC or WCDC?).

I don't have infinite time. What I don’t need is yet another project where I’m the primary contributor, struggling to give it anything like the time it deserves. So here's me asking if anybody in my reach wants to get involved and help me take it from a barely-usable shell, into a first rate tool. Don't get me wrong: the important word in "barely usable" is "usable", but it could be so much better.
 

Remind me: why do you care about this?


Let’s summarize the key points that make Cap’n Proto interesting:
  • next to zero processing
  • next to zero allocations (even for a rich object model)
  • possibility to exploit OS-level concepts like memory mapped files (actually, this is already implemented) for high performance
  • multi-platform
  • version tolerant
  • interesting concepts like “unions” for overlapping data
Although if I’m honest, pure geek curiosity was enough for me. The simple “nobody else has done it” was my lure.

So what needs doing?


While the core is usable, there’s a whole pile of stuff that could be done to make it into a much more useful tool. Conveniently, a lot of these are fairly independent, and could be sliced off very easily if a few other people wanted to get involved in an area of particular interest. But here are my high-level ideas:

  • Schema parsing: this is currently a major PITA, since the current "capnp" tool only really works / compiles for Linux. There is a plan in the core Cap'n Proto project to get this working on MinGW (for Windows), but it would be nice to have a full .NET parser - I'm thinking "Irony"-based, although I'm not precious about the implementation
  • Offset processing: related to schema parsing, we need to compute the byte offsets of a parsed schema. This is another job that the "capnp" tool does currently. Basically, the idea is to look at all of the defined fields, and assign byte offsets to each, taking into account some fairly complicated "group" and "union" rules
  • Code generation: I have some working code-gen, but it is crude and "least viable". It feels like this could be done much better. In particular I'm thinking "devenv" tooling, whether that means T4 or some kind of VS plugin, ideally trivially deployed (NuGet or similar) - so some experience making Visual Studio behave would be great. Of course, it would be great if it worked on Mono too – I don’t know what that means for choices like T4.
  • Code-first: “schemas? we don't need no stinking schemas!”; here I mean the work to build a schema model from pre-existing types, presumably via attributes – or perhaps combining an unattributed POCO model with a regular schema (see the sketch at the end of this list)
  • POCO serializer: the existing proof-of-concept works via generated types; however, building on the code-first work, it is entirely feasible to write a "regular" serializer that talks in terms of some pre-existing POCO model, but uses the library api to read/write the objects per the wire format
  • RPC: yes, Cap'n Proto defines an RPC stack. No I haven't even looked at it. It would be great if somebody did, though
  • Packed encoding: the specification defines an alternative "packed" representation for data, that requires some extra processing during load/save, but removes some redundant data; not currently implemented
  • Testing: I'm the worst possible person to test my own code – too “close” to it. I should note that I have a suite of tests related to my actual needs that aren't in the open source repo (I’ll try and migrate many of them), but: more would be great
  • Multi-platform projects: for example, an iOS / Windows Store version probably needs to use less (well, zero) of the “unsafe” code (mostly there for efficiency); does it compile / run on Mono? I don’t know.
  • Proper performance testing; I'm casually happy with it, but some dedicated love would be great
  • Much more compatibility testing against the other implementations
  • Documentation; yeah, telling people how to use it helps
  • And probably lots more stuff I'm forgetting
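
To give a flavour of the code-first item above, here’s a sketch of what an attributed model might look like – emphasising that these attribute names are purely hypothetical, since none of this exists yet:
// hypothetical attributes, purely to illustrate the "code-first" idea above
[CapnpStruct]
public class Customer
{
    [CapnpField(0)] public int Id { get; set; }
    [CapnpField(1)] public string Name { get; set; }
    [CapnpField(2)] public bool IsActive { get; set; }
}
The job would then be to walk such a type with reflection, build the same schema model that the parser produces, and hand it to the offset and code-gen machinery.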

Easy budget planning


Conveniently, I have a budget that can be represented using no bits of data! See how well we're doing already? I can offer nothing except my thanks, and the satisfaction of having fun hacking at some fun stuff (caveat: for small values of "fun"), and maybe some community visibility if it takes off. Which it might not. I'm more than happy to open up commit access to the repo (I can always revert :p) - although I'd rather keep more control over NuGet (more risk to innocents).
So... anyone interested? Is my offer of work for zero pay and limited reward appealing? Maybe you’re looking to get involved in OSS, but haven’t found an interesting project. Maybe you work for a company that has an interest in being able to process large amounts of data with minimum overheads.

Besides, it could be fun ;p

If you think you want to get involved, drop me an email (marc.gravell@gmail.com) and we'll see if we can get things going!





Efficiency: the work you don’t do…

 
Part 1 of 2; here’s where I talk about a premise; in part 2 I pester you for help and involvement
Q: what's faster than doing {anything}?
A: not doing it
I spend a lot of time playing with raw data, whether it is serialization protocols (protobuf-net), materialization (dapper) or network api protocols (redis, web-sockets, etc). I don't claim to be the world's gift to binary, but I like to think I can "hold my own" in this area. And one thing that I've learned over and over again is: at the application level, sure: do what you want - you're not going to upset the runtime / etc - but at the library level (in these areas, and others) you often need to do quite a bit of work to minimize CPU and memory (allocation / collection) overhead.
In the real world, we have competing needs. We want to process large amounts of data, but we don't want to pay the price. What to do?
 

Get me a bowl of Cap’n Proto…

A little while ago I became aware of Cap'n Proto. You might suspect from the name that it is tangentially related to Protocol Buffers (protobuf), and you would be right: it is the brain-child of Kenton Varda, one of the main originators of protobuf. So at this point you're probably thinking "another serialization protocol? boooring", but stick with me - it gets more interesting! And not just because it describes itself as a “cerealization protocol” (badum tish?). Rather than simply describing a serialization format, Cap'n Proto instead describes a general way of mapping structured data to raw memory. And here's the fun bit: in a way that is also suitable for live objects.
Compare and contrast:
  • load (or stream) the source data through memory, applying complex formatting rules, constructing a tree (or graph) of managed objects, before handing the root object back to the caller
  • load the source data into memory, and... that's it
Cap'n Proto is the second of these; it is designed to be fully usable (read and write) in the raw form, so if you can load the data, your work is done.
 

So how does that work?

Basically, it describes a “message” of data as a series of one or more “segments” (which can be of different sizes), with the objects densely packed inside a segment. Each object is split into a number of “data words” (this is where your ints, bools, floats, etc, get stored), and a number of “pointers”, where “pointer” here just means a few bits to describe the nature of the data, and either a relative offset to another chunk in the same segment, or an absolute offset into a different segment. Visually:
[image: diagram of a message split into segments, with each object’s data words and pointers packed inside a segment]
We can then make an object model that sits on top of that, where each object just has a reference to the segment, an absolute offset to the first word of our data, and a few bits to describe the shape, and we’re done. This format works both on disk and for live interactive objects, so the disk format is exactly the in-memory format. Zero processing. Note that there’s a lot of up-front processing that defines how an object with 2 bools, an int, a float, a string, and another bool (that was added in a later revision of your schema) should be laid out in the data words, but ultimately that is a solved problem.
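To make that concrete, here is a minimal sketch of the idea – the type and member names here are invented for this post, not the real CapnProto.net API – showing how a value-type “pointer” might read fields straight out of a segment’s data words:
// illustrative sketch only; not the actual library types
struct Segment
{
    public ulong[] Words; // densely packed data words for this segment
}

struct StructPointer
{
    public Segment Segment;  // which segment the object lives in
    public int WordOffset;   // absolute offset of the object's first data word

    // a bool stored at bit 5 of data word 0
    public bool SomeFlag
    {
        get { return ((Segment.Words[WordOffset] >> 5) & 1) != 0; }
    }

    // an int stored in the low half of data word 1
    public int SomeNumber
    {
        get { return unchecked((int)(Segment.Words[WordOffset + 1] & 0xFFFFFFFF)); }
    }
}
No parsing, no allocation: reading a field is just an array index plus a shift and a mask.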
 

Nothing is free

Of course everything has a price; the cost in this case is that traversing from object to object involves parsing the pointer offsets, but this is all heavily optimized - it just means the actual data is one level abstracted. Indeed, most of the work in shimming between the two can be done via basic bitwise operations (“and”, “shift”, etc). Plus of course, you only pay this miniscule cost for data you actually touch. Ignoring data is effectively entirely free. So all we need to do is load the data into memory. Now consider: many modern OSes offer "memory mapped files" to act as an OS-optimized way of treating a flat file as raw memory. This holds some very interesting possibilities for insanely efficient processing of arbitrarily large files.
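For example, with the standard .NET memory-mapped file API that might look something like this (a sketch only; the file name is made up, and a real reader would obviously do more than print the first word):
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

class MmapSketch
{
    static void Main()
    {
        // map the file straight into our address space; the OS pages data
        // in on demand - we never "load" or parse the file up front
        using (var file = MemoryMappedFile.CreateFromFile("data.capnp.bin", FileMode.Open))
        using (var view = file.CreateViewAccessor())
        {
            // read the first 64-bit word of the mapped data
            ulong firstWord = view.ReadUInt64(0);
            Console.WriteLine(firstWord);
        }
    }
}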
 

I couldn't stay away

In my defence, I did have a genuine business need, but it was something I'd been thinking about for a while; so I took the plunge and started a .NET / C# implementation of Cap'n Proto, paying lots of attention to efficiency - and in particular allocations (the "pointers" to the underlying data are implemented as value types, with the schema-generated entities also existing as value types that simply surround a "pointer", making the defined members more conveniently accessible). This means that not only is there no real processing involved in parsing the data, but there are also no allocations - and no allocations means nothing to collect. Essentially, we can use the “message” as a localized memory manager, with no .NET managed reference-types for our data – just structs that refer to the formatted data inside the “message” – and then dispose of the entire “message” in one go, rather than having lots of object references. Consider also that while a segment could be a simple byte[] (or perhaps ulong[]), it could also be implemented using entirely unmanaged memory, avoiding the “Large Object Heap” entirely (we kinda need to use unmanaged memory if we want to talk to memory mapped files). A bit “inner platform”, but I think we’ll survive.
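As a crude illustration of the unmanaged-memory point (a sketch only; the real code has rather more ceremony around it):
using System;
using System.Runtime.InteropServices;

class UnmanagedSegmentSketch
{
    static void Main()
    {
        const int wordCount = 1024;
        // allocate a segment outside the managed heap: the GC never sees it,
        // and it cannot end up on the Large Object Heap
        IntPtr segment = Marshal.AllocHGlobal(wordCount * sizeof(ulong));
        try
        {
            // write and read a single data word, addressed by word offset
            Marshal.WriteInt64(segment, 3 * sizeof(ulong), 0x12345678);
            long word3 = Marshal.ReadInt64(segment, 3 * sizeof(ulong));
            Console.WriteLine(word3.ToString("x"));
        }
        finally
        {
            // the whole "segment" goes away in one step - nothing to collect
            Marshal.FreeHGlobal(segment);
        }
    }
}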
 

Sounds interesting! How do I use it?

Well, that’s the gotcha; the core is stable, but I have lots more to do, and this is where I need your help. I think there's lots of potential here for a genuinely powerful and versatile piece of .NET tooling. In my next blog entry, I'm going to be describing what the gaps are and inviting community involvement.

















Monday, 1 September 2014

Optional parameters: maybe not so harmful

 

A few days ago I blogged Optional parameters considered harmful; well, as is often the case – I was missing a trick. Maybe they aren’t as bad as I said. Patrick Huizinga put me right; I don’t mind admitting: I was being stupid.

To recap, we had a scenario where these two methods were causing a problem:

static void Compute(int value, double factor = 1.0,
    string caption = null)
{ Compute(value, factor, caption, null); }
static void Compute(int value, double factor = 1.0,
    string caption = null, Uri target = null)
{ /* ... */ }

What I should have done is: make the parameters non-optional in all overloads except the new one. Existing compiled code isn’t interested in the optional parameters, so will still use the old methods. New code will use the most appropriate overload, which will often (but not quite always) be the overload with the most parameters – the optional parameters.


static void Compute(int value, double factor,
    string caption)
{ Compute(value, factor, caption, null); }
static void Compute(int value, double factor = 1.0,
    string caption = null, Uri target = null)
{ /* ... */ }

There we go; done; simple; working; no issues. My thanks to Patrick for keeping me honest.

Thursday, 28 August 2014

Optional parameters considered harmful (in public libraries)


UPDATE: I was being stupid. See here for the right way to do this

TAKE THIS POST WITH A HUGE PINCH OF SALT

Optional parameters are great; they can really clean down the number of overloads needed on some APIs where the intent can be very different in different scenarios. But; they can hurt. Consider the following:
static void Compute(int value, double factor = 1.0,
    string caption = null)
{ /* ... */ }

Great; our callers can use Compute(123), or Compute(123, 2.0, "awesome"). All is well in the world. Then a few months later, you realize that you need more options. The nice thing is, you can just add them at the end, so our method becomes:
static void Compute(int value, double factor = 1.0,
  string caption = null, Uri target = null)
{ /* ... */ }

This works great if you are recompiling everything that uses this method, but it isn’t so great if you are a library author; the method could be used inside another assembly that you don’t control and can’t force to be recompiled. If someone updates your library without rebuilding that other dll, it will start throwing MissingMethodException; not great.

OK, you think; I’ll just add it as an overload instead:
static void Compute(int value, double factor = 1.0,
    string caption = null)
{ Compute(value, factor, caption, null); }
static void Compute(int value, double factor = 1.0,
    string caption = null, Uri target = null)
{ /* ... */ }

And you test Compute(123, 1.0, "abc") and Compute(123, target: foo), and everything works fine; great! ship it! No, don’t. You see, what doesn’t work is: Compute(123). This instead creates a compiler error:


The call is ambiguous between the following methods or properties: 'blah.Compute(int, double, string)' and 'blah.Compute(int, double, string, System.Uri)'

Well, damn…

This is by no means new or different; this has been there for quite a while now – but it still sometimes trips me (and other people) up. It would be really nice if the compiler would have a heuristic for this scenario such as “pick the one with the fewest parameters”, but; not today. A limitation to be wary of (if you are a library author). For reference, it bit me hard when I failed to add CancellationToken as a parameter on some *Async methods in Dapper that had optional parameters - and of course, I then couldn't add it after the fact.

Friday, 11 July 2014

Securing redis in the cloud

Redis has utility in just about any system, but when we start thinking about “the cloud” we have a few additional things to worry about. One of the biggest issues is that the cloud is not your personal space (unless you have a VPN / subnet setup, etc) – so you need to think very carefully about what data is openly available to inspection at the network level. Redis does have an AUTH concept, but frankly it is designed to deter casual access: all commands and data remain unencrypted and visible in the protocol, including any AUTH requests themselves. What we probably want, then, is some kind of transport level security.
Now, redis itself does not provide this; there is no standard encryption, but you could configure a secure tunnel to the server using something like stunnel. This works, but requires configuration at both client and server. But to make our lives a bit easier, some of the redis hosting providers are beginning to offer encrypted redis access as a supported option. This certainly applies both to Microsoft “Azure Redis Cache” and Redis Labs “redis cloud”. I’m going to walk through both of these, discussing their implementations, and showing how we can connect.

Creating a Microsoft “Azure Redis Cache” instance

First, we need a new redis instance, which you can provision at https://portal.azure.com by clicking on “NEW”, “Everything”, “Redis Cache”, “Create”:
[images: Azure portal screenshots of the “NEW” / “Redis Cache” / “Create” steps]
There are different sizes of server available; they are all currently free during the preview, and I’m going to go with a “STANDARD” / 250MB:
[image: selecting the pricing tier]
Azure will now go away and start creating your instance:
[image: Azure portal showing the cache being created]
This could take a few minutes (actually, it takes surprisingly long IMO, considering that starting a redis process is virtually instantaneous; but for all I know it is running on a dedicated VM for isolation etc; and either way, it is quicker and easier than provisioning a server from scratch). After a while, it should become ready:
[image: the cache instance showing as running]

Connecting to a Microsoft “Azure Redis Cache” instance

We have our instance; let’s talk to it. Azure Redis Cache uses a server-side certificate chain that should be valid without having to configure anything, and uses a client-side password (not a client certificate), so all we need to know is the host address, port, and key. These are all readily available in the portal:
[images: Azure portal screenshots showing the host name, ports and access keys]
Normally you wouldn’t post these on the internet, but I’m going to delete the instance before I publish, so; meh. You’ll notice that there are two ports: we only want to use the SSL port. You also want either stunnel, or a client library that can talk SSL; I strongly suggest that the latter is easier! So; Install-Package StackExchange.Redis, and you’re sorted (or Install-Package StackExchange.Redis.StrongName if you are still a casualty of the strong name war). The configuration can be set either as a single configuration string, or via properties on an object model; I’ll use a single string for convenience – and my string is:
mgblogdemo.redis.cache.windows.net,ssl=true,password=LLyZwv8evHgveA8hnS1iFyMnZ1A=

The first part is the host name without a port; the middle part enables ssl, and the final part is either of our keys (the primary in my case, for no particular reason). Note that if no port is specified, StackExchange.Redis will select 6379 if ssl is disabled, and 6380 if ssl is enabled. There is no official convention on this, and 6380 is not an official “ssl redis” port, but: it works. You could also explicitly specify the ssl port (6380) using standard {host}:{port} syntax. With that in place, we can access redis (an overview of the library API is available here; the redis API is on http://redis.io)
var muxer = ConnectionMultiplexer.Connect(configString);
var db = muxer.GetDatabase();
db.KeyDelete("foo");
db.StringIncrement("foo");
db.StringIncrement("foo");
db.StringIncrement("foo");
int i = (int)db.StringGet("foo");
Console.WriteLine(i); // 3

and there we are; readily talking to an Azure Redis Cache instance over SSL.
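For completeness, the exact same connection expressed via the object model rather than the single string (using the same host and key as above):
var options = new ConfigurationOptions
{
    EndPoints = { "mgblogdemo.redis.cache.windows.net:6380" },
    Ssl = true,
    Password = "LLyZwv8evHgveA8hnS1iFyMnZ1A="
};
var muxer = ConnectionMultiplexer.Connect(options);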

Creating a new Redis Labs “redis cloud” instance and configuring the certificates


Another option is Redis Labs; they too have an SSL offering, although it makes some different implementation choices. Fortunately, the same client can connect to both, giving you flexibility. Note: the SSL feature of Redis Labs is not available just through the UI yet, as they are still gauging uptake etc. But it exists and works, and is available upon request; here’s how:

Once you have logged in to Redis Labs, you should immediately have a simple option to create a new redis instance:

[image: Redis Labs option to create a new redis subscription]

Like Azure, a range of different levels is available; I’m using the Free option, purely for demo purposes:

[image: selecting the Free plan]

We’ll keep the config simple:

[image: subscription configuration screen]

and wait for it to provision:

[image: subscription provisioning status]

(note; this only takes a few moments)

Don’t add anything to this DB yet, as it will probably get nuked in a moment! Now we need to contact Redis Labs; the best option here is support@redislabs.com; make sure you tell them who you are, your subscription number (blanked out in the image above), and that you want to try their SSL offering. At some point in that dialogue, a switch gets flipped, or a dial cranked, and the Access Control & Security changes from password:

[image: Access Control &amp; Security showing password authentication]

to SSL; click edit:

[image: Access Control &amp; Security showing SSL, with an edit option]

and now we get many more options, including the option to generate a new client certificate:

[image: SSL options, including generating a new client certificate]

Clicking this button will cause a zip file to be downloaded, which has the keys to the kingdom:

[image: contents of the downloaded zip file]

The pem file is the certificate authority; the crt and key files are the client key. They are not in the most convenient format for .NET code like this, so we need to tweak them a bit; openssl makes this fairly easy:
c:\OpenSSL-Win64\bin\openssl pkcs12 -inkey garantia_user_private.key -in garantia_user.crt -export -out redislabs.pfx

This converts the 2 parts of the user key into a pfx, which .NET is much happier with. The pem can be imported directly by running certmgr.msc (note: if you don’t want to install the CA, there is another option, see below):

[image: importing the pem certificate via certmgr.msc]

Note that it doesn’t appear in any of the pre-defined lists, so you will need to select “All Files (*.*)”:

[image: file-type drop-down showing “All Files (*.*)”]

After the prompts, it should import:

[image: certificate import completed]

So now we have a physical pfx for the client certificate, and the server’s CA is known; we should be good to go!

Connecting to a Redis Labs “redis cloud” instance


Back on the Redis Labs site, select your subscription, and note the Endpoint:

[image: Redis Labs subscription details showing the Endpoint]

We need a little bit more code to connect than we did with Azure, because we need to tell it which certificate to load; the configuration object model has events that mimic the callback methods on the SslStream constructor:
var options = new ConfigurationOptions();
options.EndPoints.Add(
    "pub-redis-16398.us-east-1-3.1.ec2.garantiadata.com:16398");
options.Ssl = true;
options.CertificateSelection += delegate {
    return new System.Security.Cryptography.X509Certificates.X509Certificate2(
    @"C:\redislabs_creds\redislabs.pfx", "");
};

var muxer = ConnectionMultiplexer.Connect(options);
var db = muxer.GetDatabase();
db.KeyDelete("foo");
db.StringIncrement("foo");
db.StringIncrement("foo");
db.StringIncrement("foo");
int i = (int)db.StringGet("foo");
Console.WriteLine(i); // 3

Which is the same smoke test we did for Azure. If you don’t want to import the CA certificate, you could also use the CertificateValidation event to provide custom certificate checks (return true if you trust it, false if you don’t).
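For example, a minimal sketch of that check (the thumbprint shown is a placeholder; compare against whatever your downloaded CA certificate actually reports):
options.CertificateValidation += (sender, certificate, chain, errors) =>
{
    // trust exactly the certificate we expect, rather than the machine's CA store
    var cert = new System.Security.Cryptography.X509Certificates.X509Certificate2(certificate);
    return cert.Thumbprint == "0000000000000000000000000000000000000000"; // placeholder
};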

Way way way tl;dr


Cloud host providers are happy to let you use redis, and happy to provide SSL support so you can do it without being negligent. StackExchange.Redis has hooks to let this work with the two SSL-based providers that I know of.

Thursday, 3 July 2014

Dapper gets type handlers and learns how to read maps

A recurring point of contention in dapper has been that it is a bit limited in terms of the types it handles. If you are passing around strings and integers: great. If you are passing around DataTable – that’s a bit more complicated (although moderate support was added for table valued parameters). If you were passing around an entity framework spatial type: forget it.
Part of the problem here is that we don’t want dapper to take a huge pile of dependencies on external libraries, that most people aren’t using – and often don’t even have installed or readily available. What we needed was a type handler API. So: I added a type handler API! Quite a simple one, really – dapper still deals with most of the nuts and bolts, and to add your own handler all you need to provide is some basic parameter setup.
For example, here’s the code for DbGeographyHandler; the only interesting thing that dapper doesn’t do internally is set the value – but the type-handler can also do other things to configure the ADO.NET parameter (in this case, set the type name). It also needs to convert between the Entity Framework representation of geography and the ADO.NET representation, but that is pretty easy:
public override void SetValue(IDbDataParameter parameter, DbGeography value)
{
    parameter.Value = value == null ? (object)DBNull.Value
        : (object)SqlGeography.Parse(value.AsText());
    if (parameter is SqlParameter)
    {
        ((SqlParameter)parameter).UdtTypeName = "GEOGRAPHY";
    }
}

and… that’s it. All you need to do is register any additional handlers (SqlMapper.AddTypeHandler()) and it will hopefully work. We can now use geography values in parameters without pain – i.e.
conn.Execute("... @geo ...",
    new { id = 123, name = "abc", geo = myGeography });
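
For your own types, the shape is the same: derive from SqlMapper.TypeHandler&lt;T&gt;, implement SetValue and Parse, and register it once at startup (MyCustomType and its FromString method are placeholders here):
public class MyCustomTypeHandler : SqlMapper.TypeHandler&lt;MyCustomType&gt;
{
    public override void SetValue(IDbDataParameter parameter, MyCustomType value)
    {
        // store it however makes sense for the type; as a string, for example
        parameter.Value = value == null ? (object)DBNull.Value : value.ToString();
    }
    public override MyCustomType Parse(object value)
    {
        // convert the raw ADO.NET value back into the custom type
        return MyCustomType.FromString((string)value);
    }
}

// somewhere in application startup:
SqlMapper.AddTypeHandler(new MyCustomTypeHandler());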

Plugin packages

This means that handlers for exotic types with external dependencies can be shipped separately to dapper, meaning we can now add packages like Dapper.EntityFramework, which brings in support for the Entity Framework types. Neat.

Free to a good home: better DataTable support

At the same time, we can also make a custom handler for DataTable, simplifying a lot of code. There is one slight wrinkle, though: if you are using stored procedures, the type name of the custom type is known implicitly, but a lot of people (including us) don't really use stored procedures much: we just use raw command text. In this situation, it is necessary to specify the custom type name along with the parameter. Previously, support for this has been provided via the AsTableValuedParameter() extension method, which created a custom parameter with an optional type name – but dapper now internally registers a custom type handler for DataTable to make this easier. We still might need the type name, though, so dapper adds a separate extension method for this, exploiting the extended-properties feature of DataTable:
DataTable table = ...
table.SetTypeName("MyCustomType");
conn.Execute(someSql, new { id = 123, values = table });

That should make things a bit cleaner! Custom type handlers are welcome and encouraged - please do share them with the community (ideally in ready-to-use packages).

Tuesday, 1 July 2014

SNK and me work out a compromise

and bonus feature: our build server configuration
A few days ago I was bemoaning the issue of strong names and nuget. There isn’t a perfect solution, but I think I have a reasonable compromise now. What I have done is:
  • Copy/pasted the csproj, with the duplicates referencing a strong name key
  • Changed those projects to emit an assembly named StackExchange.Redis.StrongName
  • Copy/pasted the nuspec, with the new spec creating the StackExchange.Redis.StrongName package from the new assemblies
  • PINNED the assembly version number; this will not change unless I am introducing breaking changes, which will also coincide with major/minor version number changes – or maybe also for reasonably significant feature additions; not for every point-release, is the thing
  • Ensured I am using [assembly:AssemblyFileVersion] and [assembly:AssemblyInformationalVersion] to describe the point-release status
This should achieve the broadest reach with the minimum of fuss and maintenance.
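For reference, the pinned-versus-floating split looks something like this in the AssemblyInfo.cs (version numbers purely illustrative):
// pinned: only changes for breaking / significant releases
[assembly: System.Reflection.AssemblyVersion("1.0.0")]

// free to change on every point-release / build
[assembly: System.Reflection.AssemblyFileVersion("1.0.316.0")]
[assembly: System.Reflection.AssemblyInformationalVersion("1.0.316")]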
Since that isn’t enough words for a meaningful blog entry, I thought I’d also talk about the build process we use, and the tooling changes we needed for that. Since we are using TeamCity for the builds, it is pretty easy to double everything without complicating the process – I just added the 2 build steps for the new csproj, and told it about the new nuspec:
[image: TeamCity build steps]
Likewise, TeamCity includes a tool to automatically tweak the assembly metadata before the build happens – the “AssemblyInfo patcher” – which we can use to pin the one value (to change manually based on human decisions), while keeping the others automatic:
[image: TeamCity “AssemblyInfo patcher” build feature settings]
Actually, I’ll probably change that to use a parameter to avoid me having to dig 3 levels down. After that, we can just let it run the build, and out pop the 2 nupkg as artefacts ready for uploading:
[image: build artefacts showing the two nupkg files]
If you choose, you can also use TeamCity to set up an internal nuget feed – ideal for both internal use and dogfooding before you upload to nuget.org. In this case, you don’t need to grab the artefacts manually or add a secondary upload step – you can get TeamCity to publish them by default, making them immediately available in-house:
[image: TeamCity NuGet feed / publish settings]
It isn’t obvious in the UI, but this search is actually restricted to the “SE” feed, which is my Visual Studio alias to the TeamCity nuget server:
[image: Visual Studio package manager searching the “SE” feed]
Other build tools will generally have similar features – simply: this is what works for us.

Thursday, 19 June 2014

SNK, we need to talk…


Because the world needs another rant about SNK and NuGet


In .NET assemblies, strong names are an optional lightweight signing mechanism that provides identity, including versioning support. Part of the idea here is that a calling assembly can have a pretty good idea that what it asks for is what it gets – it asks for Foo.Bar with key {blah}, version “1.2.3”, and it gets exactly that from the GAC, and the world is rosy. If somebody installs a new additional version of Foo.Bar, or a Foo.Bar from a different author, the old code still gets the dll it wanted, happy in the knowledge that the identity is reliable.
There are only a few problems with this:
  • Most applications don’t use the GAC; the only times people generally “choose” to use the GAC is when their list of options had exactly one option: “use the GAC”. Sharepoint and COM+, I’m looking at you and judging you harshly
  • Actually, there’s a second category of this: people who use strong names because that is what their corporate policy says they must do, with some well-meaning but completely misguided and incorrect notion that this provides some kind of security. Strong naming is not a security feature. You are just making work and issues for yourself; seriously
  • It doesn’t actually guarantee the version: binding redirect configuration options (just in an xml file in your application) allow for a different version (with the same key) to be provided
  • It doesn’t actually guarantee the integrity of the dll: if somebody has enough access to your computer that they have access to the GAC, they also have enough access to configure .NET to skip assembly identity checking for that dll (just an “sn –Vr {assembly}” away)
And of course, it introduces a range of problems:
  • Versioning becomes a huge pain in the backside for all downstream callers, who now need to manage the binding redirect configuration every time a dll gets upgraded anywhere (there are some tools that can help with this, but it isn’t perfect)
  • A strong-named assembly can only reference other strong-named assemblies
  • You now have all sorts of key management issues over your key file (despite the fact that it is pointless and can be bypassed, as already mentioned)

 

Assembly management versus package management


Now enter package management, i.e. NuGet and kin (note: I’m only using NuGet as the example here because it is particularly convenient and readily available to .NET developers; other package management tools exist, each with strengths and weaknesses). Here we have a tool that clearly targets the way 95% of the .NET world actually works:
  • no strong name; we could not care less (or for the Americans: we could care less)
  • no GAC: libraries deployed alongside the application for per-application isolation and deployment convenience (this is especially useful for web-farms, where we just want to robocopy the files out)
  • versioning managed by the package management tool
So, as a library author, it is hugely tempting to simply ignore strong naming, and simply put assemblies without a strong name onto NuGet. And that is exactly what I have done with StackExchange.Redis: it has no strong name. And for most people, that is fine. But now I get repeated calls and emails from people saying “please can you strong name it”.
The argument for and against strong-naming in NuGet is very verbose; there are threads with hundreds of messages for and against – both with valid points. There is no simple answer here.
  • if it is strong named, I introduce the problems already mentioned – when for 95% (number totally invented, note) of the people using it, this is simply not an issue
  • if it isn’t strong named, people doing Sharepoint development, or COM+ development, or just with awkward local policies cannot use it – at least not conveniently
They can of course self-sign locally, and there are tools to help with that – including Nivot.StrongNaming. But this only helps for immediate references: any such change will be fatal to indirect references. You can’t use binding redirects to change the identity of an assembly – except for the version.
I could start signing before deployment, but that would be a breaking change. So I’d have to at a minimum do a major version release. Again, direct references will be fine – just update the package and it works – but indirect references are still completely toast, with no way of fixing them except to recompile the intermediate assembly against the new identity. Not ideal.

I’m torn


In some ways, it is tempting to say “screw it, I need to add a strong name so that the tiny number of people bound by strong naming can use it”, but that is also saying “I need to totally and irreconcilably break all indirect references, to add zero functionality and despite the fact that it was working fine”, and also “I actively want to introduce binding redirect problems for users who currently don’t have any issues whatsoever”.
This is not an easy place to be. Frankly, at this stage I’m also not sure I want to be adding implicit support to the problems that SNK introduce by adding a strong name.

But what if…


Imagineering is fun. Let’s be realistic and suppose that there is nothing we can do to get those systems away from strong names, and that we can’t change the strong-named-can-only-reference-strong-named infection. But let’s also assume we don’t want to keep complicating package management by adding them by default. Why can’t we let this all just work. Or at least, that maybe our package management tools could fix it all. It seems to me that we would only need two things:
  • assembly binding redirects that allow unsigned assemblies to be forwarded to signed assemblies
  • some inbuilt well-known publicly available key that the package management tools could use to self-sign assemblies
For example, imagine that you have your signed application, and you use NuGet to add a package reference to Microsoft.Web.RedisSessionStateProvider, which in turn references StackExchange.Redis. In my imaginary world, NuGet would detect that the local project is signed, and these aren’t – so it would self-sign them with the well-known key, and add an assembly-binding redirect from “StackExchange.Redis” to “StackExchange.Redis with key hash and version”. Important note: the well-known key here is not being used to assert any particular authorship etc – it is simply “this is what we got; make it work”. There’s no need to protect the private key of that.
The major wrinkle in this, of course, is that it would require .NET changes to the fusion loader, in order to allow a binding redirect that doesn’t currently exist. But seriously: haven’t we been having this debate for long enough now? Isn’t it time the .NET framework started helping us with this? If I could request a single vNext CLR feature: this would be it.
Because I am so very tired of having this whole conversation, after a decade of it.
There are probably huge holes in my reasoning here, and reasons why it isn’t a simple thing to change. But: this change, or something like it, is so very very overdue.

And for now…


Do I add a strong name to StackExchange.Redis? Local experiments have shown me that whatever I do: somebody gets screwed, and no matter what I do: I’m going to get yelled at by someone. But I’m open to people’s thoughts and suggestions.

Wednesday, 18 June 2014

Dapper : some minor but useful tweaks

At Stack Exchange we pay a lot of attention to our data access, which is why a few years ago we spawned dapper. In case you haven’t seen it, the purpose of dapper is to remove the mind-numbing and easy-to-get-wrong parts of ADO.NET, making it easy to do proper parameterization, without having the overhead of a full ORM. In other words, to make it easy to do this:
DateTime from = ...
int id = ...
int count = ...
var orders = connection.Query<Order>(
    @"select top (@count) * from Orders where CustomerId=@id
    and CreationDate >= @from and Status=@open",
    new { id, from, count, open=OrderStatus.Open }).ToList();

It is unapologetic about expecting you to know and “own” your SQL – entirely good things, IMO (by comparison: most professionals wouldn't program the web tier without knowing what their html looks like).

Inline value injection


With a few exceptions, the sql that you pass in is essentially opaque: historically, the only things it has done are:
  • check the sql to see which possible parameters can definitely be ignored, because they aren’t mentioned (assuming it isn’t a stored procedure, etc)
  • parameter expansion (typically used for in queries, i.e. where id in @ids, where @ids is something like an int[])
Sometimes, however, parameters bite back. There are times when you don’t really want something to be a parameter, because it changes the query so significantly that a separate query plan would be desirable. Equally, you don’t want to hard-code a different command-string per value, as this would force dapper to perform meta-programming per value (since each variant would be a cache miss on the strategy cache).

Alternatively, often the value will never be changed (but is treated as a parameter for code reasons), or will only ever be one value unless some configuration setting is changed, maybe twice a year. Classic values here might be a status code from an enum, or the number of rows (select top 50 sometimes performs quite differently to select top (@rows)).

Because of this, the current builds of dapper introduce value injection. This uses an intentionally different syntax similar to parameters, but injects the value directly into the SQL immediately before execution (but allowing dapper to re-use the strategy). It probably makes more sense with an example:
DateTime from = ...
int id = ...
int count = ...
var orders = connection.Query<Order>(
    @"select top {=count} * from Orders where CustomerId=@id
    and CreationDate >= @from and Status={=open}",
    new { id, from, count, open=OrderStatus.Open }).ToList();

Essentially, anything of the form {=name} will be treated as an inlined value, and will be replaced rather than treated as a parameter. Of course, at this point you’re probably screaming “nooooooo! what about sql injection!” – which is why it restricts usage to integer-based types (including enums). This significantly reduces the risk of any abuse, and of course: you don’t have to use it if you don’t want!

Parameter expansion support for OPTIMIZE FOR query hints


Parameter sniffing can be a real pain. For us, we call this “the Jon Skeet problem”: we have some very different users – some with maybe a handful of posts and a few dozen reputation, and some with tens of thousands of posts and over a half-million reputation (which means: a lot of vote records). Let’s assume we want to keep the user-id as a regular SQL parameter: if the first use of a query is for “new user Fred”, the query plan won’t work well for Jon. If the first use of a query is for Jon, it won’t work well for “new user Fred”. Fortunately, SQL Server has a mechanism to tell it not to get too attached to a query-plan based on a parameter – the OPTIMIZE FOR query hint. This can be left open (applies to all parameters), but it is often desirable to use the variant that tells it exactly which parameters we are concerned about. This is easy normally, but recall that dapper offers parameter expansion. So how can we use this query hint with the in query above? We don't conveniently know the names of the eventual parameters...

Because of this, dapper now recognises this specific pattern, and performs parameter expansion compatible with this hint. If you use:
option (optimize for (@ids unknown))

it will expand this out to the correct query hint for the parameters that @ids become during expansion.
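In usage terms, that means you can write something like this and let dapper worry about what @ids becomes (the Post type and query shape are illustrative):
int[] ids = ...
var posts = connection.Query<Post>(
    @"select * from Posts where Id in @ids
    option (optimize for (@ids unknown))",
    new { ids }).ToList();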

Async all the things


The usage of async keeps growing, so dapper has now evolved much better async methods (with a number of useful contributions from users); most of these are self explanatory – simply using the *Async method names, but some key additions:
  • Support for cancellation tokens
  • Opt-in use of pipelining (performing a range of operations on a connection without waiting for the early operations to complete – this requires MARS to be enabled)
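
For example, cancellation is passed in via the CommandDefinition overloads (variable names here are mine):
int id = ...
CancellationToken cancel = ...
var orders = (await connection.QueryAsync<Order>(
    new CommandDefinition(
        "select * from Orders where CustomerId = @id",
        new { id }, cancellationToken: cancel))).ToList();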

Summary

No huge changes, but hopefully useful to a few people. Enjoy.

Thursday, 17 April 2014

Technical Debt, a case study : tags

 

This is going to be more a discussion / wordy thing; no real code here. This is just to wander through some points, and maybe you’ll find them interesting or thought provoking. You have been warned.

At Stack Exchange, we have a fair understanding of technical debt. Like real debt, technical debt is not by necessity a bad thing – it can allow you to choose an acceptable (but not ideal) solution today, which means you can ship today, but you know that at some point you are going to have to revisit it. Like all loans, technical debt carries interest.

Today I’m talking about tags; specifically, the tags on Stack Exchange questions. These things:

[image: the tag list shown under a Stack Exchange question]

I’m going to run through a lot of back-story, but if you’re lazy, just pretend you read all of it and skip to the last paragraph. Or, heck, go look at cat videos. I won’t tell.

Step 0 : the original problem

Way way back (long before I joined the company), the then-much-smaller team needed a way to store and query tags – in particular a wide range of “{a} and {b} and {c}”, “{d} or {e}”, “{f} but not {y}” etc queries for the “show me the questions in {some tags}” pages, or the search page (which allows a lot of tag-based filtering). Your classic database approach here might be to have a Posts table, and Tags table, and a PostTags table, and do a lot of work on PostTags, but as I understand it this simply didn’t want to perform well. Equally, we access and display questions a lot. No, a real lot. Huge amounts. The one thing that we want to happen really really efficiently is “fetch a question”.

Having that data spread over 3 tables requires complex joins or multiple queries (because you could be fetching 500 questions, which could each have 1-5 tags), and then processing the more complicated data. Because of this, we store the tags as a single character-data field in the database – all the tags in one string. This makes it possible to get the post including all the tags in a single row query, and just worry about deciphering it at the UI. Neat. But it doesn’t really allow for efficient query.

Aside: at this point I should also note that we additionally store the data in a PostTags table (although I can’t remember whether this was the case at the time) – the inline vs normalized data is kept in sync and used for different purposes; the things you do for performance, eh?

Step 1 : the evil/awesome hack

So, we’ve got inline tag data that is simple to display, but is virtually impossible to query. Regular indexing doesn’t really work well at finding matches in the middle of character data. Enter (trumpets) SQL Server Full-Text Search. This is inbuilt to SQL Server (which we were already using), and allows all kinds of complex matching to be done using CONTAINS, FREETEXT, CONTAINSTABLE and FREETEXTTABLE. But there were some problems: stop words and non-word characters (think “c#”, “c++”, etc). For the tags that weren’t stop words and didn’t involve symbols, it worked great. So how to convince Full Text Search to work with these others? Answer: cheat, lie and fake it. At the time, we only allowed ASCII alpha-numerics and a few reserved special characters (+, –, ., #), so it was possible to hijack some non-English characters to replace these 4, and the problem of stop words could be solved by wrapping each tag in a pair of other characters. It looked like gibberish, but we were asking Full Text Search for exact matches only, so frankly it didn’t matter. A set of tags like “.net c#” thus became “éûnetà écñà”. Obviously! This worked; it performed well (remember, Stack Overflow was still very young here), and allowed the team to ship. Shipping is a pretty important feature! The team knew it was a fudge, but all was well (enough) in the world…

Step 2 : the creaks begin

Stack Overflow was a hit. Over time, the question count grows steadily larger, as do the number of page requests. Eventually it became apparent that our flagrant abuse of Full Text Search was becoming a performance bottleneck. To be fair, Full Text Search wasn’t really intended to be used in this way, and we were using it at an alarming rate (even after caching). But by now the team had a viable product, more developers, and enough information to judge that more effort was both necessary and worthwhile. At different points, then, our search efforts were re-written to use Lucene.Net and then Elasticsearch, and the “list by tag” work grew a bespoke entity that we call the “tag engine” (which is actually what this previous post is about) – essentially an out-of-process index service. Nowhere near as fully-featured as (say) Lucene.Net, but it only needs to do one thing.

Step 3 : Stack Exchange has sites about language

In time, we had requests for sites that were still primarily in English, but were about languages; French, Japanese, Russian, etc. A lot of their tags were in English, but there was a need to remove our “ASCII alpha-numerics” restriction. Fortunately, since we had Elasticsearch and the tag-engine, this mainly meant changing the few remaining Full Text Search usages (things that  were used rarely, and hadn’t been worth fixing) to use alternatives. The format in the database, however, remained. And since we didn’t want to introduce even more bizarre rules, we kept the 6 reserved characters. Sorry, French.StackExchange – no “é” or “à” in your tags. In reality this was less of a problem than it sounds, as French.StackExchange elects to use accent-free tags – while sites like Russian.StackExchange could use the tags they wanted. And the world keeps turning.

Step 4 : Stack Exchange goes fully multi-lingual (well, ok, bilingual)

Enter Portuguese; in pt.stackoverflow.com, we have our first site that is completely in another language. No weasel room left now: they want tags like this:

[image: Portuguese tags containing accented characters]

And guess what: ç was one of our reserved tokens (representing +, so “c++” was stored as “écççà”). We couldn’t even whistle a happy tune and hope nobody would notice the gaps: they found it on day 1 of closed beta. Maybe day 2. We finally had a reason to remove this legacy from the past. But – and I cannot emphasize this enough: these changes were scary. So scary that we didn’t want to do a “update all the things” until we’d had chance to “bed it in” on pt.StackOverflow, and fix any glitches that we’d missed. As it happened, all the code was making use of a single “decipher the tags” method, so it was sensible and pragmatic to simply make that understand both the old format and whatever we came up with now. After some thought and consideration, we settled on a pipe (bar) delimited natural representation, with leading/trailing pipes, so “.net c#” becomes simply “|.net|c#|”. This has virtues:

  • very simple to parse
  • we can tell immediately from the first character which format it is
  • bulk update and removal of tags can be done with a simple replace (including the pipes, to avoid replacing mid-tag matches)
  • and unlike the old format, you don’t need to worry about special-casing the first/last tag when doing a bulk operation (trailing spaces, etc)

Sure, there’s still a reserved |, but we can live with that. We could have used a non-printing character, but that would be very awkward in things like JSON – lots of risk of subtle bugs.
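As a sketch of the dual-format handling described above (illustrative only – not the actual Stack Exchange code; DecodeLegacyTag stands in for the old “unscramble the mess” logic):
static string[] ParseTags(string raw)
{
    if (string.IsNullOrEmpty(raw)) return new string[0];

    if (raw[0] == '|')
    {
        // new format: "|.net|c#|" - the first character tells us which format we have
        return raw.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries);
    }

    // old format: space-delimited, each tag wrapped and with +, -, ., # substituted
    string[] parts = raw.Split(' ');
    for (int i = 0; i < parts.Length; i++) parts[i] = DecodeLegacyTag(parts[i]);
    return parts;
}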

Once this had been working on pt.StackOverflow for a while, we flipped a switch and all new write operations switched to the new format; all “language” sites could have free access to the 6 previously reserved characters. Hoorah!

Step 5 : the backfill

When we did that, we only affected new writes. There was still data using the old format. A lot of data (not just the questions themselves: all tag edits, for example, were stored in this way). But while it remained, our “bulk remove a tag” code had to cope with two formats, which is an unnecessary maintenance overhead. So finally this week we absorbed the pain to do a batched migration of all the old data to the new format. Fairly routine, if a little scary.

And happy dance; the hack is no more!

So why are you telling me all this? What is the point?

Our job as engineers is not always to write the best possible thing that we can, and that can solve every problem ever, and is beautiful code that makes us want to adopt it and take it on picnics. An important part of our job is to:

  • understand what the code actually and genuinely needs to be able to do today and tomorrow
  • think of a way of delivering that with the available resources
  • think a little about what might be needed in the future, acknowledging that this is only a best guess
  • make sure that it is not impossible (or prohibitively expensive) to change things later
At the time the original hack was put in, it was absolutely the right choice. A few lines of code, a few clicks in SQL Server, job done. Shipped. Working. It would not have been sensible to invest time getting an external indexing service working just for this. But there was always the possibility to change the implementation, and the “unscramble the mess” code was in one place. Sure: there was debt, but the level of pain was kept to a minimum, and has finally been paid back in full.

    Friday, 28 March 2014

    Windows Redis-64

    As you know, I’m a huge fan of Redis. This week, a team at the Microsoft Open Tech group released a new drop of redis-64 for windows (note: all links there are old – github is fresher), with binaries available on both NuGet and chocolatey. It is even marked by them as “production ready”. The current drop is 2.8.4 in Redis terms (the last public drop was 2.6.12).

    So congratulations, thanks and kudos to the team involved: an important milestone.

    In production, at Stack Exchange we use CentOS to host Redis, so I’m simply not in a position to comment on how well it works in production, but for windows developers using Redis it is really handy having convenient access to a Redis server without having to spin up a VM.

    One really important thing to watch

    From reading the "Redis Release Notes.docx" and "redis.windows.conf" file in the packages/Redis-64.2.8.4 folder, the approach they used to solve the fork problem was to introduce a shared, memory-mapped file, and it defaults to the size of your physical memory (and that is before it has forked, when it can grow due to copy-on-write). So whether you are using it as a production server or a dev tool, you might want to pay particular attention to the “maxheap” setting in your .conf file. For my local dev work I usually spin up 5 servers, and I currently have 24GB physical memory in my machine – so by default, when running 5 completely empty databases, it instantly chewed up 120GB of hard disk space (you get it back when the server exits). If you don’t have enough disk space: the server won’t start. You'll probably see:

    QForkMasterInit: system error caught. error code=0x000005af, message=VirtualAllocEx failed.

    Fortunately it is simple to configure a memory bound:

    maxheap 4gb # or whatever

    Please note – this really isn’t a criticism; I understand the “why” – they need the shared memory-mapped file for the pseudo-fork, and they need to allocate the “maxheap” amount from the outset because IIRC you can’t grow such a map very flexibly. My intent is merely to flag it in flashing colours that if you’re playing with Redis-64, you want to think about your memory vs disk.

    Now go, play, have some Redis fun.

    Tuesday, 18 March 2014

    So I went and wrote another Redis client…

    …aka: Introducing StackExchange.Redis (nuget | github)

    The observant out there are probably thinking “Wut?” about now, after all didn’t I write this one? Yes, yes I did. This one too, although that was just a mechanism to illustrate the C# dynamic API. So you would be perfectly justified in thinking that I had finally lost the plot and meandered into crazy-town. Actually, you’d have been reasonably justified in thinking that already. So let me take a step back and explain….

    Why? Just… why?

    Firstly, BookSleeve has served us well: it has got the job done with ever increasing load through the network. If it is possible to salute code as a trusted friend, then BookSleeve: I salute you. But: some things had to change. There were a number of problems and design decisions that were conspiring against me, including:

    • Robustness: BookSleeve acts as a wrapper around a single connection to a single endpoint; while it is up: great! But occasionally sockets die, and it was incredibly hard to do anything useful, either to investigate or to recover. In our internal code we had a whole second library that existed mainly to hide this wrinkle (and for things like emergency slave fallback), but even then it wasn’t perfect and we were getting increasing problems with network issues causing downstream impact.
    • Ability to identify performance issues: BookSleeve has only minimal instrumentation – it wasn’t a first class feature, and it showed. Again; fine when everything works great, but if you hit any load issues, there was virtually nothing you could do to resolve them.
    • Single-server: in common with “Robustness” above, the single-endpoint nature of BookSleeve means that it isn’t in a great position if you want to talk to multiple nodes; this could be to fully exploit the available masters and slave nodes, or thinking ahead could be in consideration of “redis cluster” (currently in beta) – and simply wrapping multiple RedisConnection instances doesn’t play nicely in terms of efficient network API usage
    • The socket query API: again, partly tied up to the above multi-server concerns, but also tied up to things like the thread pool and IO pool (the issues described there applies equally to the async network read API, not just the thread-pool)

    And those are just the immediate problems.

    There were some other longstanding glitches in the API that deserved some love, but didn’t justify huge work by themselves – things like the fact that only string keys were supported (some folks make use of binary keys), and things like constantly having to specify the database number (rather than abstracting over that) were troublesome. Then there was the necessity of involving the TPL when you didn’t actually want to be truly “async” (forcing you to use sync-over-async, an ugly anti-pattern), and the fact that the names didn’t follow the correct async convention (although I can’t honestly remember whether the current conventions existed at the time).

    Looking ahead, “redis cluster” as mentioned above introduced a range of concerns; it probably would have been possible to wrap multiple connections in a super connection, but the existing implementations didn’t really make that feasible without significant work. Also, when looking at “redis cluster”, it is critically important that you know at every point whether a particular value represents a key versus a value: keys impact how commands are targeted for sharding (and impact whether a multi-key operation is even possible); values do not – and the distinction between them had been lost, which would basically need an operation-by-operation review of the entire codebase.

    In short, to address any of:

    • our immediate internal needs
    • the community needs that weren’t internal priorities
    • any future “redis cluster” needs
    • providing a generally usable platform going forward

    was going to require significant work, and would have by necessity involved significant breaking API changes. If I had reworked the existing code, not only would it have shattered the old API, but it would have meant compromise both for users of the old code and users of the new. And that simply wasn’t acceptable. So: I drew a line, and went for a clean break.

    So what does this mean for users?

    Firstly, if you are using BookSleeve, feel free to continue using it; I won’t delete the files or anything silly like that – but: my main development focus going forward is going to be in StackExchange.Redis, not BookSleeve. The API is basically similar – the work to migrate between them is not huge, but first – why would you? How about:

    • Full multi-server connection abstraction including automatic reconnect and fallback (so read operations continue on a slave if the master is unavailable), and automatic pub/sub resubscription
    • Ability to express preferences to target work at slaves, or demand a certain operation happens on a slave, etc – trivially
    • Full support for “redis cluster”
    • Completely reworked network layer, designed to avoid stalls due to worker starvation and to scale efficiently to multiple connections, while also reducing overheads and moving steps like text encode/decode to the caller/consumer rather than the network layer
    • Full support for both binary and string keys, while reducing (not increasing) the methods necessary (you no longer need to tell it which you want in advance)
    • Observes TPL guidance: no more sync-over-async (there is a full sync API that does not involve the TPL), and the TPL/async methods are now named appropriately
    • Instrumentation as a design feature
    • And lots more…
    • … heck, when I get a moment I might also throw our 2-tier cache (local in-memory cache with a shared redis central cache, including pub/sub-based cache expiry notification etc) down into the client library too
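
    To give a flavour of what that looks like in code, here’s a minimal usage sketch (the server names, keys, channel and values below are just placeholders – the project site has the real introduction):

    using System;
    using System.Threading.Tasks;
    using StackExchange.Redis;

    static class RedisSketch
    {
        static async Task RunAsync()
        {
            // one multiplexer per application; it owns the connections to all
            // configured endpoints and handles reconnects internally
            var muxer = ConnectionMultiplexer.Connect("server1:6379,server2:6379");
            var db = muxer.GetDatabase();

            // synchronous API – no TPL, no sync-over-async
            db.StringSet("some-key", "some-value");

            // express a preference to read from a slave, if one is available
            string fromSlave = db.StringGet("some-key", CommandFlags.PreferSlave);

            // async API, named per the *Async convention
            string viaAsync = await db.StringGetAsync("some-key");

            // pub/sub goes through the same multiplexer, and is resubscribed
            // automatically after a reconnect
            var sub = muxer.GetSubscriber();
            sub.Subscribe("some-channel", (channel, message) => Console.WriteLine((string)message));
        }
    }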

    Is it ready?

    Let’s consider it a “late beta” (edit: fully released now); on the Q&A sites we have now replaced all of our BookSleeve code with StackExchange.Redis code, which means that hopefully we’ve already stubbed our toes on all the big bugs. I don’t plan on any breaking changes to the API (and will try hard to avoid them). Lua script support is not yet implemented (edit: it is now), and “redis cluster” isn’t yet released, so support for it is still a work in progress – but basically: it works, and works fine.

    A full introduction and example basic usage is shown on the project site; please do take a look, and let me know if I’ve moved too much cheese!

    Thursday, 13 March 2014

    Beware jamming the thread pool

     

    I’ve been struggling over the last few days to diagnose and eradicate a fun little bug in the cache tier of Stack Overflow / Stack Exchange; for context, we use redis extensively as a shared cache (and for some other things – including the realtime updates via web-sockets, which rely heavily on redis pub/sub) – and have this week deployed a new implementation of our redis communications stack. Brain-dead bugs aside (and I really did manage a howler, for which I apologise: sorry), we got it in and stable: it would be working just fine, processing a few million messages without issue, and then out of the blue… WHAM! 10 thousand timeouts in a second, and when you immediately go to look, everything is happy again, merrily churning through load as though you had imagined things. Local load testing failed to reproduce this issue.

    As always, I got bitten by the rule: interesting problems only happen at scale.

    ONE DOES NOT SIMPLY ... DO ANYTHING AT STACKEXCHANGE SCALE

    By which, I don’t mean to imply that we’re the biggest site out there, or that we’re doing anything especially clever (quite the opposite in my case, ahem), but we take great pride in the fact that we run a very busy site on very small numbers of servers. We accidentally found out on Tuesday (again, my bad) that we can still run the entire Stack Exchange Network on two web-servers. Not quite as well as normal, but it worked. edit - proof (courtesy of @Nick_Craver):

    The Stack Exchange Network running on two servers

    But unless you have a very expensive test lab and dedicated load-test team, it is really quite hard to simulate realistic load for the network.

    Enough rambling; what went wrong?

    I got hit by the thread-pool. You see, like BookSleeve (which I’ll probably talk about in the next blog), the new client is an async-enabled multiplexer. As outbound messages for an endpoint come in, we dispatch them to a queue (you need to serialize work on a socket, and know which requests marry to which responses), and if we don’t already have a writer for that queue, we request one. The problem here was: I requested it from the thread-pool. Now, the thread-pool is very, very clever – it has lots of smarts in there to handle automatic growth and shrinking, etc. It works perfectly well under sane conditions. But I asked too much of it: asp.net will already be chewing through workers for requests, and we don’t currently use much async code – our requests are processed pretty much synchronously. This means that during a storm (and only then) we were essentially getting into a deadlock condition:

    • asp.net requests were using lots of workers
    • which were blocked waiting on a response from redis
    • which hadn’t been sent the request yet, because no thread-pool thread could be allocated to write the queue

    Fortunately, this is self-curing; eventually either:

    • the blocking requests will timeout (we always specify timeouts, and pretty short ones), allowing the requests to complete and making more writers available
    • the thread-pool will grow and allocate a writer (although to be honest, when competing with constant inbound asp.net requests, it is dubious whether the writer would get there fast enough)

    That is why, when you looked at the system a second after the trouble, it was all healthy again – the timeouts had happened, releasing enough workers back to the pool to service the queue.

    Sigh; all fun.

    The moral of the story

    See, I do have morals! Don’t use the thread-pool for time-critical operations if there’s a good chance you’ll already be stressing the thread-pool. The fix was pretty simple once we understood the problem: we now retain a dedicated writer thread per multiplexer (note: not per socket). We do reserve the right to seek help from the thread-pool if there is a backlog, but frankly that is pretty rare – the dedicated writer is usually more than able to keep up (in local testing, I’ve seen the multiplexer servicing around 400k ops/s – far above what most people need).
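
    A rough sketch of the shape of that fix – my own illustration of the idea, not the actual client code (the type and member names are invented):

    using System;
    using System.Collections.Concurrent;
    using System.Threading;

    class OutboundWriter
    {
        private readonly BlockingCollection<byte[]> outbound = new BlockingCollection<byte[]>();
        private readonly Thread writer;

        public OutboundWriter()
        {
            // previously the writer was requested from the thread-pool per flush,
            // which is exactly what starves during a request storm; instead,
            // keep one long-lived, dedicated writer thread per multiplexer
            writer = new Thread(WriteLoop);
            writer.IsBackground = true;
            writer.Name = "redis-writer";
            writer.Start();
        }

        public void Enqueue(byte[] message)
        {
            outbound.Add(message); // callers never wait on the thread-pool to get a writer
        }

        private void WriteLoop()
        {
            // the dedicated thread drains the outbound queue (per multiplexer, not per socket)
            foreach (var message in outbound.GetConsumingEnumerable())
            {
                // write the message to the appropriate socket here
            }
        }
    }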

    Next time: announcing a new redis client for .NET! (also: reasons why, and what about BookSleeve?)

    Saturday, 8 March 2014

    Cleaner code: using partial methods for debugging utilities

    Often in a complex code base, there are additional bits of work you need to do to help with debugging. The easy but hard-to-maintain way to do this is with #if:

    obj.Wow();
    #if DEBUG
    // only want the overhead of checking this when debugging...
    Very("abc");
    #endif
    Such();

    This gets the job done, but can lead to a whole heap of problems:



    • general ugliness – especially when you have lots of different #if pieces all over the file
    • hard to track what uses each method (many code tools won’t look inside inactive conditional compilation blocks)
    • problems with automated tools, such as using directive removal or code organization tools, which get very confused

    We can do better!


    Partial classes were added to C# a long time ago – with one of their main aims being to provide extension points for generated code. You can split a logical class into multiple files, as long as you use the partial modifier in each file; the contents of the files get merged by the compiler. So what about partial methods? These are like regular methods, but with the odd property that if you never provide an implementation, they completely evaporate – as in, the compiler removes all mention of them, including the call sites. You declare a partial method like so:


    partial void OnSomethingAwesome(int someArg);


    (noting that they follow the same rules as [Conditional(…)] methods: you cannot specify a return value or access modifier, and the parameters cannot be declared out (although they can be ref) – again, this is because the compiler might be removing all trace of them, and we can’t do anything that would upset “definite assignment”). Then elsewhere in the code you can use that method like normal:


    Write("tralala");
    OnSomethingAwesome(123);
    return true;

    That looks convincing, right? Except: until we write the body of OnSomethingAwesome, it does not exist. At all. For example, in another file somewhere we could add:


    partial class TheSameClass // and the same namespace!
    {
        partial void OnSomethingAwesome(int number)
        {
            Trace(number);
        }
    }

    And only now does our code do anything useful. It is also important to note that, just like [Conditional(…)] methods, the evaluation of the arguments and target is also removed, so you need to be a little careful… there is a huge difference between:


    OnSomethingAwesome(SomethingCritical());

    and


    var value = SomethingCritical();
    OnSomethingAwesome(value);

    In reality, this is rarely an actual issue.


    Applying this to debugging operations


    Hopefully, it should be obvious that we can use this to move all of our debugging operations out of the main code file – perhaps into a TheSameClass.debugging.cs file (or whatever you want to call it) – which can then legitimately have a single #if conditional region. So to take our earlier example:


    obj.Wow();
    OnVery("abc");
    Such();

    with (elsewhere):


    partial void OnVery(string much);
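
    and the body – the part that only exists in debug builds – then sits in that debugging file, inside its single #if region; a quick sketch (the file name and the Trace call are just placeholders):

    // TheSameClass.debugging.cs
    #if DEBUG
    partial class TheSameClass
    {
        partial void OnVery(string much)
        {
            Trace(much);
        }
    }
    #endif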

    How about interfaces?


    EDIT: It turns out I was completely wrong here; partial interface works fine - my mistake. I'm leaving this here as a mark of my public shame, but: ignore me. You can use interfaces much like the above.

    There is no such thing as a partial interface, but what you can do is declare a separate debug-only interface, and then extend the type in a partial class:


    partial class TheSameClass : IMagicDebugger
    {
        void IMagicDebugger.So() { /* ... */ }
    }

    Real world example


    Here’s something from some real code, where during debugging and testing I need to keep additional counters that are not needed in the production code:


    #if DEBUG
    partial class ResultBox
    {
        internal static long allocations;

        public static long GetAllocationCount()
        {
            return Interlocked.Read(ref allocations);
        }

        static partial void OnAllocated()
        {
            Interlocked.Increment(ref allocations);
        }
    }
    #endif

    The intent should be pretty obvious from the context, but note that here everything to do with this debug-only feature is now neatly packaged together.
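
    For completeness, the other half – the defining declaration and its call site in the main ResultBox file – is compiled unconditionally; something like this sketch (the Allocate method here is invented for illustration, it is not the real code):

    partial class ResultBox
    {
        static partial void OnAllocated();

        internal static ResultBox Allocate()
        {
            OnAllocated(); // the call evaporates entirely in release builds
            return new ResultBox();
        }
    }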


    Enjoy.

    Monday, 3 March 2014

    Be strict with me, dear CLI

    There is an old adage in code:

    Be liberal in what you accept, and conservative in what you send

    And right now, this adage can [obscenity]. The week before last, my main gripe was with broken browser implementations (not behind proxy servers) that clearly didn’t read RFC 6455 (aka WebSocket) – or at least, they read it enough to know to send a Sec-WebSocket-Key, but not enough to understand client-to-server masking, or even simply to add a Connection or Upgrade header. But that’s not what I’m moaning about today; last week, my gripe was .NET. Specifically, IL emit.

    (wibbly-wobbly screen-fade going to backstory)

    A good number of the tools I work on involve metaprogramming, some of them to quite a scary degree. Which means lots of keeping track of the state of the stack as you bounce around the IL you are writing. And to be fair, yes, it is my responsibility to make sure that the IL I emit is meaningful… I just wish that the CLI was more consistent in terms of what it will allow.

    You see, there’s an interesting rule in the CLI specification:

    In particular, if that single-pass analysis arrives at an instruction, call it location X, that immediately follows an unconditional branch, and where X is not the target of an earlier branch instruction, then the state of the evaluation stack at X, clearly, cannot be derived from existing information. In this case, the CLI demands that the evaluation stack at X be empty.

    The scenario this describes is actually very common – for example in a while loop (which is also the core of foreach), the most common way of setting those out is:

    • (preceding code)
    • unconditional-branch: {condition test}
    • label: {the actual code}
    • (the actual code)
    • label: {condition test}
    • (condition test)
    • branch-if-true: {the actual code}
    • (subsequent code)

    The important point is that “{the actual code}” meets this special case; it is precisely the X mentioned in the specification: it immediately follows an unconditional branch, and is not the target of an earlier branch instruction (it is, however, the target of a later branch instruction). This means that to be valid, the stack (relative to that method) must be empty.

    This would actually be easy enough to smoke-test… just set up some simple IL that forces this condition, press the button, and wait for the CLI to complain about it. Here’s the C# for that, on pastie.org. The only problem is that it runs without complaining. Well, maybe DynamicMethod is a special case… we’ll try a full assembly instead. And again: it works without the slightest complaint. To get it to notice, we need to write it to disk (assembly.Save("Broken.dll");) and then run peverify Broken.dll, which finally gives us the complaint we wanted:

    [IL]: Error: [Broken.dll : BrokenType::BrokenMethod][offset 0x00000002] Stack height at all points must be determinable in a single forward scan of IL.
    1 Error(s) Verifying Broken.dll
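
    For reference, here’s my own minimal reconstruction of that while-loop shape using DynamicMethod (this is not the pastie.org snippet, and exactly how tolerant the in-memory path is can vary with framework and trust settings, as noted below):

    using System;
    using System.Reflection.Emit;

    static class BrokenIl
    {
        static void Main()
        {
            var method = new DynamicMethod("BrokenMethod", typeof(int), Type.EmptyTypes);
            var il = method.GetILGenerator();
            var body = il.DefineLabel();
            var check = il.DefineLabel();

            il.Emit(OpCodes.Ldc_I4_1);     // leave a value on the stack...
            il.Emit(OpCodes.Br, check);    // ...then branch unconditionally to the condition test
            il.MarkLabel(body);            // "X": follows the br, only targeted by a *later* branch
            il.Emit(OpCodes.Ldc_I4_1);
            il.Emit(OpCodes.Add);          // assumes the earlier value is still on the stack
            il.MarkLabel(check);
            il.Emit(OpCodes.Ldc_I4_0);     // "condition": always false here, so the body never runs
            il.Emit(OpCodes.Brtrue, body);
            il.Emit(OpCodes.Ret);

            var func = (Func<int>)method.CreateDelegate(typeof(Func<int>));
            Console.WriteLine(func());     // the runtime happily executes this; it is peverify
                                           // (against a saved assembly) that objects
        }
    }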

    You might think I’m being fussy… I mean, if the CLI runs it anyway then what is the problem? A fair question, but the truth is more complicated. When the CLI is loading an assembly from disk it is often more strict. This depends on a lot of things, including the framework version, the framework platform, and the trust settings.

    Oh for a simple “strict mode” for running in-memory dynamic assemblies: that would make it so much easier for us long-suffering metaprogrammers. And I’m not alone here: a quick google search shows this issue has bitten mono, ikvm, mono-cecil, and many others.

    A silver lining…

    The good news is that the excellent Sigil utility (by my colleague Kevin Montrose) now has support for this, via the strictBranchVerification flag. It also makes the IL emit a lot easier to grok – here’s the same example via Sigil. Sadly, I can’t use it in my particular scenario, unless I create a custom Sigil build that uses IKVM for cross-platform emit, but for most people this should help a lot.