Thursday, 19 June 2014

SNK, we need to talk…


Because the world needs another rant about SNK and NuGet


In .NET assemblies, strong names are an optional lightweight signing mechanism that provides identity, including versioning support. Part of the idea here is that a calling assembly can have a pretty good idea that what it asks for is what it gets – it asks for Foo.Bar with key {blah}, version “1.2.3”, and it gets exactly that from the GAC, and the world is rosy. If somebody installs a new additional version of Foo.Bar, or a Foo.Bar from a different author, the old code still gets the dll it wanted, happy in the knowledge that the identity is reliable.
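To make the “identity” part concrete, here is a minimal sketch of resolving such an identity explicitly – the name, version and public key token are all invented purely for illustration:
// A strong-named identity is: simple name + version + culture + public key token
// (the token is derived from the public half of the signing key; this one is made up).
var asm = System.Reflection.Assembly.Load(
    "Foo.Bar, Version=1.2.3.0, Culture=neutral, PublicKeyToken=0123456789abcdef");
// Callers compiled against this identity expect exactly this back; a Foo.Bar signed
// with a different key is, as far as the loader is concerned, a different assembly.
System.Console.WriteLine(asm.FullName);
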
There are only a few problems with this:
  • Most applications don’t use the GAC; the only times people generally “choose” to use the GAC is when their list of options has exactly one option: “use the GAC”. SharePoint and COM+, I’m looking at you and judging you harshly
  • Actually, there’s a second category of this: people who use strong names because that is what their corporate policy says they must do, with some well-meaning but completely misguided and incorrect notion that this provides some kind of security. Strong naming is not a security feature. You are just making work and issues for yourself; seriously
  • It doesn’t actually guarantee the version: binding redirect configuration options (just in an xml file in your application) allow for a different version (with the same key) to be provided
  • It doesn’t actually guarantee the integrity of the dll: if somebody has enough access to your computer that they have access to the GAC, they also have enough access to configure .NET to skip assembly identity checking for that dll (just an “sn -Vr {assembly}” away)
And of course, it introduces a range of problems:
  • Versioning becomes a huge pain in the backside for all downstream callers, who now need to manage the binding redirect configuration every time a dll gets upgraded anywhere (there are some tools that can help with this, but it isn’t perfect)
  • A strong-named assembly can only reference other strong-named assemblies
  • You now have all sorts of key management issues over your key file (despite the fact that it is pointless and can be bypassed, as already mentioned)

 

Assembly management versus package management


Now enter package management, i.e. NuGet and kin (note: I’m only using NuGet as the example here because it is particularly convenient and readily available to .NET developers; other package management tools exist, each with strengths and weaknesses). Here we have a tool that clearly targets the way 95% of the .NET world actually works:
  • no strong name; we could not care less (or for the Americans: we could care less)
  • no GAC: libraries deployed alongside the application for per-application isolation and deployment convenience (this is especially useful for web-farms, where we just want to robocopy the files out)
  • versioning managed by the package management tool
So, as a library author, it is hugely tempting to simply ignore strong naming, and just put assemblies without a strong name onto NuGet. And that is exactly what I have done with StackExchange.Redis: it has no strong name. And for most people, that is fine. But now I get repeated calls and emails from people saying “please can you strong name it”.
The debate for and against strong-naming in NuGet is a verbose one; there are threads with hundreds of messages on each side – both with valid points. There is no simple answer here.
  • if it is strong named, I introduce the problems already mentioned – when for 95% (number totally invented, note) of the people using it, this is simply not an issue
  • if it isn’t strong named, people doing SharePoint development, or COM+ development, or just with awkward local policies cannot use it – at least not conveniently
They can of course self-sign locally, and there are tools to help with that – including Nivot.StrongNaming. But this only helps for immediate references: any such change will be fatal to indirect references. You can’t use binding redirects to change the identity of an assembly – except for the version.
I could start signing before deployment, but that would be a breaking change. So I’d have to at a minimum do a major version release. Again, direct references will be fine – just update the package and it works – but indirect references are still completely toast, with no way of fixing them except to recompile the intermediate assembly against the new identity. Not ideal.
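One way to see why indirect references are so fragile: the exact identity of every referenced assembly – including whether it had a public key token at all – is baked into the referencing dll at compile time. A rough sketch (SomeIntermediateType here is just a stand-in for any type from the intermediate library):
// Each entry records the identity that the intermediate library was compiled against;
// for an unsigned StackExchange.Redis that means PublicKeyToken=null, which a
// newly-signed build can never satisfy without recompiling the intermediate dll.
foreach (var reference in typeof(SomeIntermediateType).Assembly.GetReferencedAssemblies())
{
    System.Console.WriteLine(reference.FullName);
}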

I’m torn


In some ways, it is tempting to say “screw it, I need to add a strong name so that the tiny number of people bound by strong naming can use it”, but that is also saying “I need to totally and irreconcilably break all indirect references, to add zero functionality and despite the fact that it was working fine”, and also “I actively want to introduce binding redirect problems for users who currently don’t have any issues whatsoever”.
This is not an easy place to be. Frankly, at this stage I’m also not sure I want to be adding implicit support to the problems that SNKs introduce by adding a strong name.

But what if…


Imagineering is fun. Let’s be realistic and suppose that there is nothing we can do to get those systems away from strong names, and that we can’t change the strong-named-can-only-reference-strong-named infection. But let’s also assume we don’t want to keep complicating package management by adding strong names by default. Why can’t we let this all just work? Or at least: why couldn’t our package management tools fix it all for us? It seems to me that we would only need two things:
  • assembly binding redirects that allow unsigned assemblies to be forwarded to signed assemblies
  • some inbuilt well-known publicly available key that the package management tools could use to self-sign assemblies
For example, imagine that you have your signed application, and you use NuGet to add a package reference to Microsoft.Web.RedisSessionStateProvider, which in turn references StackExchange.Redis. In my imaginary world, NuGet would detect that the local project is signed and these packages aren’t – so it would self-sign them with the well-known key, and add an assembly-binding redirect from “StackExchange.Redis” to “StackExchange.Redis with key hash and version”. Important note: the well-known key here is not being used to assert any particular authorship etc – it is simply “this is what we got; make it work”. There is no need to keep the private half of that key secret.
The major wrinkle in this, of course, is that it would require .NET changes to the fusion loader, in order to allow a binding redirect that doesn’t currently exist. But seriously: haven’t we been having this debate for long enough now? Isn’t it time the .NET framework started helping us with this? If I could request a single vNext CLR feature: this would be it.
Because I am so very tired of having this whole conversation, after a decade of it.
There are probably huge holes in my reasoning here, and reasons why it isn’t a simple thing to change. But: this change, or something like it, is so very very overdue.

And for now…


Do I add a strong name to StackExchange.Redis? Local experiments have shown me that whatever I do: somebody gets screwed, and no matter what I do: I’m going to get yelled at by someone. But I’m open to people’s thoughts and suggestions.

Wednesday, 18 June 2014

Dapper : some minor but useful tweaks

At Stack Exchange we pay a lot of attention to our data access, which is why a few years ago we spawned dapper. In case you haven’t seen it, the purpose of dapper is to remove the mind-numbing and easy-to-get-wrong parts of ADO.NET, making it easy to do proper parameterization, without taking on the overhead of a full ORM. In other words, to make it easy to do this:
DateTime from = ...
int id = ...
int count = ...
var orders = connection.Query<Order>(
    @"select top (@count) * from Orders where CustomerId=@id
    and CreationDate >= @from and Status=@open",
    new { id, from, count, open=OrderStatus.Open }).ToList();

It is unapologetic about expecting you to know and “own” your SQL – which is entirely a good thing, IMO (by comparison: most professionals wouldn’t program the web tier without knowing what their html looks like).

Inline value injection


With a few exceptions, the SQL that you pass in is essentially opaque to dapper; the only things it has historically done with it are:
  • check the sql to see which possible parameters can definitely be ignored, because they aren’t mentioned (assuming it isn’t a stored procedure, etc)
  • parameter expansion (typically used for in queries, i.e. where id in @ids, where @ids is something like an int[]; a quick sketch follows this list)
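For anyone who hasn’t seen that expansion in action, a quick sketch (re-using the Order type and connection from above; column names are illustrative):
int[] ids = { 1, 2, 3 };
// dapper rewrites "in @ids" to "in (@ids1, @ids2, @ids3)" and adds one parameter per value
var orders = connection.Query<Order>(
    "select * from Orders where Id in @ids",
    new { ids }).ToList();
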
Sometimes, however, parameters bite back. There are times when you don’t really want something to be a parameter – perhaps because it changes the query so significantly that a separate query plan would be desirable. But you don’t want to use a different command-string per value either, as this would force dapper to perform meta-programming per value (since each distinct string would be a cache miss on the strategy cache).

Alternatively, often the value will never be changed (but is treated as a parameter for code reasons), or will only ever be one value unless some configuration setting is changed, maybe twice a year. Classic values here might be a status code from an enum, or the number of rows (select top 50 sometimes performs quite differently to select top (@rows)).

Because of this, the current builds of dapper introduce value injection. This uses a syntax intentionally similar to – but distinct from – regular parameters, and injects the value directly into the SQL immediately before execution (while still allowing dapper to re-use the strategy). It probably makes more sense with an example:
DateTime from = ...
int id = ...
int count = ...
var orders = connection.Query<Order>(
    @"select top {=count} * from Orders where CustomerId=@id
    and CreationDate >= @from and Status={=open}",
    new { id, from, count, open=OrderStatus.Open }).ToList();

Essentially, anything of the form {=name} will be treated as an inlined value, and will be replaced rather than treated as a parameter. Of course, at this point you’re probably screaming “nooooooo! what about sql injection!” – which is why it restricts usage to integer-based types (including enums). This significantly reduces the risk of any abuse, and of course: you don’t have to use it if you don’t want to!

Parameter expansion support for OPTIMIZE FOR query hints


Parameter sniffing can be a real pain. For us, we call this “the Jon Skeet problem”: we have some very different users – some with maybe a handful of posts and a few dozen reputation, and some with tens of thousands of posts and over a half-million reputation (which means: a lot of vote records). Let’s assume we want to keep the user-id as a regular SQL parameter: if the first use of a query is for “new user Fred”, the query plan won’t work well for Jon. If the first use of a query is for Jon, it won’t work well for “new user Fred”. Fortunately, SQL Server has a mechanism to tell it not to get too attached to a query-plan based on a parameter – the OPTIMIZE FOR query hint. This can be left open (applies to all parameters), but it is often desirable to use the variant that tells it exactly which parameters we are concerned about. This is easy normally, but recall that dapper offers parameter expansion. So how can we use this query hint with the in query above? We don't conveniently know the names of the eventual parameters...

Because of this, dapper now recognises this specific pattern, and performs parameter expansion compatible with this hint. If you use:
option (optimize for (@ids unknown))

it will expand this out to the correct query hint, covering the actual parameters that @ids becomes during expansion.
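Putting that together with the in-expansion shown earlier, a hedged sketch of what the usage might look like (Posts / OwnerUserId are invented names, and the exact names of the expanded parameters are an internal detail):
int[] ids = { 12345, 67890 };
// both the "in @ids" clause and the matching hint get expanded, so SQL Server ends up
// seeing something along the lines of:
//   where OwnerUserId in (@ids1, @ids2)
//   option (optimize for (@ids1 unknown, @ids2 unknown))
var posts = connection.Query<Post>(
    @"select * from Posts where OwnerUserId in @ids
    option (optimize for (@ids unknown))",
    new { ids }).ToList();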

Async all the things


The usage of async keeps growing, so dapper has now evolved much better async methods (with a number of useful contributions from users); most of these are self-explanatory – simply using the *Async method names – but there are some key additions (a usage sketch follows the list below):
  • Support for cancellation tokens
  • Opt-in use of pipelining (performing a range of operations on a connection without waiting for the early operations to complete – this requires MARS to be enabled)
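As a rough usage sketch – going by my reading of the current API, where both options are exposed via CommandDefinition; treat the exact shape as illustrative rather than gospel (it re-uses the id / Order / OrderStatus bits from the earlier examples):
var cancel = new System.Threading.CancellationTokenSource();

// cancellation: pass a token alongside the command text and args
var orders = (await connection.QueryAsync<Order>(new CommandDefinition(
    "select * from Orders where CustomerId=@id", new { id },
    cancellationToken: cancel.Token))).ToList();

// opt-in pipelining: issue several operations on one connection without waiting on
// each in turn (requires MARS to be enabled on the connection)
await connection.ExecuteAsync(new CommandDefinition(
    "update Orders set Status=@open where Id=@id", new { id, open = OrderStatus.Open },
    flags: CommandFlags.Pipelined));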

Summary

No huge changes, but hopefully useful to a few people. Enjoy.