Monday, 1 September 2014

Optional parameters: maybe not so harmful

 

A few days ago I blogged Optional parameters considered harmful; well, as is often the case – I was missing a trick. Maybe they aren’t as bad as I said. Patrick Huizinga put me right; I don’t mind admitting: I was being stupid.

To recap, we had a scenario were these two methods were causing a problem:

static void Compute(int value, double factor = 1.0,
string caption = null)
{ Compute(value, factor, caption, null); }
static void Compute(int value, double factor = 1.0,
string caption = null, Uri target = null)
{ /* ... */ }

What I should have done is: make the parameters non-optional in all overloads except the new one. Existing compiled code isn’t interested in the optional parameters, so will still use the old methods. New code will use the most appropriate overload, which will often (but not quite always) be the overload with the most parameters – the optional parameters.


static void Compute(int value, double factor,
string caption)
{ Compute(value, factor, caption, null); }
static void Compute(int value, double factor = 1.0,
string caption = null, Uri target = null)
{ /* ... */ }

There we go; done; simple; working; no issues. My thanks to Patrick for keeping me honest.

Thursday, 28 August 2014

Optional parameters considered harmful (in public libraries)


UPDATE: I was being stupid. See here for the right way to do this

TAKE THIS POST WITH A HUGE PINCH OF SALT

Optional parameters are great; they can really clean down the number of overloads needed on some APIs where the intent can be very different in different scenarios. But; they can hurt. Consider the following:
static void Compute(int value, double factor = 1.0,
    string caption = null)
{ /* ... */ }

Great; our callers can use Compute(123), or Compute(123, 2.0, "awesome"). All is well in the world. Then a few months later, you realize that you need more options. The nice thing is, you can just add them at the end, so our method becomes:
static void Compute(int value, double factor = 1.0,
  string caption = null, Uri target = null)
{ /* ... */ }

This works great if you are recompiling everything that uses this method, but it isn’t so great if you are a library author; the method could be used inside another assembly that you don’t control and can’t force to be recompiled. If someone updates your library without rebuilding that other dll, it will start throwing MissingMethodException; not great.

OK, you think; I’ll just add it as an overload instead:
static void Compute(int value, double factor = 1.0,
    string caption = null)
{ Compute(value, factor, caption, null); }
static void Compute(int value, double factor = 1.0,
    string caption = null, Uri target = null)
{ /* ... */ }

And you test Compute(123, 1.0, "abc") and Compute(123, target: foo), and everything works fine; great! ship it! No, don’t. You see, what doesn’t work is: Compute(123). This instead creates a compiler error:


The call is ambiguous between the following methods or properties: 'blah.Compute(int, double, string)' and 'blah.Compute(int, double, string, System.Uri)'

Well, damn…

This is by no means new or different; this has been there for quite a while now – but it still sometimes trips me (and other people) up. It would be really nice if the compiler would have a heuristic for this scenario such as “pick the one with the fewest parameters”, but; not today. A limitation to be wary of (if you are a library author). For reference, it bit me hard when I failed to add CancellationToken as a parameter on some *Async methods in Dapper that had optional parameters - and of course, I then couldn't add it after the fact.

Friday, 11 July 2014

Securing redis in the cloud

Redis has utility in just about any system, but when we start thinking about “the cloud” we have a few additional things to worry about. One of the biggest issues is that the cloud is not your personal space (unless you have a VPN / subnet setup, etc) – so you need to think very carefully about what data is openly available to inspection at the network level. Redis does have an AUTH concept, but frankly it is designed to deter casual access: all commands and data remain unencrypted and visible in the protocol, including any AUTH requests themselves. What we probably want, then, is some kind of transport level security.
Now, redis itself does not provide this; there is no standard encryption, but you could configure a secure tunnel to the server using something like stunnel. This works, but requires configuration at both client and server. But to make our lives a bit easier, some of the redis hosting providers are beginning to offer encrypted redis access as a supported option. This certainly applies both to Microsoft “Azure Redis Cache” and Redis Labs “redis cloud”. I’m going to walk through both of these, discussing their implementations, and showing how we can connect.

Creating a Microsoft “Azure Redis Cache” instance

First, we need a new redis instance, which you can provision at https://portal.azure.com by clicking on “NEW”, “Everything”, “Redis Cache”, “Create”:
image
image
There are different sizes of server available; they are all currently free during the preview, and I’m going to go with a “STANDARD” / 250MB:
image
Azure will now go away and start creating your instance:
image
This could take a few minutes (actually, it takes surprisingly long IMO, considering that starting a redis process is virtually instantaneous; but for all I know it is running on a dedicated VM for isolation etc; and either way, it is quicker and easier than provisioning a server from scratch). After a while, it should become ready:
image

Connecting to a Microsoft “Azure Redis Cache” instance

We have our instance; lets talk to it. Azure Redis Cache uses a server-side certificate chain that should be valid without having to configure anything, and uses a client-side password (not a client certificate), so all we need to know is the host address, port, and key. These are all readily available in the portal:
image
image
Normally you wouldn’t post these on the internet, but I’m going to delete the instance before I publish, so; meh. You’ll notice that there are two ports: we only want to use the SSL port. You also want either stunnel, or a client library that can talk SSL; I strongly suggest that the latter is easier! So; Install-Package StackExchange.Redis, and you’re sorted (or Install-Package StackExchange.Redis.StrongName if you are still a casualty of the strong name war). The configuration can be set either as a single configuration string, or via properties on an object model; I’ll use a single string for convenience – and my string is:
mgblogdemo.redis.cache.windows.net,ssl=true,password=LLyZwv8evHgveA8hnS1iFyMnZ1A=

The first part is the host name without a port; the middle part enables ssl, and the final part is either of our keys (the primary in my case, for no particular reason). Note that if no port is specified, StackExchange.Redis will select 6379 if ssl is disabled, and 6380 if ssl is enabled. There is no official convention on this, and 6380 is not an official “ssl redis” port, but: it works. You could also explicitly specify the ssl port (6380) using standard {host}:{port} syntax. With that in place, we can access redis (an overview of the library API is available here; the redis API is on http://redis.io)
var muxer = ConnectionMultiplexer.Connect(configString);
var db = muxer.GetDatabase();
db.KeyDelete("foo");
db.StringIncrement("foo");
db.StringIncrement("foo");
db.StringIncrement("foo");
int i = (int)db.StringGet("foo");
Console.WriteLine(i); // 3

and there we are; readily talking to an Azure Redis Cache instance over SSL.

Creating a new Redis Labs “redis cloud” instance and configuring the certificates


Another option is Redis Labs; they too have an SSL offering, although it makes some different implementation choices. Fortunately, the same client can connect to both, giving you flexibility. Note: the SSL feature of Redis Labs is not available just through the UI yet, as they are still gauging uptake etc. But it exists and works, and is available upon request; here’s how:

Once you have logged in to Redis Labs, you should immediately have a simple option to create a new redis instance:

image

Like Azure, a range of different levels is available; I’m using the Free option, purely for demo purposes:

image

We’ll keep the config simple:

image

and wait for it to provision:

image

(note; this only takes a few moments)

Don’t add anything to this DB yet, as it will probably get nuked in a moment! Now we need to contact Redis Labs; the best option here is support@redislabs.com; make sure you tell them who you are, your subscription number (blanked out in the image above), and that you want to try their SSL offering. At some point in that dialogue, a switch gets flipped, or a dial cranked, and the Access Control & Security changes from password:

image

to SSL; click edit:

image

and now we get many more options, including the option to generate a new client certificate:

image

Clicking this button will cause a zip file to be downloaded, which has the keys to the kingdom:

image

The pem file is the certificate authority; the crt and key files are the client key. They are not in the most convenient format for .NET code like this, so we need to tweak them a bit; openssl makes this fairly easy:
c:\OpenSSL-Win64\bin\openssl pkcs12 -inkey garantia_user_private.key -in garantia_user.crt -export -out redislabs.pfx

This converts the 2 parts of the user key into a pfx, which .NET is much happier with. The pem can be imported directly by running certmgr.msc (note: if you don’t want to install the CA, there is another option, see below):

image

Note that it doesn’t appear in any of the pre-defined lists, so you will need to select “All Files (*.*)”:

image

After the prompts, it should import:

image

So now we have a physical pfx for the client certificate, and the server’s CA is known; we should be good to go!

Connecting to a Redis Labs “redis cloud” instance


Back on the Redis Labs site, select your subscription, and note the Endpoint:

image

We need a little bit more code to connect than we did with Azure, because we need to tell it which certificate to load; the configuration object model has events that mimic the callback methods on the SslStream constructor:
var options = new ConfigurationOptions();
options.EndPoints.Add(
    "pub-redis-16398.us-east-1-3.1.ec2.garantiadata.com:16398");
options.Ssl = true;
options.CertificateSelection += delegate {
    return new System.Security.Cryptography.X509Certificates.X509Certificate2(
    @"C:\redislabs_creds\redislabs.pfx", "");
};

var muxer = ConnectionMultiplexer.Connect(options);
var db = muxer.GetDatabase();
db.KeyDelete("foo");
db.StringIncrement("foo");
db.StringIncrement("foo");
db.StringIncrement("foo");
int i = (int)db.StringGet("foo");
Console.WriteLine(i); // 3

Which is the same smoke test we did for Azure. If you don’t want to import the CA certificate, you could also use the CertificateValidation event to provide custom certificate checks (return true if you trust it, false if you don’t).

Way way way tl:dr;


Cloud host providers are happy to let you use redis, and happy to provide SSL support so you can do it without being negligent. StackExchange.Redis has hooks to let this work with the two SSL-based providers that I know of.

Thursday, 3 July 2014

Dapper gets type handlers and learns how to read maps

A recurring point of contention in dapper has been that it is a bit limited in terms of the types it handles. If you are passing around strings and integers: great. If you are passing around DataTable – that’s a bit more complicated (although moderate support was added for table valued parameters). If you were passing around an entity framework spatial type: forget it.
Part of the problem here is that we don’t want dapper to take a huge pile of dependencies on external libraries, that most people aren’t using – and often don’t even have installed or readily available. What we needed was a type handler API. So: I added a type handler API! Quite a simple one, really – dapper still deals with most of the nuts and bolts, and to add your own handler all you need to provide is some basic parameter setup.
For example, here’s the code for DbGeographyHandler; the only interesting thing that dapper doesn’t do internally is set the value – but the type-handler can also do other things to configure the ADO.NET parameter (in this case, set the type name). It also needs to convert between the Entity Framework representation of geography and the ADO.NET representation, but that is pretty easy:
public override void SetValue(IDbDataParameter parameter, DbGeography value)
{
    parameter.Value = value == null ? (object)DBNull.Value
        : (object)SqlGeography.Parse(value.AsText());
    if (parameter is SqlParameter)
    {
        ((SqlParameter)parameter).UdtTypeName = "GEOGRAPHY";
    }
}

and… that’s it. All you need to do is register any additional handlers (SqlMapper.AddTypeHandler()) and it will hopefully work. We can now use geography values in parameters without pain – i.e.
conn.Execute("... @geo ...",
    new { id = 123, name = "abc", geo = myGeography }

Plugin packages

This means that handlers for exotic types with external dependencies can be shipped separately to dapper, meaning we can now add packages like Dapper.EntityFramework, which brings in support for the Entity Framework types. Neat.

Free to a good home: better DataTable support

At the same time, we can also make a custom handler for DataTable, simplifying a lot of code. There is one slight wrinkle, though: if you are using stored procedures, the type name of the custom type is known implicitly, but a lot of people (including us) don’t really use stored procedures much : we just use raw command text. In this situation, it is necessary to specify the custom type name along with the parameter. Previously, support for this has been provided via the AsTableValuedParameter() extension method, which created a custom parameter with an optional type name – but dapper now internally registers a custom type handler for DataTable to make this easier. We still might need the type name, though, so dapper adds a separate extension method for this, exploiting the extended-properties feature of DataTable:
DataTable table = ...
table.SetTypeName("MyCustomType");
conn.Execute(someSql, new { id = 123, values = table });

That should make things a bit cleaner! Custom type handlers are welcome and encouraged - please do share them with the community (ideally in ready-to-use packages).

Tuesday, 1 July 2014

SNK and me work out a compromise

and bonus feature: our build server configuration
A few days ago I was bemoaning the issue of strong names and nuget. There isn’t a perfect solution, but I think I have a reasonable compromise now. What I have done is:
  • Copy/pasted the csproj, with the duplicates referencing a strong name key
  • Changed those projects to emit an assembly named StackExchange.Redis.StrongName
  • Copy/pasted the nuspec, with the new spec creating the StackExchange.Redis.StrongName package from the new assemblies
  • PINNED the assembly version number; this will not change unless I am introducing breaking changes, which will also coincide with major/minor version number changes – or maybe also for reasonably significant feature additions; not for every point-release, is the thing
  • Ensured I am using [assembly:AssemblyFileVersion] and [assembly:AssembyInformationalVersion] to describe the point-release status
This should achieve the broadest reach with the minimum of fuss and maintenance.
Since that isn’t enough words for a meaningful blog entry, I thought I’d also talk about the build process we use, and the tooling changes we needed for that. Since we are using TeamCity for the builds, it is pretty easy to double everything without complicating the process – I just added the 2 build steps for the new csproj, and told it about the new nuspec:
image
Likewise, TeamCity includes a tool to automatically tweak the assembly metadata before the build happens – the “AssemblyInfo patcher” – which we can use to pin the one value (to change manually based on human decisions), while keeping the others automatic:
image
Actually, I’ll probably change that to use a parameter to avoid me having to dig 3 levels down. After that, we can just let it run the build, and out pop the 2 nupkg as artefacts ready for uploading:
image
If you choose, you can also use TeamCity to set up an internal nuget feed – ideal for both internal use and dogfooding before you upload to nuget.org. In this case, you don’t need to grab the artefacts manually or add a secondary upload step – you can get TeamCity to publish them by default, making them immediately available in-house:
image
It isn’t obvious in the UI, but this search is actually restricted to the “SE” feed, which is my Visual Studio alias to the TeamCity nuget server:
image
Other build tools will generally have similar features – simply: this is what works for us.

Thursday, 19 June 2014

SNK, we need to talk…


Because the world needs another rant about SNK and NuGet


In .NET assemblies, strong names are an optional lightweight signing mechanism that provides identity, including versioning support. Part of the idea here is that a calling assembly can have a pretty good idea that what it asks for is what it gets – it asks for Foo.Bar with key {blah}, version “1.2.3”, and it gets exactly that from the GAC, and the world is rosy. If somebody installs a new additional version of Foo.Bar, or a Foo.Bar from a different author, the old code still gets the dll it wanted, happy in the knowledge that the identity is reliable.
There are only a few problems with this:
  • Most applications don’t use the GAC; the only times people generally “choose” to use the GAC is when their list of options had exactly one option: “use the GAC”. Sharepoint and COM+, I’m looking at you and judging you harshly
  • Actually, there’s a second category of this: people who use strong names because that is what their corporate policy says they must do, with some some well-meaning but completely misguided and incorrect notion that this provides some kind of security. Strong naming is not a security feature. You are just making work and issues for yourself; seriously
  • It doesn’t actually guarantee the version: binding redirect configuration options (just in an xml file in your application) allow for a different version (with the same key) to be provided
  • It doesn’t actually guarantee the integrity of the dll: if somebody has enough access to your computer that they have access to the GAC, they also have enough access to configure .NET to skip assembly identity checking for that dll (just a “snk –Vr {assembly}” away)
And of course, it introduces a range of problems:
  • Versioning becomes a huge pain the backside for all downstream callers, who now need to manage the binding redirect configuration every time a dll gets upgraded anywhere (there are some tools that can help with this, but it isn’t perfect)
  • A strong-named assembly can only reference other strong-named assemblies
  • You now have all sorts of key management issues over your key file (despite the fact that it is pointless and can be bypassed, as already mentioned)

 

Assembly management versus package management


Now enter package management, i.e. NuGet and kin (note: I’m only using NuGet as the example here because it is particularly convenient and readily available to .NET developers; other package management tools exist, each with strengths and weaknesses). Here we have a tool that clearly targets the way 95% of the .NET world actually works:
  • no strong name; we could not care less (or for the Americans: we could care less)
  • no GAC: libraries deployed alongside the application for per-application isolation and deployment convenience (this is especially useful for web-farms, where we just want to robocopy the files out)
  • versioning managed by the package management tool
So, as a library author, it is hugely tempting to simply ignore strong naming, and simply put assemblies without a strong name onto NuGet. And that is exactly what I have done with StackExchange.Redis: it has no strong name. And for most people, that is fine. But now I get repeated calls and emails from people saying “please can you strong name it”.
The argument for and against strong-naming in NuGet is very verbose; there are threads with hundreds of messages for and against – both with valid points. There is no simple answer here.
  • if it is strong named, I introduce the problems already mentioned – when for 95% (number totally invented, note) of the people using it, this is simply not an issue
  • if it isn’t strong named, people doing Sharepoint development, or COM+ development, or just with awkward local policies cannot use it – at least not conveniently
They can of course self-sign locally, and there are tools to help with that – including Nivot.StrongNaming. But this only helps for immediate references: any such change will be fatal to indirect references. You can’t use binding redirects to change the identity of an assembly – except for the version.
I could start signing before deployment, but that would be a breaking change. So I’d have to at a minimum do a major version release. Again, direct references will be fine – just update the package and it works – but indirect references are still completely toast, with no way of fixing them except to recompile the intermediate assembly against the new identity. Not ideal.

I’m torn


In some ways, it is tempting to say “screw it, I need to add a strong name so that the tiny number of people bound by strong naming can use it”, but that is also saying “I need to totally and irreconcilably break all indirect references, to add zero functionality and despite the fact that it was working fine”, and also “I actively want to introduce binding redirect problems for users who currently don’t have any issues whatsoever”.
This is not an easy place to be. Frankly, at this stage I’m also not sure I want to be adding implicit support to the problems that SNK introduce by adding a strong name.

But what if…


Imagineering is fun. Let’s be realistic and suppose that there is nothing we can do to get those systems away from strong names, and that we can’t change the strong-named-can-only-reference-strong-named infection. But let’s also assume we don’t want to keep complicating package management by adding them by default. Why can’t we let this all just work. Or at least, that maybe our package management tools could fix it all. It seems to me that we would only need two things:
  • assembly binding redirects that allow unsigned assemblies to be forwarded to signed assemblies
  • some inbuilt well-known publicly available key that the package management tools could use to self-sign assemblies
For example, imagine that you have your signed application. and you use NuGet to add a package reference to Microsoft.Web.RedisSessionStateProvider, which in turn references StackExchange.Redis. In my imaginary world, NuGet would detect that the local project is signed, and these aren’t – so it would self-sign them with the well-known key, and add an assembly-binding redirect from “StackExchange.Redis” to “StackExchange.Redis with key hash and version”. Important note: the well-known key here is not being used to assert any particular authorship etc – it is simply “this is what we got; make it work”. There’s no need to protect the private key of that.
The major wrinkle in this, of course, is that it would require .NET changes to the fusion loader, in order to allow a binding redirect that doesn’t currently exist. But seriously: haven’t we been having this debate for long enough now? Isn’t it time the .NET framework started helping us with this? If I could request a single vNext CLR feature: this would be it.
Because I am so very tired of having this whole conversation, after a decade of it.
There are probably huge holes in my reasoning here, and reasons why it isn’t a simple thing to change. But: this change, or something like it, is so very very overdue.

And for now…


Do I add a strong name to StackExchange.Redis? Local experiments have shown me that whatever I do: somebody gets screwed, and no matter what I do: I’m going to get yelled at by someone. But I’m open to people’s thoughts and suggestions.

Wednesday, 18 June 2014

Dapper : some minor but useful tweaks

At Stack Exchange we pay a lot of attention to our data access, which is why a few years ago we spawned dapper. In case you haven’t seen it, the purpose of dapper is to remove the mind-numbing and easy-to-get-wrong parts of ADO.NET, making it easy to do proper parameterization, without having the overhead of a micro-ORM. In other words, to make it easy to do this:
DateTime from = ...
int id = ...
int count = ...
var orders = connection.Query<Order>(
    @"select top (@count) * from Orders where CustomerId=@id
    and CreationDate >= @from and Status=@open",
    new { id, from, count, open=OrderStatus.Open }).ToList();

It is unapologetic about expecting you to know and “own” your SQL – entirely good things, IMO (by comparison: most professionals wouldn't program the web tier without knowing what their html looks like).

Inline value injection


With a few exceptions the sql that you pass in is essentially opaque: the only things it has done historically is:
  • check the sql to see which possible parameters can definitely be ignored, because they aren’t mentioned (assuming it isn’t a stored procedure, etc)
  • parameter expansion (typically used for in queries, i.e. where id in @ids, where @ids is something like an int[])
Sometimes, however, parameters bite back. There are times when you don’t really want something to be a parameter – it either changes the query so significantly that a separate query plan would be desirable. You don’t want to use a different command-string per value, as this would force dapper to perform meta-programming per value (since it would be a cache miss on the strategy cache).

Alternatively, often the value will never be changed (but is treated as a parameter for code reason), or will only ever be one value unless some configuration setting is changed, maybe twice a year. Classic values here might be a status code from an enum, or the number of rows (select top 50 sometimes performs quite differently to select top (@rows)).

Because of this, the current builds of dapper introduce value injection. This uses an intentionally different syntax similar to parameters, but injects the value directly into the SQL immediately before execution (but allowing dapper to re-use the strategy). It probably makes more sense with an example:
DateTime from = ...
int id = ...
int count = ...
var orders = connection.Query<Order>(
    @"select top {=count} * from Orders where CustomerId=@id
    and CreationDate >= @from and Status={=open}",
    new { id, from, count, open=OrderStatus.Open }).ToList();

Essentially, anything of the form {=name} will be treated as an inlined value, and will be replaced rather than treated as a parameter. Of course, at this point you’re probably screaming “nooooooo! what about sql injection!” – which is why it restricts usage to integer-based types (including enums). This significantly reduces the risk of any abuse, and of course: you don’t have to use it if you don’t want!

Parameter expansion support for OPTIMIZE FOR query hints


Parameter sniffing can be a real pain. For us, we call this “the Jon Skeet problem”: we have some very different users – some with maybe a handful posts and a few dozen reputation, and some with tens of thousands of posts and over a half-million reputation (which means: a lot of vote records). Let’s assume we want to keep the user-id as a regular SQL parameter: if the first use of a query is for “new user Fred”, the query plan won’t work well for Jon. If the first use of a query is for Jon, it won’t work well for “new user Fred”. Fortunately, SQL Server has a mechanism to tell it not to get too attached to a query-plan based on a parameter – the OPTIMIZE FOR query hint. This can be left open (applies to all parameters), but it is often desirable to use the variant that tells it exactly which parameters we are concerned about. This is easy normally, but recall that dapper offers parameter expansion. So how can we use this query hint with the in query above? We don't conveniently know the names of the eventual parameters...

Because of this, dapper now recognises this specific pattern, and performs parameter expansion compatible with this hint. If you use:
option (optimize for (@ids unknown))

it will expand this out to the correct query hint for the parameters that @ids become during expansion.

Async all the things


The usage of async keeps growing, so dapper has now evolved much better async methods (with a number of useful contributions from users); most of these are self explanatory – simply using the *Async method names, but some key additions:
  • Support for cancellation tokens
  • Opt-in use of pipelining (performing a range of operations on a connection without waiting for the early operations to complete – this requires MARS to be enabled)

Summary

No huge changes, but hopefully useful to a few people. Enjoy.