Monday 30 March 2009

Obfuscation, serialization and automatically implemented properties

Not for the first time, I saw a question today discussing obfuscation and serialization in .NET. As a recap, obfuscation (in this context) is the art of making an assembly hard to read, and one of the most basic tricks is to rename types and members to things like "a1", "a2", "ab1", etc. The problem is that the automated serialization engines are very fussy about names...

Oddly, this isn't unique to obfuscation; consider:
    public DateTime DateOfBirth {get;set;}
With automatically implemented properties, the field is supplied by the compiler (with an obscure name); but BinaryFormatter (etc.) works against fields, recording each one by name! If we make a supposedly innocent change to add validation logic (and so add our own field, with a different name), our serialization can break.
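
To see why the switch matters, here is a minimal sketch that uses reflection to list the instance fields of two versions of the same type. The type names (AutoPerson, ManualPerson) and the helper are purely illustrative. The exact name of the compiler-generated field is an implementation detail; with the Microsoft compiler it is typically something like "<DateOfBirth>k__BackingField".

```csharp
using System;
using System.Reflection;

// Version 1: automatically implemented property (compiler supplies the field).
public class AutoPerson
{
    public DateTime DateOfBirth { get; set; }
}

// Version 2: the "innocent" change to a manual implementation with our own field.
public class ManualPerson
{
    private DateTime dateOfBirth;
    public DateTime DateOfBirth
    {
        get { return dateOfBirth; }
        set { dateOfBirth = value; } // validation logic would go here
    }
}

public static class BackingFieldDemo
{
    // Returns the names of all instance fields (public and non-public) declared on a type.
    public static string[] InstanceFieldNames(Type type)
    {
        FieldInfo[] fields = type.GetFields(
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
        string[] names = new string[fields.Length];
        for (int i = 0; i < fields.Length; i++)
        {
            names[i] = fields[i].Name;
        }
        return names;
    }

    public static void Main()
    {
        // The two lines printed differ, even though the public API is identical.
        Console.WriteLine(string.Join(", ", InstanceFieldNames(typeof(AutoPerson))));
        Console.WriteLine(string.Join(", ", InstanceFieldNames(typeof(ManualPerson))));
    }
}
```

Any serializer that keys its data on field names sees these as two different members, which is exactly the breakage described above.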

Problem statement (with automatically implemented properties):
  • We don't want our binary serialization to break just because we switch to manual implementation
Problem statement (with obfuscation):
  • We want the type to be obfuscated
  • We want a type to be serializable and deserializable in both original and obfuscated forms, ideally in a compatible way
  • We don't want to write all the serialization code by hand (i.e. ISerializable)
  • It would be nice (but not essential) if the serialized data didn't expose all our innards by name
For people using remoting or BinaryFormatter, this isn't simple. Formats like XML would work (assuming explicit XML names marked against each member), but XML isn't always the best choice, and the attribute largely defeats the purpose of obfuscation, since it spells out the original name inside the assembly.
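
As a rough illustration of the XML option, the sketch below uses XmlSerializer with explicit names. The "a1"/"a2" type is a hand-written stand-in for what an obfuscator might produce, and the helper methods are hypothetical. Note how the attributes keep the two forms compatible, but also leave the original names readable in the obfuscated type.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Original form.
[XmlRoot("Person")]
public class Person
{
    [XmlElement("DateOfBirth")]
    public DateTime DateOfBirth { get; set; }
}

// Stand-in for the obfuscated form: names are mangled, but the
// attributes still spell out "Person" and "DateOfBirth".
[XmlRoot("Person")]
public class a1
{
    [XmlElement("DateOfBirth")]
    public DateTime a2 { get; set; }
}

public static class XmlNamesDemo
{
    // Serializes a value to an XML string.
    public static string ToXml<T>(T value)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(T));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }

    // Deserializes an XML string into the requested type.
    public static T FromXml<T>(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(T));
        using (StringReader reader = new StringReader(xml))
        {
            return (T)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        Person person = new Person();
        person.DateOfBirth = new DateTime(1980, 1, 2);

        // Write with the original type, read back with the "obfuscated" one.
        string xml = ToXml(person);
        a1 copy = FromXml<a1>(xml);

        Console.WriteLine(xml);
        Console.WriteLine(copy.a2 == person.DateOfBirth);
    }
}
```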

As it happens, all this is something that protobuf-net does for free! Since the contract doesn't involve names, we don't (usually) care what the fields/properties are called. It is also trivial to hook into ISerializable (if we need remoting), and WCF etc. Members are designated a numeric tag - for example:
    [ProtoMember(3)]
    public DateTime DateOfBirth {get;set;}
This then solves all our problems in one go; the only identifier used on the wire is the "3". There is a supported use-case that uses names to infer tags, but that is "opt in" (and not the preferred usage).
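
A minimal sketch of what that means in practice, assuming the protobuf-net assembly is referenced: data written by the original type is read back by a renamed one, because only the tag has to match. The "a1"/"a2" type is again a hand-written stand-in for obfuscator output, and TagDemo is a hypothetical helper, not part of the library.

```csharp
using System;
using System.IO;
using ProtoBuf;

// Original form.
[ProtoContract]
public class Person
{
    [ProtoMember(3)]
    public DateTime DateOfBirth { get; set; }
}

// Stand-in for the obfuscated form: different names, same tag.
[ProtoContract]
public class a1
{
    [ProtoMember(3)]
    public DateTime a2 { get; set; }
}

public static class TagDemo
{
    // Serializes a Person, then deserializes the same bytes as the renamed type.
    public static a1 RoundTripAsObfuscated(Person person)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            Serializer.Serialize(stream, person);
            stream.Position = 0; // rewind before reading
            return Serializer.Deserialize<a1>(stream);
        }
    }

    public static void Main()
    {
        Person person = new Person();
        person.DateOfBirth = new DateTime(1980, 1, 2);

        a1 copy = RoundTripAsObfuscated(person);
        Console.WriteLine(copy.a2 == person.DateOfBirth);
    }
}
```

The same reasoning covers the automatically implemented property case: swapping in a manual field changes nothing on the wire, since the tag stays attached to the property.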

Plus, of course, it is free, fast and gives much smaller binary output (for which Google must take much of the credit) - and works on virtually all the .NET platforms. Give it a whirl ;-p