Monday 30 March 2009

Obfuscation, serialization and automatically implemented properties

Not for the first time, I saw a question today discussing obfuscation and serialization in .NET; as a recap, obfuscation (in this context) is the art of making an assembly hard to read, and one of the most basic tricks is to rename types and members to things like "a1", "a2", "ab1", etc. The problem is that the automated serialization engines are very fussy about names...

Oddly, this isn't unique to obfuscation; consider:
    public DateTime DateOfBirth {get;set;}
With automatically implemented properties, the field is supplied by the compiler (with an obscure name); but BinaryFormatter (etc) work against fields! If we make a supposedly innocent change to add validation logic (and so add our own field) our serialization can break.
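
To see the compiler-supplied field for yourself, here is a minimal sketch using reflection. The `Person` type and the `FieldLister` helper are hypothetical names for illustration only, and the exact generated field name is a compiler implementation detail (the Microsoft C# compilers I'm aware of produce something like `<DateOfBirth>k__BackingField`), so don't rely on it.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical type for illustration: one automatically implemented property.
public class Person
{
    public DateTime DateOfBirth { get; set; }
}

public static class FieldLister
{
    // Returns the names of the instance fields (public or non-public) that
    // reflection can see on a type; fields like these are what field-based
    // serializers such as BinaryFormatter work against.
    public static string[] InstanceFieldNames(Type type)
    {
        return type
            .GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic)
            .Select(f => f.Name)
            .ToArray();
    }
}
```

Calling `FieldLister.InstanceFieldNames(typeof(Person))` should give a single, mangled name rather than plain `DateOfBirth`; switch to a manual property and the name becomes whatever you called your own field, which is exactly the change that trips up field-based serialization.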

Problem statement (with automatically implemented properties):
  • We don't want our binary serialization to break just because we switch to manual implementation
Problem statement (with obfuscation):
  • We want the type to be obfuscated
  • We want a type to be serializable and deserializable in both original and obfuscated forms, ideally in a compatible way
  • We don't want to write all the serialization code by hand (i.e. ISerializable)
  • It would be nice (but not essential) if the serialized data didn't expose all our innards by name
For people using remoting or BinaryFormatter, this isn't simple. Formats like xml would work (assuming explicit xml names marked against each member), but xml isn't always the best choice - and you largely defeat the purpose of obfuscation with the attribute.

As it happens, all this is something that protobuf-net does for free! Since the contract doesn't involve names, we don't (usually) care what the fields/properties are called. It is also trivial to hook into ISerializable (if we need remoting), and WCF etc. Members are designated a numeric tag - for example:
    [ProtoMember(3)]
    public DateTime DateOfBirth {get;set;}
This then solves all our problems in one go; the only identifier used on the wire is the "3". There is a supported use-case that uses names to infer tags, but that is "opt in" (and not the preferred usage).

Plus, of course, it is free, fast and gives much smaller binary output (for which Google must take much of the credit) - and works on virtually all the .NET platforms. Give it a whirl ;-p

Thursday 19 March 2009

Visualizing Expressions in PropertyGrid

The problem

The Expression API is complex; not least because picking apart an Expression is complicated. For example, let's pick an expression that removes the first and last characters from a string:

s => s.Length > 2 ? s.Substring(1, s.Length - 2) : s;

If you are working regularly with Expression, you often need to be able to visualize it (if only to stay sane); this counts doubly if we want to understand how to construct our own Expression to do something similar. Ideally without needing any external tools / plug-ins, and something a bit more permanent than the inspector in the IDE.

If you wanted to write down what that actually means (which is what Expression has to do):

  • given a parameter "s" 
    • branch by evaluating the greater-than operator with arguments: 
      • query the Length of "s"
      • the constant 2
    • if true, evaluate Substring on the parameter "s" with arguments: 
      • the constant 1
      • the result of the subtraction operator with arguments:
        • query the Length of "s"
        • the constant 2
    • if false, evaluate the parameter "s"
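
The breakdown above maps almost line-for-line onto the factory methods on `Expression`. As a rough sketch (the `TrimEndsBuilder` class name is made up for this example), building the same tree by hand might look like:

```csharp
using System;
using System.Linq.Expressions;

public static class TrimEndsBuilder
{
    // Builds s => s.Length > 2 ? s.Substring(1, s.Length - 2) : s by hand.
    public static Expression<Func<string, string>> Build()
    {
        // given a parameter "s"
        ParameterExpression s = Expression.Parameter(typeof(string), "s");

        // query the Length of "s"
        MemberExpression length = Expression.Property(s, "Length");

        // the greater-than operator with arguments: s.Length, the constant 2
        BinaryExpression test = Expression.GreaterThan(length, Expression.Constant(2));

        // Substring(int, int) on "s" with arguments: the constant 1, s.Length - 2
        MethodCallExpression substring = Expression.Call(
            s,
            typeof(string).GetMethod("Substring", new[] { typeof(int), typeof(int) }),
            Expression.Constant(1),
            Expression.Subtract(length, Expression.Constant(2)));

        // if true, the Substring call; if false, the parameter "s"
        ConditionalExpression body = Expression.Condition(test, substring, s);

        return Expression.Lambda<Func<string, string>>(body, s);
    }
}
```

`TrimEndsBuilder.Build().Compile()` then gives a delegate that behaves like the original lambda, so "hello" comes back as "ell" and "ab" comes back untouched.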

Although fairly mechanical, pulling apart Expressions can take time. If only the system could display it for us.

The naïve approach

But wait a minute! PropertyGrid is good at displaying tree-like data - let's try setting the SelectedObject to an instance of the above Expression:

The PropertyGrid control, showing minimal data of the outermost (lambda) Expression only.

Disappointed? It doesn't really help us (and note that the Parameters collection is not expandable). The problem is that PropertyGrid uses TypeDescriptor (and friends) to query the metadata, and Expression simply isn't configured to allow hierarchical browsing. But what many people don't realise is that TypeDescriptor allows runtime configuration.

Changing the type-converter

Behind the scenes, PropertyGrid uses the TypeConverter associated with a type (or an individual member), by querying GetPropertiesSupported(), and (if it returns true) GetProperties(). And even better, there is an inbuilt converter (ExpandableObjectConverter) that exposes all the public members for nested browsing.

Additionally, TypeDescriptor offers an indirect way to associate a different type-converter with a type; by adding an attribute at runtime*. Since Expression is the base class for all the interesting objects in the Expression API, we only have to tweak this one type:

    TypeDescriptor.AddAttributes(typeof(Expression),
        new TypeConverterAttribute(typeof(ExpandableObjectConverter)));

This is broadly equivalent to using the attribute at compile-time:

    [TypeConverter(typeof(ExpandableObjectConverter))]
    public abstract class Expression { /* ... */ }

*=for the pedants (myself included), I should note that we only add an attribute to the runtime model (System.ComponentModel); reflection is unaware of this change.
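
If you want to convince yourself of the difference between the two models, something like the following sketch should do it. The `ExpressionBrowsing` class and its method names are invented for this example; the calls themselves (`TypeDescriptor.AddAttributes`, `TypeDescriptor.GetConverter`, `GetCustomAttributes`) are the standard ones, and I'm assuming Expression carries no compiled-in TypeConverterAttribute of its own.

```csharp
using System;
using System.ComponentModel;
using System.Linq.Expressions;

public static class ExpressionBrowsing
{
    // Registers ExpandableObjectConverter for Expression in the
    // System.ComponentModel runtime model (nothing is changed in metadata).
    public static void Enable()
    {
        TypeDescriptor.AddAttributes(typeof(Expression),
            new TypeConverterAttribute(typeof(ExpandableObjectConverter)));
    }

    // What TypeDescriptor (and therefore PropertyGrid) resolves for Expression.
    public static TypeConverter ConverterSeenByTypeDescriptor()
    {
        return TypeDescriptor.GetConverter(typeof(Expression));
    }

    // What plain reflection reports: only attributes compiled into the type.
    public static int ConverterAttributesSeenByReflection()
    {
        return typeof(Expression)
            .GetCustomAttributes(typeof(TypeConverterAttribute), true).Length;
    }
}
```

After calling `Enable()`, the TypeDescriptor view should report an ExpandableObjectConverter, while the reflection view should still find no TypeConverterAttribute at all.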

What does it look like now?

So how much difference does that one line make? Test again, and we see something very different - both in the main grid, and in the collection pop-ups:

The PropertyGrid control, showing the full hierarchy of nodes that describe the Expression. The Expression Collection Editor, showing the members and their hierarchical composition.

So with just 1 line of code, we've obtained a free tool for exploring Expression; I'm not saying it is the prettiest tool ever, but it is plenty good enough, considering that the main audience would be you, the developer.

It raises the question: why isn't it expandable in the first place?

Wednesday 18 March 2009

ASP.NET MVC

You've probably heard already, but ASP.NET MVC has finally been officially released.

If, like me, you never really got on with the vanilla ASP.NET pipeline, then ASP.NET MVC offers a very different and understandable experience. The fact that you control the HTML a lot more closely also makes it ideal for jQuery and similar tools.

Add to that the testability built in, the ease of attaching IoC containers, and the simplicity of working with routes instead of pages, and you are onto a winner, IMO.

So if you want to use MVC-style web, with jQuery richness, and the power of the .NET framework behind you, give it a look.

For serious usage, I also recommend:

Finguistics; behind the scenes

Back in January, I mentioned my involvement in Finguistics, the UK's first public surface exhibition.

As a small update, I'm happy to say that a short video covering both the project and the team's thoughts on the experience is now live (update; 2nd (similar) video here):



(the back of my head is top-right; in the middle is Dave Crawford, an ex-colleague of mine; on the left is Jim Allen)


I've also just noticed the typo in my name when it flashes up, but never mind... I had a preview, so it is my own fault for missing it.

In particular, I like Dave Crawford's insights on designing a surface application (from a user-experience perspective). You can also see Paul Tallett showing the dual-wielding mouse approach of surface development - it takes quite some getting used to!

If you are interested in surface development, you might also be interested in Dave Brown's development of the physics model beyond this project (part 1, part 2, part 3). Hopefully some of this work might make it back into the product at some point.

I should also note that while at the MVP Summit, the Sheraton hotel in Seattle has a number of surface devices in the lobby, which were proving very popular with the guests. No Finguistics, though ;-p

Tuesday 17 March 2009

Compact Framework Woes, revisited

A cautionary tale for CF development...

Previously, I mentioned a compact framework permissions gotcha that was hurting protobuf-net. Other than making things public, I never got much further, but at least I can understand the issue.

Well, now I have hit another issue (in the wild); for a complex model, the VM is throwing a MissingMethodException - but simply when invoking a regular generic method - no reflection etc:

    prop = CreateProperty<T>(type, ref format);

This surfaces in a test case where we add lots of different (but very similar) types to the model - eventually it simply gives up, and can't resolve the generic method (even though it worked fine for the identical property on the last "n" similar types). One can only assume that it has hit an internal limitation of the CF runtime.

The good news is that the regular framework doesn't exhibit this; the bad news is that this doesn't help me... to support complex models I expect I will have to take out some of the generics, and use a bit more casting / boxing. Which probably won't kill me.

In truth, I've probably out-clevered myself through over-use of generics - and in some ways, it has stopped me using a decorator quite as cleanly as I'd like. Lesson learnt...

Sunday 15 March 2009

Explaining Expression

As part of planning for a user-group presentation, I've been trying to think how to distinguish between a delegate and an expression - ideally without mentioning the word "database".

For example, what is the difference between:

void Foo(Func<int,int,bool> func) {...}

and

void Foo(Expression<Func<int,int,bool>> func) {...}

? After all; both are called with identical syntax, for example:

Foo((x,y) => x==y);

My explanation

So here's my offering...
  • The delegate version (Func<int,int,bool>) is the belligerent manager; "I need you to give me a way to get from 2 integers to a bool; I don't care how - when I'm ready, I'll ask you - and you can tell me the answer".
  • The expression version (Expression<Func<int,int,bool>>) is the dutiful analyst; "I need you to explain to me - if I gave you 2 integers, how would you go about giving me a bool?"
In standard programming, the managerial approach is optimal; the caller already knows how to do the job (i.e. has IL for the purpose). But the analytic approach is more flexible; the analyst reserves the right to simply follow the instructions "as is" (i.e. call Compile().Invoke(...)) - but with understanding comes power. Power to inspect the method followed; report on it; substitute portions; replace it completely with something demonstrably equivalent, etc...
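
To make the two roles concrete, here is a small sketch; the `ManagerAndAnalyst` class and its method names are purely illustrative. Both methods accept exactly the same lambda syntax, but only one of them can look at what it was given.

```csharp
using System;
using System.Linq.Expressions;

public static class ManagerAndAnalyst
{
    // The manager: can only ask for the answer.
    public static bool Manager(Func<int, int, bool> func)
    {
        return func(2, 2);
    }

    // The analyst: can inspect the instructions, and may still follow them "as is".
    public static string Analyst(Expression<Func<int, int, bool>> func)
    {
        string how = func.Body.NodeType.ToString();   // e.g. "Equal" for x == y
        bool answer = func.Compile().Invoke(2, 2);    // follow the instructions
        return how + " -> " + answer;
    }
}
```

So `ManagerAndAnalyst.Manager((x, y) => x == y)` simply returns true, whereas `ManagerAndAnalyst.Analyst((x, y) => x == y)` can also report that the body is an equality test.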

RPC Example

What does this mean? Well, for protobuf-net, I'm currently looking at the RPC stack. The design of protobuf-net is such that you *can* use code-generation, but it isn't enforced. I really want to avoid the messy approaches that you often need to use with WCF (for example).

Consider we have an RPC client that uses generics to indicate the interface that defines the service-contract (pretty common stuff if you are familiar with WCF):

class RpcClient<T> where T : class // T actually an interface
{}

Now; we can't expose the methods of T on RpcClient directly without using code generation; but we could use the approach of getting the caller to express their intent via the interface:

interface IFoo { string Bar(int i); }

var client = new RpcClient<IFoo>();
string s = client.Call(svc => svc.Bar(12345));

So what is Call? It wouldn't be useful as a delegate, as then we'd need to actually provide an IFoo instance to convince the caller to do the work for us... which means either code-generation or runtime type creation; the first is a pain, the second isn't even possible in some of the light-weight frameworks. But what about an expression?

TResult Call<TResult>(Expression<Func<T,TResult>> operation) {...}

Here, the caller gives us their version of what they would do with a hypothetical IFoo instance; we can take those instructions, dig out the MethodInfo (IFoo.Bar), evaluate the arguments (12345), and perform our own custom serialization on the wire. All without ever having an IFoo instance.
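
As a rough sketch of the "picking apart" step only (no wire format, and not the actual protobuf-net code): the `CallInspector` helper below is a hypothetical name, and it assumes the expression body is a single method call whose arguments don't refer to the `svc` parameter itself.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

public interface IFoo { string Bar(int i); }

public static class CallInspector
{
    // Picks apart "svc => svc.Method(args...)" without ever needing an instance of T.
    // Returns the method that was named and its evaluated argument values.
    public static Tuple<MethodInfo, object[]> Inspect<T, TResult>(
        Expression<Func<T, TResult>> operation)
    {
        var call = operation.Body as MethodCallExpression;
        if (call == null)
            throw new ArgumentException("Expected a single method call on the service.");

        // Evaluate each argument by compiling it as a tiny parameterless lambda.
        // Assumption: the arguments do not use the "svc" parameter.
        object[] args = call.Arguments
            .Select(arg => Expression.Lambda(arg).Compile().DynamicInvoke())
            .ToArray();

        return Tuple.Create(call.Method, args);
    }
}
```

With this, `CallInspector.Inspect<IFoo, string>(svc => svc.Bar(12345))` hands back the MethodInfo for IFoo.Bar plus the boxed value 12345, which is everything an RPC layer would need to serialize the request. Compiling each argument is not the fastest option, but it keeps the sketch short.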

Application to Async

To take this a step further; consider Silverlight. All IO should be asynchronous. Yet we really don't want to have to start messing with Begin/End methods or events. We can use the same approach to let the caller express what they would do if running synchronously, but we'll execute it asynchronously. For example:

client.Call(svc => svc.Bar(12345),
result => Console.WriteLine(result()));

With

void Call<TResult>(
    Expression<Func<T,TResult>> operation,
    Action<Func<TResult>> callback) {...}

The idea is that we will pick the Expression apart (like before), but rather than return the result, we'll execute the task asynchronously. The caller is requested to supply a callback which we'll invoke when we know the answer. Note that we don't give them the answer - we give them a function that they can invoke to find the answer; the distinction is that this also gives us a mechanism to convey an exception (i.e. if there was a problem the call to result() will re-raise the problem).
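
To isolate just the "give them a function, not the answer" part, here is a simplified sketch. It deliberately swaps the Expression for a plain `Func<TResult>` standing in for the remote call, the `AsyncCaller` name is invented, and wrapping the failure in an InvalidOperationException is simply one way to surface the original problem when the caller invokes the function.

```csharp
using System;
using System.Threading;

public static class AsyncCaller
{
    // Runs "work" on the thread pool, then hands the callback a function that
    // either returns the result or throws to report whatever went wrong.
    public static void Call<TResult>(Func<TResult> work, Action<Func<TResult>> callback)
    {
        ThreadPool.QueueUserWorkItem(delegate
        {
            Func<TResult> result;
            try
            {
                TResult value = work();
                result = () => value;
            }
            catch (Exception ex)
            {
                // Defer the failure until the caller asks for the answer.
                result = () => { throw new InvalidOperationException("The call failed.", ex); };
            }
            callback(result);
        });
    }
}
```

The callback always gets invoked, on a worker thread; it is only when the caller actually evaluates `result()` that they either see the value or get the exception.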

This also demonstrates mixing delegates and expressions in a single operation - we don't need to know what the caller wants with the answer in order to fetch it.

Summary

I've covered a lot in a small(ish) space, so feel free to pick it apart - but I hope this helps show some of the more interesting uses of Expression and delegates, and not a single database in sight!

Wednesday 11 March 2009

Back to the coal face

With the summit now over, back to the day job - but first some thoughts (taking NDA etc into consideration):
  • It is no secret that the big theme in .NET 4.0 is the DLR; despite some C# dabbling (here, here), I need to get my head around this "proper"... to that end, I've secured a copy of IronPython in Action, which I hope will set me on the one true path. Besides which - it is good to look at other languages occasionally (and I never really got far with F#).
  • It is no secret that there are lots of changes afoot with Entity Framework and ADO.NET Data Services; now that I've had a chance to see them, I'm encouraged that they might be a lot more usable in 4.0 - still waiting on the "bits" though ;-p
  • Windows 7 continues to look very good; I've got the beta at home, and it runs very well on my ancient machine - and the newer builds look even better. Microsoft stand a good chance of actually getting some forgiveness for Vista.
  • The EMP is very, very... odd. Beer, a live-rock/karaoke, an Xbox Rock Band arena and a robot museum all in one.
Other than that - back to the day job. Oh; and planning for a few user group sessions - so if you're in the UK and near Gloucester or Taunton, then I'll be talking about Expression. Feel free to say hi ;-p I also seem to have become involved in the "Balloon Debate" at the latter, so you can watch me try to defend the very existence of C#.