Friday, 19 December 2008

Astoria and LINQ-to-SQL; batching and replacing

So far, we have been doing individual updates - but if we perform multiple updates, each is currently sent as a separate request. We can do better; if the server supports it (and ADO.NET Data Services does), we can use batch mode to send multiple requests at once, and to commit all the changes in a single transaction at the server. Note that this is far preferable to the alternative of using a distributed transaction to span multiple requests. Fortunately, this latter option is not supported under ADO.NET Data Services: it simply doesn't scale, and besides, there is no way to convey this intent over a simple REST request, which follows the YAGNI approach of keeping things simple.

Using Batches

The decision to use batches is taken at the client - for example, a non-batched update might be written:

var emp = ctx.Employees.Where(x => x.EmployeeID == empId).Single();
var boss = ctx.Employees.Where(x => x.EmployeeID == bossId).Single();

emp.BirthDate = boss.BirthDate = DateTime.Now;
ctx.UpdateObject(boss);
ctx.UpdateObject(emp);

ctx.SaveChanges();

The save performs 2 HTTP requests (one MERGE per record, on top of the 2 GETs for the queries), and involves 2 data-contexts / SubmitChanges calls at the server - i.e. adding some logging shows:

GET: http://localhost:28601/Restful.svc/Employees(1)
GET: http://localhost:28601/Restful.svc/Employees(2)
MERGE: http://localhost:28601/Restful.svc/Employees(2)
3: D 0, I 0, U 1 ## the 3rd data-context usage, doing a single update
MERGE: http://localhost:28601/Restful.svc/Employees(1)
4: D 0, I 0, U 1 ## the 4th data-context usage, doing a single update

Changing the last line at the client makes a big difference:

ctx.SaveChanges(SaveChangesOptions.Batch);

with the trace output:

GET: http://localhost:28601/Restful.svc/Employees(1)
GET: http://localhost:28601/Restful.svc/Employees(2)
POST: http://localhost:28601/Restful.svc/$batch ## note different endpoint and POST not MERGE
3: D 0, I 0, U 2 ## 3rd data-context usage, doing both updates

This usage can make the API far less chatty. However, it raises the issue of what to do if the second update fails (concurrency, blocking, etc). The answer lies in one of our few remaining IUpdatable methods. In fact, in batch mode (but not in single-update mode), errors in SaveChanges normally cause the ClearChanges method to be invoked, giving us a chance to cancel our changes. As it happens, LINQ-to-SQL uses transactions by default anyway, so our database should still be in a good state. Entity Framework, at this point, detaches all the objects in the change-set, but LINQ-to-SQL doesn't offer this option; as a substitute, we can take this opportunity to prevent any further usage of a data-context that is in an unexpected state:

public static void ClearChanges(DataContext context)
{   // prevent any further usage
    context.Dispose();
}

In other scenarios, this would be an opportunity to undo any changes.
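
From the client's point of view, a rejected batch shows up as an exception from SaveChanges. The following is only a minimal sketch of how a caller might report it, assuming the standard System.Data.Services.Client types; the BatchSaveSketch class and TrySaveBatch method are hypothetical names invented for illustration, not part of the code in this series:

```csharp
using System;
using System.Data.Services.Client;

// Hypothetical helper (illustration only): wraps a batched save and
// reports any errors the service sent back.
public static class BatchSaveSketch
{
    // Returns true if the batch was accepted, false if the service rejected it.
    public static bool TrySaveBatch(DataServiceContext ctx)
    {
        try
        {
            // one POST to $batch instead of one request per change
            ctx.SaveChanges(SaveChangesOptions.Batch);
            return true;
        }
        catch (DataServiceRequestException ex)
        {
            Console.WriteLine("Batch save failed: " + ex.Message);

            // the response (if any) enumerates the individual operations
            DataServiceResponse response = ex.Response;
            if (response != null)
            {
                foreach (OperationResponse op in response)
                {
                    if (op.Error != null)
                    {
                        Console.WriteLine(op.StatusCode + ": " + op.Error.Message);
                    }
                }
            }
            return false;
        }
    }
}
```

With the earlier example, the last line would become something like: if (!BatchSaveSketch.TrySaveBatch(ctx)) { /* re-query and retry, or give up */ }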

Replacing Records

ADO.NET Data Services also provides another option for submits - replace-on-update. The difference here is that normally the data uploaded from the client is merged with the existing row, whereas with replace-on-update, the uploaded row effectively *replaces* the existing one. Importantly, any values not sent in the upload are reset to their default values (which might make a difference if the server knows about more columns than the client). At the REST level, the HTTP verb is "PUT" instead of the "MERGE" or "POST" we saw earlier.

At the client, the only change here is our argument to SaveChanges:

ctx.SaveChanges(SaveChangesOptions.ReplaceOnUpdate);

Note that, used on its own like this, there is no batching; a request is made per record. At the server, this maps to the final IUpdatable method - ResetResource. This method is responsible for clearing all data fields except the identity / primary key etc (since we still want to write over the same database object). Since LINQ-to-SQL supports multiple mapping schemes (file-based vs attribute-based), we need to ask the MetaType for help here; we'll create a new vanilla object, and use its default property values to reset the actual object:

public static object ResetResource(DataContext context, object resource)
{
    Type type = resource.GetType();
    object vanilla = Activator.CreateInstance(type);
    MetaType metaType = context.Mapping.GetMetaType(type);
    var identity = metaType.IdentityMembers;
    foreach (MetaDataMember member in metaType.DataMembers)
    {
        if (member.IsPrimaryKey || member.IsVersion || member.IsDeferred
            || member.IsDbGenerated || !member.IsPersistent || identity.Contains(member))
        {   // exclusions
            continue;
        }
        MetaAccessor accessor = member.MemberAccessor;
        accessor.SetBoxedValue(ref resource, accessor.GetBoxedValue(vanilla));
    }
    return resource;
}
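
To see which members the exclusion test above would skip, it can help to dump what the MetaType reports for a small entity. This is a rough sketch under stated assumptions: DemoEmployee is a made-up, attribute-mapped class (not the real Employee type from this series), and the model is built directly from an AttributeMappingSource so that no database connection is needed:

```csharp
using System;
using System.Data.Linq;
using System.Data.Linq.Mapping;

// Hypothetical entity, for illustration only.
[Table(Name = "Employees")]
public class DemoEmployee
{
    public DemoEmployee() { Country = "UK"; } // a non-trivial "default"

    [Column(IsPrimaryKey = true)]
    public int EmployeeID { get; set; }

    [Column]
    public string Notes { get; set; }

    [Column]
    public string Country { get; set; }
}

public static class ResetResourceSketch
{
    public static void Main()
    {
        // build the mapping model without touching a database
        MetaModel model = new AttributeMappingSource().GetModel(typeof(DataContext));
        MetaType metaType = model.GetMetaType(typeof(DemoEmployee));

        foreach (MetaDataMember member in metaType.DataMembers)
        {
            // same flags as the exclusion test in ResetResource
            bool skipped = member.IsPrimaryKey || member.IsVersion
                || member.IsDeferred || member.IsDbGenerated
                || !member.IsPersistent;
            Console.WriteLine("{0}: {1}", member.Name,
                skipped ? "left alone" : "reset to default");
        }
    }
}
```

Under these assumptions, the key (EmployeeID) should be reported as left alone and the other two columns as reset. Note that "default" here means whatever a freshly constructed object holds, so Country would go back to the constructor's value rather than to null.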

Summary

Breathe a sigh; we have now completely implemented IUpdatable, covering all of the raw CRUD operations required to underpin the framework, and all without too much inconvenience. Hopefully it should be fairly clear how this could be applied to other implementations. I promised to return to inheritance when creating records (and I will), and I'll also explore some of the other features of ADO.NET Data Services.