Friday 15 November 2013

Allocation, Allocation, Allocation

Foreword

If you read nothing else in this post, please at least read the bit about params; thanks. I’m going to talk about a number of things that cause unforeseen allocations, and then discuss possible options for avoiding them. Some of the options presented are intentionally exotic and esoteric, and are not intended to be used directly – they are for the purposes of discussion only.

Additional: Blogger and LiveWriter seem to be squabbling about the layout; my apologies for any spacing issues

Back story

Today I rant about allocations; “but objects are cheap!” I hear you cry – and indeed you’d be right. So is an individual coffee bean, and yet coffee is one of the most highly traded commodities in the world (next to crude oil, IIRC). Any Java joke in here is coincidental. Use something routinely enough, and it is amazing how it adds up. On 64 bit, even an empty object is pretty big.
.NET has a first class garbage collector, and it is constantly improving, but on a busy system memory concerns can be very noticeable and very real. Adding more memory to the box helps a bit, but then when garbage collection does eventually happen, it can be even more noticeable.
So; I get very excited about allocations. Perhaps ridiculously so. I want to be very clear: I don’t mind using objects, and I don’t mind using memory. Simply – I want to be using them for useful work. And a lot of things… just aren’t. Collecting a few hundred generation-zero objects might be cheap, but do you know what is cheaper? Not having to. Perhaps my perspective is also skewed a bit by the fact that I work on a lot of library / utility code, and have reached the conclusion that no library should ever be the cause of unexpected / unnecessary overhead. Application code is a bit different: in application code you should be writing domain logic about your application – that is useful work and is free to do whatever it wants.
Enough background; let’s look at an example; don’t worry, we’ll pick an easy one…

string prefix = // something variable
foreach (var line in File.ReadAllLines(somePath))
foreach (var token in line.Split(';').Where(s => s.StartsWith(prefix)))
{
    Console.WriteLine(token);
}

Fairly innocent looking file processing; but let’s pull it apart for allocations – I’ll list them first, and discuss each in turn:


  • ReadAllLines creates a vector containing each line, all at once
  • We get an enumerator over the vector of lines
  • We get a capture-context over prefix
  • Then for every line:

    • We create and populate a new vector of characters, length 1 – for the Split call
    • Split allocates a vector of the results
    • The first iteration only (in this case, but can vary) creates a delegate to the hoisted predicate on the capture-context (the lambda)
    • Where allocates an enumerable/iterator (some fancy details make the same object function as enumerable and iterator most of the time)

Which might not sound too bad at all, but if you know (by usage, profiling, memory dumps etc) that this code is used all the time, then you really might want to think about how many of these are actually useful.

The obvious but uninteresting problems


There are no prizes for spotting these…

ReadAllLines


Yeah, not much to say here other than: use ReadLines instead, which gives you a spooling iterator over the lines, rather than reading everything up front. Not what I want to focus on today, really.

foreach over a vector


You need to be a little careful over-interpreting foreach on a vector – I suspect the JIT might do more than you might imagine here, so I’m not going to get too excited about this. If we did still have a vector (see above), then a naked for might be cheaper, but I wouldn’t rely on it – again, not my focus.

The capture-context and delegate


Did that lambda really make the code easier to read than, say, an if test (with either a nested block or a continue)? LINQ is great, but can be over-used… and in non-trivial code the captures can be surprisingly nuanced, often requiring a new capture-context and delegate instance on every iteration.
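
For example, the Where in the opening snippet could simply become the following – no capture-context, no delegate (this is essentially the rewrite shown in the final code at the end of this post):

foreach (var token in line.Split(';'))
{
    // plain test instead of a lambda: nothing to capture, nothing to allocate
    if (!token.StartsWith(prefix)) continue;
    Console.WriteLine(token);
}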

The less obvious, but more interesting (IMO) problems


There are also no prizes for spotting these, but give yourself a quiet cheer if you did.

The params array to Split

This is one that often gets overlooked; the overload of Split being used here takes a params char[]; there is no overload that takes a single char. Most usages of Split that I see only use a single char parameter (or maybe two). Since .NET vectors are mutable, the compiler can’t trust the code inside the method not to change the values, so it explicitly does not hoist the array away onto a static field somewhere. It would be nice if it could, but there are a lot of “const correctness” problems that would need to be resolved for that to work – so it needs to allocate a new vector every single call – which is often inside loops within loops. In the interim, I posit that there are some sorely needed overloads missing here! I strongly recommend that you do a search for “.Split(“ in your code and see how many times you use it – especially in loops. I tend to create a utility class (imaginatively named StringSplits), and use things like
line.Split(StringSplits.Semicolon), where that is defined as static readonly char[] Semicolon = {';'};
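
In full, such a utility class is trivially small – something like (extend it with whatever separators your codebase actually uses):

static class StringSplits
{
    // allocated once, reused for every Split call
    public static readonly char[] Semicolon = { ';' };
    public static readonly char[] Comma = { ',' };
}
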
This change is so staggeringly simple, but removes so many allocations that are entirely overhead. Imagine what your CPU could be doing instead of collecting all those vectors!

The result of Split

Yet another vector. But really… do we need that? A lot of uses of Split are going to involve things like foreach, so maybe an IEnumerable<string> (presumably via an iterator block) would be more efficient. But, then we still need to allocate an enumerator etc. Or do we? A lot of people mistakenly think that foreach is tied to the IEnumerable[<T>]/IEnumerator[<T>] interfaces. It is indeed true that the compiler can use these, but actually the compiler looks first for a duck-typed API:


  • is there a method called GetEnumerator() that returns some result…

    • …that has a bool MoveNext() method
    • …and a .Current property with a get accessor?

If so: that is what it uses. Historically, this allowed for typed iterators that didn’t need to box their values, but it also allows the iterator itself to be a value-type. This is actually very common – List<T> and Dictionary<TKey,TValue> use this approach – which means when you iterate a List<T> directly, there are zero allocations. If, however, you use the IEnumerable[<T>]/IEnumerator[<T>] APIs, you force the iterator to become boxed. And keep in mind that LINQ uses the IEnumerable[<T>]/IEnumerator[<T>] APIs – so code like our lazy Where shown above can have multiple unexpected consequences.

If we wanted to, we could write a Split2 (naming is hard) extension method that took a single char, and returned a value-typed enumerator / iterator; then this entire chunk would have zero overhead allocations. It would still have the allocations for the substrings, but that is useful data, not overhead. Unfortunately the compiler doesn’t help us at all here – if you want to write value-typed iterators you need to do it all by hand, which is quite tedious. I’ve tested it locally and it works great, but I’m not actually presenting this as a serious “thing to do” – the intent here is more to get people thinking about what is possible.
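
To make that concrete, here is a rough sketch of what a hand-rolled value-type iterator might look like (Split2 and the type names here are mine, and this ignores details like StringSplitOptions – it illustrates the duck-typed pattern, nothing more):

public struct StringSplitEnumerable
{
    private readonly string value;
    private readonly char separator;
    public StringSplitEnumerable(string value, char separator)
    {
        this.value = value;
        this.separator = separator;
    }
    // no interfaces involved: foreach finds this via the duck-typed
    // pattern, so the iterator is never boxed - zero heap allocations
    public StringSplitEnumerator GetEnumerator()
    {
        return new StringSplitEnumerator(value, separator);
    }
}
public struct StringSplitEnumerator
{
    private readonly string value;
    private readonly char separator;
    private int offset;
    private string current;
    public StringSplitEnumerator(string value, char separator)
    {
        this.value = value;
        this.separator = separator;
        offset = 0;
        current = null;
    }
    public string Current { get { return current; } }
    public bool MoveNext()
    {
        if (offset > value.Length) return false;
        int index = value.IndexOf(separator, offset);
        if (index < 0) index = value.Length;
        current = value.Substring(offset, index - offset);
        offset = index + 1;
        return true;
    }
}
public static class StringSplitExtensions
{
    public static StringSplitEnumerable Split2(this string value, char separator)
    {
        return new StringSplitEnumerable(value, separator);
    }
}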

What about the substrings?


Another interesting feature of Split is that everything it produces is pure redundant duplication. If I split “abc,def,ghi” by commas, then I now have 4 strings, 3 of which are exact sub-strings of the original. Since strings in .NET are immutable (at least, to the public), it is alarming how close this is to being ideal, while being so far away in reality. Imagine if there was a Substring struct – a tuple consisting of a string reference, the offset to the first character, and the length – with equality and comparison support etc. Unfortunately, without direct support from the BCL in things like StringBuilder, TextWriter, etc (so that they could copy in values without creating a new string) it wouldn’t be very useful. What would be even nicer is if we didn’t need such a thing in the first place, and could simply use string itself to represent internal sub-strings – but unfortunately the internal implementation details don’t really allow for that: it isn’t the case that the string type contains a pointer to the actual data; rather, the string instance is magically variable-sized, and is the data. Again, I don’t have a magic wand here, other than highlighting where hidden allocations can come from.
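
Purely as a thought experiment, such a struct might start out like this (my sketch – no such type exists in the BCL):

public struct Substring : IEquatable<Substring>
{
    private readonly string value;
    private readonly int offset, length;
    public Substring(string value, int offset, int length)
    {
        this.value = value;
        this.offset = offset;
        this.length = length;
    }
    public int Length { get { return length; } }
    public char this[int index] { get { return value[offset + index]; } }
    // only allocate a real string when explicitly asked to
    public override string ToString()
    {
        return value.Substring(offset, length);
    }
    public bool Equals(Substring other)
    {
        if (length != other.length) return false;
        for (int i = 0; i < length; i++)
        {
            if (this[i] != other[i]) return false;
        }
        return true;
    }
    // (Equals(object), GetHashCode, comparison support etc omitted)
}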

Why do I care about all of this?


Because often, memory profiling shows that vast amounts of our memory are swamped with silly, unnecessary allocations. Of particular note in some recent Stack Exchange memory dumps were massive numbers of string[] next to the same recurring set of strings. After some digging, we recognised them from [OutputCache] parameters. Some of these we could remove ourselves by tweaking our GetVaryByCustomString; in our case, we were actually mainly interested in “does the tokenized string contain this token, under the usual split logic“ – and of course you don’t actually need to split the string at all to do that – you can just check for containment, paying special attention to the previous/next characters. Some more of these allocations were coming from ASP.NET MVC (and I hope to send a pull-request over shortly); and yet more are coming from raw ASP.NET internals (Page, IIRC – which has the low-level output caching implementation). The overhead we were seeing is from Split, and pretty much all of it is avoidable if the code is written with a view to allocations. Since we make extensive use of caching, we were perhaps paying disproportionately here.
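
To illustrate the idea (my sketch here, not the actual Stack Exchange code), a split-free containment check might look like:

static bool ContainsToken(string value, string token, char delimiter)
{
    int index = 0;
    while ((index = value.IndexOf(token, index, StringComparison.Ordinal)) >= 0)
    {
        // only a real match if it is flanked by the start/end of the
        // string or by the delimiter - i.e. it is a whole token
        bool startOk = index == 0 || value[index - 1] == delimiter;
        int end = index + token.Length;
        bool endOk = end == value.Length || value[end] == delimiter;
        if (startOk && endOk) return true;
        index++;
    }
    return false;
}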

But if your server’s memory is being dominated by unnecessary overhead: that is memory that can’t be used for interesting data, like questions and answers.

Oh, and the final code


string prefix = // something variable
foreach (var line in File.ReadLines(somePath))
foreach (var token in line.Split2(';'))
{
    if(token.StartsWith(prefix))
    {
        Console.WriteLine(token);
    }
}

Sunday 8 September 2013

Fun with immutable collections

 

Yesterday, somebody asked me whether I was going to add protobuf-net support for the Microsoft Immutable Collections package. So far, I’ve been following their announcements, but not really dabbling with them too much – but, it looks pretty close to complete, so I guess I’d better take a peek.

Context: What is Microsoft Immutable Collections?

The Immutable Collections package includes a set of new externally immutable collection types. In particular, having immutable types makes it much easier to avoid issues with threading and concurrency, which is ever-increasingly important whether that is because you want to exploit all the cores on a powerful box, or because the framework you are using demands that you do as much as you possibly can in asynchronous methods. Of course, it also makes it harder to mess up values unexpectedly (some method in the 7th level of code hell unexpectedly calling Clear(), for example) – so win/win.

So how does that impact serializers?

The up-side of all of this is that you can expose now-immutable collections without needing to worry about changes; the down-side is that code that depends on changes will need some love – and in particular, this is bad news for things like serializers. The “serialize” step isn’t hugely impacted – we can still walk the data writing the elements to the output stream – but the “deserialize” step is shot down in flames. A basic deserialize-a-list implementation is something like:

IList collection = (IList)(value
    ?? new SomeDefaultCollectionType());
while(CanReadNextSubItem())
{
    object newItem = ReadNextSubItem();
    collection.Add(newItem);
}
return collection;

(note that the actual code is likely to be more complex to account for the various edge-cases and optimisations that most people really don’t want to ever have to know about)


The problem here is that the immutable collections only kinda claim to support this: they expose the familiar collection interfaces, but raise exceptions at runtime if you try to mutate them through those interfaces. So: we need to identify the immutable collections and accommodate them. It would be ironic if I said “nope, I don’t want to support that” – because in most protobuf implementations the tool-generated objects and collections are inherently immutable.


What is the immutable collection API?


Obviously, the new / Add() approach isn’t going to work. If we take a look at the package, there are basically two usage patterns in play. Here are two different ways of creating an immutable list (the other collection types are pretty similar):


Example 1

var list = ImmutableList.Create<string>();
list = list.Add("first");
list = list.Add("second");
list = list.Add("third");

Example 2

var builder = ImmutableList.CreateBuilder<string>();
builder.Add("first");
builder.Add("second");
builder.Add("third");
var list = builder.ToImmutable();

The first example shows how you might make occasional changes; note that list becomes a different value after each call, which is how it achieves immutability – the old collection (which could be still being used elsewhere) remains unchanged. Optimisations help make this as inexpensive as possible, hopefully allowing the caller not to need to know much about it.


The second example shows how you might implement it if you know you are doing a block of changes – basically, the builder API allows it to avoid the overheads of immutability for all the intermediate steps, only worrying about that when we call ToImmutable().


Looking at these, then, it feels like the most obvious contender for a serializer is to try to identify the builder API. This will also avoid confusion with some pre-existing list implementations that have Add() methods that return a non-void result.


The implementation


I don’t like hard-coding to specific interfaces - at least, not from the meta-programming layer. There are various reasons for this, including:



  • I don’t want to force an additional reference on people
  • I don’t want to be tied to a particular version of an in-progress API
  • I don’t want to explicitly implement versions for every immutable collection that gets added, and its interface-based twin
  • sometimes, the meta-programming layer is executing against an entirely different framework (for example, generating a dll that will execute on Windows Phone) – external dependencies are thus to be avoided like the plague!

so I prefer to identify the pattern – not the specific implementation.


If you take a look at the API in the current library, we can see:



  • concrete types like ImmutableList<T>
  • which implements IEnumerable<T>, possibly with a custom iterator type, and which may have a paired IImmutableList<T> interface
  • and which implements IReadOnlyCollection<T>
  • which has a utility class ImmutableList, with methods like CreateBuilder()
  • where the builder has Add(), ToImmutable(), and possibly AddRange() methods

Which is actually a pretty specific set of stuff to identify a pattern. Note that the custom iterator type is a handy optimisation used by ImmutableArray<T> (which is a struct) to avoid boxing; fortunately protobuf-net already knows all about custom iterators, so we don’t have to change a single line of the serialize code. What I do, then, is:



  • firstly – everything here is meta-programming – identifying the pattern via reflection, and then typically baking IL to execute it as fast as possible
  • assume the existing code recognises that it is vaguely list-like
  • then we check that it is a generic type that implements some IReadOnlyCollection<T>
  • if it implements that, it might be a candidate – so we look for the non-generic utility type of the same name in the same namespace, taking into account the interface abstractions (so IImmutableSet<T> actually maps to ImmutableHashSet – that is the only hard-coded exception I needed to add)
  • resolve the CreateBuilder() method, and identify the methods available (Add(), AddRange(), ToImmutable())
  • generate a deserialize loop using those methods (a simplified sketch of the detection step follows below)

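To give a flavour of that detection step, here is a heavily simplified sketch (the real protobuf-net code handles far more edge-cases, including the set/dictionary variants):

static MethodInfo TryResolveCreateBuilder(Type collectionType)
{
    if (!collectionType.IsGenericType) return null;
    // "ImmutableList`1" => "ImmutableList"; interface twins drop the "I"
    string name = collectionType.Name;
    name = name.Substring(0, name.IndexOf('`'));
    if (name.StartsWith("IImmutable")) name = name.Substring(1);
    // (the IImmutableSet<T> => ImmutableHashSet special-case is omitted here)
    Type utilityType = collectionType.Assembly.GetType(
        collectionType.Namespace + "." + name);
    if (utilityType == null) return null;
    foreach (MethodInfo method in utilityType.GetMethods(
        BindingFlags.Public | BindingFlags.Static))
    {
        if (method.Name == "CreateBuilder" && method.IsGenericMethodDefinition
            && method.GetParameters().Length == 0)
        {
            // close the generic method over the collection's type arguments
            return method.MakeGenericMethod(
                collectionType.GetGenericArguments());
        }
    }
    return null;
}
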
So comparing back to our original loop, we now have (assuming we don’t care about existing values, which makes it trickier) something like:

var builder = ImmutableList.CreateBuilder<SomeType>();
while(CanReadNextSubItem())
{
    SomeType newItem = ReadNextSubItem();
    builder.Add(newItem);
}
return builder.ToImmutable();

Conclusion


I hope that has served as an interesting introduction to the immutable types, but coming at it from the angle of “identifying the abstract pattern” rather than “working with the API directly”. The trunk of protobuf-net (r666) now has support for immutable lists, arrays, dictionaries, hash-sets, sorted-sets, sorted-dictionaries - and all of their interface twins. But all in a single pattern recognition block.

Thursday 25 April 2013

Fun with dynamic - how to build a dynamic object

The project: build a redis client

The dynamic feature in C# 4 is powerful when used appropriately. I thought it was about time I wrote a piece on how to do that. And … because I like redis so much I thought it would make a perfect example.

Now, the observant among you may be thinking “but Marc, you’ve already written a redis client” – in which case I agree (and incidentally I congratulate you on getting hyperlinks into speech – no mean feat); but that isn’t the point! This toy client isn’t meant to compete: BookSleeve is heavily optimised to allow really fast and efficient usage. This one is just for fun. If you like it and it works for you, great! All the code for this is available on google-code.

What we want

I want to be able to do things like:

// increment and fetch new
int hitCount = client.incr("hits");
// fetch next, if any (else null)
string nextWorkItem = client.lpop("pending");

but without the client knowing anything about redis except the binary wire protocol – those commands are entirely dynamic. Actually a nice advantage of this is that the client doesn’t need to be updated as new redis features are released… but I digress!

Getting started

The key in implementing a dynamic API is implementing IDynamicMetaObjectProvider – although frankly I don’t propose doing that; I’m just going to subclass DynamicObject which does a lot of the work for us. So here's our first step:

public sealed class RedisClient : DynamicObject, IDisposable {...}

This gives us the start of a client that will respond to dynamic; although it doesn't actually do anything yet - we have to tell our object to handle dynamic method calls, which we do by overriding TryInvokeMember. Again, keep in mind that this is only a toy, and we’ll do this by simply writing the command name and parameters (in redis format) down the wire, and reading one result in redis format (note that this means that we’ll pay the full latency price per operation, which isn’t ideal – and that we can’t act as a redis pub/sub subscriber – that would simply not work, since replies don’t match neatly to commands then):

public override bool TryInvokeMember(
    InvokeMemberBinder binder,
    object[] args, out object result)
{
    WriteCommand(binder.Name, args);
    result = ReadResult();
    var err = result as RedisExceptionResult;
    if (err != null) throw err.GetException();
    return true;
}

In particular, note that the name of the method requested is available as binder.Name, and the parameters are just an object[].

Writing the command

I won't dwell on the redis protocol details (feel free to read the specification), but basically we need to write the number of arguments for the command (where the command-name itself counts as an argument), followed by the command-name, followed by each of the parameters. To avoid packet-fragmentation, we’ll use some buffering into a BufferedStream which we hold in outStream, which in turn writes to a NetworkStream which we hold in netStream - and obviously after each command we need to flush those to ensure they get to the server, so we get:

private void WriteCommand(string name, object[] args)
{
    WriteRaw(outStream, '*');
    WriteRaw(outStream, 1 + args.Length);
    WriteEndLine();
    WriteArg(name);
    for (int i = 0; i < args.Length; i++)
    {
        WriteArg(args[i]);
    }
    // and make sure we aren't holding onto any data...
    outStream.Flush(); // flushes to netStream
    netStream.Flush(); // just to be sure! (although this is a no-op, IIRC)
}

And for each argument, we need to write the length of the data, followed by the data itself. I won't detail each of the various formats for different types of data, but: to avoid having to test "what type of data is this?", we'll cheat by having a few overloads of a method we'll call WriteRaw, and use dynamic to get the runtime to pick between the overloads for us... sneaky:

private void WriteArg(object value)
{
    // need to know the length, so: write to our memory-stream
    // first
    buffer.SetLength(0);
    WriteRaw(buffer, (dynamic)value);
    // now write that to the (buffered) output
    WriteRaw(outStream, '$');
    WriteRaw(outStream, (int)buffer.Length);
    WriteEndLine();
    WriteRaw(outStream, new ArraySegment<byte>(buffer.GetBuffer(), 0, (int)buffer.Length));
    WriteEndLine();
}

Did you spot the cheeky dynamic in there? Since we're already in dynamic-land, it is hard to say that this is going to have any negative impact... so; why not? It means that if I need to support a new data-type, I just add a new WriteRaw to match, and: job done - and frankly, that's about it for sending the data.
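
For completeness, the overload set might look something like this (a sketch – the real code needs a few more types):

private static void WriteRaw(Stream stream, char value)
{
    stream.WriteByte((byte)value); // protocol punctuation is ASCII
}
private static void WriteRaw(Stream stream, int value)
{
    // the redis protocol sends numbers as ASCII text
    WriteRaw(stream, value.ToString(CultureInfo.InvariantCulture));
}
private static void WriteRaw(Stream stream, string value)
{
    byte[] bytes = Encoding.UTF8.GetBytes(value);
    stream.Write(bytes, 0, bytes.Length);
}
private static void WriteRaw(Stream stream, ArraySegment<byte> value)
{
    stream.Write(value.Array, value.Offset, value.Count);
}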

Reading the response

After that, we need to refer to the specification again to see what replies look like - it turns out that they're basically the same format as the outbound data. But we have some ambiguity - does the user want their data as a byte[]? or as a decoded string? or maybe they want int? The nice thing is: we can let them tell us, by providing a result that supports conversions via dynamic - so when they type:

byte[] blob = client.get("my_image");
string name = client.get("name");

they get the binary and text correctly. So we can subclass DynamicObject again for a class that holds the raw result, and override another method - TryConvert this time. We get passed in a different binder, this time with access to the requested type in binder.Type - which we can then use to unscramble the data accordingly. This implementation is less interesting and more tedious (testing the different requested types), so I’ll leave most of it out of the blog. The only thing left to do (as shown in the TryInvokeMember) is to check if the response is an error-message from the server, and turn that into a thrown Exception, so that it feels intuitively .NET. The reason I can’t do this directly when reading the reply comes down to some implementation details – some replies are themselves composed of multiple nested replies, and I want to re-use the code internally for reading those. We can’t do that if it throws when hitting an error a few levels down – the stream could be left in an incomplete state (i.e. we might not have finished reading the outer-most reply).
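
Purely to give the flavour of the conversion step, a trimmed-down TryConvert might start out like this (a sketch – rawValue is a hypothetical field holding the reply’s payload bytes):

public override bool TryConvert(ConvertBinder binder, out object result)
{
    if (binder.Type == typeof(byte[]))
    {
        result = rawValue; // hand back the raw payload
    }
    else if (binder.Type == typeof(string))
    {
        result = Encoding.UTF8.GetString(rawValue);
    }
    else if (binder.Type == typeof(int))
    {
        result = int.Parse(Encoding.UTF8.GetString(rawValue),
            CultureInfo.InvariantCulture);
    }
    else
    {
        return base.TryConvert(binder, out result);
    }
    return true;
}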

Summary

And: that's it! A redis client written from scratch in a little over an hour; but more importantly, a complete dynamic API illustration. Well, maybe not complete, but as you can imagine: the other available operations (properties, indexers, operators, etc) work very similarly. I hope it is illustrative. Again, all the code for this is available on google-code.

Monday 25 February 2013

How many ways can you mess up IO?

There are so many interesting ways that people find to make a mess of reading binary data, but actually it doesn’t need to be hard. This post is intended to describe some of the more common misunderstandings people have.

Text encodings: what they are not

Text is simple, right?

Text encodings are great; you use them (whether you know it or not) all the time, whenever your computer opens any kind of text file, or downloads text (such as the ubiquitous html and json) from the internet. If you don’t know what an encoding is, then first: go and read Joel’s treatise on the subject. No really, go read that and then come back.

So now you know: an encoding is a way of working with character data (you know... words etc) over a binary medium (such as a file on disk, or a network socket). Common encodings are things like UTF-8, UTF-7, UTF-16 (in either endian-ness), UTF-32, or “Windows-1252” - however, there is a wide range of encodings available. Some (in particular the UTF-* encodings) can handle most any unicode character, but many are restricted to the characters used most commonly in a particular locale. Ultimately, an encoding defines a map between characters and bytes, for example defining that “abc” should be represented as the bytes 00-61-00-62-00-63 (as big-endian UTF-16 does – and equally, that those bytes can be interpreted as the text “abc”).

Here’s the thing an encoding is not: it is not a way to turn arbitrary bytes into text for storage. You would be amazed how often people try this, but: that simply doesn’t work. If an encoding tries to read something that isn’t text data, then at best: it will throw an error. At worst, it will silently corrupt the data without realizing it has made a mistake. If you want to store arbitrary binary data as text, then there are other tools for that: primarily, things like base-n. In this case, it is the text that is specially formatted. For example, we might need to convey the bytes 07-A2-00-B2-EE-02 using only text. The observant will notice that I’ve just done exactly that using hexadecimal (base-16), but that it took 17 characters (12 without the dashes) to represent just 6 bytes of actual data. A common alternative to reduce this overhead is base-64, which uses characters that avoid the “control characters” likely to cause problems, while also staying in the 0-127 range, which is the most reliable region. Our 6 bytes become “B6IAsu4C” when stored as a base-64 string. Most platforms have utility methods that provide translation to and from base-64.
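
In .NET, for example, that translation is built in – using the exact bytes from above:

    byte[] raw = { 0x07, 0xA2, 0x00, 0xB2, 0xEE, 0x02 };
    string encoded = Convert.ToBase64String(raw); // "B6IAsu4C"
    byte[] decoded = Convert.FromBase64String(encoded); // the original 6 bytes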

The key in choosing between a text-encoding and something like base-64 is simple: all you need to figure out is: which of these two things can have arbitrary contents, and which is required to follow rules (otherwise it is nonsensical)?

arbitrary* text / rules-based binary: use a text encoding
arbitrary binary / rules-based text: use base-64 (or similar)

*=at least, for the characters that the encoding supports, since not every encoding supports every character.

One last thing to say about encodings: don’t use the “system default” encoding - ever. This goes back to the old days of 8-bit systems where your text console needed a code-page that fitted into a single byte but reached at least most of the characters the user was likely to need. For me, this is code-page 1252, but yours may be different. It is exceptionally rare that this is what you want; if in doubt, explicitly specify something like UTF-8. Sometimes you can use byte-order-mark detection, but this is also pretty unreliable - many files omit a BOM. The best answer, obviously, is: define and document what encoding you are planning to use. Then use it.
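
In .NET, for example, that just means always saying what you mean (path here being whichever file you are working with):

    // explicit and unambiguous - never the "system default"
    string content = File.ReadAllText(path, Encoding.UTF8);
    File.WriteAllText(path, content, Encoding.UTF8);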

Network packets: what you send is not (usually) what you get

TCP usually works as a stream of data. When you send things, it is usually not guaranteed that they will arrive at the destination in exactly the same chunks that you sent. For example, let’s say that you try to “send” 3 messages over the same socket using your chosen network library – one with 10 bytes, one with 4 bytes, and one with 8 bytes. You might think that the receiving client can call “read” 3 times and get 10 bytes, 4 bytes and 8 bytes - but it is much more interesting than that. Because it is just a stream, this is simply very unlikely to be what happens. The client could find they get 1 message with 22 bytes. Or 22 messages each with 1 byte. Or any other combination. They could get 10 bytes, 10 bytes, and the last 2 bytes never arrive (because of some network issue). All that TCP guarantees is that whatever portion of the data does make it to the receiver, it will be the correct bytes in the correct order.

Because of this, if you want to send multiple messages over the same socket it is necessary to implement some kind of “framing” protocol - i.e. some way of splitting the data back into logical pieces at the other end. How you do this is up to you, and often depends on the data being sent. For example, text-based protocols often use a “sentinel” value to split messages (possibly a carriage return, line-feed, or some other “control character” rarely seen in regular text); the characters with values 0 (“nul”) and 3 (“etx”) are also popular choices.

If your messages are binary then it is more challenging, as usually there aren’t really any safe “sentinel” bytes to choose from - so commonly some header information is sent before each message that includes (perhaps only includes) the length of the message that follows. This could be as simple as dumping the 4-byte (32-bit) or 8-byte (64-bit) native representation of the length onto the stream (although you also need to decide whether it is “big endian” or “little endian”, obviously). There are also a range of variable-length representations, for when most messages are expected to be small, but the protocol needs to allow for much longer messages. The key thing here is that you must decide how you are going to do this, and clearly document it for consumers. This of course has the unfortunate requirement that you must actually know the length of the data you want to write before you write it - not always convenient. To help with this, some protocols (for example, the more recent web-sockets protocols) allow you to further break down a single message into multiple fragments that the client must stitch back together into a single logical piece - with an additional marker (perhaps a trailing zero-length message, or a special bit set in the header message) to indicate the end of the logical message.
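
As a rough sketch (my own illustration – the fixed 4-byte big-endian prefix is just one arbitrary choice), framing and un-framing might look like this, where destination is whatever stream we are writing to, and readExact is the helper shown in the next section:

    void writeFrame(byte[] message) {
        // 4-byte big-endian length prefix...
        byte[] header = {
            (byte)(message.Length >> 24),
            (byte)(message.Length >> 16),
            (byte)(message.Length >> 8),
            (byte)message.Length
        };
        destination.write(header, 0, 4);
        // ...followed by the payload itself
        destination.write(message, 0, message.Length);
    }
    byte[] readFrame() {
        byte[] header = new byte[4];
        readExact(header, 0, 4); // readExact as per the next section
        int length = (header[0] << 24) | (header[1] << 16)
                   | (header[2] << 8) | header[3];
        byte[] message = new byte[length];
        readExact(message, 0, length);
        return message;
    }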

Learning to read

Most frameworks have a “read” API that looks something like:

    int read(byte[] buffer, int offset, int count)

(or something appropriately similar for asynchronous access). The point being that you supply a buffer (somewhere for it to put the data), tell it where in that buffer to start writing (often, but not always, 0), and how much data you want it to read. It then goes away and fetches some data for you. Here’s the rub: in most cases, the last parameter is just a maximum - it is not “get this much data”; it is “get at most this much data”. The return value tells you what it could do: this could be non-positive if no more data will ever be available (the stream has ended), it could be count - which is to say: it could fetch all the data you wanted - or it could be any other value greater than zero and less than count. Because of this, reading often involves a loop; for example, let’s say we want to write a method to read an exact number of bytes; we might do that as:

    void readExact(byte[] buffer, int offset, int count) {
        int bytesRead;
        if(count < 0) {
            throw new ArgumentOutOfRangeException("count");
        }
        while(count != 0 && 
          (bytesRead = source.read(buffer, offset, count)) > 0) {
            offset += bytesRead;
            count -= bytesRead;
        }
        if(count != 0) throw new EndOfStreamException();
    }

Dissecting that:

  • first we check that we aren’t requesting a negative amount of data, which is clearly silly
  • then in a loop, we:
    • check to see if we want more data, then try to read at most count more data
    • if the stream ends, we break out
    • otherwise, we increment our offset, so that if we need to do another read, we don’t over-stamp the data we just fetched - and decrement count, because we now have less work remaining
  • finally, we check whether we still have data outstanding, because the stream terminated before we expected to - perhaps raising an error to indicate this


This is a fairly classic pattern for reading data that shows how to process the number of bytes obtained in each iteration.

Gotchas when buffering data in memory

Sometimes it isn’t possible to process all the data in a single buffer; in many cases it becomes necessary to use an in-memory store of data that you have received but still need to process (perhaps because you need yet more data before you can make proper sense of it). A common way to do this is to make use of a “memory-stream” - an in-memory object that acts like a stream, but that simply uses local memory rather than a disk or network socket. You basically “write” to the memory-stream until you think you have something you can process, then “read” from it. But did you spot the deliberate mistake? Most memory-streams act like an old-fashioned VCR: if you record a show, hit “stop”, and then hit “play” - you will find that you are unexpectedly either watching a black screen, or something that you recorded 3 weeks ago. We forgot to rewind it. Essentially, the memory-stream has a single cursor position; after writing the data, the cursor is at the end of the data: what follows is either all zeros, or garbage from the last thing that happened to be in that space. Fortunately, rewinding a memory-stream is usually trivial:

    stream.Position = 0;

There’s a second very common confusion with memory-streams; sometimes, after writing to it you want to get the contents, but as a raw buffer (byte[]) rather than as a stream. Often, there are two different APIs for this:

  • one which gives you a brand new buffer of the current logical length of the stream, copying the data into the new buffer
  • one which hands you the oversized buffer that it is using internally - oversized so that it doesn’t need to allocate a new backing buffer every time you write to it

They both have uses; the second is often more efficient, as it avoids an allocation - but it is only useful if you also track the logical length of the data (often via stream.Length or stream.getLength()). To go back to our VCR analogy: one creates a brand new cassette of exactly the 37.5 minutes we recorded, and copies the video - the other simply hands us the existing 180 minute cassette: this is only useful if we also say “the bit you want is the first 37.5 minutes”.
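
In .NET’s MemoryStream, those two APIs are ToArray() and GetBuffer() respectively:

    var ms = new MemoryStream();
    // ... write data into ms ...
    byte[] copy = ms.ToArray();  // a right-sized copy of the contents
    byte[] raw = ms.GetBuffer(); // the oversized internal buffer...
    int length = (int)ms.Length; // ...only valid up to this many bytes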

Conclusion

This isn’t actually all that complex, but people tend to bring a lot of incorrect preconceptions to the table, which results in the same bugs – over and over and over. So let’s all save some time, and do it right. Please?