Monday, December 14, 2009

Commonly Confused Tidbits re .NET Garbage Collector

I recently read a less-than-informed blog post purporting to describe the .NET garbage collector, but when pressed to point to a better one, I couldn't find a single concise article that covered everything I thought it should.

Type and Memory Safety

We all know the gist of GC: you allocate new objects, but don't have to worry about freeing them. Combined with type safety in the language, this can give you another kind of guarantee, memory safety: that it's impossible to create a dangling pointer (a pointer to freed memory). Type safety's "safety" is compromised in the absence of memory safety. For example, in a type-safe yet non-GC language, if you allocate an object, make a copy of the reference, then free the object, you're left with a dangling pointer. If you then allocate another object of similar size, depending on the memory allocator's implementation, you may have actually created a type hole - the dangling pointer may now point to the newly allocated object, but with the wrong type. This extra safety is one of the reasons why GC programs often have fewer bugs than ones using manual allocation.
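To make that concrete, here is a minimal sketch in unsafe C# (compiled with unsafe blocks enabled), using Marshal.AllocHGlobal/FreeHGlobal to stand in for manual allocation. The names are mine, and whether the second allocation actually reuses the freed block depends on the native allocator, so the type hole is possible rather than guaranteed:

using System;
using System.Runtime.InteropServices;

class DanglingPointerSketch
{
    unsafe static void Main()
    {
        // Manual allocation: a 4-byte block treated as an int.
        int* p = (int*)Marshal.AllocHGlobal(sizeof(int));
        *p = 42;

        int* alias = p;                    // a second reference to the same block
        Marshal.FreeHGlobal((IntPtr)p);    // 'alias' is now a dangling pointer

        // A later allocation of similar size may reuse the freed block...
        float* q = (float*)Marshal.AllocHGlobal(sizeof(float));
        *q = 3.14f;

        // ...in which case 'alias' may now point at a float while being typed
        // as int*: a type hole. Reading through it is undefined.
        Console.WriteLine(*alias);

        Marshal.FreeHGlobal((IntPtr)q);
    }
}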

Precise compacting GC

.NET has a precise compacting GC. It's precise because .NET deals only with type-safe and memory-safe references into the managed heap, so it knows the precise layout of every allocated object. The opposite is a conservative GC; a conservative GC doesn't necessarily know the types of variables stored on the stack or in registers, or the layouts of those types, so if a random number on the stack, in a register or in the middle of an object happens to coincide with a valid object address, it can't collect that object. It's compacting because during a collection, it moves live (non-garbage) objects closer together, squeezing out the gaps left by dead objects which no longer have any references to them. This eliminates the problem of memory fragmentation, at least as far as the managed heap is concerned.

However, thinking of a precise compacting GC as "collecting garbage" is not always the most useful mental model. When looking at the asymptotic performance of GC, it helps to turn things around and consider that the GC collects live objects and brings them together. Garbage is just the stuff that got overwritten during the move, or the stuff left over at the end after everything has been compacted together. Why is this important? Because the cost of a precise compacting garbage collection is proportional to the number of live objects it has to trace through and move, rather than the amount of garbage it needs to collect. Compare this with manual allocation, where the cost of freeing memory is usually proportional to the number of objects freed (i.e. the amount of garbage), and something should become clear: the longer you can delay GC - i.e. the more garbage you can build up - the faster and cheaper, in per-garbage terms, GC is than manual allocation.

An example. Imagine a singly linked list with 1000 elements, with a single variable in the program referring to the head of the list. Consider the problem of truncating the list and leaving only the head behind. With manual memory allocation, all 999 trailing items need to be disposed of, with bookkeeping for each one. With garbage collection, only the first element need be moved to the start of the heap, and the free pointer (where new allocations begin) set to begin directly after it. The other 999 elements get "collected" for free.
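Here's that example as a sketch in C# (the Node type and the counts are just for illustration):

using System;

// A minimal singly linked list node; the type is illustrative.
class Node
{
    public int Value;
    public Node Next;
}

class TruncateExample
{
    static Node BuildList(int n)
    {
        var head = new Node { Value = 0 };
        var tail = head;
        for (int i = 1; i < n; i++)
        {
            tail.Next = new Node { Value = i };
            tail = tail.Next;
        }
        return head;
    }

    static void Main()
    {
        Node head = BuildList(1000);

        // Truncate: with GC, one assignment makes the 999 trailing nodes
        // unreachable. There is no per-node bookkeeping; a later collection
        // only traces and moves the single live node still referenced by 'head'.
        head.Next = null;

        // With manual memory management you would instead walk the list and
        // free each of the 999 trailing nodes individually.
        Console.WriteLine(head.Value);
    }
}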

This is why the performance of precise compacting GC is proportional to the amount of garbage it is allowed to build up between collections, and hence proportional to the amount of memory it's allowed to consume. It's also the reason why GC programs tend to use more memory than manually managed ones. And it's the reason why small single-use objects, such as function return values, are very cheap and shouldn't be unduly avoided in GC languages. Congruent with this, code in a programming language with GC is often more naturally written, easier to read and easier to write, because of the freedom with which one can return freshly allocated objects without concern for who is going to collect them, or for the performance ramifications. Understanding this point is key to understanding why GC is so productivity-enhancing.
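For instance, a helper that returns a freshly allocated result object on every call (the ParseResult type below is purely illustrative) costs little more than a pointer bump in gen0, and its garbage is swept away wholesale at the next collection:

// An illustrative small result type; in a GC language it is natural to
// return a fresh instance from every call rather than reuse a buffer.
class ParseResult
{
    public bool Ok;
    public int Value;
}

static class Parser
{
    public static ParseResult TryParse(string s)
    {
        // Allocation in gen0 is essentially a pointer bump; the instance
        // becomes garbage as soon as the caller is done with it, and dead
        // gen0 objects cost nothing to "collect".
        int value;
        bool ok = int.TryParse(s, out value);
        return new ParseResult { Ok = ok, Value = value };
    }
}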

Resources

The garbage collector is for memory, not resources. I've written about this at length in the past, so I shall quote myself:

Disposal of resources in a long-running application using GC is not a performance issue. It's a correctness issue.

Garbage collection knows about memory. With most collectors, GC is only invoked when the GC is asked for more memory than is immediately available, taking specific tuning parameters into account. In other words, the GC is only sensitive to memory pressure. It doesn't know about resources that it sees only as pointer-sized handles, it doesn't know how much they "cost", and indeed those resources might be on a different machine or even spread across many different machines.

More critically, garbage collection gets its performance surplus over and above manual memory management by not collecting until as late as reasonably possible.
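One practical consequence of that last point: an object wrapping a large unmanaged allocation looks tiny to the GC, so it creates almost no memory pressure unless you say otherwise. A minimal sketch (the NativeBuffer wrapper is invented for illustration) using the GC.AddMemoryPressure hint that .NET 2.0 added for exactly this case:

using System;
using System.Runtime.InteropServices;

// Illustrative wrapper around a large unmanaged buffer. To the GC the
// managed object is tiny, so without a hint there is no memory pressure
// to trigger a collection, no matter how many of these are kept alive.
sealed class NativeBuffer
{
    private IntPtr buffer;
    private readonly long size;

    public NativeBuffer(long size)
    {
        this.size = size;
        buffer = Marshal.AllocHGlobal((IntPtr)size);
        GC.AddMemoryPressure(size);   // tell the GC roughly what this costs
    }

    public void Free()
    {
        if (buffer != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(buffer);
            buffer = IntPtr.Zero;
            GC.RemoveMemoryPressure(size);
        }
    }
}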

In .NET, the deterministic disposal mechanism is via the IDisposable interface, and in languages like C#, the using keyword.
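For example (the file name here is just for illustration):

using System.IO;

class UsingExample
{
    static void Main()
    {
        // 'using' guarantees Dispose is called at the end of the block,
        // releasing the file handle deterministically; the memory of the
        // StreamReader object itself is still reclaimed later by the GC.
        using (var reader = new StreamReader("settings.txt"))
        {
            string firstLine = reader.ReadLine();
        }
    }
}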

Finalizer vs IDisposable

Finalizers in .NET are overrides of the method System.Object::Finalize(void). In C#, the syntax looks like a C++ destructor, but other .NET languages have other syntaxes. If this method is overridden, the .NET GC treats the object specially. After it has become garbage, rather than just overwriting it during the compaction phase of GC, the GC delays collection until the instance's Finalize method has been executed by the finalizer thread. The GC makes a best effort to call the method; if the finalizer thread is blocked (e.g. some other finalizer hung), then the finalizer might never get called. This is another reason why deterministic disposal is preferred to relying on finalizers.
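A minimal sketch of the C# syntax (the RawHandle class and its release logic are invented for illustration; real code would use P/Invoke or a SafeHandle):

using System;

// Illustrative wrapper that directly owns an unmanaged handle.
class RawHandle : IDisposable
{
    private IntPtr handle;   // imagine this came from some native API

    // C# finalizer syntax: compiled into an override of Object.Finalize.
    // It runs on the finalizer thread, on a best-effort basis, and only
    // if Dispose was never called.
    ~RawHandle()
    {
        ReleaseHandle();
    }

    public void Dispose()
    {
        ReleaseHandle();
        // The handle is already released; there is no need for the GC to
        // keep the object around for the finalizer thread.
        GC.SuppressFinalize(this);
    }

    private void ReleaseHandle()
    {
        if (handle != IntPtr.Zero)
        {
            // e.g. CloseHandle(handle) via P/Invoke in real code
            handle = IntPtr.Zero;
        }
    }
}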

Finalizers should only be used by objects that directly encapsulate an unmanaged resource, such as a file handle, a window handle, a socket, etc. A file stream, a window object, or a network stream should not normally implement a finalizer. IDisposable is a different matter. IDisposable should be implemented by every class with a finalizer, and by every class that logically owns (transitively) any other object that implements IDisposable. Calling Dispose on the root of the ownership tree should ripple through and call Dispose on everything beneath it; ownership trees and other deterministic lifetime approaches still exist with GC precisely because GC is just for memory, not resources. Most objects neither own nor encapsulate resources, however, so much of the burden is still lifted.
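A hedged sketch of the ownership side (the LogWriter class is invented for illustration): a class that owns another disposable, but no unmanaged resource directly, implements IDisposable without a finalizer, and its Dispose ripples down:

using System;
using System.IO;

// Owns a disposable object, but no unmanaged resource directly,
// so it implements IDisposable without a finalizer.
class LogWriter : IDisposable
{
    private readonly StreamWriter writer;   // the owned disposable

    public LogWriter(string path)
    {
        writer = new StreamWriter(path);
    }

    public void Write(string message)
    {
        writer.WriteLine(message);
    }

    public void Dispose()
    {
        // Disposing the root of the ownership tree ripples down.
        writer.Dispose();
    }
}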

Generations

I've read many posts that describe how generational GC works: very loosely, you have three generations, gen0 through gen2; allocations come from gen0; gen0 is collected when it's full; gen1 is collected if the gen0 collection didn't free enough space; gen2 is collected if gen1 didn't free enough; more memory is requested from the OS if gen2 didn't free enough; and if there still isn't enough memory, an OutOfMemoryException is thrown.

They also describe the hypothesis behind generational collections: that young objects are more likely to die (become garbage) sooner rather than later, while old objects are likely to survive longer.
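You can watch an object age through the generations with GC.GetGeneration; a small sketch (the forced collections are purely for demonstration, and the exact promotion behaviour can vary between GC flavours and versions):

using System;

class GenerationsDemo
{
    static void Main()
    {
        object obj = new object();
        Console.WriteLine(GC.GetGeneration(obj));   // 0: freshly allocated

        GC.Collect();                               // survives a collection
        Console.WriteLine(GC.GetGeneration(obj));   // 1: promoted

        GC.Collect();                               // survives again
        Console.WriteLine(GC.GetGeneration(obj));   // 2: as old as it gets

        GC.Collect();
        Console.WriteLine(GC.GetGeneration(obj));   // still 2

        GC.KeepAlive(obj);   // keep 'obj' reachable throughout the demo
    }
}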

None of this, however, captures the intuition behind why there are three generations, rather than four, or five, or some other number. Let me explain.

Most applications have a roughly constant, steady state of total allocated memory. It's made up of two kinds of objects: objects associated with currently executing code (let's call them ephemeral objects), and objects linked into a global object graph which is shared across multiple instances and iterations of executing code (let's call them persistent objects). We want to avoid tracing through the persistent objects as much as possible, as it's unlikely that any objects which have survived for a long time have become garbage. Actually, making old objects garbage is a pathology to be avoided in .NET programs, and has been described as the mid-life crisis problem.

The ephemeral objects, on the other hand, are likely to have a high proportion of garbage. But we have a problem: when we perform a garbage collection on the ephemeral objects, there'll be some live objects left over, currently being used by executing code - but we still don't want them in with the persistent objects. To solve this problem, there needs to be a generation in the middle, a half-way house, before resorting to increasing the persistent object set.

This combination of three generations is almost ideally suited to server applications which have relatively little variance in total per-request allocation and tend towards an average per-request allocation which is less than the size of gen0+gen1 per thread. If one could time per-thread collections of gen0 and gen1 correctly, in between requests, then in theory GC would be almost completely free, as there would be almost no live objects. There are complications in practice (we can't manually invoke a GC for just a single thread, for one thing), but when designing applications and servers, it's very much worth keeping the generational design of the GC in mind and working with its assumptions, rather than against them.

For example, this can mean that caching objects turns out to be a pessimization rather than an optimization, because keeping them around moves them into gen2, where it is more expensive to collect them when they get evicted from the cache than it was to create them in the first place. Depending on how expensive the objects are to create versus to keep cached, they may need to be cached permanently and updated as needed, or referred to through a WeakReference which won't keep them alive, or at worst managed manually one way or another, such as via unmanaged memory, or via slots or ranges in an array, etc.
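A sketch of the WeakReference variant (the cache shape and the "report" payload are invented for illustration): the cache no longer keeps the expensive object alive, so the GC is free to reclaim it under memory pressure instead of dragging it through gen2:

using System;

class ReportCache
{
    // The cache holds only a weak reference, so it does not keep the
    // expensive object alive; the GC may reclaim it at any time.
    private WeakReference cached;

    public byte[] GetReport()
    {
        byte[] report = cached == null ? null : (byte[])cached.Target;
        if (report == null)
        {
            report = BuildReport();            // expensive to recreate
            cached = new WeakReference(report);
        }
        return report;
    }

    private byte[] BuildReport()
    {
        return new byte[64 * 1024];            // placeholder for real work
    }
}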

14 comments:

Jamie said...

Thanks Barry, a great explanation.

Your point about the potential false "optimisation" of caching objects particularly stuck with me. I also now understand why I recently witnessed exactly this misconception concerning caching objects vs creating "fresh" objects, triggered by this post about how Twitter has huge problems with their Ruby GC (http://bit.ly/3YLr8u).

unused said...

Excellent. I have always been bugged with how much most .NET developers don't know about the GC. Now I have someplace to point them.

Javier Santo Domingo said...

Really good post, GC is a very interesting topic.

Jolyon Smith said...

"This extra safety is one of the reasons why GC programs often have less bugs than ones using manual allocation."

What's your source for this assertion?

I would put it to you that the people that make the sorts of mistakes leading to bugs that a GC will pick up are people who are going to make other sorts of mistakes too.

A GC will reduce the number of bugs in code written by such people, because it removes one bullet from the loaded gun, but those people are still going to try to shoot themselves in the foot in other ways.

A GC doesn't reduce bugs in and of itself (applying a GC to code that already contains zero memory allocation/overwrite bugs and other things that a GC would "pick up" for you, is a null-op).

But a GC does allow you to be lazy, and in my experience promoting laziness is a sure-fire way to promote carelessness, and *more* bugs, not fewer, are the inevitable result.

Barry Kelly said...

Jolyon, from past experience I know there's little win from arguing with you, so I won't expend much effort.

Briefly:

Type and memory safety combined - which do not by themselves require GC but are a lot easier to implement with GC - make several classes of bugs impossible: dangling pointer, double-free, buffer overflow, uninitialized memory writes, unsafe typecasts, etc.

Secondly, GC can in fact fix bugs in and of itself: a conservative collector can fix memory leaks in an otherwise unmodified program.

Finally, yes, a GC lets you be lazy with respect to memory allocation. This is in fact its primary benefit. Don't forget, laziness is one of the many virtues of the good programmer. Don't do something by hand when you can easily automate it. The fact that you can be lazy WRT memory means you can put more effort into other areas, such as program correctness, efficiency, maintainability, etc.

Anonymous said...

Very interesting, thanks.
I would like to read explanations of new programming concepts from other languages, regardless of whether they will appear in Delphi or not.

B said...

Barry, thanks for a nice and concise explanation of most important points about GC...

Does this discussion, published by one of the Delphi architects (yourself), mean that the potential benefits of GC might bring one (or something similar) to the native Delphi world as well?

Cheers, B.

Barry Kelly said...

zurosevic, I can't speak for future Delphi plans. Speaking personally, I'd like to see GC be an option, but there are problems with grafting GC onto a language not designed with it in mind, so I'm not sure a native Delphi (as it currently exists) + GC would really extract all the possible benefits.

Jarle Stabell said...

Jolyon, in my experience, in approximately 95% of cases, doing manual memory management is trivial. However, sometimes it can get pretty complex (think of cases where an object's lifetime is neither tied to the call stack nor owned by a single other object), making it a burden even for highly experienced developers and forcing more complex designs than would have been necessary in a language/environment with automatic GC. So even the best developers can benefit a lot from automatic GC, because it means one can use simpler (more effective) designs and have one less design issue to take care of.
Quite a few people have asserted that it is generally the best developers who benefit the most from advances in programming languages and technology, not the mediocre ones, and I believe automatic GC is one such advance. It is also an enabler for programming in a more functional style, which in many cases is vastly superior to the imperative style.

Unknown said...

Is it possible for an object to be collected by the GC while it is in scope of a running function?

Barry Kelly said...

Arseny, yes it's possible for the GC to collect objects which have existing references in variables that are still in scope, but only if the variable is not referred to in subsequent code.

For example, it is even possible for the 'this' reference in C# (equivalent to Self in Delphi) to be garbage collected even while code is executing in a method of the instance, so long as there is no reference to fields or virtual methods etc. in the remainder of the method.
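A hedged sketch of that situation (the Worker type and its handle are invented; real code would typically use GC.KeepAlive, or a SafeHandle, to extend the lifetime where it matters):

using System;

class Worker
{
    private IntPtr handle = new IntPtr(42);   // stand-in for a real native handle
    public IntPtr GetHandle() { return handle; }
    ~Worker() { handle = IntPtr.Zero; }       // would release the handle in real code
}

class EarlyCollectionSketch
{
    static void Main()
    {
        var worker = new Worker();
        IntPtr handle = worker.GetHandle();

        // After this point 'worker' is never read again, so the JIT can report
        // it as dead and the GC is free to collect it (running its finalizer and
        // invalidating 'handle') even though the variable is still lexically in scope.
        UseNativeHandle(handle);

        // Uncommenting this line keeps 'worker' reachable until here:
        // GC.KeepAlive(worker);
    }

    static void UseNativeHandle(IntPtr handle) { /* imagine a P/Invoke call */ }
}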

Anonymous said...

How about having a set of GC-safe common objects that developers could opt to use: TGCStringList, TGCList, TGCBitmap, etc.? That would reduce the overall use of try/finally/free by about 90%.

Anonymous said...

I think somebody has suggested this idea before. How about adding a keyword for out-of-scope GC objects?

var
aStrings: TStringList gc;
aList: TList gc;
aBitmap: TBitmap gc;

This method doesn't need a GC service to be running in the background at all. Of course, the user would not be allowed to return these objects as a function result (compile error?).

Barry Kelly said...

You simply can't embed "GC-ness" in the type or the variable declaration. All variables (whether they are of a GC type or not, or whether they are of the appropriate declaration or not) need to be findable by the garbage collector in order for it to make sure it doesn't collect an allocation prematurely. If you're going to search a non-GC heap looking for GC references into the GC heap, you've already lost a big chunk of the benefit from GC. And if you try and get around the issue by using some kind of wrapper (like GCHandle in the .NET world) to keep GC references alive from the non-GC world, you lose substantial chunks of usability.
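(For reference, the GCHandle escape hatch looks roughly like this; a minimal sketch with invented method names:)

using System;
using System.Runtime.InteropServices;

class GCHandleSketch
{
    static IntPtr PublishToNativeCode(object managed)
    {
        // A GCHandle registers the reference with the GC so the object is
        // kept alive even though no managed variable refers to it.
        GCHandle handle = GCHandle.Alloc(managed);
        return GCHandle.ToIntPtr(handle);   // opaque token for the non-GC world
    }

    static void ReleaseFromNativeCode(IntPtr token)
    {
        GCHandle handle = GCHandle.FromIntPtr(token);
        object managed = handle.Target;     // recover the managed object if needed
        handle.Free();                      // until freed, the object never dies
    }
}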

A GC-world by default with selective manual allocation is far more workable than the converse, manual allocation by default with selective GC.

Re "a GC service in the background" - the majority of throughput-oriented collectors do not collect in the background, but rather when an allocation is made which cannot be satisfied without doing a collection. GC services run concurrently usually to reduce GC-induced latencies, typically for GUI applications.