I rarely (I hope) post specific corrections to misleading information out there on the web, not least because there's so much of it and it's usually a futile effort, but Jeff Atwood's latest post really rubbed me up the wrong way.
In his post, Jeff writes some unidiomatic C and bewails its ugliness and pain etc.:
b1 = (double *)malloc(m*sizeof(double));
By way of comparison, he then writes some C# (I expect that's what it's supposed to be) that doesn't actually use the GC, since nothing is allocated on the heap. (Of course, if it's not C#, and is in fact supposed to be e.g. Java, it's still not an example, since it hasn't been initialized.)
Now, Jeff's core message (at least at the start of the post), that GC is a Good Thing, is one I'm all in favour of. I strongly believe that manual memory management is justified in only about 5% (or less) of programming tasks, usually restricted to things like operating systems and embedded devices. I consider well-defined, trivially provably correct zoned allocation to be a good 80% of the way to GC, so I would include good uses of that in the 95% case.
Jeff's final point, however, about "disposal anxiety" is where he casually reveals that he doesn't pay much attention to a very important issue: disposal of resources. He mocks a particular piece of code that has at least the core concept right:
sqlConnection.Close();
sqlConnection.Dispose();
sqlConnection = null;
Two-thirds of this code is redundant: the last statement is pointless unless sqlConnection is read later in the same routine, while either of the first two alone would do for resource disposal. This wouldn't be so bad but for Jeff saying:
Personally, I view explicit disposal as more of an optimization than anything else, but it can be a pretty important optimization on a heavily loaded webserver, or a performance intensive desktop application plowing through gigabytes of data.
Disposal of resources in a long-running application using GC is not a performance issue. It's a correctness issue.
Garbage collection knows about memory. With most collectors, a collection is only triggered when the allocator is asked for more memory than is immediately available, taking specific tuning parameters into account. In other words, the GC is sensitive only to memory pressure. It doesn't know about resources it sees only as pointer-sized handles, it doesn't know how much they "cost", and indeed those resources might be on a different machine, or even spread across many different machines.
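To make this concrete, here's a minimal Java sketch (Java rather than C#, and with entirely illustrative names like FakeHandle and openCount). Each wrapper is a tiny heap object standing in for an expensive external resource; no amount of garbage collection releases the underlying resource, because nothing ever calls close():

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for a wrapper around an OS-level resource:
// a file descriptor, a socket, a database connection, and so on.
class FakeHandle {
    static int openCount = 0;          // how many "resources" are still open
    private final long handle;         // all the GC ever sees: one small field

    FakeHandle() { handle = ++openCount; }

    void close() { openCount--; }      // never called in this sketch
}

public class GcBlindSpot {
    public static void main(String[] args) {
        List<FakeHandle> handles = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            handles.add(new FakeHandle());
        }
        handles = null; // every FakeHandle is now unreachable garbage
        System.gc();    // even a full collection only reclaims the memory...
        // ...the 1000 underlying "resources" remain open: close() never ran.
        System.out.println(FakeHandle.openCount); // prints 1000
    }
}
```

The heap cost of each wrapper is a few dozen bytes, so a thousand of them exert essentially no memory pressure, and the collector has no reason to care about them.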
More critically, garbage collection gets its performance surplus over manual memory management by not collecting until as late as reasonably possible, where "reasonable" is usually a function of how much free memory is available without swapping to disk. The long-run-average optimal collector for a program that has a machine all to itself won't collect at all until every last remaining byte of RAM has been allocated, which may of course take some time. (Added to clarify: this is a theoretical optimum, and not how most GCs act in practice. They collect much sooner; e.g. the youngest generation may fit in L2 cache and so be very fast to collect.)
Precise tracing garbage collectors work, somewhat paradoxically, not by collecting garbage, but by collecting live objects. The "garbage" is everything left over after all the live objects have been collected. The higher the ratio of garbage to live objects, the cheaper, proportionally, the garbage has been to collect. (Added to clarify: this means that the collection of garbage is amortized; any amount of garbage costs the same to collect, provided the set of live objects is held constant.) This is how GCs outperform manual memory management on average, given sufficient free memory. Ideally, the GC never runs at all, and program termination cleans up.
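A toy Java sketch of the tracing half of this, assuming a simple object graph (Node and trace are illustrative, not a real collector): the trace visits only what's reachable from the roots, so the garbage, however much of it there is, is never touched at all:

```java
import java.util.*;

// Toy object graph node; children are the outgoing references.
class Node {
    final List<Node> children = new ArrayList<>();
}

public class ToyTrace {
    // Returns the set of nodes reachable from the roots - the live set a
    // copying collector would evacuate. Work done is proportional to this
    // set's size, not to how much garbage exists.
    static Set<Node> trace(List<Node> roots) {
        Set<Node> live = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (live.add(n)) {          // first visit: follow its references
                work.addAll(n.children);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        Node root = new Node();
        root.children.add(new Node());

        // Allocate plenty of garbage: nodes unreachable from the root.
        List<Node> garbage = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) garbage.add(new Node());

        // Only the 2 live nodes are ever visited; the 10,000 garbage
        // nodes cost the trace nothing.
        System.out.println(trace(List.of(root)).size()); // prints 2
    }
}
```

Double the garbage and the trace does exactly the same amount of work, which is the amortization argument in miniature.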
With this insight under your belt, it should be clear that expecting the GC to clean up resources is to ignore one of the key benefits of GC. Not only that, but you shouldn't be expecting the GC to finalize your objects at all. If your resources must be disposed of - and almost all resources should be, e.g. TCP sockets, file handles, etc. - then you need to take care of that yourself, and deterministically. Leaving file handles, say, to be disposed of by the GC opens the program (and possibly the user) up to odd non-deterministic failures, such as finding files on disk still locked even though the program should have closed them.
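Deterministic disposal is exactly what Java's try-with-resources gives you (the C# equivalent is the using statement). A minimal sketch, with Resource and openCount as illustrative stand-ins:

```java
// Illustrative stand-in for something holding a scarce external resource.
class Resource implements AutoCloseable {
    static int openCount = 0;
    Resource() { openCount++; }
    @Override public void close() { openCount--; }
}

public class DeterministicDispose {
    public static void main(String[] args) {
        try (Resource r = new Resource()) {
            // Use r here. close() runs the instant this block exits,
            // whether normally or via an exception - no waiting on the GC.
        }
        System.out.println(Resource.openCount); // prints 0
    }
}
```

The point is that release happens at a point you chose in the program's control flow, not at some future collection that may never come.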