How Garbage Collection Works in .NET

Understanding the Basics of Working in a Garbage-Collected Platform

Every program uses resources of one sort or another, be they files, memory buffers, screen
space, network connections, database resources, and so on.

In fact, in an object-oriented environment, every type identifies some resource available for a program’s use.

To use any of these resources requires memory to be allocated to represent the type. The following steps are required to access a resource:

  1. Allocate memory for the type that represents the resource by calling the intermediate
    language’s newobj instruction, which is emitted when you use the new operator in C#.

  2. Initialize the memory to set the initial state of the resource and to make the resource usable. The type’s instance constructor is responsible for setting this initial state.

  3. Use the resource by accessing the type’s members (repeating as necessary).

  4. Tear down the state of a resource to clean up.
    I’ll address this topic in the section “The Dispose Pattern: Forcing an Object to Clean Up” later in this chapter.

  5. Free the memory. The garbage collector is solely responsible for this step.
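The five steps above can be sketched in C# with a type that wraps a resource (StringWriter is chosen here purely for illustration; any type offering a Dispose method would do):

```csharp
using System;
using System.IO;

public static class Program
{
    public static void Main()
    {
        // Steps 1 and 2: the new operator emits newobj, which allocates
        // memory; the constructor then initializes the resource's state.
        StringWriter writer = new StringWriter();

        // Step 3: use the resource by accessing its members.
        writer.Write("hello");
        Console.WriteLine(writer.ToString());

        // Step 4: tear down the resource's state explicitly.
        writer.Dispose();

        // Step 5: the garbage collector frees the memory at some
        // later, unspecified point; no code is required here.
    }
}
```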

This seemingly simple paradigm has been one of the major sources of programming errors.

How many times have programmers forgotten to free memory when it is no longer needed? How many times have programmers attempted to use memory after it had already been freed?

With unmanaged programming, these two application bugs are worse than most others
because you usually can’t predict the consequences or the timing of them.

For other bugs, when you see your application misbehaving, you just fix the problem.

But these two bugs cause resource leaks (memory consumption) and object corruption (destabilization), making the application perform unpredictably.

In fact, there are many tools (such as the Microsoft Windows Task Manager, the System Monitor ActiveX Control, NuMega BoundsChecker from Compuware, and Rational’s Purify) that are specifically designed to help developers locate these types of bugs.

Proper resource management is very difficult and quite tedious.

It distracts developers from concentrating on the real problems that they’re trying to solve.

It would be wonderful if some mechanism existed that simplified the mind-numbing memory-management task for developers.

Fortunately, there is: garbage collection.

Garbage collection completely absolves the developer from having to track memory usage and
know when to free memory.

However, the garbage collector doesn’t know anything about the resource represented by the type in memory, which means that a garbage collector can’t know how to perform Step 4 in the preceding list:

tear down the state of a resource to clean up.

To get a resource to clean up properly, the developer must write code that knows how to properly clean up a resource.

The developer writes this code in the Finalize, Dispose, and Close methods, as described later in this chapter.

However, as you’ll see, the garbage collector can offer some assistance here too, allowing developers to skip Step 4 in many circumstances.

Also, most types, including String, Attribute, Delegate, and Exception, represent resources
that don’t require any special cleanup.

For example, a String resource can be completely cleaned up simply by destroying the character array maintained in the object’s memory.

On the other hand, a type that represents (or wraps) an unmanaged or native resource, such
as a file, a database connection, a socket, a mutex, a bitmap, an icon, and so on, always
requires the execution of some cleanup code when the object is about to have its memory
reclaimed.
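As a rough sketch, such a wrapper supplies its own cleanup code through the Dispose pattern described later in this chapter (NativeResource and its handle are invented for illustration; a real wrapper would call a native API such as Windows' CloseHandle):

```csharp
using System;

// Hypothetical wrapper around a native resource handle.
internal sealed class NativeResource : IDisposable
{
    private IntPtr m_handle; // assumed native handle obtained elsewhere

    public NativeResource(IntPtr handle)
    {
        m_handle = handle;
    }

    // Step 4: the cleanup code the garbage collector cannot write for you.
    public void Dispose()
    {
        if (m_handle != IntPtr.Zero)
        {
            // A real wrapper would release the native resource here.
            m_handle = IntPtr.Zero;
        }
        GC.SuppressFinalize(this); // no need to finalize once cleaned up
    }

    // Safety net if the user forgets to call Dispose.
    ~NativeResource()
    {
        Dispose();
    }
}
```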

In this chapter, I’ll explain how to properly define types that require explicit cleanup, and I’ll show you how to properly use types that offer this explicit cleanup.

For now, let’s examine how memory is allocated and how resources are initialized.

Allocating Resources from the Managed Heap

The CLR requires that all resources be allocated from a heap called the managed heap.

This heap is similar to a C-runtime heap, except that you never delete objects from the managed heap.

Objects are automatically deleted when the application no longer needs them.

This, of course, raises the question, “How does the managed heap know when the application is no longer using an object?” I’ll address this question shortly.

Several garbage-collection algorithms are in use today.

Each algorithm is fine-tuned for a particular environment to provide the best performance.

In this chapter, I’ll concentrate on the garbage-collection algorithm used by the Microsoft .NET Framework’s CLR.

Let’s start off with the basic concepts.

When a process is initialized, the CLR reserves a contiguous region of address space that initially contains no backing storage.

This address space region is the managed heap.

The heap also maintains a pointer, which I’ll call NextObjPtr. This pointer indicates where the next object is to be allocated within the heap.

Initially, NextObjPtr is set to the base address of the reserved address space region.

The newobj intermediate language (IL) instruction creates an object.

Many languages (including C#, C++/CLI, and Microsoft Visual Basic) offer a new operator that causes the compiler to emit a newobj instruction into the method’s IL code.

The newobj instruction causes the CLR to perform the following steps:

  1. Calculate the number of bytes required for the type’s (and all of its base types’) fields.

  2. Add the bytes required for an object’s overhead. Each object has two overhead fields:
    a type object pointer and a sync block index. For a 32-bit application, each of these
    fields requires 32 bits, adding 8 bytes to each object. For a 64-bit application, each field
    is 64 bits, adding 16 bytes to each object.

  3. The CLR then checks that the bytes required to allocate the object are available in the
    reserved region (committing storage if necessary). If there is enough free space in the
    managed heap, the object will fit, starting at the address pointed to by NextObjPtr,
    and these bytes are zeroed out. The type’s constructor is called (passing NextObjPtr
    for the this parameter), and the newobj IL instruction (or C#’s new operator) returns
    the address of the object. Just before the address is returned, NextObjPtr is advanced
    past the object and now points to the address where the next object will be placed in
    the heap.

Figure 20-1 shows a managed heap consisting of three objects: A, B, and C. If a new object
were to be allocated, it would be placed where NextObjPtr points to (immediately after
object C).

By contrast, let’s look at how the C-runtime heap allocates memory.

In a C-runtime heap, allocating memory for an object requires walking through a linked list of data structures.

Once a large enough block is found, that block is split, and pointers in the linked-list nodes
are modified to keep everything intact.

For the managed heap, allocating an object simply means adding a value to a pointer—this is blazingly fast by comparison.

In fact, allocating an object from the managed heap is nearly as fast as allocating memory from a thread’s stack!

In addition, most heaps (such as the C-runtime heap) allocate objects wherever they find free space.

Therefore, if I create several objects consecutively, it’s quite possible for these objects to be separated by megabytes of address space.

In the managed heap, however, allocating several objects consecutively ensures that the objects are contiguous in memory.

In many applications, objects allocated around the same time tend to have strong relationships to each other and are frequently accessed around the same time.

For example, it’s very common to allocate a FileStream object immediately before a BinaryWriter object is created. Then the application would use the BinaryWriter object, which internally uses the
FileStream object.
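That common pattern looks like this (the file path is hypothetical; a temporary file is used here so the example cleans up after itself):

```csharp
using System;
using System.IO;

public static class Program
{
    public static void Main()
    {
        String path = Path.Combine(Path.GetTempPath(), "demo.bin");

        // The two objects are allocated back to back, so on the managed
        // heap they end up adjacent in memory, improving locality.
        using (FileStream fs = new FileStream(path, FileMode.Create))
        using (BinaryWriter bw = new BinaryWriter(fs))
        {
            bw.Write(42); // BinaryWriter internally uses the FileStream
        }
        File.Delete(path);
    }
}
```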

In a garbage-collected environment, new objects are allocated contiguously in memory, providing performance gains resulting from locality of reference.

Specifically, this means that your process’ working set will be smaller than a similar application running in a non-managed environment.

It’s also likely that the objects that your code is using can all reside in the CPU’s cache.

Your application will access these objects with phenomenal speed because the CPU will be able to perform most of its manipulations without having cache misses that would force slower access to RAM.

So far, it sounds as if the managed heap is far superior to the C-runtime heap because of its simplicity of implementation and speed.

But there’s one little detail you should know about before getting too excited.

The managed heap gains these advantages because it makes one really big assumption: that address space and storage are infinite.

Obviously, this assumption is ridiculous, and the managed heap must employ a mechanism to allow it to make this assumption.

This mechanism is the garbage collector.

Here’s how it works:

When an application calls the new operator to create an object, there might not be enough address space left in the region to allocate to the object.

The heap detects this lack of space by adding the bytes that the object requires to the address in NextObjPtr.

If the resulting value is beyond the end of the address space region, the heap is full, and a garbage collection must be performed.
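The bump-pointer allocation and full-heap check just described can be sketched as follows. This is a toy model reflecting the simplified description above, not the CLR’s real data structures; sizes and offsets stand in for real addresses:

```csharp
using System;

// Toy bump-pointer allocator illustrating how NextObjPtr advances.
internal static class ToyHeap
{
    private const int HeapSize = 1024;   // reserved region, in bytes
    private static int s_nextObjPtr = 0; // NextObjPtr, as an offset

    public static int Allocate(int objectSize)
    {
        int newPtr = s_nextObjPtr + objectSize;
        if (newPtr > HeapSize)
        {
            // The real heap would perform a garbage collection here
            // before giving up.
            throw new OutOfMemoryException();
        }

        int objectAddress = s_nextObjPtr; // the new object starts here
        s_nextObjPtr = newPtr;            // advance past the object
        return objectAddress;
    }
}
```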

Important:
What I’ve just said is an oversimplification.

In reality, a garbage collection occurs when generation 0 is full.

Some garbage collectors use generations, a mechanism whose sole purpose is to improve performance.

The idea is that newly created objects are part of a young generation and objects created early in the application’s lifecycle are in an old generation.

Objects in Generation 0 are objects that have recently been allocated and have never been examined by the garbage collector algorithm.

Objects that survive a collection are promoted to another generation (such as Generation 1).

Separating objects into generations allows the garbage collector to collect specific generations instead of collecting all of the objects in the managed heap.

I’ll explain generations in more detail later in this chapter.

Until then, it’s easiest for you to think that a garbage collection occurs when the heap is full.
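You can observe generations directly with the GC class’s GetGeneration and Collect methods, both of which are real Framework Class Library APIs:

```csharp
using System;

public static class Program
{
    public static void Main()
    {
        Object o = new Object();

        // A freshly allocated object starts in generation 0.
        Console.WriteLine(GC.GetGeneration(o));

        // Force a collection; surviving objects are promoted
        // (typically to generation 1).
        GC.Collect();
        Console.WriteLine(GC.GetGeneration(o));
    }
}
```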

The Garbage Collection Algorithm

The garbage collector checks to see if any objects in the heap are no longer being used by the application.

If such objects exist, the memory used by these objects can be reclaimed.

(If no memory is available in the heap even after a garbage collection, new throws an OutOfMemoryException.)

How does the garbage collector know whether the application is using an object?

As you might imagine, this isn’t a simple question to answer.

Every application has a set of roots.

A single root is a storage location containing a memory pointer to a reference type object.

This pointer either refers to an object in the managed heap or is set to null.

For example, a static field (defined within a type) is considered a root.

In addition, any method parameter or local variable is also considered a root.

Only variables that are of a reference type are considered roots; value type variables are never considered roots.

Now, let’s look at a concrete example starting with the following class definition:

using System;
using System.IO;

internal sealed class SomeType {
   private TextWriter m_textWriter;

   public SomeType(TextWriter tw) {
      m_textWriter = tw;
   }

   public void WriteBytes(Byte[] bytes) {
      for (Int32 x = 0; x < bytes.Length; x++) {
         m_textWriter.Write(bytes[x]);
      }
   }
}

The first time the WriteBytes method is called, the JIT compiler converts the method’s IL code into native CPU instructions. Let’s say the CLR is running on an x86 CPU, and the JIT
compiler compiles the WriteBytes method into the CPU instructions shown in Figure 20-2.

(I added comments on the right to help you understand how the native code maps back to the original source code.)

00000000 push edi // Prolog
00000001 push esi
00000002 push ebx
00000003 mov ebx,ecx // ebx = this (argument)
00000005 mov esi,edx // esi = bytes array (argument)
00000007 xor edi,edi // edi = x (a value type)
00000009 cmp dword ptr [esi+4],0 // compare bytes.Length with 0
0000000d jle 0000002A // if bytes.Length <=0, go to 2a
0000000f mov ecx,dword ptr [ebx+4] // ecx = m_textWriter (field)
00000012 cmp edi,dword ptr [esi+4] // compare x with bytes.Length
00000015 jae 0000002E // if x >= bytes.Length, go to 2e
00000017 movzx edx,byte ptr [esi+edi+8] // edx = bytes[x]
0000001c mov eax,dword ptr [ecx] // eax = m_textWriter’s type object
0000001e call dword ptr [eax+000000BCh] // Call m_textWriter’s write method
00000024 inc edi // x++
00000025 cmp dword ptr [esi+4],edi // compare bytes.Length with x
00000028 jg 0000000F // if bytes.Length > x, go to f
0000002a pop ebx // Epilog
0000002b pop esi
0000002c pop edi
0000002d ret // return to caller
0000002e call 76B6E337 // Throw IndexOutOfRangeException
00000033 int 3 // Break in debugger
Figure 20-2 Native code produced by the JIT compiler with ranges of roots shown

As the JIT compiler produces the native code, it also creates an internal table.

Logically, each entry in the table indicates a range of byte offsets in the method’s native CPU instructions,
and for each range, a set of memory addresses and CPU registers that contain roots.

For the WriteBytes method, this table reflects that the EBX register starts being a root at offset
0x00000003, the ESI register starts being a root at offset 0x00000005, and the ECX register
starts being a root at offset 0x0000000f.

All three of these registers stop being roots at the end of the loop (offset 0x00000028).

Also note that the EAX register is a root from 0x0000001c to 0x0000001e.

The EDI register is used to hold the Int32 value represented by the variable x in the original source code.

Since Int32 is a value type, the JIT compiler doesn’t consider the EDI register to be a root.

The WriteBytes method is a fairly simple method, and all of the variables that it uses can be enregistered.

A more complex method could use all of the available CPU registers, and some roots would be in memory locations relative to the method’s stack frame.

Also note that on an x86 architecture, the CLR passes the first two arguments to a method via the ECX and EDX registers.

For instance methods, the first argument is the this pointer, which is always passed in the ECX register.

For the WriteBytes method, this is how I know that the this pointer is passed in the ECX register and stored in the EBX register right after the method prolog.

This is also how I know that the bytes argument is passed in the EDX register and stored in the ESI register after the prolog.

If a garbage collection were to start while code was executing at offset 0x00000017 in the WriteBytes method, the garbage collector would know that the objects referred to by the EBX
(this argument), ESI (bytes argument), and ECX (the m_textWriter field) registers were all roots and refer to objects in the heap that shouldn’t be considered garbage.

In addition, the garbage collector can walk up the thread’s call stack and determine the roots for all of the calling methods by examining each method’s internal table.

The garbage collector iterates through all the type objects to obtain the set of roots stored in static fields.

When a garbage collection starts, it assumes that all objects in the heap are garbage.

In other words, it is assumed that the thread’s stack contains no variables that refer to objects in the heap, that no CPU registers refer to objects in the heap, and that no static fields refer to objects
in the heap.

The garbage collector starts what is called the marking phase of the collection.

This is when the collector walks up the thread’s stack checking all of the roots.

If a root is found to refer to an object, a bit is turned on in the object’s sync block index field; this is how the object is marked.

For example, the garbage collector might locate a local variable that points to an object in the heap.

Figure 20-3 shows a heap containing several allocated objects, and the application’s roots refer directly to objects A, C, D, and F.

All of these objects are marked.

When marking object D, the garbage collector notices that this object contains a field that refers to object H, causing object H to be marked as well.

The garbage collector continues to walk through all reachable objects recursively.

After a root and the objects referenced by its fields are marked, the garbage collector checks the next root and continues marking objects.

If the garbage collector is going to mark an object that it previously marked, it can stop walking down that path.

This behavior serves two purposes.

First, performance is enhanced significantly because the garbage collector doesn’t walk through a set of objects more than once.
Second, infinite loops are prevented if you have any circular linked lists of objects.
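In C#, the marking walk can be sketched over a toy object graph (Node and its Fields list stand in for real heap objects and their reference-type fields; the Marked flag stands in for the bit in the sync block index):

```csharp
using System;
using System.Collections.Generic;

// Toy object used to illustrate the marking phase.
internal sealed class Node
{
    public bool Marked;
    public List<Node> Fields = new List<Node>();
}

internal static class Collector
{
    // Walk from each root, marking every reachable object exactly once.
    public static void Mark(IEnumerable<Node> roots)
    {
        foreach (Node root in roots)
            MarkObject(root);
    }

    private static void MarkObject(Node obj)
    {
        if (obj == null || obj.Marked)
            return; // already marked: stop here, which also handles cycles

        obj.Marked = true;
        foreach (Node field in obj.Fields)
            MarkObject(field); // recurse into referenced objects
    }
}
```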

Once all of the roots have been checked, the heap contains a set of marked and unmarked objects.

The marked objects are reachable via the application’s code, and the unmarked objects are unreachable.

The unreachable objects are considered garbage, and the memory that they occupy can be reclaimed.

The garbage collector now starts what is called the compact phase of the collection.

This is when the collector traverses the heap linearly looking for contiguous blocks of unmarked (garbage) objects.

If small blocks are found, the garbage collector leaves the blocks alone.

If large free contiguous blocks are found, however, the garbage collector shifts the nongarbage objects down in memory to compact the heap.

Naturally, moving the objects in memory invalidates all variables and CPU registers that contain pointers to the objects.

So the garbage collector must revisit all of the application’s roots and modify them so that each root’s value points to the objects’ new memory location.

In addition, if any object contains a field that refers to another moved object, the garbage collector is responsible for correcting these fields as well. After the heap memory is compacted, the
managed heap’s NextObjPtr pointer is set to point to a location just after the last nongarbage object.
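The compact phase can likewise be sketched over a toy heap of (size, marked) entries. A real collector moves actual bytes and fixes up every root and field that pointed at a moved object, which this sketch only notes in comments:

```csharp
using System;

// Toy compact phase: slide marked objects toward the start of the heap.
internal static class ToyCompactor
{
    // Each entry describes one object; returns the new NextObjPtr offset.
    public static int Compact((int Size, bool Marked)[] objects)
    {
        int nextObjPtr = 0;
        foreach (var obj in objects)
        {
            if (obj.Marked)
            {
                // A real collector would copy the object's bytes down to
                // nextObjPtr here and then correct all roots and fields
                // that referred to the object's old address.
                nextObjPtr += obj.Size;
            }
            // Unmarked (garbage) objects are simply not copied.
        }
        return nextObjPtr; // points just past the last surviving object
    }
}
```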

Figure 20-4 shows the managed heap after a collection.

As you can see, a garbage collection generates a considerable performance hit, which is the major downside of using a managed heap.

But keep in mind that garbage collections occur only when generation 0 is full, and until then, the managed heap is significantly faster than a C-runtime heap.

Finally, the CLR’s garbage collector offers some optimizations that greatly improve the performance of garbage collection.

I’ll discuss these optimizations later in this chapter, in the “Generations” and “Other Garbage Collector Performance Topics” sections.

As a programmer, you should take away a couple of important points from this discussion.

To start, you no longer have to implement any code to manage the lifetime of objects your application uses.

And notice how the two bugs described at the beginning of this chapter no longer exist.

First, it’s not possible to leak objects because any object not accessible from your application’s roots can be collected at some point.
Second, it’s not possible to access an object that is freed because the object won’t be freed if it is reachable, and if it’s not reachable, your application has no way to access it.

Also, since a collection causes memory compaction, it is not possible for managed objects to fragment your process’ virtual address space.

This would sometimes be a severe problem with unmanaged heaps but is no longer an issue when using the managed heap.

Using large objects (discussed later in this chapter) is an exception to this, and fragmentation of the large object heap is possible.

Note:

If garbage collection is so great, you might be wondering why it isn’t in ANSI C++.

The reason is that a garbage collector must be able to identify an application’s roots and must also be able to find all object pointers.

The problem with unmanaged C++ is that it allows casting a pointer from one type to another, and there’s no way to know what a pointer refers to.

In the CLR, the managed heap always knows the actual type of an object and uses the metadata information to determine which members of an object refer to other objects.