Heap Corruption

The heap is a data structure, maintained by the runtime libraries of the compiler or the operating system, that handles dynamic memory allocation (e.g. new, delete, malloc, free). (This heap is not related to the other data structure called "heap", which is used for sorting.) Heap corruption occurs when the heap's bookkeeping data, such as the record of which parts of memory are allocated and which are free for new allocations, is damaged. This usually results from incorrect use of the memory allocation functions. Once the heap is corrupted, the behaviour of the program is undefined: it may appear to work correctly at first, but fail on the next run, after being recompiled, or at any other time.

Common symptoms of heap corruption

 * Program crashes
 * SIGSEGV, in either user code or library functions
 * Runtime assertion failures
 * Error messages about releasing a chunk of memory twice (usually accompanied by an assertion failure)
 * Error messages about null pointers
 * Assertion errors involving _BLOCK_TYPE_IS_VALID
 * Values being overwritten unexpectedly
 * No symptom

Note: The symptoms of heap corruption do not have to occur near the code that causes the heap corruption.

Releasing a resource twice
This is probably the most common cause of heap corruption.

For example:

    T *p = static_cast<T *>(std::malloc(sizeof(T)));
    /* some code */
    std::free(p);
    /* some code */
    std::free(p);   // error: p has already been freed

or:

    T *p = new T;
    /* some code */
    delete p;
    /* some code */
    delete p;       // error: p has already been deleted

Lack of copy constructor and/or copy assignment operator
This is a special case of releasing a resource twice, and it is very common among inexperienced programmers.

    {
        struct buggy {
            buggy() : p(new char[1]) {}
            ~buggy() { delete[] p; }
        private:
            char *p;
        };

        buggy x;
        buggy y(x);  // this copies p, the pointer, not the memory to which the pointer refers
        // Result: when x and y are destroyed, both free the same memory.
    }

Rule: If a class manages a resource (a user-written destructor is usually the sign of this), then the class also needs a copy constructor and a copy assignment operator, or copying has to be disabled.
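As a minimal sketch of this rule, the buggy class above could be given a copy constructor and a copy assignment operator that duplicate the buffer instead of the pointer. The class name fixed, the size member, the data() accessors and copy_demo() are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

// Hypothetical repaired version of "buggy": every object owns its own
// buffer, so each destructor frees a different allocation.
class fixed {
public:
    explicit fixed(std::size_t n = 1) : size(n), p(new char[n]()) {}

    // Copy constructor: allocate a new buffer and copy the contents.
    fixed(const fixed &other) : size(other.size), p(new char[other.size]) {
        std::memcpy(p, other.p, size);
    }

    // Copy assignment via copy-and-swap: the by-value parameter is the
    // copy; after the swap it holds our old buffer and frees it on return.
    fixed &operator=(fixed other) {
        std::swap(size, other.size);
        std::swap(p, other.p);
        return *this;
    }

    ~fixed() { delete[] p; }

    char *data() { return p; }
    const char *data() const { return p; }

private:
    std::size_t size;
    char *p;
};

// Usage sketch: copies do not share memory with the original.
void copy_demo() {
    fixed x(4);
    x.data()[0] = 'a';
    fixed y(x);   // copy constructor
    fixed z;
    z = x;        // copy assignment
    y.data()[0] = 'b';
    assert(x.data()[0] == 'a');
    assert(z.data()[0] == 'a');
    assert(y.data() != x.data());
}
```

Passing the argument of operator= by value is a design choice: it reuses the copy constructor and stays correct for self-assignment.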

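Hint: Avoid raw (i.e. normal) pointers. Use pointer classes that manage memory, e.g. std::unique_ptr and std::shared_ptr (C++11 and later), or boost::scoped_ptr and boost::shared_ptr. Note that std::auto_ptr was deprecated in C++11 and removed in C++17. RAII is your friend, and will help you avoid many memory problems.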
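To illustrate the hint, here is a minimal sketch using std::unique_ptr, assuming C++11 or later. The type tracked and the function live_after_scope are hypothetical; the counter only exists to make the destruction observable.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type that counts how many instances are currently alive.
struct tracked {
    static int alive;
    tracked() { ++alive; }
    ~tracked() { --alive; }
};
int tracked::alive = 0;

// Returns the number of live objects after the owning scope has ended.
int live_after_scope() {
    {
        std::unique_ptr<tracked> owner(new tracked);
        assert(tracked::alive == 1);
        // No delete here: the unique_ptr releases the object exactly once
        // when it goes out of scope, also if an exception is thrown.
    }
    return tracked::alive;
}
```

Because a std::unique_ptr cannot be copied, the double release shown in the previous sections cannot happen by accident.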

Improper Handling of Pointers

 * Pointers not initialized
 * Dereferencing of invalid pointers (deleted object, null pointers)
 * Not checking whether memory allocation operations were successful
 * Incorrect type casts
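A few defensive habits address the items above. The following is only a sketch, assuming C++11 or later; the function names make_buffer and use_buffer are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Allocates n zero-initialized ints, or returns a null pointer on failure.
int *make_buffer(std::size_t n) {
    // new (std::nothrow) reports failure with a null pointer instead of
    // throwing std::bad_alloc, so the caller has to check the result.
    return new (std::nothrow) int[n]();
}

bool use_buffer() {
    int *p = nullptr;        // never leave a pointer uninitialized
    p = make_buffer(16);
    if (p == nullptr) {      // check that the allocation succeeded
        return false;
    }
    p[0] = 42;
    assert(p[0] == 42);
    delete[] p;
    p = nullptr;             // the pointer no longer refers to freed memory
    delete[] p;              // deleting a null pointer is a harmless no-op
    return true;
}
```

Resetting the pointer after the release does not fix a design that frees twice, but it turns a silent heap corruption into a harmless no-op or an easily diagnosed null pointer access.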

Violating Allocation Symmetry
Don't use the delete operator on pointers you got from std::malloc. Don't use std::free on pointers you got from the new operator. The delete[] operator and the delete operator are not interchangeable: memory obtained with new[] must be released with delete[], and memory obtained with new must be released with delete. Every allocation has to be matched with a deallocation through the interface that complements the one used for the allocation.
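The matching pairs can be sketched as follows, assuming C++11 or later. The names free_deleter and matched_pairs are hypothetical; the custom deleter is one possible way to record the malloc/free pairing in the type of the pointer.

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>

// Deleter that releases memory with std::free, the complement of std::malloc.
struct free_deleter {
    void operator()(void *p) const { std::free(p); }
};

int matched_pairs() {
    int *single = new int(1);      // new    <-> delete
    int *array = new int[3]();     // new[]  <-> delete[]

    // malloc <-> free: the unique_ptr calls free_deleter, never delete.
    std::unique_ptr<int, free_deleter> c_mem(
        static_cast<int *>(std::malloc(sizeof(int))));
    if (!c_mem) {                  // malloc reports failure with a null pointer
        delete single;
        delete[] array;
        return -1;
    }
    *c_mem = 2;

    int sum = *single + array[0] + *c_mem;
    assert(sum == 3);

    delete single;                 // matches new
    delete[] array;                // matches new[]
    return sum;                    // c_mem releases its memory with std::free
}
```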

Overrunning storage
For example:

    int *p = new int[3];
    p[3] = 0;

The array p only has 3 elements, 0...2; p[3] is out of bounds and may overwrite internal data structures.

or:

    struct A { int a, b; };
    struct B { int a; };
    A *a = static_cast<A *>(std::malloc(sizeof(struct B)));
    a->b = 0;

This problem is less common in C++, which has type-safe allocation (new), than in C. The problem is that malloc returns only enough storage to hold an object of type B, which is smaller than the intended type A; writing to members of A that lie outside the allocated space may corrupt the heap.
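One way to turn such an overrun into a reportable error is bounds-checked access. The sketch below uses std::vector::at, which throws std::out_of_range for an invalid index; the helper names checked_write and overrun_demo are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Writes value at index and reports whether the index was in range.
bool checked_write(std::vector<int> &v, std::size_t index, int value) {
    try {
        v.at(index) = value;   // at() checks the index before accessing
        return true;
    } catch (const std::out_of_range &) {
        return false;          // nothing was written outside the storage
    }
}

void overrun_demo() {
    std::vector<int> v(3);             // valid indices are 0, 1 and 2
    assert(checked_write(v, 2, 7));
    assert(!checked_write(v, 3, 0));   // the equivalent of p[3] is rejected
}
```

Note that operator[] on a std::vector performs no such check, so v[3] would be just as undefined as p[3] above.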

Stack Overflow
This rare cause of heap corruption is technically a storage overrun. However, the overrun is not the result of accessing memory outside the bounds of an allocated chunk, as described in the previous section. Instead, the program runs past the end of the memory that the system has set aside for the stack.

The stack is a data structure that is used for storing the locations to return to from a function call. Depending on the specific C++ implementation, it can also be used for storing function arguments and/or the return value. The stack is usually also used for storing function-local objects.

The size of the stack is limited by the operating system. Deep nesting of function calls, especially through recursion, and large function-local objects may cause the stack to grow beyond that limit. Although some implementation-specific mechanisms to detect such an overflow exist, the overflow can go unnoticed and damage other memory areas such as the heap.
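Deep recursion can often be replaced by a loop, which keeps the stack usage constant. The following sketch, assuming C++11 or later, contrasts the two; the node type and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct node {
    node *next;
};

// Recursive version: may need one stack frame per node, so a very long
// list can exhaust the stack.
std::size_t length_recursive(const node *n) {
    return n ? 1 + length_recursive(n->next) : 0;
}

// Iterative version: the stack usage does not depend on the list length.
std::size_t length_iterative(const node *n) {
    std::size_t count = 0;
    for (; n != nullptr; n = n->next) {
        ++count;
    }
    return count;
}

// Builds a chain of n nodes; the nodes themselves live in heap storage
// owned by the vector, not on the stack.
std::vector<node> make_chain(std::size_t n) {
    std::vector<node> nodes(n);   // every next pointer starts out null
    for (std::size_t i = 0; i + 1 < n; ++i) {
        nodes[i].next = &nodes[i + 1];
    }
    return nodes;
}
```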

Guessing and poor documentation
Rule: Don't guess. You'll be wrong, eventually.

Read documentation thoroughly. Look for information about transfer of ownership. "Transfer of ownership" describes any change of responsibility for managing the lifetime of an object. A lot of documentation is lacking in this regard. Don't hesitate to complain to the authors of such documentation. In the case of an open source project, don't hesitate to submit a patch for the documentation.

A lot of heap corruption problems are caused by badly written or badly read documentation.

Hardware problems
These are rare, but broken RAM, for example, can cause heap corruption.

Problems with construction order or destruction order of globals
These can also cause heap corruption, although this is rare. Avoid globals. If you have globals that depend on the existence of other globals, use singletons instead.
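One common form of singleton is a function-local static object, which is constructed the first time the function is called. The sketch below assumes C++11 or later; the classes config and logger are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical object that other globals depend on.
class config {
public:
    // The function-local static is constructed on first use, so it exists
    // whenever instance() is called, regardless of the order in which the
    // globals of different translation units are initialized.
    static config &instance() {
        static config the_instance;
        return the_instance;
    }

    const std::string &name() const { return name_; }

    config(const config &) = delete;
    config &operator=(const config &) = delete;

private:
    config() : name_("default") {}
    std::string name_;
};

// A global that uses config during its own construction.
struct logger {
    std::string prefix;
    logger() : prefix(config::instance().name()) {}
};
logger global_logger;
```

This addresses the construction order only. The destruction order can still be a problem if the destructor of another global uses the singleton after it has been destroyed.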

Debugging heap corruption
Debugging heap corruption is a pain, mainly because the symptoms and the cause of a heap corruption can be far away from each other in the source code. Heap corruption is usually detected late, i.e. some time after the erroneous code has been executed. (This is not limited to heaps; other data structures can behave similarly in this regard.)

Sometimes, however, the code showing the symptoms and the code causing the error are close to each other. It is therefore useful to inspect the code near the symptoms and to scan it for the patterns of the typical causes.

Tools
Some tools can detect patterns of incorrect usage before/during compilation. Check your compiler manual for relevant options. In professional environments, you can often find code checkers that can help you. Ask your colleagues whether such tools exist in your work environment.

Some tools can detect incorrect usage at run time. They wrap around the invocation of the program under test and usually modify the memory management libraries. On Linux, Valgrind is such a tool.

Use the debugger; keep an eye on releasing of resources and array accesses.

Add code to destructors, emitting log entries about which object gets deleted at which line of code. "printf-debugging" is often underestimated.
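A sketch of such logging, assuming C++11 or later, is shown below. The class traced and the function trace_demo are hypothetical; the log goes to a stream passed in by the caller, which in real code would typically be std::cerr or a log file.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical class that reports its own construction and destruction.
class traced {
public:
    traced(const std::string &name, std::ostream &log)
        : name_(name), log_(log) {
        log_ << "ctor " << name_ << " @"
             << static_cast<const void *>(this) << '\n';
    }
    ~traced() {
        // __FILE__ and __LINE__ identify this destructor in the log.
        log_ << "dtor " << name_ << " @"
             << static_cast<const void *>(this)
             << " (" << __FILE__ << ':' << __LINE__ << ")\n";
    }
    traced(const traced &) = delete;
    traced &operator=(const traced &) = delete;

private:
    std::string name_;
    std::ostream &log_;
};

// Returns the log produced by creating and destroying two objects.
std::string trace_demo() {
    std::ostringstream log;
    {
        traced a("a", log);
        traced b("b", log);
    }   // b is destroyed first, then a
    return log.str();
}
```

Two "dtor" entries with the same address and no "ctor" entry in between are a strong hint that an object was destroyed twice.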


 * Valgrind -- free Linux tool to help debug memory allocation issues
 * IBM Rational Purify -- commercial Unix and Windows tool

Detect problems early
When the debug mode is enabled, some compilers initialize memory to a pattern (like 0xDEADBEEF) that is known to make certain bugs show symptoms early. So use the debug mode.

Some implementations allow for enabling additional checks for heap consistency, e.g. defining the environment variable MALLOC_CHECK_ to certain values enables malloc/free debugging for some C library implementations. Please check the documentation for your implementation of the standard library and/or the memory allocation functions you're using for such additional checks and for how to enable them. They can help a lot.

As with all code problems: write and run tests early, and run them often. This cannot be emphasized enough.

Code Kung Fu
This is the procedure of reducing the code to a smaller portion that still exhibits the problem. One can often disable large portions of code (e.g. by wrapping them in "#if 0" and "#endif") and still see the heap corruption happening. Try to disable large portions first, then proceed to smaller ones, similar to how binary search works. This reduces the amount of code that needs to be inspected in order to find the bug that causes the heap corruption.
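A tiny sketch of the technique is shown below, assuming C++11 or later. The function run_steps and its two steps are hypothetical stand-ins for the portions of a real program.

```cpp
#include <cassert>
#include <vector>

int run_steps() {
    std::vector<int> data(4, 1);
    int result = 0;

    // Step 1: still enabled.
    for (int value : data) {
        result += value;
    }

#if 0
    // Step 2: compiled out for this experiment. If the corruption still
    // shows up without this block, the bug has to be somewhere else.
    for (int &value : data) {
        value *= 2;
    }
    for (int value : data) {
        result += value;
    }
#endif

    assert(result == 4);   // only step 1 contributed to the result
    return result;
}
```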

Avoiding Heap Corruption
There are techniques that, when used consistently, help to prevent heap corruption.

Document any transfer of ownership
A function signature like char *foo(char *p, char *q) alone tells the reader nothing about ownership. The documentation should answer questions like:


 * Who is responsible for creating an object?
 * Who is responsible for deleting an object? Does that responsibility change? When exactly does it change?
 * What happens in case of exceptions? (Much documentation is silent about that. This is a severe problem.)

E.g. consider a snippet from a hypothetical smart-pointer class for objects of type foo:

    class smart_foo_pointer {
    public:
        smart_foo_pointer(foo *p);
    };

One might think that this snippet doesn't need documentation since it is obvious what it does. That would be wrong. Consider this:

    class smart_foo_pointer {
        struct counter {
            unsigned value;
            counter(unsigned initial_value = 0) : value(initial_value) {}
        };
    public:
        smart_foo_pointer(foo *p) {
            /* initialization code */
        }
    private:
        counter *shared_counter; /* counter, shared by all smart_foo_pointer objects that refer to the same foo instance */
        foo *pointer;
    };

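Still looks harmless? The initialization code has to set up the shared counter, which typically means allocating it with new. So we have code that might throw an exception! What does that mean for code that uses smart_foo_pointer? Consider two possible versions of the initialization code.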

Initialization code 1 implements late takeover of ownership. The smart pointer becomes the owner of the pointee near the end of the constructor. Consequently, code that uses this version of smart_foo_pointer is responsible for deleting the pointee in case of an exception.

Initialization code 2 implements early takeover of ownership. The smart pointer becomes the owner of the pointee right at the point where the constructor is invoked. Consequently, code that uses this version of smart_foo_pointer must not attempt to delete the pointee in case of an exception. (boost::shared_ptr uses this type of initialization.)

This is obviously a major difference, and the documentation must not be silent about it. This applies not only to constructors of smart-pointer classes but to normal functions, too.
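The two initialization variants referred to above are not shown in this text. The following sketch is one plausible reconstruction based only on their description, assuming C++11 or later. The names late_owner, early_owner and make_counter are hypothetical, the simulate_failure flag only exists to provoke the exception deterministically, and the reference counting itself is omitted.

```cpp
#include <cassert>
#include <new>

// Hypothetical pointee that counts its live instances.
struct foo {
    static int alive;
    foo() { ++alive; }
    ~foo() { --alive; }
};
int foo::alive = 0;

struct counter {
    unsigned value;
    explicit counter(unsigned initial_value = 0) : value(initial_value) {}
};

// Stand-in for "new counter(1)"; the flag simulates an allocation failure.
counter *make_counter(bool simulate_failure) {
    if (simulate_failure) {
        throw std::bad_alloc();
    }
    return new counter(1);
}

// Variant 1, late takeover: if the constructor throws, p has not been
// touched and the caller still owns it.
struct late_owner {
    explicit late_owner(foo *p, bool simulate_failure = false)
        : shared_counter(make_counter(simulate_failure)), pointer(p) {}
    ~late_owner() { delete pointer; delete shared_counter; }
    late_owner(const late_owner &) = delete;
    late_owner &operator=(const late_owner &) = delete;

    counter *shared_counter;
    foo *pointer;
};

// Variant 2, early takeover: the constructor owns p from the start and
// deletes it itself if the initialization fails.
struct early_owner {
    explicit early_owner(foo *p, bool simulate_failure = false)
        : shared_counter(nullptr), pointer(p) {
        try {
            shared_counter = make_counter(simulate_failure);
        } catch (...) {
            delete pointer;   // the destructor will not run, so clean up here
            throw;
        }
    }
    ~early_owner() { delete pointer; delete shared_counter; }
    early_owner(const early_owner &) = delete;
    early_owner &operator=(const early_owner &) = delete;

    counter *shared_counter;
    foo *pointer;
};
```

With late_owner, a caller that catches the exception has to delete the pointee itself. With early_owner, doing the same would release the pointee twice.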

Avoid raw pointers
As the previous section about documentation showed, using raw pointers makes it necessary to explain a lot of the protocol used when calling a function. It also leaves room for misunderstandings and, consequently, bugs.

A lot of that can be avoided by using smart-pointers. Smart-pointers serve two purposes:
 * Document the intent. E.g. the signature void foo(std::auto_ptr<T> x), or void foo(std::unique_ptr<T> x) in C++11 and later, tells everything about the transfer of ownership involved: foo takes over the object.
 * Simplify memory management. Smart-pointers manage the lifetime of objects for you. This reduces both the chance for heap corruption and the chance for memory leaks.

Life becomes a lot easier for you when you learn about smart-pointers and how to use them properly.
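How signatures can document ownership is sketched below with std::unique_ptr, assuming C++11 or later. The type widget and the functions consume, make_widget and inspect are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct widget {
    int id;
    explicit widget(int i) : id(i) {}
};

// Taking a std::unique_ptr by value says: this function takes ownership.
int consume(std::unique_ptr<widget> w) {
    return w->id;   // the widget is destroyed when w goes out of scope
}

// Returning a std::unique_ptr says: the caller owns the result.
std::unique_ptr<widget> make_widget(int id) {
    return std::unique_ptr<widget>(new widget(id));
}

// Taking a reference says: the object is only borrowed.
int inspect(const widget &w) {
    return w.id;
}

void ownership_demo() {
    std::unique_ptr<widget> w = make_widget(7);
    assert(inspect(*w) == 7);             // w still owns the widget
    assert(consume(std::move(w)) == 7);   // ownership moves into consume
    assert(!w);                           // w is now empty
}
```

The explicit std::move at the call site makes the transfer of ownership visible to the reader of the calling code as well.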

Prefer Containers
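Consider:

    int *p = new int[65535];

This allocation is too large to be replaced by a stack allocation. Instead, use a std::vector:

    std::vector<int> v(65535);

This provides the same storage, but with exception-safe memory management: the vector releases its memory automatically when it goes out of scope, even if an exception is thrown.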