The Dangers of Garbage-Collected Languages

Any programmer who has had to manually manage memory knows the value of automatic memory management, also known as garbage collection. Garbage collection prevents several bugs that occur with manually managed memory, such as dangling pointers, double free errors, and several types of memory leaks. On the other hand, garbage collection is by no means perfect. It can give the illusion that resource management is no longer necessary, but in some cases manual resource management is still a requirement. Garbage collection also comes at a cost in performance, which can sometimes be problematic.

Memory can still leak

A novice programmer may be misled into believing that garbage collection prevents all memory leaks. It prevents many types of leaks, but not all of them.

In automatic reference counting systems, such as Perl or Objective-C, memory is leaked whenever there are cyclical references, since the reference count is never decremented to zero. The solution in these systems is to break the cycle by specifying that at least one of the references is a “weak” reference, which doesn’t prevent the object from getting garbage collected.

But even in languages with mark-and-sweep garbage collection where cyclical references are correctly garbage collected, such as Java, JavaScript, and Ruby, there are still several ways to leak memory. These leaks occur when objects are still reachable from live objects but will never be used again. There are a number of ways this could happen.


Illustration of memory with garbage collector
Objects in memory. Blue represents roots (such as the stack and globals), green represents live objects, yellow represents dead objects that are ready to be garbage collected, red represents live objects that will never be used again (but can’t be garbage collected), and arrows represent references to other objects.

For example, you could use a HashMap as a cache. The map contains references to both the key and value, keeping those objects alive. The garbage collector can’t know which values in the cache will be used again and which won’t. To avoid leaking memory, you need to have some kind of eviction policy. Depending on the application, this could be anything from using a WeakHashMap that removes values when there are no longer any other references to the key to an LRU cache that evicts the least recently used elements. Similar leaks can occur when adding elements to collections without bound. In fact, even if the memory is eventually cleaned up when the collection is no longer needed, these kinds of leaks can lead to out-of-memory errors by using all available memory for collections where only some of the elements are actually needed.
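As a rough illustration of the LRU option, here is a minimal sketch built on Java's LinkedHashMap. The class names LruCache and LruCacheDemo and the limit of two entries are made up for this example; a real cache would pick a limit based on the application.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache: evicts the least recently used entry once
// maxEntries is exceeded, so the map cannot grow without bound.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most
        // recently accessed instead of by insertion order.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each insertion.
        return size() > maxEntries;
    }
}

class LruCacheDemo {
    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // "a" is now the most recently used entry
        cache.put("c", "3"); // exceeds the limit, so "b" is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

Once an entry is evicted, the map no longer references its key or value, so the garbage collector is free to reclaim them if nothing else does.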

Another way an object can be leaked is if it registers itself with another object, such as an object registering as a listener but not unregistering itself when it is no longer needed. This kind of leak is especially common in GUI code, where it is common to have lists of event listeners and widgets that need to be cleaned up when a widget is destroyed.
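The listener case might look something like the sketch below. EventBus and Widget are hypothetical classes invented for this example, not part of any real GUI toolkit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical long-lived event source that outlives the widgets listening to it.
class EventBus {
    private final List<Runnable> listeners = new ArrayList<>();

    void register(Runnable listener) { listeners.add(listener); }
    void unregister(Runnable listener) { listeners.remove(listener); }
    int listenerCount() { return listeners.size(); }
}

// Hypothetical widget that registers itself as a listener.
class Widget implements Runnable {
    private final EventBus bus;

    Widget(EventBus bus) {
        this.bus = bus;
        // From here on the bus holds a strong reference to this widget.
        bus.register(this);
    }

    @Override
    public void run() {
        // React to the event.
    }

    // Must be called when the widget is no longer needed. If it is skipped,
    // the widget stays reachable through the bus for as long as the bus lives.
    void destroy() {
        bus.unregister(this);
    }
}
```

The garbage collector cannot help here: as far as it can tell, the bus might still deliver an event to the widget, so the widget is live.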

A third way memory could be leaked is by keeping a reference to unneeded objects on a long-lived object, for example by storing megabytes of image data in a long-lived object when that data is no longer needed. As long as the long-lived object is alive, it will needlessly consume a large amount of memory. Relatedly, memory can be leaked by holding on to a large object when all you need is a smaller part of it, such as an object that keeps a reference to an entire image object when all it really needs are the dimensions of the image.
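To make the image example concrete, here is a small sketch. Image and ImageInfo are hypothetical types made up for illustration, and the four bytes per pixel are an assumption.

```java
// Hypothetical image type holding a large pixel buffer.
class Image {
    final int width;
    final int height;
    final byte[] pixels;

    Image(int width, int height) {
        this.width = width;
        this.height = height;
        this.pixels = new byte[width * height * 4]; // assumes 4 bytes per pixel
    }
}

// Long-lived object that only needs the dimensions.
class ImageInfo {
    final int width;
    final int height;

    ImageInfo(Image image) {
        // Copy the two ints and deliberately do not store `image` itself,
        // so the pixel buffer can be collected once nothing else uses it.
        this.width = image.width;
        this.height = image.height;
    }
}
```

Had ImageInfo kept the Image in a field, the whole pixel buffer would stay alive for as long as the ImageInfo does.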

A stranger type of memory leak can occur if the garbage collector is conservative. A conservative garbage collector assumes that any memory that looks like a valid address to an allocated object is a pointer to that object. As such, if you have an integer that happens to contain the address of an allocated object, the garbage collector will consider that object live, even if it is just a coincidence. These types of leaks are incredibly difficult to debug but fortunately are pretty rare. Also, conservative garbage collectors are typically only found in languages where pointers can be converted to other types (for example D and C++).

These types of leaks are, of course, also issues when memory is manually managed. The garbage collector will make many types of memory leaks irrelevant, but unless you know to look out for the leaks that can occur, it is far easier than you might think to introduce a memory leak in a garbage-collected language.

Not everything is memory

Garbage collection is designed to automatically clean up unused memory resources. However, garbage collectors generally don’t handle cleanup of non-memory resources very well. Even in most garbage-collected languages, it is still necessary for the developer to manually free resources such as file handles, sockets, database connections, locks, GUI objects, etc. While this is also true in languages without garbage collection, in those languages all resources, memory or otherwise, follow a similar life-cycle, and often a single destructor function will free all associated resources.

Connection conn = connectionPool.getConnection();
try {
  // All uses of conn should go here
  doStuff(conn);
} finally {
  // If you forget this, then you leak the connection.
  // And if you don't have the try-finally or equivalent, an exception
  // in doStuff can still lead to a leak.
  conn.close();
}
// At this point conn is closed, and attempting to use
// it results in an exception.

Most (but not all) garbage-collected languages provide some form of finalize method to perform cleanup or other final actions before an object is garbage collected. These should not be relied on. The finalize method isn’t called until after the object has been marked as garbage, and that could be quite a while after the object has actually become unreachable. And in some cases, the finalize method may not be called at all! That’s not to say finalize methods shouldn’t be used to make sure resources are cleaned up, but they should be a precaution, not the primary method. In such cases, I would recommend having the finalize method report a warning that the resources weren’t freed before the object was garbage collected.

It’s also important to note that even in garbage-collected languages, you can run into the equivalent of dangling pointer bugs, where you attempt to use a resource after it has been freed. An example of this is attempting to read from a file after it has already been closed.
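Here is one way the "precaution plus warning" idea could be sketched in Java. Note that this uses java.lang.ref.Cleaner (Java 9 and later) instead of a finalize method, since finalize is deprecated in recent Java versions. TrackedResource and its read method are hypothetical names for this example.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical resource that warns if it is garbage collected without
// being closed, and rejects use after close.
class TrackedResource implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // The cleanup state must not reference the TrackedResource itself,
    // otherwise the object would never become unreachable.
    private static final class State implements Runnable {
        final AtomicBoolean closed = new AtomicBoolean(false);

        @Override
        public void run() {
            // Runs at most once: from close(), or as a fallback after GC.
            if (!closed.getAndSet(true)) {
                System.err.println("warning: TrackedResource was never closed");
            }
        }
    }

    private final State state = new State();
    private final Cleaner.Cleanable cleanable = CLEANER.register(this, state);

    String read() {
        // The garbage-collected equivalent of a dangling pointer check.
        if (state.closed.get()) {
            throw new IllegalStateException("resource already closed");
        }
        return "data";
    }

    @Override
    public void close() {
        state.closed.set(true); // explicit close, so no warning is printed
        cleanable.clean();      // deregisters the fallback; run() is now a no-op
    }
}
```

The fallback is still subject to the timing caveats above: it runs some unspecified time after the object becomes unreachable, and possibly never if the program exits first.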

Fortunately, many languages provide some form of mechanism to make freeing non-memory resources easier, by specifying a scope where the resource is available and automatically releasing it at the end of the scope. Some examples include the with statement in Python, Java’s try-with-resources, scala-arm, and the defer statement in Go. If your language doesn’t have explicit support, you will need to use a try-finally block to free the resource even if there is an exception.
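For instance, a try-with-resources version might look roughly like this. ScopedResource is a hypothetical stand-in for something like a connection, and the log list exists only to make the order of events visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical resource that records when it is opened, used, and closed.
class ScopedResource implements AutoCloseable {
    private final String name;
    private final List<String> log;

    ScopedResource(String name, List<String> log) {
        this.name = name;
        this.log = log;
        log.add("open " + name);
    }

    void use() { log.add("use " + name); }

    @Override
    public void close() { log.add("close " + name); }
}

class TryWithResourcesDemo {
    static List<String> run() {
        List<String> log = new ArrayList<>();
        // close() is called automatically when the block exits,
        // whether normally or because of an exception.
        try (ScopedResource conn = new ScopedResource("conn", log)) {
            conn.use();
            throw new IllegalStateException("boom");
        } catch (IllegalStateException e) {
            // The resource has already been closed by the time we get here.
            log.add("caught " + e.getMessage());
        }
        return log;
    }

    public static void main(String[] args) {
        // prints [open conn, use conn, close conn, caught boom]
        System.out.println(run());
    }
}
```

Compared with try-finally, there is no close call to forget, and the variable is out of scope after the block, which makes use-after-close mistakes harder to write.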

The cost of garbage collection

Garbage collection isn’t free. Regardless of the strategy, collecting the garbage takes CPU cycles that could be used for other work. In addition, garbage collectors can introduce other limitations on the application. For most tasks, the performance hit is small enough that the advantages of automatic memory management are worth it. However, it is still important to be aware of the performance characteristics of your garbage collector and how it impacts your application.

Perhaps the most commonly discussed issue with mark-and-sweep style garbage collectors is the non-deterministic timing of garbage collection. The mark-and-sweep algorithm must periodically analyse all memory in use to identify what can be freed, free that memory, and often compact memory still in use to prevent fragmentation. In the worst-case scenario, this causes a “stop the world” event, where the rest of the application must stop to wait for the garbage collector to finish. Some implementations will perform most of the work in a background thread, but compactions will still require a “stop the world.” In this case, the application is only stopped for a brief period of time (usually) and such events happen infrequently. However, for some applications, such as real-time systems, even a brief “stop the world” is unacceptable, and for this reason garbage-collected languages aren’t typically used in this space. Furthermore, even if the garbage collection is done in another thread, it can still cause your application to slow down when the garbage collection happens, especially if the number of objects to scan is large.

It is often desirable to tune the garbage collection and heap size to minimize the impact on your application. Depending on the application, you may want to tune the garbage collector to run as infrequently as possible and utilize available RAM at the cost of longer garbage collection runs, or garbage collect more frequently to minimize the impact of each garbage collection. Frequent garbage collection is desirable when a noticeable pause is undesirable, such as with games or other applications with a lot of user interaction. On the other hand, load-balanced servers can benefit from less frequent garbage collections, since when the longer garbage collection does occur on one server, the other servers can pick up the slack.
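Before tuning anything, it helps to measure how much time the collector is actually taking. On the JVM, one possible starting point is the standard java.lang.management API, as in this sketch. The GcStats class name is made up, and which collectors show up depends on the JVM and its configuration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStats {
    // Sums the approximate accumulated collection time, in milliseconds,
    // across all collectors the JVM reports.
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if undefined for this collector
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        System.out.println("total GC time (ms): " + totalCollectionMillis());
    }
}
```

Logging these numbers periodically gives a baseline, so you can tell whether a change to the heap size or collector settings moved things in the direction you wanted.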


Simple mark and sweep algorithm
https://www.lucidchart.com/documents/edit/3fd96a99-2892-4dd5-bb90-26d29dbf756e/0?shared=true

A less frequently discussed, but still important, limitation of garbage-collected languages is that they make significantly more use of the heap than their manually managed cousins. Variables that would be stored on the stack in languages like C or C++ are instead stored on the heap, and the stack is used only for primitive types and pointers, or maybe even just pointers. This has a couple of performance implications. First of all, allocating and freeing memory on the heap is significantly more expensive than using the stack. Secondly, variables on the heap will have a lower hit-rate on the CPU cache than variables on the stack.

There is another method where automated resource management is done at compile time with static analysis. Managing memory at compile time avoids the runtime costs of garbage collection and can also manage non-memory resources the same way. The only language I know of that does this is Rust. This approach is relatively new and has its own set of limitations, but I am excited to see where it goes.

Conclusion

Garbage collection is great. It removes a lot of cognitive load by handling memory management for you. But there are still some potholes to watch out for. It isn’t a panacea for all your resource-management needs and can even have some undesirable side-effects. If you work in a garbage-collected language, I recommend that you learn about how your garbage collector works so that you can write code making more effective use of it. I would also encourage software engineers to learn at least one language with manually managed memory, so they can better understand how to manage memory and other resources, and transfer their skills to garbage-collected languages as well. Learning a non-managed language can also lead to a better understanding of the stack, heap, and allocation.

2 Comments

  1. I don’t like garbage collection. C forever.

  2. I don’t like garbage collection. Rust forever.
