Java app developers, protect your intellectual property and improve the end-user experience. Get Excelsior JET.

5+ Garbage Collectors

A recently published essay titled “Why mobile web apps are slow” deservedly got much attention in the developer community. I must say I’ve read it with interest though the “All about garbage collection” section left serious doubts. Referencing the research paper, the author replicates two statements from it:

  1. Apps running in a GCed environment reach maximum performance only if 4x the required memory is available
  2. Below that memory boundary, explicit memory managers substantially outperform all of the GCs

Well, it sounds a bit apocalyptic to me. The author argues that we overlooked/disregard/are not aware of this anomaly just because today’s desktops and servers are equipped with gigabytes of RAM.

My insight (or maybe, many years experience in memory management) suggested me that something is wrong with these assertions. First off, I decided to simply repeat the same experiment with the use of an up-to-date JVM.

Quick check

A JVM author always has a suite of SPEC j* benchmarks at hand, you know. Given the jess test from SPEC jvm98 (selected by the blog author to illustrate the problem), I ran it on Oracle JRE 7u21, HotSpot Client VM under Windows 7/Intel Core2 Duo. The minimal heap size at which the test survives is about 3MB. Following the methodology proposed in the paper, I logged total execution time of 20 runs and value of the “Peak Memory Usage” performance counter with that heap size. Then, gradually increasing the heap size and logging the respective values, I obtained the following graph illustrating how performance depends on footprint (peak memory usage).

SPECjvm98 jess, Oracle JRE7u21
It shows that sufficient performance is achieved with just 1.2x minimal possible footprint and the performance difference is negligible. Compare it to the original figure for the same benchmark from the cited paper:
SPECjvm98 jess, Jikes RVM 2.3.2
Well, the results are different, to say the least. Maybe the minimal heap size for this test is too small? Okay, let us take SPEC jvm98 javac, another test used in the paper, whose minimal heap size is about 11MB. Repeating exactly the same for it, I received the following curve:
SPECjvm98 javac, Oracle JRE7u21
This time, performance varies substantially but, again, sufficient performance is delivered at 1.2x minimal footprint. The original figure from the paper is given below.

SPECjvm98 javac, Jikes RVM 2.3.2

Who are those players in the GC team?

Let us name the GCs that provide acceptable throughput by taking 4x the required memory, as reported in the research work.

As of 2004, Jikes RVM included the following GC implementations. MarkSweep cannot avoid heap fragmentation and visits every single object each collection cycle. SemiSpace doubles the minimal required space and copies the entire heap to and fro. Typically, these algorithms are considered primitives to build hybrid GC systems and I am surprised that they were included in the evaluation as participants. GenCopy is the classic Appel’s generational GC having only a nursery for allocation and an old space. When triggered, it promotes all nursery survivals (which are not necessarily aged objects) to the old space thus provoking frequent full collections (whole heap copying) unless provided with more than enough memory. GenMS is a generational GC (nursery plus mark-sweep old space) that does not cope with heap fragmentation. CopyMS is non-generational GC with nursery and mark-sweep evacuation space. It traverses the entire heap each collection and can do nothing with fragmentation either.

To summarize, some of the presented GCs are just like BubbleSort in the space of sorting algorithms, others do not address heap fragmentation, and the remaining ones bloat the old generation space with floating garbage – objects prematurely considered long living. Another point is that Jikes RVM MMTk also had two GCs based on reference counting but, for some reasons, no results were reported for them in the scope of the paper.

A better GC

The default HotSpot’s generational GC is enough to address all the mentioned problems, if we are not talking about giant heaps. From the allocation nursery , objects are promoted to the copying semi-space area, and those got aged are evacuated to the mark-compact old generation space. This garbage collector from Oracle JRE was basically the same in Sun JRE back in 2005, at the time of writing the discussed paper.

Order (of publication) is important

Actually, another research work, accomplished with Jikes RVM, solves the footprint issues and directly matches the “GC for mobile apps” topic. In “MC2: High-Performance Garbage Collection for Memory-Constrained Environments”, the authors propose to add various heap compaction techniques to the generational GCs described above. This article was published in 2004 (one year before the previously discussed paper) and Emery Berger was a co-author of both. That said, it remains uncertain why those five simplistic GCs were selected for evaluation in the paper published in 2005.

GC vs malloc/free

This is a really long-running feud. If you do not have time to examine the research paper, I will follow up on how and under which conditions explicit MM was able to “substantially outperform” GC-based MM.

Discussion on reddit

Categories: Java

8 Responses

  1. Francis Stephens Says:

    I would really love to read an article about your experiences around the areas where manual memory management is a big win.

  2. David Jeske Says:

    Sadly, both this article, and the original article, slightly misunderstand the problem with GC on mobile.

    The problem is responsiveness and pause-times. Every currently shipping production GC except Azul Zing does a heap-proportional world-stop to start collection. This means your app pauses at unpredtable times. If your heap is too big, that pause is unacceptable.

    Unity3d is a cross-paltform mobile game development environment which uses the CLR/Mono runtime, with GC. They recommend you keep your iOS heap below **–>> ONE <–** megabyte to avoid pauses. A 1MB heap causes a 7ms pause when it does a world-stop to mark heap objects. If your heap is bigger, it pauses longer.

  3. Vitaly Mikheev Says:

    >>Sadly, both this article,
    >>and the original article,
    >>slightly misunderstand
    >>the problem with GC on mobile.

    Of course, lattency (max GC pause time) is another important property. It addressed by incremental GCs and the paper I referenced (“MC2: High-Performance Garbage Collection for Memory-Constrained Environments”) is about this class of GC algorithms.

    >>Every currently shipping
    >>production GC except Azul Zing
    >>does a heap-proportional
    >>world-stop to start

    That’s not quite correct. Sure, Zing’s C4 GC does a great job thought it targets servers and giant heaps and it’s not clear if it can be effectivety adapted to mobile devices. There are at least two currently shipping production VMs that incorporate GCs bounding max pause: PERC JVM; and IBM J9 including Metronome GC (a result of tremendous work at IBM Research).

    If I manage to get those systems for evaluation purposes, I will shed some light on this aspect of garbage collection too.

  4. Gabriel Castro Says:

    You can get a version of J9 from the IBM site:

    If you want the Windows version you’ll have to download the Eclipse package.

  5. tobi Says:

    For throughput, these small heap sizes of 11mb mean nothing. What does it look like with a typical web application using 100s or GBs of memory?

  6. Dmitry Leskov Says:

    @tobi: The topic discussed is mobile web apps, i.e JavaScript code running in a mobile browser, not server-side Java apps. The author of the original article, to which this post is a response, drew his conclusions from an article about Java Garbage Collectors, most likely because there is no such article about JavaScript ones.

  7. Erik Corry Says:

    Interesting test, and I am also very sceptical about the “why mobile web apps are slow”, esp. the emphasis on GC as a problem.

    But I see some issues with your test too. The original paper compared the max reachable data with the max heap size allowed to the VM. The first is not strictly comparable to your “what’s the smallest heap limit that allows the VM to complete the benchmark”. For example for a na├»ve semispace single generation GC your number would be almost exactly twice their number.

    The second may not be comparable with your test either. Can you be sure that the VM is not using more memory than the command line flags allow it?

  8. Dmitry Leskov Says:

    Apparently we are not the only ones questioning the credibility of Drew’s conclusions on Garbage Collection: