A recently published essay titled “Why mobile web apps are slow” deservedly got much attention in the developer community. I must say I’ve read it with interest though the “All about garbage collection” section left serious doubts. Referencing the research paper, the author replicates two statements from it:
- Apps running in a GCed environment reach maximum performance only if 4x the required memory is available
- Below that memory boundary, explicit memory managers substantially outperform all of the GCs
Well, it sounds a bit apocalyptic to me. The author argues that we overlooked/disregard/are not aware of this anomaly just because today’s desktops and servers are equipped with gigabytes of RAM.
My insight (or maybe, many years experience in memory management) suggested me that something is wrong with these assertions. First off, I decided to simply repeat the same experiment with the use of an up-to-date JVM.
A JVM author always has a suite of SPEC j* benchmarks at hand, you know. Given the jess test from SPEC jvm98 (selected by the blog author to illustrate the problem), I ran it on Oracle JRE 7u21, HotSpot Client VM under Windows 7/Intel Core2 Duo. The minimal heap size at which the test survives is about 3MB. Following the methodology proposed in the paper, I logged total execution time of 20 runs and value of the “Peak Memory Usage” performance counter with that heap size. Then, gradually increasing the heap size and logging the respective values, I obtained the following graph illustrating how performance depends on footprint (peak memory usage).
It shows that sufficient performance is achieved with just 1.2x minimal possible footprint and the performance difference is negligible. Compare it to the original figure for the same benchmark from the cited paper:
Well, the results are different, to say the least. Maybe the minimal heap size for this test is too small? Okay, let us take SPEC jvm98 javac, another test used in the paper, whose minimal heap size is about 11MB. Repeating exactly the same for it, I received the following curve:
This time, performance varies substantially but, again, sufficient performance is delivered at 1.2x minimal footprint. The original figure from the paper is given below.
Who are those players in the GC team?
Let us name the GCs that provide acceptable throughput by taking 4x the required memory, as reported in the research work.
As of 2004, Jikes RVM included the following GC implementations. MarkSweep cannot avoid heap fragmentation and visits every single object each collection cycle. SemiSpace doubles the minimal required space and copies the entire heap to and fro. Typically, these algorithms are considered primitives to build hybrid GC systems and I am surprised that they were included in the evaluation as participants. GenCopy is the classic Appel’s generational GC having only a nursery for allocation and an old space. When triggered, it promotes all nursery survivals (which are not necessarily aged objects) to the old space thus provoking frequent full collections (whole heap copying) unless provided with more than enough memory. GenMS is a generational GC (nursery plus mark-sweep old space) that does not cope with heap fragmentation. CopyMS is non-generational GC with nursery and mark-sweep evacuation space. It traverses the entire heap each collection and can do nothing with fragmentation either.
To summarize, some of the presented GCs are just like BubbleSort in the space of sorting algorithms, others do not address heap fragmentation, and the remaining ones bloat the old generation space with floating garbage – objects prematurely considered long living. Another point is that Jikes RVM MMTk also had two GCs based on reference counting but, for some reasons, no results were reported for them in the scope of the paper.
A better GC
The default HotSpot’s generational GC is enough to address all the mentioned problems, if we are not talking about giant heaps. From the allocation nursery , objects are promoted to the copying semi-space area, and those got aged are evacuated to the mark-compact old generation space. This garbage collector from Oracle JRE was basically the same in Sun JRE back in 2005, at the time of writing the discussed paper.
Order (of publication) is important
Actually, another research work, accomplished with Jikes RVM, solves the footprint issues and directly matches the “GC for mobile apps” topic. In “MC2: High-Performance Garbage Collection for Memory-Constrained Environments”, the authors propose to add various heap compaction techniques to the generational GCs described above. This article was published in 2004 (one year before the previously discussed paper) and Emery Berger was a co-author of both. That said, it remains uncertain why those five simplistic GCs were selected for evaluation in the paper published in 2005.
GC vs malloc/free
This is a really long-running feud. If you do not have time to examine the research paper, I will follow up on how and under which conditions explicit MM was able to “substantially outperform” GC-based MM.