Wednesday, June 15, 2011

Generational GC always better?

Over the past few years I have heard a lot of stories about different ways of implementing GCs. The academic papers these opinions are usually based on are badly outdated, and no one has published a real comparison of GC implementations across all the different architectures that Firefox runs on. (Please, please tell me if I am wrong and there is something that compares GC approaches all the way from high-end desktop machines to low-end smartphones!)

Here at Mozilla we have a cool animation that uses HTML5 and WebGL features.
The Flight of the Navigator exposes every flaw in the GC, because long GC events cause annoying pauses in the animation.
So let's compare the generational, moving approach that Google Chrome uses with our old-school "mark and sweep" approach.



The first graph shows the GC pause times for Chrome in ms. The second graph shows the same benchmark with a nightly build of Firefox.
Chrome has many GC events that are close to 4 ms, but there are also outliers that cause GC pauses of up to 900 ms. We have constant pauses of around 100 ms.
Our pause time is not perfect, but comparing the video experience in the two browsers, I definitely prefer many barely noticeable 100 ms pauses over a few 900 ms pauses where I start worrying whether my browser has crashed.
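The trade-off above is between average and worst-case pause time. Below is a minimal sketch of a pause summary that makes the difference visible. The `summarize_pauses` helper and the 16.7 ms frame budget (roughly one frame at 60 fps) are assumptions for illustration, and the inputs in the usage lines are made-up toy values loosely shaped like the two distributions described above, not the measured data from the graphs.

```python
from statistics import mean


def summarize_pauses(pauses_ms, budget_ms=16.7):
    """Summarize GC pause times given in milliseconds.

    budget_ms is a hypothetical per-frame budget (about one frame at 60 fps),
    chosen for illustration only.
    """
    if not pauses_ms:
        raise ValueError("need at least one pause sample")
    return {
        "count": len(pauses_ms),
        "mean_ms": mean(pauses_ms),
        "max_ms": max(pauses_ms),
        # Pauses longer than one frame are the ones that show up as stutter.
        "over_budget": sum(1 for p in pauses_ms if p > budget_ms),
    }


# Toy inputs only, not measurements from the benchmark.
spiky = summarize_pauses([4.0] * 99 + [900.0])
steady = summarize_pauses([100.0] * 10)
print(spiky["mean_ms"], spiky["max_ms"])    # 12.96 900.0
print(steady["mean_ms"], steady["max_ms"])  # 100.0 100.0
```

The spiky series has the far lower mean but the far worse maximum, so looking at the mean alone would hide exactly the pauses that make an animation freeze.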

We invested a lot of time in decreasing the GC pause time. Our compartment approach ensures that our GC has a minimal workload, and we moved most of the sweeping cost off the critical path to a background thread.
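The split described above, with marking on the critical path and sweeping on a background thread, can be sketched as a toy in Python. This is not Firefox's actual GC code: `Obj`, `collect`, and `sweeper` are hypothetical names, compartments are not modeled, and "finalizing" an object here just records its name. Finding the garbage is cheap in this toy; what moves off the main thread is the per-object cleanup.

```python
import queue
import threading


class Obj:
    """A toy heap object with outgoing references (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False


def mark(roots):
    """Mark phase: runs on the main thread, so it counts toward the pause."""
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj.marked:
            continue
        obj.marked = True
        stack.extend(obj.refs)


def collect(heap, roots, sweep_queue):
    """Mark, split the heap, and hand the garbage to the background sweeper."""
    mark(roots)
    live = [o for o in heap if o.marked]
    dead = [o for o in heap if not o.marked]
    for o in live:
        o.marked = False  # reset mark bits for the next cycle
    sweep_queue.put(dead)  # cleanup of dead objects happens off the main thread
    return live


def sweeper(sweep_queue, finalized):
    """Background thread: finalize batches of dead objects until told to stop."""
    while True:
        batch = sweep_queue.get()
        if batch is None:  # shutdown signal
            break
        for obj in batch:
            finalized.append(obj.name)  # stand-in for freeing memory


a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
a.refs.append(b)  # a -> b is reachable from the root
c.refs.append(d)  # c and d are unreachable garbage
sweep_queue = queue.Queue()
finalized = []
worker = threading.Thread(target=sweeper, args=(sweep_queue, finalized))
worker.start()
heap = collect([a, b, c, d], roots=[a], sweep_queue=sweep_queue)
sweep_queue.put(None)  # FIFO queue: the garbage batch is processed first
worker.join()
print([o.name for o in heap], sorted(finalized))  # ['a', 'b'] ['c', 'd']
```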

We are also moving towards a generational and incremental GC model, but we have to make sure that we watch our outliers closely.
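A minimal, hypothetical sketch of why a generational model helps when most objects die young: a minor collection traces only from the roots into the nursery, promotes whatever it reaches, and frees everything else in one step, so its cost depends on the survivors rather than on the garbage. `GenerationalHeap` and `minor_gc` are invented names. A real nursery copies objects in memory and relies on a write barrier to maintain a remembered set of old-to-young pointers; this toy simply has the caller pass in the young objects referenced from the stack or from old objects. Incremental marking is not modeled.

```python
class Obj:
    """A toy heap object with outgoing references (hypothetical model)."""

    def __init__(self, name, refs=None):
        self.name = name
        self.refs = list(refs or [])


class GenerationalHeap:
    """Toy two-generation heap: a nursery for new objects and an old space."""

    def __init__(self):
        self.nursery = []
        self.old = []

    def allocate(self, name, refs=None):
        obj = Obj(name, refs)
        self.nursery.append(obj)  # every new object starts in the nursery
        return obj

    def minor_gc(self, roots):
        """Promote nursery objects reachable from roots; free the rest.

        roots: young objects referenced from the stack or from old objects
        (standing in for a remembered set). Old objects are not traced.
        Returns the number of nursery objects freed.
        """
        young = {id(o) for o in self.nursery}
        seen = set()
        survivors = []
        stack = list(roots)
        while stack:
            obj = stack.pop()
            if id(obj) not in young or id(obj) in seen:
                continue  # skip old objects and already-visited ones
            seen.add(id(obj))
            survivors.append(obj)
            stack.extend(obj.refs)
        freed = len(self.nursery) - len(survivors)
        self.old.extend(survivors)  # promote survivors to the old space
        self.nursery = []           # all remaining nursery objects die at once
        return freed


heap = GenerationalHeap()
kept = heap.allocate("frame-state")
for i in range(1000):
    heap.allocate(f"temp-{i}")  # short-lived temporaries nobody keeps
freed = heap.minor_gc(roots=[kept])
print(freed, len(heap.old), len(heap.nursery))  # 1000 1 0
```

The work done in `minor_gc` is proportional to the one surviving object, not the 1000 dead ones, which is the property that makes nursery collections cheap on allocation-heavy animations.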

4 comments:

  1. Pretty cool analysis; you don't see much attention paid to GC, so I'm happy to see it.

    Also, although I'm not aware of any significant changes to GC in v8 since m12 shipped, I'd love to see FF nightly being benchmarked against Chrome Canary or whathaveyou. :)

    Cheers

  2. Good point, Paul! I tried the Chrome Canary build, but the results didn't change much. The peak pause time is still around 900 ms.

  3. Thanks for posting this example of bad behavior in Chrome!

    This was a bug in our typed array support that meant that finalization of typed arrays was delayed. With that bug fixed the max pause time in Chrome on my machine is 30ms. Before the fix I could reproduce your spikes of up to a second.

    You seem to conclude that this example could be bad for your generational GC work. I don't think it will. Most of the data in this example is very short lived and you can get rid of it in new space collections. After fixing our bug we do not get a single mark-sweep in this benchmark. Everything dies in new space collections. I think you will find that generational GC will help you on this example as well.

    Thanks again for posting this example so we could catch that bug. The fix is available in Chrome Canary builds.

    Cheers, -- Mads

  4. Good to see that the generational model works. Respect for the quick fix, Mads!
