Wednesday, June 29, 2011
Fixing memory bloat problem
Firefox users have complained a lot about the memory situation since the Firefox 4 release. The browser can hold on to an enormous amount of memory when it sits idle for a while, and after an allocation-heavy task it doesn't trigger a GC to shrink its memory footprint.
One of the reasons why this happened is the landing of Bug 558451 - "Merge JSScope into JSScopeProperty, JSObject". JSScopes used to be allocated with malloc rather than on our JS heap, and these off-heap allocations triggered most of our GCs. Without them, our main GC trigger vanished overnight. I did a quick fix in Bug 592007 - "TM: New Scope patch changes GC behavior in browser". The main goal was to imitate our old GC behavior.
Our off-heap allocation trigger is pretty simple: once we hit 128MB of malloc'd memory since the last GC, we trigger a new GC.
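Not the actual SpiderMonkey code, but a minimal sketch of what such an off-heap trigger looks like. The 128MB constant comes from the description above; all names are made up for illustration.

```cpp
#include <cstddef>

static const size_t kMallocTriggerBytes = 128u * 1024 * 1024;  // 128MB

struct MallocTrigger {
    size_t mallocBytesSinceGC = 0;

    // Called whenever the engine mallocs memory on behalf of a script.
    void noteMalloc(size_t nbytes) {
        mallocBytesSinceGC += nbytes;
        if (mallocBytesSinceGC >= kMallocTriggerBytes)
            triggerGC();
    }

    void triggerGC() {
        // ... run a full GC ...
        mallocBytesSinceGC = 0;  // start counting again after the collection
    }
};
```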
The problem was that the new trigger had to monitor and predict heap growth rather than simply add up off-heap allocations.
I introduced a heap-growth factor that allows the JS heap to grow to 3x the amount of memory that survives a GC before we force the next one. This limit has to let the heap grow very fast because we don't want to trigger a GC during a critical page load or benchmark. With these changes, the GC behavior was almost the same as before.
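A rough sketch of that heap-growth trigger, assuming the 3x factor described above; the names and structure are illustrative only, not the real patch from Bug 592007.

```cpp
#include <cstddef>

// Illustrative only: after each GC the surviving heap size defines a new
// limit, and the next GC is forced once allocations push the heap past it.
struct HeapGrowthTrigger {
    static const size_t kGrowthFactor = 3;  // heap may grow to 3x the survivors

    size_t gcBytes = 0;       // current size of the JS heap
    size_t triggerBytes = 0;  // heap size that forces the next GC

    // Called at the end of a GC with the amount of memory that survived.
    void updateAfterGC(size_t survivingBytes) {
        gcBytes = survivingBytes;
        triggerBytes = survivingBytes * kGrowthFactor;  // e.g. 300MB -> 900MB
    }

    // Called when new objects are allocated on the JS heap.
    bool shouldTriggerGC(size_t nbytes) {
        gcBytes += nbytes;
        return gcBytes >= triggerBytes;
    }
};
```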
Now we can see that this was not good enough, because the GC is only triggered when we allocate objects.
This means we don't perform a GC even if we have a huge heap, as long as the trigger limit is not reached. Running the V8 benchmark suite shows this bad behavior. Right at the end of the suite, the splay benchmark allocates a huge splay tree. We have to grow our JS heap, and the amount of reachable memory after each GC is around 300MB. Our GC trigger allows the heap to grow 3x before the next GC is performed.
So after the benchmark we end up with a heap size between 300MB and 900MB, and we don't perform another GC until the 900MB trigger limit is reached. That can take forever if you just read a blog or surf pages that are not JS-allocation heavy.
I did most of my testing on my MacBook Pro with 4GB of RAM, so I never noticed this bad behavior.
Recently I bought a netbook with 1GB of RAM, and running the V8 benchmark suite on it was painful: afterwards my browser used up all the memory, and no GC ever came along to make it usable again.
In order to make Firefox work on my netbook I implemented Bug 656120 - "Increase GC frequency: GC occasionally based on a timer (except when completely idle)". The idea is that we now perform a GC after 20 seconds even if the allocation trigger is not reached. This should shrink the heap without hurting allocation-heavy pages like benchmarks or animations.
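Conceptually the change works like this. This is only a sketch with the 20-second period from the post; the hook names are hypothetical and the real patch lives in Bug 656120.

```cpp
#include <chrono>

// Sketch: force a shrinking GC if 20 seconds pass without the allocation
// trigger being reached, but skip it when the browser is completely idle.
struct PeriodicGC {
    using Clock = std::chrono::steady_clock;

    std::chrono::seconds period{20};
    Clock::time_point lastGC = Clock::now();

    // Called periodically from the browser's event loop (hypothetical hook).
    void maybeGC(bool completelyIdle, bool allocationTriggerReached) {
        if (completelyIdle)
            return;  // nothing ran since the last GC, the heap cannot have grown
        if (allocationTriggerReached || Clock::now() - lastGC >= period)
            shrinkingGC();
    }

    void shrinkingGC() {
        // ... collect garbage and release unused chunks back to the OS ...
        lastGC = Clock::now();
    }
};
```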
Nick Nethercote found that this patch fixed many other bugs that users had filed.
It didn't make it into the Firefox 6 release. Maybe not enough people complained about the memory bloat problem, or maybe we should just buy all the release drivers a low-end netbook :) The good news is that the fix will be in Firefox 7! Everybody should see a reduced memory footprint, and it should especially help users on limited devices like netbooks!
PS: This patch also reduces fragmentation, but that's for the next post!
Wednesday, June 15, 2011
Generational GC always better?
I have heard a lot of stories about different ways of implementing GCs over the past few years. The academic papers that usually form the basis for such opinions are badly outdated, and a real comparison of different GC implementations across all the architectures Firefox runs on has never been published. (Please, please tell me if I am wrong and there is something that compares GC approaches all the way from high-end desktop machines down to low-end smartphones!)
Here at Mozilla we have a cool animation that uses HTML5 and WebGL features.
The Flight of the Navigator demo definitely exposes every flaw of the GC, because long GC events result in annoying pauses during the animation.
So let's compare the generational, moving-collector approach that Google Chrome uses with our old-school mark-and-sweep approach.
The first graph shows the GC pause times for Chrome in milliseconds; the second graph shows the same benchmark with a nightly build of Firefox.
Chrome has many GC events that are close to 4ms, but there are also outliers that cause GC pauses of up to 900ms. We have fairly constant pauses of around 100ms.
Our pause times are not perfect, but comparing the video experience between the two browsers, I definitely favor many barely noticeable 100ms pauses over a few 900ms pauses during which I start worrying whether my browser has crashed.
We have invested a lot of time in decreasing GC pause times. Our compartment approach ensures that each GC has a minimal workload, and we have moved most of the sweeping cost off the critical path onto a background thread.
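As a rough illustration of the background-sweeping part (a simplified sketch, not the actual SpiderMonkey implementation): the visible pause ends after marking, and the garbage is released on a helper thread.

```cpp
#include <thread>
#include <vector>

struct Arena;  // stands in for a chunk of GC-managed memory

// Runs on a background thread: finalize and free arenas that contain
// only garbage, while scripts keep running on the main thread.
void sweepInBackground(std::vector<Arena*> dead) {
    for (Arena* arena : dead) {
        // ... finalize the objects in the arena and return its memory ...
        (void)arena;
    }
}

void gcCycle() {
    // 1. Mark reachable objects -- this part still pauses the main thread.
    std::vector<Arena*> dead;  // arenas found to contain only garbage

    // 2. Hand the dead arenas to a helper thread; the visible pause ends here.
    std::thread(sweepInBackground, std::move(dead)).detach();
}
```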
We are also moving towards a generational and incremental GC model, but we have to make sure we keep a close eye on our outliers.
Tuesday, June 14, 2011
Domain-specific garbage collection?
I have spent the last two years instrumenting, changing, benchmarking, optimizing, hating, and finally somewhat appreciating the garbage collector of the JavaScript engine in Firefox.
This blog collects the insights I have gained about web browsers and garbage collectors.
The garbage collector became a performance bottleneck overnight when JIT compilers were introduced and replaced the slow interpreted execution environment.
Our solution to the GC performance problem is not just a copy-and-paste of approaches that were researched for statically typed languages like Java.
The following blog posts will introduce our solution and the individual performance gains step by step.
Step zero: Thesis template
One of the initial problems with writing a dissertation is the template chaos: too many people have too many tastes, and there is no optimal template out there. I finally found "my template" that meets my expectations (after some editing, of course...). Here is the link.
Now I am ready for step one... the table of contents.
Saturday, June 13, 2009
A new challenge in Mountain View
Last week I moved to the Bay Area. I'm leaving my QEMU emulator and buffer-overflow security issues behind and shifting my focus back to virtual machines. The new TraceMonkey virtual machine from Mozilla is a little bit more complex than my SlimVM :) My first job is to instrument the VM in order to understand the behavior of JavaScript benchmarks.
I also have to write a journal paper about SlimVM.
Christoph's paper and my SlimVM paper are due at the same time.
I guess this is a good time to switch to paper writing...