proposal: cmd/compile: appropriately disable garbage collector #24299

Closed
@pciet

Description

@pciet

Disabling the garbage collector in cmd/compile via debug.SetGCPercent(-1) saves significant time according to compilebench. Because the compile command runs once per package, there are cases where the garbage collector frees memory that the OS is about to reclaim anyway, so this proposal is to define a feature that disables the collector in certain cases for cmd/compile.

name       old time/op       new time/op       delta
Template         172ms ± 1%        162ms ± 1%   -5.88%  (p=0.000 n=9+8)
Unicode         80.2ms ± 2%       74.3ms ± 1%   -7.38%  (p=0.000 n=10+8)
GoTypes          572ms ± 1%        542ms ± 0%   -5.24%  (p=0.000 n=9+9)
Compiler         2.63s ± 1%        2.50s ± 1%   -4.91%  (p=0.000 n=10+10)
SSA              6.67s ± 1%        6.27s ± 0%   -6.00%  (p=0.000 n=10+10)
Flate            111ms ± 1%        105ms ± 3%   -5.44%  (p=0.000 n=9+10)
GoParser         137ms ± 1%        130ms ± 1%   -5.38%  (p=0.000 n=8+9)
Reflect          365ms ± 1%        346ms ± 1%   -5.13%  (p=0.000 n=9+10)
Tar              161ms ± 1%        155ms ± 4%   -3.56%  (p=0.004 n=10+10)
XML              193ms ± 1%        185ms ± 1%   -4.09%  (p=0.000 n=9+9)
StdCmd           16.7s ± 1%        12.8s ± 0%  -23.46%  (p=0.000 n=9+10)


name       old user-time/op  new user-time/op  delta
Template         221ms ± 4%        165ms ± 8%  -25.32%  (p=0.000 n=10+10)
Unicode          112ms ± 7%         77ms ± 7%  -31.07%  (p=0.000 n=10+10)
GoTypes          718ms ± 3%        564ms ± 2%  -21.50%  (p=0.000 n=10+10)
Compiler         3.31s ± 2%        2.60s ± 1%  -21.59%  (p=0.000 n=10+10)
SSA              8.75s ± 2%        6.53s ± 1%  -25.38%  (p=0.000 n=10+10)
Flate            135ms ± 8%        105ms ± 8%  -22.49%  (p=0.000 n=10+10)
GoParser         172ms ± 3%        135ms ± 2%  -21.22%  (p=0.000 n=8+9)
Reflect          448ms ± 3%        350ms ± 2%  -21.92%  (p=0.000 n=9+9)
Tar              202ms ± 9%        160ms ± 3%  -21.01%  (p=0.000 n=10+9)
XML              242ms ± 4%        185ms ± 6%  -23.34%  (p=0.000 n=10+10)


name       old alloc/op      new alloc/op      delta
Template        37.9MB ± 0%       37.9MB ± 0%   -0.03%  (p=0.005 n=10+10)
Unicode         28.8MB ± 0%       28.8MB ± 0%     ~     (p=0.093 n=10+10)
GoTypes          112MB ± 0%        112MB ± 0%   -0.01%  (p=0.029 n=10+10)
Compiler         466MB ± 0%        466MB ± 0%     ~     (p=0.105 n=10+10)
SSA             1.48GB ± 0%       1.48GB ± 0%     ~     (p=0.105 n=10+10)
Flate           24.3MB ± 0%       24.3MB ± 0%   -0.04%  (p=0.002 n=10+10)
GoParser        30.7MB ± 0%       30.7MB ± 0%   -0.04%  (p=0.000 n=9+10)
Reflect         76.3MB ± 0%       76.3MB ± 0%   -0.02%  (p=0.000 n=7+10)
Tar             39.2MB ± 0%       39.2MB ± 0%   -0.03%  (p=0.002 n=10+9)
XML             41.5MB ± 0%       41.4MB ± 0%   -0.02%  (p=0.000 n=10+9)


name       old allocs/op     new allocs/op     delta
Template          385k ± 0%         385k ± 0%   -0.03%  (p=0.000 n=10+10)
Unicode           342k ± 0%         342k ± 0%     ~     (p=0.118 n=10+10)
GoTypes          1.19M ± 0%        1.19M ± 0%   -0.02%  (p=0.000 n=9+10)
Compiler         4.52M ± 0%        4.52M ± 0%   -0.00%  (p=0.000 n=10+10)
SSA              12.2M ± 0%        12.2M ± 0%   -0.00%  (p=0.000 n=9+10)
Flate             234k ± 0%         234k ± 0%   -0.04%  (p=0.000 n=10+10)
GoParser          318k ± 0%         317k ± 0%   -0.03%  (p=0.000 n=10+8)
Reflect           974k ± 0%         974k ± 0%   -0.01%  (p=0.000 n=10+10)
Tar               395k ± 0%         395k ± 0%   -0.03%  (p=0.000 n=10+9)
XML               404k ± 0%         404k ± 0%   -0.02%  (p=0.000 n=10+10)

(with go version devel +1b1c8b3 Sat Feb 17 18:35:41 2018 +0000 linux/amd64, four cores, and 'performance' CPU frequency governor)

Running the benchmark and compiling the Go toolchain worked on an 8GB linux/amd64 computer with the garbage collector disabled.

Two concerns from https://groups.google.com/forum/#!topic/golang-dev/atj2hJIJj4o are limited systems such as the Raspberry Pi and large packages that may be created by generating code, but one conclusion from that thread is that a careful change to cmd/compile may be worthwhile.

I plan to report results here from:

  • how low can memory be limited on my 8GB linux/amd64 computer with and without GC enabled
  • adding a large generated code case to compilebench

Activity

added this to the Proposal milestone on Mar 7, 2018
ALTree (Member) commented on Mar 7, 2018

Potentially unbounded memory growth while compiling for a ~5% speed-up in compilation times for a typical package seems a bad trade-off. If this had the potential to cut compilation times in half it could be worth it, but 5% is not much. The CPU-time reduction is larger, but I wonder if filling up the memory when compiling many packages in parallel could make it even less worthwhile.

And anyway users can already do this with GOGC=off.

this proposal is to define a feature to disable the collector in certain cases for cmd/compile.

Which cases? It's not clear from the proposal. On certain systems? When compiling certain packages? Or both?

pciet (Contributor, Author) commented on Mar 7, 2018

Potentially unbounded memory growth while compiling for a ~5% speed-up in compilation times for a typical package seems a bad trade-off. If this had the potential to cut compilation times in half it could be worth it, but 5% is not much.

I may misunderstand, but I think the benchmark means 5% in kernel and 20-25% in application, which is a difference a person would notice.

which cases? It's not clear from the proposal. Like on certain systems? Or when compiling certain packages? Or both?

We’re missing data from everything not amd64/linux. I’d like to try with large open source projects. @ianlancetaylor mentioned very large packages built by generating code. The proposal is to define these cases and the feature that meets all needs. My thought and guess is disabling it may help 80% of people without any crashing, and otherwise we can reenable it by checking something.

And anyway users can already do this with GOGC=off.

Yes, but I’d prefer to not worry about that and a free noticeable improvement is good for the project.

ALTree (Member) commented on Mar 7, 2018

Would you mind if I'd label this proposal as "on hold" until you have all the data you need to come up with a concrete plan? The proposal process is usually used for concrete proposals with most of the details already worked out.

pciet (Contributor, Author) commented on Mar 7, 2018

@ALTree that's fine. I could go back to the golang-dev thread and work through it there too. Thanks.

agnivade (Contributor) commented on Mar 7, 2018

I am a bit apprehensive about this. IMO, this really feels like a slippery slope. There are a whole lot of cases where disabling GC gives a boost. But changing the compiler to dynamically switch off GC seems like a cop-out to me.

We should optimize the runtime further instead of switching off GC to improve performance, especially when there is already a switch (GOGC) exposed to the user.

pciet (Contributor, Author) commented on Mar 7, 2018

On the disabling front: what about a goroutine in cmd/compile that periodically (every 100ms?) checks memory usage? If usage goes over a platform default (adjustable with an environment variable), it turns regular garbage collection back on and returns.

mvdan (Member) commented on Mar 7, 2018

There's always making the compiler generate less garbage. For example, at the moment it parses files via cmd/compile/internal/syntax, and translates that AST to cmd/compile/internal/gc's. That results in every AST node being allocated twice, and lots of little objects for the GC to keep track of.

That will eventually be cleaned up, though. I would imagine that once the compiler gets better at generating less garbage, turning the GC off will have less of an impact.

josharian (Contributor) commented on Mar 7, 2018

The long-term plan is indeed to use less memory. Skipping the intermediate AST is one big piece of that. The other is lazy importing, since much of what gets imported is unused. @mdempsky is actively working on the latter, I believe.

pciet (Contributor, Author) commented on Mar 9, 2018

That will eventually be cleaned up, though. I would imagine that once the compiler gets better at generating less garbage, turning the GC off will have less of an impact.

If this isn’t the case by the end of the Go 1.11 development cycle then I think invisibly (no regressions because of memory use changes) disabling the collector for (assumed) widespread 10-30% time reduction is the right move. This would require a Go 1.12 issue to revisit the workaround.

mvdan (Member) commented on Mar 9, 2018

That seems to imply that making the compiler 5% faster is an immediate priority. Sure, the compiler could be faster, and it is made faster every release. But I don't see the need for this kind of urgency, especially when this workaround has many potential downsides. And also since it's already available via GOGC.

For example, if one compiles very large programs, I wouldn't be surprised if turning off the GC doubled the peak memory use of the compiler. Have you measured the downsides to disabling the GC in any way? Also remember the machines that have low RAM - if I remember correctly, even with GC on some ARM builders were having issues with memory.

pciet (Contributor, Author) commented on Mar 9, 2018

That seems to imply that making the compiler 5% faster is an immediate priority.

5% doesn’t seem worth a workaround effort, but 20-30% does to me. I may be misunderstanding the benchmark. The benchmark system is Ubuntu server without a GUI, and I’m assuming the time and user-time add.

For example, if one compiles very large programs, I wouldn't be surprised if turning off the GC doubled the peak memory use of the compiler. Have you measured the downsides to disabling the GC in any way?

From compilebench I assume the total allocations (without each package’s memory being released to the OS taken into account) generally don’t go beyond a few GB, which is fine for a typical desktop compile. Disabling the GC doesn’t change that number, and I assume most large programs consist of packages that fit within the compilebench constraints.

Also remember the machines that have low RAM - if I remember correctly, even with GC on some ARM builders were having issues with memory.

Having a dynamic reenable like I suggested earlier would cover these cases. One idea: if memory is overused, the toolchain could update the memory threshold environment variable (so all future compiles use the new value) and then retry the compile. Worst case, the GC is back to always on and the user doesn't see any difference.

pciet (Contributor, Author) commented on Mar 11, 2018

linux/amd64 with 1 GB of memory (kernel flag mem=1G) and spinning hard drive:

name       old time/op       new time/op       delta
Template         208ms ±12%        228ms ±11%     +9.93%  (p=0.010 n=10+9)
Unicode          109ms ±37%        128ms ±65%       ~     (p=0.218 n=10+10)
GoTypes          592ms ± 4%        589ms ± 5%       ~     (p=0.661 n=10+9)
Compiler         2.74s ± 4%        2.70s ± 3%       ~     (p=0.095 n=10+9)
SSA              6.79s ± 2%      161.30s ±14%  +2276.02%  (p=0.000 n=10+10)
Flate            111ms ± 6%        600ms ±20%   +439.18%  (p=0.000 n=8+10)
GoParser         144ms ±17%        293ms ± 7%   +103.54%  (p=0.000 n=10+10)
Reflect          378ms ± 4%        395ms ± 7%     +4.27%  (p=0.043 n=10+9)
Tar              180ms ± 1%        257ms ±10%    +43.12%  (p=0.000 n=8+9)
XML              215ms ±10%        230ms ± 1%       ~     (p=0.408 n=10+8)
StdCmd           27.2s ±21%       153.6s ±17%   +465.76%  (p=0.000 n=8+10)

name       old user-time/op  new user-time/op  delta
Template         221ms ± 4%        168ms ±10%    -24.10%  (p=0.000 n=9+10)
Unicode          114ms ± 6%         78ms ± 8%    -31.47%  (p=0.000 n=10+10)
GoTypes          720ms ± 2%        560ms ± 4%    -22.12%  (p=0.000 n=10+10)
Compiler         3.26s ± 1%        2.60s ± 2%    -20.10%  (p=0.000 n=9+10)
SSA              8.74s ± 1%        7.59s ± 4%    -13.20%  (p=0.000 n=10+10)
Flate            139ms ± 4%        113ms ±19%    -18.37%  (p=0.000 n=9+10)
GoParser         174ms ± 3%        139ms ± 5%    -19.82%  (p=0.000 n=10+10)
Reflect          454ms ± 0%        353ms ± 7%    -22.24%  (p=0.000 n=9+10)
Tar              208ms ± 2%        161ms ± 4%    -22.84%  (p=0.000 n=10+10)
XML              242ms ± 1%        193ms ± 3%    -20.04%  (p=0.000 n=9+9)

name       old alloc/op      new alloc/op      delta
Template        37.9MB ± 0%       37.9MB ± 0%     -0.03%  (p=0.002 n=10+10)
Unicode         28.8MB ± 0%       28.8MB ± 0%     -0.01%  (p=0.015 n=10+10)
GoTypes          112MB ± 0%        112MB ± 0%       ~     (p=0.113 n=9+10)
Compiler         466MB ± 0%        466MB ± 0%     -0.01%  (p=0.003 n=9+10)
SSA             1.48GB ± 0%       1.48GB ± 0%       ~     (p=0.093 n=10+10)
Flate           24.3MB ± 0%       24.3MB ± 0%     -0.04%  (p=0.000 n=10+10)
GoParser        30.7MB ± 0%       30.7MB ± 0%     -0.04%  (p=0.000 n=10+10)
Reflect         76.3MB ± 0%       76.3MB ± 0%     -0.02%  (p=0.000 n=10+10)
Tar             39.2MB ± 0%       39.2MB ± 0%     -0.02%  (p=0.009 n=10+10)
XML             41.5MB ± 0%       41.4MB ± 0%     -0.02%  (p=0.019 n=10+10)

name       old allocs/op     new allocs/op     delta
Template          385k ± 0%         385k ± 0%     -0.03%  (p=0.000 n=10+10)
Unicode           342k ± 0%         342k ± 0%     -0.01%  (p=0.004 n=10+10)
GoTypes          1.19M ± 0%        1.19M ± 0%     -0.01%  (p=0.000 n=10+10)
Compiler         4.52M ± 0%        4.52M ± 0%     -0.01%  (p=0.000 n=9+10)
SSA              12.2M ± 0%        12.2M ± 0%     -0.00%  (p=0.000 n=10+10)
Flate             234k ± 0%         234k ± 0%     -0.04%  (p=0.000 n=9+10)
GoParser          318k ± 0%         317k ± 0%     -0.03%  (p=0.000 n=9+9)
Reflect           974k ± 0%         974k ± 0%     -0.01%  (p=0.000 n=10+10)
Tar               395k ± 0%         395k ± 0%     -0.02%  (p=0.000 n=10+9)
XML               404k ± 0%         404k ± 0%     -0.02%  (p=0.000 n=10+10)

Disabling on platforms with virtual memory (all of them?) shouldn’t cause crashes (assuming ample drive space), but for large cases performance can be severely impacted by memory swapping to the point of being unusable.

The compile command is used on each package separately but has to load the entirety of its dependency object code, so it appears a lot of memory can be used in a single call especially at the root package. But for compilebench cases this number appears to be under 8 GB.

Conclusion

The memory needs of cmd/compile are unbounded in relation to program size, and past a knee point of memory use the garbage collector helps performance substantially. Before that point, which is high on most development computers for small to medium programs, we can see a 10%-30% cmd/compile time reduction by disabling the garbage collector.

pciet (Contributor, Author) commented on Mar 15, 2018

I misunderstood the benchmark. The first table (time/op) is wall time, and the second table (user-time/op) is the value reported by os.ProcessState.UserTime(). I was thinking this was kernel vs app time; also, I can't just add the percentages, even though the values are somewhat close.

For the user the wall time is what they’ll perceive, so we are looking at ~5% like @mvdan said. That doesn’t seem like a worthwhile increase for a workaround, so I’ll close this. Thanks.

locked and limited conversation to collaborators on Mar 15, 2019