Description
Disabling the garbage collector in cmd/compile via debug.SetGCPercent(-1)
saves significant time according to compilebench. Since the compile command runs once per package, there are cases where the garbage collector frees memory the OS is about to reclaim anyway when the process exits, so this proposal is to define a feature that disables the collector in certain cases for cmd/compile.
name old time/op new time/op delta
Template 172ms ± 1% 162ms ± 1% -5.88% (p=0.000 n=9+8)
Unicode 80.2ms ± 2% 74.3ms ± 1% -7.38% (p=0.000 n=10+8)
GoTypes 572ms ± 1% 542ms ± 0% -5.24% (p=0.000 n=9+9)
Compiler 2.63s ± 1% 2.50s ± 1% -4.91% (p=0.000 n=10+10)
SSA 6.67s ± 1% 6.27s ± 0% -6.00% (p=0.000 n=10+10)
Flate 111ms ± 1% 105ms ± 3% -5.44% (p=0.000 n=9+10)
GoParser 137ms ± 1% 130ms ± 1% -5.38% (p=0.000 n=8+9)
Reflect 365ms ± 1% 346ms ± 1% -5.13% (p=0.000 n=9+10)
Tar 161ms ± 1% 155ms ± 4% -3.56% (p=0.004 n=10+10)
XML 193ms ± 1% 185ms ± 1% -4.09% (p=0.000 n=9+9)
StdCmd 16.7s ± 1% 12.8s ± 0% -23.46% (p=0.000 n=9+10)
name old user-time/op new user-time/op delta
Template 221ms ± 4% 165ms ± 8% -25.32% (p=0.000 n=10+10)
Unicode 112ms ± 7% 77ms ± 7% -31.07% (p=0.000 n=10+10)
GoTypes 718ms ± 3% 564ms ± 2% -21.50% (p=0.000 n=10+10)
Compiler 3.31s ± 2% 2.60s ± 1% -21.59% (p=0.000 n=10+10)
SSA 8.75s ± 2% 6.53s ± 1% -25.38% (p=0.000 n=10+10)
Flate 135ms ± 8% 105ms ± 8% -22.49% (p=0.000 n=10+10)
GoParser 172ms ± 3% 135ms ± 2% -21.22% (p=0.000 n=8+9)
Reflect 448ms ± 3% 350ms ± 2% -21.92% (p=0.000 n=9+9)
Tar 202ms ± 9% 160ms ± 3% -21.01% (p=0.000 n=10+9)
XML 242ms ± 4% 185ms ± 6% -23.34% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
Template 37.9MB ± 0% 37.9MB ± 0% -0.03% (p=0.005 n=10+10)
Unicode 28.8MB ± 0% 28.8MB ± 0% ~ (p=0.093 n=10+10)
GoTypes 112MB ± 0% 112MB ± 0% -0.01% (p=0.029 n=10+10)
Compiler 466MB ± 0% 466MB ± 0% ~ (p=0.105 n=10+10)
SSA 1.48GB ± 0% 1.48GB ± 0% ~ (p=0.105 n=10+10)
Flate 24.3MB ± 0% 24.3MB ± 0% -0.04% (p=0.002 n=10+10)
GoParser 30.7MB ± 0% 30.7MB ± 0% -0.04% (p=0.000 n=9+10)
Reflect 76.3MB ± 0% 76.3MB ± 0% -0.02% (p=0.000 n=7+10)
Tar 39.2MB ± 0% 39.2MB ± 0% -0.03% (p=0.002 n=10+9)
XML 41.5MB ± 0% 41.4MB ± 0% -0.02% (p=0.000 n=10+9)
name old allocs/op new allocs/op delta
Template 385k ± 0% 385k ± 0% -0.03% (p=0.000 n=10+10)
Unicode 342k ± 0% 342k ± 0% ~ (p=0.118 n=10+10)
GoTypes 1.19M ± 0% 1.19M ± 0% -0.02% (p=0.000 n=9+10)
Compiler 4.52M ± 0% 4.52M ± 0% -0.00% (p=0.000 n=10+10)
SSA 12.2M ± 0% 12.2M ± 0% -0.00% (p=0.000 n=9+10)
Flate 234k ± 0% 234k ± 0% -0.04% (p=0.000 n=10+10)
GoParser 318k ± 0% 317k ± 0% -0.03% (p=0.000 n=10+8)
Reflect 974k ± 0% 974k ± 0% -0.01% (p=0.000 n=10+10)
Tar 395k ± 0% 395k ± 0% -0.03% (p=0.000 n=10+9)
XML 404k ± 0% 404k ± 0% -0.02% (p=0.000 n=10+10)
(with go version devel +1b1c8b3 Sat Feb 17 18:35:41 2018 +0000 linux/amd64, four cores, and the 'performance' CPU frequency governor)
Running the benchmark and compiling the Go toolchain both worked on an 8 GB linux/amd64 computer with the garbage collector disabled.
Two concerns raised in https://groups.google.com/forum/#!topic/golang-dev/atj2hJIJj4o are memory-limited systems such as the Raspberry Pi and very large packages produced by code generation, but the thread's conclusion is that a careful cmd/compile change may still be worthwhile.
I plan to report results here from:
- how low can memory be limited on my 8GB linux/amd64 computer with and without GC enabled
- adding a large generated code case to compilebench
Activity
ALTree commented on Mar 7, 2018
Potentially unbounded memory growth while compiling, in exchange for a ~20% speed-up in compilation times for a typical package, seems like a bad trade-off. If this had the potential to cut compilation times in half it could be worth it, but 20% is not much. The CPU-time reduction is also very small, and I wonder whether filling up memory when compiling many packages in parallel could make it even less worthwhile.
And anyway, users can already do this with GOGC=off.

Which cases? It's not clear from the proposal. On certain systems? When compiling certain packages? Or both?
pciet commented on Mar 7, 2018
I may misunderstand, but I think the benchmark means 5% in kernel time and 20-25% in application time, which is a difference a person would notice.
We're missing data from everything other than linux/amd64. I'd like to try with large open source projects. @ianlancetaylor mentioned very large packages built by generating code. The proposal is to define these cases and a feature that meets all needs. My thought, and guess, is that disabling it may help 80% of people without any crashing, and otherwise we can re-enable it by checking something.
Yes, but I'd prefer not to worry about that; a free, noticeable improvement is good for the project.
ALTree commented on Mar 7, 2018
Would you mind if I'd label this proposal as "on hold" until you have all the data you need to come up with a concrete plan? The proposal process is usually used for concrete proposals with most of the details already worked out.
pciet commented on Mar 7, 2018
@ALTree that's fine. I could go back to the golang-dev thread and work through it there too. Thanks.
agnivade commented on Mar 7, 2018
I am a bit apprehensive about this. IMO this really feels like a slippery slope. There are a whole lot of cases where disabling the GC gives a boost, but changing the compiler to dynamically switch off the GC seems like a cop-out to me.
We should instead optimize the runtime further rather than switching off the GC to improve performance, especially when there is already a switch (GOGC) exposed to the user.

pciet commented on Mar 7, 2018
On the disabling front: what about a goroutine in cmd/compile that periodically (every 100ms?) checks memory usage and, if usage exceeds a platform default adjustable with an environment variable, turns regular garbage collection back on and returns?
mvdan commented on Mar 7, 2018
There's always making the compiler generate less garbage. For example, at the moment it parses files via cmd/compile/internal/syntax, and translates that AST to cmd/compile/internal/gc's. That results in every AST node being allocated twice, and lots of little objects for the GC to keep track of.
That will eventually be cleaned up, though. I would imagine that once the compiler gets better at generating less garbage, turning the GC off will have less of an impact.
josharian commented on Mar 7, 2018
Long term plan is indeed to use less memory. Skipping the intermediate ast is one big piece of that. The other is lazy importing, since much of what gets imported is unused. @mdempsky is actively working on the latter, I believe.
pciet commented on Mar 9, 2018
If this isn't the case by the end of the Go 1.11 development cycle, then I think invisibly (no regressions from memory-use changes) disabling the collector for an (assumed) widespread 10-30% time reduction is the right move. This would require a Go 1.12 issue to revisit the workaround.
mvdan commented on Mar 9, 2018
That seems to imply that making the compiler 5% faster is an immediate priority. Sure, the compiler could be faster, and it is made faster every release. But I don't see the need for this kind of urgency, especially when this workaround has many potential downsides. And also since it's already available via GOGC.
For example, if one compiles very large programs, I wouldn't be surprised if turning off the GC doubled the peak memory use of the compiler. Have you measured the downsides to disabling the GC in any way? Also remember the machines that have low RAM - if I remember correctly, even with GC on some ARM builders were having issues with memory.
pciet commented on Mar 9, 2018
5% doesn’t seem worth a workaround effort, but 20-30% does to me. I may be misunderstanding the benchmark. The benchmark system is Ubuntu server without a GUI, and I’m assuming the time and user-time add.
From compilebench I assume total allocations (not accounting for each package's memory being released back to the OS) generally stay under a few GB, which is fine for a typical desktop compile. Disabling the GC doesn't change that number, and I assume most large programs consist of packages that fit within the compilebench constraints.
Having a dynamic reenable like I suggested earlier would cover these cases. An idea is that if memory is overused then the memory threshold environment variable could be updated by the toolchain (so all future compiles use that) then the compile could be retried. Worst case the GC is back to always on and the user didn’t see any difference.
pciet commented on Mar 11, 2018
linux/amd64 with 1 GB of memory (kernel flag mem=1G) and a spinning hard drive:

Disabling on platforms with virtual memory (all of them?) shouldn't cause crashes (assuming ample drive space), but in large cases performance can be so severely degraded by swapping that the compiler becomes unusable.
The compile command is run on each package separately but has to load the object code of all of its dependencies, so a single invocation can use a lot of memory, especially for the root package. For the compilebench cases this number appears to stay under 8 GB.
Conclusion
cmd/compile's memory needs are unbounded with respect to program size, and past a knee point of memory use the garbage collector helps performance enormously. Below that point, though, which covers small to medium programs on most development computers, disabling the garbage collector yields a 10%-30% reduction in cmd/compile time.
pciet commented on Mar 15, 2018
I misunderstood the benchmark. The first table (time/op) is wall time, and the second table (user-time/op) is the value reported by os.ProcessState.UserTime(). (I was thinking this was kernel vs. application time; also, I can't simply add the percentages, even though the values are somewhat close.)
For the user, wall time is what they'll perceive, so we're looking at ~5%, as @mvdan said. That doesn't seem like a worthwhile improvement for a workaround, so I'll close this. Thanks.