runtime: performance degradation in go 1.12 #36521

Closed
@interviewQ

Description


What version of Go are you using (go version)?

go 1.11.12, planning to upgrade to go 1.12.15

Does this issue reproduce with the latest release?

Yes, with go 1.12.15

What operating system and processor architecture are you using (go env)?

Ubuntu 18.04

What did you do?

Since go 1.11 is about to go EOL, we are planning to upgrade to the next version. I attempted to upgrade to go 1.12.15, but I am seeing a 15-20% performance degradation. A similar result was seen with go 1.13.6.

What did you expect to see?

I was hoping to see performance improvement with a newer go compiler.

What did you see instead?

Instead of a performance improvement, I saw a performance degradation.

Is there anything in newer Go versions (1.12 and beyond) that could explain the above performance degradation? I read online that the scheduler was modified to allow preemption of long-running goroutines. While I do not know if this could cause the performance issue, is there any way to revert this scheduler behavior just for test purposes? Any other suggestions are welcome.

Activity

ALTree (Member) commented on Jan 13, 2020

Hi,

thanks for reporting this, but your description of the issue is far too vague to be actionable.

Please provide a runnable, self-contained benchmark that demonstrates the problem on the latest 1.13 release, together with some benchmarking results on the platform you are running on.

added the WaitingForInfo label (issue is not actionable because of missing required information, which needs to be provided) on Jan 13, 2020
changed the title from "Performance degradation in go 1.12" to "runtime: performance degradation in go 1.12" on Jan 13, 2020
interviewQ (Author) commented on Jan 13, 2020

Thanks for the response. I do see the performance issue with 1.13 also. Give me a few days to come up with some sample code that demonstrates the issue.

interviewQ (Author) commented on Feb 4, 2020

Sorry for the delay; it took me a while to narrow down the problem code path. Attached is a sample benchmark test. I am running this test on my Mac, and here are the results:

1.11.13 -

goos: darwin
goarch: amd64
BenchmarkBlockSplit-12 1 2506669772 ns/op
PASS

1.13.7 -

goos: darwin
goarch: amd64
BenchmarkBlockSplit-12 1 2945086182 ns/op
PASS

So, 2.5 sec -> 2.9 sec equates to a roughly 16% drop in performance.
perf.txt

randall77 (Contributor) commented on Feb 4, 2020

Here's a simple reproducer. I can reproduce on Linux.

package main

import (
	"fmt"
	"sync"
	"time"
)

// sink is a global so the compiler cannot optimize the allocations away.
var sink []byte

func main() {
	start := time.Now()
	var wg sync.WaitGroup
	// 10 goroutines each allocate 16000 64 KiB slices in a tight loop, in parallel.
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			for i := 0; i < 16000; i++ {
				sink = make([]byte, 64*1024)
			}
			wg.Done()
		}()
	}
	wg.Wait()
	fmt.Printf("time: %f\n", time.Since(start).Seconds())
}

This test is really allocation heavy. It's going to depend on the precise details of how allocation is done. I get these times:

go1.11.13: 0.689 sec
go1.12.16: 0.973 sec
go1.13.7: 1.151 sec
tip: 1.099 sec

I think most of the difference has to do with how big the heap grows. Here's my eyeballing of the maximum RSS (adding lots of iterations to the innermost loop so I can watch top while it runs):

go1.11.13: 310MB
go1.12.16: 202MB
go1.13.7: 158MB
tip: 98MB

This test is stressing the allocator so much that it goes way beyond 200% of live heap data, which should only be 640KB or so. You'll see that we've made steady improvements on how much over we go. Of course, that means that we'll run garbage collections more often, which will hurt the run time.
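
As a rough way to compare versions without eyeballing top, here is a minimal sketch of the same workload that reports the runtime's own memory statistics at the end (treating HeapSys minus HeapReleased as an approximation of the heap's share of RSS is an assumption of the sketch):

package main

import (
	"fmt"
	"runtime"
	"sync"
)

var sink []byte

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			for i := 0; i < 16000; i++ {
				sink = make([]byte, 64*1024)
			}
			wg.Done()
		}()
	}
	wg.Wait()

	// HeapSys is heap memory obtained from the OS; HeapReleased is the part
	// already returned to it. Their difference is a rough stand-in for the
	// heap's contribution to RSS, printed here instead of watching top.
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	fmt.Printf("HeapSys-HeapReleased = %d MiB, Sys = %d MiB\n",
		(ms.HeapSys-ms.HeapReleased)>>20, ms.Sys>>20)
}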

I'm not sure there's any bug here. It just seems like a different speed/memory tradeoff. To actually demonstrate a regression, I think we'd need to hold heap size constant somehow.

@mknyszek

interviewQ (Author) commented on Feb 4, 2020

Our app, which is showing the performance degradation, is also allocation heavy, which is why we are seeing this degradation with the newer Go runtime. Having said that, over the last few weeks I have made lots of changes to improve memory management and did see improvements that reduce the performance drop. From what you are saying above, it seems like allocation-heavy apps will have to take a hit when they upgrade to a newer Go runtime.

In your test above, it seems the time taken increased from 0.689 sec to 1.151 sec, which is a 65% increase. Is this big drop in performance solely attributable to running GC more often?

mknyszek (Contributor) commented on Feb 4, 2020

@interviewQ Some background: there was a known regression in the slow path of the allocator in Go 1.12, which was intentional in order to support returning memory to the OS more eagerly. For the vast majority of programs, this wasn't noticed. The regression only showed up in microbenchmarks, not in any real production services or applications (as far as we could see/find). You need to be really allocation bound to notice this, as @randall77's benchmark shows. Note that @randall77's reproducer makes many 64 KiB allocations without ever touching that memory (aside from zeroing); doing any real work with that memory between allocations would already meaningfully reduce the relative load on the allocator.

However, it's also been known that allocating heavily and in parallel has had serious lock contention issues since at least Go 1.11. This was noticed when we worked on the regression in Go 1.12. In Go 1.14 we worked to fix these issues (#35112) and it went reasonably well. I'm fairly confident that the allocator is now faster, so this seems to me like it's the change in the default trade-off in these "way past the heap goal" cases as @randall77 says.

With that being said, @interviewQ could you:

  1. Check if using the tip of the master branch for Go results in less memory used for your application.
  2. If so, try cranking up GOGC to the point where you're using the same amount of memory as before (see the sketch below this list), then measure CPU performance again.
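
GOGC can be raised either through the environment (e.g. GOGC=200 when launching the binary) or programmatically; here is a minimal sketch, with 200 purely as an illustrative value to raise until memory use matches the old release:

package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// GOGC=100 (the default) lets the heap grow to roughly 2x the live data
	// before the next collection; 200 allows roughly 3x, trading memory for
	// fewer GC cycles.
	old := debug.SetGCPercent(200)
	fmt.Printf("GOGC raised from %d to 200\n", old)

	// ... rest of the application ...
}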

Hopefully the net effect will be that we made performance better. :) If not, we can move on from there.

interviewQ (Author) commented on Feb 4, 2020

First of all thanks a lot for these quick replies and very useful info.

I had tested with 1.14beta1 in December and that still showed the performance issue.
Did the fix for #35112 go in after 1.14beta1 was released? I was planning on retesting with 1.14 when it gets released (it was due on Feb 1). However, I could try the tip of the master branch.

In our app, we have 1000 goroutines, all allocating memory in parallel, so, from what you are saying above, "allocating heavily and in parallel" applies to our app.

Using sync.Pool is something I have thought about, but it seems non-trivial since I will need to figure out when to put the buffer back in the pool. It is not impossible, just hard to get right with sync.Pool.
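
For illustration, a minimal sync.Pool sketch for fixed-size buffers would look something like this (the 64 KiB size and the names are placeholders):

package main

import "sync"

// bufPool hands out reusable 64 KiB buffers instead of allocating a fresh
// slice for every operation.
var bufPool = sync.Pool{
	New: func() interface{} { return make([]byte, 64*1024) },
}

func withBuffer(work func(buf []byte)) {
	buf := bufPool.Get().([]byte)
	defer bufPool.Put(buf) // safe only once nothing else references buf
	work(buf)
}

func main() {
	withBuffer(func(buf []byte) {
		// ... fill and use buf ...
		_ = buf
	})
}

The hard part is exactly the one described above: guaranteeing that no reference to the buffer outlives the Put.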

What if I have one goroutine whose only job is to allocate memory? The 1000 goroutines can talk to this "allocator" goroutine via channels. That way the huge allocations happen from only one goroutine and contention is eliminated. Please let me know if this makes sense. Also, I come from the C++ world and am new to Go, so please pardon my ignorance.

@mknyszek

mknyszek (Contributor) commented on Feb 4, 2020

> First of all thanks a lot for these quick replies and very useful info.
>
> I had tested with 1.14beta1 in December and that still showed the performance issue.
> Did the fix for #35112 go in after 1.14beta1 was released? I was planning on retesting with 1.14 when it gets released (it was due on Feb 1). However, I could try the tip of the master branch.

There may be a regression in the default configuration, but have you looked at memory use like I mentioned? If it went down, you could increase GOGC from the default until your memory use is the same as before (in the steady state) and see whether it performs better. Since you mentioned you're relatively new to Go, check out the comment at the top of https://golang.org/pkg/runtime for an explanation of GOGC, and also see https://golang.org/pkg/runtime/debug/#SetGCPercent for more information.

> In our app, we have 1000 goroutines, all allocating memory in parallel, so, from what you are saying above, "allocating heavily and in parallel" applies to our app.

The number of goroutines doesn't really matter since that's just concurrency. The GOMAXPROCS value is your actual level of parallelism (up to the number of independent CPU cores on your machine). What is GOMAXPROCS for your application?

> Using sync.Pool is something I have thought about, but it seems non-trivial since I will need to figure out when to put the buffer back in the pool. It is not impossible, just hard to get right with sync.Pool.
>
> What if I have one goroutine whose only job is to allocate memory? The 1000 goroutines can talk to this "allocator" goroutine via channels. That way the huge allocations happen from only one goroutine and contention is eliminated. Please let me know if this makes sense. Also, I come from the C++ world and am new to Go, so please pardon my ignorance.

Contention isn't really eliminated; you're just moving it from the allocator to the channels in this case. Before trying to restructure your code for this, please give my suggestion above a try. You might just have to increase GOGC to get the same level of CPU performance for the same memory usage.

interviewQ (Author) commented on Feb 4, 2020

GOMAXPROCS = number of CPU cores. Our app runs on EC2 instances, so it depends on which instance we are running on. In my test GOMAXPROCS = 72. I will try out your suggestions and will get back to you (most likely tomorrow).

interviewQ (Author) commented on Feb 4, 2020

I ran tests with various go versions and checked RSS usage.

1.11.13 - 3.3 GB
1.13.7 - 3.4 GB
1.14 (tip of master) - 4.6 GB

Actually, RSS increased with 1.14, so I did not do the GOGC adjustment. On this note, I see mention of a SetMaxHeap function for the runtime (https://blog.golang.org/ismmkeynote). Is this something that would be included in future versions of Go?

Relative to 1.11.13, performance-wise I see a 25% drop with 1.13.7 and a 12% drop with 1.14.

By reducing heap allocation in the app I do see a performance improvement, so it is clear the allocator is the bottleneck.

@mknyszek

mknyszek (Contributor) commented on Feb 5, 2020

Just out of curiosity, how did you measure RSS?

I just want to be very precise on this because the virtual memory usage of Go increased significantly with 1.14 (around 600 MiB, which admittedly doesn't account for everything you're seeing), but most of that memory is not mapped in/committed.

Thanks for the quick turnaround on this and for your cooperation. 72 cores is not a level of parallelism I've personally ever tested; it's possible we have some scalability problems at that level that we haven't seen before.

As another experiment, can you try setting GOMAXPROCS to 48 or so? There are other things I'd be interested in looking at as well, but I'll wait for your reply first.
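
For reference, that experiment does not require a code change: setting the GOMAXPROCS environment variable (GOMAXPROCS=48) when launching the binary is enough. The equivalent in code, as a minimal sketch with 48 taken from the suggestion above, would be:

package main

import "runtime"

func main() {
	// Limit the number of OS threads simultaneously executing Go code to 48,
	// overriding the default of one per available CPU core.
	runtime.GOMAXPROCS(48)

	// ... rest of the application ...
}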

mknyszek (Contributor) commented on Feb 5, 2020

@interviewQ Also, if you'd be willing to share a GC trace, that would give us a lot more insight into what the runtime is doing. You can collect one by running your application with GODEBUG=gctrace=1 set and capturing STDERR.

interviewQ (Author) commented on Feb 5, 2020

@mknyszek
I measured RSS by looking at the "top" output; did you mean something else when you mentioned the memory usage of the application? I had collected a gctrace but did not save it. I will run the test again and give you the output. I am planning to run the following tests:

  • GOMAXPROCS=72, gctrace=1, go 1.11.13
  • GOMAXPROCS=72, gctrace=1, go 1.13.7
  • GOMAXPROCS=48, gctrace=1, go 1.11.13
  • GOMAXPROCS=48, gctrace=1, go 1.13.7

Do you also need results from 1.14 (tip of the master branch)?

[10 remaining items not shown]

added this to the Go1.14 milestone on Feb 12, 2020
dmitshur (Member) commented on Feb 13, 2020

It's not clear to me yet if this should be a release blocker for 1.14, but I'll tentatively add it for visibility so we can discuss it in a release meeting.

dmitshur (Member) commented on Feb 14, 2020

We've looked over this, and it should not be a release blocker for Go 1.14, because it's not a regression that is specific to 1.14 (it affects previous releases too). Removing release-blocker.

modified the milestones: Go1.14, Go1.15 on Feb 14, 2020
added the WaitingForInfo label (issue is not actionable because of missing required information, which needs to be provided) on Mar 12, 2020
self-assigned this on Mar 12, 2020
gopherbot (Contributor) commented on Apr 12, 2020

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)

networkimprov commented on Apr 12, 2020

@interviewQ I think this was waiting on pprof results from you, described in #36521 (comment).

locked and limited conversation to collaborators on Apr 12, 2021

Metadata

Assignees: no one assigned

Labels: FrozenDueToAge; NeedsInvestigation (someone must examine and confirm this is a valid issue and not a duplicate of an existing one); WaitingForInfo (issue is not actionable because of missing required information, which needs to be provided)
