Closed
Description
package main
import (
"bytes"
"fmt"
"io/ioutil"
"math/rand"
"runtime"
"time"
)
func main() {
for i := 0; i < 1000; i++ {
time.Sleep(time.Millisecond * 1)
go fakeGetAndSaveData()
}
runtime.GC()
time.Sleep(10 * time.Minute)
}
func fakeGetAndSaveData() {
var buf bytes.Buffer
for i := 0; i < 40000; i++ {
buf.WriteString(fmt.Sprintf("the number is %d\n", i))
}
ioutil.WriteFile(fmt.Sprintf("%d.txt", rand.Int()), buf.Bytes(), 0644)
}
Copied from issue #9869 .
Reporter thinks the memory used by this program is not returned to the OS during the Sleep(10*minute) call.
On Linux, I'm using top -b | grep memoryLeak command and on Windows, I use Task Manager. I just ran this again on Ubuntu 16.04 and it looks like it isn't exhibiting the problem after all. When I ran it yesterday it crashed my VM, but it wasn't due to out of memory as I'd assumed. But, I do see the problem clearly on Windows. Sorry, I'll put together a different example for Linux.
Here's what I see on Windows. This is after it's gone idle. Nothing new is being spawned. It's just sitting there. It stays like this for 10 minutes until the program closes.
Activity
randall77 commentedon Oct 25, 2017
@6degreeshealth
Please try and run with the environment variable
GODEBUG
set togctrace=1
. Let it run to completion.Grab all the lines starting with
scvg
and post them here.My current suspicion is that we're actually returning pages to the OS, but it isn't taking them. We've seen this before, where the OS doesn't actually remove the pages from the process unless there is demand for them elsewhere in the system.
A scavenger trace (the
scvg
lines) will let us know for sure one way or the other.alexbrainman commentedon Oct 26, 2017
@randall77 I replaced your 10 minutes sleep with
and I am posting whole output
Alex
randall77 commentedon Oct 26, 2017
The Go runtime claims it returned all but 94MB to the OS (the scvg2 lines).
Maybe my hunch earlier was correct, or maybe the "Memory" reported is virtual, not physical.
In any case, I suspect the error is with the observations, not with Go. I think we need an expert familiar with the Windows virtual memory system to know for sure.
alexbrainman commentedon Oct 26, 2017
I downloaded VMMap program to see what is going on with this process. And it does uses a lot of memory. This particular program creates quite a few threads, and OS allocates 2MB for each thread stack space. @aclements merged CL 49331 recently, that changed oh64.SizeOfStackCommit from 0x00001000 to 0x00200000. I think it is a mistake (and I did not notice at the time). We should change it back to make thread initial stack size small again.
It would be nice to have a test to prevent mistakes like that in the future. I could, probably, write a test that measures committed memory of an external program. But what should I compare that value with?
Alex
randall77 commentedon Oct 26, 2017
Ah, I see, this might be Ms, not Gs or the heap, that are using up all the memory.
In CL 49331, Austin claimed large M stacks just use virtual memory, not physical memory. Are you disputing that?
alexbrainman commentedon Oct 26, 2017
CL 49331 intended to change just the use of virtual memory, but, I think, there is a mistake in CL 49331 - change of oh64.SizeOfStackCommit from 0x00001000 to 0x00200000 actually affects physical memory.
Windows memory management is quite complicated, but, for the purpose of this discussion, each page of process addressed memory could be one of three: free, reserved and committed (see https://msdn.microsoft.com/en-us/library/windows/desktop/aa366794(v=vs.85).aspx ) Only "committed"
memory require actual physical memory to back it up.
When process starts, EXE file describes to OS how stacks should be managed. In particular PE file format specifies the maximum size of the stack to be used (stored in oh64.SizeOfStackReserve). When process creates new thread, it "reserves" oh64.SizeOfStackReserve bytes for each stack. PE file also specify how much of oh64.SizeOfStackReserve bytes should be "committed" at start of each thread (this is stored in oh64.SizeOfStackCommit). So normal stacks should start with large "reserved" stack, with some small part of it "committed" at the beginning. As stack grows, "committed" part grows too - when program grows beyond "committed" area, it tries to read / write memory, this causes "system exception", and memory manager handles these exceptions by making more "committed" memory and restarting failed code again.
So for a particular case of non-cgo windows/amd64 we had:
before CL 49331, and:
after CL 49331. So the max stack size got increased from 132K to 2M, but "committed" part of the stack also was increased from 4K to 2M. I think we should not have done the second change, because now every new thread cost us 2M of physical memory comparing to 4K before CL 49331. And once "committed", stack memory cannot be freed - I suspect only exiting thread will free its stack memory.
I hope I explained it well enough.
Alex
randall77 commentedon Oct 27, 2017
Yes, that all makes sense, thanks.
Can someone test such a change on a Windows machine, and see if it helps this case?
Are we ever exiting Ms that have been idle for a long time? If not, then doing that might fix the issue. Maybe an additional task for the scavenger to do.
aclements commentedon Oct 27, 2017
I see. It looks like I just did what we were already doing in the cgo case. I don't know why we were committing basically the whole stack under cgo in the first place. I'm certainly fine with reducing the commit if that works.
I recently added the framework necessary to do this, but right now we only exit locked Ms (when their Gs exit without unlocking). We don't use this for exiting idle Ms currently, but we could.
alexbrainman commentedon Oct 27, 2017
I did test that yesterday. Setting oh64.SizeOfStackCommit back to 0x00001000 helps with memory usage significantly. We can get @6degreeshealth check that it helps too to be sure. I am worried that we have broken this thing without anyone noticing. I wonder if I could write some test to stop this happening in the future.
I don't think so. But Austin will know.
Yes, the fact that thread stacks consume memory (even when threads are idle) might be a good reason to exit them.
cgo case is very different from non-cgo case as far as I am concerned. I hope not many people use cgo. Especially on Windows, because we have syscall.Syscall to easily call into Windows C code without cgo. And cgo executables are created by external linker, so I am not sure if we can control SizeOfStackReserve and SizeOfStackCommit fields in the PE exe. Also we use _beginthead function to start threads with cgo. I do not know if _beginthread allow us control "commit" stack size. I probably dropped the ball with cgo. Do we want to try and improve this for cgo case too?
Cool. I will send the change. The change should be simple, I just wonder about the test.
Alex
aclements commentedon Oct 27, 2017
That's true. Though I would expect the external linker to set reasonable "normal" stack sizes for Windows, which presumably have a small commit?
We pass 0 for the stack size, which MSDN says means to "the same value as the stack that's specified for the main thread", which I expect means to use both the reservation and commit from the PE header.
Start lots of threads and if the system crashes the test fails? :)
robarchibald commentedon Oct 27, 2017
I would be happy to rest, but I'm on a business trip away from my Windows computer. I won't be able to test until Saturday. I assume that to test I'll need to build Go from source using the latest source. Never done that before, but I'll give it a whirl.
alexbrainman commentedon Oct 27, 2017
I will see what we get with external linker.
I was actually hoping you will be helpful. ;-)
That would be nice, if you test when my change is ready. Thank you.
The change is not ready, so do not worry.
Yes, you will need to build Go from source. Here https://golang.org/doc/install/source are the instructions. We will help you if you get into trouble.
Alex
aclements commentedon Oct 27, 2017
For the test, I was thinking the Windows equivalent of this:
E.g., on Linux/amd64, I get a mere "3973120 bytes RSS (52129792 bytes virtual)".
I'm not sure what the Windows equivalent of /proc/self/stat's RSS is. Maybe
PROCESS_MEMORY_COUNTERS_EX.PrivateUsage
?Unfortunately, this probably has to be a
runtime/testdata/testprog
thing rather than a regular test to avoid interference from other tests' memory usage.15 remaining items