Skip to content
This repository was archived by the owner on Oct 12, 2022. It is now read-only.

High memory & CPU usage after running for awhile #178

Closed
beyang opened this issue Mar 14, 2017 · 37 comments
Closed

High memory & CPU usage after running for awhile #178

beyang opened this issue Mar 14, 2017 · 37 comments

Comments

@beyang
Copy link
Member

beyang commented Mar 14, 2017

Enabled the langserver and did a few hovers for testing. Then vscode was in the background. After a few minutes the langserver was using 5.6GB of memory (machine only has 4 GB RAM).

@keegancsmith
Copy link
Member

Related to microsoft/vscode-go#853

So the issue is our language server resource usage is currently tuned for a server environment were we have a premise of lots of RAM. This has been presenting us issues in prod as well, so rewriting our hover and j2d to be smarter with resources needs to be done. It is almost certainly related to using the simple go loader interface and/or naive caching of its output. Add in our conservative cache invalidation policy, and we create a lot of garbage and re-work things out often in an editor environment.

I'll see if there is any short term wins here, but we probably need to make some larger changes to get good at this.

@vanackere
Copy link

Same issue for me: even with 16GB of memory I had to stop using go-langserver in vscode because it was frequently using all memory, making the computer unusable and triggering the OOM-Killer...

@vanackere
Copy link

vanackere commented Mar 22, 2017

Since I can reliably cause the langserver to use all memory on my private code base, how can I help debug this issue (or at least find where most of the ressource are spent) ? Any profiling tool to run ? If so, how ?

@ramya-rao-a
Copy link
Contributor

ramya-rao-a commented Mar 23, 2017

@keegancsmith Any updates on this?

I have pushed a change for adding the ability to pass -trace and -logfile flags to the language server (See microsoft/vscode-go@50e5c8a)

Is there anything else you'd like to see users able to do via vscode?

@OneOfOne
Copy link
Contributor

OneOfOne commented Apr 4, 2017

I hate to do this but any news on this? this is kind of a major issue.

I have 32gb of ram and the oom killer triggered twice in the last 4 hours, gocode is broken on go git.

@beyang
Copy link
Member Author

beyang commented Apr 17, 2017

@OneOfOne Apologies for the delay. We're still looking into it, but have had to prioritize a few other features for Go in the past 2 weeks. I'll update this issue when we have a patch targeting this issue. Appreciate the feedback and thanks for trying it out!

@mgleason
Copy link

mgleason commented Apr 27, 2017

+1 Same issue here (on MacOS). Seems to grow to consume all available memory and then keeps going consuming swap as the system slows to a crawl. At one point "go-langserver" was consuming over 18GB.

FYI - after restarted "go-langserver", it re-started consuming 183.6MB of space. Added a single space to a .go file and saved - memory consumed went to 576.0MB. Changed a letter from lower to uppercase, saved, changed it back to lowercase, saved and memory was at 1.09GB. Will try restarting next...

OK, restarted and getting same behavior. 193.8MB immediately, added a single space character, saved - memory went to 574.8MB. Changed a letter from uppercase to lower, saved, changed it back to uppercase, saved and memory is at 1.86GB. Did it again and memory is 2.0GB.

@beyang
Copy link
Member Author

beyang commented Apr 28, 2017

Thanks for the feedback @mgleason. I'll be targeting the local machine use case in the next couple of weeks, so I hope to have a patch out soon. In the meantime, thanks for trying it out and posting the feedback!

@keegancsmith
Copy link
Member

A detailed comment for someone who tackles this problem.

TLDR: Rewrite everything in loader.go so we work more like the go compiler. https://sourcegraph.com/github.com/sourcegraph/go-langserver@ee6a3562/-/blob/langserver/loader.go

Currently this language server is optimized for the server environment on sourcegraph.com. The tradeoffs which most negatively effect desktop users are:

  • Many shared read-only workspaces in a single process.
  • Server environment has lots of RAM.

When we receive a request for my/pkg/a.go, we use golang.org/x/tools/go/loader to typecheck my/pkg. We only typecheck my/pkg, but the bulk of the time is spent on the transitive dependencies. https://sourcegraph.com/github.com/sourcegraph/go-langserver@ee6a3562/-/blob/langserver/loader.go#L233

Our caching for this is only to cache the output of loader for my/pkg. All the intermediate work it does (on the transitive dependencies) is not cached at all. When we receive an edit we just throw that cache out. This works alright for sourcegraph.com since when a user is browsing a file we never invalidate the cache and we expect to receive a lot of hovers for different places in that file. On the desktop when we receive an edit, we just throw away all of our caches https://sourcegraph.com/github.com/sourcegraph/go-langserver@ee6a3562/-/blob/langserver/handler.go#L353

With that in mind the major improvements for this are:

  • Granular caching and cache invalidation. Do not throw away everything on edits.
  • Cache intermediate work. Look into how we can use what go stores in $GOPATH/pkg.
  • Possibly throw away the coarse pkg caching, and instead do the minimal amount of work for a hover, rather than doing the work for a whole pkg. This will technically make sourcegraph "slower", but I believe it will still be fast enough for users (instead of a few milliseconds, it will be a few 100 milliseconds).
  • Do not use in-memory caches, instead rely on a filesystem (or make it pluggable).

Implementing this sort of stuff almost certainly will require forking and heavily modifying golang.org/x/tools/go/loader. I would target the j2d use case since that is the most heavily used. We can then get hover to use the j2d result to generate hover contents. At some point we had a fork of godef. Making godef use a virtual filesystem may be the easiest win. As for references it can be left alone since it works quite differently. Symbols just relies on the AST, so does not suffer the same performance problems.

Note: I have not done any proper profiling of the desktop use case, this is what my educated guess says the problem is. We may be surprised to find that its something silly like FindPackage being slow. If so, we could look into open sourcing the optimised FindPackage we use on sourcegraph.com.

@emidoots
Copy link
Member

(Hi all! I'm taking over here where @keegancsmith had to leave off)

I've merged #192 just now which will significantly help lower the memory consumption of the language server, by somewhere between 20-40% from my testing.

I suspect my above PR entirely fixes this issue, but I also have another change brewing that will lower the memory consumption even further, and make most hover/definition requests faster. I'll keep everyone here posted, I hope to have that change out by the end of this week or early next week (obviously this is just an estimate / timeline could change).

Feel free to give the language server another shot and provide feedback! You can update it by running go get -u github.com/sourcegraph/go-langserver and enabling the VS Code option once again.

@OneOfOne
Copy link
Contributor

I'll test and get back to you, woo

@OneOfOne
Copy link
Contributor

It seems stable around 4.3G of RAM (16.9G virtual), been running for 30 mins or so.

@OneOfOne
Copy link
Contributor

OneOfOne commented Jun 1, 2017

It randomly happened again, it spiked to 25G and I had to ssh in from my phone to kill it. :( @slimsag

@emidoots
Copy link
Member

emidoots commented Jun 1, 2017

Thanks for giving it a shot and providing feedback, @OneOfOne, we appreciate it! Any details you can provide would be helpful:

  1. How long was it running before it happened? A few hours, multiple days, etc.
  2. Did you notice if the spike occurred very rapidly or slowly over a long period of time?
  3. If possible, which codebase does it occur on? Or if it is private, is there an open source project that you would guess to be about the same size?

As mentioned in my prior message, I am working on some further improvements that will make a larger impact here. I'll update here as soon as that lands and can be tested :)

@OneOfOne
Copy link
Contributor

OneOfOne commented Jun 1, 2017

Glad to help, when it works it's really responsive and great.

  1. Less than 10 minutes, which is odd because I worked on the same project for several hours yesterday and it didn't cause any issues.
  2. It was very rapid, I was trying to get to the source of a function when it happened.
  3. Sadly it's a private project, that includes 10-15 sub projects, has unsafe in it, the code base is about ~5mb of pure code.

@henvic
Copy link

henvic commented Jun 2, 2017

A simular issue is still happening on my Mac (tried updating after these changes) on a small sized project (< 30k LOC). But here is more high CPU usage than memory.

After the update I could use the langserver for more time, but after a while it still happens.

@cossay
Copy link

cossay commented Jun 2, 2017

I'm also having the same problem here on Ubuntu. High memory and CPU usage:
screenshot from 2017-06-02 17-50-53
screenshot from 2017-06-02 17-52-13

@emidoots
Copy link
Member

emidoots commented Jun 2, 2017

Hi @cossay

Your screenshot shows processes like godef running, but as far as I know that should not be running if you have set "go.useLanguageServer": true, in your VS Code settings. Can you confirm you're testing with that option turned on?

@cossay
Copy link

cossay commented Jun 2, 2017

@slimsag, you are right. I have "go.useLanguageServer" set to "false" in my VS Code settings. I'm going to set it as you suggested and see what happens. I'm not working on any projects now, I'm just following a tutorial.

@emidoots
Copy link
Member

emidoots commented Jun 2, 2017

It'd also be very helpful for me if anyone observing high memory or CPU usage could upload an SVG profile, see https://github.com/sourcegraph/go-langserver#profiling

@henvic
Copy link

henvic commented Jun 3, 2017

@henvic
Copy link

henvic commented Jun 5, 2017

Now with almost 100% usage after ~45min:
https://cl.ly/3Y2z3x1z2a0h
https://cl.ly/201G40441R1u

emidoots pushed a commit that referenced this issue Jun 14, 2017
…rging

Typechecking used to cause linear memory growth, now it doesn't. In the docker repository, editing and hovering 5 times (to trigger typechecking) produces:

Without this change:

1. 719MB
2. 1.21GB
3. 1.89GB
4. 2.24GB
5. 2.51GB

With this change:

1. 655MB
2. 776MB
3. 787MB
4. 787MB
5. 787MB

i.e. we used to grow by about 500MB per typecheck operation, now we just idle at ~500MB.

I actually planned to remove the cache _entirely_, but with a change in direction about how I will be tackling the slowness of go to definition and hover, removing the cache entirely is not possible anymore. This is the next best thing, and is still a HUGE improvement.

Each `LangHandler` has it's own cache (see `(*LangHandler).resetCaches`), but they are backed by the same LRU cache (see `cache.go`). Every cache that wraps the LRU cache (see `newTypecheckCache`) uses a global cache ID to avoid conflicts:

```
	// cacheID is used to prevent key conflicts between different
	// LangHandlers in the same process.
	cacheID int64
```

That is, we use the same backing LRU cache but keep our data separate. The intent of this, I presume, is to keep memory usage lower (because 2 LRU caches would have 2x the number of objects in caches overall). At first glance, this seems like a good idea. But when it comes to purging the cache, issues crop up. For example:

```
func (c *boundedCache) Purge() {
	// c.c is a process level cache. Since c.id is part of the cache keys,
	// we can just change its values to make it seem like we have purged
	// the cache.
	c.mu.Lock()
	c.id = nextCacheID()
	c.mu.Unlock()
}
```

This code assumes that we can purge the cache by incrementing the cache ID. And that's true -- data from the old cache ID will never be returned. However, this doesn't actually purge the old cache contents from memory!

In the case of sourcegraph.com, not purging things from memory is OK (we expect our process to hold all of it's memory, basically). But for local editors using the language server, this isn't OK. Every time the user modifies a file, the cache is purged (because the typechecking data is invalid). Because the purging doesn't actually clear contents from memory, we store (by default) up to 500 typechecked copies of the user's program. For something like docker, these cache objects can be up to 500MB of memory each (and we'll store 500 of them).

Given this is the case, we should really purge the cache contents instead of just invalidating them via ID incrementing. This will actually help lower memory consumption on sourcegraph.com, too, so it's a win-win situation.

Helps microsoft/vscode-go#853
Helps #178
emidoots pushed a commit that referenced this issue Jun 14, 2017
…rging

Typechecking used to cause linear memory growth, now it doesn't. In the docker repository, editing and hovering 5 times (to trigger typechecking) produces:

Without this change:

1. 719MB
2. 1.21GB
3. 1.89GB
4. 2.24GB
5. 2.51GB

With this change:

1. 655MB
2. 776MB
3. 787MB
4. 787MB
5. 787MB

i.e. we used to grow by about 500MB per typecheck operation, now we just idle at ~500MB.

I actually planned to remove the cache _entirely_, but with a change in direction about how I will be tackling the slowness of go to definition and hover, removing the cache entirely is not possible anymore. This is the next best thing, and is still a HUGE improvement.

Each `LangHandler` has it's own cache (see `(*LangHandler).resetCaches`), but they are backed by the same LRU cache (see `cache.go`). Every cache that wraps the LRU cache (see `newTypecheckCache`) uses a global cache ID to avoid conflicts:

```
	// cacheID is used to prevent key conflicts between different
	// LangHandlers in the same process.
	cacheID int64
```

That is, we use the same backing LRU cache but keep our data separate. The intent of this, I presume, is to keep memory usage lower (because 2 LRU caches would have 2x the number of objects in caches overall). At first glance, this seems like a good idea. But when it comes to purging the cache, issues crop up. For example:

```
func (c *boundedCache) Purge() {
	// c.c is a process level cache. Since c.id is part of the cache keys,
	// we can just change its values to make it seem like we have purged
	// the cache.
	c.mu.Lock()
	c.id = nextCacheID()
	c.mu.Unlock()
}
```

This code assumes that we can purge the cache by incrementing the cache ID. And that's true -- data from the old cache ID will never be returned. However, this doesn't actually purge the old cache contents from memory!

In the case of sourcegraph.com, not purging things from memory is OK (we expect our process to hold all of it's memory, basically). But for local editors using the language server, this isn't OK. Every time the user modifies a file, the cache is purged (because the typechecking data is invalid). Because the purging doesn't actually clear contents from memory, we store (by default) up to 500 typechecked copies of the user's program. For something like docker, these cache objects can be up to 500MB of memory each (and we'll store 500 of them).

Given this is the case, we should really purge the cache contents instead of just invalidating them via ID incrementing. This will actually help lower memory consumption on sourcegraph.com, too, so it's a win-win situation.

Helps microsoft/vscode-go#853
Helps #178
@emidoots emidoots changed the title High memory usage after running for awhile High memory & CPU usage after running for awhile Jun 14, 2017
@emidoots
Copy link
Member

@henvic @OneOfOne (/cc @ramya-rao-a) I've made another significant improvement to the memory consumption issue. I believe this should solve the issue you've all observed with the memory usage growing constantly. Now, it should sit at some constant number based on how large your project is (e.g. it should idle at a few hundred MB instead of growing into multiple GB).

The CPU usage issue / slowness around responding to go to definition and hover requests is still known, I'm working on improving this now.

Please feel free to give the language server another shot and provide feedback:

  1. Update: go get -u github.com/sourcegraph/go-langserver
  2. Re-enable the VS Code option once again.
  3. If you run into any issues, please capture a profile and respond here.

@OneOfOne
Copy link
Contributor

I'll give it a try and report back.

@OneOfOne
Copy link
Contributor

Good news / bad news kinda thing, good news is it stabilized at 1.6gb (~5%) of ram, bad news is every time I type anything it spikes to using 800% CPU. @slimsag

pprof.go-langserver.samples.cpu.001.pb.gz
cpu.svg.gz

@emidoots
Copy link
Member

Thanks for the feedback, @OneOfOne !

That is actually just good news. I am already aware of the CPU usage issue and am working on it now :) will update soon.

emidoots pushed a commit that referenced this issue Jun 23, 2017
This change adds a godef-based hover backend, similar to the godef-based
`textDocument/definition` backend added prior. The motivation for this
change is to avoid typechecking the entire program just to serve a single
`textDocument/hover` request. After this change, hover and definition are
both very quick and use little resources. Larger requests like `workspace/symbol`,
or `textDocument/references`, will continue to use more resources as they must
perform typechecking.

The output style of this hover implementation does vary from our prior
typechecking implementation in slight ways, but overall the implementation
always produces results that are on par or slightly better than our typechecking
implementation.

As with the `textDocument/definition` implementation, we should attempt consolidation
of this hover implementation with our typechecking variant further in the future. At
this point, of course, they are too different to share much code or implementation.

Helps #178
Helps microsoft/vscode-go#853
@emidoots
Copy link
Member

(Copying the message I just posted in microsoft/vscode-go#853 (comment) since these were basically the same issue.)

Hi everyone ( /cc @ramya-rao-a ), this issue should be fixed now. 😃 Very significant improvements have been made to both goto definition and hover requests, and they now use a tiny fraction of the memory / CPU usage seen before.

To update, please run:

go get -u github.com/sourcegraph/go-langserver

And once again set go.useLanguageServer to true inside your settings to enable the language server.

Please feel free to provide any feedback here, and if all looks good we can close this issue out!

@fsouza
Copy link

fsouza commented Jun 24, 2017

@slimsag thanks for working on that!

I know that Go 1.9 is still in beta, but could you add support for it? Installing go-langserver on go 1.9beta1 fails:

% go get -u github.com/sourcegraph/go-langserver
# github.com/sourcegraph/go-langserver/langserver/internal/godef/go/parser
src/github.com/sourcegraph/go-langserver/langserver/internal/godef/go/parser/parser.go:2036:36: cannot use nil as type token.Pos in field value
src/github.com/sourcegraph/go-langserver/langserver/internal/godef/go/parser/parser.go:2036:42: cannot use p.lineComment (type *ast.CommentGroup) as type ast.Expr in field value:
	*ast.CommentGroup does not implement ast.Expr (missing ast.exprNode method)
src/github.com/sourcegraph/go-langserver/langserver/internal/godef/go/parser/parser.go:2036:42: too few values in struct initializer

@emidoots
Copy link
Member

Hi @fsouza, thanks for the feedback. I just fixed that in 66ff4f5 so try giving go get -u github.com/sourcegraph/go-langserver another shot now.

@fsouza
Copy link

fsouza commented Jun 24, 2017

@slimsag awesome, it works! Thank you very much! :D

@ramya-rao-a
Copy link
Contributor

@OneOfOne @henvic @vanackere @cossay Can you update the language server go get -u github.com/sourcegraph/go-langserver, turn the setting back on and let us know how it works for you?

@OneOfOne
Copy link
Contributor

3 hours so far and it's steady at 1.5g virt mem and 0.7-1.5% cpu!

@OneOfOne
Copy link
Contributor

OneOfOne commented Jun 26, 2017

Sadly after a while the cpu usage spikes to 40% and stays there even when idle.

cpu.svg.gz

@emidoots
Copy link
Member

emidoots commented Jun 26, 2017

Hi @OneOfOne -- please do the following to give me some more insight (the CPU profile, unfortunately, does not provide enough here):

  1. While it is using 40% CPU, kill the process via terminal with kill -SIGQUIT $(pgrep go-langserver), then inside of VS Code Output tab (press ctrl+backtick then click 'Output' tab, like e.g. this) please copy entire log output here (or email).
  2. Did you activate 'Find All References' or symbol search, or just doing hovers and Goto / Peek definition?

@OneOfOne
Copy link
Contributor

Ok I can't reproduce it anymore, so it seems to be working good, if I run into it I'll update this issue.

Thank you for your hard work @slimsag.

@emidoots
Copy link
Member

Alright :) Please don't hesitate to open a new issue if you run into it again (even if you have little or no info)!

@emidoots
Copy link
Member

Closing since this issue is stale / no further complaints about performance have been made. If anyone runs into issues, please do not hesitate to create a new issue! :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants