Skip to content

Web Socket Message Broker memory leak. Broker retaining reactor tcp clients. #5810

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
daveburnsLIT opened this issue Apr 27, 2016 · 14 comments
Closed

Comments

@daveburnsLIT
Copy link

Version : Spring Boot 1.4.0.M2 and Spring Boot 1.3.3 Release

There is a memory leak in the Stomp Relay Broker. When client stomp websocket connections have disconnected the broker retains reactor tcp clients.

To replicate the issue the sample application used for our previous reactor memory leak investigation was executed for both Spring Boot 1.4.0.M2 and Spring Boot 1.3.3. The websocket broker was backed by a RabbitMQ instance.

The problem manifests when we disconnect clients as tcp client refs are retained. The following steps were used to replicate :

1. Run server with web socket broker (sample app)
2. Run VisualVM and connect to server
3. Run client test harness (sample app)
4. Wait until 2000 connections in RabbitMQ (shown via RabbitMQ Management)
5. Stop client test harness
6. Watch all client connections disappear from RabbitMQ (via RabbitMQ Management)
7. In VisualVM : Perform GC then Heap Dump
8. In Heap dump click Find (20 Biggest objects)
9. Repeat from step 3

The following shows the visualVM monitor for the above test. The arrows show the points when step 7 above was performed. The upward slating red line was drawn to illustrate the leak i.e. baseline memory heap post GC is increasing.

image

The following shows the output for the top retained objects in each heap dump (arrows). The first heap dump is before any clients had connected, basically a newly started websocket broker and what we expect all subsequent heap dumps to look like post client run and GC.

image

The following is a heap dump after 2000 client connection have terminated, no clients are visible in RAbbitMQ and the actual Java client VM has terminated, i.e. no way there could be a websocket client connection to the broker. There is about an 8MB leak in StompBrokerRelayMessageHandler and/or the Reactor TCP client.

image

The following shows the result after another test run, we keep leaking 8MB for every 2000 websocket connect/disconnects.

image

It looks like some sort or pool (ArrayList in StompBrokerRelayMessagHandler) is increasing, although this could be related to the issue @smaldini previously looked at.

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Apr 27, 2016
@philwebb
Copy link
Member

/cc @rstoyanchev

@philwebb
Copy link
Member

I'm guessing that this is a Spring MVC or Reactor issue rather than Spring Boot but lets leave this open here until someone from the other team can confirm.

@rstoyanchev
Copy link
Contributor

I'll have a look.

@rstoyanchev
Copy link
Contributor

rstoyanchev commented Apr 28, 2016

@daveburnsLIT I've confirmed the issue and created SPR-14231.

There is a proposed fix that will become available when this build completes. I've tried the fix locally but would be great to have you confirm that too.

To try the fix you'll need to switch to Spring Framework 4.3.0.BUILD-SNAPSHOT. In the sample app I added http://repo.spring.io/snapshot as a repository and this:

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-framework-bom</artifactId>
            <version>4.3.0.BUILD-SNAPSHOT</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

Note that for some reason spring-core was still 4.3.0.RC1 after the change and I had to add it explicitly as a dependency with a version. Not sure why that is. The actual bom looks okay. So do check the actual dependencies with mvn dependency:tree.

@wilkinsona
Copy link
Member

Thanks, @rstoyanchev.

@daveburnsLIT An easier way to use the Spring Framework snapshot is to configure the spring.version property in your pom:

<properties>
    <spring.version>4.3.0.BUILD-SNAPSHOT</spring.version>
</properties>

@rstoyanchev
Copy link
Contributor

Oops, thanks Andy :)

@philwebb philwebb added status: waiting-for-feedback We need additional information before we can continue and removed status: waiting-for-triage An issue we've not yet triaged labels Apr 29, 2016
@snicoll
Copy link
Member

snicoll commented Apr 30, 2016

@rstoyanchev fyi you cannot override a version from the parent with a bom import unfortunately. We've raised that and a fix will be available in Maven 3.4

@daveburnsLIT
Copy link
Author

@rstoyanchev - Seems like that has fixed the reactortcp client leak but after running client connect/disconnects for an hour the heap graph still trends up

image

Here's the top retained after 10 mins after a GC and heap dump

image

But here's after 30mins, where the webresources cache seems to be a problem

image

But then after an hour we have both the cache and some connection handler

image

After running about 15 heap dumps over the hour shows the webresources cache and connection handler objetcs gradually trend up on retained size. Don't know if these are a separate issue but still looks like a leak on the websocket broker

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels May 1, 2016
@rstoyanchev
Copy link
Contributor

@daveburnsLIT the first two images (10 vs 30 min) appear identical. In the 3rd the retained size has grown but I don't see anything related to Spring.

I ran the server for about an hour repeatedly hitting it with the test (as per the steps at the top) but the memory remained stable.

image

While testing I did notice an anomaly. In the client test there is a RestTemplate call to "/hello" which causes an HTTP session to be created and the client obtains the session id via "Set-Cookie" and uses it for the WebSocket handshake. In some test runs the first requests (anywhere between the very first up to the first 50 or 60) fail with a network error:

org.springframework.web.client.ResourceAccessException: I/O error on GET request for "http://localhost:8080/hello": Connection reset; nested exception is java.net.SocketException: Connection reset
    at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:634)
    at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:581)
    at org.springframework.web.client.RestTemplate.exchange(RestTemplate.java:499)
    at org.sample.websocket.WebsocketTest.getCookies(WebsocketTest.java:96)
    at org.sample.websocket.WebsocketTest.TestConnectionsSingleThread(WebsocketTest.java:60)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:119)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:42)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:234)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:74)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:196)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:689)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:660)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1324)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
    at org.springframework.http.client.SimpleBufferingClientHttpRequest.executeInternal(SimpleBufferingClientHttpRequest.java:84)
    at org.springframework.http.client.AbstractBufferingClientHttpRequest.executeInternal(AbstractBufferingClientHttpRequest.java:48)
    at org.springframework.http.client.AbstractClientHttpRequest.execute(AbstractClientHttpRequest.java:53)
    at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:620)
    ... 31 common frames omitted

I haven't figured out what causes the error but that's an HTTP REST call failure, i.e. before the WebSocket handshake is even attempted, and doesn't even reach the SessionController. In some cases I've tried curl from the command line and gotten the same failure.

I suspect something Tomcat-related is at hand. Perhaps after many sessions are created and destroyed something affects it. That may also be related to the memory leak you're seeing as the biggest objects in your snapshots above are also Tomcat-related. Switching to Jetty would be one thing to try.

Can you confirm if when running the tests you see errors either client or server side? You can also check in the RabbitMQ console after each iteration if you actually see 2001 connections (1 for the shared system connection). You would see less than that if not all connected.

@daveburnsLIT
Copy link
Author

@rstoyanchev - oops, sorry about that copy/paste error on the 10 and 30 min screen shot, basically trying to show the increase. That run was on a virtualbox setup (on leave and didn't have access to original workplace setup). With the virtualbox setup i noticed with a 2000 connections test client I might only get 1800 actual rabbitmq connections, and yes, when this happens I see http REST failure. I was assuming the virtual machine couldn't handle the load.

I've just rerun on a host system rather than virtual and I get 2000 connections without the error. Also, my concern on the web resources cache and connection handler may have been premature, I see it flatline around 14MB after 30mins (12 test runs of 2000 client connect/disconnects).

image

So, looks like the the leak is fixed and the failed initial requests may be due to host/container sizing and the rate at which connections are made.

Thanks a lot for the support, and, yet again, phenomenal turnaround time.

@daveburnsLIT
Copy link
Author

@rstoyanchev - Also, switched to Jetty as suggested and it took around 6 minutes to get 1100 client connections, then a couple of seconds for each new connection after that (didn't wait for the full 2000). I can get 2000 clean client connections in 40 seconds on the same host using tomcat.

@rstoyanchev
Copy link
Contributor

@daveburnsLIT I suspect whatever issue remains is related to the HTTP REST call failures. I did not investigate that further but it happens at the Tomcat level (or lower) because no logging appears from the DispatcherServlet for such failed requests. Perhaps we should treat that as a separate issue to work on demonstrating and narrowing down.

Thanks for reporting the problem. We have 4.2.6 coming out this week that will have the fix.

@philwebb philwebb closed this as completed May 2, 2016
@philwebb philwebb removed the status: feedback-provided Feedback has been provided label May 2, 2016
@PaulGobin
Copy link

PaulGobin commented May 11, 2016

Thanks for this, will the new fix be part of spring boot release as well?

I am running into a similar issue as described here at websocket+stomp+springboot

@rstoyanchev
Copy link
Contributor

@PaulGobin from reading the SO issue it doesn't sound the same. The issue is not about the server not accepting more connections. Rather it is about a memory leak. You need to find where exactly connections are getting rejected, e.g. at the WebSocket server or by the broker (ActiveMQ in your case).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants