Web Socket Message Broker memory leak. Broker retaining reactor tcp clients. #5810
Comments
/cc @rstoyanchev
I'm guessing that this is a Spring MVC or Reactor issue rather than a Spring Boot one, but let's leave this open here until someone from the other team can confirm.
I'll have a look.
@daveburnsLIT I've confirmed the issue and created SPR-14231. There is a proposed fix that will become available when this build completes. I've tried the fix locally, but it would be great to have you confirm that too. To try the fix you'll need to switch to Spring Framework 4.3.0.BUILD-SNAPSHOT. In the sample app I added http://repo.spring.io/snapshot as a repository and this:

```xml
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.springframework</groupId>
      <artifactId>spring-framework-bom</artifactId>
      <version>4.3.0.BUILD-SNAPSHOT</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Note that for some reason spring-core was still 4.3.0.RC1 after the change and I had to add it explicitly as a dependency with a version. Not sure why that is. The actual BOM looks okay, so do check the actual resolved dependencies.
Thanks, @rstoyanchev. @daveburnsLIT An easier way to use the Spring Framework snapshot is to configure the
Oops, thanks Andy :)
@rstoyanchev FYI, you cannot override a version from the parent with a BOM import, unfortunately. We've raised that and a fix will be available in Maven 3.4.
@rstoyanchev - Seems like that has fixed the Reactor TCP client leak, but after running client connect/disconnects for an hour the heap graph still trends up. Here's the top retained objects 10 minutes in, after a GC and heap dump. But here's after 30 minutes, where the WebResources cache seems to be a problem. Then after an hour we have both the cache and some connection handler. Running about 15 heap dumps over the hour shows the WebResources cache and connection handler objects gradually trending up in retained size. I don't know if these are a separate issue, but it still looks like a leak on the WebSocket broker.
@daveburnsLIT the first two images (10 vs 30 min) appear identical. In the third the retained size has grown, but I don't see anything related to Spring. I ran the server for about an hour, repeatedly hitting it with the test (as per the steps at the top), but the memory remained stable. While testing I did notice an anomaly. In the client test there is a RestTemplate call to "/hello" which causes an HTTP session to be created; the client obtains the session id via "Set-Cookie" and uses it for the WebSocket handshake. In some test runs the first requests (anywhere between the very first up to the first 50 or 60) fail with a network error.
I haven't figured out what causes the error, but it's an HTTP REST call failure, i.e. before the WebSocket handshake is even attempted, and it doesn't even reach the DispatcherServlet. I suspect something Tomcat-related is at hand; perhaps after many sessions are created and destroyed something is affected. That may also be related to the memory leak you're seeing, as the biggest objects in your snapshots above are also Tomcat-related. Switching to Jetty would be one thing to try. Can you confirm whether you see errors on either the client or server side when running the tests? You can also check in the RabbitMQ console after each iteration whether you actually see 2001 connections (2000 clients plus 1 for the shared system connection); you would see fewer than that if not all connected.
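For reference, the client flow described above (a RestTemplate call to "/hello" to obtain the session cookie, followed by a STOMP-over-WebSocket connect that reuses it) looks roughly like the sketch below. The host, port, the "/ws" endpoint path, and the class name are assumptions, not taken from the actual sample app, and the real test loops this for 2000 clients.

```java
import org.springframework.http.HttpHeaders;
import org.springframework.http.ResponseEntity;
import org.springframework.messaging.simp.stomp.StompSession;
import org.springframework.messaging.simp.stomp.StompSessionHandlerAdapter;
import org.springframework.web.client.RestTemplate;
import org.springframework.web.socket.WebSocketHttpHeaders;
import org.springframework.web.socket.client.standard.StandardWebSocketClient;
import org.springframework.web.socket.messaging.WebSocketStompClient;

public class ConnectDisconnectClient {

    public static void main(String[] args) throws Exception {
        // 1. REST call that creates the HTTP session; grab the session cookie.
        RestTemplate rest = new RestTemplate();
        ResponseEntity<String> response =
                rest.getForEntity("http://localhost:8080/hello", String.class);
        String sessionCookie = response.getHeaders().getFirst(HttpHeaders.SET_COOKIE);

        // 2. Reuse the cookie on the WebSocket handshake, then connect via STOMP.
        WebSocketHttpHeaders handshakeHeaders = new WebSocketHttpHeaders();
        handshakeHeaders.add(HttpHeaders.COOKIE, sessionCookie);

        WebSocketStompClient stompClient =
                new WebSocketStompClient(new StandardWebSocketClient());
        StompSession session = stompClient
                .connect("ws://localhost:8080/ws", handshakeHeaders,
                        new StompSessionHandlerAdapter() {})
                .get();

        // 3. Disconnect immediately; the actual test repeats this cycle
        //    to exercise the connect/disconnect path on the broker relay.
        session.disconnect();
    }
}
```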
@rstoyanchev - Oops, sorry about the copy/paste error on the 10 and 30 minute screenshots; I was basically trying to show the increase. That run was on a VirtualBox setup (I'm on leave and didn't have access to the original workplace setup). With the VirtualBox setup I noticed that with a 2000-connection test client I might only get 1800 actual RabbitMQ connections, and yes, when this happens I see the HTTP REST failures. I was assuming the virtual machine couldn't handle the load. I've just rerun on a host system rather than a virtual one and I get 2000 connections without the error. Also, my concern about the WebResources cache and connection handler may have been premature; I see it flatline around 14MB after 30 minutes (12 test runs of 2000 client connect/disconnects). So it looks like the leak is fixed and the failed initial requests may be due to host/container sizing and the rate at which connections are made. Thanks a lot for the support and, yet again, phenomenal turnaround time.
@rstoyanchev - Also, I switched to Jetty as suggested and it took around 6 minutes to get 1100 client connections, then a couple of seconds for each new connection after that (I didn't wait for the full 2000). I can get 2000 clean client connections in 40 seconds on the same host using Tomcat.
@daveburnsLIT I suspect whatever issue remains is related to the HTTP REST call failures. I did not investigate that further, but it happens at the Tomcat level (or lower) because no logging appears from the DispatcherServlet for such failed requests. Perhaps we should treat that as a separate issue and work on demonstrating and narrowing it down. Thanks for reporting the problem. We have 4.2.6 coming out this week that will have the fix.
Thanks for this. Will the new fix be part of a Spring Boot release as well? I am running into a similar issue, as described here: websocket+stomp+springboot
@PaulGobin from reading the SO issue it doesn't sound the same. This issue is not about the server refusing to accept more connections; rather, it is about a memory leak. You need to find where exactly connections are getting rejected, e.g. at the WebSocket server or by the broker (ActiveMQ in your case).
Version: Spring Boot 1.4.0.M2 and Spring Boot 1.3.3.RELEASE
There is a memory leak in the STOMP relay broker. When client STOMP WebSocket connections have disconnected, the broker retains Reactor TCP clients.
To replicate the issue, the sample application used for our previous Reactor memory-leak investigation was run against both Spring Boot 1.4.0.M2 and Spring Boot 1.3.3. The WebSocket broker was backed by a RabbitMQ instance.
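For context, a minimal sketch of the kind of broker-relay configuration being exercised here; the endpoint path, destination prefixes, and the local RabbitMQ STOMP port 61613 are assumptions and may differ from the actual sample app.

```java
import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.simp.config.MessageBrokerRegistry;
import org.springframework.web.socket.config.annotation.AbstractWebSocketMessageBrokerConfigurer;
import org.springframework.web.socket.config.annotation.EnableWebSocketMessageBroker;
import org.springframework.web.socket.config.annotation.StompEndpointRegistry;

@Configuration
@EnableWebSocketMessageBroker
public class WebSocketConfig extends AbstractWebSocketMessageBrokerConfigurer {

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        // Clients perform the WebSocket handshake against this endpoint.
        registry.addEndpoint("/ws");
    }

    @Override
    public void configureMessageBroker(MessageBrokerRegistry registry) {
        // Relay messages to an external RabbitMQ broker via its STOMP adapter.
        registry.enableStompBrokerRelay("/topic", "/queue")
                .setRelayHost("localhost")
                .setRelayPort(61613);
        registry.setApplicationDestinationPrefixes("/app");
    }
}
```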
The problem manifests when we disconnect clients, as TCP client references are retained. The following steps were used to replicate it:
The following shows the VisualVM monitor for the above test. The arrows show the points at which step 7 above was performed. The upward-slanting red line was drawn to illustrate the leak, i.e. the baseline heap memory post-GC is increasing.
The following shows the output for the top retained objects in each heap dump (arrows). The first heap dump is from before any clients had connected (basically a newly started WebSocket broker) and is what we expect all subsequent heap dumps to look like after a client run and GC.
The following is a heap dump after 2000 client connections have terminated; no clients are visible in RabbitMQ and the Java client VM itself has terminated, i.e. there is no way a WebSocket client could still be connected to the broker. There is about an 8MB leak in StompBrokerRelayMessageHandler and/or the Reactor TCP client.
The following shows the result after another test run; we keep leaking about 8MB for every 2000 WebSocket connect/disconnects.
It looks like some sort of pool (an ArrayList in StompBrokerRelayMessageHandler) keeps growing, although this could be related to the issue @smaldini previously looked at.
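To make the suspected pattern concrete, here is a deliberately simplified illustration (not the actual Spring Framework code) of how a relay handler could leak if per-session TCP client handles are added to a collection on connect but never removed on disconnect:

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a handler that registers a TCP client handle per STOMP
// session but never deregisters it when the session disconnects. Each
// connect/disconnect cycle then leaves one strongly referenced object behind,
// which would match an ArrayList whose retained size keeps growing in the
// heap dumps.
class LeakyRelayHandler {

    static class TcpClientHandle {
        final String sessionId;
        TcpClientHandle(String sessionId) { this.sessionId = sessionId; }
        void close() { /* close the underlying TCP connection */ }
    }

    private final List<TcpClientHandle> clients = new ArrayList<>();

    void onClientConnect(String sessionId) {
        clients.add(new TcpClientHandle(sessionId));
    }

    void onClientDisconnect(String sessionId) {
        // Bug: the connection is closed, but the handle stays in the list,
        // so the list (and everything it references) grows without bound.
        clients.stream()
               .filter(c -> c.sessionId.equals(sessionId))
               .forEach(TcpClientHandle::close);
    }
}
```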