Fasthttp behind AWS load balancer: keepalive connections are causing trouble #348
Can you try this? It's what we use with the Google load balancer:

    s := &fasthttp.Server{
        Handler:              OurHandler,
        ReadTimeout:          time.Hour,
        WriteTimeout:         time.Hour,
        MaxKeepaliveDuration: time.Hour,
    }

How many requests are you getting per second?
Hey thanks for the quick response. About the requests, it's weird. We have 2 instances of the same service:
Thanks again, we'll try those parameters.
You might also want to try my fork of fasthttp, which is actually being maintained (this original version is not maintained anymore): https://github.com/erikdubbelboer/fasthttp
@erikdubbelboer if you fixed this issue, can you please send us a PR with the fix?
Hey @erikdubbelboer, I am still sometimes getting 502 - backend_connection_closed_before_data_sent_to_client. What could be the issue?
@Hemant-Mann I think your 502 errors can occur exactly when the connection timeout happens. Please check the logs for time patterns and share some info with us.
@Hemant-Mann I'll have another look. I'm about to board a 16 hour flight so it will take a while 😄
@erikdubbelboer have a good trip ;)
Hey guys, just an FYI: we are still having the issue; the only thing we could do was deactivate keep-alive in the fasthttp Server. We didn't have time to try your fork @erikdubbelboer :( (Have a nice trip btw!) PS: I can get some logs of what happens during these errors, but most of the time the request doesn't even reach the Go app (running inside a Docker container), so I don't think it's going to be possible or easy. Thank you, and glad this is being maintained again!
@Rulox I'll be glad to help as soon as I get some logs.
While searching for the bug I noticed that I have always misunderstood I suggest setting
@erikdubbelboer unfortunately, this shouldn't be the go-to solution, because you can end up leaking connections in the pool.
@kirillDanshin I don't see how? Connections will still time out after
@erikdubbelboer timeouts didn't work for me when using ELB: it keeps connections alive for a really long time, but never reuses them after the time configured in the ELB.
@kirillDanshin That sounds like an AWS issue then. I only use the Google Load Balancer, which has always worked perfectly fine so far.
Hey guys, I don't think it has anything to do with the idle keepalive duration setting. We're seeing the error also on the first 5/6 requests when a user enters our website, so these are new connections from a new host. Our fasthttp service acts as a proxy with a server and a client (both fasthttp, check my first comment).
So far this is a recap:
I'm starting to think it might be the combination of a fasthttp server and client? This doesn't make any sense. But we have other services (in Python, for example) that are working well without disruption. I've tried switching from fasthttp to net/http to check if there's something we're doing wrong, but to be honest, fasthttp's performance is much better, and that's something we really need. Also @Hemant-Mann says he's having issues with the Google LB too, which makes me think it's not only on our end? (Hemant, are you using fasthttp just as a web server? Something you can relate to my architecture?) Thanks guys!
In your diagram, is Client the ELB? I have looked over the code again but I don't see anything that could cause this. One more thing you could try is removing this optimization. In this comment valyala mentions that the issue was fixed in go1.10, so the optimization shouldn't be needed anymore and could maybe cause issues in some cases by closing the connection before you would expect. I think the issue for @Hemant-Mann might have been the
Hi everyone. So I analysed my GCP load balancer logs for the last 24 hours. Additional info:
Below are the stats (Date_Hour: error_count):

{
"2018-08-15_21": 41,
"2018-08-15_22": 27,
"2018-08-15_23": 1,
"2018-08-16_06": 2,
"2018-08-16_07": 1,
"2018-08-16_08": 1,
"2018-08-16_09": 3,
"2018-08-16_12": 1,
"2018-08-16_14": 1,
"2018-08-16_15": 10,
"2018-08-16_16": 1,
"2018-08-16_19": 5,
"2018-08-16_21": 1
}

This shows the number of 502s generated because of backend_connection_closed_before_data_sent_to_client.
@Hemant-Mann please send us your CPU load stats and
@kirillDanshin I have not configured any separate monitoring for CPU stats other than what Google Cloud offers, and the backend service autoscales at 60% capacity and
@Hemant-Mann autoscale might trigger at 60%, but individual machines might still be at 100%. We had the same issue in the past. Can you check whether any machines are at 100% around the error times?
Hey @erikdubbelboer, I'm going to try to extract the main code as an example. It happens in our pre-production env with a few connections (1 or 2) too, so we've ruled out overload (CPU, connection pool, etc). Thanks!
If it already happens with so few connections then I think I really need your code to reproduce the issue.
@Rulox any update on this?
I use fasthttp for multiple services all behind ALBs in AWS, although I'm probably using a year-old version. I've never seen this issue. The only settings I change from the default are the ReadTimeout, MaxRequestBodySize and Concurrency.
@erikdubbelboer Yeah, sorry, it's been a crazy month. I made this as an example: https://github.com/Rulox/proxy-tiny. That's pretty much our code (minus the business logic, which is some header manipulation and security, but nothing else). The main.go in the root contains the reverse proxy implementation, and I prepared a docker-compose.yml as well if you want to try it (see the readme). If you see anything weird please let me know! Thanks a lot.
@Rulox in the proxy-tiny code, do you remove the Connection header? This proxy is very little code and should work in theory. Have you tried this simple proxy with the AWS load balancer as well, and is it causing the exact same issues?
@erikdubbelboer thanks for the quick response. Yes I remove the Connection header before doing the request to the proxied service, and before sending back the response to the client (ELB in this case). Like this:
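The snippet referenced above didn't survive in this copy of the thread, so here is a minimal sketch of what that header stripping could look like, assuming a fasthttp.HostClient pointed at the proxied service (the package name, function name and error handling are placeholders, not the original code):

```go
package proxy // hypothetical package name, only so the sketch compiles

import "github.com/valyala/fasthttp"

// forward strips the Connection header before doing the request to the
// proxied service and again before handing the response back to the
// client (the ELB).
func forward(upstream *fasthttp.HostClient, ctx *fasthttp.RequestCtx) {
	ctx.Request.Header.Del("Connection")
	if err := upstream.Do(&ctx.Request, &ctx.Response); err != nil {
		ctx.Error("upstream error", fasthttp.StatusBadGateway)
		return
	}
	ctx.Response.Header.Del("Connection")
}
```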
That's going to be my next test: using this code behind the ELB. Thanks for the heads up.
Hi guys, thanks for your efforts, @erikdubbelboer and @kirillDanshin.
Maybe take a look at this issue moby/moby#31208 and this: https://success.docker.com/article/ipvs-connection-timeout-issue
Awesome @bslizon, I'll try, thanks.
@Rulox is this still an issue or can this be closed?
@erikdubbelboer You can close if you want to keep the list clean, we haven't had the time to test it as we're swamped with different things so I'm not sure when we will have the time. Sorry for any inconvenience.
@Rulox Ok I'll close it for now. You can reopen it if you find the same issue in the future.
@erikdubbelboer hi, I'm using fasthttp v1.1.0. What are the suggested settings for MaxKeepaliveDuration, ReadTimeout, WriteTimeout? I want to make sure the connections stay open as long as possible and don't get closed after the MaxKeepaliveDuration time...
You'll have to use 1.3.0 and set
@erikdubbelboer what is the issue with fasthttp v1.1.0?
For 1 and 3 I can use: // GetOpenConnectionsCount returns a number of opened connections.
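A small usage sketch, not from the thread, of the method quoted above (the helper name, interval and package are made up); it simply logs the server's open-connection count while the server runs:

```go
package stats // hypothetical package name, only so the sketch compiles

import (
	"log"
	"time"

	"github.com/valyala/fasthttp"
)

// logOpenConnections prints the current open-connection count every 10s.
// Run it in a goroutine next to s.ListenAndServe.
func logOpenConnections(s *fasthttp.Server) {
	for range time.Tick(10 * time.Second) {
		log.Printf("open connections: %d", s.GetOpenConnectionsCount())
	}
}
```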
@Arnold1 v1.1.0 is an old release. It's obviously always better to use a newer version to get the latest improvements and bug fixes. I just released v1.4.0; I suggest you use that.
OK, will try v1.4.
Isn't OpenConnections different from concurrentRequests? I'm thinking of adding a counter to my handler:
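A hedged sketch of that counter idea, not code from the thread (all names are invented): wrap the handler and track in-flight requests with an atomic counter, which measures concurrent requests rather than open, possibly idle, connections:

```go
package counters // hypothetical package name, only so the sketch compiles

import (
	"sync/atomic"

	"github.com/valyala/fasthttp"
)

// inFlight is the number of requests currently being handled.
var inFlight int64

// withRequestCounter wraps a handler and counts concurrent requests.
func withRequestCounter(next fasthttp.RequestHandler) fasthttp.RequestHandler {
	return func(ctx *fasthttp.RequestCtx) {
		atomic.AddInt64(&inFlight, 1)
		defer atomic.AddInt64(&inFlight, -1)
		next(ctx)
	}
}
```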
Yes,
Hi!
We're using a light/fast fasthttp server as a proxy in our services infrastructure. However, we've been experiencing some issues when we use an Amazon Load Balancer. Sometimes (and this is random) the ALB returns a 502 because the request can't find the fasthttp service. Note that the ALB uses keepalive connections by default and that can't be changed.
After a while doing some research, we were suspicious that fasthttp was closing the keepalive connections at some point, and the ALB couldn't re-use it, so it would return a 502.
If we set Server.DisableKeepAlive = true, everything works as expected (with a lot more load, of course). We reduced our implementation to the minimum to test:
The handler basically does this:
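The snippet itself wasn't preserved in this copy of the issue; below is a minimal sketch of the kind of handler described (a fasthttp server forwarding each request to the backend with a fasthttp HostClient). The backend address, port and timeouts are placeholders, not our real configuration:

```go
package main

import (
	"log"
	"time"

	"github.com/valyala/fasthttp"
)

// backend is the proxied service; the address is a placeholder.
var backend = &fasthttp.HostClient{Addr: "127.0.0.1:9000"}

func proxyHandler(ctx *fasthttp.RequestCtx) {
	// Forward the incoming request and write the backend's response back
	// to the caller (the load balancer).
	if err := backend.DoTimeout(&ctx.Request, &ctx.Response, 30*time.Second); err != nil {
		ctx.Error("proxy error", fasthttp.StatusBadGateway)
	}
}

func main() {
	s := &fasthttp.Server{
		Handler:      proxyHandler,
		ReadTimeout:  time.Hour,
		WriteTimeout: time.Hour,
	}
	log.Fatal(s.ListenAndServe(":8080"))
}
```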
Is there any chance someone has experienced this? I'm not sure how we should proceed with the keepalive connections in the fasthttp.Server, as we are using pretty much all the default parameters.
Thanks in advance!