Server non responsive under small load (thread-pool starvation) #17090
Comments
Ouch 😢. Just an FYI, WebRequest is implemented very poorly on .NET Core and performs much worse than on .NET Framework (it's basically a trap). Bumping the min number of threads should help with code like this, and the injection should be fast (more than 1-2 per second). Are you sure the min-threads fix is deployed? .NET Core 3.0 has counters you can use to look at this stuff; before 3.x it's hard to look at these kinds of metrics.
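For example, the threadpool counters can be watched with the dotnet-counters global tool (a sketch, assuming the tool is installed and the target process ID is known):

```
dotnet tool install --global dotnet-counters
dotnet-counters monitor --process-id <pid> System.Runtime
```

The System.Runtime provider reports, among other things, the threadpool thread count and the threadpool queue length, which make starvation fairly easy to spot.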
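@davidfowl I am sure it's deployed and it's logging out the new min threads that I have set, but for some odd reason htop inside the container shows a very slow injection of new threads. Is it important to set it before I create the host builder? Maybe setting the threads during ConfigureServices is an issue?

Oh, and thanks for the WebRequest tip; I have encountered it already :)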
That shouldn't matter. @stephentoub @kouvel Any ideas on this?
The default maximum IO completion thread count is 1000, so the call above may fail (min can't be greater than the current max). Check the result of the call to `SetMinThreads`.

To set the min worker and IO completion thread counts, this should work:

```csharp
int workerThreads, ioCompletionThreads;
ThreadPool.GetMaxThreads(out workerThreads, out ioCompletionThreads);
ThreadPool.SetMaxThreads(Math.Max(10000, workerThreads), Math.Max(10000, ioCompletionThreads));
ThreadPool.SetMinThreads(10000, 10000);
```

To set just the min worker thread count, this should work:

```csharp
int workerThreads, ioCompletionThreads;
ThreadPool.GetMinThreads(out workerThreads, out ioCompletionThreads);
ThreadPool.SetMinThreads(10000, ioCompletionThreads);
```
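(A small follow-up sketch, not from the original comment: checking the bool return value makes the failure mode above explicit.)

```csharp
if (!ThreadPool.SetMinThreads(10000, ioCompletionThreads))
{
    // The request was rejected, most likely because the requested min exceeds the current max.
    throw new InvalidOperationException("ThreadPool.SetMinThreads failed; raise the max thread counts first.");
}
```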
On Unixes the env var is case-sensitive and the correct casing is `COMPlus_ThreadPool_ForceMinWorkerThreads`.
While using HttpClientFactory and avoiding any blocking I/O would be ideal, as a stopgap, instead of increasing the min thread count, it might be better just to limit the number of concurrent requests. If the requests complete quickly enough under normal load, limiting the number of requests executing at once could help a lot. You can try out our ConcurrencyLimiter package which allows you to do exactly this. Once the limit is reached, new requests are queued without blocking any threads. You can learn more about it and why it can be so helpful from this community standup discussion.
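For reference, a minimal sketch of how the package is typically wired up, assuming the Microsoft.AspNetCore.ConcurrencyLimiter APIs (AddQueuePolicy / UseConcurrencyLimiter); the limits shown are placeholders, not recommendations:

```csharp
// Startup.ConfigureServices: register a FIFO queue policy with a cap on concurrent
// requests and on how many additional requests may wait in the queue.
services.AddQueuePolicy(options =>
{
    options.MaxConcurrentRequests = 20;  // placeholder value
    options.RequestQueueLimit = 100;     // placeholder value
});

// Startup.Configure: add the middleware early in the pipeline so excess requests
// are queued (without blocking a thread) or rejected before doing any real work.
app.UseConcurrencyLimiter();
```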
@kouvel Thank you so much! I didn't notice that I had set the min before the max, or that there is a bool return value on this method :\

Yesss, now we have threads!
@halter73 Thanks for the suggestion, I'm definitely going to look that up! Is there a difference between queuing requests (non-blocking) and queuing requests inside the threadpool using the min thread property? Basically the end result would be the same.. am I missing something?

Huh.. :( ConcurrencyLimiter is .NET Core 3.0 only..
The biggest difference is that even when the request queue limit is reached, you should still have threadpool threads available as long as you set the concurrent request limit low enough. This is important in order to avoid a vicious cycle where the threads being injected into the threadpool immediately add more threadpool workitems before blocking yet another thread. In this case (which is common for any ASP.NET Core apps that do blocking I/O), injecting threads can paradoxically make the threadpool's workitem backlog even larger. In many cases, the blocked threads are really just waiting on another thread to become available to unblock the Task/operation they're waiting on, rather than the actual I/O, which has already completed. Since this is an example of code limiting access to the very resource (threadpool threads) needed to complete the operation it's waiting on, this is sometimes referred to as an "async deadlock". I go over a specific instance of this in aspnet/IISIntegration#245 (comment).

Another big difference with the ConcurrencyLimiter is that you can set both a concurrent request limit and a queue size limit. When the queue size limit is reached, it will start responding with 503s or do whatever you specify in a custom OnRejected RequestDelegate. This way you can proactively tell clients to back off when you're overloaded. With the threadpool, there's no upper bound that I know of on the number of workitems. If the backlog becomes too large, by the time a workitem is dequeued to process the request, the client has probably already timed out the request, making all the time spent executing that workitem wasted.
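To make the vicious cycle concrete, a hypothetical example of the blocking pattern being described (not code from the app in this thread):

```csharp
// Inside a controller, sync-over-async: the request thread blocks on .Result while
// the continuation of DoWorkAsync needs another threadpool thread to run. Under load,
// each newly injected thread immediately blocks the same way, so the workitem backlog
// keeps growing even though the underlying I/O may have already completed.
public IActionResult Get()
{
    var result = DoWorkAsync().Result;   // blocks a threadpool thread
    return Ok(result);
}

private static async Task<string> DoWorkAsync()
{
    await Task.Delay(100);               // stands in for real async I/O
    return "done";
}
```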
Is that when you upgrade to 3.0 in general, or when you upgrade to 3.0 and enable the ConcurrencyLimiter? If it's the latter, it's possible that the clients are timing out the request in the middle of uploading the body. While the request is queued, Kestrel will only buffer up to about 1MB of the request body. So if the client is trying to upload more than that and gets queued for a long time, it makes sense that the client could close the connection mid-request-body. What do the client logs say? Do you have any more server logs? For diagnosing bad-request failures, setting the min log level to Debug is often helpful because we try to log client errors with low severity.
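For example, one way to lower the minimum log level in code (a sketch; the same thing can be done through appsettings.json):

```csharp
// Program.cs: set the global minimum level to Debug so low-severity client errors
// (bad requests, aborted body uploads, etc.) show up in the server logs.
Host.CreateDefaultBuilder(args)
    .ConfigureLogging(logging => logging.SetMinimumLevel(LogLevel.Debug))
    .ConfigureWebHostDefaults(webBuilder => webBuilder.UseStartup<Startup>())
    .Build()
    .Run();
```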
Thanks for the valuable info! So for now I have tried to up the min threads to 10K. This helped somewhat, but with the load our servers require and the amount of blocking I/O causing slow responses, we are still hitting starvation and timeouts. I'll update soon on progress, thanks.
I was going to open an issue but then I found this one, which is very similar. After updating our app to netcore3 from netcore2.2, we started getting several errors randomly; on netcore2.2 we never had this error. This is the stack we are getting:

We are using
Okay, so I can confirm that while increasing the min threadpool threads helped, it was not enough; we were still hitting starvation and getting timeouts. What helped was adding the concurrency limiter and setting it to the default. Anyway, playing with these two did the trick and the legacy code now works as expected and holds the load it should. Keep in mind that the old .NET Framework had the IIS queue and concurrency limit to protect from that..
Edit: @halter73 Hello again.. using

and the API is alive and all okay..

It looks like all 12 requests get locked and nothing is getting in. Every new request increases the counter, and nothing else is happening. I tried to "disable" the concurrency limiter by setting the concurrency to 1K and the API started working fine again (with min threads set to 1K). I fear there might be a deadlock there somewhere.

```csharp
private async ValueTask<bool> SemaphoreAwaited(Task task)
{
    await task;
    return true;
}
```

This method awaits the semaphore while the middleware itself awaits the wrapping task, but I'm not entirely sure; would love to get your help on this.
I'll admit SemaphoreAwaited is a bit weird. It probably makes more sense just to call the ValueTask ctor that takes a Task so the conversion doesn't require SemaphoreAwaited to run on the threadpool. That said, the ConcurrencyLimiterMiddleware also needs to continue on the threadpool as soon as the awaited task completes, and it should continue on the same thread as SemaphoreAwaited, so this shouldn't make much of a difference in performance. Additionally, if you aren't threadpool starved because the number of concurrent requests is limited enough, it shouldn't be a problem that SemaphoreAwaited needs a thread. I would recommend collecting a dump and running
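For illustration, the general shape of the ValueTask-constructor suggestion above (a hypothetical sketch using Task<bool>, not the middleware's actual code):

```csharp
// The async wrapper compiles to an extra state machine whose continuation has to run on the pool:
private static async ValueTask<bool> AwaitedWrapper(Task<bool> task)
{
    return await task;
}

// When a Task<bool> is already in hand, the ValueTask<T>(Task<T>) constructor wraps it
// directly, so the conversion itself needs no extra continuation:
private static ValueTask<bool> Wrap(Task<bool> task) => new ValueTask<bool>(task);
```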
Edit: okay, I used

I also have the trace file. `clrstack -all`
Okay.. facepalm. I am running a full load test to verify this indeed fixes the issue; I'll update as we go. In addition, I think it is a good idea to "throw the devs into the pit of success" and throw some kind of
Interesting. The app probably broke when the ConcurrencyLimiter middleware was added twice because both instances of the middleware were using the same singleton IQueuePolicy instance. Once the max number of concurrent requests was let through the outer middleware, the inner middleware must have checked the now already-throttling IQueuePolicy and seen that it shouldn't let any more requests through until more ongoing requests finished. Unfortunately, at that point, no requests could finish because the inner middleware was queuing them all, leading to a deadlock that happened not to block any threads.
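A hypothetical illustration of the pipeline shape being described:

```csharp
// Both middleware instances resolve the same singleton IQueuePolicy, so the inner
// limiter counts the requests the outer limiter has already admitted. Once the shared
// limit is reached, the inner middleware queues every request and nothing can complete,
// even though no threads are blocked.
app.UseConcurrencyLimiter();   // intended registration

app.UseRouting();
// ...other middleware...

app.UseConcurrencyLimiter();   // accidental second registration
```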
Edit: I now agree with @Tratcher that making the IQueuePolicy transient is the better fix. You might still take a slight performance hit from registering the ConcurrencyLimiter middleware multiple times unintentionally, but at least everything should still work as intended.
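A rough sketch of that lifetime change (hypothetical; QueuePolicy stands in for whatever concrete IQueuePolicy implementation the package registers):

```csharp
// Inside an AddQueuePolicy-style registration: a transient IQueuePolicy means each
// resolved ConcurrencyLimiter middleware gets its own queue and counter, so registering
// the middleware twice no longer shares a single limit.
services.AddTransient<IQueuePolicy, QueuePolicy>();   // instead of AddSingleton
```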
@Tratcher pointed out that making the IQueuePolicy a transient instead of a singleton would have stopped the deadlock issue. Since the middleware itself has a singleton lifetime and resolves the IQueuePolicy once, this could work. This still wouldn't make it easy to configure multiple instances of the middleware differently, but that's true for an IOptions-based approach too. Out of curiosity, @arthurvaverko, is there anything that made it easy to accidentally add the ConcurrencyLimiter middleware multiple times other than the fact that it didn't throw when you did?
Nothing made it easy... I agree that making the IQueuePolicy transient is a better approach. I'll try to find some time to maybe make my first contribution to the AspNetCore repo :)
Thank you for contacting us. Due to no activity on this issue we're closing it in an effort to keep our backlog clean. If you believe there is a concern related to the ASP.NET Core framework which hasn't been addressed yet, please file a new issue.
I have a very large legacy code base that we recently converted to .NET Core 2.2 from .NET Framework 4.6.1. The entire legacy code is fully synchronous blocking calls, from the database using `dataset.Fill()` up to WebRequests with stream readers and so on.

Under not-so-heavy load, the Kestrel server becomes unresponsive (50 concurrent users). I can clearly see that we have the threadpool starvation issue described here: aspnet/KestrelHttpServer#2104

I see that during a simple load test with `ab` at 50 concurrent users, the number of threads in htop starts to climb 1-2 per second, and when I increase the concurrency it climbs even further, but the client gets timeouts (lots of them...). The middleware pipeline I have is fully async up until the point it calls the controllers (they are sync and legacy).

I am not at a point where we can convert the code into a pure async-all-the-way architecture, and I am looking for a workaround. I have set the ThreadPool minimum like this in my Startup class:
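(The snippet did not survive here; judging by the replies above, it was presumably a bare call along these lines, shown only as a reconstruction:)

```csharp
// Reconstruction, not the original snippet: the min thread counts raised directly in
// ConfigureServices, without checking the current max or the bool return value,
// which is exactly what the replies above point out as the problem.
ThreadPool.SetMinThreads(1000, 1000);
```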
but still the app starts with 25 threads, and when hit with a load of even 50 users the threads keep growing 1-2 per second without letting requests in.

I am running from a container and tried to add `-e ComPlus_ThreadPool_ForceMinWorkerThreads=1000`, but still got the same behavior. I also set `--sysctl net.core.somaxconn=4096`, which didn't help.

I would love to get some pointers from you as to how I can avoid threadpool starvation without going full async on the legacy code.