Kestrel + IIS is falling over with a small load in production with RTM #245
Description
We are currently experiencing an issue in production (after upgrading from DNX to RTM) when our Web API is under the following load:
- Around 10 requests/second.
- Some calls take > 2 seconds, sometimes with over 1 MB in the payload.
- Our average API call takes < 1 second and returns around 30 KB of data.
It seems that sometimes, when the larger API calls get queued whilst the smaller requests are being processed, Kestrel is overwhelmed.
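For anyone trying to reproduce this, the traffic mix described above could be approximated with something like the sketch below. It is only a rough illustration: the two URLs are placeholders, the 1-in-10 share of large calls is an assumption (we have not measured the real ratio), and it does not handle Windows Authentication, which a real run against our setup would need.

```python
import random
import threading
import time
import urllib.request

# Hypothetical endpoints: placeholders, not the real API.
SMALL_URL = "https://example.invalid/api/small"  # stands in for a ~30 KB, < 1 s call
LARGE_URL = "https://example.invalid/api/large"  # stands in for a > 1 MB, > 2 s call


def build_schedule(duration_s, rate_per_s=10, large_ratio=0.1, seed=0):
    """Return (start_offset_seconds, url) pairs at a fixed request rate.

    large_ratio is an assumed share of large calls, not a measured value.
    """
    rng = random.Random(seed)
    total = int(duration_s * rate_per_s)
    schedule = []
    for i in range(total):
        url = LARGE_URL if rng.random() < large_ratio else SMALL_URL
        schedule.append((i / rate_per_s, url))
    return schedule


def fetch(url, results):
    """Issue one GET and record (url, status or error, elapsed seconds)."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=60) as resp:
            resp.read()
            status = resp.status
    except Exception as exc:  # includes HTTP errors such as 502
        status = repr(exc)
    results.append((url, status, time.monotonic() - started))


def run(duration_s=60):
    """Fire requests on schedule, one thread each, so slow calls overlap fast ones."""
    results, threads = [], []
    begin = time.monotonic()
    for offset, url in build_schedule(duration_s):
        delay = begin + offset - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        worker = threading.Thread(target=fetch, args=(url, results))
        worker.start()
        threads.append(worker)
    for worker in threads:
        worker.join()
    return results
```

Calling `run(60)` and sorting the results by elapsed time should show whether the small calls start to slow down once a few large ones are in flight.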
We have a setup as follows:
- .NET Core RTM (1.0.0)
- Webhosting pack (1.0.0) installed
- Windows Server 2012 R2
- Large number of APIs
- HTTPS
- Windows Authentication
- Entity Framework 7
- SQL Server 2014 (different server)
What we find is that we eventually start getting 502.3 errors, and then Kestrel becomes unresponsive or takes a very long time to respond (e.g. > 30 seconds for a specific API that would usually respond in less than a second).
The only way to resolve this issue is to recycle the app pool and kill the process thread (currently we have to do this at least once a day in production).
Do we need to tune IIS to handle this scenario? And how can we diagnose this and provide log files to get to the bottom of it? Prior to RC2 (we upgraded from DNX through to RTM), the website never crashed.