WebSocket does not honor ShutDownTimeout once SIGTERM/Ctrl+C is issued #26482

Closed
RaviPidaparthi opened this issue Oct 1, 2020 · 13 comments
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions

RaviPidaparthi commented Oct 1, 2020

Description

Once a SIGTERM is issued, the host should be allowed the opportunity to gracefully drain and shutdown all existing http and WebSocket connections.

This used to work properly on netcore2.2, however we observed that since netcore3.1 this functionality is broken for WebSockets. This also remains broken on net5.0.

To Reproduce

Fully baked samples for netcore2.2, netcore3.1 and net5.0 can be found here.
https://github.com/RaviPidaparthi/PlayGround/tree/master/WebsocketGracefulShutdown

To repro

  1. Open, build, and start the solution from VS (16.8.0 Preview or later).
  2. This starts the client and three separate services, one each for netcore2.2, netcore3.1, and net5.0.
  3. In the client console window, press any key to open the WebSocket connections to all three services.
  4. Wait a few seconds until a few messages have been sent to all three services.
  5. Press Ctrl+C in all three services.

For the netcore2.2 service, the WebSocket continues processing/reading messages until 30 seconds after shutdown is initiated.
For the netcore3.1 and net5.0 services, the WebSocket read operation throws the error below as soon as shutdown is initiated.

Error=System.Threading.Tasks.TaskCanceledException: The request was aborted ---> Microsoft.AspNetCore.Connections.ConnectionAbortedException: The connection was aborted because the server is shutting down and request processing didn't complete within the time specified by HostOptions.ShutdownTimeout.
   at System.IO.Pipelines.PipeCompletion.ThrowLatchedException()
   at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result)
   at System.IO.Pipelines.Pipe.GetReadAsyncResult()
   at System.IO.Pipelines.Pipe.DefaultPipeReader.GetResult(Int16 token)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Http1MessageBody.PumpAsync()
   at System.IO.Pipelines.PipeCompletion.ThrowLatchedException()
   at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result)
   at System.IO.Pipelines.Pipe.ReadAsync(CancellationToken token)
   at System.IO.Pipelines.Pipe.DefaultPipeReader.ReadAsync(CancellationToken cancellationToken)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.MessageBody.StartTimingReadAsync(CancellationToken cancellationToken)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.MessageBody.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpRequestStream.ReadAsyncInternal(Memory`1 buffer, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpRequestStream.ReadAsyncInternal(Memory`1 buffer, CancellationToken cancellationToken)
   at System.Net.WebSockets.ManagedWebSocket.EnsureBufferContainsAsync(Int32 minimumRequiredBytes, Boolean throwOnPrematureClosure)
   at System.Net.WebSockets.ManagedWebSocket.ReceiveAsyncPrivate[TWebSocketReceiveResultGetter,TWebSocketReceiveResult](Memory`1 payloadBuffer, CancellationToken cancellationToken, TWebSocketReceiveResultGetter resultGetter)
   at WebsocketGracefulShutdown22.Controllers.TestController.TestWsAsync(CancellationToken cancellationToken) in C:\Users\rapida\source\repos\WebsocketGracefulShutdown\WebsocketGracefulShutdown22\Controllers\TestController.cs:line 25

Code snippets

Configure shutdown timeout

public static IHostBuilder CreateHostBuilder(string[] args) =>
    Host.CreateDefaultBuilder(args)
        .ConfigureWebHostDefaults(webBuilder =>
        {
            webBuilder.UseShutdownTimeout(TimeSpan.FromSeconds(30)).UseStartup<Startup>();
        });

Stand up a WebSocket endpoint

[HttpGetAttribute("/ws/test")]
public async Task TestWsAsync(CancellationToken cancellationToken)
{
    var webSocket = await this.HttpContext.WebSockets.AcceptWebSocketAsync().ConfigureAwait(false);

    while (true)
    {
        try
        {
            var result = await webSocket.ReceiveAsync(new ArraySegment<byte>(new byte[64000], 0, 64000), cancellationToken).ConfigureAwait(false);
            await Task.Delay(TimeSpan.FromMilliseconds(1000)).ConfigureAwait(false);
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex);
            break;
        }
    }
}

Create a client

try
{
    var websocket = new ClientWebSocket();
    await websocket.ConnectAsync(new Uri($"ws://localhost:10050/ws/test"), CancellationToken.None).ConfigureAwait(false);
    while (true)
    {
        var bytes = Encoding.UTF8.GetBytes(Guid.NewGuid().ToString());
        await websocket.SendAsync(new ArraySegment<byte>(bytes, 0, bytes.Length), WebSocketMessageType.Binary, true, CancellationToken.None).ConfigureAwait(false);
        await Task.Delay(1000).ConfigureAwait(false);
    }
}
catch (Exception ex)
{
    Console.WriteLine(ex);
}

Further technical details

ASP.NET Core version=net5.0
Visual Studio Version 16.8.0 Preview 3.2

splusq commented Oct 1, 2020

@davidfowl curious how signalr handles this. Since upgrading to .net 3.1 this is affecting our service availability.

davidfowl commented Oct 1, 2020

I'm guessing we didn't plumb the call to UseShutdownTimeout to the generic host properly (looking at the code). You should be able to work around this by using this instead:

public static IHostBuilder CreateHostBuilder(string[] args) =>
    Host.CreateDefaultBuilder(args)
        .ConfigureServices(services =>
        {
            services.Configure<HostOptions>(o => o.ShutdownTimeout = TimeSpan.FromSeconds(30));
        })
        .ConfigureWebHostDefaults(webBuilder =>
        {
            webBuilder.UseStartup<Startup>();
        });

RaviPidaparthi commented Oct 1, 2020

We hit a different exception doing Configure<HostOptions>.

On net5.0, as soon as SIGTERM is issued, the immediate WebSocket read fails with a cancellation exception.
However if we ignore the exception and continue, the subsequent reads succeed.

System.OperationCanceledException: The read was canceled
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpRequestStream.ReadAsyncInternal(Memory`1 destination, CancellationToken cancellationToken)
   at System.Net.WebSockets.ManagedWebSocket.EnsureBufferContainsAsync(Int32 minimumRequiredBytes, CancellationToken cancellationToken, Boolean throwOnPrematureClosure)
   at System.Net.WebSockets.ManagedWebSocket.ReceiveAsyncPrivate[TWebSocketReceiveResultGetter,TWebSocketReceiveResult](Memory`1 payloadBuffer, CancellationToken cancellationToken, TWebSocketReceiveResultGetter resultGetter)
   at WebsocketGracefulShutdown50.Controllers.TestController.TestWsAsync(CancellationToken cancellationToken)

On netcore3.1, all reads after SIGTERM keep failing with a slightly different cancelled exception.

System.OperationCanceledException: The operation was canceled.
   at System.Threading.CancellationToken.ThrowOperationCanceledException()
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpRequestStream.ReadAsyncInternal(Memory`1 buffer, CancellationToken cancellationToken)
   at System.Net.WebSockets.ManagedWebSocket.EnsureBufferContainsAsync(Int32 minimumRequiredBytes, CancellationToken cancellationToken, Boolean throwOnPrematureClosure)
   at System.Net.WebSockets.ManagedWebSocket.ReceiveAsyncPrivate[TWebSocketReceiveResultGetter,TWebSocketReceiveResult](Memory`1 payloadBuffer, CancellationToken cancellationToken, TWebSocketReceiveResultGetter resultGetter)
   at WebsocketGracefulShutdown31.Controllers.TestController.TestWsAsync(CancellationToken cancellationToken) in C:\git\PlayGround\WebsocketGracefulShutdown\WebsocketGracefulShutdown31\Controllers\TestController.cs:line 26

davidfowl commented

I can reproduce.

halter73 commented Oct 2, 2020

It looks like we introduced this WebSocket shutdown bug when we pipeified our request bodies. For normal HTTP/1.1 Content-Length and chunked request bodies, we have logic that ignores canceled ReadResults from the connection-level PipeReader and re-reads internally, if RequestBodyPipe.CancelPendingRead() wasn't called by the application.

// Ignore the canceled readResult if it wasn't canceled by the user.
// Normally we do not return a canceled ReadResult unless CancelPendingRead was called on the request body PipeReader itself,
// but if the last call to AdvanceTo examined data it did not consume, we cannot reset the state of the Input pipe.
// https://github.com/dotnet/aspnetcore/issues/19476
if (!_readResult.IsCanceled || Interlocked.Exchange(ref _userCanceled, 0) == 1 || _cannotResetInputPipe)
{

This is important because the connection-level PipeReader will return canceled ReadResults at the start of server shutdown. Kestrel's graceful shutdown logic calls CancelPendingRead() on every connection-level PipeReader to wake up idle connections' request processing loops so they can exit gracefully if they are between requests.

If the connection is in the middle of the request, which is the case with an active WebSocket from Kestrel's perspective, the canceled ReadResult should just be ignored like it is for normal request bodies. Kestrel will still close the connection gracefully if the middleware exits before the ShutdownTimeout. Unfortunately, the Http1UpgradeMessageBody that backs WebSockets does not have the same reread-if-the-ReadResult-was-canceled-by-something-other-than-the-application logic the normal request bodies do.

public override ValueTask<ReadResult> ReadAsync(CancellationToken cancellationToken = default)
{
    ThrowIfCompleted();
    return _context.Input.ReadAsync(cancellationToken);
}

This in turn causes HttpRequestStream to throw an OperationCanceledException:

if (result.IsCanceled)
{
    throw new OperationCanceledException("The read was canceled");
}

But since PipeReader.CancelPendingRead() only causes the current or next call to PipeReader.ReadAsync() to return a canceled ReadResult and not any subsequent reads, this only gets thrown once.
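
The one-shot nature of CancelPendingRead() can be seen on a plain System.IO.Pipelines Pipe, outside Kestrel. The following is only a minimal sketch under that assumption; the class and method names are hypothetical and nothing here is Kestrel code.

```csharp
using System;
using System.IO.Pipelines;
using System.Threading.Tasks;

public static class CancelPendingReadDemo
{
    // Returns the IsCanceled flags of two consecutive reads that follow
    // a single CancelPendingRead() call.
    public static async Task<(bool First, bool Second)> RunAsync()
    {
        var pipe = new Pipe();

        // Buffer some bytes so that neither read has to wait for data.
        await pipe.Writer.WriteAsync(new byte[] { 1, 2, 3 });

        // No read is pending yet, so this marks the next ReadAsync() as canceled.
        pipe.Reader.CancelPendingRead();

        ReadResult first = await pipe.Reader.ReadAsync();
        // Consume nothing, so the buffered bytes stay available.
        pipe.Reader.AdvanceTo(first.Buffer.Start);

        ReadResult second = await pipe.Reader.ReadAsync();
        pipe.Reader.AdvanceTo(second.Buffer.End);

        return (first.IsCanceled, second.IsCanceled);
    }
}
```

Only the first of the two results should report IsCanceled, which matches the single OperationCanceledException described above.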

It is unusual to try to continue reading after observing an exception, but this is still a bug. It might make sense for HttpRequestStream.ReadAsync() to throw an even scarier exception, since it should never observe a canceled ReadResult unless someone called RequestBodyPipe.CancelPendingRead() while also reading from the request Body Stream, which would be very odd.

davidfowl commented

@Pilchie This is something we want to patch in 3.1 (and 5.0 and 6.0)

Pilchie added this to the 3.1.x milestone Oct 2, 2020

Pilchie commented Oct 2, 2020

I put it in the 3.1.x milestone. Please loop me in when we have a proposed fix. One thing that will help me is describing the impact to customers, and the set of possible workarounds.

halter73 commented Oct 2, 2020

The impact to customers is that WebSocket.ReceiveAsync() throws an OperationCanceledException as soon as Kestrel starts shutting down, rather than throwing after HostOptions.ShutdownTimeout (which defaults to 5 seconds) expires like a normal HTTP request body read does.

We think this does not cause a problem for SignalR because SignalR registers with the ApplicationStopping token and immediately closes its connections once the token fires. For some apps, closing the WebSocket immediately after ApplicationStopping fires or after observing the OperationCanceledException might be OK.

appLifetime.ApplicationStopping.Register(() => CloseConnections());

For apps that still want to read from the WebSocket after shutdown has been initiated, swallowing the first OperationCanceledException is the only option. This exception is transient, so the next read should succeed if more data arrives before the ShutdownTimeout expires, as weird as that is.
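
A minimal sketch of that workaround, assuming the application passes its own cancellation token to the read. The helper name and shape are hypothetical and not part of ASP.NET Core. It retries exactly once, and only when the caller's token was not the source of the cancellation.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ShutdownTolerantRead
{
    // Runs 'read' and retries once if it throws OperationCanceledException
    // while the caller's own token has not been canceled.
    public static async Task<T> ReadOnceMoreIfCanceledAsync<T>(
        Func<CancellationToken, Task<T>> read, CancellationToken cancellationToken)
    {
        try
        {
            return await read(cancellationToken).ConfigureAwait(false);
        }
        catch (OperationCanceledException) when (!cancellationToken.IsCancellationRequested)
        {
            // Treated as the transient cancellation from server shutdown: try again.
            return await read(cancellationToken).ConfigureAwait(false);
        }
    }
}
```

In the endpoint from the original report, the receive call could be wrapped as `ReadOnceMoreIfCanceledAsync(ct => webSocket.ReceiveAsync(buffer, ct), cancellationToken)`, where `buffer` is an `ArraySegment<byte>`. A second cancellation, or one caused by the caller's own token, still propagates.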

halter73 commented Oct 2, 2020

The fix is going to be to not throw from WebSocket.ReceiveAsync() until the ShutdownTimeout expires, at which point the underlying socket will be closed and WebSocket.ReceiveAsync() will repeatedly throw. You'll still be able to use ApplicationStopping to start gracefully closing the connection when Kestrel starts shutting down.
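
One way such re-read logic could look, written against a bare PipeReader, is sketched below. This is an illustration and not the actual Kestrel patch: the helper name and the `canceledByUser` callback are hypothetical stand-ins for the `_userCanceled` check quoted earlier. The sketch also ignores the case where an earlier AdvanceTo examined data it did not consume, which the quoted Kestrel comment calls out.

```csharp
using System;
using System.IO.Pipelines;
using System.Threading;
using System.Threading.Tasks;

public static class RereadSketch
{
    // Reads from 'input', skipping canceled results unless 'canceledByUser'
    // reports that the application itself requested the cancellation.
    public static async ValueTask<ReadResult> ReadSkippingForeignCancelAsync(
        PipeReader input, Func<bool> canceledByUser, CancellationToken cancellationToken = default)
    {
        while (true)
        {
            ReadResult result = await input.ReadAsync(cancellationToken).ConfigureAwait(false);

            if (!result.IsCanceled || canceledByUser())
            {
                return result;
            }

            // Canceled by something else (for example a shutdown wake-up):
            // hand the buffer back untouched and read again.
            input.AdvanceTo(result.Buffer.Start);
        }
    }
}
```

With this shape, a wake-up cancellation never reaches the caller. Reads then fail only once the connection itself is aborted.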

Pilchie commented Oct 13, 2020

Just a note that we'll need to move fairly quickly if we want this for November servicing.

Pilchie commented Oct 26, 2020

@halter73 Now this can be closed, right?

halter73 commented

Yep.

RaviPidaparthi commented

Thanks everyone for fixing this. We will verify once it's available.

ghost locked as resolved and limited conversation to collaborators Nov 27, 2020
amcasey added area-networking (Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions) and removed area-runtime labels Aug 24, 2023