System.Data.SqlClient 4.7.0 tends to deadlock on async #262

@marcwittke

Description

I'm afraid I cannot provide a clean reproduction, but I want to share my experience so that you can decide whether this is an issue.

It turns out that since the bump from 4.6.1 to 4.7.0, our CI build deadlocks during unit tests on almost every run. There is no problem in development environments or in production.

The setup: an xUnit test suite starts a .NET Core 2.1 service host as a shared fixture to run some integration tests. Part of the application stack is ASP.NET Core Identity. The whole environment is started via docker-compose so that the SQL Server it depends on is also available. A fairly complete seeding of the test database is done during test initialization. Build agents are inexpensive Azure VMs of size B2s (2 vCPUs, 4 GB RAM) running Ubuntu 18.04 with Docker.

Since I was unable to dump or debug the deadlocked process inside Docker on the VM, I spent some time reproducing the issue locally (i7-8850H, Arch Linux, JetBrains Rider). I finally managed it by running stress --cpu 12 --io 10 --vm 5 --hdd 1, starting the SQL Server process inside Docker and pinning it to a single CPU with taskset -p 0x01 nnnn, and starting the integration tests directly, restricted to the same CPU, with taskset 0x01 dotnet test.

Attaching a debugger to the deadlocked process reveals the following stack trace (besides various other threads waiting on reset events, such as the xUnit runner, test diagnostics and Application Insights):

[Image: debugger screenshot of the stack trace]

The last frame on the stack that comes from our application code is this line:

await userManager.GetAuthenticationTokenAsync(user, TokenManager.TokenProviderName, "refreshToken"); 

Note that there is not a single Task.Wait() or Task.Result in the whole code base; everything is "async all the way down". Suspecting that xUnit's advanced parallelization features were part of the problem, I disabled all test parallelization and lifted all restrictions on the number of allowed threads, but this had no effect. SQL Server sees only one open connection, in state "await command".

Reverting the solution to System.Data.SqlClient 4.6.1 made all problems go away; there has not been a single deadlocked build since, so I am pretty sure it's related.

Note: The solution is in a private GitHub repository, but I can share an export or more details if needed.

Metadata

Labels

No labels

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests