Debug dotnet new timeouts #32889
I don't plan to merge this. I added reviewers partly for awareness and partly because I'll likely need help making heads or tails of whatever dump this captures. I will rerun until I hit the failures, perhaps after removing all non-template, non-Helix jobs from the pipeline YAML.
Well, the reason other tests can't get the lock is that it's global for the process, so when one test is hanging on
You mean one test hangs for 15 minutes and enough tests get backed up waiting that some exceed their 20 minute
Why do we need to take a global lock? Can we just change these tests to run sequentially instead? Maybe that would help. I'd imagine those locks were added in the pre-Helix days, when the templates didn't have an isolated Helix machine all to themselves.
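For readers following the lock discussion, here is a minimal sketch of the failure mode described above: one holder of a process-wide lock hangs, and every waiter whose acquisition timeout is shorter than the hang gives up. This is not the repository's code. It is written in Python only for brevity, the names `simulate`, `holder`, and `waiter` are hypothetical, and the durations are scaled down from the 15 and 20 minute figures mentioned in the thread.

```python
import threading
import time


def simulate(hold_seconds, acquire_timeout, waiters):
    """Run one long-running lock holder plus `waiters` tests sharing one lock."""
    lock = threading.Lock()  # hypothetical stand-in for the process-wide lock
    holder_has_lock = threading.Event()
    results = {}

    def holder():
        with lock:
            holder_has_lock.set()
            time.sleep(hold_seconds)  # stands in for the hanging call
        results["holder"] = "released"

    def waiter(name):
        # Each waiting test gives up if it cannot get the lock in time.
        if lock.acquire(timeout=acquire_timeout):
            try:
                results[name] = "ran"
            finally:
                lock.release()
        else:
            results[name] = "lock timeout"

    threads = [threading.Thread(target=holder)]
    threads[0].start()
    holder_has_lock.wait()  # make sure the holder owns the lock first
    for i in range(waiters):
        t = threading.Thread(target=waiter, args=(f"test{i}",))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return results
```

Calling `simulate(0.5, 0.1, 3)` makes every waiter report `lock timeout`, while `simulate(0.05, 1.0, 3)` lets them all run. Running the tests sequentially would remove the contention but, as the thread notes, the hang itself would still need explaining.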
@HaoK sequential runs might be fine but we need to figure out the
We used to do that and the tests would take very long to run. @javiercn or @SteveSandersonMS might have more insights about this.
Has the math perhaps changed since removing the uninstallation of the default templates❔
/azp run
Azure Pipelines successfully started running 2 pipeline(s).
/azp run
Azure Pipelines successfully started running 2 pipeline(s).
/azp run
Azure Pipelines successfully started running 2 pipeline(s).
/azp run
Azure Pipelines successfully started running 2 pipeline(s).
Heisenbug? 😄
I can't tell 😀
/azp run
Azure Pipelines successfully started running 2 pipeline(s).
We may need to lengthen other locks as well. Logs in https://dev.azure.com/dnceng/public/_build/results?buildId=1151130&view=ms.vss-test-web.build-test-results-tab&runId=34807950&resultId=123880&paneView=dotnet-dnceng.dnceng-build-release-tasks.helix-test-information-tab make it appear that the first failing test timed out after 15 minutes but other tests ran after it, some failing to acquire
Looks like there was another timeout we needed to update
Thanks @BrennanConroy. Unfortunately, more Heisenbug behaviour…
/azp run
Azure Pipelines successfully started running 2 pipeline(s).
/azp run
Azure Pipelines successfully started running 2 pipeline(s).
Why is this running quarantined jobs only? Aren't most of the template tests not quarantined?
We saw the timeouts in regular test runs, and aspnetcore-quarantined-pr covers the quarantined tests. That said, we don't need another pipeline for this debugging, and aspnetcore-quarantined-pr isn't running a focused set of tests.
e8d9be4 to 0f87949
0f87949 to db7f554
I'm not sure you want to add the Blazor templates to this, since those will fail a lot due to actual flakiness unrelated to dotnet new. Another random data point for you: when I was looking at the Blazor template runs with retry, occasionally a test run would pass after a little more than 10 minutes, which is an extreme outlier from the normal 1-2 minute runs. That looks like some kind of timeout, since it's such a round number.
Another Relevant bits may be
and
Binary log at https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-aspnetcore-refs-pull-32889-merge-6a5d0c1edf6647f58f/ProjectTemplates.Tests--net6.0/AspNet.nniqxrj5imm.binlog may help too.
db7f554 to f36ec2d
I turned it on hoping it would be useful, but either it contains nothing for us or I don't know how to find the useful information in it. We should probably turn it off at this point.
/azp run
Pull request contains merge conflicts.
- hang long enough to grab a dump

This will _not_ help us understand later inability to acquire `DotNetNewLock`
- guessing but this might have something to do w/ the 20 minute default timeout for that acquisition
f36ec2d to a29a46c
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Now that @sebastienros removed