@dougbu dougbu commented Nov 16, 2021

  • part of dotnet/aspnetcore-internal#3950
  • move to Helix queues for Alpine 3.14, OSX 10.15, and (for Arm64) Debian 11 (not 9)
    • use OSX 11.00 when testing PRs and rolling builds; reduce 10.15 usage to scheduled runs
    • remove duplication between PRs / rolling builds and scheduled runs
  • build source-index on windows-latest (not vs2017-win2016)
  • update Helix Docker images for Debian.11.Amd64.Open and Fedora.34.Amd64.Open

nits:

  • don't skip unused Helix queues
  • remove versions from pipeline job display names
    • some were already outdated; rest will be confusing in the future
  • remove comments about unused Helix queues

@Pilchie Pilchie added the area-infrastructure Includes: MSBuild projects/targets, build scripts, CI, Installers and shared framework label Nov 16, 2021
@dougbu dougbu force-pushed the dougbu/agent.types.3950 branch from c9b383f to ec21abe Compare November 16, 2021 18:31
@dougbu
Contributor Author

dougbu commented Nov 16, 2021

--- ".\\helix.matrix.before"    2021-11-16 13:38:59.593746300 -0800
+++ ".\\helix.matrix.after"     2021-11-16 13:21:31.644538900 -0800
@@ -1,14 +1,11 @@
-(Alpine.312.Amd64.Open)[email protected]/dotnet-buildtools/prereqs:alpine-3.12-helix-20200908125345-56c6673
-(Debian.11.Amd64.Open)[email protected]/dotnet-buildtools/prereqs:debian-11-helix-amd64-20210304164428-5a7c380
-(Debian.9.Arm64.Open)[email protected]/dotnet-buildtools/prereqs:debian-9-helix-arm64v8-a12566d-20190807161036
-(Fedora.34.Amd64.Open)[email protected]/dotnet-buildtools/prereqs:fedora-34-helix-20210728124700-4f64125
+(Alpine.314.Amd64.Open)[email protected]/dotnet-buildtools/prereqs:alpine-3.14-helix-amd64-20210910135833-1848e19
+(Debian.11.Amd64.Open)[email protected]/dotnet-buildtools/prereqs:debian-11-helix-amd64-20211001171307-0ece9b3
+(Debian.11.Arm64.Open)[email protected]/dotnet-buildtools/prereqs:debian-11-helix-arm64v8-20211001171229-97d8652
+(Fedora.34.Amd64.Open)[email protected]/dotnet-buildtools/prereqs:fedora-34-helix-20210924174119-4f64125
 (Mariner)[email protected]/dotnet-buildtools/prereqs:cbl-mariner-1.0-helix-20210528192219-92bf620
-OSX.1014.Amd64.Open
-OSX.1100.Amd64.Open
+OSX.1015.Amd64.Open
 Redhat.7.Amd64.Open
-Ubuntu.1804.Amd64.Open
 Ubuntu.2004.Amd64.Open
 Windows.10.Amd64.Server20H2.Open
 Windows.10.Arm64v8.Open
-Windows.11.Amd64.ClientPre.Open
 Windows.Amd64.Server2022.Open

The aspnetcore-quarantined-tests pipeline runs on the same queues as above, both before and after.

@dougbu
Contributor Author

dougbu commented Nov 16, 2021

--- ".\\quarantined.pr.before"  2021-11-16 13:57:28.157821100 -0800
+++ ".\\quarantined.pr.after"   2021-11-16 13:53:39.075019400 -0800
@@ -1,4 +1,4 @@
-(Fedora.34.Amd64.Open)[email protected]/dotnet-buildtools/prereqs:fedora-34-helix-20210728124700-4f64125
-OSX.1014.Amd64.Open
+OSX.1100.Amd64.Open
 Ubuntu.1804.Amd64.Open
 Windows.11.Amd64.ClientPre.Open

The aspnetcore-ci Helix job runs on the same queues as above, before and after, except that it does no Fedora 34 testing today, i.e. it now lines up with aspnetcore-quarantined-pr.

@dougbu dougbu requested a review from a team November 16, 2021 22:01
@dougbu dougbu marked this pull request as ready for review November 16, 2021 22:02
Contributor Author

@dougbu dougbu left a comment

Notes

  • aspnetcore-helix-matrix failures are about flakiness that exists in our regular runs
  • aspnetcore-quarantined-pr pipeline times out for 'main' builds fairly frequently

<PropertyGroup Condition="'$(TestDependsOnPlaywright)' == 'true'">
<SkipHelixQueues>
$(HelixQueueAlpine312);
$(HelixQueueAlpine314);

Contributor Author

Worth removing this line and running aspnetcore-helix-matrix again❔ It's unclear how to determine whether Playwright is working everywhere given all or most Playwright tests are quarantined and at least the BlazorWasmTemplateTest tests mostly fail in any case. @javiercn

Contributor Author

@wtgodbe you mentioned Alpine 3.14 image doesn't work with Playwright in #36032. How did you confirm that❔

Member

We can basically safely ignore Playwright now. I believe the only place it's running now is in the components pipeline. @TanayParikh, can you confirm that's true?

Member

@wtgodbe wtgodbe Nov 17, 2021

Here are the original PRs where I:

The linked helix-matrix might have info that isn't in the PRs - all I remember is that the failure I mentioned in the issue was persistent. I couldn't find anything more in emails.

Member

Maybe @HaoK remembers something I don't

Contributor Author

> believe the only place that's running now is in the components pipeline.

The Components pipeline mainly runs Microsoft.AspNetCore.Components.E2ETests and those rely on Selenium, not Playwright. In addition, those tests don't run on Helix.

On the other hand, Playwright is used only in BlazorTemplates.Tests. Those tests are all quarantined and never run on any Linux e.g. from a recent aspnetcore-quarantined-tests run of 'main':

[screenshot omitted]

In general, can that restriction be removed or reduced (adding a Linux platform)❔

Contributor Author

> In general, can that restriction be removed or reduced (adding a Linux platform)❔

@javiercn and @dotnet/aspnet-blazor-eng (because you're likely the most familiar w/ the $(TestDependsOnPlaywright) exclusions) please chime in here. I see Playwright is fully supported on Ubuntu 18.04 and 20.04 but we're running our Playwright tests on neither. I haven't searched enough to understand why Ubuntu 18.04 is also not running Playwright tests.

Member

@javiercn @dotnet/aspnet-blazor-eng any thoughts on Doug's question above?

Contributor Author

@dougbu dougbu Nov 18, 2021

I'm now leaning toward leaving everything but the duplicate Fedora testing unchanged and getting this in tonight or tomorrow. The experts can deal with our limited coverage for the Blazor template tests and Playwright system compatibility.

@dougbu dougbu requested a review from HaoK November 16, 2021 23:34
Contributor Author

@dougbu dougbu left a comment

Generally looking to resolve questions before merging this PR…


@wtgodbe
Member

wtgodbe commented Nov 18, 2021

We have one test failure on OSX 10.15:

[FAIL]
[xUnit.net 00:01:41.07] System.Threading.Tasks.TaskCanceledException : The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.
[xUnit.net 00:01:41.07] ---- System.TimeoutException : A task was canceled.
[xUnit.net 00:01:41.07] -------- System.Threading.Tasks.TaskCanceledException : A task was canceled.
[xUnit.net 00:01:41.07] Stack Trace:
[xUnit.net 00:01:41.07] at System.Net.Http.HttpClient.HandleFailure(Exception e, Boolean telemetryStarted, HttpResponseMessage response, CancellationTokenSource cts, CancellationToken cancellationToken, CancellationTokenSource pendingRequestsCts)
[xUnit.net 00:01:41.07] at System.Net.Http.HttpClient.GetStringAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)
[xUnit.net 00:01:41.07] //src/Identity/test/Identity.Test/IdentityUIScriptsTest.cs(86,0): at Microsoft.AspNetCore.Identity.Test.IdentityUIScriptsTest.IdentityUI_ScriptTags_FallbackSourceContent_Matches_CDNContent(ScriptTag scriptTag)
[xUnit.net 00:01:41.07] --- End of stack trace from previous location ---
[xUnit.net 00:01:41.07] ----- Inner Stack Trace -----
[xUnit.net 00:01:41.07]
[xUnit.net 00:01:41.07] ----- Inner Stack Trace -----
[xUnit.net 00:01:41.07] //src/Identity/test/Identity.Test/RetryHandler.cs(56,0): at Microsoft.AspNetCore.Identity.Test.RetryHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
[xUnit.net 00:01:41.07] at System.Net.Http.HttpClient.GetStringAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)
[xUnit.net 00:01:41.07] Output:
[xUnit.net 00:01:41.07] Sending request 'GET - https://cdnjs.cloudflare.com/ajax/libs/jquery/3.5.1/jquery.min.js' 1 attempt.
[xUnit.net 00:01:41.07] Request 'GET - https://cdnjs.cloudflare.com/ajax/libs/jquery/3.5.1/jquery.min.js' failed with System.Threading.Tasks.TaskCanceledException: A task was canceled.
[xUnit.net 00:01:41.07] at System.Threading.Tasks.TaskCompletionSourceWithCancellation`1.WaitWithCancellationAsync(CancellationToken cancellationToken)
[xUnit.net 00:01:41.07] at System.Net.Http.HttpConnectionPool.GetHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
[xUnit.net 00:01:41.07] at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
[xUnit.net 00:01:41.07] at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
[xUnit.net 00:01:41.07] at Microsoft.AspNetCore.Identity.Test.RetryHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken) in //src/Identity/test/Identity.Test/RetryHandler.cs:line 40
Failed Microsoft.AspNetCore.Identity.Test.IdentityUIScriptsTest.IdentityUI_ScriptTags_FallbackSourceContent_Matches_CDNContent(scriptTag: https://cdnjs.cloudflare.com/ajax/libs/jquery/3.5.1/jquery.min.js) [1 m 40 s]
Error Message:
System.Threading.Tasks.TaskCanceledException : The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.
---- System.TimeoutException : A task was canceled.
-------- System.Threading.Tasks.TaskCanceledException : A task was canceled.
Stack Trace:
at System.Net.Http.HttpClient.HandleFailure(Exception e, Boolean telemetryStarted, HttpResponseMessage response, CancellationTokenSource cts, CancellationToken cancellationToken, CancellationTokenSource pendingRequestsCts)
at System.Net.Http.HttpClient.GetStringAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)
at Microsoft.AspNetCore.Identity.Test.IdentityUIScriptsTest.IdentityUI_ScriptTags_FallbackSourceContent_Matches_CDNContent(ScriptTag scriptTag) in //src/Identity/test/Identity.Test/IdentityUIScriptsTest.cs:line 86
--- End of stack trace from previous location ---
----- Inner Stack Trace -----

----- Inner Stack Trace -----
at Microsoft.AspNetCore.Identity.Test.RetryHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken) in //src/Identity/test/Identity.Test/RetryHandler.cs:line 56
at System.Net.Http.HttpClient.GetStringAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)
Standard Output Messages:
Sending request 'GET - https://cdnjs.cloudflare.com/ajax/libs/jquery/3.5.1/jquery.min.js' 1 attempt.
Request 'GET - https://cdnjs.cloudflare.com/ajax/libs/jquery/3.5.1/jquery.min.js' failed with System.Threading.Tasks.TaskCanceledException: A task was canceled.
at System.Threading.Tasks.TaskCompletionSourceWithCancellation`1.WaitWithCancellationAsync(CancellationToken cancellationToken)
at System.Net.Http.HttpConnectionPool.GetHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at Microsoft.AspNetCore.Identity.Test.RetryHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken) in //src/Identity/test/Identity.Test/RetryHandler.cs:line 40

@HaoK have you seen this one before? Should we just quarantine it?

@Tratcher
Member

External networking issue? We wouldn't normally quarantine a test for that.

That said, why is a test hitting external resources?

@dougbu
Contributor Author

dougbu commented Nov 18, 2021

> That said, why is a test hitting external resources?

The failure is in the Microsoft.AspNetCore.Identity.Test--net7.0 work item and it's running fine everywhere but on OSX 10.15. I suspect it's just a glitch but agree it would be nice if the test weren't using JS downloads from anywhere other than the test web app. @HaoK

@dougbu
Contributor Author

dougbu commented Nov 18, 2021

Separately, @Tratcher could you answer my question at https://github.com/dotnet/aspnetcore/pull/38427/files#r749957765 please❔

@HaoK
Member

HaoK commented Nov 18, 2021

We need the external hit because we want to ensure the contents of Identity UI match the CDN, so it's literally comparing against the CDN contents.

@HaoK
Member

HaoK commented Nov 18, 2021

For that specific test, we can also just configure Helix retries, since it has a known external network dependency: https://github.com/dotnet/aspnetcore/blob/main/eng/test-configuration.json#L10. We should add a glob line for all of the script UI tests: "Microsoft.AspNetCore.Identity.Test.IdentityUIScriptsTest.*"

@dougbu
Contributor Author

dougbu commented Nov 18, 2021

Note the entire work item failed in this case even though it was only IdentityUI_ScriptTags_FallbackSourceContent_Matches_CDNContent() that fell over. The problem here is the HttpClient timeout of 100 seconds is too long to retry the expected 5 times. We could change src/Identity/test/Identity.Test/RetryHandler.cs to use a shorter timeout for the request. That would be much better than retrying the entire work item.

And, on @Tratcher's overall point, we would be in a much better place if the exponential retry in RetryHandler weren't the only thing we backed off or if we weren't trying to retry so many times.

Bottom line, the test fails almost never but the RetryHandler in that test class isn't well set up.
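
One way to read the "shorter timeout for the request" suggestion above is to bound each attempt separately, so a single hung request cannot use up the whole `HttpClient.Timeout`. The sketch below is only an illustration under that assumption: `PerAttemptTimeoutHandler` and `HangOnceHandler` are hypothetical names, the 200 ms attempt limit is arbitrary, and none of this is the repository's `RetryHandler`.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Demo: the first attempt hangs, the second succeeds well inside HttpClient.Timeout.
var transport = new HangOnceHandler();
var retrying = new PerAttemptTimeoutHandler(transport, TimeSpan.FromMilliseconds(200), maxAttempts: 3);
using var client = new HttpClient(retrying) { Timeout = TimeSpan.FromSeconds(10) };
var body = await client.GetStringAsync("http://example.invalid/jquery.min.js"); // placeholder URL, never resolved
Console.WriteLine($"{body} after {transport.Calls} attempts"); // prints "ok after 2 attempts"

// Hypothetical handler: gives every attempt its own deadline, linked to the caller's token.
public sealed class PerAttemptTimeoutHandler : DelegatingHandler
{
    private readonly TimeSpan _attemptTimeout;
    private readonly int _maxAttempts;

    public PerAttemptTimeoutHandler(HttpMessageHandler inner, TimeSpan attemptTimeout, int maxAttempts)
        : base(inner)
    {
        _attemptTimeout = attemptTimeout;
        _maxAttempts = maxAttempts;
    }

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        for (var attempt = 1; ; attempt++)
        {
            using var attemptCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
            attemptCts.CancelAfter(_attemptTimeout);
            try
            {
                return await base.SendAsync(request, attemptCts.Token);
            }
            catch (OperationCanceledException)
                when (!cancellationToken.IsCancellationRequested && attempt < _maxAttempts)
            {
                // Only this attempt timed out; the caller's token is still live, so try again.
            }
        }
    }
}

// Stub transport for the demo: the first call never completes unless it is cancelled.
public sealed class HangOnceHandler : HttpMessageHandler
{
    public int Calls;

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        if (Interlocked.Increment(ref Calls) == 1)
        {
            await Task.Delay(Timeout.InfiniteTimeSpan, cancellationToken);
        }
        return new HttpResponseMessage(HttpStatusCode.OK) { Content = new StringContent("ok") };
    }
}
```

Because `HttpClient.Timeout` covers the whole handler pipeline, retries inside a handler only help if each attempt gives up well before that overall limit.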

@HaoK
Member

HaoK commented Nov 18, 2021

We could just stop using the retry handler and let Helix retry the entire work item. It doesn't take long: it's already on the machine, and all of the Identity tests together take a few seconds to run.

@dougbu
Contributor Author

dougbu commented Nov 18, 2021

Does the Helix SDK retry feature support retrying just the single Microsoft.AspNetCore.Identity.Test--net7.0 work item on a single platform❔ If yes, 🆗 I guess, but if the retry scope is broader than that, e.g. if it submitted to every platform in the matrix or retried all of our tests on one platform, just no.

@dougbu dougbu force-pushed the dougbu/agent.types.3950 branch from ec21abe to fb0a87f Compare November 19, 2021 00:11
@HaoK
Member

HaoK commented Nov 19, 2021

Why does the scope matter? If it only fails on OSX, it doesn't really hurt to specify a broader retry policy for that test class.

@dougbu
Contributor Author

dougbu commented Nov 19, 2021

> Why does the scope matter? If it only fails on OSX, it doesn't really hurt to specify a broader retry policy for that test class.

@HaoK and I chatted offline and the scope is a single Helix agent i.e. retries just retry one test class on that one platform.

More generally, we're in agreement to

  1. Remove RetryHandler in the Identity tests and (maybe) wherever else we have similar complications w/ fewer benefits than the Helix retry feature
  2. Enable the Helix retry feature for this particular test class
  3. (Optionally -- if it works) Reduce the 100 second HttpClient timeout because that eats into our Helix timeouts too quickly. (If the Helix timeout restarts when the agent starts a retry, this point isn't worth touching.)

@dougbu
Contributor Author

dougbu commented Nov 20, 2021

I'm worried that macOS-10.15 Helix machines are significantly slower or have worse network connections than the macOS 10.14 machines we were using before. The most recent aspnetcore-helix-matrix run for this PR failed in exactly the same way as before, w/ IdentityUI_ScriptTags_FallbackSourceContent_Matches_CDNContent(...) timeouts after trying to download files 3 times ☹️. Thoughts @ilyas1974 @MattGal

Separately from that and somewhat strangely, we aren't seeing timeouts of the IdentityUI_ScriptTags_SubresourceIntegrityCheck(...) tests and those should be using the same CDN links. May eventually need to download the files in a static method and save the files for use in these two tests. That's for later though.
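
The "download once in a static method and reuse" idea mentioned above could look roughly like the following. This is a sketch under assumptions, not code from the repository: `SharedDownloads` is a hypothetical helper, the URL is a placeholder, and the download delegate is faked so the example runs without a network.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

var downloads = 0;
Task<string> FakeDownload(string url)
{
    downloads++; // counts how often the "network" is hit
    return Task.FromResult($"contents of {url}");
}

const string Url = "https://cdn.example.test/jquery.min.js"; // placeholder URL
var first = await SharedDownloads.GetAsync(Url, FakeDownload);
var second = await SharedDownloads.GetAsync(Url, FakeDownload);
Console.WriteLine($"{downloads} download(s), same content: {first == second}");
// prints "1 download(s), same content: True"

// Hypothetical helper: each URL is fetched at most once per process and the result is shared.
public static class SharedDownloads
{
    private static readonly ConcurrentDictionary<string, Lazy<Task<string>>> Cache = new();

    public static Task<string> GetAsync(string url, Func<string, Task<string>> download) =>
        // Lazy<T> ensures the delegate runs once even if two tests ask for the URL concurrently.
        Cache.GetOrAdd(url, key => new Lazy<Task<string>>(() => download(key))).Value;
}
```

One trade-off of this shape: a failed download is cached as well, so every test sharing the URL would observe the same failure unless the entry is evicted.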

@dougbu
Contributor Author

dougbu commented Nov 20, 2021

To clarify, the move that seems to be hurting our IdentityUIScriptsTests class is OSX.1014.Amd64.Open ➡️ OSX.1015.Amd64.Open.

@HaoK
Member

HaoK commented Nov 20, 2021

We could just choose to skip these Identity tests on the OSX queue. We don't really need the OS coverage for this; there's no variation between OSes for this feature area.

@dougbu
Contributor Author

dougbu commented Nov 20, 2021

> We could just choose to skip these Identity tests on the OSX queue. We don't really need the OS coverage for this; there's no variation between OSes for this feature area.

WFM, though this remains a general problem. @wtgodbe is hitting similar problems w/ tests on macOS 10.15 e.g. in #38536. I'll add the [SkipOnHelix] attribute for the single queue (plus its non-.Open alternative) in this PR but won't bother doing another aspnetcore-helix-matrix run.

Suggest we move to macOS-11 and OSX.1100.Amd64[.Open] in all of your PRs for dotnet/aspnetcore-internal#3950 except where we have aspnetcore-helix-matrix or similar @wtgodbe.

- part of dotnet/aspnetcore-internal#3950
  - also touches on #36032
- update Helix queues from Alpine 3.12 to 3.14, OSX 10.14 to 10.15, and (for Arm64) Debian 9 to 11
  - use OSX 11.00 when testing PRs and rolling builds; reduce 10.15 usage to scheduled runs
  - remove overlap (all 3 queues) between PRs / rolling builds and scheduled runs
- build source-index on `windows-latest` (not `vs2017-win2016`)
- update build and Helix Docker images to latest tags

nits:
- don't skip unused Helix queues
- remove versions from pipeline job display names
  - some were already outdated; rest will be confusing in the future
- remove most comments about unused Helix queues
@dougbu dougbu force-pushed the dougbu/agent.types.3950 branch from fb0a87f to f4ca1c5 Compare November 20, 2021 01:26

namespace Microsoft.AspNetCore.Identity.Test;

[SkipOnHelix("https://github.com/dotnet/aspnetcore/issues/38542", Queues="OSX.1015.Amd64.Open;OSX.1015.Amd64")] //slow

Contributor Author

☹️

Note I'm skipping the test class in case the failure would just move to IdentityUI_ScriptTags_SubresourceIntegrityCheck(...) if I skipped only IdentityUI_ScriptTags_FallbackSourceContent_Matches_CDNContent(...). I double-checked and don't see evidence IdentityUI_ScriptTags_SubresourceIntegrityCheck(...) ran before (and of course not after) the timeouts of IdentityUI_ScriptTags_FallbackSourceContent_Matches_CDNContent(...).
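
For readers unfamiliar with the semicolon-separated `Queues` value in the attribute above, the matching it implies can be pictured as below. This is an illustrative sketch only: `QueueFilter.ShouldSkip` is a hypothetical helper, and the case-insensitive exact match is an assumption rather than the documented behavior of `SkipOnHelixAttribute`.

```csharp
using System;
using System.Linq;

const string Queues = "OSX.1015.Amd64.Open;OSX.1015.Amd64"; // value from the attribute above
Console.WriteLine(QueueFilter.ShouldSkip(Queues, "OSX.1015.Amd64.Open")); // True
Console.WriteLine(QueueFilter.ShouldSkip(Queues, "OSX.1100.Amd64.Open")); // False

// Hypothetical helper: true when the current queue is one of the listed queues.
public static class QueueFilter
{
    public static bool ShouldSkip(string skipQueues, string currentQueue) =>
        skipQueues
            .Split(';', StringSplitOptions.RemoveEmptyEntries)
            .Any(queue => string.Equals(queue.Trim(), currentQueue, StringComparison.OrdinalIgnoreCase));
}
```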

Member

Yeah, seems fine. We could go so far as to skip All.OSX if you wanted, since we really only need to run this on any single queue to ensure we are including the right file versions.

Contributor Author

<whining>I don't wanna do another iteration</whining> 😭

Plus, we haven't seen similar issues in other queues. We could add a [RunOnlyOnLatestUbuntu] attribute at some point in the future 😄

@dougbu
Contributor Author

dougbu commented Nov 20, 2021

> Separately from that and somewhat strangely, we aren't seeing timeouts of the IdentityUI_ScriptTags_SubresourceIntegrityCheck(...) tests and those should be using the same CDN links. May eventually need to download the files in a static method and save the files for use in these two tests. That's for later though.

This wasn't correct. IdentityUI_ScriptTags_SubresourceIntegrityCheck(...) tests didn't run in the failing builds AFAICT.

@dougbu dougbu merged commit 64c3711 into main Nov 20, 2021
@dougbu dougbu deleted the dougbu/agent.types.3950 branch November 20, 2021 06:08
@ghost ghost added this to the 7.0-preview1 milestone Nov 20, 2021
@dougbu
Contributor Author

dougbu commented Nov 20, 2021

/backport to release/6.0

@github-actions
Contributor

Started backporting to release/6.0: https://github.com/dotnet/aspnetcore/actions/runs/1483876410

@github-actions
Contributor

@dougbu backporting to release/6.0 failed, the patch most likely resulted in conflicts:

$ git am --3way --ignore-whitespace --keep-non-patch changes.patch

Applying: [main] Update Docker images, queues, etc. - part of dotnet/aspnetcore-internal#3950 - also touches on #36032 - update Helix queues from Alpine 3.12 to 3.14, OSX 10.14 to 10.15, and (for Arm64) Debian 9 to 11 - use OSX 11.00 when testing PRs and rolling builds; reduce 10.15 usage to scheduled runs - remove overlap (all 3 queues) between PRs / rolling builds and scheduled runs - build source-index on `windows-latest` (not `vs2017-win2016`) - update build and Helix Docker images to latest tags
Using index info to reconstruct a base tree...
M	.azure/pipelines/ci.yml
M	.azure/pipelines/quarantined-pr.yml
M	docs/Helix.md
M	eng/targets/Helix.Common.props
M	eng/targets/Helix.targets
M	src/ProjectTemplates/test/GrpcTemplateTest.cs
M	src/Testing/src/xunit/HelixConstants.cs
M	src/Testing/src/xunit/SkipOnHelixAttribute.cs
Falling back to patching base and 3-way merge...
Auto-merging src/Testing/src/xunit/SkipOnHelixAttribute.cs
CONFLICT (content): Merge conflict in src/Testing/src/xunit/SkipOnHelixAttribute.cs
Auto-merging src/Testing/src/xunit/HelixConstants.cs
CONFLICT (content): Merge conflict in src/Testing/src/xunit/HelixConstants.cs
Auto-merging src/ProjectTemplates/test/GrpcTemplateTest.cs
CONFLICT (content): Merge conflict in src/ProjectTemplates/test/GrpcTemplateTest.cs
Auto-merging eng/targets/Helix.targets
CONFLICT (content): Merge conflict in eng/targets/Helix.targets
Auto-merging eng/targets/Helix.Common.props
CONFLICT (content): Merge conflict in eng/targets/Helix.Common.props
Auto-merging docs/Helix.md
Auto-merging .azure/pipelines/quarantined-pr.yml
Auto-merging .azure/pipelines/ci.yml
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 [main] Update Docker images, queues, etc. - part of dotnet/aspnetcore-internal#3950 - also touches on #36032 - update Helix queues from Alpine 3.12 to 3.14, OSX 10.14 to 10.15, and (for Arm64) Debian 9 to 11 - use OSX 11.00 when testing PRs and rolling builds; reduce 10.15 usage to scheduled runs - remove overlap (all 3 queues) between PRs / rolling builds and scheduled runs - build source-index on `windows-latest` (not `vs2017-win2016`) - update build and Helix Docker images to latest tags
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

@dougbu
Contributor Author

dougbu commented Nov 20, 2021

Hmm, shouldn't be surprised that backporting changes that touch C# files doesn't work (after #38076). Bit unhappy the .props and .targets files aren't aligned. ☹️

I'll deal w/ this @wtgodbe

@MattGal
Member

MattGal commented Nov 22, 2021

> I'm worried that macOS-10.15 Helix machines are significantly slower or have worse network connections than the macOS 10.14 machines we were using before. The most recent aspnetcore-helix-matrix run for this PR failed in exactly the same way as before, w/ IdentityUI_ScriptTags_FallbackSourceContent_Matches_CDNContent(...) timeouts after trying to download files 3 times ☹️. Thoughts @ilyas1974 @MattGal
>
> Separately from that and somewhat strangely, we aren't seeing timeouts of the IdentityUI_ScriptTags_SubresourceIntegrityCheck(...) tests and those should be using the same CDN links. May eventually need to download the files in a static method and save the files for use in these two tests. That's for later though.

Pasting reply from Teams:

I think it's a bit of a stretch to extrapolate "machines are not as good" from some HttpClient timeouts. These machines aren't in an Azure data center so literally everything they do (fetch helix work items from Azure Service Bus, send events to Azure Event Hub, download payloads from Azure Storage accounts, etc) involves communicating with and downloading successfully from external resources.

There is some variance in hardware to be found (some pools have a mix of minis and Pros;) but even the oldest, worst mac minis we have are using gigabit adapters built into their main boards connected to the same general network topology as the others. If we see a pattern of a specific machine having this problem, we can certainly investigate but my suspicions lie with some kind of DoS prevention system with cloudflare.

When the vendors aren't using every port on the KVM I will fetch some hardware specs off a few machines but I don't think we can realistically blame compute power on a failure to download an 88 KB file from an external source in 100 seconds.

@ghost

ghost commented Nov 22, 2021

Hi @MattGal. It looks like you just commented on a closed PR. The team will most probably miss it. If you'd like to bring something important up to their attention, consider filing a new issue and add enough details to build context.
