Make E2E tests work on Linux, support retries, and have new Azure pipeline #36207

SteveSandersonMS · 2021-09-06T15:00:17Z

This is the final chunk of the work I've been doing on making the Blazor E2E tests ready to be good friends with our CI environment. This PR:

- Makes the E2E tests able to run on Linux and Mac, as well as the existing Windows support
  - Mostly this just meant removing the "don't run on Linux/Mac" checks, since it almost entirely just worked like it should, but there were a couple of places where we made Windows-specific and host-machine-culture-specific assumptions that I've fixed
- Restores the retry support that was originally set up by @HaoK (thanks!). The difference now is that retry runs one level higher in the Xunit class hierarchy, so now it fully re-initializes any test that it needs to retry.
- Skips one class of tests that really do seem genuinely flaky (all the InputDate ones, for which the browser seems to behave unpredictably with keyboard input). All the other tests do not appear to be flaky, and I've done 200+ runs to try to confirm that. More details below.
- Makes the selenium-config.json file get ignored in CI now, so we no longer have to worry about version mismatches between the chromedriver binary and the version of Chrome installed on the runner machine. Now, it uses the chromedriver binary that was preinstalled on the runner machine regardless of selenium-config.json (which is now only used locally). So we should no longer have failures that start happening every time the AzDO images update their version of Chrome.
- Configures CI to run these E2E tests on Linux, which leads to a smaller and simpler build pipeline, and improves our test coverage since most of the devs are mostly working on Windows
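
To illustrate the "retry one level higher" point: the sketch below is Python, not the Xunit/C# code in this PR, and `run_with_retries`, `make_fixture`, and `test_body` are hypothetical names chosen for illustration. It only shows the general idea that each attempt gets a freshly constructed fixture, so state left behind by a failed attempt cannot leak into the retry.

```python
from typing import Callable, TypeVar

F = TypeVar("F")


def run_with_retries(
    make_fixture: Callable[[], F],
    test_body: Callable[[F], None],
    max_attempts: int = 3,
) -> int:
    """Run test_body against a brand-new fixture on every attempt.

    Returns the number of attempts used. If every attempt fails, the last
    failure is re-raised so the test is still reported as failed.
    """
    for attempt in range(1, max_attempts + 1):
        # Full re-initialization: nothing is carried over from a failed attempt.
        fixture = make_fixture()
        try:
            test_body(fixture)
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")
```

Retrying only the test body while reusing the same fixture would be cheaper, but a failed attempt may leave the fixture (for example, a browser session) in an unknown state, which is presumably why re-initializing is the safer choice here.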

Evidence that these should now run well in CI

After implementing all the changes except retries, I was getting around 99% pass rate for the entire suite of 800 tests, based on running it in a loop in an Azure VM around 200 times and getting exactly two failures (i.e., 2 out of 200*800=160000). That is, individual test cases are 99.999% on average, if that's a meaningful thing to average, but the suite as a whole multiplies out to about 99%. As much as we want all our tests to be 100% deterministic, that's not a realistic goal for browser automation - at least I've never heard of anyone claiming 100% determinism at scale.
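
The arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation using the figures quoted in the paragraph (200 runs, 800 tests, 2 failures); the function name is made up for illustration, and it assumes every test case has the same failure probability, which is a simplification.

```python
def suite_pass_rate(runs: int, tests_per_run: int, failures: int) -> float:
    """Estimate the whole-suite pass rate from an average per-test failure rate."""
    per_test_failure = failures / (runs * tests_per_run)  # 2 / 160000 = 1.25e-05
    per_test_pass = 1.0 - per_test_failure                # about 99.999%
    # The suite passes only if every one of its tests passes.
    return per_test_pass ** tests_per_run


rate = suite_pass_rate(runs=200, tests_per_run=800, failures=2)
print(f"estimated suite pass rate: {rate:.4f}")  # roughly 0.990
```

This agrees with the directly observed figure: if the two failures fell in different runs, 198 of 200 suite runs passed, which is also 99%.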

99% is not good enough on its own, but it is good enough that retries can cover the gap. If the outcomes were independent and identically distributed (IID) then allowing 3 attempts per test case we'd never expect it to fail in our lifetimes (but I know IID would be a tenuous assumption - there probably are some test cases that have as-yet undetected real problems). Up to 3 attempts is what I've set.
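
As a rough check of the "never in our lifetimes" claim, here is the same kind of calculation with retries, under the IID assumption that is already flagged above as tenuous. The per-test failure rate of 2/160000 comes from the earlier measurement; the function name is hypothetical.

```python
import math


def suite_failure_with_retries(
    per_test_failure: float, tests_per_run: int, max_attempts: int
) -> float:
    """Probability that at least one test fails all of its attempts (IID model)."""
    # A test is reported as failed only if every attempt fails.
    test_fails_all_attempts = per_test_failure ** max_attempts
    # 1 - (1 - q)^n, written with log1p/expm1 to stay accurate for tiny q.
    return -math.expm1(tests_per_run * math.log1p(-test_fails_all_attempts))


p = 2 / (200 * 800)  # 1.25e-05, from the 200-run measurement
print(suite_failure_with_retries(p, tests_per_run=800, max_attempts=3))
```

Under this model the chance of a suite run failing is on the order of 1e-12, so any failure that does show up after retries is much more likely to point at a real problem (or at correlated failures) than at random flakiness.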

Since adding retries, I ran for another 2.5 days totalling 132 test suite runs, and got a 100% pass rate. From logs I see there were 2 separate incidents when a test case had to be retried once. This suggests that retries are covering the gap.

As for any concerns about the build itself hanging or failing, the new pipeline only automates a minimal set of "restore and build" steps that would be correct and legal for any contributor to do in our repo. So that shouldn't fail, but if it did, we'd want to know about it in the same way as any other build failure. Also I haven't seen any cases of this new, more minimal set of build actions failing.

Rehab plan

I know people won't be comfortable just turning everything back on in a single step. I want to move more cautiously too. So I propose the following roll-out plan:

- Run these tests manually in an independent pipeline to verify the behavior in Azure Pipelines is as expected.
  - Done. I've run it in CI > 20 times since retries were added here with success
- Replace the old Components E2E pipeline with this new one, and re-enable it on main and PRs, but at this stage, keep it painted fake green so it can't block anyone even if something goes wrong
- When we've collected a bunch more data over a week or so, if it looks good, change it to accurately report its result so that failures are real failures

Also:

- Have issues for future follow-up: unskipping the InputDate tests if we can ensure a high level of reliability, and doing other cleanup (e.g., removing all the code related to Sauce Labs which we aren't using, and the test parallelism and browser-restarting optimizations which we're no longer using)

SteveSandersonMS · 2021-09-06T15:32:37Z

cc @HaoK @dougbu

HaoK

Retry changes look good! Should we try to get these tests running on Helix at the same time, since long term these probably shouldn't be running on AzDO test jobs?

SteveSandersonMS · 2021-09-07T06:29:46Z

> Should we try to get these tests running on Helix at the same time

I think it would be fine to try that as soon as we do get it running on a regular pipeline, but I wouldn’t want to block until then. It’s taken a long time to get this far, and we really need the tests running in CI one way or another.

wtgodbe · 2021-09-07T15:59:41Z

.azure/pipelines/components-e2e-tests-new.yml

@@ -0,0 +1,45 @@
+# This configuration builds and runs Components E2E tests only


Are we going to have both components-e2e-tests.yml and components-e2e-tests-new.yml?

No, I only created the new one so I could invoke it manually (the old one is currently disabled, which means you can't even run it manually).

Before merging this PR, I'll replace the contents of the old pipeline with the new one, and delete the new one. Then we can re-enable the old one as per the "rehab plan" I listed above.

Thanks for checking.

SteveSandersonMS · 2021-09-07T19:58:14Z

/backport to release/6.0

github-actions · 2021-09-07T19:58:28Z

Started backporting to release/6.0: https://github.com/dotnet/aspnetcore/actions/runs/1210805286

github-actions · 2021-09-07T19:59:01Z

@SteveSandersonMS backporting to release/6.0 failed, the patch most likely resulted in conflicts:

$ git am --3way --ignore-whitespace --keep-non-patch changes.patch

Applying: Make E2E tests work on Linux, support retries, and have new Azure pipeline
Applying: Opt components E2E tests out of other CI pipelines (run only in the new one)
Applying: Update src/Components/test/E2ETest/Tests/InputFileTest.cs
Applying: Move new pipeline logic into old pipeline
Using index info to reconstruct a base tree...
M	.azure/pipelines/components-e2e-tests.yml
Falling back to patching base and 3-way merge...
Auto-merging .azure/pipelines/components-e2e-tests.yml
CONFLICT (content): Merge conflict in .azure/pipelines/components-e2e-tests.yml
Removing .azure/pipelines/components-e2e-tests-new.yml
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0004 Move new pipeline logic into old pipeline
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

Make E2E tests work on Linux, support retries, and have new Azure pipeline

fa49528

SteveSandersonMS requested a review from a team September 6, 2021 15:31

SteveSandersonMS marked this pull request as ready for review September 6, 2021 15:32

SteveSandersonMS requested review from Pilchie and a team as code owners September 6, 2021 15:32

Opt components E2E tests out of other CI pipelines (run only in the new one)

638489d

martincostello reviewed Sep 6, 2021

src/Components/test/E2ETest/Tests/InputFileTest.cs (outdated, resolved)

Pilchie added the area-blazor Includes: Blazor, Razor Components label Sep 7, 2021

HaoK approved these changes Sep 7, 2021

Update src/Components/test/E2ETest/Tests/InputFileTest.cs

a2530c5

Co-authored-by: Martin Costello

wtgodbe reviewed Sep 7, 2021

Move new pipeline logic into old pipeline

7102baf

SteveSandersonMS merged commit b0d5651 into main Sep 7, 2021

SteveSandersonMS deleted the stevesa/e2e-tests-on-linux branch September 7, 2021 19:57

ghost added this to the 7.0-preview1 milestone Sep 7, 2021

SteveSandersonMS mentioned this pull request Sep 7, 2021

Backport #36207 to 6.0 #36247

Merged

10 tasks

SteveSandersonMS mentioned this pull request Sep 9, 2021

Update ChromeDriver version #36107

Merged
