Skip to content

Global state creates unpredictable zig test results #11080

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
danielchasehooper opened this issue Mar 7, 2022 · 12 comments
Open

Global state creates unpredictable zig test results #11080

danielchasehooper opened this issue Mar 7, 2022 · 12 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@danielchasehooper
Copy link
Contributor

danielchasehooper commented Mar 7, 2022

Zig Version

0.10.0-dev.871+bb05a8a08

Steps to Reproduce

Imagine you saw the following output:

$ zig test globalteststate.zig --test-filter "buzz"
TestAll 1 tests passed.
$ zig test globalteststate.zig --test-filter "fizz"
TestAll 1 tests passed.
$ zig test globalteststate.zig
TestTestTest [2/2] test "buzz"... FAIL (TestUnexpectedResult)

That would be very surprising! The two tests pass on their own, but when both tests are run together, a test fails.

You can recreate the above situation with the following contents of globalteststate.zig

var global_state_clean = true;

test "fizz" {
    global_state_clean = false;
}

test "buzz" {
    try @import("std").testing.expect(global_state_clean);
}

This is a trivial example that exemplifies a situation that can manifest in much less obvious ways in large projects.

Expected Behavior

I'd expect each test to be run with a "clean" global state untarnished by other tests.

Actual Behavior

Tests can affect the results of other tests. this can lead to very confusing situations.

Notes

These situations can be triggered in several ways that are very common in practice:

  1. Working on one test can cause seemingly unrelated tests to change pass/fail
  2. Reordering tests can change which tests pass/fail
  3. Filtering tests with --test-filter flag can change which tests pass/fail
  4. Running tests on more resource constrained systems can cause later tests to fail due to earlier tests using up system resources. I think this would be especially confusing since the state is in the OS and not in your code.

In effect you're testing global behaviors / orderings without explicitly intending to do that.

I hope everyone can see why this is a problem and don't write it off as "if you have global state, that's on you". Global state is common, even in non-obvious ways like open file handles, process memory use, etc. If tests act in unpredictable ways, it'll make people less likely to use them and/or trust them less. Off the top of my head, it'd be nice if after the test process had set up global state, but before any tests are run, it would fork() for each test so they're run in an untarnished process - but I leave it up to the Zig committee to decide how best to handle this.

@danielchasehooper danielchasehooper added the bug Observed behavior contradicts documented or intended behavior label Mar 7, 2022
@danielchasehooper danielchasehooper changed the title global state creates unpredictable zig test results Global state creates unpredictable zig test results Mar 7, 2022
@InKryption
Copy link
Contributor

InKryption commented Mar 7, 2022

It would be important to note that the reason there isn't a "clean global state" per test is because test blocks are eventually all compiled down to functions, which exist in a binary, alongside the entry point which runs the test runner, which then runs all those functions.

That is to say, tests aren't special, and neither are the executables they exist in, so there wouldn't necessarily be a trivial way to solve this. Maybe by somehow storing the global state before a test is run, and restoring it after the fact, but not sure how great of a solution that is, or how feasible it would be.

The fork solution wouldn't work on all platforms in the same way, especially not on Windows, so that's a very unlikely solution.

@SpexGuy
Copy link
Contributor

SpexGuy commented Mar 7, 2022

Hmm, the compiler does know about all globals. It could potentially generate code to reset them all between tests. However, as the OP points out,

Global state is common, even in non-obvious ways like open file handles, process memory use, etc.

this approach would not be perfect. Imo we should wait until this problem is encountered in a real project so that we have a concrete use case, before attempting to solve this problem.

@marler8997
Copy link
Contributor

I recently saw a project where someone was testing the use of environment variable which is global state. Like @SpexGuy says any solution to this would not be perfect, it's unclear to me whether the effort to tackle this problem would warrant the extra complexity. Maybe in some cases it makes sense.

@danielchasehooper
Copy link
Contributor Author

danielchasehooper commented Mar 8, 2022

Maybe instead of ‘fork’ing, the generated test binary could repeatedly launch a new instance of itself and pass the index of the test to run. The first time the test binary is run it's not passed a test index, which signals that the test runner/self launching code should be run instead of a specific test.

@judofyr
Copy link

judofyr commented Mar 8, 2022

Another technique to combat this is to always randomize the order which the tests are executed in. It’s not perfect, but it’ll help catch unintended dependencies between tests.

@danielchasehooper
Copy link
Contributor Author

Another technique is to always randomize the order

Clever idea! While it could catch unintended dependencies, I think that would make test failures more surprising, and less reproducible. The spirit of this bug report is to do the opposite.

@ifreund
Copy link
Member

ifreund commented Mar 8, 2022

To make the random order reproducible we could simply introduce a test seed to the test runner used to seed the prng for ordering. We already use such test seeds for our various fuzz tests in tigerbeetle.

@judofyr
Copy link

judofyr commented Mar 8, 2022

The way this works in Minitest (a Ruby testing framework) is that it supports a seed and it will always output the seed it used. Same seed = same ordering of tests.

Run options: --seed 53708

# Running:

..........

Finished in 0.905809s, 14.3518 runs/s, 41.9514 assertions/s.

13 runs, 38 assertions, 0 failures, 0 errors, 0 skips

Here "Run options" would include any other flags specified as well.

There's also a way of enforcing that tests are being executed in order: https://www.rubydoc.info/gems/minitest/Minitest%2FTest.i_suck_and_my_tests_are_order_dependent!

@judofyr
Copy link

judofyr commented Mar 8, 2022

Off the top of my head, it'd be nice if after the test process had set up global state, but before any tests are run, it would fork() for each test so they're run in an untarnished process - but I leave it up to the Zig committee to decide how best to handle this.

Another way of approaching this: Could we make it possible to customize the execution of test blocks? Here's a random syntax proposal:

var forkTestRunner = …;

test(forkTestRunner) "fizz" {
  //
}

And then there would be an API for the test runner which makes it capable of forking the process.

@ityonemo
Copy link
Contributor

ityonemo commented Mar 8, 2022

To make the random order reproducible we could simply introduce a test seed to the test runner used to seed the prng for ordering. We already use such test seeds for our various fuzz tests in tigerbeetle.

Can confirm. Randomized test order with seeds is extremely useful in elixir. You expect to see irreproducibility if you're mishandling global resources (like databases e.g.), but using the seed it's easy to track down (if it's not obvious what's going on).

@matu3ba
Copy link
Contributor

matu3ba commented Mar 8, 2022

Quick summary with additional point 4+5 added by me.

  1. Preventing global state also prevents reusing of code. Comptime stuff is more cluttered and very annoying to repeat, but very simple to zero out/initialize at test begin.
  2. A user-definable API, ie in build.zig or testing.zig for the test order sounds useful for the long-term goal of optimizing testing performance ie for use cases like testing compilation (even simple programs have compilation time for ca. 2s and having very many or having complex build dependencies for specific tests adds up). This also enables use cases like using a seed for the test order.
  3. Forking on every block or execve is expensive and has a significant perf loss, which is why testing panics is planned to only fork, if necessary.
  4. Forking prevents reusing this part of zig test infrastructure for embedded.
  5. Resetting all global vars to their default value will increase binary size with a hidden cost (the reset code), which is very undesirable for embedded.

@andrewrk andrewrk added proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. and removed bug Observed behavior contradicts documented or intended behavior labels Aug 19, 2022
@andrewrk andrewrk added this to the 0.12.0 milestone Aug 19, 2022
@rohlem
Copy link
Contributor

rohlem commented Jun 7, 2023

See also the new #15953 - multi-threading tests by default / indiscriminately + randomized order would be an even more extreme scenario.

The third step would be multiplying some test executions in that pool - probably not enabled by default, but maybe useful for stress testing.
(Randomized parallel execution of multiple executions of tests in sequence should really be able to uncover any kind of global state.)
Or maybe that's where we draw the line to recommending a custom test runner?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

No branches or pull requests

10 participants