-
Notifications
You must be signed in to change notification settings - Fork 189
Description
Working on Fedora 41, x86_64
, with the system-packaged rustc
, version 1.86.0:
$ git clone https://github.com/salsa-rs/salsa.git
$ cd salsa
$ for i in $(seq 1 10000); do echo "==== ${i} ===="; cargo test --test parallel || break; done
I expect to see this every time:
test result: ok. 8 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
However, instead I eventually get this:
failures:
---- parallel_scope::execute_cancellation stdout ----
2025-04-15T14:18:29.632787Z INFO salsa::function::execute: a1(Id(0)): executing query
thread 'parallel_scope::execute_cancellation' panicked at tests/parallel/parallel_scope.rs:105:10:
called `Result::unwrap_err()` on an `Ok` value: (11, 21)
failures:
parallel_scope::execute_cancellation
test result: FAILED. 13 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Testing with 10000 iterations is conservative. In practice, I see the problem in fewer than a thousand iterations, generally a few dozen to a few hundred. The problem still occurs when I add --release
to the cargo test
invocation.
We originally observed this while building a current release of ruff
in Fedora; ruff
uses a git snapshot of salsa
, and our package runs the tests for the salsa
snapshot as part of the RPM build process. The failure occurs on at least x86_64
and s390x
architectures; it is probably architecture-independent.
Since the above procedure reliably reproduces the flaky failure, I was able to bisect it to 5ee3bdd. Based on the flaky nature of the regression, the nature of the test that is failing, and the contents of the commit that introduced the regression, this is clearly a concurrency bug, but I don’t know what exactly is wrong.
Mentioning @Veykril, hoping that they will have some insight.