FR: Add a `--fail-fast` option to libtest
Description
I'd like to add an option to the CLI of the standard test runner so that it stops running tests after the first one fails. This would be useful in situations where you just want to find whether any test fails. I don't propose to change the default.
My immediate motivation is that cargo-mutants just needs to see whether any test fails, and it's a waste of time to continue running tests after one failure has been found.
More generally, this is a feature many test runners have and that people seem to find useful. For example, failing fast is the default in nextest (https://nexte.st/docs/running/#failing-fast), Bazel has `--test_runner_fail_fast`, and pytest has `--exitfirst`. And of course `cargo test` fails fast by default at the test target level.
Today, if any tests fail in one target, `cargo test` won't run any more targets. However, within a test target, there's no way to fail fast. This can be confusing, but it would be disruptive to change it now.
I discovered that the logic for this actually already exists; it's just not exposed in the CLI. #142807 adds an option. With that change, you can run `cargo test -- --fail-fast -Zunstable-options` and it will stop after the first test fails.
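To make the behavior concrete, here is a hypothetical pair of tests (names and assertions invented for illustration). Running single-threaded makes the order deterministic, e.g. `cargo test -- --fail-fast -Zunstable-options --test-threads=1`:

```rust
// Hypothetical example: with --fail-fast on a single thread,
// `a_first` fails and `b_second` is never started.
#[test]
fn a_first() {
    assert_eq!(1 + 1, 3, "deliberate failure");
}

#[test]
fn b_second() {
    panic!("not reached under --fail-fast with --test-threads=1");
}
```

With plain `cargo test` (no `--fail-fast`), both failures would be reported.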
When multiple threads are used, the tests run in nondeterministic order, so in a tree with multiple failing tests it's nondeterministic which tests get run before the process stops. I don't think that's surprising to people who just want to know of any one failure, and the order can be made predictable by running on a single thread (`--test-threads=1`).
I have read that people would like to move away from the current libtest architecture, and so apparently there has been a soft feature freeze for some months or years. However, since this is a small change in the implementation and shouldn't introduce any compatibility concerns, I hope it can still be considered.
Crates can work around the absence of this feature in libtest by setting `harness = false` and using Nextest or some other harness, but that's a large transition, and I think it would be nice to have it in the standard library: at least, that would help cargo-mutants get better performance on most crates.
cc @rust-lang/testing-devex
epage commented on Jun 23, 2025
@oli-obk and @compiler-errors, in #105153, you both recognized that the fail-fast mode was a hack. While testing-devex and libs-api decide what should be part of the stable API, any input on user-facing problems or limitations from that hack? If we move forward with this, I'd like for us to understand what might be blockers for stabilization.
compiler-errors commented on Jun 23, 2025
No user-facing problems; I think the only reason we called it a hack was how it was implemented (e.g. via a rustc-specific env var), not the general idea itself.
epage commented on Jul 1, 2025
@sourcefrog sorry for the delays, the testing-devex team hasn't been able to meet in a bit. I'm going to go ahead and try to prime the conversation here with my own thoughts to hopefully streamline things for when we do meet.
Focusing on the question of what should be in the API / CLI: we've overall been working to shrink the surface area of `libtest`. Right now, this has mostly been us deprecating (but not removing) functionality. We may remove some unstable functionality. This is part of our effort to flesh out custom test harnesses, including the inter-process API that `cargo test` and other test runners would interact with. To reduce that surface and to improve some aspects of usability, we also want to shift some responsibilities from `libtest` to `cargo test`.
So from my perspective, the questions that would be relevant to testing-devex in discussing this:
- Should this live in `cargo test`, knowing every "modern" harness supports this, by dropping our weird "keep-going within a binary but fail fast across binaries" in favor of "keep going across binaries, and `--fail-fast` would do so across binaries"?
- `nextest` doesn't just have `--fail-fast` but also `--no-fail-fast` and `--max-fail`. Could we dig into the motivations to see if they apply here?
sourcefrog commented on Jul 1, 2025
Thanks! I'll follow up with a survey of what other frameworks do.
sourcefrog commented on Jul 13, 2025
In short: Adding `--fail-fast` into libtest seems to me to align with the common name for a common practice, and to fill a worthwhile gap that can't reasonably be worked around at the cargo level.
I'm assuming the split here continues to be that `cargo test` runs various test target binaries, each of which uses a library/harness that's fairly opaque to `cargo test`.
Are you thinking we might have `--fail-fast` as a standard argument that test processes should expect? Right, this seems inherently very tied to how the individual tests are executed, which seems to be very much the business of the individual test harness implementation.
All the test runners I've seen have some kind of loop over a work queue, possibly with multiple workers. They may run the tests in process, on threads, in subprocesses, in containers, or remotely, but there's still some kind of queue. It's easy to exit early when one or more tests have failed.
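For illustration, here is a minimal sketch of such a loop (invented types and names, not libtest's actual implementation): workers share a queue and a failure counter, and stop pulling work once the counter reaches a threshold, so fail-fast is just the threshold-1 case of a `--max-fail=N` generalization.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

// Invented representation of a test: a name plus a pass/fail function.
struct TestCase {
    name: &'static str,
    run: fn() -> bool,
}

fn run_tests(tests: Vec<TestCase>, workers: usize, max_fail: usize) {
    let queue = Arc::new(Mutex::new(tests));
    let failures = Arc::new(AtomicUsize::new(0));
    let mut handles = Vec::new();
    for _ in 0..workers {
        let queue = Arc::clone(&queue);
        let failures = Arc::clone(&failures);
        handles.push(thread::spawn(move || loop {
            // Stop taking new work once enough failures have been seen;
            // max_fail == 1 is plain fail-fast. Tests already running
            // are allowed to finish.
            if failures.load(Ordering::SeqCst) >= max_fail {
                break;
            }
            let Some(test) = queue.lock().unwrap().pop() else { break };
            if (test.run)() {
                println!("test {} ... ok", test.name);
            } else {
                println!("test {} ... FAILED", test.name);
                failures.fetch_add(1, Ordering::SeqCst);
            }
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
}
```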
Right, I think that would be a less confusing experience, and this is something that essentially every Rust user will hit when they write a failing test. If we weren't constrained by previous behavior, I think that might be a better default, but it would be a change in command-line behavior. Personally I would welcome it, but I also prize Rust's stability commitments.
Other test libraries
- Nextest: Has `--max-fail=N` (or `=all`), `--fail-fast` (the default) and `--no-fail-fast`.
- HUnit (Haskell): Apparently doesn't have a fail-fast feature.
- cargo-maelstrom: Has `--stop-after=N`.
- go test: Has `-failfast`.
- Python `unittest` (in stdlib): Has `-f, --failfast`.
- pytest: Has `-x` (`--exitfirst`) and `--maxfail=N`.
- jest (JS): Has `--bail` or `--bail=N`.
- Boost (C++): Doesn't seem to have an option.
- JUnit: Has `--fail-fast`.
- Rake (Ruby): Has `--fail-fast[=N]`.
Overall, adding a `--fail-fast` and optionally a `--max-fail=N` seems to align with common practice. Since `cargo test` already has `--no-fail-fast`, the style of options aligns with Nextest rather than the alternate style of `--fail-fast=false`. It could reasonably be abbreviated to `-f`, with `-F` for `--no-fail-fast`.
Motivations
I think the motivation to use this feature comes from two distinct scenarios:
- When iterating interactively, with `--fail-fast` you'll see the error earlier.
- When running tests mechanically, as in cargo-mutants or under `git bisect`, the user may only want to know whether any tests fail.
There are certainly situations where users would rather make a throughput/latency tradeoff to get many errors in a batch, including if the test suite is very slow or (as a special case) if some errors are hard to reproduce outside CI and CI takes a while. I've also seen, less than once a year in my experience, failures so incomprehensible that I need to skim many of them to work out where to begin -- but users will still have the option to run all tests when they need it.
Generalizations and evolution of this feature
This feature has existed in other languages for many years, without apparently growing a lot of complexity. So, it doesn't seem very likely to lead to many follow-on features in Rust? But I will mention two:
Stop after N failures
The most common generalization is from "stop after a failure" to "stop after N failures", allowing people to adjust the tradeoff between getting short usable output faster versus the cost of running the test suite up to the point something fails.
`--max-fail=N` could make sense for harnesses to add. A straightforward implementation within the harness would mean "stop after N failures in one binary". Since it's relatively rare, and I'd say less important than stopping after one failure, perhaps this is reasonable to add as a target-specific option? On the other hand, it's unlikely to be difficult for any harness to implement, so it could be part of a standard protocol.
Run tests more than once
A related area is to retry failing tests, or all tests. I've used Bazel's `--runs_per_test=N` and `--flaky_test_attempts=N`, which are quite useful when you suspect a test is flaky. Maelstrom also has `--repeat=N`.
Workarounds
The main workaround I can think of is that cargo-mutants could kill the subprocess when it notices that a test has failed (sketched below). I have thought about doing this in cargo-mutants. (It would be clunky to do this from the text output, but more reliable if the protocol looks more like subunit or junit.) That has some drawbacks; among others, having this built into `cargo test` would make any problems more prominent than they would be in a tool-private workaround.
Alternatively, I can imagine adding an interactive protocol between `cargo test` and the harness, where the harness reports incremental results (perhaps over subunit) and `cargo test` can ask it to gracefully stop. It doesn't seem worth it for only this feature, and seems likely to complicate and constrain the harness implementation, but perhaps there would be other features that want this.
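For concreteness, a rough sketch of that output-watching workaround (invented code, not what cargo-mutants actually does), assuming libtest's default human-readable output where a failure is reported as `test path::to::name ... FAILED`:

```rust
use std::io::{BufRead, BufReader};
use std::process::{Command, Stdio};

fn main() -> std::io::Result<()> {
    // Spawn `cargo test` with its stdout piped to us.
    let mut child = Command::new("cargo")
        .args(["test", "--", "--test-threads=1"])
        .stdout(Stdio::piped())
        .spawn()?;
    let stdout = child.stdout.take().expect("stdout was piped");
    for line in BufReader::new(stdout).lines() {
        let line = line?;
        println!("{line}");
        // Brittle text matching -- exactly the clunkiness noted above.
        if line.starts_with("test ") && line.ends_with("FAILED") {
            child.kill()?;
            break;
        }
    }
    child.wait()?;
    Ok(())
}
```

Killing the process this way also discards libtest's summary and any cleanup, which is part of why a real `--fail-fast` that exits gracefully would be preferable.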
epage commented on Jul 14, 2025
Thanks for that write-up!
I guess if `cargo test` uses the `cargo nextest` model, this technically wouldn't be needed.
I'm surprised so many have a "first N" variant. I wonder what use cases motivated that, in case it impacts the design here, especially since we're shifting focus from humans passing flags to machines.
I was wondering about what workflows we might want to offer from cargo. I've been particularly eyeing `--last-failed`. The other two are about sort order, and I've prototyped in libtest2 a solution that will allow cargo to do those.
For `--last-failed`, I think `--fail-fast` becomes important. I'd probably have the "or all if none failed" case imply `fail-fast`, since it fits this iterative development mode. Unless the "find any" case for CI is important enough, I wonder if `--fail-fast` in `cargo test` would be worth it.
sourcefrog commented on Jul 14, 2025
Right, if it ran each test function in its own process, it would be totally in control of when to stop. Also, this would remove the need to finish all the tests in one target before starting the next.
However, there are downsides to this approach: launching a process can be significantly slower than running a small unit test, so the overall test time can be much longer under Nextest on some trees.
So I guess I would be inclined to leave this up to the harnesses to experiment with, but I haven't read all the history of how the testing-devex team conceives of this interface.
I think it's essentially splitting the difference between the motivations I described above: I don't want to be spammed by dozens of failures, but I also want to get more data out of a slow test run than just a single failure. My guess is these would be rarely used, but they're easy to add.
If you're going to add that, then I'd suggest also options to run tests in random or seeded pseudorandom order, as people will discover some nondeterminism. Also perhaps the Bazel thing of repeating failed or all tests.
As additional inspiration, cargo-mutants has `--iterate`, which basically re-runs failed meta-tests by looking at the previous failures.
These features seem pretty good. I guess there is a question of approach between allowing harnesses to add them vs having batteries included in the standard tool.
Also, I would rather like to land this into the existing harness even if a large 2.0 is in the pipeline. It doesn't seem like it would constrain future changes too much.
epage commented on Aug 14, 2025
This was discussed in a testing-devex meeting on 2025-07-29 (sorry for the delay in reporting this) and we had unanimous agreement among attendees (@epage, @calebcartwright, @Muscraft, @weihanglo).
We'd then endorse this to t-libs-api to have the final say, but in 9 days the FCP closes on t-testing-devex having delegated authority to make these decisions on our own (rust-lang/libs-team#633). Maybe we should just wait until then?
sourcefrog commented on Aug 18, 2025
Great, let me know if I can help!