proposal: testing/synctest: replace Run with Test #73567
Is it legal to call:

```go
synctest.Test(t, func(t *testing.T) {
	synctest.Test(t, func(t *testing.T) {
		// ...
	})
})
```
No, no recursive bubbles.
Can that be added to the docs? Also, what about `testing.F`?
Noted that the non-recursiveness needs to be in the docs. I waffled a bit on fuzz support.
I like the idea of making it take a name. In concordance with `t.Run`, the signature could be:

```go
func Test(t *testing.T, name string, f func(*testing.T))
```
As currently defined, `Test` does not create a subtest. A possible modification to this proposal would be for it to start a subtest, in which case we would want to give it a name parameter. We're aiming for the least amount of new mechanism in this API to start with, which I think argues for not creating a subtest.
As in my original feedback in #73062 (comment), I do prefer that `Test` not implicitly start a subtest. Presumably if we do find with experience that it's common to combine subtests directly with synctest bubbles (so that they correspond exactly in the code) then a later proposal could call for something like the following, which would be backward compatible:

```go
package synctest

// Subtest begins a new subtest (as if calling [testing.T.Run]) whose body
// executes in a new goroutine in an isolated bubble (as if calling [Test]).
func Subtest(t *testing.T, name string, f func(t *testing.T)) {
	t.Run(name, func(t *testing.T) {
		Test(t, f)
	})
}
```

...but in the meantime, something like the body above can be inlined into any test that needs it, so we can wait to see whether this situation comes up enough for this helper to pull its weight.
I like this new API. But I don't like the stutter in `synctest.Test`. It would be a pity to forego a better name.
I agree with Hein. I dislike the stuttering `synctest.Test` suggestion because the word "test" is overused, generic, and not descriptive, and "synctest" is already vague and not particularly descriptive for something this awesome (I discovered it almost by accident!).

"faketime" would be a descriptive package name. "bubblelife" might be amusing. I kind of like double-barrelling for emphasis sometimes, like "fakebubble". Also, using Bubble as the verb might not be bad, as it's a great, rarely used word. As in, `faketime.Bubble()`. There! Super descriptive.

Anyway, give naming some weight in your thoughts. It matters way, way more than we realize. For better or worse, names are very sticky.

I actually prefer to break, even deliberately break, backwards compatibility within experiments, because otherwise you are shaping yourself towards a worse design at the end. The point of the experiment is literally to improve the design. Use that power. Have courage. Be bold, especially in an experiment.

I like adding the `testing.T` parameter also from a safety point of view. You don't want to accidentally use fake time in a real program. Being a subpackage of testing might not imply enough of a danger zone, necessarily, but seeing a `*testing.T` is pretty darn unambiguous.
A few reasons for the name
A couple updates to the proposal: I'm adding a `Fuzz` function for fuzz tests. (An alternative would be a generic function covering both cases.) Added a mention that bubbles may not be recursive.

```go
// Test executes f in a new goroutine.
//
// The new goroutine and any goroutines transitively started by it form
// an isolated "bubble".
// Test waits for all goroutines in the bubble to exit before returning.
//
// Goroutines in the bubble use a synthetic time implementation.
// The initial time is midnight UTC 2000-01-01.
//
// Time advances when every goroutine in the bubble is blocked.
// For example, a call to time.Sleep will block until all other
// goroutines are blocked and return after the bubble's clock has
// advanced. See [Wait] for the specific definition of blocked.
//
// If every goroutine is blocked and there are no timers scheduled,
// Test panics.
//
// Channels, time.Timers, and time.Tickers created within the bubble
// are associated with it. Operating on a bubbled channel, timer, or ticker
// from outside the bubble panics.
//
// If Test is called from within an existing bubble, it panics.
//
// The [*testing.T] provided to f has the following properties:
//
//   - Functions registered with T.Cleanup run inside the bubble,
//     immediately before Test returns.
//   - The [context.Context] returned by T.Context has a Done channel
//     associated with the bubble.
//   - T.Run panics. Subtests may not be created within the bubble.
func Test(t *testing.T, f func(*testing.T))

// Fuzz is like Test, but for fuzz tests.
func Fuzz(f *testing.F, fn func(*testing.F))

// Wait is unchanged from the existing proposal.
func Wait()
```
This makes a lot of sense to me. Has anything under a GOEXPERIMENT flag "failed" in the sense that the underlying stuff was retracted? It seems like that must be an anticipated outcome, but I'm not sure what the rules are or should be. If it wouldn't be reasonable to retract `Run`, it could perhaps be preserved elsewhere, e.g. in a separate package.
In this new form that has direct access to a `*testing.T`, I'm hoping that the "bubble" mechanism does a good enough job of keeping things contained that it would be okay to keep running other tests in the same package, or possibly even other separately-bubbled subtests in the same test, if one test deadlocks inside a synctest bubble. I can understand that might be unfavorable if it would make it hard to offer a non-testing-integrated version of this later, but all else being equal, having a test fail properly tends to be more ergonomic than a panic.
Usability feedback: it's a minor PITA to wrap all my tests with synctest.Run(). I mean, it is worth it, but awkward, currently. For an experiment, this is fine. Minimizing the invasive changes needed to adopt synctest, I assume, is a goal.

What I would like: (A) to not have to maintain two versions of every test; and (B) to run both synctest and non-synctest versions of every test, easily, before check-in. My current usage pattern has been to run both synctest and non-synctest versions before committing new code.

Current solution: I have a pair of files, synctest.go and synctest_not.go, that abstract out synctest.Run and synctest.Wait, and use the build tag or !tag for the experiment. In the synctest_not.go file, the operations are no-ops. This avoids the (A) duplication issue.

I'm not sure how to improve the situation, and it's not horrible presently, but it is clunky. I'd be open to using _ imports to add synctest to _test.go files with less surgery required. I'd be open to a go test flag that converts the whole testing package to be a synctest bubble-every-test package, and just ignores benchmarks. I'd be open to having the package provide a tag-based approach similar to my wrappers, as in (B) above.
Spelling out some logic:
It seems like there could be various interesting/useful ways to toggle evaluation of p and q (environment variables come to mind), but perhaps q could be given directly.
Reading the most recent comments above, my mind initially went to something like this:

```go
func TestSomething(t *testing.T) {
	test := func(t *testing.T) {
		// (the actual test code)
	}
	t.Run("bubbled", func(t *testing.T) {
		synctest.Test(t, test)
	})
	t.Run("unbubbled", test)
}
```

Exactly what I sketched above could only work if the test body never calls `synctest.Wait`. Now, I'm not meaning to suggest that's necessarily a good idea, but I do still want to ask for the sake of better understanding the problem: would that be sufficient to allow running the same test code both bubbled and unbubbled, or are there likely to be additional differences between a bubbled and an unbubbled test implementation that would also need to be accounted for somehow?

Sidebar: the example I wrote above could've been a little shorter and (subjectively) more symmetrical if there were a function like this:

```go
package synctest

// Bubbled takes a subtest-like function and returns the same function
// instrumented so that its body runs in a synctest bubble.
//
// (Various additional constraints on usage of the result follow from
// this that I won't spell out here, since they are just implications of
// the current specification of synctest.Test in the proposal.)
func Bubbled(f func(*testing.T)) func(*testing.T)
```

then:

```go
t.Run("bubbled", synctest.Bubbled(test))
t.Run("unbubbled", test)
```

I don't know if this improves the typical callsite enough to be worth it, but I'm mentioning it anyway for the sake of discussion.
I'm dubious about providing an API for running a function inside or outside a bubble. Any test which uses `synctest.Wait` can only run inside a bubble. Almost any test which uses time is going to be slow and/or flaky outside a bubble. Making a test better able to execute outside a bubble makes it a worse test inside one: avoiding `Wait` and fake time gives up most of what synctest provides. On the flip side, making a test able to execute inside a bubble may require using a fake network or other dependency, which isn't required for a non-bubbled test.

So I don't think that running the same test both inside and outside a bubble is going to be the usual case, and I don't think we should encourage authors to write tests that way. In the event that you do want to do this, a helper function is trivial:

```go
func testMaybeBubbled(t *testing.T, f func(*testing.T)) {
	t.Run("no bubble", f)
	t.Run("yes bubble", func(t *testing.T) {
		synctest.Test(t, f)
	})
}
```
Thanks - I found this persuasive; I was thinking of cases where the same test could run both ways, but I take the point that those are not the usual case.
@neild your helper function example seems essentially the same as what I showed inline in my earlier comment, and so I think has the same problem if implemented with the current form of this proposal: the bubbled version probably needs to call `synctest.Wait`, which the unbubbled version cannot.

So it seems like it would need either a concession that `synctest.Wait` is a no-op outside a bubble, or something like:

```go
func testMaybeBubbled(t *testing.T, f func(t *testing.T, waitIfBubble func())) {
	t.Run("no bubble", func(t *testing.T) {
		f(t, func() { /* no-op wait */ })
	})
	t.Run("bubble", func(t *testing.T) {
		synctest.Test(t, func(t *testing.T) {
			f(t, synctest.Wait)
		})
	})
}
```

...and then the given function would call this `waitIfBubble` callback instead of calling `synctest.Wait` directly.

However, I do tend to agree with you that if you're testing something where it's beneficial to use `synctest.Wait`, the unbubbled variant is unlikely to be exercising the same behavior.
The problem is that `synctest.Wait` only makes sense inside a bubble.

If you take out the `Wait` calls, you're testing something different. If you want to write a test that can execute with both fake and real clocks, then you need to avoid `Wait`.
I extracted a small package containing my network/chaos simulator (all channels, no real network) from my larger RPC package, and made it more general by rounding it out with a proper net.Conn interface. If it might be useful for anyone doing synctest experiments, you can find it here: https://github.com/glycerine/gosimnet. It can simulate network partitions, node failures, and node restarts. It is also kind of rough and ready, as I don't need the full net.Conn interface myself in rpc25519. So if you want to alpha test it, help me polish it by filing issues for any rough edges or incomplete implementation points, or especially anything that would make synctest testing better.
Neil wrote:

I don't want to overstate the point, but just in case this is not obvious... The reason I've been doing this as standard operating procedure is to get to test a different subset of event interleavings. Both synctest and non-synctest runs still only give me very partial coverage of all possible event interleavings, of course -- for that you do need a real model checker and its correspondingly smaller-than-the-real-world-sized model -- but so far both have been highly useful and both have helped me catch real bugs.

Synctest gives me interleavings that are hard to get with realtime, and its higher reproducibility (not completely reproducible, of course, since the goroutines that are not in synctest.Wait can become durably blocked in many orders) is a massive boon. Realtime, on the other hand, checks a wider but shallower set of interleavings. So I find both very valuable. I get better statistical "coverage", better sampling if you will, of possible interleavings. synctest.Wait gives me a "spike" where I know one goroutine always goes last, whereas realtime has more concurrent interleavings and I don't know that a given goroutine is "going last" in a time step.

edit: Interestingly, this suggests a sampling strategy to take multiple "spikes" through the event space that might get even better coverage: rotate the goroutine that calls synctest.Wait around deterministically, or even randomize it, varying who gets to "go last" by that goroutine being the one to call Wait. The synctest.Wait call's barrier provides a chokepoint to restrict sampling to a particular event behavior subspace. There's no reason the chokepoint always has to end with the same goroutine, or even to require only one chokepoint in a given time quantum; you can have multiple Wait calls, possibly on different goroutines as above, before doing a time.Sleep. Systematically exploring behavior subsets is the name of the game here. You'd just need to pass a token to avoid overlaps/confusion about who should be doing Wait. I could not in good conscience recommend that a developer only evaluate their system on a single synctest spike of the behavior space.
Update: While implementing this proposal, I discovered to my embarrassment that I have forgotten how fuzz tests work. While fuzz tests operate on a `*testing.F`, the fuzz function itself receives a `*testing.T`, so `synctest.Test` can already be used inside a fuzz test and no separate `Fuzz` variant is needed. For example, to use synctest inside a fuzz test:

```go
func FuzzThing(f *testing.F) {
	f.Fuzz(func(t *testing.T, data []byte) {
		synctest.Test(t, func(t *testing.T) {
			// use data inside the bubble
			_ = data
		})
	})
}
```

That means the original version of the proposal stands:

```go
package synctest

func Test(*testing.T, func(*testing.T))
func Wait()
```
Change https://go.dev/cl/671961 mentions this issue.
This sounds like a bug to me. If anything, the scheduler ought to have strong randomization under synctest in order to avoid dependencies on particular schedules. @neild, does your synctest implementation inject randomness into the scheduler, or is it possible to get overly deterministic schedules?
The current implementation doesn't inject any scheduler randomness. I think the runtime does inject scheduling randomness when -race is enabled, though, regardless of the presence of synctest. It's definitely possible to hide real race conditions when using synctest. For example, in a real program two goroutines started one nanosecond apart in time may schedule in arbitrary order. In a synctest bubble, the one started earlier in (fake) time will always schedule and run until it exits or blocks before the clock advances and the next goroutine starts. This is a problem with fake clocks in general: A fake clock lets you reliably test specific scenarios, but can hide races that only show up with the fuzziness of real time.
Based on the discussion above, this proposal seems like a likely accept. See #67434. |
Important: The latest version of this proposal is in #73567 (comment), with a minor update in #73567 (comment).
This is a revised version of #67434, and an alternative to the (now-withdrawn) #73062.
One issue with the existing experimental synctest API is that it can interact poorly with the testing package. In particular:

- T.Cleanup executes cleanup functions registered in a bubble outside of that bubble. When the cleanup function is used to shut down bubbled goroutines, this can cause problems.
- T.Context returns a context with a non-bubbled Done channel, which is probably not what you want to use inside a bubble.

See the introduction to #73062 for some more details.
The proposal: We replace synctest.Run with synctest.Test.

To anticipate a few possible questions:

What about testing.B?

Using synctest to run benchmarks is probably always a mistake, since the mechanism of establishing and maintaining the bubble will interfere with the code being benchmarked. Not supporting synctest within a benchmark is probably a positive.

What about testing.F?

Perhaps we should have a synctest.Fuzz as well. Alternatively, we could make synctest.Test generic. If we don't decide to add synctest.Fuzz now, this could be easily added in the future. (Making Test generic would need to be done now, though.)

What about non-test cases?

All our current intended uses for synctest are in tests, and I don't believe I've heard of any cases of it being used outside of a test function. We can add synctest.Run back in the future if it turns out to be useful. Making it more inconvenient to use synctest outside of tests (and more convenient to use it in tests) isn't a bad thing at this point in time.

Why not make this a method on testing.T?

Keeping the entire API in the testing/synctest package is good for documentation purposes: it avoids adding even more documentation to the testing package (already quite large) and lets us put all the documentation for bubbles in a single place.

What happens to users of the current experimental API?

We'll keep Run around when GOEXPERIMENT=synctest is set for at least one release cycle, to give people a chance to gracefully transition to Test.