
proposal: testing/synctest: replace Run with Test #73567


Open
neild opened this issue May 1, 2025 · 28 comments
Labels: LibraryProposal, Proposal, Proposal-FinalCommentPeriod

Comments

@neild (Contributor) commented May 1, 2025

Important

The latest version of this proposal is in #73567 (comment), with a minor update in #73567 (comment)

This is a revised version of #67434, and an alternative to the (now-withdrawn) #73062.

One issue with the existing experimental synctest API is that it can interact poorly with the testing package. In particular:

  • T.Cleanup executes cleanup functions registered in a bubble outside of that bubble. When the cleanup function is used to shut down bubbled goroutines, this can cause problems (see the sketch below).
  • T.Context returns a context with a non-bubbled Done channel, which is probably not what you want to use inside a bubble.

See the introduction to #73062 for some more details.
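
To make the first bullet concrete, here is a minimal sketch of the Cleanup hazard, written against the experimental synctest.Run API (the test and names are hypothetical, not taken from the proposal text):

func TestCleanupHazard(t *testing.T) {
	synctest.Run(func() {
		stop := make(chan struct{}) // created inside the bubble, so it is bubbled
		go func() {
			<-stop // bubbled goroutine waiting for a shutdown signal
		}()
		// With the experimental API this cleanup runs only after the test
		// function returns, outside the bubble. Run is still waiting for the
		// goroutine above to exit, so the bubble deadlocks and synctest
		// panics; and a cleanup running outside the bubble would panic anyway
		// when it closed the bubbled channel. The proposed Test runs cleanups
		// inside the bubble before returning, which avoids both problems.
		t.Cleanup(func() { close(stop) })
	})
}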

The proposal: We replace synctest.Run with synctest.Test.

// Test executes f in a new goroutine.
//
// The new goroutine and any goroutines transitively started by it form
// an isolated "bubble".
// Test waits for all goroutines in the bubble to exit before returning.
//
// Goroutines in the bubble use a synthetic time implementation.
// The initial time is midnight UTC 2000-01-01.
//
// Time advances when every goroutine in the bubble is blocked.
// For example, a call to time.Sleep will block until all other
// goroutines are blocked and return after the bubble's clock has
// advanced. See [Wait] for the specific definition of blocked.
//
// If every goroutine is blocked and there are no timers scheduled,
// Test panics.
//
// Channels, time.Timers, and time.Tickers created within the bubble
// are associated with it. Operating on a bubbled channel, timer, or ticker
// from outside the bubble panics.
//
// The [*testing.T] provided to f has the following properties:
//
//    - Functions registered with T.Cleanup run inside the bubble,
//       immediately before Test returns.
//    - The [context.Context] returned by T.Context has a Done channel
//       associated with the bubble.
//    - T.Run panics. Subtests may not be created within the bubble.
func Test(t *testing.T, f func(*testing.T))

// Wait is unchanged from the existing proposal.
func Wait()

To anticipate a few possible questions:

What about testing.B?

Using synctest to run benchmarks is probably always a mistake, since the mechanism of establishing and maintaining the bubble will interfere with the code being benchmarked. Not supporting synctest within a benchmark is probably a positive.

What about testing.F?

Perhaps we should have a synctest.Fuzz as well. Alternatively, we could make synctest.Test generic. If we don't decide to add synctest.Fuzz now, this could be easily added in the future. (Making Test generic would need to be done now, though.)

What about non-test cases?

All our current intended uses for synctest are in tests, and I don't believe I've heard of any cases of it being used outside of a test function. We can add synctest.Run back in the future if it turns out to be useful. Making it more inconvenient to use synctest outside of tests (and more convenient to use it in tests) isn't a bad thing at this point in time.

Why not make this a method on testing.T?

Keeping the entire API in the testing/synctest package is good for documentation purposes: It avoids adding even more documentation to the testing package (already quite large) and lets us put all the documentation for bubbles in a single place.

What happens to users of the current experimental API?

We'll keep Run around when GOEXPERIMENT=synctest is set for at least one release cycle, to give people a chance to gracefully transition to Test.

@gopherbot added this to the Proposal milestone May 1, 2025
@rittneje (Contributor) commented May 1, 2025

Is it legal to call synctest.Test within the callback itself?

synctest.Test(t, func(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        // ...
    })
})

@gabyhelp added the LibraryProposal label May 1, 2025
@neild (Contributor, Author) commented May 1, 2025

Is it legal to call synctest.Test within the callback itself?

No, no recursive bubbles.

@rittneje (Contributor) commented May 2, 2025

Can that be added to the docs?

Also, what about t.Parallel?

@neild (Contributor, Author) commented May 2, 2025

Noted that the non-recursiveness needs to be in the docs.

I waffled a bit on t.Parallel. It probably should panic, since it's not obvious whether it's making the portion of the test started by synctest.Test parallel or the test itself. If it panics, we leave ourselves open to implement useful behavior in the future if we think of it. (I doubt we'll think of anything useful, though.)

@gh2o commented May 2, 2025

I like the idea of making synctest only available within tests. The only downside I can think of is if someone were to want to use synctest with a third party testing framework. But I agree that can be added later if requested, and this is a much more common use case.

In concordance with t.Run(), would it make sense for it to take a name parameter, since presumably the new test would run as a sub-test?

func Test(t *testing.T, name string, f func(*testing.T))

@neild (Contributor, Author) commented May 2, 2025

presumably the new test would run as a sub-test?

As currently defined, synctest.Test does not start a new subtest.

A possible modification to this proposal would be for it to start a subtest, in which case we would want to give it a name parameter. But then you can't run a bubbled test without an extra layer of subtest, even if you don't want one, which is unfortunate.

We're aiming for the least amount of new mechanism in this API to start with, which I think argues for synctest.Test not starting a subtest--if you do want a subtest, you can always call T.Run before creating the synctest bubble.
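
For example (a sketch; the test and subtest names are illustrative):

func TestCache(t *testing.T) {
	t.Run("expiry", func(t *testing.T) {
		synctest.Test(t, func(t *testing.T) {
			// bubbled test body
		})
	})
}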

@apparentlymart commented May 2, 2025

As in my original feedback in #73062 (comment), I do prefer that synctest.Test be a separate idea from creating a subtest since in the case where the entire test runs in a single bubble there isn't really any useful subtest name to use.


Presumably if we do find with experience that it's common to combine subtests directly with synctest bubbles (so that they correspond exactly in the code) then a later proposal could call for something like the following that would be backward compatible:

package synctest

// Subtest begins a new subtest (as if calling [testing.T.Run]) whose body
// executes in a new goroutine in an isolated bubble (as if calling [Test]).
func Subtest(t *testing.T, name string, f func (t *testing.T)) {
    t.Run(name, func (t *testing.T) {
        Test(t, f)
    })
}

...but in the meantime, something like the body above can be inlined into any test that needs it, so we can wait to see whether this situation comes up enough for this helper to pull its weight.

@meling commented May 4, 2025

I like this new API.

But I don't like the stutter in synctest.Test. How important is it to allow a transition period away from the previous Run function? I mean, it was communicated clearly that it was an experiment, and it would be relatively easy to provide a go fix to add a t argument to any existing code. Maybe there are other changes related to cleanup and context that cannot easily be go fixed, but perhaps those are less common, no?

It would be a pity to forego synctest.Run just to allow a transition period for something that was already an experiment.

@glycerine commented May 4, 2025

I agree with Hein. I dislike the stuttering Test suggestion because the word "test" is overused, generic, and not descriptive, and synctest is already vague and not particularly descriptive for something this awesome (I discovered it almost by accident!). "faketime" would be a descriptive package name. "bubblelife" might be amusing. I kinda like double barrelling for emphasis sometimes -- like "fakebubble". Also, using Bubble as the verb might not be bad, as it's a great, rarely used word. As in, faketime.Bubble(). There! Super descriptive. Anyway. Give naming some weight in your thoughts. It matters way, way more than we realize. For better or worse, names are very sticky.

I actually prefer to break -- even deliberately break -- backwards compatibility within experiments, because otherwise you are shaping yourself towards a worse design at the end. The point of the experiment is literally to improve the design. Use that power. Have courage. Be bold, especially in an experiment.

I like adding the testing.T parameter also from a safety point of view. You don't want to accidentally use fake time in a real program. Being a subpackage of testing might not imply enough of a danger zone, necessarily, but seeing a *testing.T is pretty darn unambiguous.

@neild (Contributor, Author) commented May 5, 2025

A few reasons for the name Test:

  1. It makes it clear that it works on a *testing.T.
  2. It gives us an obvious name for an equivalent that works on a *testing.F: synctest.Fuzz.
  3. If we decide in the future that we actually do want a function that starts a bubble without an associated *testing.T, we still have the Run name available.

@neild (Contributor, Author) commented May 5, 2025

A couple updates to the proposal:

I'm adding a Fuzz function to the proposal. My first thought was that this is something that could be added later, but on reflection I don't see why we would wait. Using fuzzing and bubbles together is perfectly reasonable. There is still no Bench function, because we don't want to make any performance guarantees about bubbles. Benchmarks should run in the most realistic environment possible.

(An alternative would be a generic function, such as func Run[TF *testing.T | *testing.F](tf TF, f func(TF)). I think separate functions are a bit clearer, but I don't have a strong opinion here.)

Added a mention that bubbles may not be recursive.

// Test executes f in a new goroutine.
//
// The new goroutine and any goroutines transitively started by it form
// an isolated "bubble".
// Test waits for all goroutines in the bubble to exit before returning.
//
// Goroutines in the bubble use a synthetic time implementation.
// The initial time is midnight UTC 2000-01-01.
//
// Time advances when every goroutine in the bubble is blocked.
// For example, a call to time.Sleep will block until all other
// goroutines are blocked and return after the bubble's clock has
// advanced. See [Wait] for the specific definition of blocked.
//
// If every goroutine is blocked and there are no timers scheduled,
// Test panics.
//
// Channels, time.Timers, and time.Tickers created within the bubble
// are associated with it. Operating on a bubbled channel, timer, or ticker
// from outside the bubble panics.
//
// If Test is called from within an existing bubble, it panics.
//
// The [*testing.T] provided to f has the following properties:
//
//    - Functions registered with T.Cleanup run inside the bubble,
//       immediately before Test returns.
//    - The [context.Context] returned by T.Context has a Done channel
//       associated with the bubble.
//    - T.Run panics. Subtests may not be created within the bubble.
func Test(t *testing.T, f func(*testing.T))

// Fuzz is like Test, but for fuzz tests.
func Fuzz(f *testing.F, fn func(*testing.F))

// Wait is unchanged from the existing proposal.
func Wait()

@AndrewHarrisSPU

@glycerine

I actually prefer to break -- even deliberately break -- backwards compatibility within experiments, because otherwise you are shaping yourself towards a worse design at the end. The point of the experiment is literally to improve the design.

This makes a lot of sense to me. Has anything under a GOEXPERIMENT flag "failed" in the sense that the underlying stuff was retracted? It seems like that must be an anticipated outcome, but I'm not sure what the rules are or should be. If it wouldn't be reasonable to retract GOEXPERIMENT=synctest, having that flag result in a program shutdown and a tip to move on, what would be the reasonable way to "fail" an experiment?

E.g. with a package testing/modtime, modtime.Run(t *testing.T, f func(*testing.T)), modtime.Fuzz(f *testing.F, fn func(*testing.F)), modtime.Wait() are the symbols, and GOEXPERIMENT=synctest would exit with a tip to jump to GOEXPERIMENT=modtime for experimental users?

@apparentlymart commented May 7, 2025

In this new form that has direct access to a testing.T, would it make sense to redefine the deadlock behavior to be something functionally equivalent to calling t.Fatal with whatever text would've been in the panic message in the original proposal?

I'm hoping that the "bubble" mechanism does a good enough job of keeping things contained that it would be okay to keep running other tests in the same package -- or possibly even other separately-bubbled subtests in the same test? -- if one test deadlocks inside a synctest bubble.

I can understand that might be unfavorable if it would make it hard to offer a non-testing-integrated version of this later, but all else being equal having a test fail properly tends to be more ergonomic than a panic.

@glycerine commented May 8, 2025

Usability feedback: it's a minor PITA to wrap all my tests with synctest.Run(). I mean, it is worth it, but awkward, currently. For an experiment, this is fine. I assume minimizing the invasive changes needed to adopt synctest is a goal.

What I would like: A) to not have to maintain two versions of every test.

What I would like: B) to run both synctest and non-synctest versions of every test, easily, before check-in. My current usage pattern has been to run both synctest and non-synctest versions before committing new code.

Current solution: I have a pair of files, synctest.go and synctest_not.go, that abstract out synctest.Run and synctest.Wait, and use the build tag or !tag for the experiment. In the synctest_not.go file, the operations are no-ops. This avoids the (A) duplication issue.
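
A minimal sketch of that wrapper pattern (the package and function names here are illustrative, not the actual ones; only the file names and the build tag follow the description above):

// synctest.go
//go:build goexperiment.synctest

package mypkg

import "testing/synctest"

// runBubbled executes f inside a synctest bubble when the experiment is enabled.
func runBubbled(f func()) { synctest.Run(f) }

// waitBubbled blocks until all other goroutines in the bubble are durably blocked.
func waitBubbled() { synctest.Wait() }

// synctest_not.go
//go:build !goexperiment.synctest

package mypkg

// Without the experiment, the wrappers degrade to a plain call and a no-op.
func runBubbled(f func()) { f() }
func waitBubbled()        {}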

I'm not sure how to improve the situation, and it's not horrible presently, but it is clunky.

I'd be open to using _ imports to add synctest to _test.go files with less surgery required. I'd be open to a go test flag that converts the whole testing package to be a synctest bubble-every-test package, and just ignores benchmarks. I'd be open to having the package provide a tag based approach similar to my wrappers like (B) above.

@AndrewHarrisSPU commented May 8, 2025

Spelling out some logic:

  1. with p and q, where p is synctest.Test(t, f), and q is the same test without clock manipulation, we want to establish that executing p or q results in the same test outcome - pass/pass, fail/fail, deadlock/deadlock, etc.

  2. synctest is a really wonderful, technical bit to have in the standard library, but necessarily f has to respect synctest mechanisms or it won't establish the same-outcomes property

  3. therefore, test authors using synctest should know how to evaluate q.

It seems like there could be various interesting/useful ways to toggle evaluation of p and q (environment variables come to mind), but maybe giving q directly, e.g.: synctest.TestRealTime(t *testing.T, f func(*testing.T)) would leave that in the hands of test authors?

@apparentlymart

Reading the most recent comments above my mind initially went to something like this:

func TestSomething(t *testing.T) {
    test := func (t *testing.T) {
        // (the actual test code)
    }

    t.Run("bubbled", func (t *testing.T) {
        synctest.Test(t, test)
    }()
    t.Run("unbubbled", test)
}

Exactly what I sketched above could only work if synctest.Wait()'s specification were changed so that calling it outside of a bubble is a no-op rather than a panic, since otherwise the unbubbled function would need to have some material differences compared to the bubbled version.

Now, I'm not meaning to suggest that's necessarily a good idea, but I do still want to ask for the sake of better understanding the problem: would that be sufficient to allow running the same test code both bubbled and unbubbled, or are there likely to be additional differences between a bubbled and an unbubbled test implementation that would also need to be accounted for somehow?


Sidebar: the example I wrote above could've been a little shorter and (subjectively) more symmetrical if there were a function like this:

package synctest

// Bubbled takes a subtest-like function and returns the same function
// instrumented so that its body runs in a synctest bubble.
//
// (Various additional constraints on usage of the result follow from
// this that I won't spell out here, since they are just implications of
// the current specification of synctest.Test in the proposal)
func Bubbled(f func (*testing.T)) func (*testing.T)

then,

    t.Run("bubbled", synctest.Bubbled(test))
    t.Run("unbubbled", test)

I don't know if this improves the typical callsite enough to be worth it, but I'm mentioning it anyway for the sake of discussion.

@neild (Contributor, Author) commented May 8, 2025

I'm dubious about providing an API for running a function inside or outside a bubble.

Any test which uses synctest.Wait can't be correctly used outside a bubble. (If the Wait is doing anything meaningful, then removing it causes a race condition. Even if you use a real clock.)

Almost any test which uses time is going to be slow and/or flaky outside a bubble.

Making a test better able to execute outside a bubble makes it a worse test inside one: Avoiding synctest.Wait makes tests less clear. Balancing slow and flaky for a real clock makes tests less precise: A test using a fake clock can verify that events occur at the expected time (for example, cache entries expiring at a specified deadline), while a test using a real clock can't.
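
As an illustration of that precision, a sketch using the proposed API (newCache and its TTL behavior are hypothetical, not part of the proposal):

synctest.Test(t, func(t *testing.T) {
	c := newCache(5 * time.Minute) // hypothetical cache whose entries expire after a TTL
	c.Set("k", "v")

	time.Sleep(5*time.Minute - time.Nanosecond)
	if _, ok := c.Get("k"); !ok {
		t.Fatal("entry expired before its deadline")
	}

	time.Sleep(2 * time.Nanosecond) // step just past the deadline
	synctest.Wait()                 // let the cache's (hypothetical) expiry work finish
	if _, ok := c.Get("k"); ok {
		t.Fatal("entry still present after its deadline")
	}
})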

On the flipside, making a test able to execute inside a bubble may require using a fake network or other dependency, which isn't required for a non-bubbled test.

So I don't think that running the same test both inside and outside a bubble is going to be the usual case, and I don't think we should encourage authors to write tests that way.

In the event that you do want to do this, a helper function is trivial:

func testMaybeBubbled(t *testing.T, f func(*testing.T)) {
  t.Run("no bubble", f)
  t.Run("yes bubble", func(t *testing.T) {
    synctest.Test(t, f)
  })
}

@AndrewHarrisSPU

Almost any test which uses time is going to be slow and/or flaky outside a bubble.

Thanks - I found this persuasive, I was thinking of cases where synctest is just speeding up "slow", and not considering "flaky".

@apparentlymart

@neild your helper function example seems essentially the same as what I showed inline in my earlier comment, and so I think has the same problem if implemented with the current form of this proposal: the bubbled version probably needs to call synctest.Wait() at some point, but the non-bubbled version cannot call synctest.Wait() without panicking.

So it seems like it would need either a concession that synctest.Wait() is treated as a no-op instead of a panic when called outside of a bubble, or for the test author to make some extra effort to wrap synctest.Wait() in something that the test itself can turn into a no-op in the non-bubble case, such as:

func testMaybeBubbled(t *testing.T, f func(t *testing.T, waitIfBubble func())) {
    t.Run("no bubble", func (t *testing.T) {
        f(t, func () { /* no-op wait */ })
    })
    t.Run("bubble", func (t *testing.T) {
        f(t, synctest.Wait)
    })
}

...and then the given function would call this waitIfBubble parameter instead of calling synctest.Wait directly.

However, I do tend to agree with you that if you're testing something where it's beneficial to use synctest then it's presumably something that's very hard to test reliably or efficiently without synctest, and so that was what I was trying to get at with the question in my previous comment: is disabling calls to synctest.Wait in the non-bubble case enough for this to work, or are there other problems that would cause the two versions of the test to be too different for any sort of reuse to be worthwhile?

("where synctest is just speeding up slow" does seem like a plausible example where something like the above could work, but is that situation common enough to be worth any special support? It seems like probably not...)

@neild (Contributor, Author) commented May 8, 2025

The problem is that synctest.Wait is not a no-op. You can't take a test that uses it, turn the waits into no-ops, and have a working test.

done := false
go func() {
  done = true
}()
time.Sleep(someInterval)
synctest.Wait()
if !done { ... }

If you take out the synctest.Wait, this test contains a race condition. The race detector will correctly complain about it. If you make someInterval larger the race will become less likely to happen in practice, but it doesn't go away.

If you want to write a test that can execute with both fake and real clocks, then you need to avoid synctest.Wait entirely and rely on other synchronization mechanisms.
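
For illustration, a sketch of that style (a hypothetical snippet, not from the thread), replacing the Sleep/Wait pair with an explicit completion signal that behaves the same under fake or real time:

done := make(chan struct{})
go func() {
	defer close(done)
	// ... the work under test ...
}()
select {
case <-done:
	// the goroutine finished; safe to assert on its results
case <-time.After(10 * time.Second):
	t.Fatal("timed out waiting for background goroutine")
}

Inside a bubble the timeout elapses in fake time, so even the failure path is fast; outside a bubble it is a real ten-second bound.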

@glycerine

I extracted a small package containing my network/chaos simulator (all channels, no real network) from my larger RPC package, and made it more general by rounding it out with a proper net.Conn interface. If it might be useful for anyone doing synctest experiments, you can find it here https://github.com/glycerine/gosimnet . It can simulate network partitions, node failures, and node restarts. It is also kind of rough and ready, as I don't need the full net.Conn interface myself in rpc25519. So if you want to alpha test it, help me polish it by filing issues for any rough edges or incomplete implementation points, or especially anything that would make synctest testing better.

@glycerine commented May 9, 2025

Neil wrote,

I don't think that running the same test both inside and outside a bubble is going to be the usual case

I don't want to overstate the point, but just in case this is not obvious...

The reason I've been doing this as standard operating procedure is to test a different subset of event interleavings. Both synctest and non-synctest runs still only give me very partial coverage of all possible event interleavings, of course--for that you need a real model checker and its correspondingly smaller-than-real-world model--but so far both have been highly useful and both have helped me catch real bugs.

Synctest gives me interleavings that are hard to get with realtime, and its higher reproducibility (not completely reproducible, of course, since the goroutines that are not in synctest.Wait can become durably blocked in many orders) is a massive boon. Realtime, on the other hand, checks a wider but shallower set of interleavings. So I find both very valuable. I get better statistical "coverage", better sampling if you will, of possible interleavings. synctest.Wait gives me a "spike" where I know one goroutine always goes last, whereas realtime has more concurrent interleavings and I don't know that a given goroutine is "going last" in a time step.

edit: Interestingly, this suggests a sampling strategy to take multiple "spikes" through the event space that might get even better coverage: rotate the goroutine that calls synctest.Wait around deterministically; or even randomize it, varying who gets to "go last" by that goroutine being the one to call Wait.

The synctest.Wait call's barrier provides a chokepoint to restrict sampling to a particular event behavior subspace. There's no reason the chokepoint always has to end with the same goroutine, or even to require only one chokepoint in a given time quantum; you can have multiple Wait calls, possibly on different goroutines as above, before doing a time.Sleep.

Systematically exploring behavior subsets is the name of the game here. You'd just need to pass a token to avoid overlaps/confusion about who should be doing Wait.

I could not in good conscience recommend that a developer only evaluate their system on a single synctest spike of the behavior space.

@neild (Contributor, Author) commented May 9, 2025

Update: While implementing this proposal, I discovered to my embarrassment that I had forgotten how fuzz tests work. While fuzz tests operate on a *testing.F, fuzz targets use a *testing.T. I can think of no reason to use synctest outside of a fuzz target, so there is no need for a synctest.Fuzz function.

For example, to use synctest inside a fuzz test:

func Fuzz(f *testing.F) {
  f.Fuzz(func(t *testing.T, in []byte) {
    synctest.Test(t, func(t *testing.T) {
      // test here
    })
  })
}

That means the original version of the proposal stands:

package synctest
func Test(*testing.T, func(*testing.T))
func Wait()

@gopherbot (Contributor)

Change https://go.dev/cl/671961 mentions this issue: testing/synctest: add Test

@aclements (Member)

synctest.Wait gives me a "spike" where I know one goroutine always goes last, whereas realtime has more concurrent interleavings and I don't know that a given goroutine is "going last" in a time step.

This sounds like a bug to me. If anything, the scheduler ought to have strong randomization under synctest in order to avoid dependencies on particular schedules. @neild , does your synctest implementation inject randomness into the scheduler, or is it possible to get overly deterministic schedules?

@neild (Contributor, Author) commented May 12, 2025

The current implementation doesn't inject any scheduler randomness. I think the runtime does inject scheduling randomness when -race is enabled, though, regardless of the presence of synctest.

It's definitely possible to hide real race conditions when using synctest. For example, in a real program two goroutines started one nanosecond apart in time may schedule in arbitrary order. In a synctest bubble, the one started earlier in (fake) time will always schedule and run until it exits or blocks before the clock advances and the next goroutine starts. This is a problem with fake clocks in general: A fake clock lets you reliably test specific scenarios, but can hide races that only show up with the fuzziness of real time.
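
A toy illustration of that determinism (hypothetical, assuming the proposed Test API):

synctest.Test(t, func(t *testing.T) {
	events := make(chan string, 2)
	go func() {
		time.Sleep(2 * time.Nanosecond)
		events <- "late"
	}()
	go func() {
		time.Sleep(1 * time.Nanosecond)
		events <- "early"
	}()
	// Under the fake clock the 1ns timer always fires before the 2ns timer,
	// so this check never flakes; under a real clock the two sleeps could
	// complete in either order, and an ordering bug could surface.
	if first := <-events; first != "early" {
		t.Errorf("first event = %q, want %q", first, "early")
	}
})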

@aclements moved this to Incoming in Proposals May 13, 2025
@aclements (Member)

Based on the discussion above, this proposal seems like a likely accept.
— aclements for the proposal review group

See #67434.

@aclements moved this from Incoming to Likely Accept in Proposals May 13, 2025