Description
The errgroup package will currently spawn a new goroutine for each invocation of Group.Go. This is usually fine, but extremely high cardinality fanout can exhaust memory or other resources. It would be neat if the errgroup interface allowed users to specify the maximum number of concurrent goroutines they want the errgroup to spawn.
Proposal
type Group struct {
N int
// contains etc
}
N would be copied to an unexported field on the first invocation of Go, so that subsequent modification has no effect. This preserves the validity and the behavior of the empty Group.
When calling Go, if the number of functions running is > N then Go would block until the number was <= N.
The behavior of Go is not otherwise modified; if a subtask returns an error, then subsequent tasks will still be executed, and callers would rely on subtasks handling context cancellation to fall through to the Wait() call and then return, if WithContext was called.
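The semantics described above can be sketched with the standard library alone. This is a minimal illustration, not the errgroup implementation: the limitedGroup type, its unexported fields, and the runDemo helper are hypothetical names, and the sketch assumes the limit means that at most N functions run at once.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// limitedGroup is a hypothetical stand-in for the proposed Group.
// It is not the errgroup implementation.
type limitedGroup struct {
	N int // maximum number of functions running at once; 0 means no limit

	once    sync.Once
	sem     chan struct{} // unexported copy of the limit; nil means no limit
	wg      sync.WaitGroup
	errOnce sync.Once
	err     error
}

// Go blocks while N functions are already running, then runs f in a new goroutine.
func (g *limitedGroup) Go(f func() error) {
	// Read N only on the first call, so later changes to N have no effect.
	g.once.Do(func() {
		if g.N > 0 {
			g.sem = make(chan struct{}, g.N)
		}
	})
	if g.sem != nil {
		g.sem <- struct{}{} // blocks until a slot is free
	}
	g.wg.Add(1)
	go func() {
		defer g.wg.Done()
		if g.sem != nil {
			defer func() { <-g.sem }() // free the slot when f returns
		}
		if err := f(); err != nil {
			g.errOnce.Do(func() { g.err = err }) // keep only the first error
		}
	}()
}

// Wait blocks until every function has returned and reports the first error.
func (g *limitedGroup) Wait() error {
	g.wg.Wait()
	return g.err
}

// runDemo starts the given number of tasks under the limit. It returns the
// peak number of tasks running at once, how many tasks ran, and the group error.
func runDemo(limit, tasks int) (peak, ran int, err error) {
	var mu sync.Mutex
	running := 0
	g := &limitedGroup{N: limit}
	for i := 0; i < tasks; i++ {
		i := i
		g.Go(func() error {
			mu.Lock()
			running++
			ran++
			if running > peak {
				peak = running
			}
			mu.Unlock()
			defer func() {
				mu.Lock()
				running--
				mu.Unlock()
			}()
			if i == 3 {
				return errors.New("task 3 failed")
			}
			return nil
		})
	}
	err = g.Wait()
	return peak, ran, err
}

func main() {
	peak, ran, err := runDemo(2, 20)
	fmt.Println("peak running:", peak, "tasks run:", ran, "error:", err)
}
```

Note that tasks submitted after the failing one still run, which matches the statement that the behavior of Go is not otherwise modified.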
Alternatives considered
An alternative interface would be that Go never block, but enqueue instead. This is an unbounded queue and I'm not a fan.
Another alternative is that the group is context-aware, and that Go return immediately if the group's context is cancelled. This requires that Group retain a reference to the context, which it does not currently do.
Activity
mdlayher commented on Sep 25, 2018
/cc @bcmills who recently was thinking about some changes to this package IIRC
kevinburke commented on Sep 25, 2018
In the meantime I'd suggest using a buffered channel before calling group.Go() and releasing it when the function returns, or using a package like github.com/kevinburke/semaphore to acquire resources before starting a goroutine.
bcmills commented on Jan 10, 2019
There is a draft API in slide 119 (in the backup slides) of my GopherCon 2018 talk, Rethinking Classical Concurrency Patterns.
I agree that the Go method should block until it can begin executing the function, not enqueue: enqueuing tasks to a bounded executor is much too prone to deadlocks.
I propose a new TryGo method as a non-blocking alternative. (A non-blocking variant is mostly useful for “concurrency-saturating” operations like tree or graph traversals, where you want to keep the number of concurrent workers as high as possible but can fall back to sequential operation when saturated.)
I would rather have a SetLimit method than an exported field: that way we can more easily enforce invariants like “the limit must not be modified while goroutines are running”.
fatih commented on Jul 20, 2019
I also needed something similar and combined it with golang.org/x/sync/semaphore. Here is an example of how I'm using it. It limits the number of simultaneous executions based on the variable maxWorkers:
If anything in this approach is wrong, please let me know. It seems to work fine based on the debug statements.
alexaandru commented on Jul 25, 2019
@fatih I would personally put the Acquire() outside/in front of the goroutine. The way you have it, it does NOT prevent the launching of 50 simultaneous goroutines; it only prevents more than maxWorkers of them from actually doing their work at a time.
Look at it another way: if instead of 50 you had 1m, what your code does is launch 1m goroutines. Of them, maxWorkers goroutines will actually do the work (well, in this case sleep), while 1m - maxWorkers of them will ALL attempt to acquire the lock (that sits behind the semaphore abstraction).
All the best!
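The difference between the two placements can be measured with a small sketch. It is not the code from either comment: a buffered channel stands in for the semaphore, sync.WaitGroup stands in for the errgroup, and peakGoroutines and counter are hypothetical names. The helper counts each goroutine from just before its go statement until just before it exits.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// counter tracks how many goroutines exist at once and the peak seen.
type counter struct {
	mu          sync.Mutex
	alive, peak int
}

func (c *counter) add(delta int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.alive += delta
	if c.alive > c.peak {
		c.peak = c.alive
	}
}

// peakGoroutines runs total tasks with at most maxWorkers working at once and
// returns the largest number of task goroutines that existed at the same time.
// acquireInside selects whether the slot is taken inside the goroutine or
// before the goroutine is started.
func peakGoroutines(total, maxWorkers int, acquireInside bool) int {
	sem := make(chan struct{}, maxWorkers)
	var c counter
	var wg sync.WaitGroup
	for i := 0; i < total; i++ {
		if !acquireInside {
			sem <- struct{}{} // blocks the loop, so no goroutine is created yet
		}
		c.add(1) // the goroutine is about to exist
		wg.Add(1)
		go func() {
			defer wg.Done()
			if acquireInside {
				sem <- struct{}{} // the goroutine already exists and waits here
			}
			time.Sleep(time.Millisecond) // stand-in for real work
			c.add(-1)                    // the goroutine is about to exit
			<-sem                        // release the slot
		}()
	}
	wg.Wait()
	return c.peak
}

func main() {
	fmt.Println("acquire inside goroutine, peak goroutines:", peakGoroutines(50, 5, true))
	fmt.Println("acquire before goroutine, peak goroutines:", peakGoroutines(50, 5, false))
}
```

With the slot taken before the go statement, the peak can never exceed maxWorkers. With the slot taken inside the goroutine, the peak is bounded only by the total number of tasks, and on a typical run it comes close to that total.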
fatih commented on Jul 25, 2019
@alexaandru thanks for the tip! You're right about that. I've fixed that actually on my end (https://twitter.com/fatih/status/1152991683870633985 and https://play.golang.org/p/h2yfBVC8IjB) but I forgot to update it here.
alexaandru commented on Jul 25, 2019
You're most welcome @fatih ! Cheers! :)
tschaub commented on Apr 9, 2020
Another subtle issue that ideally would be solved by having an errgroup with a limit is that it is very easy to write code using errgroup and semaphore that swallows significant errors and instead returns only context.Canceled.
For example, it might be non-obvious that the work function below returns context.Canceled instead of errors.New("important message here"):
The code can be fixed with something like this, but it is easy to forget:
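The pitfall and one possible fix can be reproduced with a standard-library sketch. This is not tschaub's original code: miniGroup is a hypothetical stand-in for an errgroup created with WithContext, acquire is a hypothetical stand-in for a context-aware semaphore Acquire, and the waitOnAcquireError flag exists only to show both variants side by side.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// miniGroup is a hypothetical stand-in for an errgroup created with
// WithContext: the first error cancels the context and is returned by Wait.
type miniGroup struct {
	cancel  context.CancelFunc
	wg      sync.WaitGroup
	errOnce sync.Once
	err     error
}

func newMiniGroup(ctx context.Context) (*miniGroup, context.Context) {
	ctx, cancel := context.WithCancel(ctx)
	return &miniGroup{cancel: cancel}, ctx
}

func (g *miniGroup) Go(f func() error) {
	g.wg.Add(1)
	go func() {
		defer g.wg.Done()
		if err := f(); err != nil {
			g.errOnce.Do(func() {
				g.err = err
				g.cancel()
			})
		}
	}()
}

func (g *miniGroup) Wait() error {
	g.wg.Wait()
	g.cancel()
	return g.err
}

// acquire takes a slot from sem, or gives up with ctx.Err() once ctx is done.
func acquire(ctx context.Context, sem chan struct{}) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	select {
	case sem <- struct{}{}:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

var errImportant = errors.New("important message here")

// work starts tasks behind a semaphore; the first task fails and the others
// wait for cancellation. When waitOnAcquireError is false, a failed acquire
// is returned directly, which hides the task's error.
func work(ctx context.Context, waitOnAcquireError bool) error {
	g, ctx := newMiniGroup(ctx)
	sem := make(chan struct{}, 2)
	for i := 0; i < 100; i++ {
		if err := acquire(ctx, sem); err != nil {
			if waitOnAcquireError {
				break // fall through to Wait, which reports the task's error
			}
			return err // context.Canceled: the real cause is lost
		}
		i := i
		g.Go(func() error {
			defer func() { <-sem }()
			if i == 0 {
				return errImportant
			}
			<-ctx.Done()
			return ctx.Err()
		})
	}
	return g.Wait()
}

// outcome runs work and returns the text of the error it reports.
func outcome(waitOnAcquireError bool) string {
	return work(context.Background(), waitOnAcquireError).Error()
}

func main() {
	fmt.Println("return acquire error:", outcome(false))
	fmt.Println("break and Wait:      ", outcome(true))
}
```

The only difference between the two variants is whether the loop returns the acquire error or breaks and lets Wait report the first task error, which is why the fix is easy to forget.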
bcmills commented on Apr 10, 2020
@tschaub, note that in general anything that may produce an error as a result of errgroup cancellation should be run within the errgroup itself.
So that example would probably be clearer as:
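One way to read this suggestion is to move the acquire loop into a function run by the group, so that a cancellation error from the semaphore can never be returned ahead of the error that caused the cancellation. The following is a sketch under that assumption, not bcmills's original code: miniGroup and acquire are hypothetical standard-library stand-ins for an errgroup created with WithContext and a context-aware semaphore Acquire.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// miniGroup is a hypothetical stand-in for an errgroup created with
// WithContext: the first error cancels the context and is returned by Wait.
type miniGroup struct {
	cancel  context.CancelFunc
	wg      sync.WaitGroup
	errOnce sync.Once
	err     error
}

func newMiniGroup(ctx context.Context) (*miniGroup, context.Context) {
	ctx, cancel := context.WithCancel(ctx)
	return &miniGroup{cancel: cancel}, ctx
}

func (g *miniGroup) Go(f func() error) {
	g.wg.Add(1)
	go func() {
		defer g.wg.Done()
		if err := f(); err != nil {
			g.errOnce.Do(func() {
				g.err = err
				g.cancel()
			})
		}
	}()
}

func (g *miniGroup) Wait() error {
	g.wg.Wait()
	g.cancel()
	return g.err
}

// acquire takes a slot from sem, or gives up with ctx.Err() once ctx is done.
func acquire(ctx context.Context, sem chan struct{}) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	select {
	case sem <- struct{}{}:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

var errImportant = errors.New("important message here")

// work runs the acquire loop inside the group. The loop may return
// context.Canceled, but the group has already recorded the first task error.
func work(ctx context.Context) error {
	g, ctx := newMiniGroup(ctx)
	g.Go(func() error {
		sem := make(chan struct{}, 2)
		for i := 0; i < 100; i++ {
			if err := acquire(ctx, sem); err != nil {
				return err // ignored by the group: an earlier error is kept
			}
			i := i
			g.Go(func() error {
				defer func() { <-sem }()
				if i == 0 {
					return errImportant
				}
				<-ctx.Done()
				return ctx.Err()
			})
		}
		return nil
	})
	return g.Wait()
}

// outcome runs work and returns the text of the error it reports.
func outcome() string {
	return work(context.Background()).Error()
}

func main() {
	fmt.Println("work returned:", outcome())
}
```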
smasher164 commented on May 25, 2020
We came across this use-case today, and used a semaphore channel instead of x/sync/semaphore. But since context is heavily threaded through, we'll probably switch to using x/sync/semaphore.
Regarding the proposed API, SetLimit makes sense with the existing errgroup API, but TryGo always succeeds when there is no limit. Would there be a clearer separation with a LimitGroup type, which is instantiated with WithContextLimit?
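The interaction between SetLimit and TryGo raised here can be illustrated with a small sketch. The method names come from the comments in this thread, but the group type and the semantics shown are assumptions, and error handling is left out to keep the example short.

```go
package main

import (
	"fmt"
	"sync"
)

// group is a hypothetical sketch of the API discussed in this thread.
type group struct {
	sem chan struct{} // nil means no limit
	wg  sync.WaitGroup
}

// SetLimit caps the number of concurrently running functions at n.
// It must be called before any call to Go or TryGo.
func (g *group) SetLimit(n int) { g.sem = make(chan struct{}, n) }

// run starts f in a new goroutine and frees its slot, if any, when f returns.
func (g *group) run(f func()) {
	g.wg.Add(1)
	go func() {
		defer g.wg.Done()
		if g.sem != nil {
			defer func() { <-g.sem }()
		}
		f()
	}()
}

// Go blocks until a slot is free, then runs f in a new goroutine.
func (g *group) Go(f func()) {
	if g.sem != nil {
		g.sem <- struct{}{}
	}
	g.run(f)
}

// TryGo runs f in a new goroutine only if a slot is free right now.
// Without a limit there is nothing to wait for, so it always succeeds.
func (g *group) TryGo(f func()) bool {
	if g.sem != nil {
		select {
		case g.sem <- struct{}{}:
		default:
			return false
		}
	}
	g.run(f)
	return true
}

// Wait blocks until every started function has returned.
func (g *group) Wait() { g.wg.Wait() }

// tryGoWhileBusy starts one long-running task, then reports whether TryGo
// accepts a second one. A limit of 0 means SetLimit is never called.
func tryGoWhileBusy(limit int) bool {
	var g group
	if limit > 0 {
		g.SetLimit(limit)
	}
	block := make(chan struct{})
	g.Go(func() { <-block })
	ok := g.TryGo(func() {})
	close(block)
	g.Wait()
	return ok
}

func main() {
	fmt.Println("no limit:", tryGoWhileBusy(0)) // true
	fmt.Println("limit 1: ", tryGoWhileBusy(1)) // false
	fmt.Println("limit 2: ", tryGoWhileBusy(2)) // true
}
```

In this sketch, a caller that never sets a limit cannot distinguish TryGo from Go, which is the overlap the question about a separate LimitGroup type points at.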