-
Notifications
You must be signed in to change notification settings - Fork 123
sweepbatcher: fix race conditions in unit tests #889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d319d72
to
565cecd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fixes! Looks good, just a few questions.
|
||
// Check if the sweep is already in a batch. If that is the case, we | ||
// provide the sweep to that batch and return. | ||
for _, batch := range b.batches { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, now I got it :)
@@ -412,9 +413,22 @@ func NewBatchFromDB(cfg batchConfig, bk batchKit) (*batch, error) { | |||
publishErrorHandler: bk.publishErrorHandler, | |||
purger: bk.purger, | |||
store: bk.store, | |||
log: bk.log, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate why this caused a race?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some tests replace the logger with wrappedLogger
(defined in sweep_batcher_test.go) to wait for a particular log message.
// This is used only in unit tests! The reason to have this is to | ||
// avoid data races in require.Eventually calls running in parallel | ||
// to the event loop. See method testRunInEventLoop(). | ||
testReqs chan *testRequest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we could avoid this by instead using checkpoints in within mocks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the checks that use this feature access batcher.batches
and fields of a batch
. The fields (e.g. fee rate, the list of sweeps) are updated in the main code, not in the mocks.
} | ||
|
||
// UseLogger uses a specified Logger to output package logging info. | ||
// This should be used in preference to SetLogWriter if the caller is also | ||
// using btclog. | ||
func UseLogger(logger btclog.Logger) { | ||
log = logger | ||
log_.Store(&logger) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we only really call UseLogger
once it should not race iiuc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We call it in runTests
function in each test. It increases logging to include debug messages. I caught a race between tests: in one test a sweepbatcher was shutting down and logging something, while another test was calling UseLogger
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🎉
sweepbatcher/sweep_batch.go
Outdated
@@ -430,7 +444,7 @@ func (b *batch) addSweep(ctx context.Context, sweep *sweep) (bool, error) { | |||
// If the provided sweep is nil, we can't proceed with any checks, so | |||
// we just return early. | |||
if sweep == nil { | |||
b.log.Infof("the sweep is nil") | |||
b.log().Infof("the sweep is nil") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid the extra step of calling b.log()
we could create log wrappers in the form of:
func (b *batch) Infof(format string, args ...interface{}) {
b.log().Infof(...)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks!
sweepbatcher/sweep_batcher_test.go
Outdated
// When batch is successfully created it will execute it's first step, | ||
// which leads to a spend monitor of the primary sweep. | ||
<-lnd.RegisterSpendChannel | ||
|
||
// Eventually request will be consumed and a new batch will spin up. | ||
require.Eventually(t, func() bool { | ||
return len(getBatches(ctx, batcher)) == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively we could add a Batcher.NumBatches()
and a batch.NumSweeps()
so most of these repeating checks are easier to write.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could also add a batch.Sweeps()
which would return the swap hashes for the sweeps. So then we could completely remove the intrusive check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added private methods Batcher.numBatches(ctx)
, batch.numSweeps(ctx)
and batch.snapshot(ctx)
and managed to get rid of the majority of testRunInEventLoop
calls.
sweepbatcher/sweep_batcher.go
Outdated
@@ -535,13 +535,15 @@ func (b *Batcher) Run(ctx context.Context) error { | |||
case sweepReq := <-b.sweepReqs: | |||
sweep, err := b.fetchSweep(runCtx, sweepReq) | |||
if err != nil { | |||
log.Warnf("fetchSweep failed: %v.", err) | |||
log().Warnf("fetchSweep failed: %v.", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as before, although there it's just one function call, but the wrapped log might be a bit nicer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added private functions debuf
, infof
, warnf
, and errorf
.
@@ -20,7 +20,7 @@ env: | |||
|
|||
# If you change this value, please change it in the following files as well: | |||
# /Dockerfile | |||
GO_VERSION: 1.21.10 | |||
GO_VERSION: 1.24.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
48b25bd
to
51196e7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM ✅ Thanks for the fixes, only a few nits.
sweepbatcher/sweep_batcher.go
Outdated
for _, batch := range b.batches { | ||
// This is a check to see if a batch is completed. In that case | ||
// we just lazily delete it and continue our scan. | ||
if batch.isComplete() { | ||
delete(b.batches, batch.id) | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can remove the continue
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Thanks!
|
||
// Check if the sweep is already in a batch. If that is the case, we | ||
// provide the sweep to that batch and return. | ||
for _, batch := range b.batches { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, now I got it :)
b.log.Warnf("Method timeout() returned %v. Number of"+ | ||
" sweeps: %d. It may be an empty batch.", | ||
b.Warnf("Method timeout() returned %v. Number of "+ | ||
"sweeps: %d. It may be an empty batch.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls clarify, is there another option to being an empty batch here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the following comment:
if timeout <= 0 {
+ // This may happen if the batch is empty or if SweepInfo.Timeout
+ // is not set, may be possible in tests or if there is a bug.
b.Warnf("Method timeout() returned %v. Number of "+
"sweeps: %d. It may be an empty batch.",
"batchConfTarget") | ||
} | ||
|
||
b.log.Infof("initializing rbf fee rate for conf target=%v", | ||
b.Infof("initializing rbf fee rate for conf target=%v", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: newline after this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
sweepbatcher/sweep_batcher_test.go
Outdated
return false | ||
} | ||
|
||
// Make sure the batch has proper batchConfTarget. | ||
return batch.cfg.batchConfTarget == 123 | ||
return snapshot.cfg.batchConfTarget == 123 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
place 123 into var...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
sweepbatcher/sweep_batcher_test.go
Outdated
// Wait for tx to be published. | ||
<-lnd.TxPublishChannel | ||
batch := getOnlyBatch(t, ctx, batcher).snapshot(ctx) | ||
require.Equal(t, int32(123), batch.cfg.batchConfTarget) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...and use it here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
sweepbatcher/sweep_batcher_test.go
Outdated
return false | ||
} | ||
|
||
// Make sure the batch has proper batchConfTarget. | ||
return batch.cfg.batchConfTarget == 123 | ||
return snapshot.cfg.batchConfTarget == 123 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... and here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
sweepbatcher/sweep_batcher_test.go
Outdated
require.Equal(t, feeRateMedium, batch.rbfCache.FeeRate) | ||
batch := getOnlyBatch(t, ctx, batcher) | ||
snapshot := batch.snapshot(ctx) | ||
require.Equal(t, 1, len(snapshot.sweeps)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
require.Len
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
sweepbatcher/sweep_batcher_test.go
Outdated
require.Len(t, batch.sweeps, 2) | ||
require.Equal(t, feeRateMedium, batch.rbfCache.FeeRate) | ||
snapshot = batch.snapshot(ctx) | ||
require.Equal(t, 2, len(snapshot.sweeps)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
require.Len
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
It should use the c variable passed into the lambda, not the parent t. It should use assert, not require package.
Previously, if a completed batch was visited after a batch to which the sweep was added, it was not deleted because the function returned early. This has been separated into two loops: the first one removes completed batches, and the second one adds the sweep to a batch.
This is needed to fix crashes in unit tests under -race.
Several structures were accessed without protection causing crashes under -race.
The code inside require.Eventually runs in parallel with the event loops of the batcher and its batches. Accessing fields of the batcher and batches must be done within an event loop. To address this, testRunInEventLoop methods were added to the Batcher and batch types. Unit tests were then rewritten to use this approach when accessing batcher and batch fields. Additionally, in many cases, receive operations from RegisterSpendChannel were moved before require.Eventually. This prevents testRunInEventLoop from getting stuck in an event loop while blocked on a RegisterSpendChannel send operation.
Try to catch more race conditions in unit tests.
51196e7
to
60f4906
Compare
Go 1.24 apparently improved its race detector, so regular
go test -race
started discovering all of these. Now it passes. Some changes were needed not only in unit tests to make accesses of batcher and batch fields thread safe.Pull Request Checklist
release_notes.md
if your PR contains major features, breaking changes or bugfixes