[peer_handler] Take the peers lock before getting messages to send #891
Conversation
Previously, if a user simultaneously called `PeerHandler::process_events()` from two threads, we'd race, which ended up sending messages out of order in the real world. Specifically, we first called `get_and_clear_pending_msg_events`, then took the `peers` lock and pushed the messages we got into the sending queue. Two threads may each get some set of messages to send, but then race each other into the `peers` lock and send the messages in random order. Because we already hold the `peers` lock when calling most message handler functions, we can simply take the lock before calling `get_and_clear_pending_msg_events`, solving the race.
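In rough terms, the fix just reorders two steps inside `process_events`. The sketch below is a minimal illustration, not the actual rust-lightning code: `MsgEvent` and `Peers` are hypothetical stand-ins for the real internal types, and only the lock ordering is the point.

```rust
use std::sync::Mutex;

// Hypothetical, simplified stand-ins for rust-lightning's internal types.
struct MsgEvent;
struct Peers; // per-peer send queues live in here

struct PeerHandler {
    peers: Mutex<Peers>,
    pending_events: Mutex<Vec<MsgEvent>>,
}

impl PeerHandler {
    fn get_and_clear_pending_msg_events(&self) -> Vec<MsgEvent> {
        std::mem::take(&mut *self.pending_events.lock().unwrap())
    }

    // Buggy order: drain the event queue first, then take `peers`. Two
    // threads can each drain a batch of events, then race into the `peers`
    // lock and enqueue their batches in either order.
    fn process_events_racy(&self) {
        let events = self.get_and_clear_pending_msg_events();
        let _peers = self.peers.lock().unwrap();
        for _ev in events { /* push into the relevant peer's send queue */ }
    }

    // Fixed order: take the `peers` lock first, so whichever thread drains
    // the event queue is also the thread that enqueues those messages.
    fn process_events(&self) {
        let _peers = self.peers.lock().unwrap();
        let events = self.get_and_clear_pending_msg_events();
        for _ev in events { /* push into the relevant peer's send queue */ }
    }
}
```

Holding `peers` across the drain serializes the drain-then-enqueue sequence as a unit, which is what preserves message order.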
While trying to debug the issue ultimately tracked down to a `PeerHandler` locking bug in lightningdevkit#891, the ability to deliver individual messages one at a time in chanmon_consistency looked important. Specifically, it initially appeared there might be a race when an update_add_htlc was delivered, then a node sent a payment, and only after that, the corresponding commitment_signed was delivered. This commit adds such an ability (sketched below), greatly expanding the potential for chanmon_consistency to identify channel state machine bugs.
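A rough sketch of what single-message delivery could look like in a fuzz harness. The names (`TestHarness`, `deliver_up_to`) are hypothetical and not the actual chanmon_consistency API; the idea is only that the harness drains at most `limit` messages per step instead of flushing the whole queue.

```rust
use std::collections::VecDeque;

// Hypothetical message type standing in for the wire messages the fuzzer queues.
struct Msg;

struct TestHarness {
    queue: VecDeque<Msg>,
}

impl TestHarness {
    // Deliver at most `limit` queued messages, leaving the rest queued so the
    // fuzzer can interleave other actions (e.g. initiating a payment) between
    // an update_add_htlc and its corresponding commitment_signed.
    fn deliver_up_to(&mut self, limit: usize) {
        for _ in 0..limit {
            match self.queue.pop_front() {
                Some(_msg) => {
                    // hand _msg to the receiving node's channel message handler
                }
                None => break,
            }
        }
    }
}
```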
Codecov Report
@@            Coverage Diff             @@
##             main     #891      +/-   ##
==========================================
- Coverage   90.30%   90.29%   -0.01%
==========================================
  Files          57       57
  Lines       29225    29225
==========================================
- Hits        26392    26390       -2
- Misses       2833     2835       +2
Continue to review full report at Codecov.
utACK. If I understand correctly, read handling locks this in
Fixes #888