Implement KeyError for bad key in channel announcement process #99
Conversation
src/ln/channelmanager.rs (outdated)
	short_to_id.remove(&channel.get_short_channel_id().unwrap());
	return false;
} else {
	return true;
What do we do with a failure due to not yet having received FundingLocked from the remote peer? Do we need to scan the block again, and so flag it somewhere? This is the error case from get_channel_announcement.
I'm not sure KeyError is really an "action"? It almost certainly indicates a very broken client that we can't really do anything with, and we should just prefer to Disconnect them.
Yes, in fact KeyError isn't an "action" to initiate a reaction but more an internal signal. It's the same problem we already have with other error-management commits like #43. We don't want to enumerate every error case, but sometimes we want to take different actions by filtering on the kind of error. Here, in block_connected, it seems we can have failures caused 1/ by bad key material, or 2/ by not yet having received FundingLocked from the remote peer, the channel then not being in a ready state. How do we sort between the two?
I mean, if we have more granular ErrorActions then we just need a mapping in peer_handler from granular events to handling many of them in the same way. It seems to me that the place where we know the most about how we should be handling an error is the place we detect it, and thus we should make the decision of what we want to do at that time. If we want more options beyond Disconnect (maybe BanTheEvilPeer?) I think it's helpful to keep them as "actions", not "descriptions of the events" (those come in the ASCII debug string).
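A rough sketch of what such granular "actions decided at the detection site" could look like (the variant and field names here are illustrative stand-ins, not the crate's actual definitions):

// Illustrative sketch: the error carries an "action" chosen where the
// error is detected; peer_handler only needs to map a small set of
// actions onto concrete reactions.
pub enum ErrorAction {
	DisconnectPeer,   // drop the connection to the misbehaving peer
	BanTheEvilPeer,   // hypothetical stronger reaction floated above
	IgnoreError,      // nothing to do beyond logging
}

pub struct HandleError {
	pub err: &'static str,           // human-readable description, for logs only
	pub action: Option<ErrorAction>, // what the detection site wants done
}

// peer_handler-side mapping: many granular actions, few concrete reactions.
pub fn react(action: &ErrorAction) {
	match action {
		ErrorAction::DisconnectPeer | ErrorAction::BanTheEvilPeer => {
			// disconnect (and, hypothetically, record the ban)
		}
		ErrorAction::IgnoreError => {
			// just log; the descriptive text travels in HandleError::err
		}
	}
}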
Ok, so ideally the unwrapper of the error (here block_connected) shouldn't be aware of error types and should just cast an ErrorAction to an Event, but in this case the event's fields need to be filled in at error-detection time. Why not have the action field of HandleError be an Event directly? Other ways?
Ah, yea, I hit the same conundrum in https://github.com/rust-bitcoin/rust-lightning/pull/98/files#diff-d152f468921ba1eb6da8dac49a90a97aR1160. I started down the road of having a converter that converts ErrorActions into Events (which I think is probably the right approach, or maybe just have an Event type that holds a node_id and an ErrorAction), but sadly UpdateFailHTLC doesn't map cleanly, as we have an asymmetry between the fail_backwards version (where we include a commitment) and the handle_update_add_htlc one where we don't. Reading it more closely now, I realize this is a bug, so things should be symmetric. Seems to me moving towards having an Event that holds ErrorActions makes sense.
So an Event { node_id, error_action } passed to the peer_handler, which will be in charge of handling all the error actions. Hmm, I'll sketch out a PR on it tomorrow to get a clearer picture of this design.
Yea, should be easy: just an Event::HandleError { node_id, error_action }, and change the UpdateFailHTLC stuff to have an Option (for now, the bug needs fixing, but I think it'll end up needing that either way).
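As a rough sketch of that shape, reusing the ErrorAction enum sketched above (the node_id type and exact fields are assumptions for illustration):

// Illustrative sketch of an Event variant carrying the error action, so
// that peer_handler performs the reaction instead of the detection site.
use secp256k1::key::PublicKey; // import path varies across rust-secp256k1 versions

pub enum Event {
	// ... other events elided ...
	HandleError {
		node_id: PublicKey,          // which peer the action applies to
		action: Option<ErrorAction>, // e.g. disconnect, as sketched above
	},
}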
Note that I'm hoping most of the key errors can go away with the new rust-secp.
Cool, waiting for it; if it can get rid of all key errors, please close this.
Hmm, well, it didn't go away as much as I'd liked, so it may be time to revisit this. Though I'm still skeptical KeyError makes sense as an "action", it may be nice to convert the key errors into channel-failures or disconnects depending on the context.
Ok, I'm going to take a new look at errors in channel_manager, without KeyError but action-based, to lighten downstream handling.
Force-pushed cbeff7e to e46b876
Force-pushed e46b876 to 23468e0
@@ -1370,7 +1370,7 @@ impl ChainListener for ChannelManager {
 			Ok(res) => res,
 			Err(e) => {
 				log_error!(self, "Got error handling message: {}!", e.err);
-				//TODO: push e on events and blow up the channel (it has bad keys)
+				//TODO: handle WaitNewState or IgnoreMessage
We don't need KeyError anymore here! We just need to fill the downstream HandleError with some WaitNewState (waiting for ChannelState::ChannelFunded) or similar.
I don't think WaitNewState makes sense - maybe there's some confusion here, but if you see a "bad key" from something the remote generated, they are clearly misbehaving - if you could randomly hit bad keys by accident it'd be a sign that the underlying ECC crypto were broken, so we should treat it as any other clearly-misbehaving node.
I'm not targeting the "bad key" error here, but rather the "Cannot get a ChannelAnnouncement until the channel funding has been locked" one from get_channel_announcement, which can be returned upstream in block_connected. How should we handle this kind of case?
Ahh, I missed the context; in any case, we can simplify this further, see #133.
let failure = chan.get_mut().force_shutdown();
short_to_id.remove(&chan.get_mut().get_short_channel_id().unwrap());
self.finish_force_close_channel(failure);
return Err(HandleError{err: e.err, action: e.action});
Here the errors are bad keys or otherwise strange peer behavior, so we close and disconnect. But pushing an event here is pretty messy with the channel and events locks, so we pass the error back upstream (ugly, I admit; still thinking about another way).
We're gonna need to handle this in a number of places, so it may make sense to start thinking about splitting out the "Channel returned an Err, maybe call force_close and then return" function and wrapping all the handle_MSG functions so that we can handle things appropriately.
Having a handle_channel_msg function in channel_manager which passes the message along and manages errors (removes the channel and calls force_close)? Okay, but it's going to have to take the lock again that some handle_MSG just dropped...
I was thinking more that the ChannelMessageHandler impl for ChannelManager would mostly consist of wrapper functions that call into "real" handler functions and then, on error, force_close the channel. The ChannelMessageHandler requirements say that we'll never have parallel message handler calls for the same peer, so locking should be fine, in that we'll handle the error in order for the given peer.
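A rough, self-contained sketch of that wrapper shape (all names and signatures here are stand-ins, not the crate's actual types):

// Illustrative only: the public handler is a thin wrapper around a "real"
// internal handler; on error it force-closes the channel, then propagates
// the error upstream.
struct HandleError { err: &'static str }
struct FundingLocked;  // stand-in for msgs::FundingLocked
struct PublicKey;      // stand-in for the peer's node id
struct ChannelManager;

impl ChannelManager {
	// The "real" handler: does the work, returns Err on misbehaviour.
	fn internal_funding_locked(&self, _their_node_id: &PublicKey, _msg: &FundingLocked)
		-> Result<(), HandleError> {
		Ok(())
	}

	// Hypothetical helper: remove the channel entry and force-close it.
	// Safe to take the channel lock here because message handler calls
	// for a given peer are never made in parallel.
	fn force_close_on_error(&self, _their_node_id: &PublicKey, _e: &HandleError) {
	}

	// The ChannelMessageHandler-facing wrapper.
	fn handle_funding_locked(&self, their_node_id: &PublicKey, msg: &FundingLocked)
		-> Result<(), HandleError> {
		self.internal_funding_locked(their_node_id, msg).map_err(|e| {
			self.force_close_on_error(their_node_id, &e);
			e
		})
	}
}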
For me, there is only one case of key error handling left in
I think all the bits and bobs here have been pulled in via some other PR at this point?
Yes, #160 at least is taking over the follow-up of these issues.
Don't panic on tx broadcast failures
Implement a KeyError for the case in which we have a secp failure. In this case of block_connected, the error is internal, so we disconnect the remote peer to avoid further interactions with defective key material (maybe add a message like "Internal bad keys management", but that could invite exploitation if we share other channels, so staying mute seems better).
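As a minimal sketch of the intent described above (stand-in types mirroring the error shape discussed in the conversation; the crate's real definitions differ):

// Sketch: a secp256k1 failure while preparing the channel announcement is
// mapped to "disconnect, and say nothing descriptive on the wire", since a
// detailed message about broken key material could hint at something
// exploitable on other shared channels.
enum ErrorAction { DisconnectPeer, IgnoreError }
struct HandleError { err: &'static str, action: Option<ErrorAction> }

fn key_error_to_handle_error() -> HandleError {
	HandleError {
		err: "Secp failure while announcing channel", // local log text only
		action: Some(ErrorAction::DisconnectPeer),    // stay mute toward the peer
	}
}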