-
Notifications
You must be signed in to change notification settings - Fork 407
Cleanup logging #965
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleanup logging #965
Conversation
These patches have been merged upstream and are in releases now so we don't need to patch them locally.
When a peer sends us the routing graph, it may include gossip messages for our channels, despite it not being a party to them. This is completely fine, but we currently print a somewhat-scary looking log messages in these cases, eg: ``` ERROR [lightning::ln::channelmanager:4104] Got a message for a channel from the wrong node! TRACE [lightning::ln::peer_handler:1267] Handling SendErrorMessage HandleError event in peer_handler for node ... with message Got a message for a channel from the wrong node! ``` Instead, we should simply not consider this an "error" condition and stay silent.
f83fa1a
to
4a87105
Compare
Codecov Report
@@ Coverage Diff @@
## main #965 +/- ##
==========================================
+ Coverage 90.66% 91.75% +1.08%
==========================================
Files 60 60
Lines 30407 34788 +4381
==========================================
+ Hits 27568 31919 +4351
- Misses 2839 2869 +30
Continue to review full report at Codecov.
|
4a87105
to
edd0da5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this cleanup :)
// a scary-looking error message and return Ok instead. | ||
return Ok(()); | ||
} | ||
return Err(MsgHandleErrInternal::send_err_msg_no_close("Got a channel_update for a channel from the wrong node - it shouldn't know about our private channels!".to_owned(), chan_id)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though #949 blurs this assertion, a direct peer might learn a channel_update for outbound-from-us payments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#949 should only be sending the message directly to the channel counterparty, so I think its still correct. Its true we do tell other nodes about our private channels in invoices, but they still shouldn't remember that, nor should they ever have channel_updates for those messages. Do you have a suggestion for a different concrete wording?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your comment is right if this peer is applying the same gossip policy than us ? Some weird LN client could not sort discovered gossips according to their sources and treat them uniformly, I don't think that's something the specification is covering (and IMHO it should be strongly opinionated on this).
So as a suggestion maybe prefix "Gossip policy: Got a channel_update for a channel from the wrong node - we consider it shouldn't know about this private channels!" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The specification is clear that you should not accept or forward a channel_update
unless you've received the corresponding channel_announcement
, which we will never generate signatures for, so it cannot exist.
log_error!(args.logger, " The ChannelMonitor for channel {} is at update_id {} but the ChannelManager is at update_id {}.", | ||
log_bytes!(channel.channel_id()), monitor.get_latest_update_id(), channel.get_latest_monitor_update_id()); | ||
log_error!(args.logger, " The chain::Watch API *requires* that monitors are persisted durably before returning,"); | ||
log_error!(args.logger, " client applications must ensure that ChannelMonitor data is always available and the latest to avoid funds loss!"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add another line "You can search for a ChannelMonitor update_id matching the ChannelManager", otherwise what the user can do ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, isn't that included in the line two lines up The ChannelMonitor for channel {} is at update_id {} but the ChannelManager is at update_id {}.
? I don't really think there is anything the user can do here, the application developer has a critical bug that may result/has resulted in funds loss, that bug needs to be resolved, and then maybe we can chat with them about how to deal directly with ChannelMonitor
s to close channels and create a new ChannelManager
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right the update_id
is already yielded. Maybe we could have a conventional message "This is a LDK API implementation bug : you can submit a bug report but ultimately solving the issue is on your side" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of #976 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to me
lightning/src/ln/msgs.rs
Outdated
@@ -689,6 +689,9 @@ pub enum ErrorAction { | |||
}, | |||
/// The peer did something harmless that we weren't able to process, just log and ignore | |||
IgnoreError, | |||
/// The peer did something harmless (and expected) that we weren't able to meaningfully | |||
/// process. Simply ignore. | |||
IgnoreCommonError, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about HarmlessError/MinorError
, a bit more meaningful w.r.t to IgnoreError ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, part of what I wanted to capture is that it is "common" in that it appears regularly. Ultimately that's the real difference between log_trace
and log_debug
- debug should still be readable, but trace you probably need to grep to get anything useful :). Maybe drop the Ignore
part and just do ExpectedError
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type indicates an action, though, so the naming is less meaningful without Ignore
. Maybe use a single variant IgnoreAndLog
with a severity parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like that idea, will you kill me if I add a second type - there's a ton of Ignore
references in the codebase and updating all of them in this PR seems like it would be a bit large. We can migrate to the explicit variant over time and then clean them all up when the repo is a bit quieter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice cleanup! Couldn't find any swapped args or anything
lightning/src/ln/msgs.rs
Outdated
@@ -689,6 +689,9 @@ pub enum ErrorAction { | |||
}, | |||
/// The peer did something harmless that we weren't able to process, just log and ignore | |||
IgnoreError, | |||
/// The peer did something harmless (and expected) that we weren't able to meaningfully | |||
/// process. Simply ignore. | |||
IgnoreCommonError, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type indicates an action, though, so the naming is less meaningful without Ignore
. Maybe use a single variant IgnoreAndLog
with a severity parameter?
log_trace!(logger, "Checking funding_created tx signature {} by key {} against tx {} (sighash {}) with redeemscript {} for channel {}.", | ||
log_bytes!(sig.serialize_compact()[..]), log_bytes!(self.counterparty_funding_pubkey().serialize()), | ||
encode::serialize_hex(&initial_commitment_bitcoin_tx.transaction), log_bytes!(sighash[..]), | ||
encode::serialize_hex(&funding_script), log_bytes!(self.channel_id())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some of these longer messages, I wonder if they would be easier to read if we logged a struct containing references to the data that implements Debug
and used{:#?}
to break across lines.
More thinking out loud. Feel free to ignore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, good question! I wasn't aware that was a thing, but I'm a bit torn on it. I agree its pretty unreadable right now, but at the same time logging is an API not strictly a user-interface thing, so abstractly it feels weird to have linebreaks. More concretely, I currently grep the sample node logs pretty extensively to filter out verbose stuff, which moving towards inserting linebreaks would break. TRACE-level logging isn't really intended for direct human processing without some filtering, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, not sure how well supported this would be, but a log entry could theoretically contain new lines and the logger could write each entry separated by record separator control characters. Other loggers may log to a database or key-value store.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still feel like that violates the principle of least-surprise - Users shouldn't be surprised that they have to do extra work to make our log entries "work" when writing to a file (the "common" use case IMO).
bd6b560
to
15cc91a
Compare
Addressed all feedback, I think. |
15cc91a
to
29991ac
Compare
lightning/src/ln/peer_handler.rs
Outdated
msgs::ErrorAction::IgnoreCommonError => { | ||
log_trace!(self.logger, "Error handling message; ignoring: {}", e.err); | ||
msgs::ErrorAction::IgnoreAndLog(level) => { | ||
log_given_level!(self.logger, cmp::min(Level::Debug, level), "Error handling message; ignoring: {}", e.err); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the cmp::min
used? Would it make sense to write a custom macro for this so it doesn't need to be repeated elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I thought the "minimum level" there was pretty readable. I could move the cmp::min
into the macro, but it felt readable and this is slightly more generic because it would let you do some match or if statement that sets the level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To rephrase my first question, why not just use level
here? I don't quite understand why we want to use a lower level if IgnoreAndLog
gives, say, Warn
.
Hmm, I thought the "minimum level" there was pretty readable. I could move the
cmp::min
into the macro, but it felt readable and this is slightly more generic because it would let you do some match or if statement that sets the level?
I was saying to leave log_given_level
as is and add another macro like log_debug_or_lower
(or similarly named) to avoid a lot of boilerplate if this pattern will be common.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To rephrase my first question, why not just use level here? I don't quite understand why we want to use a lower level if IgnoreAndLog gives, say, Warn.
Hmm, right, I guess we could? Its a bit strange to "warn" the user for an error that we "ignored", but I can just drop it.
I was saying to leave log_given_level as is and add another macro like log_debug_or_lower (or similarly named) to avoid a lot of boilerplate if this pattern will be common.
I mean we only handle the ErrorAction
stuff in two places... :)
29991ac
to
7043607
Compare
7043607
to
c37e76d
Compare
@@ -2028,7 +2043,7 @@ impl<Signer: Sign> Channel<Signer> { | |||
if pending_remote_value_msat - msg.amount_msat - chan_reserve_msat < remote_fee_cost_incl_stuck_buffer_msat { | |||
// Note that if the pending_forward_status is not updated here, then it's because we're already failing | |||
// the HTLC, i.e. its status is already set to failing. | |||
log_info!(logger, "Attempting to fail HTLC due to fee spike buffer violation"); | |||
log_info!(logger, "Attempting to fail HTLC due to fee spike buffer violation in channel {}. Rebalancing is required.", log_bytes!(self.channel_id())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal but wish we could point people somewhere for expansion on "rebalancing is required" 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll get there eventually :) #959
We had a client application which provided inconsistent monitor state when deserializing a ChannelManager, resulting in opaque and generic "InvalidData" deserialization failures. Instead, we log some informative (and appropriately scary) warning messages in such cases.
This makes our logging consistent and somewhat simplifies message sending code in a few places.
This much more consistently logs information about messages sent/received, including logging the full messages being sent/received at the TRACE log level. Many other log messages which are more often of interest were moved to the DEBUG log level.
For log entries which may have a variable level, we can't call an arbitrary macro and need to be able to pass an explicit level. This does so without breaking the compile-time disabling of certain log levels. Further, we "fix" the comparison order of log levels to make more significant levels sort "higher", which implicitly makes more sense than sorting "lower". Finally, we remove the "Off" log level as no log entry should ever be logged at the "Off" level - that would be nonsensical.
We very often receive duplicate gossip messages, which now causes us to log at the DEBUG level, which is almost certainly not what a user wants. Instead, we add a new form of ErrorAction which causes us to only log at the TRACE level.
This updates a number of log sites in channel and channelmanager to * Be a bit more verbose at the TRACE level, * Move some error/useful messages to the ERROR/WARN/INFO level, * Add new logs to always log once at the DEBUG level when we send/receive a commitment_signed (with some extra data), * Include the channel id being operated on in more log messages.
ChannelMonitor and related log entries can generally lean towards being higher log levels than they necessarily need to be, as they should be exceedingly rare, if only because they require confirmation of an on-chain transaction.
c37e76d
to
6d98aed
Compare
Squashed without changes:
|
Post-merge Code Review ACK 6d98aed +1 for the new |
This is the result of going through almost all of the logging in the library, as well as feedback from a few different users over the past month or so noting lack of logging in a few places.
onchaintx.rs
andpackage.rs
, but otherwise I think this is good.