Cleanup logging #965

Merged

TheBlueMatt merged 14 commits from the 2021-06-log-cleanups branch into lightningdevkit:main on Jun 29, 2021

Conversation

@TheBlueMatt (Collaborator) commented Jun 23, 2021

This is the result of going through almost all of the logging in the library, as well as feedback from a few different users over the past month or so noting a lack of logging in a few places.

  • I still need to go through onchaintx.rs and package.rs, but otherwise I think this is good.

These patches have been merged upstream and are in releases now so
we don't need to patch them locally.

When a peer sends us the routing graph, it may include gossip
messages for our channels, despite it not being a party to them.
This is completely fine, but we currently print somewhat scary-looking
log messages in these cases, e.g.:

```
ERROR [lightning::ln::channelmanager:4104] Got a message for a channel from the wrong node!
TRACE [lightning::ln::peer_handler:1267] Handling SendErrorMessage HandleError event in peer_handler for node ... with message Got a message for a channel from the wrong node!
```

Instead, we should simply not consider this an "error" condition
and stay silent.

@TheBlueMatt force-pushed the 2021-06-log-cleanups branch from f83fa1a to 4a87105 on June 23, 2021 16:00
codecov bot commented Jun 23, 2021

Codecov Report

Merging #965 (6d98aed) into main (073afbb) will increase coverage by 1.08%.
The diff coverage is 73.09%.

```
@@            Coverage Diff             @@
##             main     #965      +/-   ##
==========================================
+ Coverage   90.66%   91.75%   +1.08%
==========================================
  Files          60       60
  Lines       30407    34788    +4381
==========================================
+ Hits        27568    31919    +4351
- Misses       2839     2869      +30
```

| Impacted Files | Coverage Δ |
| --- | --- |
| lightning/src/ln/msgs.rs | 90.60% <0.00%> (+2.25%) ⬆️ |
| lightning/src/ln/peer_handler.rs | 46.73% <12.50%> (+0.22%) ⬆️ |
| lightning/src/ln/wire.rs | 53.53% <50.00%> (-10.57%) ⬇️ |
| lightning/src/routing/network_graph.rs | 91.76% <50.00%> (-0.22%) ⬇️ |
| lightning/src/util/macro_logger.rs | 87.87% <50.00%> (-1.19%) ⬇️ |
| lightning/src/chain/package.rs | 93.04% <83.33%> (+0.74%) ⬆️ |
| lightning/src/ln/channelmanager.rs | 86.18% <83.33%> (+2.32%) ⬆️ |
| lightning/src/chain/channelmonitor.rs | 91.90% <100.00%> (+1.14%) ⬆️ |
| lightning/src/chain/onchaintx.rs | 94.19% <100.00%> (+0.04%) ⬆️ |
| lightning/src/ln/channel.rs | 92.51% <100.00%> (+4.22%) ⬆️ |

... and 16 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 073afbb...6d98aed.

@TheBlueMatt force-pushed the 2021-06-log-cleanups branch from 4a87105 to edd0da5 on June 23, 2021 16:20
@TheBlueMatt added this to the 0.0.99 milestone on Jun 23, 2021
@ariard left a comment:

Thanks for this cleanup :)

```rust
// a scary-looking error message and return Ok instead.
return Ok(());
}
return Err(MsgHandleErrInternal::send_err_msg_no_close("Got a channel_update for a channel from the wrong node - it shouldn't know about our private channels!".to_owned(), chan_id));
```

Though #949 blurs this assertion, a direct peer might learn a channel_update for outbound-from-us payments.

@TheBlueMatt (Collaborator, Author) replied:

#949 should only be sending the message directly to the channel counterparty, so I think it's still correct. It's true we do tell other nodes about our private channels in invoices, but they still shouldn't remember that, nor should they ever have channel_updates for those messages. Do you have a suggestion for a different concrete wording?

@ariard replied (Jun 30, 2021):

I think your comment is right only if this peer applies the same gossip policy as us? Some weird LN client might not sort discovered gossip according to its source and instead treat it all uniformly; I don't think that's something the specification covers (and IMHO it should be strongly opinionated on this).

So, as a suggestion, maybe prefix it: "Gossip policy: Got a channel_update for a channel from the wrong node - we consider it shouldn't know about these private channels!"?

@TheBlueMatt (Collaborator, Author) replied:

The specification is clear that you should not accept or forward a channel_update unless you've received the corresponding channel_announcement, which we will never generate signatures for, so it cannot exist.

```rust
log_error!(args.logger, " The ChannelMonitor for channel {} is at update_id {} but the ChannelManager is at update_id {}.",
	log_bytes!(channel.channel_id()), monitor.get_latest_update_id(), channel.get_latest_monitor_update_id());
log_error!(args.logger, " The chain::Watch API *requires* that monitors are persisted durably before returning,");
log_error!(args.logger, " client applications must ensure that ChannelMonitor data is always available and the latest to avoid funds loss!");
```

Maybe add another line, "You can search for a ChannelMonitor update_id matching the ChannelManager"; otherwise, what can the user do?

@TheBlueMatt (Collaborator, Author) replied:

Hmm, isn't that included in the line two lines up, `The ChannelMonitor for channel {} is at update_id {} but the ChannelManager is at update_id {}.`? I don't really think there is anything the user can do here: the application developer has a critical bug that may result (or has resulted) in funds loss. That bug needs to be resolved, and then maybe we can chat with them about how to deal directly with ChannelMonitors to close channels and create a new ChannelManager.

@ariard replied (Jun 30, 2021):

Right, the update_id is already yielded. Maybe we could have a conventional message: "This is an LDK API implementation bug: you can submit a bug report, but ultimately solving the issue is on your side"?

@TheBlueMatt (Collaborator, Author) replied:

What do you think of #976?


Good to me

```diff
@@ -689,6 +689,9 @@ pub enum ErrorAction {
 	},
 	/// The peer did something harmless that we weren't able to process, just log and ignore
 	IgnoreError,
+	/// The peer did something harmless (and expected) that we weren't able to meaningfully
+	/// process. Simply ignore.
+	IgnoreCommonError,
```

What do you think about HarmlessError/MinorError? A bit more meaningful w.r.t. IgnoreError.

@TheBlueMatt (Collaborator, Author) replied:

Hmm, part of what I wanted to capture is that it is "common" in that it appears regularly. Ultimately that's the real difference between log_trace and log_debug - debug should still be readable, but trace you probably need to grep to get anything useful :). Maybe drop the Ignore part and just do ExpectedError?

A contributor replied:

The type indicates an action, though, so the naming is less meaningful without Ignore. Maybe use a single variant IgnoreAndLog with a severity parameter?

@TheBlueMatt (Collaborator, Author) replied:

I really like that idea - will you kill me if I add a second type? There's a ton of Ignore references in the codebase, and updating all of them in this PR seems like it would be a bit large. We can migrate to the explicit variant over time and then clean them all up when the repo is a bit quieter.

@TheBlueMatt (Collaborator, Author) replied:

Done
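
For readers skimming this thread, here is a minimal sketch of the shape that came out of it: a severity-carrying IgnoreAndLog variant living alongside the existing IgnoreError. This is illustrative only; the Level enum and doc comments below are not the exact rust-lightning definitions.

```rust
// Illustrative sketch of the variant shape discussed above, not the exact
// lightning::ln::msgs definitions.
pub enum Level { Trace, Debug, Info, Warn, Error }

pub enum ErrorAction {
    /// The peer did something harmless that we weren't able to process, just log and ignore.
    IgnoreError,
    /// The peer did something harmless (and expected) that we weren't able to meaningfully
    /// process; log at the given level and otherwise ignore.
    IgnoreAndLog(Level),
}
```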

@valentinewallace (Contributor) left a comment:

Nice cleanup! Couldn't find any swapped args or anything

Comment on lines +1537 to +1540
```rust
log_trace!(logger, "Checking funding_created tx signature {} by key {} against tx {} (sighash {}) with redeemscript {} for channel {}.",
	log_bytes!(sig.serialize_compact()[..]), log_bytes!(self.counterparty_funding_pubkey().serialize()),
	encode::serialize_hex(&initial_commitment_bitcoin_tx.transaction), log_bytes!(sighash[..]),
	encode::serialize_hex(&funding_script), log_bytes!(self.channel_id()));
```
A contributor commented:

For some of these longer messages, I wonder if they would be easier to read if we logged a struct containing references to the data that implements Debug and used `{:#?}` to break across lines.

More thinking out loud. Feel free to ignore.

@TheBlueMatt (Collaborator, Author) replied:

Hmm, good question! I wasn't aware that was a thing, but I'm a bit torn on it. I agree it's pretty unreadable right now, but at the same time logging is an API, not strictly a user-interface thing, so abstractly it feels weird to have linebreaks. More concretely, I currently grep the sample node logs pretty extensively to filter out verbose stuff, which moving towards inserting linebreaks would break. TRACE-level logging isn't really intended for direct human processing without some filtering, I think.

A contributor replied:

FWIW, not sure how well supported this would be, but a log entry could theoretically contain new lines and the logger could write each entry separated by record separator control characters. Other loggers may log to a database or key-value store.

@TheBlueMatt (Collaborator, Author) replied:

I still feel like that violates the principle of least-surprise - Users shouldn't be surprised that they have to do extra work to make our log entries "work" when writing to a file (the "common" use case IMO).
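
As an aside for anyone unfamiliar with the formatting being discussed: a tiny, self-contained example of `{:?}` vs `{:#?}`. The struct and values below are made up purely for illustration and are not LDK types.

```rust
// Made-up struct purely to show the difference in output shape.
#[derive(Debug)]
struct FundingSigCheck<'a> {
    sig: &'a str,
    funding_pubkey: &'a str,
    channel_id: &'a str,
}

fn main() {
    let ctx = FundingSigCheck {
        sig: "3045...",
        funding_pubkey: "02ab...",
        channel_id: "deadbeef...",
    };
    // Single line, grep-friendly:
    println!("{:?}", ctx);
    // Pretty-printed across multiple lines, easier to eyeball but harder to
    // grep in line-oriented log files (the trade-off discussed above):
    println!("{:#?}", ctx);
}
```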

@TheBlueMatt force-pushed the 2021-06-log-cleanups branch 3 times, most recently from bd6b560 to 15cc91a on June 28, 2021 20:53

@TheBlueMatt (Collaborator, Author) commented:

Addressed all feedback, I think.

@TheBlueMatt force-pushed the 2021-06-log-cleanups branch from 15cc91a to 29991ac on June 28, 2021 20:59

```diff
-msgs::ErrorAction::IgnoreCommonError => {
-	log_trace!(self.logger, "Error handling message; ignoring: {}", e.err);
+msgs::ErrorAction::IgnoreAndLog(level) => {
+	log_given_level!(self.logger, cmp::min(Level::Debug, level), "Error handling message; ignoring: {}", e.err);
```
@jkczyz (Contributor) commented Jun 29, 2021:

Why is the cmp::min used? Would it make sense to write a custom macro for this so it doesn't need to be repeated elsewhere?

@TheBlueMatt (Collaborator, Author) replied:

Hmm, I thought the "minimum level" there was pretty readable. I could move the cmp::min into the macro, but it felt readable and this is slightly more generic because it would let you do some match or if statement that sets the level?

A contributor replied:

To rephrase my first question, why not just use level here? I don't quite understand why we want to use a lower level if IgnoreAndLog gives, say, Warn.

> Hmm, I thought the "minimum level" there was pretty readable. I could move the cmp::min into the macro, but it felt readable and this is slightly more generic because it would let you do some match or if statement that sets the level?

I was saying to leave log_given_level as is and add another macro like log_debug_or_lower (or similarly named) to avoid a lot of boilerplate if this pattern will be common.

@TheBlueMatt (Collaborator, Author) replied:

> To rephrase my first question, why not just use level here? I don't quite understand why we want to use a lower level if IgnoreAndLog gives, say, Warn.

Hmm, right, I guess we could? It's a bit strange to "warn" the user for an error that we "ignored", but I can just drop it.

> I was saying to leave log_given_level as is and add another macro like log_debug_or_lower (or similarly named) to avoid a lot of boilerplate if this pattern will be common.

I mean we only handle the ErrorAction stuff in two places... :)
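
For anyone puzzling over the `cmp::min` in the diff above (which this thread leans toward dropping): with an ordering where more significant levels compare higher, taking the minimum against Debug caps how loudly an ignored error can be logged. A self-contained sketch, using an illustrative Level enum rather than the real lightning::util::logger::Level:

```rust
use std::cmp;

// Illustrative Level, declared least- to most-significant so the derived Ord
// makes more significant levels compare higher.
#[derive(PartialEq, Eq, PartialOrd, Ord, Clone, Copy, Debug)]
enum Level { Trace, Debug, Info, Warn, Error }

fn capped(level: Level) -> Level {
    // Never log an ignored error above Debug, even if the action asked for Warn or Error.
    cmp::min(Level::Debug, level)
}

fn main() {
    assert_eq!(capped(Level::Error), Level::Debug);
    assert_eq!(capped(Level::Warn), Level::Debug);
    assert_eq!(capped(Level::Trace), Level::Trace);
}
```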

@TheBlueMatt force-pushed the 2021-06-log-cleanups branch from 29991ac to 7043607 on June 29, 2021 15:18
@TheBlueMatt force-pushed the 2021-06-log-cleanups branch from 7043607 to c37e76d on June 29, 2021 16:02

```diff
@@ -2028,7 +2043,7 @@ impl<Signer: Sign> Channel<Signer> {
 		if pending_remote_value_msat - msg.amount_msat - chan_reserve_msat < remote_fee_cost_incl_stuck_buffer_msat {
 			// Note that if the pending_forward_status is not updated here, then it's because we're already failing
 			// the HTLC, i.e. its status is already set to failing.
-			log_info!(logger, "Attempting to fail HTLC due to fee spike buffer violation");
+			log_info!(logger, "Attempting to fail HTLC due to fee spike buffer violation in channel {}. Rebalancing is required.", log_bytes!(self.channel_id()));
```
A contributor commented:

Not a big deal but wish we could point people somewhere for expansion on "rebalancing is required" 🤔

@TheBlueMatt (Collaborator, Author) replied:

We'll get there eventually :) #959

We had a client application which provided inconsistent monitor
state when deserializing a ChannelManager, resulting in opaque and
generic "InvalidData" deserialization failures. Instead, we log
some informative (and appropriately scary) warning messages in
such cases.

This makes our logging consistent and somewhat simplifies message
sending code in a few places.

This much more consistently logs information about messages
sent/received, including logging the full messages being
sent/received at the TRACE log level. Many other log messages which
are more often of interest were moved to the DEBUG log level.

For log entries which may have a variable level, we can't call an
arbitrary macro and need to be able to pass an explicit level. This
does so without breaking the compile-time disabling of certain log
levels.

Further, we "fix" the comparison order of log levels to make more
significant levels sort "higher", which implicitly makes more sense
than sorting "lower".

Finally, we remove the "Off" log level as no log entry should ever
be logged at the "Off" level - that would be nonsensical.
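
As a rough sketch of the ordering point in this commit message (using a simplified Level rather than the exact lightning::util::logger::Level): declaring variants from least to most significant lets a derived Ord make more significant levels sort higher, and dropping Off leaves only variants that can actually be logged at.

```rust
// Simplified illustration of the ordering described above, not the exact
// rust-lightning definition.
#[derive(PartialEq, Eq, PartialOrd, Ord, Debug)]
enum Level {
    // Least significant first, so the derived Ord sorts Error highest.
    Trace,
    Debug,
    Info,
    Warn,
    Error,
    // No `Off` variant: nothing should ever be logged "at Off".
}

fn main() {
    assert!(Level::Error > Level::Warn);
    assert!(Level::Debug > Level::Trace);
}
```
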
We very often receive duplicate gossip messages, which now causes us
to log at the DEBUG level, which is almost certainly not what a
user wants. Instead, we add a new form of ErrorAction which causes
us to only log at the TRACE level.

This updates a number of log sites in channel and channelmanager to
 * Be a bit more verbose at the TRACE level,
 * Move some error/useful messages to the ERROR/WARN/INFO level,
 * Add new logs to always log once at the DEBUG level when we
   send/receive a commitment_signed (with some extra data),
 * Include the channel id being operated on in more log messages.

ChannelMonitor and related log entries can generally lean towards
being higher log levels than they necessarily need to be, as they
should be exceedingly rare, if only because they require
confirmation of an on-chain transaction.

@TheBlueMatt force-pushed the 2021-06-log-cleanups branch from c37e76d to 6d98aed on June 29, 2021 19:36

@TheBlueMatt (Collaborator, Author) commented:

Squashed without changes:

```
$ git diff-tree -U1 c37e76da 6d98aeda
$
```

@TheBlueMatt merged commit f472907 into lightningdevkit:main on Jun 29, 2021
@ariard commented Jun 30, 2021:

Post-merge Code Review ACK 6d98aed

+1 for the new ErrorAction::IgnoreAndLog API
