Cleanup logging #965

Merged

TheBlueMatt merged 14 commits from the 2021-06-log-cleanups branch into lightningdevkit:main on Jun 29, 2021

Conversation

@TheBlueMatt (Collaborator) commented Jun 23, 2021

This is the result of going through almost all of the logging in the library, as well as feedback from a few different users over the past month or so noting a lack of logging in a few places.

  • I still need to go through onchaintx.rs and package.rs, but otherwise I think this is good.

These patches have been merged upstream and are in releases now so
we don't need to patch them locally.

When a peer sends us the routing graph, it may include gossip
messages for our channels, despite it not being a party to them.
This is completely fine, but we currently print somewhat scary-looking
log messages in these cases, e.g.:

```
ERROR [lightning::ln::channelmanager:4104] Got a message for a channel from the wrong node!
TRACE [lightning::ln::peer_handler:1267] Handling SendErrorMessage HandleError event in peer_handler for node ... with message Got a message for a channel from the wrong node!
```

Instead, we should simply not consider this an "error" condition
and stay silent.

@TheBlueMatt force-pushed the 2021-06-log-cleanups branch from f83fa1a to 4a87105 on June 23, 2021 16:00
codecov bot commented Jun 23, 2021

Codecov Report

Merging #965 (6d98aed) into main (073afbb) will increase coverage by 1.08%.
The diff coverage is 73.09%.

```
@@            Coverage Diff             @@
##             main     #965      +/-   ##
==========================================
+ Coverage   90.66%   91.75%   +1.08%
==========================================
  Files          60       60
  Lines       30407    34788    +4381
==========================================
+ Hits        27568    31919    +4351
- Misses       2839     2869      +30
```

| Impacted Files | Coverage Δ |
| --- | --- |
| lightning/src/ln/msgs.rs | 90.60% <0.00%> (+2.25%) ⬆️ |
| lightning/src/ln/peer_handler.rs | 46.73% <12.50%> (+0.22%) ⬆️ |
| lightning/src/ln/wire.rs | 53.53% <50.00%> (-10.57%) ⬇️ |
| lightning/src/routing/network_graph.rs | 91.76% <50.00%> (-0.22%) ⬇️ |
| lightning/src/util/macro_logger.rs | 87.87% <50.00%> (-1.19%) ⬇️ |
| lightning/src/chain/package.rs | 93.04% <83.33%> (+0.74%) ⬆️ |
| lightning/src/ln/channelmanager.rs | 86.18% <83.33%> (+2.32%) ⬆️ |
| lightning/src/chain/channelmonitor.rs | 91.90% <100.00%> (+1.14%) ⬆️ |
| lightning/src/chain/onchaintx.rs | 94.19% <100.00%> (+0.04%) ⬆️ |
| lightning/src/ln/channel.rs | 92.51% <100.00%> (+4.22%) ⬆️ |

... and 16 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 073afbb...6d98aed.

@TheBlueMatt force-pushed the 2021-06-log-cleanups branch from 4a87105 to edd0da5 on June 23, 2021 16:20
@TheBlueMatt added this to the 0.0.99 milestone on Jun 23, 2021
@ariard left a comment:

Thanks for this cleanup :)

```rust
// a scary-looking error message and return Ok instead.
return Ok(());
}
return Err(MsgHandleErrInternal::send_err_msg_no_close("Got a channel_update for a channel from the wrong node - it shouldn't know about our private channels!".to_owned(), chan_id));
```

Though #949 blurs this assertion, a direct peer might learn a channel_update for outbound-from-us payments.

@TheBlueMatt (Collaborator, Author) replied:

#949 should only be sending the message directly to the channel counterparty, so I think it's still correct. It's true we do tell other nodes about our private channels in invoices, but they still shouldn't remember that, nor should they ever have channel_updates for those messages. Do you have a suggestion for a different concrete wording?

@ariard replied (Jun 30, 2021):

I think your comment is right only if this peer applies the same gossip policy as us? Some weird LN client might not sort discovered gossip according to its source and instead treat it all uniformly; I don't think that's something the specification covers (and IMHO it should be strongly opinionated on this).

So, as a suggestion, maybe prefix it: "Gossip policy: Got a channel_update for a channel from the wrong node - we consider it shouldn't know about these private channels!"?

@TheBlueMatt (Collaborator, Author) replied:

The specification is clear that you should not accept or forward a channel_update unless you've received the corresponding channel_announcement, which we will never generate signatures for, so it cannot exist.

```rust
log_error!(args.logger, " The ChannelMonitor for channel {} is at update_id {} but the ChannelManager is at update_id {}.",
	log_bytes!(channel.channel_id()), monitor.get_latest_update_id(), channel.get_latest_monitor_update_id());
log_error!(args.logger, " The chain::Watch API *requires* that monitors are persisted durably before returning,");
log_error!(args.logger, " client applications must ensure that ChannelMonitor data is always available and the latest to avoid funds loss!");
```

Maybe add another line, "You can search for a ChannelMonitor update_id matching the ChannelManager"; otherwise, what can the user do?

@TheBlueMatt (Collaborator, Author) replied:

Hmm, isn't that included in the line two lines up, `The ChannelMonitor for channel {} is at update_id {} but the ChannelManager is at update_id {}.`? I don't really think there is anything the user can do here: the application developer has a critical bug that may result (or has resulted) in funds loss. That bug needs to be resolved, and then maybe we can chat with them about how to deal directly with ChannelMonitors to close channels and create a new ChannelManager.

@ariard replied (Jun 30, 2021):

Right, the update_id is already yielded. Maybe we could have a conventional message: "This is an LDK API implementation bug: you can submit a bug report, but ultimately solving the issue is on your side"?

@TheBlueMatt (Collaborator, Author) replied:

What do you think of #976?


Good to me

```diff
@@ -689,6 +689,9 @@ pub enum ErrorAction {
 	},
 	/// The peer did something harmless that we weren't able to process, just log and ignore
 	IgnoreError,
+	/// The peer did something harmless (and expected) that we weren't able to meaningfully
+	/// process. Simply ignore.
+	IgnoreCommonError,
```

What do you think about HarmlessError/MinorError? A bit more meaningful w.r.t. IgnoreError.

@TheBlueMatt (Collaborator, Author) replied:

Hmm, part of what I wanted to capture is that it is "common" in that it appears regularly. Ultimately that's the real difference between log_trace and log_debug - debug should still be readable, but trace you probably need to grep to get anything useful :). Maybe drop the Ignore part and just do ExpectedError?

A contributor replied:

The type indicates an action, though, so the naming is less meaningful without Ignore. Maybe use a single variant IgnoreAndLog with a severity parameter?

@TheBlueMatt (Collaborator, Author) replied:

I really like that idea - will you kill me if I add a second type? There's a ton of Ignore references in the codebase, and updating all of them in this PR seems like it would be a bit large. We can migrate to the explicit variant over time and then clean them all up when the repo is a bit quieter.

@TheBlueMatt (Collaborator, Author) replied:

Done
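
For readers skimming this thread, here is a minimal sketch of the shape that came out of it: a severity-carrying IgnoreAndLog variant living alongside the existing IgnoreError. This is illustrative only; the Level enum and doc comments below are not the exact rust-lightning definitions.

```rust
// Illustrative sketch of the variant shape discussed above, not the exact
// lightning::ln::msgs definitions.
pub enum Level { Trace, Debug, Info, Warn, Error }

pub enum ErrorAction {
    /// The peer did something harmless that we weren't able to process, just log and ignore.
    IgnoreError,
    /// The peer did something harmless (and expected) that we weren't able to meaningfully
    /// process; log at the given level and otherwise ignore.
    IgnoreAndLog(Level),
}
```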

@valentinewallace (Contributor) left a comment:

Nice cleanup! Couldn't find any swapped args or anything

Comment on lines +1537 to +1540
```rust
log_trace!(logger, "Checking funding_created tx signature {} by key {} against tx {} (sighash {}) with redeemscript {} for channel {}.",
	log_bytes!(sig.serialize_compact()[..]), log_bytes!(self.counterparty_funding_pubkey().serialize()),
	encode::serialize_hex(&initial_commitment_bitcoin_tx.transaction), log_bytes!(sighash[..]),
	encode::serialize_hex(&funding_script), log_bytes!(self.channel_id()));
```
A contributor commented:

For some of these longer messages, I wonder if they would be easier to read if we logged a struct containing references to the data that implements Debug and used `{:#?}` to break across lines.

More thinking out loud. Feel free to ignore.

@TheBlueMatt (Collaborator, Author) replied:

Hmm, good question! I wasn't aware that was a thing, but I'm a bit torn on it. I agree it's pretty unreadable right now, but at the same time logging is an API, not strictly a user-interface thing, so abstractly it feels weird to have linebreaks. More concretely, I currently grep the sample node logs pretty extensively to filter out verbose stuff, which moving towards inserting linebreaks would break. TRACE-level logging isn't really intended for direct human processing without some filtering, I think.

A contributor replied:

FWIW, not sure how well supported this would be, but a log entry could theoretically contain new lines and the logger could write each entry separated by record separator control characters. Other loggers may log to a database or key-value store.

@TheBlueMatt (Collaborator, Author) replied:

I still feel like that violates the principle of least-surprise - Users shouldn't be surprised that they have to do extra work to make our log entries "work" when writing to a file (the "common" use case IMO).
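
As an aside for anyone unfamiliar with the formatting being discussed: a tiny, self-contained example of `{:?}` vs `{:#?}`. The struct and values below are made up purely for illustration and are not LDK types.

```rust
// Made-up struct purely to show the difference in output shape.
#[derive(Debug)]
struct FundingSigCheck<'a> {
    sig: &'a str,
    funding_pubkey: &'a str,
    channel_id: &'a str,
}

fn main() {
    let ctx = FundingSigCheck {
        sig: "3045...",
        funding_pubkey: "02ab...",
        channel_id: "deadbeef...",
    };
    // Single line, grep-friendly:
    println!("{:?}", ctx);
    // Pretty-printed across multiple lines, easier to eyeball but harder to
    // grep in line-oriented log files (the trade-off discussed above):
    println!("{:#?}", ctx);
}
```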

@TheBlueMatt force-pushed the 2021-06-log-cleanups branch 3 times, most recently from bd6b560 to 15cc91a on June 28, 2021 20:53

@TheBlueMatt (Collaborator, Author) commented:

Addressed all feedback, I think.

@TheBlueMatt force-pushed the 2021-06-log-cleanups branch from 15cc91a to 29991ac on June 28, 2021 20:59

```diff
-msgs::ErrorAction::IgnoreCommonError => {
-	log_trace!(self.logger, "Error handling message; ignoring: {}", e.err);
+msgs::ErrorAction::IgnoreAndLog(level) => {
+	log_given_level!(self.logger, cmp::min(Level::Debug, level), "Error handling message; ignoring: {}", e.err);
```
@jkczyz (Contributor) commented Jun 29, 2021:

Why is the cmp::min used? Would it make sense to write a custom macro for this so it doesn't need to be repeated elsewhere?

@TheBlueMatt (Collaborator, Author) replied:

Hmm, I thought the "minimum level" there was pretty readable. I could move the cmp::min into the macro, but it felt readable and this is slightly more generic because it would let you do some match or if statement that sets the level?

A contributor replied:

To rephrase my first question, why not just use level here? I don't quite understand why we want to use a lower level if IgnoreAndLog gives, say, Warn.

> Hmm, I thought the "minimum level" there was pretty readable. I could move the cmp::min into the macro, but it felt readable and this is slightly more generic because it would let you do some match or if statement that sets the level?

I was saying to leave log_given_level as is and add another macro like log_debug_or_lower (or similarly named) to avoid a lot of boilerplate if this pattern will be common.

@TheBlueMatt (Collaborator, Author) replied:

> To rephrase my first question, why not just use level here? I don't quite understand why we want to use a lower level if IgnoreAndLog gives, say, Warn.

Hmm, right, I guess we could? It's a bit strange to "warn" the user for an error that we "ignored", but I can just drop it.

> I was saying to leave log_given_level as is and add another macro like log_debug_or_lower (or similarly named) to avoid a lot of boilerplate if this pattern will be common.

I mean we only handle the ErrorAction stuff in two places... :)
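
For anyone puzzling over the `cmp::min` in the diff above (which this thread leans toward dropping): with an ordering where more significant levels compare higher, taking the minimum against Debug caps how loudly an ignored error can be logged. A self-contained sketch, using an illustrative Level enum rather than the real lightning::util::logger::Level:

```rust
use std::cmp;

// Illustrative Level, declared least- to most-significant so the derived Ord
// makes more significant levels compare higher.
#[derive(PartialEq, Eq, PartialOrd, Ord, Clone, Copy, Debug)]
enum Level { Trace, Debug, Info, Warn, Error }

fn capped(level: Level) -> Level {
    // Never log an ignored error above Debug, even if the action asked for Warn or Error.
    cmp::min(Level::Debug, level)
}

fn main() {
    assert_eq!(capped(Level::Error), Level::Debug);
    assert_eq!(capped(Level::Warn), Level::Debug);
    assert_eq!(capped(Level::Trace), Level::Trace);
}
```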

@TheBlueMatt force-pushed the 2021-06-log-cleanups branch from 29991ac to 7043607 on June 29, 2021 15:18
@TheBlueMatt force-pushed the 2021-06-log-cleanups branch from 7043607 to c37e76d on June 29, 2021 16:02

```diff
@@ -2028,7 +2043,7 @@ impl<Signer: Sign> Channel<Signer> {
 		if pending_remote_value_msat - msg.amount_msat - chan_reserve_msat < remote_fee_cost_incl_stuck_buffer_msat {
 			// Note that if the pending_forward_status is not updated here, then it's because we're already failing
 			// the HTLC, i.e. its status is already set to failing.
-			log_info!(logger, "Attempting to fail HTLC due to fee spike buffer violation");
+			log_info!(logger, "Attempting to fail HTLC due to fee spike buffer violation in channel {}. Rebalancing is required.", log_bytes!(self.channel_id()));
```
A contributor commented:

Not a big deal but wish we could point people somewhere for expansion on "rebalancing is required" 🤔

@TheBlueMatt (Collaborator, Author) replied:

We'll get there eventually :) #959

We had a client application which provided inconsistent monitor
state when deserializing a ChannelManager, resulting in opaque and
generic "InvalidData" deserialization failures. Instead, we log
some informative (and appropriately scary) warning messages in
such cases.

This makes our logging consistent and somewhat simplifies message
sending code in a few places.

This much more consistently logs information about messages
sent/received, including logging the full messages being
sent/received at the TRACE log level. Many other log messages which
are more often of interest were moved to the DEBUG log level.

For log entries which may have a variable level, we can't call an
arbitrary macro and need to be able to pass an explicit level. This
does so without breaking the compile-time disabling of certain log
levels.

Further, we "fix" the comparison order of log levels to make more
significant levels sort "higher", which implicitly makes more sense
than sorting "lower".

Finally, we remove the "Off" log level as no log entry should ever
be logged at the "Off" level - that would be nonsensical.
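
As a rough sketch of the ordering point in this commit message (using a simplified Level rather than the exact lightning::util::logger::Level): declaring variants from least to most significant lets a derived Ord make more significant levels sort higher, and dropping Off leaves only variants that can actually be logged at.

```rust
// Simplified illustration of the ordering described above, not the exact
// rust-lightning definition.
#[derive(PartialEq, Eq, PartialOrd, Ord, Debug)]
enum Level {
    // Least significant first, so the derived Ord sorts Error highest.
    Trace,
    Debug,
    Info,
    Warn,
    Error,
    // No `Off` variant: nothing should ever be logged "at Off".
}

fn main() {
    assert!(Level::Error > Level::Warn);
    assert!(Level::Debug > Level::Trace);
}
```
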
We very often receive duplicate gossip messages, which now causes us
to log at the DEBUG level, which is almost certainly not what a
user wants. Instead, we add a new form of ErrorAction which causes
us to only log at the TRACE level.

This updates a number of log sites in channel and channelmanager to
 * Be a bit more verbose at the TRACE level,
 * Move some error/useful messages to the ERROR/WARN/INFO level,
 * Add new logs to always log once at the DEBUG level when we
   send/receive a commitment_signed (with some extra data),
 * Include the channel id being operated on in more log messages.

ChannelMonitor and related log entries can generally lean towards
being higher log levels than they necessarily need to be, as they
should be exceedingly rare, if only because they require
confirmation of an on-chain transaction.

@TheBlueMatt force-pushed the 2021-06-log-cleanups branch from c37e76d to 6d98aed on June 29, 2021 19:36

@TheBlueMatt (Collaborator, Author) commented:

Squashed without changes:

```
$ git diff-tree -U1 c37e76da 6d98aeda
$
```

@TheBlueMatt merged commit f472907 into lightningdevkit:main on Jun 29, 2021
@ariard commented Jun 30, 2021:

Post-merge Code Review ACK 6d98aed

+1 for the new ErrorAction::IgnoreAndLog API
