
LSPS5: Correct notification cooldown & reset logic #3975


Conversation

martinsaposnic
Contributor

Addressing LSPS5 post-merge follow-ups, specifically comments #3944 (comment) and #3662 (comment)

From the LSPS5 spec:

LSP: Rate-limiting Notifications By method
An LSP implementation must avoid sending multiple LSPS5 notifications of the same method (other than
lsps5.webhook_registered) close in time, as long as the client has not connected to it.
For example, if a payment arrives and the LSP thus wants to send lsps5.payment_incoming notification to a registered webhook, and another payment arrives before the client comes online, the LSP must not send another lsps5.payment_incoming notification.
If the client does not come online after some time that a particular method was sent via a webhook, then the LSP may raise it again. This timeout must be measurable in hours or days.
The timeout should be reset once the client comes online and then goes offline.

This PR fixes a bug in the notification rate-limiting logic. Previously, the same notification method (e.g., lsps5.payment_incoming) could not be sent again for 24 hours, regardless of whether the client reconnected (yikes). Now the default cooldown is 1 hour, and the cooldown is reset whenever the client connects again. This ensures that after a client comes online, the LSP can immediately send new notifications of the same method.

Also adds a default for the LSPS5 service configuration, which for some reason I didn't do before.
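For illustration, here is a minimal sketch of the corrected behavior described above: a per-method cooldown check plus a reset when the client reconnects. The type and field names are hypothetical; the actual crate uses LSPSDateTime and WebhookNotificationMethod.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical per-client state; names are illustrative only.
struct ClientState {
    // Last time each notification method was sent while the client was offline.
    last_notification_sent: HashMap<String, Instant>,
}

impl ClientState {
    // A notification of `method` may be sent if it was never sent, or if the
    // per-method cooldown has elapsed.
    fn may_notify(&self, method: &str, cooldown: Duration) -> bool {
        match self.last_notification_sent.get(method) {
            Some(sent_at) => sent_at.elapsed() >= cooldown,
            None => true,
        }
    }

    // Called when the client reconnects: clearing the per-method timestamps
    // lets the LSP notify again immediately once the client goes offline again.
    fn on_client_connected(&mut self) {
        self.last_notification_sent.clear();
    }
}
```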

@ldk-reviews-bot

ldk-reviews-bot commented Jul 30, 2025

👋 Thanks for assigning @tnull as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@martinsaposnic mentioned this pull request Jul 30, 2025

codecov bot commented Jul 30, 2025

Codecov Report

❌ Patch coverage is 87.17949% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.98%. Comparing base (e01663a) to head (bae3224).
⚠️ Report is 7 commits behind head on main.

Files with missing lines                    Patch %   Lines
lightning-liquidity/src/lsps5/msgs.rs       0.00%     2 Missing ⚠️
lightning-liquidity/src/lsps5/service.rs    93.75%    2 Missing ⚠️
lightning-liquidity/src/manager.rs          80.00%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3975   +/-   ##
=======================================
  Coverage   88.97%   88.98%           
=======================================
  Files         174      174           
  Lines      124161   124184   +23     
  Branches   124161   124184   +23     
=======================================
+ Hits       110470   110500   +30     
+ Misses      11216    11209    -7     
  Partials     2475     2475           
Flag Coverage Δ
fuzzing 22.23% <0.00%> (-0.41%) ⬇️
tests 88.80% <87.17%> (+<0.01%) ⬆️


@tnull requested review from tnull and removed request for valentinewallace on July 31, 2025 08:13
fn default() -> Self {
Self {
max_webhooks_per_client: DEFAULT_MAX_WEBHOOKS_PER_CLIENT,
notification_cooldown_hours: DEFAULT_NOTIFICATION_COOLDOWN_HOURS,
Contributor

Hmm, not sure if we should make this user-configurable. But if we do, let's check on startup that they set it to at least 1 hour to comply with the spec.
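A minimal sketch of the startup check being suggested here, assuming the config keeps the field shown in the diff above; the struct is reduced to that one field and the error type is illustrative, not the crate's API.

```rust
use std::time::Duration;

// Reduced, illustrative config; the real LSPS5 service config has more fields.
struct LSPS5ServiceConfig {
    notification_cooldown_hours: Duration,
}

fn validate_config(config: &LSPS5ServiceConfig) -> Result<(), String> {
    // The spec requires the per-method timeout to be measurable in hours or days.
    if config.notification_cooldown_hours < Duration::from_secs(60 * 60) {
        return Err("notification_cooldown_hours must be at least 1 hour".into());
    }
    Ok(())
}
```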

.map_or(true, |duration| {
duration >= self.config.notification_cooldown_hours.as_secs()
.map_or(true, |last_sent| {
last_sent >= self.config.notification_cooldown_hours.as_secs()
Contributor

nit: I think duration was clearer here, as we don't compare the absolute time.

let mut webhooks = self.webhooks.lock().unwrap();
if let Some(client_webhooks) = webhooks.get_mut(counterparty_node_id) {
for webhook in client_webhooks.values_mut() {
webhook.last_notification_sent.clear();
Contributor

A bit orthogonal, but do we even need a separate last_used field? Mind adding comments on the fields to explain how last_used and last_notification_sent differ?

Contributor Author

I will add comments explaining this, because even I found it hard to remember the difference.

They are different because:

  • last_used is updated either when a notification is sent or when the webhook is updated (via set_webhook)
  • last_notification_sent is a Map<NotificationMethod, timestamp> that is only updated when a notification is sent

last_notification_sent is used for the notification_cooldown logic, and last_used is used for the stale_webhooks logic.
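A simplified sketch of the two fields with the comments promised above; the types are reduced here (the crate's real code uses LSPSDateTime and WebhookNotificationMethod).

```rust
use std::collections::HashMap;

// Illustrative shape of a stored webhook; field types are simplified.
struct StoredWebhook {
    url: String,
    // Updated either when a notification is sent or when the webhook is
    // updated via `set_webhook`; drives the stale-webhook pruning.
    last_used: u64,
    // Per-method timestamp of the last notification sent; drives the
    // per-method notification cooldown.
    last_notification_sent: HashMap<String, u64>,
}
```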

@ldk-reviews-bot

👋 The first review has been submitted!

Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer.

@martinsaposnic
Contributor Author

martinsaposnic commented Jul 31, 2025

@tnull does it make sense to add a new error SLOW_DOWN if the notification is not sent due to cooldown? right now it does nothing silently.

after edit: I went ahead and implemented this in 9e59322, let me know what you think

@martinsaposnic force-pushed the fix-lsps5-notification-cooldown-logic branch from 9e59322 to 9e0ff6f on July 31, 2025 15:53
@martinsaposnic requested a review from tnull on July 31, 2025 15:54
@martinsaposnic force-pushed the fix-lsps5-notification-cooldown-logic branch from 9e0ff6f to a4a560b on July 31, 2025 15:59
@@ -118,6 +128,7 @@ impl LSPS5ProtocolError {
LSPS5ProtocolError::AppNameNotFound => LSPS5_APP_NAME_NOT_FOUND_ERROR_CODE,
LSPS5ProtocolError::UnknownError => LSPS5_UNKNOWN_ERROR_CODE,
LSPS5ProtocolError::SerializationError => LSPS5_SERIALIZATION_ERROR_CODE,
LSPS5ProtocolError::SlowDownError => LSPS5_SLOW_DOWN_ERROR_CODE,
Contributor

Can't get there in GitHub, but it seems the above comment

	/// private code range so we never collide with the spec's codes

is wrong given we match spec'd and non-spec'd codes here and elsewhere?

.map_or(false, |duration| duration < DEFAULT_NOTIFICATION_COOLDOWN_HOURS.as_secs())
});

if rate_limit_applies {
Contributor

Hmm, I think the spec says that we should disregard the rate-limit for WebhookRegistered, right?

An LSP implementation must avoid sending multiple LSPS5 notifications of the same method (other than lsps5.webhook_registered) close in time, as long as the client has not connected to it.
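A hedged sketch of a check along the lines the reviewer asks for, exempting webhook_registered from the cooldown. The enum here is illustrative and only stands in for the crate's WebhookNotificationMethod; the variant names roughly mirror the spec's notification methods.

```rust
// Illustrative stand-in for WebhookNotificationMethod.
enum NotificationMethod {
    WebhookRegistered,
    PaymentIncoming,
    ExpirySoon,
    LiquidityManagementRequest,
    OnionMessageIncoming,
}

// Per the quoted spec text, webhook_registered is exempt from the cooldown;
// every other method is rate-limited while the client stays offline.
fn rate_limit_applies(
    method: &NotificationMethod,
    secs_since_last_sent: Option<u64>,
    cooldown_secs: u64,
) -> bool {
    if matches!(method, NotificationMethod::WebhookRegistered) {
        return false;
    }
    secs_since_last_sent.map_or(false, |secs| secs < cooldown_secs)
}
```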


// If the send_notification failed because of a SLOW_DOWN_ERROR, it means we sent this
// notification recently, and the user has not seen it yet. It's safe to continue, but we still need to handle other error types.
if result.is_err() && !matches!(result, Err(LSPS5ProtocolError::SlowDownError)) {
Contributor

I think a classic match would be preferable here. Also agree to continue, even though the counterparty shouldn't send the error for webhook_registered in the first place, no?
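For illustration, the classic-match shape being suggested might look roughly like this; the error enum is reduced to the variants referenced in this PR and the helper name is hypothetical.

```rust
// Reduced, illustrative error enum; the real LSPS5ProtocolError has more variants.
enum LSPS5ProtocolError {
    SlowDownError,
    UnknownError,
}

// Tolerate SlowDownError (the notification was sent recently and the client
// has not come online yet), propagate everything else.
fn handle_send_result(
    result: Result<(), LSPS5ProtocolError>,
) -> Result<(), LSPS5ProtocolError> {
    match result {
        Ok(()) => Ok(()),
        Err(LSPS5ProtocolError::SlowDownError) => Ok(()),
        Err(e) => Err(e),
    }
}
```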

@martinsaposnic
Contributor Author

Thanks for the review @tnull

I addressed your comments in this fixup commit 121dc04

  • changed a stale comment in lsps5/msgs
  • handle_set_webhook no longer checks for the slowdown error (according to the spec, this notification type should never be rate-limited)
  • send_notifications_to_client_webhooks only checks the rate limit if the notification is not webhook_registered

@martinsaposnic requested a review from tnull on August 4, 2025 15:04
Contributor

@tnull left a comment

LGTM, please squash.

@martinsaposnic force-pushed the fix-lsps5-notification-cooldown-logic branch 2 times, most recently from 3adc266 to 8cafaf6 on August 5, 2025 11:26
per method cooldown enforced correctly, and reset after peer_connected event
@martinsaposnic force-pushed the fix-lsps5-notification-cooldown-logic branch from 8cafaf6 to bae3224 on August 5, 2025 11:30
@martinsaposnic
Contributor Author

squash done!

@martinsaposnic requested a review from tnull on August 5, 2025 11:34
Contributor

@tnull left a comment

Simple enough, landing this!

@tnull merged commit 7c5491d into lightningdevkit:main Aug 5, 2025
22 of 24 checks passed
/// Default maximum number of webhooks allowed per client.
pub const DEFAULT_MAX_WEBHOOKS_PER_CLIENT: u32 = 10;
/// Default notification cooldown time in hours.
pub const DEFAULT_NOTIFICATION_COOLDOWN_HOURS: Duration = Duration::from_secs(60 * 60); // 1 hour
Collaborator

This still seems wayyyy too high. If we get two payments an hour apart, just because the first wakeup was missed doesn't mean we don't want to send another. A phone may be disconnected from the network or low on battery when the first was sent and the phone decided not to do a wakeup, but it may well be in a different state an hour later. Yea, we don't want to send 1000 notifications in an hour, but we definitely should be comfortable sending five in an hour. We either need to crank this way down (one a minute?) or track and allow bursts.
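For the "track and allow bursts" alternative mentioned above, a rough token-bucket sketch; all names and numbers here are illustrative, not part of the crate.

```rust
use std::time::Instant;

// Allows a short burst of notifications, then refills slowly.
struct BurstLimiter {
    tokens: f64,
    max_tokens: f64,     // e.g. 5.0: allow a burst of five notifications
    refill_per_sec: f64, // e.g. 5.0 / 3600.0: regain the full burst over an hour
    last_refill: Instant,
}

impl BurstLimiter {
    // Returns true if a notification may be sent now, consuming one token.
    fn allow(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_refill).as_secs_f64();
        self.last_refill = now;
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.max_tokens);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```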

last_used: LSPSDateTime,
// Map of last notification sent timestamps for each notification method.
// This is used to enforce notification cooldowns.
last_notification_sent: HashMap<WebhookNotificationMethod, LSPSDateTime>,
Collaborator

EEk, we end up with a HashMap<_, HashMap<_, HashMap<_, _>>> here, let's condense that please. First of all we probably don't need to track the last-sent notification on a per-webhook basis, it seems like it can be per-client.

Secondly, indexing this by WebhookNotificationMethod means we track per-timeout for LSPS5ExpirySoon notifications, which I don't think is what we want. Once we fix that, we have 5 notification types, which definitely don't need to be a HashMap, probably not even a Vec, a [X; 5] should do fine :).
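A rough sketch of the flattened, per-client layout being suggested; the method-to-index mapping and field names are illustrative.

```rust
// One per-client slot per notification method, replacing the nested HashMaps.
const NUM_NOTIFICATION_METHODS: usize = 5;

struct ClientNotificationState {
    // Seconds-since-epoch of the last send per method (webhook_registered,
    // payment_incoming, expiry_soon, liquidity_management_request,
    // onion_message_incoming); None means "never sent".
    last_notification_sent: [Option<u64>; NUM_NOTIFICATION_METHODS],
}
```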

@@ -318,10 +329,14 @@ where
/// This builds a [`WebhookNotificationMethod::LSPS5PaymentIncoming`] webhook notification, signs it with your
/// node key, and enqueues HTTP POSTs to all registered webhook URLs for that client.
///
/// This may fail if a similar notification was sent too recently,
/// violating the notification cooldown period defined in [`DEFAULT_NOTIFICATION_COOLDOWN_HOURS`].
Collaborator

All these links should be to the Config not the default constant.
