database,redis: More verbosity in OnRetryableError #124


Merged
merged 1 commit into main from logging-small-rework on May 9, 2025

Conversation

oxzi
Member

@oxzi oxzi commented Apr 15, 2025

database,redis: More verbosity in OnRetryableError

The OnRetryableError functions came with conditions to only log the
error if it had changed since the last iteration. While this keeps the
log file clean, it hides recurring errors and makes it harder to
identify specific errors.

The changed version always logs the recurring error and also includes
the duration since the start and the current attempt.
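
A minimal sketch of how such an always-logging callback could look, assuming a zap SugaredLogger and a hypothetical callback shape (the actual OnRetryableError signature in the retry package may differ):

```go
package main

import (
	"errors"
	"time"

	"go.uber.org/zap"
)

func main() {
	logger := zap.NewExample().Sugar()
	defer logger.Sync()

	start := time.Now()
	errConn := errors.New("dial tcp 127.0.0.1:5432: connect: connection refused")

	// Hypothetical callback: always log the recurring error together with
	// the elapsed time since the first attempt and the attempt counter.
	onRetryableError := func(attempt uint64, err error) {
		logger.Warnw("Can't connect to database. Retrying",
			zap.Error(err),
			zap.Duration("after", time.Since(start)),
			zap.Uint64("attempt", attempt),
		)
	}

	// Simulate three failing attempts.
	for attempt := uint64(1); attempt <= 3; attempt++ {
		onRetryableError(attempt, errConn)
		time.Sleep(100 * time.Millisecond)
	}
}
```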

@cla-bot added the cla/signed label (CLA is signed by all contributors of a PR) on Apr 15, 2025
@oxzi oxzi changed the title from "Logging small rework" to "Small Logging Rework: Reduce error wrapping in retry.WithBackoff and more verbosity in OnRetryableError" on Apr 15, 2025
@oxzi
Member Author

oxzi commented Apr 16, 2025

Removing "retry deadline exceeded" from the error message also results in more useful error messages for users. Take Icinga/icingadb#900 as an example.

@oxzi oxzi requested a review from lippserd May 7, 2025 07:32
@oxzi oxzi added this to the 0.6.4 milestone May 7, 2025
Member

@lippserd lippserd left a comment

retry.WithBackoff: Reduce error wrapping
...
But this additional layer does not really carry any useful information.

This helps to easily distinguish non-retryable from "timeout" errors.

Removing "retry deadline exceeded" from the error message also results in more useful error messages for users. Take Icinga/icingadb#900 as an example.

That's a different "retry deadline exceeded" error:

https://github.com/Icinga/icingadb/blob/main/pkg/icingadb/ha.go#L166

That doesn't mean I don't want to change anything, but I'm not a fan of just removing the wrapping. The ha.go:166 error could get more context, a more understandable message, or whatever. And if the wrapping is such a problem, I would change the renderer to output the error messages first and then the stack trace(s).
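
For illustration only (not the library's actual code): an extra "retry deadline exceeded" wrapping layer is not strictly needed to tell timeouts apart from non-retryable errors, since errors.Is already sees through wrapped errors. A minimal sketch:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	<-ctx.Done()

	// Wrapping with %w keeps the cause inspectable without callers having to
	// parse a "retry deadline exceeded" prefix out of the message.
	err := fmt.Errorf("can't connect to database: %w", ctx.Err())

	if errors.Is(err, context.DeadlineExceeded) {
		fmt.Println("timeout:", err)
	} else {
		fmt.Println("non-retryable:", err)
	}
}
```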

database/redis: More verbosity in OnRetryableError
...
The changed version always logs the recurring error and also includes
the duration since the start and the current attempt.

If I am not mistaken, this leads to a large number of error messages in the first few seconds, as there is not much time between retries. Can this be optimized, for example, to only log again after at least 10 seconds if the error message has not changed?
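
One rough sketch of how such throttling could look, with hypothetical names and no claim about the retry package's actual API: suppress a repeated message unless it changed or at least 10 seconds have passed since it was last written.

```go
package main

import (
	"fmt"
	"time"
)

// throttledLogger suppresses a repeated message unless it changed or the
// configured interval has elapsed since it was last written.
type throttledLogger struct {
	interval time.Duration
	lastMsg  string
	lastAt   time.Time
}

func (t *throttledLogger) Warn(msg string) {
	if msg == t.lastMsg && time.Since(t.lastAt) < t.interval {
		return // same message, logged recently: stay quiet
	}
	t.lastMsg, t.lastAt = msg, time.Now()
	fmt.Println("WARN", msg)
}

func main() {
	l := &throttledLogger{interval: 10 * time.Second}
	for i := 0; i < 5; i++ {
		l.Warn("Can't connect to database. Retrying") // printed once, then suppressed
		time.Sleep(time.Second)
	}
}
```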

@oxzi
Member Author

oxzi commented May 9, 2025

retry.WithBackoff: Reduce error wrapping
...
But this additional layer does not really carry any useful information.

This helps to easily distinguish non-retryable from "timeout" errors.

Fair enough. I just dislike having another entry in the stack trace carrying the position of this wrapping. I am going to remove this.

Removing "retry deadline exceeded" from the error message also results in more useful error messages for users. Take Icinga/icingadb#900 as an example.

That's a different "retry deadline exceeded" error:

https://github.com/Icinga/icingadb/blob/main/pkg/icingadb/ha.go#L166

You are totally right. I have successfully confused myself.

database/redis: More verbosity in OnRetryableError
...
The changed version always logs the recurring error and also includes
the duration since the start and the current attempt.

If I am not mistaken, this leads to a large number of error messages in the first few seconds, as there is not much time between retries. Can this be optimized, for example, to only log again after at least 10 seconds if the error message has not changed?

Yes, this results in more WARNING log entries, but that was my intention.

IMO, if something is broken, it should be shown and not hidden. The current behavior only shows an error once, hiding it for up to five minutes before crashing. The suggested behavior would always show the error, resulting in a few recurring entries at first - but only if the error does not recover right away.

The OnRetryableError functions came with conditions to only log the
error if it had changed since the last iteration. While this keeps the
log file clean, it hides recurring errors and makes it harder to
identify specific errors.

The changed version always logs the recurring error and also includes
the duration since the start and the current attempt.
@oxzi oxzi force-pushed the logging-small-rework branch from aa60419 to 0c7ca17 on May 9, 2025 08:59
@oxzi oxzi changed the title from "Small Logging Rework: Reduce error wrapping in retry.WithBackoff and more verbosity in OnRetryableError" to "database,redis: More verbosity in OnRetryableError" on May 9, 2025
@oxzi oxzi requested a review from lippserd May 9, 2025 08:59
@lippserd
Member

lippserd commented May 9, 2025

retry.WithBackoff: Reduce error wrapping
...
But this additional layer does not really carry any useful information.

This helps to easily distinguish non-retryable from "timeout" errors.

Fair enough. I just dislike having another entry in the stack trace carrying the position of this wrapping. I am going to remove this.

I completely understand your point. The wrapping helps us to understand errors better, but it can be confusing for users since the messages end up on multiple lines. But as I said, I think we can improve this by adjusting how we render the errors.

database/redis: More verbosity in OnRetryableError
...
The changed version always logs the recurring error and also includes
the duration since the start and the current attempt.

If I am not mistaken, this leads to a large number of error messages in the first few seconds, as there is not much time between retries. Can this be optimized, for example, to only log again after at least 10 seconds if the error message has not changed?

Yes, this results in more WARNING log entries, but that was my intention.

IMO, if something is broken, it should be shown and not hidden. The current behavior only shows an error once, hiding it for up to five minutes before crashing. The suggested behavior would always show the error, resulting in a few recurring entries at first - but only if the error does not recover right away.

I thought the log would get cluttered because of the frequent retries in the first few seconds, but there are only six messages in the first 10 seconds, so that's perfectly fine:

2025-05-09T11:33:17.565+0200	WARN	database	Can't connect to database. Retrying	{"error": "dial tcp 127.0.0.1:5432: connect: connection refused", "after": "1.30975ms", "attempt": 1}
2025-05-09T11:33:17.717+0200	WARN	database	Can't connect to database. Retrying	{"error": "dial tcp 127.0.0.1:5432: connect: connection refused", "after": "153.499917ms", "attempt": 2}
2025-05-09T11:33:18.184+0200	WARN	database	Can't connect to database. Retrying	{"error": "dial tcp 127.0.0.1:5432: connect: connection refused", "after": "620.013209ms", "attempt": 3}
2025-05-09T11:33:19.001+0200	WARN	database	Can't connect to database. Retrying	{"error": "dial tcp 127.0.0.1:5432: connect: connection refused", "after": "1.436866667s", "attempt": 4}
2025-05-09T11:33:20.775+0200	WARN	database	Can't connect to database. Retrying	{"error": "dial tcp 127.0.0.1:5432: connect: connection refused", "after": "3.211682125s", "attempt": 5}
2025-05-09T11:33:22.843+0200	WARN	database	Can't connect to database. Retrying	{"error": "dial tcp 127.0.0.1:5432: connect: connection refused", "after": "5.279387875s", "attempt": 6}
2025-05-09T11:33:30.280+0200	WARN	database	Can't connect to database. Retrying	{"error": "dial tcp 127.0.0.1:5432: connect: connection refused", "after": "12.716074292s", "attempt": 7}
2025-05-09T11:33:45.382+0200	WARN	database	Can't connect to database. Retrying	{"error": "dial tcp 127.0.0.1:5432: connect: connection refused", "after": "27.817010459s", "attempt": 8}
2025-05-09T11:34:12.283+0200	WARN	database	Can't connect to database. Retrying	{"error": "dial tcp 127.0.0.1:5432: connect: connection refused", "after": "54.717846s", "attempt": 9}

@oxzi oxzi merged commit b05aeed into main May 9, 2025
15 checks passed
@oxzi oxzi deleted the logging-small-rework branch May 9, 2025 11:24