Skip to content

logging: Include error field in journald message, Fix key encoding of all fields for journaldCore #126

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 27, 2025

Conversation

oxzi
Copy link
Member

@oxzi oxzi commented Apr 22, 2025

logging: Include error field in journald message

Our journald zapcore.Core logs every zap field as a journald field. This has the disadvantage that these fields do not shown up in the default journalctl output, effectively hiding errors. While this is documented, it is not the expected behavior for most users, resulting in incomplete bug reports.

This change introduces a new set of zap fields to be also included in the journald message. Currently, only the "error" field gets this special treatment, allowing errors to be displayed.

An exemplary Icinga DB log message would change as follows while still containing the ICINGADB_ERROR field.

database: Can't connect to database. Retrying
database: Can't connect to database. Retrying error="dial tcp [::1]:5432: connect: connection refused"

As a drive-by change, the package-global variable "priorities" was renamed to "journaldPriorities" to make it more obvious.

Fixes Icinga/icingadb#793.


logging.journaldCore: Fix key encoding of all fields

The encodeJournaldFieldKey function sanitizes the keys for the journald fields. However, calling the AddTo method on some zap fields results in multiple output fields. The keys of these additional fields are not sanitized and may not comply with journald's requirements.

This change applies name sanitation after all fields were added, thereby including also these surplus fields. As a result, errors are now also available as "ERROR_VERBOSE", including a potential stack trace.

@cla-bot cla-bot bot added the cla/signed CLA is signed by all contributors of a PR label Apr 22, 2025
@julianbrost julianbrost requested a review from yhabteab April 22, 2025 14:33
Copy link
Member

@yhabteab yhabteab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An exemplary Icinga DB log message would change as follows while still containing the ICINGADB_ERROR field.

database: Can't connect to database. Retrying
database: Can't connect to database. Retrying error="dial tcp [::1]:5432: connect: connection refused"

Well, that looks quite nice and informative but wouldn't this also include a bunch of stacktraces if the error has any? That looks even scarier and disasteriouser IMHO.

Starting Icinga DB daemon (1.3.0)
Connecting to database at 'mysql://icingadb@localhost:3306/icingadb'
database: Can't connect to database. Retrying	error="dial tcp [::1]:3306: connect: connection refused", errorVerbose="dial tcp [::1]:3306: connect: connection refused\ngithub.com/icinga/icinga-go-library/database.RetryConnector.Connect.func2\n\t/Users/yhabteab/Workspace/go/icinga-go-library/database/driver.go:71\ngithub.com/icinga/icinga-go-library/retry.WithBackoff\n\t/Users/yhabteab/Workspace/go/icinga-go-library/retry/retry.go:106\ngithub.com/icinga/icinga-go-library/database.RetryConnector.Connect\n\t/Users/yhabteab/Workspace/go/icinga-go-library/database/driver.go:48\ndatabase/sql.(*DB).conn\n\t/opt/homebrew/Cellar/go/1.24.2/libexec/src/database/sql/sql.go:1431\ndatabase/sql.(*DB).PingContext.func1\n\t/opt/homebrew/Cellar/go/1.24.2/libexec/src/database/sql/sql.go:900\ndatabase/sql.(*DB).retry\n\t/opt/homebrew/Cellar/go/1.24.2/libexec/src/database/sql/sql.go:1576\ndatabase/sql.(*DB).PingContext\n\t/opt/homebrew/Cellar/go/1.24.2/libexec/src/database/sql/sql.go:899\ndatabase/sql.(*DB).Ping\n\t/opt/homebrew/Cellar/go/1.24.2/libexec/src/database/sql/sql.go:917\nmain.run\n\t/Users/yhabteab/Workspace/go/icingadb/cmd/icingadb/main.go:65\nmain.main\n\t/Users/yhabteab/Workspace/go/icingadb/cmd/icingadb/main.go:37\nruntime.main\n\t/opt/homebrew/Cellar/go/1.24.2/libexec/src/runtime/proc.go:283\nruntime.goexit\n\t/opt/homebrew/Cellar/go/1.24.2/libexec/src/runtime/asm_arm64.s:1223"

@oxzi
Copy link
Member Author

oxzi commented Apr 24, 2025

@yhabteab: How did you created this output? It contains both the error and the errorVerbose field. The latter carries the stack trace and should have not been included here. For reference, it is not part of the output I shared in the commit message/PR description.

@yhabteab
Copy link
Member

yhabteab commented Apr 24, 2025

@yhabteab: How did you created this output?

Well, obviously I don't have systemd installed on my Mac, so I had to workaround it like this:

IcingaDB:

diff --git a/cmd/icingadb/main.go b/cmd/icingadb/main.go
index 3e2815a..b1ebc55 100644
--- a/cmd/icingadb/main.go
+++ b/cmd/icingadb/main.go
@@ -40,6 +40,7 @@ func main() {
 func run() int {
        cmd := command.New()

+       cmd.Config.Logging.Output = logging.JOURNAL
        logs, err := logging.NewLoggingFromConfig(utils.AppName(), cmd.Config.Logging)
        if err != nil {
                utils.PrintErrorThenExit(err, ExitFailure)

Icinga Go Library:

diff --git a/database/driver.go b/database/driver.go
index 5aa03aea..d83a9ed6 100644
--- a/database/driver.go
+++ b/database/driver.go
@@ -68,7 +68,7 @@ func (c RetryConnector) Connect(ctx context.Context) (driver.Conn, error) {
                                }

                                if lastErr == nil || err.Error() != lastErr.Error() {
-                                       c.logger.Warnw("Can't connect to database. Retrying", zap.Error(err))
+                                       c.logger.Warnw("Can't connect to database. Retrying", zap.Error(errors.WithStack(err)))
                                }
                        },
                        OnSuccess: func(elapsed time.Duration, attempt uint64, lastErr error) {
diff --git a/logging/journald_core.go b/logging/journald_core.go
index 263635e4..b4693fa4 100644
--- a/logging/journald_core.go
+++ b/logging/journald_core.go
@@ -6,6 +6,7 @@ import (
        "github.com/pkg/errors"
        "github.com/ssgreg/journald"
        "go.uber.org/zap/zapcore"
+       "log"
        "strings"
 )

@@ -88,6 +89,8 @@ func (c *journaldCore) Write(ent zapcore.Entry, fields []zapcore.Field) error {
        }
        message += "\t" + strings.Join(visibleFields, ", ")

+       log.Println(message)
+       return nil
        return journald.Send(message, pri, enc.Fields)
 }

I know that this specific reconnect error won't have any stacktraces, I just wanted to demonstrate how that would look like with an error with stacks.

@oxzi oxzi marked this pull request as draft April 28, 2025 14:54
@oxzi oxzi force-pushed the logging-journald-include-error-field branch from af1ac54 to 6d0fd3a Compare April 29, 2025 06:58
@oxzi
Copy link
Member Author

oxzi commented Apr 29, 2025

@yhabteab: You were right, of course. I have updated this PR to only include the required fields and no extended versions of a field, e.g., an expanding errorVerbose.

Since I required one (1) utility function which was already available elsewhere, I created #127 which is required to be merged first. Afterwards, I would rebase this PR. If you feel free, please take a look at #127, importing the utility functions from Icinga Notifications here.

@oxzi oxzi added this to the 0.6.4 milestone Apr 29, 2025
@oxzi oxzi force-pushed the logging-journald-include-error-field branch from 6d0fd3a to 372d621 Compare May 8, 2025 07:48
@oxzi oxzi marked this pull request as ready for review May 8, 2025 08:17
@oxzi oxzi requested a review from yhabteab May 8, 2025 08:17
@oxzi oxzi force-pushed the logging-journald-include-error-field branch from 372d621 to b32bdc4 Compare May 12, 2025 12:34
@oxzi oxzi requested a review from yhabteab May 12, 2025 12:36
@oxzi oxzi force-pushed the logging-journald-include-error-field branch from b32bdc4 to 403b859 Compare May 12, 2025 13:22
@oxzi oxzi requested a review from julianbrost May 12, 2025 13:22
@oxzi oxzi force-pushed the logging-journald-include-error-field branch 3 times, most recently from 2531820 to 067af0c Compare May 12, 2025 14:21
@oxzi oxzi requested a review from julianbrost May 12, 2025 14:22
@oxzi oxzi force-pushed the logging-journald-include-error-field branch from 067af0c to a520688 Compare May 14, 2025 10:39
yhabteab
yhabteab previously approved these changes May 14, 2025
Copy link
Member

@yhabteab yhabteab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LFTM!

2025/05/14 12:46:02 Starting Icinga DB daemon (1.3.0)
2025/05/14 12:46:02 Connecting to database at 'mysql://icingadb@localhost:3306/icingadb'
2025/05/14 12:46:02 database: Can't connect to database. Retrying	attempt="1", elapsed="3.8735ms", error="dial tcp [::1]:3306: connect: connection refused"

@oxzi oxzi marked this pull request as draft May 14, 2025 14:27
@oxzi oxzi changed the title logging: Include error field in journald message logging: Include error field in journald message, Fix key encoding of all fields for journaldCore May 14, 2025
@oxzi
Copy link
Member Author

oxzi commented May 14, 2025

After discussion this PR and a more border change with @lippserd, we realized that at the moment the stack traces (errorVerbose) were missing from the journald logs while being visible when using the console output.

After a short debugging session, the culprit was found: the key name sanitation was called for the original name, but was missing for derived names, such as SOMETHING_ERRORVerbose.

I have decided to add the patch to this PR since these changes are somewhat related.

@oxzi oxzi marked this pull request as ready for review May 14, 2025 14:47
@oxzi oxzi requested review from yhabteab and lippserd May 14, 2025 14:47
@oxzi
Copy link
Member Author

oxzi commented May 20, 2025

@lippserd: This worked for me.

These lines...

c.logger.Warnw("Can't connect to database. Retrying", zap.Error(errors.Wrap(err, "zap oops")))
c.logger.Warnw("Can't connect to database. Retrying", logging.Error(errors.Wrap(err, "igl oops")))

...produced resulted in the following output:

May 15 17:01:56 nwt14 icingadb[566989]: database: Can't connect to database. Retrying        error="zap oops: dial tcp [::1]:3306: connect: connection refused"
May 15 17:01:56 nwt14 icingadb[566989]: database: Can't connect to database. Retrying        error="igl oops: dial tcp [::1]:3306: connect: connection refused"

When viewing journalctl with verbose output, the ICINGADB_ERROR_VERBOSE field is only present for the "zap oops" entry.

@lippserd lippserd force-pushed the logging-journald-include-error-field branch from 5cccb75 to 5dd0a44 Compare May 27, 2025 07:51
oxzi and others added 3 commits May 27, 2025 09:51
Our journald zapcore.Core logs every zap field as a journald field. This
has the disadvantage that these fields do not shown up in the default
journalctl output, effectively hiding errors. While this is documented,
it is not the expected behavior for most users, resulting in incomplete
bug reports.

This change introduces a new set of zap fields to be also included in
the journald message. Currently, only the "error" field gets this
special treatment, allowing errors to be displayed.

An exemplary Icinga DB log message would change as follows while still
containing the ICINGADB_ERROR field.

> database: Can't connect to database. Retrying
> database: Can't connect to database. Retrying	error="dial tcp [::1]:5432: connect: connection refused"

As a drive-by change, the package-global variable "priorities" was
renamed to "journaldPriorities" to make it more obvious.

Fixes Icinga/icingadb#793.
The encodeJournaldFieldKey function sanitizes the keys for the journald
fields. However, calling the AddTo method on some zap fields results in
multiple output fields. The keys of these additional fields are not
sanitized and may not comply with journald's requirements.

This change applies name sanitation after all fields were added, thereby
including also these surplus fields. As a result, errors are now also
available as "ERROR_VERBOSE", including a potential stack trace.
The `Error` function returns a `zap.Field` for logging errors. It checks if an
error includes a stack trace from the `pkg/errors` library. If a stack trace is
present, it is suppressed in the log output as logging a stack trace is
unnecessary. Errors without a stack trace are logged normally.
@lippserd lippserd force-pushed the logging-journald-include-error-field branch from 5dd0a44 to ca887af Compare May 27, 2025 07:51
@oxzi
Copy link
Member Author

oxzi commented May 27, 2025

@lippserd: Thanks for rebasing.

@oxzi oxzi merged commit 33bda3e into main May 27, 2025
15 checks passed
@oxzi oxzi deleted the logging-journald-include-error-field branch May 27, 2025 11:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla/signed CLA is signed by all contributors of a PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Logging to journald misses error messages
4 participants