Skip to content

Adjust GcpLayout JSON to latest format #3586

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: 2.x
Choose a base branch
from

Conversation

ViliusS
Copy link

@ViliusS ViliusS commented Apr 2, 2025

This is my first contribution to Log4J so I'm not 100% sure if I did everything right, but here it goes.

This PR adjusts GcpLayout JSON formatting to match what is expected by GCP.

First, it formats the log timestamp field to the correct format recognized by Fluent-Bit (component of Google Cloud Logging) and Google Ops Agent. I'm not sure how it worked before, probably it didn't, but Google Cloud Logging expects timestampSecond + timestampNanos format. Ops Agent, which also can be used to feed the logs from external Log4J environment via Socket or RollingFile into GCP, also expects the same format. More over, the same timestamp format is always used in Spring GCP libraries.

Secondly, severity field now must be prefixed with logging.googleapis.com. I'm not 100% sure regarding this, because Spring GCP libraries are still using just severity field name. However Ops Agent (link above) already uses logging.googleapis.com/severity, and simple severity name is only described in legacy logging agent which is deprecated.

Third, counter cannot be used for logging.googleapis.com/insertId as it is duplicated on different threads. I've chosen to use time-based UUID, but if that increases garbage collection issues it safely can be removed. GCP will generate their own IDs anyway.

And the last but not the least, I have renamed exception, thread and logger fields. I saw the PR discussion in #543, however these fields are pretty standard when logging via Logback's JSON layout and then they are reused in Google's Spring GCP libraries. Sadly, I didn't find how to completely remove exception field from JSON output if it's empty. Looks like all empty fields are removed except for pattern type. I also found https://issues.apache.org/jira/browse/LOG4J2-3053 regarding that, but not sure about its state. This is not a big issue but is preferable for GCP.
By the way, Log4J docs are a little bit incorrect. If message field exist, having separate exception does nothing to catch the error in Google Error Reporting. According to Google docs, if message field exist it must contain stack trace and only that stack trace is used in the Error Reporting UI. I tested this extensively. But I left separate exception field if one wants to filter it in the logs.

All in all, this should provide much better JSON field mapping experience on GCP. If you think this PR is usable I then will go ahead and fix tests for timestamp processing.

Copy link
Contributor

@ppkarwasz ppkarwasz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ViliusS,

Thanks for you contribution.

Did you test it on an actual Google Cloud instance? I am not a GCP user, but some of the changes you propose differ from the documented log format.

Copy link

github-actions bot commented Apr 2, 2025

Job Requested goals Build Tool Version Build Outcome Build Scan®
build-macos-latest clean install 3.9.8 Build Scan PUBLISHED
build-ubuntu-latest clean install 3.9.8 Build Scan PUBLISHED
build-windows-latest clean install 3.9.8 Build Scan PUBLISHED
Generated by gradle/develocity-actions

@ViliusS
Copy link
Author

ViliusS commented Apr 2, 2025

@ViliusS,

Thanks for you contribution.

Did you test it on an actual Google Cloud instance? I am not a GCP user, but some of the changes you propose differ from the documented log format.

Yes I did. Here are some screenshots.

Exception caught because of the stack trace in message field:
image
Expanded jsonPayload fields in the logging:
image
Proof about automatic timestamp mapping (previously this field was generated by Google and the one sent from Log4J lived in jsonPayload.timestamp:
image

The same error caught in Error Reporting UI:
image

@ViliusS ViliusS requested a review from ppkarwasz April 2, 2025 14:48
@ppkarwasz ppkarwasz requested review from vy and removed request for ppkarwasz April 13, 2025 06:59
@ppkarwasz
Copy link
Contributor

Let's wait for @vy's review.

@ViliusS, in this repository we require signed commits, so you need to:

  1. Add your GPG public key to your GitHub Settings if you didn't do that already.
  2. Rebase the PR to resign all the commits.

@vy vy added layouts Affects one or more Layout plugins bug Incorrect, unexpected, or unintended behavior of existing code labels Apr 17, 2025
ViliusS added 7 commits April 21, 2025 12:32
First, it formats the log timestamp field to the correct format
recognized by Fluent-Bit (component of Google Cloud Logging) and Google
Ops Agent.

Secondly, severity field now must be prefixed with
logging.googleapis.com.

Third, counter cannot be used for insertId as it is duplicated on
different threads.

And the last but not the least, exception, thread and logger fields are
pretty standard when logging via Logback's JSON layout and Google's
Spring GCP libraries. Field name changes now match these other loggers.
@ViliusS ViliusS force-pushed the gcplayout-improvements branch from 718ccbd to 5e0656e Compare April 21, 2025 09:34
@ViliusS
Copy link
Author

ViliusS commented Apr 21, 2025

Let's wait for @vy's review.

@ViliusS, in this repository we require signed commits, so you need to:

  1. Add your GPG public key to your GitHub Settings if you didn't do that already.
  2. Rebase the PR to resign all the commits.

Done.

@vy
Copy link
Member

vy commented Apr 23, 2025

One particular thing that is bothering me about this fix is... it breaks backward compatibility. We can say we're fixing the schema, but for users who have already been using the broken schema, we are gonna create trouble. We can

  • Cross our fingers, release the fixed schema, and think of a solution if we ever happen to receive a complaint
  • Release the new schema under a different name (e.g., GcpLayout2.json)

@ppkarwasz, thoughts?

@ViliusS, would you mind adding a changelog entry file (i.e., src/changelog/.2.x.x/3586_fix_GcpLayout.xml), please? (Mind the entry.type attribute and entry > issue element.)

@ppkarwasz
Copy link
Contributor

One particular thing that is bothering me about this fix is... it breaks backward compatibility. We can say we're fixing the schema, but for users who have already been using the broken schema, we are gonna create trouble. We can

  • Cross our fingers, release the fixed schema, and think of a solution if we ever happen to receive a complaint
  • Release the new schema under a different name (e.g., GcpLayout2.json)

I would call the new format GoogleOpsAgent.json, FluentBit.json or something similar.

However, I am not entirely sure that this template file belongs to Log4j. The only users that can profit from it are Google Ops Agent users, so we might consider moving the file to that project instead. See also GoogleCloudPlatform/ops-agent#1920 (comment)

@ViliusS
Copy link
Author

ViliusS commented Apr 23, 2025

One particular thing that is bothering me about this fix is... it breaks backward compatibility. We can say we're fixing the schema, but for users who have already been using the broken schema, we are gonna create trouble. We can

  • Cross our fingers, release the fixed schema, and think of a solution if we ever happen to receive a complaint
  • Release the new schema under a different name (e.g., GcpLayout2.json)

@ppkarwasz, thoughts?

@ViliusS, would you mind adding a changelog entry file (i.e., src/changelog/.2.x.x/3586_fix_GcpLayout.xml), please? (Mind the entry.type attribute and entry > issue element.)

Changelog added.

Regarding backward compatibility I see it this way. The proposed changes are not really fixes, they are changes/improvements. Old schema probably worked in some sense, but required a little bit tinkering for proper GCP field mapping on Google side. It is natural that schemas change and they need to be re-adjusted, there is no way to avoid that when dealing with 3rd party services. The real question is when backward compatibility and how it should be broken.
You are right that some users could be using old schema already, so let's try to list what's actually changed and the use cases.

  1. The timestamp field users who just send the data from previous field directly to GCP not realizing that it is not mapped to real log timestamp. Nothing serious changes for them, new timestamp fields will be sent to GCP and automatically mapped to log timestamp. They will be missing previous jsonPayload.timestamp field on GCP side, but that should not break anything serious.
  2. Timestamp field users who processed current field values with Ops Agent or any other tool to be able to map it to GCP timestamp. Such tools usually just ignore missing field, but again information is now sent to GCP's timestamp field automatically, so nothing is lost.
  3. Removed insertId field. Nothing should depend on this field. It is for internal GCP usage and will now be generated automatically.
  4. Renamed/Changed exception format and removed separate class and message subfields. Some could depend on these fields in Google Cloud Monitoring but would consider it as an edge case, since the exception was/is duplicated in main message field anyway and was/is caught by the Google Error Reporting automatically. There was/is not need to configure separate monitoring.
  5. Renamed thread and logger fields. Some could depend on these fields in Google Cloud Monitoring but, again, I would consider it as an edge case.

Considering that new adjustments makes it a lot easier to send logs to GCP now without major processing in between, and if they are included in 2.25.0, I think it is fair.

@ppkarwasz I wouldn't call it OpsAgent. This format could be used to print directly to stdout which is then automatically sent to GCP on Google's Kubernetes Engine or serverless AppEngine services. I would not differentiate it from ECS or Gelf you already support.
I also don't think it belongs to Ops Agent repo. If, for example, I want to use two monitoring systems and both require different formats moving JSON template elsewhere means that I must use Ops Agent for one monitoring solution, and then use different agent for another solution. Having JSON templates in central repo makes things must easier and more transparent.

@ppkarwasz
Copy link
Contributor

I wouldn't call it OpsAgent. This format could be used to print directly to stdout which is then automatically sent to GCP on Google's Kubernetes Engine or serverless AppEngine services. I would not differentiate it from ECS or Gelf you already support.

Looking at GoogleCloudPlatform/ops-agent#1920 (comment) there are currently three different ways to interpret structured logs in Google Cloud:

  1. The legacy fluentd-based agent, which we can probably disregard.
  2. The default out_stackdriver configuration.
  3. The special fields interpreted by OpsAgent.

So I think we should use a template name that suggests, which of these formats is supported, like GoogleOpsAgent.json. This way we can further specialize the format by:

  • Replacing severity with logging.googleapis.com/severity.
  • Removing the unsupported logging.googleapis.com/trace_sampled field.

Note: We can also add multiple templates, one for stackdriver and one for OpsAgent. What do you think? To properly maintain the new templates, for me it is important to know exactly which specification we need to follow.

@ppkarwasz
Copy link
Contributor

Regarding the logging.googleapis.com/trace_sampled field I submitted open-telemetry/opentelemetry-java-instrumentation#13774, so we don't need to hard code it to true.

@ViliusS
Copy link
Author

ViliusS commented Apr 25, 2025

I wouldn't call it OpsAgent. This format could be used to print directly to stdout which is then automatically sent to GCP on Google's Kubernetes Engine or serverless AppEngine services. I would not differentiate it from ECS or Gelf you already support.

Looking at GoogleCloudPlatform/ops-agent#1920 (comment) there are currently three different ways to interpret structured logs in Google Cloud:

  1. The legacy fluentd-based agent, which we can probably disregard.
  2. The default out_stackdriver configuration.
  3. The special fields interpreted by OpsAgent.

So I think we should use a template name that suggests, which of these formats is supported, like GoogleOpsAgent.json. This way we can further specialize the format by:

  • Replacing severity with logging.googleapis.com/severity.
  • Removing the unsupported logging.googleapis.com/trace_sampled field.

Note: We can also add multiple templates, one for stackdriver and one for OpsAgent. What do you think? To properly maintain the new templates, for me it is important to know exactly which specification we need to follow.

I thought about this earlier, but for me it looked like too few differences to support different templates. Also, because of those subtle differences, it could be complicated to document for the end users which template in which case needs to be used.

Given that OpsAgent is just one tool to process logs, and folks could be using any other log sending methods or processors, like for example native OTLP one, there is a question how many of these JSON templates we should have.

That's why I tried to come up with a format which is as close as possible natively supported by Google Cloud Looging itself, so you will need as few pre-processing as possible. Sure, you still need to apply few tricks in every processor (in OpsAgent case for severity and trace_sampled fields), but in my mind these tools are there exactly for that reason, i.e. match log fields from source to destination, in case of differences.

Nevertheless, I will support any decision and can re-adjust current PR if you decide otherwise.

@ViliusS ViliusS force-pushed the gcplayout-improvements branch from 8b17cb8 to 9e8a03b Compare April 25, 2025 14:05
@ppkarwasz
Copy link
Contributor

@vy,

What do you think about two new templates instead of a single one?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect, unexpected, or unintended behavior of existing code layouts Affects one or more Layout plugins
Projects
Status: To triage
Development

Successfully merging this pull request may close these issues.

3 participants