Skip to content

Feature request: Parse exceptions into structured logs #1268

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
1 of 2 tasks
Nogginboink opened this issue Jun 29, 2022 · 10 comments · Fixed by #3147
Closed
1 of 2 tasks

Feature request: Parse exceptions into structured logs #1268

Nogginboink opened this issue Jun 29, 2022 · 10 comments · Fixed by #3147
Assignees
Labels

Comments

@Nogginboink
Copy link

Use case

When using the logger and logging an exception, the exception traceback appears as a single string that includes the stack trace, each frame of which has a file, a line number, a function name, and the statement on that line. Parsing that line, either with the Mk I eyeball or with code, is difficult

Solution/User Experience

Because logger is all about structured logging, it would be nice if the stack backtrace were structured as well. Something like:

Traceback = [
  {
    "File" : "myfile.py",
    "Line" : 44,
    "Function" : "do_something_risky",
    "Statement": "do_division(x, 0)"
  },
  {
    "File" : "myotherfile.py",
    "Line": 12,
    "Function" : "do_division",
    "Statement" : "return numerator/denominator
  }
]

Alternative solutions

There are no good ones that I'm aware of. I might be able to overload the existing functions in the traceback module, but doing that for every lambda function in my solution is not ideal

Acknowledgment

@Nogginboink Nogginboink added feature-request feature request triage Pending triage from maintainers labels Jun 29, 2022
@heitorlessa
Copy link
Contributor

That's a great idea @Nogginboink! Are you thinking of including traceback field in addition to exception_name and exception? Or an opt-in like POWERTOOLS_LOGGER_ENHANCED_TRACEBACK behaviour that replaces the value of exception with that list?

We should definitely look into making this a thing after our pause ends (end of the month). We could ping back here after the pause to ask whether you're up to contribute w/ any guidance or for us to schedule it along with our backlog.

Thank you!

@heitorlessa heitorlessa added area/logger and removed triage Pending triage from maintainers labels Jul 4, 2022
@heitorlessa heitorlessa added the help wanted Could use a second pair of eyes/hands label Aug 1, 2022
@ppavlovdev
Copy link

Hey @Nogginboink @heitorlessa! I would like to help you guys to implement this feature.

Regarding to your conversation, I think we should stick to additional traceback field rather than replacing exception field. Let's assume that someone somewhere will want to process raw exception in a custom way.

We can add include_traceback or format_traceback flag to LambdaPowertoolsFormatter constructor for more flexibility. Keep in mind that there possibly can be a lot of data in traceback and in some cases it can cause user-specific problems.

@kbakk
Copy link
Contributor

kbakk commented Jan 7, 2023

Wanted to see if there are some standards.

OpenTelemetry came to mind, but they seem to model it as a string under exception.stacktrace (https://opentelemetry.io/docs/reference/specification/logs/semantic_conventions/exceptions/#attributes).

Google only mentions that it should be part of the message key/val (https://cloud.google.com/logging/docs/structured-logging), but I couldn't find how to actually format it.

Sentry have a lot richer model (full example in their demo site - https://try.sentry-demo.com/organizations/on-caiman/issues/?project=141374&query=&referrer=issue-list&statsPeriod=14d):

...
"exception": {
    "values": [
        {
            "type": "SSLError",
            "value": "('The certificate belonging to https://app.example.com has expired.',)",
            "module": "ssl",
            "stacktrace": {
                "frames": [
                    {
                        "function": "<module>",
                        "module": "__main__",
                        "filename": "purchase.py",
                        "abs_path": "/empowerplant/purchase.py",
                        "lineno": 36,
                        "pre_context": [
                            "    generate_invoice(status)",
                            "    return 200",
                            "",
                            "main(request={",
                            "    \"ccno\": ****************,"
                        ],
                        "context_line": "    \"cart\": [11.69, 4.95, 5.99],",
                        "post_context": [
                            "})"
                        ],
                        "in_app": true,
                        "vars": {
                            "__annotations__": {},
                            "__builtins__": "<module 'builtins' (built-in)>",
                            "__cached__": "None",
                            "__doc__": "None",
                            "__file__": "'purchase.py'",
                            "__loader__": "<_frozen_importlib_external.SourceFileLoader object at 0x107f5c850>",
                            "__name__": "'__main__'",
                            "__package__": "None",
                            "__spec__": "None",
                            "init": "<function _init at 0x1088598c0>"
                        },
                        "data": {
                            "orig_in_app": 1
                        }
                    },
...

This seems to be (at least part of) the code responsible for the frame formatting/extraction:

https://github.com/getsentry/sentry-python/blob/2f67f12e405f8a6f89418d96071158367fcf516f/sentry_sdk/utils.py#L551

Because there are so many different use cases, it could be a good idea to account for that the users might want to format the tracebacks differently, by allowing to pass a richer set of config options when initializing the logger. E.g. I might want filename and lineno, others might want a lot more, maybe even run a custom data scrubbing function.

@stevrobu
Copy link
Contributor

I'd like to take this request on.

@leandrodamascena
Copy link
Contributor

Hi @stevrobu! It will be amazing if you can work on this issue.

Do you need any help? I can help you with some guidance if needed.

Thanks a lot

@stevrobu
Copy link
Contributor

@leandrodamascena - Yes. I would appreciate some help. I will connect with you offline.

@leandrodamascena
Copy link
Contributor

Hi @stevrobu! I was playing around with this issue and managed to add a new field with the exception formatted in JSON. The code isn't good yet and missing important things, but I think it's a good starting point. What do you think we talk so I can show you what I did and you can follow?

Code example:

import requests

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.typing import LambdaContext

ENDPOINT = "http://httpbin.org/status/500"
logger = Logger()


try:
    ret = requests.get(ENDPOINT)
    ret.raise_for_status()
except requests.HTTPError as e:
    logger.exception("Received a HTTP 5xx error")
    raise RuntimeError("Unable to fullfil request") from e

Output:

{
   "level":"ERROR",
   "location":"<module>:15",
   "message":"Received a HTTP 5xx error",
   "timestamp":"2023-09-27 23:38:05,190+0100",
   "service":"service_undefined",
   "exception":"Traceback (most recent call last):\n  File \"/tmp/xyz/leo.py\", line 13, in <module>\n    ret.raise_for_status()\n  File \"/tmp/xyz/.venv/lib64/python3.10/site-packages/requests/models.py\", line 1021, in raise_for_status\n    raise HTTPError(http_error_msg, response=self)\nrequests.exceptions.HTTPError: 500 Server Error: INTERNAL SERVER ERROR for url: http://httpbin.org/status/500",
   "traceback":[
      {
         "exc_type":"HTTPError",
         "exc_value":"500 Server Error: INTERNAL SERVER ERROR for url: http://httpbin.org/status/500",
         "syntax_error":null,
         "is_cause":false,
         "frames":[
            {
               "filename":"/tmp/xyz/leo.py",
               "lineno":13,
               "name":"<module>",
               "line":"",
               "locals":null
            },
            {
               "filename":"/tmp/xyz/.venv/lib64/python3.10/site-packages/requests/models.py",
               "lineno":1021,
               "name":"raise_for_status",
               "line":"",
               "locals":null
            }
         ]
      }
   ],
   "exception_name":"HTTPError"
}

@stevrobu
Copy link
Contributor

@leandrodamascena - Sure. Let's talk. I have a pull request ready, but let's compare code first.

@leandrodamascena
Copy link
Contributor

Wanted to see if there are some standards.

Sentry have a lot richer model (full example in their demo site - try.sentry-demo.com/organizations/on-caiman/issues/?project=141374&query=&referrer=issue-list&statsPeriod=14d):

I really like the way the Sentry creates exception and we try to add something like that.

...
"exception": {
    "values": [
        {
            "type": "SSLError",
            "value": "('The certificate belonging to https://app.example.com has expired.',)",
            "module": "ssl",
            "stacktrace": {
                "frames": [
                    {
                        "function": "<module>",
                        "module": "__main__",
                        "filename": "purchase.py",
                        "abs_path": "/empowerplant/purchase.py",
                        "lineno": 36,
                        "pre_context": [
                            "    generate_invoice(status)",
                            "    return 200",
                            "",
                            "main(request={",
                            "    \"ccno\": ****************,"
                        ],
                        "context_line": "    \"cart\": [11.69, 4.95, 5.99],",
                        "post_context": [
                            "})"
                        ],
                        "in_app": true,
                        "vars": {
                            "__annotations__": {},
                            "__builtins__": "<module 'builtins' (built-in)>",
                            "__cached__": "None",
                            "__doc__": "None",
                            "__file__": "'purchase.py'",
                            "__loader__": "<_frozen_importlib_external.SourceFileLoader object at 0x107f5c850>",
                            "__name__": "'__main__'",
                            "__package__": "None",
                            "__spec__": "None",
                            "init": "<function _init at 0x1088598c0>"
                        },
                        "data": {
                            "orig_in_app": 1
                        }
                    },
...

This seems to be (at least part of) the code responsible for the frame formatting/extraction:

getsentry/sentry-python@2f67f12/sentry_sdk/utils.py#L551

Because there are so many different use cases, it could be a good idea to account for that the users might want to format the tracebacks differently, by allowing to pass a richer set of config options when initializing the logger. E.g. I might want filename and lineno, others might want a lot more, maybe even run a custom data scrubbing function.

I'm not sure if we will have any experience gained if we allow the customer to add/remove fields from the stacktrace. This is a stracktrace, the customer generally wants to know the error that occurred and the frames where the error occurred in order to parse it. If I allowed the customer to remove the line, for example, how will they find the error? I think we should define a standard and use it. What do you think @kbakk and @stevrobu?

We have a PR - #3147. In addition to some bugs that we need to fix, I would like to hear your feedback.

cc @heitorlessa

Thank you.

@leandrodamascena leandrodamascena moved this from Backlog to Working on it in Powertools for AWS Lambda (Python) Oct 2, 2023
@leandrodamascena leandrodamascena removed the help wanted Could use a second pair of eyes/hands label Oct 2, 2023
@github-project-automation github-project-automation bot moved this from Working on it to Coming soon in Powertools for AWS Lambda (Python) Oct 10, 2023
@github-actions
Copy link
Contributor

⚠️COMMENT VISIBILITY WARNING⚠️

This issue is now closed. Please be mindful that future comments are hard for our team to see.

If you need more assistance, please either tag a team member or open a new issue that references this one.

If you wish to keep having a conversation with other community members under this issue feel free to do so.

@leandrodamascena leandrodamascena self-assigned this Oct 11, 2023
@leandrodamascena leandrodamascena moved this from Coming soon to Shipped in Powertools for AWS Lambda (Python) Oct 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Shipped
Development

Successfully merging a pull request may close this issue.

6 participants