Skip to content

Corrupt msgpack encoding #970

@ngraef

Description

@ngraef

Describe the bug

I noticed my agents are periodically crashing after a bunch of trace decoding errors.

Agent log
[...]
2020-05-28 22:21:20 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "array" with method for "str"
2020-05-28 22:21:20 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "array" with method for "str"
2020-05-28 22:21:20 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "float64" with method for "str"
2020-05-28 22:21:21 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "map" with method for "str"
2020-05-28 22:21:21 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "map" with method for "str"
2020-05-28 22:21:21 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "int" with method for "str"
2020-05-28 22:21:22 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "int" with method for "str"
2020-05-28 22:21:22 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "array" with method for "str"
2020-05-28 22:21:22 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "int" with method for "str"
2020-05-28 22:21:22 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "int" with method for "str"
2020-05-28 22:21:23 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "map" with method for "str"
2020-05-28 22:21:23 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "map" with method for "str"
2020-05-28 22:21:23 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "int" with method for "str"
2020-05-28 22:21:24 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "int" with method for "str"
2020-05-28 22:21:24 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "array" with method for "str"
2020-05-28 22:21:24 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "map" with method for "str"
2020-05-28 22:21:24 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "array" with method for "str"
2020-05-28 22:21:25 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "map" with method for "str"
2020-05-28 22:21:25 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "map" with method for "str"
2020-05-28 22:21:25 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "int" with method for "str"
2020-05-28 22:21:26 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "int" with method for "str"
2020-05-28 22:21:26 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "array" with method for "str"
2020-05-28 22:21:26 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "int" with method for "str"
2020-05-28 22:21:26 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "array" with method for "str"
2020-05-28 22:21:27 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "map" with method for "str"
2020-05-28 22:21:27 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "map" with method for "str"
2020-05-28 22:21:27 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "int" with method for "str"
2020-05-28 22:21:28 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "int" with method for "str"
2020-05-28 22:21:28 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "array" with method for "str"
2020-05-28 22:21:28 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "int" with method for "str"
2020-05-28 22:21:28 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "int" with method for "str"
2020-05-28 22:21:29 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "map" with method for "str"
2020-05-28 22:21:29 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "map" with method for "str"
2020-05-28 22:21:29 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "int" with method for "str"
2020-05-28 22:21:30 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "int" with method for "str"
2020-05-28 22:21:30 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "array" with method for "str"
2020-05-28 22:21:30 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "int" with method for "str"
2020-05-28 22:21:30 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: unexpected EOF
2020-05-28 22:21:31 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "map" with method for "str"
2020-05-28 22:21:31 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "map" with method for "str"
2020-05-28 22:21:31 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "int" with method for "str"
2020-05-28 22:21:32 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "int" with method for "str"
2020-05-28 22:21:32 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "array" with method for "str"
2020-05-28 22:21:32 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "int" with method for "str"
2020-05-28 22:21:32 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: unexpected EOF
2020-05-28 22:21:33 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "map" with method for "str"
2020-05-28 22:21:33 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "map" with method for "str"
2020-05-28 22:21:33 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "int" with method for "str"
2020-05-28 22:21:33 UTC | TRACE | INFO | (pkg/trace/info/stats.go:108 in LogStats) | [lang:nodejs lang_version:v12.16.1 interpreter:v8 tracer_version:0.18.1] -> traces received: 180, traces filtered: 0, traces amount: 94650 bytes, events extracted: 0, events sampled: 0
2020-05-28 22:21:33 UTC | TRACE | INFO | (pkg/trace/info/stats.go:108 in LogStats) | [lang:nodejs lang_version:v12.16.1 interpreter:v8 tracer_version:0.19.1] -> traces received: 30, traces filtered: 0, traces amount: 13440 bytes, events extracted: 0, events sampled: 0
2020-05-28 22:21:33 UTC | TRACE | WARN | (pkg/trace/info/stats.go:111 in LogStats) | [lang:nodejs lang_version:v12.16.1 interpreter:v8 tracer_version:0.19.1] -> traces_dropped(decoding_error:39697). Enable debug logging for more details.
2020-05-28 22:21:34 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "int" with method for "str"
2020-05-28 22:21:34 UTC | TRACE | ERROR | (pkg/trace/api/api.go:379 in handleTraces) | Cannot decode v0.4 traces payload: msgp: attempted to decode type "array" with method for "str"
2020-05-28 22:21:34 UTC | CORE | WARN | (pkg/logs/input/docker/launcher.go:268 in restartTailer) | Could not resume tailing container 7f1dfb34d821
seelog internal error: write /var/log/datadog/agent.log: cannot allocate memory
2020-05-28 22:21:36 UTC | CORE | WARN | (pkg/logs/auditor/auditor.go:172 in run) | write /opt/datadog-agent/run/registry.json: cannot allocate memory
seelog internal error: write /var/log/datadog/agent.log: cannot allocate memory
2020-05-28 22:21:37 UTC | PROCESS | CRITICAL | (pkg/process/util/signal_nowindows.go:21 in HandleSignals) | Caught signal 'terminated'; terminating.
2020-05-28 22:21:37 UTC | PROCESS | INFO | (pkg/forwarder/forwarder.go:243 in Stop) | stopping the Forwarder
2020-05-28 22:21:37 UTC | PROCESS | INFO | (pkg/forwarder/domain_forwarder.go:200 in Stop) | domainForwarder stopped
2020-05-28 22:21:37 UTC | PROCESS | INFO | (pkg/forwarder/forwarder.go:243 in Stop) | stopping the Forwarder
2020-05-28 22:21:37 UTC | CORE | INFO | (cmd/agent/app/run.go:104 in func2) | Received signal 'terminated', shutting down...
2020-05-28 22:21:37 UTC | TRACE | INFO | (main.go:23 in handleSignal) | received signal 15 (terminated)
2020-05-28 22:21:37 UTC | CORE | INFO | (pkg/collector/runner/runner.go:149 in Stop) | Runner is shutting down...
2020-05-28 22:21:37 UTC | CORE | INFO | (pkg/collector/python/subprocesses.go:48 in TerminateRunningProcesses) | Canceling all running python subprocesses
2020-05-28 22:21:37 UTC | CORE | INFO | (pkg/collector/runner/runner.go:166 in func1) | Stopping Check kubelet that is still running...
2020-05-28 22:21:37 UTC | CORE | INFO | (pkg/collector/runner/runner.go:166 in func1) | Stopping Check disk that is still running...
2020-05-28 22:21:37 UTC | CORE | INFO | (pkg/aggregator/aggregator.go:639 in run) | Stopping aggregator
ifelse: fatal: unable to spawn s6-test: Out of memory
2020-05-28 22:21:38 UTC | CORE | WARN | (pkg/logs/auditor/auditor.go:172 in run) | write /opt/datadog-agent/run/registry.json: cannot allocate memory
backtick: fatal: unable to exec importas: Out of memory

So I turned on debug logging in dd-trace in my app to inspect the trace payloads. Here's a sample span that was logged by the tracer:

Encoding trace: [{"trace_id":"31731c1fac15ceea","span_id":"31731c1fac15ceea","parent_id":"0000000000000000","name":"pg.query","resource":"INSERT INTO \"user_role\"(\"roleId\", \"userId\") VALUES ($1, $2)","error":0,"meta":{"service":"moleculer","version":"0.14.7","span.kind":"client","db.type":"postgres","db.name":"iam","db.user":"velocity","out.host":"postgres"},"metrics":{"_sample_rate":1,"out.port":5432,"_dd.agent_psr":1,"_sampling_priority_v1":1},"start":1590733517735480600,"duration":1671143,"service":"moleculer-postgres","type":"sql"}]
Added encoded trace to buffer: 91 8c a8 74 72 61 63 65 5f 69 64 cf 31 73 1c 1f ac 15 ce ea b9 6e 28 38 97 e4 3d 34 cf 31 73 1c 1f ac 15 ce ea 98 b4 51 b9 6c b6 c0 16 13 6b b4 d0 ee 4d 00 a8 64 75 72 61 74 69 6f 6e d3 00 00 00 00 00 19 7f e7 a5 65 72 72 6f 72 00 a9 70 61 72 65 6e 74 5f 69 64 cf 00 00 00 00 00 00 00 00 a4 6e 61 6d 65 a8 70 67 2e 71 75 65 72 79 a8 72 65 73 6f 75 72 63 65 d9 3b 49 4e 53 45 52 54 20 49 4e 54 4f 20 22 75 73 65 72 5f 72 6f 6c 65 22 28 22 72 6f 6c 65 49 64 22 2c 20 22 75 73 65 72 49 64 22 29 20 56 41 4c 55 45 53 20 28 24 31 2c 20 24 32 29 a7 73 65 72 76 69 63 65 b2 6d 6f 6c 65 63 75 6c 65 72 2d 70 6f 73 74 67 72 65 73 a4 74 79 70 65 a3 73 71 6c a4 6d 65 74 61 87 a7 73 65 72 76 69 63 65 a9 6d 6f 6c 65 63 75 6c 65 72 a7 76 65 72 73 69 6f 6e a6 30 2e 31 34 2e 37 a9 73 70 61 6e 2e 6b 69 6e 64 a6 63 6c 69 65 6e 74 a7 64 62 2e 74 79 70 65 a8 70 6f 73 74 67 72 65 73 a7 64 62 2e 6e 61 6d 65 a3 69 61 6d a7 64 62 2e 75 73 65 72 a8 76 65 6c 6f 63 69 74 79 a8 6f 75 74 2e 68 6f 73 74 a8 70 6f 73 74 67 72 65 73 a7 6d 65 74 72 69 63 73 84 ac 5f 73 61 6d 70 6c 65 5f 72 61 74 65 cb 3f f0 00 00 00 00 00 00 a8 6f 75 74 2e 70 6f 72 74 cb 40 b5 38 00 00 00 00 00 ad 5f 64 64 2e 61 67 65 6e 74 5f 70 73 72 cb 3f f0 00 00 00 00 00 00 b5 5f 73 61 6d 70 6c 69 6e 67 5f 70 72 69 6f 72 69 74 79 5f 76 31 cb 3f f0 00 00 00 00 00 00

The printed buffer appears to be corrupt:

91 = 1-item array
  8c = 12-entry map
    --
    a8 = 8-byte string
      74 72 61 63 65 5f 69 64 = "trace_id"
    cf = uint64
      31 73 1c 1f ac 15 ce ea
    --
    b9 = 25-byte string
      6e 28 38 97 e4 3d 34 cf 31 73 1c 1f ac 15 ce ea 98 b4 51 b9 6c b6 c0 16 13 = garbage
    6b = 107
    --
    b4 = 20-byte string
      d0 ee 4d 00 a8 64 75 72 61 74 69 6f 6e d3 00 00 00 00 00 19 = garbage, including the string "duration"
    7f = 127
    --
[...]

It should be:

91 = 1-item array
  8c = 12-entry map
    --
    a8 = 8-byte string
      74 72 61 63 65 5f 69 64 = "trace_id"
    cf = uint64
      31 73 1c 1f ac 15 ce ea
    --
-   b9 6e 28 38 97 e4 3d 34
+   a7 = 7-byte string
+     73 70 61 6e 5f 69 64 = "span_id"
    cf = uint64
      31 73 1c 1f ac 15 ce ea
    --
-   98 b4 51 b9 6c b6 c0 16 13 6b b4 d0 ee 4d 00
+   a5 = 5-byte string
+     73 74 61 72 74 = "start"
+   d3 = int64
+     16 13 6b b4 d0 ee 4d 18 = 1590733517735480600
    --
    a8 = 8-byte string
      64 75 72 61 74 69 6f 6e = "duration"
    d3 = int64
      00 00 00 00 00 19 7f e7 = 1671143
    --
    a5 = 5-byte string
      65 72 72 6f 72 = "error"
    00 = 0
    --
    a9 = 9-byte string
      70 61 72 65 6e 74 5f 69 64 = "parent_id"
    cf = uint64
      00 00 00 00 00 00 00 00
    --
    a4 = 4-byte string
      6e 61 6d 65 = "name"
    a8 = 8-byte string
      70 67 2e 71 75 65 72 79 = "pg.query"
    --
[...]

Oddly, when I add this span data to the encoder unit tests in the dd-trace project, it encodes correctly. So I'm stumped. Any idea why the printed trace and encoded buffer don't match? Is this the cause of the agent decoding errors?

Environment

  • Operating system: Linux / macOS 10.15
  • Node version: 12.16.1
  • Tracer version: 0.21.0
  • Agent version: 7.19.2

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions