How to handle parsing of persisted objects after new UTF-8 validation #922

@imirkin

"Recently" (i.e. since the last time we updated our vendored copy of the code), string fields gained UTF-8 validation. As a result, objects previously stored in a database or in log files no longer parse if any string field contains an invalid byte sequence, and in general any string field could contain one.
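
To make "offending sequences" concrete, here is a minimal sketch that uses only Go's standard "unicode/utf8" package. It does not call the protobuf library, and both the helper name firstInvalidByte and the sample bytes are made up for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstInvalidByte returns the offset of the first byte that is not part of
// a valid UTF-8 encoding, or -1 if b is entirely valid.
// (Hypothetical helper, not part of any protobuf API.)
func firstInvalidByte(b []byte) int {
	for i := 0; i < len(b); {
		r, size := utf8.DecodeRune(b[i:])
		// DecodeRune reports an invalid encoding as (RuneError, 1).
		if r == utf8.RuneError && size == 1 {
			return i
		}
		i += size
	}
	return -1
}

func main() {
	good := []byte("héllo")
	bad := []byte{'h', 'i', 0xff, '!'} // 0xff can never appear in UTF-8

	fmt.Println(utf8.Valid(good), firstInvalidByte(good)) // true -1
	fmt.Println(utf8.Valid(bad), firstInvalidByte(bad))   // false 2
}
```

A scan like this could be run over stored records to estimate how many of them are affected before picking one of the options.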

The solutions I see:

  1. Use "proto2". This is a non-starter, since it would require rewriting all the proto definitions (and, I think, the code that uses them).
  2. Do a global s/string/bytes/ and adjust all the code to wrap each field read in string() and each write in []byte(). This is a lot of replacement, but doable. However, printing objects becomes nasty: we can no longer printf("%v") a message and expect to see readable strings.
  3. Set validateUTF8 = false instead of = true in the vendored code. Easy enough, but it creates a maintenance burden and makes us diverge from upstream.
  4. At every Marshal/Unmarshal site, check whether the returned error implements "interface{ InvalidUTF8() bool }". This seems feasible, especially if we provide our own wrappers for Marshal/Unmarshal and update all existing call sites to use them.

I was wondering if the protobuf team (or anyone else) had any other suggestions or opinions on good ways to address this.
