How to handle parse of persisted objects due to new UTF-8 validation

"Recently" (i.e. since the last time we performed a vendored code update), string fields gained UTF-8 validation of their data. As a result, parsing objects stored in a database or in log files now fails if they contain any invalid UTF-8 sequences, which, in general, could appear in any string field.

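For anyone unsure what counts as an offending sequence, here is a minimal sketch using only the standard library's `unicode/utf8` package. The helper name `hasInvalidUTF8` is made up for illustration and is not part of the protobuf library; it only shows the kind of check that would flag a stored value.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// hasInvalidUTF8 reports whether any of the given string values contains
// a byte sequence that is not valid UTF-8. (Hypothetical helper.)
func hasInvalidUTF8(values ...string) bool {
	for _, v := range values {
		if !utf8.ValidString(v) {
			return true
		}
	}
	return false
}

func main() {
	ok := "héllo"       // valid multi-byte UTF-8
	bad := "abc\xffdef" // 0xff can never appear in valid UTF-8

	fmt.Println(hasInvalidUTF8(ok))      // false
	fmt.Println(hasInvalidUTF8(ok, bad)) // true
}
```

A check like this could be run over a sample of persisted records to estimate how much existing data is affected before picking one of the options below.
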
The solutions I see:

1. Use "proto2". This is a non-starter, since it would require rewriting all proto definitions (and, I think, code).
2. Do a global s/string/bytes/ and adjust all the code to wrap each field read in string() and each write in []byte(). This is a lot of replacement, but doable. However, printing objects becomes nasty: we can no longer printf("%v") an object and expect to see a string.
3. Set validateUTF8 = false instead of = true in the vendored code. Easy enough, but it creates a maintenance burden and moves us away from upstream.
4. At every Marshal/Unmarshal site, check whether the returned error is of type "interface{ InvalidUTF8() bool }". This seems feasible, especially if we provide our own wrappers for Marshal/Unmarshal and update all existing call sites to use them.

I was wondering if the protobuf team (or anyone else) had any other suggestions or opinions on good ways to address this.

How to handle parse of persisted objects due to new UTF-8 validation #922
