"Recently" (i.e. since our last update of the vendored code), string fields gained UTF-8 validation. As a result, objects stored in a database or in log files can no longer be parsed if any of their strings contain an invalid UTF-8 sequence, and in general that could be any string field.
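One way to gauge how much data is affected is to check string values for invalid UTF-8 before they are written, or while migrating old records. Below is a minimal sketch that uses only Go's unicode/utf8 package. The helper findInvalidUTF8 and the sample values are hypothetical, and the sketch assumes the values are available as plain Go strings rather than as serialized messages:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// findInvalidUTF8 returns the indexes of values that are not valid UTF-8.
// These are the values a validating string field would reject.
// (Hypothetical helper, for illustration only.)
func findInvalidUTF8(values []string) []int {
	var bad []int
	for i, v := range values {
		if !utf8.ValidString(v) {
			bad = append(bad, i)
		}
	}
	return bad
}

func main() {
	stored := []string{
		"plain ASCII",
		"h\u00e9llo",         // valid multi-byte UTF-8
		"latin-1 \xe9",       // lone 0xE9 byte: invalid UTF-8
		"truncated \xe2\x82", // incomplete 3-byte sequence
	}
	fmt.Println(findInvalidUTF8(stored)) // [2 3]
}
```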
The solutions I see:
- Use "proto2". This is a non-starter since it would require a rewrite of all proto definitions (and, I think, code).
- Do a global s/string/bytes/ and adjust all the code to convert with string() on every field read and []byte() on every write. This is a lot of replacement, but doable. However, printing objects becomes nasty: printf("%v") on a message would show byte values instead of readable text.
- Set validateUTF8 = false instead of = true in the vendored code. Easy enough, but it creates a maintenance burden and moves us away from upstream.
- At every Marshal/Unmarshal site, check whether the returned error implements "interface{ InvalidUTF8() bool }" and tolerate it if so. This seems feasible, especially if we provide our own Marshal/Unmarshal wrappers and update all existing call sites to use them.
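To illustrate the printing problem with the bytes option, here is a small sketch built on a hand-written struct. The record type and the nameOf, setName and describe helpers are hypothetical; real generated message types have additional fields, so their printed form would differ:

```go
package main

import "fmt"

// record is a hypothetical stand-in for a generated message after a global
// string -> bytes change; Name used to be a string field.
type record struct {
	Name []byte
}

// nameOf converts the bytes field back to a string on every read.
func nameOf(r record) string { return string(r.Name) }

// setName converts a string to bytes on every write.
func setName(r *record, name string) { r.Name = []byte(name) }

// describe shows what printf("%v") produces for such a message.
func describe(r record) string { return fmt.Sprintf("%v", r) }

func main() {
	var r record
	setName(&r, "alice")
	fmt.Println(describe(r)) // {[97 108 105 99 101]}
	fmt.Println(nameOf(r))   // alice
}
```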
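The wrapper approach from the last option could look roughly like the sketch below. It assumes, as described above, that invalid-UTF-8 errors implement interface{ InvalidUTF8() bool }. The unmarshal callback stands in for the real Unmarshal call, and tolerantUnmarshal, fakeUTF8Err and errTruncated are hypothetical names used only for illustration. Dropping the error is only useful if the library still finishes decoding the rest of the message before reporting it, which should be verified against the vendored version:

```go
package main

import (
	"errors"
	"fmt"
)

// invalidUTF8Error matches errors that report InvalidUTF8() == true.
type invalidUTF8Error interface {
	InvalidUTF8() bool
}

// isInvalidUTF8 reports whether err, or any error it wraps, is an
// invalid-UTF-8 error.
func isInvalidUTF8(err error) bool {
	var e invalidUTF8Error
	return errors.As(err, &e) && e.InvalidUTF8()
}

// tolerantUnmarshal is a hypothetical wrapper: unmarshal stands in for the
// real Unmarshal call. Invalid-UTF-8 errors are dropped; every other error
// is returned unchanged.
func tolerantUnmarshal(unmarshal func([]byte) error, b []byte) error {
	if err := unmarshal(b); err != nil && !isInvalidUTF8(err) {
		return err
	}
	return nil
}

// fakeUTF8Err is a hypothetical test double mimicking an invalid-UTF-8 error.
type fakeUTF8Err struct{}

func (fakeUTF8Err) Error() string     { return "proto: invalid UTF-8" }
func (fakeUTF8Err) InvalidUTF8() bool { return true }

// errTruncated stands in for any other decode error.
var errTruncated = errors.New("unexpected EOF")

func main() {
	utf8Fail := func([]byte) error { return fakeUTF8Err{} }
	otherFail := func([]byte) error { return errTruncated }

	fmt.Println(tolerantUnmarshal(utf8Fail, nil))  // <nil>
	fmt.Println(tolerantUnmarshal(otherFail, nil)) // unexpected EOF
}
```

Using errors.As rather than a plain type assertion also catches the error if a caller has wrapped it with %w.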
I was wondering if the protobuf team (or anyone else) had any other suggestions or opinions on good ways to address this.
darthwalsh