Skip to content

Serialize (and deserialize) string values #864

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 28, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions Firestore/core/src/firebase/firestore/model/field_value.h
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,11 @@ class FieldValue {
return integer_value_;
}

const std::string& string_value() const {
FIREBASE_ASSERT(tag_ == Type::String);
return string_value_;
}

/** factory methods. */
static const FieldValue& NullValue();
static const FieldValue& TrueValue();
Expand Down
78 changes: 77 additions & 1 deletion Firestore/core/src/firebase/firestore/remote/serializer.cc
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,8 @@
#include <pb_decode.h>
#include <pb_encode.h>

#include <string>

namespace firebase {
namespace firestore {
namespace remote {
Expand Down Expand Up @@ -98,6 +100,47 @@ int64_t DecodeInteger(pb_istream_t* stream) {
return DecodeVarint(stream);
}

void EncodeString(pb_ostream_t* stream, const std::string& string_value) {
bool status = pb_encode_string(
stream, reinterpret_cast<const pb_byte_t*>(string_value.c_str()),
string_value.length());
if (!status) {
// TODO(rsgowman): figure out error handling
abort();
}
}

std::string DecodeString(pb_istream_t* stream) {
pb_istream_t substream;
bool status = pb_make_string_substream(stream, &substream);
if (!status) {
// TODO(rsgowman): figure out error handling
abort();
}

std::string result(substream.bytes_left, '\0');
status = pb_read(&substream, reinterpret_cast<pb_byte_t*>(&result[0]),
substream.bytes_left);
if (!status) {
// TODO(rsgowman): figure out error handling
abort();
}

// NB: future versions of nanopb read the remaining characters out of the
// substream (and return false if that fails) as an additional safety
// check within pb_close_string_substream. Unfortunately, that's not present
// in the current version (0.38). We'll make a stronger assertion and check
// to make sure there *are* no remaining characters in the substream.
if (substream.bytes_left != 0) {
// TODO(rsgowman): figure out error handling
abort();
}

pb_close_string_substream(stream, &substream);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really hate to ask this question but does this need to be exception safe?

As it stands the string allocation could fail and we'd fail to undo this business here.

A simple way forward would be to move the allocation of result outside the group of C function calls. Secondarily though, there's the question of when we adopt error handling, is it a leak to fail to call this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wilhuff I thought we just crash on exceptions and basically treat them as if they didn't exist?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Objective-C, yes. Exceptions aren't expected to be handled and the runtime isn't capable of handling them without leaking anyway.

In general we don't throw exceptions there except for assertions and we expect those to crash the app.

In plain C++ that's not necessarily the case. Does anyone reasonably catch std::bad_alloc?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean in C++. The style guide doesn't seem to distinguish between try and catch, so I presumed we don't use either. I'd be happy if that's not the case.

Copy link
Contributor

@var-const var-const Feb 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see, you mean user code should be able to catch the exception and Firestore shouldn't leak. If that's the case, sorry for misunderstanding.

Does anyone reasonably catch std::bad_alloc?

Can't think of how client code might reasonably expect to catch it from here.

Copy link
Member Author

@rsgowman rsgowman Feb 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really hate to ask this question

You've just kicked the hornet's nest. :(

A simple way forward would be to move the allocation of result outside the group of C function calls

I can't easily[1] do that. I need to wait until after pb_make_string_substream to know how large the string should be. (Well, I could allocate 0 bytes, and then extend it afterwards, but that doesn't avoid the problem.)

[1] - not technically correct. I could peak into the implementation details of pb_make_string_substream and simply do what it does, namely read the size required from the input stream. At which point, I've essentially just inlined the function.

does this need to be exception safe?

In this particular case, I believe the code is safe.

pb_make_string_substream/pb_close_string_substream don't actually allocate any memory. In particular, the close call just sets state on the stream (and newer versions of nanopb also ensure the substream is "consumed", i.e. the pointer is pointing to the end rather than somewhere in the middle.) So generally, there's no leak.

However, the stream may not be consistent, so recovering from bad_alloc and assuming the stream is fine would be incorrect. So as long as we don't catch the exception, we're ok (since by the time the exception goes up to the user, the stream has fallen off the stack. Also note that streams are typically stack allocated, so no free() calls are required.)

That said, these calls look like they could malloc or open() or otherwise consume resources that needs to be freed and, while nanopb clearly tries really hard to not malloc(), there's no guarantee it wouldn't change the implementation in the future to do this. (Side note: if you define PB_ENABLE_MALLOC=0 while compiling nanopb, it will ifdef out all the malloc calls. This reduces functionality, but a quick glance didn't turn up anything we actually use, so this might be worth investigating. Worst case: it would fail and therefore confirm that we are actually mallocing within nanopb.)

Does anyone reasonably catch std::bad_alloc?

Not generally on linux. These don't occur by default on linux due to it's over-committing memory model. (Instead, it relies on the infamous OOM killer to "take care" of things.) But I don't know about android/ios. (Probably developers still don't since there's usually not much that can usefully be done except crash.) OTOH, when people actually do catch bad_alloc, it's safe to assume that they really, really care about memory management, and would likely be disappointed (though not surprised) when a library they used didn't clean up properly due to this state.

So, options:

  1. noop. I think we're in a safe situation right now, even if bad_alloc is thrown. (More generally, we could always ignore bad_alloc and hope no one cares about this state.)
  2. catch bad_alloc (or ...). Then rethrow. (We don't use exceptions. But can we catch/rethrow them? I think we could add a catch block which would simply be not evaluated if -fno-exceptions is defined. In this situation, I think std::string would simply abort.)
  3. similar to 2; use char* buffer = malloc(size) and then check for null. (And then remember to free() after)
  4. similar to 2; but use #ifdef __cpp_exceptions.
  5. inline the nanopb calls. (Not a fan of this one.)
  6. wrap stuff like this in a c++ object with a dtor that handles cleanup. (i.e. RAII-ify nanopb)
  7. provide a different allocator to std::string which never throws, even when exceptions are enabled. (Haven't investigated this, but believe it to be feasible?)

Votes? (Keep in mind this will likely appear in many other places too, so vote twice if you feel we should do something different here than the general guidance.)

I like (1) for this specific case, and (2), (6) or (7) for the general case. (But (1) for the general case may be more pragmatic.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes:

  1. Seems good in this instance
  2. is not viable:
    int main() {
      int* foo = nullptr;
      try {
        foo = new int(42);
      } catch (...) {
      }
      return *foo;
    }
    Fails to compile:
    # clang++ -fno-exceptions -std=c++11 test.cc
    test.cc:3:3: error: cannot use 'try' with exceptions disabled
      try {
      ^
    1 error generated.
    
  3. I'm not super happy about this as an approach just to be exception safe. Prefer (7) if at all.
  4. __cpp_exceptions is insufficient for all compilers. Thankfully ABSL_HAVE_EXCEPTIONS does the right thing. I'd consider this unusable if we had to do this in every function that could potentially throw though.
  5. Agreed
  6. For places it really matters and it's an error we're actually going to handle and continue processing then thisis the way to go. For nanopb specifically, I think any error returned here indicates we made a programming error so this should ultimately end up in an assertion failure.
  7. new (std::nothrow) char[bufsize] from #include <new>.

I'd articulate general principles:

  • In cases where we have heavy weight objects we need to remember to deallocate we should RAII just for our own sanity. At the very least with std::unique_ptr.
  • Avoid malloc/free unless specifically dealing with a C API that requires it (e.g. asprintf if we were crazy enough to use it).
  • Even still, we might consider using unique_ptr so we can't forget, e.g.
    auto buf = std::unique_ptr<char, decltype(free)*>(reinterpret_cast<char*>(malloc(42)), free);
  • We tolerate that libraries we use might throw but don't attempt to recover.
  • We don't throw our own C++ exceptions
  • If we could usefully recover wrap the invocation in a lambda and pass that to a helper that does the #ifdef magic. This will be an extremely high bar to get over and must demonstrably and meaningfully improve developer quality of life.

If we agree I'll chuck these I'll chuck these in our migration guide (plus your feedback of course).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're voting, I'd vote for 1 and 6, depending on the situation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re (2): Darn. For some reason, I thought that worked. (Must've been confused.)

Yeah, everything else sgtm. I'll merge this PR shortly as is.


return result;
}

} // namespace

using firebase::firestore::model::FieldValue;
Expand Down Expand Up @@ -149,6 +192,16 @@ void Serializer::EncodeFieldValue(const FieldValue& field_value,
EncodeInteger(&stream, field_value.integer_value());
break;

case FieldValue::Type::String:
status = pb_encode_tag(&stream, PB_WT_STRING,
google_firestore_v1beta1_Value_string_value_tag);
if (!status) {
// TODO(rsgowman): figure out error handling
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional nit: maybe do a helper function like handle_error() { /* TODO */ abort(); } or validate_status(bool status, SomeType error) { /*TODO*/ if (!status) abort; } to avoid copying the same comment around? This presumes that error handling will be pretty similar across {en,de}coders.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than do all that refactoring, I'd probably be better off to just implement error handling; it really isn't going to be all that complicated. I'll issue a followup PR.

abort();
}
EncodeString(&stream, field_value.string_value());
break;

default:
// TODO(rsgowman): implement the other types
abort();
Expand All @@ -163,11 +216,32 @@ FieldValue Serializer::DecodeFieldValue(const uint8_t* bytes, size_t length) {
uint32_t tag;
bool eof;
bool status = pb_decode_tag(&stream, &wire_type, &tag, &eof);
if (!status || wire_type != PB_WT_VARINT) {
if (!status) {
// TODO(rsgowman): figure out error handling
abort();
}

// Ensure the tag matches the wire type
// TODO(rsgowman): figure out error handling
switch (tag) {
case google_firestore_v1beta1_Value_null_value_tag:
case google_firestore_v1beta1_Value_boolean_value_tag:
case google_firestore_v1beta1_Value_integer_value_tag:
if (wire_type != PB_WT_VARINT) {
abort();
}
break;

case google_firestore_v1beta1_Value_string_value_tag:
if (wire_type != PB_WT_STRING) {
abort();
}
break;

default:
abort();
}

switch (tag) {
case google_firestore_v1beta1_Value_null_value_tag:
DecodeNull(&stream);
Expand All @@ -176,6 +250,8 @@ FieldValue Serializer::DecodeFieldValue(const uint8_t* bytes, size_t length) {
return FieldValue::BooleanValue(DecodeBool(&stream));
case google_firestore_v1beta1_Value_integer_value_tag:
return FieldValue::IntegerValue(DecodeInteger(&stream));
case google_firestore_v1beta1_Value_string_value_tag:
return FieldValue::StringValue(DecodeString(&stream));

default:
// TODO(rsgowman): figure out error handling
Expand Down
40 changes: 40 additions & 0 deletions Firestore/core/test/firebase/firestore/remote/serializer_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -151,6 +151,46 @@ TEST_F(SerializerTest, EncodesIntegersModelToBytes) {
}
}

TEST_F(SerializerTest, EncodesStringModelToBytes) {
struct TestCase {
std::string value;
std::vector<uint8_t> bytes;
};

std::vector<TestCase> cases{
// TEXT_FORMAT_PROTO: 'string_value: ""'
{"", {0x8a, 0x01, 0x00}},
// TEXT_FORMAT_PROTO: 'string_value: "a"'
{"a", {0x8a, 0x01, 0x01, 0x61}},
// TEXT_FORMAT_PROTO: 'string_value: "abc def"'
{"abc def", {0x8a, 0x01, 0x07, 0x61, 0x62, 0x63, 0x20, 0x64, 0x65, 0x66}},
// TEXT_FORMAT_PROTO: 'string_value: "æ"'
{"æ", {0x8a, 0x01, 0x02, 0xc3, 0xa6}},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

absolutely optional: Is it possible to use escape-sequence for the unicode character here instead of use that character directly? Assuming any modern editor/source-control-system supports unicode, there is no problem; so it is optional.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could; but OTOH, we've already got some test cases that use escape characters below. (More context: These test cases are taken from the other test suites, so my preference is to leave them as close as possible to the original.)

We could make an argument that this is redundant... but it's probably more trouble than it's worth to remove it across all platforms.

// TEXT_FORMAT_PROTO: 'string_value: "\0\ud7ff\ue000\uffff"'
// Note: Each one of the three embedded universal character names
// (\u-escaped) maps to three chars, so the total length of the string
// literal is 10 (ignoring the terminating null), and the resulting string
// literal is the same as '\0\xed\x9f\xbf\xee\x80\x80\xef\xbf\xbf'". The
// size of 10 must be added, or else std::string will see the \0 at the
// start and assume that's the end of the string.
{{"\0\ud7ff\ue000\uffff", 10},
{0x8a, 0x01, 0x0a, 0x00, 0xed, 0x9f, 0xbf, 0xee, 0x80, 0x80, 0xef, 0xbf,
0xbf}},
{{"\0\xed\x9f\xbf\xee\x80\x80\xef\xbf\xbf", 10},
{0x8a, 0x01, 0x0a, 0x00, 0xed, 0x9f, 0xbf, 0xee, 0x80, 0x80, 0xef, 0xbf,
0xbf}},
// TEXT_FORMAT_PROTO: 'string_value: "(╯°□°)╯︵ ┻━┻"'
{"(╯°□°)╯︵ ┻━┻",
{0x8a, 0x01, 0x1e, 0x28, 0xe2, 0x95, 0xaf, 0xc2, 0xb0, 0xe2, 0x96,
0xa1, 0xc2, 0xb0, 0xef, 0xbc, 0x89, 0xe2, 0x95, 0xaf, 0xef, 0xb8,
0xb5, 0x20, 0xe2, 0x94, 0xbb, 0xe2, 0x94, 0x81, 0xe2, 0x94, 0xbb}}};

for (const TestCase& test : cases) {
FieldValue model = FieldValue::StringValue(test.value);
ExpectRoundTrip(model, test.bytes, FieldValue::Type::String);
}
}

// TODO(rsgowman): Test [en|de]coding multiple protos into the same output
// vector.

Expand Down