Cast bytearray to string #3707

Merged
merged 12 commits into from
Feb 23, 2022

25 changes: 18 additions & 7 deletions include/pybind11/cast.h
@@ -380,7 +380,7 @@ struct string_caster {
             return false;
         }
         if (!PyUnicode_Check(load_src.ptr())) {
-            return load_bytes(load_src);
+            return load_raw(load_src);
         }

         // For UTF-8 we avoid the need for a temporary `bytes` object by using
@@ -458,26 +458,37 @@ struct string_caster {
 #endif
     }

-    // When loading into a std::string or char*, accept a bytes object as-is (i.e.
+    // When loading into a std::string or char*, accept a bytes/bytearray object as-is (i.e.
     // without any encoding/decoding attempt). For other C++ char sizes this is a no-op.
     // which supports loading a unicode from a str, doesn't take this path.
     template <typename C = CharT>
-    bool load_bytes(enable_if_t<std::is_same<C, char>::value, handle> src) {
+    bool load_raw(enable_if_t<std::is_same<C, char>::value, handle> src) {
         if (PYBIND11_BYTES_CHECK(src.ptr())) {
             // We were passed raw bytes; accept it into a std::string or char*
             // without any encoding attempt.
             const char *bytes = PYBIND11_BYTES_AS_STRING(src.ptr());
-            if (bytes) {
-                value = StringType(bytes, (size_t) PYBIND11_BYTES_SIZE(src.ptr()));
-                return true;
+            if (!bytes) {
+                pybind11_fail("Unexpected PYBIND11_BYTES_AS_STRING() failure.");
             }
+            value = StringType(bytes, (size_t) PYBIND11_BYTES_SIZE(src.ptr()));
+            return true;
+        }
+        if (PyByteArray_Check(src.ptr())) {
+            // We were passed a bytearray; accept it into a std::string or char*
+            // without any encoding attempt.
+            const char *bytearray = PyByteArray_AsString(src.ptr());
+            if (!bytearray) {
+                pybind11_fail("Unexpected PyByteArray_AsString() failure.");
+            }
+            value = StringType(bytearray, (size_t) PyByteArray_Size(src.ptr()));
+            return true;
         }

         return false;
     }

     template <typename C = CharT>
-    bool load_bytes(enable_if_t<!std::is_same<C, char>::value, handle>) {
+    bool load_raw(enable_if_t<!std::is_same<C, char>::value, handle>) {
         return false;
     }
 };
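
As a rough illustration of what the new bytearray branch does, the following Python sketch reaches the same CPython C-API functions (PyByteArray_AsString and PyByteArray_Size) through ctypes.pythonapi and copies the buffer out by pointer and length, with no decoding step. The helper name load_raw_model is hypothetical, PyByteArray_Check is modelled with isinstance, and the sketch assumes CPython; it is not the pybind11 implementation.

```python
import ctypes

# Assumption: running on CPython, where ctypes.pythonapi exposes the C API.
_as_string = ctypes.pythonapi.PyByteArray_AsString
_as_string.argtypes = [ctypes.py_object]
_as_string.restype = ctypes.c_void_p  # raw address; None would mean NULL

_size = ctypes.pythonapi.PyByteArray_Size
_size.argtypes = [ctypes.py_object]
_size.restype = ctypes.c_ssize_t


def load_raw_model(src):
    """Hypothetical model of load_raw.

    Returns the raw bytes, or None if src is neither bytes nor bytearray
    (the C++ caster returns false in that case).
    """
    if isinstance(src, bytes):
        return src  # bytes are accepted as-is
    if isinstance(src, bytearray):
        ptr = _as_string(src)
        if ptr is None:
            raise RuntimeError("Unexpected PyByteArray_AsString() failure.")
        # Copy exactly PyByteArray_Size bytes starting at ptr,
        # mirroring StringType(ptr, size) in the C++ code.
        return ctypes.string_at(ptr, _size(src))
    return None


print(load_raw_model(bytearray(b"Hi")))
print(load_raw_model(bytearray()))
print(load_raw_model(bytearray(b"\x80")))
```

Because the copy is driven by an explicit length rather than a terminating NUL, an empty bytearray and a bytearray containing bytes that are not valid UTF-8 both pass through unchanged in this model.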
9 changes: 9 additions & 0 deletions tests/test_builtin_casters.py
@@ -133,6 +133,15 @@ def test_bytes_to_string():
     assert m.string_length("💩".encode()) == 4


+def test_bytearray_to_string():
+    """Tests the ability to pass bytearray to C++ string-accepting functions"""
+    assert m.string_length(bytearray(b"Hi")) == 2
+    assert m.strlen(bytearray(b"bytearray")) == 9
+    assert m.string_length(bytearray()) == 0
+    assert m.string_length(bytearray("🦜", "utf-8", "strict")) == 4
+    assert m.string_length(bytearray(b"\x80")) == 1

Collaborator

Could you please add tests with empty bytearrays?

And with malformed UTF-8, to prove that there really isn't any encoding or decoding attempt (see e.g. malformed_utf8 in test_pytypes.py).

I looked at the Python C code underneath PyByteArray_AsString; it special-cases empty arrays. But in any case, tests for corner cases are best practice.

Contributor Author

Sure, I'll add an empty bytearray test case.
For malformed_utf8, I'm not sure it's the same case as casting to a Python str. In test_pytypes, a single byte is cast to str, and the resulting str is not UTF-8-decoded text but "b'\x80'". When b"\x80" is passed to a C++ string, by contrast, it is treated as one raw byte and has size 1. Is this the case you're looking for?
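
The distinction drawn here can be checked in plain Python, without pybind11. The sketch below only shows the Python-side facts: b"\x80" does not decode as UTF-8, str() of a bytes object yields its repr rather than decoded text, and the raw length is simply the number of bytes. The helper is_valid_utf8 is a hypothetical name introduced for illustration.

```python
malformed_utf8 = b"\x80"  # a lone continuation byte, not valid UTF-8


def is_valid_utf8(data):
    """Return True if data decodes as strict UTF-8, False otherwise."""
    try:
        bytes(data).decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A strict UTF-8 decode fails on this input.
print(is_valid_utf8(malformed_utf8))            # False
# str() of a bytes object gives its repr, not decoded text.
print(str(malformed_utf8))                      # b'\x80'
# Passed through as raw bytes, it is a single byte.
print(len(bytearray(malformed_utf8)))           # 1
# Valid multi-byte UTF-8 is counted in bytes, not code points.
print(len(bytearray("🦜", "utf-8", "strict")))  # 4
```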

Collaborator

The motivation/idea is very simple and high-level:

  • We want to be sure this works for malformed utf-8.

We know the current implementation does. We want to be sure that's not accidentally broken somehow (e.g. by refactoring).

My suggestion was based on past experience, where that actually happened.

Contributor Author

I see your point; the test is added. I just want to make sure that assert m.string_length(bytearray(b"\x80")) == 1 is the behavior we're looking for 😊

Collaborator

Wouldn't it be better to parameterize the bytes_to_str tests and have them rerun with all the bytes args wrapped as bytearrays (via a pytest fixture, for instance, or by having a function that returns either the bytes or the bytes wrapped in a bytearray, depending on test parameters)?
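
A minimal sketch of the second option mentioned above: one helper that returns the payload either as bytes or wrapped in a bytearray, and one check that runs every case both ways. The names wrap, CASES, and check_lengths are hypothetical, and string_length is a pure-Python stand-in for m.string_length so that the snippet runs without the compiled test module. In the real test suite the loop would more likely be expressed with pytest.mark.parametrize.

```python
def string_length(data):
    """Stand-in for m.string_length: the number of raw bytes received."""
    return len(bytes(data))


def wrap(data, as_bytearray):
    """Return data unchanged, or wrapped in a bytearray, depending on the flag."""
    return bytearray(data) if as_bytearray else data


# (payload, expected length in bytes), including the empty and malformed cases.
CASES = [
    (b"Hi", 2),
    (b"", 0),
    ("💩".encode(), 4),
    (b"\x80", 1),
]


def check_lengths(length_fn):
    """Run every case once as bytes and once as bytearray."""
    for as_bytearray in (False, True):
        for payload, expected in CASES:
            assert length_fn(wrap(payload, as_bytearray)) == expected


check_lengths(string_length)
```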

Contributor Author

@porrashuang, Feb 21, 2022

Hi @Skylion007, I also thought about wrapping the params for better coverage. Considering there are at least 3 dimensions (bytearray vs. bytes, string_length vs. strlen, bytearray with/without params) and a few corner cases (empty bytearray), I would suggest that the current, more direct version gives us similar coverage with less complexity. What do you think?

Collaborator

Sounds good.


 @pytest.mark.skipif(not hasattr(m, "has_string_view"), reason="no <string_view>")
 def test_string_view(capture):
     """Tests support for C++17 string_view arguments and return values"""