-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Support arrays inside PYBIND11_NUMPY_DTYPE #832
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This implements pybind#800. Both C++ arrays and std::array are supported, including mixtures like std::array<int, 2>[4]. In a multi-dimensional array of char, the last dimension is used to construct a numpy string type. There are some things I'm not sure about: - Is there a better name for array_info? The "array" could easily be misinterpreted as referring to py::array rather than std::array. - Should array_info be split into separate classes for the different traits, or is it okay to have this uber traits class? - When constructing the dtype, the shape should really be a tuple rather than a list, but I'm not sure how to dynamically construct a tuple (or how to convert the list to a tuple). numpy seems to accept a list. - Should the array dtypes be registered in some way? At present two structures embedding the same array type will each construct a separate dtype object to represent that array type. That's already the case for char[N] though. - Should I add tests to use an array type directly e.g. py::array_t<int[2]>?
This is going to need a bit more work due to numpy/numpy#9049: misaligned arrays in a struct (or any array, if #831 is merged) will generate format strings like Possibly the |
Cc: @aldanor for input on the PR. |
This moves the ^ in the format string (to specify unaligned) to outside the `T{}` where it is sure to be parsed correctly. This is not strictly necessary yet, but it paves the way for pybind#832.
This actually looks fairly good. Minor, you could probably simplify the test code a bit by just overloading Changing |
@@ -70,6 +70,13 @@ struct StringStruct { | |||
std::array<char, 3> b; | |||
}; | |||
|
|||
struct ArrayStruct { | |||
char a[3][4]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe test zero-sized arrays as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Zero-sized built-in arrays are illegal in C++ (GCC and Clang accept them, but std::is_array is false for them, so they can't easily be supported).
std::array<T, 0> is legal (C++ says so explicitly) and I've started writing some extra code to support them, but I now think they should be disallowed because they're causing all kinds of problems with numpy. _dtype_from_pep3118
is non-deterministic with zero-sized arrays (I'm guessing because offsets are not unique so do not sort consistently). Given ^T{f:d:(0)I:empty:4x}
, it sometimes gives
[('d', '<f4'), ('empty', '<u4', (1,)), ('', 'V4')]
and sometimes gives
{'names':['d','pad0',''], 'formats':['<f4','V4',('<u4', (0,))], 'offsets':[0,4,4], 'itemsize':8}
In the latter case, the dtype match sanity check fails, presumably because the padding is now named pad0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, that wouldn't prevent using a struct with zero-sized arrays in it - you just couldn't reflect the zero-sized array into Python. That's no great loss given that the field contains no information anyway.
include/pybind11/numpy.h
Outdated
static constexpr const size_t extent = N; | ||
|
||
// appends the extents to shape | ||
static void extents(list& shape) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add_extent
? append_extent
? (to indicate that this is mutating the argument)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Are you referring to the template <typename T, size_t N> ostream& operator<<(ostream& os, const T v[N]) That feels like it's overly generic and may conflict with something else. In retrospect, it will probably take less code just to manually code up the loops in the |
Having a general-purpose array printer took up more LOC than just writing code to print each of the arrays I wanted to print.
They can't be reliably supported due to numpy/numpy#9049.
- Use explicitly sized types rather than assuming 32-bit int etc - Use uint8_t aka unsigned char, to ensure that it doesn't conflict with the special handling for char. - Add padding in the middle of the structure, to ensure that this is handled correctly.
It's not clear why the other changes related to arrays triggered this failure, but for some reason the compiler is generating references to the symbol for the digits member of the int_to_str type. Added the definition of the digits member so that the symbol will be emitted if needed. A quick check suggests that this doesn't increase the compiled size at all. I'm slightly suspicious because the size appears to stay *exactly* the same, whereas it should go up just to make room for the definition in the symbol table, but maybe some alignment is at work.
I've made requested changes and fixed up the builds on Travis (Appveyor is still running, but I'm confident). This should be ready for more review and then hopefully merge. |
+1 or static assert re: zero-sized arrays, if they can't be properly reflected anyway -- I didn't know that. This PR looks all good to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me overall. The only issue I see is the missing digits
symbol which can be handled differently as suggested in the comment below.
The array_info
naming and all-in-one traits struct seem fine to me since it's in detail
and can be broken up in the future if needed.
Should I add tests to use an array type directly e.g. py::array_t<int[2]>?
If this is supported now, it would be good to cover it with a test. Or make it a compile-time error otherwise.
include/pybind11/numpy.h
Outdated
static std::string format() { | ||
using detail::_; | ||
return (_("(") + detail::array_info<T>::extents() + _(")")).text() | ||
+ format_descriptor<detail::remove_all_extents_t<T>>::format(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is causing the missing symbols with clang/C++14. The descr
part must be made explicitly constexpr
to avoid needing a definition for digits
(i.e. PYBIND11_DESCR
in place of constexpr auto
for compatiblity with C++11). The following lines along with removing the digits
definition reduces binary size slightly for me:
PYBIND11_DESCR name = _("(") + detail::array_info<T>::extents() + _(")");
return name.text() + format_descriptor<detail::remove_all_extents_t<T>>::format();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. And thanks for figuring out how to solve the problem.
include/pybind11/numpy.h
Outdated
@@ -243,6 +243,46 @@ template <typename T, size_t N> struct is_std_array<std::array<T, N>> : std::tru | |||
template <typename T> struct is_complex : std::false_type { }; | |||
template <typename T> struct is_complex<std::complex<T>> : std::true_type { }; | |||
|
|||
template <typename T> struct array_info_scalar { | |||
typedef T type; | |||
static constexpr const bool is_array = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const
is redundant with constexpr
. I'd remove it here and the other occurrences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done for the code added in this pull request. There are other instances already in pybind11 that I haven't touched.
include/pybind11/numpy.h
Outdated
// treated as scalar because it gets special handling. | ||
template <typename T> struct array_info : array_info_scalar<T> { }; | ||
template <typename T, size_t N> struct array_info<std::array<T, N>> { | ||
typedef typename array_info<T>::type type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using
is pretty much always preferable to typedef
. typedef typename
and ::type type
tend to be confusing and scare away newcomers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
It looks like it doesn't work quite as expected: when constructing a numpy array with an array dtype, the dtype dimensions are sucked into the array shape and the array has the base dtype. Maybe in theory one could extend the caster so that it could reverse this mapping (checking at run-time that the minor dimensions are the correct lengths and are C-contiguous), but that sounds like a major new feature rather than something I'm going to add to this pull request. I've added a static_assert to array_t to prevent it being instantiated with array types. |
This is to go with the weakened is_pod_struct static_assert.
I don't think it's really a must-have feature. Just having the |
include/pybind11/descr.h
Outdated
@@ -69,7 +69,7 @@ template <size_t Size> constexpr descr<Size - 1, 0> _(char const(&text)[Size]) { | |||
|
|||
template <size_t Rem, size_t... Digits> struct int_to_str : int_to_str<Rem/10, Rem%10, Digits...> { }; | |||
template <size_t...Digits> struct int_to_str<0, Digits...> { | |||
static constexpr auto digits = descr<sizeof...(Digits), 0>({ ('0' + Digits)..., '\0' }, { nullptr }); | |||
static constexpr const descr<sizeof...(Digits), 0> digits{{ ('0' + Digits)..., '\0' }, { nullptr }}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert this line as well since the missing symbols are fixed and this ends up being just an unrelated style change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reminder - should be fixed now.
The previous commit added documentation on what structure fields are supported, and it incorrectly listed std::complex (which is in a different branch). Also removed the note about the static assert since that is also relevant to the std::complex branch only.
This was a leftover from having the digits member defined (as opposed to declared).
Is there anything else remaining to do on this pull request? As far as I know the Appveyor failure is random (#792). |
Nothing else, this is great -- merged! Thanks for also adding the note in the documentation. |
This fixes the test code on big-endian architectures: the array support (PR pybind#832) had hard-coded the little-endian '<' but we need to use '>' on big-endian architectures.
This fixes the test code on big-endian architectures: the array support (PR pybind#832) had hard-coded the little-endian '<' but we need to use '>' on big-endian architectures.
This fixes the test code on big-endian architectures: the array support (PR #832) had hard-coded the little-endian '<' but we need to use '>' on big-endian architectures.
This fixes the test code on big-endian architectures: the array support (PR pybind#832) had hard-coded the little-endian '<' but we need to use '>' on big-endian architectures.
This implements #800.
Both C++ arrays and std::array are supported, including mixtures like
std::array<int, 2>[4]. In a multi-dimensional array of char, the last
dimension is used to construct a numpy string type.
There are some things I'm not sure about:
misinterpreted as referring to py::array rather than std::array.
traits, or is it okay to have this uber traits class?
than a list, but I'm not sure how to dynamically construct a tuple (or
how to convert the list to a tuple). numpy seems to accept a list.
structures embedding the same array type will each construct a
separate dtype object to represent that array type. That's already the
case for char[N] though.
py::array_t<int[2]>?