Support arrays inside PYBIND11_NUMPY_DTYPE #832

bmerry · 2017-05-04T13:42:29Z

This implements #800.

Both C++ arrays and std::array are supported, including mixtures like
std::array<int, 2>[4]. In a multi-dimensional array of char, the last
dimension is used to construct a numpy string type.
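
For readers who want to see the shape flattening in isolation, here is a minimal standalone sketch of how mixed built-in and std::array nesting can be reduced to a list of extents. The names `extents_of` and `shape_of` are hypothetical and this is not the pybind11 implementation; the special handling of a trailing char dimension is deliberately left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical trait for this sketch: scalars contribute no extents.
template <typename T> struct extents_of {
    using type = T;
    static void append(std::vector<std::size_t> &) {}
};

// std::array<T, N>: record N, then recurse into the element type.
template <typename T, std::size_t N> struct extents_of<std::array<T, N>> {
    using type = typename extents_of<T>::type;
    static void append(std::vector<std::size_t> &shape) {
        shape.push_back(N);
        extents_of<T>::append(shape);
    }
};

// Built-in array T[N]: treated exactly like std::array<T, N>.
template <typename T, std::size_t N>
struct extents_of<T[N]> : extents_of<std::array<T, N>> {};

// Collect all extents of T, outermost dimension first.
template <typename T> std::vector<std::size_t> shape_of() {
    std::vector<std::size_t> shape;
    extents_of<T>::append(shape);
    return shape;
}

inline void shape_of_usage() {
    // A built-in array of 4 elements, each a std::array<int, 2>.
    using Mixed = std::array<int, 2>[4];
    const std::vector<std::size_t> expected = {4, 2};
    assert(shape_of<Mixed>() == expected);
}
```

The built-in array case only names `std::array<T, N>` as a template argument and never instantiates it, so element types such as `int[4]` work as well.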

There are some things I'm not sure about:

- Is there a better name for array_info? The "array" could easily be
  misinterpreted as referring to py::array rather than std::array.
- Should array_info be split into separate classes for the different
  traits, or is it okay to have this uber traits class?
- When constructing the dtype, the shape should really be a tuple rather
  than a list, but I'm not sure how to dynamically construct a tuple (or
  how to convert the list to a tuple). numpy seems to accept a list.
- Should the array dtypes be registered in some way? At present two
  structures embedding the same array type will each construct a
  separate dtype object to represent that array type. That's already the
  case for char[N] though.
- Should I add tests to use an array type directly e.g.
  py::array_t<int[2]>?

bmerry · 2017-05-04T16:02:53Z

This is going to need a bit more work due to numpy/numpy#9049: misaligned arrays in a struct (or any array, if #831 is merged) will generate format strings like ^(3, 2)i which numpy chokes on.

Possibly the ^ should just be prepended to the T{} so that the whole struct gets parsed as packed, or it should be passed as the second argument to _dtype_from_pep3118.
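
To make the first option concrete, here is a rough sketch of assembling a struct format string with the ^ placed in front of T{} and explicit x padding between fields. `FieldSketch` and `packed_struct_format` are hypothetical names invented for this illustration, and the sketch makes no claim about which strings numpy will accept.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical field record, used only in this sketch.
struct FieldSketch {
    std::string name;    // field name
    std::string format;  // PEP 3118 code for the field, e.g. "(3,2)i"
    std::size_t offset;  // byte offset within the struct
    std::size_t size;    // size of the field in bytes
};

// Build a struct format with the packed marker '^' before T{...},
// emitting explicit 'x' padding wherever the offsets leave a gap.
inline std::string packed_struct_format(const std::vector<FieldSketch> &fields,
                                        std::size_t itemsize) {
    std::ostringstream oss;
    oss << "^T{";
    std::size_t cursor = 0;  // first byte not yet described
    for (const auto &f : fields) {
        if (f.offset > cursor)
            oss << (f.offset - cursor) << 'x';
        oss << f.format << ':' << f.name << ':';
        cursor = f.offset + f.size;
    }
    if (itemsize > cursor)
        oss << (itemsize - cursor) << 'x';  // trailing padding
    oss << '}';
    return oss.str();
}

inline void packed_struct_format_usage() {
    // One byte, 3 bytes of padding, then a 3x2 array of 4-byte ints.
    const std::vector<FieldSketch> fields = {{"a", "B", 0, 1},
                                             {"b", "(3,2)i", 4, 24}};
    assert(packed_struct_format(fields, 28) == "^T{B:a:3x(3,2)i:b:}");
}
```

Because the padding is written out explicitly, the individual field codes no longer need their own ^ prefix.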

jagerman · 2017-05-04T16:07:05Z

Cc: @aldanor for input on the PR.

This moves the ^ in the format string (to specify unaligned) to outside the `T{}` where it is sure to be parsed correctly. This is not strictly necessary yet, but it paves the way for pybind#832.

aldanor · 2017-05-05T00:06:32Z

This actually looks fairly good. Minor, you could probably simplify the test code a bit by just overloading operator<< on the types you dump?

Changing T{} to ^T{} is probably a good point too, given we already provide our own padding anyway... wish that numpy folks figured it out, we've hit quite a few edge cases already with this.

aldanor · 2017-05-05T00:08:55Z

tests/test_numpy_dtypes.cpp

@@ -70,6 +70,13 @@ struct StringStruct {
    std::array<char, 3> b;
 };

+struct ArrayStruct {
+    char a[3][4];


Maybe test zero-sized arrays as well?

Zero-sized built-in arrays are illegal in C++ (GCC and Clang accept them, but std::is_array is false for them, so they can't easily be supported).

std::array<T, 0> is legal (C++ says so explicitly) and I've started writing some extra code to support them, but I now think they should be disallowed because they're causing all kinds of problems with numpy. _dtype_from_pep3118 is non-deterministic with zero-sized arrays (I'm guessing because offsets are not unique so do not sort consistently). Given ^T{f:d:(0)I:empty:4x}, it sometimes gives

[('d', '<f4'), ('empty', '<u4', (1,)), ('', 'V4')]

and sometimes gives

{'names':['d','pad0',''], 'formats':['<f4','V4',('<u4', (0,))], 'offsets':[0,4,4], 'itemsize':8}

In the latter case, the dtype match sanity check fails, presumably because the padding is now named pad0.

By the way, that wouldn't prevent using a struct with zero-sized arrays in it - you just couldn't reflect the zero-sized array into Python. That's no great loss given that the field contains no information anyway.
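
A compile-time rejection of std::array<T, 0> could look roughly like the following. This is only a sketch with a hypothetical `field_extent` trait, not the check that ended up in pybind11.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical trait for this sketch; the primary template is left
// undefined so that non-array types are rejected outright.
template <typename T> struct field_extent;

template <typename T, std::size_t N> struct field_extent<std::array<T, N>> {
    // Refuse std::array<T, 0> at compile time rather than emit a dtype
    // that may be parsed inconsistently.
    static_assert(N > 0, "zero-sized arrays are not supported in this sketch");
    static constexpr std::size_t value = N;
};

// std::is_array only recognises built-in arrays, not std::array.
static_assert(std::is_array<int[2]>::value, "built-in array");
static_assert(!std::is_array<std::array<int, 2>>::value, "std::array is a class");

inline void field_extent_usage() {
    assert((field_extent<std::array<int, 3>>::value == 3));
    // field_extent<std::array<int, 0>>::value would fail to compile.
}
```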

aldanor · 2017-05-05T00:09:43Z

include/pybind11/numpy.h

+    static constexpr const size_t extent = N;
+
+    // appends the extents to shape
+    static void extents(list& shape) {


add_extent? append_extent? (to indicate that this is mutating the argument)

bmerry · 2017-05-05T06:02:28Z

This actually looks fairly good. Minor, you could probably simplify the test code a bit by just overloading operator<< on the types you dump?

Are you referring to the ArrayPrinter stuff? Do you mean that I should write templates like

template <typename T, size_t N> ostream& operator<<(ostream& os, const T v[N])

That feels like it's overly generic and may conflict with something else.

In retrospect, it will probably take less code just to manually code up the loops in the ArrayStruct version of operator <<. I'll look into that today.
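
As a side note on the template above: a function parameter declared as `const T v[N]` is adjusted to `const T *`, so N cannot be deduced from a call. The hand-written alternative might look like the sketch below, which uses a hypothetical `GridSketch` struct rather than the actual test types.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical struct for this sketch (not the ArrayStruct from the tests).
struct GridSketch {
    int cells[2][3];
};

// Hand-written loops for one concrete type, so there is no generic
// template that could collide with other operator<< overloads.
inline std::ostream &operator<<(std::ostream &os, const GridSketch &g) {
    os << "cells={";
    for (std::size_t i = 0; i < 2; ++i) {
        if (i > 0) os << ',';
        os << '{';
        for (std::size_t j = 0; j < 3; ++j) {
            if (j > 0) os << ',';
            os << g.cells[i][j];
        }
        os << '}';
    }
    return os << '}';
}

inline void grid_sketch_usage() {
    const GridSketch g = {{{1, 2, 3}, {4, 5, 6}}};
    std::ostringstream oss;
    oss << g;
    assert(oss.str() == "cells={{1,2,3},{4,5,6}}");
}
```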

- Use explicitly sized types rather than assuming 32-bit int etc
- Use uint8_t aka unsigned char, to ensure that it doesn't conflict with the special handling for char.
- Add padding in the middle of the structure, to ensure that this is handled correctly.

It's not clear why the other changes related to arrays triggered this failure, but for some reason the compiler is generating references to the symbol for the digits member of the int_to_str type. Added the definition of the digits member so that the symbol will be emitted if needed. A quick check suggests that this doesn't increase the compiled size at all. I'm slightly suspicious because the size appears to stay *exactly* the same, whereas it should go up just to make room for the definition in the symbol table, but maybe some alignment is at work.

bmerry · 2017-05-05T12:58:11Z

I've made requested changes and fixed up the builds on Travis (Appveyor is still running, but I'm confident). This should be ready for more review and then hopefully merge.

aldanor · 2017-05-07T19:14:25Z

+1 for static assert re: zero-sized arrays, if they can't be properly reflected anyway -- I didn't know that.

This PR looks all good to me.

dean0x7d

Looks good to me overall. The only issue I see is the missing digits symbol which can be handled differently as suggested in the comment below.

The array_info naming and all-in-one traits struct seem fine to me since it's in detail and can be broken up in the future if needed.

Should I add tests to use an array type directly e.g. py::array_t<int[2]>?

If this is supported now, it would be good to cover it with a test. Or make it a compile-time error otherwise.

dean0x7d · 2017-05-07T22:09:50Z

include/pybind11/numpy.h

+    static std::string format() {
+        using detail::_;
+        return (_("(") + detail::array_info<T>::extents() + _(")")).text()
+            + format_descriptor<detail::remove_all_extents_t<T>>::format();


This is causing the missing symbols with clang/C++14. The descr part must be made explicitly constexpr to avoid needing a definition for digits (i.e. PYBIND11_DESCR in place of constexpr auto for compatibility with C++11). The following lines, along with removing the digits definition, reduce binary size slightly for me:

PYBIND11_DESCR name = _("(") + detail::array_info<T>::extents() + _(")");
return name.text() + format_descriptor<detail::remove_all_extents_t<T>>::format();

Done. And thanks for figuring out how to solve the problem.

dean0x7d · 2017-05-07T22:12:14Z

include/pybind11/numpy.h

@@ -243,6 +243,46 @@ template <typename T, size_t N> struct is_std_array<std::array<T, N>> : std::tru
 template <typename T> struct is_complex : std::false_type { };
 template <typename T> struct is_complex<std::complex<T>> : std::true_type { };

+template <typename T> struct array_info_scalar {
+    typedef T type;
+    static constexpr const bool is_array = false;


const is redundant with constexpr. I'd remove it here and the other occurrences.

Done for the code added in this pull request. There are other instances already in pybind11 that I haven't touched.
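
The redundancy can be checked directly: a constexpr object declaration already implies const, so both spellings declare the same type. A small sketch with a hypothetical trait struct:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical trait struct for this sketch.
struct scalar_traits_sketch {
    static constexpr bool is_array = false;               // constexpr only
    static constexpr const bool is_array_verbose = false; // redundant const
};

// constexpr on an object declaration implies const.
static_assert(std::is_same<decltype(scalar_traits_sketch::is_array),
                           const bool>::value,
              "constexpr member is already const");
// The extra const therefore changes nothing about the declared type.
static_assert(std::is_same<decltype(scalar_traits_sketch::is_array),
                           decltype(scalar_traits_sketch::is_array_verbose)>::value,
              "both spellings declare the same type");

inline void scalar_traits_usage() {
    assert(!scalar_traits_sketch::is_array);
}
```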

dean0x7d · 2017-05-07T22:28:38Z

include/pybind11/numpy.h

+// treated as scalar because it gets special handling.
+template <typename T> struct array_info : array_info_scalar<T> { };
+template <typename T, size_t N> struct array_info<std::array<T, N>> {
+    typedef typename array_info<T>::type type;


I think using is pretty much always preferable to typedef. typedef typename and ::type type tend to be confusing and scare away newcomers.

bmerry · 2017-05-08T11:21:41Z

Should I add tests to use an array type directly e.g. py::array_t<int[2]>?
If this is supported now, it would be good to cover it with a test. Or make it a compile-time error otherwise.

It looks like it doesn't work quite as expected: when constructing a numpy array with an array dtype, the dtype dimensions are sucked into the array shape and the array has the base dtype. Maybe in theory one could extend the caster so that it could reverse this mapping (checking at run-time that the minor dimensions are the correct lengths and are C-contiguous), but that sounds like a major new feature rather than something I'm going to add to this pull request.

I've added a static_assert to array_t to prevent it being instantiated with array types.
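
A guard of this kind could be sketched as follows. `typed_array_sketch` is a hypothetical stand-in for a typed array wrapper, and whether the real check also covers std::array is an assumption of this sketch rather than something stated in the thread.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Detects std::array<T, N>; the name is local to this sketch.
template <typename T> struct is_std_array_sketch : std::false_type {};
template <typename T, std::size_t N>
struct is_std_array_sketch<std::array<T, N>> : std::true_type {};

// Hypothetical stand-in for a typed array wrapper.
template <typename T> struct typed_array_sketch {
    // Array element types would be folded into the array shape, so the
    // wrapper refuses them at compile time.
    static_assert(!std::is_array<T>::value && !is_std_array_sketch<T>::value,
                  "array element types are not supported here");
    using value_type = T;
};

inline void typed_array_usage() {
    typed_array_sketch<int> ok;  // scalars are accepted
    (void) ok;
    // typed_array_sketch<int[2]> rejected;  // would trip the static_assert
    assert((std::is_same<typed_array_sketch<int>::value_type, int>::value));
}
```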

dean0x7d · 2017-05-08T13:06:45Z

Maybe in theory one could extend the caster so that it could reverse this mapping (checking at run-time that the minor dimensions are the correct lengths and are C-contiguous), but that sounds like a major new feature rather than something I'm going to add to this pull request.

I don't think it's really a must-have feature. Just having the static_assert seems perfectly reasonable to me.

dean0x7d · 2017-05-08T13:08:23Z

include/pybind11/descr.h

@@ -69,7 +69,7 @@ template <size_t Size> constexpr descr<Size - 1, 0> _(char const(&text)[Size]) {

 template <size_t Rem, size_t... Digits> struct int_to_str : int_to_str<Rem/10, Rem%10, Digits...> { };
 template <size_t...Digits> struct int_to_str<0, Digits...> {
-    static constexpr auto digits = descr<sizeof...(Digits), 0>({ ('0' + Digits)..., '\0' }, { nullptr });
+    static constexpr const descr<sizeof...(Digits), 0> digits{{ ('0' + Digits)..., '\0' }, { nullptr }};


Revert this line as well since the missing symbols are fixed and this ends up being just an unrelated style change.

Thanks for the reminder - should be fixed now.
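
For context on the hunk above, the int_to_str recursion peels one decimal digit per level of inheritance until the remainder reaches zero. The sketch below mirrors that recursion under a different name and returns a std::string instead of a descr, so it is an illustration of the technique rather than the pybind11 code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Peel off the lowest decimal digit and prepend it to the pack.
template <std::size_t Rem, std::size_t... Digits>
struct int_to_str_sketch : int_to_str_sketch<Rem / 10, Rem % 10, Digits...> {};

// Base case: nothing left to peel, so Digits holds the digits in order.
template <std::size_t... Digits> struct int_to_str_sketch<0, Digits...> {
    static std::string text() {
        const char buf[] = {static_cast<char>('0' + Digits)..., '\0'};
        return std::string(buf);
    }
};

inline void int_to_str_usage() {
    // 42 -> <4, 2> -> <0, 4, 2>, which is the base case with digits 4 and 2.
    assert(int_to_str_sketch<42>::text() == "42");
}
```

Note that with this scheme the value 0 produces an empty digit pack and therefore an empty string.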

The previous commit added documentation on what structure fields are supported, and it incorrectly listed std::complex (which is in a different branch). Also removed the note about the static assert since that is also relevant to the std::complex branch only.

bmerry · 2017-05-10T06:54:18Z

Is there anything else remaining to do on this pull request? As far as I know the Appveyor failure is random (#792).

dean0x7d · 2017-05-10T08:26:03Z

Nothing else, this is great -- merged! Thanks for also adding the note in the documentation.

This fixes the test code on big-endian architectures: the array support (PR pybind#832) had hard-coded the little-endian '<' but we need to use '>' on big-endian architectures.
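
One way to pick the byte-order character at run time is sketched below. `byte_order_char` and `native_typestr` are hypothetical helpers written for this illustration; the thread does not show how the actual fix was implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Returns '<' on little-endian hosts and '>' on big-endian hosts.
inline char byte_order_char() {
    const std::uint16_t probe = 1;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &probe, 1);  // lowest-addressed byte of probe
    return first_byte == 1 ? '<' : '>';
}

// Builds a numpy-style type string such as "<i4" or ">i4" for the host.
inline std::string native_typestr(char kind, unsigned size_in_bytes) {
    return std::string(1, byte_order_char()) + kind + std::to_string(size_in_bytes);
}

inline void native_typestr_usage() {
    const std::string s = native_typestr('i', 4);
    assert(s == "<i4" || s == ">i4");
}
```

Expected strings in the tests can then be built from the host byte order instead of hard-coding '<'.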

bmerry added 2 commits May 4, 2017 15:27

Break long lines to keep flake8 happy

eb0e179

aldanor reviewed May 5, 2017

bmerry added 5 commits May 5, 2017 10:00

Rename mutating extents function append_extents

bbdd282

Simplify test code by eliminating ArrayPrinter

c09915c

Having a general-purpose array printer took up more LOC than just writing code to print each of the arrays I wanted to print.

Use a static_assert to explicitly disallow zero-sized arrays

3a561e9

They can't be reliably supported due to numpy/numpy#9049.

dean0x7d reviewed May 7, 2017

bmerry added 5 commits May 8, 2017 10:06

Avoid need to instantiate digits member

3fb52f6

Thanks to @dean0x7d for figuring out how to prevent clang demanding a definition of the member.

Change constexpr const -> constexpr

f813cdb

Suggested by @dean0x7d. I've removed it from the changes in the pull request, but haven't touched other instances that were already in the code.

Change typedef to using

c76533f

Merge remote-tracking branch 'origin/master' into arrays-in-struct

aabe835

Prevent using array types directly with array_t

9e2f502

Add documentation about what structured types are acceptable

398905d

This is to go with the weakened is_pod_struct static_assert.

dean0x7d reviewed May 8, 2017

bmerry added 2 commits May 8, 2017 15:24

Revert style change on declaration of digits member

b84c086

This was a leftover from having the digits member defined (as opposed to declared).

dean0x7d merged commit 8e0d832 into pybind:master May 10, 2017

dean0x7d mentioned this pull request May 10, 2017

Allow std::complex field with PYBIND11_NUMPY_DTYPE #831

Merged

bmerry deleted the arrays-in-struct branch May 10, 2017 12:08

dean0x7d modified the milestone: v2.2 Aug 13, 2017

jagerman mentioned this pull request Feb 17, 2018

Fix numpy dtypes test on big-endian architectures #1287

Merged

rwgk mentioned this pull request Feb 9, 2023

FWD pybind11 google/pybind11clif#832

Closed
