Skip to content

Support arrays inside PYBIND11_NUMPY_DTYPE #832

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
May 10, 2017

Conversation

bmerry
Copy link
Contributor

@bmerry bmerry commented May 4, 2017

This implements #800.

Both C++ arrays and std::array are supported, including mixtures like
std::array<int, 2>[4]. In a multi-dimensional array of char, the last
dimension is used to construct a numpy string type.

There are some things I'm not sure about:

  • Is there a better name for array_info? The "array" could easily be
    misinterpreted as referring to py::array rather than std::array.
  • Should array_info be split into separate classes for the different
    traits, or is it okay to have this uber traits class?
  • When constructing the dtype, the shape should really be a tuple rather
    than a list, but I'm not sure how to dynamically construct a tuple (or
    how to convert the list to a tuple). numpy seems to accept a list.
  • Should the array dtypes be registered in some way? At present two
    structures embedding the same array type will each construct a
    separate dtype object to represent that array type. That's already the
    case for char[N] though.
  • Should I add tests to use an array type directly e.g.
    py::array_t<int[2]>?

bmerry added 2 commits May 4, 2017 15:27
This implements pybind#800.

Both C++ arrays and std::array are supported, including mixtures like
std::array<int, 2>[4]. In a multi-dimensional array of char, the last
dimension is used to construct a numpy string type.

There are some things I'm not sure about:
- Is there a better name for array_info? The "array" could easily be
  misinterpreted as referring to py::array rather than std::array.
- Should array_info be split into separate classes for the different
  traits, or is it okay to have this uber traits class?
- When constructing the dtype, the shape should really be a tuple rather
  than a list, but I'm not sure how to dynamically construct a tuple (or
  how to convert the list to a tuple). numpy seems to accept a list.
- Should the array dtypes be registered in some way? At present two
  structures embedding the same array type will each construct a
  separate dtype object to represent that array type. That's already the
  case for char[N] though.
- Should I add tests to use an array type directly e.g.
  py::array_t<int[2]>?
@bmerry
Copy link
Contributor Author

bmerry commented May 4, 2017

This is going to need a bit more work due to numpy/numpy#9049: misaligned arrays in a struct (or any array, if #831 is merged) will generate format strings like ^(3, 2)i which numpy chokes on.

Possibly the ^ should just be prepended to the T{} so that the whole struct gets parsed as packed, or it should be passed as the second argument to _dtype_from_pep3118.

@jagerman
Copy link
Member

jagerman commented May 4, 2017

Cc: @aldanor for input on the PR.

bmerry added a commit to bmerry/pybind11 that referenced this pull request May 4, 2017
This moves the ^ in the format string (to specify unaligned) to
outside the `T{}` where it is sure to be parsed correctly. This is not
strictly necessary yet, but it paves the way for pybind#832.
@aldanor
Copy link
Member

aldanor commented May 5, 2017

This actually looks fairly good. Minor, you could probably simplify the test code a bit by just overloading operator<< on the types you dump?

Changing T{} to ^T{} is probably a good point too, given we already provide our own padding anyway... wish that numpy folks figured it out, we've hit quite a few edge cases already with this.

@@ -70,6 +70,13 @@ struct StringStruct {
std::array<char, 3> b;
};

struct ArrayStruct {
char a[3][4];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe test zero-sized arrays as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zero-sized built-in arrays are illegal in C++ (GCC and Clang accept them, but std::is_array is false for them, so they can't easily be supported).

std::array<T, 0> is legal (C++ says so explicitly) and I've started writing some extra code to support them, but I now think they should be disallowed because they're causing all kinds of problems with numpy. _dtype_from_pep3118 is non-deterministic with zero-sized arrays (I'm guessing because offsets are not unique so do not sort consistently). Given ^T{f:d:(0)I:empty:4x}, it sometimes gives

[('d', '<f4'), ('empty', '<u4', (1,)), ('', 'V4')]

and sometimes gives

{'names':['d','pad0',''], 'formats':['<f4','V4',('<u4', (0,))], 'offsets':[0,4,4], 'itemsize':8}

In the latter case, the dtype match sanity check fails, presumably because the padding is now named pad0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, that wouldn't prevent using a struct with zero-sized arrays in it - you just couldn't reflect the zero-sized array into Python. That's no great loss given that the field contains no information anyway.

static constexpr const size_t extent = N;

// appends the extents to shape
static void extents(list& shape) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add_extent? append_extent? (to indicate that this is mutating the argument)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@bmerry
Copy link
Contributor Author

bmerry commented May 5, 2017

This actually looks fairly good. Minor, you could probably simplify the test code a bit by just overloading operator<< on the types you dump?

Are you referring to the ArrayPrinter stuff? Do you mean that I should write templates like

template <typename T, size_t N> ostream& operator<<(ostream& os, const T v[N])

That feels like it's overly generic and may conflict with something else.

In retrospect, it will probably take less code just to manually code up the loops in the ArrayStruct version of operator <<. I'll look into that today.

bmerry added 5 commits May 5, 2017 10:00
Having a general-purpose array printer took up more LOC than just
writing code to print each of the arrays I wanted to print.
- Use explicitly sized types rather than assuming 32-bit int etc
- Use uint8_t aka unsigned char, to ensure that it doesn't conflict with
  the special handling for char.
- Add padding in the middle of the structure, to ensure that this is
  handled correctly.
It's not clear why the other changes related to arrays triggered this
failure, but for some reason the compiler is generating references to
the symbol for the digits member of the int_to_str type.

Added the definition of the digits member so that the symbol will be
emitted if needed. A quick check suggests that this doesn't increase the
compiled size at all. I'm slightly suspicious because the size appears
to stay *exactly* the same, whereas it should go up just to make room
for the definition in the symbol table, but maybe some alignment is at
work.
@bmerry
Copy link
Contributor Author

bmerry commented May 5, 2017

I've made requested changes and fixed up the builds on Travis (Appveyor is still running, but I'm confident). This should be ready for more review and then hopefully merge.

@aldanor
Copy link
Member

aldanor commented May 7, 2017

+1 or static assert re: zero-sized arrays, if they can't be properly reflected anyway -- I didn't know that.

This PR looks all good to me.

Copy link
Member

@dean0x7d dean0x7d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me overall. The only issue I see is the missing digits symbol which can be handled differently as suggested in the comment below.

The array_info naming and all-in-one traits struct seem fine to me since it's in detail and can be broken up in the future if needed.

Should I add tests to use an array type directly e.g. py::array_t<int[2]>?

If this is supported now, it would be good to cover it with a test. Or make it a compile-time error otherwise.

static std::string format() {
using detail::_;
return (_("(") + detail::array_info<T>::extents() + _(")")).text()
+ format_descriptor<detail::remove_all_extents_t<T>>::format();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is causing the missing symbols with clang/C++14. The descr part must be made explicitly constexpr to avoid needing a definition for digits (i.e. PYBIND11_DESCR in place of constexpr auto for compatiblity with C++11). The following lines along with removing the digits definition reduces binary size slightly for me:

PYBIND11_DESCR name = _("(") + detail::array_info<T>::extents() + _(")");
return name.text() + format_descriptor<detail::remove_all_extents_t<T>>::format();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. And thanks for figuring out how to solve the problem.

@@ -243,6 +243,46 @@ template <typename T, size_t N> struct is_std_array<std::array<T, N>> : std::tru
template <typename T> struct is_complex : std::false_type { };
template <typename T> struct is_complex<std::complex<T>> : std::true_type { };

template <typename T> struct array_info_scalar {
typedef T type;
static constexpr const bool is_array = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const is redundant with constexpr. I'd remove it here and the other occurrences.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done for the code added in this pull request. There are other instances already in pybind11 that I haven't touched.

// treated as scalar because it gets special handling.
template <typename T> struct array_info : array_info_scalar<T> { };
template <typename T, size_t N> struct array_info<std::array<T, N>> {
typedef typename array_info<T>::type type;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using is pretty much always preferable to typedef. typedef typename and ::type type tend to be confusing and scare away newcomers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

bmerry added 5 commits May 8, 2017 10:06
Thanks to @dean0x7d for figuring out how to prevent clang demanding a
definition of the member.
Suggested by @dean0x7d. I've removed it from the changes in the pull
request, but haven't touched other instances that were already in the
code.
@bmerry
Copy link
Contributor Author

bmerry commented May 8, 2017

Should I add tests to use an array type directly e.g. py::array_t<int[2]>?
If this is supported now, it would be good to cover it with a test. Or make it a compile-time error otherwise.

It looks like it doesn't work quite as expected: when constructing a numpy array with an array dtype, the dtype dimensions are sucked into the array shape and the array has the base dtype. Maybe in theory one could extend the caster so that it could reverse this mapping (checking at run-time that the minor dimensions are the correct lengths and are C-contiguous), but that sounds like a major new feature rather than something I'm going to add to this pull request.

I've added a static_assert to array_t to prevent it being instantiated with array types.

This is to go with the weakened is_pod_struct static_assert.
@dean0x7d
Copy link
Member

dean0x7d commented May 8, 2017

Maybe in theory one could extend the caster so that it could reverse this mapping (checking at run-time that the minor dimensions are the correct lengths and are C-contiguous), but that sounds like a major new feature rather than something I'm going to add to this pull request.

I don't think it's really a must-have feature. Just having the static_assert seems perfectly reasonable to me.

@@ -69,7 +69,7 @@ template <size_t Size> constexpr descr<Size - 1, 0> _(char const(&text)[Size]) {

template <size_t Rem, size_t... Digits> struct int_to_str : int_to_str<Rem/10, Rem%10, Digits...> { };
template <size_t...Digits> struct int_to_str<0, Digits...> {
static constexpr auto digits = descr<sizeof...(Digits), 0>({ ('0' + Digits)..., '\0' }, { nullptr });
static constexpr const descr<sizeof...(Digits), 0> digits{{ ('0' + Digits)..., '\0' }, { nullptr }};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert this line as well since the missing symbols are fixed and this ends up being just an unrelated style change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reminder - should be fixed now.

bmerry added 2 commits May 8, 2017 15:24
The previous commit added documentation on what structure fields are
supported, and it incorrectly listed std::complex (which is in a
different branch). Also removed the note about the static assert since
that is also relevant to the std::complex branch only.
This was a leftover from having the digits member defined (as opposed to
declared).
@bmerry
Copy link
Contributor Author

bmerry commented May 10, 2017

Is there anything else remaining to do on this pull request? As far as I know the Appveyor failure is random (#792).

@dean0x7d dean0x7d merged commit 8e0d832 into pybind:master May 10, 2017
@dean0x7d
Copy link
Member

Nothing else, this is great -- merged! Thanks for also adding the note in the documentation.

@bmerry bmerry deleted the arrays-in-struct branch May 10, 2017 12:08
@dean0x7d dean0x7d modified the milestone: v2.2 Aug 13, 2017
jagerman added a commit to jagerman/pybind11 that referenced this pull request Feb 17, 2018
This fixes the test code on big-endian architectures: the array support
(PR pybind#832) had hard-coded the little-endian '<' but we need to use '>' on
big-endian architectures.
jagerman added a commit to jagerman/pybind11 that referenced this pull request Feb 17, 2018
This fixes the test code on big-endian architectures: the array support
(PR pybind#832) had hard-coded the little-endian '<' but we need to use '>' on
big-endian architectures.
wjakob pushed a commit that referenced this pull request Feb 18, 2018
This fixes the test code on big-endian architectures: the array support
(PR #832) had hard-coded the little-endian '<' but we need to use '>' on
big-endian architectures.
wjakob pushed a commit to wjakob/pybind11 that referenced this pull request Apr 29, 2018
This fixes the test code on big-endian architectures: the array support
(PR pybind#832) had hard-coded the little-endian '<' but we need to use '>' on
big-endian architectures.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants