Adoption of the Apache Arrow memory alignment and padding? #771
@paddyhoran if this happens would it make sense to use ndarray-stats for Arrow compute kernels? (I hope this is the good name for min, max, sum, count etc functions). |
Yes, perhaps. We started to implement some compute kernels in Arrow using SIMD, etc. However, the Rust community is relatively small to begin with.
There are probably some things that we need to resolve, for instance:
I think integration between the two could open up a lot of possibilities on both sides but I'm not sure how the |
Looking at this purely from an ecosystem integration perspective, it would be amazing! From a technical perspective, I am afraid I don't have enough mastery of the nitty-gritty details of With respect to your questions:
You would generally use
I am not aware of any optimization regarding boolean arrays in |
Hi Everyone,
I'm working on dictionary support (categories/factors) for the rust arrow
implementation.
The simplest thing would be to support bit packed bool vectors in ndarray.
These could be zipped with the data vectors.
I'm proposing to write iterator support for the arrow arrays in the near
future which could include the bitarrays. The
trick is to get the loops to vectorise.
It would be nice if there was an intrinsic in Rust to assert pointer
alignment. There may be such a thing
as it does exist in LLVM.
Arrow's 8 byte alignment is enough for pretty much every SIMD we are likely
to encounter except for power PC.
Andy.
…On Sun, Jan 5, 2020 at 11:21 AM Luca Palmieri ***@***.***> wrote:
Looking at this purely from an ecosystem integration perspective, it would
be amazing!
It would make it seamless to interoperate Rust code using ndarray with
systems developed in other language ecosystems using Apache Arrow (an
ever-growing list as far as I can see).
From a technical perspective, I am afraid I don't have enough mastery of
the nitty-gritty details of ndarray's internal memory representation (or
Arrow's 😅) to judge if there might be issues/incompatibilities.
@jturner314 <https://github.com/jturner314> / @bluss
<https://github.com/bluss> are probably the best suited to give a
high-level feasibility judgement - I'd be happy to work on this myself if
it's indeed viable.
With respect to your questions:
handling of Nulls, Arrow has a bitmap, does ndarray use a sentinel value?
You would generally use Option<T> if you are handling nullable values.
Arrow Boolean Arrays are bit packed, ndarray's might not
I am not aware of any optimization regarding boolean arrays in ndarray.
|
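To make the null-handling difference above concrete, here is a minimal Python sketch of a bit-packed validity bitmap zipped with a data vector. It assumes the Arrow convention that bits are numbered least-significant-bit first and that a set bit means "value present". The helper names `pack_validity`, `is_valid` and `iter_nullable` are hypothetical and belong to neither Arrow nor ndarray.

```python
def pack_validity(flags):
    """Pack booleans into bytes, 8 flags per byte, least-significant bit first."""
    flags = list(flags)
    bitmap = bytearray((len(flags) + 7) // 8)
    for i, present in enumerate(flags):
        if present:
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap)


def is_valid(bitmap, i):
    """Return True if bit i of the bitmap is set (value present)."""
    return bool((bitmap[i // 8] >> (i % 8)) & 1)


def iter_nullable(values, bitmap):
    """Zip the data vector with the bitmap, yielding None for null slots."""
    for i, value in enumerate(values):
        yield value if is_valid(bitmap, i) else None


values = [10, 20, 30, 40, 50, 60, 70, 80, 90]
bitmap = pack_validity([True, False, True, True, False, True, True, True, False])
nullable = list(iter_nullable(values, bitmap))
print(bitmap.hex(), nullable)
```

Nine flags fit in two bytes here, whereas an unpacked boolean array would spend one byte per flag. The per-element bit test in the loop is the part that makes vectorisation harder.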
@paddyhoran This is exactly why I suggested the cross-dependency. While both Rust Arrow and ndarray(-stats) are awesome, they don't have the "critical mass" yet; together they could be much better 🤞 |
X86 is generally the best of all CPUs for alignment requirements, as it will access data on two beats at any alignment with relatively little overhead. The exception may be "exotic" memory, such as GPU buffers and DMA I/O spaces. ARM can be a little fussy, but generally 4 bytes is good enough. Again, this depends very much on the flavour of ARM CPU and bus. Also, the LLVM implementation is super-conservative and generates terrible code. I'm talking to Philippe about this the next time I'm in Cambridge. PPC is always the hard one, as several instructions are needed to make an unaligned load or store. Worth noting are the benefits of non-temporal stores on X86, which significantly improve store speed; stores otherwise become a bottleneck for all vector operations. This is hard to achieve in Rust, as LLVM is resistant to the concept beyond basic intrinsic support. You need to put a fence at the end of the loop to make it thread safe. But on the whole, 128- or 256-bit alignment is likely to be a win. |
The current format of The best resources I've found about the Apache Arrow Tensor format are the following:
From what I can tell, the only restrictions on the memory layout of the data are that:
Given this, creating an Going the other way (from For owned arrays, we could allocate with 64-byte alignment when It would also be helpful to have a better understanding of the ownership semantics of the |
Hi @jturner314, Thanks for the detailed response.
Yes, I think this scenario should be relatively straightforward.
AFAIK the strides would always be a multiple of the element size and the use of bytes in the strides is an implementation detail (which could be changed).
Yes, this is where adoption of the Arrow functions that allocate memory would help. If
My understanding is as follows:
I think I will start small and add an implementation of converting from Arrow tensor to |
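As an illustration of the strides point above, the following Python sketch converts byte strides (as the Arrow Tensor format describes them) into element strides, and rejects any stride that is not a multiple of the element size. Both function names are hypothetical helpers written for this example; they are not Arrow or ndarray APIs.

```python
def row_major_byte_strides(shape, item_size):
    """Byte strides of a contiguous row-major (C-order) array."""
    strides = []
    step = item_size
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def byte_strides_to_element_strides(byte_strides, item_size):
    """Convert byte strides to element strides, failing if one does not divide evenly."""
    if item_size <= 0:
        raise ValueError("item_size must be positive")
    element_strides = []
    for stride in byte_strides:
        if stride % item_size != 0:
            raise ValueError(
                f"stride {stride} is not a multiple of item size {item_size}"
            )
        element_strides.append(stride // item_size)
    return tuple(element_strides)


shape = (2, 3, 4)
item_size = 8  # e.g. a 64-bit float
byte_strides = row_major_byte_strides(shape, item_size)
element_strides = byte_strides_to_element_strides(byte_strides, item_size)
print(byte_strides, element_strides)
```

If the strides are always a multiple of the element size, as suggested above, the conversion is lossless in both directions and the check never fires.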
One key question is in regard to how |
My concern is that simply allocating with 64-byte alignment does not ensure that the first element of an array will be 64-byte aligned, due to the possibility of slicing. Consider the following Python example:

    import numpy as np

    def is_first_elem_64_byte_aligned(array):
        return array.__array_interface__['data'][0] % 64 == 0

    aligned = np.zeros(10, dtype=np.uint8)
    print(is_first_elem_64_byte_aligned(aligned))
    unaligned = aligned[1:]
    print(is_first_elem_64_byte_aligned(unaligned))

The example prints
In other words, the allocation is 64-byte aligned, but the first element of the array
|
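The same effect can be reproduced deterministically with only the Python standard library. This is a sketch, not how ndarray or Arrow allocate: it over-allocates a `ctypes` buffer and picks a 64-byte-aligned starting offset by hand. `allocate_aligned` and `first_element_address` are hypothetical helper names.

```python
import ctypes

ALIGNMENT = 64


def allocate_aligned(nbytes, alignment=ALIGNMENT):
    """Over-allocate a byte buffer and return a view that starts on an aligned address."""
    raw = (ctypes.c_uint8 * (nbytes + alignment - 1))()
    # Distance from the start of the allocation to the next aligned address.
    offset = -ctypes.addressof(raw) % alignment
    # The memoryview keeps `raw` alive for as long as the view exists.
    return memoryview(raw)[offset:offset + nbytes]


def first_element_address(view):
    """Address of the first byte of a writable, contiguous memoryview."""
    return ctypes.addressof(ctypes.c_uint8.from_buffer(view))


aligned = allocate_aligned(10)
sliced = aligned[1:]  # same allocation, first element shifted by one byte

aligned_ok = first_element_address(aligned) % ALIGNMENT == 0
sliced_ok = first_element_address(sliced) % ALIGNMENT == 0
print(aligned_ok, sliced_ok)
```

The view returned by `allocate_aligned` starts on a 64-byte boundary by construction, while the one-byte slice of it cannot, even though both share the same underlying allocation.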
I'll need to look into how As you mentioned on #776 I think there's not much you can do about alignment of slices and ensuring that the first element is aligned at allocation covers a lot of common use cases. Technically, in your numpy example above you could have SIMD operate on the underlying data (without the I think you are right about investigating how Thanks again |
Just one note before you draft a PR -- I'd prefer for |
Yep, makes total sense. Thanks @jturner314 |
That makes sense if Arrow's memory allocation algorithm will never change. If it will, |
Integrating with Arrow seems very promising. The only "disappointment" is that Arrow focuses on columnar data, and its multidimensional type - Tensor - does not have many features or much focus. The default owned Array - would want to be able to abstract away allocation strategy, but is today in practice tied to Vec. In practice it offers zero-copy conversion from In the long run I'd like to do maybe A and definitely B. A) Remove the strong ties between ndarray's Unfortunately I haven't had much time for ndarray. My time & interest remains with the fundamentals of ndarray - like these questions. I'll say that if a PR doesn't address (A), then an arrow allocated array needs to use a different type, for example it could be a new storage type. |
I've been reading the Buffer code -- from docs.rs/arrow -- just to understand a bit, and I can't help but mention the various questions I have. All after a superficial understanding of the code, unfortunately. Still, I hope it helps. I'm using the principle that if I see something, I prefer to say something. cc @paddyhoran I'm sorry that I am not signing up for the Arrow JIRA at this point, so I'll mention it here Soundness bugs
|
It absolutely does help! Thanks for taking the time to review.
No worries. I don't blame you. I think that most of what you are describing is fixed / addressed in this PR. Would you mind taking a quick look to confirm? I can open JIRAs for anything else.
I think we need to focus on how we can make it a safe abstraction. The |
This is true today (functionality may expand in the future); its main feature is that it is trivial to convert to other Arrow types without copying data. The tensor type is mostly used to integrate with other frameworks (initially tensorflow, etc. on the Python side). The advantage would be that if you could zero-copy convert from |
As it stands, arrow's memory allocation functionality can not be used to allocate a general ndarray owned array, because of the hard-coded 64-byte alignment, just because this will be incorrect for element types that require higher alignment than 64; with the alignment attribute, I suppose this is easy to create. This is one minor technical niggle. I suppose Arrow defines its own type system, so it doesn't necessarily have to need the same generality. An arrow-allocated array in ndarray could have the same kind of restriction of element types, to only those permitted by the arrow model. Certainly conversions to Arrow types would be type system restricted in this way. (I suppose this is a tiresome approach, the nitpicks will never end, but this is part of working with |
Thanks again for your review and input @bluss. I opened ARROW-8480 to track the remaining issue you found. |
ARROW-8480 has now been resolved (https://issues.apache.org/jira/browse/ARROW-8480), but uses an unstable AllocRef API that's part of the |
@nevi-me It doesn't really look fixed unfortunately. |
How about |
Any plans on implementing this for arrow2? |
Hi,
I'm just trying to get a sense of the level of interest from the ndarray developers regarding adopting the Apache Arrow memory layout and padding.
I have been wanting to build integrations between Arrow and ndarray for some time. Today it should be easy enough to build a zero-copy converter to ndarray types. Arrow has a tensor type and this could be converted (with the optional names for dimensions in Arrow dropped).
However, without guarantees over the memory alignment and padding assumptions you could not go back to Arrow with zero-copy. The easiest way to do this would be for ndarray to use the Arrow functions that allocate memory through the Arrow Buffer type.
Arrow is attempting to make integrations between crates easier, I noticed this issue today. This is the kind of issue we could avoid.
In general, I think that Arrow and ndarray fit together quite nicely: Arrow could provide a lot of help processing data, and ndarray provides all the algorithms once data is cleaned and in memory.
I'm not very familiar with the ndarray codebase, if this sounds like a good idea could you point me to where you allocate memory etc. and any other information that might help?
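For readers unfamiliar with the alignment and padding assumptions mentioned above, here is a rough Python sketch of the arithmetic involved. It assumes the Arrow columnar format's recommendation that buffers start on a 64-byte-aligned address and are padded to a multiple of 64 bytes. `padded_length` and `meets_arrow_layout` are hypothetical names, and the check is a simplification of what a real zero-copy conversion would need to verify.

```python
ARROW_ALIGNMENT = 64  # assumed from the Arrow recommendation for buffers


def padded_length(nbytes, alignment=ARROW_ALIGNMENT):
    """Round a byte length up to the next multiple of `alignment`."""
    return (nbytes + alignment - 1) // alignment * alignment


def meets_arrow_layout(address, capacity, nbytes, alignment=ARROW_ALIGNMENT):
    """Check that a buffer starts aligned and has room for the padded length."""
    return address % alignment == 0 and capacity >= padded_length(nbytes, alignment)


# Ten 64-bit floats occupy 80 bytes, which pads out to 128 bytes.
nbytes = 10 * 8
print(padded_length(nbytes))
print(meets_arrow_layout(address=4096, capacity=128, nbytes=nbytes))
print(meets_arrow_layout(address=4096, capacity=80, nbytes=nbytes))
```

An allocation that was sized exactly for its elements fails the second condition, which is why going back to Arrow without a copy depends on how the memory was allocated in the first place.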