Use "native" types when possible #319
When passing an array from Julia to Python, if you want it to be a numpy array, you can simply do … PythonCall/JuliaCall aims to be agnostic to particular Python packages like numpy, so automatically converting to … Conversely, Python arrays become …
Thanks for the clarifications! I assume the … And I take it that … It would be nice if these points were mentioned in https://cjdoris.github.io/PythonCall.jl/stable/compat/
I get your point about not depending on specific Python packages, and I see how calling … But it seems there's no equivalent in the other direction: getting data from Python wraps things in a … Here's a concrete example of the issue:

```julia
using BenchmarkTools
using Strided
using PythonCall

a = rand(4000, 4000);
pa = PyArray(a);  # `a` wrapped as a PyArray
b = similar(a);
pb = PyArray(b);  # `b` wrapped as a PyArray

@btime b .= (a .+ a) ./ 2;              # plain Array destination
@btime pb .= (a .+ a) ./ 2;             # PyArray destination
@btime @strided b .= (a .+ a) ./ 2;     # Strided.jl on plain Arrays
@btime @strided pb .= (pa .+ pa) ./ 2;  # Strided.jl on PyArrays
```

Gives me the results:
One way to solve it is to keep …
A
I'm not familiar with Strided.jl, but maybe there is some integration work to do to make …
That would increase complexity quite a bit! Presumably Strided.jl relies on some trait somewhere that we could overload. Maybe in ArrayInterface.jl?
Yep.
Yes, they do have a trait they call "StridedView" (in https://github.com/Jutho/StridedViews.jl), which in theory could be implemented for PyArray. That would be nice, but it is just one example of one library. If, on the other hand, there were a PyDenseArray for the (extremely common) case that the data is actually dense (the strides work out), then this would work for "everything" that takes advantage of the standard DenseArray supertype (restricted though this may be). As far as complexity goes, the implementations of PyArray and PyDenseArray would be "mostly identical", so perhaps some creative use of …
Edit: added a link.
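Whether "the strides work out" can be decided from the shape, the byte strides and the element size alone. A minimal Python sketch, assuming strides are given in bytes as in numpy's `ndarray.strides` and the Python buffer protocol; `dense_layout` is a hypothetical helper, not part of PythonCall:

```python
from typing import Optional, Sequence


def dense_layout(shape: Sequence[int], strides: Sequence[int], itemsize: int) -> Optional[str]:
    """Return "F" (column-major), "C" (row-major), or None if the data is not dense.

    Hypothetical helper; strides are in bytes, as in the buffer protocol.
    """
    if len(shape) != len(strides):
        raise ValueError("shape and strides must have the same length")
    if any(n == 0 for n in shape):
        return "F"  # an empty array is trivially dense

    def matches(dims) -> bool:
        expected = itemsize
        for n, s in dims:
            # A dimension of length 1 never steps, so its stride is irrelevant.
            if n != 1 and s != expected:
                return False
            expected *= n
        return True

    dims = list(zip(shape, strides))
    if matches(dims):  # first index varies fastest (Julia Array order)
        return "F"
    if matches(reversed(dims)):  # last index varies fastest (numpy default order)
        return "C"
    return None


print(dense_layout((4000, 4000), (8, 32000), 8))  # F
print(dense_layout((4000, 4000), (32000, 8), 8))  # C
print(dense_layout((4000, 2000), (8, 64000), 8))  # None (every other column)
```

Column-major is checked first, so layouts that are both, such as 1-D arrays, report `"F"`.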
Another implementation strategy, hopefully simpler: since DenseArray is an abstract type, it should be possible to write a DenseView type that has an Array field (created with jl_ptr_to_array), plus a second field of type Any that keeps whatever is holding the data alive for the GC. Declare DenseView to be a subtype of DenseArray and forward the relevant methods. Then it would just be a matter of wrapping PyArray with a DenseView (obviously, only if the strides work out). "In theory", this should allow anything that expects a DenseArray to "just work"...?
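The DenseView idea (a non-owning view plus a reference that keeps the owner alive) has a direct Python analogue. A sketch using only the standard library, where ctypes' `from_address` plays the role of `jl_ptr_to_array` (the resulting view does not keep the memory alive by itself) and the `owner` field plays the role of the `Any` field. The class is hypothetical, supports only two element types, and assumes the owning `array` is never resized while the view exists:

```python
import ctypes
from array import array

# Map array.array typecodes to matching ctypes element types (small subset).
_CTYPES = {"d": ctypes.c_double, "i": ctypes.c_int}


class DenseView:
    """Hypothetical non-owning view plus a keep-alive reference to the owner."""

    __slots__ = ("data", "owner")

    def __init__(self, owner: array):
        address, length = owner.buffer_info()
        ctype = _CTYPES[owner.typecode]
        # A raw-pointer view: it does NOT hold a reference to `owner`.
        self.data = (ctype * length).from_address(address)
        # This field is what stops the GC from freeing the memory.
        self.owner = owner

    # Forward the few methods callers need to the raw view.
    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        return self.data[i]

    def __setitem__(self, i, value):
        self.data[i] = value


buf = array("d", [1.0, 2.0, 3.0])
view = DenseView(buf)
view[0] = 10.0
print(buf[0])  # 10.0: the write went straight to the shared memory
```

Dropping the `owner` field would leave `data` pointing at freed memory once the last other reference to the buffer went away, which is exactly the failure the extra field is meant to prevent.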
I'm getting good results with the following. I can use it to wrap a …
That …
Problem
Currently, when passing arrays between Python and Julia, the code always uses a wrapper object (`PyArray`, `ArrayValue`, etc.), even if the data happens to be contiguous in memory. This is a problem because not all code works well (or at all) with these wrappers. In an ideal world with proper interfaces this wouldn't be a problem, but both Python and Julia are somewhat lax about strict interfaces for arrays, so in practice things sometimes break (or just run slowly).

As just one trivial example, this will crash:

It fails with `TypeError: unsupported operand type(s) for +=: 'VectorValue' and 'int'`.

The point isn't this specific missing operation (though fixing it would be nice); the point is that, try as we may, we'll never make `VectorValue` a 100% drop-in replacement for `ndarray`.
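The mechanism behind that error is easy to reproduce in plain Python: a class that exposes the elements but defines neither `__iadd__` nor `__add__` cannot take part in `+=`. The class below is a hypothetical toy, not PythonCall's actual `VectorValue`:

```python
class ToyVectorValue:
    """Hypothetical wrapper exposing a vector's elements (sequence protocol)
    but defining no arithmetic operators."""

    def __init__(self, data):
        self._data = list(data)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, i):
        return self._data[i]


v = ToyVectorValue([1, 2, 3])
try:
    v += 1  # no __iadd__, no __add__, and int.__radd__ does not know this type
except TypeError as err:
    print(err)  # unsupported operand type(s) for +=: 'ToyVectorValue' and 'int'
```

Each missing special method is a separate gap to close by hand in the wrapper, whereas a real `ndarray` already has all of them.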
Solution
When converting arrays between Julia and Python, if the data is contiguous, use `frombuffer` to wrap the memory in an `ndarray` for Python, or use `jl_ptr_to_array` to wrap the memory in an `Array` for Julia. If, however, the data is not contiguous in memory, keep the current behavior of returning a wrapper type.

Alternatives
This can be done manually, of course, which is probably what I will be doing in my code for now. That said, even if this doesn't become the default behavior, it would be nice to provide some convenience functions that make it easier to achieve.
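Such a convenience function could simply dispatch on contiguity. A minimal Python sketch that uses the standard-library `memoryview` as a stand-in for the zero-copy wrap (the role `numpy.frombuffer` or `jl_ptr_to_array` would play in the proposal); `as_native` and the `make_wrapper` callback are hypothetical names for illustration, not existing PythonCall API:

```python
from typing import Any, Callable


def as_native(obj: Any, make_wrapper: Callable[[Any], Any]) -> Any:
    """Zero-copy view if obj's buffer is contiguous, else fall back to a wrapper."""
    view = memoryview(obj)  # raises TypeError if obj has no buffer interface
    if view.contiguous:  # True for C- or Fortran-contiguous buffers
        return view  # shares memory with obj: writes are visible on both sides
    view.release()
    return make_wrapper(obj)


data = bytearray(b"abcdef")
native = as_native(data, make_wrapper=lambda o: ("wrapper", o))
native[0] = ord("z")
print(data)  # bytearray(b'zbcdef')

strided = memoryview(bytearray(b"abcdef"))[::2]  # every other byte: not contiguous
print(as_native(strided, make_wrapper=lambda o: ("wrapper", o))[0])  # wrapper
```

Keeping the wrapper as the fallback means non-contiguous data behaves exactly as it does today; only the common dense case changes.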
Additional context
Here is some example code which worked for me to demonstrate the feasibility of the solution:
Naturally, this is just a proof of concept and doesn't deal with issues such as proper mapping of `dtype`, testing whether the data is actually contiguous, etc.

One point worth noting is that the fact that `Array` "likes" to be column-major and `ndarray` "likes" to be row-major is not a showstopper here. I have plenty of Python code which explicitly works with column-major arrays (because that's what's needed for efficiency), and Julia has `PermutedDimsArray`, which can likewise deal with row-major order. It is always the developer's responsibility to deal with this issue (summing rows of column-major data will be much slower than summing rows of the same data in row-major order).

As a side note, the built-in implementations in both numpy and Julia for converting data between these layouts are slow as molasses for no good reason, so I had to provide my own implementation to get reasonable performance. But that's mostly irrelevant to the issue I raised here.
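The layout point can be checked with index arithmetic: the memory of a column-major m-by-n array is, element for element, a row-major n-by-m array, so reversing the dimensions gives a zero-copy view rather than a conversion. A small Python sketch using element strides rather than byte strides; `strides_for` and `offset` are hypothetical helpers:

```python
from itertools import product
from typing import Sequence, Tuple


def strides_for(shape: Sequence[int], order: str) -> Tuple[int, ...]:
    """Element strides of a dense array in "F" (column-major) or "C" (row-major) order."""
    dims = list(shape) if order == "F" else list(reversed(shape))
    strides, step = [], 1
    for n in dims:  # the first dimension visited varies fastest
        strides.append(step)
        step *= n
    return tuple(strides) if order == "F" else tuple(reversed(strides))


def offset(index: Sequence[int], strides: Sequence[int]) -> int:
    """Linear element offset of a multi-dimensional index."""
    return sum(i * s for i, s in zip(index, strides))


m, n = 3, 4
col = strides_for((m, n), "F")  # Julia's default Array layout
row = strides_for((n, m), "C")  # numpy's default ndarray layout
print(col, row)  # (1, 3) (3, 1)

# Element (i, j) of the column-major m-by-n array sits at the same offset as
# element (j, i) of the row-major n-by-m array: a transpose over the same memory.
assert all(offset((i, j), col) == offset((j, i), row) for i, j in product(range(m), range(n)))

# Walking along a row of the column-major array jumps col[1] == m elements per
# step, which is why summing rows in that layout is slower.
```

This is the same stride permutation that `PermutedDimsArray` in Julia and `ndarray.T` in numpy apply without copying.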