Skip to content

add convenience wrappers for ccall(:jl_reshape_array,...) #496

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

chethega
Copy link
Contributor

Reinterpreting between Array{T} and Array{SVector{T}} is one of the most common operations when using StaticArrays. This adds a convenient wrapper for the underlying runtime function. This does literally the same as the old reinterpret, but is now maybe unsafe due to type-based aliasing analysis. I am open for changing the name and the usage. This avoids the performance impact of #410.

Examples:

julia> using StaticArrays

julia> A=reshape(collect(1:8), (2,2,2))
2×2×2 Array{Int64,3}:
[:, :, 1] =
 1  3
 2  4

[:, :, 2] =
 5  7
 6  8

 julia> Asv = unsafe_squashdims(A,(:,:))
2-element Array{SArray{Tuple{2,2},Int64,2,4},1}:
 [1 3; 2 4]
 [5 7; 6 8]

 julia> A[2,2,1]=-1; A[2,2,1]==Asv[1][2,2]

ulia> MM=zeros(SVector{2,Int32},2)
2-element Array{SArray{Tuple{2},Int32,1,2},1}:
 [0, 0]
 [0, 0]

 julia> M=unsafe_unsquash(MM); M[1,1]=-1; M[2,1]=-2; M
2×2 Array{Int32,2}:
 -1  0
 -2  0

julia> MM
2-element Array{SArray{Tuple{2},Int32,1,2},1}:
 [-1, -2]
 [0, 0]  

@coveralls
Copy link

coveralls commented Sep 15, 2018

Coverage Status

Coverage remained the same at 94.992% when pulling 92287a2 on chethega:squash into 81dd6ca on JuliaArrays:master.

@andyferris
Copy link
Member

Thanks a lot for this. I agree we need to address this!

A couple of thoughts:

  • On naming, SplitApplyCombine has a function that does something similar to this called splitdims and the reverse called combinedims, for creating or flattening nested arrays. Not sure if there is other nomenclenture around for similar operations. This seems like a version of these where one of the axes has a static size. Thus, perhaps we could name it something similar (like unsafe_splitdims).

  • I wondering if we can work around the Array TBAA stuff entirely by making our own array wrapper type. In UmanagedMemory I made a pointer-based wrapper which seemd (in some limited benchmarks) to be just as fast in performance as Array. I'd love to have a working understanding of what the consequences of TBAA are...

Finally - @Keno and @vtjnash, as always I'd appreciate knowing whether you think we're crazy :)

@vtjnash
Copy link

vtjnash commented Sep 16, 2018

By violating the aliasing rules, you are encouraging LLVM to miscompile the code. By using unsafe load/store, you are simply encouraging LLVM not to optimize it. Neither matters in trivial tests—they only become problematic if someone wants to integrate the code with their own.

@chethega
Copy link
Contributor Author

By violating the aliasing rules, you are encouraging LLVM to miscompile the code.

I know! That's why we should offer multiple versions, this being the unsafe one.

The way I use such constructs looks like: Read data as Array. Run some very expensive operations on the data using the SArray-form. Run other operations on the Array-form (maybe using external libraries like BLAS). Maybe write transformed data as Array.

Aliasing accesses are typically separated my multiple non-inlined, often type-unstable/non-inferred (size of the SArray), function boundaries, and often even across language boundaries.

For such purposes, a ReinterpretArray, or any pure julia construct, cannot not work, even if its performance gets fixed: FFI-related functions wisely should dispatch on Array, not AbstractArray.

I don't know whether there is an optimization-boundary that people can use and we could document/export.

For example, I could imagine also a aliasing_analysis_boundary(a,b)=ccall(..., pointer(a),pointer(b)) that calls into some trivial function that llvm cannot see into. Then, one could request users to separate aliasing read/write accesses by a call to this function (well, better make it a macro; then one could avoid the insubstantial run-time cost of the ccall ifsome meta_here_be_dragons ever makes it into Base).

I wondering if we can work around the Array TBAA stuff entirely by making our own array wrapper type

By using unsafe load/store, you are simply encouraging LLVM not to optimize it.

I really liked @Keno's idea of the new tbaa_pointerref builtin that unfortunately didn't get merged. Some variant of this would be cool and allow us non-core people to experiment, e.g. permitting julia code to set all attributes of pointerref/pointerset (maybe including dereferencable, nonnull, alignment, aliasing tags, etc -- make us read the llvm IR manual and expose all the attributes).

On naming and signature: what about unsafe_SAcombine(a::Array{T,N}, dims)::Array{<:SArray{T}, N - dims} and unsafe_SAsplit? Upon reconsideration, there is no way in hell to make unsafe_SAcombine / unsafe_squashdims amenable to inference or type-stable. So no reason to lift dims to the type domain (which I did via the NTuple{Colon}).

@chethega
Copy link
Contributor Author

So for all this tbaa-discussion, a little reality check:

julia> @noinline function aliasornot(A,B)
       @inbounds begin
       v=A[1]
       B[1]=zero(eltype(B))
       A[1]=v
       end
       return nothing
       end

Now the resulting @code_native should allow us to infer what llvm effectively believes about aliasing. Result: LLVM appears to currently (1.0.0 and master) act on no aliasing assumptions regarding differently typed arrays, like e.g. UInt8, Float32, Float64, Int64. Is all the current reinterpret-hassle premature because we don't even reap the performance benefits of array-type tbaa?

@Keno
Copy link
Contributor

Keno commented Sep 16, 2018

Correct, julia master doesn't express the TBAA information, but the reason it had to be done in 1.0 was that changing the behavior of reinterpret is a breaking change.

@chethega
Copy link
Contributor Author

Thanks Keno for the explanation!

So I added a safe version using the new reinterpret, and switched the name and calling convention around a little. Better?

Also, I'd like to note that reshape/reinterpret-chains accumulate and don't appear to saturate. That could be optimized (on the other hand, doing this is just silly).

julia> zeros(Int, 1, 1)|> packdims |> unpackdims |> packdims |> unpackdims

1×1 reshape(reinterpret(Int64, reinterpret(SArray{Tuple{1},Int64,1,1}, reinterpret(Int64, reinterpret(SArray{Tuple{1},Int64,1,1}, ::Array{Int64,1})))), 1, 1) with eltype Int64:
 0

@andyferris
Copy link
Member

I honestly don't feel that comfortable providing code that might be miscompiled, even if the method begins with unsafe_ (AFAICT this is meant to represent the normal, C-style pointer unsafeness, not damn you straight to undefined behavior).

On the other hand, given this:

Correct, julia master doesn't express the TBAA information, but the reason it had to be done in 1.0 was that changing the behavior of reinterpret is a breaking change.

I'm wondering if we should maybe upper-bound Julia so it's in the range [0.7 1.1) in the REQUIRE file, i.e. julia 0.7 1.1? (I'm assuming that making available the TBAA info to LLVM won't be backported to 1.0).

@Keno
Copy link
Contributor

Keno commented Sep 16, 2018

I'm assuming that making available the TBAA info to LLVM won't be backported to 1.0

I'm willing to make that guarantee.

@andyferris
Copy link
Member

For such purposes, a ReinterpretArray, or any pure julia construct, cannot not work, even if its performance gets fixed: FFI-related functions wisely should dispatch on Array, not AbstractArray.

I kinda disagree. Really, we should have traits for array memory layouts which tell us when we can get a raw pointer to (column oriented) dense data - this has been discussed elsewhere. In the status quo, there is DenseArray which should also be suitable.

@chethega
Copy link
Contributor Author

I honestly don't feel that comfortable providing code that might be miscompiled, even if the method begins with unsafe_ (AFAICT this is meant to represent the normal, C-style pointer unsafeness, not damn you straight to undefined behavior).

Call it unsafe_packdims for now, and rename to unsafe_really_packdims in the version when tbaa arrives, thus intentionally breaking all relying code? Because every use might need review once we get and can test tbaa.

Provide and document now but refuse to export, and hope that most people use the safe and convenient packdims unless they get really desperate about performance or too narrow FFI dispatches?

Regarding the signature, I am quite happy with the change; especially the most common case Matrix - Vector{SVector} is easy to spell. The name packdims is suggestive but might be too prominent.

@andyferris
Copy link
Member

The old reintrepret was type-stable because you specified the new element type. I'm wondering whether we can use Size, or a concrete static array type in the signature for this? One example:

array = randn(3, 100)
splitdims(array, SVector{3, Float64})

I'd really rather use a keyword argument but they get inferred as DataType, not Type{T}.

"Pack" and "unpack" aren't bad, by the way, it just might be nice to use similar terminology across packages and base. The other package that implements this array-splitting funcitonality is JuliennedArrays, which uses functions named julienne and align.

@chethega
Copy link
Contributor Author

Hmm. So something along the line of the following?

function splitdims(A,::Type{SAType}==Nothing; dims=1) where SVType
if SVType == Nothing
#use size and dims to build the resulting eltype
else
#resulting eltype specified. Check whether the resulting reinterpret makes sense, size-wise, or throw.
end

My reason for the dims specifier is that SArray-Types are annoying to construct (I, at least, always forget how to specify them). A single type-instability is not problematic when isolated behind function barriers.

If the caller already knows the resulting type, he can place a type-assert behind the call? The speedup for knowing the restype in advance should be in the low usec, since we basically do a cheap ccall plus a couple of boxes, and nobody in his right mind should call this function in a loop anyway. Or add an optional argument res_size that allows constant propagation to maybe infer the restype?

Re name: static_align and static_julienne also makes sense, and one could adopt the same API, throwing on invalid (non-contiguous) specifiers, which basically goes back to the NTuple{Colon} one commit before. I slightly dislike align (clashes with intuition about pointer alignment), but otherwise have no real preference between julienne + unjulienne, packdims, squashdims, splitdims + unsplitdims, slicedims, ...

@mcabbott
Copy link
Collaborator

What's the current thinking on having such convenience functions, with or without unsafe magic?

Before finding this thread I found myself writing such a function: slice(A::Array, s::Size) seemed a fairly natural way to say what gets split off, stably, without having to remember the signature. Link to mine.

I see that was one of the proposals above. No reason one couldn't also accept a number of dimensions and dispatch. But I think keyword dims=2 to mean two dimensions, not the 2nd one, seems misleading, perhaps ndims=2?

Re names, note that JuliennedArrays has moved to Slice / Align, and of course Julia 1.1 has eachslice. In my head glue is the opposite of slice, perhaps that's just me... but pack/unpack seem hard to remember which is which.

@andyferris
Copy link
Member

What's the current thinking on having such convenience functions, with or without unsafe magic?

Well... I think we all agree that doing this operation should be convenient enough.

@c42f c42f mentioned this pull request Apr 18, 2019
@c42f c42f added the performance runtime performance label Aug 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance runtime performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants