Description
There is a serious regression in Julia v1.11 that makes it completely unusable for my applications built on top of SIMD.jl + MultiFloats.jl. I think it was supposed to be fixed in #56937 and #56938, but I still observe the issue in v1.11.4.
If we have a struct with a member of type NTuple{N,VecElement{T}}, reading that struct member from memory generates an aligned vector load instruction. This is a serious problem because allocations are no longer always 64-byte-aligned in Julia v1.11, and aligned AVX-512 loads fault if the target address is not 64-byte aligned. This makes it impossible to work with these structs, which badly breaks SIMD.jl. We cannot even print them:
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.11.4 (2025-03-10)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |
julia> struct S; data::NTuple{8,VecElement{Float64}}; end
julia> for _ = 1:10; v = Vector{S}(undef, 1); println(v); end
S[
[22569] signal 11 (128): Segmentation fault
in expression starting at REPL[2]:1
getindex at ./essentials.jl:917 [inlined]
show_delim_array at ./show.jl:1397
show_delim_array at ./show.jl:1387 [inlined]
show_vector at ./arrayshow.jl:530
show_vector at ./arrayshow.jl:515 [inlined]
show at ./arrayshow.jl:486 [inlined]
print at ./strings/io.jl:35
print at ./strings/io.jl:46
println at ./strings/io.jl:75
unknown function (ip: 0x70b1b3f04e46)
println at ./coreio.jl:4
top-level scope at ./REPL[2]:1
jl_toplevel_eval_flex at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/toplevel.c:934
jl_toplevel_eval_flex at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval_in at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:430 [inlined]
eval_user_input at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:245
repl_backend_loop at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:342
#start_repl_backend#59 at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:327
start_repl_backend at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:324
#run_repl#72 at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:483
run_repl at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:469
jfptr_run_repl_10102 at /home/dkzhang/.julia/juliaup/julia-1.11.4+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_PBQaY.so (unknown line)
#1150 at ./client.jl:446
jfptr_YY.1150_14761 at /home/dkzhang/.julia/juliaup/julia-1.11.4+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_PBQaY.so (unknown line)
jl_apply at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
jl_f__call_latest at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/builtins.c:875
#invokelatest#2 at ./essentials.jl:1055 [inlined]
invokelatest at ./essentials.jl:1052 [inlined]
run_main_repl at ./client.jl:430
repl_main at ./client.jl:567 [inlined]
_start at ./client.jl:541
jfptr__start_73560 at /home/dkzhang/.julia/juliaup/julia-1.11.4+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
true_main at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/jlapi.c:900
jl_repl_entrypoint at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/jlapi.c:1059
main at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/cli/loader_exe.c:58
unknown function (ip: 0x70b1b522a1c9)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 800305 (Pool: 800275; Big: 30); GC: 1
Segmentation fault (core dumped)
Here, I run 10 trials to be conservative; on my machine, the segfault always occurs within the first 3-4 tries. We can see the problematic load instruction using code_native:
julia> code_native(getindex, (Vector{S}, Int))
.text
.file "getindex"
.globl julia_getindex_399 # -- Begin function julia_getindex_399
.p2align 4, 0x90
.type julia_getindex_399,@function
julia_getindex_399: # @julia_getindex_399
; Function Signature: getindex(Array{Main.S, 1}, Int64)
; ┌ @ essentials.jl:914 within `getindex`
# %bb.0: # %top
; │ @ essentials.jl within `getindex`
#DEBUG_VALUE: getindex:A <- [DW_OP_deref] $rsi
#DEBUG_VALUE: getindex:i <- $rdx
#DEBUG_VALUE: getindex:A <- [DW_OP_deref] 0
push rbp
mov rbp, rsp
sub rsp, 16
; │ @ essentials.jl:916 within `getindex`
lea rax, [rdx - 1]
cmp rax, qword ptr [rsi + 16]
jae .LBB0_2
# %bb.1: # %L15
; │ @ essentials.jl:917 within `getindex`
mov rcx, qword ptr [rsi]
shl rax, 6
vmovaps zmm0, zmmword ptr [rcx + rax]
vmovaps zmmword ptr [rdi], zmm0
mov rax, rdi
add rsp, 16
pop rbp
vzeroupper
ret
.LBB0_2: # %L12
; │ @ essentials.jl:916 within `getindex`
mov qword ptr [rbp - 8], rdx
movabs rcx, offset j_throw_boundserror_411
lea rax, [rbp - 8]
mov rdi, rsi
mov rsi, rax
call rcx
.Lfunc_end0:
.size julia_getindex_399, .Lfunc_end0-julia_getindex_399
; └
# -- End function
.type ".L+Main.S#401",@object # @"+Main.S#401"
.section .rodata,"a",@progbits
.p2align 3, 0x0
".L+Main.S#401":
.quad ".L+Main.S#401.jit"
.size ".L+Main.S#401", 8
.set ".L+Main.S#401.jit", 124970931979792
.size ".L+Main.S#401.jit", 8
.section ".note.GNU-stack","",@progbits
Either the vmovaps that loads from [rcx + rax] needs to be a vmovups, or all allocations need to be 64-byte-aligned again.
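The alignment claim can also be checked without triggering the crash, by inspecting the address of freshly allocated arrays instead of reading their elements. The following is a minimal sketch and was not part of the original reproduction: count_aligned is a hypothetical helper name, and the trial count of 1000 is arbitrary. It only computes addresses and never loads an S, so it should not hit the aligned load shown above.

```julia
# Same struct as in the reproduction above: 8 x Float64 = 64 bytes.
struct S
    data::NTuple{8,VecElement{Float64}}
end

# Hypothetical helper: count how many of `trials` fresh Vector{S}
# allocations have their element storage on an `alignment`-byte boundary.
function count_aligned(trials::Integer; alignment::Integer = 64)
    aligned = 0
    for _ in 1:trials
        v = Vector{S}(undef, 1)
        # pointer(v) is the address of the first element; we only look at
        # the address and never dereference it.
        if UInt(pointer(v)) % alignment == 0
            aligned += 1
        end
    end
    return aligned
end

n = count_aligned(1000)
println("64-byte-aligned allocations: ", n, " / 1000")
```

If the count is below 1000, at least some element buffers are not 64-byte aligned, and an aligned 64-byte load from one of them would be expected to fault.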