Segmentation fault in Julia v1.11.4 due to unaligned SIMD loads #57713

Closed
@dzhang314

There is a serious regression in Julia v1.11 that makes it completely unusable for my applications built on top of SIMD.jl + MultiFloats.jl. I think it was supposed to be fixed in #56937 and #56938, but I still observe the issue in v1.11.4.

If a struct has a member of type NTuple{N,VecElement{T}}, reading that member from memory generates an aligned vector load instruction. This is a serious problem because allocations are no longer 64-byte-aligned in Julia v1.11, and an aligned AVX-512 load (vmovaps on a zmm register) faults if the target address is not 64-byte-aligned. This makes it impossible to work with these structs, which badly breaks SIMD.jl. We cannot even print them:

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.11.4 (2025-03-10)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> struct S; data::NTuple{8,VecElement{Float64}}; end

julia> for _ = 1:10; v = Vector{S}(undef, 1); println(v); end
S[
[22569] signal 11 (128): Segmentation fault
in expression starting at REPL[2]:1
getindex at ./essentials.jl:917 [inlined]
show_delim_array at ./show.jl:1397
show_delim_array at ./show.jl:1387 [inlined]
show_vector at ./arrayshow.jl:530
show_vector at ./arrayshow.jl:515 [inlined]
show at ./arrayshow.jl:486 [inlined]
print at ./strings/io.jl:35
print at ./strings/io.jl:46
println at ./strings/io.jl:75
unknown function (ip: 0x70b1b3f04e46)
println at ./coreio.jl:4
top-level scope at ./REPL[2]:1
jl_toplevel_eval_flex at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/toplevel.c:934
jl_toplevel_eval_flex at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval_in at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:430 [inlined]
eval_user_input at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:245
repl_backend_loop at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:342
#start_repl_backend#59 at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:327
start_repl_backend at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:324
#run_repl#72 at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:483
run_repl at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:469
jfptr_run_repl_10102 at /home/dkzhang/.julia/juliaup/julia-1.11.4+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_PBQaY.so (unknown line)
#1150 at ./client.jl:446
jfptr_YY.1150_14761 at /home/dkzhang/.julia/juliaup/julia-1.11.4+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_PBQaY.so (unknown line)
jl_apply at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
jl_f__call_latest at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/builtins.c:875
#invokelatest#2 at ./essentials.jl:1055 [inlined]
invokelatest at ./essentials.jl:1052 [inlined]
run_main_repl at ./client.jl:430
repl_main at ./client.jl:567 [inlined]
_start at ./client.jl:541
jfptr__start_73560 at /home/dkzhang/.julia/juliaup/julia-1.11.4+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
true_main at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/jlapi.c:900
jl_repl_entrypoint at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/jlapi.c:1059
main at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/cli/loader_exe.c:58
unknown function (ip: 0x70b1b522a1c9)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 800305 (Pool: 800275; Big: 30); GC: 1
Segmentation fault (core dumped)
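
To see the misalignment without triggering the crash, one can inspect the data pointer of each freshly allocated array instead of reading its elements. The following is a minimal sketch; `misalignment` is a hypothetical helper name introduced here for illustration, and the values it reports depend on the allocator and the machine.

```julia
struct S
    data::NTuple{8,VecElement{Float64}}
end

# Hypothetical helper: byte offset of the array's data pointer from the
# previous 64-byte boundary. 0 means the buffer is 64-byte-aligned.
misalignment(v::Vector) = UInt(pointer(v)) % 64

# Allocate ten arrays, as in the session above, but never read an element,
# so no vector load is executed.
offsets = [misalignment(Vector{S}(undef, 1)) for _ in 1:10]
println(offsets)
```

Any nonzero entry marks a buffer on which an aligned 64-byte load would fault.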

In the REPL session above, I run 10 trials to be conservative; on my machine, I always get a segfault within the first 3-4 tries. We can see the problematic load instruction using code_native:

julia> code_native(getindex, (Vector{S}, Int))
	.text
	.file	"getindex"
	.globl	julia_getindex_399              # -- Begin function julia_getindex_399
	.p2align	4, 0x90
	.type	julia_getindex_399,@function
julia_getindex_399:                     # @julia_getindex_399
; Function Signature: getindex(Array{Main.S, 1}, Int64)
; ┌ @ essentials.jl:914 within `getindex`
# %bb.0:                                # %top
; │ @ essentials.jl within `getindex`
	#DEBUG_VALUE: getindex:A <- [DW_OP_deref] $rsi
	#DEBUG_VALUE: getindex:i <- $rdx
	#DEBUG_VALUE: getindex:A <- [DW_OP_deref] 0
	push	rbp
	mov	rbp, rsp
	sub	rsp, 16
; │ @ essentials.jl:916 within `getindex`
	lea	rax, [rdx - 1]
	cmp	rax, qword ptr [rsi + 16]
	jae	.LBB0_2
# %bb.1:                                # %L15
; │ @ essentials.jl:917 within `getindex`
	mov	rcx, qword ptr [rsi]
	shl	rax, 6
	vmovaps	zmm0, zmmword ptr [rcx + rax]
	vmovaps	zmmword ptr [rdi], zmm0
	mov	rax, rdi
	add	rsp, 16
	pop	rbp
	vzeroupper
	ret
.LBB0_2:                                # %L12
; │ @ essentials.jl:916 within `getindex`
	mov	qword ptr [rbp - 8], rdx
	movabs	rcx, offset j_throw_boundserror_411
	lea	rax, [rbp - 8]
	mov	rdi, rsi
	mov	rsi, rax
	call	rcx
.Lfunc_end0:
	.size	julia_getindex_399, .Lfunc_end0-julia_getindex_399
; └
                                        # -- End function
	.type	".L+Main.S#401",@object         # @"+Main.S#401"
	.section	.rodata,"a",@progbits
	.p2align	3, 0x0
".L+Main.S#401":
	.quad	".L+Main.S#401.jit"
	.size	".L+Main.S#401", 8

.set ".L+Main.S#401.jit", 124970931979792
	.size	".L+Main.S#401.jit", 8
	.section	".note.GNU-stack","",@progbits
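
The same inspection can be scripted, which may be handy when comparing Julia versions. This is only a rough sketch: a plain substring search is a heuristic (vmovaps can also appear for stack accesses that are legitimately aligned), and the exact text of the generated code varies by CPU and Julia version.

```julia
using InteractiveUtils  # provides code_native and code_llvm outside the REPL

struct S
    data::NTuple{8,VecElement{Float64}}
end

# Capture the generated code as strings instead of printing it.
asm = sprint(code_native, getindex, (Vector{S}, Int))
ir  = sprint(code_llvm, getindex, (Vector{S}, Int))

# Heuristic checks only; inspect the full listings before drawing conclusions.
println("native code mentions vmovaps: ", occursin("vmovaps", asm))
println("LLVM IR mentions align 64:    ", occursin("align 64", ir))
```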

Either the vmovaps load in getindex needs to be a vmovups (the unaligned form), or all allocations need to be 64-byte-aligned again.
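
Until one of those happens, a possible stopgap is to bypass getindex and read the lanes one scalar at a time. The sketch below is an assumption-laden illustration, not a fix: `load_lanes` is a hypothetical helper that is not part of Julia, SIMD.jl, or MultiFloats.jl, and it has not been verified on an affected v1.11.4 build. It returns an ordinary NTuple{8,Float64} so that no NTuple of VecElement is ever loaded from the heap buffer.

```julia
struct S
    data::NTuple{8,VecElement{Float64}}
end

# Hypothetical helper: read element `i` of `v` lane by lane through a
# Ptr{Float64}. Scalar `unsafe_load` does not assume 64-byte alignment.
function load_lanes(v::Vector{S}, i::Integer)
    checkbounds(v, i)
    GC.@preserve v begin
        p = convert(Ptr{Float64}, pointer(v, i))
        ntuple(j -> unsafe_load(p, j), Val(8))
    end
end

# Usage: fill the lanes with scalar stores, then read them back.
v = Vector{S}(undef, 1)
GC.@preserve v begin
    p = convert(Ptr{Float64}, pointer(v))
    for j in 1:8
        unsafe_store!(p, Float64(j), j)  # scalar store, no vector instruction requested
    end
end
lanes = load_lanes(v, 1)
println(lanes)
```

This trades vectorized loads for scalar ones, so it is likely only suitable for debugging or for code paths that are not performance-critical.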
