Skip to content

Conversation

xal-0
Copy link
Member

@xal-0 xal-0 commented Jul 7, 2025

Fix the data races described in #57457 (comment),
summarized here so it can be included in the commit message. It's difficult to
get either of these to crash reliably without tsan, but it does appear in the
wild from time to time (#57457). I'll add these tests when we get tsan to run
on CI.

Data race on jl_binding_t.backedges

Guard jl_binding_t.backedges with the module's lock, using the same strategy
we use for the scanned methods array to use it from invalidations.jl, namely
taking the lock once to get the length and then locking every time we get the
ith entry in the array. I'm not yet convinced this will avoid missing
invalidations in all circumstances, but it should avoid deadlocks and data
races.

module M
    const x = 0 
end
t = Threads.@spawn for i=1:100
    @eval M const x = $i
end
for i=1:100
    s = Symbol("f$i")
    @eval ($s() = M.x; $s())
end
wait(t)
$ ./usr/bin/julia -t4,0 --gcthreads=1 test.jl
==================
WARNING: ThreadSanitizer: data race (pid=3644)
  Write of size 8 at 0xffff9e714960 by main thread:
    #0 ijl_array_grow_end /home/user/c/julia/src/array.c (libjulia-internal.so.1.13+0x9f5f4)
    #1 ijl_array_ptr_1d_push /home/user/c/julia/src/array.c:260:5 (libjulia-internal.so.1.13+0x9f8b0)
    #2 jl_add_binding_backedge /home/user/c/julia/src/module.c:1634:5 (libjulia-internal.so.1.13+0x8e2e4)
    #3 jl_maybe_add_binding_backedge /home/user/c/julia/src/module.c:1654:5 (libjulia-internal.so.1.13+0x90cf0)
    #4 maybe_add_binding_backedge! invalidation.jl:175 (sys.so+0x74ce8c)
    #5 julia_store_backedges_11269 ../usr/share/julia/Compiler/src/typeinfer.jl:768 (sys.so+0x74ce8c)
    #6 julia_codeinst_as_edge_11265 ../usr/share/julia/Compiler/src/typeinfer.jl:1029 (sys.so+0x10ee660)
[snip]

  Previous read of size 8 at 0xffff9e714960 by thread T9:
    #0 length essentials.jl:11 (sys.so+0x8a9188)
    #1 checkbounds essentials.jl:383 (sys.so+0x8a9188)
    #2 _iterate_array genericmemory.jl:230 (sys.so+0x8a9188)
    #3 iterate array.jl:907 (sys.so+0x8a9188)
    #4 julia_invalidate_code_for_globalrefNOT._86900 invalidation.jl:139 (sys.so+0x8a9188)
[snip]

Data race on jl_module_t.usings_backedges

jl_get_module_usings_backedges is still guarded by world_counter_lock, but
we needed to take it in jl_new_module_.

Threads.@spawn for i=1:10000
    @eval Core const foo = $i
end
Threads.@threads for i=1:10000
    Module()
end
==================
WARNING: ThreadSanitizer: data race (pid=9714)
  Write of size 8 at 0xffff7db5b3b0 by thread T11:
    #0 ijl_array_grow_end /home/user/c/julia/src/gc-wb-stock.h (libjulia-internal.so.1.13+0x9f834)
    #1 ijl_array_ptr_1d_push /home/user/c/julia/src/array.c:260:5 (libjulia-internal.so.1.13+0x9f99c)
    #2 jl_add_usings_backedge /home/user/c/julia/src/module.c:1323:5 (libjulia-internal.so.1.13+0x8e73c)
    #3 jl_module_initial_using /home/user/c/julia/src/module.c:1337:5 (libjulia-internal.so.1.13+0x89a34)
    #4 jl_add_default_names /home/user/c/julia/src/module.c:517:13 (libjulia-internal.so.1.13+0x89a34)
    #5 jl_new_module_ /home/user/c/julia/src/module.c:530:5 (libjulia-internal.so.1.13+0x89a34)
    #6 jl_f_new_module /home/user/c/julia/src/module.c:650:22 (libjulia-internal.so.1.13+0x8a9b4)
[snip]

  Previous read of size 8 at 0xffff7db5b3b0 by thread T9:
    #0 getindex essentials.jl:967 (sys.so+0x40a1e4)
    #1 _iterate_array genericmemory.jl:230 (sys.so+0x40a1e4)
    #2 iterate array.jl:907 (sys.so+0x40a1e4)
    #3 iterate array.jl:907 (sys.so+0x40a1e4)
    #4 julia_invalidate_code_for_globalrefNOT._86900 invalidation.jl:148 (sys.so+0x40a1e4)
[snip]

Remove type instability in invalidate_code_for_globalref!

We hold world_counter_lock while running invalidate_code_for_globalref!, so
we need to avoid triggering inference entirely. A typeassert avoids the last
few dynamic calls:

171 ─ %450 =   dynamic Base.getproperty(%438, :partitions)::Any
└────        goto #172
172 ┄ %452 = φ (#170 => %448, #171 => %450)::Any
│     @ invalidation.jl:175 within `invalidate_code_for_globalref!`
│     %453 =   builtin (isa)(%452, Module)::Bool
└────        goto #174 if not %453
173 ─ %455 = π (%452, Module)
│    ┌ @ Base_compiler.jl:47 within `getproperty`
│    │ %456 =   builtin Base.getglobal(%455, :max_world)::Any
│    └
└────        goto #175
174 ─ %458 =   dynamic Base.getproperty(%452, :max_world)::Any
└────        goto #175
175 ┄ %460 = φ (#173 => %456, #174 => %458)::Any
│     %461 =   dynamic (%460 == 0xffffffffffffffff)::Any
└────        goto #203 if not %461

TODO: write guidelines for runtime code written in Julia somewhere in devdocs/locks.

@xal-0 xal-0 added the bugfix This change fixes an existing bug label Jul 7, 2025
@vtjnash vtjnash added merge me PR is reviewed. Merge when all tests are passing backport 1.12 Change should be backported to release-1.12 labels Jul 8, 2025
@KristofferC KristofferC merged commit c3282ce into JuliaLang:master Jul 8, 2025
10 of 11 checks passed
@KristofferC KristofferC mentioned this pull request Jul 8, 2025
60 tasks
KristofferC added a commit that referenced this pull request Jul 8, 2025
@DilumAluthge DilumAluthge removed the merge me PR is reviewed. Merge when all tests are passing label Jul 8, 2025
@KristofferC KristofferC removed the backport 1.12 Change should be backported to release-1.12 label Jul 9, 2025
@xal-0 xal-0 deleted the binding-invalidation-data-race-fix branch July 14, 2025 16:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix This change fixes an existing bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants