forked from JuliaLang/julia
Allow GC to implement array ptr copy #10
Merged
qinsoon
added a commit
to mmtk/mmtk-julia
that referenced
this pull request
May 5, 2023
qinsoon
pushed a commit
to qinsoon/julia
that referenced
this pull request
May 2, 2024
`@something` eagerly unwraps any `Some` given to it, while keeping the variable between its arguments the same. This can be an issue if a previously unpacked value is used as input to `@something`, leading to a type instability on more than two arguments (e.g. because of a fallback to `Some(nothing)`). By using different variables for each argument, type inference has an easier time handling these cases that are isolated to single branches anyway. This also adds some comments to the macro, since it's non-obvious what it does. Benchmarking the specific case I encountered this in led to a ~2x performance improvement on multiple machines. 1.10-beta3/master: ``` [sukera@tower 01]$ jl1100 -q --project=. -L 01.jl -e 'bench()' v"1.10.0-beta3" BenchmarkTools.Trial: 10000 samples with 1 evaluation. Range (min … max): 38.670 μs … 70.350 μs ┊ GC (min … max): 0.00% … 0.00% Time (median): 43.340 μs ┊ GC (median): 0.00% Time (mean ± σ): 43.395 μs ± 1.518 μs ┊ GC (mean ± σ): 0.00% ± 0.00% ▆█▂ ▁▁ ▂▂▂▂▂▂▂▂▂▁▂▂▂▃▃▃▂▂▃▃▃▂▂▂▂▂▄▇███▆██▄▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃ 38.7 μs Histogram: frequency by time 48 μs < Memory estimate: 0 bytes, allocs estimate: 0. ``` This PR: ``` [sukera@tower 01]$ julia -q --project=. -L 01.jl -e 'bench()' v"1.11.0-DEV.970" BenchmarkTools.Trial: 10000 samples with 1 evaluation. Range (min … max): 22.820 μs … 44.980 μs ┊ GC (min … max): 0.00% … 0.00% Time (median): 24.300 μs ┊ GC (median): 0.00% Time (mean ± σ): 24.370 μs ± 832.239 ns ┊ GC (mean ± σ): 0.00% ± 0.00% ▂▅▇██▇▆▅▁ ▂▂▂▂▂▂▂▂▃▃▄▅▇███████████▅▄▃▃▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▂ ▃ 22.8 μs Histogram: frequency by time 27.7 μs < Memory estimate: 0 bytes, allocs estimate: 0. ``` <details> <summary>Benchmarking code (spoilers for Advent Of Code 2023 Day 01, Part 01). Running this requires the input of that Advent Of Code day.</summary> ```julia using BenchmarkTools using InteractiveUtils isdigit(d::UInt8) = UInt8('0') <= d <= UInt8('9') someDigit(c::UInt8) = isdigit(c) ? 
Some(c - UInt8('0')) : nothing function part1(data) total = 0 may_a = nothing may_b = nothing for c in data digitRes = someDigit(c) may_a = @something may_a digitRes Some(nothing) may_b = @something digitRes may_b Some(nothing) if c == UInt8('\n') digit_a = may_a::UInt8 digit_b = may_b::UInt8 total += digit_a*0xa + digit_b may_a = nothing may_b = nothing end end return total end function bench() data = read("input.txt") display(VERSION) println() display(@benchmark part1($data)) nothing end ``` </details> <details> <summary>`@code_warntype` before</summary> ```julia julia> @code_warntype part1(data) MethodInstance for part1(::Vector{UInt8}) from part1(data) @ Main ~/Documents/projects/AOC/2023/01/01.jl:7 Arguments #self#::Core.Const(part1) data::Vector{UInt8} Locals @_3::Union{Nothing, Tuple{UInt8, Int64}} may_b::Union{Nothing, UInt8} may_a::Union{Nothing, UInt8} total::Int64 c::UInt8 digit_b::UInt8 digit_a::UInt8 val@_10::Any val@_11::Any digitRes::Union{Nothing, Some{UInt8}} @_13::Union{Some{Nothing}, Some{UInt8}, UInt8} @_14::Union{Some{Nothing}, Some{UInt8}} @_15::Some{Nothing} @_16::Union{Some{Nothing}, Some{UInt8}, UInt8} @_17::Union{Some{Nothing}, UInt8} @_18::Some{Nothing} Body::Int64 1 ── (total = 0) │ (may_a = Main.nothing) │ (may_b = Main.nothing) │ %4 = data::Vector{UInt8} │ (@_3 = Base.iterate(%4)) │ %6 = (@_3 === nothing)::Bool │ %7 = Base.not_int(%6)::Bool └─── goto mmtk#24 if not %7 2 ┄─ Core.NewvarNode(:(digit_b)) │ Core.NewvarNode(:(digit_a)) │ Core.NewvarNode(:(val@_10)) │ %12 = @_3::Tuple{UInt8, Int64} │ (c = Core.getfield(%12, 1)) │ %14 = Core.getfield(%12, 2)::Int64 │ (digitRes = Main.someDigit(c)) │ (val@_11 = may_a) │ %17 = (val@_11::Union{Nothing, UInt8} !== Base.nothing)::Bool └─── goto mmtk#4 if not %17 3 ── (@_13 = val@_11::UInt8) └─── goto mmtk#11 4 ── (val@_11 = digitRes) │ %22 = (val@_11::Union{Nothing, Some{UInt8}} !== Base.nothing)::Bool └─── goto mmtk#6 if not %22 5 ── (@_14 = val@_11::Some{UInt8}) └─── goto mmtk#10 6 ── (val@_11 = 
Main.Some(Main.nothing)) │ %27 = (val@_11::Core.Const(Some(nothing)) !== Base.nothing)::Core.Const(true) └─── goto mmtk#8 if not %27 7 ── (@_15 = val@_11::Core.Const(Some(nothing))) └─── goto mmtk#9 8 ── Core.Const(:(@_15 = Base.nothing)) 9 ┄─ (@_14 = @_15) 10 ┄ (@_13 = @_14) 11 ┄ %34 = @_13::Union{Some{Nothing}, Some{UInt8}, UInt8} │ (may_a = Base.something(%34)) │ (val@_10 = digitRes) │ %37 = (val@_10::Union{Nothing, Some{UInt8}} !== Base.nothing)::Bool └─── goto mmtk#13 if not %37 12 ─ (@_16 = val@_10::Some{UInt8}) └─── goto mmtk#20 13 ─ (val@_10 = may_b) │ %42 = (val@_10::Union{Nothing, UInt8} !== Base.nothing)::Bool └─── goto mmtk#15 if not %42 14 ─ (@_17 = val@_10::UInt8) └─── goto mmtk#19 15 ─ (val@_10 = Main.Some(Main.nothing)) │ %47 = (val@_10::Core.Const(Some(nothing)) !== Base.nothing)::Core.Const(true) └─── goto mmtk#17 if not %47 16 ─ (@_18 = val@_10::Core.Const(Some(nothing))) └─── goto mmtk#18 17 ─ Core.Const(:(@_18 = Base.nothing)) 18 ┄ (@_17 = @_18) 19 ┄ (@_16 = @_17) 20 ┄ %54 = @_16::Union{Some{Nothing}, Some{UInt8}, UInt8} │ (may_b = Base.something(%54)) │ %56 = c::UInt8 │ %57 = Main.UInt8('\n')::Core.Const(0x0a) │ %58 = (%56 == %57)::Bool └─── goto mmtk#22 if not %58 21 ─ (digit_a = Core.typeassert(may_a, Main.UInt8)) │ (digit_b = Core.typeassert(may_b, Main.UInt8)) │ %62 = total::Int64 │ %63 = (digit_a * 0x0a)::UInt8 │ %64 = (%63 + digit_b)::UInt8 │ (total = %62 + %64) │ (may_a = Main.nothing) └─── (may_b = Main.nothing) 22 ┄ (@_3 = Base.iterate(%4, %14)) │ %69 = (@_3 === nothing)::Bool │ %70 = Base.not_int(%69)::Bool └─── goto mmtk#24 if not %70 23 ─ goto mmtk#2 24 ┄ return total ``` </details> <details> <summary>`@code_native debuginfo=:none` Before </summary> ```julia julia> @code_native debuginfo=:none part1(data) .text .file "part1" .globl julia_part1_418 # -- Begin function julia_part1_418 .p2align 4, 0x90 .type julia_part1_418,@function julia_part1_418: # @julia_part1_418 # %bb.0: # %top push rbp mov rbp, rsp push r15 push r14 push r13 
push r12 push rbx sub rsp, 40 mov rax, qword ptr [rdi + 8] test rax, rax je .LBB0_1 # %bb.2: # %L17 mov rcx, qword ptr [rdi] dec rax mov r10b, 1 xor r14d, r14d # implicit-def: $r12b # implicit-def: $r13b # implicit-def: $r9b # implicit-def: $sil mov qword ptr [rbp - 64], rax # 8-byte Spill mov al, 1 mov dword ptr [rbp - 48], eax # 4-byte Spill # implicit-def: $al # kill: killed $al xor eax, eax mov qword ptr [rbp - 56], rax # 8-byte Spill mov qword ptr [rbp - 72], rcx # 8-byte Spill # implicit-def: $cl jmp .LBB0_3 .p2align 4, 0x90 .LBB0_8: # in Loop: Header=BB0_3 Depth=1 mov dword ptr [rbp - 48], 0 # 4-byte Folded Spill .LBB0_24: # %post_union_move # in Loop: Header=BB0_3 Depth=1 movzx r13d, byte ptr [rbp - 41] # 1-byte Folded Reload mov r12d, r8d cmp qword ptr [rbp - 64], r14 # 8-byte Folded Reload je .LBB0_13 .LBB0_25: # %guard_exit113 # in Loop: Header=BB0_3 Depth=1 inc r14 mov r10d, ebx .LBB0_3: # %L19 # =>This Inner Loop Header: Depth=1 mov rax, qword ptr [rbp - 72] # 8-byte Reload xor ebx, ebx xor edi, edi movzx r15d, r9b movzx ecx, cl movzx esi, sil mov r11b, 1 # implicit-def: $r9b movzx edx, byte ptr [rax + r14] lea eax, [rdx - 58] lea r8d, [rdx - 48] cmp al, -10 setae bl setb dil test r10b, 1 cmovne r15d, edi mov edi, 0 cmovne ecx, ebx mov bl, 1 cmovne esi, edi test r15b, 1 jne .LBB0_7 # %bb.4: # %L76 # in Loop: Header=BB0_3 Depth=1 mov r11b, 2 test cl, 1 jne .LBB0_5 # %bb.6: # %L78 # in Loop: Header=BB0_3 Depth=1 mov ebx, r10d mov r9d, r15d mov byte ptr [rbp - 41], r13b # 1-byte Spill test sil, 1 je .LBB0_26 .LBB0_7: # %L82 # in Loop: Header=BB0_3 Depth=1 cmp al, -11 jbe .LBB0_9 jmp .LBB0_8 .p2align 4, 0x90 .LBB0_5: # in Loop: Header=BB0_3 Depth=1 mov ecx, r8d mov sil, 1 xor ebx, ebx mov byte ptr [rbp - 41], r8b # 1-byte Spill xor r9d, r9d xor ecx, ecx cmp al, -11 ja .LBB0_8 .LBB0_9: # %L90 # in Loop: Header=BB0_3 Depth=1 test byte ptr [rbp - 48], 1 # 1-byte Folded Reload jne .LBB0_23 # %bb.10: # %L115 # in Loop: Header=BB0_3 Depth=1 cmp dl, 10 jne 
.LBB0_11 # %bb.14: # %L122 # in Loop: Header=BB0_3 Depth=1 test r15b, 1 jne .LBB0_15 # %bb.12: # %L130.thread # in Loop: Header=BB0_3 Depth=1 movzx eax, byte ptr [rbp - 41] # 1-byte Folded Reload mov bl, 1 add eax, eax lea eax, [rax + 4*rax] add al, r12b movzx eax, al add qword ptr [rbp - 56], rax # 8-byte Folded Spill mov al, 1 mov dword ptr [rbp - 48], eax # 4-byte Spill cmp qword ptr [rbp - 64], r14 # 8-byte Folded Reload jne .LBB0_25 jmp .LBB0_13 .p2align 4, 0x90 .LBB0_23: # %L115.thread # in Loop: Header=BB0_3 Depth=1 mov al, 1 # implicit-def: $r8b mov dword ptr [rbp - 48], eax # 4-byte Spill cmp dl, 10 jne .LBB0_24 jmp .LBB0_21 .LBB0_11: # in Loop: Header=BB0_3 Depth=1 mov r8d, r12d jmp .LBB0_24 .LBB0_1: xor eax, eax mov qword ptr [rbp - 56], rax # 8-byte Spill .LBB0_13: # %L159 mov rax, qword ptr [rbp - 56] # 8-byte Reload add rsp, 40 pop rbx pop r12 pop r13 pop r14 pop r15 pop rbp ret .LBB0_21: # %L122.thread test r15b, 1 jne .LBB0_15 # %bb.22: # %post_box_union58 movabs rdi, offset .L_j_str1 movabs rax, offset ijl_type_error movabs rsi, 140008511215408 movabs rdx, 140008667209736 call rax .LBB0_15: # %fail cmp r11b, 1 je .LBB0_19 # %bb.16: # %fail movzx eax, r11b cmp eax, 2 jne .LBB0_17 # %bb.20: # %box_union54 movzx eax, byte ptr [rbp - 41] # 1-byte Folded Reload movabs rcx, offset jl_boxed_uint8_cache mov rdx, qword ptr [rcx + 8*rax] jmp .LBB0_18 .LBB0_26: # %L80 movabs rax, offset ijl_throw movabs rdi, 140008495049392 call rax .LBB0_19: # %box_union movabs rdx, 140008667209736 jmp .LBB0_18 .LBB0_17: xor edx, edx .LBB0_18: # %post_box_union movabs rdi, offset .L_j_str1 movabs rax, offset ijl_type_error movabs rsi, 140008511215408 call rax .Lfunc_end0: .size julia_part1_418, .Lfunc_end0-julia_part1_418 # -- End function .type .L_j_str1,@object # @_j_str1 .section .rodata.str1.1,"aMS",@progbits,1 .L_j_str1: .asciz "typeassert" .size .L_j_str1, 11 .section ".note.GNU-stack","",@progbits ``` </details> <details> <summary>`@code_warntype` After</summary> 
```julia [sukera@tower 01]$ julia -q --project=. -L 01.jl julia> data = read("input.txt"); julia> @code_warntype part1(data) MethodInstance for part1(::Vector{UInt8}) from part1(data) @ Main ~/Documents/projects/AOC/2023/01/01.jl:7 Arguments #self#::Core.Const(part1) data::Vector{UInt8} Locals @_3::Union{Nothing, Tuple{UInt8, Int64}} may_b::Union{Nothing, UInt8} may_a::Union{Nothing, UInt8} total::Int64 val@_7::Union{} val@_8::Union{} c::UInt8 digit_b::UInt8 digit_a::UInt8 #JuliaLang#215::Some{Nothing} #JuliaLang#216::Union{Nothing, UInt8} #JuliaLang#217::Union{Nothing, Some{UInt8}} #JuliaLang#212::Some{Nothing} #JuliaLang#213::Union{Nothing, Some{UInt8}} #JuliaLang#214::Union{Nothing, UInt8} digitRes::Union{Nothing, Some{UInt8}} @_19::Union{Nothing, UInt8} @_20::Union{Nothing, UInt8} @_21::Nothing @_22::Union{Nothing, UInt8} @_23::Union{Nothing, UInt8} @_24::Nothing Body::Int64 1 ── (total = 0) │ (may_a = Main.nothing) │ (may_b = Main.nothing) │ %4 = data::Vector{UInt8} │ (@_3 = Base.iterate(%4)) │ %6 = @_3::Union{Nothing, Tuple{UInt8, Int64}} │ %7 = (%6 === nothing)::Bool │ %8 = Base.not_int(%7)::Bool └─── goto mmtk#24 if not %8 2 ┄─ Core.NewvarNode(:(val@_7)) │ Core.NewvarNode(:(val@_8)) │ Core.NewvarNode(:(digit_b)) │ Core.NewvarNode(:(digit_a)) │ Core.NewvarNode(:(#JuliaLang#215)) │ Core.NewvarNode(:(#JuliaLang#216)) │ Core.NewvarNode(:(#JuliaLang#217)) │ Core.NewvarNode(:(#JuliaLang#212)) │ Core.NewvarNode(:(#JuliaLang#213)) │ %19 = @_3::Tuple{UInt8, Int64} │ (c = Core.getfield(%19, 1)) │ %21 = Core.getfield(%19, 2)::Int64 │ %22 = c::UInt8 │ (digitRes = Main.someDigit(%22)) │ %24 = may_a::Union{Nothing, UInt8} │ (#JuliaLang#214 = %24) │ %26 = Base.:!::Core.Const(!) 
│ %27 = #JuliaLang#214::Union{Nothing, UInt8} │ %28 = Base.isnothing(%27)::Bool │ %29 = (%26)(%28)::Bool └─── goto mmtk#4 if not %29 3 ── %31 = #JuliaLang#214::UInt8 │ (@_19 = Base.something(%31)) └─── goto mmtk#11 4 ── %34 = digitRes::Union{Nothing, Some{UInt8}} │ (#JuliaLang#213 = %34) │ %36 = Base.:!::Core.Const(!) │ %37 = #JuliaLang#213::Union{Nothing, Some{UInt8}} │ %38 = Base.isnothing(%37)::Bool │ %39 = (%36)(%38)::Bool └─── goto mmtk#6 if not %39 5 ── %41 = #JuliaLang#213::Some{UInt8} │ (@_20 = Base.something(%41)) └─── goto mmtk#10 6 ── %44 = Main.Some::Core.Const(Some) │ %45 = Main.nothing::Core.Const(nothing) │ (#JuliaLang#212 = (%44)(%45)) │ %47 = Base.:!::Core.Const(!) │ %48 = #JuliaLang#212::Core.Const(Some(nothing)) │ %49 = Base.isnothing(%48)::Core.Const(false) │ %50 = (%47)(%49)::Core.Const(true) └─── goto mmtk#8 if not %50 7 ── %52 = #JuliaLang#212::Core.Const(Some(nothing)) │ (@_21 = Base.something(%52)) └─── goto mmtk#9 8 ── Core.Const(nothing) │ Core.Const(:(val@_8 = Base.something(Base.nothing))) │ Core.Const(nothing) │ Core.Const(:(val@_8)) └─── Core.Const(:(@_21 = %58)) 9 ┄─ %60 = @_21::Core.Const(nothing) └─── (@_20 = %60) 10 ┄ %62 = @_20::Union{Nothing, UInt8} └─── (@_19 = %62) 11 ┄ %64 = @_19::Union{Nothing, UInt8} │ (may_a = %64) │ %66 = digitRes::Union{Nothing, Some{UInt8}} │ (#JuliaLang#217 = %66) │ %68 = Base.:!::Core.Const(!) │ %69 = #JuliaLang#217::Union{Nothing, Some{UInt8}} │ %70 = Base.isnothing(%69)::Bool │ %71 = (%68)(%70)::Bool └─── goto mmtk#13 if not %71 12 ─ %73 = #JuliaLang#217::Some{UInt8} │ (@_22 = Base.something(%73)) └─── goto mmtk#20 13 ─ %76 = may_b::Union{Nothing, UInt8} │ (#JuliaLang#216 = %76) │ %78 = Base.:!::Core.Const(!) 
│ %79 = #JuliaLang#216::Union{Nothing, UInt8} │ %80 = Base.isnothing(%79)::Bool │ %81 = (%78)(%80)::Bool └─── goto mmtk#15 if not %81 14 ─ %83 = #JuliaLang#216::UInt8 │ (@_23 = Base.something(%83)) └─── goto mmtk#19 15 ─ %86 = Main.Some::Core.Const(Some) │ %87 = Main.nothing::Core.Const(nothing) │ (#JuliaLang#215 = (%86)(%87)) │ %89 = Base.:!::Core.Const(!) │ %90 = #JuliaLang#215::Core.Const(Some(nothing)) │ %91 = Base.isnothing(%90)::Core.Const(false) │ %92 = (%89)(%91)::Core.Const(true) └─── goto mmtk#17 if not %92 16 ─ %94 = #JuliaLang#215::Core.Const(Some(nothing)) │ (@_24 = Base.something(%94)) └─── goto mmtk#18 17 ─ Core.Const(nothing) │ Core.Const(:(val@_7 = Base.something(Base.nothing))) │ Core.Const(nothing) │ Core.Const(:(val@_7)) └─── Core.Const(:(@_24 = %100)) 18 ┄ %102 = @_24::Core.Const(nothing) └─── (@_23 = %102) 19 ┄ %104 = @_23::Union{Nothing, UInt8} └─── (@_22 = %104) 20 ┄ %106 = @_22::Union{Nothing, UInt8} │ (may_b = %106) │ %108 = Main.:(==)::Core.Const(==) │ %109 = c::UInt8 │ %110 = Main.UInt8('\n')::Core.Const(0x0a) │ %111 = (%108)(%109, %110)::Bool └─── goto mmtk#22 if not %111 21 ─ %113 = may_a::Union{Nothing, UInt8} │ (digit_a = Core.typeassert(%113, Main.UInt8)) │ %115 = may_b::Union{Nothing, UInt8} │ (digit_b = Core.typeassert(%115, Main.UInt8)) │ %117 = Main.:+::Core.Const(+) │ %118 = total::Int64 │ %119 = Main.:+::Core.Const(+) │ %120 = Main.:*::Core.Const(*) │ %121 = digit_a::UInt8 │ %122 = (%120)(%121, 0x0a)::UInt8 │ %123 = digit_b::UInt8 │ %124 = (%119)(%122, %123)::UInt8 │ (total = (%117)(%118, %124)) │ (may_a = Main.nothing) └─── (may_b = Main.nothing) 22 ┄ (@_3 = Base.iterate(%4, %21)) │ %129 = @_3::Union{Nothing, Tuple{UInt8, Int64}} │ %130 = (%129 === nothing)::Bool │ %131 = Base.not_int(%130)::Bool └─── goto mmtk#24 if not %131 23 ─ goto mmtk#2 24 ┄ %134 = total::Int64 └─── return %134 ``` </details> <details> <summary>`@code_native debuginfo=:none` After </summary> ```julia julia> @code_native debuginfo=:none part1(data) .text 
.file "part1" .globl julia_part1_1203 # -- Begin function julia_part1_1203 .p2align 4, 0x90 .type julia_part1_1203,@function julia_part1_1203: # @julia_part1_1203 ; Function Signature: part1(Array{UInt8, 1}) # %bb.0: # %top #DEBUG_VALUE: part1:data <- [DW_OP_deref] $rdi push rbp mov rbp, rsp push r15 push r14 push r13 push r12 push rbx sub rsp, 40 vxorps xmm0, xmm0, xmm0 #APP mov rax, qword ptr fs:[0] #NO_APP lea rdx, [rbp - 64] vmovaps xmmword ptr [rbp - 64], xmm0 mov qword ptr [rbp - 48], 0 mov rcx, qword ptr [rax - 8] mov qword ptr [rbp - 64], 4 mov rax, qword ptr [rcx] mov qword ptr [rbp - 72], rcx # 8-byte Spill mov qword ptr [rbp - 56], rax mov qword ptr [rcx], rdx #DEBUG_VALUE: part1:data <- [DW_OP_deref] 0 mov r15, qword ptr [rdi + 16] test r15, r15 je .LBB0_1 # %bb.2: # %L34 mov r14, qword ptr [rdi] dec r15 mov r11b, 1 mov r13b, 1 # implicit-def: $r12b # implicit-def: $r10b xor eax, eax jmp .LBB0_3 .p2align 4, 0x90 .LBB0_4: # in Loop: Header=BB0_3 Depth=1 xor r11d, r11d mov ebx, edi mov r10d, r8d .LBB0_9: # %L114 # in Loop: Header=BB0_3 Depth=1 mov r12d, esi test r15, r15 je .LBB0_12 .LBB0_10: # %guard_exit126 # in Loop: Header=BB0_3 Depth=1 inc r14 dec r15 mov r13d, ebx .LBB0_3: # %L36 # =>This Inner Loop Header: Depth=1 movzx edx, byte ptr [r14] test r13b, 1 movzx edi, r13b mov ebx, 1 mov ecx, 0 cmove ebx, edi cmovne edi, ecx movzx ecx, r10b lea esi, [rdx - 48] lea r9d, [rdx - 58] movzx r8d, sil cmove r8d, ecx cmp r9b, -11 ja .LBB0_4 # %bb.5: # %L89 # in Loop: Header=BB0_3 Depth=1 test r11b, 1 jne .LBB0_8 # %bb.6: # %L102 # in Loop: Header=BB0_3 Depth=1 cmp dl, 10 jne .LBB0_7 # %bb.13: # %L106 # in Loop: Header=BB0_3 Depth=1 test r13b, 1 jne .LBB0_14 # %bb.11: # %L114.thread # in Loop: Header=BB0_3 Depth=1 add ecx, ecx mov bl, 1 mov r11b, 1 lea ecx, [rcx + 4*rcx] add cl, r12b movzx ecx, cl add rax, rcx test r15, r15 jne .LBB0_10 jmp .LBB0_12 .p2align 4, 0x90 .LBB0_8: # %L102.thread # in Loop: Header=BB0_3 Depth=1 mov r11b, 1 # implicit-def: $sil cmp dl, 
10 jne .LBB0_9 jmp .LBB0_15 .LBB0_7: # in Loop: Header=BB0_3 Depth=1 mov esi, r12d jmp .LBB0_9 .LBB0_1: xor eax, eax .LBB0_12: # %L154 mov rcx, qword ptr [rbp - 56] mov rdx, qword ptr [rbp - 72] # 8-byte Reload mov qword ptr [rdx], rcx add rsp, 40 pop rbx pop r12 pop r13 pop r14 pop r15 pop rbp ret .LBB0_15: # %L106.thread test r13b, 1 jne .LBB0_14 # %bb.16: # %post_box_union47 movabs rax, offset jl_nothing movabs rcx, offset jl_small_typeof movabs rdi, offset ".L_j_str_typeassert#1" mov rdx, qword ptr [rax] mov rsi, qword ptr [rcx + 336] movabs rax, offset ijl_type_error mov qword ptr [rbp - 48], rsi call rax .LBB0_14: # %post_box_union movabs rax, offset jl_nothing movabs rcx, offset jl_small_typeof movabs rdi, offset ".L_j_str_typeassert#1" mov rdx, qword ptr [rax] mov rsi, qword ptr [rcx + 336] movabs rax, offset ijl_type_error mov qword ptr [rbp - 48], rsi call rax .Lfunc_end0: .size julia_part1_1203, .Lfunc_end0-julia_part1_1203 # -- End function .type ".L_j_str_typeassert#1",@object # @"_j_str_typeassert#1" .section .rodata.str1.1,"aMS",@progbits,1 ".L_j_str_typeassert#1": .asciz "typeassert" .size ".L_j_str_typeassert#1", 11 .section ".note.GNU-stack","",@progbits ``` </details> Co-authored-by: Sukera <[email protected]>
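The `@something` pattern benchmarked in the commit message above can be reduced to a few lines. This is only a minimal sketch of the macro's documented behaviour (return the first non-`nothing` argument, unwrap `Some`, evaluate lazily); `first_digit` is a hypothetical helper written for illustration, not code from the commit, and it assumes Julia 1.7 or later.

```julia
# Hypothetical helper: remember the first ASCII digit seen in a byte sequence.
function first_digit(bytes)
    may_a = nothing
    for c in bytes
        # `Some(x)` marks "a value is present"; `nothing` marks "no digit here".
        digit = (UInt8('0') <= c <= UInt8('9')) ? Some(c - UInt8('0')) : nothing
        # Yields `may_a` if it is already set, otherwise the unwrapped `digit`,
        # otherwise `nothing` (the unwrapped `Some(nothing)` fallback).
        may_a = @something may_a digit Some(nothing)
    end
    return may_a
end

first_digit(b"ab3c7")   # first digit, as a UInt8
first_digit(b"abc")     # no digit found, so `nothing`

# Later arguments are only evaluated when needed, so the `error` call never runs.
@something 1 error("never evaluated")
```

The `Some(nothing)` fallback is what keeps the macro from throwing when every earlier argument is `nothing`; it is also the argument the commit message names as a source of the type instability.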
udesou
pushed a commit
to udesou/julia
that referenced
this pull request
Oct 16, 2024
E.g. this allows `finalizer` inlining in the following case:

```julia
mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end

const foreign_buffer_finalized = Ref(false)

function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    return finalizer(obj) do obj
        Base.@assume_effects :notaskstate :nothrow
        foreign_buffer_finalized[] = true
        Libc.free(obj.ptr)
    end
end

function f_EA_finalizer(N::Int)
    workspace = foreign_alloc(Float64, N)
    GC.@preserve workspace begin
        (;ptr) = workspace
        Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
    end
end
```

```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1  = Base.mul_int(8, N)::Int64
│    %2  = Core.lshr_int(%1, 63)::Int64
│    %3  = Core.trunc_int(Core.UInt8, %2)::UInt8
│    %4  = Core.eq_int(%3, 0x01)::Bool
└───       goto #3 if not %4
2 ──       invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└───       unreachable
3 ──       goto #4
4 ── %9  = Core.bitcast(Core.UInt64, %1)::UInt64
└───       goto #5
5 ──       goto #6
6 ──       goto #7
7 ──       goto #8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└───       goto #9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│    %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└───       goto #10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│    %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│          invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│          $(Expr(:gc_preserve_end, :(%19)))
│    %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│          Base.setfield!(%23, :x, true)::Bool
│    %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│    %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│          $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└───       return nothing
) => Nothing
```

However, this is still a WIP. Before merging, I want to improve EA's precision a bit and at least fix the test case that is currently marked as `broken`. I also need to check its impact on compiler performance.

Additionally, I believe this feature is not yet practical. In particular, there is still significant room for improvement in the following areas:

- EA's interprocedural capabilities: currently EA is performed ad-hoc for limited frames because of latency reasons, which significantly reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current algorithm requires `:nothrow`-ness on all paths from the allocation of the mutable struct to its last use, which is not practical for real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary optimizations such as inserting a `finalize` call after the last use might still be possible (JuliaLang#55990).
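The fallback mentioned in the last bullet, running the finalizer right after the last use, can be written out by hand with the public `finalizer` and `finalize` functions. This is a minimal sketch, not the compiler transformation itself; `Handle`, `released` and `make_handle` are hypothetical names introduced only for this illustration.

```julia
# Hypothetical resource wrapper; must be mutable so a finalizer can be attached.
mutable struct Handle
    id::Int
end

# Records which handles have been released (stands in for a real `free` call).
const released = Int[]

function make_handle(id)
    h = Handle(id)
    # `finalizer(f, x)` registers `f` for `x` and returns `x`.
    return finalizer(h) do obj
        push!(released, obj.id)
    end
end

h = make_handle(1)
# ... last use of `h` would go here ...
finalize(h)   # run the registered finalizer now instead of waiting for the GC
```

After the explicit `finalize(h)` call, `released` contains the handle's id, which is the effect the compiler would aim for if it could prove where the last use is.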
udesou
pushed a commit
that referenced
this pull request
Nov 25, 2024
…g#52935)

Should fix JuliaLang#51818.

MWE:

```julia
function testme()
    X = @noinline rand(1_000_000_00)
    Y = @noinline sum(X)
    X = nothing
    GC.gc()
    return Y
end
```

Note that it now stores a `NULL` in the GC frame before calling `jl_gc_collect`.

Before:

```llvm
; Function Signature: testme()
; @ /Users/dnetto/Personal/test.jl:3 within `testme`
define double @julia_testme_535() #0 {
top:
  %gcframe1 = alloca [3 x ptr], align 16
  call void @llvm.memset.p0.i64(ptr align 16 %gcframe1, i8 0, i64 24, i1 true)
  %pgcstack = call ptr inttoptr (i64 6595051180 to ptr)(i64 262) #10
  store i64 4, ptr %gcframe1, align 16
  %task.gcstack = load ptr, ptr %pgcstack, align 8
  %frame.prev = getelementptr inbounds ptr, ptr %gcframe1, i64 1
  store ptr %task.gcstack, ptr %frame.prev, align 8
  store ptr %gcframe1, ptr %pgcstack, align 8
; @ /Users/dnetto/Personal/test.jl:4 within `testme`
  %0 = call nonnull ptr @j_rand_539(i64 signext 100000000)
  %gc_slot_addr_0 = getelementptr inbounds ptr, ptr %gcframe1, i64 2
  store ptr %0, ptr %gc_slot_addr_0, align 16
; @ /Users/dnetto/Personal/test.jl:5 within `testme`
  %1 = call double @j_sum_541(ptr nonnull %0)
; @ /Users/dnetto/Personal/test.jl:7 within `testme`
; ┌ @ gcutils.jl:132 within `gc` @ gcutils.jl:132
   call void @jlplt_ijl_gc_collect_543_got.jit(i32 1)
  %frame.prev4 = load ptr, ptr %frame.prev, align 8
  store ptr %frame.prev4, ptr %pgcstack, align 8
; └
; @ /Users/dnetto/Personal/test.jl:8 within `testme`
  ret double %1
}
```

After:

```llvm
; Function Signature: testme()
; @ /Users/dnetto/Personal/test.jl:3 within `testme`
define double @julia_testme_752() #0 {
top:
  %gcframe1 = alloca [3 x ptr], align 16
  call void @llvm.memset.p0.i64(ptr align 16 %gcframe1, i8 0, i64 24, i1 true)
  %pgcstack = call ptr inttoptr (i64 6595051180 to ptr)(i64 262) #10
  store i64 4, ptr %gcframe1, align 16
  %task.gcstack = load ptr, ptr %pgcstack, align 8
  %frame.prev = getelementptr inbounds ptr, ptr %gcframe1, i64 1
  store ptr %task.gcstack, ptr %frame.prev, align 8
  store ptr %gcframe1, ptr %pgcstack, align 8
; @ /Users/dnetto/Personal/test.jl:4 within `testme`
  %0 = call nonnull ptr @j_rand_756(i64 signext 100000000)
  %gc_slot_addr_0 = getelementptr inbounds ptr, ptr %gcframe1, i64 2
  store ptr %0, ptr %gc_slot_addr_0, align 16
; @ /Users/dnetto/Personal/test.jl:5 within `testme`
  %1 = call double @j_sum_758(ptr nonnull %0)
  store ptr null, ptr %gc_slot_addr_0, align 16
; @ /Users/dnetto/Personal/test.jl:7 within `testme`
; ┌ @ gcutils.jl:132 within `gc` @ gcutils.jl:132
   call void @jlplt_ijl_gc_collect_760_got.jit(i32 1)
  %frame.prev6 = load ptr, ptr %frame.prev, align 8
  store ptr %frame.prev6, ptr %pgcstack, align 8
; └
; @ /Users/dnetto/Personal/test.jl:8 within `testme`
  ret double %1
}
```
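One rough way to observe the effect of the extra `store ptr null` from Julia code is to compare the live heap size around the `GC.gc()` call. The following is a sketch under assumptions: `testme_probe` is a hypothetical variant of the MWE with a smaller array, and the number it reports depends on the Julia version and on whatever else is allocated in the session, so no particular value should be expected.

```julia
# Hypothetical variant of the MWE that also reports how much the heap shrank.
function testme_probe()
    X = @noinline rand(1_000_000)      # roughly 8 MB of Float64 data
    Y = @noinline sum(X)
    X = nothing                        # drop the only source-level reference
    before = Base.gc_live_bytes()
    GC.gc()
    after = Base.gc_live_bytes()
    # If the GC frame slot was cleared, the array can be reclaimed by this
    # collection and `before - after` should be on the order of its size.
    return Y, before - after
end

y, freed = testme_probe()
println("sum = ", y, ", live bytes reclaimed = ", freed)
```

Inspecting the generated IR with `@code_llvm raw=true testme()`, as in the commit message, remains the more direct check.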
udesou
added a commit
that referenced
this pull request
Dec 6, 2024
* Implement faster `issubset` for `CartesianIndices{N}` (#56282) Co-authored-by: xili <[email protected]> * Improve doc example: Extracting the type parameter from a super-type (#55983) Documentation describes the correct way of extracting the element type of a supertype: https://docs.julialang.org/en/v1/manual/methods/#Extracting-the-type-parameter-from-a-super-type However, one of the examples to showcase this is nonsensical since it is a union of multiple element types. I have replaced this example with a union over the dimension. Now, the `eltype_wrong` function still gives a similar error, yet the correct way returns the unambiguous answer. --------- Co-authored-by: Lilith Orion Hafner <[email protected]> * llvmpasses: force vector width for compatibility with non-x86 hosts. (#56300) The pipeline-prints test currently fails when running on an aarch64-macos device: ``` /Users/tim/Julia/src/julia/test/llvmpasses/pipeline-prints.ll:309:23: error: AFTERVECTORIZATION: expected string not found in input ; AFTERVECTORIZATION: vector.body ^ <stdin>:2:40: note: scanning from here ; *** IR Dump Before AfterVectorizationMarkerPass on julia_f_199 *** ^ <stdin>:47:27: note: possible intended match here ; *** IR Dump Before AfterVectorizationMarkerPass on jfptr_f_200 *** ^ Input file: <stdin> Check file: /Users/tim/Julia/src/julia/test/llvmpasses/pipeline-prints.ll -dump-input=help explains the following input dump. Input was: <<<<<< 1: opt: WARNING: failed to create target machine for 'x86_64-unknown-linux-gnu': unable to get target for 'x86_64-unknown-linux-gnu', see --version and --triple. 
2: ; *** IR Dump Before AfterVectorizationMarkerPass on julia_f_199 *** check:309'0 X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found 3: define i64 @julia_f_199(ptr addrspace(10) noundef nonnull align 16 dereferenceable(40) %0) #0 !dbg !4 { check:309'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 4: top: check:309'0 ~~~~~ 5: %1 = call ptr @julia.get_pgcstack() check:309'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 6: %ptls_field = getelementptr inbounds ptr, ptr %1, i64 2 check:309'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 7: %ptls_load45 = load ptr, ptr %ptls_field, align 8, !tbaa !8 check:309'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ . . . 42: check:309'0 ~ 43: L41: ; preds = %L41.loopexit, %L17, %top check:309'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 44: %value_phi10 = phi i64 [ 0, %top ], [ %7, %L17 ], [ %.lcssa, %L41.loopexit ] check:309'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 45: ret i64 %value_phi10, !dbg !52 check:309'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 46: } check:309'0 ~~ 47: ; *** IR Dump Before AfterVectorizationMarkerPass on jfptr_f_200 *** check:309'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ check:309'1 ? possible intended match 48: ; Function Attrs: noinline optnone check:309'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 49: define nonnull ptr addrspace(10) @jfptr_f_200(ptr addrspace(10) %0, ptr noalias nocapture noundef readonly %1, i32 %2) #1 { check:309'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 50: top: check:309'0 ~~~~~ 51: %3 = call ptr @julia.get_pgcstack() check:309'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 52: %4 = getelementptr inbounds ptr addrspace(10), ptr %1, i32 0 check:309'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ . . . 
>>>>>> -- ******************** Failed Tests (1): Julia :: pipeline-prints.ll ``` The problem is that these tests assume x86_64, which fails because the target isn't available, so it presumably uses the native target which has different vectorization characteristics: ``` ❯ ./usr/tools/opt --load-pass-plugin=libjulia-codegen.dylib -passes='julia' --print-before=AfterVectorization -o /dev/null ../../test/llvmpasses/pipeline-prints.ll ./usr/tools/opt: WARNING: failed to create target machine for 'x86_64-unknown-linux-gnu': unable to get target for 'x86_64-unknown-linux-gnu', see --version and --triple. ``` There's other tests that assume this (e.g. the `fma` cpufeatures one), but they don't fail, so I've left them as is. * Reduce generic matrix*vector latency (#56289) ```julia julia> using LinearAlgebra julia> A = rand(Int,4,4); x = rand(Int,4); y = similar(x); julia> @time mul!(y, A, x, 2, 2); 0.330489 seconds (792.22 k allocations: 41.519 MiB, 8.75% gc time, 99.99% compilation time) # master 0.134212 seconds (339.89 k allocations: 17.103 MiB, 15.23% gc time, 99.98% compilation time) # This PR ``` Main changes: - `generic_matvecmul!` and `_generic_matvecmul!` now accept `alpha` and `beta` arguments instead of `MulAddMul(alpha, beta)`. The methods that accept a `MulAddMul(alpha, beta)` are also retained for backward compatibility, but these now forward `alpha` and `beta`, instead of the other way around. - Narrow the scope of the `@stable_muladdmul` applications. We now construct the `MulAddMul(alpha, beta)` object only where it is needed in a function call, and we annotate the call site with `@stable_muladdmul`. This leads to smaller branches. - Create a new internal function with methods for the `'N'`, `'T'` and `'C'` cases, so that firstly, there's less code duplication, and secondly, the `_generic_matvecmul!` method is now simple enough to enable constant propagation. This eliminates the unnecessary branches, and only the one that is taken is compiled. 
Together, this reduces the TTFX substantially.

* Type `Base.is_interactive` as `Bool` (#56303)

Before, typing `Base.is_interactive = 7` would cause weird internal REPL failures down the line. Now, it throws an InexactError and has no impact.

* REPL: don't complete str and cmd macros when the input matches the internal name like `r_` to `r"` (#56254)

* fix REPL test if a "juliadev" directory exists in home (#56218)

* Fix trampoline warning on x86 as well (#56280)

* typeintersect: more fastpath to skip intersect under circular env (#56304)

fix #56040

* Preserve type in `first` for `OneTo` (#56263)

With this PR,

```julia
julia> first(Base.OneTo(10), 4)
Base.OneTo(4)
```

Previously, this would have used indexing to return a `UnitRange`. This is probably the only way to slice a `Base.OneTo` and obtain a `Base.OneTo` back.

* Matmul: dispatch on specific blas paths using an enum (#55002)

This expands on the approach taken by https://github.com/JuliaLang/julia/pull/54552. We pass on more type information to `generic_matmatmul_wrapper!`, which lets us convert the branches to method dispatches. This helps spread the latency around, so that instead of compiling all the branches in the first call, we now compile the branches only when they are actually taken. While this reduces the latency in individual branches, there is no reduction in latency if all the branches are reachable.

```julia
julia> A = rand(2,2);

julia> @time A * A;
  0.479805 seconds (809.66 k allocations: 40.764 MiB, 99.93% compilation time) # 1.12.0-DEV.806
  0.346739 seconds (633.17 k allocations: 31.320 MiB, 99.90% compilation time) # This PR

julia> @time A * A';
  0.030413 seconds (101.98 k allocations: 5.359 MiB, 98.54% compilation time) # v1.12.0-DEV.806
  0.148118 seconds (219.51 k allocations: 11.652 MiB, 99.72% compilation time) # This PR
```

The latency is spread between the two calls here. In fresh sessions:

```julia
julia> A = rand(2,2);

julia> @time A * A';
  0.473630 seconds (825.65 k allocations: 41.554 MiB, 99.91% compilation time) # v1.12.0-DEV.806
  0.490305 seconds (774.87 k allocations: 38.824 MiB, 99.90% compilation time) # This PR
```

In this case, both the `syrk` and `gemm` branches are reachable, so there is no reduction in latency.

Analogously, there is a reduction in latency in the second set of matrix multiplications where we call `symm!/hemm!` or `_generic_matmatmul`:

```julia
julia> using LinearAlgebra

julia> A = rand(2,2);

julia> @time Symmetric(A) * A;
  0.711178 seconds (2.06 M allocations: 103.878 MiB, 2.20% gc time, 99.98% compilation time) # v1.12.0-DEV.806
  0.540669 seconds (904.12 k allocations: 43.576 MiB, 2.60% gc time, 97.36% compilation time) # This PR
```

* Scaling `mul!` for generic `AbstractArray`s (#56313)

This improves performance in the scaling `mul!` for `StridedArray`s by using loops instead of broadcasting.

```julia
julia> using LinearAlgebra

julia> A = zeros(200,200); C = similar(A);

julia> @btime mul!($C, $A, 1, 2, 2);
  19.180 μs (0 allocations: 0 bytes) # nightly v"1.12.0-DEV.1479"
  11.361 μs (0 allocations: 0 bytes) # This PR
```

The latency is reduced as well for the same reason.

```julia
julia> using LinearAlgebra

julia> A = zeros(2,2); C = similar(A);

julia> @time mul!(C, A, 1, 2, 2);
  0.203034 seconds (522.94 k allocations: 27.011 MiB, 14.95% gc time, 99.97% compilation time) # nightly
  0.034713 seconds (59.16 k allocations: 2.962 MiB, 99.91% compilation time) # This PR
```

Thirdly, I've replaced the `.*ₛ` calls by explicit branches. This fixes the following:

```julia
julia> A = [zeros(2), zeros(2)]; C = similar(A);

julia> mul!(C, A, 1)
ERROR: MethodError: no method matching +(::Vector{Float64}, ::Bool)
```

After this,

```julia
julia> mul!(C, A, 1)
2-element Vector{Vector{Float64}}:
 [0.0, 0.0]
 [0.0, 0.0]
```

Also, I've added `@stable_muladdmul` annotations to the `generic_mul!` call, but moved it within the loop to narrow its scope. This doesn't increase the latency, while making the call type-stable.

```julia
julia> D = Diagonal(1:2); C = similar(D);

julia> @time mul!(C, D, 1, 2, 2);
  0.248385 seconds (898.18 k allocations: 47.027 MiB, 12.30% gc time, 99.96% compilation time) # nightly
  0.249940 seconds (919.80 k allocations: 49.128 MiB, 11.36% gc time, 99.99% compilation time) # This PR
```

* InteractiveUtils.jl: fixes issue where subtypes resolves bindings and causes deprecation warnings (#56306)

The current version of `subtypes` will throw deprecation errors even if no one is using the deprecated bindings.

A similar bug was fixed in Aqua.jl - https://github.com/JuliaTesting/Aqua.jl/pull/89/files

See discussion here:

- https://github.com/JuliaIO/ImageMagick.jl/issues/235 (for identifying the problem)
- https://github.com/simonster/Reexport.jl/issues/42 (for pointing to the issue in Aqua.jl)
- https://github.com/JuliaTesting/Aqua.jl/pull/89/files (for the fix in Aqua.jl)

This adds the `isbindingresolved` test to the `subtypes` function to avoid throwing deprecation warnings. It also adds a test to check that this doesn't happen.

---

On the current master branch (before the fix), the added test shows:

```
WARNING: using deprecated binding InternalModule.MyOldType in OuterModule.
, use MyType instead.
Subtypes and deprecations: Test Failed at /home/dgleich/devextern/julia/usr/share/julia/stdlib/v1.12/Test/src/Test.jl:932 Expression: isempty(stderr_content) Evaluated: isempty("WARNING: using deprecated binding InternalModule.MyOldType in OuterModule.\n, use MyType instead.\n") Test Summary: | Fail Total Time Subtypes and deprecations | 1 1 2.8s ERROR: LoadError: Some tests did not pass: 0 passed, 1 failed, 0 errored, 0 broken. in expression starting at /home/dgleich/devextern/julia/stdlib/InteractiveUtils/test/runtests.jl:841 ERROR: Package InteractiveUtils errored during testing ``` --- Using the results of this pull request: ``` @test_nowarn subtypes(Integer); ``` passes without error. The other tests pass too. * [CRC32c] Support AbstractVector{UInt8} as input (#56164) This is a similar PR to https://github.com/JuliaIO/CRC32.jl/pull/12 I added a generic fallback method for `AbstractVector{UInt8}` similar to the existing generic `IO` method. Co-authored-by: Steven G. Johnson <[email protected]> * Put `jl_gc_new_weakref` in a header file again (#56319) * use textwidth for string display truncation (#55442) It makes a big difference when displaying strings that have width-2 or width-0 characters. * Use `pwd()` as the default directory to walk in `walkdir` (#55550) * Reset mtime of BOLTed files to prevent make rebuilding targets (#55587) This simplifies the `finish_stage` rule. Co-authored-by: Zentrik <[email protected]> * add docstring note about `displaysize` and `IOContext` with `context` (#55510) * LinearAlgebra: replace some hardcoded loop ranges with axes (#56243) These are safer in general, as well as easier to read. Also, narrow the scopes of some `@inbounds` annotations. 
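As a rough illustration of the `axes`-based loops and narrowed `@inbounds` scopes described in the LinearAlgebra entry above (#56243), here is a minimal sketch. The function name `sum_with_axes` is hypothetical and not taken from the PR; it only shows the general pattern of replacing a hardcoded `1:size(A, d)` range with `axes(A, d)`.

```julia
# Hypothetical helper (not from the PR): sums a matrix by looping over
# `axes` instead of hardcoded `1:size(A, d)` ranges.
function sum_with_axes(A::AbstractMatrix)
    s = zero(eltype(A))
    for j in axes(A, 2), i in axes(A, 1)
        # `@inbounds` covers only the indexing expression, not the whole loop.
        # This is safe because `i` and `j` come directly from `axes(A, ...)`.
        s += @inbounds A[i, j]
    end
    return s
end

A = [1 2; 3 4]
total = sum_with_axes(A)  # same value as `sum(A)`
```

Because the ranges come from the array itself, the same loop stays valid for array types whose indices do not start at 1.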
* inference: fix `[modifyfield!|replacefield!]_tfunc`s (#56310) Currently the following code snippet results in an internal error: ```julia julia> func(x) = @atomic :monotonic x[].count += 1; julia> let;Base.Experimental.@force_compile x = Ref(nothing) func(x) end Internal error: during type inference of ... ``` This issue is caused by the incorrect use of `_fieldtype_tfunc(𝕃, o, f)` within `modifyfield!_tfunc`, specifically because `o` should be `widenconst`ed, but it isn’t. By using `_fieldtype_tfunc` correctly, we can avoid the error through error-catching in `abstract_modifyop!`. This commit also includes a similar fix for `replacefield!_tfunc` as well. * inference: don't allow `SSAValue`s in assignment lhs (#56314) In `InferenceState` the lhs of a `:=` expression should only contain `GlobalRef` or `SlotNumber` and no other IR elements. Currently when `SSAValue` appears in `lhs`, the invalid assignment effect is somehow ignored, but this is incorrect anyway, so this commit removes that check. Since `SSAValue` should not appear in `lhs` in the first place, this is not a significant change though. * Fix `unsafe_read` for `IOBuffer` with non dense data (#55776) Fixes one part of #54636 It was only safe to use the following if `from.data` was a dense vector of bytes. ```julia GC.@preserve from unsafe_copyto!(p, pointer(from.data, from.ptr), adv) ``` This PR adds a fallback suggested by @matthias314 in https://discourse.julialang.org/t/copying-bytes-from-abstractvector-to-ptr/119408/7 * support `isless` for zero-dimensional `AbstractArray`s (#55772) Fixes #55771 * inference: don't add backdge when `applicable` inferred to return `Bool` (#56316) Also just as a minor backedge reduction optimization, this commit avoids adding backedges when `applicable` is inferred to return `::Bool`. 
* Mark `require_one_based_indexing` and `has_offset_axes` as public (#56196) The discussion here mentions `require_one_based_indexing` being part of the public API: https://github.com/JuliaLang/julia/pull/43263 Both functions are also documented (albeit in the dev docs): * `require_one_based_indexing`: https://docs.julialang.org/en/v1/devdocs/offset-arrays/#man-custom-indices * `has_offset_axes`: https://docs.julialang.org/en/v1/devdocs/offset-arrays/#For-objects-that-mimic-AbstractArray-but-are-not-subtypes Towards https://github.com/JuliaLang/julia/issues/51335. --------- Co-authored-by: Matt Bauman <[email protected]> * Avoid some allocations in various `println` methods (#56308) * Add a developer documentation section to the `LinearAlgebra` docs (#56324) Functions that are meant for package developers may go here, instead of the main section that is primarily for users. * drop require lock when not needed during loading to allow parallel precompile loading (#56291) Fixes `_require_search_from_serialized` to first acquire all start_loading locks (using a deadlock-free batch-locking algorithm) before doing stalechecks and the rest, so that all the global computations happen behind the require_lock, then the rest can happen behind module-specific locks, then (as before) extensions can be loaded in parallel eventually after `require` returns. * Make `String(::Memory)` copy (#54457) A more targeted fix of #54369 than #54372 Preserves the performance improvements added in #53962 by creating a new internal `_unsafe_takestring!(v::Memory{UInt8})` function that does what `String(::Memory{UInt8})` used to do. 
* 🤖 [master] Bump the Pkg stdlib from 799dc2d54 to 116ba910c (#56336) Stdlib: Pkg URL: https://github.com/JuliaLang/Pkg.jl.git Stdlib branch: master Julia branch: master Old commit: 799dc2d54 New commit: 116ba910c Julia version: 1.12.0-DEV Pkg version: 1.12.0 Bump invoked by: @IanButterworth Powered by: [BumpStdlibs.jl](https://github.com/JuliaLang/BumpStdlibs.jl) Diff: https://github.com/JuliaLang/Pkg.jl/compare/799dc2d54c4e809b9779de8c604564a5b3befaa0...116ba910c74ab565d348aa8a50d6dd10148f11ab ``` $ git log --oneline 799dc2d54..116ba910c 116ba910c fix Base.unreference_module call (#4057) 6ed1d2f40 do not show right hand progress without colors (#4047) ``` Co-authored-by: Dilum Aluthge <[email protected]> * Wall-time/all tasks profiler (#55889) One limitation of sampling CPU/thread profiles, as is currently done in Julia, is that they primarily capture samples from CPU-intensive tasks. If many tasks are performing IO or contending for concurrency primitives like semaphores, these tasks won’t appear in the profile, as they aren't scheduled on OS threads sampled by the profiler. A wall-time profiler, like the one implemented in this PR, samples tasks regardless of OS thread scheduling. This enables profiling of IO-heavy tasks and detecting areas of heavy contention in the system. Co-developed with @nickrobinson251. * recommend explicit `using Foo: Foo, ...` in package code (was: "using considered harmful") (#42080) I feel we are heading up against a "`using` crisis" where any new feature that is implemented by exporting a new name (either in Base or a package) becomes a breaking change. This is already happening (https://github.com/JuliaGPU/CUDA.jl/pull/1097, https://github.com/JuliaWeb/HTTP.jl/pull/745) and as projects get bigger and more names are exported, the likelihood of this rapidly increases. The flaw in `using Foo` is fundamental in that you cannot lexically see where a name comes from so when two packages export the same name, you are screwed. 
Any code that relies on `using Foo` and then using an exported name from `Foo` is vulnerable to another dependency exporting the same name. Therefore, I think we should start to strongly discourage the use of `using Foo` and only recommend `using Foo` for ephemeral work (e.g. REPL work). --------- Co-authored-by: Dilum Aluthge <[email protected]> Co-authored-by: Mason Protter <[email protected]> Co-authored-by: Max Horn <[email protected]> Co-authored-by: Matt Bauman <[email protected]> Co-authored-by: Alex Arslan <[email protected]> Co-authored-by: Ian Butterworth <[email protected]> Co-authored-by: Neven Sajko <[email protected]> * Change some hardcoded loop ranges to axes in dense linalg functions (#56348) These should be safer in general, and are also easier to reason about. * Make `LinearAlgebra.haszero` public (#56223) The trait `haszero` is used to check if a type `T` has a unique zero defined using `zero(T)`. This lets us dispatch to optimized paths without losing generality. This PR makes the function public so that this may be extended by packages (such as `StaticArrays`). * remove spurious parens in profiler docs (#56357) * Fix `log_quasitriu` for internal scaling `s=0` (#56311) This PR is a potential fix for #54833. ## Description The function https://github.com/JuliaLang/julia/blob/2a06376c18afd7ec875335070743dcebcd85dee7/stdlib/LinearAlgebra/src/triangular.jl#L2220 computes $\boldsymbol{A}^{\dfrac{1}{2^s}} - \boldsymbol{I}$ for a real-valued $2\times 2$ matrix $\boldsymbol{A}$ using Algorithm 5.1 in [R1]. However, the algorithm in [R1] as well as the above function do not handle the case $s=0.$ This fix extends the function to compute $\boldsymbol{A}^{\dfrac{1}{2^s}} - \boldsymbol{I} \Bigg|_{s=0} = \boldsymbol{A} - \boldsymbol{I}.$ ## Checklist - [X] Fix code: `stdlib\LinearAlgebra\src\triangular.jl` in function `_sqrt_pow_diag_block_2x2!(A, A0, s)`. - [X] Add test case: `stdlib\LinearAlgebra\test\triangular.jl`. - [X] Update `NEWS.md`. 
- [X] Testing and self review. | Tag | Reference | | --- | --- | | <nobr>[R1]</nobr> | Al-Mohy, Awad H. and Higham, Nicholas J. "Improved Inverse Scaling and Squaring Algorithms for the Matrix Logarithm", 2011, url: https://eprints.maths.manchester.ac.uk/1687/1/paper11.pdf | --------- Co-authored-by: Daniel Karrasch <[email protected]> Co-authored-by: Oscar Smith <[email protected]> * loading: clean up more concurrency issues (#56329) Guarantee that `__init__` runs before `using` returns. Could be slightly breaking for people that do crazy things inside `__init__`, but just don't do that. Since extensions then probably load after `__init__` (or at least, run their `__init__` after), this is a partial step towards changing things so that extensions are guaranteed to load if using all of their triggers before the corresponding `using` returns Fixes #55556 * make `_unsetindex` fast for isbits eltype (#56364) fixes https://github.com/JuliaLang/julia/issues/56359#issuecomment-2441537634 ``` using Plots function f(n) a = Vector{Int}(undef, n) s = time_ns() resize!(a, 8) time_ns() - s end x = 8:10:1000000 y = f.(x) plot(x, y) ```  * improved `eltype` for `flatten` with tuple argument (#55946) We have always had ``` julia> t = (Int16[1,2], Int32[3,4]); eltype(Iterators.flatten(t)) Any ``` With this PR, the result is `Signed` (`promote_typejoin` applied to the element types of the tuple elements). The same applies to `NamedTuple`: ``` julia> nt = (a = [1,2], b = (3,4)); eltype(Iterators.flatten(nt)) Any # old Int64 # new ``` * Reland "Reroute (Upper/Lower)Triangular * Diagonal through __muldiag #55984" (#56270) This relands #55984 which was reverted in #56267. Previously, in #55984, the destination in multiplying triangular matrices with diagonals was also assumed to be triangular, which is not necessarily the case in `mul!`. Tests for this case, however, were being run non-deterministically, so this wasn't caught by the CI runs. 
This improves performance: ```julia julia> U = UpperTriangular(rand(100,100)); D = Diagonal(rand(size(U,2))); C = similar(U); julia> @btime mul!($C, $D, $U); 1.517 μs (0 allocations: 0 bytes) # nightly 1.116 μs (0 allocations: 0 bytes) # This PR ``` * Add one-arg `norm` method (#56330) This reduces the latency of `norm` calls, as the single-argument method lacks branches and doesn't use aggressive constant propagation, and is therefore simpler to compile. Given that a lot of `norm` calls use `p==2`, it makes sense for us to reduce the latency on this call. ```julia julia> using LinearAlgebra julia> A = rand(2,2); julia> @time norm(A); 0.247515 seconds (390.09 k allocations: 19.993 MiB, 33.57% gc time, 99.99% compilation time) # master 0.067201 seconds (121.24 k allocations: 6.067 MiB, 99.98% compilation time) # this PR ``` An example of an improvement in ttfx because of this: ```julia julia> A = rand(2,2); julia> @time A ≈ A; 0.556475 seconds (1.16 M allocations: 59.949 MiB, 24.14% gc time, 100.00% compilation time) # master 0.333114 seconds (899.85 k allocations: 46.574 MiB, 8.11% gc time, 99.99% compilation time) # this PR ``` * fix a forgotten rename `readuntil` -> `copyuntil` (#56380) Fixes https://github.com/JuliaLang/julia/issues/56352, with the repro in that issue: ``` Master: 1.114874 seconds (13.01 M allocations: 539.592 MiB, 3.80% gc time) After: 0.369492 seconds (12.99 M allocations: 485.031 MiB, 10.73% gc time) 1.10: 0.341114 seconds (8.36 M allocations: 454.242 MiB, 2.69% gc time) ``` * remove unnecessary operations from `typejoin_union_tuple` (#56379) Removes the unnecessary call to `unwrap_unionall` and type assertion. * precompile: fix performance issues with IO (#56370) The string API here rapidly becomes unusably slow if dumping much debug output during precompile. Fix the design here to use an intermediate IO instead to prevent that. 
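The precompile entry above (#56370) does not show its code here, so the following is only a generic sketch of why an intermediate IO tends to help when a lot of output is accumulated. The function names `collect_output_string` and `collect_output_io` are hypothetical and are not the functions changed by the PR.

```julia
# Hypothetical illustration (not the PR's code).

# Repeated string concatenation allocates a new String on every iteration,
# so the total work grows quadratically with the amount of output.
function collect_output_string(lines)
    out = ""
    for l in lines
        out *= l * "\n"
    end
    return out
end

# Writing into an intermediate IOBuffer appends in place, and the final
# String is materialized only once at the end.
function collect_output_io(lines)
    io = IOBuffer()
    for l in lines
        println(io, l)
    end
    return String(take!(io))
end

lines = ["first", "second", "third"]
same = collect_output_string(lines) == collect_output_io(lines)
```

Both functions produce the same text; the difference is only in how many intermediate strings get allocated along the way.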
* cache the `find_all_in_cache_path` call during parallel precompilation (#56369) Before (in an environment with DifferentialEquations.jl): ```julia julia> @time Pkg.precompile() 0.733576 seconds (3.44 M allocations: 283.676 MiB, 6.24% gc time) julia> isfile_calls[1:10] 10-element Vector{Pair{String, Int64}}: "/home/kc/.julia/juliaup/julia-nightly/share/julia/compiled/v1.12/Printf/3FQLY_zHycD.ji" => 178 "/home/kc/.julia/juliaup/julia-nightly/share/julia/compiled/v1.12/Printf/3FQLY_xxrt3.ji" => 178 "/home/kc/.julia/juliaup/julia-nightly/share/julia/compiled/v1.12/Dates/p8See_xxrt3.ji" => 158 "/home/kc/.julia/juliaup/julia-nightly/share/julia/compiled/v1.12/Dates/p8See_zHycD.ji" => 158 "/home/kc/.julia/juliaup/julia-nightly/share/julia/compiled/v1.12/TOML/mjrwE_zHycD.ji" => 155 "/home/kc/.julia/juliaup/julia-nightly/share/julia/compiled/v1.12/TOML/mjrwE_xxrt3.ji" => 155 "/home/kc/.julia/compiled/v1.12/Preferences/pWSk8_4Qv86.ji" => 152 "/home/kc/.julia/compiled/v1.12/Preferences/pWSk8_juhqb.ji" => 152 "/home/kc/.julia/juliaup/julia-nightly/share/julia/compiled/v1.12/StyledStrings/UcVoM_zHycD.ji" => 144 "/home/kc/.julia/juliaup/julia-nightly/share/julia/compiled/v1.12/StyledStrings/UcVoM_xxrt3.ji" => 144 ``` After: ```julia julia> @time Pkg.precompile() 0.460077 seconds (877.59 k allocations: 108.075 MiB, 4.77% gc time) julia> isfile_calls[1:10] 10-element Vector{Pair{String, Int64}}: "/tmp/jl_a5xFWK/Project.toml" => 15 "/tmp/jl_a5xFWK/Manifest.toml" => 7 "/home/kc/.julia/registries/General.toml" => 6 "/home/kc/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Markdown/src/Markdown.jl" => 3 "/home/kc/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Serialization/src/Serialization.jl" => 3 "/home/kc/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Distributed/src/Distributed.jl" => 3 "/home/kc/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/UUIDs/src/UUIDs.jl" => 3 
"/home/kc/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/LibCURL/src/LibCURL.jl" => 3 ``` Performance is improved and we are not calling `isfile` on a bunch of the same ji files hundreds times. Benchmark is made on a linux machine so performance diff should be a lot better on Windows where these `isfile_casesensitive` call is much more expensive. Fixes https://github.com/JuliaLang/julia/issues/56366 --------- Co-authored-by: KristofferC <[email protected]> Co-authored-by: Ian Butterworth <[email protected]> * [docs] Fix note admonition in llvm-passes.md (#56392) At the moment this is rendered incorrectly: https://docs.julialang.org/en/v1.11.1/devdocs/llvm-passes/#JuliaLICM * structure-preserving broadcast for `SymTridiagonal` (#56001) With this PR, certain broadcasting operations preserve the structure of a `SymTridiagonal`: ```julia julia> S = SymTridiagonal([1,2,3,4], [1,2,3]) 4×4 SymTridiagonal{Int64, Vector{Int64}}: 1 1 ⋅ ⋅ 1 2 2 ⋅ ⋅ 2 3 3 ⋅ ⋅ 3 4 julia> S .* 2 4×4 SymTridiagonal{Int64, Vector{Int64}}: 2 2 ⋅ ⋅ 2 4 4 ⋅ ⋅ 4 6 6 ⋅ ⋅ 6 8 ``` This was deliberately disabled on master, but I couldn't find any test that fails if this is enabled. 
* 🤖 [master] Bump the Pkg stdlib from 116ba910c to 9f8e11a4c (#56386) Stdlib: Pkg URL: https://github.com/JuliaLang/Pkg.jl.git Stdlib branch: master Julia branch: master Old commit: 116ba910c New commit: 9f8e11a4c Julia version: 1.12.0-DEV Pkg version: 1.12.0 Bump invoked by: @IanButterworth Powered by: [BumpStdlibs.jl](https://github.com/JuliaLang/BumpStdlibs.jl) Diff: https://github.com/JuliaLang/Pkg.jl/compare/116ba910c74ab565d348aa8a50d6dd10148f11ab...9f8e11a4c0efb3b68a1e25a33f372f398c89cd66 ``` $ git log --oneline 116ba910c..9f8e11a4c 9f8e11a4c strip out tree_hash for stdlibs that have have been freed in newer julia versions (#4062) c0df25a47 rm dead code (#4061) ``` Co-authored-by: Dilum Aluthge <[email protected]> * load extensions with fewer triggers earlier (#49891) Aimed to support the use case in https://github.com/JuliaLang/julia/issues/48734#issuecomment-1554626135. https://github.com/KristofferC/ExtSquared.jl is an example, see specifically https://github.com/KristofferC/ExtSquared.jl/blob/ded7c57d6f799674e3310b8174dfb07591bbe025/ext/BExt.jl#L4. I think this makes sense, happy for a second pair of eyes though. cc @termi-official --------- Co-authored-by: KristofferC <[email protected]> Co-authored-by: Cody Tapscott <[email protected]> * Dispatch in generic_matmatmul (#56384) Replacing the branches by dispatch reduces latency, presumably because there's less dead code in the method. 
```julia julia> using LinearAlgebra julia> A = rand(Int,2,2); B = copy(A); C = similar(A); julia> @time mul!(C, A, B, 1, 2); 0.363944 seconds (1.65 M allocations: 84.584 MiB, 37.57% gc time, 99.99% compilation time) # master 0.102676 seconds (176.55 k allocations: 8.904 MiB, 27.04% gc time, 99.97% compilation time) # this PR ``` The latency is now distributed between the different branches: ```julia julia> @time mul!(C, A, B, 1, 2); 0.072441 seconds (176.55 k allocations: 8.903 MiB, 99.97% compilation time) julia> @time mul!(C, A', B, 1, 2); 0.085817 seconds (116.44 k allocations: 5.913 MiB, 99.96% compilation time: 4% of which was recompilation) julia> @time mul!(C, A', B', 1, 2); 0.345337 seconds (1.07 M allocations: 54.773 MiB, 25.77% gc time, 99.99% compilation time: 40% of which was recompilation) ``` It would be good to look into why there's recompilation in the last case, but the branch is less commonly taken than the others that have significantly lower latency after this PR. * Add `atol` to addmul tests (#56210) This avoids the issues as in https://github.com/JuliaLang/julia/issues/55781 and https://github.com/JuliaLang/julia/issues/55779 where we compare small numbers using a relative tolerance. Also, in this PR, I have added an extra test, so now we compare both `A * B * alpha + C * beta` and `A * B * alpha - C * beta` with the corresponding in-place versions. The idea is that if the terms `A * B * alpha` and ` C * beta` have similar magnitudes, at least one of the two expressions will usually result in a large enough number that may be compared using a relative tolerance. I am unsure if the `atol` chosen here is optimal, as I have ballparked it to use the maximum `eps` by looking at all the `eltype`s involved. 
Fixes #55781
Fixes #55779

* Export jl_gc_new_weakref again via julia.h (#56373)

This is how it was for at least Julia 1.0 - 1.11.

Closes #56367

* InteractiveUtils: define `InteractiveUtils.@code_ircode` (#56390)

* Fix some missing write barriers and add some helpful comments (#56396)

I was trying some performance optimization which didn't end up working out, but in the process I found two missing write barriers and added some helpful comments for future readers, so that part is probably still useful.

* compiler: fix specialization mistake introduced by #40985 (#56404)

Hopefully there aren't any others like this hiding around? It is not useful to make a new closure for every method that we inline, since we just called `===` inside it.

* Avoid racy double-load of binding restriction in `import_module` (#56395)

Fixes #56333

* define `InteractiveUtils.@infer_[return|exception]_type` (#56398)

Also simplifies the definitions of `@code_typed` and the other similar macros.

* irinterp: set `IR_FLAG_REFINED` for narrowed `PhiNode`s (#56391)

`adce_pass!` can transform a `Union`-type `PhiNode` into a narrower `PhiNode`, but in such cases, the `IR_FLAG_REFINED` flag isn’t set on that `PhiNode` statement. By setting this flag, irinterp can perform statement reprocessing using the narrowed `PhiNode`, enabling type stability in cases like JuliaLang/julia#56387.

- fixes JuliaLang/julia#56387

* document isopen(::Channel) (#56376)

This PR has two purposes:
1) Add some documentation for public API
2) Add a small note about a footgun I've hit a few times: `!isopen(ch)` does not mean that you are "done" with the channel, because buffered channels can still have items left in them that need to be taken.

---------

Co-authored-by: CY Han <[email protected]>

* Make build system respect `FORCE_COLOR` and `NO_COLOR` settings (#56346)

Follow up to #53742, but for the build system.

CC: @omus.
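The `isopen(::Channel)` footgun described in the #56376 entry above can be reproduced with a small buffered channel. This is a minimal sketch using only `Channel`, `put!`, `close`, `isopen`, `isready` and `collect` from Base; the variable names are illustrative.

```julia
ch = Channel{Int}(2)   # buffered channel with capacity 2
put!(ch, 1)
put!(ch, 2)
close(ch)              # no further `put!` calls are allowed

closed = !isopen(ch)       # true: the channel is closed
has_items = isready(ch)    # true: buffered items are still waiting
drained = collect(ch)      # iterating takes the remaining items: [1, 2]
```

So a consumer that stops as soon as `!isopen(ch)` holds would silently drop the two buffered items; iterating over the channel (or checking `isready`) avoids that.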
* Add `edges` vector to CodeInstance/CodeInfo to keep backedges as edges (#54894)

Appears to add about 11 MB (128 MB to 139 MB) to the system image, and to decrease the stdlib size by 55 MB (325 MB to 270 MB), so it seems favorable overall right now.

The edges are computed following the encoding <https://hackmd.io/sjPig55kS4a5XNWC6HmKSg?both#Edges-Encoding> to correctly reflect the backedges.

Co-authored-by: Shuhei Kadowaki <[email protected]>

* docs: remove `dirname.c` from THIRDPARTY file (#56413)

- `dirname.c` was removed by https://github.com/JuliaLang/julia/commit/c2cec7ad57102e4fbb733b8fb79d617a9524f0ae

* Allow ext → ext dependency if triggers are a strict superset (#56368) (#56402)

Forward port of #56368 - this was a pretty clean port, so it should be good to go once tests pass.

* [docs] Fix rendering of warning admonition in llvm passes page (#56412)

Follow up to #56392: the warning in https://docs.julialang.org/en/v1.11.1/devdocs/llvm-passes/#Multiversioning is also rendered incorrectly because of a missing space.

* Fix dispatch for `rdiv!` with `LU` (#55764)

* Remove overwritten method of OffsetArray (#56414)

This is overwritten three definitions later in `Base.reshape(A::OffsetArray, inds::Colon)`. Should remove warnings I saw when testing a package that uses it.

* Add a missing GC root in constant declaration (#56408)

As pointed out in https://github.com/JuliaLang/julia/pull/56224#discussion_r1816974147.

* Teach compiler about partitioned bindings (#56299)

This commit teaches the compiler to update its world bounds whenever it looks at a binding partition, making the compiler sound in the presence of a partitioned binding. The key adjustment is that the compiler is no longer allowed to directly query the binding table without recording the world bounds, so all the various abstract evaluations that look at bindings need to be adjusted and are no longer pure tfuncs.
We used to look at bindings a lot more, but thanks to earlier prep work to remove unnecessary binding-dependent code (#55288, #55289 and #55271), these changes become relatively straightforward. Note that as before, we do not create any binding partitions by default, so this commit is mostly preparatory.

---------

Co-authored-by: Shuhei Kadowaki <[email protected]>

* Restore JL_NOTSAFEPOINT in jl_stderr_obj (#56407)

This is not a function we're really using, but it's used in the embedding examples, so I'm sure somebody would complain if I deleted it or made it a safepoint, so let's just give the same best-effort result as before.

* reland "Inlining: Remove outdated code path for GlobalRef movement (#46880)" (#56382)

From the description of the original PR:

> We used to not allow `GlobalRef` in `PhiNode` at all (because they
> could have side effects). However, we then change the IR to make
> side-effecting `GlobalRef`s illegal in statement position in general,
> so now `PhiNode`s values are just regular value position, so there's
> no reason any more to try to move `GlobalRef`s out to statement
> position in inlining. Moreover, doing so introduces a bunch of
> unnecessary `GlobalRef`s that weren't being moved back. We could fix
> that separately by setting appropriate flags, but it's simpler to just
> get rid of this special case entirely.

This change itself does not appear to have any issues, and in fact, it is very useful for keeping the IR slim, especially in code generated by Cassette-like systems, so I would like to reland it.

However, the original PR was reverted in JuliaLang/julia#46951 due to bugs like JuliaLang/julia#46940 and JuliaLang/julia#46943. I could not reproduce these bugs on my end (maybe they have been fixed by some GC-side fixes?), so I believe relanding the original PR’s changes would not cause any issues, but it is necessary to confirm that similar problems do not arise before merging this PR.
* copy effects key to `Base.infer_effects` (#56363) Copied from the docstring of `Core.Compiler.Effects`, this makes it easier to figure out what the output of `Base.infer_effects` is actually telling you. * Fix `make install` for asan build (#56347) Now the makescript finds libclang_rt.asan-x86_64.so for example. The change from `-0` to `-1` is as with `-1`, `libclang_rt.asan-*` is searched for in `usr/lib/julia` instead of `usr/lib`. * Add dims check to triangular mul (#56393) This adds a dimension check to triangular matrix multiplication methods. While such checks already exist in the individual branches (occasionally within `BLAS` methods), having these earlier would permit certain optimizations, as we are assured that the axes are compatible. This potentially duplicates the checks, but this is unlikely to be a concern given how cheap the checks are. I've also reused the `check_A_mul_B!_sizes` function that is defined in `bidiag.jl`, instead of hard-coding the checks. Further, I've replaced some hard-coded loop ranges by the corresponding `axes` and `first/lastindex` calls. These are identical under the 1-based indexing assumption, but the `axes` variants are easier to read and reason about. * clarify short-circuit && and || docs (#56420) This clarifies the docs to explain that `a && b` is equivalent to `a ? b : false` and that `a || b` is equivalent to `a ? true : b`. In particular, this explains why the second argument does not need to be a boolean value, which is a common point of confusion. (See e.g. [this discourse thread](https://discourse.julialang.org/t/internals-of-assignment-when-doing-short-circuit-evaluation/122178/2?u=stevengj).) * docs: replace 'leaf types' with 'concrete types' (#56418) Fixes #55044 --------- Co-authored-by: inkydragon <[email protected]> * Remove aggressive constprop annotation on generic_matmatmul_wrapper! 
(#56400)

This annotation seems unnecessary, as the method gets inlined and there's no computation being carried out using the value of the constant.

* Clarify the FieldError docstring (#55222)

* Allow `Time`s to be rounded to `Period`s (#52629)

Co-authored-by: CyHan <[email protected]>
Co-authored-by: Curtis Vogt <[email protected]>

* Replace unconditional store with cmpswap to avoid deadlocking in jl_fptr_wait_for_compiled_addr (#56444)

That unconditional store could overwrite the actual compiled code in that pointer, so make it a cmpswap.

* Correct nothrow modeling of `get_binding_type` (#56430)

As pointed out in https://github.com/JuliaLang/julia/pull/56299#discussion_r1826509185, although the bug predates that PR.

* add tip for module docstrings before load (#56445)

* compiler: Strengthen some assertions and fix a couple small bugs (#56449)

* inference: minor follow-ups to JuliaLang/julia#56299 (#56450)

* Ensure that String(::Memory) returns only a String, not any owner (#56438)

Fixes #56435

* Take safepoint lock before going to sleep in the scheduler. (#56443)

This avoids a deadlock during exit, between a thread going to sleep and the thread exiting.

* Profile: mention `kill -s SIGUSR1 julia_pid` for Linux (#56441)

Currently this route is mentioned in the docs (https://docs.julialang.org/en/v1/stdlib/Profile/#Triggered-During-Execution) but is missing from the module docstring. Adding it should help users who have little idea how to "send a kernel signal to a process" to get started.

---------

Co-authored-by: Ian Butterworth <[email protected]>

* Fix and test an overflow issue in `searchsorted` (#56464)

Also remove the `searchsorted` special cases for offset arrays in tests, which had the effect of bypassing any actual testing of `searchsorted` behavior on offset arrays.

To be clear, after this bugfix the function is still broken, just a little bit less so.
* Update docs of calling convention arg in `:foreigncall` AST node (#56417) * `step(::AbstractUnitRange{Bool})` should return `Bool` (#56405) The issue was introduced by #27302 , as ```julia julia> true-false 1 ``` By definitions below, `AbstractUnitRange{Bool} <: OrdinalRange{Bool, Bool}` whose step type is `Bool`. https://github.com/JuliaLang/julia/blob/da74ef1933b12410b217748e0f7fbcbe52e10d29/base/range.jl#L280-L299 --------- Co-authored-by: Matt Bauman <[email protected]> Co-authored-by: Matt Bauman <[email protected]> * fixup! JuliaLang/julia#56028, fix up the type-level escapability check In JuliaLang/julia#56028, the type-level escapability check was changed to use `is_mutation_free_argtype`, but this was a mistake because EA no longer runs for structs like `mutable struct ForeignBuffer{T}; const ptr::Ptr{T}; end`. This commit changes it to use `is_identity_free_argtype` instead, which can be used to detect whether a type may contain any mutable allocations or not. * add `show(::IO, ::ArgEscapeInfo)` * EA: disable finalizer inlining for allocations that are edges of `PhiNode`s (#56455) The current EA-based finalizer inlining implementation can create invalid IR when the target object is later aliased as a `PhiNode`, which was causing #56422. In such cases, finalizer inlining for the allocations that are edges of each `PhiNode` should be avoided, and instead, finalizer inlining should ideally be applied to the `PhiNode` itself, but implementing that is somewhat complex. As a temporary fix, this commit disables inlining in those cases. - fixes #56422 * make `verify_ir` error messages more informative (#56452) Currently, when `verify_ir` finds an error, the `IRCode` is printed, but it's not easy to determine which method instance generated that `IRCode`. This commit adds method instance and code location information to the error message, making it easier to identify the problematic code. E.g.: ```julia [...] 
610 │ %95 = builtin Core.tuple(%48, %94)::Tuple{GMT.Gdal.IGeometry, GMT.Gdal.IGeometry} └─── return %95 ERROR: IR verification failed. Code location: ~/julia/packages/GMT/src/gdal_extensions.jl:606 Method instance: MethodInstance for GMT.Gdal.helper_2geoms(::Matrix{Float64}, ::Matrix{Float64}) Stacktrace: [1] error(::String, ::String, ::String, ::Symbol, ::String, ::Int32, ::String, ::String, ::Core.MethodInstance) @ Core.Compiler ./error.jl:53 [...] ``` * [GHA] Explicitly install Julia for whitespace workflow (#56468) So far we relied on the fact that Julia comes in the default Ubuntu images on GitHub Actions runners, but this may change in the future (although there's apparently no plan in this direction for the time being). To make the workflow more future-proof, we now explicitly install Julia using a dedicated workflow. * Allow taking Matrix slices without an extra allocation (#56236) Since changing Array to use Memory as the backing, we had the option of making non-Vector arrays more flexible, but had instead preserved the restriction that they must be zero offset and equal in length to the Memory. This results in extra complexity, restrictions, and allocations however, but doesn't gain many known benefits. Nanosoldier shows a decrease in performance on linear eachindex loops, which we theorize is due to a minor failure to CSE before SCEV or a lack of NUW/NSW on the length multiplication calculation. * [late-gc-lowering] null-out GC frame slots for dead objects (#52935) Should fix https://github.com/JuliaLang/julia/issues/51818. MWE: ```julia function testme() X = @noinline rand(1_000_000_00) Y = @noinline sum(X) X = nothing GC.gc() return Y end ``` Note that it now stores a `NULL` in the GC frame before calling `jl_gc_collect`. 
Before: ```llvm ; Function Signature: testme() ; @ /Users/dnetto/Personal/test.jl:3 within `testme` define double @julia_testme_535() #0 { top: %gcframe1 = alloca [3 x ptr], align 16 call void @llvm.memset.p0.i64(ptr align 16 %gcframe1, i8 0, i64 24, i1 true) %pgcstack = call ptr inttoptr (i64 6595051180 to ptr)(i64 262) #10 store i64 4, ptr %gcframe1, align 16 %task.gcstack = load ptr, ptr %pgcstack, align 8 %frame.prev = getelementptr inbounds ptr, ptr %gcframe1, i64 1 store ptr %task.gcstack, ptr %frame.prev, align 8 store ptr %gcframe1, ptr %pgcstack, align 8 ; @ /Users/dnetto/Personal/test.jl:4 within `testme` %0 = call nonnull ptr @j_rand_539(i64 signext 100000000) %gc_slot_addr_0 = getelementptr inbounds ptr, ptr %gcframe1, i64 2 store ptr %0, ptr %gc_slot_addr_0, align 16 ; @ /Users/dnetto/Personal/test.jl:5 within `testme` %1 = call double @j_sum_541(ptr nonnull %0) ; @ /Users/dnetto/Personal/test.jl:7 within `testme` ; ┌ @ gcutils.jl:132 within `gc` @ gcutils.jl:132 call void @jlplt_ijl_gc_collect_543_got.jit(i32 1) %frame.prev4 = load ptr, ptr %frame.prev, align 8 store ptr %frame.prev4, ptr %pgcstack, align 8 ; └ ; @ /Users/dnetto/Personal/test.jl:8 within `testme` ret double %1 } ``` After: ```llvm ; Function Signature: testme() ; @ /Users/dnetto/Personal/test.jl:3 within `testme` define double @julia_testme_752() #0 { top: %gcframe1 = alloca [3 x ptr], align 16 call void @llvm.memset.p0.i64(ptr align 16 %gcframe1, i8 0, i64 24, i1 true) %pgcstack = call ptr inttoptr (i64 6595051180 to ptr)(i64 262) #10 store i64 4, ptr %gcframe1, align 16 %task.gcstack = load ptr, ptr %pgcstack, align 8 %frame.prev = getelementptr inbounds ptr, ptr %gcframe1, i64 1 store ptr %task.gcstack, ptr %frame.prev, align 8 store ptr %gcframe1, ptr %pgcstack, align 8 ; @ /Users/dnetto/Personal/test.jl:4 within `testme` %0 = call nonnull ptr @j_rand_756(i64 signext 100000000) %gc_slot_addr_0 = getelementptr inbounds ptr, ptr %gcframe1, i64 2 store ptr %0, ptr %gc_slot_addr_0, 
align 16 ; @ /Users/dnetto/Personal/test.jl:5 within `testme` %1 = call double @j_sum_758(ptr nonnull %0) store ptr null, ptr %gc_slot_addr_0, align 16 ; @ /Users/dnetto/Personal/test.jl:7 within `testme` ; ┌ @ gcutils.jl:132 within `gc` @ gcutils.jl:132 call void @jlplt_ijl_gc_collect_760_got.jit(i32 1) %frame.prev6 = load ptr, ptr %frame.prev, align 8 store ptr %frame.prev6, ptr %pgcstack, align 8 ; └ ; @ /Users/dnetto/Personal/test.jl:8 within `testme` ret double %1 } ``` * Added test for resolving array references in exprresolve (#56471) added test to take care of non-real-index handling while resolving array references in exprresolve to test julia/base/cartesian.jl - line 427 to 432 * Fix and test searchsorted for arrays whose first index is `typemin(Int)` (#56474) This fixes the issue reported in https://github.com/JuliaLang/julia/issues/56457#issuecomment-2457223264 which, combined with #56464 which fixed the issue in the OP, fixes #56457. `searchsortedfirst` was fine all along, but I added it to tests regardless. * Move Core.Compiler into Base This is the first step in what I am hoping will eventually result in making the compiler itself and upgradable stdlib. Over time, we've gained several non-Base consumers of `Core.Compiler`, and we've reached a bit of a breaking point where maintaining those downstream dependencies is getting more difficult than the close coupling of Core.Compiler to the runtime is worth. In this first step, I am moving Core.Compiler into Base, ending the duplication of common data structure and generic functions between Core.Compiler and Base. This split goes back quite far (although not all the way) to the early days of Julia and predates the world-age mechanism. The extant Base and Core.Compiler environments have some differences (other than the duplication). I think the primary ones are (but I will add more here if somebody points one out). 
- `Core.Compiler` does not use `getproperty`
- `Core.Compiler` does not have extensible `==` equality

In this, I decided to retain the former by setting `getproperty = getfield` for Core.Compiler itself (though of course not for the data structures shared with Base). I don't think it's strictly necessary, but might as well. For equality, I decided the easiest thing to do would be to try to merge the equalities and see what happens. In general, Core.Compiler is relatively restricted in the kinds of equality comparisons it can make, so I think it'll work out fine, but we can revisit this. This seems to be fully working and most of this is just moving code around. I think most of that refactoring is independently useful, so I'll pull some of it out into separate PRs to make this PR more manageable.
* Delete buggy `stat(::Integer)` method (#54855)
"Where did someone get a RawFD as an integer anyway?" -@stefankarpinski
See also #51711
Fixes #51710
* missing gc-root store in subtype (#56472)
Fixes #56141
Introduced by #52228 (a624d445c02c)
* further defer jl_insert_backedges after loading (#56447)
Finish fully breaking the dependency between method insertions and inferring whether the cache is valid. The cache should be inferable in parallel and in aggregate after all loading is finished. This prepares us for moving this code into Julia (Core.Compiler) next.
* count bytes allocated through malloc more precisely (#55223)
Should make the accounting for memory allocated through malloc a bit more accurate. Should also simplify the accounting code by eliminating the use of `jl_gc_count_freed` in `jl_genericmemory_to_string`.
* Fix external IO loop thread interaction and add function to Base.Experimental to facilitate its use. Also add a test. (#55529)
While looking at https://github.com/JuliaLang/julia/issues/55525 I found that the implementation wasn't working correctly.
I added it to Base.Experimental so people don't need to hand-roll their own and am also testing a version of what the issue was hitting.
* [REPL] raise default implicit `show` limit to 1MiB (#56297)
https://github.com/JuliaLang/julia/pull/53959#issuecomment-2426946640
I would like to understand more where these issues are coming from; it would be easy to exempt some types from Base or Core with
```julia
REPL.show_limited(io::IO, mime::MIME, x::SomeType) = show(io, mime, x)
```
but I'm not sure which are causing problems in practice. But meanwhile I think raising the limit makes sense.
* Add a docstring for `Base.divgcd` (#53769)
Co-authored-by: Sukera <[email protected]>
* Fix compilation warning on aarch64-linux (#56480)
This fixes the warning:
```
/cache/build/default-aws-aarch64-ci-1-3/julialang/julia-master/src/stackwalk.c: In function 'jl_simulate_longjmp':
/cache/build/default-aws-aarch64-ci-1-3/julialang/julia-master/src/stackwalk.c:995:22: warning: initialization of 'mcontext_t *' {aka 'struct sigcontext *'} from incompatible pointer type 'struct unw_sigcontext *' [-Wincompatible-pointer-types]
  995 |     mcontext_t *mc = &c->uc_mcontext;
      |                      ^
```
This is the last remaining warning during compilation on aarch64-linux.
* Make Compiler an independent package
This is a further extension to #56128 to make the compiler into a proper independent package, usable outside of `Base` as `using Compiler` in the same way that `JuliaSyntax` works already. InteractiveUtils gains a new `@activate` macro that can be used to activate an outside Compiler package, either for reflection only or for codegen also.
* Make heap size hint available as an env variable (#55631)
This makes `JULIA_HEAP_SIZE_HINT` the environment variable version of the `--heap-size-hint` command-line flag. Seems like there was interest in https://github.com/JuliaLang/julia/pull/45369#issuecomment-1544204022. The same syntax is used as for the command-line version with, for example, `2G` => 2 GB and `200M` => 200 MB.
@oscardssmith want to take a look?
* Allow indexing `UniformScaling` with `CartesianIndex{2}` (#56461)
Since indexing with two `Integer`s is defined, we might as well define indexing with a `CartesianIndex`. This makes certain loops convenient where the index is obtained using `eachindex`.
* Simplify first index in `FastContiguousSubArray` definition (#56491)
Since `Slice <: AbstractUnitRange` and `Union{Slice, AbstractUnitRange} == AbstractUnitRange`, we may simplify the first index.
* Make `popat!` support `@inbounds` (#56323)
Co-authored-by: Jishnu Bhattacharya <[email protected]>
* NEWS.md: clarify `--trim` (#56460)
Co-authored-by: Matt Bauman <[email protected]>
* Remove aggressive constprop annotation from 2x2 and 3x3 matmul (#56453)
Removing these annotations reduces ttfx slightly.
```julia
julia> using LinearAlgebra

julia> A = rand(2,2);

julia> @time mul!(similar(A), A, A, 1, 2);
  0.296096 seconds (903.49 k allocations: 44.313 MiB, 4.25% gc time, 99.98% compilation time) # nightly
  0.286009 seconds (835.88 k allocations: 40.732 MiB, 3.29% gc time, 99.98% compilation time) # this PR
```
* `sincos` for non-float symmetric matrices (#56484)
Ensures that the `eltype` of the array holding the result of `sincos` is a floating-point one, even if the argument doesn't have a floating-point `eltype`. After this, the following works:
```julia
julia> A = diagm(0=>1:3)
3×3 Matrix{Int64}:
 1  0  0
 0  2  0
 0  0  3

julia> sincos(A)
([0.8414709848078965 0.0 0.0; 0.0 0.9092974268256817 0.0; 0.0 0.0 0.1411200080598672], [0.5403023058681398 0.0 0.0; 0.0 -0.4161468365471424 0.0; 0.0 0.0 -0.9899924966004454])
```
* Specialize 2-arg `show` for `LinearIndices` (#56482)
After this,
```julia
julia> l = LinearIndices((1:3, 1:4));

julia> show(l)
LinearIndices((1:3, 1:4))
```
The printed form is a valid constructor.
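One way to check the "valid constructor" claim is to parse the printed text and evaluate it. The sketch below assumes only `sprint`, `Meta.parse`, and `eval` from Base. On a build that includes #56482 the captured string should be the constructor call shown above; on older versions the printed text may differ, but the comparison is still expected to hold because arrays compare by contents.

```julia
l = LinearIndices((1:3, 1:4))

# Capture what 2-arg `show` prints into a string.
s = sprint(show, l)

# Parse and evaluate the printed text, then compare the result with `l`.
l2 = eval(Meta.parse(s))

println(s)
println(l2 == l)
```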
* Avoid constprop in `syevd!` and `syev!` (#56442) This improves compilation times slightly: ```julia julia> using LinearAlgebra julia> A = rand(2,2); julia> @time eigen!(Hermitian(A)); 0.163380 seconds (180.51 k allocations: 8.760 MiB, 99.88% compilation time) # master 0.155285 seconds (163.77 k allocations: 7.971 MiB, 99.87% compilation time) # This PR ``` The idea is that the constant propagation is only required to infer the return type, and isn't necessary in the body of the method. We may therefore annotate the body with a `@constprop :none`. * make: define `basecompiler.ji` target (#56498) For easier experimentation with just the bootstrap process. Additionally, as a follow-up to JuliaLang/julia#56409, this commit also includes some minor cosmetic changes. * speed up bootstrapping by compiling few optimizer subroutines earlier (#56501) Speeds up the bootstrapping process by about 30 seconds. * remove top-level branches checking for Base (#56507) These are no longer needed, now that the files are no longer included twice. * Undo the decision to publish incomplete types to the binding table (#56497) This effectively reverts #36121 and replaces it with #36111, which was the originally proposed alternative to fix #36104. To recap, the question is what should happen for ``` module Foo struct F v::Foo.F end end ``` i.e. where the type reference tries to refer to the newly defined type via its global path. In #36121 we adjusted things so that we first assign the type to its global binding and then evaluate the field type (leaving the type in an incomplete state in the meantime). The primary reason that this choice was that we would have to deal with incomplete types assigned to global bindings anyway if we ever did #32658. However, I think this was the wrong choice. There is a difference between allowing incomplete types and semantically forcing incomplete types to be globally observable every time a new type is defined. 
The situation was a little different four years ago, but with more extensive threading (which can observe the incompletely constructed type) and the upcoming completion of bindings partition, the situation is different. For bindings partition in particular, this would require two invalidations on re-definition, one to the new incomplete type and then back to the complete type. I don't think this is worth it, for the (somewhat niche and possibly-should-be- deprecated-future) case of refering to incompletely defined types by their global names. So let's instead try the hack in #36111, which does a frontend rewrite of the global path. This should be sufficient to at least address the obvious cases. * Merge identical methods for Symmetric/Hermitian and SymTridiagonal (#56434) Since the methods do identical things, we may define each method once for a union of types instead of defining methods for each type. * Specialize findlast for integer AbstractUnitRanges and StepRanges (#54902) For monotonic ranges, `findfirst` and `findlast` with `==(val)` as the predicate should be identical, as each value appears only once in the range. Since `findfirst` is specialized for some ranges, we may define `findlast` as well analogously. On v"1.12.0-DEV.770" ```julia julia> @btime findlast(==(1), $(Ref(1:1_000))[]) 1.186 μs (0 allocations: 0 bytes) 1 ``` This PR ```julia julia> @btime findlast(==(1), $(Ref(1:1_000))[]) 3.171 ns (0 allocations: 0 bytes) 1 ``` I've also specialized `findfirst(iszero, r::AbstractRange)` to make this be equivalent to `findfirst(==(0), ::AbstractRange)` for numerical ranges. Similarly, for `isone`. These now take the fast path as well. Thirdly, I've added some `convert` calls to address issues like ```julia julia> r = Int128(1):Int128(1):Int128(4); julia> findfirst(==(Int128(2)), r) |> typeof Int128 julia> keytype(r) Int64 ``` This PR ensures that the return type always corresponds to `keytype`, which is what the docstring promises. 
This PR also fixes ```julia julia> findfirst(==(0), UnitRange(-0.5, 0.5)) ERROR: InexactError: Int64(0.5) Stacktrace: [1] Int64 @ ./float.jl:994 [inlined] [2] findfirst(p::Base.Fix2{typeof(==), Int64}, r::UnitRange{Float64}) @ Base ./array.jl:2397 [3] top-level scope @ REPL[1]:1 ``` which now returns `nothing`, as expected. * Loop over `Iterators.rest` in `_foldl_impl` (#56492) For reasons that I don't understand, this improves performance in `mapreduce` in the following example: ```julia julia> function g(A) for col in axes(A,2) mapreduce(iszero, &, view(A, UnitRange(axes(A,1)), col), init=true) || return false end return true end g (generic function with 2 methods) julia> A = zeros(2, 10000); julia> @btime g($A); 28.021 μs (0 allocations: 0 bytes) # nightly v"1.12.0-DEV.1571" 12.462 μs (0 allocations: 0 bytes) # this PR julia> A = zeros(1000,1000); julia> @btime g($A); 372.080 μs (0 allocations: 0 bytes) # nightly 321.753 μs (0 allocations: 0 bytes) # this PR ``` It would be good to understand what the underlying issue is, as the two seem equivalent to me. Perhaps this form makes it clear that it's not, in fact, an infinite loop? * better error message for rpad/lpad with zero-width padding (#56488) Closes #45339 — throw a more informative `ArgumentError` message from `rpad` and `lpad` if a zero-`textwidth` padding is passed (not a `DivideError`). If the padding character has `ncodeunits == 1`, suggests that maybe they want `str * pad^max(0, npad - ncodeunits(str))` instead. * Safer indexing in dense linalg methods (#56451) Ensure that `eachindex` is used consistently alongside `@inbounds`, and use `diagind` to obtain indices along a diagonal. * The `info` in LAPACK calls should be a Ref instead of a Ptr (#56511) Co-authored-by: Viral B. Shah <[email protected]> * Scaling loop instead of broadcasting in strided matrix exp (#56463) Firstly, this is easier to read. Secondly, this merges the two loops into one. Thirdly, this avoids the broadcasting latency. 
```julia julia> using LinearAlgebra julia> A = rand(2,2); julia> @time LinearAlgebra.exp!(A); 0.952597 seconds (2.35 M allocations: 116.574 MiB, 2.67% gc time, 99.01% compilation time) # master 0.877404 seconds (2.17 M allocations: 106.293 MiB, 2.65% gc time, 99.99% compilation time) # this PR ``` The performance also improves as there are fewer allocations in the first branch (`opnorm(A, 1) <= 2.1`): ```julia julia> B = diagm(0=>im.*(float.(1:200))./200, 1=>(1:199)./400, -1=>(1:199)./400); julia> opnorm(B,1) 1.9875 julia> @btime exp($B); 5.066 ms (30 allocations: 4.89 MiB) # nightly v"1.12.0-DEV.1581" 4.926 ms (27 allocations: 4.28 MiB) # this PR ``` * codegen: Respect binding partition (#56494) Minor changes to make codegen correct in the face of partitioned constant bindings. Does not yet handle the envisioned semantics for globals that change restriction type, which will require a fair bit of additional work. * Profile: fix Compiler short path (#56515) * Check `isdiag` in dense trig functions (#56483) This improves performance for dense diagonal matrices, as we may apply the function only to the diagonal elements. ```julia julia> A = diagm(0=>rand(100)); julia> @btime cos($A); 349.211 μs (22 allocations: 401.58 KiB) # nightly v"1.12.0-DEV.1571" 16.215 μs (7 allocations: 80.02 KiB) # this PR ``` --------- Co-authored-by: Daniel Karrasch <[email protected]> * Profile: add helper method for printing profile report to file (#56505) The IOContext part is isn't obvious, because otherwise the IO is assumed to be 80 chars wide, which makes for bad reports. * Change in-place exp to out-of-place in matrix trig functions (#56242) This makes the functions work for arbitrary matrix types that support `exp`, but not necessarily the in-place `exp!`. 
For example, the following works after this:
```julia
julia> m = SMatrix{2,2}(1:4);

julia> cos(m)
2×2 SMatrix{2, 2, Float64, 4} with indices SOneTo(2)×SOneTo(2):
  0.855423  -0.166315
 -0.110876   0.689109
```
There's a slight performance improvement as well because we don't compute `im*A` and `-im*A` separately, but we negate the first to obtain the second.
```julia
julia> A = rand(ComplexF64,100,100);

julia> @btime sin($A);
  2.796 ms (48 allocations: 1.84 MiB) # nightly v"1.12.0-DEV.1571"
  2.304 ms (48 allocations: 1.84 MiB) # this PR
```
* Test: Don't change scope kind in `test_{warn,nowarn}` (#56524)
This was part of #56509, but is an independent bugfix. The basic issue is that these macros were using a `do` block internally. This is undesirable for test macros, because we would like them not to affect the behavior of what they're testing. E.g. right now:
```
julia> using Test

julia> const x = 1
1

julia> @test_nowarn const x = 1
ERROR: syntax: `global const` declaration not allowed inside function around /home/keno/julia/usr/share/julia/stdlib/v1.12/Test/src/Test.jl:927
Stacktrace:
 [1] top-level scope
   @ REPL[3]:1
```
This PR just writes out the try/finally manually, so the above works fine after this PR.
* For loop instead of whi…
udesou pushed a commit to udesou/julia that referenced this pull request Jul 29, 2025
Simplify `workqueue_for`. While not strictly necessary, the acquire load in `getindex(once::OncePerThread{T,F}, tid::Integer)` makes ThreadSanitizer happy. With the existing implementation, we get false positives whenever a thread other than the one that originally allocated the array reads it:
```
==================
WARNING: ThreadSanitizer: data race (pid=6819)
  Atomic read of size 8 at 0xffff86bec058 by main thread:
    #0 getproperty Base_compiler.jl:57 (sys.so+0x113b478)
    #1 julia_pushNOT._1925 task.jl:868 (sys.so+0x113b478)
    #2 julia_enq_work_1896 task.jl:969 (sys.so+0x5cd218)
    #3 schedule task.jl:983 (sys.so+0x892294)
    #4 macro expansion threadingconstructs.jl:522 (sys.so+0x892294)
    #5 julia_start_profile_listener_60681 Base.jl:355 (sys.so+0x892294)
    #6 julia___init___60641 Base.jl:392 (sys.so+0x1178dc)
    #7 jfptr___init___60642 <null> (sys.so+0x118134)
    #8 _jl_invoke /home/user/c/julia/src/gf.c (libjulia-internal.so.1.13+0x5e9a4)
    #9 ijl_apply_generic /home/user/c/julia/src/gf.c:3892:12 (libjulia-internal.so.1.13+0x5e9a4)
    #10 jl_apply /home/user/c/julia/src/julia.h:2343:12 (libjulia-internal.so.1.13+0xbba74)
    #11 jl_module_run_initializer /home/user/c/julia/src/toplevel.c:68:13 (libjulia-internal.so.1.13+0xbba74)
    #12 _finish_jl_init_ /home/user/c/julia/src/init.c:632:13 (libjulia-internal.so.1.13+0x9c0fc)
    #13 ijl_init_ /home/user/c/julia/src/init.c:783:5 (libjulia-internal.so.1.13+0x9bcf4)
    #14 jl_repl_entrypoint /home/user/c/julia/src/jlapi.c:1125:5 (libjulia-internal.so.1.13+0xf7ec8)
    #15 jl_load_repl /home/user/c/julia/cli/loader_lib.c:601:12 (libjulia.so.1.13+0x11934)
    #16 main /home/user/c/julia/cli/loader_exe.c:58:15 (julia+0x10dc20)

  Previous write of size 8 at 0xffff86bec058 by thread T2:
    #0 IntrusiveLinkedListSynchronized task.jl:863 (sys.so+0x78d220)
    #1 macro expansion task.jl:932 (sys.so+0x78d220)
    #2 macro expansion lock.jl:376 (sys.so+0x78d220)
    #3 julia_workqueue_for_1933 task.jl:924 (sys.so+0x78d220)
    #4 julia_wait_2048 task.jl:1204 (sys.so+0x6255ac)
    #5 julia_task_done_hook_49205 task.jl:839 (sys.so+0x128fdc0)
    #6 jfptr_task_done_hook_49206 <null> (sys.so+0x902218)
    #7 _jl_invoke /home/user/c/julia/src/gf.c (libjulia-internal.so.1.13+0x5e9a4)
    #8 ijl_apply_generic /home/user/c/julia/src/gf.c:3892:12 (libjulia-internal.so.1.13+0x5e9a4)
    #9 jl_apply /home/user/c/julia/src/julia.h:2343:12 (libjulia-internal.so.1.13+0x9c79c)
    #10 jl_finish_task /home/user/c/julia/src/task.c:345:13 (libjulia-internal.so.1.13+0x9c79c)
    #11 jl_threadfun /home/user/c/julia/src/scheduler.c:122:5 (libjulia-internal.so.1.13+0xe7db8)

  Thread T2 (tid=6824, running) created by main thread at:
    #0 pthread_create <null> (julia+0x85f88)
    #1 uv_thread_create_ex /workspace/srcdir/libuv/src/unix/thread.c:172 (libjulia-internal.so.1.13+0x1a8d70)
    #2 _finish_jl_init_ /home/user/c/julia/src/init.c:618:5 (libjulia-internal.so.1.13+0x9c010)
    #3 ijl_init_ /home/user/c/julia/src/init.c:783:5 (libjulia-internal.so.1.13+0x9bcf4)
    #4 jl_repl_entrypoint /home/user/c/julia/src/jlapi.c:1125:5 (libjulia-internal.so.1.13+0xf7ec8)
    #5 jl_load_repl /home/user/c/julia/cli/loader_lib.c:601:12 (libjulia.so.1.13+0x11934)
    #6 main /home/user/c/julia/cli/loader_exe.c:58:15 (julia+0x10dc20)

SUMMARY: ThreadSanitizer: data race Base_compiler.jl:57 in getproperty
==================
```
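For readers unfamiliar with the terminology, the sketch below shows what a release store paired with an acquire load looks like with Julia's per-field atomics. It is a generic illustration, not the code touched by this commit: the `Slot` type is hypothetical, and the actual change lives in `OncePerThread` and `workqueue_for`.

```julia
# Illustrative only: `Slot` is a hypothetical type, not part of Base.
mutable struct Slot
    @atomic value::Union{Nothing,Vector{Int}}
end

slot = Slot(nothing)

# Writer task: allocate an array and publish it with a release store.
writer = Threads.@spawn begin
    @atomic :release slot.value = [1, 2, 3]
end
wait(writer)

# Reader: an acquire load pairs with the release store above, so the
# contents of the published array are visible to the reading task.
v = @atomic :acquire slot.value
println(v)
```

A race detector can recognize the ordering between such a paired store and load, which is the kind of annotation the commit message credits with keeping ThreadSanitizer quiet.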
udesou added a commit that referenced this pull request Aug 7, 2025
* Increment state conditionally in `CartesianIndices` iteration (#58742) Fixes https://github.com/JuliaLang/julia/issues/53430 ```julia julia> a = rand(100,100); b = similar(a); av = view(a, axes(a)...); bv = view(b, axes(b)...); bv2 = view(b, UnitRange.(axes(b))...); julia> @btime copyto!($bv2, $av); # slow, indices are UnitRanges 12.352 μs (0 allocations: 0 bytes) # master, v"1.13.0-DEV.745" 1.662 μs (0 allocations: 0 bytes) # this PR julia> @btime copyto!($bv, $av); # reference 1.733 μs (0 allocations: 0 bytes) ``` The performances become comparable after this PR. I've also renamed the second `I` to `Itail`, as the two variables represent different quantities. * 🤖 [master] Bump the Distributed stdlib from 51e5297 to 3679026 (#58748) Stdlib: Distributed URL: https://github.com/JuliaLang/Distributed.jl Stdlib branch: master Julia branch: master Old commit: 51e5297 New commit: 3679026 Julia version: 1.13.0-DEV Distributed version: 1.11.0(Does not match) Bump invoked by: @DilumAluthge Powered by: [BumpStdlibs.jl](https://github.com/JuliaLang/BumpStdlibs.jl) Diff: https://github.com/JuliaLang/Distributed.jl/compare/51e52978481835413d15b589919aba80dd85f890...3679026d7b510befdedfa8c6497e3cb032f9cea1 ``` $ git log --oneline 51e5297..3679026 3679026 Merge pull request #137 from JuliaLang/dpa/dont-use-link-local 875cd5a Rewrite the code to be a bit more explicit 2a6ee53 Non-link-local IP4 > non-link-local IP6 > link-local IP4 > link-local IP6 c0e9eb4 Factor functionality out into separate `choose_bind_addr()` function 86cbb8a Add explanation 0b7288c Worker: Bind to the first non-link-local IPv4 address ff8689a Merge pull request #131 from JuliaLang/spawnat-docs ba3c843 Document that `@spawnat :any` doesn't do load-balancing ``` Co-authored-by: DilumAluthge <[email protected]> * devdocs: contributing: fix headings (#58749) In particular, it seems like Documenter takes the level-one heading to define the page title. 
So the page titles were missing in the TOC before this change.
* Work around LLVM JITLink stack overflow issue. (#58579)
The JITLinker recurses for every symbol in the list, so limit the size of the list. This is kind of ugly. Also, 1000 might be too large; we don't want to go too small because that wastes memory, and 1000 was fine locally for the things I tested. Fixes https://github.com/JuliaLang/julia/issues/58229
* bump Compiler.jl version to 0.1.1 (#58744)
As the latest version of BaseCompiler.jl will be bumped to v0.1.1 after JuliaRegistries/General#132990.
* REPL: fix typo and potential `UndefVarError` (#58761)
Detected by the new LS diagnostics :)
* fix fallback code path in `take!(::IOBuffer)` method (#58762)
JET told me that the `data` local variable in particular is undefined at this point. After reviewing this code, I think this code path is actually unreachable, since `bytesavailable(io::IOBuffer)` returns `0` when `io` has been closed. So it's probably better to make that clear.
* Fix multi-threading docs typo (#58770)
* help bounds checking to be eliminated for `getindex(::Memory, ::Int)` (#58754)
Second try for PR #58741. This moves the `getindex(::Memory, ::Int)` bounds check to Julia, which is how it's already done for `getindex(::Array, ::Int)`, so I guess it's correct. Also deduplicate the bounds checking code while at it.
* Define textwidth for overlong chars (#58602)
Previously, this would error. There is no guarantee of how terminals render overlong encodings. Some terminals do not print them at all, and some print "�". Here, we set a textwidth of 1, conservatively. Refs #58593
* Add MethodError hints for functions in other modules (#58715)
When a MethodError occurs, check if functions with the same name exist in other modules (particularly those of the argument types). This helps users discover that they may need to import a function or ensure multiple functions are the same generic function.
- For Base functions: suggests importing (e.g., "You may have intended to import Base.length") - For other modules: suggests they may be intended as the same generic function - Shows all matches from relevant modules in sorted order - Uses modulesof! to properly handle all type structures including unions Fixes #58682 * Fix markdown bullet list in variables-and-scoping.md (#58771) * CONTRIBUTING.md: Ask folks to disclose AI-written PRs (#58666) * Convert julia-repl blocks to jldoctest format (#58594) Convert appropriate julia-repl code blocks to jldoctest format to enable automatic testing. In addition, this introduces a new `nodoctest = "reason"` pattern to annotate code blocks that are deliberate not doctested, so future readers will know not to try. Many code blocks are converted, in particular: - Manual pages: arrays.md, asynchronous-programming.md, functions.md, integers-and-floating-point-numbers.md, metaprogramming.md, multi-threading.md, performance-tips.md, variables.md, variables-and-scoping.md - Base documentation: abstractarray.jl, bitarray.jl, expr.jl, file.jl, float.jl, iddict.jl, path.jl, scopedvalues.md, sort.md - Standard library: Dates/conversions.jl, Random/RNGs.jl, Sockets/addrinfo.jl Key changes: - Add filters for non-deterministic output (timing, paths, memory addresses) - Add setup/teardown for filesystem operations - Fix parentmodule(M) usage in expr.jl for doctest compatibility - Document double escaping requirement for regex filters in docstrings - Update AGENTS.md with test running instructions Note: Some julia-repl blocks were intentionally left unchanged when they demonstrate language internals subject to change or contain non-deterministic output that cannot be properly filtered. Refs #56921 --------- Co-authored-by: Keno Fischer <[email protected]> Co-authored-by: Claude <[email protected]> * adds the `nth` function for iterables (#56580) Hi, I've turned the open ended issue #54454 into an actual PR. Tangentially related to #10092 ? 
This PR introduces the `nth(itr, n)` function to iterators to give a `getindex` type of behaviour. I've tried my best to optimize as much as possible by specializing on different types of iterators. In the spirit of iterators any OOB access returns `nothing`. (edit: instead of throwing an error, i.e. `first(itr, n)` and `last(itr, n)`) here is the comparison of running the testsuite (~22 different iterators) using generic `nth` and specialized `nth`: ```julia @btime begin for (itr, n, _) in $testset _fallback_nth(itr, n) end end 117.750 μs (366 allocations: 17.88 KiB) @btime begin for (itr, n, _) in $testset nth(itr, n) end end 24.250 μs (341 allocations: 16.70 KiB) ``` --------- Co-authored-by: adienes <[email protected]> Co-authored-by: Steven G. Johnson <[email protected]> Co-authored-by: Dilum Aluthge <[email protected]> * refine IR model queries (#58661) - `jl_isa_ast_node` was missing `enter`/`leave` nodes. - `Core.IR` exports mistakenly included a function `memoryref`. - `Base.IR`, and `quoted` were not public or documented. - Add julia function `isa_ast_node` to improve accuracy of `quoted`. - Change `==` on AST nodes to check egal equality of any constants in the IR / AST, and make hashing consistent with that change. This helpfully allows determining that `x + 1` and `x + 1.0` are not equivalent, exchangeable operations. If you need to compare any two objects for semantic equality, you may need to first wrap them with `x = Base.isa_ast_node(x) ? x : QuoteNode(x)` to resolve the ambiguity of whether the comparison is of the semantics or value. - Handle `undef` fields in Phi/PhiC node equality and hashing * fix showing types after removing using Core (#58773) PR #57357 changed the default using list, but only changed some of the places where the `show` code handled that. This led to duplicate (confusing) printing, since both Core. and Base. prefixes are dropped. 
Fix #58772 * inform compiler about local variable definedness (#58778) JET's new analysis pass now detects local variables that may be undefined, which has revealed such issues in several functions within Base (JuliaLang/julia#58762). This commit addresses local variables whose definedness the compiler cannot properly determine, primarily in functions reachable from JET's test suite. No functional changes are made. * better effects for `iterate` for `Memory` and `Array` (#58755) * Test: Hide REPL internals in backtraces (#58732) * Update docs for various type predicates (#58774) Makes the description for `isdispatchtuple` accurate, adds a docstring for `iskindtype` and `isconcretedispatch`, and adds notes to the docs for `isconcretetype` and `isabstracttype` explaining why they aren't antonyms. * Test: show context when a let testset errors (#58727) * [libblastrampoline_jll] Upgrade to v5.13.1 (#58775) ### Check list Version numbers: - [x] `deps/libblastrampoline.version`: `LIBNAME_VER`, `LIBNAME_BRANCH`, `LIBNAME_SHA1` and `LIBNAME_JLL_VER` - [x] `stdlib/libblastrampoline_jll/Project.toml`: `version` Checksum: - [x] `deps/checksums/libblastrampoline` * 🤖 [master] Bump the Pkg stdlib from 5577f68d6 to e3d456127 (#58781) Stdlib: Pkg URL: https://github.com/JuliaLang/Pkg.jl.git Stdlib branch: master Julia branch: master Old commit: 5577f68d6 New commit: e3d456127 Julia version: 1.13.0-DEV Pkg version: 1.13.0 Bump invoked by: @KristofferC Powered by: [BumpStdlibs.jl](https://github.com/JuliaLang/BumpStdlibs.jl) Diff: https://github.com/JuliaLang/Pkg.jl/compare/5577f68d612139693282c037d070f515bf160d1b...e3d4561272fc029e9a5f940fe101ba4570fa875d ``` $ git log --oneline 5577f68d6..e3d456127 e3d456127 add update function to apps and fix a bug when adding an already installed app (#4263) cae9ce02a Fix historical stdlib fixup if `Pkg` is in the Manifest (#4264) a42046240 don't use tree hash from manifest if the path is set from sources (#4260) a94a6bcae fix dev taking when 
the app is already installed (#4259) 313fddccb Internals: Add fallback `Base.show(::IO, ::RegistryInstance)` method (#4251) ``` Co-authored-by: KristofferC <[email protected]> * prevent unnecessary repeated squaring calculation (#58720) * LibGit2: Update to 1.9.1 (#58731) * Unify `_checkbounds_array` into `checkbounds` and use it in more places (#58785) Ref: https://github.com/JuliaLang/julia/pull/58755#discussion_r2158944282. --------- Co-authored-by: Matt Bauman <[email protected]> Co-authored-by: Matt Bauman <[email protected]> * Chained hash pipelining in array hashing (#58252) the proposed switch in https://github.com/JuliaLang/julia/pull/57509 from `3h - hash_finalizer(x)` to `hash_finalizer(3h -x)` should increase the hash quality of chained hashes, as the expanded expression goes from something like `sum((-3)^k * hash(x) for k in ...)` to a non-simplifiable composition this does have the unfortunate impact of long chains of hashes getting a bit slower as there is more data dependency and the CPU cannot work on the next element's hash before combining the previous one (I think --- I'm not particularly an expert on this low level stuff). As far as I know this only really impacts `AbstractArray` so, I've implemented a proposal that does some unrolling / pipelining manually to recover `AbstractArray` hashing performance. in fact, it's quite a lot faster now for most lengths. I tuned the thresholds (8 accumulators, certain length breakpoints) by hand on my own machine. * Require all tuples in eachindex to have the same length. 
(#48125) Potential fix for #47898 --------- Co-authored-by: navdeep rana <[email protected]> Co-authored-by: Oscar Smith <[email protected]> Co-authored-by: Jerry Ling <[email protected]> Co-authored-by: Andy Dienes <[email protected]> * trailing dimensions in eachslice (#58791) fixes https://github.com/JuliaLang/julia/issues/51692 * Allow underscore (unused) args in presence of kwargs (#58803) Admittedly fixed because I thought I introduced this bug recently, but actually, fix #32727. `f(_; kw) = 1` should now lower in a similar way to `f(::Any; kw) = 1`, where we use a gensym for the first argument. Not in this PR, but TODO: `nospecialize` underscore-only names * codegen: slightly optimize gc-frame allocation (#58794) Try to avoid allocating frames for some very simple function that only have the safepoint on entry and don't define any values themselves. * codegen: ensure safepoint functions can read the pgcstack (#58804) This needs to be readOnly over all memory, since GC could read anything (especially pgcstack), and which is not just argmem:read, but also the pointer accessed from argmem that is read from. Fix #58801 Note that this is thought not to be a problem for CleanupWriteBarriers, since while that does read the previously-inaccessibleMemOnly state, these functions are not marked nosync, so as long as the global state can be read, it also must be assumed that it might observe another thread has written to any global state. * Revert code changes from "strengthen assume_effects doc" PR (#58289) Reverts only the functional changes from JuliaLang/julia#58254, not the docs. Accessing this field here assumes that the counter valid is numeric and relevant to the current inference frame, neither of which is intended to be true, as we continue to add interfaces to execute methods outside of their current specific implementation with a monotonic world counter (e.g. 
with invoke on a Method, with precompile files, with external MethodTables, or with static compilation). * build: Error when attempting to set USECLANG/USEGCC (#58795) Way back in the good old days, these used to switch between GCC and Clang. I guess these days we always auto-switch based on the CC value. If you try to directly set USECLANG, things get into a bad state. Give a better error message for that case. * build: Add --no-same-owner to TAR (#58796) tar changes behavior when the current uid is 0 to try to also restore owner uids/gids (if recorded). It is possible for the uid to be 0 in single-uid environments like user namespace sandboxes, in which case the attempt to change the uid/gid fails. Of course ideally, the tars would have been created non-archival (so that the uid/gid wasn't recorded in the first place), but we get source tars from various places, so we can't guarantee this. To make sure we don't run into trouble, manually add the --no-same-owner flag to disable this behavior. * Add `cfunction` support for `--trim` (#58812) * fix error message for `eachindex(::Vararg{Tuple})` (#58811) Make the error message in case of mismatch less confusing and consistent with the error message for arrays. While at it, also made other changes of the same line of source code: * use function composition instead of an anonymous closure * expand the one-liner into a multiline `if` --------- Co-authored-by: Andy Dienes <[email protected]> * use more canonical way to check binding existence (#58809) * Add `trim_mode` parameter to JIT type-inference entrypoint (#58817) Resolves https://github.com/JuliaLang/julia/issues/58786. I think this is only a partial fix, since we can still end up loading code from pkgimages that has been poorly inferred due to running without these `InferenceParams`. 
However, many of the common scenarios (such as JLLs depending on each other) seem to be OK since we have a targeted heuristic that adds `__init__()` to a pkgimage only if the module has inference enabled.

* codegen: gc wb for atomic FCA stores (#58792)

Need to re-load the correct `r` since issetfield skips the intcast, resulting in no gc wb for the FCA.

Fix #58760

* codegen: relaxed jl_tls_states_t.safepoint load (#58828)

Every function with a safepoint causes spurious thread sanitizer warnings without this change. Codegen is unaffected, except when we build with `ThreadSanitizerPass`.

* bpart: Properly track methods with invalidated source after require_world (#58830)

There are three categories of methods we need to worry about during staticdata validation:

1. New methods added to existing generic functions
2. New methods added to new generic functions
3. Existing methods that now have new CodeInstances

In each of these cases, we need to check whether any of the implicit binding edges from the method's source was invalidated. Currently, we handle this for 1 and 2 by explicitly scanning the method on load. However, we were not tracking it for case 3. Fix that by using an extra bit in did_scan_method that gets set when we see an existing method getting invalidated, so we know that we need to drop the corresponding CodeInstances during load.

Fixes #58346

* Limit --help and --help-hidden to 100 character line length (#58835)

Just fixing the command line description to make sure it is not more than 100 characters wide as discussed with @oscardssmith in PR #54066 and PR #53759. I also added a test to make sure that nothing more than 100 characters is inserted. Thank you.

* libuv: Mark `(un)preserve_handle` as `@nospecialize` (#58844)

These functions only worry about object identity, so there's no need to specialize them on their argument types.
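To illustrate the `@nospecialize` point in the libuv entry above, here is a minimal sketch of an identity-keyed reference counter. The names `REFCOUNTS`, `preserve!` and `unpreserve!` are hypothetical and this is not the Base implementation (which also takes a lock); it only shows why a function that touches nothing but object identity gains little from per-type specialization.

```julia
# Hypothetical sketch, not Base's `preserve_handle`: an IdDict keys on object
# identity (===), so the body never depends on the concrete type of `x`.
const REFCOUNTS = IdDict{Any,Int}()

function preserve!(@nospecialize(x))
    # One compiled method instance can serve every argument type.
    REFCOUNTS[x] = get(REFCOUNTS, x, 0) + 1
    return nothing
end

function unpreserve!(@nospecialize(x))
    n = get(REFCOUNTS, x, 0)
    n == 0 && error("unbalanced call to unpreserve!")
    if n == 1
        delete!(REFCOUNTS, x)   # last reference released
    else
        REFCOUNTS[x] = n - 1
    end
    return nothing
end
```

Calling `preserve!` twice on the same `Ref` and then `unpreserve!` twice leaves the table empty again; no new specialization should be needed when a different argument type is passed.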
* add METHOD_SIG_LATEST_ONLY optimization to MethodInstance too (#58825)

Add the same optimization from Method to MethodInstance. Although the performance gain seems to be negligible in my specific testing, there doesn't seem to be any likely downside to adding one caching bit to avoid some recomputations.

* Encode fully_covers=false edges using negative of method count

This change allows edges that don't fully cover their method matches to be properly tracked through serialization. When fully_covers is false (indicating incomplete method coverage), we encode the method count as negative in the edges array to signal that compactly.

* move trim patches to separate files, only load if trimming (#58826)

fixes part of #58458

* gf: Add METHOD_SIG_LATEST_HAS_NOTMORESPECIFIC dispatch status bit

This commit introduces a new dispatch status bit to track when a method has other methods that are not more specific than it, enabling better optimization decisions during method dispatch.

Key changes:

1. Add METHOD_SIG_LATEST_HAS_NOTMORESPECIFIC bit to track methods with non-morespecific intersections
2. Add corresponding METHOD_SIG_PRECOMPILE_HAS_NOTMORESPECIFIC bit for precompiled methods
3. Refactor method insertion logic:
   - Remove morespec_unknown enum state, compute all morespec values upfront
   - Convert enum morespec_options to simple boolean logic (1/0)
   - Change 'only' from boolean to 'dispatch_bits' bitmask
   - Move dispatch status updates before early continues in the loop

* optimize verify_call again

* juliac: Add rudimentary Windows support (#57481)

This was essentially working as-is, except for our reliance on a C compiler. Not sure how we feel about having an `Artifacts.toml` floating around our `contrib` folder, but I'm not aware of an alternative other than moving `juliac.jl` to a subdirectory.
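The sign trick described in the `fully_covers=false` entry above can be sketched in a few lines. The helper names below are hypothetical and the real edges array holds more than a count; this only shows how a boolean can ride along on the sign of a positive integer.

```julia
# Hypothetical helpers, not the serialization code itself.
# A non-negative count means "fully covers"; a negative one means it does not.
encode_edge_count(n::Integer, fully_covers::Bool) = fully_covers ? Int(n) : -Int(n)

# Recover both pieces of information from the stored integer.
decode_edge_count(x::Integer) = (count = abs(x), fully_covers = x >= 0)
```

A limitation of this sketch: a count of zero cannot carry the flag, since `-0 == 0`, so the scheme assumes at least one method match whenever `fully_covers` is false.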
* fix null comparisons for non-standard address spaces (#58837) Co-authored-by: Jameson Nash <[email protected]> * debuginfo: Memoize object symbol lookup (#58851) Supersedes https://github.com/JuliaLang/julia/pull/58355. Resolves https://github.com/JuliaLang/julia/issues/58326. On this PR: ```julia julia> @btime lgamma(2.0) ┌ Warning: `lgamma(x::Real)` is deprecated, use `(logabsgamma(x))[1]` instead. │ caller = var"##core#283"() at execution.jl:598 └ @ Core ~/.julia/packages/BenchmarkTools/1i1mY/src/execution.jl:598 47.730 μs (105 allocations: 13.24 KiB) ``` On `nightly`: ```julia julia> @btime lgamma(2.0) ┌ Warning: `lgamma(x::Real)` is deprecated, use `(logabsgamma(x))[1]` instead. │ caller = var"##core#283"() at execution.jl:598 └ @ Core ~/.julia/packages/BenchmarkTools/1i1mY/src/execution.jl:598 26.856 ms (89 allocations: 11.32 KiB) ``` * bpart: Skip inserting image backedges while we're generating a pkgimage (#58843) Should speed up deeply nested precompiles by skipping unnecessary work here. PR is against #58830 to avoid conflicts, but semantically independent. * Re-add old function name for backward compatibility in init (#58860) While julia has no C-API backwards compatibility guarantees this is simple enough to add. Fixes #58859 * trimming: Add `_uv_hook_close` support (#58871) Resolves https://github.com/JuliaLang/julia/issues/58862. Since this hook is called internally by the runtime, `--trim` was not aware of the callee edge required here. * Don't `@inbounds` AbstractArray's iterate method; optimize `checkbounds` instead (#58793) Split off from #58785, this simplifies `iterate` and removes the `@inbounds` call that was added in https://github.com/JuliaLang/julia/pull/58635. It achieves the same (or better!) performance, however, by targeting optimizations in `checkbounds` and — in particular — the construction of a linear `eachindex` (against which the bounds are checked). 
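As a rough illustration of the `iterate` entry above (#58793), the following sketch walks `eachindex` and indexes normally, leaving bounds checking to `getindex`/`checkbounds` rather than wrapping the access in `@inbounds`. The `Squares` type and `my_iterate` are hypothetical names for this example and are not the code in Base.

```julia
# Toy read-only vector whose i-th element is i^2 (hypothetical example type).
struct Squares <: AbstractVector{Int}
    n::Int
end
Base.size(s::Squares) = (s.n,)
Base.IndexStyle(::Type{Squares}) = IndexLinear()
function Base.getindex(s::Squares, i::Int)
    @boundscheck checkbounds(s, i)   # the check the compiler is expected to optimize
    return i * i
end

# Iterate by stepping through eachindex(A); the state carries the index
# iterator plus its own iteration state. No @inbounds is used.
function my_iterate(A::AbstractArray, state = (eachindex(A),))
    y = iterate(state...)
    y === nothing && return nothing
    return A[y[1]], (state[1], Base.tail(y)...)
end
```

The design point is that correctness comes from indexing against `eachindex(A)`, while performance depends on `checkbounds` against that same linear index range being cheap to prove redundant.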
--------- Co-authored-by: Mosè Giordano <[email protected]> * aotcompile: Fix early-exit if CI not found for `cfunction` (#58722) As written, this was accidentally skipping all the subsequent `cfuncs` that need adapters. * zero-index get/setindex(::ReinterpretArray) require a length of 1 (#58814) fix https://github.com/JuliaLang/julia/issues/58232 o3 helped me understand the existing implementations but code is mine --------- Co-authored-by: Matt Bauman <[email protected]> * Add `Base.isprecompilable` (#58805) Alternative to https://github.com/JuliaLang/julia/pull/58146. We want to compile a subset of the possible specializations of a function. To this end, we have a number of manually written `precompile` statements. Creating this list is, unfortunately, error-prone, and the list is also liable to going stale. Thus we'd like to validate each `precompile` statement in the list. The simple answer is, of course, to actually run the `precompile`s, and we naturally do so, but this takes time. We would like a relatively quick way to check the validity of a `precompile` statement. This is a dev-loop optimization, to allow us to check "is-precompilable" in unit tests. 
We can't use `hasmethod` as it has both false positives (too loose): ```julia julia> hasmethod(sum, (AbstractVector,)) true julia> precompile(sum, (AbstractVector,)) false julia> Base.isprecompilable(sum, (AbstractVector,)) # <- this PR false ``` and also false negatives (too strict): ```julia julia> bar(@nospecialize(x::AbstractVector{Int})) = 42 bar (generic function with 1 method) julia> hasmethod(bar, (AbstractVector,)) false julia> precompile(bar, (AbstractVector,)) true julia> Base.isprecompilable(bar, (AbstractVector,)) # <- this PR true ``` We can't use `hasmethod && isconcretetype` as it has false negatives (too strict): ```julia julia> has_concrete_method(f, argtypes) = all(isconcretetype, argtypes) && hasmethod(f, argtypes) has_concrete_method (generic function with 1 method) julia> has_concrete_method(bar, (AbstractVector,)) false julia> has_concrete_method(convert, (Type{Int}, Int32)) false julia> precompile(convert, (Type{Int}, Int32)) true julia> Base.isprecompilable(convert, (Type{Int}, Int32)) # <- this PR true ``` `Base.isprecompilable` is essentially `precompile` without the actual compilation. * Add a `similar` method for `Type{<:CodeUnits}` (#57826) Currently, `similar(::CodeUnits)` works as expected by going through the generic `AbstractArray` method. However, the fallback method hit by `similar(::Type{<:CodeUnits}, dims)` does not work, as it assumes the existence of a constructor that accepts an `UndefInitializer`. This can be made to work by defining a corresponding `similar` method that returns an `Array`. One could make a case that this is a bugfix since it was arguably a bug that this method didn't work given that `CodeUnits` is an `AbstractArray` subtype and the other `similar` methods work. If anybody buys that argument, it could be nice to backport this; it came up in some internal code that uses Arrow.jl and JSON3.jl together. 
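A small sketch of the two `similar` call paths discussed in the `CodeUnits` entry above. The instance form goes through the generic `AbstractArray` method; the type form is the one the PR addresses, so the sketch wraps it in `try`/`catch` and makes no claim about which outcome a given Julia version produces.

```julia
cu = codeunits("hello")

# Instance form: handled by the generic AbstractArray fallback.
buf = similar(cu)

# Type form: per the entry above, this fails on versions without the new
# method and is meant to return an `Array` with it. Capture either outcome.
T = typeof(cu)
result = try
    similar(T, (4,))
catch err
    err
end
```

Here `buf` is a `Vector{UInt8}` with the same length as `cu`, and `result` is either an array or the captured exception, depending on whether the method from #57826 is available.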
* 🤖 [master] Bump the Pkg stdlib from e3d456127 to 109eaea66 (#58858) Stdlib: Pkg URL: https://github.com/JuliaLang/Pkg.jl.git Stdlib branch: master Julia branch: master Old commit: e3d456127 New commit: 109eaea66 Julia version: 1.13.0-DEV Pkg version: 1.13.0 Bump invoked by: @KristofferC Powered by: [BumpStdlibs.jl](https://github.com/JuliaLang/BumpStdlibs.jl) Diff: https://github.com/JuliaLang/Pkg.jl/compare/e3d4561272fc029e9a5f940fe101ba4570fa875d...109eaea66a0adb0ad8fa497e64913eadc2248ad1 ``` $ git log --oneline e3d456127..109eaea66 109eaea66 Various app improvements (#4278) 25c2390ed feat(apps): Add support for multiple apps per package via submodules (#4277) c78b40b35 copy the app project instead of wrapping it (#4276) d2e61025b Fix leading whitespace in REPL commands with comma-separated packages (#4274) e02bcabd7 Registry: Properly pass down `depot` (#4268) e9a055240 fix what project file to look at when package without path but with a subdir is devved by name (#4271) 8b1f0b9ff prompt for confirmation before removing compat entry (#4254) eefbef649 feat(errors): Improve error message for incorrect package UUID (#4270) 4d1c6b0a3 explain no reg installed when no reg installed (#4261) ``` Co-authored-by: KristofferC <[email protected]> * fix a few tiny JET linter issues (#58869) * Fix data race in jl_new_module__ (#58880) Use an atomic fetch and add to fix a data race in `Module()` identified by tsan: ``` ./usr/bin/julia -t4,0 --gcthreads=1 -e 'Threads.@threads for i=1:100 Module() end' ================== WARNING: ThreadSanitizer: data race (pid=5575) Write of size 4 at 0xffff9bf9bd28 by thread T9: #0 jl_new_module__ /home/user/c/julia/src/module.c:487:22 (libjulia-internal.so.1.13+0x897d4) #1 jl_new_module_ /home/user/c/julia/src/module.c:527:22 (libjulia-internal.so.1.13+0x897d4) #2 jl_f_new_module /home/user/c/julia/src/module.c:649:22 (libjulia-internal.so.1.13+0x8a968) #3 <null> <null> (0xffff76a21164) #4 <null> <null> (0xffff76a1f074) #5 <null> <null> 
(0xffff76a1f0c4) #6 _jl_invoke /home/user/c/julia/src/gf.c (libjulia-internal.so.1.13+0x5ea04) #7 ijl_apply_generic /home/user/c/julia/src/gf.c:3892:12 (libjulia-internal.so.1.13+0x5ea04) #8 jl_apply /home/user/c/julia/src/julia.h:2343:12 (libjulia-internal.so.1.13+0x9e4c4) #9 start_task /home/user/c/julia/src/task.c:1249:19 (libjulia-internal.so.1.13+0x9e4c4) Previous write of size 4 at 0xffff9bf9bd28 by thread T10: #0 jl_new_module__ /home/user/c/julia/src/module.c:487:22 (libjulia-internal.so.1.13+0x897d4) #1 jl_new_module_ /home/user/c/julia/src/module.c:527:22 (libjulia-internal.so.1.13+0x897d4) #2 jl_f_new_module /home/user/c/julia/src/module.c:649:22 (libjulia-internal.so.1.13+0x8a968) #3 <null> <null> (0xffff76a21164) #4 <null> <null> (0xffff76a1f074) #5 <null> <null> (0xffff76a1f0c4) #6 _jl_invoke /home/user/c/julia/src/gf.c (libjulia-internal.so.1.13+0x5ea04) #7 ijl_apply_generic /home/user/c/julia/src/gf.c:3892:12 (libjulia-internal.so.1.13+0x5ea04) #8 jl_apply /home/user/c/julia/src/julia.h:2343:12 (libjulia-internal.so.1.13+0x9e4c4) #9 start_task /home/user/c/julia/src/task.c:1249:19 (libjulia-internal.so.1.13+0x9e4c4) Location is global 'jl_new_module__.mcounter' of size 4 at 0xffff9bf9bd28 (libjulia-internal.so.1.13+0x3dbd28) ``` * fix trailing indices stackoverflow in reinterpreted array (#58293) would fix https://github.com/JuliaLang/julia/issues/57170, fix https://github.com/JuliaLang/julia/issues/54623 @nanosoldier `runbenchmarks("array", vs=":master")` * Add missing module qualifier (#58877) A very simple fix addressing the following bug: ```julia Validation: Error During Test at REPL[61]:1 Got exception outside of a @test #=ERROR showing exception stack=# UndefVarError: `get_ci_mi` not defined in `Base.StackTraces` Suggestion: check for spelling errors or missing imports. Hint: a global variable of this name also exists in Base. - Also declared public in Compiler (loaded but not imported in Main). 
Stacktrace: [1] show_custom_spec_sig(io::IOContext{IOBuffer}, owner::Any, linfo::Core.CodeInstance, frame::Base.StackTraces.StackFrame) @ Base.StackTraces ./stacktraces.jl:293 [2] show_spec_linfo(io::IOContext{IOBuffer}, frame::Base.StackTraces.StackFrame) @ Base.StackTraces ./stacktraces.jl:278 [3] print_stackframe(io::IOContext{IOBuffer}, i::Int64, frame::Base.StackTraces.StackFrame, n::Int64, ndigits_max::Int64, modulecolor::Symbol; prefix::Nothing) @ Base ./errorshow.jl:786 ``` AFAIK this occurs when printing a stacktrace from a `CodeInstance` that has a non-default owner. * OpenSSL: Update to 3.5.1 (#58876) Update the stdlib OpenSSL to 3.5.1. This is a candidate for backporting to Julia 1.12 if there is another beta release. * `setindex!(::ReinterpretArray, v)` needs to convert before reinterpreting (#58867) Found in https://github.com/JuliaLang/julia/pull/58814#discussion_r2169155093. Previously, in a very limited situation (a zero-dimensional reinterpret that reinterprets between primitive types that was setindex!'ed with zero indices), we omitted the `convert`. I believe this was an unintentional oversight, and hopefully nobody is depending on this behavior. * Support `debuginfo` context option in IRShow for `IRCode`/`IncrementalCompact` (#58642) This allows us to get complete source information during printing for `IRCode` and `IncrementalCompact`, same as we do by default with `CodeInfo`. The user previously had to do: ```julia Compiler.IRShow.show_ir(stdout, ir, Compiler.IRShow.default_config(ir; verbose_linetable=true)) ``` and now, they only need to do: ```julia show(IOContext(stdout, :debuginfo => :source), ir) ``` * Add offset in `hvncat` dimension calculation to fix issue with 0-length elements in first dimension (#58881) * fix `setindex(::ReinterpretArray,...)` for zero-d arrays (#58868) by copying the way getindex works. 
Found in https://github.com/JuliaLang/julia/pull/58814#discussion_r2178243259 --------- Co-authored-by: Andy Dienes <[email protected]> * add back `to_power_type` to `deprecated.jl` since some packages call it (#58886) Co-authored-by: KristofferC <[email protected]> * Pkg: Allow configuring can_fancyprint(io::IO) using IOContext (#58887) * Make `Base.donotdelete` public (#55774) I rely on `Base.donotdelete` in [Chairmarks.jl](https://chairmarks.lilithhafner.com) and I'd like it to be public. I imagine that other benchmarking tools also rely on it. It's been around since 1.8 (see also: #55773) and I think we should commit to keeping it functional for the rest of 1.x. * Add link to video in profiling manual (#58896) * Stop documenting that `permute!` is "in-place"; it isn't and never has been non-allocating (#58902) * faster iteration over a `Flatten` of heterogenous iterators (#58522) seems to help in many cases. would fix the precise MWE given in https://github.com/JuliaLang/julia/issues/52552, but does not necessarily fix comprehensively all perf issues of all heterogenous flattens. 
but, may as well be better when it's possible setup: ``` julia> using BenchmarkTools julia> A = rand(Int, 100000); B = 1:100000; julia> function g(it) s = 0 for i in it s += i end s end ``` before: ``` julia> @btime g($(Iterators.flatten((A, B)))) 12.461 ms (698979 allocations: 18.29 MiB) julia> @btime g($(Iterators.flatten(i for i in (A, B)))) 12.393 ms (698979 allocations: 18.29 MiB) julia> @btime g($(Iterators.flatten([A, B]))) 15.115 ms (999494 allocations: 25.93 MiB) julia> @btime g($(Iterators.flatten((A, Iterators.flatten((A, B)))))) 82.585 ms (2997964 allocations: 106.78 MiB) ``` after: ``` julia> @btime g($(Iterators.flatten((A, B)))) 135.958 μs (2 allocations: 64 bytes) julia> @btime g($(Iterators.flatten(i for i in (A, B)))) 149.500 μs (2 allocations: 64 bytes) julia> @btime g($(Iterators.flatten([A, B]))) 17.130 ms (999498 allocations: 25.93 MiB) julia> @btime g($(Iterators.flatten((A, Iterators.flatten((A, B)))))) 13.716 ms (398983 allocations: 10.67 MiB) ``` * Make `hypot` docs example more type stable (#58918) * Markdown: Make `Table`/`LaTeX` objects subtypes of `MarkdownElement` (#58916) These objects satisfy the requirements of the `MarkdownElement` interface (such as implementing `Markdown.plain`), so they should be subtypes of `MarkdownElement`. This is convenient when defining functions for `MarkdownElement` in other packages. 
* Support "Functor-like" `code_typed` invocation (#57911) This lets you easily inspect IR associated with "Functor-like" methods: ```julia julia> (f::Foo)(offset::Float64) = f.x + f.y + offset julia> code_typed((Foo, Float64)) 1-element Vector{Any}: CodeInfo( 1 ─ %1 = Base.getfield(f, :x)::Int64 │ %2 = Base.getfield(f, :y)::Int64 │ %3 = Base.add_int(%1, %2)::Int64 │ %4 = Base.sitofp(Float64, %3)::Float64 │ %5 = Base.add_float(%4, offset)::Float64 └── return %5 ) => Float64 ``` This is just a small convenience over `code_typed_by_type`, but I'm in support of it (even though it technically changes the meaning of, e.g., `code_typed((1, 2))` which without this PR inspects `(::Tuple{Int,Int})(::Vararg{Any})` We should probably update all of our reflection machinery (`code_llvm`, `code_lowered`, `methodinstance`, etc.) to support this "non-arg0" style as well, but I wanted to open this first to make sure folks like it. * IRShow: Print arg0 type when necessary to disambiguate `invoke` (#58893) When invoking any "functor-like", such as a closure: ```julia bar(x) = @noinline ((y)->x+y)(x) ``` our IR printing was not showing the arg0 invoked, even when it is required to determine which MethodInstance this is invoking. 
Before: ```julia julia> @code_typed optimize=true bar(1) CodeInfo( 1 ─ %1 = %new(var"#bar##2#bar##3"{Int64}, x)::var"#bar##2#bar##3"{Int64} │ %2 = invoke %1(x::Int64)::Int64 └── return %2 ) => Int64 ``` After: ```julia julia> @code_typed optimize=true bar(1) CodeInfo( 1 ─ %1 = %new(var"#bar##2#bar##3"{Int64}, x)::var"#bar##2#bar##3"{Int64} │ %2 = invoke (%1::var"#bar##2#bar##3"{Int64})(x::Int64)::Int64 └── return %2 ) => Int64 ``` * Support "functors" for code reflection utilities (#58891) As a follow-up to https://github.com/JuliaLang/julia/pull/57911, this updates: - `Base.method_instance` - `Base.method_instances` - `Base.code_ircode` - `Base.code_lowered` - `InteractiveUtils.code_llvm` - `InteractiveUtils.code_native` - `InteractiveUtils.code_warntype` to support "functor" invocations. e.g. `code_llvm((Foo, Int, Int))` which corresponds to `(::Foo)(::Int, ::Int)` * Prevent data races in invalidate_code_for_globalref! * Fix type instability in invalidate_code_for_globalref! * Add the fact that functions ending with `!` may allocate to the FAQ (#58904) I've run into this question several times, that might count as "frequently asked". * Economy mode REPL: run the event loop with jl_uv_flush (#58926) `ios_flush` won't wait for the `jl_static_show` from the previous evaluation to complete, resulting in the output being interleaved with subsequent REPL outputs. Anything that produces a lot of output will trigger it, like `Core.GlobalMethods.defs`. * Fix grammar, typos, and formatting issues in docstrings (#58944) Co-authored-by: Claude <[email protected]> * Fix nthreadpools size in JLOptions (#58937) * NFC: Remove duplicate `julia-src-%` dependency in makefile (#58947) * Improve error message for missing dependencies in packages (#58878) * Make current_terminfo a OncePerProcess (#58854) There seems to be no reason to always load this unconditionally - especially since it's in the critical startup path. 
If we never print colored output or our IO is not a TTY, we don't need to load this at all. While we're at it, remove the `term_type` argument to `ttyhascolor`, which didn't work as advertised anyway, since it still looked at the current_terminfo. If clients want to do a full TermInfo check, they can do that explicitly. (Written by Claude Code) * chore: remove redundant words in comment (#58955) * add a precompile workload to TOML (#58949) * 🤖 [master] Bump the NetworkOptions stdlib from c090626 to 532992f (#58882) Co-authored-by: DilumAluthge <[email protected]> * remove excessive code from trim script (#58853) Co-authored-by: gbaraldi <[email protected]> * Add juliac Artifacts.toml in Makefile (#58936) * staticdata: Don't discard inlineable code that inference may need (#58842) See https://github.com/JuliaLang/julia/issues/58841#issuecomment-3014833096. We were accidentally discarding inferred code during staticdata preparation that we would need immediately afterwards to satisfy inlining requests during code generation for the system image. This was resulting in spurious extra compilation at the first inference after sysimage reload. Additionally it was likely causing various unnecessary dispatch slow paths in the generated inference code. Fixes #58841. * clear up `isdone` docstring (#58958) I got pretty confused on my first reading of this docstring because for some reason I thought it was saying that `isdone(itr, state) == missing` implied that it was true that `iterate(itr, state) === nothing` (aka that `state` is indeed final). which of course is wrong and doesn't make sense, but it's still how I read it. I think the new docstring is a bit more explicit. * shield `_artifact_str` function behind a world age barrier (#58957) We already do this for `require` in Base loading, it probably makes sense to do this here as well, as invalidating this function easily adds +1s in load time for a jll. 
Avoids the big load time penalty from loading IntelOpenMP_jll in https://github.com/JuliaLang/julia/issues/57436#issuecomment-3052258775. Before: ``` julia> @time using ModelingToolkit 6.546844 seconds (16.09 M allocations: 938.530 MiB, 11.13% gc time, 16.35% compilation time: 12% of which was recompilation) ``` After: ``` julia> @time using ModelingToolkit 5.637914 seconds (8.26 M allocations: 533.694 MiB, 11.47% gc time, 3.11% compilation time: 17% of which was recompilation) ``` --------- Co-authored-by: KristofferC <[email protected]> Co-authored-by: Cody Tapscott <[email protected]> * doc: Fix grammar, typos, and formatting issues across documentation (#58932) Co-authored-by: Claude <[email protected]> * Replace Base.Workqueues with a OncePerThread (#58941) Simplify `workqueue_for`. While not strictly necessary, the acquire load in `getindex(once::OncePerThread{T,F}, tid::Integer)` makes ThreadSanitizer happy. With the existing implementation, we get false positives whenever a thread other than the one that originally allocated the array reads it: ``` ================== WARNING: ThreadSanitizer: data race (pid=6819) Atomic read of size 8 at 0xffff86bec058 by main thread: #0 getproperty Base_compiler.jl:57 (sys.so+0x113b478) #1 julia_pushNOT._1925 task.jl:868 (sys.so+0x113b478) #2 julia_enq_work_1896 task.jl:969 (sys.so+0x5cd218) #3 schedule task.jl:983 (sys.so+0x892294) #4 macro expansion threadingconstructs.jl:522 (sys.so+0x892294) #5 julia_start_profile_listener_60681 Base.jl:355 (sys.so+0x892294) #6 julia___init___60641 Base.jl:392 (sys.so+0x1178dc) #7 jfptr___init___60642 <null> (sys.so+0x118134) #8 _jl_invoke /home/user/c/julia/src/gf.c (libjulia-internal.so.1.13+0x5e9a4) #9 ijl_apply_generic /home/user/c/julia/src/gf.c:3892:12 (libjulia-internal.so.1.13+0x5e9a4) #10 jl_apply /home/user/c/julia/src/julia.h:2343:12 (libjulia-internal.so.1.13+0xbba74) #11 jl_module_run_initializer /home/user/c/julia/src/toplevel.c:68:13 (libjulia-internal.so.1.13+0xbba74) 
#12 _finish_jl_init_ /home/user/c/julia/src/init.c:632:13 (libjulia-internal.so.1.13+0x9c0fc) #13 ijl_init_ /home/user/c/julia/src/init.c:783:5 (libjulia-internal.so.1.13+0x9bcf4) #14 jl_repl_entrypoint /home/user/c/julia/src/jlapi.c:1125:5 (libjulia-internal.so.1.13+0xf7ec8) #15 jl_load_repl /home/user/c/julia/cli/loader_lib.c:601:12 (libjulia.so.1.13+0x11934) #16 main /home/user/c/julia/cli/loader_exe.c:58:15 (julia+0x10dc20) Previous write of size 8 at 0xffff86bec058 by thread T2: #0 IntrusiveLinkedListSynchronized task.jl:863 (sys.so+0x78d220) #1 macro expansion task.jl:932 (sys.so+0x78d220) #2 macro expansion lock.jl:376 (sys.so+0x78d220) #3 julia_workqueue_for_1933 task.jl:924 (sys.so+0x78d220) #4 julia_wait_2048 task.jl:1204 (sys.so+0x6255ac) #5 julia_task_done_hook_49205 task.jl:839 (sys.so+0x128fdc0) #6 jfptr_task_done_hook_49206 <null> (sys.so+0x902218) #7 _jl_invoke /home/user/c/julia/src/gf.c (libjulia-internal.so.1.13+0x5e9a4) #8 ijl_apply_generic /home/user/c/julia/src/gf.c:3892:12 (libjulia-internal.so.1.13+0x5e9a4) #9 jl_apply /home/user/c/julia/src/julia.h:2343:12 (libjulia-internal.so.1.13+0x9c79c) #10 jl_finish_task /home/user/c/julia/src/task.c:345:13 (libjulia-internal.so.1.13+0x9c79c) #11 jl_threadfun /home/user/c/julia/src/scheduler.c:122:5 (libjulia-internal.so.1.13+0xe7db8) Thread T2 (tid=6824, running) created by main thread at: #0 pthread_create <null> (julia+0x85f88) #1 uv_thread_create_ex /workspace/srcdir/libuv/src/unix/thread.c:172 (libjulia-internal.so.1.13+0x1a8d70) #2 _finish_jl_init_ /home/user/c/julia/src/init.c:618:5 (libjulia-internal.so.1.13+0x9c010) #3 ijl_init_ /home/user/c/julia/src/init.c:783:5 (libjulia-internal.so.1.13+0x9bcf4) #4 jl_repl_entrypoint /home/user/c/julia/src/jlapi.c:1125:5 (libjulia-internal.so.1.13+0xf7ec8) #5 jl_load_repl /home/user/c/julia/cli/loader_lib.c:601:12 (libjulia.so.1.13+0x11934) #6 main /home/user/c/julia/cli/loader_exe.c:58:15 (julia+0x10dc20) SUMMARY: ThreadSanitizer: data race 
Base_compiler.jl:57 in getproperty
==================
```

* Fix `hygienic-scope`s in inner macro expansions (#58965)

Changes from https://github.com/JuliaLang/julia/pull/43151; GitHub just didn't want me to re-open it.

As discussed on Slack, any `hygienic-scope` within an outer `hygienic-scope` can read and write variables in the outer one, so it's not particularly hygienic. The result is that we can't safely nest macro calls unless they know the contents of all inner macro calls.

Should fix #48910.

Co-authored-by: Michiel Dral

* remove comment from julia-syntax that is no longer true (#58964)

The code this referred to was removed by c6c3d72d1cbddb3d27e0df0e739bb27dd709a413

* expand memoryrefnew capabilities (#58768)

The goal here is twofold. First, this should let us simplify the boundscheck (not yet implemented); second, it should reduce Julia IR size a bit.

* Add news entry and update docstring for #58727 (#58973)

* Fix alignment of failed precompile jobs on CI (#58971)

* bpart: Tweak `isdefinedglobal` on backdated constant (#58976)

In d2cc06193ef4161e4ac161bd4b5b57a51686a89a and prior commits, we made backdated access a conditional error (if depwarns are enabled or in generators). However, we did not touch `isdefinedglobal`. This caused the common pattern `isdefinedglobal(m, s) && getglobal(m, s)` to sometimes error. In particular, this could be observed when attempting to print a type from inside a generated function before that type's definition age.

Additionally, I think the usage there, which used `invokelatest` on each of the two queries, is problematic because it is racy: the two `invokelatest` calls may be looking at different world ages.

This makes two tweaks:
1. Makes `isdefinedglobal` consistent with `getglobal` in that it now returns false if `getglobal` would throw due to the above referenced restriction.
2. Removes the implicit `invokelatest` in _isself in the show code. Instead, it will use the current world.
I considered having it use the exception age when used for MethodErrors. However, because this is used for printing, it matters more how the object can be accessed *now* rather than how it could have been accessed in the past.

* Fix precompilepkgs warn loaded setting (#58978)

* specify that `Iterators.rest` must be given a valid `state` (#58962)

~currently `Iterators.rest(1:2, 3)` creates an infinite loop. after this PR it would be an `ArgumentError`~

docs only now

* stdlib/Dates: Fix doctest regex to handle DateTime with 0 microseconds (#58981)

The `now(UTC)` doctest can fail when the DateTime has exactly 0 milliseconds, as the output format omits the fractional seconds entirely (e.g., "2023-01-04T10:52:24" instead of "2023-01-04T10:52:24.000").

Update the regex filter to make the milliseconds portion optional by using `(\\.\\d{3})?` instead of `\\.\\d{3}`.

Fixes CI failure: https://buildkite.com/julialang/julia-master/builds/49144#0197fd72-d1c6-44d6-9c59-5f548ab98f04

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Keno Fischer
Co-authored-by: Claude

* Fix unique for range wrappers with zero step (#51004)

The current implementation assumes that the vector indexing `r[begin:begin]` is specialized to return a range, which isn't the case by default. As a consequence,

```julia
julia> struct MyStepRangeLen{T,R} <: AbstractRange{T}
           x :: R
       end

julia> MyStepRangeLen(s::StepRangeLen{T}) where {T} = MyStepRangeLen{T,typeof(s)}(s)
MyStepRangeLen

julia> Base.first(s::MyStepRangeLen) = first(s.x)

julia> Base.last(s::MyStepRangeLen) = last(s.x)

julia> Base.length(s::MyStepRangeLen) = length(s.x)

julia> Base.step(s::MyStepRangeLen) = step(s.x)

julia> r = MyStepRangeLen(StepRangeLen(1,0,4))
1:0:1

julia> unique(r)
ERROR: MethodError: Cannot `convert` an object of type Vector{Int64} to an object of type MyStepRangeLen{Int64, Int64, StepRangeLen{Int64, Int64, Int64, Int64}}
[...]
```

This PR fixes this by constructing a `UnitRange` instead of using the indexing operation. After this, we obtain

```julia
julia> unique(r)
1:1:1
```

In principle, the `step` should be preserved, but `range(r[begin]::Int, step=step(r), length=length(r))` appears to error at present, as it tries to construct a `StepRange` instead of a `StepRangeLen`.

This fix isn't perfect, as it assumes that the conversion from a `UnitRange` _is_ defined, which is also not the case by default. For example, the following still won't work:

```julia
julia> struct MyRange <: AbstractRange{Int} end

julia> Base.first(x::MyRange) = 1

julia> Base.last(x::MyRange) = 1

julia> Base.length(x::MyRange) = 3

julia> Base.step(x::MyRange) = 0

julia> unique(MyRange())
ERROR: MethodError: no method matching MyRange(::UnitRange{Int64})
[...]
```

In fact, if the indexing `MyRange()[begin:begin]` has been specialized but the conversion from a `UnitRange` isn't, then this is actually a regression. I'm unsure if such pathological cases are common, though.

The reason the first example works is that the conversion for a range wrapper is defined implicitly if the parent type supports conversion from a `UnitRange`.

* Docs: add GC user docs (#58733)

Co-authored-by: Andy Dienes
Co-authored-by: Gabriel Baraldi
Co-authored-by: Diogo Netto

* 🤖 [master] Bump the Pkg stdlib from 109eaea66 to b85e29428 (#58991)

Co-authored-by: IanButterworth

* Add one-argument `argtypes` methods to source reflection functions (#58925)

Follow-up to https://github.com/JuliaLang/julia/pull/58891#issuecomment-3036419509, extending the feature to `which`, `functionloc`, `edit` and `less`.

* Test: Add compiler hint for `ts` variable definedness in `@testset for` (#58989)

Helps the new language server avoid reporting unused variable reports.
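For context on the `argtypes` entry (#58925) above, here is a minimal sketch of the long-standing two-argument form of `which`, which takes a function and a tuple of argument types. The one-argument forms added by that PR are deliberately not shown, since their exact signatures are not spelled out in this log.

```julia
# Existing reflection call: find the method that `sin(::Float64)` dispatches to.
# The second argument is a tuple of argument types.
m = which(sin, (Float64,))

println(m.name)    # name of the function the method belongs to
println(m.module)  # module in which the method was defined
```

The returned `Method` object is what the related tools (`functionloc`, `edit`, `less`) resolve to a source location.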
* trimming: explicitly add Libdl dep for test/trimming/basic_jll.jl (#58990)

* win/msys2: Automatically switch msys2 symlinks mode for LLVM (#58988)

As noted in https://github.com/JuliaLang/julia/issues/54981#issuecomment-2336444226, msys2 currently fails to untar an llvm source build. Fix that by setting the appropriate environment variable to switch the symlinks mode.

* Fix order of MSYS rules (#58999)

git-external changes the LLVM_SRC_DIR variable, so the target-specific variable applies to the wrong target if defined before it - didn't notice in local testing because I had accidentally switched the variable globally earlier for testing - but showed up on a fresh build.

* msys2: Recommend correct cmake package (#59001)

msys2 ships 2 different cmake packages, one built natively (with mingw prefix in the package name) and one built against the posix emulation environment. The posix emulation one does not work because it will detect unix-style paths, which it then writes into files that native tools process. Unlike during command invocation (where the msys2 runtime library does path translation), when paths are written to files, they are written verbatim.

The practical result of this is that e.g. the LLVM build will fail with a mysterious libz link failure (as e.g. reported in #54981).

This is our fault, because our build instructions tell the user to install the wrong one.

Fix all that by
1. Correcting the build instructions to install the correct cmake
2. Detecting if the wrong cmake is installed and advising the correct one
3. Fixing an issue where the native CMake did not like our CMAKE_C_COMPILER setting.

With all this, CMake runs correctly under msys2 with USE_BINARYBUILDER_LLVM=0.
* feat(REPL): Added `active_module` context to numbered REPL (#59000)

* optimize `length(::OrdinalRange)` for large bit-ints (#58864)

Split from #58793, this coalesces nearly all the branches in `length`, allowing it to inline and generally perform much better while retaining the exact same functionality.

---------

Co-authored-by: N5N3

* Fix LLVM TaskDispatcher implementation issues (#58950)

Fixes #58229 (LLVM JITLink stack overflow issue)

I tried submitting this promise/future implementation upstream (https://github.com/llvm/llvm-project/compare/main...vtjnash:llvm-project:jn/cowait-jit) so that I would not need to duplicate nearly as much code here to fix this bug, but upstream is currently opposed to fixing this bug and instead insists it is preferable for each downstream project to implement this fix themselves, adding extra maintenance burden for us for now. Sigh.

* Improve --trace-dispatch coverage: emit in "full-cache" fast path as well. (#59012)

This PR moves the `--trace-dispatch` logging inside `jl_lookup_generic_` from only the `cache miss case` to also logging it inside the `no method was found in the associative cache, check the full cache` case.

This PR logs the data from inside each of the two slow-path cases.

* MozillaCACerts: Update to 2025-07-15 (#59010)

* Fix use-after-free in FileWatching (#59017)

We observe an abort on Windows on Revise master CI, where a free'd handle is passed to jl_close_uv. The root cause is that uv_fseventscb_file called uvfinalize earlier, but did not set the handle to NULL, so when the actual finalizer ran later, it would see corrupted state.

* Roll up msys2/clang/windows build fixes (#59003)

This rolls up everything I had to change to get a successful source build of Julia under msys2. It's a misc collection of msys2, clang and other fixes.
With this, I can use the following Make.user:

```
USE_SYSTEM_CSL=1
USE_BINARYBUILDER_LLVM=0
CC=clang
CXX=clang++
FC=gfortran
```

The default USE_SYSTEM_CSL is broken due to #56840. With USE_SYSTEM_CSL=1, LLVM is broken due to #57021. Clang is required because gcc can't do an LLVM source build due to known export symbol size limits (ref JuliaPackaging/Yggdrasil#11652).

That said, if we address the ABI issues in #56840, the default Make.user should build again (with BB-provided LLVM).

* Fix tar command (#59026)

Scheduled build failing with

```
cd [buildroot]/deps/srccache/ && /usr/bin/tar --no-same-owner -xfz [buildroot]/deps/srccache/libunwind-1.8.2.tar.gz
/usr/bin/tar: z: Cannot open: No such file or directory
```

Issue probably introduced in https://github.com/JuliaLang/julia/pull/58796. According to ChatGPT, this will fix it.

* Add 'sysimage' keyword for `JULIA_CPU_TARGET` to match (or extend) the sysimage target (#58970)

* add `@__FUNCTION__` and `Expr(:thisfunction)` as generic function self-reference (#58940)

This PR adds `@__FUNCTION__` to match the naming conventions of existing reflection macros (`@__MODULE__`, `@__FILE__`, etc.).
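As a point of comparison for the naming convention mentioned above, the following is a minimal sketch of the reflection macros that already exist. The module `Demo` and the function `whereami` are hypothetical names used only for illustration. `@__FUNCTION__` itself is not used here, since it requires a build that includes #58940.

```julia
module Demo

# Each macro expands to a value describing the place where it is written.
function whereami()
    m = @__MODULE__   # the enclosing module
    f = @__FILE__     # path of the source file (may be empty outside a file)
    l = @__LINE__     # line number of this macro call
    return (mod = m, file = f, line = l)
end

end # module Demo

info = Demo.whereami()
println(info.mod)
println(info.line)
```

All three are resolved at expansion time, which is why they report the definition site of `whereami` rather than the caller's location.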
---------

Co-authored-by: Jeff Bezanson

* Bugfix: Use Base.aligned_sizeof instead of sizeof in Mmap.mmap (#58998)

fix #58982

* Fix PR reference in NEWS (#59046)

* 🤖 [master] Bump the LibCURL stdlib from a65b64f to 038790a (#59038)

Co-authored-by: IanButterworth

* 🤖 [master] Bump the DelimitedFiles stdlib from db79c84 to a982d5c (#59036)

Co-authored-by: IanButterworth

* 🤖 [master] Bump the SHA stdlib from 4451e13 to 169a336 (#59041)

Co-authored-by: IanButterworth

* 🤖 [master] Bump the Pkg stdlib from b85e29428 to 38d2b366a (#59040)

Co-authored-by: IanButterworth

* 🤖 [master] Bump the Statistics stdlib from 77bd570 to 22dee82 (#59043)

Co-authored-by: IanButterworth

* Expand JULIA_CPU_TARGET docs (#58968)

* 🤖 [master] Bump the LinearAlgebra stdlib from 3e4d569 to 2c3fe9b (#59039)

Co-authored-by: IanButterworth
Co-authored-by: Ian Butterworth

* 🤖 [master] Bump the SparseArrays stdlib from 6d072a8 to 30201ab (#59042)

* 🤖 [master] Bump the JuliaSyntaxHighlighting stdlib from f803fb0 to b666d3c (#59037)

* stored method interference graph (#58948)

Store full method interference relationship graph in interferences field of Method to avoid expensive morespecific calls during dispatch. This provides significant performance improvements:

- Replace method comparisons with precomputed interference lookup.
- Optimize ml_matches minmax computation using interference lookups.
- Optimize sort_mlmatches for large return sets by iterating over interferences instead of all matching methods.
- Add method_morespecific_via_interferences in both C and Julia.

This representation may exclude some edges that are implied by transitivity since sort_mlmatches will ensure the correct result by following strong edges. Ambiguous edges are guaranteed to be checkable without recursion.
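To make the terms "more specific" and "ambiguous" concrete, here is a small sketch at the user level. It does not touch the `interferences` field or any internal function from this change; the functions `kind` and `pair` are hypothetical and exist only for this illustration.

```julia
# Two methods where one is strictly more specific than the other.
kind(x::Integer) = "integer"
kind(x::Int) = "int"            # more specific than the Integer method

# Two methods where neither is more specific for a call with (Int, Int).
pair(x::Int, y::Any) = "first"
pair(x::Any, y::Int) = "second"

println(kind(1))   # dispatch picks the Int method

# Calling pair(1, 1) matches both methods equally well, so dispatch raises
# a MethodError reporting the ambiguity.
ambiguous = try
    pair(1, 1)
    false
catch err
    err isa MethodError
end
println(ambiguous)
```

Relationships like these between pairs of methods are what dispatch has to resolve when sorting matches.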
Also fix a variety of bugs along the way:

- Builtins signature would cause them to try to discard all other methods during `sort_mlmatches`.
- Some ambiguities were over-estimated, which now are improved upon.
- Setting lim==-1 now gives the same limited list of methods as lim>0, since that is actually faster now than attempting to give the unsorted list. This provides a better fix to #53814 than #57837 and fixes #58766.
- Reverts recent METHOD_SIG_LATEST_HAS_NOTMORESPECIFIC attempt (though not the whole commit), since I found a significant problem with any usage of that bit during testing: it only tracks methods that intersect with a target, but new methods do not necessarily intersect with any existing target.

This provides a decent performance improvement to `methods` calls, which implies a decent speed up to package loading also (e.g. ModelingToolkit loads in about 4 seconds instead of 5 seconds).

* build/llvm: Remove bash-specific curly expansion (#59058)

Fixes #59050

* build: More msys2 fixes (#59028)

* remove a testset from MMAP that might cause CI to now fail on Windows (#59062)

* Use a dedicated parameter attribute to identify the gstack arg. (#59059)

Otherwise, on systems without SwiftCC support (i.e. RISC-V) `getPGCstack` may return null, disabling the final GC pass.

* skip unnecessary alias-check in `collect(::AbstractArray)` from `copyto!` (#55748)

As discussed on Slack with @MasonProtter & @jakobnissen, `collect` currently does a usually cheap - but sometimes expensive - aliasing check (via `unalias`->`mightalias`->`dataid` -> `objectid`) before copying contents over; this check is unnecessary, however, since the destination array is newly created and cannot possibly alias the input.

This PR fixes that by swapping from `copyto!` to `copyto_unaliased!` in the `_collect_indices` implementations where the swap is straightforward (e.g., it is not so straightforward for the fallback `_collect_indices(indsA, A)`, so I skipped it there).
This improves the following example substantially:

```jl
struct GarbageVector{N} <: AbstractVector{Int}
    v :: Vector{Int}
    garbage :: NTuple{N, Int}
end
GarbageVector{N}(v::Vector{Int}) where N = GarbageVector{N}(v, ntuple(identity, Val(N)))
Base.getindex(gv::GarbageVector, i::Int) = gv.v[i]
Base.size(gv::GarbageVector) = size(gv.v)

using BenchmarkTools
v = rand(Int, 10)
gv = GarbageVector{100}(v)
@btime collect($v);  # 30 ns (v1.10.4) -> 30 ns (PR)
@btime collect($gv); # 179 ns (v1.10.4) -> 30 ns (PR)
```

Relatedly, it seems the fact that `mightalias` is comparing immutable contents as well - and hence slowing down the `unalias` check for the above `GarbageVector` via a slow `objectid` on tuples - is suboptimal. I don't know how to fix that though, so I'd like to leave that outside this PR. (Probably related to https://github.com/JuliaLang/julia/pull/26237)

Co-authored-by: Matt Bauman

* Fix and update Revise manifest (#59077)

* 🤖 [master] Bump the Pkg stdlib from 38d2b366a to 542ca0caf (#59083)

Co-authored-by: IanButterworth

* Do not needlessly disable CPU features. (#59080)

On QEMU's RISC-V cpu, LLVM's `getHostCPUFeatures` reports:

```
+zksed,+zkne,+zksh,+zfh,+zfhmin,+zacas,+v,+f,+c,+zvknha,+a,+zfa,+ztso,+zicond,+zihintntl,+zvbb,+zvksh,+zvkg,+zbkb,+zvkned,+zvbc,+zbb,+zvfhmin,+zbkc,+d,+i,+zknh,+zicboz,+zbs,+zvksed,+zbc,+zba,+zvknhb,+zknd,+zvkt,+zbkx,+zkt,+zvfh,+zvkb,+m
```

We change that to:

```
+zksed,+zkne,+zksh,+zfh,+zfhmin,+zacas,+v,+f,+c,+zvknha,+a,+zfa,+ztso,+zicond,+zihintntl,+zvbb,+zvksh,+zvkg,+zbkb,+zvkned,+zvbc,+zbb,+zvfhmin,+zbkc,+d,+i,+zknh,+zicboz,+zbs,+zvksed,+zbc,+zba,+zvknhb,+zknd,+zvkt,+zbkx,+zkt,+zvfh,+zvkb,+m,-zcmop,-zca,-zcd,-zcb,-zve64d,-zve64x,-zve64f,-zawrs,-zve32x,-zimop,-zihintpause,-zcf,-zve32f
```

i.e. we add `-zcmop,-zca,-zcd,-zcb,-zve64d,-zve64x,-zve64f,-zawrs,-zve32x,-zimop,-zihintpause,-zcf,-zve32f`, disabling stuff `zve*` after first enabling `v` (which includes `zvl*b`).
That's not valid:

```
LLVM ERROR: 'zvl*b' requires 'v' or 'zve*' extension to also be specified
```

... so disable this post-processing of LLVM feature sets and trust what it spits out. AFAICT this only matters for the fallback path of `processor.cpp`, so shouldn't impact most users.

* build: Also pass -fno-strict-aliasing for C++ (#59066)

As diagnosed by Andrew Pinski (https://github.com/JuliaLang/julia/issues/58466#issuecomment-3105141193), we are not respecting strict aliasing currently. We turn this off for C, but the flag appears to be missing for C++. Looks like it's been that way ever since that flag was first added to our build system (#484).

We should probably consider running TypeSanitizer over our code base to see if we can make our code correct under strict aliasing as compilers are increasingly taking advantage of it.

Fixes #58466

* Fix typo in `include`'s docstring (#59055)

* results.json: Fix repo paths so links to github work (#59090)

* Update RISC-V building docs. (#59088)

We have pre-built binaries for RISC-V now.

* Test: improve type stabilities (#59082)

Also simplifies code a bit, by removing unnecessary branches.

* LibCURL_jll: New version 8.15.0 (#59057)

Note that CURL 8.15.0 does not support using Secure Transport on macOS any more. This PR thus switches CURL to using OpenSSL on macOS.

---------

Co-authored-by: Mosè Giordano

* Switch RISC-V to large model on LLVM 20 (#57865)

Co-authored-by: Tim Besard

* Support complex numbers in eps (#21858)

This came up in https://github.com/JuliaMath/IterativeSolvers.jl/pull/113#issuecomment-301273365 . JuliaDiffEq and IterativeSolvers.jl have to make sure that the real-type is pulled out in order for `eps` to work:

```julia
eps(real(typeof(b)))
```

This detail can cause many generically written algorithms with tolerances, which would otherwise work with complex numbers, to error.
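A minimal sketch of the workaround described above, as it might appear in generic code. The helper name `tolerance` and the choice of `sqrt(eps(...))` as the tolerance are assumptions made for illustration; they are not taken from either package.

```julia
# Hypothetical helper: a tolerance that works for real and complex inputs.
# `real(typeof(b))` maps Complex{Float64} to Float64 and leaves Float64
# unchanged, so `eps` always receives a real floating-point type.
tolerance(b::Number) = sqrt(eps(real(typeof(b))))

println(tolerance(1.0))          # real input
println(tolerance(1.0 + 1.0im))  # complex input, same tolerance as Float64
```

Because the element type is stripped to its real part before calling `eps`, the same code path serves `Float32`, `Float64`, and their complex counterparts.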
This PR proposes to do just that trick, so that `eps(1.0 + 1.0im)` returns machine epsilon for a Float64 (and generally works for `AbstractFloat` of course).

---------

Co-authored-by: Steven G. Johnson

* 🤖 [master] Bump the Pkg stdlib from 542ca0caf to d94f8a1d9 (#59093)

Co-authored-by: IanButterworth

* add array element mutex offset in print and gc (#58997)

The layout, printing, and gc logic need to correctly offset and align the inset fields to account for the per-element mutex of an atomic array with large elements.

Fix #58993

* Fix typo in tests introduced by #21858 (#59102)

That [2017 PR](https://github.com/JuliaLang/julia/pull/21858) used very old types and had a semantic merge conflict.

* Fix msys symlink override rule (#59101)

The `export VAR=VAL` is syntax, so it can't be expanded.

Fixes #59096

* inference: Make test independent of the `Complex` method table (#59105)

* Add uptime to CI test info (#59107)

* Fix rounding when converting Rational to BigFloat (#59063)

* make ReinterpretArray more Offset-safe (#58898)

* remove extraneous function included in #21858 (#59109)

Removes an apparently extraneous function accidentally included in #21858, as noted in https://github.com/JuliaLang/julia/pull/21858/files#r2233250284.

* [REPL] Handle empty completion, keywords better (#59045)

When the context is empty (like "<TAB><TAB>"), return only names local to the module (fixes #58931).

If the cursor is on something that "looks like" an identifier, like a boolean or one of the keywords, treat it as if it was one for completion purposes. Typing a keyword and hitting tab no longer returns the completions for the empty input (fixes #58309, #58832).
* Add builtin function name to add methods error (#59112)

```
julia> Base.throw(x::Int) = 1
ERROR: cannot add methods to builtin function `throw`
Stacktrace:
 [1] top-level scope
   @ REPL[1]:1
```

* better error in juliac for defining main inside a new module (#59106)

This is more helpful if the script you try to compile defines a module containing main instead of defining it at …
No description provided.