Description
I think Julia is being overzealous in marking the first fmul as contract-able here:
julia> smat = (0.3759580090201759539070280879968777298927307128906250000000000000, -0.2925452398379911889136906211206223815679550170898437500000000000, -0.8792456187560546698733787707169540226459503173828125000000000000, -0.6857980393775208183271274720027577131986618041992187500000000000, 0.5502662724003037908460100879892706871032714843750000000000000000, -0.4763277008999405315314845665852772071957588195800781250000000000, 0.6231666106584448083793859041179530322551727294921875000000000000, 0.7820641355456769971965513832401484251022338867187500000000000000, 0.0062500602924510650915124188031768426299095153808593750000000000);
julia> svec = (0.5513258229231927654012679340667091310024261474609375000000000000, -0.8283034402416151742443162220297381281852722167968750000000000000, -0.0997659654489912867125767093057220336049795150756835937500000000)
(0.5513258229231928, -0.8283034402416152, -0.09976596544899129)
julia> f3(a, b) = muladd(a[9], b[3], muladd(a[6], b[2], a[3] * b[1]))
f3 (generic function with 1 method)
julia> @code_llvm f3(smat, svec)
; Function Signature: f3(NTuple{9, Float64}, Tuple{Float64, Float64, Float64})
; @ REPL[4]:1 within `f3`
define double @julia_f3_1940(ptr nocapture noundef nonnull readonly align 8 dereferenceable(72) %"a::Tuple", ptr nocapture noundef nonnull readonly align 8 dereferenceable(24) %"b::Tuple") #0 {
top:
; ┌ @ tuple.jl:31 within `getindex`
%"a::Tuple[9]_ptr" = getelementptr inbounds i8, ptr %"a::Tuple", i64 64
%"b::Tuple[3]_ptr" = getelementptr inbounds i8, ptr %"b::Tuple", i64 16
%"a::Tuple[6]_ptr" = getelementptr inbounds i8, ptr %"a::Tuple", i64 40
%"b::Tuple[2]_ptr" = getelementptr inbounds i8, ptr %"b::Tuple", i64 8
%"a::Tuple[3]_ptr" = getelementptr inbounds i8, ptr %"a::Tuple", i64 16
; └
; ┌ @ float.jl:493 within `*`
%"a::Tuple[3]_ptr.unbox" = load double, ptr %"a::Tuple[3]_ptr", align 8
%"b::Tuple.unbox" = load double, ptr %"b::Tuple", align 8
%0 = fmul contract double %"a::Tuple[3]_ptr.unbox", %"b::Tuple.unbox"
; └
; ┌ @ float.jl:496 within `muladd`
%"a::Tuple[6]_ptr.unbox" = load double, ptr %"a::Tuple[6]_ptr", align 8
%"b::Tuple[2]_ptr.unbox" = load double, ptr %"b::Tuple[2]_ptr", align 8
%1 = fmul contract double %"a::Tuple[6]_ptr.unbox", %"b::Tuple[2]_ptr.unbox"
%2 = fadd contract double %0, %1
%"a::Tuple[9]_ptr.unbox" = load double, ptr %"a::Tuple[9]_ptr", align 8
%"b::Tuple[3]_ptr.unbox" = load double, ptr %"b::Tuple[3]_ptr", align 8
%3 = fmul contract double %"a::Tuple[9]_ptr.unbox", %"b::Tuple[3]_ptr.unbox"
%4 = fadd contract double %2, %3
ret double %4
; └
}
Why is that first %0 = fmul contract? That's the * I wrote, and it is unexpected that passing that result to muladd would induce a contract without @fastmath.
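One way to probe this further is to count how many fmul instructions carry the contract flag, both for f3 and for a function containing only the plain product. The following is a minimal sketch, not a definitive diagnostic: plain_product and llvm_ir are hypothetical helper names introduced here, and the counts it prints will depend on the Julia version.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

# Same definition as in the report.
f3(a, b) = muladd(a[9], b[3], muladd(a[6], b[2], a[3] * b[1]))

# Hypothetical comparison function: only the plain `*`, no muladd involved.
plain_product(a, b) = a[3] * b[1]

# Argument types matching smat and svec from the report.
argtypes = Tuple{NTuple{9, Float64}, NTuple{3, Float64}}

# Hypothetical helper: capture the optimized LLVM IR as a string,
# without the debug-info comment lines.
llvm_ir(f) = sprint(io -> code_llvm(io, f, argtypes; debuginfo = :none))

for f in (f3, plain_product)
    ir = llvm_ir(f)
    n_fmul = count(r"\bfmul\b", ir)           # all fmul instructions
    n_contract = count("fmul contract", ir)   # those flagged contract
    println(nameof(f), ": ", n_contract, " of ", n_fmul, " fmul instructions carry contract")
end
```

If the plain product on its own shows no contract flag while all three products in f3 do, that would suggest the flag is being attached when the * is inlined next to muladd rather than by * itself.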
We noticed this because having that * marked contract means that LLVM rearranges this operation on my ARM64 CPU. Specifically, f3 gives a different answer when called as a function than the same expression does at top level.
julia> f3(smat, svec)
-0.0908304842736846
julia> (a, b) = (smat, svec);
julia> muladd(a[9], b[3], muladd(a[6], b[2], a[3] * b[1]))
-0.09083048427368462
julia> # it's actually doing this:
muladd(a[9], b[3], muladd(a[3], b[1], a[6] * b[2]))
-0.0908304842736846
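The two orderings in the transcript above can also be compared with explicit fma calls, which are always fused and so do not depend on what LLVM decides to contract. This is a sketch under the assumption that muladd fuses on this CPU; the names as_written and reordered are introduced here for illustration, and whether the printed digits match the transcript exactly has not been verified.

```julia
smat = (0.3759580090201759539070280879968777298927307128906250000000000000, -0.2925452398379911889136906211206223815679550170898437500000000000, -0.8792456187560546698733787707169540226459503173828125000000000000, -0.6857980393775208183271274720027577131986618041992187500000000000, 0.5502662724003037908460100879892706871032714843750000000000000000, -0.4763277008999405315314845665852772071957588195800781250000000000, 0.6231666106584448083793859041179530322551727294921875000000000000, 0.7820641355456769971965513832401484251022338867187500000000000000, 0.0062500602924510650915124188031768426299095153808593750000000000);
svec = (0.5513258229231927654012679340667091310024261474609375000000000000, -0.8283034402416151742443162220297381281852722167968750000000000000, -0.0997659654489912867125767093057220336049795150756835937500000000);
a, b = smat, svec

# Ordering as written in f3: a[3] * b[1] is rounded on its own,
# then fused into the a[6] * b[2] term.
as_written = fma(a[9], b[3], fma(a[6], b[2], a[3] * b[1]))

# Ordering the report attributes to the compiled f3: a[6] * b[2] is
# rounded on its own, then fused into the a[3] * b[1] term.
reordered = fma(a[9], b[3], fma(a[3], b[1], a[6] * b[2]))

@show as_written reordered (as_written == reordered)
```

Because fma rounds only once per call, which product gets rounded separately determines the last bits of the result, and that is why contracting the first * can change the answer.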
I see this behavior both on v1.10.5 and today's nightly (1.12.0-DEV.1209 @ 55c40ce).
julia> versioninfo()
Julia Version 1.12.0-DEV.1209
Commit 55c40ce52eb (2024-09-16 17:25 UTC)
Build Info:
Official https://julialang.org release
Platform Info:
OS: macOS (arm64-apple-darwin22.4.0)
CPU: 8 × Apple M1 Pro
WORD_SIZE: 64
LLVM: libLLVM-18.1.7 (ORCJIT, apple-m1)
Threads: 1 default, 0 interactive, 1 GC (on 6 virtual cores)