Description
Julia has a `div` function that is used to implement floor division (`fld`) and ceiling division (`cld`). I think I've found a bug that causes all of these division functions to return incorrect results.
Floor division is documented to return the largest integer less than or equal to `x / y`. It should never return a number greater than `x / y`.
But it does:

```julia
julia> 514 / Float16(0.75)
Float16(685.5)

julia> div(514, Float16(0.75)) # should round down, but rounds up instead
Float16(686.0)

julia> fld(514, Float16(0.75)) # likewise
Float16(686.0)

julia> fld(514, Float16(0.75)) ≤ 514 / Float16(0.75)
false
```
Similarly, ceiling division should never return a number smaller than the result of regular division, but it does:

```julia
julia> 515 / Float16(0.75)
Float16(686.5)

julia> cld(515, Float16(0.75)) # should round up, but rounds down instead
Float16(686.0)

julia> cld(515, Float16(0.75)) ≥ 515 / Float16(0.75)
false
```
This behavior is not limited to 16-bit floats. Here's a case where `fld` produces an incorrect result for `Float32` inputs:

```julia
julia> 4_194_307 / Float32(0.75) # = 5592409.5
5.5924095f6

julia> fld(4_194_307, Float32(0.75)) # = 5592410, incorrectly rounded up
5.59241f6

julia> fld(4_194_307, Float32(0.75)) ≤ 4_194_307 / Float32(0.75)
false
```
And here's the same for `cld`:

```julia
julia> 4_194_308 / Float32(0.75) # = 5592410.5
5.5924105f6

julia> cld(4_194_308, Float32(0.75)) # = 5592410, incorrectly rounded down
5.59241f6

julia> cld(4_194_308, Float32(0.75)) ≥ 4_194_308 / Float32(0.75)
false
```
The equivalent operations in Python produce the correct results:

```python
>>> import numpy as np

# For 16-bit floats:
>>> np.float16(514) / np.float16(0.75) # regular division
685.5
>>> np.float16(514) // np.float16(0.75) # floor division
685.0

# For 32-bit floats:
>>> np.float32(4_194_307) / np.float32(0.75) # regular division
5592409.5
>>> np.float32(4_194_307) // np.float32(0.75) # floor division
5592409.0
```
Examples of this incorrect behavior are not hard to find – for most floats, you can find a divisor that makes either `fld` or `cld` return the wrong answer.

Here are some examples for `Float16` where either `fld` or `cld` is incorrect:

```julia
cld(1, Float16(0.000999)) < 1 / Float16(0.000999)
cld(2, Float16(0.001999)) < 2 / Float16(0.001999)
cld(3, Float16(0.002934)) < 3 / Float16(0.002934)
cld(4, Float16(0.003998)) < 4 / Float16(0.003998)
fld(5, Float16(0.004925)) > 5 / Float16(0.004925)
```
And here are some for `Float32`:

```julia
fld(5, Float32(6.556511e-7)) > 5 / Float32(6.556511e-7)
fld(10, Float32(1.3113022e-6)) > 10 / Float32(1.3113022e-6)
fld(11, Float32(1.4305115e-6)) > 11 / Float32(1.4305115e-6)
cld(16, Float32(2.8014183e-6)) < 16 / Float32(2.8014183e-6)
cld(17, Float32(2.2053719e-6)) < 17 / Float32(2.2053719e-6)
```
For simplicity I’ve presented examples where the first argument is an integer; this bug also occurs for non-integral inputs.
A divisor producing the wrong result can be found for over 51% of all possible 16-bit floats. I have not evaluated how widespread this is for `Float32`, but the results above suggest that it is similarly easy to find failures there too.
I've tracked the invocations down to this definition of `div`, which has existed at least as far back as Julia 1.6:

```julia
div(x::T, y::T, r::RoundingMode) where {T<:AbstractFloat} =
    convert(T, round((x - rem(x, y, r)) / y))
```
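For reference, the same failure mode is reproducible outside Julia. The following Python sketch is my own transliteration (Python's `float` is IEEE binary64, matching `Float64`; `naive_fld` and the example pair are illustrative names and inputs of mine, not from the issue):

```python
import math
from fractions import Fraction

def naive_fld(x: float, y: float) -> float:
    """Transliteration of the Julia method above for r = RoundDown:
    convert(T, round((x - rem(x, y, r)) / y)).
    For positive x and y, rem(x, y, RoundDown) equals fmod(x, y)."""
    return float(round((x - math.fmod(x, y)) / y))

# A binary64 pair that triggers the bug: the remainder (0.25) is lost
# entirely in the subtraction (x - 0.25 rounds back to x), and x / y then
# rounds to an exact .5, which the final round() (ties to even) resolves
# upward, above the true quotient.
x, y = float(2**51 + 2), 0.75

exact = Fraction(x) / Fraction(y)   # the true rational quotient
got = naive_fld(x, y)

print(int(got))             # 3002399751580334
print(math.floor(exact))    # 3002399751580333  <- the correct floor
print(got > exact)          # True: the "floor" exceeds the exact quotient
```

This mirrors the issue's `Float16` and `Float32` examples one precision level up: the rounding mode argument never gets a chance to act, because the remainder vanished before the division.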
Activity
Seelengrab commented on Apr 21, 2023
This was moved from `base/operators.jl` in this PR - see the diff here. The method seems to have been buggy before that, judging from these outputs:

The passing Float16 tests in 1.3 are a bit surprising, since back then these were implemented by doing the division in Float32 and just converting to Float16 afterwards. As far as I can tell, these are/were untested 😬

Full tests & outputs can be found in this gist.

Sounds like you have a test suite covering these cases - it would be great if that could be incorporated into our test suite!
yurivish commented on Apr 21, 2023
I wrote a function that counts the number of floats with a floor above (or a ceiling below) the result of regular division, using a nested loop. Running it over the full set of floats, it looks like my earlier 51% figure was an underestimate: a divisor producing the wrong result can be found for 68.4% of all possible 16-bit floats.
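A checker in this spirit can be sketched in Python (this is my own reconstruction, not the code from the comment). It emulates `Float16` by rounding every intermediate through the `struct` module's half-precision format, and scans all positive finite `Float16` divisors for a given numerator:

```python
import math
import struct
from fractions import Fraction

def f16(x: float) -> float:
    """Round a binary64 value to the nearest binary16 (Float16) value,
    returned as a Python float; overflow goes to infinity, as in Float16."""
    try:
        return struct.unpack('<e', struct.pack('<e', x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)

def naive_fld16(x: float, y: float) -> float:
    """Julia's div(x, y, RoundDown) formula with every intermediate rounded
    back to Float16 precision (valid for positive x and y, where
    rem(x, y, RoundDown) equals fmod(x, y))."""
    r = f16(math.fmod(x, y))
    q = f16(f16(x - r) / y)
    return q if math.isinf(q) else f16(float(round(q)))

def bad_divisor(x: float):
    """Return a positive finite Float16 divisor whose 'floor' exceeds the
    exact quotient of x, or None if no such divisor exists."""
    fx = Fraction(x)
    for bits in range(1, 0x7C00):          # all positive nonzero finite halfs
        y = struct.unpack('<e', struct.pack('<H', bits))[0]
        q = naive_fld16(x, y)
        if math.isfinite(q) and Fraction(q) > fx / Fraction(y):
            return y
    return None

print(naive_fld16(514.0, 0.75))        # 686.0, reproducing the issue's example
print(bad_divisor(514.0) is not None)  # True
```

Rounding binary64 intermediates down to binary16 is a faithful emulation here: binary64 carries more than 2×11+2 significant bits, so the double rounding on `+`, `-`, and `/` agrees with native Float16 arithmetic.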
Seelengrab commented on Apr 21, 2023
Ah, gotcha - that seems a tad high. I'm only counting 0.05% when taking both arguments into account:
The sparsity pattern is insightful:
With one quadrant of that looking like:
The repeating nature of this pattern suggests that the issue is independent of the sign of the inputs.
Seelengrab commented on Apr 21, 2023
Zooming in on one of these spikes (specifically, the one at x = `11265:12288` and y = `1024:2048`, both from the top left corner) and rendering it: red means `floor_is_wrong`, blue means `ceil_is_wrong`. Maybe someone knowledgeable in floating point shenanigans can glean what is wrong with our implementations from that?

EDIT: After managing to save the full image (which browsers don't like to render..), I can confirm that the other spikes to the bottom right are exactly the same as the one from this image.
yurivish commented on Apr 21, 2023
That error rate suggests that this bug manifests once out of every ~1,700 randomly chosen `Float16` divisions:

That's frequent enough that randomized testing should let us estimate the prevalence of the bug at higher precisions, so I put together a quick script (see below). Based on a billion samples, the bug manifests once out of every ~10,000 randomly chosen `Float32` divisions and once out of every ~74,000 `Float64` divisions.

mikmoore commented on Apr 21, 2023
**The problem**

Here's what I found digging into this:

So the issue appears to be with the implementation. It assumes that `x - rem(x, y, r)` will be roughly evenly divisible by `y`. This is only true if there is small roundoff error in the subtraction. It then rounds the number to the nearest integer (which would address small errors in the subtraction, but not large ones). This means that, in the worst case of total underflow during the subtraction (as we see in the example), we effectively get `div(x, y, ::RoundingMode) = round(x / y)`. One notices that the rounding mode is effectively "unused" when the remainder is small relative to the numerator.

These examples show that this manifests when `x - rem(x, y, r) == x` and the floating point rounding mode (usually `RoundNearest`, ties go to even) resolves a `0.5` fractional part of `x / y` in the "unlucky" direction.

Also, with this failure mode in mind, it's entirely possible to construct failing examples in any precision:
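As a concrete illustration of these two conditions, here is a small Python sketch of mine (binary64, positive arguments assumed; the predicate name and the example pair are not from the thread):

```python
import math

def triggers_naive_div_bug(x: float, y: float) -> bool:
    """Check the two failure conditions for positive x and y:
    (a) subtracting rem(x, y, RoundDown) == fmod(x, y) rounds back to x, and
    (b) the computed quotient x / y has a fractional part of exactly 0.5,
    so the final round() (ties to even) decides the result instead of the
    requested rounding mode."""
    r = math.fmod(x, y)
    return r != 0.0 and (x - r) == x and math.fmod(x / y, 1.0) == 0.5

# A binary64 pair satisfying both conditions (its naive floor comes out
# one above the true floor):
print(triggers_naive_div_bug(float(2**51 + 2), 0.75))   # True

# An ordinary pair does not trigger it:
print(triggers_naive_div_bug(10.0, 2.0))                # False
```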
Note that `BigFloat` has its own method and does not suffer from this bug.

**A step in the right direction**
It appears that redefining
has significantly reduced the number of incorrect input pairs. This still feels like it may not be treating the full extent of these roundoff issues, however. I'm out of time to chase down the remaining issues right now.
It's actually very fortunate that we use the same methods for all `IEEEFloat` types. It means that if we can fix it for all `Float16` pairs then it will probably be correct for all `Float32` or `Float64` pairs.

Seelengrab commented on Apr 21, 2023
I wanted to plot the new sparsity pattern, but now I cannot reproduce those numbers on my end:
Which version are you on? I actually tried that exact same fix earlier today, but ran into some problems with `Float32` still not being fixed - perhaps that was a fluke though..

mikmoore commented on Apr 21, 2023
I added versioninfo to my earlier post. It's v1.8.0 (kinda old) on Intel x86-64 (non-native Float16). I ran the test again in a fresh session and reproduced my earlier number of failures. I'll remark that my number of resolved failures (2511024 - 2792) equals your number of resolved failures (2508232), so it appears some orthogonal issue I was seeing is resolved by a post-1.8.0 version or your hardware.

It's interesting to hear that some `Float32` examples were not resolved by this. Can you post an example or two?

aravindh-krishnamoorthy commented on Apr 22, 2023
Well, if we already use a `round`, I don't see the problem with:

The code `(x - rem(x, y, r)) / y` seems to have been implemented before `round` and ensures that `x / y` can be implemented using integer division. After implementing `round`, someone seems to have directly wrapped it without removing the `rem` part!

With the above definition (`newdiv2`), `count_wrong_floats()` is zero.

Edit: minor editorial changes.
mikmoore commented on Apr 24, 2023
It appears that the `rem` component cannot be easily discarded. Consider this example: what should `fld(1.0, 0.1)` produce? At a glance, one would say `10.0`. This is wrong. Recall that `0.1` is not exactly representable in `Float64`.
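The non-representability claim can be verified exactly with Python's `fractions` module (a quick sketch of mine, not code from the thread), which recovers the precise rational value stored for the binary64 literal `0.1`:

```python
import math
from fractions import Fraction

# The exact rational value actually stored for the literal 0.1 in binary64:
tenth = Fraction(0.1)

print(tenth > Fraction(1, 10))   # True: the stored 0.1 is slightly *above* 1/10

# Hence 1.0 divided by the stored 0.1 is slightly *below* 10, and the
# mathematically correct fld(1.0, 0.1) is 9.0, not 10.0:
print(math.floor(Fraction(1) / tenth))   # 9
```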
We see that we cannot ensure the correct result by computing `x / y` in the default floating point environment and then rounding the way we want. But we currently don't have good control over the floating point rounding mode (I think? `setrounding` only works for `BigFloat` and has a spooky warning about a lack of thread safety) and I'd be anxious trying to mess with it in any case (I don't remember if it's thread-local or processor-local or varies by architecture).

So some form of precision extension, `rem`, `fma`, or other mechanism of delayed rounding (or a way to briefly change the floating point environment without risking breaking everything elsewhere) appears necessary to get this right.

nsajko commented on Apr 27, 2023
A proof-of-concept fix using double-word floating-point arithmetic could maybe look something like this (not tested yet):
I guess that the assumption of `rounding(T) == RoundNearest` (necessary for multi-word arithmetics AFAIK) is OK, considering that the `round` documentation already says:

aravindh-krishnamoorthy commented on Apr 27, 2023
Thank you for this comment. Indeed, as you argue, it is not meaningful to round at the same precision as the computation of `x/y`. `fld` (as above) seems to be written for, and to matter most for, integral arguments anyway. However, in this case, I feel that the simple `rounddown(x/y)`, i.e.:

is preferable compared to `100.0`, in case anybody would want to use it. Furthermore, the documentation clarifies this point: https://docs.julialang.org/en/v1/base/math/#Base.fld (and seems to focus on floating-point numbers; not sure why).

This is because, otherwise, the definition `x/y > fld(x,y)` might not be met with conventional FLT instructions and we may have to implement the floating point division ourselves in higher precision in `stdlib`, which might be counterproductive.

IMHO as well, the perfect way, of course, is to tell the `Float16` FPU or software, whose job this is anyway, to round appropriately. But wrapping it in `round` seems to be the second best approach and, IMHO, the best we can do in `base`.
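For concreteness, the two candidate answers for `fld(10.0, 0.1)` can be computed side by side with a small Python sketch of mine (binary64, so it mirrors `Float64`; the variable names are illustrative):

```python
import math
from fractions import Fraction

x, y = 10.0, 0.1

# Semantics A: floor of the correctly rounded floating point quotient.
# 10.0 / 0.1 happens to round to exactly 100.0 in binary64.
a = math.floor(x / y)

# Semantics B: floor of the exact rational quotient of the stored values.
# The stored 0.1 is slightly above 1/10, so the exact quotient is just
# below 100 and its true floor is 99.
b = math.floor(Fraction(x) / Fraction(y))

print(a, b)   # 100 99
```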
But, kindly also advise if you see any issues with `fld(10.0, 0.1)` being `99.0`? If the reasoning is strong, we can argue to change the behaviour.

KristofferC commented on Apr 27, 2023
Would it be useful to look at the implementation of this in other languages?