-
Notifications
You must be signed in to change notification settings - Fork 33
Add checked, wrapping and saturating arithmetic for add/sub/neg #190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Although I didn't intend to work on optimization in this PR, I added the specialized methods for unsigned saturation operation. It may be better for Julia to support the LLVM's saturation arithmetic intrinsics. Perhaps that should have been discussed, but I couldn't find the latest info. Anyway, julia> x_n0f8 = collect(rand(N0f8, 1000, 1000)); y_n0f8 = collect(rand(N0f8, 1000, 1000));
julia> x_q0f7 = collect(rand(Q0f7, 1000, 1000)); y_q0f7 = collect(rand(Q0f7, 1000, 1000));
julia> @btime wrapping_add.($x_n0f8, $y_n0f8); # Julia v1.4.2, v1.5.0
87.900 μs (2 allocations: 976.70 KiB)
julia> @btime saturating_add.($x_n0f8, $y_n0f8); # Julia v1.4.2
424.299 μs (2 allocations: 976.70 KiB)
julia> @btime saturating_add.($x_n0f8, $y_n0f8); # Julia v1.5.0 (using AVX2 vpaddusb instruction)
98.499 μs (2 allocations: 976.70 KiB) julia> @btime saturating_add.($x_q0f7, $y_q0f7); # Julia v1.4.2 using add_with_overflow
2.523 ms (2 allocations: 976.70 KiB)
julia> @btime saturating_add.($x_q0f7, $y_q0f7); # Julia v1.5.0 using add_with_overflow
679.901 μs (2 allocations: 976.70 KiB) Edit:
They are still slow, but it's better than not doing anything. |
Codecov Report
@@ Coverage Diff @@
## master #190 +/- ##
==========================================
+ Coverage 87.89% 88.24% +0.34%
==========================================
Files 6 6
Lines 471 485 +14
==========================================
+ Hits 414 428 +14
Misses 57 57
Continue to review full report at Codecov.
|
32f7463
to
7248e33
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice bit twiddling!
Do we need to check any intermediate values? All the tests use extreme values (typemin/typemax/zero/eps).
I foresee that the 100% test of 16-bit types for division family will take time. (Of course, the division operation itself should not be slow.) However, I feel like we can test the 8-bit types. |
That's kind of what I was thinking too. |
I added the 8-bit |
The default arithmetic is still the wrapping arithmetic.
There are still opportunities to review the test code. I plan to improve the multiplication implementation in two or three PRs after this. |
Checked arithmetic may throw an exception, so a slowdown is unavoidable. Instead of julia> x = rand(N0f8, 1000, 1000);
julia> y = typemax(N0f8) .- x;
julia> function checked_add_arrays(x, y)
s = saturating_add.(x, y)
all(wrapping_add.(x, y).==s) || throw(OverflowError("overflowed somewhere"))
s
end
checked_add_arrays (generic function with 1 method)
julia> @btime wrapping_add.($x, $y);
86.299 μs (2 allocations: 976.70 KiB)
julia> @btime checked_add.($x, $y);
898.900 μs (2 allocations: 976.70 KiB) # Windows
678.800 μs (2 allocations: 976.70 KiB) # Linux
julia> @btime checked_add_arrays($x, $y);
313.200 μs (6 allocations: 1.08 MiB) Edit: function checked_add(x::C, y::C) where {C <: AbstractRGB{<:FixedPoint}}
s = mapc(saturating_add, x, y)
s === mapc(wrapping_add, x, y) || throw(OverflowError(""))
s
end |
Closes #152.
The default arithmetic is still the wrapping arithmetic.
We may want to customize the overflow message.