dont specialize on type in showarg #232
Codecov Report

```
@@           Coverage Diff           @@
##           master     #232   +/-  ##
=======================================
  Coverage   96.24%   96.24%
=======================================
  Files           6        6
  Lines         692      692
=======================================
  Hits          666      666
  Misses         26       26
=======================================
```

Continue to review full report at Codecov.
Doesn't this affect the run-time performance of commonly used types like …?

Could you give an example benchmark of what you mean?

This isn't the most up-to-date, but it's FYI: #189 (comment). The absolute value of the benchmark itself is not very meaningful because of the overhead of IO, but it actually affects the speed of printing for ….

Edit: The reason I used …
@kimikage, could you test this branch and see? Another alternative would be to write this as

```julia
@inline function showtype(io::IO, ::Type{X}) where {X <: FixedPoint}
    hasalias(X) && return _showtype(io, typechar(X), nbitsfrac(X), nbitsint(X))
    _print(io, X)
    return io
end

@noinline function _showtype(io::IO, c::Char, f::Int, m::Int)
    write(io, c)
    m > 9 && write(io, Char(m ÷ 10 + 0x30))
    write(io, Char(m % 10 + 0x30), 'f')
    f > 9 && write(io, Char(f ÷ 10 + 0x30))
    write(io, Char(f % 10 + 0x30))
    return io
end

_print(io::IO, @nospecialize(X)) = print(io, X)
```
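The `_showtype` variant above avoids `print`-based formatting by writing digits directly: for 0 ≤ n ≤ 99, `n ÷ 10 + 0x30` and `n % 10 + 0x30` are the ASCII codes of the tens and ones digits. A minimal sketch of that arithmetic (the helper name `write_two_digits` is hypothetical, not part of the package):

```julia
# Hypothetical helper illustrating the digit-to-ASCII trick used in _showtype.
function write_two_digits(io::IO, n::Int)
    n > 9 && write(io, Char(n ÷ 10 + 0x30))  # tens digit, skipped when n < 10
    write(io, Char(n % 10 + 0x30))           # ones digit
    return io
end

buf = IOBuffer()
write_two_digits(buf, 87)
write_two_digits(buf, 8)
String(take!(buf))  # "878"
```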
I tried … and I get no difference between this PR and master. Is the worry that you will print a FixedPoint type hundreds of millions of times?

Using … It's not up to a million, but 1000 times is a realistic value in images. That's not the recommended use case, though.

But the … Could you come up with a benchmark that shows the effect you're worried about?

I don't really get it. How would you "constant fold" on an …?

I'm not sure if constant folding is the proper term in this case, but the specialization reduces the function calls. You are much more familiar with Julia, so I'm not going to complain about this any more.

I'm happy to make any changes you want. I'm just trying to figure out a good way to measure, so I know the changes are effective.

Agreed. This isn't a critique; we're just trying to understand whether there's something important we're overlooking. Let's leave this open for a day or so and see if someone comes up with a benchmark that favors one approach over the other. If not, let's merge.
Did you read this comment? It's a fair argument in one sense that there's no need for specialization because there's no difference in the current slow implementation, but in another sense it's not fair. Currently, a "single" …

```julia
julia> ARGB32(0.1,0.2,0.3,0.4)
ARGB32(0.102N0f8,0.2N0f8,0.298N0f8,0.4N0f8)
```

In this case (i.e. without JuliaGraphics/ColorTypes.jl#206), the "fully-optimized" version will certainly speed up the ….

I'm not inherently opposed to this PR, but I am uncomfortable with the bias in the decision-making process.
How is that not fair? If you get the same performance with and without, isn't it better not to compile hundreds of redundant versions of the same method?

All I have done is: …

Which bias are you pointing to? Also, even if this made the function, say, two times slower, you would have to print the type tens of millions of times to notice a difference. And printing types is not done that much.
I think the printing is still slow (in ColorTypes).

It's a matter of feeling, but you seem to be neglecting the issue. Again, I don't see a problem with this PR. I think it might be better to avoid calling the ….
I don't think that the time to print something can be a matter of feeling since it can be measured. With "the issue", do you mean I should benchmark the time to print a matrix of ColorTypes to the REPL, for example?
If you have an option you like more, feel free to go with that. But if there is no performance gain from specializing, it seems unnecessary to also compile this function for all the different types at runtime.
I'm totally happy to close this without merging; all it would require is a benchmark that proves it's a bad idea. I don't see any bias in that. @KristofferC has supplied a benchmark that shows a convincing compile-time advantage; at the moment all we have is a theoretical runtime disadvantage, and efforts to actually demonstrate it have not indicated that it's real. Currently the balance of incontrovertible evidence is strongly in favor of merging, but that could change in an instant with the right benchmark. If you need more time to generate one, that's fine too; this isn't urgent, since the times involved are not huge either way.
I don't mean to pick a fight, but @KristofferC doesn't benchmark the disadvantages of not doing specialization.

Efforts? We cannot benchmark what does not currently exist. How can I show the problem other than in theory? In theory, the importance of ….

```julia
using FixedPointNumbers

function FixedPointNumbers.showtype(io::IO, ::Type{N0f8})
    print(io, "N0f8")
    io
end

using ColorTypes, BenchmarkTools

img = reinterpret.(ARGB32, rand(UInt32, 32, 32));

function dump(io, img32x32)
    for y in 1:32
        for x in 1:32
            @inbounds print(io, img32x32[y, x], '\t')
        end
        println(io)
    end
end

io = IOBuffer();
```

```julia
julia> versioninfo()
Julia Version 1.5.1
Commit 697e782ab8 (2020-08-25 20:08 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)

julia> @benchmark dump(io, $img)
BenchmarkTools.Trial:
  memory estimate:  1.70 MiB
  allocs estimate:  9216
  --------------
  minimum time:     621.000 μs (0.00% GC)
  median time:      1.327 ms (0.00% GC)
  mean time:        1.516 ms (17.37% GC)
  maximum time:     176.787 ms (0.18% GC)
  --------------
  samples:          3306
  evals/sample:     1

julia> @benchmark dump(io, $img) # nospecialize
BenchmarkTools.Trial:
  memory estimate:  1.70 MiB
  allocs estimate:  9216
  --------------
  minimum time:     756.400 μs (0.00% GC)
  median time:      1.542 ms (0.00% GC)
  mean time:        2.029 ms (14.41% GC)
  maximum time:     527.388 ms (1.27% GC)
  --------------
  samples:          2583
  evals/sample:     1
```
Of course, this is bogus and does not indicate a problem with this PR. I'm not saying we have a problem with using ….

First of all, Julia v1.6.0-DEV has a fatal regression in printing ….
I didn't submit PR #233 to point out the problems with ….

I thought I did, by benchmarking the function that has ….
I was just trying out some ways of seeing what functions end up getting compiled when packages are precompiled. I loaded Plots (I think), and this stuck out to me: it was compiling hundreds of versions of the same method. Since it is a method where performance is not usually important, I did a quick fix, noticed it halved the time to precompile that package, and made a PR. Anyway, feel free to go with #233 or whatever you prefer. This PR went in a strange direction indeed.
I apologize for disrupting the discussion. I think the timing was a bit unlucky this time. While refactoring the …. Evaluating the difference in speed due to changes (with/without …) ….

PR #233 is not technically superior to this PR; it is simply a matter of convenience in the development of this package. (From a technical standpoint, @timholy's idea may be the most technical. :smile:) I believe the developers have benefited from the fact that such a small change can make a big difference in precompilation time. 👍
A significant amount of time when precompiling the package is from compiling methods like: …

in FixedPointNumbers.jl/src/fixed.jl, lines 32 to 41 at f57e96b.

By avoiding specializing `showtype` on the type, the time for precompilation gets cut in about half.

Before: …

After: …

Another option could be to run that piece of code in a separate module and have that module run without inference.
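The mechanism discussed in this thread can be sketched with toy functions (these are illustrative names, not the package's code): annotating an argument with `@nospecialize` asks the compiler not to generate a fresh method instance per concrete argument type, so a single compiled body serves every call, at the cost of a dynamic dispatch inside the body.

```julia
# Toy illustration of the trade-off: the first method gets one compiled
# specialization per concrete Type argument; the second reuses a single
# compiled body for all of them.
show_specialized(io::IO, ::Type{T}) where {T} = print(io, T)
show_generic(io::IO, @nospecialize(T::Type)) = print(io, T)

io = IOBuffer()
for T in (Int8, Int16, Int32, Int64)
    show_specialized(io, T)  # may trigger a new compilation per T
    print(io, ' ')
    show_generic(io, T)      # same compiled code each iteration
    print(io, ' ')
end
String(take!(io))
```

Both calls produce identical output; the difference only shows up in compile (and precompile) time as the number of distinct types grows.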