Create some benchmarks #1
I'm assuming things are going to get better as LoopVectorization becomes easier to incorporate?

Yep. The RGB filtering benchmarks (

Really want ImageJ benchmarks too, as that's kind of the standard in much of biology.

Would be nice to have ITK benchmarks too, but that's pretty specialized and the last time I used ITK I nearly ripped out my hair.

I checked a decade or so ago when I was still using Matlab and looking to use something faster, and to my surprise Matlab blew it away on most things I tried. That said, it has been a decade... I agree, ITK would be quite nice if at all possible.

I am looking into the ImgLib2 benchmarks (ImageJ2 / FIJI). Another technology to benchmark against is the OpenCL-based CLIJ2, which is available as an ImageJ plugin as well as for other platforms (Python, MATLAB). For ITK, perhaps SimpleITK (https://simpleitk.org/) might be appropriate.
If Julia is being run with multiple threads, this now will default to using all threads in `feature_transform`. The multithreaded implementation is relatively straightforward, although the recursive algorithm makes it a little hard to wrap your head around:

- `computeft!` for dimension `d` is dependent only on slices of dimensions `1:d`, and indeed for all except the final call to `voronoift!` just on dimensions `1:d-1`. Hence you can divide the last dimension `N` into chunks, skip the final `voronoift!` on dimension `N`, and give each chunk to a thread (see the sketch below).
- For the final `voronoift!` call along dimension `N`, it's a one-dimensional operation along this dimension. Hence you can just split the next-to-last dimension into chunks instead.

This change was responsible for performance improvements in `distance_transform` noted in JuliaImages/image_benchmarks#1. Julia 1.3 is required for `@spawn`.

Co-authored-by: Johnny Chen <[email protected]>
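To make the chunking scheme concrete, here is a minimal sketch of that pattern, assuming nothing about the actual ImageMorphology internals: `process_lastdim_chunks!` and the element-doubling operation are illustrative stand-ins, not functions from the package.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical helper illustrating the chunking pattern: split the last
# dimension of `A` into one contiguous range per thread and run `f!` on each
# slab in its own task. `f!` stands in for the per-chunk work (e.g. the
# recursive passes over dimensions 1:N-1 described above).
function process_lastdim_chunks!(f!, A::AbstractArray{T,N}) where {T,N}
    n = size(A, N)
    nchunks = min(nthreads(), n)
    # Roughly equal contiguous ranges covering 1:n, one per task.
    ranges = [round(Int, (i - 1) * n / nchunks) + 1 : round(Int, i * n / nchunks) for i in 1:nchunks]
    tasks = map(ranges) do r
        @spawn f!(view(A, ntuple(_ -> Colon(), N - 1)..., r))
    end
    foreach(wait, tasks)
    return A
end

# Example: double every element, slab by slab, in parallel.
A = rand(64, 64, 64)
process_lastdim_chunks!(x -> (x .= 2 .* x), A)
```

In the real algorithm, the final `voronoift!` pass along dimension `N` would instead be parallelized by chunking the next-to-last dimension, as the note above describes.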
I am not sure about But even for working with a simple kernel I can't reproduce the results above on the latest version of For instance:

```julia
using Images;
using BenchmarkTools;

# Images
mA = rand(RGB{Float32}, 2048, 2048);

# Kernels
mW = Float32.(centered([0.25 0 0.25; 0 -1 0; 0.25 0 0.25])); #<! The Filter

@btime imfilter(mA, mW, "replicate", ImageFiltering.Algorithm.FIR());
```

I get:

On MATLAB:

```matlab
mA = rand(2048, 2048, 3, 'single');
mW = [0.25, 0, 0.25; 0, -1, 0; 0.25, 0, 0.25];

hF = @() imfilter(mA, mW, 'replicate');
timeit(hF)
```

I get:

So MATLAB is faster on my machine.
Thanks for the benchmarks! A few thoughts:
OK, I chose a non-separable kernel on purpose; for a separable kernel I'd probably use
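For reference, and only as a sketch rather than necessarily what the commenter had in mind: ImageFiltering can be told explicitly that a kernel is separable by passing its 1-D factors through `kernelfactors`. The factor values below are borrowed from the separable example later in the thread.

```julia
using Images          # re-exports ImageFiltering, including kernelfactors
using BenchmarkTools

mA = rand(RGB{Float32}, 2048, 2048);

# 1-D factors of a separable 3x3 kernel (same values as the later example).
vW = Float32[0.2, -0.5, 0.3];

# kernelfactors marks the pair as a factored kernel, so imfilter can run
# two cheap 1-D passes instead of a full 2-D convolution.
kern = kernelfactors((vW, vW));

@btime imfilter($mA, $kern, "replicate");
```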
Just a sample point with a static array:

```julia
using Images;
using StaticArrays;
using StaticKernels;
using BenchmarkTools;

# Images
mA = rand(RGB{Float32}, 2048, 2048);
mB = rand(Float32, 2048, 2048, 3);

# Kernels
# mW = @SMatrix Float32.(centered([0.25 0 0.25; 0 -1 0; 0.25 0 0.25])); #<! The Filter
mW = centered(SMatrix{3, 3, Float32}([0.25 0 0.25; 0 -1 0; 0.25 0 0.25])); #<! The Filter
mK = @kernel w -> 0.25 * w[-1, -1] + 0.25 * w[-1, 1] - w[0, 0] + 0.25 * w[1, -1] + 0.25 * w[1, 1];

# Pre-allocated results
mAO = Array{RGB{Float32}}(undef, 2048, 2048);
mBO = Array{Float32, 3}(undef, 2048, 2048, 3);

# Planar
@btime begin
    for ii in 1:3
        @views map!(mK, mBO[:, :, ii], extend(mB[:, :, ii], StaticKernels.ExtensionReplicate()));
    end
end

# Packed
@btime imfilter!(mAO, mA, mW, "replicate", ImageFiltering.Algorithm.FIR());
# @btime imfilter(mA, mW, "replicate", ImageFiltering.Algorithm.FIR());
```

What I get is:

```
13.101 ms (39 allocations: 1.22 KiB)
59.660 ms (28 allocations: 48.13 MiB)
```

So I don't see any improvement.
OK, here is a simple separable filter:

```julia
using Images;
using StaticArrays;
using StaticKernels;
using BenchmarkTools;

mA = rand(RGB{Float32}, 2048, 2048);

vW = [0.2f0 -0.5f0 0.3f0];
mW = vW' * vW;

vW = centered(vW);
mW = centered(mW);

@btime imfilter(mA, mW, "replicate", ImageFiltering.Algorithm.FIR());
@btime imfilter(mA, (vW', vW), "replicate", ImageFiltering.Algorithm.FIR());
```

I get ~85 ms for both (which makes sense since, according to the documentation,

In MATLAB:

```matlab
mA = rand(2048, 2048, 3, 'single');
vW = [0.2, -0.5, 0.3];
mW = vW.' * vW;

hF = @() imfilter(mA, mW, 'replicate');
timeit(hF)
```

I get ~8 ms, about ~10 times faster, and indeed faster than the previous kernel for MATLAB.

By the way, in MATLAB

```matlab
function [ mO ] = Conv2DRgb( vW, mI )

mO = zeros(size(mI), 'like', mI);
for ii = 1:3
    mO(:, :, ii) = conv2(vW, vW', mI(:, :, ii), 'same');
end

end
```

will get even faster by ~10%. Though above, for simplicity, I used

Just for reference, I use MATLAB R2022b on Windows 10.
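For a rough Julia analogue of the `Conv2DRgb` approach above (an illustration, not code from the thread), one could filter each channel of a planar array separately; the use of `kernelfactors` here is my assumption about how to exploit separability, not something the commenter showed.

```julia
using Images          # re-exports ImageFiltering
using BenchmarkTools

# Planar HxWx3 array, analogous to MATLAB's rand(2048, 2048, 3, 'single').
mA = rand(Float32, 2048, 2048, 3);
mO = similar(mA);

vW = Float32[0.2, -0.5, 0.3];
kern = kernelfactors((vW, vW));   # separable: two 1-D passes per channel

# Filter each channel independently, like the conv2 loop in Conv2DRgb.
@btime begin
    for ii in 1:3
        @views imfilter!(mO[:, :, ii], mA[:, :, ii], kern, "replicate");
    end
end
```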
I think it may be getting close to time to promote JuliaImages a bit more heavily. I still have a few things I want to do first wrt cleaning the code base (getting a release of Images.jl out that is compatible with ImageCore 0.9 is high on the list, as is the `abs2` issue). But it will take some time to assemble good benchmarks, and I think it's best to do that out in the open so that others can pitch in. @mkitti, you'll note from the README that I'd love to have Fiji benchmarks; is that by any chance something you'd be able and interested to contribute?

FWIW, here's where we are on my machine today:
Note the log-scaling. The "generic" tests (top) are expressed per-voxel, the "specialized" tests (below) are absolute.
Together with OpenCV, we pretty much dominate already (in one case by more than three orders of magnitude), but there are a few exceptions that merit investigation. (That Gaussian blur algorithm in OpenCV is worth emulating...) Missing points generally represent unsupported operations; the "specialized" set for OpenCV is still a WIP, but they support very little 3D so a lot of points will be missing anyway.
This also doesn't capture some of our other advantages, like scaling to big data. I've been playing a bit with dask but it seems like a pretty far cry from what we offer. `scikit-image` has a habit of immediately converting everything to a numpy array.