
Create some benchmarks #1


Open
timholy opened this issue Sep 14, 2021 · 12 comments

@timholy
Member

timholy commented Sep 14, 2021

I think it may be getting close to time to promote JuliaImages a bit more heavily. I still have a few things I want to do first wrt cleaning up the code base (getting a release of Images.jl out that is compatible with ImageCore 0.9 is high on the list, as is the abs2 issue). But it will take some time to assemble good benchmarks, and I think it's best to do that out in the open so that others can pitch in. @mkitti, you'll note from the README that I'd love to have Fiji benchmarks; is that by any chance something you'd be able and interested in contributing?

FWIW, here's where we are on my machine today:

[plot: "generic" benchmarks]

[plot: "specialized" benchmarks]

Note the log scaling. The "generic" tests (top) are expressed per voxel; the "specialized" tests (bottom) are absolute.

Together with OpenCV, we pretty much dominate already (in one case by more than three orders of magnitude), but there are a few exceptions that merit investigation. (That Gaussian blur algorithm in OpenCV is worth emulating...) Missing points generally represent unsupported operations; the "specialized" suite for OpenCV is still a WIP, but they support very little 3D, so a lot of points will be missing anyway.

This also doesn't capture some of our other advantages, like scaling to big data. I've been playing a bit with dask but it seems like a pretty far cry from what we offer. scikit-image has a habit of immediately converting everything to a numpy array.

@timholy
Member Author

timholy commented Sep 23, 2021

Time for a fun update. This does rely on some stuff that either hasn't been released yet or is still local on my machine:
[plot: "generics" benchmarks]

[plot: "special" benchmarks]

and one that now combines both kinds of benchmarking:

[plot: "compact" combined benchmarks]

OpenCV's performance advantage has, on average, essentially been erased, and I believe there is more to come.

@Tokazama

Tokazama commented Sep 23, 2021

I'm assuming things are going to get better as LoopVectorization becomes easier to incorporate?

@timholy
Member Author

timholy commented Sep 23, 2021

Yep. The RGB filtering benchmarks (blur and gradient, aka "big kernels" and "small kernels" respectively) still aren't using LoopVectorization. It looks like OpenCV struggles a bit there too, and I think it's possible that a really good implementation may ultimately beat them almost across the board. Even the grayscale ones could use more work, as my thread utilization is hovering at around 67%, but I haven't taken a close look. But beating Matlab & skimage by an order of magnitude on a lot of benchmarks is already pretty satisfying.
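For context, the kind of loop nest LoopVectorization targets can be sketched like this. This is a hypothetical `filter3x3!` doing a "valid"-region 3×3 correlation on a grayscale Float32 image, not the actual ImageFiltering internals; `out` is assumed to be 2 smaller than `img` in each dimension:

```julia
using LoopVectorization  # provides @turbo

# Hypothetical sketch (not ImageFiltering's implementation): a "valid"-region
# 3x3 correlation with the whole loop nest handed to LoopVectorization.
function filter3x3!(out::AbstractMatrix, img::AbstractMatrix, w::AbstractMatrix)
    @turbo for j in axes(out, 2), i in axes(out, 1)
        s = zero(eltype(out))
        for b in 1:3, a in 1:3
            s += w[a, b] * img[i + a - 1, j + b - 1]
        end
        out[i, j] = s
    end
    return out
end
```

Replacing `@turbo` with `@tturbo` would additionally use LoopVectorization's own threading, which is the "multithreading paradigm" referred to above.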

@timholy
Member Author

timholy commented Sep 23, 2021

Really want ImageJ benchmarks too, as that's kind of the standard in much of biology.

@Tokazama

Would be nice to have ITK benchmarks too, but that's pretty specialized and the last time I used ITK I nearly ripped out my hair.

@timholy
Member Author

timholy commented Sep 23, 2021

I checked a decade or so ago, when I was still using Matlab and looking for something faster, and to my surprise Matlab blew ITK away on most things I tried. That said, it has been a decade... I agree, ITK would be quite nice if at all possible.

@mkitti
Member

mkitti commented Sep 23, 2021

I am looking into the ImgLib2 benchmarks (ImageJ2 / FIJI). Another technology to benchmark against is the OpenCL-based CLIJ2, which is available as an ImageJ plugin as well as on other platforms (Python, MATLAB). For ITK, perhaps SimpleITK (https://simpleitk.org/) would be appropriate.

timholy added a commit to JuliaImages/ImageMorphology.jl that referenced this issue Sep 24, 2021
If Julia is being run with multiple threads, this now will default
to using all threads in `feature_transform`. The multithreaded
implementation is relatively straightforward, although the recursive algorithm
makes it a little hard to wrap your head around:

- `computeft!` for dimension `d` is dependent only on slices of dimensions `1:d`,
  and indeed for all except the final call to `voronoift!` just on dimensions `1:d-1`.
  Hence you can divide the last dimension `N` into chunks and give each to a thread.

- For the final `voronoift!` call along dimension `N`, it's a
  one-dimensional operation along this dimension. Hence you can just
  split the next-to-last dimension into chunks instead.

This change was responsible for performance improvements in `distance_transform`
noted in JuliaImages/image_benchmarks#1.
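The chunking strategy the commit message describes can be sketched as follows. This is an illustrative helper under assumed names (`process_chunks!` is hypothetical, not the actual ImageMorphology.jl code): split the last dimension of an array into per-thread slabs and hand each to its own task.

```julia
using Base.Threads: @spawn

# Illustrative sketch (not ImageMorphology.jl's code): divide the last
# dimension into roughly equal chunks, run f! on each chunk's view in a
# separate task, and wait for all of them.
function process_chunks!(f!, A::AbstractArray{T,N}) where {T,N}
    n = size(A, N)
    nchunks = min(Threads.nthreads(), n)
    ranges = [round(Int, (i - 1) * n / nchunks) + 1 : round(Int, i * n / nchunks)
              for i in 1:nchunks]
    tasks = map(ranges) do r
        # each task owns a disjoint slab along the last dimension
        @spawn f!(view(A, ntuple(_ -> Colon(), N - 1)..., r))
    end
    foreach(wait, tasks)
    return A
end
```

Because the slabs along the last dimension are disjoint, the tasks never write to overlapping memory, which is what makes the commit's per-dimension split safe.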
timholy added further commits with the same message to JuliaImages/ImageMorphology.jl referencing this issue on Sep 24–25, 2021, and johnnychen94 did likewise on May 21, 2022; the later versions additionally note "Julia 1.3 is required for `@spawn`" and "Co-authored-by: Johnny Chen <[email protected]>".
@RoyiAvital

I am not sure about Blur in those benchmarks, as Gaussian blur can be applied in many forms with different trade-offs between quality and run time.

But even working with a simple kernel, I can't reproduce the results above on the latest version of Images.jl (currently 0.25.2).

For instance:

using Images;
using BenchmarkTools;

# Images
mA = rand(RGB{Float32}, 2048, 2048);

# Kernels
mW = Float32.(centered([0.25 0 0.25; 0 -1 0; 0.25 0 0.25])); #<! The Filter

@btime imfilter(mA, mW, "replicate", ImageFiltering.Algorithm.FIR());

I get:

julia> @btime imfilter(mA, mW, "replicate", ImageFiltering.Algorithm.FIR());
  66.387 ms (30 allocations: 96.13 MiB)

On MATLAB:

mA = rand(2048, 2048, 3, 'single');
mW = [0.25, 0, 0.25; 0, -1, 0; 0.25, 0, 0.25];
hF = @() imfilter(mA, mW, 'replicate');
timeit(hF)

I get:

ans =

    0.0113

So MATLAB is faster on my machine.

@timholy
Member Author

timholy commented Nov 29, 2022

Thanks for the benchmarks! A few thoughts:

  • Currently we only multithread separable kernels, and yours isn't. That's not a difficult fix and could be added by anyone; it's just that my own work only requires separable kernels, so that's where my priorities have been.
  • Using an @SMatrix for the kernel should be a bit better; care to try?
  • In local tests, LoopVectorization makes filtering a whole lot better, especially for SoA layouts. LoopVectorization has its own multithreading paradigm, so it might simply be best to transition to LV.

@RoyiAvital

RoyiAvital commented Nov 29, 2022

OK, I chose a non-separable kernel on purpose; for a separable kernel I'd probably use conv2() in MATLAB. Let me try it and report back.

@RoyiAvital

Just a sample point with a static array:

using Images;
using StaticArrays;
using StaticKernels;
using BenchmarkTools;

# Images
mA = rand(RGB{Float32}, 2048, 2048);
mB = rand(Float32, 2048, 2048, 3);

# Kernels
# mW = @SMatrix Float32.(centered([0.25 0 0.25; 0 -1 0; 0.25 0 0.25])); #<! The Filter
mW = centered(SMatrix{3, 3, Float32}([0.25 0 0.25; 0 -1 0; 0.25 0 0.25])); #<! The Filter
mK = @kernel w -> 0.25 * w[-1, -1] + 0.25 * w[-1, 1] - w[0, 0] + 0.25 * w[1, -1] + 0.25 * w[1, 1];

# Pre Allocated Results
mAO = Array{RGB{Float32}}(undef, 2048, 2048);
mBO = Array{Float32, 3}(undef, 2048, 2048, 3);

# Planar
@btime begin
    for ii in 1:3
        @views map!(mK, mBO[:, :, ii], extend(mB[:, :, ii], StaticKernels.ExtensionReplicate()));
    end
end

# Packed
@btime imfilter!(mAO, mA, mW, "replicate", ImageFiltering.Algorithm.FIR());
# @btime imfilter(mA, mW, "replicate", ImageFiltering.Algorithm.FIR());

What I get is:

  13.101 ms (39 allocations: 1.22 KiB)
  59.660 ms (28 allocations: 48.13 MiB)

So I don't see any improvement.
In a few minutes I will share the comparison to MATLAB on a separable kernel.

@RoyiAvital

RoyiAvital commented Nov 29, 2022

OK, here it is for a simple separable filter:

using Images;
using StaticArrays;
using StaticKernels;
using BenchmarkTools;

mA = rand(RGB{Float32}, 2048, 2048);

vW = [0.2f0 -0.5f0 0.3f0];
mW = vW' * vW;

vW = centered(vW);
mW = centered(mW);

@btime imfilter(mA, mW, "replicate", ImageFiltering.Algorithm.FIR());
@btime imfilter(mA, (vW', vW), "replicate", ImageFiltering.Algorithm.FIR());

I get ~85 ms for both (which makes sense, since according to the documentation imfilter() checks for separability).
The strange thing is that this is slower than the previous kernel.
I guess in the previous case it optimized out the zero coefficients. Could that be?
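As a side note, the separability that imfilter() detects can be checked directly: a 2-D kernel is separable exactly when it has rank 1, and the SVD recovers the 1-D factors. This is an illustrative helper (`separable_factors` is a hypothetical name, not ImageFiltering's own check):

```julia
using LinearAlgebra

# Illustrative sketch: a 2-D kernel W is separable iff rank(W) == 1;
# the leading singular triple then gives factors u, v with W ≈ u * v'.
function separable_factors(W::AbstractMatrix)
    rank(W) == 1 || return nothing
    F = svd(W)
    u = F.U[:, 1] .* sqrt(F.S[1])
    v = F.V[:, 1] .* sqrt(F.S[1])
    return u, v
end
```

For `mW = vW' * vW` above this recovers (up to scale and sign) the two 1-D factors, which is exactly what the `imfilter(mA, (vW', vW), ...)` call passes explicitly.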

In MATLAB:

mA = rand(2048, 2048, 3, 'single');
vW = [0.2, -0.5, 0.3];
mW = vW.' * vW;
hF = @() imfilter(mA, mW, 'replicate');
timeit(hF)

I get ~8 ms: about 10 times faster, and indeed faster than the previous kernel for MATLAB.

By the way, in MATLAB conv2() is even more performant if one is only after the valid part of the convolution.
Something like:

function [ mO ] = Conv2DRgb( vW, mI )

mO = zeros(size(mI), 'like', mI);

for ii = 1:3
    mO(:, :, ii) = conv2(vW, vW', mI(:, :, ii), 'same');
end

end

The 'valid' version is faster by a further ~10%, though above, for simplicity, I used 'same' (which is still faster).

Just for reference, I use MATLAB R2022b on Windows 10.
