Parallelization Framework for ImgLib2 #269
TaskExecutor should be used in image processing algorithms to realize parallelization instead of ExecutorService. TaskExecutor has a single-threaded implementation that runs with no parallelization overhead, and it is easier to use than ExecutorService. The Parallelization class allows setting one TaskExecutor per thread. This means that algorithms don't need a TaskExecutor argument. Instead, an algorithm can be called like this: Parallelization.runSingleThreaded( () -> myAlgorithm() ); or Parallelization.runMultiThreaded( () -> myAlgorithm() );
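To make the contrast with ExecutorService concrete, here is a minimal sketch using only the Java standard library. It is not the ImgLib2 interface: `MiniTaskExecutor`, `TaskExecutorSketch`, `chunks` and `fillWithOnes` are hypothetical names invented for illustration, a `double[]` stands in for an image, and the factor of 4 tasks per thread is an assumption of the sketch. The single-threaded variant is a plain loop, while the multi-threaded variant hides the `Callable`, `invokeAll` and exception-handling boilerplate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/** Hypothetical, simplified stand-in for the TaskExecutor idea (not the ImgLib2 interface). */
interface MiniTaskExecutor {
    int suggestNumberOfTasks();
    <T> void forEach(List<? extends T> items, Consumer<? super T> action);
}

final class TaskExecutorSketch {

    /** Single-threaded: one task, a plain loop, no threads or pools involved. */
    static final MiniTaskExecutor SINGLE_THREADED = new MiniTaskExecutor() {
        @Override public int suggestNumberOfTasks() { return 1; }
        @Override public <T> void forEach(List<? extends T> items, Consumer<? super T> action) {
            for (T item : items) action.accept(item);
        }
    };

    /** Multi-threaded: wraps an ExecutorService and hides its boilerplate. */
    static MiniTaskExecutor multiThreaded(ExecutorService service, int threads) {
        return new MiniTaskExecutor() {
            // Assumption of this sketch: a few tasks per thread for load balancing.
            @Override public int suggestNumberOfTasks() { return threads * 4; }
            @Override public <T> void forEach(List<? extends T> items, Consumer<? super T> action) {
                List<Callable<Void>> callables = new ArrayList<>();
                for (T item : items) {
                    callables.add(() -> { action.accept(item); return null; });
                }
                try {
                    // Wait for every task and surface the first failure.
                    for (Future<Void> future : service.invokeAll(callables)) future.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(e);
                } catch (ExecutionException e) {
                    throw new RuntimeException(e.getCause());
                }
            }
        };
    }

    /** Splits [0, length) into at most `parts` contiguous [start, end) ranges. */
    static List<int[]> chunks(int length, int parts) {
        List<int[]> ranges = new ArrayList<>();
        int n = Math.max(1, Math.min(parts, length));
        for (int i = 0; i < n; i++) {
            int start = (int) ((long) length * i / n);
            int end = (int) ((long) length * (i + 1) / n);
            ranges.add(new int[] { start, end });
        }
        return ranges;
    }

    /** The algorithm itself is identical for both executors. */
    static void fillWithOnes(double[] data, MiniTaskExecutor executor) {
        List<int[]> ranges = chunks(data.length, executor.suggestNumberOfTasks());
        executor.forEach(ranges, r -> Arrays.fill(data, r[0], r[1], 1.0));
    }

    public static void main(String[] args) {
        double[] a = new double[1000];
        fillWithOnes(a, SINGLE_THREADED);

        ExecutorService service = Executors.newFixedThreadPool(4);
        try {
            double[] b = new double[1000];
            fillWithOnes(b, multiThreaded(service, 4));
            System.out.println("same result: " + Arrays.equals(a, b));
        } finally {
            service.shutdown();
        }
    }
}
```

With one suggested task, `chunks` returns a single range covering the whole array, so the single-threaded path adds nothing beyond one loop over the data.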
@maarzt this looks good, great work! I added a few comments to the code. Additionally, I think that a method Optional<int[]> TaskExecutor.suggestChunkSize(Dimensions dimensions) would be super useful if a certain block/chunk size is required, e.g. when writing to N5 in |
@hanslovsky Thank you for reviewing this so quickly! ❤️ Please elaborate a little more on this. @gselzer Do I remember correctly that you are working on Ops? I am really curious whether this PR works well together with Ops. Could it be used for parallelization there? |
@maarzt very nice! Besides minor issues/questions (see inline comments) I like LoopBuilder changes are also fine. |
In many of our tasks, the way we parallelize is ultimately determined by the block/chunk/cell size of the data that we write out, eventually. In particular, that means that we need to provide a
It may be preferable to always pass it as an explicit parameter to the method anyway, so callers are aware that they need to set this parameter explicitly, instead of falling back to a default value (if any) that makes sense for the number of tasks, but not for the block/cell/chunk size. This may also be more of a concern for distributed memory computations. |
This probably makes the |
Yeah, maybe the scijava-ops framework should make use of it. Hopefully I'll have time to do some experiments before the hackathon is over. |
@maarzt sorry for the delayed answer to your question, but yeah, I think this could be super useful in Ops. As scijava/scijava#31 suggests we have no good multithreading solution as of this new iteration of ops; I hacked together the |
@maarzt @tpietzsch Can this PR please be resolved and merged soon? I need it for scijava-ops, as well as for pyimagej. This work hopefully solves #237. |
This is the realization of an idea for how to organize ImgLib2-related multi-threaded code. I previously discussed it with @tpietzsch, @hanslovsky, @axtimwalde. Your thoughts are very welcome.
The two concepts in this PR are the interface `TaskExecutor` and the class `Parallelization`:

TaskExecutor

`TaskExecutor` is an interface, not a class. It is similar to `ExecutorService` but offers simpler methods, which makes it better suited to image processing algorithms. The following example uses `TaskExecutor` to fill an image with all ones in a multi-threaded way:

This method can be called by `fillImageWithOnes( image, TaskExecutors.multiThreaded() )`. It is also possible to run it single-threaded with `fillImageWithOnes( image, TaskExecutors.singleThreaded() )`.

`TaskExecutors.singleThreaded()` is very lightweight. It requires no resources, threads, or memory allocation, and can therefore also be used in very tight loops. When single-threaded, `taskExecutor.suggestNumberOfTasks()` returns 1, and `taskExecutor.forEach(...)` is just a simple for loop. This means that `splitImageIntoChunks( image, 1 )` can just return the original image, and as a consequence `fillImageWithOnes` will run with optimal performance in both modes: single- and multi-threaded.

Let's imagine what the method would look like if `ExecutorService` is used:

- `Runtime.getRuntime().availableProcessors() * 4`.
- `Callable<Void>`.
- `List<Future<Void>> futures = executorService.invokeAll(callables)`.
- `ExecutionException` and `InterruptedException`.

The `ExecutorService`-based function is way more complicated. To run it single-threaded one would use `Executors.newSingleThreadExecutor()`, which has a big overhead.

Parallelization
In ImgLib2 it's common for a multi-threaded algorithm to have some additional parameters: `ExecutorService`, `numThreads`, `numberOfTask`, or a combination of these. This means the user needs to specify an `ExecutorService` or a number of threads. But most of the time a user doesn't want to care about that, and just wants to run the method quickly.

I suggest the following approach:

The class `Parallelization` associates a `TaskExecutor` with each thread. (`ThreadLocal<TaskExecutor>` is used internally.) `Parallelization.getTaskExecutor()` returns the task executor for the current thread. `Parallelization.runSingleThreaded( task )` first sets the task executor of the current thread to `TaskExecutors.singleThreaded()`, then executes the task, and finally restores the current thread's task executor to its original value. The other methods, `runMultiThreaded` and `runWithExecutor`, work similarly.

This approach has several advantages:

- `TaskExecutor` doesn't need to be passed around as a parameter.
- `ForkJoinPool`.
- `TaskExecutor` setting with a thread. The threads in a fixed thread pool (`Executors.newFixedThreadPool()`) shouldn't do nested parallelization. That's why these threads need their task executors set to `TaskExecutors.singleThreaded()`. (This is done for you if you just use `TaskExecutors.fixedThreadPool()`.)

LoopBuilder & ImgLib2-Algorithm
I used this approach in `LoopBuilder`. Let's write the `fillImageWithOnes` method once more:

It's possible to execute this method single- and multi-threaded with `Parallelization.runSingleThreaded()`, etc.

I used the presented approach in imglib2-algorithm. I will put a link below.
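How a method without any executor parameter can still be switched between single- and multi-threaded execution can be sketched with a thread-local setting. The following is only an illustration of the mechanism under stated assumptions, not the ImgLib2 or `LoopBuilder` code: `ParallelizationSketch`, `Runner`, `runWith` and `current` are hypothetical names, a `double[]` stands in for an image, a parallel stream stands in for the multi-threaded executor, and multi-threaded as the default is an assumption of the sketch.

```java
import java.util.function.IntConsumer;
import java.util.stream.IntStream;

final class ParallelizationSketch {

    /** Hypothetical stand-in for a task executor: calls body for each index in [0, n). */
    interface Runner {
        void forEachIndex(int n, IntConsumer body);
    }

    /** Single-threaded: a plain for loop. */
    static final Runner SINGLE = (n, body) -> {
        for (int i = 0; i < n; i++) body.accept(i);
    };

    /** Multi-threaded: a parallel stream over the index range. */
    static final Runner MULTI = (n, body) -> IntStream.range(0, n).parallel().forEach(body);

    /** One setting per thread; the default here is an assumption of this sketch. */
    private static final ThreadLocal<Runner> CURRENT = ThreadLocal.withInitial(() -> MULTI);

    /** Returns the runner associated with the calling thread. */
    static Runner current() {
        return CURRENT.get();
    }

    /** Sets the runner for the calling thread, runs the task, then restores the old value. */
    static void runWith(Runner runner, Runnable task) {
        Runner previous = CURRENT.get();
        CURRENT.set(runner);
        try {
            task.run();
        } finally {
            CURRENT.set(previous);
        }
    }

    static void runSingleThreaded(Runnable task) { runWith(SINGLE, task); }

    static void runMultiThreaded(Runnable task) { runWith(MULTI, task); }

    /** The algorithm takes no executor argument; it looks up the current setting. */
    static void fillWithOnes(double[] data) {
        current().forEachIndex(data.length, i -> data[i] = 1.0);
    }

    public static void main(String[] args) {
        double[] image = new double[1_000];
        runSingleThreaded(() -> fillWithOnes(image));
        runMultiThreaded(() -> fillWithOnes(image));
        System.out.println("first value: " + image[0]);
    }
}
```

The `try`/`finally` in `runWith` is what makes nesting safe: an inner `runSingleThreaded` call cannot leak its setting into the caller's remaining work.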
Backwards compatibility
There is a whole bunch of methods that ensure compatibility with the other approaches: