Description
The current @threads for
loop breaks the work into equal length chunks and runs them on available pooled threads. This could be improved by using a work stealing technique like the one used by Rust Rayon in which the work to be done is queued and threads that have some free time can pop work from this queue.
Work stealing can result in faster parallel execution due to these significant advantages:
-
More efficient cache utilization: all threads use similar memory locations. Unlike the current implementation where memory is divided into large chunks with work stealing each thread works on a single item per time
-
Load balancing: when an item takes more time to process other threads will not wait for it to finish, instead they will continue stealing work. In the current implementation one thread can finish all work significantly faster than others and just wait for all to finish instead of working
Haven't written an implementation yet, just wanted to start a discussion