Description
I have been running a performance comparison of Dask Futures against other solutions and found unexpected throughput behavior as I increase the number of tasks.
In this test, I am running on 6 nodes (client, scheduler, and 4 workers, 1 thread per worker); however, the behavior is consistent for 2-2048 workers. Each task is a dummy function returning its argument. I am timing this:
```python
futures = client.map(dummy, data, key=labels)
wait(futures, return_when='ALL_COMPLETED')
```
where `data` is a list of `None` values, `labels` is a list of unique keys, and the `dummy` function simply returns its argument. Complete code is attached.
Based on 30 repetitions for each configuration, and after discarding the first timing to account for any lazy initialization, I am getting the following results:
Is this characteristic something you would expect given such a trivial task? Is the scheduler not able to keep up with this many tasks?
Thank you for any insights!
Environment:
- dask==2021.12.0, dask_mpi==2021.11.0, distributed==2021.12.0
- Python 3.10.1
- Cray XC-40, each node has 32 cores and 128 GB of RAM
- Reproduced on another cluster with dask 2021.11
Complete code:
latency.py.txt