
Dask Futures performance as the number of tasks increases #5715

Open
@mrogowski

Description


I have been running a performance comparison of Dask Futures and other solutions and found unexpected behavior when it comes to the throughput as I increase the number of tasks.

In this test, I am running on 6 nodes (1 client, 1 scheduler, and 4 workers, with 1 thread per worker); however, the behavior is consistent for 2-2048 workers. Each task is a dummy function returning its argument. I am timing this:

futures = client.map(dummy, data, key=labels)
wait(futures, return_when="ALL_COMPLETED")

where data is a list of None values, labels is a list of unique keys, and the dummy function simply returns its argument. Complete code is attached.
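The setup above can be sketched as follows. This is a minimal sketch, assuming dask.distributed; the key-naming scheme and the function names are assumptions for illustration, not taken from the attached script:

```python
import time

def dummy(x):
    # Trivial task: return the argument unchanged.
    return x

def run_benchmark(client, wait, n_tasks):
    # data is a list of None; labels must be unique task keys
    # (this particular naming scheme is an assumption).
    data = [None] * n_tasks
    labels = [f"dummy-{i}" for i in range(n_tasks)]

    start = time.perf_counter()
    futures = client.map(dummy, data, key=labels)
    wait(futures, return_when="ALL_COMPLETED")
    elapsed = time.perf_counter() - start

    return n_tasks / elapsed  # throughput in tasks/second
```

Against a real cluster this would be driven with `Client` and `wait` imported from `dask.distributed`, e.g. `run_benchmark(Client(scheduler_address), wait, n_tasks)`.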

Based on 30 repetitions for each configuration and after discarding the first timing to account for any lazy initialization, I am getting the following results:
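The reduction described above (repeat each configuration, discard the first timing to absorb lazy initialization, then summarize the rest) can be sketched as follows; the function name and the choice of summary statistics are assumptions:

```python
import statistics

def summarize(timings):
    """Summarize per-repetition wall-clock times (in seconds),
    discarding the first repetition to account for lazy initialization."""
    warm = timings[1:]  # drop the first (warm-up) repetition
    return {
        "median_s": statistics.median(warm),
        "mean_s": statistics.fmean(warm),
        "stdev_s": statistics.stdev(warm),
    }
```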

[plot: measured throughput vs. number of tasks for each configuration]

Is this behavior something you would expect given such a trivial task? Is the scheduler unable to keep up with this many tasks?

Thank you for any insights!

Environment:

  • dask==2021.12.0, dask_mpi==2021.11.0, distributed==2021.12.0
  • Python 3.10.1
  • Cray XC-40, each node has 32 cores and 128 GB of RAM
  • reproduced on another cluster with dask 2021.11

Complete code:
latency.py.txt
