Closed
Description
Adding extra processes and scheduling with the eager API seems to produce an error and warnings about rescheduling due to workers dying. For example, this snippet taken from the README:
using Distributed; addprocs() # Add one Julia worker per CPU core
using Dagger
# This runs first:
a = Dagger.@spawn rand(100, 100)
# These run in parallel:
b = Dagger.@spawn sum(a)
c = Dagger.@spawn prod(a)
# Finally, this runs:
wait(Dagger.@spawn println("b: ", b, ", c: ", c))
This gives the following output:
From worker 2: b: 5061.860461804876, c: 0.0
┌ Warning: Worker 2 died, rescheduling work
└ @ Dagger.Sch ~/.julia/packages/Dagger/kBlIi/src/sch/Sch.jl:545
┌ Error: Error assigning workers
│ exception =
│ ProcessExitedException(2)
│ Stacktrace:
│ [1] worker_from_id(pg::Distributed.ProcessGroup, i::Int64)
│ @ Distributed ~/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/cluster.jl:1093
│ [2] worker_from_id
│ @ ~/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/cluster.jl:1090 [inlined]
│ [3] remote_do
│ @ ~/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/remotecall.jl:557 [inlined]
│ [4] cleanup_proc(state::Dagger.Sch.ComputeState, p::OSProc, log_sink::TimespanLogging.NoOpLog)
│ @ Dagger.Sch ~/.julia/packages/Dagger/kBlIi/src/sch/Sch.jl:408
│ [5] monitor_procs_changed!(ctx::Context, state::Dagger.Sch.ComputeState)
│ @ Dagger.Sch ~/.julia/packages/Dagger/kBlIi/src/sch/Sch.jl:890
│ [6] (::Dagger.Sch.var"#100#102"{Context, Dagger.Sch.ComputeState})()
│ @ Dagger.Sch ~/.julia/packages/Dagger/kBlIi/src/sch/Sch.jl:508
└ @ Dagger.Sch ~/.julia/packages/Dagger/kBlIi/src/sch/Sch.jl:510
┌ Warning: Worker 3 died, rescheduling work
└ @ Dagger.Sch ~/.julia/packages/Dagger/kBlIi/src/sch/Sch.jl:545
┌ Warning: Worker 12 died, rescheduling work
└ @ Dagger.Sch ~/.julia/packages/Dagger/kBlIi/src/sch/Sch.jl:545
┌ Warning: Worker 15 died, rescheduling work
└ @ Dagger.Sch ~/.julia/packages/Dagger/kBlIi/src/sch/Sch.jl:545
┌ Warning: Worker 13 died, rescheduling work
└ @ Dagger.Sch ~/.julia/packages/Dagger/kBlIi/src/sch/Sch.jl:545
┌ Warning: Worker 17 died, rescheduling work
└ @ Dagger.Sch ~/.julia/packages/Dagger/kBlIi/src/sch/Sch.jl:545
┌ Warning: Worker 14 died, rescheduling work
└ @ Dagger.Sch ~/.julia/packages/Dagger/kBlIi/src/sch/Sch.jl:545
┌ Warning: Worker 8 died, rescheduling work
└ @ Dagger.Sch ~/.julia/packages/Dagger/kBlIi/src/sch/Sch.jl:545
┌ Warning: Worker 11 died, rescheduling work
└ @ Dagger.Sch ~/.julia/packages/Dagger/kBlIi/src/sch/Sch.jl:545
┌ Warning: Worker 4 died, rescheduling work
└ @ Dagger.Sch ~/.julia/packages/Dagger/kBlIi/src/sch/Sch.jl:545
┌ Warning: Worker 5 died, rescheduling work
└ @ Dagger.Sch ~/.julia/packages/Dagger/kBlIi/src/sch/Sch.jl:545
┌ Warning: Worker 10 died, rescheduling work
└ @ Dagger.Sch ~/.julia/packages/Dagger/kBlIi/src/sch/Sch.jl:545
┌ Warning: Worker 6 died, rescheduling work
└ @ Dagger.Sch ~/.julia/packages/Dagger/kBlIi/src/sch/Sch.jl:545
┌ Warning: Worker 7 died, rescheduling work
└ @ Dagger.Sch ~/.julia/packages/Dagger/kBlIi/src/sch/Sch.jl:545
┌ Warning: Worker 16 died, rescheduling work
└ @ Dagger.Sch ~/.julia/packages/Dagger/kBlIi/src/sch/Sch.jl:545
┌ Warning: Worker 9 died, rescheduling work
└ @ Dagger.Sch ~/.julia/packages/Dagger/kBlIi/src/sch/Sch.jl:545
The error is sometimes omitted, but the warnings about workers dying are still present.
If the lazy API is used, there are no warnings or errors.
The warnings seem to be harmless, since they appear only while the job is finishing.
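For comparison, here is a rough sketch of what the lazy-API version of the same snippet might look like. The exact code used for the lazy run is not shown in this report, so this is an assumption: it uses `Dagger.delayed` to build the graph and `collect` to run it, which is how the lazy API is usually written.

```julia
using Distributed; addprocs() # Add one Julia worker per CPU core, as in the eager snippet
using Dagger

# Assumed lazy equivalent of the README snippet (not taken from the original report).
# Nothing runs yet; each call only records a node in the task graph.
a = Dagger.delayed(rand)(100, 100)
b = Dagger.delayed(sum)(a)
c = Dagger.delayed(prod)(a)

# Collecting the final node runs a first, then b and c, then the println.
collect(Dagger.delayed(println)("b: ", b, ", c: ", c))
```

If this variant finishes cleanly while the `Dagger.@spawn` version prints the warnings above, that would point at the eager scheduler's shutdown handling rather than at the workers themselves.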
versioninfo:
Julia Version 1.10.4
Commit 48d4fd48430 (2024-06-04 10:41 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 16 × AMD Ryzen 7 5700G with Radeon Graphics
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 16 virtual cores)
Dagger: 0.18.11
I couldn't find any duplicate issues.