Closed
Description
Starting an effort to document any hangs that are at least somewhat reproducible.
Observations/conditions:
- Can be interrupted with Ctrl-C; stack trace below (the other hangs, not so much)
- Some Julia instances will hang almost immediately (1st or 2nd run of groupby), others will never hang no matter how many runs (consistent with the other hangs)
- Julia master with all available fixes merged and Dagger with all available fixes merged
- Running with threads only
Thread usage during the hang: none
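Since the hang shows up only in some Julia instances, a small timeout wrapper could help count how many runs it takes before a call stalls. The following is a minimal sketch using only Base functions, not something taken from Dagger; the names `run_with_timeout` and `first_hanging_run`, the 120-second limit, and the 20-run cap are all hypothetical choices for illustration.

```julia
# Hypothetical helper: run `f` on its own task and stop waiting after `timeout` seconds.
# A task that hangs is not cancelled; it is simply left behind.
function run_with_timeout(f; timeout=120.0)
    t = Threads.@spawn f()
    status = timedwait(() -> istaskdone(t), timeout)
    return status === :ok ? fetch(t) : :timed_out
end

# Hypothetical helper: call `f` up to `runs` times and return the index of the
# first run that timed out, or `nothing` if every run finished in time.
function first_hanging_run(f; runs=20, timeout=120.0)
    for i in 1:runs
        if run_with_timeout(f; timeout=timeout) === :timed_out
            return i
        end
    end
    return nothing
end
```

With the `d` from the session below, a call such as `first_hanging_run(() -> Dagger.groupby(d, x -> round(x.a, digits=1)))` would then report which run, if any, stalled. A slow but healthy run would also be reported as a hang if it exceeds the limit, so the timeout would need to be generous.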
Stacktrace:
PS C:\Users\krynjupc\.julia\dev\Dagger> julia -t16
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.0-DEV.490 (2021-09-13)
 _/ |\__'_|_|_|\__'_|  |  kr/distributed-ref-count-race/dfd4724ce3 (fork: 2 commits, 8 days)
|__/                   |
(@v1.8) pkg> activate .
Activating project at `C:\Users\krynjupc\.julia\dev\Dagger`
julia> using Dagger, DataFrames, Arrow, OnlineStats
julia> d = DTable(Arrow.Table, "data/".*readdir("data"))
DTable with 100 partitions
Tabletype: unknown (use `tabletype!(::DTable)`)
julia> g = Dagger.groupby(d, x->round(x.a, digits=1));
ERROR: InterruptException:
Stacktrace:
[1] try_yieldto(undo::typeof(Base.ensure_rescheduled))
@ Base .\task.jl:764
[2] wait()
@ Base .\task.jl:824
[3] wait(c::Base.GenericCondition{ReentrantLock})
@ Base .\condition.jl:112
[4] fetch_buffered(c::Channel{Any})
@ Base .\channels.jl:366
[5] fetch(c::Channel{Any})
@ Base .\channels.jl:360
[6] fetch_ref(::Distributed.RRID)
@ Distributed C:\cygwin64\home\krynjupc\julia\usr\share\julia\stdlib\v1.8\Distributed\src\remotecall.jl:593
[7] call_on_owner
@ C:\cygwin64\home\krynjupc\julia\usr\share\julia\stdlib\v1.8\Distributed\src\remotecall.jl:546 [inlined]
[8] fetch(r::Distributed.Future)
@ Distributed C:\cygwin64\home\krynjupc\julia\usr\share\julia\stdlib\v1.8\Distributed\src\remotecall.jl:587
[9] (::Dagger.var"#73#74"{OSProc, Dagger.ThunkFuture})()
@ Dagger C:\Users\krynjupc\.julia\dev\Dagger\src\thunk.jl:132
[10] thunk_yield(f::Dagger.var"#73#74"{OSProc, Dagger.ThunkFuture})
@ Dagger.Sch C:\Users\krynjupc\.julia\dev\Dagger\src\sch\eager.jl:63
[11] fetch(t::Dagger.ThunkFuture; proc::OSProc)
@ Dagger C:\Users\krynjupc\.julia\dev\Dagger\src\thunk.jl:131
[12] fetch
@ C:\Users\krynjupc\.julia\dev\Dagger\src\thunk.jl:131 [inlined]
[13] fetch(t::Dagger.EagerThunk)
@ Dagger C:\Users\krynjupc\.julia\dev\Dagger\src\thunk.jl:193
[14] groupby(d::DTable, f::Function; merge::Bool, chunksize::Int64)
@ Dagger C:\Users\krynjupc\.julia\dev\Dagger\src\table\groupby.jl:70
[15] groupby(d::DTable, f::Function)
@ Dagger C:\Users\krynjupc\.julia\dev\Dagger\src\table\groupby.jl:57
[16] top-level scope
@ REPL[4]:1