Hang #1 in Eager API usage #282

Closed
@krynju

Starting an effort to document any hangs that are at least somewhat reproducible.

Observations/conditions:

  1. Can be interrupted with Ctrl-C; the stacktrace is below (other hangs mostly can't be interrupted)
  2. Some Julia instances hang almost immediately (1st or 2nd run of groupby); others never hang no matter how many runs (consistent with other hangs)
  3. Julia master and Dagger, both with all available fixes merged
  4. Running with threads only (`julia -t16`)

Thread usage during the hang: none

Stacktrace:

PS C:\Users\krynjupc\.julia\dev\Dagger> julia -t16
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.0-DEV.490 (2021-09-13)
 _/ |\__'_|_|_|\__'_|  |  kr/distributed-ref-count-race/dfd4724ce3 (fork: 2 commits, 8 days)
|__/                   |

(@v1.8) pkg> activate .
  Activating project at `C:\Users\krynjupc\.julia\dev\Dagger`

julia> using Dagger, DataFrames, Arrow, OnlineStats

julia> d = DTable(Arrow.Table, "data/".*readdir("data"))
DTable with 100 partitions
Tabletype: unknown (use `tabletype!(::DTable)`)

julia> g = Dagger.groupby(d, x->round(x.a, digits=1));
ERROR: InterruptException:
Stacktrace:
  [1] try_yieldto(undo::typeof(Base.ensure_rescheduled))
    @ Base .\task.jl:764
  [2] wait()
    @ Base .\task.jl:824
  [3] wait(c::Base.GenericCondition{ReentrantLock})
    @ Base .\condition.jl:112
  [4] fetch_buffered(c::Channel{Any})
    @ Base .\channels.jl:366
  [5] fetch(c::Channel{Any})
    @ Base .\channels.jl:360
  [6] fetch_ref(::Distributed.RRID)
    @ Distributed C:\cygwin64\home\krynjupc\julia\usr\share\julia\stdlib\v1.8\Distributed\src\remotecall.jl:593
  [7] call_on_owner
    @ C:\cygwin64\home\krynjupc\julia\usr\share\julia\stdlib\v1.8\Distributed\src\remotecall.jl:546 [inlined]
  [8] fetch(r::Distributed.Future)
    @ Distributed C:\cygwin64\home\krynjupc\julia\usr\share\julia\stdlib\v1.8\Distributed\src\remotecall.jl:587
  [9] (::Dagger.var"#73#74"{OSProc, Dagger.ThunkFuture})()
    @ Dagger C:\Users\krynjupc\.julia\dev\Dagger\src\thunk.jl:132
 [10] thunk_yield(f::Dagger.var"#73#74"{OSProc, Dagger.ThunkFuture})
    @ Dagger.Sch C:\Users\krynjupc\.julia\dev\Dagger\src\sch\eager.jl:63
 [11] fetch(t::Dagger.ThunkFuture; proc::OSProc)
    @ Dagger C:\Users\krynjupc\.julia\dev\Dagger\src\thunk.jl:131
 [12] fetch
    @ C:\Users\krynjupc\.julia\dev\Dagger\src\thunk.jl:131 [inlined]
 [13] fetch(t::Dagger.EagerThunk)
    @ Dagger C:\Users\krynjupc\.julia\dev\Dagger\src\thunk.jl:193
 [14] groupby(d::DTable, f::Function; merge::Bool, chunksize::Int64)
    @ Dagger C:\Users\krynjupc\.julia\dev\Dagger\src\table\groupby.jl:70
 [15] groupby(d::DTable, f::Function)
    @ Dagger C:\Users\krynjupc\.julia\dev\Dagger\src\table\groupby.jl:57
 [16] top-level scope
    @ REPL[4]:1
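
Since only some Julia instances hang, it may help to put a time limit on each run so that a hung run can be told apart from a slow one. Below is a minimal sketch, assuming a hypothetical helper `run_status` (not part of Dagger) built on Base's `timedwait`; the timeout values are arbitrary.

```julia
# Hypothetical helper, not part of Dagger: run `f` on its own task and report
# whether it finished, failed, or was still running after `timeout` seconds.
function run_status(f; timeout::Real = 60.0, pollint::Real = 0.5)
    task = Threads.@spawn f()
    # `timedwait` polls the callback and returns :ok or :timed_out.
    status = timedwait(() -> istaskdone(task), timeout; pollint = pollint)
    status === :ok || return :timed_out
    return istaskfailed(task) ? :failed : :finished
end

# Assumed usage with `d` from the session above (left commented out because it
# needs the Arrow data files):
# for i in 1:10
#     s = run_status(() -> Dagger.groupby(d, x -> round(x.a, digits=1)); timeout = 120)
#     println("run ", i, ": ", s)
#     s === :timed_out && break
# end
```

The helper only reports the timeout; it does not cancel the task, so a hung `groupby` keeps waiting in the background and the session may still need Ctrl-C or a restart.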
