
Julia can hang when using threading, on 1.7 beta2 and 1.8 #41407

@PallHaraldsson

A.

$ time ~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O2 fasta.jl 25000000 > /dev/null

real	0m3,032s
user	0m8,379s
sys	0m0,883s

then immediately after:

$ time sh -c ~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O2 fasta.jl 25000000 > /dev/null

^C
^C
^C
Terminated

real	4m25,588s
user	0m0,004s
sys	0m0,000s

Note that CTRL-C did not kill it; I had to use the kill command (from another terminal tab). I previously ran into hangs with hyperfine, which I do not believe is implicated (it does the same as the latter invocation), but with it I could kill with CTRL-C, meaning hyperfine got killed while julia was still running in the background.
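To avoid having to kill a stuck run by hand from another tab, the run can be bounded with a time limit. The following is only a sketch: it assumes GNU coreutils `timeout` is installed (it exits with status 124 when the limit is hit), `run_bounded` is a hypothetical helper name, and `sleep` stands in for the julia invocation.

```shell
# Hypothetical helper: run a command under a time limit and report whether
# it finished or had to be stopped. Assumes GNU coreutils `timeout`.
run_bounded() {
    limit=$1
    shift
    timeout "$limit" "$@"
    rc=$?
    if [ "$rc" -eq 124 ]; then
        # 124 is the status `timeout` returns when the limit was reached
        echo "timed out after ${limit}s"
    else
        echo "exited with status $rc"
    fi
}

run_bounded 1 sleep 5   # stand-in for a run that hangs
run_bounded 5 true      # stand-in for a run that finishes normally

# Intended use (not run here):
# run_bounded 60 ~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O2 fasta.jl 25000000 > /dev/null
```

`timeout` sends SIGTERM by default; if the process ignores it, the `-k` option can be used to follow up with SIGKILL after a grace period.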

I would mark this as a regression. I have done plenty of benchmarking before, often with hyperfine for 1.6, or with time (or just @btime), and this never happened with hyperfine. However, I am new to using plain "sh -c" (which hyperfine uses implicitly). See #39598 (comment).
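One thing that may be worth ruling out, since "sh -c" is new here: POSIX sh takes only the first argument after -c as the command string, and any further words become $0, $1, and so on. If that applies to the unquoted invocation above, the shell would start julia without the flags or the script, i.e. an interactive session with stdout sent to /dev/null, which could look like a hang. The sketch below uses echo as a stand-in to show the difference; it illustrates sh behaviour and is not a confirmed diagnosis of this issue.

```shell
# Unquoted: the command string is just "echo"; the other words become $0, $1, $2
unquoted=$(sh -c echo one two three)

# Quoted: the whole string is the command
quoted=$(sh -c 'echo one two three')

# Make the positional parameters visible
args=$(sh -c 'echo "0=$0 1=$1 2=$2"' one two three)

printf 'unquoted: [%s]\nquoted:   [%s]\nargs:     [%s]\n' "$unquoted" "$quoted" "$args"
```

If this turns out to be the cause, quoting the whole command should behave like the direct invocation:

sh -c '~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O2 fasta.jl 25000000' > /dev/null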

This affects 1.7.0-beta2 too (tested later):

$ time sh -c ~/julia-1.7-DEV-da96fef327/bin/julia --startup-file=no -t4 -O2 fasta.jl 25000000 > /dev/null
Terminated

real	1m31,985s
user	0m0,001s
sys	0m0,003s

and, strangely, also when doing "nothing":

Note: I believe I pressed ENTER, even though ERROR appears on the same line. At least it was quick:

$ ~/julia-1.7-DEV-da96fef327/bin/juliaERROR: TaskFailedException
Stacktrace:
  [1] try_yieldto(undo::typeof(Base.ensure_rescheduled))
    @ Base ./task.jl:705
  [2] wait
    @ ./task.jl:764 [inlined]
  [3] wait(c::Base.GenericCondition{ReentrantLock})
    @ Base ./condition.jl:113
  [4] take_buffered(c::Channel{Any})
    @ Base ./channels.jl:389
  [5] take!
    @ ./channels.jl:383 [inlined]
  [6] repl_backend_loop(backend::REPL.REPLBackend)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:195
  [7] start_repl_backend(backend::REPL.REPLBackend, consumer::Any)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:185
  [8] run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:317
  [9] run_repl(repl::REPL.AbstractREPL, consumer::Any)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:305
 [10] (::Base.var"#914#916"{Bool, Bool, Bool})(REPL::Module)
    @ Base ./client.jl:394
 [11] #invokelatest#2
    @ ./essentials.jl:721 [inlined]
 [12] invokelatest
    @ ./essentials.jl:719 [inlined]
 [13] run_main_repl(interactive::Bool, quiet::Bool, banner::Bool, history_file::Bool, color_set::Bool)
    @ Base ./client.jl:379
 [14] exec_options(opts::Base.JLOptions)
    @ Base ./client.jl:309
 [15] _start()
    @ Base ./client.jl:492

    nested task error: IOError: stream is closed or unusable
    Stacktrace:
     [1] check_open(x::Base.TTY)
       @ Base ./stream.jl:386
     [2] raw!(t::REPL.Terminals.TTYTerminal, raw::Bool)
       @ REPL.Terminals /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/Terminals.jl:138
     [3] prompt!(term::REPL.Terminals.TextTerminal, prompt::REPL.LineEdit.ModalInterface, s::REPL.LineEdit.MIState)
       @ REPL.LineEdit /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/LineEdit.jl:2602
     [4] run_interface(terminal::REPL.Terminals.TextTerminal, m::REPL.LineEdit.ModalInterface, s::REPL.LineEdit.MIState)
       @ REPL.LineEdit /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/LineEdit.jl:2481
     [5] run_frontend(repl::REPL.LineEditREPL, backend::REPL.REPLBackendRef)
       @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:1126
     [6] (::REPL.var"#44#49"{REPL.LineEditREPL, REPL.REPLBackendRef})()
       @ REPL ./task.jl:406
    
    caused by: IOError: read: i/o error (EIO)
    Stacktrace:
     [1] wait_readnb(x::Base.TTY, nb::Int64)
       @ Base ./stream.jl:408
     [2] eof(s::Base.TTY)
       @ Base ./stream.jl:106
     [3] eof(io::REPL.Terminals.TTYTerminal)
       @ Base ./io.jl:416
     [4] match_input(k::Dict{Char, V} where V, s::Union{Nothing, REPL.LineEdit.MIState}, term::Union{REPL.Terminals.AbstractTerminal, IOBuffer}, cs::Vector{Char}, keymap::Dict{Char, V} where V) (repeats 4 times)
       @ REPL.LineEdit /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/LineEdit.jl:1475
     [5] prompt!(term::REPL.Terminals.TextTerminal, prompt::REPL.LineEdit.ModalInterface, s::REPL.LineEdit.MIState)
       @ REPL.LineEdit /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/LineEdit.jl:2571
     [6] run_interface(terminal::REPL.Terminals.TextTerminal, m::REPL.LineEdit.ModalInterface, s::REPL.LineEdit.MIState)
       @ REPL.LineEdit /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/LineEdit.jl:2481
     [7] run_frontend(repl::REPL.LineEditREPL, backend::REPL.REPLBackendRef)
       @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:1126
     [8] (::REPL.var"#44#49"{REPL.LineEditREPL, REPL.REPLBackendRef})()

then nothing for a while until I pressed ENTER (again?!) and got:

   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.7.0-DEV.714 (2021-03-16)
 _/ |\__'_|_|_|\__'_|  |  Commit da96fef327 (105 days old master)

This is getting strange: running that julia version straight afterwards was no problem. Either it is random, or my machine is acting up, or both. Any idea what could cause it? Everything else seems to work fine. I do not recall whether I was using that exact git commit of 1.7 before.

I was probably using all three of my 1.7 downloads frequently before (not sure which one most), and running each of them once went OK just now:

$ ~/julia-1.7-
julia-1.7-a7848a28e5/     julia-1.7-DEV-da96fef327/ julia-1.7-DEV-f2ea26d1a1/ 

B.
Since I got "ERROR: TaskFailedException" on the same line, is it plausible that it came from the previous invocation, i.e. the one run with sh? If not, then sh need not be implicated at all. I guess that is the most likely reason, since I later ran that 1.7 version 20 times in a row without any problems.
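For an intermittent problem like this, repeating the run in a loop and counting failures gives a rough idea of how often it occurs. This is a minimal sketch under assumptions: `JULIA_CMD` is a hypothetical variable holding the command to test (defaulting to `true` as a placeholder), and GNU coreutils `timeout` is used so that a hung run counts as a failure instead of blocking the loop.

```shell
# Hypothetical: JULIA_CMD holds the command under test, e.g.
#   JULIA_CMD='~/julia-1.7-DEV-da96fef327/bin/julia --startup-file=no -t4 -O2 fasta.jl 25000000'
cmd=${JULIA_CMD:-true}   # placeholder command when nothing is set
runs=20
failures=0
i=1
while [ "$i" -le "$runs" ]; do
    # A non-zero status covers both a crash and a timeout (status 124)
    if ! timeout 60 sh -c "$cmd" > /dev/null 2>&1; then
        failures=$((failures + 1))
    fi
    i=$((i + 1))
done
echo "$failures of $runs runs failed or timed out"
```

Note that the command is passed to sh -c as a single quoted string here, so the flags and script name stay part of the command.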

Labels: multithreading (Base.Threads and related functionality)
