
jl_compile_extern_c appears to be running at the wrong time in sysimg startup #41670

Closed
@vtjnash

Description

@vtjnash

#40715 seems to have broken precompilation for GPUCompiler.jl. Before, doing this:

precomp_script.jl:

using LinearAlgebra
m = 10

compile.jl:

using PackageCompiler
default_compile_dir() = joinpath(homedir(), ".julia", "sysimages")

default_compile_filename() = "sys_gputest.so"

default_compile_path() = joinpath(default_compile_dir(), default_compile_filename())

function compile(;
  dir::AbstractString=default_compile_dir(),
  filename::AbstractString=default_compile_filename(),
)
  if !isdir(dir)
    println("""The directory "$dir" doesn't exist yet, creating it now.""")
    println()
    mkdir(dir)
  end
  path = joinpath(dir, filename)
  println(
    """Creating the system image "$path" containing the compiled version of GPUCompiler. This may take a few minutes.""",
  )
  create_sysimage(
    :GPUCompiler;
    sysimage_path=path,
    precompile_execution_file=joinpath(@__DIR__, "precomp_script.jl"),
  )
  return path
end

compile()

and running julia --project=~/.julia/dev/GPUCompiler compile.jl followed by julia --sysimage ~/.julia/sysimages/sys_gputest.so allowed Julia to start up without problems. After this PR, the compilation process still succeeds, but as soon as we try to load the sysimage with a sanitized Julia build, we get:

$ ~/Julia/src/julia/build/sanitize/julia --sysimage /tmp/sys.so
AddressSanitizer:DEADLYSIGNAL
=================================================================
==260483==ERROR: AddressSanitizer: SEGV on unknown address 0x0000000001c8 (pc 0x7fb5d77e79a6 bp 0x7ffc28e2ed30 sp 0x7ffc28e2ed20 T0)
==260483==The signal is caused by a READ memory access.
==260483==Hint: address points to the zero page.
    #0 0x7fb5d77e79a6 in llvm::StringRef::StringRef(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /home/tim/Julia/src/julia/build/sanitize/usr/include/llvm/ADT/StringRef.h
    #1 0x7fb5d77e79a6 in jl_setup_module(llvm::Module*, jl_cgparams_t const*) /home/tim/Julia/src/julia/src/codegen.cpp:1740:24
    #2 0x7fb5d77e78b5 in jl_create_llvm_module(llvm::StringRef) /home/tim/Julia/src/julia/src/codegen.cpp:1746:5
    #3 0x7fb5d7af303c in jl_compile_extern_c /home/tim/Julia/src/julia/src/jitlayers.cpp:245:16
    #4 0x7fb5d7a171ae in jl_reinit_item /home/tim/Julia/src/julia/src/staticdata.c:1453:27
    #5 0x7fb5d7a171ae in jl_finalize_deserializer /home/tim/Julia/src/julia/src/staticdata.c:1473:9
    #6 0x7fb5d7a14c60 in jl_restore_system_image_from_stream /home/tim/Julia/src/julia/src/staticdata.c:1845:5
    #7 0x7fb5d7a134f6 in jl_restore_system_image_data /home/tim/Julia/src/julia/src/staticdata.c:1894:5
    #8 0x7fb5d7a134f6 in jl_load_sysimg_so /home/tim/Julia/src/julia/src/staticdata.c:375:5
    #9 0x7fb5d7a134f6 in jl_restore_system_image /home/tim/Julia/src/julia/src/staticdata.c:1867:9
    #10 0x7fb5d79dbb56 in julia_init /home/tim/Julia/src/julia/src/init.c:738:9
    #11 0x7fb5d7a8bd16 in jl_repl_entrypoint /home/tim/Julia/src/julia/src/jlapi.c:684:5
    #12 0x500c1d in main /home/tim/Julia/src/julia/cli/loader_exe.c:53:15
    #13 0x7fb5db52fb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
    #14 0x41f39d in _start (/home/tim/Julia/src/julia/build/sanitize/usr/bin/julia+0x41f39d)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /home/tim/Julia/src/julia/build/sanitize/usr/include/llvm/ADT/StringRef.h in llvm::StringRef::StringRef(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
==260483==ABORTING

(thanks @maleadt for this log)
The problem is a read from a NULL pointer plus an offset (the faulting address is 0x1c8):

(gdb) p jl_TargetMachine
$1 = (llvm::TargetMachine *) 0x0

Tim suggested that:

the GPUCompiler.ji deserialization triggers jl_compile_extern_c (for a @ccallable, I presume) from within jl_init before jl_init_codegen has initialized LLVM

Originally posted by @kshyatt in #40715 (comment)
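
For context, the kind of definition presumed above to trigger this path is a function marked with Base.@ccallable, which asks Julia to emit a C-callable entry point under the function's own name. A minimal sketch of such a definition follows. The name add_one is hypothetical and chosen only for illustration; it is not taken from GPUCompiler.jl, and whether GPUCompiler actually contains a @ccallable is only presumed in the comment above.

```julia
# Hypothetical example: `add_one` is an illustrative name, not code from GPUCompiler.jl.
# Base.@ccallable requires concrete argument types and an explicit return type,
# because the macro needs a fixed C signature for the exported entry point.
Base.@ccallable function add_one(x::Cint)::Cint
    return x + Cint(1)
end

# The function is still an ordinary Julia function and can be called as usual.
result = add_one(Cint(41))
```

If a package baked into the sysimage contains a definition like this, restoring the image would have to re-create the C entry point, which would be consistent with the jl_reinit_item and jl_compile_extern_c frames in the trace above.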
