
1.7.0-rc2: StackOverflowError with complex-valued matrix exp #886

Closed · JuliaLang/julia#43300 · @daviehh

Description

@daviehh

On macOS, using version 1.7.0-rc2, Julia just shows StackOverflowError, with no other information, when taking the matrix exp of a ~300×300 complex-valued matrix. Minimal example:

using LinearAlgebra

n = 300
m = rand(ComplexF64, n, n);
mex = exp(m);

I also ran it with --startup-file=no to make sure it's not a clash with other packages.

[Screenshot from 2021-11-08 not preserved]

My versioninfo() output:

Julia Version 1.7.0-rc2
Commit f23fc0d27a (2021-10-20 12:45 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.5.0)
  CPU: Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
  JULIA_MATHLINK = /Applications/Mathematica.app/Contents/Frameworks/mathlink.framework
  JULIA_MATHKERNEL = /Applications/Mathematica.app/Contents/MacOS/MathKernel

Same code runs fine with Version 1.6.3 (2021-09-23).

Hardware: MacBook Pro (Intel) with 16 GB RAM; Activity Monitor shows low memory usage.

In addition, the same code sometimes fails with ERROR: LoadError: ReadOnlyMemoryError() instead. I'm not sure how to reproduce that one.

Thanks!

Activity

vtjnash (Member) commented on Nov 8, 2021

I can confirm this (and it seems to be fixed on master). It looks like an OpenBLAS issue:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread'
  * frame #0: 0x000000011b9e23a8 libopenblas64_.0.3.13.dylib`zgetrf_parallel + 24
    frame #1: 0x000000011b9e2533 libopenblas64_.0.3.13.dylib`zgetrf_parallel + 419
    frame #2: 0x000000011b9e2533 libopenblas64_.0.3.13.dylib`zgetrf_parallel + 419
    frame #3: 0x000000011b9e2533 libopenblas64_.0.3.13.dylib`zgetrf_parallel + 419
    frame #4: 0x000000011b9e2533 libopenblas64_.0.3.13.dylib`zgetrf_parallel + 419
    frame #5: 0x000000011b9e2533 libopenblas64_.0.3.13.dylib`zgetrf_parallel + 419
    frame #6: 0x000000011b9e2533 libopenblas64_.0.3.13.dylib`zgetrf_parallel + 419
    frame #7: 0x000000011b9e2533 libopenblas64_.0.3.13.dylib`zgetrf_parallel + 419
    frame #8: 0x000000011b7e0372 libopenblas64_.0.3.13.dylib`zgesv_64_ + 402
    frame #9: 0x00000001100fc1a4
    frame #10: 0x00000001100fead8
    frame #11: 0x0000000110100ea4
    frame #12: 0x00000001063385f0 libjulia-internal.1.7.dylib`do_call + 208
    frame JuliaLang/julia#12: 0x00000001063385f0 libjulia-internal.1.7.dylib`do_call + 208
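The backtrace shows zgesv_64_ handing off to zgetrf_parallel, which then calls itself repeatedly until the stack runs out. Nobody in this thread proposes the following stopgap, and it has not been tested here. The idea is to limit OpenBLAS to one thread around the call, on the assumption that OpenBLAS then takes its single-threaded getrf path instead of zgetrf_parallel. In the sketch below, with_single_blas_thread is a hypothetical helper, not a LinearAlgebra function:

```julia
using LinearAlgebra

# Hypothetical helper (not part of LinearAlgebra): run `f` with BLAS limited to
# one thread, then restore the previous thread count even if `f` throws.
# Assumption: with one thread, OpenBLAS skips its parallel getrf code path.
function with_single_blas_thread(f)
    old = BLAS.get_num_threads()
    BLAS.set_num_threads(1)
    try
        return f()
    finally
        BLAS.set_num_threads(old)
    end
end

n = 300
m = rand(ComplexF64, n, n)
mex = with_single_blas_thread(() -> exp(m))  # same reproducer as above
```

This only works around the problem for code you control. It also gives up multithreading inside that call, so the proper fix remains a corrected OpenBLAS.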
daviehh (Author) commented on Nov 9, 2021

Looks like it. Current master uses libopenblas64_.0.3.17.dylib, and running

BLAS.lbt_forward("/path/to/libopenblas64_.0.3.17.dylib"; clear=true)

in 1.7.0-rc2 resolves the issue, so maybe just bump OpenBLAS_jll for 1.7?
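To check which library a session actually uses before and after such a forward, Julia 1.7's libblastrampoline layer reports its current configuration. The sketch below assumes Julia ≥ 1.7, where BLAS.get_config exists. In the macOS builds discussed here, the library file name carries the OpenBLAS version (libopenblas64_.0.3.13.dylib vs libopenblas64_.0.3.17.dylib), but other platforms may name the file differently:

```julia
using LinearAlgebra

# List the BLAS/LAPACK libraries libblastrampoline currently forwards to.
# After BLAS.lbt_forward(path; clear=true), the forwarded library should be
# listed here in place of the bundled one.
cfg = BLAS.get_config()
for lib in cfg.loaded_libs
    println(lib.libname)
end
println("BLAS threads: ", BLAS.get_num_threads())
```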

Labels bug (Something isn't working) and regression (Regression in behavior compared to a previous version) added on Nov 16, 2021
carstenbauer (Member) commented on Dec 1, 2021
stevengj (Member) commented on Dec 2, 2021

I think this has the same cause as the problem in the above-mentioned Discourse thread.

I'm getting the same issue with exp as above on 1.7.0 on macOS (x86_64). It boils down to a LAPACK call: I get a StackOverflowError from

using LinearAlgebra
n = 300
A = rand(ComplexF64,n,n)
B = copy(A)
LAPACK.gesv!(A,B)
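On an unaffected build (e.g. 1.6.3, or 1.7 with the newer OpenBLAS forwarded), the same gesv! call should return normally. On an affected build, the snippet below overflows in the same way. This is a small sanity check that relies only on the documented LAPACK.gesv!(A, B) -> (B, A, ipiv) behavior. The second right-hand side and the residual check are illustrative additions:

```julia
using LinearAlgebra

n = 300
A = rand(ComplexF64, n, n)
B = rand(ComplexF64, n, 2)        # two right-hand sides
A0, B0 = copy(A), copy(B)         # gesv! overwrites both inputs

# LAPACK.gesv! returns (X, LU factors, pivots); X is B overwritten in place.
X, F, ipiv = LAPACK.gesv!(A, B)

# On a working build the relative residual should be tiny.
@show norm(A0 * X - B0) / norm(B0)
```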

or even if I ccall LAPACK directly:

import LinearAlgebra.BLAS: @blasfunc, libblastrampoline, BlasInt
# A is ComplexF64, so the matching routine is zgesv_ (cgesv_ is the ComplexF32 variant)
ipiv = similar(A, BlasInt, n)
info = Ref{BlasInt}()
ccall((@blasfunc(zgesv_), libblastrampoline), Cvoid,
    (Ref{BlasInt}, Ref{BlasInt}, Ptr{ComplexF64}, Ref{BlasInt}, Ptr{BlasInt},
    Ptr{ComplexF64}, Ref{BlasInt}, Ptr{BlasInt}),
    n, size(B,2), A, max(1,stride(A,2)), ipiv, B, max(1,stride(B,2)), info)

Similarly, the StackOverflowError in the Discourse thread for inv(A) boils down to a ccall((@blasfunc(sgetrf_), libblastrampoline), ...).

carstenbauer (Member) commented on Dec 2, 2021
gbaraldi (Member) commented on Dec 2, 2021

It might be that macOS is more susceptible to these stack overflows because, unless I misunderstood, the default pthread stack size is 512 KB on macOS, while it is larger on other OSes: Linux seems to use 2 MB and Windows 1 MB.

added a commit that references this issue on Dec 2, 2021


Metadata

Assignees: No one assigned
Labels: bug (Something isn't working), external dependencies (Involves LLVM, OpenBLAS, or other linked libraries), regression (Regression in behavior compared to a previous version)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Participants: @carstenbauer, @vtjnash, @ViralBShah, @giordano, @KristofferC


        1.7.0-rc2: StackOverflowError with complex-valued matrix exp · Issue #886 · JuliaLang/LinearAlgebra.jl