ppc64le - Segfault during test: ./xeigtstz < nep.in > znep.out 2>&1
#85
Comments
FWIW, the problem also occurred with gfortran:
./xeigtstz < nep.in > znep.out 2>&1
Also occurs with -O0 (no opt).
$200 bounty open to fix this!
On a typical Amazon AWS instance, I can compile LAPACK with -O0 and -frecursive, and the compilation and tests run all the way through without a problem. However, when I go back into the TESTING directory and try to run valgrind or gdb on xeigtstz, it segfaults within zchkee.f at line 1134, upon trying to initialize a gigantic local array. Recompiling without -frecursive, I no longer get the segfault.
My theory: -frecursive forces local arrays to be allocated on the stack (avoiding static variables is required for thread safety; I was the one who originally recommended this flag). The large local arrays within the test routines, which would otherwise be static, blow out the stack (just the A array in zchkee is almost 1 MB). You may need to increase the stack size to make the tests run correctly if you compile with -frecursive.
Edit: I should also add that valgrind finds thousands of illegal memory accesses in the main test drivers as well. I don't know enough about Fortran WRITE and READ statements to know whether these are spurious errors.
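To make the stack-size theory concrete, here is a minimal sketch of the arithmetic: it compares the size of one local array against a stack limit in KiB, as reported by `ulimit -s`. The helper name `fits_in_stack` and the 250 x 250 dimension are assumptions chosen only to give roughly 1 MB of COMPLEX*16 data; they are not taken from zchkee.f.

```shell
#!/bin/sh
# Hypothetical helper (not part of LAPACK): does an array fit under a stack limit?
# usage: fits_in_stack <n_elements> <bytes_per_element> <limit_kib|unlimited>
fits_in_stack() {
    n_elems=$1
    elem_bytes=$2
    limit_kib=$3
    if [ "$limit_kib" = "unlimited" ]; then
        echo "fits"
        return 0
    fi
    # Round the array size up to whole KiB.
    need_kib=$(( (n_elems * elem_bytes + 1023) / 1024 ))
    if [ "$need_kib" -lt "$limit_kib" ]; then
        echo "fits"
    else
        echo "too large: needs ${need_kib} KiB, limit ${limit_kib} KiB"
        return 1
    fi
}

# Illustrative dimensions only: a 250 x 250 array of 16-byte elements,
# checked against the current soft stack limit of this shell.
fits_in_stack $((250 * 250)) 16 "$(ulimit -s)" || true
```

This only checks a single array. A routine compiled with -frecursive places all of its local arrays on the stack at once, so the real requirement is the sum over every local array in the call chain. The soft limit can be raised for the current shell with `ulimit -s <KiB>` or `ulimit -s unlimited`, up to the hard limit.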
gfortran succeeds compilation without -frecursive and fails with it. I'm most concerned with xlf, though - I also tried disabling -qnosave with xlf but no luck. Setting stack size to unlimited like so: Resulted in:
Edited to make clear I was running with -qnosave disabled.
I just re-compiled using gfortran with "-frecursive -fcheck=all -O0 -ggdb". Running "make" causes the tests to fail on each test of the Aasen symmetric indefinite routines. This happens for all four precisions. Running each test manually within gdb results in:
xlintsts < stest.in:
xlintstc < ctest.in:
xlintstd < dtest.in:
xlintstz < ztest.in:
Digging into the stack trace, it appears that in the bottom-level function (in the files mentioned above), the matrix C is accessed at index (2,1) whereas N=LDC=1 in the routine. This is not necessarily an error, since the called BLAS function performs an early exit (N=0). Full stack trace for single precision test:
More oddities involving the Aasen tests show up when running valgrind on "xlintsts < stest.in". Is it possible for you to run the test routine under your debugger (whatever it would be corresponding to xlf)?
Thank you, I fixed some of these errors last week but will make sure to fix them in
Best,
On Sun, Nov 20, 2016 at 3:23 PM, Victor Liu
I ran my failing test ( Here is the relevant part of the test results:
tl;dr: with |
Please run gdb with a breakpoint on zlascl.f:209 (or whichever line it is that sets the error return) and provide a backtrace. e.g.:
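A minimal sketch of what such a gdb session could look like, written as a helper that prints a non-interactive command line. This is an illustration, not the command the commenter posted: the helper name `gdb_backtrace_cmd` is hypothetical, and the executable and input file (`./xlintstz`, `ztest.in`) are placeholders for whichever test is failing. The gdb options used (`-batch`, `-ex`) and the commands `break`, `run`, `bt`, and `info frame` are standard.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a gdb batch command that sets a
# breakpoint, runs the test with its input on stdin, then dumps the
# backtrace and the current frame.
# usage: gdb_backtrace_cmd <executable> <input_file> <file:line>
gdb_backtrace_cmd() {
    exe=$1
    input=$2
    bp=$3
    printf "gdb -batch -ex 'break %s' -ex 'run < %s' -ex 'bt' -ex 'info frame' %s\n" \
        "$bp" "$input" "$exe"
}

# Placeholder executable and input; substitute the test that actually fails.
gdb_backtrace_cmd ./xlintstz ztest.in zlascl.f:209
```

Printing the command first makes it easy to check the breakpoint location before running it; piping the output to `sh` executes it. The breakpoint line only resolves if the library was compiled with debug information (for example `-g` or `-ggdb`).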
Here are the results of that. I also dumped the frame info.
I am having a hard time reproducing this problem, so it is hard for me to make much progress. I can read the information you are posting, so here is what is happening, as far as I can see. (Thanks James for the gdb output, and Victor for the suggestions.)
We are in LAPACK/TESTING, so we have a
Anyhow, we have a true numerical error to figure out. It seems that, during the numerical test, ZGETSL calls ZLASCL with its 4th parameter CFROM = NaN. This is NOT OK; there is a check in ZLASCL for exactly this case, and this is why the program crashes.
The fact that you see
Another related commit at 14f49eb
@jlost : can you please do a |
Sure, I'll retest in a couple of days when I'm back from vacation.
Same results as above (2700 COMPLEX16 errors) when I compile with xlf and a freshly compiled reference BLAS. With gfortran, I get 1 error:
Hi all. So I finally got my hands on an IBM machine (IBM Power8E). Thanks @edelsohn, IBM and OSU. So far, I am trying LAPACK with gfortran (gcc 6.2.1, and standard make.inc) and reference BLAS. You do have to use:
I can't figure out how to build the test executables. I run cmake, but it doesn't generate a Makefile with any targets.
Hi, I built lapack with xlf on a ppc64le (POWER8) machine. I used the options in INSTALL\make.inc.XLF, although for my BLASLIB, I switched to ../../librefblas.a. During make, I encountered the following error:
Any idea what the problem might be? Is reference lapack tested on ppc64le architecture?