Compilation fails with spack with cce compilers #4228

Closed
csccva opened this issue Sep 14, 2023 · 27 comments · Fixed by #4265

Comments

@csccva

csccva commented Sep 14, 2023

Hello,

I am trying to compile openblas on a cluster with the cce compilers, but it fails.
This is the spack info:

> spack -V
0.20.0
> spack compilers
==> Available compilers
-- cce sles15-any -----------------------------------------------
[email protected]  [email protected]

-- clang sles15-any ---------------------------------------------
[email protected]

-- gcc sles15-any -----------------------------------------------
[email protected]  [email protected]  [email protected]

I try to install it via

spack install openblas%[email protected]

The build fails with this message:

Installing openblas-0.3.23-3p5f6xweur4fli4hx4oqlwngfanjhnic
....
....
.....
 Error: ProcessError: Command exited with status 2:
    'make' '-j32' '-s' 'CC=/pfs/lustrep1/appl/lumi/spack/23.03/0.20.0-user/lib/spack/env/cce/craycc' 'FC=/pfs/lustrep1/appl/lumi/spack/23.03/0.20.0-user/lib/spack/env/cce/crayftn' 'MAKE_NB_JOBS=0' 'ARCH=x86_64' 'TARGET=ZEN' 'USE_LOCKING=1' 'USE_OPENMP=0' 'USE_THREAD=0' 'RANLIB=ranlib' 'all'

4 errors found in build log:
     11    1 warning generated.
     12    ar: creating ../libopenblas_zen-r0.3.23.a
     13    parameter.c:273:7: warning: unused variable 'size' [-Wunused-variable]
     14      int size = 16;
     15          ^
     16    1 warning generated.
  >> 17    ld.lld: error: undefined reference due to --no-allow-shlib-undefined: la_constants_
     18    >>> referenced by ../libopenblas_zen-r0.3.23.so
  >> 19    clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
  >> 20    make[1]: *** [Makefile:196: ../libopenblas_zen-r0.3.23.so] Error 1
  >> 21    make: *** [Makefile:132: shared] Error 2

The complete log is this:

==> openblas: Executing phase: 'edit'
==> openblas: Executing phase: 'build'
==> [2023-09-14-08:47:31.908689] 'make' '-j32' '-s' 'CC=/pfs/lustrep1/appl/lumi/spack/23.03/0.20.0-user/lib/spack/env/cce/craycc' 'FC=/pfs/lustrep1/appl/lumi/spack/23.03/0.20.0-user/lib/spack/env/cce/crayftn' 'MAKE_NB_JOBS=0' 'ARCH=x86_64' 'TARGET=ZEN' 'USE_LOCKING=1' 'USE_OPENMP=0' 'USE_THREAD=0' 'RANLIB=ranlib' 'all'
gemmt.c:99:9: warning: unused variable 'alpha' [-Wunused-variable]
        FLOAT *alpha = Alpha;
               ^
1 warning generated.
gemmt.c:99:9: warning: unused variable 'alpha' [-Wunused-variable]
        FLOAT *alpha = Alpha;
               ^
1 warning generated.
ar: creating ../libopenblas_zen-r0.3.23.a
parameter.c:273:7: warning: unused variable 'size' [-Wunused-variable]
  int size = 16;
      ^
1 warning generated.
ld.lld: error: undefined reference due to --no-allow-shlib-undefined: la_constants_
>>> referenced by ../libopenblas_zen-r0.3.23.so
clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
make[1]: *** [Makefile:196: ../libopenblas_zen-r0.3.23.so] Error 1
make: *** [Makefile:132: shared] Error 2
@martin-frbg
Collaborator

Strange. la_constants is a Fortran module used by some of the LAPACK code (which is a copy of Reference-LAPACK), do you see anything related to its compilation in the logs ? (And can you try to compile 0.3.24 - for mostly unrelated reasons - or is this not available in/for spack yet ?)

@csccva
Author

csccva commented Sep 14, 2023

Strange. la_constants is a Fortran module used by some of the LAPACK code (which is a copy of Reference-LAPACK), do you see anything related to its compilation in the logs ? (And can you try to compile 0.3.24 - for mostly unrelated reasons - or is this not available in/for spack yet ?)

Hello,

The above messages are all I got from spack. I tried spack install openblas@0.3.24%[email protected], but this version is not present in my spack version (v0.20):

 Error: concretization failed for the following reasons:

   1. Cannot satisfy 'openblas@0.3.24'

I would like to try it directly from source, but I do not have the make arguments from my system.

@csccva
Author

csccva commented Sep 14, 2023

I just tried this

make PREFIX=/scratch/project_462000007/cristian/BLASSOURCE/ LIBNAMESUFFIX=nonthreaded   -j DYNAMIC_ARCH=0 CC=cc FC=ftn HOSTCC=gcc BINARY=64 INTERFACE=64 NO_AFFINITY=1 NO_WARMUP=1 USE_OPENMP=0 USE_THREAD=0 USE_LOCKING=1 LIBNAMESUFFIX=nonthreaded

I get this error:

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: la_constants_
>>> referenced by ../libopenblas_nonthreaded_zen-r0.3.24.so
clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
make[1]: *** [Makefile:196: ../libopenblas_nonthreaded_zen-r0.3.24.so] Error 1

@martin-frbg
Collaborator

Thanks. Can you try adding la_constants.o to the SCLAUX and DZLAUX lists near the top of lapack-netlib/SRC/Makefile please ? This appears to have been an oversight in my update of LAPACK that also happened in 0.3.21, and as la_constants does not contain any function symbols it may have gone unnoticed by other compilers.
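For anyone applying this by hand, it may help to confirm that both lists really mention la_constants.o before rebuilding. The helper below is only a sketch: makefile_list_has is a made-up name, not something shipped with OpenBLAS, and it assumes the lists are plain assignments that may continue over several lines with a trailing backslash.

```shell
#!/bin/sh
# makefile_list_has FILE VAR OBJ   (hypothetical helper, not part of OpenBLAS)
# Succeeds if the assignment to VAR in FILE lists OBJ as a separate word.
makefile_list_has() {
    awk -v var="$2" -v obj="$3" '
        # Start collecting at "VAR =", "VAR :=", "VAR +=" or "VAR ?=".
        $0 ~ ("^" var "[ \t]*[+:?]?=") { collecting = 1 }
        collecting {
            n = split($0, words, /[ \t]+/)
            for (i = 1; i <= n; i++) if (words[i] == obj) found = 1
            # A line without a trailing backslash ends the assignment.
            if ($0 !~ /\\$/) collecting = 0
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Example, run from the top of the OpenBLAS source tree:
#   makefile_list_has lapack-netlib/SRC/Makefile SCLAUX la_constants.o && echo "SCLAUX ok"
#   makefile_list_has lapack-netlib/SRC/Makefile DZLAUX la_constants.o && echo "DZLAUX ok"
```

If either check fails after editing, the object is probably on a line the assignment does not reach, for example after a line that lost its trailing backslash.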

@csccva
Author

csccva commented Sep 14, 2023

It seems that it compiled, though some tests failed. Should I be worried about that?

ftn -O2 -hnopattern -O noomp -fPIC -msse3 -mssse3 -msse4.1 -mavx -mavx2 -c zblat3.f  -o zblat3.o
ftn-2307 ftn: ERROR in command line
  The "-m" option must be followed by 0, 1, 2, 3 or 4.
ftn-2307 ftn: ERROR in command line
  The "-m" option must be followed by 0, 1, 2, 3 or 4.
ftn-2307 ftn: ERROR in command line
  The "-m" option must be followed by 0, 1, 2, 3 or 4.
ftn-2307 ftn: ERROR in command line
  The "-m" option must be followed by 0, 1, 2, 3 or 4.
ftn-2307 ftn: ERROR in command line
  The "-m" option must be followed by 0, 1, 2, 3 or 4.
ftn-2307 ftn: ERROR in command line
  The "-m" option must be followed by 0, 1, 2, 3 or 4.
ftn-2307 ftn: ERROR in command line
  The "-m" option must be followed by 0, 1, 2, 3 or 4.
ftn-2307 ftn: ERROR in command line
  The "-m" option must be followed by 0, 1, 2, 3 or 4.
ftn-2307 ftn: ERROR in command line
  The "-m" option must be followed by 0, 1, 2, 3 or 4.
ftn-2307 ftn: ERROR in command line
  The "-m" option must be followed by 0, 1, 2, 3 or 4.
ftn-2307 ftn: ERROR in command line
  The "-m" option must be followed by 0, 1, 2, 3 or 4.
ftn-2307 ftn: ERROR in command line
  The "-m" option must be followed by 0, 1, 2, 3 or 4.
make[1]: *** [../Makefile.system:1813: sblat1.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make[1]: *** [../Makefile.system:1813: zblat1.o] Error 1
make[1]: *** [../Makefile.system:1813: dblat2.o] Error 1
make[1]: *** [../Makefile.system:1813: dblat1.o] Error 1
make[1]: *** [../Makefile.system:1813: zblat2.o] Error 1
make[1]: *** [../Makefile.system:1813: cblat1.o] Error 1
make[1]: *** [../Makefile.system:1813: sblat2.o] Error 1
make[1]: *** [../Makefile.system:1813: cblat2.o] Error 1
make[1]: *** [../Makefile.system:1813: sblat3.o] Error 1
make[1]: *** [../Makefile.system:1813: dblat3.o] Error 1
make[1]: *** [../Makefile.system:1813: zblat3.o] Error 1
make[1]: *** [../Makefile.system:1813: cblat3.o] Error 1
make[1]: Leaving directory '/pfs/lustrep4/scratch/project_462000007/cristian/BLASSPACK/OpenBLAS/test'
make: *** [Makefile:156: tests] Error 2

@martin-frbg
Collaborator

These gcc-style options should have been filtered out in Makefile.system as the Cray compiler does not understand them - do you have F_COMPILER=CRAY in the generated Makefile.conf, or did compiler detection fail for some reason ?
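A small sketch of how to read the detected compiler out of the generated file. conf_value is a hypothetical helper name; the only assumption about Makefile.conf is that it holds plain KEY=value lines, as in the excerpts posted in this thread.

```shell
#!/bin/sh
# conf_value FILE KEY   (hypothetical helper)
# Prints the value of the last "KEY=value" line in FILE, or nothing if absent.
conf_value() {
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Example, run in the build directory after make has generated Makefile.conf:
#   conf_value Makefile.conf F_COMPILER
#   conf_value Makefile.conf FC
```

Taking the last match is a deliberate choice here: if a key were written more than once, a later plain assignment is the one make would end up using.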

@csccva
Author

csccva commented Sep 14, 2023

These gcc-style options should have been filtered out in Makefile.system as the Cray compiler does not understand them - do you have F_COMPILER=CRAY in the generated Makefile.conf, or did compiler detection fail for some reason ?

No, I have GFORTRAN:

OSNAME=Linux
ARCH=x86_64
C_COMPILER=CLANG
BINARY32=
BINARY64=1

F_COMPILER=GFORTRAN

@martin-frbg
Collaborator

That's a bit weird - it must have seen "CRAY" at some point for the -hnopattern option to get added. Did you do a make clean before this, and use the same make options as before ?

@csccva
Author

csccva commented Sep 14, 2023

I did a make clean and redid the compilation. I got the same.

@martin-frbg
Collaborator

then the f_check script must be broken (again) :(
maybe it now returns both a hit for "GNU" in the compiler output and a valid version number that is greater than 3... not sure why I put that case before the name=Cray one though. (See f_check from line 90, which digs through the compiler output for a trivial Fortran file.)
still I wonder how it got that far in the build, instead of failing at compiling lapack.

@martin-frbg
Collaborator

Can you provide the ftest.s file created by ftn -O2 -S ftest.f please ?

@martin-frbg
Collaborator

...and I notice I already fixed Cray Fortran detection (for CCE version 15) in 0.3.22, so this "should not happen" ? (unless perhaps just calling "ftn" on your system invokes a different compiler than when the full path is specified as in the spack log)
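One way to test the "different compiler" theory is to look at what the bare name resolves to in the current shell and compare it with the wrapper path from the spack log. A minimal sketch, where describe_compiler is a hypothetical helper built only on the standard command -v:

```shell
#!/bin/sh
# describe_compiler NAME   (hypothetical helper)
# Prints the path the shell resolves NAME to, or reports that it is not on PATH.
describe_compiler() {
    path=$(command -v "$1") || { echo "$1: not found on PATH"; return 1; }
    echo "$1 -> $path"
}

# Example:
#   describe_compiler ftn
#   describe_compiler cc
```

Running ftn --version through both the bare name and the full wrapper path would then show whether the two report the same compiler.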

@csccva
Author

csccva commented Sep 15, 2023

ftn -O2 -S ftest.f

This worked without issues

@martin-frbg
Collaborator

ftn -O2 -S ftest.f

This worked without issues

I don't doubt it, but I need to see the output file in order to figure out why the f_check script misidentified the compiler ?

@csccva
Author

csccva commented Sep 15, 2023

...and I notice I already fixed Cray Fortran detection (for CCE version 15) in 0.3.22, so this "should not happen" ? (unless perhaps just calling "ftn" on your system invokes a different compiler than when the full path is specified as in the spack log)

The tests thingy fails when I do it directly from source.

> ftn --version
Cray Fortran : Version 15.0.0

@csccva
Author

csccva commented Sep 15, 2023

ftn -O2 -S ftest.f

This worked without issues

I don't doubt it, but I need to see the output file in order to figure out why the f_check script misidentified the compiler ?

I apologize, I missed that. Here it is:

	.text
	.file	"The Cpu Module"
                                        # Start of file scope inline assembly
	.pushsection	.note.ftn_module_data
	.balign	4
	.4byte	27, 1f-0f, 8
	.asciz	"Hewlett Packard Enterprise"
	.balign	4
0:
	.ascii	"\107\043\004\145\362\340\001\000\001\000\001\000\025\000"
	.ascii	"\000\000\007\000\000\000\057\164\155\160\057\160\145\137"
	.ascii	"\061\062\063\061\062\062\057\057\160\154\144\151\162\000"
	.ascii	"\146\164\145\163\164\056\163\000"
	.balign	4
1:	.popsection

                                        # End of file scope inline assembly
	.globl	zhoge_                          # -- Begin function zhoge_
	.p2align	4, 0x90
	.type	zhoge_,@function
zhoge_:                                 # @zhoge_
	.cfi_startproc
# %bb.0:                                # %", bb1"
	vxorps	%xmm0, %xmm0, %xmm0             #  /pfs/lustrep4/scratch/project_462000007/cristian/BLASSPACK/OpenBLAS/ftest.f:6
	vxorps	%xmm1, %xmm1, %xmm1             #  /pfs/lustrep4/scratch/project_462000007/cristian/BLASSPACK/OpenBLAS/ftest.f:6
	retq                                    #  /pfs/lustrep4/scratch/project_462000007/cristian/BLASSPACK/OpenBLAS/ftest.f:6
.Lfunc_end0:
	.size	zhoge_, .Lfunc_end0-zhoge_
	.cfi_endproc
                                        # -- End function
	.section	".note.GNU-stack","",@progbits

@martin-frbg
Collaborator

Thanks. It should pick up the "Hewlett Packard Enterprise" entry and put

F_COMPILER=CRAY
FC=ftn

in Makefile.conf (and does so in my test when I replace the corresponding line with a cat of your file).
I think you would only get F_COMPILER=GFORTRAN when FC was not set and the f_check script has to try all the compiler names known to it. (Or when the makefile manages to call f_check without the compiler name argument, but I do not see how this could happen). Do you get F_COMPILER=CRAY and FC=ftn in the file "mak" when you call f_check directly as f_check mak con ftn (first argument is normally the Makefile.conf, second the config.h) ?

@csccva
Author

csccva commented Sep 15, 2023

Thanks. It should pick up the "Hewlett Packard Enterprise" entry and put

F_COMPILER=CRAY
FC=ftn

in Makefile.conf (and does so in my test when I replace the corresponding line with a cat of your file). I think you would only get F_COMPILER=GFORTRAN when FC was not set and the f_check script has to try all the compiler names known to it. (Or when the makefile manages to call f_check without the compiler name argument, but I do not see how this could happen). Do you get F_COMPILER=CRAY and FC=ftn in the file "mak" when you call f_check directly as f_check mak con ftn (first argument is normally the Makefile.conf, second the config.h) ?

I ran this:

./f_check Makefile.conf config.h ftn

This is what I got

OSNAME=Linux
ARCH=x86_64
C_COMPILER=CLANG
BINARY32=
BINARY64=1
CEXTRALIB= # lots of stuff
F_COMPILER=CRAY
FC=ftn
BU=_
CORE=ZEN

@martin-frbg
Collaborator

That looks correct, I wonder why it did not work in your build where you ended up with the GFORTRAN entry

@csccva
Author

csccva commented Sep 18, 2023

Is there some way to run the tests "manually"?

@martin-frbg
Collaborator

well, f_check is just a shell script, the ./f_check Makefile.conf config.h ftn should have invoked it like the makefile would (if/when FC is ftn), and $FC -S ftest.f with a bit of expression matching in the .s file should have been the crucial part of the detection process
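To illustrate why the order of the checks matters, here is a much simplified sketch of that kind of matching. It is not the actual f_check code: guess_fortran_vendor is a made-up function and the real script looks at more than two strings. The point is that the ftest.s posted above contains both "Hewlett Packard Enterprise" and, in its last line, ".note.GNU-stack", so a test for "GNU" that runs first would win.

```shell
#!/bin/sh
# guess_fortran_vendor FILE   (hypothetical, simplified illustration)
# Classifies an assembly file produced by "$FC -S ftest.f" by vendor strings.
guess_fortran_vendor() {
    # Check the vendor-specific string first; "GNU" also appears in
    # non-GNU output, e.g. in the .note.GNU-stack section name.
    if grep -q 'Hewlett Packard Enterprise' "$1"; then
        echo CRAY
    elif grep -q 'GNU' "$1"; then
        echo GFORTRAN
    else
        echo UNKNOWN
    fi
}

# Example:
#   ftn -O2 -S ftest.f && guess_fortran_vendor ftest.s
```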

@martin-frbg
Collaborator

I still do not understand why/how the GNU-style options crept back in when compiling in the test directory, but adding

ifeq ($(F_COMPILER),CRAY)
FLDFLAGS := $(filter-out -msse3 -mssse3 -msse4.1 -mavx -mavx2 -mskylake-avx512 ,$(FLDFLAGS))
endif

in test/Makefile somewhere after the declaration of FLDFLAGS (line 262) should certainly stomp them out.
(Pity I don't have access to anything with cce to confirm this myself, but in any case adding these lines should not hurt)
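What the filter-out line is meant to do can be tried outside of make. The following is a rough shell equivalent for illustration only: strip_gnu_simd_flags is a hypothetical helper, and the flag list is simply the one from the snippet above.

```shell
#!/bin/sh
# strip_gnu_simd_flags "FLAGS"   (hypothetical helper)
# Prints FLAGS with the GNU-style SIMD options removed, like $(filter-out ...).
strip_gnu_simd_flags() {
    out=""
    # $1 is left unquoted on purpose so it splits into one word per flag;
    # this assumes the flags contain no glob characters.
    for flag in $1; do
        case "$flag" in
            -msse3|-mssse3|-msse4.1|-mavx|-mavx2|-mskylake-avx512) ;;
            *) out="${out:+$out }$flag" ;;
        esac
    done
    printf '%s\n' "$out"
}

# Example, using the flags from the failing compile line:
#   strip_gnu_simd_flags "-O2 -hnopattern -O noomp -fPIC -msse3 -mssse3 -msse4.1 -mavx -mavx2"
```

Like filter-out, this only removes exact word matches, so an option that is not in the list passes through unchanged.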

@csccva
Author

csccva commented Oct 16, 2023

I tried your suggestion with this compilation:

make clean;
 make PREFIX=/scratch/project_462000007/cristian/BLASSOURCE/ LIBNAMESUFFIX=nonthreaded   -j DYNAMIC_ARCH=0 CC=cc FC=ftn HOSTCC=gcc BINARY=64 INTERFACE=64 NO_AFFINITY=1 NO_WARMUP=1 USE_OPENMP=0 USE_THREAD=0 USE_LOCKING=1 LIBNAMESUFFIX=nonthreaded

I am still getting the same errors:

make[1]: Entering directory '/pfs/lustrep4/scratch/project_462000007/cristian/BLASSPACK/OpenBLAS/test'
ftn -O2 -hnopattern -O noomp -fPIC -msse3 -mssse3 -msse4.1 -mavx -mavx2 -c sblat1.f  -o sblat1.o
ftn -O2 -hnopattern -O noomp -fPIC -msse3 -mssse3 -msse4.1 -mavx -mavx2 -c dblat1.f  -o dblat1.o
ftn -O2 -hnopattern -O noomp -fPIC -msse3 -mssse3 -msse4.1 -mavx -mavx2 -c cblat1.f  -o cblat1.o
ftn -O2 -hnopattern -O noomp -fPIC -msse3 -mssse3 -msse4.1 -mavx -mavx2 -c zblat1.f  -o zblat1.o
ftn -O2 -hnopattern -O noomp -fPIC -msse3 -mssse3 -msse4.1 -mavx -mavx2 -c sblat2.f  -o sblat2.o
ftn -O2 -hnopattern -O noomp -fPIC -msse3 -mssse3 -msse4.1 -mavx -mavx2 -c dblat2.f  -o dblat2.o
ftn -O2 -hnopattern -O noomp -fPIC -msse3 -mssse3 -msse4.1 -mavx -mavx2 -c cblat2.f  -o cblat2.o
ftn -O2 -hnopattern -O noomp -fPIC -msse3 -mssse3 -msse4.1 -mavx -mavx2 -c zblat2.f  -o zblat2.o
ftn -O2 -hnopattern -O noomp -fPIC -msse3 -mssse3 -msse4.1 -mavx -mavx2 -c sblat3.f  -o sblat3.o
ftn -O2 -hnopattern -O noomp -fPIC -msse3 -mssse3 -msse4.1 -mavx -mavx2 -c dblat3.f  -o dblat3.o
ftn -O2 -hnopattern -O noomp -fPIC -msse3 -mssse3 -msse4.1 -mavx -mavx2 -c cblat3.f  -o cblat3.o
ftn -O2 -hnopattern -O noomp -fPIC -msse3 -mssse3 -msse4.1 -mavx -mavx2 -c zblat3.f  -o zblat3.o
ftn-2307 ftn: ERROR in command line
  The "-m" option must be followed by 0, 1, 2, 3 or 4.
ftn-2307 ftn: ERROR in command line
  The "-m" option must be followed by 0, 1, 2, 3 or 4.
ftn-2307 ftn: ERROR in command line
  The "-m" option must be followed by 0, 1, 2, 3 or 4.
ftn-2307 ftn: ERROR in command line
  The "-m" option must be followed by 0, 1, 2, 3 or 4.
ftn-2307 ftn: ERROR in command line
  The "-m" option must be followed by 0, 1, 2, 3 or 4.
ftn-2307 ftn: ERROR in command line
  The "-m" option must be followed by 0, 1, 2, 3 or 4.
ftn-2307 ftn: ERROR in command line
  The "-m" option must be followed by 0, 1, 2, 3 or 4.
ftn-2307 ftn: ERROR in command line
  The "-m" option must be followed by 0, 1, 2, 3 or 4.
ftn-2307 ftn: ERROR in command line
  The "-m" option must be followed by 0, 1, 2, 3 or 4.
ftn-2307 ftn: ERROR in command line
  The "-m" option must be followed by 0, 1, 2, 3 or 4.
ftn-2307 ftn: ERROR in command line
  The "-m" option must be followed by 0, 1, 2, 3 or 4.
ftn-2307 ftn: ERROR in command line
  The "-m" option must be followed by 0, 1, 2, 3 or 4.
make[1]: *** [../Makefile.system:1813: sblat1.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make[1]: *** [../Makefile.system:1813: dblat1.o] Error 1
make[1]: *** [../Makefile.system:1813: cblat1.o] Error 1
make[1]: *** [../Makefile.system:1813: sblat2.o] Error 1
make[1]: *** [../Makefile.system:1813: zblat1.o] Error 1
make[1]: *** [../Makefile.system:1813: cblat2.o] Error 1
make[1]: *** [../Makefile.system:1813: dblat2.o] Error 1
make[1]: *** [../Makefile.system:1813: zblat2.o] Error 1
make[1]: *** [../Makefile.system:1813: dblat3.o] Error 1
make[1]: *** [../Makefile.system:1813: cblat3.o] Error 1
make[1]: *** [../Makefile.system:1813: sblat3.o] Error 1
make[1]: *** [../Makefile.system:1813: zblat3.o] Error 1
make[1]: Leaving directory '/pfs/lustrep4/scratch/project_462000007/cristian/BLASSPACK/OpenBLAS/test'
make: *** [Makefile:156: tests] Error 2

More info:

> module list

Currently Loaded Modules:
  1) craype-x86-rome      4) perftools-base/22.12.0                  7) craype/2.7.19      10) cray-libsci/22.12.1.1      13) lumi-tools/23.04 (S)
  2) libfabric/1.15.2.0   5) xpmem/2.5.2-2.4_3.47__gd0f7936.shasta   8) cray-dsmml/0.2.2   11) PrgEnv-cray/8.3.3          14) init-lumi/0.2    (S)
  3) craype-network-ofi   6) cce/15.0.0                              9) cray-mpich/8.1.23  12) ModuleLabel/label     (S)

@martin-frbg
Collaborator

Thanks - I have no explanation for this, guess I should try to reproduce this behaviour with some wrapper scripts around a regular gcc and gfortran that make them behave like CCE.

@csccva
Author

csccva commented Oct 16, 2023

In the end the openblas library is created. Even if the tests do not compile, can I trust the library?

@martin-frbg
Collaborator

I think so - at least there were no issues with CCE builds in the past. BTW I cannot reproduce the misbehaviour with the patched test/Makefile (attached here but you'd need to remove the .txt extension that github insists on)
Makefile.txt

@martin-frbg
Collaborator

Coming to the conclusion that we "only" need to be more assertive in stripping down and overwriting the existing FFLAGS setting in Makefile.system, so that the new setting actually takes effect.
