Build problem on Red Hat Enterprise 6 #1269

Closed
susilehtola opened this issue Aug 6, 2017 · 38 comments

Comments

@susilehtola
Contributor

Hi,

I'm trying to update the Fedora and Red Hat OpenBLAS packages to 0.2.20, but for some reason it fails to build on the Red Hat Enterprise Linux 6 branch. The problem I'm seeing is a bunch of errors [1] like

/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/crt1.o: In function `_start':
(.text+0x20): undefined reference to `main'
collect2: ld returned 1 exit status
make[1]: *** [dblat3] Error 1

This happens even though I have the NO_AVX2=1 flag set.

[1] https://kojipkgs.fedoraproject.org//work/tasks/4289/21074289/build.log

@martin-frbg
Collaborator

Can you build an earlier version in the same environment? Without NO_AVX2 you would get assembler errors; this looks like you may have mismatched versions of gcc and gfortran installed.

@brada4
Contributor

brada4 commented Aug 6, 2017

Can you share the spec and patches? They seem to be for a different OpenBLAS version.

@susilehtola
Contributor Author

It actually looks like the build started failing on EL6 right after I added the patch that adds the flags needed to make the tests compile correctly on another distribution, where the tests of the OpenMP flavor of the library weren't linking properly against the OpenMP runtime libraries.

The up-to-date spec and patches are all available at
https://src.fedoraproject.org/rpms/openblas/tree/el6

I'll see if it compiles when I don't apply patch3.

@susilehtola
Contributor Author

... and the packages in the buildroot can be seen at
https://kojipkgs.fedoraproject.org//work/tasks/4289/21074289/root.log

so gcc (and gfortran) is 4.4.7-18.el6.

@susilehtola
Contributor Author

Removing patch3 didn't help, I still get

/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/crt1.o: In function `_start':
(.text+0x20): undefined reference to `main'
collect2: ld returned 1 exit status
make[1]: *** [sblat1] Error 1

@martin-frbg
Collaborator

Obviously "main" is in the respective source files (sblat2.f etc.) that it just compiled, so I can only imagine there might be an ABI mismatch of some kind (even if it is just a matter of underscores appended to module names). Hence the question about compiler versions. I'd try a plain and simple build from the source tarball, without your .spec file and rpmbuild environment, first to see if the build platform is sane.

@brada4
Contributor

brada4 commented Aug 7, 2017

crt1.o is part of static glibc or so.
Upd: looks like something else static is missing.
Upd: planting SCL devtools leads to the same place.
Upd (last): OpenMP-only problem, other configs build just fine.

@brada4
Contributor

brada4 commented Aug 22, 2017

Oops - @susilehtola your build log links faded away.....
Two questions:

  • Do tests run before you plant old static lapack or after?
  • I think there is a bug, not affecting you, where gfortran pulls in the pthread library, which at best serves no purpose whatsoever in an OMP build

@susilehtola
Contributor Author

The GNU OpenMP implementation is based on pthreads, so I'm not sure the latter is a problem.

As to your first question, I don't really understand what you're asking: the package doesn't use "old static lapack", but uses the LAPACK sources that are shipped with OpenBLAS.

@brada4
Contributor

brada4 commented Aug 29, 2017

Yes, but a plain OMP build succeeds, while rpmbuild does not.
It should make no difference whether -lpthread is added there or not.
I will spend some time understanding the difference that breaks stuff.

@susilehtola
Contributor Author

The crt1.o error just says that for some reason, sblat1 ends up without a main() function; that is, there's no PROGRAM in Fortran lingo.
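
One way to test that reading is to look at the symbol table of the test object. The following is a minimal sketch, assuming binutils `nm` is available; the helper name `defines_main` is made up for illustration, and the object name in the usage comment is an assumption. With gfortran, an object compiled from a source file containing a PROGRAM unit normally defines both `main` and `MAIN__`.

```shell
#!/bin/sh
# defines_main: read `nm` output on stdin and succeed only if a global text
# symbol named exactly "main" is defined (symbol type T).
# Hypothetical helper, not part of OpenBLAS.
defines_main() {
    awk '$2 == "T" && $3 == "main" { found = 1 } END { exit !found }'
}

# Usage against a real object (assumes the object is called sblat1.o):
#   nm sblat1.o | defines_main && echo "main present" || echo "main missing"
```

If `main` shows up in the object but the link still fails, the object is probably not reaching the linker command line in the expected way, which points at the link flags rather than at the compilation.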

@brada4
Contributor

brada4 commented Aug 29, 2017

I am flipping (your) parameters one by one to see which one turns golden. I can easily produce failed rpmbuild with log.

@brada4
Contributor

brada4 commented Aug 30, 2017

@susilehtola it turns out that static gfortran -fopenmp, as used to link the tests in your SPEC, requires an additional -lpthread on CentOS 6.
I need to test 3 other combinations, but this is the bare minimum to make it through.

Not related to this request... Do you think it is possible for you to implement AVX2 support (https://github.com/xianyi/OpenBLAS/wiki/faq#binutils), given that RHEL6 has by now met Haswell?
It is like double _gemm speed or so...

@susilehtola
Contributor Author

@brada4 thanks, I'll check if it builds with -lpthread.

As for the second one, it's not possible to compile Fedora EPEL packages with packages not in the RHEL repository. For AVX2 support, one just has to upgrade to RHEL7.

@brada4
Contributor

brada4 commented Aug 30, 2017

You can add -lpthread universally; it does no damage as long as it stays on Linux, and there are even some pthread calls mixed into the OMP code paths.

@susilehtola
Contributor Author

susilehtola commented Aug 30, 2017

It still fails to build - the -lpthread does nothing since the issue is the lack of main() in the tests.

Maybe the problem is the warning

Warning: Nonconforming tab character at (1)
c_sblat2.f:354.7:

which ends up as broken code?

@martin-frbg
Collaborator

Extremely unlikely - if you look at that file you will see that the only issue is that there are indeed a few tab characters in the indentation of the last argument to SCHK3(). I do not see how a miscompilation would cause the issue you saw. Can you re-upload a build log please ?

@brada4
Contributor

brada4 commented Aug 30, 2017

You have typo, must be l - p - thread.

@susilehtola
Contributor Author

@brada4 the typo was in the comment, not in the build. FWIW I think the -fopenmp flag I had in EXTRALIBS had already fixed this; IIRC I added it because the link phase had failed on some arch and it went through with -fopenmp.

@susilehtola
Contributor Author

Scratch build log available at
https://kojipkgs.fedoraproject.org//work/tasks/9150/21559150/build.log

I won't paste it here since XZ is not supported and even zipped it's 1.6 MB. It'll vanish in a day or two tho....

@martin-frbg
Collaborator

Thanks. I think one interesting part is that there is no "-fopenmp" among the gfortran options when building lapack-netlib/SRC - although it was present while building in "INSTALL" just before and it reappears (and stays) when building TESTING/MATGEN immediately afterwards.
I wonder how that could be - some unintended interaction between your FCOMMON flags and our method of generating the lapack-netlib/make.inc, or something in your patches ?
(The bad news is that current lapack-netlib would fail to build in the OPENMP configuration with such an old compiler anyway, as there are some OPENMP 4.0 directives now e.g. in ssytrd_sb2st.F that were apparently added sometime after the 4.9.1 release)

@brada4
Contributor

brada4 commented Aug 31, 2017

Well,
make DYNAMIC_ARCH=1 USE_OPENMP=1 NO_AVX2=1
made a valid .so (judging by exported symbols).
I also did an rpmbuild with -pthread added everywhere -fopenmp is seen, and it generated an RPM.
I did not thoroughly test that I got a good thing. Plain make with minimum flags uses a much longer line for the tests (basically the static library is built, then tested, then wrapped with ELF .so decoration; probably the universal compiler flags are failing).
Artifact logs lost to ramdisks.

@susilehtola
Contributor Author

Still fails with -pthread added to all -fopenmp and correct OpenMP flag usage in the LAPACK routines as well

https://kojipkgs.fedoraproject.org//work/tasks/7404/21607404/build.log

@martin-frbg
Collaborator

"-fopenmp" still not present in the "make -C openmp" section of your log when building in lapack-netlib/SRC:

( cd SRC; make )
make[2]: Entering directory `/builddir/build/BUILD/openblas-0.2.20/openmp/lapack-netlib/SRC'
gfortran -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mminimal-toc -fPIC -pthread -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mminimal-toc -fPIC -pthread -frecursive -fPIC -c -o sbdsvdx.o sbdsvdx.f
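
Spotting this by eye in a 1.6 MB log is tedious, so here is a small sketch that counts, per directory, how many gfortran command lines lack -fopenmp. It assumes a serial GNU make log (with "Entering directory" / "Leaving directory" messages) and compile lines that begin with "gfortran "; the function name `missing_openmp` is invented for this example.

```shell
#!/bin/sh
# missing_openmp: read a make log on stdin and print, for each directory,
# how many gfortran command lines do not contain -fopenmp as a separate word.
# Hypothetical helper; output order across directories is unspecified.
missing_openmp() {
    awk '
        /Entering directory/ {
            path = $NF
            # Drop the quote characters make puts around the path.
            stack[++depth] = substr(path, 2, length(path) - 2)
            next
        }
        /Leaving directory/ { if (depth > 0) depth--; next }
        /^gfortran / {
            dir = (depth > 0) ? stack[depth] : "(top level)"
            total[dir]++
            if ($0 !~ /(^| )-fopenmp( |$)/) missing[dir]++
        }
        END {
            for (d in total)
                printf "%s: %d of %d gfortran lines lack -fopenmp\n", d, missing[d] + 0, total[d]
        }
    '
}

# Usage: missing_openmp < build.log
```

With a parallel build the Entering/Leaving messages interleave, so the per-directory attribution would only be approximate there.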

@susilehtola
Contributor Author

No wonder it's missing, since your Makefiles strip that out!

$ grep LAPACK_FFLAGS *
Makefile.system:LAPACK_FFLAGS := $(filter-out -fopenmp -mp -openmp -xopenmp=parallel,$(FFLAGS))

Is it supposed to be there or no?

@martin-frbg
Collaborator

Try grep -C 3 for some context - the line you are quoting is #ifdef OS_WINDOWS only. (And if it got executed, I'd expect that option to be missing in all subdirectories of lapack-netlib; there must be something else in your setup that removes it for SRC only.)
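
For reference, a small sketch of what the context search looks like. The wrapper name `show_context` is hypothetical, and the demo file is an invented stand-in whose conditional is NOT copied from Makefile.system; it only shows how three lines of context expose an enclosing conditional.

```shell
#!/bin/sh
# show_context PATTERN FILE: print each match with 3 lines of context on
# either side, plus line numbers. Hypothetical wrapper around grep.
show_context() {
    grep -n -C 3 -- "$1" "$2"
}

# Invented stand-in for a makefile; the conditional below is made up.
demo=$(mktemp)
cat > "$demo" <<'EOF'
# unrelated line 1
# unrelated line 2
ifeq ($(EXAMPLE_OS), example)
# comment inside the conditional
LAPACK_FFLAGS := $(filter-out -fopenmp,$(FFLAGS))
else
LAPACK_FFLAGS := $(FFLAGS)
endif
# unrelated line 9
EOF

show_context 'filter-out' "$demo"
rm -f "$demo"
```

In grep's output the matching line is marked with a colon after the line number and the context lines with a hyphen, so the surrounding conditional is easy to tell apart from the match itself.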

@susilehtola
Contributor Author

You're right - lapack-netlib/SRC/Makefile contains

ALLAUX = $(filter-out $(ALL_AUX_OBJS),$(ALLAUX_O))
SLASRC = $(filter-out $(SLAPACKOBJS),$(SLASRC_O))
DLASRC = $(filter-out $(DLAPACKOBJS),$(DLASRC_O))
CLASRC = $(filter-out $(CLAPACKOBJS),$(CLASRC_O))
ZLASRC = $(filter-out $(ZLAPACKOBJS),$(ZLASRC_O))
DSLASRC = $(filter-out $(SLAPACKOBJS),$(DSLASRC_O))
ZCLASRC = $(filter-out $(CLAPACKOBJS),$(ZCLASRC_O))

OPTS1 = $(filter-out -fopenmp, $(OPTS))
#end filter out


ALLOBJ = $(SLASRC) $(DLASRC) $(DSLASRC) $(CLASRC) $(ZLASRC) $(ZCLASRC) \
   $(SCLAUX) $(DZLAUX) $(ALLAUX)

And this originates in the OpenBLAS 0.2.20 tarball. Maybe you'll want to remove the filter.

But this still does not answer why it fails to build on EPEL6, when the exact same package builds just fine on all other distribution versions.
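
To make the effect of that OPTS1 line concrete: GNU make's `$(filter-out pattern,text)` drops every whitespace-separated word of the text that matches the pattern. Below is a shell sketch that mimics it for one literal pattern (no % wildcards); `filter_out` is a made-up name and the flag list is only loosely modelled on the gfortran line quoted earlier.

```shell
#!/bin/sh
# filter_out WORD FLAGS...: print FLAGS with every word equal to WORD removed,
# mimicking $(filter-out WORD,FLAGS) for a single literal pattern.
filter_out() {
    drop=$1
    shift
    out=
    for word in "$@"; do
        [ "$word" = "$drop" ] && continue
        out="${out:+$out }$word"
    done
    printf '%s\n' "$out"
}

# Illustrative flag list, not the real OPTS from the build.
OPTS="-O2 -g -fopenmp -fPIC -pthread"
# Unquoted on purpose: word splitting mirrors make's whitespace-separated words.
OPTS1=$(filter_out -fopenmp $OPTS)
printf '%s\n' "$OPTS1"
```

Only whole words are removed, so a flag that merely starts with -fopenmp would survive, and every other flag (including -pthread) passes through unchanged.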

@martin-frbg
Collaborator

Ah yes, I was comparing to a build with netlib updated to 3.7.1 at the time, and that difference just stood out. Seems this was part of @wernsaar's PR #1046 (updating LAPACK to 3.7.0 in early January), with the only explanation being "filtered out -fopenmp and fix for mingw".

@martin-frbg
Collaborator

Any chance to see if building 0.2.19 or earlier in that exact same EPEL6 environment would succeed ? Or any possibility to access that build host and try to do the make in the test directory by hand (and check what version of ld and libraries get involved, e.g. an incompatible libgomp ) ? I do not have an EPEL6 host for testing, but I really cannot think of any moderately recent change in OpenBLAS that would cause this behaviour.

@brada4
Contributor

brada4 commented Sep 13, 2017

Plain 'make USE_OPENMP=1 DY...' works just fine; there is some CFLAGS/FFLAGS manipulation in the RPM SPEC that breaks the tests (since the tests are not part of the release package, they can be built with the generated flags rather than something aligned to the system config).

@martin-frbg
Collaborator

Unusual CFLAGS/FFLAGS should not hurt as long as the same set is used throughout the build (which seems to be the case here). At the moment the only detail I notice in his log is a stray -fopenmp among the libraries when linking the tests.

@brada4
Contributor

brada4 commented Sep 13, 2017

... and they are all single-threaded programs calling functions that are parallel behind the scenes....

@susilehtola
Contributor Author

It also fails with 0.2.19 with the same problem
https://kojipkgs.fedoraproject.org//work/tasks/4971/21844971/build.log

You can see the versions of all the packages installed in the build root (including ld) at
https://kojipkgs.fedoraproject.org//work/tasks/4971/21844971/root.log

@martin-frbg
Collaborator

Is any version of OpenBLAS known to build on that host (with that combination of flags) in the past ?
Any luck building without that stray "-fopenmp" among the libraries, or with a reduced set of flags ?

@susilehtola
Contributor Author

This is really weird. Apparently the build failed on RHEL6 just because of the EXTRALIB argument.

@martin-frbg
Collaborator

The errant -fopenmp in your EXTRALIB, or the entire EXTRALIB argument as such?

@susilehtola
Contributor Author

I don't know - I just removed the whole of EXTRALIB. Which is weird, because -fopenmp was already elsewhere in the list of arguments.

@susilehtola
Contributor Author

It might have been due to the other EXTRALIB arguments as well.
