Skip to content

cmake: DYNAMIC_ARCH build broken on OS X #2634

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
leezu opened this issue May 29, 2020 · 16 comments · Fixed by #2636
Closed

cmake: DYNAMIC_ARCH build broken on OS X #2634

leezu opened this issue May 29, 2020 · 16 comments · Fixed by #2636
Milestone

Comments

@leezu
Copy link
Contributor

leezu commented May 29, 2020

With latest develop, DYNAMIC_ARCH build on OS X fails with symbol(s) not found for architecture x86_64 errors for symbols like _xtrsm_outucopy_ZEN", referenced from: _gotoblas_ZEN in libopenblas.a(setparam_ZEN.c.o).

See for example this Github Actions log: https://github.com/leezu/OpenBLAS/runs/719019580?check_suite_focus=true
The Makefile build works https://github.com/leezu/OpenBLAS/runs/719019582?check_suite_focus=true

Besides this bug, would it be sensible to enable Github Actions CI for OpenBLAS repo? I'm happy to open a PR with an slightly extended version of leezu@560c39d used to produce above log.

@martin-frbg
Copy link
Collaborator

Please pull from develop, not master - the latter is almost three years out of date. There is a github action for homebrew builds in the develop tree, though it currently fails in an xcode version check after the successful build.

@leezu
Copy link
Contributor Author

leezu commented May 30, 2020

This issue is about the develop version. Sorry for using the wrong word in the description.

@martin-frbg
Copy link
Collaborator

Oh, I see. From your log the problem seems to be not so much ZEN architecture but the essentially unimplemented "quadruple precision support, not sure why a CMAKE build would suddenly attempt this on OSX. Would you happen to have set EXPRECISION=1 or QUAD_PRECISION=1 in an environment variable like CFLAGS, perhaps in the context of a different project ? (Alos not sure what to make of the cmake... -DANDROID=1 either - was this an attempt to cross-compile ?)

@martin-frbg
Copy link
Collaborator

Can you make this go away with -DNO_EXPRECISION=1 ? (Can't replicate this on Linux, possibly osx should be treated like the BSDs where this option gets set automatically since #2076)

@martin-frbg martin-frbg added this to the 0.3.10 milestone May 30, 2020
@leezu
Copy link
Contributor Author

leezu commented May 30, 2020

I'm trying the cmake -DDYNAMIC_ARCH=1 -DNOFORTRAN=0 -DBUILD_WITHOUT_LAPACK=0 -DUSE_THREAD=1 -DUSE_LOCKING=1 .. build at https://github.com/leezu/OpenBLAS/runs/723909998?check_suite_focus=true and cmake -DDYNAMIC_ARCH=1 -DNOFORTRAN=0 -DBUILD_WITHOUT_LAPACK=0 -DUSE_THREAD=1 -DUSE_LOCKING=1 -DNO_EXPRECISION=1 .. build at https://github.com/leezu/OpenBLAS/runs/723910520?check_suite_focus=true

The DANDROID was taken from the Conan package file (where it is only set on Linux / Android): https://github.com/conan-io/conan-center-index/blob/375275ef84db5afd7aa22f618b774180c829958c/recipes/openblas/all/conanfile.py#L63-L65

@leezu
Copy link
Contributor Author

leezu commented May 31, 2020

Thanks @martin-frbg. According to above Github Actions runs, -DNO_EXPRECISION=1 fixes the DYNAMIC_ARCH cmake build on macOS.

In general, could you recommend "how experimental" you would consider the cmake build to be? Is there a plan to consider it "stable" at some point?

@martin-frbg
Copy link
Collaborator

martin-frbg commented May 31, 2020

The problem with the cmake build is that it is a comparatively late addition where the gmake build already used some trickery (recursions and kernel configuration files that use gmake conditional syntax), and not all combinations of options will have been tested on all platforms. Your build logs seem to suggest that USE_LOCKING may not have been passed correctly, I'll check that later today. (~~Though this should be¨¨ Actually this is redundant with the USE_THREAD option you set - if you leave that out the USE_LOCKING gets applied as it should be) Not sure I understand what the "libs dependency" is that Conan is working around with their use of -DANDROID - if it is necessary in an OSX context, it could be just the "-lm" that gets added on both Linux and BSD together with the NO_EXPRECISION.

@leezu
Copy link
Contributor Author

leezu commented Jun 1, 2020

@martin-frbg there is also another issue when building with cmake -DDYNAMIC_ARCH=1 -DNOFORTRAN=0 -DBUILD_WITHOUT_LAPACK=0 .. on OSX:

2020-06-01T19:07:57.5971040Z In file included from /Users/runner/runners/2.263.0/work/OpenBLAS/OpenBLAS/build/kernel/CMakeFiles/sgemm_kernel_SKYLAKEX.c:7:
2020-06-01T19:07:57.5971320Z /Users/runner/runners/2.263.0/work/OpenBLAS/OpenBLAS/kernel/x86_64/sgemm_kernel_16x4_skylakex_3.c:504:34: error: inline assembly requires more registers than available
2020-06-01T19:07:57.5974160Z     for(;n_count>23;n_count-=24) COMPUTE_n24
2020-06-01T19:07:57.5974290Z                                  ^
2020-06-01T19:07:57.5974860Z /Users/runner/runners/2.263.0/work/OpenBLAS/OpenBLAS/kernel/x86_64/sgemm_kernel_16x4_skylakex_3.c:454:5: note: expanded from macro 'COMPUTE_n24'
2020-06-01T19:07:57.5975260Z     "vbroadcastss %8,%%zmm0; vmovups %10,%%zmm1; vmovups %11,%%zmm2;"\
2020-06-01T19:07:57.5975390Z     ^
2020-06-01T19:07:57.5975500Z /Users/runner/runners/2.263.0/work/OpenBLAS/OpenBLAS/kernel/x86_64/sgemm_kernel_16x4_skylakex_3.c:504:34: error: inline assembly requires more registers than available
2020-06-01T19:07:57.5976110Z /Users/runner/runners/2.263.0/work/OpenBLAS/OpenBLAS/kernel/x86_64/sgemm_kernel_16x4_skylakex_3.c:454:5: note: expanded from macro 'COMPUTE_n24'
2020-06-01T19:07:57.5976240Z     "vbroadcastss %8,%%zmm0; vmovups %10,%%zmm1; vmovups %11,%%zmm2;"\
2020-06-01T19:07:57.5976330Z     ^
2020-06-01T19:07:57.5976440Z /Users/runner/runners/2.263.0/work/OpenBLAS/OpenBLAS/kernel/x86_64/sgemm_kernel_16x4_skylakex_3.c:504:34: error: inline assembly requires more registers than available
2020-06-01T19:07:57.5976980Z /Users/runner/runners/2.263.0/work/OpenBLAS/OpenBLAS/kernel/x86_64/sgemm_kernel_16x4_skylakex_3.c:454:5: note: expanded from macro 'COMPUTE_n24'
2020-06-01T19:07:57.5977160Z     "vbroadcastss %8,%%zmm0; vmovups %10,%%zmm1; vmovups %11,%%zmm2;"\
2020-06-01T19:07:57.5977260Z     ^
2020-06-01T19:07:57.5977360Z /Users/runner/runners/2.263.0/work/OpenBLAS/OpenBLAS/kernel/x86_64/sgemm_kernel_16x4_skylakex_3.c:505:34: error: inline assembly requires more registers than available
2020-06-01T19:07:57.5977810Z     for(;n_count>19;n_count-=20) COMPUTE(20)
2020-06-01T19:07:57.5977910Z                                  ^
2020-06-01T19:07:57.5978400Z /Users/runner/runners/2.263.0/work/OpenBLAS/OpenBLAS/kernel/x86_64/sgemm_kernel_16x4_skylakex_3.c:421:5: note: expanded from macro 'COMPUTE'
2020-06-01T19:07:57.5978520Z     "vbroadcastss %7,%%zmm0; vmovups %9,%%zmm1; vmovups %10,%%zmm2;"\
2020-06-01T19:07:57.5978630Z     ^
2020-06-01T19:07:57.5978730Z /Users/runner/runners/2.263.0/work/OpenBLAS/OpenBLAS/kernel/x86_64/sgemm_kernel_16x4_skylakex_3.c:506:34: error: inline assembly requires more registers than available
2020-06-01T19:07:57.5979170Z     for(;n_count>15;n_count-=16) COMPUTE(16)
2020-06-01T19:07:57.5979270Z                                  ^
2020-06-01T19:07:57.5979760Z /Users/runner/runners/2.263.0/work/OpenBLAS/OpenBLAS/kernel/x86_64/sgemm_kernel_16x4_skylakex_3.c:421:5: note: expanded from macro 'COMPUTE'
2020-06-01T19:07:57.5979880Z     "vbroadcastss %7,%%zmm0; vmovups %9,%%zmm1; vmovups %10,%%zmm2;"\
2020-06-01T19:07:57.5979970Z     ^
2020-06-01T19:07:57.5980080Z /Users/runner/runners/2.263.0/work/OpenBLAS/OpenBLAS/kernel/x86_64/sgemm_kernel_16x4_skylakex_3.c:507:34: error: inline assembly requires more registers than available
2020-06-01T19:07:57.5980500Z     for(;n_count>11;n_count-=12) COMPUTE(12)
2020-06-01T19:07:57.5980610Z                                  ^
2020-06-01T19:07:57.5981100Z /Users/runner/runners/2.263.0/work/OpenBLAS/OpenBLAS/kernel/x86_64/sgemm_kernel_16x4_skylakex_3.c:421:5: note: expanded from macro 'COMPUTE'
2020-06-01T19:07:57.5981200Z     "vbroadcastss %7,%%zmm0; vmovups %9,%%zmm1; vmovups %10,%%zmm2;"\
2020-06-01T19:07:57.5981300Z     ^
2020-06-01T19:07:57.5981410Z /Users/runner/runners/2.263.0/work/OpenBLAS/OpenBLAS/kernel/x86_64/sgemm_kernel_16x4_skylakex_3.c:508:32: error: inline assembly requires more registers than available
2020-06-01T19:07:57.5981830Z     for(;n_count>7;n_count-=8) COMPUTE(8)
2020-06-01T19:07:57.5981930Z                                ^
2020-06-01T19:07:57.5982420Z /Users/runner/runners/2.263.0/work/OpenBLAS/OpenBLAS/kernel/x86_64/sgemm_kernel_16x4_skylakex_3.c:421:5: note: expanded from macro 'COMPUTE'
2020-06-01T19:07:57.5982530Z     "vbroadcastss %7,%%zmm0; vmovups %9,%%zmm1; vmovups %10,%%zmm2;"\
2020-06-01T19:07:57.5982620Z     ^
2020-06-01T19:07:57.5982710Z /Users/runner/runners/2.263.0/work/OpenBLAS/OpenBLAS/kernel/x86_64/sgemm_kernel_16x4_skylakex_3.c:509:32: error: inline assembly requires more registers than available
2020-06-01T19:07:57.5983710Z     for(;n_count>3;n_count-=4) COMPUTE(4)
2020-06-01T19:07:57.5983820Z                                ^
2020-06-01T19:07:57.5984500Z /Users/runner/runners/2.263.0/work/OpenBLAS/OpenBLAS/kernel/x86_64/sgemm_kernel_16x4_skylakex_3.c:421:5: note: expanded from macro 'COMPUTE'
2020-06-01T19:07:57.5984650Z     "vbroadcastss %7,%%zmm0; vmovups %9,%%zmm1; vmovups %10,%%zmm2;"\
2020-06-01T19:07:57.5984740Z     ^
2020-06-01T19:07:57.5984840Z /Users/runner/runners/2.263.0/work/OpenBLAS/OpenBLAS/kernel/x86_64/sgemm_kernel_16x4_skylakex_3.c:510:32: error: inline assembly requires more registers than available
2020-06-01T19:07:57.5985320Z     for(;n_count>1;n_count-=2) COMPUTE(2)
2020-06-01T19:07:57.5985420Z                                ^
2020-06-01T19:07:57.5985900Z /Users/runner/runners/2.263.0/work/OpenBLAS/OpenBLAS/kernel/x86_64/sgemm_kernel_16x4_skylakex_3.c:421:5: note: expanded from macro 'COMPUTE'
2020-06-01T19:07:57.5986030Z     "vbroadcastss %7,%%zmm0; vmovups %9,%%zmm1; vmovups %10,%%zmm2;"\
2020-06-01T19:07:57.5986120Z     ^
2020-06-01T19:07:57.5986220Z /Users/runner/runners/2.263.0/work/OpenBLAS/OpenBLAS/kernel/x86_64/sgemm_kernel_16x4_skylakex_3.c:511:19: error: inline assembly requires more registers than available
2020-06-01T19:07:57.5986330Z     if(n_count>0) COMPUTE(1)
2020-06-01T19:07:57.5986410Z                   ^
2020-06-01T19:07:57.5986910Z /Users/runner/runners/2.263.0/work/OpenBLAS/OpenBLAS/kernel/x86_64/sgemm_kernel_16x4_skylakex_3.c:421:5: note: expanded from macro 'COMPUTE'
2020-06-01T19:07:57.5987030Z     "vbroadcastss %7,%%zmm0; vmovups %9,%%zmm1; vmovups %10,%%zmm2;"\
2020-06-01T19:07:57.5987110Z     ^
2020-06-01T19:07:57.5987190Z 10 errors generated.

reference: https://github.com/leezu/OpenBLAS/runs/728422254?check_suite_focus=true

@martin-frbg
Copy link
Collaborator

Hmm. Wonder why that is - when the build gets there it must have confirmed that AVX512 support is available. Possibly an OSX LLVM bug ?

@leezu
Copy link
Contributor Author

leezu commented Jun 1, 2020

@martin-frbg But why would cmake -DDYNAMIC_ARCH=1 -DNOFORTRAN=0 -DBUILD_WITHOUT_LAPACK=0 -DUSE_THREAD=1 -DUSE_LOCKING=1 -DNO_EXPRECISION=1 .. work (see log above)?

@martin-frbg
Copy link
Collaborator

No idea unless builds happened on different hardware (one Xeon so including the AVX512 "SkylakeX" kernels, the other not)

@leezu
Copy link
Contributor Author

leezu commented Jun 1, 2020

Maybe. I added a commit #2638 to print the CPU arch. It now shows


machdep.cpu.brand_string: Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz
machdep.cpu.family: 6
machdep.cpu.model: 58
machdep.cpu.extmodel: 3
machdep.cpu.extfamily: 0
machdep.cpu.stepping: 9
machdep.cpu.feature_bits: 18427078393948011519
machdep.cpu.leaf7_feature_bits: 643 0
machdep.cpu.leaf7_feature_bits_edx: 3154117632
machdep.cpu.extfeature_bits: 4967106816
machdep.cpu.signature: 198313
machdep.cpu.brand: 0
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH MMX FXSR SSE SSE2 SS HTT SSE3 PCLMULQDQ MON VMX SSSE3 CX16 SSE4.1 SSE4.2 x2APIC POPCNT AES VMM PCID XSAVE OSXSAVE TSCTMR AVX1.0 RDRAND F16C

@martin-frbg
Copy link
Collaborator

Confused... but one difference appears to be that the cmake build lost the "-O2" , wonder how and when that happened (and disabling optimization for the skx sgemm kernel is sufficient to break the gmake build).

@martin-frbg
Copy link
Collaborator

The "-O2" was removed from system.cmake in #1279 , possibly on the premise that the "correct" way to handle this in cmake would be to specify a CMAKE_BUILD_TYPE or to set CMAKE_C_FLAGS. The default build type (with neither Debug or Release specified) seems to invoke the compiler without any optimization, which for some reason makes llvm run out of registers in the case of that one AVX512-heavy source file.

@leezu
Copy link
Contributor Author

leezu commented Jun 3, 2020

Thank you @martin-frbg for investigating this and finding the CMAKE_BUILD_TYPE=Release fix. This is indeed a confusing issue.. Do you think that's worth a bug report to the Clang team? (Though need to verify first if it happens on latest clang)

@martin-frbg
Copy link
Collaborator

Probably could not hurt (googling found lots of references to that "more registers than available" message but most seemed to be for ARM or 32bit x86). Sorry for messing up your PR BTW - at first I had actually got it wrong and thought overoptimization was causing the problem.
There is currently one other difference between gmake and cmake builds, cmake is missing the
MAX_STACK_ALLOC=2048 that was introduced in #482 to speed up gemv/ger for small matrix sizes. I'll see if I can address that tomorrow

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants