-
Notifications
You must be signed in to change notification settings - Fork 466
Allow a possible installation of index-64 library alongside standard index-32 library? #461
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Hi Aisha, that makes total sense to me, but let us see if we have feedback from others. So let us wait a few days. J |
Most definitely, that sounds like a good plan. |
OMG @langou Just for completeness, I am writing down things that we still have to do:
Any suggestions to solve the first point are welcome, I unfortunately don't have a "clean" solution. |
A couple of questions, which will help me handle the naming for the files
|
Hi @epsilon-0,
you might be looking for PR #218. The author of this PR is Björn Esser from the Fedora Project. |
Hi @epsilon-0. Did #462 solve this issue? |
@weslleyspereira no this isn't complete yet.
|
Ok, I see. Thanks for the quick follow-up! |
I have similar trouble trying to package things with pkgsrc. I would like to have a complete install of the reference, with cblas and lapacke. For differing implementations being installed at the same time, I settled for differing library names and sub-directories for the headers, so for example
(and so forth) We do not consider runtime switching like the binary distros, so it is OK if each cblas.h (and lapacke.h) is specific to its matched library, like with extra names for libopenblas. Build-time selection happens via
(etc.) That's what the .pc files are supposed to say, and it a lot easier than to communicate a different header file name. They are not yet consistent in that, but I am fixing it up. Seems so far people just hacked that in their distros, if bothering with all reference libs at all. I got one question about those headers, though. I am hacking the cmake build to get each component built separately, and am trying some other fixup (see #556). I get the libblas.so and libblas64.so libraries built fine, I get the header dirs configured … but the installed cblas.h and lapacke.h are identical for the 32 and 64 bit indexing versions. This is at odds with openblas: There, I got a crucial difference that I do not see for the netlib builds:
For the reference libraries, all headers from the 32 and 64 bit index builds are identical and apparently users are expected to put Do we agree that it is the right solution to modify the headers at build time to define the correct integers? Btw, the cblas config files that are installed also miss any reference to the necessary defs, so are broken for the 64 bit index builds, as it seems. But actually, I ponder not installing these at all. They are redundant with the .pc files and make it possibly harder to convince cmake-using dependent packages to accept a packager choice via |
PS: With Intel MKL, there is a central switch
to fit the general scheme. I could also put the define into BLAS_INCLUDES, same for the weird netlib defines. What is better? Do we want to do it like Intel or like OpenBLAS? |
Yes. I agree with that, and prefer the solution that does not replicate the entire header. I think it is cleaner.
Right. I just installed the 64-bits libraries (BUILD_INDEX64=ON) and couldn't see anything telling me to use |
This is ambiguous to me. Which is the cleaner solution? What I am preparing now is such:
The CMakeFile shall replace HAVE_ILP with 1 or 0, the resulting header being installed for the current build. (Btw.: long wouldn't work on Windows. It's long long there … or int64_t on all platforms with stdint.)
I am imagining a future where you do
And things are handled in foo/include/netlib64/cblas.h, otherwise by foo/include/netlib/cblas.h (possibly linked to foo/include/cblas.h). I have the suspicion that this is not what you meant, but I want to convince that it is better;-) You could try to not to duplicate the header by placing 'the' header in /foo/include/cblas.h and have /foo/include/netlib64/cblas.h include that one only by defining WeirdNEC, but that means that the 64 bit and 32 bit packages share that common header file, which is messy for packaging. It is way better if each puts its file into separate places/names. The name needs to stay cblas.h because you don't want to go around replacing Edit: Also, having cblas.h include ../cblas.h is messy by itself. Also we define one header installation directory for cmake. By default that's /foo/include, not /foo/netlib64/include. I am not going to change this default. Packagers will have to specify the subdirectory like this (BSD make in pkgsrc):
|
A beautiful aspect of shipping/installing the 32 bit cblas.h with this modification to the usual location is that the original mechanics still work. Only the 64 bit variant will enforce WeirdNEC. You could decide to only install the 64 bit one into a prefix and keep the other parts of the ecosystem untouched. |
Oh, come on … the CBLAS/cmake/cblas-config-install.cmake.in seems to forget -DCMAKE_INSTALL_INCLUDEDIR, doesn't it?
(The comment is sugar on top.) I have the feeling that the CMake build is a lot less mature at one would think. Is the project serious about having that as primary build or is this just a drive-by contribution? I am really tempted to rather fix up the old-style Makefile, less fuss all around. But I now sank so much time into fixing up the CMake stuff, that I detest anyway. So I would like to get it over with. |
I have to give up now … I managed to move cblas.h to cblas.h.in as indicated above, and added
to CBLAS/include/CMakeLists.txt, also having defined What is the macro append_subdir_files supposed to do? It seems to prepend a copy of the prefix to the header paths. I got not enough or too much path to the source header files. I just want to install the header files from HERE to THERE, dammit. Can someone knowledgeable help out here? I guess I could figure it out tomorrow, but I am not sure if that is without smashing something in the real world for emotional relief. |
Sorry, let me explain. At first, I liked the idea of keeping the original header #if defined(WeirdNEC)
#define WeirdNEC
#endif
#include <cblas.h>
but yes, we would have to use By default that's /foo/include, not /foo/netlib64/include. I am not going to change this default. Packagers will have to specify the subdirectory like this (BSD make in pkgsrc):
That seems good to me. So, you would only add an alternative to build LAPACK without having to guess the compiler flags. But the current way would also work.
Good to know. BLAS++ and LAPACK++ use int64_t instead of long long. |
@weslleyspereira So you at first liked this idea:
with /prefix/include/cblas.h and /prefix/include/netlib64/cblas.h, the latter locating the former? But you do agree now that it is a more robust solution to install a header that looks like this for a 64 bit build?
(long vs. int64 is a different matter, but I am all for doing that change, just like BLAS++) Heck, I am not even sure if it is safe to assume that `#include ".../cblas.h" will find only the other indented header. The C standard seems to say that search order is implementation-defined, not necessarily relative to the current header. My main issue as packager is that I'd need a separate package for that common header or have the 64 bit package depend on the 32 bit one just for that. This would suck. I would really like to go forward now with such a change for pkgsrc, to settle a change for the upstream code later. We could discuss a new symbol to force 32 bit indices or 64 bit indices explicitly with any of the headers ( Can I get agreement about our intended solution being this?
and
Each build of LAPACK code results in headers that, at least by default, match the installed libraries without the user defining anything. OK? I then could slip this in before the upcoming release of pkgsrc (deadline nearing) and we can further discuss the details of that implementation so that I can drop the patches after merging something here, with a new LAPACK release. With this change, the plain Makefile build also needs fixing, but I don't need that for my patches yet when I just use the CMake build. (Just need to somehow check my temper when trying to beat that weird CMake build into submission, where it shuffles header copies around the build directories and then cannot find them for install. Or decide about those broken .cmake files having any use for us, maybe just drop them from install … we got pkg-config!) |
Anything? I must admit that I don't see much chance for a different solution in practice, as this is the example set by openblas, the main implementation we use. I can imagine convincing Intel to have subdirectory for 64 bit/32 bit index headers, too, wrapping over their mkl_cblas.h and mkl_lapacke.h. Otherwise I build a simple package that just provides those.
Currently, I added machinery to pkgsrc to provide builds with the funny |
Yes, that's it. I agree with your solution of having subfolders for the 32- and 64-bits headers. I discussed this with @langou, and he was also convinced this would be a good solution.
Right. This should be addressed in another issue.
Yes. I think you can go forward and propose a PR in the future, thanks! I personally think a new symbol like
Sounds good to me.
Ok! We will probably have a LAPACK release in the second semester of 2021. And yes, the Makefile should be adjusted accordingly, and I am willing to help with that. |
This is somewhat related. We should not forget that the headers for netlib CBLAS are not only provided by netlib … NumPy always uses its own header: https://github.com/numpy/numpy/blob/main/numpy/core/src/common/npy_cblas.h And in this header, it sets
The difference:
I wonder if that's possibly causing trouble. For Netlib, there is only one type of index, while other implementations use a differing return value type for the index functions. OpenBLAS sets the example. They say isamax returns unsigned size_t, but the C wrapper actually calls a Fortran function that returns a signed integer (Edit: A subroutine that writes a signed 32 or 64 bit integer value to a handed-in reference to an unsigned 64 bit variable on 64 bit systems). Does the reference implementation have an opinion about this? I guess there is no real trouble, as a size_t value will always be able to hold any non-negative return from isamax(). But it smells iffy. (Edit: You could build with 64 bit indices on a 32 bit system where size_t is 32 bits, right? Then you got overflow. In additon to the uneasyness of casting Since optmized implementations seem to have decided on size_t there, should the reference accept that fact and follow? |
And how dangerous is it, actually, to link numpy with reference cblas? |
I certainly cannot speak for numpy (or mkl etc. for that matter), but I'd hesitate to claim OpenBLAS to be normative in any form, least of all relative to what is (I believe) generally seen as the reference implementation... |
Sure. It's just that OpenBLAS or MKL are what people use in practice and both seem to have settled on
or similarily
vs. the reference
How come that they diverge from the reference here? Was there communication about that? Also … I see MKL and OpenBLAS defining a host of functions that aren't even part of reference CBLAS:
So, extending the standard is one thing, but
… handing in an address of size_t for iamax, that just seems wrong. I didn't find another implementation than this reference one in the OpenBLAS sources. Are they just stupid changing the external type like that or am I overlooking something very basic? Is anyone actually using these functions? |
Hi all, Reference BLAS, reference CBLAS, reference LAPACK, two of the main thrusts of these projects are (1) numerical algorithms and (2) defining common interfaces, a reference implementation and a test suite that goes this. I think everyone involved in these projects is happy to look and learn from other projects (OpenBLAS, MKL, etc.) about software engineering, best practice for deploying the software, etc. We have a lot to learn from these projects. (And we also learn a lot from other numerical linear algebra projects!) Anyhow: reference BLAS, CBLAS, LAPACK can use some improvement in its CMake packaging, interfaces, and if OpenBLAS (e.g.) has a better process, that is well suited for us, well, I am all in favor moving toward this model. |
To add some context, the CBLAS was born from a committee (the Basic Linear Algebra Subprograms Technical Forum) that worked from 1996 to 2000 on revisiting the BLAS, as part of this they defined a C interface for the BLAS. See: If there are suggestions to improve CBLAS, send them along. I can try to pass this to the various stakeholders. |
Thanks for the pointer. So the relevant part seems to be B.2.2 in that spec, which says that So it seems that popular optimized implementations chose
In https://github.com/LuaDist/gsl/blob/master/cblas/gsl_cblas.h, this is accompanied by
This sounds like this originated in the reference implementation, but has since changed? Looking at 4177968 … I see that the I also see that the earlier version of cblas.h handed an But could it be that the optimized libraries that inherited an old version of cblas.h with
This results in
Iniitalizing the To conclude:
This sounds like a wonderful breakage as a result. Headers and code diverged. How come nobody noticed issues yet? Or did I miss the part where the reference CBLAS wrapper code for isamax and friends is not actually used? |
OpenBLAS at least does not use the CBLAS wrapper code from Reference-LAPACK (and never did, the source is there but does not get built) |
@martin-frbg Good to know. Can you point out a code path for, say, x86-64 that shows how the size_t is passed around to the actual computation for cblas_isamax()? I found some specific kernel implementation but am not sure about the general case. Would be good to know that nobody actually passes a For sure it's not good that projects just assume
when the actual library might offer an int or long (or int64_t) as return value. Might work most of the time with values in 64 bit registers, but it's not nice. Can we rectify this in the implementations? People did not pick up on the example of Netlib in the past 5 years about consinstent use of |
relevant code is in OpenBLAS/interface, e.g. interface/imax.c gets compiled to cblas_isamax() when CBLAS is defined, no Fortran code involved in its call graph. |
Ah, good. So the one case which is actually problematic is depending projects using a copy of cblas.h that does not fit the library. I don't find actual usage of
Is this a separate issue to discuss? It does releate to the choice of 32 or 64 bit library, though. PS: I am still unsure if enums in the API are a good idea (as actual data type for function arguments and struct members), as there are compiler options to change what integer is used underneath them. Not that relevant in practice, but makes me uneasy nevertheless. |
The more I think of this, the more I lean towards Option 2: We had size_t in the API for a very long time. Then Netlib changed that size_t to int or long. Regardless of what better matches the Fortran code or might be more consistent, the size_t was established API and Netlib reference broke that. Should I open a PR about changing things for
again? There should be no macro at this position anymore to emphasize that it's always size_t, everywhere, past and future. |
In numpy/numpy#19243 we now basically got down to: „Screw Netlib, size_t works for everyone else“. |
There are three reasons to use
No because a 32-bit system may have more than 4 GB of virtual memory (Linux supports this) but a single 32-bit process can never access more than 4 GB. That is, the upper 32-bit of the 64-bit indices are never used. Memory limit to a 32-bit process running on a 64-bit Linux OS |
I'm also thinking that keeping size_t is the correct thing to do, because changing from it was the ABI break and brought Netlib out of sync with the rest of the world. But I feel compelled to nitpick on your arguments;-)
When I researched this,I stumbled upon the admission that it was a historic error to use an unsigned type for C++ container indices, and probably even the return type of the size() method, as you quickly end up mixing signed and unsigned numbers in some way. The current state of Netlib would be consistent with itself, always using signed types for size and indices, but of course inconsistent with I am wondering myself right about that in code I wrote where I eventually hand an index as offset to a function call. The index is unsigned, the offset signed. Apart from compilers (MSVC) being confused by But anyway, if it's just about computing memory sizes to hand to malloc() and friends, size_t is the natural thing, and it has been there before in CBLAS. On possible issues with the current state of the code, mismatch with vendored
Right,
The x86 calling convention might put the 64 bit value into EAX and EDX. Or it might work with a pointer return and some buffer. But what would other architectures do? So you might not get corruption, but for sure a wrong value. The best case is the higher 32 bits being ignored. Now imagine a big-endian 32 bit system (some form of ARM) … sure you'd even get the desired half of the value returned? You couldn't really work with non-sparse data that needs 64 bit indices in the 32 bit program, sure. But just being able to make a non-matching function call that at_least gives wrong results seems unhealthy. I did some quick testing … on x86 Linux ( The more interesting case … 64 bit size_t:
Again, on x86-64, the peculiar relation between 64 bit RAX and 32 bit EAX makes things somewhat well-defined to also just silently zero the upper 32 bits once you do a 32 bit operation on the shared register. But there is fun to have with a slightly weird function defintion:
You could argue if it is smart for the compiler to work on the full 64 bit register and leave the upper 32 bits uncleared for a function that is expected to return a 32 bit value, but it is perfectly legal if you rely on the caller only using the lower 32 bits, I guess.
Fun. A 32 bit return value that gives more than what's possible in 32 bits. This is what can happen (in principle) with the current state of Netlib CBLAS being linked with code that expects size_t. I guess though that the upper 32 bits of RAX will be zero in the actual code in practice. But who knows … compiler expects caller not to use more than the lower 32 bits on any platform … might as well store garbage there. So … are we agreeing on moving Netlib back to size_t as return value? |
Thanks for all these valuable comments! I discussed this topic a bit with @langou. Based on the discussion here, my proposal is: In a separate PR:
I think this is aligned (or maybe the same) idea behind the recent discussions in this thread. Just to reinforce:
Do you agree? If you do, I can open the PR. Or maybe @drhpc would like to do that. |
I agree. And please just go ahead with the PR. |
@mgates3 mentioned to me the discussion on the Slate Google Group: |
Please, see #588. |
This is game over. On 32-bit Arm CPUs, four 32-bit values can be passed and returned in registers, 64-bit values occupy two consecutive registers, see Section 6.1.1.1 in Procedure Call Standard for the Arm Architecture. Instead of writing to one register, the callee will clutter two registers with his 64-bit integers; this is obviously a problem. As soon as the caller runs out of registers for parameters, the stack is used. The stack alignment is 32 bits but instead of reading or writing 32 bit, the callee writes 64 bit; again, this is game over and this problem (mismatch of stack read/write sizes) should cause problems on all instruction set architectures at some point. |
No, the standard committees behind C and C++ make your code behave the obvious way in this case: If The relevant clauses in the C11 standard draft are the following:
Hence,
In the absence of over- or underflow, this expression will evaluate to |
There are some C++ programmers (including the inventor of C++) who are proposing to use signed integers everywhere, see the C++ Core Guidelines but I would not call this an admission. The problem with the "signed integers everywhere" policy are
You can attempt to check for a signed overflow with the expression |
Just FYI. I'm the Debian developer who added the BLAS64 and LAPACK64 debian packages. And I developed the inital version of blas-lapack-switching mechanism for Gentoo. Feel free to at me if you would like my inputs. |
Currently debian has (some) separate headers for blas64 and cblas64 (though not from the reference implementation).
I am not sure if they are correct or not, with regards to the reference index64 API (they are from the blis library).
Would it be possible to add an option to cmake, something like
BUILD_INDEX64
, which is defaulted toOFF
, but if has been turned on, it will create the index-64 libraries?If I make a PR for such an option, would it be entertained as a possibility?
Some things which I had in mind for allowing this to co-exist with the standard installation - name the libraries to
libblas64.so, libcblas64.so, liblapack64.so, liblapacke64.so
, this way there is no conflict between the library names (though of course, you can't link with both libblas and libblas64 at the same time).Also the library would need to be compiled twice, once for index32 and once for index64 (but that is a perfectly normal scenario and not a deal breaker).
The only conflict I am unable to think of a good resolution is the header file names.
If following debians style, it might be wise to call the c headers ending with 64 as well.
(I'm maintaining this for Gentoo and would like to keep the ecosystem very close to debian, so that we have minimal problems for developers switching between systems)
I'm open to any suggestions before I make the PR ❤️
Thanks,
Aisha
The text was updated successfully, but these errors were encountered: