Allow a possible installation of index-64 library alongside standard index-32 library? #461

epsilon-0 · 2020-11-01T01:15:09Z

Currently debian has (some) separate headers for blas64 and cblas64 (though not from the reference implementation).

I am not sure if they are correct or not, with regards to the reference index64 API (they are from the blis library).

Would it be possible to add an option to cmake, something like BUILD_INDEX64, which is defaulted to OFF, but if has been turned on, it will create the index-64 libraries?
If I make a PR for such an option, would it be entertained as a possibility?

Some things which I had in mind for allowing this to co-exist with the standard installation - name the libraries to libblas64.so, libcblas64.so, liblapack64.so, liblapacke64.so, this way there is no conflict between the library names (though of course, you can't link with both libblas and libblas64 at the same time).
Also the library would need to be compiled twice, once for index32 and once for index64 (but that is a perfectly normal scenario and not a deal breaker).
The only conflict I am unable to think of a good resolution is the header file names.
If following debians style, it might be wise to call the c headers ending with 64 as well.
(I'm maintaining this for Gentoo and would like to keep the ecosystem very close to debian, so that we have minimal problems for developers switching between systems)

I'm open to any suggestions before I make the PR ❤️

Thanks,
Aisha

The text was updated successfully, but these errors were encountered:

langou · 2020-11-01T01:53:03Z

Hi Aisha, that makes total sense to me, but let us see if we have feedback from others. So let us wait a few days. J

epsilon-0 · 2020-11-01T01:54:35Z

Most definitely, that sounds like a good plan.

epsilon-0 · 2020-11-05T17:09:15Z

OMG @langou
you are so fast ❤️

Just for completeness, I am writing down things that we still have to do:

Figure out how to name headers so that 32bit API can co-exist with 64bit API
Fix printf/fprintf statements so that they use correct qualifier for printing.

Any suggestions to solve the first point are welcome, I unfortunately don't have a "clean" solution.

epsilon-0 · 2020-11-06T23:51:37Z

A couple of questions, which will help me handle the naming for the files

There seems to be a tonne of duplicate definitions between cblas_f77.h and cblas_test.h. Do we really need that?
Does cblas_test.h need to be installed? Given its name (and the files that it is used in) I presume it to be used only during the testing phase. Maybe we should not install this file on a system-wide level?

christoph-conrads · 2021-02-15T08:57:21Z

Hi @epsilon-0,

Some things which I had in mind for allowing this to co-exist with the standard installation - name the libraries to libblas64.so, libcblas64.so, liblapack64.so, liblapacke64.so, this way there is no conflict between the library names (though of course, you can't link with both libblas and libblas64 at the same time).

you might be looking for PR #218. The author of this PR is Björn Esser from the Fedora Project.

weslleyspereira · 2021-04-21T20:00:03Z

Hi @epsilon-0. Did #462 solve this issue?

epsilon-0 · 2021-04-21T20:10:48Z

@weslleyspereira no this isn't complete yet.
There needs to be some more header renaming/handling done.
I am busy for the next few weeks so am not going to be able to do this soon.
Basic outline

the header files should be called cblas.h and cblas64.h, likewise for other headers
- this means the *.c files would need some slight adjustment to include the proper header, but this is only during build time, so it can be hacked out.
the cmake files should be installed under lapack64 or cblas64, etc.

weslleyspereira · 2021-04-21T20:24:55Z

Ok, I see. Thanks for the quick follow-up!

drhpc · 2021-06-03T17:43:55Z

I have similar trouble trying to package things with pkgsrc. I would like to have a complete install of the reference, with cblas and lapacke. For differing implementations being installed at the same time, I settled for differing library names and sub-directories for the headers, so for example

/usr/lib/libopenblas.so
/usr/lib/libopenblas64.so
/usr/lib/libblas.so
/usr/lib/libcblas.so
/usr/lib/libblas64.so
/usr/lib/libcblas64.so
/usr/include/openblas/cblas.h
/usr/include/openblas64/cblas.h
/usr/include/netlib/cblas.h
/usr/include/netlib64/cblas.h
/usr/include/cblas.h -> netlib/cblas.h (for compatibility, having the default)

(and so forth)

We do not consider runtime switching like the binary distros, so it is OK if each cblas.h (and lapacke.h) is specific to its matched library, like with extra names for libopenblas. Build-time selection happens via

BLAS_INCLUDES=-I/prefix/include/netlib64
BLAS_LIBS=-lblas64
CBLAS_LIBS=-lcblas64

(etc.) That's what the .pc files are supposed to say, and it a lot easier than to communicate a different header file name. They are not yet consistent in that, but I am fixing it up. Seems so far people just hacked that in their distros, if bothering with all reference libs at all.

I got one question about those headers, though.

I am hacking the cmake build to get each component built separately, and am trying some other fixup (see #556). I get the libblas.so and libblas64.so libraries built fine, I get the header dirs configured … but the installed cblas.h and lapacke.h are identical for the 32 and 64 bit indexing versions. This is at odds with openblas: There, I got a crucial difference that I do not see for the netlib builds:

diff -ruN /data/pkg/include/openblas/openblas_config.h /data/pkg/include/openblas64/openblas_config.h
--- /data/pkg/include/openblas/openblas_config.h	2021-06-03 19:03:53.000000000 +0200
+++ /data/pkg/include/openblas64/openblas_config.h	2021-06-03 19:13:36.000000000 +0200
@@ -44,6 +44,7 @@
 #define OPENBLAS_DLOCAL_BUFFER_SIZE 32768
 #define OPENBLAS_CLOCAL_BUFFER_SIZE 16384
 #define OPENBLAS_ZLOCAL_BUFFER_SIZE 12288
+#define OPENBLAS_USE64BITINT 
 #define OPENBLAS_GEMM_MULTITHREAD_THRESHOLD 4
 #define OPENBLAS_VERSION " OpenBLAS 0.3.15 "
 /*This is only for "make install" target.*/

For the reference libraries, all headers from the 32 and 64 bit index builds are identical and apparently users are expected to put
-DWeirdNEC in their flags (might have been funny 30 years ago) for cblas.h and -DLAPACK_ILP64 -DHAVE_LAPACK_CONFIG_H. Since people use the optimized BLAS libraries in production, the de facto standard is not to expose that to the users. These feed back on the reference, IMHO, and the headers installed from an ILP64 build should not require funky flags to avoid crashing your app when linking to the 64 bit lib.

Do we agree that it is the right solution to modify the headers at build time to define the correct integers?

Btw, the cblas config files that are installed also miss any reference to the necessary defs, so are broken for the 64 bit index builds, as it seems. But actually, I ponder not installing these at all. They are redundant with the .pc files and make it possibly harder to convince cmake-using dependent packages to accept a packager choice via BLAS_LIBS etal.

drhpc · 2021-06-03T17:56:26Z

PS: With Intel MKL, there is a central switch -DMKL_ILP64 to be set. I imagine setting up trivial
include/intel-mkl64/cblas.h with

#ifndef MKL_ILP64
#define MKL_ILP64
#endif
#include <mkl_cblas.h>

to fit the general scheme. I could also put the define into BLAS_INCLUDES, same for the weird netlib defines. What is better? Do we want to do it like Intel or like OpenBLAS?

weslleyspereira · 2021-06-07T21:15:17Z

Do we agree that it is the right solution to modify the headers at build time to define the correct integers?

Yes. I agree with that, and prefer the solution that does not replicate the entire header. I think it is cleaner.

Btw, the cblas config files that are installed also miss any reference to the necessary defs, so are broken for the 64 bit index builds, as it seems.

Right. I just installed the 64-bits libraries (BUILD_INDEX64=ON) and couldn't see anything telling me to use WeirdNEC, LAPACK_ILP64 or HAVE_LAPACK_CONFIG_H. Thanks for noticing that!

drhpc · 2021-06-08T00:16:19Z

Yes. I agree with that, and prefer the solution that does not replicate the entire header. I think it is cleaner.

This is ambiguous to me. Which is the cleaner solution? What I am preparing now is such:

#if defined(WeirdNEC) || @HAVE_ILP64@
   #define CBLAS_INDEX long
   #ifndef WeirdNEC
   #define WeirdNEC
   #endif
#else
   #define CBLAS_INDEX int
#endif

The CMakeFile shall replace HAVE_ILP with 1 or 0, the resulting header being installed for the current build.

(Btw.: long wouldn't work on Windows. It's long long there … or int64_t on all platforms with stdint.)

Right. I just installed the 64-bits libraries (BUILD_INDEX64=ON) and couldn't see anything telling me to use WeirdNEC, LAPACK_ILP64 or HAVE_LAPACK_CONFIG_H. Thanks for noticing that!

I am imagining a future where you do

cc -I/foo/include/netlib64 -o bar bar.c -L/foo/lib -lcblas64

And things are handled in foo/include/netlib64/cblas.h, otherwise by foo/include/netlib/cblas.h (possibly linked to foo/include/cblas.h).

I have the suspicion that this is not what you meant, but I want to convince that it is better;-)

You could try to not to duplicate the header by placing 'the' header in /foo/include/cblas.h and have /foo/include/netlib64/cblas.h include that one only by defining WeirdNEC, but that means that the 64 bit and 32 bit packages share that common header file, which is messy for packaging. It is way better if each puts its file into separate places/names. The name needs to stay cblas.h because you don't want to go around replacing #include <cblas.h> lines.

Edit: Also, having cblas.h include ../cblas.h is messy by itself. Also we define one header installation directory for cmake. By default that's /foo/include, not /foo/netlib64/include. I am not going to change this default. Packagers will have to specify the subdirectory like this (BSD make in pkgsrc):

.if !empty(LAPACK_COMPONENT:M*64)
.  if empty(MACHINE_ARCH:M*64)
PKG_FAIL_REASON+=       "${LAPACK_COMPONENT} incompatible with non-64-bit platform"
.  endif
HEADERDIR=netlib64
.else
HEADERDIR=netlib
.endif

# Note: We patch the build to install both static and
# shared libraries.
CMAKE_ARGS=     -DBUILD_DEPRECATED=ON \
                -DBUILD_SHARED_LIBS=ON \
                -DBUILD_STATIC_LIBS=ON \
                -DCMAKE_INSTALL_INCLUDEDIR=${PREFIX}/include/${HEADERDIR} \
                ${LAPACK_COMPONENT_CMAKE_ARGS}

drhpc · 2021-06-08T00:29:42Z

A beautiful aspect of shipping/installing the 32 bit cblas.h with this modification to the usual location is that the original mechanics still work. Only the 64 bit variant will enforce WeirdNEC. You could decide to only install the 64 bit one into a prefix and keep the other parts of the ecosystem untouched.

drhpc · 2021-06-08T00:48:34Z

Oh, come on … the CBLAS/cmake/cblas-config-install.cmake.in seems to forget -DCMAKE_INSTALL_INCLUDEDIR, doesn't it?

# Report lapacke header search locations.
set(CBLAS_INCLUDE_DIRS ${_CBLAS_PREFIX}/include)

(The comment is sugar on top.)

I have the feeling that the CMake build is a lot less mature at one would think. Is the project serious about having that as primary build or is this just a drive-by contribution? I am really tempted to rather fix up the old-style Makefile, less fuss all around. But I now sank so much time into fixing up the CMake stuff, that I detest anyway. So I would like to get it over with.

drhpc · 2021-06-08T02:08:58Z

I have to give up now … I managed to move cblas.h to cblas.h.in as indicated above, and added

configure_file(${CMAKE_CURRENT_SOURCE_DIR}/cblas.h.in cblas.h @ONLY)
configure_file(${CMAKE_CURRENT_SOURCE_DIR}/cblas_f77.h.in cblas_f77.h @ONLY)

to CBLAS/include/CMakeLists.txt, also having defined @HAVE_ILP64@ to 1 or 0 in the toplevel CMakeLists.txt. But I can for the life of me not figure out how to make the install stuff which is in a higher-level CMakeLists.txt install the generated headers, or the weird copy of the same from ${LAPACK_BINARY_DIR}/include (really? A copy within the source tree?)

What is the macro append_subdir_files supposed to do? It seems to prepend a copy of the prefix to the header paths. I got not enough or too much path to the source header files. I just want to install the header files from HERE to THERE, dammit.

Can someone knowledgeable help out here? I guess I could figure it out tomorrow, but I am not sure if that is without smashing something in the real world for emotional relief.

weslleyspereira · 2021-06-08T12:53:44Z

Yes. I agree with that, and prefer the solution that does not replicate the entire header. I think it is cleaner.

This is ambiguous to me. Which is the cleaner solution? What I am preparing now is such:
#if defined(WeirdNEC) || @HAVE_ILP64@
   #define CBLAS_INDEX long
   #ifndef WeirdNEC
   #define WeirdNEC
   #endif
#else
   #define CBLAS_INDEX int
#endif
The CMakeFile shall replace HAVE_ILP with 1 or 0, the resulting header being installed for the current build.

(Btw.: long wouldn't work on Windows. It's long long there … or int64_t on all platforms with stdint.)

Right. I just installed the 64-bits libraries (BUILD_INDEX64=ON) and couldn't see anything telling me to use WeirdNEC, LAPACK_ILP64 or HAVE_LAPACK_CONFIG_H. Thanks for noticing that!

I am imagining a future where you do
cc -I/foo/include/netlib64 -o bar bar.c -L/foo/lib -lcblas64
And things are handled in foo/include/netlib64/cblas.h, otherwise by foo/include/netlib/cblas.h (possibly linked to foo/include/cblas.h).

I have the suspicion that this is not what you meant, but I want to convince that it is better;-)

Sorry, let me explain. At first, I liked the idea of keeping the original header cblas.h and creating include/netlib64/cblas.h and include/netlib/cblas.h with something like

#if defined(WeirdNEC)
   #define WeirdNEC
#endif
#include <cblas.h>

You could try to not to duplicate the header by placing 'the' header in /foo/include/cblas.h and have /foo/include/netlib64/cblas.h include that one only by defining WeirdNEC, but that means that the 64 bit and 32 bit packages share that common header file, which is messy for packaging. It is way better if each puts its file into separate places/names. The name needs to stay cblas.h because you don't want to go around replacing #include <cblas.h> lines.

Edit: Also, having cblas.h include ../cblas.h is messy by itself. Also we define one header installation directory for cmake.

but yes, we would have to use include/netlib64 and include in as include dirs if we one header includes the other.

By default that's /foo/include, not /foo/netlib64/include. I am not going to change this default. Packagers will have to specify the subdirectory like this (BSD make in pkgsrc):

.if !empty(LAPACK_COMPONENT:M*64)
.  if empty(MACHINE_ARCH:M*64)
PKG_FAIL_REASON+=       "${LAPACK_COMPONENT} incompatible with non-64-bit platform"
.  endif
HEADERDIR=netlib64
.else
HEADERDIR=netlib
.endif

# Note: We patch the build to install both static and
# shared libraries.
CMAKE_ARGS=     -DBUILD_DEPRECATED=ON \
                -DBUILD_SHARED_LIBS=ON \
                -DBUILD_STATIC_LIBS=ON \
                -DCMAKE_INSTALL_INCLUDEDIR=${PREFIX}/include/${HEADERDIR} \
                ${LAPACK_COMPONENT_CMAKE_ARGS}

That seems good to me. So, you would only add an alternative to build LAPACK without having to guess the compiler flags. But the current way would also work.

(Btw.: long wouldn't work on Windows. It's long long there … or int64_t on all platforms with stdint.)

Good to know. BLAS++ and LAPACK++ use int64_t instead of long long.

drhpc · 2021-06-09T08:07:43Z

@weslleyspereira So you at first liked this idea:

#if defined(WeirdNEC)
   #define WeirdNEC
#endif
#include "../cblas.h"

with /prefix/include/cblas.h and /prefix/include/netlib64/cblas.h, the latter locating the former? But you do agree now that it is a more robust solution to install a header that looks like this for a 64 bit build?

#if defined(WeirdNEC) || @HAVE_ILP64@
   #define CBLAS_INDEX long
   #ifndef WeirdNEC
   #define WeirdNEC
   #endif
#else
   #define CBLAS_INDEX int
#endif

(long vs. int64 is a different matter, but I am all for doing that change, just like BLAS++)

Heck, I am not even sure if it is safe to assume that `#include ".../cblas.h" will find only the other indented header. The C standard seems to say that search order is implementation-defined, not necessarily relative to the current header. My main issue as packager is that I'd need a separate package for that common header or have the 64 bit package depend on the 32 bit one just for that. This would suck.

I would really like to go forward now with such a change for pkgsrc, to settle a change for the upstream code later. We could discuss a new symbol to force 32 bit indices or 64 bit indices explicitly with any of the headers (-DNETLIB_INDEX_BITS=64 ?), just defaulting to what the library was built with.

Can I get agreement about our intended solution being this?

lib/libcblas64.so
include/optional_subdir64/cblas.h

and

lib/libcblas.so
include/optional_subdir/cblas.h

Each build of LAPACK code results in headers that, at least by default, match the installed libraries without the user defining anything. OK?

I then could slip this in before the upcoming release of pkgsrc (deadline nearing) and we can further discuss the details of that implementation so that I can drop the patches after merging something here, with a new LAPACK release. With this change, the plain Makefile build also needs fixing, but I don't need that for my patches yet when I just use the CMake build.

(Just need to somehow check my temper when trying to beat that weird CMake build into submission, where it shuffles header copies around the build directories and then cannot find them for install. Or decide about those broken .cmake files having any use for us, maybe just drop them from install … we got pkg-config!)

drhpc · 2021-06-10T18:28:50Z

Anything? I must admit that I don't see much chance for a different solution in practice, as this is the example set by openblas, the main implementation we use. I can imagine convincing Intel to have subdirectory for 64 bit/32 bit index headers, too, wrapping over their mkl_cblas.h and mkl_lapacke.h. Otherwise I build a simple package that just provides those.

include/mkl-blas/cblas.h
include/mkl-blas64/cblas.h

Currently, I added machinery to pkgsrc to provide builds with the funny -DWeirdNEC -DHAVE_LAPACK_CONFIG_H -DLAPACK_ILP64 line, with both cblas and cblas64 installing identical headers. It could stay that way, but I still think it makes sense to have the header set up to match the build ABI.

weslleyspereira · 2021-06-11T14:42:39Z

@weslleyspereira So you at first liked this idea:
#if defined(WeirdNEC)
   #define WeirdNEC
#endif
#include "../cblas.h"
with /prefix/include/cblas.h and /prefix/include/netlib64/cblas.h, the latter locating the former? But you do agree now that it is a more robust solution to install a header that looks like this for a 64 bit build?
#if defined(WeirdNEC) || @HAVE_ILP64@
   #define CBLAS_INDEX long
   #ifndef WeirdNEC
   #define WeirdNEC
   #endif
#else
   #define CBLAS_INDEX int
#endif

Yes, that's it. I agree with your solution of having subfolders for the 32- and 64-bits headers. I discussed this with @langou, and he was also convinced this would be a good solution.

(long vs. int64 is a different matter, but I am all for doing that change, just like BLAS++)

Right. This should be addressed in another issue.

I would really like to go forward now with such a change for pkgsrc, to settle a change for the upstream code later. We could discuss a new symbol to force 32 bit indices or 64 bit indices explicitly with any of the headers (-DNETLIB_INDEX_BITS=64 ?), just defaulting to what the library was built with.

Can I get agreement about our intended solution being this?
lib/libcblas64.so
include/optional_subdir64/cblas.h
and
lib/libcblas.so
include/optional_subdir/cblas.h

Yes. I think you can go forward and propose a PR in the future, thanks! I personally think a new symbol like NETLIB_INDEX_BITS makes total sense. I would just make sure the default value remains 32, and that -DWeirdNEC implies -DNETLIB_INDEX_BITS=64.

Each build of LAPACK code results in headers that, at least by default, match the installed libraries without the user defining anything. OK?

Sounds good to me.

I then could slip this in before the upcoming release of pkgsrc (deadline nearing) and we can further discuss the details of that implementation so that I can drop the patches after merging something here, with a new LAPACK release. With this change, the plain Makefile build also needs fixing, but I don't need that for my patches yet when I just use the CMake build.

Ok! We will probably have a LAPACK release in the second semester of 2021. And yes, the Makefile should be adjusted accordingly, and I am willing to help with that.

drhpc · 2021-06-12T07:52:14Z

This is somewhat related. We should not forget that the headers for netlib CBLAS are not only provided by netlib … NumPy always uses its own header:

https://github.com/numpy/numpy/blob/main/numpy/core/src/common/npy_cblas.h

And in this header, it sets CBLAS_INDEX=size_t, different from the integer type used in specifying indices. It is used solely for the return values of some functions:

$ grep CBLAS_INDEX ./numpy/core/src/common/npy_cblas_base.h                                                                                                                                  
CBLAS_INDEX BLASNAME(cblas_isamax)(const BLASINT N, const float  *X, const BLASINT incX);
CBLAS_INDEX BLASNAME(cblas_idamax)(const BLASINT N, const double *X, const BLASINT incX);
CBLAS_INDEX BLASNAME(cblas_icamax)(const BLASINT N, const void   *X, const BLASINT incX);
CBLAS_INDEX BLASNAME(cblas_izamax)(const BLASINT N, const void   *X, const BLASINT incX);

The difference:

$ grep cblas_isamax ./numpy/core/src/common/npy_cblas_base.h  /data/pkg/include/cblas.h                                                                                                      
./numpy/core/src/common/npy_cblas_base.h:CBLAS_INDEX BLASNAME(cblas_isamax)(const BLASINT N, const float  *X, const BLASINT incX);
/data/pkg/include/cblas.h:CBLAS_INDEX cblas_isamax(const CBLAS_INDEX N, const float  *X, const CBLAS_INDEX incX);

I wonder if that's possibly causing trouble. For Netlib, there is only one type of index, while other implementations use a differing return value type for the index functions. OpenBLAS sets the example. They say isamax returns unsigned size_t, but the C wrapper actually calls a Fortran function that returns a signed integer (Edit: A subroutine that writes a signed 32 or 64 bit integer value to a handed-in reference to an unsigned 64 bit variable on 64 bit systems).

Does the reference implementation have an opinion about this? I guess there is no real trouble, as a size_t value will always be able to hold any non-negative return from isamax(). But it smells iffy. (Edit: You could build with 64 bit indices on a 32 bit system where size_t is 32 bits, right? Then you got overflow. In additon to the uneasyness of casting size_t * to int *.)

Since optmized implementations seem to have decided on size_t there, should the reference accept that fact and follow?

drhpc · 2021-06-12T07:58:05Z

And how dangerous is it, actually, to link numpy with reference cblas?

martin-frbg · 2021-06-12T16:17:36Z

OpenBLAS sets the example. (...)
Since optmized implementations seem to have decided on size_t there, should the reference accept that fact and follow?

I certainly cannot speak for numpy (or mkl etc. for that matter), but I'd hesitate to claim OpenBLAS to be normative in any form, least of all relative to what is (I believe) generally seen as the reference implementation...

drhpc · 2021-06-12T17:08:34Z

Sure. It's just that OpenBLAS or MKL are what people use in practice and both seem to have settled on

#define CBLAS_INDEX size_t  /* this may vary between platforms */
#ifdef MKL_ILP64
#define MKL_INT MKL_INT64
#else
#define MKL_INT int
#endif
CBLAS_INDEX cblas_isamax(const MKL_INT N, const float  *X, const MKL_INT incX);

or similarily

#ifdef OPENBLAS_USE64BITINT
typedef BLASLONG blasint;
#else
typedef int blasint;
#endif
#define CBLAS_INDEX size_t
CBLAS_INDEX cblas_isamax(OPENBLAS_CONST blasint n, OPENBLAS_CONST float  *x, OPENBLAS_CONST blasint incx);

vs. the reference

#ifdef WeirdNEC
   #define CBLAS_INDEX long
#else
   #define CBLAS_INDEX int
#endif
CBLAS_INDEX cblas_isamax(const CBLAS_INDEX N, const float  *X, const CBLAS_INDEX incX);

How come that they diverge from the reference here? Was there communication about that? Also … I see MKL and OpenBLAS defining a host of functions that aren't even part of reference CBLAS:

CBLAS_INDEX cblas_isamin(const MKL_INT N, const float  *X, const MKL_INT incX);
CBLAS_INDEX cblas_idamin(const MKL_INT N, const double *X, const MKL_INT incX);
CBLAS_INDEX cblas_icamin(const MKL_INT N, const void   *X, const MKL_INT incX);
CBLAS_INDEX cblas_izamin(const MKL_INT N, const void   *X, const MKL_INT incX);

CBLAS_INDEX cblas_isamin(OPENBLAS_CONST blasint n, OPENBLAS_CONST float  *x, OPENBLAS_CONST blasint incx);
CBLAS_INDEX cblas_idamin(OPENBLAS_CONST blasint n, OPENBLAS_CONST double *x, OPENBLAS_CONST blasint incx);
CBLAS_INDEX cblas_icamin(OPENBLAS_CONST blasint n, OPENBLAS_CONST void  *x, OPENBLAS_CONST blasint incx);
CBLAS_INDEX cblas_izamin(OPENBLAS_CONST blasint n, OPENBLAS_CONST void *x, OPENBLAS_CONST blasint incx);

CBLAS_INDEX cblas_ismax(OPENBLAS_CONST blasint n, OPENBLAS_CONST float  *x, OPENBLAS_CONST blasint incx);
CBLAS_INDEX cblas_idmax(OPENBLAS_CONST blasint n, OPENBLAS_CONST double *x, OPENBLAS_CONST blasint incx);
CBLAS_INDEX cblas_icmax(OPENBLAS_CONST blasint n, OPENBLAS_CONST void  *x, OPENBLAS_CONST blasint incx);
CBLAS_INDEX cblas_izmax(OPENBLAS_CONST blasint n, OPENBLAS_CONST void *x, OPENBLAS_CONST blasint incx);

CBLAS_INDEX cblas_ismin(OPENBLAS_CONST blasint n, OPENBLAS_CONST float  *x, OPENBLAS_CONST blasint incx);
CBLAS_INDEX cblas_idmin(OPENBLAS_CONST blasint n, OPENBLAS_CONST double *x, OPENBLAS_CONST blasint incx);
CBLAS_INDEX cblas_icmin(OPENBLAS_CONST blasint n, OPENBLAS_CONST void  *x, OPENBLAS_CONST blasint incx);
CBLAS_INDEX cblas_izmin(OPENBLAS_CONST blasint n, OPENBLAS_CONST void *x, OPENBLAS_CONST blasint incx);

So, extending the standard is one thing, but size_t vs int seems to be a serious issue on 64 bit systems. This should be settled some way. It seems to me that the Netlib way is sensible: Same type as that which is used for indices. As all call Fortran routines like this in the end

c     isamaxsub.f
c
c     The program is a fortran wrapper for isamax.
c     Witten by Keita Teranishi.  2/11/1998
c
      subroutine isamaxsub(n,x,incx,iamax)
c
      external isamax
      integer  isamax,iamax
      integer n,incx
      real x(*)
c
      iamax=isamax(n,x,incx)
      return
      end

… handing in an address of size_t for iamax, that just seems wrong. I didn't find another implementation than this reference one in the OpenBLAS sources. Are they just stupid changing the external type like that or am I overlooking something very basic? Is anyone actually using these functions?

langou · 2021-06-12T17:24:11Z

Hi all, Reference BLAS, reference CBLAS, reference LAPACK, two of the main thrusts of these projects are (1) numerical algorithms and (2) defining common interfaces, a reference implementation and a test suite that goes this. I think everyone involved in these projects is happy to look and learn from other projects (OpenBLAS, MKL, etc.) about software engineering, best practice for deploying the software, etc. We have a lot to learn from these projects. (And we also learn a lot from other numerical linear algebra projects!) Anyhow: reference BLAS, CBLAS, LAPACK can use some improvement in its CMake packaging, interfaces, and if OpenBLAS (e.g.) has a better process, that is well suited for us, well, I am all in favor moving toward this model.

langou · 2021-06-12T19:57:16Z

To add some context, the CBLAS was born from a committee (the Basic Linear Algebra Subprograms Technical Forum) that worked from 1996 to 2000 on revisiting the BLAS, as part of this they defined a C interface for the BLAS. See:
http://www.netlib.org/blas/blast-forum/
In particular see:
http://www.netlib.org/blas/blast-forum/cinterface.pdf
I believe that the CBLAS offered by LAPACK is an implementation of the interface as defined by the Basic Linear Algebra Subprograms Technical Forum 25 years ago.

If there are suggestions to improve CBLAS, send them along. I can try to pass this to the various stakeholders.

drhpc · 2021-06-13T07:59:27Z

Thanks for the pointer. So the relevant part seems to be B.2.2 in that spec, which says that BLAS_INDEX usually is size_t, but might also be chosen to be identical to the (signed) Fortran integer type used for indexing. It's up to the implementation.

So it seems that popular optimized implementations chose size_t and the Netlib reference chose the same integer it uses for Fortran. I see copies of cblas.h all around in various projects that use the lib (like numpy, shipping a header for an external lib), with that line

#define CBLAS_INDEX size_t  /* this may vary between platforms */

In https://github.com/LuaDist/gsl/blob/master/cblas/gsl_cblas.h, this is accompanied by

/* This is a copy of the CBLAS standard header.
 * We carry this around so we do not have to
 * break our model for flexible BLAS functionality.
 */

This sounds like this originated in the reference implementation, but has since changed? Looking at 4177968 … I see that the CBLAS_INDEX switch with WeirdNEC has been there before the 64 bit build. Wow, is this commit recent. Now I see that size_t was in the referenc cblas.h until 2015, 83fc0b4 having changed it and introduced the WeirdNEC define. I did not imagine that this is so recent! Wildly confusing.

I also see that the earlier version of cblas.h handed an int to the fortran call, now CBLAS_INDEX. This seems to be correct now, with consistent usage of CBLAS_INDEX as integer type and the switch for 32 or 64 bits in the Fortran part.

But could it be that the optimized libraries that inherited an old version of cblas.h with size_t but sync sources with current CBLAS code from the reference have a nice bug going? Aren't they doing something like this for the 32 bit case on a 64 bit system?

#include <stdio.h>
#include <stdlib.h>


void ia_max(int a, void *b)
{
	int *ib = (int*)b;
	*ib = a*2;
}


int main(int argc, char **argv)
{
	int i = atoi(argv[1]);
	size_t maxi;
	ia_max(i, &maxi);
	printf("in %d out %ld\n", i, (long)maxi);
	return 0;
}

This results in

$ gcc -O -o t t.c
$ ./t 42
in 42 out 140724603453524

Iniitalizing the size_t value to zero helps, but probably only in the little-endian case. Does nobody get into trouble for this? I have to be missing something.

To conclude:

Reference CBLAS had the size_t as return value first.
It used int in the actual call to Fortran, though.
Downstream (optimized BLAS, CBLAS users) run with the old version of the header.
Reference CBLAS introduces WeirdNEC hack for a specific system, replacing size_t with int or long (matching the Fortran side?!)
64 bit Reference CBLAS interface gets build on top of that, using CBLAS_INDEX everywhere for Fortran default integer.
Downstreams did their own thing with 64 bit support, but separating it from CBLAS_INDEX, which is always size_t.
Downstreams inherit the CBLAS wrappers that employ CBLAS_INDEX to call Fortran that expects default integer.

This sounds like a wonderful breakage as a result. Headers and code diverged. How come nobody noticed issues yet? Or did I miss the part where the reference CBLAS wrapper code for isamax and friends is not actually used?

martin-frbg · 2021-06-13T10:38:04Z

OpenBLAS at least does not use the CBLAS wrapper code from Reference-LAPACK (and never did, the source is there but does not get built)

drhpc · 2021-06-13T12:00:36Z

@martin-frbg Good to know. Can you point out a code path for, say, x86-64 that shows how the size_t is passed around to the actual computation for cblas_isamax()? I found some specific kernel implementation but am not sure about the general case.

Would be good to know that nobody actually passes a (size_t*) to the Fortran interface.

For sure it's not good that projects just assume

size_t cblas_isamax(…)

when the actual library might offer an int or long (or int64_t) as return value. Might work most of the time with values in 64 bit registers, but it's not nice. Can we rectify this in the implementations? People did not pick up on the example of Netlib in the past 5 years about consinstent use of CBLAS_INDEX.

martin-frbg · 2021-06-13T12:30:51Z

relevant code is in OpenBLAS/interface, e.g. interface/imax.c gets compiled to cblas_isamax() when CBLAS is defined, no Fortran code involved in its call graph.

drhpc · 2021-06-13T21:44:45Z

Ah, good. So the one case which is actually problematic is depending projects using a copy of cblas.h that does not fit the library.

I don't find actual usage of cblas_isamax() and friends in NumPy (and SciPy), so this might be only a theoretical issue. It should be fixed nonetheless. So:

Others follow the Netlib example of using int32_t/int64_t (let's be explicit while at that;-) BLAS_INDEX for both size returns and index arguments.
Netlib caves in and reverts to size_t for those returns like others.

Is this a separate issue to discuss? It does releate to the choice of 32 or 64 bit library, though.

PS: I am still unsure if enums in the API are a good idea (as actual data type for function arguments and struct members), as there are compiler options to change what integer is used underneath them. Not that relevant in practice, but makes me uneasy nevertheless.

drhpc · 2021-06-14T07:50:21Z

The more I think of this, the more I lean towards Option 2: We had size_t in the API for a very long time. Then Netlib changed that size_t to int or long. Regardless of what better matches the Fortran code or might be more consistent, the size_t was established API and Netlib reference broke that.

Should I open a PR about changing things for

size_t cblas_isamax(const CBLAS_INDEX N, const float  *X, const CBLAS_INDEX incX);
size_t cblas_idamax(const CBLAS_INDEX N, const double *X, const CBLAS_INDEX incX);
size_t cblas_icamax(const CBLAS_INDEX N, const void   *X, const CBLAS_INDEX incX);
size_t cblas_izamax(const CBLAS_INDEX N, const void   *X, const CBLAS_INDEX incX);

again? There should be no macro at this position anymore to emphasize that it's always size_t, everywhere, past and future.

drhpc · 2021-06-18T09:06:53Z

In numpy/numpy#19243 we now basically got down to: „Screw Netlib, size_t works for everyone else“.

christoph-conrads · 2021-06-20T18:56:18Z

There are three reasons to use size_t:

All of the C and C++ standard library functions accept and return this value, e.g., void* malloc(size_t), size_t strlen(), or std::size_t std::vector<T>::size() (C++). Using size_t avoids truncating values and signed/unsigned conversions.
size_t is often used to be express quantities that cannot be negative, e.g., matrix dimensions.
The C and C++ standards guarantee that you can store the size of any array in a size_t and that you can index all elements with size_t, cf. cppference.com: size_t.

Edit: You could build with 64 bit indices on a 32 bit system where size_t is 32 bits, right? Then you got overflow.

No because a 32-bit system may have more than 4 GB of virtual memory (Linux supports this) but a single 32-bit process can never access more than 4 GB. That is, the upper 32-bit of the 64-bit indices are never used.

Memory limit to a 32-bit process running on a 64-bit Linux OS

drhpc · 2021-06-20T22:34:14Z

I'm also thinking that keeping size_t is the correct thing to do, because changing from it was the ABI break and brought Netlib out of sync with the rest of the world.

But I feel compelled to nitpick on your arguments;-)

1. All of the C and C++ standard library functions accept and return this value

When I researched this,I stumbled upon the admission that it was a historic error to use an unsigned type for C++ container indices, and probably even the return type of the size() method, as you quickly end up mixing signed and unsigned numbers in some way. The current state of Netlib would be consistent with itself, always using signed types for size and indices, but of course inconsistent with malloc(), which has a requirement for unsigned size to actually be able to address all memory that fits into 32 bits (or 64 bits, in theory).

I am wondering myself right about that in code I wrote where I eventually hand an index as offset to a function call. The index is unsigned, the offset signed. Apart from compilers (MSVC) being confused by -unsigned_value, this would mean that I always have to worry about possible overflow in conversion.

But anyway, if it's just about computing memory sizes to hand to malloc() and friends, size_t is the natural thing, and it has been there before in CBLAS.

On possible issues with the current state of the code, mismatch with vendored cblas.h in builds:

No because a 32-bit system may have more than 4 GB of virtual memory (Linux supports this) but a single 32-bit process can never access more than 4 GB. That is, the upper 32-bit of the 64-bit indices are never used.

Right, size_t stays 32 bit. When you (silly as it might be) built cblas_isamax() to return a 64 bit integer, after hacking the build to not use long, but int64_t, of course, what will really happen in such usage?

size_t cblas_isamax(); // really int64_t cblas_isamax()!
size_t value = cblas_isamax(…);

The x86 calling convention might put the 64 bit value into EAX and EDX. Or it might work with a pointer return and some buffer. But what would other architectures do? So you might not get corruption, but for sure a wrong value. The best case is the higher 32 bits being ignored.

Now imagine a big-endian 32 bit system (some form of ARM) … sure you'd even get the desired half of the value returned?

You couldn't really work with non-sparse data that needs 64 bit indices in the 32 bit program, sure. But just being able to make a non-matching function call that at_least gives wrong results seems unhealthy.

I did some quick testing … on x86 Linux (gcc -m32 on an x86-64 system), you do just drop the upper 32 bits.

The more interesting case … 64 bit size_t:

size_t cblas_isamax(); // really int32_t cblas_isamax()!
size_t value = cblas_isamax(…);

Again, on x86-64, the peculiar relation between 64 bit RAX and 32 bit EAX makes things somewhat well-defined to also just silently zero the upper 32 bits once you do a 32 bit operation on the shared register. But there is fun to have with a slightly weird function defintion:

$ cat ret32.c 
#include <stdint.h>

int32_t ret64(int64_t a)
{
	a += 1LL<<32;
	return a;
}
$ gcc -m64  -g -c -o ret32.o ret32.c 
$ LANG=C objdump -S ret32.o 
[…]
   8:	48 89 7d f8          	mov    %rdi,-0x8(%rbp)
	a += 1LL<<32;
   c:	48 b8 00 00 00 00 01 	movabs $0x100000000,%rax
  13:	00 00 00 
  16:	48 01 45 f8          	add    %rax,-0x8(%rbp)
	return a;
  1a:	48 8b 45 f8          	mov    -0x8(%rbp),%rax

You could argue if it is smart for the compiler to work on the full 64 bit register and leave the upper 32 bits uncleared for a function that is expected to return a 32 bit value, but it is perfectly legal if you rely on the caller only using the lower 32 bits, I guess.

$ cat call.c 
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

INDEX ret64(int64_t);

int main(int argc, char **argv)
{
	if(argc < 2)
		return 1;
	int64_t a = (int64_t)strtoll(argv[1], NULL, 10);
	INDEX  s = ret64(a);
	printf("%lld\n", (long long)s);
	return 0;
}
$ gcc -m64 -g -DINDEX=int32_t -c -o call32_64.o call.c
$ gcc -m64 -g -DINDEX=size_t -c -o call64_64.o call.c
$ ./call32_64 1
1
$ ./call64_64 1
4294967297

Fun. A 32 bit return value that gives more than what's possible in 32 bits. This is what can happen (in principle) with the current state of Netlib CBLAS being linked with code that expects size_t. I guess though that the upper 32 bits of RAX will be zero in the actual code in practice. But who knows … compiler expects caller not to use more than the lower 32 bits on any platform … might as well store garbage there.

So … are we agreeing on moving Netlib back to size_t as return value?

weslleyspereira · 2021-06-23T18:14:09Z

Thanks for all these valuable comments!

I discussed this topic a bit with @langou. Based on the discussion here, my proposal is:

In a separate PR:

We move back to a cblas.h that uses two integer definitions, let's say CBLAS_INDEX and CBLAS_INT. That is what happens in MKL (CBLAS_INDEX and MKL_INT) and OpenBLAS (CBLAS_INDEX and blasint). CBLAS_INDEX will be used only in the return of i*amax. With that, we restore an ABI that is compatible with other BLAS.
Additionally, we choose the default value of CBLAS_INDEX to be size_t and collect opinions from the community.

I think this is aligned (or maybe the same) idea behind the recent discussions in this thread.
As @drhpc pointed out,
83fc0b4 changed the default value of CBLAS_INDEX, and
4177968 changed the use of CBLAS_INDEX.

Just to reinforce:

OpenBLAS, MKL, GNU Scientific Library, and Numpy all use size_t by default.
The C interface for BLAS (https://www.netlib.org/blas/blast-forum/cinterface.pdf) indicates that, usually, CBLAS_INDEX = size_t.

Do you agree? If you do, I can open the PR. Or maybe @drhpc would like to do that.

drhpc · 2021-06-23T18:25:40Z

I agree. And please just go ahead with the PR.

langou · 2021-06-23T19:14:09Z

@mgates3 mentioned to me the discussion on the Slate Google Group:
https://groups.google.com/a/icl.utk.edu/g/slate-user/c/f5y6gt0aoLs/m/oQyyhikwCgAJ
The discussion is not about what "CBLAS_INDEX" should be, but more what "CBLAS_INT" should be. Should CBLAS_INT be size_t or signed integer or etc.? I think participants are making good points so I am passing along.

weslleyspereira · 2021-06-23T21:57:25Z

Please, see #588.

christoph-conrads · 2021-06-27T15:25:15Z

Right, size_t stays 32 bit. When you (silly as it might be) built cblas_isamax() to return a 64 bit integer, after hacking the build to not use long, but int64_t, of course, what will really happen in such usage?
size_t cblas_isamax(); // really int64_t cblas_isamax()!
size_t value = cblas_isamax(…);
The x86 calling convention might put the 64 bit value into EAX and EDX. Or it might work with a pointer return and some buffer. But what would other architectures do? So you might not get corruption, but for sure a wrong value. The best case is the higher 32 bits being ignored.

Now imagine a big-endian 32 bit system (some form of ARM) … sure you'd even get the desired half of the value returned?

This is game over. On 32-bit Arm CPUs, four 32-bit values can be passed and returned in registers, 64-bit values occupy two consecutive registers, see Section 6.1.1.1 in Procedure Call Standard for the Arm Architecture. Instead of writing to one register, the callee will clutter two registers with his 64-bit integers; this is obviously a problem. As soon as the caller runs out of registers for parameters, the stack is used. The stack alignment is 32 bits but instead of reading or writing 32 bit, the callee writes 64 bit; again, this is game over and this problem (mismatch of stack read/write sizes) should cause problems on all instruction set architectures at some point.

Arm ABI documentation

christoph-conrads · 2021-06-27T15:59:27Z

I am wondering myself right about that in code I wrote where I eventually hand an index as offset to a function call. The index is unsigned, the offset signed. Apart from compilers (MSVC) being confused by -unsigned_value, this would mean that I always have to worry about possible overflow in conversion.

No, the standard committees behind C and C++ make your code behave the obvious way in this case: If u is an unsigned value and s is a signed value, where u has at least as many bits as s, then u + s will yield the mathematically correct result unless u + s over- or underflows. If it under-/overflows, the result will wrap around, i.e., (u + s) mod 2^b, where b is the number of bits in u and s. On the other hand, if the signed type can represent all values of the unsigned type, then the unsigned value will be converted to the unsigned type.

The relevant clauses in the C11 standard draft are the following:

6.2.5.9: Binary operations with only unsigned operands cannot overflow; the result is taken modulo MAX + 1, where MAX is the largest representable value.
6.3.1.3: Given a signed value s, it is converted to the unsigned value s if s >= 0, otherwise it is converted to s + MAX + 1.
6.3.1.8: Signed and unsigned operands [of the same size] are converted to unsigned; an unsigned operand is converted to a signed type if the signed type can represent all values of the unsigned type

Hence, u + s (C syntax) will be evaluated to

(u + s) mod (M + 1) if s >= 0,
(u + s + M + 1) mod (M + 1) otherwise.

In the absence of over- or underflow, this expression will evaluate to u + s which is the intuitively desired outcome.

christoph-conrads · 2021-06-27T16:15:05Z

When I researched this,I stumbled upon the admission that it was a historic error to use an unsigned type for C++ container indices, and probably even the return type of the size() method, as you quickly end up mixing signed and unsigned numbers in some way.

There are some C++ programmers (including the inventor of C++) who are proposing to use signed integers everywhere, see the C++ Core Guidelines but I would not call this an admission. The problem with the "signed integers everywhere" policy are

checking for minimum values: with an unsigned integer it is in many cases superfluous to check the minimum value, with a signed integer it is obligatory; this is error prone and can cause security problems, see for example CWE-839 Numeric Range Comparison Without Minimum Check.
overflows: an unsigned overflow has a well-defined result whereas a signed integer overflow constitutes undefined behavior.

You can attempt to check for a signed overflow with the expression a + b < a but the compiler might optimize it away without warning, see for example GCC bug 30475 assert(int+100 > int) optimized away from 2007. This would work with unsigned integers though (a unsigned, b possibly signed and b having at most as many bits as a). Seeing the article OptOut – Compiler Undefined Behavior Optimizations from 2020, GCC behavior has apparently not changed.

cdluminate · 2021-08-30T23:06:58Z

Just FYI. I'm the Debian developer who added the BLAS64 and LAPACK64 debian packages. And I developed the inital version of blas-lapack-switching mechanism for Gentoo. Feel free to at me if you would like my inputs.

epsilon-0 mentioned this issue Nov 1, 2020

create INDEX64 target #462

Merged

weslleyspereira added the Type: Enhancement label Apr 21, 2021

weslleyspereira added the Related: Build System label Apr 21, 2021

drhpc mentioned this issue Jun 3, 2021

cmake build: more modularity, linker fix, pkg-config fixup #556

Closed

2 tasks

langou closed this as completed Jun 12, 2021

langou reopened this Jun 12, 2021

drhpc mentioned this issue Jun 14, 2021

npy_cblas.h does not match libcblas (size_t vs int or long for CBLAS_INDEX in Netlib) numpy/numpy#19243

Open

weslleyspereira mentioned this issue Jun 23, 2021

Using CBLAS_INDEX and CBLAS_INT integer types in CBLAS #588

Merged

2 tasks

mkrainiuk mentioned this issue May 5, 2022

CBLAS/LAPACKE extension for 64bit integer #666

Closed

MBlaschek mentioned this issue Mar 19, 2024

added index64 to netlib-lapack spack/spack#43260

Open

Allow a possible installation of index-64 library alongside standard index-32 library? #461

Allow a possible installation of index-64 library alongside standard index-32 library? #461

Comments

epsilon-0 commented Nov 1, 2020 • edited Loading

langou commented Nov 1, 2020

epsilon-0 commented Nov 1, 2020

epsilon-0 commented Nov 5, 2020

epsilon-0 commented Nov 6, 2020

christoph-conrads commented Feb 15, 2021

weslleyspereira commented Apr 21, 2021

epsilon-0 commented Apr 21, 2021

weslleyspereira commented Apr 21, 2021

drhpc commented Jun 3, 2021

drhpc commented Jun 3, 2021

weslleyspereira commented Jun 7, 2021

drhpc commented Jun 8, 2021 • edited Loading

drhpc commented Jun 8, 2021

drhpc commented Jun 8, 2021

drhpc commented Jun 8, 2021

weslleyspereira commented Jun 8, 2021

drhpc commented Jun 9, 2021

drhpc commented Jun 10, 2021

weslleyspereira commented Jun 11, 2021

drhpc commented Jun 12, 2021 • edited Loading

drhpc commented Jun 12, 2021

martin-frbg commented Jun 12, 2021

drhpc commented Jun 12, 2021

langou commented Jun 12, 2021

langou commented Jun 12, 2021

drhpc commented Jun 13, 2021 • edited Loading

martin-frbg commented Jun 13, 2021

drhpc commented Jun 13, 2021 • edited Loading

martin-frbg commented Jun 13, 2021

drhpc commented Jun 13, 2021

drhpc commented Jun 14, 2021 • edited Loading

drhpc commented Jun 18, 2021

christoph-conrads commented Jun 20, 2021

drhpc commented Jun 20, 2021

weslleyspereira commented Jun 23, 2021

drhpc commented Jun 23, 2021

langou commented Jun 23, 2021

weslleyspereira commented Jun 23, 2021

christoph-conrads commented Jun 27, 2021 • edited Loading

christoph-conrads commented Jun 27, 2021

christoph-conrads commented Jun 27, 2021 • edited Loading

cdluminate commented Aug 30, 2021

epsilon-0 commented Nov 1, 2020 •

edited

Loading

drhpc commented Jun 8, 2021 •

edited

Loading

drhpc commented Jun 12, 2021 •

edited

Loading

drhpc commented Jun 13, 2021 •

edited

Loading

drhpc commented Jun 13, 2021 •

edited

Loading

drhpc commented Jun 14, 2021 •

edited

Loading

christoph-conrads commented Jun 27, 2021 •

edited

Loading

christoph-conrads commented Jun 27, 2021 •

edited

Loading