Our Gonum lapacke wrapper CI tests fail on calls to LAPACKE_dpbtrf_work (gonum/netlib#63). The tests already fail on a 1x1 matrix with a positive element. I've created a small reproducer in C:
#include <stdio.h>
#include <lapacke.h>
int main() {
    lapack_int n = 1;
    lapack_int kd = 1; // Arbitrary but harmless, at least for the Fortran reference.
    lapack_int ldab = kd+1;
    double ab[4] = {2, -1, -1, -1}; // Arbitrary padding with -1
    lapack_int info = LAPACKE_dpbtrf_work(LAPACK_ROW_MAJOR, 'U', n, kd, ab, ldab);
    // info=1 with ROW_MAJOR, correct info=0 with COL_MAJOR
    printf("info = %d\n", info);
    return 0;
}
Valgrind reports use of uninitialized memory allocated by LAPACKE_dpbtrf_work, so it's possible that success or failure depends on what's in that memory.
With the COL_MAJOR layout the returned info is always 0, so the issue is clearly in the ROW_MAJOR part of LAPACKE_dpbtrf_work. The only relevant operation there is the row->col->row conversion. LAPACKE_dpb_trans looks ok, but it calls LAPACKE_dgb_trans, which looks problematic to me.
The loops for (for example) row->col conversion are:
for( j = 0; j < MIN( n, ldin ); j++ ) {
for( i = MAX( ku-j, 0 ); i < MIN3( ldout, m+ku-j, kl+ku+1 ); i++ ) {
At first sight, the for loop bounds refer to ldin and ldout, which is harmless with correct input but very odd anyway. Ignoring that, for the reproducer input these loops translate into:
for( j = 0; j < 1; j++ ) {
for( i = 1; i < 2; i++ ) {
The i loop bounds are clearly wrong and access an element which is not in the matrix. The first element of the output array is not set.
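The bound evaluation above can be double-checked with a small sketch. It assumes that, for uplo 'U', LAPACKE_dpb_trans forwards to LAPACKE_dgb_trans with m = n, kl = 0 and ku = kd, and that ldin = ldout = kd+1 = 2 for the reproducer; the helper names `dgb_j_limit` and `dgb_i_range` are hypothetical and exist only for this illustration.

```c
#include <assert.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define MIN3(a, b, c) MIN(MIN(a, b), c)

/* Half-open range [lo, hi) covered by the inner i loop. */
typedef struct {
    int lo;
    int hi;
} i_range;

/* Hypothetical helper: upper limit of the outer j loop,
 * copied from the loop header quoted above. */
static int dgb_j_limit(int n, int ldin) {
    return MIN(n, ldin);
}

/* Hypothetical helper: bounds of the inner i loop for column j,
 * copied from the loop header quoted above. */
static i_range dgb_i_range(int m, int kl, int ku, int ldout, int j) {
    assert(j >= 0);
    i_range r;
    r.lo = MAX(ku - j, 0);
    r.hi = MIN3(ldout, m + ku - j, kl + ku + 1);
    return r;
}
```

With the assumed arguments (m = n = 1, kl = 0, ku = 1, ldin = ldout = 2), `dgb_j_limit(1, 2)` gives 1 and `dgb_i_range(1, 0, 1, 2, 0)` gives the range [1, 2), which matches the translated loops.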
There is another surprising aspect of LAPACKE_dgb_trans, which is how the conversion between layouts is actually done. Using row major as an example, it apparently assumes that the input is the packed band matrix (columns in columns, diagonals in rows) stored in row-major order, and all it does is a plain full matrix transpose. However, this is in contrast to how BLAS defines band storage in row-major order. Citing from https://www.netlib.org/blas/blast-forum/chapter2.pdf, page 25:
Similarly, for C (row-major storage), order = blas_rowmajor, the contiguous dimension (rows) of the matrix is stored in the contiguous dimension (rows) of the array, strided by lda. ...
This is further confirmed in Appendix B, page 194, where the row-major storage is depicted explicitly. As you can read in gonum/netlib#63, this discrepancy is causing us considerable confusion, and it would be extremely helpful if someone could conclusively clarify how exactly LAPACKE band matrices are laid out in memory in row-major order.
I'll close this because, based on experiments with OpenBLAS and MKL, I came to the conclusion that the row-major band storage for LAPACKE is indeed the LAPACK packing (diagonals in rows, columns in columns), only stored in row-major layout. This means that it's not the CBLAS format, which is surprising and unfortunate, and it is not clearly documented anywhere. Since we use the CBLAS format in Gonum, we'll convert the matrices on our side directly to the LAPACK packing in col-major layout.
Knowing this, LAPACKE_dgb_trans now makes more sense. Also, I cannot reproduce the valgrind warning about uninitialized memory. I suspect that the binary was picking up OpenBLAS 0.3.6 from the system, which was somehow causing trouble (maybe because it wasn't compiled with the -fno-optimize-sibling-calls flag? Not sure).
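For illustration, the conversion from the CBLAS format to the LAPACK packing in col-major layout might look roughly like the sketch below for an upper symmetric band matrix. It assumes the CBLAS row-major convention in which A(i,j) is stored at src[i*ldsrc + (j-i)], and the LAPACK convention in which A(i,j) is stored at dst[(kd+i-j) + j*lddst], both with 0-based indices and i <= j <= i+kd. The function name is hypothetical; it is not part of LAPACKE, CBLAS or Gonum.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy the upper band of a symmetric n x n matrix with
 * kd superdiagonals from the assumed CBLAS row-major band format into the
 * LAPACK band packing stored in column-major order.
 *
 *   src: A(i,j) at src[i*ldsrc + (j-i)]        (matrix rows in array rows)
 *   dst: A(i,j) at dst[(kd+i-j) + j*lddst]     (diagonals in rows, columns in columns)
 *
 * Elements of dst outside the band are left untouched. */
static void sb_upper_cblas_row_to_lapack_col(int n, int kd,
                                             const double *src, int ldsrc,
                                             double *dst, int lddst) {
    assert(ldsrc >= kd + 1 && lddst >= kd + 1);
    for (int i = 0; i < n; i++) {
        /* Last column of row i that is still inside the band and the matrix. */
        int jmax = (i + kd < n - 1) ? i + kd : n - 1;
        for (int j = i; j <= jmax; j++) {
            dst[(size_t)(kd + i - j) + (size_t)j * lddst] =
                src[(size_t)i * ldsrc + (size_t)(j - i)];
        }
    }
}
```

For a 3x3 matrix with kd = 1, diagonal (1, 2, 3) and superdiagonal (4, 5), the assumed CBLAS input {1, 4, 2, 5, 3, 0} maps to the col-major output {*, 1, 4, 2, 5, 3}, where * is the unused top-left slot of the packing. The result can then be passed to a LAPACKE routine with LAPACK_COL_MAJOR, which sidesteps the row-major band question entirely.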