LAPACKE: unexpected non-zero return from LAPACKE_dpbtrf_work #348

vladimir-ch · 2019-07-30T07:49:14Z

Our Gonum lapacke wrapper CI tests fail on calls to LAPACKE_dpbtrf_work (gonum/netlib#63). The tests fail already on a 1x1 matrix with a positive element. I've created a small reproducer in C:

#include <stdio.h>
#include <lapacke.h>

int main() {
        lapack_int n = 1;
        lapack_int kd = 1; // Arbitrary but harmless, at least for the Fortran reference.
        lapack_int ldab = kd+1;
        double ab[4] = {2, -1, -1, -1}; // Arbitrary padding with -1

        lapack_int info = LAPACKE_dpbtrf_work(LAPACK_ROW_MAJOR, 'U', n, kd, ab, ldab);

         // info=1 with ROW_MAJOR, correct info=0 with COL_MAJOR
        printf("info = %d\n", info);

        return 0;
}

Valgrind reports use of uninitialized memory allocated by LAPACKE_dpbtrf_work so it's possible that success/failure depends on what's in that memory.

With COL_MAJOR layout returned info is always 0, so it's clearly an issue in the ROW_MAJOR part of LAPACKE_dpbtrf_work. The only relevant operation is the row->col->row conversion. LAPACKE_dpb_trans looks ok but it calls LAPACKE_dgb_trans which looks problematic to me.

The loops for (for example) row->col conversion are:

for( j = 0; j < MIN( n, ldin ); j++ ) {
    for( i = MAX( ku-j, 0 ); i < MIN3( ldout, m+ku-j, kl+ku+1 ); i++ ) {

At first sight, the for loop bounds refer to ldin and ldout which with a correct input is harmless but very odd anyway. Ignoring that, these loops for the reproducer input translate into:

for( j = 0; j < 1; j++ ) {
    for( i = 1; i < 2; i++ ) {

The i loop bounds are clearly wrong and access an element which is not in the matrix. The first element of the output array is not set.

There is another surprising aspect of LAPACKE_dgb_trans which is how the conversion between layouts is actually done. Using row major as an example, it apparently assumes that the input is the packed band matrix with columns in columns, diagonals in rows stored in row major and all it does is a plain full matrix transpose. However, this is in contrast to how BLAS defines band storage in row-major. Citing from https://www.netlib.org/blas/blast-forum/chapter2.pdf, page 25:

Similarly, for C (row-major storage), order = blas_rowmajor, the contiguous dimension (rows) of the matrix is stored in the contiguous dimension (rows) of the array, strided by lda. ...

This is further confirmed in Appendix B, page 194 where the row major storage is depicted explicitly. As you can read in gonum/netlib#63 this discordance is causing an utter confusion to us and would be extremely helpful and relieving if someone could conclusively clarify how exactly LAPACKE band matrices are laid out in memory in row-major order.

The text was updated successfully, but these errors were encountered:

vladimir-ch · 2019-08-19T16:46:56Z

I'll close this because based on experiments with OpenBLAS and MKL I came to the conclusion that the row-major band storage for LAPACKE is indeed the LAPACK packing (diagonals in rows, columns in colums) only stored in row-major layout. This means that it's not the CBLAS format which is surprising and unfortunate, and is not clearly documented anywhere. Since in Gonum we use the CBLAS format, we'll convert the matrices on our side directly to the LAPACK packing in col-major layout.

Knowing this, LAPACKE_dgb_trans now makes more sense. Also, I cannot reproduce the valgrind warning about uninitialized memory. I suspect that the binary was picking OpenBLAS 0.3.6 from the system which was somehow causing trouble (maybe because it wasn't compiled with the -fno-optimize-sibling-calls flag? Not sure).

vladimir-ch mentioned this issue Jul 30, 2019

lapack/netlib: add Dpbtrf and Dpbtrs gonum/netlib#63

Merged

vladimir-ch closed this as completed Aug 19, 2019

julielangou added this to the LAPACK 3.9.0 milestone Nov 16, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

LAPACKE: unexpected non-zero return from LAPACKE_dpbtrf_work #348

LAPACKE: unexpected non-zero return from LAPACKE_dpbtrf_work #348

vladimir-ch commented Jul 30, 2019

vladimir-ch commented Aug 19, 2019

Uh oh!

LAPACKE: unexpected non-zero return from LAPACKE_dpbtrf_work #348

LAPACKE: unexpected non-zero return from LAPACKE_dpbtrf_work #348

Comments

vladimir-ch commented Jul 30, 2019

vladimir-ch commented Aug 19, 2019

Uh oh!