Skip to content

LAPACKE: unexpected non-zero return from LAPACKE_dpbtrf_work #348

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
vladimir-ch opened this issue Jul 30, 2019 · 1 comment
Closed

LAPACKE: unexpected non-zero return from LAPACKE_dpbtrf_work #348

vladimir-ch opened this issue Jul 30, 2019 · 1 comment
Milestone

Comments

@vladimir-ch
Copy link
Contributor

Our Gonum lapacke wrapper CI tests fail on calls to LAPACKE_dpbtrf_work (gonum/netlib#63). The tests fail already on a 1x1 matrix with a positive element. I've created a small reproducer in C:

#include <stdio.h>
#include <lapacke.h>

int main() {
        lapack_int n = 1;
        lapack_int kd = 1; // Arbitrary but harmless, at least for the Fortran reference.
        lapack_int ldab = kd+1;
        double ab[4] = {2, -1, -1, -1}; // Arbitrary padding with -1

        lapack_int info = LAPACKE_dpbtrf_work(LAPACK_ROW_MAJOR, 'U', n, kd, ab, ldab);

         // info=1 with ROW_MAJOR, correct info=0 with COL_MAJOR
        printf("info = %d\n", info);

        return 0;
}

Valgrind reports use of uninitialized memory allocated by LAPACKE_dpbtrf_work so it's possible that success/failure depends on what's in that memory.

With COL_MAJOR layout returned info is always 0, so it's clearly an issue in the ROW_MAJOR part of LAPACKE_dpbtrf_work. The only relevant operation is the row->col->row conversion. LAPACKE_dpb_trans looks ok but it calls LAPACKE_dgb_trans which looks problematic to me.

The loops for (for example) row->col conversion are:

for( j = 0; j < MIN( n, ldin ); j++ ) {
    for( i = MAX( ku-j, 0 ); i < MIN3( ldout, m+ku-j, kl+ku+1 ); i++ ) {

At first sight, the for loop bounds refer to ldin and ldout which with a correct input is harmless but very odd anyway. Ignoring that, these loops for the reproducer input translate into:

for( j = 0; j < 1; j++ ) {
    for( i = 1; i < 2; i++ ) {

The i loop bounds are clearly wrong and access an element which is not in the matrix. The first element of the output array is not set.

There is another surprising aspect of LAPACKE_dgb_trans which is how the conversion between layouts is actually done. Using row major as an example, it apparently assumes that the input is the packed band matrix with columns in columns, diagonals in rows stored in row major and all it does is a plain full matrix transpose. However, this is in contrast to how BLAS defines band storage in row-major. Citing from https://www.netlib.org/blas/blast-forum/chapter2.pdf, page 25:

Similarly, for C (row-major storage), order = blas_rowmajor, the contiguous dimension (rows) of the matrix is stored in the contiguous dimension (rows) of the array, strided by lda. ...

This is further confirmed in Appendix B, page 194 where the row major storage is depicted explicitly. As you can read in gonum/netlib#63 this discordance is causing an utter confusion to us and would be extremely helpful and relieving if someone could conclusively clarify how exactly LAPACKE band matrices are laid out in memory in row-major order.

@vladimir-ch
Copy link
Contributor Author

I'll close this because based on experiments with OpenBLAS and MKL I came to the conclusion that the row-major band storage for LAPACKE is indeed the LAPACK packing (diagonals in rows, columns in colums) only stored in row-major layout. This means that it's not the CBLAS format which is surprising and unfortunate, and is not clearly documented anywhere. Since in Gonum we use the CBLAS format, we'll convert the matrices on our side directly to the LAPACK packing in col-major layout.

Knowing this, LAPACKE_dgb_trans now makes more sense. Also, I cannot reproduce the valgrind warning about uninitialized memory. I suspect that the binary was picking OpenBLAS 0.3.6 from the system which was somehow causing trouble (maybe because it wasn't compiled with the -fno-optimize-sibling-calls flag? Not sure).

@julielangou julielangou added this to the LAPACK 3.9.0 milestone Nov 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants