Skip to content

dgesv produces incorrect answer when compiling the library using CMake with USE_OPENMP=1 #2396

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
heilokchow opened this issue Feb 9, 2020 · 15 comments · Fixed by #2420
Closed
Milestone

Comments

@heilokchow
Copy link

When I use OpenBLAS to do solve some high dimensional linear equations using LAPACKE_dgesv, I use CMake to compile the OpenBLAS library libopenblas.a with -DUSE_OPENMP=1 since my program itself uses OpenMP for parallelism. My CMake build command is (unix system):

cmake -DUSE_OPENMP=1 <openblas directory with CMakeLists.txt>
cmake --build <current binary directory>

The computing process doesn't generate any warning or bugs. First, I get vector b by b = A*x, then, I solve x by using dgesv function with parameter A and b. If I compare the original x and solved x, when dimension is low, it works fine with no error. However, when dimension is high, lets say n = 100, the solved x is inconsistent with original x and the error is very large. If I use make USE_OPENMP=1 to recompile libopenblas.a using original (not CMake generated) Makefile and link my program with this library, the result of dgesv function is always correct. My testing code is shown below:

#include <stdio.h>
#include <cblas.h>
#include <lapacke.h>
#include <random>
#include <omp.h>

int main ()
{
    int n = 1000;
    int nrep = 20;
    double sum_error = 0.0;

    for (int k = 0; k < nrep; k++){
        double* A = new double[n*n];
        double* x = new double[n];
        double* b = new double[n];
        int* ipiv = new int[n];
        double sum = 0;

        std::random_device device;
        std::mt19937 generator(device());
        std::normal_distribution<double> normal(0.0, 1.0);

        for (int i = 0; i < n*n; i++)
            A[i] = normal(generator);

        for (int i = 0; i < n; i++)
            x[i] = normal(generator);

        cblas_dgemv(CblasRowMajor, CblasNoTrans, n, n, 1.0, A, n, x, 1, 0.0, b, 1);
        LAPACKE_dgesv(LAPACK_ROW_MAJOR, n, 1, A, n, ipiv, b, 1);

        for (int i = 0; i < n; i++)
            sum += abs(b[i] - x[i]);
        
        sum_error += sum;

        delete[] A;
        delete[] x;
        delete[] b;
        delete[] ipiv;
    }

    printf("Error: %f\n", sum_error);

    return 0;
}
@brada4
Copy link
Contributor

brada4 commented Feb 9, 2020

What processor? Which OpenBLAS version?
Test case does not break with cores up to Sandybridge, omp or pthread, 0.3.7
EDIT: compiled with make, not cmake

@heilokchow
Copy link
Author

Both my zen and kabylake processors will have the same issue. I use OpenBLAS version 0.3.7 for testing. Yep, directly compile with make USE_OPENMP=1 won't generate such problem. So is it caused by the experimental support of cmake?

@brada4
Copy link
Contributor

brada4 commented Feb 10, 2020

Probably this report will keep experimental label for another release or two.
Could you post at least cmake and make (or other backend used) versions, compiler versions etc.
It seems similar if not same as #2395 , maybe compare findings over there.
At least you have workaround with make

@martin-frbg martin-frbg added this to the 0.3.9 milestone Feb 10, 2020
@heilokchow
Copy link
Author

my cmake version is 3.16, gnu make version is 4.1 and gcc version is 7.3.0. I test the code on Ubuntu 18.04 and 19.04. I will try cmake on other platforms to see whether similar problem will occur.

@brada4
Copy link
Contributor

brada4 commented Feb 10, 2020

If you say make works, it could be worth testing versions with radically outdated/updated cmake makefile generator. The "old way" sort of works very consistently for decade or so.

@heilokchow
Copy link
Author

Update: I test that if I perform a normal compiling with cmake on unix system using pthread instead of omp, The dgesv function's result is also incorrect (when n is large). I use cmake 2.8.12 and cmake 3.16. Both of them will produce same error. Here is the building logs for cmake and standard make.

for cmake (under a build directory):

cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_VERBOSE_MAKEFILE=ON ../../OpenBLAS-0.3.7 >> cmake_log.txt 2>&1
make >> cmake_log.txt 2>&1

cmake_log.txt

for make (under /OpenBLAS-0.3.7 directory)

make >> make_log.txt 2>&1

make_log.txt

@brada4
Copy link
Contributor

brada4 commented Feb 11, 2020

How big are exact parameters? http://www.netlib.org/lapack/lapack-3.1.1/html/dgesv.f.html
The issue could be insufficiently robust integer arithmetic leading to pointer problems.

@heilokchow
Copy link
Author

On my device, when n=100, the sum_error in the test code above is quite large (around 1000).

@brada4
Copy link
Contributor

brada4 commented Feb 11, 2020

Does it correct itself id you make N divisible by 16? i.e if some asm kernel misfires down the hood?

@martin-frbg
Copy link
Collaborator

@brada4 stop please - we know already that it does work correctly when built with gmake, so the error must be somewhere in the cmake rules rather than the actual code.
(As OpenBLAS reimplements gesv with getrf/getrs, it is probably in the generator rules for the TRSM kernels. Yesterday's fix for TRMM does not appear to have solved it)

@marxin
Copy link
Contributor

marxin commented Feb 14, 2020

@heilokchow
Can you please describe how you compile and link the test from top-level OpenBLAS folder?
The exact command line would be preferred.

@martin-frbg
Copy link
Collaborator

@marxin the problem is completely reproducible (with or without USE_OPENMP) with just plain cmake .. (which is why I said "stop" above). So far I have traced it to the OpenBLAS reimplementation of getf2 in lapack/getf2/getf2_k.c (which uses iamax, gemv_n, swap and scal kernels) but have not yet narrowed it down further.

@marxin
Copy link
Contributor

marxin commented Feb 14, 2020

@marxin the problem is completely reproducible (with or without USE_OPENMP) with just plain cmake .. (which is why I said "stop" above). So far I have traced it to the OpenBLAS reimplementation of getf2 in lapack/getf2/getf2_k.c (which uses iamax, gemv_n, swap and scal kernels) but have not yet narrowed it down further.

All right! What we can do is to compare assemblies and see where's a difference.
Can you please provide steps on how to link to the code snippet?

@marxin
Copy link
Contributor

marxin commented Feb 14, 2020

Can you please provide steps on how to link to the code snippet?

I was missing some system header files. Now I can link it against a statically build library.
The binary has only about 100K, so one should be able to see a difference in the assembly.

@martin-frbg
Copy link
Collaborator

At least part of the problem seems to be getrf_single getting built without the -DUNIT. Trying a fix for lapack/CMakeLists.txt now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants