add ReLaPACK to OpenBLAS? #788

Closed
carlkl opened this issue Mar 5, 2016 · 13 comments

@carlkl

carlkl commented Mar 5, 2016

I want to draw attention to a new linear algebra library called Recursive LAPACK (ReLAPACK), which sits on top of BLAS and LAPACK, see:

http://hpac.rwth-aachen.de/~peise/relapack
https://github.com/HPAC/ReLAPACK (MIT license)

According to the authors, this library even beats optimized BLAS/LAPACK implementations in several cases (see the paper on their website). The API seems to be very simple: just use the augmented function names instead of the original LAPACK function names (see relapack.h).

I propose adding this library to OpenBLAS and switching between the LAPACK and ReLAPACK functions via an environment variable and/or a function call. Any hints on how to implement this in an easy way?
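
One way to picture the proposed switch is a function pointer selected from an environment variable. This is only a sketch under stated assumptions: the variable name `OPENBLAS_USE_RELAPACK`, the helper `select_backend` and the two `*_stub` routines are hypothetical, not existing OpenBLAS or ReLAPACK symbols. In a real build the stubs would be replaced by the LAPACK and ReLAPACK implementations of the same routine.

```c
#include <stdlib.h>
#include <string.h>

/* Argument list modelled on LAPACK's dgetrf (everything passed by pointer). */
typedef void (*dgetrf_fn)(const int *m, const int *n, double *A,
                          const int *ldA, int *ipiv, int *info);

/* Records which stand-in ran last: 1 = LAPACK stub, 2 = ReLAPACK stub. */
static int last_backend = 0;

/* Hypothetical stand-in for the LAPACK routine; does no factorization. */
static void lapack_dgetrf_stub(const int *m, const int *n, double *A,
                               const int *ldA, int *ipiv, int *info) {
    (void)m; (void)n; (void)A; (void)ldA; (void)ipiv;
    last_backend = 1;
    *info = 0;
}

/* Hypothetical stand-in for the ReLAPACK routine; does no factorization. */
static void relapack_dgetrf_stub(const int *m, const int *n, double *A,
                                 const int *ldA, int *ipiv, int *info) {
    (void)m; (void)n; (void)A; (void)ldA; (void)ipiv;
    last_backend = 2;
    *info = 0;
}

/* Map the value of the (assumed) environment variable to a backend.
   Only the exact string "1" selects the ReLAPACK stand-in. */
static dgetrf_fn select_backend(const char *value) {
    if (value != NULL && strcmp(value, "1") == 0)
        return relapack_dgetrf_stub;
    return lapack_dgetrf_stub;
}

/* Entry point a caller would use. It re-reads the environment on every
   call to keep the sketch short; a library would likely cache the choice. */
void demo_dgetrf(const int *m, const int *n, double *A,
                 const int *ldA, int *ipiv, int *info) {
    dgetrf_fn impl = select_backend(getenv("OPENBLAS_USE_RELAPACK"));
    impl(m, n, A, ldA, ipiv, info);
}
```

With this layout, running a program with `OPENBLAS_USE_RELAPACK=1` in its environment would route `demo_dgetrf` to the ReLAPACK stand-in, and any other value (or no value) to the LAPACK one.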

@xianyi
Collaborator

xianyi commented Mar 5, 2016

  • You can create a relapack folder or use a git submodule.
  • Then, add a flag (e.g. BUILD_RELAPACK) in Makefile.rule.
  • Remove the redundant objs from interface/lapack/Makefile and lapack-netlib/SRC/Makefile when BUILD_RELAPACK=1.

@elmar-peise

I think ReLAPACK would be a great and easy way to speed up LAPACK performance in OpenBLAS: As seen in our paper on arXiv, with OpenBLAS ReLAPACK is faster than LAPACK for most of its routines.

In case one wants to select whether to call ReLAPACK or another LAPACK implementation depending on the architecture (or other factors), I see two alternatives:

  • Use ReLAPACK's configuration header config.h to specify which of its routines to make available under their default LAPACK interfaces. This is a compile time option.
  • Disable the LAPACK interface altogether and selectively call ReLAPACK's own/internal routines (named, e.g., RELAPACK_dgetrf) which have the same interfaces as their LAPACK counterparts.

Either way, I'm more than happy to help with any ReLAPACK-related questions or issues, and suggestions and feedback are very welcome!
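
The two alternatives above can be sketched in C roughly as follows. This is an illustration under stated assumptions, not ReLAPACK's actual source: `INCLUDE_DGETRF` follows the `INCLUDE_` naming mentioned later in this thread, `demo_RELAPACK_dgetrf` is a hypothetical stand-in for `RELAPACK_dgetrf` so that the example compiles without the library, and `demo_dgetrf_` and `lu_factor_direct` are made-up names.

```c
/* Assumed config.h-style switch: 1 exposes the routine under a LAPACK-style
   name, 0 leaves that name to another LAPACK implementation. */
#define INCLUDE_DGETRF 1

/* Counts how often the stand-in below was reached. */
static int relapack_calls = 0;

/* Hypothetical stand-in for RELAPACK_dgetrf, with the same argument list as
   LAPACK's dgetrf. It performs no factorization; it only counts the call
   and reports success through *info. */
void demo_RELAPACK_dgetrf(const int *m, const int *n, double *A,
                          const int *ldA, int *ipiv, int *info) {
    (void)m; (void)n; (void)A; (void)ldA; (void)ipiv;
    relapack_calls++;
    *info = 0;
}

#if INCLUDE_DGETRF
/* Alternative 1: a compile-time wrapper. Existing callers keep using the
   LAPACK-style entry point and are forwarded to the recursive routine.
   (Named demo_dgetrf_ so it cannot clash with a real dgetrf_ symbol.) */
void demo_dgetrf_(const int *m, const int *n, double *A,
                  const int *ldA, int *ipiv, int *info) {
    demo_RELAPACK_dgetrf(m, n, A, ldA, ipiv, info);
}
#endif

/* Alternative 2: no LAPACK-named wrapper at all; the calling code invokes
   the prefixed routine explicitly wherever it wants the recursive version. */
void lu_factor_direct(const int *n, double *A, int *ipiv, int *info) {
    demo_RELAPACK_dgetrf(n, n, A, n, ipiv, info);
}
```

The first variant changes behaviour for every existing caller at build time, while the second leaves the decision to each call site.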

@xianyi
Collaborator

xianyi commented Mar 7, 2016

@elmar-peise, thank you for your comment. I read your paper. Great work.

@gaming-hacker

I think the performance will vary wildly depending on the hardware and the precision needed. I can't see using it with multiprecision libs, as the overhead would probably be at least O(n^2), but it's another tool in the toolbox if someone wants to use it. Recursion also has its own limitations regarding numerical stability.

@carlkl
Author

carlkl commented Mar 8, 2016

I will add ReLAPACK later on with a PR. As the responses are positive I will close this issue.

carlkl closed this as completed Mar 8, 2016

@xianyi
Collaborator

xianyi commented Mar 8, 2016

@carlkl, thank you in advance for your PR 👍

@martin-frbg
Collaborator

Has there been any follow-up on this, or was it put on hold for lack of time?

@carlkl
Author

carlkl commented Jun 6, 2017

The latter. There are even some much more important developments (not OpenBLAS-specific) on hold for that reason.

@martin-frbg
Collaborator

Thanks for the feedback. Depending on how things pan out I may have time to spare in the near future.

@elmar-peise

Please let me know if you need any help from the ReLAPACK side!

@martin-frbg
Collaborator

@elmar-peise it seems to me you have a series of typos in lapack_wrapper.c, where the wrappers for the xTGSYL functions are made conditional on INCLUDE_xRGSYL.
Out of curiosity, do you have any performance figures for a newer OpenBLAS/netlib combo than what you used in the preprint? I think netlib LAPACK has seen some performance improvements between 3.5.0 and 3.7.0, and OpenBLAS 0.2.15 had an awkward Makefile bug that dropped all the optimization flags for the Fortran (netlib) part of the build, making its LAPACK parts significantly slower than the same code built from the netlib package.
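
A guard typo of the kind reported at the start of this comment is easy to miss, because the C preprocessor does not treat an unknown macro as an error: it is simply false in `#ifdef` and evaluates to 0 in `#if`. A minimal sketch, with made-up `DEMO_` macro names standing in for the real configuration switches:

```c
/* The configuration enables the routine under the correctly spelled name. */
#define DEMO_INCLUDE_DTGSYL 1

/* The guard tests a misspelled name (R instead of T). The identifier is
   undefined, so it evaluates to 0 and the #else branch is taken silently. */
#if DEMO_INCLUDE_DRGSYL
#define DEMO_WRAPPER_WITH_TYPO 1
#else
#define DEMO_WRAPPER_WITH_TYPO 0
#endif

/* With the guard spelled like the configuration macro, the first branch
   is taken and the wrapper would be compiled in. */
#if DEMO_INCLUDE_DTGSYL
#define DEMO_WRAPPER_WHEN_FIXED 1
#else
#define DEMO_WRAPPER_WHEN_FIXED 0
#endif

/* Expose the preprocessor outcome so it can be checked at run time. */
int wrapper_compiled_with_typo(void)  { return DEMO_WRAPPER_WITH_TYPO; }
int wrapper_compiled_when_fixed(void) { return DEMO_WRAPPER_WHEN_FIXED; }
```

Compiling with the `-Wundef` warning of GCC or Clang makes the compiler flag undefined identifiers that are evaluated in `#if` directives, which would catch this class of mistake.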

@elmar-peise

Thanks for spotting the typos! I fixed them in HPAC/ReLAPACK@c8dbbd4
Unfortunately, I don't have any more recent performance plots. I've been following the recent LAPACK release notes but don't think there were any performance-relevant changes to the routines covered by ReLAPACK. Unfortunately, I can't say the same for OpenBLAS.

@martin-frbg
Collaborator

There is a similar problem for CHEGST and ZHEGST, which appear to be represented by interfaces to the nonexistent routines "CSYGST" and "ZSYGST" in lapack_wrapper.c (i.e. not only is the INCLUDE_ label in the ifdef wrong, but so are all subsequent occurrences of the function name in that file)?

This was referenced Jun 23, 2017

5 participants