Skip to content

Fix some incorrect SLASET/DLASET calls (includes backport of openblas#2778) #438

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

h-vetinari
Copy link
Contributor

@h-vetinari h-vetinari commented Sep 4, 2020

@serguei-patchkovskii observed some errors and suggested some fixes in #425 & #429 that were picked up by @martin-frbg in OpenMathLib/OpenBLAS#2778. Serguei said he won't have time to prepare a backport-PR, and it seems (after ~3 weeks) neither did Martin.

I have left the commits from OpenMathLib/OpenBLAS#2778 intact; just added Suggested-By: Serguei Patchkovskii <[email protected]> to reflect Martin's comment

Relates to #318
Closes #425
Closes #429

Edit: after some more explanation by @serguei-patchkovskii & tips by @martin-frbg, I investigated a bit more and I believe this PR now additionally:
Closes #318

Reference-LAPACK issue 429

Suggested-By: Serguei Patchkovskii <[email protected]>
Reference-LAPACK issue 425 (and 318)

Suggested-By: Serguei Patchkovskii <[email protected]>
Reference-LAPACK issue 425 (and 318)

Suggested-By: Serguei Patchkovskii <[email protected]>
Copy link
Contributor Author

@h-vetinari h-vetinari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stumbled over a difference in the suggestion from @serguei-patchkovskii and the implementation from @martin-frbg.

PTAL

Comment on lines 683 to 684
CALL SLASET( 'Full', N, 1, ZERO, ZERO, SD, 1 )
CALL SLASET( 'Full', N, 1, ZERO, ZERO, SE, 1 )
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I compare the diff with the OP from #429, @serguei-patchkovskii actually suggested the following

Suggested change
CALL SLASET( 'Full', N, 1, ZERO, ZERO, SD, 1 )
CALL SLASET( 'Full', N, 1, ZERO, ZERO, SE, 1 )
CALL SLASET( 'Full', N, 1, ZERO, ZERO, SD, N )
CALL SLASET( 'Full', N, 1, ZERO, ZERO, SE, N )

Was it intentional to implement this differently @martin-frbg?

@martin-frbg
Copy link
Collaborator

martin-frbg commented Sep 4, 2020

Intentional, and also the reason why I did not commit the PR immediately, and/or copy it here - as far as I (and valgrind) can tell, specifying "1" instead of "N" is consistent with current usage in the test code, while for a general case serguei-patchkovskii's assessment seems correct.

@serguei-patchkovskii
Copy link

serguei-patchkovskii commented Sep 5, 2020

Intentional, and also the reason why I did not commit the PR immediately, and/or copy it here - as far as I (and valgrind) can tell, specifying "1" instead of "N" is consistent with current usage in the test code, while for a general case serguei-patchkovskii's assessment seems correct.

@martin-frbg is absolutely correct: this is the current usage in the code. This usage is contrary to the letter of the standard, and should be corrected. Let me explain in detail.

The relevant fragments of lapack/SRC/slaset.f are:

  SUBROUTINE SLASET( UPLO, M, N, ALPHA, BETA, A, LDA )

  REAL               A( LDA, * )
  •    Set the leading m-by-n submatrix to ALPHA.
    
  •    DO 60 J = 1, N
          DO 50 I = 1, M
             A( I, J ) = ALPHA
    
    50 CONTINUE
    60 CONTINUE

In the calling routine, we have the call: CALL SLASET( 'Full', N, 1, ZERO, ZERO, SE, 1 ), where SE is declared as a 1-dimensional array, with the at least SE(1:N) index range. In SLASET, we are trying to set A(1:N,1:1) = ZERO. This is not pretty, but as far as I can tell is allowed under the Fortran sequence association rules for Fortran77-style calls.

Unfortunately, the matrix A in this call is declared as A(1:1,*) - so accessing the first index beyond 1 is a bounds violation, and explicitly invokes undefined behaviour. Therefore, I would claim that all calls to SLASET/DLASET with the argument LDA<M are incorrect - even if they are the current usage. While it might seem to work, it is not hard to imagine a standard-conforming implementation, where it will generate incorrect code (e.g. an implementation which relies on the callee performing copy-in/copy-out for array arguments will produce unexpected results).

If I had to guess why it wasn't spotted and corrected so far, I would speculate that this is an archaism in many compilers, which goes back to FORTRAN-IV. Before Fortran77 (which as far as I recall introduced the A(*) array bound declaration), the commonly-used extension was to declare arrays with an unknown upper bound was to give them with the upper bound of 1. For this reason, many Fortran compilers still treat the upper bound of "1" same as "*" in bounds checking, and do not flag a bounds violation.

So this code might have been semi-OK under the 60-year-old language definition - and has been grandfathered by compilers ever since. Quite possibly because every self-respecting Fortran compiler must be able to compile LAPACK :P

@h-vetinari
Copy link
Contributor Author

@serguei-patchkovskii Thanks for the explanation!

These have been grandfathered in by compilers based on an ancient
version of the language definition, but would invoke undefined
behaviour per the standard. For more details, see the following:
Reference-LAPACK#438 (comment)

Suggested-By: Serguei Patchkovskii <[email protected]>
@codecov
Copy link

codecov bot commented Sep 6, 2020

Codecov Report

Merging #438 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #438   +/-   ##
=======================================
  Coverage   83.24%   83.24%           
=======================================
  Files        1808     1808           
  Lines      170197   170197           
=======================================
  Hits       141681   141681           
  Misses      28516    28516           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9b136e8...bffac66. Read the comment docs.

The last parameter must satisfy `LDA >= max(1,M)`, where M is the
number of rows of the matrix, see
https://github.com/Reference-LAPACK/lapack/blob/v3.9.0/SRC/slaset.f#L95

Suggested-By: Serguei Patchkovskii <[email protected]>
The last parameter must satisfy `LDA >= max(1,M)`, where M is the
number of rows of the matrix, see
https://github.com/Reference-LAPACK/lapack/blob/v3.9.0/SRC/dlaset.f#L95

Suggested-By: Serguei Patchkovskii <[email protected]>
@h-vetinari h-vetinari changed the title Backport openblas#2778 Fix some incorrect SLASET/DLASET calls (includes backport of openblas#2778) Sep 6, 2020
@h-vetinari
Copy link
Contributor Author

@serguei-patchkovskii
The requirement that you bring up is also explicitly noted in the docstrings of SLASET & DLASET.

I grepped around a bit and found some more occurrences, which I fixed in the last two commits. However, I make no claims of exhaustiveness (in particular, I didn't check all cases where the LDA argument didn't show up due to a linebreak)

@h-vetinari
Copy link
Contributor Author

Looking also at #318 in particular, one would be tempted to actually implement the check for LDA >= max(1,M) in SLASET/DLASET (and throw with XERBLA if violated). Would that make sense?

Also CC @langou, since #318 is marked "Priority: High"

@martin-frbg
Copy link
Collaborator

@h-vetinari From the original report in #318 it seems running a testsuite build with -fcheck=all would be sufficient (and more effective than grep) for catching all current offenders.

@h-vetinari
Copy link
Contributor Author

@martin-frbg, thanks for the tip (although I needed to find 33258cb to understand where exactly to add -fcheck=all).

In any case, I ran a build in conda-forge/lapack-feedstock#32 with the current patch & -fcheck=all, and while there were some errors, none of them come from SLASET/DLASET anymore (I had first run -fcheck=all without the patch, and the bounds violations for SLASET/DLASET were raising errors).

For the other errors that were uncovered, I opened #440.

@julielangou
Copy link
Contributor

Thanks again Martin!

@martin-frbg
Copy link
Collaborator

martin-frbg commented Sep 8, 2020

all glory to h-vetinari and serguei-patchkovskii 😄

@h-vetinari h-vetinari deleted the backport_openblas_2778 branch September 8, 2020 10:10
@h-vetinari
Copy link
Contributor Author

It was a team effort! 💪 🙃

christoph-conrads pushed a commit to christoph-conrads/lapack that referenced this pull request May 23, 2021
These have been grandfathered in by compilers based on an ancient
version of the language definition, but would invoke undefined
behaviour per the standard. For more details, see the following:
Reference-LAPACK#438 (comment)

Suggested-By: Serguei Patchkovskii <[email protected]>
christoph-conrads pushed a commit to christoph-conrads/lapack that referenced this pull request May 23, 2021
…blas_2778

Fix some incorrect SLASET/DLASET calls (includes backport of openblas#2778)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect calls to DLASET in TESTING/EIG/cchkhb2stg.f Incorrect call to SLASET in schkst2stg.f out of the array bound in SRC/slaset.f
4 participants