Skip to content

ARM64v8-TSV110: dgemm sigfaults on large square matrices single thread #2538

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
akobotov opened this issue Apr 2, 2020 · 4 comments · Fixed by #2551
Closed

ARM64v8-TSV110: dgemm sigfaults on large square matrices single thread #2538

akobotov opened this issue Apr 2, 2020 · 4 comments · Fixed by #2551
Milestone

Comments

@akobotov
Copy link

akobotov commented Apr 2, 2020

Description:

Crash in dgemm_oncopy(). This is because sb parameter points out of allocated buffer. And the pointer is calculated in driver/level3/level3.c:350 as (sb + min_l * (jjs - js) * COMPSIZE * l1stride), where jjs is close to 4000 and min_l is 512. jjs limited by GEMM_R and it is GEMM_R == DGEMM_DEFAULT_R == 4096.

To reproduce:

OS: CentOS Linux release 7.6.1810 (AltArch)
CPU: TSV110
Compiler: gcc 9.2.0

Commands:
make TARGET=TSV110 USE_OPENMP=yes
cd benchmarks; make
ulimit -s unlimited
OMP_NUM_THREADS=1 ./dgemm.goto 4000 10000 1000

Root cause:

BUFFER_SIZE is too small for P, Q and R.

Suggested fix

Add to common_arm64.h:

 #if defined(CORTEXA57)
 #define BUFFER_SIZE     (20 << 20)
+#elif defined(TSV110) || defined(EMAG8180)
+#define BUFFER_SIZE     (32 << 20)
 #else
 #define BUFFER_SIZE     (16 << 20)
 #endif

And in the beginning of memory.c add a buffer size guard like this:

/* Memory buffer must fit two matrix subblocks of maximal size */
#if BUFFER_SIZE < (SGEMM_DEFAULT_P * SGEMM_DEFAULT_Q * 4 * 2) || \
    BUFFER_SIZE < (SGEMM_DEFAULT_P * SGEMM_DEFAULT_R * 4 * 2) || \
    BUFFER_SIZE < (SGEMM_DEFAULT_R * SGEMM_DEFAULT_Q * 4 * 2)
#error BUFFER_SIZE is too small for P, Q, and R of SGEMM
#endif
#if BUFFER_SIZE < (DGEMM_DEFAULT_P * DGEMM_DEFAULT_Q * 8 * 2) || \
    BUFFER_SIZE < (DGEMM_DEFAULT_P * DGEMM_DEFAULT_R * 8 * 2) || \
    BUFFER_SIZE < (DGEMM_DEFAULT_R * DGEMM_DEFAULT_Q * 8 * 2)
#error BUFFER_SIZE is too small for P, Q, and R of DGEMM
#endif
#if BUFFER_SIZE < (CGEMM_DEFAULT_P * CGEMM_DEFAULT_Q * 8 * 2) || \
    BUFFER_SIZE < (CGEMM_DEFAULT_P * CGEMM_DEFAULT_R * 8 * 2) || \
    BUFFER_SIZE < (CGEMM_DEFAULT_R * CGEMM_DEFAULT_Q * 8 * 2)
#error BUFFER_SIZE is too small for P, Q, and R of CGEMM
#endif
#if BUFFER_SIZE < (ZGEMM_DEFAULT_P * ZGEMM_DEFAULT_Q * 16 * 2) || \
    BUFFER_SIZE < (ZGEMM_DEFAULT_P * ZGEMM_DEFAULT_R * 16 * 2) || \
    BUFFER_SIZE < (ZGEMM_DEFAULT_R * ZGEMM_DEFAULT_Q * 16 * 2)
#error BUFFER_SIZE is too small for P, Q, and R of ZGEMM
#endif

Note
This issue mostlikly also affect EMAG8180 target, possible others - the guard should notify about that.

@martin-frbg
Copy link
Collaborator

Thanks - ultimately this is probably "just" another manifestation of the fundamental problem from #1698 (BUFFER_SIZE should be more easily configurable at compile time, if it cannot go away completely). The current values are mostly from the GotoBLAS of ten years ago, and must have been quite arbitrary even then.

@martin-frbg
Copy link
Collaborator

Well that escalated quickly... seems same problem affects SkylakeX and ARMV7 targets at least.

@martin-frbg
Copy link
Collaborator

Hmm. Either the formula is wrong or there is something fundamentally wrong with the buffer allocation on POWER8. I just do not see how BUFFER_SIZE can ever be large enough to accomodate the value derived in the absence of a predefined xGEMM_DEFAULT_R...

@martin-frbg
Copy link
Collaborator

One thing that seems to be clear is that the calculation of the size requirement in the error message is going to overflow when xGEMM_DEFAULT:_R is itself a notrivial formula rather than a constant. (Which appears to have been the case specifically for POWERx platforms ever since GotoBLAS2, but I have no idea why)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants