Commit f831414

SWITCH_RATIO for Arm(R) Neoverse(TM) architecture
This seems like a good balance of values for reasonably sized matrices. With `SWITCH_RATIO=16`, DGEMM scales better to larger sizes, but a better solution would be some form of thread throttling, so I've gone with `SWITCH_RATIO=8` for double precision.
1 parent 042e3c0 commit f831414

1 file changed (+18, -1 lines)

param.h

Lines changed: 18 additions & 1 deletion
@@ -1,5 +1,6 @@
 /*****************************************************************************
 Copyright (c) 2011-2014, The OpenBLAS Project
+Copyright (c) 2022, Arm Ltd
 All rights reserved.
 
 Redistribution and use in source and binary forms, with or without
@@ -3338,6 +3339,12 @@ is a big desktop or server with abundant cache rather than a phone or embedded d
 
 #elif defined(NEOVERSEN1)
 
+#if defined(XDOUBLE) || defined(DOUBLE)
+#define SWITCH_RATIO 8
+#else
+#define SWITCH_RATIO 16
+#endif
+
 #define SGEMM_DEFAULT_UNROLL_M 16
 #define SGEMM_DEFAULT_UNROLL_N 4
 
@@ -3367,7 +3374,11 @@ is a big desktop or server with abundant cache rather than a phone or embedded d
 
 #elif defined(NEOVERSEV1)
 
-#define SWITCH_RATIO 16
+#if defined(XDOUBLE) || defined(DOUBLE)
+#define SWITCH_RATIO 8
+#else
+#define SWITCH_RATIO 16
+#endif
 
 #define SGEMM_DEFAULT_UNROLL_M 16
 #define SGEMM_DEFAULT_UNROLL_N 4
@@ -3398,6 +3409,12 @@ is a big desktop or server with abundant cache rather than a phone or embedded d
 
 #elif defined(NEOVERSEN2)
 
+#if defined(XDOUBLE) || defined(DOUBLE)
+#define SWITCH_RATIO 8
+#else
+#define SWITCH_RATIO 16
+#endif
+
 #undef SBGEMM_ALIGN_K
 #define SBGEMM_ALIGN_K 4
 