Commit 75d183d

Update compiler options (#656)
* Better options for aocc, clarify aocc is in clang module
* Due to bug in OpenBLAS remove thread binding from defaults
  The bug OpenMathLib/OpenBLAS#2238 is fixed in the development version
* REmove fixme
* Fix table formatting
* Improve AMD naming
* Typo fix
* Add example with thread binding (with Openblas caveat)
* Specify full module name
* typo
1 parent 3c50718 commit 75d183d

4 files changed: +35 -17 lines changed

docs/computing/compiling-mahti.md

Lines changed: 7 additions & 8 deletions
@@ -13,12 +13,12 @@
 C/C++ and Fortran applications can be built with
 [GNU](https://gcc.gnu.org), [AMD](https://developer.amd.com/amd-aocc/),
 or [Intel](https://software.intel.com/en-us/parallel-studio-xe/documentation/get-started)
-compiler suites. The GNU suite is loaded by default, AMD or Intel can
-be selected via the [Modules](modules.md) system, i.e.
+compiler suites. The GNU compilers are loaded by default. AMD compilers can be
+loaded using the [Modules](modules.md) system with the command:
 ```
 module load clang
 ```
-or
+and Intel compilers with the command:
 ```
 module load intel
 ```
@@ -37,13 +37,12 @@ the safe level and then move up to intermediate or even aggressive,
 while making sure the results are correct and the program's
 performance has improved.
 
-<!-- FIXME: clang options, what are recommend Intel options? -->
 
-| Optimisation level | GNU | Intel | AMD |
+| Optimisation level | GNU | Intel | AMD (clang) |
 | :----------------- | :---------------- | :--------------------------- | :----------- |
-| **Safe** | -O2 -march=native | -O2 -fp-model precise | -O2 |
-| **Intermediate** | -O3 -march=native | -O2 | -O3 |
-| **Aggressive** | -O3 -march=native -ffast-math -funroll-loops | -O3 -fp-model fast=2 -no-prec-div -fimf-use-svml=true | -O3 |
+| **Safe** | -O2 -march=native | -O2 -fp-model precise | -O2 -march=native |
+| **Intermediate** | -O3 -march=native | -O2 | -O3 -march=native |
+| **Aggressive** | -O3 -march=native -ffast-math -funroll-loops | -O3 -fp-model fast=2 -no-prec-div -fimf-use-svml=true | -O3 -march=native -ffast-math -funroll-loops |
 
 
 A detailed list of options for the Intel and GNU compilers can be found on the _man_
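
As an illustration of the updated table (this sketch is not part of the commit), the flag sets can be looked up by compiler suite and optimisation level. The function name `opt_flags` and the source file `myprog.c` are hypothetical, and the sketch assumes the AMD C compiler is invoked as `clang` once the `clang` module is loaded.

```shell
#!/bin/bash
# Hypothetical helper: print the flags from the updated table for a given
# compiler suite (gnu, intel, amd) and level (safe, intermediate, aggressive).
opt_flags() {
    local suite="$1" level="$2"
    case "$suite:$level" in
        gnu:safe|amd:safe)                 echo "-O2 -march=native" ;;
        gnu:intermediate|amd:intermediate) echo "-O3 -march=native" ;;
        gnu:aggressive|amd:aggressive)     echo "-O3 -march=native -ffast-math -funroll-loops" ;;
        intel:safe)                        echo "-O2 -fp-model precise" ;;
        intel:intermediate)                echo "-O2" ;;
        intel:aggressive)                  echo "-O3 -fp-model fast=2 -no-prec-div -fimf-use-svml=true" ;;
        *) echo "unknown suite/level: $suite $level" >&2; return 1 ;;
    esac
}

# Print (not run) a compile command for the AMD suite at the safe level.
# "myprog.c" is a placeholder source file.
echo "clang $(opt_flags amd safe) -o myprog myprog.c"
```

Starting from `safe` and only moving up after verifying the results matches the advice in the surrounding text.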

docs/computing/running/creating-job-scripts-mahti.md

Lines changed: 3 additions & 5 deletions
@@ -10,7 +10,6 @@ environment. The rest of this page focuses on Mahti specific topics.
 with the exception of interactive jobs (to be added). Many options also work
 differently in Puhti and Mahti, so it is not advisable to copy scripts from Puhti
 to Mahti.
-<!-- FIXME interactive jobs -->
 
 [TOC]
 

@@ -96,9 +95,9 @@ export OMP_NUM_THREADS=1
 srun myprog -i input -o output
 ```
 
-For hybrid applications, one should use `OMP_PLACES` and
-`OMP_PROC_BIND` OpenMP runtime environment variables for obtaining
-optimum placement of OpenMP threads. As an example, in order to run
+For hybrid applications, one should use
+`OMP_PROC_BIND` OpenMP runtime environment variable for
+placing the OpenMP threads. As an example, in order to run
 one MPI tasks per NUMA domain and one OpenMP thread per L3cache one
 can set
 

@@ -107,7 +106,6 @@ can set
 #SBATCH --cpus-per-task=16
 
 export OMP_NUM_THREADS=4
-export OMP_PLACES=cores
 export OMP_PROC_BIND=spread
 
 module load myprog/1.2.3

docs/computing/running/example-job-scripts-mahti.md

Lines changed: 21 additions & 3 deletions
@@ -52,12 +52,32 @@ srun myprog <options>
 
 # Set the number of threads based on --cpus-per-task
 export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
-# Bind OpenMP threads to cores
+
+srun myprog <options>
+```
+
+## MPI + OpenMP with thread binding
+
+Note! Due to bug in OpenBLAS, thread binding should not be used in applications
+utilizing threaded OpenBLAS (openblas/0.3.10-omp module)
+```
+#!/bin/bash
+#SBATCH --job-name=example
+#SBATCH --account=<project>
+#SBATCH --partition=large
+#SBATCH --time=02:00:00
+#SBATCH --nodes=100
+#SBATCH --ntasks-per-node=16
+#SBATCH --cpus-per-task=8
+
+# Set the number of threads based on --cpus-per-task
+export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
 export OMP_PLACES=cores
 
 srun myprog <options>
 ```
 
+
 ## MPI + OpenMP with simultaneous multithreading
 
 ```
@@ -75,8 +95,6 @@ srun myprog <options>
 
 # Set the number of threads based on --cpus-per-task
 export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
-# Bind OpenMP threads to hardware threads
-export OMP_PLACES=threads
 
 srun myprog <options>
 ```
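
The OpenBLAS caveat introduced in this file could be enforced in a job script rather than left to the reader. The following is only a sketch, not part of the commit: the function name `binding_is_safe` is hypothetical, and it assumes the module system exports `LOADEDMODULES` as a colon-separated list of loaded module names.

```shell
#!/bin/bash
# Sketch: enable thread binding only when the threaded OpenBLAS module
# named in the commit (openblas/0.3.10-omp) is not loaded.
# Assumption: LOADEDMODULES holds a colon-separated list of loaded modules.
binding_is_safe() {
    local loaded="${1:-}"
    case ":$loaded:" in
        *:openblas/0.3.10-omp:*) return 1 ;;  # threaded OpenBLAS is loaded
        *) return 0 ;;
    esac
}

if binding_is_safe "${LOADEDMODULES:-}"; then
    # Bind OpenMP threads to cores, as in the thread binding example
    export OMP_PLACES=cores
else
    unset OMP_PLACES
    echo "threaded OpenBLAS loaded; leaving OMP_PLACES unset" >&2
fi
```

The check matches the full module name only, so other OpenBLAS versions would need their own entries if they are affected.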

docs/computing/running/performance-checklist.md

Lines changed: 4 additions & 1 deletion
@@ -54,7 +54,10 @@ higher and poor load balancing gets more likely.
 
 Many HPC applications benefit from binding OpenMP threads to CPU cores
 which can be achieved by setting `export OMP_PLACES=cores` in the
-batch job script. When starting new production runs it is also good
+batch job script. Note! Due to bug in OpenBLAS thread binding should not be
+specified when using threaded OpenBLAS (openblas/0.3.10-omp module).
+
+When starting new production runs it is also good
 practice to ensure correct thread affinity by adding to batch job
 script
 ```
