Describe the bug
Hi, while running experiments I noticed very low performance with the HIP backend on MI300 GPUs.
To check whether the problem was in my code, I ran BabelStream with the same DPC++ installation. These are the results:
$ ./sycl2020-usm-stream
BabelStream
Version: 5.0
Implementation: SYCL2020 USM
Running kernels 100 times
Precision: double
Array size: 268.4 MB (=0.3 GB)
Total size: 805.3 MB (=0.8 GB)
Using SYCL device AMD Radeon Graphics
Driver: HIP 60342.13
Init: 0.040114 s (=20075.484721 MBytes/sec)
Read: 0.022698 s (=35478.972608 MBytes/sec)
Function    MBytes/sec    Min (sec)   Max         Average
Copy        97522.605     0.00551     0.02083     0.00616
Mul         97132.458     0.00553     0.02115     0.00577
Add         85752.264     0.00939     0.03005     0.01027
Triad       85734.069     0.00939     0.03009     0.01004
Dot         55937.484     0.00960     0.01854     0.00980
$ ./sycl-stream
BabelStream
Version: 5.0
Implementation: SYCL
Running kernels 100 times
Precision: double
Array size: 268.4 MB (=0.3 GB)
Total size: 805.3 MB (=0.8 GB)
Using SYCL device AMD Radeon Graphics
Driver: HIP 60342.13
Reduction kernel config: 1216 groups of size 1024
Init: 0.122072 s (=6596.988514 MBytes/sec)
Read: 0.008526 s (=94454.229177 MBytes/sec)
Function    MBytes/sec    Min (sec)   Max         Average
Copy        3158956.129   0.00017     0.00087     0.00022
Mul         3202178.912   0.00017     0.00088     0.00022
Add         3315532.478   0.00024     0.00094     0.00030
Triad       3308979.611   0.00024     0.00095     0.00029
Dot         2068595.683   0.00026     0.00688     0.00045
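With the SYCL2020 USM programming model I get very low performance, even on a simple stream benchmark: depending on the kernel, bandwidth is roughly 32 to 39 times lower than with the older buffer-based SYCL model (e.g. Copy: 97,523 MB/s vs 3,158,956 MB/s), whose performance is fine.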
This is the CMake configuration line I used for BabelStream:
cmake -Bbuild -S. -DMODEL=sycl -DSYCL_COMPILER=ONEAPI-Clang -DCXX_EXTRA_FLAGS='-fsycl-targets=amd_gpu_gfx942'
This is the output of sycl-ls:
$ sycl-ls
[hip:gpu][hip:0] AMD HIP BACKEND, AMD Radeon Graphics gfx942:sramecc+:xnack- [HIP 60342.13]
[hip:gpu][hip:1] AMD HIP BACKEND, AMD Radeon Graphics gfx942:sramecc+:xnack- [HIP 60342.13]
[hip:gpu][hip:2] AMD HIP BACKEND, AMD Radeon Graphics gfx942:sramecc+:xnack- [HIP 60342.13]
[hip:gpu][hip:3] AMD HIP BACKEND, AMD Radeon Graphics gfx942:sramecc+:xnack- [HIP 60342.13]
[hip:gpu][hip:4] AMD HIP BACKEND, AMD Radeon Graphics gfx942:sramecc+:xnack- [HIP 60342.13]
[hip:gpu][hip:5] AMD HIP BACKEND, AMD Radeon Graphics gfx942:sramecc+:xnack- [HIP 60342.13]
[hip:gpu][hip:6] AMD HIP BACKEND, AMD Radeon Graphics gfx942:sramecc+:xnack- [HIP 60342.13]
[hip:gpu][hip:7] AMD HIP BACKEND, AMD Radeon Graphics gfx942:sramecc+:xnack- [HIP 60342.13]
I also ran the HIP version of BabelStream to rule out a driver or environment issue, and it gives the highest performance of all three:
$ ./hip-stream
BabelStream
Version: 5.0
Implementation: HIP
Running kernels 100 times
Precision: double
Array size: 268.4 MB (=0.3 GB)
Total size: 805.3 MB (=0.8 GB)
Using HIP device AMD Radeon Graphics
Driver: 60342134
Memory: DEFAULT
Init: 0.023351 s (=34487.210455 MBytes/sec)
Read: 0.181511 s (=4436.678499 MBytes/sec)
Function    MBytes/sec    Min (sec)   Max         Average
Copy        4242127.358   0.00013     0.00017     0.00014
Mul         3986892.165   0.00013     0.00020     0.00015
Add         3687418.406   0.00022     0.00035     0.00024
Triad       3856665.029   0.00021     0.00031     0.00023
Dot         3253289.897   0.00017     0.00023     0.00018
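To put the gap in numbers, here is a small sketch that computes, per kernel, how many times lower the SYCL2020 USM bandwidth is than the buffer-based SYCL run and the native HIP run. It only uses the MBytes/sec values copied from the three outputs above; the names kUsm, kSycl, kHip, slowdown and min_slowdown are hypothetical helpers for illustration, not part of BabelStream.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper data (not BabelStream code): MBytes/sec values copied
// from the three runs above, in the order Copy, Mul, Add, Triad, Dot.
constexpr std::size_t kNumKernels = 5;
constexpr std::array<double, kNumKernels> kUsm  = {97522.605, 97132.458, 85752.264, 85734.069, 55937.484};
constexpr std::array<double, kNumKernels> kSycl = {3158956.129, 3202178.912, 3315532.478, 3308979.611, 2068595.683};
constexpr std::array<double, kNumKernels> kHip  = {4242127.358, 3986892.165, 3687418.406, 3856665.029, 3253289.897};

// Bandwidth ratio for kernel i: how many times lower `slow` is than `fast`.
constexpr double slowdown(const std::array<double, kNumKernels>& fast,
                          const std::array<double, kNumKernels>& slow,
                          std::size_t i) {
    return fast[i] / slow[i];
}

// Smallest ratio over all five kernels, i.e. the best case for `slow`.
constexpr double min_slowdown(const std::array<double, kNumKernels>& fast,
                              const std::array<double, kNumKernels>& slow) {
    double m = slowdown(fast, slow, 0);
    for (std::size_t i = 1; i < kNumKernels; ++i) {
        const double r = slowdown(fast, slow, i);
        if (r < m) m = r;
    }
    return m;
}
```

For example, slowdown(kSycl, kUsm, 0) evaluates to about 32.4 (Copy) and min_slowdown(kHip, kUsm) to about 41.0 (Mul), so in the best case the USM run reaches only about 3% of the bandwidth of the other two. Everything is constexpr, so the ratios can be checked at compile time with static_assert.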