[AMD BACKEND - MI300X] Bad performance on BabelStream with sycl2020-usm model #19058

Open
@Aympab

Describe the bug

Hi, while running experiments I noticed very low performance with the HIP backend on MI300X GPUs.

To check whether my own code was at fault, I ran BabelStream with the same DPC++ installation. These are the results:

$ ./sycl2020-usm-stream 
BabelStream
Version: 5.0
Implementation: SYCL2020 USM
Running kernels 100 times
Precision: double
Array size: 268.4 MB (=0.3 GB)
Total size: 805.3 MB (=0.8 GB)
Using SYCL device AMD Radeon Graphics
Driver: HIP 60342.13
Init: 0.040114 s (=20075.484721 MBytes/sec)
Read: 0.022698 s (=35478.972608 MBytes/sec)
Function    MBytes/sec  Min (sec)   Max         Average     
Copy        97522.605   0.00551     0.02083     0.00616     
Mul         97132.458   0.00553     0.02115     0.00577     
Add         85752.264   0.00939     0.03005     0.01027     
Triad       85734.069   0.00939     0.03009     0.01004     
Dot         55937.484   0.00960     0.01854     0.00980     

$ ./sycl-stream 
BabelStream
Version: 5.0
Implementation: SYCL
Running kernels 100 times
Precision: double
Array size: 268.4 MB (=0.3 GB)
Total size: 805.3 MB (=0.8 GB)
Using SYCL device AMD Radeon Graphics
Driver: HIP 60342.13
Reduction kernel config: 1216 groups of size 1024
Init: 0.122072 s (=6596.988514 MBytes/sec)
Read: 0.008526 s (=94454.229177 MBytes/sec)
Function    MBytes/sec  Min (sec)   Max         Average     
Copy        3158956.129 0.00017     0.00087     0.00022     
Mul         3202178.912 0.00017     0.00088     0.00022     
Add         3315532.478 0.00024     0.00094     0.00030     
Triad       3308979.611 0.00024     0.00095     0.00029     
Dot         2068595.683 0.00026     0.00688     0.00045

With the SYCL2020 USM programming model, performance is very low even on a simple stream benchmark: Copy reaches about 97.5 GB/s, roughly 32x less than the 3.16 TB/s measured with the (older) SYCL model. Performance with the older SYCL model is fine.
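As a sanity check on the reported figures, the MBytes/sec column can be reproduced from the array size and the minimum time. The sketch below assumes that the 268.4 MB array is 2^25 doubles and that BabelStream counts two arrays of traffic for Copy, Mul and Dot and three for Add and Triad. The names `kArrayBytes`, `mbytes_per_sec` and `print_usm_check` are illustrative and not part of BabelStream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Assumption: 2^25 doubles per array, which matches "Array size: 268.4 MB".
constexpr std::size_t kArrayBytes = (std::size_t{1} << 25) * sizeof(double);

// Bandwidth in MBytes/sec (1 MB = 1e6 bytes) for a kernel that touches
// `arrays_touched` arrays and whose fastest iteration took `min_seconds`.
constexpr double mbytes_per_sec(std::size_t arrays_touched, double min_seconds) {
    return 1.0e-6 * static_cast<double>(arrays_touched * kArrayBytes) / min_seconds;
}

// Recompute three rows of the sycl2020-usm table from their "Min (sec)" values.
// The traffic counts (2, 3, 2) are the assumed per-kernel array counts.
void print_usm_check() {
    std::printf("Copy: %.0f MBytes/sec\n", mbytes_per_sec(2, 0.00551));
    std::printf("Add:  %.0f MBytes/sec\n", mbytes_per_sec(3, 0.00939));
    std::printf("Dot:  %.0f MBytes/sec\n", mbytes_per_sec(2, 0.00960));
}
```

Calling `print_usm_check()` from `main` should give values within about 1% of the reported column, since the printed minimum times are rounded to five decimals. That would suggest the low numbers reflect genuinely slow kernels rather than a reporting error.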

This is the configuration line for BabelStream:

$ cmake -Bbuild -S. -DMODEL=sycl -DSYCL_COMPILER=ONEAPI-Clang -DCXX_EXTRA_FLAGS='-fsycl-targets=amd_gpu_gfx942'

This is the output for sycl-ls:

$ sycl-ls
[hip:gpu][hip:0] AMD HIP BACKEND, AMD Radeon Graphics gfx942:sramecc+:xnack- [HIP 60342.13]
[hip:gpu][hip:1] AMD HIP BACKEND, AMD Radeon Graphics gfx942:sramecc+:xnack- [HIP 60342.13]
[hip:gpu][hip:2] AMD HIP BACKEND, AMD Radeon Graphics gfx942:sramecc+:xnack- [HIP 60342.13]
[hip:gpu][hip:3] AMD HIP BACKEND, AMD Radeon Graphics gfx942:sramecc+:xnack- [HIP 60342.13]
[hip:gpu][hip:4] AMD HIP BACKEND, AMD Radeon Graphics gfx942:sramecc+:xnack- [HIP 60342.13]
[hip:gpu][hip:5] AMD HIP BACKEND, AMD Radeon Graphics gfx942:sramecc+:xnack- [HIP 60342.13]
[hip:gpu][hip:6] AMD HIP BACKEND, AMD Radeon Graphics gfx942:sramecc+:xnack- [HIP 60342.13]
[hip:gpu][hip:7] AMD HIP BACKEND, AMD Radeon Graphics gfx942:sramecc+:xnack- [HIP 60342.13]

I also ran the native HIP version of BabelStream to rule out a driver or environment issue, and it gives the highest performance:

$ ./hip-stream 
BabelStream
Version: 5.0
Implementation: HIP
Running kernels 100 times
Precision: double
Array size: 268.4 MB (=0.3 GB)
Total size: 805.3 MB (=0.8 GB)
Using HIP device AMD Radeon Graphics
Driver: 60342134
Memory: DEFAULT
Init: 0.023351 s (=34487.210455 MBytes/sec)
Read: 0.181511 s (=4436.678499 MBytes/sec)
Function    MBytes/sec  Min (sec)   Max         Average     
Copy        4242127.358 0.00013     0.00017     0.00014     
Mul         3986892.165 0.00013     0.00020     0.00015     
Add         3687418.406 0.00022     0.00035     0.00024     
Triad       3856665.029 0.00021     0.00031     0.00023     
Dot         3253289.897 0.00017     0.00023     0.00018
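To compare the runs kernel by kernel, the result tables can be parsed and expressed as a percentage of the HIP baseline. This is a minimal sketch: `parse_results` and `percent_of` are hypothetical helpers, not part of BabelStream, and the two string constants simply repeat the sycl2020-usm and HIP tables from this report.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Result tables copied from the runs in this report.
const char* kUsmTable =
    "Function    MBytes/sec  Min (sec)   Max         Average\n"
    "Copy        97522.605   0.00551     0.02083     0.00616\n"
    "Mul         97132.458   0.00553     0.02115     0.00577\n"
    "Add         85752.264   0.00939     0.03005     0.01027\n"
    "Triad       85734.069   0.00939     0.03009     0.01004\n"
    "Dot         55937.484   0.00960     0.01854     0.00980\n";
const char* kHipTable =
    "Function    MBytes/sec  Min (sec)   Max         Average\n"
    "Copy        4242127.358 0.00013     0.00017     0.00014\n"
    "Mul         3986892.165 0.00013     0.00020     0.00015\n"
    "Add         3687418.406 0.00022     0.00035     0.00024\n"
    "Triad       3856665.029 0.00021     0.00031     0.00023\n"
    "Dot         3253289.897 0.00017     0.00023     0.00018\n";

// Parse table rows into {kernel name -> MBytes/sec}. Pass only the table:
// lines whose second field is not a number (the header) are skipped.
std::map<std::string, double> parse_results(const std::string& table) {
    std::map<std::string, double> out;
    std::istringstream lines(table);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string kernel;
        double mbytes_per_sec = 0.0;
        if (fields >> kernel >> mbytes_per_sec) out[kernel] = mbytes_per_sec;
    }
    return out;
}

// Percentage of the baseline bandwidth reached, for kernels present in both runs.
std::map<std::string, double> percent_of(const std::map<std::string, double>& run,
                                         const std::map<std::string, double>& baseline) {
    std::map<std::string, double> pct;
    for (const auto& kv : run) {
        auto it = baseline.find(kv.first);
        if (it != baseline.end() && it->second > 0.0)
            pct[kv.first] = 100.0 * kv.second / it->second;
    }
    return pct;
}
```

For example, `percent_of(parse_results(kUsmTable), parse_results(kHipTable))` puts every sycl2020-usm kernel below 3% of the HIP bandwidth on this machine.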

Labels: bug, hip
