OpenMP reporting wrong device number when using amd64 as target #132549

@KaruroChori

Description

Compiling with the flags -fopenmp -g -fopenmp-targets=amd64 on an amd64 system results in the OpenMP runtime reporting 4 available devices, even though the system has a single-socket processor. I don't understand where that 4 comes from.

#include <cstdio>
#include <omp.h>
int main()
{
    printf("%d %d\n", omp_get_num_devices(), omp_get_device_num());
    return 0;
}

Outputs 4 4.

Activity

shiltian (Contributor) commented on Mar 22, 2025

That is because we hard-code the number of devices when offloading to the host.

KaruroChori (Author) commented on Mar 22, 2025

Sorry, but I am not sure I understand.
Does that mean that if I installed three GPUs in my system, I would not see three devices available for offloading?
How can a hardcoded device count be useful?
For example, it would prevent splitting a load across multiple devices.

shiltian (Contributor) commented on Mar 22, 2025

-fopenmp-targets=amd64 means offloading to the CPU/host, and we use a very specific (and naive) implementation for that. If you want to offload to a GPU instead, you should use --offload-arch=gfx942 or --offload-arch=sm_90, depending on your actual GPU architecture. In that case, the number of devices is determined based on how many GPUs (or other offloading devices) support the given offload image. For example, if you have 8 NVIDIA GPUs but your offload target is gfx942, you'll end up with 0 devices. Similarly, even if you have 8 GPUs, regardless of vendor, but your offload target is amd64 (i.e., CPU), you'll get a fixed number of devices, which is hardcoded in our simple host-offloading implementation.
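The counting rule described in this comment can be pictured with a toy model. This is only a sketch of the behavior as explained in the thread, not the runtime's actual code: `visible_devices` and `kHostDeviceCount` are hypothetical names, and the value 4 is taken from the output reported in this issue rather than from the LLVM sources.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Assumption: 4 matches the count reported in this issue; it is not read
// from the LLVM sources.
constexpr int kHostDeviceCount = 4;

// Toy model of how many devices would be reported for one offload target.
// `gpu_archs` lists the architecture of each GPU installed in the machine.
int visible_devices(const std::string& target,
                    const std::vector<std::string>& gpu_archs) {
    // Host offloading: a fixed count, independent of the installed hardware.
    if (target == "amd64") {
        return kHostDeviceCount;
    }
    // GPU offloading: only devices that can run the offload image count.
    return static_cast<int>(
        std::count(gpu_archs.begin(), gpu_archs.end(), target));
}
```

Under this model, a machine with 8 `sm_90` GPUs yields 0 devices for a `gfx942` target, 8 for `sm_90`, and the fixed host count for `amd64`.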

KaruroChori (Author) commented on Mar 22, 2025

Got it. So the GPU contribution to the count is correct, but the CPU contribution is not, and mixing the two still gives a wrong total. For example, -fopenmp-targets=amd64,nvptx64 on my system reports 5 devices rather than 2 (1 for my NVIDIA card, the other 4 for amd64).
This is a problem for any mixed application that does not want to explicitly separate host and target computation. For example, I am trying to split a dataset across multiple devices and run a subtask on each, sized according to a benchmark of its capabilities, but if the CPU takes 4 slots, any performance metric ends up skewed.

At this point I am just curious: why 4? I would assume 1 to be a more reasonable default if a value really has to be hardcoded.
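For readers following the benchmark-weighted split mentioned above, here is a minimal sketch of that kind of partitioning. It assumes the caller has already collapsed the host slots into a single entry, so each score stands for one physical device. The name `split_by_score` and the "higher score means faster" convention are illustrative assumptions, not something taken from this thread or from OpenMP.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: split `total` work items across devices in proportion
// to their benchmark scores (assumed non-negative, higher = faster).
std::vector<std::size_t> split_by_score(std::size_t total,
                                        const std::vector<double>& scores) {
    std::vector<std::size_t> chunks(scores.size(), 0);
    const double sum = std::accumulate(scores.begin(), scores.end(), 0.0);
    if (scores.empty() || sum <= 0.0) {
        return chunks;  // nothing sensible to distribute
    }

    std::size_t assigned = 0;
    std::size_t best = 0;  // index of the fastest device seen so far
    for (std::size_t i = 0; i < scores.size(); ++i) {
        // Truncation rounds each share down.
        chunks[i] = static_cast<std::size_t>(total * (scores[i] / sum));
        assigned += chunks[i];
        if (scores[i] > scores[best]) {
            best = i;
        }
    }
    // Rounding down can leave a few items over; give them to the fastest device.
    if (assigned < total) {
        chunks[best] += total - assigned;
    }
    return chunks;
}
```

With one CPU entry scored 1.0 and one GPU entry scored 3.0, 100 items split as 25 and 75. If the CPU were instead listed four times, as the reported device count would suggest, the same scores would hand it four shares, which is the skew described in the comment.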

          OpenMP reporting wrong device number when using amd64 as target · Issue #132549 · llvm/llvm-project