Description
We've recently released Score-P v9.0, having support for the OpenMP target via the OpenMP Tools Interface for the first time.
However, a user quickly noticed an issue, which boils down to a bug in LLVM.
When the OpenMP runtime initializes a connected tool, the following callback will be dispatched:
typedef int (*ompt_initialize_t) (ompt_function_lookup_t lookup, int initial_device_num, ompt_data_t *tool_data);
initial_device_num
is defined as "the value that a call to omp_get_initial_device would return". This itself is defined as being "the value of the device number is the value of omp_initial_device or the value returned by the omp_get_num_devices routine".
Looking at the actual returned values however, this is not the case. Instead, LLVM always reports initial_device_num = 0
. Since LLVM also does not implement ompt_get_num_devices
correctly
a tool cannot safely determine if a device identifier in some target callback is actually the host.
LLVM typically uses -1
for this, but tools should not have to rely on guessing this, as it might or might not change in later versions. This also makes supporting different runtimes more complicated.
To reproduce the issue:
#include <omp.h>
#include <omp-tools.h>
#include <stdlib.h>
#include <assert.h>
#include <inttypes.h>
#include <string.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdatomic.h>
/* MAIN */
int main( int argc, char** argv )
{
#pragma omp target
{
printf( "Hello from target region\n" );
}
}
/* OMPT INTERFACE */
#if ( defined( __ppc64__ ) || defined( __powerpc64__ ) || defined( __PPC64__ ) )
#define OMPT_TOOL_CPU_RELAX ( ( void )0 )
#elif ( defined( __x86_64 ) || defined( __x86_64__ ) || defined( __amd64 ) || defined( _M_X64 ) )
#define OMPT_TOOL_CPU_RELAX __asm__ volatile ( "pause" )
#elif ( defined( __aarch64__ ) || defined( __ARM64__ ) || defined( _M_ARM64 ) )
#define OMPT_TOOL_CPU_RELAX __asm__ volatile ( "yield" )
#else
#define OMPT_TOOL_CPU_RELAX ( ( void )0 )
#endif
/* Test-and-test-and-set lock. Mutexes are of type bool */
#define OMPT_TOOL_LOCK( MUTEX ) \
while ( true ) \
{ \
if ( atomic_flag_test_and_set_explicit( &( MUTEX ), memory_order_acquire ) != true ) \
{ \
break; \
} \
OMPT_TOOL_CPU_RELAX; \
}
#define OMPT_TOOL_UNLOCK( MUTEX ) atomic_flag_clear_explicit( &( MUTEX ), memory_order_release );
#define OMPT_TOOL_GUARDED_PRINTF( ... ) \
OMPT_TOOL_LOCK( ompt_tool_printf_mutex ) \
printf( __VA_ARGS__ ); \
OMPT_TOOL_UNLOCK( ompt_tool_printf_mutex )
_Thread_local int32_t ompt_tool_tid = -1; /* thread counter. >= 1 after thread_begin */
atomic_flag ompt_tool_printf_mutex = ATOMIC_FLAG_INIT;
static int
my_initialize_tool( ompt_function_lookup_t lookup,
int initial_device_num,
ompt_data_t* tool_data )
{
OMPT_TOOL_GUARDED_PRINTF( "[%s] tid = %" PRId32 " | initial_device_num %d\n",
__FUNCTION__,
ompt_tool_tid,
initial_device_num );
return 1; /* non-zero indicates success */
}
static void
my_finalize_tool( ompt_data_t* tool_data )
{
OMPT_TOOL_GUARDED_PRINTF( "[%s] tid = %" PRId32 "\n",
__FUNCTION__,
ompt_tool_tid );
}
ompt_start_tool_result_t*
ompt_start_tool( unsigned int omp_version,
const char* runtime_version )
{
setbuf( stdout, NULL );
OMPT_TOOL_GUARDED_PRINTF( "[%s] tid = %" PRId32 " | omp_version %d | runtime_version = \'%s\'\n",
__FUNCTION__,
ompt_tool_tid,
omp_version,
runtime_version );
static ompt_start_tool_result_t tool = { &my_initialize_tool,
&my_finalize_tool,
ompt_data_none };
return &tool;
}
Building the reproducer on a system with at least one GPU (maybe -fopenmp-targets=x86_64
works too), one will see the following result:
$ clang -fopenmp --offload-arch=sm_75 test.c
$ ./a.out
[ompt_start_tool] tid = -1 | omp_version 201611 | runtime_version = 'LLVM OMP version: 5.0.20140926'
[my_initialize_tool] tid = -1 | initial_device_num 0
Hello from target region
[my_finalize_tool] tid = -1
As one can see initial_device_num
is 0
. In my case, it should be either -1
or 1
.
Activity
llvmbot commentedon Apr 4, 2025
@llvm/issue-subscribers-openmp
Author: Jan André Reuter (Thyre)
However, a user quickly noticed an issue, which boils down to a bug in LLVM.
When the OpenMP runtime initializes a connected tool, the following callback will be dispatched:
initial_device_num
is defined as "the value that a call to omp_get_initial_device would return". This itself is defined as being "the value of the device number is the value of omp_initial_device or the value returned by the omp_get_num_devices routine".Looking at the actual returned values however, this is not the case. Instead, LLVM always reports
initial_device_num = 0
. Since LLVM also does not implementompt_get_num_devices
correctlyllvm-project/openmp/runtime/src/ompt-general.cpp
Line 868 in 19e0233
a tool cannot safely determine if a device identifier in some target callback is actually the host.
LLVM typically uses
-1
for this, but tools should not have to rely on guessing this, as it might or might not change in later versions. This also makes supporting different runtimes more complicated.To reproduce the issue:
Building the reproducer on a system with at least one GPU (maybe
-fopenmp-targets=x86_64
works too), one will see the following result:As one can see
initial_device_num
is0
. In my case, it should be either-1
or1
.Thyre commentedon Apr 4, 2025
Looking at the source code a bit, there are two places where OMPT is initialized:
llvm-project/openmp/runtime/src/ompt-general.cpp
Line 489 in 412f7fa
llvm-project/openmp/runtime/src/ompt-general.cpp
Line 932 in 412f7fa
The first case looks correct. However, I think we never see it in the case where offload is used, right?
The second case should never use
0
, but alsoomp_get_initial_device
. There are two possible values which can then be returned:llvm-project/offload/DeviceRTL/src/State.cpp
Line 430 in 64b060f
llvm-project/offload/libomptarget/OpenMP/API.cpp
Line 93 in 64b060f
Here, I don't understand when which one is chosen...
Callbacks later on also use
omp_get_initial_device
, so we should stick with the same value everywhere. For sake of consistency with the non-tool side, havingomp_get_num_devices
would be nice. From a tool perspective however, I wouldn't really care.Thyre commentedon Apr 9, 2025
If I understand this correctly:
This is the actual initialization, using the host side of the OpenMP runtime:
llvm-project/openmp/runtime/src/ompt-general.cpp
Line 489 in 412f7fa
This connects the host side to the offload library:
llvm-project/openmp/runtime/src/ompt-general.cpp
Line 932 in 412f7fa
In the second case,
initial_device_num
andtool_data
are unused, hence dummy values are passed.Wouldn't hurt to pass the correct data though.
The host side initialization calls
omp_get_initial_device()
, which ends up inllvm-project/offload/libomptarget/OpenMP/API.cpp
Line 93 in 64b060f
This is where my guess work begins:
I think that we do not yet know how many devices exist when calling the initialization in the tool. Therefore,
omp_get_initial_device()
callsomp_get_num_devices()
returning0
.The same would probably also apply if we would implement
ompt_get_num_devices()
. This would work after the tool initialization, but not during it.I'm wondering how much would break if we would just switch to returning
-1
, which is allowed per OpenMP 6.0 spec. (but would not be conformant with OpenMP 5.0 -- 5.2 ...)mhalk commentedon Jun 5, 2025
Hi @Thyre , sorry for the latency, I just spent some time to understand the issue at hand.
Unfortunately, I do not see a "good" solution ATM to correct the value during initialization.
AFAICT you are right and the number of devices is only known after initialization of the accelerators / devices has completed.
If returning "-1" as a known value during initialization alleviates the issue, IMHO that sounds reasonable.
Could you please point me to the parts of the spec that disallow "-1" for pre-6.0 and also the ones that allow for 6.x?
Thyre commentedon Jun 5, 2025
Heya @mhalk, absolutely no problem 😄
Thanks a lot for looking into this at all!
The OpenMP 5.2 and 6.0 spec. state for
ompt_start_tool
:5.2 (p.473, l.28):
6.0 (p.745, l.19):
When then looking at
omp_get_initial_device
, we can see the following:5.2 (p. 385, l.11):
6.0 (p. 598, l.16):
With
omp_get_num_devices
finally being defined as:5.2 (p. 383, l. 20):
6.0 (p. 594,l. 16):
and
omp_initial_device
being a fixed constant for-1
(OpenMP 5.2: p. 346, l.14, OpenMP 6.0: p. 534, l. 5).So while both specs are pretty much aligned in all of the definitions, OpenMP 6.0 allows the usage of
omp_initial_device
foromp_get_initial_device
and thereforeinitial_device_num
in OMPT. Using-1
would not entirely follow the OpenMP 5.2 spec. unfortunately.Personally, I would be totally fine with LLVM using
-1
for the host consistently, asomp_initial_device
is defined as-1
even for 5.2, with the spec. even mentioning (OpenMP 5.2, p. 24, l. 2):which is exactly what we're looking for.
mhalk commentedon Jun 10, 2025
Thanks for the effort you put into this; helps tremendously.
For OpenMP 6.0, I'd argue that the situation is clear and by using the constant
omp_initial_device
/-1
, we adhere to the spec.For OpenMP 5.2,
omp_initial_device
/-1
might not be obviously conformant on every path -- but IMO it actually is conformant with this situation.According to the 5.2 spec:
ompt_start_tool
usesomp_get_initial_device
(for determininginitial_device_num
)Taking the following into account:
5.2 (p. 385, l. 3-4):
omp_get_initial_device
summary5.2 (p. 385, l. 11):
omp_get_initial_device
effect5.2 (p. 24, l. 2-3): Execution Model
Since
omp_initial_device
is an alias for the host device,omp_get_initial_device
is IMO OpenMP 5.2 spec conformant when returning-1
while being called duringompt_start_tool
.Especially
omp_get_initial_device
's summary loosens the constraints: "returns a device number that represents the host device".As you already stated:
I'll try to find that ^ out.