[OpenMP][SIMD] ordered has no effect in a loop SIMD region as of LLVM 18.1.0 #95611

Open
@MattPD

This may be a regression between LLVM version 17.0.1 and 18.1.0.
The issue is still present in the main branch as of version 19.0.0 (dbc3e26).

Consider the following loop:

#define ARRAY_SIZE 256

__attribute__((noinline)) void omp_simd_loop(float X[ARRAY_SIZE][ARRAY_SIZE]) {
    for (int r = 1; r < ARRAY_SIZE; ++r) {
        for (int c = 1; c < ARRAY_SIZE; ++c) {
#pragma omp simd
            for (int k = 2; k < ARRAY_SIZE; ++k) {
#pragma omp ordered simd
                X[r][k] = X[r][k - 2] + sinf((float)(r / c));
            }
        }
    }
}

The OpenMP specification states:

"2.13.8 ordered Construct:
The ordered construct either specifies a structured block in a loop, simd, or loop SIMD region that will be executed in the order of the loop iterations, or it is a stand-alone directive that specifies cross-iteration dependences in a doacross loop nest. The ordered construct sequentializes and orders the execution of ordered regions while allowing code outside the region to run in parallel."

However, as of LLVM 18.1.0 when we:

  • run the omp_simd_loop using #pragma omp simd and #pragma omp ordered simd
  • run the sequential comparison_loop (which is otherwise the same loop without any #pragma omp)
  • compare the results, counting the number of errors whenever the comparison fails (up to and including a rather large relative comparison tolerance of 1000000.0 * FLT_EPSILON)

We have 12,090 errors for the code compiled with LLVM 18.1.0 but 0 errors for the code compiled with LLVM 17.0.1.

Compiler Explorer repro:

The bug is only present when compiling with -fopenmp (compiling without -fopenmp makes LLVM 18.1.0 pass).
Removing all #pragma omp also makes this pass.
Using #pragma omp simd safelen(2) instead of #pragma omp simd is similarly sufficient, but this effectively makes #pragma omp ordered simd unnecessary.
The above would strongly indicate that this is an OpenMP issue. However, when attempting to track this down, and in particular when analyzing the interactions with the different loop vectorizer decisions made by LLVM 17.0.1 and 18.1.0, I've run into some "interesting" challenges (notes on the findings are in the next comment to keep this one short).

This may be related to an earlier bug. Note that the repro here is a bit simpler in that it doesn't use printf inside the loop; the printf in the earlier repro currently prevents vectorization, so that one does not reproduce for me at the time of writing:

[OpenMP 4.5] ORDERED SIMD construct in loop SIMD doesn't work as required by the specification
#51043


Full repro source code (for completeness only: the aforementioned Compiler Explorer repros are identical):

#include <float.h>
#include <math.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int compare_float(float x1, float x2, float scalar) {
    const float diff = fabsf(x1 - x2);
    x1 = fabsf(x1);
    x2 = fabsf(x2);
    const float l = (x2 > x1) ? x2 : x1;
    if (diff <= l * scalar * FLT_EPSILON)
        return 1;
    else
        return 0;
}

#define ARRAY_SIZE 256

__attribute__((noinline)) void initialization_loop(
    float X[ARRAY_SIZE][ARRAY_SIZE], float Y[ARRAY_SIZE][ARRAY_SIZE]) {
    const float max = 1000.0;
    srand(time(NULL));
    for (int r = 0; r < ARRAY_SIZE; r++) {
        for (int c = 0; c < ARRAY_SIZE; c++) {
            X[r][c] = ((float)rand() / (float)(RAND_MAX)) * max;
            Y[r][c] = X[r][c];
        }
    }
}

__attribute__((noinline)) void omp_simd_loop(float X[ARRAY_SIZE][ARRAY_SIZE]) {
    for (int r = 1; r < ARRAY_SIZE; ++r) {
        for (int c = 1; c < ARRAY_SIZE; ++c) {
#pragma omp simd
            for (int k = 2; k < ARRAY_SIZE; ++k) {
#pragma omp ordered simd
                X[r][k] = X[r][k - 2] + sinf((float)(r / c));
            }
        }
    }
}

__attribute__((noinline)) int comparison_loop(float X[ARRAY_SIZE][ARRAY_SIZE],
                                              float Y[ARRAY_SIZE][ARRAY_SIZE]) {
    int totalErrors_simd = 0;
    const float scalar = 1.0;
    for (int r = 1; r < ARRAY_SIZE; ++r) {
        for (int c = 1; c < ARRAY_SIZE; ++c) {
            for (int k = 2; k < ARRAY_SIZE; ++k) {
                Y[r][k] = Y[r][k - 2] + sinf((float)(r / c));
            }
        }
        // check row for simd update
        for (int k = 0; k < ARRAY_SIZE; ++k) {
            if (!compare_float(X[r][k], Y[r][k], scalar)) {
                ++totalErrors_simd;
            }
        }
    }
    return totalErrors_simd;
}

int main(void) {
    float X[ARRAY_SIZE][ARRAY_SIZE];
    float Y[ARRAY_SIZE][ARRAY_SIZE];

    initialization_loop(X, Y);
    omp_simd_loop(X);
    const int totalErrors_simd = comparison_loop(X, Y);

    if (totalErrors_simd) {
        fprintf(stdout, "totalErrors_simd: %d \n", totalErrors_simd);
        fprintf(stdout, "%s : %d - FAIL: error in ordered simd computation.\n",
                __FILE__, __LINE__);
    } else {
        fprintf(stdout, "Success!\n");
    }

    return totalErrors_simd;
}
