Write a pass for Data layout scalarization #107920

Closed
Tracked by #92422 ...
farzonl opened this issue Sep 9, 2024 · 0 comments · Fixed by #110029
farzonl commented Sep 9, 2024

The scalarizer pass does not handle scalarization of data structures. Further, there isn't an existing LLVM pass that does this.

This godbolt link shows seven scenarios involving vectors and arrays of vectors: https://hlsl.godbolt.org/z/9c35aa9zj

uint3 bArr[3];
export uint3 fn0(int index) {
    return bArr[index];
}
export uint3 fn1(int index) {
    uint3 aArr[3];
    for(int i = 0; i < 3; i++)
        aArr[i] = uint3(i,i,i);
    return aArr[index];
}
groupshared uint4 cArr[3];
export uint4 fn2(int index) {
    for(int i = 0; i < 3; i++)
        cArr[i] = uint4(i,i,i,i);
    return cArr[index];
}
groupshared uint4 cVec;
export uint fn3(int i, int index) {
    cVec = uint4(i,i,i,i);
    return cVec[index];
}
static uint4 dArr[3];
export uint4 fn4(int index) {
    for(int i = 0; i < 3; i++)
        dArr[i] = uint4(i,i,i,i);
    return dArr[index];
}
export uint3 fn5(int index) {
    static uint3 eArr[3];
    for(int i = 0; i < 3; i++)
        eArr[i] = uint3(i,i,i);
    return eArr[index];
}

and

static uint4 fVec;
export uint fn6(int i, int index) {
    fVec = uint4(i,i,i,i);
    return fVec[index];
}

The idea behind this is to see the data transformation requirements for vectors defined on the
stack vs those defined globally vs those defined with groupshared or static.

In clang, the three different global array-of-vectors scenarios look roughly the same,
with a few attribute differences.

@cArr = local_unnamed_addr addrspace(3) global [3 x <4 x i32>] zeroinitializer, align 16
@bArr = local_unnamed_addr global [3 x <3 x i32>] zeroinitializer, align 16
@dArr = internal unnamed_addr global [3 x <4 x i32>] zeroinitializer, align 16
@"?eArr@?1??fn5@@YAT?$__vector@I$02@__clang@@H@Z@4PAT23@A" = internal unnamed_addr global [3 x <3 x i32>] zeroinitializer, align 16, !dbg !26

DXC, however, converts bArr (the global non-groupshared case) into a cbuffer.
The cArr groupshared global, on the other hand, is represented in DXC as a flattened 12-wide array:

@"\01?cArr@@3PAV?$vector@I$03@@A.v.1dim" = addrspace(3) global [12 x i32] undef, align 4
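
The flattened layout implies a simple index mapping. As a minimal sketch in plain C++ (not LLVM IR), and assuming the flattening is row-major so that the lanes of one vector stay contiguous, element `j` of `cArr[i]` would land in slot `i * 4 + j` of the 12-wide array. The helper name `flatIndex` is hypothetical and is not part of DXC or LLVM.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: maps (array index, vector lane) of an
// [N x <W x i32>] global onto a flat [N*W x i32] array.
// Assumption: row-major order, so lanes of one vector are adjacent.
constexpr std::size_t flatIndex(std::size_t arrayIdx, std::size_t lane,
                                std::size_t vecWidth) {
  return arrayIdx * vecWidth + lane;
}

// cArr is [3 x <4 x i32>], so vecWidth is 4 and there are 12 slots.
static_assert(flatIndex(0, 0, 4) == 0, "first lane of cArr[0]");
static_assert(flatIndex(2, 3, 4) == 11, "last lane of cArr[2]");
```

In a real pass the same arithmetic would have to be emitted as IR for the dynamic `index` argument of `fn2`, since it is not known at compile time.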

For dArr, the static case, the vec4 gets scalarized into four 3-element arrays:

@dArr.0 = internal unnamed_addr global [3 x i32] zeroinitializer, align 4
@dArr.1 = internal unnamed_addr global [3 x i32] zeroinitializer, align 4
@dArr.2 = internal unnamed_addr global [3 x i32] zeroinitializer, align 4
@dArr.3 = internal unnamed_addr global [3 x i32] zeroinitializer, align 4
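
To make the difference from the flattened groupshared layout concrete, here is a rough C++ model of the per-lane split shown above. It is only an illustration: `SplitGlobal` and `fn4Lane` are hypothetical names, and the model mirrors `fn4` from the HLSL source rather than anything DXC actually emits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model of `static uint4 dArr[3]` after the split:
// lanes[k] plays the role of @dArr.k, a 3-element array per lane.
struct SplitGlobal {
  std::array<std::array<std::uint32_t, 3>, 4> lanes{};

  // Models a store to dArr[index][lane].
  void store(std::size_t index, std::size_t lane, std::uint32_t value) {
    lanes[lane][index] = value;
  }

  // Models a load from dArr[index][lane].
  std::uint32_t load(std::size_t index, std::size_t lane) const {
    return lanes[lane][index];
  }
};

// Mirrors fn4: fill dArr[i] with uint4(i,i,i,i), then read one lane back.
inline std::uint32_t fn4Lane(std::size_t index, std::size_t lane) {
  SplitGlobal d;
  for (std::uint32_t i = 0; i < 3; ++i)
    for (std::size_t k = 0; k < 4; ++k)
      d.store(i, k, i);
  return d.load(index, lane);
}
```

Note that the array index and the lane index swap roles compared with the original `[3 x <4 x i32>]` type: the lane selects the global, and the array index selects the element within it.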

A static in function scope is represented similarly to a static in global scope, with only name-mangling differences:

@"\01?eArr@?1??fn5@@YA?AV?$vector@I$02@@H@Z@[email protected]" = internal unnamed_addr global [3 x i32] zeroinitializer, align 4
@"\01?eArr@?1??fn5@@YA?AV?$vector@I$02@@H@Z@[email protected]" = internal unnamed_addr global [3 x i32] zeroinitializer, align 4
@"\01?eArr@?1??fn5@@YA?AV?$vector@I$02@@H@Z@[email protected]" = internal unnamed_addr global [3 x i32] zeroinitializer, align 4

aArr, the array defined inside the function, is optimized away into a series of extract elements.

In the cVec case, DXC converts the vector into an array of 4 elements:

@"\01?cVec@@3V?$vector@I$03@@A.v" = addrspace(3) global [4 x i32] undef, align 4

The working theory is that data layout transformations are needed for data defined globally.
Further, there seem to be three specific behaviors we want:

  1. static scalar layouts
  2. groupshared scalar layouts
  3. cbuffer usage for regular arrays.

As such, the proposal is:

  • Traverse global variables in the module.
  • Identify global variables of vector types.
  • Replace the global vector with a new global array of scalar values.
    • Flatten vectors into arrays
    • Flatten arrays of vectors into one dim array
    • The replacement covers both cases: when cbuffers are needed and when flattened arrays are.
  • Update all uses of the global variable to work with the new scalar array.
  • Remove the old global variable.
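
The first three steps can be sketched without any LLVM dependency. The following is a toy model only: `ToyType`, `ToyGlobal`, `flattenType`, `scalarizeGlobals`, and the `.scalarized` name suffix are all hypothetical stand-ins and not LLVM APIs. It also ignores the cbuffer case and only covers the flattening of vectors and arrays of vectors into one-dimensional scalar arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for an IR type: optional array dimensions around a
// vector (vecWidth > 1) or a scalar (vecWidth == 1).
struct ToyType {
  std::vector<std::size_t> arrayDims;  // {3} models [3 x <4 x i32>]
  std::size_t vecWidth;                // 4 models <4 x i32>
};

struct ToyGlobal {
  std::string name;
  ToyType type;
};

// A global needs rewriting if its element type is a vector.
inline bool needsScalarization(const ToyGlobal &g) {
  return g.type.vecWidth > 1;
}

// Flatten a vector, or an array of vectors, into a 1-D scalar array.
inline ToyType flattenType(const ToyType &t) {
  std::size_t count = t.vecWidth;
  for (std::size_t dim : t.arrayDims)
    count *= dim;
  return ToyType{{count}, 1};
}

// Traverse the "module" and replace vector globals with scalar arrays.
inline std::vector<ToyGlobal> scalarizeGlobals(
    const std::vector<ToyGlobal> &module) {
  std::vector<ToyGlobal> result;
  for (const ToyGlobal &g : module) {
    if (needsScalarization(g))
      result.push_back({g.name + ".scalarized", flattenType(g.type)});
    else
      result.push_back(g);
  }
  return result;
}
```

With this model, `cArr` of type `[3 x <4 x i32>]` becomes a 12-element scalar array and `cVec` of type `<4 x i32>` becomes a 4-element one, matching the DXC output quoted earlier.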

Globals can be iterated over like so:

for (GlobalVariable &GV : M.globals()) {...}

And we will need to update uses like so:

for (auto *User : GV.users()) {...}
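
One thing to watch for when updating uses: rewriting an operand changes the use list being walked. The toy sketch below illustrates the usual workaround of iterating over a snapshot. `ToyInst`, `ToyGlobalVar`, and `replaceUses` are hypothetical and are not LLVM classes; in LLVM itself, `llvm::make_early_inc_range` exists for this kind of loop, though whether the final pass needs it depends on how the rewrite is structured.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy model: an "instruction" names the global it uses, and each
// global keeps a list of its users.
struct ToyInst {
  std::string operand;
};

struct ToyGlobalVar {
  std::string name;
  std::vector<ToyInst *> users;
};

// Re-point every user of oldGV at newGV.
inline void replaceUses(ToyGlobalVar &oldGV, ToyGlobalVar &newGV) {
  // Iterate over a copy, because the loop body edits oldGV.users.
  std::vector<ToyInst *> snapshot = oldGV.users;
  for (ToyInst *user : snapshot) {
    user->operand = newGV.name;
    newGV.users.push_back(user);
    oldGV.users.erase(
        std::remove(oldGV.users.begin(), oldGV.users.end(), user),
        oldGV.users.end());
  }
}
```

Once `oldGV.users` is empty, the old global can be removed safely, which corresponds to the last step of the proposal.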

farzonl removed the new issue label Sep 9, 2024
farzonl added a commit to llvm/wg-hlsl that referenced this issue Sep 20, 2024
This proposal covers:
- all the cases mentioned in:
llvm/llvm-project#107920
- The results of our design discussion
- the global multi-d to 1d array flattening cases mentioned in:
llvm/llvm-project#89646
- intro, background, and motivation for this work.
farzonl moved this to Planning in HLSL Support Sep 20, 2024
farzonl self-assigned this Sep 20, 2024
pow2clk moved this from Planning to Active in HLSL Support Sep 23, 2024
farzonl linked a pull request Sep 26, 2024 that will close this issue
github-project-automation bot moved this from Active to Done in HLSL Support Sep 26, 2024
Sterling-Augustine pushed a commit to Sterling-Augustine/llvm-project that referenced this issue Sep 27, 2024
This change adds a pass to scalarize vectors in global scope into
arrays.

There are three distinct parts:
1. Find the globals that need to be updated and define what the new type
should be.
2. Initialize that new type and copy over all the right attributes
from the old type.
3. Use the instruction visitor pattern to update the loads, stores, and
GEPs for the layout of the new data structure.

resolves llvm#107920