Write a pass for Data layout scalarization

The scalarizer pass does not handle scalarization of data structures, and no existing LLVM pass does this either.

This godbolt link shows seven scenarios involving arrays of vectors and plain vectors: https://hlsl.godbolt.org/z/9c35aa9zj
```hlsl
uint3 bArr[3];
export uint3 fn0(int index) {
    return bArr[index];
}
```
```hlsl
export uint3 fn1(int index) {
    uint3 aArr[3];
    for(int i = 0; i < 3; i++)
        aArr[i] = uint3(i,i,i);
    return aArr[index];
}
```
```hlsl
groupshared uint4 cArr[3];
export uint4 fn2(int index) {
    for(int i = 0; i < 3; i++)
        cArr[i] = uint4(i,i,i,i);
    return cArr[index];
}
```
```hlsl
groupshared uint4 cVec;
export uint fn3(int i, int index) {
    cVec = uint4(i,i,i,i);
    return cVec[index];
}
```
```hlsl
static uint4 dArr[3];
export uint4 fn4(int index) {
    for(int i = 0; i < 3; i++)
        dArr[i] = uint4(i,i,i,i);
    return dArr[index];
}
```
```hlsl
export uint3 fn5(int index) {
    static uint3 eArr[3];
    for(int i = 0; i < 3; i++)
        eArr[i] = uint3(i,i,i);
    return eArr[index];
}
```
and
```hlsl
static uint4 fVec;
export uint fn6(int i, int index) {
    fVec = uint4(i,i,i,i);
    return fVec[index];
}
```
The idea is to compare the data transformation requirements for vectors defined on the
stack, those defined globally, and those defined with `groupshared` or `static`.

In clang, the three kinds of global arrays of vectors (plain, `groupshared`, and `static`) look roughly the same,
with a few attribute differences.
```llvm
@cArr = local_unnamed_addr addrspace(3) global [3 x <4 x i32>] zeroinitializer, align 16
@bArr = local_unnamed_addr global [3 x <3 x i32>] zeroinitializer, align 16
@dArr = internal unnamed_addr global [3 x <4 x i32>] zeroinitializer, align 16
@"?eArr@?1??fn5@@YAT?$__vector@I$02@__clang@@H@Z@4PAT23@A" = internal unnamed_addr global [3 x <3 x i32>] zeroinitializer, align 16, !dbg !26
```
DXC, however, converts `bArr` (the global non-`groupshared` case) into a cbuffer.
The `groupshared` global `cArr`, on the other hand, is represented in DXC as a flattened 12-wide array:
```llvm
@"\01?cArr@@3PAV?$vector@I$03@@A.v.1dim" = addrspace(3) global [12 x i32] undef, align 4
```
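With the flattened `[12 x i32]` form above, an access such as `cArr[i][c]` has to be rewritten as a single index into the one-dimensional array. A minimal sketch of that index mapping, assuming the components of each vector stay contiguous (the element order is an assumption, since the DXC output only shows the flattened type); `flatIndex` is a hypothetical helper, not an existing API:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a DXC or LLVM API): position of component `Comp`
// of the vector at `ArrayIdx` inside the flattened array, assuming each
// vector's components are stored contiguously.
inline std::size_t flatIndex(std::size_t ArrayIdx, std::size_t Comp,
                             std::size_t VecWidth) {
  assert(Comp < VecWidth && "component out of range");
  return ArrayIdx * VecWidth + Comp;
}
```

Under this assumption, the last component of `cArr[2]` lands at `flatIndex(2, 3, 4)`, the final slot of the 12-wide array.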
For `dArr`, the static case, the vec4 gets scalarized into four 3-element arrays:
```llvm
@dArr.0 = internal unnamed_addr global [3 x i32] zeroinitializer, align 4
@dArr.1 = internal unnamed_addr global [3 x i32] zeroinitializer, align 4
@dArr.2 = internal unnamed_addr global [3 x i32] zeroinitializer, align 4
@dArr.3 = internal unnamed_addr global [3 x i32] zeroinitializer, align 4
``` 
A static in function scope is represented similarly to a static in global scope, with only name mangling differences:
```llvm
@"\01?eArr@?1??fn5@@YA?AV?$vector@I$02@@H@Z@4PAV2@A.0" = internal unnamed_addr global [3 x i32] zeroinitializer, align 4
@"\01?eArr@?1??fn5@@YA?AV?$vector@I$02@@H@Z@4PAV2@A.1" = internal unnamed_addr global [3 x i32] zeroinitializer, align 4
@"\01?eArr@?1??fn5@@YA?AV?$vector@I$02@@H@Z@4PAV2@A.2" = internal unnamed_addr global [3 x i32] zeroinitializer, align 4
```
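Both static cases split the vector per component instead of flattening it into one array. The sketch below models that layout for `dArr` in plain C++ to show how a whole-vector store and load would touch each per-component array. The type name `SplitUInt4Array3` and its members are hypothetical and only illustrate the layout:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the split form of `static uint4 dArr[3]`:
// Comp[c] plays the role of @dArr.c, one 3-element array per component.
struct SplitUInt4Array3 {
  std::array<std::array<std::uint32_t, 3>, 4> Comp{};

  // Scatters the four components of V into the per-component arrays.
  void store(std::size_t ArrayIdx, const std::array<std::uint32_t, 4> &V) {
    assert(ArrayIdx < 3 && "array index out of range");
    for (std::size_t C = 0; C < 4; ++C)
      Comp[C][ArrayIdx] = V[C];
  }

  // Gathers the vector at ArrayIdx back from the per-component arrays.
  std::array<std::uint32_t, 4> load(std::size_t ArrayIdx) const {
    assert(ArrayIdx < 3 && "array index out of range");
    std::array<std::uint32_t, 4> V{};
    for (std::size_t C = 0; C < 4; ++C)
      V[C] = Comp[C][ArrayIdx];
    return V;
  }
};
```

Compared with the flattened `groupshared` form, a dynamic array index here stays a plain index into each component array, at the cost of one access per component.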
`aArr`, the array defined locally in the function, is optimized away into a series of `extractelement` instructions.

In the `cVec` case, DXC converts the vector into an array of 4 elements:
```llvm
@"\01?cVec@@3V?$vector@I$03@@A.v" = addrspace(3) global [4 x i32] undef, align 4
```

The working theory is that data layout transformations are needed for data defined globally.
Further, there seem to be three specific behaviors we want:
1. static scalar layouts
2. groupshared scalar layouts
3. cbuffer usage for regular arrays.

As such, the proposal is:
- Traverse global variables in the module.
- Identify global variables of vector types.
- Replace the global vector with a new global array of scalar values.
    - Flatten vectors into arrays.
    - Flatten arrays of vectors into one-dimensional arrays.
    - Replacement will cover both the cases where cbuffers are needed and the cases where flattened arrays are needed.
- Update all uses of the global variable to work with the new scalar array.
- Remove the old global variable.
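The type rewrite in the steps above can be modeled without any LLVM types. The following sketch computes the length of the replacement scalar array for a global described by its array dimensions and vector width. `GlobalShape` and `flattenedLength` are hypothetical names used only for illustration; a real pass would derive this information from the global's value type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of a global's type: zero or more array dimensions
// wrapped around a vector of VecWidth scalars.
struct GlobalShape {
  std::vector<std::size_t> ArrayDims; // outermost first, e.g. {3} for [3 x ...]
  std::size_t VecWidth;               // e.g. 4 for <4 x i32>
};

// Element count of the one-dimensional scalar array that replaces the global:
// the vector width multiplied by every enclosing array dimension.
inline std::size_t flattenedLength(const GlobalShape &S) {
  assert(S.VecWidth >= 1 && "vector width must be positive");
  std::size_t N = S.VecWidth;
  for (std::size_t Dim : S.ArrayDims)
    N *= Dim;
  return N;
}
```

For the examples in this issue, a shape of `{{3}, 4}` (like `cArr`) gives 12 and a shape of `{{}, 4}` (like `cVec`) gives 4, matching the DXC globals shown earlier.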

Globals can be iterated over like so:
```cpp
for (GlobalVariable &GV : M.globals()) {...}
```
And we will need to update uses like so:
```cpp
for (auto *User : GV.users()) {...}
```

Write a pass for Data layout scalarization #107920

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Write a pass for Data layout scalarization #107920

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions