

@llvm-beanz (Collaborator) commented Jun 5, 2025

This test aims to verify that UAV accesses are performed sequentially
and that the reads and writes are consistent such that a read which
occurs after a write must observe the effect of the write.

@llvm-beanz changed the title from "Cbieneman/uav sequential consistency" to "Add test for UAV sequential consistency" on Jun 5, 2025
@llvm-beanz force-pushed the cbieneman/uav-sequential-consistency branch from 75b4269 to aafd130 on June 7, 2025 15:18
This just helps with debugging when things fail.
Added a lit feature for Intel UHD drivers that exhibit this failure.
A subsequent change will update the test to XFAIL based on this feature.
@llvm-beanz force-pushed the cbieneman/uav-sequential-consistency branch from aafd130 to 941e510 on June 9, 2025 16:27
Result[0] = X[0] + 1;
Result[1] = X[1] + Result[0];

Result[2] = X[0] + 2;
Collaborator:

Isn't this equivalent to lines 8 and 9? Is the bug not observable if lines 11 and 12 are not included?

@spall (Collaborator), Jun 9, 2025:

What if you reorder them yourself? Will Intel move them back?

@llvm-beanz (Collaborator, Author):

I was unable to reproduce the issue if I didn't have both sequences of reads and
writes. I suspect whatever is going on in their optimizer depends on the amount
of adjacent data being loaded.

Collaborator:

right; that makes sense

Reviewer:

I wouldn't expect lines 8 and 9 alone to reproduce the issue, if the bug comes from a coalesced load not invalidating previously read locations upon store.

The first read from Result occurs after the first write to Result[0], so if it's a coalescing bug as I hypothesized, the coalesced load would happen after that first write. It would require another read back from a location overwritten after the coalesced load to expose the bug.

That said, this probably could be reduced by one line, since the write to Result[1] and a read back of that should be sufficient for the repro.

[numthreads(1,1,1)]
void main() {
  Result[0] = X[0] + 1;

  // The write to Result[1] here should invalidate any coalesced load of that
  // value during the read of Result[0].
  Result[1] = X[1] + Result[0];

  // Now, this should be sufficient to expose the invalid use of Result[1]
  // from the coalesced load.
  Result[2] = X[2] + Result[1];
}

I feel like the more adjacent the location is to the coalesced value loaded, the more likely we are to catch a bug of this kind.
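The coalescing hypothesis above can be modeled on the CPU. The Python sketch below is only an illustration of that hypothesis, not a description of what Intel's compiler actually does. It assumes scalar elements (the real buffers use a 16-byte stride), a zero-initialized `Result`, and hypothetical inputs `X = [1, 2, 3]`. `run_sequential` and `run_with_stale_coalesced_load` are hypothetical names: the first applies each statement in program order, the second serves reads of `Result` from a snapshot taken at the first read and never invalidates it on later stores.

```python
def run_sequential(x):
    """Reference semantics: every read observes all earlier writes."""
    result = [0, 0, 0]  # assumed zero-initialized output buffer
    result[0] = x[0] + 1
    result[1] = x[1] + result[0]
    result[2] = x[2] + result[1]
    return result


def run_with_stale_coalesced_load(x):
    """Hypothetical miscompile: the first read of Result loads Result[0..2]
    as one block, and later reads are served from that snapshot even
    though subsequent stores have changed memory."""
    result = [0, 0, 0]
    snapshot = None  # cached copy produced by the coalesced load

    def load(i):
        nonlocal snapshot
        if snapshot is None:
            snapshot = list(result)  # one wide load of adjacent elements
        return snapshot[i]  # bug being modeled: never invalidated by stores

    result[0] = x[0] + 1
    result[1] = x[1] + load(0)  # snapshot taken here, after the store to Result[0]
    result[2] = x[2] + load(1)  # stale: sees Result[1] as it was before its store
    return result


X = [1, 2, 3]  # hypothetical input values
print(run_sequential(X))                 # [2, 4, 7]
print(run_with_stale_coalesced_load(X))  # [2, 4, 3]
```

Because the snapshot is taken after the store to `Result[0]`, only the read-back of `Result[1]` goes wrong, which matches the point that a read of a location overwritten after the coalesced load is what exposes this kind of bug.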

To be clear, I'm not asking for this change to be made. I'm just trying to illustrate what could be an even more minimal repro, if I understand the issue correctly.

Oh, I just realized... If the Adjacent-Partial-Writes.yaml test fails (wow!), this issue might be slightly different than what I thought, but would manifest in the same way in this case. I'm surprised a driver would get away with a bug that causes this prior simple case to fail!

@llvm-beanz (Collaborator, Author):

I iterated a bunch trying to reduce this further, and this was the smallest version that still reproduced the issue. Iterating is harder because the issue only affects Intel UHD drivers, and the only one of those I have access to is the GitHub Actions runner here, which is heavily oversubscribed.

Without a clearer understanding of the underlying issue (which has been reported to Intel for investigation), I don't want to spend more time reducing an already quite small test case.

I've grouped these both under the same LIT check because they seem to be loosely related, and I can imagine they could both be caused by unsafe load/store optimizations resulting in mis-compiles. They may not be the same issue, we'll just have to wait until Intel can investigate and identify the problem.


# RUN: split-file %s %t
# RUN: %dxc_target -T cs_6_5 -Fo %t.o %t/source.hlsl
# RUN: %offloader %t/pipeline.yaml %t.o
\ No newline at end of file
Collaborator:

Nit: add a newline at the end of the file.

Comment on lines 30 to 34
    ZeroInitSize: 48
  - Name: ExpectedOut # The result we expect
    Format: UInt32
    Stride: 16
    Data: [3, 0, 32, 64, 3, 0, 32, 64, 3, 0, 0, 0]
Reviewer:

Could we initialize the output to a unique sequence instead of zero? That would help distinguish outputs that were never written from adjacent locations that were overwritten with zeros.

Suggested change
-    ZeroInitSize: 48
-  - Name: ExpectedOut # The result we expect
-    Format: UInt32
-    Stride: 16
-    Data: [3, 0, 32, 64, 3, 0, 32, 64, 3, 0, 0, 0]
+    Data: [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112]
+  - Name: ExpectedOut # The result we expect
+    Format: UInt32
+    Stride: 16
+    Data: [3, 102, 32, 64, 3, 106, 32, 64, 3, 110, 111, 112]

@llvm-beanz (Collaborator, Author), Jun 12, 2025:

Putting sentinel values in the output buffer is a good idea, although the change here also changes the expected output which isn't correct (since all values in the expected output are written to by the shader). I'll push an update.

Edit: except the last two; the other 0s are intended to be 0.
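Why sentinel initialization helps can be sketched in Python. This is an illustration, not the offload test harness: `apply_writes` is a hypothetical helper, the written values are taken from the expected output quoted above, and the sketch assumes (per the edit above) that the shader leaves only the last two elements untouched. A hypothetical faulty run drops the store of 0 to index 5.

```python
SENTINELS = list(range(101, 113))  # the 12 distinct initial values from the suggestion
ZEROS = [0] * 12                   # assumed equivalent of ZeroInitSize: 48 (12 x 4-byte uints)


def apply_writes(initial, writes):
    """Return a copy of `initial` with each {index: value} write applied."""
    buf = list(initial)
    for index, value in writes.items():
        buf[index] = value
    return buf


# Assumed shader writes: indices 0-9; indices 10 and 11 are never written.
intended = dict(enumerate([3, 0, 32, 64, 3, 0, 32, 64, 3, 0]))
# Hypothetical broken run that skips the store of 0 to index 5.
faulty = {i: v for i, v in intended.items() if i != 5}

for name, initial in (("zero-init", ZEROS), ("sentinel-init", SENTINELS)):
    expected = apply_writes(initial, intended)
    observed = apply_writes(initial, faulty)
    print(name, "detects missing write:", expected != observed)
# zero-init detects missing write: False
# sentinel-init detects missing write: True
```

With zero initialization the skipped element already holds the expected 0, so the dropped store is invisible; with sentinels it keeps 106 and the comparison fails. The same reasoning applies to a bug that clobbers an unwritten neighbor (such as index 10) with zero.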

@spall (Collaborator) left a comment:

lgtm

@llvm-beanz marked this pull request as ready for review on June 17, 2025 17:52
@llvm-beanz merged commit 81e73ae into llvm:main on Jun 19, 2025 (9 checks passed)