-
Notifications
You must be signed in to change notification settings - Fork 21
Add test for UAV sequential consistency #225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add test for UAV sequential consistency #225
Conversation
75b4269
to
aafd130
Compare
This test aims to verify that UAV accesses are performed sequentially and that the reads and writes are consistent such that a read which occurs after a write must observe the effect of the write.
This just helps with debugging when things fail.
Aadded a lit feature for Intel UHD drivers that are exhibiting this failure. A subesquent change will update the test to XFAIL based on this feature.
aafd130
to
941e510
Compare
Result[0] = X[0] + 1; | ||
Result[1] = X[1] + Result[0]; | ||
|
||
Result[2] = X[0] + 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this equivalent to lines 8 and 9? Is the bug not observable if 11 and 12 are not included?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if you reorder them yourself? will intel move them back?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was unable to reproduce the issue if I didn't have both sequences of reads and
writes. I suspect whatever is going on in their optimizer depends on the amount
of adjacent data being loaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right; that makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't expect lines 8 and 9 alone to reproduce the issue, if the bug comes from a coalesced load not invalidating previously read locations upon store.
The first read from Result occurs after the first write to Result[0], so if it's a coalescing bug as I hypothesized, the coalesced load would happen after that first write. It would require another read back from a location overwritten after the coalesced load to expose the bug.
That said, this probably could be reduced by one line, since the write to Result[1] and a read back of that should be sufficient for the repro.
[numthreads(1,1,1)]
void main() {
Result[0] = X[0] + 1;
// The write to Result[1] here should invalidate any coalesced load of that
// value during the read of Result[0].
Result[1] = X[1] + Result[0];
// Now, this should be sufficient to expose the invalid use of Result[1]
// from the coalesced load.
Result[2] = X[2] + Result[1];
}
I feel like the more adjacent the location is to the coalesced value loaded, the more likely we are to catch a bug of this kind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, I'm not asking for this change to be made. I'm just trying to illustrate what could be an even more minimal repro, if I understand the issue correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I just realized... If the Adjacent-Partial-Writes.yaml
test fails (wow!), this issue might be slightly different than what I thought, but would manifest in the same way in this case. I'm surprised a driver would get away with a bug that causes this prior simple case to fail!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I iterated a bunch trying to reduce this further. This was the smallest that I got to reproduce it. This is partially made more challenging to iterate on because it only impacts Intel UHD drivers, and the only one of those I have access to is the GitHub action runner here, which is way over subscribed on its work.
Without a clearer understanding of the underling issue (which has been reported to Intel to investigate) I don't want to spend more time reducing the already quite small test case.
I've grouped these both under the same LIT check because they seem to be loosely related, and I can imagine they could both be caused by unsafe load/store optimizations resulting in mis-compiles. They may not be the same issue, we'll just have to wait until Intel can investigate and identify the problem.
|
||
# RUN: split-file %s %t | ||
# RUN: %dxc_target -T cs_6_5 -Fo %t.o %t/source.hlsl | ||
# RUN: %offloader %t/pipeline.yaml %t.o No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
newline
This is also reduced from a case that was failing on Intel UHD drivers.
ZeroInitSize: 48 | ||
- Name: ExpectedOut # The result we expect | ||
Format: UInt32 | ||
Stride: 16 | ||
Data: [3, 0, 32, 64, 3, 0, 32, 64, 3, 0, 0, 0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we initialize the output to a unique sequence instead of zero? That would help differentiate a lack of writing outputs from overwriting adjacent locations with zeros.
ZeroInitSize: 48 | |
- Name: ExpectedOut # The result we expect | |
Format: UInt32 | |
Stride: 16 | |
Data: [3, 0, 32, 64, 3, 0, 32, 64, 3, 0, 0, 0] | |
Data: [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112] | |
- Name: ExpectedOut # The result we expect | |
Format: UInt32 | |
Stride: 16 | |
Data: [3, 102, 32, 64, 3, 106, 32, 64, 3, 110, 111, 112] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting sentinel values in the output buffer is a good idea, although the change here also changes the expected output which isn't correct (since all values in the expected output are written to by the shader). I'll push an update.
Edit: except the last two... the other 0's are intended to be 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
This test aims to verify that UAV accesses are performed sequentially
and that the reads and writes are consistent such that a read which
occurs after a write must observe the effect of the write.