-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[DSE] Fixes bug caused by skipped read clobbers #83181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The bug seems to be caused by the use of BatchAA after the IR has been modified. Considering the reproducer in `read-clobber-skipped.ll`, the bug is caused by `isWriteAtEndOfFunction()` returning true instead of false when trying to eliminate `@llvm.memset.po0`. The problem is that when it visits `%ret = load ptr, ptr %struct.byte.8`, `isReadClobber()` should return true, but `isRefSet(BatchAA.getModRefInfo())` actually returns false, so the read clobber is missed. As far as I understand the cause is that a mem user store of `llvm.memset` has already been erased from the IR, causing BatchAA to return the wrong data. Replacing BatchAA with AA at this point fixes the issue.
@llvm/pr-subscribers-llvm-transforms Author: vporpo (vporpo) ChangesThe bug seems to be caused by the use of BatchAA after the IR has been modified. Considering the reproducer in Replacing BatchAA with AA at this point fixes the issue. Full diff: https://github.com/llvm/llvm-project/pull/83181.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp b/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
index d30c68a2f08712..6f9fe67e66ca0a 100644
--- a/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
+++ b/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
@@ -1274,7 +1274,8 @@ struct DSEState {
if (CB->onlyAccessesInaccessibleMemory())
return false;
- return isRefSet(BatchAA.getModRefInfo(UseInst, DefLoc));
+ AAQueryInfo TmpAAQI(AA, &EI);
+ return isRefSet(AA.getModRefInfo(UseInst, DefLoc, TmpAAQI));
}
/// Returns true if a dependency between \p Current and \p KillingDef is
diff --git a/llvm/test/Transforms/DeadStoreElimination/read-clobber-skipped.ll b/llvm/test/Transforms/DeadStoreElimination/read-clobber-skipped.ll
new file mode 100644
index 00000000000000..fd4c3bf1351df4
--- /dev/null
+++ b/llvm/test/Transforms/DeadStoreElimination/read-clobber-skipped.ll
@@ -0,0 +1,41 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -S -passes=dse < %s | FileCheck %s
+;
+; DSE kills both `store i32 44, ptr %struct.byte.4, align 4` and
+; `call void @llvm.memset.p0.i64(...)` but the memset should not be killed
+; because it has a clobber read: `%ret = load ptr, ptr %struct.byte.8`
+
+%struct.type = type { ptr, ptr }
+
+define ptr @foo(ptr noundef %ptr) {
+; CHECK-LABEL: define ptr @foo(
+; CHECK-SAME: ptr noundef [[PTR:%.*]]) {
+; CHECK-NEXT: [[STRUCT_ALLOCA:%.*]] = alloca [[STRUCT_TYPE:%.*]], align 8
+; CHECK-NEXT: call void @llvm.lifetime.start.p0(i64 56, ptr nonnull [[STRUCT_ALLOCA]]) #[[ATTR2:[0-9]+]]
+; CHECK-NEXT: [[STRUCT_BYTE_8:%.*]] = getelementptr inbounds i8, ptr [[STRUCT_ALLOCA]], i64 8
+; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i8, ptr [[STRUCT_BYTE_8]], i64 4
+; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr noundef nonnull align 4 [[TMP1]], i8 42, i64 4, i1 false)
+; CHECK-NEXT: store i32 43, ptr [[STRUCT_BYTE_8]], align 4
+; CHECK-NEXT: [[RET:%.*]] = load ptr, ptr [[STRUCT_BYTE_8]], align 8
+; CHECK-NEXT: call void @llvm.lifetime.end.p0(i64 56, ptr nonnull [[STRUCT_ALLOCA]]) #[[ATTR2]]
+; CHECK-NEXT: ret ptr [[RET]]
+;
+ %struct.alloca = alloca %struct.type, align 8
+ call void @llvm.lifetime.start.p0(i64 56, ptr nonnull %struct.alloca) nounwind
+ %struct.byte.8 = getelementptr inbounds i8, ptr %struct.alloca, i64 8
+ ; Set %struct.alloca[8, 16) to 42.
+ call void @llvm.memset.p0.i64(ptr noundef nonnull align 4 %struct.byte.8, i8 42, i64 8, i1 false)
+ ; Set %struct.alloca[8, 12) to 43.
+ store i32 43, ptr %struct.byte.8, align 4
+ ; Set %struct.alloca[4, 8) to 44.
+ %struct.byte.4 = getelementptr inbounds i8, ptr %struct.alloca, i64 4
+ store i32 44, ptr %struct.byte.4, align 4
+ ; Return %struct.alloca[8, 16).
+ %ret = load ptr, ptr %struct.byte.8
+ call void @llvm.lifetime.end.p0(i64 56, ptr nonnull %struct.alloca) nounwind
+ ret ptr %ret
+}
+
+declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #0
+declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture) #2
+declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #2
|
I had to revert the pre-commit test because |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the removed dead store (at %struct.byte.4) that invalidates the batchAA right?
@@ -1274,7 +1274,8 @@ struct DSEState { | |||
if (CB->onlyAccessesInaccessibleMemory()) | |||
return false; | |||
|
|||
return isRefSet(BatchAA.getModRefInfo(UseInst, DefLoc)); | |||
AAQueryInfo TmpAAQI(AA, &EI); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be necessary. AA.getModRefInfo(UseInst, DefLoc)
should work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that won't work because it won't use EarliestEscapeInfo and the result will be different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a better way to call getModeRefInfo()
and using EI
for the query, without having to use the temporary AAQueryInfo
?
Yes. |
DSE uses BatchAA, which caches queries using pairs of MemoryLocations. At the moment, DSE may remove instructions that are used as pointers in cached MemoryLocations. If a new instruction used by a new MemoryLoation and this instruction gets allocated at the same address as a previosuly cached and then removed instruction, we may access an incorrect entry in the cache. To avoid this delay removing all instructions except MemoryDefs until the end of DSE. This should avoid removing any values used in BatchAA's cache. Test case by @vporpo from llvm#83181. (Test not precommitted because the results are non-determinstic - memset only sometimes gets removed)
DSE
This is an interesting case, thanks for sharing! AFAIK I noticed that the When removing a store, we also remove any instructions that become dead after removal (reversibly), so we may remove an instruction that was used as memory location pointer in the cache. In most cases, this isn't a problem either, as the instruction has been removed and we are left with stale entries in the cache, which shouldn't be accessed though. What seems to be happening in the case at hand is that we create a new GEP for the adjusted memset and in some cases it gets allocated at the same address as a previous GEP that we removed, but is still in the cache. I experimented a bit to confirm this was indeed the issue and it looks like we should be able to keep using BatchAA, if we avoid deleting instructions that may be cached before the end of DSE, which I sketched here #83411 |
Thanks for looking into it @fhahn and for figuring out how to fix it without resorting to using AA instead of BatchAA. |
Closing, the bug will be fixed by this patch: #83411 |
DSE uses BatchAA, which caches queries using pairs of MemoryLocations. At the moment, DSE may remove instructions that are used as pointers in cached MemoryLocations. If a new instruction used by a new MemoryLoation and this instruction gets allocated at the same address as a previosuly cached and then removed instruction, we may access an incorrect entry in the cache. To avoid this delay removing all instructions except MemoryDefs until the end of DSE. This should avoid removing any values used in BatchAA's cache. Test case by @vporpo from #83181. (Test not precommitted because the results are non-determinstic - memset only sometimes gets removed) PR: #83411
DSE uses BatchAA, which caches queries using pairs of MemoryLocations. At the moment, DSE may remove instructions that are used as pointers in cached MemoryLocations. If a new instruction used by a new MemoryLoation and this instruction gets allocated at the same address as a previosuly cached and then removed instruction, we may access an incorrect entry in the cache. To avoid this delay removing all instructions except MemoryDefs until the end of DSE. This should avoid removing any values used in BatchAA's cache. Test case by @vporpo from llvm#83181. (Test not precommitted because the results are non-determinstic - memset only sometimes gets removed) PR: llvm#83411
DSE uses BatchAA, which caches queries using pairs of MemoryLocations. At the moment, DSE may remove instructions that are used as pointers in cached MemoryLocations. If a new instruction used by a new MemoryLoation and this instruction gets allocated at the same address as a previosuly cached and then removed instruction, we may access an incorrect entry in the cache. To avoid this delay removing all instructions except MemoryDefs until the end of DSE. This should avoid removing any values used in BatchAA's cache. Test case by @vporpo from llvm#83181. (Test not precommitted because the results are non-determinstic - memset only sometimes gets removed) PR: llvm#83411 (cherry picked from commit 10f5e98)
DSE uses BatchAA, which caches queries using pairs of MemoryLocations. At the moment, DSE may remove instructions that are used as pointers in cached MemoryLocations. If a new instruction used by a new MemoryLoation and this instruction gets allocated at the same address as a previosuly cached and then removed instruction, we may access an incorrect entry in the cache. To avoid this delay removing all instructions except MemoryDefs until the end of DSE. This should avoid removing any values used in BatchAA's cache. Test case by @vporpo from llvm#83181. (Test not precommitted because the results are non-determinstic - memset only sometimes gets removed) PR: llvm#83411 (cherry picked from commit 10f5e98)
The bug seems to be caused by the use of BatchAA after the IR has been modified.
Considering the reproducer in
read-clobber-skipped.ll
, the bug is caused byisWriteAtEndOfFunction()
returning true instead of false when trying to eliminate@llvm.memset.po0
.The problem is that when it visits
%ret = load ptr, ptr %struct.byte.8
,isReadClobber()
should return true, butisRefSet(BatchAA.getModRefInfo())
actually returns false, so the read clobber is missed. As far as I understand the cause is that a mem user store ofllvm.memset
has already been erased from the IR, causing BatchAA to return the wrong data.Replacing BatchAA with AA at this point fixes the issue.