[PreISelIntrinsicLowering] Produce a memset_pattern16 libcall for llvm.experimental.memset.pattern when available #120420
Review comment: Can you please add a test with e.g. an i64 constant? If I read your code correctly that is also handled by splatting, so we should have coverage.
Reply: Good catch, definitely an oversight not to cover the 'splatting'. I've added test cases to cover this.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,156 @@ | ||
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals all --version 5 | ||
; RUN: opt -mtriple=x86_64-apple-darwin10.0.0 -passes=pre-isel-intrinsic-lowering -S -o - %s | FileCheck %s | ||
|
||
|
||
;. | ||
; CHECK: @.memset_pattern = private unnamed_addr constant [2 x i64] [i64 -6148895925951734307, i64 -6148895925951734307], align 16 | ||
; CHECK: @.memset_pattern.1 = private unnamed_addr constant [2 x i64] [i64 4614256656552045848, i64 4614256656552045848], align 16 | ||
; CHECK: @.memset_pattern.2 = private unnamed_addr constant [8 x i16] [i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555], align 16 | ||
; CHECK: @.memset_pattern.3 = private unnamed_addr constant i128 -113427455635030943652277463699152839203, align 16 | ||
; CHECK: @.memset_pattern.4 = private unnamed_addr constant i128 -113427455635030943652277463699152839203, align 16 | ||
; CHECK: @.memset_pattern.5 = private unnamed_addr constant i128 -113427455635030943652277463699152839203, align 16 | ||
; CHECK: @.memset_pattern.6 = private unnamed_addr constant i128 -113427455635030943652277463699152839203, align 16 | ||
;. | ||
define void @memset_pattern_i128_1_dynvalue(ptr %a, i128 %value) nounwind { | ||
; CHECK-LABEL: define void @memset_pattern_i128_1_dynvalue( | ||
; CHECK-SAME: ptr [[A:%.*]], i128 [[VALUE:%.*]]) #[[ATTR0:[0-9]+]] { | ||
; CHECK-NEXT: br i1 false, label %[[SPLIT:.*]], label %[[LOADSTORELOOP:.*]] | ||
Review comment: Branch on false? Reply: This is just a reflection of the naive/unoptimised baseline expansion for memset.pattern, which will be improved in future patches. |
||
; CHECK: [[LOADSTORELOOP]]: | ||
; CHECK-NEXT: [[TMP1:%.*]] = phi i64 [ 0, [[TMP0:%.*]] ], [ [[TMP3:%.*]], %[[LOADSTORELOOP]] ] | ||
; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i128, ptr [[A]], i64 [[TMP1]] | ||
; CHECK-NEXT: store i128 [[VALUE]], ptr [[TMP2]], align 1 | ||
; CHECK-NEXT: [[TMP3]] = add i64 [[TMP1]], 1 | ||
; CHECK-NEXT: [[TMP4:%.*]] = icmp ult i64 [[TMP3]], 1 | ||
; CHECK-NEXT: br i1 [[TMP4]], label %[[LOADSTORELOOP]], label %[[SPLIT]] | ||
; CHECK: [[SPLIT]]: | ||
; CHECK-NEXT: ret void | ||
; | ||
tail call void @llvm.experimental.memset.pattern(ptr %a, i128 %value, i64 1, i1 false) | ||
ret void | ||
} | ||
|
||
define void @memset_pattern_i128_1(ptr %a, i128 %value) nounwind { | ||
; CHECK-LABEL: define void @memset_pattern_i128_1( | ||
; CHECK-SAME: ptr [[A:%.*]], i128 [[VALUE:%.*]]) #[[ATTR0]] { | ||
; CHECK-NEXT: call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.3, i64 16) | ||
; CHECK-NEXT: ret void | ||
; | ||
tail call void @llvm.experimental.memset.pattern(ptr %a, i128 u0xaaaaaaaabbbbbbbbccccccccdddddddd, i64 1, i1 false) | ||
ret void | ||
} | ||
|
||
define void @memset_pattern_i128_1_nz_as(ptr addrspace(1) %a, i128 %value) nounwind { | ||
; CHECK-LABEL: define void @memset_pattern_i128_1_nz_as( | ||
; CHECK-SAME: ptr addrspace(1) [[A:%.*]], i128 [[VALUE:%.*]]) #[[ATTR0]] { | ||
; CHECK-NEXT: br i1 false, label %[[SPLIT:.*]], label %[[LOADSTORELOOP:.*]] | ||
; CHECK: [[LOADSTORELOOP]]: | ||
; CHECK-NEXT: [[TMP1:%.*]] = phi i64 [ 0, [[TMP0:%.*]] ], [ [[TMP3:%.*]], %[[LOADSTORELOOP]] ] | ||
; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i128, ptr addrspace(1) [[A]], i64 [[TMP1]] | ||
; CHECK-NEXT: store i128 -113427455635030943652277463699152839203, ptr addrspace(1) [[TMP2]], align 1 | ||
; CHECK-NEXT: [[TMP3]] = add i64 [[TMP1]], 1 | ||
; CHECK-NEXT: [[TMP4:%.*]] = icmp ult i64 [[TMP3]], 1 | ||
; CHECK-NEXT: br i1 [[TMP4]], label %[[LOADSTORELOOP]], label %[[SPLIT]] | ||
; CHECK: [[SPLIT]]: | ||
; CHECK-NEXT: ret void | ||
; | ||
tail call void @llvm.experimental.memset.pattern(ptr addrspace(1) %a, i128 u0xaaaaaaaabbbbbbbbccccccccdddddddd, i64 1, i1 false) | ||
ret void | ||
} | ||
|
||
define void @memset_pattern_i128_1_align_attr(ptr align(16) %a, i128 %value) nounwind { | ||
; CHECK-LABEL: define void @memset_pattern_i128_1_align_attr( | ||
; CHECK-SAME: ptr align 16 [[A:%.*]], i128 [[VALUE:%.*]]) #[[ATTR0]] { | ||
; CHECK-NEXT: call void @memset_pattern16(ptr align 16 [[A]], ptr @.memset_pattern.4, i64 16) | ||
; CHECK-NEXT: ret void | ||
; | ||
tail call void @llvm.experimental.memset.pattern(ptr align(16) %a, i128 u0xaaaaaaaabbbbbbbbccccccccdddddddd, i64 1, i1 false) | ||
ret void | ||
} | ||
|
||
define void @memset_pattern_i128_16(ptr %a) nounwind { | ||
; CHECK-LABEL: define void @memset_pattern_i128_16( | ||
; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] { | ||
; CHECK-NEXT: call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.5, i64 256) | ||
; CHECK-NEXT: ret void | ||
; | ||
tail call void @llvm.experimental.memset.pattern(ptr %a, i128 u0xaaaaaaaabbbbbbbbccccccccdddddddd, i64 16, i1 false) | ||
ret void | ||
} | ||
|
||
define void @memset_pattern_i128_x(ptr %a, i64 %x) nounwind { | ||
; CHECK-LABEL: define void @memset_pattern_i128_x( | ||
; CHECK-SAME: ptr [[A:%.*]], i64 [[X:%.*]]) #[[ATTR0]] { | ||
; CHECK-NEXT: [[TMP1:%.*]] = mul i64 16, [[X]] | ||
; CHECK-NEXT: call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.6, i64 [[TMP1]]) | ||
; CHECK-NEXT: ret void | ||
; | ||
tail call void @llvm.experimental.memset.pattern(ptr %a, i128 u0xaaaaaaaabbbbbbbbccccccccdddddddd, i64 %x, i1 false) | ||
ret void | ||
} | ||
|
||
define void @memset_pattern_i128_x_nonzero_as(ptr addrspace(10) %a, i64 %x) nounwind { | ||
; CHECK-LABEL: define void @memset_pattern_i128_x_nonzero_as( | ||
; CHECK-SAME: ptr addrspace(10) [[A:%.*]], i64 [[X:%.*]]) #[[ATTR0]] { | ||
; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i64 0, [[X]] | ||
; CHECK-NEXT: br i1 [[TMP1]], label %[[SPLIT:.*]], label %[[LOADSTORELOOP:.*]] | ||
; CHECK: [[LOADSTORELOOP]]: | ||
; CHECK-NEXT: [[TMP2:%.*]] = phi i64 [ 0, [[TMP0:%.*]] ], [ [[TMP4:%.*]], %[[LOADSTORELOOP]] ] | ||
; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i128, ptr addrspace(10) [[A]], i64 [[TMP2]] | ||
; CHECK-NEXT: store i128 -113427455635030943652277463699152839203, ptr addrspace(10) [[TMP3]], align 1 | ||
; CHECK-NEXT: [[TMP4]] = add i64 [[TMP2]], 1 | ||
; CHECK-NEXT: [[TMP5:%.*]] = icmp ult i64 [[TMP4]], [[X]] | ||
; CHECK-NEXT: br i1 [[TMP5]], label %[[LOADSTORELOOP]], label %[[SPLIT]] | ||
; CHECK: [[SPLIT]]: | ||
; CHECK-NEXT: ret void | ||
; | ||
tail call void @llvm.experimental.memset.pattern(ptr addrspace(10) %a, i128 u0xaaaaaaaabbbbbbbbccccccccdddddddd, i64 %x, i1 false) | ||
ret void | ||
} | ||
|
||
define void @memset_pattern_i16_x(ptr %a, i64 %x) nounwind { | ||
; CHECK-LABEL: define void @memset_pattern_i16_x( | ||
; CHECK-SAME: ptr [[A:%.*]], i64 [[X:%.*]]) #[[ATTR0]] { | ||
; CHECK-NEXT: [[TMP1:%.*]] = mul i64 2, [[X]] | ||
; CHECK-NEXT: call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.2, i64 [[TMP1]]) | ||
; CHECK-NEXT: ret void | ||
; | ||
tail call void @llvm.experimental.memset.pattern(ptr %a, i16 u0xabcd, i64 %x, i1 false) | ||
ret void | ||
} | ||
|
||
define void @memset_pattern_i64_x(ptr %a, i64 %x) nounwind { | ||
; CHECK-LABEL: define void @memset_pattern_i64_x( | ||
; CHECK-SAME: ptr [[A:%.*]], i64 [[X:%.*]]) #[[ATTR0]] { | ||
; CHECK-NEXT: [[TMP1:%.*]] = mul i64 8, [[X]] | ||
; CHECK-NEXT: call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern, i64 [[TMP1]]) | ||
; CHECK-NEXT: ret void | ||
; | ||
tail call void @llvm.experimental.memset.pattern(ptr %a, i64 u0xaaaabbbbccccdddd, i64 %x, i1 false) | ||
ret void | ||
} | ||
Review comment: Do these support non-int types? Test with FP and pointer? Plus a poison, null, and nontrivial constantexpr test. Reply: No, llvm.experimental.memset.pattern only accepts integer types. |
||
|
||
; Demonstrate that TBAA metadata is preserved. | ||
define void @memset_pattern_i64_128_tbaa(ptr %a) nounwind { | ||
; CHECK-LABEL: define void @memset_pattern_i64_128_tbaa( | ||
; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] { | ||
; CHECK-NEXT: call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.1, i64 1024), !tbaa [[TBAA0:![0-9]+]] | ||
; CHECK-NEXT: ret void | ||
; | ||
tail call void @llvm.experimental.memset.pattern(ptr %a, i64 u0x400921fb54442d18, i64 128, i1 false), !tbaa !5 | ||
ret void | ||
} | ||
|
||
!5 = !{!6, !6, i64 0} | ||
!6 = !{!"double", !7, i64 0} | ||
!7 = !{!"omnipotent char", !8, i64 0} | ||
!8 = !{!"Simple C++ TBAA"} | ||
|
||
;. | ||
; CHECK: attributes #[[ATTR0]] = { nounwind } | ||
; CHECK: attributes #[[ATTR1:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: write) } | ||
; CHECK: attributes #[[ATTR2:[0-9]+]] = { nofree nounwind willreturn memory(argmem: readwrite) } | ||
;. | ||
; CHECK: [[TBAA0]] = !{[[META1:![0-9]+]], [[META1]], i64 0} | ||
; CHECK: [[META1]] = !{!"double", [[META2:![0-9]+]], i64 0} | ||
; CHECK: [[META2]] = !{!"omnipotent char", [[META3:![0-9]+]], i64 0} | ||
; CHECK: [[META3]] = !{!"Simple C++ TBAA"} | ||
;. |
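The CHECK lines above pass `memset_pattern16` a byte length equal to the element size times the element count (`i64 256` for 16 i128 elements, `i64 1024` for 128 i64 elements, `mul i64 16, [[X]]` for a dynamic count). As a rough illustration only, the sketch below models that arithmetic and the fill behaviour in portable C++. The names `memset_pattern16_model` and `pattern_byte_length` are hypothetical, are not part of this patch, and stand in for the Darwin libcall rather than calling it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical portable model of a memset_pattern16-style fill: write `len`
// bytes at `dst` by repeating the 16-byte `pattern16`. A trailing partial
// copy is simply cut short at `len`.
inline void memset_pattern16_model(void *dst, const void *pattern16,
                                   std::size_t len) {
  assert(len == 0 || (dst != nullptr && pattern16 != nullptr));
  auto *out = static_cast<unsigned char *>(dst);
  const auto *pat = static_cast<const unsigned char *>(pattern16);
  for (std::size_t i = 0; i < len; ++i)
    out[i] = pat[i % 16];
}

// Byte length handed to the libcall: element size in bytes multiplied by the
// intrinsic's element count (the `mul i64 16, %x` seen in the CHECK lines).
inline std::uint64_t pattern_byte_length(std::uint64_t elem_bytes,
                                         std::uint64_t count) {
  return elem_bytes * count;
}
```

With these helpers, `pattern_byte_length(16, 16)` corresponds to the `i64 256` argument in `memset_pattern_i128_16`, and `pattern_byte_length(8, 128)` to the `i64 1024` argument in the TBAA test.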
Review comment: Why not just directly put the original constant in the global array? Why do you need to process it into a specific data type? Also seems like you could use getConstantDataArrayInfo if you really want to get it into this array form.

Reply: As Nikita rightly pointed out, I was missing an important test case that would have perhaps made this clearer. This logic (ported across from LoopIdiomRecognize) handles what I call "splatting": e.g. if you have an i16 used in a memset in a loop, it will create a 128-bit pattern by duplicating it, which is appropriate for use with memset_pattern16. You could alternatively assemble a new i128 constant, but it's not clear that would be better or simpler than the ConstantArray approach here.

Review comment: But the type isn't significant if you're just constructing a source pointer to read from.
Don't need temporary std::vector to produce this.

Reply: I think we're probably talking past each other, for which I can only apologise. I'll try to explain my understanding again and hopefully the source of confusion (likely on my end!) is then more obvious.
So for memset_pattern16 we need to create a pointer argument that points to a 16-byte pattern. If the original memset.pattern intrinsic had an i128 constant argument, then that's great: we make a GlobalVariable from that directly. If it had a narrower argument, the logic here creates a ConstantArray with that element repeated the appropriate number of times (e.g. i16 repeated 8 times). I agree the type of the pointer isn't significant, but creating a ConstantArray doesn't seem like the worst way of producing a pointer to 16 bytes of a repeated value, given I have a narrower value. We could instead write logic that inspects the bit width and creates a new APInt and a constant based on that as appropriate, but I think it would be more complex (perhaps there's a handy helper I'm missing?).
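
The "splatting" described in this thread can be sketched at the byte level, independent of LLVM's constant classes. This is a minimal illustration, not the patch's code: `splat_to_pattern16` is a hypothetical helper, and it assumes that only element widths which evenly divide 16 bytes qualify (consistent with the i16, i64, and i128 cases in the test file).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: repeat an `elem_bytes`-wide value until it fills a
// 16-byte pattern. Returns false (leaving `out` untouched) when the width is
// zero, wider than 16 bytes, or does not divide 16 evenly. That restriction
// is an assumption of this sketch.
inline bool splat_to_pattern16(const unsigned char *elem,
                               std::size_t elem_bytes,
                               std::array<unsigned char, 16> &out) {
  assert(elem != nullptr);
  if (elem_bytes == 0 || elem_bytes > 16 || 16 % elem_bytes != 0)
    return false;
  // Copy the element back to back: 8 copies for i16, 2 for i64, 1 for i128.
  for (std::size_t off = 0; off < 16; off += elem_bytes)
    std::memcpy(out.data() + off, elem, elem_bytes);
  return true;
}
```

For an i16 element this yields eight copies, matching the `[8 x i16]` global in the test, and a 16-byte element passes through unchanged, matching the case where the i128 constant is used directly.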