Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
106 changes: 105 additions & 1 deletion llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@
#include "llvm/Support/Casting.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Transforms/Scalar/LowerConstantIntrinsics.h"
#include "llvm/Transforms/Utils/BuildLibCalls.h"
#include "llvm/Transforms/Utils/LowerMemIntrinsics.h"
#include "llvm/Transforms/Utils/LowerVectorIntrinsics.h"

Expand Down Expand Up @@ -233,6 +234,60 @@ static bool canEmitLibcall(const TargetMachine *TM, Function *F,
return TLI->getLibcallName(LC) != nullptr;
}

// Return a value appropriate for use with the memset_pattern16 libcall, if
// possible and if we know how. (Adapted from equivalent helper in
// LoopIdiomRecognize).
static Constant *getMemSetPattern16Value(MemSetPatternInst *Inst,
const TargetLibraryInfo &TLI) {
// TODO: This could check for UndefValue because it can be merged into any
// other valid pattern.

// Don't emit libcalls if a non-default address space is being used.
if (Inst->getRawDest()->getType()->getPointerAddressSpace() != 0)
return nullptr;

Value *V = Inst->getValue();
Type *VTy = V->getType();
const DataLayout &DL = Inst->getDataLayout();
Module *M = Inst->getModule();

if (!isLibFuncEmittable(M, &TLI, LibFunc_memset_pattern16))
return nullptr;

// If the value isn't a constant, we can't promote it to being in a constant
// array. We could theoretically do a store to an alloca or something, but
// that doesn't seem worthwhile.
Constant *C = dyn_cast<Constant>(V);
if (!C || isa<ConstantExpr>(C))
return nullptr;

// Only handle simple values that are a power of two bytes in size.
uint64_t Size = DL.getTypeSizeInBits(VTy);
if (!DL.typeSizeEqualsStoreSize(VTy) || !isPowerOf2_64(Size))
return nullptr;

// Don't care enough about darwin/ppc to implement this.
if (DL.isBigEndian())
return nullptr;

// Convert to size in bytes.
Size /= 8;

// TODO: If CI is larger than 16-bytes, we can try slicing it in half to see
// if the top and bottom are the same (e.g. for vectors and large integers).
if (Size > 16)
return nullptr;

// If the constant is exactly 16 bytes, just use it.
if (Size == 16)
return C;

// Otherwise, we'll use an array of the constants.
uint64_t ArraySize = 16 / Size;
ArrayType *AT = ArrayType::get(V->getType(), ArraySize);
return ConstantArray::get(AT, std::vector<Constant *>(ArraySize, C));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just directly put the original constant in the global array? Why do you need to process it into a specific data type? Also seems like you could use getConstantDataArrayInfo if you really want to get it into this array form

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Nikita rightly pointed out, I was missing an important test case that would have perhaps made this clearer. This logic (ported across from LoopIdiomRecognize) handles what icalls "splatting". e.g. if you have an i16 used in a memset in a loop, it will create a 128-bit pattern by duplicating it that is appropriate for use with memset_pattern16. You could alternatively assembly a new i128 constant, but it's not clear that would be better or simpler than the ConstantArray approach here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the type isn't significant if you're just constructing a source pointer to read from

Don't need temporary std::vector to produce this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're probably talking past each other, for which I can only apologise. I'll try to explain my understanding again and hopefully the source of confusion (likely on my end!) is then more obvious.

So for memset_pattern16 we need to create a pointer argument that points to a 16 byte pattern. If the original memset.pattern intrinsic had an i128 constant argument, then that's great - we make a GlobalVariable from that directly. If it had a narrower argument, the logic here is creating a ConstantArray with that element repeated the appropriate number of times (e.g. i16 repeated 8 times). I agree the type of the pointer isn't significant, but creating a ConstantArray doesn't seem like the worst way of producing a pointer to 16bytes of a repeated value, given I have a narrower value. We could instead write logic that inspects the bit width and creates a new APInt and a constant based on that as appropriate, but I think it would be more complex (perhaps there's a handy helper I'm missing?).

}

// TODO: Handle atomic memcpy and memcpy.inline
// TODO: Pass ScalarEvolution
bool PreISelIntrinsicLowering::expandMemIntrinsicUses(Function &F) const {
Expand Down Expand Up @@ -323,7 +378,56 @@ bool PreISelIntrinsicLowering::expandMemIntrinsicUses(Function &F) const {
}
case Intrinsic::experimental_memset_pattern: {
auto *Memset = cast<MemSetPatternInst>(Inst);
expandMemSetPatternAsLoop(Memset);
const TargetLibraryInfo &TLI = LookupTLI(*Memset->getFunction());
Constant *PatternValue = getMemSetPattern16Value(Memset, TLI);
if (!PatternValue) {
// If it isn't possible to emit a memset_pattern16 libcall, expand to
// a loop instead.
expandMemSetPatternAsLoop(Memset);
Changed = true;
Memset->eraseFromParent();
break;
}
// FIXME: There is currently no profitability calculation for emitting
// the libcall vs expanding the memset.pattern directly.
IRBuilder<> Builder(Inst);
Module *M = Memset->getModule();
const DataLayout &DL = Memset->getDataLayout();

StringRef FuncName = "memset_pattern16";
FunctionCallee MSP = getOrInsertLibFunc(
M, TLI, LibFunc_memset_pattern16, Builder.getVoidTy(),
Memset->getRawDest()->getType(), Builder.getPtrTy(),
Memset->getLength()->getType());
inferNonMandatoryLibFuncAttrs(M, FuncName, TLI);

// Otherwise we should form a memset_pattern16. PatternValue is known
// to be an constant array of 16-bytes. Put the value into a mergable
// global.
assert(Memset->getRawDest()->getType()->getPointerAddressSpace() == 0 &&
"Should have skipped if non-zero AS");
GlobalVariable *GV = new GlobalVariable(
*M, PatternValue->getType(), /*isConstant=*/true,
GlobalValue::PrivateLinkage, PatternValue, ".memset_pattern");
GV->setUnnamedAddr(
GlobalValue::UnnamedAddr::Global); // Ok to merge these.
// TODO: Consider relaxing alignment requirement.
GV->setAlignment(Align(16));
Value *PatternPtr = GV;
Value *NumBytes = Builder.CreateMul(
Builder.getInt64(DL.getTypeSizeInBits(Memset->getValue()->getType()) /
8),
Memset->getLength());
CallInst *MemsetPattern16Call =
Builder.CreateCall(MSP, {Memset->getRawDest(), PatternPtr, NumBytes});
MemsetPattern16Call->setAAMetadata(Memset->getAAMetadata());
// Preserve any call site attributes on the destination pointer
// argument (e.g. alignment).
AttrBuilder ArgAttrs(Memset->getContext(),
Memset->getAttributes().getParamAttrs(0));
MemsetPattern16Call->setAttributes(
MemsetPattern16Call->getAttributes().addParamAttributes(
Memset->getContext(), 0, ArgAttrs));
Changed = true;
Memset->eraseFromParent();
break;
Expand Down
156 changes: 156 additions & 0 deletions llvm/test/Transforms/PreISelIntrinsicLowering/X86/memset-pattern.ll
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a test with e.g. an i64 constant? If I read your code correctly that is also handled by splatting, so we should have coverage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, definitely an oversight not to cover the 'splatting'. I've added test cases to cover this.

Original file line number Diff line number Diff line change
@@ -0,0 +1,156 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals all --version 5
; RUN: opt -mtriple=x86_64-apple-darwin10.0.0 -passes=pre-isel-intrinsic-lowering -S -o - %s | FileCheck %s

;.
; CHECK: @.memset_pattern = private unnamed_addr constant [2 x i64] [i64 -6148895925951734307, i64 -6148895925951734307], align 16
; CHECK: @.memset_pattern.1 = private unnamed_addr constant [2 x i64] [i64 4614256656552045848, i64 4614256656552045848], align 16
; CHECK: @.memset_pattern.2 = private unnamed_addr constant [8 x i16] [i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555, i16 -21555], align 16
; CHECK: @.memset_pattern.3 = private unnamed_addr constant i128 -113427455635030943652277463699152839203, align 16
; CHECK: @.memset_pattern.4 = private unnamed_addr constant i128 -113427455635030943652277463699152839203, align 16
; CHECK: @.memset_pattern.5 = private unnamed_addr constant i128 -113427455635030943652277463699152839203, align 16
; CHECK: @.memset_pattern.6 = private unnamed_addr constant i128 -113427455635030943652277463699152839203, align 16
;.
define void @memset_pattern_i128_1_dynvalue(ptr %a, i128 %value) nounwind {
; CHECK-LABEL: define void @memset_pattern_i128_1_dynvalue(
; CHECK-SAME: ptr [[A:%.*]], i128 [[VALUE:%.*]]) #[[ATTR0:[0-9]+]] {
; CHECK-NEXT: br i1 false, label %[[SPLIT:.*]], label %[[LOADSTORELOOP:.*]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Branch on false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a reflection of the naive/unoptimised baseline expansion for memset.pattern, which will be improved in future patches.

; CHECK: [[LOADSTORELOOP]]:
; CHECK-NEXT: [[TMP1:%.*]] = phi i64 [ 0, [[TMP0:%.*]] ], [ [[TMP3:%.*]], %[[LOADSTORELOOP]] ]
; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i128, ptr [[A]], i64 [[TMP1]]
; CHECK-NEXT: store i128 [[VALUE]], ptr [[TMP2]], align 1
; CHECK-NEXT: [[TMP3]] = add i64 [[TMP1]], 1
; CHECK-NEXT: [[TMP4:%.*]] = icmp ult i64 [[TMP3]], 1
; CHECK-NEXT: br i1 [[TMP4]], label %[[LOADSTORELOOP]], label %[[SPLIT]]
; CHECK: [[SPLIT]]:
; CHECK-NEXT: ret void
;
tail call void @llvm.experimental.memset.pattern(ptr %a, i128 %value, i64 1, i1 false)
ret void
}

define void @memset_pattern_i128_1(ptr %a, i128 %value) nounwind {
; CHECK-LABEL: define void @memset_pattern_i128_1(
; CHECK-SAME: ptr [[A:%.*]], i128 [[VALUE:%.*]]) #[[ATTR0]] {
; CHECK-NEXT: call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.3, i64 16)
; CHECK-NEXT: ret void
;
tail call void @llvm.experimental.memset.pattern(ptr %a, i128 u0xaaaaaaaabbbbbbbbccccccccdddddddd, i64 1, i1 false)
ret void
}

define void @memset_pattern_i128_1_nz_as(ptr addrspace(1) %a, i128 %value) nounwind {
; CHECK-LABEL: define void @memset_pattern_i128_1_nz_as(
; CHECK-SAME: ptr addrspace(1) [[A:%.*]], i128 [[VALUE:%.*]]) #[[ATTR0]] {
; CHECK-NEXT: br i1 false, label %[[SPLIT:.*]], label %[[LOADSTORELOOP:.*]]
; CHECK: [[LOADSTORELOOP]]:
; CHECK-NEXT: [[TMP1:%.*]] = phi i64 [ 0, [[TMP0:%.*]] ], [ [[TMP3:%.*]], %[[LOADSTORELOOP]] ]
; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i128, ptr addrspace(1) [[A]], i64 [[TMP1]]
; CHECK-NEXT: store i128 -113427455635030943652277463699152839203, ptr addrspace(1) [[TMP2]], align 1
; CHECK-NEXT: [[TMP3]] = add i64 [[TMP1]], 1
; CHECK-NEXT: [[TMP4:%.*]] = icmp ult i64 [[TMP3]], 1
; CHECK-NEXT: br i1 [[TMP4]], label %[[LOADSTORELOOP]], label %[[SPLIT]]
; CHECK: [[SPLIT]]:
; CHECK-NEXT: ret void
;
tail call void @llvm.experimental.memset.pattern(ptr addrspace(1) %a, i128 u0xaaaaaaaabbbbbbbbccccccccdddddddd, i64 1, i1 false)
ret void
}

define void @memset_pattern_i128_1_align_attr(ptr align(16) %a, i128 %value) nounwind {
; CHECK-LABEL: define void @memset_pattern_i128_1_align_attr(
; CHECK-SAME: ptr align 16 [[A:%.*]], i128 [[VALUE:%.*]]) #[[ATTR0]] {
; CHECK-NEXT: call void @memset_pattern16(ptr align 16 [[A]], ptr @.memset_pattern.4, i64 16)
; CHECK-NEXT: ret void
;
tail call void @llvm.experimental.memset.pattern(ptr align(16) %a, i128 u0xaaaaaaaabbbbbbbbccccccccdddddddd, i64 1, i1 false)
ret void
}

define void @memset_pattern_i128_16(ptr %a) nounwind {
; CHECK-LABEL: define void @memset_pattern_i128_16(
; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
; CHECK-NEXT: call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.5, i64 256)
; CHECK-NEXT: ret void
;
tail call void @llvm.experimental.memset.pattern(ptr %a, i128 u0xaaaaaaaabbbbbbbbccccccccdddddddd, i64 16, i1 false)
ret void
}

define void @memset_pattern_i128_x(ptr %a, i64 %x) nounwind {
; CHECK-LABEL: define void @memset_pattern_i128_x(
; CHECK-SAME: ptr [[A:%.*]], i64 [[X:%.*]]) #[[ATTR0]] {
; CHECK-NEXT: [[TMP1:%.*]] = mul i64 16, [[X]]
; CHECK-NEXT: call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.6, i64 [[TMP1]])
; CHECK-NEXT: ret void
;
tail call void @llvm.experimental.memset.pattern(ptr %a, i128 u0xaaaaaaaabbbbbbbbccccccccdddddddd, i64 %x, i1 false)
ret void
}

define void @memset_pattern_i128_x_nonzero_as(ptr addrspace(10) %a, i64 %x) nounwind {
; CHECK-LABEL: define void @memset_pattern_i128_x_nonzero_as(
; CHECK-SAME: ptr addrspace(10) [[A:%.*]], i64 [[X:%.*]]) #[[ATTR0]] {
; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i64 0, [[X]]
; CHECK-NEXT: br i1 [[TMP1]], label %[[SPLIT:.*]], label %[[LOADSTORELOOP:.*]]
; CHECK: [[LOADSTORELOOP]]:
; CHECK-NEXT: [[TMP2:%.*]] = phi i64 [ 0, [[TMP0:%.*]] ], [ [[TMP4:%.*]], %[[LOADSTORELOOP]] ]
; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i128, ptr addrspace(10) [[A]], i64 [[TMP2]]
; CHECK-NEXT: store i128 -113427455635030943652277463699152839203, ptr addrspace(10) [[TMP3]], align 1
; CHECK-NEXT: [[TMP4]] = add i64 [[TMP2]], 1
; CHECK-NEXT: [[TMP5:%.*]] = icmp ult i64 [[TMP4]], [[X]]
; CHECK-NEXT: br i1 [[TMP5]], label %[[LOADSTORELOOP]], label %[[SPLIT]]
; CHECK: [[SPLIT]]:
; CHECK-NEXT: ret void
;
tail call void @llvm.experimental.memset.pattern(ptr addrspace(10) %a, i128 u0xaaaaaaaabbbbbbbbccccccccdddddddd, i64 %x, i1 false)
ret void
}

define void @memset_pattern_i16_x(ptr %a, i64 %x) nounwind {
; CHECK-LABEL: define void @memset_pattern_i16_x(
; CHECK-SAME: ptr [[A:%.*]], i64 [[X:%.*]]) #[[ATTR0]] {
; CHECK-NEXT: [[TMP1:%.*]] = mul i64 2, [[X]]
; CHECK-NEXT: call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.2, i64 [[TMP1]])
; CHECK-NEXT: ret void
;
tail call void @llvm.experimental.memset.pattern(ptr %a, i16 u0xabcd, i64 %x, i1 false)
ret void
}

define void @memset_pattern_i64_x(ptr %a, i64 %x) nounwind {
; CHECK-LABEL: define void @memset_pattern_i64_x(
; CHECK-SAME: ptr [[A:%.*]], i64 [[X:%.*]]) #[[ATTR0]] {
; CHECK-NEXT: [[TMP1:%.*]] = mul i64 8, [[X]]
; CHECK-NEXT: call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern, i64 [[TMP1]])
; CHECK-NEXT: ret void
;
tail call void @llvm.experimental.memset.pattern(ptr %a, i64 u0xaaaabbbbccccdddd, i64 %x, i1 false)
ret void
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these support non-int types? Test with FP and pointer? Plus a poison, null, and nontrivial constantexpr test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, llvm.experimental.memset.pattern only accepts integer types.


; Demonstrate that TBAA metadata is preserved.
define void @memset_pattern_i64_128_tbaa(ptr %a) nounwind {
; CHECK-LABEL: define void @memset_pattern_i64_128_tbaa(
; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
; CHECK-NEXT: call void @memset_pattern16(ptr [[A]], ptr @.memset_pattern.1, i64 1024), !tbaa [[TBAA0:![0-9]+]]
; CHECK-NEXT: ret void
;
tail call void @llvm.experimental.memset.pattern(ptr %a, i64 u0x400921fb54442d18, i64 128, i1 false), !tbaa !5
ret void
}

!5 = !{!6, !6, i64 0}
!6 = !{!"double", !7, i64 0}
!7 = !{!"omnipotent char", !8, i64 0}
!8 = !{!"Simple C++ TBAA"}

;.
; CHECK: attributes #[[ATTR0]] = { nounwind }
; CHECK: attributes #[[ATTR1:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: write) }
; CHECK: attributes #[[ATTR2:[0-9]+]] = { nofree nounwind willreturn memory(argmem: readwrite) }
;.
; CHECK: [[TBAA0]] = !{[[META1:![0-9]+]], [[META1]], i64 0}
; CHECK: [[META1]] = !{!"double", [[META2:![0-9]+]], i64 0}
; CHECK: [[META2]] = !{!"omnipotent char", [[META3:![0-9]+]], i64 0}
; CHECK: [[META3]] = !{!"Simple C++ TBAA"}
;.