Skip to content

[clang][AMDGPU][CUDA] Handle __builtin_printf for device printf #68515

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 5, 2024

Conversation

Maetveis
Copy link
Contributor

@Maetveis Maetveis commented Oct 8, 2023

Previously __builtin_printf would result to emitting call to printf, even though directly calling printf was translated.

Ref: #68478

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. labels Oct 8, 2023
@llvmbot
Copy link
Member

llvmbot commented Oct 8, 2023

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Changes

Previously __builtin_printf would result to emitting call to printf, even though directly calling printf was translated.

Ref: #68478


Full diff: https://github.com/llvm/llvm-project/pull/68515.diff

4 Files Affected:

  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+1)
  • (modified) clang/lib/CodeGen/CGGPUBuiltin.cpp (+2-1)
  • (added) clang/test/CodeGenCUDA/printf-builtin.cu (+20)
  • (added) clang/test/CodeGenHIP/printf-builtin.hip (+21)
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index bf984861bccb5cc..c16c005787ca778 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -5464,6 +5464,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     Value *HalfVal = Builder.CreateLoad(Address);
     return RValue::get(Builder.CreateFPExt(HalfVal, Builder.getFloatTy()));
   }
+  case Builtin::BI__builtin_printf:
   case Builtin::BIprintf:
     if (getTarget().getTriple().isNVPTX() ||
         getTarget().getTriple().isAMDGCN()) {
diff --git a/clang/lib/CodeGen/CGGPUBuiltin.cpp b/clang/lib/CodeGen/CGGPUBuiltin.cpp
index 75fb06de938425d..794be0520163157 100644
--- a/clang/lib/CodeGen/CGGPUBuiltin.cpp
+++ b/clang/lib/CodeGen/CGGPUBuiltin.cpp
@@ -135,7 +135,8 @@ RValue EmitDevicePrintfCallExpr(const CallExpr *E, CodeGenFunction *CGF,
                                 llvm::Function *Decl, bool WithSizeArg) {
   CodeGenModule &CGM = CGF->CGM;
   CGBuilderTy &Builder = CGF->Builder;
-  assert(E->getBuiltinCallee() == Builtin::BIprintf);
+  assert(E->getBuiltinCallee() == Builtin::BIprintf ||
+         E->getBuiltinCallee() == Builtin::BI__builtin_printf);
   assert(E->getNumArgs() >= 1); // printf always has at least one arg.
 
   // Uses the same format as nvptx for the argument packing, but also passes
diff --git a/clang/test/CodeGenCUDA/printf-builtin.cu b/clang/test/CodeGenCUDA/printf-builtin.cu
new file mode 100644
index 000000000000000..586d00a878ddf89
--- /dev/null
+++ b/clang/test/CodeGenCUDA/printf-builtin.cu
@@ -0,0 +1,20 @@
+// REQUIRES: x86-registered-target
+// REQUIRES: nvptx-registered-target
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -emit-llvm -disable-llvm-optzns -fno-builtin-printf -fcuda-is-device \
+// RUN:   -o - %s | FileCheck  %s
+
+#define __device__ __attribute__((device))
+
+extern "C" __device__ int printf(const char *format, ...);
+
+// CHECK-LABEL: @_Z4foo1v()
+__device__ int foo1() {
+  // CHECK-NOT: call i32 (ptr, ...) @printf
+  return __builtin_printf("Hello World\n");
+}
+
+// CHECK-LABEL: @_Z4foo2v()
+__device__ int foo2() {
+  // CHECK: call i32 (ptr, ...) @printf
+  return printf("Hello World\n");
+}
diff --git a/clang/test/CodeGenHIP/printf-builtin.hip b/clang/test/CodeGenHIP/printf-builtin.hip
new file mode 100644
index 000000000000000..76c7d41376c972d
--- /dev/null
+++ b/clang/test/CodeGenHIP/printf-builtin.hip
@@ -0,0 +1,21 @@
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm -disable-llvm-optzns -mprintf-kind=hostcall -fno-builtin-printf -fcuda-is-device \
+// RUN:   -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm -disable-llvm-optzns -mprintf-kind=buffered -fno-builtin-printf -fcuda-is-device \
+// RUN:   -o - %s | FileCheck %s
+
+#define __device__ __attribute__((device))
+
+extern "C" __device__ int printf(const char *format, ...);
+
+// CHECK-LABEL: @_Z4foo1v()
+__device__ int foo1() {
+  // CHECK-NOT: call i32 (ptr, ...) @printf
+  return __builtin_printf("Hello World\n");
+}
+
+// CHECK-LABEL: @_Z4foo2v()
+__device__ int foo2() {
+  // CHECK: call i32 (ptr, ...) @printf
+  return printf("Hello World\n");
+}

@Maetveis
Copy link
Contributor Author

Maetveis commented Feb 3, 2024

@jlebar @yxsamliu I don't know who could review this, but it is sitting here for quite some time unnoticed now, while being a simple enough change IMO.

@jlebar
Copy link
Member

jlebar commented Feb 4, 2024

It looks reasonable to me, although I'm not really an AMDGPU person. /me summons @arsenm ?

Copy link
Collaborator

@yxsamliu yxsamliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks

@yxsamliu
Copy link
Collaborator

yxsamliu commented Feb 4, 2024

It looks reasonable to me, although I'm not really an AMDGPU person. /me summons @arsenm ?

AMDGPU backend relies on LLVM passes to translate printf at IR level. __builtin_printf and printf should be equivalent other than the name.

@arsenm
Copy link
Contributor

arsenm commented Feb 5, 2024

It looks reasonable to me, although I'm not really an AMDGPU person. /me summons @arsenm ?

AMDGPU backend relies on LLVM passes to translate printf at IR level.

For the OpenCL case only, not for HIP/OpenMP

Previously __builtin_printf would result to emitting call to printf,
even though directly calling printf was translated.

Ref: llvm#68478
@Maetveis Maetveis force-pushed the gpu_handle_builtin_printf branch from 193c652 to b7a87d0 Compare February 5, 2024 14:58
@Maetveis Maetveis requested a review from arsenm February 5, 2024 14:58
@Maetveis
Copy link
Contributor Author

Maetveis commented Feb 5, 2024

I don't have commit access, @arsenm please merge.

@arsenm arsenm merged commit 5942868 into llvm:main Feb 5, 2024
@Maetveis Maetveis deleted the gpu_handle_builtin_printf branch February 5, 2024 20:24
agozillon pushed a commit to agozillon/llvm-project that referenced this pull request Feb 5, 2024
…#68515)

Previously `__builtin_printf` would result to emitting call to `printf`,
even though directly calling `printf` was translated.

Ref: llvm#68478
ichaer added a commit to ichaer/llvm-project-onesided_lower_bound that referenced this pull request Feb 12, 2024
* llvm/main: (328 commits)
  [Flang][OpenMP] Attempt to make map-types-and-sizes.f90 test more agnostic to other architectures
  [Transforms] Add more cos combinations to SimplifyLibCalls and InstCombine (llvm#79699)
  [workflows] Close issues used for backports once the PR has been created (llvm#80394)
  [RISCV] Add support for RISC-V Pointer Masking (llvm#79929)
  [lldb] Cleanup regex in libcxx formatters (NFC) (llvm#80618)
  [lldb] Remove unused private TypeCategoryMap methods (NFC) (llvm#80602)
  [mlir][sparse] refine sparse assembler strategy (llvm#80521)
  [NFC] Fix typo (llvm#80703)
  Fix broken ARM processor features test (llvm#80717)
  [ValueTracking][NFC] Pass `SimplifyQuery` to `computeKnownFPClass` family (llvm#80657)
  [x86_64][windows][swift] do not use Swift async extended frame for wi… (llvm#80468)
  [X86] addConstantComments - add FP16 MOVSH asm comments support
  [X86] Regenerate some vector constant comments missed in recent patches to improve mask predicate handling in addConstantComments
  [clang][AMDGPU][CUDA] Handle __builtin_printf for device printf (llvm#68515)
  Add some clarification to email check message
  [GitHub][Workflows] Prevent multiple private email comments (temporarily) (llvm#80648)
  [workflows] Use /mnt as the build directory on Linux (llvm#80583)
  [Flang][OpenMP] Initial mapping of Fortran pointers and allocatables for target devices (llvm#71766)
  [AMDGPU] GlobalISel for f8 conversions (llvm#80503)
  [AMDGPU] Fixed byte_sel of v_cvt_f32_bf8/v_cvt_f32_fp8 (llvm#80502)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants