-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[AMDGPU] Treat printf as builtin for OpenCL #72554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
You can test this locally with the following command:git-clang-format --diff 5353d3f509814a44093a61c2725fdfe7273aa25a 9833353ab6d7bb9716883b89f4e8b90285c1a60c -- clang/lib/AST/Decl.cpp clang/lib/Basic/Targets/AMDGPU.cpp clang/lib/CodeGen/CGBuiltin.cpp View the diff from clang-format here.diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp
index 0cf6daee87..307cfa49f5 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -92,8 +92,7 @@ static constexpr Builtin::Info BuiltinInfo[] = {
#define TARGET_BUILTIN(ID, TYPE, ATTRS, FEATURE) \
{#ID, TYPE, ATTRS, FEATURE, HeaderDesc::NO_HEADER, ALL_LANGUAGES},
#define LANGBUILTIN(ID, TYPE, ATTRS, LANG) \
- { #ID, TYPE, ATTRS, nullptr, HeaderDesc::NO_HEADER, LANG } \
- ,
+ {#ID, TYPE, ATTRS, nullptr, HeaderDesc::NO_HEADER, LANG},
#include "clang/Basic/BuiltinsAMDGPU.def"
};
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any tests? Can you explain why it's not sufficient to do this lowering in the AMDGPU pass?
clang/lib/CodeGen/CGBuiltin.cpp
Outdated
// Mutate the printf builtin ID so that we use the same CodeGen path for | ||
// HIP and OpenCL with AMDGPU targets. | ||
if (getTarget().getTriple().isAMDGCN() && BuiltinID == AMDGPU::BIprintf) | ||
BuiltinID = Builtin::BIprintf; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very close to landing 'real' printf support in the GPU libc where printf
is just a regular function call. Will this change the handling for that in any way? I've already had to make the backend pass respect -fno-builtins
and remove ockl
from OpenMP to make that possible so I'm hoping we don't end up with a lot more special casing for printf
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jhuber6 I do not believe there are any current plans to use GPU libc from HIP or OpenCL. So there will continue to be a division between OpenMP and HIP and OpenCL printf handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do the eager replacement of printf
that HIP and OpenCL uses currently then it won't be linked in. So users should still be able to link in stuff like strcmp
or whatever without it interfering. This would require the new driver however, and if they attempted to use something like fputs
it would segfault because no one initialized the buffer, which isn't a terrible failure mode all things considered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jhuber6 , I had your implementation in mind when I wrote this, The printf wont be expanded by clang with "-fno-builtin" and users would still have an option to use a lib variant if need be. This also makes code more elegant as we would not have to hack the "fno-builtin" handling into the implementation, this is part of clang builtin handling.
@@ -406,5 +410,9 @@ TARGET_BUILTIN(__builtin_amdgcn_cvt_pk_fp8_f32, "iffiIb", "nc", "fp8-insts") | |||
TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_bf8_f32, "ifiiIi", "nc", "fp8-insts") | |||
TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_fp8_f32, "ifiiIi", "nc", "fp8-insts") | |||
|
|||
// OpenCL | |||
LANGBUILTIN(printf, "icC*4.", "fp:0:", ALL_OCL_LANGUAGES) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to define a new target builtin, just to hack it to the generic lang builtin later? Just handle the existing printf builtin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is specifically to recognize the OpenCL version of printf (where fmt string arg is a pointer to const address space) as a builtin. The hack to generic builtin is just a option that I had as I did not want to add a new case to builtin expansion code (since the API used by both OpenCL and HIP are same ), however Im okay with adding a new case too if you feel it makes more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't see why this is necessary. A target-defined language-specific builtin is a whole new beast. What is missing in the current parsing of OpenCL printf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ssahasra , I still feel this is the way to move here since I dont see a way to access the printf option at IR level (i.e during optimization pipeline) and thus decide version of printf to use. It has to be at clang CodeGen. I ask other reviewers too if they feel there are major concerns with adding such a builtin variant (i.e AMDGPU and OCL specific). I might have to look for alternative approaches if so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're handling the builtin in clang directly, you can go off the original Builtin::BIprintf. I don't see what the alias AMDGPU::BIprintf is doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OpenCL spec says printf format string should be in constant address space. This makes the OCL printf signature target specific and hence we would need a target specific builtinID to recognize this. Im not sure I understand how we can go ahead with generic "BIPrinf" here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The address space makes the type language dependent, it does not make it target dependent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what @vikramRH is saying is that the magic number "4" for OpenCL address space "__constant" is specific to AMDGPU.
I intended these changes to be part of #72556, but it seemed too many changes at one place, so I extracted this part out for ease of review. This cannot be merged standalone and has to be with 72556 , the tests are also part of that patch (This really should have been a stack of patches :( ). Also for AMDGPU pass, I plan to remove that altogether and handle all printf lowering at one place during clang Codegen. since we now use a compiler option to switch between different implementations, This makes a lot more sense I feel. |
6ace9d0
to
9833353
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this redundant with #68515? Do we just need to add OpenCL test coverage?
closing this in favour of #86801 |
This is a prerequisite to enable opencl hostcall printf. ensures that AMDGPU printf calls are lowered at clang CodeGen for both HIP and OCL.