Skip to content

[SPIR-V][Codegen] Represent the property of the target to declare and use typed pointers and update MachineVerifier to use it #110270

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

VyacheslavLevytskyy
Copy link
Contributor

@VyacheslavLevytskyy VyacheslavLevytskyy commented Sep 27, 2024

When MachineVerifier validates G_BITCAST we see a check of a kind: if Source Type is equal to Destination Type then report error "bitcast must change the type". This doesn't take into account the notion of a typed pointer that is important for SPIR-V where a user may and should use bitcast between pointers with different pointee types (see, https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpBitcast). It's important for correct lowering in SPIR-V, because interpretation of the data type is not left to instructions that utilize the pointer, but encoded by the pointer declaration, and the SPIRV target can and must handle the declaration and use of pointers that specify the type of data they point to.

It's not feasible to improve validation of G_BITCAST using just information provided by low level types of source and destination. This PR proposes a solution by introducing a new class member hasTypedPointer() to represent the required target lowering property. This function is used then by MachineVerifier to allow for pointer type source and destination operands, and is overridden by SPIRV implementation of target lowering rules.

@llvmbot
Copy link
Member

llvmbot commented Sep 27, 2024

@llvm/pr-subscribers-backend-spir-v

Author: Vyacheslav Levytskyy (VyacheslavLevytskyy)

Changes

When MachineVerifier validates G_BITCAST we see a check of a kind: if Source Type is equal to Destination Type then report error "bitcast must change the type". This doesn't take into account the notion of a typed pointer that is important for SPIR-V where a user may and should use bitcast between pointers with different pointee types (see, https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpBitcast). It's important for correct lowering in SPIR-V, because interpretation of the data type is not left to instructions that utilize the pointer, but encoded by the pointer declaration, and if the SPIRV target can and must handle the declaration and use of pointers that specify the type of data they point to.

It's not feasible to improve validation of G_BITCAST using just information provided by low level types of source and destination. This PR proposes a solution by introducing a new class member hasTypedPointer() to represent the required target lowering property. This function is used then by MachineVerifier to allow for pointer type source and destination operands, and is overridden by SPIRV implementation of target lowering rules.

where SrcTy and DstTy both being LLT pointers just have no means to express the notion of pointer type that is important for SPIR-V where a user may and should use bitcast between pointers with different pointee types (see, https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpBitcast).

This PR helps to MachineVerifier to take into account the notion of typed pointers when checking G_BITCAST validity.


Full diff: https://github.com/llvm/llvm-project/pull/110270.diff

4 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/TargetLowering.h (+6)
  • (modified) llvm/lib/CodeGen/MachineVerifier.cpp (+3-1)
  • (modified) llvm/lib/Target/SPIRV/SPIRVISelLowering.h (+3)
  • (added) llvm/test/MachineVerifier/SPIRV/test_typedptr_bitcast.mir (+15)
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index 3842af56e6b3d7..19113de188d43d 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -455,6 +455,12 @@ class TargetLoweringBase {
     return true;
   }
 
+  /// Return true if the target can handle the declaration and use of pointers
+  /// that specify the type of data they point to, meaning that interpretation
+  /// of the data type is not left to instructions that utilize the pointer, but
+  /// encoded by the pointer declaration.
+  virtual bool hasTypedPointer() const { return false; }
+
   /// Return true if the @llvm.experimental.vector.partial.reduce.* intrinsic
   /// should be expanded using generic code in SelectionDAGBuilder.
   virtual bool
diff --git a/llvm/lib/CodeGen/MachineVerifier.cpp b/llvm/lib/CodeGen/MachineVerifier.cpp
index 24a0f41775cc1d..1131ab053814a7 100644
--- a/llvm/lib/CodeGen/MachineVerifier.cpp
+++ b/llvm/lib/CodeGen/MachineVerifier.cpp
@@ -1294,7 +1294,9 @@ void MachineVerifier::verifyPreISelGenericInstruction(const MachineInstr *MI) {
       report("bitcast sizes must match", MI);
 
     if (SrcTy == DstTy)
-      report("bitcast must change the type", MI);
+      if (!SrcTy.isPointer() ||
+          !MF->getSubtarget().getTargetLowering()->hasTypedPointer())
+        report("bitcast must change the type", MI);
 
     break;
   }
diff --git a/llvm/lib/Target/SPIRV/SPIRVISelLowering.h b/llvm/lib/Target/SPIRV/SPIRVISelLowering.h
index 77356b7512a739..18d463d78a7f65 100644
--- a/llvm/lib/Target/SPIRV/SPIRVISelLowering.h
+++ b/llvm/lib/Target/SPIRV/SPIRVISelLowering.h
@@ -41,6 +41,9 @@ class SPIRVTargetLowering : public TargetLowering {
   // prevent creation of jump tables
   bool areJTsAllowed(const Function *) const override { return false; }
 
+  // allow for typed pointers
+  bool hasTypedPointer() const override { return true; }
+
   // This is to prevent sexts of non-i64 vector indices which are generated
   // within general IRTranslator hence type generation for it is omitted.
   MVT getVectorIdxTy(const DataLayout &DL) const override {
diff --git a/llvm/test/MachineVerifier/SPIRV/test_typedptr_bitcast.mir b/llvm/test/MachineVerifier/SPIRV/test_typedptr_bitcast.mir
new file mode 100644
index 00000000000000..cb0ff5863f6215
--- /dev/null
+++ b/llvm/test/MachineVerifier/SPIRV/test_typedptr_bitcast.mir
@@ -0,0 +1,15 @@
+#RUN: llc -mtriple=spirv64-unknown-unknown -o - -global-isel -run-pass=none -verify-machineinstrs %s 2>&1 | FileCheck %s
+
+---
+name:            test_typedptr_bitcast
+legalized:       true
+regBankSelected: false
+selected:        false
+tracksRegLiveness: true
+liveins:
+body:             |
+  bb.0:
+    ; CHECK-NOT: Bad machine code: bitcast must change the type
+    %0:_(p0) = G_IMPLICIT_DEF
+    %1:_(p0) = G_BITCAST %0
+...

@VyacheslavLevytskyy
Copy link
Contributor Author

@arsenm @efriedma-quic May I ask you for a feedback please? It's an attempt to better support specific requirements of SPIR-V code generation, similar in its nature to what we are currently discussing in #110019

/// that specify the type of data they point to, meaning that interpretation
/// of the data type is not left to instructions that utilize the pointer, but
/// encoded by the pointer declaration.
virtual bool hasTypedPointer() const { return false; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this bit be in the Subtarget/Target info?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what is the best option. After some reflection I decided that the notion of typed pointers seems to be more of a target lowering property than property of the SPIRV target machine itself, especially given the recent appearance of untyped pointers (https://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/KHR/SPV_KHR_untyped_pointers.html) that do not specify the type of data they point to. Although we can't lower without typed pointers at the moment, probably this property belongs to lowering rather than to target machine. I'm not 100% sure though.

@efriedma-quic
Copy link
Collaborator

If you need an opcode to represent a point in the code where the operand needs to be inferred to a different pointer type than result, please add a new target-specific opcode for that, instead of trying to reuse G_BITCAST.

@VyacheslavLevytskyy
Copy link
Contributor Author

If you need an opcode to represent a point in the code where the operand needs to be inferred to a different pointer type than result, please add a new target-specific opcode for that, instead of trying to reuse G_BITCAST.

I can't say that I agree with (or maybe understand the reason for) the implemented machine verification of G_BITCAST arguments, because it voluntarily restricts bitcast definition in LLVM IR (https://llvm.org/docs/LangRef.html#i-bitcast): "If the source type is a pointer, the destination type must also be a pointer of the same size", and changes its behavior wrt. address spaces: "Pointer ... types may only be converted to other pointer ... types with the same address space ...".

But almost certainly I just miss some important aspects of this, that's why I need the feedback from experienced contributors. Let me please recap, to be 100% sure that I correctly understood your advice. Are the shift in G_BITCAST's semantics and maintenance burden, that this PR suggests, substantial to the degree when it's better to explicitly convert original bitcast to a call site and lower this call directly to SPIRV's OpBitcast, ignoring existing GISel's support for bitcast?

===

In a wider context, this probably a question of whether developers of SPIRV should try to (non-invasively) enrich GISel with the notion of a target that differs from traditional cpu/gpu targets, or rather reasons of maintenance, etc. make this solely a problem of the SPIRV target's code base and GISel should remain intact with this respect?

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the exact problem you are trying to solve, but this is not the solution. G_ opcodes must have a constant behavior, and not vary between targets.

@arsenm
Copy link
Contributor

arsenm commented Sep 30, 2024

I can't say that I agree with (or maybe understand the reason for) the implemented machine verification of G_BITCAST arguments, because it voluntarily restricts bitcast definition in LLVM IR (https://llvm.org/docs/LangRef.html#i-bitcast): "If the source type is a pointer, the destination type must also be a pointer of the same size", and changes its behavior wrt. address spaces: "Pointer ... types may only be converted to other pointer ... types with the same address space ...".

Having G_BITCAST have different rules than the IR bitcast is an oddity we could relax. The IR permits no-op bitcasts. The IR also does not allow bitcasts between pointers with different address spaces (although it should.) So we could just remove this verifier check, but I don't understand why you would want this. A no-op bitcast conveys no information you could use.

Do you have sample IR and MIR with more context?

@VyacheslavLevytskyy
Copy link
Contributor Author

I don't understand the exact problem you are trying to solve, but this is not the solution. G_ opcodes must have a constant behavior, and not vary between targets.

If I understand you correctly, you mean that a G_ opcode has its fixed semantics, general enough to cover multiple targets, even when each target introduces eventually a peculiar behavior, specific to its hardware. This definition, however, allows us to slightly alter or widen G_ opcode behavior, so that G_ opcode continues to be an umbrella for all related targets.

I believe we have here exactly such a case when G_BITCAST should be relaxed a bit to continue supporting SPIRV in the same degree as other targets. I agree that it's a nice and simple solution just to remove this verifier check as not general enough, and given a difference with LLVM IR bitcast.

@VyacheslavLevytskyy
Copy link
Contributor Author

... why you would want this. A no-op bitcast conveys no information you could use.
Do you have sample IR and MIR with more context?

SPIR-V uses typed pointers, so in case of SPIR-V G_BITCAST between two pointers does convey an information about the pointee. This has a wider context, but let's start with requirements of SPIRV specification, that is reason purely in terms of output SPIRV code validity.

Consider LLVM IR

define void @foo(i1 %arg) {
entry:
  %r1 = tail call ptr @f1()
  %r2 = tail call ptr @f2()
  br i1 %arg, label %l1, label %l2

l1:
  br label %exit

l2:
  br label %exit

exit:
  %ret = phi ptr [ %r1, %l1 ], [ %r2, %l2 ]
  ret void
}

define ptr @f1() {
entry:
  %p = alloca i8
  store i8 8, ptr %p
  ret ptr %p
}

define ptr @f2() {
entry:
  %p = alloca i32
  store i32 32, ptr %p
  ret ptr %p
}

This should be translated to the SPIRV code

        %foo = OpFunction %void None %4
        %arg = OpFunctionParameter %bool
         %23 = OpLabel
         %r1 = OpFunctionCall %_ptr_Function_uchar %f1
         %16 = OpFunctionCall %_ptr_Function_uint %f2
         %r2 = OpBitcast %_ptr_Function_uchar %16
               OpBranchConditional %arg %24 %25
         %24 = OpLabel
               OpBranch %26
         %25 = OpLabel
               OpBranch %26
         %26 = OpLabel
        %ret = OpPhi %_ptr_Function_uchar %r1 %24 %r2 %25
               OpReturn
               OpFunctionEnd

         %f1 = OpFunction %_ptr_Function_uchar None %8
         %27 = OpLabel
          %p = OpVariable %_ptr_Function_uchar Function
               OpStore %p %uchar_8 Aligned 1
               OpReturnValue %p
               OpFunctionEnd

         %f2 = OpFunction %_ptr_Function_uint None %10
         %28 = OpLabel
        %p_0 = OpVariable %_ptr_Function_uint Function
               OpStore %p_0 %uint_32 Aligned 4
               OpReturnValue %p_0
               OpFunctionEnd

The relevant part is

         %r1 = OpFunctionCall %_ptr_Function_uchar %f1
         %16 = OpFunctionCall %_ptr_Function_uint %f2
         %r2 = OpBitcast %_ptr_Function_uchar %16
        %ret = OpPhi %_ptr_Function_uchar %r1 %24 %r2 %25

From LLVM IR, having definitions of f1() and f2(), we must assign %r1 to be a pointer to uchar and %r2 a pointer to uint. They both are included into OpPhi, and must be of the same type formally. That's why in SPIRV code you see the bitcast %r2 = OpBitcast %_ptr_Function_uchar %16 that is to get a pointer to uchar from %r2 and produce a valid SPIRV code.

This example is purely artificial, but it demonstrates the difference between LLVM IR (higher level), SPIRV (higher level) and other targets (lower level). LLVM IR uses untyped pointers and doesn't have the problem, other targets don't care about types at all and doesn't have the problem. Being a higher level representation and carrying about types of pointers, SPIRV is to convey the meaning with the G_BITCAST opcode.

In real world application we do care about validity of SPIRV output, so this example is not as artificial actually. Also in real world applications we get a more complicated picture when we may need, for example, to re-interpret function argument, explicitly set by a customer with ptr byval(%"some_user_wrapper_struct") as a pointer to the underlying integer type to get a proper load instruction, like in

spir_kernel void @foo(ptr byval(%"class::id") %arg_buf {
...
%id = load i64, ptr %arg_buf

translated to

%Casted = OpBitcast %_ptr_Function_ulong %arg_buf
%Id = OpLoad %ulong  %Casted

To recap, when dealing with typed pointers, G_BITCAST does convey information. If we may just delete this machine verifier check, it indeed resolves the problem with GISel interpretation of what is valid and what's not in this case.

@VyacheslavLevytskyy
Copy link
Contributor Author

@arsenm Please let me know if this makes sense, and in that case I will rework this PR just to removal of the machine verifier check in question and the corresponding part of the machine verifier's unit test.

@arsenm
Copy link
Contributor

arsenm commented Oct 1, 2024

SPIR-V uses typed pointers, so in case of SPIR-V G_BITCAST between two pointers does convey an information about the pointee.

But it does not. G_BITCAST does not carry additional information about the pointee type. Your OpBitcast is carrying more information than G_BITCAST, so it's not the same thing. If we were to allow G_BITCAST to have identical types, it would still be a legal transformation to just replace G_BITCAST with COPY. Your operator here is not the same as bitcast, and requires introducing something else.

The relevant part is

         %r1 = OpFunctionCall %_ptr_Function_uchar %f1
         %16 = OpFunctionCall %_ptr_Function_uint %f2
         %r2 = OpBitcast %_ptr_Function_uchar %16
        %ret = OpPhi %_ptr_Function_uchar %r1 %24 %r2 %25

I don't understand how the system is supposed to work as a whole. The pointee types do not exist in the LLVM IR, so how are you supposed to just introduce them out of nowhere? You have to just treat everything as uchar*? At what point are these OpBitcasts getting produced?

From LLVM IR, having definitions of f1() and f2(), we must assign %r1 to be a pointer to uchar and %r2 a pointer to uint.

Based on what? This information is not in the original IR

@VyacheslavLevytskyy
Copy link
Contributor Author

VyacheslavLevytskyy commented Oct 1, 2024

We are just getting deeper into implementation details of SPIRV Backend and shifting the discussion from G_BITCAST to a much more complicated and wider topic of how SPIRV Backend is getting along with GISel without actually being supported by GISel in the part of typed pointers (and others). I'm not sure that it's absolutely relevant to this particular PR, although I'd be happy to provide you with any details of implementation.

But it does not. G_BITCAST does not carry additional information about the pointee type. Your OpBitcast is carrying more information than G_BITCAST.

It's a good point and you are absolutely right in that. G_BITCAST doesn't pass this info on its own, but only with the help of additional data structures. That's why you see in this PR implementation as an if based on target-specific lowering properties. G_BITCAST on its own has no place to store anything related to typed pointers and so to reason about them, so this current implementation of the PR in fact proposes to G_BITCAST just to believe that the target knows what it's doing when inserting G_BITCAST with two pointers. On GISel level there is no way to check if this operation makes sense and is not a no-op, that's why a new method in target lowering that can be overridden by a target.

From LLVM IR, having definitions of f1() and f2(), we must assign %r1 to be a pointer to uchar and %r2 a pointer to uint.

Based on what? This information is not in the original IR

Inferring types from store i32 32, ptr %p. We clearly see that %p is i32* in this case. Unfortunately SPIRV spec requires us to know (and so to deduce) this.

At what point are these OpBitcasts getting produced?

During instruction selection step.

GISel doesn't keep track of how newly created virtual registers are related to original LLVM IR values. SPIRV Backend needs to do it however, to conform with the requirement to track types in general (not LLT types). GISel notion of a virtual register doesn't perfectly match SPIRV notion of identifier. LLVM IR doesn't provide direct info about pointee types, but it's possible to infer it. All these points are pain points, but they are not directly related to this PR. The focus point of this PR is that G_BITCAST restricts bitcast from using pointer to pointer conversion, because the notion of typed pointers is removed from LLVM IR, but not from SPIRV. In SPIRV backend this operation not simply "makes sense", but is required. This G_BITCAST machine verifier check looks like an imperfection that we have a chance to fix.

@efriedma-quic
Copy link
Collaborator

efriedma-quic commented Oct 1, 2024

That's why I think you should add a separate opcode not named G_BITCAST. You don't really get any benefit from reusing the existing opcode name, and you confuse people who aren't aware of how this works.

@VyacheslavLevytskyy
Copy link
Contributor Author

That's why I think you should add a separate opcode not named G_BITCAST. You don't really get any benefit from reusing the existing opcode name, and you confuse people who aren't aware of how this works.

Appreciate your feedback. I don't quite agree with this approach in general, although probably I understand the rationale behind it. I'd rather expect that in less invasive and more trivial cases GISel could easier accommodate client targets and slightly widen its applicability.

@VyacheslavLevytskyy
Copy link
Contributor Author

@arsenm May I ask your opinion as well?

@arsenm
Copy link
Contributor

arsenm commented Oct 2, 2024

@arsenm May I ask your opinion as well?

You are looking for something that is not a no-op bitcast, so yes this is a different operation. Additionally, we cannot have optional restrictions on opcodes. Any code would have to follow the lowest common denominator, and the point of having these restrictions is to reduce the number of cases that code operating on instructions need to consider

@VyacheslavLevytskyy
Copy link
Contributor Author

@efriedma-quic @arsenm Appreciate your feedback on the issue. I'm going to close this PR in favor of the solution specific and local wrt. SPIR-V (#114216) when it gets approved.

On another note, in my opinion, still we should think about this subtle difference between G_BITCAST and IR's bitcast (as in #110270 (comment)) and probably proceed with deletion of this check for pointers if Source Type is equal to Destination Type then report error "bitcast must change the type", but, given a wider context unrelated to SPIR-V specifically, this is a matter for a separate discussion.

VyacheslavLevytskyy added a commit that referenced this pull request Oct 30, 2024
…114216)

This PR implements instruction selection for G_BITCAST on an earlier
stage to avoid MachineVerifier complains on subtle semantics difference
between G_BITCAST and OpBitcast.

We do instruction selections for OpBitcast after IR Translation instead
of calling MIB.buildBitcast() generating the general op code G_BITCAST,
because when MachineVerifier validates G_BITCAST we see a check of a
kind: 'if Source Type is equal to Destination Type then report error
"bitcast must change the type"'. This doesn't take into account the
notion of a typed pointer that is important for SPIR-V where a user may
and should use bitcast between pointers with different pointee types
(https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpBitcast).

It's important for correct lowering in SPIR-V, because interpretation of
the data type is not left to instructions that utilize the pointer, but
encoded by the pointer declaration, and the SPIRV target can and must
handle the declaration and use of pointers that specify the type of data
they point to.

It's not feasible to improve validation of G_BITCAST using just
information provided by low level types of source and destination.
Therefore we don't produce G_BITCAST as the general op code with
semantics different from OpBitcast, but rather lower to OpBitcast
immediately.

See discussion in #110270 for
even more context.
smallp-o-p pushed a commit to smallp-o-p/llvm-project that referenced this pull request Nov 3, 2024
…lvm#114216)

This PR implements instruction selection for G_BITCAST on an earlier
stage to avoid MachineVerifier complains on subtle semantics difference
between G_BITCAST and OpBitcast.

We do instruction selections for OpBitcast after IR Translation instead
of calling MIB.buildBitcast() generating the general op code G_BITCAST,
because when MachineVerifier validates G_BITCAST we see a check of a
kind: 'if Source Type is equal to Destination Type then report error
"bitcast must change the type"'. This doesn't take into account the
notion of a typed pointer that is important for SPIR-V where a user may
and should use bitcast between pointers with different pointee types
(https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpBitcast).

It's important for correct lowering in SPIR-V, because interpretation of
the data type is not left to instructions that utilize the pointer, but
encoded by the pointer declaration, and the SPIRV target can and must
handle the declaration and use of pointers that specify the type of data
they point to.

It's not feasible to improve validation of G_BITCAST using just
information provided by low level types of source and destination.
Therefore we don't produce G_BITCAST as the general op code with
semantics different from OpBitcast, but rather lower to OpBitcast
immediately.

See discussion in llvm#110270 for
even more context.
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
…lvm#114216)

This PR implements instruction selection for G_BITCAST on an earlier
stage to avoid MachineVerifier complains on subtle semantics difference
between G_BITCAST and OpBitcast.

We do instruction selections for OpBitcast after IR Translation instead
of calling MIB.buildBitcast() generating the general op code G_BITCAST,
because when MachineVerifier validates G_BITCAST we see a check of a
kind: 'if Source Type is equal to Destination Type then report error
"bitcast must change the type"'. This doesn't take into account the
notion of a typed pointer that is important for SPIR-V where a user may
and should use bitcast between pointers with different pointee types
(https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpBitcast).

It's important for correct lowering in SPIR-V, because interpretation of
the data type is not left to instructions that utilize the pointer, but
encoded by the pointer declaration, and the SPIRV target can and must
handle the declaration and use of pointers that specify the type of data
they point to.

It's not feasible to improve validation of G_BITCAST using just
information provided by low level types of source and destination.
Therefore we don't produce G_BITCAST as the general op code with
semantics different from OpBitcast, but rather lower to OpBitcast
immediately.

See discussion in llvm#110270 for
even more context.
@VyacheslavLevytskyy
Copy link
Contributor Author

Close in favor of the solution specific and local wrt. the SPIR-V Backend: #114216

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants