Skip to content

[AutoDiff][DebugInfo] Properly transfer location info from debug_value into alloc_stack in pullback cloner #42245

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 21, 2022

Conversation

asl
Copy link
Contributor

@asl asl commented Apr 7, 2022

In order to transfer debug info from debug_value to alloc_stack we need to drop op_ref debug expression, as the latter instruction represents the location and not the value itself. Also, fix the debug info verifier as variable type was deduced from alloc_stack / alloc_box improperly: pointer type of instruction itself was used instead of underlying object type.

The issue is only exposed when both optimizations (SIL mem2reg at least) and debug info are enabled

Resolves SR-15849

@asl asl changed the title [AutoDiff][DebugInfo] Properly transfer location info from debug_value into alloc_stack [AutoDiff][DebugInfo] Properly transfer location info from debug_value into alloc_stack in pullback cloner Apr 7, 2022
@asl
Copy link
Contributor Author

asl commented Apr 7, 2022

Tagging @rxwei

@rxwei rxwei requested review from rxwei and adrian-prantl April 7, 2022 17:34
@rxwei
Copy link
Contributor

rxwei commented Apr 7, 2022

Did you mean SR-15849 instead?

@asl
Copy link
Contributor Author

asl commented Apr 7, 2022

Oh, yes, sorry!

@asl asl requested a review from adrian-prantl April 8, 2022 17:20
Copy link
Contributor

@adrian-prantl adrian-prantl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one last question inline.

@asl
Copy link
Contributor Author

asl commented Apr 8, 2022

@rxwei @adrian-prantl Will you please trigger ci for me?

@BradLarson
Copy link
Contributor

@swift-ci Please test.

@rxwei
Copy link
Contributor

rxwei commented Apr 8, 2022

@swift-ci please test linux

@rxwei
Copy link
Contributor

rxwei commented Apr 9, 2022

@swift-ci please test

@rxwei
Copy link
Contributor

rxwei commented Apr 10, 2022

@swift-ci please test linux

@asl
Copy link
Contributor Author

asl commented Apr 10, 2022

Weird, now we have difference between Linux and Mac. The SIL is different though:
Mac:

bb0(%0 : $*Model.TangentVector, %1 : $_AD__$s4main6adjust5model10multiplieryAA5ModelVz_SftF_bb3__PB__src_0_wrt_0_1):
  %2 = alloc_stack $Model.TangentVector, var, name "model", argno 1 // users: %55, %54, %52, %48, %47, %45, %6, %107, %93, %66
  %3 = metatype $@thin Model.TangentVector.Type   // user: %5
  // function_ref static Model.TangentVector.zero.getter
  %4 = function_ref @$s4main5ModelV13TangentVectorV4zeroAEvgZ : $@convention(method) (@thin Model.TangentVector.Type) -> Model.TangentVector // user: %5
  %5 = apply %4(%3) : $@convention(method) (@thin Model.TangentVector.Type) -> Model.TangentVector // user: %6
  store %5 to %2 : $*Model.TangentVector          // id: %6
  %7 = tuple ()
  %8 = alloc_stack $Model.TangentVector, var, name "model", argno 1 // users: %41, %12, %106, %105, %39, %37
...

Linux:

bb0(%0 : $*Model.TangentVector, %1 : @owned $_AD__$s4main6adjust5model10multiplieryAA5ModelVz_SftF_bb3__PB__src_0_wrt_0_1):
  %2 = alloc_stack $Builtin.FPIEEE32, var, (name "model", loc "/home/build-user/swift/test/AutoDiff/validation-test/inout_control_flow.swift":43:13, scope 0), argno 1, type $*Model.TangentVector, expr op_fragment:#Float._value // users: %145, %58, %65, %80, %87, %71, %93, %10, %117, %106
  %3 = alloc_stack $Builtin.FPIEEE32, var, (name "model", loc "/home/build-user/swift/test/AutoDiff/validation-test/inout_control_flow.swift":43:13, scope 0), argno 1, type $*Model.TangentVector, expr op_fragment:#Float._value // users: %144, %133, %60, %67, %82, %89, %73, %95, %12, %137
  %4 = metatype $@thin Model.TangentVector.Type   // user: %6
  // function_ref static Model.TangentVector.zero.getter
  %5 = function_ref @$s4main5ModelV13TangentVectorV4zeroAEvgZ : $@convention(method) (@thin Model.TangentVector.Type) -> Model.TangentVector // user: %6
  %6 = apply %5(%4) : $@convention(method) (@thin Model.TangentVector.Type) -> Model.TangentVector // user: %7
  (%7, %8) = destructure_struct %6 : $Model.TangentVector // users: %9, %11
  %9 = destructure_struct %7 : $Float             // user: %10
  store %9 to [trivial] %2 : $*Builtin.FPIEEE32   // id: %10
  %11 = destructure_struct %8 : $Float            // user: %12
  store %11 to [trivial] %3 : $*Builtin.FPIEEE32  // id: %12
  %13 = tuple ()
  %14 = alloc_stack $Model.TangentVector, var, name "model", argno 1 // users: %49, %46, %51, %18, %143

Note that on Linux we allocate two separate stack slots for members of $Model.TangentVector and, more important, provide the pointer type as type for variable: type $*Model.TangentVector, expr op_fragment:#Float._value which seems to be incorrect to me. @adrian-prantl please verify :)

@asl
Copy link
Contributor Author

asl commented Apr 11, 2022

After some plumbing with cross compiling I was able to reproduce linux issue with cross-compiler. Looking

@asl
Copy link
Contributor Author

asl commented Apr 11, 2022

Looks like the problem is exposed by SIL SROA as it uses SILDebugVariable::createFromAllocation(const AllocationInst *AI). And we're having here:

  // Coalesce the debug loc attached on AI into VarInfo
  SILType Type = AI->getType();

I guess the real question is: per SIL specification we do not need op_deref on debug_value for alloca instruction. What if the debug info is attached to alloca itself? Do we need to use pointer type or value type? I'd lean for value type as things like type $*Model.TangentVector, expr op_fragment:#Float._value do not make much sense to me.

Another question is as follows:

bb0(%0 : $*TestType.ManualTestTangent, %1 : $_AD__$s4main8TestTypeV24doDifferentiableTimeStep04timeG0ySf_tF_bb6__PB__src_0_wrt_0_1):
  %2 = alloc_stack $TestType.ManualTestTangent, var, name "self", argno 2, implicit // users: %41, %33, %8, %50, %39
  %3 = integer_literal $Builtin.Int64, 0          // user: %4
  %4 = builtin "sitofp_Int64_FPIEEE32"(%3 : $Builtin.Int64) : $Builtin.FPIEEE32 // user: %5
  %5 = struct $Float (%4 : $Builtin.FPIEEE32)     // users: %51, %6, %6, %6
  %6 = struct $TestType.TestState.TangentVector (%5 : $Float, %5 : $Float, %5 : $Float) // user: %7
  %7 = struct $TestType.ManualTestTangent (%6 : $TestType.TestState.TangentVector) // user: %8
  store %7 to %2 : $*TestType.ManualTestTangent   // id: %8
  %9 = alloc_stack $TestType.ManualTestTangent, var, name "self", argno 2, implicit // users: %20, %12, %49, %18
  %10 = struct_extract %1 : $_AD__$s4main8TestTypeV24doDifferentiableTimeStep04timeG0ySf_tF_bb6__PB__src_0_wrt_0_1, #_AD__$s4main8TestTypeV24doDifferentiableTimeStep04timeG0ySf_tF_bb6__PB__src_0_wrt_0_1.predecessor // user: %14
  %11 = load %0 : $*TestType.ManualTestTangent    // users: %12, %13, %27, %30
  store %11 to %9 : $*TestType.ManualTestTangent  // id: %12
  debug_value %11 : $TestType.ManualTestTangent, var, name "self", argno 2, implicit // id: %13

Here we're having have two locations for "self" – one in alloca w/o explicit type and another one on debug_value. Do we consider this debug info as consistent?

@BradLarson
Copy link
Contributor

@swift-ci Please test.

@rxwei
Copy link
Contributor

rxwei commented Apr 13, 2022

I'll defer to @adrian-prantl on the question above.

@adrian-prantl
Copy link
Contributor

You might want to enable -sil-print-debuginfo so you can be sure that both instructions refer to the same self and one isn't an inlined self from another method. I also think the type used by the alloc_stack is hidden from sight without this option. There should be an explicit type $*TestType.ManualTestTangent there.

@asl
Copy link
Contributor Author

asl commented Apr 13, 2022

@adrian-prantl Yes, they refer to the same self – this is a compiler-generated code (from autodiff routines). And I double verified: there is no explicit type there:

  %2 = alloc_stack $TestType.ManualTestTangent, var, name "self", argno 2, implicit, loc "sr15849-invalid-debug-info.swift":82:16, scope 309 // users: %41, %33, %8, %50, %39

@adrian-prantl
Copy link
Contributor

Right, sorry. If the explicit type is missing, the type is the one from of the alloc_stack.

I guess the real question is: per SIL specification we do not need op_deref on debug_value for alloca instruction. What if the debug info is attached to alloca itself? Do we need to use pointer type or value type? I'd lean for value type as things like type $*Model.TangentVector, expr op_fragment:#Float._value do not make much sense to me.

When it's attached to the alloca and there's no expression, the value type from the alloca instruction is used.

Here we're having have two locations for "self" – one in alloca w/o explicit type and another one on debug_value. Do we consider this debug info as consistent?

Yes. There can be many debug_value instructions for one variable. This is particularly important for optimized code as the debug_values follow the value around as it's moved from one SSA value to another. In your example the debug_value is redundant, but benign.

@asl
Copy link
Contributor Author

asl commented Apr 13, 2022

When it's attached to the alloca and there's no expression, the value type from the alloca instruction is used.

How we'd treat expr op_fragment:#Float._value then? type $*Model.TangentVector, expr op_fragment:#Float._value does look strange, no?

@asl
Copy link
Contributor Author

asl commented Apr 13, 2022

Yes. There can be many debug_value instructions for one variable. This is particularly important for optimized code as the debug_values follow the value around as it's moved from one SSA value to another. In your example the debug_value is redundant, but benign.

However, here (in example) we're having clash in SIL debug info verifier. As first it sees alloca's type (pointer one) and after this – debug_value one (value type). Hence the clash and incompatibility :)

@adrian-prantl
Copy link
Contributor

@mshockwave do you happen to remember the design decisions that went into pointer versus value types?

@adrian-prantl
Copy link
Contributor

I think that the debug_value should really be

debug_value %11 : $*TestType.ManualTestTangent, var, name "self", argno 2, implicit, expr op_deref

@asl
Copy link
Contributor Author

asl commented Apr 13, 2022

@adrian-prantl it cannot be – %11 is of $TestType.ManualTestTangent type and it's created by InstructionDeleter via salvageDebugInfo. Also sil-mem2reg promotes pointers to value types

@adrian-prantl
Copy link
Contributor

Sorry, I should really read your example more carefully. %11 is the load of the alloca, not the alloca, so the value type is correct.

@adrian-prantl
Copy link
Contributor

What I meant to say is that the debug info in
%2 = alloc_stack $TestType.ManualTestTangent, var, name "self", argno 2, implicit
and
debug_value %2 : $*TestType.ManualTestTangent, var, name "self", argno 2, implicit, expr op_deref
are equivalent.

@asl
Copy link
Contributor Author

asl commented Apr 13, 2022

@adrian-prantl Great, thanks! What do we do with op_fragment's? Is this a valid debug info or not?

  %2 = alloc_stack $Builtin.FPIEEE32, var, (name "model", loc "/home/build-user/swift/test/AutoDiff/validation-test/inout_control_flow.swift":43:13, scope 0), argno 1, type $*Model.TangentVector, expr op_fragment:#Float._value // users: %145, %58, %65, %80, %87, %71, %93, %10, %117, %106

@adrian-prantl
Copy link
Contributor

The fragment looks fine to me and is consistent with the tests: https://github.com/apple/swift/blob/b5ed5f81409887abfe449a55274d88394a58f317/test/DebugInfo/sroa_mem2reg.sil#L60

@asl
Copy link
Contributor Author

asl commented Apr 13, 2022

So, op_fragment should imply op_deref then? The last (Linux) test fail was because we had pointer type + op_fragment and corresponding value type debug info for the same incoming arg. This caused debug info verifier to fail.

In my last commit I changed sroa to use value type debug info (and fixed tests)

@adrian-prantl
Copy link
Contributor

In this example:

debug_value %[[FIELD_X]] : $Int64, var, (name "my_struct", loc "sroa.swift":8:9, scope 2), type $*MyStruct, expr op_fragment:#MyStruct.x

there is no implied deref. The debug_value describes the new location of the x field of MyStruct whose value is in FIELD_X.

@asl
Copy link
Contributor Author

asl commented Apr 14, 2022

@adrian-prantl Fair enough and thanks for the clarification (my mental model was "set of operations to obtain a value of variable"). Then I guess the debug info verifier should be fixed as it is not clear what it should check conceptually

Consider the following cases:

  1. Changed location of a variable / parameter
  %2 = alloc_stack $Foo, var, name "self", argno 2
...
  %11 = load %2 : $*Foo
  debug_value %11 : $Foo, var, name "self", argno 2
  1. Same, but with explicit debug_value on alloca:
  %2 = alloc_stack $Foo
  debug_value %2 : $*Foo, var, name "self", argno 2
...
  %11 = load %2 : $*Foo
  debug_value %11 : $Foo, var, name "self", argno 2
  1. Same, but with op_fragment and explicit type of the field:
  %2 = alloc_stack $Builtin.FPIEEE32, var, name "self", argno 2, type $*Foo, expr op_fragment:#Foo.x
...
  %11 = load %2 : $*Builtin.FPIEEE32
  debug_value %11 : $Builtin.FPIEEE32, var, name "self", argno 2

It seems that we'd need to check that the debug info for the value is consistent, therefore:

  1. In this case we'd need to obtain the object type out of alloca and compare with it
  2. Here we'd need to walk debug_value to understand that this is a variable declaration and decide whether obtain object type or not
  3. Here we'd need to "unfold" op_fragment, obtain type of x field of Foo and use for checking

Does this look as the intended behaviour and consistent with the implied debug info verified design? Or maybe I'm missing something?

@mshockwave
Copy link
Contributor

There seems to be some traffic after I was tagged, so I'll try to clarify from the earlier one:

What if the debug info is attached to alloca itself? Do we need to use pointer type or value type? I'd lean for value type as things like type $*Model.TangentVector, expr op_fragment:#Float._value do not make much sense to me.

Given the original alloc_stack instruction

%3 = alloc_stack $Builtin.FPIEEE32, var, (name "model", loc "...", scope 0), argno 1, type $*Model.TangentVector, expr op_fragment:#Float._value

The suggested mental model here: %3 has type $Builtin.FPIEEE32, this SSA value is the fragment of another aggregate-type source varialbe, "model". Source variable "model" has type $*Model.TangentVector. And %3 is a fragment located at field Float._value. I think the key is how we interpret op_fragment: "%3 is the fragment of <source variable>", rather than "taking fragment from %3".

Another highlight is that "model", as well as the following loc, is the name and location for the aggregate type source variable, rather than that for %3. That's why we put brackets around it to distinguish it from "normal" use cases like:

%8 = alloc_stack $Int32, var, name "hello", loc foo.swift

In which case the source variable name of %8 is really "hello".

I agree this is not so straightforward, but the DWARF standard coined this (model) and both LLVM and SIL are following it.

  1. Same, but with op_fragment and explicit type of the field:
    %2 = alloc_stack $Builtin.FPIEEE32, var, name "self", argno 2, type $*Foo, expr op_fragment:#Foo.x
    ...
    %11 = load %2 : $*Builtin.FPIEEE32
    debug_value %11 : $*Builtin.FPIEEE32, var, name "self", argno 2

Sorry but I think the type of %11 is wrong...it should be $Builtin.FPIEEE32 right?

  1. Same, but with explicit debug_value on alloca:
    %2 = alloc_stack $Foo
    debug_value %2 : $*Foo, var, name "self", argno 2
    ...
    %11 = load %2 : $*Foo
    debug_value %11 : $Foo, var, name "self", argno 2

Slightly tangent: But is the second debug_value instruction necessary? Because I feel like inside a compiler, to verify whether the pointee of %2 is "self", in the worst case we need to do a alias analysis (of course in the snippet above it's trivial to track, but what if we're loading from some SSA value other than %2?). But the DWARF generated at the end will simply tag the corresponding stack slot with "self". And inside a debugger -- since we're in run-time -- we don't need any alias analysis to know if an arbitrary value is "self".

@asl
Copy link
Contributor Author

asl commented Apr 14, 2022

@mshockwave Thanks for the clarifications!

Sorry but I think the type of %11 is wrong...it should be $Builtin.FPIEEE32 right?

Yes, typo, should be value type, not pointer.

Slightly tangent: But is the second debug_value instruction necessary?

Well, I'm essentially copying real cases I've seen in wild :) The debug_value in question is left after sil-mem2reg, when it removes alloca (originally we had like 2-3 allocas with same debug value attached to it), but saves debug info.

@asl
Copy link
Contributor Author

asl commented Apr 19, 2022

@mshockwave @adrian-prantl Thanks for the clarifications! I refined the patch and relaxed the checks in SIL verifier to accommodate for the location changes like in case 1 above. Please review.

PS: It seems we have plenty of cases when op_dered is applied to non-address types (so I cannot simply resolve FIXME). I will file a separate issue for this.

@adrian-prantl
Copy link
Contributor

LGTM

@asl
Copy link
Contributor Author

asl commented Apr 19, 2022

@adrian-prantl Will you please ping CI for me? :)

@adrian-prantl
Copy link
Contributor

@swift-ci test

@asl
Copy link
Contributor Author

asl commented Apr 21, 2022

@adrian-prantl @BradLarson @rxwei Will you please merge the PR for me? Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants