Skip to content

[SR-15823] [AutoDiff] Runtime segfault from edge case #58097

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
philipturner opened this issue Feb 4, 2022 · 10 comments
Open

[SR-15823] [AutoDiff] Runtime segfault from edge case #58097

philipturner opened this issue Feb 4, 2022 · 10 comments
Labels
AutoDiff bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. compiler The Swift compiler itself crash Bug: A crash, i.e., an abnormal termination of software SILGen Area → compiler: The SIL generation stage swift 5.9

Comments

@philipturner
Copy link
Contributor

Previous ID SR-15823
Radar None
Original Reporter @philipturner
Type Bug

Attachment: Download

Additional Detail from JIRA
Votes 0
Component/s
Labels Bug
Assignee None
Priority Medium

md5: 7ac04b0e86fad47aa1f9a8580634f6ab

Issue Description:

The following code crashes at runtime under specific conditions as explained below. As-is, it crashes. To stop it from crashing, follow instructions in its comments.

import _Differentiation

// Remove the @noDerivative attribute here...
typealias MyType = @differentiable(reverse) (inout Float, @noDerivative Int) -> Void

@differentiable(reverse)
// ... and here to stop it from crashing.
func myFunc(_ x: inout Float, _ y: @noDerivative Int) -> Void {}

// Alternatively, un-comment out this line and it no longer crashes.
//print("hi")
print(myFunc as MyType)
@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
@philipturner
Copy link
Contributor Author

@asl your comment on SR-15818 (#58095 (comment)) was right on point! I spent around 4 days investigating the bug, and the cause is something very massive. This bug can actually crash the compiler. Take this reproducer, for example. It crashes during SILGen.

import _Differentiation

func myFunc(x: Float, y: Float) -> Float { x + y }

let item: @differentiable(reverse) (Float, Float) -> Float = myFunc

class Box<T> {
  var value: T
  init(value: T) {
    self.value = value
  }
}

let box = Box(value: item)
let retrieved = box.value

My full investigation is published as a gist, but here is the summary:

// There are multiple symptoms that point to the same problem:
// 1) Suffering from improper reference counting, basically unsafe bit casting
// 2) Type mismanagement in the compiler
// 3) No standardized or documented way to handle differentiable function
//    pointers in memory, at least in the way mentioned in the reproducer
// 
// The problem: there might be no built-in support for reading differentiable
// functions from device RAM.
//
// This is likely a much bigger problem than I can handle alone. I'm giving up
// for now and hoping I can defer it to people with more experience in the
// future.

@asl asl added the AutoDiff label May 3, 2022
@fibrechannelscsi
Copy link
Contributor

The code in the 4/25 post is still causing recent toolchains (2023-01-02a, 2023-01-14a and 2023-01-18a) to crash.

We have:

Assertion failed: (isa<X>(Val) && "cast<Ty>() argument of incompatible type!"), function cast, file Casting.h, line 269.
...
1.	Apple Swift version 5.9-dev (LLVM 3f23b4ceaf01213, Swift 0763e4b98c74b5b)
2.	Compiling with the current language version
3.	While evaluating request ASTLoweringRequest(Lowering AST to SIL for module SomethingSmall)
4.	While emitting reabstraction thunk in SIL function "@$sS6fIegnrr_Iegnnro_S6fIegydd_Iegyydo_TR".
 for <<debugloc at "<compiler-generated>":0:0>>5.	While emitting reabstraction thunk in SIL function "@$sS3fIegnrr_S3fIegydd_TR".

@AnthonyLatsis AnthonyLatsis added compiler The Swift compiler itself crash Bug: A crash, i.e., an abnormal termination of software compiler crash SILGen Area → compiler: The SIL generation stage labels Jan 19, 2023
@asl
Copy link
Contributor

asl commented Jan 19, 2023

@fibrechannelscsi Looks like we've failed to create proper reabstraction thunk from @escaping @callee_guaranteed (@in_guaranteed Float) -> (@out Float, @out Float) to (Float) -> (Float, Float). Apparently the inner abstraction pattern for result is Opaque and outer result is obviously a tuple. Therefore the result planner thinks that inner type is not a tuple and just a scalar.

Hence an assertion when address is scalar is casted to address of tuple.

There is some special wrt Opaque in case of differential reabstraction thunks, but I do not recall it offhand. @rxwei Does this ring any bell?

@asl
Copy link
Contributor

asl commented Jan 19, 2023

Ok, the problem is quite weird. Looks like we're confusing (@in_guaranteed Float) -> (@out Float, @out Float and (@in_guaranteed Float) -> @out (Float, Float) as a pullback type here.

The difference is caused by the context where we calculate the pullback type. In the first case we're inside generic function Box<T>::init so we're seeing tuple as a whole due to substitution, hence @out (Float, Float) (actually we're dealing with @substituted <τ_0_0, τ_0_1> (@in_guaranteed τ_0_0) -> @out τ_0_1 for <Float, (Float, Float) substituted type there). In the second case we're at let retrieved = box.value where the type is known exactly and we're "seeing" tuple as as separate arguments and therefore when we calculate pullback type we're ending with (@out Float, @out Float). Now, when we're creating a reabstraction thunk with Opaque pattern it really expects a single indirect result, but we're providing a tuple of such results.

We can extend thunk generation to handle such case, but I'm not 100% sure we will not end with some other corner cases. @BradLarson @rxwei @dan-zheng any suggestions?

@philipturner philipturner closed this as not planned Won't fix, can't repro, duplicate, stale Apr 25, 2023
@asl
Copy link
Contributor

asl commented Apr 25, 2023

@philipturner Why are you closing it?

@BradLarson
Copy link
Contributor

The reproducer from April 25, 2022, still triggers the above assertion failure as of the 2023-04-27 nightly snapshot, so this is still an issue.

@jkshtj
Copy link
Contributor

jkshtj commented May 3, 2024

Reproducer no longer crashes with the 05/24 toolchain.

@jkshtj jkshtj closed this as completed May 3, 2024
@asl
Copy link
Contributor

asl commented May 3, 2024

@jkshtj Have you checked that the type of reabstraction thunk is correct now?

@jkshtj
Copy link
Contributor

jkshtj commented May 3, 2024

@jkshtj Have you checked that the type of reabstraction thunk is correct now?

I haven't. I'm doing a quick run-through of the list of issues that we have internally.

If we need deeper investigation besides checking for no crashes, I'll keep this open.

@jkshtj jkshtj reopened this May 3, 2024
@asl
Copy link
Contributor

asl commented May 3, 2024

Thanks! I'd rather not close issues simply because "does not crash anymore" and instead try to check what changed, especially if some analysis was done before.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AutoDiff bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. compiler The Swift compiler itself crash Bug: A crash, i.e., an abnormal termination of software SILGen Area → compiler: The SIL generation stage swift 5.9
Projects
None yet
Development

No branches or pull requests

6 participants