[AutoDiff] Supporting differentiable functions with multiple semantic results #66873

asl · 2023-06-22T23:23:21Z

PR #32629 added reverse-mode differentiation support for apply instructions with multiple active semantic results. This completes user-facing support for differentiable functions with multiple semantic results.

Previously, it was not possible to state that a function with multiple semantic results was @differentiable. This included:

- functions with an inout parameter which returned a result
- functions with multiple inout parameters
- mutating functions which returned a result
- functions that return a tuple of results

It is now possible to mark these functions as @differentiable and to supply custom pullbacks for them.

This is essentially #38781 rebased on main, with additional bug fixes and a few other changes.

Co-authored-by: @BradLarson

asl · 2023-06-22T23:23:37Z

@swift-ci please test

asl · 2023-06-23T03:47:02Z

@swift-ci please test macos platform

asl · 2023-06-23T18:04:40Z

@rxwei @BradLarson @dan-zheng Let me know if there is something that needs to be added / changed :)

rxwei

Makes sense. I have just a few questions:

- Does this mean that we lose support for inout parameters that are not intended to be treated as a semantic result? Does this break any existing code with inout parameters?
- In SIL we have "result indices" to choose which results to differentiate. How are we sorting the inout results with other results? Could you add some SIL FileCheck tests?
- Should result indices be surfaced to the language so that we can express them in the @differentiable attribute?

asl · 2023-06-23T21:27:50Z

Does this mean that we lose support for inout parameters that are not intended to be treated as a semantic result? Does this break any existing code with inout parameters?

I do not think so. @BradLarson do you have a production-code example in mind where this might matter?

In SIL we have "result indices" to choose which results to differentiate. How are we sorting the inout results with other results? Could you add some SIL FileCheck tests?

Should result indices be surfaced to the language so that we can express it in the @differentiable attribute?

Currently inout handling is sprinkled around the codebase as a special case. I am working on class differentiation on top of this PR (with the intention of supporting Arrays without manual hacks). For this I needed to generalize the meaning of "semantic result parameter", so that I can treat inouts and class references in a similar way. I think we can postpone adding result bits to the language until we have some common support for these things.

So far the semantic result indices will always go after the usual result indices, in declaration order (so they are essentially numResults + original parameter index). I think we will need to make this user-friendly when we expose them to users :)
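To make the ordering concrete, here is a minimal sketch in plain Swift. It is not compiler code: `SignatureParam` and `semanticResultOrder` are hypothetical names used only for illustration, and the sketch shows only the relative order described above (formal results first, then inout parameters in declaration order), not the exact numeric indices the compiler assigns.

```swift
// Hypothetical, simplified model of a parameter; not a compiler type.
struct SignatureParam {
    let name: String
    let isInout: Bool
}

// Lists semantic results in the order described in the comment above:
// formal results come first, then inout parameters in declaration order.
func semanticResultOrder(formalResults: [String],
                         params: [SignatureParam]) -> [String] {
    let inoutResults = params
        .filter { $0.isInout }          // only inout parameters act as results
        .map { "inout \($0.name)" }     // keep their declaration order
    return formalResults + inoutResults
}

// func f(a: inout Float, x: Float, b: inout Float) -> Float
let order = semanticResultOrder(
    formalResults: ["result"],
    params: [
        SignatureParam(name: "a", isInout: true),
        SignatureParam(name: "x", isInout: false),
        SignatureParam(name: "b", isInout: true),
    ]
)
print(order) // ["result", "inout a", "inout b"]
```

The non-inout parameter `x` never appears in the list, and the formal result always precedes every inout parameter.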

BradLarson · 2023-06-23T21:56:27Z

@asl - Last time around, I'd run these changes against our entire codebase and didn't see any cases where it caused a problem, even in speculative differentiable code that would then be able to take advantage of multiple results. I know we have at least one test case with multiple inout parameters and a wrt to ignore one of them for purposes of differentiability as an input. That case was handled correctly in terms of the arguments, but I don't recall if the ignored inout parameter was still present as an active result.

Regarding result indices, I don't think we were able to come up with a scenario where we needed to wrt results. There are common cases for wrt on arguments, but we couldn't determine when we'd need to do so for results.

rxwei · 2023-06-24T08:24:41Z

Not being able to express result selection in the @differentiable attribute can cause pretty bad ABI mismatches. IIRC we currently infer result indices from types conforming to Differentiable. Let's consider the following scenario:

Module1:

@differentiable
public func foo(x: Float, y: inout Foo) -> Float

// `@differentiable` implies the following derivative functions in Module1's ABI.
public func foo_jvp(x: Float, y: inout Foo) -> (result: Float, differential: (Float) -> Float)
public func foo_vjp(x: Float, y: inout Foo) -> (result: Float, pullback: (Float) -> Float)

Module2:

extension Foo: Differentiable {}

Module3:

import Module1
import Module2

valueWithPullback(at: ...) {
    foo(...)
}

Because Module1's swiftinterface says @differentiable(wrt: x) without specifying the result indices, the result indices are always going to be inferred when the swiftinterface is type-checked. Module2 made Foo conform to Differentiable, so the compiler now considers the inout Foo parameter a differentiable semantic result. It will infer the wrong result indices when declaring a differentiability witness (and look up the wrong JVP and VJP), which leads to undefined symbols.

If that's indeed what's happening here, I think we should mitigate that before merging this PR. One possible way to deal with this, before we have any user-facing syntax for expressing result selection in @differentiable, is to never treat an inout parameter as a semantic result unless it's a "wrt" parameter, because "wrt" guarantees that the inout parameter conforms to Differentiable when it's being differentiated, both when it's a wrt parameter and when it's a semantic result.

asl · 2023-06-24T19:55:04Z

@rxwei Thanks for the example. Let me investigate.

asl · 2023-06-25T05:03:20Z

@rxwei Actually the issue is more obvious. In your example we will be unable to infer the tangent type for Foo in Module1.

rxwei · 2023-06-25T06:34:45Z

@rxwei Actually the issue is more obvious. In your example we will be unable to infer the tangent type for Foo in Module1.

Do you mean you are unconditionally treating all inout parameters as semantic results? This would be a source-breaking change. Existing code like the following would be broken and will be impossible to express after this PR.

@differentiable(wrt: input)
func prediction(from input: Vector, in context: inout Context) -> Vector

After thinking about this a bit more, I don't believe it should be the default behavior. It seems quite confusing to the user to treat one or more inout parameters as "results" when a function also has formal results, let alone the source breakage.

One possible way to deal with this, before we have any user-facing syntax for expressing result selection in @differentiable, is to never treat an inout parameter as a semantic result unless it's a "wrt" parameter

I believe such confusion would be resolved by the above suggestion, especially in cases where the function has a non-Void return type.

asl · 2023-06-25T06:37:14Z

@rxwei Yes. With this PR we hit an assertion on the following code (instead of emitting a proper diagnostic):

public struct ArrayWrapper {
    var values: [Float]

    mutating func get(index: Int) -> Float {
        self.values[index]
    }
}

@differentiable(reverse)
func test(x: Int, y: inout ArrayWrapper, z: Float) {
    y.get(index: x) + z
}

And we assert instead of compiling the following:

public struct ArrayWrapper {
    var values: [Float]

    mutating func get(index: Int) -> Float {
        self.values[index]
    }
}

@differentiable(reverse)
func test(x: Int, y: inout ArrayWrapper, z: Float) -> Float {
    y.get(index: x) + z
}

So yes, we need to take into account only wrt semantic results.

rxwei · 2023-06-25T06:48:12Z

In the case where a function has both inout parameters and formal results, it may be better to treat an inout parameter as a semantic result only when the user explicitly declares wrt: for that parameter (even if the type conforms to Differentiable.)

asl · 2023-06-27T00:04:54Z

@rxwei Ok, I think here is the important point. Currently we do support derivatives of void functions with non-wrt inout (#33304), essentially:

protocol Proto {
  @differentiable(reverse, wrt: x)
  func method(x: Float, y: inout Float)
}

struct Struct: Proto {
  @differentiable(reverse, wrt: x)
  func method(x: Float, y: inout Float) {
    y = y * x
  }
}

So, are you suggesting the following:

- For void functions, treat all inouts as semantic results (and all of them should be differentiable)
- For non-void functions, treat only wrt inouts as semantic results

Is this correct?

rxwei · 2023-06-28T05:58:06Z

I think there are two angles where we need to decide:

- Whether non-wrt parameters should be treated as semantic results.
  - It seems that our conclusion was "no".
- The default wrt inference behavior, i.e. the behavior when you apply @differentiable without specifying wrt:.
  - My current preference is to never infer wrt-ness on any inout argument. If the user explicitly specifies wrt: on an inout argument, we treat it as a semantic result.
  - Enforcing explicit wrt annotations has better clarity. We can always add inference rules in the future if there's a usability benefit, without breaking source code.

@differentiable // inferred as wrt: x, semantic result: formal result (no y)
func foo(x: Float, y: inout Float) -> Float

@differentiable // error: 'String' does not conform to 'Differentiable'
func foo(x: Float, y: inout Float) -> String

@differentiable // error: function does not have a differentiable return type
                // note: did you mean to differentiate wrt the inout parameter `y`?
                // fixit: insert `(wrt: y)`
func foo(x: Float, y: inout Float)

asl · 2023-06-28T06:13:57Z

My current preference is to never infer wrt-ness on any inout argument. If the user explicitly specifies wrt: on an inout argument, we treat it as a semantic result.

I am thinking we'd need different rules for ordinary functions vs. methods. In the latter case it might make sense to differentiate with respect to inout Self for mutating methods. Otherwise things might be confusing, as self is implicit there.

asl · 2023-06-28T21:27:29Z

@swift-ci please test

asl · 2023-06-28T21:33:25Z

@rxwei @BradLarson Ok, while we are deciding on the best way to handle inout semantic results, here is the commit (1ab1598) that I believe should fix the immediate problems while keeping the existing logic intact (plus things are factored in such a way that it would be easy to change inout handling later on).

- wrt inouts are treated as semantic results.
- Unless the function returns Void, non-wrt inouts are not considered semantic results.
- For void functions, all inouts are considered semantic results.
- All semantic results are required to be differentiable; we do not silently skip them.
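The four rules above can be sketched as a small decision function in plain Swift. This is only an illustrative model, not the code from the commit: `ParamInfo`, `SemanticResultError`, and `inoutSemanticResults` are hypothetical names, and the `isDifferentiable` flag stands in for a real conformance check.

```swift
// Hypothetical, simplified description of a parameter; not a compiler type.
struct ParamInfo {
    let name: String
    let isInout: Bool
    let isWrt: Bool
    let isDifferentiable: Bool   // stand-in for "conforms to Differentiable"
}

enum SemanticResultError: Error, Equatable {
    case nonDifferentiableSemanticResult(String)
}

// Returns the names of the inout parameters that count as semantic results,
// or an error if one of them is not differentiable.
func inoutSemanticResults(
    params: [ParamInfo],
    returnsVoid: Bool
) -> Result<[String], SemanticResultError> {
    var results: [String] = []
    for p in params where p.isInout {
        // wrt inouts always count; non-wrt inouts count only for Void functions.
        guard p.isWrt || returnsVoid else { continue }
        // Semantic results must be differentiable; they are never silently skipped.
        guard p.isDifferentiable else {
            return .failure(.nonDifferentiableSemanticResult(p.name))
        }
        results.append(p.name)
    }
    return .success(results)
}

// Models: @differentiable(wrt: x) func foo(x: Float, y: inout Float)
let fooParams = [
    ParamInfo(name: "x", isInout: false, isWrt: true, isDifferentiable: true),
    ParamInfo(name: "y", isInout: true, isWrt: false, isDifferentiable: true),
]
let asVoid = inoutSemanticResults(params: fooParams, returnsVoid: true)
let asNonVoid = inoutSemanticResults(params: fooParams, returnsVoid: false)
print(asVoid)     // success(["y"])
print(asNonVoid)  // success([])
```

Under this model the same non-wrt inout `y` is a semantic result when the function returns Void and is ignored when the function has a formal result.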

I think we are now correctly handling result indices in all cases (e.g. for methods where Self is curried).

PTAL

BradLarson · 2023-06-29T15:00:37Z

The change to requiring wrt for inout parameters with non-Void returns will lead to a slight change in behavior in the case where there's a single inout parameter and a non-differentiable result, which previously did not require a wrt annotation. I'm fine with annotating the few places in our code that use a function like that, but would that cause a problem for any other existing code out there?

rxwei · 2023-07-02T03:01:52Z

Unless the function returns Void, non-wrt inouts are not considered semantic results

I'm trying to understand this clearly. Is this saying: if a function returns Void, non-wrt inouts are semantic results? What about the following case:

@differentiable(wrt: x)
func foo(x: Float, y: inout Float)

Are you saying that the function above treats y as a semantic result even though it's not wrt? If so, it seems to contradict the earlier conclusion as I understood it.

asl · 2023-07-02T04:45:35Z

Are you saying that the function above treats y as a semantic result even though it's not wrt? If so, it seems to contradict the earlier conclusion as I understood it.

As I said, the PR keeps the existing behavior for void functions and semantic results: for void functions all inouts are treated as semantic results, regardless of whether they are wrt or not. To avoid possible ABI breakage we require all semantic results to be differentiable (rather than silently skipping them, as before).

My main concern for now is methods: Self is inout for struct mutating methods (and always a result for class methods), but it is implicit for the user. Are we going to require explicit wrt Self here?

rxwei · 2023-07-06T18:19:24Z

Can we have a follow-up PR to tighten up the wrt inference a little bit? The existing implicit behavior seems too complex.

My main concern for now is methods: Self is inout for struct mutating methods (and always a result for class methods), but it is implicit for the user. Are we going to require explicit wrt Self here?

I don't think there is any harm in requiring explicit wrt: self.

asl · 2023-07-06T19:31:55Z

Can we have a follow-up PR to tighten up the wrt inference a little bit? The existing implicit behavior seems too complex.

Absolutely! I think we'd need to prepare a set of cases / code examples and decide on all of them.

asl · 2023-07-06T23:26:46Z

I created #67174 to track the refined wrt semantics.

asl and others added 3 commits June 22, 2023 16:18

Add support for differentiable functions having multiple semantic results

0872802

Enable slicing of results in reabstraction thunks

f372c9f

Properly slice indirect results as well

fe2df1e

asl requested review from rxwei and dan-zheng June 22, 2023 23:23

asl requested review from AnthonyLatsis, hborla, slavapestov and xedin as code owners June 22, 2023 23:23

xedin removed their request for review June 23, 2023 00:03

BradLarson mentioned this pull request Jun 23, 2023

[AutoDiff] [TF-1288] Supporting differentiable functions with multiple semantic results #38781

Closed

rxwei reviewed Jun 23, 2023

Correctly infer result indices & types for multiple semantic inout results

1ab1598

rxwei approved these changes Jul 6, 2023

asl mentioned this pull request Jul 6, 2023

Refine the implicit wrt logic for @differentiable attribute #67174

Closed

asl merged commit eb82df6 into swiftlang:main Jul 6, 2023

asl deleted the TF-1288 branch July 6, 2023 23:31

JaapWijnen mentioned this pull request Apr 11, 2024

[Autodiff] valueWithPullback operator does not take functions with inout parameters #72982

Open
