[AutoDiff] [TF-1288] Supporting differentiable functions with multiple semantic results #38781
Conversation
@swift-ci Please test.
Exciting!
```cpp
// This uses the logic from getSubsetParameters(), only operating over all
// parameter indices and looking for non-wrt indices.
SmallVector<AnyFunctionType *, 2> curryLevels;
// An inlined version of unwrapCurryLevels().
```
Note: `unwrapCurryLevels()` is hacky – it would be good to refactor it at some point.

Without going too deep into implementation details: we only need to handle one potential "curry level" for method declarations – no need for loops or confusing ad-hoc terminology.
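For context, a small illustration of the single "curry level" at play for methods (an example written for this note, not code from the PR):

```swift
struct Scale {
  var factor: Float
  func apply(_ x: Float) -> Float { factor * x }
}

// An unapplied method reference has a curried function type:
// one outer level binding Self, then the formal parameter list.
let curried: (Scale) -> (Float) -> Float = Scale.apply
let applied: (Float) -> Float = curried(Scale(factor: 2))
print(applied(3))  // 6.0
```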
```cpp
for (auto i : range(originalResults.size())) {
  auto originalResult = originalResults[i];
  auto originalResultType = originalResult.type;
  // Voids currently have a defined tangent vector, so ignore them.
```
Could you please clarify what this comment means exactly?

```cpp
// Voids currently have a defined tangent vector, so ignore them.
```
I'm specifically thinking of this test case: https://github.com/apple/swift/blob/72866b6dd9e6024ab97469cafb56333800bbeb67/test/AutoDiff/Sema/derivative_attr_type_checking.swift#L867 involving an `inout Void`. `getAutoDiffTangentSpace()` returns an empty tuple for `Void` (https://github.com/apple/swift/blob/main/lib/AST/Type.cpp#L5359) rather than `None`, so we can't rely on the `if (!resultTan)` check to filter them out in that one case. I didn't want to alter the behavior of `getAutoDiffTangentSpace()` for this one edge case, so I opted for detecting `Void`s as a special case.
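For reference, a sketch of the shape of declaration involved (a reduction written for this note, not the exact linked test):

```swift
// Void's tangent space is the empty tuple type (), not None, so without
// the special case this `inout Void` parameter would be counted as a
// semantic result with a (trivial) tangent.
func inoutVoid(_ x: Float, _ void: inout Void) -> Float {
  return x * x
}
```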
```
@@ -121,6 +121,32 @@ SimpleMathTests.test("MultipleResults") {
  expectEqual((4, 3), gradient(at: 3, 4, of: multiply_swapAndReturnProduct))
}

// Test function with multiple `inout` parameters and a custom pullback.
```
Did you mention that functions returning direct tuples of `Differentiable` values can also be differentiated? If so, could you please add a test case?
Great suggestion, I added tests for returning tuples, and that uncovered a result type issue I hadn't considered. It took me a little bit to track it down, but the tests and a fix for this have been added in the latest commit.
Sorry I'm late to the review. A high-level question: could we convert these JIRA tickets to SRs?
Sounds good to me.
@rxwei - Porter had converted some of these over (I think he said that he had tried to move TF-700 and above), but when he and I went to the TF-* tickets today, none had the option to move. I also don't even have the option to comment on these. Did the Swift for TensorFlow category get archived or placed in a read-only state?
Ah yes, I believe that's the case. IMO we should probably clone these to new SRs.
From the example you gave in the PR description, I'm not convinced this is a good motivation for allowing multiple semantic results.
```swift
@inlinable
@differentiable(reverse)
mutating func access(at index: Int) -> Element {
  return self[index]
}
```

The above function is a mutating function that doesn't actually mutate anything. The premise of the proposed change seems to say that in order to make a fast derivative for array subscript, the library author would have to provide a fake `mutating` version of it so that it can have a mutating pullback. I agree with the goal and the priority of optimizing array subscripts, but the proposed solution doesn't really make sense to me.
To allow functions like array subscript to have an efficient derivative, I believe we should allow functions to have derivatives with an "inout-optimized" pullback.
```swift
func foo(_ x: T) -> R

@derivative(of: foo(_:))
func dFoo(_ x: T) -> (value: R, pullback: (R, inout T) -> Void)
```
The emitted library interface should include the fact that this derivative is an inout-optimized one. The differentiation transform should be able to pick up the inout-optimized derivative and just use that in generated derivatives.
To make everything else work with inout-optimized derivatives, e.g. forming `@differentiable(reverse)` closures, you just need to thunk them up to have a standard pullback that returns results formally.
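A minimal sketch of such a thunk, with hypothetical names (an illustration of the idea, not API from this PR):

```swift
import _Differentiation

// Wraps an "inout-optimized" pullback, which accumulates into an
// existing tangent buffer, into the standard shape that returns the
// input tangent formally.
func thunkToStandardPullback<T: Differentiable, R: Differentiable>(
  _ optimized: @escaping (R.TangentVector, inout T.TangentVector) -> Void
) -> (R.TangentVector) -> T.TangentVector {
  return { seed in
    var tangent = T.TangentVector.zero
    optimized(seed, &tangent)
    return tangent
  }
}
```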
> functions with multiple `inout` parameters

I'm interested in seeing some compelling use cases of this.

> functions that return a tuple of results

We had support for this very early on but removed it. I'm not sure adding a compiler special case is a good idea, because 1) it will fall out naturally when tuples can be extended to conform to protocols and 2) you can already work around this by defining a struct.
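The struct workaround reads something like this (a sketch with invented names; the memberwise `Differentiable` conformance is synthesized):

```swift
import _Differentiation

// Instead of (Float) -> (Float, Float), wrap the results in a struct.
struct TwoResults: Differentiable {
  var first: Float
  var second: Float
}

@differentiable(reverse)
func multipleResults(_ x: Float) -> TwoResults {
  return TwoResults(first: 2 * x, second: x * x)
}
```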
Plus, there are bigger issues with allowing multiple semantic results which will require opening up many axes of customization points. For example, in a function with two inout parameters, `(inout T0, inout T1) -> Void`, what if you wanted to differentiate w.r.t. both inputs but only w.r.t. the `T1` output? These would require additional syntax to support, and therefore I don't believe we should do multiple semantic results prematurely.
@rxwei - Thank you for the detailed comments. I absolutely agree that the array accessor example isn't in the optimal form that we'd want. I'd thought that matching an inout tangent vector to a non-inout/non-mutating function would be more controversial, so I chose this as a first step towards realizing some of the performance benefits. It sounds like being able to associate "inout-optimized" pullbacks with functions that don't use inouts themselves would be preferred for this case, so I'll start independently working towards enabling that. It'd be ideal to have Array in the standard library with this kind of optimized pullback by default.

However, beyond that motivating example in terms of performance, I do see many cases in a simulator codebase I'm currently working on that could be cleaned up with support for multiple results. While not all of the listed differentiable function types are of the same utility, some do provide functionality that's awkward to reproduce otherwise. I'll note that all of these function types are enabled via the same fundamental additions, so it's not that I'm adding special cases for each. For example, differentiation through tuple results comes along basically for free as part of this.

Of these, the most compelling use case might be mutating functions on structs that also return a value. At present, a value that is only needed in the local context has to be added as a property on the struct, bloating its size and tangent vector:

```swift
extension Model {
  mutating func adjustTimeStep() -> Float {
    // Perform calculations, adjust internal counter.
    return newTimeStep
  }
}
```

This, as with the other use cases, could potentially be reworked to return a struct instead of having multiple results. The call site is what really benefits from allowing multiple results, where you no longer need to use a bespoke struct and unpack its results. Even the less frequent multiple-inout case could simplify the call site of a function that adjusts multiple input properties in response to a single calculation.

When it comes to being able to specify which results to differentiate with respect to, that is definitely an enhancement that can be made on top of this, but I don't believe it is necessary for this to be immediately useful. In fact, this implementation provides the framework to support that: when the syntax exists to specify w.r.t. results, the few places where all result indices are enumerated and set in configs can be replaced by the w.r.t. result indices parsed upstream. All other logic should remain the same. Each of the use cases in the simulator code I'm working with would be served right now by using all differentiable results, and I don't have any that I can think of that would require not differentiating through specific results. This PR serves what I believe is the more common case, but does not preclude adding such restrictions in the future. It should hopefully lay a good foundation for that.
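A hypothetical before/after of that call-site difference (the `Model` members below are invented for illustration):

```swift
// With multiple semantic results (this PR): the locally needed value
// comes straight back from the mutating call.
var model = Model()
let newTimeStep = model.adjustTimeStep()

// Without it: the value must live on the struct, widening Model and its
// TangentVector, and the caller reads it back after the mutation.
model.adjustTimeStepStoringResult()
let workaroundTimeStep = model.lastTimeStep
```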
While I have no doubt about the potential benefits at call sites, this feature is a special case that's going into the compiler and, to my knowledge, doesn't have a precedent in other user-facing features. This adds maintenance cost, and it is unlikely to be considered in the initial Swift Evolution proposal due to being an optional special case. IMO its complexity is greater than the benefits it provides, and you can achieve its functionality today with just a little bit more workaround code.

In your test I find the following case:

```swift
@differentiable(reverse)
static func tupleResultsInt(_ x: Self) -> (Int, Self) {}
```

As discussed earlier, at the AST level there is currently no way to specify which output(s) to differentiate with respect to. This makes the above example fragile in the face of any retroactive conformance of `Int` to `Differentiable`.

Moreover, it's unspecified how the proposed type-checking rules deal with functions that return a nested tuple of differentiable types.

For the reasons above, I don't believe the support for tuple results is ready to go in.
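A sketch of that fragility concern (hypothetical, and retroactive conformances like this are generally discouraged):

```swift
import _Differentiation

// If some module made Int differentiable, the Int component of the
// (Int, Self) result would silently become a semantic result too,
// changing the inferred pullback type of tupleResultsInt with no
// change at its declaration.
extension Int: Differentiable {
  public typealias TangentVector = Int
  public mutating func move(by offset: Int) { self += offset }
}
```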
Regarding supporting result tuples: if you really need it, I might actually suggest trying an approach with precedents in the language, i.e. making tuples conform to `Differentiable`.

Regarding the proposed features other than supporting result tuples, would you mind writing up the rules used to infer the differential and pullback types from a function type? That would really help me better understand and review the proposed typing rules.
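As a rough sketch of the pullback shapes involved (inferred from this PR's behavior with every differentiable result treated as wrt; not an authoritative write-up of the inference rules):

```swift
// Plain result:                   (T) -> R
//   pullback:                     (R.TangentVector) -> T.TangentVector

// Mutating, no formal result:     (inout T) -> Void
//   pullback:                     (inout T.TangentVector) -> Void

// Mutating with a formal result:  (inout T) -> R
//   pullback:                     (R.TangentVector, inout T.TangentVector) -> Void

// Tuple of formal results:        (T) -> (R0, R1)
//   pullback:                     ((R0.TangentVector, R1.TangentVector)) -> T.TangentVector
```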
@swift-ci please test
Given what happened with #59467, I fear that this may break S4TF. It is an extremely large pull request developed over several months, with many opportunities to introduce new crashers. Since I don't have the skillset to compile a full Swift toolchain from this branch, I won't be able to test it until the PR is integrated into nightly snapshots. Then, we might have to revert the commits, which is not great to do. I'm asking if anyone will spare some time to prevent this from occurring again.

I have spent a lot of time making a well-maintained build script for S4TF. It runs for sure on macOS, although some modifications must be made before it can run on Linux. It hard-codes the toolchain IDs for the August 9, 2022 development snapshot, so you may need to tweak it a little before running. Also, please read over the entire thing before running, because that's best practice for security purposes. https://gist.github.com/philipturner/7aa063af04277d463c14168275878511

TensorFlow 2.9.1 could take 30 minutes to an hour (roughly speaking) to compile once; after that, Bazel will cache the C++ build products. I encourage using the

Alternatively, you could examine the S4TF template notebook on Colab, which uses the very last X10 binaries built by Google (TensorFlow 2.4.0). The XLA/TPU support has broken beyond repair (the old binaries fail, and downstreaming recent pytorch/xla commits produces something unusable), and I'm in no place to launch up-to-date binaries. Plus, my focus is on the new backend with no dependency on tensorflow/tensorflow. It's an evolution of https://colab.research.google.com/drive/1v3ZhraaHdAS2TGj03hE0cK-KRFzsqxO1?usp=sharing

If you have an x86 Mac, Google still has an x86 binary of X10 cached. That will save time and be more secure. I was not able to use it because my primary machine is ARM64, hence the need to compile TensorFlow from scratch. If you choose this route, translate the Jupyter magic commands into a Bash script (the commands are mostly documented here) and let me know you chose that route instead of the TF 2.9.1 binary.

Third option: if you already have some CI setup to test internal toolchains, it might spin off a Linux x86 toolchain. You could try uploading that to Google Drive, then fetching it in Colab (which has extremely fast network speed inside Google's servers). Then, wait 3 minutes to see if S4TF compiles inside the notebook.

Whatever the option, please compile 3 times:
…tiveFunctionLinearMapType(). Fails for certain non-wrt inouts.
…utoDiffDifferentialType(), rather than the hardcoded 0 index.
…ction(), adding some serialization tests.
…semantic results.
… index with all result indices in multiple places.
…th multiple results, and an activity analysis test representing code that had exposed that issue. Co-authored-by: Anton Korobeynikov <[email protected]>
@swift-ci please test
@philipturner - I can try to do a local build of Swift for TensorFlow to verify, but I would be very surprised if the changes here caused a problem with any of the code on your end. It should not change the behavior of existing code, aside from now supporting differentiation through functions with multiple results. In a large differentiable Swift codebase locally, it hasn't impacted functionality over a significant amount of testing.

This took a while to revisit because it had exposed some other issues elsewhere when you tried to differentiate through certain types of functions with multiple semantic results, and those cases have been resolved one-by-one since via other PRs or @asl's patch in the latest commit here. However, if there is some kind of interaction that causes problems in existing code, I'd very much like to be aware of it, so I'll see if I can't test this against the current Swift for TensorFlow code.
@BradLarson - It seems there’s a low chance of creating new compiler crashers that will break S4TF; however, it’s better safe than sorry. Compilation will be much simpler and faster once I complete the GPU-only backend. If you have already created a workflow for testing the repository, and subsequent builds take minutes, it might be good practice to keep those build products around. S4TF would remain a good validation test in addition to PassiveLogic’s code base, and you are free to test whenever you think it is appropriate.
Closing in favor of PR #66873.
PR #32629 added reverse-mode differentiation support for `apply` instructions with multiple active semantic results. This completes user-facing support for differentiable functions with multiple semantic results.

Previously, it was not possible to state that a function with multiple semantic results was `@differentiable`. This included mutating functions that also return a value, functions with multiple `inout` parameters, and functions that return a tuple of results. It is now possible to mark these functions as `@differentiable` and to supply custom pullbacks for them.

As a concrete practical benefit, it allows for the definition of one-step accessors for arrays that have constant-time performance on the backwards pass, instead of requiring O(N) zero-value generation for each access. This was highlighted as an advantage of Swift's mutable value semantics in the Swift for TensorFlow overview paper.
For example, the following code for a faster backwards pass involving array access is now possible as a result of this PR:
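This is the `access(at:)` accessor quoted in the review discussion above:

```swift
@inlinable
@differentiable(reverse)
mutating func access(at index: Int) -> Element {
  return self[index]
}
```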
(A `mutating get` subscript could also potentially be used for this.)

Resolves TF-1288.
This partially resolves TF-1038 by allowing differentiation through functions that return tuples of differentiable values (for example, `func multipleResult(_ x: Float) -> (Float, Float)`). However, all results are treated as being with-respect-to (wrt) due to the lack of a means, at present, for specifying result indices within the `@differentiable` attribute. It is anticipated that specification of with-respect-to results can come as a later enhancement.
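A hedged illustration of what "all results are wrt" means for that example (assuming the standard `pullback(at:of:)` entry point accepts a tuple seed under this PR):

```swift
import _Differentiation

@differentiable(reverse)
func multipleResult(_ x: Float) -> (Float, Float) {
  return (2 * x, x * x)
}

// Every result participates: the pullback seed carries one tangent per
// result, and their contributions to the input tangent are summed.
let pb = pullback(at: 3, of: multipleResult)
print(pb((1, 0)))  // 2.0  (from 2 * x)
print(pb((0, 1)))  // 6.0  (from x * x at x = 3)
print(pb((1, 1)))  // 8.0  (sum of both)
```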