[AutoDiff] [TF-1288] Supporting differentiable functions with multiple semantic results #38781
Conversation
@swift-ci Please test.
Exciting!
```cpp
// This uses the logic from getSubsetParameters(), only operating over all
// parameter indices and looking for non-wrt indices.
SmallVector<AnyFunctionType *, 2> curryLevels;
// An inlined version of unwrapCurryLevels().
```
Note: `unwrapCurryLevels()` is hacky – it would be good to refactor it at some point.

Without going too deep into implementation details: we only need to handle one potential "curry level" for method declarations – no need for loops or confusing ad-hoc terminology.
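For context, a small illustration of the single "curry level" at play for methods (an example written for this note, not code from the PR):

```swift
struct Scale {
  var factor: Float
  func apply(_ x: Float) -> Float { factor * x }
}

// An unapplied method reference has a curried function type:
// one outer level binding Self, then the formal parameter list.
let curried: (Scale) -> (Float) -> Float = Scale.apply
let applied: (Float) -> Float = curried(Scale(factor: 2))
print(applied(3))  // 6.0
```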
```cpp
for (auto i : range(originalResults.size())) {
  auto originalResult = originalResults[i];
  auto originalResultType = originalResult.type;
  // Voids currently have a defined tangent vector, so ignore them.
```
Could you please clarify what this comment means exactly?

```cpp
// Voids currently have a defined tangent vector, so ignore them.
```
I'm specifically thinking of this test case: https://github.com/apple/swift/blob/72866b6dd9e6024ab97469cafb56333800bbeb67/test/AutoDiff/Sema/derivative_attr_type_checking.swift#L867 involving an `inout Void`. `getAutoDiffTangentSpace()` returns an empty tuple for `Void` (https://github.com/apple/swift/blob/main/lib/AST/Type.cpp#L5359) rather than `None`, so we can't rely on the `if (!resultTan)` check to filter them out in that one case. I didn't want to alter the behavior of `getAutoDiffTangentSpace()` for this one edge case, so I opted for detecting `Void`s as a special case.
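For reference, a sketch of the shape of declaration involved (a reduction written for this note, not the exact linked test):

```swift
// Void's tangent space is the empty tuple type (), not None, so without
// the special case this `inout Void` parameter would be counted as a
// semantic result with a (trivial) tangent.
func inoutVoid(_ x: Float, _ void: inout Void) -> Float {
  return x * x
}
```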
```
@@ -121,6 +121,32 @@ SimpleMathTests.test("MultipleResults") {
  expectEqual((4, 3), gradient(at: 3, 4, of: multiply_swapAndReturnProduct))
}

// Test function with multiple `inout` parameters and a custom pullback.
```
Did you mention that functions returning direct tuples of `Differentiable` values can also be differentiated? If so, could you please add a test case?
Great suggestion, I added tests for returning tuples, and that uncovered a result type issue I hadn't considered. It took me a little bit to track it down, but the tests and a fix for this have been added in the latest commit.
Sorry I'm late to the review. A high-level question: could we convert these JIRA tickets to SRs?
Sounds good to me.
@rxwei - Porter had converted some of these over (I think he said that he had tried to move TF-700 and above), but when he and I went to the TF-* tickets today, none had the option to move. I also don't even have the option to comment on these. Did the Swift for TensorFlow category get archived or placed in a read-only state?
Ah yes, I believe that's the case. IMO we should probably clone these to new SRs.
From the example you gave in the PR description, I'm not convinced this is a good motivation for allowing multiple semantic results.
```swift
@inlinable
@differentiable(reverse)
mutating func access(at index: Int) -> Element {
  return self[index]
}
```

The above function is a mutating function that doesn't actually mutate anything. The premise of the proposed change seems to say that in order to make a fast derivative for array subscript, the library author would have to provide a fake `mutating` version of it so that it can have a mutating pullback. I agree with the goal and the priority of optimizing array subscripts, but the proposed solution doesn't really make sense to me.
To allow functions like array subscript to have an efficient derivative, I believe we should allow functions to have derivatives with an "inout-optimized" pullback.
```swift
func foo(_ x: T) -> R

@derivative(of: foo(_:))
func dFoo(_ x: T) -> (value: R, pullback: (R, inout T) -> Void)
```
The emitted library interface should include the fact that this derivative is an inout-optimized one. The differentiation transform should be able to pick up the inout-optimized derivative and just use that in generated derivatives.
To make everything else work with inout-optimized derivatives, e.g. forming `@differentiable(reverse)` closures, you just need to thunk them up to have a standard pullback that returns results formally.
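A minimal sketch of such a thunk, with hypothetical names (an illustration of the idea, not API from this PR):

```swift
import _Differentiation

// Wraps an "inout-optimized" pullback, which accumulates into an
// existing tangent buffer, into the standard shape that returns the
// input tangent formally.
func thunkToStandardPullback<T: Differentiable, R: Differentiable>(
  _ optimized: @escaping (R.TangentVector, inout T.TangentVector) -> Void
) -> (R.TangentVector) -> T.TangentVector {
  return { seed in
    var tangent = T.TangentVector.zero
    optimized(seed, &tangent)
    return tangent
  }
}
```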
> functions with multiple `inout` parameters

I'm interested in seeing some compelling use cases of this.

> functions that return a tuple of results

We had support for this very early on but removed it. I'm not sure adding a compiler special case is a good idea, because 1) it will fall out naturally when tuples can be extended to conform to protocols and 2) you can already work around this by defining a struct.
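The struct workaround reads something like this (a sketch with invented names; the memberwise `Differentiable` conformance is synthesized):

```swift
import _Differentiation

// Instead of (Float) -> (Float, Float), wrap the results in a struct.
struct TwoResults: Differentiable {
  var first: Float
  var second: Float
}

@differentiable(reverse)
func multipleResults(_ x: Float) -> TwoResults {
  return TwoResults(first: 2 * x, second: x * x)
}
```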
Plus, there are bigger issues with allowing multiple semantic results which will require opening up many axes of customization points. For example, in a function with two inout parameters, `(inout T0, inout T1) -> Void`, what if you wanted to differentiate w.r.t. both inputs but only w.r.t. the `T1` output? These would require additional syntax to support, and therefore I don't believe we should do multiple semantic results prematurely.
@rxwei - Thank you for the detailed comments. I absolutely agree that the array accessor example isn't in the optimal form that we'd want. I'd thought that matching an inout tangent vector to a non-inout/non-mutating function would be more controversial, so I chose this as a first step towards realizing some of the performance benefits. It sounds like being able to associate "inout-optimized" pullbacks with functions that don't use inouts themselves would be preferred for this case, so I'll start independently working towards enabling that. It'd be ideal to have Array in the standard library with this kind of optimized pullback by default.

However, beyond that motivating example in terms of performance, I do see many cases in a simulator codebase I'm currently working on that could be cleaned up with support for multiple results. While not all of the listed differentiable function types are of the same utility, some do provide functionality that's awkward to reproduce otherwise. I'll note that all of these function types are enabled via the same fundamental additions, so it's not that I'm adding special cases for each. For example, differentiation through tuple results comes along basically for free as part of this.

Of these, the most compelling use case might be mutating functions on structs that also return a value. At present, a value that is only needed in the local context has to be added as a property on the struct, bloating its size and tangent vector:

```swift
extension Model {
  mutating func adjustTimeStep() -> Float {
    // Perform calculations, adjust internal counter.
    return newTimeStep
  }
}
```

This, as with the other use cases, could potentially be reworked to return a struct instead of having multiple results. The call site is what really benefits from allowing multiple results, where you no longer need to use a bespoke struct and unpack its results. Even the less frequent multiple-inout case could simplify the call site of a function that adjusts multiple input properties in response to a single calculation.

When it comes to being able to specify which results to differentiate with respect to, that is definitely an enhancement that can be made on top of this, but I don't believe it is necessary for this to be immediately useful. In fact, this implementation provides the framework to support that: when the syntax exists to specify w.r.t. results, the few places where all result indices are enumerated and set in configs can be replaced by the w.r.t. result indices parsed upstream. All other logic should remain the same. Each of the use cases in the simulator code I'm working with would be served right now by using all differentiable results, and I don't have any that I can think of that would require not differentiating through specific results. This PR serves what I believe is the more common case, but does not preclude adding such restrictions in the future. It should hopefully lay a good foundation for that.
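A hypothetical before/after of that call-site difference (the `Model` members below are invented for illustration):

```swift
// With multiple semantic results (this PR): the locally needed value
// comes straight back from the mutating call.
var model = Model()
let newTimeStep = model.adjustTimeStep()

// Without it: the value must live on the struct, widening Model and its
// TangentVector, and the caller reads it back after the mutation.
model.adjustTimeStepStoringResult()
let workaroundTimeStep = model.lastTimeStep
```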
While I have no doubt about the potential benefits at call sites, this feature is a special case that's going into the compiler and, to my knowledge, doesn't have a precedent in other user-facing features. This adds maintenance cost, and it is unlikely to be considered in the initial Swift Evolution proposal due to being an optional special case. IMO its complexity is greater than the benefits it provides, and you can achieve its functionality today with just a little bit more workaround code.

In your test I find the following case:

```swift
@differentiable(reverse)
static func tupleResultsInt(_ x: Self) -> (Int, Self) {}
```

As discussed earlier, at the AST level there is currently no way to specify which output(s) to differentiate with respect to. This makes the above example fragile in the face of any retroactive conformance of `Int` to `Differentiable`.

Moreover, it's unspecified how the proposed type-checking rules deal with functions that return a nested tuple of differentiable types.

For the reasons above, I don't believe the support for tuple results is ready to go in.
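A sketch of that fragility concern (hypothetical, and retroactive conformances like this are generally discouraged):

```swift
import _Differentiation

// If some module made Int differentiable, the Int component of the
// (Int, Self) result would silently become a semantic result too,
// changing the inferred pullback type of tupleResultsInt with no
// change at its declaration.
extension Int: Differentiable {
  public typealias TangentVector = Int
  public mutating func move(by offset: Int) { self += offset }
}
```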
Regarding supporting result tuples: if you really need it, I might actually suggest trying an approach with precedents in the language, i.e. making tuples conform to `Differentiable`.

Regarding the proposed features other than supporting result tuples, would you mind writing up the rules used to infer the differential and pullback types from a function type? That would really help me better understand and review the proposed typing rules.
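As a rough sketch of the pullback shapes involved (inferred from this PR's behavior with every differentiable result treated as wrt; not an authoritative write-up of the inference rules):

```swift
// Plain result:                   (T) -> R
//   pullback:                     (R.TangentVector) -> T.TangentVector

// Mutating, no formal result:     (inout T) -> Void
//   pullback:                     (inout T.TangentVector) -> Void

// Mutating with a formal result:  (inout T) -> R
//   pullback:                     (R.TangentVector, inout T.TangentVector) -> Void

// Tuple of formal results:        (T) -> (R0, R1)
//   pullback:                     ((R0.TangentVector, R1.TangentVector)) -> T.TangentVector
```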
@swift-ci please test
Given what happened with #59467, I fear that this may break S4TF. It is an extremely large pull request developed over several months, with many opportunities to introduce new crashers. Since I don't have the skillset to compile a full Swift toolchain from this branch, I won't be able to test it until the PR is integrated into nightly snapshots. Then, we might have to revert the commits, which is not great to do. I'm asking if anyone will spare some time to prevent this from occurring again.

I have spent a lot of time making a well-maintained build script for S4TF. It runs for sure on macOS, although some modifications must be made before it can run on Linux. It hard-codes the toolchain IDs for the August 9, 2022 development snapshot, so you may need to tweak it a little before running. Also, please read over the entire thing before running, because that's best practice for security purposes. https://gist.github.com/philipturner/7aa063af04277d463c14168275878511

TensorFlow 2.9.1 could take 30 minutes to an hour (roughly speaking) to compile once; after that, Bazel will cache the C++ build products. I encourage using the

Alternatively, you could examine the S4TF template notebook on Colab, which uses the very last X10 binaries built by Google (TensorFlow 2.4.0). The XLA/TPU support has broken beyond repair (the old binaries fail, and downstreaming recent pytorch/xla commits produces something unusable), and I'm in no place to launch up-to-date binaries. Plus, my focus is on the new backend with no dependency on tensorflow/tensorflow. It's an evolution of https://colab.research.google.com/drive/1v3ZhraaHdAS2TGj03hE0cK-KRFzsqxO1?usp=sharing

If you have an x86 Mac, Google still has an x86 binary of X10 cached. That will save time and be more secure. I was not able to use it because my primary machine is ARM64, hence the need to compile TensorFlow from scratch. If you choose this route, translate the Jupyter magic commands into a Bash script (the commands are mostly documented here) and let me know you chose that route instead of the TF 2.9.1 binary.

Third option: if you already have some CI setup to test internal toolchains, it might spin off a Linux x86 toolchain. You could try uploading that to Google Drive, then fetching it in Colab (which has extremely fast network speed inside Google's servers). Then, wait 3 minutes to see if S4TF compiles inside the notebook.

Whatever the option, please compile 3 times:
…tiveFunctionLinearMapType(). Fails for certain non-wrt inouts.
…utoDiffDifferentialType(), rather than the hardcoded 0 index.
…ction(), adding some serialization tests.
…semantic results.
… index with all result indices in multiple places.
…th multiple results, and an activity analysis test representing code that had exposed that issue. Co-authored-by: Anton Korobeynikov <[email protected]>
@swift-ci please test
@philipturner - I can try to do a local build of Swift for TensorFlow to verify, but I would be very surprised if the changes here caused a problem with any of the code on your end. It should not change the behavior of existing code, aside from now supporting differentiation through functions with multiple results. In a large differentiable Swift codebase locally, it hasn't impacted functionality over a significant amount of testing.

This took a while to revisit because it had exposed some other issues elsewhere when you tried to differentiate through certain types of functions with multiple semantic results, and those cases have been resolved one-by-one since via other PRs or @asl's patch in the latest commit here. However, if there is some kind of interaction that causes problems in existing code, I'd very much like to be aware of it, so I'll see if I can't test this against the current Swift for TensorFlow code.
@BradLarson - It seems there’s a low chance of creating new compiler crashers that will break S4TF; however, it’s better safe than sorry. Compilation will be much simpler and faster once I complete the GPU-only backend. If you have already created a workflow for testing the repository, and subsequent builds take minutes, it might be good practice to keep those build products around. S4TF would remain a good validation test in addition to PassiveLogic’s code base, and you are free to test whenever you think it is appropriate.
Closing in favor of PR #66873.
PR #32629 added reverse-mode differentiation support for `apply` instructions with multiple active semantic results. This completes user-facing support for differentiable functions with multiple semantic results.

Previously, it was not possible to state that a function with multiple semantic results was `@differentiable`. This included mutating functions that also return a value, functions with multiple `inout` parameters, and functions that return a tuple of results. It is now possible to mark these functions as `@differentiable` and to supply custom pullbacks for them.

As a concrete practical benefit, it allows for the definition of one-step accessors for arrays that have constant-time performance on the backwards pass, instead of requiring O(N) zero-value generation for each access. This was highlighted as an advantage of Swift's mutable value semantics in the Swift for TensorFlow overview paper.
For example, the following code for a faster backwards pass involving array access is now possible as a result of this PR:
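This is the `access(at:)` accessor quoted in the review discussion above:

```swift
@inlinable
@differentiable(reverse)
mutating func access(at index: Int) -> Element {
  return self[index]
}
```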
(A `mutating get` subscript could also potentially be used for this.)

Resolves TF-1288.
This partially resolves TF-1038 by allowing differentiation through functions that return tuples of differentiable values (for example, `func multipleResult(_ x: Float) -> (Float, Float)`). However, all results are treated as being with-respect-to (wrt) due to the lack of a means, at present, for specifying result indices within the `@differentiable` attribute. It is anticipated that specification of with-respect-to results can come as a later enhancement.
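A hedged illustration of what "all results are wrt" means for that example (assuming the standard `pullback(at:of:)` entry point accepts a tuple seed under this PR):

```swift
import _Differentiation

@differentiable(reverse)
func multipleResult(_ x: Float) -> (Float, Float) {
  return (2 * x, x * x)
}

// Every result participates: the pullback seed carries one tangent per
// result, and their contributions to the input tangent are summed.
let pb = pullback(at: 3, of: multipleResult)
print(pb((1, 0)))  // 2.0  (from 2 * x)
print(pb((0, 1)))  // 6.0  (from x * x at x = 3)
print(pb((1, 1)))  // 8.0  (sum of both)
```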