[AutoDiff] Fix memory leaks caused by partial application handling. #25967
In VJPEmitter, if the original function call has substitutions, we `partial_apply` it with no arguments to specialize it. This `partial_apply` is never released. The JVP and VJP are specialized the same way, and they are not released either.

To fix this, we release the `@differentiable` function returned by `autodiff_function`, which releases the original function and the associated functions (to be filled in later) all at once. If the original function does not have substitutions, we retain the original function to balance out the later release of the `@differentiable` function. As a result, `ADContext::promoteToDifferentiableFunction` no longer needs to retain the associated functions.

Example where the original function has substitutions:

```
f' = partial_apply f<...>()
f_diff = autodiff_function f'
release_value f_diff
```

Example where the original function does not have substitutions:

```
retain_value f
f_diff = autodiff_function f
release_value f_diff
```

Note: This makes the `autodiff_function` folding optimization no longer able to detect the pattern, but the change is necessary. We can rewrite the optimization later.

This should fix [TF-621](https://bugs.swift.org/browse/TF-621).
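For contrast, a sketch of the pre-fix lowering (instruction names as in the `tensorflow` branch), where nothing balances the specialized `partial_apply`:

```sil
// Before this change (sketch): the zero-argument partial_apply that
// specializes `f` is created but never released, so it leaks.
f' = partial_apply f<...>()
f_diff = autodiff_function f'
// ... uses of f_diff ...
// no matching release_value — the specialized thunk leaks
```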
@swift-ci please test tensorflow
Nice catch! Curious what's wrong with `ADContext::foldAutoDiffFunctionExtraction`.
Nice work @rxwei !
For retain count balancing, the result of
@rxwei have you guys taken my change to enable ownership verification everywhere? The lifetime of `partial_apply` should be validated by that. Or is this done later in the pipeline (after OSSA stripping)?
AD currently happens after ownership verification, and we plan to move it before so that things get verified. |
@swift-ci please test tensorflow
```diff
@@ -55,7 +55,8 @@ LeakCheckingTests.test("BasicVarLeakChecking") {
   _ = model.gradient(at: x) { m, x in m.applied(to: x) }
 }
 
-testWithLeakChecking {
+// TODO: Fix memory leak.
+testWithLeakChecking(expectedLeakCount: 1) {
```
Note: This is an expected memory leak caused by closure capture, exposed after tidying up the lifetimes of AD-associated functions. It will be fixed later.
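The leak class mentioned here comes from closure capture extending object lifetimes. A minimal, runnable Swift sketch (unrelated to the AD machinery itself; `Tracker` is an illustrative type) of how a captured value stays alive as long as the closure does:

```swift
// Illustrative sketch: a closure that captures a class instance
// keeps it alive until the closure itself is released.
final class Tracker {
    static var liveCount = 0
    init() { Tracker.liveCount += 1 }
    deinit { Tracker.liveCount -= 1 }
}

var retainedClosures: [() -> Void] = []
do {
    let t = Tracker()
    // The closure captures `t` strongly, so `t` outlives this scope.
    retainedClosures.append { _ = t }
}
assert(Tracker.liveCount == 1)  // still alive: captured by the closure
retainedClosures.removeAll()
assert(Tracker.liveCount == 0)  // released once the closure is gone
```

If the compiler-emitted retains and releases around such a closure are unbalanced, the instance is never freed, which is exactly what the leak checker counts.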
Commits referencing this PR (branch `jvp-emitter`):

- `5997b7d` (Bart Chrzaszcz, Jul 9 2019): Remove activity info and get last tuple element for func.
- `59b51b1` (Bart Chrzaszcz): Fix crasher by mapping result type into context of func.
- `fbdd2b9` (Bart Chrzaszcz): Merge branch 'tensorflow' into jvp-emitter.
- `e5baca9` (Bart Chrzaszcz): Fix JVP result type.
- `e8cfcd7` (Bart Chrzaszcz): WIP: only generate JVP body if VJP not user defined.
- `5ccab9f` (Bart Chrzaszcz): WIP.
- `ee8818c` (Bart Chrzaszcz): Merge branch 'tensorflow' into jvp-emitter.
- `8f2f81b` (Richard Wei): [AutoDiff] Fix memory leaks caused by partial application handling. (swiftlang#25967)
- `d06195f` (Bart Chrzaszcz): Generate undef return in basic block of differential.
- `56119ca` (Bart Chrzaszcz): Merge branch 'tensorflow' into jvp-emitter.
- `bad0d63` (Bart Chrzaszcz): Merge branch 'tensorflow' of https://github.com/apple/swift into tensorflow.
- `5b82a74` (Bart Chrzaszcz): WIP.
- `9f25b77` (Bart Chrzaszcz): Get correct result type of JVP with undef.
- `cca858c` (Bart Chrzaszcz): Remove PullbackInfo.
- `e22221f` (Bart Chrzaszcz): Get basic sil creation working.
- `1c00ee1` (Bart Chrzaszcz): Merge branch 'tensorflow' into jvp-emitter.
- `78a2e41` (Bart Chrzaszcz): Init copying over JVP class.
- `5a8eb79` (Dan Zheng): Update checkout for tensorflow-swift-apis. (swiftlang#25814) `Tensor` now conforms to `PointwiseMultiplicative`.
- `7ead8fd` (Dan Zheng): [AutoDiff] Partial fix for `@differentiable` function thunking crash. (swiftlang#25803) Avoids `AbstractionPattern` methods that crash for opaque patterns; small step towards fixing TF-123.
- `dd5207a` (Dan Zheng): [stdlib] Add PointwiseMultiplicative protocol. (swiftlang#25772) A ring over multiplication with identity element "one" and a multiplicative inverse (reciprocal); enables revamping optimizers.
- `7ea6534` (Gogul Balakrishnan): Move swift-apis to the latest commit. (swiftlang#25785)
- `c2fa0ac` (Nanjiang Jiang): [AutoDiff] Minor style fix. (swiftlang#25780)
- `2ae3c3a` (Dan Zheng): [build-script] Fix e11e127 to handle Xcode-based builds. (swiftlang#25769) (swiftlang#25778)
- `1c5fe68`, `5241118`, `4eb86ab` (Bart Chrzaszcz): Merge branch 'tensorflow' of https://github.com/apple/swift into tensorflow.
- `8dbf5d3` (Bart Chrzaszcz): Add tests.
@rxwei The memory leak issue still persists on Colab. Do you have any plans for when the new memory-leak-fixed version of Swift for TensorFlow will go live on Google Colab?
This fix eliminated the majority of memory leaks that got exposed in 0.4. We are working on an additional fix; you can expect it to be ready this week. A workaround you can try for now is to avoid having any closure you differentiate capture variables that are created in a loop.
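A sketch of the shape of that workaround. `fakeGradient` here is a hypothetical stand-in for the real `gradient(at:in:)` API, used only to make the code runnable; the point is where the captured variable is created, not the math:

```swift
// Hypothetical stand-in for gradient(at:in:): a central-difference
// numerical derivative, so the sketch runs without TensorFlow.
func fakeGradient(at x: Double, in f: (Double) -> Double) -> Double {
    let eps = 1e-6
    return (f(x + eps) - f(x - eps)) / (2 * eps)
}

// Leak-prone shape: the differentiated closure captures `scale`,
// a variable created fresh on every loop iteration.
var leaky: [Double] = []
for i in 1...3 {
    let scale = Double(i)
    leaky.append(fakeGradient(at: 2.0) { x in scale * x })
}

// Workaround shape: create the captured variable once, outside the
// loop, so the closure does not capture loop-local state.
let scale = 2.0
var fixed: [Double] = []
for _ in 1...3 {
    fixed.append(fakeGradient(at: 2.0) { x in scale * x })
}
```

The derivative of `scale * x` with respect to `x` is `scale`, so `leaky` holds roughly `[1, 2, 3]` and `fixed` roughly `[2, 2, 2]`; only the second shape avoids the leak-prone capture pattern.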