Skip to content

[AutoDiff] Fix memory leaks caused by partial application handling. #25967

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jul 8, 2019

Conversation

rxwei
Copy link
Contributor

@rxwei rxwei commented Jul 4, 2019

In VJPEmitter, if the original function call has substitutions, we partial_apply it with no arguments to specialize it. This partial_apply is not being released. JVP and VJP are being specialized the same way, but they are not being released either.

To fix this, we release the @differentiable function returned by autodiff_function, which will release the original and the associated functions tht are to be filled in later altogether. If the original function does not have substitutions, we retain the original function to balance out the release of the @differentiable function that comes later. As a result, ADContext::promoteToDifferentiableFunction no longer needs to retain the associated functions.

Example where the original function has substitutions:

f' = partial_apply f<...>()
f_diff = autodiff_function f'
release_value f_diff

Example where the original function does not have substitutions:

retain_value f
f_diff = autodiff_function f
release_value f_diff

Note: This makes the autodiff_function folding optimization no longer able to detect the pattern, but it is necessary. We can rewrite the optimization later.

This should fix TF-621.

rxwei added 2 commits July 3, 2019 00:26
In VJPEmitter, if the original function call has substitutions, we `partial_apply` it with no arguments to specialize it. This `partial_apply` is not being released. JVP and VJP are being specialized the same way, but they are not being released either.

To fix this, we release the `@differentiable` function returned by `autodiff_function`, which will release the original and the associated functions tht are to be filled in later altogether. If the original function does not have substitutions, we retain the original function to balance out the release of the `@differentiable` function that comes later. As a result, `ADContext::promoteToDifferentiableFunction` no longer needs to retain the associated functions.

Example where the original function has substitutions:
```
f' = partial_apply f<...>()
f_diff = autodiff_function f'
release_value f_diff
```

Example where the original function does not have substitutions:
```
retain_value f
f_diff = autodiff_function f
release_value f_diff
```

Note: This makes the `autodiff_function` folding optimization no longer able to detect the pattern, but it is necessary. We can rewrite the optimization later.

This should fix [TF-621](https://bugs.swift.org/browse/TF-621).
@rxwei rxwei added the tensorflow This is for "tensorflow" branch PRs. label Jul 4, 2019
@rxwei rxwei requested a review from dan-zheng July 4, 2019 00:08
@rxwei
Copy link
Contributor Author

rxwei commented Jul 4, 2019

@swift-ci please test tensorflow

Copy link
Contributor

@dan-zheng dan-zheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! Curious what's wrong with ADContext::foldAutoDiffFunctionExtraction.

@saeta
Copy link
Contributor

saeta commented Jul 4, 2019

Nice work @rxwei !

@rxwei
Copy link
Contributor Author

rxwei commented Jul 4, 2019

Nice catch! Curious what's wrong with ADContext::foldAutoDiffFunctionExtraction.

For retain count balancing, the result of autodiff_function won't be trivially dead.

@gottesmm
Copy link
Contributor

gottesmm commented Jul 6, 2019

@rxwei have you guys taken my enable ownership verification everywhere? The lifetime of partial_apply should be validated by that. Or is this done later in the pipeline (after ossa stripping?)

@rxwei
Copy link
Contributor Author

rxwei commented Jul 8, 2019

@rxwei have you guys taken my enable ownership verification everywhere? The lifetime of partial_apply should be validated by that. Or is this done later in the pipeline (after ossa stripping?)

AD currently happens after ownership verification, and we plan to move it before so that things get verified.

@rxwei
Copy link
Contributor Author

rxwei commented Jul 8, 2019

@swift-ci please test tensorflow

1 similar comment
@rxwei
Copy link
Contributor Author

rxwei commented Jul 8, 2019

@swift-ci please test tensorflow

@@ -55,7 +55,8 @@ LeakCheckingTests.test("BasicVarLeakChecking") {
_ = model.gradient(at: x) { m, x in m.applied(to: x) }
}

testWithLeakChecking {
// TODO: Fix memory leak.
testWithLeakChecking(expectedLeakCount: 1) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is a expected memory leak caused by closure capture that got exposed after tidying up AD-associated functions' lifetime. This will be fixed later.

@rxwei rxwei merged commit 78f9a13 into swiftlang:tensorflow Jul 8, 2019
@rxwei rxwei deleted the leaks branch July 8, 2019 09:31
bartchr808 pushed a commit to bartchr808/swift that referenced this pull request Jul 8, 2019
…wiftlang#25967)

In VJPEmitter, if the original function call has substitutions, we `partial_apply` it with no arguments to specialize it. This `partial_apply` is not being released. JVP and VJP are being specialized the same way, but they are not being released either.

To fix this, we release the `@differentiable` function returned by `autodiff_function`, which will release the original and the associated functions tht are to be filled in later altogether. If the original function does not have substitutions, we retain the original function to balance out the release of the `@differentiable` function that comes later. As a result, `ADContext::promoteToDifferentiableFunction` no longer needs to retain the associated functions.

Example where the original function has substitutions:
```
f' = partial_apply f<...>()
f_diff = autodiff_function f'
release_value f_diff
```

Example where the original function does not have substitutions:
```
retain_value f
f_diff = autodiff_function f
release_value f_diff
```

Note: This makes the `autodiff_function` folding optimization no longer able to detect the pattern, but it is necessary. We can rewrite the optimization later.

This should fix [TF-621](https://bugs.swift.org/browse/TF-621).
pschuh pushed a commit that referenced this pull request Jul 9, 2019
…25967)

In VJPEmitter, if the original function call has substitutions, we `partial_apply` it with no arguments to specialize it. This `partial_apply` is not being released. JVP and VJP are being specialized the same way, but they are not being released either.

To fix this, we release the `@differentiable` function returned by `autodiff_function`, which will release the original and the associated functions tht are to be filled in later altogether. If the original function does not have substitutions, we retain the original function to balance out the release of the `@differentiable` function that comes later. As a result, `ADContext::promoteToDifferentiableFunction` no longer needs to retain the associated functions.

Example where the original function has substitutions:
```
f' = partial_apply f<...>()
f_diff = autodiff_function f'
release_value f_diff
```

Example where the original function does not have substitutions:
```
retain_value f
f_diff = autodiff_function f
release_value f_diff
```

Note: This makes the `autodiff_function` folding optimization no longer able to detect the pattern, but it is necessary. We can rewrite the optimization later.

This should fix [TF-621](https://bugs.swift.org/browse/TF-621).
bartchr808 added a commit to bartchr808/swift that referenced this pull request Jul 9, 2019
commit 5997b7d
Author: Bart Chrzaszcz <[email protected]>
Date:   Tue Jul 9 15:45:53 2019 +0100

    Remove activity info and get last tuple element for func.

commit 59b51b1
Author: Bart Chrzaszcz <[email protected]>
Date:   Tue Jul 9 15:05:21 2019 +0100

    Fix crasher by mapping result type into context of func.

commit fbdd2b9
Merge: e5baca9 2ce6f9c
Author: Bart Chrzaszcz <[email protected]>
Date:   Tue Jul 9 14:15:57 2019 +0100

    Merge branch 'tensorflow' into jvp-emitter

commit e5baca9
Author: Bart Chrzaszcz <[email protected]>
Date:   Mon Jul 8 09:54:19 2019 -0700

    Fix JVP result type.

commit e8cfcd7
Author: Bart Chrzaszcz <[email protected]>
Date:   Mon Jul 8 16:57:12 2019 +0100

    WIP: only generate JVP body if VJP not user defined.

commit 5ccab9f
Author: Bart Chrzaszcz <[email protected]>
Date:   Mon Jul 8 14:53:54 2019 +0100

    WIP

commit ee8818c
Merge: d06195f 8f2f81b
Author: Bart Chrzaszcz <[email protected]>
Date:   Mon Jul 8 12:50:08 2019 +0100

    Merge branch 'tensorflow' into jvp-emitter

commit 8f2f81b
Author: Richard Wei <[email protected]>
Date:   Mon Jul 8 10:31:48 2019 +0100

    [AutoDiff] Fix memory leaks caused by partial application handling. (swiftlang#25967)

    In VJPEmitter, if the original function call has substitutions, we `partial_apply` it with no arguments to specialize it. This `partial_apply` is not being released. JVP and VJP are being specialized the same way, but they are not being released either.

    To fix this, we release the `@differentiable` function returned by `autodiff_function`, which will release the original and the associated functions tht are to be filled in later altogether. If the original function does not have substitutions, we retain the original function to balance out the release of the `@differentiable` function that comes later. As a result, `ADContext::promoteToDifferentiableFunction` no longer needs to retain the associated functions.

    Example where the original function has substitutions:
    ```
    f' = partial_apply f<...>()
    f_diff = autodiff_function f'
    release_value f_diff
    ```

    Example where the original function does not have substitutions:
    ```
    retain_value f
    f_diff = autodiff_function f
    release_value f_diff
    ```

    Note: This makes the `autodiff_function` folding optimization no longer able to detect the pattern, but it is necessary. We can rewrite the optimization later.

    This should fix [TF-621](https://bugs.swift.org/browse/TF-621).

commit d06195f
Author: Bart Chrzaszcz <[email protected]>
Date:   Mon Jul 8 10:50:54 2019 +0100

    Generate undef return in basic block of differential.

commit 56119ca
Merge: 5b82a74 bad0d63
Author: Bart Chrzaszcz <[email protected]>
Date:   Mon Jul 8 01:40:47 2019 -0700

    Merge branch 'tensorflow' into jvp-emitter

commit bad0d63
Merge: 5a8eb79 8c4853a
Author: Bart Chrzaszcz <[email protected]>
Date:   Mon Jul 8 01:40:16 2019 -0700

    Merge branch 'tensorflow' of https://github.com/apple/swift into tensorflow

commit 5b82a74
Author: Bart Chrzaszcz <[email protected]>
Date:   Wed Jul 3 17:57:36 2019 -0700

    WIP

commit 9f25b77
Author: Bart Chrzaszcz <[email protected]>
Date:   Wed Jul 3 15:54:37 2019 -0700

    Get correct result type of JVP with undef.

commit cca858c
Author: Bart Chrzaszcz <[email protected]>
Date:   Wed Jul 3 15:15:36 2019 -0700

    Remove PullbackInfo.

commit e22221f
Author: Bart Chrzaszcz <[email protected]>
Date:   Wed Jul 3 13:55:29 2019 -0700

    Get basic sil creation working.

commit 1c00ee1
Merge: 78a2e41 868410c
Author: Bart Chrzaszcz <[email protected]>
Date:   Wed Jul 3 10:26:57 2019 -0700

    Merge branch 'tensorflow' into jvp-emitter

commit 78a2e41
Author: Bart Chrzaszcz <[email protected]>
Date:   Tue Jul 2 18:02:15 2019 -0700

    Init copying over JVP class.

commit 5a8eb79
Author: Dan Zheng <[email protected]>
Date:   Wed Jun 26 22:04:04 2019 -0700

    Update checkout for tensorflow-swift-apis. (swiftlang#25814)

    tensorflow/swift-apis@60f63e1

    `Tensor` now conforms to `PointwiseMultiplicative`.

commit 7ead8fd
Author: Dan Zheng <[email protected]>
Date:   Wed Jun 26 19:29:29 2019 -0700

    [AutoDiff] Partial fix for `@differentiable` function thunking crash. (swiftlang#25803)

    Partial fix for `@differentiable` function thunk with output opaque
    abstraction pattern. Avoid `AbstractionPattern` methods that crash for
    opaque patterns.

    Small step towards fixing TF-123.
    Reproducers no longer crash, SIL verification fails instead.
    Robust fix requires more work - see TF-123 for more details.

commit dd5207a
Author: Dan Zheng <[email protected]>
Date:   Wed Jun 26 15:42:09 2019 -0700

    [stdlib] Add PointwiseMultiplicative protocol. (swiftlang#25772)

    `PointwiseMultiplicative` represents a ring over multiplication with
    identity element "one" and a multiplicative inverse (reciprocal).

    Conform `Differentiable` synthesized struct types to
    `PointwiseMultiplicative` protocol if possible.

    Enables revamping optimizers, which requires division
    (i.e. multiplication with multiplicative inverse).

commit 7ea6534
Author: Gogul Balakrishnan <[email protected]>
Date:   Wed Jun 26 11:07:10 2019 -0700

    Move swift-apis to the latest commit. (swiftlang#25785)

commit c2fa0ac
Author: Nanjiang Jiang <[email protected]>
Date:   Wed Jun 26 02:15:55 2019 -0500

    [AutoDiff] Minor style fix (swiftlang#25780)

commit 2ae3c3a
Author: Dan Zheng <[email protected]>
Date:   Tue Jun 25 21:43:19 2019 -0700

    [build-script] Fix e11e127 to handle Xcode-based builds (swiftlang#25769) (swiftlang#25778)

    Xcode puts the bin/ and lib/ directories in a subdirectory based on
    configuration; account for this when copying over compiler-rt for
    non-host Apple platforms using one of the existing helpers in
    build-script-impl.

commit 1c5fe68
Merge: 5241118 b495095
Author: Bart Chrzaszcz <[email protected]>
Date:   Mon Jun 24 20:26:50 2019 -0700

    Merge branch 'tensorflow' of https://github.com/apple/swift into tensorflow

commit 5241118
Merge: 4eb86ab 71ba224
Author: Bart Chrzaszcz <[email protected]>
Date:   Sat Jun 22 16:42:44 2019 -0700

    Merge branch 'tensorflow' of https://github.com/apple/swift into tensorflow

commit 4eb86ab
Merge: 8dbf5d3 8996939
Author: Bart Chrzaszcz <[email protected]>
Date:   Fri Jun 21 10:36:36 2019 -0700

    Merge branch 'tensorflow' of https://github.com/apple/swift into tensorflow

commit 8dbf5d3
Author: Bart Chrzaszcz <[email protected]>
Date:   Tue Jun 18 18:40:02 2019 -0700

    Add tests.
@RahulBhalley
Copy link

@rxwei The memory leak issue still persists on Colab. Do you have any plans on when the new memory leak fixed version of Swift for TensorFlow will go live on Google Colab?

@rxwei
Copy link
Contributor Author

rxwei commented Jul 15, 2019

This fix eliminated the majority of memory leaks that got exposed in 0.4. We are working on an additional fix. You can expect it to be ready this week.

A workaround you can try for now is to avoid having any closure you differentiate capture variables if the variables are in created in a loop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tensorflow This is for "tensorflow" branch PRs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants