MovingAverage: add dynamic decay and swap weights #1726
Conversation
def test_swap_weights(sequential_update):
    for sequential_update in [True, False]:
        ...
        strategy = tf.distribute.MirroredStrategy()
We can't use distributed strategies in the test suite yet. It's a work in progress; a first step is here: #1713
`swap_weights` has to be called in a cross-replica context.
Can I use OneDeviceStrategy here?
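(Context for readers: per the comment above, `swap_weights` must be invoked where TensorFlow reports a cross-replica context, i.e. not from a function passed to `strategy.run`. A small, self-contained probe of the two contexts using only standard `tf.distribute` APIs, unrelated to the PR's own code:)

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

def replica_fn():
    # Inside strategy.run we are in the replica context.
    tf.print("inside strategy.run, cross-replica?",
             tf.distribute.in_cross_replica_context())  # False

with strategy.scope():
    # Under the scope but outside strategy.run we are in the
    # cross-replica context, which is where swap_weights belongs.
    tf.print("under scope, cross-replica?",
             tf.distribute.in_cross_replica_context())  # True
    strategy.run(replica_fn)
```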
We're going to have distributed capabilities in the tests very soon. Could you take a look at #1770 and tell me if it would work for you?
It will enable true multi-GPU (virtual, at least) strategy testing, even with multiple pytest workers.
That should work for this test case. Do you know when those patches will be merged?
Two pull requests need to be reviewed to get there, so I'd expect a week or so. If it isn't merged by next week, I'll ping the other maintainers about it.
Since the blocking pull requests are still under review, would it be possible to merge this with either the test commented out or with the test using OneDeviceStrategy? I could immediately open a follow-up pull request that fixes the test so it can be merged once the blocking pull requests land.
Would either of these options work for you? We would like to get these changes in soon because they fix real problems with MovingAverage.
Force-pushed from 775c98d to addaa4a.
Adding dynamic decay to MovingAverage improves early accuracy. When using dynamic decay, decay starts at 0.1 and gradually increases up to `average_decay`.
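A minimal sketch of that kind of schedule, assuming the classic `(1 + step) / (10 + step)` ramp (which equals 0.1 at step 0); the exact expression used by the patch may differ:

```python
import tensorflow as tf

def dynamic_decay(step, average_decay):
    """Ramp the decay from 0.1 at step 0 up towards `average_decay`."""
    step = tf.cast(step, tf.float32)
    return tf.minimum(average_decay, (1.0 + step) / (10.0 + step))

# dynamic_decay(0, 0.999)    -> 0.1
# dynamic_decay(1000, 0.999) -> ~0.991, capped at 0.999 as step grows
```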
This patch makes it easier to swap the model weights and the MovingAverage weights before eval and swap them back after eval.
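In other words, the intended pattern around evaluation looks roughly like the helper below. `eval_fn` is a hypothetical callable, and `opt` is assumed to be a MovingAverage wrapper whose averaged copies already exist (i.e. some training has happened); only `swap_weights` itself comes from this PR:

```python
def evaluate_with_averaged_weights(model, opt, eval_fn):
    opt.swap_weights()        # model variables now hold the averaged weights
    results = eval_fn(model)  # run the usual eval loop on the averaged model
    opt.swap_weights()        # swap back: restore the raw training weights
    return results
```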
Force-pushed from addaa4a to d9629a0.
Since #1770 has been merged, I have updated the patches to use `@pytest.mark.with_device([tf.distribute.MirroredStrategy])`. Please take a look and let me know if there is anything else I should do.
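(A hypothetical shape of the updated test; only the marker itself is taken from the comment above, and the body is elided:)

```python
import pytest
import tensorflow as tf


@pytest.mark.with_device([tf.distribute.MirroredStrategy])
def test_swap_weights():
    ...  # body elided; the test infrastructure runs it under MirroredStrategy
```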
@Squadrick Is there anything you need from me before you can review this pull request? We would like to submit it soon. Thanks!
LGTM, thanks for the contribution!
@gabrieldemarmiesse Can you please merge this?
    a.assign_sub(b)
    return a


def swap(strategy, a, b):
Love this trick for swapping.
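(For readers skimming the diff: the trick being praised appears to be the classic in-place arithmetic swap, which exchanges two variables with add/subtract assignments and no temporary buffer. A single-variable sketch, without the per-replica plumbing of the actual implementation:)

```python
import tensorflow as tf

a = tf.Variable(1.0)
b = tf.Variable(2.0)

a.assign_add(b)   # a = a + b
b.assign(a - b)   # b = (a + b) - b   -> old a
a.assign_sub(b)   # a = (a + b) - old a -> old b

print(a.numpy(), b.numpy())  # 2.0 1.0
```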
Thanks @tf-marissaw for the pull request and thanks @Squadrick for the review!
* Add dynamic decay to MovingAverage. Adding dynamic decay to MovingAverage improves early accuracy. When using dynamic decay, decay starts at 0.1 and gradually increases up to `average_decay`.
* Add ability to swap weights to MovingAverage. This patch makes it easier to swap the model weights and the MovingAverage weights before eval and swap them back after eval.
Adds two new features to MovingAverage (a rough constructor sketch follows the list):
- Dynamic decay: helps the early accuracy of MovingAverage by starting the decay at 0.1 and gradually increasing it to `average_decay`.
- Swap weights: makes it easier to temporarily swap the model weights and the average weights during eval and to swap them back afterwards.
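A rough constructor sketch. The `dynamic_decay` argument name is an assumption based on the description above (the PR may expose the feature differently); `average_decay` and `swap_weights` are as discussed in this thread:

```python
import tensorflow as tf
import tensorflow_addons as tfa

opt = tfa.optimizers.MovingAverage(
    tf.keras.optimizers.SGD(0.01),
    average_decay=0.999,   # ceiling the decay ramps up to
    dynamic_decay=True,    # assumed flag name: start at 0.1, grow to average_decay
)
# After training, opt.swap_weights() exchanges model and averaged weights for eval.
```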