
Initialize normalizer with mean from first trajectory #4299


Merged
merged 7 commits into master on Aug 12, 2020

Conversation

@andrewcoh (Contributor) commented Aug 3, 2020

Proposed change(s)

To address MLA-1213. The issue is essentially that the difference between the initial mean (0) and the first trajectory's mean (~1800) is (1) too large and (2) skewed negative, so the resulting change in variance comes out negative and far too large in magnitude. With observations of ~1800 on Walker, we are essentially updating the variance with a term like 1800 * -x < -1, so the update drives the variance below 0, which leads to NaNs. The correct way to initialize this algorithm is to use the mean of the first samples as the initial mean.

The assumption is that samples are coming from a normal distribution, so 'seeing' a 0 and then all 1800s is, from the algorithm's perspective, an incredibly low-probability event.
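
For context, here is a minimal NumPy sketch of a batched Welford-style update with the first-batch initialization proposed here. It is illustrative only: the class and names are hypothetical, a simplification rather than the actual ML-Agents TF implementation.

    import numpy as np

    class RunningNormalizer:
        # Illustrative batched Welford-style normalizer (hypothetical names).
        def __init__(self):
            self.steps = 0
            self.mean = None
            self.m2 = None  # running sum of (x - old_mean) * (x - new_mean)

        def update(self, batch):
            if self.steps == 0:
                # The fix: seed the running mean with the first batch's mean
                # instead of 0, so the first update is not one huge jump away
                # from every sample.
                self.mean = batch.mean(axis=0)
                self.m2 = np.zeros_like(self.mean)
            new_steps = self.steps + batch.shape[0]
            delta_old = batch - self.mean  # x - old_mean
            new_mean = self.mean + delta_old.sum(axis=0) / new_steps
            delta_new = batch - new_mean  # x - new_mean
            self.m2 = self.m2 + (delta_old * delta_new).sum(axis=0)
            self.mean = new_mean
            self.steps = new_steps

        def normalize(self, x):
            var = self.m2 / max(self.steps, 1)
            return (x - self.mean) / np.sqrt(var + 1e-8)

With the old zero initialization, delta_old on the first Walker trajectory would be ~1800 for every sample; seeding the mean from the first batch makes it ~0.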

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

Types of change(s)

  • Bug fix
  • New feature
  • Code refactor
  • Breaking change
  • Documentation update
  • Other (please describe)

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

@andrewcoh changed the title from Increase initial normalize steps to 100 to Initialize normalizer with mean from first trajectory Aug 4, 2020
@andrewcoh requested review from ervteng and chriselion August 4, 2020 15:58

@chriselion (Contributor) left a comment
Looks good. A unit test would be great, but not a requirement if it's too hard to do (or too overfit to TensorFlow).


@ervteng (Contributor) left a comment

LGTM for a TF fix - let's separately figure out how to do it for PyTorch.

Note for posterity: if the first update has none of the large observations, it's still possible to get a NaN. But this should fix the issue for most cases.

@andrewcoh (Contributor, Author)

> LGTM for a TF fix - let's separately figure out how to do it for PyTorch.
>
> Note for posterity: if the first update has none of the large observations, it's still possible to get a NaN. But this should fix the issue for most cases.

To clarify, it's possible to get a NaN if the values in successive trajectories are really different and don't follow anything close to a normal distribution (an assumption of the Welford algorithm).
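
As a toy float32 illustration of that failure mode (hypothetical numbers; the phantom initial step at mean 0 mimics the old initialization):

    import numpy as np

    old_mean = np.float32(0.0)  # old initialization: mean starts at 0
    batch = np.array([1800.0, 1800.0, 5.0], dtype=np.float32)
    total_steps = np.float32(1 + batch.size)  # 1 phantom step + 3 samples
    new_mean = old_mean + (batch - old_mean).sum() / total_steps
    terms = (batch - old_mean) * (batch - new_mean)
    print(new_mean)  # ~901.25
    print(terms)     # [~1.6e6, ~1.6e6, -4481.25]; samples that fall between
                     # the two means contribute negative variance increments

In exact arithmetic these increments still sum to something non-negative, but at ~1800 magnitude the float32 cancellation between huge positive and negative terms can leave the running variance below zero, and the square root of a negative variance is the NaN.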

@@ -123,6 +123,9 @@ def make_fake_trajectory(
             memory=memory,
         )
         steps_list.append(experience)
+    obs = []
+    for _shape in observation_shapes:
+        obs.append(np.ones(_shape, dtype=np.float32))
@andrewcoh (Contributor, Author)

Done so that changing the last obs doesn't overwrite the second-to-last obs.
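
(For illustration, a hypothetical toy of the aliasing this avoids: if every step keeps references to the same obs arrays, an in-place edit meant for the last step also changes the earlier ones.)

    import numpy as np

    obs = [np.ones(3, dtype=np.float32)]
    steps = [obs, obs]      # both steps reference the same obs list
    steps[-1][0][0] = 99.0  # meant to modify only the last step
    print(steps[0][0][0])   # 99.0: the second-to-last step changed too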

@andrewcoh mentioned this pull request Aug 12, 2020
@andrewcoh merged commit 01cccd8 into master Aug 12, 2020
@andrewcoh mentioned this pull request Sep 8, 2020
@github-actions bot locked as resolved and limited conversation to collaborators Aug 12, 2021