Initialize normalizer with mean from first trajectory #4299
Conversation
Looks good. A unit test would be great, but not a requirement if it's too hard to do (or too overfit to TensorFlow).
LGTM for a TF fix - let's separately figure out how to do it for PyTorch.
Note for posterity: if the first update has none of the large observations, it's still possible to get a NaN. But this should fix the issue for most cases.
To clarify, it's possible to get a NaN if the values in successive trajectories are very different and don't follow anything close to a normal distribution (an assumption of Welford's algorithm).
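For reference, this is the textbook per-sample form of the Welford update the reviewers are referring to (a sketch for illustration; the trainer applies a batched variant, but the recurrence has the same shape):

```python
def welford_step(count, mean, m2, x):
    """One step of Welford's online mean/variance algorithm.

    m2 accumulates the sum of squared deviations from the running mean;
    sample variance is m2 / (count - 1).
    """
    count += 1
    delta = x - mean    # deviation from the old mean
    mean += delta / count
    delta2 = x - mean   # deviation from the new mean
    m2 += delta * delta2
    return count, mean, m2

count, mean, m2 = 0, 0.0, 0.0
for x in [1799.0, 1800.0, 1801.0]:
    count, mean, m2 = welford_step(count, mean, m2, x)
print(mean)              # 1800.0
print(m2 / (count - 1))  # sample variance: 1.0
```

Mathematically each `delta * delta2` increment is non-negative, but when the running mean starts far from the data (0 vs. ~1800 here) both factors become huge, which is what makes the float32 batched version fragile.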
@@ -123,6 +123,9 @@ def make_fake_trajectory(
             memory=memory,
         )
         steps_list.append(experience)
+    obs = []
+    for _shape in observation_shapes:
+        obs.append(np.ones(_shape, dtype=np.float32))
Done so that changing the last obs doesn't overwrite the second-to-last obs.
Proposed change(s)
To address MLA-1213. The issue is essentially that the difference between the initial average (0) and the first trajectory's average (~1800) is (1) too large and (2) skewed negative, i.e. the resulting change in variance is negative and too large in magnitude. With observations of ~1800 on Walker, we are essentially updating the variance with 1800 * -x < -1, so the change drives the variance below 0, leading to NaNs. The correct way to initialize this algorithm is to use the mean of the first samples as the initial mean.
The assumption is that samples are coming from a normal distribution, so 'seeing' a 0 and then all 1800s is, from the algo's perspective, an incredibly low probability event.
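As a rough illustration of the fix (a NumPy sketch, not the actual ML-Agents TensorFlow graph; the class and method names here are made up), a batched Welford-style normalizer that seeds its running mean from the first batch it sees:

```python
import numpy as np

class RunningNormalizer:
    """Batched Welford-style running mean/variance (illustrative sketch)."""

    def __init__(self):
        self.steps = 1        # ML-Agents-style initial step count
        self.mean = None      # seeded lazily from the first batch (the fix)
        self.var_sum = 0.0    # running sum of squared deviations

    def update(self, batch: np.ndarray) -> None:
        if self.mean is None:
            # The fix: initialize the running mean from the first batch,
            # so the first deltas are ~O(batch spread) instead of ~1800.
            self.mean = batch.mean()
        new_steps = self.steps + len(batch)
        delta_old = batch - self.mean            # x - old_mean
        new_mean = self.mean + delta_old.sum() / new_steps
        delta_new = batch - new_mean             # x - new_mean
        # With a zero initial mean and observations near 1800, both factors
        # below are huge; in float32 the tiny true variance can be lost to
        # rounding and the sum can come out negative (per the description
        # above). Seeding the mean keeps both factors small.
        self.var_sum += np.sum(delta_new * delta_old)
        self.steps = new_steps

    def normalize(self, x: np.ndarray) -> np.ndarray:
        var = max(self.var_sum / (self.steps - 1), 1e-8)
        return (x - self.mean) / np.sqrt(var)

# First trajectory: all observations near 1800, as on Walker.
rng = np.random.default_rng(0)
norm = RunningNormalizer()
norm.update(1800.0 + rng.normal(0.0, 1.0, size=64))
print(norm.var_sum >= 0)  # True: variance stays non-negative, no NaN
```

Because the mean is seeded from the first batch, the first update reduces to summing squared deviations within that batch, which is non-negative by construction.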
Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)
Types of change(s)
Checklist
Other comments