Skip to content

don't save model twice on exit #4298

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

Conversation

chriselion
Copy link
Contributor

@chriselion chriselion commented Aug 3, 2020

Proposed change(s)

Followup from #4127

After that PR, exiting during training would save twice, once when the checkpoint is saved, and once for the "final" model.

This tweaks the logic so that RLTrainer.save_model() doesn't call the model when calling _checkpoint().

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

#4127

Types of change(s)

  • Bug fix
  • New feature
  • Code refactor
  • Breaking change
  • Documentation update
  • Other (please describe)

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

@dongruoping
Copy link
Contributor

This will result in inconsistency of all checkpoints having both .ckpt and .nn files except the last one only having .ckpt while its .nn file is with a different name under different path. This could work but might cause some confusion when someone look at the result folder or when writing scripts to track the checkpointing in cloud training.

@harperj
Copy link
Contributor

harperj commented Aug 4, 2020

Another way to handle this is to have the final model simply checkpoint the same as all of the others, and then copy the .nn file to its resulting location. What do you think about that as an alternative?

@chriselion
Copy link
Contributor Author

Yeah, I like the copying idea better. I'll try to get to it this week unless someone else gets to it first.

@chriselion chriselion closed this Aug 4, 2020
@chriselion chriselion deleted the dont-save-model-twice branch August 4, 2020 16:30
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 4, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants