

chriselion
Contributor

@chriselion commented Aug 5, 2020

Proposed change(s)

Follow-up to #4127; retry of #4298.

After that PR, exiting during training would save the model twice: once when the checkpoint is saved, and again for the "final" model.

This bypasses the "final" save and instead copies the checkpointed .nn file (and the .onnx file too, if one exists).
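A minimal sketch of the copy-instead-of-resave step, assuming the .onnx export sits next to the checkpointed .nn file with the same path stem. `copy_checkpoint_as_final` is a hypothetical helper for illustration, not the ML-Agents function changed in this PR.

```python
import logging
import os
import shutil

logger = logging.getLogger(__name__)


def copy_checkpoint_as_final(checkpoint_nn_path: str, final_nn_path: str) -> list:
    """Copy a checkpointed .nn file (and its .onnx sibling, if present) to the final path.

    Hypothetical helper: assumes the .onnx file shares the .nn file's path stem.
    Returns the list of destination paths that were actually written.
    """
    written = []
    # The .nn checkpoint is expected to exist, so any error here propagates.
    shutil.copyfile(checkpoint_nn_path, final_nn_path)
    written.append(final_nn_path)
    logger.info(f"Copied {checkpoint_nn_path} to {final_nn_path}.")

    # The .onnx file may be missing, so its copy is treated as optional.
    source_onnx_path = os.path.splitext(checkpoint_nn_path)[0] + ".onnx"
    destination_onnx_path = os.path.splitext(final_nn_path)[0] + ".onnx"
    try:
        shutil.copyfile(source_onnx_path, destination_onnx_path)
        written.append(destination_onnx_path)
        logger.info(f"Copied {source_onnx_path} to {destination_onnx_path}.")
    except OSError:
        # Covers FileNotFoundError (no .onnx file) and PermissionError.
        logger.info(f"Skipped optional copy of {source_onnx_path}.")
    return written
```

Treating only the .onnx copy as optional mirrors the review discussion: the checkpointed .nn file should always be there, while the .onnx file may not be.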

Useful links (GitHub issues, JIRA tickets, ML-Agents forum threads etc.)

#4127

Types of change(s)

  • Bug fix
  • New feature
  • Code refactor
  • Breaking change
  • Documentation update
  • Other (please describe)

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

Contributor

@harperj left a comment


Very minor suggestion; otherwise looks good to me.

try:
    shutil.copyfile(source_onnx_path, destination_onnx_path)
    logger.info(f"Copied {source_onnx_path} to {destination_onnx_path}.")
except Exception:
    ...  # handler body not shown in this review excerpt
Contributor


Maybe handle the more specific exception(s)?

Contributor Author


I changed it to OSError, which covers FileNotFoundError (the one we'd expect when ONNX isn't installed) and miscellaneous permission errors.
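As a quick check of that reasoning: in the Python standard library, `FileNotFoundError` and `PermissionError` are both subclasses of `OSError`, so one `except OSError` clause handles both, while unrelated bugs still surface. `try_copy` below is a hypothetical helper for illustration, not code from this PR.

```python
import shutil

# Both expected failure modes share OSError as a base class.
assert issubclass(FileNotFoundError, OSError)
assert issubclass(PermissionError, OSError)


def try_copy(source: str, destination: str) -> bool:
    """Copy source to destination; return False on filesystem errors instead of raising."""
    try:
        shutil.copyfile(source, destination)
        return True
    except OSError:
        # Missing files and permission problems land here. Programming errors,
        # such as the TypeError from passing None as a path, still propagate,
        # which `except Exception` would have silently swallowed.
        return False
```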

@chriselion merged commit b68beb4 into master Aug 5, 2020
@delete-merged-branch bot deleted the copy-checkpoint-models branch August 5, 2020 18:34
@github-actions bot locked as resolved and limited conversation to collaborators Aug 5, 2021