Add profiler #1

Merged 4 commits on Jan 19, 2024

lessw2020 (Contributor)

Adds an option to capture torch profiler traces via:
--run_profiler (T/F)
--profile_folder (str)

Traces are saved with rank_X as part of the trace name (screenshot: rank_named_traces).

Implemented as a context manager wrapped around the main training loop.
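
For readers who want a picture of the pattern, here is a minimal sketch of a profiler context manager around a training loop. It is not the code from this PR: the names `maybe_run_profiler`, `trace_path`, and `train`, the `rank_<N>_trace_<K>.json` filename layout, and the schedule values are all assumptions chosen for illustration. Only the `torch.profiler` calls themselves are real PyTorch APIs.

```python
import contextlib
import itertools
import os


def trace_path(profile_folder: str, rank: int, index: int) -> str:
    """Build a per-rank trace filename (hypothetical layout, not the PR's)."""
    return os.path.join(profile_folder, f"rank_{rank}_trace_{index}.json")


@contextlib.contextmanager
def maybe_run_profiler(run_profiler: bool, profile_folder: str, rank: int = 0):
    """Yield a torch profiler when enabled, otherwise yield None."""
    if not run_profiler:
        yield None
        return

    # Imported lazily so runs with profiling disabled do not need torch here.
    import torch

    os.makedirs(profile_folder, exist_ok=True)
    trace_index = itertools.count()

    def on_trace_ready(prof):
        # Called by the profiler each time an active window finishes.
        prof.export_chrome_trace(trace_path(profile_folder, rank, next(trace_index)))

    activities = [torch.profiler.ProfilerActivity.CPU]
    if torch.cuda.is_available():
        activities.append(torch.profiler.ProfilerActivity.CUDA)

    with torch.profiler.profile(
        activities=activities,
        # Assumed schedule: skip 1 step, warm up 1 step, record 1 step, repeat.
        schedule=torch.profiler.schedule(wait=1, warmup=1, active=1),
        on_trace_ready=on_trace_ready,
    ) as prof:
        yield prof


def train(num_steps: int, run_profiler: bool = False,
          profile_folder: str = "./profile_traces", rank: int = 0) -> int:
    """Stand-in training loop showing where the wrapper and prof.step() go."""
    completed = 0
    with maybe_run_profiler(run_profiler, profile_folder, rank) as prof:
        for _ in range(num_steps):
            completed += 1  # placeholder for forward / backward / optimizer step
            if prof is not None:
                prof.step()  # advance the profiler schedule once per iteration
    return completed
```

In a distributed run launched with torchrun, the rank could be read from the `RANK` environment variable, for example `int(os.environ.get("RANK", 0))`, so that each process writes its own trace file.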

facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Jan 17, 2024.
wanchaol (Collaborator) left a comment:

Awesome! Thanks for the quick progress! I think this looks great. I have a few inline comments about how to trigger the profiler.

…rofiling.py file, global dumps folder, logging_utils.py
lessw2020 (Contributor, Author)

The PR is updated to address the previous feedback.
It adds user config control for profiling via train_config.toml,
a separate profiling.py file, a global dumps folder, and logging_utils.py.
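
As a rough illustration of config-driven profiling, the sketch below reads profiling options from TOML text. The `[profiling]` section, its key names, and the defaults are assumptions made for this example; the actual layout of train_config.toml is not shown in this conversation. It uses the standard-library `tomllib` (Python 3.11+), falling back to the third-party `tomli` package on older versions.

```python
from dataclasses import dataclass

try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

# Hypothetical config fragment; section and key names are illustrative only.
EXAMPLE_TOML = """
[profiling]
run_profiler = true
profile_folder = "profiling/traces"
"""


@dataclass
class ProfilingConfig:
    run_profiler: bool = False
    profile_folder: str = "profiling/traces"


def load_profiling_config(toml_text: str) -> ProfilingConfig:
    """Parse the (assumed) [profiling] section, using defaults for missing keys."""
    section = tomllib.loads(toml_text).get("profiling", {})
    defaults = ProfilingConfig()
    return ProfilingConfig(
        run_profiler=bool(section.get("run_profiler", defaults.run_profiler)),
        profile_folder=str(section.get("profile_folder", defaults.profile_folder)),
    )
```

Keeping the parsed values in a small dataclass means the training script can pass one object around instead of reading the config file in several places.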

wanchaol (Collaborator) left a comment:

LGTM, I only have one comment about the profiling flags.

…filing, create custom named folders for traces
wanchaol merged commit f1e86e4 into pytorch:main on Jan 19, 2024
lessw2020 added a commit that referenced this pull request Apr 18, 2024
jinsun-yoo pushed a commit to jinsun-yoo/torchtitan that referenced this pull request Oct 30, 2024