Skip to content

Conversation

tianyu-l
Copy link
Contributor

@tianyu-l tianyu-l commented Jul 10, 2024

tianyu-l added a commit that referenced this pull request Jul 10, 2024
ghstack-source-id: 357f610
Pull Request resolved: #451
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jul 10, 2024
@tianyu-l tianyu-l requested review from wconstab and awgu July 11, 2024 05:10
"--experimental.pipeline_parallel_schedule 1f1b",
"--training.data_parallel_degree 1",
"--model.norm_type rmsnorm", # fused_rmsnorm crashes with PP
"--model.norm_type rmsnorm", # compiled_rmsnorm / fused_rmsnorm crashes with PP
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious what is the reason that compiled RMSNorm is incompatible with PP

test_runner.py Outdated
"--experimental.pipeline_parallel_split_points layers.1,layers.2,layers.3,layers.4,layers.5,layers.6,layers.7",
"--experimental.pipeline_parallel_schedule interleaved_1f1b",
"--model.norm_type rmsnorm", # fused_rmsnorm throws cuda context error with pp
"--model.norm_type rmsnorm", # compiled_rmsnorm / fused_rmsnorm throws cuda context error with pp
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. is this cuda context error the same reason as above for "crashes with PP"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It seems nobody understands why the error happens at this moment.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is the same error, maybe we should make the commit uniform?

tianyu-l added a commit that referenced this pull request Aug 2, 2024
ghstack-source-id: fa38b22
Pull Request resolved: #451
@tianyu-l tianyu-l force-pushed the gh/tianyu-l/15/head branch from a437389 to a757a84 Compare August 16, 2024 21:00
tianyu-l added a commit that referenced this pull request Aug 16, 2024
ghstack-source-id: fa38b22
Pull Request resolved: #451
@tianyu-l
Copy link
Contributor Author

see updates in #535

@tianyu-l tianyu-l closed this Aug 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Meta Open Source bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants