Enable ROCm CI support. #1260
base: main
Hi @akashveramd! Thank you for your pull request and welcome to our community. Action Required: In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you. Process: In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
ebaafdc to ce60b15
No ciflow labels are configured for this repo.
    TEST_WITH_ROCM
    and test_flavor.test_name in skip_for_rocm_test_list
):
    continue
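For illustration, the condition in the snippet above can be wrapped in a complete selection loop. This is a minimal sketch, not the PR's code: `Flavor`, `select_tests`, and the example test names are hypothetical, and `TEST_WITH_ROCM` is passed in as the `test_with_rocm` parameter instead of being read from a module-level flag. Only `test_flavor.test_name` and `skip_for_rocm_test_list` come from the review snippet.

```python
from dataclasses import dataclass


@dataclass
class Flavor:
    # Hypothetical stand-in for a test definition; only `test_name`
    # mirrors the review snippet, `args` is illustrative.
    test_name: str
    args: tuple = ()


def select_tests(
    flavors: list,
    test_with_rocm: bool,
    skip_for_rocm_test_list: set,
) -> list:
    """Return the flavors to run, dropping ROCm-skipped ones on ROCm."""
    selected = []
    for test_flavor in flavors:
        # Same condition as the snippet under review: skip only when
        # running on ROCm *and* the test is on the ROCm skip list.
        if (
            test_with_rocm
            and test_flavor.test_name in skip_for_rocm_test_list
        ):
            continue
        selected.append(test_flavor)
    return selected


if __name__ == "__main__":
    flavors = [Flavor("default"), Flavor("float8"), Flavor("fsdp")]
    skip = {"float8"}
    # On ROCm the skip-listed test is dropped; on CUDA everything runs.
    print([f.test_name for f in select_tests(flavors, True, skip)])
    print([f.test_name for f in select_tests(flavors, False, skip)])
```

Because the skip list is only consulted when `test_with_rocm` is true, CUDA runs see the full test list unchanged.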
This logic makes sense to me, but if we really want to use the test settings in integration_tests_h100.py, we should move this logic into that file (and, of course, rename the file to be more hardware-agnostic).
@jithunnair-amd: All tests in integration_tests_h100.py pass on ROCm, so we don't use TEST_WITH_ROCM in integration_tests_h100.py. However, we should discuss renaming the integration_tests_h100.py file, since we also run it on ROCm runners.
cc: @tianyu-l @fegin
…g ubuntu folder for cuda Dockerfile.
…Fixed error in integration_tests.py. Fixed lint errors.
…single workflow file to run integration tests, including H100, for both CUDA and ROCm. integration_test_8gpu_h100.yaml needs a different name, since we also run it for ROCm. Fixed file permissions for integration_tests.py.
d527f27 to 18025ad
@huydhn I need your help creating a new Docker repo for the torchtitan ROCm Docker image: https://github.com/pytorch/torchtitan/actions/runs/16042425274/job/45266420732?pr=1260#step:7:1436
…_job_v2.yml for integration_test_8gpu.yaml.
In this PR: