[GCP] Fix machine image for cluster creation #3030
Thanks for the fix! Left several comments 🫡
Co-authored-by: Tian Xia <[email protected]>
…into fix-machine-image
LGTM. A nit: should we rename the function here to something like `_provision`? Though I'm happy with the current implementation, since most of the time it won't trigger single inserts.
skypilot/sky/provision/provisioner.py
Line 53 in 16cdc7f
def _bulk_provision(
Thanks for the suggestion @cblmemo! I suppose the
Fixes #3027
The `bulkInsert` API fails to create VMs from a `machineImage` when disk parameters are also overridden, because it validates the source of the disk, a check that should not apply when `machineImage` is specified (a bug in the GCP API). This PR makes provisioning fall back to the regular `insert` API in that case, so that instances can be created from a machine image.

Tested (run the relevant ones):
- `bash format.sh`
- `sky launch -c test-machine-image2 --cloud gcp --disk-size 256 --num-nodes 2 --gpus A100-80GB:8`; correctly fails over through multiple regions
- `sky launch -c test-machine-image5 --cloud gcp --image-id projects/skypilot-375900/global/machineImages/test-machine-image --disk-size 256 --cpus 2 --num-nodes 4`; successfully creates the cluster
- `sky launch -c test-machine-image5 --cloud gcp --image-id projects/skypilot-375900/global/machineImages/test-machine-image --disk-size 256 --num-nodes 2 --gpus A100-80GB:8`; successfully fails over through zones/regions
- `pytest tests/test_smoke.py --gcp`
- `pytest tests/test_smoke.py::test_fill_in_the_name`
- `bash tests/backward_comaptibility_tests.sh`
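
The fallback described in this PR can be sketched in Python as follows. This is a minimal illustration, assuming a hypothetical `build_requests` helper that only decides which Compute Engine API to call and what request bodies to send; it is not SkyPilot's actual provisioning code. The request shapes are simplified and only loosely modeled on the Compute Engine REST resources (`sourceMachineImage`, `count`, `instanceProperties`, `perInstanceProperties`).

```python
from typing import Any, Dict, List


def build_requests(node_config: Dict[str, Any],
                   names: List[str]) -> List[Dict[str, Any]]:
    """Hypothetical helper: plan the API requests needed to create `names`.

    Assumption: `node_config` is a simplified instance body whose optional
    `sourceMachineImage` key points at a GCP machine image.
    """
    if node_config.get('sourceMachineImage'):
        # bulkInsert rejects a machine image combined with disk overrides,
        # so fall back to one regular insert request per instance.
        return [{
            'api': 'insert',
            'body': {**node_config, 'name': name},
        } for name in names]
    # Otherwise, a single bulkInsert request creates all instances at once.
    return [{
        'api': 'bulkInsert',
        'body': {
            'count': len(names),
            'instanceProperties': node_config,
            'perInstanceProperties': {name: {} for name in names},
        },
    }]
```

The common path still issues a single `bulkInsert`; only requests that specify a machine image pay the cost of one `insert` call per node.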