Skip to content

[core] Refactoring the return value of teardown #1595

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
May 26, 2023

Conversation

Michaelvll
Copy link
Collaborator

@Michaelvll Michaelvll commented Jan 15, 2023

Fixes #1649

This is a minor refactoring that avoids using the return value to indicate the failure of the teardown. It also avoids the sky down's progress bar progress when a teardown fails. cc @mraheja

Tested (run the relevant ones):

  • All smoke tests: pytest tests/test_smoke.py

@Michaelvll Michaelvll changed the title [core] Use runtime error instead [core] Refactoring the return value of teardown Jan 26, 2023
@Michaelvll Michaelvll marked this pull request as ready for review January 27, 2023 01:47
@romilbhardwaj
Copy link
Collaborator

Thanks for adding this @Michaelvll! What's a good way to test this PR/simulate down failures?

…cretevitamin/sky-experiments into teardown-return-value
@Michaelvll
Copy link
Collaborator Author

What's a good way to test this PR/simulate down failures?

Sorry for missing the question. One way to test it is to disconnect the internet and sky down test-down*. With the master branch, the progress bar will show 100%, even though multiple error shows up, but the current branch will show 0%.

@Michaelvll
Copy link
Collaborator Author

This is also important to avoid leakage of spot clusters in managed spot, as we didn't actually catch the exceptions when the sky.down fails

usage_lib.messages.usage.set_internal()
sky.down(cluster_name)
return

@Michaelvll Michaelvll added this to the v0.3 milestone May 23, 2023
Copy link
Member

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, if it's confirmed that the following works: sky down -a (or multiple clusters) and only some of the clusters throw a RuntimeError.

Copy link
Collaborator Author

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tested sky down multiple clusters: some succeeded and some failed, with the correct progress bar.

I am testing again with the smoke tests and will merge it after all tests pass:

  • pytest tests/test_smoke.py

@Michaelvll Michaelvll merged commit f733415 into master May 26, 2023
@Michaelvll Michaelvll deleted the teardown-return-value branch May 26, 2023 03:17
@Michaelvll Michaelvll mentioned this pull request Jun 1, 2023
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Core] sky down does not respect the failure from ray down in progress bar
3 participants