-
Notifications
You must be signed in to change notification settings - Fork 640
[UX] Fix unexpected output for ctrl-c during sky launch
#2206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
sky/skylet/log_lib.py
Outdated
os.killpg(proc.pid, signal.SIGINT) | ||
try: | ||
os.killpg(proc.pid, signal.SIGINT) | ||
except Exception: # pylint: disable=broad-except |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I confirm it fixed ProcessLookupError: #2205 (comment).
-
However, what was the cause of the echo job's error
PermissionError: [Errno 1] Operation not permitted
? It seems like ctrl-c should just work. -
It seems a bit dangerous to ignore all exceptions. Is it ok to only catch ProcessLookupError, and surface real PermissionError's (different from the repro)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, what was the cause of the echo job's error PermissionError: [Errno 1] Operation not permitted? It seems like ctrl-c should just work.
I am not exactly sure for the reason of the error, but my guess is that the parent process might be killed before the child processes, causing the child process owned by the root
user and a PermssionError
when os.killpg
is called.
I now replaced the implementation with the kill_children_processes
instead, to be align with our implementation in
skypilot/sky/backends/cloud_vm_ray_backend.py
Line 3258 in c2e9395
signal.signal(signal.SIGINT, backend_utils.interrupt_handler) |
skypilot/sky/backends/backend_utils.py
Lines 2479 to 2487 in c2e9395
def interrupt_handler(signum, frame): | |
del signum, frame | |
subprocess_utils.kill_children_processes() | |
# Avoid using logger here, as it will print the stack trace for broken | |
# pipe, when the output is piped to another program. | |
print(f'{colorama.Style.DIM}Tip: The job will keep ' | |
f'running after Ctrl-C.{colorama.Style.RESET_ALL}') | |
with ux_utils.print_exception_no_traceback(): | |
raise KeyboardInterrupt(exceptions.KEYBOARD_INTERRUPT_CODE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this makes the two repros pass. LGTM if there isn't any known issues of SIGKILL vs. SIGINT.
Thanks for the review @concretevitamin. There are no known issues of SIGKILL vs SIGINT at the moment, and we were already doing that with the signal handler. The keyboard interrupt will only be sent in an interactive way, so using SIGTERM/SIGKILL does not cause any issue with the user own program. ; ) |
Fixes #2205.
The
os.killpg
might be applied to a subprocess that was attached under the root user, due the parent process being killed first. This will cause thePermissionError: [Errno 1] Operation not permitted
@concretevitamin Could you verify if this solve the problem?
Tested (run the relevant ones):
bash format.sh
ctrl-c
during theLaunching
progress spinnerctrl-c
during thesky logs
(originally will showProcessLookupError
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
bash tests/backward_comaptibility_tests.sh