
[UX] Fix unexpected output for ctrl-c during sky launch #2206


Merged
Michaelvll merged 4 commits into master from fix-unexpected-output-for-ctrl-c
Jul 11, 2023

Conversation

Michaelvll
Collaborator

@Michaelvll Michaelvll commented Jul 10, 2023

Fixes #2205.

os.killpg might be applied to a subprocess that has been attached under the root user because the parent process was killed first. This causes PermissionError: [Errno 1] Operation not permitted.

@concretevitamin Could you verify whether this solves the problem?

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
    • ctrl-c during the Launching progress spinner
    • ctrl-c during sky logs (originally showed ProcessLookupError)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: bash tests/backward_comaptibility_tests.sh

@Michaelvll Michaelvll marked this pull request as ready for review July 10, 2023 05:30
-    os.killpg(proc.pid, signal.SIGINT)
+    try:
+        os.killpg(proc.pid, signal.SIGINT)
+    except Exception:  # pylint: disable=broad-except
Member

Thanks, I confirm it fixed ProcessLookupError: #2205 (comment).

  1. However, what was the cause of the echo job's error PermissionError: [Errno 1] Operation not permitted? It seems like ctrl-c should just work.

  2. It seems a bit dangerous to ignore all exceptions. Is it OK to catch only ProcessLookupError and surface real PermissionErrors (different from the repro)?
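The narrower handling suggested in point 2 could look roughly like the sketch below. It is an illustration only, not the code merged in this PR: the helper name interrupt_process_group is hypothetical, and it assumes the child was started as the leader of its own process group (for example with start_new_session=True), so that proc.pid is also the process group ID.

```python
import os
import signal
import subprocess


def interrupt_process_group(proc: subprocess.Popen) -> bool:
    """Send SIGINT to the process group led by proc (hypothetical helper).

    Returns True if the signal was delivered and False if the group no
    longer exists. Any other OSError, including PermissionError, is
    deliberately not caught, so that real permission problems surface.
    """
    try:
        # Assumes proc.pid is also the process group ID.
        os.killpg(proc.pid, signal.SIGINT)
    except ProcessLookupError:
        # The group has already exited; nothing is left to interrupt.
        return False
    return True
```

With this shape, a ctrl-c that races with a process that has already exited is silent, while an unexpected PermissionError still propagates to the caller.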

Collaborator Author

However, what was the cause of the echo job's error PermissionError: [Errno 1] Operation not permitted? It seems like ctrl-c should just work.

I am not exactly sure of the reason for the error, but my guess is that the parent process might be killed before the child processes, leaving the child process owned by the root user and causing a PermissionError when os.killpg is called.

I have now replaced the implementation with kill_children_processes instead, to align with our implementation in:

    signal.signal(signal.SIGINT, backend_utils.interrupt_handler)

    def interrupt_handler(signum, frame):
        del signum, frame
        subprocess_utils.kill_children_processes()
        # Avoid using logger here, as it will print the stack trace for broken
        # pipe, when the output is piped to another program.
        print(f'{colorama.Style.DIM}Tip: The job will keep '
              f'running after Ctrl-C.{colorama.Style.RESET_ALL}')
        with ux_utils.print_exception_no_traceback():
            raise KeyboardInterrupt(exceptions.KEYBOARD_INTERRUPT_CODE)

Member

@concretevitamin concretevitamin left a comment

Thanks, this makes the two repros pass. LGTM if there aren't any known issues with SIGKILL vs. SIGINT.

@Michaelvll
Collaborator Author

Thanks, this makes the two repros pass. LGTM if there aren't any known issues with SIGKILL vs. SIGINT.

Thanks for the review @concretevitamin. There are no known issues with SIGKILL vs. SIGINT at the moment, and we were already doing that with the signal handler. The keyboard interrupt will only be sent interactively, so using SIGTERM/SIGKILL does not cause any issue with the user's own program. ; )

@Michaelvll Michaelvll merged commit b231bfe into master Jul 11, 2023
@Michaelvll Michaelvll deleted the fix-unexpected-output-for-ctrl-c branch July 11, 2023 17:40

Successfully merging this pull request may close these issues.

Weird error messages after ctrl-c out a running job's log
2 participants