Skip to content

[supervisor] execve into ring3 #2664

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jan 8, 2021
Merged

Conversation

csweichel
Copy link
Contributor

This PR makes ring2 execve supervisor run (aka ring3). This way PID 1 in the workspace is supervisor run, much like it would be without user namespaces. Hence supervisor can properly do its job as reaper, and the graceful shutdown of tasks is a bit less brittle (#2654).

How does it work?

Instead of starting a child process, ring2 now calls setgid/setuid to drop to the Gitpod user (33333) and then execve's /proc/self/exe run, thereby replacing the ring2 process with supervisor run.

Why didn't we do this before?

setuid/setgid have a long and difficult history in Go. Go 1.16 introduced AllThreadsSyscall which makes setuid/setgid work on all threads without the use of CGO. TLDR; Before Go 1.16 we'd need CGO, but we don't want CGO because supervisor has to run in glibc and muslibc environments. cap - a Go-native port of libcap - supports this feature, and if it's present supports capability-preserving setuid/setgid calls without CGO.

But wait, Go 1.16 is still in beta?!?

Indeed. That's why leeway now supports choosing a Go version on a package level. Starting with this PR, we compile supervisor with go1.16beta1, and the rest using the Go version we have installed in the dev-environment image. Once Go 1.16 is out of beta (Feb 2021), we'll switch over for all components. Go betas have been really rather stable and I don't expect issues because we build supervisor using Go 1.16. Note golang/go#43149 does not concern us, because only ring2 uses AllThreadsSyscall and ring2 does not use signal.

How to test

  1. Enable feature preview
  2. Start a workspace on http://cw-supervisor-execve-ring3.staging.gitpod-dev.com/#https://github.com/csweichel/test-repo/tree/proper-shutdown. This workspace starts a command which listens for sigterm and writes a file.
  3. Stop the workspace. You should see proper-shutdown.txt in the list of changed files. This file was created when the task stopped properly.
  4. Create a zombie: ./zombie & sleep 1 && ps guaxf | grep zombie ... once the zombie parent exits ps guaxf | grep defunct should not return anything. Previously, the defunct process would still have been around.
Screen.Recording.2021-01-05.at.11.22.18.mov

/werft https

@csweichel csweichel force-pushed the cw/supervisor-execve-ring3 branch 3 times, most recently from 28a182c to f511f1a Compare January 6, 2021 08:30
@geropl
Copy link
Member

geropl commented Jan 8, 2021

/werft run

👍 started the job as gitpod-build-cw-supervisor-execve-ring3.23

log.WithError(err).Error("failed to start the child process")
err = cap.SetUID(33333)
if err != nil {
log.WithError(err).Error("cannot setgid")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error msg switched

},
err = cap.SetGroups(33333)
if err != nil {
log.WithError(err).Error("cannot setuid")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error msg switched

Copy link
Member

@geropl geropl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switche err messages, but works like a charm!

@csweichel csweichel force-pushed the cw/supervisor-execve-ring3 branch 2 times, most recently from 1a9c380 to fd3db95 Compare January 8, 2021 14:37
@csweichel csweichel force-pushed the cw/supervisor-execve-ring3 branch from fd3db95 to 65dc59c Compare January 8, 2021 14:48
@csweichel csweichel merged commit 4ab5e8c into master Jan 8, 2021
@csweichel csweichel deleted the cw/supervisor-execve-ring3 branch January 8, 2021 14:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants