Skip to content

eliminate posix_spawnp from the standard library #14866

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 10, 2023
Merged

Conversation

andrewrk
Copy link
Member

@andrewrk andrewrk commented Mar 10, 2023

Today I found out that posix_spawnp is trash. It's actually implemented on top of posix_spawn inside of libc (or libSystem in the case of macOS).

So, anything posix_spawnp can do, we can do better. In particular, what we can do better is handle spawning of child processes that are potentially foreign binaries. If you try to spawn a wasm binary, for example, posix_spawnp does the following:

  • Goes ahead and creates a child process.
  • The child process writes "foo.wasm: foo.wasm: cannot execute binary file" to stderr (yes, it prints the filename twice).
  • The child process then exits with code 126.

This behavior is indistinguishable from the binary being successfully spawned, and then printing to stderr, and exiting with a failure - something that is an extremely common occurrence.

Meanwhile, using the lower level posix_spawn will simply return ENOEXEC code from the execve syscall (which is mapped to zig error.InvalidExe).

The posix_spawnp behavior means the zig build runner can't tell the difference between a failure to run a foreign binary, and a binary that did run, but failed in some other fashion. This is unacceptable, because attempting to execve is the proper way to support things like Rosetta.

Today I found out that posix_spawn is trash. It's actually implemented
on top of fork/exec inside of libc (or libSystem in the case of macOS).

So, anything posix_spawn can do, we can do better. In particular, what
we can do better is handle spawning of child processes that are
potentially foreign binaries. If you try to spawn a wasm binary, for
example, posix spawn does the following:

 * Goes ahead and creates a child process.
 * The child process writes "foo.wasm: foo.wasm: cannot execute binary file"
   to stderr (yes, it prints the filename twice).
 * The child process then exits with code 126.

This behavior is indistinguishable from the binary being successfully
spawned, and then printing to stderr, and exiting with a failure -
something that is an extremely common occurrence.

Meanwhile, using the lower level fork/exec will simply return ENOEXEC
code from the execve syscall (which is mapped to zig error.InvalidExe).

The posix_spawn behavior means the zig build runner can't tell the
difference between a failure to run a foreign binary, and a binary that
did run, but failed in some other fashion. This is unacceptable, because
attempting to excecve is the proper way to support things like Rosetta.
@kubkon
Copy link
Member

kubkon commented Mar 10, 2023

I was actually aware of this limitation with posix_spawnp on macOS and Rosetta which is indeed sad. Anyhow I am fine with this change but would like to check if we indeed can replace posix_spawnp with fork+execve when spawning processes suspended and registering another process as Mach exception handler. Would you mind if I had a quick look today and report back?

@kubkon
Copy link
Member

kubkon commented Mar 10, 2023

Also, somewhat tangential, should we completely remove the posix_spawn wrappers from the libstd even if we are not actively using them as part of ChildProcess? We could still provide them as valid wrappers for the user to use them should they choose so, right? Or are you suggesting we only expose the raw C bindings and it is upto our users to create the wrappers how they see fit? I am fine with either FWIW but wanted to bring this up.

@andrewrk
Copy link
Member Author

Would you mind if I had a quick look today and report back?

Sure, no problem.

If you discover that there truly is no way to disable_aslr and start_suspended with the fork/exec model, then I'm fine with keeping posix_spawn as another option that can be selected via the ChildProcess API. However, I strongly suspect that these options are implemented in posix_spawn on top of fork/exec and that we could do the same here, providing a strictly more powerful API than posix_spawn offers.

My reasoning for deleting the wrappers was to discourage using the API since it is, if my hypothesis is correct, a useless API. Worse than useless, in fact, because it serves as a red herring that leads to subtly broken behavior.

@kubkon
Copy link
Member

kubkon commented Mar 10, 2023

Would you mind if I had a quick look today and report back?

Sure, no problem.

If you discover that there truly is no way to disable_aslr and start_suspended with the fork/exec model, then I'm fine with keeping posix_spawn as another option that can be selected via the ChildProcess API. However, I strongly suspect that these options are implemented in posix_spawn on top of fork/exec and that we could do the same here, providing a strictly more powerful API than posix_spawn offers.

My reasoning for deleting the wrappers was to discourage using the API since it is, if my hypothesis is correct, a useless API. Worse than useless, in fact, because it serves as a red herring that leads to subtly broken behavior.

From I gather, it looks like __posix_spawn is an actual syscall on macOS https://github.com/apple-oss-distributions/xnu/blob/5c2921b07a2480ab43ec66f5b9e41cb872bc554f/libsyscall/wrappers/spawn/posix_spawn.c#L2783 and it doesn't seem to be implemented by calling fork+execve under-the-hood. Also note how much more featureful it is having abililty to control a lot of internal darwin/xnu behavior. In fact, it seems the opposite is true on macOS: execve is using __posix_spawn syscall where possible. Example, _execve_with_filter is calling __posix_spawn where/if possible https://github.com/apple-oss-distributions/xnu/blob/5c2921b07a2480ab43ec66f5b9e41cb872bc554f/libsyscall/wrappers/spawn/posix_spawn_filtering.c#L328. I will try digging a little deeper and see if I can find out more.

@kubkon
Copy link
Member

kubkon commented Mar 10, 2023

It seems Apple supports two implementations for posix_spawn syscall: 1) opt-in via a flag which turns posix_spawn into fork+execve, and 2) the default which uses cloneproc instead. If we set POSIX_SPAWN_SETEXEC they claim this will make posix_spawn a more featureful version of execve, perhaps this is what we want on darwin by default? Ref: https://github.com/apple-oss-distributions/xnu/blob/5c2921b07a2480ab43ec66f5b9e41cb872bc554f/bsd/kern/kern_exec.c#L3678

@lin72h
Copy link

lin72h commented Mar 10, 2023

And there's related API:

posix_spawnattr_get_qos_class_np
Screenshot 2023-02-08 at 23 15 48

@kubkon
Copy link
Member

kubkon commented Mar 10, 2023

I spent the entire day today trying to turn posix_spawnp into a glorified execve with POSIX_SPAWN_SETEXEC flag but failed to make it trip in a similar way to execve. The general steps would be to fork using fork but then instead of calling into execve we would instead call posix_spawnp with the aforementioned flag set. This way we would retain all the goodies provided by posix_spawnp such as ability to start suspended and turn ASLR off. However, so far I have been unable to make this work. posix_spawnp indeed doesn't return to the caller thus acting as exec but I cannot get the errors to properly work to match fork+execve method. Ideas would be appreciated! Here's the diff of the changes that should work as far as I understand Apple's codebase and according to very few incomplete examples I found online:

diff --git a/lib/std/c/darwin.zig b/lib/std/c/darwin.zig
index 9c5ac1e93..be36de0ad 100644
--- a/lib/std/c/darwin.zig
+++ b/lib/std/c/darwin.zig
@@ -227,7 +227,7 @@ pub extern "c" fn posix_spawn_file_actions_addinherit_np(actions: *posix_spawn_f
 pub extern "c" fn posix_spawn_file_actions_addchdir_np(actions: *posix_spawn_file_actions_t, path: [*:0]const u8) c_int;
 pub extern "c" fn posix_spawn_file_actions_addfchdir_np(actions: *posix_spawn_file_actions_t, filedes: fd_t) c_int;
 pub extern "c" fn posix_spawn(
-    pid: *pid_t,
+    pid: ?*pid_t,
     path: [*:0]const u8,
     actions: ?*const posix_spawn_file_actions_t,
     attr: ?*const posix_spawnattr_t,
@@ -235,7 +235,7 @@ pub extern "c" fn posix_spawn(
     env: [*:null]?[*:0]const u8,
 ) c_int;
 pub extern "c" fn posix_spawnp(
-    pid: *pid_t,
+    pid: ?*pid_t,
     path: [*:0]const u8,
     actions: ?*const posix_spawn_file_actions_t,
     attr: ?*const posix_spawnattr_t,
diff --git a/lib/std/child_process.zig b/lib/std/child_process.zig
index dba92ab99..aaebe1c5f 100644
--- a/lib/std/child_process.zig
+++ b/lib/std/child_process.zig
@@ -142,11 +142,6 @@ pub const ChildProcess = struct {
         if (!std.process.can_spawn) {
             @compileError("the target operating system cannot spawn processes");
         }
-
-        if (comptime builtin.target.isDarwin()) {
-            return self.spawnMacos();
-        }
-
         if (builtin.os.tag == .windows) {
             return self.spawnWindows();
         } else {
@@ -337,10 +332,11 @@ pub const ChildProcess = struct {
     }
 
     fn waitUnwrapped(self: *ChildProcess) !void {
-        const res: os.WaitPidResult = if (comptime builtin.target.isDarwin())
-            try os.posix_spawn.waitpid(self.id, 0)
-        else
-            os.waitpid(self.id, 0);
+        // const res: os.WaitPidResult = if (comptime builtin.target.isDarwin())
+        //     try os.posix_spawn.waitpid(self.id, 0)
+        // else
+        //     os.waitpid(self.id, 0);
+        const res = os.waitpid(self.id, 0);
         const status = res.status;
         self.cleanupStreams();
         self.handleWaitResult(status);
@@ -416,121 +412,6 @@ pub const ChildProcess = struct {
             Term{ .Unknown = status };
     }
 
-    fn spawnMacos(self: *ChildProcess) SpawnError!void {
-        const pipe_flags = if (io.is_async) os.O.NONBLOCK else 0;
-        const stdin_pipe = if (self.stdin_behavior == StdIo.Pipe) try os.pipe2(pipe_flags) else undefined;
-        errdefer if (self.stdin_behavior == StdIo.Pipe) destroyPipe(stdin_pipe);
-
-        const stdout_pipe = if (self.stdout_behavior == StdIo.Pipe) try os.pipe2(pipe_flags) else undefined;
-        errdefer if (self.stdout_behavior == StdIo.Pipe) destroyPipe(stdout_pipe);
-
-        const stderr_pipe = if (self.stderr_behavior == StdIo.Pipe) try os.pipe2(pipe_flags) else undefined;
-        errdefer if (self.stderr_behavior == StdIo.Pipe) destroyPipe(stderr_pipe);
-
-        const any_ignore = (self.stdin_behavior == StdIo.Ignore or self.stdout_behavior == StdIo.Ignore or self.stderr_behavior == StdIo.Ignore);
-        const dev_null_fd = if (any_ignore)
-            os.openZ("/dev/null", os.O.RDWR, 0) catch |err| switch (err) {
-                error.PathAlreadyExists => unreachable,
-                error.NoSpaceLeft => unreachable,
-                error.FileTooBig => unreachable,
-                error.DeviceBusy => unreachable,
-                error.FileLocksNotSupported => unreachable,
-                error.BadPathName => unreachable, // Windows-only
-                error.InvalidHandle => unreachable, // WASI-only
-                error.WouldBlock => unreachable,
-                else => |e| return e,
-            }
-        else
-            undefined;
-        defer if (any_ignore) os.close(dev_null_fd);
-
-        var attr = try os.posix_spawn.Attr.init();
-        defer attr.deinit();
-        var flags: u16 = os.darwin.POSIX_SPAWN_SETSIGDEF | os.darwin.POSIX_SPAWN_SETSIGMASK;
-        if (self.disable_aslr) {
-            flags |= os.darwin._POSIX_SPAWN_DISABLE_ASLR;
-        }
-        if (self.start_suspended) {
-            flags |= os.darwin.POSIX_SPAWN_START_SUSPENDED;
-        }
-        try attr.set(flags);
-
-        var actions = try os.posix_spawn.Actions.init();
-        defer actions.deinit();
-
-        try setUpChildIoPosixSpawn(self.stdin_behavior, &actions, stdin_pipe, os.STDIN_FILENO, dev_null_fd);
-        try setUpChildIoPosixSpawn(self.stdout_behavior, &actions, stdout_pipe, os.STDOUT_FILENO, dev_null_fd);
-        try setUpChildIoPosixSpawn(self.stderr_behavior, &actions, stderr_pipe, os.STDERR_FILENO, dev_null_fd);
-
-        if (self.cwd_dir) |cwd| {
-            try actions.fchdir(cwd.fd);
-        } else if (self.cwd) |cwd| {
-            try actions.chdir(cwd);
-        }
-
-        var arena_allocator = std.heap.ArenaAllocator.init(self.allocator);
-        defer arena_allocator.deinit();
-        const arena = arena_allocator.allocator();
-
-        const argv_buf = try arena.allocSentinel(?[*:0]u8, self.argv.len, null);
-        for (self.argv, 0..) |arg, i| argv_buf[i] = (try arena.dupeZ(u8, arg)).ptr;
-
-        const envp = if (self.env_map) |env_map| m: {
-            const envp_buf = try createNullDelimitedEnvMap(arena, env_map);
-            break :m envp_buf.ptr;
-        } else std.c.environ;
-
-        const pid = try os.posix_spawn.spawnp(self.argv[0], actions, attr, argv_buf, envp);
-
-        if (self.stdin_behavior == StdIo.Pipe) {
-            self.stdin = File{ .handle = stdin_pipe[1] };
-        } else {
-            self.stdin = null;
-        }
-        if (self.stdout_behavior == StdIo.Pipe) {
-            self.stdout = File{ .handle = stdout_pipe[0] };
-        } else {
-            self.stdout = null;
-        }
-        if (self.stderr_behavior == StdIo.Pipe) {
-            self.stderr = File{ .handle = stderr_pipe[0] };
-        } else {
-            self.stderr = null;
-        }
-
-        self.id = pid;
-        self.term = null;
-
-        if (self.stdin_behavior == StdIo.Pipe) {
-            os.close(stdin_pipe[0]);
-        }
-        if (self.stdout_behavior == StdIo.Pipe) {
-            os.close(stdout_pipe[1]);
-        }
-        if (self.stderr_behavior == StdIo.Pipe) {
-            os.close(stderr_pipe[1]);
-        }
-    }
-
-    fn setUpChildIoPosixSpawn(
-        stdio: StdIo,
-        actions: *os.posix_spawn.Actions,
-        pipe_fd: [2]i32,
-        std_fileno: i32,
-        dev_null_fd: i32,
-    ) !void {
-        switch (stdio) {
-            .Pipe => {
-                const idx: usize = if (std_fileno == 0) 0 else 1;
-                try actions.dup2(pipe_fd[idx], std_fileno);
-                try actions.close(pipe_fd[1 - idx]);
-            },
-            .Close => try actions.close(std_fileno),
-            .Inherit => {},
-            .Ignore => try actions.dup2(dev_null_fd, std_fileno),
-        }
-    }
-
     fn spawnPosix(self: *ChildProcess) SpawnError!void {
         const pipe_flags = if (io.is_async) os.O.NONBLOCK else 0;
         const stdin_pipe = if (self.stdin_behavior == StdIo.Pipe) try os.pipe2(pipe_flags) else undefined;
@@ -647,11 +528,29 @@ pub const ChildProcess = struct {
                 os.setreuid(uid, uid) catch |err| forkChildErrReport(err_pipe[1], err);
             }
 
-            const err = switch (self.expand_arg0) {
-                .expand => os.execvpeZ_expandArg0(.expand, argv_buf.ptr[0].?, argv_buf.ptr, envp),
-                .no_expand => os.execvpeZ_expandArg0(.no_expand, argv_buf.ptr[0].?, argv_buf.ptr, envp),
-            };
-            forkChildErrReport(err_pipe[1], err);
+            if (comptime builtin.target.isDarwin()) {
+                // Use posix_spawn with SETEXEC flag which turns it into execve on steroids
+                // on macOS.
+                var attr = try os.posix_spawn.Attr.init();
+                defer attr.deinit();
+                var flags: u16 = try attr.get();
+                if (self.disable_aslr) {
+                    flags |= os.darwin._POSIX_SPAWN_DISABLE_ASLR;
+                }
+                if (self.start_suspended) {
+                    flags |= os.darwin.POSIX_SPAWN_START_SUSPENDED;
+                }
+                try attr.set(flags);
+
+                const err = os.posix_spawn.spawnpZ_execve(argv_buf.ptr[0].?, null, attr, argv_buf, envp);
+                forkChildErrReport(err_pipe[1], err);
+            } else {
+                const err = switch (self.expand_arg0) {
+                    .expand => os.execvpeZ_expandArg0(.expand, argv_buf.ptr[0].?, argv_buf.ptr, envp),
+                    .no_expand => os.execvpeZ_expandArg0(.no_expand, argv_buf.ptr[0].?, argv_buf.ptr, envp),
+                };
+                forkChildErrReport(err_pipe[1], err);
+            }
         }
 
         // we are the parent
diff --git a/lib/std/os/posix_spawn.zig b/lib/std/os/posix_spawn.zig
index 32904a942..ecb750586 100644
--- a/lib/std/os/posix_spawn.zig
+++ b/lib/std/os/posix_spawn.zig
@@ -266,6 +266,44 @@ const posix_spawn = if (builtin.target.isDarwin()) struct {
         }
     }
 
+    pub fn spawnpZ_execve(
+        file: [*:0]const u8,
+        actions: ?Actions,
+        attr: ?Attr,
+        argv: [*:null]?[*:0]const u8,
+        envp: [*:null]?[*:0]const u8,
+    ) Error {
+        var curr_attr = attr orelse try Attr.init();
+        const flags = try curr_attr.get();
+        try curr_attr.set(flags | os.darwin.POSIX_SPAWN_SETEXEC);
+        switch (errno(system.posix_spawnp(
+            null,
+            file,
+            if (actions) |a| &a.actions else null,
+            &curr_attr.attr,
+            argv,
+            envp,
+        ))) {
+            .SUCCESS => unreachable,
+            .@"2BIG" => return error.TooBig,
+            .NOMEM => return error.SystemResources,
+            .BADF => return error.InvalidFileDescriptor,
+            .ACCES => return error.PermissionDenied,
+            .IO => return error.InputOutput,
+            .LOOP => return error.FileSystem,
+            .NAMETOOLONG => return error.NameTooLong,
+            .NOENT => return error.FileNotFound,
+            .NOEXEC => return error.InvalidExe,
+            .NOTDIR => return error.NotDir,
+            .TXTBSY => return error.FileBusy,
+            .BADARCH => return error.InvalidExe,
+            .BADEXEC => return error.InvalidExe,
+            .FAULT => unreachable,
+            .INVAL => unreachable,
+            else => |err| return unexpectedErrno(err),
+        }
+    }
+
     /// Use this version of the `waitpid` wrapper if you spawned your child process using `posix_spawn`
     /// or `posix_spawnp` syscalls.
     /// See also `std.os.waitpid` for an alternative if your child process was spawned via `fork` and

And the example I have been testing it with:

const std = @import("std");

var gpa = std.heap.GeneralPurposeAllocator(.{}){};
const allocator = gpa.allocator();

pub fn main() !void {
    var args = std.ArrayList([]const u8).init(allocator);
    try args.append("./run.wasm");

    var child = std.ChildProcess.init(args.items, allocator);
    const res = try child.spawnAndWait();
    std.debug.print("{}", .{res});
}

@sionescu
Copy link

I also needed a more powerful version of posix_spawn, so I wrote one. If you might be interested in using it, I'd be happy to help improve or extend it.

@kubkon
Copy link
Member

kubkon commented Mar 10, 2023

I also needed a more powerful version of posix_spawn, so I wrote one. If you might be interested in using it, I'd be happy to help improve or extend it.

Thanks! The problem for us is specifically macOS and support for requesting ASLR off for a child process which currently is looking like it's only supported by posix_spawn syscall. I am exploring using fork+execve to accomplish this too however.

@kubkon
Copy link
Member

kubkon commented Mar 10, 2023

Update: from my investigation, using lldb's debugserver implementation as a guide, sadly I have concluded only posix_spawn supports extended flags for controlling things like ASLR on/off, etc. I think for the purposes of libstd we are still fine not using posix_spawn for ChildProcess on macOS and reverting back to fork+execve instead. For hot-code swapping usecase, we have then the option of using posix_spawn internally without exposing the API to the users - this is the approach I will take in https://github.com/kubkon/zdb. If however we get lucky in successfully using posix_spawn with POSIX_SPAWN_SETEXEC to act as execve this would be the ideal solution. In the meantime, in order not to block Andrew's progress on parallel builds I recommend we merge this PR.

@andrewrk
Copy link
Member Author

I have great news.

The actual culprit here is posix_spawnp (note the 'p' suffix) which is a libc function that wraps the actual syscall, posix_spawn:

https://github.com/apple-oss-distributions/Libc/blob/7861c72b1692b65f79c03f21a8a1a8e51e14c843/sys/posix_spawn.c#L144-L156

Take a moment to appreciate how depraved this Apple code is:

		case ENOEXEC:
			for (cnt = 0; argv[cnt]; ++cnt)
				;
			memp = alloca((cnt + 2) * sizeof(char *));
			if (memp == NULL) {
				/* errno = ENOMEM; XXX override ENOEXEC? */
				goto done;
			}
			memp[0] = "sh";
			memp[1] = bp;
			bcopy(argv + 1, memp + 2, cnt * sizeof(char *));
			err = posix_spawn(pid, _PATH_BSHELL, file_actions, attrp, memp, envp);
			goto done;

I'm in awe.

If the spawning of the child process fails, it allocates on the stack a new argv array, and blindly inserts sh as the first argument, then retries the spawn. Truly perverted.

So the great news is that this code is in libc, and in zig we can simply call posix_spawn, the actual syscall, instead of posix_spawnp, the creepy uncle that we don't want touching our child processes inappropriately.

This is a great example of why in the Zig standard library we prefer to use NtDll instead of Kernel32, and in general always reach for the lowest level, most powerful syscalls. If you want something done right, you gotta do it yourself.

@andrewrk andrewrk changed the title eliminate posix_spawn from the standard library eliminate posix_spawnp from the standard library Mar 10, 2023
@andrewrk
Copy link
Member Author

Here is the plan forward: I will merge this PR now, because it works, and expect to see our child process implementation switch to using posix_spawn on darwin in the future. I don't trust os/posix_spawn.zig (the deleted file from this PR) because it was written on top of posix_spawnp, and so I think it needs to be rewritten from scratch on top of posix_spawn this time. It also uses usingnamespace which I don't like, and I want the chance to review the PR before accepting it into the standard library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants