Skip to content

std::process::Command is broken on VMWare ESXi #129654

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
aapanfilovv opened this issue Aug 27, 2024 · 12 comments
Open

std::process::Command is broken on VMWare ESXi #129654

aapanfilovv opened this issue Aug 27, 2024 · 12 comments
Labels
A-process Area: `std::process` and `std::env` C-bug Category: This is a bug. O-linux Operating system: Linux T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@aapanfilovv
Copy link

std::process::Command is broken on VMWare ESXi, probably VMKernel does't support a SOCK_SEQPACKET

I tried this code:

// main.rs
use std::process::{Command, Stdio};

fn main() {
    let res = Command::new("ls")
        .arg("-l")
        .output()
        .unwrap();
    println!("status = {:?}", res.status);
    println!("stderr = {:?}", String::from_utf8_lossy(&res.stderr));
    println!("stdout = {:?}", String::from_utf8_lossy(&res.stdout));
}
# Cross.toml
[target.x86_64-unknown-linux-gnu]
image = "ghcr.io/cross-rs/x86_64-unknown-linux-gnu:0.2.2-centos"

Build with cross:
cross +nightly build -Zbuild-std=std --target x86_64-unknown-linux-gnu --release

I expected to see this happen: exit status code, stdout, stderr of ls process

Instead, this happened: panic with message: thread 'main' panicked at /home/root/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/process.rs:153:17: called `Result::unwrap()` on an `Err` value: Os { code: 25, kind: Uncategorized, message: "Inappropriate ioctl for device" }

But if I change the code a little:

// main.rs
use std::process::{Command, Stdio};

fn main() {
    let res = Command::new("ls")
        .arg("-l")
        .stderr(Stdio::null()) // <--
        .output()
        .unwrap();
    println!("status = {:?}", res.status);
    println!("stderr = {:?}", String::from_utf8_lossy(&res.stderr));
    println!("stdout = {:?}", String::from_utf8_lossy(&res.stdout));
}

Panic message is changed: thread 'main' panicked at src/main.rs:10:10: called `Result::unwrap()` on an `Err` value: Os { code: 22, kind: InvalidInput, message: "Invalid argument" }

Meta

rustc --version --verbose of current toolchain:

rustc 1.82.0-nightly (60d146580 2024-08-06)
binary: rustc
commit-hash: 60d146580c10036ce89e019422c6bc2fd9729b65
commit-date: 2024-08-06
host: x86_64-unknown-linux-gnu
release: 1.82.0-nightly
LLVM version: 19.1.0

rustc --version --verbose of working toolchain:

rustc 1.73.0-nightly (f88a8b71c 2023-08-08)
binary: rustc
commit-hash: f88a8b71cebb730cbd5058c45ebcae1d4d9be377
commit-date: 2023-08-08
host: x86_64-unknown-linux-gnu
release: 1.73.0-nightly
LLVM version: 17.0.0

rustc --version --verbose of first broken toolchain:

rustc 1.73.0-nightly (08d00b40a 2023-08-09)
binary: rustc
commit-hash: 08d00b40aef2017fe6dba3ff7d6476efa0c10888
commit-date: 2023-08-09
host: x86_64-unknown-linux-gnu
release: 1.73.0-nightly
LLVM version: 17.0.0

vmware -v:

VMware ESXi 8.0.1 build-21813344

Affected commit: #113939

@aapanfilovv aapanfilovv added the C-bug Category: This is a bug. label Aug 27, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 27, 2024
@jieyouxu jieyouxu added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Aug 27, 2024
@workingjubilee
Copy link
Member

Does ESXi claim that it replicates the functional properties of a Linux kernel of a certain version?

@Noratrieb Noratrieb added O-linux Operating system: Linux A-process Area: `std::process` and `std::env` and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Aug 27, 2024
@aapanfilovv
Copy link
Author

Does ESXi claim that it replicates the functional properties of a Linux kernel of a certain version?

No, but ESXi api is very similar to POSIX. I does't found any documentation about certain version.

@workingjubilee
Copy link
Member

workingjubilee commented Aug 27, 2024

Honestly, Linux is not "very similar to POSIX", so ESXi can't possibly be that similar to Linux if it is very similar to POSIX. Linux is a very specific take on POSIX. It deviates from the standard (or "extends", if you prefer) in many places. I struggle to think differently when most of our Unix code looks like "Linux" and then "Everyone Else". ESXi either has to deviate in the same way or it can't be expected to run our code compiled for Linux.

Linux is Linux, it's not VMware's ESXi, and we cannot coherently support random cut-up jobs of the Linux kernel API. At least when a distro patches the source, we can inspect their patches!

@workingjubilee
Copy link
Member

workingjubilee commented Aug 27, 2024

Mind, it's not that it's theoretically impossible for us to take on a patch to fix this... but we probably can't fix it ourselves (we would need someone experienced with ESXi to root-cause this, as there's more than just one or two possible failure points in that PR), and we probably will basically never test it for compatibility against ESXi. Not on our own, anyways, because Broadcom will not provide free licenses anymore. So the day after we accept such a patch, we might blithely merge something that breaks ESXi again, and that is because we will test the code works on Linux, not ESXi.

Most of the things I say are mutable, I think, if e.g. Broadcom had a sincere interest in providing sufficient technical expertise to make this work in our CI.

@aapanfilovv
Copy link
Author

aapanfilovv commented Aug 27, 2024

I think it could be added a target like a for Windows 7 support ( *-win7-windows-msvc), but you're right - there are few people interested in this.
I fixed it locally but didn't submit a PR since it's unlikely anyone else will need it, including Broadcom.

@workingjubilee
Copy link
Member

I think esxi-linux as a tier 3 would be on the table, yeah.

I'm curious what the patch is like, at least, unless it's just a revert.

@aapanfilovv
Copy link
Author

aapanfilovv commented Aug 27, 2024

I don't have time to look into the matter in more detail, so I just rolled back the changes:

# that patch for std/src/sys/pal/unix/process/process_unix.rs

diff --git a/process_unix.rs b/process_unix.rs
index 011be6a..210d2a9 100644
--- a/process_unix.rs	
+++ b/process_unix.rs
@@ -72,10 +72,6 @@ impl Command {
             return Ok((ret, ours));
         }
 
-        #[cfg(target_os = "linux")]
-        let (input, output) = sys::net::Socket::new_pair(libc::AF_UNIX, libc::SOCK_SEQPACKET)?;
-
-        #[cfg(not(target_os = "linux"))]
         let (input, output) = sys::pipe::anon_pipe()?;
 
         // Whatever happens after the fork is almost for sure going to touch or
@@ -95,10 +91,7 @@ impl Command {
             crate::panic::always_abort();
             mem::forget(env_lock); // avoid non-async-signal-safe unlocking
             drop(input);
-            #[cfg(target_os = "linux")]
-            if self.get_create_pidfd() {
-                self.send_pidfd(&output);
-            }
+
             let Err(err) = unsafe { self.do_exec(theirs, envp.as_ref()) };
             let errno = err.raw_os_error().unwrap_or(libc::EINVAL) as u32;
             let errno = errno.to_be_bytes();
@@ -122,14 +115,6 @@ impl Command {
         drop(env_lock);
         drop(output);
 
-        #[cfg(target_os = "linux")]
-        let pidfd = if self.get_create_pidfd() {
-            self.recv_pidfd(&input)
-        } else {
-            -1
-        };
-
-        #[cfg(not(target_os = "linux"))]
         let pidfd = -1;
 
         // Safety: We obtained the pidfd (on Linux) using SOCK_SEQPACKET, so it's valid.

And code now work:

// main.rs
use std::process::{Command, Stdio};

fn main() {
    let res = Command::new("ls")
        .arg("-l")
        .stderr(Stdio::null())
        .output()
        .unwrap();
    println!("status = {:?}", res.status);
    println!("stderr = {:?}", String::from_utf8_lossy(&res.stderr));
    println!("stdout = {:?}", String::from_utf8_lossy(&res.stdout));
}

@the8472
Copy link
Member

the8472 commented Aug 27, 2024

That's weird. Those branches should not be taken unless you call command.create_pidfd(true). So given your examples your patch should not make a difference.

@aapanfilovv
Copy link
Author

That's weird. Those branches should not be taken unless you call command.create_pidfd(true). So given your examples your patch should not make a difference.

Yeah, sorry, I post not a full patch. Edited.

@the8472
Copy link
Member

the8472 commented Aug 28, 2024

Ah well, then I guess it doesn't support SOCK_SEQPACKET unix sockets or creating it with SOCK_CLOEXEC.

@workingjubilee
Copy link
Member

Yeah, I don't think we could take a straight revert because it would be costing performance on true Linux systems.

@aapanfilovv
Copy link
Author

Yeah, I don't think we could take a straight revert because it would be costing performance on true Linux systems.

Yeah, I understand that. Issue created only for information purposes for users who develop for ESXi target (are there any at all?). Thx y'all for answer!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-process Area: `std::process` and `std::env` C-bug Category: This is a bug. O-linux Operating system: Linux T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

6 participants