Skip to content

zig cc: support reading from stdin via "-x LANG -" #14462

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 17, 2023

Conversation

motiejus
Copy link
Contributor

echo 'some C program' | $CC -x c -

Is a common pattern to test for compiler or linker features. This PR adds support for reading from non-regular files.

This will make at least one more Go test to pass.

@motiejus
Copy link
Contributor Author

motiejus commented Jan 26, 2023

The build fails like this:

: && /home/ci/deps/zig+llvm+lld+clang-x86_64-linux-musl-0.11.0-dev.971+19056cb68/bin/zig  cc -target x86_64-linux-musl -mcpu=baseline -g -rdynamic CMakeFiles/zig1.dir/zig1.c.o CMakeFiles/zig1.dir/stage1/wasi.c.o -o zig1  -lm && :
ld.lld: error: undefined symbol: wasi_snapshot_preview1_fd_seek
>>> referenced by zig1.c:31166

However, I can see this in wasm.zig:

pub extern "wasi_snapshot_preview1" fn fd_seek(fd: fd_t, offset: filedelta_t, whence: whence_t, newoffset: *filesize_t) errno_t;

Edit: fixed.

@andrewrk
Copy link
Member

andrewrk commented Feb 1, 2023

Have you considered the alternate implementation using clang_passthrough_mode instead of buffering the input to a temporary file?

I don't want to support the use case of stdin as a root source file in any case other than C/C++ compiler compatibility mode.

@motiejus motiejus closed this Feb 1, 2023
@motiejus
Copy link
Contributor Author

motiejus commented Feb 2, 2023

Have you considered the alternate implementation using clang_passthrough_mode instead of buffering the input to a temporary file?

I ruled it out for a reason I can no longer remember.

This is how I understand it: if the input file is -, set clang_passthrough_mode = true for the zig clang command (which will cause it to read from stdin), and then do some dancing with the filenames from the subsequent steps like linking.

I will look at this again.

I don't want to support the use case of stdin as a root source file in any case other than C/C++ compiler compatibility mode.

I agree.

@motiejus motiejus reopened this Feb 2, 2023
@motiejus
Copy link
Contributor Author

motiejus commented Feb 2, 2023

I started poking at this and remember better now. I did indeed consider the clang passthrough mode; we had this discussion before (around Oct 2022) and you advised me to slurp the file and pass it on to the remaining compilation chain as a regular file. :)

Here was my argument: everything in src/main.zig, src/Compilation.zig and src/Cache.zig assume that all the input files are files. So we have 2 choices:

  • add special handling for stdin in those 3 files, or
  • slurp stdin and keep assuming everything is a file. Like in this PR.

For a taste on how many things will need to change to adjust this, here is a start:

diff --git a/src/Compilation.zig b/src/Compilation.zig
index 7d42d3b61..8b351270e 100644
--- a/src/Compilation.zig
+++ b/src/Compilation.zig
@@ -210,7 +210,10 @@ pub const LangToExt = std.ComptimeStringMap(FileExt, .{
 
 /// For passing to a C compiler.
 pub const CSourceFile = struct {
-    src_path: []const u8,
+    src: union(enum) {
+        stdin,
+        path: []const u8,
+    },
     extra_flags: []const []const u8 = &.{},
     /// Same as extra_flags except they are not added to the Cache hash.
     cache_exempt_flags: []const []const u8 = &.{},

There are many places in code that will change as a result of this. I can do it, but:

  • it will become more complex due to this single case.
  • we will lose caching for all compilations that arrive over stdin. This may or may not be a good thing.

Let me know.

@motiejus
Copy link
Contributor Author

motiejus commented Feb 2, 2023

I can reduce the surface area of this feature this way: allow only - as a non-file entry; all other non-files except - would be not allowed.

motiejus added a commit to motiejus/zig that referenced this pull request Mar 17, 2023
This is useful for tests that want to `execve` zig directly. The string
is already null-terminated, so this will just expose it as such,
removing an extra allocation from the test.

Will be used in ziglang#14462
@motiejus
Copy link
Contributor Author

Rebased, rewrote the test for the "build system v2".

Also, simplified the change to only account for - as stdin. No other streams will be accepted.

andrewrk pushed a commit that referenced this pull request Mar 17, 2023
This is useful for tests that want to `execve` zig directly. The string
is already null-terminated, so this will just expose it as such,
removing an extra allocation from the test.

Will be used in #14462
@motiejus motiejus force-pushed the infile_stdin branch 2 times, most recently from eb6f34d to 90e6657 Compare March 24, 2023 16:56
truemedian pushed a commit to truemedian/zig that referenced this pull request Mar 30, 2023
This is useful for tests that want to `execve` zig directly. The string
is already null-terminated, so this will just expose it as such,
removing an extra allocation from the test.

Will be used in ziglang#14462
Copy link
Member

@andrewrk andrewrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this. I have some requested changes. Let me know if you want any help with them.

src/main.zig Outdated
Comment on lines 3150 to 3153
// if a c_source_file is "-", dump it to a tempdir and replace the src_name
// with the newly created file. translate-c works fine with streams; other
// actions don't.
if (arg_mode != .translate_c) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do you get the idea that translate-c handles - arguments? I don't see that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does handle /dev/stdin well without the intermediate files.

The previous commits have support for any streams (so /dev/stdin would work with anything), but later I decided to drop support for everything but -. That simplifies things a lot.

This comment is a left-over from "those times".

test/tests.zig Outdated
Comment on lines 1140 to 1249
fn exec(
allocator: std.mem.Allocator,
cwd: []const u8,
expect_code: u8,
argv: []const []const u8,
) !std.ChildProcess.ExecResult {
const max_output_size = 100 * 1024;
const result = std.ChildProcess.exec(.{
.allocator = allocator,
.argv = argv,
.cwd = cwd,
.max_output_bytes = max_output_size,
}) catch |err| {
std.debug.print("The following command failed:\n", .{});
printCmd(cwd, argv);
return err;
};
switch (result.term) {
.Exited => |code| {
if (code != expect_code) {
std.debug.print(
"The following command exited with error code {}, expected {}:\n",
.{ code, expect_code },
);
printCmd(cwd, argv);
std.debug.print("stderr:\n{s}\n", .{result.stderr});
return error.CommandFailed;
}
},
else => {
std.debug.print("The following command terminated unexpectedly:\n", .{});
printCmd(cwd, argv);
std.debug.print("stderr:\n{s}\n", .{result.stderr});
return error.CommandFailed;
},
}
return result;
}

fn printCmd(cwd: []const u8, argv: []const []const u8) void {
std.debug.print("cd {s} && ", .{cwd});
for (argv) |arg| {
std.debug.print("{s} ", .{arg});
}
std.debug.print("\n", .{});
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should not be added

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which section(s) are you referring to? It's not quite clear from the comment.

test/tests.zig Outdated
Comment on lines 779 to 814
if (builtin.os.tag != .windows) {
const tmp_path = b.makeTempPath();
var dir = std.fs.cwd().openDir(tmp_path, .{}) catch @panic("unhandled");
dir.writeFile("truth.c", "int main() { return 42; }") catch @panic("unhandled");
var infile = dir.openFile("truth.c", .{}) catch @panic("unhandled");

const outfile = std.fs.path.joinZ(
b.allocator,
&[_][]const u8{ tmp_path, "truth" },
) catch @panic("unhandled");
const pid_result = std.os.fork() catch @panic("unhandled");
if (pid_result == 0) { // child
std.os.dup2(infile.handle, std.os.STDIN_FILENO) catch @panic("unhandled");
const argv = &[_:null]?[*:0]const u8{
b.zig_exe, "cc",
"-o", outfile,
"-x", "c",
"-",
};
const envp = &[_:null]?[*:0]const u8{
std.fmt.allocPrintZ(b.allocator, "ZIG_GLOBAL_CACHE_DIR={s}", .{tmp_path}) catch @panic("unhandled"),
};
const err = std.os.execveZ(b.zig_exe, argv, envp);
std.debug.print("execve error: {any}\n", .{err});
std.os.exit(1);
}

const res = std.os.waitpid(pid_result, 0);
assert(0 == res.status);

// run the compiled executable and check if it's telling the truth.
_ = exec(b.allocator, tmp_path, 42, &[_][]const u8{outfile}) catch @panic("unhandled");

const cleanup = b.addRemoveDirTree(tmp_path);
step.dependOn(&cleanup.step);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test as is does the following things directly in the configure phase rather than using build steps like it must do in order to play nicely with the other build steps:

  • making a temp dir and writing a file
  • calling fork, execve, and wait
  • calling exec

If you look at the Test Godbolt API code block above, it shows how to use the build system API for testing. It will be a combination of WriteFileStep, and RunStep.

I'm happy to provide assistance in writing this test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally did everything in the configure phase, because I couldn't find how to execute a command and change it's stdin file descriptor.

After rebasing this test breaks zig build altogether, so I commented the full test.

I would really appreciate help here. Not asking to write the full code, rather a sequence of how Step , RunStep should be structured, keeping in mind the "need to pass this fd to stdin, which was acquired earlier by opening that temporary file".

@motiejus
Copy link
Contributor Author

Note to myself: handle this case: #15338 (comment)

@motiejus
Copy link
Contributor Author

I have addressed all the review comments for the code, but not for the tests. Need some help there.

@motiejus motiejus requested a review from andrewrk April 25, 2023 09:40
@motiejus motiejus changed the title zig cc: support reading from non-files zig cc: support reading from stdin via -x <lang> - Apr 25, 2023
@motiejus motiejus changed the title zig cc: support reading from stdin via -x <lang> - zig cc: support reading from stdin via -x LANG - Apr 25, 2023
@motiejus motiejus changed the title zig cc: support reading from stdin via -x LANG - zig cc: support reading from stdin via "-x LANG -" Apr 25, 2023
motiejus and others added 3 commits May 16, 2023 16:40
    echo 'some C program' | $CC -x c -

Is a common pattern to test for compiler or linker features. This patch
adds support for reading from non-regular files.

This will make at least one more Go test to pass.
 * no need to move `tmpFilePath` around
 * no need for calculating max length of `FileExt` tag name
 * provide a canonical file extension name for `FileExt` so that, e.g.
   the file will be named `stdin.S` instead of
   `stdin.assembly_with_cpp`.
 * move temp file cleanup to a function to reduce defer bloat in a large
   function.
 * fix bug caused by mixing relative and absolute paths in the cleanup
   logic.
 * remove commented out test and dead code
In one of the happy paths, execve() is used to switch to clang in which
case any cleanup logic that exists for this temporary file will not run
and this temp file will be leaked. Oh well. It's a minor punishment for
using `-x c` which nobody should be doing. Therefore, we make no effort
to clean up. Using `-` for stdin as a source file always leaks a temp
file.

Note that the standard `zig build-exe` CLI does not support stdin as an
input file. This is only for `zig cc` C compiler compatibility.
@andrewrk andrewrk enabled auto-merge May 16, 2023 23:41
@andrewrk
Copy link
Member

Alright, I have made some executive decisions and finished this patch. The final diff ended up being quite small and tidy.

I removed the dead code added to tests.zig; adding test coverage for this CLI usage can be a separate enhancement. Here is a clue should you wish to pursue this: you'll want to have a Run step with the stdin field populated with your C source code, and the stdio field set to check with some checks against stderr or stdout. If you want to run the created binary, you can use addOutputArg on the Run step, and use that with a second Run step with addFileSourceArg.

I have tested this locally and force-pushed to your branch, with auto-merge enabled.

Comment on lines +3035 to +3040
// Note that in one of the happy paths, execve() is used to switch
// to clang in which case any cleanup logic that exists for this
// temporary file will not run and this temp file will be leaked.
// Oh well. It's a minor punishment for using `-x c` which nobody
// should be doing. Therefore, we make no effort to clean up. Using
// `-` for stdin as a source file always leaks a temp file.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could minimize the damage by:

  • hashing the file contents (while writing to the file, so no additional allocations)
  • renaming the filename to the hash.

That way, we avoid a small chance of hash collision (I know) and multiple invocations of echo | zig cc <...> produce only one garbage-file.

Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, sounds reasonable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewrk andrewrk merged commit fd213ac into ziglang:master May 17, 2023
@motiejus motiejus deleted the infile_stdin branch May 19, 2023 11:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants