std.start: initialize Windows console output CP to UTF-8 on exe startup #14411
Not disagreeing that this is the right thing to do, but it's worth noting that the console code page is part of the console host's state and so sticks around after the program that modified it exits. That might result in surprising behavior to some people. Also: What about the input code page? |
Ah, I didn't realise that this modifies the console globally, I assumed it was specific to this process invocation. That's not ideal, but I still think it's preferable to the status quo; I'd expect any process that relies on system codepage to set it specifically, no? (Especially since UTF-8 can be globally set as the default codepage, as in #7600 (comment)).
By my understanding, that's a more complex issue (see #5148 as linked by r00ster91), and I'm not sure of the optimal solution for it - in order to keep the current IO abstractions in place, we might need to add a flag to @r00ster91 |
Can you use ProcMon to check if |
Looking at the ReactOS source, this function's implementation is definitely not a simple wrapper: it's pretty complex. Moreover, its implementation seems to be completely different across ReactOS, Wine, and Windows. Since this only makes sense in userland anyway, I'm happy to conclude that this makes sense to stay at the |
I don't want to put a dependency on kernel32.dll in start.zig. |
An alternative approach would be to put this in I do think this use case needs a resolution, because right now if you just use UTF-8 (which is a pretty standard thing to do for anything that's not just outputting a stream of English text) your code simply breaks on Windows (of course, you yourself just ran into this). It's worth noting that this PR does allow opting out of the call simply using an |
I agree something needs to be done, and a dependency on kernel32.dll would be better than nothing. But we can do even better than that if we look into it more deeply. At the very least, such code can be omitted when the subsystem is not console. The reason for avoiding kernel32 in favor of ntdll is that kernel32 is high level code, not syscalls, and often does problematic things such as allocating heap memory and panicking on OOM, or hiding the actual capabilities of the system such as open directory handles, and the ability to create a directory and open it at the same time. Or, implementing operations with multiple syscalls unnecessarily, or obscuring the real error code. The equivalent problem exists on other systems too. See for example #14866 where the problem was that libc had a bunch of garbage logic wrapping the actual syscall, which caused a real bug in practice due to the error code not bubbling up properly. I insist on at least inspecting the DLL stack trace in ProcMon before merging this PR. I'm happy to do that work; just leave the PR open and I'll get to it after I get to the other ~30 PRs in line before it. |
I'm at least confident in saying that the implementations are completely different across React, Wine, and Windows. React lowers it to a Regardless, thank you for the update - I'll wait for you to look further into it. |
|
I've tried to translate the wine implementation, but I keep getting

const std = @import("std");
const windows = std.os.windows;
const kernel32 = windows.kernel32;
const condrv_input_info_params = extern struct {
    /// Setting mask.
    mask: c_uint,
    info: extern struct {
        /// Console input codepage.
        input_cp: c_uint,
        /// Console output codepage.
        output_cp: c_uint,
        /// Number of available input records.
        input_count: c_uint,
    },
};
const SET_CONSOLE_INPUT_INFO = struct {
    const INPUT_CODEPAGE = 0x01;
    const OUTPUT_CODEPAGE = 0x02;
};
inline fn ctlCode(device_type: FILE_DEVICE, function: u32, method: METHOD, access: u32) u32 {
    return @enumToInt(device_type) << 16 | access << 14 | function << 2 | @enumToInt(method);
}
const FILE_ANY_ACCESS = 0;
const FILE_SPECIAL_ACCESS = 0;
const FILE_READ_ACCESS = windows.FILE_READ_DATA;
const FILE_WRITE_ACCESS = windows.FILE_WRITE_DATA;
const FILE_DEVICE = enum(u32) {
    BEEP = 0x00000001,
    CD_ROM = 0x00000002,
    CD_ROM_FILE_SYSTEM = 0x00000003,
    CONTROLLER = 0x00000004,
    DATALINK = 0x00000005,
    DFS = 0x00000006,
    DISK = 0x00000007,
    DISK_FILE_SYSTEM = 0x00000008,
    FILE_SYSTEM = 0x00000009,
    INPORT_PORT = 0x0000000a,
    KEYBOARD = 0x0000000b,
    MAILSLOT = 0x0000000c,
    MIDI_IN = 0x0000000d,
    MIDI_OUT = 0x0000000e,
    MOUSE = 0x0000000f,
    MULTI_UNC_PROVIDER = 0x00000010,
    NAMED_PIPE = 0x00000011,
    NETWORK = 0x00000012,
    NETWORK_BROWSER = 0x00000013,
    NETWORK_FILE_SYSTEM = 0x00000014,
    NULL = 0x00000015,
    PARALLEL_PORT = 0x00000016,
    PHYSICAL_NETCARD = 0x00000017,
    PRINTER = 0x00000018,
    SCANNER = 0x00000019,
    SERIAL_MOUSE_PORT = 0x0000001a,
    SERIAL_PORT = 0x0000001b,
    SCREEN = 0x0000001c,
    SOUND = 0x0000001d,
    STREAMS = 0x0000001e,
    TAPE = 0x0000001f,
    TAPE_FILE_SYSTEM = 0x00000020,
    TRANSPORT = 0x00000021,
    UNKNOWN = 0x00000022,
    VIDEO = 0x00000023,
    VIRTUAL_DISK = 0x00000024,
    WAVE_IN = 0x00000025,
    WAVE_OUT = 0x00000026,
    @"8042_PORT" = 0x00000027,
    NETWORK_REDIRECTOR = 0x00000028,
    BATTERY = 0x00000029,
    BUS_EXTENDER = 0x0000002a,
    MODEM = 0x0000002b,
    VDM = 0x0000002c,
    MASS_STORAGE = 0x0000002d,
    SMB = 0x0000002e,
    KS = 0x0000002f,
    CHANGER = 0x00000030,
    SMARTCARD = 0x00000031,
    ACPI = 0x00000032,
    DVD = 0x00000033,
    FULLSCREEN_VIDEO = 0x00000034,
    DFS_FILE_SYSTEM = 0x00000035,
    DFS_VOLUME = 0x00000036,
    SERENUM = 0x00000037,
    TERMSRV = 0x00000038,
    KSEC = 0x00000039,
    FIPS = 0x0000003a,
    INFINIBAND = 0x0000003b,
    VMBUS = 0x0000003e,
    CRYPT_PROVIDER = 0x0000003f,
    WPD = 0x00000040,
    BLUETOOTH = 0x00000041,
    MT_COMPOSITE = 0x00000042,
    MT_TRANSPORT = 0x00000043,
    BIOMETRIC = 0x00000044,
    PMI = 0x00000045,
    EHSTOR = 0x00000046,
    DEVAPI = 0x00000047,
    GPIO = 0x00000048,
    USBEX = 0x00000049,
    CONSOLE = 0x00000050,
    NFP = 0x00000051,
    SYSENV = 0x00000052,
    VIRTUAL_BLOCK = 0x00000053,
    POINT_OF_SERVICE = 0x00000054,
    STORAGE_REPLICATION = 0x00000055,
    TRUST_ENV = 0x00000056,
    UCM = 0x00000057,
    UCMTCPCI = 0x00000058,
    PERSISTENT_MEMORY = 0x00000059,
    NVDIMM = 0x0000005a,
    HOLOGRAPHIC = 0x0000005b,
    SDFXHCI = 0x0000005c,
};
const METHOD = enum(u32) {
    BUFFERED = 0,
    IN_DIRECT = 1,
    OUT_DIRECT = 2,
    NEITHER = 3,
};
const IOCTL_CONDRV_GET_INPUT_INFO = ctlCode(.CONSOLE, 15, .BUFFERED, FILE_READ_ACCESS);
const IOCTL_CONDRV_SET_INPUT_INFO = ctlCode(.CONSOLE, 16, .BUFFERED, FILE_WRITE_ACCESS);
pub extern "ntdll" fn RtlGetCurrentPeb() callconv(windows.WINAPI) *windows.PEB;
pub fn main() !void {
    const writer = std.io.getStdOut().writer();
    var params = condrv_input_info_params{
        .mask = SET_CONSOLE_INPUT_INFO.OUTPUT_CODEPAGE,
        .info = .{ .input_cp = 0, .output_cp = 65001, .input_count = 0 },
    };
    const stdout = try windows.GetStdHandle(windows.STD_OUTPUT_HANDLE);
    var io: windows.IO_STATUS_BLOCK = undefined;
    const status = windows.ntdll.NtDeviceIoControlFile(
        stdout, //RtlGetCurrentPeb().*.ProcessParameters.ConsoleHandle,
        null,
        null,
        null,
        &io,
        IOCTL_CONDRV_SET_INPUT_INFO,
        &params,
        @sizeOf(condrv_input_info_params),
        null,
        0,
    );
    try writer.print("status: {}\n", .{status});
}
That's because, like I say, these are driver-defined constants. |
Some info from using NtTrace with the following code:

const std = @import("std");
pub fn main() !void {
    _ = std.os.windows.kernel32.SetConsoleOutputCP(65001);
}

The relevant ntdll calls from a
(for some reason it does two
where the
and
respectively (I'm assuming they are both pointing to memory that is 8 bytes long but that could be wrong). Note that
EDIT: I think the bytes with the 65001 correspond to this struct, which would mean
EDIT#2: I think the
EDIT#3: If the above is true, that would mean the first pointer might be pointing to this struct, and
However, note that my method of looking at these bytes was extremely janky, so they might not even be the actual values that I was still unable to construct a successful
My failed attempt

const std = @import("std");
const windows = std.os.windows;
pub extern "ntdll" fn RtlGetCurrentPeb() callconv(windows.WINAPI) *windows.PEB;
pub fn main() !void {
    const ptr_val1 = "\x04\x00\x00\x02\x08\x00\x00\x00".*;
    // I'm assuming the first 8 bytes are the only ones that matter but I've included more just in case
    const ptr_val2 = "\xe9\xfd\x00\x00\x01\x7f\x00\x00p\xa3j\x19\xf6\x01\x00\x00@\x8fk\x19\xf6\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xa0\xf56/5\x00\x00".*;
    const a = "\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x10\x00\x00\x00 \x00\x00\x00".*;
    const b = "\x08\x00\x00\x00\x00\x00\x00\x00".*;
    var buf: [0x30]u8 = undefined;
    var fbs = std.io.fixedBufferStream(&buf);
    var writer = fbs.writer();
    writer.writeAll(&a) catch unreachable;
    writer.writeIntLittle(usize, @ptrToInt(&ptr_val1)) catch unreachable;
    writer.writeAll(&b) catch unreachable;
    writer.writeIntLittle(usize, @ptrToInt(&ptr_val2)) catch unreachable;
    const control_code = 0x00500016;
    var io: windows.IO_STATUS_BLOCK = undefined;
    const status = windows.ntdll.NtDeviceIoControlFile(
        RtlGetCurrentPeb().*.ProcessParameters.ConsoleHandle,
        null,
        null,
        null,
        &io,
        control_code,
        &buf,
        buf.len,
        null,
        0,
    );
    std.debug.print("status: {}\n", .{status});
}

It fails with
and NtTrace gives:
|
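For reference, the two control codes that appear in this thread can be compared by packing and unpacking them with the standard Windows CTL_CODE bit layout (device type in bits 16 and up, access in bits 14-15, function in bits 2-13, method in bits 0-1). The sketch below is not from the PR; the helper names `ctlCode` and `decodeCtlCode` and the `CtlFields` struct are made up for illustration, and it only does the arithmetic, it does not call into Windows.

```zig
const std = @import("std");

/// Hypothetical container for the four CTL_CODE fields.
const CtlFields = struct {
    device_type: u32,
    access: u32,
    function: u32,
    method: u32,
};

/// Packs the fields the same way the Windows CTL_CODE macro does.
fn ctlCode(device_type: u32, function: u32, method: u32, access: u32) u32 {
    return (device_type << 16) | (access << 14) | (function << 2) | method;
}

/// Splits a control code back into its fields.
fn decodeCtlCode(code: u32) CtlFields {
    return .{
        .device_type = code >> 16,
        .access = (code >> 14) & 0x3,
        .function = (code >> 2) & 0xFFF,
        .method = code & 0x3,
    };
}

pub fn main() void {
    // Wine-style constant from the earlier translation: CONSOLE device (0x50),
    // function 16, METHOD_BUFFERED (0), FILE_WRITE_ACCESS (2) -> 0x00508040.
    const wine_set_input_info = ctlCode(0x50, 16, 0, 2);
    std.debug.print("wine-style code: 0x{x}\n", .{wine_set_input_info});

    // Code used in the attempt above: device 0x50, access 0,
    // function 5, method 2 (OUT_DIRECT).
    const observed = decodeCtlCode(0x00500016);
    std.debug.print("device=0x{x} access={d} function={d} method={d}\n", .{
        observed.device_type,
        observed.access,
        observed.function,
        observed.method,
    });
}
```

The two values differ in function number, method, and access bits, which is consistent with the earlier remark that these are driver-defined constants rather than something shared between Wine and Windows.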
Got something working. The fix, strangely, was to make the input
Working example:

const std = @import("std");
const windows = std.os.windows;
pub extern "ntdll" fn RtlGetCurrentPeb() callconv(windows.WINAPI) *windows.PEB;
const CONSOLE_MSG_HEADER = extern struct {
    ApiNumber: u32,
    ApiDescriptorSize: u32,
};
// This is actually one value in an enum but we just care about this for now
const ConsolepSetCP = (2 << 24) + 4;
const CONSOLE_SETCP_MSG = extern struct {
    CodePage: u32,
    Output: bool,
};
const CONSOLE_MSG_L2 = extern struct {
    Header: CONSOLE_MSG_HEADER,
    // This is actually a union of other types but we just care about this for now
    Body: CONSOLE_SETCP_MSG,
};
const UNKNOWN_IOCTL_INPUT = extern struct {
    a: u32 = 0,
    b: u32 = 0,
    c: u32 = 1,
    d: u32 = 1,
    e: u32 = 0x10,
    f: u32 = 0x20,
    header: *CONSOLE_MSG_HEADER,
    g: u32 = 0x8,
    h: u32 = 0,
    body: *CONSOLE_SETCP_MSG,
};
pub fn main() !void {
    std.debug.print("⚡\n", .{});
    const control_code = 0x00500016;
    var console_msg = CONSOLE_MSG_L2{
        .Header = .{ .ApiNumber = ConsolepSetCP, .ApiDescriptorSize = @sizeOf(CONSOLE_SETCP_MSG) },
        .Body = .{ .CodePage = 65001, .Output = true },
    };
    var input = UNKNOWN_IOCTL_INPUT{
        .header = &console_msg.Header,
        .body = &console_msg.Body,
    };
    var io: windows.IO_STATUS_BLOCK = undefined;
    const status = windows.ntdll.NtDeviceIoControlFile(
        RtlGetCurrentPeb().*.ProcessParameters.ConsoleHandle,
        null,
        null,
        null,
        &io,
        control_code,
        &input,
        @sizeOf(UNKNOWN_IOCTL_INPUT),
        null,
        0,
    );
    std.debug.print("status: {}\n", .{status});
    std.debug.print("⚡\n", .{});
}

Outputs:
(that's with code page 437 set when the program starts, run For completeness, here's a hexdump of the
|
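As a sanity check on the layout used in the working example, the field offsets of the input struct can be asserted at compile time. This is only a sketch: it copies the struct definitions from the example above, assumes a 64-bit target (8-byte pointers), and says nothing about whether Windows actually interprets the fields this way. It shows that the struct comes out at 0x30 bytes, the same size as the hand-built buffer in the earlier failed attempt.

```zig
const std = @import("std");

const CONSOLE_MSG_HEADER = extern struct {
    ApiNumber: u32,
    ApiDescriptorSize: u32,
};

const CONSOLE_SETCP_MSG = extern struct {
    CodePage: u32,
    Output: bool,
};

// Same field order as the working example; the meaning of a..h is unknown.
const UNKNOWN_IOCTL_INPUT = extern struct {
    a: u32 = 0,
    b: u32 = 0,
    c: u32 = 1,
    d: u32 = 1,
    e: u32 = 0x10,
    f: u32 = 0x20,
    header: *CONSOLE_MSG_HEADER,
    g: u32 = 0x8,
    h: u32 = 0,
    body: *CONSOLE_SETCP_MSG,
};

comptime {
    // Assumption: these offsets only hold when pointers are 8 bytes wide.
    if (@sizeOf(usize) == 8) {
        std.debug.assert(@offsetOf(UNKNOWN_IOCTL_INPUT, "header") == 24);
        std.debug.assert(@offsetOf(UNKNOWN_IOCTL_INPUT, "g") == 32);
        std.debug.assert(@offsetOf(UNKNOWN_IOCTL_INPUT, "h") == 36);
        std.debug.assert(@offsetOf(UNKNOWN_IOCTL_INPUT, "body") == 40);
        std.debug.assert(@sizeOf(UNKNOWN_IOCTL_INPUT) == 0x30);
    }
}

pub fn main() void {
    std.debug.print("input size: 0x{x}\n", .{@sizeOf(UNKNOWN_IOCTL_INPUT)});
}
```

Keeping such assertions next to a reverse-engineered struct makes it obvious when a different target or a field change silently shifts the layout.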
Made a PR in your fork against this branch to avoid the |
The thing is, that solution doesn't work in Wine, and IMO we should strive to make Wine work. Moreover, since this looks to be an implementation detail nothing depends on (evidenced by Wine and React doing it differently), afaict it's entirely possible Microsoft change it one day |
In addition to the above: If memory serves from past discussions on the microsoft/terminal repo, console handles saw a significant rework in Windows 8. Among other things, I think they were changed so that various So even if you ignore Wine and ReactOS, you're still looking at compatibility issues just for Windows proper. It still might be workable, just something to keep in mind. |
Agreed about the |
I ran another test for WriteConsoleW and ran it through NtTrace to see if there was an easy ntdll function to use, but it seems like it's another ioctl: relevant trace:
And the code:

const std = @import("std");
const windows = std.os.windows;
const kernel32 = windows.kernel32;
pub extern "kernel32" fn WriteConsoleW(
    hConsoleOutput: *anyopaque,
    lpBuffer: *const anyopaque,
    nNumberOfCharsToWrite: u32,
    lpNumberOfCharsWritten: ?*u32,
    lpReserved: ?*anyopaque,
) callconv(windows.WINAPI) windows.BOOL;
const L = std.unicode.utf8ToUtf16LeStringLiteral;
const foo = L("foobar\n");
pub fn main() !void {
    while (true) {
        _ = WriteConsoleW(
            windows.peb().ProcessParameters.hStdOutput,
            foo[0..],
            @truncate(u32, foo.len),
            null,
            null,
        );
    }
}
Nice work @squeek502 on figuring out what you did. That's some impressive sleuthing. At this point I'm convinced that our two options forward are:
|
@squeek502 do you have any opinion or suggestion on the path forward here? |
@andrewrk it's a tricky problem and I don't feel like I have an answer yet. I think something like #12400 should be looked into more to see what the ramifications would be, since (if I understand correctly), it'd mean that the Zig standard library would bypass the code page setting and write/read via UTF-16. Until that option is ruled out/in, though, I feel like I don't have enough information to make a good decision here. |
I think I'm actually OK with this, since it's configurable by the application. However, it should observe the subsystem and default to false in that case. The subsystem is observable via std.builtin.subsystem.
inline fn setupWindows() void {
    if (std.options.windows_force_utf8_codepage) {
        _ = std.os.windows.kernel32.SetConsoleOutputCP(65001); // use UTF-8 codepage
    }
Seems that this would be on the same level of prescription as enabling ansi escape codes here, could sneak in something like
    }
    }
    if (std.options.windows_force_virtual_terminal) {
        const ENABLE_VIRTUAL_TERMINAL_PROCESSING = 4;
        var stdout_mode: u32 = undefined;
        _ = std.os.windows.kernel32.GetConsoleMode(std.io.getStdOut().handle, &stdout_mode);
        stdout_mode |= ENABLE_VIRTUAL_TERMINAL_PROCESSING;
        _ = std.os.windows.kernel32.SetConsoleMode(std.io.getStdOut().handle, stdout_mode);
    }
Closing abandoned PR. This issue is tracked at #7600, where you can also find a link to this PR in case someone wants to revive it. |
Windows treats console output outside the ASCII range as being part of its "codepage", which is very much not UTF-8 by default. This can be annoying when writing programs on Windows - you need to put std.os.windows.kernel32.SetConsoleOutputCP somewhere. Since all of std is based on UTF-8 output, it makes sense to do this automatically in most cases, so here we do it in std.start, but overrideable with a root option.

(I put the logic for this in its own function because I don't doubt there'll be similar weird things we end up having to do to make Windows act somewhat like a normal platform)