std.start: initialize Windows console output CP to UTF-8 on exe startup #14411

Closed
wants to merge 1 commit into from

Member

@mlugg mlugg commented Jan 22, 2023

Windows treats console output outside the ASCII range as being part of its "codepage", which is very much not UTF-8 by default. This can be annoying when writing programs on Windows - you need to put std.windows.kernel32.SetConsoleOutputCP somewhere. Since all of std is based on UTF-8 output, it makes sense to do this automatically in most cases, so here we do it in std.start, but overrideable with a root option.

(I put the logic for this in its own function because I don't doubt there'll be similar weird things we end up having to do to make Windows act somewhat like a normal platform)

@mlugg mlugg force-pushed the feat/windows-start-utf8 branch 3 times, most recently from 0ebbe8f to 7dc74db Compare January 22, 2023 13:37
@alexrp
Member

alexrp commented Jan 22, 2023

Not disagreeing that this is the right thing to do, but it's worth noting that the console code page is part of the console host's state and so sticks around after the program that modified it exits. That might result in surprising behavior to some people.

Also: What about the input code page?
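To make the concern about lingering console state concrete, here is a minimal sketch (not part of the PR) of a program that records the output code page on entry and restores it on exit. It is Windows-only and assumes a Zig version contemporary with this thread, where `std.os.windows.WINAPI` is available. The two kernel32 functions are declared locally so the sketch does not rely on which bindings a given std version exposes.

```zig
const std = @import("std");
const windows = std.os.windows;

// Local declarations of the documented kernel32 entry points.
extern "kernel32" fn GetConsoleOutputCP() callconv(windows.WINAPI) c_uint;
extern "kernel32" fn SetConsoleOutputCP(wCodePageID: c_uint) callconv(windows.WINAPI) windows.BOOL;

const CP_UTF8: c_uint = 65001;

pub fn main() void {
    // GetConsoleOutputCP returns 0 on failure (for example, no console attached).
    const previous = GetConsoleOutputCP();
    _ = SetConsoleOutputCP(CP_UTF8);

    // Put the console host back the way we found it when main returns.
    defer {
        if (previous != 0) {
            _ = SetConsoleOutputCP(previous);
        }
    }

    std.debug.print("⚡\n", .{});
}
```

The restore only runs on a normal return from `main`; a crash or a killed process would still leave the console on code page 65001, so this reduces the surprise rather than removing it.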

@wooster0
Contributor

wooster0 commented Jan 22, 2023

Does this fix #7600 or #5148?
See also #12400 where I think there's already been a lot of discussion.

@mlugg
Member Author

mlugg commented Jan 22, 2023

@alexrp

> Not disagreeing that this is the right thing to do, but it's worth noting that the console code page is part of the console host's state and so sticks around after the program that modified it exits. That might result in surprising behavior to some people.

Ah, I didn't realise that this modifies the console globally, I assumed it was specific to this process invocation. That's not ideal, but I still think it's preferable to the status quo; I'd expect any process that relies on system codepage to set it specifically, no? (Especially since UTF-8 can be globally set as the default codepage, as in #7600 (comment)).

> Also: What about the input code page?

By my understanding, that's a more complex issue (see #5148 as linked by r00ster91), and I'm not sure of the optimal solution for it - in order to keep the current IO abstractions in place, we might need to add a flag to std.fs.File on Windows indicating whether it's a TTY stdio stream. (#12400 originally went for a solution where we split stdio streams into a separate type, but I don't really like that - it's not a big deal to make File a tiny bit bigger on Windows.) If we do that, this fix might not be necessary (instead outputting data differently based on that flag), but I'm not sure what the best option would be in that case, and can't actually test Windows stuff myself, so don't feel comfortable working on that. This solution might not be permanent, but it's a definite improvement on the status quo (IIRC we've had 2 threads in Discord quite recently where people were confused why UTF-8 printing doesn't work on Windows, it's a not-insignificant source of frustration, and outputting UTF-8 is likely a fair bit more common than inputting it, in no small part due to the existence of emoji).

@r00ster91
This fixes #7600, but not #5148. Thanks for the link to #12400, that was helpful to look at!

@andrewrk andrewrk force-pushed the feat/windows-start-utf8 branch from 7dc74db to ae6a989 Compare January 27, 2023 06:08
@andrewrk
Member

Can you use ProcMon to check if SetConsoleOutputCP is calling an NtDll function, or inspect the Wine or ReactOS source code to find out the same? See #1840.

@mlugg
Member Author

mlugg commented Mar 14, 2023

Looking at the ReactOS source, this function's implementation is definitely not a simple wrapper: it's pretty complex. Moreover, its implementation seems to be completely different across ReactOS, Wine, and Windows. Since this only makes sense in userland anyway, I'm happy to conclude that this makes sense to stay at the kernel32 level.

@andrewrk
Member

I don't want to put a dependency on kernel32.dll in start.zig.

@mlugg
Member Author

mlugg commented Mar 14, 2023

An alternative approach would be to put this in getStdOut or similar (to be called the first time it's gotten). In what cases are kernel32 unavailable? Skimming that issue, it seems like the main one is kernel/driver development. Is there some way we can detect such cases and omit the call? I'm not hugely familiar with Windows development in general to be honest, so wouldn't know.

I do think this use case needs a resolution, because right now if you just use UTF-8 (which is a pretty standard thing to do for anything that's not just outputting a stream of English text) your code simply breaks on Windows (of course, you yourself just ran into this). It's worth noting that this PR does allow opting out of the call simply using an std_options flag if you do need to.

@andrewrk
Member

I agree something needs to be done, and a dependency on kernel32.dll would be better than nothing. But we can do even better than that if we look into it more deeply. At the very least, such code can be omitted when the subsystem is not console.

The reason for avoiding kernel32 in favor of ntdll is that kernel32 is high level code, not syscalls, and often does problematic things such as allocating heap memory and panicking on OOM, or hiding the actual capabilities of the system such as open directory handles, and the ability to create a directory and open it at the same time. Or, implementing operations with multiple syscalls unnecessarily, or obscuring the real error code.

The equivalent problem exists on other systems too. See for example #14866 where the problem was that libc had a bunch of garbage logic wrapping the actual syscall, which caused a real bug in practice due to the error code not bubbling up properly.

I insist on at least inspecting the DLL stack trace in ProcMon before merging this PR. I'm happy to do that work; just leave the PR open and I'll get to it after I get to the other ~30 PRs in line before it.

@mlugg
Member Author

mlugg commented Mar 14, 2023

I'm at least confident in saying that the implementations are completely different across React, Wine, and Windows. React lowers it to a NtDeviceIoControlFile call, using data from a special file \\SystemRoot\\vgafonts.cab. Wine also calls into NtDeviceIoControlFile, but with a completely different control code (IOCTL_CONDRV_SET_INPUT_INFO rather than IOCTL_CONSOLE_LOADFONT) and with the codepage ID directly rather than data loaded from elsewhere. Microsoft's own documentation suggests that these are device-dependent, and Googling the names of these constants spits out only React and Wine respectively.

Regardless, thank you for the update - I'll wait for you to look further into it.

@The-King-of-Toasters
Contributor

vgafonts seems to be a ReactOS specific thing, as it's bundled in their source tree. It looks like they're extracting a specific codepage font from the cabinet. I'm trying to get a proper backtrace but I can't find a specific function that kernel32 wraps around. So it might be worth investigating the wine implementation.

@The-King-of-Toasters
Contributor

I've tried to translate the wine implementation, but I keep getting INVALID_DEVICE_REQUEST. I also did a simple C impl and got the same result. Here's the code:

const std = @import("std");
const windows = std.os.windows;
const kernel32 = windows.kernel32;

const condrv_input_info_params = extern struct {
    /// Setting mask.
    mask: c_uint,
    info: extern struct {
        /// Console input codepage.
        input_cp: c_uint,
        /// Console output codepage.
        output_cp: c_uint,
        /// Number of available input records.
        input_count: c_uint,
    },
};

const SET_CONSOLE_INPUT_INFO = struct {
    const INPUT_CODEPAGE = 0x01;
    const OUTPUT_CODEPAGE = 0x02;
};

inline fn ctlCode(device_type: FILE_DEVICE, function: u32, method: METHOD, access: u32) u32 {
    return @enumToInt(device_type) << 16 | access << 14 | function << 2 | @enumToInt(method);
}

const FILE_ANY_ACCESS = 0;
const FILE_SPECIAL_ACCESS = 0;
const FILE_READ_ACCESS = windows.FILE_READ_DATA;
const FILE_WRITE_ACCESS = windows.FILE_WRITE_DATA;

const FILE_DEVICE = enum(u32) {
    BEEP = 0x00000001,
    CD_ROM = 0x00000002,
    CD_ROM_FILE_SYSTEM = 0x00000003,
    CONTROLLER = 0x00000004,
    DATALINK = 0x00000005,
    DFS = 0x00000006,
    DISK = 0x00000007,
    DISK_FILE_SYSTEM = 0x00000008,
    FILE_SYSTEM = 0x00000009,
    INPORT_PORT = 0x0000000a,
    KEYBOARD = 0x0000000b,
    MAILSLOT = 0x0000000c,
    MIDI_IN = 0x0000000d,
    MIDI_OUT = 0x0000000e,
    MOUSE = 0x0000000f,
    MULTI_UNC_PROVIDER = 0x00000010,
    NAMED_PIPE = 0x00000011,
    NETWORK = 0x00000012,
    NETWORK_BROWSER = 0x00000013,
    NETWORK_FILE_SYSTEM = 0x00000014,
    NULL = 0x00000015,
    PARALLEL_PORT = 0x00000016,
    PHYSICAL_NETCARD = 0x00000017,
    PRINTER = 0x00000018,
    SCANNER = 0x00000019,
    SERIAL_MOUSE_PORT = 0x0000001a,
    SERIAL_PORT = 0x0000001b,
    SCREEN = 0x0000001c,
    SOUND = 0x0000001d,
    STREAMS = 0x0000001e,
    TAPE = 0x0000001f,
    TAPE_FILE_SYSTEM = 0x00000020,
    TRANSPORT = 0x00000021,
    UNKNOWN = 0x00000022,
    VIDEO = 0x00000023,
    VIRTUAL_DISK = 0x00000024,
    WAVE_IN = 0x00000025,
    WAVE_OUT = 0x00000026,
    @"8042_PORT" = 0x00000027,
    NETWORK_REDIRECTOR = 0x00000028,
    BATTERY = 0x00000029,
    BUS_EXTENDER = 0x0000002a,
    MODEM = 0x0000002b,
    VDM = 0x0000002c,
    MASS_STORAGE = 0x0000002d,
    SMB = 0x0000002e,
    KS = 0x0000002f,
    CHANGER = 0x00000030,
    SMARTCARD = 0x00000031,
    ACPI = 0x00000032,
    DVD = 0x00000033,
    FULLSCREEN_VIDEO = 0x00000034,
    DFS_FILE_SYSTEM = 0x00000035,
    DFS_VOLUME = 0x00000036,
    SERENUM = 0x00000037,
    TERMSRV = 0x00000038,
    KSEC = 0x00000039,
    FIPS = 0x0000003a,
    INFINIBAND = 0x0000003b,
    VMBUS = 0x0000003e,
    CRYPT_PROVIDER = 0x0000003f,
    WPD = 0x00000040,
    BLUETOOTH = 0x00000041,
    MT_COMPOSITE = 0x00000042,
    MT_TRANSPORT = 0x00000043,
    BIOMETRIC = 0x00000044,
    PMI = 0x00000045,
    EHSTOR = 0x00000046,
    DEVAPI = 0x00000047,
    GPIO = 0x00000048,
    USBEX = 0x00000049,
    CONSOLE = 0x00000050,
    NFP = 0x00000051,
    SYSENV = 0x00000052,
    VIRTUAL_BLOCK = 0x00000053,
    POINT_OF_SERVICE = 0x00000054,
    STORAGE_REPLICATION = 0x00000055,
    TRUST_ENV = 0x00000056,
    UCM = 0x00000057,
    UCMTCPCI = 0x00000058,
    PERSISTENT_MEMORY = 0x00000059,
    NVDIMM = 0x0000005a,
    HOLOGRAPHIC = 0x0000005b,
    SDFXHCI = 0x0000005c,
};

const METHOD = enum(u32) {
    BUFFERED = 0,
    IN_DIRECT = 1,
    OUT_DIRECT = 2,
    NEITHER = 3,
};

const IOCTL_CONDRV_GET_INPUT_INFO = ctlCode(.CONSOLE, 15, .BUFFERED, FILE_READ_ACCESS);
const IOCTL_CONDRV_SET_INPUT_INFO = ctlCode(.CONSOLE, 16, .BUFFERED, FILE_WRITE_ACCESS);

pub extern "ntdll" fn RtlGetCurrentPeb() callconv(windows.WINAPI) *windows.PEB;

pub fn main() !void {
    const writer = std.io.getStdOut().writer();
    var params = condrv_input_info_params{
        .mask = SET_CONSOLE_INPUT_INFO.OUTPUT_CODEPAGE,
        .info = .{ .input_cp = 0, .output_cp = 65001, .input_count = 0 },
    };

    const stdout = try windows.GetStdHandle(windows.STD_OUTPUT_HANDLE);

    var io: windows.IO_STATUS_BLOCK = undefined;
    const status = windows.ntdll.NtDeviceIoControlFile(
        stdout, //RtlGetCurrentPeb().*.ProcessParameters.ConsoleHandle,
        null,
        null,
        null,
        &io,
        IOCTL_CONDRV_SET_INPUT_INFO,
        &params,
        @sizeOf(condrv_input_info_params),
        null,
        0,
    );

    try writer.print("status: {}\n", .{status});
}

@mlugg
Member Author

mlugg commented Mar 15, 2023

That's because, like I say, these are driver-defined constants. IOCTL_CONDRV_SET_INPUT_INFO is a Wine thing, and will not be meaningful on Windows or React.
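As a small illustration of why the constant is driver-specific, the sketch below packs the same four fields the translated code passes to `ctlCode` and prints the resulting value. The numeric values for `FILE_DEVICE_CONSOLE`, `METHOD_BUFFERED`, and `FILE_WRITE_ACCESS` are written out by hand from the standard Windows definitions. The function number 16 comes from the Wine-derived code above and is exactly the part a console driver is free to define differently.

```zig
const std = @import("std");

// Hand-written copies of the standard Windows values used by CTL_CODE.
const FILE_DEVICE_CONSOLE: u32 = 0x50;
const METHOD_BUFFERED: u32 = 0;
const FILE_WRITE_ACCESS: u32 = 0x0002;

/// Packs device type, access, function number, and transfer method into
/// a 32-bit control code, using the same layout as the CTL_CODE macro.
fn ctlCode(device_type: u32, function: u32, method: u32, access: u32) u32 {
    return (device_type << 16) | (access << 14) | (function << 2) | method;
}

pub fn main() void {
    // Function number 16 is the one the Wine-derived code above uses.
    const wine_set_input_info = ctlCode(FILE_DEVICE_CONSOLE, 16, METHOD_BUFFERED, FILE_WRITE_ACCESS);
    std.debug.print("IOCTL_CONDRV_SET_INPUT_INFO = 0x{x:0>8}\n", .{wine_set_input_info});
}
```

With these inputs the packed value is 0x00508040. Whether any driver other than Wine's accepts that value is not something the arithmetic can tell you.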

@squeek502
Collaborator

squeek502 commented Mar 16, 2023

Some info from using NtTrace with the following code:

const std = @import("std");

pub fn main() !void {
    _ = std.os.windows.kernel32.SetConsoleOutputCP(65001);
}

The relevant ntdll calls from a SetConsoleOutputCP call are these:

NtDeviceIoControlFile( FileHandle=0x50, Event=0, ApcRoutine=null, ApcContext=null, IoStatusBlock=0x2894fff930 [0/0], IoControlCode=0x00500016, InputBuffer=0x2894fff940, InputBufferLength=0x30, OutputBuffer=null, OutputBufferLength=0 ) => 0
NtDeviceIoControlFile( FileHandle=0x50, Event=0, ApcRoutine=null, ApcContext=null, IoStatusBlock=0x2894fff8e0, IoControlCode=0x00500016, InputBuffer=0x2894fff8f0, InputBufferLength=0x30, OutputBuffer=null, OutputBufferLength=0 ) => 0xc00700bb [187 'The specified system semaphore name was not found.']

(for some reason it does two NtDeviceIoControlFile calls and the second always fails with that "The specified system semaphore name was not found." error)

  • The FileHandle param it uses is what you get from RtlGetCurrentPeb().*.ProcessParameters.ConsoleHandle in @The-King-of-Toasters' code above
  • IoControlCode is CTL_CODE(FILE_DEVICE_CONSOLE, 5, METHOD_OUT_DIRECT, FILE_ANY_ACCESS)
  • InputBufferLength is 0x30, which is different from the size of the wine struct
    • I attempted to look at the memory of these bytes and got this:
00 00 00 00 00 00 00 00  01 00 00 00 01 00 00 00  ................
10 00 00 00 20 00 00 00  40 F5 36 2F 35 00 00 00  .... ...@.6/5...
08 00 00 00 00 00 00 00  48 F5 36 2F 35 00 00 00  ........H.6/5...

where the 40 F5 36 2F 35 00 00 00 and 48 F5 36 2F 35 00 00 00 seem to be pointers that point 8 bytes apart from one another and they point to memory that looked like:

04 00 00 02 08 00 00 00

and

e9 fd 00 00 01 7f 00 00

respectively (I'm assuming they are both pointing to memory that is 8 bytes long but that could be wrong). Note that e9 fd 00 00 if interpreted as a little-endian u32 is 65001 (the codepage we're trying to set the console to).

EDIT: I think the bytes with the 65001 correspond to this struct, which would mean CodePage is 65001 and Output is 1 which would make perfect sense.

EDIT#2: I think the 04 00 00 02 bytes in the first pointer correspond to this enum value. If interpreted as a little-endian u32, those bytes have the value 0x02000004 which is (2 << 24) + 4, and CONSOLE_FIRST_API_NUMBER(2) is 2 << 24 so the ConsolepSetCP enum value would be 0x02000004.

EDIT#3: If the above is true, that would mean the first pointer might be pointing to this struct, and ApiDescriptorSize would be 8.

However, note that my method of looking at these bytes was extremely janky, so they might not even be the actual values that InputBuffer had at the time of the NtDeviceIoControlFile (I set a breakpoint after the kernel32.SetConsoleOutputCP call returned and looked at the memory location that NtTrace said was used in the NtDeviceIoControlFile call so it's totally possible it was stale/overwritten by that point)
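The arithmetic in the notes above can be checked mechanically. The following sketch decodes the captured bytes as little-endian u32 values and splits the traced IoControlCode into the four fields that CTL_CODE packs. The helper names (`readU32Le`, `unpackCtlCode`, `CtlFields`) are made up for this illustration, and the byte arrays are copied from the dumps above, so the same caveat about possibly stale memory applies.

```zig
const std = @import("std");

/// Reads a little-endian u32 from the first four bytes of `bytes`.
fn readU32Le(bytes: []const u8) u32 {
    return @as(u32, bytes[0]) |
        (@as(u32, bytes[1]) << 8) |
        (@as(u32, bytes[2]) << 16) |
        (@as(u32, bytes[3]) << 24);
}

const CtlFields = struct { device_type: u32, access: u32, function: u32, method: u32 };

/// Splits a control code into the fields that CTL_CODE packs together.
fn unpackCtlCode(code: u32) CtlFields {
    return .{
        .device_type = code >> 16,
        .access = (code >> 14) & 0x3,
        .function = (code >> 2) & 0xfff,
        .method = code & 0x3,
    };
}

pub fn main() void {
    // Bytes found behind the two pointers in the input buffer.
    const header_bytes = [_]u8{ 0x04, 0x00, 0x00, 0x02, 0x08, 0x00, 0x00, 0x00 };
    const body_bytes = [_]u8{ 0xe9, 0xfd, 0x00, 0x00, 0x01, 0x7f, 0x00, 0x00 };

    const api_number = readU32Le(header_bytes[0..4]);
    const descriptor_size = readU32Le(header_bytes[4..8]);
    const code_page = readU32Le(body_bytes[0..4]);

    std.debug.assert(api_number == (2 << 24) + 4); // ConsolepSetCP
    std.debug.assert(descriptor_size == 8);
    std.debug.assert(code_page == 65001);

    // IoControlCode from the NtTrace output.
    const fields = unpackCtlCode(0x00500016);
    std.debug.assert(fields.device_type == 0x50); // FILE_DEVICE_CONSOLE
    std.debug.assert(fields.access == 0); // FILE_ANY_ACCESS
    std.debug.assert(fields.function == 5);
    std.debug.assert(fields.method == 2); // METHOD_OUT_DIRECT

    std.debug.print("api=0x{x:0>8} size={d} cp={d} function={d}\n", .{
        api_number, descriptor_size, code_page, fields.function,
    });
}
```

Function number 5 here differs from the 16 used in the Wine-derived code earlier in the thread, which is consistent with the two console drivers defining their own control codes.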


I was still unable to construct a successful NtDeviceIoControlFile call myself, though, even when trying to mimic exactly what the successful call was doing, so I don't have the full picture.

My failed attempt
const std = @import("std");
const windows = std.os.windows;

pub extern "ntdll" fn RtlGetCurrentPeb() callconv(windows.WINAPI) *windows.PEB;

pub fn main() !void {
    const ptr_val1 = "\x04\x00\x00\x02\x08\x00\x00\x00".*;
    // I'm assuming the first 8 bytes are the only ones that matter but I've included more just in case
    const ptr_val2 = "\xe9\xfd\x00\x00\x01\x7f\x00\x00p\xa3j\x19\xf6\x01\x00\x00@\x8fk\x19\xf6\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xa0\xf56/5\x00\x00".*;

    const a = "\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x10\x00\x00\x00 \x00\x00\x00".*;
    const b = "\x08\x00\x00\x00\x00\x00\x00\x00".*;

    var buf: [0x30]u8 = undefined;
    var fbs = std.io.fixedBufferStream(&buf);
    var writer = fbs.writer();
    writer.writeAll(&a) catch unreachable;
    writer.writeIntLittle(usize, @ptrToInt(&ptr_val1)) catch unreachable;
    writer.writeAll(&b) catch unreachable;
    writer.writeIntLittle(usize, @ptrToInt(&ptr_val2)) catch unreachable;

    const control_code = 0x00500016;

    var io: windows.IO_STATUS_BLOCK = undefined;
    const status = windows.ntdll.NtDeviceIoControlFile(
        RtlGetCurrentPeb().*.ProcessParameters.ConsoleHandle,
        null,
        null,
        null,
        &io,
        control_code,
        &buf,
        buf.len,
        null,
        0,
    );

    std.debug.print("status: {}\n", .{status});
}

It fails with

status: os.windows.ntstatus.NTSTATUS.ACCESS_VIOLATION

and NtTrace gives:

NtDeviceIoControlFile( FileHandle=0x50, Event=0, ApcRoutine=null, ApcContext=null, IoStatusBlock=0x18f79ff5f0, IoControlCode=0x00500016, InputBuffer=0x18f79ff550, InputBufferLength=0x30, OutputBuffer=null, OutputBufferLength=0 ) 
=> 0xc0000005 [998 'Invalid access to memory location.']

@squeek502
Copy link
Collaborator

squeek502 commented Mar 16, 2023

Got something working. The fix, strangely, was to make the input var instead of const to make the memory get put on the stack instead of in the executable directly (at least I think that's what the issue was).

Working example:

const std = @import("std");
const windows = std.os.windows;

pub extern "ntdll" fn RtlGetCurrentPeb() callconv(windows.WINAPI) *windows.PEB;

const CONSOLE_MSG_HEADER = extern struct {
    ApiNumber: u32,
    ApiDescriptorSize: u32,
};

// This is actually one value in an enum but we just care about this for now
const ConsolepSetCP = (2 << 24) + 4;

const CONSOLE_SETCP_MSG = extern struct {
    CodePage: u32,
    Output: bool,
};

const CONSOLE_MSG_L2 = extern struct {
    Header: CONSOLE_MSG_HEADER,
    // This is actually a union of other types but we just care about this for now
    Body: CONSOLE_SETCP_MSG,
};

const UNKNOWN_IOCTL_INPUT = extern struct {
    a: u32 = 0,
    b: u32 = 0,
    c: u32 = 1,
    d: u32 = 1,
    e: u32 = 0x10,
    f: u32 = 0x20,
    header: *CONSOLE_MSG_HEADER,
    g: u32 = 0x8,
    h: u32 = 0,
    body: *CONSOLE_SETCP_MSG,
};

pub fn main() !void {
    std.debug.print("⚡\n", .{});

    const control_code = 0x00500016;
    var console_msg = CONSOLE_MSG_L2{
        .Header = .{ .ApiNumber = ConsolepSetCP, .ApiDescriptorSize = @sizeOf(CONSOLE_SETCP_MSG) },
        .Body = .{ .CodePage = 65001, .Output = true },
    };

    var input = UNKNOWN_IOCTL_INPUT{
        .header = &console_msg.Header,
        .body = &console_msg.Body,
    };

    var io: windows.IO_STATUS_BLOCK = undefined;
    const status = windows.ntdll.NtDeviceIoControlFile(
        RtlGetCurrentPeb().*.ProcessParameters.ConsoleHandle,
        null,
        null,
        null,
        &io,
        control_code,
        &input,
        @sizeOf(UNKNOWN_IOCTL_INPUT),
        null,
        0,
    );

    std.debug.print("status: {}\n", .{status});
    std.debug.print("⚡\n", .{});
}

Outputs:

ΓÜí
status: os.windows.ntstatus.NTSTATUS.SUCCESS
⚡

(that's with code page 437 set when the program starts, run chcp 437 to reset the code page between runs)

For completeness, here's a hexdump of the input bytes from a working run:

00 00 00 00 00 00 00 00  01 00 00 00 01 00 00 00  ................
10 00 00 00 20 00 00 00  30 F8 7F 3E 69 00 00 00  .... ...0..>i...
08 00 00 00 00 00 00 00  38 F8 7F 3E 69 00 00 00  ........8..>i...
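As a cross-check of the hexdump against the struct definitions in the working example, the sketch below repeats those definitions and asserts the sizes and field offsets they imply on a target with 8-byte pointers. This is only a layout check under that assumption; it makes no console calls and says nothing about what the unnamed fields mean.

```zig
const std = @import("std");

const CONSOLE_MSG_HEADER = extern struct {
    ApiNumber: u32,
    ApiDescriptorSize: u32,
};

const CONSOLE_SETCP_MSG = extern struct {
    CodePage: u32,
    Output: bool,
};

const UNKNOWN_IOCTL_INPUT = extern struct {
    a: u32 = 0,
    b: u32 = 0,
    c: u32 = 1,
    d: u32 = 1,
    e: u32 = 0x10,
    f: u32 = 0x20,
    header: *CONSOLE_MSG_HEADER,
    g: u32 = 0x8,
    h: u32 = 0,
    body: *CONSOLE_SETCP_MSG,
};

pub fn main() void {
    // The numbers below only hold when pointers are 8 bytes wide.
    if (@sizeOf(usize) != 8) return;

    // Matches InputBufferLength=0x30 in the trace.
    std.debug.assert(@sizeOf(UNKNOWN_IOCTL_INPUT) == 0x30);
    // The two pointers sit in the second half of rows 2 and 3 of the dump.
    std.debug.assert(@offsetOf(UNKNOWN_IOCTL_INPUT, "header") == 0x18);
    std.debug.assert(@offsetOf(UNKNOWN_IOCTL_INPUT, "body") == 0x28);
    // Header and body are 8 bytes each, so the pointers end up 8 bytes apart
    // and ApiDescriptorSize comes out as 8.
    std.debug.assert(@sizeOf(CONSOLE_MSG_HEADER) == 8);
    std.debug.assert(@sizeOf(CONSOLE_SETCP_MSG) == 8);

    std.debug.print("input size: 0x{x}, header at 0x{x}, body at 0x{x}\n", .{
        @as(usize, @sizeOf(UNKNOWN_IOCTL_INPUT)),
        @as(usize, @offsetOf(UNKNOWN_IOCTL_INPUT, "header")),
        @as(usize, @offsetOf(UNKNOWN_IOCTL_INPUT, "body")),
    });
}
```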

@squeek502
Collaborator

squeek502 commented Mar 17, 2023

Made a PR in your fork against this branch to avoid the kernel32 dependency: mlugg#1

@mlugg
Member Author

mlugg commented Mar 17, 2023

The thing is, that solution doesn't work in Wine, and IMO we should strive to make Wine work. Moreover, since this looks to be an implementation detail nothing depends on (evidenced by Wine and React doing it differently), afaict it's entirely possible Microsoft change it one day.

@alexrp
Member

alexrp commented Mar 17, 2023

In addition to the above: If memory serves from past discussions on the microsoft/terminal repo, console handles saw a significant rework in Windows 8. Among other things, I think they were changed so that various kernel32 functions (e.g. ReadFile) stopped special-casing them, instead delegating to the console driver like for other handle types. There may have been additional changes since then that I'm not aware of.

So even if you ignore Wine and ReactOS, you're still looking at compatibility issues just for Windows proper. It still might be workable, just something to keep in mind.

@squeek502
Collaborator

squeek502 commented Mar 18, 2023

Agreed about the ntdll implementation being brittle/specific/non-portable/etc. Perhaps reviving #12400 would be the most robust way forward, thereby sidestepping the console output codepage entirely (though it still uses the kernel32 functions ReadConsoleW/WriteConsoleW so at some point those would need to be replaced with ntdll implementations as well).

@The-King-of-Toasters
Contributor

I ran another test for WriteConsoleW and ran it through NtTrace to see if there was an easy ntdll function to use, but it seems like it's another ioctl. Relevant trace:

NtDeviceIoControlFile( FileHandle=0x68, Event=0, ApcRoutine=null, ApcContext=null, IoStatusBlock=0x95585ff890 [0/0xe], IoControlCode=0x00500016, InputBuffer=0x95585ff8a0, InputBufferLength=0x40, OutputBuffer=null, OutputBufferLength=0 ) => 0

And the code:

const std = @import("std");
const windows = std.os.windows;
const kernel32 = windows.kernel32;

pub extern "kernel32" fn WriteConsoleW(
    hConsoleOutput: *anyopaque,
    lpBuffer: *const anyopaque,
    nNumberOfCharsToWrite: u32,
    lpNumberOfCharsWritten: ?*u32,
    lpReserved: ?*anyopaque,
) callconv(windows.WINAPI) windows.BOOL;

const L = std.unicode.utf8ToUtf16LeStringLiteral;

const foo = L("foobar\n");
pub fn main() !void {
    while (true) {
        _ = WriteConsoleW(
            windows.peb().ProcessParameters.hStdOutput,
            foo[0..],
            @truncate(u32, foo.len),
            null,
            null,
        );
    }
}

@andrewrk
Member

andrewrk commented Mar 19, 2023

Nice work @squeek502 on figuring out what you did. That's some impressive sleuthing.

At this point I'm convinced that our two options forward are:

  • Figure out how to make UTF-8 in the terminal work without modifying global state. This would also protect zig programs from being compromised by a different process or third party code changing global state while the zig-based application is running.

  • Go ahead and depend on kernel32.dll for subsystem console programs and make this call to SetConsoleOutputCP.

@andrewrk
Member

@squeek502 do you have any opinion or suggestion on the path forward here?

@squeek502
Collaborator

@andrewrk it's a tricky problem and I don't feel like I have an answer yet. I think something like #12400 should be looked into more to see what the ramifications would be, since (if I understand correctly), it'd mean that the Zig standard library would bypass the code page setting and write/read via UTF-16. Until that option is ruled out/in, though, I feel like I don't have enough information to make a good decision here.

@andrewrk andrewrk changed the title from "std.start: iniitalize Windows console output CP to UTF-8 on exe startup" to "std.start: initialize Windows console output CP to UTF-8 on exe startup" Oct 19, 2023
Member

@andrewrk andrewrk left a comment

I think I'm actually OK with this, since it's configurable by the application. However, it should observe the subsystem and default to false in that case. The subsystem is observable via std.builtin.subsystem.
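One possible reading of this request, sketched below, is that the option's default should be true only for console-subsystem executables, in line with the earlier remark that the call can be omitted when the subsystem is not console. The helper name `defaultForceUtf8Codepage` is hypothetical and is not code from the PR. `std.builtin.subsystem` is the declaration named in the comment above, and the `.Console` tag assumes the `std.Target.SubSystem` enum as it existed around the time of this thread.

```zig
const std = @import("std");

/// Hypothetical default for the `windows_force_utf8_codepage` option:
/// true only when the executable targets the console subsystem.
fn defaultForceUtf8Codepage() bool {
    // `std.builtin.subsystem` is null when no subsystem applies,
    // for example on non-Windows targets.
    return if (std.builtin.subsystem) |subsystem| subsystem == .Console else false;
}

pub fn main() void {
    std.debug.print("force UTF-8 code page by default: {}\n", .{defaultForceUtf8Codepage()});
}
```

An application that sets the option explicitly would still override this default in either direction.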

inline fn setupWindows() void {
    if (std.options.windows_force_utf8_codepage) {
        _ = std.os.windows.kernel32.SetConsoleOutputCP(65001); // use UTF-8 codepage
    }
Contributor

@expikr expikr Nov 21, 2023

Seems that this would be on the same level of prescription as enabling ansi escape codes here, could sneak in something like

Suggested change
}
    }
    if (std.options.windows_force_virtual_terminal) {
        const ENABLE_VIRTUAL_TERMINAL_PROCESSING = 4;
        var stdout_mode: u32 = undefined;
        _ = std.os.windows.kernel32.GetConsoleMode(std.io.getStdOut().handle, &stdout_mode);
        stdout_mode |= ENABLE_VIRTUAL_TERMINAL_PROCESSING;
        _ = std.os.windows.kernel32.SetConsoleMode(std.io.getStdOut().handle, stdout_mode);
    }

@andrewrk
Copy link
Member

andrewrk commented Jan 4, 2024

Closing abandoned PR. This issue is tracked at #7600, where you can also find a link to this PR in case someone wants to revive it.
