Cannot read unicode data from STDIN on Windows #5148

Open
@Tetralux

Tracking issue: on Windows, reading Unicode console input from STDIN with plain ReadFile corrupts the text.

Instead, you have to use ReadConsoleW, which returns the input as UTF-16.

For example:

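// Imports the snippet relies on. The aliases are assumptions, since the original omits them.
const std = @import("std");
const io = std.io;
const os = std.os;
const unicode = std.unicode;
const assert = std.debug.assert;
const print = std.debug.print; // assumption: std.debug.warn on older releases

extern "kernel32" fn ReadConsoleW(handle: os.fd_t, buffer: [*]u16, len: os.windows.DWORD, read: *os.windows.DWORD, input_ctrl: ?*c_void) i32;
extern "kernel32" fn SetConsoleOutputCP(cp: os.windows.UINT) i32;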

// ...

assert(SetConsoleOutputCP(65001) != 0); // display outputted utf8 correctly in the console
    
var stdin = io.getStdIn().handle;
var data: [256]u16 = undefined;
var read: u32 = undefined;
const ok = ReadConsoleW(stdin, &data, data.len, &read, null); // type in '안녕'
assert(ok != 0);

var utf8: [1024]u8 = undefined;
const utf8_len = try unicode.utf16leToUtf8(&utf8, data[0..read]);
const s = utf8[0..utf8_len];

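print("'{}'\n", .{s});  // prints out '안녕
                        //            '
                        // (the closing quote lands on the next line because the
                        // data still ends with the line ending from pressing Enter)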
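
In the output above, the closing quote ends up on its own line, which suggests the buffer still holds the line ending from pressing Enter. Below is a minimal sketch of trimming it before the UTF-8 conversion. It assumes the console appends CR LF in its default line-input mode. `trimLineEnding` is a hypothetical helper, not part of the standard library, and the syntax targets a recent Zig release rather than the one used in the snippet above.

```zig
const std = @import("std");

/// Hypothetical helper: drops trailing CR and LF code units from a UTF-16 slice.
fn trimLineEnding(units: []const u16) []const u16 {
    var end: usize = units.len;
    while (end > 0 and (units[end - 1] == '\r' or units[end - 1] == '\n')) {
        end -= 1;
    }
    return units[0..end];
}

test "trimLineEnding strips a trailing CR LF" {
    // '안녕' (U+C548, U+B155) followed by CR LF, as assumed for console line input.
    const input = [_]u16{ 0xC548, 0xB155, '\r', '\n' };
    const trimmed = trimLineEnding(&input);
    try std.testing.expectEqualSlices(u16, &[_]u16{ 0xC548, 0xB155 }, trimmed);
}
```

With this in place, the example would pass `trimLineEnding(data[0..read])` to `utf16leToUtf8` instead of `data[0..read]`.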

As noted by @fengb, this may mean Zig needs to have a special InStream for the console.

Labels: os-windows, standard library