Description
const std = @import("std");
pub fn main() void {
    std.log.debug("Hello, μ! (^=◕ᴥ◕=^)", .{});
    var f = std.io.getStdOut();
    _ = f.write("Hello, μ! (^=◕ᴥ◕=^)\n") catch @panic("write to stdout failed");
    f = std.io.getStdErr();
    _ = f.write("Hello, μ! (^=◕ᴥ◕=^)\n") catch @panic("write to stderr failed");
}
> repro_windows_utf8.exe
debug: Hello, μ! (^=◕ᴥ◕=^)
Hello, μ! (^=◕ᴥ◕=^)
Hello, μ! (^=◕ᴥ◕=^)
Poor kitty, its face has been replaced by encoding errors!
The Zig documentation has this to say about string literals:
String literals are single-item constant Pointers to null-terminated UTF-8 encoded byte arrays.
... String literals are const pointers to null-terminated arrays of u8, and by convention parameters that are "strings" are expected to be UTF-8 encoded slices of u8.
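To make the quoted statement concrete, here is a minimal sketch showing that a string literal is just its UTF-8 bytes. It assumes a Zig version that provides std.debug.print and std.debug.assert. A console using a legacy single-byte code page would show each of these bytes as a separate character, which matches the kind of corruption reported above.

```zig
const std = @import("std");

pub fn main() void {
    const s = "μ";
    // U+03BC (GREEK SMALL LETTER MU) is encoded in UTF-8 as the two bytes 0xCE 0xBC.
    std.debug.assert(s.len == 2);
    std.debug.assert(s[0] == 0xCE and s[1] == 0xBC);
    // Print each raw byte in hex; these are the bytes a writer hands to the console.
    for (s) |byte| {
        std.debug.print("0x{X} ", .{byte});
    }
    std.debug.print("\n", .{});
}
```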
If this is true of all standard library functions, then std.log and writers that refer to console output streams expect UTF-8 encoded strings, and callers likely expect these functions to print those strings to the console as they appear in their source files or data sources. Users of Zig command-line applications (and Zig developers) may not have configured Windows to use UTF-8 by default, but this can be overridden for a given console session by calling SetConsoleOutputCP.
I propose that the Zig runtime call SetConsoleOutputCP with an argument of 65001 (the UTF-8 code page) at program startup, so that terminal output shows UTF-8 strings correctly in all cases. Zig developers who want the legacy behavior or a different code page can call this function again with a different argument in their program's entry point.
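Until something like this exists in the runtime, a program can apply the same workaround itself. The following is only a sketch. It assumes a Zig version in which the standard library exposes the binding as std.os.windows.kernel32.SetConsoleOutputCP and in which @import("builtin") provides os.tag. The constant name CP_UTF8 is introduced here purely for illustration.

```zig
const std = @import("std");
const builtin = @import("builtin");

// Windows code page identifier for UTF-8 (the name is chosen for this example).
const CP_UTF8: c_uint = 65001;

pub fn main() void {
    if (builtin.os.tag == .windows) {
        // Assumption: this std version exposes the kernel32 binding under this path.
        // The Win32 function returns zero on failure; this sketch ignores the result.
        _ = std.os.windows.kernel32.SetConsoleOutputCP(CP_UTF8);
    }

    const f = std.io.getStdOut();
    _ = f.write("Hello, μ! (^=◕ᴥ◕=^)\n") catch @panic("write to stdout failed");
}
```

Because the os.tag check is known at compile time, the Windows-only call is not compiled on other targets. Note that the code page belongs to the console, not to the process, so the change can outlive the program unless the previous value is restored on exit.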
(Aside: the Zig compiler does not behave correctly by default in this case either; it prints source code with the wrong encoding in error messages:)
.\repro_windows_utf8.zig:6:44: error: expression value is ignored
f.write("Hello, μ! (^=◕ᴥ◕=^)") catch @panic("write to stderr failed");