
Switch to using unicode when parsing the command line on windows #7241

Merged
merged 4 commits into from
Nov 30, 2020

Rageoholic
Contributor

This should help with issue #534. It just uses GetCommandLineW, plus some fixups so that it compiles, still returns UTF-8, and passes the tests.
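
For readers unfamiliar with the idea, here is a minimal sketch of "take the wide (UTF-16) command line but still hand back UTF-8". It is written in Python rather than the project's own language, it does not call GetCommandLineW, and the function name `utf16le_to_utf8` and the sample command line are assumptions made up for illustration, not code from this PR.

```python
def utf16le_to_utf8(raw: bytes) -> bytes:
    """Re-encode a UTF-16LE byte string as UTF-8.

    Hypothetical helper: stands in for the conversion step only.
    """
    # Naming "utf-16-le" pins the byte order, so the result does not
    # depend on the endianness of the machine running this code.
    return raw.decode("utf-16-le").encode("utf-8")


# Made-up command line with one non-ASCII character (U+00E9).
wide = "prog.exe h\u00e9llo".encode("utf-16-le")
print(utf16le_to_utf8(wide))  # b'prog.exe h\xc3\xa9llo'
```

The point of the sketch is only that the input encoding changes while the output stays UTF-8, which is why callers of the argument iterator should not notice the switch.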

@LemonBoy
Contributor

The tests are failing because MIPS is a big-endian machine: the result from nextCodepoint needs to be byte-swapped, or the tests need to be disabled on everything but Windows.
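
To make the byte-order problem concrete, here is a small Python sketch of a little-endian-to-native conversion for one 16-bit code unit. The helper name `little_to_native` is an assumption chosen for illustration; it is not the function used in this PR, only the same idea.

```python
import struct
import sys


def little_to_native(unit: int) -> int:
    """Fix up a 16-bit value that was loaded in host byte order
    from bytes that are actually little-endian."""
    if sys.byteorder == "little":
        # Host order already matches the data: nothing to do.
        return unit
    # Big-endian host: the two bytes were read in the wrong order, swap them.
    return ((unit & 0xFF) << 8) | (unit >> 8)


raw = b"\x41\x00"                      # "A" as UTF-16LE
(loaded,) = struct.unpack("=H", raw)   # "=H": 16-bit load in native byte order
print(hex(little_to_native(loaded)))   # 0x41
```

On a little-endian host the load already yields 0x41 and the helper is a no-op; on a big-endian host the load yields 0x4100 and the swap restores 0x41, which is the fixup the failing tests are missing.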

@Rageoholic
Contributor Author

Changes are applied. Looks like the W functions should give you back little endian codepoints. AFAIK MS has never released a version of Windows that runs on a big-endian architecture, so I don't care much, but doing the fixup should be really inexpensive or free, and really, who puts argument parsing on the fast path anyway? Also we can just run the tests on any given machine and you'll know if you broke it.

@LemonBoy
Contributor

> Changes are applied.

Tests are still failing because skip/next are not using littleToNative.

> Looks like the W functions should give you back little endian codepoints.

UTF-16 or UCS-2 ?

> Also we can just run the tests on any given machine and you'll know if you broke it.

That's a great argument, I like it 👍

> MIPs

The lowercase s is making me extra sad for no reason :(

@Rageoholic
Contributor Author

> The lowercase s is making me extra sad for no reason :(

I wrote that early in the morning and now it's making me sad.

> UTF-16 or UCS-2 ?

It's definitely UTF-16. I crawled through MSDN to check. Frankly, though, they can't port to a big-endian architecture without breaking either the people who assumed a little-endian architecture (given that I just did, I can't blame them) or the people who actually do the fixup on big-endian architectures. Hopefully enough people were like "I'll just use a library" and the library did the right thing, so that MS can keep its word without too much pain.
https://docs.microsoft.com/en-us/windows/win32/intl/using-byte-order-marks

@daurnimator
Contributor

> It's definitely UTF-16. I crawled through MSDN to check

Where/how? MSDN usually fails to note that when they say UTF-16 they really mean UCS-2.
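
The practical difference between the two is surrogate pairs: UCS-2 treats every 16-bit unit as a complete character, while UTF-16 combines a high and a low surrogate into one code point above U+FFFF. A minimal Python sketch of the UTF-16 rule follows; the function name `decode_utf16_units` and the choice to pass unpaired surrogates through unchanged are assumptions for illustration, not a statement about what this PR does.

```python
def decode_utf16_units(units):
    """Yield code points from a sequence of 16-bit code units,
    combining surrogate pairs as UTF-16 requires."""
    i = 0
    while i < len(units):
        u = units[i]
        has_next = i + 1 < len(units)
        if 0xD800 <= u <= 0xDBFF and has_next and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # High surrogate followed by low surrogate: one supplementary code point.
            low = units[i + 1]
            yield 0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00)
            i += 2
        else:
            # BMP code unit, or an unpaired surrogate passed through as-is
            # (an illustrative choice, other decoders reject or replace it).
            yield u
            i += 1


units = [0x0041, 0xD83D, 0xDE00]  # "A" followed by U+1F600
print([hex(c) for c in decode_utf16_units(units)])  # ['0x41', '0x1f600']
```

A reader that assumed UCS-2 would report three characters for the same input, which is why the distinction matters for a command line that may contain characters outside the Basic Multilingual Plane.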

@Rageoholic
Contributor Author

Rageoholic commented Nov 29, 2020 via email

@andrewrk andrewrk merged commit 0369b65 into ziglang:master Nov 30, 2020
@andrewrk
Member

Thanks!

4 participants