switch to using 'W' versions of windows functions instead of 'A' #534

Closed
andrewrk opened this issue Oct 14, 2017 · 21 comments · Fixed by #9037
Labels
contributor friendly This issue is limited in scope and/or knowledge of Zig internals. enhancement Solving this issue will likely involve adding new logic or components to the codebase. os-windows standard library This issue involves writing Zig code for the standard library.
Milestone

Comments

@andrewrk
Member

andrewrk commented Oct 14, 2017

Here's the situation:

The 'A' versions of windows functions depend on a global code-page setting. We don't want to depend on global state in this way.

The 'W' versions use UTF-16LE with no global state, which at least supports all of unicode.

There are also some Microsoft decisions, such as: 'A' versions of file paths are limited to 260 characters (MAX_PATH) while 'W' versions are limited to 32,767. For some new API functions, there is no 'A' version, only 'W'. In general, 'A' seems legacy and deprecated, and 'W' is the correct way to use the Windows API.

Sadly, since Zig uses UTF-8 in the standard library (and this remains the correct decision), this essentially means: decode UTF-8, encode UTF-16LE, make the Windows API call, decode UTF-16LE, encode UTF-8, for many of our syscalls on Windows. But that's how it goes in the Windows world.
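
For illustration, the round trip described above might look roughly like the sketch below. It assumes a recent Zig standard library where `std.unicode.utf8ToUtf16Le`, `std.unicode.Utf16LeIterator`, and `std.unicode.utf8Encode` are available. `utf8ToWide` and `wideToUtf8` are hypothetical helper names, and the Windows call itself is left as a comment so the sketch can run anywhere with `zig test`.

```zig
const std = @import("std");

/// Hypothetical helper: UTF-8 -> UTF-16LE.
/// `wide` must hold at least `utf8.len` code units (the worst case).
fn utf8ToWide(wide: []u16, utf8: []const u8) !usize {
    return std.unicode.utf8ToUtf16Le(wide, utf8);
}

/// Hypothetical helper: UTF-16LE -> UTF-8.
/// `utf8` must hold up to 3 bytes per UTF-16 code unit.
fn wideToUtf8(utf8: []u8, wide: []const u16) !usize {
    var it = std.unicode.Utf16LeIterator.init(wide);
    var len: usize = 0;
    while (try it.nextCodepoint()) |cp| {
        len += try std.unicode.utf8Encode(cp, utf8[len..]);
    }
    return len;
}

test "round trip around a pretend 'W' call" {
    const input = "C:\\data\\na\u{ef}ve.txt";

    var wide: [64]u16 = undefined;
    const wide_len = try utf8ToWide(&wide, input);

    // A real program would pass wide[0..wide_len] (plus a 0 terminator)
    // to a 'W' function here and get UTF-16LE back.

    var back: [192]u8 = undefined;
    const back_len = try wideToUtf8(&back, wide[0..wide_len]);
    try std.testing.expectEqualStrings(input, back[0..back_len]);
}
```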

@andrewrk andrewrk added the enhancement Solving this issue will likely involve adding new logic or components to the codebase. label Oct 14, 2017
@andrewrk andrewrk added this to the 0.2.0 milestone Oct 14, 2017
@PavelVozenilek

'A' functions convert narrow string parameter to UTF16 and call the 'W' variant.

Manual conversion can be done with the Win32 functions MultiByteToWideChar/WideCharToMultiByte. Their use is a bit tricky (CP_THREAD_ACP does not work in some situations, CP_OEMCP does), but they let you find out how much memory will be needed for the conversion and then use good old alloca, not touching the heap.
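
The "find out how much memory will be needed" step does not strictly need a Win32 call. A minimal sketch, assuming `std.unicode.Utf8View` from the Zig standard library; `utf16UnitsNeeded` is a hypothetical name, not an existing std function:

```zig
const std = @import("std");

/// Hypothetical helper: count the UTF-16 code units needed to hold `utf8`,
/// without performing the conversion. Returns error.InvalidUtf8 on bad input.
fn utf16UnitsNeeded(utf8: []const u8) !usize {
    const view = try std.unicode.Utf8View.init(utf8);
    var it = view.iterator();
    var units: usize = 0;
    while (it.nextCodepoint()) |cp| {
        // Code points above the BMP need a surrogate pair (2 units).
        units += if (cp >= 0x10000) @as(usize, 2) else 1;
    }
    return units;
}

test "count UTF-16 code units" {
    // 'a' -> 1 unit, U+00E9 -> 1 unit, U+1F600 -> 2 units.
    try std.testing.expectEqual(@as(usize, 4), try utf16UnitsNeeded("a\u{e9}\u{1F600}"));
}
```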

It is also possible to set codepage to UTF8 per application (or per thread) and let Windows do the conversions.

@andrewrk
Member Author

> Manual conversion can be done by Win32 functions MultiByteToWideChar/WideCharToMultiByte

Zig will use its own functions for this. Less room for error.

> use good old alloca, not touching the heap.

This is not acceptable since the strings have a non-upper-bounded length.

> It is also possible to set codepage to UTF8 per application (or per thread) and let Windows do the conversions.

How?

@PavelVozenilek

> Zig will use its own functions for this. Less room for error.

The above functions handle (or claim to handle) invalid input sequences and surrogates.

> [alloca] is not acceptable since the strings have a non-upper-bounded length.

I have yet to see a string longer than a page. Applications dealing with big texts (editors) split them into parts; a long blob is a no-no.
Allocations larger than N can be trivially rerouted to the heap.

> It is also possible to set codepage to UTF8 per application (or per thread) and let Windows do the conversions.
>
> How?

Oops, not possible (to switch Win32 app to "UTF8 mode"). I was thinking something else.

@andrewrk
Member Author

> The above functions handle (or claim to handle) invalid input sequences and surrogates.

Better to not make syscalls when no syscalls are fundamentally required.

> Allocations larger than N can be trivially rerouted to the heap.

Not true since we accept an allocator argument when we want to use the heap.

@pluto439

File paths are probably the only place where this really matters. But yeah, to be safe.

@andrewrk andrewrk modified the milestones: 0.2.0, 0.3.0 Jan 17, 2018
@andrewrk andrewrk modified the milestones: 0.3.0, 0.4.0 Feb 28, 2018
@andrewrk andrewrk added the standard library This issue involves writing Zig code for the standard library. label Aug 9, 2018
andrewrk added a commit that referenced this issue Aug 22, 2018
 * error.BadFd is not a valid error code. it would always be a bug to
   get this error code.
 * merge error.Io with existing error.InputOutput
 * merge error.PathNotFound with existing error.FileNotFound.
   Not all OS's support both.
 * add os.File.openReadC
 * add error.BadPathName for windows file operations with invalid
   characters
 * add os.toPosixPath to help stack allocate a null terminating byte
 * add some TODOs for other functions to investigate removing the
   allocator requirement
 * optimize some implementations to use the alternate functions when
   a null byte is already available
 * add a missing error.SkipZigTest
 * os.selfExePath uses a non-allocating API
 * os.selfExeDirPath uses a non-allocating API
 * os.path.real uses a non-allocating API
 * add os.path.realAlloc and os.path.realC
 * convert many windows syscalls to use the W versions (See #534)
andrewrk added a commit that referenced this issue Aug 22, 2018
This does a proof of concept of changing most file system APIs to not
require an allocator and remove the possibility of failure via
OutOfMemory.

This also does most of the work of #534.
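
A rough sketch of the non-allocating idea, not the code from the commit: convert into a caller-provided fixed-size buffer sized for the 32,767-unit 'W' path limit, so OutOfMemory is not a possible failure. The names `path_max_wide` and `toWidePath` are hypothetical, the length check is deliberately conservative (it counts UTF-8 bytes, not UTF-16 units), and `std.unicode.utf8ToUtf16Le` is assumed to be available.

```zig
const std = @import("std");

// Assumed constant: the 'W' path limit in UTF-16 code units.
const path_max_wide = 32767;

/// Hypothetical helper: convert `utf8` into `buf` and 0-terminate it.
/// No allocator is involved, so the only failures are bad or oversized input.
fn toWidePath(buf: *[path_max_wide + 1]u16, utf8: []const u8) ![]const u16 {
    // A UTF-8 string never needs more UTF-16 units than it has bytes,
    // so this check guarantees the buffer (plus terminator) is big enough.
    if (utf8.len > path_max_wide) return error.NameTooLong;
    const n = try std.unicode.utf8ToUtf16Le(buf, utf8);
    buf[n] = 0;
    return buf[0..n];
}

test "stack buffer conversion" {
    var buf: [path_max_wide + 1]u16 = undefined;
    const wide = try toWidePath(&buf, "C:\\tmp");
    try std.testing.expectEqual(@as(usize, 6), wide.len);
    try std.testing.expectEqual(@as(u16, 0), buf[wide.len]);
}
```

The trade-off is roughly 64 KiB of stack per buffer in exchange for removing the allocator parameter.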
@andrewrk andrewrk modified the milestones: 0.4.0, 0.3.0 Aug 22, 2018
@andrewrk
Member Author

Bumping up to 0.3.0 since this is mostly solved by the above commit.

@andrewrk
Member Author

andrewrk commented Sep 3, 2018

Progress (remaining 'A' declarations and call sites):

  • std/os/windows/advapi32.zig:pub extern "advapi32" stdcallcc fn CryptAcquireContextA(
  • std/os/windows/kernel32.zig:pub extern "kernel32" stdcallcc fn FreeEnvironmentStringsA(penv: [*]u8) BOOL;
  • std/os/windows/kernel32.zig:pub extern "kernel32" stdcallcc fn GetCommandLineA() LPSTR;
  • std/os/windows/kernel32.zig:pub extern "kernel32" stdcallcc fn GetEnvironmentStringsA() ?[*]u8;
  • std/os/windows/kernel32.zig:pub extern "kernel32" stdcallcc fn GetEnvironmentVariableA(lpName: LPCSTR, lpBuffer: LPSTR, nSize: DWORD) DWORD;
  • std/os/windows/kernel32.zig:pub extern "kernel32" stdcallcc fn LoadLibraryA(lpLibFileName: LPCSTR) ?HMODULE;
  • std/os/windows/shlwapi.zig:pub extern "shlwapi" stdcallcc fn PathFileExistsA(pszPath: ?LPCTSTR) BOOL;
  • std/os/windows/user32.zig:pub extern "user32" stdcallcc fn MessageBoxA(hWnd: ?HANDLE, lpText: ?LPCTSTR, lpCaption: ?LPCTSTR, uType: UINT) c_int;
  • std/os/windows/util.zig: return windows.LoadLibraryA(padded_buff.ptr) orelse error.DllNotFound;

@andrewrk andrewrk modified the milestones: 0.3.0, 0.4.0 Sep 3, 2018
@emekoi
Contributor

emekoi commented Sep 17, 2018

About the first one: should we really be using it? According to the docs, that API is deprecated.

@andrewrk
Member Author

andrewrk commented Sep 17, 2018

See #1318 for the details on CryptAcquireContextA vs RtlGenRandom.

@andrewrk
Member Author

Oops, according to that issue, we should actually just remove CryptAcquireContextA from the standard library, since it's deprecated according to Microsoft and the Zig std lib does not depend on it.

andrewrk added a commit that referenced this issue Sep 18, 2018
 * `CryptAcquireContextA`
 * `CryptReleaseContext`
 * `CryptGenRandom`

See #534 (comment)
@Sobeston
Contributor

> If the ANSI code page is configured for UTF-8, -A APIs operate in UTF-8. - MSDN

In the future we could go for this and drop -W, using UTF-8 directly. Unfortunately this only works in very new versions of windows. Perhaps this should be left for zig 1.0 or beyond.

@andrewrk andrewrk modified the milestones: 0.7.0, 0.8.0 Oct 9, 2020
@andrewrk andrewrk modified the milestones: 0.8.0, 0.9.0 Nov 6, 2020
@codehz
Contributor

codehz commented Nov 20, 2020

> If the ANSI code page is configured for UTF-8, -A APIs operate in UTF-8. - MSDN
>
> In the future we could go for this and drop -W, using UTF-8 directly. Unfortunately this only works in very new versions of windows. Perhaps this should be left for zig 1.0 or beyond.

BUT it still has a problem: the path length is still limited to 260...

@andrewrk
Member Author

OK now we're down to one final callsite before closing this issue:

if (windows.kernel32.FillConsoleOutputCharacterA(

@marler8997
Contributor

Now that zig std no longer uses the *A functions, do you think we should remove their declarations from std? So long as people can get the declarations easily from somewhere else if they need them, so maybe we would wait for the package manager first?

@expikr
Contributor

expikr commented Oct 28, 2023

With #17448 it now might be worth reconsidering using the ANSI functions by default.

@andrewrk
Member Author

How is that related to this issue?

@squeek502
Collaborator

squeek502 commented Oct 29, 2023

> How is that related to this issue?

It's possible to use embedded .manifest files to get Windows to 'speak' UTF-8 via the A suffixed functions: https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page

The problems with this strategy:

  • It would require every Zig-compiled program to use such a .manifest
  • It may or may not apply to all Win32 APIs; from the docs: "If the ANSI code page is configured for UTF-8, then -A APIs typically operate in UTF-8"
  • AFAIK this is just a layer over the W APIs--that is, setting the code page to UTF-8 only means that the A suffixed functions will convert between UTF-8 and UTF-16 for you; it won't cut UTF-16 conversion out of the picture
  • Zig prefers ntdll APIs anyway which don't support UTF-8, so the benefit to the Zig standard library implementation would be minimal

Oh, and another point that I tested a while back that's also relevant: setting the code page to UTF-8 using the .manifest does not affect the console input/output code page (so it's not a potential solution for #7600)

@Paul-Dempsey

FWIW there are areas of the Windows API where the A entry points are not simply wrappers that convert parameters and call the W version. This is a simplistic notion that is not universally true. For example, GetAddrInfoEx: the A entry points do not support async and all related parameters must be null, but the W variant does support async completion. There are other corners of the Windows ecosystem where this phenomenon holds.
