Skip to content

venv windows: Missing venv/bin folder, should symlink venv/bin -> venv/Scripts #97586

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
zackees opened this issue Sep 27, 2022 · 13 comments
Open
Labels
3.13 bugs and security fixes OS-windows stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@zackees
Copy link

zackees commented Sep 27, 2022

Bug report

Entering into a venv environment on Windows doesn't work well.

The command to enter into a virtual environment on Linux and MacOS looks like this:

source venv/bin/activate.sh

On Windows however, there is no venv/bin but instead venv/Scripts, which appears to be a similar / same folder to venv/bin. Therefore on windows (git-bash) source venv/bin/activate.sh does not work

As a workaround in my projects, I simply create a symlink using the windows junction point feature (mkdir /j) so that symlink venv/bin points to venv/Scripts and then everything seems to work right.

Windows gained non-admin symlinking in 2019

The fix for this problem only became available in 2019 since mkdir /j was added or enabled for windows directory symlinks. Prior to this, a windows host needed to have admin privilege's or a registry hack in order to symlink a folder.

Your environment

  • Windows 10, in git-bash
  • python 3.10.7
@zackees zackees added the type-bug An unexpected behavior, bug, or error label Sep 27, 2022
@eryksun
Copy link
Contributor

eryksun commented Sep 27, 2022

CMD's mklink /j command creates a junction (mount point), not a symlink 1. Junctions have been supported since Windows 2000. The filesystem must support reparse points (e.g. NTFS and ReFS, but not FAT32 and exFAT). Creating a junction does not require a specific privilege.

CMD's mklink /d command creates a symlink to a directory. Symlinks have been supported since Windows Vista (2007). The filesystem must support reparse points. Creating a symlink requires the user to have SeCreateSymbolicLinkPrivilege. By default this privilege is only granted to the "Administrators" group, but it can easily be granted to the "Users" group (or "Authenticated Users"), in which case it's always available without having to elevate. Windows 10+ supports unprivileged creation of symlinks if the system is in developer mode.

What do you suggest be done if venv can't create a symlink or junction because the filesystem doesn't support reparse points? Should we leave that as an unsupported case? Would it be better to create "bin" as the real directory, and try to create "Scripts" as a compatibility link?

Footnotes

  1. FYI, a mount point is handled differently than a symlink when opening a path. The resolved path of a symlink replaces the previously traversed path, while that of a mount point does not. This can affect how relative symlink targets are resolved in the remaining path (e.g. a rooted target path such as "\spam" or a target path with ".." components such as "..\..\spam"). Also, when accessed in a UNC path, the target of a mount point is traversed on the server side, which is possible because a junction is only allowed to target local system devices (that is, devices that are local to the server), while the target of a symlink has to be resolved and traversed on the client (uh oh, that link to "C:\spam" actually resolves to the "C:" drive on the client machine; fortunately remote-to-local symlinks are disallowed by default).

@eryksun eryksun added OS-windows stdlib Python modules in the Lib dir labels Sep 27, 2022
@zackees
Copy link
Author

zackees commented Sep 27, 2022

What do you suggest be done if venv can't create a symlink or junction because the filesystem doesn't support reparse points? Should we leave that as an unsupported case? Would it be better to create "bin" as the real directory, and try to create "Scripts" as a compatibility link?

My humble suggestion is to create a junction point venv/bin -> venv/Scripts. I've used that when working at Google and for 40-some independent python projects and it works great. I don't know if a real symlink on windows would have any extra benefit.

If the juction point fails then emit a warning and continue. It's the same behavior as it is now, anyway.

@zackees zackees changed the title venv windows: Missing venv/bin folder, should symlink venv/bin -> symlink venv/Scripts venv windows: Missing venv/bin folder, should symlink venv/bin -> venv/Scripts Sep 27, 2022
@eryksun
Copy link
Contributor

eryksun commented Sep 27, 2022

My humble suggestion is to create a junction point venv/bin -> venv/Scripts.

That's fine, regarding the warning and not supporting "bin" if the filesystem doesn't support reparse points. I was just curious whether you thought forward consistency with POSIX "bin" was more important than backward compatibility with "Scripts" in that case.

Python doesn't currently have a supported function that creates a junction. We have _winapi.CreateJunction() for internal use in tests, but it's not cautiously designed as a public API, and it lacks comprehensive tests. We'd either have to improve it or call out to the shell via subprocess.run('mklink /j ...', shell=True). Passing a command line to the shell introduces its own set of risks and problems.

@zackees
Copy link
Author

zackees commented Sep 27, 2022

I was just curious whether you thought forward consistency with POSIX "bin" was more important than backward compatibility with "Scripts" in that case.

I do, but that ship has sailed so to speak. The biggest importance is maintaining behavior for older clients that depend on venv/Scripts.

We have _winapi.CreateJunction() for internal use in tests, but it's not cautiously designed as a public API, and it lacks comprehensive tests

Yes, this is what I use too and it works great. Does it need to be introduced as a public api? It seems like it's just necessary to invoke internally.

@eryksun
Copy link
Contributor

eryksun commented Sep 27, 2022

Does it need to be introduced as a public api?

No, but CreateJunction() would need still need to be improved. It should support passing src_path as an extended path that's prefixed by "\\?\". If it enables SeRestorePrivilege for backup semantics, it should only be enabled for the current thread, not the entire process. It should validate that src_path is on a local device; else a broken junction is created that won't be discovered until something tries to traverse it. If it fails to set the reparse point, it should delete the created directory, dst_path. Above all, it needs tests to validate its design and protect against regressions.

@eryksun
Copy link
Contributor

eryksun commented Sep 29, 2022

We'd also need to fix Argument Clinic, because the generated function _winapi_CreateJunction() is really bad code. If the call has the wrong number of arguments or wrong argument types, it calls PyMem_Free((void *)src_path) and PyMem_Free((void *)dst_path) on uninitialized local variables, which corrupts the heap and causes a crash.

@zackees
Copy link
Author

zackees commented Sep 29, 2022

Why not use mklink ? Is there a concern that mklink can be swapped out on the host machine?

@eryksun
Copy link
Contributor

eryksun commented Sep 29, 2022

Here's an attempt at cleaning up _winapi.CreateJunction().

  • I removed the privilege enable code since it's not necessary. (If desired, a new _winapi function can be implemented to enable a privilege set for the current thread.)
  • It supports the WinAPI "\\?\" and "\\.\" device prefixes, and more carefully rejects other UNC paths and the NTAPI "\??\" device prefix. Any mix of forward slashes and backslashes is supported.
  • It sets a Windows error code for all failure cases.
  • It locks down the full print path early on, made rigorous against a race condition with another thread that changes the current working directory.
  • It ensures that if the print path exists, it targets a directory, and that it always must target an existing local volume device.
  • It requests the least access when opening the directory. Only FILE_WRITE_DATA and SYNCHRONIZE access are required for the synchronous DeviceIoControl() call that sets the reparse point.
  • On cleanup, it ensures that all allocated memory is freed and the handle is closed. If setting the reparse point failed, the destination directory gets deleted.
static PyObject *
_winapi_CreateJunction_impl(PyObject *module, LPCWSTR src_path,
                            LPCWSTR dst_path)
/*[clinic end generated code: output=44b3f5e9bbcc4271 input=963d29b44b9384a7]*/
{
    WCHAR print_path_buf[MAX_PATH];
    WCHAR vol_path_buf[MAX_PATH];
    LPWSTR print_path = print_path_buf;
    LPWSTR vol_path = vol_path_buf;
    size_t len = Py_ARRAY_LENGTH(print_path_buf);
    size_t prefix_maybe = 4;
    size_t rdb_size;
    _Py_PREPARSE_DATA_BUFFER prdb = NULL;
    HANDLE hdir = INVALID_HANDLE_VALUE;
    BOOL created = FALSE;
    BOOL success = FALSE;

    if (PySys_Audit("_winapi.CreateJunction", "uu", src_path, dst_path) < 0) {
        return NULL;
    }

    if (src_path == NULL || dst_path == NULL) {
        // This should never occur. The Argument Clinic declaration
        // doesn't allow NoneType.
        return PyErr_SetFromWindowsErr(ERROR_INVALID_PARAMETER);
    }

    if ((wcsnlen(src_path, 4) == 4) &&
        (src_path[0] == L'\\' || src_path[0] == L'/')) {
        if (src_path[1] == L'\\' || src_path[1] == L'/') {
            if ((src_path[2] == L'?' || src_path[2] == L'.') &&
                (src_path[3] == L'\\' || src_path[3] == L'/')) {
                prefix_maybe = 0;
            } else {
                // UNC paths are not supported.
                SetLastError(ERROR_INVALID_PARAMETER);
                goto cleanup;
            }
        } else if ((src_path[1] == L'?' && src_path[2] == L'?') &&
                   (src_path[3] == L'\\' || src_path[3] == L'/')) {
            // NT "\??\" could be silently replaced with Windows "\\?\",
            // but strictly speaking this is a domain error.
            SetLastError(ERROR_INVALID_NAME);
            goto cleanup;
        }
    }

    // Get the resolved print path and length.
    while (1) {
        DWORD result = GetFullPathNameW(src_path, (DWORD)len, print_path,
                                        NULL);
        if (result == 0) {
            goto cleanup;
        }
        if (result < len) {
            // Include the null terminator in the length.
            len = result + 1;
            break;
        }
        LPWSTR temp = PyMem_RawRealloc(
                        (print_path == print_path_buf) ? NULL : print_path,
                        result * sizeof(WCHAR));
        if (temp == NULL) {
            SetLastError(ERROR_NOT_ENOUGH_MEMORY);
            goto cleanup;
        }
        print_path = temp;
        len = result;
    }

    // Contents of the reparse data buffer for a junction:
    //
    //   * The substitute name is the target path in the system.
    //     It must be a native NT path, e.g. prefixed with "\??\".
    //   * The print name is an optional target path to show in
    //     directory listings. It should be a DOS path.
    //   * Both names are stored sequentially in PathBuffer, and
    //     both must be null-terminated.
    //   * There are four fields that define the respective offset and
    //     length inside PathBuffer: SubstituteNameOffset,
    //     SubstituteNameLength, PrintNameOffset and PrintNameLength.
    //   * The total required allocation for the reparse data buffer
    //     is the sum of the following:
    //       - REPARSE_DATA_BUFFER_HEADER_SIZE
    //       - size of MountPointReparseBuffer, without PathBuffer
    //       - size of the substitute name in bytes
    //       - size of the print name in bytes

    rdb_size = _Py_REPARSE_DATA_BUFFER_HEADER_SIZE +
               sizeof(prdb->MountPointReparseBuffer) -
               sizeof(prdb->MountPointReparseBuffer.PathBuffer) +
               (prefix_maybe + len + len) * sizeof(WCHAR);
    if (rdb_size > _Py_MAXIMUM_REPARSE_DATA_BUFFER_SIZE) {
        SetLastError(ERROR_FILENAME_EXCED_RANGE);
        goto cleanup;
    }
    prdb = PyMem_RawCalloc(1, rdb_size);
    if (prdb == NULL) {
        SetLastError(ERROR_NOT_ENOUGH_MEMORY);
        goto cleanup;
    }

    prdb->ReparseTag = IO_REPARSE_TAG_MOUNT_POINT;
    prdb->ReparseDataLength =
        (USHORT)(rdb_size - _Py_REPARSE_DATA_BUFFER_HEADER_SIZE);
    prdb->MountPointReparseBuffer.SubstituteNameOffset = 0;
    prdb->MountPointReparseBuffer.SubstituteNameLength =
        (USHORT)((prefix_maybe + len - 1) * sizeof(WCHAR));
    prdb->MountPointReparseBuffer.PrintNameOffset =
        (USHORT)((prefix_maybe + len) * sizeof(WCHAR));
    prdb->MountPointReparseBuffer.PrintNameLength =
        (USHORT)((len - 1) * sizeof(WCHAR));

    // Store the subsitute path. At this point, it may have a WinAPI
    // "\\?\" or "\\.\" local device prefix.
    wcscpy(prdb->MountPointReparseBuffer.PathBuffer + prefix_maybe,
           print_path);
    // Store the NTAPI "\??\" local device prefix of the substitute path.
    wcsncpy(prdb->MountPointReparseBuffer.PathBuffer, L"\\??\\", 4);
    // Store the print path. It may have a WinAPI "\\?\" or "\\.\" local
    // device prefix.
    wcscpy(prdb->MountPointReparseBuffer.PathBuffer + prefix_maybe + len,
           print_path);

    // The target path must be a directory if it exists.
    DWORD attributes = GetFileAttributesW(print_path);
    if (attributes != INVALID_FILE_ATTRIBUTES &&
        (attributes & FILE_ATTRIBUTE_DIRECTORY) == 0) {
        SetLastError(ERROR_DIRECTORY);
        goto cleanup;
    }

    // The target path must be on an existing, local volume.
    len = Py_ARRAY_LENGTH(vol_path_buf);
    while (len <= UNICODE_STRING_MAX_CHARS) {
        BOOL ret = GetVolumePathNameW(print_path, vol_path, (DWORD)len);
        if (!ret && GetLastError() != ERROR_FILENAME_EXCED_RANGE) {
            goto cleanup;
        }
        if (ret && vol_path[wcslen(vol_path)-1] == L'\\') {
            break;
        }
        len = len * 2;
        LPWSTR temp = PyMem_RawRealloc(
                        (vol_path == vol_path_buf) ? NULL : vol_path,
                        len * sizeof(WCHAR));
        if (temp == NULL) {
            SetLastError(ERROR_NOT_ENOUGH_MEMORY);
            goto cleanup;
        }
        vol_path = temp;
    }
    switch (GetDriveTypeW(vol_path)) {
    case DRIVE_UNKNOWN:
    case DRIVE_NO_ROOT_DIR:
    case DRIVE_REMOTE:
        SetLastError(ERROR_INVALID_PARAMETER);
        goto cleanup;
    }

    // Create the directory, open it, and set the junction reparse point.

    if (!(created = CreateDirectoryW(dst_path, NULL))) {
        goto cleanup;
    }

    hdir = CreateFileW(dst_path, FILE_WRITE_DATA | SYNCHRONIZE, 0, NULL,
                OPEN_EXISTING, FILE_FLAG_OPEN_REPARSE_POINT |
                FILE_FLAG_BACKUP_SEMANTICS, NULL);
    if (hdir == INVALID_HANDLE_VALUE) {
        goto cleanup;
    }

    DWORD outbytes;
    if (!(success = DeviceIoControl(hdir, FSCTL_SET_REPARSE_POINT,
                        prdb, (DWORD)rdb_size, NULL, 0, &outbytes, NULL))) {
        goto cleanup;
    }

cleanup:

    DWORD error = GetLastError();

    if (print_path != print_path_buf) {
        PyMem_RawFree(print_path);
    }
    if (vol_path != vol_path_buf) {
        PyMem_RawFree(vol_path);
    }
    if (prdb != NULL) {
        PyMem_RawFree(prdb);
    }
    if (hdir != INVALID_HANDLE_VALUE) {
        CloseHandle(hdir);
    }

    if (!success) {
        if (created) {
            RemoveDirectoryW(dst_path);
        }
        return PyErr_SetFromWindowsErr(error);
    }

    Py_RETURN_NONE;
}

@eryksun
Copy link
Contributor

eryksun commented Sep 29, 2022

Why not use mklink ? Is there a concern that mklink can be swapped out on the host machine?

For one, CMD doesn't support escaping "%" characters in a command line, and "%" is a valid filename character. At best, one can add a caret after each percent character, such as "%^username%^". On the first pass of the command line, CMD will try to expand an environment variable named "^username". Hopefully no such environment variable exists, but that's just hope. On its second pass, CMD consumes the carets, leaving unexpanded "%username%" in the command line.

@eryksun eryksun added type-feature A feature request or enhancement 3.12 only security fixes and removed type-bug An unexpected behavior, bug, or error labels Oct 3, 2022
@mohamadArdebili
Copy link

mohamadArdebili commented Sep 16, 2023

I'm using Windows, but instead of Scripts Folder, there's bin and the env won't be activated, I don't know why.
Can you help??
@eryksun
@zackees

@zackees
Copy link
Author

zackees commented Sep 16, 2023

No idea

@brettcannon
Copy link
Member

I'm using Windows, but instead of Scripts Folder, there's bin and the env won't be activated, I don't know why.

The code for calculating the path can be found at

binpath = self._venv_path(env_dir, 'scripts')
. Chances are you Python install is doing something odd.

@zackees
Copy link
Author

zackees commented Sep 27, 2023

Fyi in python 3.12 bin is now symlinked to scripts in python 3.12.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes OS-windows stdlib Python modules in the Lib dir type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

5 participants