
[cuda_pathfinder] Initial version of find_nvidia_headers.py for nvshmem (Minimal Viable Product) #661


Draft: wants to merge 7 commits into main


@rwgk (Collaborator) commented May 28, 2025

Description

Currently there is no public API.

This is intended to support internal development.

@copy-pr-bot (bot, Contributor) commented May 28, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@rwgk (Collaborator, Author) commented May 28, 2025

/ok to test

@rwgk rwgk self-assigned this May 28, 2025

@leofang leofang self-requested a review May 28, 2025 22:26
@leofang leofang added P0 High priority - Must do! feature New feature or request cuda.bindings Everything related to the cuda.bindings module labels May 28, 2025
@leofang leofang added this to the cuda-pathfinder first release milestone Jul 2, 2025
@github-project-automation github-project-automation bot moved this to Todo in CCCL Jul 2, 2025
@ZzEeKkAa commented

Just wondering: what is the status of this PR?

@leofang leofang added cuda.pathfinder Everything related to the cuda.pathfinder module and removed cuda.bindings Everything related to the cuda.bindings module labels Jul 18, 2025
@rwgk (Collaborator, Author) commented Jul 18, 2025

I still need to fix this up after merging #723 a couple of days ago.

@isVoid commented Jul 28, 2025

Is there a mechanism that lets users override the path returned by the API? Or is that largely the library's responsibility?

@rwgk rwgk changed the title Initial version of path_finder find_nvidia_headers.py (Minimal Viable Product) [cuda_pathfinder] Initial version of find_nvidia_headers.py for nvshmem (Minimal Viable Product) Aug 12, 2025
@rwgk (Collaborator, Author) commented Aug 12, 2025

/ok to test

@ZzEeKkAa commented

Will it respect CUDA_HOME and CUDA_PATH?

@rwgk (Collaborator, Author) commented Aug 12, 2025

> Will it respect CUDA_HOME and CUDA_PATH?

The code added in this PR ignores those completely. I've only tested these situations (copy-pasted from the current cuda_pathfinder/tests/test_find_nvidia_headers.py):

    # pip install nvidia-nvshmem-cu12
    # pip install nvidia-nvshmem-cu13

    # conda create -y -n nvshmem python=3.12
    # conda activate nvshmem
    # conda install -y conda-forge::libnvshmem3 conda-forge::libnvshmem-dev

    # wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
    # sudo dpkg -i cuda-keyring_1.1-1_all.deb
    # sudo apt update
    # sudo apt install libnvshmem3-cuda-12 libnvshmem3-dev-cuda-12
    # sudo apt install libnvshmem3-cuda-13 libnvshmem3-dev-cuda-13
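For reference, the three tested install methods can be turned into an ordered list of candidate include directories. This is a minimal sketch, not the code in this PR: `candidate_nvshmem_include_dirs` and `first_dir_with_header` are hypothetical helpers, and the pip (`<site-packages>/nvidia/nvshmem/include`) and conda (`$CONDA_PREFIX/include`) header locations are assumptions. Only the apt location (`/usr/include/nvshmem_<cuda-major>`) is confirmed by the `dpkg -L` listing further down in this thread.

```python
import os
from typing import List, Optional, Sequence


def candidate_nvshmem_include_dirs(
    cuda_major: int,
    site_packages: Sequence[str],
    conda_prefix: Optional[str],
) -> List[str]:
    """Return candidate nvshmem include directories in search order (hypothetical helper)."""
    candidates: List[str] = []
    # pip install nvidia-nvshmem-cu12 / -cu13 (ASSUMED wheel layout)
    for sp in site_packages:
        candidates.append(os.path.join(sp, "nvidia", "nvshmem", "include"))
    # conda install conda-forge::libnvshmem-dev (ASSUMED to use $CONDA_PREFIX/include)
    if conda_prefix:
        candidates.append(os.path.join(conda_prefix, "include"))
    # apt install libnvshmem3-dev-cuda-<major> (matches the dpkg -L listing)
    candidates.append(f"/usr/include/nvshmem_{cuda_major}")
    return candidates


def first_dir_with_header(dirs: Sequence[str], header: str = "nvshmem.h") -> Optional[str]:
    """Return the first directory in `dirs` that contains `header`, else None."""
    for d in dirs:
        if os.path.isfile(os.path.join(d, header)):
            return d
    return None
```

A caller could combine the two, e.g. `first_dir_with_header(candidate_nvshmem_include_dirs(13, site.getsitepackages(), os.environ.get("CONDA_PREFIX")))`. The pip, conda, apt order is only one plausible choice; it prefers environment-local installs over the system-wide one.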

Offline, @leofang wrote that finding headers in general is now the top priority (I just created #832 to track the work). That will bring in CUDA_HOME / CUDA_PATH support for headers. The next step will be to get an overview of where the headers live.
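A minimal sketch of how CUDA_HOME / CUDA_PATH could feed into a header search, assuming CUDA_HOME takes precedence over CUDA_PATH, empty values are ignored, and headers live in `<root>/include`. Both function names are hypothetical and not part of cuda.pathfinder; the precedence rule is an assumption, not a decision made in this PR.

```python
import os
from typing import Mapping, Optional


def cuda_root_from_env(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the CUDA root from the environment (hypothetical helper)."""
    # ASSUMED precedence: CUDA_HOME first, then CUDA_PATH; empty strings are skipped.
    for var in ("CUDA_HOME", "CUDA_PATH"):
        value = environ.get(var)
        if value:
            return value
    return None


def cuda_include_dir_from_env(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return <CUDA root>/include, or None if neither variable is set."""
    root = cuda_root_from_env(environ)
    return os.path.join(root, "include") if root else None
```

Passing the environment as a mapping keeps the lookup testable without touching the real process environment.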

Coming back to nvshmem: Can those headers also be installed into CUDA_HOME / CUDA_PATH? (How is that usually done?)

@isVoid commented Aug 13, 2025

> Coming back to nvshmem: Can those headers also be installed into CUDA_HOME / CUDA_PATH? (How is that usually done?)

From an offline conversation with the nvshmem team: if installed via the apt package, they go to /opt/nvshmem; if installed via pip, they go to site-packages/nvidia/nvshmem/. So they are generally not inside CUDA_PATH or CUDA_HOME.

@rwgk (Collaborator, Author) commented Aug 14, 2025

>> Coming back to nvshmem: Can those headers also be installed into CUDA_HOME / CUDA_PATH? (How is that usually done?)

> From an offline conversation with the nvshmem team: if installed via the apt package, they go to /opt/nvshmem; if installed via pip, they go to site-packages/nvidia/nvshmem/. So they are generally not inside CUDA_PATH or CUDA_HOME.

Oh, /opt? That doesn't match what I see on the machine I used for interactive testing. See below.

@isVoid Is there another way people usually install nvshmem via apt?

rwgk-win11.localdomain:~ $ dpkg -L libnvshmem3-cuda-13
/.
/usr
/usr/lib
/usr/lib/x86_64-linux-gnu
/usr/lib/x86_64-linux-gnu/nvshmem
/usr/lib/x86_64-linux-gnu/nvshmem/13
/usr/lib/x86_64-linux-gnu/nvshmem/13/libnvshmem_host.so.3.3.20
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_bootstrap_mpi.so.3.0.0
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_bootstrap_pmi.so.3.0.0
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_bootstrap_pmi2.so.3.0.0
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_bootstrap_pmix.so.3.0.0
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_bootstrap_shmem.so.3.0.0
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_bootstrap_uid.so.3.0.0
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_transport_ibdevx.so.3.0.0
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_transport_ibgda.so.3.0.0
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_transport_ibrc.so.3.0.0
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_transport_libfabric.so.3.0.0
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_transport_ucx.so.3.0.0
/usr/share
/usr/share/doc
/usr/share/doc/libnvshmem3-cuda-13
/usr/share/doc/libnvshmem3-cuda-13/changelog.Debian.gz
/usr/share/doc/libnvshmem3-cuda-13/copyright
/usr/lib/x86_64-linux-gnu/nvshmem/13/libnvshmem_host.so.3
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_bootstrap_mpi.so.3
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_bootstrap_pmi.so.3
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_bootstrap_pmi2.so.3
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_bootstrap_pmix.so.3
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_bootstrap_shmem.so.3
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_bootstrap_uid.so.3
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_transport_ibdevx.so.3
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_transport_ibgda.so.3
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_transport_ibrc.so.3
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_transport_libfabric.so.3
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_transport_ucx.so.3
rwgk-win11.localdomain:~ $ dpkg -L libnvshmem3-dev-cuda-13
/.
/usr
/usr/bin
/usr/bin/nvshmem_13
/usr/bin/nvshmem_13/examples
/usr/bin/nvshmem_13/examples/collective-launch
/usr/bin/nvshmem_13/examples/dev-guide-ring
/usr/bin/nvshmem_13/examples/dev-guide-ring-mpi
/usr/bin/nvshmem_13/examples/moe_shuffle
/usr/bin/nvshmem_13/examples/mpi-based-init
/usr/bin/nvshmem_13/examples/on-stream
/usr/bin/nvshmem_13/examples/put-block
/usr/bin/nvshmem_13/examples/ring-bcast
/usr/bin/nvshmem_13/examples/ring-reduce
/usr/bin/nvshmem_13/examples/thread-group
/usr/bin/nvshmem_13/examples/uid-based-init
/usr/bin/nvshmem_13/examples/user-buffer
/usr/bin/nvshmem_13/hydra_nameserver
/usr/bin/nvshmem_13/hydra_persist
/usr/bin/nvshmem_13/hydra_pmi_proxy
/usr/bin/nvshmem_13/nvshmem-info
/usr/bin/nvshmem_13/nvshmrun.hydra
/usr/bin/nvshmem_13/perftest
/usr/bin/nvshmem_13/perftest/device
/usr/bin/nvshmem_13/perftest/device/coll
/usr/bin/nvshmem_13/perftest/device/coll/alltoall_latency
/usr/bin/nvshmem_13/perftest/device/coll/alltoall_latency.cubin
/usr/bin/nvshmem_13/perftest/device/coll/barrier_latency
/usr/bin/nvshmem_13/perftest/device/coll/barrier_latency.cubin
/usr/bin/nvshmem_13/perftest/device/coll/bcast_latency
/usr/bin/nvshmem_13/perftest/device/coll/bcast_latency.cubin
/usr/bin/nvshmem_13/perftest/device/coll/fcollect_latency
/usr/bin/nvshmem_13/perftest/device/coll/fcollect_latency.cubin
/usr/bin/nvshmem_13/perftest/device/coll/reducescatter_latency
/usr/bin/nvshmem_13/perftest/device/coll/reducescatter_latency.cubin
/usr/bin/nvshmem_13/perftest/device/coll/reduction_latency
/usr/bin/nvshmem_13/perftest/device/coll/reduction_latency.cubin
/usr/bin/nvshmem_13/perftest/device/coll/sync_latency
/usr/bin/nvshmem_13/perftest/device/coll/sync_latency.cubin
/usr/bin/nvshmem_13/perftest/device/pt-to-pt
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_atomic_bw
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_atomic_bw.cubin
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_atomic_latency
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_atomic_latency.cubin
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_atomic_ping_pong_latency
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_atomic_ping_pong_latency.cubin
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_g_bw
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_g_latency
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_g_latency.cubin
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_get_bw
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_get_latency
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_get_latency.cubin
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_p_bw
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_p_latency
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_p_latency.cubin
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_p_ping_pong_latency
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_p_ping_pong_latency.cubin
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_put_atomic_ping_pong_latency
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_put_atomic_ping_pong_latency.cubin
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_put_bw
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_put_latency
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_put_latency.cubin
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_put_ping_pong_latency
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_put_ping_pong_latency.cubin
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_put_signal_ping_pong_latency
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_put_signal_ping_pong_latency.cubin
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_signal_ping_pong_latency
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_signal_ping_pong_latency.cubin
/usr/bin/nvshmem_13/perftest/device/pt-to-pt/shmem_st_bw
/usr/bin/nvshmem_13/perftest/device/tile
/usr/bin/nvshmem_13/perftest/device/tile/tile_allgather_latency
/usr/bin/nvshmem_13/perftest/device/tile/tile_allreduce_latency
/usr/bin/nvshmem_13/perftest/host
/usr/bin/nvshmem_13/perftest/host/coll
/usr/bin/nvshmem_13/perftest/host/coll/alltoall_on_stream
/usr/bin/nvshmem_13/perftest/host/coll/barrier_all_on_stream
/usr/bin/nvshmem_13/perftest/host/coll/barrier_on_stream
/usr/bin/nvshmem_13/perftest/host/coll/broadcast_on_stream
/usr/bin/nvshmem_13/perftest/host/coll/fcollect_on_stream
/usr/bin/nvshmem_13/perftest/host/coll/reducescatter_on_stream
/usr/bin/nvshmem_13/perftest/host/coll/reduction_on_stream
/usr/bin/nvshmem_13/perftest/host/coll/sync_all_on_stream
/usr/bin/nvshmem_13/perftest/host/coll/sync_on_stream
/usr/bin/nvshmem_13/perftest/host/init
/usr/bin/nvshmem_13/perftest/host/init/malloc
/usr/bin/nvshmem_13/perftest/host/pt-to-pt
/usr/bin/nvshmem_13/perftest/host/pt-to-pt/bw
/usr/bin/nvshmem_13/perftest/host/pt-to-pt/latency
/usr/bin/nvshmem_13/perftest/host/pt-to-pt/stream_latency
/usr/include
/usr/include/nvshmem_13
/usr/include/nvshmem_13/bootstrap_device_host
/usr/include/nvshmem_13/bootstrap_device_host/nvshmem_uniqueid.h
/usr/include/nvshmem_13/device
/usr/include/nvshmem_13/device/nvshmem_coll_defines.cuh
/usr/include/nvshmem_13/device/nvshmem_defines.h
/usr/include/nvshmem_13/device/nvshmem_device_macros.h
/usr/include/nvshmem_13/device/nvshmemx_coll_defines.cuh
/usr/include/nvshmem_13/device/nvshmemx_collective_launch_apis.h
/usr/include/nvshmem_13/device/nvshmemx_defines.h
/usr/include/nvshmem_13/device/tile
/usr/include/nvshmem_13/device/tile/nvshmemx_tile_api.hpp
/usr/include/nvshmem_13/device/tile/nvshmemx_tile_api_defines.cuh
/usr/include/nvshmem_13/device_host
/usr/include/nvshmem_13/device_host/nvshmem_common.cuh
/usr/include/nvshmem_13/device_host/nvshmem_proxy_channel.h
/usr/include/nvshmem_13/device_host/nvshmem_tensor.h
/usr/include/nvshmem_13/device_host/nvshmem_types.h
/usr/include/nvshmem_13/device_host_transport
/usr/include/nvshmem_13/device_host_transport/nvshmem_common_ibgda.h
/usr/include/nvshmem_13/device_host_transport/nvshmem_common_transport.h
/usr/include/nvshmem_13/device_host_transport/nvshmem_constants.h
/usr/include/nvshmem_13/host
/usr/include/nvshmem_13/host/nvshmem_api.h
/usr/include/nvshmem_13/host/nvshmem_coll_api.h
/usr/include/nvshmem_13/host/nvshmem_macros.h
/usr/include/nvshmem_13/host/nvshmemx_api.h
/usr/include/nvshmem_13/host/nvshmemx_coll_api.h
/usr/include/nvshmem_13/non_abi
/usr/include/nvshmem_13/non_abi/device
/usr/include/nvshmem_13/non_abi/device/coll
/usr/include/nvshmem_13/non_abi/device/coll/alltoall.cuh
/usr/include/nvshmem_13/non_abi/device/coll/barrier.cuh
/usr/include/nvshmem_13/non_abi/device/coll/broadcast.cuh
/usr/include/nvshmem_13/non_abi/device/coll/defines.cuh
/usr/include/nvshmem_13/non_abi/device/coll/fcollect.cuh
/usr/include/nvshmem_13/non_abi/device/coll/reduce.cuh
/usr/include/nvshmem_13/non_abi/device/coll/reducescatter.cuh
/usr/include/nvshmem_13/non_abi/device/coll/utils.cuh
/usr/include/nvshmem_13/non_abi/device/common
/usr/include/nvshmem_13/non_abi/device/common/nvshmemi_common_device.cuh
/usr/include/nvshmem_13/non_abi/device/common/nvshmemi_tile_utils.cuh
/usr/include/nvshmem_13/non_abi/device/pt-to-pt
/usr/include/nvshmem_13/non_abi/device/pt-to-pt/ibgda_device.cuh
/usr/include/nvshmem_13/non_abi/device/pt-to-pt/nvshmemi_transfer_api.cuh
/usr/include/nvshmem_13/non_abi/device/pt-to-pt/proxy_device.cuh
/usr/include/nvshmem_13/non_abi/device/pt-to-pt/transfer_device.cuh
/usr/include/nvshmem_13/non_abi/device/pt-to-pt/utils_device.h
/usr/include/nvshmem_13/non_abi/device/team
/usr/include/nvshmem_13/non_abi/device/team/nvshmemi_team_defines.cuh
/usr/include/nvshmem_13/non_abi/device/threadgroup
/usr/include/nvshmem_13/non_abi/device/threadgroup/nvshmemi_common_device_defines.cuh
/usr/include/nvshmem_13/non_abi/device/wait
/usr/include/nvshmem_13/non_abi/device/wait/nvshmemi_wait_until_apis.cuh
/usr/include/nvshmem_13/non_abi/nvshmem_build_options.h
/usr/include/nvshmem_13/non_abi/nvshmem_version.h
/usr/include/nvshmem_13/non_abi/nvshmemx_error.h
/usr/include/nvshmem_13/nvshmem.h
/usr/include/nvshmem_13/nvshmem_host.h
/usr/include/nvshmem_13/nvshmemx.h
/usr/lib
/usr/lib/x86_64-linux-gnu
/usr/lib/x86_64-linux-gnu/nvshmem
/usr/lib/x86_64-linux-gnu/nvshmem/13
/usr/lib/x86_64-linux-gnu/nvshmem/13/cmake
/usr/lib/x86_64-linux-gnu/nvshmem/13/cmake/nvshmem
/usr/lib/x86_64-linux-gnu/nvshmem/13/cmake/nvshmem/NVSHMEMConfig.cmake
/usr/lib/x86_64-linux-gnu/nvshmem/13/cmake/nvshmem/NVSHMEMDeviceTargets-release.cmake
/usr/lib/x86_64-linux-gnu/nvshmem/13/cmake/nvshmem/NVSHMEMDeviceTargets.cmake
/usr/lib/x86_64-linux-gnu/nvshmem/13/cmake/nvshmem/NVSHMEMTargets-release.cmake
/usr/lib/x86_64-linux-gnu/nvshmem/13/cmake/nvshmem/NVSHMEMTargets.cmake
/usr/lib/x86_64-linux-gnu/nvshmem/13/cmake/nvshmem/NVSHMEMVersion.cmake
/usr/share
/usr/share/doc
/usr/share/doc/libnvshmem3-dev-cuda-13
/usr/share/doc/libnvshmem3-dev-cuda-13/changelog.Debian.gz
/usr/share/doc/libnvshmem3-dev-cuda-13/copyright
/usr/src
/usr/src/nvshmem
/usr/src/nvshmem/13
/usr/src/nvshmem/13/src
/usr/src/nvshmem/13/src/bootstrap-plugins
/usr/src/nvshmem/13/src/bootstrap-plugins/CMakeLists.txt
/usr/src/nvshmem/13/src/bootstrap-plugins/common
/usr/src/nvshmem/13/src/bootstrap-plugins/common/CMakeLists.txt
/usr/src/nvshmem/13/src/bootstrap-plugins/common/bootstrap_util.cpp
/usr/src/nvshmem/13/src/bootstrap-plugins/common/bootstrap_util.h
/usr/src/nvshmem/13/src/bootstrap-plugins/common/env_defs.h
/usr/src/nvshmem/13/src/bootstrap-plugins/include
/usr/src/nvshmem/13/src/bootstrap-plugins/include/bootstrap_device_host
/usr/src/nvshmem/13/src/bootstrap-plugins/include/bootstrap_device_host/nvshmem_uniqueid.h
/usr/src/nvshmem/13/src/bootstrap-plugins/include/bootstrap_host_transport
/usr/src/nvshmem/13/src/bootstrap-plugins/include/bootstrap_host_transport/env_defs_internal.h
/usr/src/nvshmem/13/src/bootstrap-plugins/include/internal
/usr/src/nvshmem/13/src/bootstrap-plugins/include/internal/bootstrap_host
/usr/src/nvshmem/13/src/bootstrap-plugins/include/internal/bootstrap_host/nvshmemi_bootstrap.h
/usr/src/nvshmem/13/src/bootstrap-plugins/include/internal/bootstrap_host_transport
/usr/src/nvshmem/13/src/bootstrap-plugins/include/internal/bootstrap_host_transport/nvshmemi_bootstrap_defines.h
/usr/src/nvshmem/13/src/bootstrap-plugins/include/non_abi
/usr/src/nvshmem/13/src/bootstrap-plugins/include/non_abi/nvshmem_version.h
/usr/src/nvshmem/13/src/bootstrap-plugins/include/non_abi/nvshmemx_error.h
/usr/src/nvshmem/13/src/bootstrap-plugins/mpi
/usr/src/nvshmem/13/src/bootstrap-plugins/mpi/CMakeLists.txt
/usr/src/nvshmem/13/src/bootstrap-plugins/mpi/bootstrap_mpi.c
/usr/src/nvshmem/13/src/bootstrap-plugins/nvshmem_bootstrap.sym
/usr/src/nvshmem/13/src/bootstrap-plugins/pmi
/usr/src/nvshmem/13/src/bootstrap-plugins/pmi/CMakeLists.txt
/usr/src/nvshmem/13/src/bootstrap-plugins/pmi/bootstrap_pmi.cpp
/usr/src/nvshmem/13/src/bootstrap-plugins/pmix
/usr/src/nvshmem/13/src/bootstrap-plugins/pmix/CMakeLists.txt
/usr/src/nvshmem/13/src/bootstrap-plugins/pmix/bootstrap_pmix.c
/usr/src/nvshmem/13/src/bootstrap-plugins/shmem
/usr/src/nvshmem/13/src/bootstrap-plugins/shmem/CMakeLists.txt
/usr/src/nvshmem/13/src/bootstrap-plugins/shmem/bootstrap_shmem.c
/usr/src/nvshmem/13/src/bootstrap-plugins/uid
/usr/src/nvshmem/13/src/bootstrap-plugins/uid/CMakeLists.txt
/usr/src/nvshmem/13/src/bootstrap-plugins/uid/bootstrap_uid.cpp
/usr/src/nvshmem/13/src/bootstrap-plugins/uid/bootstrap_uid_remap.h
/usr/src/nvshmem/13/src/bootstrap-plugins/uid/bootstrap_uid_types.hpp
/usr/src/nvshmem/13/src/bootstrap-plugins/uid/ncclSocket
/usr/src/nvshmem/13/src/bootstrap-plugins/uid/ncclSocket/ncclsocket_checks.h
/usr/src/nvshmem/13/src/bootstrap-plugins/uid/ncclSocket/ncclsocket_debug.h
/usr/src/nvshmem/13/src/bootstrap-plugins/uid/ncclSocket/ncclsocket_nccl.h
/usr/src/nvshmem/13/src/bootstrap-plugins/uid/ncclSocket/ncclsocket_param.h
/usr/src/nvshmem/13/src/bootstrap-plugins/uid/ncclSocket/ncclsocket_socket.cpp
/usr/src/nvshmem/13/src/bootstrap-plugins/uid/ncclSocket/ncclsocket_socket.hpp
/usr/src/nvshmem/13/src/bootstrap-plugins/uid/ncclSocket/ncclsocket_utils.h
/usr/src/nvshmem/13/src/examples
/usr/src/nvshmem/13/src/examples/CMakeLists.txt
/usr/src/nvshmem/13/src/examples/bootstrap_helper.h
/usr/src/nvshmem/13/src/examples/collective-launch.cu
/usr/src/nvshmem/13/src/examples/dev-guide-ring-mpi.cu
/usr/src/nvshmem/13/src/examples/dev-guide-ring.cu
/usr/src/nvshmem/13/src/examples/gemm_allreduce
/usr/src/nvshmem/13/src/examples/gemm_allreduce/allreduce_nvls_warpspecialized.hpp
/usr/src/nvshmem/13/src/examples/gemm_allreduce/gemmAR_fusion_blackwell_fp16.cu
/usr/src/nvshmem/13/src/examples/gemm_allreduce/nvshmemAlloc.hpp
/usr/src/nvshmem/13/src/examples/gemm_allreduce/sm100_gemm_tma_warpspecialized_allreduce.hpp
/usr/src/nvshmem/13/src/examples/hello.cpp
/usr/src/nvshmem/13/src/examples/moe_shuffle.cu
/usr/src/nvshmem/13/src/examples/mpi-based-init.cu
/usr/src/nvshmem/13/src/examples/on-stream.cu
/usr/src/nvshmem/13/src/examples/put-block.cu
/usr/src/nvshmem/13/src/examples/ring-bcast.cu
/usr/src/nvshmem/13/src/examples/ring-reduce.cu
/usr/src/nvshmem/13/src/examples/shmem-based-init.cu
/usr/src/nvshmem/13/src/examples/thread-group.cu
/usr/src/nvshmem/13/src/examples/uid-based-init.cu
/usr/src/nvshmem/13/src/examples/user-buffer.cu
/usr/src/nvshmem/13/src/perftest
/usr/src/nvshmem/13/src/perftest/CMakeLists.txt
/usr/src/nvshmem/13/src/perftest/README.md
/usr/src/nvshmem/13/src/perftest/common
/usr/src/nvshmem/13/src/perftest/common/CMakeLists.txt
/usr/src/nvshmem/13/src/perftest/common/atomic_bw_common.h
/usr/src/nvshmem/13/src/perftest/common/atomic_one_sided_common.h
/usr/src/nvshmem/13/src/perftest/common/atomic_ping_pong_common.h
/usr/src/nvshmem/13/src/perftest/common/utils.cu
/usr/src/nvshmem/13/src/perftest/common/utils.h
/usr/src/nvshmem/13/src/perftest/device
/usr/src/nvshmem/13/src/perftest/device/CMakeLists.txt
/usr/src/nvshmem/13/src/perftest/device/coll
/usr/src/nvshmem/13/src/perftest/device/coll/CMakeLists.txt
/usr/src/nvshmem/13/src/perftest/device/coll/alltoall_latency.args
/usr/src/nvshmem/13/src/perftest/device/coll/alltoall_latency.cu
/usr/src/nvshmem/13/src/perftest/device/coll/barrier_latency.args
/usr/src/nvshmem/13/src/perftest/device/coll/barrier_latency.cu
/usr/src/nvshmem/13/src/perftest/device/coll/bcast_latency.args
/usr/src/nvshmem/13/src/perftest/device/coll/bcast_latency.cu
/usr/src/nvshmem/13/src/perftest/device/coll/coll_test.h
/usr/src/nvshmem/13/src/perftest/device/coll/fcollect_latency.args
/usr/src/nvshmem/13/src/perftest/device/coll/fcollect_latency.cu
/usr/src/nvshmem/13/src/perftest/device/coll/redmaxloc_latency.cu
/usr/src/nvshmem/13/src/perftest/device/coll/reducescatter_latency.args
/usr/src/nvshmem/13/src/perftest/device/coll/reducescatter_latency.cu
/usr/src/nvshmem/13/src/perftest/device/coll/reduction_latency.args
/usr/src/nvshmem/13/src/perftest/device/coll/reduction_latency.cu
/usr/src/nvshmem/13/src/perftest/device/coll/sync_latency.args
/usr/src/nvshmem/13/src/perftest/device/coll/sync_latency.cu
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/CMakeLists.txt
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_atomic_bw.args
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_atomic_bw.cu
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_atomic_latency.args
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_atomic_latency.cu
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_atomic_ping_pong_latency.args
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_atomic_ping_pong_latency.cu
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_g_bw.args
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_g_bw.cu
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_g_latency.args
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_g_latency.cu
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_get_bw.args
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_get_bw.cu
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_get_latency.args
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_get_latency.cu
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_p_bw.args
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_p_bw.cu
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_p_latency.args
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_p_latency.cu
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_p_ping_pong_latency.args
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_p_ping_pong_latency.cu
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_put_atomic_ping_pong_latency.cu
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_put_bw.args
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_put_bw.cu
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_put_latency.args
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_put_latency.cu
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_put_ping_pong_latency.args
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_put_ping_pong_latency.cu
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_put_signal_ping_pong_latency.args
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_put_signal_ping_pong_latency.cu
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_signal_ping_pong_latency.args
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_signal_ping_pong_latency.cu
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_st_bw.args
/usr/src/nvshmem/13/src/perftest/device/pt-to-pt/shmem_st_bw.cu
/usr/src/nvshmem/13/src/perftest/device/tile
/usr/src/nvshmem/13/src/perftest/device/tile/CMakeLists.txt
/usr/src/nvshmem/13/src/perftest/device/tile/tile_allgather_latency.cu
/usr/src/nvshmem/13/src/perftest/device/tile/tile_allreduce_latency.cu
/usr/src/nvshmem/13/src/perftest/device/tile/tile_coll_test.h
/usr/src/nvshmem/13/src/perftest/host
/usr/src/nvshmem/13/src/perftest/host/CMakeLists.txt
/usr/src/nvshmem/13/src/perftest/host/coll
/usr/src/nvshmem/13/src/perftest/host/coll/CMakeLists.txt
/usr/src/nvshmem/13/src/perftest/host/coll/alltoall_on_stream.args
/usr/src/nvshmem/13/src/perftest/host/coll/alltoall_on_stream.cpp
/usr/src/nvshmem/13/src/perftest/host/coll/barrier_all_on_stream.cpp
/usr/src/nvshmem/13/src/perftest/host/coll/barrier_on_stream.args
/usr/src/nvshmem/13/src/perftest/host/coll/barrier_on_stream.cpp
/usr/src/nvshmem/13/src/perftest/host/coll/broadcast_on_stream.args
/usr/src/nvshmem/13/src/perftest/host/coll/broadcast_on_stream.cpp
/usr/src/nvshmem/13/src/perftest/host/coll/coll_test.h
/usr/src/nvshmem/13/src/perftest/host/coll/fcollect_on_stream.args
/usr/src/nvshmem/13/src/perftest/host/coll/fcollect_on_stream.cpp
/usr/src/nvshmem/13/src/perftest/host/coll/reducescatter_on_stream.args
/usr/src/nvshmem/13/src/perftest/host/coll/reducescatter_on_stream.cpp
/usr/src/nvshmem/13/src/perftest/host/coll/reduction_on_stream.args
/usr/src/nvshmem/13/src/perftest/host/coll/reduction_on_stream.cpp
/usr/src/nvshmem/13/src/perftest/host/coll/sync_all_on_stream.cpp
/usr/src/nvshmem/13/src/perftest/host/coll/sync_on_stream.args
/usr/src/nvshmem/13/src/perftest/host/coll/sync_on_stream.cpp
/usr/src/nvshmem/13/src/perftest/host/init
/usr/src/nvshmem/13/src/perftest/host/init/CMakeLists.txt
/usr/src/nvshmem/13/src/perftest/host/init/malloc.cpp
/usr/src/nvshmem/13/src/perftest/host/pt-to-pt
/usr/src/nvshmem/13/src/perftest/host/pt-to-pt/CMakeLists.txt
/usr/src/nvshmem/13/src/perftest/host/pt-to-pt/bw.args
/usr/src/nvshmem/13/src/perftest/host/pt-to-pt/bw.cpp
/usr/src/nvshmem/13/src/perftest/host/pt-to-pt/latency.args
/usr/src/nvshmem/13/src/perftest/host/pt-to-pt/latency.cpp
/usr/src/nvshmem/13/src/perftest/host/pt-to-pt/stream_latency.args
/usr/src/nvshmem/13/src/perftest/host/pt-to-pt/stream_latency.cu
/usr/src/nvshmem/13/src/perftest/perftest-mmap-full.list
/usr/src/nvshmem/13/src/perftest/perftest-mmap-sanity.list
/usr/src/nvshmem/13/src/perftest/perftest-p2p-cudagraph.list
/usr/src/nvshmem/13/src/transport-plugins
/usr/src/nvshmem/13/src/transport-plugins/CMakeLists.txt
/usr/src/nvshmem/13/src/transport-plugins/common
/usr/src/nvshmem/13/src/transport-plugins/common/CMakeLists.txt
/usr/src/nvshmem/13/src/transport-plugins/common/env_defs.h
/usr/src/nvshmem/13/src/transport-plugins/common/mlx5_ifc.h
/usr/src/nvshmem/13/src/transport-plugins/common/mlx5_prm.h
/usr/src/nvshmem/13/src/transport-plugins/common/transport_common.cpp
/usr/src/nvshmem/13/src/transport-plugins/common/transport_common.h
/usr/src/nvshmem/13/src/transport-plugins/common/transport_gdr_common.cpp
/usr/src/nvshmem/13/src/transport-plugins/common/transport_gdr_common.h
/usr/src/nvshmem/13/src/transport-plugins/common/transport_ib_common.cpp
/usr/src/nvshmem/13/src/transport-plugins/common/transport_ib_common.h
/usr/src/nvshmem/13/src/transport-plugins/common/transport_mlx5_common.cpp
/usr/src/nvshmem/13/src/transport-plugins/common/transport_mlx5_common.h
/usr/src/nvshmem/13/src/transport-plugins/ibdevx
/usr/src/nvshmem/13/src/transport-plugins/ibdevx/CMakeLists.txt
/usr/src/nvshmem/13/src/transport-plugins/ibdevx/ibdevx.cpp
/usr/src/nvshmem/13/src/transport-plugins/ibdevx/ibdevx.h
/usr/src/nvshmem/13/src/transport-plugins/ibgda
/usr/src/nvshmem/13/src/transport-plugins/ibgda/CMakeLists.txt
/usr/src/nvshmem/13/src/transport-plugins/ibgda/ibgda.cpp
/usr/src/nvshmem/13/src/transport-plugins/ibrc
/usr/src/nvshmem/13/src/transport-plugins/ibrc/CMakeLists.txt
/usr/src/nvshmem/13/src/transport-plugins/ibrc/ibrc.cpp
/usr/src/nvshmem/13/src/transport-plugins/include
/usr/src/nvshmem/13/src/transport-plugins/include/bootstrap_host_transport
/usr/src/nvshmem/13/src/transport-plugins/include/bootstrap_host_transport/env_defs_internal.h
/usr/src/nvshmem/13/src/transport-plugins/include/device_host_transport
/usr/src/nvshmem/13/src/transport-plugins/include/device_host_transport/nvshmem_common_ibgda.h
/usr/src/nvshmem/13/src/transport-plugins/include/device_host_transport/nvshmem_common_transport.h
/usr/src/nvshmem/13/src/transport-plugins/include/device_host_transport/nvshmem_constants.h
/usr/src/nvshmem/13/src/transport-plugins/include/internal
/usr/src/nvshmem/13/src/transport-plugins/include/internal/bootstrap_host_transport
/usr/src/nvshmem/13/src/transport-plugins/include/internal/bootstrap_host_transport/nvshmemi_bootstrap_defines.h
/usr/src/nvshmem/13/src/transport-plugins/include/internal/host_transport
/usr/src/nvshmem/13/src/transport-plugins/include/internal/host_transport/cudawrap.h
/usr/src/nvshmem/13/src/transport-plugins/include/internal/host_transport/nvshmemi_transport_defines.h
/usr/src/nvshmem/13/src/transport-plugins/include/internal/host_transport/transport.h
/usr/src/nvshmem/13/src/transport-plugins/include/non_abi
/usr/src/nvshmem/13/src/transport-plugins/include/non_abi/nvshmem_build_options.h
/usr/src/nvshmem/13/src/transport-plugins/include/non_abi/nvshmem_version.h
/usr/src/nvshmem/13/src/transport-plugins/include/non_abi/nvshmemx_error.h
/usr/src/nvshmem/13/src/transport-plugins/libfabric
/usr/src/nvshmem/13/src/transport-plugins/libfabric/CMakeLists.txt
/usr/src/nvshmem/13/src/transport-plugins/libfabric/libfabric.cpp
/usr/src/nvshmem/13/src/transport-plugins/libfabric/libfabric.h
/usr/src/nvshmem/13/src/transport-plugins/nvshmem_transport.sym
/usr/src/nvshmem/13/src/transport-plugins/ucx
/usr/src/nvshmem/13/src/transport-plugins/ucx/CMakeLists.txt
/usr/src/nvshmem/13/src/transport-plugins/ucx/ucx.cpp
/usr/src/nvshmem/13/src/transport-plugins/ucx/ucx.h
/usr/bin/nvshmem_13/nvshmrun
/usr/lib/x86_64-linux-gnu/nvshmem/13/libnvshmem_host.so
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_bootstrap_mpi.so
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_bootstrap_pmi.so
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_bootstrap_pmi2.so
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_bootstrap_pmix.so
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_bootstrap_shmem.so
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_bootstrap_uid.so
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_transport_ibdevx.so
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_transport_ibgda.so
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_transport_ibrc.so
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_transport_libfabric.so
/usr/lib/x86_64-linux-gnu/nvshmem/13/nvshmem_transport_ucx.so
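Since the apt header directory is versioned by CUDA major (`/usr/include/nvshmem_13` here), one way to confirm where a given package put its headers is to scan `dpkg -L` output for `nvshmem.h`. A sketch with a hypothetical helper name; it only parses an already-captured listing and does not run dpkg itself.

```python
import posixpath
from typing import Iterable, Optional


def include_dir_from_dpkg_listing(lines: Iterable[str], header: str = "nvshmem.h") -> Optional[str]:
    """Return the directory holding `header` in `dpkg -L <package>` output, or None (hypothetical helper)."""
    for line in lines:
        path = line.strip()
        # Exact basename match, so e.g. nvshmem_host.h or nvshmemx.h are not picked up.
        if posixpath.basename(path) == header:
            return posixpath.dirname(path)
    return None
```

On a Debian/Ubuntu machine the lines could come from `subprocess.run(["dpkg", "-L", "libnvshmem3-dev-cuda-13"], capture_output=True, text=True, check=True).stdout.splitlines()`. As the listing above shows, the runtime package `libnvshmem3-cuda-13` ships no headers, so only the `-dev` package is useful for this.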

@isVoid commented Aug 14, 2025

> Oh, /opt? That doesn't match what I see on the machine I used for interactive testing. See below.

This matches my local testing as well. Sorry about the confusion! To support the Python use case, we only need the library to be discoverable when it is installed via nvidia-nvshmem-cu[X], which is a hard dependency of nvshmem4py.
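Given that the pip wheel is the only hard requirement, a wheel-only lookup could simply walk `sys.path`. This is a sketch under the assumption that `nvidia-nvshmem-cu12` / `-cu13` place `nvshmem.h` in `<site-packages>/nvidia/nvshmem/include`; the helper name and the injectable `is_file` parameter (there only to make it testable) are hypothetical.

```python
import os
import sys
from typing import Callable, Iterable, Optional


def find_nvshmem_wheel_include_dir(
    search_paths: Optional[Iterable[str]] = None,
    is_file: Callable[[str], bool] = os.path.isfile,
) -> Optional[str]:
    """Return <entry>/nvidia/nvshmem/include for the first path entry containing nvshmem.h."""
    if search_paths is None:
        search_paths = sys.path
    for entry in search_paths:
        if not entry:  # "" in sys.path means the current directory; skip it here
            continue
        # ASSUMED wheel layout: <site-packages>/nvidia/nvshmem/include/nvshmem.h
        include_dir = os.path.join(entry, "nvidia", "nvshmem", "include")
        if is_file(os.path.join(include_dir, "nvshmem.h")):
            return include_dir
    return None
```

Walking `sys.path` rather than a single site-packages directory means user installs and virtual environments are covered by whatever the running interpreter already sees.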

Labels: cuda.pathfinder (Everything related to the cuda.pathfinder module), feature (New feature or request), P0 (High priority - Must do!)
Projects: Status: Todo

4 participants