Add CUDA forward compatibility hook #948
This change adds an nvidia-cdi-hook enable-cuda-compat hook that checks the container for cuda compat libs and updates /etc/ld.so.conf.d to include their parent folder if their driver major version is sufficient. This allows CUDA Forward Compatibility to be used when this is not available through the libnvidia-container. Signed-off-by: Evan Lezar <[email protected]>
This change adds the enable-cuda-compat hook to the incoming OCI runtime spec if the allow-cuda-compat-libs-from-container feature flag is not enabled. An update-ldcache hook is also injected to ensure that the required folders are processed. Signed-off-by: Evan Lezar <[email protected]>
default:
return []string{"mode", "graphics", "feature-gated"}
return []string{"feature-gated", "graphics", "mode"}
@elezar Hi, I have a question here. With this modifier order, the CreateContainer hooks are created in the order ["enable-cuda-compat", "update-ldcache", "create-symlinks"]. Since "update-ldcache" runs before "create-symlinks", will the shared libraries (.so files) that the "create-symlinks" hook sets up in the container not be added to the ldcache?
With #877 the default behaviour of the NVIDIA Container Runtime / NVIDIA Container Runtime Hook was changed to not mount compat libraries from the container into the container. This removed "automatic" support for CUDA Forward compatibility.
This change attempts to address this by adding a `createContainerHook` that will create a file in `/etc/ld.so.conf.d/` in the container to ensure that the `/usr/local/cuda/compat` libraries are added to the ldcache over the libraries mounted from the host. The provided host driver version is compared to the version of the compat libraries in the container, and the config update is only performed if the compat libraries are newer than the host drivers.

Note that the hook only creates a file in the container's file system and does not perform any mount operations. This means that this mechanism does not present the same vulnerabilities that caused CVE-2024-0132 and CVE-2025-23359.
In the case of the legacy runtime, this behaviour is only triggered if the `allow-cuda-compat-libs-from-container` feature flag is not enabled. The CDI spec generation has also been extended to include this hook.

This backports #906