Member

@elezar elezar commented Jun 18, 2025

This change switches to the nvcr.io/nvidia/distroless/go:v3.1.9-dev distroless Go image for both the application image and the packaging image.

@elezar elezar requested review from cdesiniotis and tariq1890 June 18, 2025 10:07
@elezar elezar self-assigned this Jun 18, 2025

coveralls commented Jun 18, 2025

Pull Request Test Coverage Report for Build 15753213707

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 33.644%

Totals Coverage Status
Change from base Build 15744214901: 0.0%
Covered Lines: 4366
Relevant Lines: 12977

💛 - Coveralls

@elezar elezar force-pushed the switch-to-distroless branch from ea3b2ed to 67e0b1c Compare June 18, 2025 10:17
@elezar elezar changed the title from "Switch to distroless" to "Switch to distroless Base image" Jun 18, 2025
@elezar elezar force-pushed the switch-to-distroless branch 2 times, most recently from 27605cd to bccb7d8 Compare June 18, 2025 12:15
@elezar elezar force-pushed the switch-to-distroless branch 3 times, most recently from 6998d23 to 9b15d1d Compare June 18, 2025 13:12
@elezar elezar marked this pull request as ready for review June 18, 2025 13:37
@elezar elezar added this to the v1.18.0 milestone Jun 18, 2025
FROM nvcr.io/nvidia/cuda:12.9.0-base-ubi9
# The application stage contains the application used as a GPU Operator
# operand.
FROM nvcr.io/nvidia/distroless/go:v3.1.9-dev AS application
Contributor

Note, the shell included in the -dev tags is located at /busybox/sh. I would recommend creating a symlink at /bin/sh so that in the operator we can use #! /bin/sh as the shebang for the entrypoint script. By using /bin/sh we remain backwards compatible with older toolkit images that are not built on distroless. We have tested this with other operands, e.g. https://github.com/NVIDIA/k8s-kata-manager/blob/f58e4dad0695043a545b17e3e159e24828816a62/deployments/container/Dockerfile#L50-L51

Suggested change
- FROM nvcr.io/nvidia/distroless/go:v3.1.9-dev AS application
+ FROM nvcr.io/nvidia/distroless/go:v3.1.9-dev AS application
+ SHELL ["/busybox/sh", "-c"]
+ RUN ln -s /busybox/sh /bin/sh

Contributor

@cdesiniotis cdesiniotis Jun 18, 2025

Also, should we explicitly set USER 0:0 in the Dockerfile as the default user in distroless is uid 1000? I assume the toolkit requires running as root (for restarting containerd).

Member Author

Thanks for the shell tip. Will update.

I'm not sure on the user preference. Does the GPU Operator not set the user in general? Would using the current user (USER 1000:1000) not be more "compliant"?

Member Author

I've updated the user to 0:0 below.

Contributor

> I'm not sure on the user preference. Does the GPU Operator not set the user in general? Would using the current user (USER 1000:1000) not be more "compliant"?

The GPU Operator does not explicitly set the runAsUser / runAsGroup fields when deploying DaemonSets, so we currently depend on the user / group defined in the image itself.
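
Since the pods end up running as whatever user the image defines, an entrypoint script could check the effective uid early and fail with a clear message. A minimal sketch, assuming root is required as suggested earlier in this thread; `require_root` is a hypothetical helper and is not part of the toolkit image or the GPU Operator:

```shell
#!/bin/sh
# Hypothetical guard for an entrypoint script (illustration only).
# Succeeds only when the given uid is 0; defaults to the current user's uid.
require_root() {
    uid="${1:-$(id -u)}"
    if [ "$uid" -ne 0 ]; then
        echo "expected uid 0, running as uid $uid" >&2
        return 1
    fi
    return 0
}
```

An entrypoint might call `require_root || exit 1` before doing any work, so that an image built with the distroless default user fails fast instead of failing later on a privileged operation.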

Contributor

I have performed a quick sanity check of the toolkit image built in this PR. Looks good.

Besides having to change the entrypoint script to be a POSIX shell script (and not bash), I also had to change https://github.com/NVIDIA/gpu-operator/blob/6324d2aca562edf46d93cbf9d2a0837ab5c12e59/assets/state-container-toolkit/0400_configmap.yaml#L34 from

exec nvidia-toolkit

to

exec nvidia-ctk-installer

I see the name of the executable has changed. This is a breaking change that will need to be made when we bump the version of the toolkit to 1.18.0 in the GPU Operator.
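
One way an operator-side entrypoint could tolerate both names during the transition is to probe for the new executable first and fall back to the old one. This is only a sketch under the assumption that the executables are discoverable on `PATH`; `pick_installer` is a hypothetical helper and is not taken from the GPU Operator's configmap:

```shell
#!/bin/sh
# Hypothetical helper for an entrypoint script (illustration only).
# Prints the first installer name found on PATH, preferring the new name.
pick_installer() {
    for name in nvidia-ctk-installer nvidia-toolkit; do
        if command -v "$name" >/dev/null 2>&1; then
            echo "$name"
            return 0
        fi
    done
    echo "no toolkit installer found on PATH" >&2
    return 1
}

# Possible usage in an entrypoint (commented out so the sketch has no side effects):
# installer="$(pick_installer)" || exit 1
# exec "$installer" "$@"
```

The sketch sticks to POSIX `sh` constructs (`command -v`, a plain `for` loop), so it does not depend on bash being present in the image.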

Contributor

I have raised NVIDIA/gpu-operator#1496 which updates our entrypoint scripts in the operator to use sh instead of bash.

Member Author

I think we can include an nvidia-toolkit symlink so that we maintain backward compatibility.

Also on:

> I've updated the user to 0:0 below.

I had to set the user before we create the /bin/sh symlink since the default user doesn't have permissions to write to /bin.
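
The ordering constraint can be illustrated outside a Dockerfile: creating a link only succeeds when the current user can write to the link's parent directory. A rough sketch, where `link_shell` is a hypothetical helper and the paths in the comments are just the ones discussed in this thread:

```shell
#!/bin/sh
# Hypothetical helper (illustration only): refuse to create the link when the
# parent directory is not writable by the current user.
link_shell() {
    target="$1"   # e.g. /busybox/sh
    link="$2"     # e.g. /bin/sh
    dir="$(dirname "$link")"
    if [ ! -w "$dir" ]; then
        echo "cannot write to $dir as uid $(id -u)" >&2
        return 1
    fi
    ln -s "$target" "$link"
}
```

Run as the distroless default user, `link_shell /busybox/sh /bin/sh` would be expected to report the permission problem rather than create the link, which is why the user is switched first.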

Member Author

I have updated the image to include a /work/nvidia-toolkit -> /work/nvidia-ctk-installer symlink. This should allow compatibility with the GPU Operator.
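
The effect of such a compatibility link can be tried in a scratch directory. In this sketch a stub script stands in for the real `nvidia-ctk-installer` binary, and `make_compat_link` is a hypothetical helper; only the two file names come from this thread:

```shell
#!/bin/sh
# Sketch only: a stub stands in for the real nvidia-ctk-installer binary.
# make_compat_link creates <dir>/nvidia-toolkit -> <dir>/nvidia-ctk-installer.
make_compat_link() {
    dir="$1"
    ln -s "$dir/nvidia-ctk-installer" "$dir/nvidia-toolkit"
}

demo_dir="$(mktemp -d)"
printf '#!/bin/sh\necho "installer stub"\n' > "$demo_dir/nvidia-ctk-installer"
chmod +x "$demo_dir/nvidia-ctk-installer"
make_compat_link "$demo_dir"

# Both names now run the same stub.
"$demo_dir/nvidia-ctk-installer"
"$demo_dir/nvidia-toolkit"
```

A caller that still invokes the old name therefore keeps working without any change on its side.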

Contributor

Thanks!

@elezar elezar force-pushed the switch-to-distroless branch from fb7573b to e8abb58 Compare June 18, 2025 21:31
@elezar elezar force-pushed the switch-to-distroless branch from e8abb58 to b94721c Compare June 18, 2025 21:39
elezar added 3 commits June 19, 2025 10:20
This change removes the NGC-DL-CONTAINER-LICENSE (since this
is not available in the distroless images) and includes the
repo's Apache LICENSE file in the image.

Signed-off-by: Evan Lezar <[email protected]>
This change ensures that a symlink from /work/nvidia-toolkit to
/work/nvidia-ctk-installer exists to support GPU Operator versions
that override the entrypoint and assume nvidia-toolkit as the
original entrypoint.

Signed-off-by: Evan Lezar <[email protected]>
@elezar elezar force-pushed the switch-to-distroless branch from b94721c to 6070681 Compare June 19, 2025 08:25
@elezar elezar merged commit 5bc2f50 into NVIDIA:main Jun 24, 2025
16 checks passed
@elezar elezar deleted the switch-to-distroless branch June 24, 2025 09:05