Always try to preload pypi cuda deps. #133963

Closed
wants to merge 6 commits into from

@oraluben oraluben commented Aug 20, 2024

Fixes #101314
Fixes #121207

Could have detected #138324 before release.

Fixes the import error on Red Hat-like systems that have a system-wide CUDA runtime.

Red Hat-like distributions have separate platlib and purelib directories, which breaks the rpath search. If a system-wide CUDA runtime is present, loading libtorch_global_deps.so does not fail, because it depends only on libcudart.so and libnvToolsExt.so, so the PyPI preload path is never triggered. The import then fails later, most likely because of a version mismatch in libcudnn and libnccl.
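To check whether a given interpreter splits purelib and platlib, one can compare the two paths reported by `sysconfig`. This is a minimal diagnostic sketch only; the helper name `has_split_site_packages` is hypothetical and is not part of this PR.

```python
import sysconfig


def has_split_site_packages() -> bool:
    """Return True if purelib and platlib resolve to different directories."""
    paths = sysconfig.get_paths()
    return paths["purelib"] != paths["platlib"]


if __name__ == "__main__":
    paths = sysconfig.get_paths()
    print("purelib:", paths["purelib"])
    print("platlib:", paths["platlib"])
    print("split:", has_split_site_packages())
```

When the two directories differ, a library installed into one of them cannot rely on a relative rpath to reach packages installed into the other, which is the situation described above.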

This PR always tries to preload the PyPI NVIDIA libraries if any site-packages/nvidia directory is present, which fixes the issue.

Risk: an unexpected site-packages/nvidia folder (e.g. from partially installed PyPI NVIDIA packages) will lead to an import failure. This PR also changes the previous behavior, in which every path in sys.path was walked in search of /nvidia/<libs>. Walking every path seems unnecessary to me, though I do see some references to it in old issues, e.g. #92096. Now only site-packages/nvidia is searched. (Debian might need to take care of the dist-packages dir.)
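For readers unfamiliar with the older approach, here is a rough sketch of the kind of `sys.path` walk described above. It assumes libraries live under `<path>/nvidia/<lib_folder>/lib/`; the function name `find_lib_on_sys_path` and the `search_paths` parameter are hypothetical, and this is not the code from PyTorch or from this PR.

```python
import glob
import os
import sys
from typing import Iterable, Optional


def find_lib_on_sys_path(
    lib_folder: str,
    lib_glob: str,
    search_paths: Optional[Iterable[str]] = None,
) -> Optional[str]:
    """Return the first match of <path>/nvidia/<lib_folder>/lib/<lib_glob>, or None.

    The directory layout is an assumption made for this sketch.
    """
    for path in (sys.path if search_paths is None else search_paths):
        pattern = os.path.join(path, "nvidia", lib_folder, "lib", lib_glob)
        matches = sorted(glob.glob(pattern))
        if matches:
            return matches[0]
    return None
```

A walk like this picks up any `nvidia` directory on any entry of `sys.path`, which is why a stray or partially installed folder can affect the result.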

We use Python's import mechanism to detect and locate the PyPI packages. This handles various cases, including purelib/platlib and the different install locations used by easy install.
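The idea of locating the packages through the import system can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's actual diff: the helper names `find_pypi_libs` and `preload_pypi_libs`, the `package` parameter, and the `<package>/<component>/lib/` layout are all hypothetical choices for this example.

```python
import ctypes
import glob
import importlib.util
import os
from typing import List


def find_pypi_libs(lib_glob: str, package: str = "nvidia") -> List[str]:
    """List shared libraries under every location the import system
    reports for `package` (assumed layout: <package>/<component>/lib/)."""
    try:
        spec = importlib.util.find_spec(package)
    except (ImportError, ValueError):
        return []
    if spec is None or spec.submodule_search_locations is None:
        return []
    found: List[str] = []
    # A namespace package may span several directories (e.g. purelib and platlib).
    for location in spec.submodule_search_locations:
        pattern = os.path.join(location, "*", "lib", lib_glob)
        found.extend(sorted(glob.glob(pattern)))
    return found


def preload_pypi_libs(lib_glob: str, package: str = "nvidia") -> None:
    """Load every matching library; ctypes.CDLL raises OSError on failure."""
    for path in find_pypi_libs(lib_glob, package):
        ctypes.CDLL(path)
```

Because `find_spec` consults the same finders as a normal `import`, it sees the package wherever the interpreter would, without hard-coding a `site-packages` path.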

@malfet @atalman

pytorch-bot bot commented Aug 20, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/133963

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit c4188d5 with merge base 24b695a (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@oraluben oraluben force-pushed the preload-pypi-if-possible branch from 409bdf6 to 02f2270 Compare August 20, 2024 04:24
@soulitzer soulitzer requested a review from malfet August 21, 2024 00:02
@soulitzer soulitzer added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Aug 21, 2024

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Oct 20, 2024
@oraluben
Contributor Author

ping @malfet

@oraluben
Contributor Author

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Oct 22, 2024
@oraluben oraluben force-pushed the preload-pypi-if-possible branch from 6112c47 to daaadd0 Compare October 31, 2024 09:03
Labels
open source Stale topic: not user facing topic category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
3 participants