
Install Jetson TX2 Max Regcount Error #12641


Open
lesreaper opened this issue Oct 15, 2018 · 5 comments
Labels
module: build, module: jetson, needs reproduction, triaged

Comments

@lesreaper

lesreaper commented Oct 15, 2018

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

  1. Attempt to install from source on a fresh JetPack 3.3 on an NVIDIA Jetson TX2.
  2. Instead of python setup.py install, install with python3 setup.py install (I tried both; same error).

Errors:
There are about 100 nvlink errors; the last few are listed below, followed by the final error log.
nvlink error : entry function '_Z28ncclAllReduceLLKernel_sum_i88ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96
nvlink error : entry function '_Z29ncclAllReduceLLKernel_sum_i328ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96
nvlink error : entry function '_Z29ncclAllReduceLLKernel_sum_f168ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96
nvlink error : entry function '_Z29ncclAllReduceLLKernel_sum_u328ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96
nvlink error : entry function '_Z29ncclAllReduceLLKernel_sum_f328ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96
nvlink error : entry function '_Z29ncclAllReduceLLKernel_sum_u648ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96
nvlink error : entry function '_Z28ncclAllReduceLLKernel_sum_u88ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96
Makefile:83: recipe for target '/home/nvidia/jetson-reinforcement/build/pytorch/third_party/build/nccl/obj/collectives/device/devlink.o' failed
make[5]: *** [/home/nvidia/jetson-reinforcement/build/pytorch/third_party/build/nccl/obj/collectives/device/devlink.o] Error 255
Makefile:45: recipe for target 'devicelib' failed
make[4]: *** [devicelib] Error 2
Makefile:24: recipe for target 'src.build' failed
make[3]: *** [src.build] Error 2
CMakeFiles/nccl.dir/build.make:60: recipe for target 'lib/libnccl.so' failed
make[2]: *** [lib/libnccl.so] Error 2
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/nccl.dir/all' failed
make[1]: *** [CMakeFiles/nccl.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2
Failed to run 'bash ../tools/build_pytorch_libs.sh --use-cuda --use-nnpack nccl caffe2 libshm gloo c10d THD'
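My reading of these messages (an interpretation, not something confirmed in the thread): nvlink, the CUDA device linker, refuses to link because each entry (__global__) kernel was compiled with a cap of 80 registers but calls a device function that was compiled to use 96. The sketch below pulls those pairs out of a build log so the mismatch is easy to see; find_regcount_conflicts and RegcountConflict are hypothetical helpers, and the regex assumes the exact message wording shown above.

```python
import re
from typing import List, NamedTuple

# Pattern for the nvlink messages in the log above (wording assumed to match exactly).
NVLINK_RE = re.compile(
    r"entry function '(?P<entry>[^']+)' with max regcount of (?P<cap>\d+) "
    r"calls function '(?P<callee>[^']+)' with regcount of (?P<need>\d+)"
)


class RegcountConflict(NamedTuple):
    entry: str   # mangled name of the entry (__global__) kernel
    cap: int     # register cap that kernel was compiled with
    callee: str  # mangled name of the device function it calls
    need: int    # register count the callee was compiled with


def find_regcount_conflicts(log_text: str) -> List[RegcountConflict]:
    """Extract every nvlink entry/callee regcount mismatch from a build log."""
    return [
        RegcountConflict(m["entry"], int(m["cap"]), m["callee"], int(m["need"]))
        for m in NVLINK_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "nvlink error : entry function '_Z29ncclAllReduceLLKernel_sum_f328ncclColl' "
        "with max regcount of 80 calls function "
        "'_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96"
    )
    for c in find_regcount_conflicts(sample):
        print(f"{c.callee} uses {c.need} registers, {c.need - c.cap} over the cap of {c.cap}")
```

Every line in the excerpt reports the same callee needing 96 registers against a cap of 80, which is consistent with the -maxrregcount=80 workaround suggested later in the thread.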

Expected behavior

The install should succeed, so that I can open a Python 3 console and successfully run: import torch
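A quick post-build check of that expected behavior could look like the sketch below. check_torch_install is a hypothetical helper; the attributes it reads (torch.__version__, torch.version.cuda, torch.cuda.is_available()) are standard PyTorch attributes, and treating "CUDA is visible" as part of a working Jetson install is an assumption.

```python
import importlib


def check_torch_install() -> dict:
    """Report whether torch imports and, if so, its version and CUDA visibility."""
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError) as exc:
        # ImportError: torch missing; OSError: a shared library failed to load.
        return {"importable": False, "error": str(exc)}
    return {
        "importable": True,
        "version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(check_torch_install())
```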

Environment

The environment collection script does not run.

  • PyTorch Version (e.g., 1.0): Latest master
  • OS (e.g., Linux): Ubuntu on NVIDIA Jetson TX2 (aarch64)
  • How you installed PyTorch (conda, pip, source): source
  • Build command you used (if compiling from source): python3 setup.py install
  • Python version: 3.5.3
  • CUDA/cuDNN version: 9.0, 7.0
  • GPU models and configuration:
  • Any other relevant information:

There is no Conda build for aarch64, so I have to use the standard Python libraries.

cc @malfet @seemethere @walterddr

@jyangchisyan

jyangchisyan commented Oct 17, 2018

Hi @lesreaper
I think this is a third_party/nccl issue.
As a workaround, you could modify line 48 of third_party/nccl/nccl/makefiles/common.mk to add the maxrregcount option:

NVCUFLAGS  := -ccbin $(CXX) $(NVCC_GENCODE) -lineinfo -std=c++11 -Xptxas -maxrregcount=80 -Xfatbin -compress-all

For reference, see the related issue.
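The edit can also be scripted instead of made by hand. This is a sketch under two assumptions: the original NVCUFLAGS line is the one shown above without -Xptxas -maxrregcount=80, and the path is relative to the PyTorch source root. add_maxrregcount is a hypothetical helper, not part of PyTorch or NCCL; it searches for the NVCUFLAGS := assignment rather than trusting line 48, since line numbers can shift between NCCL revisions.

```python
from pathlib import Path

# Path taken from the comment above; assumed relative to the PyTorch source root.
COMMON_MK = Path("third_party/nccl/nccl/makefiles/common.mk")


def add_maxrregcount(text: str, flag: str = "-maxrregcount=80") -> str:
    """Add '-Xptxas <flag>' to the first 'NVCUFLAGS :=' line; no-op if already present."""
    lines = text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.lstrip().startswith("NVCUFLAGS") and ":=" in line:
            if "maxrregcount" in line:
                return text  # already patched
            body = line.rstrip("\r\n")
            newline = line[len(body):]
            addition = f" -Xptxas {flag}"
            # Place it before -Xfatbin to mirror the suggested line; otherwise append.
            if " -Xfatbin" in body:
                body = body.replace(" -Xfatbin", addition + " -Xfatbin", 1)
            else:
                body += addition
            lines[i] = body + newline
            return "".join(lines)
    raise ValueError("no 'NVCUFLAGS :=' line found")
```

To apply it, something like COMMON_MK.write_text(add_maxrregcount(COMMON_MK.read_text())) from the source root should work; the helper is idempotent, so running it twice leaves the file unchanged.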

@SiyuanMa0316
Thank you @jyangchisyan. I'm facing the same issue. However, this workaround doesn't work for me.

@jyangchisyan

jyangchisyan commented Oct 23, 2018

@SiyuanMa0316
Could you paste the error log? You may also need to remove CMakeCache.txt.

CMake has already translated CMakeLists.txt into Makefiles, so the build may still be using the old options file.
I haven't traced which line performs that step.
The crude but reliable approach is to clean the project, apply the modification, and run the build again.

If you are interested in CMake, here is a short reference.
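Part of that clean-up can be scripted. This sketch assumes the stale configuration lives in CMakeCache.txt files somewhere under the PyTorch build tree; remove_cmake_caches is a hypothetical helper, and which directory to pass as build_root depends on your checkout.

```python
from pathlib import Path
from typing import List


def remove_cmake_caches(build_root: Path) -> List[Path]:
    """Delete every CMakeCache.txt under build_root so CMake reconfigures on the next build."""
    removed = []
    for cache in sorted(build_root.rglob("CMakeCache.txt")):
        cache.unlink()
        removed.append(cache)
    return removed
```

Deleting the caches only forces CMake to reconfigure; it does not remove object files NCCL has already built, so a full clean of the project, as suggested above, remains the safer option.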

zou3519 added the module: build label on Oct 23, 2018
@lesreaper
Author

I got it installed. I had to use a specific install procedure, as described on the NVIDIA forums here:
https://devtalk.nvidia.com/default/topic/1042821/?comment=5291480

@t-vi
Collaborator

t-vi commented Apr 14, 2021

I wonder whether this bug is now obsolete. PyTorch builds on JetPack just fine for me and apparently has for quite a while.
@lesreaper Do you think the issue still exists, or can it be closed?

mruberry added the module: jetson, triaged, and needs reproduction labels on Apr 15, 2021

6 participants