make TORCH_(CUDABLAS|CUSOLVER)_CHECK usable in custom extensions #67161

Closed
wants to merge 5 commits into from


crcrpar
Collaborator

@crcrpar crcrpar commented Oct 24, 2021

Make TORCH_CUDABLAS_CHECK and TORCH_CUSOLVER_CHECK available in custom extensions by exporting the internal functions that both macros call.

Rel: #67073

cc @xwang233 @ptrblck
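
A minimal sketch of why exporting the helper matters, assuming a simplified stand-in for the real macros: a check macro is expanded inside the extension's own translation unit, so any function it calls must be a visible symbol of the library the extension links against. Every name here (`DEMO_API`, `DemoStatus`, `demoGetErrorString`, `DEMO_CHECK`, `runCheck`) is hypothetical and is not PyTorch's actual code.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical visibility macro, playing the role that an export macro
// such as C10_EXPORT plays in the PR title. Not PyTorch's definition.
#if defined(_WIN32)
#define DEMO_API __declspec(dllexport)
#else
#define DEMO_API __attribute__((visibility("default")))
#endif

// Hypothetical status type standing in for cublasStatus_t / cusolverStatus_t.
enum DemoStatus { DEMO_SUCCESS = 0, DEMO_NOT_INITIALIZED = 1 };

// Helper that the macro below expands to a call of. If this symbol were
// hidden, an extension using DEMO_CHECK would fail to load with an
// "undefined symbol" error even though the macro itself is in a header.
DEMO_API const char* demoGetErrorString(DemoStatus s) {
  switch (s) {
    case DEMO_SUCCESS:
      return "DEMO_SUCCESS";
    case DEMO_NOT_INITIALIZED:
      return "DEMO_NOT_INITIALIZED";
  }
  return "<unknown>";
}

// Hypothetical check macro: evaluates EXPR once and throws on failure.
#define DEMO_CHECK(EXPR)                                            \
  do {                                                              \
    DemoStatus demo_status_ = (EXPR);                               \
    if (demo_status_ != DEMO_SUCCESS) {                             \
      throw std::runtime_error(                                     \
          std::string("demo error: ") +                             \
          demoGetErrorString(demo_status_) +                        \
          " when calling `" #EXPR "`");                             \
    }                                                               \
  } while (0)

// Small wrapper so the macro's behavior can be observed as a string.
inline std::string runCheck(DemoStatus s) {
  try {
    DEMO_CHECK(s);
    return "ok";
  } catch (const std::runtime_error& e) {
    return e.what();
  }
}
```

Calling `runCheck(DEMO_SUCCESS)` returns `"ok"`, while a failing status produces a message that includes the string returned by the exported helper.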

@pytorch-probot

pytorch-probot bot commented Oct 24, 2021

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/crcrpar/pytorch/blob/c28da230e602bcffc893f081f5a2d14e5c2f260c/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Triggered Workflows

| Workflows | Labels (bold enabled) | Status |
| --- | --- | --- |
| linux-bionic-py3.6-clang9 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/xla | ✅ triggered |
| linux-vulkan-bionic-py3.6-clang9 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/vulkan | ✅ triggered |
| linux-xenial-cuda11.3-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux | ✅ triggered |
| linux-xenial-py3-clang5-mobile-build | ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile | ✅ triggered |
| linux-xenial-py3-clang5-mobile-custom-build-dynamic | ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile | ✅ triggered |
| linux-xenial-py3-clang5-mobile-custom-build-static | ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile | ✅ triggered |
| linux-xenial-py3.6-clang7-asan | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers | ✅ triggered |
| linux-xenial-py3.6-clang7-onnx | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx | ✅ triggered |
| linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux | ✅ triggered |
| linux-xenial-py3.6-gcc7-bazel-test | ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux | ✅ triggered |
| win-vs2019-cpu-py3 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/win | ✅ triggered |
| win-vs2019-cuda11.3-py3 | ciflow/all, ciflow/cuda, ciflow/default, ciflow/win | ✅ triggered |

Skipped Workflows

| Workflows | Labels (bold enabled) | Status |
| --- | --- | --- |
| caffe2-linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux | 🚫 skipped |
| libtorch-linux-xenial-cuda10.2-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux | 🚫 skipped |
| libtorch-linux-xenial-cuda11.3-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux | 🚫 skipped |
| linux-bionic-cuda10.2-py3.9-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow | 🚫 skipped |
| linux-xenial-cuda10.2-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow | 🚫 skipped |
| linux-xenial-py3-clang5-mobile-code-analysis | ciflow/all, ciflow/linux, ciflow/mobile | 🚫 skipped |
| parallelnative-linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux | 🚫 skipped |
| periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck | 🚫 skipped |
| periodic-linux-xenial-cuda11.1-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-win-vs2019-cuda11.1-py3 | ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win | 🚫 skipped |

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

@facebook-github-bot
Contributor

facebook-github-bot commented Oct 24, 2021

💊 CI failures summary and remediations

As of commit c28da23 (more details on the Dr. CI page):


  • 2/2 failures introduced in this PR

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (1/2)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/verbatim-sources/job-specs/job-specs-custom.yml
Auto-merging .circleci/verbatim-sources/job-specs/job-specs-custom.yml
CONFLICT (add/add): Merge conflict in .circleci/docker/common/install_rocm.sh
Auto-merging .circleci/docker/common/install_rocm.sh
CONFLICT (add/add): Merge conflict in .circleci/docker/common/install_base.sh
Auto-merging .circleci/docker/common/install_base.sh
CONFLICT (add/add): Merge conflict in .circleci/config.yml
Auto-merging .circleci/config.yml
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/pytorch_build_data.py
Auto-merging .circleci/cimodel/data/pytorch_build_data.py
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (2/2)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/verbatim-sources/job-specs/job-specs-custom.yml
Auto-merging .circleci/verbatim-sources/job-specs/job-specs-custom.yml
CONFLICT (add/add): Merge conflict in .circleci/docker/common/install_rocm.sh
Auto-merging .circleci/docker/common/install_rocm.sh
CONFLICT (add/add): Merge conflict in .circleci/docker/common/install_base.sh
Auto-merging .circleci/docker/common/install_base.sh
CONFLICT (add/add): Merge conflict in .circleci/config.yml
Auto-merging .circleci/config.yml
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/pytorch_build_data.py
Auto-merging .circleci/cimodel/data/pytorch_build_data.py
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1


This comment was automatically generated by Dr. CI (expand for details).


@crcrpar crcrpar marked this pull request as ready for review October 25, 2021 01:10
@xwang233
Collaborator

Thanks @crcrpar for the fix. Can you also check if torch_cusolver_check and torch_cusparse_check have the same issue?

@ngimel
Collaborator

ngimel commented Oct 26, 2021

@crcrpar can you please add the cusolver fix here too? If cusparse is problematic, let's postpone it.

To verify that `TORCH_CUSOLVER_CHECK` is NOT available in a custom extension.

```
======================================================================
ERROR: test_cusolver_extension (__main__.TestCppExtensionAOT)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_cpp_extensions_aot_ninja.py", line 95, in test_cusolver_extension
    from torch_test_cpp_extension import cusolver_extension
ImportError: /home/mkozuki/ghq/github.com/crcrpar/torch-0/test/cpp_extensions/install/home/mkozuki/anaconda3/envs/torch-0/lib/python3.8/site-packages/torch_test_cpp_extension/cusolver_extension.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN2at4cuda6solver23cusolverGetErrorMessageE16cusolverStatus_t

```
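
The undefined symbol in the log above is an Itanium-ABI mangled C++ name. As a hedged illustration of how to read it, the sketch below demangles a name with `abi::__cxa_demangle`, which GCC and Clang provide in `<cxxabi.h>`. The `demangle` wrapper is a hypothetical helper written for this example and is not part of PyTorch.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>    // std::free
#include <stdexcept>
#include <string>

// Hypothetical helper: turn a mangled symbol into a readable signature.
std::string demangle(const char* mangled) {
  int status = 0;
  // With a null output buffer, __cxa_demangle allocates the result with
  // malloc, so the caller is responsible for freeing it.
  char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (status != 0 || out == nullptr) {
    throw std::runtime_error("not a valid Itanium-mangled name");
  }
  std::string result(out);
  std::free(out);
  return result;
}
```

Passing the symbol from the log to `demangle` yields `at::cuda::solver::cusolverGetErrorMessage(cusolverStatus_t)`, the function that the extension could not resolve at import time.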
@crcrpar crcrpar changed the title from "export at::cuda::blas::_cublasGetErrorEnum with C10_EXPORT" to "make TORCH_(CUDABLAS|CUSOLVER)_CHECK usable in custom extensions" Oct 27, 2021
@facebook-github-bot
Contributor

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ngimel
Collaborator

ngimel commented Oct 28, 2021

ROCm build error is real

@facebook-github-bot
Contributor

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@ngimel merged this pull request in d4493b2.

@crcrpar crcrpar deleted the cublas_macro_in_ext branch October 30, 2021 00:49
@mruberry
Collaborator

Reverting as this appears to have broken Windows CUDA jobs. See https://github.com/pytorch/pytorch/runs/4052843481?check_suite_focus=true and https://github.com/pytorch/pytorch/runs/4052717247?check_suite_focus=true for examples.

@facebook-github-bot
Contributor

This pull request has been reverted by aa16de5. To re-land this change, follow these steps.

@facebook-github-bot
Contributor

This pull request has been reverted by 5aca6b496d1f5e6402877e852e96ca7ae6d9bf8b. To re-land this change, follow these steps.

@ngimel
Collaborator

ngimel commented Nov 2, 2021

Hey @crcrpar, can you please resubmit this PR with the Windows test skipped? If you can figure out what's failing on Windows, that'd be great, but if not, we can at least fix Linux extensions (Windows is not getting worse, right? It couldn't have extensions with TORCH_CUDABLAS_CHECK, and it still can't?)

crcrpar added a commit to crcrpar/pytorch that referenced this pull request Nov 3, 2021
…ytorch#67161)

Summary:
Make `TORCH_CUDABLAS_CHECK` and `TORCH_CUSOLVER_CHECK` available in custom extensions by exporting the internal functions that both macros call.

Rel: pytorch#67073

cc xwang233 ptrblck

Pull Request resolved: pytorch#67161

Reviewed By: jbschlosser

Differential Revision: D31984694

Pulled By: ngimel

fbshipit-source-id: 0035ecd1398078cf7d3abc23aaefda57aaa31106
@crcrpar crcrpar mentioned this pull request Nov 3, 2021
@crcrpar
Collaborator Author

crcrpar commented Nov 3, 2021

Sure, I was a bit busy.

facebook-github-bot pushed a commit that referenced this pull request Nov 4, 2021
Summary:
Skip building the extensions on Windows, following #67161 (comment)

Related issue: #67073

cc ngimel xwang233 ptrblck

Pull Request resolved: #67735

Reviewed By: bdhirsh

Differential Revision: D32141250

Pulled By: ngimel

fbshipit-source-id: 9bfdb7cf694c99f6fc8cbe9033a12429b6e4b6fe
@facebook-github-bot
Contributor

This pull request has been reverted by aa16de5. To re-land this change, follow these steps.

@crcrpar
Collaborator Author

crcrpar commented Jan 7, 2022

Hmm, the commit mentioned in the comment above is the same one as in #67161 (comment)

@mruberry
Collaborator

> Hmm, the commit mentioned in the comment above is the same one as in #67161 (comment)

The bot had an issue where it was repeating revert messages; sorry about that, @crcrpar
