[FBcode->GH] Fix DeformConvTester::test_backward_cuda #3942

Merged
merged 1 commit, Jun 1, 2021

Conversation

NicolasHug
Member

Summary:
`test_backward_cuda_contiguous` and `test_backward_cuda_non_contiguous` have been failing on fbcode for a while with the error `too many resources requested for launch`, which suggests that too many threads per block are being requested.

This issue was already causing problems in the original PR pytorch#2791 (comment), where the author decided that CC >= 6 was a good threshold because GPUs with CC >= 6 have more registers (CC = Compute Capability).

However, I'm not certain that this is actually true: if we look at https://en.wikipedia.org/wiki/CUDA#Version_features_and_specifications, it's clear that 6.2 has fewer registers per thread block than 6.0, so I'm not sure this threshold completely makes sense.
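
Side note: rather than inferring the register budget from a CC threshold, one can query it directly at runtime. A minimal sketch using the standard CUDA runtime API, just as an illustration and not part of this change:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  cudaDeviceProp prop;
  cudaGetDeviceProperties(&prop, /*device=*/0);
  // regsPerBlock is the per-block register budget and maxThreadsPerBlock is
  // the hard limit on block size; a launch fails with "too many resources
  // requested for launch" when threads_per_block * registers_per_thread
  // exceeds regsPerBlock.
  std::printf("CC %d.%d: %d registers per block, %d max threads per block\n",
              prop.major, prop.minor, prop.regsPerBlock,
              prop.maxThreadsPerBlock);
  return 0;
}
```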

Moreover, note that the current tests (as on `master`):

- **pass** on the OSS Linux CI, which relied on a P4 GPU (up to last week), i.e. **CC = 6.1**
- **pass** on the OSS Windows CI, which relies on a T4 GPU, i.e. **CC = 7.5**
- **fail** on the AWS cluster, which relies on a V100 GPU, i.e. **CC = 7.0**

It is quite unclear to me what kind of resource is "enough" for the tests to pass on both 6.1 and 7.5 but not on 7.0. As a result, I think it's safer to just reduce the number of threads per block, irrespective of the CC.
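
For illustration, a minimal sketch of what this looks like (the names `kThreadsPerBlock` and `blocks_for` are hypothetical, not the actual torchvision helpers):

```cpp
#include <algorithm>

// Use a fixed, conservative block size on every architecture instead of
// branching on the compute capability.
constexpr unsigned int kThreadsPerBlock = 512;

inline unsigned int blocks_for(unsigned int n_items) {
  // Ceil-divide the number of work items by the block size, capped so the
  // grid stays within a reasonable size; each thread can then grid-stride
  // over any remaining items.
  return std::min((n_items + kThreadsPerBlock - 1) / kThreadsPerBlock, 4096u);
}
```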

ngimel,  fmassa suggested that I tag you here since you could have some valuable insight for us. Thanks!

Reviewed By: fmassa

Differential Revision: D28641626

fbshipit-source-id: 2618c366c5d18bbb7ebafc33032e7ac6c0404d0b
@NicolasHug
Member Author

Clang is failing on a minor issue, and I'm not sure what we should do in this case.
I could fix the formatting here, but that would likely create a conflict when syncing GH with fbcode.

I guess it's fine to ignore the failure?

Contributor

@datumbox left a comment

The code change looks good but the linter fails.

After fixing the styling issue, we will still need to pass this change to FBcode.

Edit: Sorry, my browser showed a cached version of your PR, without your comment above. We have two options:

1. Merge, and fix the styling issue ASAP in a follow-up PR. This has the unfortunate side effect of breaking master for 1 commit, but it does not cause conflicts on FBcode.
2. Close this PR, send a new PR that is not marked with [FBcode->GH] and properly adds the change (including the style fixes), and then move that single PR to FBcode and resolve the conflicts.

I don't have a strong opinion on which we should do. To avoid similar issues in the future, we should always export our Diffs to GH before merging on FBcode and ensure all CIs pass.

@NicolasHug
Member Author

@datumbox so how do you suggest we proceed? As I mentioned above, I think fixing the formatting in this PR will create conflicts, whether we prefix it with [FBcode->GH] or not.

We can merge as-is and fix the formatting in another PR, but that might be overkill, since the formatting issues should only be flagged in the PR that introduces them (so clang will stay green on master starting from the next merged PR). At least that's how flake8 works.

@fmassa what would you recommend?

@NicolasHug
Member Author

OK, sorry @datumbox, I replied before I saw your edit.

Let's go with option 1 then, as it has the lowest risk of conflicts, which are always a pain when syncing. I'll merge now and follow up with the formatting fix right away.

@NicolasHug
Member Author

@fmassa you're the oncall this week and I forgot to include the tag in the commit message for this PR, sorry
