[ROCm] Fix unit tests on CI #11191

Closed
wants to merge 534 commits into from

iotamudelta
Contributor

Disables two unit tests in test_cuda that fail on ROCm. Both were introduced after test_cuda was enabled.

iotamudelta and others added 30 commits August 6, 2018 16:14
…RAND_PR

While there, add the remaining changes requested in upstream PR pytorch#10266
Refactor unit test skip statements to use @skipIfRocm annotation
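
For illustration, a skip annotation like the `@skipIfRocm` named in the commit above could be sketched as follows. This is a minimal sketch, not the code from this PR: the environment variable `PYTORCH_TEST_WITH_ROCM`, the helper `running_on_rocm`, the skip message, and the `ExampleTest` class are all assumptions made for the example.

```python
import functools
import os
import unittest


def running_on_rocm():
    # Assumption: a ROCm test run is flagged through this environment variable.
    return os.getenv("PYTORCH_TEST_WITH_ROCM", "0") == "1"


def skipIfRocm(fn):
    """Skip the decorated test when the run is flagged as a ROCm run."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        if running_on_rocm():
            # unittest records a raised SkipTest as a skip, not a failure.
            raise unittest.SkipTest("test does not currently pass on ROCm")
        return fn(*args, **kwargs)

    return wrapper


class ExampleTest(unittest.TestCase):
    # Hypothetical stand-in for a test such as test_max_with_inf.
    @skipIfRocm
    def test_max_with_inf(self):
        self.assertEqual(max(1.0, float("inf")), float("inf"))
```

Checking the flag inside the wrapper, at call time, keeps the decorator free of import-time state, so the same test module can be run with and without the flag set.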
Contributor

@facebook-github-bot facebook-github-bot left a comment

ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ssnl
Collaborator

ssnl commented Sep 3, 2018

I'm really confused about why these tests fail with:

02:00:13 ======================================================================
02:00:13 ERROR: test_max_with_inf (__main__.TestCuda)
02:00:13 ----------------------------------------------------------------------
02:00:13 Traceback (most recent call last):
02:00:13   File "/var/lib/jenkins/workspace/test/common.py", line 251, in wrapper
02:00:13     method(*args, **kwargs)
02:00:13   File "test_cuda.py", line 1700, in test_max_with_inf
02:00:13     TestTorch._test_max_with_inf(self, (torch.half, torch.float, torch.double), 'cuda')
02:00:13   File "/var/lib/jenkins/workspace/test/test_torch.py", line 800, in _test_max_with_inf
02:00:13     self.assertTrue(torch.all(torch.max(a, dim=1)[0] == inf).item())
02:00:13 RuntimeError: Expected object of scalar type Half but got scalar type Long for argument #0 'max'

Any ideas?

@iotamudelta
Contributor Author

@ssnl Unclear. We are working on the unit test pass rate at the moment. A few things could interfere here: a) we are aware of a compiler bug (fixed in #11198) that can cause hangs and crashes; b) we are also aware of a few tests (min/max in particular) that succeed on our nodes but fail on the CI, and we are looking into this.

PenghuiCheng pushed a commit to PenghuiCheng/pytorch that referenced this pull request Sep 11, 2018
Summary:
Disables two of the unit tests in test_cuda that got introduced after test_cuda was enabled that fail on ROCm.
Pull Request resolved: pytorch#11191

Differential Revision: D9628702

Pulled By: ezyang

fbshipit-source-id: 4c298c728f42bb43d39b57967aa3e44385980265