test_detection_model[cuda-fasterrcnn_mobilenet_v3_large_320_fpn] failing due to NVFuser #6015
Comments
Just for my own record. Failing log:
Looking at the output values (comparing JIT and eager), I suspect there's an indexing issue. I'm waiting on my build of devel merged into upstream to try my luck there. FYI, @eellison suggested dropout, but I went through the TS graph and didn't spot any dropout/rand-like ops there.
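For anyone repeating the two checks described above, here is a minimal sketch in plain Python. It is not the script used in this thread: the helper names `find_mismatches` and `random_like_ops`, the tolerances, and the sample values are all hypothetical, and the op-name pattern is only a guess at what "dropout/rand-like" might cover.

```python
import math
import re


def find_mismatches(eager, jit, rtol=1e-5, atol=1e-8):
    """Return (index, eager_value, jit_value) for every position that differs.

    Both inputs are flat sequences of floats. The tolerances are illustrative
    defaults, not the ones used by the TorchVision test suite.
    """
    if len(eager) != len(jit):
        raise ValueError("outputs have different lengths")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(eager, jit))
        if not math.isclose(a, b, rel_tol=rtol, abs_tol=atol)
    ]


# Hypothetical pattern for ops that would make outputs nondeterministic.
RANDOM_OP = re.compile(r"aten::(\w*dropout\w*|rand\w*|bernoulli\w*)")


def random_like_ops(graph_text):
    """Return the sorted, de-duplicated op names in a printed graph that match RANDOM_OP."""
    return sorted({m.group(0) for m in RANDOM_OP.finditer(graph_text)})


if __name__ == "__main__":
    # Made-up values: the same numbers land at swapped positions, which is the
    # kind of pattern that points at indexing rather than at numerical noise.
    print(find_mismatches([0.5, 1.0, 2.0, 3.0], [0.5, 2.0, 1.0, 3.0]))
    print(random_like_ops("%4 = aten::relu(%3)\n%5 = aten::dropout(%4, %p, %t)"))
```

With PyTorch, one would presumably feed it `tensor.flatten().tolist()` for each output tensor and the printed TorchScript graph of the scripted model. Mismatches where the values match but at different positions suggest an indexing problem. Mismatches with unrelated values would be more consistent with a random op.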
Hmmmm. I think it's a combination of indexing + permutation support. The devel branch does get rid of the mismatched output, so the mysterious indexing fixes are real (though we never really pinned down which PR actually fixed that).
Errr... the index issue is not related to permutation. I patched that anyway in pytorch/pytorch#77460.
I confirm that TorchVision's latest main branch is still failing. See #6017. @davidberard98 @jjsjann123 Could you please revert the offending PR to resolve the failure? We have been facing breakages from upstream commits on core for several weeks now, and this has been very disruptive for the project. Please help us restore CI to green and run the necessary tests prior to relanding to ensure we don't break TorchVision's tests.
Sorry for the inconvenience and confusion. I verified the fix on my local machine last Friday.
@davidberard98 @jjsjann123 Thanks a lot for helping us fix the breakage. I confirm that the issue is now resolved on TorchVision's latest main branch d9a6950.
Updating nvfuser code base. This should fix the indexing issue observed in pytorch/vision#6015. Running tests locally as well. Will update the description here at a later point @bypass-github-export-checks Pull Request resolved: #77471 Approved by: https://github.com/seemethere, https://github.com/eellison
Summary: Updating nvfuser code base. This should fix the indexing issue observed in pytorch/vision#6015. Running tests locally as well. Will update the description here at a later point bypass-github-export-checks Pull Request resolved: #77471 Reviewed By: malfet, seemethere Differential Revision: D36393120 Pulled By: eellison fbshipit-source-id: 876f2d066e8e54b5d076de66ad1811f6970be1c8
🐛 Describe the bug
Convolution decomposition has a bug in nvfuser; currently looking into it with @jjsjann123.
Versions
CI: Windows GPU + Linux 3.10 GPU