Fix a bug in CUDA pool op #19780
Conversation
@@ -182,16 +182,20 @@ Status Pool<T, PoolType, NHWC>::ComputeInternal(OpKernelContext* context) const
  if (NHWC) {
    x_dims_cudnn.insert(x_dims_cudnn.begin() + 1, 1);
    y_dims_cudnn.insert(y_dims_cudnn.begin() + 1, 1);
    ORT_ENFORCE(pads.size() >= kernel_shape.size());
TODO: I'm not sure what the correct order is for this NHWC path.
The NHWC implementation seems incorrect: x_dims_cudnn.insert(x_dims_cudnn.begin() + 1, 1)
adds a dimension at the H position (i.e., NWC => NHWC). In that case the kernel_shape and pads entries for the spatial axis should be padded at the first position, not the second.
I think a proper implementation (NHC => NHWC) would be:
if (NHWC) {
x_dims_cudnn.insert(x_dims_cudnn.begin() + 2, 1);
y_dims_cudnn.insert(y_dims_cudnn.begin() + 2, 1);
ORT_ENFORCE(pads.size() >= kernel_shape.size());
pads.insert(pads.begin() + kernel_shape.size(), 0);
pads.insert(pads.end(), 0);
kernel_shape.push_back(1);
strides.push_back(1);
}
Then we can refactor it so NHWC and NCHW share the last five lines of common code.
BTW, the code for global pooling also needs NHWC support (in a separate PR):
onnxruntime/onnxruntime/core/providers/cuda/nn/pool.cc
Lines 163 to 168 in 901356d

  if (pool_attrs_.global_pooling) {
    kernel_shape.assign(x_dims.begin() + 2, x_dims.end());
    pads.assign(kernel_shape.size(), 0);
    strides.assign(kernel_shape.size(), 1);
  }
  auto out_channel = NHWC ? x_shape[3] : x_shape[1];
I think pads should be like
pads.assign(kernel_shape.size() * 2, 0);
in line 165? Line 168 does not handle 3-D or 5-D inputs, which could cause issues as well.
Agree. We have been adding the extra feature dim to the right of the existing feature dim, not to the left.
(1) It seems like we need 1-D NHWC Pool tests added (they would have caught the issue Tianlei mentioned).
(2) The existing 1-D NCHW Pool tests need some enhancements:
static void MaxPool1D_8_WithIndexTest(int64_t storage_order) {
Will update the code shortly.
BTW, the code for global pooling also needs NHWC support (in a separate PR):
Right. But how could we already have a unit test for it that is passing? See this commit: 8e1b4b8
The commit was trying to disable the test.
Is there a test case?
It was an out-of-bounds write error, which results in undefined behavior, which means a test case may often still pass.
CC: @gedoensmax as FYI
FYI: The AzureML Vision team is also encountering this issue. Thank you @snnn @hariharans29 and @pranavsharma for investigating. Repro:

import torch
import torch.nn.functional as F
from torch import nn
from onnxruntime.training.ortmodule import ORTModule, DebugOptions, LogLevel

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28, 10)
        self.fc2 = nn.Linear(10, 10)
        self.pooler = nn.AdaptiveAvgPool1d(1)  # BUG

    def forward(self, x):
        x = self.pooler(x)  # BUG
        x = x.view(x.shape[0], -1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

model = Net()
model = ORTModule(model, DebugOptions(save_onnx=True, onnx_prefix='pooler', log_level=LogLevel.VERBOSE))
model.to("cuda")
images = torch.randn(8, 28, 28).to("cuda")
output = model(images)
Added a test case. Without the code change in pool.cc, the test fails with:

[chasun@bigdisk Debug]$ ./onnxruntime_test_all --gtest_filter=PoolTest.MaxPool1D_case3
/data/bt/src/onnxruntime/onnxruntime/test/providers/checkers.cc:300: Failure
/data/bt/src/onnxruntime/onnxruntime/test/providers/checkers.cc:300: Failure
/data/bt/src/onnxruntime/onnxruntime/test/providers/checkers.cc:300: Failure
/data/bt/src/onnxruntime/onnxruntime/test/providers/checkers.cc:300: Failure
[  FAILED  ] PoolTest.MaxPool1D_case3 (147 ms)
[----------] Global test environment tear-down

This PR is ready to be merged.
@@ -59,6 +59,8 @@ TYPED_TEST(CudaNhwcTypedTest, MaxPoolNhwc) {
  MAKE_PROVIDERS()
}

#if 0
Don't we want to keep this test? It is 2-D NHWC global pooling (not 1-D, for which we disabled support).
I thought I reverted it. Thanks for pointing it out.
It looks like the NHWC 2D global pool is still not right. I am debugging it.
// The last dim of x_dims is channel(C).
// Put the other part in kernel_shape
kernel_shape.assign(x_dims.begin() + 1, x_dims.end() - 1);
pads.assign(kernel_shape.size(), 0);
pads needs to be 2x the size of kernel_shape; see line 203.
Then how could the old code work?
if (pool_attrs_.global_pooling) {
  kernel_shape.assign(x_dims.begin() + 2, x_dims.end());
  pads.assign(kernel_shape.size(), 0);
  strides.assign(kernel_shape.size(), 1);
}
We have a test case for that? How did it pass?
AFAIK there are no "pads" for global pooling (https://onnx.ai/onnx/operators/onnx__GlobalMaxPool.html), i.e., pads are irrelevant.
@snnn, that's discussed in #19889 (comment). I think that PR is almost ready, since tests have passed (it needs some minor changes to address build issues and feedback). That PR can address this bug as well.
The only concern is that that PR might depend on other NHWC changes, so it might be a little harder to cherry-pick if we want to have a patch release soon.
Maybe we can just fix the NCHW regression (for the patch) and leave anything related to NHWC unimplemented, to be filled in by the other PR. In any case, I don't think we have a ton of NHWC users yet who would be broken by the patch release.
I think either is fine. If we give up on this one, we should cherry-pick #19889 to the patch release. That PR is not very big, so I am not worried.
After the bugs are fixed, we should refactor the SetPoolingNdDescriptorHelper function a little: change the three array-type parameters to std::span and check that all their lengths are equal. I can take that work after #19889 is merged.
And in the CUDAExecutionProvider::GetCapability function (https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/cuda/cuda_execution_provider.cc) we should check whether the paddings are symmetric when the op is a pooling op but not a global pooling op.
Replaced by #19889
Description
PR #17200 introduced a bug in the CUDA EP. Before the change, a piece of code in pool.cc looked like:
After the change, the code became:
The change swapped the order of "pads.insert(pads.begin() + kernel_shape.size(), 0);" and "kernel_shape.push_back(1);".
As a result, the "pads.insert" could cause an out-of-bounds write.
Motivation and Context