
Commit 5074a72

nickgg authored and facebook-github-bot committed
Add Layout field to Conv and Pool nodes and remove OCL specific versions (#3367)
Summary: For convolutions in Glow we use the NHWC layout, but NCHW is more efficient on GPUs. To enable this, the OCL backend adds backend-specific OCLConvolution, OCLMaxPool and OCLAvgPool nodes and transforms general Conv and Pool nodes into them by shuffling dimensions and adding TransposeNodes. This PR adds a `Layout` field to ConvolutionNode, MaxPoolNode and AvgPoolNode, which can be either NHWC or NCHW. The OCL backend still transforms these nodes, but into the same node type with the layout set to NCHW (we still add transposes). This will allow us to reuse this logic in other backends with the same NCHW preference without needing multiple identical BackendConvolutionNodes (etc.).

This **DOES NOT** add NCHW convolution support to any backends or change any OCL kernels - it just removes OCL-specific nodes in favour of more general ones. This was a consensus decision, but it is worth thinking about whether or not the code is clearer after this change.

Documentation: deleted the section in the docs referencing these nodes.

Pull Request resolved: #3367

Test Plan: ninja test in various modes (debug, release, asan). Ran resnet-runtime on OCL, ran image-classifier on OCL, and tracing-compare across interp, cpu and ocl.

Differential Revision: D16631421

Pulled By: nickgg

fbshipit-source-id: 3005a0cfda474db95020f2c1f162aebe4b016a59
1 parent 9b9ce4f commit 5074a72

24 files changed: +385 −386 lines changed

docs/NewBackendSpecificNode.md

Lines changed: 0 additions & 41 deletions

```diff
@@ -48,47 +48,6 @@ ReLU is max between zero and the input value. Glow lowers `ReLUNode` to two basi
 
 Please refer to the document in [Backend](https://github.com/pytorch/glow/blob/master/docs/Backends.md#backend-specific-nodes-and-instructions) part for source code details on adding a new backend-specific CPUMaxSplatNode on CPU.
 
-#### Data Layout Transformation for Conv Operator in OpenCL
-
-OpenCL Conv is faster in layout `NCHW`, but the default layout of convolution operator in Glow is `NHWC`. So we transpose the inputs/output and replace the `ConvolutionNode` with a backend-specific `OCLConvolutionNode` that uses `NCHW`. The transposes mostly can get optimized away thanks to the high-level graph optimizations.
-
-The OpenCL backend defines `OCLConvolution` in `tools/ClassGen/OpenCL/OpenCLSpecificNodes.h` to support layout `NCHW` input.
-
-```cpp
-BB.newNode("OCLConvolution")
-    .addInput("Input")
-    .addInput("Filter")
-    .addInput("Bias")
-    .addMember(MemberType::VectorUnsigned, "Kernels")
-    .addMember(MemberType::VectorUnsigned, "Strides")
-    .addMember(MemberType::VectorUnsigned, "Pads")
-    .addMember(MemberType::Unsigned, "Group")
-    .addResultFromCtorArg()
-    .setDocstring(
-        "This is an OpenCL-specific convolution implementation where the "
-        "filter, the bias and the input are in the NCHW format");
-```
-
-During `transformPostLowering()`, this `convertConvToNCHWConv` node which contains a `NCHWConvNode` node and multiple `Transpose` nodes for `Input`, `Filter` and `Result` replaces the aforementioned pattern.
-
-A corresponding backend-specific `OCLConvolution` instruction is also needed, defined in
-`tools/ClassGen/Backends/OpenCL/OpenCLSpecificInstrs.h`:
-
-```cpp
-BB.newBackendSpecificInstr("OCLConvolution")
-    .addOperand("Dest", OperandKind::Out)
-    .addOperand("Src", OperandKind::In)
-    .addOperand("Filter", OperandKind::In)
-    .addOperand("Bias", OperandKind::In)
-    .addMember(MemberType::VectorUnsigned, "Kernels")
-    .addMember(MemberType::VectorUnsigned, "Strides")
-    .addMember(MemberType::VectorUnsigned, "Pads")
-    .addMember(MemberType::Unsigned, "Group")
-    .autoIRGen()
-    .autoVerify(VerifyKind::SameElementType, {"Dest", "Src", "Filter", "Bias"});
-
-```
-
 
 ### References
 
```

include/glow/Backends/LayoutConverter.h

Lines changed: 28 additions & 10 deletions

```diff
@@ -23,7 +23,6 @@ namespace glow {
 
 /// Convert regular convolution nodes (that use NHWC) into a backend-specific
 /// convolution nodes using NCHW.
-template <class NCHWConvNode>
 Node *convertConvToNCHWConv(ConvolutionNode *CN, Function *F) {
   // Convert filter and input from NHWC (Glow's default) into NCHW.
   auto *NI = F->createTranspose("conv.input", CN->getInput(), NHWC2NCHW);
@@ -34,30 +33,49 @@ Node *convertConvToNCHWConv(ConvolutionNode *CN, Function *F) {
   auto outTy = F->getParent()->uniqueTypeWithNewShape(CN->getResult().getType(),
                                                       dimsNCHW);
 
-  auto *NC = F->addNode(new NCHWConvNode(
-      CN->getName(), outTy, NI, NF, CN->getBias(), CN->getKernels(),
-      CN->getStrides(), CN->getPads(), CN->getGroup(), CN->getDilation()));
+  auto *NC = F->addNode(
+      new ConvolutionNode(CN->getName(), outTy, NI, NF, CN->getBias(),
+                          CN->getKernels(), CN->getStrides(), CN->getPads(),
+                          CN->getGroup(), CN->getDilation(), NCHW));
   auto *NR = F->createTranspose("conv.result", NC, NCHW2NHWC);
 
   return NR;
 }
 
 /// Convert regular pool nodes (that use NHWC) into backend-specific nodes using
 /// NCHW.
-template <class PoolNode, class NCHWPoolNode>
-Node *convertPoolToNCHWPool(PoolNode *PN, Function *F) {
+Node *convertMaxPoolToNCHWPool(MaxPoolNode *PN, Function *F) {
   // Convert input from NHWC (Glow's default) into NCHW.
-  auto *NI = F->createTranspose("conv.input", PN->getInput(), NHWC2NCHW);
+  auto *NI = F->createTranspose("maxpool.input", PN->getInput(), NHWC2NCHW);
+
+  auto dimsNHWC = ShapeNHWC(PN->getResult().getType()->dims());
+  auto dimsNCHW = {dimsNHWC.n, dimsNHWC.c, dimsNHWC.h, dimsNHWC.w};
+  auto outTy = F->getParent()->uniqueTypeWithNewShape(PN->getResult().getType(),
+                                                      dimsNCHW);
+  auto AMT = F->getParent()->uniqueTypeWithNewShape(PN->getArgmax().getType(),
+                                                    dimsNCHW);
+
+  auto *NPN = F->addNode(new MaxPoolNode(PN->getName(), outTy, AMT, NI,
+                                         PN->getKernels(), PN->getStrides(),
+                                         PN->getPads(), NCHW));
+  auto *NR = F->createTranspose("maxpool.result", NPN->getResult(), NCHW2NHWC);
+
+  return NR;
+}
+
+Node *convertAvgPoolToNCHWPool(AvgPoolNode *PN, Function *F) {
+  // Convert input from NHWC (Glow's default) into NCHW.
+  auto *NI = F->createTranspose("maxpool.input", PN->getInput(), NHWC2NCHW);
 
   auto dimsNHWC = ShapeNHWC(PN->getResult().getType()->dims());
   auto dimsNCHW = {dimsNHWC.n, dimsNHWC.c, dimsNHWC.h, dimsNHWC.w};
   auto outTy = F->getParent()->uniqueTypeWithNewShape(PN->getResult().getType(),
                                                       dimsNCHW);
 
   auto *NPN =
-      F->addNode(new NCHWPoolNode(PN->getName(), outTy, NI, PN->getKernels()[0],
-                                  PN->getStrides()[0], PN->getPads()));
-  auto *NR = F->createTranspose("maxpool.result", NPN, NCHW2NHWC);
+      F->addNode(new AvgPoolNode(PN->getName(), outTy, NI, PN->getKernels(),
+                                 PN->getStrides(), PN->getPads(), NCHW));
+  auto *NR = F->createTranspose("avgpool.result", NPN->getResult(), NCHW2NHWC);
 
   return NR;
 }
```
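With the templates removed, these helpers return plain ConvolutionNode/MaxPoolNode/AvgPoolNode instances tagged NCHW. Below is a hedged sketch of how a backend with an NCHW preference might wire them into its `transformPostLowering()` hook; the backend class name and the iteration pattern are assumptions for illustration, not part of this diff:

```cpp
// Illustrative only: "MyNCHWBackend" is hypothetical; the helpers come
// from LayoutConverter.h, as shown in the diff above.
#include "glow/Backends/LayoutConverter.h"

bool MyNCHWBackend::transformPostLowering(Function *F,
                                          CompilationContext &cctx) const {
  bool changed = false;
  for (auto &node : F->getNodes()) {
    // Only rewrite nodes that are still in Glow's default NHWC layout.
    if (auto *CN = llvm::dyn_cast<ConvolutionNode>(&node)) {
      if (CN->getLayout() == NHWC) {
        NodeValue(CN, ConvolutionNode::ResultIdx)
            .replaceAllUsesOfWith(convertConvToNCHWConv(CN, F));
        changed = true;
      }
    } else if (auto *MPN = llvm::dyn_cast<MaxPoolNode>(&node)) {
      if (MPN->getLayout() == NHWC) {
        NodeValue(MPN, MaxPoolNode::ResultIdx)
            .replaceAllUsesOfWith(convertMaxPoolToNCHWPool(MPN, F));
        changed = true;
      }
    } else if (auto *APN = llvm::dyn_cast<AvgPoolNode>(&node)) {
      if (APN->getLayout() == NHWC) {
        NodeValue(APN, AvgPoolNode::ResultIdx)
            .replaceAllUsesOfWith(convertAvgPoolToNCHWPool(APN, F));
        changed = true;
      }
    }
  }
  return changed;
}
```

Each helper sandwiches the layout-tagged node between NHWC2NCHW and NCHW2NHWC transposes; adjacent transposes can then be folded away by the high-level graph optimizations.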

include/glow/Graph/Graph.h

Lines changed: 32 additions & 24 deletions

```diff
@@ -341,14 +341,15 @@ class Function final : public Named {
   /// \p group defines the number of groups the input and output channels should
   /// be divided into and convolved separately. \p dilation defines factor by
   /// which gap between 2 neighboring kernel elements is expanded along each
-  /// axis.
+  /// axis. \p layout defines the Tensor layout and must be either NHWC or NCHW.
 
-  ConvolutionNode *createConv(llvm::StringRef name, NodeValue input,
-                              NodeValue filter, NodeValue bias, TypeRef outTy,
-                              llvm::ArrayRef<unsigned_t> kernels,
-                              llvm::ArrayRef<unsigned_t> strides,
-                              llvm::ArrayRef<unsigned_t> pads, unsigned_t group,
-                              unsigned_t dilation = 1);
+  ConvolutionNode *
+  createConv(llvm::StringRef name, NodeValue input, NodeValue filter,
+             NodeValue bias, TypeRef outTy, llvm::ArrayRef<unsigned_t> kernels,
+             llvm::ArrayRef<unsigned_t> strides,
+             llvm::ArrayRef<unsigned_t> pads, unsigned_t group,
+             unsigned_t dilation = 1,
+             ConvolutionLayout layout = ConvolutionLayout::NHWC);
 
   /// Creates a ConvolutionNode with the given \p name which convolves the 4D
   /// \p input with \p filter and \bias. \p kernel defines the size of the
@@ -358,13 +359,14 @@ class Function final : public Named {
   /// \p group defines the number of groups the input and output channels should
   /// be divided into and convolved separately. \p dilation defines factor by
   /// which gap between 2 neighboring kernel elements is expanded along each
-  /// axis.
+  /// axis. \p layout defines the Tensor layout and must be either NHWC or NCHW.
 
-  ConvolutionNode *createConv(llvm::StringRef name, NodeValue input,
-                              NodeValue filter, NodeValue bias, TypeRef outTy,
-                              unsigned_t kernel, unsigned_t stride,
-                              unsigned_t pad, unsigned_t group,
-                              unsigned_t dilation = 1);
+  ConvolutionNode *
+  createConv(llvm::StringRef name, NodeValue input, NodeValue filter,
+             NodeValue bias, TypeRef outTy, unsigned_t kernel,
+             unsigned_t stride, unsigned_t pad, unsigned_t group,
+             unsigned_t dilation = 1,
+             ConvolutionLayout layout = ConvolutionLayout::NHWC);
 
   /// Creates a Convolution3DNode with the given \p name which convolves the 5D
   /// \p input with \p filter and \bias. \p kernels defines the size of the
@@ -405,8 +407,9 @@ class Function final : public Named {
   /// cells should be added to the input during convolution. \p group defines
   /// the number of groups the input and output channels should be divided into
   /// and convolved separately.
-  /// NOTE: ChannelwiseQuantizedConvolutionNode does not yet have an
-  /// implementation so attempting to run a graph containing this node fails.
+  /// NOTE: ChannelwiseQuantizedConvolutionNode does
+  /// not yet have an implementation so attempting to run a graph containing
+  /// this node fails.
   ChannelwiseQuantizedConvolutionNode *createChannelwiseQuantizedConv(
       llvm::StringRef name, NodeValue input, Constant *filter, Constant *bias,
       Constant *scales, Constant *offsets, TypeRef outTy,
@@ -419,25 +422,28 @@ class Function final : public Named {
   MaxPoolNode *createMaxPool(llvm::StringRef name, NodeValue input,
                              llvm::ArrayRef<unsigned_t> kernels,
                              llvm::ArrayRef<unsigned_t> strides,
-                             llvm::ArrayRef<unsigned_t> pads);
+                             llvm::ArrayRef<unsigned_t> pads,
+                             ConvolutionLayout layout = NHWC);
 
   MaxPoolNode *createMaxPool(llvm::StringRef name, NodeValue input,
                              unsigned_t kernel, unsigned_t stride,
-                             unsigned_t pad);
+                             unsigned_t pad, ConvolutionLayout layout = NHWC);
 
   AvgPoolNode *createAvgPool(llvm::StringRef name, NodeValue input,
                              llvm::ArrayRef<unsigned_t> kernels,
                              llvm::ArrayRef<unsigned_t> strides,
-                             llvm::ArrayRef<unsigned_t> pads);
+                             llvm::ArrayRef<unsigned_t> pads,
+                             ConvolutionLayout layout = NHWC);
 
   AvgPoolNode *createAvgPool(llvm::StringRef name, NodeValue input,
                              TypeRef outTy, llvm::ArrayRef<unsigned_t> kernels,
                              llvm::ArrayRef<unsigned_t> strides,
-                             llvm::ArrayRef<unsigned_t> pads);
+                             llvm::ArrayRef<unsigned_t> pads,
+                             ConvolutionLayout layout = NHWC);
 
   AvgPoolNode *createAvgPool(llvm::StringRef name, NodeValue input,
                              unsigned_t kernel, unsigned_t stride,
-                             unsigned_t pad);
+                             unsigned_t pad, ConvolutionLayout layout = NHWC);
 
   /// Creates and \returns an AdaptiveAvgPool node with \p name, \p input, and
   /// \p outTy. The AdaptiveAvgPoolNode will perform average pooling over the
@@ -1100,14 +1106,15 @@ class Function final : public Named {
   /// defines the number of groups the input and output channels should be
   /// divided into and convolved separately. \p dilation defines factor by
   /// which gap between 2 neighboring kernel elements is expanded along each
-  /// axis.
+  /// axis. \p layout defines the Tensor layout and must be either NHWC or NCHW.
   ConvolutionNode *createConv(PlaceholderBindings &bindings,
                               llvm::StringRef name, NodeValue input,
                               size_t outChannels,
                               llvm::ArrayRef<unsigned_t> kernels,
                               llvm::ArrayRef<unsigned_t> strides,
                               llvm::ArrayRef<unsigned_t> pads, unsigned_t group,
-                              unsigned_t dilation = 1);
+                              unsigned_t dilation = 1,
+                              ConvolutionLayout layout = NHWC);
 
   /// Creates a ConvolutionNode with the given \p name which convolves the 4D
   /// \p input. \p kernel defines the size of the height and width dimensions of
@@ -1117,12 +1124,13 @@ class Function final : public Named {
   /// defines the number of groups the input and output channels should be
   /// divided into and convolved separately.\p dilation defines factor by
   /// which gap between 2 neighboring kernel elements is expanded along each
-  /// axis.
+  /// axis. \p layout defines the Tensor layout and must be either NHWC or NCHW.
   ConvolutionNode *createConv(PlaceholderBindings &bindings,
                               llvm::StringRef name, NodeValue input,
                               size_t outChannels, unsigned_t kernel,
                               unsigned_t stride, unsigned_t pad,
-                              unsigned_t group, unsigned_t dilation = 1);
+                              unsigned_t group, unsigned_t dilation = 1,
+                              ConvolutionLayout layout = NHWC);
 
   /// Creates a Convolution3DNode with the given \p name which convolves the 5D
   /// \p input. \p kernels defines the size of the height, width, and depth
```
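The pool creators follow the same pattern. A short usage sketch, assuming a `Function *F` and an NHWC-typed NodeValue `in` (names are illustrative):

```cpp
// Omitting the layout keeps the old NHWC behaviour.
auto *mp = F->createMaxPool("pool.max", in, /*kernels=*/{2, 2},
                            /*strides=*/{2, 2}, /*pads=*/{0, 0, 0, 0});
// Passing NCHW only tags the node with its layout; the caller is still
// responsible for supplying data in that layout.
auto *ap = F->createAvgPool("pool.avg", in, {2, 2}, {2, 2}, {0, 0, 0, 0},
                            NCHW);
```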

include/glow/Graph/Nodes.h

Lines changed: 3 additions & 0 deletions

```diff
@@ -203,6 +203,9 @@ inline ShapeHWD calculate3DConvPoolOutputDims(
 /// Modes of the padding operation.
 enum PaddingMode { CONSTANT = 0, REFLECT, EDGE };
 
+/// Convolution Layouts.
+enum ConvolutionLayout { NHWC = 0, NCHW };
+
 /// Support for hashing the Nodes. This is required for using
 /// llvm::hash_combine.
 class Node;
```
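Because `ConvolutionLayout` is an unscoped enum, `NHWC` and `NCHW` are visible unqualified, which is how the creator defaults above are spelled (`layout = NHWC`). A trivial sketch of dispatching on the tag:

```cpp
// Sketch: layout-dependent handling in backend code.
ConvolutionLayout layout = NCHW;
switch (layout) {
case NHWC: // Glow's default: {batch, height, width, channels}.
  break;
case NCHW: // GPU-friendly: {batch, channels, height, width}.
  break;
}
```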

include/glow/IR/IRBuilder.h

Lines changed: 7 additions & 4 deletions

```diff
@@ -52,13 +52,16 @@ class IRBuilder {
   /// @name High-level, operation-level IRBuilder.
   ///@{
 
-  MaxPoolWithArgmaxInst *createMaxPoolWithArgmaxOp(
-      llvm::StringRef name, Value *input, llvm::ArrayRef<unsigned_t> kernels,
-      llvm::ArrayRef<unsigned_t> strides, llvm::ArrayRef<unsigned_t> pads);
+  MaxPoolWithArgmaxInst *
+  createMaxPoolWithArgmaxOp(llvm::StringRef name, Value *input,
+                            llvm::ArrayRef<unsigned_t> kernels,
+                            llvm::ArrayRef<unsigned_t> strides,
+                            llvm::ArrayRef<unsigned_t> pads, unsigned_t layout);
 
   AvgPoolInst *createAvgPoolOp(Value *input, llvm::ArrayRef<unsigned_t> kernels,
                                llvm::ArrayRef<unsigned_t> strides,
-                               llvm::ArrayRef<unsigned_t> pads);
+                               llvm::ArrayRef<unsigned_t> pads,
+                               unsigned_t layout);
 
   CrossEntropyLossInst *createCrossEntropyLossOp(llvm::StringRef name, Value *P,
                                                  Value *labels);
```
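At the IR level the layout travels as a plain `unsigned_t` holding the enum's value. A call-site sketch, assuming an `IRBuilder builder` and a `Value *in` (names are illustrative):

```cpp
// The unscoped ConvolutionLayout enum converts implicitly to unsigned_t.
auto *avg = builder.createAvgPoolOp(in, /*kernels=*/{2, 2},
                                    /*strides=*/{2, 2},
                                    /*pads=*/{0, 0, 0, 0},
                                    /*layout=*/NHWC);
```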

lib/Backends/Interpreter/InterpreterNodes.cpp

Lines changed: 7 additions & 0 deletions

```diff
@@ -282,6 +282,8 @@ void BoundInterpreterFunction::fwdConvolutionInstQuantizedImpl(
 }
 
 void BoundInterpreterFunction::fwdConvolutionInst(const ConvolutionInst *I) {
+  assert(I->getLayout() == NHWC &&
+         "Glow Interpreter supports only NHWC Convolutions");
   auto kernelSizes = I->getKernels();
   auto pads = I->getPads();
   auto strides = I->getStrides();
@@ -303,6 +305,8 @@ void BoundInterpreterFunction::fwdConvolutionInst(const ConvolutionInst *I) {
 
 void BoundInterpreterFunction::fwdConvolutionGradInst(
     const ConvolutionGradInst *I) {
+  assert(I->getLayout() == NHWC &&
+         "Glow Interpreter supports only NHWC Convolutions");
   auto inW = getWeightHandle(I->getSrc());
   auto inG = getWeightHandle(I->getSrcGrad());
   auto outG = getWeightHandle(I->getDestGrad());
@@ -753,6 +757,7 @@ static void fwdMaxPool(Tensor *inW, Tensor *outW, Tensor *argmaxW,
 }
 
 void BoundInterpreterFunction::fwdMaxPoolInst(const MaxPoolInst *I) {
+  assert(I->getLayout() == NHWC && "Glow Interpreter supports only NHWC Pools");
   auto inW = getTensor(I->getSrc());
   auto outW = getTensor(I->getDest());
 
@@ -770,6 +775,7 @@ void BoundInterpreterFunction::fwdMaxPoolInst(const MaxPoolInst *I) {
 
 void BoundInterpreterFunction::fwdMaxPoolWithArgmaxInst(
     const MaxPoolWithArgmaxInst *I) {
+  assert(I->getLayout() == NHWC && "Glow Interpreter supports only NHWC Pools");
   auto inW = getTensor(I->getSrc());
   auto outW = getTensor(I->getDest());
   auto argmaxW = getTensor(I->getArgmax());
@@ -888,6 +894,7 @@ void BoundInterpreterFunction::fwdAvgPoolInstI8Impl(const AvgPoolInst *I) {
 }
 
 void BoundInterpreterFunction::fwdAvgPoolInst(const AvgPoolInst *I) {
+  assert(I->getLayout() == NHWC && "Glow Interpreter supports only NHWC Pools");
   if (I->getSrc()->getType()->isQuantizedType()) {
     fwdAvgPoolInstI8Impl(I);
     return;
```
