[Quantization] Support int32 quantized bias for quantized Conv #1876
Conversation
Since it's not fixing the whole issue, I'd remove the "fixes ..." line; otherwise that issue will be closed.
@@ -90,7 +90,7 @@ class InterpreterFunction final : public CompiledFunction {
#define DEF_INSTR(CLASS, NAME) void fwd##CLASS(const CLASS *I);
#define DEF_BACKEND_SPECIFIC_INSTR(CLASS, NAME)
#include "glow/AutoGenInstr.def"

template <typename ElemTy = int8_t>
@@ -103,14 +103,16 @@ void InterpreterFunction::fwdConvolutionInst_FloatImpl(
}

// This is the quantized i8 implementation of Convolution.
// For bias, we support both int8 and int32 quantization.
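To make the idea concrete, here is a minimal, self-contained sketch (hypothetical names, not Glow's actual kernel): the products of the quantized inputs and weights are accumulated in int32, and a bias quantized with scale = inScale * wScale can be added directly to that accumulator whether it is stored as int8 or int32.

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of the inner loop of a quantized convolution, templated on the bias
// element type so both int8 and int32 bias are supported.
// Assumes the bias was quantized with scale = inScale * wScale and offset 0.
template <typename BiasTy>
float quantizedDotWithBias(const std::vector<int8_t> &in, int32_t inOffset,
                           float inScale, const std::vector<int8_t> &w,
                           int32_t wOffset, float wScale, BiasTy bias) {
  assert(in.size() == w.size());
  int32_t sum = 0;
  for (size_t i = 0, e = in.size(); i < e; ++i) {
    // Subtract the offsets and accumulate in int32.
    sum += (int32_t(in[i]) - inOffset) * (int32_t(w[i]) - wOffset);
  }
  // The accumulator's effective scale is inScale * wScale, so the bias adds
  // in without rescaling.
  sum += int32_t(bias);
  return float(sum) * inScale * wScale; // Dequantize the result.
}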
auto destHandle = destTensor->getHandle<int8_t>();
for (size_t i = 0, e = destHandle.size(); i < e; ++i) {
destHandle.raw(i) = quantization::quantize(srcHandle.raw(i), params);
if (destTensor->getType().getElementType() == ElemKind::Int8QTy) {
@@ -446,7 +445,6 @@ int main(int argc, char **argv) { | |||
BB.newInstr("Quantize") | |||
.addOperand("Dest", OperandKind::Out) | |||
.addOperand("Src", OperandKind::In) | |||
.autoVerify(VerifyKind::SameElementType, {"Dest", "ElemKind::Int8QTy"}) |
@beicy Can you please describe the problem that this PR is solving? Is this a correctness problem? Are we adding a new feature? Do we need this to support some model?
@beicy I see you doubling the number of convolutions in the code, and I'm not sure I understand the motivation for this change. Why do we need both? What's wrong with the i8 version? If i32 is better, then why keep the i8?
@beicy oh, got it. Thank you for pointing me to the discussion. In that case, let's remove the i8 version and just keep the i32 bias version.
If we remove int8 bias completely in this same PR, rather than going through an intermediate step that supports both int32 and int8 bias, we would also need to modify the quantization procedure (the fp32 -> int8 net conversion) accordingly here.
@@ -76,6 +76,10 @@ enum Schema {
/// parameters \p TQP.
int8_t quantize(float input, const TensorQuantizationParams &TQP);
/// Converts floating point value to int32 based on the quantization
/// parameters \p TQP.
int32_t quantizeInt32(float input, const TensorQuantizationParams &TQP);
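A plausible implementation sketch for such a helper (hypothetical; the real declaration takes a TensorQuantizationParams, here the scale and offset are passed directly): divide by the scale, add the offset, round to the nearest integer, and saturate to the int32 range.

#include <cmath>
#include <cstdint>
#include <limits>

// Sketch only: quantize a float to int32 given a scale and offset.
int32_t quantizeInt32Sketch(float input, float scale, int32_t offset) {
  double result = std::round(double(input) / scale) + double(offset);
  const double lo = double(std::numeric_limits<int32_t>::min());
  const double hi = double(std::numeric_limits<int32_t>::max());
  if (result < lo) result = lo;
  if (result > hi) result = hi;
  return int32_t(result);
}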
auto mn = std::numeric_limits<DestTy>::min();
return std::max<SrcTy>(mn, std::min<SrcTy>(mx, in));
}

/// Converts floating point value to int8 based on the quantization
for (size_t i = 0, e = destHandle.size(); i < e; ++i) {
destHandle.raw(i) = quantization::quantize(srcHandle.raw(i), params);
if (destTensor->getType().getElementType() == ElemKind::Int8QTy) {
auto destHandle = destTensor->getHandle<int8_t>();
lib/Quantization/Quantization.cpp
TQP.offset);
// For bias of a conv op, it is quantized to int32.
if (use.getKind() == glow::Kinded::Kind::ConvolutionNodeKind && idx == 2) {
return mod_.uniqueType(ElemKind::Int32QTy, val.dims(), TQP.scale,
int8_t quantize(float input, const TensorQuantizationParams &TQP);
/// \returns the value \p in as clipped to the range of \p DestTy.
template <class SrcTy, class DestTy> DestTy clip(SrcTy in) {
assert(sizeof(SrcTy) >= sizeof(DestTy) && "Invalid types");
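Putting the two quoted fragments together, clip is a saturating cast; below is a self-contained sketch reassembled from the quoted lines, with a usage example.

#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

/// \returns the value \p in clipped to the representable range of \p DestTy.
template <class SrcTy, class DestTy> DestTy clip(SrcTy in) {
  assert(sizeof(SrcTy) >= sizeof(DestTy) && "Invalid types");
  auto mx = std::numeric_limits<DestTy>::max();
  auto mn = std::numeric_limits<DestTy>::min();
  return std::max<SrcTy>(mn, std::min<SrcTy>(mx, in));
}

// Example: clip<int32_t, int8_t>(300) saturates the int32 value to 127.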
@@ -1397,7 +1397,8 @@ QuantizeNode *Function::createQuantize(llvm::StringRef name, NodeValue input,
TypeRef outTy) {
assert(input.getElementType() == ElemKind::FloatTy &&
"Input must be a floating type");
assert(outTy->getElementType() == ElemKind::Int8QTy &&
assert((outTy->getElementType() == ElemKind::Int8QTy ||
outTy->getElementType() == ElemKind::Int32QTy) &&
Looking mostly good to me. Can you verify that you can dump and load the profile for resnet50?
lib/Quantization/Quantization.cpp
if (use.getKind() == glow::Kinded::Kind::ConvolutionNodeKind && idx == 2) {
// For bias of a conv op, it is quantized to int32. Also, we should make
// sure its scale should be (scale of input) * (scale of weights).
NodeValue val1 = use.getNthInput(0);
lib/Quantization/Quantization.cpp
// sure its scale should be (scale of input) * (scale of weights).
NodeValue val1 = use.getNthInput(0);
NodeValue val2 = use.getNthInput(1);
float scale1 = val1.getNode()->getNthResult(0).getType()->getScale();
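In other words, the constraint being enforced here is simply that the bias scale equals the product of the input and filter scales; a one-line sketch with hypothetical names:

// The conv bias is quantized with scale = (input scale) * (filter scale), so
// the quantized bias lives in the same fixed-point domain as the int32
// accumulator and can be added to it directly.
float computeConvBiasScale(float inputScale, float filterScale) {
  return inputScale * filterScale;
}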
lib/Quantization/Quantization.cpp
assert(destTy->getElementType() == ElemKind::Int8QTy && "");
assert((destTy->getElementType() == ElemKind::Int8QTy ||
destTy->getElementType() == ElemKind::Int32QTy) &&
"");
@rdzhabarov Thanks for reminding me about this test. Now we can dump and load the profile for resnet50.
With int32 quantized bias:
LGTM.
for (size_t i = 0, e = destH.size(); i < e; ++i) {
destH.raw(i) = quantization::quantize<eTy>(srcH.raw(i), params);
}
}
Awesome work!
Nice! It's good to see this change landing.
Description:
We are working on loading quantized ResNet50 model directly.
So far, we quantize weights and bias to int8, and both the interpreter and the CPU backend support int8 bias. However, this quantized ResNet50 model quantizes weights to int8 but bias to int32. This is because the partial sums of the matrix-matrix multiplication are accumulated in int32, so an int32 bias can be added directly to the int32 partial sum for better accuracy (int8 bias caused an accuracy drop).
We now plan to add support for Conv with int32 bias and eliminate int8 bias.
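To make the precision argument concrete, here is a toy example with made-up scales (not taken from the actual model): once the bias is expressed in the accumulator's fixed-point domain, its integer magnitude usually far exceeds what int8 can hold, while int32 represents it essentially exactly.

#include <cmath>
#include <cstdint>
#include <cstdio>

int main() {
  // Made-up scales for illustration only.
  float inputScale = 0.02f, weightScale = 0.005f;
  float biasScale = inputScale * weightScale; // 1e-4: the accumulator's scale.
  float bias = 0.37f;

  // In the accumulator's domain the bias becomes 3700, far outside int8's
  // [-128, 127], but trivially representable in int32.
  int32_t b32 = int32_t(std::lround(bias / biasScale));
  std::printf("quantized bias = %d, restored = %f\n", b32, double(b32) * biasScale);
  return 0;
}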
Testing:
Added unittest.
Documentation:
#1727, #1762