[Quantization] Support int32 quantized bias for quantized Conv #1876

Merged
merged 1 commit into pytorch:master on Oct 25, 2018

Conversation

beicy
Contributor

@beicy beicy commented Oct 16, 2018

Description:

We are working on loading a quantized ResNet50 model directly.
So far, we quantize both weights and bias into int8, and both the interpreter and the CPU backend support int8 bias. However, this quantized ResNet50 model quantizes weights into int8 but bias into int32. This is because the partial sum of the matrix-matrix multiplication is accumulated into int32, so an int32 bias can be added to the int32 partial sum for better accuracy (the int8 bias caused an accuracy drop).
Now we plan to add support for Conv with an int32 bias and eliminate the int8 bias path.
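To illustrate the arithmetic, here is a minimal sketch of one output element of a quantized convolution with an int32 accumulator and an int32 bias. This is not the code in this PR; the names and the rounding/clipping details are illustrative, and the bias is assumed to be quantized with scale = (input scale) * (weight scale) and offset 0, so it can be added directly to the partial sum.

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Sketch only: one output element of an int8 convolution with an int32 bias.
int8_t quantizedDotWithInt32Bias(const std::vector<int8_t> &inputQ, int32_t inputOffset,
                                 const std::vector<int8_t> &weightQ, int32_t weightOffset,
                                 int32_t biasQ, float inputScale, float weightScale,
                                 float outputScale, int32_t outputOffset) {
  int32_t sum = 0;
  for (size_t k = 0; k < inputQ.size(); ++k) {
    // Remove the offsets so each product lives in the (real value / scale) domain.
    sum += int32_t(inputQ[k] - inputOffset) * int32_t(weightQ[k] - weightOffset);
  }
  // The bias shares the accumulator's scale, so it is added with no rescaling
  // and no loss of precision.
  sum += biasQ;
  // Rescale the int32 accumulator into the int8 output domain and clip.
  int32_t out = int32_t(std::lround(sum * (inputScale * weightScale) / outputScale)) + outputOffset;
  return int8_t(std::max(-128, std::min(127, out)));
}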

Testing:
Added unittest.

Documentation:
#1727, #1762

@rdzhabarov
Contributor

Since it's not fixing the whole issue, I'd remove "fixes ..."; otherwise that issue will be closed.

@@ -90,7 +90,7 @@ class InterpreterFunction final : public CompiledFunction {
#define DEF_INSTR(CLASS, NAME) void fwd##CLASS(const CLASS *I);
#define DEF_BACKEND_SPECIFIC_INSTR(CLASS, NAME)
#include "glow/AutoGenInstr.def"

template <typename ElemTy = int8_t>

@@ -103,14 +103,16 @@ void InterpreterFunction::fwdConvolutionInst_FloatImpl(
}

// This is the quantized i8 implementation of Convolution.
// For bias, we support both int8 and int32 quantization.
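As a hedged illustration of the comment above (not the interpreter's actual code; the function and parameter names are assumptions), supporting both bias types amounts to templating the bias add on the bias element type and rescaling it into the accumulator's domain:

#include <cmath>
#include <cstdint>

// Illustrative sketch: add a quantized bias of either element type (int8_t or
// int32_t) to the int32 partial sum. matMulScale is inputScale * weightScale,
// the scale of the accumulator. When biasScale == matMulScale (the int32 case
// introduced by this PR), the ratio is 1 and this reduces to a plain add.
template <typename BiasTy>
int32_t addQuantizedBias(int32_t partialSum, BiasTy biasQ, int32_t biasOffset,
                         float biasScale, float matMulScale) {
  float realBias = float(biasQ - biasOffset) * biasScale;
  return partialSum + int32_t(std::lround(realBias / matMulScale));
}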

auto destHandle = destTensor->getHandle<int8_t>();
for (size_t i = 0, e = destHandle.size(); i < e; ++i) {
destHandle.raw(i) = quantization::quantize(srcHandle.raw(i), params);
if (destTensor->getType().getElementType() == ElemKind::Int8QTy) {

@@ -446,7 +445,6 @@ int main(int argc, char **argv) {
BB.newInstr("Quantize")
.addOperand("Dest", OperandKind::Out)
.addOperand("Src", OperandKind::In)
.autoVerify(VerifyKind::SameElementType, {"Dest", "ElemKind::Int8QTy"})

Contributor

@nadavrot nadavrot left a comment

@beicy Can you please describe the problem that this PR is solving? Is this a correctness problem? Are we adding a new feature? Do we need this to support some model?

@beicy I see you are doubling the number of convolutions in the code, and I am not sure I understand the motivation for this change. Why do we need both? What's wrong with the i8 version? If i32 is better, then why keep the i8?

@beicy
Contributor Author

beicy commented Oct 16, 2018

@nadavrot This is for loading the quantized ResNet50 model. In that model, the bias is quantized to int32_t, but our current version only supports an int8_t quantized bias. See the discussion in issue #1727.

@nadavrot
Contributor

@beicy Oh, got it. Thank you for pointing me to the discussion. In that case, let's remove the i8 version and just keep the i32 bias version.

@rdzhabarov
Contributor

rdzhabarov commented Oct 16, 2018

If we remove int8 bias completely in this same PR, rather than taking an intermediate step where both int32 and int8 bias are supported, we'd also need to modify the quantization procedure (the fp32 -> int8 net conversion) accordingly here.

@@ -76,6 +76,10 @@ enum Schema {
/// parameters \p TQP.
int8_t quantize(float input, const TensorQuantizationParams &TQP);

@beicy beicy force-pushed the ref branch 2 times, most recently from a63d194 to a2696c9 on October 18, 2018 at 02:56

/// Converts floating point value to int32 based on the quantization
/// parameters \p TQP.
int32_t quantizeInt32(float input, const TensorQuantizationParams &TQP);
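For context, a plausible body for the new helper (only the declaration above is from the diff; the implementation below is an assumption, written against the usual scale/offset fields of TensorQuantizationParams):

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>

// Assumed shape: the same formula as the int8 quantize(), but the result is
// clamped to the int32 range instead of the int8 range.
int32_t quantizeInt32(float input, const TensorQuantizationParams &TQP) {
  double result = double(input) / TQP.scale + TQP.offset;
  result = std::max(result, double(std::numeric_limits<int32_t>::min()));
  result = std::min(result, double(std::numeric_limits<int32_t>::max()));
  return int32_t(std::llround(result));
}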

auto mn = std::numeric_limits<DestTy>::min();
return std::max<SrcTy>(mn, std::min<SrcTy>(mx, in));
}

/// Converts floating point value to int8 based on the quantization

for (size_t i = 0, e = destHandle.size(); i < e; ++i) {
destHandle.raw(i) = quantization::quantize(srcHandle.raw(i), params);
if (destTensor->getType().getElementType() == ElemKind::Int8QTy) {
auto destHandle = destTensor->getHandle<int8_t>();

TQP.offset);
// For bias of a conv op, it is quantized to int32.
if (use.getKind() == glow::Kinded::Kind::ConvolutionNodeKind && idx == 2) {
return mod_.uniqueType(ElemKind::Int32QTy, val.dims(), TQP.scale,

@@ -446,7 +445,6 @@ int main(int argc, char **argv) {
BB.newInstr("Quantize")
.addOperand("Dest", OperandKind::Out)
.addOperand("Src", OperandKind::In)
.autoVerify(VerifyKind::SameElementType, {"Dest", "ElemKind::Int8QTy"})

@beicy beicy force-pushed the ref branch 2 times, most recently from 0d3d521 to ef16a44 on October 18, 2018 at 21:19
int8_t quantize(float input, const TensorQuantizationParams &TQP);
/// \returns the value \p in as clipped to the range of \p DestTy.
template <class SrcTy, class DestTy> DestTy clip(SrcTy in) {
assert(sizeof(SrcTy) >= sizeof(DestTy) && "Invalid types");
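For readers skimming the excerpt, a quick usage example of the clip helper (values are illustrative):

// clip narrows a wider integer value into the destination type's range.
int8_t a = clip<int32_t, int8_t>(300);    // clipped to 127
int8_t b = clip<int32_t, int8_t>(-300);   // clipped to -128
int32_t c = clip<int64_t, int32_t>(42);   // already in range, stays 42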

@@ -1397,7 +1397,8 @@ QuantizeNode *Function::createQuantize(llvm::StringRef name, NodeValue input,
TypeRef outTy) {
assert(input.getElementType() == ElemKind::FloatTy &&
"Input must be a floating type");
assert(outTy->getElementType() == ElemKind::Int8QTy &&
assert((outTy->getElementType() == ElemKind::Int8QTy ||
outTy->getElementType() == ElemKind::Int32QTy) &&
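A hedged usage sketch of what the relaxed assert enables (the variables mod, F, biasFloat, inputScale, and weightScale are assumptions, not from the diff; createQuantize and uniqueType are the Glow APIs shown or referenced elsewhere in this PR):

// Illustrative: quantize a float bias into Int32QTy now that the assert allows it.
auto biasTy = mod.uniqueType(ElemKind::Int32QTy, biasFloat.dims(),
                             inputScale * weightScale, /* offset */ 0);
QuantizeNode *qBias = F->createQuantize("bias_quantize", biasFloat, biasTy);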

Contributor

@rdzhabarov rdzhabarov left a comment

Looks mostly good to me. Can you verify that you can dump and load the profile for ResNet50?

if (use.getKind() == glow::Kinded::Kind::ConvolutionNodeKind && idx == 2) {
// For bias of a conv op, it is quantized to int32. Also, we should make
// sure its scale should be (scale of input) * (scale of weights).
NodeValue val1 = use.getNthInput(0);

// sure its scale should be (scale of input) * (scale of weights).
NodeValue val1 = use.getNthInput(0);
NodeValue val2 = use.getNthInput(1);
float scale1 = val1.getNode()->getNthResult(0).getType()->getScale();
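A small worked example of that scale rule (illustrative numbers, not from the model): fixing the bias scale to inputScale * weightScale makes typical bias magnitudes need the int32 range.

#include <cmath>
#include <cstdint>
#include <cstdio>

int main() {
  float inputScale = 0.02f, weightScale = 0.005f;          // made-up scales
  float biasScale = inputScale * weightScale;              // 0.0001
  int32_t biasQ = int32_t(std::lround(1.0f / biasScale));  // a 1.0f bias -> 10000
  // 10000 fits comfortably in int32 but not in int8's [-128, 127]; an int8 bias
  // would need its own, much coarser scale plus a rescale at runtime, which is
  // the accuracy drop described above.
  std::printf("quantized bias = %d\n", biasQ);
  return 0;
}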

assert(destTy->getElementType() == ElemKind::Int8QTy && "");
assert((destTy->getElementType() == ElemKind::Int8QTy ||
destTy->getElementType() == ElemKind::Int32QTy) &&
"");

@beicy
Contributor Author

beicy commented Oct 23, 2018

@rdzhabarov Thanks for reminding me of this test. Now we can dump and load the profile for ResNet50.
With int8 quantized bias:

File: tests/images/imagenet/cat_285.png    Label-K1: 285 (probability: 0.5875)
File: tests/images/imagenet/dog_207.png    Label-K1: 207 (probability: 0.9599)
File: tests/images/imagenet/zebra_340.png  Label-K1: 340 (probability: 0.9881)

With int32 quantized bias:

File: tests/images/imagenet/cat_285.png    Label-K1: 285 (probability: 0.5655)
File: tests/images/imagenet/dog_207.png    Label-K1: 207 (probability: 0.9506)
File: tests/images/imagenet/zebra_340.png  Label-K1: 340 (probability: 0.9893)

Contributor

@nadavrot nadavrot left a comment

LGTM.

for (size_t i = 0, e = destH.size(); i < e; ++i) {
destH.raw(i) = quantization::quantize<eTy>(srcH.raw(i), params);
}
}

Contributor

@rdzhabarov rdzhabarov left a comment

Awesome work!

@beicy beicy merged commit 3afb477 into pytorch:master on Oct 25, 2018
@nadavrot
Contributor

Nice! It's good to see this change landing.
