[Quantization] Support int32 quantized bias for quantized Conv #1876
Conversation
Since it's not fixing the whole issue, I'd remove the "fixes ..." line; otherwise that issue will be closed.
@@ -90,7 +90,7 @@ class InterpreterFunction final : public CompiledFunction {
#define DEF_INSTR(CLASS, NAME) void fwd##CLASS(const CLASS *I);
#define DEF_BACKEND_SPECIFIC_INSTR(CLASS, NAME)
#include "glow/AutoGenInstr.def"

template <typename ElemTy = int8_t>
@@ -103,14 +103,16 @@ void InterpreterFunction::fwdConvolutionInst_FloatImpl(
}

// This is the quantized i8 implementation of Convolution.
// For bias, we support both int8 and int32 quantization.
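To make the idea concrete, here is a minimal, self-contained sketch (hypothetical names, not Glow's actual kernel): the products of the quantized inputs and weights are accumulated in int32, and a bias quantized with scale = inScale * wScale can be added directly to that accumulator whether it is stored as int8 or int32.

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of the inner loop of a quantized convolution, templated on the bias
// element type so both int8 and int32 bias are supported.
// Assumes the bias was quantized with scale = inScale * wScale and offset 0.
template <typename BiasTy>
float quantizedDotWithBias(const std::vector<int8_t> &in, int32_t inOffset,
                           float inScale, const std::vector<int8_t> &w,
                           int32_t wOffset, float wScale, BiasTy bias) {
  assert(in.size() == w.size());
  int32_t sum = 0;
  for (size_t i = 0, e = in.size(); i < e; ++i) {
    // Subtract the offsets and accumulate in int32.
    sum += (int32_t(in[i]) - inOffset) * (int32_t(w[i]) - wOffset);
  }
  // The accumulator's effective scale is inScale * wScale, so the bias adds
  // in without rescaling.
  sum += int32_t(bias);
  return float(sum) * inScale * wScale; // Dequantize the result.
}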
auto destHandle = destTensor->getHandle<int8_t>();
for (size_t i = 0, e = destHandle.size(); i < e; ++i) {
destHandle.raw(i) = quantization::quantize(srcHandle.raw(i), params);
if (destTensor->getType().getElementType() == ElemKind::Int8QTy) {
@@ -446,7 +445,6 @@ int main(int argc, char **argv) { | |||
BB.newInstr("Quantize") | |||
.addOperand("Dest", OperandKind::Out) | |||
.addOperand("Src", OperandKind::In) | |||
.autoVerify(VerifyKind::SameElementType, {"Dest", "ElemKind::Int8QTy"}) |
@beicy Can you please describe the problem that this PR is solving? Is this a correctness problem? Are we adding a new feature? Do we need this to support some model?
@beicy I see you doubling the number of convolutions in the code, and I'm not sure I understand the motivation for this change. Why do we need both? What's wrong with the i8 version? If i32 is better, then why keep the i8?
@beicy oh, got it. Thank you for pointing me to the discussion. In that case, let's remove the i8 version and just keep the i32 bias version.
If we remove int8 bias completely in this same PR, rather than going through an intermediate step that supports both int32 and int8 bias, we would also need to modify the quantization procedure (the fp32 -> int8 net conversion) accordingly here.
@@ -76,6 +76,10 @@ enum Schema {
/// parameters \p TQP.
int8_t quantize(float input, const TensorQuantizationParams &TQP);
/// Converts floating point value to int32 based on the quantization
/// parameters \p TQP.
int32_t quantizeInt32(float input, const TensorQuantizationParams &TQP);
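A plausible implementation sketch for such a helper (hypothetical; the real declaration takes a TensorQuantizationParams, here the scale and offset are passed directly): divide by the scale, add the offset, round to the nearest integer, and saturate to the int32 range.

#include <cmath>
#include <cstdint>
#include <limits>

// Sketch only: quantize a float to int32 given a scale and offset.
int32_t quantizeInt32Sketch(float input, float scale, int32_t offset) {
  double result = std::round(double(input) / scale) + double(offset);
  const double lo = double(std::numeric_limits<int32_t>::min());
  const double hi = double(std::numeric_limits<int32_t>::max());
  if (result < lo) result = lo;
  if (result > hi) result = hi;
  return int32_t(result);
}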
auto mn = std::numeric_limits<DestTy>::min();
return std::max<SrcTy>(mn, std::min<SrcTy>(mx, in));
}

/// Converts floating point value to int8 based on the quantization
for (size_t i = 0, e = destHandle.size(); i < e; ++i) {
destHandle.raw(i) = quantization::quantize(srcHandle.raw(i), params);
if (destTensor->getType().getElementType() == ElemKind::Int8QTy) {
auto destHandle = destTensor->getHandle<int8_t>();
lib/Quantization/Quantization.cpp
TQP.offset);
// For bias of a conv op, it is quantized to int32.
if (use.getKind() == glow::Kinded::Kind::ConvolutionNodeKind && idx == 2) {
return mod_.uniqueType(ElemKind::Int32QTy, val.dims(), TQP.scale,
int8_t quantize(float input, const TensorQuantizationParams &TQP);
/// \returns the value \p in as clipped to the range of \p DestTy.
template <class SrcTy, class DestTy> DestTy clip(SrcTy in) {
assert(sizeof(SrcTy) >= sizeof(DestTy) && "Invalid types");
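Putting the two quoted fragments together, clip is a saturating cast; below is a self-contained sketch reassembled from the quoted lines, with a usage example.

#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

/// \returns the value \p in clipped to the representable range of \p DestTy.
template <class SrcTy, class DestTy> DestTy clip(SrcTy in) {
  assert(sizeof(SrcTy) >= sizeof(DestTy) && "Invalid types");
  auto mx = std::numeric_limits<DestTy>::max();
  auto mn = std::numeric_limits<DestTy>::min();
  return std::max<SrcTy>(mn, std::min<SrcTy>(mx, in));
}

// Example: clip<int32_t, int8_t>(300) saturates the int32 value to 127.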
@@ -1397,7 +1397,8 @@ QuantizeNode *Function::createQuantize(llvm::StringRef name, NodeValue input,
TypeRef outTy) {
assert(input.getElementType() == ElemKind::FloatTy &&
"Input must be a floating type");
assert(outTy->getElementType() == ElemKind::Int8QTy &&
assert((outTy->getElementType() == ElemKind::Int8QTy ||
outTy->getElementType() == ElemKind::Int32QTy) &&
Looking mostly good to me. Can you verify that you can dump and load the profile for resnet50?
lib/Quantization/Quantization.cpp
if (use.getKind() == glow::Kinded::Kind::ConvolutionNodeKind && idx == 2) {
// For bias of a conv op, it is quantized to int32. Also, we should make
// sure its scale should be (scale of input) * (scale of weights).
NodeValue val1 = use.getNthInput(0);
lib/Quantization/Quantization.cpp
// sure its scale should be (scale of input) * (scale of weights).
NodeValue val1 = use.getNthInput(0);
NodeValue val2 = use.getNthInput(1);
float scale1 = val1.getNode()->getNthResult(0).getType()->getScale();
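In other words, the constraint being enforced here is simply that the bias scale equals the product of the input and filter scales; a one-line sketch with hypothetical names:

// The conv bias is quantized with scale = (input scale) * (filter scale), so
// the quantized bias lives in the same fixed-point domain as the int32
// accumulator and can be added to it directly.
float computeConvBiasScale(float inputScale, float filterScale) {
  return inputScale * filterScale;
}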
lib/Quantization/Quantization.cpp
assert(destTy->getElementType() == ElemKind::Int8QTy && "");
assert((destTy->getElementType() == ElemKind::Int8QTy ||
destTy->getElementType() == ElemKind::Int32QTy) &&
"");
@rdzhabarov Thanks for reminding me about this test. Now we can dump and load the profile for resnet50.
With int32 quantized bias:
LGTM.
for (size_t i = 0, e = destH.size(); i < e; ++i) {
destH.raw(i) = quantization::quantize<eTy>(srcH.raw(i), params);
}
}
Awesome work!
Nice! It's good to see this change landing.
Description:
We are working on loading quantized ResNet50 model directly.
So far, we quantize weights and bias to int8, and both the interpreter and the CPU backend support int8 bias. However, this quantized ResNet50 model quantizes weights to int8 but bias to int32. This is because the partial sums of the matrix-matrix multiplication are accumulated in int32, so an int32 bias can be added directly to the int32 partial sum for better accuracy (int8 bias caused an accuracy drop).
We now plan to add support for Conv with int32 bias and eliminate int8 bias.
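To make the precision argument concrete, here is a toy example with made-up scales (not taken from the actual model): once the bias is expressed in the accumulator's fixed-point domain, its integer magnitude usually far exceeds what int8 can hold, while int32 represents it essentially exactly.

#include <cmath>
#include <cstdint>
#include <cstdio>

int main() {
  // Made-up scales for illustration only.
  float inputScale = 0.02f, weightScale = 0.005f;
  float biasScale = inputScale * weightScale; // 1e-4: the accumulator's scale.
  float bias = 0.37f;

  // In the accumulator's domain the bias becomes 3700, far outside int8's
  // [-128, 127], but trivially representable in int32.
  int32_t b32 = int32_t(std::lround(bias / biasScale));
  std::printf("quantized bias = %d, restored = %f\n", b32, double(b32) * biasScale);
  return 0;
}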
Testing:
Added unittest.
Documentation:
#1727, #1762