diff --git a/docs/source/validated_model_list.md b/docs/source/validated_model_list.md index fba2536cfe8..eaa32c0dfdb 100644 --- a/docs/source/validated_model_list.md +++ b/docs/source/validated_model_list.md @@ -1,21 +1,20 @@ +# Validated Models -Validated Models -====== Intel® Neural Compressor validated examples with multiple compression techniques. Links to typical examples can be found in the [example tables](https://github.com/intel/neural-compressor/blob/master/examples/README.md), and the performance/accuracy results are available here. 1. [Validated Quantization Examples](#Validated-Quantization-Examples) - 1.1. [TensorFlow Models with TensorFlow 2.15.0](#tensorflow-models-with-tensorflow-2150) + 1.1. [TensorFlow Models with TensorFlow 2.16.1](#tensorflow-models-with-tensorflow-2161) - 1.2. [PyTorch Models with Torch 2.2.1+cpu in PTQ Mode](#pytorch-models-with-torch-221cpu-in-ptq-mode) + 1.2. [Keras Models with keras 2.15.1](#keras-models-with-keras-2151) - 1.3. [PyTorch Models with Torch 2.2.1+cpu in QAT Mode](#pytorch-models-with-torch-221cpu-in-qat-mode) + 1.3. [PyTorch Models with Torch 2.3.0+cpu in PTQ Mode](#pytorch-models-with-torch-230cpu-in-ptq-mode) - 1.4. [PyTorch Models with Torch 2.0.1+cpu in WOQ Mode](#pytorch-models-with-torch-201cpu-in-woq-mode) + 1.4. [PyTorch Models with Torch 2.3.0+cpu in QAT Mode](#pytorch-models-with-torch-230cpu-in-qat-mode) - 1.5. [ONNX Models with ONNX Runtime 1.17.1](#onnx-models-with-onnx-runtime-1171) + 1.5. [PyTorch Models with Torch 2.3.0+cpu in IPEX Mode](#pytorch-models-with-torch-230cpu-in-ipex-mode) - 1.6. [ONNX Models with ONNX Runtime 1.15.0 in WOQ Mode](#onnx-models-with-onnx-runtime-1150-in-woq-mode) + 1.6. [ONNX Models with ONNX Runtime 1.18.1](#onnx-models-with-onnx-runtime-1181) 2. [Validated Pruning Examples](#Validated-Pruning-Examples) @@ -25,14 +24,14 @@ Intel® Neural Compressor validated examples with multiple compression technique ## Validated Quantization Examples -System summary: Test by Intel on 3/18/2024. 1-node, 1x Intel(R) Xeon(R) Platinum 8480+ @3.8GHz, 56 cores/socket, HT On, Turbo On, Total Memory 256GB (16x16GB DDR5 4800 MT/s [4800 MT/s]), BIOS 3A14.TEL2P1, microcode 0x2b0001b0, -CentOS Stream 8, gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), DL Models, Frameworks: TensorFlow/ONNXRT/PyTorch, Datatype: FP32/INT8/BF16. +System summary: Tested by Intel on 7/22/2024. 1-node, 1x Intel(R) Xeon(R) Platinum 8480+ @3.8GHz, 56 cores/socket, HT On, Turbo On, Total Memory 512GB (16x32GB DDR5 4800 MT/s [4800 MT/s]), BIOS EGSDCRB1.SYS.0081.D18.2205301336, microcode 0x2b000590, +Ubuntu 24.04 LTS, gcc (GCC) 13.2.0 (Ubuntu 13.2.0-23ubuntu4), DL Models, Frameworks: TensorFlow/ONNXRT/PyTorch, Datatype: FP32/INT8/BF16. Using 1 socket, 4 cores/instance, 14 instances and batch size 1 to benchmark most of the models. Performance varies by use, configuration and other factors.
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks -### TensorFlow Models with TensorFlow 2.15.0 +### TensorFlow Models with TensorFlow 2.16.1 @@ -58,9 +57,9 @@ For more complete information about performance and benchmark results, visit www - - - + + + @@ -68,9 +67,9 @@ For more complete information about performance and benchmark results, visit www - - - + + + @@ -78,29 +77,29 @@ For more complete information about performance and benchmark results, visit www - - - + + + - - - - + + + + - - - - + + + + @@ -108,9 +107,9 @@ For more complete information about performance and benchmark results, visit www - - - + + + @@ -118,9 +117,9 @@ For more complete information about performance and benchmark results, visit www - - - + + + @@ -128,89 +127,109 @@ For more complete information about performance and benchmark results, visit www - - - + + + + + + + + + + + + + - - - - + + + + - - - - + + + + - - - - + + + + - - - - + + + + + + + + + + + + + + - - - - + + + + - - - - + + + + - - - - + + + + - + - - - - - - + + + + + + @@ -218,9 +237,9 @@ For more complete information about performance and benchmark results, visit www - - - + + + @@ -228,9 +247,9 @@ For more complete information about performance and benchmark results, visit www - - - + + + @@ -238,9 +257,9 @@ For more complete information about performance and benchmark results, visit www - - - + + + @@ -248,9 +267,9 @@ For more complete information about performance and benchmark results, visit www - - - + + + @@ -258,74 +277,215 @@ For more complete information about performance and benchmark results, visit www - - - + + + - - - - + + + + - - - - + + + + - - + + - - - + + + - - + + - - - + + + - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - - - - - - + + + + + + + +
74.11% 74.27% -0.22%1720.00582.182.95x1732.92578.882.99x
ResNet50 v1.576.25% 76.46% -0.28%1517.38570.652.66x1535.20530.002.90x
ResNet10177.52% 76.45% 1.41%1058.93382.962.77x1048.36384.022.73x
Inception V1 pb 70.45% 69.74%1.03%2080.56951.852.19x+1.03%2079.24927.822.24x
Inception V2 pb 74.33% 73.97%0.49%1587.53863.371.84x+0.49%1644.36840.531.96x
Inception V376.72% 76.75% -0.03%1052.91434.272.42x1076.10401.892.68x
Inception V480.13% 80.27% -0.18%707.41234.383.02x704.96199.283.54x
Inception ResNet V280.25% 80.40% -0.18%320.37179.461.79x313.97178.271.76x
DenseNet-161pb76.29%76.29%+0.00%279.20214.031.30x
MobileNet V1 pb 71.79% 70.96%1.18%4312.311512.592.85x+1.18%4199.131506.682.79x
MobileNet V2 pb 72.48% 71.76%1.01%2287.771406.751.63x+1.01%2170.391445.051.50x
VGG16 pb 72.69% 70.89%2.55%1367.34207.416.59x+2.55%1388.62203.396.83x
VGG19 pb 72.67% 71.01%2.33%1244.82176.797.04x+2.33%1236.12169.747.28x
ResNet50pb69.09%69.03%+0.09%411.79284.531.45x
ResNetV2 50 pb 70.37% 69.64%1.05%780.51582.961.34x+1.05%779.42539.541.44x
ResNetV2 101 pb 72.64% 71.87%1.08%494.43329.511.50x+1.08%492.00295.771.66x
ResNetV2 152 pb 73.12% 72.37%1.04%349.42235.481.48x+1.04%348.39205.721.69x
Densenet   161ViT pb76.29%76.29%0.00%282.31223.191.26x81.39%81.92%-0.64%230.53132.661.74x
SSD ResNet50 V137.91% 38.00% -0.24%139.4930.994.50x135.7128.754.72x
SSD MobileNet V123.00% 23.13% -0.57%1284.41756.561.70x1237.70719.301.72x
SSD ResNet50 v137.88% 38.00% -0.31%139.5627.795.02x130.5422.055.92x
SSD MobileNet v122.96% 23.13% -0.71%1280.88530.232.42x1234.56529.342.33x
Faster R-CNN ResNet10130.32% 30.39% -0.22%161.1923.806.77x144.2122.646.37x
Faster R-CNN ResNet50 pb 26.61% 26.59%0.09%178.8929.206.13x+0.09%164.5528.385.80x
YOLOv3 pb 83.28% 82.35%1.12%249.3594.442.64x+1.12%247.5681.453.04x
BERT large SQuAD pb92.4492.9992.44%92.99% -0.58%46.5420.372.28x49.1717.522.81x
BERT large SQuAD (ONNX Model Zoo) pb92.3692.9892.36%92.98% -0.67%42.6520.792.05x45.0617.552.57x
BERT base MRPCTransformer LTpb25.82%25.86%-0.15%28.9915.771.84x
Transformer lt MLPerfpb27.13%27.17%-0.13%10.275.082.02x
Mask R-CNN Inception V2pb28.46%28.73%-0.91%195.6850.723.86x
Mask R-CNN Inception V2 ckpt85.78%86.52%-0.85%390.36212.961.83x28.46%28.73%-0.91%206.1447.044.38x
+ +### Keras Models with keras 2.15.1 + + + + + + + + - + + + + + + + + + + + - - - - - - + + + + + + - -
ModelExampleAccuracyPerformance 1s4c14ins1bs
Throughput(samples/sec)
VITINT8FP32Accuracy Ratio
[(INT8-FP32)/FP32]
INT8FP32Performance Ratio
[INT8/FP32]
Inception ResNet V2 pb81.39%81.92%-0.64%230.91142.241.62x80.25%80.40%-0.18%313.97178.271.76x
+ + Inception V3 + pb + 76.72% + 76.75% + -0.03% + 1076.10 + 401.89 + 2.68x + + + MobileNet V2 + pb + 71.49% + 71.76% + -0.37% + 947.44 + 779.51 + 1.22x + + + ResNet101 + pb + 77.52% + 76.45% + +1.41% + 1048.36 + 384.02 + 2.73x + + + ResNet50 + pb + 69.09% + 69.03% + +0.09% + 411.79 + 284.53 + 1.45x + + + ResNet50 + pb + 78.07% + 78.12% + -0.06% + 680.56 + 498.08 + 1.37x + + + ResNetV2 101 + pb + 72.64% + 71.87% + +1.08% + 492.00 + 295.77 + 1.66x + + + ResNetV2 50 + pb + 70.37% + 69.64% + +1.05% + 779.42 + 539.54 + 1.44x + + + VGG16 + pb + 72.69% + 70.89% + +2.55% + 1388.62 + 203.39 + 6.83x + + + VGG19 + pb + 72.67% + 71.01% + +2.33% + 1236.12 + 169.74 + 7.28x + + -### PyTorch Models with Torch 2.2.1+cpu in PTQ Mode +### PyTorch Models with Torch 2.3.0+cpu in PTQ Mode @@ -351,9 +511,29 @@ For more complete information about performance and benchmark results, visit www - - - + + + + + + + + + + + + + + + + + + + + + + + @@ -361,9 +541,9 @@ For more complete information about performance and benchmark results, visit www - - - + + + @@ -371,9 +551,9 @@ For more complete information about performance and benchmark results, visit www - - - + + + @@ -381,9 +561,9 @@ For more complete information about performance and benchmark results, visit www - + - + @@ -391,89 +571,49 @@ For more complete information about performance and benchmark results, visit www - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + - - - - - - - - - - - - - - + + + + - + - - - - + + + + - + - - - - + + + + - + - - - - + + + + @@ -481,79 +621,69 @@ For more complete information about performance and benchmark results, visit www - - - + + + - - + + - - - - - - - - - - - - - + + + - + - - - - + + + + - - - - + + + + - + - - - - + + + + - + - - - + + + - + - - - - + + + + @@ -561,95 +691,113 @@ For more complete information about performance and benchmark results, visit www - - - + + + - + - - - + + + - + - - - - + + + + - + - - - - + + + + - + - - - - + + + + - + - - - - + + + + - - - - - - - - + + + + + + + + - - - - - - + + + + + + - - + + - - - + + + - -
69.59% 69.76% -0.24%1989.72600.453.31x1707.52602.472.83x
EfficientNet-B3static77.78%78.54%-0.98%513.82360.021.43x
PeleeNetstatic71.83%72.10%-0.37%837.83541.661.55x
ResNet5075.98% 76.15% -0.21%1165.92303.913.84x1135.22311.473.64x
Inception V369.46% 69.52% -0.09%953.35302.523.15x948.03322.552.94x
ResNeSt5080.76% 81.04% -0.35%365.44406.11 39.669.21x10.24x
ResNeXt101_32x8d78.92% 79.31% -0.49%548.78104.145.27x
Efficientnet_b0static76.94%77.67%-0.94%636.62566.421.12x
Efficientnet_b3static77.78%78.54%-0.98%471.61358.591.32x
Peleenetstatic71.83%72.10%-0.37%790.03504.441.57x582.22106.735.45x
YOLO V3 static 55.10% 54.93%0.31%162.9857.372.84x
SSD ResNet34static19.4819.63-0.77%137.8911.6111.88x+0.31%156.2960.302.59x
Roberta base MRPC static92.97%93.14% 93.59%-0.66%390.95175.442.23x-0.48%396.85176.802.24x
CamemBERT base MRPC static88.47%88.58% 89.28%-0.91%393.70174.512.26x-0.78%405.37182.872.22x
DistilBERT base MRPC static90.30%90.64% 90.27%0.04%783.37344.912.27x+0.41%799.05346.502.31x
DistilBERT base MRPC90.02% 90.27% -0.28%684.20344.681.99x705.91348.162.03x
ALBERT base MRPC static92.63%92.63%92.28%92.28% 0.00%312.48155.602.01x
Funnel   MRPCstatic91.94%92.25%-0.34%281.83179.041.57x350.78164.322.13x
Xlm Roberta MRPC static89.46%87.80% 88.62%0.94%395.91173.592.28x-0.93%396.06175.962.25x
Xlm Roberta MRPC dynamic 88.54% 88.24%0.35%373.90173.912.15x+0.35%381.19175.962.17x
BERT base MRPC static89.56%89.59% 90.42%-0.95%405.08176.382.30x-0.91%402.42177.732.26x
BERT base COLA static52.86%53.47% 53.39%-0.99%395.37177.37+0.16%395.25177.02 2.23x
BERT base STSB static87.39%87.61% 88.05%-0.74%396.71173.802.28x-0.49%397.62177.232.24x
BERT base SST-291.97% 92.32% -0.37%393.20173.652.26x407.66182.932.23x
BERT large COLA static62.80%63.39% 63.35%-0.88%136.5551.82+0.06%147.8656.01 2.64x
BERT base RTE static73.29%71.84% 72.56%1.00%377.79173.842.17x-1.00%397.83177.402.24x
BERT large MRPC static89.36%90.07% 90.38%-1.12%136.7251.872.64x-0.34%146.8452.972.77x
BERT large QNLI static90.79%91.12% 91.54%-0.82%391.67173.822.25x-0.46%394.51176.922.23x
BERT large RTE static73.29%73.65% 74.01%-0.98%135.2051.902.61x-0.49%148.8455.832.67x
BERT large RTEdynamic73.29%74.01%-0.98%117.1451.742.26xFunnel MRPC91.94%92.25%-0.34%294.76187.411.57x
BERT large SQuAD static92.2993.16-0.93%32.6116.881.93x92.34%93.16%-0.88%50.2118.692.69x
lvwerra/pegasus-samsum static42.3242.6742.32%42.67% -0.82%93.8037.592.50x102.7337.992.70x
- + + ResNet18 PT2E + static + 69.49% + 69.76% + -0.39% + 1873.51 + 1106.97 + 1.69x + + + OPT-125M PT2E + static + 37.07% + 37.90% + -2.20% + 42.09 + 29.68 + 1.42x + + -### PyTorch Models with Torch 2.2.1+cpu in QAT Mode +### PyTorch Models with Torch 2.3.0+cpu in QAT Mode @@ -675,9 +823,9 @@ For more complete information about performance and benchmark results, visit www - - - + + + @@ -685,9 +833,9 @@ For more complete information about performance and benchmark results, visit www - - - + + + @@ -695,715 +843,57 @@ For more complete information about performance and benchmark results, visit www - - - - - - - - - - - - - + + +
69.74% 69.76% -0.03%1981.66598.393.31x1717.59602.652.85x
ResNet5076.03% 76.15% -0.15%1095.95298.923.67x1091.62305.833.57x
ResNeXt101_32x8d79.31% 79.31% 0.00%549.02103.725.29x
BERT base MRPCstatic89.40%90.40%-1.11%375.61176.152.13x584.54107.385.44x
+### PyTorch Models with Torch 2.3.0+cpu in IPEX Mode -### PyTorch Models with Torch 2.0.1+cpu in WOQ Mode - - +
- - - - - - - - + + + + - - - - - - - + + + + + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + +
Model nameConfigurationLambada_openaiHellaswagWinograndePiqaAverage
[Mean accuracy of previous four tasks]
WikitextModelExampleAccuracyPerformance 1s4c14ins1bs
Throughput(samples/sec)
AccuracyAccuracyAccuracyAccuracyAccuracyAccuracy Ratio
[INT4/FP32]
Word_perplexityINT8FP32Accuracy Ratio
[(INT8-FP32)/FP32]
INT8FP32Performance Ratio
[INT8/FP32]
EleutherAI/gpt-j-6bFP320.68310.49540.64090.75410.6434/10.8816
GPTQ
W4G128Asym
0.6790.48950.64330.74760.63990.994511.0999
GPTQ
W4G32Asym
0.68290.49230.64010.74860.64100.996311.0141
GPTQ
W4G128Sym
0.6850.49070.63610.74430.63900.993211.1498
GPTQ
W4G32Sym
0.69110.48990.64480.74970.64391.000811.0927
facebook/opt-6.7bFP320.67690.50490.65430.76280.6497/12.2862
GPTQ
W4G32Asym
0.68040.49840.65350.75680.64730.996212.4193
GPTQ
W4G32Sym
0.68850.49730.64330.7530.64550.993512.4607
decapoda-research/llama-7b-hfFP320.73610.56420.67090.78350.6887/9.4202
GPTQ
W4G32Asym
0.72440.56030.66140.78350.68240.99099.5881
decapoda-research/llama-13b-hfFP320.76270.59110.70090.78780.7106/8.212
GPTQ
W4G128Asym
0.75180.58430.69610.79110.70580.99328.4319
GPTQ
W4G32Asym
0.75720.58980.70560.78940.71050.99988.3429
GPTQ
W4G128Sym
0.75960.58410.69770.79050.70800.99638.4916
decapoda-research/llama-30b-hfFP320.77590.62660.72770.80960.7350/6.2384
GPTQ
W4G128Asym
0.7780.6240.72690.80470.73340.99796.4237
GPTQ
W4G32Asym
0.77060.62390.72850.80580.73220.99636.4697
GPTQ
W4G128Sym
0.78360.61950.72690.80470.73370.99836.5604
meta-llama/Llama-2-7b-chat-hfFP320.70580.57320.6480.77150.6746/11.7107
GPTQ
W4G128Asym
0.69820.56370.65270.77040.67130.995011.9702
GPTQ
W4G32Asym
0.69530.56820.65750.77580.67420.999411.9317
meta-llama/Llama-2-7b-hfFP320.73920.5670.67090.78350.6902/8.7911
GPTQ
W4G32Asym
0.73530.56420.66220.78290.68620.99428.9635
GPTQ
W4G128Sym
0.72460.56170.67560.77970.68540.99319.2799
meta-llama/Llama-2-13b-chat-hfFP320.73120.60590.71030.78350.7077/10.2213
GPTQ
W4G128Asym
0.72730.60180.70880.77420.70300.99342538.083
GPTQ
W4G32Asym
0.72830.60530.70240.77640.70310.99351889.374
GPTQ
W4G128Sym
0.7270.59970.70240.7780.70180.99162504.497
meta-llama/Llama-2-13b-hfFP320.76770.59720.69610.78780.7122/7.8984
GPTQ
W4G128Asym
0.76270.59330.6890.78510.70750.99341556.448
GPTQ
W4G32Asym
0.76750.59340.69770.78560.71110.99841514.927
GPTQ
W4G128Sym
0.75660.58990.70320.78560.70880.99531374.728
bigscience/bloom-7b1FP320.57640.46280.64560.72690.6029/30.6438
GPTQ
W4G32Sym
0.57990.45420.63610.73120.60040.995732.0626
bigscience/bloomz-7b1FP320.55930.47890.65270.76280.6134/51.7432
GPTQ
W4G32Asym
0.55250.47310.65040.76170.60940.993552.7828
databricks/dolly-v1-6bFP320.68660.50980.64330.76220.6505/11.3242
GPTQ
W4G128Asym
0.68780.50580.63930.76330.64910.997811.5514
GPTQ
W4G32Asym
0.68640.50840.65190.75680.65091.000611.4728
GPTQ
W4G128Sym
0.68760.50450.64330.75410.64740.995211.6474
databricks/dolly-v2-7bFP320.63790.52820.6140.74480.6312/16.161
GPTQ
W4G32Asym
0.63770.52280.59910.74480.62610.991916.4096
EleutherAI/gpt-neo-2.7bFP320.62240.42710.5770.7220.5871/13.9359
GPTQ
W4G128Asym
0.61230.42270.57380.72030.58230.991714.3377
GPTQ
W4G32Asym
0.6150.42590.57140.72470.58430.995114.2083
GPTQ
W4G32Sym
0.61540.42080.57770.71980.58340.993714.3121
EleutherAI/gpt-neox-20bFP320.72330.53590.66140.77530.6740/9.195
GPTQ
W4G128Asym
0.71860.53280.65350.76990.66870.99229.3463
GPTQ
W4G32Asym
0.72680.5330.6590.77150.67260.99799.2897
mosaicml/mpt-7bFP320.70560.57180.68590.79270.6890/9.9324
GPTQ
W4G128Asym
0.70060.56550.68030.79650.68570.995210.1515
mosaicml/mpt-7b-chatFP320.6550.57520.67480.78450.6724/13.5951
GPTQ
W4G128Asym
0.64720.57160.66850.7840.66780.993213.8539
mosaicml/mpt-7b-instructFP320.69180.58190.6780.79270.6861/10.8863
GPTQ
W4G128Asym
0.68640.57650.68270.78730.68320.995811.1451
mosaicml/mpt-7b-storywriterFP320.6930.54770.6630.7840.6719/9.9125
GPTQ
W4G128Asym
0.68540.54430.66610.78130.66930.996110.1137
tiiuae/falcon-rw-7bFP320.66040.54190.65980.77530.6594/11.7616
GPTQ
W4G128Asym
0.64840.53690.65750.78070.65590.994711.9411
GPTQ
W4G32Asym
0.65710.53980.65820.77640.65790.997811.8809
GPTQ
W4G128Sym
0.6520.5350.65750.76820.65320.990612.0048
tiiuae/falcon-7b-instructFP320.64370.51770.66690.78240.6527/14.5053
GPTQ
W4G128Asym
0.63010.51420.66540.78350.64830.993314.8146
GPTQ
W4G32Asym
0.63770.5170.65980.78070.64880.994114.6953bert-large-uncased-whole-word-masking-finetuned-squadstatic93.01%93.16%-0.16%150.0522.426.69x
distilbert-base-uncased-distilled-squadstatic86.10%86.84%-0.85%1034.60151.136.85x
- -### ONNX Models with ONNX Runtime 1.17.1 +### ONNX Models with ONNX Runtime 1.18.1 @@ -1424,24 +914,24 @@ For more complete information about performance and benchmark results, visit www - + - + - - - - + + + + - + - - - - + + + + @@ -1449,19 +939,19 @@ For more complete information about performance and benchmark results, visit www - - - + + + - + - - - + + + @@ -1469,49 +959,19 @@ For more complete information about performance and benchmark results, visit www - - - + + + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + @@ -1519,9 +979,9 @@ For more complete information about performance and benchmark results, visit www - - - + + + @@ -1529,9 +989,9 @@ For more complete information about performance and benchmark results, visit www - - - + + + @@ -1539,19 +999,19 @@ For more complete information about performance and benchmark results, visit www - - - + + + - + - - - - + + + + @@ -1559,9 +1019,9 @@ For more complete information about performance and benchmark results, visit www - - - + + + @@ -1569,9 +1029,9 @@ For more complete information about performance and benchmark results, visit www - - - + + + @@ -1579,9 +1039,19 @@ For more complete information about performance and benchmark results, visit www - - - + + + + + + + + + + + + + @@ -1589,19 +1059,19 @@ For more complete information about performance and benchmark results, visit www - - - + + + - + - - - - + + + + @@ -1609,19 +1079,19 @@ For more complete information about performance and benchmark results, visit www - - - + + + - + - - - - + + + + @@ -1629,19 +1099,19 @@ For more complete information about performance and benchmark results, visit www - - - + + + - + - - - - + + + + @@ -1649,19 +1119,19 @@ For more complete information about performance and benchmark results, visit www - - - + + + - + - - - - + + + + @@ -1669,19 +1139,19 @@ For more complete information about performance and benchmark results, visit www - - - + + + - + - - - - + + + + @@ -1689,19 +1159,29 @@ For more complete information about performance and benchmark results, visit www - - - + + + - + - - - - + + + + + + + + + + + + + + @@ -1709,29 +1189,49 @@ For more complete information about performance and benchmark results, visit www - - - + + + + + + + + + + + + + - + - - - - - - + + + + + + + + + + + + + + + + - + - - - - - - + + + + + + @@ -1739,9 +1239,9 @@ For more complete information about performance and benchmark results, visit www - - - + + + @@ -1749,9 +1249,9 @@ For more complete information about performance and benchmark results, visit www - - - + + + @@ -1759,98 +1259,138 @@ For more complete information about performance and benchmark results, visit www - - - + + + - - - - + + + + - + - - - - + + + + - + - - - - + + + + - + - - - - + + + + - + - - - - + + + + - + - - - - + + + + - + - + + + + + + + + + + + + + + + + + + + + + - - - - + + + + - - + + - - - + + + - + + + + + + + + + + + + + + + + + + + + + - - + + @@ -1859,9 +1399,9 @@ For more complete information about performance and benchmark results, visit www - - - + + + @@ -1869,9 +1409,9 @@ For more complete information about performance and benchmark results, visit www - - - + + + @@ -1879,9 +1419,9 @@ For more complete information about performance and benchmark results, visit www - - - + + + @@ -1889,8 +1429,8 @@ For more complete information about performance and benchmark results, visit www - - + + @@ -1899,9 +1439,19 @@ For more complete information about performance and benchmark results, visit www - - - + + + + + + + + + + + + + @@ -1909,9 
+1459,9 @@ For more complete information about performance and benchmark results, visit www - - - + + + @@ -1919,8 +1469,8 @@ For more complete information about performance and benchmark results, visit www - - + + @@ -1928,10 +1478,10 @@ For more complete information about performance and benchmark results, visit www - - - - + + + + @@ -1939,9 +1489,9 @@ For more complete information about performance and benchmark results, visit www - - - + + + @@ -1949,18 +1499,38 @@ For more complete information about performance and benchmark results, visit www - - + + + + + + + + + + + + + + + + + + + + + + - - + + @@ -1969,9 +1539,9 @@ For more complete information about performance and benchmark results, visit www - - - + + + @@ -1979,9 +1549,9 @@ For more complete information about performance and benchmark results, visit www - - - + + + @@ -1989,19 +1559,19 @@ For more complete information about performance and benchmark results, visit www - - - + + + - - - - + + + + @@ -2009,9 +1579,9 @@ For more complete information about performance and benchmark results, visit www - - - + + + @@ -2019,9 +1589,9 @@ For more complete information about performance and benchmark results, visit www - - - + + + @@ -2029,254 +1599,241 @@ For more complete information about performance and benchmark results, visit www - - + + + + + + + + + + + + + + + + + + + + + + - - - - + + + + + + + + + + + + + + - - + + - - - + + + - - + + - - - + + + - - + + - - - + + + - - + + - - - + + + - - + + - - + + - - + + - - - + + + - - + + - - - + + + - - + + - - - + + + - - - - - - - - - - - - - - - - + + + + + + - + - - - - - - + + + + + + - + - - - - - - + + + + + + - + - - - - - - - - -
ResNet50   V1.5ResNet50 V1.5 qlinearops72.16%72.18% 72.29%-0.18%1666.73734.162.27x-0.16%1495.72715.942.09x
ResNet50 V1.5 qdq72.19%72.13% 72.29%-0.15%1658.10734.332.26x-0.23%1547.30717.032.16x
ResNet50 V1.5 MLPerf76.15% 76.46% -0.41%1495.15733.592.04x1365.56718.551.90x
ResNet50 V1.5 MLPerf qdq76.12%76.13% 76.46% -0.44%1661.90732.042.27x1445.75718.962.01x
ResNet50 V1.5 (ONNX Model Zoo)74.77% 74.99% -0.29%1713.86767.912.23x1574.38749.362.10x
ResNet50 V1.5 (ONNX Model Zoo) qdq74.48%74.78% 74.99%-0.67%1747.21770.142.27x
MobileNet V2qlinearops65.55%66.89%-2.01%7519.954430.841.70x
MobileNet V2qdq65.60%66.89%-1.93%7572.974413.581.72x
MobileNet V2 (ONNX Model Zoo)qlinearops68.51%69.48%-1.41%7190.264019.161.79x-0.27%1564.15755.582.07x
VGG1666.55% 66.69% -0.20%613.47170.953.59x526.57162.643.24x
VGG1666.62% 66.69% -0.11%611.78186.213.29x520.09172.423.02x
VGG16 (ONNX Model Zoo)72.37% 72.40% -0.04%619.00184.353.36x558.81162.873.43x
VGG16 (ONNX Model Zoo) qdq72.37%72.36% 72.40%-0.03%623.02172.273.62x-0.04%556.58176.923.15x
MobileNet V3 MLPerf75.51% 75.74% -0.30%5711.042584.172.21x5421.722578.082.10x
MobileNet V3 MLPerf75.51% 75.74% -0.30%6136.362630.212.33x5382.872567.482.10x
ShuffleNet V2 (ONNX Model Zoo)66.13% 66.36% -0.36%6820.893686.461.85x6426.223725.691.72x
ShuffleNet V2 (ONNX Model Zoo)qdq66.22%66.36%-0.22%6534.243707.741.76x
GoogleNet (ONNX Model Zoo)67.69% 67.79% -0.14%1971.181120.081.76x1842.901137.581.62x
GoogleNet (ONNX Model Zoo) qdq67.64%67.71% 67.79%-0.22%1838.281142.351.61x-0.11%1818.991136.371.60x
SqueezeNet (ONNX Model Zoo)56.49% 56.87% -0.67%10163.135771.891.76x9521.995530.361.72x
SqueezeNet (ONNX Model Zoo) qdq56.33%56.49% 56.87%-0.94%10339.146002.841.72x-0.67%9391.075519.791.70x
CaffeNet (ONNX Model Zoo)56.26% 56.30% -0.07%2805.961077.802.60x2949.36893.773.30x
CaffeNet (ONNX Model Zoo) qdq56.18%56.26% 56.30%-0.21%4351.65822.715.29x-0.08%2847.24901.153.16x
AlexNet (ONNX Model Zoo)54.73% 54.79% -0.10%2169.83893.062.43x2070.17816.712.53x
AlexNet (ONNX Model Zoo) qdq54.74%54.71% 54.79%-0.08%2232.07841.462.65x-0.14%2059.13844.972.44x
ZFNet (ONNX Model Zoo)55.83% 55.96% -0.24%921.09525.211.75x858.76461.251.86x
ZFNet (ONNX Model Zoo) qdq55.82%55.87% 55.96%-0.24%925.69534.051.73x-0.16%853.77457.911.86x
Inception V1 (ONNX Model Zoo)67.23% 67.24% -0.02%1862.371161.551.60x1891.361205.951.57x
Inception V1 (ONNX Model Zoo) qdq67.19%67.23% 67.24%-0.07%1956.471262.641.55x-0.02%1879.271202.191.56x
BEiT (ONNX Model Zoo)qlinearops85.07%85.28%-0.25%205.15126.591.62x
EfficientNet (ONNX Model Zoo)77.02% 77.11% -0.12%2793.231383.392.02x2428.321344.031.81x
EfficientNet (ONNX Model Zoo)qdq76.99%77.11%-0.16%2286.731307.181.75x
BEITDenseNet (ONNX Model Zoo) qlinearops85.0785.28-0.25%206.50128.131.61x60.53%60.96%-0.71%626.26499.761.25x
SSD MobileNet V1 (ONNX Model Zoo)qlinearops22.96%23.02%-0.27%1121.43841.321.33x
SSD (ONNX Model Zoo)SSD MobileNet V1 (ONNX Model Zoo) qdq18.62%18.98%-1.90%56.9714.573.91x22.96%23.02%-0.27%1048.50798.221.31x
DUC (ONNX Model Zoo)81.62% 81.92% -0.37%8.765.031.74x9.264.991.86x
Ultra Face (ONNX Model Zoo)83.33% 83.65% -0.38%8780.521920.304.57x8993.581988.464.52x
Emotion FERPlus (ONNX Model Zoo)7.94% 8.00% -0.70%6360.853067.122.07x6113.743087.501.98x
ArcFace (ONNX Model Zoo) qlinearops 99.82% 99.80%0.02%449.50235.011.91x+0.02%442.85230.751.92x
BERT base MRPC qlinearops85.78%85.54% 86.03%-0.28%511.36225.152.27x-0.57%483.81219.452.20x
BERT base MRPC qdq85.78%85.54% 86.03%-0.28%484.44222.432.18x-0.57%485.08218.332.22x
BERT base MRPC integerops85.78%85.29% 86.03%-0.28%728.48222.353.28x-0.85%684.46218.863.13x
DistilBERT base MRPC qdq85.05%84.07% 84.56%0.58%635.93405.581.57x-0.58%633.28399.311.59x
DistilBERT base MRPC integerops85.29%85.54% 84.56%0.87%1324.26405.483.27x+1.16%1388.44401.083.46x
Roberta base MRPCMobile bert MRPC qdq88.24%85.54%86.28%-0.85%505.62387.431.31x
Mobile bert MRPCintegerops85.54%86.28%-0.85%565.46386.391.46x
Roberta base MRPCintegerops90.93% 89.95%-1.91%484.00223.372.17x+1.09%702.17219.503.20x
BERT SQuAD (ONNX Model Zoo) integerops80.2980.6780.29%80.67% -0.47%244.9399.292.47x242.5897.712.48x
BERT base cased MRPC (HuggingFace)MobileBERT SQuAD MLPerf (ONNX Model Zoo)integerops89.87%90.03%-0.17%151.69125.351.21x
GPT2 lm head WikiText (ONNX Model Zoo)integerops31.98%29.00%+10.31%17.9610.211.76x
BERT base uncased MRPC (HuggingFace) qlinearops 90.21% 90.42% -0.23%440.17214.15434.65210.58 2.06x
89.58% 90.42% -0.93%715.22201.243.55x708.66210.743.36x
Roberta base MRPC (HuggingFace)91.00% 91.38% -0.41%434.48214.202.03x431.37211.032.04x
Roberta base MRPC (HuggingFace)90.85% 91.38% -0.58%714.20213.543.34x711.11210.713.37x
XLM Roberta base MRPC (HuggingFace)89.37% 90.10% -0.81%339.02214.41334.88211.56 1.58x
89.66% 90.10% -0.50%406.04215.121.89x401.99211.431.90x
Camembert base MRPC (HuggingFace)qlinearops89.28%89.28%0.00%282.30213.331.32x
Camembert base MRPC (HuggingFace)89.19% 89.28% -0.10%712.67217.683.27x707.22214.233.30x
MiniLM L12 H384 uncased MRPC (HuggingFace)90.13% 90.97% -0.93%1209.98588.931188.05578.35 2.05x
integerops 91.07% 90.97%0.10%1268.43588.052.16x+0.10%1285.13576.042.23x
DistilBERT base uncased SST-2 (HuggingFace)90.71% 91.06% -0.38%1253.85399.523.14x1259.69396.603.18x
DistilBERT base uncased SST-2 (HuggingFace)90.25% 91.06% -0.88%925.68399.54914.63395.09 2.32x
Albert base v2 SST-2 (HuggingFace)qlinearops92.09%92.32%-0.25%284.62210.521.35x
Albert base v2 SST-2 (HuggingFace)integerops91.74%92.32%-0.62%284.69210.001.36x
MiniLM L6 H384 uncased SST-2 (HuggingFace) qlinearops 89.45% 90.14% -0.76%2209.721139.622172.981121.66 1.94x
89.91% 90.14% -0.26%2365.971137.322.08x2326.271114.572.09x
BERT base cased MRPC (HuggingFace)87.70% 88.29% -0.67%497.73214.322.32x494.96210.802.35x
BERT base cased MRPC (HuggingFace)88.19% 88.29% -0.12%718.26214.323.35x714.61210.993.39x
Electra small discriminator MRPC (HuggingFace) qlinearops 89.92% 89.83%0.09%1951.071142.891.71x+0.09%1998.711115.181.79x
Electra small discriminator MRPC (HuggingFace)89.27% 89.83% -0.63%2198.931129.201.95x2202.811121.411.96x
BERT mini MRPC (HuggingFace)86.21% 86.52% -0.35%5814.173388.021.72x5767.233254.791.77x
BERT mini MRPC (HuggingFace)86.16% 86.52% -0.41%6396.893445.066354.663424.42 1.86x
Xlnet base cased MRPC (HuggingFace)qlinearops90.05%89.86%+0.21%121.2495.561.27x
Xlnet base cased MRPC (HuggingFace)integerops89.58%89.86%-0.31%123.0695.601.29x
BART large MRPC (HuggingFace) integerops 92.36% 91.20%1.28%126.3152.282.42x+1.28%126.1451.062.47x
DeBERTa v3 base MRPC (HuggingFace)integerops92.39%92.23%+0.17%193.16153.161.26x
Spanbert SQuAD (HuggingFace) qlinearops91.1491.9891.14%91.98% -0.91%75.8643.481.74x81.9643.361.89x
Spanbert SQuAD (HuggingFace) integerops91.4091.9891.40%91.98% -0.63%92.2443.512.12x101.7143.372.35x
Bert base multilingual cased SQuAD (HuggingFace) qlinearops88.4289.1388.42%89.13% -0.79%79.0643.451.82x86.3343.272.00x
Bert base multilingual cased SQuAD (HuggingFace) integerops88.7089.1388.70%89.13% -0.48%93.0343.232.15x101.7843.242.35x
DistilBert base uncased SQuAD (HuggingFace) qlinearops86.3386.8686.33%86.86% -0.62%118.6868.43120.7169.72 1.73x
DistilBert base uncased SQuAD (HuggingFace) integerops86.0586.8686.05%86.86% -0.94%186.3368.412.72x203.7169.682.92x
BERT large uncased whole word masking SQuAD (HuggingFace) qlinearops92.3493.1692.34%93.16% -0.88%28.6713.122.19x31.8112.942.46x
BERT large uncased whole word masking SQuAD (HuggingFace) integerops92.9993.1692.99%93.16% -0.18%32.3213.142.46x35.8312.942.77x
Roberta large SQuAD v2 (HuggingFace)integerops89.0489.020.02%32.3713.402.42x
LayoutLMv3 FUNSD (HuggingFace) qlinearops89.66%90.49%-0.91%47.6027.281.74x89.03%89.02%+0.02%17.6113.271.33x
LayoutLMv3 FUNSD (HuggingFace)Roberta large SQuAD v2 (HuggingFace) integerops89.95%90.49%-0.59%56.2627.432.05x89.04%89.02%+0.02%35.8513.262.70x
LayoutLMv2 (HuggingFace)GPT2 WikiText (HuggingFace) qlinearops80.95%81.17%-0.27%64.1438.911.65x30.25%29.00%+4.33%13.8510.171.36x
LayoutLMv2 (HuggingFace)GPT2 WikiText (HuggingFace) integerops80.60%81.17%-0.71%67.0138.841.73x
- -### ONNX Models with ONNX Runtime 1.15.0 in WOQ Mode - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + - - - - - - - - - - - + + + + + + + + - - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - + + + + + + + + - - - - + + + + + + + + - - - - - + + + + + + + + - - - - + + + + + + + + - -
Model nameConfigurationLambada_openaiAccuracy Ratio
[INT4/FP32]
AccuracyPerplexity
meta-llama/Llama-2-7b-chat-hfFP320.70583.2788/
GPTQ
W4G32Asym
0.70023.41240.9921
meta-llama/Llama-2-7b-hfFP320.73923.3950/29.68%29.00%+2.36%14.6410.091.45x
GPTQ
W4G32Asym
0.73123.57110.9892
meta-llama/Llama-2-13b-chat-hfFP320.73122.9163/DistilGPT2 WikiText (HuggingFace)qlinearops44.93%43.43%+3.46%21.8017.131.27x
GPTQ
W4G128Asym
0.72402.99450.9902
meta-llama/Llama-2-13b-hfFP320.76773.0438/DistilGPT2 WikiText (HuggingFace)integerops44.62%43.43%+2.74%23.0217.091.35x
GPTQ
W4G128Asym
0.76343.11860.9944
GPTQ
W4G32Asym
0.76153.12760.9919LayoutLMv3 FUNSD (HuggingFace)integerops90.07%90.49%-0.46%39.5028.001.41x
meta-llama/Llama-2-70b-chat-hfFP320.75432.6181/CodeBert (HuggingFace)qlinearops64.97%65.41%-0.67%75.6945.101.68x
RTN
W4G32Asym
0.75182.64960.9967CodeBert (HuggingFace)integerops64.93%65.41%-0.73%94.4745.102.09x
meta-llama/Llama-2-70b-hfFP320.79642.6612/FCN (ONNX Model Zoo)qlinearops64.54%64.98%-0.67%25.8312.902.00x
RTN
W4G32Sym
0.79412.72430.9971FCN (ONNX Model Zoo)qdq64.54%64.98%-0.67%25.9712.992.00x
- + ## Validated Pruning Examples @@ -2617,18 +2174,18 @@ For more complete information about performance and benchmark results, visit www ## Validated Knowledge Distillation Examples -| Example Name | Dataset | Student
(Metrics) | Teacher
(Metrics) | Student With Distillation
(Metrics Improvement) | Student With
Distributed Distillation
(Metrics Improvement) | -|---------------------|-----------|--------------------------------------|------------------------------------|-----------------------------------------------------|-----------------------------------------------------| -| MobileNet example | CIFAR-10 | MobileNetV2-0.35
(0.7965 ACC) | WideResNet40-2
(0.9522 ACC) | 0.8178 ACC
(0.0213 ACC) | 0.8235 ACC
(0.027 ACC) | -| CNN example | CIFAR-100 | CNN-2
(0.5494 ACC) | CNN-10
(0.7153 ACC) | 0.5540 ACC
(0.0046 ACC) | 0.5523 ACC
(0.0029 ACC) | -| VGG example | CIFAR-100 | VGG-8-BN
(0.7022 ACC) | VGG-13-BN
(0.7415 ACC) | 0.7025 ACC
(0.0003 ACC) | NA | -| ResNet example | ImageNet | ResNet18
(0.6739 ACC) | ResNet50
(0.7399 ACC) | 0.6845 ACC
(0.0106 ACC) | NA | -| BlendCnn example | MRPC | BlendCnn
(0.7034 ACC) | BERT-Base
(0.8382 ACC) | 0.7034 ACC
(0 ACC) | NA | -| BiLSTM example | SST-2 | BiLSTM
(0.8314 ACC) | RoBERTa-Base
(0.9403 ACC) | 0.9048 ACC
(0.0734 ACC) | NA | -|DistilBERT example | SQuAD | DistilBERT
(0.7323/0.8256 EM/F1) | BERT-Base
(0.8084/0.8814 EM/F1) | 0.7442/0.8371 EM/F1
(0.0119/0.0115 EM/F1) | NA | -|TinyBERT example | MNLI | TinyBERT
(0.8018/0.8044 m/mm) | BERT-Base
(0.8363/0.8411 m/mm) | 0.8025/0.8074 m/mm
(0.0007/0.0030 m/mm) | NA | -|BERT-3 example | QQP | BERT-3
(0.8626/0.8213 EM/F1) | BERT-Base
(0.9091/0.8782 EM/F1) | 0.8684/0.8259 EM/F1
(0.0058/0.0046 EM/F1) | NA | -|DistilRoBERTa example| COLA | DistilRoBERTa
(0.6057 ACC) | RoBERTa-Large
(0.6455 ACC) | 0.6187 ACC
(0.0130 ACC) | NA | +| Example Name | Dataset | Student
(Metrics) | Teacher
(Metrics) | Student With Distillation
(Metrics Improvement) | Student With
Distributed Distillation
(Metrics Improvement) | +| --------------------- | --------- | ----------------------------------- | ---------------------------------- | -------------------------------------------------- | ------------------------------------------------------------------ | +| MobileNet example | CIFAR-10 | MobileNetV2-0.35
(0.7965 ACC) | WideResNet40-2
(0.9522 ACC) | 0.8178 ACC
(0.0213 ACC) | 0.8235 ACC
(0.027 ACC) | +| CNN example | CIFAR-100 | CNN-2
(0.5494 ACC) | CNN-10
(0.7153 ACC) | 0.5540 ACC
(0.0046 ACC) | 0.5523 ACC
(0.0029 ACC) | +| VGG example | CIFAR-100 | VGG-8-BN
(0.7022 ACC) | VGG-13-BN
(0.7415 ACC) | 0.7025 ACC
(0.0003 ACC) | NA | +| ResNet example | ImageNet | ResNet18
(0.6739 ACC) | ResNet50
(0.7399 ACC) | 0.6845 ACC
(0.0106 ACC) | NA | +| BlendCnn example | MRPC | BlendCnn
(0.7034 ACC) | BERT-Base
(0.8382 ACC) | 0.7034 ACC
(0 ACC) | NA | +| BiLSTM example | SST-2 | BiLSTM
(0.8314 ACC) | RoBERTa-Base
(0.9403 ACC) | 0.9048 ACC
(0.0734 ACC) | NA | +| DistilBERT example | SQuAD | DistilBERT
(0.7323/0.8256 EM/F1) | BERT-Base
(0.8084/0.8814 EM/F1) | 0.7442/0.8371 EM/F1
(0.0119/0.0115 EM/F1) | NA | +| TinyBERT example | MNLI | TinyBERT
(0.8018/0.8044 m/mm) | BERT-Base
(0.8363/0.8411 m/mm) | 0.8025/0.8074 m/mm
(0.0007/0.0030 m/mm) | NA | +| BERT-3 example | QQP | BERT-3
(0.8626/0.8213 EM/F1) | BERT-Base
(0.9091/0.8782 EM/F1) | 0.8684/0.8259 EM/F1
(0.0058/0.0046 EM/F1) | NA | +| DistilRoBERTa example | COLA | DistilRoBERTa
(0.6057 ACC) | RoBERTa-Large
(0.6455 ACC) | 0.6187 ACC
(0.0130 ACC) | NA | ## Validated ONNX QDQ INT8 Models on Multiple Hardware through ONNX Runtime