diff --git a/docs/source/validated_model_list.md b/docs/source/validated_model_list.md
index fba2536cfe8..eaa32c0dfdb 100644
--- a/docs/source/validated_model_list.md
+++ b/docs/source/validated_model_list.md
@@ -1,21 +1,20 @@
+# Validated Models
-Validated Models
-======
 Intel® Neural Compressor validated examples with multiple compression techniques. The typical examples link can be found in [example tables](https://github.com/intel/neural-compressor/blob/master/examples/README.md), and the performance/accuracy results is available here.
 1. [Validated Quantization Examples](#Validated-Quantization-Examples)
- 1.1. [TensorFlow Models with TensorFlow 2.15.0](#tensorflow-models-with-tensorflow-2150)
+ 1.1. [TensorFlow Models with TensorFlow 2.16.1](#tensorflow-models-with-tensorflow-2161)
- 1.2. [PyTorch Models with Torch 2.2.1+cpu in PTQ Mode](#pytorch-models-with-torch-221cpu-in-ptq-mode)
+ 1.2. [Keras Models with keras 2.15.1](#keras-models-with-keras-2151)
- 1.3. [PyTorch Models with Torch 2.2.1+cpu in QAT Mode](#pytorch-models-with-torch-221cpu-in-qat-mode)
+ 1.3. [PyTorch Models with Torch 2.3.0+cpu in PTQ Mode](#pytorch-models-with-torch-230cpu-in-ptq-mode)
- 1.4. [PyTorch Models with Torch 2.0.1+cpu in WOQ Mode](#pytorch-models-with-torch-201cpu-in-woq-mode)
+ 1.4. [PyTorch Models with Torch 2.3.0+cpu in QAT Mode](#pytorch-models-with-torch-230cpu-in-qat-mode)
- 1.5. [ONNX Models with ONNX Runtime 1.17.1](#onnx-models-with-onnx-runtime-1171)
+ 1.5. [PyTorch Models with Torch 2.3.0+cpu in IPEX Mode](#pytorch-models-with-torch-230cpu-in-ipex-mode)
- 1.6. [ONNX Models with ONNX Runtime 1.15.0 in WOQ Mode](#onnx-models-with-onnx-runtime-1150-in-woq-mode)
+ 1.6. [ONNX Models with ONNX Runtime 1.18.1](#onnx-models-with-onnx-runtime-1181)
 2. [Validated Pruning Examples](#Validated-Pruning-Examples)
@@ -25,14 +24,14 @@ Intel® Neural Compressor validated examples with multiple compression technique
 ## Validated Quantization Examples
-System summary: Test by Intel on 3/18/2024. 1-node, 1x Intel(R) Xeon(R) Platinum 8480+ @3.8GHz, 56 cores/socket, HT On, Turbo On, Total Memory 256GB (16x16GB DDR5 4800 MT/s [4800 MT/s]), BIOS 3A14.TEL2P1, microcode 0x2b0001b0,
-CentOS Stream 8, gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), DL Models, Frameworks: TensorFlow/ONNXRT/PyTorch, Datatype: FP32/INT8/BF16.
+System summary: Test by Intel on 7/22/2024. 1-node, 1x Intel(R) Xeon(R) Platinum 8480+ @3.8GHz, 56 cores/socket, HT On, Turbo On, Total Memory 512GB (16x32GB DDR5 4800 MT/s [4800 MT/s]), BIOS EGSDCRB1.SYS.0081.D18.2205301336, microcode 0x2b000590,
+Ubuntu 24.04 LTS, gcc (GCC) 13.2.0 (Ubuntu 13.2.0-23ubuntu4), DL Models, Frameworks: TensorFlow/ONNXRT/PyTorch, Datatype: FP32/INT8/BF16.
 Using 1 socket, 4 cores/instance, 14 instances and batch size 1 to benchmark most of the model.
 Performance varies by use, configuration and other factors.
 For more complete information about performance and benchmark results, visit www.intel.com/benchmarks
-### TensorFlow Models with TensorFlow 2.15.0
+### TensorFlow Models with TensorFlow 2.16.1
74.11% | 74.27% | -0.22% | -1720.00 | -582.18 | -2.95x | +1732.92 | +578.88 | +2.99x | ||||||
ResNet50 v1.5 | @@ -68,9 +67,9 @@ For more complete information about performance and benchmark results, visit www76.25% | 76.46% | -0.28% | -1517.38 | -570.65 | -2.66x | +1535.20 | +530.00 | +2.90x | |||||
ResNet101 | @@ -78,29 +77,29 @@ For more complete information about performance and benchmark results, visit www77.52% | 76.45% | 1.41% | -1058.93 | -382.96 | -2.77x | +1048.36 | +384.02 | +2.73x | |||||
Inception V1 | pb | 70.45% | 69.74% | -1.03% | -2080.56 | -951.85 | -2.19x | ++1.03% | +2079.24 | +927.82 | +2.24x | |||
Inception V2 | pb | 74.33% | 73.97% | -0.49% | -1587.53 | -863.37 | -1.84x | ++0.49% | +1644.36 | +840.53 | +1.96x | |||
Inception V3 | @@ -108,9 +107,9 @@ For more complete information about performance and benchmark results, visit www76.72% | 76.75% | -0.03% | -1052.91 | -434.27 | -2.42x | +1076.10 | +401.89 | +2.68x | |||||
Inception V4 | @@ -118,9 +117,9 @@ For more complete information about performance and benchmark results, visit www80.13% | 80.27% | -0.18% | -707.41 | -234.38 | -3.02x | +704.96 | +199.28 | +3.54x | |||||
Inception ResNet V2 | @@ -128,89 +127,109 @@ For more complete information about performance and benchmark results, visit www80.25% | 80.40% | -0.18% | -320.37 | -179.46 | -1.79x | +313.97 | +178.27 | +1.76x | +|||||
DenseNet-161 | +pb | +76.29% | +76.29% | ++0.00% | +279.20 | +214.03 | +1.30x | |||||||
MobileNet V1 | pb | 71.79% | 70.96% | -1.18% | -4312.31 | -1512.59 | -2.85x | ++1.18% | +4199.13 | +1506.68 | +2.79x | |||
MobileNet V2 | pb | 72.48% | 71.76% | -1.01% | -2287.77 | -1406.75 | -1.63x | ++1.01% | +2170.39 | +1445.05 | +1.50x | |||
VGG16 | pb | 72.69% | 70.89% | -2.55% | -1367.34 | -207.41 | -6.59x | ++2.55% | +1388.62 | +203.39 | +6.83x | |||
VGG19 | pb | 72.67% | 71.01% | -2.33% | -1244.82 | -176.79 | -7.04x | ++2.33% | +1236.12 | +169.74 | +7.28x | +|||
ResNet50 | +pb | +69.09% | +69.03% | ++0.09% | +411.79 | +284.53 | +1.45x | |||||||
ResNetV2 50 | pb | 70.37% | 69.64% | -1.05% | -780.51 | -582.96 | -1.34x | ++1.05% | +779.42 | +539.54 | +1.44x | |||
ResNetV2 101 | pb | 72.64% | 71.87% | -1.08% | -494.43 | -329.51 | -1.50x | ++1.08% | +492.00 | +295.77 | +1.66x | |||
ResNetV2 152 | pb | 73.12% | 72.37% | -1.04% | -349.42 | -235.48 | -1.48x | ++1.04% | +348.39 | +205.72 | +1.69x | |||
Densenet 161 | +ViT | pb | -76.29% | -76.29% | -0.00% | -282.31 | -223.19 | -1.26x | +81.39% | +81.92% | +-0.64% | +230.53 | +132.66 | +1.74x |
SSD ResNet50 V1 | @@ -218,9 +237,9 @@ For more complete information about performance and benchmark results, visit www37.91% | 38.00% | -0.24% | -139.49 | -30.99 | -4.50x | +135.71 | +28.75 | +4.72x | |||||
SSD MobileNet V1 | @@ -228,9 +247,9 @@ For more complete information about performance and benchmark results, visit www23.00% | 23.13% | -0.57% | -1284.41 | -756.56 | -1.70x | +1237.70 | +719.30 | +1.72x | |||||
SSD ResNet50 v1 | @@ -238,9 +257,9 @@ For more complete information about performance and benchmark results, visit www37.88% | 38.00% | -0.31% | -139.56 | -27.79 | -5.02x | +130.54 | +22.05 | +5.92x | |||||
SSD MobileNet v1 | @@ -248,9 +267,9 @@ For more complete information about performance and benchmark results, visit www22.96% | 23.13% | -0.71% | -1280.88 | -530.23 | -2.42x | +1234.56 | +529.34 | +2.33x | |||||
Faster R-CNN ResNet101 | @@ -258,74 +277,215 @@ For more complete information about performance and benchmark results, visit www30.32% | 30.39% | -0.22% | -161.19 | -23.80 | -6.77x | +144.21 | +22.64 | +6.37x | |||||
Faster R-CNN ResNet50 | pb | 26.61% | 26.59% | -0.09% | -178.89 | -29.20 | -6.13x | ++0.09% | +164.55 | +28.38 | +5.80x | |||
YOLOv3 | pb | 83.28% | 82.35% | -1.12% | -249.35 | -94.44 | -2.64x | ++1.12% | +247.56 | +81.45 | +3.04x | |||
BERT large SQuAD | pb | -92.44 | -92.99 | +92.44% | +92.99% | -0.58% | -46.54 | -20.37 | -2.28x | +49.17 | +17.52 | +2.81x | ||
BERT large SQuAD (ONNX Model Zoo) | pb | -92.36 | -92.98 | +92.36% | +92.98% | -0.67% | -42.65 | -20.79 | -2.05x | +45.06 | +17.55 | +2.57x | ||
BERT base MRPC | +Transformer LT | +pb | +25.82% | +25.86% | +-0.15% | +28.99 | +15.77 | +1.84x | +||||||
Transformer lt MLPerf | +pb | +27.13% | +27.17% | +-0.13% | +10.27 | +5.08 | +2.02x | +|||||||
Mask R-CNN Inception V2 | +pb | +28.46% | +28.73% | +-0.91% | +195.68 | +50.72 | +3.86x | +|||||||
Mask R-CNN Inception V2 | ckpt | -85.78% | -86.52% | --0.85% | -390.36 | -212.96 | -1.83x | +28.46% | +28.73% | +-0.91% | +206.14 | +47.04 | +4.38x | +
Model | +Example | +Accuracy | +Performance 1s4c14ins1bs Throughput(samples/sec) |
||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
VIT | +INT8 | +FP32 | +Accuracy Ratio [(INT8-FP32)/FP32] |
+ INT8 | +FP32 | +Performance Ratio [INT8/FP32] |
+ |||||||
Inception ResNet V2 | pb | -81.39% | -81.92% | --0.64% | -230.91 | -142.24 | -1.62x | +80.25% | +80.40% | +-0.18% | +313.97 | +178.27 | +1.76x |
69.59% | 69.76% | -0.24% | -1989.72 | -600.45 | -3.31x | +1707.52 | +602.47 | +2.83x | + +|||||||
EfficientNet-B3 | +static | +77.78% | +78.54% | +-0.98% | +513.82 | +360.02 | +1.43x | +||||||||
PeleeNet | +static | +71.83% | +72.10% | +-0.37% | +837.83 | +541.66 | +1.55x | ||||||||
ResNet50 | @@ -361,9 +541,9 @@ For more complete information about performance and benchmark results, visit www75.98% | 76.15% | -0.21% | -1165.92 | -303.91 | -3.84x | +1135.22 | +311.47 | +3.64x | ||||||
Inception V3 | @@ -371,9 +551,9 @@ For more complete information about performance and benchmark results, visit www69.46% | 69.52% | -0.09% | -953.35 | -302.52 | -3.15x | +948.03 | +322.55 | +2.94x | ||||||
ResNeSt50 | @@ -381,9 +561,9 @@ For more complete information about performance and benchmark results, visit www80.76% | 81.04% | -0.35% | -365.44 | +406.11 | 39.66 | -9.21x | +10.24x | |||||||
ResNeXt101_32x8d | @@ -391,89 +571,49 @@ For more complete information about performance and benchmark results, visit www78.92% | 79.31% | -0.49% | -548.78 | -104.14 | -5.27x | -|||||||||
Efficientnet_b0 | -static | -76.94% | -77.67% | --0.94% | -636.62 | -566.42 | -1.12x | -||||||||
Efficientnet_b3 | -static | -77.78% | -78.54% | --0.98% | -471.61 | -358.59 | -1.32x | -||||||||
Peleenet | -static | -71.83% | -72.10% | --0.37% | -790.03 | -504.44 | -1.57x | +582.22 | +106.73 | +5.45x | |||||
YOLO V3 | static | 55.10% | 54.93% | -0.31% | -162.98 | -57.37 | -2.84x | -||||||||
SSD ResNet34 | -static | -19.48 | -19.63 | --0.77% | -137.89 | -11.61 | -11.88x | ++0.31% | +156.29 | +60.30 | +2.59x | ||||
Roberta base MRPC | static | -92.97% | +93.14% | 93.59% | --0.66% | -390.95 | -175.44 | -2.23x | +-0.48% | +396.85 | +176.80 | +2.24x | |||
CamemBERT base MRPC | static | -88.47% | +88.58% | 89.28% | --0.91% | -393.70 | -174.51 | -2.26x | +-0.78% | +405.37 | +182.87 | +2.22x | |||
DistilBERT base MRPC | static | -90.30% | +90.64% | 90.27% | -0.04% | -783.37 | -344.91 | -2.27x | ++0.41% | +799.05 | +346.50 | +2.31x | |||
DistilBERT base MRPC | @@ -481,79 +621,69 @@ For more complete information about performance and benchmark results, visit www90.02% | 90.27% | -0.28% | -684.20 | -344.68 | -1.99x | +705.91 | +348.16 | +2.03x | ||||||
ALBERT base MRPC | static | -92.63% | -92.63% | +92.28% | +92.28% | 0.00% | -312.48 | -155.60 | -2.01x | -||||||
Funnel MRPC | -static | -91.94% | -92.25% | --0.34% | -281.83 | -179.04 | -1.57x | +350.78 | +164.32 | +2.13x | |||||
Xlm Roberta MRPC | static | -89.46% | +87.80% | 88.62% | -0.94% | -395.91 | -173.59 | -2.28x | +-0.93% | +396.06 | +175.96 | +2.25x | |||
Xlm Roberta MRPC | dynamic | 88.54% | 88.24% | -0.35% | -373.90 | -173.91 | -2.15x | ++0.35% | +381.19 | +175.96 | +2.17x | ||||
BERT base MRPC | static | -89.56% | +89.59% | 90.42% | --0.95% | -405.08 | -176.38 | -2.30x | +-0.91% | +402.42 | +177.73 | +2.26x | |||
BERT base COLA | static | -52.86% | +53.47% | 53.39% | --0.99% | -395.37 | -177.37 | ++0.16% | +395.25 | +177.02 | 2.23x | ||||
BERT base STSB | static | -87.39% | +87.61% | 88.05% | --0.74% | -396.71 | -173.80 | -2.28x | +-0.49% | +397.62 | +177.23 | +2.24x | |||
BERT base SST-2 | @@ -561,95 +691,113 @@ For more complete information about performance and benchmark results, visit www91.97% | 92.32% | -0.37% | -393.20 | -173.65 | -2.26x | +407.66 | +182.93 | +2.23x | ||||||
BERT large COLA | static | -62.80% | +63.39% | 63.35% | --0.88% | -136.55 | -51.82 | ++0.06% | +147.86 | +56.01 | 2.64x | ||||
BERT base RTE | static | -73.29% | +71.84% | 72.56% | -1.00% | -377.79 | -173.84 | -2.17x | +-1.00% | +397.83 | +177.40 | +2.24x | |||
BERT large MRPC | static | -89.36% | +90.07% | 90.38% | --1.12% | -136.72 | -51.87 | -2.64x | +-0.34% | +146.84 | +52.97 | +2.77x | |||
BERT large QNLI | static | -90.79% | +91.12% | 91.54% | --0.82% | -391.67 | -173.82 | -2.25x | +-0.46% | +394.51 | +176.92 | +2.23x | |||
BERT large RTE | static | -73.29% | +73.65% | 74.01% | --0.98% | -135.20 | -51.90 | -2.61x | +-0.49% | +148.84 | +55.83 | +2.67x | |||
BERT large RTE | -dynamic | -73.29% | -74.01% | --0.98% | -117.14 | -51.74 | -2.26x | +Funnel MRPC | ++ | 91.94% | +92.25% | +-0.34% | +294.76 | +187.41 | +1.57x |
BERT large SQuAD | static | -92.29 | -93.16 | --0.93% | -32.61 | -16.88 | -1.93x | +92.34% | +93.16% | +-0.88% | +50.21 | +18.69 | +2.69x | ||
lvwerra/pegasus-samsum | static | -42.32 | -42.67 | +42.32% | +42.67% | -0.82% | -93.80 | -37.59 | -2.50x | +102.73 | +37.99 | +2.70x |
69.74% | 69.76% | -0.03% | -1981.66 | -598.39 | -3.31x | +1717.59 | +602.65 | +2.85x | ||
ResNet50 | @@ -685,9 +833,9 @@ For more complete information about performance and benchmark results, visit www76.03% | 76.15% | -0.15% | -1095.95 | -298.92 | -3.67x | +1091.62 | +305.83 | +3.57x | |
ResNeXt101_32x8d | @@ -695,715 +843,57 @@ For more complete information about performance and benchmark results, visit www79.31% | 79.31% | 0.00% | -549.02 | -103.72 | -5.29x | -||||
BERT base MRPC | -static | -89.40% | -90.40% | --1.11% | -375.61 | -176.15 | -2.13x | +584.54 | +107.38 | +5.44x |
Model name | -Configuration | -Lambada_openai | -Hellaswag | -Winogrande | -Piqa | -Average [Mean accuracy of previous four tasks] |
- Wikitext | +Model | +Example | +Accuracy | +Performance 1s4c14ins1bs Throughput(samples/sec) |
|||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Accuracy | -Accuracy | -Accuracy | -Accuracy | -Accuracy | -Accuracy Ratio [INT4/FP32] |
- Word_perplexity | +INT8 | +FP32 | +Accuracy Ratio [(INT8-FP32)/FP32] |
+ INT8 | +FP32 | +Performance Ratio [INT8/FP32] |
||||
EleutherAI/gpt-j-6b | -FP32 | -0.6831 | -0.4954 | -0.6409 | -0.7541 | -0.6434 | -/ | -10.8816 | -||||||||
GPTQ W4G128Asym |
- 0.679 | -0.4895 | -0.6433 | -0.7476 | -0.6399 | -0.9945 | -11.0999 | -|||||||||
GPTQ W4G32Asym |
- 0.6829 | -0.4923 | -0.6401 | -0.7486 | -0.6410 | -0.9963 | -11.0141 | -|||||||||
GPTQ W4G128Sym |
- 0.685 | -0.4907 | -0.6361 | -0.7443 | -0.6390 | -0.9932 | -11.1498 | -|||||||||
GPTQ W4G32Sym |
- 0.6911 | -0.4899 | -0.6448 | -0.7497 | -0.6439 | -1.0008 | -11.0927 | -|||||||||
facebook/opt-6.7b | -FP32 | -0.6769 | -0.5049 | -0.6543 | -0.7628 | -0.6497 | -/ | -12.2862 | -||||||||
GPTQ W4G32Asym |
- 0.6804 | -0.4984 | -0.6535 | -0.7568 | -0.6473 | -0.9962 | -12.4193 | -|||||||||
GPTQ W4G32Sym |
- 0.6885 | -0.4973 | -0.6433 | -0.753 | -0.6455 | -0.9935 | -12.4607 | -|||||||||
decapoda-research/llama-7b-hf | -FP32 | -0.7361 | -0.5642 | -0.6709 | -0.7835 | -0.6887 | -/ | -9.4202 | -||||||||
GPTQ W4G32Asym |
- 0.7244 | -0.5603 | -0.6614 | -0.7835 | -0.6824 | -0.9909 | -9.5881 | -|||||||||
decapoda-research/llama-13b-hf | -FP32 | -0.7627 | -0.5911 | -0.7009 | -0.7878 | -0.7106 | -/ | -8.212 | -||||||||
GPTQ W4G128Asym |
- 0.7518 | -0.5843 | -0.6961 | -0.7911 | -0.7058 | -0.9932 | -8.4319 | -|||||||||
GPTQ W4G32Asym |
- 0.7572 | -0.5898 | -0.7056 | -0.7894 | -0.7105 | -0.9998 | -8.3429 | -|||||||||
GPTQ W4G128Sym |
- 0.7596 | -0.5841 | -0.6977 | -0.7905 | -0.7080 | -0.9963 | -8.4916 | -|||||||||
decapoda-research/llama-30b-hf | -FP32 | -0.7759 | -0.6266 | -0.7277 | -0.8096 | -0.7350 | -/ | -6.2384 | -||||||||
GPTQ W4G128Asym |
- 0.778 | -0.624 | -0.7269 | -0.8047 | -0.7334 | -0.9979 | -6.4237 | -|||||||||
GPTQ W4G32Asym |
- 0.7706 | -0.6239 | -0.7285 | -0.8058 | -0.7322 | -0.9963 | -6.4697 | -|||||||||
GPTQ W4G128Sym |
- 0.7836 | -0.6195 | -0.7269 | -0.8047 | -0.7337 | -0.9983 | -6.5604 | -|||||||||
meta-llama/Llama-2-7b-chat-hf | -FP32 | -0.7058 | -0.5732 | -0.648 | -0.7715 | -0.6746 | -/ | -11.7107 | -||||||||
GPTQ W4G128Asym |
- 0.6982 | -0.5637 | -0.6527 | -0.7704 | -0.6713 | -0.9950 | -11.9702 | -|||||||||
GPTQ W4G32Asym |
- 0.6953 | -0.5682 | -0.6575 | -0.7758 | -0.6742 | -0.9994 | -11.9317 | -|||||||||
meta-llama/Llama-2-7b-hf | -FP32 | -0.7392 | -0.567 | -0.6709 | -0.7835 | -0.6902 | -/ | -8.7911 | -||||||||
GPTQ W4G32Asym |
- 0.7353 | -0.5642 | -0.6622 | -0.7829 | -0.6862 | -0.9942 | -8.9635 | -|||||||||
GPTQ W4G128Sym |
- 0.7246 | -0.5617 | -0.6756 | -0.7797 | -0.6854 | -0.9931 | -9.2799 | -|||||||||
meta-llama/Llama-2-13b-chat-hf | -FP32 | -0.7312 | -0.6059 | -0.7103 | -0.7835 | -0.7077 | -/ | -10.2213 | -||||||||
GPTQ W4G128Asym |
- 0.7273 | -0.6018 | -0.7088 | -0.7742 | -0.7030 | -0.9934 | -2538.083 | -|||||||||
GPTQ W4G32Asym |
- 0.7283 | -0.6053 | -0.7024 | -0.7764 | -0.7031 | -0.9935 | -1889.374 | -|||||||||
GPTQ W4G128Sym |
- 0.727 | -0.5997 | -0.7024 | -0.778 | -0.7018 | -0.9916 | -2504.497 | -|||||||||
meta-llama/Llama-2-13b-hf | -FP32 | -0.7677 | -0.5972 | -0.6961 | -0.7878 | -0.7122 | -/ | -7.8984 | -||||||||
GPTQ W4G128Asym |
- 0.7627 | -0.5933 | -0.689 | -0.7851 | -0.7075 | -0.9934 | -1556.448 | -|||||||||
GPTQ W4G32Asym |
- 0.7675 | -0.5934 | -0.6977 | -0.7856 | -0.7111 | -0.9984 | -1514.927 | -|||||||||
GPTQ W4G128Sym |
- 0.7566 | -0.5899 | -0.7032 | -0.7856 | -0.7088 | -0.9953 | -1374.728 | -|||||||||
bigscience/bloom-7b1 | -FP32 | -0.5764 | -0.4628 | -0.6456 | -0.7269 | -0.6029 | -/ | -30.6438 | -||||||||
GPTQ W4G32Sym |
- 0.5799 | -0.4542 | -0.6361 | -0.7312 | -0.6004 | -0.9957 | -32.0626 | -|||||||||
bigscience/bloomz-7b1 | -FP32 | -0.5593 | -0.4789 | -0.6527 | -0.7628 | -0.6134 | -/ | -51.7432 | -||||||||
GPTQ W4G32Asym |
- 0.5525 | -0.4731 | -0.6504 | -0.7617 | -0.6094 | -0.9935 | -52.7828 | -|||||||||
databricks/dolly-v1-6b | -FP32 | -0.6866 | -0.5098 | -0.6433 | -0.7622 | -0.6505 | -/ | -11.3242 | -||||||||
GPTQ W4G128Asym |
- 0.6878 | -0.5058 | -0.6393 | -0.7633 | -0.6491 | -0.9978 | -11.5514 | -|||||||||
GPTQ W4G32Asym |
- 0.6864 | -0.5084 | -0.6519 | -0.7568 | -0.6509 | -1.0006 | -11.4728 | -|||||||||
GPTQ W4G128Sym |
- 0.6876 | -0.5045 | -0.6433 | -0.7541 | -0.6474 | -0.9952 | -11.6474 | -|||||||||
databricks/dolly-v2-7b | -FP32 | -0.6379 | -0.5282 | -0.614 | -0.7448 | -0.6312 | -/ | -16.161 | -||||||||
GPTQ W4G32Asym |
- 0.6377 | -0.5228 | -0.5991 | -0.7448 | -0.6261 | -0.9919 | -16.4096 | -|||||||||
EleutherAI/gpt-neo-2.7b | -FP32 | -0.6224 | -0.4271 | -0.577 | -0.722 | -0.5871 | -/ | -13.9359 | -||||||||
GPTQ W4G128Asym |
- 0.6123 | -0.4227 | -0.5738 | -0.7203 | -0.5823 | -0.9917 | -14.3377 | -|||||||||
GPTQ W4G32Asym |
- 0.615 | -0.4259 | -0.5714 | -0.7247 | -0.5843 | -0.9951 | -14.2083 | -|||||||||
GPTQ W4G32Sym |
- 0.6154 | -0.4208 | -0.5777 | -0.7198 | -0.5834 | -0.9937 | -14.3121 | -|||||||||
EleutherAI/gpt-neox-20b | -FP32 | -0.7233 | -0.5359 | -0.6614 | -0.7753 | -0.6740 | -/ | -9.195 | -||||||||
GPTQ W4G128Asym |
- 0.7186 | -0.5328 | -0.6535 | -0.7699 | -0.6687 | -0.9922 | -9.3463 | -|||||||||
GPTQ W4G32Asym |
- 0.7268 | -0.533 | -0.659 | -0.7715 | -0.6726 | -0.9979 | -9.2897 | -|||||||||
mosaicml/mpt-7b | -FP32 | -0.7056 | -0.5718 | -0.6859 | -0.7927 | -0.6890 | -/ | -9.9324 | -||||||||
GPTQ W4G128Asym |
- 0.7006 | -0.5655 | -0.6803 | -0.7965 | -0.6857 | -0.9952 | -10.1515 | -|||||||||
mosaicml/mpt-7b-chat | -FP32 | -0.655 | -0.5752 | -0.6748 | -0.7845 | -0.6724 | -/ | -13.5951 | -||||||||
GPTQ W4G128Asym |
- 0.6472 | -0.5716 | -0.6685 | -0.784 | -0.6678 | -0.9932 | -13.8539 | -|||||||||
mosaicml/mpt-7b-instruct | -FP32 | -0.6918 | -0.5819 | -0.678 | -0.7927 | -0.6861 | -/ | -10.8863 | -||||||||
GPTQ W4G128Asym |
- 0.6864 | -0.5765 | -0.6827 | -0.7873 | -0.6832 | -0.9958 | -11.1451 | -|||||||||
mosaicml/mpt-7b-storywriter | -FP32 | -0.693 | -0.5477 | -0.663 | -0.784 | -0.6719 | -/ | -9.9125 | -||||||||
GPTQ W4G128Asym |
- 0.6854 | -0.5443 | -0.6661 | -0.7813 | -0.6693 | -0.9961 | -10.1137 | -|||||||||
tiiuae/falcon-rw-7b | -FP32 | -0.6604 | -0.5419 | -0.6598 | -0.7753 | -0.6594 | -/ | -11.7616 | -||||||||
GPTQ W4G128Asym |
- 0.6484 | -0.5369 | -0.6575 | -0.7807 | -0.6559 | -0.9947 | -11.9411 | -|||||||||
GPTQ W4G32Asym |
- 0.6571 | -0.5398 | -0.6582 | -0.7764 | -0.6579 | -0.9978 | -11.8809 | -|||||||||
GPTQ W4G128Sym |
- 0.652 | -0.535 | -0.6575 | -0.7682 | -0.6532 | -0.9906 | -12.0048 | -|||||||||
tiiuae/falcon-7b-instruct | -FP32 | -0.6437 | -0.5177 | -0.6669 | -0.7824 | -0.6527 | -/ | -14.5053 | -||||||||
GPTQ W4G128Asym |
- 0.6301 | -0.5142 | -0.6654 | -0.7835 | -0.6483 | -0.9933 | -14.8146 | -|||||||||
GPTQ W4G32Asym |
- 0.6377 | -0.517 | -0.6598 | -0.7807 | -0.6488 | -0.9941 | -14.6953 | +bert-large-uncased-whole-word-masking-finetuned-squad | +static | +93.01% | +93.16% | +-0.16% | +150.05 | +22.42 | +6.69x | +|
distilbert-base-uncased-distilled-squad | +static | +86.10% | +86.84% | +-0.85% | +1034.60 | +151.13 | +6.85x |
ResNet50 V1.5 | +ResNet50 V1.5 | qlinearops | -72.16% | +72.18% | 72.29% | --0.18% | -1666.73 | -734.16 | -2.27x | +-0.16% | +1495.72 | +715.94 | +2.09x | |
ResNet50 V1.5 | qdq | -72.19% | +72.13% | 72.29% | --0.15% | -1658.10 | -734.33 | -2.26x | +-0.23% | +1547.30 | +717.03 | +2.16x | ||
ResNet50 V1.5 MLPerf | @@ -1449,19 +939,19 @@ For more complete information about performance and benchmark results, visit www76.15% | 76.46% | -0.41% | -1495.15 | -733.59 | -2.04x | +1365.56 | +718.55 | +1.90x | |||||
ResNet50 V1.5 MLPerf | qdq | -76.12% | +76.13% | 76.46% | -0.44% | -1661.90 | -732.04 | -2.27x | +1445.75 | +718.96 | +2.01x | |||
ResNet50 V1.5 (ONNX Model Zoo) | @@ -1469,49 +959,19 @@ For more complete information about performance and benchmark results, visit www74.77% | 74.99% | -0.29% | -1713.86 | -767.91 | -2.23x | +1574.38 | +749.36 | +2.10x | |||||
ResNet50 V1.5 (ONNX Model Zoo) | qdq | -74.48% | +74.78% | 74.99% | --0.67% | -1747.21 | -770.14 | -2.27x | -||||||
MobileNet V2 | -qlinearops | -65.55% | -66.89% | --2.01% | -7519.95 | -4430.84 | -1.70x | -|||||||
MobileNet V2 | -qdq | -65.60% | -66.89% | --1.93% | -7572.97 | -4413.58 | -1.72x | -|||||||
MobileNet V2 (ONNX Model Zoo) | -qlinearops | -68.51% | -69.48% | --1.41% | -7190.26 | -4019.16 | -1.79x | +-0.27% | +1564.15 | +755.58 | +2.07x | |||
VGG16 | @@ -1519,9 +979,9 @@ For more complete information about performance and benchmark results, visit www66.55% | 66.69% | -0.20% | -613.47 | -170.95 | -3.59x | +526.57 | +162.64 | +3.24x | |||||
VGG16 | @@ -1529,9 +989,9 @@ For more complete information about performance and benchmark results, visit www66.62% | 66.69% | -0.11% | -611.78 | -186.21 | -3.29x | +520.09 | +172.42 | +3.02x | |||||
VGG16 (ONNX Model Zoo) | @@ -1539,19 +999,19 @@ For more complete information about performance and benchmark results, visit www72.37% | 72.40% | -0.04% | -619.00 | -184.35 | -3.36x | +558.81 | +162.87 | +3.43x | |||||
VGG16 (ONNX Model Zoo) | qdq | -72.37% | +72.36% | 72.40% | --0.03% | -623.02 | -172.27 | -3.62x | +-0.04% | +556.58 | +176.92 | +3.15x | ||
MobileNet V3 MLPerf | @@ -1559,9 +1019,9 @@ For more complete information about performance and benchmark results, visit www75.51% | 75.74% | -0.30% | -5711.04 | -2584.17 | -2.21x | +5421.72 | +2578.08 | +2.10x | |||||
MobileNet V3 MLPerf | @@ -1569,9 +1029,9 @@ For more complete information about performance and benchmark results, visit www75.51% | 75.74% | -0.30% | -6136.36 | -2630.21 | -2.33x | +5382.87 | +2567.48 | +2.10x | |||||
ShuffleNet V2 (ONNX Model Zoo) | @@ -1579,9 +1039,19 @@ For more complete information about performance and benchmark results, visit www66.13% | 66.36% | -0.36% | -6820.89 | -3686.46 | -1.85x | +6426.22 | +3725.69 | +1.72x | +|||||
ShuffleNet V2 (ONNX Model Zoo) | +qdq | +66.22% | +66.36% | +-0.22% | +6534.24 | +3707.74 | +1.76x | |||||||
GoogleNet (ONNX Model Zoo) | @@ -1589,19 +1059,19 @@ For more complete information about performance and benchmark results, visit www67.69% | 67.79% | -0.14% | -1971.18 | -1120.08 | -1.76x | +1842.90 | +1137.58 | +1.62x | |||||
GoogleNet (ONNX Model Zoo) | qdq | -67.64% | +67.71% | 67.79% | --0.22% | -1838.28 | -1142.35 | -1.61x | +-0.11% | +1818.99 | +1136.37 | +1.60x | ||
SqueezeNet (ONNX Model Zoo) | @@ -1609,19 +1079,19 @@ For more complete information about performance and benchmark results, visit www56.49% | 56.87% | -0.67% | -10163.13 | -5771.89 | -1.76x | +9521.99 | +5530.36 | +1.72x | |||||
SqueezeNet (ONNX Model Zoo) | qdq | -56.33% | +56.49% | 56.87% | --0.94% | -10339.14 | -6002.84 | -1.72x | +-0.67% | +9391.07 | +5519.79 | +1.70x | ||
CaffeNet (ONNX Model Zoo) | @@ -1629,19 +1099,19 @@ For more complete information about performance and benchmark results, visit www56.26% | 56.30% | -0.07% | -2805.96 | -1077.80 | -2.60x | +2949.36 | +893.77 | +3.30x | |||||
CaffeNet (ONNX Model Zoo) | qdq | -56.18% | +56.26% | 56.30% | --0.21% | -4351.65 | -822.71 | -5.29x | +-0.08% | +2847.24 | +901.15 | +3.16x | ||
AlexNet (ONNX Model Zoo) | @@ -1649,19 +1119,19 @@ For more complete information about performance and benchmark results, visit www54.73% | 54.79% | -0.10% | -2169.83 | -893.06 | -2.43x | +2070.17 | +816.71 | +2.53x | |||||
AlexNet (ONNX Model Zoo) | qdq | -54.74% | +54.71% | 54.79% | --0.08% | -2232.07 | -841.46 | -2.65x | +-0.14% | +2059.13 | +844.97 | +2.44x | ||
ZFNet (ONNX Model Zoo) | @@ -1669,19 +1139,19 @@ For more complete information about performance and benchmark results, visit www55.83% | 55.96% | -0.24% | -921.09 | -525.21 | -1.75x | +858.76 | +461.25 | +1.86x | |||||
ZFNet (ONNX Model Zoo) | qdq | -55.82% | +55.87% | 55.96% | --0.24% | -925.69 | -534.05 | -1.73x | +-0.16% | +853.77 | +457.91 | +1.86x | ||
Inception V1 (ONNX Model Zoo) | @@ -1689,19 +1159,29 @@ For more complete information about performance and benchmark results, visit www67.23% | 67.24% | -0.02% | -1862.37 | -1161.55 | -1.60x | +1891.36 | +1205.95 | +1.57x | |||||
Inception V1 (ONNX Model Zoo) | qdq | -67.19% | +67.23% | 67.24% | --0.07% | -1956.47 | -1262.64 | -1.55x | +-0.02% | +1879.27 | +1202.19 | +1.56x | +||
BEiT (ONNX Model Zoo) | +qlinearops | +85.07% | +85.28% | +-0.25% | +205.15 | +126.59 | +1.62x | |||||||
EfficientNet (ONNX Model Zoo) | @@ -1709,29 +1189,49 @@ For more complete information about performance and benchmark results, visit www77.02% | 77.11% | -0.12% | -2793.23 | -1383.39 | -2.02x | +2428.32 | +1344.03 | +1.81x | +|||||
EfficientNet (ONNX Model Zoo) | +qdq | +76.99% | +77.11% | +-0.16% | +2286.73 | +1307.18 | +1.75x | |||||||
BEIT | +DenseNet (ONNX Model Zoo) | qlinearops | -85.07 | -85.28 | --0.25% | -206.50 | -128.13 | -1.61x | +60.53% | +60.96% | +-0.71% | +626.26 | +499.76 | +1.25x | +
SSD MobileNet V1 (ONNX Model Zoo) | +qlinearops | +22.96% | +23.02% | +-0.27% | +1121.43 | +841.32 | +1.33x | |||||||
SSD (ONNX Model Zoo) | +SSD MobileNet V1 (ONNX Model Zoo) | qdq | -18.62% | -18.98% | --1.90% | -56.97 | -14.57 | -3.91x | +22.96% | +23.02% | +-0.27% | +1048.50 | +798.22 | +1.31x |
DUC (ONNX Model Zoo) | @@ -1739,9 +1239,9 @@ For more complete information about performance and benchmark results, visit www81.62% | 81.92% | -0.37% | -8.76 | -5.03 | -1.74x | +9.26 | +4.99 | +1.86x | |||||
Ultra Face (ONNX Model Zoo) | @@ -1749,9 +1249,9 @@ For more complete information about performance and benchmark results, visit www83.33% | 83.65% | -0.38% | -8780.52 | -1920.30 | -4.57x | +8993.58 | +1988.46 | +4.52x | |||||
Emotion FERPlus (ONNX Model Zoo) | @@ -1759,98 +1259,138 @@ For more complete information about performance and benchmark results, visit www7.94% | 8.00% | -0.70% | -6360.85 | -3067.12 | -2.07x | +6113.74 | +3087.50 | +1.98x | |||||
ArcFace (ONNX Model Zoo) | qlinearops | 99.82% | 99.80% | -0.02% | -449.50 | -235.01 | -1.91x | ++0.02% | +442.85 | +230.75 | +1.92x | |||
BERT base MRPC | qlinearops | -85.78% | +85.54% | 86.03% | --0.28% | -511.36 | -225.15 | -2.27x | +-0.57% | +483.81 | +219.45 | +2.20x | ||
BERT base MRPC | qdq | -85.78% | +85.54% | 86.03% | --0.28% | -484.44 | -222.43 | -2.18x | +-0.57% | +485.08 | +218.33 | +2.22x | ||
BERT base MRPC | integerops | -85.78% | +85.29% | 86.03% | --0.28% | -728.48 | -222.35 | -3.28x | +-0.85% | +684.46 | +218.86 | +3.13x | ||
DistilBERT base MRPC | qdq | -85.05% | +84.07% | 84.56% | -0.58% | -635.93 | -405.58 | -1.57x | +-0.58% | +633.28 | +399.31 | +1.59x | ||
DistilBERT base MRPC | integerops | -85.29% | +85.54% | 84.56% | -0.87% | -1324.26 | -405.48 | -3.27x | ++1.16% | +1388.44 | +401.08 | +3.46x | ||
Roberta base MRPC | +Mobile bert MRPC | qdq | -88.24% | +85.54% | +86.28% | +-0.85% | +505.62 | +387.43 | +1.31x | +|||||
Mobile bert MRPC | +integerops | +85.54% | +86.28% | +-0.85% | +565.46 | +386.39 | +1.46x | +|||||||
Roberta base MRPC | +integerops | +90.93% | 89.95% | --1.91% | -484.00 | -223.37 | -2.17x | ++1.09% | +702.17 | +219.50 | +3.20x | |||
BERT SQuAD (ONNX Model Zoo) | integerops | -80.29 | -80.67 | +80.29% | +80.67% | -0.47% | -244.93 | -99.29 | -2.47x | +242.58 | +97.71 | +2.48x | ||
BERT base cased MRPC (HuggingFace) | +MobileBERT SQuAD MLPerf (ONNX Model Zoo) | +integerops | +89.87% | +90.03% | +-0.17% | +151.69 | +125.35 | +1.21x | +||||||
GPT2 lm head WikiText (ONNX Model Zoo) | +integerops | +31.98% | +29.00% | ++10.31% | +17.96 | +10.21 | +1.76x | +|||||||
BERT base uncased MRPC (HuggingFace) | qlinearops | 90.21% | 90.42% | -0.23% | -440.17 | -214.15 | +434.65 | +210.58 | 2.06x | |||||
89.58% | 90.42% | -0.93% | -715.22 | -201.24 | -3.55x | +708.66 | +210.74 | +3.36x | ||||||
Roberta base MRPC (HuggingFace) | @@ -1869,9 +1409,9 @@ For more complete information about performance and benchmark results, visit www91.00% | 91.38% | -0.41% | -434.48 | -214.20 | -2.03x | +431.37 | +211.03 | +2.04x | |||||
Roberta base MRPC (HuggingFace) | @@ -1879,9 +1419,9 @@ For more complete information about performance and benchmark results, visit www90.85% | 91.38% | -0.58% | -714.20 | -213.54 | -3.34x | +711.11 | +210.71 | +3.37x | |||||
XLM Roberta base MRPC (HuggingFace) | @@ -1889,8 +1429,8 @@ For more complete information about performance and benchmark results, visit www89.37% | 90.10% | -0.81% | -339.02 | -214.41 | +334.88 | +211.56 | 1.58x | ||||||
89.66% | 90.10% | -0.50% | -406.04 | -215.12 | -1.89x | +401.99 | +211.43 | +1.90x | +||||||
Camembert base MRPC (HuggingFace) | +qlinearops | +89.28% | +89.28% | +0.00% | +282.30 | +213.33 | +1.32x | |||||||
Camembert base MRPC (HuggingFace) | @@ -1909,9 +1459,9 @@ For more complete information about performance and benchmark results, visit www89.19% | 89.28% | -0.10% | -712.67 | -217.68 | -3.27x | +707.22 | +214.23 | +3.30x | |||||
MiniLM L12 H384 uncased MRPC (HuggingFace) | @@ -1919,8 +1469,8 @@ For more complete information about performance and benchmark results, visit www90.13% | 90.97% | -0.93% | -1209.98 | -588.93 | +1188.05 | +578.35 | 2.05x | ||||||
integerops | 91.07% | 90.97% | -0.10% | -1268.43 | -588.05 | -2.16x | ++0.10% | +1285.13 | +576.04 | +2.23x | ||||
DistilBERT base uncased SST-2 (HuggingFace) | @@ -1939,9 +1489,9 @@ For more complete information about performance and benchmark results, visit www90.71% | 91.06% | -0.38% | -1253.85 | -399.52 | -3.14x | +1259.69 | +396.60 | +3.18x | |||||
DistilBERT base uncased SST-2 (HuggingFace) | @@ -1949,18 +1499,38 @@ For more complete information about performance and benchmark results, visit www90.25% | 91.06% | -0.88% | -925.68 | -399.54 | +914.63 | +395.09 | 2.32x | ||||||
Albert base v2 SST-2 (HuggingFace) | +qlinearops | +92.09% | +92.32% | +-0.25% | +284.62 | +210.52 | +1.35x | +|||||||
Albert base v2 SST-2 (HuggingFace) | +integerops | +91.74% | +92.32% | +-0.62% | +284.69 | +210.00 | +1.36x | +|||||||
MiniLM L6 H384 uncased SST-2 (HuggingFace) | qlinearops | 89.45% | 90.14% | -0.76% | -2209.72 | -1139.62 | +2172.98 | +1121.66 | 1.94x | |||||
89.91% | 90.14% | -0.26% | -2365.97 | -1137.32 | -2.08x | +2326.27 | +1114.57 | +2.09x | ||||||
BERT base cased MRPC (HuggingFace) | @@ -1979,9 +1549,9 @@ For more complete information about performance and benchmark results, visit www87.70% | 88.29% | -0.67% | -497.73 | -214.32 | -2.32x | +494.96 | +210.80 | +2.35x | |||||
BERT base cased MRPC (HuggingFace) | @@ -1989,19 +1559,19 @@ For more complete information about performance and benchmark results, visit www88.19% | 88.29% | -0.12% | -718.26 | -214.32 | -3.35x | +714.61 | +210.99 | +3.39x | |||||
Electra small discriminator MRPC (HuggingFace) | qlinearops | 89.92% | 89.83% | -0.09% | -1951.07 | -1142.89 | -1.71x | ++0.09% | +1998.71 | +1115.18 | +1.79x | |||
Electra small discriminator MRPC (HuggingFace) | @@ -2009,9 +1579,9 @@ For more complete information about performance and benchmark results, visit www | 89.27% | 89.83% | -0.63% | -2198.93 | -1129.20 | -1.95x | +2202.81 | +1121.41 | +1.96x | |||||
BERT mini MRPC (HuggingFace) | @@ -2019,9 +1589,9 @@ For more complete information about performance and benchmark results, visit www | 86.21% | 86.52% | -0.35% | -5814.17 | -3388.02 | -1.72x | +5767.23 | +3254.79 | +1.77x | |||||
BERT mini MRPC (HuggingFace) | @@ -2029,254 +1599,241 @@ For more complete information about performance and benchmark results, visit www | 86.16% | 86.52% | -0.41% | -6396.89 | -3445.06 | +6354.66 | +3424.42 | 1.86x | ||||||
Xlnet base cased MRPC (HuggingFace) | +qlinearops | +90.05% | +89.86% | ++0.21% | +121.24 | +95.56 | +1.27x | +|||||||
Xlnet base cased MRPC (HuggingFace) | +integerops | +89.58% | +89.86% | +-0.31% | +123.06 | +95.60 | +1.29x | +|||||||
BART large MRPC (HuggingFace) | integerops | 92.36% | 91.20% | -1.28% | -126.31 | -52.28 | -2.42x | ++1.28% | +126.14 | +51.06 | +2.47x | +|||
DeBERTa v3 base MRPC (HuggingFace) | +integerops | +92.39% | +92.23% | ++0.17% | +193.16 | +153.16 | +1.26x | |||||||
Spanbert SQuAD (HuggingFace) | qlinearops | -91.14 | -91.98 | +91.14% | +91.98% | -0.91% | -75.86 | -43.48 | -1.74x | +81.96 | +43.36 | +1.89x | ||
Spanbert SQuAD (HuggingFace) | integerops | -91.40 | -91.98 | +91.40% | +91.98% | -0.63% | -92.24 | -43.51 | -2.12x | +101.71 | +43.37 | +2.35x | ||
Bert base multilingual cased SQuAD (HuggingFace) | qlinearops | -88.42 | -89.13 | +88.42% | +89.13% | -0.79% | -79.06 | -43.45 | -1.82x | +86.33 | +43.27 | +2.00x | ||
Bert base multilingual cased SQuAD (HuggingFace) | integerops | -88.70 | -89.13 | +88.70% | +89.13% | -0.48% | -93.03 | -43.23 | -2.15x | +101.78 | +43.24 | +2.35x | ||
DistilBert base uncased SQuAD (HuggingFace) | qlinearops | -86.33 | -86.86 | +86.33% | +86.86% | -0.62% | -118.68 | -68.43 | +120.71 | +69.72 | 1.73x | |||
DistilBert base uncased SQuAD (HuggingFace) | integerops | -86.05 | -86.86 | +86.05% | +86.86% | -0.94% | -186.33 | -68.41 | -2.72x | +203.71 | +69.68 | +2.92x | ||
BERT large uncased whole word masking SQuAD (HuggingFace) | qlinearops | -92.34 | -93.16 | +92.34% | +93.16% | -0.88% | -28.67 | -13.12 | -2.19x | +31.81 | +12.94 | +2.46x | ||
BERT large uncased whole word masking SQuAD (HuggingFace) | integerops | -92.99 | -93.16 | +92.99% | +93.16% | -0.18% | -32.32 | -13.14 | -2.46x | +35.83 | +12.94 | +2.77x | ||
Roberta large SQuAD v2 (HuggingFace) | -integerops | -89.04 | -89.02 | -0.02% | -32.37 | -13.40 | -2.42x |
LayoutLMv3 FUNSD (HuggingFace) | -qlinearops | -89.66% | -90.49% | --0.91% | -47.60 | -27.28 | -1.74x |
LayoutLMv3 FUNSD (HuggingFace) | -integerops | -89.95% | -90.49% | --0.59% | -56.26 | -27.43 | -2.05x |
LayoutLMv2 (HuggingFace) | -qlinearops | -80.95% | -81.17% | --0.27% | -64.14 | -38.91 | -1.65x |
LayoutLMv2 (HuggingFace) | -integerops | -80.60% | -81.17% | --0.71% | -67.01 | -38.84 | -1.73x |
Model name | -Configuration | -Lambada_openai | -Accuracy Ratio [INT4/FP32] |
Accuracy | -Perplexity |
meta-llama/Llama-2-7b-chat-hf | -FP32 | -0.7058 | -3.2788 | -/ |
GPTQ W4G32Asym | -0.7002 | -3.4124 | -0.9921 |
meta-llama/Llama-2-7b-hf | -FP32 | -0.7392 | -3.3950 | -/ |
GPTQ W4G32Asym | -0.7312 | -3.5711 | -0.9892 |
meta-llama/Llama-2-13b-chat-hf | -FP32 | -0.7312 | -2.9163 | -/ |
GPTQ W4G128Asym | -0.7240 | -2.9945 | -0.9902 |
meta-llama/Llama-2-13b-hf | -FP32 | -0.7677 | -3.0438 | -/ |
GPTQ W4G128Asym | -0.7634 | -3.1186 | -0.9944 |
GPTQ W4G32Asym | -0.7615 | -3.1276 | -0.9919 |
meta-llama/Llama-2-70b-chat-hf | -FP32 | -0.7543 | -2.6181 | -/ |
RTN W4G32Asym | -0.7518 | -2.6496 | -0.9967 |
meta-llama/Llama-2-70b-hf | -FP32 | -0.7964 | -2.6612 | -/ |
RTN W4G32Sym | -0.7941 | -2.7243 | -0.9971 |
Roberta large SQuAD v2 (HuggingFace) | +qlinearops | +89.03% | +89.02% | ++0.02% | +17.61 | +13.27 | +1.33x |
Roberta large SQuAD v2 (HuggingFace) | +integerops | +89.04% | +89.02% | ++0.02% | +35.85 | +13.26 | +2.70x |
GPT2 WikiText (HuggingFace) | +qlinearops | +30.25% | +29.00% | ++4.33% | +13.85 | +10.17 | +1.36x |
GPT2 WikiText (HuggingFace) | +integerops | +29.68% | +29.00% | ++2.36% | +14.64 | +10.09 | +1.45x |
DistilGPT2 WikiText (HuggingFace) | +qlinearops | +44.93% | +43.43% | ++3.46% | +21.80 | +17.13 | +1.27x |
DistilGPT2 WikiText (HuggingFace) | +integerops | +44.62% | +43.43% | ++2.74% | +23.02 | +17.09 | +1.35x |
LayoutLMv3 FUNSD (HuggingFace) | +integerops | +90.07% | +90.49% | +-0.46% | +39.50 | +28.00 | +1.41x |
CodeBert (HuggingFace) | +qlinearops | +64.97% | +65.41% | +-0.67% | +75.69 | +45.10 | +1.68x |
CodeBert (HuggingFace) | +integerops | +64.93% | +65.41% | +-0.73% | +94.47 | +45.10 | +2.09x |
FCN (ONNX Model Zoo) | +qlinearops | +64.54% | +64.98% | +-0.67% | +25.83 | +12.90 | +2.00x |
FCN (ONNX Model Zoo) | +qdq | +64.54% | +64.98% | +-0.67% | +25.97 | +12.99 | +2.00x |