After quantizing an AutoModelForSequenceClassification model using AutoFP8, I observed a slowdown in inference. In the log below, the left column shows the per-layer inference time for the bf16 linear layers, while the right column shows the scale and matrix-multiplication times for fp8. As shown, the combined fp8 scale and mm time is approximately double that of bf16. Is this expected behavior, or could the AutoFP8 team provide guidance on improving performance?
```
1 non-fp8: torch.Size([22, 768]) 0.05331993103027344 | 1 scale: 6.031990051269531e-05
2 | 2 mm: torch.Size([22, 768]) 0.2548956871032715
3 non-fp8: torch.Size([22, 768]) 0.0032389163970947266 | 3 scale: 4.982948303222656e-05
4 | 4 mm: torch.Size([22, 768]) 6.29425048828125e-05
5 non-fp8: torch.Size([22, 768]) 5.1021575927734375e-05 | 5 scale: 1.52587890625e-05
6 | 6 mm: torch.Size([22, 768]) 2.3365020751953125e-05
7 non-fp8: torch.Size([22, 3072]) 5.936622619628906e-05 | 7 scale: 4.172325134277344e-05
8 | 8 mm: torch.Size([22, 3072]) 4.649162292480469e-05
9 non-fp8: torch.Size([22, 768]) 1.52587890625e-05 | 9 scale: 9.775161743164062e-06
10 | 10 mm: torch.Size([22, 768]) 1.8835067749023438e-05
11 non-fp8: torch.Size([22, 768]) 1.4781951904296875e-05 | 11 scale: 1.1682510375976562e-05
12 | 12 mm: torch.Size([22, 768]) 2.0742416381835938e-05
13 non-fp8: torch.Size([22, 768]) 1.2159347534179688e-05 | 13 scale: 2.002716064453125e-05
14 | 14 mm: torch.Size([22, 768]) 2.0265579223632812e-05
15 non-fp8: torch.Size([22, 3072]) 1.5974044799804688e-05 | 15 scale: 1.8358230590820312e-05
16 | 16 mm: torch.Size([22, 3072]) 2.7418136596679688e-05
17 non-fp8: torch.Size([22, 768]) 9.059906005859375e-06 | 17 scale: 8.58306884765625e-06
18 | 18 mm: torch.Size([22, 768]) 1.52587890625e-05
19 non-fp8: torch.Size([22, 768]) 1.3589859008789062e-05 | 19 scale: 9.775161743164062e-06
20 | 20 mm: torch.Size([22, 768]) 1.6927719116210938e-05
21 non-fp8: torch.Size([22, 768]) 1.0728836059570312e-05 | 21 scale: 9.5367431640625e-06
22 | 22 mm: torch.Size([22, 768]) 1.6450881958007812e-05
23 non-fp8: torch.Size([22, 3072]) 1.1682510375976562e-05 | 23 scale: 7.62939453125e-06
24 | 24 mm: torch.Size([22, 3072]) 1.5735626220703125e-05
25 non-fp8: torch.Size([22, 768]) 9.298324584960938e-06 | 25 scale: 7.867813110351562e-06
26 | 26 mm: torch.Size([22, 768]) 1.4543533325195312e-05
27 non-fp8: torch.Size([22, 768]) 1.2159347534179688e-05 | 27 scale: 8.821487426757812e-06
28 | 28 mm: torch.Size([22, 768]) 1.6450881958007812e-05
29 non-fp8: torch.Size([22, 768]) 1.0251998901367188e-05 | 29 scale: 9.298324584960938e-06
30 | 30 mm: torch.Size([22, 768]) 1.8596649169921875e-05
31 non-fp8: torch.Size([22, 3072]) 1.1682510375976562e-05 | 31 scale: 7.62939453125e-06
32 | 32 mm: torch.Size([22, 3072]) 1.6450881958007812e-05
33 non-fp8: torch.Size([22, 768]) 9.5367431640625e-06 | 33 scale: 7.152557373046875e-06
34 | 34 mm: torch.Size([22, 768]) 1.5020370483398438e-05
```
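
For reference, here is a minimal sketch of how a comparison like this might be timed, assuming a CUDA GPU with fp8 support and a recent PyTorch build. It is illustrative, not the harness used above: `torch._scaled_mm` is a private API whose signature has changed across releases, and the inline weight quantization shown here would normally be done offline by AutoFP8. Note also that host-side timers need `torch.cuda.synchronize()` and warm-up iterations; otherwise the first calls absorb kernel launch and warm-up overhead, which likely explains the large times in rows 1 and 2 above.

```python
# Minimal timing sketch (illustrative; not the harness used above).
# Assumes a CUDA GPU with fp8 support and a recent PyTorch where
# torch._scaled_mm takes per-tensor float32 scales; this private API
# has changed signature (and return type) across releases.
import time
import torch

device = "cuda"
x = torch.randn(22, 768, device=device, dtype=torch.bfloat16)
w = torch.randn(768, 768, device=device, dtype=torch.bfloat16)

def timed(fn, iters=100, warmup=10):
    for _ in range(warmup):          # discard compilation/launch overhead
        fn()
    torch.cuda.synchronize()         # drain pending kernels before timing
    t0 = time.time()
    for _ in range(iters):
        fn()
    torch.cuda.synchronize()         # wait for all kernels to finish
    return (time.time() - t0) / iters

finfo = torch.finfo(torch.float8_e4m3fn)

# Weights would be pre-quantized offline by AutoFP8; done inline here
# only so the example is self-contained.
w_scale = (w.abs().max() / finfo.max).to(torch.float32)
wq = (w / w_scale).clamp(finfo.min, finfo.max).to(torch.float8_e4m3fn)

def fp8_scale_and_mm():
    # Dynamic per-tensor activation scale + cast (the "scale" step) ...
    x_scale = (x.abs().max() / finfo.max).to(torch.float32)
    xq = (x / x_scale).clamp(finfo.min, finfo.max).to(torch.float8_e4m3fn)
    # ... then the fp8 matmul (the "mm" step); mat2 must be column-major,
    # which wq.t() of a contiguous wq satisfies.
    return torch._scaled_mm(xq, wq.t(), scale_a=x_scale, scale_b=w_scale,
                            out_dtype=torch.bfloat16)

bf16_t = timed(lambda: x @ w.t())
fp8_t = timed(fp8_scale_and_mm)
print(f"bf16 linear: {bf16_t:.3e}s | fp8 scale+mm: {fp8_t:.3e}s")
```

With this setup, the dynamic activation scale (a reduction plus a cast) is launched as separate kernels before every matmul, so for small shapes like [22, 768] its launch overhead can rival the matmul itself unless the quantize and mm are fused.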