After quantizing an AutoModelForSequenceClassification model using AutoFP8, I observed a slowdown in inference. In the log below, the left column shows the per-layer inference time for the bf16 linear layers, while the right column shows the scale and matrix-multiplication times for fp8. As shown, the combined fp8 scale and mm time is approximately double that of bf16. Is this expected behavior, or could the AutoFP8 team provide guidance on improving performance?
```
1 non-fp8: torch.Size([22, 768]) 0.05331993103027344 | 1 scale: 6.031990051269531e-05
2 | 2 mm: torch.Size([22, 768]) 0.2548956871032715
3 non-fp8: torch.Size([22, 768]) 0.0032389163970947266 | 3 scale: 4.982948303222656e-05
4 | 4 mm: torch.Size([22, 768]) 6.29425048828125e-05
5 non-fp8: torch.Size([22, 768]) 5.1021575927734375e-05 | 5 scale: 1.52587890625e-05
6 | 6 mm: torch.Size([22, 768]) 2.3365020751953125e-05
7 non-fp8: torch.Size([22, 3072]) 5.936622619628906e-05 | 7 scale: 4.172325134277344e-05
8 | 8 mm: torch.Size([22, 3072]) 4.649162292480469e-05
9 non-fp8: torch.Size([22, 768]) 1.52587890625e-05 | 9 scale: 9.775161743164062e-06
10 | 10 mm: torch.Size([22, 768]) 1.8835067749023438e-05
11 non-fp8: torch.Size([22, 768]) 1.4781951904296875e-05 | 11 scale: 1.1682510375976562e-05
12 | 12 mm: torch.Size([22, 768]) 2.0742416381835938e-05
13 non-fp8: torch.Size([22, 768]) 1.2159347534179688e-05 | 13 scale: 2.002716064453125e-05
14 | 14 mm: torch.Size([22, 768]) 2.0265579223632812e-05
15 non-fp8: torch.Size([22, 3072]) 1.5974044799804688e-05 | 15 scale: 1.8358230590820312e-05
16 | 16 mm: torch.Size([22, 3072]) 2.7418136596679688e-05
17 non-fp8: torch.Size([22, 768]) 9.059906005859375e-06 | 17 scale: 8.58306884765625e-06
18 | 18 mm: torch.Size([22, 768]) 1.52587890625e-05
19 non-fp8: torch.Size([22, 768]) 1.3589859008789062e-05 | 19 scale: 9.775161743164062e-06
20 | 20 mm: torch.Size([22, 768]) 1.6927719116210938e-05
21 non-fp8: torch.Size([22, 768]) 1.0728836059570312e-05 | 21 scale: 9.5367431640625e-06
22 | 22 mm: torch.Size([22, 768]) 1.6450881958007812e-05
23 non-fp8: torch.Size([22, 3072]) 1.1682510375976562e-05 | 23 scale: 7.62939453125e-06
24 | 24 mm: torch.Size([22, 3072]) 1.5735626220703125e-05
25 non-fp8: torch.Size([22, 768]) 9.298324584960938e-06 | 25 scale: 7.867813110351562e-06
26 | 26 mm: torch.Size([22, 768]) 1.4543533325195312e-05
27 non-fp8: torch.Size([22, 768]) 1.2159347534179688e-05 | 27 scale: 8.821487426757812e-06
28 | 28 mm: torch.Size([22, 768]) 1.6450881958007812e-05
29 non-fp8: torch.Size([22, 768]) 1.0251998901367188e-05 | 29 scale: 9.298324584960938e-06
30 | 30 mm: torch.Size([22, 768]) 1.8596649169921875e-05
31 non-fp8: torch.Size([22, 3072]) 1.1682510375976562e-05 | 31 scale: 7.62939453125e-06
32 | 32 mm: torch.Size([22, 3072]) 1.6450881958007812e-05
33 non-fp8: torch.Size([22, 768]) 9.5367431640625e-06 | 33 scale: 7.152557373046875e-06
34 | 34 mm: torch.Size([22, 768]) 1.5020370483398438e-05
```
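
For reference, here is a minimal sketch of how a comparison like this might be timed, assuming a CUDA GPU with fp8 support and a recent PyTorch build. It is illustrative, not the harness used above: `torch._scaled_mm` is a private API whose signature has changed across releases, and the inline weight quantization shown here would normally be done offline by AutoFP8. Note also that host-side timers need `torch.cuda.synchronize()` and warm-up iterations; otherwise the first calls absorb kernel launch and warm-up overhead, which likely explains the large times in rows 1 and 2 above.

```python
# Minimal timing sketch (illustrative; not the harness used above).
# Assumes a CUDA GPU with fp8 support and a recent PyTorch where
# torch._scaled_mm takes per-tensor float32 scales; this private API
# has changed signature (and return type) across releases.
import time
import torch

device = "cuda"
x = torch.randn(22, 768, device=device, dtype=torch.bfloat16)
w = torch.randn(768, 768, device=device, dtype=torch.bfloat16)

def timed(fn, iters=100, warmup=10):
    for _ in range(warmup):          # discard compilation/launch overhead
        fn()
    torch.cuda.synchronize()         # drain pending kernels before timing
    t0 = time.time()
    for _ in range(iters):
        fn()
    torch.cuda.synchronize()         # wait for all kernels to finish
    return (time.time() - t0) / iters

finfo = torch.finfo(torch.float8_e4m3fn)

# Weights would be pre-quantized offline by AutoFP8; done inline here
# only so the example is self-contained.
w_scale = (w.abs().max() / finfo.max).to(torch.float32)
wq = (w / w_scale).clamp(finfo.min, finfo.max).to(torch.float8_e4m3fn)

def fp8_scale_and_mm():
    # Dynamic per-tensor activation scale + cast (the "scale" step) ...
    x_scale = (x.abs().max() / finfo.max).to(torch.float32)
    xq = (x / x_scale).clamp(finfo.min, finfo.max).to(torch.float8_e4m3fn)
    # ... then the fp8 matmul (the "mm" step); mat2 must be column-major,
    # which wq.t() of a contiguous wq satisfies.
    return torch._scaled_mm(xq, wq.t(), scale_a=x_scale, scale_b=w_scale,
                            out_dtype=torch.bfloat16)

bf16_t = timed(lambda: x @ w.t())
fp8_t = timed(fp8_scale_and_mm)
print(f"bf16 linear: {bf16_t:.3e}s | fp8 scale+mm: {fp8_t:.3e}s")
```

With this setup, the dynamic activation scale (a reduction plus a cast) is launched as separate kernels before every matmul, so for small shapes like [22, 768] its launch overhead can rival the matmul itself unless the quantize and mm are fused.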