
fp8 vs bf16 performance problem #38


Description

@AllenDou

After quantizing an AutoModelForSequenceClassification model with AutoFP8, I observed a slight drop in performance. In the table below, the bf16 linear column shows the inference time of the original bf16 linear layers, while the fp8 scale and fp8 mm columns show the separate scaling and matrix-multiplication times on the fp8 path. As shown, the combined fp8 scale + mm time is roughly double the bf16 time. Is this expected behavior, or could the AutoFP8 team provide guidance on improving performance?

| output shape | bf16 linear (s) | fp8 scale (s) | fp8 mm (s) |
|---|---|---|---|
| [22, 768] | 0.05331993103027344 | 6.031990051269531e-05 | 0.2548956871032715 |
| [22, 768] | 0.0032389163970947266 | 4.982948303222656e-05 | 6.29425048828125e-05 |
| [22, 768] | 5.1021575927734375e-05 | 1.52587890625e-05 | 2.3365020751953125e-05 |
| [22, 3072] | 5.936622619628906e-05 | 4.172325134277344e-05 | 4.649162292480469e-05 |
| [22, 768] | 1.52587890625e-05 | 9.775161743164062e-06 | 1.8835067749023438e-05 |
| [22, 768] | 1.4781951904296875e-05 | 1.1682510375976562e-05 | 2.0742416381835938e-05 |
| [22, 768] | 1.2159347534179688e-05 | 2.002716064453125e-05 | 2.0265579223632812e-05 |
| [22, 3072] | 1.5974044799804688e-05 | 1.8358230590820312e-05 | 2.7418136596679688e-05 |
| [22, 768] | 9.059906005859375e-06 | 8.58306884765625e-06 | 1.52587890625e-05 |
| [22, 768] | 1.3589859008789062e-05 | 9.775161743164062e-06 | 1.6927719116210938e-05 |
| [22, 768] | 1.0728836059570312e-05 | 9.5367431640625e-06 | 1.6450881958007812e-05 |
| [22, 3072] | 1.1682510375976562e-05 | 7.62939453125e-06 | 1.5735626220703125e-05 |
| [22, 768] | 9.298324584960938e-06 | 7.867813110351562e-06 | 1.4543533325195312e-05 |
| [22, 768] | 1.2159347534179688e-05 | 8.821487426757812e-06 | 1.6450881958007812e-05 |
| [22, 768] | 1.0251998901367188e-05 | 9.298324584960938e-06 | 1.8596649169921875e-05 |
| [22, 3072] | 1.1682510375976562e-05 | 7.62939453125e-06 | 1.6450881958007812e-05 |
| [22, 768] | 9.5367431640625e-06 | 7.152557373046875e-06 | 1.5020370483398438e-05 |
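For reference, here is a minimal sketch of how the two paths could be timed more robustly. Wall-clock timestamps around asynchronous CUDA calls largely measure launch overhead (note the large first rows above, which look like one-time warm-up cost), so CUDA events plus a warm-up loop give steadier numbers. This assumes a recent PyTorch (2.4+, where `torch._scaled_mm` returns the output tensor directly) on an FP8-capable GPU (compute capability 8.9+); the per-tensor dynamic scaling below is illustrative and not necessarily AutoFP8's exact kernel path.

```python
import torch

device = "cuda"
x = torch.randn(22, 768, device=device, dtype=torch.bfloat16)
w = torch.randn(768, 768, device=device, dtype=torch.bfloat16)

# Quantize the weight once up front (static weight scale, as after AutoFP8).
fp8_max = torch.finfo(torch.float8_e4m3fn).max
w_scale = (w.abs().max() / fp8_max).float()
w_fp8 = (w / w_scale).to(torch.float8_e4m3fn)

def bf16_linear():
    return x @ w.t()

def fp8_linear():
    # Dynamic per-tensor activation scaling, then a scaled fp8 matmul.
    x_scale = x.abs().max().float() / fp8_max
    x_fp8 = (x / x_scale).to(torch.float8_e4m3fn)
    # torch._scaled_mm wants the second operand in column-major layout,
    # hence the .t() on the row-major fp8 weight.
    return torch._scaled_mm(
        x_fp8, w_fp8.t(),
        scale_a=x_scale, scale_b=w_scale,
        out_dtype=torch.bfloat16,
    )

def bench(fn, iters=100, warmup=10):
    for _ in range(warmup):       # absorb one-time kernel/compile costs
        fn()
    torch.cuda.synchronize()      # GPU work is async; sync before timing
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # average ms per call

print("bf16 linear:", bench(bf16_linear), "ms")
print("fp8 scale+mm:", bench(fp8_linear), "ms")
```

`torch.utils.benchmark.Timer` would work equally well; the key points are synchronizing before reading the timers and excluding the first warm-up iterations from the comparison.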
