Description
🚀 The feature, motivation and pitch
https://arxiv.org/abs/2401.14112
I think you guys are really going to like this.
The DeepSpeed developers introduce an FP6 datatype for cards without FP8 support, while maintaining full tensor core support using a kernel they created called TC-FPx. Tests were done on an A100, and they achieved 1.69x-2.65x higher inference throughput than the FP16 baseline! And I assume this can be transferred over to training (with the possible exception of the KV cache and the embedding module). This is really exciting; it will breathe new life into the rapidly aging A100, which lacks the H100's FP8 support.
It was merged into DeepSpeed in this commit:
deepspeedai/DeepSpeed@ccfdb84
Getting this pushed into PyTorch as a dtype would be a major win. Per the paper, FP6 provides a much smaller memory footprint than FP16 while preserving model quality noticeably better than 4-bit quantization.
Alternatives
These kernels shouldn't be limited to the A100; they could theoretically work on any card with uint8_t and FP16 support. That said, the kernels were only written for the A100, so without modification they might only work on Ampere cards.
Additional context
The TC-FPx kernel essentially takes four FP16 values and quantizes them to FP6 (with some placeholder bits during the conversion). The four 6-bit values are then packed into an array of three uint8_t (4 × 6 = 24 bits = 3 bytes), as sketched below.
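For illustration, here is a minimal host-side C++ sketch of that packing step. The e3m2 bit layout (1 sign, 3 exponent, 2 mantissa bits) and the simple sequential packing order are my assumptions for clarity; the actual TC-FPx kernels use a bit-level pre-packed, tensor-core-friendly memory layout and perform the conversion on the GPU.

```cpp
#include <cstdint>
#include <cstdio>
#include <cmath>

// Illustrative FP6 quantizer (1 sign, 3 exponent, 2 mantissa bits, bias 3).
// No subnormals or NaN/Inf handling -- just enough to show the 6-bit budget.
uint8_t quantize_fp6(float x) {
    uint8_t sign = (x < 0.0f) ? 1u : 0u;
    float a = std::fabs(x);
    if (a == 0.0f) return (uint8_t)(sign << 5);        // signed zero
    int e;
    float m = std::frexp(a, &e);                       // a = m * 2^e, m in [0.5, 1)
    int biased = e - 1 + 3;                            // exponent with bias 3
    long mant = std::lround((m * 2.0f - 1.0f) * 4.0f); // 2 fraction bits
    if (mant == 4) { mant = 0; ++biased; }             // rounding carried into exponent
    if (biased < 1) { biased = 1; mant = 0; }          // clamp tiny values to smallest normal
    if (biased > 7) { biased = 7; mant = 3; }          // clamp large values to max finite
    return (uint8_t)((sign << 5) | (biased << 2) | mant); // s eee mm -> 6 bits
}

// Pack four 6-bit values (4 * 6 = 24 bits) into three uint8_t.
void pack4(const uint8_t v[4], uint8_t out[3]) {
    out[0] = (uint8_t)((v[0] << 2) | (v[1] >> 4));
    out[1] = (uint8_t)((v[1] << 4) | (v[2] >> 2));
    out[2] = (uint8_t)((v[2] << 6) |  v[3]);
}

int main() {
    float w[4] = {0.75f, -1.5f, 2.0f, -0.25f};
    uint8_t q[4], packed[3];
    for (int i = 0; i < 4; ++i) q[i] = quantize_fp6(w[i]);
    pack4(q, packed);
    printf("packed: %02x %02x %02x\n", packed[0], packed[1], packed[2]);
    return 0;
}
```

The point of the sketch is the storage math: four weights that would occupy 8 bytes in FP16 fit in 3 bytes as FP6, and because 24 bits is byte-aligned, groups of four values can be stored without any wasted padding bits.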
cc @jerryzh168 @jianyuh @raghuramank100 @jamesr66a @vkuzo @jgong5 @Xia-Weiwen @leslie-fang-intel