[float8] Add support for blockwise fp8 quantization scheme used in DeepSeek v3

DeepSeek v3 uses a blockwise fp8 quantization strategy, where the scaling factor is computed independently for each block, rather than for each tensor/row/etc. The code is available [here](https://github.com/deepseek-ai/DeepSeek-V3/blob/ee4c4ea32bfd89e197616b80e713466954c51c75/inference/kernel.py#L33).
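To make the idea concrete, here is a minimal pure-Python sketch of per-block scaling. It is not the linked kernel: the function names `blockwise_scale` and `blockwise_unscale` are hypothetical, and so are the 1D layout, the default block size of 128, and the all-zero-block guard. It uses 448.0 (the largest finite value of the float8 e4m3fn format) as the target range, and it omits the final cast to an fp8 dtype, so it only illustrates how the scales are derived.

```python
from typing import List, Tuple

FP8_E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3fn


def blockwise_scale(values: List[float], block_size: int = 128) -> Tuple[List[float], List[float]]:
    """Scale each contiguous block independently (hypothetical helper).

    Returns (scaled_values, scales), with one scale per block.
    """
    scaled: List[float] = []
    scales: List[float] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        # Assumption of this sketch: use a scale of 1.0 for all-zero blocks
        # to avoid dividing by zero.
        scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
        scales.append(scale)
        # A real kernel would cast these values to an fp8 dtype here;
        # that step is omitted in this sketch.
        scaled.extend(v / scale for v in block)
    return scaled, scales


def blockwise_unscale(scaled: List[float], scales: List[float], block_size: int = 128) -> List[float]:
    """Undo the scaling by multiplying each value by its block's scale."""
    return [q * scales[i // block_size] for i, q in enumerate(scaled)]
```

With a block size of 4, the input `[0.5, -1.0, 2.0, 4.0, 100.0, -200.0, 50.0, 25.0]` yields two scales, one derived from each block's own maximum magnitude (4.0 and 200.0), so the small values in the first block are not squeezed by the large values in the second, as they would be under a single per-tensor scale.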

It would be useful for torchao to support this as well, for users wishing to do research or development with this same quantization strategy.

cc @drisspg @vkuzo 

[float8] Add support for blockwise fp8 quantization scheme used in DeepSeek v3 #1594

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[float8] Add support for blockwise fp8 quantization scheme used in DeepSeek v3 #1594

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions