Description
DeepSeek v3 uses a blockwise fp8 quantization strategy, in which a scaling factor is computed independently for each block of a tensor rather than at a coarser granularity such as per tensor or per row. The code is available here.
It would be useful for torchao to support this as well, for users wishing to do research or development with this same quantization strategy.