FlashMLA PyTorch

PyTorch implementation of FlashMLA.

FlashMLA is an efficient MLA decoding kernel for Hopper GPUs, optimized for serving variable-length sequences. Currently released: BF16 support and a paged KV cache with a block size of 64.
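
To make the paged KV cache with a block size of 64 more concrete, here is a minimal sketch in plain Python of how a variable-length sequence could be mapped onto fixed-size cache blocks. It is not this repository's API and runs no GPU code. The names `BLOCK_SIZE`, `blocks_needed`, `locate_token`, and the block table layout are assumptions chosen for illustration.

```python
from typing import List, Tuple

BLOCK_SIZE = 64  # block size stated in this README


def blocks_needed(seq_len: int) -> int:
    """Number of cache blocks required to hold seq_len tokens (ceiling division)."""
    return (seq_len + BLOCK_SIZE - 1) // BLOCK_SIZE


def locate_token(block_table: List[int], position: int) -> Tuple[int, int]:
    """Map a logical token position to (physical block id, offset within the block)."""
    logical_block = position // BLOCK_SIZE
    offset = position % BLOCK_SIZE
    return block_table[logical_block], offset


# Hypothetical example: a 130-token sequence needs 3 blocks,
# and those blocks may sit anywhere in the physical cache.
block_table = [7, 2, 9]
assert blocks_needed(130) == len(block_table)
print(locate_token(block_table, 129))  # (9, 1)
```

Because each sequence only holds a list of block ids, sequences of different lengths can share one physical cache without being padded to a common length.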