Closed
Release date: Aug 8 2024
Branch cut: Aug 2 2024
Developer Facing API
- static quantization flow example @jerryzh168
- QAT refactor to generalize to other dtypes/techniques @andrewor14
- Int4 weight-only QAT
Developer Facing API use cases
- hqq, hqq-mix subclasses (deferred to next release) @HDCharles
- AffineQuantizedTensor layout cleanup @jerryzh168
- [postponed to 0.5] int4 weight-only quantization: support changing device (e.g. cpu -> cuda) @jerryzh168
- sparse + quantization composability support @jcaip
- quantize kv_cache to int8 for gpt-fast/torchao llama @HDCharles
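
For readers unfamiliar with the int8 kv_cache item above, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is not torchao's implementation. The function names `quantize_int8` and `dequantize_int8` are hypothetical, and the per-tensor symmetric scheme is an assumption chosen for brevity; the actual work may use a different granularity or zero-point handling.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Hypothetical helper: symmetric per-tensor int8 quantization.

    Maps the largest magnitude to 127 and rounds every value to the
    nearest integer step, clamping to the int8 range [-128, 127].
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Avoid a zero scale when the input is empty or all zeros.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Hypothetical helper: recover approximate floats from int8 codes."""
    return [x * scale for x in q]


# A stand-in for one row of cached keys or values (illustrative data only).
cache_row = [0.5, -1.27, 0.0, 1.0]
q_row, scale = quantize_int8(cache_row)
restored = dequantize_int8(q_row, scale)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the accuracy cost paid for storing the cache in one byte per element.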
Modeling user API
- autoquant to use AffineQuantizedTensor @HDCharles
- [handed off to executorch team] torchchat/executorch compatibility @jerryzh168
- add sam-fast to torchao @jcaip
Documentation
- add quantization overview to torchao doc @supriyar
- Hugging Face Neural Magic SparseLlama 2:4 notebook