[Tracker] WIP features for torchao 0.4 #493

Release date: Aug 8 2024
Branch cut: Aug 2 2024

## [Developer Facing API](https://github.com/pytorch/ao/issues/391)
- [x] static quantization flow example @jerryzh168
- [ ] QAT refactor to generalize to other dtypes/techniques @andrewor14 
- [x] Int4 weight-only QAT

## Developer Facing API use cases
- [ ] hqq, hqq-mix subclasses (deferred to next release) @HDCharles
- [x] AffineQuantizedTensor layout cleanup @jerryzh168 
- [x] [postponed to 0.5] int4 weight-only quantization: support changing device (e.g. cpu -> cuda) @jerryzh168
- [x] sparse + quantization composability support @jcaip 
- [x] quantize kv_cache to int8 for gpt-fast/torchao llama @HDCharles

## Modeling user API
- [ ] autoquant to use AffineQuantizedTensor @HDCharles 
- [x] [handed off to executorch team] torchchat/executorch compatibility @jerryzh168 
- [x] add sam-fast to torchao @jcaip 

## Documentation
- [ ] add quantization overview to torchao doc @supriyar 
- [ ] Hugging Face neural-magic SparseLlama 2:4 notebook


