[Tracker] WIP features for torchao 0.4 #493

Closed
@jerryzh168

Description

Release date: Aug 8 2024
Branch cut: Aug 2 2024

Developer Facing API

  • static quantization flow example @jerryzh168
  • QAT refactor to generalize to other dtypes/techniques @andrewor14
  • Int4 weight-only QAT

Developer Facing API use cases

  • hqq, hqq-mix subclasses (deferred to next release) @HDCharles
  • AffineQuantizedTensor layout cleanup @jerryzh168
  • [postponed to 0.5] device-change support for int4 weight-only quantization (e.g. cpu -> cuda) @jerryzh168
  • sparse + quantization composability support @jcaip
  • quantize kv_cache to int8 for gpt-fast/torchAO llama @HDCharles

Modeling user API

  • autoquant to use AffineQuantizedTensor @HDCharles
  • [handed off to executorch team] torchchat/executorch compatibility @jerryzh168
  • add sam-fast to torchao @jcaip

Documentation

  • add quantization overview to torchao doc @supriyar
  • Hugging Face neural-magic SparseLlama 2:4 notebook
