From our README.md:

> torchao is a library to create and integrate high-performance custom data types and layouts into your PyTorch workflows
So far we've done a good job building out the primitive data types along with their corresponding transformed Linear layers: given a new `ExoticDtype()`, we have a playbook for creating `ExoticDtypeLinear()`. For weight-only transformations this is a perfectly fine workflow, and it's how the majority of quantization libraries operate.
For example:

```python
from torchao.quantization import quantize_, int4_weight_only

m = DownloadModelFromHuggingFace()  # placeholder: load any model
quantize_(m, int4_weight_only())  # swaps out every torch.nn.Linear for a 4-bit weight-only Linear
```
We can make the above shine with more accessible blogs, performance benchmarks, and integrations with more partners.

However, this does somewhat of a disservice to explaining the ao value proposition: we're a dtype library, not a dtype Linear library, so given a dtype it should be easy for us to do a lot more. Some examples I'd like to see next are:
- Quantized optimizers, the most obvious additions being 8-bit and 4-bit Adam
- Quantized KV cache
- Quantization-aware training with an exotic dtype (a minimal sketch follows this list)
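
To make the last bullet concrete, here is a minimal sketch of the fake-quantization trick that quantization-aware training is usually built on. `fake_quantize` and `QATLinear` are hypothetical names for illustration, not existing torchao APIs:

```python
import torch

def fake_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    # Symmetric per-tensor fake quantization: round to the low-bit grid on
    # the forward pass, let gradients flow through unchanged on the backward
    # pass (straight-through estimator).
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().amax().clamp(min=1e-12) / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    return w + (w_q - w).detach()

class QATLinear(torch.nn.Linear):
    # Trains against the quantized weight, so the model learns weights that
    # survive the eventual post-training conversion to the exotic dtype.
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.linear(x, fake_quantize(self.weight), self.bias)
```

The point is that once a dtype defines its quantize/dequantize pair, the same pair should drop into optimizers, KV caches, and QAT, not just a Linear swap.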
None of the above is "research"; this is very much the direction inference engineering is moving: https://blog.character.ai/optimizing-ai-inference-at-character-ai/
Also, given an exotic quantization scheme, I'd like us to be more proactive in helping people benchmark their models, so this should include:
- FLOP utilization (see the sketch after this list)
- Memory bandwidth
- Cache hit rate (for KV cache only)
- Roofline analysis
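
As a starting point, here's a minimal sketch of the kind of measurement such tooling could automate. `matmul_tflops` is a hypothetical helper, not an existing torchao API, and it assumes a CUDA device:

```python
import torch
from torch.utils.benchmark import Timer

def matmul_tflops(m: int, k: int, n: int, dtype=torch.bfloat16) -> float:
    # Measure achieved TFLOP/s for an (m, k) @ (k, n) matmul; dividing by the
    # device's peak gives the FLOP-utilization number from the list above.
    a = torch.randn(m, k, device="cuda", dtype=dtype)
    b = torch.randn(k, n, device="cuda", dtype=dtype)
    t = Timer(stmt="a @ b", globals={"a": a, "b": b}).blocked_autorange()
    return 2 * m * k * n / t.median / 1e12

print(matmul_tflops(4096, 4096, 4096))
```

The same pattern with bytes moved instead of FLOPs yields achieved memory bandwidth, and plotting both against device peaks is exactly the roofline analysis in the last bullet.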