The next tutorials #426

@msaroufim

From our README.md

torchao is a library to create and integrate high-performance custom data types and layouts into your PyTorch workflows

So far we've done a good job building out the primitive data types along with their corresponding transformed Linear layers: given a new ExoticDtype() we have a playbook to create ExoticDtypeLinear(). For weight-only transformations this is a perfectly fine workflow, and it is how the majority of quantization libraries operate.

For example

from torchao.quantization import quantize_, int4_weight_only

m = DownloadModelFromHuggingFace()  # placeholder for your model-loading code
quantize_(m, int4_weight_only())    # swaps every torch.nn.Linear for a 4-bit weight-only Linear

We can make the above shine with more accessible blog posts, performance benchmarks, and integrations with more partners.
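Under the hood, a weight-only transform like the one above is essentially a recursive module swap. Here is a minimal sketch with a hypothetical Int4Linear that merely fake-quantizes weights at load time; the real torchao implementation packs bits and dispatches to optimized kernels, so treat this as illustration only:

```python
import torch
import torch.nn as nn

class Int4Linear(nn.Module):
    """Hypothetical stand-in for a weight-only 4-bit Linear.
    Stores a rounded int copy of the weight plus a per-row scale."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        w = linear.weight.detach()
        scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-12) / 7.0
        self.register_buffer("qweight", (w / scale).round().clamp(-8, 7).to(torch.int8))
        self.register_buffer("scale", scale)
        self.bias = linear.bias

    def forward(self, x):
        # dequantize on the fly; a real kernel would fuse this
        w = self.qweight.to(x.dtype) * self.scale
        return nn.functional.linear(x, w, self.bias)

def swap_linears(module: nn.Module):
    """Recursively replace every nn.Linear child, as quantize_ does for its configs."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            setattr(module, name, Int4Linear(child))
        else:
            swap_linears(child)

m = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
swap_linears(m)
```

After the swap, no nn.Linear instances remain and the model still runs end to end with the same shapes.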

However, this does something of a disservice to the ao value proposition: we're a dtype library, not a dtype-Linear library, so given a dtype it should be easy for us to do a lot more. Some examples I'd like to see next are:

  • Quantized optimizers, the most obvious additions being 8-bit and 4-bit Adam
  • Quantized KV cache
  • Quantization-aware training (QAT) with an exotic dtype
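To make the first item concrete, here is a minimal sketch of what an 8-bit Adam step could look like: optimizer state lives as int8 plus a scale between steps and is dequantized only for the update. Bias correction and the blockwise scaling that real low-bit optimizers use are omitted, and all names here are illustrative, not torchao API:

```python
import torch

def quantize_8bit(t):
    """Per-tensor symmetric int8 quantization: returns (int8 tensor, fp32 scale)."""
    scale = t.abs().max().clamp(min=1e-12) / 127.0
    q = (t / scale).round().clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize_8bit(q, scale):
    return q.to(torch.float32) * scale

def adam8bit_step(p, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Dequantize the stored moments, update them in fp32,
    # then requantize - memory stays 8-bit between steps.
    m = dequantize_8bit(*state["m"])
    v = dequantize_8bit(*state["v"])
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    p -= lr * m / (v.sqrt() + eps)
    state["m"] = quantize_8bit(m)
    state["v"] = quantize_8bit(v)

p = torch.zeros(4)
grad = torch.tensor([0.1, -0.2, 0.3, 0.0])
state = {"m": quantize_8bit(torch.zeros(4)), "v": quantize_8bit(torch.zeros(4))}
adam8bit_step(p, grad, state)
```

The point is that the moments, which normally double the fp32 memory cost of training, shrink to a quarter of the parameter memory here.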

None of the above is "research"; this is very much the direction inference engineering is moving: https://blog.character.ai/optimizing-ai-inference-at-character-ai/
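The KV-cache item is exactly the kind of optimization that post centers on. A minimal int8 KV-cache sketch, assuming symmetric per-token scales; this is illustrative only, not torchao API:

```python
import torch

torch.manual_seed(0)

def quantize_kv(t):
    # Symmetric int8 with one scale per (batch, head, token): 4x smaller
    # than an fp32 cache, 2x smaller than fp16, for a small dequant error.
    scale = t.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12) / 127.0
    q = (t / scale).round().clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize_kv(q, scale):
    return q.to(torch.float32) * scale

k = torch.randn(1, 8, 16, 64)  # (batch, heads, seq_len, head_dim)
qk, ks = quantize_kv(k)
max_err = (k - dequantize_kv(qk, ks)).abs().max().item()
```

Because decode is memory-bandwidth bound, shrinking the cache this way directly increases the batch size (and often the throughput) a given GPU can sustain.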

Also, given an exotic quantization scheme, I'd like to be more proactive in helping people benchmark their models. This should include:

  • FLOP utilization
  • Memory bandwidth
  • Cache hit rate (KV cache only)
  • Roofline analysis
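For the roofline item, the core computation is simple enough to sketch. The peak throughput and bandwidth numbers below are illustrative assumptions (roughly A100-class), not measurements; substitute your own hardware's specs:

```python
# Assumed hardware peaks - replace with your GPU's datasheet numbers.
PEAK_FLOPS = 312e12  # peak fp16 tensor-core throughput, FLOP/s (assumed)
PEAK_BW = 2.0e12     # peak HBM bandwidth, bytes/s (assumed)

def matmul_roofline(m, n, k, bytes_per_el=2):
    """Arithmetic intensity and attainable FLOP/s for an (m,k) @ (k,n) matmul."""
    flops = 2 * m * n * k                                 # one multiply-add = 2 FLOPs
    bytes_moved = bytes_per_el * (m * k + k * n + m * n)  # read A and B, write C
    intensity = flops / bytes_moved                       # FLOPs per byte
    attainable = min(PEAK_FLOPS, intensity * PEAK_BW)
    bound = "compute" if intensity >= PEAK_FLOPS / PEAK_BW else "memory"
    return intensity, attainable, bound

# Batch-1 decode is a GEMV and lands far left on the roofline (memory-bound);
# a large square matmul sits past the ridge point (compute-bound).
decode = matmul_roofline(1, 4096, 4096)
prefill = matmul_roofline(4096, 4096, 4096)
```

This is exactly why weight-only quantization speeds up decode: it cuts bytes_moved, raising arithmetic intensity on a workload that is nowhere near the compute roof.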
