deep-spin/triton-tutorial

Triton Tutorial

This repository contains a series of puzzle-style Jupyter notebooks that guide you through implementing common GPU kernels first in raw PyTorch, then with torch.compile, and finally with Triton.

Slides: https://docs.google.com/presentation/d/1VooJ3gQlhbPSyG5F08JTgvo7FCm4TyVivvdLl_gZ7-c

Installation

First, install dependencies:

pip install -r requirements.txt

Linux/Windows

Install Triton directly via pip:

pip install triton==3.3.1

macOS

On macOS, you need to build Triton from source using the triton-cpu fork (this takes a while, about 15 minutes on my Mac):

git clone https://github.com/triton-lang/triton-cpu.git
cd triton-cpu
git submodule update --init --recursive
cd python
pip install -r requirements.txt
pip install -e .
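
After either install path, a quick sanity check can confirm that the packages are visible to the Python environment Jupyter will use. The snippet below is a minimal sketch, not part of the repository: the helper name `check_packages` is hypothetical, and it assumes the packages are importable as `torch` and `triton` (a source build may not expose version metadata, in which case the version is reported as unknown).

```python
import importlib.util
from importlib import metadata


def check_packages(names=("torch", "triton")):
    """Return {package: version string, "unknown", or None} for each name.

    Hypothetical helper for illustration; the package names are assumptions.
    """
    report = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            report[name] = None  # not importable in this environment
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable, but no matching distribution metadata was found
            # (this can happen with some source builds).
            report[name] = "unknown"
    return report


if __name__ == "__main__":
    for pkg, version in check_packages().items():
        print(f"{pkg}: {version or 'NOT INSTALLED'}")
```

If `triton` shows up as not installed, the notebooks' Triton cells will fail on import, so it is worth fixing before starting the puzzles.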

Puzzles Covered

Now open the notebooks in Jupyter, step through each cell, and solve the puzzles:

  1. Vector Addition
  2. Fused Softmax
  3. Fused Entmax
  4. Matrix Multiplication
  5. Layer Normalization
  6. Cross-Entropy Loss
  7. Softmax Attention - Forward Pass
  8. Sparsemax Attention - Forward Pass (BONUS)
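
To give a feel for the block-wise programming model behind the first puzzle, here is a rough sketch in plain Python (no GPU, PyTorch, or Triton required). It only emulates the idea that each program instance handles one fixed-size block of the input and masks out-of-bounds positions. The names `add_kernel`, `vector_add`, and `block_size` are illustrative assumptions; this is neither Triton code nor the notebook's reference solution.

```python
def add_kernel(x, y, out, n_elements, pid, block_size):
    """Emulate one program instance: add a single block of the two vectors."""
    start = pid * block_size
    for offset in range(start, start + block_size):
        # Mask: the last block may extend past the end of the vectors.
        if offset < n_elements:
            out[offset] = x[offset] + y[offset]


def vector_add(x, y, block_size=4):
    """Launch enough emulated program instances to cover the whole input."""
    assert len(x) == len(y), "inputs must have the same length"
    n = len(x)
    out = [0] * n
    num_programs = (n + block_size - 1) // block_size  # ceiling division
    for pid in range(num_programs):  # on a GPU these would run in parallel
        add_kernel(x, y, out, n, pid, block_size)
    return out


print(vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))
```

With five elements and a block size of 4, two program instances are launched, and the second one processes only a single valid element; the mask keeps it from reading past the end. The later puzzles build on the same pattern of choosing a block size, computing offsets, and masking.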

Happy puzzling!


From a+b to sparsemax(QK^T)V in Triton!