Current status: CI uses the tokenizer from "./test/assets/test_tiktoken.model", which has vocab size = 2256. FP8 GEMM requires matrix dimensions to be divisible by 16. Without any sharding this holds (2256 / 16 = 141), but 141 is not divisible by world size 2/4/8, so the per-rank shard is no longer a multiple of 16.
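A quick check of that arithmetic in plain Python:

```python
# With vocab_size = 2256, the per-rank shard is not a multiple of 16
# for any of the TP degrees CI exercises.
vocab_size = 2256
for world_size in (2, 4, 8):
    shard = vocab_size // world_size  # per-rank dim after sharding
    print(f"TP={world_size}: shard={shard}, divisible by 16: {shard % 16 == 0}")
# TP=2: shard=1128, divisible by 16: False
# TP=4: shard=564, divisible by 16: False
# TP=8: shard=282, divisible by 16: False
```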
Option 1: customize or add a new tokenizer in CI with vocab size = 2560 (2560 / 16 = 160). That is enough to test 2-way, 4-way, and 8-way TP.
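For reference, any vocab size divisible by 16 * 8 = 128 keeps the per-rank shard a multiple of 16 for all three TP degrees; 2560 = 128 * 20 is one such value:

```python
vocab_size = 2560
assert all((vocab_size // ws) % 16 == 0 for ws in (2, 4, 8))
```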
Option 2: enable padding for FP8 GEMM, at the cost of a memory spike and a 20% perf regression.
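A minimal sketch of what that padding would look like (illustrative only, not the actual FP8 GEMM padding code; `pad_to_multiple` is a hypothetical helper): zero-pad the inner GEMM dimension up to a multiple of 16, which leaves the product unchanged but allocates extra padded copies, hence the memory and perf cost.

```python
import torch
import torch.nn.functional as F

def pad_to_multiple(x: torch.Tensor, dim: int, multiple: int = 16) -> torch.Tensor:
    """Zero-pad `x` along `dim` so its size becomes a multiple of `multiple`."""
    pad_len = (-x.size(dim)) % multiple
    if pad_len == 0:
        return x
    # F.pad expects (left, right) pairs starting from the last dimension.
    pad = [0, 0] * (x.ndim - dim - 1) + [0, pad_len]
    return F.pad(x, pad)

# Example: vocab_size 2256 sharded 2-way -> per-rank dim 1128, not divisible by 16.
a = torch.randn(32, 1128)          # activations
w = torch.randn(1128, 256)         # per-rank weight shard
a_p = pad_to_multiple(a, dim=1)    # 32 x 1136
w_p = pad_to_multiple(w, dim=0)    # 1136 x 256
# An FP8 GEMM would consume a_p / w_p here; zero padding leaves the result unchanged.
out = a_p @ w_p
assert torch.allclose(out, a @ w, atol=1e-5)
```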
Repro: run `CONFIG_FILE="./train_configs/debug_model.toml" ./run_llama_train.sh` and look for the following log: `Building llama3 debugmodel with ModelArgs(dim=256, n_layers=8, n_heads=16, n_kv_heads=None, vocab_size=2256`