Current status: CI uses the tokenizer from "./test/assets/test_tiktoken.model", which has vocab size = 2256. FP8 GEMM requires matrix dimensions to be divisible by 16. Without any sharding this holds (2256 / 16 = 141), but 141 is not divisible by world size 2/4/8, so the per-rank shard is no longer a multiple of 16.
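A quick check of that arithmetic in plain Python:

```python
# With vocab_size = 2256, the per-rank shard is not a multiple of 16
# for any of the TP degrees CI exercises.
vocab_size = 2256
for world_size in (2, 4, 8):
    shard = vocab_size // world_size  # per-rank dim after sharding
    print(f"TP={world_size}: shard={shard}, divisible by 16: {shard % 16 == 0}")
# TP=2: shard=1128, divisible by 16: False
# TP=4: shard=564, divisible by 16: False
# TP=8: shard=282, divisible by 16: False
```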
Option 1: customize or add a new tokenizer in CI with vocab size = 2560 (2560 / 16 = 160). That is enough to test 2-way, 4-way, and 8-way TP.
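For reference, any vocab size divisible by 16 * 8 = 128 keeps the per-rank shard a multiple of 16 for all three TP degrees; 2560 = 128 * 20 is one such value:

```python
vocab_size = 2560
assert all((vocab_size // ws) % 16 == 0 for ws in (2, 4, 8))
```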
Option 2: enable padding for FP8 GEMM, at the cost of a memory spike and a 20% perf regression.
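A minimal sketch of what that padding would look like (illustrative only, not the actual FP8 GEMM padding code; `pad_to_multiple` is a hypothetical helper): zero-pad the inner GEMM dimension up to a multiple of 16, which leaves the product unchanged but allocates extra padded copies, hence the memory and perf cost.

```python
import torch
import torch.nn.functional as F

def pad_to_multiple(x: torch.Tensor, dim: int, multiple: int = 16) -> torch.Tensor:
    """Zero-pad `x` along `dim` so its size becomes a multiple of `multiple`."""
    pad_len = (-x.size(dim)) % multiple
    if pad_len == 0:
        return x
    # F.pad expects (left, right) pairs starting from the last dimension.
    pad = [0, 0] * (x.ndim - dim - 1) + [0, pad_len]
    return F.pad(x, pad)

# Example: vocab_size 2256 sharded 2-way -> per-rank dim 1128, not divisible by 16.
a = torch.randn(32, 1128)          # activations
w = torch.randn(1128, 256)         # per-rank weight shard
a_p = pad_to_multiple(a, dim=1)    # 32 x 1136
w_p = pad_to_multiple(w, dim=0)    # 1136 x 256
# An FP8 GEMM would consume a_p / w_p here; zero padding leaves the result unchanged.
out = a_p @ w_p
assert torch.allclose(out, a @ w, atol=1e-5)
```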
Repro: run `CONFIG_FILE="./train_configs/debug_model.toml" ./run_llama_train.sh` and look for the following log: `Building llama3 debugmodel with ModelArgs(dim=256, n_layers=8, n_heads=16, n_kv_heads=None, vocab_size=2256`