
customize vocabulary size to 2560 instead of 2256 to enable fp8 GEMM in CI #461


Description

@weifengpy

Current status: CI uses the tokenizer from "./test/assets/test_tiktoken.model", which has vocab size = 2256. FP8 GEMM requires matrix dimensions to be divisible by 16. Without any sharding, 2256 / 16 = 141, and 141 is not divisible by world sizes 2/4/8, so the sharded dimension is not 16-aligned.
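
For reference, a minimal sketch of the alignment check (the helper name and the assumption that TP shards the vocab dimension are mine, not torchtitan code):

```python
# Hypothetical helper: does the vocab dimension stay 16-aligned for fp8 GEMM
# after being sharded across the given TP degrees?
def fp8_tp_compatible(vocab_size: int, tp_degrees=(2, 4, 8), align: int = 16) -> bool:
    for tp in tp_degrees:
        local_dim, rem = divmod(vocab_size, tp)
        if rem != 0 or local_dim % align != 0:
            return False
    return True

print(fp8_tp_compatible(2256))  # False: e.g. 2256 / 2 = 1128, and 1128 % 16 == 8
print(fp8_tp_compatible(2560))  # True: 2560 / 8 = 320, and 320 % 16 == 0
```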

Option 1: customize or add a new tokenizer in CI with vocab size = 2560 (2560 / 16 = 160). That is enough to test 2-way, 4-way, and 8-way TP.

Option 2: enable padding for fp8 GEMM, at the cost of a memory spike and a 20% perf regression (see the padding sketch below).
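
To make Option 2 concrete, here is a rough sketch of what padding amounts to; this is not the actual torchtitan/torchao fp8 code path, and `pad_dim` is a hypothetical helper. A zero-padded copy of the misaligned tensor is materialized so the GEMM sees 16-aligned shapes, which is where the memory spike and the extra work come from:

```python
import torch

def pad_dim(t: torch.Tensor, dim: int, multiple: int = 16) -> torch.Tensor:
    """Return a zero-padded copy of `t` whose size along `dim` is a multiple of `multiple`."""
    pad = (-t.size(dim)) % multiple
    if pad == 0:
        return t
    pad_shape = list(t.shape)
    pad_shape[dim] = pad
    return torch.cat([t, t.new_zeros(pad_shape)], dim=dim)

vocab, dim = 2256, 256
x = torch.randn(8, dim)                 # activations
w = torch.randn(vocab, dim)             # output-projection weight; vocab dim is not 16-aligned
w_padded = pad_dim(w, dim=0)            # extra copy of the weight -> memory spike
logits = (x @ w_padded.t())[:, :vocab]  # run the GEMM on the padded weight, then slice the padding off
```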

Repro
Run CONFIG_FILE="./train_configs/debug_model.toml" ./run_llama_train.sh and look for the following log:
Building llama3 debugmodel with ModelArgs(dim=256, n_layers=8, n_heads=16, n_kv_heads=None, vocab_size=2256
