How to run

Hybrid data type with custom weight location

To use the hybrid data type with custom weight locations:

Change the data type to "bf16_fp16" in demo.py. This uses BF16 weights for the first token and FP16 weights for the subsequent tokens.
Set the environment variables "FIRST_TOKEN_WEIGHT_LOCATION" and "NEXT_TOKEN_WEIGHT_LOCATION", using a NUMA node ID as the value of each.

Example: FIRST_TOKEN_WEIGHT_LOCATION=0 NEXT_TOKEN_WEIGHT_LOCATION=1 SINGLE_INSTANCE=1 OMP_NUM_THREADS=32 taskset -c 56-87 python demo.py
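The two steps above can also be driven from a launcher script. The following is a minimal sketch, not part of the project: the helper name `hybrid_weight_env` is hypothetical, and the only things taken from this page are the two variable names and the fact that their values are NUMA node IDs.

```python
import os


def hybrid_weight_env(first_token_node, next_token_node, base_env=None):
    """Return a copy of an environment with both weight-location variables set.

    Hypothetical helper for illustration. The NUMA node IDs are only checked
    for being non-negative integers; whether a node actually exists on the
    machine is not verified here.
    """
    for label, node in (("first_token_node", first_token_node),
                        ("next_token_node", next_token_node)):
        if isinstance(node, bool) or not isinstance(node, int) or node < 0:
            raise ValueError(f"{label} must be a non-negative integer, got {node!r}")

    # Copy so the caller's environment (or os.environ) is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["FIRST_TOKEN_WEIGHT_LOCATION"] = str(first_token_node)
    env["NEXT_TOKEN_WEIGHT_LOCATION"] = str(next_token_node)
    return env


if __name__ == "__main__":
    # Mirrors the example command: BF16 weights on node 0, FP16 weights on node 1.
    env = hybrid_weight_env(0, 1, base_env={"OMP_NUM_THREADS": "32"})
    for key in sorted(env):
        print(f"{key}={env[key]}")
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when starting demo.py; CPU pinning (taskset or numactl in the commands on this page) still has to be applied separately.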

Performance on Intel(R) Xeon(R) CPU Max 9468, measured with the command: FIRST_TOKEN_WEIGHT_LOCATION=0 NEXT_TOKEN_WEIGHT_LOCATION=8 OMP_NUM_THREADS=8 numactl -C 0-7 -m 8 ./example /data/chatglm2-6b-cpu 3 1024

First token latency: Next token latency:

For distributed performance testing with hybrid precision:

FIRST_TOKEN_WEIGHT_LOCATION=1 NEXT_TOKEN_WEIGHT_LOCATION=3 OMP_NUM_THREADS=20 mpirun \
    -n 1 numactl -N 1 -m 3 python demo.py --dtype=bf16_fp16 --token_path /data/chatglm2-6b-hf/ --model_path /data/chatglm2-6b-cpu/ --streaming False : \
    -n 1 numactl -N 1 -m 3 python demo.py --dtype=bf16_fp16 --token_path /data/chatglm2-6b-hf/ --model_path /data/chatglm2-6b-cpu/ --streaming False

export $(python -c 'import xfastertransformer as xft; print(xft.get_env())')

ENV for xDNN

XDNN_N64: applies to all gemv kernels, but is overridden by the more specific variables below
XDNN_N64_N: normal gemv
XDNN_N64_NR: normal gemv with residential
XDNN_N64_A: batchA gemv
N64Flag_AR: batchA gemv with residential
N64Flag_C: batchC gemv
N64Flag_CR: batchC gemv with residential

If a variable is > 0, the N64 version of the kernel is used.
If a variable is < 0, the N16 version of the kernel is used.
If a variable is 0, the default dispatch method is used.
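The sign rule above can be sketched in a few lines of Python. This is an illustration only, not xDNN's actual dispatch code: the function names are hypothetical, and the precedence shown (a specific variable, when set, wins over XDNN_N64, and an unset variable counts as 0) is an assumption read from the descriptions above.

```python
def select_gemv_kernel(value):
    """Map a variable's integer value to the kernel variant named on this page."""
    if value > 0:
        return "N64"
    if value < 0:
        return "N16"
    return "default"


def resolve_gemv_kernel(env, specific_name, general_name="XDNN_N64"):
    """Pick the kernel variant for one gemv type from a dict of variables.

    Assumption: the specific variable overrides the general one when it is
    present, and a variable that is not set at all behaves like 0.
    """
    raw = env.get(specific_name, env.get(general_name, "0"))
    return select_gemv_kernel(int(raw))


if __name__ == "__main__":
    settings = {"XDNN_N64": "1", "XDNN_N64_N": "-1"}
    for name in ("XDNN_N64_N", "XDNN_N64_NR", "XDNN_N64_A"):
        print(name, "->", resolve_gemv_kernel(settings, name))
```

Passing `os.environ` as `env` would apply the same rule to the current process environment.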
