Fast inference engine for DeepSeek-V3 LLMs, written in CUDA.
- Install Weights
# `weights` is a symbolic link, so repoint it to a directory on your machine first
cd weights
# Clone the weights from Hugging Face (the large weight files need Git LFS installed)
git clone https://huggingface.co/deepseek-ai/DeepSeek-V3
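
Repointing the `weights` symlink before cloning could look like the sketch below. It assumes you run it from the repository root; `WEIGHTS_DIR` and its default path are hypothetical placeholders, not names used by this project, so substitute a location with enough free disk space.

```shell
# Hypothetical storage location (an assumption, not defined by this repo).
WEIGHTS_DIR="${WEIGHTS_DIR:-$HOME/deepseek-weights}"

# Create the target directory if it does not exist yet.
mkdir -p "$WEIGHTS_DIR"

# Replace the existing `weights` symlink so it points at the new directory.
# -s: symbolic link, -f: overwrite the old link, -n: do not follow the old link.
ln -sfn "$WEIGHTS_DIR" weights

# Print the link target to confirm the change.
readlink weights
```

After this, `cd weights` lands in the directory you chose and the clone writes there.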
- Build Inference Engine
bazel build
- Run