FastDeploy 2.0: Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

Installation | Quick Start | Supported Models

FastDeploy 2.0: Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

News

[2025-07] 《FastDeploy2.0推理部署实测》专题活动已上线! 完成文心4.5系列开源模型的推理部署等任务，即可获得骨瓷马克杯等FastDeploy2.0官方周边及丰富奖金！🎁 欢迎大家体验反馈～ 📌报名地址 📌活动详情

[2025-07] The FastDeploy 2.0 Inference Deployment Challenge is now live! Complete the inference deployment task for the ERNIE 4.5 series open-source models to win official FastDeploy 2.0 merch and generous prizes! 🎁 You're welcome to try it out and share your feedback! 📌Sign up here 📌Event details

[2025-06] 🔥 Released FastDeploy v2.0: Supports inference and deployment for ERNIE 4.5. Furthermore, we open-source an industrial-grade PD disaggregation with context caching, dynamic role switching for effective resource utilization to further enhance inference performance for MoE models.

About

FastDeploy is an inference and deployment toolkit for large language models and visual language models based on PaddlePaddle. It delivers production-ready, out-of-the-box deployment solutions with core acceleration technologies:

🚀 Load-Balanced PD Disaggregation: Industrial-grade solution featuring context caching and dynamic instance role switching. Optimizes resource utilization while balancing SLO compliance and throughput.
🔄 Unified KV Cache Transmission: Lightweight high-performance transport library with intelligent NVLink/RDMA selection.
🤝 OpenAI API Server and vLLM Compatible: One-command deployment with vLLM interface compatibility.
🧮 Comprehensive Quantization Format Support: W8A16, W8A8, W4A16, W4A8, W2A16, FP8, and more.
⏩ Advanced Acceleration Techniques: Speculative decoding, Multi-Token Prediction (MTP) and Chunked Prefill.
🖥️ Multi-Hardware Support: NVIDIA GPU, Kunlunxin XPU, Hygon DCU, Ascend NPU, Iluvatar GPU, Enflame GCU, MetaX GPU etc.

Requirements

OS: Linux
Python: 3.10 ~ 3.12

Installation

FastDeploy supports inference deployment on NVIDIA GPUs, Kunlunxin XPUs, Iluvatar GPUs, Enflame GCUs, and other hardware. For detailed installation instructions:

Note: We are actively working on expanding hardware support. Additional hardware platforms including Ascend NPU, Hygon DCU, and MetaX GPU are currently under development and testing. Stay tuned for updates!

Get Started

Learn how to use FastDeploy through our documentation:

Supported Models

Model	Data Type	PD Disaggregation	Chunked Prefill	Prefix Caching	MTP	CUDA Graph	Maximum Context Length
ERNIE-4.5-300B-A47B	BF16/WINT4/WINT8/W4A8C8/WINT2/FP8	✅	✅	✅	✅	WIP	128K
ERNIE-4.5-300B-A47B-Base	BF16/WINT4/WINT8	✅	✅	✅	❌	WIP	128K
ERNIE-4.5-VL-424B-A47B	BF16/WINT4/WINT8	WIP	✅	WIP	❌	WIP	128K
ERNIE-4.5-VL-28B-A3B	BF16/WINT4/WINT8	❌	✅	WIP	❌	WIP	128K
ERNIE-4.5-21B-A3B	BF16/WINT4/WINT8/FP8	❌	✅	✅	✅	✅	128K
ERNIE-4.5-21B-A3B-Base	BF16/WINT4/WINT8/FP8	❌	✅	✅	❌	✅	128K
ERNIE-4.5-0.3B	BF16/WINT8/FP8	❌	✅	✅	❌	✅	128K

Advanced Usage

Acknowledgement

FastDeploy is licensed under the Apache-2.0 open-source license. During development, portions of vLLM code were referenced and incorporated to maintain interface compatibility, for which we express our gratitude.

Name		Name	Last commit message	Last commit date
Latest commit History 3,011 Commits
.github/workflows		.github/workflows
benchmarks		benchmarks
custom_ops		custom_ops
dockerfiles		dockerfiles
docs		docs
fastdeploy		fastdeploy
scripts		scripts
test		test
tools		tools
.clang-format		.clang-format
.flake8		.flake8
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
LICENSE		LICENSE
README.md		README.md
README_CN.md		README_CN.md
build.sh		build.sh
mkdocs.yml		mkdocs.yml
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
requirements_dcu.txt		requirements_dcu.txt
requirements_iluvatar.txt		requirements_iluvatar.txt
requirements_metaxgpu.txt		requirements_metaxgpu.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

FastDeploy 2.0: Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

News

About

Requirements

Installation

Get Started

Supported Models

Advanced Usage

Acknowledgement

About

Uh oh!

Releases 9

Packages

Uh oh!

Contributors 150

Languages

License

PaddlePaddle/FastDeploy

Folders and files

Latest commit

History

Repository files navigation

FastDeploy 2.0: Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

News

About

Requirements

Installation

Get Started

Supported Models

Advanced Usage

Acknowledgement

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 9

Packages 0

Uh oh!

Contributors 150

Languages

Packages