FastServe is a fast, efficient, and scalable inference serving system for Large Language Models (LLMs). It provides an easy-to-use Python front end and builds on SwiftTransformer, a high-performance C++ LLM inference library.
It is fast with:
- preemptive scheduling
- continuous batching (a minimal sketch of these two ideas follows the feature lists below)
- custom attention kernels
- C++ model implementation
It is memory efficient with:
- proactive memory swapping
- paged attention kernels
It is scalable with:
- Megatron-LM tensor parallelism
- streaming pipeline parallelism
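To make the first two bullets concrete, here is a minimal, framework-agnostic sketch of continuous batching with preemption. Every name in it (`Request`, `serve`, `decode_step`) is a hypothetical illustration, not FastServe's actual Python API:

```python
# Minimal sketch of preemptive scheduling + continuous batching.
# All names are hypothetical and are NOT FastServe's API.
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    prompt: list[int]                      # prompt token ids
    max_new_tokens: int
    output: list[int] = field(default_factory=list)

    def finished(self) -> bool:
        return len(self.output) >= self.max_new_tokens


def serve(waiting: deque, decode_step, batch_size: int = 8) -> None:
    """Run one decoding iteration at a time over a changing batch.

    Continuous batching: new requests join the running batch between
    iterations instead of waiting for the whole batch to drain.
    Preemptive scheduling: when the batch is full, the request with the
    most generated tokens is moved back to the waiting queue so newly
    arrived requests are not blocked behind it.
    """
    running: list[Request] = []
    while waiting or running:
        # Admit waiting requests into free batch slots.
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        # Batch full but requests still waiting: preempt one running request.
        if waiting and running:
            victim = max(running, key=lambda r: len(r.output))
            running.remove(victim)
            waiting.append(victim)         # its KV cache would be swapped out here
            running.append(waiting.popleft())
        # One decoding iteration over the current batch.
        for req in running:
            req.output.append(decode_step(req))
        running = [r for r in running if not r.finished()]
```

In FastServe itself the preemption policy is more sophisticated (the paper cited below describes a skip-join multi-level feedback queue), and the state swapped out on preemption is the request's KV cache, which is what "proactive memory swapping" above refers to.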
It currently supports:
- OPT (facebook/opt-1.3b, facebook/opt-6.7b, ...)
- LLaMA2 (meta-llama/Llama-2-7b, meta-llama/Llama-2-13b, ...)
```shell
# clone the project
git clone [email protected]:LLMServe/FastServe.git && cd FastServe

# set up the fastserve conda environment
conda env create -f environment.yml && conda activate fastserve

# clone and build the SwiftTransformer library
git clone https://github.com/LLMServe/SwiftTransformer.git && cd SwiftTransformer && \
    git submodule update --init --recursive && \
    cmake -B build && cmake --build build -j$(nproc) && cd ..

# install fastserve
pip install -e .
```
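After installation, a quick sanity check can be run (this snippet is illustrative only, assuming the editable install makes the `fastserve` package importable):

```python
# Post-install sanity check (illustrative, not part of the repo):
# the editable install should make the fastserve package importable.
import fastserve

print("fastserve imported from", fastserve.__file__)
```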
To reproduce the evaluation results in our paper, see benchmarks/artifact-evaluation/README.md for detailed instructions.
Run the offline inference example:

```shell
python fastserve/examples/offline.py
```
To serve requests online, launch the API server and then run the example client:

```shell
# launch the API server
python -m fastserve.api_server.fastserve_api_server

# launch the client
python fastserve/examples/online.py
```
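The example client in fastserve/examples/online.py defines the actual request format. Purely as an illustration of what such a client tends to look like, here is a hypothetical sketch; the host, port, route, and payload fields below are placeholders, not FastServe's real HTTP API:

```python
# Hypothetical client sketch. The URL, route, and JSON fields are placeholders;
# FastServe's real request schema is whatever its API server defines, as
# demonstrated in fastserve/examples/online.py.
import requests


def generate(prompt: str, server_url: str = "http://localhost:8000") -> dict:
    payload = {"prompt": prompt, "max_tokens": 64}  # placeholder fields
    resp = requests.post(f"{server_url}/generate", json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    print(generate("Hello, my name is"))
```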
If you want to contribute to the project, please read contribution.md.
The architecture of FastServe is greatly inspired by vLLM.
If you use FastServe for your research, please cite our paper:
```bibtex
@misc{wu2023fast,
  title={Fast Distributed Inference Serving for Large Language Models},
  author={Bingyang Wu and Yinmin Zhong and Zili Zhang and Gang Huang and Xuanzhe Liu and Xin Jin},
  year={2023},
  eprint={2305.05920},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```