
Commit 0113281

simplify readme (#110)
Parent: 7c62e51

File tree: 4 files changed (+33 / -137 lines)


README.md

Lines changed: 10 additions & 63 deletions
@@ -8,6 +8,8 @@ We gained a great deal of inspiration and motivation from [this open source proj

 <img src="./docs/llm-inference.png" alt="image" width=600 height="auto">

+### TL;DR
+
 Llm-inference is a platform for deploying and managing LLM (Large Language Model) inference tasks with the following features:

 - Utilizes Ray technology to organize multiple nodes into a cluster, achieving centralized management of computational resources and distributing resources required for each inference task.
@@ -22,100 +24,45 @@ Llm-inference is a platform for deploying and managing LLM (Lifelong Learning Ma

 More features in [Roadmap](./Roadmap.md) are coming soon.

-## Getting started

-### Deploy locally
+## Deployment

-#### Install `LLM Inference` and dependencies
+### Install `LLM Inference` and dependencies

 You can start by cloning the repository and pip install `llm-serve`. It is recommended to deploy `llm-serve` with Python 3.10+.

 ```
 git clone https://github.com/OpenCSGs/llm-inference.git
 cd llm-inference
-pip install .
-```
-
-Option to use another pip source for faster transfer if needed.
-
-```
-pip install . -i https://pypi.tuna.tsinghua.edu.cn/simple/
 ```

 Install specified dependencies by components:

 ```
 pip install '.[backend]'
-pip install '.[frontend]'
 ```

-**Note:** Install vllm dependency if runtime supports GPUs, run the following command:
+**Note:** `vllm` is optional, since it requires a GPU:

 ```
 pip install '.[vllm]'
 ```

-Option to use other pip sources for faster transfers if needed.
-
+Install `llm-inference`:
 ```
-pip install '.[backend]' -i https://pypi.tuna.tsinghua.edu.cn/simple/
-pip install '.[frontend]' -i https://pypi.tuna.tsinghua.edu.cn/simple/
-pip install '.[vllm]' -i https://pypi.tuna.tsinghua.edu.cn/simple/
-```
-
-#### Install Ray and start a Ray Cluster locally
-
-Pip install Ray:
-
-```
-pip install -U "ray[serve-grpc]==2.9.3"
+pip install .
 ```

-Option to use another pip source for faster transfer if needed.
-
-```
-pip install -U "ray[serve-grpc]==2.9.3" -i https://pypi.tuna.tsinghua.edu.cn/simple/
-```
-
-> **Note:** ChatGLM2-6b requires transformers<=4.33.3, while the latest vllm requires transformers>=4.36.0.
-
-Start cluster then:
+### Start a Ray Cluster locally

 ```
 ray start --head --port=6379 --dashboard-host=0.0.0.0 --dashboard-port=8265
 ```

-See reference [here](https://docs.ray.io/en/releases-2.9.3/ray-overview/installation.html).
-
-#### Quick start
-
-You can follow the [quick start](./docs/quick_start.md) to run an end-to-end case for model serving.
-
-#### Uninstall
-
-Uninstall `llm-serve` package:
-
-```
-pip uninstall llm-serve
-```
-
-Then shutdown the `Ray` cluster:
-
-```
-ray stop
-```
-
-### API server
-
-See the [guide](./docs/api_server.md) for API server and API documents.
-
-### Deploy on bare metal
-
-See the [guide](./docs/deploy_on_bare_metal.md) to deploy on bare metal.
+### Quick start

-### Deploy on kubernetes
+You can follow the [quick start](./docs/quick_start.md) to run an end-to-end case.

-See the [guide](./docs/deploy_on_kubernetes.md) to deploy on kubernetes.

 ## FAQ
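
After this change, the README's local deployment path ends at the `ray start` command above. As an optional sanity check before deploying any model (a minimal sketch using Ray's public Python API; it is not part of the README itself):

```python
import ray

# Attach to the cluster started above with `ray start --head ...`.
ray.init(address="auto")

# Print the resources the head node reports (CPUs, GPUs, memory, ...).
print(ray.cluster_resources())

ray.shutdown()
```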

README_cn.md

Lines changed: 9 additions & 56 deletions
@@ -19,101 +19,54 @@ Llm-inference is a platform for deploying and managing LLM (Large Language Model)

 More features in the [Roadmap](./Roadmap.md) are under development; contributions are welcome.

-## Getting started
+## Local deployment

-### Deploy locally
-
-#### Deploy `LLM Inference` and its dependencies
+### Deploy `LLM Inference` and its dependencies

 You can download the project code and then install `llm-serve` with pip. It is recommended to deploy `llm-serve` with Python 3.10+.

 ```
 git clone https://github.com/OpenCSGs/llm-inference.git
 cd llm-inference
-pip install .
-```
-
-If you are limited by network transfer speed, you can use a faster pip mirror.
-
-```
-pip install . -i https://pypi.tuna.tsinghua.edu.cn/simple/
 ```

 Install the specified dependencies by component:

 ```
 pip install '.[backend]'
-pip install '.[frontend]'
 ```

-**Note:** If the runtime supports GPUs, run the following command to install the vllm dependency:
+**Note:** vllm is optional since it requires a GPU; install it depending on your environment:

 ```
 pip install '.[vllm]'
 ```

-If you are limited by network transfer speed, you can use a faster pip mirror.
-
-```
-pip install '.[backend]' -i https://pypi.tuna.tsinghua.edu.cn/simple/
-pip install '.[frontend]' -i https://pypi.tuna.tsinghua.edu.cn/simple/
-pip install '.[vllm]' -i https://pypi.tuna.tsinghua.edu.cn/simple/
-```
-
-#### Install Ray and start a Ray Cluster locally
-
-Install Ray:
-
+Install `llm-inference`:
 ```
-pip install -U "ray[serve-grpc]==2.8.0"
+pip install .
 ```

 If you are limited by network transfer speed, you can use a faster pip mirror.

 ```
-pip install -U "ray[serve-grpc]==2.8.0" -i https://pypi.tuna.tsinghua.edu.cn/simple/
+pip install '.[backend]' -i https://pypi.tuna.tsinghua.edu.cn/simple/
+pip install '.[vllm]' -i https://pypi.tuna.tsinghua.edu.cn/simple/
+pip install . -i https://pypi.tuna.tsinghua.edu.cn/simple/
 ```

-> **Note:** ChatGLM2-6b requires transformers<=4.33.3, while the latest vllm requires transformers>=4.36.0.
+#### Install Ray and start a Ray Cluster locally

 Start the Ray cluster:

 ```
 ray start --head --port=6379 --dashboard-host=0.0.0.0 --dashboard-port=8265
 ```

-See [this document](https://docs.ray.io/en/releases-2.8.0/ray-overview/installation.html) for more information on installing and starting Ray.
-
 #### Quick start

 You can follow the [quick start](./docs/quick_start.md) to run an end-to-end model serving example.

-#### Uninstall
-
-Uninstall `llm-serve`:
-
-```
-pip uninstall llm-serve
-```
-
-Stop the `Ray` cluster:
-
-```
-ray stop
-```
-
-### API server
-
-For details about the API server and the APIs, see [this document](./docs/api_server.md)
-
-### Deploy on bare metal
-
-See [this document](./docs/deploy_on_bare_metal.md) for how to deploy on bare metal.
-
-### Deploy on Kubernetes
-
-See [this document](./docs/deploy_on_kubernetes.md) for how to deploy on Kubernetes.
-
 ## Miscellaneous

 ### Using models from a local path, a git server, S3 storage, or the OpenCSG Model Hub

docs/quick_start.md

Lines changed: 13 additions & 18 deletions
@@ -2,7 +2,7 @@

 ## Introduction to llm-serve

-`llmserve` comes with its own CLI, `llm-serve`, which allows you to interact directly with the backend without having to use the Gradio frontend.
+`llmserve` comes with its own CLI, `llm-serve`, which allows you to interact directly with the backend.

 Installing `llmserve` also installs the `llm-serve` CLI, and you can get a list of all available commands by running `llm-serve --help`.

@@ -25,57 +25,52 @@ Installing `llmserve` also installs the `llm-serve` CLI, and you can get a list

 ## Start a model serving

-You can deploy any model in the `models` directory of this repo, or define your own model YAML file and run that instead.
+You can deploy any model in the [models](../models) directory of this repo, or define your own model YAML file and run that instead.
 For example:

 ```
-llm-serve start serving-rest --model=models/text-generation--gpt2.yaml
-
-# You can start mutiple models serving at once.
-llm-serve start serving-rest --model=models/text-generation--facebook--opt-125m.yaml --model=models/text-generation--gpt2.yaml
+llm-serve start serving-rest --model models/text-generation--facebook--opt-125m.yaml
 ```

 ## Check model serving status and predict URL

 Check model serving status and predict URL by:

 ```SHELL
-# llm-serve list serving --name gpt2
+# llm-serve list serving --appname default
 {
-  "gpt2": {
+  "default": {
     "status": {
-      "gpt2": {
+      "default": {
         "application_status": "RUNNING",
         "deployments_status": {
-          "gpt2": "HEALTHY",
-          "RouterDeployment": "HEALTHY"
+          "facebook--opt-125m": "HEALTHY",
+          "facebook--opt-125m-router": "HEALTHY"
         }
       }
     },
     "url": {
-      "prodict_url": "http://0.0.0.0:8000/api/v1/default/gpt2/run/predict"
+      "facebook/opt-125m": "http://0.0.0.0:8000/api/v1/default/facebook--opt-125m/run/predict"
     }
   }
 }
 ```

-## Using the model serving
+## Invoke model serving

 Invoke the model with the command `llm-serve predict`:

 ```
-llm-serve predict --model gpt2 --prompt "I am going to do" --prompt "What do you like"
+llm-serve predict --model facebook/opt-125m --prompt "I am going to do"
 ```

 If you start the model using `llm-serve start serving-rest`, you can also run the following `curl` command to call the model predict API shown above.

 ```
-curl -H "Content-Type: application/json" -X POST -d '{"prompt": "What can I do"}' "http://127.0.0.1:8000/api/v1/default/gpt2/run/predict"
-
-curl -H "Content-Type: application/json" -X POST -d '[{"prompt":"How can you"}, {"prompt": "What can I do"}]' "http://127.0.0.1:8000/api/v1/default/gpt2/run/predict"
+curl -H "Content-Type: application/json" -X POST -d '{"prompt": "What can I do"}' "http://127.0.0.1:8000/api/v1/default/facebook--opt-125m/run/predict"
 ```

-## Start your trial
+## Start a model serving with Gradio UI

 You can start a trial with the following command, which will start a serving and built-in UI for the model running on <http://127.0.0.1:8000/facebook--opt-125m>.
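
The predict endpoint shown in the `curl` example can also be called from Python. A minimal sketch using only the standard library (it assumes the `facebook--opt-125m` serving started above is still running; the response format is not shown in this diff, so the body is simply printed):

```python
import json
import urllib.request

# Same endpoint and JSON body as the curl example above.
url = "http://127.0.0.1:8000/api/v1/default/facebook--opt-125m/run/predict"
data = json.dumps({"prompt": "What can I do"}).encode("utf-8")

req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))  # response payload depends on the deployment
```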

setup.py

Lines changed: 1 addition & 0 deletions
@@ -53,6 +53,7 @@
     "torchmetrics==1.2.1",
     "llama_cpp_python==0.2.57",
     "transformers==4.39.1",
+    "ray[serve]==2.9.3",
 ],
 "vllm": [
     "vllm>=0.2.0,<0.2.6",
