Llm-inference is a platform for deploying and managing LLM (large language model) inference tasks, with the following features:
- Uses Ray to organize multiple nodes into a cluster, enabling centralized management of computational resources and allocating the resources each inference task requires (see the sketch below).
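For context on the Ray-based clustering above: independent of this project, a multi-node Ray cluster is typically formed with the standard Ray CLI. The port and IP address below are placeholders:

```
# On the head node: start Ray and accept worker connections (port is an example)
ray start --head --port=6379

# On each worker node: join the cluster via the head node's address (placeholder IP)
ray start --address=192.168.1.10:6379
```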
More features in [Roadmap](./Roadmap.md) are coming soon.
## Getting started
### Deploy locally
#### Install `LLM Inference` and dependencies
You can start by cloning the repository and installing `llm-serve` with pip. Deploying `llm-serve` with Python 3.10+ is recommended.
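A minimal sketch of that install, assuming the repository URL (substitute the location you actually clone from):

```
# Clone the repository and install llm-serve from the source tree
git clone https://github.com/OpenCSGs/llm-inference.git
cd llm-inference
pip install .
```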
Once installed, you can run a quick prediction from the CLI:

```
llm-serve predict --model facebook/opt-125m --prompt "I am going to do"
```
If you started the model with `llm-serve start serving-rest`, you can also call the predict API shown above with the following `curl` command:
```
curl -H "Content-Type: application/json" -X POST -d '{"prompt": "What can I do"}' "http://127.0.0.1:8000/api/v1/default/facebook--opt-125m/run/predict"
```
## Start a model serving with Gradio UI
You can start a trial with the following command, which launches a serving with a built-in Gradio UI for the model at <http://127.0.0.1:8000/facebook--opt-125m>.
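The exact command is not preserved in this excerpt. Assuming a `serving-ui` mode analogous to the `serving-rest` mode used earlier (the subcommand name here is a hypothetical, not confirmed by this excerpt), the invocation might look like:

```
# Hypothetical: assumes a `serving-ui` subcommand mirroring `serving-rest` above
llm-serve start serving-ui --model facebook/opt-125m
```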