Pie is a high-performance, programmable LLM serving system that empowers you to design and deploy custom inference logic and optimization strategies.
> 🧪 **Note:** This software is in a pre-release stage and under active development. It is recommended for testing and research purposes only.
- **Configure a Backend:** Navigate to a backend directory and follow its `README.md` for setup.
- **Add Wasm Target:** Install the WebAssembly target for Rust. This is required to compile the Rust-based inferlets in the `example-apps` directory (a verification command follows this list):

  ```bash
  rustup target add wasm32-wasip2
  ```
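To confirm the target is available before building, you can list the installed Rust targets; this uses standard `rustup` subcommands:

```bash
rustup target list --installed | grep wasm32-wasip2
```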
Build the PIE CLI and the example inferlets.
- **Build the PIE CLI:** From the repository root, run:

  ```bash
  cd pie-cli && cargo install --path .
  ```

- **Build the Examples:** Compile the example inferlets to WebAssembly (see the sanity check after this list):

  ```bash
  cd example-apps && cargo build --target wasm32-wasip2 --release
  ```
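As a quick sanity check, you can confirm that the `pie` binary is on your `PATH` and that the example inferlets were built. The `--help` flag is an assumption here (most Rust CLIs provide one); the artifact path matches the one used in the run step below:

```bash
# Verify the CLI installed (assumes a conventional --help flag).
pie --help

# List the compiled inferlet artifacts.
ls example-apps/target/wasm32-wasip2/release/*.wasm
```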
Download a model, start the engine, and run an inferlet.
- **Download a Model:** Use the PIE CLI to add a model from the model index:

  ```bash
  pie model add "llama-3.2-1b-instruct"
  ```

- **Start the Engine:** Launch the PIE engine with an example configuration. This opens the interactive PIE shell:

  ```bash
  cd pie-cli
  pie start --config ./example_config.toml
  ```

- **Run an Inferlet:** From within the PIE shell, execute a compiled inferlet:

  ```
  pie> run ../example-apps/target/wasm32-wasip2/release/text_completion.wasm -- --prompt "What is the capital of France?"
  ```
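Arguments after the `--` separator are passed through to the inferlet itself (the usual pass-through convention), so you can supply any prompt; for example:

```
pie> run ../example-apps/target/wasm32-wasip2/release/text_completion.wasm -- --prompt "Summarize the plot of Hamlet in one sentence."
```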