Unify local and remote models into one OpenAI-compatible endpoint
Linx is a bridge application that connects local models (via Ollama or Llama.cpp) and remote models (via OpenRouter.ai or other OpenAI-compatible providers) under a single unified API. It exposes all connected models through an OpenAI-compatible interface, allowing seamless use in applications like Cursor AI, VSCode extensions, or any client supporting the OpenAI API format. Both CLI and GUI versions exist, with the CLI being fully functional and the GUI in active development.
- Unified Endpoint — Merge local and remote models into one /v1 API
- Multi-Provider Support — Ollama, Llama.cpp, OpenRouter, and OpenAI-compatible APIs
- OpenAI-Compatible — Works with any OpenAI-style client (Cursor, Continue, etc.)
- Privacy First — Keep your data local with Ollama or Llama.cpp
- Smart Routing — Automatic provider selection with intelligent fallback
- Tunneling — Public access via localhost.run or ngrok
- CLI & GUI — Command-line interface ready, GUI in development
- Model Mapping — Custom model name aliases across providers
- Secure — Optional API key authentication
- Stream Support — Full streaming for real-time responses
- No Timeout Limits — Long-running tasks supported
Option A: Ollama
ollama serve
Option B: Llama.cpp Server
./llama-server -m model.gguf --port 8080
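Before installing Linx, you can confirm that the backend you just started is actually reachable. A minimal sketch using only the Python standard library, assuming Ollama's default endpoint at http://localhost:11434:

import json
import urllib.request

# Ollama's native API lists the locally installed models at /api/tags.
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    tags = json.load(resp)

for model in tags.get("models", []):
    print(model.get("name"))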
pip install -r requirements.txt
Edit config.json to configure your providers (see the Configuration section below).
CLI Mode:
python run_cli.py
With Options:
python run_cli.py --port 8080 --tunnel
Note: Electron GUI is in active development.
Linx works with any OpenAI-compatible tool:
- Cursor AI — Set API URL to http://localhost:8080/v1
- Continue.dev — Configure as OpenAI-compatible provider
- VSCode Extensions — Use Linx endpoint for AI features
- Custom Applications — Query via standard OpenAI API format
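For custom applications, any OpenAI client library can simply be pointed at the Linx base URL. A minimal sketch with the official openai Python package, assuming Linx runs on its default port without --api-key (the placeholder key is then only there to satisfy the client library):

from openai import OpenAI

# Point a standard OpenAI client at the local Linx endpoint.
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed",  # replace with your key if Linx was started with --api-key
)

response = client.chat.completions.create(
    model="gpt-4o",  # resolved via Linx's model mappings, e.g. to qwen2.5-coder:32b
    messages=[{"role": "user", "content": "Write a haiku about local LLMs."}],
)
print(response.choices[0].message.content)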
Base URL: http://localhost:8080
- GET /v1/models — List available models
- POST /v1/chat/completions — Chat completions (streaming & non-streaming)

- GET /api/tags — List Ollama models
- POST /api/chat — Ollama native chat (NDJSON)
- POST /api/generate — Ollama generate endpoint
- POST /api/show — Model information

- GET /v1/providers/status — Provider health status
- POST /api/tunnel/start — Start localhost.run tunnel
- POST /api/tunnel/stop — Stop tunnel
- GET /api/tunnel/status — Tunnel status
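As an example of the streaming path, a client can request chat completions with stream=True and read the response chunk by chunk. A sketch with the openai Python package, assuming the default local address and no API key:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# stream=True asks Linx to relay the provider's output as server-sent events.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain what a reverse proxy does."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()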
Example config.json:
{
  "ollama": {
    "enabled": true,
    "endpoint": "http://localhost:11434",
    "thinking_mode": true,
    "model_mappings": {
      "gpt-4o": "qwen2.5-coder:32b",
      "gpt-4": "llama3.1:70b",
      "gpt-3.5-turbo": "llama3.2:3b",
      "default": "qwen2.5-coder:7b"
    }
  },
  "llamacpp": {
    "enabled": false,
    "endpoint": "http://localhost:8080",
    "model_mappings": {
      "gpt-4": "local-model"
    }
  },
  "openrouter": {
    "enabled": false,
    "api_key": "sk-or-v1-your-api-key-here",
    "endpoint": "https://openrouter.ai/api/v1",
    "model_mappings": {
      "gpt-4o": "openai/gpt-4o",
      "claude-3.5-sonnet": "anthropic/claude-3.5-sonnet",
      "deepseek-chat": "deepseek/deepseek-chat"
    }
  },
  "routing": {
    "provider_priority": ["ollama", "llamacpp", "openrouter"],
    "fallback_enabled": true,
    "cost_optimization": true
  },
  "server": {
    "port": 8080,
    "hostname": "127.0.0.1"
  },
  "tunnel": {
    "use_tunnel": true,
    "type": "localhost_run"
  }
}
Configuration Options:
- enabled — Enable/disable provider
- endpoint — Provider API URL
- thinking_mode — Enable extended reasoning (Ollama/Llama.cpp)
- model_mappings — Map requested model names to provider-specific models
- provider_priority — Order of provider selection
- fallback_enabled — Auto-fallback to next provider on failure
- cost_optimization — Prefer cheaper providers when possible
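To illustrate how these options interact, here is a rough sketch (not the project's actual implementation) of reading config.json and working out which providers are candidates, in order:

import json

with open("config.json") as f:
    config = json.load(f)

routing = config["routing"]

# Candidate providers: enabled ones, in provider_priority order.
candidates = [
    name for name in routing["provider_priority"]
    if config.get(name, {}).get("enabled", False)
]

# With fallback disabled, only the first choice would be tried.
if not routing.get("fallback_enabled", True):
    candidates = candidates[:1]

print("Providers to try, in order:", candidates)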
python run_cli.py [options]
Options:
- --port PORT — Server port (default: 8080)
- --host HOST — Bind address (default: 127.0.0.1)
- --tunnel — Enable localhost.run tunnel
- --no-tunnel — Disable tunnel
- --ollama URL — Override Ollama endpoint URL
- --api-key KEY — Require API key authentication
Examples:
# Basic usage
python run_cli.py
# Custom port with tunnel
python run_cli.py --port 9000 --tunnel
# With API key protection
python run_cli.py --api-key sk-your-secret-key
# Custom Ollama endpoint
python run_cli.py --ollama http://192.168.1.100:11434
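When the server is started with --api-key, clients present that key exactly as they would to any OpenAI-style endpoint. A hedged sketch, assuming Linx reads the standard Authorization header that OpenAI client libraries send:

from openai import OpenAI

# The same value passed to --api-key; the SDK sends it as "Authorization: Bearer <key>".
client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-your-secret-key")
print([m.id for m in client.models.list().data])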
Linx allows you to map common model names (like gpt-4o) to your preferred local or remote models:
"model_mappings": {
"gpt-4o": "qwen2.5-coder:32b",
"gpt-4": "llama3.1:70b",
"claude-3.5-sonnet": "anthropic/claude-3.5-sonnet"
}
How it works:
- Client requests gpt-4o
- Linx checks mappings for each provider
- Routes to first available provider with that mapping
- Falls back to next provider if primary fails
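A simplified sketch of that lookup, assuming the configuration structure shown above (an illustration of the flow, not the project's actual code):

def resolve(requested_model, config):
    """Return (provider, provider_model) for the first provider that can serve the request."""
    for provider in config["routing"]["provider_priority"]:
        settings = config.get(provider, {})
        if not settings.get("enabled"):
            continue
        mappings = settings.get("model_mappings", {})
        # Prefer an explicit mapping, then the provider's "default" entry if it has one.
        target = mappings.get(requested_model) or mappings.get("default")
        if target:
            return provider, target
    raise LookupError(f"No provider available for {requested_model!r}")

# With the sample config above, resolve("gpt-4o", config)
# returns ("ollama", "qwen2.5-coder:32b").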
Benefits:
- Use familiar model names across providers
- Seamless switching between local and remote models
- Easy A/B testing of different models
pyinstaller --name Linx-CLI --onefile --console --icon=icon.ico --add-data "config.json;." run_cli.py
python setup.py py2app --cli
Linx acts as an intelligent proxy between AI clients and model providers:
┌─────────────┐
│  AI Client  │   (Cursor, Continue, Kilocode, Custom App)
│  (OpenAI    │
│   API)      │
└──────┬──────┘
       │
       ▼
┌─────────────────────────────────────┐
│             Linx Router             │
│  ┌──────────────────────────────┐   │
│  │ Smart Routing & Fallback     │   │
│  │ Model Mapping & Translation  │   │
│  │ Health Checks & Monitoring   │   │
│  └──────────────────────────────┘   │
└───┬─────────┬─────────┬─────────────┘
    │         │         │
    ▼         ▼         ▼
┌────────┐ ┌──────┐ ┌──────────┐
│ Ollama │ │Llama │ │OpenRouter│
│ Local  │ │ .cpp │ │  Remote  │
└────────┘ └──────┘ └──────────┘
- Renamed: OllamaLink → Linx
- Multi-Provider: Added Llama.cpp support alongside Ollama
- Enhanced Routing: Smart provider selection with health monitoring
- OpenAI Compatible: Full /v1 API compliance
- Streaming: Proper SSE streaming for all providers
- Tunnel Support: localhost.run integration for remote access
- Code Optimization: Cleaner architecture, removed global variables
- GUI Development: Electron-based interface in progress
Contributions are welcome! Please feel free to submit issues or pull requests.
MIT License - see license.md for details