Prerequisites
Note: You don’t need a running Ollama server - SGLang acts as the backend. You only need the `ollama` CLI or the Ollama Python library as a client.
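As a sketch, the clients can be installed as follows (the install script also ships the full Ollama binary, but you won’t need to start its server; `ollama` is the official client package on PyPI):

```shell
# Install the Ollama CLI via the official install script (Unix-like systems)
curl -fsSL https://ollama.com/install.sh | sh

# Or, if you only need the Python client library
pip install ollama
```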
Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/` | GET, HEAD | Health check used by the Ollama CLI |
| `/api/tags` | GET | List available models |
| `/api/chat` | POST | Chat completions (streaming & non-streaming) |
| `/api/generate` | POST | Text generation (streaming & non-streaming) |
| `/api/show` | POST | Model information |
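The endpoints above can also be exercised directly with curl. This sketch assumes the SGLang server is listening on `localhost:30000` and that a model named `qwen2.5-7b-instruct` was loaded; substitute your own host, port, and model name:

```shell
# List the models the server is exposing
curl -s http://localhost:30000/api/tags

# Non-streaming chat completion (model name is illustrative)
curl -s http://localhost:30000/api/chat -d '{
  "model": "qwen2.5-7b-instruct",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": false
}'
```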
Quick Start
1. Launch SGLang Server
Note: The model name used with `ollama run` must match exactly what you passed to `--model`.
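A minimal launch sketch follows. The model name is illustrative, and the host/port flags are assumptions - only `--model` is confirmed by this document; check your SGLang version for the exact flags it accepts:

```shell
# Launch SGLang serving an Ollama-compatible API
# (model name, host, and port here are illustrative)
python -m sglang.launch_server \
  --model qwen2.5-7b-instruct \
  --host 0.0.0.0 \
  --port 30000
```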
2. Use Ollama CLI
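With the server up, the CLI can be pointed at SGLang via the `OLLAMA_HOST` environment variable, which the `ollama` CLI honors. Host, port, and model name below are assumptions matching the launch sketch above:

```shell
# Point the Ollama CLI at the SGLang server instead of a local Ollama server
export OLLAMA_HOST=http://localhost:30000

# List the models SGLang is serving
ollama list

# Chat with the model (name must match the --model value used at launch)
ollama run qwen2.5-7b-instruct "Explain KV caching in one sentence."
```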
3. Use Ollama Python Library
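A sketch using the official `ollama` Python client. The host, port, and model name are assumptions; they must match what you passed when launching SGLang:

```python
# Requires: pip install ollama, and a running SGLang server.
# Host/port and model name below are illustrative assumptions.
from ollama import Client

client = Client(host="http://localhost:30000")

# Non-streaming chat request
resp = client.chat(
    model="qwen2.5-7b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp["message"]["content"])

# Streaming variant: chunks arrive as they are generated
for chunk in client.chat(
    model="qwen2.5-7b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
):
    print(chunk["message"]["content"], end="", flush=True)
```

Dict-style access (`resp["message"]["content"]`) works across both older (plain-dict) and newer (typed, subscriptable) versions of the client.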
Smart Router
For intelligent routing between local Ollama (fast) and remote SGLang (powerful) using an LLM judge, see the Smart Router documentation.
Summary
| Component | Purpose |
|---|---|
| Ollama API | Familiar CLI/API that developers already know |
| SGLang Backend | High-performance inference engine |
| Smart Router | Intelligent routing - fast local for simple tasks, powerful remote for complex tasks |
