Kimi-K2.5 is Moonshot AI’s open-source, natively multimodal, agentic mixture-of-experts (MoE) model. It has 1T total parameters (32B active), a 256K-token context window, MLA attention, and a MoonViT vision encoder, and it supports both thinking and instant modes.
In SGLang, Kimi-K2.5 uses the kimi_k2 reasoning and tool-call parsers for correct thinking and tool handling.
Kimi-K2.5 support is in SGLang main and will land in the next release. Use the latest main or a nightly image until then.
Official deployment guide: Kimi-K2.5 deployment guide
Install (Latest Main)
uv pip install "sglang @ git+https://github.com/sgl-project/sglang.git#subdirectory=python"
# For CUDA 12:
uv pip install "nvidia-cudnn-cu12==9.16.0.29"
# For CUDA 13:
uv pip install "nvidia-cudnn-cu13==9.16.0.29"
Launch Kimi-K2.5 with SGLang
Example: single node, TP8 on H200.
python3 -m sglang.launch_server \
--model-path moonshotai/Kimi-K2.5 \
--tp 8 \
--trust-remote-code \
--tool-call-parser kimi_k2 \
--reasoning-parser kimi_k2
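A model of this size may take a while to load, so scripts that send requests right after launch may want to poll the server first. The sketch below assumes the server exposes a `/health` route on port 30000, the port used in the curl examples on this page. `http_probe` and `wait_until_ready` are hypothetical helper names, not part of SGLang, and the timeout values are arbitrary.

```python
import time
import urllib.request


def http_probe(base_url: str) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses.
        return False


def wait_until_ready(base_url="http://localhost:30000", timeout_s=1800.0,
                     interval_s=5.0, probe=http_probe, sleep=time.sleep):
    """Poll `probe` until it succeeds (True) or `timeout_s` elapses (False)."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(base_url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Calling `wait_until_ready()` before the first request blocks until the probe succeeds or the timeout expires. The `probe` and `sleep` parameters are injectable so the loop can be exercised without a running server.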
Parser Requirements
--tool-call-parser kimi_k2: Required for tool calling.
--reasoning-parser kimi_k2: Required to parse thinking content; thinking mode is enabled by default.
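With both parsers enabled, a chat completion response is expected to carry the thinking text separately from the final answer, and tool calls as structured entries. A minimal sketch of reading those fields, assuming the message uses the `reasoning_content`, `content`, and `tool_calls` keys of the OpenAI-compatible schema. The sample response and the `get_weather` tool are hand-written for illustration and are not real model output.

```python
import json


def split_message(response: dict) -> dict:
    """Pull thinking text, final answer, and tool calls out of the first choice."""
    message = response["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        # In the OpenAI schema, arguments arrive as a JSON-encoded string.
        calls.append((fn["name"], json.loads(fn["arguments"])))
    return {
        "thinking": message.get("reasoning_content"),
        "answer": message.get("content"),
        "tool_calls": calls,
    }


# Hand-written sample response (hypothetical), shaped like a tool-call turn.
sample = {
    "choices": [{
        "message": {
            "role": "assistant",
            "reasoning_content": "The user wants the weather, so call the tool.",
            "content": None,
            "tool_calls": [{
                "id": "call_0",
                "type": "function",
                "function": {"name": "get_weather", "arguments": '{"city": "Beijing"}'},
            }],
        }
    }]
}

result = split_message(sample)
print(result["tool_calls"])
```

If the reasoning parser flag is omitted, the thinking text would presumably stay inline in `content`, and `thinking` here would come back as `None`.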
Test the Deployment
Thinking mode is enabled by default. To disable thinking (instant mode), set chat_template_kwargs.thinking to false in the request body; with the OpenAI Python client, pass it through extra_body.
# Thinking mode (default)
curl http://localhost:30000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "moonshotai/Kimi-K2.5",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain mixture-of-experts in one sentence."}
],
"max_tokens": 256
}'
# Instant mode (thinking disabled)
curl http://localhost:30000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "moonshotai/Kimi-K2.5",
"messages": [
{"role": "user", "content": "Give one sentence on MoE models."}
],
"max_tokens": 128,
"chat_template_kwargs": {"thinking": false}
}'
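When the request is built by hand rather than through a client library, the `chat_template_kwargs` object sits at the top level of the JSON body; `extra_body` is the OpenAI Python client's mechanism for merging extra keys into that body. The sketch below builds both variants with the standard library only. `build_chat_payload` and `send` are hypothetical helper names, and the base URL assumes the default port used in the curl examples.

```python
import json
import urllib.request

MODEL = "moonshotai/Kimi-K2.5"


def build_chat_payload(prompt: str, thinking: bool = True, max_tokens: int = 256) -> dict:
    """Build a /v1/chat/completions body; thinking stays on unless disabled."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    if not thinking:
        # Instant mode: the chat template reads a boolean `thinking` kwarg.
        payload["chat_template_kwargs"] = {"thinking": False}
    return payload


def send(payload: dict, base_url: str = "http://localhost:30000") -> dict:
    """POST the payload to a running server and return the decoded JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


instant = build_chat_payload("Give one sentence on MoE models.", thinking=False, max_tokens=128)
print(json.dumps(instant, indent=2))
```

With the OpenAI Python client, the equivalent would be passing `extra_body={"chat_template_kwargs": {"thinking": False}}` to `client.chat.completions.create(...)`, which ends up as the same top-level key on the wire.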
Kimi-K2.5 is multimodal. Image inputs are supported via the OpenAI-compatible vision API. For more details, see openai_api_vision.ipynb.
# Image input (SGLang)
curl http://localhost:30000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "moonshotai/Kimi-K2.5",
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": "Describe this image."},
{
"type": "image_url",
"image_url": {
"url": "https://github.com/sgl-project/sglang/blob/main/examples/assets/example_image.png?raw=true"
}
}
]
}
],
"max_tokens": 256
}'
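For images that are not reachable by URL, a common pattern with OpenAI-compatible vision endpoints is to inline the file as a base64 `data:` URL. The sketch below assumes the server accepts such URLs in the `image_url` field. The helper names are hypothetical, and the three-byte payload is a placeholder, not a real image.

```python
import base64
import mimetypes
from pathlib import Path


def image_part_from_bytes(data: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes as an `image_url` content part using a data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}}


def image_part_from_file(path: str) -> dict:
    """Read a local file and guess its MIME type from the extension."""
    mime_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    return image_part_from_bytes(Path(path).read_bytes(), mime_type)


def build_vision_payload(question: str, image_part: dict, max_tokens: int = 256) -> dict:
    """Combine a text question and one image part into a chat request body."""
    return {
        "model": "moonshotai/Kimi-K2.5",
        "messages": [{
            "role": "user",
            "content": [{"type": "text", "text": question}, image_part],
        }],
        "max_tokens": max_tokens,
    }


# Placeholder bytes stand in for real image data.
part = image_part_from_bytes(b"abc")
payload = build_vision_payload("Describe this image.", part)
print(part["image_url"]["url"])
```

In practice `image_part_from_file("photo.png")` would replace the placeholder, and the resulting payload can be posted to `/v1/chat/completions` the same way as the curl example above.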
Video chat is experimental and is only supported in the official Moonshot API for now.