Currently supported parsers:
| Parser | Supported Models | Notes |
|---|---|---|
| deepseekv3 | DeepSeek-V3 (e.g., deepseek-ai/DeepSeek-V3-0324) | Recommend adding --chat-template ./examples/chat_template/tool_chat_template_deepseekv3.jinja to the launch command. |
| deepseekv31 | DeepSeek-V3.1 and DeepSeek-V3.2-Exp (e.g., deepseek-ai/DeepSeek-V3.1, deepseek-ai/DeepSeek-V3.2-Exp) | Recommend adding --chat-template ./examples/chat_template/tool_chat_template_deepseekv31.jinja (or ..deepseekv32.jinja for DeepSeek-V3.2) to the launch command. |
| deepseekv32 | DeepSeek-V3.2 (deepseek-ai/DeepSeek-V3.2) | |
| glm | GLM series (e.g., zai-org/GLM-4.6) | |
| gpt-oss | GPT-OSS (e.g., openai/gpt-oss-120b, openai/gpt-oss-20b, lmsys/gpt-oss-120b-bf16, lmsys/gpt-oss-20b-bf16) | The gpt-oss tool parser filters out analysis-channel events and preserves only normal text. This can leave the content empty when the model's explanations are in the analysis channel. To work around this, complete the tool round by returning tool results as role="tool" messages, which lets the model generate the final content. |
| kimi_k2 | moonshotai/Kimi-K2-Instruct | |
| llama3 | Llama 3.1 / 3.2 / 3.3 (e.g., meta-llama/Llama-3.1-8B-Instruct, meta-llama/Llama-3.2-1B-Instruct, meta-llama/Llama-3.3-70B-Instruct) | |
| llama4 | Llama 4 (e.g., meta-llama/Llama-4-Scout-17B-16E-Instruct) | |
| mistral | Mistral (e.g., mistralai/Mistral-7B-Instruct-v0.3, mistralai/Mistral-Nemo-Instruct-2407, mistralai/Mistral-7B-v0.3) | |
| pythonic | Llama-3.2 / Llama-3.3 / Llama-4 | The model outputs function calls as Python code. Requires --tool-call-parser pythonic; using it with a matching chat template is recommended. |
| qwen | Qwen series (e.g., Qwen/Qwen3-Next-80B-A3B-Instruct, Qwen/Qwen3-VL-30B-A3B-Thinking), except Qwen3-Coder | |
| qwen3_coder | Qwen3-Coder (e.g., Qwen/Qwen3-Coder-30B-A3B-Instruct) | |
| step3 | Step-3 | |
OpenAI Compatible API
Launching the Server
The --tool-call-parser flag selects the parser used to interpret the model's tool-call responses.
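For example, a launch command for a Llama 3.1 model might look like the following; the model path and port are placeholders to adjust for your deployment:

```shell
python -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --tool-call-parser llama3 \
  --port 30000
```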
Define Tools for Function Call
Below is a Python snippet that shows how to define a tool as a dictionary. The dictionary includes the tool's name, a description, and a parameters schema describing its properties.
Define Messages
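As a sketch, a hypothetical get_current_weather tool and an initial message list could be defined as follows (the tool name, description, and schema are illustrative):

```python
# Hypothetical tool definition in the OpenAI function-calling format.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city name, e.g. 'Paris'",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }
]

# A user message that should trigger a call to the tool above.
messages = [{"role": "user", "content": "What is the weather like in Paris today?"}]
```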
Initialize the Client
Non-Streaming Request
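A minimal non-streaming request might look like the following sketch. The model name and tool schema are assumptions, and the call is wrapped in a function because it needs a running server:

```python
# Minimal hypothetical tool schema (see the tool-definition step).
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a given city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def non_streaming_request(client):
    # `client` is an openai.OpenAI instance pointed at the SGLang server.
    # Requires a running server; not executed at import time.
    return client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",  # assumed model name
        messages=[{"role": "user", "content": "What is the weather in Paris?"}],
        tools=tools,
        tool_choice="auto",
        stream=False,
    )
```

In the OpenAI chat format, any tool calls in the returned response appear under response.choices[0].message.tool_calls.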
Handle Tools
When the engine determines it should call a particular tool, it returns the arguments (or partial arguments) in the response. You can parse these arguments and then invoke the tool accordingly.
Streaming Request
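A sketch of a streaming request: the chunks carry tool-call deltas, and the argument string must be accumulated across chunks before it can be parsed. The accumulation logic is factored into a helper so it can be shown independently of a live server; the model name is an assumption:

```python
def accumulate_tool_call(chunks):
    """Collect the tool name and concatenated argument fragments from stream chunks."""
    tool_name, arguments = None, ""
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if not delta.tool_calls:
            continue
        call = delta.tool_calls[0]
        if call.function.name:
            tool_name = call.function.name
        if call.function.arguments:
            arguments += call.function.arguments
    return tool_name, arguments

def streaming_request(client, messages, tools):
    # `client` is an openai.OpenAI instance pointed at the SGLang server.
    # Requires a running server; not executed at import time.
    stream = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",  # assumed model name
        messages=messages,
        tools=tools,
        stream=True,
    )
    return accumulate_tool_call(stream)
```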
Handle Tools
In streaming mode, the tool name and argument fragments arrive incrementally across chunks. Accumulate the fragments until the tool call is complete, then parse the arguments and invoke the tool.
Define a Tool Function
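A hypothetical implementation of the get_current_weather tool used in these examples; a real tool would query an actual weather service:

```python
def get_current_weather(city: str, unit: str = "celsius") -> str:
    # Placeholder implementation; a real tool would call a weather API.
    return f"The current temperature in {city} is 22 degrees {unit}."
```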
Execute the Tool
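Executing the tool amounts to parsing the accumulated JSON argument string and dispatching to the matching Python function. The registry pattern below is one common way to do this; the names are illustrative:

```python
import json

def get_current_weather(city: str, unit: str = "celsius") -> str:
    # Placeholder tool implementation.
    return f"The current temperature in {city} is 22 degrees {unit}."

# Map tool names (as they appear in the model's tool calls) to callables.
TOOL_REGISTRY = {"get_current_weather": get_current_weather}

def execute_tool(name: str, arguments_json: str) -> str:
    args = json.loads(arguments_json)
    return TOOL_REGISTRY[name](**args)
```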
Send Results Back to Model
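The tool's result is appended as a role="tool" message, linked to the original call via its tool_call_id, and the conversation is sent back to the model for a final answer. A sketch, with field names following the OpenAI chat format (the follow-up request needs a live server):

```python
def tool_result_message(tool_call_id: str, content: str) -> dict:
    # role="tool" tells the model this message is the result of its tool call.
    return {"role": "tool", "tool_call_id": tool_call_id, "content": content}

def send_results_back(client, model, messages, assistant_message, tool_call_id, result):
    # Append the assistant's tool-call turn, then the tool result, then re-query.
    # Requires a running server; not executed at import time.
    messages = messages + [assistant_message, tool_result_message(tool_call_id, result)]
    return client.chat.completions.create(model=model, messages=messages)
```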
Native API and SGLang Runtime (SRT)
Offline Engine API
Tool Choice Mode
SGLang supports OpenAI's tool_choice parameter to control when and which tools the model should call. This feature is implemented using EBNF (Extended Backus-Naur Form) grammar to ensure reliable tool-calling behavior.
Supported Tool Choice Options
- tool_choice="required": Forces the model to call at least one tool
- tool_choice={"type": "function", "function": {"name": "specific_function"}}: Forces the model to call a specific function
Backend Compatibility
Tool choice is fully supported with the Xgrammar backend, which is the default grammar backend (--grammar-backend xgrammar). However, it may not be fully supported with other backends such as outlines.
Example: Required Tool Choice
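A sketch of a request that forces at least one tool call. The payload is shown as a plain dict, as it would be passed to chat.completions.create; the model name and tool schema are assumptions:

```python
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",  # assumed model name
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a given city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    # Forces the model to call at least one of the provided tools.
    "tool_choice": "required",
}
# response = client.chat.completions.create(**payload)
```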
Example: Specific Function Choice
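To force a specific function instead, tool_choice takes an object naming it. Again a sketch with an assumed model name and tool schema:

```python
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",  # assumed model name
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a given city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    # Forces the model to call exactly this function.
    "tool_choice": {"type": "function", "function": {"name": "get_current_weather"}},
}
# response = client.chat.completions.create(**payload)
```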
Pythonic Tool Call Format (Llama-3.2 / Llama-3.3 / Llama-4)
Some Llama models (such as Llama-3.2-1B, Llama-3.2-3B, Llama-3.3-70B, and Llama-4) support a "pythonic" tool call format, where the model outputs function calls as Python code, e.g.:
- The output is a Python list of function calls, with arguments as Python literals (not JSON).
- Multiple tool calls can be returned in the same list:
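For illustration, here is what such output looks like and how it can be inspected with the standard ast module; the function names are hypothetical:

```python
import ast

# Example model output in the pythonic format: a Python list of calls,
# with arguments as Python literals rather than JSON.
model_output = '[get_weather(city="Paris"), get_tourist_attractions(city="Paris")]'

tree = ast.parse(model_output, mode="eval")
calls = tree.body.elts  # one ast.Call node per tool call
names = [call.func.id for call in calls]
args = {kw.arg: ast.literal_eval(kw.value) for kw in calls[0].keywords}
```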
How to enable
- Launch the server with --tool-call-parser pythonic.
- You may also specify --chat-template with the improved template for the model (e.g., --chat-template=examples/chat_template/tool_chat_template_llama4_pythonic.jinja). This is recommended because the model expects a special prompt format to reliably produce valid pythonic tool call outputs. The template ensures that the prompt structure (e.g., special tokens, message boundaries like <|eom|>, and function call delimiters) matches what the model was trained or fine-tuned on. If you do not use the correct chat template, tool calling may fail or produce inconsistent results.
Forcing Pythonic Tool Call Output Without a Chat Template
If you don't want to specify a chat template, you must give the model extremely explicit instructions in your messages to enforce pythonic output. For example, for Llama-3.2-1B-Instruct, you need:
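An example of such instructions; the exact wording below is an illustration, not a documented prompt:

```python
# Hypothetical system prompt demanding pythonic tool-call output.
system_prompt = (
    "You are a helpful assistant that can call tools. "
    "When you decide to call a tool, output ONLY a Python list of function calls, "
    'for example: [get_current_weather(city="Paris")]. '
    "Use Python literals for arguments. Do not output JSON and do not add any other text."
)
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "What is the weather in Paris?"},
]
```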
Note:
The model may still default to JSON if it was heavily fine-tuned on that format. If you are not using a chat template, prompt engineering (including few-shot examples) is the only way to increase the chance of pythonic output.
How to support a new model?
- Update the TOOLS_TAG_LIST in sglang/srt/function_call_parser.py with the model’s tool tags. Currently supported tags include:
- Create a new detector class in sglang/srt/function_call_parser.py that inherits from BaseFormatDetector. The detector should handle the model’s specific function call format. For example:
- Add the new detector to the MultiFormatParser class that manages all the format detectors.
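A skeleton for such a detector. The actual BaseFormatDetector interface lives in sglang/srt/function_call_parser.py, so the base class here is stubbed and the tag values and method name are assumptions:

```python
class BaseFormatDetector:
    """Stub standing in for sglang.srt.function_call_parser.BaseFormatDetector."""

class MyModelDetector(BaseFormatDetector):
    # Model-specific start/end tags for tool calls (assumed values).
    bot_token = "<tool_call>"
    eot_token = "</tool_call>"

    def has_tool_call(self, text: str) -> bool:
        # Cheap check used to decide whether this detector applies to the output.
        return self.bot_token in text
```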
