## Optimized Model List

The following LLMs have been optimized on Intel GPU, and more are on the way:

| Model Name | BF16 |
|---|---|
| Llama-3.2-3B | meta-llama/Llama-3.2-3B-Instruct |
| Llama-3.1-8B | meta-llama/Llama-3.1-8B-Instruct |
| Qwen2.5-1.5B | Qwen/Qwen2.5-1.5B |
## Installation

### Install From Source
Currently, SGLang XPU only supports installation from source. Please refer to "Getting Started on Intel GPU" to install the XPU dependencies.
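As a rough sketch, a from-source install typically looks like the following. The repository URL and the `"python[all]"` extras group are assumptions based on the standard SGLang installation flow; defer to "Getting Started on Intel GPU" for the authoritative steps.

```shell
# Clone the SGLang repository and install the Python package from source.
# The URL and extras group below are assumptions; follow the
# "Getting Started on Intel GPU" guide for your exact setup.
git clone https://github.com/sgl-project/sglang.git
cd sglang
pip install -e "python[all]"
```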
### Install Using Docker

The Docker image for XPU is under active development. Please stay tuned.

## Launch of the Serving Engine
Example command to launch SGLang serving:
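A minimal launch sketch is shown below. The model path, host, and port are illustrative assumptions; substitute one of the optimized models listed above and the options appropriate for your environment.

```shell
# Launch the SGLang serving engine.
# Model path, host, and port are illustrative assumptions.
python -m sglang.launch_server \
  --model-path meta-llama/Llama-3.2-3B-Instruct \
  --host 0.0.0.0 \
  --port 30000
```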
## Benchmarking with Requests

You can benchmark the performance via the `bench_serving` script.
Run the command in another terminal.
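A sketch of such a benchmarking command, assuming the server launched above is still running; the request count is an illustrative assumption.

```shell
# Benchmark the running SGLang server with the bench_serving script.
# --num-prompts is an illustrative assumption; tune it to your workload.
python -m sglang.bench_serving \
  --backend sglang \
  --num-prompts 100
```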
You can also send requests to the server directly (e.g., with `curl`) or via your own script.
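For example, a single request could be sent with `curl` as sketched below. The port and the OpenAI-compatible `/v1/chat/completions` endpoint are assumptions; match them to the options used when launching the server.

```shell
# Send one chat request to the running server with curl.
# Port and endpoint are assumptions; match your launch command.
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.2-3B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```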