System Settings section to ensure the clusters run at maximum performance. If you encounter any problems, feel free to open an issue in the SGLang repository.
Component Version Mapping For SGLang
| Component | Version | How to Obtain |
|---|---|---|
| HDK | 25.3.RC1 | link |
| CANN | 8.5.0 | See Obtain CANN Image |
| PyTorch Adapter | 7.3.0 | link |
| MemFabric | 1.0.5 | `pip install memfabric-hybrid==1.0.5` |
| Triton | 3.2.0 | `pip install triton-ascend` |
| Bisheng | 20251121 | link |
| SGLang NPU Kernel | NA | link |
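The table can be turned into a quick environment check. The sketch below is a hypothetical helper, not part of SGLang: it compares installed pip packages against the table (only `memfabric-hybrid` carries an explicit pin there) and compares CANN version strings against the 8.3.RC2 minimum stated in the CANN section. It assumes that a release candidate such as 8.3.RC2 sorts before the final release of the same major.minor, and it expects you to supply the installed CANN version string yourself.

```python
import re
from importlib import metadata

# pip distributions from the table. Only memfabric-hybrid has an explicit pin;
# None means "installed, any version" (an assumption for triton-ascend, whose
# package version string may not read exactly 3.2.0).
PIP_EXPECTED = {"memfabric-hybrid": "1.0.5", "triton-ascend": None}

# CANN versions look like "8.3.RC2" or "8.5.0". ASSUMPTION: a release
# candidate sorts before the final release of the same major.minor.
_CANN_RE = re.compile(r"(\d+)\.(\d+)\.(?:RC(\d+)|(\d+))")


def cann_key(version):
    """Turn a CANN version string into a sortable tuple."""
    m = _CANN_RE.fullmatch(version.strip())
    if m is None:
        raise ValueError(f"unrecognised CANN version: {version!r}")
    major, minor, rc, patch = m.groups()
    if rc is not None:
        return (int(major), int(minor), 0, int(rc))  # release candidate
    return (int(major), int(minor), 1, int(patch))  # final release


def cann_ok(installed, minimum="8.3.RC2"):
    """True if the installed CANN version is at least the stated minimum."""
    return cann_key(installed) >= cann_key(minimum)


def pip_problems(expected=PIP_EXPECTED, lookup=metadata.version):
    """Return {package: problem} for missing or mismatched pip packages."""
    problems = {}
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if wanted is not None and found != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    print("CANN 8.5.0 meets minimum:", cann_ok("8.5.0"))
    for pkg, problem in pip_problems().items():
        print(f"{pkg}: {problem}")
```

The `lookup` parameter exists only so the check can be exercised without touching the real environment.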
Obtain CANN Image
You can obtain a specified version of CANN, together with its dependencies, through an image.
Preparing the Running Environment
Method 1: Installing from source with prerequisites
Python Version
Only `python==3.11` is currently supported. If you don't want to break the system's pre-installed Python, try installing with conda.
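A minimal guard for the interpreter requirement might look like the sketch below. The helper name is hypothetical; the conda environment name in the message is only an example.

```python
import sys

SUPPORTED = (3, 11)  # the only version stated as supported


def is_supported_python(version_info=sys.version_info):
    """True only for Python 3.11.x."""
    return tuple(version_info[:2]) == SUPPORTED


if __name__ == "__main__":
    if not is_supported_python():
        sys.exit(
            f"Python {sys.version_info.major}.{sys.version_info.minor} found; "
            "SGLang on Ascend currently supports only 3.11 "
            "(e.g. conda create -n sglang python=3.11)"
        )
```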
CANN
Before starting work with SGLang on Ascend, you need to install the CANN Toolkit, the Kernels operator package, and NNAL, version 8.3.RC2 or higher; check the installation guide.
MemFabric-Hybrid
If you want to use PD disaggregation mode, you need to install MemFabric-Hybrid. MemFabric-Hybrid is a drop-in replacement for the Mooncake Transfer Engine that enables KV cache transfer on Ascend NPU clusters.
PyTorch and PyTorch Framework Adaptor on Ascend
Install `torch` and `torch_npu`; check the installation guide.
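To confirm that `torch_npu` is installed and can see the NPUs, a short check like the one below may help. It relies on `torch_npu` registering the `torch.npu` namespace when imported; the helper name `npu_device_count` is hypothetical.

```python
def npu_device_count():
    """Number of NPUs visible to torch_npu, or 0 if torch/torch_npu are missing."""
    try:
        import torch
        import torch_npu  # noqa: F401  (importing registers the torch.npu backend)
    except ImportError:
        return 0
    if not torch.npu.is_available():
        return 0
    return torch.npu.device_count()


if __name__ == "__main__":
    print("visible NPUs:", npu_device_count())
```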
Triton on Ascend
We provide our own implementation of Triton for Ascend.
SGLang Kernels NPU
We provide SGL kernels for Ascend NPU; check the installation guide.
DeepEP-compatible Library
We provide a DeepEP-compatible library as a drop-in replacement for deepseek-ai's DeepEP library; check the installation guide.
CustomOps
TODO: to be removed once merged into sgl-kernel-npu. This is an additional package with custom operations. `DEVICE_TYPE` can be "a3" for Atlas A3 servers or "910b" for Atlas A2 servers.
Installing SGLang from source
Method 2: Using Docker Image
Obtain Image
You can download the SGLang image or build an image from the Dockerfile to obtain the Ascend NPU image.
- Download SGLang image
- Build an image based on Dockerfile
Create Docker
Notice: `--privileged` and `--network=host` are required by RDMA, which is typically needed by Ascend NPU clusters.
Notice: The following docker command is based on Atlas 800I A3 machines. If you are using Atlas 800I A2, make sure only davinci[0-7] are mapped into the container.
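The two notices can be captured in a small argument builder. This is a sketch under stated assumptions, not the official command: it assumes an Atlas 800I A3 exposes 16 NPU nodes (`davinci0` to `davinci15`), it uses the common Ascend control nodes `/dev/davinci_manager`, `/dev/devmm_svm` and `/dev/hisi_hdc`, and it omits driver volume mounts. The helper, the container name and the image placeholder are all hypothetical.

```python
import shlex

# ASSUMPTION: A3 exposes davinci0-15; the notice above limits A2 to davinci[0-7].
NPU_NODES = {"A3": 16, "A2": 8}
# ASSUMPTION: common Ascend driver control nodes; check against the official command.
CONTROL_NODES = ["/dev/davinci_manager", "/dev/devmm_svm", "/dev/hisi_hdc"]


def docker_run_args(image, server="A3", name="sglang-npu"):
    """Build a `docker run` argv for an SGLang Ascend container (hypothetical helper)."""
    args = ["docker", "run", "-itd", "--name", name,
            "--privileged", "--network=host"]  # both required for RDMA
    devices = [f"/dev/davinci{i}" for i in range(NPU_NODES[server])] + CONTROL_NODES
    for dev in devices:
        args += ["--device", dev]
    args.append(image)
    return args


if __name__ == "__main__":
    # "<sglang-npu-image>" is a placeholder, not a real image name.
    print(shlex.join(docker_run_args("<sglang-npu-image>", server="A2")))
```

Building the argument list in one place makes the A2/A3 difference a single parameter instead of a hand-edited device list.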
System Settings
CPU performance power scheme
The default power scheme on Ascend hardware is `ondemand`, which could affect performance; changing it to `performance` is recommended.
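To find CPUs that are not yet on the `performance` governor, one option is to read the standard Linux cpufreq files, as in the sketch below (helper name hypothetical). Changing the governor itself requires root, for example with `cpupower frequency-set -g performance`.

```python
from pathlib import Path


def non_performance_cpus(sys_cpu="/sys/devices/system/cpu"):
    """Return {cpu: governor} for every CPU whose governor is not 'performance'."""
    offenders = {}
    for gov_file in sorted(Path(sys_cpu).glob("cpu[0-9]*/cpufreq/scaling_governor")):
        governor = gov_file.read_text().strip()
        if governor != "performance":
            offenders[gov_file.parent.parent.name] = governor
    return offenders


if __name__ == "__main__":
    bad = non_performance_cpus()
    if bad:
        print(f"{len(bad)} CPUs not on 'performance', e.g. {next(iter(bad.items()))}")
        print("as root: cpupower frequency-set -g performance")
```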
Disable NUMA balancing
Prevent swapping out system memory
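Neither heading above states the exact values, so the sketch below assumes the common choices `kernel.numa_balancing=0` (disable automatic NUMA balancing) and `vm.swappiness=0` (make the kernel avoid swapping as far as possible). It only reads `/proc/sys`; applying a value needs root, e.g. `sysctl -w kernel.numa_balancing=0`. The helper name is hypothetical.

```python
from pathlib import Path

# ASSUMED target values for the two settings above.
TARGETS = {"kernel/numa_balancing": "0", "vm/swappiness": "0"}


def sysctl_mismatches(targets=TARGETS, proc_sys="/proc/sys"):
    """Return {sysctl_key: current_value} for settings not at their target."""
    wrong = {}
    for rel, want in targets.items():
        path = Path(proc_sys, rel)
        current = path.read_text().strip() if path.exists() else None
        if current != want:
            wrong[rel.replace("/", ".")] = current
    return wrong


if __name__ == "__main__":
    for key, value in sysctl_mismatches().items():
        # Fix as root, e.g.: sysctl -w <key>=<target>
        print(f"{key} = {value} (target {TARGETS[key.replace('.', '/')]})")
```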
Running SGLang Service
Running Service For Large Language Models
PD Mixed Scene
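For the mixed (non-disaggregated) case, a single `sglang.launch_server` process serves both prefill and decode. The sketch below only assembles the command line; it assumes the `--device npu` and `--attention-backend ascend` flags for Ascend, so confirm them with `python -m sglang.launch_server --help` for your version. The model path and TP size are placeholders, and the helper is hypothetical.

```python
import shlex


def launch_server_cmd(model_path, tp_size, host="127.0.0.1", port=30000, extra=()):
    """Build a `sglang.launch_server` argv for a PD-mixed deployment on NPU."""
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--tp-size", str(tp_size),
        "--device", "npu",                # run on Ascend NPUs
        "--attention-backend", "ascend",  # Ascend attention kernels
        "--host", host,
        "--port", str(port),
        *extra,
    ]


if __name__ == "__main__":
    # Model path and TP size are placeholders; choose values for your model and machine.
    print(shlex.join(launch_server_cmd("/path/to/model", tp_size=16)))
```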
PD Separation Scene
- Launch Prefill Server
- Launch Decode Server
- Launch Router
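The three steps above can be sketched as three command lines: a prefill server, a decode server, and a router that sends each request to both. This is an illustration under assumptions, not the official recipe. It assumes `--disaggregation-mode`, `--disaggregation-transfer-backend ascend` (the MemFabric-Hybrid transfer path) and the router's `--pd-disaggregation`, `--prefill` and `--decode` options; check each with `--help`. Addresses, ports and the helper itself are hypothetical.

```python
import shlex
from urllib.parse import urlsplit


def pd_commands(model_path, tp_size, prefill_url, decode_url, router_port=8000):
    """Build prefill, decode and router argvs for PD separation (a sketch)."""
    common = [
        "--model-path", model_path,
        "--tp-size", str(tp_size),
        "--device", "npu",
        "--attention-backend", "ascend",
        # ASSUMPTION: KV cache moves between instances via the "ascend" backend
        "--disaggregation-transfer-backend", "ascend",
    ]

    def server(role, url):
        parts = urlsplit(url)
        return ["python", "-m", "sglang.launch_server", *common,
                "--disaggregation-mode", role,
                "--host", parts.hostname, "--port", str(parts.port)]

    router = ["python", "-m", "sglang_router.launch_router", "--pd-disaggregation",
              "--prefill", prefill_url, "--decode", decode_url,
              "--host", "0.0.0.0", "--port", str(router_port)]
    return {"prefill": server("prefill", prefill_url),
            "decode": server("decode", decode_url),
            "router": router}


if __name__ == "__main__":
    cmds = pd_commands("/path/to/model", 16, "http://10.0.0.1:30000", "http://10.0.0.2:30000")
    for role, argv in cmds.items():
        print(f"# {role}\n{shlex.join(argv)}")
```

Deriving each server's host and port from the same URL the router uses keeps the three commands consistent with each other.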
