System Configuration
When using AMD GPUs (such as the MI300X), certain system-level optimizations help ensure stable performance. Here we take the MI300X as an example. AMD provides official documentation for MI300X optimization and system tuning:

- AMD MI300X Tuning Guides
- LLM inference performance validation on AMD Instinct MI300X
- AMD Instinct MI300X System Optimization
- AMD Instinct MI300X Workload Optimization
- Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X
Update GRUB Settings
In /etc/default/grub, append the following to GRUB_CMDLINE_LINUX:
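The exact flags are not shown on this page; as a sketch, AMD's MI300X system optimization guide recommends IOMMU pass-through, so the entry might look like:

```shell
# /etc/default/grub -- illustrative only. iommu=pt (IOMMU pass-through)
# is the setting recommended in AMD's MI300X system optimization guide;
# keep any flags your distribution already sets on this line.
GRUB_CMDLINE_LINUX="iommu=pt"
```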
Then run sudo update-grub (or your distro’s equivalent) and reboot.
Disable NUMA Auto-Balancing
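The command for this step is not shown; a minimal sketch using the standard Linux sysctl interface:

```shell
# Turn off automatic NUMA balancing (standard Linux kernel sysctl);
# writing 0 disables it. Requires root privileges.
sudo sh -c 'echo 0 > /proc/sys/kernel/numa_balancing'

# Confirm the setting; it should read back 0.
cat /proc/sys/kernel/numa_balancing
```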
Install SGLang
You can install SGLang using one of the methods below.

Install from Source
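A sketch of a from-source install; the ROCm pip extra (all_hip) is an assumption, so check the SGLang README for the current target:

```shell
# Hypothetical from-source install for ROCm; the "all_hip" extra is an
# assumption -- consult the SGLang README for the current instructions.
git clone https://github.com/sgl-project/sglang.git
cd sglang
pip install --upgrade pip
pip install -e "python[all_hip]"
```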
Install Using Docker (Recommended)
The docker images are available on Docker Hub at lmsysorg/sglang, built from rocm.Dockerfile. The steps below show how to build and use an image.

- Build the docker image.

  If you use pre-built images, you can skip this step and replace sglang_image with the pre-built image names in the steps below.
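A sketch of the build step, tagging the image sglang_image as used in the steps below; the Dockerfile location is an assumption:

```shell
# Build the ROCm image from the SGLang repository; the path to
# rocm.Dockerfile is an assumption -- adjust it for your checkout.
docker build -t sglang_image -f rocm.Dockerfile .
```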
- Create a convenient alias.

  If you are using RDMA, please note that:

  - --network host and --privileged are required by RDMA. If you don’t need RDMA, you can remove them.
  - You may need to set NCCL_IB_GID_INDEX if you are using RoCE, for example: export NCCL_IB_GID_INDEX=3.
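A sketch of such an alias, assuming the usual ROCm container device mounts (/dev/kfd and /dev/dri); the remaining flags are illustrative:

```shell
# Illustrative alias; --network host and --privileged are only needed
# for RDMA (see the notes above), and the mounts/flags are assumptions.
alias drun='docker run -it --rm \
    --network host --privileged \
    --device=/dev/kfd --device=/dev/dri \
    --group-add video \
    --ipc=host --shm-size 16G \
    -v $HOME/.cache/huggingface:/root/.cache/huggingface'
```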
- Launch the server.

  NOTE: Replace <secret> below with your huggingface hub token.

- To verify the setup, you can run a benchmark in another terminal or refer to other docs to send requests to the engine.
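A sketch of the launch and verification steps, assuming the drun alias from the step above, a hypothetical model path, and SGLang's default port 30000:

```shell
# Illustrative launch; model path and port are assumptions.
# Replace <secret> with your huggingface hub token.
drun --env "HF_TOKEN=<secret>" sglang_image \
    python3 -m sglang.launch_server \
    --model-path meta-llama/Llama-3.1-8B-Instruct \
    --host 0.0.0.0 --port 30000

# In another terminal: a quick request against the native /generate endpoint.
curl http://localhost:30000/generate \
    -H "Content-Type: application/json" \
    -d '{"text": "The capital of France is", "sampling_params": {"max_new_tokens": 16}}'
```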
Examples
Running DeepSeek-V3
The only difference when running DeepSeek-V3 is in how you start the server. Here’s an example command:

Running Llama3.1
Running Llama3.1 is nearly identical to running DeepSeek-V3. The only difference is the model specified when starting the server, as shown in the following example command:

Warmup Step
When the server displays "The server is fired up and ready to roll!", the startup was successful.
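As sketches of the two launch commands referenced in the sections above (model IDs, ports, and flags are assumptions; DeepSeek-V3 is large enough to need tensor parallelism across the node's eight GPUs):

```shell
# DeepSeek-V3: illustrative command; --tp 8 shards the model across
# eight GPUs and --trust-remote-code allows the model's custom code.
drun sglang_image \
    python3 -m sglang.launch_server \
    --model-path deepseek-ai/DeepSeek-V3 \
    --tp 8 --trust-remote-code \
    --host 0.0.0.0 --port 30000

# Llama3.1: identical apart from the model specified.
drun sglang_image \
    python3 -m sglang.launch_server \
    --model-path meta-llama/Llama-3.1-8B-Instruct \
    --host 0.0.0.0 --port 30000
```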