1. Model Introduction
Qwen2.5-VL series is a vision-language model from the Qwen team, offering significant improvements over its predecessor in understanding, reasoning, and multi-modal processing. This generation delivers comprehensive upgrades across the board:- Enhanced Visual Understanding: Strong performance in document understanding, chart analysis, and scene recognition.
- Improved Reasoning: Logical reasoning and mathematical problem-solving capabilities in multi-modal contexts.
- Multiple Sizes: Available in 3B, 7B, 32B, and 72B variants to suit different deployment needs.
- ROCm Support: Compatible with AMD MI300X GPUs via SGLang (verified).
2. SGLang Installation
SGLang offers multiple installation methods. You can choose the most suitable installation method based on your hardware platform and requirements. Please refer to the official SGLang installation guide for installation instructions.3. Model Deployment
This section provides deployment configurations optimized for AMD MI300X hardware platforms and different use cases.3.1 Basic Configuration
The Qwen2.5-VL series offers models in various sizes. The following configurations have been verified on AMD MI300X GPUs. Interactive Command Generator: Use the configuration selector below to automatically generate the appropriate deployment command for your hardware platform and model size.3.2 Configuration Tips
- Memory Management: For the 72B model on MI300X, we have verified successful deployment with
--context-length 128000. Smaller context lengths can be used to reduce memory usage if needed. - Multi-GPU Deployment: Use Tensor Parallelism (
--tp) to scale across multiple GPUs. For example, use--tp 8for the 72B model and--tp 2for the 32B model on MI300X.
4. Model Invocation
4.1 Basic Usage
For basic API usage and request examples, please refer to:4.2 Advanced Usage
4.2.1 Multi-Modal Inputs
Qwen2.5-VL supports image inputs. Here’s a basic example with image input:- You can also provide local file paths using
file://protocol - For larger images, you may need more memory; adjust
--mem-fraction-staticaccordingly
