1. Model Introduction
Qwen-Image-Edit-2511 is an enhanced version of Qwen-Image-Edit-2509 with multiple improvements, most notably better consistency. Built on the 20B Qwen-Image model, Qwen-Image-Edit-2511 extends Qwen-Image’s text rendering capabilities to image editing tasks, enabling precise text editing.
Key Enhancements in Qwen-Image-Edit-2511:
- Mitigated Image Drift: Reduces unwanted changes in non-edited regions of the image.
- Improved Character Consistency: The model can perform imaginative edits based on an input portrait while preserving the identity and visual characteristics of the subject.
- Multi-Person Consistency: Enhanced consistency in multi-person group photos, enabling high-fidelity fusion of two separate person images into a coherent group shot.
- Integrated LoRA Capabilities: Selected popular community-created LoRAs are integrated directly into the base model, unlocking their effects without extra tuning (e.g., lighting enhancement, viewpoint generation).
- Enhanced Industrial Design Generation: Special attention to practical engineering scenarios, including batch industrial product design and material replacement for industrial components.
- Strengthened Geometric Reasoning: Stronger geometric reasoning capability for generating auxiliary construction lines for design or annotation purposes.
2. SGLang-diffusion Installation
SGLang-diffusion offers multiple installation methods. You can choose the most suitable installation method based on your hardware platform and requirements. Please refer to the official SGLang-diffusion installation guide for installation instructions.
3. Model Deployment
This section provides deployment configurations optimized for different hardware platforms and use cases.
3.1 Basic Configuration
Qwen-Image-Edit-2511 is a 20B-parameter model optimized for image editing tasks. The recommended launch configuration varies by hardware.
3.2 Configuration Tips
All currently supported optimization options are listed here.
- --vae-path: Path to a custom VAE model or HuggingFace model ID (e.g., fal/FLUX.2-Tiny-AutoEncoder). If not specified, the VAE is loaded from the main model path.
- --num-gpus: Number of GPUs to use.
- --tp-size: Tensor parallelism size (applies only to the encoder; should not be larger than 1 if text encoder offload is enabled, because layer-wise offload plus prefetch is faster).
- --sp-degree: Sequence parallelism size (typically should match the number of GPUs).
- --ulysses-degree: The degree of DeepSpeed-Ulysses-style SP in USP.
- --ring-degree: The degree of ring attention-style SP in USP.
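As a rough illustration of how these flags combine, the sketch below assembles a launch command from them. The `sglang serve` entry point and the `--model-path` flag are assumptions not stated in this section, and `build_launch_command` is a hypothetical helper. The sketch also assumes that the total SP degree equals the product of the Ulysses and ring degrees, as in the USP scheme; check the installation guide for the launcher's actual constraints.

```python
import shlex


def build_launch_command(model_path, num_gpus=1, tp_size=1,
                         ulysses_degree=1, ring_degree=1, vae_path=None):
    """Assemble a launch command from the flags listed above (hypothetical helper)."""
    # Assumption: in USP the total sequence-parallel degree is ulysses * ring.
    sp_degree = ulysses_degree * ring_degree
    if sp_degree > num_gpus:
        raise ValueError(
            f"sp degree {sp_degree} exceeds the number of GPUs ({num_gpus})"
        )
    argv = [
        "sglang", "serve",            # assumed CLI entry point
        "--model-path", model_path,   # assumed flag name
        "--num-gpus", str(num_gpus),
        "--tp-size", str(tp_size),
        "--sp-degree", str(sp_degree),
        "--ulysses-degree", str(ulysses_degree),
        "--ring-degree", str(ring_degree),
    ]
    if vae_path is not None:
        # Only pass a VAE path when overriding the one bundled with the model.
        argv += ["--vae-path", vae_path]
    return argv


if __name__ == "__main__":
    cmd = build_launch_command(
        "Qwen/Qwen-Image-Edit-2511", num_gpus=4, ulysses_degree=2, ring_degree=2
    )
    print(shlex.join(cmd))
```

Returning the command as a list keeps it safe to pass to `subprocess.run` without shell quoting issues; `shlex.join` is used only to print a copy-pasteable form.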
4. API Usage
For complete API documentation, please refer to the official API usage guide.
4.1 Edit an Image
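A minimal sketch of an edit request follows. It assumes the server exposes an OpenAI-compatible `/v1/images/edits` endpoint on `http://localhost:30000` and returns base64 image data under `data[0].b64_json`; the port, endpoint path, form field names, and the helpers `build_edit_request` and `edit_image` are all assumptions for illustration, so consult the API usage guide for the authoritative request format.

```python
import base64


def build_edit_request(base_url, prompt, model="Qwen/Qwen-Image-Edit-2511"):
    """Return the URL and form fields for an OpenAI-style image edit call."""
    url = base_url.rstrip("/") + "/v1/images/edits"  # assumed endpoint path
    fields = {
        "model": model,
        "prompt": prompt,
        "response_format": "b64_json",  # ask for inline base64 output
    }
    return url, fields


def edit_image(base_url, image_path, prompt, output_path):
    """Send the input image plus prompt and save the edited image to disk."""
    import requests  # third-party dependency, imported only when sending

    url, fields = build_edit_request(base_url, prompt)
    with open(image_path, "rb") as image_file:
        # Passing `files` makes requests encode the body as multipart/form-data.
        response = requests.post(
            url, data=fields, files={"image": image_file}, timeout=600
        )
    response.raise_for_status()
    encoded = response.json()["data"][0]["b64_json"]  # assumed response shape
    with open(output_path, "wb") as out:
        out.write(base64.b64decode(encoded))
    return output_path
```

With a server running, a call such as `edit_image("http://localhost:30000", "input.png", "Change the sign text to 'OPEN'", "edited.png")` would write the result to `edited.png`; the file names and prompt here are placeholders.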
4.2 Advanced Usage
4.2.1 Cache-DiT Acceleration
SGLang integrates Cache-DiT, a caching acceleration engine for Diffusion Transformers (DiT), to achieve up to 7.4x inference speedup with minimal quality loss. You can set SGLANG_CACHE_DIT_ENABLED=True to enable it. For more details, please refer to the SGLang Cache-DiT documentation.
Basic Usage
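A minimal sketch of enabling Cache-DiT at launch: the environment variable is set for the server process only, leaving the parent environment untouched. The `sglang serve --model-path` command is an assumption, and `cache_dit_env` and `launch_with_cache_dit` are hypothetical helpers.

```python
import os
import subprocess


def cache_dit_env(base_env=None):
    """Return a copy of the environment with Cache-DiT switched on."""
    env = dict(os.environ if base_env is None else base_env)
    env["SGLANG_CACHE_DIT_ENABLED"] = "True"  # value as given in the text above
    return env


def launch_with_cache_dit(model_path="Qwen/Qwen-Image-Edit-2511"):
    """Start the server with Cache-DiT enabled (assumed CLI, blocks until exit)."""
    return subprocess.run(
        ["sglang", "serve", "--model-path", model_path],
        env=cache_dit_env(),
        check=True,
    )
```

Calling `launch_with_cache_dit()` is roughly equivalent to prefixing the launch command with `SGLANG_CACHE_DIT_ENABLED=True` in a shell.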
DBCache Parameters: DBCache controls block-level caching behavior:

| Parameter | Env Variable | Default | Description |
| --- | --- | --- | --- |
| Fn | SGLANG_CACHE_DIT_FN | 1 | Number of first blocks to always compute |
| Bn | SGLANG_CACHE_DIT_BN | 0 | Number of last blocks to always compute |
| W | SGLANG_CACHE_DIT_WARMUP | 4 | Warmup steps before caching starts |
| R | SGLANG_CACHE_DIT_RDT | 0.24 | Residual difference threshold |
| MC | SGLANG_CACHE_DIT_MC | 3 | Maximum continuous cached steps |
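To make the roles of W, R, and MC concrete, here is a deliberately simplified model of the per-step caching decision. It is an illustrative sketch, not Cache-DiT's implementation: it ignores Fn and Bn, and it takes the per-step residual differences as given inputs (`dbcache_schedule` and its argument names are hypothetical).

```python
def dbcache_schedule(residual_diffs, warmup=4, rdt=0.24, max_cached=3):
    """Decide, step by step, whether to recompute or reuse cached blocks.

    residual_diffs: one residual-difference value per denoising step.
    """
    decisions = []
    consecutive_cached = 0
    for step, diff in enumerate(residual_diffs):
        must_compute = (
            step < warmup                        # W: no caching during warmup
            or diff >= rdt                       # R: output changed too much
            or consecutive_cached >= max_cached  # MC: cap on back-to-back reuse
        )
        if must_compute:
            decisions.append("compute")
            consecutive_cached = 0
        else:
            decisions.append("cache")
            consecutive_cached += 1
    return decisions


if __name__ == "__main__":
    diffs = [0.5, 0.5, 0.5, 0.5, 0.1, 0.1, 0.1, 0.1, 0.3, 0.1]
    print(dbcache_schedule(diffs))
```

In this toy model, a lower threshold or a smaller MC forces more recomputation (closer to uncached quality), while higher values reuse the cache more aggressively.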
TaylorSeer Configuration: TaylorSeer improves caching accuracy using Taylor expansion:

| Parameter | Env Variable | Default | Description |
| --- | --- | --- | --- |
| Enable | SGLANG_CACHE_DIT_TAYLORSEER | false | Enable TaylorSeer calibrator |
| Order | SGLANG_CACHE_DIT_TS_ORDER | 1 | Taylor expansion order (1 or 2) |

Combined Configuration Example:
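A combined DBCache and TaylorSeer configuration can be expressed as a set of environment variables. The sketch below builds that set from the documented variable names and defaults; `cache_dit_config` is a hypothetical helper, and the lowercase `true`/`false` spelling for the TaylorSeer switch mirrors the default shown in the table rather than a confirmed parsing rule.

```python
def cache_dit_config(fn=1, bn=0, warmup=4, rdt=0.24, mc=3,
                     taylorseer=False, ts_order=1):
    """Build Cache-DiT environment variables from the documented parameters."""
    if ts_order not in (1, 2):
        raise ValueError("Taylor expansion order must be 1 or 2")
    return {
        "SGLANG_CACHE_DIT_ENABLED": "True",
        "SGLANG_CACHE_DIT_FN": str(fn),          # first blocks always computed
        "SGLANG_CACHE_DIT_BN": str(bn),          # last blocks always computed
        "SGLANG_CACHE_DIT_WARMUP": str(warmup),  # steps before caching starts
        "SGLANG_CACHE_DIT_RDT": str(rdt),        # residual difference threshold
        "SGLANG_CACHE_DIT_MC": str(mc),          # max continuous cached steps
        "SGLANG_CACHE_DIT_TAYLORSEER": "true" if taylorseer else "false",
        "SGLANG_CACHE_DIT_TS_ORDER": str(ts_order),
    }


if __name__ == "__main__":
    # Print in KEY=VALUE form, suitable for prefixing a launch command.
    config = cache_dit_config(taylorseer=True, ts_order=2)
    print(" ".join(f"{key}={value}" for key, value in config.items()))
```

The resulting dictionary can be merged into the server's environment (for example via the `env` argument of `subprocess.run`) before launch.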
4.2.2 CPU Offload
- --dit-cpu-offload: Use CPU offload for DiT inference. Enable this if you run out of memory.
- --text-encoder-cpu-offload: Use CPU offload for text encoder inference.
- --image-encoder-cpu-offload: Use CPU offload for image encoder inference.
- --vae-cpu-offload: Use CPU offload for VAE.
- --pin-cpu-memory: Pin memory for CPU offload. Intended only as a temporary workaround if the run fails with “CUDA error: invalid argument”.
5. Benchmark
Test Environment:
- Hardware: NVIDIA B200 GPU (1x)
- Model: Qwen/Qwen-Image-Edit-2511
- sglang diffusion version: 0.5.6.post2
