- Open-To-Use Refit Functionality: diverse methods for weight refit in colocated or disaggregated deployments
- Easy-To-Postpone Generation: enables partial rollouts and dedicated rollout control
- Fine-Grained Engine Sleep And Wake-Up: facilitates maximum-powered rollout and training
- Training-Serving Alignment: ensures consistent performance between training and serving
- Load-Balancing Router: cache-aware load balancing for high-throughput rollout
- Deterministic Inference: ensures zero KL divergence between rollout and training
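The deterministic-inference guarantee above can be spot-checked by comparing the token log-probabilities returned by the rollout engine with those recomputed by the training framework. A minimal sketch (the `mean_kl` helper and the sample log-probability lists are illustrative, not part of any framework listed here):

```python
def mean_kl(rollout_logprobs, train_logprobs):
    """Per-token KL estimate KL(rollout || train), averaged over tokens.

    Uses the simple sampled-token estimator logp_rollout - logp_train,
    which is exactly zero when the two engines agree bitwise.
    """
    assert len(rollout_logprobs) == len(train_logprobs)
    diffs = [r - t for r, t in zip(rollout_logprobs, train_logprobs)]
    return sum(diffs) / len(diffs)

# Hypothetical log-probabilities for three sampled tokens.
rollout_lp = [-0.31, -1.72, -0.05]
train_lp = [-0.31, -1.72, -0.05]

print(mean_kl(rollout_lp, train_lp))  # 0.0 when rollout and training match
```

With deterministic inference, rollout and recomputed log-probabilities match exactly, so the estimate is identically zero rather than merely small.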
Adoption
- Miles: Enterprise-scale RL framework for large MoE models with SGLang-native rollout, speculative training, and production-grade stability
- slime: Post-training framework combining Megatron and SGLang, used to train GLM-4.6
- AReaL: Fully asynchronous RL system achieving 2.77x speedup with SGLang backend for continuous rollout generation
- ROLL: Efficient and user-friendly RL library for large language models, designed to utilize large-scale GPU resources
- verl: Full-stack RLHF framework supporting PPO, GRPO, and ReMax with modular SGLang integration
- Unsloth: 2x faster fine-tuning with optimized kernels, deploys seamlessly with SGLang inference
- LLaMA Factory: Unified framework for training 100+ LLMs with LoRA, QLoRA, and full fine-tuning methods
- Tunix: Google’s JAX-native library for LLM post-training with SFT, DPO, PPO, and GRPO support
- RL2: Ray-less reinforcement learning, a concise post-training library for large language models
