verl
verl is an open-source reinforcement learning (RL) training framework designed for post-training large language models (LLMs). It supports agentic RL training with features such as server-based asynchronous rollout, multi-turn conversations, and tool calls within an agent framework.

The framework employs a hybrid programming model that combines single-controller and multi-controller paradigms, allowing complex post-training dataflows to be expressed and executed flexibly. verl integrates with popular LLM infrastructures including PyTorch FSDP, Megatron-LM, vLLM, and SGLang, and offers modular APIs for seamless extension and integration with HuggingFace models.

verl is optimized for efficient resource utilization through flexible device mapping and parallelism across GPU clusters. It achieves high throughput by integrating state-of-the-art LLM training and inference frameworks, and it reduces memory redundancy and communication overhead during training-generation transitions via actor-model resharding with its 3D-HybridEngine. The framework targets developers and researchers working on RL post-training for LLMs who require scalable and efficient training on GPU clusters.
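The hybrid programming model described above can be sketched with a toy example: a single controller process expresses the whole RL dataflow (generate, then update), while each worker independently manages its own shard, as in a multi-controller setup. All class and method names below are hypothetical illustrations of the idea, not verl's actual API.

```python
# Toy sketch of the single-/multi-controller hybrid (illustrative only;
# these names are hypothetical and do not come from verl's API).

class Worker:
    """Multi-controller side: each worker manages its own model shard."""

    def __init__(self, rank: int):
        self.rank = rank
        self.weights_version = 0

    def generate(self, prompts):
        # Each worker rolls out its slice of the batch independently.
        return [f"rollout(rank={self.rank}, prompt={p})" for p in prompts]

    def update(self, rollouts):
        # Local training step on this worker's shard of the model.
        self.weights_version += 1
        return self.weights_version


class SingleController:
    """Single-controller side: one process drives the whole RL dataflow."""

    def __init__(self, num_workers: int):
        self.workers = [Worker(r) for r in range(num_workers)]

    def step(self, prompts):
        n = len(self.workers)
        # Scatter prompts across workers (data parallelism).
        shards = [prompts[r::n] for r in range(n)]
        rollouts = [w.generate(s) for w, s in zip(self.workers, shards)]
        # Dispatch the update phase to every worker with its own rollouts.
        return [w.update(r) for w, r in zip(self.workers, rollouts)]


controller = SingleController(num_workers=4)
versions = controller.step([f"p{i}" for i in range(8)])
print(versions)
```

The key property this sketch illustrates is that the controller holds only orchestration logic; the per-shard state (weights, optimizer) stays inside the workers, which is what makes device mapping and resharding decisions local to the worker layer.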
verl is an open-source RL framework for post-training large language models that supports flexible dataflows and integrates with multiple LLM infrastructures.
Post-Training RL for Large Language Models
Researchers and developers can apply reinforcement learning techniques to fine-tune large language models after initial training to improve performance on specific tasks.
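As a minimal stand-in for what RL fine-tuning optimizes, the toy below runs REINFORCE on a 3-armed bandit: the "policy" is a softmax over logits, and the reward prefers one action, so probability mass shifts toward it over training. This is a generic textbook sketch, not verl code; all names and numbers are illustrative.

```python
import math
import random

random.seed(0)

logits = [0.0, 0.0, 0.0]      # toy "policy" parameters
REWARDS = [0.0, 0.2, 1.0]     # task reward per action; action 2 is best
LR = 0.5


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


for step in range(500):
    probs = softmax(logits)
    # Sample an action from the current policy.
    a = random.choices(range(3), weights=probs)[0]
    r = REWARDS[a]
    # REINFORCE gradient: d log pi(a) / d logit_i = 1[i == a] - probs[i]
    for i in range(3):
        grad = (1.0 if i == a else 0.0) - probs[i]
        logits[i] += LR * r * grad

probs = softmax(logits)
print(probs)  # mass should concentrate on the high-reward action
```

In LLM post-training the same principle applies at vastly larger scale: responses play the role of actions, a reward model or verifier scores them, and the policy gradient (typically PPO-style rather than plain REINFORCE) updates the model weights.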
Integration with Existing LLM Infrastructure
Teams using frameworks like PyTorch FSDP or Megatron-LM can extend their workflows by incorporating RL training with verl's modular APIs.
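To illustrate the kind of extension point a modular RL training API typically exposes, the sketch below registers a custom reward function that a trainer could call on generated responses. The registry, decorator, and function names here are hypothetical examples of the pattern, not verl's actual interfaces.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical plugin registry for reward functions (illustrative pattern,
# not verl's real API).
REWARD_REGISTRY: Dict[str, Callable[[str, str], float]] = {}


def register_reward(name: str):
    """Decorator that makes a reward function discoverable by name."""
    def deco(fn: Callable[[str, str], float]):
        REWARD_REGISTRY[name] = fn
        return fn
    return deco


@register_reward("length_penalty")
def length_penalty(prompt: str, response: str) -> float:
    # Reward concise answers: 1.0 minus a small per-character penalty.
    return max(0.0, 1.0 - 0.01 * len(response))


def score_batch(reward_name: str, pairs: List[Tuple[str, str]]) -> List[float]:
    """What a trainer loop might do: look up the reward and score a batch."""
    fn = REWARD_REGISTRY[reward_name]
    return [fn(p, r) for p, r in pairs]


scores = score_batch(
    "length_penalty",
    [("q1", "short"), ("q2", "a much longer response text")],
)
print(scores)
```

The design point is that the training loop only depends on the registry's interface, so teams can plug task-specific rewards (or models wrapped in FSDP/Megatron-LM) into the pipeline without modifying the trainer itself.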