Agent-R1

Training Powerful LLM Agents with End-to-End Reinforcement Learning

Agent-R1 is an open-source framework for training powerful language agents with end-to-end reinforcement learning. With Agent-R1, you can build custom agent workflows, define interactive environments and tools, and train multi-step agents in a unified RL pipeline.

Step-level MDP

A principled MDP formulation that enables flexible context management and per-step reward signals.
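
To make the idea concrete, here is a minimal sketch of a step-level rollout. It is not Agent-R1's API: `Step`, `CountdownEnv`, `rollout`, and `always_one` are hypothetical names invented for illustration, and the toy task stands in for a real environment. It shows only the two properties named above: the environment decides what context the policy sees at each step, and every step carries its own reward.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Step:
    """One transition: the context the policy saw, its action, and the reward."""
    context: str
    action: str
    reward: float


@dataclass
class CountdownEnv:
    """Hypothetical toy environment: reach zero by subtracting 1 or 2."""
    value: int = 3
    history: List[str] = field(default_factory=list)

    def observe(self) -> str:
        # Context management: expose only the two most recent events plus
        # the current state, not the full interaction transcript.
        recent = " | ".join(self.history[-2:])
        return f"value={self.value}; recent=[{recent}]"

    def step(self, action: str) -> Tuple[float, bool]:
        self.value -= int(action)
        self.history.append(f"sub {action}")
        done = self.value <= 0
        # Per-step reward: a small cost for every move, plus a bonus
        # when the agent lands exactly on zero.
        reward = -0.1 + (1.0 if self.value == 0 else 0.0)
        return reward, done


def rollout(env: CountdownEnv,
            policy: Callable[[str], str],
            max_steps: int = 10) -> List[Step]:
    """Collect one trajectory as a list of independent step records."""
    steps: List[Step] = []
    for _ in range(max_steps):
        context = env.observe()
        action = policy(context)
        reward, done = env.step(action)
        steps.append(Step(context, action, reward))
        if done:
            break
    return steps


def always_one(context: str) -> str:
    """Stand-in for an LLM policy: always subtract 1."""
    return "1"


trajectory = rollout(CountdownEnv(value=3), always_one)
for s in trajectory:
    print(s)
```

Because each `Step` stores its own context and reward, a trainer could in principle assign credit per step instead of only at the end of the episode.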

Layered Abstractions

Choose the level of abstraction that fits your use case, from maximum flexibility to out-of-the-box convenience.

Reading Guide

Start with Getting Started if you want the minimal path: use the same environment as verl, download the processed data from ModelScope, run a sanity check, and confirm the repository is ready.
Read Step-level MDP and Layered Abstractions if you want to understand the framework design before touching code.
Follow Agent Task Tutorial if you want to see the minimal GSM8K + Tool example through ToolEnv + BaseTool.
Read Recipes and Algorithms for the current GSM8K, HotpotQA, ALFWorld, WebShop, paper-search, and algorithm script layout.

Scope of This Documentation

This version of the documentation is intentionally compact. It focuses on the parts that are already central to Agent-R1 today: the core agent abstractions, runnable examples, and recipe-level integrations.

Supported by the State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China (USTC).