Skip to content

Agent-R1

Training Powerful LLM Agents with End-to-End Reinforcement Learning

Agent-R1 is an open-source framework for training powerful language agents with end-to-end reinforcement learning. With Agent-R1, you can build custom agent workflows, define interactive environments and tools, and train multi-step agents in a unified RL pipeline.

  • Step-level MDP


    A principled MDP formulation that enables flexible context management and per-step reward signals.

    Learn more

  • Layered Abstractions


    From maximum flexibility to out-of-the-box, choose the right level of abstraction for your use case.

    Learn more


Reading Guide

  • Start with Getting Started if you want the minimal path: use the same environment as verl, download the processed data from ModelScope, run a sanity check, and confirm the repository is ready.
  • Read Step-level MDP and Layered Abstractions if you want to understand the framework design before touching code.
  • Follow Agent Task Tutorial if you want to see the minimal GSM8K + Tool example through ToolEnv + BaseTool.
  • Read Recipes and Algorithms for the current GSM8K, HotpotQA, ALFWorld, WebShop, paper-search, and algorithm script layout.

Scope of This Documentation

This version of the documentation is intentionally compact. It focuses on the parts that are already central to Agent-R1 today: the core agent abstractions, runnable examples, and recipe-level integrations.


Supported by the State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China (USTC).