Skip to content

Claw-R1

Empowering OpenClaw with Advanced Agentic RL

GitHub Stars GitHub Forks License: MIT Python 3.10+


Claw-R1 is an Agentic RL training framework that bridges the gap between General Agents (e.g., OpenClaw, Claude Code) and Agentic Reinforcement Learning.

It introduces a Middleware Layer — Gateway Server + DataPool — as the sole bridge between the Agent Side and the Training Side. Agents, whether white-box or black-box, connect to the framework via standard HTTP with zero code modification.

  • Zero-Code Integration


    Black-box agents (LangChain, AutoGen, CrewAI, OpenClaw) integrate instantly — just redirect base_url to the Gateway. No SDK hooks, no source modifications.

    Base URL Integration

  • Middleware Layer


    Gateway + DataPool completely decouple the Agent Side from the Training Side, enabling asynchronous, non-blocking training while the agent keeps serving.

    Middleware Layer

  • Production Agent Scenario


    Supports three modes: white-box offline, black-box offline, and black-box online service. In online mode, agents serve and train simultaneously — no dataset required.

    Production Scenario

  • Async Training & Rollout


    Rollout Engine and Training Engine run independently. Data flows from live requests into DataPool; the Trainer continuously fetches batches — never blocking the agent.

    Async Training


Framework Overview

Claw-R1 Framework

The framework consists of three logical layers:

Layer Components Role
Agent Side OpenClaw / White-box AgentFlow / Black-box Agent Executes tasks, calls LLM via Gateway
Middleware Layer Gateway Server + DataPool Intercepts LLM calls, buffers trajectories asynchronously
Training Side Async Trainer + Rollout Engine + Reward System Fetches batches, updates model, syncs weights

Why Claw-R1?

Most Agentic RL frameworks share a hidden assumption: training ≠ deployment. They train on simulated data, deploy a fixed model, and periodically retrain. This works for research but breaks down in production:

  • Models trained on synthetic tasks degrade on real user request distributions
  • No mechanism for continuous adaptation to specific users or tool ecosystems
  • Blocking synchronous loops make it impossible to serve while training

Claw-R1 is designed to fill this void. It enables deployment = training: a production agent that continuously learns from its own service interactions.

Get started in minutes Read the concepts


Project Status

Active Development

Claw-R1 was initiated in March 2026 and is under active development. APIs and configurations may change before the first stable release. Contributions and feedback are welcome.


Team

Members: Daoyu Wang, Jie Ouyang, Shuo Yu

Supervisors: Qi Liu, Mingyue Cheng

Affiliation: State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China


Citation

@misc{clawr1-2026,
  title   = {Claw-R1: Agentic RL for Modern Agents},
  author  = {Wang, Daoyu and Ouyang, Jie and Yu, Shuo and Cheng, Mingyue and Liu, Qi},
  year    = {2025},
  url     = {https://github.com/AgentR1/Claw-R1},
  note    = {GitHub repository}
}