
Core Concepts

Claw-R1's design is built around three tightly integrated ideas. Seeing how they fit together is the key to understanding why the framework works the way it does.

  • Base URL Integration


    Any agent that speaks HTTP can join the training loop with a single configuration change: pointing its base URL at the Gateway. No SDK patches and no source-code modifications are needed; the Gateway acts as a transparent network-layer proxy.


  • Middleware Layer


    The Gateway and the DataPool together form the sole bridge between the Agent Side and the Training Side. The two sides never communicate directly, so serving and training can coexist fully asynchronously, without either one blocking the other.


  • Production Scenario


    The framework is designed for agents that live in the real world: agents that must serve users without interruption while continuously improving from their own interactions.

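To make "change base_url, nothing else" concrete, here is a minimal sketch using only the Python standard library. Everything in it is hypothetical: the `/v1/generate` path, the `{"prompt": ...}` payload, and the names `agent_step` and `make_gateway_handler` are illustrative stand-ins, not Claw-R1's actual Gateway API. The agent code is identical in both calls; only the base URL differs, and the toy proxy records each request/response pair on the way through.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Ignore any HTTP_PROXY environment settings so localhost calls stay local.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def post_json(url, raw: bytes) -> bytes:
    req = urllib.request.Request(url, data=raw, headers={"Content-Type": "application/json"})
    with _OPENER.open(req) as resp:
        return resp.read()


def send_json(handler, body: bytes) -> None:
    """Write a 200 JSON response with an explicit Content-Length."""
    handler.send_response(200)
    handler.send_header("Content-Type", "application/json")
    handler.send_header("Content-Length", str(len(body)))
    handler.end_headers()
    handler.wfile.write(body)


class UpstreamHandler(BaseHTTPRequestHandler):
    """Toy stand-in for the model server the agent normally talks to."""

    def do_POST(self):
        request = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        send_json(self, json.dumps({"output": request["prompt"].upper()}).encode())

    def log_message(self, *args):  # keep the demo quiet
        pass


def make_gateway_handler(upstream_url, trajectories):
    """Hypothetical transparent proxy: forward unchanged, record the exchange."""

    class GatewayHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            raw = self.rfile.read(int(self.headers["Content-Length"]))
            body = post_json(upstream_url + self.path, raw)  # same path, same bytes
            # Record the request/response pair before replying to the agent.
            trajectories.append((json.loads(raw), json.loads(body)))
            send_json(self, body)

        def log_message(self, *args):
            pass

    return GatewayHandler


def serve(server):
    """Run an HTTP server in a background thread and return its base URL."""
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address[:2]
    return f"http://{host}:{port}"


def agent_step(base_url, prompt):
    """The 'black-box' agent: its only link to the outside world is base_url."""
    raw = json.dumps({"prompt": prompt}).encode()
    return json.loads(post_json(base_url + "/v1/generate", raw))["output"]


trajectories = []
upstream = ThreadingHTTPServer(("127.0.0.1", 0), UpstreamHandler)
upstream_url = serve(upstream)
gateway = ThreadingHTTPServer(("127.0.0.1", 0), make_gateway_handler(upstream_url, trajectories))
gateway_url = serve(gateway)

direct = agent_step(upstream_url, "hi")     # talks to the model directly
proxied = agent_step(gateway_url, "hello")  # same code, different base_url

for server in (gateway, upstream):
    server.shutdown()
    server.server_close()
```

Only the second call passes through the proxy, so `trajectories` ends up holding exactly one exchange. The agent never learns that it was observed, which is what allows a black-box agent to join the loop without code changes.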


The Closed Loop

These three concepts are not independent features; together they form a flywheel:

  Black-box agent changes base_url (zero code modification)
      ↓
  Production agent can be deployed and serve real users
      ↓
  Real users generate real interaction trajectories
      ↓
  Middleware Layer intercepts and buffers trajectories asynchronously
      ↓
  Trainer fetches batches, updates model weights
      ↓
  Updated weights pushed back → agent improves
      ↓
  Better agent generates higher-quality trajectories
      ↺ the cycle repeats (positive feedback loop)
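The buffering and weight-push steps of this loop can be sketched with a thread-safe queue standing in for the DataPool. This is an illustration under stated assumptions, not Claw-R1's implementation: `DataPool`, `WeightStore`, `fetch_batch`, `publish`, and the `policy_version` field are hypothetical names, and the "training step" is just a sleep. What it shows is the decoupling: the agent side only ever appends, and only the trainer waits.

```python
import queue
import threading
import time


class DataPool:
    """Hypothetical trajectory buffer between Agent Side and Training Side."""

    def __init__(self):
        self._queue = queue.Queue()  # unbounded, so put() never blocks serving

    def put(self, trajectory):
        self._queue.put_nowait(trajectory)

    def fetch_batch(self, batch_size):
        # Blocks only the trainer, until batch_size trajectories are available.
        return [self._queue.get() for _ in range(batch_size)]


class WeightStore:
    """Hypothetical holder of the policy version currently being served."""

    def __init__(self):
        self._lock = threading.Lock()
        self._version = 0

    def publish(self):
        with self._lock:
            self._version += 1

    def current(self):
        with self._lock:
            return self._version


def agent_side(pool, weights, n_requests):
    for request_id in range(n_requests):
        # Serve with whatever weights are current; never wait for training.
        pool.put({"request_id": request_id, "policy_version": weights.current()})


def training_side(pool, weights, n_batches, batch_size, trained):
    for _ in range(n_batches):
        batch = pool.fetch_batch(batch_size)
        time.sleep(0.01)  # stand-in for a slow gradient update
        trained.extend(batch)
        weights.publish()  # push updated weights back to the agent side


pool, weights, trained = DataPool(), WeightStore(), []
trainer = threading.Thread(target=training_side, args=(pool, weights, 3, 4, trained))
trainer.start()
agent_side(pool, weights, n_requests=12)  # returns without waiting for the trainer
trainer.join()
```

Because `put()` never blocks, the agent loop finishes regardless of how slow the update step is, while the `policy_version` field records which weights produced each trajectory. With a single trainer thread the queue preserves FIFO order, so batches arrive in the order the requests were served.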

Remove any one of the three and the loop breaks:

  • Without base URL integration, black-box agents need code changes → real-world deployment becomes impractical
  • Without the Middleware Layer, Agent Side and Training Side must be coupled → training blocks service
  • Without the production scenario focus, the first two concepts merely make for a more convenient RLVR framework, and the core value of learning from real user interactions is lost