Components¶

Claw-R1 is composed of five independently runnable components that communicate via HTTP and Ray RPC.

Gateway Server

FastAPI HTTP service. The network-layer entry point for all agent LLM calls. Manages load balancing across vLLM servers and submits steps to DataPool.

Gateway Server
DataPool

Ray Actor. The central trajectory buffer between Agent Side and Training Side. Supports asynchronous writes from the Gateway and batch reads from the Trainer.

DataPool
Agent Flow

Python framework for white-box agents. Manages chat templates, tokenization, multimodal data processing, and HTTP communication with the Gateway.

Agent Flow
Async Training

AsyncTrainer and AsyncRollouter Ray Actors. Continuous, non-blocking training loop with parameter synchronization.

Async Training
Reward System

RewardLoopWorker Ray Actor. Computes step-level rewards from rule-based, discriminative, or generative reward models.

Reward System

Component Interaction Map¶

                      ┌─────────────────────────────────────────┐
  Black-box Agent ───►│                                         │
  (base_url only)     │         GATEWAY SERVER                  │
                      │         (FastAPI, port 8000)            │
  White-box Agent ───►│                                         │
  (AgentFlow)         └────────────┬────────────────────────────┘
                                   │ Ray RPC (submit_step)
                                   ▼
                      ┌─────────────────────────────────────────┐
                      │         DATAPOOL                         │
                      │         (Ray Actor)                      │
                      └──────────────────┬──────────────────────┘
                                         │ fetch_batch()
                                         ▼
                      ┌─────────────────────────────────────────┐
                      │         ASYNC TRAINER                    │
                      │         (Ray Actor)                      │
                      │   ┌─────────────────────────────────┐   │
                      │   │  Actor │ Critic │ RefPolicy      │   │
                      │   └────────────────────────────────-┘   │
                      └────────────────┬────────────────────────┘
                                       │ weight sync (NCCL)
                                       ▼
                      ┌─────────────────────────────────────────┐
                      │         ASYNC ROLLOUTER                  │
                      │         (Ray Actor, rollout GPU pool)    │
                      │         vLLM servers                     │
                      └─────────────────────────────────────────┘