Components
Claw-R1 is composed of five independently runnable components that communicate via HTTP and Ray RPC.
- Gateway Server
  FastAPI HTTP service. The network-layer entry point for all agent LLM calls. Manages load balancing across vLLM servers and submits steps to DataPool.
- DataPool
  Ray Actor. The central trajectory buffer between Agent Side and Training Side. Supports asynchronous writes from the Gateway and batch reads from the Trainer.
- Agent Flow
  Python framework for white-box agents. Manages chat templates, tokenization, multimodal data processing, and HTTP communication with the Gateway.
- Async Training
  AsyncTrainer and AsyncRollouter Ray Actors. Continuous, non-blocking training loop with parameter synchronization.
- Reward System
  RewardLoopWorker Ray Actor. Computes step-level rewards from rule-based, discriminative, or generative reward models.
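The Gateway-to-DataPool write path and the Trainer's batch read can be illustrated with a small in-process sketch. This is not the Claw-R1 implementation: the real DataPool is a Ray Actor and the real Gateway is a FastAPI service. The class names `DataPoolSketch` and `GatewaySketch`, the `Step` fields, the server names, and the round-robin policy are all assumptions made for illustration. Only the method names `submit_step` and `fetch_batch` come from the interaction map.

```python
from collections import deque
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Step:
    # Hypothetical step record; the real schema is not specified in this page.
    trajectory_id: str
    prompt: str
    response: str


class DataPoolSketch:
    """In-process stand-in for the DataPool Ray Actor (simple FIFO buffer)."""

    def __init__(self):
        self._steps = deque()

    def submit_step(self, step):
        # Write path: called by the Gateway after each LLM call.
        self._steps.append(step)

    def fetch_batch(self, batch_size):
        # Read path: called by the Trainer; returns at most batch_size steps.
        n = min(batch_size, len(self._steps))
        return [self._steps.popleft() for _ in range(n)]


class GatewaySketch:
    """Routes each call to the next server (assumed round-robin), then records the step."""

    def __init__(self, servers, pool):
        self._servers = cycle(servers)
        self._pool = pool

    def handle_call(self, trajectory_id, prompt):
        server = next(self._servers)
        response = f"<completion from {server}>"  # placeholder for a real vLLM request
        self._pool.submit_step(Step(trajectory_id, prompt, response))
        return server


pool = DataPoolSketch()
gateway = GatewaySketch(["vllm-0", "vllm-1"], pool)
routed = [gateway.handle_call("traj-0", f"turn {i}") for i in range(3)]
batch = pool.fetch_batch(2)
```

Here three calls are spread across two hypothetical servers, and the Trainer side then drains two of the three buffered steps in arrival order, leaving one for the next batch.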
Component Interaction Map
                        ┌─────────────────────────────────────────┐
    Black-box Agent ───►│                                         │
    (base_url only)     │             GATEWAY SERVER              │
                        │          (FastAPI, port 8000)           │
    White-box Agent ───►│                                         │
    (AgentFlow)         └────────────────────┬────────────────────┘
                                             │ Ray RPC (submit_step)
                                             ▼
                        ┌─────────────────────────────────────────┐
                        │                DATAPOOL                 │
                        │               (Ray Actor)               │
                        └────────────────────┬────────────────────┘
                                             │ fetch_batch()
                                             ▼
                        ┌─────────────────────────────────────────┐
                        │              ASYNC TRAINER              │
                        │               (Ray Actor)               │
                        │   ┌─────────────────────────────────┐   │
                        │   │  Actor  │  Critic  │  RefPolicy │   │
                        │   └─────────────────────────────────┘   │
                        └────────────────────┬────────────────────┘
                                             │ weight sync (NCCL)
                                             ▼
                        ┌─────────────────────────────────────────┐
                        │             ASYNC ROLLOUTER             │
                        │      (Ray Actor, rollout GPU pool)      │
                        │              vLLM servers               │
                        └─────────────────────────────────────────┘
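
The lower half of the map (DataPool, trainer, weight sync, rollouter) can be traced with a toy, single-threaded simulation. It is a sketch under stated assumptions, not the project's training loop: the function name `simulate`, the `policy_version` tag, and the batch and rollout sizes are hypothetical, and the real components run concurrently as Ray Actors with weights transferred over NCCL. The point it illustrates is that when rollouts are buffered and consumed in batches, the trainer may update on steps generated by slightly older weights.

```python
from collections import deque


def simulate(iterations, steps_per_rollout, batch_size):
    """Return, per iteration, how many weight versions behind each trained step was."""
    buffer = deque()       # stand-in for the DataPool
    rollout_version = 0    # weights currently loaded by the rollouter
    trainer_version = 0    # weights currently held by the trainer
    staleness_log = []

    for _ in range(iterations):
        # Rollout side: generate steps tagged with the weights that produced them.
        for _ in range(steps_per_rollout):
            buffer.append({"policy_version": rollout_version})

        # Training side: take the oldest steps (fetch_batch in the map).
        n = min(batch_size, len(buffer))
        batch = [buffer.popleft() for _ in range(n)]
        staleness_log.append([trainer_version - s["policy_version"] for s in batch])

        # One update, then push the new weights to the rollouter (weight sync).
        trainer_version += 1
        rollout_version = trainer_version

    return staleness_log, len(buffer)


staleness, backlog = simulate(iterations=3, steps_per_rollout=4, batch_size=2)
```

With these illustrative numbers the rollouter produces steps faster than the trainer consumes them, so after the first iteration every batch is one weight version behind and a backlog builds up in the buffer.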