Quick Start

This guide walks you through two minimal examples: a white-box agent (an AgentFlow managed by Claw-R1) and a black-box agent (any HTTP-based agent, whose only required change is its API endpoint).

White-box Mode

In white-box mode, Claw-R1 owns the agent loop via AgentFlowBase. The framework can observe token IDs, compute rewards internally, and submit steps directly.
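To make the white-box idea concrete, here is a toy agent loop with the three properties listed above: it sees token IDs, scores responses itself, and records each step as it goes. `ToyAgentFlow`, `Step`, `generate`, and `reward_fn` are hypothetical names for illustration only. This is not the `AgentFlowBase` interface, whose methods this guide does not show.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical stand-ins; the real AgentFlowBase interface in Claw-R1 may differ.
@dataclass
class Step:
    prompt_ids: list[int]
    response_ids: list[int]
    reward: float


@dataclass
class ToyAgentFlow:
    generate: Callable[[list[int]], list[int]]  # policy: token IDs in, token IDs out
    reward_fn: Callable[[list[int]], float]     # scores a response from its token IDs
    submitted: list[Step] = field(default_factory=list)

    def run(self, prompt_ids: list[int], max_turns: int = 2) -> list[Step]:
        context = list(prompt_ids)
        steps = []
        for _ in range(max_turns):
            response_ids = self.generate(context)   # the flow sees raw token IDs
            reward = self.reward_fn(response_ids)   # reward computed in-process
            step = Step(list(context), response_ids, reward)
            self.submitted.append(step)             # "submit" the step directly
            steps.append(step)
            context.extend(response_ids)            # next turn conditions on history
        return steps
```

Because the loop itself produces the token IDs and rewards, nothing has to be reconstructed from text after the fact; that is the main difference from black-box mode.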

1. Start the Gateway

python -m claw_r1.gateway.gateway \
    --data-pool-name data_pool \
    --vllm-addresses http://localhost:8001 \
    --tokenizer-path /path/to/your/model \
    --prompt-length 4096 \
    --response-length 1024
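If you prefer to launch the Gateway from a Python script (for example, in a test harness), a small wrapper can assemble the same command. `gateway_command` and `start_gateway` are hypothetical helpers; they use only the flags shown above and assume `claw_r1` is importable from the current interpreter.

```python
import subprocess
import sys


def gateway_command(model_path: str,
                    vllm_addresses: str = "http://localhost:8001",
                    data_pool_name: str = "data_pool",
                    prompt_length: int = 4096,
                    response_length: int = 1024) -> list[str]:
    """Build the gateway argv using only the flags shown in the command above."""
    return [
        sys.executable, "-m", "claw_r1.gateway.gateway",
        "--data-pool-name", data_pool_name,
        "--vllm-addresses", vllm_addresses,
        "--tokenizer-path", model_path,
        "--prompt-length", str(prompt_length),
        "--response-length", str(response_length),
    ]


def start_gateway(model_path: str, **kwargs) -> subprocess.Popen:
    # Runs the gateway as a child process; call .terminate() on the result to stop it.
    return subprocess.Popen(gateway_command(model_path, **kwargs))
```

Passing an argument list (rather than a shell string) avoids quoting problems when the model path contains spaces.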

2. Launch Training

Use the provided Hydra configuration:

python -m claw_r1.main_agent_ppo \
    --config-name agent_ppo_trainer \
    trainer.model.path=/path/to/your/model \
    trainer.project_name=claw_r1_quickstart
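Hydra arguments such as `trainer.model.path=...` are dotted overrides that address a nested field of the selected config. The toy function below models only that addressing, to show which field each override sets. It is not Hydra itself: real Hydra also converts value types, validates keys against the config, and needs a `+` prefix to add keys that do not already exist. OmegaConf's `OmegaConf.from_dotlist` builds a similar nested config from such a list.

```python
from typing import Any


def apply_overrides(overrides: list[str]) -> dict[str, Any]:
    """Toy model of Hydra-style dotted `key=value` overrides (values stay strings)."""
    config: dict[str, Any] = {}
    for item in overrides:
        dotted_key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"override must look like key=value: {item!r}")
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})  # create intermediate groups on demand
        node[leaf] = value
    return config
```

Applied to the two overrides above, this yields `{"trainer": {"model": {"path": ...}, "project_name": "claw_r1_quickstart"}}`.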

3. Monitor

Ray Dashboard is available at http://localhost:8265 by default once the Ray cluster is initialized.
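A quick way to check from Python that the dashboard is actually listening is to send it an HTTP request. `dashboard_reachable` is a hypothetical stdlib-only helper; it reports only whether something answers HTTP on that address, not whether the Ray cluster is healthy.

```python
import http.client
import urllib.error
import urllib.request


def dashboard_reachable(url: str = "http://localhost:8265", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url` within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True   # the server is up; it just returned a non-2xx status
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False  # connection refused, DNS failure, timeout, non-HTTP reply
```

`HTTPError` is caught first because it is a subclass of `URLError`; otherwise a 404 from a running dashboard would be misreported as unreachable.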

Black-box Mode

In black-box mode, your existing agent needs only one change: redirect its base_url to the Gateway.

Any Python agent (OpenAI SDK)

import uuid

from openai import OpenAI

traj_uid  = str(uuid.uuid4())  # unique per conversation
prompt_uid = str(uuid.uuid4()) # unique per prompt group (for GRPO)

# Before
# client = OpenAI(base_url="https://api.openai.com/v1")

# After: the only change is the base_url argument
client = OpenAI(
    base_url=f"http://localhost:8000/{traj_uid}/{prompt_uid}",
    api_key="not-used",
)

response = client.chat.completions.create(
    model="your-model-name",
    messages=[{"role": "user", "content": "Hello, what can you do?"}],
)
print(response.choices[0].message.content)
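The two UIDs in the URL are what let the Gateway group trajectories. For GRPO you would typically run several rollouts of the same prompt: each gets a fresh `traj_uid`, but all share one `prompt_uid`. The sketch below builds such a group of base URLs and, to show why the grouping matters, computes the group-relative advantage of the standard GRPO formulation (each reward minus its group mean, divided by the group standard deviation). Both helpers are hypothetical, and Claw-R1's trainer may normalize rewards differently.

```python
import statistics
import uuid


def grpo_group_urls(gateway: str, group_size: int) -> tuple[str, list[str]]:
    """One prompt_uid shared by the whole group, one fresh traj_uid per rollout."""
    prompt_uid = str(uuid.uuid4())
    base = gateway.rstrip("/")
    urls = [f"{base}/{uuid.uuid4()}/{prompt_uid}" for _ in range(group_size)]
    return prompt_uid, urls


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: each reward standardized within its prompt group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Each URL would then be passed as `base_url` to its own `OpenAI` client, one client per rollout. If every rollout in a group earns the same reward, all advantages are zero, so that group contributes no learning signal.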

OpenClaw

In your OpenClaw configuration file, change:

# Before
LLM_API_BASE: "https://api.openai.com/v1"

# After
LLM_API_BASE: "http://gateway-host:8000"

That's it. Every LLM call OpenClaw makes is transparently intercepted by the Gateway, logged to the DataPool, and used for training, without interrupting service.
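Conceptually, the interception works like a logging proxy: the agent's request is forwarded to the model, the reply goes back to the agent unchanged, and a copy of the exchange is stored for training. `ToyLoggingProxy` is a hypothetical, in-process illustration of that pattern keyed by trajectory ID; the real Gateway runs as its own service and writes to the DataPool, and its internals are not shown in this guide.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable

Messages = list[dict[str, str]]


@dataclass
class ToyLoggingProxy:
    """Hypothetical stand-in for the Gateway: forward, record, return unchanged."""
    upstream: Callable[[Messages], str]
    pool: dict[str, list[tuple[Messages, str]]] = field(
        default_factory=lambda: defaultdict(list))

    def chat(self, traj_uid: str, messages: Messages) -> str:
        reply = self.upstream(messages)                       # agent still gets a real answer
        self.pool[traj_uid].append((list(messages), reply))   # side copy for training
        return reply
```

The agent only ever sees `reply`, which is why no agent code beyond the endpoint has to change.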

Async Training Mode

For production scenarios where the agent serves requests continuously, use the fully asynchronous entry point:

python -m claw_r1.async_main \
    --config-name async_ppo_trainer \
    trainer.model.path=/path/to/your/model

The async runner automatically:

Initializes the DataPool Ray actor
Starts the AsyncRollouter on the rollout GPU pool
Starts the AsyncTrainer on the training GPU pool
Starts a ParameterSynchronizer that periodically pushes updated weights to the rollout vLLM servers
Starts the Gateway as an independent FastAPI process
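The components above form a producer/consumer loop with periodic weight synchronization. The asyncio toy below sketches that data flow in a single process: an `asyncio.Queue` stands in for the DataPool, one task plays the rollouter, another the trainer, and a `sync_weights` function plays the ParameterSynchronizer by copying a version counter instead of real weights. All names are hypothetical; in Claw-R1 the rollouter and trainer run on separate GPU pools, the DataPool is a Ray actor, and the Gateway is its own process.

```python
import asyncio


async def run_pipeline(num_train_steps: int = 3, batch_size: int = 2,
                       sync_every: int = 1) -> dict:
    pool: asyncio.Queue = asyncio.Queue()  # stands in for the DataPool actor
    state = {"trainer_version": 0, "rollout_version": 0, "seen_versions": []}
    done = asyncio.Event()

    def sync_weights() -> None:
        # Stand-in for the ParameterSynchronizer: "push" trainer weights to rollout.
        state["rollout_version"] = state["trainer_version"]

    async def rollouter() -> None:
        # Produce samples tagged with the policy version that generated them.
        while not done.is_set():
            await pool.put({"policy_version": state["rollout_version"]})
            await asyncio.sleep(0)  # yield so the trainer can run

    async def trainer() -> None:
        for step in range(1, num_train_steps + 1):
            batch = [await pool.get() for _ in range(batch_size)]
            state["seen_versions"].extend(s["policy_version"] for s in batch)
            state["trainer_version"] = step  # pretend an optimizer step happened
            if step % sync_every == 0:
                sync_weights()
        done.set()

    await asyncio.gather(rollouter(), trainer())
    return state
```

Tagging each sample with the policy version that produced it makes the off-policy lag visible: with `sync_every=1`, samples consumed in training step k come from version k-1 or earlier.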

No dataset required

In async online mode, the training data comes entirely from live agent interactions. No pre-collected dataset is needed: the DataPool fills automatically as the agent handles real requests.
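Because training data arrives from live traffic, the trainer has to wait until enough samples have accumulated. Below is a minimal thread-based sketch of that behavior, assuming a simple in-memory pool (`ToyDataPool` is hypothetical, not the DataPool actor's API): agent-facing code calls `put`, and the trainer blocks in `take_batch` until a full batch exists or a timeout expires.

```python
import threading
from collections import deque
from typing import Any, Optional


class ToyDataPool:
    """Hypothetical in-memory pool: agents push samples, the trainer blocks for a batch."""

    def __init__(self) -> None:
        self._items: deque = deque()
        self._cond = threading.Condition()

    def put(self, sample: Any) -> None:
        with self._cond:
            self._items.append(sample)
            self._cond.notify_all()  # wake a trainer waiting for data

    def take_batch(self, n: int, timeout: Optional[float] = None) -> Optional[list]:
        with self._cond:
            ready = self._cond.wait_for(lambda: len(self._items) >= n, timeout=timeout)
            if not ready:
                return None  # not enough live traffic yet
            return [self._items.popleft() for _ in range(n)]
```

Returning `None` on timeout lets a training loop log that it is idle instead of hanging forever when request volume is low.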