Gateway Server¶
The Gateway Server is a FastAPI HTTP service that acts as the network-layer proxy between agents and the Claw-R1 training infrastructure.
Design Principles¶
- Independent process: The Gateway runs as a plain OS process, not a Ray actor. This means it can be restarted independently of the Ray cluster.
- Pure proxy: The Gateway does not manage any engine lifecycle. It only forwards requests, collects steps, and submits to DataPool.
- OpenAI-compatible: Implements the same interface as OpenAI's chat completions API, making it a drop-in replacement.
Starting the Gateway¶
python -m claw_r1.gateway.gateway \
--data-pool-name data_pool \
--vllm-addresses http://host1:8001,http://host2:8001 \
--tokenizer-path /path/to/model \
--prompt-length 4096 \
--response-length 1024
Arguments¶
| Argument | Required | Description |
|---|---|---|
--data-pool-name |
Yes | Ray actor name of the DataPool to connect to |
--vllm-addresses |
Yes | Comma-separated list of vLLM server addresses (load-balanced round-robin) |
--tokenizer-path |
Yes | Path to the HuggingFace tokenizer |
--prompt-length |
Yes | Maximum prompt token length (for padding) |
--response-length |
Yes | Maximum response token length (for padding) |
Endpoints¶
POST /generate (white-box mode)¶
Called by AgentFlowBase.gateway_generate(). Forwards the generation request to a vLLM server, tokenizes the response, and returns token IDs.
# Request
{
"trajectory_uid": "string",
"prompt_uid": "string",
"messages": [...], # OpenAI chat messages
"max_tokens": 1024,
"temperature": 1.0
}
# Response
{
"response_text": "string",
"response_ids": [101, 202, ...], # token IDs
"prompt_ids": [50, 60, ...] # full context token IDs
}
POST /submit_steps (white-box mode)¶
Called by AgentFlowBase.gateway_submit_steps(). Submits one or more Step objects to the DataPool.
# Request
{
"steps": [
{
"trajectory_uid": "string",
"prompt_uid": "string",
"prompt_ids": [...],
"response_ids": [...],
"reward": 0.0,
"step_index": 0,
"policy_version": 42,
"is_last": false,
"metadata": {}
}
]
}
POST /compute_reward¶
Computes a reward score for a completed trajectory step.
# Request
{
"trajectory_uid": "string",
"messages": [...],
"dataset_fields": {} # task-specific fields (ground truth, etc.)
}
# Response
{
"reward": 0.85
}
POST /{trajectory_uid}/{prompt_uid}/v1/chat/completions (reserved)¶
OpenAI-compatible endpoint for black-box agents. The trajectory_uid and prompt_uid are encoded in the URL path, allowing the Gateway to associate incoming requests with the correct trajectory without any client-side changes beyond base_url.
Status
This endpoint is designed and stubbed. Full black-box online integration is under active development.
POST /complete_trajectory/{trajectory_uid} (reserved)¶
Called by black-box agents to mark the end of a trajectory and optionally provide a final reward.
Load Balancing¶
When multiple --vllm-addresses are provided, the Gateway distributes requests across them using round-robin:
# Internal: cycle through vLLM addresses
self.vllm_address_cycle = itertools.cycle(vllm_addresses)
vllm_url = next(self.vllm_address_cycle)
This provides basic load balancing without requiring an external proxy.