Gateway Server¶

The Gateway Server is a FastAPI HTTP service that acts as the network-layer proxy between agents and the Claw-R1 training infrastructure.

Design Principles¶

Independent process: The Gateway runs as a plain OS process, not a Ray actor. This means it can be restarted independently of the Ray cluster.
Pure proxy: The Gateway does not manage any engine lifecycle. It only forwards requests, collects steps, and submits to DataPool.
OpenAI-compatible: Implements the same interface as OpenAI's chat completions API, making it a drop-in replacement.

Starting the Gateway¶

python -m claw_r1.gateway.gateway \
    --data-pool-name  data_pool \
    --vllm-addresses  http://host1:8001,http://host2:8001 \
    --tokenizer-path  /path/to/model \
    --prompt-length   4096 \
    --response-length 1024

Arguments¶

Argument	Required	Description
`--data-pool-name`	Yes	Ray actor name of the DataPool to connect to
`--vllm-addresses`	Yes	Comma-separated list of vLLM server addresses (load-balanced round-robin)
`--tokenizer-path`	Yes	Path to the HuggingFace tokenizer
`--prompt-length`	Yes	Maximum prompt token length (for padding)
`--response-length`	Yes	Maximum response token length (for padding)

Endpoints¶

`POST /generate` (white-box mode)¶

Called by AgentFlowBase.gateway_generate(). Forwards the generation request to a vLLM server, tokenizes the response, and returns token IDs.

# Request
{
    "trajectory_uid": "string",
    "prompt_uid": "string",
    "messages": [...],         # OpenAI chat messages
    "max_tokens": 1024,
    "temperature": 1.0
}

# Response
{
    "response_text": "string",
    "response_ids": [101, 202, ...],   # token IDs
    "prompt_ids": [50, 60, ...]        # full context token IDs
}

`POST /submit_steps` (white-box mode)¶

Called by AgentFlowBase.gateway_submit_steps(). Submits one or more Step objects to the DataPool.

# Request
{
    "steps": [
        {
            "trajectory_uid": "string",
            "prompt_uid": "string",
            "prompt_ids": [...],
            "response_ids": [...],
            "reward": 0.0,
            "step_index": 0,
            "policy_version": 42,
            "is_last": false,
            "metadata": {}
        }
    ]
}

`POST /compute_reward`¶

Computes a reward score for a completed trajectory step.

# Request
{
    "trajectory_uid": "string",
    "messages": [...],
    "dataset_fields": {}    # task-specific fields (ground truth, etc.)
}

# Response
{
    "reward": 0.85
}

`POST /{trajectory_uid}/{prompt_uid}/v1/chat/completions` (reserved)¶

OpenAI-compatible endpoint for black-box agents. The trajectory_uid and prompt_uid are encoded in the URL path, allowing the Gateway to associate incoming requests with the correct trajectory without any client-side changes beyond base_url.

Status

This endpoint is designed and stubbed. Full black-box online integration is under active development.

`POST /complete_trajectory/{trajectory_uid}` (reserved)¶

Called by black-box agents to mark the end of a trajectory and optionally provide a final reward.

Load Balancing¶

When multiple --vllm-addresses are provided, the Gateway distributes requests across them using round-robin:

# Internal: cycle through vLLM addresses
self.vllm_address_cycle = itertools.cycle(vllm_addresses)
vllm_url = next(self.vllm_address_cycle)

This provides basic load balancing without requiring an external proxy.