Quick Start

This quick start is a sanity check, not the main Agent-R1 workflow. Its purpose is to verify that your environment, dataset path, model path, and training stack are wired correctly.

1. Prepare a Minimal Dataset

The processed Agent-R1 datasets are available on ModelScope. You can either download the release and place or symlink its GSM8K files at ~/data/gsm8k, or regenerate the sanity-check data locally with the GSM8K preprocessing script. To download the release with the ModelScope CLI:

pip install modelscope
modelscope download --dataset Melmaphother/Agent-R1-data --local_dir data/agent-r1-data

You can also clone the dataset repository with git:

git lfs install
git clone https://www.modelscope.cn/datasets/Melmaphother/Agent-R1-data.git data/agent-r1-data

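To regenerate the sanity-check data locally instead, run the GSM8K preprocessing script:

python3 -m recipes.gsm8k.data_preprocess.process_gsm8k --local_save_dir ~/data/gsm8k

The preprocessing script produces: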

~/data/gsm8k/train.parquet
~/data/gsm8k/test.parquet
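
Before moving on, it can help to confirm that both files are actually in place, whichever route you took. The following is a minimal sketch, not part of Agent-R1: the helper name missing_dataset_files and the EXPECTED_FILES constant are hypothetical, and the check only tests that each file exists and is non-empty, not that it is valid Parquet.

```python
from pathlib import Path

# File names listed in this quick start; the constant itself is a hypothetical helper.
EXPECTED_FILES = ("train.parquet", "test.parquet")


def missing_dataset_files(data_dir):
    """Return the expected file names that are absent or empty under data_dir."""
    root = Path(data_dir).expanduser()  # resolves a leading "~"
    missing = []
    for name in EXPECTED_FILES:
        path = root / name
        # is_file() follows symlinks, so a symlinked file counts as present.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

Calling missing_dataset_files("~/data/gsm8k") should return an empty list once the data is ready; any names it returns are the files still to be placed, symlinked, or regenerated.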

2. Run the Sanity Check Script

Use the provided single-step script:

bash examples/gsm8k/run_steppo.sh

If needed, adjust the following values before running:

CUDA_VISIBLE_DEVICES
actor_rollout_ref.model.path
dataset paths under ~/data/gsm8k
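
If you would rather not edit the script just to pick GPUs, one option is to set CUDA_VISIBLE_DEVICES in the environment of the process that launches it. The sketch below is an illustration only: the helpers launch_env and launch_command are hypothetical, and it assumes the script does not overwrite CUDA_VISIBLE_DEVICES itself. The model path and dataset paths are not handled here and still need to be adjusted in the script.

```python
import os


def launch_env(gpu_ids, base_env=None):
    """Return a copy of the environment with CUDA_VISIBLE_DEVICES set to gpu_ids."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA_VISIBLE_DEVICES takes a comma-separated list of device indices.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def launch_command(script="examples/gsm8k/run_steppo.sh"):
    """Return the argument list for running the sanity check script with bash."""
    return ["bash", script]
```

From the repository root, the two helpers can be combined as subprocess.run(launch_command(), env=launch_env([0, 1]), check=True) after importing subprocess; check=True makes a non-zero exit code from the script raise an exception.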

The script examples/gsm8k/run_steppo.sh launches python3 -m agent_r1.trainer.main_agent_ppo with the StepPO-style gae estimator.

3. What to Do Next

Read Step-level MDP to understand the main training abstraction.
Read Layered Abstractions to see how AgentFlowBase, AgentEnvLoop, and ToolEnv fit together.
Continue to the Agent Task Tutorial for the minimal GSM8K + Tool example based on ToolEnv + BaseTool.
Use Recipes and Algorithms to find task-specific recipes and algorithm scripts.