Quick Start

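This quick start is a sanity check, not Agent-R1's main agent workflow. Its goal is to confirm that the environment, data paths, model paths, and training stack are wired together correctly.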

1. Prepare a minimal dataset

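The datasets preprocessed for Agent-R1 are published on ModelScope. After downloading, place or symlink the GSM8K files at ~/data/gsm8k. Alternatively, the sanity-check data can be regenerated locally with the GSM8K preprocessing script. To download with the ModelScope CLI: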

pip install modelscope
modelscope download --dataset Melmaphother/Agent-R1-data --local_dir data/agent-r1-data

Alternatively, clone the dataset repository with git:

git lfs install
git clone https://www.modelscope.cn/datasets/Melmaphother/Agent-R1-data.git data/agent-r1-data
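Placing the downloaded files at ~/data/gsm8k can be done with a symlink. The following is a minimal sketch, not part of Agent-R1: `link_gsm8k` is a hypothetical helper, and the `gsm8k` subdirectory used in the usage note is an assumption about the layout of the download, so check the actual contents of data/agent-r1-data before running it.

```shell
# Sketch: expose a directory of GSM8K files at the path the quick start expects.
# link_gsm8k is a hypothetical helper, not an Agent-R1 command.
link_gsm8k() {
    src_dir="$1"   # directory that holds the downloaded GSM8K files
    dst_dir="$2"   # target path, e.g. ~/data/gsm8k

    if [ ! -d "$src_dir" ]; then
        echo "source directory not found: $src_dir" >&2
        return 1
    fi
    # Refuse to overwrite an existing file, directory, or (possibly dangling) link.
    if [ -e "$dst_dir" ] || [ -L "$dst_dir" ]; then
        echo "destination already exists, leaving it untouched: $dst_dir" >&2
        return 1
    fi

    mkdir -p "$(dirname "$dst_dir")"
    # Resolve the source to an absolute path so the link works from any directory.
    ln -s "$(cd "$src_dir" && pwd)" "$dst_dir"
}
```

A possible invocation, assuming the GSM8K files live in a `gsm8k` subdirectory of the download, would be `link_gsm8k data/agent-r1-data/gsm8k ~/data/gsm8k`. The helper refuses to replace an existing destination so that data already in place is never clobbered.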

To regenerate the sanity-check data locally instead, run the GSM8K preprocessing script:

python3 -m recipes.gsm8k.data_preprocess.process_gsm8k --local_save_dir ~/data/gsm8k

It generates:

Use the single-step training script:

bash examples/gsm8k/run_steppo.sh

If needed, adjust the following before running:

The script entry point is examples/gsm8k/run_steppo.sh, which launches python3 -m agent_r1.trainer.main_agent_ppo with a StepPO-style gae estimator.
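Because this quick start exists to confirm that the data path is connected, it can help to fail early when the data directory is missing or empty. The following is a minimal sketch under that assumption: `check_data_dir` is a hypothetical helper, and it only checks that the directory contains something, not that the files are the ones the training script expects.

```shell
# Sketch: verify that a data directory exists and is not empty.
# check_data_dir is a hypothetical helper, not an Agent-R1 command.
check_data_dir() {
    data_dir="$1"

    if [ ! -d "$data_dir" ]; then
        echo "missing data directory: $data_dir" >&2
        return 1
    fi
    # The trailing slash makes ls list the target when data_dir is a symlink.
    if [ -z "$(ls -A "$data_dir/")" ]; then
        echo "data directory is empty: $data_dir" >&2
        return 1
    fi
    echo "data directory looks populated: $data_dir"
}
```

One way to use it is `check_data_dir "$HOME/data/gsm8k" && bash examples/gsm8k/run_steppo.sh`, so the training script only starts once the directory check passes.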