Quick Start
This quick start is a sanity check, not the main Agent-R1 workflow. Its purpose is to verify that your environment, dataset path, model path, and training stack are wired correctly.
1. Prepare a Minimal Dataset
The processed Agent-R1 datasets are available on ModelScope. Download the release and place or symlink the GSM8K files to ~/data/gsm8k, or use the GSM8K preprocessing script to regenerate the sanity-check data locally:
pip install modelscope
modelscope download --dataset Melmaphother/Agent-R1-data --local_dir data/agent-r1-data
You can also clone the dataset repository with git:
git lfs install
git clone https://www.modelscope.cn/datasets/Melmaphother/Agent-R1-data.git data/agent-r1-data
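Once the GSM8K files are placed or symlinked (or regenerated with the preprocessing script), the following files should exist:
~/data/gsm8k/train.parquet
~/data/gsm8k/test.parquet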
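Before moving on, it can help to confirm that both files are where the training script will look for them. The following is a minimal sketch using only the Python standard library. The helper name `missing_gsm8k_files` is hypothetical and not part of Agent-R1. It only checks that the files exist and does not validate their contents.

```python
from pathlib import Path

# File names taken from this guide; the helper itself is illustrative only.
EXPECTED_FILES = ("train.parquet", "test.parquet")


def missing_gsm8k_files(data_dir="~/data/gsm8k"):
    """Return the expected parquet files that are absent from data_dir."""
    root = Path(data_dir).expanduser()
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling `missing_gsm8k_files()` with no arguments checks the default location. An empty list means both files are present. A symlinked file also counts as present as long as the link target exists.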
2. Run the Sanity Check Script
Use the provided single-step script, run from the repository root:
bash examples/gsm8k/run_steppo.sh
If needed, adjust the following values before running:
- CUDA_VISIBLE_DEVICES
- actor_rollout_ref.model.path
- dataset paths under ~/data/gsm8k
The script entrypoint is examples/gsm8k/run_steppo.sh, which launches python3 -m agent_r1.trainer.main_agent_ppo with the StepPO-style gae estimator.
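If you prefer to launch the run from Python, for example to pick GPUs programmatically, the sketch below shows one way to do it. It assumes that the working directory is the repository root and that the script does not overwrite CUDA_VISIBLE_DEVICES internally; if it does, edit the script instead. The helpers `build_launch` and `launch` are hypothetical names, not part of Agent-R1.

```python
import os
import subprocess

# Script path as named in this guide, relative to the repository root.
SCRIPT = "examples/gsm8k/run_steppo.sh"


def build_launch(gpus="0"):
    """Return (command, environment) for the sanity-check run."""
    env = dict(os.environ)              # copy, so the parent environment is untouched
    env["CUDA_VISIBLE_DEVICES"] = gpus  # e.g. "0" or "0,1"
    return ["bash", SCRIPT], env


def launch(gpus="0"):
    """Run the script and return its exit code (0 means success)."""
    cmd, env = build_launch(gpus)
    return subprocess.run(cmd, env=env, check=False).returncode
```

For example, `launch("0,1")` would expose the first two GPUs to the run. Keeping command construction separate from execution makes it easy to inspect what would be run before starting a training job.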
3. What to Do Next
- Read Step-level MDP to understand the main training abstraction.
- Read Layered Abstractions to see how AgentFlowBase, AgentEnvLoop, and ToolEnv fit together.
- Continue to the Agent Task Tutorial for the minimal GSM8K + Tool example based on ToolEnv + BaseTool.
- Use Recipes and Algorithms to find task-specific recipes and algorithm scripts.