AGILE provides a comprehensive reinforcement learning framework for training whole-body control policies with validated sim-to-real transfer capabilities. Built on NVIDIA Isaac Lab, this toolkit enables researchers and practitioners to develop loco-manipulation behaviors for humanoid robots.
Top row: Booster T1 – stand-up recovery (sim-to-sim), velocity tracking (sim-to-sim), velocity tracking (sim-to-real).
Bottom row: Unitree G1 – velocity-height tracking (sim-to-sim), velocity-height tracking (sim-to-real), teleoperation with trained policy.
Project Structure
agile/ # Repository root
├── agile/ # Main package
│ ├── algorithms/ # Algorithms for policy training
│ │ ├── rsl_rl/ # Custom rsl_rl library with TensorDict support
│ │ └── evaluation/ # Evaluation and metrics computation
│ ├── data/ # Data handling and policy checkpoints
│ ├── isaaclab_extras/ # Isaac Lab extensions and monkey patches
│ └── rl_env/ # Reinforcement learning environments
│ ├── assets/ # Robot assets and configurations
│ ├── mdp/ # MDP components (rewards, commands, actions, etc.)
│ ├── tasks/ # Task definitions and configurations
│ ├── tests/ # Unit tests for MDP components
│ ├── utils/ # Environment utilities
│ └── rsl_rl/ # RSL-RL integration and wrappers
├── docs/ # Documentation and media files
│ └── videos/ # Demo videos (tracked with Git LFS)
├── scripts/ # Utility scripts
│ ├── train.py # Training script
│ ├── eval.py # Evaluation and policy export script
│ ├── play.py # Environment validation script (no policy)
│ ├── verify_rsl_rl.py # Verify RSL-RL installation
│ ├── export_IODescriptors.py # Export I/O descriptors
│ ├── setup/ # Installation and setup scripts
│ │ ├── install_deps.sh # Install for Docker deployment
│ │ ├── install_deps_ci.sh # Install for CI environment
│ │ ├── install_deps_local.sh # Install for local development
│ │ └── setup_hooks.sh # Set up git hooks
│ └── wandb_sweep/ # Hyperparameter optimization with W&B
├── tests/ # Test suite
├── workflows/ # Supporting workflows, such as the Dockerfile
├── pyproject.toml # Project configuration
├── CONTRIBUTING.md # Contribution guidelines
└── README.md # Project documentation
- Installation
- Quick Start
- Usage
- Development
- Deployment
- Troubleshooting
- Contributing
- License
- Core Contributors
- Acknowledgments
Prerequisites
Install Isaac Lab 2.3.0: Follow the installation guide. Note that Isaac Sim 5.1 is required to use the verified USD provided in this project. We recommend using the conda installation. Remember to check out the specific branch as follows.
# Ensure you're using version 2.3.0
git checkout v2.3.0
Local Development Setup
For local development on your machine:
# Ensure ISAACLAB_PATH is set
export ISAACLAB_PATH=/path/to/isaac_lab
# Install all dependencies and packages
./scripts/setup/install_deps_local.sh
# Verify the custom rsl_rl is correctly installed
${ISAACLAB_PATH}/isaaclab.sh -p scripts/verify_rsl_rl.py
The scripts/setup/install_deps_local.sh script will:
- Install runtime dependencies (tensordict, wandb, datasets, etc.)
- Remove any conflicting rsl_rl packages from Isaac Lab
- Install our custom rsl_rl with TensorDict support
- Install the agile package
Getting Started
Get started with AGILE in three simple steps locally:
1. Train a velocity tracking policy:
python scripts/train.py \
--task Velocity-T1-v0 \
--num_envs 2048 \
--headless
2. Visualize the trained policy:
# After training completes, visualize and evaluate the policy
python scripts/eval.py \
--task Velocity-T1-v0 \
--num_envs 32 \
--checkpoint <path_to_checkpoint>
💡 Try a pre-trained policy: We provide a variety of pre-trained policies for different robots and tasks. For a quick start, we recommend trying the G1 recurrent student policy. This policy has better tracking performance compared to the velocity-tracking-only policy, does not require linear velocity observations, and is ready for direct deployment on real hardware.
💡 Next Steps:
- Try available policies for different robots and tasks.
- Explore available tasks for different robots and behaviors
- Learn about teacher-student distillation for robust deployment
- See Evaluation for performance analysis and metrics
Embodiments
The framework has been validated on two humanoid robots: Booster T1 and Unitree G1, with both robot USDs available in Isaac Sim 5.1 public release. For the G1 robot, we provide two actuator configurations: a delayed DC motor model and an implicit actuator setup adapted from BeyondMimic, both verified in sim-to-sim and sim-to-real transfers.
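For illustration, below is a minimal sketch of what these two actuator styles can look like in an Isaac Lab asset configuration. The joint name patterns, gains, and delay range are placeholders rather than the values used in this repo, and AGILE's delayed DC motor model may be a custom actuator rather than the built-in DelayedPDActuatorCfg shown here.
from isaaclab.actuators import DelayedPDActuatorCfg, ImplicitActuatorCfg

# Implicit actuator: PD control handled by the physics engine (the BeyondMimic-style setup).
g1_legs_implicit = ImplicitActuatorCfg(
    joint_names_expr=[".*_hip_.*", ".*_knee_joint", ".*_ankle_.*"],  # placeholder patterns
    stiffness=100.0,
    damping=2.0,
)

# Delayed PD actuator: same gains, but with a randomized command delay (in physics steps)
# to approximate the motor latency observed on hardware.
g1_legs_delayed = DelayedPDActuatorCfg(
    joint_names_expr=[".*_hip_.*", ".*_knee_joint", ".*_ankle_.*"],
    stiffness=100.0,
    damping=2.0,
    min_delay=0,
    max_delay=4,
)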
Tasks & Policy Architecture
AGILE uses a modular approach to enable complex loco-manipulation behaviors:
The framework separates lower body locomotion (trained via RL) from upper body control (IK/IL/Random), with optional distillation to deployable student policies. This architecture enables flexible behavior composition and efficient training strategies.
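Conceptually, the composition amounts to concatenating the lower-body policy output with externally commanded upper-body targets (e.g., an IK stream from a teleoperation device). The sketch below is illustrative only; all names are hypothetical and it is not AGILE's actual interface.
import torch

def compose_action(lower_policy, obs: torch.Tensor, upper_body_targets: torch.Tensor) -> torch.Tensor:
    """Concatenate lower-body policy output with upper-body joint targets."""
    lower_action = lower_policy(obs)              # e.g. leg (+ waist) joint targets
    return torch.cat([lower_action, upper_body_targets], dim=-1)

# Example usage with a dummy policy and a zero upper-body command.
lower_policy = torch.nn.Linear(48, 15)            # placeholder for a trained lower-body policy
obs = torch.zeros(1, 48)
upper_cmd = torch.zeros(1, 14)                    # e.g. two 7-DoF arms
full_action = compose_action(lower_policy, obs, upper_cmd)
print(full_action.shape)                          # torch.Size([1, 29])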
🎯 Teleoperation Integration: AGILE policies power Isaac Lab's official teleoperation examples. For optimal performance, use the latest policies from this repository—Isaac Lab will be updated with these improved versions soon.
Note: This modular architecture represents our current implementation focus for loco-manipulation tasks, particularly enabling teleoperation where the upper body responds to external commands while maintaining stable locomotion. AGILE is not limited to this approach—the framework supports various policy architectures including unified full-body control (e.g., stand-up task) and will expand to support additional architectures in future releases.
Each task configuration is intentionally self-contained with all MDP components in one file:
- âś… Transparent & Maintainable: Complete setup visible without inheritance tracing
- âś… Collaboration-Friendly: Developers work independently without conflicts
- âś… Fast Iteration: Localized changes with immediate, visible impact
This project supports multiple tasks across different robot embodiments (G1 and T1):
- Locomotion: Velocity tracking for G1 (legs + waist) and T1 (legs only)
- Locomotion + Height: Extended tracking with height commands, includes teacher and student distillation variants (recurrent & history-based)
- Stand Up: Full-body autonomous recovery from arbitrary fallen poses
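These tasks register standard Gymnasium IDs (e.g., Velocity-T1-v0). As a quick sanity check, you can list the registered IDs as sketched below; this assumes the tasks register themselves when the package is imported, and the exact import path may differ.
import gymnasium as gym
import agile.rl_env.tasks  # noqa: F401  (hypothetical registration import)

for task_id in sorted(gym.registry):
    if "G1" in task_id or "T1" in task_id:
        print(task_id)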
📖 For detailed task specifications, MDP configurations, complete design philosophy, and training pipeline documentation, see the Task README.
💡 We've included Lessons Learned to share practical insights and tips from our experience developing these policies—from robot modeling to sim-to-real deployment.
Play
After building a task, we suggest validating it (scene, actions, MDP functions, etc.) before training. For environment validation without a policy (using sinusoidal test actions), use scripts/play.py:
python scripts/play.py --task Velocity-T1-v0 --num_envs 2
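For illustration, sinusoidal test actions of this kind look roughly like the sketch below. This is not the actual implementation in scripts/play.py; the amplitude and period are arbitrary placeholders.
import math
import torch

def sinusoidal_actions(step: int, num_envs: int, num_actions: int,
                       amplitude: float = 0.5, period_steps: int = 200) -> torch.Tensor:
    """Identical sinusoidal joint targets for every environment, cycling every period_steps."""
    phase = 2.0 * math.pi * (step % period_steps) / period_steps
    return amplitude * math.sin(phase) * torch.ones(num_envs, num_actions)

# e.g. inside the stepping loop: env.step(sinusoidal_actions(i, num_envs, num_actions))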
Training
Following Isaac Lab conventions, most training configuration lives in the corresponding rsl_rl_ppo_cfg.py file. Many options can be overridden via the CLI. Run the following for full help:
python scripts/train.py -h
For local training, use the following command. We use W&B for logging by default.
python scripts/train.py \
--task Velocity-T1-v0 \
--num_envs 4096 \
--headless \
--logger wandb \
--log_project_name Velocity-T1-v0 \
--run_name test
💡 Experiment Reproducibility: Training (including evaluation) automatically captures and logs lightweight git metadata (commit hash, branch, uncommitted changes, and diffs) to your experiment logs. When using W&B, this information is uploaded to your run for easy tracking and reproduction. This ensures you can always trace back the exact code state—including any staged or unstaged changes—used for any experiment, without storing the entire repository.
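As a rough sketch, capturing this kind of git metadata can be done with a few subprocess calls; AGILE's actual implementation may differ, and the wandb call in the last comment is only an example of where such metadata could be attached.
import subprocess

def git(*args: str) -> str:
    return subprocess.check_output(["git", *args], text=True).strip()

def collect_git_metadata() -> dict:
    return {
        "commit": git("rev-parse", "HEAD"),
        "branch": git("rev-parse", "--abbrev-ref", "HEAD"),
        "dirty": bool(git("status", "--porcelain")),
        "diff": git("diff", "HEAD"),  # staged + unstaged changes relative to the last commit
    }

# e.g. wandb.config.update({"git": collect_git_metadata()}) after wandb.init()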
Teacher Student Distillation
Teacher Training
Training a teacher policy with privileged observations is often more effective than directly training a deployable policy on noisy, partially observable inputs. To train a teacher policy, follow the standard training procedure, adding any useful observations and removing noise. Once training is complete, export the policy using the eval script (scripts/eval.py).
Student Distillation
After obtaining the exported teacher policy (.pt file), you can distill it into a student policy that uses realistic (i.e., deployable) observations.
To configure the distillation process, set up the runner as follows:
@configclass
class DistillationRunnerCfg(TeacherPpoRunnerCfg):
    algorithm = RslRlDistillationAlgorithmCfg(
        num_learning_epochs=5,
        gradient_length=15,
        learning_rate=1e-3,
        max_grad_norm=1.0,
        loss_type="mse",
    )
    policy = RslRlStudentTrainedTeacherCfg(
        class_name="StudentTrainedTeacher",  # or "StudentTrainedTeacherRecurrent"
        teacher_path="/path/to/exported/teacher_policy.pt",
        student_hidden_dims=[256, 256, 128],
        activation="elu",
    )
In the environment configuration, define separate observation dictionaries:
- policy: for student observations
- teacher: for teacher observations (this corresponds to the critic in RL training). This is simply what you defined as policy observations during teacher training.
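A minimal sketch of such an observation setup, using Isaac Lab's observation manager, is shown below. The specific observation terms, noise ranges, and group contents are placeholders, not the configuration used by AGILE's tasks.
from isaaclab.managers import ObservationGroupCfg as ObsGroup
from isaaclab.managers import ObservationTermCfg as ObsTerm
from isaaclab.utils import configclass
from isaaclab.utils.noise import UniformNoiseCfg as Unoise
import isaaclab.envs.mdp as mdp

@configclass
class ObservationsCfg:
    @configclass
    class StudentCfg(ObsGroup):
        # deployable observations only: noisy, no privileged terms
        base_ang_vel = ObsTerm(func=mdp.base_ang_vel, noise=Unoise(n_min=-0.2, n_max=0.2))
        joint_pos = ObsTerm(func=mdp.joint_pos_rel, noise=Unoise(n_min=-0.01, n_max=0.01))

    @configclass
    class TeacherCfg(ObsGroup):
        # mirror the "policy" observations used during teacher training
        base_lin_vel = ObsTerm(func=mdp.base_lin_vel)   # privileged
        base_ang_vel = ObsTerm(func=mdp.base_ang_vel)
        joint_pos = ObsTerm(func=mdp.joint_pos_rel)

    policy: StudentCfg = StudentCfg()
    teacher: TeacherCfg = TeacherCfg()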
Finally, register the task as a standard rsl_rl task and start training. Note that during distillation, the reward is not used for optimization—it is still logged for reference.
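Registration typically follows the usual Isaac Lab / rsl_rl pattern in the task's __init__.py, roughly as sketched below; the task ID, module paths, and class names here are placeholders.
import gymnasium as gym

gym.register(
    id="Velocity-Height-G1-Student-v0",            # placeholder task ID
    entry_point="isaaclab.envs:ManagerBasedRLEnv",
    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": "my_task_module:StudentEnvCfg",          # placeholder
        "rsl_rl_cfg_entry_point": "my_task_module:DistillationRunnerCfg",  # placeholder
    },
)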
Tip: Training the student as a recurrent network is often beneficial, as it can help cope with noise and partial observability.
Hyperparameter Sweep
Deploy a W&B sweep for hyperparameter optimization; see scripts/wandb_sweep/README for details.
Evaluation
To visualize and export a trained policy, use scripts/eval.py. This script can also be used for evaluation with deterministic scenarios and report generation:
python scripts/eval.py \
--task Velocity-Height-G1-v0 \
--checkpoint /path/to/model.pt \
--num_envs 1024 \
--headless
Additional evaluation options include --save_trajectories to save trajectory data for analysis, --generate_report to generate HTML evaluation reports, --eval_config to use deterministic evaluation scenarios, and more. Run with --run_evaluation to enable the full evaluation pipeline. See agile/algorithms/evaluation/README.md for detailed configurations.
Sim to MuJoCo
We provide a generic Sim2MuJoCo framework that enables seamless policy transfer from Isaac Lab to MuJoCo simulation. The framework is task-agnostic and automatically handles observation/action mapping by parsing the exported I/O descriptor YAML file—no code changes needed for different tasks.
Quick Start:
- Export policy and I/O descriptor from your trained checkpoint
- Get robot MJCF from Unitree's official repository or bring your own
- Run evaluation in MuJoCo
python scripts/sim2mujoco_eval.py \
--checkpoint path/to/policy.pt \
--config path/to/config.yaml \
--mjcf unitree_mujoco/unitree_robots/g1/scene_29dof.xml
For detailed instructions on exporting policies and I/O descriptors, see scripts/README.md.
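The descriptor-driven mapping described above boils down to reading the exported I/O descriptor and assembling observations in the declared order before each policy call in MuJoCo. The sketch below illustrates the idea only: the YAML keys, state slices, and the assumption that the exported policy is a TorchScript module are all placeholders rather than the actual schema.
import mujoco
import numpy as np
import torch
import yaml

with open("path/to/config.yaml") as f:
    descriptor = yaml.safe_load(f)
obs_terms = descriptor["observations"]          # assumed key: ordered list of term names
action_dim = descriptor["action_dim"]           # assumed key

model = mujoco.MjModel.from_xml_path("unitree_mujoco/unitree_robots/g1/scene_29dof.xml")
data = mujoco.MjData(model)
policy = torch.jit.load("path/to/policy.pt").eval()  # assumes a TorchScript export

def build_obs(data: mujoco.MjData, terms: list[str]) -> np.ndarray:
    # Hypothetical mapping from descriptor term names to MuJoCo state slices.
    parts = {
        "joint_pos": data.qpos[7:],       # skip floating-base position + quaternion
        "joint_vel": data.qvel[6:],
        "base_ang_vel": data.qvel[3:6],
    }
    return np.concatenate([parts[t] for t in terms if t in parts])

for _ in range(1000):
    obs = torch.from_numpy(build_obs(data, obs_terms)).float().unsqueeze(0)
    data.ctrl[:action_dim] = policy(obs).squeeze(0).detach().numpy()
    mujoco.mj_step(model, data)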
Testing
# Run all tests in Docker (matches CI environment)
./tests/test_e2e_ci_locally.sh --all
# Run locally (requires Isaac Lab)
./tests/run_unit_tests.sh
See tests/README.md for a detailed testing guide.
Docker Build Process
The workflows/Dockerfile:
- Starts from the nvcr.io/nvidia/isaac-lab:2.3.0 base image
- Installs Python dependencies into Isaac Lab's environment
- Removes conflicting rsl_rl packages
- Installs custom rsl_rl with TensorDict support
- Verifies correct installation
Pre-commit Hooks
This repository uses pre-commit hooks to ensure code quality. To set up the hooks:
- Install the pre-commit hooks:
./scripts/setup/setup_hooks.sh
- The hooks will run automatically on each commit. To run them manually:
pre-commit run --all-files
The pre-commit configuration includes:
- Code formatting with Black and isort
- Linting with Flake8
- Type checking with mypy
- Various file checks (trailing whitespace, merge conflicts, etc.)
Note: The third_party directory is excluded from all pre-commit hooks to preserve the original code style of external dependencies.
Deployment
Policy deployment for sim-to-real transfer currently uses NVIDIA's internal deployment framework, which is planned for public release in the near future.
Pre-trained Policies: We include several verified pre-trained checkpoints in the repository for evaluation and deployment. See agile/data/policy/README.md for available policies and usage instructions.
Troubleshooting
Common Issues
Issue: ModuleNotFoundError: No module named 'tensordict'
- The dependencies are not installed in Isaac Lab's Python environment
- Solution: Re-run ./scripts/setup/install_deps_local.sh for local development, or rebuild the Docker image with --rebuild
Issue: Wrong rsl_rl version being used
- Isaac Lab's bundled rsl_rl is taking precedence
- Solution: Run ${ISAACLAB_PATH}/isaaclab.sh -p scripts/verify_rsl_rl.py to check which version is installed
- The custom version should show TensorDict support
Issue: Docker build fails at verification step
- The custom rsl_rl was not properly installed
- Check that agile/algorithms/rsl_rl/ exists and contains the custom implementation
Issue: Isaac Sim initialization failures in containers
- The wrapper automatically retries failed training runs (2 attempts with 10s delay)
- This handles common Isaac Sim cold start issues in Docker containers
Contributing
Please see CONTRIBUTING.md for detailed information on how to contribute to this project.
License Information
This repository contains code under two different open-source licenses:
The reinforcement learning algorithm library located in agile/algorithms/rsl_rl/ is licensed under the BSD 3-Clause License.
- Copyright holders: ETH Zurich, NVIDIA CORPORATION & AFFILIATES
- This portion is based on the RSL_RL library developed at ETH Zurich
- See the full BSD 3-Clause license text in the LICENCE file (Section A)
All other portions of this repository are licensed under the Apache License 2.0.
- Copyright holder: NVIDIA CORPORATION & AFFILIATES
- See the full Apache 2.0 license text in the LICENCE file (Section B)
When using or distributing this software, you must comply with both licenses as applicable:
- If you modify or redistribute the agile/algorithms/rsl_rl/ directory, comply with the BSD 3-Clause License terms
- For all other code, comply with the Apache 2.0 License terms
For complete license information and full terms, see the LICENCE file at the root of this repository.
Core Contributors
Huihua Zhao, Rafael Cathomen, Lionel Gulich, Efe Arda Ongan, Michael Lin, Shalin Jain, Wei Liu, Vishal Kulkarni, Soha Pouya, Yan Chang
Acknowledgments
We would like to acknowledge the following projects, from which parts of the code in this repo are derived:
If you use AGILE in your research, please cite:
@misc{agile2025,
  title  = {AGILE: A Generic Isaac-Lab based Engine for Humanoid Loco-Manipulation Learning},
  author = {Zhao, Huihua and Cathomen, Rafael and Gulich, Lionel and Ongan, Efe Arda and Lin, Michael and Jain, Shalin and Liu, Wei and Kulkarni, Vishal and Pouya, Soha and Chang, Yan},
  year   = {2025},
  note   = {Version compatible with Isaac Lab 2.3; accessed 2025-11-19},
  url    = {https://github.com/nvidia-isaac/WBC_AGILE/tree/main}
}