AGILE: A Generic Isaac-Lab based Engine for humanoid loco-manipulation learning

Overview

AGILE provides a comprehensive reinforcement learning framework for training whole-body control policies with validated sim-to-real transfer capabilities. Built on NVIDIA Isaac Lab, this toolkit enables researchers and practitioners to develop loco-manipulation behaviors for humanoid robots.


Top row: Booster T1 – stand-up recovery (sim-to-sim), velocity tracking (sim-to-sim), velocity tracking (sim-to-real).
Bottom row: Unitree G1 – velocity-height tracking (sim-to-sim), velocity-height tracking (sim-to-real), teleoperation with trained policy.

Key Features

Figure: AGILE highlights and key features.

Project Structure

agile/                       # Repository root
├── agile/                   # Main package
│   ├── algorithms/          # Algorithms for policy training
│   │   ├── rsl_rl/          # Custom rsl_rl library with TensorDict support
│   │   └── evaluation/      # Evaluation and metrics computation
│   ├── data/                # Data handling and policy checkpoints
│   ├── isaaclab_extras/     # Isaac Lab extensions and monkey patches
│   └── rl_env/              # Reinforcement learning environments
│       ├── assets/          # Robot assets and configurations
│       ├── mdp/             # MDP components (rewards, commands, actions, etc.)
│       ├── tasks/           # Task definitions and configurations
│       ├── tests/           # Unit tests for MDP components
│       ├── utils/           # Environment utilities
│       └── rsl_rl/          # RSL-RL integration and wrappers
├── docs/                    # Documentation and media files
│   └── videos/              # Demo videos (tracked with Git LFS)
├── scripts/                 # Utility scripts
│   ├── train.py             # Training script
│   ├── eval.py              # Evaluation and policy export script
│   ├── play.py              # Environment validation script (no policy)
│   ├── verify_rsl_rl.py     # Verify RSL-RL installation
│   ├── export_IODescriptors.py # Export I/O descriptors
│   ├── setup/               # Installation and setup scripts
│   │   ├── install_deps.sh           # Install for Docker deployment
│   │   ├── install_deps_ci.sh        # Install for CI environment
│   │   ├── install_deps_local.sh     # Install for local development
│   │   └── setup_hooks.sh            # Set up git hooks
│   └── wandb_sweep/         # Hyperparameter optimization with W&B
├── tests/                   # Test suite
├── workflows/               # Supporting workflows (e.g., Dockerfile)
├── pyproject.toml           # Project configuration
├── CONTRIBUTING.md          # Contribution guidelines
└── README.md                # Project documentation

Installation

Prerequisites

Install Isaac Lab 2.3.0: Follow the installation guide. Note that Isaac Sim 5.1 is required to use the verified USD provided in this project. We recommend using the conda installation. Remember to check out the specific branch as follows.

# Ensure you're using version 2.3.0
git checkout v2.3.0
Local Development Setup

For local development on your machine:

# Ensure ISAACLAB_PATH is set
export ISAACLAB_PATH=/path/to/isaac_lab

# Install all dependencies and packages
./scripts/setup/install_deps_local.sh

# Verify the custom rsl_rl is correctly installed
${ISAACLAB_PATH}/isaaclab.sh -p scripts/verify_rsl_rl.py

The scripts/setup/install_deps_local.sh script will:

  • Install runtime dependencies (tensordict, wandb, datasets, etc.)
  • Remove any conflicting rsl_rl packages from Isaac Lab
  • Install our custom rsl_rl with TensorDict support
  • Install the agile package

Quick Start

Get started with AGILE locally in two simple steps:

1. Train a velocity tracking policy:

python scripts/train.py \
    --task Velocity-T1-v0 \
    --num_envs 2048 \
    --headless

2. Visualize the trained policy:

# After training completes, visualize and evaluate the policy
python scripts/eval.py \
    --task Velocity-T1-v0 \
    --num_envs 32 \
    --checkpoint <path_to_checkpoint>

💡 Try a pre-trained policy: We provide a variety of pre-trained policies for different robots and tasks. For a quick start, we recommend trying the G1 recurrent student policy. This policy has better tracking performance compared to the velocity-tracking-only policy, does not require linear velocity observations, and is ready for direct deployment on real hardware.

💡 Next Steps: see the Usage section below for available tasks, training, evaluation, and deployment.

Usage

Embodiments

The framework has been validated on two humanoid robots, the Booster T1 and the Unitree G1, with both robot USDs available in the Isaac Sim 5.1 public release. For the G1 robot, we provide two actuator configurations: a delayed DC motor model and an implicit actuator setup adapted from BeyondMimic, both verified in sim-to-sim and sim-to-real transfer.
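
For reference, actuator models in Isaac Lab are chosen per joint group in the robot's articulation configuration. The following is a minimal, hypothetical sketch using Isaac Lab's built-in ImplicitActuatorCfg and DelayedPDActuatorCfg; the joint-name patterns and gains are illustrative, and the repository's delayed DC motor model may be implemented differently.

# Hypothetical sketch: per-joint-group actuator models (values are illustrative only).
from isaaclab.actuators import DelayedPDActuatorCfg, ImplicitActuatorCfg

# Implicit PD actuators resolved inside the physics engine (BeyondMimic-style setup).
legs_implicit = ImplicitActuatorCfg(
    joint_names_expr=[".*_hip_.*", ".*_knee_.*", ".*_ankle_.*"],
    stiffness=100.0,
    damping=2.0,
)

# Explicit PD actuator with a randomized actuation delay (in physics steps),
# approximating motor-command latency observed on real hardware.
legs_delayed = DelayedPDActuatorCfg(
    joint_names_expr=[".*_hip_.*", ".*_knee_.*", ".*_ankle_.*"],
    stiffness=100.0,
    damping=2.0,
    effort_limit=88.0,
    min_delay=0,
    max_delay=4,
)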

Tasks & Policy Architecture

Modular Policy Design

AGILE uses a modular approach to enable complex loco-manipulation behaviors:

Figure: Modular policy architecture.

The framework separates lower body locomotion (trained via RL) from upper body control (IK/IL/Random), with optional distillation to deployable student policies. This architecture enables flexible behavior composition and efficient training strategies.
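
Conceptually, the lower-body policy outputs leg (and optionally waist) joint targets while the upper-body targets come from an external source such as IK, imitation, or random sampling during training. The snippet below is purely illustrative of this composition; the function and tensor names are hypothetical, not the framework's actual interface.

import torch

def compose_action(lower_policy, lower_obs: torch.Tensor, upper_cmd: torch.Tensor) -> torch.Tensor:
    # Illustrative only: the real composition is defined by the task's action terms.
    lower_targets = lower_policy(lower_obs)               # e.g. leg + waist joint position targets
    return torch.cat([lower_targets, upper_cmd], dim=-1)  # full-body action vector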

🎯 Teleoperation Integration: AGILE policies power Isaac Lab's official teleoperation examples. For optimal performance, use the latest policies from this repository—Isaac Lab will be updated with these improved versions soon.

Note: This modular architecture represents our current implementation focus for loco-manipulation tasks, particularly enabling teleoperation where the upper body responds to external commands while maintaining stable locomotion. AGILE is not limited to this approach—the framework supports various policy architectures including unified full-body control (e.g., stand-up task) and will expand to support additional architectures in future releases.

Self-Contained Task Design

Each task configuration is intentionally self-contained with all MDP components in one file:

  • ✅ Transparent & Maintainable: Complete setup visible without inheritance tracing
  • ✅ Collaboration-Friendly: Developers work independently without conflicts
  • ✅ Fast Iteration: Localized changes with immediate, visible impact

Available Tasks

This project supports multiple tasks across different robot embodiments (G1 and T1):

  • Locomotion: Velocity tracking for G1 (legs + waist) and T1 (legs only)
  • Locomotion + Height: Extended tracking with height commands, includes teacher and student distillation variants (recurrent & history-based)
  • Stand Up: Full-body autonomous recovery from arbitrary fallen poses

📖 For detailed task specifications, MDP configurations, complete design philosophy, and training pipeline documentation, see the Task README.

💡 We've included Lessons Learned to share practical insights and tips from our experience developing these policies—from robot modeling to sim-to-real deployment.

Play

After building a task, we suggest validating it (scene, actions, MDP terms, etc.) before training. For environment validation without a policy (using sinusoidal test actions), use scripts/play.py:

python scripts/play.py --task Velocity-T1-v0 --num_envs 2
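
Under the hood, this kind of validation simply steps the task with smooth, policy-free actions so that the scene, action scaling, and reward/termination terms can be inspected. A minimal sketch of the idea for a generic Gymnasium-style environment is shown below; the real scripts/play.py additionally handles launching the Isaac Sim app and parsing the task configuration.

# Illustrative only: drive an environment with sinusoidal test actions (no policy).
import math
import numpy as np
import gymnasium as gym

env = gym.make("Velocity-T1-v0")  # assumes the task has been registered
obs, _ = env.reset()

for step in range(1000):
    phase = 2.0 * math.pi * step / 100.0
    # Same sinusoid on every actuated joint: exercises action scaling, limits, and MDP terms.
    actions = 0.5 * math.sin(phase) * np.ones(env.action_space.shape, dtype=np.float32)
    obs, reward, terminated, truncated, info = env.step(actions)

env.close()
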
Training

Following Isaac Lab conventions, most training configuration lives in the corresponding rsl_rl_ppo_cfg.py file. Many options can be overridden via the CLI; run the following for the full list:

python scripts/train.py -h

For local training, use the following command. We use W&B for logging by default.

python scripts/train.py \
    --task Velocity-T1-v0 \
    --num_envs 4096 \
    --headless \
    --logger wandb \
    --log_project_name Velocity-T1-v0 \
    --run_name test

💡 Experiment Reproducibility: Training (including evaluation) automatically captures and logs lightweight git metadata (commit hash, branch, uncommitted changes, and diffs) to your experiment logs. When using W&B, this information is uploaded to your run for easy tracking and reproduction. This ensures you can always trace back the exact code state—including any staged or unstaged changes—used for any experiment, without storing the entire repository.
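
The underlying idea is simply to snapshot the repository state at launch time. A minimal sketch of how such metadata can be collected is shown below; this is an illustration, not the repository's actual implementation.

# Illustrative sketch: collect lightweight git metadata for experiment logging.
import subprocess

def _git(*args: str) -> str:
    return subprocess.run(["git", *args], capture_output=True, text=True, check=False).stdout.strip()

git_metadata = {
    "commit": _git("rev-parse", "HEAD"),
    "branch": _git("rev-parse", "--abbrev-ref", "HEAD"),
    "dirty": bool(_git("status", "--porcelain")),  # any staged or unstaged changes?
    "diff": _git("diff", "HEAD"),                  # uncommitted changes as a patch
}
# e.g. wandb.config.update({"git": git_metadata}) or save alongside checkpoints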

Teacher Student Distillation

Teacher Training: Training a teacher policy with privileged observations is often more effective than directly training a deployable policy on noisy, partially observable inputs. To train a teacher policy, follow the standard training procedure, adding any useful privileged observations and removing noise. Once training is complete, export the policy using the eval script.

Student Distillation: After obtaining the exported teacher policy (.pt file), you can distill it into a student policy that uses realistic (i.e., deployable) observations.

To configure the distillation process, set up the runner as follows:

@configclass
class DistillationRunnerCfg(TeacherPpoRunnerCfg):
    algorithm = RslRlDistillationAlgorithmCfg(
        num_learning_epochs=5,
        gradient_length=15,
        learning_rate=1e-3,
        max_grad_norm=1.0,
        loss_type="mse",
    )
    policy = RslRlStudentTrainedTeacherCfg(
        class_name="StudentTrainedTeacher",  # or "StudentTrainedTeacherRecurrent"
        teacher_path="/path/to/exported/teacher_policy.pt",
        student_hidden_dims=[256, 256, 128],
        activation="elu",
    )

In the environment configuration, define separate observation dictionaries (a minimal sketch follows the list below):

  • policy: for student observations
  • teacher: for teacher observations (this corresponds to the critic in RL training). This is simply what you defined as policy observations during teacher training.
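
A minimal sketch of what these two groups can look like in an Isaac Lab manager-based environment configuration follows; the observation terms, noise settings, and mdp import path are illustrative rather than the repository's exact definitions.

# Illustrative sketch of student ("policy") and teacher observation groups.
import isaaclab.envs.mdp as mdp
from isaaclab.managers import ObservationGroupCfg as ObsGroup
from isaaclab.managers import ObservationTermCfg as ObsTerm
from isaaclab.utils import configclass
from isaaclab.utils.noise import UniformNoiseCfg as Unoise

@configclass
class ObservationsCfg:
    @configclass
    class PolicyCfg(ObsGroup):
        """Student observations: deployable and noisy, no privileged signals."""
        base_ang_vel = ObsTerm(func=mdp.base_ang_vel, noise=Unoise(n_min=-0.2, n_max=0.2))
        joint_pos = ObsTerm(func=mdp.joint_pos_rel, noise=Unoise(n_min=-0.01, n_max=0.01))
        actions = ObsTerm(func=mdp.last_action)

        def __post_init__(self):
            self.enable_corruption = True
            self.concatenate_terms = True

    @configclass
    class TeacherCfg(ObsGroup):
        """Teacher observations: exactly what the teacher policy was trained on."""
        base_lin_vel = ObsTerm(func=mdp.base_lin_vel)  # privileged: not available on hardware
        base_ang_vel = ObsTerm(func=mdp.base_ang_vel)
        joint_pos = ObsTerm(func=mdp.joint_pos_rel)
        actions = ObsTerm(func=mdp.last_action)

    policy: PolicyCfg = PolicyCfg()
    teacher: TeacherCfg = TeacherCfg()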

Finally, register the task as a standard rsl_rl task and start training. Note that during distillation, the reward is not used for optimization—it is still logged for reference.
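
Registration follows the usual Isaac Lab / Gymnasium pattern; a hedged sketch with a hypothetical task id, module path, and config class names is shown below. Training then reuses the same scripts/train.py entry point with the new task id.

# Illustrative sketch: register the distillation task like any other rsl_rl task.
import gymnasium as gym

gym.register(
    id="Velocity-Height-G1-Student-v0",  # hypothetical task id
    entry_point="isaaclab.envs:ManagerBasedRLEnv",
    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": "agile.rl_env.tasks.my_task:StudentEnvCfg",             # hypothetical
        "rsl_rl_cfg_entry_point": "agile.rl_env.tasks.my_task:DistillationRunnerCfg",  # hypothetical
    },
)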

Tip: Training the student as a recurrent network is often beneficial, as it helps cope with noise and partial observability.

Hyperparameter Sweep

Deploy a W&B sweep for hyperparameter optimization; see scripts/wandb_sweep/README for details.

Evaluation

To visualize and export a trained policy, use scripts/eval.py. This script can also be used for evaluation with deterministic scenarios and report generation:

python scripts/eval.py \
    --task Velocity-Height-G1-v0 \
    --checkpoint /path/to/model.pt \
    --num_envs 1024 \
    --headless

Additional evaluation options include --save_trajectories to save trajectory data for analysis, --generate_report to generate HTML evaluation reports, --eval_config to use deterministic evaluation scenarios, and more. Run with --run_evaluation to enable the full evaluation pipeline. See agile/algorithms/evaluation/README.md for detailed configurations.

Sim to MuJoCo

We provide a generic Sim2MuJoCo framework that enables seamless policy transfer from Isaac Lab to MuJoCo simulation. The framework is task-agnostic and automatically handles observation/action mapping by parsing the exported I/O descriptor YAML file—no code changes needed for different tasks.

Quick Start:

  1. Export policy and I/O descriptor from your trained checkpoint
  2. Get robot MJCF from Unitree's official repository or bring your own
  3. Run evaluation in MuJoCo
python scripts/sim2mujoco_eval.py \
  --checkpoint path/to/policy.pt \
  --config path/to/config.yaml \
  --mjcf unitree_mujoco/unitree_robots/g1/scene_29dof.xml

For detailed instructions on exporting policies and I/O descriptors, see scripts/README.md.
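
Conceptually, the MuJoCo side loads the exported policy, rebuilds the observation vector according to the I/O descriptor, and runs the control loop. The heavily simplified sketch below illustrates the flow; it is not the actual scripts/sim2mujoco_eval.py, and the descriptor handling and observation layout are stand-in assumptions.

# Heavily simplified, illustrative sketch of policy-in-the-loop MuJoCo evaluation.
import mujoco
import numpy as np
import torch
import yaml

model = mujoco.MjModel.from_xml_path("unitree_mujoco/unitree_robots/g1/scene_29dof.xml")
data = mujoco.MjData(model)
policy = torch.jit.load("path/to/policy.pt").eval()

with open("path/to/config.yaml") as f:
    io_desc = yaml.safe_load(f)  # assumed to describe observation terms and joint ordering

def build_observation(data, io_desc) -> torch.Tensor:
    # Stand-in: the real mapping is derived automatically from the I/O descriptor.
    return torch.from_numpy(np.concatenate([data.qpos, data.qvel])).float()

while data.time < 10.0:
    with torch.no_grad():
        action = policy(build_observation(data, io_desc).unsqueeze(0)).squeeze(0)
    data.ctrl[:] = action.numpy()  # assumes position-servo actuators in the MJCF
    mujoco.mj_step(model, data)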

Testing

# Run all tests in Docker (matches CI environment)
./tests/test_e2e_ci_locally.sh --all

# Run locally (requires Isaac Lab)
./tests/run_unit_tests.sh

See tests/README.md for detailed testing guide.

Development

Docker Build Process

The workflows/Dockerfile:

  1. Starts from nvcr.io/nvidia/isaac-lab:2.3.0 base image
  2. Installs Python dependencies into Isaac Lab's environment
  3. Removes conflicting rsl_rl packages
  4. Installs custom rsl_rl with TensorDict support
  5. Verifies correct installation
Pre-commit Hooks

This repository uses pre-commit hooks to ensure code quality. To set up the hooks:

  1. Install the pre-commit hooks:
./scripts/setup/setup_hooks.sh
  2. The hooks will run automatically on each commit. To run them manually:
pre-commit run --all-files

The pre-commit configuration includes:

  • Code formatting with Black and isort
  • Linting with Flake8
  • Type checking with mypy
  • Various file checks (trailing whitespace, merge conflicts, etc.)

Note: The third_party directory is excluded from all pre-commit hooks to preserve the original code style of external dependencies.

Deployment

Policy deployment for sim-to-real transfer currently utilizes NVIDIA's internal deployment framework, which is planned for public release in the near future.

Pre-trained Policies: We include several verified pre-trained checkpoints in the repository for evaluation and deployment. See agile/data/policy/README.md for available policies and usage instructions.

Troubleshooting

Common issues

Issue: ModuleNotFoundError: No module named 'tensordict'

  • The dependencies are not installed in Isaac Lab's Python environment
  • Solution: Re-run ./scripts/setup/install_deps_local.sh for local development, or rebuild the Docker image with --rebuild

Issue: Wrong rsl_rl version being used

  • Isaac Lab's bundled rsl_rl is taking precedence
  • Solution: Run ${ISAACLAB_PATH}/isaaclab.sh -p scripts/verify_rsl_rl.py to check which version is installed
  • The custom version should show TensorDict support

Issue: Docker build fails at verification step

  • The custom rsl_rl was not properly installed
  • Check that agile/algorithms/rsl_rl/ exists and contains the custom implementation

Issue: Isaac Sim initialization failures in containers

  • The wrapper automatically retries failed training runs (2 attempts with 10s delay)
  • This handles common Isaac Sim cold start issues in Docker containers

Contributing

Please see CONTRIBUTING.md for detailed information on how to contribute to this project.

License

License Information: This repository contains code under two different open-source licenses:

BSD 3-Clause License

The reinforcement learning algorithm library located in agile/algorithms/rsl_rl/ is licensed under the BSD 3-Clause License.

  • Copyright holders: ETH Zurich, NVIDIA CORPORATION & AFFILIATES
  • This portion is based on the RSL_RL library developed at ETH Zurich
  • See the full BSD 3-Clause license text in the LICENCE file (Section A)

Apache License 2.0

All other portions of this repository are licensed under the Apache License 2.0.

  • Copyright holder: NVIDIA CORPORATION & AFFILIATES
  • See the full Apache 2.0 license text in the LICENCE file (Section B)

Compliance

When using or distributing this software, you must comply with both licenses as applicable:

  • If you modify or redistribute the agile/algorithms/rsl_rl/ directory, comply with the BSD 3-Clause License terms
  • For all other code, comply with the Apache 2.0 License terms

For complete license information and full terms, see the LICENCE file at the root of this repository.

Core Contributors

Huihua Zhao, Rafael Cathomen, Lionel Gulich, Efe Arda Ongan, Michael Lin, Shalin Jain, Wei Liu, Vishal Kulkarni, Soha Pouya, Yan Chang

Acknowledgments

We would like to acknowledge the following projects from which parts of the code in this repo are derived:

Citation

If you use AGILE in your research, please cite:

@misc{agile2025,
  title        = {AGILE: A Generic Isaac-Lab based Engine for Humanoid Loco-Manipulation Learning},
  author       = {Zhao, Huihua and Cathomen, Rafael and Gulich, Lionel and Ongan, Efe Arda and Lin, Michael and Jain, Shalin and Liu, Wei and Kulkarni, Vishal and Pouya, Soha and Chang, Yan},
  year         = {2025},
  note         = {Version compatible with Isaac Lab 2.3; accessed 2025-11-19},
  url          = {https://github.com/nvidia-isaac/WBC_AGILE/tree/main}
}
