This repository contains a complete research and experimentation framework built on Isaac Lab v2.1.0 for learning safe and adaptive robot positioning in single-user and multi-user human–robot interaction scenarios. For a detailed overview of this project please refer to the thesis document.
The key design goal of this project is clarity and modularity:
- All tasks are implemented as external Python packages (no modification of the Isaac Lab core repository).
- Core physical and behavioral logic is centralized in a shared library (`rlpos`).
- Increasing task complexity is handled hierarchically: single-user → multi-user planning → multi-user navigation.
After reading this README, you should understand:
- How Isaac Lab direct workflows work
- How the single-user and multi-user tasks differ
- How GuideRL and NavRL interact
- How to train, evaluate, and extend the system
- Prerequisites
- Project Architecture & Folder Structure
- Installation
- Isaac Lab Concepts Used in This Project
- The `rlpos` Shared Library
- Single-User Task: RL Positioning
- Multi-User Tasks: GuideRL & NavRL
- Training & Testing Workflow
- Extending the Project
This project assumes basic familiarity with Isaac Sim and Reinforcement Learning, but no prior experience with Isaac Lab task development.
Before working with this repository, you should:
- Have Isaac Lab v2.1.0 installed and verified
- Be able to launch Isaac Sim and run headless training
- Use a Python environment (e.g., conda) in which Isaac Lab is installed and active; see the Isaac Lab conda installation instructions.
To understand the terminology used here (DirectRLEnv, decimation, task registration, etc.), it is strongly recommended to skim the official Isaac Lab concepts page.
The documentation targets newer Isaac Lab versions, but the core concepts are unchanged and fully applicable to v2.1.0.
At a high level, the repository is organized as follows:
```text
root/
├── assets/       # Shared simulation assets for Isaac Sim
├── data/         # Human motion dataset (must be downloaded)
├── IsaacLab/     # Isaac Lab v2.1.0 installation (not modified)
├── scripts/      # Build utilities and trajectory utility scripts
├── rlpos/        # Shared physics & logic
├── single_user/  # Single-user RL task
└── multi_user/   # Multi-user hierarchical tasks
    ├── guiderl/  # High-level user selection task
    └── navrl/    # Low-level navigation task
```
Isaac Lab normally expects tasks to live inside its own source tree. Instead, this project:
- Treats each task as a standalone Python package
- Registers environments through `__init__.py`
- Uses editable installs to enable rapid iteration
This mirrors how larger research codebases are typically structured. LycheeAI has a great tutorial on this topic.
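As a hedged illustration of what such a registration looks like (the task ID, module paths, and class names below are assumptions for the sketch, not this project's actual identifiers), a task package's `__init__.py` typically registers its environment with Gymnasium so Isaac Lab's training scripts can discover it by name:

```python
# __init__.py of a task package (illustrative registration fragment;
# the ID, entry point, and config class names are hypothetical)
import gymnasium as gym

gym.register(
    id="RLPositioning-v0",  # hypothetical task ID
    entry_point=f"{__name__}.env:RLPositioningEnv",
    disable_env_checker=True,
    kwargs={
        # Isaac Lab resolves this entry point to the environment config class
        "env_cfg_entry_point": f"{__name__}.env_cfg:RLPositioningEnvCfg",
    },
)
```

Because the package is installed in editable mode, importing it anywhere triggers this registration, so `gym.make("RLPositioning-v0")` works from the standard training scripts.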
From the repository root, run:
```bash
bash scripts/build_all.sh
```

This script:
- Downloads the data folder of human motion datasets used in the simulations
- Installs `rlpos`
- Installs each task package in editable mode
- Ensures all Isaac Lab environments are registered
After installation, Isaac Lab training scripts can discover the tasks automatically.
This project exclusively uses Isaac Lab Direct Workflows.
Key concepts you will encounter repeatedly:
- DirectRLEnv: Base class where physics stepping, rewards, and observations are computed directly in PyTorch
- Environment configuration classes: Used to define observation sizes, episode length, reward scales, and assets
All tasks follow the same lifecycle:
- Scene setup (`_setup_scene`)
- Action application & state update (`_pre_physics_step`, `_apply_action`)
- Observation construction (`_get_observations`)
- Reward computation (`_get_rewards`)
- Termination checks (`_get_dones`)
See the Isaac Lab documentation for more details on these concepts.
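As a minimal pure-Python sketch of this lifecycle (no Isaac Lab dependency; the class and method bodies are illustrative, and the exact ordering inside Isaac Lab differs slightly — e.g., `_apply_action` may run several times per decimated physics step, and terminations are evaluated before observations):

```python
class LifecycleSketch:
    """Illustrative stand-in for a DirectRLEnv subclass, recording hook order."""

    def __init__(self):
        self.calls = []
        self._setup_scene()  # scene setup happens once, at construction

    def _setup_scene(self):
        self.calls.append("_setup_scene")

    def _pre_physics_step(self, action):
        self.calls.append("_pre_physics_step")

    def _apply_action(self):
        self.calls.append("_apply_action")

    def _get_observations(self):
        self.calls.append("_get_observations")
        return {}

    def _get_rewards(self):
        self.calls.append("_get_rewards")
        return 0.0

    def _get_dones(self):
        self.calls.append("_get_dones")
        return False, False

    def step(self, action):
        # Simplified single-substep version of the DirectRLEnv step loop
        self._pre_physics_step(action)
        self._apply_action()
        obs = self._get_observations()
        reward = self._get_rewards()
        terminated, truncated = self._get_dones()
        return obs, reward, terminated, truncated
```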
The `rlpos` package contains all shared simulation logic and is the backbone of the project.
- Robot model (`robot.py`): kinematics, state buffers, collision handling, noise injection, and velocity limits.
- User models (`user.py`): human agents with trajectories, velocities, interaction timers, and intent signals.
- Lidar sensor (`lidar.py`): ray-casting logic used for obstacle awareness.
- Discretizer (`discretizer.py`): converts static geometry into an occupancy grid used to find valid spawn locations.
- Utilities (`utils.py`): common math and helper functions.
By centralizing these components:
- Single-user and multi-user tasks behave identically at the physical level
- Bugs fixed once propagate everywhere
- Experiments remain comparable
Location: `single_user/rl_positioning`
This is the foundational task where the robot learns to position itself relative to a single human user. It uses a Direct Workflow for higher performance by calculating physics and rewards directly in PyTorch.
(Video: `observations.mp4`)
- Init: sets up the Discretizer, User, Lidar, and Robot from `rlpos`.
- Observations: concatenates relative positions, headings, velocities, and lidar ranges into a single tensor.
- Actions: outputs linear and angular velocity commands for the robot.
- Rewards: dense rewards for maintaining the ideal distance and orientation; penalties for collisions and jerky movements.
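A hedged sketch of the observation concatenation described above (the component names and ordering are illustrative; the actual environment concatenates batched PyTorch tensors, one row per parallel environment instance):

```python
def build_observation(rel_pos, heading, velocities, lidar_ranges):
    """Flatten all observation components into a single vector.

    Arguments are plain sequences here for illustration; the real task
    uses torch.cat on per-environment tensors instead.
    """
    return list(rel_pos) + [heading] + list(velocities) + list(lidar_ranges)
```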
Location: `multi_user/`
The multi-user scenario introduces two human users and significantly more complexity.
To manage this, the problem is decomposed hierarchically.
Location: `multi_user/guiderl`
GuideRL decides which user the robot should currently serve.
Instead of outputting motion commands, GuideRL outputs a scalar that updates `user_weight`, which encodes the relative importance of the two users.
Mechanism: `user_weight` is used to interpolate between the two users' positions, yielding a dynamic "Goal XY" (the ideal interaction point).
- Observations: kinematic data of the users (velocities, accelerations).
- Actions: a single scalar in [0, 1] representing the change in user weight.
- Rewards: a reward for choosing the correct user (the goal is set to the user who requests interaction first) and a penalty for switching too often.
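The weight update and goal interpolation can be sketched in plain Python (function and parameter names are assumptions, and the update scaling is illustrative; the real task operates on batched torch tensors):

```python
def update_goal(user_weight, action, pos_a, pos_b, rate=0.1):
    """Nudge user_weight with the scalar action and blend the goal XY.

    The action in [0, 1] is recentered to [-0.5, 0.5], so an action of 0.5
    leaves the weight unchanged; the goal is a convex combination of the
    two users' positions. (Illustrative scaling, not the thesis's exact one.)
    """
    user_weight = min(1.0, max(0.0, user_weight + rate * (action - 0.5)))
    goal = tuple(user_weight * a + (1.0 - user_weight) * b
                 for a, b in zip(pos_a, pos_b))
    return user_weight, goal
```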
GuideRL must be trained first and then frozen.
Location: `multi_user/navrl`
NavRL is responsible for physically moving the robot.
During each step:
- NavRL collects user observations
- A frozen GuideRL policy computes the target goal
- NavRL receives the goal relative to the robot
- NavRL outputs velocity commands
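The four steps above can be sketched as one function (all names are illustrative assumptions; the real environments vectorize this over many parallel instances, and the weight-update scaling mirrors the illustrative choice, not the thesis's exact one):

```python
def hierarchical_step(guide_policy, nav_policy, user_obs, robot_xy,
                      user_weight, pos_a, pos_b, lidar):
    """One step of the frozen-GuideRL + NavRL hierarchy (illustrative)."""
    # 1. The frozen high-level policy outputs the weight-change scalar.
    action = guide_policy(user_obs)
    user_weight = min(1.0, max(0.0, user_weight + 0.1 * (action - 0.5)))
    # 2. The weight blends the users' positions into the dynamic goal.
    goal = tuple(user_weight * a + (1.0 - user_weight) * b
                 for a, b in zip(pos_a, pos_b))
    # 3. The goal is expressed relative to the robot.
    rel_goal = tuple(g - r for g, r in zip(goal, robot_xy))
    # 4. The low-level policy maps its observations to velocity commands.
    lin_vel, ang_vel = nav_policy(user_obs, rel_goal, lidar)
    return (lin_vel, ang_vel), user_weight
```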
- Observations: kinematic data of the users, the position of the dynamic goal relative to the robot, LiDAR readings, and the robot state.
- Actions: linear and angular velocity commands.
- Rewards: similar to the single-user task, but based on distance and orientation to the dynamic goal set by GuideRL.
This separation allows each agent to solve a well-scoped problem.
Single-user task:

```bash
cd single_user/rl_positioning
bash scripts/train.sh
bash scripts/play.sh
```

GuideRL (must be trained first):

```bash
cd multi_user/guiderl
bash scripts/train.sh
```

Ensure the GuideRL checkpoint path is set, then train NavRL:

```bash
cd multi_user/navrl
bash scripts/train.sh
bash scripts/play.sh
```

To visualize the rewards and statistics, launch TensorBoard:

```bash
cd <task_directory>
tensorboard --logdir logs/
```

This starts a local server where you can inspect training curves at localhost:6006.
(Video: `results.mp4`)
Common extensions include:
- Adding new reward terms
- Increasing the number of users
- Transferring policies to real hardware (sim-to-real)
The modular structure is designed to make these changes localized and safe.

