This repository contains a complete research and experimentation framework built on Isaac Lab v2.1.0 for learning safe and adaptive robot positioning in single-user and multi-user human–robot interaction scenarios. For a detailed overview of this project please refer to the thesis document.
The key design goal of this project is clarity and modularity:
- All tasks are implemented as external Python packages (no modification of the Isaac Lab core repository).
- Core physical and behavioral logic is centralized in a shared library (`rlpos`).
- Increasing task complexity is handled hierarchically: single-user → multi-user planning → multi-user navigation.
After reading this README, you should understand:
- How Isaac Lab direct workflows work
- How the single-user and multi-user tasks differ
- How GuideRL and NavRL interact
- How to train, evaluate, and extend the system
- Prerequisites
- Project Architecture & Folder Structure
- Installation
- Isaac Lab Concepts Used in This Project
- The `rlpos` Shared Library
- Single-User Task: RL Positioning
- Multi-User Tasks: GuideRL & NavRL
- Training & Testing Workflow
- Extending the Project
This project assumes basic familiarity with Isaac Sim and Reinforcement Learning, but no prior experience with Isaac Lab task development.
Before working with this repository, you should:
- Have Isaac Lab v2.1.0 installed and verified
- Be able to launch Isaac Sim and run headless training
- Use a Python environment (e.g., conda) in which Isaac Lab is installed and active; see the Isaac Lab conda installation instructions.
To understand the terminology used here (DirectRLEnv, decimation, task registration, etc.), it is strongly recommended to skim the official Isaac Lab concepts page.
The documentation targets newer Isaac Lab versions, but the core concepts are unchanged and fully applicable to v2.1.0.
At a high level, the repository is organized as follows:
```text
root/
├── assets/       # Shared simulation assets for Isaac Sim
├── data/         # Human motion dataset (must be downloaded)
├── IsaacLab/     # Isaac Lab v2.1.0 installation (not modified)
├── scripts/      # Build utilities and trajectory utility scripts
├── rlpos/        # Shared physics & logic
├── single_user/  # Single-user RL task
└── multi_user/   # Multi-user hierarchical tasks
    ├── guiderl/  # High-level user selection task
    └── navrl/    # Low-level navigation task
```
Isaac Lab normally expects tasks to live inside its own source tree. Instead, this project:
- Treats each task as a standalone Python package
- Registers environments through `__init__.py`
- Uses editable installs to enable rapid iteration
This mirrors how larger research codebases are typically structured. LycheeAI has a great tutorial on this topic.
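As a hedged illustration of what such a registration looks like (the task ID, module paths, and class names below are assumptions for the sketch, not this project's actual identifiers), a task package's `__init__.py` typically registers its environment with Gymnasium so Isaac Lab's training scripts can discover it by name:

```python
# __init__.py of a task package (illustrative registration fragment;
# the ID, entry point, and config class names are hypothetical)
import gymnasium as gym

gym.register(
    id="RLPositioning-v0",  # hypothetical task ID
    entry_point=f"{__name__}.env:RLPositioningEnv",
    disable_env_checker=True,
    kwargs={
        # Isaac Lab resolves this entry point to the environment config class
        "env_cfg_entry_point": f"{__name__}.env_cfg:RLPositioningEnvCfg",
    },
)
```

Because the package is installed in editable mode, importing it anywhere triggers this registration, so `gym.make("RLPositioning-v0")` works from the standard training scripts.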
From the repository root, run:
```bash
bash scripts/build_all.sh
```

This script:
- Downloads the data folder of human motion datasets used in the simulations
- Installs `rlpos`
- Installs each task package in editable mode
- Ensures all Isaac Lab environments are registered
After installation, Isaac Lab training scripts can discover the tasks automatically.
This project exclusively uses Isaac Lab Direct Workflows.
Key concepts you will encounter repeatedly:
- DirectRLEnv: Base class where physics stepping, rewards, and observations are computed directly in PyTorch
- Environment configuration classes: Used to define observation sizes, episode length, reward scales, and assets
All tasks follow the same lifecycle:
- Scene setup (`_setup_scene`)
- Action application & state update (`_pre_physics_step`, `_apply_action`)
- Observation construction (`_get_observations`)
- Reward computation (`_get_rewards`)
- Termination checks (`_get_dones`)
See the Isaac Lab documentation for more details on these concepts.
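As a minimal pure-Python sketch of this lifecycle (no Isaac Lab dependency; the class and method bodies are illustrative, and the exact ordering inside Isaac Lab differs slightly — e.g., `_apply_action` may run several times per decimated physics step, and terminations are evaluated before observations):

```python
class LifecycleSketch:
    """Illustrative stand-in for a DirectRLEnv subclass, recording hook order."""

    def __init__(self):
        self.calls = []
        self._setup_scene()  # scene setup happens once, at construction

    def _setup_scene(self):
        self.calls.append("_setup_scene")

    def _pre_physics_step(self, action):
        self.calls.append("_pre_physics_step")

    def _apply_action(self):
        self.calls.append("_apply_action")

    def _get_observations(self):
        self.calls.append("_get_observations")
        return {}

    def _get_rewards(self):
        self.calls.append("_get_rewards")
        return 0.0

    def _get_dones(self):
        self.calls.append("_get_dones")
        return False, False

    def step(self, action):
        # Simplified single-substep version of the DirectRLEnv step loop
        self._pre_physics_step(action)
        self._apply_action()
        obs = self._get_observations()
        reward = self._get_rewards()
        terminated, truncated = self._get_dones()
        return obs, reward, terminated, truncated
```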
The `rlpos` package contains all shared simulation logic and is the backbone of the project.
- Robot model (`robot.py`): kinematics, state buffers, collision handling, noise injection, and velocity limits.
- User models (`user.py`): human agents with trajectories, velocities, interaction timers, and intent signals.
- Lidar sensor (`lidar.py`): ray-casting logic used for obstacle awareness.
- Discretizer (`discretizer.py`): converts static geometry into an occupancy grid used to find valid spawn locations.
- Utilities (`utils.py`): common math and helper functions.
By centralizing these components:
- Single-user and multi-user tasks behave identically at the physical level
- Bugs fixed once propagate everywhere
- Experiments remain comparable
Location: `single_user/rl_positioning`
This is the foundational task where the robot learns to position itself relative to a single human user. It uses a Direct Workflow for higher performance by calculating physics and rewards directly in PyTorch.
(Video: `observations.mp4`)
- Init: sets up the Discretizer, User, Lidar, and Robot from `rlpos`.
- Observations: concatenates relative positions, headings, velocities, and lidar ranges into a single tensor.
- Actions: outputs linear and angular velocity commands for the robot.
- Rewards: dense rewards for maintaining the ideal distance and orientation; penalties for collisions and jerky movements.
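A hedged sketch of the observation concatenation described above (the component names and ordering are illustrative; the actual environment concatenates batched PyTorch tensors, one row per parallel environment instance):

```python
def build_observation(rel_pos, heading, velocities, lidar_ranges):
    """Flatten all observation components into a single vector.

    Arguments are plain sequences here for illustration; the real task
    uses torch.cat on per-environment tensors instead.
    """
    return list(rel_pos) + [heading] + list(velocities) + list(lidar_ranges)
```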
Location: `multi_user/`
The multi-user scenario introduces two human users and significantly more complexity.
To manage this, the problem is decomposed hierarchically.
Location: `multi_user/guiderl`
GuideRL decides which user the robot should currently serve.
Instead of outputting motion commands, GuideRL outputs a scalar that updates `user_weight`, which encodes the relative importance of the two users.
Mechanism: `user_weight` is used to interpolate between the two users' positions, yielding a dynamic "Goal XY" (the ideal interaction point).
- Observations: kinematic data of the users (velocities, accelerations).
- Actions: a single scalar in [0, 1] representing the change in user weight.
- Rewards: a reward for choosing the correct user (the goal is set to the user who requests interaction first) and a penalty for switching too often.
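The weight update and goal interpolation can be sketched in plain Python (function and parameter names are assumptions, and the update scaling is illustrative; the real task operates on batched torch tensors):

```python
def update_goal(user_weight, action, pos_a, pos_b, rate=0.1):
    """Nudge user_weight with the scalar action and blend the goal XY.

    The action in [0, 1] is recentered to [-0.5, 0.5], so an action of 0.5
    leaves the weight unchanged; the goal is a convex combination of the
    two users' positions. (Illustrative scaling, not the thesis's exact one.)
    """
    user_weight = min(1.0, max(0.0, user_weight + rate * (action - 0.5)))
    goal = tuple(user_weight * a + (1.0 - user_weight) * b
                 for a, b in zip(pos_a, pos_b))
    return user_weight, goal
```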
GuideRL must be trained first and then frozen.
Location: `multi_user/navrl`
NavRL is responsible for physically moving the robot.
During each step:
- NavRL collects user observations
- A frozen GuideRL policy computes the target goal
- NavRL receives the goal relative to the robot
- NavRL outputs velocity commands
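The four steps above can be sketched as one function (all names are illustrative assumptions; the real environments vectorize this over many parallel instances, and the weight-update scaling mirrors the illustrative choice, not the thesis's exact one):

```python
def hierarchical_step(guide_policy, nav_policy, user_obs, robot_xy,
                      user_weight, pos_a, pos_b, lidar):
    """One step of the frozen-GuideRL + NavRL hierarchy (illustrative)."""
    # 1. The frozen high-level policy outputs the weight-change scalar.
    action = guide_policy(user_obs)
    user_weight = min(1.0, max(0.0, user_weight + 0.1 * (action - 0.5)))
    # 2. The weight blends the users' positions into the dynamic goal.
    goal = tuple(user_weight * a + (1.0 - user_weight) * b
                 for a, b in zip(pos_a, pos_b))
    # 3. The goal is expressed relative to the robot.
    rel_goal = tuple(g - r for g, r in zip(goal, robot_xy))
    # 4. The low-level policy maps its observations to velocity commands.
    lin_vel, ang_vel = nav_policy(user_obs, rel_goal, lidar)
    return (lin_vel, ang_vel), user_weight
```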
- Observations: kinematic data of the users, the position of the dynamic goal relative to the robot, LiDAR readings, and the robot state.
- Actions: linear and angular velocity commands.
- Rewards: similar to the single-user task, but based on distance and orientation to the dynamic goal set by GuideRL.
This separation allows each agent to solve a well-scoped problem.
Single-user task:

```bash
cd single_user/rl_positioning
bash scripts/train.sh
bash scripts/play.sh
```

GuideRL (must be trained first):

```bash
cd multi_user/guiderl
bash scripts/train.sh
```

Ensure the GuideRL checkpoint path is set, then train NavRL:

```bash
cd multi_user/navrl
bash scripts/train.sh
bash scripts/play.sh
```

To visualize the rewards and statistics, launch TensorBoard:

```bash
cd <task_directory>
tensorboard --logdir logs/
```

This starts a local server where you can inspect training curves at localhost:6006.
(Video: `results.mp4`)
Common extensions include:
- Adding new reward terms
- Increasing the number of users
- Transferring policies to real hardware (sim-to-real)
The modular structure is designed to make these changes localized and safe.

