Telios/master_thesis

Project Overview

This repository contains a complete research and experimentation framework built on Isaac Lab v2.1.0 for learning safe and adaptive robot positioning in single-user and multi-user human–robot interaction scenarios. For a detailed overview of this project, please refer to the thesis document.

The key design goal of this project is clarity and modularity:

  • All tasks are implemented as external Python packages (no modification of the Isaac Lab core repository).
  • Core physical and behavioral logic is centralized in a shared library (rlpos).
  • Increasing task complexity is handled hierarchically: single-user → multi-user planning → multi-user navigation.

After reading this README, you should understand:

  • How Isaac Lab direct workflows work
  • How the single-user and multi-user tasks differ
  • How GuideRL and NavRL interact
  • How to train, evaluate, and extend the system


Prerequisites

This project assumes basic familiarity with Isaac Sim and Reinforcement Learning, but no prior experience with Isaac Lab task development.

Before working with this repository, you should:

  • Have Isaac Lab v2.1.0 installed and verified
  • Be able to launch Isaac Sim and run headless training
  • Use a Python environment (e.g., conda) in which Isaac Lab is active; see the Isaac Lab conda install instructions.

To understand the terminology used here (DirectRLEnv, decimation, task registration, etc.), it is strongly recommended to skim the official Isaac Lab concepts page.

The documentation targets newer Isaac Lab versions, but the core concepts are unchanged and fully applicable to v2.1.0.

Project Architecture & Folder Structure

At a high level, the repository is organized as follows:

root/
├── assets/                 # Shared simulation assets for Isaac Sim
├── data/                   # Human motion dataset (has to be downloaded)
├── IsaacLab/               # Isaac Lab v2.1.0 installation (not modified)
├── scripts/                # Build utilities and trajectory utils scripts
├── rlpos/                  # Shared physics & logic
├── single_user/            # Single-user RL task
└── multi_user/             # Multi-user hierarchical tasks
    ├── guiderl/            # High-level user selection task
    └── navrl/              # Low-level navigation task

Why this structure?

Isaac Lab normally expects tasks to live inside its own source tree. Instead, this project:

  • Treats each task as a standalone Python package
  • Registers environments through __init__.py
  • Uses editable installs to enable rapid iteration

This mirrors how larger research codebases are typically structured. LycheeAI has a great tutorial on this topic.

Installation

From the repository root, run:

bash scripts/build_all.sh

This script:

  • Downloads the data folder of human motion datasets used in the simulations
  • Installs rlpos
  • Installs each task package in editable mode
  • Ensures all Isaac Lab environments are registered

After installation, Isaac Lab training scripts can discover the tasks automatically.

Isaac Lab Concepts Used in This Project

This project exclusively uses Isaac Lab Direct Workflows.

Key concepts you will encounter repeatedly:

  • DirectRLEnv: Base class where physics stepping, rewards, and observations are computed directly in PyTorch
  • Environment configuration classes: Used to define observation sizes, episode length, reward scales, and assets

All tasks follow the same lifecycle:

  1. Scene setup (_setup_scene)
  2. Action application & State update (_pre_physics_step, _apply_action)
  3. Observation construction (_get_observations)
  4. Reward computation (_get_rewards)
  5. Termination checks (_get_dones)

See the Isaac Lab documentation for more details on these concepts.
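The call order of these hooks can be made concrete with a plain-Python stub. This is not the real Isaac Lab `DirectRLEnv` base class (which handles batched physics, decimation, and resets); it only mirrors the method names listed above to show when each is invoked during a step.

```python
# A plain-Python stub mimicking the DirectRLEnv lifecycle, purely to show
# the order in which the hooks run. NOT the real Isaac Lab base class.
class LifecycleStubEnv:
    def __init__(self):
        self.calls = []
        self._setup_scene()                      # 1. scene setup, once at init

    def _setup_scene(self):
        self.calls.append("_setup_scene")

    def _pre_physics_step(self, action):
        self.calls.append("_pre_physics_step")   # 2a. preprocess the action

    def _apply_action(self):
        self.calls.append("_apply_action")       # 2b. applied each physics substep

    def _get_observations(self):
        self.calls.append("_get_observations")   # 3. build observation tensor
        return {}

    def _get_rewards(self):
        self.calls.append("_get_rewards")        # 4. dense reward terms
        return 0.0

    def _get_dones(self):
        self.calls.append("_get_dones")          # 5. termination / timeout flags
        return False, False

    def step(self, action):
        self._pre_physics_step(action)
        self._apply_action()
        obs = self._get_observations()
        rew = self._get_rewards()
        terminated, truncated = self._get_dones()
        return obs, rew, terminated, truncated

env = LifecycleStubEnv()
env.step(action=None)
```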

The rlpos Shared Library

The rlpos package contains all shared simulation logic and is the backbone of the project.

What lives here

  • Robot model (robot.py) Kinematics, state buffers, collision handling, noise injection, and velocity limits.

  • User models (user.py) Human agents with trajectories, velocities, interaction timers, and intent signals.

  • Lidar sensor (lidar.py) Ray-casting logic used for obstacle awareness.

  • Discretizer (discretizer.py) Converts static geometry into an occupancy grid used for valid spawn locations.

  • Utilities (utils.py) Common math and helper functions.
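To make the Discretizer's role concrete, here is a minimal sketch of static geometry rasterized into an occupancy grid and queried for valid spawn cells. The class name matches `discretizer.py`, but the API below is an assumption for illustration, not the actual rlpos interface.

```python
import random

# Hypothetical sketch of the Discretizer's job: rasterize static geometry
# into an occupancy grid, then sample collision-free spawn locations.
# The constructor and method names are assumed, not taken from rlpos.
class Discretizer:
    def __init__(self, width, height, obstacles):
        # occupied[y][x] is True where static geometry blocks spawning
        self.occupied = [[(x, y) in obstacles for x in range(width)]
                         for y in range(height)]

    def free_cells(self):
        return [(x, y)
                for y, row in enumerate(self.occupied)
                for x, filled in enumerate(row) if not filled]

    def sample_spawn(self, rng=random):
        # Uniformly pick one unoccupied cell as a spawn location
        return rng.choice(self.free_cells())

grid = Discretizer(4, 3, obstacles={(1, 1), (2, 2)})
spawn = grid.sample_spawn()
```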

Why this matters

By centralizing these components:

  • Single-user and multi-user tasks behave identically at the physical level
  • Bugs fixed once propagate everywhere
  • Experiments remain comparable

Single-User Task: RL Positioning

Location: single_user/rl_positioning

This is the foundational task where the robot learns to position itself relative to a single human user. It uses a Direct Workflow for higher performance by calculating physics and rewards directly in PyTorch.

Environment Logic (rl_positioning_env.py)

Video: observations.mp4

Init: Sets up the Discretizer, User, Lidar, and Robot from rlpos.

Observations: Concatenates relative positions, headings, velocities, and lidar ranges into a single tensor.

Actions: Outputs linear and angular velocity commands for the robot.

Rewards: Dense rewards for maintaining ideal distance/orientation, penalties for collisions and jerky movements.
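A scalar sketch of such a dense reward is shown below. The real environment computes this batched in PyTorch across all parallel environments; the target distance, weights, and penalty magnitude here are placeholder assumptions, not the thesis's tuned values.

```python
import math

# Illustrative scalar version of a dense positioning reward: peak reward at
# the ideal distance, bonus for facing the user, sparse collision penalty.
# All constants below are assumptions for demonstration only.
def positioning_reward(dist, heading_err, collided,
                       target_dist=1.0, w_dist=1.0, w_head=0.5,
                       collision_penalty=-10.0):
    r = w_dist * math.exp(-abs(dist - target_dist))  # maximal at ideal distance
    r += w_head * math.cos(heading_err)              # maximal when facing the user
    if collided:
        r += collision_penalty                       # strong sparse safety penalty
    return r

# Facing the user at the ideal distance scores higher than being far and turned away:
good = positioning_reward(1.0, 0.0, False)
bad = positioning_reward(3.0, math.pi, False)
```

Penalties for jerky movements would add a term on the change in commanded velocity between steps, which this scalar sketch omits.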

Multi-User Tasks: GuideRL & NavRL

Location: multi_user/

The multi-user scenario introduces two human users and significantly more complexity.

To manage this, the problem is decomposed hierarchically.

Agent 1: GuideRL (High-Level Planner)

Location: multi_user/guiderl

GuideRL decides which user the robot should currently serve.

Core idea

Instead of outputting motion commands, GuideRL outputs a scalar that updates user_weight, a value indicating the relative importance of each user.

Mechanism: The user_weight is used to interpolate between the two users' positions to calculate a dynamic "Goal XY" (the ideal interaction point).

Observations: Kinematic data of users (velocities, accelerations).

Actions: A single scalar in [0, 1] representing the user weight change.

Rewards: Rewards for selecting the correct user (the goal is set to whichever user requests interaction first); penalties for switching too often.
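The weight-to-goal mechanism can be sketched as below. How exactly the action updates user_weight is an assumption here (a clamped nudge toward the action); the interpolation to the dynamic "Goal XY" follows the description above.

```python
# Sketch of the GuideRL mechanism: the policy's scalar action in [0, 1]
# updates user_weight, and the goal is interpolated between the two users.
# The update rule and step size are illustrative assumptions.
def update_goal(user_weight, action, user1_xy, user2_xy, step=0.1):
    # Nudge the weight toward the action and clamp to [0, 1]
    user_weight = min(1.0, max(0.0, user_weight + step * (action - user_weight)))
    # Goal XY = weighted interpolation of the two users' positions
    goal = tuple(user_weight * a + (1.0 - user_weight) * b
                 for a, b in zip(user1_xy, user2_xy))
    return user_weight, goal

# action = 1.0 pulls the weight (and hence the goal) toward user 1
w, goal = update_goal(0.5, action=1.0, user1_xy=(0.0, 0.0), user2_xy=(4.0, 2.0))
```

Because the weight changes by a bounded step each call, the goal moves smoothly rather than jumping between users, which is what the switching penalty discourages.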

GuideRL must be trained first; its policy is then frozen before NavRL training.

Agent 2: NavRL (Low-Level Navigator)

Location: multi_user/navrl

NavRL is responsible for physically moving the robot.

Integration with GuideRL

During each step:

  1. NavRL collects user observations
  2. A frozen GuideRL policy computes the target goal
  3. NavRL receives the goal relative to the robot
  4. NavRL outputs velocity commands

Observations: Kinematic data of users, relative position to the dynamic goal, LiDAR readings, and robot state.

Actions: Linear and angular velocity commands.

Rewards: Similar to single-user task but based on distance/orientation to the dynamic goal set by GuideRL.

This separation allows each agent to solve a well-scoped problem.
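The four integration steps above can be sketched in a few lines. Both policies below are stand-in functions (a constant guide and a proportional controller), not the trained networks; only the data flow mirrors the description.

```python
# Sketch of one hierarchical step: frozen GuideRL picks the goal, NavRL
# turns the robot-relative goal into velocity commands. The two policies
# here are stand-ins for the trained networks.
def frozen_guide_policy(user_obs):
    # Stand-in: always weight user 1 fully (the real policy is a frozen network)
    return 1.0

def nav_policy(rel_goal):
    # Stand-in proportional controller instead of the trained NavRL network
    gx, gy = rel_goal
    return 0.5 * gx, 0.5 * gy  # illustrative (linear, angular) command

def hierarchical_step(robot_xy, user1_xy, user2_xy):
    w = frozen_guide_policy(user_obs=(user1_xy, user2_xy))          # steps 1-2
    goal = tuple(w * a + (1 - w) * b for a, b in zip(user1_xy, user2_xy))
    rel_goal = (goal[0] - robot_xy[0], goal[1] - robot_xy[1])       # step 3
    return nav_policy(rel_goal)                                     # step 4
```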

Training & Testing Workflow

Single-User Task

cd single_user/rl_positioning
bash scripts/train.sh
bash scripts/play.sh

Multi-User Tasks

GuideRL

cd multi_user/guiderl
bash scripts/train.sh

NavRL

Ensure the GuideRL checkpoint path is set, then:

cd multi_user/navrl
bash scripts/train.sh
bash scripts/play.sh

To visualize the rewards and statistics, launch TensorBoard:

cd <task_directory>
tensorboard --logdir logs/

This starts a local server; open http://localhost:6006 to inspect the training curves.

Video: results.mp4

Extending the Project

Common extensions include:

  • Adding new reward terms
  • Increasing the number of users
  • Transferring policies to real robots (sim-to-real)

The modular structure is designed to make these changes localized and safe.

About

Deep Reinforcement Learning framework for learning safe and adaptive robot positioning in single- and multi-user human–robot interaction scenarios.
