This baseline provides a reinforcement learning environment based on NVIDIA Isaac Gym for the Pi humanoid robots from HighTorque Robotics. Pi_rl_baseline also includes a sim2sim framework from Isaac Gym to Mujoco, enabling users to validate trained policies in different simulators to ensure policy robustness and generalization.
- Use `miniconda` or `anaconda` to create a virtual environment: `conda create -n pi_env python=3.8`.
- Use `apt` to install the NVIDIA driver: `sudo apt install nvidia-driver-525`. The driver version must be at least 515; installing a higher version also works, as the driver is backward compatible. After installation, check the driver's CUDA version with `nvidia-smi`. In the example picture, the CUDA version is 12.4 and the driver version is 550.
- Install the latest version of PyTorch: visit the PyTorch website https://pytorch.org/, choose `Conda` for `Package`, and choose a suitable CUDA version for `Compute Platform`. CUDA is backward compatible but not forward compatible, so the chosen CUDA version must not be newer than the version installed on your machine: `conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia`
- Use `conda` to install numpy: `conda install numpy=1.23`.
- Install Isaac Gym:
  - Download and install Isaac Gym Preview 4 from the NVIDIA website: https://developer.nvidia.com/isaac-gym.
  - Activate the conda environment, then install from the `isaacgym` package directory: `cd isaacgym/python && pip install -e .`
  - Run the bundled example script to test whether the environment is installed successfully: `cd examples && python 1080_balls_of_solitude.py`.
  - Refer to `isaacgym/docs/index.html` for troubleshooting.
- Install this baseline:
  - Clone this repository: `git clone https://github.com/HighTorque-Locomotion/pi_rl_baseline.git`, then `cd pi_rl_baseline && pip install -e .` (a quick sanity-check sketch for the installed environment follows this list).
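Before launching training, it can help to confirm that the installed pieces are consistent. The snippet below is a minimal sketch, not part of the repository; it assumes the `pi_env` environment from the steps above. Note that Isaac Gym must be imported before `torch` in the same script.

```python
# Minimal sanity check (a sketch, not part of this repository): run inside the
# pi_env conda environment. Isaac Gym must be imported before torch, so it comes first.
import isaacgym  # noqa: F401  (import succeeds only if Isaac Gym is installed correctly)
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```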
```bash
# Use 4096 environments, with "v1" as the run name, for PPO policy training.
# This command starts the robot's training task.
python scripts/train.py --task=pai_ppo --run_name v1 --headless --num_envs 4096

# Evaluate the trained policy.
# This command loads the "v1" policy for performance evaluation in its environment.
# It also automatically exports a JIT model suitable for deployment.
python scripts/play.py --task=pai_ppo --run_name v1

# Use Mujoco for sim2sim.
python scripts/sim2sim.py --load_model /path/to/logs/Pai_ppo/exported/policies/policy_torch.pt

# Run the trained policy provided by us.
python scripts/sim2sim.py --load_model /path/to/logs/Pai_ppo/exported/policies/policy_example.pt
```

- CPU and GPU Usage: To run the simulation on the CPU, set both `--sim_device=cpu` and `--rl_device=cpu`. To run on a specific GPU, set both `--sim_device=cuda:{0,1,2...}` and `--rl_device={0,1,2...}`. Please note: `CUDA_VISIBLE_DEVICES` is not applicable, and matching the `--sim_device` and `--rl_device` settings is important.
- Headless Operation: use the `--headless` flag to run without rendering.
- Rendering Control: during training, press `v` to toggle rendering on and off.
- Policy Location: trained models are saved to `humanoid/logs/<experiment_name>/<date_time>_<run_name>/model_<iteration>.pt`.
For RL training options, please refer to `humanoid/utils/helpers.py`.
For sim2sim, please refer to `humanoid/scripts/sim2sim.py`.
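As noted above, `play.py` exports a JIT (TorchScript) model for deployment. A minimal sketch of loading and querying such a policy outside the provided scripts is shown below; the path and the observation size are placeholders, not values taken from this repository, so check your exported model and environment config for the real ones.

```python
# Hypothetical sketch: load the JIT policy exported by play.py and run one forward pass.
# The path and observation size below are placeholders.
import torch

policy = torch.jit.load("/path/to/logs/Pai_ppo/exported/policies/policy_torch.pt")
policy.eval()

obs = torch.zeros(1, 47)        # placeholder observation vector (batch of 1)
with torch.no_grad():
    action = policy(obs)        # policy output for one control step
print(action.shape)
```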
- Every environment relies on an `env` file (`legged_robot.py`) and a `config` file (`legged_robot_config.py`). The latter contains two classes: `LeggedRobotCfg` (which contains all environment parameters) and `LeggedRobotCfgPPO` (which contains all training parameters).
- Both `env` and `config` classes use inheritance.
- In `cfg`, each non-zero reward scale adds a function with the corresponding name to the total reward.
- Tasks must be registered with `task_registry.register(name, EnvClass, EnvConfig, TrainConfig)` (see the registration sketch below). Registration can happen within `envs/__init__.py` or outside this repository.
The base environment `legged_robot` constructs a rough-terrain locomotion task. The corresponding configuration does not specify a robot asset (URDF/MJCF) or reward scales.
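As a concrete illustration of the registration call above, here is a hedged sketch of what an entry in `envs/__init__.py` might look like. The environment and config class names (`MyRobotEnv`, `MyRobotCfg`, `MyRobotCfgPPO`) and the exact import path of `task_registry` are assumptions, not classes shipped with this baseline.

```python
# Hypothetical sketch of a task registration inside humanoid/envs/__init__.py.
# Class names and module paths below are placeholders.
from humanoid.utils.task_registry import task_registry   # assumed location of the registry

from .custom.my_robot_env import MyRobotEnv                     # hypothetical environment class
from .custom.my_robot_config import MyRobotCfg, MyRobotCfgPPO   # hypothetical config classes

# task_registry.register(name, EnvClass, EnvConfig, TrainConfig)
task_registry.register("my_robot_ppo", MyRobotEnv, MyRobotCfg(), MyRobotCfgPPO())
```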
- If you need to add a new environment, create a new folder under `envs/` containing a configuration file named `<your_env>_config.py`. The new configuration should inherit from an existing environment configuration.
- If you are using a new robot:
  - Insert the corresponding assets into the `resources/` folder.
  - In the `cfg` file, set the path to the asset, and define the body names, `default_joint_positions`, and PD gains. Specify the desired `train_cfg` and the name of the environment (python class). A hedged config/environment sketch follows this list.
  - In the `train_cfg`, set `experiment_name` and `run_name`.
- If needed, create your environment in `<your_env>.py`. Inherit from an existing environment, overriding the required functions and/or adding your reward functions.
- Register your environment in `humanoid/envs/__init__.py`.
- Modify or adjust other parameters in `cfg` or `cfg_train` as needed. To remove a reward, set its scale to zero. Avoid modifying other environments' parameters!
- If you want your new robots/environments to support sim2sim, you may need to modify `humanoid/scripts/sim2sim.py`:
  - Check the robot joint mapping between MJCF and URDF (a joint-remapping sketch also follows this list).
  - Change the initial joint positions of the robot according to your trained policy.
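To make the new-robot steps above concrete, the sketch below outlines a `<your_env>_config.py` and `<your_env>.py`. The attribute layout (`asset.file`, `init_state.default_joint_angles`, `control.stiffness`/`damping`, `runner.experiment_name`, and reward methods prefixed with `_reward_`) follows common legged_gym conventions and is an assumption; check the existing configs under `envs/` for the exact field names this baseline uses.

```python
# Hypothetical sketch of a new robot config and environment, following legged_gym-style
# conventions. All field names, paths, and joint names are placeholders.
import torch

from humanoid.envs.base.legged_robot import LeggedRobot                          # assumed base env
from humanoid.envs.base.legged_robot_config import LeggedRobotCfg, LeggedRobotCfgPPO


class MyRobotCfg(LeggedRobotCfg):
    class asset(LeggedRobotCfg.asset):
        file = "resources/my_robot/urdf/my_robot.urdf"                           # path to the asset

    class init_state(LeggedRobotCfg.init_state):
        default_joint_angles = {"left_hip_pitch": 0.0, "left_knee": 0.3}         # placeholder joints

    class control(LeggedRobotCfg.control):
        stiffness = {"hip": 40.0, "knee": 40.0}                                  # PD gains (P)
        damping = {"hip": 1.0, "knee": 1.0}                                      # PD gains (D)


class MyRobotCfgPPO(LeggedRobotCfgPPO):
    class runner(LeggedRobotCfgPPO.runner):
        experiment_name = "my_robot_ppo"
        run_name = "v1"


class MyRobotEnv(LeggedRobot):
    # A non-zero scale named "my_term" in the config picks up a method named
    # _reward_my_term, per the reward-naming rule described above.
    def _reward_my_term(self):
        return torch.zeros(self.num_envs, device=self.device)
```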
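For the sim2sim joint-mapping check mentioned above, a small index-remapping sketch is shown below; the joint name lists are placeholders and do not correspond to the actual Pi robot model.

```python
# Hypothetical sketch: reorder policy outputs from the URDF (training) joint order
# to the MJCF (Mujoco) joint order. Joint names are placeholders.
import numpy as np

urdf_joint_names = ["l_hip", "l_knee", "r_hip", "r_knee"]   # order used during training
mjcf_joint_names = ["r_hip", "r_knee", "l_hip", "l_knee"]   # order in the Mujoco model

# position of each URDF joint inside the MJCF ordering
urdf_to_mjcf = np.array([mjcf_joint_names.index(name) for name in urdf_joint_names])

policy_targets = np.zeros(len(urdf_joint_names))              # actions in URDF order
mujoco_targets = np.empty_like(policy_targets)
mujoco_targets[urdf_to_mjcf] = policy_targets                 # same values, MJCF order
print(mujoco_targets)
```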
The implementation of pi_rl_baseline relies on resources from the legged_gym project.


