Authors: Shengze Wang, Jiefeng Li, Tianye Li, Ye Yuan, Henry Fuchs, Koki Nagano*, Shalini De Mello*, Michael Stengel*
*denotes equal contribution
Unlike prior methods that avoid camera recovery and only work well for subjects far from the camera, BLADE tackles close-range human mesh recovery, where perspective distortion is strongest, in three steps:
- Predicts pelvis depth (Tz) directly from the image
- Conditions pose/shape on Tz: the same shape/pose can look different at different distances
- Recovers the full perspective camera: focal length and XY translation through differentiable rasterization

- ✅ True perspective HMR: recovers focal length and full 3D translation without heuristics
- ✅ Close-range robust: strong performance under severe perspective distortion
- ✅ 3D accurate & 2D aligned: improved 3D pose, metrical depth, and 2D reprojection alignment
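At a glance, inference proceeds roughly as follows. This is a conceptual sketch with hypothetical function names and placeholder stubs, not the actual API in api/BLADE_API.py:

```python
# Conceptual sketch of the three-stage flow (hypothetical names, placeholder stubs).
def depthnet(image):
    """Stage 1: predict pelvis depth Tz (meters) directly from the image."""
    return 0.8  # placeholder

def posenet(image, tz):
    """Stage 2: predict SMPL-X pose/shape conditioned on Tz."""
    return "pose_params", "shape_params"  # placeholders

def solve_camera(image, pose, shape, tz):
    """Stage 3: recover focal length and XY translation via differentiable rasterization."""
    return 5000.0, (0.0, 0.0)  # placeholders: focal length (px), Txy (m)

def blade_inference(image):
    tz = depthnet(image)
    pose, shape = posenet(image, tz)
    focal_length, txy = solve_camera(image, pose, shape, tz)
    return pose, shape, focal_length, (*txy, tz)
```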
 
@inproceedings{wang2025blade,
  title     = {BLADE: Single-view Body Mesh Estimation through Accurate Depth Estimation},
  author    = {Wang, Shengze and Li, Jiefeng and Li, Tianye and Yuan, Ye and Fuchs, Henry and Nagano, Koki and De Mello, Shalini and Stengel, Michael},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2025}
}

BLADE builds on the outstanding work and open-source contributions of many teams. We thank:
- MMHuman3D — dataset preprocessing utilities, evaluation protocols, and loaders that informed our data pipeline.
- ZOLLY & PDHuman — the PDHuman dataset, related preprocessing guidance, and ZOLLY as a baseline.
- SMPL & SMPL-X body models (MPI-IS) — foundational parametric human body models.
- Depth Anything V2 (metric) — its depth encoder's features are used in our DepthNet.
- MediaPipe & Sapiens Pose & RTMDet — human segmentation and keypoint models used in our preprocessing pipeline.
- AiOS & ControlNet — basis for our pelvis-depth-conditioned PoseNet.
- Datasets — we gratefully acknowledge the following datasets used for training, evaluation, and comparisons: HuMMan, Human3.6M, PDHuman, SPEC, BEDLAM (basis for our BEDLAM-CC renders).
 
If you use BLADE in your research, please also cite the original works for any datasets/models you use alongside our paper.
For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing.
- CUDA 11.8 recommended; a guide for CUDA 13 is also provided
- Python 3.9.19
- Conda
 
# Create environment
apt-get install git-lfs
git lfs install
git clone https://github.com/NVlabs/BLADE
conda create -n blade_env python=3.9.19
conda deactivate && conda activate blade_env
# pytorch stuff
pip install tqdm torch==2.0.1+cu118 torchvision==0.15.2+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
pip install fvcore iopath numpy==1.24.4
pip install wandb
pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py39_cu118_pyt201/download.html
pip install --no-warn-conflicts matplotlib==3.8.4 colorama requests huggingface-hub safetensors pillow six click openxlab
pip install --no-warn-conflicts chumpy scipy munkres tqdm cython fsspec yapf==0.40.1 packaging omegaconf ipdb ftfy regex
pip install --no-warn-conflicts json_tricks terminaltables modelindex prettytable albumentations
pip install --no-warn-conflicts smplx==0.1.28 debugpy numba yacs scikit-learn filterpy h5py trimesh scikit-image tensorboardx pyrender torchgeometry joblib boto3 easydict pycocotools colormap pytorch-transformers pickle5 plyfile timm pyglet future tensorboard cdflib ftfy einops
pip install --no-warn-conflicts numpy==1.23.1 mediapipe
# MMCV stuff
cd mmcv; MMCV_WITH_OPS=1 pip install --no-warn-conflicts -e . -v; pip install --no-warn-conflicts -e .; cd ..
cd sapiens/engine; pip install --no-warn-conflicts -e . -v
cd ../pretrain; pip install --no-warn-conflicts -e . -v
cd ../pose; pip install --no-warn-conflicts -e . -v
cd ../det; pip install --no-warn-conflicts -e . -v
cd ../seg; pip install --no-warn-conflicts -e . -v
cd ../..
pip install --no-warn-conflicts -e .
pip install --no-warn-conflicts ffmpeg astropy easydev pandas rtree vedo codecov flake8 interrogate isort pytest surrogate xdoctest setuptools loguru open3d omegaconf
cd aios_repo/models/aios/ops/; python setup.py build install; cd ../../../..
cd torch-trust-ncg; python setup.py install; cd ..
pip install --no-warn-conflicts numpy==1.23.1

Note: Ignore numpy version warnings; they don't affect functionality.
If you don't want to install another CUDA version, try changing cu118 in the above commands to your version. This sometimes works, depending on whether PyTorch and PyTorch3D provide matching wheels for your version.
The more reliable way is to follow this guide (which requires installing CUDA 11.8); see INSTALL_CUDA13.md for detailed non-CUDA-11.8 instructions.
Note that a system can have multiple CUDA versions installed at the same time.
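To check which CUDA build your PyTorch install was compiled against (and whether the GPU is visible), you can run:

```python
# Quick check of the CUDA toolkit PyTorch was built against vs. GPU availability.
import torch

print("torch version:", torch.__version__)      # expected 2.0.1+cu118 with the setup above
print("built with CUDA:", torch.version.cuda)   # expected 11.8
print("CUDA available:", torch.cuda.is_available())
```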
# Install HuggingFace CLI
pip install "huggingface_hub[cli]"
# Note: You might need to set up your HuggingFace token in order to download
# Download BLADE checkpoint
Pretrained weights are personally provided by Shengze Wang 
Please check https://mcmvmc.github.io/blade_weights.html 
# If you are on SLURM and limited by disk quota, redirect the download directories
export HF_HOME=</path/to/.cache/>; 
export TORCH_HOME=</path/to/.cache/>; 
export WANDB_DIR=</path/to/.wandb_cache/>; 
# Download supporting models
hf download depth-anything/Depth-Anything-V2-Metric-Hypersim-Large depth_anything_v2_metric_hypersim_vitl.pth --local-dir pretrained/model_init_weights
hf download ttxskk/AiOS aios_checkpoint.pth --local-dir pretrained/model_init_weights
hf download facebook/sapiens-pose-bbox-detector rtmdet_m_8xb32-100e_coco-obj365-person-235e8209.pth --local-dir pretrained/rtmpose
hf download facebook/sapiens-pose-1b sapiens_1b_goliath_best_goliath_AP_639.pth --local-dir pretrained/pose

Due to licensing restrictions, download these manually:
- SMPL: Download from the official site → place in body_models/smpl/
- SMPL-X: Download from the official site → place in body_models/smplx/
- Transfer data: Download from the SMPL-X repo → place in pretrained/transfer_data/
blade_repo/
├── body_models/
│   ├── smpl/          # SMPL models (manual download)
│   └── smplx/         # SMPL-X models (manual download)
├── pretrained/
│   ├── epoch_2.pth    # BLADE checkpoint
│   ├── model_init_weights/
│   │   ├── depth_anything_v2_metric_hypersim_vitl.pth
│   │   └── aios_checkpoint.pth
│   ├── rtmpose/
│   │   └── rtmdet_m_8xb32-100e_coco-obj365-person-235e8209.pth
│   ├── pose/
│   │   └── sapiens_1b_goliath_best_goliath_AP_639.pth
│   └── transfer_data/ # SMPL conversion data (manual download)
│       ├── smplh2smpl_def_transfer.pkl
│       ├── smplx2smplh_deftrafo_setup.pkl
│       ├── smpl2smplh_def_transfer.pkl
│       ├── smplh2smplx_deftrafo_setup.pkl
│       ├── smplx_mask_ids.npy
│       ├── smpl2smplx_deftrafo_setup.pkl
│       ├── smplx2smpl_deftrafo_setup.pkl
│       └── smplx_to_smpl.pkl
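As an optional sanity check, a small script like the following (paths taken from the layout above; adjust if you placed files elsewhere) can verify that the downloaded assets are in place:

```python
# Verify that the checkpoints and body models listed above exist (run from blade_repo/).
from pathlib import Path

required = [
    "pretrained/epoch_2.pth",
    "pretrained/model_init_weights/depth_anything_v2_metric_hypersim_vitl.pth",
    "pretrained/model_init_weights/aios_checkpoint.pth",
    "pretrained/rtmpose/rtmdet_m_8xb32-100e_coco-obj365-person-235e8209.pth",
    "pretrained/pose/sapiens_1b_goliath_best_goliath_AP_639.pth",
    "pretrained/transfer_data/smplx_to_smpl.pkl",
    "body_models/smpl",
    "body_models/smplx",
]

missing = [p for p in required if not Path(p).exists()]
print("All assets found." if not missing else f"Missing: {missing}")
```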
Although we can't share the photos used in our paper due to license limitations, we provide some CC0-licensed images for users to test quickly.
For in-the-wild demos, we enable pose & depth optimization by default (opt_pose=True and opt_tz=True in blade/configs/blade_inthewild.py).
If you are running numeric evaluation, please turn off pose and depth optimization, because the optimization can fail due to factors such as bad 2D keypoint detections or segmentations.
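For reference, the relevant flags look like this (an excerpt-style sketch; check blade/configs/blade_inthewild.py for how they are actually defined):

```python
# blade/configs/blade_inthewild.py (sketch of the relevant flags)
opt_pose = True   # refine pose using 2D keypoints -- keep True for in-the-wild demos
opt_tz   = True   # refine pelvis depth Tz with the camera solver
# For numeric evaluation, set both to False so failed keypoint detections or
# segmentations cannot corrupt the optimization.
```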
MINI_BATCHSIZE=5 python api/test_api.py ./demo_images/

Note: If you see issues related to EGL or XRender (often on monitor-less servers, e.g. SLURM nodes), do:
export PYGLET_HEADLESS=True
export PYOPENGL_PLATFORM=egl
export LD_LIBRARY_PATH="$CONDA_PREFIX/lib:${LD_LIBRARY_PATH}"
[ -f "$CONDA_PREFIX/lib/libEGL.so" ] || ln -s "$CONDA_PREFIX/lib/libEGL.so.1" "$CONDA_PREFIX/lib/libEGL.so"
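If you prefer setting these from inside a Python entry point, the first two variables can also be set before any rendering library is imported (LD_LIBRARY_PATH and the libEGL symlink still need to be handled in the shell):

```python
# Headless rendering setup; must run before pyrender/OpenGL are imported anywhere.
import os

os.environ["PYGLET_HEADLESS"] = "True"
os.environ["PYOPENGL_PLATFORM"] = "egl"
```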
Samples You Should See:
Results are written to results/test_demo by default. You should see the mesh overlaid on segmented images like the ones below; otherwise, the MediaPipe human segmentation failed and the camera solver will be less accurate.
Note: We find that human segmentation and keypoint detection can fail on heavily distorted images (you can verify this from the rendering results), leading to bad results. Robustness would improve with better models.
Demo Configuration Explained:
- API: <blade_repo>/api/BLADE_API.py
- Config file used: <blade_repo>/blade/configs/blade_inthewild.py
- Images with SMPL-X overlay saved to: temp_output_folder=<blade_repo>/results in the config file
- To enable a visualization window (matplotlib): set enable_vis_window=True in the config file
MINI_BATCHSIZE=5 python scripts/test.py ./blade/configs/blade_posenet.py --work-dir=./work_dirs/eval_test --out ./work_dirs/eval_test/test.out --data-name <spec_mtp_p3 or humman_p3 or pdhuman_p5> --checkpoint ./pretrained/epoch_2.pth

Note:
- Depth accuracy matters most for close-range cases (e.g. < 1 m), because perspective distortion grows inversely with depth; as depth increases, the image approaches orthographic projection and depth accuracy becomes much less important (see the sketch after these notes).
- Evaluation can be slow due to the SMPL-X -> SMPL conversion needed for evaluation on SMPL datasets. It is also much faster locally than on servers.
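To make the close-range sensitivity concrete, here is a small numeric sketch using the pinhole projection u = f·X/Z with illustrative values: the same 5 cm depth error moves a 2D projection far more at 0.6 m than at 3 m.

```python
# Pinhole projection u = f * X / Z: effect of a 5 cm depth error at close vs. far range.
f_px, x_m, dz_m = 1000.0, 0.3, 0.05   # focal length (px), lateral offset (m), depth error (m)

for z_m in (0.6, 3.0):                # close-range vs. far-range pelvis depth (m)
    shift = abs(f_px * x_m / z_m - f_px * x_m / (z_m + dz_m))
    print(f"Z = {z_m:.1f} m -> projection shift ≈ {shift:.1f} px")
# Z = 0.6 m -> ≈ 38.5 px; Z = 3.0 m -> ≈ 1.6 px
```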
 
- HuMMan, H36M, PDHuman
- BEDLAM-CC: custom rendering (see the Dataset Guide section)
 
Note: Due to BEDLAM's licensing restrictions, we cannot share the rendered images; please refer to our guide above to generate your own dataset.
Note: Due to license restrictions, we cannot directly provide the SMPL-X version of the labels for the three datasets PDHuman, HuMMan, and H36M. You can obtain the original labels from MMHuman3D. For h36m_mosh_train_transl.npz, please contact ZOLLY's authors.
# Create symlinks to datasets
cd <repo_root>
ln -s <path_to_dataset_root>/datasets mmhuman_data/datasets  # images
ln -s <path_to_dataset_root>/preprocessed_datasets mmhuman_data/preprocessed_datasets  # labels

blade/mmhuman_data/
├── datasets/                    # Raw dataset images
│   ├── humman/
│   │   ├── test_images/
│   │   └── train_images/
│   ├── h36m/
│   │   ├── S1/ S11/ S5/ S6/ S7/ S8/ S9/
│   ├── pdhuman/                 # download from ZOLLY: https://github.com/SMPLCap/Zolly
│   │   └── imgs/
│   ├── spec_mtp/
│   │   └── imgs/
│   └── bedlamcc/
│       └── png/
│           └── seq_000000/ ...
└── preprocessed_datasets/       # Processed labels
    ├── # Testing (original SMPL labels)
    ├── spec_mtp_p3.npz
    ├── pdhuman_test_p5.npz
    ├── humman_test_p3.npz
    ├── # Training (converted to SMPL-X)
    ├── pdhuman_train_smplx.npz
    ├── humman_train_smplx.npz
    ├── h36m_mosh_train_transl_smplx.npz
    ├── bedlamcc.npz
    └── # Original SMPL labels for conversion
        ├── pdhuman_train.npz
        ├── humman_train.npz
        └── h36m_mosh_train_transl.npz
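To confirm a label file is readable after symlinking, you can list its keys (the file name below is one of the test splits from the layout above; available keys differ per dataset):

```python
# List the fields stored in a preprocessed label file.
import numpy as np

labels = np.load("mmhuman_data/preprocessed_datasets/humman_test_p3.npz", allow_pickle=True)
print(labels.files)
```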
Stage 1: DepthNet
MINI_BATCHSIZE=48 python scripts/train.py ./blade/configs/blade_depthnet.py --launcher none --work-dir=./work_dirs/train_depth

Expected Duration: Best performance is often achieved within 4 epochs.
Note: 8 GPUs with a batch size of 16 works.
Stage 2: PoseNet
Update depthnet_ckpt_path in blade_posenet.py to point to the Stage 1 checkpoint.
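For example, the edit is a single line in the config (the checkpoint file name below is illustrative; use whichever Stage 1 epoch performed best):

```python
# blade/configs/blade_posenet.py (sketch of the line to update)
depthnet_ckpt_path = './work_dirs/train_depth/epoch_4.pth'  # Stage 1 DepthNet checkpoint
```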
MINI_BATCHSIZE=7 python scripts/train.py ./blade/configs/blade_posenet.py --launcher none --work-dir=./work_dirs/train_posenet

Expected Duration: Best performance often comes from one of the first 2 epochs.
Note: We only tested PoseNet training in multi-node settings described below.
For Slurm-like clusters, see MULTI_NODE_TRAINING.md for detailed instructions.
Note: ↓ means lower is better; ↑ means higher is better.
Numbers can be slightly different each time due to randomness in camera optimization.
| Method | Tz (m) ↓ | 1/Tz ↓ | Txy (m) ↓ | f (%) ↓ | PA-MPJPE ↓ | MPJPE ↓ | PVE ↓ | mIoU ↑ | P-mIoU ↑ | 
|---|---|---|---|---|---|---|---|---|---|
| TokenHMR | 0.909 | 0.436 | 0.095 | 112.1 | 64.2 | 107.1 | 124.3 | 49.8 | 19.0 | 
| AiOS | 1.035 | 0.464 | 0.121 | 112.1 | 62.8 | 101.6 | 110.9 | 48.7 | 11.3 | 
| BLADE (in paper) | 0.127 | 0.112 | 0.044 | 15.9 | 56.7 | 94.1 | 99.6 | 69.9 | 41.5 | 
| BLADE (re-trained) | 0.135 | 0.122 | 0.054 | 18.0 | 56.8 | 93.9 | 102.0 | 67.9 | 39.4 | 
| Method | Tz (m) ↓ | 1/Tz ↓ | Txy (m) ↓ | f (%) ↓ | PA-MPJPE ↓ | MPJPE ↓ | PVE ↓ | mIoU ↑ | P-mIoU ↑ | 
|---|---|---|---|---|---|---|---|---|---|
| TokenHMR | 2.599 | 0.307 | 0.044 | 41.6 | 46.4 | 72.2 | 82.0 | 60.9 | 31.1 | 
| AiOS | 2.311 | 0.292 | 0.033 | 41.6 | 66.1 | 91.8 | 99.4 | 72.0 | 44.3 | 
| BLADE (in paper) | 0.187 | 0.058 | 0.056 | 8.3 | 23.8 | 41.1 | 52.3 | 70.6 | 38.2 | 
| BLADE (re-trained) | 0.149 | 0.045 | 0.059 | 5.6 | 27.8 | 44.4 | 52.1 | 71.0 | 41.9 | 
| Method | Tz (m) ↓ | 1/Tz ↓ | Txy (m) ↓ | f (%) ↓ | PA-MPJPE ↓ | MPJPE ↓ | PVE ↓ | mIoU ↑ | P-mIoU ↑ | 
|---|---|---|---|---|---|---|---|---|---|
| TokenHMR | 2.280 | 1.034 | 0.068 | 55.0 | 92.1 | 141.5 | 156.7 | 53.0 | 27.8 | 
| AiOS | 2.312 | 1.024 | 0.149 | 55.0 | 106.6 | 170.6 | 183.4 | 49.5 | 16.0 | 
| BLADE (in paper) | 0.107 | 0.178 | 0.049 | 22.3 | 61.4 | 90.1 | 102.6 | 65.2 | 41.4 | 
| BLADE (re-trained) | 0.110 | 0.185 | 0.109 | 21.5 | 67.0 | 101.9 | 116.2 | 62.3 | 38.6 | 
Note: As mentioned in our paper, we found that PDHuman's ground-truth labels do not align well with the images.
- Installation Guide for CUDA 13 - Detailed CUDA 13+ setup and troubleshooting
- Multi-Node Training Guide - Multi-node training and configuration
- Dataset Guide - BEDLAM-CC dataset generation and preprocessing, and converting SMPL datasets to SMPL-X
 
- Configuration files: blade/configs/blade_posenet.py, blade/configs/blade_depthnet.py
- API: api/BLADE_API.py
- Dataset conversion: python blade/datasets/combine_preprocessed_smpldata.py <dataset_name>
- BEDLAM-CC preprocessing: python blade/datasets/combine_preprocessed_ourdata.py


