This work introduces IDOL, a feed-forward, single-image human reconstruction framework that is fast, high-fidelity, and generalizable. Leveraging a large-scale dataset of 100K multi-view subjects, our method demonstrates exceptional generalizability and robustness in handling diverse human shapes, cross-domain data, extreme viewpoints, and occlusions. With a uniform structured representation, the reconstructed avatars are directly animatable and easily editable, providing a significant step forward for various applications in graphics, vision, and beyond.
In summary, this project introduces:
- IDOL: A scalable pipeline for instant photorealistic 3D human reconstruction using a simple yet efficient feed-forward model.
- HuGe100K Dataset: We develop a data generation pipeline and present HuGe100K, a large-scale multi-view human dataset featuring diverse attributes, high-fidelity, high-resolution appearances, and well-aligned SMPL-X models.
- Application Support: Enabling 3D human reconstruction and downstream tasks such as editing and animation.
- 2024-12-18: Paper is now available on arXiv.
- 2025-01-02: The demo dataset containing 100 samples is now available for access. The remaining dataset is currently undergoing further cleaning and review.
- 2025-03-01: Paper accepted by CVPR 2025.
- 2025-03-01: We have released the inference code! Check out the Code Release section for details.
- 2025-04-01: Full HuGe100K dataset is now available! See the Dataset Access section.
- 2025-04-05: Training code is now available! Check out the Training Code section for details.
We are actively working on releasing the following resources:
| Resource | Status | Release Date |
|---|---|---|
| Dataset Demo | Available | 2025-01-02 |
| Inference Code | Available | 2025-03-01 |
| Full Dataset Access | Available | 2025-04-01 |
| Online Demo | In Progress | Expected before April 2025 |
| Training Code | Available | 2025-04-05 |
Stay tuned as we update this section with new releases!
Please refer to env/README.md for detailed environment setup instructions.
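Once the environment is set up, a quick sanity check can confirm that PyTorch sees your GPUs before running the demo or training. This is a minimal sketch under the assumption that the environment is PyTorch-based with CUDA GPUs (see env/README.md for the authoritative requirements); it is not a script shipped with this repository.

```python
# Minimal environment sanity check (assumes a PyTorch + CUDA setup; not part of the repo).
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    # Report each visible GPU and its memory, since the training notes below assume multi-GPU runs.
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB")
```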
Run the demo in different modes:

```bash
# Reconstruct the input image
python run_demo.py --render_mode reconstruct

# Generate novel poses (animation)
python run_demo.py --render_mode novel_pose

# Generate a 360-degree view
python run_demo.py --render_mode novel_pose_A
```
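If you want to render all three modes for the same input in one go, a small wrapper might look like the sketch below. It only uses the `--render_mode` values documented above and assumes `run_demo.py` relies on its default input and output settings, since no other flags are documented here.

```python
# Hypothetical convenience wrapper (not part of the repo): run all documented render modes in sequence.
import subprocess

for mode in ["reconstruct", "novel_pose", "novel_pose_A"]:
    print(f"Running demo with --render_mode {mode}")
    subprocess.run(["python", "run_demo.py", "--render_mode", mode], check=True)
```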
- **Dataset Structure**: First, prepare your dataset with the following structure:

  ```
  dataset_root/
  ├── deepfashion/
  │   ├── image1/
  │   │   ├── videos/
  │   │   │   ├── xxx.mp4
  │   │   │   └── xxx.jpg
  │   │   └── param/
  │   │       └── xxx.npy
  │   └── image2/
  │       ├── videos/
  │       └── param/
  └── flux_batch1_5000/
      ├── image1/
      │   ├── videos/
      │   └── param/
      └── image2/
          ├── videos/
          └── param/
  ```
- **Process Dataset**: Run the data processing script to generate cache files:

  ```bash
  # Process the dataset and generate cache files
  # Please modify the dataset path and the sample number in the script
  bash data_processing/process_datasets.sh
  ```

  This will generate cache files in the `processed_data` directory:

  - `deepfashion_train_140.npy`
  - `deepfashion_val_10.npy`
  - `deepfashion_test_50.npy`
  - `flux_batch1_5000_train_140.npy`
  - `flux_batch1_5000_val_10.npy`
  - `flux_batch1_5000_test_50.npy`
- **Configure Cache Path**: Update the cache path in your config file (e.g., `configs/idol_v0.yaml`); a pre-flight check that verifies these paths is sketched after this list:

  ```yaml
  params:
    cache_path: [
      ./processed_data/deepfashion_train_140.npy,
      ./processed_data/flux_batch1_5000_train_140.npy
    ]
  ```
- **Single-Node Training**: For single-node, multi-GPU training:

  ```bash
  python train.py \
      --base configs/idol_v0.yaml \
      --num_nodes 1 \
      --gpus 0,1,2,3,4,5,6,7
  ```
- **Multi-Node Training**: For multi-node training, specify additional parameters:

  ```bash
  python train.py \
      --base configs/idol_v0.yaml \
      --num_nodes <total_nodes> \
      --node_rank <current_node_rank> \
      --master_addr <master_node_ip> \
      --master_port <port_number> \
      --gpus 0,1,2,3,4,5,6,7
  ```

  Example for a 2-node setup:

  ```bash
  # On the master node (node 0):
  python train.py --base configs/idol_v0.yaml --num_nodes 2 --node_rank 0 --master_addr 192.168.1.100 --master_port 29500 --gpus 0,1,2,3,4,5,6,7

  # On the worker node (node 1):
  python train.py --base configs/idol_v0.yaml --num_nodes 2 --node_rank 1 --master_addr 192.168.1.100 --master_port 29500 --gpus 0,1,2,3,4,5,6,7
  ```
- **Resume Training**: To resume training from a checkpoint:

  ```bash
  python train.py \
      --base configs/idol_v0.yaml \
      --resume PATH/TO/MODEL.ckpt \
      --num_nodes 1 \
      --gpus 0,1,2,3,4,5,6,7
  ```
- **Test and Evaluate Metrics**:

  ```bash
  # --base:         main (model) config file
  # --test_sd:      path to the .ckpt model you want to test
  # --test_dataset: (optional) dataset config used specifically for testing
  python train.py \
      --base configs/idol_v0.yaml \
      --num_nodes 1 \
      --gpus 0,1,2,3,4,5,6,7 \
      --test_sd /path/to/model_checkpoint.ckpt \
      --test_dataset ./configs/test_dataset.yaml
  ```
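Before launching training, it can save a failed run to confirm that the cache files referenced in the config actually exist. The sketch below is illustrative rather than part of the repository: it assumes the `params.cache_path` layout shown in the Configure Cache Path step and makes no assumptions about the internal format of the `.npy` caches.

```python
# Illustrative pre-flight check (not part of the repo): verify the cache paths listed in the config.
import os
import numpy as np
import yaml  # pip install pyyaml

CONFIG_PATH = "configs/idol_v0.yaml"

with open(CONFIG_PATH) as f:
    cfg = yaml.safe_load(f)

# Assumes the params.cache_path layout shown above; adjust if your config nests it differently.
cache_paths = cfg["params"]["cache_path"]

for path in cache_paths:
    if not os.path.isfile(path):
        print(f"[missing] {path}")
        continue
    size_mb = os.path.getsize(path) / 1024**2
    arr = np.load(path, allow_pickle=True)  # the stored structure is repo-specific
    print(f"[ok] {path} ({size_mb:.1f} MB, loaded as {type(arr).__name__})")
```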
- Make sure all GPUs have enough memory for the selected batch size
- For multi-node training, ensure network connectivity between nodes
- Monitor training progress using the logging system
- Adjust learning rate and other hyperparameters in the config file as needed
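On adjusting the learning rate: one common heuristic (not something prescribed by this repository) is the linear scaling rule, where the learning rate grows in proportion to the global batch size. A small sketch follows; every number and config key involved is purely illustrative.

```python
# Linear learning-rate scaling heuristic (illustrative only; all values are placeholders,
# and the actual hyperparameter names in idol_v0.yaml may differ).
def scaled_lr(base_lr: float, base_batch: int, per_gpu_batch: int, gpus_per_node: int, num_nodes: int) -> float:
    """Scale the learning rate proportionally to the global batch size."""
    global_batch = per_gpu_batch * gpus_per_node * num_nodes
    return base_lr * global_batch / base_batch

# Example: a base LR tuned for batch size 32, run on 2 nodes x 8 GPUs with 4 samples per GPU.
print(scaled_lr(base_lr=1e-4, base_batch=32, per_gpu_batch=4, gpus_per_node=8, num_nodes=2))  # 2e-4
```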
- Paper on arXiv
- Project Website
- Live Demo (Coming Soon!)
We introduce HuGe100K, a large-scale multi-view human dataset, supporting 3D human reconstruction and animation research.
For detailed information about the dataset format, structure, and usage guidelines, please refer to our Dataset Documentation.
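For readers who want a feel for how the released data might be consumed, here is a minimal iteration sketch. It is not an official loader: it assumes the per-subject `videos/` and `param/` layout shown in the training section above, and since the exact contents of the SMPL-X parameter `.npy` files are repo-specific, it only reports what it finds rather than assuming key names.

```python
# Illustrative dataset walk (not an official loader): iterate subjects and peek at SMPL-X params.
import glob
import os
import numpy as np

DATASET_ROOT = "dataset_root/deepfashion"  # placeholder path; adjust to where you extracted the data

for subject_dir in sorted(glob.glob(os.path.join(DATASET_ROOT, "*"))):
    videos = glob.glob(os.path.join(subject_dir, "videos", "*"))
    params = glob.glob(os.path.join(subject_dir, "param", "*.npy"))
    print(f"{os.path.basename(subject_dir)}: {len(videos)} video file(s), {len(params)} param file(s)")
    if params:
        data = np.load(params[0], allow_pickle=True)
        # A 0-d object array typically wraps a dict; otherwise report the raw array shape.
        if isinstance(data, np.ndarray) and data.dtype == object and data.ndim == 0:
            data = data.item()
        if isinstance(data, dict):
            print("  SMPL-X param keys:", sorted(data.keys()))
        else:
            print("  param array shape:", getattr(data, "shape", None))
    break  # remove this to walk the entire dataset
```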
HuGe100K: the largest multi-view human dataset, with 100,000+ subjects!
High-resolution • Multi-view • Diverse poses • SMPL-X aligned
Complete the form to get access credentials and download links!
This dataset includes images derived from the DeepFashion dataset, originally provided by MMLAB at The Chinese University of Hong Kong. The use of DeepFashion images in this dataset has been explicitly authorized by the original authors solely for the purpose of creating and distributing this dataset. Users must not further reproduce, distribute, sell, or commercially exploit any images or derived data originating from DeepFashion. For any subsequent or separate use of the DeepFashion data, users must directly obtain authorization from MMLAB and comply with the original DeepFashion License.
If you find our work helpful, please cite us using the following BibTeX:
```bibtex
@article{zhuang2024idolinstant,
  title={IDOL: Instant Photorealistic 3D Human Creation from a Single Image},
  author={Yiyu Zhuang and Jiaxi Lv and Hao Wen and Qing Shuai and Ailing Zeng and Hao Zhu and Shifeng Chen and Yujiu Yang and Xun Cao and Wei Liu},
  journal={arXiv preprint arXiv:2412.14963},
  year={2024},
  url={https://arxiv.org/abs/2412.14963},
}
```

This project is licensed under the MIT License.
- Permissions: This license grants permission to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the software.
- Condition: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
- Disclaimer: The software is provided "as is", without warranty of any kind.
For more information, see the full license here.
If you find our work useful for your research or applications:
- Please star our repository to help us reach more people
- Consider citing our paper in your publications (see Citation section)
- Share our project with others who might benefit from it
Your support helps us continue developing open-source research projects like this one!
This project is built primarily upon several excellent open-source projects:
- E3Gen: Efficient, Expressive and Editable Avatars Generation
- SAPIENS: High-resolution visual models for human-centric tasks
- GeoLRM: Large Reconstruction Model for High-Quality 3D Generation
- 3D Gaussian Splatting: Real-Time 3DGS Rendering
We thank all the authors for their contributions to the open-source community.

