IDOL: Instant Photorealistic 3D Human Creation from a Single Image. An open-source project for fast, high-fidelity, and generalizable 3D human reconstruction from a single image.

yiyuzhuang/IDOL

IDOL: Instant Photorealistic 3D Human Creation from a Single Image


Abstract

This work introduces IDOL, a feed-forward, single-image human reconstruction framework that is fast, high-fidelity, and generalizable. Leveraging a large-scale dataset of 100K multi-view subjects, our method demonstrates exceptional generalizability and robustness in handling diverse human shapes, cross-domain data, severe viewpoints, and occlusions. With a uniform structured representation, the reconstructed avatars are directly animatable and easily editable, providing a significant step forward for various applications in graphics, vision, and beyond.

In summary, this project introduces:

  • IDOL: A scalable pipeline for instant photorealistic 3D human reconstruction using a simple yet efficient feed-forward model.
  • HuGe100K Dataset: We develop a data generation pipeline and present HuGe100K, a large-scale multi-view human dataset featuring diverse attributes, high-fidelity and high-resolution appearances, and a well-aligned SMPL-X model.
  • Application Support: Enabling 3D human reconstruction and downstream tasks such as editing and animation.

📰 News

  • 2024-12-18: Paper is now available on arXiv.
  • 2025-01-02: The demo dataset containing 100 samples is now available for access. The remaining dataset is currently undergoing further cleaning and review.
  • 2025-03-01: 🎉 Paper accepted by CVPR 2025.
  • 2025-03-01: 🎉 We have released the inference code! Check out the Code Release section for details.
  • 2025-04-01: 🔥 Full HuGe100K dataset is now available! See the Dataset Access section.
  • 2025-04-05: 🔥 Training code is now available! Check out the Training Code section for details.

🚧 Project Status

We are actively working on releasing the following resources:

| Resource | Status | Expected Release Date |
|---|---|---|
| Dataset Demo | ✅ Available | Now Live! (2025.01.02) |
| Inference Code | ✅ Available | Now Live! (2025.03.01) |
| Full Dataset Access | ✅ Available | Now Live! (2025.04.01) |
| Online Demo | 🚧 In Progress | Before April 2025 |
| Training Code | ✅ Available | Now Live! (2025.04.05) |

Stay tuned as we update this section with new releases! 🚀

💻 Code Release

Installation & Environment Setup

Please refer to env/README.md for detailed environment setup instructions.

Quick Start

Run the demo in one of the following modes:

# Reconstruct the input image
python run_demo.py --render_mode reconstruct

# Generate novel poses (animation)
python run_demo.py --render_mode novel_pose

# Generate 360-degree view
python run_demo.py --render_mode novel_pose_A
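
To run all three modes in one go, a small wrapper can loop over them. The following is a minimal sketch, assuming `run_demo.py` sits in the current directory and needs no arguments beyond the `--render_mode` flag shown above. The `build_demo_commands` and `run_all` helpers and the `dry_run` parameter are hypothetical names introduced for illustration; they are not part of the repository.

```python
import subprocess
import sys

# Render modes taken from the Quick Start commands above.
RENDER_MODES = ["reconstruct", "novel_pose", "novel_pose_A"]


def build_demo_commands(modes=RENDER_MODES, script="run_demo.py"):
    """Return one argument list per render mode (hypothetical helper)."""
    return [[sys.executable, script, "--render_mode", mode] for mode in modes]


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_demo_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop at the first failing mode.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

With `dry_run=True` the script only prints the commands, which is a cheap way to confirm the mode names before launching the actual rendering.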

Training

Data Preparation

  1. Dataset Structure: First, prepare your dataset with the following structure:

    dataset_root/
    ├── deepfashion/
    │   ├── image1/
    │   │   ├── videos/
    │   │   │   ├── xxx.mp4
    │   │   │   └── xxx.jpg
    │   │   └── param/
    │   │       └── xxx.npy
    │   └── image2/
    │       ├── videos/
    │       └── param/
    └── flux_batch1_5000/
        ├── image1/
        │   ├── videos/
        │   └── param/
        └── image2/
            ├── videos/
            └── param/
    
  2. Process Dataset: Run the data processing script to generate cache files:

    # Process the dataset and generate cache files
    # Please modify the dataset path and the sample number in the script
    bash data_processing/process_datasets.sh

    This will generate cache files in the processed_data directory:

    • deepfashion_train_140.npy
    • deepfashion_val_10.npy
    • deepfashion_test_50.npy
    • flux_batch1_5000_train_140.npy
    • flux_batch1_5000_val_10.npy
    • flux_batch1_5000_test_50.npy
  3. Configure Cache Path: Update the cache path in your config file (e.g., configs/idol_v0.yaml):

      params:
        cache_path: [
          ./processed_data/deepfashion_train_140.npy,
          ./processed_data/flux_batch1_5000_train_140.npy
        ]
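
Before processing or training, it can help to confirm that the dataset layout and the configured cache files are in place. The sketch below is one way to do that, assuming the layout from step 1 (each subject folder contains `videos/` and `param/`) and a list of cache paths like the one in step 3. The helper names `find_incomplete_subjects` and `missing_cache_files` are hypothetical and not part of the repository.

```python
from pathlib import Path

# Sub-folders every subject directory is expected to contain,
# per the layout shown in step 1.
REQUIRED_SUBDIRS = ("videos", "param")


def find_incomplete_subjects(dataset_root):
    """Return (subject, missing_subdirs) pairs for subjects lacking a required folder."""
    root = Path(dataset_root)
    incomplete = []
    for subset in sorted(root.iterdir()):          # e.g. deepfashion/, flux_batch1_5000/
        if not subset.is_dir():
            continue
        for subject in sorted(subset.iterdir()):   # e.g. image1/, image2/
            if not subject.is_dir():
                continue
            missing = [d for d in REQUIRED_SUBDIRS if not (subject / d).is_dir()]
            if missing:
                incomplete.append((subject.relative_to(root).as_posix(), missing))
    return incomplete


def missing_cache_files(cache_paths):
    """Return the entries of cache_path that do not exist on disk."""
    return [p for p in cache_paths if not Path(p).is_file()]
```

For example, `missing_cache_files(["./processed_data/deepfashion_train_140.npy"])` returns an empty list once the processing script has produced that file.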

Training

  1. Single-Node Training: For single-node multi-GPU training:

    python train.py \
      --base configs/idol_v0.yaml \
      --num_nodes 1 \
      --gpus 0,1,2,3,4,5,6,7
  2. Multi-Node Training: For multi-node training, specify additional parameters:

    python train.py \
      --base configs/idol_v0.yaml \
      --num_nodes <total_nodes> \
      --node_rank <current_node_rank> \
      --master_addr <master_node_ip> \
      --master_port <port_number> \
      --gpus 0,1,2,3,4,5,6,7

    Example for a 2-node setup:

    # On master node (node 0):   
    python train.py --base configs/idol_v0.yaml --num_nodes 2 --node_rank 0 --master_addr 192.168.1.100 --master_port 29500 --gpus 0,1,2,3,4,5,6,7
    
    # On worker node (node 1):
    python train.py --base configs/idol_v0.yaml --num_nodes 2 --node_rank 1 --master_addr 192.168.1.100 --master_port 29500 --gpus 0,1,2,3,4,5,6,7
  3. Resume Training: To resume training from a checkpoint:

    python train.py \
      --base configs/idol_v0.yaml \
      --resume PATH/TO/MODEL.ckpt \
      --num_nodes 1 \
      --gpus 0,1,2,3,4,5,6,7
  4. Test and Evaluate Metrics:

    # --base:         main config file (model)
    # --test_sd:      path to the .ckpt model you want to test
    # --test_dataset: (optional) dataset config used specifically for testing
    python train.py \
      --base configs/idol_v0.yaml \
      --num_nodes 1 \
      --gpus 0,1,2,3,4,5,6,7 \
      --test_sd /path/to/model_checkpoint.ckpt \
      --test_dataset ./configs/test_dataset.yaml
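
In a multi-node launch, the commands differ only in `--node_rank`, so they can be generated rather than typed by hand. The sketch below builds one command string per node using only the flags shown above. The `build_train_commands` and `total_gpus` helpers are hypothetical, and the GPU count assumes every node uses the same `--gpus` list.

```python
import shlex


def build_train_commands(config, num_nodes, master_addr, master_port, gpus):
    """Return one train.py command string per node rank (hypothetical helper)."""
    gpu_arg = ",".join(str(g) for g in gpus)
    commands = []
    for node_rank in range(num_nodes):
        args = [
            "python", "train.py",
            "--base", config,
            "--num_nodes", str(num_nodes),
            "--node_rank", str(node_rank),
            "--master_addr", master_addr,
            "--master_port", str(master_port),
            "--gpus", gpu_arg,
        ]
        # shlex.join quotes arguments where needed so the string is shell-safe.
        commands.append(shlex.join(args))
    return commands


def total_gpus(num_nodes, gpus):
    """Total GPU count across nodes, assuming an identical GPU list on each node."""
    return num_nodes * len(gpus)


if __name__ == "__main__":
    for cmd in build_train_commands(
        "configs/idol_v0.yaml", 2, "192.168.1.100", 29500, range(8)
    ):
        print(cmd)
```

Run as a script, this prints the two commands from the 2-node example, one for each node rank.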

Notes

  • Make sure all GPUs have enough memory for the selected batch size
  • For multi-node training, ensure network connectivity between nodes
  • Monitor training progress using the logging system
  • Adjust learning rate and other hyperparameters in the config file as needed


📊 Dataset Demo Access

We introduce HuGe100K, a large-scale multi-view human dataset, supporting 3D human reconstruction and animation research.

▶ Watch the Demo Video

📋 Dataset Documentation

For detailed information about the dataset format, structure, and usage guidelines, please refer to our Dataset Documentation.

🚀 Access the Dataset

🔥 HuGe100K - The largest multi-view human dataset with 100,000+ subjects! 🔥

High-resolution • Multi-view • Diverse poses • SMPL-X aligned

Apply for Access

Complete the form to get access credentials and download links!

βš–οΈ License and Attribution

This dataset includes images derived from the DeepFashion dataset, originally provided by MMLAB at The Chinese University of Hong Kong. The use of DeepFashion images in this dataset has been explicitly authorized by the original authors solely for the purpose of creating and distributing this dataset. Users must not further reproduce, distribute, sell, or commercially exploit any images or derived data originating from DeepFashion. For any subsequent or separate use of the DeepFashion data, users must directly obtain authorization from MMLAB and comply with the original DeepFashion License.


πŸ“ Citation

If you find our work helpful, please cite us using the following BibTeX:

@article{zhuang2024idolinstant,                
  title={IDOL: Instant Photorealistic 3D Human Creation from a Single Image}, 
  author={Yiyu Zhuang and Jiaxi Lv and Hao Wen and Qing Shuai and Ailing Zeng and Hao Zhu and Shifeng Chen and Yujiu Yang and Xun Cao and Wei Liu},
  journal={arXiv preprint arXiv:2412.14963},
  year={2024},
  url={https://arxiv.org/abs/2412.14963}, 
}

License

This project is licensed under the MIT License.

  • Permissions: This license grants permission to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the software.
  • Condition: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
  • Disclaimer: The software is provided "as is", without warranty of any kind.

For more information, see the full license here.

Support Our Work ⭐

If you find our work useful for your research or applications:

  • Please ⭐ star our repository to help us reach more people
  • Consider citing our paper in your publications (see Citation section)
  • Share our project with others who might benefit from it

Your support helps us continue developing open-source research projects like this one!

📚 Acknowledgments

This project builds largely on several excellent open-source projects:

  • E3Gen: Efficient, Expressive and Editable Avatars Generation
  • SAPIENS: High-resolution visual models for human-centric tasks
  • GeoLRM: Large Reconstruction Model for High-Quality 3D Generation
  • 3D Gaussian Splatting: Real-Time 3DGS Rendering

We thank all the authors for their contributions to the open-source community.
