
Lucaria-Academy/CoDance


CoDance

Official implementation of CoDance: An Unbind-Rebind Paradigm for Robust Multi-Subject Animation.

CoDance animates one or more subjects in a reference image using a driving pose sequence. It is designed for mismatched-pose settings, where the pose layout is not rigidly aligned with the reference image.

Links: Project Page · Checkpoint · Model Card

News

  • Code release scaffold is available.
  • CoDance checkpoint is hosted on Hugging Face.

Installation

conda create -n codance python=3.10 -y
conda activate codance

# Pick the wheel index that matches your CUDA version (cu121 is shown here).
pip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https://download.pytorch.org/whl/cu121

pip install -r requirements.txt
pip install -e .

FlashAttention and SageAttention are supported as optional attention backends when installed. PyTorch's built-in SDPA (scaled dot-product attention) ships with torch and is used by default.
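To see which of these backends are importable in your environment, a quick check like the one below can help. It is a hypothetical helper, not part of the repository. It assumes the usual module names flash_attn and sageattention, and it only reports importability; it does not tell you which backend CoDance actually selects at runtime.

```python
import importlib.util


def available_attention_backends() -> list[str]:
    """Report which attention backends can be imported in this environment.

    Assumption: FlashAttention and SageAttention install the top-level modules
    "flash_attn" and "sageattention". Importability only; runtime selection is
    decided by CoDance itself.
    """
    backends = []
    for label, module in (("FlashAttention", "flash_attn"),
                          ("SageAttention", "sageattention")):
        if importlib.util.find_spec(module) is not None:
            backends.append(label)
    # SDPA (torch.nn.functional.scaled_dot_product_attention) ships with torch >= 2.0.
    if importlib.util.find_spec("torch") is not None:
        backends.append("torch SDPA")
    return backends


if __name__ == "__main__":
    print(available_attention_backends())
```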

Checkpoints

Create the checkpoint directory:

mkdir -p checkpoints

Download the CoDance checkpoint:

python scripts/download_weights.py

Or download it manually from Hugging Face and place it at:

checkpoints/codance.ckpt

Download Wan2.1-I2V-14B-720P:

huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P \
  --local-dir Wan2.1-I2V-14B-720P

Download the DWPose ONNX models and place them at:

checkpoints/yolox_l.onnx
checkpoints/dw-ll_ucoco_384.onnx

Expected layout:

CoDance/
├── checkpoints/
│   ├── codance.ckpt
│   ├── yolox_l.onnx
│   └── dw-ll_ucoco_384.onnx
└── Wan2.1-I2V-14B-720P/
    ├── diffusion_pytorch_model-00001-of-00007.safetensors
    ├── ...
    ├── models_t5_umt5-xxl-enc-bf16.pth
    ├── models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth
    └── Wan2.1_VAE.pth
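Before running inference, you may want to confirm that every file in this layout is in place. The sketch below is a hypothetical helper, not shipped with CoDance. It assumes the exact file names in the tree, including seven diffusion shards named diffusion_pytorch_model-0000K-of-00007.safetensors (the same names passed explicitly in the inference section).

```python
from pathlib import Path

# Paths from the expected layout, relative to the CoDance repository root.
REQUIRED = [
    "checkpoints/codance.ckpt",
    "checkpoints/yolox_l.onnx",
    "checkpoints/dw-ll_ucoco_384.onnx",
    "Wan2.1-I2V-14B-720P/models_t5_umt5-xxl-enc-bf16.pth",
    "Wan2.1-I2V-14B-720P/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth",
    "Wan2.1-I2V-14B-720P/Wan2.1_VAE.pth",
] + [
    # Seven zero-padded diffusion shards: 00001-of-00007 ... 00007-of-00007.
    f"Wan2.1-I2V-14B-720P/diffusion_pytorch_model-{i:05d}-of-00007.safetensors"
    for i in range(1, 8)
]


def missing_files(root: str = ".") -> list[str]:
    """Return the required relative paths that are not regular files under root."""
    base = Path(root)
    return [rel for rel in REQUIRED if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing:\n  " + "\n  ".join(missing))
    else:
        print("All expected files are present.")
```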

Inference

CoDance uses three inputs:

  • A reference image.
  • A DWPose frame directory extracted from a driving video.
  • A reference subject mask generated by SAM-2 or provided by the user.

1. Extract driving poses

python process_data.py \
  --source_video_paths data/videos/driving.mp4 \
  --saved_pose_dir data/saved_pkl \
  --saved_pose data/saved_pose \
  --det_model checkpoints/yolox_l.onnx \
  --pose_model checkpoints/dw-ll_ucoco_384.onnx

This writes pose images to:

data/saved_pose/driving/
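To confirm that extraction produced frames, and how many, a short listing is enough. This is a hypothetical sketch: the README does not state the frame naming scheme, so it assumes PNG or JPEG files and sorts purely numeric file names as integers. Comparing the count against the default clip length of 81 frames is only a rough guide, because this README does not document how the inference script handles shorter or longer pose sequences.

```python
from pathlib import Path

# Assumption: pose frames are saved as PNG or JPEG images.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def _frame_key(p: Path):
    # Numeric stems ("2", "10") sort as integers; other names sort alphabetically after them.
    return (0, int(p.stem), "") if p.stem.isdigit() else (1, 0, p.name)


def pose_frames(pose_dir: str) -> list[Path]:
    """Return the pose image paths in pose_dir in frame order."""
    return sorted(
        (p for p in Path(pose_dir).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES),
        key=_frame_key,
    )


if __name__ == "__main__":
    pose_dir = "data/saved_pose/driving"
    if Path(pose_dir).is_dir():
        frames = pose_frames(pose_dir)
        print(f"{len(frames)} pose frames in {pose_dir}")
        if len(frames) < 81:
            print("Note: fewer frames than the default 81-frame clip.")
```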

2. Generate a reference mask

Provide one or more positive points on the target subject, written as comma-separated coordinate pairs joined by semicolons:

python get_mask.py \
  --image data/images/reference.png \
  --points "480,1470;1168,1424" \
  --output data/masks/reference_mask.png
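The --points value packs several coordinate pairs into one string. A parser for that format might look like the sketch below. It is hypothetical: it assumes each pair is x,y in pixels (the order SAM-2 uses for point prompts) and may not match how get_mask.py actually parses the flag.

```python
def parse_points(spec: str) -> list[tuple[int, int]]:
    """Parse a string such as "480,1470;1168,1424" into (x, y) pixel pairs.

    Assumption: pairs are "x,y" and separated by semicolons, as in the example.
    """
    points = []
    for chunk in spec.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate a trailing semicolon
        x_str, y_str = chunk.split(",")  # raises ValueError if a pair is malformed
        points.append((int(x_str), int(y_str)))
    return points


if __name__ == "__main__":
    print(parse_points("480,1470;1168,1424"))
```

Each parsed point would be passed to SAM-2 as a positive prompt (label 1), so adding points on different parts of the same subject usually gives a more complete mask.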

You can also use any external segmentation tool, as long as the mask is saved as an RGB image.

3. Run CoDance

python examples/inference_480p_single.py \
  --ref_img data/images/reference.png \
  --mask data/masks/reference_mask.png \
  --pose_dir data/saved_pose/driving \
  --save_path outputs/codance.mp4 \
  --prompt "A character is dancing." \
  --wan_dir Wan2.1-I2V-14B-720P \
  --codance_ckpt checkpoints/codance.ckpt

The default inference settings are 81 frames at 832x480 resolution, a CFG scale of 5, 50 denoising steps, and a sigma shift of 5.
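The 81-frame default follows the 4k + 1 pattern used by Wan2.1. Assuming CoDance keeps Wan2.1's video VAE unchanged (4x temporal and 8x spatial compression, 16 latent channels, first frame encoded on its own) and its (1, 2, 2) transformer patch size, the defaults map to a 21-frame latent and 32,760 tokens per clip. The arithmetic can be sketched as follows; these compression factors are assumptions taken from Wan2.1, not values stated in this README.

```python
def wan_latent_shape(num_frames: int = 81, width: int = 832, height: int = 480,
                     t_down: int = 4, s_down: int = 8, channels: int = 16):
    """Latent shape (C, T, H, W) for a Wan2.1-style causal video VAE.

    Assumption: the first frame is encoded separately and the remaining frames
    are compressed t_down-fold in time, hence T = (num_frames - 1) // t_down + 1.
    """
    if (num_frames - 1) % t_down != 0:
        raise ValueError(f"num_frames must be {t_down}k + 1, got {num_frames}")
    if width % s_down or height % s_down:
        raise ValueError(f"width and height must be multiples of {s_down}")
    return (channels, (num_frames - 1) // t_down + 1, height // s_down, width // s_down)


def dit_tokens(latent_shape, patch=(1, 2, 2)) -> int:
    """Number of transformer tokens after splitting the latent into (t, h, w) patches."""
    _, t, h, w = latent_shape
    pt, ph, pw = patch
    return (t // pt) * (h // ph) * (w // pw)


if __name__ == "__main__":
    shape = wan_latent_shape()
    print(shape)              # (16, 21, 60, 104)
    print(dit_tokens(shape))  # 32760
```

Token count grows linearly with frame count and with pixel area, which is the main reason longer clips or higher resolutions are much more expensive to sample.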

If your local Wan2.1 directory uses a different layout, pass each model file explicitly with a repeated --model_path flag:

python examples/inference_480p_single.py \
  --ref_img data/images/reference.png \
  --mask data/masks/reference_mask.png \
  --pose_dir data/saved_pose/driving \
  --model_path Wan2.1-I2V-14B-720P/models_t5_umt5-xxl-enc-bf16.pth \
  --model_path Wan2.1-I2V-14B-720P/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth \
  --model_path Wan2.1-I2V-14B-720P/Wan2.1_VAE.pth \
  --model_path Wan2.1-I2V-14B-720P/diffusion_pytorch_model-00001-of-00007.safetensors \
  --model_path Wan2.1-I2V-14B-720P/diffusion_pytorch_model-00002-of-00007.safetensors \
  --model_path Wan2.1-I2V-14B-720P/diffusion_pytorch_model-00003-of-00007.safetensors \
  --model_path Wan2.1-I2V-14B-720P/diffusion_pytorch_model-00004-of-00007.safetensors \
  --model_path Wan2.1-I2V-14B-720P/diffusion_pytorch_model-00005-of-00007.safetensors \
  --model_path Wan2.1-I2V-14B-720P/diffusion_pytorch_model-00006-of-00007.safetensors \
  --model_path Wan2.1-I2V-14B-720P/diffusion_pytorch_model-00007-of-00007.safetensors
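Typing ten --model_path flags by hand is error-prone. The hypothetical helper below assembles them from a directory and prints the full command. It assumes the same file names as the command above; adjust them if your copy is named differently.

```python
import shlex
from pathlib import Path


def model_path_args(wan_dir: str) -> list[str]:
    """Build the repeated --model_path arguments for a local Wan2.1 directory.

    Assumption: file names match the Wan2.1-I2V-14B-720P layout used above.
    Raises FileNotFoundError for the first missing file or if no shards exist.
    """
    base = Path(wan_dir)
    # Zero-padded shard indices make lexical sorting match numeric order.
    shards = sorted(base.glob("diffusion_pytorch_model-*-of-*.safetensors"))
    if not shards:
        raise FileNotFoundError(f"no diffusion_pytorch_model-*.safetensors in {base}")
    files = [
        base / "models_t5_umt5-xxl-enc-bf16.pth",
        base / "models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth",
        base / "Wan2.1_VAE.pth",
        *shards,
    ]
    args = []
    for f in files:
        if not f.is_file():
            raise FileNotFoundError(f)
        args += ["--model_path", str(f)]
    return args


if __name__ == "__main__":
    wan_dir = "Wan2.1-I2V-14B-720P"
    if Path(wan_dir).is_dir():
        cmd = [
            "python", "examples/inference_480p_single.py",
            "--ref_img", "data/images/reference.png",
            "--mask", "data/masks/reference_mask.png",
            "--pose_dir", "data/saved_pose/driving",
            *model_path_args(wan_dir),
        ]
        print(shlex.join(cmd))
```

shlex.join quotes any path that contains spaces, so the printed command can be pasted straight into a shell.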

Training Details

The released checkpoint contains LoRA weights, the Pose Shift Encoder, and the Mask Encoder. See docs/TRAINING.md for the implementation settings used in the paper.

Repository Structure

.
├── diffsynth/              # Diffusion models, pipelines, schedulers, and loaders
├── dwpose/                 # DWPose ONNX inference utilities
├── examples/               # CoDance inference scripts
├── scripts/                # Download and maintenance helpers
├── process_data.py         # Driving-pose extraction CLI
├── get_mask.py             # SAM-2 mask generation CLI
├── MODEL_CARD.md
└── README.md

Ethics

CoDance is released for academic research. Do not use this project for impersonation, non-consensual identity manipulation, harassment, fraud, or deceptive media generation. Users are responsible for ensuring that reference images, masks, and driving videos are used with proper rights and consent.

Acknowledgements

This implementation builds on the DiffSynth-style video generation codebase and benefits from prior work including UniAnimate-DiT, MimicMotion, MusePose, Animate-X, Wan2.1, DWPose, and SAM-2.

Citation

@article{CoDance2025,
  title={CoDance: An Unbind-Rebind Paradigm for Robust Multi-Subject Animation},
  author={Tan, Shuai and Gong, Biao and Ma, Ke and Feng, Yutong and Zhang, Qiyuan and Wang, Yan and Shen, Yujun and Zhao, Hengshuang},
  journal={arXiv preprint arXiv:2601.11096},
  year={2025}
}

License

This repository is released under the Apache-2.0 license. See LICENSE.
