Official implementation of CoDance: An Unbind-Rebind Paradigm for Robust Multi-Subject Animation.
CoDance animates one or multiple subjects from a reference image using a driving pose sequence. It is designed for mismatched-pose settings where the pose layout is not rigidly aligned with the reference image.
Links: Project Page · Checkpoint · Model Card
- Code release scaffold is available.
- CoDance checkpoint is hosted on Hugging Face.
conda create -n codance python=3.10 -y
conda activate codance
# Choose the CUDA wheel matching your machine.
pip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
pip install -e .
Optional acceleration backends are supported when installed, including FlashAttention, SageAttention, and torch SDPA. torch SDPA is used by default.
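To see which of the optional backends are visible in the active environment, a small check like the following may help. It is only a sketch: the import names `flash_attn` and `sageattention` are assumed from the packages' usual distributions, the helper `available_backends` is hypothetical, and this is not how CoDance itself selects a backend.

```python
import importlib.util

# Assumed import names for the optional packages (not taken from this repository).
OPTIONAL_BACKENDS = {
    "FlashAttention": "flash_attn",
    "SageAttention": "sageattention",
}


def available_backends(finder=importlib.util.find_spec):
    """Return the attention backends that appear importable here.

    torch SDPA is always listed first because it ships with PyTorch and is
    the default according to this README.
    """
    found = ["torch SDPA"]
    for label, module_name in OPTIONAL_BACKENDS.items():
        # find_spec returns None when a top-level module is not installed.
        if finder(module_name) is not None:
            found.append(label)
    return found


if __name__ == "__main__":
    print(available_backends())
```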
Create the checkpoint directory:
mkdir -p checkpoints
Download the CoDance checkpoint:
python scripts/download_weights.py
Or download it manually from Hugging Face and place it at:
checkpoints/codance.ckpt
Download Wan2.1-I2V-14B-720P:
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P \
--local-dir Wan2.1-I2V-14B-720P
Download the DWPose ONNX models and place them as:
checkpoints/yolox_l.onnx
checkpoints/dw-ll_ucoco_384.onnx
Expected layout:
CoDance/
├── checkpoints/
│   ├── codance.ckpt
│   ├── yolox_l.onnx
│   └── dw-ll_ucoco_384.onnx
└── Wan2.1-I2V-14B-720P/
    ├── diffusion_pytorch_model-00001-of-00007.safetensors
    ├── ...
    ├── models_t5_umt5-xxl-enc-bf16.pth
    ├── models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth
    └── Wan2.1_VAE.pth
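Before running inference, it can be useful to confirm that the layout above is complete. The sketch below lists any expected file that is missing. The file names come from this README; the helper `missing_files` is hypothetical and not part of the repository.

```python
from pathlib import Path

WAN_DIR = "Wan2.1-I2V-14B-720P"

# File names as listed in this README.
REQUIRED_FILES = [
    "checkpoints/codance.ckpt",
    "checkpoints/yolox_l.onnx",
    "checkpoints/dw-ll_ucoco_384.onnx",
    f"{WAN_DIR}/models_t5_umt5-xxl-enc-bf16.pth",
    f"{WAN_DIR}/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth",
    f"{WAN_DIR}/Wan2.1_VAE.pth",
]
# The diffusion weights are split into seven shards.
REQUIRED_FILES += [
    f"{WAN_DIR}/diffusion_pytorch_model-{i:05d}-of-00007.safetensors"
    for i in range(1, 8)
]


def missing_files(root="."):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Layout looks complete.")
```

Run it from the repository root; an empty result means every file named in the layout was found.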
CoDance uses three inputs:
- A reference image.
- A DWPose frame directory extracted from a driving video.
- A reference subject mask generated by SAM-2 or provided by the user.
python process_data.py \
--source_video_paths data/videos/driving.mp4 \
--saved_pose_dir data/saved_pkl \
--saved_pose data/saved_pose \
--det_model checkpoints/yolox_l.onnx \
--pose_model checkpoints/dw-ll_ucoco_384.onnx
This writes pose images to:
data/saved_pose/driving/
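In the example above, the output directory appears to be named after the driving video's file stem (`driving.mp4` becomes `driving/`). Assuming that convention holds in general, which this README does not state explicitly, the directory to pass as `--pose_dir` could be derived as follows. The helper `expected_pose_dir` is hypothetical.

```python
from pathlib import Path


def expected_pose_dir(video_path, saved_pose="data/saved_pose"):
    """Guess the pose-frame directory for a driving video.

    Assumption: process_data.py names the output folder after the video's
    file stem, as the README example suggests.
    """
    return Path(saved_pose) / Path(video_path).stem


if __name__ == "__main__":
    print(expected_pose_dir("data/videos/driving.mp4").as_posix())
```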
Provide one or more positive points on the target subject:
python get_mask.py \
--image data/images/reference.png \
--points "480,1470;1168,1424" \
--output data/masks/reference_mask.png
You can also use any external segmentation tool, as long as the saved mask is an RGB image.
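The `--points` value in the command above is a semicolon-separated list of comma-separated coordinate pairs. As an illustration of that format only, a string like this could be parsed as shown below. The `x,y` ordering and the helper `parse_points` are assumptions; `get_mask.py` may handle the string differently.

```python
def parse_points(spec):
    """Parse a string like "480,1470;1168,1424" into a list of (x, y) tuples.

    Assumption: each pair is written as x,y in pixel coordinates.
    """
    points = []
    for chunk in spec.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate a trailing semicolon
        x_str, y_str = chunk.split(",")
        points.append((int(x_str), int(y_str)))
    if not points:
        raise ValueError("at least one point is required")
    return points


if __name__ == "__main__":
    print(parse_points("480,1470;1168,1424"))
```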
python examples/inference_480p_single.py \
--ref_img data/images/reference.png \
--mask data/masks/reference_mask.png \
--pose_dir data/saved_pose/driving \
--save_path outputs/codance.mp4 \
--prompt "A character is dancing." \
--wan_dir Wan2.1-I2V-14B-720P \
--codance_ckpt checkpoints/codance.ckpt
Default inference settings are 81 frames, 832x480 resolution, CFG scale 5, 50 denoising steps, and sigma shift 5.
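To make the default settings concrete, the sketch below collects them in a small dataclass and estimates the resulting latent grid. The compression factors (4x temporal, 8x spatial) are an assumption based on the Wan2.1 VAE and are not stated in this README; the class and field names are hypothetical and do not mirror the script's actual arguments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceDefaults:
    """Default values quoted in this README (field names are illustrative)."""
    num_frames: int = 81
    width: int = 832
    height: int = 480
    cfg_scale: float = 5.0
    num_inference_steps: int = 50
    sigma_shift: float = 5.0


def latent_shape(cfg, temporal_stride=4, spatial_stride=8):
    """Estimate (frames, height, width) of the latent video.

    Assumption: the VAE keeps the first frame and compresses the rest by
    temporal_stride, and downsamples each spatial side by spatial_stride.
    """
    if (cfg.num_frames - 1) % temporal_stride != 0:
        raise ValueError("num_frames should have the form temporal_stride * k + 1")
    if cfg.height % spatial_stride or cfg.width % spatial_stride:
        raise ValueError("height and width should be multiples of spatial_stride")
    return (
        (cfg.num_frames - 1) // temporal_stride + 1,
        cfg.height // spatial_stride,
        cfg.width // spatial_stride,
    )


if __name__ == "__main__":
    print(latent_shape(InferenceDefaults()))
```

Under these assumptions, 81 frames at 832x480 correspond to a 21 x 60 x 104 latent grid, which is why frame counts of the form 4k + 1 are a natural fit.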
If your Wan2.1 local directory uses a different layout, pass explicit model files:
python examples/inference_480p_single.py \
--ref_img data/images/reference.png \
--mask data/masks/reference_mask.png \
--pose_dir data/saved_pose/driving \
--model_path Wan2.1-I2V-14B-720P/models_t5_umt5-xxl-enc-bf16.pth \
--model_path Wan2.1-I2V-14B-720P/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth \
--model_path Wan2.1-I2V-14B-720P/Wan2.1_VAE.pth \
--model_path Wan2.1-I2V-14B-720P/diffusion_pytorch_model-00001-of-00007.safetensors \
--model_path Wan2.1-I2V-14B-720P/diffusion_pytorch_model-00002-of-00007.safetensors \
--model_path Wan2.1-I2V-14B-720P/diffusion_pytorch_model-00003-of-00007.safetensors \
--model_path Wan2.1-I2V-14B-720P/diffusion_pytorch_model-00004-of-00007.safetensors \
--model_path Wan2.1-I2V-14B-720P/diffusion_pytorch_model-00005-of-00007.safetensors \
--model_path Wan2.1-I2V-14B-720P/diffusion_pytorch_model-00006-of-00007.safetensors \
--model_path Wan2.1-I2V-14B-720P/diffusion_pytorch_model-00007-of-00007.safetensors
The released checkpoint contains LoRA weights, the Pose Shift Encoder, and the Mask Encoder. See docs/TRAINING.md for the implementation settings used in the paper.
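Typing the repeated `--model_path` flags from the explicit-files command by hand is error-prone. As a convenience, the argument list could be generated from the Wan2.1 directory, as in the sketch below. It assumes the file names shown in this README and the same ordering as the command; the helper `model_path_args` is hypothetical.

```python
from pathlib import Path


def model_path_args(wan_dir):
    """Build the repeated --model_path arguments for a Wan2.1 directory.

    The order follows the README example: text encoder, CLIP, VAE, then the
    diffusion shards in ascending order.
    """
    wan_dir = Path(wan_dir)
    files = [
        wan_dir / "models_t5_umt5-xxl-enc-bf16.pth",
        wan_dir / "models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth",
        wan_dir / "Wan2.1_VAE.pth",
        # Zero-padded shard indices sort correctly as plain strings.
        *sorted(wan_dir.glob("diffusion_pytorch_model-*.safetensors")),
    ]
    args = []
    for path in files:
        args += ["--model_path", str(path)]
    return args


if __name__ == "__main__":
    print(" ".join(model_path_args("Wan2.1-I2V-14B-720P")))
```

The printed string can be appended to the inference command in place of the hand-written `--model_path` lines.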
.
├── diffsynth/ # Diffusion models, pipelines, schedulers, and loaders
├── dwpose/ # DWPose ONNX inference utilities
├── examples/ # CoDance inference scripts
├── scripts/ # Download and maintenance helpers
├── process_data.py # Driving-pose extraction CLI
├── get_mask.py # SAM-2 mask generation CLI
├── MODEL_CARD.md
└── README.md
CoDance is released for academic research. Do not use this project for impersonation, non-consensual identity manipulation, harassment, fraud, or deceptive media generation. Users are responsible for ensuring that reference images, masks, and driving videos are used with proper rights and consent.
This implementation builds on the DiffSynth-style video generation codebase and benefits from prior work including UniAnimate-DiT, MimicMotion, MusePose, Animate-X, Wan2.1, DWPose, and SAM-2.
@article{CoDance2025,
title={CoDance: An Unbind-Rebind Paradigm for Robust Multi-Subject Animation},
author={Tan, Shuai and Gong, Biao and Ma, Ke and Feng, Yutong and Zhang, Qiyuan and Wang, Yan and Shen, Yujun and Zhao, Hengshuang},
journal={arXiv preprint arXiv:2601.11096},
year={2025}
}
This repository is released under the Apache-2.0 license. See LICENSE.