xlite-dev/longcat-video-fast

⚡️LongCat-Video-Fast

🔥LongCat-Video with a 1.7x🎉 speedup: cache acceleration and 4/8-bit weight-only quantization.

⚙️Installation

git clone https://github.com/xlite-dev/longcat-video-fast.git
cd longcat-video-fast && git submodule update --init --recursive --force
cd LongCat-Video && pip3 install -r requirements.txt && cd ..
pip3 install torch==2.9.0 torchvision torchao bitsandbytes # torch >= 2.7.1
pip3 install git+https://github.com/vipshop/cache-dit.git # cache-dit
pip3 install git+https://github.com/huggingface/diffusers.git # latest main
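
After the install steps above, a small script can confirm that the packages are visible to Python. This is a minimal sketch and not part of the repo: the distribution names are assumed to match the pip commands above (`cache-dit` is inferred from the repository name), and the helper names `parse_version`, `installed_versions`, and `torch_is_new_enough` are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Distribution names assumed from the pip commands above.
REQUIRED = ["torch", "torchvision", "torchao", "bitsandbytes", "cache-dit", "diffusers"]
MIN_TORCH = (2, 7, 1)  # taken from the ">= 2.7.1" note in the install step


def parse_version(text):
    """Turn '2.9.0+cu128' into (2, 9, 0), ignoring local and pre-release suffixes."""
    parts = []
    for piece in text.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def installed_versions(names=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def torch_is_new_enough(torch_version):
    """Check a torch version string against the assumed minimum."""
    return parse_version(torch_version) >= MIN_TORCH


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:14s} {ver or 'MISSING'}")
```

Any line printed as `MISSING` points at the install command to re-run.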

📚Examples

We have released an inference acceleration example (📚longcat_video_fast.py) in this repo that gives a 1.7x🎉 speedup for LongCat-Video. It combines Hybrid Cache Acceleration, Context Parallelism, FP8 Weight Only quantization, and Torch Compile. Feel free to give it a try. For example:

modelscope download --model meituan-longcat/LongCat-Video
export LONGCAT_VIDEO_DIR=/path/to/models/of/LongCat-Video
# Add `--quantize` to load models with bitsandbytes / torchao
# for lower memory usage (e.g., GPUs w/ < 48GB memory)
torchrun --nproc_per_node=4 longcat_video_fast.py --quantize --compile # w/o cache
torchrun --nproc_per_node=4 longcat_video_fast.py --quantize --compile --cache
torchrun --nproc_per_node=4 longcat_video_fast.py --quantize --compile --cache --Fn 1

🤖Baseline w/o Cache Acceleration: 1.0x, NVIDIA L20x4
🎉w/ Cache Acceleration: ~1.7x🎉 speedup, NVIDIA L20x4
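
To give a rough intuition for where a cache speedup can come from, here is a toy sketch of block-level caching in a denoising loop. It is a conceptual illustration only and not cache-dit's implementation: at every step the first `fn` blocks are always computed, and if their output barely changed since the previous step, the cached residual of the remaining blocks is reused instead of recomputing them. All names here (`run_with_cache`, `rel_diff`, `fn`, `threshold`) are hypothetical, and the "blocks" are plain Python functions on lists of floats.

```python
def rel_diff(current, previous):
    """Mean absolute change relative to the previous values' mean magnitude."""
    num = sum(abs(c - p) for c, p in zip(current, previous))
    den = sum(abs(p) for p in previous)
    return num / max(den, 1e-12)


def run_with_cache(blocks, x, num_steps, fn=1, threshold=0.05):
    """Toy loop; returns (final_x, number_of_steps_that_reused_the_cache)."""
    prev_probe = None       # output of the first `fn` blocks at the previous step
    cached_residual = None  # (output of the remaining blocks) - (their input)
    skipped = 0
    for _ in range(num_steps):
        # Always compute the first `fn` blocks; their output is the probe.
        h = x
        for block in blocks[:fn]:
            h = block(h)
        can_reuse = (
            cached_residual is not None
            and rel_diff(h, prev_probe) < threshold
        )
        prev_probe = h
        if can_reuse:
            # Probe barely moved: approximate the remaining blocks with the cache.
            out = [a + r for a, r in zip(h, cached_residual)]
            skipped += 1
        else:
            # Probe moved too much: run the remaining blocks and refresh the cache.
            out = h
            for block in blocks[fn:]:
                out = block(out)
            cached_residual = [o - a for o, a in zip(out, h)]
        x = out
    return x, skipped


if __name__ == "__main__":
    # Four identical toy blocks that pull values toward a fixed point.
    toy_blocks = [lambda v: [0.5 * t + 1.0 for t in v]] * 4
    _, reused = run_with_cache(toy_blocks, [10.0, -3.0], num_steps=20, fn=1)
    print(f"steps that reused the cache: {reused} of 20")
```

In this sketch a larger `fn` spends more compute per step on the probe in exchange for a more reliable reuse decision, which is presumably the kind of trade-off the `--Fn 1` flag in the last command adjusts.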

©️Acknowledgements

This repo is based on cache-dit and LongCat-Video. Many thanks to these awesome open-source projects.
