🔥LongCat-Video with 1.7x🎉 speedup: cache acceleration and 4/8-bit weight-only quantization.
```shell
git clone https://github.com/xlite-dev/longcat-video-fast.git
cd longcat-video-fast && git submodule update --init --recursive --force
cd LongCat-Video && pip3 install -r requirements.txt && cd ..
pip3 install torch==2.9.0 torchvision torchao bitsandbytes # torch >= 2.7.1 required
pip3 install git+https://github.com/vipshop/cache-dit.git # cache-dit
pip3 install git+https://github.com/huggingface/diffusers.git # latest main
```

We have released an Inference Acceleration example (📚longcat_video_fast.py) for LongCat-Video in this repo, achieving a 1.7x🎉 speedup (Hybrid Cache Acceleration + Context Parallelism + FP8 Weight-Only Quantization + Torch Compile). Feel free to give it a try. For example:
```shell
modelscope download --model meituan-longcat/LongCat-Video
export LONGCAT_VIDEO_DIR=/path/to/models/of/LongCat-Video
# Add `--quantize` to enable loading models with bitsandbytes / torchao
# for lower memory usage (e.g., GPUs w/ < 48GB memory)
torchrun --nproc_per_node=4 longcat_video_fast.py --quantize --compile # w/o cache
torchrun --nproc_per_node=4 longcat_video_fast.py --quantize --compile --cache
torchrun --nproc_per_node=4 longcat_video_fast.py --quantize --compile --cache --Fn 1
```

| 🤖Baseline w/o Cache Acceleration | 🎉w/ Cache Acceleration |
|---|---|
| 1.0x, NVIDIA L20x4 | ~1.7x🎉 speedup, NVIDIA L20x4 | 
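The speedup in the table above comes from reusing cached transformer outputs across denoising steps when they change little between steps. Below is a minimal, self-contained sketch of that residual-caching idea; the function names and the relative-change heuristic are illustrative assumptions for this sketch, not the actual cache-dit API or its exact caching policy.

```python
# Toy sketch of step-level residual caching for an iterative denoiser.
# Idea: if the last two computed model outputs are nearly identical, the
# next step's output is likely similar too, so reuse the cached output
# instead of re-running the expensive model. Illustrative only.

def cached_denoise(model_fn, x, timesteps, rel_tol=0.05):
    """Run denoising steps, skipping model_fn calls when the cached
    residual is predicted to still be accurate. Returns (x, n_calls)."""
    cached = None   # most recently computed residual
    prev = None     # residual computed before that
    calls = 0
    for t in timesteps:
        reuse = False
        if cached is not None and prev is not None:
            # Relative change between the two most recent computed outputs.
            diff = sum((a - b) ** 2 for a, b in zip(cached, prev)) ** 0.5
            norm = sum(b * b for b in prev) ** 0.5 or 1.0
            reuse = diff / norm < rel_tol
        if reuse:
            residual = cached  # skip the expensive model call
        else:
            residual = model_fn(x, t)
            calls += 1
            prev, cached = cached, residual
        x = [xi + ri for xi, ri in zip(x, residual)]
    return x, calls
```

With `rel_tol=0` every step recomputes (the baseline); a small positive tolerance trades a bounded approximation error for fewer model calls, which is where the wall-clock speedup comes from.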
This repo is based on cache-dit and LongCat-Video. Many thanks to these awesome open-source projects.
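For readers curious what `--quantize` (low-bit weight-only loading via bitsandbytes / torchao) does conceptually: weights are stored in a low-bit integer format plus a floating-point scale and dequantized at compute time, cutting memory roughly in half versus fp16 for int8. The sketch below is a toy per-tensor symmetric int8 scheme with hypothetical function names, not the bitsandbytes or torchao implementation.

```python
# Toy sketch of symmetric per-tensor int8 weight-only quantization.
# Weights become int8 values in [-127, 127] plus one float scale;
# activations stay in floating point. Illustrative only.

def quantize_int8(weights):
    """Quantize a list of floats to (int8 values, scale)."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid zero scale
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights at compute time."""
    return [qi * scale for qi in q]
```

The round-trip error per weight is bounded by half the scale, which is why weight-only int8 typically preserves quality while roughly halving weight memory versus fp16.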