BlobCtrl

😃 This repository contains the implementation of "BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing".

Keywords: Image Generation, Image Editing, Diffusion Models, Element-level

TL;DR: BlobCtrl enables precise, user-friendly multi-round element-level visual manipulation.
Main Features: 🦉Element-level Add/Remove/Move/Replace/Enlarge/Shrink.

Yaowei Li ¹, Lingen Li ³, Zhaoyang Zhang ²‡, Xiaoyu Li ², Guangzhi Wang ², Hongxiang Li ¹, Xiaodong Cun ², Ying Shan ², Yuexian Zou ¹✉
¹Peking University ²ARC Lab, Tencent PCG ³The Chinese University of Hong Kong ‡Project Lead ✉Corresponding Author

🌐 Project Page | 📜 arXiv | 📹 Video | 🤗 Hugging Face Demo | 🤗 Hugging Face Model

🤗 Hugging Face Data (TBD) | 🤗 Hugging Face Benchmark (TBD)

YouTube introduction video: YouTube.

🔥 Update Logs

[TBD] Release the data preprocessing code.
[TBD] Release the BlobData and BlobBench.
[TBD] Release the training code.
[20/03/2025] Release the inference code.
[17/03/2025] Release the paper, webpage, and Gradio demo.

🛠️ Method Overview

We introduce BlobCtrl, a framework that unifies element-level generation and editing using a probabilistic blob-based representation. By employing blobs as visual primitives, our approach effectively decouples and represents spatial location, semantic content, and identity information, enabling precise element-level manipulation. Our key contributions include: 1) a dual-branch diffusion architecture with hierarchical feature fusion for seamless foreground-background integration; 2) a self-supervised training paradigm with tailored data augmentation and score functions; and 3) controllable dropout strategies to balance fidelity and diversity. To support further research, we introduce BlobData for large-scale training and BlobBench for systematic evaluation. Experiments show that BlobCtrl excels in various element-level manipulation tasks, offering a practical solution for precise and flexible visual content creation.
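To make the idea of a blob as a visual primitive more concrete, here is a minimal sketch. The overview does not spell out the blob parameterization, so this assumes a blob is a 2D Gaussian (an ellipse with soft edges) described by a center, two axis scales, and a rotation. The `Blob` class and its methods are hypothetical illustrations and are not taken from the BlobCtrl code.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Blob:
    """Hypothetical blob: a 2D Gaussian with center, axis scales, and rotation."""
    cx: float
    cy: float
    sx: float           # standard deviation along the blob's first axis
    sy: float           # standard deviation along the blob's second axis
    theta: float = 0.0  # rotation in radians

    def move(self, dx: float, dy: float) -> "Blob":
        # "Move" only changes the spatial location.
        return replace(self, cx=self.cx + dx, cy=self.cy + dy)

    def scale(self, factor: float) -> "Blob":
        # factor > 1 enlarges the blob, factor < 1 shrinks it.
        return replace(self, sx=self.sx * factor, sy=self.sy * factor)

    def opacity(self, x: float, y: float) -> float:
        # Unnormalized Gaussian: 1.0 at the center, decaying towards 0 outside.
        dx, dy = x - self.cx, y - self.cy
        c, s = math.cos(self.theta), math.sin(self.theta)
        u = c * dx + s * dy    # coordinates in the blob's own frame
        v = -s * dx + c * dy
        return math.exp(-0.5 * ((u / self.sx) ** 2 + (v / self.sy) ** 2))
```

Under this assumption, Move, Enlarge, and Shrink reduce to small changes of a handful of parameters, for example `Blob(0, 0, 1, 2).move(3, 4).scale(2.0)`, while the content of the element is handled separately.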

🚀 Getting Started

Environment Requirements 🌍

BlobCtrl has been implemented and tested with CUDA 12.1, PyTorch 2.2.0, and Python 3.10.15.

Clone the repo:

git clone git@github.com:TencentARC/BlobCtrl.git

We recommend first creating a virtual environment with conda and then installing the required libraries. For example:

conda create -n blobctrl python=3.10.15 -y
conda activate blobctrl
python -m pip install --upgrade pip
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu121
pip install xformers torch==2.2.0 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

Then install the copy of diffusers included in this repo:

pip install -e .
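After installation, it can be useful to confirm that the pinned versions are the ones actually present in the environment. The following is a minimal sketch using only the standard library. The expected versions are copied from the install commands above, while the names `EXPECTED_VERSIONS` and `find_version_mismatches` are our own and not part of this repo.

```python
from importlib import metadata

# Versions taken from the pip commands in this README.
EXPECTED_VERSIONS = {
    "torch": "2.2.0",
    "torchvision": "0.17.0",
    "torchaudio": "2.2.0",
}


def find_version_mismatches(expected):
    """Return {package: (expected, found)} for each package that differs.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            found = metadata.version(package)
        except metadata.PackageNotFoundError:
            found = None
        # Wheels from the PyTorch index may carry a local suffix such as "+cu121".
        base = found.split("+")[0] if found is not None else None
        if base != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in find_version_mismatches(EXPECTED_VERSIONS).items():
        print(f"{package}: expected {wanted}, found {found}")
```

Running the script inside the `blobctrl` environment prints nothing when all three packages match.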

Download Model Checkpoints 💾

Download the BlobCtrl checkpoints:

sh examples/blobctrl/scripts/download_models.sh

The ckpt folder contains:

Our provided BlobCtrl checkpoints (UNet LoRA + BlobNet).
Pretrained SD-v1.5 checkpoint.
Pretrained DINOv2 checkpoint.
Pretrained SAM checkpoint.

The checkpoint directory should be structured as follows:

|-- models
    |-- blobnet
        |-- config.json
        |-- diffusion_pytorch_model.safetensors
    |-- dinov2-large
        |-- config.json
        |-- model.safetensors
        ...
    |-- sam
        |-- sam_vit_h_4b8939.pth
    |-- unet_lora
        |-- pytorch_lora_weights.safetensors

🏃🏼 Running Scripts

BlobCtrl demo 🤗

You can run the demo using the script:

sh examples/blobctrl/scripts/run_app.sh

BlobCtrl Inference 🌠

You can run the inference using the script:

sh examples/blobctrl/scripts/inference.sh

🤝🏼 Cite Us

@misc{li2025blobctrl,
  title={BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing},
  author={Yaowei Li and Lingen Li and Zhaoyang Zhang and Xiaoyu Li and Guangzhi Wang and Hongxiang Li and Xiaodong Cun and Ying Shan and Yuexian Zou},
  year={2025},
  eprint={2503.13434},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

💖 Acknowledgement

Our implementation builds upon the diffusers library. We extend our sincere gratitude to all the contributors of the diffusers project!

We also acknowledge the BlobGAN project for providing valuable insights and inspiration for our blob-based representation approach.

❓ Contact

If you have any questions, feel free to email [email protected].
