Towards General Modality Translation with Contrastive and Predictive Latent Diffusion Bridge

This repository contains the code for the paper "Towards General Modality Translation with Contrastive and Predictive Latent Diffusion Bridge".

Installation Instructions

Follow these steps to set up the environment and run the code:

Create a Conda Environment:

Open your terminal and create a new Conda environment with Python 3.11:
```
conda create --name lddbm python=3.11 -y && conda activate lddbm
```

Install Dependencies:

Navigate to the root directory of this repository and install the required packages using pip:

pip install -r requirements.txt

Download Pretrained Weights for LPIPS Evaluation:

mkdir -p models && cd models
wget https://huggingface.co/spaces/multimodalart/vqgan/resolve/dec38285640c45fc3f8377a9726daf6e0de08d6a/taming/modules/autoencoder/lpips/vgg.pth
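If wget is not available, the same download can be sketched in Python with only the standard library. The helper name download_if_missing is hypothetical and not part of this repository; the URL and the models/vgg.pth destination are taken from the commands above.

```python
from pathlib import Path
from urllib.request import urlretrieve

# URL copied from the wget command above.
VGG_URL = (
    "https://huggingface.co/spaces/multimodalart/vqgan/resolve/"
    "dec38285640c45fc3f8377a9726daf6e0de08d6a/"
    "taming/modules/autoencoder/lpips/vgg.pth"
)


def download_if_missing(url: str, dest: Path) -> Path:
    """Download url to dest unless dest already exists and is non-empty."""
    dest = Path(dest)
    if dest.is_file() and dest.stat().st_size > 0:
        return dest  # already downloaded, nothing to do
    dest.parent.mkdir(parents=True, exist_ok=True)  # mirrors `mkdir -p models`
    urlretrieve(url, str(dest))
    return dest
```

Calling download_if_missing(VGG_URL, Path("models/vgg.pth")) from the repository root should leave the file where the wget command would put it. Note that the shell commands above leave you inside the models folder, so return to the repository root before continuing.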

Download Datasets:

Multi-view to 3D - ShapeNet

Download the ShapeNet dataset from the following URL:

https://github.com/fomalhautb/3D-RETR/archive/refs/heads/main.zip

Unzip the files into the following folder within your project:

lddbm/datasets/shapenet
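The unzip step can also be done from Python. This is a minimal sketch using the standard zipfile module; the helper name extract_archive and the local file name main.zip are assumptions for illustration.

```python
import zipfile
from pathlib import Path


def extract_archive(archive: Path, target: Path) -> list[str]:
    """Extract a zip archive into target and return the member names."""
    target = Path(target)
    target.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
        return zf.namelist()
```

For example, extract_archive(Path("main.zip"), Path("lddbm/datasets/shapenet")). GitHub branch archives usually unpack into a single top-level folder, so inspect the returned names and check whether the contents need to be moved up one level; the layout the loader expects is not stated here.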

Zero-shot Super Resolution - Celebs and Flicker.

Download the Flickr-Faces-HQ (FFHQ, referred to as Flicker50k in the code) and VoxCeleb datasets from the following URLs:

https://www.kaggle.com/datasets/arnaud58/flickrfaceshq-dataset-ffhq
https://www.robots.ox.ac.uk/~vgg/data/voxceleb/

Place both datasets in the same folder:

lddbm/datasets/sr

The files are loaded in the '__init__.py' file of the datasets folder:

train_paths = sorted([str(p) for p in glob(f'{data_path}/Flicker50k' + '/*.png')])
trainset = CelebaDataset(train_paths, lr_transforms=lr_transforms, hr_transforms=hr_transforms, train=True)

image_paths = sorted([str(p) for p in glob(f'{data_path}/celebsA_HQ/celeba_hq_256' + '/*.jpg')])
_, valid_paths = train_test_split(image_paths, test_size=5000, shuffle=True, random_state=42)

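Make sure your folder names and file extensions match the path suffixes used above (Flicker50k with .png files, and celebsA_HQ/celeba_hq_256 with .jpg files).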
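A quick way to confirm the layout before training is to count the files matched by the same glob patterns as the loader snippet. This is a sketch only; the helper name count_sr_images is hypothetical and not part of this repository.

```python
from glob import glob


def count_sr_images(data_path: str) -> dict[str, int]:
    """Count images matched by the glob patterns used in the loader snippet."""
    patterns = {
        "Flicker50k": f"{data_path}/Flicker50k/*.png",
        "celebsA_HQ/celeba_hq_256": f"{data_path}/celebsA_HQ/celeba_hq_256/*.jpg",
    }
    # A count of 0 means the folder name or file extension does not match.
    return {name: len(glob(pattern)) for name, pattern in patterns.items()}
```

For example, count_sr_images("lddbm/datasets/sr") should report a non-zero count for both entries. Assuming train_test_split is scikit-learn's, test_size=5000 also requires the second folder to contain more than 5000 images.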

Run Training and Evaluation: Execute the training and evaluation script from the root directory of this repository using one of the following commands:

For the multi-view to 3D task:

python scripts/main.py --config_name multi2shape --data_path lddbm/datasets/shapenet

For the super resolution task:

python scripts/main.py --config_name sr --data_path lddbm/datasets/sr
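To launch either task from a Python script (for example, to run both in sequence), the commands above can be wrapped with subprocess. This is a sketch under the assumption that it is run from the repository root; TASKS, build_command, and run_task are hypothetical names, and only the --config_name and --data_path flags shown above are used.

```python
import subprocess
import sys

# Config names and data paths copied from the commands above.
TASKS = {
    "multi2shape": "lddbm/datasets/shapenet",
    "sr": "lddbm/datasets/sr",
}


def build_command(config_name: str) -> list[str]:
    """Return the argument list for one task, using the current interpreter."""
    return [
        sys.executable, "scripts/main.py",
        "--config_name", config_name,
        "--data_path", TASKS[config_name],
    ]


def run_task(config_name: str) -> int:
    """Run one task and return the exit code of the process."""
    return subprocess.run(build_command(config_name)).returncode
```

Calling run_task("multi2shape") followed by run_task("sr") would run the two tasks one after the other; a non-zero return value indicates that the script exited with an error.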

License

This project is licensed under the AGPL 3.0 License - see the LICENSE file for details.
