This repository contains the code for the paper "Towards General Modality Translation with Contrastive and Predictive Latent Diffusion Bridge".
Follow these steps to set up the environment and run the code:
- Create a Conda Environment:
Open your terminal and create a new Conda environment with Python 3.11:
    conda create --name lddbm python=3.11 -y && conda activate lddbm
- Install Dependencies:
Navigate to the root directory of this repository and install the required packages using pip:
    pip install -r requirements.txt
Download the pretrained weights for LPIPS evaluation:
    mkdir -p models && cd models
    wget https://huggingface.co/spaces/multimodalart/vqgan/resolve/dec38285640c45fc3f8377a9726daf6e0de08d6a/taming/modules/autoencoder/lpips/vgg.pth
- Download Datasets:
Download the ShapeNet dataset from the following URL:
    https://github.com/fomalhautb/3D-RETR/archive/refs/heads/main.zip
Unzip the files into the following folder within your project:
    lddbm/datasets/shapenet
Download the VoxCeleb and Flicker50k datasets from the following URLs:
    https://www.kaggle.com/datasets/arnaud58/flickrfaceshq-dataset-ffhq
    https://www.robots.ox.ac.uk/~vgg/data/voxceleb/
Place them in the following folder:
    lddbm/datasets/sr
The files are loaded in the '__init__.py' file of the datasets folder:
    # Training images: every PNG under <data_path>/Flicker50k
    train_paths = sorted([str(p) for p in glob(f'{data_path}/Flicker50k' + '/*.png')])
    trainset = CelebaDataset(train_paths, lr_transforms=lr_transforms, hr_transforms=hr_transforms, train=True)
    # Validation images: 5000 JPGs held out from <data_path>/celebsA_HQ/celeba_hq_256
    image_paths = sorted([str(p) for p in glob(f'{data_path}/celebsA_HQ/celeba_hq_256' + '/*.jpg')])
    _, valid_paths = train_test_split(image_paths, test_size=5000, shuffle=True, random_state=42)
Make sure the folder names on disk match the path suffixes used in this code (Flicker50k and celebsA_HQ/celeba_hq_256).
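Before training, it can help to confirm that the folders on disk match what the loading code globs for. The following is a minimal sketch and not part of the repository: `check_sr_layout` and `EXPECTED` are hypothetical names, and the sub-folder names and file patterns are copied from the loading code above. Because that code holds out 5000 validation images, the sketch also flags a validation folder containing 5000 or fewer files; that threshold is an assumption exposed as `min_valid`.

```python
from glob import glob
from pathlib import Path

# Sub-folders and patterns copied from the loading code quoted above.
EXPECTED = {
    "train": ("Flicker50k", "*.png"),
    "valid": ("celebsA_HQ/celeba_hq_256", "*.jpg"),
}


def check_sr_layout(data_path, min_valid=5000):
    """Hypothetical helper: count files per expected folder and list problems."""
    counts, problems = {}, []
    for name, (subdir, pattern) in EXPECTED.items():
        folder = Path(data_path) / subdir
        counts[name] = len(glob(str(folder / pattern)))
        if not folder.is_dir():
            problems.append(f"missing folder: {folder}")
        elif counts[name] == 0:
            problems.append(f"no {pattern} files in {folder}")
    # Assumption: the split reserves `min_valid` images, so more are needed.
    if 0 < counts["valid"] <= min_valid:
        problems.append(f"validation folder has only {counts['valid']} images")
    return counts, problems


if __name__ == "__main__":
    counts, problems = check_sr_layout("lddbm/datasets/sr")
    print(counts)
    for problem in problems:
        print("WARNING:", problem)
```

Running the script from the directory that contains `lddbm/datasets/sr` prints the file counts and one warning per folder that is missing, empty, or too small.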
- Run Training and Evaluation:
Execute the training and evaluation scripts using the following commands.
For the multi-view to 3D task:
    python scripts/main.py --config_name multi2shape --data_path lddbm/datasets/shapenet
For the super-resolution task:
    python scripts/main.py --config_name sr --data_path lddbm/datasets/sr
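To run both tasks back to back, the two commands can be wrapped in a small launcher. This is a sketch under assumptions: `TASKS`, `build_command`, and `run_all` are hypothetical names, and only the `--config_name` and `--data_path` flags shown above are used. Any other options of `scripts/main.py` are not covered here.

```python
import subprocess
import sys

# Config names and data paths taken from the two commands above.
TASKS = {
    "multi2shape": "lddbm/datasets/shapenet",
    "sr": "lddbm/datasets/sr",
}


def build_command(config_name, data_path, python=sys.executable):
    """Hypothetical helper: assemble the argument list for one run."""
    return [python, "scripts/main.py",
            "--config_name", config_name,
            "--data_path", data_path]


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for config_name, data_path in TASKS.items():
        cmd = build_command(config_name, data_path)
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing run.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

By default the script only prints the commands. Calling `run_all(dry_run=False)` executes them in order with the interpreter of the active environment.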
This project is licensed under the AGPL 3.0 License - see the LICENSE file for details.