# DreamWorld

DreamWorld combines EEG brain signals with world models to generate immersive video worlds from neural activity.
This project bridges two exciting research areas:

- **DreamDiffusion**: encoding EEG signals and aligning them with CLIP features to generate images from brain activity
- **World models**: recent advances in open-source world generation (e.g., NVIDIA's Cosmos-Predict2.5)

## Pipeline

1. **EEG to image**: use DreamDiffusion to generate an image from EEG data
2. **Image to caption**: apply BLIP-2 to create a text description of the generated image
3. **World generation**: feed the image and caption into a world model to generate a dynamic video world
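The three stages above can be sketched as a simple function composition. This is an illustrative outline only: every function below is a hypothetical stand-in for the real model (DreamDiffusion, BLIP-2, Cosmos-Predict2.5), not an actual API from any of those projects.

```python
# Hedged sketch of the EEG -> image -> caption -> video pipeline.
# All stage functions are placeholders for the real models.

GUIDANCE_PROMPT = "Dynamic camera movement sweeping through the scene with fluid motion."

def eeg_to_image(eeg_signal):
    """Stage 1 (DreamDiffusion stand-in): EEG encoder + diffusion decoder."""
    return {"kind": "image", "source": eeg_signal}

def image_to_caption(image):
    """Stage 2 (BLIP-2 stand-in): caption the generated image."""
    return f"a scene decoded from {image['source']}"

def image_and_caption_to_video(image, caption):
    """Stage 3 (world-model stand-in): condition on image + caption + guidance prompt."""
    prompt = f"{caption}. {GUIDANCE_PROMPT}"
    return {"kind": "video", "prompt": prompt, "init_image": image}

def run_pipeline(eeg_signal):
    image = eeg_to_image(eeg_signal)
    caption = image_to_caption(image)
    return image_and_caption_to_video(image, caption)

video = run_pipeline("subject-4/trial-12")
print(video["prompt"])
```

In the real pipeline, each stage hands off a concrete artifact (a PNG, a caption string, an MP4), which is what makes the stages easy to swap or inspect independently.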
## Results

### Example 1: success

In this example the EEG encoder captures the correct semantic information, and the world model generates a sensible video.
| Ground Truth | EEG -> Image | Image + prompt -> Video |
|---|---|---|
| ![]() | ![]() | test3-3.mp4 |
**BLIP-2 caption + guidance prompt:**

> a computer monitor with a picture of a city on it. Dynamic camera movement sweeping through the scene with fluid motion. Start with a smooth dolly forward, then transition into an energetic orbital pan that circles completely around the subject. The camera glides and flows continuously, capturing multiple angles. Bright, warm illumination bathes everything in golden light as the camera maintains constant, purposeful movement throughout the entire shot.

### Example 2: correct semantics, nightmarish video

In this example the EEG encoder captures the semantics of the EEG signal correctly, but the world model creates a nightmare instead of a dream.
| Ground Truth | EEG -> Image | Image + prompt -> Video |
|---|---|---|
| ![]() | ![]() | test8-5.6.mp4 |
**BLIP-2 caption + guidance prompt:**

> a group of chairs and tables outside a store. Dynamic camera movement sweeping through the scene with fluid motion. Start with a smooth dolly forward, then transition into an energetic orbital pan that circles completely around the subject. The camera glides and flows continuously, capturing multiple angles. Bright, warm illumination bathes everything in golden light as the camera maintains constant, purposeful movement throughout the entire shot.

### Example 3: encoder failure

In this example the EEG encoder fails and captures only the "silver color", possibly because the object is almost unrecognizable. The world model does a fantastic job, though.
| Ground Truth | EEG -> Image | Image + prompt -> Video |
|---|---|---|
| ![]() | ![]() | test7-3.4.mp4 |
**BLIP-2 caption + guidance prompt:**

> a pair of silver shoes sitting on a marble floor. Dynamic camera movement sweeping through the scene with fluid motion. Start with a smooth dolly forward, then transition into an energetic orbital pan that circles completely around the subject. The camera glides and flows continuously, capturing multiple angles. Bright, warm illumination bathes everything in golden light as the camera maintains constant, purposeful movement throughout the entire shot.

## Next Steps

A promising next step is to bypass the intermediate image-generation step and condition world generation directly on the EEG encoder embeddings, creating an end-to-end EEG-to-world pipeline.
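One plausible shape for this end-to-end idea is a learned adapter that projects EEG encoder embeddings into the world model's conditioning space, replacing the image + caption hand-off. The sketch below is purely illustrative: the dimensions, the adapter, and the training story are all assumptions, not anything implemented in this repository.

```python
# Hedged sketch: map an EEG embedding directly to a world-model
# conditioning vector via a linear adapter, skipping image generation
# and captioning. Sizes are toy values; a real adapter would be trained.
import random

EEG_DIM, COND_DIM = 16, 8  # illustrative; real models use far larger dims

# A trained adapter would come from fine-tuning; random here for illustration.
adapter = [[random.gauss(0.0, 0.1) for _ in range(EEG_DIM)] for _ in range(COND_DIM)]

def eeg_to_conditioning(eeg_embedding):
    """Project an EEG encoder embedding into the conditioning space."""
    return [sum(w * x for w, x in zip(row, eeg_embedding)) for row in adapter]

eeg_embedding = [random.random() for _ in range(EEG_DIM)]
cond = eeg_to_conditioning(eeg_embedding)
print(len(cond))  # conditioning vector of size COND_DIM
```

The appeal of this design is that it removes two lossy compression steps (pixels, then text) between the brain signal and the generated world.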
In a distant future where EEG signals can be picked up more easily, this demo hints at where we might be headed: creative professionals such as filmmakers, architects, and designers may no longer need to compress their imagination into low-dimensional data such as text or speech.
## Requirements

- NVIDIA GPU with Ampere architecture (RTX 30 series, A100) or newer
- NVIDIA driver >= 570.124.06, compatible with CUDA 12.8.1
- Linux x86-64
- glibc >= 2.31 (e.g., Ubuntu >= 22.04)
- Python 3.10
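A quick way to sanity-check the requirements above before installing (a convenience sketch, not part of the repository; the driver query relies on `nvidia-smi` being on `PATH`):

```python
# Report Python/OS/arch and, if available, the NVIDIA driver version,
# so you can compare against the requirements list above.
import platform
import shutil
import subprocess
import sys

report = {
    "python": f"{sys.version_info.major}.{sys.version_info.minor}",
    "os": platform.system(),
    "arch": platform.machine(),
}
print(report)

if report["python"] != "3.10":
    print("warning: this project targets Python 3.10")
if (report["os"], report["arch"]) != ("Linux", "x86_64"):
    print("warning: this project targets Linux x86-64")

if shutil.which("nvidia-smi"):
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    print("driver:", out.stdout.strip())
else:
    print("warning: nvidia-smi not found; cannot verify driver >= 570.124.06")
```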
## Installation

Clone the repository:

```shell
git clone git@github.com:nvidia-cosmos/cosmos-predict2.5.git
cd cosmos-predict2.5
```

Install system dependencies:

```shell
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
```

Install the package into a new environment:

```shell
uv sync
source .venv/bin/activate
```

Or, install the package into the active environment (e.g., conda):

```shell
uv sync --active --inexact
```

## Checkpoints

- Get a Hugging Face access token with `Read` permission.
- Install the Hugging Face CLI:

  ```shell
  uv tool install -U "huggingface_hub[cli]"
  ```

- Log in:

  ```shell
  hf auth login
  ```

- Accept the NVIDIA Open Model License Agreement.
Checkpoints for cosmos-predict2.5 are downloaded automatically during inference and post-training. To change the checkpoint cache location, set the `HF_HOME` environment variable.
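For example, to keep the cache on a larger data volume (the path below is a placeholder; use whatever location suits your machine):

```shell
# Cache Hugging Face checkpoints under /data instead of ~/.cache/huggingface
export HF_HOME=/data/hf-cache
```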
## Data and Pretrained Models

The `datasets` and `pretrains` folders for DreamDiffusion are not included in this repository. Please download the EEG data from eeg and place it in the root directory of this repository as shown below. We also provide a copy of the ImageNet subset, which may be used for ImageNet evaluation.

The fine-tuned DreamDiffusion checkpoint: ckpt
```
DreamDiffusion/pretrains
├── models
│   ├── config15.yaml
│   └── checkpoint.pth  (pre-trained EEG encoder)

DreamDiffusion/datasets
├── imageNet_images  (subset of ImageNet)
├── block_splits_by_image_all.pth
├── block_splits_by_image_single.pth
└── eeg_5_95_std.pth

DreamDiffusion/code
├── sc_mbm
│   ├── mae_for_eeg.py
│   ├── trainer.py
│   └── utils.py
├── dc_ldm
│   ├── ldm_for_eeg.py
│   ├── utils.py
│   ├── models
│   │   └── (adopted from LDM)
│   └── modules
│       └── (adopted from LDM)
├── stageA1_eeg_pretrain.py  (main script for EEG pre-training)
├── eeg_ldm.py  (main script for fine-tuning Stable Diffusion)
├── gen_eval_eeg.py  (main script for generating images)
├── dataset.py  (functions for loading datasets)
├── eval_metrics.py  (functions for evaluation metrics)
└── config.py  (configurations for the main scripts)
```
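A small convenience check (not part of the repository) that the downloaded data and checkpoints sit where the layout above expects, run from the repository root:

```python
# Verify the expected DreamDiffusion data/checkpoint layout.
from pathlib import Path

expected = [
    "DreamDiffusion/pretrains/models/config15.yaml",
    "DreamDiffusion/pretrains/models/checkpoint.pth",
    "DreamDiffusion/datasets/block_splits_by_image_all.pth",
    "DreamDiffusion/datasets/block_splits_by_image_single.pth",
    "DreamDiffusion/datasets/eeg_5_95_std.pth",
]

missing = [p for p in expected if not Path(p).exists()]
if missing:
    print("missing files:")
    for p in missing:
        print(f"  {p}")
else:
    print("all expected files found")
```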
## Acknowledgements

This project builds upon several open-source works:
- DreamDiffusion for EEG-to-image generation
- NVIDIA Cosmos for world model generation
- BLIP-2 for image captioning
## Citations

```bibtex
@article{bai2023dreamdiffusion,
  title={DreamDiffusion: Generating High-Quality Images from Brain EEG Signals},
  author={Bai, Yunpeng and Wang, Xintao and Cao, Yanpei and Ge, Yixiao and Yuan, Chun and Shan, Ying},
  journal={arXiv preprint arXiv:2306.16934},
  year={2023}
}
```

```bibtex
@misc{nvidia2025cosmosworldfoundationmodel,
  title={Cosmos World Foundation Model Platform for Physical AI},
  author={NVIDIA and : and Niket Agarwal and Arslan Ali and Maciej Bala and Yogesh Balaji and Erik Barker and Tiffany Cai and Prithvijit Chattopadhyay and Yongxin Chen and Yin Cui and Yifan Ding and Daniel Dworakowski and Jiaojiao Fan and Michele Fenzi and Francesco Ferroni and Sanja Fidler and Dieter Fox and Songwei Ge and Yunhao Ge and Jinwei Gu and Siddharth Gururani and Ethan He and Jiahui Huang and Jacob Huffman and Pooya Jannaty and Jingyi Jin and Seung Wook Kim and Gergely Klár and Grace Lam and Shiyi Lan and Laura Leal-Taixe and Anqi Li and Zhaoshuo Li and Chen-Hsuan Lin and Tsung-Yi Lin and Huan Ling and Ming-Yu Liu and Xian Liu and Alice Luo and Qianli Ma and Hanzi Mao and Kaichun Mo and Arsalan Mousavian and Seungjun Nah and Sriharsha Niverty and David Page and Despoina Paschalidou and Zeeshan Patel and Lindsey Pavao and Morteza Ramezanali and Fitsum Reda and Xiaowei Ren and Vasanth Rao Naik Sabavat and Ed Schmerling and Stella Shi and Bartosz Stefaniak and Shitao Tang and Lyne Tchapmi and Przemek Tredak and Wei-Cheng Tseng and Jibin Varghese and Hao Wang and Haoxiang Wang and Heng Wang and Ting-Chun Wang and Fangyin Wei and Xinyue Wei and Jay Zhangjie Wu and Jiashu Xu and Wei Yang and Lin Yen-Chen and Xiaohui Zeng and Yu Zeng and Jing Zhang and Qinsheng Zhang and Yuxuan Zhang and Qingqing Zhao and Artur Zolkowski},
  year={2025},
  eprint={2501.03575},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2501.03575},
}
```

```bibtex
@misc{li2023blip2bootstrappinglanguageimagepretraining,
  title={BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models},
  author={Junnan Li and Dongxu Li and Silvio Savarese and Steven Hoi},
  year={2023},
  eprint={2301.12597},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2301.12597},
}
```




