Commit 8a97791

Merge remote-tracking branch 'origin/Olivia' into dev_schrum
2 parents 7029d7d + 2540243 commit 8a97791

3 files changed

+132
-2
lines changed

MM_Batch/MM-data.bat

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
1+
@echo off
2+
cd ..
3+
4+
5+
:: Convert Mega Man raw level data to JSON
6+
python create_megaman_json_data.py --output datasets\\MM_Levels_Full.json
7+
python create_megaman_json_data.py --output datasets\\MM_Levels_Simple.json --group_encodings
8+
9+
:: Generate captions for Mega Man
10+
python MM_create_ascii_captions.py --dataset datasets\\MM_Levels_Full.json --tileset datasets\\MM.json --output datasets\\MM_LevelsAndCaptions-full-regular.json
11+
python MM_create_ascii_captions.py --dataset datasets\\MM_Levels_Simple.json --tileset datasets\\MM_Simple_Tileset.json --output datasets\\MM_LevelsAndCaptions-simple-regular.json
12+
13+
:: Tokenize Mega Man data
14+
python tokenizer.py save --json datasets\\MM_LevelsAndCaptions-full-regular.json --pkl_file datasets\MM_Tokenizer-full-regular.pkl
15+
python tokenizer.py save --json datasets\\MM_LevelsAndCaptions-simple-regular.json --pkl_file datasets\MM_Tokenizer-simple-regular.pkl
16+
17+
:: Generate validation captions (not yet compatible with Mega Man)
18+
REM python create_random_test_captions.py --save_file "datasets\\MM_RandomTest-simple-regular.json" --json datasets\\MM_LevelsAndCaptions-simple-regular.json --seed 0 --game MM-Simple
19+
REM python create_random_test_captions.py --save_file "datasets\\LR_RandomTest-absence.json" --json %default_out%-absence.json --seed 0 --describe_absence --game LR
20+
21+
:: Split output files into train/val/test sets (also not yet implemented for Mega Man)
22+
REM python split_data.py --json %default_out%-regular.json --train_pct 0.9 --val_pct 0.05 --test_pct 0.05 --seed 42 --game loderunner
23+
REM python split_data.py --json %default_out%-absence.json --train_pct 0.9 --val_pct 0.05 --test_pct 0.05 --seed 42 --game loderunner

MM_README.md

Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
1+
# Mega Man Generation
2+
3+
Generate Mega Man level scenes with a diffusion model conditioned on text input.
4+
Mega Man support is still experimental and in progress, and current results are not as good as those for Mario. This is mostly due to a smaller, more complex dataset and incomplete code. Many features available for other games have not yet been implemented, but the core training and level generation work as intended.
5+
6+
## Set up the repository
7+
This repository can be checked out with this command:
8+
```
9+
git clone https://github.com/schrum2/MarioDiffusion.git
10+
```
11+
Data used for training our models already exists in the `datasets` directory of this repo,
12+
but you can recreate the data using these commands. First, you will need to check out
13+
[my forked copy of TheVGLC](https://github.com/schrum2/TheVGLC). Note that the following
14+
command should be executed in the parent directory of the `MarioDiffusion` repository so that
15+
the directories for `MarioDiffusion` and `TheVGLC` are next to each other in the same directory:
16+
```
17+
git clone https://github.com/schrum2/TheVGLC.git
18+
```
19+
20+
Then, enter the `MarioDiffusion` repository:
21+
```
22+
cd MarioDiffusion
23+
```
24+
25+
Before running any code, install all requirements with pip:
26+
```
27+
pip install -r requirements.txt
28+
```
29+
Before you can generate Mega Man levels, you must create a dataset, as described in the next section.
30+
31+
## Create datasets
32+
33+
Due to the massively increased number of tiles in Mega Man, we split our data into two different games internally. "MM-Full" contains the full set of tiles, including unique enemies and powerups, while "MM-Simple" groups things like enemies, powerups, and hazards together, which boosts performance at the cost of some level detail.
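
As a rough illustration of what grouping tiles means, here is a minimal sketch. The tile symbols, group names, and helper functions are all hypothetical and chosen only for this example; the real encodings produced by `--group_encodings` may look different.

```python
# Hypothetical tile symbols: the actual Mega Man tileset in this repo may differ.
GROUPS = {
    "enemy": ["a", "b", "c"],   # distinct enemy types (illustrative)
    "powerup": ["+", "l"],      # distinct powerups (illustrative)
    "hazard": ["^", "~"],       # distinct hazards (illustrative)
}
GROUP_SYMBOL = {"enemy": "E", "powerup": "P", "hazard": "H"}


def build_simplifier(groups, group_symbol):
    """Map every tile in a group to that group's single shared symbol."""
    return {tile: group_symbol[name] for name, tiles in groups.items() for tile in tiles}


def simplify_level(rows, mapping):
    """Replace grouped tiles row by row; tiles outside any group are kept as-is."""
    return ["".join(mapping.get(ch, ch) for ch in row) for row in rows]


full_level = ["--a-+", "#^#b#"]
mapping = build_simplifier(GROUPS, GROUP_SYMBOL)
simple_level = simplify_level(full_level, mapping)
print(simple_level)
```

The simplified level uses fewer distinct symbols, which is the kind of vocabulary reduction that "MM-Simple" is meant to provide.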
34+
35+
To create the datasets for both versions of Mega Man, we run each of the following commands twice, once per version. First, create the raw 16x16 level samples with these commands:
36+
```
37+
python create_megaman_json_data.py --output datasets\\MM_Levels_Full.json
38+
python create_megaman_json_data.py --output datasets\\MM_Levels_Simple.json --group_encodings
39+
```
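
The text above only says that the samples are 16x16. One common way to cut fixed-size samples from a larger level is a sliding window, sketched below. This is an assumption about the approach, not a description of what `create_megaman_json_data.py` actually does, and the function name and `stride` parameter are hypothetical.

```python
def extract_windows(level, size=16, stride=1):
    """Return every size x size window of a rectangular grid given as a list of rows."""
    height, width = len(level), len(level[0])
    windows = []
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            # Slice `size` rows, then `size` columns out of each of those rows.
            windows.append([row[left:left + size] for row in level[top:top + size]])
    return windows


# A toy level: 16 rows by 20 columns, with a solid floor on the bottom row.
level = ["-" * 20 for _ in range(15)] + ["#" * 20]
samples = extract_windows(level)
print(len(samples), "samples of", len(samples[0]), "x", len(samples[0][0]))
```

A larger `stride` would produce fewer, less overlapping samples.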
40+
41+
The next step is to create captions for these raw levels, which can be done with these commands:
42+
```
43+
python MM_create_ascii_captions.py --dataset datasets\\MM_Levels_Full.json --tileset datasets\\MM.json --output datasets\\MM_LevelsAndCaptions-full-regular.json
44+
python MM_create_ascii_captions.py --dataset datasets\\MM_Levels_Simple.json --tileset datasets\\MM_Simple_Tileset.json --output datasets\\MM_LevelsAndCaptions-simple-regular.json
45+
```
46+
The last step is to create tokenizers for our data, which can be done like this:
47+
```
48+
python tokenizer.py save --json datasets\\MM_LevelsAndCaptions-full-regular.json --pkl_file datasets\MM_Tokenizer-full-regular.pkl
49+
python tokenizer.py save --json datasets\\MM_LevelsAndCaptions-simple-regular.json --pkl_file datasets\MM_Tokenizer-simple-regular.pkl
50+
```
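
To give a sense of what a saved tokenizer might contain, here is a minimal sketch that builds a vocabulary from captions and round-trips it through `pickle`. The caption strings, the special tokens, and the function names are assumptions made for illustration; the tokenizer implemented in `tokenizer.py` may be structured differently.

```python
import pickle
import re

TOKEN_PATTERN = re.compile(r"[a-z0-9]+|[.,]")


def tokenize(caption):
    """Split a caption into lowercase words and punctuation tokens."""
    return TOKEN_PATTERN.findall(caption.lower())


def build_vocab(captions):
    """Assign an integer ID to each token, with special tokens first."""
    tokens = sorted({tok for cap in captions for tok in tokenize(cap)})
    specials = ["[PAD]", "[MASK]"]  # hypothetical special tokens
    return {tok: i for i, tok in enumerate(specials + tokens)}


def encode(caption, vocab):
    """Convert a caption into a list of token IDs."""
    return [vocab[tok] for tok in tokenize(caption)]


# Hypothetical captions; real ones come from the LevelsAndCaptions JSON files.
captions = ["one enemy. two platforms.", "two enemies. one hazard."]
vocab = build_vocab(captions)

# Serialize and restore, as a .pkl file on disk would.
restored = pickle.loads(pickle.dumps(vocab))
print(encode("one hazard.", restored))
```

Because the vocabulary is built from the captions, a tokenizer saved for one dataset only covers the words that appear in that dataset.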
51+
52+
All of this can be done with this batch file, which runs each of these commands in sequence:
53+
54+
```
55+
cd MM_Batch
56+
MM-data.bat
57+
```
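
The batch file is Windows-specific. On other platforms, the same sequence could be driven from Python, as in the sketch below. The script names and flags are copied from the commands above; the helper names and the dry-run behavior are hypothetical, and the script is assumed to be run from the repository root.

```python
import subprocess
import sys
from pathlib import Path

DATASETS = Path("datasets")

# (file name part, caption tag, tileset file, extra flags for the first step)
VARIANTS = [
    ("Full", "full", "MM.json", []),
    ("Simple", "simple", "MM_Simple_Tileset.json", ["--group_encodings"]),
]


def build_commands(python=sys.executable):
    """Build the six data-preparation commands in the same order as MM-data.bat."""
    create, caption, tokenize = [], [], []
    for name, tag, tileset, extra in VARIANTS:
        levels = str(DATASETS / f"MM_Levels_{name}.json")
        captions = str(DATASETS / f"MM_LevelsAndCaptions-{tag}-regular.json")
        tokenizer = str(DATASETS / f"MM_Tokenizer-{tag}-regular.pkl")
        create.append([python, "create_megaman_json_data.py", "--output", levels, *extra])
        caption.append([python, "MM_create_ascii_captions.py", "--dataset", levels,
                        "--tileset", str(DATASETS / tileset), "--output", captions])
        tokenize.append([python, "tokenizer.py", "save", "--json", captions,
                         "--pkl_file", tokenizer])
    return create + caption + tokenize


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


commands = build_commands()
run_pipeline(commands)  # dry run: prints the commands without executing them
```

Calling `run_pipeline(commands, dry_run=False)` would execute the steps for real.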
58+
Now you can browse level scenes and their captions with a command like this (the json file can be replaced by any levels and captions json file in datasets):
59+
```
60+
python ascii_data_browser.py datasets\MM_LevelsAndCaptions-full-regular.json datasets\MM.json
61+
```
62+
63+
64+
## Train unconditional diffusion model
65+
66+
To train an unconditional diffusion model without any text embeddings, run this command:
67+
```
68+
python train_diffusion.py --json datasets\\MM_LevelsAndCaptions-simple-regular.json --augment --output_dir MM_unconditional_simple0 --seed 0 --game MM-Simple
69+
```
70+
71+
## Train text encoder
72+
73+
Masked language modeling is used to train the text embedding model. Use any dataset with an appropriate tokenizer. The rest of the commands here default to the MM-Simple files, though both sub-games work fine.
74+
```
75+
python train_mlm.py --epochs 300 --save_checkpoints --json datasets\MM_LevelsAndCaptions-simple-regular.json --pkl datasets\MM_Tokenizer-simple-regular.pkl --output_dir MM-MLM-simple-regular --seed 0
76+
```
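
For readers unfamiliar with masked language modeling, the core idea is to hide some caption tokens and train the model to predict them. The sketch below shows only the masking step. The 15% default follows the common BERT convention and is an assumption here, as are the `[MASK]` token and the function name; `train_mlm.py` may use different settings.

```python
import random

MASK = "[MASK]"  # hypothetical mask token


def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Randomly replace tokens with MASK; labels keep the original token at masked positions."""
    rng = rng or random.Random()
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)   # the model must recover this token
        else:
            masked.append(tok)
            labels.append(None)  # position is ignored by the training loss
    return masked, labels


tokens = "two enemies . one hazard .".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3, rng=random.Random(0))
print(masked)
print(labels)
```

Training on many such masked captions is what lets the encoder learn useful embeddings for caption words.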
77+
78+
## Train text-conditional diffusion model
79+
80+
Now that the text embedding model is ready, train a diffusion model conditioned on text embeddings from the descriptive captions. Note that this can take a while. We used relatively modest consumer GPUs, so our models took about 12 hours to train:
81+
```
82+
python train_diffusion.py --pkl datasets\MM_Tokenizer-simple-regular.pkl --json datasets\\MM_LevelsAndCaptions-simple-regular.json --augment --mlm_model_dir MM-MLM-simple-regular --text_conditional --output_dir MM_conditional_simple_regular0 --seed 0 --game MM-Simple
83+
```
84+
If you care more about speed than seeing intermediate results, another trick is to set `--save_image_epochs` to a large number (larger than the number of epochs), like this:
85+
```
86+
python train_diffusion.py --pkl datasets\MM_Tokenizer-simple-regular.pkl --json datasets\\MM_LevelsAndCaptions-simple-regular.json --augment --mlm_model_dir MM-MLM-simple-regular --text_conditional --output_dir MM_conditional_simple_regular0 --seed 0 --game MM-Simple --save_image_epochs 100000
87+
```
88+
89+
This process, from creating the level sample files all the way to diffusion training, can be done with this batch file (it only trains and runs the Simple version):
90+
```
91+
cd MM_Batch
92+
MM_conditional.bat
93+
```
94+
95+
96+
## Generate levels from text-conditional diffusion model
97+
98+
To generate levels from a base caption, use this command:
99+
```
100+
python text_to_level_diffusion.py --model_path MM_conditional_simple_regular0 --game MM-Simple
101+
```
102+
An easier-to-use GUI will let you select and combine known caption phrases to send to the model. Note that the known phrases must come from the dataset you trained on.
103+
```
104+
python interactive_tile_level_generator.py --model_path MM_conditional_simple_regular0 --load_data datasets\\MM_LevelsAndCaptions-simple-regular.json --game MM-Simple
105+
```
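
The requirement that phrases come from the training dataset can be illustrated with a small sketch. It assumes captions are made of period-separated phrases, which is an assumption about the caption format, and the captions and function names are hypothetical. It is not how `interactive_tile_level_generator.py` is implemented.

```python
def known_phrases(captions):
    """Collect the unique period-separated phrases that appear in the captions."""
    phrases = set()
    for caption in captions:
        phrases.update(p.strip() for p in caption.split(".") if p.strip())
    return sorted(phrases)


def combine(selected, known):
    """Join selected phrases into one caption, rejecting phrases the model never saw."""
    unknown = [p for p in selected if p not in known]
    if unknown:
        raise ValueError(f"phrases not in training data: {unknown}")
    return ". ".join(selected) + "."


# Hypothetical captions standing in for the training dataset.
captions = ["one enemy. two platforms.", "one enemy. one hazard."]
known = known_phrases(captions)
prompt = combine(["one hazard", "one enemy"], known)
print(prompt)
```

A phrase outside the known set raises an error in this sketch, reflecting that the model has no training signal for wording it never saw.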

README.md

Lines changed: 4 additions & 2 deletions
@@ -419,5 +419,7 @@ within the Mario Diffusion directory.
419419

420420
[View LR_README.md](LR_README.md)
421421

422-
For more For more information regarding Mega Man, go to the file named (whatever Mega Man readme is named)
423-
within the Mario Diffusion directory.
422+
For more information regarding Mega Man, go to the file named `MM_README.md`
423+
within the Mario Diffusion directory.
424+
425+
[View MM_README.md](MM_README.md)
