HAML is an extension of YAML providing syntax to make parts of the file optional
or generate values.
This is particularly useful for generating YAML files defining the hyperparameters
of ML experiments. The repository also includes a simple runtime executor that can
expand one .hml file into concrete YAML configs and launch each config in its own
tmux session.
This package is currently not on PyPI and has to be installed directly from GitHub:
pip install git+https://github.com/lamarr-institute/haml.git

Alternatively, you can clone this repository and install it from there:
git clone git@github.com:lamarr-institute/haml.git
cd haml
pip install .

Choice lists are the most important and versatile syntax element of HAML files. They define blocks of text of which only one is selected for the resulting YAML file.
model:
loss: cross-entropy
dropout_p: {{ 0.0 || 0.05 }}
norm: channel
activation: elu
mlp_size: {{ 128 || 512 }}
name: rutime
channels: {{ ["E1-M2", "E2-M1"]
|| ["E1-M2", "E2-M1", "1-F", "1-2", "2-F"]
|| ["E1-M2", "E2-M1", "1-F", "1-2", "2-F", "Resp Rate", "Pulse Waveform", "Heart Rate"]
}}
By default, line breaks are preserved exactly as they occur in the HAML file, i.e., newline characters between the list item separators ({{, || and }}) are part of the items' content!
When not used carefully, this can result in unexpected line breaks in the generated YAML files.
The command line interface of HAML removes empty lines in the generated files by default.
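The clean-up step can be pictured with a small sketch. The helper `drop_empty_lines` below is hypothetical and only mimics the behaviour described above; it is not HAML's actual implementation.

```python
def drop_empty_lines(text: str) -> str:
    """Remove lines that are empty or contain only whitespace."""
    kept = [line for line in text.splitlines() if line.strip()]
    return "\n".join(kept) + "\n"


# Expanding a multi-line choice list can leave blank lines behind, e.g. when
# the selected item starts or ends with a newline character.
expanded = "model:\n\n  dropout_p: 0.05\n\n  mlp_size: 128\n"
cleaned = drop_empty_lines(expanded)
print(cleaned)
```

Passing `--keep-empty-lines` on the command line skips this kind of clean-up.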
An optional weight can be added after {{ or ||, separated from the main content by %. By default, each list item has weight 1.
When sampling randomly from a HAML file, the probability of choosing each item is given by its weight divided by the sum of all items' weights.
Consequently, assigning weight 2 to an item (e.g., through `||2% ...`) makes it twice as likely to be sampled as an item with the default weight 1.
The weights can be non-negative floats or integers.
channels: {{2% ["E1-M2", "E2-M1"]
||3% ["E1-M2", "E2-M1", "1-F", "1-2", "2-F"]
||5% ["E1-M2", "E2-M1", "1-F", "1-2", "2-F", "Resp Rate", "Pulse Waveform", "Heart Rate"]}}
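For the weights 2, 3 and 5 used in the channels example, the resulting probabilities can be checked with a few lines of Python. This is a minimal sketch using only the standard library; it illustrates the weight-to-probability rule and does not reflect how HAML samples internally (the seed and the number of draws are arbitrary choices).

```python
import random
from collections import Counter

# Weights taken from the channels example above (2%, 3% and 5%).
weights = [2, 3, 5]

# Probability of each item = its weight / sum of all weights.
total = sum(weights)
probabilities = [w / total for w in weights]
print(probabilities)  # [0.2, 0.3, 0.5]

# Draw item indices according to the weights and compare the observed
# frequencies with the expected probabilities.
rng = random.Random(0)  # arbitrary seed, for reproducibility only
draws = rng.choices(range(len(weights)), weights=weights, k=10_000)
counts = Counter(draws)
frequencies = [counts[i] / len(draws) for i in range(len(weights))]
print(frequencies)
```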
The weighted choice list can be used to create optional content blocks by adding an empty item:
{{
option: debug
||}}
Use [[ ... || ... ]] when a choice list should be expanded exhaustively even during random sampling:
optimizer: [[ adam || sgd || rmsprop ]]
lr: {{ 0.001 || 0.0003 || 0.0001 }}
batch_size: {{ 32 || 64 }}

With -s 20, HAML samples the regular {{ ... }} choices 20 times and combines
each random sample with every optimizer value, producing 20 * 3 = 60 YAML files.
Multiple [[]] lists are combined as a Cartesian product. When using --all,
{{}} and [[]] are both expanded exhaustively.
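The file counts for the optimizer example can be reproduced with `itertools.product`. This is a rough sketch of the counting rule only: the dictionaries and the helpers `count_sampled` and `count_all` are hypothetical names, and the sketch assumes that no duplicate files are removed.

```python
from itertools import product

# Choice lists from the example above.
exhaustive = {"optimizer": ["adam", "sgd", "rmsprop"]}              # [[ ... ]]
regular = {"lr": [0.001, 0.0003, 0.0001], "batch_size": [32, 64]}  # {{ ... }}


def count_sampled(num_samples: int) -> int:
    """Files for -s NUM: every sample is paired with each [[ ]] combination."""
    exhaustive_combos = list(product(*exhaustive.values()))
    return num_samples * len(exhaustive_combos)


def count_all() -> int:
    """Files for --all: Cartesian product over every choice list."""
    every_list = list(exhaustive.values()) + list(regular.values())
    return len(list(product(*every_list)))


print(count_sampled(20))  # 60
print(count_all())        # 18
```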
{{1-2%
key: foo
value: 3
||
key: bar
value: 4
||
key: baz
value: -29
}}
HAML can insert random values through a special syntax:
intensity: {{%normal(loc=10, scale=2)%}}
saturation: {{%uniform(low=0, high=10)%}}
num-scans: {{%integers(high=20)%}}
In place of normal or uniform, every method provided by a NumPy Generator object can be invoked this way (e.g., triangular or poisson), with keyword arguments provided in parentheses.
Note that random functions called without arguments still require parentheses (e.g., {{%normal()%}}).
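Because the names map onto NumPy Generator methods, the values can be previewed directly in Python. The sketch below uses a hypothetical helper `draw` that looks up a method by name; it only illustrates the idea and is not how HAML resolves the syntax. Note that it passes `low=0` explicitly for `integers`, since a direct call to `Generator.integers` expects a `low` argument.

```python
import numpy as np


def draw(rng: np.random.Generator, method: str, **kwargs):
    """Call the Generator method with the given name and keyword arguments."""
    return getattr(rng, method)(**kwargs)


rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only

intensity = draw(rng, "normal", loc=10, scale=2)
saturation = draw(rng, "uniform", low=0, high=10)   # drawn from [0, 10)
num_scans = draw(rng, "integers", low=0, high=20)   # high is exclusive
standard_normal = draw(rng, "normal")               # no arguments: defaults

print(intensity, saturation, num_scans, standard_normal)
```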
Write your HAML file using the syntax described above. The recommended file ending is .hml.
The haml package provides methods for parsing such files into a HAMLObject, which lets you (1) generate all possible YAML files matching the HAML file, or (2) sample random YAML files.
You can use the package either from the command line or within a Python script. The
command line also provides a lightweight runtime executor for launching generated
configs through tmux.
The haml package can be used from the command line using the following syntax:
usage: python -m haml [-h] [-d DIR] [-a] [-s NUM] [--seed SEED] [--rvlimit RVLIMIT]
                      [--keep-empty-lines]
                      file

positional arguments:
  file                  input HAML file

options:
  -h, --help            show this help message and exit
  -d DIR, --directory DIR
                        output directory for generated files
  -a, --all             generate all possible files
  -s NUM, --sample NUM  randomly sample files; `[[]]` lists multiply the output
  --seed SEED           random seed for sampling
  --rvlimit RVLIMIT     number of samples when running `all` on random variables with infinite
                        support
  --keep-empty-lines    do not remove empty lines from the output (this is done by default)

The runtime executor expands one or more .hml files into YAML configs, hashes
each generated YAML file, stores it under a temp directory, and launches one
tmux session per run.
usage: python -m haml run [-h] -r RUNTIME_CONFIG files [files ...]
Example:
python -m haml run experiments/train.hml \
  --runtime-config experiments/runtime.yml

Multiple HAML files can share one runtime config:
python -m haml run experiments/train-a.hml experiments/train-b.hml \
  --runtime-config experiments/runtime.yml

CPU-only parallel example:
python -m haml run experiments/train.hml --runtime-config experiments/runtime-cpu.yml

The runtime uses these rules:
- HAML files contain only script configs. Runtime settings live in a separate runtime YAML file passed with `--runtime-config`.
- A single runtime config is shared by all HAML files passed to one `run` command.
- If sampled configs contain `[[]]` lists, each random sample is combined with every exhaustive choice from those lists.
- The runtime computes `sha256(yaml_content)` and uses the resulting hex digest as the run id.
- Every launched script receives `--id <run_id>`.
- Generated configs are written to `<temp_dir>/<run_id>.yaml`. If `temp_dir` is not provided, HAML uses a folder below `tempfile.gettempdir()` such as `/tmp/haml-runs/<stem>`. Here, `<stem>` is the HAML filename without directory or extension. For example, `experiments/train-a.hml` uses `/tmp/haml-runs/train-a`.
- If `temp_dir` is set and multiple HAML files are passed, all generated configs and logs are written below that shared directory.
- `backend` defaults to `tmux`. Set `backend: direct` to run scripts directly without tmux.
- Each tmux-backed run continues to print in the tmux pane and additionally mirrors both stdout and stderr to `<temp_dir>/logs/<run_id>.log`.
- Direct runs write stdout and stderr to `<temp_dir>/logs/<run_id>.log` when logging is enabled, or inherit the terminal output when logging is disabled.
- `enable_logging: false` disables per-run log files.
- Failed tmux panes remain open for inspection after the command exits.
- If `skip` is enabled, existing `<run_id>.yaml` files are assumed to have been run successfully already. They are neither rewritten nor relaunched.
- If multiple CUDA devices are provided, the runtime schedules one active job per device. `cpu_workers` cannot be set together with `cuda_visible_devices`.
- If CUDA devices are omitted, `cpu_workers` controls how many CPU-only jobs are launched in parallel. If `cpu_workers` is omitted, runs are launched sequentially. In CPU mode, `CUDA_VISIBLE_DEVICES` is left unset.
- `progress_bar: true` shows one TQDM bar for multi-run execution and suppresses per-run launch/completion log messages.

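The run id and the default file locations described in these rules can be sketched as follows. The helpers `run_id_for` and `default_temp_dir` are hypothetical names, and the UTF-8 encoding before hashing is an assumption; the runtime's own implementation may differ in such details.

```python
import hashlib
import tempfile
from pathlib import Path


def run_id_for(yaml_content: str) -> str:
    # Assumption: the YAML text is encoded as UTF-8 before hashing.
    return hashlib.sha256(yaml_content.encode("utf-8")).hexdigest()


def default_temp_dir(haml_file: str) -> Path:
    # <stem> is the HAML filename without directory or extension.
    return Path(tempfile.gettempdir()) / "haml-runs" / Path(haml_file).stem


yaml_content = "model: model-a\nepochs: 100\n"
run_id = run_id_for(yaml_content)
temp_dir = default_temp_dir("experiments/train-a.hml")

config_path = temp_dir / f"{run_id}.yaml"        # <temp_dir>/<run_id>.yaml
log_path = temp_dir / "logs" / f"{run_id}.log"   # <temp_dir>/logs/<run_id>.log
print(config_path)
print(log_path)
```

Since the id is derived from the YAML content alone, two identical generated configs map to the same file, which is what allows `skip` to detect configs that already exist.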
Runtime config entries:
- `script` is required. It is the command to launch, either as a shell-style string such as `python train.py` or as a list such as `[python, train.py]`.
- `pass_config` defaults to `true`. When enabled, HAML passes `--config <temp_dir>/<run_id>.yaml --id <run_id>` to the script.
- `args` is optional and is only used when `pass_config: false`. It can be a mapping like `{lr: 0.001, batch-size: 64, use-ema: true}` or a list like `[--lr, "0.001", --batch-size, "64"]`.
- `env` is an optional mapping of environment variables for each launched process.
- `backend` defaults to `tmux`. Supported values are `tmux` and `direct`.
- `num_samples` optionally controls how many random HAML samples are generated. If omitted, all combinations are generated.
- `seed` optionally sets the random seed for HAML sampling.
- `rvlimit` optionally sets the expansion limit for random variables with infinite support when generating all combinations.
- `keep_empty_lines` defaults to `false`. Set it to `true` to keep blank lines in generated YAML configs.
- `temp_dir` optionally sets where generated configs and logs are written.
- `skip` defaults to `false`. Set it to `true` to skip generated config files that already exist on disk.
- `enable_logging` defaults to `true`. Set it to `false` to disable per-run log files. `no_log_file: true` is accepted as the inverse spelling.
- `log_level` defaults to `INFO`. Supported values are `DEBUG`, `INFO`, `WARNING`, and `ERROR`.
- `progress_bar` defaults to `false`. Set it to `true` to show one TQDM progress bar for multi-run execution instead of per-run scheduling log messages.
- `cuda_visible_devices` is an optional list. HAML schedules one active run per listed value and sets `CUDA_VISIBLE_DEVICES` for that run.
- `cpu_workers` optionally sets the number of CPU-only runs to execute in parallel. It cannot be combined with `cuda_visible_devices`.

Example generated YAML:
model: model-a
epochs: 100
use_ema: true
lr: 0.001

Example runtime YAML:
script: python train.py
num_samples: 8
env:
  WANDB_MODE: offline
cuda_visible_devices:
  - 0
  - 1
temp_dir: /tmp/haml-train
skip: true
progress_bar: true
log_level: INFO

CPU-only runtime YAML:
script: python train.py
num_samples: 8
cpu_workers: 4
temp_dir: /tmp/haml-train
skip: true

Direct runtime YAML:
script: python train.py
backend: direct
num_samples: 8
cpu_workers: 4
temp_dir: /tmp/haml-train

- Your script must accept an `--id` CLI argument.
- Your script is responsible for storing checkpoints, metrics, and any other results locally.
- The runtime supports `tmux` and `direct` backends with optional `CUDA_VISIBLE_DEVICES` assignment for GPU selection.
- Existing config files are treated as completed runs when `skip` is used. There is no separate result verification layer yet.

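A script launched with the default `pass_config: true` therefore has to handle both `--config` and `--id`. The following is a minimal sketch of such an entry point using `argparse`; the function names, the checkpoint naming scheme and the example argument values are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Example training entry point")
    parser.add_argument("--config", required=True,
                        help="path to the generated YAML config")
    parser.add_argument("--id", required=True,
                        help="run id assigned by the runtime")
    return parser.parse_args(argv)


def main(argv=None) -> str:
    args = parse_args(argv)
    # Derive result file names from the run id so that parallel runs do not
    # overwrite each other's checkpoints or metrics.
    checkpoint_name = f"checkpoint-{args.id}.pt"
    print(f"config={args.config} run_id={args.id} checkpoint={checkpoint_name}")
    return checkpoint_name


# Example invocation with explicit arguments; a real script would call
# main() without arguments so that argparse reads sys.argv.
main(["--config", "/tmp/haml-train/abc123.yaml", "--id", "abc123"])
```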
The following example shows how to parse a HAML file and work with the resulting HAMLObject.
Content of the HAML file foo.hml:
# A sample yaml file
company: spacelift
domain:
{{2%
- devops
||
- devsecops
}}
tutorial:
{{2-2%
- yaml:
    name: "YAML Ain't Markup Language"
    type: awesome
    born: 2001
||
- json:
    name: JavaScript Object Notation
    type: great
    born: 2001
||
- xml:
    name: Extensible Markup Language
    type: good
    born: 1996
}}
author: omkarbirade
published: true

Python script:
import haml
import numpy as np

# parse a HAML file
h = haml.parse_file('foo.hml')
rng = np.random.default_rng(2993644)

for i in range(10):
    # generate a random YAML string from the HAML file
    s = h.random(random_state=rng)
    # write to YAML file
    with open(f'foo_random_{i}.yaml', 'w') as f:
        f.write(s)

# generate all possible YAML files matching the HAML file
# (this can be a large number, so check before jampacking your hard disk!)
print(f'Generating {h.num_combinations()} files')
for i, s in enumerate(h.all()):
    with open(f'foo_all_{i}.yaml', 'w') as f:
        f.write(s)