augment-atoms is a tool for augmenting datasets of atomic configurations via a model-driven, GPU-accelerated, rattle-relax-repeat procedure.
For each structure in the starting dataset, augment-atoms uses the provided potential energy surface (PES) model to generate a "family tree" of new structures.
In the beginning, the tree consists of the single starting structure.
To generate a new "child" structure, augment-atoms:
- selects a "parent" structure from the tree,
- rattles the atomic positions and unit cell,
- relaxes using the PES model to get a new structure,
- labels the child structure with the PES model, and
- inserts the child structure into the tree.
For precise details of each of these steps, see the Details section below.
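The loop above can be sketched in a few lines of plain Python. This is only an illustration of the rattle-relax-repeat idea, not the tool's implementation: `toy_energy`, `toy_forces` and `augment` are hypothetical names, the PES is a stand-in harmonic well, parents are chosen uniformly as a simplification, and the relaxation is a fixed number of steepest-descent steps.

```python
import random


def toy_energy(positions):
    """Hypothetical stand-in PES: a harmonic well centred on the origin."""
    return 0.5 * sum(x * x for atom in positions for x in atom)


def toy_forces(positions):
    """Forces are the negative gradient of toy_energy."""
    return [[-x for x in atom] for atom in positions]


def augment(seed_positions, n_children, sigma=0.1, relax_steps=5, step=0.1, rng=None):
    """Grow a family tree of structures from a single seed structure."""
    rng = rng or random.Random(0)
    # the tree starts as the single starting structure
    tree = [{"positions": seed_positions, "energy": toy_energy(seed_positions), "parent": None}]
    for _ in range(n_children):
        # 1. select a parent (uniformly here, purely as a simplification)
        parent_index = rng.randrange(len(tree))
        parent = tree[parent_index]
        # 2. rattle the atomic positions with Gaussian noise
        child = [[x + rng.gauss(0.0, sigma) for x in atom] for atom in parent["positions"]]
        # 3. relax with a few steepest-descent steps along the forces
        for _ in range(relax_steps):
            forces = toy_forces(child)
            child = [
                [x + step * f for x, f in zip(atom, force)]
                for atom, force in zip(child, forces)
            ]
        # 4. label the child with the PES, and 5. insert it into the tree
        tree.append({"positions": child, "energy": toy_energy(child), "parent": parent_index})
    return tree


tree = augment([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], n_children=10)
```

Each entry records the index of its parent, so the list doubles as the family tree.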
pip install augment-atoms
This will install the augment-atoms command line tool (requires Python 3.9+; see pyproject.toml for the dependencies). Using uv is recommended: it installs augment-atoms with the correct dependencies in under 20 seconds starting from scratch.
There are no specific hardware requirements for augment-atoms. If a GPU is available, and the PES model supports it, the GPU will be used to accelerate structure generation. augment-atoms has been tested on both Linux and macOS.
augment-atoms config.yaml
where config.yaml is a YAML file containing the following:
data:
  # an ase-readable file containing the starting structures
  input: input.xyz
  # an ase-writeable path to append the new structures to
  output: output.xyz
config:
  # number of augmentations per starting structure
  n_per_structure: 10
  # the temperature
  T: 300 # units are Kelvin
  # the explore-vs-exploit trade-off (see below)
  beta: 0.5
  # the range of values from which to sample a
  # standard deviation to rattle with at each step
  sigma_range: [0.01, 0.1] # units are Å
  # the random seed to use (for reproducibility)
  seed: 42
  # the standard deviation of the cell perturbation
  # if null, no cell perturbation is applied
  cell_sigma: null # units are Å
  # the units of the energies generated by the PES model
  units: eV
  # the maximum force magnitude to relax to
  max_force: 30 # units are (energy / Å)
  # the minimum separation between atoms to consider
  min_separation: 0.5 # units are Å
  # the maximum number of relaxations to perform per iteration
  max_relax_steps: 20
  # the threshold for considering a structure too similar to the existing pool
  similarity_threshold: 0.1 # units are Å
model:
  # the calculator to use to generate the PES model
  calculator: +lennard_jones()
In-built options for the calculator are:
- a Lennard-Jones calculator:
  model:
    calculator: +lennard_jones()
- any model from the graph-pes package. If a GPU is available, it will be used to accelerate the PES model.
  model:
    calculator:
      +graph_pes_calculator:
        path: path/to/model.pt
Alternatively, you are free to point to any instance of an ase.Calculator object.
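As a minimal sketch of what such a calculator provider could look like, the hypothetical module below defines a function that returns ASE's built-in Lennard-Jones calculator. The file name, function name and parameter values are illustrative assumptions, and the module is assumed to be importable from the directory in which augment-atoms is run.

```python
# my_file.py (hypothetical file name)


def my_function():
    """Return an ase.Calculator instance to act as the PES model."""
    # imported inside the function so that importing this module stays cheap
    from ase.calculators.lj import LennardJones

    # illustrative argon-like parameters (epsilon in eV, sigma in Å);
    # replace these with values appropriate for your system
    return LennardJones(sigma=3.4, epsilon=0.0104)
```

Any other ase.Calculator subclass can be returned in the same way.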
If you have my_function in my_file.py that returns an ase.Calculator object, you can use it as follows:
model:
  calculator: +my_file.my_function()
To choose a new parent structure, we randomly sample from all structures in the tree, such that atom
where
To create a "child" from this parent structure, we perform the following transformation:
where
- $R$ are the atomic positions
- $C_0$ is the unit cell of the original seed structure
- $A \in \mathbb{R}^{3\times 3}$ has entries sampled from $\mathcal{N}(0, \sigma_{A})$ where $\sigma_{A} \in [0, \rm{cell \_ sigma}]$
- $B \in \mathbb{R}^{N \times 3}$ has entries sampled from $\mathcal{N}(0, \sigma_{B})$ where $\sigma_{B} \in \rm{sigma \_ range}$
In the case of isolated structures, we only rattle the positions (i.e.
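For the isolated-structure case, where only the positions are rattled, the noise step might look like the sketch below. It follows the config comments: a standard deviation is drawn from `sigma_range` and Gaussian noise with that width is added to every coordinate. Drawing the standard deviation uniformly is an assumption, and `rattle_positions` is a hypothetical helper, not part of the tool's API.

```python
import random


def rattle_positions(positions, sigma_range, rng):
    """Add Gaussian noise to every coordinate of every atom.

    positions: list of [x, y, z] coordinates in Å
    sigma_range: (low, high) bounds in Å, as in config.sigma_range
    """
    low, high = sigma_range
    # assumption: the standard deviation is drawn uniformly from the range
    sigma = rng.uniform(low, high)
    rattled = [[x + rng.gauss(0.0, sigma) for x in atom] for atom in positions]
    return rattled, sigma


rng = random.Random(42)
# rough water-like geometry, for illustration only
water = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
rattled, sigma = rattle_positions(water, (0.01, 0.1), rng)
```

Seeding the generator, as the `seed` config option does, makes the rattle reproducible.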
To relax the rattled child structure, we follow a scheme inspired by the Robbins-Monro algorithm, driven by energies and forces from the PES model.
Step
where config.max_force and where config.min_separation Å.
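To give a feel for a Robbins-Monro-style relaxation, the sketch below takes steepest-descent steps whose size decays as 1/n, a schedule whose sum diverges while the sum of its squares converges. It is a generic illustration rather than the tool's exact scheme: the harmonic `toy_forces` stands in for the PES model, `initial_step` is a made-up parameter, and the loop simply stops once the largest force magnitude is at or below `max_force`, or after `max_relax_steps` steps.

```python
import math


def toy_forces(positions, k=1.0):
    """Hypothetical stand-in for the PES model: a harmonic well at the origin."""
    return [[-k * x for x in atom] for atom in positions]


def max_force_magnitude(forces):
    """Largest per-atom force magnitude."""
    return max(math.sqrt(sum(f * f for f in force)) for force in forces)


def relax(positions, max_force, max_relax_steps, initial_step=0.5):
    """Steepest descent with a decaying step size; returns (positions, steps taken)."""
    steps_taken = 0
    for n in range(1, max_relax_steps + 1):
        forces = toy_forces(positions)
        # stop once every atom's force is at or below the threshold
        if max_force_magnitude(forces) <= max_force:
            break
        # decaying step size: initial_step / n
        step = initial_step / n
        positions = [
            [x + step * f for x, f in zip(atom, force)]
            for atom, force in zip(positions, forces)
        ]
        steps_taken = n
    return positions, steps_taken


relaxed, steps_taken = relax([[1.0, 0.0, 0.0]], max_force=0.3, max_relax_steps=20)
```

Because the relaxation stops at a force threshold rather than at a minimum, the children remain off-equilibrium structures.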
We include a stand-alone demo in the demo directory. It takes 3 water structures as input and uses a PaiNN model to generate and label 27 new structures, for a total of 30 structures. The structures and model are taken from this repo's sister repository, found here.
The demo directory has the following files:
- input.xyz contains 3 starting water structures
- config.yaml contains the configuration for the demo
- model.pt is a PaiNN model trained on water structures from ...
- output.xyz is the augmented dataset output.
To run this demo yourself:
# clone the repository
git clone https://github.com/jla-gardner/augment-atoms.git
cd augment-atoms/demo
# remove the output file if it exists
rm -rf output.xyz
# run the demo
augment-atoms config.yaml
This entire script took under 10 seconds on my M1 MacBook Pro.
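One quick way to check the result is to count the structures written to output.xyz. The helper below is a small dependency-free sketch that assumes the standard XYZ frame layout (an atom-count line, a comment line, then one line per atom); `count_xyz_frames` is a hypothetical name, not part of augment-atoms.

```python
from pathlib import Path


def count_xyz_frames(path):
    """Count the structures in an (extended) XYZ file.

    Each frame is assumed to be: a line holding the atom count N,
    one comment line, then N atom lines.
    """
    lines = Path(path).read_text().splitlines()
    frames, i = 0, 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # tolerate blank lines between frames
            continue
        n_atoms = int(lines[i])
        i += 2 + n_atoms  # skip the count line, the comment line and the atoms
        frames += 1
    return frames


# after running the demo, from within the demo directory:
# count_xyz_frames("output.xyz")
```

Reading the file with ase.io.read("output.xyz", index=":") is an alternative if you also want the structures themselves.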
If you use augment-atoms in your research, please cite the following pre-print:
@misc{Gardner-25-06,
title = {Distillation of Atomistic Foundation Models across Architectures and Chemical Domains},
author = {Gardner, John L. A. and du Toit, Daniel F. Thomas and Mahmoud, Chiheb Ben and Beaulieu, Zo{\'e} Faure and Juraskova, Veronika and Pa{\c s}ca, Laura-Bianca and Rosset, Louise A. M. and Duarte, Fernanda and Martelli, Fausto and Pickard, Chris J. and Deringer, Volker L.},
year = {2025},
number = {arXiv:2506.10956},
doi = {10.48550/arXiv.2506.10956},
}