Training on data with different number of atoms. #131

Arijit-Majumdar · 2025-02-07T18:54:07Z

Hi,
I am trying to train the HIP-NN model on DFT data for water. I have two datasets, one with 24 atoms and the other with 96 atoms. When I trained on the 24-atom data, I converted the positions, forces, energies and cell sizes into numpy arrays and used the hippynn.databases.Database class to read the arrays. So, the position array had shape [num configurations x 24 x 3]. How can I train the model on both the 24-atom and 96-atom datasets? Do I need to store the data in something other than numpy arrays?

Thanks


lubbersnick · 2025-02-07T20:11:33Z

The documentation is here, although it could be made more thorough. We accept pull requests!

We use padding. Every atom-wise array should have shape (n_sys, 96, ...). The species array should have zeros where there is no atom in that system.
e.g.

# species for one data point with 1 water, and one with 2 waters
z = [
    [8,1,1,0,0,0],
    [8,1,1,8,1,1],
]

For other atom-based arrays such as force and position, the values at entries where z=0 are irrelevant. The padding is removed while processing the NN and the loss function, so these values don't cost you significant computational time or change the meaning of the metrics. You could pad the whole array to length 300 atoms if you wanted.
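To illustrate the padding described above, here is a minimal numpy-only sketch. The helper name `pad_atoms` and the random stand-in data are hypothetical, not part of hippynn; the sketch only assumes that the atom axis is axis 1 and that 0 marks a missing atom in the species array.

```python
import numpy as np

def pad_atoms(arr, n_max):
    """Zero-pad axis 1 (the atom axis) of arr up to n_max entries.

    Hypothetical helper for illustration; not a hippynn function.
    """
    arr = np.asarray(arr)
    n_pad = n_max - arr.shape[1]
    if n_pad < 0:
        raise ValueError("array already has more atoms than n_max")
    # Pad only at the end of the atom axis; leave all other axes alone.
    widths = [(0, 0), (0, n_pad)] + [(0, 0)] * (arr.ndim - 2)
    return np.pad(arr, widths)  # default mode pads with constant zeros

# Stand-in data: 5 configurations with 24 atoms, 3 configurations with 96 atoms.
rng = np.random.default_rng(0)
water = np.array([8, 1, 1])
z_small = np.tile(water, (5, 8))     # species, shape (5, 24)
z_large = np.tile(water, (3, 32))    # species, shape (3, 96)
r_small = rng.normal(size=(5, 24, 3))
r_large = rng.normal(size=(3, 96, 3))

n_max = 96
species = np.concatenate([pad_atoms(z_small, n_max), pad_atoms(z_large, n_max)])
positions = np.concatenate([pad_atoms(r_small, n_max), pad_atoms(r_large, n_max)])
```

Forces would be padded the same way as positions. Per-system arrays such as energy and cell size have no atom axis, so presumably they only need to be concatenated in the same system order.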

You can also put all the oxygens first or however you like. The atom order will not affect anything at all. The one exception is finding pairs, which might currently depend on the total array length and/or the position of the padding. With systems of 96 atoms, this should not be a concern. I think that datasets in the ANI format require having the padding at the end of the array, rather than in an arbitrary place.
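If the padding ends up scattered through an array and you would rather have it at the end, a sketch along these lines would reorder each system. The function `padding_to_end` is a hypothetical helper written with plain numpy, not a hippynn routine, and it assumes every atom-wise array shares axis 0 (systems) and axis 1 (atoms) with the species array.

```python
import numpy as np

def padding_to_end(species, *atom_arrays):
    """Move z == 0 entries to the end of the atom axis for each system.

    Hypothetical helper for illustration. Returns a list:
    [reordered species, reordered atom_arrays...].
    """
    species = np.asarray(species)
    # Stable sort on the boolean "is padding" key: real atoms (False) come
    # first and keep their original relative order.
    order = np.argsort(species == 0, axis=1, kind="stable")
    out = [np.take_along_axis(species, order, axis=1)]
    for arr in atom_arrays:
        arr = np.asarray(arr)
        # Add trailing singleton axes so the index broadcasts over xyz etc.
        idx = order.reshape(order.shape + (1,) * (arr.ndim - 2))
        out.append(np.take_along_axis(arr, idx, axis=1))
    return out

# Example: the first system has padding interleaved with real atoms.
z = np.array([
    [8, 0, 1, 0, 1, 0],
    [8, 1, 1, 8, 1, 1],
])
r = np.arange(2 * 6 * 3, dtype=float).reshape(2, 6, 3)
z_sorted, r_sorted = padding_to_end(z, r)
```

The same permutation is applied to every array passed in, so positions and forces stay aligned with their species.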

lubbersnick closed this as completed Mar 6, 2025
