Hi,
I am trying to train the HIP-NN model on DFT data for water. I have two datasets: one with 24 atoms per configuration and one with 96. When I trained on the 24-atom data, I converted all the positions, forces, energies, and cell sizes into numpy arrays and used the hippynn.databases.Database class to read them, so the position array had shape [num_configurations x 24 x 3]. How can I train the model on both the 24-atom and 96-atom datasets? Do I need to store the data in something other than numpy arrays?
Thanks
The documentation is here, although it could be made more thorough. We accept pull requests!
We use padding. All atom-wise array dimensions should have shape (n_sys, 96, ...). The species array should contain zeros wherever a system has no atom in that slot.
e.g.
# species for one data point with 1 water and one with 2 waters
z = [
    [8, 1, 1, 0, 0, 0],
    [8, 1, 1, 8, 1, 1],
]
For other atom-based arrays such as forces and positions, the values at slots where z=0 are irrelevant. There is a system for removing the padding while processing the network and the loss function, so these values don't cost you significant computational time or change the meaning of the metrics; you could pad the whole array to length 300 atoms if you wanted.
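A minimal sketch of combining the two datasets this way, using plain numpy (the array names and sizes here are illustrative, not from hippynn itself): zero-pad the atom axis of the 24-atom arrays up to 96, then concatenate with the 96-atom arrays along the configuration axis.

```python
import numpy as np

# Hypothetical configuration counts for illustration.
n24, n96, n_max = 10, 5, 96

# Placeholder data standing in for your real arrays:
# species (n_sys, n_atoms) and positions (n_sys, n_atoms, 3).
z24 = np.tile([8, 1, 1] * 8, (n24, 1))    # 8 waters -> 24 atoms
r24 = np.random.rand(n24, 24, 3)
z96 = np.tile([8, 1, 1] * 32, (n96, 1))   # 32 waters -> 96 atoms
r96 = np.random.rand(n96, 96, 3)

def pad_atoms(arr, n_max):
    """Zero-pad the atom axis (axis 1) up to n_max atoms."""
    pad = n_max - arr.shape[1]
    widths = [(0, 0), (0, pad)] + [(0, 0)] * (arr.ndim - 2)
    return np.pad(arr, widths)  # pads with zeros by default

species = np.concatenate([pad_atoms(z24, n_max), z96])
positions = np.concatenate([pad_atoms(r24, n_max), r96])

print(species.shape)    # (15, 96)
print(positions.shape)  # (15, 96, 3)
```

Forces are padded the same way as positions; energies and cells have no atom axis and are simply concatenated. The combined arrays can then be passed to hippynn.databases.Database as usual.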
You can also put all the oxygens first, or order the atoms however you like; atom order will not affect anything at all. (I think datasets in the ANI format do require the padding at the end of the array, rather than in an arbitrary place.) The one exception is finding pairs, which might currently depend on the total array length and/or the position of the padding; with systems of 96 atoms, this should not be a concern.