
hippynn.pretraining doesn't have hierarchical_energy_initialization and problem with species data type. #124

Closed
Arijit-Majumdar opened this issue Jan 7, 2025 · 7 comments

Arijit-Majumdar commented Jan 7, 2025

Hi,
I am running into two issues while trying to train hippynn. I installed hippynn in my conda environment with
conda install -c conda-forge hippynn
However, when I try to run from hippynn.pretraining import hierarchical_energy_initialization, I get the following error:
(screenshot: import error traceback)

I checked the pretraining.py file and there is no hierarchical_energy_initialization in it. I haven't installed the dependencies from conda_environment.txt .

My next issue is with training hippynn on numpy arrays as the dataset. I have DFT data for water molecules with positions, species, and energies. I converted the numpy arrays into a database using hippynn.databases.Database, but when I try to train the model I get the following error:
(screenshot: data type error raised during training)

My species data, db_name = Z, has int32 data type, as shown here:
(screenshot showing the Z array with dtype int32)

Here is the code for creating the database

r = read_coords(data_path)   # positions, shape (2300, 72)
E = get_energy(data_path)    # energies, shape (2300,)
atom_type = np.ones((E.shape[0], Natoms), dtype=np.int32)  # default to H (Z=1)
atom_type[:, :8] = 8         # first 8 atoms are O (Z=8)

I have a total of 24 atoms (8 water molecules) and 2300 configurations. r is the position array of shape (2300 x 72), E is the energy array of shape (2300,), and atom_type is the atomic number array of shape (2300 x 24).
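The intended shapes can be sketched with synthetic numpy stand-ins (the reshape to (n_sys, n_atoms, 3) is an assumption about what hippynn expects for positions; the helper names in the original code are the poster's own):

```python
import numpy as np

# Synthetic stand-ins for the real data: 2300 configs, 24 atoms (8 O + 16 H)
n_sys, n_atoms = 2300, 24
r_flat = np.zeros((n_sys, n_atoms * 3))  # as read from file: (2300, 72)
E = np.zeros(n_sys)                      # energies: (2300,)

# Positions are usually expected as (n_sys, n_atoms, 3), not flattened
R = r_flat.reshape(n_sys, n_atoms, 3)

# Species: default to H (Z=1), first 8 columns set to O (Z=8)
Z = np.ones((n_sys, n_atoms), dtype=np.int64)
Z[:, :8] = 8
```

Note that the species array spans all 24 atoms per configuration, so its shape is (2300, 24).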

After this I followed the steps from barebones.py and used the following lines to create nodes and losses

network_params = {
    "possible_species": [0, 1, 8],  # Z values of the elements in QM7
    "n_features": 20,  # Number of neurons at each layer
    "n_sensitivities": 20,  # Number of sensitivity functions in an interaction layer
    "dist_soft_min": 1.6,  # qm7 is in Bohr!
    "dist_soft_max": 10.0,
    "dist_hard_max": 12.5,
    "n_interaction_layers": 2,  # Number of interaction blocks
    "n_atom_layers": 3,  # Number of atom layers in an interaction block
}
species = inputs.SpeciesNode(db_name="Z")
positions = inputs.PositionsNode(db_name="R")

network = networks.Hipnn("hipnn_model", (species, positions), module_kwargs=network_params)
henergy = targets.HEnergyNode("HEnergy", network, db_name="T")

mse_energy = loss.MSELoss.of_node(henergy)
mae_energy = loss.MAELoss.of_node(henergy)
rmse_energy = mse_energy ** (1 / 2)
validation_losses = {
    "RMSE": rmse_energy,
    "MAE": mae_energy,
    "MSE": mse_energy,
}

training_modules, db_info = hippynn.experiment.assemble_for_training(mse_energy, validation_losses)

After this I used 100 configurations to create the database.

Nconfig = 100

data_dict = {
    "R":r[:Nconfig,:],
    "Z":atom_type[:Nconfig,:],
    "T":E[:Nconfig],
}

database = Database(data_dict, inputs = ["Z","R"], targets=["T"], test_size = 0.1, valid_size = 0.1, seed = 1000)

From here I used the training code from barebones.py and ran into the data type issue.

Please let me know what I can do to resolve these issues.

@lubbersnick
Collaborator

Hi, thanks for getting in touch!

For the first problem with hierarchical_energy_initialization: install the library from source. You can use your existing conda environment, download the repository, and then run pip install -e . in the repository directory. Our conda package is a bit behind (something for us to address), so you're using an older version of the code. In that version the function is called set_e0_values, in case you just want to use that.

For the second problem, you want to set the database inputs and targets using the output of assemble_for_training (see the docs here). This yields a db_info object which describes the arrays that are inputs to the model ('inputs') and inputs to the loss function ('targets'), including the order of these arrays. The error you're seeing is most likely because the underlying model is set up to expect inputs=['R','Z'] whereas you have manually specified inputs=['Z','R'], so the positions are being fed in where the species are used, which causes the data type error. The fix is to pipe the inputs and targets in using **db_info, as given here. If you want to build a database and later change the inputs and targets, you can do so by setting the attributes, e.g. database.inputs = db_info['inputs']
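The **db_info pattern can be sketched with a stand-in function (make_database here is hypothetical, only to show the call shape; the real class is hippynn.databases.Database, and the key names are assumed from the usage above):

```python
# Hypothetical stand-in for hippynn.databases.Database, to show how
# **db_info forwards the model-determined 'inputs' and 'targets' lists.
def make_database(arr_dict, inputs, targets, **kwargs):
    return {"inputs": inputs, "targets": targets, **kwargs}

# db_info as returned by assemble_for_training (assumed structure)
db_info = {"inputs": ["R", "Z"], "targets": ["T"]}

# Unpacking db_info means the ordering comes from the model, not typed by hand
db = make_database({}, **db_info, test_size=0.1, valid_size=0.1, seed=1000)
```

The point of the unpacking is that the input ordering is determined once, by the assembled model, and cannot drift out of sync with a hand-written list.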

Finally, as a small remark: depending on your programming perspective, it may also help to read the ani1x_training.py example, which organizes a training code into separate pieces, demonstrates how those pieces interact, and shows a wider set of options that can be explored.

Please let us know if these comments resolve your problems, thanks!

@Arijit-Majumdar
Author

Thanks for the help! I managed to run the training by reorganizing the inputs. I just have a follow-up question. My training data is in a periodic domain. I have 2300 configurations, and each configuration has a different cell size, so I have a 2300 x 3 x 3 array containing the cell vectors. The documentation says that I need a cell node to incorporate periodic BCs and that the cell data should have shape n_atoms x 3 x 3. Is it possible to have different cells for different configurations, or do all configurations need the same cell?

@lubbersnick
Collaborator

lubbersnick commented Jan 9, 2025

Ah, you found a typo in our documentation! You have the right shape already; the docs should say (n_sys, 3, 3), and we will fix that. Then you can instantiate a model like in this example.
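To confirm the shape, here is a minimal numpy sketch that stacks one 3x3 cell matrix per configuration (the cubic boxes and edge lengths are synthetic, just to illustrate per-configuration cells):

```python
import numpy as np

n_sys = 2300
# Hypothetical cubic boxes with varying edge length, one per configuration
edges = np.linspace(12.0, 13.0, n_sys)
cells = edges[:, None, None] * np.eye(3)[None, :, :]  # shape (n_sys, 3, 3)
```

Each cells[i] is the full cell matrix for configuration i, so varying cells across configurations is just the first axis of this array.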

By the way, if your system does not have a symmetric cell matrix, then please pay attention to the convention in that same paragraph describing which version of the cell we use. If you get crazy results, it could be that the cell has the wrong transposition.

@Arijit-Majumdar
Author

Thanks for your prompt response. I managed to include the periodic BC. I have one last question. I keep getting this warning message during the training process
(screenshot: warning about interatomic distances below the minimum threshold)

I am training on water molecules and the O-H bond is typically 1 Å in length. Is it possible to adjust the minimum threshold? Does it affect the ML model? I noticed a warn_if_under(distance, threshold) function in the documentation but I am not sure how to use it to change the threshold.

Thanks again for the help!

@lubbersnick
Collaborator

It looks like you are using hyperparameters from barebones.py, where the model is fit to the QM7 dataset, which is given in Bohr rather than Angstroms. So you are specifying a minimum distance sensitivity of 1.6, which is too large for data in Angstroms. Hippynn is unit-transparent, so any dimensionful hyperparameters operate with respect to the units in your dataset. A baseline general set of sensitivity hyperparameters in Angstroms is here. If you are looking to model more near-equilibrium phenomena, you can maybe get away with a larger lower cutoff, like 0.85 Angstrom.
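Rescaled to Angstroms, the distance hyperparameters from the earlier snippet would look something like this (illustrative values only, not an official baseline; 0.85 follows the suggestion above, and the soft/hard maxima are guesses scaled down from the Bohr values):

```python
# Hyperparameter sketch in Angstroms (values are illustrative assumptions)
network_params = {
    "possible_species": [0, 1, 8],  # H and O for water
    "n_features": 20,
    "n_sensitivities": 20,
    "dist_soft_min": 0.85,  # Angstrom; below the ~1.0 A O-H bond length
    "dist_soft_max": 5.0,   # illustrative
    "dist_hard_max": 7.5,   # illustrative
    "n_interaction_layers": 2,
    "n_atom_layers": 3,
}
```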

Neural network potentials will typically go crazy outside of their training data - for example, if some high-energy configuration is constructed and atoms come very close, like 0.5 A away from each other - most datasets do not cover this kind of regime. HIP-NN networks are designed not to pass messages below some threshold distance, and the package warns you when your system contains distances too short for the network to be expected to produce sensible answers.

@Arijit-Majumdar
Author

Thanks for all the help!

@lubbersnick
Collaborator

You're welcome!
