Add cgnf embeddings #145

Merged
merged 3 commits into from
Jun 24, 2024

8 changes: 8 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -166,6 +166,14 @@ Please use the [issue tracker](https://github.com/WMD-group/ElementEmbeddings/is

We welcome new contributions to this project. See [the contributing guide](contributing.md) for detailed instructions on how to contribute to our project.

### Add an embedding scheme

The steps required to add a new representation scheme are:
1. Add data file to [data/element_representations](src/elementembeddings/data/element_representations).

Markdown formatting issue: Lists should be surrounded by blank lines

Ensure that lists are surrounded by blank lines to adhere to Markdown best practices and improve readability.

- 1. Add data file to [data/element_representations](src/elementembeddings/data/element_representations).
+ 
+ 1. Add data file to [data/element_representations](src/elementembeddings/data/element_representations).
+ 
Tools
Markdownlint

172-172: null (MD032, blanks-around-lists)
Lists should be surrounded by blank lines

2. Edit docstring table in [core.py](src/elementembeddings/core.py).
3. Edit [utils/config.py](src/elementembeddings/utils/config.py) to include the representation in `DEFAULT_ELEMENT_EMBEDDINGS` and `CITATIONS`.
4. Add the representation to the documentation in the [reference.md](docs/reference.md) file.
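
Step 3 is the one part of this list that can be checked mechanically. The following is a minimal, self-contained sketch under stated assumptions: the two dictionaries are abbreviated stand-ins for `DEFAULT_ELEMENT_EMBEDDINGS` and `CITATIONS` in `utils/config.py`, the `my_scheme` entry and its citation are hypothetical, the restriction to `.json` and `.csv` files is inferred from the existing entries, and `check_registration` is a helper name invented here rather than part of the package.

```python
from typing import Dict, List

# Abbreviated stand-ins for the dictionaries in utils/config.py (illustrative only).
DEFAULT_ELEMENT_EMBEDDINGS: Dict[str, str] = {
    "xenonpy": "xenonpy_element_features.csv",
}
CITATIONS: Dict[str, List[str]] = {
    "xenonpy": [],
}

# Step 3: register a hypothetical scheme in both dictionaries.
DEFAULT_ELEMENT_EMBEDDINGS["my_scheme"] = "my_scheme.json"
CITATIONS["my_scheme"] = [
    "@article{doe2024example,"
    "title={A hypothetical embedding},"
    "author={Doe, Jane},"
    "year={2024}}",
]


def check_registration(name: str) -> List[str]:
    """Return a list of problems found for one registered scheme."""
    problems = []
    filename = DEFAULT_ELEMENT_EMBEDDINGS.get(name)
    if filename is None:
        problems.append(f"{name}: missing from DEFAULT_ELEMENT_EMBEDDINGS")
    elif not filename.endswith((".json", ".csv")):
        # Assumption: data files are JSON or CSV, as in the existing entries.
        problems.append(f"{name}: unexpected data file type {filename!r}")
    if name not in CITATIONS:
        problems.append(f"{name}: missing from CITATIONS")
    else:
        for entry in CITATIONS[name]:
            # A complete BibTeX entry closes every brace it opens.
            if entry.count("{") != entry.count("}"):
                problems.append(f"{name}: unbalanced braces in citation")
    return problems


all_problems = [p for scheme in DEFAULT_ELEMENT_EMBEDDINGS
                for p in check_registration(scheme)]
print(all_problems or "all registered schemes look consistent")
```

A check of this kind could run as a unit test so that a scheme registered in one dictionary but not the other, or a citation with a missing closing brace, is caught before merging.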

### Developer

* [Anthony Onwuli](https://github.com/AntObi) (Department of Materials, Imperial College London)
23 changes: 15 additions & 8 deletions docs/reference.md
@@ -1,6 +1,6 @@
# Elemental Embeddings

The data contained in this repository are a collection of various elemental representation/embedding schemes. We provide the literature source for these representations as well as the data source from which the files were obtained. A majority of these representations have been obtained from the following repositories:
The data contained in this repository are a collection of various elemental representation/embedding schemes. We provide the literature source for these representations as well as the data source from which the files were obtained. Some representations have been obtained from the following repositories:

* [lrcfmd/ElMD](https://github.com/lrcfmd/ElMD/tree/master)
* [Kaaiian/CBFV](https://github.com/Kaaiian/CBFV/tree/master)
@@ -24,7 +24,14 @@ We included `atomic` as a linear representation to generate one-hot vectors corr

The following representations are all vector representations (some are local, some are distributed) and the `Embedding` class will load these representations as they are.

### Magpie
### cgnf

The following paper describes how the composition graph neural fingerprint (cgnf) is built from the node embedding vectors of a pre-trained crystal graph convolutional neural network:
[Synthesizability of materials stoichiometry using semi-supervised learning](https://www.sciencedirect.com/science/article/pii/S2590238524002273)

[Data source](https://github.com/kaist-amsg/Synthesizability-stoi-CGNF/blob/main/cgcnn_hd_rcut4_nn8.element_embedding.json)
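
As a rough illustration of what such a data file provides, the sketch below parses a small JSON mapping from element symbols to vectors and checks that all vectors have the same length. The layout (symbol keys, list-of-floats values), the three-dimensional toy values, and the `load_element_vectors` helper are assumptions made for this example; the real file may be organised differently and its vectors are longer.

```python
import json
from typing import Dict, List

# Toy data standing in for an element-embedding JSON file (values are made up).
TOY_JSON = '{"H": [0.1, -0.2, 0.3], "O": [0.0, 0.5, -0.4]}'


def load_element_vectors(text: str) -> Dict[str, List[float]]:
    """Parse a JSON object mapping element symbols to equal-length vectors."""
    data = json.loads(text)
    lengths = {len(vec) for vec in data.values()}
    if len(lengths) > 1:
        raise ValueError(f"inconsistent vector lengths: {sorted(lengths)}")
    return {symbol: [float(x) for x in vec] for symbol, vec in data.items()}


vectors = load_element_vectors(TOY_JSON)
print(sorted(vectors))    # element symbols present in the toy data
print(len(vectors["O"]))  # dimensionality shared by every vector
```

Within the package itself, the data is reached by name instead: the `load_data` function in `core.py` takes an embedding name, and this pull request registers the name `cgnf`.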

### magpie

The following paper describes the details of the Materials Agnostic Platform for Informatics and Exploration (Magpie) framework:
[A general-purpose machine learning framework for predicting properties of inorganic materials](https://www.nature.com/articles/npjcompumats201628)
@@ -66,21 +73,21 @@ The following paper describes the implementation of mat2vec:

[Data source](https://github.com/Kaaiian/CBFV/blob/master/cbfv/element_properties/mat2vec.csv)

### MatScholar
### matscholar

The following paper describes the natural language processing implementation of Materials Scholar (matscholar):
[Named Entity Recognition and Normalization Applied to Large-Scale Information Extraction from the Materials Science Literature](https://pubs.acs.org/doi/abs/10.1021/acs.jcim.9b00470)

[Data source](https://github.com/lrcfmd/ElMD/blob/master/ElMD/el_lookup/matscholar.json)

### MEGnet
### megnet

The following paper describes the details of the construction of the MatErials Graph Network (MEGNet):
[Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals](https://doi.org/10.1021/acs.chemmater.9b01294)
[Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals](https://doi.org/10.1021/acs.chemmater.9b01294). The 16-dimensional vectors are drawn from the atomic weights of a model trained to predict the formation energies of crystalline materials.

[Data source](https://github.com/lrcfmd/ElMD/blob/master/ElMD/el_lookup/megnet16.json)

### Oliynyk
### oliynyk

The following paper describes the details:
[High-Throughput Machine-Learning-Driven Synthesis of Full-Heusler Compounds](https://pubs.acs.org/doi/full/10.1021/acs.chemmater.6b02724)
@@ -139,7 +146,7 @@ The 44 features of the embedding vector are formed of the following properties:

* `oliynyk_sc` is a scaled version of the oliynyk embeddings: [Data source](https://github.com/lrcfmd/ElMD/blob/master/ElMD/el_lookup/oliynyk_sc.json)

### Random
### random

This is a set of 200-dimensional vectors whose components are randomly generated.
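
A dependency-free sketch of such a table is given here, assuming one row per element (118 rows), a standard normal distribution, and a fixed seed. It is a stand-in built on Python's `random` module, not the code used to produce the data file, so its numbers will differ from those of a NumPy generator with the same seed; `make_random_table` is a name invented for this example.

```python
import random
from typing import List


def make_random_table(n_rows: int = 118, n_cols: int = 200,
                      mu: float = 0.0, sigma: float = 1.0,
                      seed: int = 42) -> List[List[float]]:
    """Return n_rows vectors of n_cols normally distributed components."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    return [[rng.gauss(mu, sigma) for _ in range(n_cols)]
            for _ in range(n_rows)]


table = make_random_table()
print(len(table), len(table[0]))     # rows (elements) and columns (dimensions)
print(table == make_random_table())  # True: the same seed gives the same table
```

Fixing the seed is what makes a random embedding usable as a baseline: every run compares against the same vectors.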

@@ -152,7 +159,7 @@ mu , sigma = 0 , 1 # mean and standard deviation s = np.random.normal(mu, sigma,
s = np.random.default_rng(seed=42).normal(mu, sigma, (118,200))
```

### SkipAtom
### skipatom

The following paper describes the details:
[Distributed representations of atoms and materials for machine learning](https://www.nature.com/articles/s41524-022-00729-3)
2 changes: 1 addition & 1 deletion requirements.txt
@@ -1,4 +1,4 @@
numpy >= 1.23.3
numpy >= 1.23.3,<2
scipy >=1.10.1
pymatgen > 2022.9.21
seaborn >=0.13.0
2 changes: 1 addition & 1 deletion setup.py
@@ -28,7 +28,7 @@
},
test_suite="elementembeddings.tests.test",
install_requires=[
"numpy>=1.23.3",
"numpy>=1.23.3,<2",
"scipy>=1.10.1",
"pymatgen>2022.9.21",
"seaborn>=0.13.0",
5 changes: 3 additions & 2 deletions src/elementembeddings/core.py
@@ -52,14 +52,15 @@ def load_data(embedding_name: Optional[str] = None):
| Mat2Vec | mat2vec |
| Matscholar | matscholar |
| Megnet (16 dimensions) | megnet16 |
| Modified pettifor scale | mod_petti |
| Modified Pettifor scale | mod_petti |
| Oliynyk | oliynyk |
| Oliynyk (scaled) | oliynyk_sc |
| Random (200 dimensions) | random_200 |
| SkipAtom | skipatom |
| Atomic Number | atomic |
| Crystallm | crystallm |
| CrystaLLM | crystallm |
| XenonPy | xenonpy |
| Cgnf | cgnf |


Args:

Large diffs are not rendered by default.

13 changes: 13 additions & 0 deletions src/elementembeddings/utils/config.py
@@ -13,6 +13,7 @@
"atomic": "atomic.json",
"crystallm": "crystallm_v24c.dim512_atom_vectors.csv",
"xenonpy": "xenonpy_element_features.csv",
"cgnf": "cgnf.json",
}

CITATIONS = {
@@ -146,6 +147,18 @@
"year={2023}}",
],
"xenonpy": [],
"cgnf": [
"@article{jang2024synthesizability,"
"title={Synthesizability of materials stoichiometry "
"using semi-supervised learning},"
"author={Jang, Jidon and Noh, Juhwan and Zhou, Lan "
"and Gu, Geun Ho and Gregoire, John M and Jung, Yousung},"
"journal={Matter},"
"volume={7},"
"number={6},"
"pages={2294--2312},"
"year={2024}}",
],
}

ELEMENT_GROUPS_PALETTES = {