Materials-Aware Large Language Models (LLMs) are transforming materials science by automating complex tasks that traditionally relied on human expertise. These models support everything from data extraction and property prediction to inverse design, synthesis planning, and self-driving laboratories.
A non-exhaustive progression of LLMs tailored for materials science, highlighting key milestones within each general model family.

Here, we provide a curated, non-exhaustive list of research papers that showcase the applications of LLMs in advancing materials science.
Entries in each table are sorted by release date, newest first.
LLMs for data extraction can process text, images, tables, and graphs from scientific literature, converting unstructured information into structured data, which is essential for building comprehensive materials databases.
Name & Link | Models | Material Types | Release Date | Journal |
---|---|---|---|---|
MagBERT (Kumar et al.) | BERT | Magnesium | 2024.08 | Materials Today Communications |
MagBERT (Zhumabayeva et al.) | BERT | Magnetic | 2024.07 | The Journal of Physical Chemistry C |
ChemREL | RoBERTa | Chemicals | 2024.07 | Journal of Chemical Information and Modeling |
LLaMat | LLaMA-2-7B | Crystal | 2024.07 | OpenReview |
MaTableGPT | GPT-4 | Electrocatalysts | 2024.06 | arXiv |
MatGPT | GPT-3, LLaMA-7B | Solar cell | 2024.05 | Cell Reports Physical Science |
LLaMP | GPT-3.5, GPT-4 | General materials | 2024.01 | arXiv |
MatSci-LumEn | GPT-3.5, GPT-4 | General materials | 2024.01 | GitHub |
MatSciRE | BERT, RoBERTa | General materials | 2024.01 | arXiv |
ACE | Transformer | Single-atom heterogeneous catalysts | 2023.12 | Nature Communications |
MechGPT | OpenOrca-Platypus2-13B | Materials failure | 2023.10 | arXiv |
DARWIN | LLaMA-7B | Solar cell | 2023.08 | arXiv |
GPT Chemistry Assistant | GPT-3.5, GPT-4 | MOF | 2023.08 | Journal of the American Chemical Society |
Recycle-BERT | BERT | Recycling plastic | 2023.08 | ACS Sustainable Chemistry & Engineering |
GPT-MLP | GPT-3, GPT-3.5, GPT-4 | Solid-state, doped semiconductors, gold nanoparticle | 2023.08 | Communications Materials |
MatSci-NLP | BERT | General materials | 2023.05 | arXiv |
ChatExtract | GPT-3.5, GPT-4 | High entropy alloys | 2023.03 | Nature Communications |
OpticalBERT | BERT | Optical | 2023.03 | Journal of Chemical Information and Modeling |
BatteryDataExtractor | BERT | Battery | 2022.09 | Chemical Science |
MaterialsBERT | BERT | Polymer | 2022.09 | npj Computational Materials |
BatteryBERT | BERT | Battery | 2022.05 | Journal of Chemical Information and Modeling |
MatBERT | BERT | Solid-state, doped semiconductors, gold nanoparticle | 2022.04 | Patterns |
MatSciBERT | BERT | Solid oxide fuel cells | 2021.09 | npj Computational Materials |
ChemRxnExtractor | BERT | Chemical Reaction | 2021.06 | Journal of Chemical Information and Modeling |
ChemBERT | BERT | Chemical Reaction | 2021.06 | Journal of Chemical Information and Modeling |
RXNMapper | Transformer | Chemical reactions | 2021.04 | Science Advances |
RXN4Chemistry | Transformer | Chemical reactions | 2019.12 | GitHub |
SciBERT | BERT | General scientific text | 2019.03 | arXiv |
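
To make the goal of the tools above concrete, the following minimal sketch turns free-text sentences into structured records. It is an illustration only: a regular expression stands in for the language model, and the `PropertyRecord` type, the `extract_records` function, and the sample sentences are hypothetical names chosen for this example, not part of any listed tool.

```python
import re
from dataclasses import dataclass


@dataclass
class PropertyRecord:
    material: str
    prop: str
    value: float
    unit: str


# Matches sentences of the form "The <property> of <material> is <value> <unit>".
# This rule is a stand-in for the LLM; real extractors handle far more varied phrasing.
PATTERN = re.compile(
    r"The (?P<prop>[\w\s]+?) of (?P<material>[A-Za-z0-9]+) is "
    r"(?P<value>-?\d+(?:\.\d+)?) (?P<unit>[A-Za-z/%]+)"
)


def extract_records(text: str) -> list[PropertyRecord]:
    """Return one structured record per matching sentence in the text."""
    return [
        PropertyRecord(
            material=m.group("material"),
            prop=m.group("prop").strip(),
            value=float(m.group("value")),
            unit=m.group("unit"),
        )
        for m in PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    sample = "The band gap of TiO2 is 3.2 eV. The melting point of Al is 660.3 C."
    for record in extract_records(sample):
        print(record)
```

Records in this shape can be appended directly to a table or database, which is the step that makes the extracted literature data searchable.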
LLMs for data mining support advanced querying, knowledge graph construction, and answering complex questions within materials science.
Name | Models | Material Types | Release Date | Journal |
---|---|---|---|---|
SciQAG | vicuna-7b-v1.5-16k | Question-answering | 2024.05 | arXiv |
BatteryGPT | ChatGPT | Question-answering | 2024.03 | Cell Reports Physical Science |
MatKG | BERT | Knowledge graph | 2024.01 | Scientific Data |
LitLLM | GPT-3.5, GPT-4 | Literature Review | 2023.12 | arXiv |
PaperQA | GPT-3.5, GPT-4 | Question-answering | 2023.12 | arXiv |
LitQA | GPT-3.5, GPT-4 | Question-answering | 2023.10 | arXiv |
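
As a rough picture of knowledge graph construction and querying, the sketch below indexes (subject, relation, object) triples and answers a simple question over them. The triples, relation names, and the helpers `build_graph` and `subjects_with` are assumptions made for illustration; in the tools above, the triples would come from a model reading the literature.

```python
from collections import defaultdict

# Hypothetical triples, in the form an upstream extraction step might produce.
TRIPLES = [
    ("LiFePO4", "used_in", "Li-ion battery cathode"),
    ("LiFePO4", "has_structure", "olivine"),
    ("LiCoO2", "used_in", "Li-ion battery cathode"),
    ("TiO2", "used_in", "photocatalysis"),
]


def build_graph(triples):
    """Index triples as graph[subject][relation] -> set of objects."""
    graph = defaultdict(lambda: defaultdict(set))
    for subject, relation, obj in triples:
        graph[subject][relation].add(obj)
    return graph


def subjects_with(graph, relation, obj):
    """Answer 'which entities have <relation> to <obj>?' by scanning the index."""
    return sorted(s for s, rels in graph.items() if obj in rels.get(relation, ()))


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    print(subjects_with(graph, "used_in", "Li-ion battery cathode"))
```

A nested dictionary is enough for a handful of triples; a graph database or a library such as NetworkX would be the more natural choice at the scale of a full literature corpus.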
LLMs assist in predicting various properties of materials, helping researchers design new materials with targeted characteristics.
Name | Models | Material Types | Release Date | Journal |
---|---|---|---|---|
MolecularGPT | T5 | Organic molecule | 2024.06 | arXiv |
ChatMOF | GPT-4, GPT-3.5 | MOF | 2024.06 | Nature Communications |
ChemLLM | InternLM2-Base-7B | Organic molecule | 2024.04 | arXiv |
AlloyBERT | RoBERTa | Alloy | 2024.03 | arXiv |
CrystalLLM (Gruver et al.) | LLaMA-2 70B | Inorganic | 2024.02 | arXiv |
GPTChem | GPT-3 | Organic molecule | 2024.02 | Nature Machine Intelligence |
LLaMP | GPT-3.5, GPT-4 | Crystal | 2024.01 | arXiv |
PolyNC | T5 | Polymer | 2023.12 | Chemical Science |
FG-BERT | BERT | Organic molecule | 2023.11 | Briefings in Bioinformatics |
LLM-Prop | T5 | Crystalline Solids | 2023.10 | arXiv |
GPT-MolBERTa | BERT, RoBERTa | Organic molecule | 2023.09 | arXiv |
CatBERTa | RoBERTa | Catalyst | 2023.09 | ACS Catalysis |
DARWIN | LLaMA-7B | Thermoelectric | 2023.08 | arXiv |
GIMLET | T5 | Thermoelectric | 2023.08 | arXiv |
MolRoPE-BERT | T5 | Organic molecule | 2023.07 | Journal of Molecular Graphics and Modelling |
BERTOS | BERT | Inorganic | 2022.11 | Advanced Science |
SolvBERT | BERT | Solvent | 2022.10 | Digital Discovery |
PolyBERT | DeBERTa | Polymer | 2022.09 | Nature Communications |
ChemBERTa | RoBERTa | Organic molecule | 2022.08 | arXiv |
ChemGPT | GPT-Neo | Organic molecule | 2022.05 | Nature Machine Intelligence |
Mol-BERT | BERT | Organic molecule | 2022.05 | Journal of Chemistry |
ChemBERTa | RoBERTa | Organic molecule | 2022.03 | arXiv |
SMILES-BERT | BERT | RT | 2019.09 | Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics |
LLMs contribute to generating new material structures, especially for complex materials, enabling accelerated discovery of novel materials.
Name | Models | Material Types | Release Date | Journal |
---|---|---|---|---|
ChatMol | T5 | MOF | 2024.09 | Bioinformatics |
MatterGPT | Customized GPT | Crystalline Solids | 2024.08 | arXiv |
MOLLEO | GPT-4, T5 | Organic molecule | 2024.07 | arXiv |
AtomGPT | GPT-2 | Crystalline Solids | 2024.06 | The Journal of Physical Chemistry Letters |
ChatMOF | GPT-4, GPT-3.5 | MOF | 2024.06 | Nature Communications |
CrystalLLM (Antunes et al.) | Transformer-based | Rutiles, spinels, pyrochlores | 2024.02 | arXiv |
GPTChem | GPT-3 | Organic molecule | 2024.02 | Nature Machine Intelligence |
GPT Linker Designer | GPT-3.5 | MOF Linker | 2023.12 | Journal of the American Chemical Society |
DARWIN | LLaMA-7B | MOF | 2023.08 | arXiv |
Text+Chem T5 | T5 | Inorganic | 2023.02 | arXiv |
MolT5 | T5 | Inorganic | 2022.11 | arXiv |
MT-GPT | GPT | Inorganic | 2022.10 | arXiv |
MT-GPT2 | GPT-2 | Inorganic | 2022.10 | arXiv |
MT-GPTNeo | GPT-Neo | Inorganic | 2022.10 | arXiv |
MT-GPTJ | GPT-J | Inorganic | 2022.10 | arXiv |
MT-BART | BART | Inorganic | 2022.10 | arXiv |
MT-RoBERTa | RoBERTa | Inorganic | 2022.10 | arXiv |
MolGPT | Customized GPT | Organic molecule | 2021.10 | Journal of Chemical Information and Modeling |
LLMs are employed to predict synthesis routes, aiding researchers in planning experiments and identifying potential synthesis challenges.
Name | Models | Material Types | Release Date | Journal |
---|---|---|---|---|
CSLLM | LLaMA-7B | Crystal | 2024.07 | arXiv |
SynthGPT | GPT-3.5, GPT-4 | Inorganic | 2024.04 | Journal of the American Chemical Society |
ReactionT5 | T5 | Organic | 2023.11 | arXiv |
MatChat | LLaMA2 | Inorganic | 2023.10 | arXiv |
GPT Chemistry Assistant | GPT-3.5, GPT-4 | MOF | 2023.08 | Journal of the American Chemical Society |
T5Chem | T5 | Organic | 2022.03 | Journal of the American Chemical Society |
ChemFormer | BART | Organic | 2022.01 | Machine Learning: Science and Technology |
LLM-based agent systems facilitate laboratory automation by controlling instruments, analyzing real-time data, and autonomously adjusting experiments.
Name | Models | Material Types | Release Date | Journal |
---|---|---|---|---|
ChemAgents | Llama-3-70B | Literature reader, experiments designer, robot operator, computation performer | 2024.07 | ChemRxiv |
LLMatDesign | GPT-4o | Data acquisition and filtering, integrated simulations, data analysis and visualization | 2024.06 | arXiv |
MicroGPT | GPT-4 | - | 2024.05 | Digital Discovery |
ChatGPT Research Group | GPT-4 | Synthesis conditions extraction, code generation, research planning, and procedural guidance | 2023.11 | ACS Central Science |
GPT-Lab | GPT-4 | Requirements analysis, literature retrieval, text mining, human researcher feedback, experiment execution | 2023.09 | arXiv |
AtomAgents | GPT-4 | Automatic robotic experiments | 2023.07 | arXiv |
CREST | GPT-3.5 | - | 2023.07 | ChemRxiv |
GPT-4 Reticular Chemist | GPT-4 | Project overview, progress summary, propose task choices, evaluation | 2023.06 | Angewandte Chemie International Edition |
ChemCrow | GPT-4 | Synthesis execution | 2023.04 | Nature Machine Intelligence |
Coscientist | GPT-4 | Web and documentation search, code execution | 2023.03 | Nature |
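
The propose, measure, and adjust cycle behind these agent systems can be sketched as a small closed loop. Everything here is hypothetical: `simulated_instrument` uses a made-up response curve in place of real hardware, and `propose_next` is a simple rule standing in for the LLM planner. None of the listed systems is implemented this way.

```python
def simulated_instrument(temperature_c: float) -> float:
    """Hypothetical instrument: yield follows a made-up curve peaking at 150 C."""
    return max(0.0, 1.0 - ((temperature_c - 150.0) / 100.0) ** 2)


def propose_next(history, step):
    """Stand-in for the LLM planner: keep moving in the direction that raised the yield."""
    if len(history) < 2:
        return history[-1][0] + step
    (t_prev, y_prev), (t_last, y_last) = history[-2], history[-1]
    # The sign of (change in yield) * (change in temperature) gives the uphill direction.
    direction = 1.0 if (y_last - y_prev) * (t_last - t_prev) >= 0 else -1.0
    return t_last + direction * step


def run_campaign(start_c=100.0, step=10.0, rounds=8):
    """Run the loop and return the best (temperature, yield) pair observed."""
    history = [(start_c, simulated_instrument(start_c))]
    for _ in range(rounds):
        temperature = propose_next(history, step)                          # plan
        history.append((temperature, simulated_instrument(temperature)))   # execute and measure
    return max(history, key=lambda pair: pair[1])


if __name__ == "__main__":
    print(run_campaign())
```

In an actual agent system, the planner would receive the measurement history as context and could also draw on literature, safety constraints, and instrument documentation when choosing the next experiment.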
If you find our work and this repository useful, please consider giving it a star ⭐ and citing it 🍺:
@misc{yuan2024materials,
title={Materials-Aware Large Language Models as Enablers of Scaling Metadata Ontology and Autonomous Discovery},
author={Wenhao Yuan and Guangyao Chen and Zhilong Wang and Fengqi You},
year={2024},
note={Unpublished manuscript},
institution={Cornell University},
url={https://github.com/PEESEgroup/Awesome-Materials-Aware-Large-Language-Models}
}
Contributions are welcome! Please submit a pull request to add new resources, models, or papers to the repository.