
ChemIQ - Assessing the Chemical Intelligence of Large Language Models

ChemIQ is a benchmark designed to test the ability of LLMs to interpret molecular structures and perform chemical reasoning. Questions range from counting the carbon atoms in a molecule to performing NMR structure elucidation.

Read the paper here: https://arxiv.org/abs/2505.07735


Quick start

Create a conda environment:

conda create -n ChemIQ python=3.11 numpy pandas matplotlib scipy requests openai rdkit jupyterlab ipykernel -c conda-forge

And activate it:

conda activate ChemIQ

All benchmark questions are stored in questions/chemiq.jsonl.
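Since each line of the file is an independent JSON object, the question set can be loaded with a few lines of Python. This is a minimal sketch, not the loader used in the repository's notebooks:

```python
import json

def load_questions(path="questions/chemiq.jsonl"):
    """Load ChemIQ questions: one JSON object per line (JSON Lines format)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Calling `load_questions()` from the repository root returns a list of dictionaries, one per question.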

Benchmark construction

ChemIQ consists of algorithmically generated questions from eight question categories:

Task summary figure

Figure 1: Question categories in the ChemIQ benchmark. The number of questions in each category is shown in the panel header, and * indicates the set contains 50% canonical and 50% randomized SMILES.

| question_category | Task | Purpose |
| --- | --- | --- |
| counting_carbon | How many carbon atoms are in the molecule [SMILES]? | Counting atoms is a basic requirement for interpreting SMILES strings. |
| counting_ring | How many rings are in the molecule [SMILES]? | Tests a basic requirement for interpreting SMILES strings; can be solved by counting the ring-closure digits in the SMILES and dividing by two. |
| shortest_path | Count the bonds between the dummy atoms [SMILES]. | Tests interpretation of graph-based features from SMILES strings. |
| atom_mapping | Map the atoms from [SMILES 1] to [SMILES 2]. | Tests understanding of graph isomorphism, i.e. that two different SMILES strings can represent the same molecule. Success indicates an ability to navigate and interpret the molecular graph. |
| smiles_to_iupac | Write the IUPAC name of the molecule [SMILES]. | Requires interpreting the molecular graph and describing it in natural language, demonstrating an ability to name functional groups and their relative positions. |
| sar | Given [molecular data], determine the score of [SMILES]. | Tests the ability to extract molecular features, assign values to them, and generalise to an unseen molecule. |
| reaction | Write the product of reaction [SMILES 1] + [SMILES 2] as a SMILES string. | Focuses on interpreting basic chemical reactions from SMILES and applying the correct transformation to write the product SMILES. These questions are "easy" for a chemist and do not test factors such as selectivity, stereochemistry, or reaction conditions. |
| nmr_elucidation | Write the SMILES string of the molecule consistent with this data: [Formula] [¹H NMR] [¹³C NMR]. | Our most advanced structure-interpretation task: it requires mapping NMR features to local chemical substructures, then assembling them into a molecule consistent with the data. In the latest update (10/07/2025) the 30 ZINC 1D NMR questions were replaced with 50 2D NMR questions. |
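The logic described for the two counting categories can be reproduced in a short pure-Python sketch. This is illustrative only, not the benchmark's grading code; it assumes simple organic SMILES and ignores two-digit `%nn` ring-closure labels:

```python
import re

def count_carbons(smiles: str) -> int:
    """Count aliphatic C and aromatic c atoms, skipping Cl and
    non-carbon bracket atoms (simplified tokenizer, not a full parser)."""
    n = 0
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            j = smiles.index("]", i)
            body = smiles[i + 1 : j]
            # Skip isotope digits, then read the element symbol.
            sym = re.match(r"\d*([A-Za-z][a-z]?)", body).group(1)
            if sym in ("C", "c"):  # excludes Cl, Cu, Ca, ...
                n += 1
            i = j + 1
        else:
            if ch == "C" and smiles[i + 1 : i + 2] != "l":
                n += 1
            elif ch == "c":
                n += 1
            i += 1
    return n

def count_rings(smiles: str) -> int:
    """Ring-closure digits appear twice per ring, so count them and halve.
    Bracket atoms are stripped first so charges/isotopes are not miscounted."""
    return sum(ch.isdigit() for ch in re.sub(r"\[[^]]*\]", "", smiles)) // 2

print(count_carbons("Nc1nnc(SC(F)F)s1"))  # -> 3
print(count_rings("Nc1nnc(SC(F)F)s1"))    # -> 1
```

For production use a real cheminformatics toolkit such as RDKit (already in the conda environment above) is the safer choice; the point here is that these categories have a simple, verifiable ground truth.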

Questions

| File Path | Description |
| --- | --- |
| questions/chemiq.jsonl | Main benchmark consisting of 816 questions. |
| questions/additional_smiles_to_iupac.jsonl | Additional questions used for error analysis of the SMILES-to-IUPAC task (functional group naming and locant numbering). |

Each line in the .jsonl file is a single question, which loads as a Python dictionary:

{'uuid': 'cbfe1b13-aadb-40e4-838d-388c8878e3ee',
 'question_category': 'counting_carbon',
 'sub_category': None,
 'meta_data': {'smiles': 'Nc1nnc(SC(F)F)s1',
  'smiles_random': 'S(c1sc(N)nn1)C(F)F',
  'carbon_count': 3},
 'prompt': 'How many carbon atoms are in the molecule:\n\nS(c1sc(N)nn1)C(F)F\n\nGive your answer as an integer. Do not write any comments.',
 'answer': 3,
 'answer_format': 'integer',
 'answer_range': None,
 'verification_method': 'exact_match',
 'ChemIQ': True}

Submit each prompt to the LLM. Responses can be scored using the helpers in 2_process_results.ipynb. For nearly all questions, the correct answer is given in the answer field; the only exception is the SAR questions with added noise, where any value inside answer_range is accepted. The checking method for each question is defined by verification_method, which names one of the checker functions implemented in 2_process_results.ipynb.
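A scoring loop might look like the following. This is a sketch, not the notebook's code: it assumes exact_match compares the stripped response with the answer field, and the "answer_range" method name for the SAR range check is hypothetical.

```python
def check_exact_match(response, record):
    # Compare the stripped response with the stored answer, as strings.
    return str(response).strip() == str(record["answer"])

def check_answer_range(response, record):
    # SAR questions with added noise: accept any value inside answer_range.
    try:
        value = float(str(response).strip())
    except ValueError:
        return False
    low, high = record["answer_range"]
    return low <= value <= high

# Dispatch on verification_method ("answer_range" is an assumed key name).
CHECKERS = {
    "exact_match": check_exact_match,
    "answer_range": check_answer_range,
}

def score(record, response):
    return CHECKERS[record["verification_method"]](response, record)
```

For the example question above, `score(record, "3")` would return True and any other response False.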

Citation

If you use ChemIQ, please cite:

@article{runcie2025assessing,
  title={Assessing the Chemical Intelligence of Large Language Models},
  author={Nicholas T. Runcie and Charlotte M. Deane and Fergus Imrie},
  journal={arXiv preprint arXiv:2505.07735},
  year={2025},
  doi={10.48550/arXiv.2505.07735},
  url={https://arxiv.org/abs/2505.07735},
}
