ChemIQ is a benchmark designed to test the ability of LLMs to interpret molecular structures and perform chemical reasoning. Questions in this benchmark range from counting the number of carbon atoms in a molecule to performing NMR structure elucidation.
Read the paper here: https://arxiv.org/abs/2505.07735
Create a conda environment:

```bash
conda create -n ChemIQ python=3.11 numpy pandas matplotlib scipy requests openai rdkit jupyterlab ipykernel -c conda-forge
```

And activate it:

```bash
conda activate ChemIQ
```
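As an optional sanity check (not part of the repository), you can confirm the core packages import cleanly inside the activated environment:

```python
# Optional sanity check: confirm the core dependencies are importable.
import numpy, pandas, openai  # noqa: F401
from rdkit import Chem

# Parse a trivial SMILES to confirm RDKit works end to end.
assert Chem.MolFromSmiles("c1ccccc1") is not None
print("Environment looks good.")
```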
All benchmark questions are stored in `questions/chemiq.jsonl`.
ChemIQ consists of algorithmically generated questions from eight question categories:
Figure 1: Question categories in the ChemIQ benchmark. The number of questions in each category is shown in the panel header; * indicates that the set contains 50% canonical and 50% randomized SMILES.
| question_category | Task | Purpose |
|---|---|---|
| counting_carbon | How many carbon atoms are in the molecule [SMILES] | Counting characters is a basic requirement for interpreting SMILES strings. |
| counting_ring | How many rings are in the molecule [SMILES] | Tests a basic requirement for interpreting SMILES strings. This can be solved by counting the "ring number" characters in the SMILES and dividing by two. |
| shortest_path | Count the bonds between the dummy atoms [SMILES] | Interpreting graph-based features from SMILES strings. |
| atom_mapping | Map the atoms from [SMILES 1] to [SMILES 2] | Understanding graph isomorphism, that is, that two different SMILES strings can represent the same molecule. Solving this indicates an ability to navigate and interpret the molecular graph. |
| smiles_to_iupac | Write the IUPAC name of the molecule [SMILES] | Requires interpreting the molecular graph and then describing it in natural language. Demonstrates the ability to describe functional groups and their relative positions. |
| sar | Given [molecular data] determine the score of [SMILES] | Shows the ability to extract molecular features, assign values, and then generalise to an unseen molecule. |
| reaction | Write the product of reaction [SMILES 1] + [SMILES 2] as a SMILES string | Primarily tests interpreting basic chemical reactions from SMILES and applying the correct transformation to write the SMILES string of the product. These reactions are "easy" for a chemist and do not test other reaction-prediction factors such as selectivity, stereochemistry, or reaction conditions. |
| nmr_elucidation | Write the SMILES string of the molecule consistent with this data [Formula] [¹H NMR] [¹³C NMR] | Our most advanced structure-interpretation task. It requires mapping NMR features to local chemical structures, then combining them into a molecule consistent with the NMR data. In the latest update (10/07/2025), the 30 ZINC 1D NMR questions were replaced with 50 2D NMR questions. |
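For the two counting categories, the expected answers can be reproduced directly with RDKit. The snippet below is purely illustrative (it is not the benchmark's generation code):

```python
from rdkit import Chem

smiles = "Nc1nnc(SC(F)F)s1"  # example molecule from the question shown further down
mol = Chem.MolFromSmiles(smiles)

# counting_carbon: number of carbon atoms in the molecule
carbon_count = sum(1 for atom in mol.GetAtoms() if atom.GetAtomicNum() == 6)

# counting_ring: number of rings in the molecule
ring_count = mol.GetRingInfo().NumRings()

print(carbon_count, ring_count)  # 3 1
```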
| File Path | Description |
|---|---|
| `questions/chemiq.jsonl` | Main benchmark consisting of 816 questions. |
| `questions/additional_smiles_to_iupac.jsonl` | Additional questions used for error analysis of the SMILES-to-IUPAC task (functional group naming and locant numbering). |
Each line in the .jsonl is a single question stored as a Python dictionary:
```python
{'uuid': 'cbfe1b13-aadb-40e4-838d-388c8878e3ee',
 'question_category': 'counting_carbon',
 'sub_category': None,
 'meta_data': {'smiles': 'Nc1nnc(SC(F)F)s1',
               'smiles_random': 'S(c1sc(N)nn1)C(F)F',
               'carbon_count': 3},
 'prompt': 'How many carbon atoms are in the molecule:\n\nS(c1sc(N)nn1)C(F)F\n\nGive your answer as an integer. Do not write any comments.',
 'answer': 3,
 'answer_format': 'integer',
 'answer_range': None,
 'verification_method': 'exact_match',
 'ChemIQ': True}
```
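A minimal way to load the questions, assuming only the standard-library `json` module (the repository's notebooks may load them differently):

```python
import json

# Load every question from the benchmark file into a list of dicts.
with open("questions/chemiq.jsonl") as f:
    questions = [json.loads(line) for line in f]

print(len(questions))                     # 816
print(questions[0]["question_category"])  # e.g. 'counting_carbon'
```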
Submit each prompt to the LLM. The responses can be scored using the helpers in `2_process_results.ipynb`. For nearly all questions, the correct answer is given in the `answer` field. The only exception is the SAR questions with added noise, where any value inside `answer_range` is accepted. The answer-checking method is defined by `verification_method`, which points to one of the checker functions implemented in `2_process_results.ipynb`.
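Below is a rough end-to-end sketch of querying a model and scoring its replies. The client setup, model name, and checker logic are illustrative assumptions: in particular, it treats `answer_range` as a `(low, high)` pair and implements only a naive exact match, whereas the real checkers (including SMILES-aware comparison) live in `2_process_results.ipynb`:

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-4o"   # placeholder model name

def ask(prompt: str) -> str:
    """Send a single benchmark prompt to the model and return its reply."""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()

def is_correct(question: dict, reply: str) -> bool:
    """Naive scoring sketch: range check for noisy SAR questions, exact match otherwise.
    SMILES answers (reaction, nmr_elucidation) really need canonicalised comparison."""
    if question["answer_range"] is not None:
        low, high = question["answer_range"]  # assumed (low, high) format
        return low <= float(reply) <= high
    return str(question["answer"]) == reply

with open("questions/chemiq.jsonl") as f:
    questions = [json.loads(line) for line in f]

scores = [is_correct(q, ask(q["prompt"])) for q in questions]
print(f"Accuracy: {sum(scores) / len(scores):.1%}")
```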
If you use ChemIQ, please cite:
```bibtex
@article{runcie2025assessing,
  title={Assessing the Chemical Intelligence of Large Language Models},
  author={Nicholas T. Runcie and Charlotte M. Deane and Fergus Imrie},
  journal={arXiv preprint arXiv:2505.07735},
  year={2025},
  doi={10.48550/arXiv.2505.07735},
  url={https://arxiv.org/abs/2505.07735},
}
```