Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data [Paper] [Project Website]
We provide the questions of quantitative reasoning with data (QRData) in benchmark/QRData.json
. It contains 411 questions with the following keys.
data_description
question
answer
data_files
: a list of names of data filesmeta_data
: a dict containsreference
,keywords
,question_type
, andmultiple_choices
(the possible choices ifquestion_type
is 'multiple_choice').
Data files related to the questions are in benchmark/data.zip
.
Questions of quantitative reasoning with text (QRText) are in
Some numerical questions in QRText encounter measurement errors. We will release the corrected version in the future.benchmark/QRText.json
.
The script for evaluation is in 'benchmark/eval.py'.
Please cite our paper if this repository inspires your work.
@inproceedings{liu-etal-2024-llms,
title = "Are {LLM}s Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data",
author = "Liu, Xiao and
Wu, Zirui and
Wu, Xueqing and
Lu, Pan and
Chang, Kai-Wei and
Feng, Yansong",
editor = "Ku, Lun-Wei and
Martins, Andre and
Srikumar, Vivek",
booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
month = aug,
year = "2024",
address = "Bangkok, Thailand and virtual meeting",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-acl.548",
pages = "9215--9235",
}