GitHub - HKUST-KnowComp/CoT-ICL-Eval: Official Repository for Paper: The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning

The Curse of CoT:
On the Limitations of Chain-of-Thought in In-Context Learning

Official GitHub repository for the benchmark datasets and code from the paper:
The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning (arXiv:2504.05081).

Main Results*

*Results averaged across 16 Large Language Models.

Dataset	Direct Answering	CoT	CoT tokens	ReAct	ReAct tokens	ToT	ToT tokens
ARC	10.01	7.50	914.04	6.34	955.76	6.99	1376.96
MiniARC	17.11	10.36	419.75	8.69	663.37	8.85	233.47
1D-ARC	41.30	34.93	359.57	28.51	435.97	27.88	594.70
SCAN	62.79	60.04	134.51	57.35	270.16	51.31	455.39
MiniSCAN	20.85	17.32	239.99	15.72	330.14	15.42	554.62
COGS	19.73	14.88	244.11	12.99	272.18	9.24	484.92
SALT	37.72	34.15	175.99	31.06	316.41	27.25	492.73
List Functions	44.31	38.29	305.49	34.84	310.73	31.25	486.49
RAVEN	16.94	7.37	434.75	3.09	533.09	5.80	737.64
Average	30.08	24.98	358.69	22.07	454.20	20.44	601.88
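
As a quick sanity check on the table above, the following sketch recomputes the Average row from the per-dataset accuracies and reports how far each prompting method falls below direct answering. The numbers are copied from the table; the names `ACCURACY`, `column_means`, and `gap_to_direct` are illustrative and are not part of this repository.

```python
# Accuracy values copied from the Main Results table (averaged across 16 LLMs).
# Column order: Direct Answering, CoT, ReAct, ToT. Token counts are omitted.
METHODS = ["Direct Answering", "CoT", "ReAct", "ToT"]
ACCURACY = {
    "ARC": [10.01, 7.50, 6.34, 6.99],
    "MiniARC": [17.11, 10.36, 8.69, 8.85],
    "1D-ARC": [41.30, 34.93, 28.51, 27.88],
    "SCAN": [62.79, 60.04, 57.35, 51.31],
    "MiniSCAN": [20.85, 17.32, 15.72, 15.42],
    "COGS": [19.73, 14.88, 12.99, 9.24],
    "SALT": [37.72, 34.15, 31.06, 27.25],
    "List Functions": [44.31, 38.29, 34.84, 31.25],
    "RAVEN": [16.94, 7.37, 3.09, 5.80],
}
# The Average row as reported in the table, in the same column order.
REPORTED_AVERAGE = [30.08, 24.98, 22.07, 20.44]


def column_means(table):
    """Return the unweighted mean of each column across all datasets."""
    rows = list(table.values())
    return [sum(col) / len(rows) for col in zip(*rows)]


def gap_to_direct(means):
    """Return each method's mean accuracy minus the direct-answering mean."""
    return [m - means[0] for m in means]


if __name__ == "__main__":
    means = column_means(ACCURACY)
    gaps = gap_to_direct(means)
    for name, mean, reported, gap in zip(METHODS, means, REPORTED_AVERAGE, gaps):
        print(f"{name:17s} mean={mean:6.2f} reported={reported:6.2f} gap={gap:+6.2f}")
```

The recomputed means agree with the reported Average row to two decimal places, which is consistent with the row being a simple unweighted mean over the nine datasets rather than one weighted by dataset size.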

Datasets

Dataset	Source Paper	Task Modality	Size
ARC	On the Measure of Intelligence	Symbolic	835
MiniARC	Playgrounds for Abstraction and Reasoning	Symbolic	149
1D-ARC	LLMs and the Abstraction and Reasoning Corpus: Successes, Failures, and the Importance of Object-based Representations	Symbolic	901
SCAN	Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks	Textual	1000
MiniSCAN	Learning Compositional Rules via Neural Program Synthesis	Textual	1000
COGS	COGS: A Compositional Generalization Challenge Based on Semantic Interpretation	Textual	1000
SALT	LogiDynamics: Unraveling the Dynamics of Logical Inference in Large Language Model Reasoning	Textual	1200
List Functions	The child as hacker: building more human-like models of learning	Numerical	1250
RAVEN	In-Context Analogical Reasoning with Pre-Trained Language Models	Numerical / Symbolic	1259

Citation

If you find our paper interesting, please cite it:

@misc{zheng2025cursecotlimitationschainofthought,
      title={The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning}, 
      author={Tianshi Zheng and Yixiang Chen and Chengxi Li and Chunyang Li and Qing Zong and Haochen Shi and Baixuan Xu and Yangqiu Song and Ginny Y. Wong and Simon See},
      year={2025},
      eprint={2504.05081},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.05081}, 
}

Folders and files

1d_arc
arc
cogs
listfunc
miniarc
miniscan
raven
salt
scan
utils
LICENSE
README.md
