Contributed by Kimon Fountoulakis
-
On the Computational Power of Neural Nets. Journal of Computer and System Sciences 1995. paper
Hava T. Siegelmann, Eduardo D. Sontag
-
Attention is Turing-Complete. Journal of Machine Learning Research 2021. paper
Jorge Pérez, Pablo Barceló, Javier Marinkovic
-
Looped Transformers as Programmable Computers. ICML 2023. paper
Angeliki Giannou, Shashank Rajput, Jy-Yong Sohn, Kangwook Lee, Jason D. Lee, Dimitris Papailiopoulos
-
Exposing Attention Glitches with Flip-Flop Language Modeling. NeurIPS 2023. paper
Bingbin Liu, Jordan T. Ash, Surbhi Goel, Akshay Krishnamurthy, Cyril Zhang
-
Transformers Learn Shortcuts to Automata. ICLR 2023. paper
Bingbin Liu, Jordan T. Ash, Surbhi Goel, Akshay Krishnamurthy, Cyril Zhang
-
Memory Augmented Large Language Models are Computationally Universal. arXiv 2023. paper
Dale Schuurmans
-
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems. ICLR 2024. paper
Zhiyuan Li, Hong Liu, Denny Zhou, Tengyu Ma
-
Representational Capabilities of Feed-Forward and Sequential Neural Architectures. PhD Thesis 2024. paper
Clayton Hendrick Sanford
-
Transformers, parallel computation, and logarithmic depth. ICML 2024. paper
Clayton Sanford, Daniel Hsu, Matus Telgarsky
-
Understanding Transformer Reasoning Capabilities via Graph Algorithms. NeurIPS 2024. paper
Clayton Sanford, Bahare Fatemi, Ethan Hall, Anton Tsitsulin, Mehran Kazemi, Jonathan Halcrow, Bryan Perozzi, Vahab Mirrokni
-
A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers. NeurIPS 2024 Workshop M3L. paper
William Merrill, Ashish Sabharwal
-
On Limitations of the Transformer Architecture. COLM 2024. paper
Binghui Peng, Srini Narayanan, Christos Papadimitriou
-
The Expressive Power of Transformers with Chain of Thought. ICLR 2024. paper
William Merrill, Ashish Sabharwal
-
Depth-Width tradeoffs in Algorithmic Reasoning of Graph Tasks with Transformers. arXiv 2025. paper
Gilad Yehudai, Clayton Sanford, Maya Bechler-Speicher, Orr Fischer, Ran Gilad-Bachrach, Amir Globerson
-
Positional Attention: Expressivity and Learnability of Algorithmic Computation. ICML 2025. paper
Artur Back de Luca, George Giapitzakis, Shenghao Yang, Petar Veličković, Kimon Fountoulakis
-
Round and Round We Go! What makes Rotary Positional Encodings useful? ICLR 2025. paper
Federico Barbero, Alex Vitvitskyi, Christos Perivolaropoulos, Razvan Pascanu, Petar Veličković
-
Reasoning with Latent Thoughts: On the Power of Looped Transformers. ICLR 2025. paper
Nikunj Saunshi, Nishanth Dikkala, Zhiyuan Li, Sanjiv Kumar, Sashank J. Reddi
-
ALTA: Compiler-Based Analysis of Transformers. TMLR 2025. paper
Peter Shaw, James Cohan, Jacob Eisenstein, Kenton Lee, Jonathan Berant, Kristina Toutanova
-
Provably good solutions to the knapsack problem via neural networks of bounded size. AAAI 2021. paper
Christoph Hertrich, Martin Skutella
-
ReLU Neural Networks of Polynomial Size for Exact Maximum Flow Computation. Integer Programming and Combinatorial Optimization 2023. paper
Christoph Hertrich, Leon Sering
-
Representational Capabilities of Feed-Forward and Sequential Neural Architectures. PhD Thesis 2024. paper
Clayton Hendrick Sanford
-
What graph neural networks cannot learn: depth vs width. ICLR 2020. paper
Andreas Loukas
-
Simulation of Graph Algorithms with Looped Transformers. ICML 2024. paper
Artur Back de Luca, Kimon Fountoulakis
-
Exploring the Power of Graph Neural Networks in Solving Linear Optimization Problems. AISTATS 2024. paper
Chendi Qian, Didier Chételat, Christopher Morris
-
MAGNOLIA: Matching Algorithms via GNNs for Online Value-to-go Approximation. ICML 2024. paper
Alexandre Hayderi, Amin Saberi, Ellen Vitercik, Anders Wikum
-
Aligning Transformers with Weisfeiler-Leman. ICML 2024. paper
Luis Müller, Christopher Morris
-
Graph Transformers Dream of Electric Flow. ICLR 2025. paper
Xiang Cheng, Lawrence Carin, Suvrit Sra
-
Primal-Dual Neural Algorithmic Reasoning. arXiv 2025. paper
Yu He, Ellen Vitercik
-
Positional Attention: Expressivity and Learnability of Algorithmic Computation. ICML 2025. paper
Artur Back de Luca, George Giapitzakis, Shenghao Yang, Petar Veličković, Kimon Fountoulakis
-
Learning Compositional Functions with Transformers from Easy-to-Hard Data. COLT 2025. paper
Zixuan Wang, Eshaan Nichani, Alberto Bietti, Alex Damian, Daniel Hsu, Jason D. Lee, Denny Wu
-
Learning to Add, Multiply, and Execute Algorithmic Instructions Exactly with Neural Networks. arXiv 2025. paper
George Giapitzakis, Artur Back de Luca, Kimon Fountoulakis
-
Graph neural networks extrapolate out-of-distribution for shortest paths. arXiv 2025. paper
Robert R. Nerem, Samantha Chen, Sanjoy Dasgupta, Yusu Wang
-
Neural Networks and the Chomsky Hierarchy. ICLR 2023. paper
Grégoire Delétang, Anian Ruoss, Jordi Grau-Moya, Tim Genewein, Li Kevin Wenliang, Elliot Catt, Chris Cundy, Marcus Hutter, Shane Legg, Joel Veness, Pedro A. Ortega
-
Training Neural Networks as Recognizers of Formal Languages. ICLR 2025. paper
Alexandra Butoi, Ghazal Khalighinejad, Anej Svete, Josef Valvoda, Ryan Cotterell, Brian DuSell
-
Learning to Execute. arXiv 2015. paper
Wojciech Zaremba, Ilya Sutskever
-
Neural Programmer-Interpreters. arXiv 2015. paper
Scott Reed, Nando de Freitas
-
Neural Programmer: Inducing Latent Programs with Gradient Descent. arXiv 2016. paper
Arvind Neelakantan, Quoc V. Le, Ilya Sutskever
-
Deep Neural Solver for Math Word Problems. EMNLP 2017. paper
Yan Wang, Xiaojiang Liu, Shuming Shi
-
Analysing Mathematical Reasoning Abilities of Neural Models. arXiv 2019. paper
David Saxton, Edward Grefenstette, Felix Hill, Pushmeet Kohli
-
LSTM Networks Can Perform Dynamic Counting. ACL 2019 Workshop on Deep Learning and Formal Languages. paper
Mirac Suzgun, Sebastian Gehrmann, Yonatan Belinkov, Stuart M. Shieber
-
Thinking Like Transformers. ICML 2021. paper
Gail Weiss, Yoav Goldberg, Eran Yahav
-
Investigating the Limitations of Transformers with Simple Arithmetic Tasks. arXiv 2021. paper
Rodrigo Nogueira, Zhiying Jiang, Jimmy Lin
-
A Primer for Neural Arithmetic Logic Modules. arXiv 2022. paper
Bhumika Mistry, Katayoun Farrahi, Jonathon Hare
-
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets. arXiv 2022. paper
Alethea Power, Yuri Burda, Harri Edwards, Igor Babuschkin, Vedant Misra
-
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv 2023. paper
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou
-
Implicit Chain of Thought Reasoning via Knowledge Distillation. arXiv 2023. paper
Yuntian Deng, Kiran Prasad, Roland Fernandez, Paul Smolensky, Vishrav Chaudhary, Stuart Shieber
-
Positional Description Matters for Transformers Arithmetic. arXiv 2023. paper
Ruoqi Shen, Sébastien Bubeck, Ronen Eldan, Yin Tat Lee, Yuanzhi Li, Yi Zhang
-
Learning Transformer Programs. NeurIPS 2023. paper
Dan Friedman, Alexander Wettig, Danqi Chen
-
Length Generalization in Arithmetic Transformers. arXiv 2023. paper
Samy Jelassi, Stéphane d'Ascoli, Carles Domingo-Enrich, Yuhuai Wu, Yuanzhi Li, François Charton
-
Transformers Can Do Arithmetic with the Right Embeddings. arXiv 2024. paper
Sean McLeish, Arpit Bansal, Alex Stein, Neel Jain, John Kirchenbauer, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Jonas Geiping, Avi Schwarzschild, Tom Goldstein
-
From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step. arXiv 2024. paper
Yuntian Deng, Yejin Choi, Stuart Shieber