🔥🔥🔥 Improving Causal Reasoning in Large Language Models: A Survey [Paper]
We provide a comprehensive review of research aimed at enhancing LLMs for causal reasoning (CR). We categorize existing methods based on the role of LLMs: either as reasoning engines or as helpers providing knowledge or data to traditional CR methods, followed by a detailed discussion of the methodologies in each category. We then evaluate the performance of LLMs on various causal reasoning tasks, providing key findings and in-depth analysis. Finally, we provide insights from current studies and highlight promising directions for future research.
Table of Contents
C2P: Featuring Large Language Models with Causal Reasoning
Abdolmahdi Bagheri, Matin Alinejad, Kevin Bello, Alireza Akhondi-Asl. Preprint'2024
Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning.
Murong Yue, Jie Zhao, Min Zhang, Liang Du, Ziyu Yao. ICLR'2024
Large Language Model for Causal Decision Making.
Haitao Jiang, Lin Ge, Yuhe Gao, Jianian Wang, Rui Song. COLM'2024
Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs
Ziyi Tang, Ruilin Wang, Weixing Chen, Keze Wang, Yang Liu, Tianshui Chen, Liang Lin. Preprint'2023
CLadder: Assessing Causal Reasoning in Language Models
Zhijing Jin, Yuen Chen, Felix Leeb, Luigi Gresele, Ojasv Kamal, Zhiheng Lyu, Kevin Blin, Fernando Gonzalez Adauto, Max Kleiman-Weiner, Mrinmaya Sachan, Bernhard Schölkopf. NeurIPS'2023
Causal Reasoning of Entities and Events in Procedural Texts
Li Zhang, Hainiu Xu, Yue Yang, Shuyan Zhou, Weiqiu You, Manni Arora, Chris Callison-Burch. ACL'2023
Preserving Commonsense Knowledge from Pre-trained Language Models via Causal Inference
Junhao Zheng, Qianli Ma, Shengjie Qiu, Yue Wu, Peitian Ma, Junlong Liu, Huawen Feng, Xichen Shang, Haibin Chen. ACL'2023
Answering Causal Questions with Augmented LLMs
Nick Pawlowski, James Vaughan, Joel Jennings, Cheng Zhang. ICML Workshop'2023
Neuro-Symbolic Procedural Planning with Commonsense Prompting
Yujie Lu, Weixi Feng, Wanrong Zhu, Wenda Xu, Xin Eric Wang, Miguel Eckstein, William Yang Wang. ICLR'2023
Faithful Reasoning Using Large Language Models.
Antonia Creswell, Murray Shanahan. Preprint'2022
Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning.
Antonia Creswell, Murray Shanahan, Irina Higgins. Preprint'2022
CausalBERT: Injecting Causal Knowledge Into Pre-trained Models with Minimal Supervision.
Zhongyang Li, Xiao Ding, Kuo Liao, Bing Qin, Ting Liu. Preprint'2021
LLM-Enhanced Causal Discovery in Temporal Domain from Interventional Data
Peiwen Li, Xin Wang, Zeyang Zhang, Yuan Meng, Fang Shen, Yue Li, Jialong Wang, Yang Li, Wenwu Zhu. Preprint'2024
Faithful Explanations of Black-box NLP Models Using LLM-generated Counterfactuals
Yair Ori Gat, Nitay Calderon, Amir Feder, Alexander Chapanin, Amit Sharma, Roi Reichart. ICLR'2024
Causal Structure Learning Supervised by Large Language Model
Taiyu Ban, Lyuzhou Chen, Derui Lyu, Xiangyu Wang, Huanhuan Chen. Preprint'2023
Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs
Sen Yang, Xin Li, Leyang Cui, Lidong Bing, Wai Lam. Preprint'2023
Extracting Self-Consistent Causal Insights from Users Feedback with LLMs and In-context Learning
Sara Abdali, Anjali Parikh, Steve Lim, Emre Kiciman. Preprint'2023
Improving Commonsense Causal Reasoning by Adversarial Training and Data Augmentation
Ieva Staliūnaitė, Philip John Gorinski, Ignacio Iacobacci. Preprint'2021
We first categorize the end tasks into three groups: causal discovery, causal inference, and additional causal tasks. For each category, we evaluate recent LLMs using pass@1 accuracy with strategies such as zero-shot, few-shot, direct I/O prompting, and Chain-of-Thought (CoT) reasoning.
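As a rough illustration of how pass@1 accuracy is scored for a single run, here is a minimal sketch; the field names `prediction` and `answer` are assumptions, and the actual scoring logic lives in `src/eval_all.py`:

```python
import json
from pathlib import Path

def pass_at_1(result_file: Path) -> float:
    """Fraction of questions whose single sampled answer matches the gold label.

    Assumes each record stores the model output under "prediction" and the
    gold label under "answer"; the real result files may use different keys.
    """
    records = json.loads(result_file.read_text())
    correct = sum(
        str(r["prediction"]).strip().lower() == str(r["answer"]).strip().lower()
        for r in records
    )
    return correct / len(records) if records else 0.0
```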
To replicate our results, first navigate to the `src` directory, then run the `eval_all.py` script, which will generate the model results. Alternatively, browse the `llm_result` folder to review the raw data directly.
Each file in `llm_result` follows the naming convention `{Model_name}_seed_{seed}_sample_num_{sample_num}_few_shot_{few_shot}_direct_io_{direct_io}.json`, for example `claude-3-5-sonnet-20240620_seed_42_sample_num_100_few_shot_False_direct_io_True.json`.
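To work with these result files programmatically, the run settings can be recovered from the filename itself; the helper below is a hypothetical sketch built only from the naming convention above:

```python
import re
from pathlib import Path

# e.g. claude-3-5-sonnet-20240620_seed_42_sample_num_100_few_shot_False_direct_io_True.json
NAME_RE = re.compile(
    r"(?P<model>.+)_seed_(?P<seed>\d+)_sample_num_(?P<n>\d+)"
    r"_few_shot_(?P<few_shot>True|False)_direct_io_(?P<direct_io>True|False)\.json"
)

def parse_run_name(path: Path) -> dict:
    """Split a result filename into model name, seed, sample count, and prompting flags."""
    m = NAME_RE.fullmatch(path.name)
    if m is None:
        raise ValueError(f"unexpected result filename: {path.name}")
    return {
        "model": m["model"],
        "seed": int(m["seed"]),
        "sample_num": int(m["n"]),
        "few_shot": m["few_shot"] == "True",
        "direct_io": m["direct_io"] == "True",
    }
```

For example, parsing the filename above yields the model name `claude-3-5-sonnet-20240620` with `seed=42`, `sample_num=100`, `few_shot=False`, and `direct_io=True`.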
To explore the dataset, navigate to the `dataset/{dataset_name}` folder, and for the corresponding prompt, check the `prompt/{dataset_name}` folder. The merged results can be found in the `result` folder.
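Purely as an illustration of how the per-run files could be combined into such a merged summary (not the repository's actual merging code; it reuses the hypothetical `parse_run_name` and `pass_at_1` helpers sketched above):

```python
import json
from pathlib import Path

def merge_results(llm_result_dir: str = "llm_result",
                  out_file: str = "result/summary.json") -> None:
    """Collect pass@1 accuracy for every run file into one JSON summary."""
    summary = []
    for path in sorted(Path(llm_result_dir).glob("*.json")):
        run = parse_run_name(path)        # run settings parsed from the filename
        run["pass@1"] = pass_at_1(path)   # accuracy computed from the stored records
        summary.append(run)
    Path(out_file).parent.mkdir(parents=True, exist_ok=True)
    Path(out_file).write_text(json.dumps(summary, indent=2))
```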
To accelerate the process, run the bash script `run_all.sh` to generate the results.
Can Large Language Models Infer Causation from Correlation?
Zhijing Jin, Jiarui Liu, Zhiheng Lyu, Spencer Poff, Mrinmaya Sachan, Rada Mihalcea, Mona Diab, Bernhard Schölkopf. ICLR'2024
CausalQA: A Benchmark for Causal Question Answering
Alexander Bondarenko, Magdalena Wolska, Stefan Heindorf, Lukas Blübaum, Axel-Cyrille Ngonga Ngomo, Benno Stein, Pavel Braslavski, Matthias Hagen, Martin Potthast. ACL'2022
e-CARE: a New Dataset for Exploring Explainable Causal Reasoning
Li Du, Xiao Ding, Kai Xiong, Ting Liu, Bing Qin. ACL'2022
CausaLM: Causal Model Explanation Through Counterfactual Language Models
Amir Feder, Nadav Oved, Uri Shalit, Roi Reichart. ACL'2021
CRAB: Assessing the Strength of Causal Relationships Between Real-World Events
Angelika Romanou, Syrielle Montariol, Debjit Paul, Léo Laugier, Karl Aberer, Antoine Bosselut. EMNLP'2023
CLadder: Assessing Causal Reasoning in Language Models
Zhijing Jin, Yuen Chen, Felix Leeb, Luigi Gresele, Ojasv Kamal, Zhiheng Lyu, Kevin Blin, Fernando Gonzalez Adauto, Max Kleiman-Weiner, Mrinmaya Sachan, Bernhard Schölkopf. NeurIPS'2023
COLA: Contextualized Commonsense Causal Reasoning from the Causal Inference Perspective
Zhaowei Wang, Quyet V. Do, Hongming Zhang, Jiayao Zhang, Weiqi Wang, Tianqing Fang, Yangqiu Song, Ginny Wong, Simon See. ACL'2023
Abductive Commonsense Reasoning
Chandra Bhagavatula, Ronan Le Bras, Chaitanya Malaviya, Keisuke Sakaguchi, Ari Holtzman, Hannah Rashkin, Doug Downey, Scott Wen-tau Yih, Yejin Choi. ICLR'2020
TRAM: Benchmarking Temporal Reasoning for Large Language Models
Yuqing Wang, Yun Zhao. ACL'2024
MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks
Allen Nie, Yuhui Zhang, Atharva Amdekar, Chris Piech, Tatsunori Hashimoto, Tobias Gerstenberg. NeurIPS'2023
CRASS: A Novel Data Set and Benchmark to Test Counterfactual Reasoning of Large Language Models
Jörg Frohberg, Frank Binder. LREC'2022
@article{xiong2024improving,
title={Improving Causal Reasoning in Large Language Models: A Survey},
author={Xiong, Siheng and Chen, Delin and Wu, Qingyang and Yu, Longxuan and Liu, Qingzhen and Li, Dawei and Chen, Zhikai and Liu, Xiaoze and Pan, Liangming},
journal={arXiv preprint arXiv:2410.16676},
year={2024}
}