Nietism/paper-list-pub


Paper List

Papers 📰

Large Language Models (LLMs) 🛰


  • A Survey on In-context Learning.

    Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Zhiyong Wu, Baobao Chang, Xu Sun, Jingjing Xu, Lei Li, Zhifang Sui. arxiv, 2023. [pdf] [arxiv] [project]

  • Finetuned Language Models Are Zero-Shot Learners.

    Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le. ICLR, 2022. [pdf] [arxiv]

    Proposed instruction tuning and conducted experiments on many tasks.

  • Learning to summarize from human feedback.

    Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano. NeurIPS, 2020. [pdf] [arxiv] [project] [samples]

    Summarize with RLHF.
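The reward-model objective behind this line of RLHF work is a pairwise preference loss over human comparisons; a minimal numpy sketch (the function name and the use of bare scalar rewards are my own simplifications, not the paper's code):

```python
import numpy as np

def reward_model_loss(r_chosen, r_rejected):
    """Pairwise preference loss used to train reward models for RLHF:
    loss = -log(sigmoid(r_chosen - r_rejected)), which pushes the reward
    of the human-preferred summary above the rejected one.
    (Scalars here stand in for reward-model outputs on two summaries.)"""
    return -np.log(1.0 / (1.0 + np.exp(-(r_chosen - r_rejected))))
```

The loss shrinks as the margin between the chosen and rejected rewards grows, which is what drives the reward model toward the human ranking.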

  • Training language models to follow instructions with human feedback.

    Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe. NeurIPS, 2022. [pdf] [arxiv] [blog]

  • Scaling Laws for Reward Model Overoptimization.

    Leo Gao, John Schulman, Jacob Hilton. arxiv, 2022. [pdf] [arxiv]

    About over-optimization in RLHF.

  • From Zero to Hero: Examining the Power of Symbolic Tasks in Instruction Tuning.

    Qian Liu, Fan Zhou, Zhengbao Jiang, Longxu Dou, Min Lin. arxiv, 2023. [pdf] [arxiv] [project]

    This paper introduces a straightforward yet effective method for enhancing instruction tuning by employing symbolic tasks.

  • Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.

    Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou. NeurIPS, 2022. [pdf] [arxiv]

  • Emergent Abilities of Large Language Models.

    Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus. Transactions on Machine Learning Research (TMLR), 2022. [pdf] [arxiv]

    • definition of emergent abilities of LLMs: An ability is emergent if it is not present in smaller models but is present in larger models.

    • few-shot prompting (in-context learning ability)

    • augmented prompting strategies (CoT prompting, instruction following without exemplars/demonstrations and so on)
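Few-shot prompting as listed above amounts to plain prompt construction: demonstrations are concatenated before the query and the model continues the pattern. A minimal sketch (the `Input:`/`Output:` template is illustrative, not a fixed standard):

```python
def build_few_shot_prompt(demonstrations, query):
    """Concatenate input-output demonstrations before the query: the basic
    format of few-shot prompting / in-context learning."""
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in demonstrations]
    blocks.append(f"Input: {query}\nOutput:")  # model continues from here
    return "\n\n".join(blocks)

demos = [("great movie!", "positive"), ("terrible plot.", "negative")]
prompt = build_few_shot_prompt(demos, "what a fantastic film!")
# The model is expected to continue the pattern and emit the label.
```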

  • Toolformer: Language Models Can Teach Themselves to Use Tools.

    Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom. arxiv, 2023. [pdf] [arxiv]

  • Tool Learning with Foundation Models.

    Yujia Qin, Shengding Hu, Yankai Lin, Weize Chen, Ning Ding, Ganqu Cui, Zheni Zeng, Yufei Huang, Chaojun Xiao, Chi Han, Yi Ren Fung, Yusheng Su, Huadong Wang, Cheng Qian, Runchu Tian, Kunlun Zhu, Shihao Liang, Xingyu Shen, Bokai Xu, Zhen Zhang, Yining Ye, Bowen Li, Ziwei Tang, Jing Yi, Yuzhang Zhu, Zhenning Dai, Lan Yan, Xin Cong, Yaxi Lu, Weilin Zhao, Yuxiang Huang, Junxi Yan, Xu Han, Xian Sun, Dahai Li, Jason Phang, Cheng Yang, Tongshuang Wu, Heng Ji, Zhiyuan Liu, Maosong Sun. arxiv, 2023. [pdf] [arxiv] [project]

  • GLM: General Language Model Pretraining with Autoregressive Blank Infilling.

    Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, Jie Tang. ACL, 2022. [pdf] [acl] [arxiv] [project]

    GLM is a General Language Model pretrained with an autoregressive blank-filling objective and can be finetuned on various natural language understanding and generation tasks.

This resembles the permutation language modeling objective in XLNet. (Zhilin Yang is a co-first author of XLNet.)

  • GLM-130B: An Open Bilingual Pre-trained Model.

    Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, Ming Ding, Zhuoyi Yang, Yifan Xu, Wendi Zheng, Xiao Xia, Weng Lam Tam, Zixuan Ma, Yufei Xue, Jidong Zhai, Wenguang Chen, Peng Zhang, Yuxiao Dong, Jie Tang. ICLR, 2023. [pdf] [arxiv] [project]

GLM as backbone. A bilingual (English and Chinese) pre-trained language model with 130 billion parameters from Tsinghua and Zhipu. They released ChatGLM-6B in March 2023. ChatGLM-6B is an open bilingual language model based on the General Language Model (GLM) framework, with 6.2 billion parameters. Related information about ChatGLM: [blog] [project]

  • LLaMA: Open and Efficient Foundation Language Models.

    Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample. arxiv, 2023. [pdf] [arxiv] [project]

    A series of LLMs (7B, 13B, 33B, 65B) from Meta AI.

  • LLaMA 2.

    Hugo Touvron et al. 2023. [pdf] [homepage] [project]

  • PaLM 2 Technical Report.

    Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson and many authors. arxiv, 2023. [pdf] [arxiv]

  • RWKV: Reinventing RNNs for the Transformer Era.

    Bo Peng, Eric Alcaide, Quentin Anthony, Alon Albalak, Samuel Arcadinho, Huanqi Cao, Xin Cheng, Michael Chung, Matteo Grella, Kranthi Kiran GV, Xuzheng He, Haowen Hou, Przemyslaw Kazienko, Jan Kocon, Jiaming Kong, Bartlomiej Koptyra, Hayden Lau, Krishna Sri Ipsit Mantri, Ferdinand Mom, Atsushi Saito, Xiangru Tang, Bolun Wang, Johan S. Wind, Stansilaw Wozniak, Ruichong Zhang, Zhenyuan Zhang, Qihang Zhao, Peng Zhou, Jian Zhu, Rui-Jie Zhu. arxiv, 2023. [pdf] [arxiv] [project]

    RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable).

  • Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond.

    Jingfeng Yang, Hongye Jin, Ruixiang Tang, Xiaotian Han, Qizhang Feng, Haoming Jiang, Bing Yin, Xia Hu. arxiv, 2023. [pdf] [arxiv] [project]

The authors also provide a curated list of practical guides and resources for LLMs.

The paper presents an evolutionary tree of modern LLMs. (Its taxonomy and nomenclature can be confusing at times, so I do not reproduce the tree diagram here.)

Moreover, they build a decision flow for choosing between LLMs and fine-tuned models for users' NLP applications. The decision flow helps users assess whether their downstream NLP application meets specific conditions and, based on that evaluation, decide whether an LLM or a fine-tuned model is the more suitable choice.

Parameter-Efficient Fine-Tuning / Other Efficient Tuning or Inferring Methods 🛩

  • LoRA: Low-Rank Adaptation of Large Language Models.

    Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen. arxiv, 2021. [pdf] [arxiv]

    A low-rank approximation method for parameter-efficient fine-tuning.

$$ W' = W + \Delta W, \quad W \in \mathbb{R}^{d \times d} \notag \\ \Delta W = A B, \quad A \in \mathbb{R}^{d \times r},\ B \in \mathbb{R}^{r \times d},\ r \ll d \notag $$
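A minimal numpy sketch of the low-rank update above (shapes and names are illustrative; the real method applies the update inside attention weight matrices and trains only A and B):

```python
import numpy as np

d, r = 16, 2                          # hidden size d, low rank r << d
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))           # frozen pretrained weight
A = rng.normal(size=(d, r)) * 0.01    # trainable down-projection
B = np.zeros((r, d))                  # trainable up-projection, zero-initialized
                                      # so the update is zero at the start of tuning
delta_W = A @ B                       # low-rank update, rank <= r
W_adapted = W + delta_W               # effective weight W' = W + Delta W

# Only 2*d*r parameters are trainable instead of d*d.
```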

  • Towards a Unified View of Parameter-Efficient Transfer Learning.

    Junxian He, Chunting Zhou, Xuezhe Ma, Taylor Berg-Kirkpatrick, Graham Neubig. ICLR, 2022. [pdf] [arxiv] [project]

It provides a unified view of three representative PEFT methods (adapters, prefix tuning, and LoRA). The authors also propose three variants (Parallel Adapter, Multi-head Parallel Adapter, and Scaled Parallel Adapter) and conduct further experiments and analysis. Insightful.

  • QLoRA: Efficient Finetuning of Quantized LLMs.

    Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer. arxiv, 2023. [pdf] [arxiv] [project]

Model Architecture / Training and Tuning / Decoding 🚋

  • Attention is All You Need.

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin. NIPS, 2017. [pdf] [arxiv]

    Transformer, a model architecture.

  • Modeling Relational Data with Graph Convolutional Networks.

    Michael Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, Max Welling. European Semantic Web Conference (ESWC), 2018. [pdf] [arxiv]

    Relational GCN.

  • NeuroLogic Decoding: (Un)supervised Neural Text Generation with Predicate Logic Constraints.

    Ximing Lu, Peter West, Rowan Zellers, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi. NAACL, 2021. [pdf] [acl] [arxiv]

  • NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics.

    Ximing Lu, Sean Welleck, Peter West, Liwei Jiang, Jungo Kasai, Daniel Khashabi, Ronan Le Bras, Lianhui Qin, Youngjae Yu, Rowan Zellers, Noah A. Smith, Yejin Choi. NAACL, 2022. [pdf] [acl] [arxiv]

Information Extraction / Script Learning 🛴

  • A survey of script learning.

    Yi Han, Linbo Qiao, Jianming Zheng, Hefeng Wu, Dongsheng Li, Xiangke Liao. Frontiers of Information Technology & Electronic Engineering, 2021. [pdf] [journal website]

  • What Happens Next? Event Prediction Using a Compositional Neural Network Model.

    Mark Granroth-Wilding, Stephen Clark. AAAI, 2016. [pdf] [aaai]

  • Multi-Relational Script Learning for Discourse Relations.

    I-Ta Lee, Dan Goldwasser. ACL, 2019. [pdf] [acl] [project]

  • Weakly-Supervised Modeling of Contextualized Event Embedding for Discourse Relations.

    I-Ta Lee, Maria Leonor Pacheco, Dan Goldwasser. Findings of EMNLP, 2020. [pdf] [acl] [project]

  • Modeling Human Mental States with an Entity-based Narrative Graph.

    I-Ta Lee, Maria Leonor Pacheco, Dan Goldwasser. NAACL, 2021. [pdf] [acl] [arxiv] [project]

  • A Survey on Deep Learning Event Extraction: Approaches and Applications.

    Qian Li, Jianxin Li, Jiawei Sheng, Shiyao Cui, Jia Wu, Yiming Hei, Hao Peng, Shu Guo, Lihong Wang, Amin Beheshti, Philip S. Yu. IEEE TNNLS, 2021. [pdf-v6-2022.11] [pdf-v1-2021.07] [arxiv]

  • Open Domain Event Extraction Using Neural Latent Variable Models.

    Xiao Liu, Heyan Huang, Yue Zhang. ACL, 2019. [pdf] [acl] [arxiv] [project]

  • Entity, Relation, and Event Extraction with Contextualized Span Representations.

    David Wadden, Ulme Wennberg, Yi Luan, Hannaneh Hajishirzi. EMNLP-IJCNLP, 2019. [pdf] [acl] [project]

    DyGIE++.

  • Event Extraction as Machine Reading Comprehension.

    Jian Liu, Yubo Chen, Kang Liu, Wei Bi, Xiaojiang Liu. EMNLP, 2020. [pdf] [acl] [project]

  • A Joint Neural Model for Information Extraction with Global Features.

    Ying Lin, Heng Ji, Fei Huang, Lingfei Wu. ACL, 2020. [pdf] [acl] [homepage]

    OneIE.

  • Text2Event: Controllable Sequence-to-Structure Generation for End-to-end Event Extraction.

    Yaojie Lu, Hongyu Lin, Jin Xu, Xianpei Han, Jialong Tang, Annan Li, Le Sun, Meng Liao, Shaoyi Chen. ACL-IJCNLP, 2021. [pdf] [acl] [arxiv] [project]

    Formulate event extraction as a sequence-to-structure generation task.

  • Unified Structure Generation for Universal Information Extraction.

    Yaojie Lu, Qing Liu, Dai Dai, Xinyan Xiao, Hongyu Lin, Xianpei Han, Le Sun, Hua Wu. ACL, 2022. [pdf] [acl] [arxiv] [project]

Namely UIE. Frames four information extraction tasks (entity, relation, event, and structured sentiment extraction) as a unified structure generation task. In PaddleNLP from Baidu Inc., UIE is implemented in two ways:

    • extraction (prompt + MRC): an ERNIE model with two linear layers on top of the hidden states to compute start_prob and end_prob. [implementation]

    • sequence-to-structure: the approach in the original paper. [implementation]
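The extraction-style route can be sketched as a per-token span-probability head (a hypothetical numpy sketch; shapes and names are not PaddleNLP's actual implementation):

```python
import numpy as np

def span_probs(hidden_states, w_start, w_end):
    """Two linear layers over the encoder's hidden states, each followed
    by a sigmoid, giving per-token start/end probabilities for spans."""
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))
    start_prob = sigmoid(hidden_states @ w_start)  # (seq_len,)
    end_prob = sigmoid(hidden_states @ w_end)      # (seq_len,)
    return start_prob, end_prob

rng = np.random.default_rng(0)
h = rng.normal(size=(8, 32))                       # seq_len=8, hidden=32
start_prob, end_prob = span_probs(h, rng.normal(size=32), rng.normal(size=32))
# A span (i, j) is extracted when start_prob[i] and end_prob[j]
# both exceed a threshold, with j >= i.
```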

  • Zero-Shot Information Extraction via Chatting with ChatGPT.

    Xiang Wei, Xingyu Cui, Ning Cheng, Xiaobin Wang, Xin Zhang, Shen Huang, Pengjun Xie, Jinan Xu, Yufeng Chen, Meishan Zhang, Yong Jiang, Wenjuan Han. arxiv, 2023. [pdf] [arxiv]

This paper transforms the zero-shot IE task into a multi-turn QA problem with a two-stage framework named ChatIE (based on ChatGPT). Experiments are conducted on RE, NER, and EE tasks across two languages (English and Chinese).
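The two-stage multi-turn scheme can be sketched with a placeholder chat function (everything below is a hypothetical stand-in for the actual ChatIE prompt templates and a ChatGPT call):

```python
def chat_ie(chat, text, element_types, slot_questions):
    """Two-stage sketch of ChatIE-style zero-shot IE.
    Stage 1: ask which element types occur in the text.
    Stage 2: for each detected type, ask follow-up questions to fill its slots.
    `chat` is any callable mapping a prompt string to a reply string."""
    present = [t for t in element_types
               if "yes" in chat(f"Does the text mention a {t}? Text: {text}").lower()]
    return {t: [chat(f"{q} Text: {text}") for q in slot_questions[t]]
            for t in present}
```

In practice the second stage reuses the chat history from the first, which is what makes the framework multi-turn rather than two independent queries.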

  • InstructUIE: Multi-task Instruction Tuning for Unified Information Extraction.

    Xiao Wang, Weikang Zhou, Can Zu, Han Xia, Tianze Chen, Yuansen Zhang, Rui Zheng, Junjie Ye, Qi Zhang, Tao Gui, Jihua Kang, Jingsheng Yang, Siyuan Li, Chunsai Du. arxiv, 2023. [pdf] [arxiv] [project]

    Flan-T5 (11B) as backbone.

  • SPEECH: Structured Prediction with Energy-Based Event-Centric Hyperspheres.

    Shumin Deng, Shengyu Mao, Ningyu Zhang, Bryan Hooi. ACL, 2023. [pdf] [arxiv] [slides] [project]

    SPEECH is proposed to address event-centric structured prediction with energy-based hyperspheres. SPEECH models complex dependency among event structured components with energy-based modeling, and represents event classes with simple but effective hyperspheres.

Event Relation Extraction (Identification or Classification) 🌱

  • Challenges of Adding Causation to Richer Event Descriptions.

    Rei Ikuta, Will Styler, Mariah Hamang, Tim O’Gorman, Martha Palmer. Second Workshop on EVENTS: Definition, Detection, Coreference, and Representation, 2014. [pdf] [acl]

  • Modeling Document-level Causal Structures for Event Causal Relation Identification.

    Lei Gao, Prafulla Kumar Choubey, Ruihong Huang. NAACL, 2019. [pdf] [acl]

  • Graph Convolutional Networks for Event Causality Identification with Rich Document-level Structures.

    Minh Tran Phu, Thien Huu Nguyen. NAACL, 2021. [pdf] [acl]

  • MAVEN-ERE: A Unified Large-scale Dataset for Event Coreference, Temporal, Causal, and Subevent Relation Extraction.

Xiaozhi Wang, Yulin Chen, Ning Ding, Hao Peng, Zimu Wang, Yankai Lin, Xu Han, Lei Hou, Juanzi Li, Zhiyuan Liu, Peng Li, Jie Zhou. EMNLP, 2022. [pdf] [acl] [arxiv] [project]

Event Type Induction / Event Schema Induction 🚲

  • The Future is not One-dimensional: Complex Event Schema Induction by Graph Modeling for Event Prediction.

    Manling Li, Sha Li, Zhenhailong Wang, Lifu Huang, Kyunghyun Cho, Heng Ji, Jiawei Han, Clare Voss. EMNLP, 2021. [pdf] [acl] [arxiv] [project] [note]

(no code, only data)

  • Corpus-based Open-Domain Event Type Induction.

    Jiaming Shen, Yunyi Zhang, Heng Ji, Jiawei Han. EMNLP, 2021. [pdf] [acl] [arxiv] [project]

  • Harvesting Event Schemas from Large Language Models.

    Jialong Tang, Hongyu Lin, Zhuoqun Li, Yaojie Lu, Xianpei Han, Le Sun. arxiv, 2023. [pdf] [arxiv] [project] [note]

  • Open-Domain Hierarchical Event Schema Induction by Incremental Prompting and Verification.

Sha Li, Ruining Zhao, Manling Li, Heng Ji, Chris Callison-Burch and Jiawei Han. ACL, 2023. [pdf] [copy from upenn] [copy from uiuc]

Language Acquisition 📚

  • Word Acquisition in Neural Language Models.

    Tyler A. Chang, Benjamin K. Bergen. TACL, 2022. [pdf] [acl] [arxiv] [project]

  • TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

    Ronen Eldan, Yuanzhi Li. arxiv, 2023. [pdf] [arxiv]

Graph Learning 🗺

  • GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models.

    Jiaxuan You, Rex Ying, Xiang Ren, William Hamilton, Jure Leskovec. ICML, 2018. [pdf] [icml] [arxiv] [project]

Causal Inference 👌🏻

  • Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond.

    Amir Feder, Katherine A. Keith, Emaad Manzoor, Reid Pryzant, Dhanya Sridhar, Zach Wood-Doughty, Jacob Eisenstein, Justin Grimmer, Roi Reichart, Margaret E. Roberts, Brandon M. Stewart, Victor Veitch, Diyi Yang. TACL, 2022. [pdf] [acl] [arxiv] [mit archive]

  • Learning Causal Effects on Hypergraphs.

    Jing Ma, Mengting Wan, Longqi Yang, Jundong Li, Brent Hecht, Jaime Teevan. KDD, 2022. [pdf] [arxiv]

Video / Audio / Multimodal 🎞

  • Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models.

Chenfei Wu, Shengming Yin, Weizhen Qi, Xiaodong Wang, Zecheng Tang, Nan Duan. arxiv, 2023. [pdf] [arxiv] [project]

Linguistics 📜

  • Causal relatedness and importance of story events.

Tom Trabasso, Linda L. Sperry. Journal of Memory and Language, 1985. [pdf]

Narrative abilities of Mandarin-speaking children with and without specific language impairment: macrostructure and microstructure.

    Pao-Chuan Torng, Wen-Hui Sah. Clinical Linguistics & Phonetics. 2019. [pdf]

  • The development of coherence in narratives: causal relations.

Wen-Hui Sah. Pacific Asia Conference on Language, Information and Computation (PACLIC). 2013. [pdf] [acl]

Courses and Tutorials / Tools 🔍

  • Retrieval-based Language Models and Applications.

    Akari Asai, Sewon Min, Zexuan Zhong, Danqi Chen. ACL 2023 Tutorial. [homepage]

  • Introduction to Large Language Models.

    Qi Zhang, Tao Gui, Rui Zheng, Xuanjing Huang. online resource, 2023. [homepage] [github]

  • Introduction to Natural Language Processing.

    Qi Zhang, Tao Gui, Xuanjing Huang. online resource, 2023. [pdf] [homepage] [github]

  • Natural Language Processing with Transformers.

    Lewis Tunstall, Leandro von Werra, Thomas Wolf. Published by O'Reilly, 2022. [pdf] [project]

CS 224N (23') Lecture 10: Natural Language Generation.

    Xiang Lisa Li. 2023. [slides] [slides on-line]

CS 224N (23') Lecture 11: Prompting, Reinforcement Learning from Human Feedback.

    Jesse Mu. 2023. [slides] [slides on-line] [note]

    • zero-shot and few-shot prompting (in-context learning)

    • instruction fine-tuning

    • reinforcement learning from human feedback (RLHF)

  • Lecture: Graph Convolutional Neural Networks.

    Huawei Shen (ICT, CAS). 2021. [slides] [video]

  • YEDDA: A Lightweight Collaborative Text Span Annotation Tool.

    Jie Yang, Yue Zhang, Linwei Li, Xingxuan Li. ACL 2018, System Demonstrations. [pdf] [acl] [arxiv] [project]

YEDDA (SUTDAnnotator) is developed for annotating chunks/entities/events in text. It supports shortcut annotation. The tool is built on the tkinter package in Python.
