Awesome-Papers

❓ Objective of `jinmang2/Awesome-Papers` Repo.

💡 To become an AI researcher, an artist, and a good person...!!

2021 Papers to Read

Learning to Learn without Gradient Descent by Gradient Descent
Massively Multitask Networks for Drug Discovery
One-Shot Imitation Learning
Few-Shot Autoregressive Density Estimation: Towards Learning to Learn Distributions
Meta-Learning for Low-Resource Neural Machine Translation
Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics
SYNTHESIZER: Rethinking Self-Attention in Transformer Models
Fine-tune BERT for Extractive Summarization
ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations
Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation

2020 Papers Read

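Papers I only skimmed are not listed.
Only papers that I read in full, and then researched further on my own, are listed.
For example, word2vec is not listed: I know the concept, but I have not taken the paper itself apart.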

Reinforcement Learning

Asynchronous Methods for Deep Reinforcement Learning
- A3C, DeepMind & Montreal
Continuous Control With Deep Reinforcement Learning
- DDPG, DQN+DPG, Replay Buffer, Soft-Update via Polyak Averaging, Ornstein-Uhlenbeck Process, White Gaussian Random Process, DeepMind
Deterministic Policy Gradient Algorithms
- DeepMind, Policy Gradient, Actor-Critic, Deterministic Policy
Policy Gradient Methods for Reinforcement Learning with Function Approximation
- Compatible Function Approximation, Policy Gradient, Sutton
Approximately Optimal Approximate Reinforcement Learning
- Kakade & Langford, Mixture Policy, Policy Improvement
Trust Region Policy Optimization
- Trust Region, Natural Policy, Kakade & Langford Thm, Policy Improvement, OpenAI
Proximal Policy Optimization Algorithms
- OpenAI, Practical TRPO, Clipped Surrogate Objective
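Two of the keywords above reduce to a few lines each: DDPG's soft target update (Polyak averaging, θ′ ← τθ + (1 − τ)θ′) and PPO's clipped surrogate objective, E[min(r·A, clip(r, 1 − ε, 1 + ε)·A)] with r = π_new(a|s) / π_old(a|s). The NumPy sketch below assumes parameters are plain arrays and that log-probabilities and advantages are already computed; the defaults τ = 0.001 and ε = 0.2 are the values reported in the DDPG and PPO papers, while the function names and shapes are illustrative.

```python
import numpy as np


def soft_update(target_params, online_params, tau=0.001):
    """DDPG soft update (Polyak averaging): theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target_params, online_params)]


def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate objective (to be maximized), averaged over samples."""
    ratio = np.exp(logp_new - logp_old)  # r = pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # The minimum is a pessimistic bound: no extra credit for pushing r outside [1-eps, 1+eps]
    return np.mean(np.minimum(unclipped, clipped))


if __name__ == "__main__":
    target = [np.zeros(3)]
    online = [np.ones(3)]
    print(soft_update(target, online, tau=0.1)[0])  # each entry moves 10% toward the online value

    logp_old = np.zeros(2)
    logp_new = np.log(np.array([2.0, 2.0]))  # ratio r = 2 for both samples
    adv = np.array([1.0, -1.0])
    print(ppo_clip_objective(logp_new, logp_old, adv))  # mean of [1.2, -2.0]
```

For a negative advantage the minimum keeps the unclipped term, so a ratio that grew too large is still fully penalized; only the "improvement" side is capped.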

Meta-Learning

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
- MAML, Optimization-Based Meta-Learning

NLP

Efficient Estimation of Word Representations in Vector Space
- Word2Vec, CBOW, Skip-Gram
Distributed Representations of Words and Phrases and their Compositionality
- Improved vector representation quality, Subsampling of frequent words, Negative Sampling, Hierarchical Softmax
Deep contextualized word representations
- ELMo, Feature-Based, Pre-ELMo + Linear Combination, SubWord Information by ConvNet
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Transformer's Encoder, MLM, NSP
Neural Machine Translation By Jointly Learning to Align and Translate
- GRU, Seq2Seq with Attention, Bahdanau Attention
Attention Is All You Need
- Transformer, Scaled Dot-Product Attention, Seq2Seq
Advances in Pre-Training Distributed Word Representations
- FastText
Enriching Word Vectors with Subword Information
- FastText
Minimum Risk Training for Neural Machine Translation
- MRT, NMT
Bag of Tricks for Efficient Text Classification
- FastText for Text Classification, Fast!
A Fast and Accurate Dependency Parser using Neural Networks
- Parsing
MaltParser: A Data-Driven Parser-Generator for Dependency Parsing
- Parsing
Incrementality in Deterministic Dependency Parsing
- Parsing
A Neural Probabilistic Language Model
- NPLM
Universal Language Model Fine-tuning for Text Classification
- ULMFiT, Fine-Tuning
The Natural Language Decathlon: Multitask Learning as Question Answering
- MultiTask Learning, anti-curriculum learning
Phrase-Based & Neural Unsupervised Machine Translation
- Initialization, Language Modeling, Back-Translation
A Structured Self-Attentive Sentence Embedding
- Self-Attentive
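The subsampling trick from *Distributed Representations of Words and Phrases and their Compositionality* discards each occurrence of a word w with probability P(w) = 1 − √(t / f(w)), where f(w) is the word's relative frequency and t is a threshold (around 10⁻⁵ in the paper). A minimal Python sketch, assuming a pre-tokenized corpus; the function names are hypothetical, and probabilities are clipped at 0 for words rarer than t.

```python
import math
import random
from collections import Counter


def discard_probabilities(tokens, t=1e-5):
    """P(discard w) = 1 - sqrt(t / f(w)); words with f(w) <= t are never discarded."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: max(0.0, 1.0 - math.sqrt(t / (c / total))) for w, c in counts.items()}


def subsample(tokens, t=1e-5, seed=0):
    """Randomly drop occurrences of frequent words before building training pairs."""
    rng = random.Random(seed)
    p_discard = discard_probabilities(tokens, t)
    return [w for w in tokens if rng.random() >= p_discard[w]]


if __name__ == "__main__":
    corpus = ["the"] * 90 + ["cat"] * 10
    print(discard_probabilities(corpus, t=0.01))  # "the" is discarded far more often than "cat"
    print(len(subsample(corpus, t=0.01)))
```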

Graph

Graph Attention Networks
- GNN, Attention
MAGNET: Multi-Label Text Classification using Attention-based Graph Neural Network
- GAT, MLTC

Conversational AI

Memory Networks
End-To-End Memory Networks
Learning Through Dialogue Interactions By Asking Questions
Hierarchical Attention Networks for Document Classification
Conversational Decision-Making Model for Predicting the King's Decision in the Annals of the Joseon Dynasty

Fundamental

Decoupled Neural Interfaces using Synthetic Gradients
Decoupled Weight Decay Regularization
Neural Network Ensembles, Cross Validation, and Active Learning
Sharp Minima Can Generalize For Deep Nets
Long short-term memory
Highway Networks
Recurrent Highway Networks

ETC

LSTM-SAE Unsupervised Pre-training of a Deep LSTM-based Stacked Autoencoder for Multivariate Time Series Forecasting Problems
C3D Learning Spatiotemporal Features with 3D Convolutional Networks

🏢 NLP

Tokenization

BPE (Byte-Pair Encoding); A New Algorithm for Data Compression (C Users Journal, 1994) paper
- In Wikipedia
BPE applied to NMT; Neural Machine Translation of Rare Words with Subword Units (ACL 2016) paper
- Compares n-gram and byte-pair-encoding segmentation
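The merge-learning loop of BPE for NMT is short enough to sketch: start from words split into characters plus an end-of-word marker, repeatedly count adjacent symbol pairs (weighted by word frequency), and merge the most frequent pair. The sketch below follows that description; the `</w>` marker and the toy vocabulary are illustrative, and ties are broken by first occurrence, which other implementations may handle differently.

```python
import re
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every standalone occurrence of the pair 'a b' with the merged symbol 'ab'."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(lambda _: merged, word): freq for word, freq in vocab.items()}


def learn_bpe(vocab, num_merges):
    """Greedily learn `num_merges` merge operations; ties go to the pair seen first."""
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


if __name__ == "__main__":
    # Toy vocabulary: space-separated symbols plus an end-of-word marker, mapped to counts
    vocab = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6, "w i d e s t </w>": 3}
    merges, vocab = learn_bpe(vocab, num_merges=3)
    print(merges)  # [('e', 's'), ('es', 't'), ('est', '</w>')]
    print(vocab)
```

Applying the learned merges in order to a new word segments it into subword units, which is how rare and unseen words get a representation.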

WordPiece

SentencePiece

Morphological

Word Vector Representation

NLP Tasks

A large annotated corpus for learning natural language inference, Bowman et al., 2015 (EMNLP)

A broad-coverage challenge corpus for sentence understanding through inference, Williams et al., 2018

SQuAD: 100,000+ questions for machine comprehension of text, Rajpurkar et al., 2016

Introduction to the CoNLL-2003 shared task: language-independent named entity recognition, Tjong Kim Sang and De Meulder, 2003

Dependency Parsing

Incrementality in Deterministic Dependency Parsing (ACL, 2003) paper
MaltParser: A Data-Driven Parser-Generator for Dependency Parsing (LREC, 2005) paper
A Fast and Accurate Dependency Parser using Neural Networks (EMNLP, 2014) paper

Neural Machine Translation

MRT(Minimum Risk Training); Minimum Risk Training for Neural Machine Translation (ACL 2016) paper

Text Classification

FastText for classification; Bag of Tricks for Efficient Text Classification (EACL 2017) link
ULMFiT; Universal Language Model Fine-tuning for Text Classification (18.05.23, arxiv) paper

Question Answering

Stochastic Answer Networks for Machine Reading Comprehension https://arxiv.org/abs/1712.03556

Textual Entailment

Enhanced LSTM for Natural Language Inference https://arxiv.org/abs/1609.06038

Semantic Role Labeling

Deep Semantic Role Labeling: What Works and What’s Next https://www.aclweb.org/anthology/P17-1044/

Summarization

Extractive

BertSum; Fine-tune BERT for Extractive Summarization (19.03.25, arxiv) paper
BertSum-Full Paper; Text Summarization with Pretrained Encoders (19.08.22, arxiv) paper

Pre-trained NLP Architecture

Semi-supervised sequence learning (NIPS 2015) paper

Word Representations: A Simple and General Method for Semi-Supervised Learning

Institute	Model	Title	Venue	Published	Link
AllenAI	ELMo	Deep contextualized word representations	NAACL	2018	paper
AllenAI	Longformer	Longformer: The Long-Document Transformer	arxiv	20.04.10	paper
GoogleAI	BERT	BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding	NAACL	2019	paper
GoogleAI	ALBERT	ALBERT: A Lite BERT for Self-supervised Learning of Language Representations	ICLR	19.09.26	paper
GoogleAI	T5	Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer	JMLR	19.10.23	paper
GoogleAI	PEGASUS	PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization	ICML	2020	paper
GoogleAI	ELECTRA	ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators	ICLR	2020	paper
DeepMind	Compressive Transformers	Compressive Transformers for Long-Range Sequence Modelling	arxiv	19.11.13	paper
UNC Chapel Hill	LXMERT	LXMERT: Learning Cross-Modality Encoder Representations from Transformers	arxiv	19.08.20	paper
OpenAI	GPT-1	Improving language understanding with unsupervised learning	OpenAI	2018	paper
OpenAI	GPT-2	Language Models are Unsupervised Multitask Learners	OpenAI	2019	paper
OpenAI	GPT-3	Language Models are Few-Shot Learners	OpenAI	2020	paper
FAIR	FastText	Advances in Pre-Training Distributed Word Representations	arxiv	17.12.26	paper
FAIR	XLM	Cross-lingual Language Model Pretraining	arxiv	19.01.22	paper
FAIR	FSMT	Facebook FAIR's WMT19 News Translation Task Submission	arxiv	19.07.15	paper
FAIR	RoBERTa	RoBERTa: A Robustly Optimized BERT Pretraining Approach	arxiv	19.07.26	paper
FAIR	MMBT	Supervised Multimodal Bitransformers for Classifying Images and Text	arxiv	19.09.06	paper
FAIR	BART	BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension	arxiv	19.10.29	paper
FAIR	CamemBERT	CamemBERT: a Tasty French Language Model	arxiv	19.11.10	paper
FAIR	mBART	Multilingual Denoising Pre-training for Neural Machine Translation	arxiv	20.01.22	paper
FAIR	RAG	Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks	arxiv	20.05.22	paper
Hugging Face	DistilBERT	DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter	arxiv	19.10.02	paper
Microsoft	Marian	Marian: Cost-effective High-Quality Neural Machine Translation in C++	ACL	2018	paper
Microsoft	MT-DNN	Multi-Task Deep Neural Networks for Natural Language Understanding	arxiv	19.05.30	paper
Microsoft	LayoutLM	LayoutLM: Pre-training of Text and Layout for Document Image Understanding	arxiv	19.12.31	paper
NVIDIA	MegatronLM	Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism	arxiv	19.09.17	paper
Univ. of Washington	Grover-Mega	Defending Against Neural Fake News	arxiv	19.10.29	paper
Carnegie Mellon & Google Brain	Transformer-XL	Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context	arxiv	19.06.02	paper
Carnegie Mellon & Google Brain	XLNet	XLNet: Generalized Autoregressive Pretraining for Language Understanding	arxiv	19.06.19	paper
Carnegie Mellon & Google Brain	Funnel	Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing	arxiv	20.06.05	paper
Salesforce	CTRL	CTRL: A Conditional Transformer Language Model for Controllable Generation	arxiv	19.09.11	paper
Anonymous authors	MobileBERT	MobileBERT: Task-Agnostic Compression of BERT by Progressive Knowledge Transfer	ICLR	2020	paper

✨ Attention Mechanism

Bahdanau Attention; Neural Machine Translation by Jointly Learning to Align and Translate (ICLR 2015) paper
Multi-Head Attention; Attention Is All You Need (NIPS 2017) paper
Google Research-Synthesizer; SYNTHESIZER: Rethinking Self-Attention in Transformer Models (20.05.02, arxiv) paper
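For reference, the Transformer's scaled dot-product attention computes softmax(QKᵀ / √d_k)·V (Bahdanau attention instead scores each query-key pair with a small feed-forward network). A minimal NumPy sketch of the scaled dot-product form, assuming unbatched 2-D inputs and a boolean mask where `True` means "may attend"; multi-head attention runs several of these in parallel on learned projections of Q, K and V, which are omitted here.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V, mask=None):
    """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v) -> output (n_q, d_v), weights (n_q, n_k)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # scaling keeps logits from growing with d_k
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # blocked positions get ~0 weight after softmax
    weights = softmax(scores, axis=-1)
    return weights @ V, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q, K, V = rng.normal(size=(3, 4)), rng.normal(size=(3, 4)), rng.normal(size=(3, 2))
    causal = np.tril(np.ones((3, 3), dtype=bool))  # position i attends to positions <= i
    out, w = scaled_dot_product_attention(Q, K, V, mask=causal)
    print(out.shape, w.round(3))
```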

💆 Conversational AI

Memory-Based Research

Following the research of Sumit Chopra and Jason Weston
Memory Networks (14.10.15, arxiv; ICLR 2015) paper
End-To-End Memory Networks (NIPS 2015) paper
Learning Through Dialogue Interactions By Asking Questions (16.12.15, ICLR 2017) paper

Open-Domain

Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index, ACL
REALM by Kelvin Guu et al., ICML
DPR; Dense Passage Retrieval for Open-Domain Question Answering (20.04.10) paper
- Introductory material by Huffon

🎨 Generative Model

GAN

Original GAN; Generative Adversarial Nets (NIPS 2014) paper
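For reference, the two-player minimax game from the GAN paper, where D(x) is the discriminator's probability that x is real, G maps noise z ~ p_z to samples, and p_g is the distribution of G(z). For a fixed G the optimal discriminator has the closed form on the second line; in practice the paper suggests training G to maximize log D(G(z)) instead of minimizing log(1 − D(G(z))), which gives stronger gradients early in training.

```latex
\begin{aligned}
\min_G \max_D V(D, G) &= \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right] \\
D^{*}_{G}(x) &= \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}
\end{aligned}
```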

🐵 Meta Learning

MAML; Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (ICML 2017) paper
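MAML's core idea is to differentiate the post-adaptation loss through the inner gradient step. The sketch below makes that derivative explicit on a deliberately tiny, hypothetical task family (1-D regression y = a·x with a random slope a per task) and a scalar linear model ŷ = w·x, so the gradient and Hessian can be written by hand and the second-order factor (1 − αH) is exact. The paper itself uses neural networks with automatic differentiation; all names and hyperparameters here are illustrative.

```python
import numpy as np


def loss(w, x, y):
    """Mean squared error of the scalar linear model y_hat = w * x."""
    return np.mean((w * x - y) ** 2)


def grad(w, x, y):
    """dL/dw for the loss above."""
    return 2.0 * np.mean(x * (w * x - y))


def hess(x):
    """d^2L/dw^2, which does not depend on w for this model."""
    return 2.0 * np.mean(x ** 2)


def maml_meta_gradient(w, task, alpha):
    """d/dw of L_test(w - alpha * grad_train(w)), including the second-order term."""
    x_tr, y_tr, x_te, y_te = task
    w_adapted = w - alpha * grad(w, x_tr, y_tr)  # inner-loop adaptation step
    return grad(w_adapted, x_te, y_te) * (1.0 - alpha * hess(x_tr))  # chain rule


def sample_task(rng, n=10):
    """Hypothetical task family: y = a * x with a ~ U(-2, 2), split into train/test halves."""
    a = rng.uniform(-2.0, 2.0)
    x = rng.uniform(-1.0, 1.0, size=2 * n)
    return x[:n], a * x[:n], x[n:], a * x[n:]


def maml_train(steps=500, alpha=0.1, beta=0.05, tasks_per_batch=5, seed=0):
    """Outer loop: SGD on the meta-initialization w using averaged meta-gradients."""
    rng = np.random.default_rng(seed)
    w = 0.0
    for _ in range(steps):
        meta_grad = np.mean([maml_meta_gradient(w, sample_task(rng), alpha)
                             for _ in range(tasks_per_batch)])
        w -= beta * meta_grad
    return w
```

Dropping the (1 − αH) factor gives the first-order approximation discussed in the paper, which avoids second derivatives entirely.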

Curiosity Algorithms

Road to General Intelligence

AutoML Style Approach
- Neural Architecture Search (NAS)
- Hyperparameter optimization for deep networks
- Auto-sklearn, Learning loss functions to replace cross-entropy for training a fixed architecture on MNIST and CIFAR
Meta-learning with genetic programming, evolutionary computing
Programming Automation
- Searching over mathematical operations within neural networks
- Neural networks that learn programs
Modular Meta-Learning / Hierarchical Meta-Learning, Reinforcement Learning
Inspired by Cognitive/Brain Science (Attention, Curiosity, Common Sense, etc.)
Agent57 (DeepMind)

🧠 Reinforcement Learning

Policy Gradient Theorem; Policy Gradient Methods for Reinforcement Learning with Function Approximation (NIPS 2000) paper
Deterministic Policy Gradient Algorithms
Continuous Control with Deep Reinforcement Learning
Approximately Optimal Approximate Reinforcement Learning
Trust Region Policy Optimization
Proximal Policy Optimization Algorithms

RL.start() "Today's Paper" series

ACCELERATED METHODS FOR DEEP REINFORCEMENT LEARNING () paper
Implementation Matters In Deep RL () paper
CURL: Contrastive Unsupervised Representations for Reinforcement Learning () paper
Dream to Control: Learning Behaviors by Latent Imagination () paper

📈 Financial Mathematics & Engineering

🎨 Neuromorphic

🐈 Theoretical Deep Learning

Neural Network Ensembles, Cross Validation, and Active Learning (NIPS 1995) paper

Batch Normalization

Lipschitz gradient

Global Batch Normalization

Input Covariate Shift

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

How Does Batch Normalization Help Optimization?

Layer Normalization https://arxiv.org/abs/1607.06450
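The practical difference between batch and layer normalization is the axis over which statistics are computed. A minimal NumPy sketch for a (batch, features) input, assuming training-mode batch statistics for BN (the running averages BN uses at inference are omitted) and learnable scale/shift parameters `gamma` and `beta` passed in explicitly.

```python
import numpy as np


def batch_norm(x, gamma, beta, eps=1e-5):
    """BN (training mode): normalize each feature using statistics over the batch axis."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


def layer_norm(x, gamma, beta, eps=1e-5):
    """LN: normalize each example using statistics over its own features."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(loc=3.0, scale=2.0, size=(8, 4))
    gamma, beta = np.ones(4), np.zeros(4)
    print(batch_norm(x, gamma, beta).mean(axis=0).round(6))  # ~0 per feature
    print(layer_norm(x, gamma, beta).mean(axis=1).round(6))  # ~0 per example
```

Because LN never looks across the batch, one example's output does not depend on the others, which is why it suits RNNs and small or variable batch sizes.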

LeCun Initialization; Efficient BackProp

Xavier Initialization; Understanding the difficulty of training deep feedforward neural networks

He Initialization; Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
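The three initialization schemes above differ mainly in the variance of the zero-mean initial weights: LeCun uses Var(W) = 1/fan_in, Xavier/Glorot uses 2/(fan_in + fan_out), and He uses 2/fan_in to compensate for ReLU zeroing roughly half of its inputs. A minimal NumPy sketch using the normal-distribution variants (the uniform variants keep the same variance, e.g. Xavier's U(−√(6/(fan_in + fan_out)), +√(6/(fan_in + fan_out)))); the function names and the ReLU-stack experiment are illustrative.

```python
import numpy as np


def init_std(fan_in, fan_out, scheme):
    """Standard deviation of a zero-mean normal initializer for a (fan_in, fan_out) weight."""
    if scheme == "lecun":   # Var(W) = 1 / fan_in
        return np.sqrt(1.0 / fan_in)
    if scheme == "xavier":  # Var(W) = 2 / (fan_in + fan_out)
        return np.sqrt(2.0 / (fan_in + fan_out))
    if scheme == "he":      # Var(W) = 2 / fan_in, derived for ReLU networks
        return np.sqrt(2.0 / fan_in)
    raise ValueError(f"unknown scheme: {scheme}")


def init_weight(fan_in, fan_out, scheme, rng):
    return rng.normal(0.0, init_std(fan_in, fan_out, scheme), size=(fan_in, fan_out))


def mean_square_after_relu_stack(scheme, width=256, depth=10, batch=1000, seed=0):
    """Push random inputs through `depth` ReLU layers and report E[x^2] at the output."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(batch, width))
    for _ in range(depth):
        x = np.maximum(0.0, x @ init_weight(width, width, scheme, rng))
    return float(np.mean(x ** 2))


if __name__ == "__main__":
    for scheme in ("lecun", "xavier", "he"):
        print(scheme, mean_square_after_relu_stack(scheme))
```

With square layers LeCun and Xavier coincide, and the signal's second moment roughly halves at every ReLU layer; He initialization keeps it near its starting value.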

Nesterov Optimizer (optimization-related papers)
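Nesterov momentum differs from classical momentum only in where the gradient is evaluated: at the look-ahead point θ + μv rather than at θ. A minimal NumPy sketch of that update (v ← μv − η∇f(θ + μv), then θ ← θ + v), assuming a user-supplied gradient function; the quadratic test function and the hyperparameters are illustrative.

```python
import numpy as np


def nesterov_step(theta, velocity, grad_fn, lr=0.1, momentum=0.9):
    """One Nesterov momentum update; the gradient is taken at the look-ahead point."""
    lookahead = theta + momentum * velocity
    velocity = momentum * velocity - lr * grad_fn(lookahead)
    return theta + velocity, velocity


def minimize(grad_fn, theta0, steps=200, lr=0.1, momentum=0.9):
    theta = np.asarray(theta0, dtype=float)
    velocity = np.zeros_like(theta)
    for _ in range(steps):
        theta, velocity = nesterov_step(theta, velocity, grad_fn, lr, momentum)
    return theta


if __name__ == "__main__":
    curvature = np.array([1.0, 10.0])  # ill-conditioned quadratic f = 0.5 * sum(c * theta^2)
    print(minimize(lambda th: curvature * th, theta0=[1.0, 1.0]))  # close to the minimum at [0, 0]
```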

weight_standardization

😍 Schmidhuber

Juergen Schmidhuber's Google Scholar

ETC

LSTM-SAE Unsupervised Pre-training of a Deep LSTM-based Stacked Autoencoder for Multivariate Time Series Forecasting Problems

C3D Learning Spatiotemporal Features with 3D Convolutional Networks

n-gram Papers

Estimation of Probabilities from Sparse Data for the Language Model Component of a Speech Recognizer
Interpolated estimation of Markov source parameters from sparse data
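The core idea of interpolated (Jelinek-Mercer style) estimation is to mix a sparse higher-order estimate with a denser lower-order one: P(w | v) = λ·P_ML(w | v) + (1 − λ)·P_ML(w). A minimal Python sketch for bigrams, assuming a single fixed λ (in practice the weights are estimated on held-out data and may depend on the history) and falling back to the unigram estimate for unseen histories; Katz's back-off scheme from the first paper is a different technique and is not shown.

```python
from collections import Counter


def interpolated_bigram(tokens, lam=0.7):
    """Return prob(prev, word) = lam * P_ML(word | prev) + (1 - lam) * P_ML(word)."""
    unigrams = Counter(tokens)
    histories = Counter(tokens[:-1])  # how often each token is followed by another token
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)

    def prob(prev, word):
        p_uni = unigrams[word] / total
        if histories[prev] == 0:  # unseen history: use the unigram estimate alone
            return p_uni
        p_bi = bigrams[(prev, word)] / histories[prev]
        return lam * p_bi + (1.0 - lam) * p_uni

    return prob


if __name__ == "__main__":
    p = interpolated_bigram("the cat sat on the mat".split())
    print(p("the", "cat"), p("cat", "mat"))  # the unseen bigram still gets non-zero mass
```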

Pointing the Unknown Words (University of Montreal)

Seq2Seq; Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

Real-World Anomaly Detection in Surveillance Videos

Self-attention for classification; A Structured Self-Attentive Sentence Embedding
