To be an AI Researcher, Artist and Good Person...!!
- Learning to Learn without Gradient Descent by Gradient Descent
- Massively Multitask Networks for Drug Discovery
- One-Shot Imitation Learning
- Few-Shot Autoregressive Density Estimation: Towards Learning to Learn Distributions
- Meta-Learning for Low-Resource Neural Machine Translation
- Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics
- SYNTHESIZER: Rethinking Self-Attention in Transformer Models
- Fine-tune BERT for Extractive Summarization
- ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations
- Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation
- Papers I only glanced at are not listed
- Only papers that I read in full and then dug into further on my own are listed
- For example, I know the concept behind word2vec but have not taken the paper itself apart, so it is not listed
Reinforcement Learning
- Asynchronous Methods for Deep Reinforcement Learning
A3C,DeepMind & Montreal
- Continuous Control With Deep Reinforcement Learning
DDPG,DQN+DPG,Replay Buffer,Soft-Update via Polyak Averaging,Ornstein-Uhlenbeck process,White Gaussian Random process,DeepMind
- Deterministic Policy Gradient Algorithms
DeepMind,Policy Gradient,Actor-Critic,Deterministic Policy
- Policy Gradient Methods for Reinforcement Learning with Function Approximation
Compatible Function Approximation,Policy Gradient,Sutton
- Approximately Optimal Approximate Reinforcement Learning
Kakade & Langford,Mixture Policy,Policy Improvement
- Trust Region Policy Optimization
Trust Region,Natural Policy,Kakade & Langford Thm,Policy Improvement,OpenAI
- Proximal Policy Optimization Algorithms
OpenAI,Practical TRPO,Clipped Surrogate Objective
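The DDPG entry above pairs a slowly tracking target network (soft update via Polyak averaging) with Ornstein-Uhlenbeck exploration noise. A minimal sketch of both pieces, treating network weights as a flat list of floats; the helper names and the default hyperparameters (`tau=0.001`, `theta=0.15`, `sigma=0.2`, `dt=1e-2`) are illustrative assumptions, not taken from any particular library.

```python
import random


def soft_update(target, source, tau=0.001):
    """Polyak averaging of target-network weights:
    theta_target <- tau * theta_source + (1 - tau) * theta_target."""
    return [tau * s + (1.0 - tau) * t for t, s in zip(target, source)]


class OrnsteinUhlenbeckNoise:
    """Temporally correlated exploration noise (Euler-Maruyama discretization):
    x <- x + theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=None):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.rng = random.Random(seed)
        self.state = [mu] * size

    def reset(self):
        # Restart the process at its mean, e.g. at the start of each episode.
        self.state = [self.mu] * len(self.state)

    def sample(self):
        # Each coordinate drifts back toward mu and receives fresh Gaussian noise.
        self.state = [
            x + self.theta * (self.mu - x) * self.dt
            + self.sigma * (self.dt ** 0.5) * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)
```

In use, the actor's action is perturbed with `noise.sample()` while collecting data, and after each gradient step the target actor and critic are nudged toward the online networks with `soft_update`. Unlike white Gaussian noise, consecutive OU samples are correlated, which gives smoother exploration in continuous control.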
Meta-Learning
- Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
MAML,Optimization-Based Meta-Learning
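MAML, listed above as optimization-based meta-learning, learns an initialization from which a single gradient step on a new task already works well, and it meta-trains by differentiating through that inner step. The sketch below assumes a toy setting (a one-parameter linear model `y_hat = w * x` with squared error), chosen because the second derivative is a scalar, so the full second-order meta-gradient can be written out by hand; the function names are hypothetical.

```python
def loss_and_derivs(w, xs, ys):
    """Mean squared error of the 1-D model y_hat = w * x, plus dL/dw and d2L/dw2."""
    n = len(xs)
    loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
    grad = sum(2.0 * x * (w * x - y) for x, y in zip(xs, ys)) / n
    hess = sum(2.0 * x * x for x in xs) / n
    return loss, grad, hess


def maml_step(w, tasks, inner_lr=0.1, outer_lr=0.01):
    """One MAML meta-update over a batch of tasks.

    Each task is ((x_support, y_support), (x_query, y_query)).
    """
    meta_grad = 0.0
    for (xs_s, ys_s), (xs_q, ys_q) in tasks:
        # Inner loop: a single gradient step on the task's support set.
        _, g_s, h_s = loss_and_derivs(w, xs_s, ys_s)
        w_task = w - inner_lr * g_s
        # Outer loss: evaluate the adapted parameter on the query set.
        _, g_q, _ = loss_and_derivs(w_task, xs_q, ys_q)
        # Chain rule through the inner step: d w_task / d w = 1 - inner_lr * L''(w).
        meta_grad += g_q * (1.0 - inner_lr * h_s)
    return w - outer_lr * meta_grad / len(tasks)
```

With a real network the `1 - inner_lr * L''(w)` factor becomes a Hessian-vector product, which is why first-order approximations of MAML are often used in practice.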
NLP
- Efficient Estimation of Word Representations in Vector Space
Word2Vec,CBOW,Skip-Gram
- Distributed Representations of Words and Phrases and their Compositionality
Enhanced vec repr quality,SubSampling,Negative Sampling,Hierarchical Softmax
- Deep contextualized word representations
ELMo,Feature-Based,Pre-ELMo + Linear Combination,SubWord Information by ConvNet
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Transformer's Encoder,MLM,NSP
- Neural Machine Translation By Jointly Learning to Align and Translate
GRU,Seq2Seq with Attention,Bahdanau Attention
- Attention Is All You Need
Transformers,Scaled Dot-Product Attention,Seq2Seq
- Advances in Pre-Training Distributed Word Representations
FastText
- Enriching Word Vectors with Subword Information
FastText
- Minimum Risk Training for Neural Machine Translation
MRT,NMT
- Bag of Tricks for Efficient Text Classification
FastText for Text Classification,Fast!
- A Fast and Accurate Dependency Parser using Neural Networks
Parsing
- MaltParser: A Data-Driven Parser-Generator for Dependency Parsing
Parsing
- Incrementality in Deterministic Dependency Parsing
Parsing
- A Neural Probabilistic Language Model
NPLM
- Universal Language Model Fine-tuning for Text Classification
ULMFit,Fine-Tuning
- The Natural Language Decathlon: Multitask Learning as Question Answering
MultiTask Learning,anti-curriculum learning
- Phrase-Based & Neural Unsupervised Machine Translation
Initialization, ``,Back-Translation
- A Structured Self-Attentive Sentence Embedding
Self-Attentive
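The second word2vec paper above (tagged Negative Sampling) replaces the full softmax with a binary task: score the true context word high and a few randomly drawn "noise" words low, with the noise words drawn from the unigram distribution raised to the 3/4 power. A minimal sketch of that per-pair loss and of the noise sampler in pure Python; vectors are plain lists and the helper names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sgns_loss(center, context, negatives):
    """Negative-sampling loss for one (center, context) pair:
    -log sigmoid(u_o . v_c) - sum_k log sigmoid(-u_k . v_c)."""
    loss = -math.log(sigmoid(dot(context, center)))
    for neg in negatives:
        loss -= math.log(sigmoid(-dot(neg, center)))
    return loss


def noise_distribution(counts, power=0.75):
    """Unigram counts raised to the 3/4 power and renormalized."""
    weights = {w: c ** power for w, c in counts.items()}
    total = sum(weights.values())
    return {w: x / total for w, x in weights.items()}


def draw_negatives(dist, k, rng=random):
    """Draw k negative words (with replacement) from the noise distribution."""
    words = list(dist)
    return rng.choices(words, weights=[dist[w] for w in words], k=k)
```

The 3/4 power flattens the distribution: a word seen 16 times is only 8 times as likely to be drawn as a word seen once, so rare words appear as negatives more often than their raw frequency would suggest.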
Graph
- Graph Attention Networks
GNN,Attention
- MAGNET: Multi-Label Text Classification using Attention-based Graph Neural Network
GAT,MLTC
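GAT, listed above, replaces fixed neighbourhood averaging with learned attention over each node's neighbours: e_ij = LeakyReLU(a^T [W h_i || W h_j]), normalized with a softmax over the neighbours j of node i, and the new feature is the attention-weighted sum of the projected neighbours. A single-head sketch in pure Python; the adjacency-list input format, the function names, and the 0.2 LeakyReLU slope are assumptions for illustration, and multi-head concatenation and the output nonlinearity are omitted.

```python
import math


def leaky_relu(z, slope=0.2):
    return z if z > 0 else slope * z


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) for row in W]


def gat_layer(features, adjacency, W, a, slope=0.2):
    """Single-head graph attention layer.

    features:  list of node feature vectors
    adjacency: adjacency[i] lists the neighbours of node i (include i for a self-loop)
    W:         projection matrix, out_dim rows of length in_dim
    a:         attention vector of length 2 * out_dim
    """
    z = [matvec(W, h) for h in features]  # projected features W h_i
    out = []
    for i, neigh in enumerate(adjacency):
        # Unnormalized scores e_ij on the concatenation [W h_i || W h_j].
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j])), slope)
                  for j in neigh]
        # Numerically stable softmax over the neighbourhood of i.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # Attention-weighted sum of the neighbours' projected features.
        out.append([sum(alpha * z[j][d] for alpha, j in zip(alphas, neigh))
                    for d in range(len(W))])
    return out
```

With an all-zero attention vector every neighbour gets the same weight, so the layer reduces to plain mean aggregation; learning `a` is what lets the model weight neighbours differently.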
Conversational AI
- Memory Networks
- End-To-End Memory Networks
- Learning Through Dialogue Interactions By Asking Questions
- Hierarchical Attention Networks for Document Classification
- Conversational Decision-Making Model for Predicting the King's Decision in the Annals of the Joseon Dynasty
Fundamental
- Decoupled Neural Interfaces using Synthetic Gradients
- Decoupled Weight Decay Regularization
- Neural Network Ensembles, Cross Validation, and Active Learning
- Sharp Minima Can Generalize For Deep Nets
- Long short-term memory
- Highway Networks
- Recurrent Highway Networks
ETC
- LSTM-SAE Unsupervised Pre-training of a Deep LSTM-based Stacked Autoencoder for Multivariate Time Series Forecasting Problems
- C3D Learning Spatiotemporal Features with 3D Convolutional Networks
- BPE (Byte-Pair Encoding); A New Algorithm for Data Compression (C Users Journal 1994) paper
- Applying BPE to NMT; Neural Machine Translation of Rare Words with Subword Units (ACL 2016) paper
- Compare n-gram and byte-pair encoding
- Compare WordPiece, SentencePiece, and morphological analysis
- NPLM; A Neural Probabilistic Language Model (JMLR 2003) paper
- NPLM's Reference -> learning the role of words in a sentence
- Modeling High-Dimensional Discrete Data with Multi-Layer Neural Networks (NIPS 2000) paper
- Proposes the idea of realizing high-dimensional binary distributed representations with neural networks
- Extracting distributed representations of concepts and relations from positive and negative propositions (IEEE 2000) link
- A case where Prof. Hinton's research was successfully applied
- Natural Language Processing With Modular PDP Networks and Distributed Lexicon (Cognitive Science 1991 July) link
- An attempt to apply neural networks to LMs
- NPLM's Reference -> learning a statistical model of the word-sequence distribution
- Sequential neural text compression (IEEE 1996) link
- I Love Schmidhuber a lot :)
- Word2Vec 2013a; Efficient Estimation of Word Representations in Vector Space (ICLR 2013) paper
- Introduces Skip-Gram & CBOW - Google Team
- Word2Vec 2013b; Distributed Representations of Words and Phrases and their Compositionality (NIPS 2013) paper
- Proposes training optimizations such as negative sampling
- GloVe (Global Vectors); GloVe: Global Vectors for Word Representation (EMNLP 2014) paper
- Stanford Univ.
- Overcomes the limitations of Word2Vec and LSA
- Swivel (Submatrix-Wise Vector Embedding Learner); Swivel: Improving Embeddings by Noticing What's Missing () paper
- Google, source code
- FastText; Enriching Word Vectors with Subword Information (17.06.16, arxiv) paper
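The BPE entries above take a data-compression idea (repeatedly replace the most frequent adjacent symbol pair with a new symbol) and use it to learn subword units for NMT. A minimal sketch of the merge-learning loop, assuming each word is stored as a space-separated sequence of symbols ending in an end-of-word marker `</w>`; the toy vocabulary and the helper names are illustrative.

```python
import collections
import re


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of the symbol pair with the merged symbol."""
    # Lookarounds make sure only whole symbols match, not substrings of longer ones.
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(merged, word): freq for word, freq in vocab.items()}


def learn_bpe(vocab, num_merges):
    """Greedily learn up to num_merges merge operations."""
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair counted first
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


vocab = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6, "w i d e s t </w>": 3}
merges, vocab = learn_bpe(vocab, 3)
print(merges)  # [('e', 's'), ('es', 't'), ('est', '</w>')]
```

The learned merge list is then replayed, in order, to segment new text, so a rare word falls back to known subword units instead of an unknown token.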
A large annotated corpus for learning natural language inference, Bowman et al., 2015 (EMNLP)
A broad-coverage challenge corpus for sentence understanding through inference, Williams et al., 2018
SQuAD: 100,000+ questions for machine comprehension of text, Rajpurkar et al., 2016
Introduction to the CoNLL-2003 shared task: language-independent named entity recognition, Tjong Kim Sang and De Meulder, 2003
- Incrementality in Deterministic Dependency Parsing (ACL, 2003) paper
- MaltParser: A Data-Driven Parser-Generator for Dependency Parsing (LREC, 2005) paper
- A Fast and Accurate Dependency Parser using Neural Networks (EMNLP, 2014) paper
- MRT(Minimum Risk Training); Minimum Risk Training for Neural Machine Translation (ACL 2016) paper
- FastText for classification; Bag of Tricks for Efficient Text Classification (EACL 2017) link
- ULMFiT; Universal Language Model Fine-tuning for Text Classification (18.05.23, arxiv) paper
Stochastic Answer Networks for Machine Reading Comprehension https://arxiv.org/abs/1712.03556
Enhanced LSTM for Natural Language Inference https://arxiv.org/abs/1609.06038
Deep Semantic Role Labeling: What Works and What's Next https://www.aclweb.org/anthology/P17-1044/
Extractive
- BertSum; Fine-tune BERT for Extractive Summarization (19.03.25, arxiv) paper
- BertSum-Full Paper; Text Summarization with Pretrained Encoders (19.08.22, arxiv) paper
- Semi-supervised sequence learning (NIPS 2015) paper
Word Representations: A Simple and General Method for Semi-Supervised Learning
| institute | model | title | venue | published | link |
|---|---|---|---|---|---|
| AllenAI | ELMo | Deep contextualized word representations | NAACL | 2018 | paper |
| AllenAI | LongFormer | Longformer: The Long-Document Transformer | arxiv | 20.04.10 | paper |
| GoogleAI | BERT | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | NAACL | 2019 | paper |
| GoogleAI | ALBERT | ALBERT: A LITE BERT FOR SELF-SUPERVISED LEARNING OF LANGUAGE REPRESENTATIONS | ICLR | 19.09.26 | paper |
| GoogleAI | T5 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | JMLR | 19.10.23 | paper |
| GoogleAI | PEGASUS | PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization | ICML | 2020 | paper |
| GoogleAI | ELECTRA | ELECTRA: PRE-TRAINING TEXT ENCODERS AS DISCRIMINATORS RATHER THAN GENERATORS | ICLR | 2020 | paper |
| DeepMind | Compressive Transformers | COMPRESSIVE TRANSFORMERS FOR LONG-RANGE SEQUENCE MODELLING | arxiv | 19.11.13 | paper |
| UNC Chapel Hill | LXMERT | LXMERT: Learning Cross-Modality Encoder Representations from Transformers | arxiv | 19.08.20 | paper |
| OpenAI | GPT-1 | Improving Language Understanding by Generative Pre-Training | OpenAI | 2018 | paper |
| OpenAI | GPT-2 | Language Models are Unsupervised Multitask Learners | OpenAI | 2019 | paper |
| OpenAI | GPT-3 | Language Models are Few-Shot Learners | OpenAI | 2020 | paper |
| FAIR | FastText | Advances in Pre-Training Distributed Word Representations | arxiv | 17.12.26 | paper |
| FAIR | XLM | Cross-lingual Language Model Pretraining | arxiv | 19.01.22 | paper |
| FAIR | FSMT | Facebook FAIR's WMT19 News Translation Task Submission | arxiv | 19.07.15 | paper |
| FAIR | RoBERTa | RoBERTa: A Robustly Optimized BERT Pretraining Approach | arxiv | 19.07.26 | paper |
| FAIR | MMBT | Supervised Multimodal Bitransformers for Classifying Images and Text | arxiv | 19.09.06 | paper |
| FAIR | BART | BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | arxiv | 19.10.29 | paper |
| FAIR | CamemBERT | CamemBERT: a Tasty French Language Model | arxiv | 19.11.10 | paper |
| FAIR | mBART | Multilingual Denoising Pre-training for Neural Machine Translation | arxiv | 20.01.22 | paper |
| FAIR | RAG | Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | arxiv | 20.05.22 | paper |
| Hugging Face | DistilBERT | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | arxiv | 19.10.02 | paper |
| Microsoft | Marian | Marian: Cost-effective High-Quality Neural Machine Translation in C++ | ACL | 2018 | paper |
| Microsoft | MT-DNN | Multi-Task Deep Neural Networks for Natural Language Understanding | arxiv | 19.05.30 | paper |
| Microsoft | LayoutLM | LayoutLM: Pre-training of Text and Layout for Document Image Understanding | arxiv | 19.12.31 | paper |
| NVIDIA | MegatronLM | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | arxiv | 19.09.17 | paper |
| Univ. of Washington | Grover-Mega | Defending Against Neural Fake News | arxiv | 19.10.29 | paper |
| Carnegie Mellon & Google Brain | Transformer-XL | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | arxiv | 19.06.02 | paper |
| Carnegie Mellon & Google Brain | XLNet | XLNet: Generalized Autoregressive Pretraining for Language Understanding | arxiv | 19.06.19 | paper |
| Carnegie Mellon & Google Brain | Funnel | Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing | arxiv | 20.06.05 | paper |
| Salesforce | CTRL | CTRL: A CONDITIONAL TRANSFORMER LANGUAGE MODEL FOR CONTROLLABLE GENERATION | arxiv | 19.09.11 | paper |
| Anonymous authors | MobileBERT | MobileBERT: Task-Agnostic Compression of BERT by Progressive Knowledge Transfer | ICLR | 2020 | paper |
- Bahdanau Attention; NEURAL MACHINE TRANSLATION BY JOINTLY LEARNING TO ALIGN AND TRANSLATE (ICLR 2015) paper
- Multi-Head Attention; Attention Is All You Need (NIPS 2017) paper
- Google Research-Synthesizer; SYNTHESIZER: Rethinking Self-Attention in Transformer Models (20.05.02, arxiv) paper
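Multi-head attention in "Attention Is All You Need" is built from scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V; the multi-head version runs several of these in parallel on learned linear projections of Q, K and V and concatenates the outputs. A single-head sketch in pure Python, with rows of Q, K and V as plain lists; the function name and the boolean mask convention are assumptions for illustration, and each query is assumed to have at least one unmasked key.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V, mask=None):
    """Return (outputs, weights) for softmax(Q K^T / sqrt(d_k)) V.

    mask[i][j] == False blocks query i from attending to key j.
    """
    d_k = len(K[0])
    outputs, weights = [], []
    for i, q in enumerate(Q):
        # Dot-product score of query i against every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        if mask is not None:
            scores = [s if mask[i][j] else float("-inf") for j, s in enumerate(scores)]
        w = softmax(scores)
        weights.append(w)
        # Output row i is the attention-weighted average of the value rows.
        outputs.append([sum(w_j * v[d] for w_j, v in zip(w, V)) for d in range(len(V[0]))])
    return outputs, weights
```

The 1/sqrt(d_k) factor keeps the dot products from growing with the key dimension, which would otherwise push the softmax into a near one-hot regime with very small gradients; a lower-triangular mask gives the causal attention used in decoders.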
Following the research of Sumit Chopra and Jason Weston - Memory Networks (14.10.15, arxiv; ICLR 2015) paper
- End-To-End Memory Networks (NIPS 2015) paper
- Learning Through Dialogue Interactions By Asking Questions (16.12.15, ICLR 2017) paper
- Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index, ACL
- Kelvin Guu's REALM, ICML 2020
- DPR; Dense Passage Retrieval for Open-Domain Question Answering (20.04.10) paper
- Original GAN; Generative Adversarial Nets (NIPS 2014) paper
- MAML; Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (ICML 2017) paper
- https://ai.googleblog.com/2018/10/curiosity-and-procrastination-in.html
- Meta-learning curiosity algorithms
- Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML
- Novelty search (Lehman & Stanley, 2008)
- Buffers and Nearest Neighbors (Fu et al., 2017)
- Generating goals (Srivastava et al., 2013; Kulkarni et al., 2016)
- Learning progress (Oudeyer et al., 2007; Schmidhuber, 2008)
- Generating diverse skills (Eysenbach et al., 2018)
- Stochastic neural networks (Florensa et al., 2017; Fortunato et al., 2017)
- Count-based exploration (Tang et al., 2017)
- Object-based curiosity measures (Forestier & Oudeyer, 2016)
- Bonus-based (Taiga et al., 2019)
- AutoML Style Approach
- Neural Architecture Search (NAS)
- Hyperparameter optimization for deep networks
- Auto-sklearn, Learning loss functions to replace cross-entropy for training a fixed architecture on MNIST and CIFAR
- Meta-learning with genetic programming, evolutionary computing
- Programming Automation
- Searching over mathematical operations within neural networks
- Neural networks that learn programs
- Modular Meta-Learning / Hierarchical Meta-Learning, Reinforcement Learning
- Inspired from Cognitive/Brain Science (Attention, Curiosity, Common Sense, etc)
- Agent57 (DeepMind)
- Policy Gradient Theorem; Policy Gradient Methods for Reinforcement Learning with Function Approximation (NIPS 2000) paper
- Deterministic Policy Gradient Algorithms
- Continuous Control with Deep Reinforcement Learning
- Approximately Optimal Approximate Reinforcement Learning
- Trust Region Policy Optimization
- Proximal Policy Optimization Algorithms
- ACCELERATED METHODS FOR DEEP REINFORCEMENT LEARNING () paper
- Implementation Matters In Deep RL () paper
- CURL: Contrastive Unsupervised Representations for Reinforcement Learning () paper
- Dream to Control: Learning Behaviors by Latent Imagination () paper
- Neural Network Ensembles, Cross Validation, and Active Learning (NIPS 1995) paper
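PPO, listed above, keeps TRPO's idea of limiting how far one policy update can move, but replaces the trust-region constraint with a clipped surrogate objective: the probability ratio r_t = pi_new(a_t|s_t) / pi_old(a_t|s_t) is clipped to [1 - eps, 1 + eps], and the pessimistic minimum of the clipped and unclipped terms is averaged over timesteps. What gets clipped is this ratio, not the gradient. A minimal sketch of the objective; inputs are per-timestep log-probabilities and advantage estimates, and the function name and `eps=0.2` default are assumptions.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Clipped surrogate L^CLIP = mean_t min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t).

    Training maximizes this value (equivalently, minimizes its negative).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # r_t, computed in log space
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Taking the minimum removes any incentive to push r_t outside the clip range.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

For a positive advantage the objective stops rewarding ratios above 1 + eps, and for a negative advantage it stops rewarding ratios below 1 - eps, so several epochs of minibatch updates on the same batch stay close to the old policy.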
Batch Normalization
Lipschitz gradient
Global Batch Normalization
Input Covariate Shift
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
How Does Batch Normalization Help Optimization?
Layer Normalization https://arxiv.org/abs/1607.06450
LeCun Initialization Efficient BackProp
Xavier initialization Understanding the difficulty of training deep feedforward neural networks
He Initialization Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Nesterov Optimizer (optimization-related papers)
weight_standardization
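Batch normalization and layer normalization, listed above, share the same normalize-then-scale-and-shift formula, gamma * (x - mean) / sqrt(var + eps) + beta, and differ only in the axis over which the mean and variance are taken: per feature across the mini-batch for BN, per example across its own features for LN. A minimal training-mode sketch on a batch stored as a list of rows; the function names are hypothetical, and BN's running averages for inference are omitted.

```python
import math


def batch_norm(batch, gamma, beta, eps=1e-5):
    """Training-mode batch normalization: statistics per feature, over the batch axis."""
    n, d = len(batch), len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(d)]
    variances = [sum((row[j] - means[j]) ** 2 for row in batch) / n for j in range(d)]
    return [[gamma[j] * (row[j] - means[j]) / math.sqrt(variances[j] + eps) + beta[j]
             for j in range(d)] for row in batch]


def layer_norm(batch, gamma, beta, eps=1e-5):
    """Layer normalization: statistics per example, over its own features."""
    out = []
    for row in batch:
        mu = sum(row) / len(row)
        var = sum((x - mu) ** 2 for x in row) / len(row)
        out.append([g * (x - mu) / math.sqrt(var + eps) + b
                    for x, g, b in zip(row, gamma, beta)])
    return out
```

Because LN never looks across examples, it behaves the same at batch size 1 and at inference time, which is one reason it is preferred in RNNs and Transformers.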
- Long short-term memory (Neural Computation 1997) paper
- LSTM: A Search Space Odyssey (IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2017) paper
- Highway Networks (15.05.03, arxiv) paper
- Full Paper: Training Very Deep Networks link
- Recurrent Highway Networks (ICML 2017) paper
- Gradient flow in recurrent nets: the difficulty of learning long-term dependencies (IEEE 2001) paper paper
- Bidirectional LSTM networks for improved phoneme classification and recognition (International Conference on Artificial Neural Networks 05.09.11)
- Sequential neural text compression (IEEE 1996) paper
- Neural Expectation Maximization (NIPS 2017) paper
- Accelerated Neural Evolution through Cooperatively Coevolved Synapses (JMLR 2008) paper
- World Models (18.05.09, arxiv) paper
LSTM-SAE Unsupervised Pre-training of a Deep LSTM-based Stacked Autoencoder for Multivariate Time Series Forecasting Problems
C3D Learning Spatiotemporal Features with 3D Convolutional Networks
n-gram-related papers
- Estimation of Probabilities from Sparse Data for the Language Model Component of a Speech Recognizer
- Interpolated estimation of Markov source parameters from sparse data
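The interpolation paper above smooths sparse n-gram estimates by mixing them linearly with lower-order estimates, so an unseen bigram still gets probability mass from its unigram. A minimal bigram sketch; rather than estimating the mixing weight from held-out data, it fixes `lam=0.7` as an assumption, the fallback for an unseen history is an added choice, and the function names are hypothetical. The back-off approach of the first paper is not shown.

```python
from collections import Counter


def train_counts(tokens):
    """Unigram and bigram counts from a token sequence."""
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))


def interpolated_bigram_prob(word, prev, unigrams, bigrams, lam=0.7):
    """Linear interpolation of maximum-likelihood estimates:
    P(word | prev) = lam * P_ML(word | prev) + (1 - lam) * P_ML(word)."""
    p_uni = unigrams[word] / sum(unigrams.values())
    # Number of bigrams that start with `prev` (its count as a history).
    prev_count = sum(c for (p, _), c in bigrams.items() if p == prev)
    if prev_count == 0:
        # Unseen history: fall back to the unigram estimate so probabilities still sum to 1.
        return p_uni
    p_bi = bigrams[(prev, word)] / prev_count
    return lam * p_bi + (1.0 - lam) * p_uni
```

For the tokens `a b a c`, P(b | a) = 0.7 * 1/2 + 0.3 * 1/4 = 0.425, and the unseen bigram (b, c) still receives 0.3 * 1/4 = 0.075 instead of zero.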
Pointing the Unknown Words (University of Montreal)
Seq2Seq; Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
Real-World Anomaly Detection in Surveillance Videos
self-attention on classification - A Structured Self-Attentive Sentence Embedding