Repository showing how NLP can tackle real-world problems. It includes source code, datasets, and state-of-the-art NLP techniques.

| Section | Sub-Section | Description | Link |
|---|---|---|---|
| Tokenization | Word Tokenization | | Medium, GitHub |
| Tokenization | Sentence Tokenization | | Medium, GitHub |
| Part of Speech | | | Medium, GitHub |
| Lemmatization | | | Medium, GitHub |
| Stemming | | | Medium, GitHub |
| Stop Words | | | Medium, GitHub |
| Phrase Word Recognition | | | |
| Spell Checking | Lexicon-based | Peter Norvig algorithm | Medium, GitHub |
| Spell Checking | Lexicon-based | SymSpell | Medium, GitHub |
| Spell Checking | Machine Translation | Statistical Machine Translation | Medium |

| Section | Sub-Section | Description | Link |
|---|---|---|---|
| Pattern-based Recognition | | | Medium |
| Lexicon-based Recognition | | | Medium |
| Named Entity Recognition (NER) | Pre-trained NER | | Medium, GitHub |
| Named Entity Recognition (NER) | Custom NER | | |

| Section | Sub-Section | Description | Link |
|---|---|---|---|
| Extractive Approach | | | Medium, GitHub |
| Abstractive Approach | | | |

| Section | Sub-Section | Description | Link |
|---|---|---|---|
| Euclidean Distance, Cosine Similarity and Jaccard Similarity | | | Medium, GitHub |
| Edit Distance | | | Medium, GitHub |
| Word Mover's Distance (WMD) | | | Medium, GitHub |
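
The simpler measures listed in the table above can be sketched with the standard library alone. This is a minimal illustration, not the code from the linked articles: the function names are hypothetical, the vector functions assume two equal-length numeric vectors, and edit distance is taken here to mean Levenshtein distance with unit costs.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(tokens_a, tokens_b):
    """Size of the token intersection divided by the size of the union."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 1.0  # assumption: two empty inputs count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def edit_distance(source, target):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    previous = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        current = [i]
        for j, target_char in enumerate(target, start=1):
            cost = 0 if source_char == target_char else 1
            current.append(min(
                previous[j] + 1,         # delete from source
                current[j - 1] + 1,      # insert into source
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]
```

For example, `edit_distance("kitten", "sitting")` evaluates to 3 (two substitutions and one insertion). Word Mover's Distance is left out because it needs pre-trained word embeddings.
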

| Section | Sub-Section | Description | Link |
|---|---|---|---|
| Traditional Method | Bag-of-words (BoW) | Using all words as features | Medium, GitHub |
| Traditional Method | Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) | | Medium, GitHub |
| Character Level | Character Embedding | Using all characters (letters, numbers, special characters) to compute word vectors | Medium, GitHub |
| Word Level | Negative Sampling and Hierarchical Softmax | | |
| Word Level | Word2Vec, GloVe, fastText | | Medium, GitHub |
| Word Level | Contextualized Word Vectors (CoVe) | | Medium, GitHub |
| Word Level | Embeddings from Language Models (ELMo) | | Medium, GitHub |
| Sentence Level | Skip-thoughts | | Medium, GitHub |
| Sentence Level | InferSent | | Medium, GitHub |
| Document Level | lda2vec | | Medium |
| Document Level | doc2vec | Using an unsupervised learning approach to learn word vectors for computing document vectors | Medium, GitHub |
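
As a rough illustration of the bag-of-words row above, where every word in the corpus becomes a feature, the sketch below builds a vocabulary and turns each document into a vector of word counts. The helper names and the lowercase, whitespace-based tokenizer are assumptions made for this example and are not taken from the linked articles.

```python
from collections import Counter


def tokenize(text):
    """Assumed tokenizer: lowercase the text and split on whitespace."""
    return text.lower().split()


def build_vocabulary(documents):
    """Map every distinct token in the corpus to a fixed column index."""
    tokens = sorted({token for doc in documents for token in tokenize(doc)})
    return {token: index for index, token in enumerate(tokens)}


def bow_vector(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for token, count in counts.items():
        index = vocabulary.get(token)
        if index is not None:  # words outside the vocabulary are ignored
            vector[index] = count
    return vector


documents = ["the cat sat", "the dog sat on the cat"]
vocabulary = build_vocabulary(documents)
vectors = [bow_vector(doc, vocabulary) for doc in documents]
```

With these two documents the vocabulary is `cat, dog, on, sat, the`, so the vectors are `[1, 0, 0, 1, 1]` and `[1, 1, 1, 1, 2]`. Word order is discarded, which is the main limitation that the embedding methods in the table aim to address.
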

| Section | Sub-Section | Description | Link |
|---|---|---|---|
| ELI5, LIME and Skater | | | Medium, GitHub |
| SHapley Additive exPlanations (SHAP) | | | Medium, GitHub |
| Anchors | | | Medium, GitHub |

| Section | Sub-Section | Description | Link |
|---|---|---|---|
| Can deep learning solve every problem? | | | Medium, Kaggle |

| Section | Sub-Section | Description | Link |
|---|---|---|---|
| Spellcheck | | | GitHub |
| InferSent | | | GitHub |