NLP:
Week 01 (3 hours) - Introduction to Natural Language Processing (NLP)
Course Introduction & Motivation: Overview of why NLP is important and its
applications.
Multilingualism: Why handling multiple languages matters in NLP, and the challenges it raises.
Morphology in Languages: Understanding word structure and formation in different
languages.
Part-of-Speech (PoS) Tagging: Introduction to PoS tagging, which assigns parts of speech
to each word in a sentence.
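To make PoS tagging concrete, here is a minimal sketch using NLTK's off-the-shelf
tagger (NLTK is an assumption for illustration, not necessarily the course's toolkit):

    import nltk

    nltk.download("punkt", quiet=True)                       # tokenizer model
    nltk.download("averaged_perceptron_tagger", quiet=True)  # tagger model

    tokens = nltk.word_tokenize("The cat sat on the mat.")
    print(nltk.pos_tag(tokens))
    # e.g. [('The', 'DT'), ('cat', 'NN'), ('sat', 'VBD'), ('on', 'IN'), ...]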
Week 02 (3 hours) - PoS Tagging Layer of NLP
Mathematics of PoS tagging: The probability theory underlying PoS tagging models.
Sequences in NLP: Understanding sequential data, such as sentences, and how it is
processed.
NLP Lab 1 (Non-graded): Simple matrix operations using NumPy and scikit-learn for
basic NLP tasks.
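As a taste of the kind of NumPy work the lab involves (the exact lab content is
assumed here), the sketch below builds a tag-transition probability matrix from a
toy tag sequence:

    import numpy as np

    tags = ["DT", "NN", "VB"]
    idx = {t: i for i, t in enumerate(tags)}

    # Toy corpus: one sentence reduced to its tag sequence.
    sequence = ["DT", "NN", "VB", "DT", "NN"]

    counts = np.zeros((len(tags), len(tags)))
    for prev, curr in zip(sequence, sequence[1:]):
        counts[idx[prev], idx[curr]] += 1

    # Row-normalise counts into transition probabilities P(curr | prev).
    probs = counts / counts.sum(axis=1, keepdims=True)
    print(probs)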
Week 03 (3 hours) - Hidden Markov Models (HMM) in NLP
PoS Tagging (HMM): Using HMMs for tagging parts of speech.
Viterbi Decoding for Tagging and Sequences: Applying the Viterbi algorithm for
efficient sequence tagging; a code sketch follows this week's topics.
NLP Lab 2 (Non-graded): A PoS tagging task using the most-frequent-tag assignment
baseline.
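The NumPy sketch below illustrates Viterbi decoding on a toy HMM; all probabilities
are made-up values for illustration:

    import numpy as np

    start = np.array([0.6, 0.4])            # P(tag_0), 2 hidden tags
    trans = np.array([[0.7, 0.3],           # P(tag_t | tag_{t-1})
                      [0.4, 0.6]])
    emit = np.array([[0.5, 0.4, 0.1],       # P(word | tag), 3 word types
                     [0.1, 0.3, 0.6]])

    obs = [0, 1, 2]                         # observed word indices
    T, N = len(obs), len(start)

    delta = np.zeros((T, N))                # best path scores
    back = np.zeros((T, N), dtype=int)      # backpointers

    delta[0] = start * emit[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * trans   # scores[i, j]: tag i -> tag j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * emit[:, obs[t]]

    # Follow backpointers from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    print(path[::-1])                       # most likely tag sequence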
Week 04 (3 hours) - Handling Sequential Tasks
Shallow Parsing: Breaking a sentence into flat chunks (e.g., noun phrases) without
building a full syntactic parse; see the chunking sketch after this week's topics.
Named Entity Recognition (NER): Identifying named entities (people, organizations,
locations) within text.
Introduction to Conditional Random Fields (CRFs): A discriminative model for
sequence prediction tasks such as PoS tagging and NER.
Challenges due to Morphological Richness: Handling the complexity of languages with
rich morphological structures (e.g., inflections, prefixes, suffixes).
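As a concrete illustration of shallow parsing, here is a chunking sketch using
NLTK's RegexpParser; the noun-phrase grammar is a toy assumption, not a
course-prescribed one:

    import nltk

    grammar = "NP: {<DT>?<JJ>*<NN.*>}"   # optional determiner, adjectives, noun
    chunker = nltk.RegexpParser(grammar)

    tagged = [("The", "DT"), ("quick", "JJ"), ("fox", "NN"),
              ("jumped", "VBD"), ("over", "IN"),
              ("the", "DT"), ("lazy", "JJ"), ("dog", "NN")]
    print(chunker.parse(tagged))   # groups the two noun phrases into NP chunks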
Week 05 (3 hours) - Feature Engineering
CRF (contd.): Continuation of the CRF discussion from the previous week.
Maximum Entropy Markov Model (MEMM): A discriminative sequence model that combines
HMM-style structure with maximum-entropy (logistic regression) classifiers.
Feature Extraction and Engineering: Techniques for extracting meaningful features
from text data to improve NLP models; a token-feature sketch follows this week's
topics.
NLP Lab 3 (Non-graded): A task focused on performing NER in multiple languages.
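The sketch below shows hand-crafted token features of the kind typically fed to a
CRF or MEMM tagger; the specific feature names are illustrative assumptions:

    def token_features(sent, i):
        """Feature dictionary for the i-th token of a tokenised sentence."""
        word = sent[i]
        feats = {
            "word.lower": word.lower(),
            "word.istitle": word.istitle(),
            "word.isdigit": word.isdigit(),
            "prefix3": word[:3],        # crude morphological cues
            "suffix3": word[-3:],
        }
        feats["prev.lower"] = sent[i - 1].lower() if i > 0 else "<BOS>"
        feats["next.lower"] = sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>"
        return feats

    sent = ["Barack", "Obama", "visited", "Mumbai"]
    print(token_features(sent, 0))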
Week 06 (3 hours) - Knowledge Bases and Ambiguity
Ambiguity and NLP: Understanding and addressing the inherent ambiguities in language
processing.
Knowledge Bases (WordNet, FrameNet, VerbNet): Exploring resources like WordNet for
semantic relationships, FrameNet for event structures, and VerbNet for verb
categorization.
Word Sense Disambiguation (WSD): Techniques for determining the correct meaning of a
word in a given context.
NLP Lab 4 (Graded): A lab focused on disambiguating word senses in context.
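A minimal WSD sketch, assuming NLTK's simplified Lesk implementation and a
downloaded WordNet corpus:

    import nltk
    from nltk.wsd import lesk

    nltk.download("wordnet", quiet=True)

    context = "I went to the bank to deposit my money".split()
    sense = lesk(context, "bank", "n")   # best noun synset for this context
    if sense is not None:
        print(sense.name(), "->", sense.definition())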
Week 07 (3 hours) - Applications of Neural Networks (NN) in NLP
Cognate Detection and its applications: Identifying cognates (words in different
languages that share a common origin) using NLP techniques.
NER using NNs: Applying neural networks to perform NER.
Text Classification using NNs: Using neural networks to classify text into predefined
categories.
Transformer Architecture: An introduction to the Transformer model, which has
revolutionized NLP tasks like machine translation and text generation; its core
attention operation is sketched after this week's topics.
Introduction to Distributional Semantics: Understanding how word meaning can be
represented in vector spaces based on context.
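To ground the Transformer discussion, the NumPy sketch below computes scaled
dot-product attention, the operation at the model's core, on random toy inputs:

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    rng = np.random.default_rng(0)
    seq_len, d_k = 4, 8
    Q = rng.normal(size=(seq_len, d_k))   # queries
    K = rng.normal(size=(seq_len, d_k))   # keys
    V = rng.normal(size=(seq_len, d_k))   # values

    weights = softmax(Q @ K.T / np.sqrt(d_k))   # attention weights, rows sum to 1
    output = weights @ V                        # context-mixed representations
    print(output.shape)                         # (4, 8)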
Week 08 (3 hours) - Distributional Semantics
word2vec, doc2vec, sent2vec: Techniques for representing words, documents, and
sentences as vectors in a continuous space.
Sub-words in NLP: Handling smaller units of text (sub-words), especially for
languages with complex morphology.
FastText: A model that improves upon word2vec by incorporating sub-word information.
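A sketch of training toy word2vec and FastText models, assuming the gensim library
(4.x parameter names); the corpus is far too small for meaningful vectors and only
shows the API shape:

    from gensim.models import FastText, Word2Vec

    sentences = [["the", "cat", "sat", "on", "the", "mat"],
                 ["the", "dog", "sat", "on", "the", "log"]]

    w2v = Word2Vec(sentences, vector_size=50, min_count=1, epochs=50)
    ft = FastText(sentences, vector_size=50, min_count=1, epochs=50,
                  min_n=3, max_n=6)      # character n-gram range

    print(w2v.wv.most_similar("cat", topn=2))
    print(ft.wv["cats"].shape)   # FastText embeds unseen words via sub-words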