CSE440: Natural Language Processing II
Dr. Farig Sadeque
Assistant Professor
Department of Computer Science and Engineering
BRAC University
Lecture 1: Introduction
Class Expectations
What can we expect?
- Some linguistics knowledge
- A whole lot of algorithms. This is an algorithms course
- A bunch of programming
What this course is not:
- This is not a linguistics course. We will learn whatever linguistics we require during the course
- This is not a machine learning/neural network course. There will be a refresher within the first couple of weeks, but for deeper understanding, please take the respective courses.
Class Expectations
- Several semesters of programming
- Primary language: Python
- Linguistics experience helpful, not required
- Mathematical experience helpful, not required
- As long as you can understand some maths notation, you will do well
Course Structure
- Attendance: 5%
- Assignments/project: 20%
- Quiz (best 3 out of 4): 20%
- Midterm: 25%
- Final: 30%
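The weights above add up to 100%. As a rough illustration of the "best 3 out of 4" rule, here is a minimal sketch. It assumes every component is scored on a 0-100 scale and that the three best quizzes are averaged with equal weight; the slides do not state the exact formula, and the names `WEIGHTS` and `course_total` are hypothetical.

```python
# Weights copied from the course structure above, as fractions of the total.
WEIGHTS = {
    "attendance": 0.05,
    "assignments": 0.20,
    "quizzes": 0.20,
    "midterm": 0.25,
    "final": 0.30,
}


def course_total(attendance, assignments, quizzes, midterm, final):
    """Return a weighted total out of 100; all inputs are percentages (0-100).

    Assumption: the quiz component is the plain average of the three
    highest quiz scores. A missing quiz simply counts as zero.
    """
    best_three = sorted(quizzes, reverse=True)[:3]
    quiz_average = sum(best_three) / 3
    return (
        WEIGHTS["attendance"] * attendance
        + WEIGHTS["assignments"] * assignments
        + WEIGHTS["quizzes"] * quiz_average
        + WEIGHTS["midterm"] * midterm
        + WEIGHTS["final"] * final
    )


# The lowest quiz (60) is dropped; the remaining three average to 80.
print(round(course_total(100, 80, [90, 60, 70, 80], 70, 75), 2))  # → 77.0
```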
Course Contents
- Books, lectures, assignments etc.
- https://drive.google.com/drive/folders/1KXTLF4oq_4otsFU-s1BowvlmCchuzgq2?usp=sharing
Consultation
- You can visit me during consultation hours
- You can also book a time slot, either online or offline
- Booking options: in-office consultation, online consultation
Course Plan
Linguistics essentials
● Sentence segmentation
● Tokenization
● Lemmatization/Stemming
● Parts-of-Speech tagging
● Named Entity Recognition
● Parsing
● Coreference Resolution
Machine Learning Essentials Review
● Probability Review
● Naive Bayes, Logistic regression
● Splits, metrics, statistical significance
● Essential ML maths refresher
Text Categorization and Representations
● Representation basics
● Word embeddings
● Contextual embeddings
● Text Categorization Algorithms
Sequence tagging
● Sequence tagging basics
● Markov Models
● Deep Learning Architectures: Recurrent Neural Network
● Transfer Learning with Pretrained Language Models
Parsing
● Parsing Basics
● Constituency Grammar
● Constituency Parsing
● Dependency Parsing
Translation
Coreference
Text Generation: Encoder-Decoder Algorithm
Question Answering
Why is NLP Hard?
Ambiguity
- Phonetics: I scream? Ice cream?
- Morphology: Union-ized? Un-ionized?
- Syntax: Squad helps dog bite victim.
- Squad helps a dog to bite a victim?
- Squad helps a dog-bite victim?
- Semantics: Ball: an orb, or a dance?
- “High-end” nonsense: Colorless green ideas sleep furiously.
- Discourse: see that photo again
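The syntactic ambiguity in "Squad helps dog bite victim" can be made concrete by counting parse trees. The sketch below is a small CKY-style chart that counts derivations under a toy grammar. The lexicon, the rules, and the category names (including `SC` for the "dog bite victim" clause) are assumptions invented for this illustration, not a grammar used in the course.

```python
from collections import defaultdict

# Toy lexicon (hypothetical): each word maps to its possible categories.
LEXICON = {
    "squad": {"NP"},
    "helps": {"V"},
    "dog": {"N", "NP"},
    "bite": {"N", "V"},      # "bite" can be a noun or a verb
    "victim": {"N", "NP"},
}

# Toy binary rules (hypothetical): (left child, right child) -> parents.
RULES = {
    ("NP", "VP"): {"S", "SC"},  # sentence, or embedded clause "dog bite victim"
    ("V", "NP"): {"VP"},        # verb + object
    ("V", "SC"): {"VP"},        # "helps" + embedded clause
    ("N", "NP"): {"NP"},        # noun compound, e.g. "dog" + "bite victim"
}


def count_parses(words, start="S"):
    """Count the distinct parse trees rooted in `start` that span all words."""
    n = len(words)
    # chart[(i, j)][category] = number of trees of that category over words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for category in LEXICON[word]:
            chart[i, i + 1][category] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # every split point inside the span
                for left, left_count in list(chart[i, k].items()):
                    for right, right_count in list(chart[k, j].items()):
                        for parent in RULES.get((left, right), ()):
                            chart[i, j][parent] += left_count * right_count
    return chart[0, n][start]


print(count_parses("squad helps dog bite victim".split()))  # → 2
```

The two trees correspond to the two readings listed above: "helps" takes the clause "dog bite victim", or "helps" takes the noun compound "dog bite victim" as its object.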
Variability
He bought it
- He purchased it
- He acquired it
- It was bought by him (and the passive forms of all the synonyms above)
- It was sold to him
- ……….
Language Change
- English beats up other languages in dark alleys, then rifles through their pockets for loose grammar and spare vocabulary
- Example: We eat beef, but we raise cows.
- Fun video: https://www.youtube.com/watch?v=Jl3K63Rbygw