INT405 Natural Language
Processing
Lecture 1 – Introduction and History of NLP
Dr Chunchuan Lyu
Module Leader: Dr Chunchuan Lyu
• Graduated from The University of Edinburgh and XJTLU
Studied computational semantics, then moved to unsupervised
reinforcement learning (what should an agent do if no reward is
given?), offline RL, and combinatorial optimization
• Contact: chunchuan.lyu@xjtlu.edu.cn
• TA: Xiao Ke xiao.ke25@student.xjtlu.edu.cn
Working on LLM+Vision
• Feel free to email us at any time
Module Structure
• Lectures + Tutorial/Lab (JAX/PyTorch)
• Take-home Assessment 15%
• Take-home Lab Report 15%
• Default Project or Custom Project 70% (3 people, individual reports)
• There will be an oral assessment for all of the components
• Reference Books: https://web.stanford.edu/~jurafsky/slp3/
https://www.deeplearningbook.org/
A custom project should be NLP-related and involve a significant amount of work.
Contact me as early as possible to get approval.
Please ask questions at any time during the lecture/tutorial.
Additional References
https://web.stanford.edu/class/cs224n/ CS224N: Natural Language
Processing with Deep Learning (Stanford)
https://cmu-l3.github.io/anlp-spring2025 CMU Advanced NLP (many
new papers in their readings)
https://www.bishopbook.com/ Deep Learning: Foundations and Concepts
(all you need to know for deep learning in 2025, starting from basic
probability)
Something I learnt from last year’s offering
• Some of you struggle with English.
Please stop me for clarification on any terms you are not sure about.
• You may be shy.
Additional mini-tutorial (in place of office hours,
bilingual 双语): 11:00-13:00 Monday at SD214. Ask Me
Anything, or a random topic that I want to talk about.
• The module will be a struggle, but it is worth it.
Last year's cohort worked very hard on this
module, and we got the highest student rating among INT
MRes modules.
Lecture Today
• What is NLP?
• A Not So Brief History of NLP
• The Status Now
• What you will learn and why
What is Natural Language Processing?
Me: What is the goal of natural language processing?
ChatGPT-4o:
The goal of Natural Language Processing (NLP) is to enable computers to understand, interpret,
and generate human language in a way that is both meaningful and useful. …
Qwen-72B-T:
The goal of Natural Language Processing (NLP) is to enable computers to understand, interpret,
and generate human language in a way that is both accurate and contextually meaningful….
Gemini-1.5-Pro-Search:
The ultimate goal of Natural Language Processing (NLP) is to enable computers to comprehend and
interact with human language as effectively as humans do. [1][2] …
LLMs all agree that understanding, interpreting and generating are the tasks.
They use very vague terms when it comes to evaluation.
What is meaningful/useful? How can we know?
What is Natural Language Processing?
Me: What is the goal of natural language processing?
Llama-3.1-405B-T:
The primary goal of Natural Language Processing (NLP) is to enable computers to understand,
interpret, and generate human language, allowing humans and machines to communicate effectively
and efficiently.
NLP aims to bridge the gap between human language and computer language, enabling computers to:
Understand human language: NLP helps computers to comprehend the meaning of text or speech,
including grammar, syntax, semantics, and pragmatics.
Interpret human language: NLP enables computers to extract relevant information, identify patterns,
and make inferences from text or speech.
Generate human language: NLP allows computers to produce human-like text or speech, such as
chatbots, language translation systems, and text summarization tools.
[Diagram: Natural Language -- Generate --> Formal Language (code) --> Formal Language (execution result)]
What is Natural Language Processing?
Me: What is the goal of natural language processing?
Llama-3.1-405B-T:
The primary goal of Natural Language Processing (NLP) is to enable computers to understand,
interpret, and generate human language, allowing humans and machines to communicate effectively
and efficiently.
NLP aims to bridge the gap between human language and computer language, enabling computers to:
Large language models to rule them all?
Or should it be?
[Diagram: Natural Language -- Generate --> Formal Language (code) --> Formal Language (execution result)]
What is Natural Language Processing?
Large language models to rule them all?
Or should it be?
[Diagram: Natural Language -- Generate --> Formal Language (code) --> Formal Language (execution result); Natural Language -- Understand or Generate --> Natural Language]
What is Natural Language Processing?
Me: What is the goal of natural language processing?
Llama-3.1-405B-T:
The primary goal of Natural Language Processing (NLP) is to enable computers to understand,
interpret, and generate human language, allowing humans and machines to communicate effectively
and efficiently.
NLP aims to bridge the gap between human language and computer language, enabling computers to:
Understand human language: NLP helps computers to comprehend the meaning of text or speech,
including grammar, syntax, semantics, and pragmatics.
Interpret human language: NLP enables computers to extract relevant information, identify patterns,
and make inferences from text or speech.
Generate human language: NLP allows computers to produce human-like text or speech, such as
chatbots, language translation systems, and text summarization tools.
Do you think that a current (or very near future) NLP system can understand human language?
Do you think that a current (or very near future) NLP system can interpret human language?
Do you think that a current (or very near future) NLP system can generate human language?
Which one is the hardest? Go to LM to vote.
A Brief History of NLP
Let us travel back in time to 1940s
What do you expect computers can do
in terms of NLP?
" One naturally wonders if the problem
of translation could conceivably be
treated as a problem in cryptography.
When I look at an article in Russian, I
say: 'This is really written in English,
but it has been coded in some strange
symbols. I will now proceed to
decode.' "
— Warren Weaver, Letter to Norbert Wiener, March 4, 1947
Harvard Mark II, 1947
A Brief History of NLP
The Mathematical Theory of
Communication, 1949,
Claude Shannon and
Warren Weaver
Warren Weaver 1894-1978
[Diagram: English → Russian → English, with some errors]
" One naturally wonders if the problem of
translation could conceivably be treated as a
problem in cryptography. When I look at an
article in Russian, I say: 'This is really written
in English, but it has been coded in some
strange symbols. I will now proceed to
decode.' "
Sounds great, but
“The early workers were often mathematicians who
struggled with the primitive computing machinery.” [3]
A Brief History of NLP
Claude Shannon
1919-2001
" I visualize a time when we will be to robots
Prediction and Entropy of Printed English, 1950, Claude Shannon what dogs are to humans. And I am rooting
for the machines"
Clearly, this is autoregressive language model.
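In modern notation (a standard formulation added here, not on the original slide), Shannon's letter-guessing game is next-symbol prediction: an autoregressive language model factorizes the probability of a text left to right,

P(w_1, ..., w_T) = \prod_{t=1}^{T} P(w_t | w_1, ..., w_{t-1})

so each symbol is scored given only the symbols before it.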
A Brief History of NLP
Claude Shannon
1919-2001
" I visualize a time when we will be to robots
Prediction and Entropy of Printed English, 1950, Claude Shannon what dogs are to humans. And I am rooting
for the machines "
And, we have masked language model.
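By contrast (again standard notation added here for clarity), the masked variant hides a symbol and predicts it from the context on both sides,

P(w_t | w_1, ..., w_{t-1}, w_{t+1}, ..., w_T)

which is the objective later popularized by BERT.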
A Brief History of NLP
Due to the coding difficulties, what we had in the
1940s/1950s were rule-based systems.
The first was a dictionary look-up system developed at
Birkbeck College, London in 1948.
The Georgetown–IBM experiment took place in 1954, where
an algorithm first translated Russian words into numerical
codes, then performed case analysis on
each numerical code to choose between possible
English word translations, reorder the English words, or
omit some English words. [4]
IBM 701
Well publicized by journalists and perceived as a success, the experiment did
encourage governments to invest in computational linguistics. The authors claimed
that within three to five years, machine translation could well be a solved problem.
Of course, it failed and progress slowed down.
Language was far more complex than they had imagined.
The obvious place to look for help was linguistics.
A Brief History of NLP
An American professor, the father of modern linguistics
• Transformational Analysis (1955)
• Syntactic Structures (1957)
• Minimalist Program (1995)
Noam Chomsky
1928-Now
"It must be recognized that the notion 'probability of a sentence' is an entirely useless one, under any known interpretation of this term."
What is language?
Why does it have the properties it has?
Formal Basis of a Language Universal (2021, Miloš Stanojević, Mark Steedman)
Lesson: language is much more complicated than expected
(from a rule-based system perspective)
A Brief History of NLP
Rules:
S → NP VP
NP → Det Nominal
Nominal → Noun | Nominal PP
VP → VP PP | Verb NP
PP → Preposition NP
Context-Free Grammar
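As an illustration added here (not from the slide), a minimal sketch of this grammar in NLTK, assuming NLTK is installed; the lexical rules and the example sentence are illustrative additions:

import nltk

# The CFG rules from the slide, plus an illustrative toy lexicon
# (the Det/Noun/Verb/Preposition entries are assumptions, not from the slide).
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det Nominal
Nominal -> Noun | Nominal PP
VP -> VP PP | Verb NP
PP -> Preposition NP
Det -> 'the'
Noun -> 'man' | 'dog' | 'telescope'
Verb -> 'saw'
Preposition -> 'with'
""")

parser = nltk.ChartParser(grammar)
tokens = "the man saw the dog with the telescope".split()

# "with the telescope" can attach to the VP or to the NP,
# so the parser returns two trees: a classic structural ambiguity.
for tree in parser.parse(tokens):
    print(tree)

Running this prints two parse trees (the telescope used for seeing, or held by the dog), exactly the kind of ambiguity that made rule-based NLP harder than expected.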
A Brief History of NLP
AI winter
US funding of Machine Translation research was reckoned to have cost the public purse $20 million
by the mid 1960s. The Automatic Language Processing Advisory Committee (ALPAC) produced a
report on the results of the funding and concluded that "there had been no machine translation of
general scientific text, and none is in immediate prospect". US funding for machine translation
was stopped, and this had the effect of halting most associated work in non-Machine Translation
NLP.
Due to the lack of funding, much of the work in NLP came from computational linguistics during the AI
winter.
A Brief History of NLP
Transition Network Grammars for Natural Language Analysis, 1970, W. A. WOODS
A Brief History of NLP
Charles J. Fillmore developed frame semantics under the name
Case Grammar in 1968, "The Case for Case". Later, this developed into
computational semantics.
Charles J. Fillmore
1929-2014
"By finding representative examples of the uses of each word and classifying the meanings of the phrases that go with them, we are able to include in our database information about the words in our language that standard dictionaries simply don't have room for."
Example of PropBank from [slp3]
Example of Abstract Meaning Representation from [slp3]
A Brief History of NLP
Richard Montague developed formal semantics under the
name Universal Grammar in 1970, "Universal grammar".
Richard Montague
1930-1971
"There is in my opinion no important theoretical difference between natural languages and the artificial languages of logicians."
Example of expressions and their meaning from [5]
A Brief History of NLP
Mark Steedman developed Combinatory Categorial Grammar
in 1996, “Surface Structure and Interpretation”.
Mark Steedman
1946-Now
" The problem of child language acquisition then reduces to the problem of learning (a) a lexicon, and (b) a parsing model. "
All language-specific information is specified in the lexicon.
Syntactic and phonological derivation are isomorphic, and are synchronously coupled with semantic composition in a purely type-dependent rule-to-rule relation.
A Brief History of NLP
What do computational linguists really want?
1. A theory of what language is
You need to distinguish strings that should be considered
grammatical vs. those that are not, or that belong to another language
2. A theory of meaning representation
You have to specify what meaning is (e.g. a logical form)
3. A theory of constructing the meaning representation (via syntactic
rules)
You have to specify how to get the meaning from natural language
4. A theory of why learning such constructions is possible for children
You have to explain why children can learn such complicated
structures, as we do not memorize the whole internet to learn language
A Brief History of NLP
Computational Linguistics vs. Natural Language Processing
1. A theory of what language is (syntax)
2. A theory of meaning representation (semantics)
3. A theory of constructing the meaning representation (parsing)
4. A theory of why learning such constructions is possible for children (language acquisition)
Compare this list with the goals of NLP (understand, interpret and generate).
Does NLP care about 1?
No, we are more than happy to be able to process ungrammatical sentences from the user.
Does NLP care about 2?
Yes, vectors are meaning representations, and executable code can also be helpful.
Does NLP care about 3?
Yes, we need to construct our meaning representations somehow. Even for vectors, how to compute
the Transformer faster is important.
Does NLP care about 4?
Not at all. However, some computational linguists working in CS departments might.
In all, computational linguistics is a means to an end for NLP.
That being said, these are my personal opinions.
A Brief History of NLP
Fred Jelinek developed the first statistical NLP system in 1976,
"Continuous speech recognition by statistical methods".
This experimental system established the statistical principle of NLP.
An early paper in 1975, "Design of a Linguistic Statistical Decoder for the Recognition of Continuous Speech" by F. Jelinek, L. Bahl and R. Mercer, developed the hidden Markov model idea for speech.
Fred Jelinek
1932-2010
"Every time I fire a linguist, the performance of the speech recognizer goes up."
"Some of My Best Friends Are Linguists"
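As a rough sketch in simplified textbook notation (added here, not on the slide): a hidden Markov model for speech treats the word sequence W = (w_1, ..., w_T) as hidden states and the acoustic frames A = (a_1, ..., a_T) as emissions,

P(W, A) = \prod_{t=1}^{T} P(w_t | w_{t-1}) P(a_t | w_t)

and recognition searches for the W that maximizes P(W | A). Real systems use phone- or subword-level hidden states, but the factorization is the same idea.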
A Brief History of NLP
Peter Brown and others at IBM finally developed the first statistical
MT system in 1988, “A statistical approach to language translation”.
The noisy channel model is still researched today, and it worked in the speech industry at least until around 2018.
Peter Brown
1955-Now
"We fed nothing but data" [6]
On a technical level, the paper introduced word-to-word alignments as latent variables.
On a high level, machine translation really began to become useful.
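In standard noisy-channel notation (added here for clarity, not from the slide), translation of a foreign sentence f into English e is posed as

\hat{e} = \arg\max_e P(e | f) = \arg\max_e P(f | e) P(e)

where P(e) is an English language model and P(f | e) is the translation model; the IBM models write P(f | e) = \sum_a P(f, a | e), summing over the latent word-to-word alignments a.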
A Brief History of NLP
Sepp Hochreiter and Jürgen Schmidhuber
developed "Long short-term memory" in 1995.
It addressed the long-range dependency problem in learned representations.
Jürgen Schmidhuber
1963-Now
So @ylecun: "I've been advocating for deep learning architecture capable of planning since 2016" vs me: "I've been publishing deep learning architectures capable of planning since 1990."
A Brief History of NLP
Yoshua Bengio and others developed the first neural language model
in 2000, "A Neural Probabilistic Language Model".
Yoshua Bengio
1964-Now
"What fascinates me is how an intelligent agent, animal, human or machine, can figure out how their environment works."
A Brief History of NLP
Tomáš Mikolov and others developed the first working neural word
representations in 2013, "Efficient Estimation of Word Representations in
Vector Space".
Tomáš Mikolov
1982-Now
"What if actually not achieving A.I. is the biggest existential threat for humans?"
A Brief History of NLP
Alex Graves, Abdel-rahman Mohamed and Geoffrey Hinton
developed a deep learning based speech system in 2013, "Speech
recognition with deep recurrent neural networks".
This really showed that neural networks improve NLP systems.
Geoffrey Hinton
1947-Now
"My advice is to always trust your own intuition. Either you have good intuitions or you don't. If you have good intuitions you need to follow them and if you have bad intuitions it doesn't really matter what you do, so you might as well follow them anyway."
A Brief History of NLP
Ilya Sutskever and others developed GPT-2 in 2019, "Language
Models are Unsupervised Multitask Learners".
This is the first work that shows a language model alone can solve
multiple tasks.
GPT-3 was developed in the 2020 paper "Language Models are Few-Shot Learners",
where language models can "learn" new tasks on the fly.
Then came ChatGPT, and average people got to know the power of language models.
This line of work cannot simply be attributed to one person's leadership, but Ilya is a real star.
Ilya Sutskever
1986-Now
"The thing that was missing was a lot of supervised data and a lot of compute and the conviction to mix them together."
The Status Now
Dynabench: Rethinking Benchmarking in NLP
Supervised learning has been surpassing human performance.
The Status Now
Emergent Abilities of Large Language Models
LLM performance keeps scaling up with more compute.
The Status Now
AI clusters are run in the ocean
for better cooling.
"We could have AGI in 5 years," says OpenAI CEO Sam Altman.
“ In fact, its only achievement is to use up a lot of California’s energy. You can’t go to a physics
conference and say: I’ve got a great theory. It accounts for everything and is so simple it can be
captured in two words: “Anything goes.” ” - Noam Chomsky
"Don't work on LLM. This is in the hands of large companies, there's nothing you can bring to the
table. You should work on next-gen AI systems that lift the limitations of LLMs. “ – Yann Lecun
What you will learn and why
• Traditional NLP tasks from linguistics
Hopefully, we can convince you that rule-based systems and the structure of language are
still relevant today. In addition, some classification tasks are still important, and LLMs do not perform
so well on them.
• Lots of (probabilistic) inference algorithms
Those algorithms could help you in other structured prediction tasks such as protein
generation and planning.
• Deep neural networks
To understand what exactly neural networks are
• A small language model at the code level
To understand how exactly a language model functions, token by token
Planned syllabus
1. History + Regular Expression
2. Text Classification + Probabilistic Inference (Linear Model and Naïve Bayes)
3. Sequence Labeling + Probabilistic Graphical Model (Hidden Markov Model)
4. Vector Semantics
5. Neural Sequence Models (LSTM, Transformer) + Machine Translation
6. Neural Language Models (BERT, GPT) + Finetuning
7. Syntactic Processing (Context-Free Grammar) + CYK (Cocke–Younger–Kasami algorithm)
8. Semantic Processing (Knowledge Graph, SQL)
9. LLMs In-Context Learning + Reinforcement Learning from Human Feedback
10. LLMs Evaluation
11. LLMs Physics (https://physics.allen-zhu.com/)
12. LLMs Vision + Frontier
13. Project Presentation
Prerequisites (from Stanford cs224n)
Proficiency in Python
All class assignments will be in Python (using NumPy and PyTorch). If you need to remind yourself of
Python, or you're not very familiar with NumPy, you can come to the Python review session in week 1
(listed in the schedule). If you have a lot of programming experience but in a different language (e.g.
C/C++/Matlab/Java/Javascript), you will probably be fine.
College Calculus, Linear Algebra
You should be comfortable taking (multivariable) derivatives and understanding matrix/vector notation
and operations.
Basic Probability and Statistics
You should know the basics of probabilities, gaussian distributions, mean, standard deviation, etc.
Foundations of Machine Learning
We will be formulating cost functions, taking derivatives and performing optimization with gradient
descent. If you already have basic machine learning and/or deep learning knowledge, the course will
be easier; however it is possible to take CS224n without it. There are many introductions to ML, in
webpage, book, and video form. One approachable introduction is Hal Daumé’s in-progress A Course
in Machine Learning. Reading the first 5 chapters of that book would be good background. Knowing
the first 7 chapters would be even better!
Quick Review
• NLP is about human-to-machine communication, which has three
major components: understand, interpret and generate.
• NLP went from statistical (never realized) to rule-based (hit a
wall) to statistical, and was finally combined with deep learning.
• Many people do believe autoregressive models can achieve AGI,
but some either doubt their reliability or dismiss them as a scientific
theory of intelligence.
THANK YOU