Natural Language Processing
Dr. Ankur Priyadarshi
Assistant Professor
Computer Science and Information Technology
Syllabus
Prerequisites:
1. Basic knowledge of English grammar and the Theory of Computation.
2. Basic knowledge of Machine Learning tools.
Course objectives
1. To understand the algorithms available for processing linguistic information and the computational properties of natural languages.
2. To acquire basic knowledge of various morphological, syntactic, and semantic NLP tasks.
3. To become familiar with various publicly available NLP software libraries and datasets.
4. To develop systems for various NLP problems of moderate complexity.
5. To learn various strategies for NLP system evaluation and error analysis.
Unit I: INTRODUCTION TO NLP
Natural Language Processing
⊹   Natural language processing (NLP) refers to the branch of computer
    science—and more specifically, the branch of artificial intelligence or
    AI—concerned with giving computers the ability to understand text and
    spoken words in much the same way human beings can.
⊹   NLP combines computational linguistics—rule-based modeling of human
    language—with statistical, machine learning, and deep learning models.
⊹   Together, these technologies enable computers to process human
    language in the form of text or voice data and to ‘understand’ its full
    meaning, complete with the speaker or writer’s intent and sentiment.
      NLP APPLICATIONS
1. Information Extraction
2. Question Answering
3. Sentiment Analysis
4. Machine Translation, and many more:
Speech recognition, intent classification, urgency detection, auto-correct, market intelligence, email filtering, voice assistants and chatbots, targeted advertising, recruitment.
       Information Extraction (IE)
1.   Working with enormous amounts of text data is tedious and time-consuming.
2.   Hence, many companies and organisations rely on Information Extraction techniques to automate manual work with intelligent algorithms.
3.   Information extraction can reduce human effort, cut expenses, and make the process less error-prone and more efficient.
                         Example: IE
From a short report on a cricket match, we can extract structured fields such as:
●   Country – India, Captain – Virat Kohli
●   Batsman – Virat Kohli, Runs – 2
●   Bowler – Kyle Jamieson
●   Match venue – Wellington
●   Match series – New Zealand
●   Series highlight – single fifty, 8 innings, 3 formats
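To make this concrete, below is a minimal sketch of pulling such fields out automatically with spaCy's pretrained named-entity recognizer. It assumes spaCy is installed and the small English model has been fetched with "python -m spacy download en_core_web_sm"; the sample sentence is an illustrative stand-in, not the slide's actual match report.

    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Virat Kohli was dismissed by Kyle Jamieson for 2 runs in Wellington.")

    for ent in doc.ents:
        # ent.label_ is the predicted entity type (PERSON, GPE, CARDINAL, ...)
        print(ent.text, "->", ent.label_)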
               Question Answering
⊹   Question answering is a critical NLP problem and a
    long-standing artificial intelligence milestone.
⊹   QA systems allow a user to express a question in natural
    language and get an immediate and brief response.
⊹   QA systems are now found in search engines and phone conversational interfaces, and they are fairly good at answering simple factual questions.
⊹   On harder questions, however, they normally only go as far as returning a list of snippets that we, the users, must then browse through to find the answer to our question.
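As a flavour of how simple factual QA looks in code, here is a minimal extractive QA sketch using the Hugging Face transformers pipeline. It assumes transformers and a backend such as PyTorch are installed; the default QA model is downloaded on first use, and the question/context pair is illustrative.

    from transformers import pipeline

    qa = pipeline("question-answering")  # loads a default pretrained QA model
    result = qa(
        question="Where was the match played?",
        context="Virat Kohli scored 2 runs against New Zealand in Wellington.",
    )
    print(result["answer"])  # likely: "Wellington"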
Sentiment Analysis
⊹   Sentiment analysis (or opinion mining) is a natural
    language processing (NLP) technique used to determine
    whether data is positive, negative or neutral.
⊹   Sentiment analysis, as the name suggests, identifies the view or emotion behind a situation. It means analyzing a piece of text, speech, or any other mode of communication to find the emotion or intent behind it.
Suppose a fast-food chain sells a variety of food items such as burgers, pizzas, sandwiches, and milkshakes. It has created a website to sell its food; customers can order any food item from the website and also provide reviews, saying whether they liked the food or hated it.
 ● User Review 1: I love this cheese sandwich, it’s so delicious.
 ● User Review 2: This chicken burger has a very bad taste.
 ● User Review 3: I ordered this pizza today.
Of the three reviews above, the first is definitely positive: it signifies that the customer was really happy with the sandwich. The second is negative, so the company needs to look into its burger department. The third doesn't signify whether the customer is happy or not, so we can consider it a neutral statement.
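A minimal sketch of this three-way classification, using NLTK's VADER analyzer as one possible tool (assumes nltk is installed; the sentiment lexicon is fetched on first run):

    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    sia = SentimentIntensityAnalyzer()

    reviews = [
        "I love this cheese sandwich, it's so delicious.",
        "This chicken burger has a very bad taste.",
        "I ordered this pizza today.",
    ]
    for text in reviews:
        # compound ranges from -1 (most negative) to +1 (most positive)
        score = sia.polarity_scores(text)["compound"]
        label = "positive" if score > 0.05 else "negative" if score < -0.05 else "neutral"
        print(label, "->", text)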
Machine Translation
Machine Translation (MT) is the task of automatically
converting one natural language into another, preserving
the meaning of the input text, and producing fluent text in the
output language.
While machine translation is one of the oldest subfields of artificial intelligence
research, the recent shift towards large-scale empirical techniques has led to
very significant improvements in translation quality.
The Stanford Machine Translation group's research interests lie in techniques
that utilize both statistical methods and deep linguistic analyses.
Machine translation: approaches
 ●   Rule-based Machine Translation (RBMT): 1970s-1990s
 ●   Statistical Machine Translation (SMT): 1990s-2010s
 ●   Neural Machine Translation (NMT): 2014-...
Rule-based MT (RBMT)
A rule-based system requires experts’ knowledge about the source and
the target language to develop syntactic, semantic and morphological
rules to achieve the translation.
The Wikipedia article on RBMT includes a basic example of rule-based translation from English to German. The translation needs an English-German dictionary, a rule set for English grammar, and a rule set for German grammar.
An RBMT system contains a pipeline of Natural Language Processing
(NLP) tasks including Tokenization, Part-of-Speech tagging and so on.
Most of these jobs have to be done in both source and target language.
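To make the idea concrete, here is a toy English-to-German sketch; the four-word dictionary and the single noun-capitalization rule are illustrative assumptions, and a real RBMT system such as SYSTRAN uses far richer dictionaries and grammar rules.

    # Toy rule-based translation: dictionary lookup plus one target-grammar rule.
    en_de = {"i": "ich", "see": "sehe", "the": "das", "house": "haus"}
    german_nouns = {"haus"}  # rule data: German nouns are capitalized

    def translate(sentence: str) -> str:
        tokens = sentence.lower().split()          # tokenization
        words = [en_de.get(t, t) for t in tokens]  # dictionary lookup
        return " ".join(w.capitalize() if w in german_nouns else w for w in words)

    print(translate("I see the house"))  # -> "ich sehe das Haus"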
SYSTRAN is one of the oldest machine translation companies. It translates from and to around 20 languages.
SYSTRAN was used for the Apollo-Soyuz project (1973) and by the European Commission (1975).
Advantages
   ●   No bilingual text required
   ●   Domain-independent
   ●   Total control (a possible new rule for every situation)
   ●   Reusability (existing rules of languages can be transferred
       when paired with new languages)
Disadvantages
   ●   Requires good dictionaries
   ●   Manually set rules (requires expertise)
 Statistical MT
This approach uses statistical models based on the analysis of bilingual
text corpora.
It was first introduced in 1955, but it gained interest only after 1988
when the IBM Watson Research Center started using it.
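The flavour of the approach can be seen in a toy implementation of IBM Model 1, the classic word-alignment model from that IBM work; the three-sentence German-English corpus below is an illustrative assumption.

    from collections import defaultdict

    # Toy parallel corpus: (German sentence, English sentence) pairs.
    corpus = [
        (["das", "haus"], ["the", "house"]),
        (["das", "buch"], ["the", "book"]),
        (["ein", "buch"], ["a", "book"]),
    ]

    e_vocab = {e for _, es in corpus for e in es}
    t = defaultdict(lambda: 1.0 / len(e_vocab))  # t(e|f), uniform initialization

    for _ in range(10):  # EM iterations
        count, total = defaultdict(float), defaultdict(float)
        for fs, es in corpus:
            for e in es:
                z = sum(t[(e, f)] for f in fs)  # E-step: normalize over source words
                for f in fs:
                    count[(e, f)] += t[(e, f)] / z
                    total[f] += t[(e, f)] / z
        for (e, f), c in count.items():         # M-step: re-estimate t(e|f)
            t[(e, f)] = c / total[f]

    print(round(t[("house", "haus")], 3))  # grows toward 1.0 as alignments sharpen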
                     SMT Examples
●   Google Translate (between 2006 and 2016, when it announced the switch to NMT)
●   Microsoft Translator (switched to NMT in 2016)
●   Moses: Open source toolkit for statistical machine translation
Advantages
   ●   Less manual work from linguistic experts
   ●   One SMT suitable for more language pairs
   ●   Less out-of-dictionary translation: with the right language
       model, the translation is more fluent
Disadvantages
   ●   Requires bilingual corpus
   ●   Specific errors are hard to fix
   ●   Less suitable for language pairs with big differences in word order
Neural MT
❖   The neural approach uses neural networks to achieve machine
    translation.
❖   Compared to the previous models, NMTs can be built with one
    network instead of a pipeline of separate tasks.
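A minimal sketch of running a pretrained NMT model through the Hugging Face transformers pipeline (assumes transformers, sentencepiece, and a backend such as PyTorch are installed; the Helsinki-NLP/opus-mt-en-de model is downloaded on first use):

    from transformers import pipeline

    # One end-to-end network replaces the RBMT/SMT pipeline of separate tasks.
    translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")
    print(translator("The house is small.")[0]["translation_text"])  # e.g. "Das Haus ist klein."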
NMT examples
   ●   Google Translate (from 2016; see the language team at Google AI)
   ●   Microsoft Translator (from 2016; see MT research at Microsoft)
   ●   Translation on Facebook (see NLP at Facebook AI)
   ●   OpenNMT: an open-source neural machine translation system
Advantages
   ●   End-to-end models (no pipeline of specific tasks)
Disadvantages
   ●   Requires bilingual corpus
   ●   Rare word problem
NLP PHASES
     Lexical Analysis
 ●    It involves identifying and analyzing the structure of words. Lexicon of a
      language means the collection of words and phrases in that particular
      language.
 ●    Lexical analysis divides the text into paragraphs, sentences, and words, so we need to perform lexicon normalization.
The two most common lexicon normalization techniques are Stemming and Lemmatization:
 ●    Stemming: Stemming is the process of reducing derived words to their word stem, base, or root form, generally by stripping written suffixes such as “-ing”, “-ly”, “-es”, and “-s”.
 ●    Lemmatization: Lemmatization is the process of reducing a group of words to their lemma, or dictionary form. It takes into account things like POS (Part-of-Speech) tags, the meaning of the word in the sentence, the meaning of the word in nearby sentences, etc., before reducing the word to its lemma.
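A minimal sketch contrasting the two techniques with NLTK (assumes nltk is installed; the WordNet data is fetched on first run):

    import nltk
    from nltk.stem import PorterStemmer, WordNetLemmatizer

    nltk.download("wordnet", quiet=True)
    stemmer = PorterStemmer()
    lemmatizer = WordNetLemmatizer()

    for word in ["studies", "studying", "caring"]:
        # Stemming truncates suffixes; lemmatization maps to a dictionary form.
        print(word, "->", stemmer.stem(word), "/", lemmatizer.lemmatize(word, pos="v"))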
   Syntactic Analysis
Syntactic analysis is used to check grammar, the arrangement of words, and the interrelationships between words.
     Example: Mumbai goes to the Sara
Here “Mumbai goes to the Sara” does not make any sense, so this sentence is rejected by the syntactic analyzer.
Syntactic parsing involves analyzing the words in a sentence for grammar.
Dependency grammar and Part-of-Speech (POS) tags are the important attributes of syntactic analysis.
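A minimal sketch of POS tagging and dependency parsing with spaCy (assumes spaCy is installed and the en_core_web_sm model has been downloaded):

    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Mumbai goes to the Sara")

    for token in doc:
        # token.pos_ is the coarse POS tag; token.dep_ the relation to token.head
        print(token.text, token.pos_, token.dep_, "<-", token.head.text)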
                     Semantic analysis
The way we understand what someone has said is an unconscious
process relying on our intuition and knowledge about language itself.
In other words, the way we understand language is heavily based on
meaning and context. Computers need a different approach, however.
The word “semantic” is a linguistic term and means "related to
meaning or logic."
Semantic analysis is the process of understanding the meaning and
interpretation of words, signs and sentence structure.
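A classic instance of semantic analysis is word-sense disambiguation; below is a minimal sketch using NLTK's implementation of the Lesk algorithm (assumes nltk is installed; WordNet is fetched on first run, and the example sentence is illustrative):

    import nltk
    from nltk.wsd import lesk

    nltk.download("wordnet", quiet=True)

    # Which sense of "bank" fits this context: river bank or financial institution?
    context = "I went to the bank to deposit money".split()
    sense = lesk(context, "bank")
    print(sense.name(), "-", sense.definition())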
       Discourse Integration
Discourse integration is closely related to pragmatics (context of the sentence).
Discourse integration is considered the larger context for any smaller part of an NL structure. Natural language is highly complex, and most of the time sequences of text depend on the prior discourse.
This concept occurs often in pragmatic ambiguity. This analysis deals with how the
immediately preceding sentence can affect the meaning and interpretation of the
next sentence. Here, context can be analyzed in a bigger context, such as paragraph
level, document level, and so on.
   Pragmatic Analysis
Pragmatic Analysis is part of the process of extracting information from text.
Specifically, it's the portion that focuses on taking a structured set of text and figuring out its actual meaning.
It comes from the field of linguistics (as a lot of NLP does), where the context of the text is taken into account.
Why is this important? Because much of a text's meaning has to do with the context in which it was said or written.
Ambiguity, and limiting ambiguity, are at the core of natural language processing, so pragmatic analysis is crucial for extracting meaning or information.
    Difficulty In NLP
●   Contextual words and phrases and homonyms
●   Synonyms
●   Irony and sarcasm
●   Ambiguity
●   Errors in text or speech
●   Colloquialisms and slang
●   Domain-specific language
●   Low-resource languages
●   Lack of research and development