
NLP Basics for Beginners

The document discusses natural language processing including its history, applications, challenges, and components. It also provides an overview of the course which will cover text data techniques, sentiment analysis, and deep learning with text data.

Uploaded by

Ansruta Mohanty
Copyright: © All Rights Reserved

Working with Text Data

Natural Language Processing


Session 1

Madhuri Prabhala
Books

❑ Foundations of Statistical Natural Language Processing by Christopher Manning and Hinrich Schütze

❑ Natural Language Processing with Python by Steven Bird, Ewan Klein and Edward Loper
Introduction
What is Natural Language Processing?

The field of designing methods and algorithms that take as input or produce as
output unstructured, natural language data.

❑ A branch of Data science and Artificial intelligence

❑ Enables machine understanding of human languages in text or audio formats

❑ Several applications on the internet use NLP to improve user experience


A brief history of NLP

o Came into prominence with Alan Mathison Turing's 1950 paper, “Computing Machinery and Intelligence”

o Marked the first discussions of “automatic interpretation” and “language generation”

o Today, a range of tools can carry out these tasks within minutes

o Python, first released in 1991, is one such tool; its libraries can perform many NLP tasks within minutes
A few examples of NLP applications

o Service bots
o News classification
o Spelling corrections and prompts: flagging “Hllo sir.”, or suggesting “are” or “is” after “Where …”
o Gaining customer insights
o Machine translation

And many more…

o Search engines
o Grammarly, Word
o Text summarization
o Speech processing – Alexa, Siri
Some of the challenges with NLP

o Cultural jargon, informal expressions, idioms

o Data and computation heavy

o Not always accurate

o Ambiguity: “with” plays a different role in each of these sentences
   I ate pizza with friends (that is, friends and I had pizza together)
   I ate pizza with olives


Components of NLP

❑ Heuristic based
o Rule-based
o Domain knowledge, Regex

❑ Statistical machine learning based
o Rules are derived from data
o Naïve Bayes, SVM

❑ Neural network based
o Deep learning
o RNNs, LSTM
A brief overview of the course

o Introduction to text data

o Key techniques of text data

o Sentiment analysis

o Deep learning with text data


Text data
Text dataset

❑ Text data

o Words or characters arranged in a meaningful manner


o In any natural language such as Hindi, Telugu, Japanese or English
o Driven by grammar rules and well-defined structures

❑ Examples

o Social media: tweets, Facebook comments, Instagram feeds


o Articles: Blogs, newspapers
o Conversations: Emails, chats
The NLP System

Documents → NLP → Insights

Impact of good insights


One business application

❑ Is the new version of iPhone popular among customers?

❑ NLP solution

o Data: social media posts, reviews

o Analysis: sentiments, aspects, topics

o Model:
   Is positive sentiment associated with higher sales?
   Is negative sentiment associated with lower sales?
   Is discussion of specific aspects affecting sales?

o Deploy and evaluate


Working with text data
Regular expression

❑ Patterns of special characters having specific meaning

❑ Help find information and patterns embedded in the text

❑ Also called wildcard expressions; used for matching, searching, and parsing strings

❑ Used for writing rule-based information mining systems

❑ Can segment words from sentences, sentences from paragraphs

❑ Used in information retrieval
Regular expression – functions

o match – checks for the pattern only at the beginning of the text

o search – locates the first occurrence of the pattern anywhere in the text

o findall – finds all the occurrences of the pattern

o sub – Find and replace a pattern

o split – Split a given text
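The five functions above can be tried out with a short sketch. It assumes the slides refer to Python's built-in re module, which provides functions with these names; the sample text and variable names are made up for illustration.

```python
import re

# Sample text is made up for illustration.
text = "Hllo sir. Where is the report? Call 555-0100 or 555-0199."

# match: succeeds only if the pattern matches at the very start of the text.
starts_with_hllo = re.match(r"Hllo", text) is not None
starts_with_sir = re.match(r"sir", text) is not None

# search: scans the text and returns the first match found anywhere.
first_sir = re.search(r"sir", text)
sir_position = first_sir.start() if first_sir else -1

# findall: returns every non-overlapping match as a list of strings.
phone_numbers = re.findall(r"\d{3}-\d{4}", text)

# sub: find and replace a pattern.
corrected = re.sub(r"\bHllo\b", "Hello", text)

# split: split the text wherever the pattern matches,
# here on whitespace that follows ".", "?" or "!".
sentences = re.split(r"(?<=[.?!])\s+", text)

print(starts_with_hllo, starts_with_sir, sir_position)
print(phone_numbers)
print(corrected)
print(sentences)
```

Note that match fails for "sir" even though the word appears in the text, because match looks only at the start; search is usually the better choice when the position is unknown.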


Stemming and Lemmatizing

❑ Stemming
o Removes last few characters irrespective of context
o E.g. Stemming “increases” returns “increas”
o Useful for very large datasets, where it saves computation time

❑ Lemmatization
o Considers the context before converting to the base form
o E.g. Lemmatizing “increases” returns “increase”
o Computationally more expensive, but more accurate
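The contrast can be sketched with a toy example. This is not the algorithm used by any real library: the suffix list, the lookup table, and the function names toy_stem and toy_lemmatize are all hypothetical. Real stemmers and lemmatizers rely on much richer rules and dictionaries.

```python
# Toy illustration only: the suffix list and lemma table are hypothetical.

SUFFIXES = ("ing", "es", "ed", "s")  # checked in this order


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, ignoring context and meaning."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Lemmatization needs context; here the context is the part of speech.
LEMMAS = {
    ("increases", "verb"): "increase",
    ("increases", "noun"): "increase",
    ("meeting", "verb"): "meet",
    ("meeting", "noun"): "meeting",
}


def toy_lemmatize(word: str, pos: str) -> str:
    """Look up the base form for a (word, part of speech) pair."""
    return LEMMAS.get((word, pos), word)


print(toy_stem("increases"))               # suffix stripped, not a real word
print(toy_lemmatize("increases", "verb"))  # dictionary base form
print(toy_stem("meeting"))                 # same result whatever the context
print(toy_lemmatize("meeting", "noun"))    # context keeps the noun intact
```

The stemmer is a few string checks per word, which is why it scales to large datasets. The lemmatizer needs a dictionary and the part of speech, which is where the extra cost and the extra accuracy come from.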
Parts of speech tagging

❑ Tokenize
o Break up paragraphs into sentences
o Break up sentences into words

❑ POS tagging
o Identify and tag the part of speech of each word
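The two steps can be sketched with the standard library alone. The tokenizers below are simple regular-expression approximations, and the tagger is a hypothetical lookup table (TOY_LEXICON) covering only the example sentences; real taggers are trained on annotated text and use the surrounding words.

```python
import re


def sent_tokenize(paragraph: str) -> list[str]:
    """Break a paragraph into sentences at whitespace after ., ? or !"""
    parts = re.split(r"(?<=[.?!])\s+", paragraph.strip())
    return [p for p in parts if p]


def word_tokenize(sentence: str) -> list[str]:
    """Break a sentence into words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


# Hypothetical mini-lexicon; unknown words are tagged "UNK".
TOY_LEXICON = {
    "i": "PRON", "ate": "VERB", "pizza": "NOUN", "with": "ADP",
    "friends": "NOUN", "olives": "NOUN", ".": "PUNCT",
}


def toy_pos_tag(tokens: list[str]) -> list[tuple[str, str]]:
    """Tag each token by dictionary lookup, ignoring context."""
    return [(tok, TOY_LEXICON.get(tok.lower(), "UNK")) for tok in tokens]


paragraph = "I ate pizza with friends. I ate pizza with olives."
sentences = sent_tokenize(paragraph)
tagged = [toy_pos_tag(word_tokenize(s)) for s in sentences]

for sentence_tags in tagged:
    print(sentence_tags)
```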
