
NLP Notes-1

Natural Language Processing (NLP) is a branch of artificial intelligence that enables machines to read, understand, and derive meaning from human languages by combining linguistics and computer science. Key components of NLP include Natural Language Understanding (NLU) and Natural Language Generation (NLG), which help machines analyze and generate human language. Applications of NLP range from sentiment analysis and machine translation to chatbots and information extraction.

Natural Language Processing

Natural Language Processing, or NLP, refers to the branch of Artificial Intelligence that gives machines the ability to read, understand and derive meaning from human languages.

NLP combines the fields of linguistics and computer science to decipher language structure and guidelines, and to make models which can comprehend, break down and separate significant details from text and speech.

[Figure: NLP combines computer science and human language. Components of NLP: NLU and NLG.]

Natural Language Understanding (NLU) and Natural Language Generation (NLG)

NLU helps the machine to understand and analyse human language by extracting metadata from content, such as concepts, entities, keywords, emotion, relations and semantic roles. NLU is the process of reading and interpreting language.

NLG acts as a translator that converts computerized data into a natural language representation. It involves text planning, sentence planning and text realization. NLG is the process of writing or generating language.

Applications of NLP

1. Question Answering. Ex: Alexa
2. Spam Detection. Ex: spam mail detection
3. Sentiment Analysis. Ex: "Delicious food" (+ve), "Unhappy with order" (-ve)
4. Machine Translation. Ex: Google Translate
5. Spelling Correction. Ex: Grammarly
6. Speech Recognition
7. Chatbot. Ex: customer support
8. Information Extraction. Ex: resume ATS

NLP Pipelines

Steps involved:

1. Sentence Segmentation: Sentence segmentation is used to break the paragraph into separate sentences.
Ex: A boy is playing cricket. Match started at 10 AM.
Sentence 1: A boy is playing cricket.
Sentence 2: Match started at 10 AM.
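
Sentence segmentation can be sketched with a regular expression. This is a minimal illustration, assuming that sentences end with ".", "!" or "?" followed by whitespace; the function name split_sentences is hypothetical, and real segmenters also handle abbreviations and other edge cases.

```python
import re

def split_sentences(paragraph: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]

sentences = split_sentences("A boy is playing cricket. Match started at 10 AM.")
for s in sentences:
    print(s)
```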

2. Word Tokenization: Word tokenization is used to break the sentence into separate words or tokens.
Ex: The tokenizer generates the following result:
The stars are twinkling at night
"The", "stars", "are", "twinkling", "at", "night"
Each word is called a token.
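
A word tokenizer can be approximated in a few lines. This sketch assumes that a token is any run of letters, digits or underscores, so punctuation is simply dropped; the name tokenize is hypothetical.

```python
import re

def tokenize(sentence: str) -> list[str]:
    # \w+ matches runs of word characters, so spaces and punctuation act as separators.
    return re.findall(r"\w+", sentence)

tokens = tokenize("The stars are twinkling at night.")
print(tokens)
```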

3. Removing Stop Words: In English, there are a lot of words that appear very frequently, like "is", "and", "the" and "a". NLP pipelines will flag these words as stop words. Stop words might be filtered out before doing any statistical analysis.
Ex: The stars are twinkling at night
"stars", "twinkling", "night"
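
Stop word removal is a simple filter over the tokens. The stop word set below is a tiny hand-picked assumption for this example, not a standard list.

```python
# Assumed, illustrative stop word list; real pipelines use much longer lists.
STOP_WORDS = {"is", "and", "the", "a", "are", "at"}

def remove_stop_words(tokens: list[str]) -> list[str]:
    # Compare in lower case so that "The" and "the" are treated alike.
    return [t for t in tokens if t.lower() not in STOP_WORDS]

filtered = remove_stop_words(["The", "stars", "are", "twinkling", "at", "night"])
print(filtered)
```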

4. Stemming: Stemming is used to normalize words into their base form or root form. For example, celebrates, celebrated and celebrating all originate from the single root word "celebrate". The big problem with stemming is that sometimes it produces a root word which may not have any meaning.
Ex:
intelligence -> intelligen
intelligent -> intelligen
intelligently -> intelligen
skipping -> skip
skipped -> skip
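
The idea of stemming can be illustrated with a toy suffix stripper. This is only a sketch: the suffix list is an assumption chosen for these examples, the name toy_stem is hypothetical, and real stemmers such as the Porter stemmer use more elaborate rules and may return different stems.

```python
# Hand-picked suffixes, checked in order (an assumption for illustration).
SUFFIXES = ("tly", "ing", "ce", "ed", "es", "t", "s")

def toy_stem(word: str) -> str:
    word = word.lower()
    for suffix in SUFFIXES:
        # Strip the first matching suffix, keeping at least three characters.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ["celebrates", "celebrated", "celebrating",
          "intelligence", "intelligent", "intelligently"]:
    print(w, "->", toy_stem(w))
```

Note that "celebrat" and "intelligen" are not dictionary words, which is the drawback described above.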

5. Lemmatization: Lemmatization is quite similar to stemming. It is used to group different inflected forms of the word, called the lemma. The main difference between stemming and lemmatization is that it produces the root word, which has a meaning.
Ex:
intelligence -> intelligent
intelligent -> intelligent
intelligently -> intelligent, which has a meaning
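
Lemmatization is usually backed by a dictionary rather than suffix rules. The sketch below assumes a tiny hand-written lookup table (LEMMA_TABLE and toy_lemmatize are hypothetical names); a real lemmatizer uses a full lexicon and often the part of speech.

```python
# Assumed miniature lexicon mapping inflected forms to a dictionary word.
LEMMA_TABLE = {
    "celebrates": "celebrate",
    "celebrated": "celebrate",
    "celebrating": "celebrate",
    "stars": "star",
}

def toy_lemmatize(word: str) -> str:
    w = word.lower()
    # Unknown words are returned unchanged.
    return LEMMA_TABLE.get(w, w)

for w in ["celebrates", "celebrated", "celebrating", "stars", "night"]:
    print(w, "->", toy_lemmatize(w))
```

Unlike a stem, every value in the table is a meaningful word.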

6. Dependency Parsing: Dependency parsing is used to find how all the words in the sentence are related to each other.

7. Part-of-Speech Tagging: Now we must explain the concept of nouns, verbs, articles and other parts of speech to the machine by adding these tags to our words.
Ex:
The (Determiner) stars (Noun) are (Verb) twinkling (Adjective) at (Preposition) night (Noun)


8. Named Entity Recognition (NER): Named Entity Recognition is the process of detecting the named entity, such as a person name, movie name, organization name or location.
Ex: Steve Jobs introduced iPhone

9. Chunking: Chunking is used to collect individual pieces of information and group them into bigger pieces of sentences.

[Figure: Dataset -> Text preprocessing 1 (tokenization, lowering the case of words, stop words) -> Text preprocessing 2 (stemming, lemmatization) -> Text to vectors (BoW, TF-IDF, Word2Vec)]

Basic Terminologies used in NLP

Corpus: paragraph
Documents: sentences
Vocabulary: unique words
Word: word

Bag of Words (BoW)

The Bag of Words (BoW) model is a representation that turns arbitrary text into fixed-length vectors by counting how many times each word appears. This process is often referred to as vectorization.

Bag of Words works on two things:
1. A known word vocabulary.
2. A measure of how many known words are present.

The model does not consider the order or structure of words, or the information present in it; all of that is discarded. The model only deals with whether known words occur in the document, even without considering where in the document.

Steps

1. Data Collection: Consider 3 lines of text, each as a separate document which needs to be vectorized.
the dog sat
the dog sat in the hat
the dog with the hat

2. Determine the Vocabulary: Vocabulary is defined as the set of all the words found in the documents. The words in the documents above are: the, dog, sat, in, hat, with.

3. Counting: The vectorization process involves counting the number of times each word appears.

Document                  the  dog  sat  in  hat  with
the dog sat                1    1    1   0    0    0
the dog sat in the hat     2    1    1   1    1    0
the dog with the hat       2    1    0   0    1    1

This generates a 6-length vector for each document.

As you can see, the BoW vector only contains information about what words occurred and how many times, without contextual information or where they occur.
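
The three steps can be sketched in plain Python. This is a minimal illustration using the three example documents; the variable names are assumptions, and the vocabulary order simply follows first appearance.

```python
docs = ["the dog sat", "the dog sat in the hat", "the dog with the hat"]

# Step 2: build the vocabulary in order of first appearance.
vocab = []
for doc in docs:
    for word in doc.split():
        if word not in vocab:
            vocab.append(word)

# Step 3: count how many times each vocabulary word appears in each document.
vectors = [[doc.split().count(word) for word in vocab] for doc in docs]

print(vocab)
for doc, vec in zip(docs, vectors):
    print(vec, doc)
```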

4. Managing Vocabulary: As we can see from the previous example, as the vocabulary grows, the vector representation of the documents also grows. This means that for very large documents, such as books, the vector length can stretch up to thousands of positions. Since each document may contain only a few of the known words, this creates a lot of empty slots filled with zeros, called a sparse vector.

We use data cleaning methods to reduce the size of the vocabulary. This includes ignoring case, ignoring punctuation, fixing misspelt words and ignoring stop words.
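
The effect of cleaning on vocabulary size can be seen in a small sketch. The documents, the stop word set and the function name clean are assumptions made up for this illustration; fixing misspelt words is left out.

```python
import string

def clean(text: str, stop_words=frozenset({"the", "a", "is", "and"})) -> list[str]:
    # Ignore case and punctuation, then drop stop words.
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [w for w in text.split() if w not in stop_words]

raw_docs = ["The dog sat.", "the DOG sat!", "A dog, sat"]

vocab_raw = {w for doc in raw_docs for w in doc.split()}
vocab_clean = {w for doc in raw_docs for w in clean(doc)}

print(len(vocab_raw), sorted(vocab_raw))
print(len(vocab_clean), sorted(vocab_clean))
```

Without cleaning, "The", "the", "sat." and "sat!" all count as different vocabulary entries.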

5. Scoring words: Scoring words is simply attaching a numerical value to mark the occurrence of the words. The simplest scoring is binary: the presence or absence of words. Other scoring methods include:
Counts: count every time the word appears in the document.
Frequencies: calculate the frequency of the words in a document in contrast to the total number of words in the document.
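
The three scoring methods can be compared on one document. This sketch assumes the six-word vocabulary from the example and scores the document "the dog sat in the hat".

```python
from collections import Counter

vocab = ["the", "dog", "sat", "in", "hat", "with"]
tokens = "the dog sat in the hat".split()
counts = Counter(tokens)

binary = [1 if counts[w] > 0 else 0 for w in vocab]   # presence or absence
count_vec = [counts[w] for w in vocab]                # raw counts
freq_vec = [counts[w] / len(tokens) for w in vocab]   # count / total words

print(binary)
print(count_vec)
print([round(f, 3) for f in freq_vec])
```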

Advantages:
1. Simple
2. Intuitive

Disadvantages:
1. Sparsity
2. Ordering: the order of words is lost
3. Semantic meaning: not able to capture

N-grams

An N-gram is a sequence of N words in the modeling of NLP.

Unigram or one-gram: a one-word sequence.
Ex: This is a sentence -> "This", "is", "a", "sentence"

Bigram or two-gram: a two-word sequence.
Ex: This is a sentence -> "This is", "is a", "a sentence"

Trigram or three-gram: a three-word sequence.
Ex: This is a sentence -> "This is a", "is a sentence"

We can calculate N-grams for any N in the same way.

Applications: speech recognition, machine translation, etc.
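
N-grams of any size can be produced with a sliding window. A minimal sketch, assuming whitespace tokenization; the function name ngrams is hypothetical.

```python
def ngrams(tokens: list[str], n: int) -> list[str]:
    # Slide a window of n tokens over the sentence, one position at a time.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "This is a sentence".split()
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(unigrams)
print(bigrams)
print(trigrams)
```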

TF-IDF

TF-IDF stands for Term Frequency-Inverse Document Frequency, and the tf-idf weight is a weight often used in information retrieval and text mining. This weight is a statistical measure used to evaluate how important a word is to a document in a collection or corpus. The importance increases proportionally to the number of times a word appears in the document but is offset by the frequency of the word in the corpus.

Variations of the tf-idf weighting scheme are often used by search engines as a central tool in scoring and ranking a document's relevance given a user query.

TF-IDF can be successfully used for stop-words filtering in various subject fields, including text summarization and classification.

TF: Term Frequency, which measures how frequently a term occurs in a document.

$$\mathrm{TF}(t) = \frac{\text{number of times term } t \text{ appears in a document}}{\text{total number of terms in the document}}$$

IDF: Inverse Document Frequency, which measures how important a term is. While computing TF, all terms are considered equally important. However, it is known that certain terms, such as "is" and "be", may appear a lot of times but have little importance. Thus we need to weigh down the frequent terms while scaling up the rare ones.

$$\mathrm{IDF}(t) = \log_e\left(\frac{\text{total number of documents}}{\text{number of documents with term } t \text{ in it}}\right)$$

Example: Let's consider 3 sentences (documents):

Sent 1: good boy
Sent 2: good girl
Sent 3: boy girl good

Here "boy" should be given more importance or weight than "good", since "good" is frequent in the corpus.

Find the vocabulary in the sentences and compute TF and IDF.

TF
word    Sent 1   Sent 2   Sent 3
good    1/2      1/2      1/3
boy     1/2      0        1/3
girl    0        1/2      1/3

IDF
word    IDF
good    log_e(3/3) = 0
boy     log_e(3/2)
girl    log_e(3/2)

Now multiply TF and IDF to obtain the TF-IDF weight.

         good   boy                 girl
Sent 1   0      1/2 * log_e(3/2)    0
Sent 2   0      0                   1/2 * log_e(3/2)
Sent 3   0      1/3 * log_e(3/2)    1/3 * log_e(3/2)
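
The worked example can be checked numerically. This sketch assumes the three sentences "good boy", "good girl" and "boy girl good" and uses the natural logarithm for IDF; the function names tf and idf are hypothetical. Library implementations often add smoothing, so their numbers can differ.

```python
import math

docs = {
    "Sent 1": ["good", "boy"],
    "Sent 2": ["good", "girl"],
    "Sent 3": ["boy", "girl", "good"],
}
vocab = ["good", "boy", "girl"]

def tf(term: str, tokens: list[str]) -> float:
    # Occurrences of the term divided by the number of terms in the document.
    return tokens.count(term) / len(tokens)

def idf(term: str) -> float:
    # log_e(total documents / documents containing the term)
    n_with_term = sum(term in tokens for tokens in docs.values())
    return math.log(len(docs) / n_with_term)

tfidf = {name: [tf(t, tokens) * idf(t) for t in vocab]
         for name, tokens in docs.items()}

for name, weights in tfidf.items():
    print(name, [round(w, 3) for w in weights])
```

The word "good" appears in every sentence, so its IDF and therefore its TF-IDF weight is 0 everywhere.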

Advantages:
1. Intuitive
2. Word importance is captured

Disadvantages:
1. Sparsity
2. Out of vocabulary (OOV)

Word Embeddings

Word embeddings are a type of word representation that allows words with similar meanings to have a similar representation. Embeddings translate large sparse vectors into a lower-dimensional space that preserves semantic relationships.
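
"Similar representation" is commonly measured with cosine similarity between vectors. The three-dimensional vectors below are made-up values for illustration only, not output of any trained model.

```python
import math

# Hypothetical embeddings; real models learn hundreds of dimensions from data.
embeddings = {
    "king": [0.80, 0.65, 0.10],
    "queen": [0.75, 0.70, 0.15],
    "apple": [0.10, 0.20, 0.90],
}

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

sim_related = cosine(embeddings["king"], embeddings["queen"])
sim_unrelated = cosine(embeddings["king"], embeddings["apple"])
print(round(sim_related, 3), round(sim_unrelated, 3))
```

With these assumed vectors, the related pair scores much closer to 1 than the unrelated pair.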
