Introduction to NLP (ELL 881)
Special Topics in Computers 2
Image: https://www.blumeglobal.com/learning/natural-language-processing/
                   Neuro-linguistic
                    programming
                     Non-Linear
                    Programming
                     Natural
                    Language
                    Processing
NLP (Wiki)
1.   Natural Language Processing
2.   Nonlinear Programming
3.   Neuro-linguistic Programming
4.   Natural-language Programming
5.   National Library of Poland
6.   National Library of the Philippines
7.   No light perception
8.   National Labour Party
9.   National Liberal Party
10.  National Liberation Party
11.  Natural Law Party
12.  New Labour Party
https://en.wikipedia.org/wiki/NLP
• Course Instructor: Tanmoy Chakraborty (tanmoychak.com)
                    (NLP, Social Media, Graph Neural Networks)
                    tanchak@iitd.ac.in
• Guest Lecture: TBD
• Course page: https://sites.google.com/view/ell881-iitd/home
• Piazza: https://piazza.com/iitd.ac.in/spring2023/ell881
• TAs:
  • Kshitij Alwadhi (Kshitij.Alwadhi.ee119@ee.iitd.ac.in)
  • Gurusha Juneja (ee1190480@ee.iitd.ac.in)
• Group Email: TBD
Useful resources/tools/libraries
  • Natural Language Toolkit (NLTK)
  • Stanford CoreNLP
  • CMU ARK for Noisy Text
  • Scikit-learn
  • spaCy
  • Stanza
  • Shallow Parser (for Indian languages)
  • Universal Parser (multilingual)
  • Hugging Face
Reading and Reference materials
• Books
    •   Speech and Language Processing, Dan Jurafsky and James H. Martin
    https://web.stanford.edu/~jurafsky/slp3/
    •   Foundations of Statistical Natural Language Processing, Chris Manning and Hinrich Schütze
    •   Natural Language Processing, Jacob Eisenstein
    https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf
    •   A Primer on Neural Network Models for Natural Language Processing, Yoav Goldberg
    http://u.cs.biu.ac.il/~yogo/nnlp.pdf
• Journals
    •   Computational Linguistics, Natural Language Engineering, TACL, KBS, ACM TALLIP, etc.
• Conferences
    •   ACL, EMNLP, NAACL, COLING, AAAI, IJCNLP, ICML, NeurIPS (formerly NIPS), WWW, KDD, SIGIR, etc.
Research papers repository
                             https://aclanthology.org/
Research papers repository
                             https://arxiv.org/list/cs.CL/recent
 Prerequisite
     • Excitement about language!
     • Willingness to learn
   Mandatory
   • Data Structures & Algorithms
   • Machine Learning
   • Python programming
   Desirable
   • Deep learning
• Strongly recommended to learn ML. This class will not cover the fundamentals of ML.
• Instructor/TAs may cover DL-related prerequisites.
  Course Directives
   • Class Time: Mon & Thu, 2 pm – 3:30 pm
   • Office Hour: Mon 5-6 pm
   • Room: LH-519
  HashLearn
   • Meet your instructor at least once every 15 days to resolve your doubts.
   • Mon 5-5:30 pm (appointment based; email me at least 1 hr before coming)
Marks distribution (tentative):
• Minor 1: 10%
• Minor 2: 10%
• Major: 20%
• Quiz (3): 15%
• Assignment (2): 20%
• Mini-project: 20% (group-wise)
• Paper reading (1): 5% (group-wise?)
• Audit: Discouraged! B- (threshold to pass the course)
• Grading Scheme: Relative?
• 75% attendance mandatory (Timble)
• If you want to deregister, please do it ASAP
• Please allow others to register
• Registration limit (80) may not be increased
       Mini Project (20%)
      • A few problem statements and datasets will be floated (in Jan 2023)*
      • A leaderboard will be maintained per problem statement
      • Each group should consist of 1-3 students?
      • Best Project Award
      • You need to
           •   develop models
           •   evaluate your models
           •   prepare presentation
           •   write tech report
      Deliverables:
      1. Final project report (8%), 8 pages, ACL format. Needs to be posted on arXiv.
      2. Repo of dataset and source code (2%)
      3. Final project presentation (5%)
      4. Performance on leaderboard (5%)
      Students are encouraged to publish their projects in good conferences/journals.
* You are welcome to propose a new idea if you find it fascinating; whether it qualifies as a mini project is at the instructor's discretion.
List of Projects
• TBD
             Content (Tentative)
                 • Introduction
                 • Classical NLP (1980-2010)
                                   •   Regular Expressions, Text Normalization, and Edit Distance
                                   •   Morphology & Finite-state Transducers
                                   •   N-grams, smoothing and entropy
                                   •   HMM, Viterbi and A* decoding
                                   •   Word classes and POS tagging
                                   •   Semantics & distributional semantics
                 • Intro to deep learning
                 • Deep Learning for NLP (2011-2017)
                                   •   Word vectors and word window classification (Word2Vec, GloVe, etc.)
                                   •   RNNs and language models (vanishing gradients, fancy RNNs)
                                   •   Sequence-to-sequence models and applications
                                   •   Attention mechanisms & self-attention
                                   •   Transformers
                 • Adv. NLP (2018-present)
                                   •   More about Transformers (BERT, RoBERTa, ELMo, transfer learning)
                                   •   Prompt-based learning
                                   •   In-context learning
                                   •   Multilingual and multimodal models
                                   •   Fairness and ethics in NLP
                                   •   Miscellaneous
Timeline
• Classical NLP: Day 1 → Assignment 1 → Mini projects (problem statements) → Quiz 1 → Minor 1
• NLP with DL: Assignment 2 → Quiz 2 → Minor 2
• Adv. NLP: Assignment 3 → Mini project evaluation → Quiz 3 → Major
Two assignments count: Max(Assignment 1, Assignment 2) + Assignment 3
                   Acknowledgment
                     These slides were adapted from the book
SPEECH and LANGUAGE PROCESSING:
An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition
and from the following courses:
Advanced NLP, Graham Neubig http://www.phontron.com/class/anlp2022/
Advanced NLP, Mohit Iyyer https://people.cs.umass.edu/~miyyer/cs685/
NLP with Deep Learning, Chris Manning, http://web.stanford.edu/class/cs224n/
Understanding Large Language Models, Danqi Chen https://www.cs.princeton.edu/courses/archive/fall22/cos597G/
with some modifications from presentations found on the web by
          several scholars, including the following
Credits and Acknowledgment
     Husni Al-Muhtaseb, Heshaam Feili, Khurshid Ahmad, Martha Palmer, James Martin, Björn Gambäck, Julia Hirschberg,
     Staffan Larsson, Jim Martin, Christian Korthals, Elaine Rich, Thomas G. Dietterich, Robert Wilensky, Christof Monz,
     Dan Jurafsky, Devika Subramanian, Feiyu Xu, Bonnie J. Dorr, Sandiway Fong, Duminda Wijesekera, Nizar Habash,
     Lee McCluskey, Jakub Piskorski, Song young in, David J. Kriegman, Massimo Poesio, Paula Matuszek, Rohini Srihari,
     David Goss-Grubbs, Kathleen McKeown, Mark Sanderson, Thomas K Harris, Mary-Angela Papalaskari, Michael J. Ciaraldi,
     John Hutchins, Dick Crouch, David Finkel, Andrew Elks, Alexandros Potamianos, Min-Yen Kan, Marc Davis, Mike Rosner,
     Tracy Kin, Andreas Geyer-Schulz, Latifa Al-Sulaiti, L. Venkata Subramaniam, Franz J. Kurfess, Ray Larson,
     Giorgio Satta, Martin Volk, Tim Finin, Jimmy Lin, Jerry R. Hobbs, Bruce R. Maxim, Nadjet Bouayad, Marti Hearst,
     Christopher Manning, Kathy McCoy, Hinrich Schütze, Jan Hajič, Andrew McCallum, Alexander Gelbukh, Hans Uszkoreit,
     Srinath Srinivasa, Nick Kushmerick, Gina-Anne Levow, Azadeh Maghsoodi, Guitao Gao, Simeon Ntafos, Md Shad Akhtar,
     Mark Craven, Qing Ma, Paolo Pirjanian, Mohit Iyyer, Chia-Hui Chang, Zeynep Altan, Ricardo Vilalta, Graham Neubig,
     Diana Maynard, Edureka, Tom Lenaerts, Chris Manning, James Allan, and many others…
Introduction
Is this a grammatically correct English sentence?
    Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo
Natural Language Processing
• What is a Natural Language?
  Any language that has evolved naturally in
  humans through use and repetition without
  conscious planning or premeditation.
•   What is Natural Language Processing?
    A field of computer science, artificial intelligence and
    computational linguistics concerned with the interactions
    between computers and human (natural) languages.
                                               https://www.javatpoint.com/nlp
              Natural Language Processing
                    • Setup
                        •   Two rooms, two humans, and a computer.
                             • Room 1: One human C
                             • Room 2: One computer (A) and one human (B)
                    • A response generated from room 2 (either by A or B)
                    • C has to figure out the source of the response
                        •   If C is successful   → “A” failed the Turing test
                        •   Otherwise            → “A” passed the Turing test
Alan Turing's 1950 paper "Computing Machinery and Intelligence"
proposed what is now called the Turing test.
Natural Language Processing
       In 1957, Noam Chomsky’s Syntactic Structures
       revolutionized linguistics with 'universal
       grammar', a rule-based system of syntactic
       structures.
Natural Language Processing
     Aravind Krishna Joshi (August 5, 1929 – December 31, 2017)
     was a Professor of Computer and Cognitive Science at the University
     of Pennsylvania.
     Joshi defined the tree-adjoining grammar formalism, which is
     often used in computational linguistics and natural language
     processing.
Natural Language Processing
                  https://en.wikipedia.org/wiki/History_of_natural_language_processing
Why is NLP challenging?
     Ambiguity
   The real reason why NLP is hard
“Rohit Sharma was on fire last night. He totally destroyed the other teams”
       Ambiguity
           ●       Is ambiguity present only in language?
                   ● No, ambiguity is prevalent in every dimension (e.g., images)!
                                  Duck or Rabbit?
shadakhtar:nlp:iiitd:2022:intro
       Ambiguity in language
           ●       I saw a girl with a telescope.   ⇒ Who has the telescope?
           ●       I saw a girl with a bicycle.
           ●       I saw a bus with a telescope.    ⇒ No ambiguity!
           ●       Mary had a little lamb.          ⇒ Did Mary own a little lamb, OR eat some lamb?
           ●       Mujhe aapko mithai khilani padegi.  ⇒ Who’ll gift whom?
                   "I have to gift you some sweets." OR "You have to gift me some sweets."
           ●       Public demand changes
                   (a) Public demand changes, but does anybody listen to them?
                   (b) Public demand changes, and we companies have to adapt to such changes.
           ●       Baby changing room               ⇒ A room for changing a baby's diaper, OR a room that babies go IN and come OUT of "changed"?
           ●       I ate rice with a spoon.    (instrument)
           ●       I ate rice with curd.       (accompaniment)
           ●       I ate rice with Rahul.      (companion)
                   Similar surface structures but different interpretations!
       Ambiguity and Punctuation!
                                     A woman without her man is nothing
                                     ⇒ A woman, without her man, is nothing.
                                     ⇒ A woman: without her, man is nothing.
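Ambiguity makes NLP hard
                    Surface form has multiple interpretations
    • Syntactic Ambiguity
        • Violinist Linked to JAL Crash Blossoms ⇒ which word is the main verb: "Linked" or "Blossoms"?
        • Etymology (the study of the origin of words and the way in which their meanings have changed
          throughout history): ambiguous headlines like this one are now called "crash blossoms", after this headline.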
     Is it a valid sentence? What about this?
                                 Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo
 The word buffalo has three senses:
   1.  Noun: Animal (plural is also buffalo)
   2.  Proper Noun: the city of Buffalo, New York
   3.  Verb: To bully someone
                     Buffalo buffalo, whom other Buffalo buffalo buffalo, buffalo Buffalo buffalo
                     (i.e., Buffalo bison that other Buffalo bison bully also bully Buffalo bison)
  The sentence uses a restrictive clause, so there are no commas, nor is there the word "which," as in, "Buffalo buffalo, which Buffalo buffalo buffalo, buffalo Buffalo buffalo." This clause is also a reduced relative clause, so the word "that," which could appear between the second and third words of the sentence, is omitted.
                                                                              Dmitri Borgmann's Beyond Language: Adventures in Word and Thought, 1967.
Why else is natural language
understanding difficult?
• Non-standard English: Great job @justinbieber! Were SOO PROUD of what youve accomplished! U taught us 2 #neversaynever & you yourself should never give up either♥
• Segmentation issues: the New York-New Haven Railroad ([New York]-[New Haven] Railroad vs. [New York-New] Haven Railroad)
• Idioms/Multiword expressions: dark horse, get cold feet, lose face, throw in the towel, Khana-wana (Echo)
• Neologisms: unfriend, Retweet, bromance
• World knowledge: Mary and Sue are sisters. / Mary and Sue are mothers.
• Tricky entity names: Where is A Bug’s Life playing … / Let It Be was recorded … / … a mutation on the for gene …
       NLP layers
           ●       Understanding the semantics is a non-trivial task.
           ●       It requires performing a series of incremental tasks to achieve this.
           ●       NLP happens in layers
       NLP trinity
           [Figure: the NLP trinity diagram; "DL" is the only recoverable label]
       Word and Token
          ●       Word:
                   ○  Smallest sequence of phonemes of a spoken language that can be uttered in isolation.
          ●       Token:
                   ○  Smallest sequence of graphemes delimited by predefined characters (space,
                      comma, full stop, etc.).
          ●       Word Segmentation/Tokenization:
                   ○  Breaking a string of characters into a sequence of words (tokens).
     Ram, Shyam, and Mohan are playing.                  ⇒    [Ram] [,] [Shyam] [,] [and] [Mohan] [are] [playing] [.]
     21,53,010 COVID cases in India.                     ⇒    [21] [,] [53] [,] [010] [COVID] [cases] [in] [India] [.]
                                                              [21,53,010] [COVID] [cases] [in] [India] [.]                                     ✅
     Check this out…https://www.abc.com                  ⇒    [Check] [this] [out] [.] [.] [.] [https] [:] [/] [/] [www] [.] [abc] [.] [com]
                                                              [Check] [this] [out] [...] [https://www.abc.com]                                 ✅
     #GreatDayEver                                       ⇒     [#] [Great] [Day] [Ever]
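The tokenization rules illustrated above can be approximated with a single regular expression. The following is a minimal sketch, not a production tokenizer: `TOKEN_PATTERN`, `CAMEL_CASE` and `tokenize` are hypothetical names, and the rules (keep digit-grouped numbers, URLs and ellipses whole; split camel-case hashtags) are assumptions chosen to reproduce the ✅ outputs on the slide. Libraries such as NLTK and spaCy ship far more complete tokenizers.

```python
import re

# Hypothetical token rules, tried left to right at each position:
# the first alternative that matches wins.
TOKEN_PATTERN = re.compile(
    r"""
    https?://\S+                 # URLs are kept whole
    | \d+(?:,\d+)*(?:\.\d+)?     # numbers with digit-group commas, e.g. 21,53,010
    | \.\.\. | …                 # an ellipsis (ASCII or Unicode) is one token
    | \#\w+                      # hashtags (split further in tokenize)
    | \w+                        # ordinary words
    | [^\w\s]                    # any other single punctuation mark
    """,
    re.VERBOSE,
)

# Splits a camel-case hashtag body such as "GreatDayEver".
CAMEL_CASE = re.compile(r"[A-Z][a-z]*|[a-z]+|\d+")


def tokenize(text: str) -> list[str]:
    """Tokenize `text` with the hypothetical rules above."""
    tokens = []
    for token in TOKEN_PATTERN.findall(text):
        if token.startswith("#") and len(token) > 1:
            tokens.append("#")
            tokens.extend(CAMEL_CASE.findall(token[1:]))
        else:
            tokens.append(token)
    return tokens


if __name__ == "__main__":
    for sentence in [
        "Ram, Shyam, and Mohan are playing.",
        "21,53,010 COVID cases in India.",
        "Check this out…https://www.abc.com",
        "#GreatDayEver",
    ]:
        print(tokenize(sentence))
```

Because alternatives are tried in order, the URL and number rules must come before the generic word rule; otherwise "https" or "21" would be emitted as ordinary word tokens.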
       Morphology
           ●       Field of linguistics that studies the internal structure of words
                     ○    How they are formed
                     ○    Their relationship to other words in the same language.
           ●       It defines word-formation rules from the root word.
           ●       A morpheme is the smallest linguistic unit that has meaning.
                       ○          E.g.:
                                    ■     “Pre”, “ed”, “ing”, “s”, “es”, etc.
                                            ○      Dogs              ⇒ dog + s (plural)
                                            ○      Going             ⇒ go + ing (present participle)
                                            ○      Independently ⇒ independent + ly (Adverb)
                                                                    ⇒ in + dependent + ly (Negation)
                                                                   ⇒ in + depend + ent + ly (relying)
                                                                   ⇒ in + de + pend + ent + ly
                                                                       Pend: (verb) to remain undecided or unsettled.
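The recursive decomposition of "independently" can be mimicked by a toy affix-stripping segmenter. This is a hypothetical sketch, assuming a three-word root lexicon (`ROOTS`) and small affix lists; it returns the first segmentation it finds and ignores spelling changes (e.g. "happy" + "ness" ⇒ "happiness"), which real analyzers handle with finite-state transducers.

```python
from typing import Optional

# Hypothetical toy lexicon; a real analyzer needs a large lexicon
# plus rules for spelling changes at morpheme boundaries.
ROOTS = {"dog", "go", "pend"}
PREFIXES = ["in", "de"]
SUFFIXES = ["ly", "ent", "ing", "s"]


def segment(word: str) -> Optional[list[str]]:
    """Return one morpheme segmentation of `word`, or None if none exists.

    Recursively peels a known prefix or suffix and backtracks when the
    remainder cannot be reduced to a known root.
    """
    if word in ROOTS:
        return [word]
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) > len(prefix):
            rest = segment(word[len(prefix):])
            if rest is not None:
                return [prefix] + rest
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            rest = segment(word[: -len(suffix)])
            if rest is not None:
                return rest + [suffix]
    return None


if __name__ == "__main__":
    for w in ["dogs", "going", "independently", "banana"]:
        print(w, "=>", segment(w))
```

Every recursive call works on a strictly shorter string, so the search always terminates; a word with no analysis down to a known root (here "banana") yields None.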
       Morphology
           ●       English, Chinese, etc. are commonly referred to as morphologically poor languages.
           ●       Indian languages, Turkish, Hungarian, etc. are termed morphologically rich languages.
       Parts-of-Speech (POS)
         ●    Grammatical class of the word.
                                  He     ate    an   apple     .
                                  PRP    VBD    DT    NN       .
       ●        PoS disambiguation
                 ○   A word can belong to different grammatical classes.
                                  He     went   to    the    park      in    a    car     .
                                  PRP    VBD    TO    DT      NN       IN    DT   NN      .
                                  They   went   to   park     the      car   in   the    shed         .
                                  PRP    VBD    TO    VB      DT       NN    IN   DT      NN          .
       Tags: PRP: Personal Pronoun; VBD: Verb, Past tense; VB: Verb, base form; DT: Determiner;
             NN: Noun, Singular or Mass; TO: to; IN: Preposition
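A minimal sketch of PoS disambiguation for the two "park" sentences: look each word up in a hypothetical toy lexicon and, when a word has several possible tags, apply one hand-written context rule (a base-form verb follows "to"). Statistical taggers such as the HMM tagger covered later in the course learn such preferences from tagged corpora instead.

```python
# Hypothetical toy lexicon: each word maps to the set of tags it can take.
LEXICON = {
    "he": {"PRP"}, "they": {"PRP"},
    "ate": {"VBD"}, "went": {"VBD"},
    "a": {"DT"}, "an": {"DT"}, "the": {"DT"},
    "to": {"TO"}, "in": {"IN"},
    "apple": {"NN"}, "car": {"NN"}, "shed": {"NN"},
    "park": {"NN", "VB"},  # ambiguous: noun or base-form verb
    ".": {"."},
}


def tag(tokens: list[str]) -> list[str]:
    """Assign one PoS tag per token: lexicon lookup plus one context rule."""
    tags: list[str] = []
    for token in tokens:
        candidates = LEXICON[token.lower()]
        previous = tags[-1] if tags else None
        if len(candidates) == 1:
            (chosen,) = candidates
        elif "VB" in candidates and previous == "TO":
            chosen = "VB"  # rule: a base-form verb follows infinitival "to"
        else:
            # default: prefer the noun reading, else any candidate deterministically
            chosen = "NN" if "NN" in candidates else sorted(candidates)[0]
        tags.append(chosen)
    return tags


if __name__ == "__main__":
    for s in ["He went to the park in a car .", "They went to park the car in the shed ."]:
        print(list(zip(s.split(), tag(s.split()))))
```

The rule looks only at the previous tag, which is enough here but not in general; that limitation is what motivates probabilistic sequence models.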
       Chunking
           ●       Identification of non-recursive phrases (noun, verb, etc.)
                       ○          He went to the Indian city Mumbai. ⇒
                                  [NP He] [VP went] [PP to] [NP the Indian city Mumbai]
                       ○          Mumbai green lights women icons on traffic signals earns global praise. ⇒
                                  [NP Mumbai green lights women icons] [PP on] [NP traffic signals] [VP earns] [NP global praise]
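Given PoS tags, a very simple chunker can group consecutive tokens whose tags belong to the same phrase type. The sketch below assumes hypothetical tag assignments and a hypothetical `CHUNK_OF_TAG` mapping (which, following the slide, labels "to" and "on" as PP). Because it merges any adjacent tokens of the same type, it cannot separate two adjacent noun phrases; real chunkers (regular-expression grammars over tags, or learned sequence labellers) can.

```python
from itertools import groupby

# Hypothetical mapping from PoS tag to the chunk type it belongs to.
CHUNK_OF_TAG = {
    "PRP": "NP", "DT": "NP", "JJ": "NP", "NN": "NP", "NNS": "NP", "NNP": "NP",
    "VB": "VP", "VBD": "VP", "VBZ": "VP",
    "TO": "PP", "IN": "PP",
}


def chunk(tagged: list[tuple[str, str]]) -> list[tuple[str, list[str]]]:
    """Group consecutive tokens whose tags map to the same chunk type.

    Tokens with unmapped tags (e.g. punctuation) are dropped.
    """
    labelled = [(CHUNK_OF_TAG[t], w) for w, t in tagged if t in CHUNK_OF_TAG]
    return [
        (label, [word for _, word in group])
        for label, group in groupby(labelled, key=lambda pair: pair[0])
    ]


if __name__ == "__main__":
    sentence = [("He", "PRP"), ("went", "VBD"), ("to", "TO"), ("the", "DT"),
                ("Indian", "JJ"), ("city", "NN"), ("Mumbai", "NNP"), (".", ".")]
    print(" ".join(f"[{label} {' '.join(words)}]" for label, words in chunk(sentence)))
```

With the tags above it prints the bracketing from the slide, [NP He] [VP went] [PP to] [NP the Indian city Mumbai].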
       Syntax Processing
          ●      Validate the grammatical structure of the sentence.
          ●      Let vocabulary = [the, mango, he, eats, ...]
                    ○      He eats a mango. ⇒ ✅
                    ○      He mango eats a. ⇒ ❌
           ●        The sequence of words must follow the grammatical
                    structure of the language to form a valid sentence.
                     ○   Construct a parse tree.
                   Parse Tree: (S (NP (PRP He)) (VP (VBZ eats) (NP (DT a) (NN mango))) (. .))
       Syntax Processing
           ●        Every language has a grammar G = <V, T, P, S>, where V is the set of non-terminals,
                    T the set of terminals (words), P the set of productions, and S the start symbol.
                   Productions (P) or rules:
                         S      →              NP VP .
                         NP     →              PRP | NN | DT NN
                         VP     →              VBZ NP
                         PRP →                 He
                         VBZ →                 eats
                         DT     →              a
                         NN     →              mango
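Checking whether a sentence is valid amounts to searching for a parse tree rooted at S. Below is a minimal top-down, backtracking sketch for the toy grammar; it assumes NP → DT NN (so the tree for "a mango" matches the parse tree on the slide) and treats the words and "." as terminals. The function names are hypothetical, and this style of parser fails (infinite recursion) on left-recursive rules such as NP → NP PP, one reason chart parsers such as CKY are used in practice.

```python
from typing import Iterator, Optional, Union

Tree = Union[str, tuple]

# Toy grammar from the slide; symbols without productions are terminals.
GRAMMAR = {
    "S": [["NP", "VP", "."]],
    "NP": [["PRP"], ["NN"], ["DT", "NN"]],
    "VP": [["VBZ", "NP"]],
    "PRP": [["He"]],
    "VBZ": [["eats"]],
    "DT": [["a"]],
    "NN": [["mango"]],
}


def expand(symbol: str, tokens: list[str], pos: int) -> Iterator[tuple[Tree, int]]:
    """Yield (tree, end) for every way `symbol` derives tokens[pos:end]."""
    if symbol not in GRAMMAR:  # terminal: must equal the next token
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    for rhs in GRAMMAR[symbol]:
        for children, end in expand_all(rhs, tokens, pos):
            yield (symbol, *children), end


def expand_all(
    symbols: list[str], tokens: list[str], pos: int
) -> Iterator[tuple[list[Tree], int]]:
    """Yield (subtrees, end) for every way `symbols`, in order, derive tokens[pos:end]."""
    if not symbols:
        yield [], pos
        return
    for first, middle in expand(symbols[0], tokens, pos):
        for rest, end in expand_all(symbols[1:], tokens, middle):
            yield [first, *rest], end


def parse(tokens: list[str]) -> Optional[Tree]:
    """Return the first parse that covers all tokens, or None if the sentence is invalid."""
    for tree, end in expand("S", tokens, 0):
        if end == len(tokens):
            return tree
    return None


if __name__ == "__main__":
    print(parse("He eats a mango .".split()))
    print(parse("He mango eats a .".split()))
```

The first call returns the nested-tuple form of the slide's parse tree; the second returns None, matching the ❌ above.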
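                                                                Syntactic Ambiguity
       I saw a girl with a telescope.
       ●  PP attached to the verb phrase (I used the telescope to see the girl):
          (S (NP (PRP I)) (VP (VBD saw) (NP (DT a) (NN girl)) (PP (IN with) (NP (DT a) (NN telescope)))) (. .))
       ●  PP attached to the noun phrase (the girl has the telescope):
          (S (NP (PRP I)) (VP (VBD saw) (NP (DT a) (NN girl) (PP (IN with) (NP (DT a) (NN telescope))))) (. .))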
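Syntactic ambiguity shows up as more than one parse tree for the same word sequence. The sketch below enumerates every parse, memoizing results per (symbol, start, end) span; the grammar is a hypothetical extension of the toy grammar on the Syntax Processing slides, with PP attachment allowed inside both NP and VP and "saw" tagged VBD. It assumes the grammar has no empty productions, so every symbol covers at least one token.

```python
from functools import lru_cache

# Hypothetical toy grammar: PP may attach to the NP or to the VP.
GRAMMAR = {
    "S": [("NP", "VP", ".")],
    "NP": [("PRP",), ("DT", "NN"), ("DT", "NN", "PP")],
    "VP": [("VBD", "NP"), ("VBD", "NP", "PP")],
    "PP": [("IN", "NP")],
    "PRP": [("I",)],
    "VBD": [("saw",)],
    "DT": [("a",)],
    "NN": [("girl",), ("telescope",)],
    "IN": [("with",)],
}


def all_parses(words: list[str]) -> list[tuple]:
    """Return every parse tree (as nested tuples) for the sentence rooted at S."""
    tokens = tuple(words)

    @lru_cache(maxsize=None)
    def trees(symbol: str, i: int, j: int) -> list:
        """All trees for `symbol` spanning tokens[i:j]."""
        if symbol not in GRAMMAR:  # terminal
            return [symbol] if j == i + 1 and tokens[i] == symbol else []
        result = []
        for rhs in GRAMMAR[symbol]:
            for children in sequences(rhs, i, j):
                result.append((symbol, *children))
        return result

    @lru_cache(maxsize=None)
    def sequences(symbols: tuple, i: int, j: int) -> list:
        """All tuples of subtrees for `symbols` jointly spanning tokens[i:j]."""
        if not symbols:
            return [()] if i == j else []
        result = []
        for k in range(i + 1, j + 1):  # the first symbol covers tokens[i:k]
            for first in trees(symbols[0], i, k):
                for rest in sequences(symbols[1:], k, j):
                    result.append((first, *rest))
        return result

    return trees("S", 0, len(tokens))


if __name__ == "__main__":
    for tree in all_parses("I saw a girl with a telescope .".split()):
        print(tree)
```

For "I saw a girl with a telescope ." it returns exactly two trees, one per attachment; for "I saw a girl ." it returns one.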
       Semantic Role Labelling (SRL)
           ●       Identify the semantic role of each argument (noun phrase) w.r.t. the predicate (main
                   verb) of the sentence
                   [John]Agent drove [Mary]Patient from [Delhi]Source to [Pune]Destination in [his car]Instrument
                   [Ram]Agent hit [Shyam]Patient with [a hockey stick]Instrument [yesterday]Time
       Textual Entailment
           ●       Determine whether one natural language sentence entails (implies) another under an
                   ordinary interpretation
                    (Ram hit Shyam with a hockey stick yesterday. → Shyam got hurt)         ⇒ Positive TE (entailment)
                    (Ram hit Shyam with a hockey stick yesterday. → Shyam did not get hurt) ⇒ Negative TE (contradiction)
                    (Ram hit Shyam with a hockey stick yesterday. → Shyam got hospitalized) ⇒ Non-TE (neutral)
       Pragmatics
           ●        Pragmatics considers [Thomas, 1995]:
                     ○  the negotiation of meaning between speaker and listener.
                     ○  the context of the utterance.
                     ○  the intention of the user.
                       ○          Context/World knowledge: An employee coming late to the office.
                                   ■   Utterance: Do you know what time it is?
                                   ■   Literal meaning: Are you aware of the current time? (Response: Yes, it is 12:30 PM)
                                   ■   Pragmatic meaning: Why are you coming so late? (Response: Reason for being late.)
                       ○          Intention:
                                    ■    Utterance: Can you pass the water bottle?
                                    ■    Literal meaning: Are you able to pass the water bottle? (Response: Yes, I can.)
                                    ■    Pragmatic meaning: Pass me the water bottle. (Response: Hands over the water bottle.)
       Discourse
           ●       Processing of a sequence of sentences.
             Mother said to John: Go to school. It is open today. Are you planning to bunk? Father will be very angry.
                       ○          Discourse processing helps answer these questions:
                                   ■   What is open?
                                   ■   Bunk what?
                                   ■   Why will the father be angry?
       Coreference Resolution
          ●       Two referring expressions used to refer to the same entity are said to corefer.
          ●       Determine which phrases in a document corefer.
                    John showed Bob his Toyota yesterday. It’s similar to the one I bought five years ago.
                    That was really nice, but he likes this one even better.
       Information Extraction
           ●        Extraction of relevant pieces of information
           ●       Named Entity Recognition (NER):
                       ○          Identify names (Proper nouns)
                                    ■      [India]Location born [Sundar Pichai]Person is the CEO of [Google]Organization and its parent company [Alphabet]Organization
           ●        Relation extraction:
                        ○         Relation among entities
                                   ■    CEO(Sundar Pichai, Google), CEO(Sundar Pichai, Alphabet), Born-at(Sundar Pichai,
                                        India), ParentOrg(Alphabet, Google)
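One of the simplest NER baselines is dictionary (gazetteer) lookup with greedy longest match. In this sketch the gazetteer entries and the `find_entities` name are hypothetical, copied from the example sentence; unlike learned NER models, a lookup table cannot recognize names it has never seen, nor use context to resolve names that could refer to different entity types.

```python
# Hypothetical gazetteer copied from the example: token tuple -> entity type.
GAZETTEER = {
    ("India",): "Location",
    ("Sundar", "Pichai"): "Person",
    ("Google",): "Organization",
    ("Alphabet",): "Organization",
}
MAX_LEN = max(len(name) for name in GAZETTEER)


def find_entities(tokens: list[str]) -> list[tuple[str, str]]:
    """Greedy, left-to-right, longest-match lookup of token spans in GAZETTEER."""
    entities = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span starting at position i first.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in GAZETTEER:
                entities.append((" ".join(span), GAZETTEER[span]))
                i += n
                break
        else:  # no gazetteer entry starts at position i
            i += 1
    return entities


if __name__ == "__main__":
    text = "India born Sundar Pichai is the CEO of Google and its parent company Alphabet"
    print(find_entities(text.split()))
```

The extracted entities are exactly the arguments a relation extractor would then connect, e.g. CEO(Sundar Pichai, Google).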
       Word Sense Disambiguation (WSD)
         ●     What does a word mean?
                       ○      The fisherman went to the bank. ⇒ Financial bank or river bank?
                        ○         The fisherman went to the bank to withdraw money.
                        ○         The fisherman went to the bank to fish.
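A classic knowledge-based approach to WSD is the simplified Lesk algorithm: choose the sense whose dictionary gloss shares the most content words with the context. The glosses, stopword list and function names below are hypothetical stand-ins for a real sense inventory such as WordNet.

```python
import re
from typing import Optional

# Hypothetical sense inventory with hand-written glosses.
GLOSSES = {
    "bank (financial institution)": "an institution where people deposit or withdraw money",
    "bank (river bank)": "the sloping land beside a river where people fish or walk",
}
STOPWORDS = {"a", "an", "the", "to", "of", "or", "and", "where", "people", "went"}


def content_words(text: str) -> set[str]:
    """Lowercased alphabetic words of `text`, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(sentence: str) -> Optional[str]:
    """Pick the sense whose gloss overlaps most with the sentence; None on a tie."""
    context = content_words(sentence)
    scores = {sense: len(context & content_words(gloss)) for sense, gloss in GLOSSES.items()}
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    return winners[0] if len(winners) == 1 else None


if __name__ == "__main__":
    for s in ["The fisherman went to the bank.",
              "The fisherman went to the bank to withdraw money.",
              "The fisherman went to the bank to fish."]:
        print(s, "=>", simplified_lesk(s))
```

With no disambiguating context ("The fisherman went to the bank.") both senses score zero and the sketch returns None; "withdraw money" and "fish" each tip the overlap towards one sense.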
       Sentiment Analysis
          ●      Extract the polarity (orientation) of subjective text
                     ○      Really superb pillow. Love to sleep on it.. very comfortable...          ⇒ Positive
                     ○      It's a mass Chinese product. Too expensive. Thin and useless             ⇒ Negative
                     ○      My neighbours are home and it’s good to wake up at 3am in the morning.   ⇒ Negative?
                     ○      Campus has deadly snakes.                                                ⇒ Negative
                     ○      Shane Warne is a deadly spinner.                                         ⇒ Positive?
                     ○      The food was cheap.                                                      ⇒ Positive?
                     ○      Not to mention the cheap service I got at the restaurant.                ⇒ Negative
                     ○      Movie was 4 hrs long.                                                    ⇒ Neutral?
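A naive lexicon-based scorer shows why the examples marked "?" are hard. This sketch assumes a tiny hand-made polarity lexicon (hypothetical; real lexicons are far larger) and simply sums word polarities, with no handling of context, domain, or sarcasm.

```python
import re

# Hypothetical word -> polarity lexicon.
LEXICON = {"superb": 1, "love": 1, "comfortable": 1, "good": 1,
           "expensive": -1, "useless": -1, "deadly": -1}


def lexicon_polarity(text):
    """Sum word polarities; the sign of the total decides the label."""
    score = sum(LEXICON.get(w, 0) for w in re.findall(r"[a-z]+", text.lower()))
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


examples = [
    "Really superb pillow. Love to sleep on it.. very comfortable...",
    "It's a mass Chinese product. Too expensive. Thin and useless",
    "My neighbours are home and it’s good to wake up at 3am in the morning.",
    "Campus has deadly snakes.",
    "Shane Warne is a deadly spinner.",
    "Movie was 4 hrs long.",
]
for text in examples:
    print(f"{lexicon_polarity(text):8} {text}")
```

The scorer handles the first two reviews, but it labels the sarcastic 3am sentence Positive and the cricket sentence Negative: a fixed word polarity cannot capture sarcasm or the domain-dependent sense of "deadly".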
       Machine Translation
          ●     Given a sentence in the source language L1, convert it to the target language L2 such that its meaning is preserved
                (adequacy) and the output reads naturally in L2 (fluency).
                                                                                                       Source: Google Translate
       Summarization
         ●       Given a document, produce a shorter text that preserves its key information.
             ●      Document
                        ○         Sen. Barack Obama sealed the Democratic presidential nomination last night after a grueling
                                  and history-making campaign against Sen. Hillary Rodham Clinton that will make him the first
                                  African American to head a major-party ticket.
           ●        Summary
                       ○          Barack Obama is the Democratic presidential candidate.
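The summary above is abstractive (it is not a sentence copied from the document). The simplest contrasting approach is extractive summarization, sketched below under assumptions: sentences are scored by the average corpus frequency of their content words and the top-k are returned in document order. The stopword list, the function name `summarize`, and the three-sentence document are all hypothetical.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "in", "of", "to", "and", "will", "was", "is"}


def tokenize(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(document, k=1):
    """Frequency-based extractive summary: keep the k highest-scoring sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    freq = Counter(tokenize(document))

    def score(sentence):
        tokens = tokenize(sentence)
        # Average rather than sum, so long sentences are not favoured automatically.
        return sum(freq[w] for w in tokens) / max(len(tokens), 1)

    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(top))


doc = "Obama won the nomination. Obama will face McCain in November. The weather was sunny."
print(summarize(doc, k=1))
```

Averaging is a design choice: summing frequencies instead would tend to select the longest sentence regardless of how central it is.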
       Question Answering
           ●       Answer natural language questions using information available in a given repository (e.g., a document collection or knowledge base).
           ●        Factoid Questions
                        ○         Question: Who is the author of the book Wings of Fire?
                        ○         Answer: A. P. J. Abdul Kalam
           ●        List Questions
                        ○         Question: What are the islands in India?
                        ○         Answer: Andaman Islands, Nicobar Islands, Labyrinth Islands, Barren Island
           ●        Descriptive Questions
                        ○         Question: What is the greenhouse effect?
                        ○         Answer: The term (an analogy to a greenhouse) for the ability of gases in the atmosphere to absorb
                                  heat radiated from the Earth’s surface.
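A first step in many QA pipelines is retrieving the passage most likely to contain the answer. A minimal sketch, assuming a tiny hypothetical three-sentence repository and plain word-overlap scoring; it returns the supporting sentence only, and pulling the exact answer span (e.g. "A. P. J. Abdul Kalam") out of it would be a further step not shown here.

```python
import re

STOPWORDS = {"who", "what", "is", "the", "of", "a", "an", "in", "to"}

# Hypothetical repository; real systems search large corpora or knowledge bases.
REPOSITORY = [
    "Wings of Fire is the autobiography of A. P. J. Abdul Kalam.",
    "The greenhouse effect warms the surface of the Earth.",
    "Barren Island is a volcanic island in the Andaman Sea.",
]


def content_words(text):
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def retrieve_answer_sentence(question):
    """Return the repository sentence sharing the most content words with the question."""
    q = content_words(question)
    return max(REPOSITORY, key=lambda s: len(q & content_words(s)))


print(retrieve_answer_sentence("Who is the author of the book Wings of Fire?"))
print(retrieve_answer_sentence("What is the greenhouse effect?"))
```

Note that the factoid question matches on "Wings" and "Fire" even though the word "author" never appears in the passage; bridging such vocabulary gaps ("author" vs. "autobiography") is one reason real QA systems go beyond exact word overlap.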
       Dialog System and Chatbot
           ●       A system that converses with users in natural language; a dialog is a conversation between two or more parties.
       Hate Speech
        •     Any post that targets a specific individual or group of people based on their ethnicity, religious beliefs,
              geographical belonging, race, etc., with the malicious intention of spreading hate or encouraging
              violence.
                 •      #BuildThatWall #BuildTheDamnWall I’m sorry my Lord #Jesus but people are just deaf down here
                 •      Women ... Can’t live with them...Can’t shoot them
        •     Related terms
                 •      Insult, Abuse, Offensive, Provocative
       Fake News
        •     A piece of information or a claim that can be verified to be false.
        •     Often, posts created intentionally to spread malicious and false narratives
                       ◦          Exploits chaos and misinformation to quickly gain political, financial, or regional advantage
     Language Technology
        Mostly solved
          ●     Spam detection
                     ○      Let’s go to Agra!            ✓
                     ○      Buy V1AGRA …                 ✗
          ●     Part-of-speech (POS) tagging
                     ○      Colorless/ADJ green/ADJ ideas/NOUN sleep/VERB furiously/ADV.
          ●     Named entity recognition (NER)
                     ○      [Einstein]PERSON met with [UN]ORG officials in [Princeton]LOC
        Still really hard
          ●     Question answering (QA)
                     ○      Q. How effective is ibuprofen in reducing fever in patients with acute febrile illness?
          ●     Paraphrase
                     ○      XYZ acquired ABC yesterday ⇔ ABC has been taken over by XYZ
          ●     Summarization
                     ○      The Dow Jones is up. The S&P500 jumped. Housing prices rose. ⇒ Economy is good
          ●     Dialog
                     ○      Where is Citizen Kane playing in SF? ⇒ Castro Theatre at 7:30. Do you want a ticket?
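Spam detection is listed as mostly solved partly because simple statistical classifiers already work well on it. A minimal sketch of a multinomial Naive Bayes filter with add-one (Laplace) smoothing, assuming a four-message hypothetical training set; the class name `NaiveBayesSpamFilter` is illustrative, not a library API.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes with add-one smoothing, scored in log space."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.priors = {c: labels.count(c) / len(labels) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, doc):
        def log_score(c):
            total = sum(self.counts[c].values())
            # log P(c) + sum of log P(w | c), smoothed so unseen words do not zero out a class.
            return math.log(self.priors[c]) + sum(
                math.log((self.counts[c][w] + 1) / (total + len(self.vocab)))
                for w in tokenize(doc))
        return max(self.classes, key=log_score)


# Hypothetical toy training data.
train_docs = ["buy viagra now", "buy cheap pills now", "let us go to agra", "meeting at agra tomorrow"]
train_labels = ["spam", "spam", "ham", "ham"]
spam_filter = NaiveBayesSpamFilter().fit(train_docs, train_labels)
print(spam_filter.predict("Buy V1AGRA"))
print(spam_filter.predict("Let's go to Agra!"))
```

The obfuscated "V1AGRA" is an unseen token, so here the decision rests on "buy"; spammers exploit exactly this gap, which is why practical filters add character-level features.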
Why Study NLP?
 • To get a job in industry
    • Many current job listings are for computational linguistics (CL) and NLP roles, e.g., at
        • Google
        • Amazon
        • Facebook (Meta)
        • Flipkart, etc.
 • To get a job in academia
    • As a computational linguist
    • Computational literacy and an understanding of computational methods are expected to become critical
      in the next decade.