Course Code:   CSDO7011
Course Title:  Natural Language Processing
Credit:        3
Prerequisites: Artificial Intelligence and Machine Learning; basic knowledge of Python
Course Objectives:
1 To understand natural language processing and to learn how to apply basic algorithms in this field
2 To get acquainted with the basic concepts and algorithmic description of the main language levels:
   morphology, syntax, semantics, and pragmatics
3 To design and implement various language models and POS tagging techniques
4 To learn about and design NLP applications such as Information Extraction and Question Answering
5 To design and implement applications based on natural language processing
Course Outcomes:
1 To have a broad understanding of the field of natural language processing
2 To design language models for word-level analysis in text processing
3 To design various POS tagging techniques
4 To design, implement and test algorithms for semantic analysis
5 To develop a basic understanding of pragmatics and to formulate discourse segmentation and
   anaphora resolution
6 To apply NLP techniques to design real-world NLP applications
Module   Content                                                             Hrs
1        Introduction                                                          4
         1.1  Origin & History of NLP, The Need for NLP, Generic NLP System,
              Levels of NLP, Knowledge in Language Processing, Ambiguity in
              Natural Language, Challenges of NLP, Applications of NLP.
2        Word Level Analysis                                                   8
         2.1  Tokenization, Stemming, Segmentation, Lemmatization, Edit Distance,
              Collocations, Finite Automata, Finite State Transducers (FST),
              Porter Stemmer, Morphological Analysis, Derivational and
              Inflectional Morphology, Regular Expressions and their Types.
         2.2  N-Grams, Unigram/Bigram Language Models, Corpora, Computing the
              Probability of a Word Sequence, Training and Testing.
3        Syntax Analysis                                                       8
         3.1  Part-of-Speech (POS) Tagging: Open and Closed Word Classes, Tag Set
              for English (Penn Treebank), Rule-Based POS Tagging,
              Transformation-Based Tagging, Stochastic POS Tagging, and Issues:
              Multiple Tags and Multiple Words, Unknown Words.
         3.2  Introduction to CFG, Hidden Markov Model (HMM), Maximum Entropy,
              and Conditional Random Field (CRF).
4        Semantic Analysis                                                     8
         4.1  Introduction, Meaning Representation; Lexical Semantics; Corpus
              Study; Study of Language Dictionaries such as WordNet and BabelNet;
              Relations among Lexemes and their Senses: Homonymy, Polysemy,
              Synonymy, Hyponymy; Semantic Ambiguity.
         4.2  Word Sense Disambiguation (WSD); Knowledge-Based Approach (Lesk's
              Algorithm); Supervised Methods (Naïve Bayes, Decision List);
              Introduction to Semi-Supervised Methods (Yarowsky); Unsupervised
              Methods (HyperLex).
5        Pragmatic and Discourse Processing                                    6
         5.1  Discourse: Reference Resolution, Reference Phenomena, Syntactic and
              Semantic Constraints on Coherence; Anaphora Resolution using the
              Hobbs and Centering Algorithms.
6        Applications (preferably for Indian regional languages)               5
         6.1  Machine Translation, Information Retrieval, Question Answering
              Systems, Categorization, Summarization, Sentiment Analysis, Named
              Entity Recognition.
         6.2  Linguistic Modeling: Neurolinguistic Models, Psycholinguistic
              Models, Functional Models of Language, Research Linguistic Models,
              Common Features of Modern Models of Language.
Textbooks:
 1   Daniel Jurafsky and James H. Martin, Speech and Language Processing, Second Edition,
     Prentice Hall, 2008.
 2   Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language
     Processing, MIT Press, 1999.
References:
 1   Tanveer Siddiqui and U. S. Tiwary, Natural Language Processing and Information Retrieval,
     Oxford University Press, 2008.
 2   Daniel M. Bikel and Imed Zitouni, Multilingual Natural Language Processing Applications:
     From Theory to Practice, IBM Press, 2013.
 3   Nitin Indurkhya and Fred J. Damerau, Handbook of Natural Language Processing, Second
     Edition, Chapman and Hall/CRC Press, 2010.
Assessment:
Internal Assessment:
Assessment consists of two class tests of 20 marks each. The first class test is to be conducted when
approximately 40% of the syllabus has been completed, and the second when a further 40% has been
completed. Each test shall be one hour long.
End Semester Theory Examination:
1       The question paper will comprise six questions in total.
2       All questions carry equal marks.
3       Questions will be mixed in nature (for example, if Q.2 has part (a) from Module 3, then
        part (b) will be from any module other than Module 3).
4       Only four questions need to be solved.
5       In the question paper, the weightage of each module will be proportional to the number of
        lecture hours assigned to it in the syllabus.
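As an illustration of the proportional-weightage rule, the sketch below computes each module's approximate share of the end-semester paper from the lecture hours in the module table (39 hours in total). It is a hypothetical helper, not part of the official scheme: the name `module_weightage` is invented for illustration, and shares are expressed as percentages because the total marks for the end-semester paper are not stated here.

```python
# Hypothetical helper: lecture hours per module, taken from the module table.
MODULE_HOURS = {
    1: 4,  # Introduction
    2: 8,  # Word Level Analysis
    3: 8,  # Syntax Analysis
    4: 8,  # Semantic Analysis
    5: 6,  # Pragmatic and Discourse Processing
    6: 5,  # Applications
}


def module_weightage(hours: dict[int, int]) -> dict[int, float]:
    """Return each module's share of the paper in percent, proportional to its hours."""
    total = sum(hours.values())  # 39 for the table above
    return {module: 100 * h / total for module, h in hours.items()}


if __name__ == "__main__":
    for module, pct in module_weightage(MODULE_HOURS).items():
        print(f"Module {module}: {pct:.1f}%")
```

Under this assumption, Module 1 carries about 10.3% of the paper, Modules 2 to 4 about 20.5% each, Module 5 about 15.4%, and Module 6 about 12.8%.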
Useful Links
1   https://onlinecourses.nptel.ac.in/noc21_cs102/preview
2   https://onlinecourses.nptel.ac.in/noc20_cs87/preview
3   https://nptel.ac.in/courses/106105158