CS8084 NATURAL LANGUAGE PROCESSING Syllabus 2017 Regulation

CS8084                       NATURAL LANGUAGE PROCESSING                             L T P C
                                                                                     3 0 0 3


OBJECTIVES:
  • To learn the fundamentals of natural language processing
  • To understand the use of CFG and PCFG in NLP
  • To understand the role of semantics of sentences and pragmatics
  • To apply the NLP techniques to IR applications

UNIT I INTRODUCTION                                                   9

Origins and challenges of NLP – Language Modeling: Grammar-based LM, Statistical LM – Regular Expressions, Finite-State Automata – English Morphology, Transducers for lexicon and rules, Tokenization, Detecting and Correcting Spelling Errors, Minimum Edit Distance
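
As an illustration of the Minimum Edit Distance topic in this unit, the following is a minimal sketch of the standard dynamic-programming formulation. The function name `min_edit_distance` and the `sub_cost` parameter are assumptions made for this example; insertions and deletions cost 1, and the substitution cost defaults to 1 (plain Levenshtein distance), although some textbooks charge 2 for a substitution.

```python
def min_edit_distance(source: str, target: str, sub_cost: int = 1) -> int:
    """Return the minimum edit distance between source and target.

    Insertions and deletions cost 1 each; a substitution costs sub_cost
    (assumed default of 1; pass 2 for the alternative convention).
    """
    n, m = len(source), len(target)
    # dist[i][j] holds the cheapest way to turn source[:i] into target[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete all i characters of the source prefix
    for j in range(1, m + 1):
        dist[0][j] = j  # insert all j characters of the target prefix
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                              # deletion
                dist[i][j - 1] + 1,                              # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # match or substitution
            )
    return dist[n][m]
```

For example, `min_edit_distance("intention", "execution")` evaluates to 5 under unit substitution cost, and to 8 when called with `sub_cost=2`.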

UNIT II WORD LEVEL ANALYSIS                                   9

Unsmoothed N-grams, Evaluating N-grams, Smoothing, Interpolation and Backoff – Word Classes, Part-of-Speech Tagging, Rule-based, Stochastic and Transformation-based tagging, Issues in PoS tagging – Hidden Markov and Maximum Entropy models.
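
As an illustration of the unsmoothed N-gram and smoothing topics in this unit, the following is a minimal sketch of a bigram model with optional add-one (Laplace) smoothing. The function names `train_bigram_counts` and `bigram_prob`, the whitespace tokenization, and the choice to count the boundary markers `<s>` and `</s>` in the vocabulary are simplifying assumptions made for this example.

```python
from collections import Counter


def train_bigram_counts(sentences):
    """Count unigrams and bigrams over whitespace-tokenized, lowercased sentences."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        # Pad with sentence-boundary markers (an assumption of this sketch).
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams, smooth=True):
    """Estimate P(word | prev), with add-one smoothing when smooth is True."""
    if smooth:
        vocab_size = len(unigrams)  # number of distinct token types seen
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    if unigrams[prev] == 0:
        return 0.0  # unseen context: the unsmoothed estimate is undefined, report 0
    return bigrams[(prev, word)] / unigrams[prev]
```

On the toy corpus `["I am Sam", "Sam I am", "I do not like green eggs and ham"]`, the unsmoothed estimate of P(am | i) is 2/3, while the add-one estimate is (2 + 1) / (3 + 12) = 0.2. An unseen bigram such as (i, ham) gets probability 0 without smoothing and 1/15 with it.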

UNIT III SYNTACTIC ANALYSIS                                      9

Context-Free Grammars, Grammar rules for English, Treebanks, Normal Forms for grammar – Dependency Grammar – Syntactic Parsing, Ambiguity, Dynamic Programming parsing – Shallow parsing – Probabilistic CFG, Probabilistic CYK, Probabilistic Lexicalized CFGs – Feature structures, Unification of feature structures.


UNIT IV SEMANTICS AND PRAGMATICS                                10

Requirements for representation, First-Order Logic, Description Logics – Syntax-Driven Semantic analysis, Semantic attachments – Word Senses, Relations between Senses, Thematic Roles, Selectional restrictions – Word Sense Disambiguation, WSD using Supervised, Dictionary & Thesaurus, Bootstrapping methods – Word Similarity using Thesaurus and Distributional methods.

UNIT V DISCOURSE ANALYSIS AND LEXICAL RESOURCES                                                                   8

Discourse segmentation, Coherence – Reference Phenomena, Anaphora Resolution using Hobbs and Centering Algorithm – Coreference Resolution – Resources: Porter Stemmer, Lemmatizer, Penn Treebank, Brill’s Tagger, WordNet, PropBank, FrameNet, Brown Corpus, British National Corpus (BNC).

                                                                                                      TOTAL: 45 PERIODS


Upon completion of the course, the students will be able to:

  1. Tag a given text with basic language features
  2. Design an innovative application using NLP components
  3. Implement a rule-based system to tackle the morphology/syntax of a language
  4. Design a tag set to be used for statistical processing in real-time applications
  5. Compare and contrast the use of different statistical approaches for different types of NLP applications


