MODULE 5 NLP
1. Define Machine Translation. List and explain its types and applications.
Machine Translation (MT) is a subfield of Natural Language Processing (NLP) that deals with the
automatic translation of text or speech from one natural language to another using computer
systems, without human intervention.
Types of Machine Translation:
    1. Rule-Based Machine Translation (RBMT):
            o   Works using grammar rules and dictionaries.
            o   Example: English–Hindi dictionary-based translation (see the sketch after this list).
            o   Advantage: Grammatically accurate for simple sentences.
            o   Limitation: Needs a large number of rules, difficult to maintain.
    2. Statistical Machine Translation (SMT):
            o   Works using probability and statistics from bilingual text data.
            o   Example: Google Translate (earlier versions).
            o   Advantage: Works well with large data.
            o   Limitation: Struggles with rare words and context.
    3. Example-Based Machine Translation (EBMT):
            o   Works using previously translated examples.
            o   Translates by matching new sentences with similar past examples.
            o   Advantage: Good for repetitive content.
            o   Limitation: Limited if example database is small.
    4. Neural Machine Translation (NMT):
            o   Works using deep learning and neural networks.
            o   Example: Modern Google Translate, Microsoft Translator.
            o   Advantage: Produces fluent, natural translations.
            o   Limitation: Requires large computational resources.
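Before moving to applications, here is a minimal sketch of the dictionary-lookup idea behind RBMT (type 1 above). The tiny English–Hindi word list and the single reordering rule are illustrative assumptions, not a real rule base:

    # Minimal sketch of rule-based MT: dictionary lookup plus one reordering rule.
    # The lexicon and the hard-coded SVO -> SOV rule are illustrative only.
    LEXICON = {"i": "main", "eat": "khata hoon", "apples": "seb"}

    def rbmt_translate(sentence: str) -> str:
        words = sentence.lower().split()
        # Rule: Hindi is verb-final (SOV), so move the verb after the object.
        if words == ["i", "eat", "apples"]:
            words = ["i", "apples", "eat"]
        return " ".join(LEXICON.get(w, w) for w in words)

    print(rbmt_translate("I eat apples"))  # -> "main seb khata hoon"

The sketch also shows RBMT's limitation: every sentence pattern needs its own rule, which is why rule bases become hard to maintain.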
Applications of Machine Translation:
    1. Business Communication – Helps companies communicate with international clients by
       translating emails, contracts, and documents.
    2. Education – Assists students and researchers in understanding study material, books, and
       papers written in foreign languages.
    3. E-commerce – Translates product descriptions and customer reviews for global online
       shopping platforms.
    4. Healthcare – Helps doctors and patients from different language backgrounds to
       communicate effectively.
    5. Tourism & Travel – Real-time translation apps guide travelers to understand local signs,
       menus, and conversations.
2. What are language divergences? Give examples in translation.
Language divergences occur when two languages express the same idea in different grammatical,
lexical, or structural ways. In Machine Translation, divergences create difficulties because a direct
word-by-word translation often leads to incorrect or unnatural sentences.
Types of Language Divergences with Examples:
1. Word Order Typology
       Languages differ in the order of Subject (S), Verb (V), and Object (O) in a sentence.
       Examples:
            o   English, German, French, Mandarin → SVO (e.g., I eat apples).
            o   Hindi, Japanese → SOV (Main seb khata hoon).
            o   Irish, Arabic → VSO (Eats the boy an apple).
       VO languages → usually have prepositions (English: to a friend).
       OV languages → usually have postpositions (Hindi: table par).
       Translation issue: Systems must reorder sentence structure.
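A small sketch of this reordering step, assuming the Subject/Verb/Object roles have already been identified by a parser (here they are hard-coded for illustration):

    # Reorder a role-labelled English (SVO) sentence into SOV order,
    # as needed when translating into Hindi or Japanese.
    # The (word, role) pairs would normally come from a syntactic parser.
    def svo_to_sov(tagged):
        order = {"S": 0, "O": 1, "V": 2}  # target order: Subject, Object, Verb
        return [word for word, role in sorted(tagged, key=lambda t: order[t[1]])]

    sentence = [("I", "S"), ("eat", "V"), ("apples", "O")]
    print(" ".join(svo_to_sov(sentence)))  # -> "I apples eat" (SOV word order)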
2. Lexical Divergences
       Occur when words have different meanings, usages, or gaps across languages.
       Examples:
            o   English word bass → can mean a fish (lubina in Spanish) or a musical
                instrument (bajo in Spanish).
            o   English word wall → in German, Wand (inside wall) vs. Mauer (outside wall).
            o   English word brother → in Chinese there are two separate words: gege (elder
                brother) and didi (younger brother).
       Translation issue: Context decides the correct word.
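A toy sketch of context-based lexical choice for the bass example above (the cue-word lists are illustrative assumptions, not a real lexicon):

    # Toy word-sense selection for English "bass" -> Spanish, choosing between
    # "lubina" (the fish) and "bajo" (the instrument) from surrounding words.
    SENSE_CUES = {
        "lubina": {"fish", "river", "caught", "cooked"},
        "bajo": {"music", "guitar", "band", "played"},
    }

    def translate_bass(context_words):
        overlap = {sp: len(cues & set(context_words)) for sp, cues in SENSE_CUES.items()}
        return max(overlap, key=overlap.get)

    print(translate_bass(["he", "played", "bass", "in", "a", "band"]))  # -> "bajo"

Real systems make the same decision with word sense disambiguation models rather than hand-written cue lists.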
3. Morphological Typology
       Languages differ in how many morphemes (smallest meaning units) they pack into words.
    1. Isolating Languages (Vietnamese): one morpheme per word.
    2. Polysynthetic Languages (Siberian Yupik): many morphemes, one word = full sentence.
    3. Agglutinative Languages (Turkish): clear separable morphemes, each with one meaning.
    4. Fusional Languages (Russian): morphemes blend; one morpheme may show case + number
       + gender together.
       Translation issue: Morphologically rich languages need subword handling in MT.
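A minimal sketch of subword segmentation in the spirit of byte-pair encoding (BPE); real systems learn the merge list from a corpus, whereas this one is hand-picked for the Turkish word evlerinizden ("from your houses"):

    # Greedy subword segmentation in the spirit of BPE.
    # The merge list is hand-picked for illustration, not learned from data.
    MERGES = ["ev", "le", "ler", "evler", "in", "ini", "iniz", "de", "den"]

    def segment(word):
        pieces = list(word)   # start from single characters
        for merge in MERGES:  # apply each merge wherever adjacent pieces form it
            i = 0
            while i < len(pieces) - 1:
                if pieces[i] + pieces[i + 1] == merge:
                    pieces[i:i + 2] = [merge]
                else:
                    i += 1
        return pieces

    print(segment("evlerinizden"))  # -> ['evler', 'iniz', 'den']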
4. Referential Density
       Measures how much a language uses explicit pronouns.
       Hot languages (high referential density): English → frequent pronouns (e.g., He is eating).
       Cold languages (low referential density): Chinese, Japanese → omit pronouns (Eating
        instead of He is eating).
       Translation issue: From cold → hot, MT must insert missing pronouns.
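A toy illustration of the cold → hot direction (the subject-detection heuristic and the tracked antecedent are crude assumptions; real systems resolve the referent from discourse context):

    # Toy pronoun insertion when translating a pro-drop ("cold") language
    # into English ("hot"). The antecedent would normally be inferred from
    # the preceding discourse; here it is passed in explicitly.
    def insert_subject(words, antecedent="he"):
        if words and words[0].endswith("ing"):  # crude test: clause lacks a subject
            return [antecedent, "is"] + words
        return words

    print(" ".join(insert_subject(["eating"])))  # -> "he is eating"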
3. Explain the Encoder–Decoder architecture used in Neural Machine Translation.
Ans:
The Encoder–Decoder architecture is the most widely used framework in Neural Machine Translation
(NMT). It is based on deep learning, where one network (encoder) reads the input sentence in the
source language and another network (decoder) generates the translation in the target language.
Working of Encoder–Decoder:
    1. Encoder:
            o   Takes the input sentence in the source language (e.g., English).
            o   Converts each word into a vector (word embedding).
            o   Uses RNN, LSTM, or GRU to process the sequence of words.
            o   Produces a context vector (fixed-length representation) summarizing the whole
                sentence.
    2. Decoder:
            o   Takes the context vector from the encoder.
            o   Generates the target sentence (e.g., Hindi) word by word.
            o   At each step, it predicts the next word using probabilities.
            o   Example: Input: “I am happy” → Output: “Main khush hoon.”
   3. Training:
           o    The system is trained on large parallel corpora (source–target pairs).
           o    The goal is to minimize the difference between predicted output and correct
                translation.
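A minimal sketch of this RNN-based encoder–decoder, assuming PyTorch; the vocabulary sizes, dimensions, and greedy decoding loop are toy assumptions, not a production system:

    # Minimal GRU encoder-decoder (no attention): the encoder compresses the
    # source sentence into one fixed-length context vector, and the decoder
    # generates target words from it one step at a time.
    import torch
    import torch.nn as nn

    SRC_VOCAB, TGT_VOCAB, EMB, HID = 100, 120, 32, 64

    class Encoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.emb = nn.Embedding(SRC_VOCAB, EMB)
            self.rnn = nn.GRU(EMB, HID, batch_first=True)

        def forward(self, src):            # src: (batch, src_len) of word ids
            _, h = self.rnn(self.emb(src))
            return h                       # fixed-length context vector

    class Decoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.emb = nn.Embedding(TGT_VOCAB, EMB)
            self.rnn = nn.GRU(EMB, HID, batch_first=True)
            self.out = nn.Linear(HID, TGT_VOCAB)

        def forward(self, prev_word, h):   # one decoding step
            o, h = self.rnn(self.emb(prev_word), h)
            return self.out(o), h          # scores over the target vocabulary

    enc, dec = Encoder(), Decoder()
    src = torch.randint(0, SRC_VOCAB, (1, 5))   # a 5-word source sentence
    h = enc(src)                                # encode -> context vector
    word = torch.zeros(1, 1, dtype=torch.long)  # start-of-sentence id (assumed 0)
    for _ in range(4):                          # greedy word-by-word decoding
        scores, h = dec(word, h)
        word = scores.argmax(-1)                # pick the most probable next word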
Modern NMT systems replace the RNN encoder and decoder with stacked Transformer blocks, which work as follows:
Encoder Block
   1. Self-Attention → looks at all words in the input.
   2. Add & Norm → stabilize.
   3. Feed-Forward → small NN for each word.
   4. Add & Norm → stabilize again.
Decoder Block
   1. Masked Self-Attention → looks at past words only.
   2. Add & Norm.
   3. Cross-Attention → looks at encoder output (input sentence).
   4. Add & Norm.
   5. Feed-Forward.
   6. Add & Norm.
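A sketch of one such encoder block in PyTorch, matching steps 1–4 above (the model dimension and head count are toy assumptions):

    # One Transformer encoder block: self-attention -> add & norm ->
    # feed-forward -> add & norm. Dimensions are illustrative only.
    import torch
    import torch.nn as nn

    class EncoderBlock(nn.Module):
        def __init__(self, d=64, heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
            self.norm1 = nn.LayerNorm(d)
            self.ff = nn.Sequential(nn.Linear(d, 4 * d), nn.ReLU(), nn.Linear(4 * d, d))
            self.norm2 = nn.LayerNorm(d)

        def forward(self, x):               # x: (batch, seq_len, d)
            a, _ = self.attn(x, x, x)       # 1. self-attention over all words
            x = self.norm1(x + a)           # 2. add & norm (residual connection)
            x = self.norm2(x + self.ff(x))  # 3-4. feed-forward, then add & norm
            return x

    x = torch.randn(1, 5, 64)               # five word vectors
    print(EncoderBlock()(x).shape)          # torch.Size([1, 5, 64])

A decoder block has the same shape but uses masked self-attention and adds a cross-attention step over the encoder output.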
Limitations of Basic Encoder–Decoder:
       The context vector is fixed-length, so long sentences lose information.
       Translation may be inaccurate for complex sentences.
       Attention (used in the Transformer blocks above) mitigates this by letting the decoder look at all encoder states instead of a single fixed vector.
4. How is machine translation evaluated? Discuss BLEU and METEOR scores.
MT Evaluation
Machine Translation (MT) systems are evaluated mainly along two dimensions:
   1. Adequacy – how well the translation captures the exact meaning of the source sentence
      (faithfulness or fidelity).
   2. Fluency – how natural, clear, and grammatically correct the translation is in the target
      language.
       Human judgment is the most reliable way to evaluate MT.
       Raters score translations on scales (e.g., 1–5 or 1–100) based on fluency and
       adequacy.
       Human evaluation is slow and costly, so automatic metrics are often used. Common
       automatic metrics include BLEU, METEOR, ROUGE, TER, etc.
1. BLEU (Bilingual Evaluation Understudy)
Definition:
BLEU is one of the first and most popular automatic metrics. It measures how many n-grams (word sequences) in the machine output match reference (human) translations.
How it works:
    1. n-gram Precision: It calculates the overlap of 1-grams, 2-grams, 3-grams, and 4-grams between the system output and the reference.
    2. Brevity Penalty: If the machine output is too short compared to the reference, BLEU reduces the score.
    3. The final score is between 0 and 1 (or 0–100).
Example:
   o Reference: “the cat is on the mat”
   o MT Output: “the cat sat on the mat”
   o Common n-grams: “the cat”, “on the mat”.
   o BLEU score will be high because of overlaps, even though “is” vs “sat” is different.
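Formally, BLEU = BP · exp(Σₙ wₙ log pₙ), where pₙ is the n-gram precision and BP the brevity penalty. The example can be checked with NLTK's BLEU implementation (a sketch assuming the nltk package is installed; smoothing is used because this short sentence pair has no matching 4-grams, which would otherwise force the score to 0):

    # BLEU for the example above, using NLTK (pip install nltk).
    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    reference = ["the", "cat", "is", "on", "the", "mat"]
    candidate = ["the", "cat", "sat", "on", "the", "mat"]
    score = sentence_bleu([reference], candidate,
                          smoothing_function=SmoothingFunction().method1)
    print(round(score, 3))  # fairly high despite the "is" vs "sat" mismatch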
2. METEOR (Metric for Evaluation of Translation with Explicit ORdering)
Definition:
METEOR was developed to address BLEU’s weaknesses. It tries to match words semantically
and not just by surface form.
How it works:
    1. Checks exact word matches.
    2. Includes stem matches (run vs running).
    3. Includes synonym matches (big vs large).
    4. Adds word order penalties (if words are jumbled, score decreases).
    5. Final score ranges from 0 to 1.
Example:
    o Reference: “the boy is playing football”
    o MT Output: “the kid plays soccer”
    o BLEU score will be low (few exact matches).
    o METEOR score will be higher because it matches “boy–kid” (synonym),
      “football–soccer” (synonym), and “play–plays” (stem).
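A matching sketch for METEOR with NLTK (assumes nltk and its WordNet data are available; whether pairs like boy–kid or football–soccer actually count as synonyms depends on WordNet's synsets, so the exact score may vary):

    # METEOR for the example above, using NLTK.
    # WordNet is needed for the stem and synonym matching stages.
    import nltk
    from nltk.translate.meteor_score import meteor_score

    nltk.download("wordnet", quiet=True)
    reference = ["the", "boy", "is", "playing", "football"]
    candidate = ["the", "kid", "plays", "soccer"]
    print(round(meteor_score([reference], candidate), 3))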