UNIT-5 Overview
Here's a simple and clear explanation of each topic in Language Modeling, broken down for easy
understanding:
1. Language Modeling: Introduction
Language modeling is about assigning probabilities to word sequences, which in practice means predicting the next word given the words seen so far.
For example:
Given the sentence "I like to eat", a language model tries to predict the next word, like "pizza" or
"ice cream".
Use: It helps in applications like speech recognition, text generation, machine translation, etc.
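To see what this means as a probability, the model scores a whole sentence by multiplying conditional word probabilities (the chain rule). A minimal Python sketch, with made-up numbers rather than a real model's outputs:

# Chain rule: P("I like pizza") = P("I") * P("like" | "I") * P("pizza" | "I like")
p_sentence = 0.2 * 0.4 * 0.1   # illustrative conditional probabilities
print(p_sentence)              # 0.008 = the model's score for the whole sentence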
2. N-Gram Models
    •   An N-gram is a sequence of N words.
            o   Unigram = 1 word at a time (e.g., "I", "like", "pizza")
            o   Bigram = 2 words (e.g., "I like", "like pizza")
            o   Trigram = 3 words (e.g., "I like pizza")
    •   N-gram models predict the next word based on the previous N-1 words.
            o   Example (bigram):
                "I like ___" → predict "pizza"
    •   Limitation: Only looks at a fixed small window of previous words.
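A minimal sketch of a bigram model built from counts over a toy corpus (the corpus is illustrative; a real model needs far more data plus smoothing):

from collections import defaultdict, Counter

corpus = "i like pizza . i like ice cream . i eat pizza .".split()

# Count how often each word follows each previous word
bigram_counts = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigram_counts[w1][w2] += 1

def predict_next(word):
    # P(next | word) = count(word, next) / count(word, anything)
    counts = bigram_counts[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(predict_next("like"))   # {'pizza': 0.5, 'ice': 0.5}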
3. Language Model Evaluation
We need to check how good a language model is.
Common metrics:
    •   Perplexity: Measures how surprised the model is by the next word.
        Lower perplexity = Better model.
    •   Accuracy: How often the model predicts the correct next word.
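Perplexity can be computed as the exponential of the average negative log-probability the model assigns to each word of a test text. A minimal sketch (the per-word probabilities are made up):

import math

# Probabilities the model assigned to each word of a test sentence (illustrative)
word_probs = [0.2, 0.1, 0.05, 0.3]

avg_neg_log = -sum(math.log(p) for p in word_probs) / len(word_probs)
perplexity = math.exp(avg_neg_log)
print(perplexity)   # lower = the model is less "surprised" by the text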
4. Bayesian Parameter Estimation
Sometimes, we have little data. Bayesian methods help by:
    •   Starting with a prior belief (what we think before seeing data),
    •   Updating it using observed data → gives posterior (final belief).
Helps avoid zero probabilities in N-gram models (e.g., when a word combination is missing from
data).
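One simple Bayesian-flavoured fix is add-alpha smoothing, which corresponds to putting a Dirichlet prior on the word distribution: the prior pseudo-count keeps unseen combinations from getting zero probability. A minimal sketch with toy counts:

# Observed counts of words following "eat" (toy numbers); "soup" was never seen
counts = {"pizza": 3, "rice": 1}
vocab = ["pizza", "rice", "soup"]

alpha = 1.0                                 # prior pseudo-count (the "prior belief")
total = sum(counts.values()) + alpha * len(vocab)

# Posterior estimate: (count + alpha) / (total observed + alpha * vocabulary size)
for w in vocab:
    p = (counts.get(w, 0) + alpha) / total
    print(w, round(p, 3))                   # "soup" gets a small non-zero probability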
5. Language Model Adaptation
Adapting a model means tuning it to work better on a specific domain or user.
Example: A general model may not work well for medical text. So we "adapt" it using some medical
data, making it more accurate for that domain.
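A common way to adapt is to interpolate (mix) the general model with a small in-domain model; the mixing weight is tuned on held-out domain text. A minimal sketch with made-up probabilities:

# Probability of "dosage" after "recommended", under two models (illustrative values)
p_general = 0.001    # general-domain model rarely sees medical text
p_medical = 0.080    # small model trained only on medical data

lam = 0.3            # interpolation weight, tuned on held-out medical text
p_adapted = lam * p_medical + (1 - lam) * p_general
print(p_adapted)     # 0.0247: the adapted model partly trusts the medical model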
6. Class-based Language Models
Instead of using actual words, group words into classes like:
    •   Animals = {cat, dog, horse}
    •   Actions = {run, eat, sleep}
Now, model the probability of the next class, and then the probability of each word inside its class.
Why? This reduces complexity and helps when there’s little data.
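A minimal sketch of the class-based idea, with made-up classes and probabilities: the probability of the next word is the probability of its class (given the previous word's class) times the probability of the word within that class:

word_to_class = {"cat": "ANIMAL", "dog": "ANIMAL", "run": "ACTION", "eat": "ACTION"}

# P(next class | previous class) and P(word | class): illustrative values only
p_class_given_class = {("ANIMAL", "ACTION"): 0.6}
p_word_given_class  = {("run", "ACTION"): 0.4}

def p_next_word(prev_word, next_word):
    prev_c, next_c = word_to_class[prev_word], word_to_class[next_word]
    return (p_class_given_class.get((prev_c, next_c), 0.0)
            * p_word_given_class.get((next_word, next_c), 0.0))

print(p_next_word("dog", "run"))   # 0.6 * 0.4 = 0.24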
7. Variable-length Language Models
    •   N-gram models use a fixed window (e.g., always 3 words).
    •   But sometimes, longer history is helpful.
    •   Variable-length models (like Probabilistic Suffix Trees) use more context when needed, and
        less when not.
They are smarter about how much of the past to look at.
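A minimal sketch of the variable-length idea: use the longest context that was seen often enough in training, otherwise fall back to a shorter one (the counts and threshold are illustrative, not a full probabilistic suffix tree):

# How often each context was seen in training (toy counts)
context_counts = {
    ("new", "york", "city"): 50,    # frequent 3-word context: worth keeping
    ("like", "green", "eggs"): 1,   # rare 3-word context: back off to something shorter
}
MIN_COUNT = 5

def choose_context(history):
    # Try the longest suffix of the history first, then progressively shorter ones
    for length in range(len(history), 0, -1):
        ctx = tuple(history[-length:])
        if context_counts.get(ctx, 0) >= MIN_COUNT:
            return ctx
    return ()   # nothing reliable: fall back to a unigram model

print(choose_context(["new", "york", "city"]))    # ('new', 'york', 'city')
print(choose_context(["like", "green", "eggs"]))  # ()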
8. Bayesian Topic-based Language Models
These models assume that a document is a mixture of topics.
For example:
    •   Topic 1 = Sports → words like "goal", "team", "match"
    •   Topic 2 = Cooking → "recipe", "salt", "oven"
Use Bayesian methods to:
    •   Infer which topics are present in a document.
    •   Predict words based on topic distributions.
Latent Dirichlet Allocation (LDA) is a popular example.
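A minimal sketch of the topic-mixture idea behind such models (not a full LDA implementation, and all numbers are illustrative): a word's probability in a document mixes its probability under each topic, weighted by how much the document is about that topic:

# P(topic | document), as inferred for one document (illustrative)
doc_topics = {"SPORTS": 0.8, "COOKING": 0.2}

# P(word | topic), illustrative values
topic_words = {
    "SPORTS":  {"goal": 0.05,  "recipe": 0.001},
    "COOKING": {"goal": 0.001, "recipe": 0.06},
}

def p_word(word):
    # Mix over topics: sum over t of P(word | t) * P(t | document)
    return sum(topic_words[t].get(word, 0.0) * w for t, w in doc_topics.items())

print(p_word("goal"))     # 0.05*0.8 + 0.001*0.2 = 0.0402
print(p_word("recipe"))   # 0.001*0.8 + 0.06*0.2 = 0.0128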
9. Multilingual and Cross-lingual Language Modeling
    •   Multilingual: A single model that understands multiple languages.
        Example: One model that can predict in English, French, and Spanish.
    •   Cross-lingual: A model that transfers knowledge from one language to another.
        Example: Learn from English, use that to understand Hindi.
Useful for translation, low-resource languages, and multi-language apps.