Matters of Discussion
- Theory of generalization: generalization bound, approximation-generalization tradeoff, bias and variance, learning curve
- Finite and Infinite Hypothesis Spaces
- Probably Approximately Correct (PAC) Learning
- Bayes theorem
- MDL principle
Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
Generalization in Machine Learning:
❖ Definition:
Generalization is the core goal of machine learning – to
build models that can accurately predict outcomes on data
they haven't encountered during training.
❖ Importance:
Effective generalization ensures that models are useful in
real-world applications, not just in the specific training
environment.
❖ Factors Influencing Generalization:
Overfitting (model learns noise in training data) and
underfitting (model is too simple to capture underlying
patterns) can both hamper generalization.
Theory of generalization
❖ In machine learning, generalization refers to a model's ability to
  perform well on unseen data after being trained on a specific
  dataset.
❖ Generalization bounds are mathematical expressions that provide
  upper limits on this generalization ability, essentially quantifying
  how well a trained model will perform on new, unseen data.
❖ The approximation-generalization tradeoff highlights the inherent
  tension between a model's ability to fit the training data (low
  approximation error) and its ability to generalize to new data (low
  generalization error).
Generalization Bounds:
❖ These bounds provide a theoretical way to assess how well a
  learning algorithm will generalize.
❖ They typically relate the training error to the generalization
  error (error on unseen data).
❖ Common Approaches:
➢ Rademacher Complexity: Measures the complexity of a
  hypothesis space by how well it can fit random noise.
➢ VC Dimension: A measure of the capacity of a hypothesis
  space, given by the largest number of points it can shatter
  (label in every possible way).
➢ Concentration Inequalities: Statistical tools used to bound
  the difference between the empirical (training) error and the
  true (generalization) error.
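As a rough illustration (an added sketch, not from the original slides), the code below applies Hoeffding's inequality with a union bound over a finite hypothesis class, assuming bounded 0-1 loss, to bound the gap between training error and true error:

import math

def hoeffding_generalization_gap(n_samples, hypothesis_count, delta):
    # Bound on |training error - true error| holding with probability >= 1 - delta,
    # via Hoeffding's inequality plus a union bound over a finite hypothesis class.
    return math.sqrt(math.log(2 * hypothesis_count / delta) / (2 * n_samples))

# Example: 10,000 training samples, 1,000 candidate hypotheses, 95% confidence.
gap = hoeffding_generalization_gap(n_samples=10_000, hypothesis_count=1_000, delta=0.05)
print(f"With probability >= 0.95, true error <= training error + {gap:.3f}")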
Key insights of VC dimension
❖ The VC dimension is defined for spaces of binary
    functions (functions to {0,1}).
❖ Several generalizations have been suggested for
    spaces of non-binary functions.
❖ For multi-valued functions (functions to {0,...,n}),
    the Natarajan dimension can be used.
❖ VC dimension: binary classification; Natarajan dimension: multi-class classification.
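As an added illustrative sketch (not from the slides), the code below checks shattering for 1-D threshold classifiers h_t(x) = 1 if x >= t, else 0: any single point can be shattered, but no pair of points can, so the VC dimension of this class is 1.

from itertools import product

def thresholds_can_shatter(points):
    # Can classifiers h_t(x) = 1 if x >= t else 0 realise every labelling of the points?
    # Only thresholds at the points themselves (plus one below and one above) matter.
    candidates = [min(points) - 1] + list(points) + [max(points) + 1]
    for labels in product([0, 1], repeat=len(points)):
        if not any(all((1 if x >= t else 0) == y for x, y in zip(points, labels))
                   for t in candidates):
            return False  # this labelling cannot be produced by any threshold
    return True

print(thresholds_can_shatter([0.5]))       # True  -> one point is shattered
print(thresholds_can_shatter([0.2, 0.8]))  # False -> two points are not, so VC dimension = 1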
Natarajan dimension
For example, let’s say your visual system can recognize N
different Boolean features
(e.g., has-tail, has-fur, can-bark, has-four-legs, can-fly,
and so on).
We might describe a dog as an animal that
has-fur (1), can-fly (0), has-four-legs (1), can-bark (1),
where 1 = yes and 0 = no.
(Fur: soft, thick hair that covers the bodies of some animals.)
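A minimal coding sketch of this Boolean-feature encoding (with hypothetical feature names) is shown below; each animal becomes a point in {0,1}^N, and a multi-class hypothesis maps such vectors to labels like "dog" or "bird", which is the setting where the Natarajan dimension applies.

# Hypothetical Boolean features (1 = yes, 0 = no), following the slide's example.
FEATURES = ["has_fur", "can_fly", "has_four_legs", "can_bark"]

dog = {"has_fur": 1, "can_fly": 0, "has_four_legs": 1, "can_bark": 1}
bird = {"has_fur": 0, "can_fly": 1, "has_four_legs": 0, "can_bark": 0}

def to_vector(animal):
    # Encode an animal as a point in {0,1}^N.
    return [animal[f] for f in FEATURES]

print("dog  ->", to_vector(dog))   # [1, 0, 1, 1]
print("bird ->", to_vector(bird))  # [0, 1, 0, 0]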
Approximation-Generalization Tradeoff:
❖ This tradeoff highlights the tension between a model's
  ability to fit the training data (approximation) and its ability
  to generalize to new data (generalization).
❖ Complex Models: Have more parameters and can fit the
  training data very well (low approximation error), but they
  are more prone to overfitting, leading to high generalization
  error.
❖ Simple Models: Have fewer parameters, may not fit the
  training data perfectly (high approximation error), but can
  be more robust and generalize better to new data (low
  generalization error).
Cont..
Finding the Sweet Spot:
The goal is to find a model complexity that balances
approximation and generalization, minimizing the overall error.
Example:
➢ If the model is too simple (underfitting), it will have high bias
   (systematic error) and high training and generalization error.
➢ If the model is too complex (overfitting), it will have low
   bias, high variance (sensitivity to training data), and high
   generalization error.
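A small sketch of this behaviour (an added example assuming NumPy, with made-up noisy data): low-degree polynomial fits underfit (high bias), very high-degree fits overfit (high variance), and the gap between training and test error makes the tradeoff visible.

import numpy as np

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 1, 20))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 20)
x_test = np.sort(rng.uniform(0, 1, 200))
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, 200)

for degree in (1, 3, 12):
    coeffs = np.polyfit(x_train, y_train, degree)          # fit a polynomial model
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")

# Typically: degree 1 underfits (both errors high), degree 12 overfits
# (low training error, higher test error), and degree 3 sits near the sweet spot.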
Summary
❖ In essence, the theory of generalization and generalization
  bounds provide a framework for understanding the
  limitations and potential of machine learning models.
❖ The approximation-generalization tradeoff highlights the
  practical challenges of balancing model complexity and
  predictive performance on unseen data.
Finite and Infinite Hypothesis Spaces
❖ ML professionals and data scientists make an initial assumption for the
    solution of the problem.
❖ In Machine Learning, this assumption is known as a Hypothesis.
    The terms Hypothesis and Model are often used
    interchangeably.
Cont.. Finite Hypothesis Space:
- In a finite hypothesis space, the learning algorithm is restricted to
choose from a limited, finite set of hypothesis functions. This means
that the algorithm can only consider a fixed number of possible
solutions during the learning process.
- Examples of algorithms that operate within a finite hypothesis space
include decision trees with a limited depth, linear regression with a
fixed set of features, and finite sets of rules in rule-based systems.
- Advantages of a finite hypothesis space include computational
efficiency, simplicity, and potentially improved generalization to new,
unseen data if the true function can be accurately represented within
the limited set of hypotheses.
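As a toy illustration (hypothetical data and names, not from the slides), the sketch below enumerates a finite hypothesis space of decision stumps over a fixed threshold grid and picks the one with the lowest training error; exhaustive search is feasible precisely because the space is finite.

# Finite hypothesis space: stumps "predict 1 if x >= t else 0" for t on a fixed grid.
thresholds = [k / 10 for k in range(11)]           # 11 candidate hypotheses
train = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1)]   # (feature value, label) pairs

def training_error(t):
    return sum((1 if x >= t else 0) != y for x, y in train) / len(train)

best_t = min(thresholds, key=training_error)       # enumerate every hypothesis
print(f"best threshold = {best_t}, training error = {training_error(best_t)}")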
Cont.. Infinite Hypothesis Space:
- In an infinite hypothesis space, the learning algorithm can potentially
consider an unlimited number of hypothesis functions. This allows for a more
flexible and expressive representation of the true function, as the algorithm
is not constrained by a fixed set of hypotheses.
- Examples of algorithms that operate within an infinite hypothesis space
include neural networks with a large number of parameters, kernel methods
such as support vector machines with infinite-dimensional feature spaces,
and non-parametric models like Gaussian processes.
- Advantages of an infinite hypothesis space include the ability to
approximate complex, nonlinear functions more accurately, potentially
leading to better performance on complex tasks where the true function is
highly intricate or not easily captured by a simpler model.
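For contrast, a sketch of the infinite case (again an illustrative assumption rather than the slide's own examples): the same stumps but with any real-valued threshold. The hypotheses cannot be enumerated, yet the learner searches the space implicitly, since only midpoints between sorted training values can change the training error.

# Infinite hypothesis space: the threshold t may be any real number.
train = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1)]         # (feature value, label) pairs
xs = sorted(x for x, _ in train)
candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]   # finitely many distinct behaviours

def training_error(t):
    return sum((1 if x >= t else 0) != y for x, y in train) / len(train)

best_t = min(candidates, key=training_error)
print(f"best real-valued threshold = {best_t}, training error = {training_error(best_t)}")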
PAC Learning in ML
❖ Probably approximately correct learning
❖ It is a framework for mathematical analysis of
    machine learning.
❖ It defines a mathematical relationship between the
    number of training samples, the error rate, and the
    probability that the available training data are large
    enough to attain the desired error rate.
Why PAC Learning?
❖ We use a collection of training samples to train a machine
   learning algorithm to learn classification or regression patterns.
    Usually, we calculate various metrics to estimate its
   performance. One of these metrics is the error rate.
❖ However, the problem is that we can only estimate it using a test
   set. The estimates differ from the actual value due to their
   statistical nature. So, we may ask how many training samples we
   need to be confident about our estimates.
❖ The PAC theory is an answer to that. It’s about finding the
   relationship between the true error rate and the number of
   training samples.
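One classical PAC result for a finite hypothesis class in the realizable setting states that m >= (1/epsilon) * (ln|H| + ln(1/delta)) training samples suffice to reach true error at most epsilon with probability at least 1 - delta. A small sketch of that sample-complexity bound:

import math

def pac_sample_complexity(hypothesis_count, epsilon, delta):
    # Samples sufficient for a consistent learner over a finite hypothesis class
    # to reach true error <= epsilon with probability >= 1 - delta.
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)

# Example: one million hypotheses, 5% error, 99% confidence.
print(pac_sample_complexity(1_000_000, epsilon=0.05, delta=0.01))  # prints 369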
Cont.. Example
Cat & Dog Classification
Cont..
❖ Imagine you have a teacher who wants to teach you how to recognize
    cats from dogs based on pictures.
❖ In the PAC learning framework, the teacher's goal is to help you learn
    to classify new images correctly with high probability and a small
    margin of error.
❖ "Probably" refers to the idea that the learner should be able to classify
    new examples with high probability.
❖ This means that the learner should be correct most of the time when
    making predictions based on the training data.
❖ "Approximately Correct" means that the learner is not expected to be
    perfect but should classify most examples correctly.
❖ There is an allowance for some errors, but the errors should be limited
    within a small margin.
Cont..
❖ In PAC learning, the learner aims to find a hypothesis (a
    model) that approximates the true relationship between
    inputs and outputs in the data.
❖ The goal is to minimize the error between the hypothesis
    and the true function that generated the data.
❖ To achieve PAC learning, the learner needs to find a
    balance between fitting the training data well (achieving
    low error) and generalizing to new, unseen data.
Key insight on PAC learning
PAC learning is a theoretical framework that aims to
address the fundamental challenge in ML:
▪ building models that can generalize well to
  unseen data.
▪ PAC learning aims to find a hypothesis, or a
  model, that approximates the target concept as
  accurately as possible.
Bayes theorem
❖ Bayes' Theorem is a fundamental concept in
  probability theory and machine learning that helps
  update the probability of a hypothesis based on new
  evidence.
❖ In machine learning, it's crucial for tasks like
  classification, where it's used to calculate the
  probability of a data point belonging to a specific
  class, given its features.
❖ How it's used in Machine Learning:
❖ Classification: Bayes' Theorem is the foundation of
  the Naive Bayes classifier, a simple yet effective
  algorithm for classifying data points into different
  categories.
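A tiny Naive Bayes sketch with hypothetical word likelihoods (an added example, not the slides' own): multiply the class prior by the per-word likelihoods, here in log space, and choose the class with the larger score.

import math

prior = {"spam": 0.3, "ham": 0.7}                       # hypothetical class priors
likelihood = {                                          # hypothetical P(word | class)
    "spam": {"free": 0.30, "meeting": 0.02, "offer": 0.25},
    "ham":  {"free": 0.03, "meeting": 0.20, "offer": 0.04},
}

def classify(words):
    # "Naive" assumption: words are independent given the class; log space avoids underflow.
    scores = {c: math.log(prior[c]) + sum(math.log(likelihood[c][w]) for w in words)
              for c in prior}
    return max(scores, key=scores.get)

print(classify(["free", "offer"]))  # -> spam
print(classify(["meeting"]))        # -> ham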
Cont.. Key Concepts:
Prior Probability:
The initial belief about the probability of an event before considering any
new evidence (e.g., the overall probability of spam emails).
Likelihood:
The probability of observing the new evidence given a specific hypothesis
(e.g., the probability of seeing certain words in an email given that it is
spam).
Posterior Probability:
The updated probability of a hypothesis after considering the new evidence
(e.g., the probability that an email is spam after analyzing its content).
Evidence:
The overall probability of observing the new evidence.
Cont..
Bayes' Theorem Formula:
P(A|B) = [P(B|A) * P(A)] / P(B)
Where:
- P(A|B) is the posterior probability of event A given evidence B.
- P(B|A) is the likelihood of evidence B given event A.
- P(A) is the prior probability of event A.
- P(B) is the evidence, or the probability of observing evidence B.
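A small numeric sketch of the formula applied to the spam example above, with made-up probabilities:

# Hypothetical numbers: P(spam), P(word | spam), P(word | not spam).
p_spam = 0.30                 # prior P(A)
p_word_given_spam = 0.60      # likelihood P(B|A)
p_word_given_ham = 0.05       # P(B | not A)

# Evidence P(B) by total probability, then the posterior P(A|B) via Bayes' theorem.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(f"P(spam | word) = {p_spam_given_word:.3f}")  # ~0.837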
MDL principle
❖     The Minimum Description Length (MDL) principle in machine
    learning is a model selection method that favors the model that
    provides the shortest description of the data.
❖ It's based on the idea that the best model for a given dataset is
    the one that allows for the most efficient compression of that
    data.
❖ MDL balances model complexity with its ability to fit the data,
    selecting the model that achieves the best trade-off between
    these two factors.
Cont..
❖ MDL is rooted in information theory and
    aims to find the simplest explanation for
    observed data.
❖ The principle suggests that the best model is
    the one that allows you to describe the data
    using the fewest number of bits.
How MDL Works:
1. Model Selection:
MDL compares different models (hypotheses) by calculating how many bits are needed to
describe both the model and the data using that model.
2. Description Length:
This involves calculating the length of the code required to represent the model itself (model
description length) and the length of the code needed to describe the data given that model
(data description length).
3. Trade-off:
A more complex model will have a shorter data description length (because it fits the data
better), but a longer model description length (because it is more complex). Conversely, a
simpler model will have a longer data description length, but a shorter model description length.
4. Optimal Model:
The MDL principle selects the model for which the sum of the model description length and the
data description length is minimized.
How MDL Works: cont..example
➢ Imagine you have a dataset of coin flips.
➢ You could model this data with a simple model that assumes a fair
   coin (50/50 chance of heads or tails), or with a more complex
   model that allows for a different probability of heads and tails.
➢ MDL would compare the description length of the data using each
   model.
➢ If the observed data is close to 50/50, the simpler
   fair coin model might be preferred, as it would require fewer bits
   to describe the data given the model.
➢ If the observed data deviates significantly from 50/50, a more
   complex model allowing for a different probability might be
    chosen.
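A minimal sketch of this coin-flip comparison: total description length = model cost + data cost, with the data cost taken as the negative log2-likelihood in bits; the 0.5 * log2(n) parameter cost for the biased model is a common MDL convention assumed here, not something fixed by the slides.

import math

def data_bits(n_heads, n_tails, p_heads):
    # Bits needed to encode the flip sequence under a Bernoulli(p_heads) model.
    return -(n_heads * math.log2(p_heads) + n_tails * math.log2(1 - p_heads))

def description_lengths(n_heads, n_tails):
    n = n_heads + n_tails
    fair = data_bits(n_heads, n_tails, 0.5)         # no free parameter to encode
    p_hat = n_heads / n                             # fitted bias (assume 0 < p_hat < 1)
    biased = 0.5 * math.log2(n) + data_bits(n_heads, n_tails, p_hat)
    return fair, biased

print(description_lengths(52, 48))  # near-fair data: the fair-coin model wins
print(description_lengths(80, 20))  # skewed data: the biased-coin model wins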
Explore on your end [A-4]
Investigate VC and Natarajan dimensions with
some relevant problem scenarios.
Vapnik-Chervonenkis dimension—VC
   Cheers for your great patience!
                     Queries, please?