Machine Learning
TE IT
What is Machine Learning?
“Learning is any process by which a system improves
  performance from experience.”
                 - Herbert Simon
Definition by Tom Mitchell (1998):
Machine Learning is the study of algorithms that
• improve their performance P
• at some task T
• with experience E.
A well-defined learning task is given by <P, T, E>.
         What is Machine Learning?
Consider the example of a spam filter: an email program that watches which emails you mark as spam or not, and learns to filter spam accordingly.
T: To decide whether an email is spam or not
E: Observing the labels (spam / not spam) of the available email data
P: The number (or fraction) of emails correctly classified as spam/not spam
Samuel’s Checkers-Player
“Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.” - Arthur Samuel (1959)
Defining the Learning Task
Improve on task T, with respect to performance metric P, based on experience E.

T: Playing checkers
P: Percentage of games won against an arbitrary opponent
E: Playing practice games against itself

T: Recognizing hand-written words
P: Percentage of words correctly classified
E: Database of human-labeled images of handwritten words

T: Driving on four-lane highways using vision sensors
P: Average distance traveled before a human-judged error
E: A sequence of images and steering commands recorded while observing a human driver
Slide credit: Ray Mooney
Traditional Programming
Data + Program → Computer → Output

Machine Learning
Data + Output → Computer → Program
Machine Learning process
    When Do We Use Machine Learning?
ML is used when:
•   Human expertise does not exist (navigating on Mars)
•   Humans can’t explain their expertise (speech recognition)
•   Models must be customized (personalized medicine)
•   Models are based on huge amounts of data (genomics)
Learning isn’t always useful:
• There is no need to “learn” to calculate payroll
A classic example of a task that requires machine learning: it is very hard to say what makes a “2”.
   Some more examples of tasks that are best
      solved by using a learning algorithm
• Recognizing patterns:
   – Facial identities or facial expressions
   – Handwritten or spoken words
   – Medical images
• Generating patterns:
   – Generating images or motion sequences
• Recognizing anomalies:
   – Unusual credit card transactions
   – Unusual patterns of sensor readings in a nuclear power plant
• Prediction:
   – Future stock prices or currency exchange rates
               Sample Applications
•   Web search
•   Computational biology
•   Finance
•   E-commerce
•   Space exploration
•   Robotics
•   Information extraction
•   Social networks
•   Debugging software
Career opportunities
Search engine companies: Google, Yahoo, Bing (Microsoft)
Data science vendors: Palantir, Teradata, Pixar, SAS, Alpine Labs, Pivotal, Tableau, and more; over 8,000 companies are hiring data scientists.
Social networks: Facebook, Twitter, LinkedIn, Instagram, Tumblr
Engineering related: Intel, IBM, HCL Technologies, Wipro Technologies, Verizon, Visa, Boeing, SAP, Oracle, the oil industry
Financial related: Amazon, Apple, eBay, EMC, Bank of America, Capital One, PayPal, GE Capital
Data and types: Scales of Measurement
Quantitative: interval and ratio scales; the data may be discrete or continuous.
Qualitative: nominal and ordinal scales.
Features and Patterns:
Pattern recognition is the process of finding regularities and similarities in data using machine learning.
In machine learning and pattern recognition, a feature is an individual measurable property or characteristic of a phenomenon. Choosing informative, discriminating, and independent features is a crucial element of effective algorithms in pattern recognition, classification, and regression.
Attributes are the items of data that are used in machine learning. Attributes are also referred to as variables, fields, or predictors.
Learning Tasks: Descriptive and Predictive Tasks
                               Types of Learning
     • Supervised (inductive) learning
           – Given: training data + desired outputs (labels)
     • Unsupervised learning
           – Given: training data (without desired outputs)
     • Semi-supervised learning
           – Given: training data + a few desired outputs
     • Reinforcement learning
           – Rewards from sequence of actions
Based on slide by Pedro Domingos
Learning Paradigms
Supervised (inductive) learning
Supervised Learning: Regression
• Given (x1, y1), (x2, y2), ..., (xn, yn)
• Learn a function f(x) to predict y given x
  – y is real-valued == regression

[Figure: September Arctic Sea Ice Extent (1,000,000 sq km) vs. Year, 1970–2020]
Data from G. Witt. Journal of Statistics Education, Volume 21,
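To make the idea concrete, here is a minimal regression sketch, assuming scikit-learn is available; the (year, extent) values below are invented for illustration, not the actual sea-ice data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy (x, y) pairs: year vs. a real-valued quantity to predict.
X = np.array([[1980], [1990], [2000], [2010]])  # feature: year
y = np.array([7.8, 6.2, 6.3, 4.9])              # target: real-valued

model = LinearRegression().fit(X, y)            # learn f(x)
print(model.predict([[2020]]))                  # predict y for a new x
```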
Supervised Learning: Classification
• Given (x1, y1), (x2, y2), ..., (xn, yn)
• Learn a function f(x) to predict y given x
  – y is categorical == classification

[Figure: Cancer (Malignant / Benign) vs. Tumor Size; y = 1 (malignant) or 0 (benign)]
Based on example by Andrew Ng
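A minimal classification sketch in the spirit of the tumor-size example; the sizes and labels are invented for illustration, and scikit-learn is assumed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1.0], [1.5], [2.0], [3.0], [3.5], [4.0]])  # tumor size
y = np.array([0, 0, 0, 1, 1, 1])  # 0 = benign, 1 = malignant

clf = LogisticRegression().fit(X, y)   # learn f(x) with categorical y
print(clf.predict([[2.8]]))            # predicted class for a new tumor
```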
Unsupervised Learning
• Given x1, x2, ..., xn (without labels)
• Output hidden structure behind the x’s
  – E.g., clustering (see the sketch below)
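A minimal unsupervised-learning sketch: k-means clustering on unlabeled points. The data is invented for illustration and scikit-learn is assumed.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 1], [1.2, 0.8], [0.9, 1.1],   # one blob
              [5, 5], [5.1, 4.9], [4.8, 5.2]])  # another blob

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # hidden structure: cluster assignment per point
```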
Unsupervised Learning
Genomics application: group individuals by genetic similarity.
[Figure: heat map of genes (rows) × individuals (columns)]
[Source: Daphne Koller]
Unsupervised Learning
Further applications: organizing computing clusters, social network analysis, market segmentation, astronomical data analysis.
Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison)
Slide credit: Andrew Ng
Unsupervised Learning
• Independent component analysis – separate a combined signal into its original sources
Image credit: statsoft.com. Audio from http://www.ism.ac.jp/~shiro/research/blindsep.html
Reinforcement Learning
• Given a sequence of states and actions with (delayed) rewards, output a policy
  – A policy is a mapping from states to actions that tells you what to do in a given state
• Examples:
  – Credit assignment problem
  – Game playing
  – Robot in a maze
  – Balancing a pole on your hand
The Agent-Environment Interface
Agent and environment interact at discrete time steps t = 0, 1, 2, ...
  The agent observes the state at step t:   s_t ∈ S
  produces an action at step t:             a_t ∈ A(s_t)
  gets the resulting reward:                r_{t+1}
  and the resulting next state:             s_{t+1}

... s_t --a_t--> r_{t+1}, s_{t+1} --a_{t+1}--> r_{t+2}, s_{t+2} --a_{t+2}--> r_{t+3}, s_{t+3} --a_{t+3}--> ...
Slide credit: Sutton & Barto
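As a concrete instance of this loop, here is a hedged sketch of tabular Q-learning on a toy 5-state chain. The environment, rewards, and hyperparameters are all invented for illustration; the slides do not prescribe a particular algorithm.

```python
import numpy as np

n_states, n_actions = 5, 2     # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    while s != n_states - 1:                      # terminal state at the end
        # epsilon-greedy action selection
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0   # delayed reward at the goal
        # Update toward r_{t+1} + gamma * max_a Q(s_{t+1}, a)
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

policy = Q.argmax(axis=1)   # policy: mapping from states to actions
print(policy)               # should prefer action 1 (right) everywhere
```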
  Reinforcement Learning
https://www.youtube.com/watch?v=4cgWya-wjgY
Framing a Learning Problem
    Designing a Learning System
• Choose the training experience
• Choose exactly what is to be learned
   – i.e. the target function
• Choose how to represent the target function
• Choose a learning algorithm to infer the target
  function from the experience
[Diagram: Environment/Experience supplies training data to a Learner, which produces Knowledge used by a Performance Element; testing data evaluates the Performance Element]
Training vs. Test Distribution
• We generally assume that the training and test examples are independently drawn from the same overall distribution of data
  – We call this “i.i.d.”, which stands for “independent and identically distributed”
• If the examples are not independent, collective classification is required
• If the test distribution is different, transfer learning is required
                ML in a Nutshell
• Tens of thousands of machine learning
  algorithms
  – Hundreds new every year
• Every ML algorithm has three components:
  – Representation
  – Optimization
  – Evaluation
Various Function Representations
• Numerical functions
   – Linear regression
   – Neural networks
   – Support vector machines
• Symbolic functions
   – Decision trees
   – Rules in propositional logic
   – Rules in first-order predicate logic
• Instance-based functions
   – Nearest-neighbor
   – Case-based
• Probabilistic Graphical Models
   – Naïve Bayes
   – Bayesian networks
   – Hidden-Markov Models (HMMs)
   – Probabilistic Context Free Grammars (PCFGs)
   – Markov networks
Various Search/Optimization Algorithms
  • Gradient descent
     – Perceptron
     – Backpropagation
  • Dynamic Programming
     – HMM Learning
     – PCFG Learning
  • Divide and Conquer
     – Decision tree induction
     – Rule learning
  • Evolutionary Computation
     – Genetic Algorithms (GAs)
     – Genetic Programming (GP)
     – Neuro-evolution
                     Evaluation
•   Accuracy
•   Precision and recall
•   Squared error
•   Likelihood
•   Posterior probability
•   Cost / Utility
•   Margin
•   Entropy
•   K-L divergence
•   etc.
ML in Practice
• Understand domain, prior knowledge, and goals
• Loop:
  – Data integration, selection, cleaning, pre-processing, etc.
  – Learn models
  – Interpret results
• Consolidate and deploy discovered knowledge
Based on a slide by Pedro Domingos
           Lessons Learned about Learning
      • Learning can be viewed as using direct or indirect
        experience to approximate a chosen target function.
      • Function approximation can be viewed as a search
        through a space of hypotheses (representations of
        functions) for one that best fits a set of training data.
      • Different learning methods assume different
        hypothesis spaces (representation languages) and/or
        employ different search techniques.
Slide credit: Ray Mooney
Features
• Mathematically, features are functions that map from the instance space to some set of feature values called the domain of the feature.
• Features vary in nature, e.g.:
  o A count over the set of integers, such as the number of occurrences of a particular word
  o A Boolean, such as true/false for whether an email is spam or ham
  o An arbitrary finite set, such as a set of colors or shapes
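To make this concrete, here is a small sketch of mapping one instance (an email) to feature values of these kinds; the email fields and feature names are invented for illustration.

```python
# A raw instance and its hand-crafted features (all names hypothetical).
email = {"text": "win a free prize now", "sender_known": False}

features = {
    "count_free": email["text"].split().count("free"),  # integer feature
    "sender_known": email["sender_known"],              # Boolean feature
    "length": len(email["text"].split()),               # integer feature
}
print(features)  # {'count_free': 1, 'sender_known': False, 'length': 5}
```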
Data and Dimensionality
Increasing the number of features will not always improve classification accuracy. In practice, the inclusion of more features might actually lead to worse performance.
The number of training examples required increases exponentially with the dimensionality d (i.e., as k^d, where k is the number of bins per feature): with k = 3, one feature needs 3 bins, two features need 3² = 9 bins, and three features need 3³ = 27 bins.
Dimensionality reduction
• The final classification or regression may be based on too many factors; these factors are variables called features
• The higher the number of features, the harder it gets to visualize the training set
• Many of these features are correlated, and hence redundant
• Dimensionality reduction is a set of techniques in machine learning and statistics to reduce the number of random variables under consideration: feature selection and feature extraction
 Some features (dimensions) bear little or no useful information (e.g., color of hair for car selection)
  – We can drop some features
  – We have to estimate which features can be dropped from the data
 Several features can be combined together without loss, or even with gain, of information (e.g., income of all family members for a loan application)
  – Some features can be combined together
  – We have to estimate which features to combine from the data
          Why Dimensionality Reduction
    • What is the objective?
      Choose an optimum set of features of lower dimensionality to
      improve classification accuracy.
    • Different methods can be used to reduce dimensionality:
       • Feature extraction
       • Feature selection
•    Reduces Time Complexity: Less computation
•    Reduces Space Complexity: Less parameters
•    Saves the cost of observing the feature
•    Simpler models are more robust on small datasets
•    More interpretable; simpler explanation
•    Data visualization (structure, groups, outliers, etc) if plotted in 2 or 3
     dimensions
Dimensionality Reduction (cont’d)
Feature selection: chooses a subset of the original features.

  x = (x₁, x₂, ..., x_N)ᵀ  →  (x_{i1}, x_{i2}, ..., x_{iK})ᵀ,  K << N

• Find a smaller subset of a many-dimensional data set to create a data model
• Find the k features of the d dimensions that give us the most information and discard the other (d − k) dimensions
• Subset selection is one of the most widely used methods
Feature Extraction
Feature extraction: finds a set of new features (i.e., through some mapping f()) from the existing features.
From a mathematical point of view, finding an optimum mapping y = f(x) is equivalent to optimizing an objective criterion.
Different methods use different objective criteria, e.g.:
 Minimize Information Loss: represent the data as accurately as possible in the lower-dimensional space.
 Maximize Discriminatory Information: enhance the class-discriminatory information in the lower-dimensional space.
              Feature Extraction
Popular linear feature extraction methods:
   Principal Components Analysis (PCA): Seeks a projection that preserves as
   much information in the data as possible.
   Linear Discriminant Analysis (LDA): Seeks a projection that best discriminates
   the data.
Many other methods:
   Making features as independent as possible (Independent Component
   Analysis or ICA).
   Retaining interesting directions (Projection Pursuit).
   Embedding to lower dimensional manifolds (Isomap, Locally Linear
   Embedding or LLE).
Subset-selection
• The goal is to find the best subset of the set of features
• The best subset contains the least number of dimensions that most contribute to accuracy
• Used in both regression and classification problems
• There are 2^d possible subsets of d variables
• It is not possible to test all of them unless d is small
• Instead, heuristics are used to get a reasonable (but not optimal) solution in reasonable (polynomial) time
Subset-selection
Forward search (see the sketch below):
  – Start from an empty set of features
  – Try each of the remaining features
  – Estimate the classification/regression error for adding each specific feature
  – Select the feature that gives the maximum improvement in validation error
  – Stop when there is no significant improvement
Backward search:
  – Start with the original set of size d
  – Drop the feature with the smallest impact on the error
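A hedged sketch of greedy forward search, assuming scikit-learn; the synthetic data, model choice, and stopping rule are illustrative, not prescribed by the slides.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))                          # 6 candidate features
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(size=100)   # only 2 matter

selected, remaining = [], list(range(X.shape[1]))
best_score = -np.inf
while remaining:
    # Try adding each remaining feature; keep the one with the best CV score.
    scores = [(cross_val_score(LinearRegression(),
                               X[:, selected + [j]], y).mean(), j)
              for j in remaining]
    score, j = max(scores)
    if score <= best_score:          # stop when no improvement
        break
    best_score = score
    selected.append(j)
    remaining.remove(j)

print(selected)  # expected to pick features 0 and 3 first
```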
Principal Component Analysis:
• A mapping from the d-dimensional space to a new k-dimensional space (k < d), with minimum loss of information.
• As the dimensionality of the data increases, the difficulty of visualizing it and performing computations on it also increases.
• There are two main strategies to reduce the dimensions of the data:
  o Remove the redundant dimensions
  o Keep only the most important dimensions
Concepts used in PCA
Variance: a measure of the variability of the data; it simply measures how spread out the data set is.
Concepts used in PCA
Covariance: a measure of the extent to which corresponding elements from two sets of ordered data move in the same direction.
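For reference, the usual sample formulas behind these two definitions (a sketch; the slides do not specify whether the 1/n or the 1/(n − 1) normalization is intended):

```latex
\mathrm{Var}(X) = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2,
\qquad
\mathrm{Cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)
```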
Concepts used in PCA
• PCA finds a new set of dimensions such that all the dimensions are orthogonal (and hence linearly independent) and ranked according to the variance of the data along them.
• This means the more important principal axis comes first (more important = more variance / more spread-out data).
• The principal direction in which the data varies is shown by the U axis, and the second most important direction is the V axis, orthogonal to it.
• If each (X, Y) instance is transformed into its corresponding (U, V) coordinates, the data is de-correlated, meaning that the covariance between the U and V variables is zero.
Concepts used in PCA
• The directions U and V are called the principal components.
[Figures: PCA for Data Representation; PCA for Dimension Reduction]
Working of PCA
1. The first step is to gather reliable raw data from a sample (e.g., based on a questionnaire).
2. The second step is to calculate the correlations between the variables.
3. In principal component analysis, principal components are extracted and presented as a table with the components in columns and the variables in rows.
4. The principal components table is truncated. Components are reported in order by eigenvalue and by the proportion of total variance explained. Frequently, these components are easily interpreted.
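A minimal PCA sketch via the eigendecomposition of the covariance matrix, mirroring steps 2–4 above in spirit; the 2-D synthetic data is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D data (synthetic).
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.8], [0.8, 1.0]])

Xc = X - X.mean(axis=0)               # center the data
C = np.cov(Xc, rowvar=False)          # covariance matrix of the variables
eigvals, eigvecs = np.linalg.eigh(C)  # eigenvectors = principal directions

order = np.argsort(eigvals)[::-1]     # rank components by variance
components = eigvecs[:, order]        # U axis first, then V axis
explained = eigvals[order] / eigvals.sum()

Z = Xc @ components[:, :1]            # project onto the first component
print(explained)                      # proportion of variance per component
```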
Objectives of PCA
• PCA helps to extract the most important information from the data table and compress the size of the data set by keeping only the important information.
• PCA also simplifies the description of the data set and helps analyze the structure of the observations and the variables.
Dataset Validation Techniques – Hold-out, k-fold Cross-Validation, Leave-One-Out Cross-Validation (LOOCV)
Lesson Goals:
 Understand the basic concepts of resampling methods for machine learning model selection and assessment.
 Understand the similarities and differences between the validation set approach, leave-one-out cross-validation, and K-fold cross-validation.
 Understand the basic theoretical concepts of the bootstrap method for assessing statistical accuracy.
 Recognize that the performance of machine learning models depends on their prediction capability on independent test data.
             Resampling Methods
Involves repeatedly drawing samples from a training set and refitting a
model of interest on each sample in order to obtain more information
about the fitted model.
Example: We can estimate the variability of a linear regression fit by repeatedly drawing different samples from the training data, fitting an OLS regression to each new sample, and then examining the extent to which the resulting fits differ.
Model Assessment: having chosen a final model, estimating its
prediction error on new data.
Model Selection: estimating the performance of different models in
order to choose the best one.
            Resampling Methods (cont.)
Cross-Validation
   Used to estimate test set prediction error rates associated with a given
   machine learning method to evaluate its performance, or to select the
   appropriate level of model flexibility.
Bootstrap
   Used most commonly to provide a measure of accuracy of a parameter
   estimate or of a given machine learning method.
             Model Assessment
The generalization performance of a machine learning method
relates to its prediction capability on independent test sets.
Assessment of this performance is extremely important in
practice, since it guides the choice of the machine learning
method or model.
Further, this gives us a measure of the quality of the ultimately
chosen model.
Model Assessment (cont.)
Test Error
  The average error that results from using a machine learning method to predict the response on a new observation.
  The prediction error over an independent test sample.
Training Error
  The average loss over the training sample: $\overline{\mathrm{err}} = \frac{1}{n}\sum_{i=1}^{n} L(y_i, \hat{f}(x_i))$
Note: The training error rate can dramatically underestimate the test error rate.
Model Assessment (cont.)
 As the model becomes more and more complex, it uses the training data more and is able to adapt to more complicated underlying structures.
 Hence, there is a decrease in bias but an increase in variance.
 However, training error is not a good estimate of the test error.
• Training error consistently decreases with model complexity.
• A model with zero training error is overfit to the training data and will typically generalize poorly.
Model Assessment (cont.)
 If we are in a data-rich situation, the best approach for both model selection and model assessment is to randomly divide the dataset into three parts: a training set, a validation set, and a test set.
 The training set is used to fit the models. The validation set is used to estimate prediction error for model selection. The test set is used for assessment of the prediction error of the final chosen model.
 A typical split might be 50% for training, and 25% each for validation and testing.
Model Assessment (cont.)
Best solution: use a large designated test set, which is often not available. For the methods presented here, there is insufficient data to split it into three parts.
There are some methods to make mathematical adjustments to the
training error rate in order to estimate the test error rate:
   Cp statistic, AIC, BIC (we will discuss these next week)
Here, we consider cross-validation (CV) methods that estimate the test
error by holding out a subset of the training observations from the
fitting process, and then applying the machine learning method to
those held out observations.
Overview: Model Selection
By far, the most important use of validation is for model selection,
which we will discuss in greater detail next week.
This could be the choice between a linear model and a nonlinear
model, the choice of the order of polynomial in a model, the choice of
a regularization parameter, or any other choice that affects the
learning process.
In almost every learning situation, there are some choices to be made
and we need a principled way of making these choices.
The leap is to realize that validation can be used to estimate the out-
of-sample error for more than one model.
                Overview: Model Selection (cont.)
Suppose we have M models; validation can
be used to select one of these models.
We use the training data to fit the model,
and we evaluate each model on the
validation set to obtain the validation
errors.
It is now a simple matter to select the
model with the lowest validation error.
Validation Set Approach
Suppose that we would like to find a set of variables that give the
lowest validation error rate (an estimate of the test error rate).
If we have a large data set, we can achieve this goal by randomly
splitting the data into separate training and validation data sets.
Then, we use the training data set to build each possible model and
select the model that gives the lowest error rate when applied to the
validation data set.
[Diagram: the data set split into Training Data and Validation Data]
Validation Set Approach: Example
Example: we want to predict mpg from horsepower.
Two models:
  mpg ~ horsepower
  mpg ~ horsepower + horsepower²
Which model gives a better fit?
  We randomly split (50/50) the 392 observations into training and validation data sets, and we fit both models using the training data.
  Next, we evaluate both models using the validation data set.
  Winner = the model with the lowest validation (testing) MSE. A sketch follows below.
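A sketch of the validation-set approach for choosing between the linear and quadratic models, assuming scikit-learn; synthetic data stands in for the actual mpg/horsepower data set.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
hp = rng.uniform(50, 200, size=392)                     # invented predictor
mpg = 50 - 0.3 * hp + 0.0007 * hp**2 + rng.normal(0, 2, size=392)

hp_tr, hp_va, y_tr, y_va = train_test_split(hp, mpg, test_size=0.5,
                                            random_state=1)

for degree in (1, 2):                # mpg ~ hp  vs.  mpg ~ hp + hp^2
    X_tr = np.vander(hp_tr, degree + 1)   # polynomial design matrix
    X_va = np.vander(hp_va, degree + 1)
    fit = LinearRegression().fit(X_tr, y_tr)
    mse = mean_squared_error(y_va, fit.predict(X_va))
    print(degree, round(mse, 2))     # winner = lowest validation MSE
```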
              Validation Set Approach: Example Results
Left Panel: Validation error estimates for a single split into
training and validation data sets.
Right Panel: Validation error estimates for multiple splits;
shows the test error rate is highly variable.
              Validation Set Approach: Review
Advantages:
  Conceptually simple and easy implementation.
Drawbacks:
  The validation set error rate (MSE) can be highly variable.
  Only a subset of the observations (those in the training set) are used
  to fit the model.
  Machine learning methods tend to perform worse when trained on
  fewer observations.
  Thus, the validation set error rate may tend to overestimate the test
  error rate for the model fit on the entire data set.
Leave-One-Out Cross-Validation
Instead of creating two subsets of comparable size, a single observation is used for the validation set, and the remaining (n − 1) observations make up the training set.

LOOCV Algorithm:
  – Split the entire data set of size n into:
    • Blue = training data set
    • Beige = validation data set
  – Fit the model using the training data set
  – Evaluate the model using the validation set and compute the corresponding MSE
  – Repeat this process n times, producing n squared errors; their average estimates the test MSE:

  $CV_{(n)} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{MSE}_i$
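A minimal LOOCV sketch matching the algorithm above, assuming scikit-learn; the data is invented for illustration.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 1))
y = 2 * X[:, 0] + rng.normal(size=30)

mses = []
for train_idx, test_idx in LeaveOneOut().split(X):   # n splits of size 1
    fit = LinearRegression().fit(X[train_idx], y[train_idx])
    err = y[test_idx] - fit.predict(X[test_idx])
    mses.append(float(err[0] ** 2))                  # one squared error
print(np.mean(mses))   # CV(n): the average of the n squared errors
```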
              Validation Set Approach vs. LOOCV
LOOCV has far less bias and, therefore, tends not to overestimate the
test error rate.
Performing LOOCV multiple times always yields the same results
because there is no randomness in the training/validation set splits.
LOOCV is computationally intensive because the model has to be fit n times. However, there is a shortcut with OLS linear or polynomial regression (where hᵢ is the leverage):

  $CV_{(n)} = \frac{1}{n}\sum_{i=1}^{n} \left( \frac{y_i - \hat{y}_i}{1 - h_i} \right)^2$
                K-Fold Cross-Validation
Probably the simplest and most widely used method for estimating prediction
error.
This method directly estimates the average prediction error when the
machine learning method is applied to an independent test sample.
Ideally, if we had enough data, we would set aside a validation set (as
previously described) and use it to assess the performance of our prediction
model.
To finesse the problem, K-fold cross-validation uses part of the available data
to fit the model, and a different part to test it.
K-Fold Cross-Validation (cont.)
We use this method because LOOCV is computationally intensive. We randomly divide the data set into K folds (typically K = 5 or 10).
 The first fold is treated as a validation set, and the method is fit on the remaining K − 1 folds. The MSE is computed on the observations in the held-out fold. The process is repeated K times, taking out a different part each time.
 By averaging the K estimates of the test error, we get an estimated validation (test) error rate for new observations.
K-Fold Cross-Validation (cont.)
Let the K folds be C1, ..., CK, where Ck denotes the indices of the observations in fold k. There are nk observations in fold k: if n is a multiple of K, then nk = n / K.
Compute:

  $CV_{(K)} = \sum_{k=1}^{K} \frac{n_k}{n} \mathrm{MSE}_k$, where $\mathrm{MSE}_k = \frac{1}{n_k} \sum_{i \in C_k} (y_i - \hat{y}_i)^2$

and $\hat{y}_i$ is the fitted value for observation i, obtained from the data with fold k removed.
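A minimal K-fold CV sketch implementing the formula above with K = 5, assuming scikit-learn; the synthetic data is illustrative.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 2 * X[:, 0] + rng.normal(size=100)

n, K = len(y), 5
cv_err = 0.0
for tr, te in KFold(n_splits=K, shuffle=True, random_state=1).split(X):
    fit = LinearRegression().fit(X[tr], y[tr])
    mse_k = np.mean((y[te] - fit.predict(X[te])) ** 2)
    cv_err += (len(te) / n) * mse_k          # weight each fold by n_k / n
print(cv_err)   # CV(K) estimate of the test error
```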
             K-Fold Cross-Validation vs. LOOCV
Left Panel: LOOCV Error Curve
Right Panel: 10-fold CV run nine separate times, each with a different
random split of the data into ten parts.
Note: LOOCV is a special case of K-fold, where K = n
Bias-Variance Trade-off for K-Fold Cross-Validation
Which is better, LOOCV or K-fold CV?
    LOOCV is more computationally intensive than K-fold CV
    From the perspective of bias reduction, LOOCV is preferred to K-fold CV (when K < n)
    However, LOOCV has higher variance than K-fold CV (when K < n)
    Thus, we see the bias-variance trade-off between the two resampling methods
We tend to use K-fold CV with K = 5 or K = 10, as these values have been shown
empirically to yield test error rate estimates that suffer neither from excessively
high bias nor from very high variance
Cross-Validation on Classification Problems
We will cover classification problems in more detail later in the course, but we briefly show how CV can be used when Y is qualitative (categorical) as opposed to quantitative. Here, rather than use MSE to quantify test error, we instead use the number of misclassified observations.
LOOCV Error Rate: $CV_{(n)} = \frac{1}{n}\sum_{i=1}^{n} I(y_i \neq \hat{y}_i)$
We use CV as follows:
 Divide the data into K folds; hold out one part and fit using the remaining data (compute the error rate on the held-out data); repeat K times.
 CV Error Rate: the average over the K error rates we have computed.
Cross-Validation: Wrong Way
Cross-Validation: Right Way
          Summary
Resampling methods for machine learning model selection
and assessment.
The validation set approach, leave-one-out cross-validation,
and K-fold cross-validation.