Unit 1 Machine Learning
Machine Learning
• Machine Learning (ML) is a sub-field of Artificial Intelligence (AI)
• The goal of machine learning generally is to understand the structure
of data and fit that data into models that can be understood and
utilized by people.
• Hence, ML algorithms enable computers to learn from data and
improve themselves without being explicitly programmed.
• It is a continuously developing field.
Machine Learning
• Machine Learning is programming computers to optimize a
performance criterion using example data or past experience.
• We have a model defined up to some parameters.
• Learning is the execution of a computer program to optimize the
parameters of the model using training data.
• The model may be predictive, to make predictions in the future, or
descriptive, to gain knowledge from data, or both.
• Machine Learning uses the theory of statistics in building mathematical
models.
Machine Learning
• The role of Computer Science is twofold:
• First: in training, we need efficient algorithms to solve the optimization
problems and to store and process the massive amounts of data we usually have
• Second: once the model is learned, its solution needs to be efficient as well
• In some applications, the efficiency of the learning algorithm (its space and
time complexity) is as important as its predictive accuracy
Machine Learning
• Machine Learning applications:
• Facial Recognition
• Optical Character Recognition
• Recommender Engines (what music to listen to, what movie or show to watch, etc.)
• Self-driving cars
• Prediction of customer loan applications (probability of default)
• Image Recognition
• Speech Recognition (translation of spoken words into text)
• Medical diagnosis (diagnosis of diseases) – based on images or data
• Financial industry and trading (fraud transactions etc)
Machine Learning Life Cycle
1. Gathering Data
2. Data Preprocessing
3. Model Development
4. Model Testing
5. Model Deployment
Machine Learning Life Cycle
• Data Gathering:
• Identification of various sources and collection of data
• Data Preprocessing:
• The data is analyzed for missing values, duplicate values, invalid data, etc. using various
analytical techniques
• This stage also includes feature extraction, feature analysis, and data visualization
• Model Development
• Develop model using machine learning algorithms and train it using the dataset
• Training is important: model understands various patterns, classes, rules, features
• Model Testing
• The trained model is tested on test dataset and model accuracy is checked
Machine Learning Life Cycle
• Model deployment:
• Involves integrating a machine learning model into an existing production
environment that takes input and returns output to make business decisions
based on data.
• Various technologies that you can use to deploy machine learning models are:
• Docker, Kubernetes, AWS SageMaker, MLFlow, Azure Machine Learning
Service
• Model Monitoring
• Monitoring the deployed model for factors like errors, crashes, and
latency, and most importantly to ensure that the model is maintaining the
desired performance.
Types of Machine Learning
• Machine Learning is classified in three types:
• Supervised Learning
• Classification
• Regression
• Unsupervised Learning
• Clustering
• Association
• Dimensionality Reduction
• Reinforcement Learning
• Policy/Decision Making
Types of Machine Learning
Supervised Learning
• In Supervised Learning, the system is presented with labeled data, which
means that each example is tagged with the correct label.
• The goal is to approximate the mapping function so well that when you have new
input data (x), you can predict the output variables (Y) for that data.
• Example: Identifying Spam Emails
Supervised Learning
• Example: Identifying Spam Emails
• Initially some data is taken and marked as ‘Spam’ or ‘Not Spam’.
• This labeled data is used to train the supervised model
• Once it is trained the model can be tested with some test mails (test
data) and see whether the model is able to predict the right output
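The spam example can be sketched with a toy keyword-count classifier; this is not any specific algorithm from the slides, and all messages and labels below are made up for illustration.

```python
# Hedged sketch of the spam example: label a message by which training
# class (Spam / Not Spam) shares more words with it. Toy data only.
from collections import Counter

spam = ["win money now", "free money offer"]
ham = ["meeting at noon", "project status update"]

spam_words = Counter(w for msg in spam for w in msg.split())
ham_words = Counter(w for msg in ham for w in msg.split())

def classify(message):
    """Score a message against each class's word counts."""
    words = message.split()
    spam_score = sum(spam_words[w] for w in words)
    ham_score = sum(ham_words[w] for w in words)
    return "Spam" if spam_score > ham_score else "Not Spam"

print(classify("free money"))         # Spam
print(classify("status of meeting"))  # Not Spam
```

A real spam filter would use far more data and a probabilistic model, but the train-on-labels, predict-on-new-mail flow is the same.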
Sample Dataset – Data Splitting
• Recoverable rows of the play-tennis sample dataset (Day, Outlook, Temperature, Humidity, Wind, PlayTennis), split between training and test sets:
  D3   Overcast  Hot   High    Weak    Yes
  D4   Rain      Mild  High    Weak    Yes
  D5   Rain      Cool  Normal  Weak    Yes
  D12  Overcast  Mild  High    Strong
  D13  Overcast  Hot   Normal  Weak
Example of a hypothesis class: the class of family cars is a rectangle in the price-engine power space.
Hypothesis
• Hypothesis: an idea that is suggested as a possible explanation for something but
has not yet been shown to be correct or true.
• The aim is to find h ∈ H that is as similar as possible to C.
• To evaluate how well hypothesis h matches C we find empirical error.
• It is the proportion of training instances where the predictions of h do not match
the required values given in X.
• The error of hypothesis h given the training set X is:
  E(h | X) = \sum_{t=1}^{N} 1\left(h(x^t) \neq r^t\right)
• Hypothesis class H is the set of all possible rectangles: each quadruple
(p1^h, p2^h, e1^h, e2^h) defines one hypothesis h from H, and we need to choose the best one.
• We need to find the values of these 4 parameters, given the training set, so that
the rectangle includes all positive examples and none of the negative examples.
• If the parameters are real valued, there are infinitely many h for which E is 0.
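The rectangle hypothesis and the empirical error count E(h | X) can be sketched directly; the training points and the quadruple (p1, p2, e1, e2) below are hypothetical values chosen for illustration.

```python
# Sketch of a rectangle hypothesis in the price-engine power space and
# its empirical error on a (made-up) training set.

def h(price, power, p1, p2, e1, e2):
    """Rectangle hypothesis: predict 1 if the point lies inside, else 0."""
    return 1 if (p1 <= price <= p2 and e1 <= power <= e2) else 0

# Training set X: (price, engine power, label r); labels are hypothetical
X = [(15, 100, 1), (18, 120, 1), (30, 200, 0), (8, 60, 0)]

# One hypothesis (p1, p2, e1, e2) drawn from the class H
params = (10, 20, 80, 150)

# E(h|X): count of training instances where h's prediction misses r
errors = sum(h(p, e, *params) != r for p, e, r in X)
print(errors)  # 0: this rectangle is consistent with the training set
```

Any rectangle between the most specific hypothesis S and the most general hypothesis G would give the same zero error on this set.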
Hypothesis
• C is the actual class and h is our induced hypothesis.
• False Negative: a point where C is 1 but h is 0
• False Positive: a point where C is 0 but h is 1
• Other points (true positives and true negatives) are correctly classified.
Hypothesis
• Generalization: Given a future example somewhere close to the boundary between
positive and negative examples, different candidate hypotheses may make different
predictions.
• This is the problem of generalization: how well our hypothesis will correctly
classify future examples that are not part of the training set.
• Most Specific Hypothesis, S: the tightest rectangle that includes all positive examples
and none of the negative examples. The actual class C may be larger than S but is never
smaller.
• Most General Hypothesis G – largest rectangle that includes all positive examples
and none of the negative examples.
• Any h ∈ H between S and G is a valid hypothesis with no error (consistent with the
training set), and such h make up the version space.
• Given another training set, the parameters of S, G, the version space, and hence the
learned hypothesis h can be different.
Hypothesis
• For a Boolean function of d inputs, to end up with a single hypothesis we need to see
all 2^d training examples.
• If the training set we are provided with contains only a small subset of all possible
instances (as it usually does), the solution is not unique.
• After seeing N example cases, there remain 2^(2^d − N) possible functions.
• This is an example of an ill-posed problem, where the data by itself is not sufficient
to find a unique solution.
Model Selection – Boolean Function Example
d = 2 (attributes x1, x2)
Number of possible inputs: 2^d = 2 × 2 = 4
Number of possible Boolean functions (hypotheses): 2^(2^d) = 2^4 = 16
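The counting argument above can be checked by brute force for d = 2; the two observed training examples below are an arbitrary illustration.

```python
# For d = 2 Boolean attributes there are 2^d = 4 inputs and 2^(2^d) = 16
# possible Boolean functions; after N observed examples, 2^(2^d - N)
# functions remain consistent. The observed pairs below are hypothetical.
from itertools import product

d = 2
inputs = list(product([0, 1], repeat=d))          # 4 input combinations
functions = list(product([0, 1], repeat=2 ** d))  # 16 possible truth tables

# Suppose N = 2 training examples have been seen
observed = {(0, 0): 0, (1, 1): 1}

consistent = [f for f in functions
              if all(f[inputs.index(x)] == r for x, r in observed.items())]
print(len(functions), len(consistent))  # 16 4  (2^(4-2) = 4 remain)
```

With all 2^d = 4 examples observed, exactly one function would remain, matching the ill-posed-problem argument in the slides.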
Model Selection and Generalization – Inductive bias
• If learning is ill-posed, and the data is not sufficient to find the solution, we
should make some extra assumptions to have a unique solution with the data.
• The set of assumptions we make to have learning possible is called the inductive
bias of the learning algorithm.
• One way to introduce inductive bias is when we assume a hypothesis class H.
• In learning class of family cars, there are infinitely many ways of separating the positive
examples from the negative ones.
• Assuming the shape of a rectangle is one inductive bias
• Choosing the rectangle with the largest margin is another inductive bias
• In linear regression, assuming a linear function is an inductive bias and among all
lines, choosing the one that minimizes squared error is another inductive bias.
Model Selection - Generalization
• Learning is not possible without inductive bias, and now the question is
how to choose the right bias.
• This is called model selection, which is choosing between possible H.
• The aim of machine learning is rarely to replicate the training data but
to predict correctly for new cases.
• That is we would like to be able to generate the right output for an input
instance outside the training set, one for which the correct output is not
given in the training set.
• How well a model trained on the training set predicts the right output
for new instances is called generalization.
Model Selection – Underfitting and Overfitting
• For best generalization, we should match the complexity of the hypothesis class H
with the complexity of the function underlying the data.
• If H is less complex than the function, we have underfitting:
• Ex: when trying to fit a line to data sampled from a third-order polynomial.
• In such a case, as we increase the model complexity, the training error decreases.
• But if we have H that is too complex, the data is not enough to constrain it and we
may end up with a bad hypothesis, h ∈ H,
• Ex: when fitting two rectangles to data sampled from one rectangle.
• Or, if there is noise, an overcomplex hypothesis may learn not only the underlying function but
also the noise in the data, and may make a bad fit.
• Ex: fitting a sixth-order polynomial to noisy data sampled from a third-order polynomial
• This is called overfitting (having more training data helps but only up to a certain
point)
• Given a training set and H, we can find h ∈ H that has the minimum training error
but if H is not chosen well, no matter which h ∈ H we pick, we will not have good
generalization.
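The underfitting/overfitting contrast can be sketched on noise-free data from a third-order function: a constant model underfits (nonzero training error), while an exact interpolant drives training error to zero, which with noisy targets would mean memorising the noise. The sample points below are illustrative.

```python
# Underfitting vs. overfitting sketch on data from y = x^3.

def lagrange(points, x):
    """Evaluate the polynomial that interpolates `points` exactly, at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Data sampled (noise-free, for clarity) from a third-order function
data = [(x, x ** 3) for x in [-2, -1, 0, 1, 2]]

# Underfit: the best constant model (mean of targets) leaves training error
mean_y = sum(y for _, y in data) / len(data)
underfit_error = sum((y - mean_y) ** 2 for _, y in data)

# A full-degree interpolant drives training error to zero; with noisy
# targets it would also have memorised the noise (overfitting)
interp_error = sum((y - lagrange(data, x)) ** 2 for x, y in data)

print(underfit_error > 0, interp_error == 0.0)  # True True
```

The point is the one made in the slides: training error always falls as complexity grows, but zero training error alone says nothing about generalization.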
Underfitting & Overfitting
• Accuracy = (TP + TN) / (P + N)   (all instances)
• Precision: out of the examples marked as positive by our learning
algorithm/model, how many are actually positive:
  Precision = TP / (TP + FP)
• Recall: out of all positive examples, how many are correctly predicted
as positive:
  Recall = TP / (all positive examples) = TP / (TP + FN)
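The three formulas above can be checked on a toy set of true and predicted labels (the labels below are made up):

```python
# Accuracy, precision, and recall from confusion-matrix counts.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]   # hypothetical ground-truth labels
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]   # hypothetical model predictions

tp = sum(p == 1 and t == 1 for p, t in zip(y_pred, y_true))
tn = sum(p == 0 and t == 0 for p, t in zip(y_pred, y_true))
fp = sum(p == 1 and t == 0 for p, t in zip(y_pred, y_true))
fn = sum(p == 0 and t == 1 for p, t in zip(y_pred, y_true))

accuracy = (tp + tn) / len(y_true)   # (TP + TN) / all instances
precision = tp / (tp + fp)           # of predicted positives, how many real
recall = tp / (tp + fn)              # of real positives, how many found
print(accuracy, precision, recall)   # 0.75 0.75 0.75
```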
Cross Validation
• Cross-Validation is a technique used to test a model’s ability to predict unseen
data (data not used to train the model).
• It is useful when we have limited data and the test set is not large enough.
• Cross Validation splits the training data into k blocks.
• In each iteration, the model trains on k − 1 blocks and is validated on the
held-out block.
• The average error over all the iterations is used to evaluate the model.
• Types: K-Fold Cross Validation, Leave-One-Out Cross Validation, Monte Carlo
(repeated random sub-sampling) Cross Validation, the validation set approach,
and Stratified K-Fold Cross Validation.
• K-fold cross-validation is the most common technique for model evaluation and
model selection in machine learning.
• The idea behind K-Fold Cross Validation is that each sample in the dataset gets
an equal opportunity of appearing in the test set.
K-Fold Cross Validation
Steps of K-Fold Cross Validation:
1. Split training data into K equal parts
2. Fit the model on k-1 parts (merged as training set) and calculate test error using
the fitted model on the kth part (test set)
3. Repeat k times, using each data subset as the test set once.
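The steps above can be sketched in plain Python with K = 4; the data and the "model" (just the training mean) are deliberately trivial assumptions so the split-fit-test loop stays visible.

```python
# Manual K-fold cross-validation: K iterations, each using one block as
# the test set and the remaining K-1 blocks as the training set.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # toy 1-D dataset
K = 4
fold_size = len(data) // K

fold_errors = []
for k in range(K):
    test = data[k * fold_size:(k + 1) * fold_size]             # k-th block
    train = data[:k * fold_size] + data[(k + 1) * fold_size:]  # other blocks
    model = sum(train) / len(train)        # "fit": predict the training mean
    fold_errors.append(sum((x - model) ** 2 for x in test) / len(test))

cv_error = sum(fold_errors) / K   # average error over all K iterations
print(len(fold_errors), round(cv_error, 3))
```

Each sample appears in the test set exactly once, which is the property the slides highlight.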
Linear Regression – Cost Computation
• Hypothesis in matrix form:
  [y1 y2 y3 y4] = [θ0 θ1] [ 1  1  1  1 ; x1 x2 x3 x4 ]
• Squared-error cost on the four training examples:
  J(θ) = (1 / (2 × 4)) [(0.057 − 3)² + (0.161 − 10)² + (0.135 − 4)² + (0.187 − 3)²]
       = (1/8)(8.66 + 96.80 + 14.93 + 7.91) = 16.03
Linear Regression
• Gradient Descent – update θ0 (here j = 0):
  θ0 = 0.005 − (0.001/4) [(0.057 − 3) + (0.161 − 10) + (0.135 − 4) + (0.187 − 3)]
     = 0.005 − (0.001/4)(−2.943 − 9.839 − 3.865 − 2.813)
     = 0.005 − (0.001/4)(−19.46)
     = 0.005 + 0.0048 = 0.0098
• Gradient Descent – update θ1 (here j = 1):
  θ1 = 0.026 − (0.001/4) [(0.057 − 3)·2 + (0.161 − 10)·6 + (0.135 − 4)·5 + (0.187 − 3)·7]
     = 0.026 − (0.001/4)(−5.886 − 59.034 − 19.325 − 19.691)
     = 0.026 − (0.001/4)(−103.936)
     = 0.026 + 0.0259 = 0.0519
Linear Regression
• Iteration 3: θ0 = 0.098 and θ1 = 0.051
• Predictions:
  y1 = 0.098 × 1 + 0.051 × 2 = 0.2
  y2 = 0.098 × 1 + 0.051 × 5 = 0.353
  y3 = 0.098 × 1 + 0.051 × 6 = 0.404
  y4 = 0.098 × 1 + 0.051 × 7 = 0.455
• In matrix form:
  [y1 y2 y3 y4] = [0.098 0.051] [ 1 1 1 1 ; 2 5 6 7 ]
• Update θ0:
  θ0 = 0.098 − (0.001/4)(−2.8 − 9.647 − 3.596 − 2.545)
     = 0.098 − (0.001/4)(−18.588)
     = 0.098 + 0.0046 = 0.102
• Update θ1:
  θ1 = 0.051 − (0.001/4)(−2.8·2 − 9.647·6 − 3.596·5 − 2.545·7)
     = 0.051 − (0.001/4)(−5.6 − 57.882 − 17.98 − 17.815)
     = 0.051 − (0.001/4)(−99.277)
     = 0.051 + 0.0248 = 0.0758
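The worked updates above can be sketched as batch gradient descent in Python; the learning rate (0.001) and dataset x = [2, 5, 6, 7], r = [3, 10, 4, 3] follow the example, while the zero initialization of θ and the iteration count are assumptions for illustration.

```python
# Batch gradient descent for simple linear regression y = theta0 + theta1*x,
# run from an assumed zero start rather than the slide's intermediate values.
x = [2.0, 5.0, 6.0, 7.0]     # inputs from the worked example
r = [3.0, 10.0, 4.0, 3.0]    # targets from the worked example
alpha = 0.001                # slide's learning rate
theta0, theta1 = 0.0, 0.0    # assumed initial parameters
N = len(x)

for _ in range(1000):
    pred = [theta0 + theta1 * xi for xi in x]
    err = [p - ri for p, ri in zip(pred, r)]
    # Simultaneous updates: theta_j -= alpha * mean(err * d(pred)/d(theta_j))
    g0 = sum(err) / N
    g1 = sum(e * xi for e, xi in zip(err, x)) / N
    theta0 -= alpha * g0
    theta1 -= alpha * g1

# Same cost definition as the slides: J = (1/2N) * sum of squared errors
mse = sum((theta0 + theta1 * xi - ri) ** 2 for xi, ri in zip(x, r)) / (2 * N)
print(round(theta0, 3), round(theta1, 3), round(mse, 3))
```

Each loop iteration performs exactly the kind of θ0 and θ1 updates computed by hand above; the cost steadily drops from its initial value.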