
Lecture - 2 & 3

The document discusses supervised and unsupervised learning. It covers topics like classification and regression algorithms for supervised learning, including KNN, linear regression, logistic regression, decision trees, and neural networks. For unsupervised learning, it discusses dimensionality reduction, feature extraction, and clustering algorithms like K-means.

Uploaded by

Yonatan tamiru

Lecture – 2 & 3

Supervised and Unsupervised Learning


Outline

Supervised Learning
• Classification & regression
• Supervised machine learning algorithms
✓ K-nearest Neighbor, Linear Models
✓ Naïve Bayes Classifiers
✓ Decision trees
✓ Support Vector Machine
✓ Neural networks (Deep learning)

Unsupervised learning & preprocessing
• Types of unsupervised learning
• Pre-processing & scaling
• Dimensionality reduction & feature extraction
✓ Principal component analysis (PCA)
• Clustering
✓ K-means Clustering
✓ Agglomerative clustering
✓ Evaluating & comparing clustering algorithms

Supervised Learning
Supervised learning is a machine learning method in which a model is trained on
labeled examples so that it can analyze data and predict outcomes.
The goal is to enable the algorithm to identify patterns and relationships within
the data so that it can accurately label new, unseen data.
Key Points:
✓It involves training a machine on labeled data.
✓Labeled data consists of examples paired with the correct answer or classification.
✓The machine learns the relationship between inputs (for example, fruit images) and
outputs (fruit labels).
✓The trained machine can then make predictions on new, unlabeled data.
Steps Involved in Supervised Learning:
✓Determine the type of training dataset.
✓Collect the labeled training data.
✓Split the dataset into a training set, a validation set, and a test set.
✓Determine the input features of the training dataset.
✓Choose a suitable algorithm for the model, such as SVM or a decision tree.
✓Execute the algorithm on the training set.
✓Evaluate the accuracy of the model on the test set.
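
The splitting step can be sketched in plain Python as follows. This is only a minimal illustration: the function name `split_dataset`, the 70/15/15 proportions, and the toy data are assumptions made for the example, not part of the lecture.

```python
import random

def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle labeled samples and split them into train/validation/test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains is the test set
    return train, val, test

# Hypothetical labeled data: (feature, label) pairs.
data = [(x, x % 2) for x in range(20)]
train, val, test = split_dataset(data)
print(len(train), len(val), len(test))
```

The model is fitted on `train`, tuned on `val`, and scored once on `test`, so the final accuracy is measured on data the model has not seen.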


How Supervised Learning Works?
Types of Supervised Learning Algorithms
Classification: Classification algorithms are used when the output
variable is categorical, such as “Yes”/“No”, “Male”/“Female”,
“True”/“False”, “Red”/“Blue”, or “disease”/“no disease”.
Regression: A regression problem is one where the output variable is a real
number value, such as “dollars” or “weight”. It is used for the prediction
of continuous variables, as in weather forecasting and market trend analysis.
Types of Supervised Learning: Regression
Regression is used to predict continuous values, such as house prices or
stock prices. Regression algorithms learn a function that maps from
the input features to the output value.
Some common regression algorithms include:
✓Linear Regression
✓Polynomial Regression
✓Logistic Regression (despite its name, it is used for classification)
✓SVM Regression
✓Decision Tree Regression
✓Random Forest Regression
Linear Regression

It is used to find the linear relationship between the dependent and
independent variables for predictive analysis.

The equation for Linear Regression: Y = a + b ⋅ X

where:
✓X is the input or independent variable
✓Y is the output or dependent variable
✓a is the intercept, and b is the coefficient of X
Linear Regression
In the plot, the scattered points are the heights X and weights Y of
21-year-old candidates, and the straight line is the best fit of weight
against height.
To get this best-fit line, the best values of a and b should be found. By
adjusting the values of a and b, the errors in the prediction of Y can be
reduced.
This is how linear regression helps in finding the linear relationship and
predicting the output.
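
One common way to find the best a and b is ordinary least squares, which has a closed-form solution for a single input variable. The sketch below assumes that method. The function name `fit_line` and the height and weight values are made up for illustration.

```python
def fit_line(xs, ys):
    """Return intercept a and slope b that minimize squared error for Y = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b

# Hypothetical heights (cm) and weights (kg), invented for this example.
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 66, 73, 81]
a, b = fit_line(heights, weights)
predicted_weight = a + b * 175  # prediction for a new height
print(a, b, predicted_weight)
```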
Linear Regression
Selection Criteria
✓Classification and regression capabilities
✓Data quality
✓Computational complexity
✓Comprehensible and transparent
Where is linear regression used?
✓Evaluating trends and estimating sales
✓Analyzing the impact of price changes
✓Assessing risk in the financial services and insurance domains
Linear Regression Algorithm
Understanding Linear Regression
Mean Squared Error
Logistic Regression

Logistic regression comes into play when the dependent variable is
categorical or has binary outcomes, such as “yes” or “no,” or “true” or
“false.”

Logistic regression is designed for binary classification problems. It helps
us predict discrete values for variables.

Example: predicting whether a student will pass or fail a test based on
factors like their study time and previous scores.
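
The pass/fail example can be sketched with a one-feature logistic regression. The sketch assumes the standard sigmoid model trained by batch gradient descent on the log loss. The data, learning rate, epoch count, and the names `sigmoid` and `train_logistic` are illustrative choices, not taken from the lecture.

```python
import math

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical data: hours studied and outcome (1 = pass, 0 = fail).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = train_logistic(hours, passed)

p_pass = sigmoid(w * 6.5 + b)        # estimated probability of passing
label = 1 if p_pass >= 0.5 else 0    # threshold the probability to get a class
print(p_pass, label)
```

The model outputs a probability. The discrete prediction comes from thresholding it, here at 0.5.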
Logistic Regression
Types of Logistic Regression
Logistic Regression can be classified into three types:

Binomial: the dependent variable has only two possible values,
such as 0 or 1, Pass or Fail, etc.

Multinomial: the dependent variable has 3 or more possible unordered
values, such as “cat”, “dog”, or “sheep”.

Ordinal: the dependent variable has 3 or more possible ordered
values, such as “Low”, “Medium”, or “High”.
Logistic Regression vs Linear Regression

Linear Regression | Logistic Regression
Continuous variable | Categorical variable
Solves regression problems | Solves classification problems
Straight line | S-curve
Polynomial Regression
It is a special case of linear regression where we fit a polynomial equation
to data with a curvilinear relationship between the target variable
and the independent variables.
In a curvilinear relationship, the value of the target variable changes in a
non-uniform manner with respect to the predictor(s).
In Linear Regression, with a single predictor, we have the following
equation: $Y = \theta_0 + \theta_1 x$ where,
Y is the target,
x is the predictor,
$\theta_0$ is the bias, and $\theta_1$ is the weight in the regression equation.
Polynomial regression adds higher powers of the predictor:
$Y = \theta_0 + \theta_1 x + \theta_2 x^2 + \dots + \theta_n x^n$
Polynomial Regression Question

Read about the implementation of polynomial regression, with an example,
based on the following steps.

Step 1: Data preparation
Step 2: Selecting the polynomial degree
Step 3: Model training
Step 4: Model evaluation
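
The four steps can be sketched in plain Python as below. This is one possible approach, assuming a least-squares fit solved through the normal equations. The helper names (`solve_linear_system`, `fit_polynomial`, `predict`) and the data are hypothetical, and a numerical library would normally be used instead.

```python
def solve_linear_system(A, y):
    """Solve A w = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w

def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (X^T X) w = X^T y."""
    X = [[x ** p for p in range(degree + 1)] for x in xs]  # columns 1, x, x^2, ...
    k = degree + 1
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(k)]
    return solve_linear_system(XtX, Xty)

def predict(coeffs, x):
    """Evaluate theta_0 + theta_1*x + theta_2*x^2 + ..."""
    return sum(c * x ** p for p, c in enumerate(coeffs))

# Step 1: data preparation (hypothetical points on y = 1 + 2x + 3x^2).
xs = [-2, -1, 0, 1, 2, 3]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
# Step 2: select the polynomial degree.
degree = 2
# Step 3: model training.
coeffs = fit_polynomial(xs, ys, degree)
# Step 4: model evaluation (mean squared error on the training points).
mse = sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(coeffs, mse)
```

The model is still linear in the parameters, which is why the same least-squares machinery as linear regression applies.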
Decision Tree Regression

✓A decision tree is used to explain the sequence of actions that
must be performed to get the desired output.
✓It is a hierarchical diagram that shows the actions.
✓An algorithm can be created for a decision tree on the basis of
the set hierarchy of actions.
✓Look at the next example.
Decision Tree Regression
Supervised Learning: Classification Algorithms

A classification algorithm aims to organize different inputs into specific
categories or groups, relying on the labeled data it has been trained on.

Such algorithms are used for separating emails into spam or non-spam, or
classifying customer feedback as positive or negative.

Other examples are recognizing handwritten letters and numbers, or
categorizing drugs into various groups.
Classification Algorithms
Classification involves identifying and studying specific elements to
determine their appropriate category or group.

Classification algorithms include:


✓Support vector machines

✓K-Nearest Neighbor (KNN)

✓Naive Bayes

✓Random forest

✓Neural networks
Support Vector Machine (SVM)
▪ The SVM algorithm creates a hyperplane to segregate n-dimensional space
into classes and identify the correct category of new data points.

▪ The extreme cases that help create the hyperplane are called support vectors,
hence the name Support Vector Machine.

▪ An SVM is a type of algorithm that is used for both classification and
regression tasks.
✓Support Vector Regression: It is an extension of SVM used for predicting
continuous values.
✓Support Vector Classifier: It aims to find the best hyperplane that maximizes the
margin between data points of different classes.
Support Vector Machine (SVM)
Support Vector Machine Regression
SVM Use Case
✓Face Detection

✓Text and Hypertext Categorization

✓Classification of Images

✓Bioinformatics

✓Remote Homology Detection

✓Handwriting Detection

✓Generalized Predictive Control


K-Nearest Neighbors (KNN)
KNN works by finding the k training examples closest to a given input and then predicting
the class or value based on the majority class or average value of these neighbors.

The performance of KNN is influenced by the choice of k and the distance metric
used to measure proximity. KNN is intuitive, but it can be sensitive to noisy data and
requires careful selection of k for good results.

KNN can be used for both classification and regression tasks.

KNN Regression: It predicts continuous values by averaging the outputs of the k closest
neighbors.

KNN Classification: Data points are classified based on the majority class of their k
closest neighbors.
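
Both variants can be sketched in a few lines. The example assumes Euclidean distance and k = 3. The function names and data points are made up for illustration.

```python
import math
from collections import Counter

def k_nearest(train, query, k):
    """Return the targets of the k training points closest to query (Euclidean)."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    return [target for _, target in by_distance[:k]]

def knn_classify(train, query, k=3):
    """Predict the majority class among the k nearest neighbors."""
    return Counter(k_nearest(train, query, k)).most_common(1)[0][0]

def knn_regress(train, query, k=3):
    """Predict the average value of the k nearest neighbors."""
    neighbors = k_nearest(train, query, k)
    return sum(neighbors) / len(neighbors)

# Hypothetical labeled points: (features, class).
points_cls = [((1, 1), "red"), ((1, 2), "red"), ((2, 1), "red"),
              ((6, 6), "blue"), ((7, 6), "blue"), ((6, 7), "blue")]
# Hypothetical numeric targets: (features, value).
points_reg = [((1,), 10.0), ((2,), 20.0), ((3,), 30.0), ((10,), 100.0)]

print(knn_classify(points_cls, (2, 2)))
print(knn_regress(points_reg, (2,)))
```

There is no training phase: all the work happens at prediction time, when distances to the stored examples are computed.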
Naive Bayes
It is a probabilistic ML model that specializes in classification tasks.

It applies Bayes’ Theorem, which allows us to calculate the probability of an event (A)
happening, given that another event (B) has already occurred.

The key assumption in Naive Bayes is that the predictors used in the model are
independent of one another given the class, meaning that the presence of one feature
doesn’t affect the others. This is why it’s called “naive.”
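
Bayes' Theorem states that P(A | B) = P(B | A) · P(A) / P(B). Under the independence assumption, the likelihood of several features is the product of the per-feature likelihoods. The sketch below applies this to a toy spam example. All probabilities and names are hypothetical values chosen only for illustration.

```python
def naive_bayes_posterior(priors, likelihoods, features):
    """Posterior P(class | features), assuming features are independent given the class."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for f in features:
            score *= likelihoods[cls][f]  # multiply per-feature likelihoods
        scores[cls] = score
    total = sum(scores.values())          # P(features), the normalizing constant
    return {cls: s / total for cls, s in scores.items()}

# Hypothetical class priors and word likelihoods P(word | class).
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"offer": 0.5, "meeting": 0.1},
    "ham": {"offer": 0.1, "meeting": 0.4},
}

posterior_offer = naive_bayes_posterior(priors, likelihoods, ["offer"])
posterior_both = naive_bayes_posterior(priors, likelihoods, ["offer", "meeting"])
print(posterior_offer, posterior_both)
```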

Another classifier widely used in business settings is the decision tree. It is not a
variation of Naive Bayes: a decision tree is a separate supervised learning algorithm.
Unlike a hand-drawn flowchart, it learns from data a series of conditional tests that
are used to make decisions and determine their consequences.
Random Forest Regression
▪ Made up of multiple decision trees that work together to make predictions.

▪ Each tree in the forest is trained on a different subset of the input features and data.

▪ The final prediction is made by aggregating the predictions of all the trees in the forest.

▪ Random forests are an ensemble learning technique that is used for both classification
and regression tasks.
▪ Random Forest Regression: It combines multiple decision trees to reduce overfitting
and improve prediction accuracy.
▪ Random Forest Classifier: Combines several decision trees to improve the accuracy of
classification while minimizing overfitting.
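
The aggregation step can be sketched as follows. This shows only how per-tree predictions are combined and assumes the trees have already been trained. The function names and the example predictions are hypothetical.

```python
from collections import Counter

def aggregate_classification(tree_predictions):
    """Random forest classifier output: majority vote over the trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

def aggregate_regression(tree_predictions):
    """Random forest regression output: average of the trees' values."""
    return sum(tree_predictions) / len(tree_predictions)

# Hypothetical predictions from three trees for a single input.
print(aggregate_classification(["cat", "dog", "cat"]))
print(aggregate_regression([10.0, 12.0, 14.0]))
```

Because each tree sees a different sample of rows and features, their individual errors tend to differ, and combining them smooths those errors out.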
Neural network
Neural networks are advanced algorithms that imitate the
interconnected nature of the human brain.
They consist of nodes with inputs, weights, a bias, and an output.
By adjusting these weights on training data, the networks learn complex
patterns and relationships. Networks with many layers are the basis of deep learning.
Here’s how it works: when a node’s output value surpasses a threshold,
it becomes active and passes data to the next layer.
Neural network
They then adjust their weights using gradient descent, optimizing a
loss function to minimize the difference between predicted and desired
outputs.
The goal is to bring the cost function as close to zero as possible, which
indicates that the predictions closely match the desired outputs.
Neural networks have applications in categorizing data, interpreting
sensory information, and identifying patterns. However, their use can be
limited by the need for significant computational resources.
Neural network
[Figure: a three-layer network. Layer 1 (inputs): $x_0, x_1, x_2, x_3$. Layer 2 (hidden): $a_0^{(2)}, a_1^{(2)}, a_2^{(2)}, a_3^{(2)}$. Layer 3 (“Output”): $h_\Theta(x)$.]
Neural network
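$a_i^{(j)}$ = “activation” of unit $i$ in layer $j$
$\Theta^{(j)}$ = matrix of weights controlling the function mapping from layer $j$ to layer $j + 1$

$a_1^{(2)} = g\left(\Theta_{10}^{(1)} x_0 + \Theta_{11}^{(1)} x_1 + \Theta_{12}^{(1)} x_2 + \Theta_{13}^{(1)} x_3\right)$
$a_2^{(2)} = g\left(\Theta_{20}^{(1)} x_0 + \Theta_{21}^{(1)} x_1 + \Theta_{22}^{(1)} x_2 + \Theta_{23}^{(1)} x_3\right)$
$a_3^{(2)} = g\left(\Theta_{30}^{(1)} x_0 + \Theta_{31}^{(1)} x_1 + \Theta_{32}^{(1)} x_2 + \Theta_{33}^{(1)} x_3\right)$
$h_\Theta(x) = g\left(\Theta_{10}^{(2)} a_0^{(2)} + \Theta_{11}^{(2)} a_1^{(2)} + \Theta_{12}^{(2)} a_2^{(2)} + \Theta_{13}^{(2)} a_3^{(2)}\right)$

Size of $\Theta^{(j)}$? With $s_j$ units in layer $j$ and $s_{j+1}$ units in layer $j + 1$, $\Theta^{(j)}$ has size $s_{j+1} \times (s_j + 1)$.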
Slide credit: Andrew Ng
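
The forward pass through this three-layer network can be sketched as below. The sketch assumes that the activation function g is the sigmoid and that the bias units x_0 and a_0^(2) are fixed at 1. The weight values and the helper name `layer_forward` are made up for illustration.

```python
import math

def sigmoid(z):
    """Assumed activation function g."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(theta, inputs):
    """One layer: prepend the bias unit (1), then apply g to each weighted sum."""
    with_bias = [1.0] + list(inputs)
    return [sigmoid(sum(w * a for w, a in zip(row, with_bias))) for row in theta]

# Hypothetical weights. theta1 maps layer 1 to layer 2 and has size 3 x 4,
# i.e. s_2 x (s_1 + 1). theta2 maps layer 2 to layer 3 and has size 1 x 4.
theta1 = [
    [0.1, 0.2, -0.3, 0.4],
    [-0.5, 0.1, 0.2, 0.0],
    [0.3, -0.2, 0.1, 0.5],
]
theta2 = [[0.2, -0.4, 0.6, 0.1]]

x = [1.0, 0.5, -1.0]              # x_1, x_2, x_3 (x_0 = 1 is added in layer_forward)
a2 = layer_forward(theta1, x)     # hidden activations a_1^(2), a_2^(2), a_3^(2)
h = layer_forward(theta2, a2)[0]  # network output h_Theta(x)
print(a2, h)
```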
Evaluating Supervised Learning Models
Evaluating a supervised learning model is an important step in ensuring that the
model is accurate and generalizable.
There are a number of different metrics that can be used to evaluate supervised
learning models.

For Regression
Mean Squared Error (MSE): MSE measures the average squared difference
between the predicted values and the actual values. Lower MSE values indicate
better model performance.
Root Mean Squared Error (RMSE): RMSE is the square root of MSE. It is expressed
in the same units as the target and roughly represents the standard deviation of
the prediction errors. As with MSE, lower RMSE values indicate better model performance.
Evaluating Supervised Learning Models
For Regression

Mean Absolute Error (MAE)


✓MAE measures the average absolute difference between the predicted
values and the actual values.
✓ It is less sensitive to outliers compared to MSE or RMSE.

R-squared (Coefficient of Determination)


✓ R-squared measures the proportion of the variance in the target variable
that is explained by the model.
✓Higher R-squared values indicate better model fit.
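
The four regression metrics can be computed directly from their definitions. The sketch below uses a hypothetical helper `regression_metrics` and made-up values.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE and R-squared from actual and predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)                 # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total variation
    r2 = 1 - ss_res / ss_tot
    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}

# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

In this example the single error of 2 contributes 4 to the squared-error sum but only 2 to the absolute-error sum, which is why MSE and RMSE react more strongly to outliers than MAE.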
Evaluating Supervised Learning Models
For Classification
Accuracy: the percentage of predictions that the model makes correctly.
It is calculated by dividing the number of correct predictions by the total number of
predictions.

Precision: the percentage of the model's positive predictions that are actually
correct.
It is calculated by dividing the number of true positives by the total number of positive
predictions.

Recall: a metric that measures how often the model correctly identifies
positive instances (true positives) out of all the actual positive samples in the dataset.
It is calculated by dividing the number of true positives by the number of actual
positives (true positives plus false negatives).
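
These three metrics can be sketched for a binary problem as follows. The labels are made up, and the helper name `classification_metrics` is hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall for a binary classification problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    # Guard against division by zero when there are no positive predictions/samples.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}

# Hypothetical actual and predicted labels (1 = positive, 0 = negative).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
scores = classification_metrics(y_true, y_pred)
print(scores)
```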
Applications of Supervised learning
Fraud detection: supervised models can analyze financial transactions and identify
patterns that indicate fraudulent activity, helping financial institutions
prevent fraud and protect their customers.

NLP: supervised learning plays a crucial role in NLP tasks, including sentiment
analysis, machine translation, and text summarization, enabling machines to
understand and process human language effectively.

Other areas: finance, face recognition, voice recognition, weather forecasting,
and predicting interactions between therapeutic drugs.
Applications of Supervised learning
Spam filtering: supervised models identify and classify spam emails based on their
content, helping users avoid unwanted messages.

Image classification: supervised models can automatically classify images into
different categories, such as animals or objects, facilitating tasks like image search,
content moderation, and image-based product recommendations.

Medical diagnosis: supervised models can assist in medical diagnosis by analyzing
patient data, such as medical images, test results, and patient history, to
identify patterns that suggest specific diseases or conditions.
Advantages of Supervised learning
▪ It allows collecting data and producing outputs based on previous experience.

▪ It helps to optimize performance criteria with the help of experience.

▪ It helps to solve various types of real-world computation problems.

▪ It performs classification and regression tasks.

▪ It allows estimating or mapping the result to a new sample.

▪ We have complete control over choosing the number of classes we want in
the training data.
Disadvantages of Supervised learning
✓The limitations of supervised models require careful consideration during problem
formulation, data collection, model selection, and evaluation.

✓Overfitting: the model can fit the training data too closely, which leads to poor
performance on new data.

✓Feature Engineering: Extracting relevant features from raw data is crucial for
model performance, but it is time-consuming.

✓Bias in Models: Training data biases can lead to unfair predictions.

✓It depends on labeled training data, and labeling and classifying big data can be
challenging.

✓Supervised learning cannot handle all complex tasks in Machine Learning.


What is Unsupervised learning?
It is the training of a machine using information that is neither classified
nor labeled, allowing the algorithm to act on that information without
guidance.
Here the task of the machine is to group unsorted information according to
similarities, patterns, and differences without any prior training on the data.
No teacher is provided, so the machine has to find the
hidden structure in unlabeled data by itself.
Example: grouping animal data by species without depending on labels that
already exist.
Key Points of Unsupervised learning

▪ Allows the model to discover patterns and relationships in unlabeled data.

▪ Clustering algorithms group similar data points together based on their
inherent characteristics.

▪ Feature extraction captures essential information from the data, enabling the
model to make meaningful distinctions.

▪ Label association assigns categories to the clusters based on the extracted
patterns and characteristics.
Types of Unsupervised Learning
1. Clustering: a type of unsupervised learning that is used to
group similar data points together.

Many clustering algorithms work iteratively, reassigning data points and updating
cluster centers so that points end up close to their own cluster center and
far from the data points in other clusters.

2. Association: An association rule learning problem is one where you
want to discover rules that describe large portions of your data, such
as people who buy X also tend to buy Y.
Different Types of Clustering Algorithm
The most prominent examples of clustering algorithms are:

Distribution-based methods: a clustering model in which data points are grouped
according to the probability that they belong to the same distribution.

The distribution assumed for a group is often normal (Gaussian).
Different Types of Clustering Algorithm
Centroid-based methods: these are iterative clustering algorithms in which
clusters are formed by the closeness of data points to the centroid of each cluster.

The centroid is placed so that the distance from the data points to the
center is minimized.
Finding the optimal solution is an NP-hard problem, so solutions are commonly
approximated over a number of trials.

Example: the K-means algorithm is one of the most popular centroid-based methods.
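
A minimal sketch of K-means, assuming the usual alternating scheme (assign each point to its nearest centroid, then move each centroid to the mean of its points). The data, the initial centroids, and the fixed iteration count are illustrative assumptions. In practice the initial centroids are usually chosen at random and the run is repeated over several trials.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Alternate assignment and centroid-update steps for a fixed number of rounds."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        assignments = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments

# Hypothetical 2-D points forming two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
initial = [(1, 1), (8, 8)]  # assumed starting centroids
centroids, assignments = kmeans(points, initial)
print(centroids, assignments)
```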
Different Types of Clustering Algorithm

Connectivity-based methods

✓ Similar to centroid models, they define clusters based on the closeness of
data points.

✓ They provide an extensive hierarchy of clusters that merge at specific
distances, but they lack scalability.

Example: the hierarchical clustering algorithm and its variants.
Different Types of Clustering Algorithm
Density models: these search the data space for areas with varied density
of data points.

They isolate the various density regions based on the different densities
present in the data space.

Examples: DBSCAN and OPTICS.
Different Types of Clustering Algorithm

Subspace clustering: a problem that groups data points into multiple
clusters such that the points of each cluster lie in a low-dimensional linear subspace.

▪ It is an extension of feature selection and requires a search method and
evaluation criteria.

▪ Used for computer vision problems.

▪ It raises concerns about data privacy.

▪ There are two branches of subspace clustering based on their search strategy.
Different Types of Clustering Algorithm
Subspace clustering

Top-down and bottom-up algorithms are the two methods used to find clusters in subspaces.
Top-down algorithms find an initial clustering in the full set of dimensions and
then evaluate the subspaces of each cluster. This is also called projection-based
clustering.

Bottom-up algorithms identify dense regions in low-dimensional space and
combine them to form clusters in full-dimensional space. These algorithms are
useful for discovering clusters with different patterns in different subspaces, similar
patterns across different subspaces, or those with significant variation in data
points within a cluster.
Evaluating Unsupervised Learning Models

Evaluation ensures that the model is effective and useful.

It can be more challenging than evaluating supervised learning models, as
there is no ground truth data to compare the model’s predictions to.

Different metrics are used to evaluate unsupervised learning models:

✓Silhouette score: it measures how well each data point is clustered with
its own cluster members and separated from other clusters.

It ranges from -1 to 1, with higher scores indicating better clustering.
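
The silhouette score can be sketched from its usual definition: for each point, a is the mean distance to the other points in its own cluster, b is the smallest mean distance to the points of any other cluster, and the point's score is (b - a) / max(a, b). The sketch below assumes Euclidean distance, at least two clusters, and a score of 0 for single-point clusters. The data is made up.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for a cluster with a single point
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, so it does not affect the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items() if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical points with a sensible labeling and a poor labeling.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette_score(points, [0, 0, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1])
print(good, bad)
```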


Evaluating Unsupervised Learning Models
✓Calinski-Harabasz score: it measures the ratio between the variance
between clusters and the variance within clusters. It ranges from 0 to
infinity, with higher scores indicating better clustering.

✓Adjusted Rand index: it measures the similarity between two clusterings.

Its maximum value is 1, for identical clusterings. Values near 0 indicate
chance-level agreement, and the index can be negative. Higher scores indicate
more similar clusterings.

✓Davies-Bouldin index: it measures the average similarity between each
cluster and its most similar cluster. It ranges from 0 to infinity, with lower
scores indicating better clustering.
Evaluating Unsupervised Learning Models
F1 score: it is the harmonic mean of precision and recall, which are two
metrics that are commonly used in supervised learning to evaluate
classification models. However, the F1 score can also be used to evaluate
unsupervised learning models, such as clustering models, when reference
labels are available for comparison.
Application of Unsupervised learning

It can be used for:

Anomaly detection: it can identify unusual patterns or deviations from normal
behavior in data, enabling the detection of fraud, intrusion, or system failures.

Scientific discovery: it can uncover hidden relationships and patterns in scientific
data, leading to new hypotheses and insights in various scientific fields.

Recommendation systems: it can identify patterns and similarities in user
behavior and preferences to recommend products, movies, or music that align with
their interests.
Application of Unsupervised learning

It can be used for:

Customer segmentation: it can identify groups of customers with
similar characteristics, allowing businesses to target marketing campaigns
and improve customer service more effectively.

Image analysis: it can group images based on their content, facilitating
tasks such as image classification, object detection, and image retrieval.


Advantages of Unsupervised learning
✓It does not require training data to be labeled.

✓Dimensionality reduction can be easily accomplished.

✓Capable of finding previously unknown patterns in data.

✓It can help you gain insights from unlabeled data that you might not
have been able to get otherwise.

✓It is good at finding patterns and relationships in data without being
told what to look for. This can help you learn new things about your
data.
Disadvantages of Unsupervised learning

✓It is difficult to measure accuracy or effectiveness due to the lack of predefined
answers during training. The results often have lower accuracy.

✓The user needs to spend time interpreting and labeling the classes.

✓It can be sensitive to data quality, including missing values, outliers, and
noisy data.

✓Without labeled data, it can be difficult to evaluate the performance of
unsupervised learning models, making it challenging to assess their
effectiveness.
Supervised vs. Unsupervised Machine Learning
Parameters | Supervised machine learning | Unsupervised machine learning
Input Data | Labeled data | Unlabeled data
Computational Complexity | Simpler method | Computationally complex
Accuracy | Highly accurate | Less accurate
No. of classes | No. of classes is known | No. of classes is unknown
Algorithms used | Linear and Logistic regression, Random forest, multi-class classification, decision tree, Support Vector Machine, KNN, Neural Network | K-Means clustering, Hierarchical clustering, Apriori algorithm
Supervised vs. Unsupervised Machine Learning

Parameters | Supervised machine learning | Unsupervised machine learning
Data Analysis | Uses offline analysis | Uses real-time analysis of data
Output | Desired output is given | Desired output is not given
Training data | Uses training data to infer the model | No labeled training data is used
Complex model | It is not possible to learn larger and more complex models | It is possible to learn larger and more complex models
Model | We can test our model | We cannot test our model
Also called | Classification | Clustering


Questions
1. What is the difference between Lasso and Ridge regression?

2. What are the assumptions you need to take before starting with linear
regression?

3. How does the Support Vector Machine algorithm handle self-learning?

4. What are the different methods to split a tree in a decision tree algorithm?

5. What is Cross-Validation?

6. What is Ensemble learning?

7. What is Principal Component Analysis?


Questions
9. What is the difference between logistic regression and linear regression?

10. What is feature engineering? How does it affect the model’s performance?

11. How do you measure the effectiveness of the clusters?

12. What is Overfitting in Machine Learning and how can it be avoided?

13. Why can we not use linear regression for a classification task?

14. Explain the working principle of SVM.

15. What is a radial basis function? Explain its use.

16. Explain SMOTE method used to handle data imbalance.

17. Explain some methods to handle missing values in the dataset.
