
UNIT V

Image Processing Using Machine Learning & Real Time Use Cases
Feature Mapping Using the SIFT Algorithm, Image Registration Using the RANSAC Algorithm: estimate_affine,
residual lengths, processing the Images, The Complete code.
Image Classification Using Artificial Neural Networks, Image Classification Using CNNs, Image Classification
Using Machine Learning Approaches: Decision Trees, Support Vector Machines, Logistic Regression, Code,
Important Terms

Introduction to Real-Time Use Cases: Finding Palm Lines, Detecting Faces, Recognizing
Faces, Tracking Movements, Detecting Lanes

Image Processing Using Machine Learning & Real Time Use Cases
Feature Extraction
After an image has been segmented into regions or their boundaries using methods such as those discussed earlier, the
resulting sets of segmented pixels usually have to be converted into a form suitable for further computer processing.
Typically, the step after segmentation is Feature extraction, which consists of Feature detection and Feature
description.

Feature detection refers to finding the features in an image, region, or boundary.

Feature description assigns quantitative attributes to the detected features.

For example, we might detect corners in a region boundary, and describe those corners by their orientation and
location, both of which are quantitative attributes.

Feature processing methods discussed in this chapter are subdivided into three principal categories, depending on
whether they are applicable to
• Boundaries,
• Regions, or
• Whole images.
Some features are applicable to more than one category.
Feature descriptors should be as insensitive as possible to variations in parameters such as scaling, translation,
rotation, illumination, and viewpoint. The descriptors discussed in this chapter are either insensitive to, or can be
normalized to compensate for, variations in one or more of these parameters.

5.1 Feature Mapping Using the SIFT Algorithm

Feature mapping is a technique used in data analysis and machine learning to transform input data from
a lower-dimensional space to a higher-dimensional space, where it can be more easily analyzed or classified.
• Feature mapping involves selecting or designing a set of functions that map the original data to a
new set of features that better capture the underlying patterns in the data. The resulting feature
space can then be used as input to a machine learning algorithm or other analysis technique.
• Feature mapping can be used in a wide range of applications, from natural language processing to
computer vision, and is a powerful tool for transforming data into a format that can be analyzed
more easily. However, there are also potential issues to consider, such as the curse of dimensionality,
overfitting, and computational complexity.
• Feature mapping, also known as Feature engineering, is the process of transforming raw input data
into a set of meaningful features that can be used by a machine learning algorithm. Feature mapping
is an important step in machine learning, as the quality of the features can have a significant impact
on the performance of the algorithm.
SIFT stands for Scale-Invariant Feature Transform and was first presented in 2004 by D. Lowe, University of British
Columbia. SIFT is invariant to image scaling and rotation. The algorithm was patented, which is why it was long included
only in the non-free (contrib) module of OpenCV.

SIFT (Scale-Invariant Feature Transform) is a powerful technique for image matching that can identify and
match features in images that are invariant to Scaling, Rotation, and affine Distortion. It is widely used in
computer vision applications, including image matching, object recognition, and 3D reconstruction.

Major advantages of SIFT are


• Locality: features are local, so robust to occlusion and clutter (no prior segmentation)
• Distinctiveness: individual features can be matched to a large database of objects
• Quantity: many features can be generated for even small objects
• Efficiency: close to real-time performance
• Extensibility: can easily be extended to a wide range of different feature types, with each adding
robustness
What is the SIFT feature extraction technique?

The SIFT algorithm


SIFT is quite an involved algorithm. There are mainly four steps involved in the SIFT algorithm, followed by keypoint
matching. We will see them one by one.
• Scale-space peak selection: potential locations for finding features (scale-extrema detection).
• Keypoint localization: accurately locating the feature keypoints (removal of unreliable keypoints).
• Orientation assignment: assigning an orientation to each keypoint.
• Keypoint descriptor: describing each keypoint as a high-dimensional vector.
• Keypoint matching
From the set of reference images, SIFT keypoints of objects are extracted and stored in a database.

1. Scale-space peak Selection

Scale-space
Real-world objects are meaningful only at a certain scale. You might see a sugar cube perfectly well on a table, but if
you look at the entire Milky Way, it simply does not exist. This multi-scale nature of objects is quite common in nature,
and a scale space attempts to replicate this concept for digital images.

The scale space of an image is a function L(x,y,σ) that is produced from the convolution of a Gaussian
kernel(Blurring) at different scales with the input image. Scale-space is separated into octaves and the number
of octaves and scale depends on the size of the original image. So we generate several octaves of the original
image. Each octave’s image size is half the previous one.
Blurring
Within an octave, images are progressively blurred using the Gaussian blur operator. Mathematically, "blurring" is the
convolution of the Gaussian operator with the image; the Gaussian operator has a particular expression that is applied
around each pixel, and the result is the blurred image:

L(x, y, σ) = G(x, y, σ) * I(x, y)

Here L is the blurred image, G is the Gaussian blur operator, and I is the input image, while x, y are the location
coordinates and σ is the "scale" parameter. Think of σ as the amount of blur: the greater the value, the greater the blur.
The Gaussian blur operator itself is

G(x, y, σ) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))


DoG (Difference of Gaussians)
Now we use those blurred images to generate another set of images, the Difference of Gaussians (DoG). These DoG images
are very useful for finding interesting keypoints in the image. The Difference of Gaussians is obtained as the difference
between the Gaussian blurring of an image at two nearby scales, σ and kσ. This process is repeated for the different
octaves of the image in the Gaussian pyramid, giving a DoG pyramid; a minimal sketch of computing one DoG level follows.
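A minimal sketch of computing a single DoG level with OpenCV and NumPy, assuming a grayscale input file named 'image.jpg' (the filename and the scale values are placeholders):

import cv2
import numpy as np

# Load a grayscale image (filename is a placeholder)
img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE).astype(np.float32)

sigma = 1.6          # base scale, as used in Lowe's paper
k = 2 ** 0.5         # scale multiplier between adjacent levels

# Blur the same image at two neighbouring scales (kernel size derived from sigma)
blur1 = cv2.GaussianBlur(img, (0, 0), sigmaX=sigma)
blur2 = cv2.GaussianBlur(img, (0, 0), sigmaX=k * sigma)

# Difference of Gaussians: approximates the scale-normalised Laplacian
dog = blur2 - blur1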

Finding keypoints
Up till now, we have generated a scale space and used the scale space to calculate the Difference of Gaussians.
Those are then used to calculate Laplacian of Gaussian approximations that are scale invariant.

One pixel in an image is compared with its 8 neighbours, as well as the 9 pixels in the next scale and the 9 pixels in
the previous scale, so a total of 26 checks are made. If the pixel is a local extremum, it is a potential keypoint; this
basically means that the keypoint is best represented at that scale.

Keypoint Localization
The previous step produces a lot of keypoints. Some of them lie along an edge, or they don't have enough contrast; in
both cases they are not useful as features, so we get rid of them. The approach is similar to the one used in the Harris
corner detector for removing edge features; for low-contrast features, we simply check their intensities.

A Taylor-series expansion of the scale space is used to get a more accurate location of each extremum, and if the
intensity at this extremum is less than a threshold value (0.03 as per the paper), it is rejected. DoG has a higher
response for edges, so edges also need to be removed; a 2x2 Hessian matrix (H) is used to compute the principal curvature.
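A small sketch of this edge-response test on a DoG image, assuming the dog array from the earlier sketch; the curvature-ratio threshold r = 10 follows the value used in Lowe's paper, and the finite-difference Hessian below is an illustrative approximation:

import numpy as np

def passes_edge_test(dog, y, x, r=10.0):
    # 2x2 Hessian of the DoG image at (x, y), estimated with finite differences
    dxx = dog[y, x + 1] - 2 * dog[y, x] + dog[y, x - 1]
    dyy = dog[y + 1, x] - 2 * dog[y, x] + dog[y - 1, x]
    dxy = (dog[y + 1, x + 1] - dog[y + 1, x - 1]
           - dog[y - 1, x + 1] + dog[y - 1, x - 1]) / 4.0
    tr = dxx + dyy                 # trace: sum of the principal curvatures
    det = dxx * dyy - dxy ** 2     # determinant: product of the curvatures
    if det <= 0:                   # curvatures of opposite sign: reject
        return False
    # Keep the keypoint only if the curvature ratio is below (r + 1)^2 / r
    return (tr ** 2) / det < ((r + 1) ** 2) / r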
Orientation Assignment
Now we have legitimate keypoints, which have been tested to be stable. We already know the scale at which each keypoint
was detected (it is the same as the scale of the blurred image), so we have scale invariance. The next step is to assign
an orientation to each keypoint to make it rotation invariant.

A neighbourhood is taken around the keypoint location, with a size depending on the scale, and the gradient magnitude and
direction are calculated in that region. An orientation histogram with 36 bins covering 360 degrees is created. Say the
gradient direction at a certain point in the orientation-collection region is 18.759 degrees; it then goes into the
10-19 degree bin, and the amount added to the bin is proportional to the magnitude of the gradient at that point. Once
this has been done for all pixels around the keypoint, the histogram will have a peak at some point.
The highest peak in the histogram is taken, and any peak above 80% of it is also used to compute an orientation. This
creates keypoints with the same location and scale but different directions, which contributes to the stability of
matching.

Keypoint descriptor
At this point, each keypoint has a location, scale and orientation. The next step is to compute a descriptor for the
local image region around each keypoint that is highly distinctive and as invariant as possible to variations such as
changes in viewpoint and illumination.
To do this, a 16x16 window around the keypoint is taken and divided into 16 sub-blocks of 4x4 size.

For each sub-block, an 8-bin orientation histogram is created.

So 4 x 4 descriptors over a 16 x 16 sample array are used in practice; 4 x 4 x 8 directions give 128 bin values, which
are represented as a feature vector to form the keypoint descriptor. This feature vector introduces a few complications
that we need to get rid of before finalizing the fingerprint.

Rotation dependence: The feature vector uses gradient orientations, so if you rotate the image, all gradient orientations
change. To achieve rotation independence, the keypoint's orientation is subtracted from each gradient orientation, so
that each gradient orientation is relative to the keypoint's orientation.

Illumination dependence: Thresholding large values helps achieve illumination independence. Any value (of the 128)
greater than 0.2 is clipped to 0.2, and the resulting feature vector is normalized again. The result is an
illumination-independent feature vector.

Keypoint Matching
Keypoints between two images are matched by identifying their nearest neighbours. In some cases the second-closest match
may be very near the first, which can happen due to noise or other reasons. In that case, the ratio of the closest
distance to the second-closest distance is taken; if it is greater than 0.8, the match is rejected. This eliminates
around 90% of false matches while discarding only about 5% of correct matches, as per the paper.
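A minimal sketch of keypoint detection and this ratio-test matching with OpenCV's SIFT implementation (cv2.SIFT_create() is available in recent OpenCV releases; the image filenames are placeholders):

import cv2

img1 = cv2.imread('query.jpg', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('train.jpg', cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# For each descriptor in img1, find its two nearest neighbours in img2
bf = cv2.BFMatcher(cv2.NORM_L2)
matches = bf.knnMatch(des1, des2, k=2)

# Ratio test: keep a match only if it is clearly better than the runner-up
good = [m for m, n in matches if m.distance < 0.8 * n.distance]
print(f'{len(good)} matches kept out of {len(matches)}')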
Advantages of SIFT

• Distinctiveness
The features that are obtained can be compared with large datasets of objects.

• Quantity SIFT can help generate many features even from small objects.

• Efficiency The performance of this algorithm is comparable to real-time performance.

Disadvantages of SIFT:

• Used to be expensive: SIFT was patented, so using it commercially used to cost money.
• Needs a lot of computing power: SIFT can be slow and needs a powerful computer, especially for large images or many
features.

5.2 Image Registration Using the RANSAC Algorithm:


What is image registration?
Image registration is defined as a process that overlays two or more images from various imaging equipment or
sensors taken at different times and angles, or from the same scene to geometrically align the images for
analysis (Zitová and Flusser, 2003).
What Is RANSAC?
Random sample consensus, or RANSAC, is an iterative method for estimating a mathematical model from a data
set that contains outliers. The RANSAC algorithm works by identifying the outliers in a data set and estimating the
desired model using data that does not contain outliers.
❖ RANSAC is one of the best algorithms for image registration. It consists of 4 steps:

1. Feature detection and extraction.

2. Feature matching.

3. Transformation function fitting.

4. Image transformation and image resampling

The RANSAC algorithm is often used in computer vision, e.g., to simultaneously solve the correspondence
problem and estimate the fundamental matrix related to a pair of stereo cameras; see also: Structure from
motion, scale-invariant feature transform, image stitching, rigid motion segmentation.

Feature Mapping using the SIFT Algorithm


SIFT was a patented algorithm, so for several years OpenCV exposed it only through the non-free contrib module; the
patent expired in 2020, and recent OpenCV releases support it again via cv2.SIFT_create().
Features of the image that the SIFT algorithm tries to factor out during processing: scale (zoomed-in or zoomed-out
image), rotation, illumination, and perspective.

Step-by-step process of using the SIFT algorithm (a short OpenCV sketch follows the list):

1. Construct a scale space to ensure scale invariance
2. Compute the Difference of Gaussians
3. Find the important points (keypoints) present inside the image
4. Remove the unimportant points to make efficient comparisons
5. Assign an orientation to the important points found in step 3
6. Describe the key features uniquely (keypoint descriptor)
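A short sketch of extracting and visualising SIFT keypoints for a single image with OpenCV (the filename is a placeholder; older OpenCV builds expose SIFT as cv2.xfeatures2d.SIFT_create() in the contrib/non-free module instead):

import cv2

img = cv2.imread('scene.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(gray, None)

# Draw each keypoint with its size and orientation
out = cv2.drawKeypoints(img, keypoints, None,
                        flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imwrite('sift_keypoints.jpg', out)
print(f'{len(keypoints)} keypoints, descriptor shape: {descriptors.shape}')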

Image registration is a critical step in computer vision and image processing that involves aligning two or
more images of the same scene taken at different times, from different viewpoints, or by different sensors.
The Random Sample Consensus (RANSAC) algorithm is often used for robust estimation in the presence
of outliers. Here's a high-level overview of how RANSAC can be applied to image registration:

Steps for Image Registration using RANSAC (a condensed OpenCV sketch follows these steps):

Feature Detection:

• Identify distinctive features in both images. Common features include corners, keypoints,
or other unique patterns.

Feature Matching:

• Match the features between the two images. This can be done using descriptors like SIFT,
SURF, or ORB.

Random Sample Selection:

• Randomly select a minimal subset of feature correspondences. This subset is used to estimate a transformation model.

Model Estimation:

• Use the randomly selected correspondences to estimate a transformation model. This could
be an affine transformation, a homography, or another transformation depending on the
nature of the images.

Inlier Selection:

• Apply the estimated model to all feature correspondences and identify inliers—matches
that agree well with the model.

Evaluate Model:

• Assess the quality of the model by counting the number of inliers. This is a measure of
how well the model aligns the images.
Repeat:

• Repeat steps 3-6 for a predefined number of iterations or until a sufficiently good model is
found.

Refinement (Optional):

• Refine the final transformation model using all inliers. This step may involve using a more
sophisticated optimization method.

Apply Transformation:

• Apply the computed transformation to register one image onto the other.
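A minimal sketch of these steps with OpenCV, assuming two overlapping images of the same scene (filenames and the reprojection threshold are placeholders); here a homography is used as the transformation model, with RANSAC rejecting outlier matches:

import cv2
import numpy as np

img1 = cv2.imread('moving.jpg', cv2.IMREAD_GRAYSCALE)      # image to be registered
img2 = cv2.imread('reference.jpg', cv2.IMREAD_GRAYSCALE)   # reference image

# 1-2. Feature detection and matching (ORB keypoints + Hamming brute-force matcher)
orb = cv2.ORB_create(5000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)

src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# 3-7. RANSAC repeatedly samples minimal sets, fits H, and counts inliers
H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=5.0)
print(f'{int(inlier_mask.sum())} of {len(matches)} matches kept as inliers')

# 8-9. Apply the transformation to register img1 onto img2
registered = cv2.warpPerspective(img1, H, (img2.shape[1], img2.shape[0]))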

Image stitching [1, 2] is the stitching of two or more images with overlapping or identical features into a single image
with a larger angle of view and scene. The main focus here is feature-point-based image mosaicking. Researchers have
proposed a number of point-based feature detection and extraction algorithms, such as the Harris algorithm [3], the FAST
algorithm [4], the SIFT algorithm [5] and so on. However, images are affected by lighting, viewing angle, etc., which
slows matching and degrades extraction accuracy. Herbert Bay et al. proposed speeded-up robust features (SURF) [6] in
2006 and improved it in 2008; SURF is a local robust feature detection algorithm, partly inspired by the scale-invariant
feature transform (SIFT) [5]. The extracted feature-point pairs usually contain large errors or wrong matches, resulting
in an inaccurate transformation matrix. Robust methods are usually used to eliminate mis-matched points, such as
M-estimation, least median of squares, and random sample consensus (RANSAC) [7]. The RANSAC algorithm is a robust
data-fitting algorithm, first proposed by Fischler et al. in 1981. Its basic assumption is that a data set consists of
correct data together with a small amount of abnormal data, and it iteratively eliminates the erroneous data. Applying
the RANSAC algorithm to the screening of matched feature points can effectively eliminate mis-matched points. However,
the RANSAC algorithm also has the disadvantages of long run times, caused by iterating over all pairs of points to be
matched, and a lack of strong stability. In order to make image matching more accurate and efficient, this paper proposes
an improved RANSAC feature-matching method based on SURF. First, the SURF algorithm is used to select the feature points
of the images, and a fast-library-for-approximate-nearest-neighbours (FLANN) based matcher is used to pre-match the
extracted feature points. For the mis-matched points in the matching process, the improved RANSAC algorithm is used to
select and reject features. This reduces the number of iterations and improves the accuracy and efficiency of the match.

The RANSAC algorithm is used to eliminate false-matching points in OpenCV. First, find an optimal homography
matrix H so that the number of feature points satisfying this matrix is the most. The relationship between feature
point pairs and homography matrices is as follows:

[x′, y′, z′]ᵀ = H · [x, y, z]ᵀ,   with

H = [ h11  h12  h13 ]
    [ h21  h22  h23 ]
    [ h31  h32  h33 ],   h33 normalized to 1,


where (x, y, z) is the feature point of the image to be matched and (x′, y′, z′) is the corresponding point of the reference image.
There are eight parameters to be solved in the matrix H, and solving them requires at least four pairs of matched feature
points. Substituting these into the solution equations yields the eight parameters. The resulting matrix maps the image to
be matched and the reference image into the same coordinate system, laying the foundation for subsequent image stitching.
However, the points obtained by feature matching generally include incorrect matches, which lowers the accuracy of the
obtained transformation matrix. In general, the RANSAC algorithm is used to eliminate matching points with large errors,
and an iterative method is used to find the exact matrix (see the table below).

Table: Algorithm statistical analysis

Matching method      Matching pairs   Wrong match pairs   Correct match rate (%)   Time (s)
BFMatcher            223              63                  71.7                     2.39
FLANN                134              20                  85.1                     1.43
RANSAC               33               3                   90.9                     1.94
Improved RANSAC      18               1                   94.4                     1.72

Improved RANSAC matching

For the mis-matched points in the initial matching point pairs, the RANSAC algorithm [10, 11] is usually used for
filtering and rejection. The RANSAC algorithm uses an iterative method to randomly sample all pairs of matched feature
points, obtaining a number of minimal sample sets and testing them in turn. Among the N matched pairs, the sample sets
whose error is less than a given threshold are considered correct matches, and the other pairs are considered outliers,
also called mis-matched points. However, when the standard RANSAC algorithm filters matching points, each pair of
matching points in the sample has the same probability of being selected, and the results of each selection do not affect
each other. Considering that there is a fixed transformation matrix between the two images to be matched, the Euclidean
distance between correctly matched point pairs varies only within a certain range, with no abnormally large or abnormally
small values. This paper proposes an improved RANSAC algorithm based on the idea of literature [12]. Before RANSAC is
run, the Euclidean distances of the matched point pairs are calculated and sorted, the pairs that are too close or too
far apart are eliminated, and RANSAC is performed only on the middle matching point pairs. This improves the probability
that a correct matching point is sampled, so correct matching point pairs are found more reliably. At the same time, the
sample data are reduced, which also reduces the iteration time.

The steps of the improved RANSAC algorithm here are as follows:

(i) Calculate the Euclidean distance of N pairs of matching points, sort them in order of distance, and delete
the first 20% and the last 20% of points;

(ii) Randomly select four pairs of matched points from the remaining 0.6*N pairs, ensuring that no three of the four
points are collinear, and calculate the transformation matrix H;

(iii) Map each of the remaining (0.6*N − 4) matched points through the transformation matrix H and calculate the distance
d to its original corresponding matching point. Set a threshold T (the threshold in this experiment is 2): pairs whose
distance is less than the threshold are taken as correct matching pairs, otherwise they are removed;

(iv) Count the interior (inlier) points obtained in step (iii) and re-fit the transformation matrix;

(v) Repeat until the number of interior points no longer changes; the final interior point set is then obtained, and the
final transformation matrix is calculated from it.

Relative to the standard RANSAC algorithm, the improved RANSAC algorithm changes the way matching points are selected:
the probability of selecting a correct pair of points increases, reducing the impact of mis-matched points on the desired
matrix. At the same time, the number of iterations is greatly reduced, the efficiency is improved, and the accuracy of
the transformation matrix is improved. It also lays the foundation for real-time panoramic stitching.
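A rough sketch of the pre-filtering idea behind this improved RANSAC: sort the matched pairs by the Euclidean distance between the matched keypoint locations, drop the first and last 20%, and run RANSAC only on the middle 60%. The helper below is illustrative (one reading of the method, not the authors' code); the 2-pixel threshold follows the experiment described above.

import cv2
import numpy as np

def improved_ransac(kp1, kp2, matches, reproj_thresh=2.0):
    # Euclidean distance between each matched keypoint pair
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    dists = np.linalg.norm(pts1 - pts2, axis=1)

    # Sort by distance and keep only the middle 60% of the pairs
    order = np.argsort(dists)
    n = len(order)
    keep = order[int(0.2 * n): int(0.8 * n)]

    # Standard RANSAC homography fit on the pre-filtered pairs
    H, mask = cv2.findHomography(pts1[keep].reshape(-1, 1, 2),
                                 pts2[keep].reshape(-1, 1, 2),
                                 cv2.RANSAC, reproj_thresh)
    return H, mask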
5.3 estimate_affine
What is estimation?
Estimation is the case where a clean image is contaminated with noise, usually through sensing, transmission, or
storage. Image restoration means that, in addition to the noise, there is some blurring due to motion or lack of
focus. Both nonrecursive and recursive approaches exist for 2-D estimation.
What is affine?
The affine transformation technique is typically used to correct for geometric distortions or deformations that
occur with non-ideal camera angles. For example, satellite imagery uses affine transformations to correct for wide
angle lens distortion, panorama stitching, and image registration.
In image processing, estimating an affine transformation is a common task when trying to align or register two
images. An affine transformation is a linear mapping that preserves points, straight lines, and planes. It includes
translations, rotations, scaling, and shearing. Here's how you can estimate an affine transformation using Python
and OpenCV:

import cv2
import numpy as np

# Load images
image1 = cv2.imread('image1.jpg')
image2 = cv2.imread('image2.jpg')

# Convert images to grayscale


gray1 = cv2.cvtColor(image1, cv2.COLOR_BGR2GRAY)
gray2 = cv2.cvtColor(image2, cv2.COLOR_BGR2GRAY)

# Detect ORB keypoints and descriptors


orb = cv2.ORB_create()
keypoints1, descriptors1 = orb.detectAndCompute(gray1, None)
keypoints2, descriptors2 = orb.detectAndCompute(gray2, None)

# Match features using a brute-force matcher


bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(descriptors1, descriptors2)

# Get matching points


points1 = np.float32([keypoints1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
points2 = np.float32([keypoints2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# Estimate an affine transformation using RANSAC


affine_transform, inliers = cv2.estimateAffine2D(points1, points2, method=cv2.RANSAC,
ransacReprojThreshold=5.0)

# Apply the affine transformation to image1


result = cv2.warpAffine(image1, affine_transform, (image2.shape[1], image2.shape[0]))

# Display results
cv2.imshow('Image 1', image1)
cv2.imshow('Image 2', image2)
cv2.imshow('Aligned Image 1', result)
cv2.waitKey(0)
cv2.destroyAllWindows()
In this example:

We use the ORB detector and descriptor to find keypoints and descriptors in both images.
We match the features using a brute-force matcher with the Hamming distance.
We use the cv2.estimateAffine2D function with the RANSAC method to estimate the affine transformation matrix.
The resulting transformation matrix (affine_transform) is then used to align or warp image1 onto image2.
The aligned image is displayed for visual inspection.
Adjust the parameters and methods according to the characteristics of your images and the specific requirements
of your application. Keep in mind that RANSAC helps in robustly estimating the transformation by handling
outliers (mismatched or erroneous correspondences).
What is an Affine Transformation?
1. A transformation that can be expressed in the form of a matrix multiplication (linear transformation)
followed by a vector addition (translation).
2. From the above, we can use an Affine Transformation to express:
a. Rotations (linear transformation)
b. Translations (vector addition)
c. Scale operations (linear transformation)
you can see that, in essence, an Affine Transformation represents a relation between two images.
3. The usual way to represent an Affine Transformation is by using a 2×3 matrix.
A = [ a00  a01 ]          B = [ b00 ]
    [ a10  a11 ]  (2x2)       [ b10 ]  (2x1)

M = [ A  B ] = [ a00  a01  b00 ]
               [ a10  a11  b10 ]  (2x3)

Considering that we want to transform a 2D vector X = [x, y]ᵀ by using A and B, we can do the same with M:

T = A · X + B,   or equivalently   T = M · [x, y, 1]ᵀ

T = [ a00·x + a01·y + b00 ]
    [ a10·x + a11·y + b10 ]
How do we get an Affine Transformation?
1. We mentioned that an Affine Transformation is basically a relation between two images. The information
about this relation can come, roughly, in two ways:
a. We know both X and T and we also know that they are related. Then our task is to find M
b. We know M and X. To obtain T we only need to apply T=M⋅X. Our information for M may be
explicit (i.e. have the 2-by-3 matrix) or it can come as a geometric relation between points.
2. Let's explain this in a better way. Since M relates two images, we can analyse the simplest case in which it relates
three points in both images: the points 1, 2 and 3 (forming a triangle in image 1) are mapped into image 2, still forming
a triangle, but their positions have changed noticeably. If we find the affine transformation from these 3 point pairs
(you can choose them as you like), we can then apply this relation to all the pixels in the image, as the sketch below
shows.
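A small sketch of finding an affine transformation from three point correspondences with OpenCV and applying it to a whole image (the point coordinates and filename below are made up purely for illustration):

import cv2
import numpy as np

img = cv2.imread('image1.jpg')
rows, cols = img.shape[:2]

# Three corresponding points in the source and destination images
src_pts = np.float32([[0, 0], [cols - 1, 0], [0, rows - 1]])
dst_pts = np.float32([[0, rows * 0.33], [cols * 0.85, rows * 0.25], [cols * 0.15, rows * 0.7]])

# M is the 2x3 affine matrix [A | B] described above
M = cv2.getAffineTransform(src_pts, dst_pts)

# Apply T = M . [x, y, 1]^T to every pixel of the image
warped = cv2.warpAffine(img, M, (cols, rows))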
5.4 residual lengths
What are residual lengths?
Definition: the residual for each observation is the difference between the predicted value of y (the dependent variable)
and the observed value of y:

residual = actual y value − predicted y value,   r_i = y_i − ŷ_i

Residual plot

A residual plot shows the difference between the observed response and the fitted response values.

The ideal residual plot, called the null residual plot, shows a random scatter of points forming an approximately
constant width band around the identity line.

It is important to check the fit of the model and assumptions – constant variance, normality, and independence of
the errors, using the residual plot, along with normal, sequence, and lag plot.

Assumption: Model function is linear
How to check: The points form a pattern when the model function is incorrect. You might be able to transform variables or
add polynomial and interaction terms to remove the pattern.

Assumption: Constant variance
How to check: If the points tend to form an increasing, decreasing or non-constant-width band, then the variance is not
constant. You should consider transforming the response variable or incorporating weights into the model. When variance
increases as a percentage of the response, you can use a log transform, although you should ensure it does not produce a
poorly fitting model. Even with non-constant variance, the parameter estimates remain unbiased, if somewhat inefficient;
however, the hypothesis tests and confidence intervals are inaccurate.

Assumption: Normality
How to check: Examine the normal plot of the residuals to identify non-normality. Violation of the normality assumption
only becomes an issue with small sample sizes. For large sample sizes, the assumption is less important due to the
central limit theorem, and the fact that the F- and t-tests used for hypothesis tests and for forming confidence
intervals are quite robust to modest departures from normality.

Assumption: Independence
How to check: When the order of the cases in the dataset is the order in which they occurred, examine a sequence plot of
the residuals against the order to identify any dependency between the residuals and time. Examine a lag-1 plot of each
residual against the previous residual to identify serial correlation, where observations are not independent and there
is a correlation between an observation and the previous observation. Time-series analysis may be more suitable for
modelling data where serial correlation is present.

For a model with many terms, it can be difficult to identify specific problems using the residual plot. A non-null
residual plot indicates that there are problems with the model, but not necessarily what these are.

Residuals - normality

Normality is the assumption that the underlying residuals are normally distributed, or approximately so.

While a residual plot, or normal plot of the residuals can identify non-normality, you can formally test the
hypothesis using the Shapiro-Wilk or similar test.

The null hypothesis states that the residuals are normally distributed, against the alternative hypothesis that they
are not normally-distributed. If the test p-value is less than the predefined significance level, you can reject the
null hypothesis and conclude the residuals are not from a normal distribution. If the p-value is greater than the
predefined significance level, you cannot reject the null hypothesis.

Violation of the normality assumption only becomes an issue with small sample sizes. For large sample sizes, the
assumption is less important due to the central limit theorem, and the fact that the F- and t-tests used for
hypothesis tests and forming confidence intervals are quite robust to modest departures from normality.
Residuals – independence
Autocorrelation occurs when the residuals are not independent of each other. That is, when the value of e[i+1] is
not independent from e[i].

While a residual plot, or lag-1 plot allows you to visually check for autocorrelation, you can formally test the
hypothesis using the Durbin-Watson test. The Durbin-Watson statistic is used to detect the presence of
autocorrelation at lag 1 (or higher) in the residuals from a regression. The value of the test statistic lies between 0
and 4, small values indicate successive residuals are positively correlated. If the Durbin-Watson statistic is much
less than 2, there is evidence of positive autocorrelation, if much greater than 2 evidence of negative
autocorrelation.
The null hypothesis states that the residuals are not autocorrelated, against the alternative hypothesis that they
are. If the test p-value is less than the predefined significance level, you can reject the null hypothesis and
conclude the residuals are correlated. If the p-value is greater than the predefined significance level, you cannot
reject the null hypothesis.
Note: The p-value is computed using the bootstrap method and can take a long time to compute.
In mathematics and signal processing, a "residual" is the difference between an observed or measured value and the value
predicted or estimated by a model, and a "length" is the size or extent of something. In the image-registration context of
this unit, therefore, the residual for a matched point pair is the distance between the destination point and the source
point mapped through the estimated transformation, and the residual lengths are simply these distances: matches with small
residual lengths agree well with the estimated affine model and are kept as inliers, while matches with large residual
lengths are rejected as outliers, which is exactly the criterion RANSAC uses. A small sketch follows.
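A small sketch of computing residual lengths for an estimated affine transformation could look like the following (the function and array names are illustrative, not a library API); points with residuals below a pixel threshold are the inliers kept by RANSAC:

import numpy as np

def residual_lengths(A, b, src_pts, dst_pts):
    # A: 2x2 linear part, b: length-2 translation, src_pts/dst_pts: (N, 2) arrays.
    # Returns the Euclidean distance between each transformed source point and its match.
    predicted = src_pts @ A.T + b          # apply the affine model to the source points
    return np.linalg.norm(dst_pts - predicted, axis=1)

# Example usage (arrays assumed to come from the matching step):
# residuals = residual_lengths(A, b, src_pts, dst_pts)
# inliers = residuals < 2.0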
5.5 Processing the Images
What is image processing?
Image processing is a method to perform some operations on an image in order to get an enhanced image or to extract some
useful information from it. It is a type of signal processing in which the input is an image and the output may be an
image or characteristics/features associated with that image. Image processing refers to the manipulation of an image to
extract meaningful information or enhance certain features. This field is crucial in various applications, including
computer vision, medical imaging, remote sensing, and more. Image processing can involve a wide range of operations, and
the following are some common tasks:
1. Image Acquisition: The process begins with capturing or obtaining the image data through cameras,
sensors, or other devices.
2. Preprocessing: This step involves preparing the image for further analysis. Operations may include
resizing, noise reduction, and image enhancement to improve the quality of the image.
3. Image Enhancement: This step aims to improve the visual appearance of an image. Techniques such as
contrast adjustment, sharpening, and filtering are commonly used.
4. Image Restoration: Involves the removal or reduction of artifacts or distortions introduced during image
acquisition or transmission.
5. Image Segmentation: The process of dividing an image into meaningful segments or regions. This is often
a crucial step in object recognition and computer vision.
6. Feature Extraction: Involves identifying and extracting relevant features from an image, such as edges,
corners, or texture patterns, which are important for subsequent analysis.
7. Image Recognition: Using patterns and features identified in the previous steps to recognize and classify
objects or patterns within an image. This is a fundamental aspect of computer vision.
8. Image Compression: Reducing the size of an image to save storage space or enable faster transmission.
9. Image Analysis: Involves extracting quantitative information from an image. This can include
measurements, statistical analysis, and other data extraction techniques.
10. Image Synthesis: Creating new images from existing ones, often using computer graphics techniques.
11. Image Understanding: The highest level of image processing involves interpreting and understanding the
content of an image, often requiring advanced artificial intelligence and machine learning techniques.
Image processing can be performed using various tools and programming languages, and it often involves a
combination of traditional methods and modern machine learning approaches. The specific techniques used
depend on the goals of the image processing task and the characteristics of the images being analyzed.

As an illustration of why preprocessing matters, one cited study reports roughly a 3% boost in performance from a simple
preprocessing procedure alone, a considerable enhancement in a biomedical application where the accuracy of diagnosis is
crucial for AI systems; the study compares lesion-segmentation results obtained with and without preprocessing on three
different datasets.
Types of Images / How Machines “See” Images?
Digital images are interpreted as 2D or 3D matrices by a computer, where each value or pixel in the matrix
represents the amplitude, known as the “intensity” of the pixel. Typically, we are used to dealing with 8-bit
images, wherein the amplitude value ranges from 0 to 255.



Thus, a computer “sees” digital images as a function: I(x, y) or I(x, y, z), where “I” is the pixel intensity and (x, y) or
(x, y, z) represent the coordinates (for binary/grayscale or RGB images respectively) of the pixel in the image.

Convention of the coordinate system used in an image


Computers deal with different “types” of images based on their function representations. Let us look into them
next.

1. Binary Image
Images that have only two unique values of pixel intensity- 0 (representing black) and 1 (representing white) are
called binary images. Such images are generally used to highlight a discriminating portion of a colored image. For
example, it is commonly used for image segmentation, as shown below.

2. Grayscale Image
Grayscale or 8-bit images are composed of 256 unique colors, where a pixel intensity of 0 represents the black
color and pixel intensity of 255 represents the white color. All the other 254 values in between are the different
shades of gray.

An example of an RGB image converted to its grayscale version is shown below. Notice that the shape of the
histogram remains the same for the RGB and grayscale images.

3. RGB Color Image


The images we are used to in the modern world are RGB or colored images, which computers typically store as three 8-bit
channels, i.e., 24 bits per pixel. That is, over 16.7 million different colors are possible for each pixel. "RGB"
represents the Red, Green, and Blue "channels" of an image.

Up until now, we had images with only one channel. That is, two coordinates could have defined the location of
any value of a matrix. Now, three equal-sized matrices (called channels), each having values ranging from 0 to
255, are stacked on top of each other, and thus we require three unique coordinates to specify the value of a
matrix element.

Thus, a pixel in an RGB image will be of color black when the pixel value is (0, 0, 0) and white when it is (255, 255,
255). Any combination of numbers in between gives rise to all the different colors existing in nature. For example,
(255, 0, 0) is the color red (since only the red channel is activated for this pixel). Similarly, (0, 255, 0) is green and
(0, 0, 255) is blue.
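A tiny NumPy sketch of this representation (purely illustrative):

import numpy as np

# A 2x2 RGB image: black, white, pure red and pure green pixels
img = np.array([[[0, 0, 0],   [255, 255, 255]],
                [[255, 0, 0], [0, 255, 0]]], dtype=np.uint8)

print(img.shape)   # (2, 2, 3): height, width and the three colour channels
print(img[1, 0])   # [255   0   0]  -> the red pixel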

An example of an RGB image split into its channel components is shown below. Notice that the shapes of the
histograms for each of the channels are different.
Splitting of an image into its Red, Green and Blue channels
4. RGBA Image
RGBA images are colored RGB images with an extra channel known as “alpha” that depicts the opacity of the RGB
image. Opacity ranges from a value of 0% to 100% and is essentially a “see-through” property.

Opacity in physics depicts the amount of light that passes through an object. For instance, cellophane paper is
transparent (100% opacity), frosted glass is translucent, and wood is opaque. The alpha channel in RGBA images
tries to mimic this property. An example of this is shown below.
Example of changing the “alpha” parameter in RGBA images
Phases of Image Processing
The fundamental steps in any typical Digital Image Processing pipeline are as follows:

1. Image Acquisition
The image is captured by a camera and digitized (if the camera output is not digitized automatically) using an
analogue-to-digital converter for further processing in a computer.

2. Image Enhancement
In this step, the acquired image is manipulated to meet the requirements of the specific task for which the image
will be used. Such techniques are primarily aimed at highlighting the hidden or important details in an image, like
contrast and brightness adjustment, etc. Image enhancement is highly subjective in nature.

3. Image Restoration
This step deals with improving the appearance of an image and is an objective operation since the degradation of
an image can be attributed to a mathematical or probabilistic model. For example, removing noise or blur from
images.

4. Color Image Processing


This step aims at handling the processing of colored images (RGB or RGBA images), for example, performing
color correction or color modeling in images.

5. Wavelets and Multi-Resolution Processing


Wavelets are the building blocks for representing images at various degrees of resolution. Images are subdivided
successively into smaller regions for data compression and for pyramidal representation.

6. Image Compression
For transferring images to other devices or due to computational storage constraints, images need to be
compressed and cannot be kept at their original size. This is also important in displaying images over the internet;
for example, on Google, a small thumbnail of an image is a highly compressed version of the original. Only when
you click on the image is it shown in the original resolution. This process saves bandwidth on the servers.

7. Morphological Processing
Image components that are useful in the representation and description of shape need to be extracted for further
processing or downstream tasks. Morphological processing provides the tools (which are essentially mathematical
operations) to accomplish this. For example, erosion and dilation are used to shrink and grow the boundaries of objects
in a binary image, respectively; a short sketch follows.
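A short OpenCV sketch of the erosion and dilation operations mentioned above, applied to a binary mask image (the filename is a placeholder):

import cv2
import numpy as np

mask = cv2.imread('binary_mask.png', cv2.IMREAD_GRAYSCALE)
kernel = np.ones((5, 5), np.uint8)   # structuring element

eroded = cv2.erode(mask, kernel, iterations=1)    # shrinks white regions, removes small specks
dilated = cv2.dilate(mask, kernel, iterations=1)  # grows white regions, fills small holes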

8. Image Segmentation
This step involves partitioning an image into different key parts to simplify and/or change the representation of
an image into something that is more meaningful and easier to analyze. Image segmentation allows for
computers to put attention on the more important parts of the image, discarding the rest, which enables
automated systems to have improved performance.

9. Representation and Description


Image segmentation procedures are generally followed by this step, where the task for representation is to
decide whether the segmented region should be depicted as a boundary or a complete region. Description deals
with extracting attributes that result in some quantitative information of interest or are basic for differentiating
one class of objects from another.

10. Object Detection and Recognition


After the objects are segmented from an image and the representation and description phases are complete, the
automated system needs to assign a label to the object—to let the human users know what object has been
detected, for example, “vehicle” or “person”, etc.

11. Knowledge Base


Knowledge may be as simple as the bounding box coordinates for an object of interest that has been found in the
image, along with the object label assigned to it. Anything that will help in solving the problem for the specific
task at hand can be encoded into the knowledge base.
5.6 The Complete Code
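In the context of this unit, "the complete code" refers to the end-to-end image-registration pipeline built from the pieces above: feature matching, RANSAC-based affine estimation, residual lengths, and warping. The following is a condensed sketch, not an exact textbook listing; filenames and thresholds are placeholders.

import cv2
import numpy as np

img1 = cv2.imread('image1.jpg', cv2.IMREAD_GRAYSCALE)   # image to register
img2 = cv2.imread('image2.jpg', cv2.IMREAD_GRAYSCALE)   # reference image

# Detect and match features
orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)

src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# Estimate the affine transform with RANSAC and inspect the residuals of the inliers
M, inliers = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC, ransacReprojThreshold=3.0)
proj = cv2.transform(src, M)                             # src points mapped through M
residuals = np.linalg.norm(proj - dst, axis=2).ravel()   # residual lengths per match
print('mean inlier residual:', residuals[inliers.ravel() == 1].mean())

# Warp the moving image into the reference frame
registered = cv2.warpAffine(img1, M, (img2.shape[1], img2.shape[0]))
cv2.imwrite('registered.jpg', registered)

The rest of this section collects more general notes on code, code examples, and source code.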
Code Complete is a software development book, written by Steve McConnell and published in 1993 by Microsoft Press,
encouraging developers to continue past code-and-fix programming and the big design up front and waterfall models. It is
also a compendium of software construction techniques, which include techniques from naming variables to deciding when
to write a subroutine.
"Code Complete" is not a software; rather, it is a highly regarded book in the field of software development. The full title of
the book is "Code Complete: A Practical Handbook of Software Construction," and it was written by Steve McConnell. The
book is widely considered a classic in the software development community and provides practical advice and guidance on
various aspects of software construction, coding, and project management.

First published in 1993, "Code Complete" covers a broad range of topics, including software design principles, coding
practices, debugging, testing, and project management. It aims to help software developers improve their coding skills and
produce higher-quality software. The book has been updated over the years to reflect changes in technology and software
development practices.


What types of code example are available?


There are four types of code examples available on MDN:

Static examples — plain code blocks, possibly with a screenshot to statically show the result of such code if it were to be run.
Interactive examples — Our system for creating live interactive examples that show the code running live but also allow you
to change code on the fly to see what the effect is and easily copy the results.
Traditional MDN "live samples" — A macro that takes plain code blocks, dynamically puts them into a document inside an
<iframe> element, and embeds it into the page to show the code running live.
GitHub "live samples" — A macro that takes a document in a GitHub repo inside the MDN organization, puts it inside an
<iframe> element, and embeds it into the page to show the code running live.
We'll discuss each one in later sections.

When should you use each one?


Each type of code example has its own use cases. When should you use each one?

Static examples are useful if you just need to show some code, and it isn't super important to show what the live result is.
Some people just want something to copy and paste. Maybe you are just showing an intermediate step, or the source code is
enough. (For example, the article is for an advanced audience, and they just need to see the code.) Also, you might be
demonstrating an API feature that doesn't work well as an embedded example, which might need its own separate page to
link to.
The interactive examples are great as readers can modify values on the fly — this is very valuable for learning. However, they
are more complex to set up than the other forms, with more limitations, and are intended for specific purposes.
Traditional live samples are useful if you want to show source code on a page, then show it running, and you're not that
bothered about it being accessible as a standalone example. This approach also has the advantage that if you are showing
source code and live examples side by side, you only need to update the code once to update both. They can however be
awkward to edit and get working.
GitHub live samples are useful when you've got an existing example you want to embed, don't want to show the source code
for, and/or you want to make sure the example is available in standalone form. They have a better contribution workflow,
but it does require you to know GitHub. Also because on-page code and source code are in two different places, it is easier
for them to get out of sync.
General guidelines
Aside from the specific system for presenting the live samples, there are style and content considerations to keep in mind
when adding or updating samples on MDN.
When placing samples on a page, try to ensure that all of the features or options of the API or concept you're writing about
are covered. At a minimum, at least the most-common options or properties should be included in examples.
Precede each example with an explanation of what the example does and why it's interesting or useful.
Follow each piece of code with an explanation of what it does.
When possible, break large examples into smaller pieces. For instance, the "live sample" system will automatically
concatenate all your code together into one piece before running the example, so you can actually break your JavaScript,
HTML, and/or CSS into smaller pieces with descriptive text after each piece if you choose to do so. This is a great way to help
explain long or complicated stretches of code more clearly.
Go beyond just demonstrating how each piece of the API or technology works. Consider possible real-world use cases you
might try to demonstrate.
Static examples
By static examples, we are talking about static code blocks that show how a feature might be used in code. These are put on
a page using Markdown "code fences", as described in Example code blocks. An example result might look like this:
What is source code?

Source code is the fundamental component of a computer program that is created by a programmer, often written in the
form of functions, descriptions, definitions, calls, methods and other operational statements. It is designed to be
human-readable and formatted in a way that developers and other users can understand.

As an example, when a programmer types a sequence of C programming language statements into Windows Notepad and saves
the sequence as a text file, the text file now contains source code.

Source code and object code are sometimes referred to as the before and after versions of a compiled computer program.
However, source code and object code do not apply to scripting (non-compiled or interpreted) programming languages, like
JavaScript, since there is only one form of the code.

Programmers can use a text editor, a visual programming tool or an integrated development environment (IDE) such as a
software development kit (SDK) to create source code. In large program development environments, there are often
management systems that help programmers separate and keep track of different states and levels of source code files.

Licensing of source code

Source code can be proprietary or open, and licensing agreements often reflect this distinction.

When a user installs a software suite like Microsoft Office, for example, the source code is proprietary. Microsoft only gives
the customer access to the software's compiled executables and the associated library files that various executable files

require to call program functions.

By comparison, when a user installs Apache OpenOffice, its open source software code can be downloaded and modified.
5.7 Image Classification Using Artificial Neural Networks
What is image classification?
Image classification is the process of categorizing and labeling groups of pixels or vectors within an image based on
specific rules. The categorization law can be devised using one or more spectral or textural characteristics. Two general
methods of classification are 'supervised' and 'unsupervised'.
Convolutional neural network models are ubiquitous in the image data space. They work phenomenally well on computer
vision tasks like image classification, object detection, image recognition, etc., and have hence been widely used in
artificial intelligence modeling, especially to create image classifiers.
Image classification using artificial neural networks is a popular and powerful application of machine learning, and
Convolutional Neural Networks (CNNs) are particularly well suited for this task. Here's a step-by-step guide on how image
classification using artificial neural networks, specifically CNNs, can be implemented:

1. Dataset Preparation:
Obtain a labeled dataset with images and corresponding labels. Common datasets for image classification include
CIFAR-10, CIFAR-100, ImageNet, etc.
2. Data Preprocessing:
Resize images to a consistent size.
Normalize pixel values (typically between 0 and 1).
Augment data for better generalization (rotate, flip, zoom, etc.) to increase the variety of training examples.
3. Architecture Design:
Build a CNN architecture. A typical architecture includes convolutional layers, pooling layers, and fully connected
layers.
Popular CNN architectures include LeNet, AlexNet, VGG, GoogLeNet (Inception), ResNet, and more. You can also
design a custom architecture based on your specific requirements.
4. Model Compilation:
Choose an appropriate loss function (categorical crossentropy for multi-class classification) and an optimizer (e.g.,
Adam, SGD).
Compile the model with these settings.
5. Training:
Split the dataset into training and validation sets.
Train the model using the training set and validate it using the validation set.
Adjust hyperparameters like learning rate, batch size, and architecture based on the validation performance.
6. Model Evaluation:
Evaluate the model on a separate test set to measure its performance accurately.
7. Fine-Tuning:
If the model performance is not satisfactory, consider fine-tuning the architecture or hyperparameters.
Techniques like transfer learning can be employed by using pre-trained models and adapting them to your
specific task.
8. Deployment:
Once satisfied with the performance, deploy the model for inference. This can involve integrating the model into
a web application, mobile app, or other platforms.
9. Monitoring and Maintenance:
Regularly monitor the model's performance, and update it if necessary with new data or retraining to ensure its
accuracy over time.
Tips and Considerations:
Experiment with different architectures and hyperparameters to find the best combination for your specific
problem.
Use GPU acceleration for faster training times.
Regularization techniques like dropout can help prevent overfitting.
Keep an eye on class imbalances and use techniques such as class weights or oversampling to address them.

Example (using Python and TensorFlow/Keras):


from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Define the CNN architecture


model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))

# Compile the model


model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model


model.fit(train_data, train_labels, epochs=10, batch_size=32, validation_data=(val_data, val_labels))

# Evaluate the model


test_loss, test_acc = model.evaluate(test_data, test_labels)
print(f'Test Accuracy: {test_acc}')

Birds inspired us to fly, and nature has inspired countless other inventions. It seems logical, then, to look at the brain's architecture for inspiration on how to build an intelligent machine. This is the logic that sparked Artificial Neural Networks (ANNs). An ANN is a machine learning model inspired by the networks of biological neurons found in our brains. However, just as planes were inspired by birds but don't have to flap their wings, ANNs have gradually become quite different from their biological cousins. In this article, I will build an image classification model with an ANN to show you how ANNs work.

Building an Image Classification with ANN


First, we need to load a dataset. In this image classification model we will tackle Fashion MNIST, which consists of 70,000 grayscale images of 28 x 28 pixels each (60,000 for training and 10,000 for testing), with 10 classes. Let's import some necessary libraries to start with this task:
# Python ≥3.5 is required
import sys
assert sys.version_info >= (3, 5)

# Scikit-Learn ≥0.20 is required
import sklearn
assert sklearn.__version__ >= "0.20"

try:
    # %tensorflow_version only exists in Colab.
    %tensorflow_version 2.x
except Exception:
    pass

# TensorFlow ≥2.0 is required
import tensorflow as tf
assert tf.__version__ >= "2.0"

# Common imports
import numpy as np
import os

# to make this notebook's output stable across runs
np.random.seed(42)

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

Using Keras to Load the Dataset

Keras provides utility functions to fetch and load common datasets, including MNIST, Fashion MNIST, and the California housing dataset. Let's start by loading the Fashion MNIST dataset to create an image classification model.

Keras has a number of functions to load popular datasets in keras.datasets. The dataset is already split for you
between a training set and a test set, but it can be useful to split the training set further to have a validation set:

import tensorflow as tf
from tensorflow import keras
fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()
When loading MNIST or Fashion MNIST using Keras rather than Scikit-Learn, one important difference is that every image is represented as a 28 x 28 array rather than a 1D array of size 784. Moreover, the pixel intensities are represented as integers rather than floats. Let's take a look at the shape and data type of the training set:
X_train_full.shape
(60000, 28, 28)

X_train_full.dtype

dtype('uint8')
Note that the dataset is already split into a training set and a test set, but there is no validation set, so we’ll create
one now. Additionally, since we are going to train the ANN using Gradient Descent, we must scale the input
features. For simplicity, I will scale the pixel intensities down to the 0-1 range by dividing them by 255.0:

X_valid, X_train = X_train_full[:5000] / 255., X_train_full[5000:] / 255.


y_valid, y_train = y_train_full[:5000], y_train_full[5000:]
X_test = X_test / 255.
You can plot an image using Matplotlib's imshow() function, with a 'binary' color map:

plt.imshow(X_train[0], cmap="binary")
plt.axis('off')
plt.show()

The labels are the class IDs (represented as uint8), from 0 to 9:

y_train


array([4, 0, 7, …, 3, 0, 5], dtype=uint8)

With MNIST, when the label is equal to 5, it means that the image represents the handwritten digit 5, which is easy to interpret. For Fashion MNIST, however, we need the list of class names to know what we are dealing with:

class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]
For example, the first image in the training set represents a coat:

class_names[y_train[0]]

'Coat'

The validation set contains 5,000 images, and the test set contains 10,000 images:

X_valid.shape


(5000, 28, 28)

X_test.shape


(10000, 28, 28)
Let’s take a look at a sample of the images in the dataset:

n_rows = 4
n_cols = 10
plt.figure(figsize=(n_cols * 1.2, n_rows * 1.2))
for row in range(n_rows):
    for col in range(n_cols):
        index = n_cols * row + col
        plt.subplot(n_rows, n_cols, index + 1)
        plt.imshow(X_train[index], cmap="binary", interpolation="nearest")
        plt.axis('off')
        plt.title(class_names[y_train[index]], fontsize=12)
plt.subplots_adjust(wspace=0.2, hspace=0.5)
plt.savefig('fashion_mnist_plot.png')  # save_fig() in the original notebook is a helper around plt.savefig()
plt.show()

Image Classification Model using Sequential API

Now, let’s build the neural network. Here is a classification MLP with two hidden layers:

model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape=[28, 28]))
model.add(keras.layers.Dense(300, activation="relu"))
model.add(keras.layers.Dense(100, activation="relu"))
model.add(keras.layers.Dense(10, activation="softmax"))
keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)
Let’s go through the above code line by line:

• The first line creates a Sequential model. This is the simplest kind of Keras model for neural networks that are just composed of a single stack of layers connected sequentially. This is called the Sequential API.
• Next, we build the first layer and add it to the model. It is a Flatten layer whose role is to convert each input image into a 1D array: if it receives input data X, it computes the equivalent of X.reshape(-1, 28*28). This layer does not have any parameters; it is just there to do some simple preprocessing. Since it is the first layer in the model, you should specify the input_shape, which does not include the batch size, only the shape of the instances. Alternatively, you could add a keras.layers.InputLayer as the first layer, setting input_shape=[28, 28].
• Next we add a Dense hidden layer with 300 neurons. It will use the ReLU activation function. Each Dense layer manages its own weight matrix, containing all the connection weights between the neurons and their inputs. It also manages a vector of bias terms.
• Then we add a second Dense hidden layer with 100 neurons, also using the ReLU activation function.
• Finally, we add a Dense output layer with 10 neurons (one per class), using the softmax activation function.
Instead of adding the layers one by one as we just did, you can pass a list of layers when creating the Sequential
model:

model = keras.models.Sequential([
keras.layers.Flatten(input_shape=[28, 28]),
keras.layers.Dense(300, activation="relu"),
keras.layers.Dense(100, activation="relu"),
keras.layers.Dense(10, activation="softmax")
])
model.layers
The model's summary() method will display all the model's layers, including each layer's name, its output shape, and its number of parameters, including trainable and non-trainable parameters.

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
flatten (Flatten)            (None, 784)               0
dense (Dense)                (None, 300)               235500
dense_1 (Dense)              (None, 100)               30100
dense_2 (Dense)              (None, 10)                1010
=================================================================
Total params: 266,610
Trainable params: 266,610
Non-trainable params: 0
_________________________________________________________________

keras.utils.plot_model(model, "my_fashion_mnist_model.png", show_shapes=True)
hidden1 = model.layers[1]
hidden1.name
'dense'

model.get_layer(hidden1.name) is hidden1


True

weights, biases = hidden1.get_weights()
weights

array([[ 0.02448617, -0.00877795, -0.02189048, ..., -0.02766046,  0.03859074, -0.06889391],
       [ 0.00476504, -0.03105379, -0.0586676 , ...,  0.00602964, -0.02763776, -0.04165364],
       [-0.06189284, -0.06901957,  0.07102345, ..., -0.04238207,  0.07121518, -0.07331658],
       ...,
       [-0.03048757,  0.02155137, -0.05400612, ..., -0.00113463,  0.00228987,  0.05581069],
       [ 0.07061854, -0.06960931,  0.07038955, ..., -0.00384101,  0.00034875,  0.02878492],
       [-0.06022581,  0.01577859, -0.02585464, ..., -0.00527829,  0.00272203, -0.06793761]], dtype=float32)

biases

array([0., 0., 0., ..., 0., 0., 0.], dtype=float32)
(The biases form a vector of 300 zeros, since Dense layers initialize their bias terms to zero.)

Compiling the Image Classification Model

After a model is created, you must call its compile() method to specify the loss function and the optimizer to use. Optionally, you can specify a list of extra metrics to compute during training and evaluation:

model.compile(loss="sparse_categorical_crossentropy",
              optimizer="sgd",
              metrics=["accuracy"])

Training and Evaluating the Image Classification Model

Now the model is ready to be trained. For this we simply need to call its fit() method:

history = model.fit(X_train, y_train, epochs=30,
                    validation_data=(X_valid, y_valid))
The fit() method returns a History object containing the training parameters, the list of epochs it went through, and most importantly a dictionary containing the loss and extra metrics it measured at the end of each epoch on the training set and on the validation set. If you use this dictionary to create a pandas DataFrame and call its plot() method, you can see the learning curves of our trained model:

import pandas as pd

pd.DataFrame(history.history).plot(figsize=(8, 5))
plt.grid(True)
plt.gca().set_ylim(0, 1)
plt.savefig("keras_learning_curves_plot.png")  # save_fig() in the original notebook wraps plt.savefig()
plt.show()
You can see that both the training accuracy and the validation accuracy steadily increase during training, while
the training loss and the validation loss decrease.

Once you are satisfied with your model's validation accuracy, you should evaluate it on a test set to estimate the generalization error before you deploy it to production. You can easily do this using the evaluate() method:

model.evaluate(X_test, y_test)


313/313 [==============================] - 0s 2ms/step - loss: 0.3382 - accuracy: 0.8822
[0.3381877839565277, 0.8822000026702881]

Use the Model to Make Predictions

Next, we can use the model’s predict() method to make predictions on new instances. Since we don’t have actual
new instances, we will just use the first three instances of the test set:

X_new = X_test[:3]
y_proba = model.predict(X_new)
y_proba.round(2)
array([[0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.03, 0. , 0.96], [0. , 0. , 0.99, 0. , 0.01, 0. , 0. , 0. , 0. , 0. ], [0. , 1. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ]], dtype=float32)
y_pred = np.argmax(model.predict(X_new), axis=-1)  # predict_classes() is deprecated/removed in recent TensorFlow versions
y_pred
array([9, 2, 1])

np.array(class_names)[y_pred]


array(['Ankle boot', 'Pullover', 'Trouser'], dtype='<U11')

Here, the classification model actually classified all three images correctly:
y_new = y_test[:3]
plt.figure(figsize=(7.2, 2.4))
for index, image in enumerate(X_new):
    plt.subplot(1, 3, index + 1)
    plt.imshow(image, cmap="binary", interpolation="nearest")
    plt.axis('off')
    plt.title(class_names[y_test[index]], fontsize=12)
plt.subplots_adjust(wspace=0.2, hspace=0.5)
plt.savefig('fashion_mnist_images_plot.png')  # save_fig() in the original notebook wraps plt.savefig()
plt.show()

5.8 Image Classification Using CNNs


What is Image classification?
The process of categorizing and labeling groups of pixels or vectors within an image based on specific rules. The categorization law can be devised using one or more spectral or textural characteristics. Two general methods of classification are 'supervised' and 'unsupervised'.

What is a CNN?
A CNN-based image classifier is a model specifically designed to classify images into different predefined classes. It learns to extract relevant features from input images and map them to the corresponding classes, enabling accurate image classification.
The overall workflow follows the steps below (remember to make appropriate changes according to your setup):
Step 1: Choose a Dataset. ...
Step 2: Prepare Dataset for Training. ...
Step 3: Create Training Data. ...
Step 4: Shuffle the Dataset. ...
Step 5: Assigning Labels and Features. ...
Step 6: Normalizing X and Converting Labels to Categorical Data. ...
Step 7: Split X and Y for Use in CNN.

Image classification using Convolutional Neural Networks (CNNs) is a popular and effective approach in the field
of computer vision. CNNs are particularly well-suited for tasks like image classification because they can
automatically learn hierarchical representations of features from raw pixel values. Here's a general outline of the
steps involved in building an image classification system using CNNs:

Dataset Preparation:
Collect a labeled dataset of images for training and testing. Ensure that the dataset is diverse and representative
of the target classes. Split the dataset into training and testing sets.
Data Preprocessing:
Resize images to a standard size.
Normalize pixel values to a common scale (e.g., between 0 and 1).
Augment the dataset with techniques like rotation, flipping, and zooming to increase variability in the training set.
Building the CNN Model:
Import necessary libraries (e.g., TensorFlow, PyTorch, Keras).
Define the CNN architecture, typically consisting of convolutional layers, pooling layers, and fully connected layers.
Add activation functions (e.g., ReLU) to introduce non-linearity.
Use dropout layers to reduce overfitting.
Choose an appropriate output layer activation function based on the number of classes in your problem (e.g., softmax for multi-class classification).
Compiling the Model:
Specify the optimizer (e.g., Adam, SGD), loss function (e.g., categorical cross-entropy), and evaluation metric (e.g., accuracy). Compile the model.
Training the Model:
Feed the training data into the model.
Adjust the model weights during training using backpropagation and optimization algorithms.
Monitor training performance using validation data.
Evaluation:
Evaluate the trained model on the test set to assess its performance.
Analyze metrics like accuracy, precision, recall, and F1 score.
Fine-tuning:
Fine-tune the model based on performance metrics.
Adjust hyper parameters or experiment with different architectures if needed.
Prediction:
Use the trained model for making predictions on new, unseen data.
Deployment:
Deploy the model in a production environment if necessary.
Optimize the model for inference speed and resource usage.

Here's a simple example using Python and TensorFlow/Keras:
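The example code itself did not survive extraction, so what follows is a minimal sketch of what such an example might look like, assuming the CIFAR-10 dataset that ships with Keras; it is an illustration, not the original author's code.

import tensorflow as tf
from tensorflow.keras import layers, models

# Load and normalize CIFAR-10 (32x32 RGB images, 10 classes)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Two convolution + pooling blocks followed by a small dense classifier
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))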


This is a basic example, and the architecture and hyperparameters might need adjustments depending on your specific task and dataset. Experimentation and tuning are crucial for achieving the best performance.

Why CNN for Image Classification?


Image classification involves the extraction of features from the image to observe some patterns in the dataset.
Using an ANN for the purpose of image classification would end up being very costly in terms of computation
since the trainable parameters become extremely large.

For example, if we have a 50 x 50 image of a cat and we want to train a traditional ANN (with one hidden layer of 100 neurons and 2 output neurons) to classify it as a dog or a cat, the number of trainable parameters becomes:
(50 * 50) * 100 input-to-hidden weights + 100 hidden biases + 100 * 2 hidden-to-output weights + 2 output biases = 250,000 + 100 + 200 + 2 = 250,302

We use filters when working with CNNs. Filters come in many different types according to their purpose.
(Figure: examples of different filters and their effects.)
Filters help us exploit the spatial locality of a particular image by enforcing a local connectivity pattern between
neurons. Convolution basically means a pointwise multiplication of two functions to produce a third function.
Here one function is our image pixels matrix and another is our filter. We slide the filter over the image and get
the dot product of the two matrices. The resulting matrix is called an “Activation Map” or “Feature Map”.
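As a concrete illustration (a small sketch added here, not part of the original text), the sliding dot product that produces a feature map can be written directly in NumPy; the 3x3 vertical-edge filter below is a common example of such a filter:

import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel over the image and take the dot product at every position
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r+kh, c:c+kw] * kernel)
    return out  # the resulting "activation map" / "feature map"

image = np.random.rand(28, 28)            # toy grayscale image
vertical_edge = np.array([[1, 0, -1],
                          [1, 0, -1],
                          [1, 0, -1]])    # responds strongly to vertical edges
feature_map = convolve2d(image, vertical_edge)
print(feature_map.shape)                  # (26, 26)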

(Figure: a vertical-edge filter applied to an image for CNN image classification.)

There are multiple convolutional layers extracting features from the image and finally the output layer.

How Are CNNs Used in Image Classification?

Image classification involves assigning labels or classes to input images. It is a supervised learning task where a model is trained on labeled image data to predict the class of unseen images. CNNs are commonly used for image classification as they can learn hierarchical features like edges, textures, and shapes, enabling accurate object recognition in images. CNNs excel in this task because they can automatically extract meaningful spatial features from images.

Here are different layers involved in the process:

Input Layer
The input layer of a CNN takes in the raw image data as input. The images are typically represented as matrices
of pixel values. The dimensions of the input layer correspond to the size of the input images (e.g., height,
width, and color channels).
Convolutional Layers
Convolutional layers are responsible for feature extraction. They consist of filters (also known as kernels) that
are convolved with the input images to capture relevant patterns and features. These layers learn to detect
edges, textures, shapes, and other important visual elements.
Pooling Layers
Pooling layers reduce the spatial dimensions of the feature maps produced by the convolutional layers. They
perform downsampling operations (e.g., max pooling) to retain the most salient information while discarding
unnecessary details. This helps in achieving translation invariance and reducing computational complexity.

Fully Connected Layers


The output of the last pooling layer is flattened and connected to one or more fully connected layers. These layers
function as traditional neural network layers and classify the extracted features. The fully connected layers learn
complex relationships between features and output class probabilities or predictions.
Output Layer
The output layer represents the final layer of the CNN. It consists of neurons equal to the number of distinct
classes in the classification task. The output layer provides each class’s classification probabilities or predictions,
indicating the likelihood of the input image belonging to a particular class.

(Figure: the sequence of layers/steps in a convolutional neural network.)


Tutorial: CNN Image Classification with Keras

I will be working on Google Colab and I have connected the dataset through Google Drive, so the code provided
by me should work if the same setup is being used. Remember to make appropriate changes according to your
setup.

Step 1: Choose a Dataset

Choose a dataset of your interest or you can also create your own image dataset for solving your own image
classification problem. An easy place to choose a dataset is on kaggle.com.

The dataset I’m going with can be found here. This dataset contains 12,500 augmented images of blood cells
(JPEG) with accompanying cell type labels (CSV). There are approximately 3,000 images for each of 4 different cell
types grouped into 4 different folders (according to cell type). The cell types are Eosinophil, Lymphocyte,
Monocyte, and Neutrophil.

Here are all the libraries that we would require and the code for importing them:

from keras.models import Sequential
import tensorflow as tf
import tensorflow_datasets as tfds
tf.enable_eager_execution()  # only needed (and only available) in TensorFlow 1.x; TF 2.x is eager by default
from keras.layers.core import Dense, Activation, Dropout, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.optimizers import SGD, RMSprop, Adam
from keras.utils import np_utils
from sklearn.tree import DecisionTreeClassifier  # Decision Tree Classifier (imported in the source, not used by the CNN)
from sklearn import metrics
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np
import os
import cv2
import random
from numpy import *
from PIL import Image
import theano

Step 2: Prepare Dataset for Training


Preparing our dataset for training will involve assigning paths, creating categories (labels), and resizing our images.

Resizing images to 200 x 200:

path_test = "/content/drive/My Drive/semester 5 - ai ml/datasetHomeAssign/TRAIN"
CATEGORIES = ["EOSINOPHIL", "LYMPHOCYTE", "MONOCYTE", "NEUTROPHIL"]
# img_array is assumed to be a sample image loaded earlier with cv2.imread(); print its shape to check the original size
print(img_array.shape)
IMG_SIZE = 200
new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))

Step 3: Create Training Data


training is a list that will contain the image pixel values and the index of each image's category in the CATEGORIES list.

training = []

def createTrainingData():
    for category in CATEGORIES:
        path = os.path.join(path_test, category)
        class_num = CATEGORIES.index(category)
        for img in os.listdir(path):
            img_array = cv2.imread(os.path.join(path, img))
            new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
            training.append([new_array, class_num])

createTrainingData()

Step 4: Shuffle the Dataset


random.shuffle(training)
Step 5: Assigning Labels and Features
The shapes of these two lists will be used when classifying with the neural network.

X = []
y = []
for features, label in training:
    X.append(features)
    y.append(label)
X = np.array(X).reshape(-1, IMG_SIZE, IMG_SIZE, 3)
Step 6: Normalising X and Converting Labels to Categorical Data
X = X.astype('float32')
X /= 255
from keras.utils import np_utils
Y = np_utils.to_categorical(y, 4)
print(Y[100])
print(Y.shape)
Step 7: Split X and Y for Use in CNN
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 4)
Step 8: Define, Compile and Train the CNN Model
batch_size = 16
nb_classes =4
nb_epochs = 5
img_rows, img_columns = 200, 200
img_channel = 3
nb_filters = 32
nb_pool = 2
nb_conv = 3

model = tf.keras.Sequential([
tf.keras.layers.Conv2D(32, (3,3), padding='same', activation=tf.nn.relu,
input_shape=(200, 200, 3)),
tf.keras.layers.MaxPooling2D((2, 2), strides=2),
tf.keras.layers.Conv2D(32, (3,3), padding='same', activation=tf.nn.relu),
tf.keras.layers.MaxPooling2D((2, 2), strides=2),
tf.keras.layers.Dropout(0.5),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(128, activation=tf.nn.relu),
tf.keras.layers.Dense(4, activation=tf.nn.softmax)
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=batch_size, epochs=nb_epochs, verbose=1,
          validation_data=(X_test, y_test))
Step 9: Accuracy and Score of Model
score = model.evaluate(X_test, y_test, verbose = 0 )
print("Test Score: ", score[0])
print("Test accuracy: ", score[1])

In these 9 simple steps, you would be ready to train your own Convolutional Neural Network model and solve real-world problems using these skills. You can practice these skills on platforms like Analytics Vidhya and Kaggle. You can also play around by changing different parameters and discovering how you would get the best accuracy and score. Try changing the batch size, the number of epochs, or even adding/removing layers in the CNN model, and observe how the accuracy and score change.

CNN image classification has revolutionized the field of computer vision, enabling accurate recognition of
objects within images. With its ability to automatically learn and extract complex features, CNNs have become
a powerful tool for various applications. To further enhance your understanding and skills in CNN image
classification and other advanced data science techniques, consider enrolling in our Black belt Program. This
comprehensive program offers in-depth knowledge and practical experience, empowering you to become a
proficient data scientist.

5.9 Image Classification Using Machine Learning Approaches:


Image classification is a supervised learning problem: we define a set of target classes (objects to identify in images) and train a model to recognize them using labeled example photos.
Early computer vision models relied on raw pixel data as the input to the model.

➢ Image classification is a task in machine learning that involves categorizing images into
predefined classes or labels. It is achieved by training a machine learning algorithm on a
dataset of labeled images, which allows the algorithm to learn patterns and features that
differentiate one class from another. Once trained, the algorithm can classify new, unseen
images into the appropriate classes.

➢ Image classification, a cornerstone of computer vision, empowers computers to automatically identify and categorize images based on their visual content. Its applications span a vast spectrum, from object detection and scene recognition to medical image analysis and content-based image retrieval.

➢ Image classification is the task of assigning a label to an image based on its content. For
example, an image classifier can recognize whether an image contains a cat, a dog, a flower, or
a car. Image classification is one of the most common applications of machine learning and
computer vision.
There are different approaches to perform image classification using machine learning, depending on
the type of features and algorithms used. Some of the most popular approaches are:
➢ Multilayer Perceptron (MLP): This approach treats an image as a vector of pixel values, and
feeds it directly to a neural network with multiple hidden layers. The neural network learns to
extract features and classify the image in an end-to-end manner.

MLP is more powerful than BoVW (Bag of Visual Words, a classical approach that represents an image as a histogram of local feature descriptors), as it can learn non-linear and complex features from the raw pixels. However, MLP is also more prone to overfitting, as it has many parameters to tune and requires a large amount of training data.

Moreover, MLP does not take advantage of the spatial structure and the local patterns in the
image, as it treats each pixel independently.
➢ Convolutional Neural Network (CNN): This approach is a special type of neural network that
uses convolutional layers to extract features from the image.

A convolutional layer consists of a set of filters that slide over the image and produce a feature
map, which captures the presence of certain patterns or shapes in the image. By stacking
multiple convolutional layers, the network can learn hierarchical and abstract features from the
image, such as edges, textures, shapes, and objects.

CNN also uses pooling layers to reduce the dimensionality and increase the invariance of the
features. A pooling layer applies a function, such as max or average, to a local region of the
feature map and outputs a single value. CNN is followed by one or more fully connected layers,
which perform the final classification.

CNN is the most advanced and successful approach for image classification, as it can learn high-level and semantic features from the image, and exploit the spatial structure and the local patterns in the image. CNN also requires fewer parameters and less training data than MLP, as it shares the weights of the filters across the image.

➢ Transfer Learning: This approach leverages the knowledge and the features learned by a pre-
trained CNN on a large and generic dataset, such as ImageNet, and applies it to a new and
specific dataset.

Transfer learning can be done in two ways: feature extraction and fine-tuning. Feature extraction
involves using the pre-trained CNN as a fixed feature extractor, and feeding its output to a new
classifier, such as SVM or MLP. Fine-tuning involves updating the weights of the pre-trained CNN,
or some of its layers, using the new dataset.

Transfer learning is useful when the new dataset is small or similar to the original dataset, as it
can improve the performance and reduce the training time of the classifier.
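As a rough illustration of the feature-extraction flavour of transfer learning (a sketch added here, assuming a Keras ImageNet-pretrained backbone and placeholder data; not code from the original text):

import tensorflow as tf

num_classes = 4  # placeholder: set to the number of classes in your own dataset

# Feature extraction: reuse a frozen ImageNet-pretrained backbone and train only a new classifier head
base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                         include_top=False,
                                         weights='imagenet',
                                         pooling='avg')
base.trainable = False  # set to True (or unfreeze only the top layers) for fine-tuning instead

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(num_classes, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(train_images, train_labels, epochs=5)  # train_images/train_labels are assumed to exist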

These are some of the main approaches to perform image classification using machine learning. Each
approach has its own advantages and disadvantages, and the choice of the best approach depends on
the characteristics and the requirements of the problem.

Traditional vs. Machine Learning Approaches


• Traditional image classification methods often rely on hand-crafted features, such as edges,
shapes, and textures, which are extracted from the image and used to represent its content.
These features are then fed into a classifier, such as a support vector machine (SVM) or a neural
network, to determine the image's label.

• Machine learning approaches, on the other hand, have revolutionized image classification by
automating the feature extraction process. These methods, particularly deep learning models,
can learn hierarchical representations of images directly from raw pixel data, capturing complex
patterns and relationships that may be difficult to define manually.

How do I choose the best approach for my problem?


Choosing the best approach for your problem depends on several factors, such as:

• The size and the quality of your dataset: If you have a large and diverse dataset, you can use
more complex and powerful approaches, such as CNN or MLP, to learn from the raw pixels. If
you have a small or noisy dataset, you can use simpler and faster approaches, such as BoVW, or
use transfer learning to leverage the features learned by a pre-trained CNN.
• The similarity and the complexity of your classes: If your classes are very similar or very
complex, you need more discriminative and abstract features, which can be obtained by using
CNN or transfer learning. If your classes are very different or very simple, you can use more
generic and low-level features, which can be obtained by using BoVW or MLP.
• The computational resources and the time constraints: If you have limited resources or time,
you can use more efficient and scalable approaches, such as BoVW or feature extraction, which
require less parameters and less training time. If you have more resources or time, you can use
more expressive and accurate approaches, such as CNN or fine-tuning, which require more
parameters and more training time.

These are some of the general guidelines to help you choose the best approach for your problem.
However, there is no definitive answer, and you may need to experiment with different approaches and
compare their results to find the optimal solution for your problem.

Popular Machine Learning Algorithms for Image Classification


Numerous machine learning algorithms have been successfully applied to image classification problems.
Some of the most widely used algorithms include:

• Support Vector Machines (SVMs): SVMs are a powerful classification algorithm that finds a
hyperplane that best separates data points of different classes.

• Random Forests: Random forests are ensemble methods that combine multiple decision trees to
improve classification accuracy.

• K-Nearest Neighbors (KNN): KNN is a non-parametric algorithm that classifies an image based on
the labels of its k nearest neighbors in the feature space.

• Deep Learning Models: Deep learning models, particularly CNNs, have achieved state-of-the-art
results in image classification tasks.

Evaluation Metrics for Image Classification


The performance of image classification models is typically evaluated using various metrics, including:

• Accuracy: Accuracy is the proportion of correctly classified images.

• Precision: Precision is the proportion of positive predictions that are actually correct.

• Recall: Recall is the proportion of actual positive cases that are correctly identified.

• F1-score: The F1-score is a harmonic mean of precision and recall, providing a balanced measure
of classification performance.
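These four metrics are all available in scikit-learn; a minimal sketch with toy labels (an added illustration, not from the original text):

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 1, 1, 0, 1, 0, 1, 1]   # toy ground-truth labels
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]   # toy model predictions

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))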

Applications of Image Classification in the Real World


Image classification has found widespread applications in various domains, including:

• Object Recognition: Identifying and classifying objects in images, such as cars, pedestrians, and
animals, has applications in autonomous vehicles, surveillance systems, and robotics.

• Medical Diagnosis: Classifying medical images, such as X-rays and MRI scans, can assist doctors
in diagnosing diseases and identifying abnormalities.
• Content-Based Image Retrieval: Enabling users to search for images based on their content,
such as finding images of cats or landscapes.

• Product Classification: Classifying products in e-commerce images for product categorization and search.

• Satellite Image Analysis: Classifying land cover types and identifying features in satellite imagery
for environmental monitoring and urban planning.

Conclusion
Image classification has become an indispensable tool in various fields, driven by advancements in
machine learning, particularly deep learning. With the increasing availability of labeled image data and
computational resources, machine learning approaches are continuously pushing the boundaries of
image classification performance, enabling new and groundbreaking applications.

5.10 Decision Trees

A decision tree is one of the most powerful tools of supervised learning algorithms used for both
classification and regression tasks.

It builds a flowchart-like tree structure where each internal node denotes a test on an attribute, each branch
represents an outcome of the test, and each leaf node (terminal node) holds a class label. It is constructed by
recursively splitting the training data into subsets based on the values of the attributes until a stopping
criterion is met, such as the maximum depth of the tree or the minimum number of samples required to split
a node.

During training, the Decision Tree algorithm selects the best attribute to split the data based on a metric
such as entropy or Gini impurity, which measures the level of impurity or randomness in the subsets.
The goal is to find the attribute that maximizes the information gain or the reduction in impurity after the
split.

What is a decision tree?


• A decision tree is a graphical representation of a decision-making process, where each node represents
a question or a condition, each branch represents an answer or an outcome, and each leaf represents a
final decision or a prediction.

• A decision tree can be used for both classification and regression problems, where the goal is to assign
a label or a value to a given input based on a set of rules or criteria.

• A decision tree can be constructed by recursively splitting the data into smaller and smaller subsets,
based on the values of one or more features, until a stopping criterion is met, such as the maximum
depth of the tree, the minimum number of samples in a node, or the purity of the node. The feature
that is used to split the data at each node is chosen by a splitting criterion, such as the information gain
or the Gini index, which measures how much the split reduces the impurity or the uncertainty in the
data. The impurity of a node is a measure of how mixed or homogeneous the samples in the node are,
in terms of their labels or values. The lower the impurity, the more confident the prediction.

• A decision tree can be easily interpreted and understood, as it mimics the human way of thinking and
reasoning. It can also handle both numerical and categorical features, and can deal with missing values
and outliers. However, a decision tree can also suffer from some drawbacks, such as overfitting,
instability, and bias.

Decision Tree Terminologies


Some of the common Terminologies used in Decision Trees are as follows:
• Root Node: It is the topmost node in the tree, which represents the complete dataset. It is the
starting point of the decision-making process.
• Decision/Internal Node: A node that symbolizes a choice regarding an input feature. Branching off
of internal nodes connects them to leaf nodes or other internal nodes.
• Leaf/Terminal Node: A node without any child nodes that indicates a class label or a numerical
value.
• Splitting: The process of splitting a node into two or more sub-nodes using a split criterion and a
selected feature.
• Branch/Sub-Tree: A subsection of the decision tree starts at an internal node and ends at the leaf
nodes.
• Parent Node: The node that divides into one or more child nodes.
• Child Node: The nodes that emerge when a parent node is split.
• Impurity: A measurement of the target variable's homogeneity in a subset of data. It refers to the degree of randomness or uncertainty in a set of examples. The Gini index and entropy are two commonly used impurity measurements in decision trees for classification tasks.
• Variance: Variance measures how much the predicted and the target variables vary in different
samples of a dataset. It is used for regression problems in decision trees. Mean squared error,
Mean Absolute Error, friedman_mse, or Half Poisson deviance are used to measure the variance
for the regression tasks in the decision tree.
• Information Gain: Information gain is a measure of the reduction in impurity achieved by splitting a
dataset on a particular feature in a decision tree. The splitting criterion is determined by the
feature that offers the greatest information gain, It is used to determine the most informative
feature to split on at each node of the tree, with the goal of creating pure subsets
• Pruning: The process of removing branches from the tree that do not provide any additional
information or lead to overfitting.

Attribute Selection Measures:

Construction of Decision Tree:


A tree can be “learned” by splitting the source set into subsets based on Attribute Selection Measures.
Attribute selection measure (ASM) is a criterion used in decision tree algorithms to evaluate the usefulness of
different attributes for splitting a dataset.
The goal of ASM is to identify the attribute that will create the most homogeneous subsets of data after the
split, thereby maximizing the information gain. This process is repeated on each derived subset in a recursive
manner called recursive partitioning.
The recursion is completed when the subset at a node all has the same value of the target variable, or when
splitting no longer adds value to the predictions. The construction of a decision tree classifier does not require
any domain knowledge or parameter setting and therefore is appropriate for exploratory knowledge discovery.
Decision trees can handle high-dimensional data.
Entropy:
Entropy is the measure of the degree of randomness or uncertainty in the dataset. In the case of
classifications, It measures the randomness based on the distribution of class labels in the dataset.
The entropy for a subset of the original dataset having K number of classes for the ith node can be defined as:
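The equation itself was lost in extraction; the standard definition it refers to is:

H_i(S) = -\sum_{k=1}^{K} p(k)\,\log_2 p(k)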

Where,
• S is the dataset sample.
• k is the particular class from K classes
• p(k) is the proportion of the data points that belong to class k to the total number of data points in
dataset sample S.

• Here p(k) should not be equal to zero (classes with p(k) = 0 simply contribute nothing to the sum).

Important points related to Entropy:


1. The entropy is 0 when the dataset is completely homogeneous, meaning that each instance
belongs to the same class. It is the lowest entropy indicating no uncertainty in the dataset
sample.
2. when the dataset is equally divided between multiple classes, the entropy is at its maximum
value. Therefore, entropy is highest when the distribution of class labels is even, indicating
maximum uncertainty in the dataset sample.
3. Entropy is used to evaluate the quality of a split. The goal of entropy is to select the attribute that
minimizes the entropy of the resulting subsets, by splitting the dataset into more homogeneous
subsets with respect to the class labels.
4. The highest information gain attribute is chosen as the splitting criterion (i.e., the reduction in
entropy after splitting on that attribute), and the process is repeated recursively to build the
decision tree.

Gini Impurity
Gini Impurity is a score that evaluates how accurate a split is among the classified groups. The Gini Impurity
evaluates a score in the range between 0 and 1, where 0 is when all observations belong to one class, and 1
is a random distribution of the elements within classes. In this case, we want to have a Gini index score as low
as possible. Gini Index is the evaluation metric we shall use to evaluate our Decision Tree Model.
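The Gini Impurity formula referenced here (reconstructed, since the original equation image is missing) is:

Gini = 1 - \sum_{i=1}^{K} p_i^2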

Here, pi is the proportion of elements in the set that belongs to the ith category.

Information Gain:
Information gain measures the reduction in entropy or variance that results from splitting a dataset based on
a specific property. It is used in decision tree algorithms to determine the usefulness of a feature by
partitioning the dataset into more homogeneous subsets with respect to the class labels or target variable.
The higher the information gain, the more valuable the feature is in predicting the target variable.
The information gain of an attribute A, with respect to a dataset S, is calculated as follows:
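The formula (reconstructed from the standard definition, as the original equation image is missing) is:

IG(S, A) = H(S) - \sum_{v \in values(A)} \frac{|S_v|}{|S|}\, H(S_v)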

where
• A is the specific attribute being evaluated
• H(S) is the entropy of the dataset sample S
• S_v is the subset of S whose instances have the value v for attribute A, and |S_v| is the number of such instances
Information gain measures the reduction in entropy or variance achieved by partitioning the dataset on
attribute A. The attribute that maximizes information gain is chosen as the splitting criterion for building the
decision tree.
Information gain is used in both classification and regression decision trees. In classification, entropy is used as the measure of impurity, while in regression, variance is used as the measure of impurity. The information gain calculation remains the same in both cases, except that variance replaces entropy in the formula for regression.

How does the Decision Tree algorithm Work?

The decision tree operates by analyzing the data set to predict its classification. It commences from the tree's root node, where the algorithm compares the value of the root attribute with the corresponding attribute of the record in the actual data set. Based on the comparison, it follows the appropriate branch and moves to the next node.
The algorithm repeats this action for every subsequent node by comparing its attribute values with those of the
sub-nodes and continuing the process further. It repeats until it reaches the leaf node of the tree. The
complete mechanism can be better explained through the algorithm given below.
• Step-1: Begin the tree with the root node, says S, which contains the complete dataset.
• Step-2: Find the best attribute in the dataset using Attribute Selection Measure (ASM).
• Step-3: Divide the S into subsets that contains possible values for the best attributes.
• Step-4: Generate the decision tree node, which contains the best attribute.
• Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where you cannot further classify the nodes, and call the final node a leaf node. This recursive procedure is the basis of the Classification and Regression Tree (CART) algorithm.

First Example: Decision Tree with two binary features


Before creating the decision tree for our entire dataset, we will first consider a subset, that only considers two
features: ‘likes gravity’ and ‘likes dogs’.
The first thing we have to decide is, which feature is going to be the root node. We do that by predicting the target
with only one of the features and then use the feature, that has the lowest Gini Impurity as the root node. That is,
in our case we build two shallow trees, with just the root node and two leafs. In the first case we use ‘likes gravity’
as a root node and in the second case ‘likes dogs’. We then calculate the Gini Impurity for both. The trees look like
this:

The Gini Impurity for these trees are calculated as follows:
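The numeric values of these calculations appeared in figures that are not reproduced here; the formulas they apply are the per-leaf Gini impurity and its weighted mean over the two leaves:

Gini_{leaf} = 1 - \sum_i p_i^2, \qquad Gini_{split} = \frac{n_1}{n_1 + n_2}\,Gini_1 + \frac{n_2}{n_1 + n_2}\,Gini_2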


Case 1:
Dataset 1:

Dataset 2:

The Gini Impurity is the weighted mean of both:


Case 2:
Dataset 1:

Dataset 2:

The Gini Impurity is the weighted mean of both:

That is, the first case has lower Gini Impurity and is the chosen split. In this simple example, only one
feature remains, and we can build the final decision tree.

Final Decision Tree considering only the features ‘likes gravity’ and ‘likes dogs’

Second Example: Add a numerical Variable

Until now, we considered only a subset of our data set - the categorical variables. Now we will add the numerical
variable 'age'. The criterion for splitting is the same. We already know the Gini Impurities for 'likes gravity' and 'likes dogs'. The calculation for the Gini Impurity of a numerical variable is similar; however, finding the best split requires more calculations.

The following steps need to be done:

1. Sort the data frame by the numerical variable ('age')
2. Calculate the mean of neighbouring values
3. Calculate the Gini Impurity for all splits for each of these means

This is again our data, sorted by age, and the mean of neighbouring values is given on the left-hand side.
(Figure: the data set sorted by age; the left-hand side shows the mean of neighbouring values for age.)
We then have the following possible splits.
(Figure: possible splits for age and their Gini Impurity.)

We can see that the Gini Impurity of all possible 'age' splits is higher than the ones for 'likes gravity' and 'likes dogs'. The lowest Gini Impurity occurs when using 'likes gravity', i.e. this is our root node and the first split.

The first split of the tree. ‘likes gravity’ is the root node.

The subset Dataset 2 is already pure, that is, this node is a leaf and no further splitting is necessary. The branch on
the left-hand side, Dataset 1 is not pure and can be split further. We do this in the same way as before: We
calculate the Gini Impurity for each feature: ‘likes dogs’ and ‘age’.

Possible splits for Dataset 1.


We see that the lowest Gini Impurity is given by the split “likes dogs”. We now can build our final tree.

Final Decision Tree.

How to build decision tree


To build the decision tree, the CART (Classification and Regression Tree) algorithm is used. It works by selecting the best split at each node based on metrics like Gini impurity or information gain.
Here are the basic steps of the CART algorithm for creating a decision tree:
1. The root node of the tree is supposed to be the complete training dataset.
2. Determine the impurity of the data based on each feature present in the dataset. Impurity can be
measured using metrics like the Gini index or entropy for classification and Mean squared error,
Mean Absolute Error, friedman_mse, or Half Poisson deviance for regression.
3. Then selects the feature that results in the highest information gain or impurity reduction when
splitting the data.
4. For each possible value of the selected feature, split the dataset into two subsets (left and right),
one where the feature takes on that value, and another where it does not. The split should be
designed to create subsets that are as pure as possible with respect to the target variable.
5. Based on the target variable, determine the impurity of each resulting subset.
6. For each subset, repeat steps 2–5 iteratively until a stopping condition is met. For example, the
stopping condition could be a maximum tree depth, a minimum number of samples required to
make a split or a minimum impurity threshold.
7. Assign the majority class label for classification tasks or the mean value for regression tasks for
each terminal node (leaf node) in the tree.

Classification and Regression Tree algorithm for Classification

Let the data available at node m be Q_m, with n_m samples, and let t_m be a candidate threshold for node m. Then the classification and regression tree criterion for classification can be written as:
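The expression (reconstructed in scikit-learn's notation, since the original equation image is missing), where a candidate split \theta = (feature, t_m) partitions Q_m into left and right subsets, is:

G(Q_m, \theta) = \frac{n_m^{left}}{n_m} H\left(Q_m^{left}(\theta)\right) + \frac{n_m^{right}}{n_m} H\left(Q_m^{right}(\theta)\right)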

Here,
• H is the measure of impurities of the left and right subsets at node m. it can be entropy or Gini
impurity.
• nm is the number of instances in the left and right subsets at node m.
To select the parameter, we can write as:
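The missing expression is the usual one: the split that minimizes the weighted impurity is chosen,

\theta^* = \operatorname{argmin}_{\theta} \; G(Q_m, \theta)

The scikit-learn example below then builds such a tree on the Iris dataset.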
# Import the necessary libraries
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import export_graphviz
from graphviz import Source

# Load the dataset


iris = load_iris()
X = iris.data[:, 2:] # petal length and width
y = iris.target

# DecisionTreeClassifier
tree_clf = DecisionTreeClassifier(criterion='entropy',
max_depth=2)
tree_clf.fit(X, y)

# Plot the decision tree graph


export_graphviz(
tree_clf,
out_file="iris_tree.dot",
feature_names=iris.feature_names[2:],
class_names=iris.target_names,
rounded=True,
filled=True
)

with open("iris_tree.dot") as f:
dot_graph = f.read()

Source(dot_graph)

Output:

How to use a decision tree?


• A decision tree can be used for both classification and regression problems, depending on the type of the
target variable. A classification tree is a decision tree that predicts a categorical label, such as yes or no,
spam or ham, dog or cat, etc. A regression tree is a decision tree that predicts a numerical value, such as
the price of a house, the age of a person, the height of a tree, etc.
• To use a decision tree, one simply needs to follow the branches from the root node to a leaf node,
according to the values of the features in the input. For example, suppose we have a decision tree that
predicts whether a person will buy a product or not, based on their age, gender, and income. The decision
tree looks like this:

• To use this decision tree, we need to answer the questions or conditions at each node, starting from the
root node. For example, if we have a person who is 35 years old, male, and has an income of $50,000, we
would follow the path:

• Is age <= 30? No, go to the right child node.


• Is gender = male? Yes, go to the left child node.
• Is income <= 40,000? No, go to the right child node.

The right child node is a leaf node, which predicts that the person will not buy the product. Therefore, the
decision tree gives us a negative prediction for this person.

There are different algorithms to build a decision tree, such as ID3, C4.5, CART, and CHAID. These algorithms
differ in the way they handle the splitting criterion, the stopping criterion, the pruning technique, and the
handling of missing values and continuous features. To use a decision tree for image classification, one needs to
extract relevant features from the images, such as pixel values, color histograms, or other image descriptors, and
feed them to the decision tree algorithm.
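A minimal sketch of that idea (an added illustration, assuming scikit-learn's small 8x8 digits dataset and raw pixel values as the features):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

digits = load_digits()                                   # 8x8 grayscale digit images
X = digits.images.reshape(len(digits.images), -1)        # flatten each image into a 64-feature vector
y = digits.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = DecisionTreeClassifier(criterion='gini', max_depth=10, random_state=42)
clf.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))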

What are the advantages and disadvantages of a decision tree?


A decision tree has many advantages, such as:

• It is easy to understand and interpret, as it visualizes the decision-making process and the logic behind
the prediction.
• It can handle both numerical and categorical features, and can deal with missing values and outliers by
using different strategies, such as ignoring, replacing, or splitting.
• It can perform feature selection and dimensionality reduction, as it chooses the most relevant and
informative features to split the data.
• It is fast and scalable, as it can handle large datasets and perform parallel computations.

However, a decision tree also has some disadvantages, such as:

• It can overfit the data, especially if the tree is too deep or too complex, and capture the noise or the
outliers in the data, leading to poor generalization and high variance.
• It can be unstable, as small changes in the data or the parameters can result in large changes in the
structure and the prediction of the tree, leading to high sensitivity and low robustness.
• It can be biased, as some features or splits may be favored over others, depending on the splitting
criterion, the data distribution, and the order of the features, leading to poor accuracy and high error.

How to improve a decision tree?


There are different ways to improve a decision tree, such as:

• Tuning the parameters, such as the maximum depth, the minimum samples, the splitting criterion, the
pruning technique, etc., to find the optimal balance between the complexity and the accuracy of the
tree.
• Using cross-validation, such as k-fold or leave-one-out, to evaluate the performance of the tree on
different subsets of the data, and to avoid overfitting and underfitting.
• Using ensemble methods, such as bagging, boosting, or random forest, to combine multiple decision
trees, and to reduce the variance, the bias, and the error of the prediction.
5.11 Support Vector Machines
Support Vector Machine (SVM) is a powerful machine learning algorithm used for linear or nonlinear
classification, regression, and even outlier detection tasks.
SVMs can be used for a variety of tasks, such as text classification, image classification, spam
detection, handwriting identification, gene expression analysis, face detection, and anomaly detection. SVMs
are adaptable and efficient in a variety of applications because they can manage high-dimensional data and
nonlinear relationships.
SVM algorithms are very effective as we try to find the maximum separating hyperplane between the different
classes available in the target feature.

Support Vector Machine Terminology

1. Hyperplane: Hyperplane is the decision boundary that is used to separate the data points of
different classes in a feature space. In the case of linear classifications, it will be a linear equation
i.e. wx+b = 0.
2. Support Vectors: Support vectors are the closest data points to the hyperplane, which makes a
critical role in deciding the hyperplane and margin.
3. Margin: Margin is the distance between the support vector and hyperplane. The main objective of
the support vector machine algorithm is to maximize the margin. The wider margin indicates
better classification performance.
4. Kernel: Kernel is the mathematical function, which is used in SVM to map the original input data
points into high-dimensional feature spaces, so, that the hyperplane can be easily found out even if
the data points are not linearly separable in the original input space. Some of the common kernel
functions are linear, polynomial, radial basis function(RBF), and sigmoid.
5. Hard Margin: The maximum-margin hyperplane or the hard margin hyperplane is a hyperplane that
properly separates the data points of different categories without any misclassifications.
6. Soft Margin: When the data is not perfectly separable or contains outliers, SVM permits a soft
margin technique. Each data point has a slack variable introduced by the soft-margin SVM
formulation, which softens the strict margin requirement and permits certain misclassifications or
violations. It discovers a compromise between increasing the margin and reducing violations.
7. C: Margin maximisation and misclassification penalties are balanced by the regularisation parameter C in SVM. The penalty for going over the margin or misclassifying data items is decided by it. A stricter penalty is imposed with a greater value of C, which results in a smaller margin and perhaps fewer misclassifications.
8. Hinge Loss: A typical loss function in SVMs is hinge loss. It punishes incorrect classifications or
margin violations. The objective function in SVM is frequently formed by combining it with the
regularisation term.
9. Dual Problem: SVM can be solved through the dual of its optimisation problem, which involves
finding the Lagrange multipliers associated with the support vectors. The dual formulation enables
the use of kernel tricks and more efficient computation.
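To make the margin and hinge-loss terminology concrete, here is a minimal sketch (the numbers are invented for illustration) that computes the hinge loss max(0, 1 − t·f(x)) for a few points, assuming labels t in {−1, +1} and decision values f(x) = w·x + b:

import numpy as np

# labels in {-1, +1} and decision values f(x) = w.x + b (invented numbers)
t = np.array([+1, +1, -1, -1])
f = np.array([2.3, 0.4, -1.7, 0.2])   # the last point lies on the wrong side

# hinge loss: zero if the point is beyond the margin, linear in the violation otherwise
losses = np.maximum(0, 1 - t * f)
print(losses)          # -> 0, 0.6, 0, 1.2
print(losses.mean())   # average hinge loss over the points

Points well beyond the margin contribute nothing, points inside the margin contribute a small loss, and misclassified points contribute a loss proportional to how far they are on the wrong side.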

Support Vector Machine


• Support Vector Machine (SVM) is a supervised machine learning algorithm used for both
classification and regression.
• Though it can be used for regression problems as well, it is best suited for classification. The main objective
of the SVM algorithm is to find the optimal hyperplane in an N-dimensional space that can
separate the data points of different classes in the feature space.
• The hyperplane is chosen so that the margin between the closest points of different classes is as
large as possible.
• The dimension of the hyperplane depends upon the number of features. If the number of input
features is two, then the hyperplane is just a line.
• If the number of input features is three, then the hyperplane becomes a 2-D plane. It becomes
difficult to imagine when the number of features exceeds three.
Let’s consider two independent variables x1, x2, and one dependent variable which is either a blue circle or a
red circle.

From the figure above it’s very clear that there are multiple lines (our hyperplane here is a line because we are
considering only two input features x1, x2) that segregate our data points or do a classification between red and
blue circles. So how do we choose the best line or in general the best hyperplane that segregates our data
points?

How does SVM work?

One reasonable choice as the best hyperplane is the one that represents the largest separation or margin
between the two classes.

Multiple hyperplanes separate the data from two classes


So we choose the hyperplane whose distance from it to the nearest data point on each side is maximized. If
such a hyperplane exists it is known as the maximum-margin hyperplane/hard margin. So from the above
figure, we choose L2. Let’s consider a scenario like shown below
Selecting hyperplane for data with outlier
Here we have one blue ball in the boundary of the red ball. So how does SVM classify the data? It’s simple! The
blue ball in the boundary of red ones is an outlier of blue balls. The SVM algorithm has the characteristics to
ignore the outlier and finds the best hyperplane that maximizes the margin. SVM is robust to outliers.

Hyperplane which is the most optimized one


So for this type of data, SVM finds the maximum margin as it did for the previous data sets, and in addition
adds a penalty each time a point crosses the margin. The margins in such cases are called soft margins.
When there is a soft margin, the SVM tries to minimize (1/margin + λ·∑penalty). Hinge loss is a commonly
used penalty: if there are no violations there is no hinge loss, and if there are violations the hinge loss is
proportional to the distance of the violation.
Till now, we were talking about linearly separable data(the group of blue balls and red balls are separable by a
straight line/linear line). What to do if data are not linearly separable?

Original 1D dataset for classification


Say our data is as shown in the figure above. SVM solves this by creating a new variable using a kernel. We
take a point x_i on the line and create a new variable y_i as a function of its distance from the origin O. If we
plot this, we get something like the figure shown below.
Mapping 1D data to 2D to become able to separate the two classes
In this case, the new variable y is created as a function of distance from the origin. A non-linear
function that creates a new variable is referred to as a kernel.

Mathematical intuition of Support Vector Machine

Consider a binary classification problem with two classes, labeled as +1 and -1. We have a training dataset
consisting of input feature vectors X and their corresponding class labels Y.
The equation for the linear hyperplane can be written as:

    w · x + b = 0

The vector w represents the normal vector to the hyperplane, i.e. the direction perpendicular to the
hyperplane. The parameter b represents the offset or distance of the hyperplane from the
origin along the normal vector w.
The distance between a data point x_i and the decision boundary can be calculated as:

    d_i = (w · x_i + b) / ||w||

where ||w|| represents the Euclidean norm of the normal vector w.
For a linear SVM classifier, the prediction is:

    y = 1 if w · x + b >= 0, and y = 0 if w · x + b < 0

Optimization:
• For a hard margin linear SVM classifier:

    minimize (1/2) ||w||^2   subject to   t_i (w · x_i + b) >= 1 for all i

The target variable or label for the ith training instance is denoted by t_i: t_i = -1 for negative instances
(when y_i = 0) and t_i = 1 for positive instances (when y_i = 1). This is because we require
the decision boundary to satisfy the constraint:

    t_i (w · x_i + b) >= 1

• For a soft margin linear SVM classifier:

    minimize (1/2) ||w||^2 + C ∑ ζ_i   subject to   t_i (w · x_i + b) >= 1 − ζ_i  and  ζ_i >= 0

• Dual Problem: SVM can also be solved through the dual of the optimisation problem, which involves
finding the Lagrange multipliers associated with the support vectors. The optimal Lagrange
multipliers α_i maximize the following dual objective function:

    maximize ∑ α_i − (1/2) ∑∑ α_i α_j t_i t_j K(x_i, x_j)   subject to   α_i >= 0  and  ∑ α_i t_i = 0

where,
• αi is the Lagrange multiplier associated with the ith training sample.
• K(xi, xj) is the kernel function that computes the similarity between two samples xi and xj. It
allows SVM to handle nonlinear classification problems by implicitly mapping the samples into a
higher-dimensional feature space.
• The term ∑αi represents the sum of all Lagrange multipliers.
Once the dual problem has been solved and the optimal Lagrange multipliers have been found, the SVM
decision boundary can be described in terms of these multipliers and the support vectors. The training
samples with α_i > 0 are the support vectors, and the decision function is given by:

    f(x) = sign( ∑ α_i t_i K(x_i, x) + b )

Types of Support Vector Machine


Based on the nature of the decision boundary, Support Vector Machines (SVM) can be divided into two main
parts:
• Linear SVM: Linear SVMs use a linear decision boundary to separate the data points of different
classes. When the data can be precisely linearly separated, linear SVMs are very suitable. This
means that a single straight line (in 2D) or a hyperplane (in higher dimensions) can entirely divide
the data points into their respective classes. A hyperplane that maximizes the margin between the
classes is the decision boundary.
• Non-Linear SVM: Non-Linear SVM can be used to classify data when it cannot be separated into
two classes by a straight line (in the case of 2D). By using kernel functions, nonlinear SVMs can
handle nonlinearly separable data. The original input data is transformed by these kernel functions
into a higher-dimensional feature space, where the data points can be linearly separated. A linear
SVM is used to locate a nonlinear decision boundary in this modified space.

Popular kernel functions in SVM

The SVM kernel is a function that takes a low-dimensional input space and transforms it into a higher-dimensional
space, i.e. it converts non-separable problems into separable problems. It is mostly useful in non-linear separation
problems. Simply put, the kernel performs complex data transformations and then finds the procedure
to separate the data based on the labels or outputs defined.
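As a small illustration of how a kernel measures similarity, the sketch below evaluates the RBF kernel K(x, z) = exp(−γ‖x − z‖²) on two invented points and checks it against scikit-learn's rbf_kernel helper; the points and the γ value are only assumptions.

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x = np.array([1.0, 2.0])
z = np.array([2.0, 0.5])
gamma = 0.5

# Direct evaluation of the RBF kernel formula
k_manual = np.exp(-gamma * np.sum((x - z) ** 2))

# The same value computed by scikit-learn
k_sklearn = rbf_kernel(x.reshape(1, -1), z.reshape(1, -1), gamma=gamma)[0, 0]

print(k_manual, k_sklearn)   # both equal exp(-0.5 * 3.25), close to 0.2

Values near 1 mean the two points are very similar in the implicit feature space; values near 0 mean they are far apart.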

Advantages of SVM
• Effective in high-dimensional cases.
• Its memory is efficient as it uses a subset of training points in the decision function called support
vectors.
• Different kernel functions can be specified for the decision function, and it is possible to specify
custom kernels.

Code
# Load the important packages
from sklearn.datasets import load_breast_cancer
import matplotlib.pyplot as plt
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.svm import SVC

# Load the datasets


cancer = load_breast_cancer()
X = cancer.data[:, :2]
y = cancer.target

# Build the model
svm = SVC(kernel="rbf", gamma=0.5, C=1.0)

# Train the model
svm.fit(X, y)

# Plot the decision boundary
DecisionBoundaryDisplay.from_estimator(
    svm,
    X,
    response_method="predict",
    cmap=plt.cm.Spectral,
    alpha=0.8,
    xlabel=cancer.feature_names[0],
    ylabel=cancer.feature_names[1],
)

# Scatter plot of the two input features, coloured by class
plt.scatter(X[:, 0], X[:, 1], c=y, s=20, edgecolors="k")
plt.show()

Output:

5.12 Logistic Regression


Logistic regression is a supervised machine learning algorithm mainly used for classification tasks, where the
goal is to predict the probability that an instance belongs to a given class or not.
It is a statistical algorithm that analyses the relationship between a set of independent variables and a binary
dependent variable, which makes it a powerful tool for decision-making, for example classifying an email as
spam or not spam.

Although it is used for classification, it is still called logistic regression because it takes the output of a
linear regression function as input and uses a sigmoid function to estimate the probability of the given
class.

The difference between linear regression and logistic regression is that the output of linear regression is a
continuous value that can be anything, while logistic regression predicts the probability that an instance
belongs to a given class or not.
Logistic Regression:
It is used for predicting the categorical dependent variable using a given set of independent variables.
• Logistic regression predicts the output of a categorical dependent variable. Therefore the
outcome must be a categorical or discrete value.
• It can be either Yes or No, 0 or 1, true or False, etc. but instead of giving the exact value as 0 and
1, it gives the probabilistic values which lie between 0 and 1.
• Logistic Regression is very similar to Linear Regression except in how it is used: Linear
Regression is used for solving regression problems, whereas Logistic Regression is used for
solving classification problems.
• In Logistic regression, instead of fitting a regression line, we fit an “S” shaped logistic function,
which predicts two maximum values (0 or 1).
• The curve from the logistic function indicates the likelihood of something such as whether the
cells are cancerous or not, a mouse is obese or not based on its weight, etc.
• Logistic Regression is a significant machine learning algorithm because it has the ability to
provide probabilities and classify new data using continuous and discrete datasets.
• Logistic Regression can be used to classify the observations using different types of data and can
easily determine the most effective variables used for the classification.
Logistic Function (Sigmoid Function):
• The sigmoid function is a mathematical function used to map the predicted values to
probabilities.
• It maps any real value into another value within the range of 0 and 1.
• The value of the logistic regression must be between 0 and 1; since it cannot go beyond this limit, it
forms a curve like the "S" shape.
• The S-shaped curve is called the sigmoid function or the logistic function.
• In logistic regression, we use the concept of a threshold value, which defines the boundary between
predicting 0 or 1: values above the threshold tend towards 1, and values below the threshold tend
towards 0 (a small sketch of this follows the list).
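A minimal sketch of the sigmoid and of thresholding its output; the threshold of 0.5 and the sample values are assumptions for illustration.

import numpy as np

def sigmoid(z):
    # Maps any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
probs = sigmoid(z)
print(probs)                  # values squeezed between 0 and 1

# Apply a decision threshold of 0.5 to turn probabilities into class labels
labels = (probs >= 0.5).astype(int)
print(labels)                 # [0 0 1 1 1]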
Type of Logistic Regression:
On the basis of the categories, Logistic Regression can be classified into three types:
1. Binomial: In binomial Logistic regression, there can be only two possible types of the dependent
variables, such as 0 or 1, Pass or Fail, etc.
2. Multinomial: In multinomial Logistic regression, there can be 3 or more possible unordered types
of the dependent variable, such as “cat”, “dogs”, or “sheep”
3. Ordinal: In ordinal Logistic regression, there can be 3 or more possible ordered types of dependent
variables, such as “low”, “Medium”, or “High”.
Terminologies involved in Logistic Regression:
Here are some common terms involved in logistic regression:
• Independent variables: The input characteristics or predictor factors applied to the dependent
variable’s predictions.
• Dependent variable: The target variable in a logistic regression model, which we are trying to
predict.
• Logistic function: The formula used to represent how the independent and dependent variables
relate to one another. The logistic function transforms the input variables into a probability value
between 0 and 1, which represents the likelihood of the dependent variable being 1 or 0.
• Odds: The ratio of something occurring to something not occurring. It is different from
probability, which is the ratio of something occurring to everything that could possibly
occur (a worked example follows this list).
• Log-odds: The log-odds, also known as the logit function, is the natural logarithm of the odds. In
logistic regression, the log odds of the dependent variable are modeled as a linear combination of
the independent variables and the intercept.
• Coefficient: The logistic regression model’s estimated parameters, show how the independent and
dependent variables relate to one another.
• Intercept: A constant term in the logistic regression model, which represents the log odds when all
independent variables are equal to zero.
• Maximum likelihood estimation: The method used to estimate the coefficients of the logistic
regression model, which maximizes the likelihood of observing the data given the model.
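A short worked example of odds and log-odds (the probability value is chosen only for illustration): if the predicted probability of the positive class is p = 0.8, then the odds are p / (1 − p) = 0.8 / 0.2 = 4, and the log-odds (logit) are ln(4) ≈ 1.386. In logistic regression this logit is exactly the linear term w · X + b, which is why the coefficients can be read as changes on the log-odds scale.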

How does Logistic Regression work?


The logistic regression model transforms the linear regression function continuous value output into
categorical value output using a sigmoid function, which maps any real-valued set of independent variables
input into a value between 0 and 1. This function is known as the logistic function.
Let the independent input features be X = (x_1, x_2, ..., x_n) and let the dependent variable Y take only
binary values, i.e. 0 or 1.
We then apply the multi-linear function to the input variables X:

    z = w · X + b = w_1 x_1 + w_2 x_2 + ... + w_n x_n + b

Here x_i is the ith observation of X, w_i is the corresponding weight or coefficient, and b is the bias term,
also known as the intercept. This can simply be represented as the dot product of the weights and the input,
plus the bias.

Everything discussed above is just linear regression.

Sigmoid Function

Now we use the sigmoid function, whose input is z, to obtain a probability between 0 and 1, i.e. the
predicted y:

    σ(z) = 1 / (1 + e^(−z))

The sigmoid function converts the continuous value z into a probability between 0 and 1.

The odds are the ratio of something occurring to something not occurring; they are different from
probability, which is the ratio of something occurring to everything that could possibly occur. So the odds
will be

    p(x) / (1 − p(x))

Applying the natural log to the odds, the log odds will be

    log( p(x) / (1 − p(x)) ) = w · X + b

Then the final logistic regression equation will be:

    p(X; b, w) = 1 / (1 + e^(−(w · X + b)))

Likelihood function for Logistic Regression


The predicted probability will be p(X; b, w) = p(x) for y = 1, and for y = 0 the predicted probability will be
1 − p(X; b, w) = 1 − p(x). The likelihood over the whole dataset is therefore

    L(b, w) = ∏ p(x_i)^(y_i) · (1 − p(x_i))^(1 − y_i)

Taking natural logs on both sides gives the log-likelihood

    l(b, w) = ∑ [ y_i log p(x_i) + (1 − y_i) log(1 − p(x_i)) ]

Gradient of the log-likelihood function
To find the maximum likelihood estimates, we differentiate with respect to w, which gives

    ∂l/∂w_j = ∑ ( y_i − p(x_i) ) x_ij

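The following sketch fits logistic regression by gradient ascent on the log-likelihood, using the gradient ∑(y_i − p(x_i))·x_i derived above. The toy data, learning rate, and iteration count are assumptions chosen only for illustration.

import numpy as np

# Toy 1-D data: class 1 tends to have larger feature values (invented, slightly overlapping)
X = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [3.5]])
y = np.array([0, 0, 1, 0, 1, 1])

# Add a column of ones so the bias b is learned as an extra weight
Xb = np.hstack([X, np.ones((X.shape[0], 1))])
w = np.zeros(Xb.shape[1])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

learning_rate = 0.1
for _ in range(5000):
    p = sigmoid(Xb @ w)               # predicted probabilities p(x_i)
    gradient = Xb.T @ (y - p)         # gradient of the log-likelihood: sum of (y_i - p_i) * x_i
    w += learning_rate * gradient     # gradient *ascent* step, since we maximise the likelihood

print("weights (w, b):", w)
print("predicted labels:", (sigmoid(Xb @ w) >= 0.5).astype(int))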
Assumptions for Logistic Regression


The assumptions for Logistic regression are as follows:
• Independent observations: Each observation is independent of the others, meaning the
observations are not related to or derived from one another.
• Binary dependent variables: It takes the assumption that the dependent variable must be binary or
dichotomous, meaning it can take only two values. For more than two categories softmax functions
are used.
• Linearity relationship between independent variables and log odds: The relationship between the
independent variables and the log odds of the dependent variable should be linear.
• No outliers: There should be no outliers in the dataset.
• Large sample size: The sample size is sufficiently large

Types of Logistic Regression


Based on the number of categories, Logistic regression can be classified as:
Binomial Logistic regression:
The target variable can have only 2 possible types: "0" or "1", which may represent "win" vs "loss", "pass" vs
"fail", "dead" vs "alive", etc. In this case, the sigmoid function is used, as discussed above.
# import the necessary libraries
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# load the breast cancer dataset
X, y = load_breast_cancer(return_X_y=True)
# split the train and test dataset
X_train, X_test,\
y_train, y_test = train_test_split(X, y,
test_size=0.20,
random_state=23)
# LogisticRegression
clf = LogisticRegression(random_state=0)
clf.fit(X_train, y_train)
# Prediction
y_pred = clf.predict(X_test)

acc = accuracy_score(y_test, y_pred)


print("Logistic Regression model accuracy (in %):", acc*100)

Output:
Logistic Regression model accuracy (in %): 95.6140350877193
Multinomial Logistic Regression

target variable can have 3 or more possible types which are not ordered(i.e. types have no quantitative
significance) like “disease A” vs “disease B” vs “disease C”.
In this case, the softmax function is used in place of the sigmoid function. The softmax function for K classes will be:

    softmax(z_k) = e^(z_k) / ∑_{j=1..K} e^(z_j)

Then the probability that an observation belongs to class k will be:

    P(y = k | X) = e^(w_k · X + b_k) / ∑_{j=1..K} e^(w_j · X + b_j)

In Multinomial Logistic Regression, the output variable can have more than two possible discrete outputs.
Consider the Digit Dataset.

from sklearn.model_selection import train_test_split


from sklearn import datasets, linear_model, metrics

# load the digit dataset


digits = datasets.load_digits()

# defining feature matrix(X) and response vector(y)


X = digits.data
y = digits.target

# splitting X and y into training and testing sets


X_train, X_test,\
y_train, y_test = train_test_split(X, y,
test_size=0.4,
random_state=1)

# create logistic regression object


reg = linear_model.LogisticRegression()

# train the model using the training sets


reg.fit(X_train, y_train)

# making predictions on the testing set


y_pred = reg.predict(X_test)

# comparing actual response values (y_test)


# with predicted response values (y_pred)
print("Logistic Regression model accuracy(in %):",
metrics.accuracy_score(y_test, y_pred)*100)

Output:
Logistic Regression model accuracy(in %): 96.52294853963839

Ordinal Logistic Regression

It deals with target variables with ordered categories. For example, a test score can be categorized as: “very
poor”, “poor”, “good”, or “very good”. Here, each category can be given a score like 0, 1, 2, or 3.
Applying steps in logistic regression modeling:
The following are the steps involved in logistic regression modeling:
• Define the problem: Identify the dependent variable and independent variables and determine if
the problem is a binary classification problem.
• Data preparation: Clean and preprocess the data, and make sure the data is suitable for logistic
regression modeling.
• Exploratory Data Analysis (EDA): Visualize the relationships between the dependent and
independent variables, and identify any outliers or anomalies in the data.
• Feature Selection: Choose the independent variables that have a significant relationship with the
dependent variable, and remove any redundant or irrelevant features.
• Model Building: Train the logistic regression model on the selected independent variables and
estimate the coefficients of the model.
• Model Evaluation: Evaluate the performance of the logistic regression model using appropriate
metrics such as accuracy, precision, recall, F1-score, or AUC-ROC.
• Model improvement: Based on the results of the evaluation, fine-tune the model by adjusting the
independent variables, adding new features, or using regularization techniques to reduce
overfitting.
• Model Deployment: Deploy the logistic regression model in a real-world scenario and make
predictions on new data.

Logistic Regression Model Thresholding


Logistic regression becomes a classification technique only when a decision threshold is brought into the
picture. The setting of the threshold value is a very important aspect of Logistic regression and is dependent on
the classification problem itself.
The choice of threshold value is mainly affected by the values of precision and recall. Ideally, we want both
precision and recall to be 1, but this is seldom the case.

In the case of a Precision-Recall tradeoff, we use the following arguments to decide upon the threshold:
1. Low Precision/High Recall: In applications where we want to reduce the number of false negatives
without necessarily reducing the number of false positives, we choose a decision value that has a
low value of Precision or a high value of Recall. For example, in a cancer diagnosis application, we
do not want any affected patient to be classified as not affected, even if that means some patients are
wrongfully diagnosed with cancer. This is because the absence of cancer can be confirmed by further
medical analysis, but the presence of the disease cannot be detected in a candidate who has already been
rejected.
2. High Precision/Low Recall: In applications where we want to reduce the number of false positives
without necessarily reducing the number of false negatives, we choose a decision value that has a
high value of Precision or a low value of Recall. For example, if we are classifying customers
whether they will react positively or negatively to a personalized advertisement, we want to be
absolutely sure that the customer will react positively to the advertisement because otherwise, a
negative reaction can cause a loss of potential sales from the customer.
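A hedged sketch of how such a threshold can be applied in practice with scikit-learn, reusing the breast cancer data from the earlier code; the threshold values compared here are assumptions chosen only to illustrate the precision–recall trade-off.

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=23)

clf = LogisticRegression(max_iter=10000).fit(X_train, y_train)
probs = clf.predict_proba(X_test)[:, 1]    # probability of the positive class

# Compare precision and recall at different decision thresholds
for threshold in (0.3, 0.5, 0.7):
    preds = (probs >= threshold).astype(int)
    print(threshold,
          "precision:", round(precision_score(y_test, preds), 3),
          "recall:", round(recall_score(y_test, preds), 3))

Lowering the threshold generally raises recall at the cost of precision, and raising it does the opposite, which is exactly the trade-off described above.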

Introduction to Real-Time Use Cases:

5.13 Finding Palm Lines

This use case studies OpenCV with Python by working on a small project that aims to detect the lines of a palm.
The basic approach is to apply Canny edge detection and then Hough line detection on the detected edges; the
outcome is workable but not perfect.
The source code is shown below. It assumes a small helper, save_image_file, for writing each intermediate image
to disk so that the individual stages can be inspected:

import cv2
import numpy as np

# Helper assumed by this snippet: writes an intermediate image to disk for inspection
def save_image_file(image, name):
    cv2.imwrite(name + ".jpg", image)

file = "palm.jpg"  # path to the input palm image (assumed)

original = cv2.imread(file)
img = cv2.cvtColor(original, cv2.COLOR_BGR2GRAY)
save_image_file(img, "gray")

# Improve contrast so faint palm lines stand out
img = cv2.equalizeHist(img)
save_image_file(img, "equalize")

# Smooth the image to suppress skin texture before edge detection
img = cv2.GaussianBlur(img, (9, 9), 0)
save_image_file(img, "blur")

# Canny edge detection with low/high hysteresis thresholds
img = cv2.Canny(img, 40, 80)
save_image_file(img, "canny")

# Probabilistic Hough transform to extract line segments from the edges
lined = np.copy(original) * 0
lines = cv2.HoughLinesP(img, 1, np.pi / 180, 15, np.array([]), 50, 20)
for line in lines:
    for x1, y1, x2, y2 in line:
        cv2.line(lined, (x1, y1), (x2, y2), (0, 0, 255))
save_image_file(lined, "lined")

# Overlay the detected lines on the original image
output = cv2.addWeighted(original, 0.8, lined, 1, 0)
save_image_file(output, "output")
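Because the raw result tends to be noisy, one common adjustment (a sketch under assumed parameter values, not part of the original code) is to filter out short Hough segments and keep only the longer ones, which usually correspond to the major palm lines:

# Keep only segments longer than a minimum length (threshold value is an assumption)
min_length = 60
filtered = np.copy(original) * 0
for line in lines:
    for x1, y1, x2, y2 in line:
        if np.hypot(x2 - x1, y2 - y1) >= min_length:
            cv2.line(filtered, (x1, y1), (x2, y2), (0, 0, 255), 2)
save_image_file(cv2.addWeighted(original, 0.8, filtered, 1, 0), "output_filtered")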

5.14 Detecting Faces


Here's an in-depth study material covering all the aspects of the provided Python code for real-time face and eye
detection using OpenCV:

1. Overview of the Program

The objective of this program is to use OpenCV, a popular computer vision library, to detect and track faces and eyes in
real-time using a webcam. The program utilizes pre-trained Haar Cascade classifiers, which are machine learning models
used for object detection. These classifiers can identify specific features of objects, such as faces and eyes, in an image or
video stream.

2. Prerequisites and Setup

Before running the code, make sure your system meets the following requirements:

a. Software Requirements:

• Python 2.7.x: The code was written for Python 2.7, which works with older OpenCV releases (the 2.4.x branch).
Consider using Python 3.x and OpenCV 4.x for better performance and support.
• NumPy: This library is used for handling arrays and matrices.
• OpenCV: An older OpenCV release (2.4.x) is compatible with Python 2.7; recent opencv-python packages require Python 3.x.

b. Installation Steps:

1. Download Python 2.7.x:


o Download from Python's official website.
2. Install OpenCV and NumPy:
o Run the following commands in your terminal or command prompt:
o pip install numpy
o pip install opencv-python # choose a release compatible with your Python version
3. Download Haar Cascade Classifier Files:
o Download haarcascade_frontalface_default.xml and haarcascade_eye.xml from OpenCV's
GitHub repository and save them in the same directory as the Python script.

3. Understanding the Code

Let's go through the code line by line and explain its functionality:

a. Import Libraries

import cv2 # Import the OpenCV library

• cv2: The OpenCV library is imported to handle image processing tasks.

b. Load Pre-trained Classifiers

# Load the pre-trained face and eye classifiers


face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')

• cv2.CascadeClassifier(): Loads the Haar Cascade classifier XML file, which is used for detecting objects
(faces and eyes in this case).

c. Capture Video from Webcam

cap = cv2.VideoCapture(0) # Capture video from the default camera (0 for the default webcam)

• cv2.VideoCapture(0): Initializes the video capture from the default camera. You can use 1 or another number if
you're using an external camera.

d. Main Loop for Processing Video Frames

while True:
    # Read a frame from the camera
    ret, img = cap.read()  # `ret` indicates if the frame was read successfully, `img` is the frame itself

    # Convert the frame to grayscale for better detection
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

• Loop: The while True loop runs indefinitely, continuously capturing frames from the webcam.
• cv2.cvtColor(): Converts the captured frame from color (BGR) to grayscale. Grayscale images simplify
processing and are more efficient for detection.

e. Face Detection

# Detect faces of different sizes in the input image


faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)

• detectMultiScale():
o Detects objects (faces) of varying sizes in the input image.
o scaleFactor: Specifies how much the image size is reduced at each image scale. A value of 1.3 means
that the image is reduced by 30% at each scale.
o minNeighbors: Specifies how many neighbors each candidate rectangle should have to retain it. A value
of 5 works well for face detection.
f. Draw Rectangles Around Detected Faces

for (x, y, w, h) in faces:
    # Draw a rectangle around the detected face
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 255, 0), 2)

    # Region of interest for gray and color images
    roi_gray = gray[y:y + h, x:x + w]
    roi_color = img[y:y + h, x:x + w]

• cv2.rectangle(): Draws a rectangle around the detected face with color (255, 255, 0) (blue) and thickness 2.
• roi_gray and roi_color: Define the region of interest (ROI) within the frame to be used for eye detection.

g. Eye Detection within Faces

    # Detect eyes within the region of interest for the face
    eyes = eye_cascade.detectMultiScale(roi_gray)

    # Loop through the detected eyes
    for (ex, ey, ew, eh) in eyes:
        # Draw a rectangle around the detected eye
        cv2.rectangle(roi_color, (ex, ey), (ex + ew, ey + eh), (0, 127, 255), 2)

• Detect eyes: The eye_cascade.detectMultiScale() method detects eyes within the roi_gray, which is the
gray-scale version of the detected face.
• Draw rectangles: For each detected eye, cv2.rectangle() is used to draw rectangles around them with color
(0, 127, 255) (orange) and thickness 2.

h. Display the Image with Annotations

cv2.imshow('img', img) # Display the image with detected faces and eyes

• cv2.imshow(): Opens a window named 'img' to display the current frame with rectangles drawn around faces and
eyes.

i. Exit the Program

k = cv2.waitKey(30) & 0xff  # Wait for 30 milliseconds for a key event
if k == 27:  # Exit if 'Esc' key (27) is pressed
    break

• cv2.waitKey(): Waits for a key event for 30 milliseconds.


• & 0xff: Ensures compatibility across different systems.
• Exit on 'Esc' key: If the Esc key (ASCII code 27) is pressed, the loop breaks, stopping the program.

j. Release Resources

cap.release() # Releases the video capture object


cv2.destroyAllWindows() # Closes all OpenCV windows

• Release the webcam: cap.release() releases the webcam so other applications can use it.
• Close the window: cv2.destroyAllWindows() closes all the OpenCV windows that were opened.

4. Expected Output
When you run the script, a window titled img will display the webcam feed with rectangles drawn around detected faces
and eyes. The program will continue running until you press the Esc key to stop it.

5. Customization and Extensions

• Detect other objects: Train or use pre-trained Haar Cascade classifiers for other objects (e.g., cars, animals).
• Use Python 3.x and OpenCV 4.x: Consider updating to Python 3.x and the latest OpenCV library for enhanced
features and better support.
• Adjust detection parameters: Modify the scaleFactor and minNeighbors parameters for optimal detection
based on lighting and camera quality.

6. Common Issues and Troubleshooting

• Poor detection in low light: Ensure good lighting conditions for better detection.
• False positives/negatives: Adjust the parameters or use more robust classifiers like deep learning-based detectors
(e.g., DNNs with OpenCV).
• Compatibility issues: Use the latest version of Python and OpenCV for improved performance.

7. Potential Improvements

• Add face tracking: Integrate a tracking algorithm (e.g., KLT optical flow via cv2.calcOpticalFlowPyrLK, or a
correlation tracker such as MOSSE) to track faces across multiple frames.
• Integrate facial recognition: Combine this code with facial recognition libraries like face_recognition for
identifying and matching specific faces.
• Enhance user interface: Use cv2.putText() to label detected faces or show relevant information on the display.
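As a small sketch of the last point, a label could be drawn above each detected face inside the existing detection loop; the label text, font scale, and colours here are assumptions.

# Inside the `for (x, y, w, h) in faces:` loop, after drawing the rectangle:
cv2.putText(img,
            "Face",                      # label text (assumed)
            (x, y - 10),                 # position just above the bounding box
            cv2.FONT_HERSHEY_SIMPLEX,
            0.6,                         # font scale
            (255, 255, 0),               # same colour as the rectangle
            2)                           # thickness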

Conclusion

This study material provides you with an understanding of how to implement real-time face and eye detection using
OpenCV. With this foundation, you can experiment with more complex computer vision tasks and extend the program
for specific use cases like surveillance, access control, and more.
5.15 Recognizing Faces
We will build a detector to identify the human face in a photo from Unsplash. Make sure to save the picture to
your working directory and rename it to input_image before coding along.

Step 1: Import the OpenCV Package

Now, let’s import OpenCV and enter the input image path with the following lines of code:

import cv2

imagePath = 'input_image.jpg'


Step 2: Read the Image

Then, we need to read the image with OpenCV’s imread() function:


img = cv2.imread(imagePath)


This will load the image from the specified file path and return it in the form of a Numpy array.

Let’s print the dimensions of this array:


img.shape


(4000, 2667, 3)


Notice that this is a 3-dimensional array. The array’s values represent the picture’s height, width, and channels
respectively. Since this is a color image, there are three channels used to depict it - blue, green, and red (BGR).

Note that while the conventional sequence used to represent images is RGB (Red, Blue, Green), the OpenCV library
uses the opposite layout (Blue, Green, Red).

Step 3: Convert the Image to Grayscale


To improve computational efficiency, we first need to convert this image to grayscale before performing face
detection on it:
gray_image = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)


Let’s now examine the dimensions of this grayscale image:


gray_image.shape


(4000, 2667)


Notice that this array only has two values since the image is grayscale and no longer has the third color channel.

Step 4: Load the Classifier

Let’s load the pre-trained Haar Cascade classifier that is built into OpenCV:

face_classifier = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

Notice that we are using a file called haarcascade_frontalface_default.xml. This classifier is designed specifically for
detecting frontal faces in visual input.

OpenCV also provides other pre-trained models to detect different objects within an image - such as a person’s
eyes, smile, upper body, and even a vehicle’s license plate. You can learn more about the different classifiers built
into OpenCV by examining the library’s GitHub repository.

Step 5: Perform the Face Detection

We can now perform face detection on the grayscale image using the classifier we just loaded:

face = face_classifier.detectMultiScale(
    gray_image, scaleFactor=1.1, minNeighbors=5, minSize=(40, 40)
)

Let’s break down the methods and parameters specified in the above code:

1. detectMultiScale():
The detectMultiScale() method is used to identify faces of different sizes in the input image.

2. gray_image:
The first argument passed to this method is gray_image, the grayscale image we created previously.

3. scaleFactor:
This parameter is used to scale down the size of the input image to make it easier for the algorithm to detect larger
faces. In this case, we have specified a scale factor of 1.1, indicating that we want to reduce the image size by 10%
at each scale.

4. minNeighbors:
The cascade classifier applies a sliding window through the image to detect faces in it. You can think of these
windows as rectangles.

Initially, the classifier will capture a large number of false positives. These are eliminated using
the minNeighbors parameter, which specifies the number of neighboring rectangles that need to be identified for an
object to be considered a valid detection.

To summarize, passing a small value like 0 or 1 to this parameter would result in a high number of false positives,
whereas a large number could lead to losing out on many true positives.

The trick here is to find a tradeoff that allows us to eliminate false positives while also accurately identifying true
positives.

5. minSize:
Finally, the minSize parameter sets the minimum size of the object to be detected. The model will ignore faces that
are smaller than the minimum size specified.

Step 6: Drawing a Bounding Box

Now that the model has detected the faces within the image, let’s run the following lines of code to create a
bounding box around these faces:

for (x, y, w, h) in face:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 4)

The face variable is an array with four values: the x and y axis in which the faces were detected, and their width
and height. The above code iterates over the identified faces and creates a bounding box that spans across these
measurements.

The parameter 0,255,0 represents the color of the bounding box, which is green, and 4 indicates its thickness.

Step 7: Displaying the Image

To display the image with the detected faces, we first need to convert the image from the BGR format to RGB:
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)


Now, let’s use the Matplotlib library to display the image:

import matplotlib.pyplot as plt

plt.figure(figsize=(20,10))

plt.imshow(img_rgb)

plt.axis('off')


The above code should generate the following output:


Great!

The model has successfully detected the human face in this image and created a bounding box around it.

Real-Time Face Detection with OpenCV


Now that we have successfully performed face detection on a static image with OpenCV, let’s see how to do the
same on a live video stream.

Step 1: Pre-Requisites

First, let’s go ahead and import the OpenCV library and load the Haar Cascade model just like we did in the
previous section. You can skip this block of code if you already ran it previously:

import cv2

face_classifier = cv2.CascadeClassifier(

cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)


Step 2: Access the Webcam

Now, we need to access our device’s camera to read a live stream of video data. This can be done with the
following code:
video_capture = cv2.VideoCapture(0)


Notice that we have passed the parameter 0 to the VideoCapture() function. This tells OpenCV to use the default
camera on our device. If you have multiple cameras attached to your device, you can change this parameter value
accordingly.

Step 3: Identifying Faces in the Video Stream

Now, let’s create a function to detect faces in the video stream and draw a bounding box around them:

def detect_bounding_box(vid):
    gray_image = cv2.cvtColor(vid, cv2.COLOR_BGR2GRAY)
    faces = face_classifier.detectMultiScale(gray_image, 1.1, 5, minSize=(40, 40))
    for (x, y, w, h) in faces:
        cv2.rectangle(vid, (x, y), (x + w, y + h), (0, 255, 0), 4)
    return faces


The detect_bounding_box function takes the video frame as input.

In this function, we are using the same codes as we did earlier to convert the frame into grayscale before
performing face detection.

Then, we are also detecting the face in this image using the same parameter values for scaleFactor, minNeighbors,
and minSize as we did previously.

Finally, we draw a green bounding box of thickness 4 around the frame.


Step 4: Creating a Loop for Real-Time Face Detection

Now, we need to create an indefinite while loop that will capture the video frame from our webcam and apply the
face detection function to it:

while True:
    result, video_frame = video_capture.read()  # read frames from the video
    if result is False:
        break  # terminate the loop if the frame is not read successfully

    faces = detect_bounding_box(
        video_frame
    )  # apply the function we created to the video frame

    cv2.imshow(
        "My Face Detection Project", video_frame
    )  # display the processed frame in a window named "My Face Detection Project"

    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

video_capture.release()
cv2.destroyAllWindows()


After running the above code, you should see a window called My Face Detection Project appear on the screen:

The algorithm should track your face and create a green bounding box around it regardless of where you move
within the frame.

In the frame above, the model recognizes my face and my picture on the driving license I’m holding up.

You can also test the efficacy of this model by holding up multiple pictures or by getting different people to stand
at various angles behind the camera. The model should be able to identify all human faces in different
backgrounds or lighting settings.

If you’d like to exit the program, you can press the “q” key on your keyboard to break out of the loop.

5.16 Tracking Movements


import imutils
import time
import cv2

previousFrame = None

def searchForMovement(cnts, frame, min_area):

    text = "Undetected"
    flag = 0

    for c in cnts:
        # if the contour is too small, ignore it
        if cv2.contourArea(c) < min_area:
            continue

        # Use the flag to prevent the detection of other motions in the video
        if flag == 0:
            (x, y, w, h) = cv2.boundingRect(c)
            #print("x y w h")
            #print(x, y, w, h)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            text = "Detected"
            flag = 1

    return frame, text

def trackMotion(ret, frame, gaussian_kernel, sensitivity_value, min_area):

    if ret:
        # Convert to grayscale and blur it for better frame difference
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.GaussianBlur(gray, (gaussian_kernel, gaussian_kernel), 0)

        global previousFrame

        if previousFrame is None:
            previousFrame = gray
            return frame, "Uninitialized", frame, frame

        frameDiff = cv2.absdiff(previousFrame, gray)
        thresh = cv2.threshold(frameDiff, sensitivity_value, 255, cv2.THRESH_BINARY)[1]

        thresh = cv2.dilate(thresh, None, iterations=2)
        _, cnts, _ = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

        frame, text = searchForMovement(cnts, frame, min_area)
        #previousFrame = gray

        return frame, text, thresh, frameDiff


if __name__ == '__main__':

    video = "Track.avi"
    video0 = "Track.mp4"
    video1 = "Ntest1.avi"
    video2 = "Ntest2.avi"

    camera = cv2.VideoCapture(video1)
    time.sleep(0.25)
    min_area = 5000  #int(sys.argv[1])

    cv2.namedWindow("Security Camera Feed")

    while camera.isOpened():

        gaussian_kernel = 27
        sensitivity_value = 5
        min_area = 2500

        ret, frame = camera.read()

        # Check if the next camera read is not null
        if ret:
            frame, text, thresh, frameDiff = trackMotion(ret, frame, gaussian_kernel, sensitivity_value, min_area)
        else:
            print("Video Finished")
            break

        cv2.namedWindow('Thresh', cv2.WINDOW_NORMAL)
        cv2.namedWindow('Frame Difference', cv2.WINDOW_NORMAL)
        cv2.namedWindow('Security Camera Feed', cv2.WINDOW_NORMAL)

        cv2.resizeWindow('Thresh', 800, 600)
        cv2.resizeWindow('Frame Difference', 800, 600)
        cv2.resizeWindow('Security Camera Feed', 800, 600)
        # uncomment to see the thresh and frame-difference displays
        cv2.imshow("Thresh", thresh)
        cv2.imshow("Frame Difference", frameDiff)

        cv2.putText(frame, text, (10, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)
        cv2.imshow("Security Camera Feed", frame)

        key = cv2.waitKey(3) & 0xFF
        if key == 27 or key == ord('q'):
            print("Bye")
            break

    camera.release()
    cv2.destroyAllWindows()
This picture shows how the very first frame still affects the frame-difference results, which forces the
box to cover an area with no motion.

This one shows a case where current motion is ignored and no-longer-existing motion (the frame difference between the
second and first frames of the video) is falsely detected. When multiple tracking is allowed, both are tracked,
which is still wrong since an empty area is detected. The root cause is that previousFrame is set only once and never
updated (the update "#previousFrame = gray" is commented out), so every frame is compared against the very first
frame instead of the previous one.
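A hedged sketch of one common remedy (not part of the original code): keep a running average of the background with cv2.accumulateWeighted and difference each new frame against that average, so stale motion from the first frame no longer leaks into the result. The alpha value is an assumption.

# Inside trackMotion, replacing the single static previousFrame:
# maintain a float32 running average of the background
if previousFrame is None:
    previousFrame = gray.astype("float32")
    return frame, "Uninitialized", frame, frame

# update the background model (alpha controls how quickly it adapts; value assumed)
cv2.accumulateWeighted(gray, previousFrame, 0.5)
frameDiff = cv2.absdiff(gray, cv2.convertScaleAbs(previousFrame))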
5.17 Detecting Lanes
Road lane detection involves detecting the path for self-driving cars and avoiding
the risk of entering other lanes. Lane recognition algorithms reliably identify the
location and borders of the lanes by analyzing the visual input. Advanced driver
assistance systems (ADAS) and autonomous vehicle systems both rely heavily on
them. Here we discuss one of these lane detection algorithms. The
steps involved are:

• Capturing and decoding video file: We will capture the video using VideoFileClip
object and after the capturing has been initialized every video frame is decoded (i.e.
converting into a sequence of images).
• Grayscale conversion of image: The video frames are in RGB format, RGB is
converted to grayscale because processing a single channel image is faster than
processing a three-channel colored image.
• Reduce noise: Noise can create false edges, therefore before going further, it’s
imperative to perform image smoothening. Gaussian blur is used to perform this
process. Gaussian blur is a typical image filtering technique for lowering noise and
enhancing image characteristics. The weights are selected using a Gaussian
distribution, and each pixel is subjected to a weighted average that considers the
pixels surrounding it. By reducing high-frequency elements and improving overall
image quality, this blurring technique creates softer, more visually pleasant images.
• Canny Edge Detector: It computes gradient in all directions of our blurred image and
traces the edges with large changes in intensity. For more explanation please go
through this article: Canny Edge Detector
• Region of Interest: This step is to take into account only the region covered by the
road lane. A mask is created here, which is of the same dimension as our road
image. Furthermore, bitwise AND operation is performed between each pixel of our
canny image and this mask. It ultimately masks the canny image and shows the
region of interest traced by the polygonal contour of the mask.
• Hough Line Transform: In image processing, the Hough transformation is a feature
extraction method used to find basic geometric objects like lines and circles. By
converting the picture space into a parameter space, it makes it possible to identify
shapes by accumulating voting points. We’ll use the probabilistic Hough Line
Transform in our algorithm. The Hough transformation has been extended to
address the computational complexity with the probabilistic Hough transformation.
In order to speed up processing while preserving accuracy in shape detection, it
randomly chooses a selection of picture points and applies the Hough
transformation solely to those points.
• Draw lines on the Image or Video: After identifying lane lines in our field of interest
using Hough Line Transform, we overlay them on our visual input(video
stream/image).
Dataset: To demonstrate the working of this algorithm we will be working on a video
file of a road. You can download the dataset from this GitHub link – Dataset
Note: This code is implemented in Google Colab. If you are working in any other editor,
you might have to make some alterations to the code because Colab has some dependency
issues with OpenCV.
Steps to Implement Road Lane Detection
Step 1: Install OpenCV library in Python.

!pip install -q opencv-python


Step 2: Import the necessary libraries.

# Libraries for working with image processing


import numpy as np
import pandas as pd
import cv2
from google.colab.patches import cv2_imshow
# Libraries needed to edit/save/watch video clips
from moviepy import editor
import moviepy
Step 3: Define the driver function for our algorithm.

def process_video(test_video, output_video):
    """
    Read input video stream and produce a video file with detected lane lines.
    Parameters:
        test_video: location of input video file
        output_video: location where output video file is to be saved
    """
    # read the video file using VideoFileClip without audio
    input_video = editor.VideoFileClip(test_video, audio=False)
    # apply the function "frame_processor" to each frame of the video
    # will give more detail about "frame_processor" in further steps
    # "processed" stores the output video
    processed = input_video.fl_image(frame_processor)
    # save the output video stream to an mp4 file
    processed.write_videofile(output_video, audio=False)
Step 4: Define “frame_processor” function where all the processing happens on a
frame to detect lane lines.

def frame_processor(image):
    """
    Process the input frame to detect lane lines.
    Parameters:
        image: image of a road where one wants to detect lane lines
        (we will be passing frames of video to this function)
    """
    # convert the RGB image to gray scale
    grayscale = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # applying gaussian blur which removes noise from the image
    # and focuses on our region of interest
    # size of gaussian kernel
    kernel_size = 5
    # applying gaussian blur to remove noise from the frames
    blur = cv2.GaussianBlur(grayscale, (kernel_size, kernel_size), 0)
    # first threshold for the hysteresis procedure
    low_t = 50
    # second threshold for the hysteresis procedure
    high_t = 150
    # applying canny edge detection and save edges in a variable
    edges = cv2.Canny(blur, low_t, high_t)
    # since we are getting too many edges from our image, we apply
    # a mask polygon to only focus on the road
    # (region selection is explained in detail in further steps)
    region = region_selection(edges)
    # applying hough transform to get straight lines from our image
    # and find the lane lines
    # (hough transform is explained in detail in further steps)
    hough = hough_transform(region)
    # lastly we draw the lines on our resulting frame and return it as output
    result = draw_lane_lines(image, lane_lines(image, hough))
    return result
Output:

Canny Edge Detection output


Step 5: Region Selection
Till now we have converted frames from RGB to Grayscale, applied Gaussian Blur to
reduce noise and used canny edge detection. Next we will select the region where we
want to detect road lanes.
def region_selection(image):
    """
    Determine and cut the region of interest in the input image.
    Parameters:
        image: we pass here the output from canny where we have
        identified edges in the frame
    """
    # create an array of the same size as of the input image
    mask = np.zeros_like(image)
    # if you pass an image with more than one channel
    if len(image.shape) > 2:
        channel_count = image.shape[2]
        ignore_mask_color = (255,) * channel_count
    # our image only has one channel so it will go under "else"
    else:
        # color of the mask polygon (white)
        ignore_mask_color = 255
    # creating a polygon to focus only on the road in the picture
    # we have created this polygon in accordance with how the camera was placed
    rows, cols = image.shape[:2]
    bottom_left = [cols * 0.1, rows * 0.95]
    top_left = [cols * 0.4, rows * 0.6]
    bottom_right = [cols * 0.9, rows * 0.95]
    top_right = [cols * 0.6, rows * 0.6]
    vertices = np.array([[bottom_left, top_left, top_right, bottom_right]], dtype=np.int32)
    # filling the polygon with white color and generating the final mask
    cv2.fillPoly(mask, vertices, ignore_mask_color)
    # performing bitwise AND on the input image and mask to get only the edges on the road
    masked_image = cv2.bitwise_and(image, mask)
    return masked_image
Output:
Region Selection Output
Step 6: Now we will be identifying straight lines in the output image from the above
function using Probabilistic Hough Transform

def hough_transform(image):
    """
    Apply the probabilistic Hough transform to find line segments in the image.
    Parameter:
        image: masked edge image which should be an output from the region selection step
    """
    # Distance resolution of the accumulator in pixels.
    rho = 1
    # Angle resolution of the accumulator in radians.
    theta = np.pi / 180
    # Only lines that get more than threshold votes will be returned.
    threshold = 20
    # Line segments shorter than this are rejected.
    minLineLength = 20
    # Maximum allowed gap between points on the same line to link them.
    maxLineGap = 500
    # the function returns an array containing the endpoints of the straight lines
    # appearing in the input image
    return cv2.HoughLinesP(image, rho=rho, theta=theta, threshold=threshold,
                           minLineLength=minLineLength, maxLineGap=maxLineGap)
Output:
[[[284 180 382 278]]

[[281 180 379 285]]

[[137 274 183 192]]

[[140 285 189 188]]

[[313 210 388 285]]

[[139 285 188 188]]

[[132 282 181 194]]

[[146 285 191 196]]

[[286 187 379 284]]]


Step 7: Plotting Lines on video frames
Now that we have received the coordinates using the Hough Transform, we will plot them on our original
image (frame). As we can see, we get coordinates for more than two lines, so we first find the average slope
and intercept of the left and right lanes and then overlay them on the original image.
We have defined 4 functions here to help draw the left and right lanes on our input frame:
1. Average_Slope_Intercept: This function takes in the hough transform lines and
calculate their slope and intercept. If the slope of a line is negative then it belongs to
left lane else the line belongs to the right lane. Then we calculate the weighted
average slope and intercept of left lane and right lanes.
2. Pixel_Points: By using slope, intercept and y-values of the line we find the x values
for the line and returns the x and y coordinates of lanes as integers.
3. Lane_Lines: The function where Average_Slope_Intercept and Pixel Points are called
and coordinates of right lane and left lane are calculated.
4. Draw_Lane_Lines: This function draws the left lane and right lane of the road on the
input frame. Returns the output frame which is then stored in the variable
“processed” in our driver function “process_video”.

def average_slope_intercept(lines):
    """
    Find the slope and intercept of the left and right lanes of each image.
    Parameters:
        lines: output from Hough Transform
    """
    left_lines = []    # (slope, intercept)
    left_weights = []  # (length,)
    right_lines = []   # (slope, intercept)
    right_weights = [] # (length,)

    for line in lines:
        for x1, y1, x2, y2 in line:
            if x1 == x2:
                continue
            # calculating slope of a line
            slope = (y2 - y1) / (x2 - x1)
            # calculating intercept of a line
            intercept = y1 - (slope * x1)
            # calculating length of a line
            length = np.sqrt(((y2 - y1) ** 2) + ((x2 - x1) ** 2))
            # slope of the left lane is negative and for the right lane slope is positive
            if slope < 0:
                left_lines.append((slope, intercept))
                left_weights.append((length))
            else:
                right_lines.append((slope, intercept))
                right_weights.append((length))
    # length-weighted average of slope and intercept for each lane
    left_lane = np.dot(left_weights, left_lines) / np.sum(left_weights) if len(left_weights) > 0 else None
    right_lane = np.dot(right_weights, right_lines) / np.sum(right_weights) if len(right_weights) > 0 else None
    return left_lane, right_lane

def pixel_points(y1, y2, line):
    """
    Converts the slope and intercept of each line into pixel points.
    Parameters:
        y1: y-value of the line's starting point.
        y2: y-value of the line's end point.
        line: The slope and intercept of the line.
    """
    if line is None:
        return None
    slope, intercept = line
    x1 = int((y1 - intercept) / slope)
    x2 = int((y2 - intercept) / slope)
    y1 = int(y1)
    y2 = int(y2)
    return ((x1, y1), (x2, y2))

def lane_lines(image, lines):
    """
    Create full length lines from pixel points.
    Parameters:
        image: The input test image.
        lines: The output lines from Hough Transform.
    """
    left_lane, right_lane = average_slope_intercept(lines)
    y1 = image.shape[0]
    y2 = y1 * 0.6
    left_line = pixel_points(y1, y2, left_lane)
    right_line = pixel_points(y1, y2, right_lane)
    return left_line, right_line

def draw_lane_lines(image, lines, color=[255, 0, 0], thickness=12):
    """
    Draw lines onto the input image.
    Parameters:
        image: The input test image (video frame in our case).
        lines: The output lines from Hough Transform.
        color (Default = red): Line color.
        thickness (Default = 12): Line thickness.
    """
    line_image = np.zeros_like(image)
    for line in lines:
        if line is not None:
            cv2.line(line_image, *line, color, thickness)
    return cv2.addWeighted(image, 1.0, line_image, 1.0, 0.0)
Output:

Road Lane Line Detection Output on an image


Complete Code for Real-time Road Lane Detection
import numpy as np
import pandas as pd
import cv2
from google.colab.patches import cv2_imshow
# Import everything needed to edit/save/watch video clips
from moviepy import editor
import moviepy

def region_selection(image):
    """
    Determine and cut the region of interest in the input image.
    Parameters:
        image: we pass here the output from canny where we have
        identified edges in the frame
    """
    # create an array of the same size as of the input image
    mask = np.zeros_like(image)
    # if you pass an image with more than one channel
    if len(image.shape) > 2:
        channel_count = image.shape[2]
        ignore_mask_color = (255,) * channel_count
    # our image only has one channel so it will go under "else"
    else:
        # color of the mask polygon (white)
        ignore_mask_color = 255
    # creating a polygon to focus only on the road in the picture
    # we have created this polygon in accordance with how the camera was placed
    rows, cols = image.shape[:2]
    bottom_left = [cols * 0.1, rows * 0.95]
    top_left = [cols * 0.4, rows * 0.6]
    bottom_right = [cols * 0.9, rows * 0.95]
    top_right = [cols * 0.6, rows * 0.6]
    vertices = np.array([[bottom_left, top_left, top_right, bottom_right]], dtype=np.int32)
    # filling the polygon with white color and generating the final mask
    cv2.fillPoly(mask, vertices, ignore_mask_color)
    # performing bitwise AND on the input image and mask to get only the edges on the road
    masked_image = cv2.bitwise_and(image, mask)
    return masked_image

def hough_transform(image):
    """
    Apply the probabilistic Hough transform to find line segments in the image.
    Parameter:
        image: masked edge image which should be an output from the region selection step
    """
    # Distance resolution of the accumulator in pixels.
    rho = 1
    # Angle resolution of the accumulator in radians.
    theta = np.pi / 180
    # Only lines that get more than threshold votes will be returned.
    threshold = 20
    # Line segments shorter than this are rejected.
    minLineLength = 20
    # Maximum allowed gap between points on the same line to link them.
    maxLineGap = 500
    # the function returns an array containing the endpoints of the straight lines
    # appearing in the input image
    return cv2.HoughLinesP(image, rho=rho, theta=theta, threshold=threshold,
                           minLineLength=minLineLength, maxLineGap=maxLineGap)

def average_slope_intercept(lines):
    """
    Find the slope and intercept of the left and right lanes of each image.
    Parameters:
        lines: output from Hough Transform
    """
    left_lines = []    # (slope, intercept)
    left_weights = []  # (length,)
    right_lines = []   # (slope, intercept)
    right_weights = [] # (length,)

    for line in lines:
        for x1, y1, x2, y2 in line:
            if x1 == x2:
                continue
            # calculating slope of a line
            slope = (y2 - y1) / (x2 - x1)
            # calculating intercept of a line
            intercept = y1 - (slope * x1)
            # calculating length of a line
            length = np.sqrt(((y2 - y1) ** 2) + ((x2 - x1) ** 2))
            # slope of the left lane is negative and for the right lane slope is positive
            if slope < 0:
                left_lines.append((slope, intercept))
                left_weights.append((length))
            else:
                right_lines.append((slope, intercept))
                right_weights.append((length))
    # length-weighted average of slope and intercept for each lane
    left_lane = np.dot(left_weights, left_lines) / np.sum(left_weights) if len(left_weights) > 0 else None
    right_lane = np.dot(right_weights, right_lines) / np.sum(right_weights) if len(right_weights) > 0 else None
    return left_lane, right_lane

def pixel_points(y1, y2, line):
    """
    Converts the slope and intercept of each line into pixel points.
    Parameters:
        y1: y-value of the line's starting point.
        y2: y-value of the line's end point.
        line: The slope and intercept of the line.
    """
    if line is None:
        return None
    slope, intercept = line
    x1 = int((y1 - intercept) / slope)
    x2 = int((y2 - intercept) / slope)
    y1 = int(y1)
    y2 = int(y2)
    return ((x1, y1), (x2, y2))

def lane_lines(image, lines):
    """
    Create full length lines from pixel points.
    Parameters:
        image: The input test image.
        lines: The output lines from Hough Transform.
    """
    left_lane, right_lane = average_slope_intercept(lines)
    y1 = image.shape[0]
    y2 = y1 * 0.6
    left_line = pixel_points(y1, y2, left_lane)
    right_line = pixel_points(y1, y2, right_lane)
    return left_line, right_line

def draw_lane_lines(image, lines, color=[255, 0, 0], thickness=12):
    """
    Draw lines onto the input image.
    Parameters:
        image: The input test image (video frame in our case).
        lines: The output lines from Hough Transform.
        color (Default = red): Line color.
        thickness (Default = 12): Line thickness.
    """
    line_image = np.zeros_like(image)
    for line in lines:
        if line is not None:
            cv2.line(line_image, *line, color, thickness)
    return cv2.addWeighted(image, 1.0, line_image, 1.0, 0.0)

def frame_processor(image):
    """
    Process the input frame to detect lane lines.
    Parameters:
        image: image of a road where one wants to detect lane lines
        (we will be passing frames of video to this function)
    """
    # convert the RGB image to gray scale
    grayscale = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # size of gaussian kernel
    kernel_size = 5
    # applying gaussian blur to remove noise from the frames
    blur = cv2.GaussianBlur(grayscale, (kernel_size, kernel_size), 0)
    # first threshold for the hysteresis procedure
    low_t = 50
    # second threshold for the hysteresis procedure
    high_t = 150
    # applying canny edge detection and save edges in a variable
    edges = cv2.Canny(blur, low_t, high_t)
    # since we are getting too many edges from our image, we apply
    # a mask polygon to only focus on the road
    region = region_selection(edges)
    # applying hough transform to get straight lines from our image
    # and find the lane lines
    hough = hough_transform(region)
    # lastly we draw the lines on our resulting frame and return it as output
    result = draw_lane_lines(image, lane_lines(image, hough))
    return result

# driver function
def process_video(test_video, output_video):
    """
    Read input video stream and produce a video file with detected lane lines.
    Parameters:
        test_video: location of input video file
        output_video: location where output video file is to be saved
    """
    # read the video file using VideoFileClip without audio
    input_video = editor.VideoFileClip(test_video, audio=False)
    # apply the function "frame_processor" to each frame of the video
    # "processed" stores the output video
    processed = input_video.fl_image(frame_processor)
    # save the output video stream to an mp4 file
    processed.write_videofile(output_video, audio=False)

# calling driver function
process_video('input.mp4', 'output.mp4')
