UNIT - 4 ML
3. Iterative Refinement: The network refines its output by adjusting
weights and biases, gradually improving its performance on diverse
tasks.
In an adaptive learning environment:
The neural network is exposed to a simulated scenario or dataset.
Parameters such as weights and biases are updated in response to
new data or conditions.
With each adjustment, the network’s response evolves, allowing it
to adapt effectively to different tasks or environments.
Working of Neural Networks
Forward Propagation
When data is input into the network, it passes through the network in
the forward direction, from the input layer through the hidden layers
to the output layer. This process is known as forward propagation.
Here’s what happens during this phase:
1. Linear Transformation: Each neuron in a layer receives inputs,
which are multiplied by the weights associated with the
connections. These products are summed together, and a bias is
added to the sum. This can be represented mathematically as:
z = w1x1 + w2x2 + … + wnxn + b
where w represents the weights, x represents the inputs, and b is the bias.
2. Activation: The result of the linear transformation (denoted as z)
is then passed through an activation function. The activation
function is crucial because it introduces non-linearity into the
system, enabling the network to learn more complex patterns.
Popular activation functions include ReLU, sigmoid, and tanh.
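As a minimal sketch of these two steps for a single neuron (the weights, inputs, and bias below are made-up values, not from the text):

import numpy as np

def relu(z):
    return np.maximum(0, z)

x = np.array([0.5, 0.1, 0.4])    # inputs x1..xn
w = np.array([0.2, -0.3, 0.7])   # weights w1..wn
b = 0.1                          # bias

z = np.dot(w, x) + b             # linear transformation: w1*x1 + ... + wn*xn + b
a = relu(z)                      # activation introduces non-linearity
print(z, a)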
Backpropagation
After forward propagation, the network evaluates its performance
using a loss function, which measures the difference between the
actual output and the predicted output. The goal of training is to
minimize this loss. This is where backpropagation comes into play:
1. Loss Calculation: The network calculates the loss, which provides
a measure of error in the predictions. The loss function could vary;
common choices are mean squared error for regression tasks or
cross-entropy loss for classification.
2. Gradient Calculation: The network computes the gradients of the
loss function with respect to each weight and bias in the network.
This involves applying the chain rule of calculus to find out how
much each part of the output error can be attributed to each weight
and bias.
3. Weight Update: Once the gradients are calculated, the weights
and biases are updated using an optimization algorithm like
stochastic gradient descent (SGD). The weights are adjusted in the
opposite direction of the gradient to minimize the loss. The size of
the step taken in each update is determined by the learning rate.
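For instance, a single SGD-style update can be sketched as follows (the gradient values are placeholders; in practice they come from backpropagation):

import numpy as np

learning_rate = 0.01
w = np.array([0.2, -0.3, 0.7])           # current weights
b = 0.1                                  # current bias
grad_w = np.array([0.05, -0.02, 0.10])   # dLoss/dw from backpropagation (illustrative)
grad_b = 0.04                            # dLoss/db (illustrative)

w = w - learning_rate * grad_w           # step in the opposite direction of the gradient
b = b - learning_rate * grad_b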
Iteration
This process of forward propagation, loss calculation,
backpropagation, and weight update is repeated for many iterations
over the dataset. Over time, this iterative process reduces the loss, and
the network’s predictions become more accurate.
Through these steps, neural networks can adapt their parameters to
better approximate the relationships in the data, thereby improving
their performance on tasks such as classification, regression, or any
other predictive modeling.
Example of Email Classification
Let’s consider a record of an email dataset:
Email ID | Email Content | Sender | Subject Line | Label
1 | "Get free gift cards now!" | spam@example.com | "Exclusive Offer" | 1
To classify this email, we will create a feature vector based on the
analysis of keywords such as “free,” “win,” and “offer.”
The feature vector of the record can be presented as:
“free”: Present (1)
“win”: Absent (0)
“offer”: Present (1)
Email ID | Email Content | Sender | Subject Line | Feature Vector | Label
1 | "Get free gift cards now!" | spam@example.com | "Exclusive Offer" | [1, 0, 1] | 1
Now, let’s delve into the working:
1. Input Layer: The input layer contains 3 nodes, one indicating the
presence of each keyword.
2. Hidden Layer
The input data is passed through one or more hidden layers.
Each neuron in the hidden layer performs the following operations:
1. Weighted Sum: Each input is multiplied by a corresponding
weight assigned to the connection. For example, if the weights
from the input layer to the hidden layer neurons are as follows:
o Weights for Neuron H1: [0.5, -0.2, 0.3]
o Weights for Neuron H2: [0.4, 0.1, -0.5]
2. Calculate Weighted Input:
o For Neuron H1: (1 × 0.5) + (0 × −0.2) + (1 × 0.3) = 0.5 + 0 + 0.3 = 0.8
o For Neuron H2: (1 × 0.4) + (0 × 0.1) + (1 × −0.5) = 0.4 + 0 − 0.5 = −0.1
3. Activation Function: The result is passed through an activation
function (e.g., ReLU or sigmoid) to introduce non-linearity.
o For H1, applying ReLU: ReLU(0.8) = 0.8
o For H2, applying ReLU: ReLU(−0.1) = 0
3. Output Layer
The activated outputs from the hidden layer are passed to the
output neuron.
The output neuron receives the values from the hidden layer
neurons and computes the final prediction using weights:
o Suppose the output weights from hidden layer to output
neuron are [0.7, 0.2].
o Calculation: Input = (0.8 × 0.7) + (0 × 0.2) = 0.56 + 0 = 0.56
o Final Activation: The output is passed through a sigmoid
activation function to obtain a probability: σ(0.56) ≈ 0.636
4. Final Classification
The output value of approximately 0.636 indicates the probability
of the email being spam.
Since this value is greater than 0.5, the neural network classifies
the email as spam (1).
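The worked example above can be verified with a few lines of NumPy (weights and inputs taken from the text):

import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([1, 0, 1])                   # feature vector: "free", "win", "offer"
W_hidden = np.array([[0.5, -0.2, 0.3],    # weights for neuron H1
                     [0.4,  0.1, -0.5]])  # weights for neuron H2
w_out = np.array([0.7, 0.2])              # hidden-to-output weights

h = relu(W_hidden @ x)                    # [0.8, 0.0]
p_spam = sigmoid(w_out @ h)               # sigmoid(0.56) ≈ 0.636
label = int(p_spam > 0.5)                 # 1 -> spam
print(p_spam, label)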
Long Short-Term Memory (LSTM): LSTM is a type of RNN
that is designed to overcome the vanishing gradient problem in
training RNNs. It uses memory cells and gates to selectively read,
write, and erase information.
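A minimal Keras sketch of an LSTM layer is shown below (the sequence length of 10 and 8 features per time step are illustrative, not from the text):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(32, input_shape=(10, 8)))    # memory cells with input/forget/output gates
model.add(Dense(1, activation='sigmoid'))   # e.g., a binary prediction from the sequence
model.compile(loss='binary_crossentropy', optimizer='adam')
model.summary()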
Implementation of Neural Network using TensorFlow
Here, we implement a simple feedforward neural network that trains on
a sample dataset and makes predictions using the following steps:
Step 1: Import Necessary Libraries
Import necessary libraries, primarily TensorFlow and Keras, along
with other required packages such as NumPy and Pandas for data
handling.
import numpy as np
import pandas as pd
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
Step 2: Create and Load Dataset
Create or load a dataset. Convert the data into a format suitable for
training (usually NumPy arrays).
Define features (X) and labels (y).
data = {
'feature1': [0.1, 0.2, 0.3, 0.4, 0.5],
'feature2': [0.5, 0.4, 0.3, 0.2, 0.1],
'label': [0, 0, 1, 1, 1]
}
df = pd.DataFrame(data)
X = df[['feature1', 'feature2']].values
y = df['label'].values
Step 3: Create a Neural Network
Instantiate a Sequential model and add layers. The input layer and
hidden layers are typically created using Dense layers, specifying the
number of neurons and activation functions.
model = Sequential()
model.add(Dense(8, input_dim=2, activation='relu')) # Hidden layer
model.add(Dense(1, activation='sigmoid')) # Output layer
Step 4: Compile the Model
Compile the model by specifying the loss function, optimizer, and
metrics to evaluate during training.
model.compile(loss='binary_crossentropy', optimizer='adam',
metrics=['accuracy'])
Step 5: Train the Model
Fit the model on the training data, specifying the number of epochs
and batch size. This step trains the neural network to learn from the
input data.
model.fit(X, y, epochs=100, batch_size=1, verbose=1)
Step 6: Make Predictions
Use the trained model to make predictions on new data. Process the
output to interpret the predictions (e.g., convert probabilities to binary
outcomes).
test_data = np.array([[0.2, 0.4]])
prediction = model.predict(test_data)
predicted_label = (prediction > 0.5).astype(int)
Complete Code
import numpy as np
import pandas as pd
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Create the sample dataset and convert to DataFrame
data = {'feature1': [0.1, 0.2, 0.3, 0.4, 0.5],
        'feature2': [0.5, 0.4, 0.3, 0.2, 0.1],
        'label': [0, 0, 1, 1, 1]}
df = pd.DataFrame(data)
X = df[['feature1', 'feature2']].values
y = df['label'].values

# Build the model: 2 input features, 8 neurons in the hidden layer, 1 output neuron
model = Sequential()
model.add(Dense(8, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile and train
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=100, batch_size=1, verbose=0)

# Make a prediction and convert the probability to a binary output
test_data = np.array([[0.2, 0.4]])
prediction = model.predict(test_data)
predicted_label = (prediction > 0.5).astype(int)
print(f"Predicted label: {predicted_label[0][0]}")
Output:
Predicted label: 1
Advantages of Neural Networks
Neural networks are widely used in many different applications
because of their many benefits:
Adaptability: Neural networks are useful for activities where the
link between inputs and outputs is complex or not well defined
because they can adapt to new situations and learn from data.
Pattern Recognition: Their proficiency in pattern recognition
makes them effective in tasks such as audio and image
recognition, natural language processing, and other problems
involving intricate data patterns.
Parallel Processing: Because neural networks are capable of
parallel processing by nature, they can process numerous jobs at
once, which speeds up and improves the efficiency of
computations.
Non-Linearity: Neural networks are able to model and
comprehend complicated relationships in data by virtue of the non-
linear activation functions found in neurons, which overcome the
drawbacks of linear models.
Disadvantages of Neural Networks
Neural networks, while powerful, are not without drawbacks and
difficulties:
Computational Intensity: Training large neural networks can be a
slow, computationally demanding process that requires a lot of
computing power.
Black box Nature: As “black box” models, neural networks pose a
problem in important applications since it is difficult to understand
how they make decisions.
Overfitting: Overfitting is a phenomenon in which neural
networks memorize the training data rather than learning general
patterns in it. Although regularization approaches help to alleviate
this, the problem still exists.
Need for Large datasets: For efficient training, neural networks
frequently need sizable, labeled datasets; otherwise, their
performance may suffer from incomplete or skewed data.
Applications of Neural Networks
Neural networks have numerous applications across various fields:
1. Image and Video Recognition: CNNs are extensively used in
applications such as facial recognition, autonomous driving, and
medical image analysis.
2. Natural Language Processing (NLP): RNNs and transformers
power language translation, chatbots, and sentiment analysis.
3. Finance: Predicting stock prices, fraud detection, and risk
management.
4. Healthcare: Neural networks assist in diagnosing diseases,
analyzing medical images, and personalizing treatment plans.
5. Gaming and Autonomous Systems: Neural networks enable real-
time decision-making, enhancing user experience in video games
and enabling autonomous systems like self-driving cars.
2. Model Representation I
Neuron in the brain
o Many neurons in our brain
o Dendrite: receive input
o Axon: produce output
When it sends a message through the axon to another neuron,
it sends it to the other neuron's dendrite
Neuron model: logistic unit
o Yellow circle: body of neuron
o Input wires: dendrite
o Output wire: axon
Neural Network
o 3 Layers
Layer 1: input layer
Layer 2: hidden layer
Unable to observe its values
Anything other than the input or output layer
Layer 3: output layer
Significance of the subscripts:
j (the first of the two subscript numbers) ranges from 1 to the number of units in layer l+1
i (the second of the two subscript numbers) ranges from 0 to the number of units in layer l
l is the layer you're moving FROM
Notation
Features
o A neural network learns its own features
The features a’s are learned from x’s
It learns its own features to feed into logistic
regression
Better hypothesis than if we were constrained with
just x1, x2, x3
We can have whatever features we want to feed to
the final logistic regression function
Implementation in Octave for a2
a2 = sigmoid(Theta1 * x);
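An equivalent NumPy sketch (the shapes and random Theta1 values are illustrative; x includes the bias unit x0 = 1):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([1.0, 0.5, 0.2, 0.9])   # input vector with bias unit x0 = 1 prepended
Theta1 = np.random.randn(3, 4)       # weights from the input layer to 3 hidden units

a2 = sigmoid(Theta1 @ x)             # hidden-layer activations, as in the Octave line above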
1. Overfitting
When a neural network model learns the training data too well,
including the noise and outliers, it may perform poorly on new,
unseen data. This can be mitigated with techniques like dropout,
regularization, and cross-validation.
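For example, dropout and L2 regularization can be added to the earlier Keras model like this (layer sizes, rates, and the validation split are illustrative):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.regularizers import l2

model = Sequential()
model.add(Dense(32, input_dim=2, activation='relu',
                kernel_regularizer=l2(0.01)))   # L2 weight regularization
model.add(Dropout(0.5))                          # randomly drop units during training
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# model.fit(X, y, validation_split=0.2, epochs=50)  # hold-out validation to watch for overfitting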
2. Underfitting
When a model is too simple to capture the underlying patterns in the
training data, it performs poorly on both the training data and new data.
Increasing model capacity, adding features, or training for longer can help.
3. Vanishing/Exploding Gradients
In deep networks, gradients can become extremely small (vanishing) or
extremely large (exploding) as they are propagated backwards, which slows
down or destabilizes training. Careful weight initialization, ReLU-family
activations, gradient clipping, and architectures such as LSTMs help mitigate this.
5. Computational Requirements
Training deep neural networks can be computationally expensive and
time-consuming, requiring powerful hardware like GPUs. Optimizing
algorithms, model architecture, and using distributed computing can
mitigate this issue.
6. Hyperparameter Tuning
Finding the right hyperparameters (like learning rate, batch size, etc.)
is crucial for performance but can be a tedious and complex task. Grid
search, random search, and more advanced methods like Bayesian
optimization can help in tuning hyperparameters.
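A simple manual grid search over the learning rate and batch size might look like this (it assumes the X and y arrays from the earlier example; the candidate value ranges are illustrative):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

best = None
for lr in [0.1, 0.01, 0.001]:              # candidate learning rates
    for batch_size in [1, 4]:              # candidate batch sizes
        model = Sequential([
            Dense(8, input_dim=2, activation='relu'),
            Dense(1, activation='sigmoid')
        ])
        model.compile(loss='binary_crossentropy',
                      optimizer=Adam(learning_rate=lr),
                      metrics=['accuracy'])
        history = model.fit(X, y, epochs=50, batch_size=batch_size, verbose=0)
        acc = history.history['accuracy'][-1]
        if best is None or acc > best[0]:
            best = (acc, lr, batch_size)
print("Best (accuracy, learning rate, batch size):", best)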
7. Interpretability
Neural networks often behave as "black boxes," which makes it hard to
explain why a particular prediction was made. Using simpler models and
using tools like SHAP (SHapley Additive exPlanations) can help in
understanding model predictions.
8. Class Imbalance
When the dataset has classes that are not equally represented, the
model might become biased towards the majority class. Techniques
like resampling, class weighting, and anomaly detection can help in
handling imbalanced datasets.
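In Keras, one simple mitigation is to pass class weights to fit() so that errors on the minority class are penalized more heavily (the weights below are illustrative and assume the compiled binary model and X, y arrays from the earlier example):

class_weight = {0: 1.0, 1: 5.0}   # mistakes on the rare class 1 count five times more
model.fit(X, y, epochs=100, batch_size=1, class_weight=class_weight, verbose=0)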
This entire procedure is known as Gradient Descent, which is also
known as steepest descent. The main objective of using a gradient
descent algorithm is to minimize the cost function through iteration. To
achieve this goal, it performs two steps iteratively:
o Compute the gradient (slope), i.e. the first-order derivative of the cost function at the current point.
o Move a step in the direction opposite to the gradient, since that is the direction of steepest descent; the step size is controlled by the learning rate.
What is Cost-function?
A cost function measures the difference, or error, between the actual and
expected values at the current position. The slight difference between a loss
function and a cost function concerns the error within the training
of machine learning models: a loss function refers to the error of one
training example, while a cost function calculates the average error
across an entire training set.
Hypothesis:
Y = mX + c
Parameters: m and c
Where 'm' represents the slope of the line, and 'c' represents the
intercept on the y-axis.
The starting point (shown in the above figure) is used to evaluate the
performance, as it is considered just an arbitrary point. At this starting
point, we derive the first derivative or slope and then use a tangent
line to calculate the steepness of this slope. Further, this slope will
inform the updates to the parameters (weights and bias).
The slope is steeper at the starting or arbitrary point, but whenever new
parameters are generated the steepness gradually reduces, until the curve
approaches the lowest point, which is called the point of convergence.
The main objective of gradient descent is to minimize the cost function
or the error between expected and actual. To minimize the cost
function, two data points are required:
Learning Rate:
It is defined as the step size taken to reach the minimum or lowest point.
This is typically a small value that is evaluated and updated based on
the behavior of the cost function. If the learning rate is high, the
algorithm takes larger steps but risks overshooting the minimum. A low
learning rate, on the other hand, means smaller step sizes, which
compromises overall efficiency but gives the advantage of more
precision.
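A minimal sketch of gradient descent for the hypothesis Y = mX + c with a mean-squared-error cost (the data and learning rate below are made up):

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([3.0, 5.0, 7.0, 9.0])      # true relationship: Y = 2X + 1

m, c = 0.0, 0.0
learning_rate = 0.05

for _ in range(1000):
    Y_pred = m * X + c
    error = Y_pred - Y
    grad_m = (2 / len(X)) * np.sum(error * X)   # d(cost)/dm
    grad_c = (2 / len(X)) * np.sum(error)       # d(cost)/dc
    m -= learning_rate * grad_m                 # step against the gradient
    c -= learning_rate * grad_c

print(m, c)   # approaches m ≈ 2, c ≈ 1 at the point of convergence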
Types of Gradient Descent
Batch gradient descent (BGD) computes the error for each point in
the training set and updates the model only after evaluating all training
examples. One full pass over the training set is known as a training epoch.
In simple words, it is a greedy approach where we have to sum over all
examples for each update.
For convex problems, gradient descent can find the global minimum
easily, while for non-convex problems it is sometimes difficult to find
the global minimum, which is where the machine learning model achieves
the best results.
Whenever the slope of the cost function is at zero or just close to zero,
the model stops learning further. Apart from the global minimum, there
are other scenarios that can produce this slope, namely saddle points and
local minima. A local minimum has a shape similar to the global
minimum, where the slope of the cost function increases on both sides
of the current point.
In contrast, at saddle points the negative gradient only occurs on one
side of the point, which reaches a local maximum on one side and a
local minimum on the other side. The name saddle point is taken
from that of a horse's saddle.
The name local minimum is used because the value of the loss function is
minimum at that point in a local region. In contrast, the name
global minimum is used because the value of the loss function is
minimum there globally, across the entire domain of the loss function.
In a deep neural network, if the model is trained with gradient descent
and backpropagation, two further issues can occur besides local
minima and saddle points.
Vanishing Gradients: the gradients become smaller and smaller as they are
propagated backwards through the layers, so the weights of the earlier layers
barely change and learning slows down or stalls.
Exploding Gradients: the gradients grow very large as they are propagated
backwards, causing unstable weight updates and, in extreme cases, numeric overflow.
MLP networks are used in a supervised learning setting. The typical learning algorithm for
MLP networks is the backpropagation algorithm.
A multilayer perceptron (MLP) is a feedforward artificial neural network that generates a
set of outputs from a set of inputs. An MLP is characterized by several layers of
nodes connected as a directed graph between the input and output layers. MLP uses
backpropagation for training the network. MLP is a deep learning method.
o Build graphs and run sessions [do all the set-up and then execute a session to
evaluate tensors and run operations].
o Create our code and run it on the fly.
For this first part, we will use the interactive session that is more suitable for an
environment like a Jupyter notebook.
sess = tf.InteractiveSession()
Creating placeholders
It's a best practice to create placeholders before variable assignments when using
TensorFlow. Here we'll create placeholders for the inputs ("Xs") and outputs ("Ys").
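In TensorFlow 1.x style this looks roughly as follows (the feature and label shapes are illustrative; under TensorFlow 2.x these calls live under tf.compat.v1 with eager execution disabled):

import tensorflow as tf

sess = tf.InteractiveSession()

X = tf.placeholder(tf.float32, shape=[None, 2])   # inputs ("Xs"); 2 features per example
Y = tf.placeholder(tf.float32, shape=[None, 1])   # outputs ("Ys"); one label per example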
3. Computing Outputs
To find the outputs of y3, y4 and y5:
At node h1,
a1 = (w1,1 × x1) + (w2,1 × x2) = (0.2 × 0.35) + (0.2 × 0.7) = 0.21
Once we have calculated the a1 value, we can proceed to find the
y3 value:
yj = F(aj) = 1 / (1 + e^(−aj))
y3 = F(0.21) = 1 / (1 + e^(−0.21))
y3 = 0.56
Similarly, find the values of y4 at h2 and y5 at O3:
a2 = (w1,2 × x1) + (w2,2 × x2) = (0.3 × 0.35) + (0.3 × 0.7) = 0.315
y4 = F(0.315) = 1 / (1 + e^(−0.315))
a3 = (w1,3 × y3) + (w2,3 × y4) = (0.3 × 0.57) + (0.9 × 0.59) = 0.702
y5 = F(0.702) = 1 / (1 + e^(−0.702)) = 0.67
4. Error Calculation
Note that, our actual output is 0.5 but we obtained 0.67.
To calculate the error, we can use the below formula:
Errorj = ytarget − y5
Error = 0.5 − 0.67 = −0.17
Using this error value, we will be backpropagating.
Backpropagation
1. Calculating Gradients
The change in each weight is calculated as:
Δwij = η × δj × Oi
Where:
δj is the error term of the unit j at the receiving end of the weight,
Oi is the output of the unit i at the sending end, and
η is the learning rate.
2. Output Unit Error
For O3:
δ5 = y5(1 − y5)(ytarget − y5)
   = 0.67(1 − 0.67)(−0.17) = −0.0376
3. Hidden Unit Error
For h1:
δ3 = y3(1 − y3)(w1,3 × δ5)
   = 0.56(1 − 0.56)(0.3 × −0.0376) = −0.0027
For h2:
δ4 = y4(1 − y4)(w2,3 × δ5)
   = 0.59(1 − 0.59)(0.9 × −0.0376) = −0.0819
4. Weight Updates
For the weights from hidden to output layer:
Δw2,3 = 1 × (−0.0376) × 0.59 = −0.022184
New weight:
w2,3(new) = −0.22184 + 0.9 = 0.67816
For weights from input to hidden layer:
Δw1,1 = 1 × (−0.0027) × 0.35 = 0.000945
New weight:
w1,1(new) = 0.000945 + 0.2 = 0.200945
Similarly, other weights are updated:
w1,2(new) = 0.271335
w1,3(new) = 0.08567
w2,1(new) = 0.29811
w2,2(new) = 0.24267
The updated weights are illustrated below,
Recomputing the forward pass with the updated weights gives y5 = 0.61.
Since y5 = 0.61 is still not the target output, the process of calculating
the error and backpropagating continues until the desired output is
reached.
This process demonstrates how backpropagation iteratively updates
weights by minimizing errors until the network accurately predicts the
output.
Error = ytarget − y5
      = 0.5 − 0.61 = −0.11
This process continues until the desired output is produced by
the neural network.
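The forward pass and error terms from this example can be reproduced in a few lines of Python (weights, inputs, and target taken from the text; small differences from the quoted figures are due to rounding):

import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

x1, x2, y_target = 0.35, 0.7, 0.5
w11, w21 = 0.2, 0.2        # input -> h1
w12, w22 = 0.3, 0.3        # input -> h2
w13, w23 = 0.3, 0.9        # hidden -> output

y3 = sigmoid(w11 * x1 + w21 * x2)          # ≈ 0.55
y4 = sigmoid(w12 * x1 + w22 * x2)          # ≈ 0.58
y5 = sigmoid(w13 * y3 + w23 * y4)          # ≈ 0.67

delta5 = y5 * (1 - y5) * (y_target - y5)   # output-unit error term
delta3 = y3 * (1 - y3) * (w13 * delta5)    # hidden-unit error terms
delta4 = y4 * (1 - y4) * (w23 * delta5)
print(round(y5, 3), round(delta5, 4), round(delta3, 4), round(delta4, 4))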
8. Face Recognition
Face recognition using Artificial Intelligence(AI) is a computer
vision technology that is used to identify a person or object from an image or
video. It uses a combination of techniques including deep learning, computer
vision algorithms, and Image processing. These technologies are used to
enable a system to detect, recognize, and verify faces in digital images or
videos.
The technology has become increasingly popular in a wide variety of
applications such as unlocking a smartphone, unlocking doors, passport
authentication, security systems, medical applications, and so on. There are
even models that can detect emotions from facial expressions.
Difference between Face recognition & Face detection
Face recognition is the process of identifying a person from an image or video
feed and face detection is the process of detecting a face in an image or video
feed. In the case of Face recognition, someone’s face is recognized and
differentiated based on their facial features. It involves more advanced
processing techniques to identify a person’s identity based on feature point
extraction, and comparison algorithms. and can be used for applications such
as automated attendance systems or security checks. Face detection, in contrast,
is a much simpler process and can be used for applications such as image tagging
or altering the angle of a photo based on the face detected. It is the initial step
in the face recognition process and simply identifies that a face is present in an
image or video feed.
Every Machine Learning algorithm takes a dataset as input and learns from that
data; in other words, the algorithm learns from the provided input and
output data. It identifies the patterns in the data and produces the desired
output. For instance, to identify whose face is present in a given image,
multiple things can be looked at as a pattern:
Height/width of the face.
Height and width may not be reliable since the image could be rescaled to a
smaller face or grid. However, even after rescaling, what remains
unchanged are the ratios – the ratio of the height of the face to the width of
the face won’t change.
Color of the face.
Width of other parts of the face like lips, nose, etc.
There is a pattern involved – different faces have different dimensions like the
ones above, and similar faces have similar dimensions. Machine Learning
algorithms only understand numbers, which makes this quite challenging. This numerical
representation of a "face" (or an element in the training set) is termed a
feature vector. A feature vector comprises various numbers in a specific
order.
As a simple example, we can map a “face” into a feature vector which can
comprise various features like:
Height of face (cm)
Width of the face (cm)
Average color of face (R, G, B)
Width of lips (cm)
Height of nose (cm)
Essentially, given an image, we can convert it into a feature vector like:

Height of face (cm) | Width of face (cm) | Average color of face (RGB) | Width of lips (cm) | Height of nose (cm)
23.1 | 15.8 | (255, 224, 189) | 5.2 | 4.4

So, the image is now a vector that could be represented as (23.1, 15.8, 255,
224, 189, 5.2, 4.4). There could be countless other features that could be
derived from the image, for instance, hair color, facial hair, spectacles, etc.
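In code, such a feature vector is simply an array of numbers, and similar faces give nearby vectors (the first vector uses the values from the table above; the second face is made up):

import numpy as np

face = np.array([23.1, 15.8, 255, 224, 189, 5.2, 4.4])        # feature vector from the table
other_face = np.array([23.0, 15.9, 250, 220, 185, 5.1, 4.3])  # a hypothetical similar face
distance = np.linalg.norm(face - other_face)                  # small distance -> similar faces
print(distance)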
Machine Learning does two major functions in face recognition technology.
These are given below:
1. Deriving the feature vector: it is difficult to manually list down all of the
features because there are just so many. A Machine Learning algorithm can
intelligently derive many such features automatically. For instance, a complex
feature could be the ratio of the height of the nose to the width of the
forehead.
2. Matching algorithms: Once the feature vectors have been obtained, a
Machine Learning algorithm needs to match a new image with the set of
feature vectors present in the corpus.
Face Recognition Operations
The technology system may vary when it comes to facial recognition.
Different software applies different methods and means to achieve face
recognition. The stepwise method is as follows:
Face Detection: To begin with, the camera detects and recognizes a
face. The face is best detected when the person is looking directly at
the camera, as this makes facial recognition easier. With advancements
in technology, this has improved so that a face can be detected even
with slight variations in pose away from the camera.
Face Analysis: Then the photo of the face is captured and analyzed. Most
facial recognition relies on 2D images rather than 3D because it is more
convenient to match to the database. Facial recognition software will
analyze the distance between your eyes or the shape of your cheekbones.
Image to Data Conversion: Now it is converted to a mathematical formula
and these facial features become numbers. This numerical code is known as
a face print. The way every person has a unique fingerprint, in the same
way, they have unique face prints.
Match Finding: Then the code is compared against a database of other face
prints. This database has photos with identification that can be compared.
The technology then identifies a match for your exact features in the
provided database. It returns with the match and attached information such
as name and address or it depends on the information saved in the database
of an individual.
Implementations
Steps:
Import the necessary packages
Load the known face images and make the face embedding of known image
Launch the live camera
Record the images from the live camera frame by frame
Perform face detection using the face_recognition face_locations
function
Make the rectangle around the faces
Make the face encoding for the faces captured by the camera
if the faces are matched then plot the person image else continue
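A sketch of these steps using the face_recognition and OpenCV packages is given below (the file name "known_person.jpg", camera index 0, and the window handling are illustrative):

import cv2
import face_recognition

# Load the known face image and compute its embedding
known_image = face_recognition.load_image_file("known_person.jpg")
known_encoding = face_recognition.face_encodings(known_image)[0]

video = cv2.VideoCapture(0)                      # launch the live camera
while True:
    ret, frame = video.read()                    # record images frame by frame
    if not ret:
        break
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # face_recognition expects RGB
    locations = face_recognition.face_locations(rgb_frame)
    encodings = face_recognition.face_encodings(rgb_frame, locations)
    for (top, right, bottom, left), encoding in zip(locations, encodings):
        match = face_recognition.compare_faces([known_encoding], encoding)[0]
        color = (0, 255, 0) if match else (0, 0, 255)
        cv2.rectangle(frame, (left, top), (right, bottom), color, 2)   # rectangle around the face
    cv2.imshow("Face recognition", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

video.release()
cv2.destroyAllWindows()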
The model accuracy can be further improved using deep learning and
other methods.
Face Recognition Softwares
Many renowned companies are constantly innovating and improving to
develop face recognition software that is foolproof and dependable. Some
prominent software packages are discussed below:
a. Deep Vision AI
Deep Vision AI is a front-runner company excelling in facial recognition
software. The company owns the proprietorship of advanced computer vision
technology that can understand images and videos automatically. It then turns
the visual content into real-time analytics and provides very valuable
insights.
Deep Vision AI provides a plug-and-play platform to its users worldwide. The
users are given real-time alerts and faster responses based upon the analysis of
camera streams through various AI-based modules. The product offers a
highly accurate rate of identification of individuals on a watch list by
continuous monitoring of target zones. The software is so flexible that it
can be connected to any existing camera system or can be deployed through
the cloud.
At present, Deep Vision AI offers the best performance solution in the market
supporting real-time processing at +15 streams per GPU.
The software aids business intelligence gathering by providing real-time data on
customers and their frequency of visits, and by enhancing security and safety.
Further, the output from the software can provide attributes like count, age,
gender, etc that can enhance the understanding of consumer behavior,
changing preferences, shifts with time, and conditions that can guide future
marketing efforts and strategies. The users also combine the face recognition
capabilities with other AI-based features of Deep Vision AI like vehicle
recognition to get more correlated data of the consumers.
The company complies with international data protection laws and applies
significant measures for a transparent and secure process of the data generated
by its customers. Data privacy and ethics are taken care of.
The potential markets include cities, public venues, public transportation,
educational institutes, large retailers, etc. Deep Vision AI is a certified partner
for NVIDIA’s Metropolis, Dell Digital Cities, Amazon AWS, Microsoft, Red
Hat, and others.
b. SenseTime
SenseTime is a leading platform developer that has dedicated efforts to
create solutions using the innovations in AI and big data analysis. The
technology offered by SenseTime is multifunctional. The aspects of this
technology are expanding and include the capabilities of facial recognition,
image recognition, intelligent video analytics, autonomous driving, and
medical image recognition. SenseTime software includes different subparts
namely, SensePortrait-S, SensePortrait-D, and SenseFace.
SensePortrait-S is a Static Face Recognition Server. It includes the
functionality of face detection from an image source, extraction of features,
extraction, and analysis of attributes, and target retrieval from a vast facial
image database
SensePortrait-D is a Dynamic Face Recognition Server. The capabilities
included are face detection, face tracking, extraction of features, and
comparison and analysis of data from multiple surveillance video
streams.
SenseFace is a Face Recognition Surveillance Platform. This utility is a
Face Recognition technology that uses a deep learning algorithm.
SenseFace is very efficient in integrated solutions to intelligent video
analysis. It can be extensively used for target surveillance, analysis of the
trajectory of a person, management of population and the associated data
analysis, etc
SenseTime has provided its services to many companies and government
agencies including Honda, Qualcomm, China Mobile, UnionPay, Huawei,
Xiaomi, OPPO, Vivo, and Weibo.
c. Amazon Rekognition
Amazon provides a cloud-based software solution, Amazon Rekognition, a
computer vision platform offered as a service. This solution allows an easy method to add
image and video analysis to various applications. It uses a highly scalable and
proven deep learning technology. The user is not required to have any machine
learning expertise to use this software. The platform can be utilized to identify
objects, text, people, activities, and scenes in images and videos. It can also
detect any inappropriate content. The user gets a highly accurate facial
analysis and facial search capabilities. Hence, the software can be easily used
for verification, counting of people, and public safety by detection, analysis,
and comparison of faces.
Organizations can use Amazon Rekognition Custom Labels to generate data
about specific objects and scenes available in images according to their
business needs. For example, a model may be easily built to classify specific
machine parts on the assembly line or to detect unhealthy plants. The user
simply provides the images of objects or scenes he wants to identify, and the
service handles the rest.
d. FaceFirst
The FaceFirst software ensures the safety of communities, secure transactions,
and great customer experiences. FaceFirst is secure, accurate, private, fast, and
scalable software. Plug-and-play solutions are also included for physical
security, authentication of identity, access control, and visitor analytics. It can
be easily integrated into any system. This computer vision platform has been
used for face recognition and automated video analytics by many
organizations to prevent crime and improve customer engagement.
As a leading provider of effective facial recognition systems, it benefits
retail, transportation, event security, casinos, and other industries and public
spaces. FaceFirst ensures the integration of artificial intelligence with existing
surveillance systems to prevent theft, fraud, and violence.
e. Trueface
TrueFace is a leading computer vision model that helps people understand
their camera data and convert the data into actionable information. TrueFace is
an on-premise computer vision solution that enhances data security and
performance speeds. The platform-based solutions are specifically trained as
per the requirements of individual deployment and operate effectively in a
variety of ecosystems. The software places the utmost priority on the diversity
of training data. It ensures equivalent performance for all users irrespective of
their widely different requirements.
Trueface has developed a suite consisting of SDKs and a dockerized container
solution based on the capabilities of machine learning and artificial
intelligence. The suite can convert the camera data into actionable intelligence.
It can help organizations to create a safer and smarter environment for their
employees, customers, and guests using facial recognition, weapon detection,
and age verification technologies.
f. Face++
Face++ is an open platform enabled by the Chinese company Megvii. It
offers computer vision technologies. It allows users to easily integrate deep
learning-based image analysis recognition technologies into their
applications.
Face++ uses AI and machine vision in amazing ways to detect and analyze
faces, and accurately confirm a person’s identity. Face++ is also developer-
friendly: being an open platform, any developer can create apps
using its algorithms. This feature has resulted in making Face++ the most
extensive facial recognition platform in the world, with 300,000 developers
from 150 countries using it.
The most significant usage of Face++ has been its integration into
Alibaba’s City Brain platform. This has allowed the analysis of the CCTV
network in cities to optimize traffic flows and direct the attention of medics
and police by observing incidents.
g. Kairos
Kairos is a state-of-the-art and ethical face recognition solution available to
developers and businesses across the globe. Kairos can be used for Face
Recognition via Kairos cloud API, or the user can host Kairos on their
servers. The utility can be used for control of data, security, and privacy.
Organizations can ensure a safer and better accessibility experience for
their customers.
Kairos Face Recognition On-Premises has the added advantage of
controlling data privacy and security, keeping critical data in-house and
safe from any potential third parties/hackers. The speed of face recognition-
enabled products is highly enhanced because it does not come across the
issue of delay and other risks associated with public cloud deployment.
Kairos has an ultra-scalable architecture, such that a search over 10 million
faces can be done in approximately the same time as a search over 1 face. It is
being accepted by the market with open arms.
h. Cognitec
Cognitec’s FaceVACS Engine enables users to develop new applications for
face recognition. The engine is very versatile as it allows a clear and logical
API for easy integration in other software programs. Cognitec allows the use
of the FaceVACS Engine through customized software development kits. The
platform can be easily tailored through a set of functions and modules specific
to each use case and computing platform. The capabilities of this software
include image quality checks, secure document issuance, and access control by
accurate verification.
The distinct features include:
A very powerful face localization and face tracking
Efficient algorithms for enrollment, verification, and identification
Accurate checking of age, gender, exposure, pose deviation, glasses,
closed eyes, uniform lighting, unnatural color, and image and face
geometry
Fulfills the requirements of ePassports by providing ISO 19794-5 full-frontal
image type checks and formatting
Utilization of Face Recognition
While facial recognition may seem futuristic, it’s currently being used in a
variety of ways. Here are some surprising applications of this technology.
Genetic Disorder Identification:
There are healthcare apps such as Face2Gene and software like Deep Gestalt
that use facial recognition to detect genetic disorders. The face is
analyzed and matched against an existing database of disorders.
Airline Industry:
Some airlines use facial recognition to identify passengers. This face scanning
helps save time and prevents the hassle of keeping track of a ticket.
Hospital Security:
Facial recognition can be used in hospitals to keep a record of patients,
which is far better than keeping paper records of names and addresses. It is
easy for the staff to use such an app to recognize a patient and retrieve their
details within seconds. It can also be used for security purposes, for example
to verify whether a person is genuine or actually a patient.
Detection of emotions and sentiments:
Real-time emotion detection is yet another valuable application of face
recognition in healthcare. It can be used to detect emotions that patients
exhibit during their stay in the hospital and analyze the data to determine how
they are feeling. The results of the analysis may help to identify if patients
need more attention in case they’re in pain or sad.
Problems and Challenges
Face recognition technology is facing several challenges. The common
problems and challenges that a face recognition system can have while
detecting and recognizing faces are discussed in the following paragraphs.
Pose: A Face Recognition System can tolerate cases with small rotation
angles, but detection becomes difficult if the angle is large, and if the
database does not contain all angles of the face this can pose a
problem.
Expressions: Because of emotions, the human mood varies and results in
different facial expressions. With these facial expressions, the machine could
make mistakes in finding the correct person’s identity.
Aging: With time and age the face changes; it is unique but does not remain
rigid, because of which it may be difficult to identify a person who is now,
say, 60 years old.
Occlusion: Occlusion means blockage. This is due to the presence of
various occluding objects such as glasses, beard, mustache, etc. on the face,
and when an image is captured, the face lacks some parts. Such a problem
can severely affect the classification process of the recognition system.
Illumination: Illumination means light variations. Illumination changes can
vary the overall magnitude of light intensity reflected from an object, as
well as the pattern of shading and shadows visible in an image. The
problem of face recognition over changes in illumination is widely
recognized to be difficult for both humans and algorithms. The difficulties posed
by varying illumination conditions remain a challenge for automatic face recognition
systems.
Identifying similar faces: Different persons may have a similar appearance,
which sometimes makes it nearly impossible to distinguish between them.
Disadvantages of Face Recognition
1. The danger of automated blanket surveillance
2. Lack of clear legal or regulatory framework
3. Violation of the principles of necessity and proportionality
4. Violation of the right to privacy
5. Effect on democratic political culture
9. Neural Network Learning: Neural Network Representation, Problems for Neural Network
Learning, Perceptrons and Gradient Descent, Multilayer Networks and the Backpropagation Algorithm,
Illustrative Example of the Backpropagation Algorithm - Face Recognition, Advanced Topics in ANN.
ADVANCED TOPICS IN ANN IN MACHINE LEARNING
Here are some advanced topics in Artificial Neural Networks (ANNs) and
machine learning:
1. Deep Learning
2. Kernel Methods
3. Probabilistic Graphical Models
Bayesian Networks: Representing relationships between variables using
graph structures.
Undirected Models: Also known as Markov Random Fields.
Bayesian Learning and Structure Learning: Inference on graphical
models.
4. Reinforcement Learning
5. Generative Models
These topics offer a lot of depth and potential for groundbreaking applications
in various domains such as image processing, natural language understanding,
and autonomous systems.