
Unit V:

Neural Networks and Deep Learning

Introduction to Artificial Neural Networks with Keras, Implementing MLPs
with Keras, Installing TensorFlow 2, Loading and Preprocessing Data with
TensorFlow.
Introduction to ANNs

• Artificial Neural Networks (ANNs) are algorithms inspired by how the brain
works; they are used to model complicated patterns and to make predictions.
The ANN is a deep learning method that arose from the concept of the human
brain's biological neural networks.

• ANNs are at the very core of Deep Learning, being used in


» Google Images
» Apple’s Siri
» YouTube
» DeepMind’s AlphaGo
From Biological to Artificial Neurons
• ANNs were first introduced back in 1943 by the
neurophysiologist Warren McCulloch and the mathematician
Walter Pitts.
• The early successes of ANNs led to the widespread belief that
we would soon be conversing with truly intelligent machines.
• Sadly, this promise went unfulfilled, triggering the first AI
winter in the 1970s.
• In the mid-1980s, new architectures such as Multilayer Perceptrons
(MLPs) and better training techniques such as the
backpropagation algorithm revived interest in
connectionism (i.e., the study of neural networks).
• However, by the 1990s, other powerful Machine Learning
techniques such as Support Vector Machines and Random
Forests overtook ANNs.
From Biological to Artificial Neurons (cont.)
• Since the early 2010s, with the success of deep learning in computer
vision, there has been a huge wave of interest in ANNs.
• Reasons for this AI spring:
» There is now a huge quantity of data available to train
neural networks
 ANNs frequently outperform other ML techniques on very
large and complex problems.
» The tremendous increase in computing power since the 1990s
now makes it possible to train large neural networks in a
reasonable amount of time.
 The availability of Powerful GPU cards and cloud
computing platforms.
» The training algorithms have been improved.
 We can train very large networks now.
A Biological Neuron
Multiple layers in a biological neural
network (human cortex)
Logical Computations with Neurons
• McCulloch and Pitts proposed a very simple model of the
biological neuron, which later became known as an artificial
neuron: it has one or more binary (on/off) inputs and one
binary output.
» Even with such a simplified model it is possible to build a network
of artificial neurons that computes any logical proposition you want.

• These networks can be combined to compute complex logical expressions.
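As a minimal illustration (not from the slides), here is a Python sketch of such threshold neurons with binary inputs; a neuron fires when enough of its inputs are on:

import numpy as np

def mp_neuron(inputs, threshold):
    # McCulloch-Pitts-style unit: binary inputs, binary output;
    # it fires when the number of active inputs reaches the threshold.
    return int(np.sum(inputs) >= threshold)

a, b = 1, 0
print(mp_neuron([a, b], threshold=2))   # A AND B -> 0
print(mp_neuron([a, b], threshold=1))   # A OR B  -> 1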
The Perceptron
• The Perceptron is one of the simplest ANN architectures,
invented in 1957 by Frank Rosenblatt.
» Based on a slightly different artificial neuron called a
threshold logic unit (TLU), or sometimes a linear
threshold unit (LTU).
 Computes a weighted sum of its inputs, then applies a step
function
• Architecture of a Perceptron with two input neurons, one bias
neuron, and three output neurons:
Single-layer perceptron
A single-layer perceptron is a simple neural network that contains
only one layer of weights.
It computes the sum of the input vector, with each input value multiplied by
its corresponding weight, and passes this weighted sum to an activation
function, whose result is the displayed output.

The perceptron consists of 4 parts.


1.Input values or One input layer
2.Weights and Bias
3.Net sum
4.Activation Function
A single-layer perceptron has just two layers: input and output. It has only
a single layer of weights, hence the name single-layer perceptron. Unlike the
multilayer perceptron, it contains no hidden layers.

Input nodes are fully connected to one or more nodes in the next
layer. A node in the next layer takes a weighted sum of all its inputs.
Multi-layer perceptron
A multilayer perceptron is a type of feed-forward artificial neural
network that generates a set of outputs from a set of inputs.
An MLP is a neural network connecting multiple layers in a directed
graph, which means that the signal path through the nodes only goes
one way.
The MLP network consists of input, output, and hidden layers.
Each hidden layer consists of numerous perceptrons, which are called
hidden units.
How is a Perceptron trained?

• The Perceptron training algorithm proposed by Rosenblatt was


largely inspired by Hebb’s rule.
» when a biological neuron triggers another neuron often, the connection
between these two neurons grows stronger.
» “Cells that fire together, wire together”
• Perceptron learning rule:
» For every output neuron that produced a wrong prediction, it reinforces
the connection weights from the inputs that would have contributed to
the correct prediction.

• Perceptron convergence theorem:


» if the training instances are linearly separable, Rosenblatt demonstrated
that this algorithm would converge to a solution.
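The standard form of this update rule (reconstructed here from the usual definition, not from the slide image) can be written as:

w_{i,j}^{(\text{next step})} = w_{i,j} + \eta \, (y_j - \hat{y}_j) \, x_i

where w_{i,j} is the weight between the i-th input and the j-th output neuron, x_i is the i-th input value, ŷ_j is the neuron's output, y_j is the target output, and η is the learning rate.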
Perceptron in Scikit-Learn
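The slide's code is not reproduced above; a minimal sketch using Scikit-Learn's Perceptron class (here on two iris features, detecting Iris setosa, which is an illustrative choice) looks like this:

from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X = iris.data[:, (2, 3)]              # petal length, petal width
y = (iris.target == 0).astype(int)    # is it an Iris setosa?

per_clf = Perceptron()
per_clf.fit(X, y)

y_pred = per_clf.predict([[2, 0.5]])  # predict for one new flower
print(y_pred)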
Limitations of Perceptrons

• In their 1969 monograph Perceptrons, Marvin Minsky and


Seymour Papert highlighted a number of serious weaknesses of
Perceptrons
» Exclusive OR (XOR) classification problem
• But some of the limitations can be eliminated by Multilayer
Perceptron (MLP)
Architecture of a Multilayer Perceptron
Backpropagation Algorithm
• The field of Deep Learning studies deep neural networks
(DNNs)---ANNs containing a deep stack of hidden layers---and more
generally models containing deep stacks of computations.
• For many years researchers struggled to find a way to train MLPs,
without success.
• In 1986, David Rumelhart, Geoffrey Hinton, and Ronald
Williams introduced the backpropagation training
algorithm.
» Two passes through the network (one forward,
one backward)
» Compute the gradient of the network’s error with regard
to every single model parameter.
» Adjust the parameters by using gradient descent.
• The algorithm handles one mini-batch at a time (e.g., 32
instances in the training set).
• It goes through the full training set multiple times. Each pass is
called an epoch.
• Forward pass: Each mini-batch is passed from the network’s
input layer to the output layer through the hidden layers. All
intermediate results are preserved.
• Measures the network’s output error by a loss function that
compares the desired output and the actual output of the
network.
• Computes how much each output connection contributed to the
error by the chain rule.
• Reverse pass: Measures how much of these error
contributions came from each connection in the layer below,
again using the chain rule, working backward until the
algorithm reaches the input layer.
• Performs a Gradient Descent step to tweak all the connection
weights in the network, using the error gradients it just
computed.
Activation Functions
• The backpropagation algorithm cannot be used with the step function,
which provides no gradient for Gradient Descent to work with.
• Logistic (sigmoid) function: σ(z) = 1 / (1 + exp(–z)).
• Hyperbolic tangent function: tanh(z) = 2σ(2z) – 1
• Rectified Linear Unit function: ReLU(z) = max(0, z)
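A small NumPy sketch (not from the slides) of the three activation functions listed above:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def tanh(z):
    return 2 * sigmoid(2 * z) - 1      # equivalent to np.tanh(z)

def relu(z):
    return np.maximum(0, z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(z), tanh(z), relu(z), sep="\n")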
Output Neurons for Regression MLPs
• For regression tasks, an ANN predicts a single value only.
Thus, one output neuron is sufficient.
• For multivariate regression, there is one output neuron per
output dimension.
• In general, output neurons use no activation function, so they are free to
output any range of values. If the output must be positive or bounded, use:
» ReLU function: ReLU(z) = max(0, z)
» Softplus function: softplus(z) = log(1 + exp(z))
» logistic function or hyperbolic tangent, with scaling factors, for a bounded range.
• Loss functions:
» Mean Squared Error
» Mean Absolute Error
» Huber Loss
Typical regression MLP architecture
Output Neurons for Classification MLPs
• For binary classification tasks, there is one output neuron with
the logistic activation function
» The output can be interpreted as the estimated probability of the
positive class.
• For multilabel binary classification tasks, you need multiple
output neurons.
• For multiclass classification tasks, there is one output neuron
per class, and the softmax activation function should be used
for the whole output layer.
Output Neurons for Classification MLPs (cont.)
• The predicted class is:

• Loss function: cross-entropy loss (a.k.a. the log loss):
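The slide's equations are not reproduced above; in standard form (a reconstruction from the usual definitions, written in LaTeX), the predicted class is the one with the highest softmax probability, and the cross-entropy loss over m instances and K classes is:

\hat{y} = \underset{k}{\operatorname{argmax}}\ \hat{p}_k
\qquad
J(\boldsymbol{\Theta}) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K} y_k^{(i)} \log\!\left(\hat{p}_k^{(i)}\right)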
A modern MLP for classification
Typical classification MLP architecture
Implementing MLPs with Keras
• Keras is a high-level Deep Learning API that
allows you to easily build, train, evaluate, and
execute all sorts of neural networks.
» https://keras.io
» Computation backend: TensorFlow, Microsoft
Cognitive Toolkit (CNTK), Theano, Apache
MXNet, Apple’s Core ML, JavaScript or
TypeScript, and PlaidML.
• tf.keras: Extended Keras implementation based
on TensorFlow with TensorFlow-specific
features.
• PyTorch is also quite popular.
Multibackend Keras vs. tf.keras
Installing TensorFlow 2
• If you are using Google Colab only, you can skip this step.
• If you plan to run your code on your own computer, please install Jupyter,
Scikit-Learn, etc.
• Activate the virtual environment and then use pip to install TensorFlow 2:

• Open a Python shell or a Jupyter notebook and print the versions of
TensorFlow and tf.keras:
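A sketch of these steps (the shell command is shown as a comment; exact version numbers will vary):

# In the activated virtual environment:
#   $ pip install -U tensorflow        # or tensorflow-gpu for GPU support
import tensorflow as tf
from tensorflow import keras

print(tf.__version__)       # e.g. "2.x.x"
print(keras.__version__)    # the tf.keras version string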
Fashion MNIST Dataset
• 70,000 grayscale images of 28 × 28 pixels each, with 10 classes

• Drop-in replacement of MNIST in Chapter 2.


» But the images represent fashion items rather than handwritten
digits.
» More challenging than MNIST: a simple linear model reaches about 92%
accuracy on MNIST, but only about 83% on Fashion MNIST.
Using Keras to Load the Dataset
• Keras provides some utility functions to fetch and load common datasets.

• Loading data from Keras is different from Scikit-Learn:


» Every image is represented as a 28 × 28 array rather than a 1D array of
size 784.

» The pixel intensities are represented as integers (from 0 to 255) rather than floats (from
0.0 to 255.0)
• Since we are going to train the neural network using Gradient Descent, we
must scale the input features down to the 0–1 range by dividing them by
255.0:
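A sketch of loading and scaling the data with tf.keras; holding out the last 5,000 training images as a validation set is an assumption that matches common practice:

from tensorflow import keras

fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()

# Scale pixel intensities to the 0-1 range and carve out a validation set.
X_valid, X_train = X_train_full[:5000] / 255.0, X_train_full[5000:] / 255.0
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]
X_test = X_test / 255.0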
Naming the Labels
• Unlike MNIST, Fashion MNIST needs a list of class names so that we
know what each label (and hence each image) represents:

• For example, the first image in the training set represents a coat:
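Continuing the loading sketch above, the class names can be stored in a plain list indexed by label:

class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

print(class_names[y_train[0]])   # e.g. "Coat" for the first training image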
Creating the model using the Sequential API

• The first method for building a neural network in tf.keras is the use of
Sequential API.
» Only for neural networks that are composed of a single stack of layers connected
sequentially.
• The tf.keras code for building a classification MLP with two
hidden layers:

» The Flatten layer converts each input image into a 1D array.


» Each Dense layer manages its own weight matrix, containing all the connection weights
between the neurons and their inputs, as well as the bias terms.
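A sketch of such a model; the hidden layer sizes (300 and 100 neurons) are illustrative values:

from tensorflow import keras

model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape=[28, 28]))    # 28 x 28 image -> 784-element 1D array
model.add(keras.layers.Dense(300, activation="relu"))
model.add(keras.layers.Dense(100, activation="relu"))
model.add(keras.layers.Dense(10, activation="softmax"))  # one output neuron per class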
Creating the model using the Sequential API
(cont.)
• Alternatively, you can add the layers when the Sequential model is created.

• The model’s summary() method displays the information of the model’s layers:
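The equivalent construction, passing the list of layers to the constructor, followed by summary():

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(300, activation="relu"),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])

model.summary()   # prints each layer's name, output shape, and parameter count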
Accessing the Information of a Model
• Directly get a model’s list of layers:

• All the parameters of a layer can be accessed using its get_weights() and set_weights() methods:
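A sketch of inspecting the layers and one layer's parameters for the model built above:

print(model.layers)                     # list of Layer objects
hidden1 = model.layers[1]               # the first Dense layer
weights, biases = hidden1.get_weights()
print(weights.shape, biases.shape)      # (784, 300) and (300,) for this model
hidden1.set_weights([weights, biases])  # parameters can be replaced the same way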
Compiling the Model
• Before training the model, you must compile the model:

• Use the "sparse_categorical_cross entropy" loss when we have


sparse labels (i.e., for each instance, there is just a tar- get class
index, from 0 to 9 in this case) and the classes are exclusive.
» Otherwise, the "categorical_crossentropy" loss if one-hot vectors is used
(i.e., [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.] represents class 3).
» Otherwise, use the "binary_crossentropy" loss if the "sigmoid" (i.e.,
logistic) activation function in the output layer is used for binary
classification tasks.
• Use “sgd” for Stochastic Gradient Descent (i.e., reverse-mode
autodiff plus Gradient Descent)
• Use “accuracy” because our model is a classifier.
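Putting these choices together, the compile step looks like this:

model.compile(loss="sparse_categorical_crossentropy",
              optimizer="sgd",
              metrics=["accuracy"])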
Training the Model
• After compiling the model, call fit() to train the model with the training and
validation datasets.

• You should check whether overfitting occurs (i.e., accuracy >> val_accuracy)
• Consider passing the class_weight argument if the training set is skewed.
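A sketch of the training call; 30 epochs is an illustrative value:

history = model.fit(X_train, y_train, epochs=30,
                    validation_data=(X_valid, y_valid))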
Drawing the Learning Curves
• fit() returns a History object, which contains:
» The training parameters (history.params)
» The list of epochs it went through (history.epoch)
» The loss and extra metrics at the end of each epoch on the training set
and on the validation set (history.history).
• You can draw the learning curves using matplotlib:
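A minimal sketch using pandas and matplotlib on the History object returned by fit():

import pandas as pd
import matplotlib.pyplot as plt

pd.DataFrame(history.history).plot(figsize=(8, 5))
plt.grid(True)
plt.gca().set_ylim(0, 1)   # losses and accuracies for this task fit in [0, 1]
plt.show()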
Drawing the Learning Curves (cont.)
• The learning curve shows the mean training loss and accuracy
measured over each epoch, and the mean validation loss and
accuracy measured at the end of each epoch:

• When reporting the learning curves, you should shift the training
curve in the above graph by half an epoch to the left.
Continue the Training
• If the model has not converged yet, call fit() again to continue the
training.
• If you are not satisfied with the performance of your model, you
should go back and tune the hyperparameters.
» Tune the learning rate
» Try another optimizer
» Adjust the number of layers, the number of neurons per layer, and the
types of activation functions to use for each hidden layer
» Change the batch size
• Finally, estimate the generalization error using the test set before
you deploy the model to production.

• Don’t tweak the hyperparameters to improve the accuracy on the test set.
Using the Model to Make Predictions
• After training the model, you can use the model’s predict() method to
make predictions on new instances:

• If you want to know the class with the highest estimated probability only,
use the predict_classes() method instead:

• They should be correct (otherwise, more training)
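Continuing the sketches above, here are both calls on the first three test images; note that predict_classes() has been removed in recent TensorFlow releases, so taking the argmax of predict() is shown as the portable equivalent:

import numpy as np

X_new = X_test[:3]
y_proba = model.predict(X_new)
print(y_proba.round(2))                 # one estimated probability per class

y_pred = np.argmax(y_proba, axis=1)     # class with the highest probability
print(np.array(class_names)[y_pred])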


California Housing with the Sequential API
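The slide's code is not reproduced above; a minimal regression-MLP sketch on the California Housing data (hyperparameters are illustrative) could look like this:

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow import keras

housing = fetch_california_housing()
X_train_full, X_test, y_train_full, y_test = train_test_split(housing.data, housing.target)
X_train, X_valid, y_train, y_valid = train_test_split(X_train_full, y_train_full)

# Scale the features before feeding them to Gradient Descent.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid)
X_test = scaler.transform(X_test)

model = keras.models.Sequential([
    keras.layers.Dense(30, activation="relu", input_shape=X_train.shape[1:]),
    keras.layers.Dense(1)                 # single output neuron, no activation
])
model.compile(loss="mean_squared_error", optimizer="sgd")
history = model.fit(X_train, y_train, epochs=20,
                    validation_data=(X_valid, y_valid))
mse_test = model.evaluate(X_test, y_test)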
Building Complex Models Using the
Functional API
• You cannot use the Sequential API to build nonsequential
neural networks.
• For example, consider the Wide & Deep neural network:
» can learn both deep patterns (using the deep path) and simple rules
(through the short path)
Using the Functional API
• How about sending a subset of the features through the wide
path and a different subset (possibly overlapping) through the
deep path:
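A sketch of such a two-input Wide & Deep model built with the Functional API; the input sizes (5 and 6 features) assume the California Housing feature split used in the next sketch:

from tensorflow import keras

input_A = keras.layers.Input(shape=[5], name="wide_input")
input_B = keras.layers.Input(shape=[6], name="deep_input")
hidden1 = keras.layers.Dense(30, activation="relu")(input_B)
hidden2 = keras.layers.Dense(30, activation="relu")(hidden1)
concat = keras.layers.concatenate([input_A, hidden2])   # wide path joins the deep path here
output = keras.layers.Dense(1, name="output")(concat)
model = keras.models.Model(inputs=[input_A, input_B], outputs=[output])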
Using the Functional API (cont.)
• Compile, train, and evaluate the model, and then make
predictions:
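Continuing the sketch above, each input gets its own slice of the features (the 0-4 / 2-7 split is illustrative):

model.compile(loss="mse", optimizer=keras.optimizers.SGD(learning_rate=1e-3))

X_train_A, X_train_B = X_train[:, :5], X_train[:, 2:]
X_valid_A, X_valid_B = X_valid[:, :5], X_valid[:, 2:]
X_test_A, X_test_B = X_test[:, :5], X_test[:, 2:]

history = model.fit([X_train_A, X_train_B], y_train, epochs=20,
                    validation_data=([X_valid_A, X_valid_B], y_valid))
mse_test = model.evaluate([X_test_A, X_test_B], y_test)
y_pred = model.predict([X_test_A[:3], X_test_B[:3]])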
Models with Multiple Outputs
• Reasons for having multiple outputs:
» The task may demand it.
» You have multiple independent tasks
based on the same data---multitask
classification.
» Add some auxiliary outputs for
regularization.
Models with Multiple Outputs (cont.)
• Each output will need its own loss function:

• Train the models with two datasets:

• Evaluate the outputs separately:

• Likewise, make predictions separately:
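A sketch extending the Wide & Deep model above with an auxiliary output used for regularization; the 0.9/0.1 loss weights are illustrative:

aux_output = keras.layers.Dense(1, name="aux_output")(hidden2)
model = keras.models.Model(inputs=[input_A, input_B],
                           outputs=[output, aux_output])

# One loss per output, with a weight on each loss.
model.compile(loss=["mse", "mse"], loss_weights=[0.9, 0.1],
              optimizer=keras.optimizers.SGD(learning_rate=1e-3))

# Each output needs its own labels (here both outputs predict the same target).
history = model.fit([X_train_A, X_train_B], [y_train, y_train], epochs=20,
                    validation_data=([X_valid_A, X_valid_B], [y_valid, y_valid]))

total_loss, main_loss, aux_loss = model.evaluate([X_test_A, X_test_B],
                                                 [y_test, y_test])
y_pred_main, y_pred_aux = model.predict([X_test_A[:3], X_test_B[:3]])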


Using the Subclassing API to Build
Dynamic Models
• Both the Sequential API and the Functional API
are declarative
» Advantages:
 The model can easily be saved, cloned, and shared
 its structure can be displayed and analyzed
 the framework can infer shapes and check types, so errors
can be caught early
 It’s also fairly easy to debug, since the whole model is a
static graph of layers.
» Disadvantage:
 The models are static---cannot build models that involve
loops, varying shapes, conditional branching, and other
dynamic behaviors.
Using the Subclassing API to Build
Dynamic Models (cont.)
• The Subclassing API: subclass the Model class, create the layers you need in
the constructor, and use them to perform the computations you want in the
call() method.
» Advantage: Imperative programming style---you can use for loops, if statements,
and low-level TensorFlow operations in call()
» Disadvantage: Keras cannot inspect the model’s architecture and it is hard to
debug
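A sketch of the same Wide & Deep idea written with the Subclassing API:

from tensorflow import keras

class WideAndDeepModel(keras.models.Model):
    def __init__(self, units=30, activation="relu", **kwargs):
        super().__init__(**kwargs)
        self.hidden1 = keras.layers.Dense(units, activation=activation)
        self.hidden2 = keras.layers.Dense(units, activation=activation)
        self.main_output = keras.layers.Dense(1)
        self.aux_output = keras.layers.Dense(1)

    def call(self, inputs):
        # Imperative style: loops, conditionals, and low-level TF ops could go here.
        input_A, input_B = inputs
        hidden1 = self.hidden1(input_B)
        hidden2 = self.hidden2(hidden1)
        concat = keras.layers.concatenate([input_A, hidden2])
        return self.main_output(concat), self.aux_output(hidden2)

model = WideAndDeepModel()   # compile and fit as before, passing [X_A, X_B] inputs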
Saving and Restoring a Model
• When using the Sequential API or the Functional API, you can save a
trained Keras model:

• Keras will use the HDF5 format to save


» The model’s architecture (including every layer’s hyperparameters)
» The values of all the model parameters for every layer (e.g., connection
weights and biases)
» The optimizer (including its hyperparameters and any state it may
have)
» etc.
• To load the model:
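A minimal sketch (the filename is illustrative):

model.save("my_keras_model.h5")                       # HDF5 file: architecture, weights, optimizer state
model = keras.models.load_model("my_keras_model.h5")  # restore it later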
Using Callbacks to Save Intermediate
Models during Training
• Remember to save models at regular intervals during a long training
session to avoid losing everything if your computer crashes.
• The fit() method accepts a callbacks argument that lets you specify a list of
objects that Keras will call at the start and end of training, at the start and
end of each epoch, and even before and after processing each batch.

• If you use a validation set during training, you can set
save_best_only=True when creating the ModelCheckpoint to implement
early stopping:
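A sketch of the ModelCheckpoint callback; with save_best_only=True, reloading the saved file afterwards rolls you back to the best model seen on the validation set:

checkpoint_cb = keras.callbacks.ModelCheckpoint("my_keras_model.h5",
                                                save_best_only=True)
history = model.fit(X_train, y_train, epochs=10,
                    validation_data=(X_valid, y_valid),
                    callbacks=[checkpoint_cb])
model = keras.models.load_model("my_keras_model.h5")   # best model on the validation set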
Using Callbacks to Implement Early
Stopping and custom callbacks
• Another way to implement early stopping is to simply use the
EarlyStopping callback.

• If you need extra control, you can easily write your own
custom callbacks. For example,
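A sketch of the EarlyStopping callback plus a tiny custom callback that prints the validation/training loss ratio after each epoch (the custom class is an illustrative example):

early_stopping_cb = keras.callbacks.EarlyStopping(patience=10,
                                                  restore_best_weights=True)

class PrintValTrainRatioCallback(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        print("\nval/train loss ratio: {:.2f}".format(logs["val_loss"] / logs["loss"]))

history = model.fit(X_train, y_train, epochs=100,
                    validation_data=(X_valid, y_valid),
                    callbacks=[checkpoint_cb, early_stopping_cb,
                               PrintValTrainRatioCallback()])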
Using TensorBoard for Visualization
• TensorBoard is a great interactive visualization
tool that you can use to
» view the learning curves during training
» compare learning curves between multiple runs
» visualize the computation graph
» analyze training statistics
» view images generated by your model
» visualize complex multidimensional data projected
down to 3D and automatically clustered for you
» etc.
Visualizing Learning Curves with TensorBoard
Using TensorBoard
• To use TensorBoard, you must modify your program so that it outputs the
data you want to visualize to special binary log files called event files.
• Each binary data record is called a summary.
• The TensorBoard server will monitor the log directory, and it will
automatically pick up the changes and update the visualizations.
• In general, you want to point the TensorBoard server to a root log
directory and configure your program so that it writes to a different
subdirectory every time it runs.
• Define the root log directory for TensorBoard logs
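A sketch of such a setup; the directory names and the timestamped run-id format are assumptions:

import os
import time

root_logdir = os.path.join(os.curdir, "my_logs")

def get_run_logdir():
    # One subdirectory per run, named after the current date and time.
    run_id = time.strftime("run_%Y_%m_%d-%H_%M_%S")
    return os.path.join(root_logdir, run_id)

run_logdir = get_run_logdir()   # e.g. ./my_logs/run_2024_01_01-12_00_00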
Using TensorBoard (cont.)
• Keras provides the TensorBoard() callback:

• The callback automatically creates the log directory, generates
event files, and writes summaries to them during training.
• The directory structure:
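A sketch of training with the TensorBoard callback pointed at the run directory defined above; the callback then writes train/ and validation/ event files under that directory:

tensorboard_cb = keras.callbacks.TensorBoard(run_logdir)
history = model.fit(X_train, y_train, epochs=30,
                    validation_data=(X_valid, y_valid),
                    callbacks=[tensorboard_cb])
# my_logs/
#   run_.../train/events.out.tfevents...        (layout is indicative, not exact)
#   run_.../validation/events.out.tfevents...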
Using TensorBoard (cont.)
• Start the TensorBoard server by running a command in a
terminal:

• Once the server is up, you can open a web browser and go to
http://localhost:6006
• To use TensorBoard directly within Jupyter:
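The corresponding terminal command and Jupyter magics, shown here as comments since they are not Python statements:

#   $ tensorboard --logdir=./my_logs --port=6006    # then browse to http://localhost:6006

#   %load_ext tensorboard                           # inside a Jupyter/Colab cell
#   %tensorboard --logdir=./my_logs --port=6006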
Using TensorBoard (cont.)
• TensorFlow offers a lower-level API in the tf.summary package.
» E.g., you can create a SummaryWriter using the create_file_writer() function
and use this writer as a context to log scalars, histograms, images, audio,
and text, all of which can then be visualized using TensorBoard.
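A sketch of this lower-level API, logging a scalar and a histogram (the logged values are arbitrary illustrations):

import numpy as np
import tensorflow as tf

test_logdir = get_run_logdir()
writer = tf.summary.create_file_writer(test_logdir)
with writer.as_default():
    for step in range(1, 1001):
        tf.summary.scalar("my_scalar", np.sin(step / 10), step=step)
        data = (np.random.randn(100) + 2) * step / 100     # gradually drifting data
        tf.summary.histogram("my_hist", data, buckets=50, step=step)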
Number of Hidden Layers
• Theoretically, you can use a shallow neural network to model even the
most complex functions, provided it has enough neurons.
• But deep networks have a much higher parameter efficiency than shallow
ones for complex problems.
» Real-world data is often structured in such a hierarchical way, and deep neural networks
automatically take advantage of this fact.
• Not only does this hierarchical architecture help DNNs converge faster to a
good solution, but it also improves their ability to generalize to new
datasets (i.e., transfer learning)
• Very complex tasks, such as large image classification or speech
recognition, typically require networks with hundreds of layers and they
need a huge amount of training data.
» It is more common to reuse parts of a pretrained state-of-the-art network that performs
these tasks.
Number of Neurons per Hidden Layer
• The number of neurons in the input and output layers is
determined by the type of input and output your task requires.
» e.g., the MNIST task requires 28 × 28 = 784 input neurons and 10 output
neurons.
• As for the hidden layers, it used to be common to size them to form
a pyramid, with fewer and fewer neurons at each layer.
• You can try increasing the number of neurons gradually until the
network starts overfitting.
• The “stretch pants” approach: pick a model with more layers and
neurons than you actually need, then use early stopping and other
regularization techniques to prevent it from overfitting.
» Avoid bottleneck layers that could ruin your model.
• In general you will get more bang for your buck by increasing the
number of layers instead of the number of neurons per layer.
Tuning the Learning Rate
• Learning rate is arguably the most important hyperparameter.
• One way to find a good learning rate is to train the model for a few
hundred iterations, starting with a very low learning rate (e.g., 10⁻⁵)
and gradually increasing it up to a very large value (e.g., 10).
» This is done by multiplying the learning rate by a constant factor at each
iteration (e.g., by exp(log(10⁶)/500) to go from 10⁻⁵ to 10 in 500 iterations);
see the sketch at the end of this slide.
• If you plot the loss as a function of the learning rate (using a log
scale for the learning rate), you should see it dropping at first.
» But after a while, the learning rate will be too large, so the loss will shoot
back up
• The optimal learning rate will be a bit lower than the point at which
the loss starts to climb (typically about 10 times lower than the
turning point).
• You can then reinitialize your model and train it normally using this
good learning rate.
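A sketch (under the assumption of tf.keras 2.x) of a custom callback that grows the learning rate exponentially during a short run, as described earlier on this slide:

import numpy as np
import tensorflow.keras.backend as K
from tensorflow import keras

class ExponentialLearningRate(keras.callbacks.Callback):
    def __init__(self, factor):
        super().__init__()
        self.factor = factor
        self.rates, self.losses = [], []

    def on_batch_end(self, batch, logs=None):
        lr = K.get_value(self.model.optimizer.learning_rate)
        self.rates.append(lr)
        self.losses.append(logs["loss"])
        # Multiply the learning rate by a constant factor after every batch.
        K.set_value(self.model.optimizer.learning_rate, lr * self.factor)

# Factor chosen so the rate goes from 1e-5 to 10 in 500 iterations.
expon_lr = ExponentialLearningRate(factor=np.exp(np.log(1e6) / 500))
# Attach it to a short fit() run, then plot expon_lr.losses against expon_lr.rates.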
Tuning Optimizer, Batch Size, Activation
Functions, and Number of Iterations
• Choosing a better optimizer than plain old Mini-batch
Gradient Descent is quite important.
• The main benefit of using large batch sizes is that hardware
accelerators like GPUs can process them efficiently, so the
training algorithm will see more instances per second.
» But some researchers reported that large batch sizes often lead
to training instabilities, so the resulting model may not generalize
as well as a model trained with a small batch size.
• There are activation functions better than ReLU
• In most cases, the number of training iterations does not
actually need to be tweaked: just use early stopping
instead.
