

of the patient (age, sex, weight, and so on). On the basis of the results of these measurements, a model
can be created that analyses the test results of a new patient and predicts the chance and severity of
the disease. Based on the results of the model, further tests can be suggested. Proper preventive and
prognostic measures can then be advised by doctors.

1.2 Where is Machine Learning Used?


Are you fascinated by how Netflix recommends movies you might like? Have you wondered how Google
shows you such accurate search results? Machine learning is behind these technological advances. It represents
a key evolution in the fields of computer science, data analysis, software engineering and artificial intelligence.
Figure 1.4 shows how machine learning is used in Google.

Figure 1.4 Google using machine learning.

The last decade has seen immense growth in the fields of artificial intelligence and data science. “Indeed” is
an American worldwide employment-related search engine for job listings. It was launched in November
2004. As a single-topic search engine, it is also an example of vertical search. Indeed is currently available
in over 60 countries and 28 languages. In October 2010, Indeed.com passed Monster.com to become
the highest-traffic job website in the United States. According to their statistics, there has been a 200%
increase in “data scientist” job searches and a 50% increase in job listings. The statistical plot is given in
Fig. 1.5.


Figure 1.5 Job trends from Indeed.com: percentage of job postings matching “Data Scientist”, January 2006 to January 2015.

Many technologies today benefit from machine learning. Facial recognition technology allows social media
platforms to help users tag and share photos of friends. Recommendation engines, powered by machine
learning, suggest what products you should buy based on user preferences. Self-driving cars using machine
learning to navigate may soon be available to consumers.

1.3 Applications of Machine Learning

1.3.1 Marketing and Sales


Purchase patterns are complicated because customers engage with businesses through a variety of channels.
Digital marketing creates a plethora of customer data that needs to be analyzed and acted upon to improve
sales. Insights must be drawn from these large sales datasets to develop new strategies that improve
sales. This is where machine learning comes into the picture. It uses algorithms to quickly interpret
diverse datasets and find correlations. As a result, marketing and sales departments are able to analyze the
path to purchase and understand how they can optimize the buyer’s experience.

1.3.2 Search Engines


Search engines use machine learning algorithms to find the best personalized match for your query. Google, the
world’s biggest search engine, now offers recommendations and suggestions based on previous user searches
using machine learning. In 2015, Google introduced RankBrain, a machine learning algorithm used to
decipher the semantic content of a search query. Through the use of a neural network, RankBrain
identifies the intent behind a user’s search and offers tailored information on that particular topic.

1.3.3 Transportation
Based on travel history and patterns of travel across various routes, machine learning can help transportation
companies predict potential problems that could arise on certain routes, and accordingly advise their
customers to opt for a different route. Transportation firms like Uber and delivery organizations like Swiggy



6 Artificial Neural Networks

LEARNING OBJECTIVES
• To introduce the concept of artificial neural networks.
• To compare the workings of biological neurons versus artificial neurons.
• To introduce different types of learning.
• To introduce the different architectures of artificial neural networks.
• To use the McCulloch–Pitts model for solving basic classification problems.

LEARNING OUTCOMES
• Students will be able to compare biological neurons with artificial neurons.
• Students will be able to compare the use of activation functions in biological neurons and artificial neurons.
• Students will be able to discriminate between supervised and unsupervised models.
• Students will be able to use the McCulloch–Pitts model for solving basic logical operations like AND, OR, and NOT, and to solve nonlinear problems like the XOR problem using this model.

6.1 Introduction
Modern digital computers are truly astounding in terms of power and speed. Humans cannot compute millions
of mathematical operations in a second, and they cannot search for a particular document from among the
millions stored on a computer, whereas a computer can do it in milliseconds. However, there are some tasks
where even the most powerful computers cannot compete with the human brain. Human beings are good at
narration, while computers are good at logic and mathematics. The art of storytelling that humans have is
not possessed by a computer. Imagine the power of a machine that could combine the capabilities of both
computers and humans: it would be the most remarkable invention of all time. This, in general, is the aim
of artificial intelligence.

6.1.1 Introduction to Artificial Neural Networks


A neural network’s ability to perform computations is based on the hope that we can reproduce some of
the flexibility and power of the human brain by artificial means. It tries to mimic the structure and func-
tion of our nervous system. An artificial neural network (ANN) is a methodology for information processing
that draws its inspiration from biological nervous systems. These systems consist of


innumerable highly interconnected neurons working together to solve different kinds of problems. Learning
in biological systems takes place by adjusting the synaptic connections that exist between the neurons. ANN
learns mostly by example and thus tries to emulate this structure.

6.1.2 Use of Artificial Neural Networks


Neural networks are highly capable of deriving information from complicated or imprecise data. They can
extract patterns and detect trends that are too complex for humans or conventional computer programs to
notice. A trained network can act as an expert that analyzes the given data to extract information. Information
can be extracted from newly obtained data by comparing its characteristics with those of the training data.
Not only can an ANN be thought of as an expert, it has other advantages as well.
1. Adaptive Learning: Neural networks have the ability to do tasks after learning from experience gained
from previous data.
2. Self-Organization: ANN is capable of organizing and representing the information it receives from
training data.
3. Real-Time Operation: ANN computations can be carried out in parallel. Special hardware devices
can be designed and manufactured so that we can take advantage of this capability.
4. Fault Tolerance: If a neuron fails, the network does not stop working, but its results become less accurate.

6.2 Evolution of Neural Networks


The evolution of neural networks dates back to the 1870s. The theories proposed from 1870 onwards are
shown in Table 6.1. Starting from the study of the single neuron, the field has evolved over almost a century
and a half; today we are in the era of deep learning.
Table 6.1 Evolution of neural networks

1871–73: Reticular theory (Joseph von Gerlach). The nervous system is a single continuous network.
1888–91: Neuron doctrine (Santiago Ramon y Cajal). Used Golgi’s technique to study the nervous system; proposed that it is actually made up of discrete individual cells forming a network.
1891: The term “neuron” coined (Heinrich Wilhelm Gottfried von Waldeyer-Hartz). Consolidation of the neuron doctrine.
1943: McCulloch–Pitts neuron (McCulloch and Pitts). Simplified model of a neuron.
1950: Neuron doctrine accepted (visualized using the electron microscope). Nerve cells were shown to be individual cells interconnected through synapses.
1957–58: Perceptron (Frank Rosenblatt). A neuron model that may be able to learn, make decisions, and translate languages.
1965–68: Multilayer perceptron (Ivakhnenko et al.). Though the perceptron was more advanced than the McCulloch–Pitts model, it had its own limitations.
1960–70: Back propagation (popularized by Rumelhart et al. in 1986). A multilayered network of neurons with hidden layer(s) can be used to approximate any continuous function to any desired precision.
2006: Unsupervised pre-training (Hinton and Salakhutdinov). Unsupervised pre-training used in training very deep learners.

6.3 Biological Neuron


Dendrites are the structures that collect inputs for a neuron. The neuron sums these inputs and, if the
result is greater than its firing threshold, the neuron fires; otherwise it stays inhibited. Figure 6.1 shows the
structure of a biological neuron. When a neuron fires, it sends an electrical impulse from its nucleus along
the axon to its boutons. The boutons then connect to other neurons via junctions called synapses. Learning
takes place by changing the effectiveness of the synapses so that the influence of one neuron on another
changes. The human brain consists of about one hundred billion (100,000,000,000) neurons, each with
about 1000 synaptic connections. Our intelligence depends on the effectiveness of these synaptic connections.

Figure 6.1 Structure of a biological neuron (dendrites, nucleus, axon, and boutons).

6.3.1 From Human Neurons to Artificial Neurons


First we try to deduce the essential features of neurons and their interconnections. Figure 6.2 shows a direct
mapping between a biological neuron and an artificial neuron. We then typically program a computer to sim-
ulate these features. However, because our knowledge of neurons is incomplete and our computing power is
limited, our models are necessarily gross idealizations of real networks of neurons.
Network computation is performed by a dense mesh of computing nodes and connections. They oper-
ate collectively and simultaneously on most or all data and inputs. The basic processing elements of neural
networks are called artificial neurons, or simply neurons. Often we simply call them nodes. Neurons per-
form as summing and nonlinear mapping junctions. In some cases, they can be considered as threshold
units that fire when their total input exceeds certain levels. Neurons usually operate in parallel and are con-
figured in regular architectures. They are often organized in layers, and feedback connections both within


Figure 6.2 An artificial neuron mimicking a biological neuron: the dendrites correspond to the inputs, the cell body to the summation, the threshold to the activation, and the axon to the output.

the layer and toward adjacent layers are allowed. Each connection strength is expressed by a numerical value
called weight, which can be modified.
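As a rough Python sketch of this summing and thresholding behavior, a single artificial neuron can be written as below; the weights, inputs, and threshold are arbitrary values chosen only for illustration.

import numpy as np

def neuron(x, w, threshold=0.0):
    """Weighted sum of the inputs followed by a hard threshold."""
    net = np.dot(w, x)                      # summing junction: net = sum_i w_i * x_i
    return 1 if net >= threshold else 0     # nonlinear (threshold) mapping

# Example with three inputs and arbitrary weights
x = np.array([1.0, 0.0, 1.0])
w = np.array([0.5, -0.3, 0.8])
print(neuron(x, w, threshold=1.0))          # fires, since net = 1.3 >= 1.0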
Artificial neural systems function as parallel distributed computing networks. Their most basic charac-
teristic is their architecture. Only some of the networks provide instantaneous responses. Others need time
to respond and are characterized by their time-domain behavior, which we often refer to as dynamics.
Neural networks also differ from each other in their learning modes. There are a variety of learning rules
that establish when and how the connecting weights change. Finally, networks exhibit different speeds and
efficiency of learning. As a result, they also differ in their ability to accurately respond to the cues presented
at the input.
Vast discrepancies exist between the architectures and capabilities of artificial and natural neural
networks. Knowledge about actual brain functions is so limited, however, that there is little to guide those
who would try to emulate them. No model has succeeded in duplicating the performance of the
human brain. Therefore, the brain has been, and still is, only a metaphor for the wide variety of neural network
configurations that have been developed.

6.4 Basics of Artificial Neural Networks


The models of ANN are specified by the three basic entities:
1. Model’s synaptic connections.
2. Training/learning rules adopted for adjusting weights.
3. Activation functions.

6.4.1 Network Architecture


The arrangement of neurons into layers, and the connection pattern formed within and between layers, is
called the network architecture. Neural networks can be classified as single-layer or multilayer neural networks.
Each layer is formed by a set of processing elements, and the layers link the input to the output;
how this linking takes place leads to the different types of network architecture.

6.4.1.1 Single Layer Feed-forward Network


In one type of network the input layer is directly connected to the output layer without any
intermediate layers. Such a network is called a single-layer feed-forward network (Fig. 6.3).

6.4.1.2 Multilayer Feed-forward Network


A multilayer feed-forward network is formed by the interconnection of one or more intermediate layers. The
input layer receives the input of the neural network and buffers the input signal. The output layer generates


Figure 6.3 Single-layer feed-forward network: input neurons X1, ..., Xn are connected directly to output neurons Y1, ..., Ym through weights wij.

the output of the network. A layer formed between the input and output layers is called a hidden layer.
The hidden layer does not interact with the external environment directly. There can be zero to several hidden
layers; the more hidden layers there are, the more complex the network. This may improve the efficiency
of the network but requires more time to train it. In a fully connected network, every output from one
layer is connected to every node in the next layer, as illustrated in Fig. 6.4. In a feed-forward network, no
neuron in the output layer is connected as an input to a node in the same layer or any of the preceding layers.

Figure 6.4 Multilayer feed-forward network: input neurons X1, ..., Xn feed hidden-layer neurons R1, ..., Rq and Z1, ..., Zk, which feed output neurons Y1, ..., Ym.
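A minimal sketch of the forward pass through such a fully connected multilayer network is given below; the layer sizes, the random weights, and the choice of a sigmoid activation are assumptions made only for the example.

import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def forward(x, weights):
    """Propagate an input vector through successive fully connected layers."""
    a = x
    for W in weights:            # one weight matrix per layer
        a = sigmoid(W @ a)       # every output of one layer feeds every node of the next
    return a

rng = np.random.default_rng(0)
# Example: n = 4 inputs, one hidden layer of 3 nodes, m = 2 outputs
weights = [rng.normal(size=(3, 4)), rng.normal(size=(2, 3))]
print(forward(np.array([1.0, 0.5, -0.2, 0.8]), weights))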

6.4.1.3 Feedback Network


When outputs are directed back as inputs to nodes in the same or a preceding layer, the result is a feedback
network. When the output of a layer is fed back to the same layer, this is called lateral feedback.


6.4.1.4 Recurrent Network


A recurrent network is a feedback network with a closed loop. This type of network can be a single-layer
or a multilayer network. In a single-layer network with a feedback connection, a processing element’s output
can be directed back to itself, to another processing element, or to both, as shown in Fig. 6.5.

Figure 6.5 Recurrent network: outputs Y1, ..., Ym are fed back through weighted connections to the processing elements.

When the feedback is directed back to the hidden layers it forms a multilayer recurrent network. In addition,
a processing element output can be directed back to itself and to other processing elements in the same layer.
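One step of such a single-layer recurrent network can be sketched as follows, where W holds the input weights and R the feedback weights from the previous output back to the processing elements; all numerical values are illustrative assumptions.

import numpy as np

def recurrent_step(x_t, y_prev, W, R):
    """One time step: current input plus feedback of the previous output."""
    net = W @ x_t + R @ y_prev
    return np.where(net > 0, 1.0, 0.0)      # simple threshold activation

W = np.array([[1.0, -0.5], [0.3, 0.8]])     # input connections
R = np.array([[0.2, 0.0], [0.4, -0.1]])     # feedback connections back into the layer
y = np.zeros(2)
for x in [np.array([1.0, 0.0]), np.array([0.0, 1.0])]:
    y = recurrent_step(x, y, W, R)
print(y)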

6.4.2 Learning
Learning (or training) is the process by which a neural network adjusts its network parameters so that it
produces the desired response for each stimulus. Learning can be classified into three categories: supervised
learning, unsupervised learning, and reinforcement learning.

6.4.2.1 Supervised Learning


In supervised learning, the learning is done with the help of a teacher. Consider the example of a child
learning to sing. At first, he or she does not know how to sing and tries to reproduce a song the same way as
the singer. The training involves listening to the song again and again until he or she can reproduce it in the
same tone and manner.
In supervised learning, each input vector is associated with a desired output. The input vector and the
corresponding output vector form a training pair, so the network knows what the output should be.
During training, the input vector is given to the network, which produces an actual output. This actual
output is then compared with the desired output. The block diagram of the supervised learning algorithm is
shown in Fig. 6.6. The difference between the desired and actual output is the error signal generated by the
network. This error signal is used to adjust the weights of the network layers so that, for every training pair,
the actual output approaches the desired output.


Figure 6.6 Block diagram of the supervised learning algorithm: the neural network produces the actual output Y from input X; an error-signal generator compares Y with the desired output D and produces the error signal (D − Y).
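A minimal sketch of this error-driven weight adjustment is given below. The linear output unit, the learning rate, and the training pairs are assumptions made for illustration and are not part of the text.

import numpy as np

def train_supervised(pairs, lr=0.1, epochs=50):
    """Adjust the weights so that the actual output Y approaches the desired output D."""
    w = np.zeros(len(pairs[0][0]))
    for _ in range(epochs):
        for x, d in pairs:
            y = np.dot(w, x)        # actual output Y
            error = d - y           # error signal (D - Y)
            w += lr * error * x     # weight adjustment driven by the error signal
    return w

# Training pairs (input vector, desired output); here the target relation is y = x1 + 2*x2
pairs = [(np.array([1.0, 0.0]), 1.0),
         (np.array([0.0, 1.0]), 2.0),
         (np.array([1.0, 1.0]), 3.0)]
print(train_supervised(pairs))      # the learned weights approach [1, 2]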

6.4.2.2 Unsupervised Learning


Just as the name suggests, unsupervised learning is done without the help of a teacher. Consider how a fish
learns to swim: it is not taught how to do so but develops the skill on its own. This type of learning is
therefore independent of a teacher.
In unsupervised learning, inputs of a similar kind are grouped together without the help of any desired-output
information. During training, the network groups similar input patterns together to form clusters.
When a new input is applied, the network gives an output response indicating the class to which it belongs.
If an input does not belong to any existing cluster, a new cluster is formed. This is shown in Fig. 6.7.

Figure 6.7 Block diagram of the unsupervised learning algorithm: the network produces output Y from input X with no desired-output or error signal.

Here, there is no feedback from the environment to indicate whether the output is correct. The network
discovers its own patterns by changing its parameters; this is termed self-organization.
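A crude sketch of this clustering behavior is shown below: each input is assigned to the nearest existing cluster, and a new cluster is formed when no existing cluster is close enough. The distance threshold and the sample data are illustrative assumptions.

import numpy as np

def cluster_inputs(inputs, radius=1.0):
    """Group similar inputs; start a new cluster when none is close enough."""
    centers, labels = [], []
    for x in inputs:
        if centers:
            d = [np.linalg.norm(x - c) for c in centers]
            k = int(np.argmin(d))
            if d[k] <= radius:
                labels.append(k)
                centers[k] = (centers[k] + x) / 2.0    # move the center towards the input
                continue
        centers.append(np.array(x, dtype=float))       # form a new cluster
        labels.append(len(centers) - 1)
    return labels, centers

data = [np.array(p) for p in [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]]
print(cluster_inputs(data))    # two clusters: the first two points and the last two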

6.4.2.3 Reinforcement Learning


Reinforcement learning is similar to supervised learning in that some feedback information is available.
However, in reinforcement learning only evaluative (critic) information is available rather than the exact
desired output, and the required corrections have to be worked out from this evaluative feedback. This
process of learning from a reinforcement signal is termed reinforcement learning (Fig. 6.8).

Figure 6.8 Reinforcement learning model: the network produces output Y from input X, and the error-signal generator is driven by a reinforcement signal rather than the exact desired output.


6.5 Activation Functions


The neuron, as a processing node, performs a summation of its weighted inputs (a scalar product
computation) to obtain the net input, net. Subsequently, it performs the nonlinear operation f(net) through its
activation function. Typical activation functions are bipolar activation functions and unipolar activation
functions.

6.5.1 Bipolar Activation Functions


There are two types of bipolar activation functions: bipolar binary and bipolar continuous.
The bipolar binary function is defined as

f(net) = sgn(net) = +1 if net > 0, and −1 if net < 0        (6.2)

The bipolar continuous function is defined as

f(net) = 2 / (1 + exp(−λ·net)) − 1        (6.3)

where λ > 0 is proportional to the neuron gain and determines the steepness of the continuous function f(net)
near net = 0.
The continuous activation function for various values of λ is shown in Fig. 6.9.

Figure 6.9 Bipolar continuous (sigmoid) activation function for different values of λ (λ = 1, 2, 4, 8, and 16).


Notice that as λ → ∞, the continuous function becomes the sgn(net) function. The word “bipolar” is used
to point out that both positive and negative responses of neurons are produced for this definition of the
activation function.
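Equations (6.2) and (6.3) translate directly into Python; the short sketch below also shows numerically how the continuous function approaches sgn(net) as λ grows.

import numpy as np

def bipolar_binary(net):
    return np.where(net > 0, 1.0, -1.0)             # sgn(net), Eq. (6.2)

def bipolar_continuous(net, lam=1.0):
    return 2.0 / (1.0 + np.exp(-lam * net)) - 1.0   # Eq. (6.3)

net = np.array([-1.0, -0.1, 0.1, 1.0])
for lam in (1, 4, 16, 64):
    print(lam, np.round(bipolar_continuous(net, lam), 3))
print("sgn:", bipolar_binary(net))                  # the limit as lambda -> infinity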

6.5.2 Unipolar Activation Functions


By shifting and scaling the bipolar activation functions, unipolar continuous and unipolar binary activation
functions can be obtained.
The unipolar continuous activation function is defined as

f(net) = 1 / (1 + exp(−λ·net))        (6.4)

The unipolar binary activation function is defined as

f(net) = 1 if net > 0, and 0 if net < 0        (6.5)

Again, the unipolar binary function is the limit of f(net) as λ → ∞.

6.5.3 Identity Function


The identity function is a linear function and can be defined as

f  (x) = x for all x

Figure 6.10 shows the identity function.

Figure 6.10 Identity function.

The output here remains the same as input. The input layer uses the identity activation function.

6.5.4 Ramp Function


The ramp function is defined as
f(x) = 1 if x > 1, x if 0 ≤ x ≤ 1, and 0 if x < 0        (6.6)

Figure 6.11 shows the ramp function. It is clear from the figure that f(x) takes the value 0 for values of
x < 0, x for values of 0 ≤ x ≤ 1, and 1 for values of x > 1.


Figure 6.11 Ramp function: the output rises linearly from 0 at x = 0 to +1 at x = +1.
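The remaining activation functions, Eqs. (6.4) to (6.6) and the identity function, can be sketched in the same way.

import numpy as np

def unipolar_continuous(net, lam=1.0):
    return 1.0 / (1.0 + np.exp(-lam * net))   # Eq. (6.4)

def unipolar_binary(net):
    return np.where(net > 0, 1.0, 0.0)        # Eq. (6.5)

def identity(x):
    return x                                  # f(x) = x for all x

def ramp(x):
    return np.clip(x, 0.0, 1.0)               # Eq. (6.6): 0 below 0, x on [0, 1], 1 above 1

x = np.array([-0.5, 0.0, 0.3, 0.7, 1.5])
print(unipolar_continuous(x, lam=4), unipolar_binary(x), identity(x), ramp(x), sep="\n")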

6.6 McCulloch–Pitts Neuron Model


The first definition of a synthetic neuron was based on the simplified biological model formulated by
McCulloch and Pitts (1943). The McCulloch–Pitts model of the neuron is shown in Fig. 6.12.

Figure 6.12 McCulloch–Pitts model: inputs x1, ..., xn with weights wi = ±1 (i = 1, 2, ..., n), threshold T, and output o.

The inputs xi, for i = 1, 2, ..., n, are 0 or 1, depending on the absence or presence of an input impulse at
instant k. The neuron’s output signal is denoted by o. The firing rule for this model is given in Eq. (6.7):

o^(k+1) = 1 if Σ(i = 1..n) wi·xi^k ≥ T, and o^(k+1) = 0 if Σ(i = 1..n) wi·xi^k < T        (6.7)

where k = 0, 1, 2, ... denotes the discrete-time instant, and wi is the multiplicative weight connecting the ith
input with the neuron’s membrane. We assume that a unity delay elapses between the instants k and k + 1.
Note that for this model wi = +1 for excitatory synapses and wi = −1 for inhibitory synapses, while T is the
neuron’s threshold value, which the weighted sum of signals must reach for the neuron to fire.
The function f is a linear step function at threshold T as shown in Fig. 6.13.
Although this neuron model is simplistic, it has substantial computing potential. It can perform the
basic logic operations NOT, OR, and AND, provided its weights and threshold are appropriately selected. The
basic McCulloch–Pitts model for the NOR gate is shown in Fig. 6.14.


Figure 6.13 Linear threshold function: the output switches from 0 to 1 when the sum reaches the threshold T.

In the McCulloch–Pitts neuron no learning takes place; only analysis can be performed. Hence, we first
assume that both weights w1 and w2 are excitatory and analyze the net inputs. If these weights turn out to be
unsuitable, we try one weight as excitatory and the other as inhibitory and analyze again.

Figure 6.14 Basic McCulloch–Pitts model for the NOR gate: inputs x1, x2, x3 feed a neuron with weights +1 and threshold T = 1 (computing OR); its output feeds a second neuron through weight −1 with threshold T = 0, whose output o is the NOR of the inputs.
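The firing rule of Eq. (6.7) and the two-neuron NOR network of Fig. 6.14 can be sketched in a few lines of Python; the helper name mp_neuron is a hypothetical choice made for this illustration.

import numpy as np
from itertools import product

def mp_neuron(x, w, T):
    """McCulloch-Pitts firing rule: output 1 if the weighted sum reaches the threshold T."""
    return 1 if np.dot(w, x) >= T else 0

def nor3(x1, x2, x3):
    o1 = mp_neuron([x1, x2, x3], [1, 1, 1], T=1)    # first neuron computes OR
    return mp_neuron([o1], [-1], T=0)               # inhibitory weight and T = 0 give NOR

for bits in product([0, 1], repeat=3):
    print(bits, nor3(*bits))     # output is 1 only for (0, 0, 0)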

6.6.1 Solved Problems


Solved Problem 6.1
Implement AND function using McCulloch–Pitts neuron.

Solution:
Consider the truth table for AND function.

x1 x2 y

0 0 0
0 1 0
1 0 0
1 1 1

As already mentioned, only analysis can be performed in the McCulloch–Pitts model. Hence, let us
assume the weights w1 = 1 and w2 = 1. The network architecture is shown in Fig. 6.15.


Figure 6.15 AND implementation using the McCulloch–Pitts model: inputs X1 and X2 connect to the output neuron y with weights w1 = 1 and w2 = 1.

With these assumed weights, the net input is calculated for four inputs as

(1, 1): yin = x1w1 + x2w2 = 1 × 1 + 1 × 1 = 2
(1, 0): yin = x1w1 + x2w2 = 1 × 1 + 0 × 1 = 1
(0, 1): yin = x1w1 + x2w2 = 0 × 1 + 1 × 1 = 1
(0, 0): yin = x1w1 + x2w2 = 0 × 1 + 0 × 1 = 0

Thus, the output of neuron Y can be written as

y = f(yin) = 1 if yin ≥ 2, and 0 if yin < 2

where 2 represents the threshold value.
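As a quick check of this result, the same hypothetical mp_neuron helper (redefined here so the snippet stands alone) confirms that the neuron fires only for input (1, 1).

import numpy as np

def mp_neuron(x, w, T):
    return 1 if np.dot(w, x) >= T else 0

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, mp_neuron([x1, x2], [1, 1], T=2))   # fires only for (1, 1)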

Solved Problem 6.2


Implement ANDNOT function using McCulloch–Pitts neuron (use binary data representation).

Solution:
In case of ANDNOT function, the response is true if the first input is true and the second input is false.
For all other input variations, the response is false. The truth table for ANDNOT function is given as

x1 x2 y

0 0 0
0 1 0
1 0 1
1 1 0

Case 1:
Assume that both weights w1 and w2 are excitatory, i.e., w1 = 1 and w2 = 1. Then for the four inputs, we
calculate the net input using the following equation.

yin = x1w1 + x2w2


Now, we calculate the net input as follows


(1,1) , yin = 1 × 1 + 1 × 1 = 2
(1, 0) , yin = 1 × 1 + 0 × 1 = 1
(0,1) , yin = 0 × 1 + 1 × 1 = 1
(0, 0) , yin = 0 × 1 + 0 × 1 = 0
From the calculated net inputs, it is not possible to choose a threshold that fires the neuron for input (1, 0)
only. Hence, these weights are not suitable.

Case 2:
Assume one weight as excitatory and the other as inhibitory, i.e., w1 = 1 and w2 = –1. Now, we calculate
the net input as follows
(1,1) , yin = 1 × 1 + 1 × −1 = 0
(1, 0) , yin = 1 × 1 + 0 × −1 = 1
(0,1) , yin = 0 × 1 + 1 × −1 = −1
(0, 0) , yin = 0 × 1 + 0 × −1 = 0

From the calculated net inputs, it is now possible to fire the neuron only for input (1, 0) by fixing a threshold
of 1, i.e., T ≥ 1. Thus, w1 = 1 and w2 = −1, and the output of the neuron can be written as

y = f(yin) = 1 if yin ≥ 1, and 0 if yin < 1

Solved Problem 6.3


Implement XOR function using McCulloch–Pitts neuron.

Solution:
The truth table for XOR function is given as
x1 x2 y

0 0 0
0 1 1
1 0 1
1 1 0
The XOR function cannot be represented by a single logic function; it is represented by the following two
equations.


y = x1·(NOT x2) + (NOT x1)·x2 = z1 + z2

where z1 = x1·(NOT x2) and z2 = (NOT x1)·x2.
A single-layer net is not sufficient to represent the function; an intermediate layer is necessary.
First function, z1 = x1·(NOT x2):
The truth table for function z1 is shown as

x1 x2 z1

0 0 0
0 1 0
1 0 1
1 1 0

Case 1:
Assume both weights as excitatory, i.e., w11 = w21 = 1.
We calculate the net inputs:
(0,0), z1in = 0 × 1 + 0 × 1 = 0
(0,1), z1in = 0 × 1 + 1 × 1 = 1
(1,0), z1in = 1 × 1 + 0 × 1 = 1
(1,1), z1in = 1 × 1 + 1 × 1 = 2
Hence it is not possible to obtain z1 using these weights.

Case 2:
Assume one weight as excitatory and the other as inhibitory, i.e., w11 = 1 and w21 = −1.
We calculate the net inputs:
(0,0), z1in = 0 × 1 + 0 × (−1) = 0
(0,1), z1in = 0 × 1 + 1 × (−1) = −1
(1,0), z1in = 1 × 1 + 0 × (−1) = 1
(1,1), z1in = 1 × 1 + 1 × (−1) = 0
With a threshold T ≥ 1, the neuron fires only for input (1, 0), which is exactly z1.

Case 3:
Assume one weight as inhibitory and the other as excitatory, i.e., w11 = −1 and w21 = 1.
We calculate the net inputs:
(0,0), z1in = 0 × (−1) + 0 × 1 = 0
(0,1), z1in = 0 × (−1) + 1 × 1 = 1
(1,0), z1in = 1 × (−1) + 0 × 1 = −1
(1,1), z1in = 1 × (−1) + 1 × 1 = 0
These weights fire only for input (0, 1), which corresponds to z2 rather than z1.


From Case 2, the desired output z1 is obtained with w11 = 1, w21 = −1, and threshold T ≥ 1 for the Z1
neuron. The Z1 neuron is therefore represented with input X1 connected through weight +1 and input X2
connected through weight −1.

Second function, z2 = (NOT x1)·x2:
The truth table for function z2 is given as

x1 x2 z2

0 0 0
0 1 1
1 0 0
1 1 0

Case 1:
Assume both weights as excitatory, i.e., w12 = w22 = 1.
We calculate the net inputs:
(0,0), z2in = 0 × 1 + 0 × 1 = 0
(0,1), z2in = 0 × 1 + 1 × 1 = 1
(1,0), z2in = 1 × 1 + 0 × 1 = 1
(1,1), z2in = 1 × 1 + 1 × 1 = 2
Hence it is not possible to obtain z2 using these weights.

Case 2:
Assume one weight as excitatory and the other as inhibitory, i.e., w12 = 1 and w22 = −1.
We calculate the net inputs:
(0,0), z2in = 0 × 1 + 0 × (−1) = 0
(0,1), z2in = 0 × 1 + 1 × (−1) = −1
(1,0), z2in = 1 × 1 + 0 × (−1) = 1
(1,1), z2in = 1 × 1 + 1 × (−1) = 0
These weights fire only for input (1, 0), which corresponds to z1 rather than z2.

Case 3:
Assume one weight as inhibitory and the other as excitatory, i.e., w12 = −1 and w22 = 1.
We calculate the net inputs:
(0,0), z2in = 0 × (−1) + 0 × 1 = 0
(0,1), z2in = 0 × (−1) + 1 × 1 = 1
(1,0), z2in = 1 × (−1) + 0 × 1 = −1
(1,1), z2in = 1 × (−1) + 1 × 1 = 0
With a threshold T ≥ 1, the neuron fires only for input (0, 1), which is exactly z2.


From Case 3, the desired output z2 is obtained with w12 = −1, w22 = 1, and threshold T ≥ 1 for the Z2
neuron. The Z2 neuron is therefore represented with input X1 connected through weight −1 and input X2
connected through weight +1.
Third function ( y = z1 OR z2):
The truth table is given as

x1 x2 y z1 z2

0 0 0 0 0
0 1 1 0 1
1 0 1 1 0
1 1 0 0 0

Here the net input is calculated as


yin = v1z1 + v2z2

Assume both weights as excitatory, i.e., v1 = v2 = 1. We calculate the net inputs


For (x1, x2) = (0, 0): z1 = 0, z2 = 0, so yin = 0 × 1 + 0 × 1 = 0
For (x1, x2) = (0, 1): z1 = 0, z2 = 1, so yin = 0 × 1 + 1 × 1 = 1
For (x1, x2) = (1, 0): z1 = 1, z2 = 0, so yin = 1 × 1 + 0 × 1 = 1
For (x1, x2) = (1, 1): z1 = 0, z2 = 0, so yin = 0 × 1 + 0 × 1 = 0
By setting the threshold T ≥ 1, the network produces the correct XOR output.

The output neuron Y receives z1 from Z1 and z2 from Z2, each through a weight of 1, with threshold T ≥ 1.

The McCulloch–Pitts model for XOR function is given as follows:

X1 connects to Z1 with weight +1 and to Z2 with weight −1; X2 connects to Z1 with weight −1 and to Z2 with weight +1; Z1 and Z2 connect to Y with weight +1 each. The thresholds of Z1, Z2, and Y are all 1.
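A short sketch with the same hypothetical mp_neuron helper verifies that this two-layer network reproduces the XOR truth table.

import numpy as np

def mp_neuron(x, w, T):
    return 1 if np.dot(w, x) >= T else 0

def xor(x1, x2):
    z1 = mp_neuron([x1, x2], [1, -1], T=1)    # z1 = x1 AND (NOT x2)
    z2 = mp_neuron([x1, x2], [-1, 1], T=1)    # z2 = (NOT x1) AND x2
    return mp_neuron([z1, z2], [1, 1], T=1)   # y = z1 OR z2

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor(x1, x2))    # prints 0, 1, 1, 0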

Summary

• An artificial neural network (ANN) is an information processing system inspired by biological nervous systems.
• A trained neural network can be thought of as an “expert” in the category of information it has been given to analyze.
• A trained neural network has other advantages like adaptive learning, self-organization, real-time operation, and fault tolerance.
• A neuron collects inputs using a structure called dendrites. It effectively sums all of the inputs from the dendrites; if the resulting value is greater than its firing threshold, the neuron fires.
• ANN has various network architectures: single-layer feed-forward network, multilayer feed-forward network, feedback network, and recurrent network.
• ANN is used in both supervised learning and unsupervised learning.
• ANN uses both discrete and continuous activation functions.
• The McCulloch–Pitts model is a basic model of ANN which replicates the biological neuron.
• By manipulating the weights and thresholds, basic logic operations like AND, OR, and NOT can be implemented using the McCulloch–Pitts model.
• Since XOR is linearly inseparable, two layers of neurons are required to implement it.

Multiple-Choice Questions

1. What is unsupervised learning?
(a) The features of a group are not explicitly stated
(b) The number of groups may not be known
(c) Both (a) and (b)
(d) None of the above

2. Signal transmission at a synapse is a
(a) Physical process
(b) Chemical process
(c) Both (a) and (b)
(d) None of the above

3. The function of a dendrite is to act as a
(a) Receptor
(b) Transmitter
(c) Both (a) and (b)
(d) None of the above

4. Learning is a
(a) Slow process
(b) Fast process
(c) Moderate process
(d) Can be either slow or fast

5. The change in weight vector depends on what parameter?
(a) Learning
(b) Input vector
(c) Learning signal
(d) All of the above

6. Which of the following are advantages of a neural network over a conventional computer?
(i) A neural network has the ability to learn by example
(ii) A neural network is more fault tolerant
(iii) A neural network is more suited for real-time operation due to its high “computational” rate
(a) (i) and (ii)
(b) (i) and (iii)
(c) All three statements are true
(d) None of the statements are true

Very Short Answer Questions

1. What is a simple artificial neuron?
2. List some commercial practical applications of artificial neural networks.
3. What is a perceptron in machine learning?
4. What are the advantages of neural networks?
5. List the different activation functions.


Short Answer Questions

1. Mention what you can and cannot do with an ANN.
2. What are deterministic models?
3. What are the requirements of learning rules in ANN?
4. What are linearly separable problems of interest for neural network researchers?

Review Questions

1. Explain the working of a biological neuron with the help of a neat diagram.
2. What are the similarities between a biological neuron and an artificial neuron?
3. What are the different categories of learning algorithms?
4. What activation functions are used in artificial neural networks?
5. Why do linearly inseparable problems require two layers? Demonstrate with the example of the XOR problem using the McCulloch–Pitts model.

Answers

Multiple-Choice Questions
1. (d) 2. (b) 3. (a) 4. (a) 5. (d) 6. (c)


            P1      P2      P3,P6,P4   P5
P1          0
P2          0.23    0
P3,P6,P4    0.22    0.14    0
P5          0.34    0.14    0.23       0

To merge the two closest clusters, we find the minimum element in the distance matrix.


Here the minimum value is 0.14, and hence we combine P2 and P5. We then form the cluster corresponding
to this minimum value and update the distance matrix using the single-link rule:

dist((P2, P5), P1) = min(dist(P2, P1), dist(P5, P1)) = min(0.23, 0.34) = 0.23
dist((P2, P5), (P3, P6, P4)) = min(dist(P2, (P3, P6, P4)), dist(P5, (P3, P6, P4))) = min(0.14, 0.23) = 0.14

            P1      P2,P5   P3,P6,P4
P1          0
P2,P5       0.23    0
P3,P6,P4    0.22    0.14    0
Again, we find the minimum element in the updated distance matrix.


Here the minimum value is 0.14, and hence we combine (P2, P5) with (P3, P6, P4). We then form the cluster
corresponding to this minimum value and update the distance matrix:

dist((P2, P5, P3, P6, P4), P1) = min(dist((P2, P5), P1), dist((P3, P6, P4), P1)) = min(0.23, 0.22) = 0.22

                    P1      P2,P5,P3,P6,P4
P1                  0
P2,P5,P3,P6,P4      0.22    0


The dendrogram can now be drawn as shown in Fig. 13.5.

Figure 13.5 Dendrogram of the clusters formed (leaf order: P3, P6, P4, P2, P5, P1).
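The single-link update rule used above (the distance from a merged cluster to any other cluster is the minimum of the distances between their members) can be written as a small generic helper; the function name and the dictionary representation of the distances are assumptions made for this sketch.

def single_link_distance(cluster_a, cluster_b, dist):
    """Single-link (nearest-neighbor) distance between two clusters.

    cluster_a, cluster_b: lists of point labels, e.g. ["P2", "P5"].
    dist: dictionary mapping frozenset({p, q}) to the pairwise distance.
    """
    return min(dist[frozenset((p, q))] for p in cluster_a for q in cluster_b)

# Example with the distances used in the text
dist = {frozenset(("P2", "P1")): 0.23, frozenset(("P5", "P1")): 0.34}
print(single_link_distance(["P2", "P5"], ["P1"], dist))   # 0.23, as computed above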

13.4.1.3 Agglomerative Algorithm: Complete Link


Complete linkage (farthest distance) is the agglomerative method that uses the distance between the members
of the two clusters that are farthest apart.

Solved Problem 13.6


For the given set of points, identify clusters using complete link agglomerative clustering.

Solution:
To compute the distance matrix, the Euclidean distance is used:

d[(x, y), (a, b)] = sqrt((x − a)^2 + (y − b)^2)

For example,

d(P1, P2) = sqrt((1.0 − 1.5)^2 + (1.0 − 1.5)^2) = sqrt(0.25 + 0.25) = sqrt(0.5) = 0.71

The distance matrix is:

        P1      P2      P3      P4      P5      P6
P1      0
P2      0.71    0
P3      5.66    4.95    0
P4      3.6     2.92    2.24    0
P5      4.24    3.53    1.41    1.0     0
P6      3.20    2.5     2.5     0.5     1.12    0

Merging two closest members of the two clusters and finding the minimum element in distance matrix
and forming the clusters, we get
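The same kind of merging can be cross-checked with SciPy, assuming the pairwise distances of the matrix above; the flat-cluster cut-off passed to fcluster is an illustrative choice, not a value from the text.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

labels = ["P1", "P2", "P3", "P4", "P5", "P6"]
D = np.array([
    [0.00, 0.71, 5.66, 3.60, 4.24, 3.20],
    [0.71, 0.00, 4.95, 2.92, 3.53, 2.50],
    [5.66, 4.95, 0.00, 2.24, 1.41, 2.50],
    [3.60, 2.92, 2.24, 0.00, 1.00, 0.50],
    [4.24, 3.53, 1.41, 1.00, 0.00, 1.12],
    [3.20, 2.50, 2.50, 0.50, 1.12, 0.00],
])

# 'complete' linkage uses the farthest-apart members of the two clusters being compared
Z = linkage(squareform(D), method="complete")
print(Z)                                            # each row: the two clusters merged and their distance
print(fcluster(Z, t=2.5, criterion="distance"))     # flat cluster labels at an illustrative cut-off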
