What actions can the human brain do?
§ Humans can do Classification
§ Humans can do Clustering (group similar patterns together)
§ Humans can do Mapping (pattern association)
• A child must be trained before being asked to classify something
or to cluster similar patterns
• The part of the human brain that allows humans to perform
these actions is called the Biological Neural Network (BNN)
• The main component of the human neural system is the neuron cell
• A neuron can be considered a small processor and memory in the
human brain
Human brain
• The brain is a highly complex, non-linear, parallel
information processing system.
• It performs tasks like pattern recognition, perception, and
motor control many times faster than the fastest
digital computers.
• It is characterized by being:
– Robust and fault tolerant
– Flexible – can adjust to a new environment by learning
– Able to deal with fuzzy, probabilistic, noisy or inconsistent
information
– Highly parallel
– Small, compact, and requiring little power.
Human Brain VS Von Neumann Computer

                          Human brain                Von Neumann computer
# elements                10^10 - 10^12 neurons      10^7 - 10^8 transistors
# connections / element   10^4                       50
Power consumption         10 Watt                    100 - 500 Watt
Reliability of elements   low                        reasonable
Reliability of system     high                       reasonable
Data representation       analog                     digital
Memory localization       distributed                localized
Processing                parallel                   sequential
Skill acquisition         learning                   programming
Biological Neuron structure
• The brain consists of approximately 10^11 elements
called neurons.
• Components of Biological Neurons:
► Dendrite: Receives signals from other neurons
► Soma: Processes the information
► Axon: Transmits the output of this neuron
► Synapse: Point of connection to other neurons
How Biological neurons work
• An electrical signal is generated by the neuron, passes down
the axon, and is received by the synapses that join onto
other neurons' dendrites.
• Transmitter chemicals can have an excitatory effect on the
receiving neuron (making it more likely to fire) or an
inhibitory effect (making it less likely to fire).
• Let's say you are watching Friends. The information
your brain receives is taken in by the set of neurons that
help you decide whether to laugh or not.
• Neurons are arranged in a hierarchical fashion – layers – where
each layer has its own role and responsibility
• Example: To detect a face, the brain could be relying on
the entire network and not on a single layer.
Learning in networks of neurons
• Knowledge is represented in neural networks by the
strength of the synaptic connections between neurons
(hence “connectionism”)
• Learning in neural networks is accomplished by adjusting
the synaptic strengths
• Let's suppose that I want to predict my own decision of
whether or not to watch a random football game on TV
(sketched in code below). The inputs are all boolean, i.e., {0,1},
and my output variable is also boolean {1: Will watch it, 0: Won't watch it}.
• So, x_1 could be isPremierLeagueOn (I like Premier League
more)
• x_2 could be isItAFriendlyGame (I tend to care less about
the friendlies)
• x_3 could be isNotHome (Can’t watch it when I’m running
errands. Can I?)
• x_4 could be isManUnitedPlaying (I am a big Man United
fan)
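As an illustration only (not from the original slides), here is a minimal
perceptron-style sketch of this decision in Python; the weights and
threshold are made-up assumptions chosen so that a Man United Premier
League game wins out:

def will_watch(is_premier_league, is_friendly, is_not_home, is_man_united_playing):
    # Hypothetical weights reflecting the preferences above (illustrative values).
    weights = [2.0, -1.0, -3.0, 4.0]   # x1..x4: likes PL, dislikes friendlies, ...
    inputs = [is_premier_league, is_friendly, is_not_home, is_man_united_playing]
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= 2.0 else 0      # 1: will watch, 0: won't watch

print(will_watch(1, 0, 0, 1))  # PL game, Man United playing, at home -> 1
print(will_watch(0, 1, 1, 0))  # friendly, and not at home -> 0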
Artificial Neural network (ANN)
► An Artificial neural network is an information-processing
system that has certain performance characteristics in
common with biological neural networks.
► ANNs have been developed as generalizations of mathematical
models of human cognition or neural biology.
BNN VS ANN

Biological neural network (BNN)       Artificial neural network (ANN)
Soma                                  Neuron
Dendrite                              Input
Axon                                  Output
Synapse                               Weight
Learning the solution to a problem    Changing the connection weights
Examples                              Training data
§ Figure of Artificial Neuron
How does an ANN work?
1) Information processing occurs at many simple elements called neurons.
2) Signals are passed between neurons over connection links.
3) Each connection link has an associated weight, which multiplies the
signal transmitted.
4) The net input is calculated as the weighted sum of the input signals.
5) Each neuron applies a (transfer) activation function to its net input
(the sum of its weighted input signals) to determine its output signal.
6) Each neuron has a single threshold value.
7) An output signal is either discrete (e.g., 0 or 1) or a real-valued
number (e.g., between 0 and 1).
y = f(net input), where f is the activation function
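A minimal sketch (not from the slides) of steps 4 and 5: compute the net
input as a weighted sum, then apply an activation function:

import math

def neuron_output(inputs, weights, activation):
    # Step 4: net input = weighted sum of the input signals
    net = sum(x * w for x, w in zip(inputs, weights))
    # Step 5: apply the activation function to the net input
    return activation(net)

step = lambda net: 1 if net >= 0 else 0            # binary step, threshold 0
sigmoid = lambda net: 1 / (1 + math.exp(-net))     # log-sigmoid

print(neuron_output([1, 0, 1], [0.5, -0.2, 0.8], step))     # net = 1.3 -> 1
print(neuron_output([1, 0, 1], [0.5, -0.2, 0.8], sigmoid))  # ~0.786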
Learning in ANN
► The strength of the connection between
two neurons is stored as a weight
value for that specific connection.
► The training algorithm changes the
connection weights using the
training data
Adding Bias
• A linear neuron is a more flexible model if we include a bias.
• A Bias unit can be thought of as a unit which always has an
output value of 1, which is connected to the hidden and output
layer units via modifiable weights.
• It sometimes helps convergence of the weights to an acceptable
solution
• A bias is exactly equivalent to a weight on an extra input line
that always has an activity of 1.
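A small sketch (illustrative) of the equivalence just stated: a bias b
behaves exactly like a weight b on an extra input line fixed at 1:

def net_with_bias(inputs, weights, bias):
    return sum(x * w for x, w in zip(inputs, weights)) + bias

def net_with_extra_input(inputs, weights, bias):
    # Treat the bias as a weight on an extra input line that is always 1.
    return sum(x * w for x, w in zip(inputs + [1], weights + [bias]))

x, w, b = [0.3, 0.7], [2.0, -1.0], 0.5
assert net_with_bias(x, w, b) == net_with_extra_input(x, w, b)  # both 0.4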
ANN with Bias
y = f(net input), where f is the activation function and the net input
now includes the bias term b
Example-1: (figure) calculate the net input (net input = 3), then
calculate the output using the activation function (step function).
Example-2: (figure)
Example-3: (figure) sigmoidal activation function, net input = 3.
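Assuming a step threshold of θ = 0 (the actual weights and inputs are in
the slide figures), the outputs for net input = 3 can be checked as follows:

import math

net = 3
print(1 if net >= 0 else 0)        # step function output: 1
print(1 / (1 + math.exp(-net)))    # sigmoid output: ~0.9526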
Feature Extraction
Characteristics of ANN
1. Architecture (Structure): the pattern of nodes and connections
between them
2. Training, or learning, algorithm: method of determining the weights
on the connections
3. Activation function: a function that produces an output based on
the input values received by the node
Architecture
1- Feed-Forward NN
- Neurons are arranged in separate layers (input – hidden – output)
- No feedback
A- Single-layer network: has one layer of connection weights
► the input layer is connected to the neurons of the output layer.
► fully connected
B- Multi-layer network:
► a net with one or more layers of hidden neurons
► hidden neurons lie between the input units and the output layer.
► can solve more complicated problems than single-layer nets (a
forward-pass sketch follows below).
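A minimal forward-pass sketch (illustrative, with made-up weights) of a
multi-layer feed-forward net with one hidden layer:

import math

def sigmoid(net):
    return 1 / (1 + math.exp(-net))

def layer(inputs, weight_matrix):
    # One neuron per weight row: weighted sum of the inputs, then activation.
    return [sigmoid(sum(x * w for x, w in zip(inputs, row)))
            for row in weight_matrix]

x = [1.0, 0.0]
hidden = layer(x, [[0.5, -0.4], [0.9, 0.1]])   # 2 hidden neurons
output = layer(hidden, [[1.2, -0.8]])          # 1 output neuron
print(output)                                   # ~[0.544]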
2- Feed-back (Recurrent) NN :
► Some connections are present from a layer to the previous layers
► Powerful and complicated
3- Associative Network:
► The connections can be bidirectional
Training – Learning algorithm.
1- Supervised Learning
• The network is presented with inputs together with the target (teacher signal).
• The NN produces output close to the target by adjusting the weights.
• "Error correction" methods:
– Least Mean Square (LMS)
– Back Propagation
2- Unsupervised Learning
• No teacher (target signal) from outside
• Competitive learning: neurons take part in a competition for each
input.
3- Reinforcement Learning
• A generalization of supervised learning;
• a teacher scores the performance on the training examples.
• Based on actions
1- Supervised                             2- Unsupervised
Define:                                   Define:
each training pattern is associated       each training pattern is not
with a target output vector               associated with a target output vector
Data:                                     Data:
(input, desired output)                   (input only)
Problems:                                 Problems:
classification, regression,               clustering, data reduction
pattern recognition
NN models:                                NN models:
Perceptron, Hebb                          Self-organizing maps (SOM), Hopfield
Activation function
1- Identity function – linear transfer function
• Performs no input squashing
• Not very interesting...
• output = input
2- Binary step function – threshold function – hard limit transfer
function – unipolar
• Converts the net input to binary (1 or 0)
• Also known as the Heaviside function.
3- Bipolar step function – Symmetric hard limit
• output = 1 if net ≥ θ
• output = -1 if net < θ
4- Binary sigmoid – log sigmoid
• Squashes the neuron’s pre-activation between 0 and 1
• "S"-shaped curve
• Always positive
• Bounded
• Strictly increasing
• Continuous
5- hyperbolic tangent (‘‘tanh’’)
• Squashes the neuron’s pre-activation between -1 and 1
• Can be positive or negative
• Bounded
• Strictly increasing
6- Rectified linear activation function (ReLU)
• Bounded below by 0 (always non-negative)
• Not bounded above
• Monotonically increasing
• If the net input is less than or equal to 0, the output is 0
• If the net input is greater than 0, the output equals the net input
7- Radial basis activation function
Summary of Activation Functions
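A compact sketch (illustrative) of activation functions 1-6 above, with
θ = 0 assumed for the step functions:

import math

def identity(net):                 # 1- linear: output = input
    return net

def binary_step(net, theta=0):     # 2- unipolar hard limit (Heaviside)
    return 1 if net >= theta else 0

def bipolar_step(net, theta=0):    # 3- symmetric hard limit
    return 1 if net >= theta else -1

def log_sigmoid(net):              # 4- squashes into (0, 1)
    return 1 / (1 + math.exp(-net))

def tanh(net):                     # 5- squashes into (-1, 1)
    return math.tanh(net)

def relu(net):                     # 6- 0 for net <= 0, net otherwise
    return max(0.0, net)

for f in (identity, binary_step, bipolar_step, log_sigmoid, tanh, relu):
    print(f.__name__, f(3), f(-3))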
Artificial Neural Network Development Process
Linearly Separable Function
§ The function is capable of assigning all inputs to two categories.
§ Used when the number of classes is 2
§ Decision boundary: the line that partitions the plane into two decision
regions
§ The decision boundary has the equation
b + ∑_{i=1}^{n} x_i w_i = 0
o Positive region: the decision region for output 1, with
b + x1 w1 + x2 w2 ≥ 0
o Negative region: the decision region for output -1, with
b + x1 w1 + x2 w2 < 0
§ Example of linearly separable problems (see the sketch below)
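An illustrative sketch (the weights and bias are made-up) of classifying
points by the sign of the decision-boundary expression; here the boundary
x1 + x2 = 1.5 separates the AND function:

def classify(x, w, b):
    # The sign of b + sum_i x_i * w_i decides the region.
    net = b + sum(xi * wi for xi, wi in zip(x, w))
    return 1 if net >= 0 else -1

w, b = [1.0, 1.0], -1.5        # decision boundary: x1 + x2 = 1.5
print(classify([1, 1], w, b))  # 1  (positive region)
print(classify([0, 1], w, b))  # -1 (negative region)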
§ If two classes of patterns can be separated by a decision boundary,
then they are said to be linearly separable
§ If such a decision boundary does not exist, then the two classes
are said to be linearly inseparable (non-linearly separable)
§ Linearly inseparable problems cannot be solved by a simple
network; a more sophisticated architecture is needed.
Capacity of a single neuron
• Can do binary classification (two outputs):
• also known as a logistic regression classifier
o if the output is greater than 0.5, predict class 1
o otherwise, predict class 0
► Can solve linearly separable problems (a sketch follows below)
► Can't solve non-linearly separable problems
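A minimal sketch (with made-up parameters) of a single sigmoid neuron
acting as a logistic regression classifier:

import math

def predict(x, w, b):
    # Sigmoid of the net input gives a probability in (0, 1).
    p = 1 / (1 + math.exp(-(b + sum(xi * wi for xi, wi in zip(x, w)))))
    return 1 if p > 0.5 else 0   # threshold the sigmoid output at 0.5

w, b = [2.0, -1.0], 0.5           # illustrative weights and bias
print(predict([1.0, 0.0], w, b))  # p = sigmoid(2.5) ~ 0.92 -> class 1
print(predict([0.0, 3.0], w, b))  # p = sigmoid(-2.5) ~ 0.08 -> class 0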
McCulloch-Pitts network
• The McCulloch-Pitts neuron is perhaps the earliest artificial neuron
• The neuron has binary inputs (0 or 1)
• The activation (output) is also binary:
• the neuron either fires (has an activation of 1)
• or does not fire (has an activation of 0).
• Each neuron has a fixed threshold value T such that if the net
input to the neuron is greater than the threshold, the neuron fires.
• The activation function is the binary step function
Architecture
• A McCulloch-Pitts neuron can receive signals from any number of neurons.
• Each connection path is either excitatory, with weight w > 0, or
inhibitory, with weight w < 0.
• All excitatory connections into a particular neuron have the same
weight.
• The output of each neuron is as follows
• Figure of McCulloch-Pitts neuron
Algorithm
• The weights for a McCulloch-Pitts neuron are set in advance, together
with the threshold for the neuron's activation function;
• analysis, rather than training, is used to determine the values of the
weights and the threshold.
• Logic functions will be used as simple examples for a number of
neural nets.
Example-1: AND function
Example-2: OR function
Example-3: AND NOT function (y = X1 X2')
Example-4: NAND function
Example-5: XOR function
• X1 XOR X2 = X1' X2 + X1 X2'
• X1 XOR X2 = (X1 AND NOT X2) OR (X2 AND NOT X1)
• Z1 = (X1 AND NOT X2)
• Z2 = (X2 AND NOT X1)
(These gates are implemented in the sketch below.)
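A sketch of these logic functions as McCulloch-Pitts neurons; the weights
and thresholds below are the conventional textbook choices (assumed here,
since the slides carry them in figures). Each neuron fires when its net
input reaches its threshold:

def mp_neuron(inputs, weights, threshold):
    # McCulloch-Pitts: binary step applied to the weighted sum.
    return 1 if sum(x * w for x, w in zip(inputs, weights)) >= threshold else 0

def AND(x1, x2):     return mp_neuron([x1, x2], [1, 1], 2)
def OR(x1, x2):      return mp_neuron([x1, x2], [1, 1], 1)
def AND_NOT(x1, x2): return mp_neuron([x1, x2], [2, -1], 2)   # x1 AND NOT x2

def XOR(x1, x2):
    # Two-layer net: z1 = x1 AND NOT x2, z2 = x2 AND NOT x1, y = z1 OR z2
    return OR(AND_NOT(x1, x2), AND_NOT(x2, x1))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", AND(a, b), OR(a, b), XOR(a, b))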
► Applications of Neural Networks
• Financial modelling – predicting stocks, currency exchange rates
• Other time series prediction – climate, weather
• Computer games – intelligent agents, backgammon
• Control systems – autonomous adaptable robotics
• Pattern recognition – speech recognition, handwriting recognition
• Data analysis – data compression, data mining
• Noise reduction – ECG noise reduction
• Bioinformatics – DNA sequencing
► Advantages of Neural Networks
o ANNs are powerful computation systems: they consist of many neurons
o Generalization:
§ can learn from training data and generalize to new data
§ uses responses to prior input patterns to determine the
response to a novel input
o Fault tolerance:
§ able to recognize a pattern that contains some noise
§ still works when part of the net fails
o Massive parallel processing:
§ can process more than one pattern at the same time using the
same set of weights
o Distributed memory representation
o Adaptability:
§ recognition ability can be increased by more training
o Low energy consumption
o Useful for brain modeling
o Used in pattern recognition
o Able to learn any complex non-linear mapping
o Learning instead of programming
o Robust: can deal with incomplete and/or noisy data
► Disadvantages of Neural Networks
o Need training to operate
o High processing time for training
o Require specialized HW and SW
o Lack of understanding of the behavior of the NN
o Convergence is not guaranteed (reaching a solution is not
guaranteed)
o No mathematical proof for the learning process
o Difficult to design
o There are no clear design rules for arbitrary applications
o The learning process can be very time consuming
o Can over-fit the training data, becoming useless for
generalization
► Types of Problems Solved by NNs
• Classification: determine to which of a discrete number of
classes a given input case belongs
• Regression: predict the value of a continuous variable (e.g., weather)
• Time series: predict the value of variables from
earlier values of the same or other variables
• Clustering (natural language processing, data mining)
• Control (automotive control, robotics)
• Function approximation (modelling):
modelling of highly nonlinear industrial processes, financial
market prediction
► Who is concerned with NNs?
• Computer scientists want to find out about the properties of non-
symbolic information processing with neural nets and about learning
systems in general.
• Statisticians use neural nets as flexible, nonlinear regression and
classification models.
• Engineers of many kinds exploit the capabilities of neural networks
in many areas, such as signal processing and automatic control.
• Cognitive scientists view neural networks as a possible apparatus for
describing models of thinking and consciousness (high-level brain
function).
• Neurophysiologists use neural networks to describe and explore
medium-level brain function (e.g. memory, sensory systems, motor control).
• Physicists use neural networks to model phenomena in statistical
mechanics and for a lot of other tasks.
• Biologists use neural networks to interpret nucleotide sequences.
• Philosophers and some other people may also be interested in
neural networks for various reasons.
Kohonen Self-Organizing Map (SOM)
► A training algorithm used for clustering
► Clustering: grouping a set of data objects (patterns) into clusters
o objects within the same cluster are similar to one another
o and dissimilar to the objects in other clusters
► Uses unsupervised training for clustering
► Measure of similarity between two patterns:
o Euclidean distance
d(i, j) = sqrt(|x_i1 - x_j1|^2 + |x_i2 - x_j2|^2 + ... + |x_ip - x_jp|^2)
► The Kohonen self-organizing map is based on competitive learning
(winner-takes-all).
► In competitive learning the network is trained to organize the input
vector space into s clusters
► Neurons compete with each other; the winners of the competition
update their weights
► Kohonen Network Architecture
► The number of neurons in the output layer equals the number of clusters
► Neurons in the output layer are typically placed in a 1- or 2-dimensional
lattice structure:
o linear structure
o rectangular grid
o hexagonal grid
(Figure: each lattice type shown with neighborhoods of radius R = 0, 1, 2)
► Based on the lattice structure, each neuron has a neighborhood of
adjacent neurons.
► In a linear array, each neuron has 2 nearest neighbors (R = 1)
► In a rectangular grid, each neuron has 8 nearest neighbors (R = 1)
► In a hexagonal grid, each neuron has 6 nearest neighbors (R = 1)
► The output layer is called the Kohonen layer.
► The neurons in the Kohonen layer are called Kohonen neurons
► SOM algorithm at a high level
o Initialize the weights
o Competitive process: compute the response of every neuron to the
input pattern and locate the winner.
o Cooperative process: determine the neighborhood of the winning
neuron
o Weight adjustment process: the weights of the winning neuron and its
neighboring neurons are updated (see the sketch below).
► SOM algorithm in detail
• Initialize the weights to small random values. Set the learning rate
parameters.
• While the stopping condition is false:
– For each input vector x:
• Compute the distance of x to each weight vector j:
D(j) = ∑_i (w_ij - x_i)^2
• Find the index J such that D(J) is minimum
• For all neurons j in the current neighborhood of J,
update the weights:
w_ij(new) = w_ij(old) + α (x_i - w_ij(old))
– Update the learning rate: α(t+1) = k · α(t)
– Update the size of the topological neighborhood
► Stopping condition
o A specific number of iterations is reached
o The learning rate (α) reaches a particular value
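A minimal Python sketch of the competitive and weight-adjustment steps
(assuming R = 0, so only the winning neuron is updated; weights holds one
weight vector per cluster unit):

def winner(x, weights):
    # Competitive process: D(j) = sum_i (w_ij - x_i)^2, smallest wins.
    distances = [sum((w - xi) ** 2 for w, xi in zip(wj, x)) for wj in weights]
    return distances.index(min(distances))

def train_step(x, weights, alpha):
    # Weight adjustment: w_ij(new) = w_ij(old) + alpha * (x_i - w_ij(old))
    j = winner(x, weights)
    weights[j] = [w + alpha * (xi - w) for w, xi in zip(weights[j], x)]
    return j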
► Example
Place (1 1 0 0), (0 0 0 1), (1 0 0 0), (0 0 1 1) into 2 clusters with no
topological structure (no neighborhood), R = 0.
Initial learning rate: α(0) = 0.6
Decrease α: α(t+1) = 0.5 · α(t)
Initial radius: R = 0
Random initial weights:
Solution
W1 (cluster 1)    W2 (cluster 2)
.2                .8
.6                .4
.5                .7
.9                .3
First pattern P1 = (1 1 0 0)
Compute the distances:
D(1) = (.2 - 1)^2 + (.6 - 1)^2 + (.5 - 0)^2 + (.9 - 0)^2 = 1.86
D(2) = (.8 - 1)^2 + (.4 - 1)^2 + (.7 - 0)^2 + (.3 - 0)^2 = 0.98
Find the winner:
D(2) is minimum → the neuron of cluster 2 wins!
Update the weights of neuron 2:
w_i2(new) = w_i2(old) + 0.6 (x_i - w_i2(old))
New weights: W2 = (.92, .76, .28, .12), W1 unchanged
Second pattern P2 = (0 0 0 1)
Third pattern P3 = (1 0 0 0)
Fourth pattern P4 = (0 0 1 1)
(The remaining updates follow the same procedure; the sketch below
reproduces them.)
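Using the train_step function sketched earlier, the whole example can be
run; the first pass reproduces D(1) = 1.86, D(2) = 0.98 and the W2 update:

patterns = [[1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 1]]
weights = [[.2, .6, .5, .9], [.8, .4, .7, .3]]   # initial W1, W2
alpha = 0.6

for x in patterns:
    j = train_step(x, weights, alpha)
    print(f"pattern {x} -> cluster {j + 1}, new W{j + 1} = {weights[j]}")
alpha *= 0.5   # alpha(t+1) = 0.5 * alpha(t) after each epoch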
► Application Examples of Kohonen Networks
o A neural phonetic typewriter
– recognizes a speaker's speech
o Data compression
– compress and quantize data, such as speech and images,
before storage or transmission.