ADALINE NETWORK
Proposed by Widrow & Hoff in 1960.
Stands for ADAptive LInear NEuron.
Architecturally it is similar to the perceptron network except for the
transfer function.
Adaline uses a purely linear transfer function while the perceptron uses a
hard-limiting transfer function.
Has a large number of applications in signal processing.
[Figure: Adaline network along with arrangement for training — inputs
weighted by w11 ... w1r are summed with a bias to produce the output; the
weights are set either by direct weight adjustment or by an iterative
training algorithm that compares the output with the target.]
DECISION BOUNDARY OF ADALINE N/W
Consider a 2-input, 1-output adaline network.
[Figure: 2-input adaline — inputs p1, p2 weighted by w11, w12, summed with
the bias b.]
n = w11 p1 + w12 p2 + b
a = purelin(n) = n
a = w11 p1 + w12 p2 + b
Limiting case: n = 0
w12 p2 = -w11 p1 - b
p2 = -(w11/w12) p1 - b/w12
This line is called the decision boundary.
a = 0 along the decision boundary.
How do we decide on which side the output is greater than zero?
The direction of the weight vector is the direction in which the output is
positive.
Thus the adaline has the same limitation as the perceptron: it can classify
only linearly separable patterns. However, due to its linear transfer
function it can be put to other uses.
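As a quick numerical sketch of the boundary equations above — the weights and bias here are made up purely for illustration, not taken from the text:

```python
# Hypothetical 2-input adaline; w11, w12, b are invented for illustration.
w11, w12, b = 1.0, 2.0, -2.0

def adaline_output(p1, p2):
    # a = purelin(n) = n = w11*p1 + w12*p2 + b
    return w11 * p1 + w12 * p2 + b

def boundary_p2(p1):
    # decision boundary: p2 = -(w11/w12)*p1 - b/w12
    return -(w11 / w12) * p1 - b / w12

# On the boundary the output is zero.
print(adaline_output(1.0, boundary_p2(1.0)))              # 0.0
# Stepping off the boundary in the direction of the weight
# vector (w11, w12) makes the output positive.
print(adaline_output(1.0 + w11, boundary_p2(1.0) + w12))  # 5.0
```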
TRAINING ADALINE USING LMS ALGORITHM.
A network can be considered trained if it produces outputs with acceptable
error for the given inputs.
Let X = [w1; w2; ... ; b] (the weight vector with the bias appended)
and Z = [p1; p2; ... ; 1] (the input vector with a 1 appended).
n = wT p + b
a = purelin(n) = n
In matrix notation,
a = XT Z
e = t - a = t - XT Z
Since the error may be positive or negative, we take the square of the
error:
e2 = (t - XT Z)2
Mean of the square of the errors:
E[e2] = E[(t - a)2] = E[(t - XT Z)2]
where E[·] is the statistical expectation operator.
-----------------------
Expectation of a discrete variable X: E[X] = ∑ xi p(xi)
where xi is the ith discrete value of the variable and p(xi) is the
probability of occurrence of xi.
Hence, E[e2] = e1^2 p(e1^2) + e2^2 p(e2^2) + ...
Assuming all values of e2 occur with equal probability p(e^2) = 1/n,
E[e2] = e1^2/n + e2^2/n + ... + en^2/n
Thus E[e2] is the mean of the squared errors, i.e. the mean squared error.
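As a tiny numerical check of the aside above (the error values are made up), with equal probabilities the expectation of e^2 is simply the mean of the squared errors:

```python
# Made-up errors from n = 4 equally likely patterns.
errors = [0.5, -1.0, 0.25, -0.75]

# E[e^2] with p(ei^2) = 1/n reduces to the mean of the squares.
mse = sum(e * e for e in errors) / len(errors)
print(mse)  # 0.46875
```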
------------------------
Let E[e2] = F(X), the performance function (it reflects how well the
network is performing).
F(X) = E[(t - XT Z)2]
     = E[t2 - 2t XT Z + XT Z ZT X]
     = E(t2) - 2XT E(tZ) + XT E(Z ZT) X
     = C - 2XT h + XT R X
where C = E(t2), h = E(tZ), R = E(Z ZT).
R is the input correlation matrix (a measure of the similarity of a signal
with a delayed version of the same signal).
h is the cross-correlation vector (a measure of the similarity between a
signal and a delayed version of another signal).
To bring F(X) to the standard quadratic form:
F(X) = 1/2 XT (2R) X + (-2h)T X + C
     = 1/2 XT A X + dT X + C
where A = 2R and d = -2h.
The stationary point (the point at which the gradient is zero) is found by
setting the gradient of F(X) to zero:
∇F(X) = 0
The gradient of a quadratic function is given by AX + d, so
2RX - 2h = 0
2RX = 2h
2R^-1 R X = 2R^-1 h
X = R^-1 h
where R = E(Z ZT) and h = E(tZ).
Thus if we could calculate the statistical properties R and h, the vector X
(i.e. the weights and biases) could be computed directly, without any
iterations. In general it is not convenient to calculate h and R, and we can
avoid calculating the inverse of R by using the steepest descent algorithm.
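A minimal sketch of this direct solution for a 2-dimensional X. The values of R and h below are invented just to show the algebra X = R^-1 h; they are not from the text:

```python
# Made-up statistics for a 2-dimensional X (illustration only).
R = [[2.0, 1.0],   # R = E[Z Z^T], input correlation matrix
     [1.0, 2.0]]
h = [4.0, 5.0]     # h = E[t Z], cross-correlation vector

# X = R^-1 h, inverting the 2x2 matrix directly.
det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
R_inv = [[ R[1][1] / det, -R[0][1] / det],
         [-R[1][0] / det,  R[0][0] / det]]
X = [R_inv[0][0] * h[0] + R_inv[0][1] * h[1],
     R_inv[1][0] * h[0] + R_inv[1][1] * h[1]]
print(X)  # stationary point, approx [1.0, 2.0]

# Check: the gradient 2RX - 2h vanishes at the stationary point.
grad = [2 * (R[0][0] * X[0] + R[0][1] * X[1]) - 2 * h[0],
        2 * (R[1][0] * X[0] + R[1][1] * X[1]) - 2 * h[1]]
```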
WIDROW HOFF ALGORITHM FOR TRAINING ADALINE.
It is an approximate steepest descent algorithm in which the performance
index is the mean square error. The performance function to be minimized is
taken as e2(k) rather than E[e2]; the error is minimized after each
individual pattern is applied.
The (k+1)th value of the weight vector X is found from the kth value such
that F(x(k+1)) < F(x(k)), i.e. we move downhill on the surface formed by
the performance function.
x(k+1) = x(k) - α g(k) (method of steepest descent), where g(k) is the
gradient at the kth iteration.
For a 2-input network,
g(k) = ∇e2(k) = [∂e2(k)/∂w11 ; ∂e2(k)/∂w12 ; ∂e2(k)/∂b]
with ∂e2(k)/∂w1j = 2 e(k) ∂e(k)/∂w1j.
e(k) = t(k) - a(k)
e(k) = t(k) - (wT p(k) + b)
e(k) = t(k) - (∑ w1i pi(k) + b)
∂e(k)/∂w11 = 0 - p1(k) - 0 - ... - 0 = -p1(k)
In general, ∂e(k)/∂w1j = -pj(k) and ∂e(k)/∂b = -1.
g(k) = -2 e(k) [p(k) ; 1]
x(k+1) = x(k) - α g(k)
[w(k+1) ; b(k+1)] = [w(k) ; b(k)] + 2 α e(k) [p(k) ; 1]
OR
w(k+1) = w(k) + 2 α e(k) p(k)
b(k+1) = b(k) + 2 α e(k)
These two equations constitute the LMS algorithm, also called the Widrow
Hoff learning algorithm or the delta rule.
Widrow Hoff Algorithm
The performance function to be minimized is e2(k).
The minimization method used is the method of steepest descent:
x(k+1) = x(k) - α g(k)
which gives the Widrow Hoff update
w(k+1) = w(k) + 2 α e(k) p(k)
b(k+1) = b(k) + 2 α e(k)
Steps:
1. Start with small random weights & biases.
2. Apply the 1st input vector & propagate it forward to find the output.
3. Compute the error.
4. Modify weights & biases using the formulae
   w(k+1) = w(k) + 2 α e(k) p(k)
   b(k+1) = b(k) + 2 α e(k)
5. Repeat with the next input vector; stop when e(k) drops to an acceptably
   low value.
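The steps above can be sketched as follows. The training patterns and learning rate are invented for illustration; the targets follow the linear rule t = 2p1 - p2 + 1, so an adaline can fit them exactly:

```python
import random

# Made-up patterns whose targets obey t = 2*p1 - p2 + 1.
patterns = [([1.0, 0.0], 3.0), ([0.0, 1.0], 0.0),
            ([1.0, 1.0], 2.0), ([-1.0, 0.5], -1.5)]

random.seed(0)
w = [random.uniform(-0.1, 0.1), random.uniform(-0.1, 0.1)]  # step 1
b = random.uniform(-0.1, 0.1)
alpha = 0.1

for epoch in range(200):
    for p, t in patterns:
        a = sum(wi * pi for wi, pi in zip(w, p)) + b   # step 2: output
        e = t - a                                      # step 3: error
        w = [wi + 2 * alpha * e * pi for wi, pi in zip(w, p)]  # step 4
        b = b + 2 * alpha * e                          # step 4 (bias)
print(w, b)  # approaches w = [2, -1], b = 1
```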
In the classical LMS method, we first apply all the available input patterns
& find the individual errors; we then try to minimize the mean of the
squared errors. In the Widrow Hoff method we proceed in iterative fashion as
each input pattern is applied, thus avoiding the matrix inverse that
requires the statistical properties of the input vectors to be known. This
saves a large amount of labor in practical-sized problems.
PROBLEM:
I/O pairs are
p1 = [ ], t1 = 1 ; p2 = [ ], t2 = -1
Train the network using the LMS algorithm with the initial guess set to
zero & learning rate α = 0.25. Neglect the bias.
a(k) = purelin(wT(k) p(k))
w(k+1) = w(k) + 2 α e(k) p(k)
p1 is applied:
a(0) = purelin(wT(0) p1) = 0 ; t(0) = 1
e(0) = t(0) - a(0) = 1
w(1) = w(0) + 2(0.25)(1) p1
p2 is applied:
a(1) = purelin(wT(1) p2) = 0
e(1) = t(1) - a(1) = -1 - 0 = -1
w(2) = w(1) + 2(0.25)(-1) p2
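The iterations above can be reproduced in code. The input vectors did not survive in these notes, so the p1 and p2 below are hypothetical stand-ins chosen only to show the mechanics of the two updates:

```python
# Hypothetical stand-ins for the missing input vectors.
patterns = [([1.0, 1.0], 1.0), ([1.0, -1.0], -1.0)]
alpha = 0.25
w = [0.0, 0.0]  # initial guess set to zero; bias neglected

for p, t in patterns:
    a = sum(wi * pi for wi, pi in zip(w, p))  # a(k) = purelin(wT(k) p(k))
    e = t - a                                 # e(k) = t(k) - a(k)
    w = [wi + 2 * alpha * e * pi for wi, pi in zip(w, p)]
    print(w)
```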
The adaline is more widely used than the perceptron.
The major area of application of the adaline is adaptive filtering.
An adaptive filter is able to separate undesirable components from a signal
even if the undesirable components fall in the same frequency band as the
useful signal.
Adaptive filtering has the following applications:
Noise cancellation
System identification
Inverse system modeling
Prediction
Noise Cancellation
[Figure: Noise cancellation — the useful signal s(t) is corrupted by noise
f1(n(t)) that has passed through a noise path; an adaline filter driven by
the noise source n(t) produces an estimate f2(n(t)), which is subtracted
from s(t) + f1(n(t)); the error, which is the restored signal s(t), drives
the training algorithm.]
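A minimal sketch of this arrangement; the signal, the noise path f1, and the filter length are all invented for illustration. The adaline sees only the noise source n(t), learns the noise path, and the training error itself is the restored signal:

```python
import math, random

random.seed(1)
N, alpha = 2000, 0.01
s = [math.sin(0.1 * k) for k in range(N)]          # useful signal (made up)
noise = [random.uniform(-1, 1) for _ in range(N)]  # noise source n(t)
# Corrupted measurement s(t) + f1(n(t)); f1 is a made-up 2-tap path.
corrupted = [s[k] + 0.8 * noise[k] + 0.3 * noise[k - 1] for k in range(1, N)]

w = [0.0, 0.0]
restored = []
for k in range(1, N - 1):
    z = [noise[k], noise[k - 1]]               # adaline input: raw noise
    f2 = sum(wi * zi for wi, zi in zip(w, z))  # estimate of f1(n(t))
    e = corrupted[k - 1] - f2                  # error = restored signal
    w = [wi + 2 * alpha * e * zi for wi, zi in zip(w, z)]
    restored.append(e)
print(w)  # approaches the noise-path taps [0.8, 0.3]
```

Note that s(t) is uncorrelated with the noise, so minimizing the error power removes only the noise component and leaves the useful signal in the error.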
System Identification
[Figure: System identification — the same input is fed to the system to be
identified and to the adaptive filter (adaline); the system's output is the
desired signal (target), and the error between it and the output of the
adaptive filter drives the LMS algorithm.]
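A sketch of system identification with LMS; the "unknown" system below is a made-up 2-tap linear system, chosen so the adaline's weights can converge to its coefficients:

```python
import random

random.seed(2)
alpha = 0.01

def unknown_system(x_now, x_prev):
    # Made-up system to be identified: y = 1.5*x(k) - 0.5*x(k-1)
    return 1.5 * x_now - 0.5 * x_prev

x = [random.uniform(-1, 1) for _ in range(3000)]  # common input
w = [0.0, 0.0]  # adaline taps over [x(k), x(k-1)]
for k in range(1, len(x)):
    z = [x[k], x[k - 1]]
    target = unknown_system(x[k], x[k - 1])     # desired signal
    a = sum(wi * zi for wi, zi in zip(w, z))    # adaptive filter output
    e = target - a
    w = [wi + 2 * alpha * e * zi for wi, zi in zip(w, z)]
print(w)  # converges toward the system's coefficients [1.5, -0.5]
```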
Inverse System Modeling
[Figure: Inverse system modeling — the input passes through the system
whose inverse model is to be found and then into the adaptive filter
(adaline); a delayed version of the original input serves as the target,
and the error drives the training algorithm.]
Prediction
Prediction is required in many situations.
[Figure: Prediction — the past D samples of the signal (obtained through
delay elements) feed the adaptive filter (adaline), which outputs the
predicted value of the current sample; the actual value of the current
sample is the target, and the error drives the LMS algorithm.]
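A sketch of the prediction setup; the sinusoidal signal, the learning rate, and D = 2 are all invented for illustration. The adaline predicts the current sample from the past D samples, and the prediction error shrinks as it adapts:

```python
import math

alpha, D = 0.05, 2  # made-up learning rate and number of past samples
signal = [math.sin(0.3 * k) for k in range(1000)]  # made-up signal

w = [0.0] * D
errs = []
for k in range(D, len(signal)):
    past = [signal[k - 1], signal[k - 2]]               # past D samples
    predicted = sum(wi * zi for wi, zi in zip(w, past)) # filter output
    e = signal[k] - predicted      # actual current sample is the target
    w = [wi + 2 * alpha * e * zi for wi, zi in zip(w, past)]
    errs.append(abs(e))
# Prediction error shrinks as the adaline adapts.
```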