0% found this document useful (0 votes)
35 views61 pages

Lecture15 NeuronNetworks

This document discusses artificial neural networks (ANNs). It contains the following key points: 1) ANNs are composed of artificial neurons that are interconnected and inspired by biological neural systems. Each neuron has an input/output characteristic and implements local computation based on its interconnections and inputs. 2) ANNs can learn from training data by assigning and adjusting interconnection weights between neurons. The network topology, individual neuron characteristics, learning strategy, and training data determine the overall function. 3) Common applications of ANNs include pattern recognition, signal processing, control systems, and medical diagnosis.

Uploaded by

freeintro0404
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
35 views61 pages

Lecture15 NeuronNetworks

This document discusses artificial neural networks (ANNs). It contains the following key points: 1) ANNs are composed of artificial neurons that are interconnected and inspired by biological neural systems. Each neuron has an input/output characteristic and implements local computation based on its interconnections and inputs. 2) ANNs can learn from training data by assigning and adjusting interconnection weights between neurons. The network topology, individual neuron characteristics, learning strategy, and training data determine the overall function. 3) Common applications of ANNs include pattern recognition, signal processing, control systems, and medical diagnosis.

Uploaded by

freeintro0404
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 61

Artificial Intelligence

Lecturer 15 – Artificial Neuron Networks

School of Information and Communication


Technology - HUST

1
Artificial neural networks

• Artificial neural network (ANN)


• Inspired by biological neural systems, i.e., human brains
• ANN is a network composed of a number of artificial neurons

• Neuron
• Has an input/output (I/O) characteristic
• Implements a local computation

• The output of a unit is determined by


• Its I/O characteristic
• Its interconnections to other units
• Possibly external inputs

2
Artificial neural networks

• ANN can be seen as a parallel distributed information


processing structure
• ANN has the ability to learn, recall, and generalize from
training data by assigning and adjusting the interconnection
weights
• The overall function is determined by
• The network topology
• The individual neuron characteristic
• The learning/training strategy
• The training data

3
Applications of ANNs
• Image processing and computer vision
• E.g., image matching, preprocessing, segmentation and analysis, computer
vision, image compression, stereo vision, and processing and understanding of
time-varying images
• Signal processing
• E.g., seismic signal analysis and morphology
• Pattern recognition
• E.g., feature extraction, radar signal classification and analysis, speech
recognition and understanding, fingerprint identification, character recognition,
face recognition, and handwriting analysis
• Medicine
• E.g., electrocardiographic signal analysis and understanding, diagnosis of various
diseases, and medical image processing

4
Applications of ANNs
• Military systems
• E.g., undersea mine detection, radar clutter classification, and tactical speaker
recognition
• Financial systems
• E.g., stock market analysis, real estate appraisal, credit card authorization, and
securities trading
• Planning, control, and search
• E.g., parallel implementation of constraint satisfaction problems, solutions to
Traveling Salesman, and control and robotics
• Power systems
• E.g., system state estimation, transient detection and classification, fault detection
and recovery, load forecasting, and security assessment
• ...

5
Structure and operation of a neuron
• The input signals to the
neuron (xi, i = 1..m) x0=1
• Each input xi associates
with a weight wi x1 w0
w1
• The bias w0 (with the input x2 Output
x0=1) …
w2
 of the
• Net input is an integration wm neuron
xm (Out)
function of the inputs –
Net(w,x)
• Activation (transfer)
function computes the
Inputs Net Activation
output of the neuron –
f(Net(w,x)) to the input (transfer)
neuron (Net) function
• Output of the neuron: (x) (f)
Out=f(Net(w,x))

6
Net input and The bias
• The net input is typically computed using a linear function
m m
Net = w0 + w1 x1 + w2 x2 + ... + wm xm = w0 .1 +  wi xi =  wi xi
i =1 i =0
• The importance of the bias (w0)
→ The family of separation functions Net=w1x1 cannot separate the
instances into two classes
→ The family of functions Net=w1x1+w0 can

Net Net
Net = w1x1

Net = w1x1 + w0
x1 x1

7
Activation function – Hard-limiter
• Also called the threshold function 1, if Net  
Out ( Net ) = hl1( Net , ) = 
• The output of the hard-limiter is either 0, if otherwise
of the two values
•  is the threshold value
• Disadvantage: neither continuous nor Out ( Net ) = hl 2( Net , ) = sign( Net , )
continuously differentiable

Binary
Out
Bipolar
Out
hard-limiter
hard-limiter
1
1

 0 Net  0 Net

-1

8
Activation function – Threshold logic

 0, if Net  −
 1
Out ( Net ) = tl ( Net ,  , ) =  ( Net +  ), if −   Net  − 
 
 1, if 1
Net  − 
 
(α >0)
= max(0, min(1,  ( Net +  ))) Out

• It is called also saturating linear


function 1
• A combination of linear and hard-
limiter activation functions - (1/α)-
0 Net
• α decides the slope in the linear
range
1/α
• Disadvantage: continuous – but not
continuously differentiable

9
Activation function – Sigmoidal
1
Out ( Net ) = sf ( Net ,  , ) =
1 + e − ( Net + )

•Most often used in ANNs Out


•The slope parameter α is important
1
•The output value is always in (0,1)
•Advantage 0.5
• Both continuous and continuously
differentiable - 0 Net
• The derivative of a sigmoidal
function can be expressed in terms
of the function itself

10
Activation function – Hyperbolic tangent
1 − e − ( Net + ) 2
Out ( Net ) = tanh( Net ,  ,  ) = − ( Net + )
= − ( Net + )
−1
1+ e 1+ e

• Also often used in ANNs Out


• The slope parameter α is important
• The output value is always in (-1,1)
1
• Advantage
• Both continuous and continuously
- 0 Net
differentiable
• The derivative of a tanh function can -1
be expressed in terms of the function
itself

11 11
Network structure
bias
◼ Topology of an ANN is composed by:
❑ The number of input signals and input
output signals
❑ The number of layers hidden
❑ The number of neurons in each layer layer
❑ The number of weights in each neuron
output
❑ The way the weights are linked layer
together within or between the layer(s)
output
❑ Which neurons receive the (error)
correction signals • An ANN with one hidden layer
◼ Every ANN must have • Input space: 3-dimensional
❑ exactly one input layer • Output space: 2-dimensional
❑ exactly one output layer • In total, there are 6 neurons
❑ zero, one, or more than one hidden - 4 in the hidden layer

layer(s) - 2 in the output layer

12
Network structure
• A layer is a group of neurons
• A hidden layer is any layer between the input and the output layers
• Hidden nodes do not directly interact with the external environment
• An ANN is said to be fully connected if every output from one layer is
connected to every node in the next layer
• An ANN is called feed-forward network if no node output is an input to a
node in the same layer or in a preceding layer
• When node outputs can be directed back as inputs to a node in the same (or
a preceding) layer, it is a feedback network
• If the feedback is directed back as input to the nodes in the same layer, then it is
called lateral feedback
• Feedback networks that have closed loops are called recurrent networks

13
Network structure – Example
single layer single node with
feed-forward feedback to itself
network

single layer
recurrent
network

multilayer
feed-forward
network
multilayer
recurrent
network

14
Learning rules
• Two kinds of learning in neural networks
• Parameter learning
→ Focus on the update of the connecting weights in an ANN
• Structure learning
→ Focus on the change of the network structure, including the number of
processing elements and their connection types

• These two kinds of learning can be performed simultaneously


or separately
• Most of the existing learning rules are the type of parameter
learning
• We focus the parameter learning

15
General weight learning rule
• At a learning step (t) the adjustment
of the weight vector w is x0= 1
w0
proportional to the product of the
x1 a neuron
learning signal r(t) and the input x(t) w1
Out
w(t) ~ r(t).x(t) x ... wj
w
xj
w(t) = .r(t).x(t) ...
wm x Learning d
where  (>0) is the learning rate xm
signal
 generator
• The learning signal r is a function of
w, x, and the desired output d
r = g(w,x,d) Note that xj can be either:
• The general weight learning rule • an (external) input signal, or
• an output from another neuron
w(t) = .g(w(t),x(t),d(t)).x(t)

16
Perceptron
• A perceptron is the
simplest type of ANNs x0=1
x1 w0
• Use the hard-limit w1
activation function x2 Out

w2

 m 
Out = sign(Net ( w, x) ) = sign  w j x j  xm
wm
 j =0 
• For an instance x, the
perceptron output is
• 1, if Net(w,x)>0
• -1, otherwise

17
Perceptron – Illustration

The decision hyperplane


x1 w0+w1x1+w2x2=0

Output=1

Output=-1
x2

18
Perceptron – Learning
• Given a training set D= {(x,d)}
• x is the input vector
• d is the desired output value (i.e., -1 or 1)
• The perceptron learning is to determine a weight vector that makes the
perceptron produce the correct output (-1 or 1) for every training
instance
• If a training instance x is correctly classified, then no update is needed
• If d=1 but the perceptron outputs -1, then the weight w should be
updated so that Net(w,x) is increased
• If d=-1 but the perceptron outputs 1, then the weight w should be
updated so that Net(w,x) is decreased

19
Perceptron_incremental(D, η)
Initialize w (wi ← an initial (small) random value)
do
for each training instance (x,d)D
Compute the real output value Out
if (Outd)
w ← w + η(d-Out)x
end for
until all the training instances in D are correctly classified
return w

20 20
Perceptron_batch(D, η)
Initialize w (wi ← an initial (small) random value)
do
∆w ← 0
for each training instance (x,d)D
Compute the real output value Out
if (Outd)
∆w ← ∆w + η(d-Out)x
end for
w ← w + ∆w
until all the training instances in D are correctly classified
return w

21 21
Perceptron - Limitation
• The perceptron learning procedure is proven
to converge if
A perceptron cannot correctly
• The training instances are linearly separable
classify this training set!
• With a sufficiently small η used
• The perceptron may not converge if the
training instances are not linearly separable
• We need to use the delta rule
• Converges toward a best-fit approximation of
the target function
• The delta rule uses gradient descent to
search the hypothesis space (of possible
weight vectors) to find the weight vector that
best fits the training instances

22
Error (cost) function
• Let’s consider an ANN that has n output neurons
• Given a training instance (x,d), the training error made by the
currently estimated weights vector w:
n 2

Ex (w ) =  (d i − Outi )
1
2 i =1
• The training error made by the currently estimated weights
vector w over the entire training set D:
1
ED (w ) =
D
 E (w )
xD
x

23
Gradient descent
• Gradient of E (denoted as E) is a vector
• The direction points most uphill
• The length is proportional to steepness of hill

• The gradient of E specifies the direction that produces the steepest


increase in E
 E E E 

E ( w ) =  , ,..., 
 w1 w2 wN 
where N is the number of the weights in the network (i.e., N is the length of w)

• Hence, the direction that produces the steepest decrease is the negative of
the gradient of E
w = -.E(w); E
wi = − , i = 1..N
wi must be
• Requirement: The activation functions used in the network
continuous functions of the weights, differentiable everywhere

24
Gradient descent – Illustration

One-dimensional Two-dimensional
E(w) E(w1,w2)

25
Gradient_descent_incremental (D, η)
Initialize w (wi ← an initial (small) random value)
do
for each training instance (x,d)D
Compute the network output
for each weight component wi
wi ← wi – η(∂Ex/∂wi)
end for
end for
until (stopping criterion satisfied)
return w

Stopping criterion: # of iterations (epochs), threshold error, etc.

26 26
Multi-layer NNs and Back-propagation alg.
• As we have seen, a perceptron can only express a linear decision
surface
• A multi-layer NN learned by the back-propagation (BP) algorithm
can represent highly non-linear decision surfaces
• The BP learning algorithm is used to learn the weights of a multi-
layer NN
• Fixed structure (i.e., fixed set of neurons and interconnections)
• For every neuron the activation function must be continuously
differentiable
• The BP algorithm employs gradient descent in the weight update rule
• To minimize the error between the actual output values and the desired
output ones, given the training instances

27
Back-propagation algorithm (1)
• Back-propagation algorithm searches for the weights vector
that minimizes the total error made over the training set
• Back-propagation consists of the two phases
• Signal forward phase. The input signals (i.e., the input vector) are
propagated (forwards) from the input layer to the output layer (through
the hidden layers)
• Error backward phase
• Since the desired output value for the current input vector is known,
the error is computed
• Starting at the output layer, the error is propagated backwards through
the network, layer by layer, to the input layer
• The error back-propagation is performed by recursively computing the
local gradient of each neuron

28
Back-propagation algorithm (2)

Signal forward phase


• Network activation

Error backward phase


• Output error computation
• Error propagation

29
Derivation of BP alg. – Network structure
• Let’s use this 3-layer NN to
illustrate the details of the BP
learning algorithm Input xj x1 ... xj ... xm
(j=1..m)
• m input signals xj (j=1..m) wqj
• l hidden neurons zq (q=1..l)
Hidden
• n output neurons yi (i=1..n) ... ...
neuron zq Outq
• wqj is the weight of the (q=1..l)
interconnection from input signal wiq
xj to hidden neuron zq
• wiq is the weight of the Output ... ...
interconnection from hidden neuron yi
neuron zq to output neuron yi (i=1..n) Outi
• Outq is the (local) output value of
hidden neuron zq
• Outi is the network output w.r.t.
the output neuron yi

30
BP algorithm – Forward phase (1)
• For each training instance x
• The input vector x is propagated from the input layer to the output layer
• The network produces an actual output Out (i.e., a vector of Outi, i=1..n)

• Given an input vector x, a neuron zq in the hidden layer receives


a net input of

…and produces a (local) output of m


Net q =  wqj x j
j =1

where f(.) is the activation (transfer) function ofneuron zq 


m
Out q = f ( Net q ) = f   wqj x j 
 j =1 

31
BP algorithm – Forward phase (2)
• The net input for a neuron yi in the output layer is
l l  m 
Neti =  wiq Out q =  wiq f   wqj x j 
q =1 q =1  j =1 
• Neuron yi produces the output value (i.e., an output of the
network)
 l   l  m 
Outi = f ( Neti ) = f   wiq Out q  = f   wiq f   wqj x j  
 q =1 
 q =1    j =1 
• The vector of output values Outi (i=1..n) is the actual network
output, given the input vector x

32
BP algorithm – Backward phase (1)
• For each training instance x
• The error signals resulting from the difference between the desired output
d and the actual output Out are computed
• The error signals are back-propagated from the output layer to the
previous layers to update the weights

• Before discussing the error signals and their back propagation,


we first define an error (cost) function

1 n 1 n
E (w) =  (d i − Outi ) =  d i − f ( Neti )
2 2

2 i =1 2 i =1
2
1  n  l 
=  d i − f   wiq Out q 
2 i =1   q =1 

33
BP algorithm – Backward phase (2)
• According to the gradient-descent method, the weights in the hidden-
to-output connections are updated by
E
wiq = −
wiq
• Using the derivative chain rule for E/wiq, we have
 E   Outi   Neti 
wiq = −     
 =  d i − Outi  f ' (Neti ) Out q =  i Out q
 Outi   Neti   wiq 
(note that the negative sign is incorporated in E/Outi)
• i is the error signal of neuron yi in the output layer
E  E   Out 
i = − = −  = d i − Outi  f ' ( Neti )
i

Neti  Out   Netyi in the output layer, and
where Neti is the net input to ineuron i
f'(Neti)=f(Neti)/Neti

34
BP algorithm – Backward phase (3)
• To update the weights of the input-to-hidden connections, we
also follow gradient-descent method and the derivative chain
rule

E  E   Out q   Net q 
wqj = − = −    
wqj  Out q
  Net q 
  wqj 

• From the equation of the error function E(w), it is clear that
each error term (di-yi) (i=1..n) is a function of Outq
2
1  n  l 
E (w ) =  d i − f   wiq Out q 
2 i =1   q =1 

35
BP algorithm – Backward phase (4)
• Evaluating the derivative chain rule, we have
 
wqj =   (d i − Outi ) f ' ( Neti ) wiq f ' (Net q ) x j
n

i =1

 
=    i wiq f ' (Net q ) x j =  q x j
n

i =1

• q is the error signal of neuron zq in the hidden layer


E  E   Out q 
 = f ' (Net q )  i wiq
n
q = − = − 
Net q  Out q   Net q  i =1

where Netq is the net input to neuron zq in the hidden layer, and
f'(Netq)=f(Netq)/Netq

36
BP algorithm – Backward phase (5)
• According to the error equations i and q above, the error signal of
a neuron in a hidden layer is different from the error signal of a
neuron in the output layer
• Because of this difference, the derived weight update procedure is
called the generalized delta learning rule
• The error signal q of a hidden neuron zq can be determined
• in terms of the error signals i of the neurons yi (i.e., that zq connects to)
in the output layer
• with the coefficients are just the weights wiq
• The important feature of the BP algorithm: the weights update
rule is local
• To compute the weight change for a given connection, we need only the
quantities available at both ends of that connection!

37
BP algorithm – Backward phase (6)
• The discussed derivation can be easily extended to the network
with more than one hidden layer by using the chain rule
continuously
• The general form of the BP update rule is
wab = axb
• b and a refer to the two ends of the (b→a) connection (i.e., from neuron
(or input signal) b to neuron a)
• xb is the output of the hidden neuron (or the input signal) b,
• a is the error signal of neuron a

38
Back_propagation_incremental(D, η)
A network with Q feed-forward layers, q = 1,2,...,Q
qNet and qOuti are the net input and output of the ith neuron in the qth layer
i

The network has m input signals and n output neurons


qw is the weight of the connection from the jth neuron in the (q-1)th layer to the ith
ij
neuron in the qth layer
Step 0 (Initialization)
Choose Ethreshold (a tolerable error)
Initialize the weights to small random values
Set E=0
Step 1 (Training loop)
Apply the input vector of the kth training instance to the input layer (q=1)
qOut
i = 1Outi = xi(k), I
Step 2 (Forward propagation)
Propagate the signal forward through the network, until the network outputs
(in the output layer) QOuti have all been obtained
 q q −1 
q
Outi = f ( q
)
Neti = f   wij Out j 

 j 

39
Step 3 (Output error measure)
Compute the error and error signals Qi for every neuron in the output layer
1 n
E = E +  (d i( k ) − QOuti ) 2
2 i =1
Q
δi = (di(k) −QOuti )f '( QNeti )
Step 4 (Error back-propagation)
Propagate the error backward to update the weights and compute the error
signals q-1i for the preceding layers
qwij = .(qi).(q-1Outj); qw
ij = qwij + qwij
q −1
δi = f '( q −1Neti ) q w ji q δ j ; for all q = Q, Q − 1,...,2
j
Step 5 (One epoch check)
Check whether the entire training set has been exploited (i.e., one epoch)
If the entire training set has been exploited, then go to step 6; otherwise, go to step 1
Step 6 (Total error check)
If the current total error is acceptable (E<Ethreshold) then the training process terminates
and output the final weights;
Otherwise, reset E=0, and initiate the new training epoch by going to step 1

40
BP illustration – Forward phase (1)
f(Net1)

x1 f(Net4)

Out6
f(Net6)
f(Net2)

x2 f(Net5)

f(Net3)

41
BP illustration – Forward phase (2)
f(Net1)
w1x1 x1
x1 w1x2 x2 f(Net4)

Out6
f(Net6)
f(Net2)

x2 f(Net5)

f(Net3) Out1 = f (w1x1 x1 + w1x2 x2 )

42
BP illustration – Forward phase (3)
f(Net1)

x1 f(Net4)
w2 x1 x1
Out6
f(Net6)
f(Net2)

w2 x2 x2
x2 f(Net5)

f(Net3)
Out2 = f (w2 x1 x1 + w2 x2 x2 )

43
BP illustration – Forward phase (4)
f(Net1)

x1 f(Net4)

Out6
f(Net6)
f(Net2)

x2 w3 x1 x1 f(Net5)

w3 x2 x2 f(Net3)
Out3 = f ( w3 x1 x1 + w3 x2 x2 )

44
BP illustration – Forward phase (5)
f(Net1)
w41Out1
x1 w42 Out 2 f(Net4)

Out6
f(Net2)
w43 Out3 f(Net6)

x2 f(Net5)

f(Net3)
Out 4 = f ( w41Out1 + w42 Out 2 + w43Out3 )

45
BP illustration – Forward phase (6)
f(Net1)

x1 w51Out1 f(Net4)

Out6
f(Net6)
f(Net2)
w52 Out 2
x2 f(Net5)

w53 Out3
f(Net3)
Out5 = f ( w51Out1 + w52 Out 2 + w53Out3 )

46
BP illustration – Forward phase (7)
f(Net1)

x1 f(Net4)
w 64 Out4
Out6
f(Net6)
f(Net2)

w65 Out5
x2 f(Net5)

f(Net3)
Out6 = f ( w64 Out 4 + w65 Out5 )

47
BP illustration – Compute the error
f(Net1)

x1
6
f(Net4)

Out6
f(Net6)
f(Net2)

x2 f(Net5)
d is the desired
output value
f(Net3)  E   Out6 
E
6 = − = −   = d − Out6   f ' (Net 6 )
Net 6  Out6   Net6 

48
BP illustration – Backward phase (1)
f(Net1)
4
x1 f(Net4)
w64 6
Out6
f(Net6)
f(Net2)

x2 f(Net5)

f(Net3)
δ4 = f ' (Net 4 )(w64 δ6 )

49
BP illustration – Backward phase (2)
f(Net1)

x1 f(Net4)
6
Out6
f(Net6)
5
f(Net2)

w65
x2 f(Net5)

f(Net3)
δ5 = f '(Net 5 )(w65 δ6 )

50
BP illustration – Backward phase (3)
1
f(Net1)
w41 4
x1 w51 f(Net4)

Out6
f(Net6)
5
f(Net2)

x2 f(Net5)

f(Net3) δ1 = f '(Net1 )(w41δ4 + w51δ5 )

51
BP illustration – Backward phase (4)

4
f(Net1)

x1 w42 f(Net4)
2
Out6
f(Net6)
5
f(Net2) w52

x2 f(Net5)

f(Net3)
δ2 = f '(Net 2 )(w42 δ4 + w52 δ5 )

52
BP illustration – Backward phase (5)

4
f(Net1)

x1 f(Net4)

Out6
f(Net6)
f(Net2) w43
5
x2 w53 f(Net5)
3
f(Net3) δ3 = f '(Net 3 )(w43 δ4 + w53 δ5 )

53
BP illustration – Weight update (1)
1
w1x1 f(Net1)

x1 w1x2 f(Net4)

Out6
f(Net6)
f(Net2)

x2 f(Net5)

w1x1 = w1x1 + 1 x1


f(Net3)
w1x2 = w1x2 + 1 x2
54
BP illustration – Weight update (2)
f(Net1)

x1 f(Net4)

w2 x1 2 Out6
f(Net6)
f(Net2)

w2x2
x2 f(Net5)

w2 x1 = w2 x1 +  2 x1
f(Net3)
w2 x2 = w2 x2 +  2 x2

55
BP illustration – Weight update (3)
f(Net1)

x1 f(Net4)

Out6
f(Net6)
f(Net2)

x2 w3x1
3
f(Net5)

w3x2 w3 x1 = w3 x1 +  3 x1
f(Net3)
w3 x2 = w3 x2 +  3 x2

56
BP illustration – Weight update (4)
f(Net1)
w41 4
x1 w42 f(Net4)

Out6
w43 f(Net6)
f(Net2)

x2 f(Net5)

w41 = w41 +  4Out1


f(Net3) w42 = w42 +  4Out 2
w43 = w43 +  4Out3

57
BP illustration – Weight update (5)
f(Net1)

x1 f(Net4)

Out6
f(Net6)
5
f(Net2)
w 51
w52
x2 f(Net5)

w53 w51 = w51 +  5Out1


f(Net3) w52 = w52 +  5Out 2
w53 = w53 +  5Out3

58
BP illustration – Weight update (6)
f(Net1)

x1 f(Net4)
w64 6
Out6
f(Net6)
f(Net2)
w65
x2 f(Net5)

f(Net3)
w64 = w64 + ηδ6Out4
w65 = w65 + ηδ6Out5
59
Advantages vs. Disadvantages
• Advantages
• Massively parallel in nature
• Fault (noise) tolerant because of parallelism
• Can be designed to be adaptive

• Disadvantages
• No clear rules or design guidelines for arbitrary applications
• No general way to assess the internal operation of the network
(therefore, an ANN system is seen as a “black-box”)
• Difficult to predict future network performance (generalization)

60
When using ANNs?
• Input is high-dimensional discrete or real-valued
• The target function is real-valued, discrete-valued or vector-
valued
• Possibly noisy data
• The form of the target function is unknown
• Human readability of result is not (very) important
• Long training time is accepted
• Short classification/prediction time is required

61

You might also like