ENPM808F:
Robot Learning
Summer 2017
Lecture 2:
Memory-Based Learning
Course Outline
• Motor Learning and the Evolution of Intelligence
• Memory-Based Learning
• Behavior-Based Robotics
• Reinforcement Learning
• Value versus Policy Iteration
• Q-Learning and Actor-Critic Models
• Robot Shaping and Evolving Behaviors
• Crossing the Reality Gap
• Imitation and Learning from Demonstration
• Deep Reinforcement Learning with CNNs
• On-line and Lifelong Learning
Global vs. Local Learning
Global Neural Network Learning (e.g., a multilayer perceptron network) utilizes
a completely distributed weight representation, thus requiring an update of all
network weights per pattern per iteration.
Local Neural Network Learning (e.g., CMAC, RBF, LWR) utilizes a locally
distributed weight representation, thus requiring an update of only a small
subset of all network weights per pattern per iteration.
Global Learning
Advantages:
• Compact Representation
• Automatic Resource Allocation
• Generally Continuous and Differentiable Mappings
• Very High Accuracy
Disadvantages:
• Very Slow Convergence
• Unpredictable Local Minima (may not converge to the global minimum)
• Computationally Expensive
• Generalization Not Easily Controllable

Local Learning
Advantages:
• Rapid Convergence
• Computationally Inexpensive
• No Local Minima
• Convergence Guaranteed
Disadvantages:
• Memory Intensive
• Resource Allocation Not Automatic
• Continuity and Differentiability of Mapping More Difficult to Guarantee
• Comparatively Poor Accuracy (on some problems)
Secant Approximation to Tangent
Continuous CMAC
Curse of Dimensionality
vs.
Blessing of Non-Uniformity*
* --- Pedro Domingos
The Curse of Dimensionality in Machine Learning refers to the effect that
many algorithms that work well in low dimensions become intractable in
higher dimensions.
The Blessing of Non-Uniformity refers to the fact that many problems, and
therefore input spaces, which are high-dimensional can be represented
using a lower-dimensional manifold or representation.
Lazy Learning
versus
Eager Learning
Lazy versus Eager Learning
Lazy Learning methods store all of the training data and use it only
when called with a new input vector (query) to perform a mapping.
They make no assumptions about the overall shape of the global
mapping before the query is presented.
Also referred to as Instance-Based Learning methods.
Examples include: k-Nearest Neighbors, Locally Weighted Regression,
and Case-Based Reasoning.
Lazy versus Eager Learning
Eager Learning methods construct an approximate representation of the
global function before receiving a query. They are therefore limited to
generating a single global approximation to the target mapping.
Examples include: Multilayer Perceptrons (e.g., Backprop Networks),
RBFs, Decision Trees, CMAC, …
k-Nearest Neighbor
Training examples \langle x_i, f(x_i) \rangle are stored as points in the input space.
Given a query x_q, the k nearest stored points are located (in the illustrated
example, k = 6) and the prediction is the mean of their stored values:
\hat{f}(x_q) \leftarrow \frac{\sum_{i=1}^{k} f(x_i)}{k}
k-Nearest Neighbor
Requires:
• Training exemplars and queries map to points in \Re^n
• Small input vectors
• Sufficient density of training data to cover areas of interest
Advantages:
• No information is lost
• Fast training
• Can model highly complex surfaces
Disadvantages:
• Additional computational complexity to answer queries
• Weights all input attributes equally
• Suffers from the Curse of Dimensionality
k-Nearest Neighbor Algorithm
(Real-valued)
• Store all training examples \langle x_i, f(x_i) \rangle
• Given a query x_q, calculate the mean of the values of the k nearest neighbors:
  \hat{f}(x_q) \leftarrow \frac{\sum_{i=1}^{k} f(x_i)}{k}
• Note that there are no weights to update
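As an illustration only (not part of the original slides), a minimal NumPy sketch of this real-valued k-NN rule; the function name and the brute-force distance search are arbitrary choices:
```python
import numpy as np

def knn_predict(x_q, X, y, k=3):
    """Real-valued k-NN: return the mean of f over the k nearest stored exemplars."""
    dists = np.linalg.norm(X - x_q, axis=1)   # Euclidean distance to every stored x_i
    nearest = np.argsort(dists)[:k]           # indices of the k nearest neighbors
    return y[nearest].mean()                  # f_hat(x_q) = (1/k) * sum_i f(x_i)

# Stored training examples <x_i, f(x_i)>
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y_train = np.array([0.0, 0.8, 0.9, 0.1, -0.7])
print(knn_predict(np.array([1.6]), X_train, y_train, k=3))
```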
Distance Weighted
k-Nearest Neighbor Algorithm
(Real-valued)
Neighbors are weighted based upon their distance from the query point x_q.
Define the distance d(x_q, x_i) between x_q and x_i.
Define weights w_i \equiv \frac{1}{d(x_q, x_i)^2}
Then, given a query x_q,
\hat{f}(x_q) \leftarrow \frac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i}
Shepard’s Method
• Store all training examples \langle x_i, f(x_i) \rangle
• Given a query x_q, calculate the weighted sum of the values of all n points:
  \hat{f}(x_q) \leftarrow \frac{\sum_{i=1}^{n} w_i f(x_i)}{\sum_{i=1}^{n} w_i}
• Note that since all points are used, Shepard’s Method is a global learning algorithm
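A sketch of the distance-weighted rule above, with the w_i = 1/d² weighting; passing k=None uses every stored point, which corresponds to Shepard’s Method. The zero-distance guard is an added assumption, not something stated on the slide:
```python
import numpy as np

def weighted_knn_predict(x_q, X, y, k=None):
    """Distance-weighted prediction with w_i = 1/d(x_q, x_i)^2.
    k=None uses all stored points (Shepard's Method)."""
    dists = np.linalg.norm(X - x_q, axis=1)
    if np.any(dists == 0.0):                  # exact match: return the stored value(s) directly
        return y[dists == 0.0].mean()
    order = np.argsort(dists)
    idx = order if k is None else order[:k]
    w = 1.0 / dists[idx] ** 2
    return np.dot(w, y[idx]) / w.sum()        # sum(w_i f(x_i)) / sum(w_i)
```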
Nearest Neighbor vs. Locally Weighted Regression
• k-Nearest Neighbor (Discrete)
• Distance-Weighted k-Nearest Neighbor (Continuous)
• Locally Weighted Regression
Locally Weighted Regression
LWR is a Lazy Learning method in which an approximation \hat{f}(x) is
formed around each query point x_q.
• It is Local since only the points near x_q are used.
• It is Weighted since the influence of the points is determined by their
  distance from the query point.
• It is Regression since \hat{f}(x) approximates a real-valued target function.
We can approximate the target function as a linear combination of the
m weighted attributes of the training points:
\hat{f}(x) = w_0 + w_1 a_1(x) + \dots + w_m a_m(x)
Locally Weighted Linear Regression
Figure: Unweighted Averaging Using Springs vs. Locally Weighted Averaging Using Springs
Locally Weighted Linear Regression
To calculate the squared error over the k nearest neighbors, define the error function
E_1(x_q) \equiv \frac{1}{2} \sum_{x \in kNN(x_q)} \left( f(x) - \hat{f}(x) \right)^2
To weight the influence of points based upon distance, define a kernel function K(d)
that decreases with increasing distance, for example
K(d) \equiv \frac{1}{d}, \qquad K(d) \equiv \frac{1}{d^2}, \qquad K(d) \equiv e^{-d^2}
To minimize the weighted error across the entire training set D,
E_2(x_q) \equiv \frac{1}{2} \sum_{x \in D} \left( f(x) - \hat{f}(x) \right)^2 K(d(x_q, x))
Locally Weighted Linear Regression
Figure: some typical kernel functions (from Atkeson et al., 1997a)
Locally Weighted Linear Regression
To minimize the weighted error across the entire training set D:
E_2(x_q) \equiv \frac{1}{2} \sum_{x \in D} \left( f(x) - \hat{f}(x) \right)^2 K(d(x_q, x))
To minimize the weighted error across the k nearest neighbors:
E_3(x_q) \equiv \frac{1}{2} \sum_{x \in kNN(x_q)} \left( f(x) - \hat{f}(x) \right)^2 K(d(x_q, x))
The gradient-descent weight update rule can then be expressed as
\Delta w_j = \eta \sum_{x \in kNN(x_q)} K(d(x_q, x)) \left( f(x) - \hat{f}(x) \right) a_j(x)
with learning rate \eta, for each attribute a_j of the input vector x.
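The weighted error can be minimized by gradient descent as above, or solved directly at each query. The sketch below (an illustration, not the authors’ code) fits the local linear model in closed form with a Gaussian kernel, where the bandwidth tau is an assumed free parameter:
```python
import numpy as np

def lwr_predict(x_q, X, y, tau=0.5):
    """Locally weighted linear regression at a single query point x_q.
    Minimizes sum_i K(d(x_q, x_i)) * (f(x_i) - w . a(x_i))^2 in closed form."""
    A = np.hstack([np.ones((X.shape[0], 1)), X])   # attributes a(x) = (1, x_1, ..., x_m)
    d2 = np.sum((X - x_q) ** 2, axis=1)
    K = np.exp(-d2 / (2.0 * tau ** 2))             # Gaussian kernel weight per training point
    AW = A * K[:, None]                            # weight each row by its kernel value
    w = np.linalg.solve(A.T @ AW, AW.T @ y)        # weighted normal equations
    a_q = np.concatenate(([1.0], np.atleast_1d(x_q)))
    return a_q @ w                                 # local model evaluated at the query, then discarded
```
The local model is built from the stored data at query time and thrown away afterwards, which is exactly the lazy-learning pattern described earlier.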
Locally Weighted Nonlinear Regression
Locally weighted regression techniques may be extended to utilize
nonlinear local support functions (e.g., quadratic or cubic polynomials)
through the use of widely supported curve-fitting techniques from
statistical analysis.
LOESS (Locally Estimated Scatterplot Smoothing) and LOWESS (Locally
Weighted Scatterplot Smoothing) are efficient nonparametric methods for
fitting models to subsets of data.
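For instance, a LOWESS fit can be obtained from the statsmodels package (assuming it is installed); frac controls the fraction of the data used in each local fit:
```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

x = np.linspace(0.0, 10.0, 200)
y = np.sin(x) + 0.3 * np.random.randn(200)

# Returns (x, smoothed y) pairs sorted by x; each fitted value comes from a
# weighted local regression over the nearest frac * N points.
smoothed = lowess(y, x, frac=0.2)
```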
Radial Basis Function Networks
Radial Basis Function Networks (RBFs) are a type of neural network
closely related to distance-weighted regression.
They form global approximations of the target function using a linear
combination of basis functions weighted by distance from the points of
interest.
Radial Basis Function Network
Figure: network structure with input attributes a(x) = (a_1(x), a_2(x), \dots, a_n(x))
Radial Basis Function Networks
\hat{f}(x) = w_0 + \sum_{u=1}^{k} w_u K_u(d(x_u, x))
The kernel function K_u is defined so that it decreases with increasing
distance d(x_u, x).
A common choice for K_u is the Gaussian function
K_u(d(x_u, x)) = e^{-\frac{d^2(x_u, x)}{2\sigma_u^2}}
where \sigma_u^2 denotes the variance of the Gaussian centered at x_u.
RBF Network Training
RBF networks are trained in a two-stage process.
First, the number of hidden units is determined (often the same as
the number of points in the training set). Their centers are
initialized (generally to the training points), and their variances are
initialized to correspond with the chosen kernel function.
Second, the weights are trained using the global error function
E \equiv \frac{1}{2} \sum_{x \in D} \left( f(x) - \hat{f}(x) \right)^2
or a localized error function, e.g.
E_u \equiv \frac{1}{2} \sum_{x \in D} \left( f(x) - \hat{f}(x) \right)^2 K_u(d(x_u, x))
RBF Network Training
The hidden-layer nodes may alternatively be initialized using the
EM (Expectation-Maximization) Algorithm, an unsupervised
clustering technique for fitting data to a mixture of Gaussians.
Only the input vectors are used to determine the cluster centers. As
EM is an unsupervised clustering technique, the target vectors
(outputs) are not used or needed.
RBF Network Training
The outputs, however, are used to determine the weights connecting
the hidden-layer nodes to the output. Since the output(s) are a linear
combination of the hidden-unit activations, there are numerous linear
regression-based techniques for efficiently optimizing these weights.
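A compact sketch of this two-stage procedure (illustrative only): the centers are simply fixed at the training points with a shared, assumed variance sigma², and the output weights are then found by ordinary linear least squares:
```python
import numpy as np

def rbf_features(X, centers, sigma):
    """Gaussian kernel activations K_u(d(x_u, x)) plus a bias column for w_0."""
    d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    return np.hstack([np.ones((X.shape[0], 1)), K])

def train_rbf(X, y, sigma=1.0):
    """Stage 1: fix centers (here, the training points) and variances.
    Stage 2: solve for the output weights by linear least squares."""
    centers = X.copy()
    Phi = rbf_features(X, centers, sigma)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return centers, w

def rbf_predict(X, centers, w, sigma=1.0):
    return rbf_features(X, centers, sigma) @ w   # f_hat(x) = w_0 + sum_u w_u K_u
```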
Locally Weighted Learning
for Robotic Control
Locally weighted learning for control utilizes Memory-Based (Lazy)
Learning methods for constructing local models from data.
A key concept is that rather than building a global model up front,
simply store all of the data.
When a query is presented, use the data near the query point to
construct a local model to answer the query. This model is then
discarded.
Locally Weighted Learning methods specify how models may be
built using Lazy Learning methods (such as LWR), but not how
they are used to construct learning controllers.
Locally Weighted Learning
for Robotic Control
First we will consider temporally independent control tasks, such
as setpoint control and vehicle trajectory following, such that
y = f(x, u) + noise
for output vector y, state vector x, and control vector u.
The control task is to choose the control u so that the outcome is the
desired output y.
Use Lazy Learning to infer a model \hat{f} which approximates f.
Locally Weighted Learning
for Inverse Models
Inverse-model-based control techniques use states and desired
outcomes to predict the control inputs necessary to achieve
the desired outcomes:
u = \hat{f}^{-1}(x, y)
Figure: learned database implementing the inverse model
Locally Weighted Learning
for Inverse Models
Pros:
• The database is “trained” by adding new points (x, u, y)
• If there is a monotonic relationship between u and y, then there
  are efficient methods for rapidly converging on the correct mapping
Cons:
• May not work if
  Ø The vector spaces of actions and outcomes are not the same
  Ø The mapping is not one-to-one
  Ø The data include misleading noisy observations
Locally Weighted Learning
for Inverse Models
The inverse model fails to accurately predict the control action for a
desired outcome on a non-monotonic function.
Locally Weighted Learning
for Forward Models
Forward-model-based control techniques use states and control
inputs to predict outcomes:
y = \hat{f}(x, u)
Figure: learned database implementing the forward model
Locally Weighted Learning
for Forward Models
Pros:
• The database is “trained” by adding new points (x, u, y)
• Allows “mental simulation,” or prediction of the effects of different
  actions
Cons:
• Requires a search of the database to find the action that corresponds
  to the desired outcome for the current state.
Combining Inverse and
Forward Models
An Inverse Model may be used to generate a good starting point
for a search of a Forward Model:
u_0 = \hat{f}^{-1}(x, y_d)
u_0 may then be used with a Lazy Forward Model:
y_0 = \hat{f}(x, u_0)
If y_0 is close to y_d, then Newton’s Method may be used to
further refine u.
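A rough sketch of this combination under simplifying assumptions (plain nearest-neighbor lookups stand in for whatever local models are actually fit, and all names are illustrative): the inverse model proposes u_0, and the lazy forward model checks how close its predicted outcome is to y_d before any further refinement:
```python
import numpy as np

# X_mem, U_mem, Y_mem: 2-D arrays, one experienced (x_i, u_i, y_i) triple per row.

def inverse_query(x, y_d, X_mem, U_mem, Y_mem):
    """Lazy inverse model: the stored (x_i, y_i) pair closest to (x, y_d) supplies u_0."""
    d = np.linalg.norm(X_mem - x, axis=1) + np.linalg.norm(Y_mem - y_d, axis=1)
    return U_mem[np.argmin(d)]

def forward_query(x, u, X_mem, U_mem, Y_mem, k=5):
    """Lazy forward model: distance-weighted average outcome of the k most similar (x_i, u_i)."""
    d = np.linalg.norm(X_mem - x, axis=1) + np.linalg.norm(U_mem - u, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] ** 2 + 1e-9)
    return (w[:, None] * Y_mem[idx]).sum(axis=0) / w.sum()

# u0 = inverse_query(x, y_d, X_mem, U_mem, Y_mem)
# y0 = forward_query(x, u0, X_mem, U_mem, Y_mem)
# If y0 is close to y_d, refine u0 further (e.g., a Newton-style correction on the forward model).
```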
Locally Weighted Learning
for Robotic Control
Next we will consider temporally dependent control tasks such that
x(t+1) = f(x(t), u(t))
The task is then to regulate (control) the state to achieve a desired
setpoint x_d, or a sequence of values to create a trajectory
x_d(1), x_d(2), x_d(3), \dots
The database is “trained” by adding triples of the form (x_t, u_t, x_{t+1}),
where x_t is the current state, u_t is the current control input, and
x_{t+1} is the next state.
Deadbeat Control
for Devil Sticking
One-step deadbeat control chooses actions to cause the immediate
next state to be the desired next state. If the next state is attainable
in one step, the action may be chosen without regard to future
states, decisions, or performance.
Atkeson et al. (1997) applied deadbeat control to learn the devil
sticking task.
First, an Inverse Model
u_t = \hat{f}^{-1}(x_t, x_{t+1})
was learned (the database was populated from sufficient exploration).
Next, given a desired state x_d, the database was queried to determine
u_t = \hat{f}^{-1}(x_t, x_d)
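A toy sketch of the deadbeat query (illustrative names and a plain nearest-neighbor lookup; the actual system used locally weighted models). Each executed step also adds a new (x_t, u_t, x_{t+1}) triple to the memory:
```python
import numpy as np

# X_mem, U_mem, Xnext_mem: 2-D arrays, one stored transition (x_t, u_t, x_{t+1}) per row.

def deadbeat_action(x_t, x_d, X_mem, U_mem, Xnext_mem):
    """One-step deadbeat control via a lazy inverse model: pick the action of the
    stored transition whose (current state, next state) best matches (x_t, x_d)."""
    d = np.linalg.norm(X_mem - x_t, axis=1) + np.linalg.norm(Xnext_mem - x_d, axis=1)
    return U_mem[np.argmin(d)]
```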
Locally Weighted Learning
for Robotic Control
Devil sticking robot, by Professor Chris Atkeson:
YouTube Link: Devil Sticking Robot Video
Locally Weighted Learning
for Robotic Control
Deadbeat Control will fail if the dynamics of the plant being
controlled require more than a single step of lookahead.
Lazy Learning may be applied to more complex control
methods (e.g., LQR, Nonlinear Optimal Control).
The use of linear regression models often (with sufficient data)
guarantees the existence of derivatives, which may be easily
calculated from the models through numerical differentiation.
Learning methods for Nonlinear Optimal Control techniques
(e.g., value and policy iteration) often fall under Reinforcement
Learning techniques and will be visited later in the course.
References
• Albus, James S., “A New Approach to Manipulator Control: The Cerebellar Model
  Articulation Controller,” Journal of Dynamic Systems, Measurement, and Control,
  p. 225, September 1975.
• Atkeson, C. G., Moore, A. W., and Schaal, S., “Locally Weighted Learning,”
  Artificial Intelligence Review, 11.1-5 (1997): 11-73.
• Atkeson, C. G., Moore, A. W., and Schaal, S., “Locally Weighted Learning for Control,”
  Artificial Intelligence Review, 11.1-5 (1997): 75-113.
• Domingos, P., “A few useful things to know about machine learning,”
  Communications of the ACM, 55.10: 78-87, 2012.
• Mitchell, T., “The Discipline of Machine Learning,” CMU-ML-06-108
  (CMU Technical Report), July 2006.
• Mitchell, T., Machine Learning, Chapter 8, McGraw-Hill Education, 1997.
Reading Assignments
• Atkeson, C. G., Moore, A. W., and Schaal, S., “Locally Weighted Learning,”
  Artificial Intelligence Review, 11.1-5 (1997): 11-73.
• Atkeson, C. G., Moore, A. W., and Schaal, S., “Locally Weighted Learning for
  Control,” Artificial Intelligence Review, 11.1-5 (1997): 75-113.
• Brooks, Rodney A., “Intelligence without representation,” Artificial Intelligence,
  47.1 (1991): 139-159.
• Arkin, Ronald, “Motor schema based navigation for a mobile robot: An approach
  to programming by behavior,” Proceedings of the 1987 IEEE International
  Conference on Robotics and Automation, Vol. 4, IEEE, 1987.
• Wahde, Mattias, and Wolff, Krister, “Behavior-Based Robotics,” 1997.
  On-line (http://www.am.chalmers.se/~wolff/AA/Chapter3.pdf)
Homework Assignment #2
Due by 4pm, 06/20
1) Program a Discrete CMAC and train it on a 1-D function (ref: Albus 1975, Fig. 5).
   Explore the effect of overlap area on generalization and time to convergence.
2) Program a Continuous CMAC by allowing partial cell overlap, and modifying
   the weight update rule accordingly. Compare the output of the Discrete CMAC
   with that of the Continuous CMAC.
3) Discuss how you might use recurrent connections to train a CMAC to output
   a desired trajectory without using time as an input (e.g., state only).