CSC2626
Imitation Learning for Robotics
Florian Shkurti
Week 1: Behavioral Cloning vs. Imitation
New robotics faculty in CS
Jessica Burgner-Kahrs, Animesh Garg, myself, Igor Gilitschenski
Today’s agenda
• Administrivia
• Topics covered by the course
• Behavioral cloning
• Imitation learning
• Quiz about background and interests
• (Time permitting) Query the expert only when policy is uncertain
Administrivia
This is a graduate level course
Course website: http://www.cs.toronto.edu/~florian/courses/csc2626w21
Discussion forum + announcements: https://q.utoronto.ca (Quercus)
Request improvements anonymously: https://www.surveymonkey.com/r/LJJV5LY
Course-related emails should have CSC2626 in the subject
Prerequisites
Mandatory (if you're missing any of these, this is not the course for you; you're welcome to audit):
• Introductory machine learning (e.g. CSC411/ECE521 or equivalent)
• Basic linear algebra + multivariable calculus
• Intro to probability
• Programming skills in Python or C++ (enough to validate your ideas)
Recommended (if you're missing these, we can organize tutorials to help you):
• Experience training neural networks or other function approximators
• Introductory concepts from reinforcement learning or control (e.g. value function/cost-to-go)
Grading
Two assignments: 50% (individual submissions)
Course project: 50% (groups of 2-3)
• Project proposal: 10%
• Midterm progress report: 5%
• Project presentation: 5%
• Final project report (6-8 pages) + code: 30%
Project guidelines:
http://www.cs.toronto.edu/~florian/courses/csc2626w21/CSC2626_Project_Guidelines.pdf
Guiding principles for this course
Robots do not operate in a vacuum. They do not need to learn everything from scratch.
Humans need to easily interact with robots and share our expertise with them.
Robots need to learn from the behavior and experience of others, not just their own.
Main questions
How can robots incorporate others' decisions into their own?
→ Learning from demonstrations, apprenticeship learning, imitation learning
How can robots easily understand our objectives from demonstrations?
→ Reward/cost learning, task specification, inverse reinforcement learning, inverse optimal control, inverse optimization
How do we balance autonomous control and human control in the same system?
→ Shared or sliding autonomy
Applications
Any control problem where:
- writing down a dense cost function is difficult
- there is a hierarchy of interacting decision-making processes
- our engineered solutions might not cover all cases
- unrestricted exploration during learning is slow or dangerous
Examples:
- https://www.youtube.com/watch?v=M8r0gmQXm1Y
- https://www.youtube.com/watch?v=Q3LXJGha7Ws
- https://www.youtube.com/watch?v=RjGe0GiiFzw
- Robot explorer
- https://www.youtube.com/watch?v=0XdC1HUp-rU
Back to the future
https://www.youtube.com/watch?v=2KMAAmkz9go
https://www.youtube.com/watch?v=ilP4aPDTBPE
Navlab 1 (1986-1989); Navlab 2 + ALVINN (Dean Pomerleau's PhD thesis, 1989-1993)
30 x 32 pixels, 3-layer network, outputs steering command
~5 minutes of training per road type
ALVINN: architecture
https://drive.google.com/file/d/0Bz9namoRlUKMa0pJYzRGSFVwbm8/view
Dean Pomerleau’s PhD thesis
ALVINN: training set
Online updates via backpropagation
Problems Identified by Pomerleau
1. Test distribution is different from training distribution (covariate shift)
2. Catastrophic forgetting
(Partially) Addressing Covariate Shift
ALVINN augmented its training set by synthetically shifting and rotating each camera image to simulate off-center vehicle poses, with correspondingly corrected steering labels.
(Partially) Addressing Catastrophic Forgetting
1. Maintain a buffer of old (image, action) pairs
2. Experiment with different techniques to ensure diversity and avoid outliers
Behavioral Cloning = Supervised Learning
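To make "behavioral cloning = supervised learning" concrete, here is a minimal PyTorch sketch (my illustration, not Pomerleau's code): an ALVINN-style network maps a 30x32 image to a distribution over discretized steering commands and is trained with ordinary cross-entropy on expert (image, steering) pairs. All class/function names and hyperparameters are illustrative assumptions.

```python
# Minimal behavioral-cloning sketch (ALVINN-style), assuming a dataset of
# (30x32 grayscale image, discretized steering command) pairs. Illustrative only.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

N_STEERING_BINS = 30  # ALVINN used ~30 output units over steering directions

class TinyDrivingPolicy(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                        # 30x32 image -> 960 features
            nn.Linear(30 * 32, hidden),
            nn.ReLU(),
            nn.Linear(hidden, N_STEERING_BINS),  # logits over steering bins
        )

    def forward(self, images):
        return self.net(images)

def behavioral_cloning(images, steering_bins, epochs=10):
    """Plain supervised learning on expert (observation, action) pairs."""
    policy = TinyDrivingPolicy()
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    loader = DataLoader(TensorDataset(images, steering_bins), batch_size=64, shuffle=True)
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(policy(x), y)
            loss.backward()
            optimizer.step()
    return policy

# Usage (with random stand-in data):
if __name__ == "__main__":
    x = torch.rand(1000, 30, 32)                    # expert camera images
    y = torch.randint(0, N_STEERING_BINS, (1000,))  # expert steering labels
    policy = behavioral_cloning(x, y, epochs=2)
```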
25 years later
https://www.youtube.com/watch?v=qhUvQiKec2U
How much has changed?
Training is now offline, in contrast to ALVINN's online backpropagation updates.
End to End Learning for Self-Driving Cars, Bojarski et al, 2016
How much has changed?
“Our collected data is labeled with road type, weather condition, and the driver’s
activity (staying in a lane, switching lanes, turning, and so forth).”
End to End Learning for Self-Driving Cars, Bojarski et al, 2016
How much has changed?
How much has changed?
A Machine Learning Approach to Visual Perception of Forest Trails for Mobile Robots, Giusti et al., 2016
https://www.youtube.com/watch?v=umRdt3zGgpU
How much has changed?
Not a lot for learning lane following with neural networks.
But there are a few other beautiful ideas that do not involve end-to-end learning.
Visual Teach & Repeat
Human Operator or Planning Algorithm
Visual Path Following on a Manifold in Unstructured Three-Dimensional Terrain, Furgale & Barfoot, 2010
Visual Teach & Repeat
Key Idea #1: Manifold Map. Build local maps relative to the path. No global coordinate frame.
Key Idea #2: Visual Odometry. Given two consecutive images, how much has the camera moved? Relative motion.
Visual Path Following on a Manifold in Unstructured Three-Dimensional Terrain, Furgale & Barfoot, 2010
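Key Idea #2 can be made concrete with a few lines of OpenCV. This is a monocular, up-to-scale sketch of my own, not the stereo pipeline from Furgale & Barfoot; the camera intrinsics matrix `K` is an assumed input.

```python
# Sketch of frame-to-frame visual odometry with OpenCV (monocular, up-to-scale).
# Assumes a known 3x3 camera intrinsics matrix K. Illustrative only.
import cv2
import numpy as np

def relative_motion(img_prev, img_curr, K):
    """Return (R, t): rotation and unit-norm translation from img_prev to img_curr."""
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(img_prev, None)
    kp2, des2 = orb.detectAndCompute(img_curr, None)

    # Match descriptors between the two frames
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # Essential matrix + cheirality check recover the relative pose
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t  # t has unit norm: monocular VO gives translation only up to scale
```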
Visual Teach & Repeat
https://www.youtube.com/watch?v=_ZdBfU4xJnQ https://www.youtube.com/watch?v=9dN0wwXDuqo
Centimeter-level precision in tracking the demonstrated path over kilometers-long trails.
Today’s agenda
• Administrivia
• Topics covered by the course
• Behavioral cloning
• Imitation learning
• Quiz about background and interests
• (Time permitting) Query the expert only when policy is uncertain
Back to Pomerleau
(Ross & Bagnell, 2010): How are we sure these errors are not due to overfitting or underfitting?
1. Maybe the network was too small (underfitting)
2. Maybe the dataset was too small and the network overfit it
Steering commands $a = \pi(s)$, where $s$ are image features.
It was not 1: they showed that even a linear policy can work well.
It was not 2: their error on held-out data was close to training error.
The real culprit: the test distribution is different from the training distribution (covariate shift).
Imitation Learning vs. Supervised Learning
(Ross & Bagnell, 2010): IL is a sequential decision-making problem.
• Your actions affect future observations/data.
• This is not the case in supervised learning.
Imitation Learning:
• Train/test data are not i.i.d.: the test distribution is different from the training distribution (covariate shift).
• If the expected training error is $\epsilon$, the expected test error after $T$ decisions is up to $T^2\epsilon$.
• Errors compound.
Supervised Learning:
• Assumes train/test data are i.i.d.
• If the expected training error is $\epsilon$, the expected test error after $T$ decisions is $T\epsilon$.
• Errors are independent.
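To see where the $T\epsilon$ vs. $T^2\epsilon$ gap comes from, here is the standard compounding-errors argument from Ross & Bagnell (2010), restated informally from memory (not copied from the paper):

```latex
% Assume the learned policy \hat{\pi} errs with probability at most \epsilon
% on states drawn from the expert's own state distribution.
\begin{align*}
  \text{i.i.d. supervised learning:}\quad
     & \mathbb{E}[\text{cost over } T \text{ decisions}] \le T\epsilon, \\[4pt]
  \text{behavioral cloning:}\quad
     & \mathbb{E}[\text{cost over } T \text{ decisions}]
       \;\le\; \sum_{t=1}^{T} \epsilon\,(T - t + 1) \;\le\; T^2\epsilon,
\end{align*}
% because one mistake at time t can push the learner off the expert's state
% distribution, where it may keep erring for the remaining T - t steps.
```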
DAgger
(Ross & Gordon & Bagnell, 2011): DAgger, or Dataset Aggregation
• Imitation learning as interactive supervision
• Aggregate training data from expert with test data from execution
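In code, DAgger is a repeated roll-out / relabel / aggregate / retrain loop. A minimal sketch, assuming hypothetical `env`, `expert`, and `Policy` interfaces (not from the paper's implementation):

```python
# Minimal DAgger sketch. `env`, `expert`, and `Policy` are hypothetical
# interfaces assumed for illustration.
import random

def dagger(env, expert, Policy, n_iterations=10, horizon=100, beta0=1.0, decay=0.5):
    dataset = []                      # aggregated (state, expert_action) pairs
    policy = Policy()
    for i in range(n_iterations):
        beta = beta0 * (decay ** i)   # probability of executing the expert's action
        state = env.reset()
        for _ in range(horizon):
            expert_action = expert.act(state)       # always query the expert label
            dataset.append((state, expert_action))  # aggregate into one dataset
            if random.random() < beta:
                action = expert_action              # mixture policy pi_i
            else:
                action = policy.act(state)
            state, done = env.step(action)
            if done:
                break
        policy = Policy()
        policy.fit(dataset)           # retrain on everything collected so far
    return policy
```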
Imitation Learning via DAgger:
• Train/test data are not i.i.d.
• If the expected training error on the aggregated dataset is $\epsilon$, the expected test error after $T$ decisions is $O(T\epsilon)$.
• Errors do not compound.
Supervised Learning:
• Assumes train/test data are i.i.d.
• If the expected training error is $\epsilon$, the expected test error after $T$ decisions is $T\epsilon$.
• Errors are independent.
DAgger
Figure panels: initial expert trajectories, supervised learning, DAgger
https://www.youtube.com/watch?v=V00npNnWzSU
DAgger
Q: Any drawbacks of using it in a robotics setting?
DAgger
https://www.youtube.com/watch?v=hNsP6-K3Hn4
Learning Monocular Reactive UAV Control in Cluttered Natural Environments, Ross et al, 2013
Today’s agenda
• Administrivia
• Topics covered by the course
• Behavioral cloning
• Imitation learning
• Quiz about background and interests
• (Time permitting) Query the expert only when policy is uncertain
DAgger: Assumptions for theoretical guarantees
(Ross & Gordon & Bagnell, 2011): DAgger, or Dataset Aggregation
• Imitation learning as interactive supervision
• Aggregate training data from expert with test data from execution
Imitation Learning via DAgger:
• Train/test data are not i.i.d.
• If the expected training error on the aggregated dataset is $\epsilon$, the expected test error after $T$ decisions is $O(T\epsilon)$.
• Requires a strongly convex loss and a no-regret online learner.
• Errors do not compound.
Supervised Learning:
• Assumes train/test data are i.i.d.
• If the expected training error is $\epsilon$, the expected test error after $T$ decisions is $T\epsilon$.
• Errors are independent.
Appendix: No-Regret Online Learners
Intuition: no matter what the distribution of input data, your online policy/classifier will do asymptotically as well as the best-in-hindsight policy/classifier.
No-regret: $\frac{1}{N}\sum_{i=1}^{N} \ell_i(\pi_i) \;-\; \min_{\pi \in \Pi} \frac{1}{N}\sum_{i=1}^{N} \ell_i(\pi) \;\to\; 0$ as $N \to \infty$
Here the online policy $\pi_i$ only has access to data up to round $i$, while the best-in-hindsight policy in the second term has access to data up to round $N$.
Appendix: Types of Uncertainty & Query-Efficient Imitation
Let's revisit the two main ideas from query-efficient imitation:
1. DropoutDAgger:
Keep an ensemble of learner policies, and only query the expert when they significantly disagree (see the sketch after this list).
2. SHIV, SafeDAgger, MMD-IL:
(Roughly) Query the expert only if the input is too close to the decision boundary of the learner's policy.
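As referenced in idea 1 above, here is a minimal sketch of the "query the expert only when the ensemble disagrees" rule, in the spirit of DropoutDAgger but not the authors' code; `policies`, `expert`, and `threshold` are assumed interfaces/values.

```python
# Query-when-uncertain rule (DropoutDAgger-style), sketched for continuous actions.
# `policies` is an ensemble of learner policies (or dropout samples of one network).
import numpy as np

def act_or_query(state, policies, expert, threshold):
    actions = np.stack([p.act(state) for p in policies])  # one action per ensemble member
    disagreement = actions.std(axis=0).mean()              # spread across the ensemble
    if disagreement > threshold:
        # Learner is epistemically uncertain here: fall back to the expert and
        # return (state, expert action) to be added to the training set.
        expert_action = expert.act(state)
        return expert_action, (state, expert_action)
    return actions.mean(axis=0), None                       # confident: act autonomously
```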
Need to review a few concepts about different types of uncertainty.
Biased Coin
Given a sequence of coin-flip observations, how biased is the coin?
Not knowing the bias induces uncertainty in the model, or epistemic uncertainty, which asymptotically goes to 0 with infinite observations.
Biased Coin
Q: Even if you eventually discover the true model, can you predict if the next flip will be heads?
A: No, there is irreducible uncertainty / observation noise in the system. This is called aleatoric uncertainty.
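The coin example is easy to simulate: with a Beta prior on the bias, the posterior variance (epistemic uncertainty) shrinks as flips accumulate, while the variance of the next flip itself (aleatoric uncertainty) stays bounded away from zero. A minimal sketch; the true bias of 0.7 and the Beta(1, 1) prior are illustrative assumptions.

```python
# Epistemic vs. aleatoric uncertainty for a biased coin (Beta-Bernoulli model).
import numpy as np

rng = np.random.default_rng(0)
true_bias = 0.7
alpha, beta = 1.0, 1.0            # Beta(1, 1) = uniform prior over the bias

for n in [10, 100, 10_000]:
    flips = rng.random(n) < true_bias
    a, b = alpha + flips.sum(), beta + (n - flips.sum())
    posterior_mean = a / (a + b)
    posterior_var = a * b / ((a + b) ** 2 * (a + b + 1))    # epistemic: shrinks with n
    aleatoric_var = posterior_mean * (1 - posterior_mean)    # Bernoulli noise: does not vanish
    print(f"n={n:6d}  mean={posterior_mean:.3f}  "
          f"epistemic var={posterior_var:.2e}  aleatoric var={aleatoric_var:.3f}")
```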
Gaussian Process Regression
http://pyro.ai/examples/gp.html
Zero-mean prior over functions: $f \sim \mathcal{GP}(0, k(x, x'))$
Noisy observations: $y = f(x) + \epsilon$, with $\epsilon \sim \mathcal{N}(0, \sigma_n^2)$
No matter how much data we get, the observation noise $\sigma_n^2$ will not go to zero (aleatoric uncertainty).
If we get data in a region, we can reduce the model/epistemic uncertainty there.
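The same decomposition can be seen numerically. The sketch below uses scikit-learn rather than the Pyro example linked above (my substitution, for brevity): an RBF kernel encodes the prior over functions, a WhiteKernel models the observation noise, and the predictive standard deviation shrinks near the data but not below the learned noise level. Data and kernel settings are illustrative.

```python
# GP regression with explicit noise: epistemic uncertainty shrinks near data,
# aleatoric (observation) noise does not.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X_train = rng.uniform(-3, 0, size=(30, 1))                   # data only on the left half
y_train = np.sin(X_train).ravel() + 0.1 * rng.standard_normal(30)

kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel).fit(X_train, y_train)

X_test = np.array([[-1.5], [3.0]])                           # near data vs. far from data
mean, std = gp.predict(X_test, return_std=True)
print("predictive std near data:", std[0])                   # roughly the observation noise
print("predictive std far away :", std[1])                   # noise + large model uncertainty
```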
Gaussian Process Classification
Gaussian Processes for Machine Learning, chapter 2
Gaussian Process Classification vs SVM
Gaussian Processes for Machine Learning, chapter 2
GP handles uncertainty in f by averaging
while SVM considers only best f for classification.
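In symbols (standard GP-classification and SVM notation, not copied from the book):

```latex
% GP classification averages the link function over the posterior on the latent f,
% while an SVM commits to a single discriminant \hat{f}:
\begin{align*}
  \text{GP:}\quad  & p(y_* = 1 \mid x_*, \mathcal{D})
      = \int \sigma(f_*)\, p(f_* \mid x_*, \mathcal{D})\, df_* \\
  \text{SVM:}\quad & \hat{y}_* = \operatorname{sign}\!\big(\hat{f}(x_*)\big),
      \qquad \hat{f} = \arg\min_f \; \tfrac{1}{2}\|f\|^2 + C \sum_i \max\big(0,\, 1 - y_i f(x_i)\big)
\end{align*}
```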
Model Uncertainty in Neural Networks
Want the posterior predictive $p(y \mid x, D) = \int p(y \mid x, w)\, p(w \mid D)\, dw$, but it is easier to control the network weights $w$ directly.
How do we represent the posterior over network weights?
How do we quickly sample from it?
Main ideas:
1. Use an ensemble of networks trained on different bootstrap copies of D (bootstrap method)
2. Use an approximate distribution over weights, i.e. variational inference (Dropout, Bayes by Backprop, …)
3. Use MCMC to sample weights
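Main idea 2 is particularly cheap to implement: keep dropout active at test time and treat repeated stochastic forward passes as approximate posterior samples (MC dropout, Gal & Ghahramani). A minimal PyTorch sketch with an illustrative network; the per-input spread across samples is the epistemic signal that a DropoutDAgger-style rule can threshold to decide when to query the expert.

```python
# MC-dropout sketch: dropout stays active at prediction time, and the spread of
# repeated stochastic forward passes approximates model (epistemic) uncertainty.
# Network sizes and sample count are illustrative assumptions.
import torch
import torch.nn as nn

class DropoutPolicy(nn.Module):
    def __init__(self, obs_dim=10, act_dim=2, hidden=64, p=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def predict_with_uncertainty(policy, obs, n_samples=30):
    policy.train()   # keep dropout ON even though we are not training
    samples = torch.stack([policy(obs) for _ in range(n_samples)])
    return samples.mean(dim=0), samples.std(dim=0)   # action estimate, epistemic spread

# Usage: query the expert when the std exceeds a chosen threshold.
policy = DropoutPolicy()
obs = torch.randn(1, 10)
action, uncertainty = predict_with_uncertainty(policy, obs)
```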