Lecture 2: UC Berkeley CS182 Lecture Notes

Introduction to Machine Learning

Designing, Visualizing and Understanding Deep Neural Networks

CS W182/282A
Instructor: Sergey Levine
UC Berkeley
How do we formulate learning problems?
Different types of learning problems:
➢ Supervised learning: input → [object label]
➢ Unsupervised learning: unlabeled data → representation
➢ Reinforcement learning: actions → rewards
Supervised learning
Given: a dataset $\mathcal{D} = \{(x_1, y_1), \dots, (x_n, y_n)\}$ of inputs (e.g., images) paired with labels, learn a function $f_\theta(x) \approx y$ (e.g., image → [object label]).

Questions to answer:
➢ How do we represent $f_\theta$?
➢ How do we measure whether one $f_\theta$ is better than another?
➢ How do we find the best $f_\theta$?
Unsupervised learning
Given: unlabeled data, learn a representation (what does that mean?)
➢ Generative modeling: GANs, VAEs, pixel RNN, etc.
➢ Self-supervised representation learning
Reinforcement learning

Agent (e.g.)          Actions                   Observations       Rewards
an animal             muscle contractions       sight, smell       food
a robot               motor current or torque   camera images      task success measure (e.g., running speed)
a business            what to purchase          inventory levels   profit
Reinforcement learning
But many other application areas too!
➢ Education (recommend which topic to study next)
➢ YouTube recommendations!
➢ Ad placement
➢ Healthcare (recommending treatments)
[Figure: robot learning locomotion, Haarnoja et al., 2019]
Let’s start with supervised learning…
Supervised learning
Given: a dataset of inputs paired with ground truth labels (e.g., image → [object label]).

The overwhelming majority of machine learning that is used in industry is supervised learning

➢ Encompasses all prediction/recognition models trained from ground truth data


➢ Multi-billion $/year industry!
➢ Simple basic principles
Example supervised learning problems
Given an input, predict an output:

Predict…                Based on…
category of object      image
sentence in French      sentence in English
presence of disease     X-ray image
text of a phrase        audio utterance
Prediction is difficult
[Figure: handwritten digit images; each row shows the model's predicted probability for each label 0-9.]

image   0     1    2     3     4     5     6     7    8     9
5?      0%    0%   0%    0%    0%    90%   8%    0%   2%    0%
9?      4%    0%   0%    0%    11%   0%    4%    0%   6%    75%
3?      5%    0%   0%    40%   0%    30%   20%   0%   5%    0%
4?      5%    0%   0%    0%    50%   0%    3%    0%   2%    40%
0?      70%   0%   20%   0%    0%    0%    0%    0%   10%   0%

Predicting probabilities
Often makes more sense than predicting discrete labels

We’ll see later why it is also easier to learn, due to smoothness


Intuitively, we can’t change a discrete label “a tiny bit,” it’s all or nothing
But we can change a probability “a tiny bit”

Conditional probabilities
Given: $x$, a random variable representing the input (why is it a random variable? because inputs are drawn from some underlying distribution), and $y$, a random variable representing the output. We want to learn $p(y \mid x)$.

Chain rule: $p(x, y) = p(x)\,p(y \mid x)$
Definition of conditional probability: $p(y \mid x) = \dfrac{p(x, y)}{p(x)}$
How do we represent it?
A computer program takes the input (e.g., an image) and outputs a probability for each possible [object label]:

0     1    2    3    4    5     6    7    8    9
0%    0%   0%   0%   0%   90%   8%   0%   2%   0%

10 possible labels, so output 10 numbers (that are positive and sum to 1.0).
How do we represent it?
Why any function? The program can output any real-valued vector $z = f_\theta(x)$, then pass it through a function that turns those values into probabilities that are positive and sum to 1. This could be any (ideally one-to-one & onto) such function.

The exponential $\exp(z)$ is especially convenient because it's one-to-one & onto: it maps the entire real number line to the entire set of positive reals (but don't overthink it; any function with these properties would work).
How do we represent it?
$$p_\theta(y = i \mid x) = \frac{\exp(f_{\theta,i}(x))}{\sum_{j} \exp(f_{\theta,j}(x))}$$

The $\exp$ makes each output positive; dividing by the sum makes the outputs sum to 1.
There is nothing magical about this, and it's not the only way to do it. We just need to get the numbers to be positive and sum to 1!

The softmax in general
$$\operatorname{softmax}(z)_i = \frac{\exp(z_i)}{\sum_{j} \exp(z_j)}$$

Example output over 10 labels:
0     1    2    3    4    5     6    7    8    9
0%    0%   0%   0%   0%   90%   8%   0%   2%   0%
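To make this concrete, here is a minimal numpy sketch of the softmax (my illustration, not code from the lecture); subtracting the max is a standard numerical-stability trick:

```python
import numpy as np

def softmax(z):
    """Map an arbitrary real vector z to a probability vector.

    Subtracting max(z) first doesn't change the output (the constant
    cancels in numerator and denominator) but prevents overflow in exp.
    """
    z = z - np.max(z)
    e = np.exp(z)
    return e / np.sum(e)

logits = np.array([1.0, 3.0, 0.5])
print(softmax(logits))        # positive numbers that sum to 1.0
print(softmax(10 * logits))   # scaled-up logits: nearly one-hot ("hard" max)
```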
An illustration: 2D case
[Figure: softmax probabilities over a 2D input space.]
An illustration: 1D case
[Figure: along one dimension, predictions go from "definitely blue" to "not sure" near the boundary to "definitely red"; probability increases exponentially as we move away from the boundary, divided by the normalizer.]
Why is it called a softmax? Because it's a "soft" (differentiable) version of the max: as the inputs are scaled up, the output approaches a one-hot vector that puts all its probability on the largest entry ($\operatorname{softmax}(\alpha z) \to$ one-hot of $\arg\max_i z_i$ as $\alpha \to \infty$).
Loss functions
So far… a computer program takes the input and outputs [object probability]. This program has learned parameters $\theta$, so we write it as $p_\theta(y \mid x)$.
The machine learning method
for solving any problem ever
How do we represent the "program"?
1. Define your model class. We (mostly) did this in the last section (though we'll spend a lot more time on this later).
2. Define your loss function. How do we measure if one model in your model class is better than another?
3. Pick your optimizer. How do we search the model class to find the model that minimizes the loss function?
4. Run it on a big GPU.


Aside: Marr's levels of analysis
computational     "why?"    e.g., the loss function
algorithmic       "what?"   e.g., the model
implementation    "how?"    e.g., the optimization algorithm ("on which GPU?")

There are many variants on this basic idea…


Back to the recipe: step 2, define your loss function.

How is the dataset "generated"?
Assume each input is sampled from some probability distribution over photos, $x \sim p(x)$, and each label is sampled from a conditional probability distribution over labels, $y \sim p(y \mid x)$.

Training set: $\mathcal{D} = \{(x_1, y_1), \dots, (x_n, y_n)\}$, each $(x_i, y_i)$ sampled independently from $p(x)\,p(y \mid x)$.

Maximum likelihood estimation (MLE):
$$\theta^\star = \arg\max_\theta \sum_{i=1}^n \log p_\theta(y_i \mid x_i)$$

Negative log-likelihood (NLL):
$$\mathcal{L}(\theta) = -\sum_{i=1}^n \log p_\theta(y_i \mid x_i)$$

This is our loss function!
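As an illustration (a hypothetical example, not from the slides), the NLL is easy to compute given predicted probabilities and true labels:

```python
import numpy as np

# Hypothetical model outputs: predicted class probabilities for 4 examples
# over 3 classes (these numbers are made up for illustration).
probs = np.array([
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.30, 0.30, 0.40],
    [0.25, 0.50, 0.25],
])
labels = np.array([0, 1, 2, 0])  # ground-truth class indices

# NLL: sum (or mean) of -log p_theta(y_i | x_i) over the training set.
per_example = -np.log(probs[np.arange(len(labels)), labels])
print(per_example.sum())   # total NLL
print(per_example.mean())  # average NLL, the form usually minimized in practice
```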
Loss functions aside: cross-entropy
In general, the cross-entropy between two distributions $p$ and $q$ is
$$H(p, q) = -\mathbb{E}_{y \sim p}[\log q(y)] = -\sum_{y} p(y) \log q(y).$$
Examples: the NLL loss above is the cross-entropy between the empirical distribution of the training labels and the model's distribution $p_\theta(y \mid x)$.
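To make the connection explicit (a short derivation of my own, not from the slides): treating each label as a one-hot target distribution, the per-example cross-entropy is exactly the per-example NLL term.

```latex
% For a one-hot target p_i that puts all its mass on the true label y_i,
% the cross-entropy against the model reduces to a single log-probability:
H\big(p_i, p_\theta(\cdot \mid x_i)\big)
  = -\sum_{y} p_i(y) \log p_\theta(y \mid x_i)
  = -\log p_\theta(y_i \mid x_i),
% so summing per-example cross-entropies over the dataset recovers the NLL.
```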
Optimization
Back to the recipe: step 3, pick your optimizer.


The loss "landscape"
[Figure: the loss $\mathcal{L}(\theta)$ plotted as a surface over the parameters $\theta$.]

Gradient descent update:
$$\theta \leftarrow \theta - \alpha \nabla_\theta \mathcal{L}(\theta)$$
where $\alpha$ is some small constant called the "learning rate" or "step size".
Gradient descent
In 1D: negative slope = go to the right; positive slope = go to the left.

In general, the gradient is the vector of partial derivatives,
$$\nabla_\theta \mathcal{L}(\theta) = \left[\frac{\partial \mathcal{L}}{\partial \theta_1},\; \frac{\partial \mathcal{L}}{\partial \theta_2},\; \dots\right],$$
and for each dimension we go in the direction opposite the slope along that dimension.
We'll go into a lot more detail about gradient descent and related methods in a later lecture!
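In the meantime, here is a minimal sketch of the update rule on a toy quadratic loss (the loss function and numbers are illustrative, not from the lecture):

```python
import numpy as np

# A minimal sketch of gradient descent on a toy 2D loss,
# L(theta) = theta_0^2 + 5 * theta_1^2 (an illustrative stand-in
# for a real training loss).
def loss(theta):
    return theta[0] ** 2 + 5.0 * theta[1] ** 2

def grad(theta):
    # The gradient: one partial derivative per dimension.
    return np.array([2.0 * theta[0], 10.0 * theta[1]])

alpha = 0.05                      # learning rate / step size
theta = np.array([2.0, -1.5])     # arbitrary starting point
for _ in range(100):
    theta = theta - alpha * grad(theta)  # step opposite the gradient

print(theta, loss(theta))  # both should be close to zero
```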
Back to the recipe one more time: with the model class, loss function, and optimizer defined, let's work through a concrete example.


Logistic regression
Model class: $f_\theta(x) = Wx$, where $W$ is a matrix, so $p_\theta(y \mid x) = \operatorname{softmax}(Wx)$.

Special case: binary classification. With two classes,
$$p_\theta(y = 1 \mid x) = \frac{\exp(w_1^\top x)}{\exp(w_0^\top x) + \exp(w_1^\top x)} = \frac{1}{1 + \exp\!\big(-(w_1 - w_0)^\top x\big)} = \sigma\!\big((w_1 - w_0)^\top x\big).$$
This is called the logistic equation, also referred to as a sigmoid.
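Putting the pieces together, here is a small illustrative sketch (the synthetic data and hyperparameters are my own choices, not from the lecture) that trains binary logistic regression with gradient descent on the NLL:

```python
import numpy as np

# A small end-to-end sketch: binary logistic regression trained by
# gradient descent on the average NLL, using synthetic data.
rng = np.random.default_rng(0)

n = 200
X = rng.normal(size=(n, 2))
true_w = np.array([2.0, -1.0])
y = (X @ true_w + 0.5 * rng.normal(size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(2)
alpha = 0.1
for _ in range(500):
    p = sigmoid(X @ w)         # p_theta(y = 1 | x) for every example
    g = X.T @ (p - y) / n      # gradient of the average NLL w.r.t. w
    w = w - alpha * g          # gradient descent step

accuracy = np.mean((sigmoid(X @ w) > 0.5) == y)
print(w, accuracy)  # w roughly aligned with true_w (up to scale)
```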
Empirical risk and true risk
True risk: $\mathbb{E}_{x \sim p(x),\, y \sim p(y \mid x)}\big[\delta\big(f_\theta(x) \neq y\big)\big]$, where $\delta$ is 1 if wrong, 0 if right.
Empirical risk: $\frac{1}{n}\sum_{i=1}^n \delta\big(f_\theta(x_i) \neq y_i\big)$, the same quantity averaged over the training set.
Is the empirical risk a good approximation of the true risk?
Empirical risk minimization

Overfitting: when the empirical risk is low, but the true risk is high
➢ can happen if the dataset is too small
➢ can happen if the model is too powerful (has too many parameters/capacity)

Underfitting: when the empirical risk is high, and the true risk is high
➢ can happen if the model is too weak (has too few parameters/capacity)
➢ can happen if your optimizer is not configured well (e.g., wrong learning rate)

This is very important, and we will discuss this in much more detail later!
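For intuition, here is a small sketch (my own toy example, with squared error standing in for the zero-one loss above) showing empirical vs. approximate true risk as model capacity grows:

```python
import numpy as np

# Overfitting/underfitting with polynomial regression on synthetic data.
rng = np.random.default_rng(0)

def true_fn(x):
    return np.sin(2 * np.pi * x)

# Small training set: the empirical risk is computed on this.
x_train = rng.uniform(0, 1, 10)
y_train = true_fn(x_train) + 0.1 * rng.normal(size=10)

# Large held-out set: its error approximates the true risk.
x_test = rng.uniform(0, 1, 10000)
y_test = true_fn(x_test) + 0.1 * rng.normal(size=10000)

for degree in [1, 3, 9]:
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # degree 1 underfits (both risks high); degree 9 overfits
    # (empirical risk near zero, true risk high)
    print(f"degree {degree}: empirical {train_mse:.4f}, true ~ {test_mse:.4f}")
```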
Summary
1. Define your model class

2. Define your loss function

3. Pick your optimizer

4. Run it on a big GPU
