10-601
Introduction to Machine Learning
Machine Learning Department
School of Computer Science
Carnegie Mellon University
Machine Learning in Practice + k-Nearest Neighbors
Matt Gormley
Lecture 2
January 23, 2017

Intro Readings: Mitchell 1; HTF 1, 2; Murphy 1; Bishop 1
KNN Readings: Mitchell 8.2; HTF 13.3; Murphy ---; Bishop 2.5.2
Reminders
• Background Test
– Tue, Jan. 24 at 6:30pm
– Your test location depends on your
registration status – see Piazza for details
• Background Exercises (Homework 1)
– Released: Tue, Jan. 24 after the test
– Due: Mon, Jan. 30 at 5:30pm
Machine Learning & Ethics
What ethical responsibilities do we have as machine learning experts?
(Some topics that we won't cover probably deserve an entire course.)
If our search results for news are optimized for ad revenue, might they reflect gender / racial / socio-economic biases?
http://bing.com/
http://arstechnica.com/
Should restrictions be placed on
intelligent agents that are capable of
interacting with the world?
How do autonomous vehicles make
decisions when all of the outcomes
are likely to be negative?
http://vizdoom.cs.put.edu.pl/
Outline
• Defining Learning Problems
– Artificial Intelligence (AI)
– Mitchell’s definition of learning
– Example learning problems
– Data annotation
– The Machine Learning framework
• Classification
– Binary classification
– 2D examples
– Decision rules / hypotheses
• k-Nearest Neighbors (KNN)
– KNN for binary classification
– Distance functions
– Special cases
– Choosing k (covered next lecture)
This section is based on Chapter 1 of (Mitchell, 1997)
DEFINING LEARNING PROBLEMS
Artificial Intelligence
The basic goal of AI is to develop intelligent
machines.
This consists of many sub-goals:
• Perception
• Reasoning
• Control / Motion / Manipulation
• Planning
• Communication
• Creativity
• Learning
(Figure: Machine Learning shown as a subfield of Artificial Intelligence)
Amazon Go
https://www.amazon.com/b?node=16008589011
https://www.youtube.com/watch?v=NrmMk1Myrxc
Slide from Roni Rosenfeld
Artificial Intelligence (AI):
Example Tasks:
– Identify objects in an image
– Translate from one human language to another
– Recognize speech
– Assess risk (e.g. in loan application)
– Make decisions (e.g. in loan application)
– Assess potential (e.g. in admission decisions)
– Categorize a complex situation (e.g. medical diagnosis)
– Predict outcome (e.g. medical prognosis, stock prices,
inflation, temperature)
– Predict events (default on loans, quitting school, war)
– Plan ahead under perfect knowledge (chess)
– Plan ahead under partial knowledge (Poker, Bridge)
© Roni Rosenfeld, 2016
Well-Posed Learning Problems
Three components:
1. Task, T
2. Performance measure, P
3. Experience, E
Mitchell’s definition of learning:
A computer program learns if its performance
at tasks in T, as measured by P, improves with
experience E.
Definition from (Mitchell, 1997)
Example Learning Problems
(historical perspective)
1. Learning to recognize spoken words
THEN NOW
“…the SPHINX system (e.g.
Lee 1989) learns speaker-
specific strategies for
recognizing the primitive
sounds (phonemes) and
words from the observed
speech signal…neural
network methods…hidden
Markov models…”
(Mitchell, 1997)
Source: https://www.stonetemple.com/great-knowledge-box-showdown/#VoiceStudyResults
Example Learning Problems
(historical perspective)
2. Learning to drive an autonomous vehicle
THEN NOW
“…the ALVINN system
(Pomerleau 1989) has used
its learned strategies to drive
unassisted at 70 miles per
hour for 90 miles on public
highways among other
cars…”
(Mitchell, 1997)
waymo.com
Example Learning Problems
(historical perspective)
2. Learning to drive an autonomous vehicle
THEN NOW
“…the ALVINN system
(Pomerleau 1989) has used
its learned strategies to drive
unassisted at 70 miles per
hour for 90 miles on public
highways among other
cars…”
(Mitchell, 1997)
https://www.geek.com/wp-content/uploads/2016/03/uber.jpg
Example Learning Problems
(historical perspective)
3. Learning to beat the masters at board games
THEN NOW
“…the world’s top computer
program for backgammon,
TD-GAMMON (Tesauro,
1992, 1995), learned its
strategy by playing over one
million practice games
against itself…”
(Mitchell, 1997)
Example Learning Problems
3. Learning to beat the masters at chess
1. Task, T:
2. Performance measure, P:
3. Experience, E:
Example Learning Problems
4. Learning to respond to voice commands (Siri)
1. Task, T:
2. Performance measure, P:
3. Experience, E:
Capturing the Knowledge of Experts
1980 1990 2000 2010
Solution #1: Expert Systems
• Over 20 years ago, we had rule-based systems
• Ask the expert to
1. Obtain a PhD in Linguistics
2. Introspect about the structure of their native language
3. Write down the rules they devise

Example rules:
"Give me directions to Starbucks"
If: "give me directions to X"    Then: directions(here, nearest(X))
"How do I get to Starbucks?"
If: "how do i get to X"    Then: directions(here, nearest(X))
"Where is the nearest Starbucks?"
If: "where is the nearest X"    Then: directions(here, nearest(X))
Capturing the Knowledge of Experts
1980 1990 2000 2010
Solution #1: Expert Systems
• Over 20 years ago, we had rule-based systems
• Ask the expert to
1. Obtain a PhD in Linguistics
2. Introspect about the structure of their native language
3. Write down the rules they devise

But users keep phrasing the same request in new ways, and each new phrasing needs another hand-written rule:
"I need directions to Starbucks"
If: "I need directions to X"    Then: directions(here, nearest(X))
"Starbucks directions"
If: "X directions"    Then: directions(here, nearest(X))
"Is there a Starbucks nearby?"
If: "Is there an X nearby"    Then: directions(here, nearest(X))
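To make this concrete, here is a minimal sketch of what such a hand-written rule system might look like in Python. The rule patterns and the directions/nearest helpers are hypothetical stand-ins for illustration, not code from the lecture.

import re

# Hypothetical rule-based "expert system" for direction requests.
# Each pattern is a rule hand-written by an expert.
RULES = [
    r"give me directions to (?P<place>.+)",
    r"how do i get to (?P<place>.+)",
    r"where is the nearest (?P<place>.+)",
    r"i need directions to (?P<place>.+)",  # every new phrasing needs another rule
]

def directions(origin, destination):
    # Placeholder for a real routing lookup.
    return "route from %s to nearest %s" % (origin, destination)

def respond(utterance):
    text = utterance.lower().rstrip("?")
    for pattern in RULES:
        match = re.match(pattern, text)
        if match:
            return directions("here", match.group("place"))
    return "Sorry, I don't understand."  # phrasings with no rule fall through

print(respond("How do I get to Starbucks?"))    # matched by a rule
print(respond("Is there a Starbucks nearby?"))  # no rule yet, so it fails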
Capturing the Knowledge of Experts
1980 1990 2000 2010
Solution #2: Annotate Data and Learn
• Experts:
– Very good at answering questions about specific
cases
– Not very good at telling HOW they do it
• 1990s: So why not just have them tell you what
they do on SPECIFIC CASES and then let
MACHINE LEARNING tell you how to come to
the same decisions that they did
Capturing the Knowledge of Experts
1980 1990 2000 2010
Solution #2: Annotate Data and Learn
1. Collect raw sentences {x1, …, xn}
2. Experts annotate their meaning {y1, …, yn}
x1: How do I get to Starbucks?
y1: directions(here, nearest(Starbucks))
x2: Show me the closest Starbucks
y2: map(nearest(Starbucks))
x3: Send a text to John that I'll be late
y3: txtmsg(John, I'll be late)
x4: Set an alarm for seven in the morning
y4: setalarm(7:00AM)
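In code, the annotated data might simply be a list of (sentence, meaning) pairs that a learning algorithm consumes; the list structure below is an assumed representation, and the logical forms are just the slide's examples.

# Hypothetical training set: (raw sentence, expert-annotated meaning) pairs.
training_data = [
    ("How do I get to Starbucks?", "directions(here, nearest(Starbucks))"),
    ("Show me the closest Starbucks", "map(nearest(Starbucks))"),
    ("Send a text to John that I'll be late", "txtmsg(John, I'll be late)"),
    ("Set an alarm for seven in the morning", "setalarm(7:00AM)"),
]

# A learning algorithm takes these (x, y) pairs and returns a function
# that maps new, unseen sentences to the corresponding actions.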
Example Learning Problems
4. Learning to respond to voice commands (Siri)
1. Task, T:
predicting action from speech
2. Performance measure, P:
percent of correct actions taken in user pilot
study
3. Experience, E:
examples of (speech, action) pairs
Slide from Roni Rosenfeld
The Machine Learning Framework
• Formulate a task as a mapping from input to output
– Task examples will usually be pairs: (input, correct_output)
• Formulate performance as an error measure
– or more generally, as an objective function (aka Loss function)
• Examples:
– Medical Diagnosis
• mapping input to one of several classes/categories → Classification
– Predict tomorrow's Temperature
• mapping input to a number → Regression
– Chance of Survival: From patient data to p(survive >= 5 years)
• mapping input to probability → Density estimation
– Driving recommendation
• mapping input into a plan → Planning
© Roni Rosenfeld, 2016
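A minimal sketch of this framework in Python, under the assumption that a hypothesis is a function from inputs to outputs and that performance is an average loss over examples; all names here are illustrative, not from the slides.

from typing import Callable, List, Tuple, TypeVar

X = TypeVar("X")  # input type, e.g. patient features
O = TypeVar("O")  # output type, e.g. a class label or a number

def average_loss(h: Callable[[X], O],
                 examples: List[Tuple[X, O]],
                 loss: Callable[[O, O], float]) -> float:
    # Performance measure P: mean loss of hypothesis h on (input, correct_output) pairs.
    return sum(loss(h(x), y) for x, y in examples) / len(examples)

# Classification: zero-one loss (error rate).
zero_one = lambda prediction, truth: 0.0 if prediction == truth else 1.0

# Regression: squared error.
squared = lambda prediction, truth: (prediction - truth) ** 2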
Slide from Roni Rosenfeld
Choices in ML Formulation
Often, the same task can be formulated in more than one way:
• Ex. 1: Loan applications
– creditworthiness/score (regression)
– probability of default (density estimation)
– loan decision (classification)
• Ex. 2: Chess
– Nature of available training examples/experience:
• expert advice (painful to experts)
• games against experts (less painful but limited, and not much control)
• experts' games (almost unlimited, but only "found data" – no control)
• games against self (unlimited, flexible, but can you learn this way?)
– Choice of target function: board→move vs. board→score
© Roni Rosenfeld, 2016
Slide from Roni Rosenfeld
How to Approach a Machine Learning Problem
1. Consider your goal → definition of task T
– E.g. make good loan decisions, win chess competitions, …
2. Consider the nature of available (or potential) experience E
– How much data can you get? What would it cost (in money, time or effort)?
3. Choose type of output O to learn
– (Numerical? Category? Probability? Plan?)
4. Choose the Performance measure P (error/loss function)
5. Choose a representation for the input X
6. Choose a set of possible solutions H (hypothesis space)
– set of functions h: X → O
– (often, by choosing a representation for them)
7. Choose or design a learning algorithm
– for using examples (E) to converge on a member of H that optimizes P
© Roni Rosenfeld, 2016
CLASSIFICATION
Fisher Iris Dataset
Fisher (1936) used measurements of 150 flowers from 3 different species: Iris setosa (0), Iris virginica (1), and Iris versicolor (2), collected by Anderson (1936)
Species   Sepal Length (cm)   Sepal Width (cm)   Petal Length (cm)   Petal Width (cm)
0 4.3 3.0 1.1 0.1
0 4.9 3.6 1.4 0.1
0 5.3 3.7 1.5 0.2
1 4.9 2.4 3.3 1.0
1 5.7 2.8 4.1 1.3
1 6.3 3.3 4.7 1.6
1 6.7 3.0 5.0 1.7
Full dataset: https://en.wikipedia.org/wiki/Iris_flower_data_set
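For readers who want to reproduce these numbers, here is a short sketch that loads the full dataset with scikit-learn (assuming it is installed); variable names are illustrative.

from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data    # shape (150, 4): sepal length, sepal width, petal length, petal width
y = iris.target  # shape (150,): species labels 0, 1, 2
                 # (note: scikit-learn's 0/1/2 numbering may differ from the table above)

# Keep just two features for an easy 2D view of the classification problem.
X2 = X[:, :2]
print(X2[:3], y[:3])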
Fisher Iris Dataset
Classification
Whiteboard:
– Binary classification
– 2D examples
– Decision rules / hypotheses
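To preview the whiteboard discussion, a decision rule (hypothesis) for binary classification is just a function from a feature vector to a label. The threshold and feature choice below are made up for illustration, not taken from the lecture.

# A hypothesis h: R^2 -> {0, 1} given by a simple hand-picked decision rule.
# Hypothetical rule: predict class 1 if the petal length exceeds a threshold.
def h(x, threshold=2.5):
    sepal_length, petal_length = x
    return 1 if petal_length > threshold else 0

print(h((4.9, 1.4)))  # -> 0
print(h((6.3, 4.7)))  # -> 1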
K-NEAREST NEIGHBORS
k-Nearest Neighbors
Whiteboard:
– KNN for binary classification
– Distance functions
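A minimal sketch of KNN for binary classification with a Euclidean distance function, to make the whiteboard material concrete; this is a generic illustration (function and variable names are my own), not code from the course.

import math
from collections import Counter

def euclidean(a, b):
    # One common choice of distance function.
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(x, train, k=3, dist=euclidean):
    # train is a list of (feature_vector, label) pairs with labels in {0, 1}.
    neighbors = sorted(train, key=lambda pair: dist(x, pair[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]  # majority vote among the k nearest

train = [((4.3, 1.1), 0), ((4.9, 1.4), 0), ((5.7, 4.1), 1), ((6.3, 4.7), 1)]
print(knn_predict((5.0, 1.3), train, k=3))  # -> 0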
Takeaways
• Learning Problems
– Defining a learning problem is tricky
– Formalizing exposes the many possibilities
• k-Nearest Neighbors
– KNN is an extremely simple algorithm for
classification