Artificial Intelligence & Machine Learning
Prof. Sankhadeep Chatterjee
Department of Computer Science and Technology
&
Department of Computer Science and Information Technology
University of Engineering and Management, Kolkata
Outline
• Types of Problems (Recap)
• Classifiers
• K-Nearest Neighbour
Machine Learning?
[Diagram: in Traditional Programming, data and a hand-written program produce the output; in Machine Learning, data and the desired output are used to learn the program (model)]
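The contrast in the diagram can be sketched in a few lines. This is an illustrative example, not from the lecture: the "tall person" rule, its 170 cm threshold, and the toy data are all hypothetical; the point is only that the traditional rule is hand-coded while the learned rule is derived from labeled examples.

```python
# Traditional programming: the rule is written by hand.
def is_tall_handcoded(height_cm):
    return height_cm >= 170  # threshold chosen by the programmer

# Machine learning: the rule (here, a threshold) is learned from
# data together with the desired outputs (labels).
def learn_threshold(examples):
    """examples: list of (height_cm, label) pairs, label True = 'tall'."""
    tall = [h for h, label in examples if label]
    short = [h for h, label in examples if not label]
    # Midpoint between the two groups serves as the learned boundary.
    return (min(tall) + max(short)) / 2

data = [(160, False), (165, False), (175, True), (180, True)]
threshold = learn_threshold(data)  # learned from data, not hand-coded
print(threshold)                   # 170.0
print(is_tall_handcoded(172), 172 >= threshold)
```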
Classification (Data)
• A collection of data objects and their attributes
• An attribute is a property or characteristic of an object
  – Examples: eye color of a person, temperature, etc.
  – An attribute is also known as a variable, field, characteristic, dimension, or feature
• A collection of attributes describes an object
  – An object is also known as a record, point, case, sample, entity, or instance

In the table below, each row is one object and each column is an attribute; the last column, buys_computer, is the target:

age    income  student credit_rating buys_computer
<=30   high    no      fair          no
<=30   high    no      excellent     no
31…40  high    no      fair          yes
>40    medium  no      fair          yes
>40    low     yes     fair          yes
>40    low     yes     excellent     no
31…40  low     yes     excellent     yes
<=30   medium  no      fair          no
<=30   low     yes     fair          yes
>40    medium  yes     fair          yes
<=30   medium  yes     excellent     yes
31…40  medium  no      excellent     yes
31…40  high    yes     fair          yes
>40    medium  no      excellent     no
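The objects-and-attributes view above maps naturally onto a list of records in code. A minimal sketch, with column names taken from the table (only the first two objects are shown; the rest follow the same pattern):

```python
# Each object (record) is a dict of attribute -> value; the attribute
# buys_computer is the target the classifier must learn to predict.
attributes = ["age", "income", "student", "credit_rating"]
target = "buys_computer"

records = [
    {"age": "<=30", "income": "high", "student": "no",
     "credit_rating": "fair", "buys_computer": "no"},
    {"age": "31…40", "income": "high", "student": "no",
     "credit_rating": "fair", "buys_computer": "yes"},
    # ... remaining rows of the table ...
]

# A collection of attribute values describes one object.
print(len(records), [r[target] for r in records])
```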
Classifier
[Diagram: Training Data → Classification Algorithm → Classifier (Model)]

Training Data:

age    income  student credit_rating buys_computer
<=30   high    no      fair          no
<=30   high    no      excellent     no
31…40  high    no      fair          yes
>40    medium  no      fair          yes
>40    low     yes     fair          yes
>40    low     yes     excellent     no
31…40  low     yes     excellent     yes
<=30   medium  no      fair          no
<=30   low     yes     fair          yes
>40    medium  yes     fair          yes
<=30   medium  yes     excellent     yes
31…40  medium  no      excellent     yes
31…40  high    yes     fair          yes
>40    medium  no      excellent     no
Testing Classifier
[Diagram: Testing Data / Unseen Data → Classifier → predicted label]

Testing Data:

age    income  student credit_rating buys_computer
<=30   high    no      fair          no
<=30   high    no      excellent     no
31…40  high    no      fair          yes
>40    medium  no      fair          yes
>40    low     yes     fair          yes
>40    low     yes     excellent     no
31…40  low     yes     excellent     yes
<=30   medium  no      fair          no
<=30   low     yes     fair          yes
>40    medium  yes     fair          yes
<=30   medium  yes     excellent     yes
31…40  medium  no      excellent     yes
31…40  high    yes     fair          yes
>40    medium  no      excellent     no

Unseen data: (>40, high, yes, excellent). Buys computer?
Instance-Based Classifiers
• Examples:
  – Rote learner
    • Memorizes the entire training data and classifies a record only if its attributes exactly match one of the training examples
  – Nearest neighbor
    • Uses the k "closest" points (nearest neighbors) to classify a record
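The rote learner above can be sketched directly: it stores the training examples verbatim and answers only on an exact match. The attribute tuples below are two rows from the earlier buys_computer table.

```python
# Rote learner (sketch): memorize the training data; classify a record
# only when its attributes exactly match a memorized example.
def rote_learn(training):
    """training: list of (attribute_tuple, label) pairs."""
    return dict(training)  # memorize everything verbatim

def rote_classify(memory, record):
    # Returns the stored label on an exact match, otherwise None:
    # the rote learner cannot generalize to unseen attribute values.
    return memory.get(record)

memory = rote_learn([
    (("<=30", "high", "no", "fair"), "no"),
    ((">40", "medium", "no", "fair"), "yes"),
])
print(rote_classify(memory, ("<=30", "high", "no", "fair")))      # no
print(rote_classify(memory, (">40", "high", "yes", "excellent")))  # None
```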
Visualization
• Consider the problem of classifying people based on weight and height

Weight Height Class
65     156    N
56     152    N
42     143    S
72     151    N
…

[Scatter plot: Weight on the x-axis, Height on the y-axis, with the unknown record marked among the labeled points]
Nearest-Neighbor Classifiers
Unknown record Requires three things
– The set of labeled records
– Distance Metric to compute
distance between records
– The value of k, the number of
nearest neighbors to retrieve
To classify an unknown record:
– Compute distance to other
training records
– Identify k nearest neighbors
– Use class labels of nearest
neighbors to determine the
class label of unknown record
(e.g., by taking majority vote)
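The three steps above can be sketched as a small function. The weight/height records are in the spirit of the earlier visualization slide; the query points are hypothetical.

```python
import math
from collections import Counter

# k-NN (sketch): compute distances to all labeled records, keep the
# k nearest, and majority-vote their class labels.
def knn_classify(labeled_records, unknown, k):
    """labeled_records: list of (point, label); point: tuple of numbers."""
    def euclidean(p, q):
        return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

    # Steps 1-2: distance to every training record, keep the k smallest.
    neighbors = sorted(labeled_records,
                       key=lambda rec: euclidean(rec[0], unknown))[:k]
    # Step 3: majority vote over the neighbors' class labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((65, 156), "N"), ((56, 152), "N"),
         ((42, 143), "S"), ((72, 151), "N")]
print(knn_classify(train, (44, 145), k=1))  # S (closest record is the S one)
print(knn_classify(train, (70, 150), k=3))  # N (all three nearest are N)
```

Note that for even k the vote can tie; odd k (or a tie-breaking rule such as preferring the closer neighbor) avoids this.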
K-Nearest Neighbour
[Figure: the (a) 1-nearest, (b) 2-nearest, and (c) 3-nearest neighbors of a record x]

The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.
K-Nearest Neighbour
• Compute the distance between two points:
  – Euclidean distance:

    d(p, q) = √( Σᵢ (pᵢ − qᵢ)² )

• Determine the class from the nearest-neighbor list
  – Take the majority vote of class labels among the k nearest neighbors
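The Euclidean distance formula translates directly to code; a minimal sketch, checked against the worked value 0.25 that appears on a later slide:

```python
import math

# Euclidean distance between two points p and q with the same number
# of attributes: square root of the sum of squared coordinate differences.
def euclidean(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

print(euclidean((4.9, 1.5), (4.9, 1.75)))  # 0.25
```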
K-Nearest Neighbour
Test Data = (4.9, 1.75)

Attribute_1 Attribute_2 class
4.9         1.5         1
4.6         1.5         1
4.7         1.6         1
4.8         1.8         1
5           1.7         1
5.1         1.6         1
4.7         1.5         1
5.1         1.9         2
5           1.5         2
4.9         1.8         2
5.1         1.5         2
5           1.9         2
5.1         1.8         2
K-Nearest Neighbour
Test Data = (4.9, 1.75)

Attribute_1 Attribute_2 class dist
4.9         1.5         1     0.25
4.6         1.5         1     0.39
4.7         1.6         1     0.25
4.8         1.8         1     0.11
5           1.7         1     0.11
5.1         1.6         1     0.25
4.7         1.5         1     0.32
5.1         1.9         2     0.25
5           1.5         2     0.27
4.9         1.8         2     0.05
5.1         1.5         2     0.32
5           1.9         2     0.18
5.1         1.8         2     0.21

For the first row, for example:
dᵢ = √((4.9 − 4.9)² + (1.75 − 1.5)²) = √((0)² + (0.25)²) = 0.25
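The whole worked example, distance column and final vote, can be reproduced in a few lines using the values from the table:

```python
import math
from collections import Counter

# Training records from the table: (attribute_1, attribute_2, class).
data = [
    (4.9, 1.5, 1), (4.6, 1.5, 1), (4.7, 1.6, 1), (4.8, 1.8, 1),
    (5.0, 1.7, 1), (5.1, 1.6, 1), (4.7, 1.5, 1), (5.1, 1.9, 2),
    (5.0, 1.5, 2), (4.9, 1.8, 2), (5.1, 1.5, 2), (5.0, 1.9, 2),
    (5.1, 1.8, 2),
]
test = (4.9, 1.75)

# Euclidean distance of each record to the test point (the dist column).
dists = [(math.sqrt((a1 - test[0]) ** 2 + (a2 - test[1]) ** 2), c)
         for a1, a2, c in data]

for d, c in sorted(dists)[:3]:
    print(round(d, 2), c)  # three nearest: 0.05 (class 2), then 0.11, 0.11 (class 1)

# Majority vote among the k = 3 nearest neighbors.
k = 3
votes = Counter(c for _, c in sorted(dists)[:k])
print(votes.most_common(1)[0][0])  # 1
```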
K-Nearest Neighbour
• Sorting the records by dist, the three smallest distances are 0.05 (class 2), 0.11 (class 1), and 0.11 (class 1)
• With k = 3, the majority vote among these class labels {2, 1, 1} is class 1
• The test data (4.9, 1.75) is therefore classified as class 1
Thank You