INTRODUCTION TO MACHINE LEARNING
K-NEAREST NEIGHBOR ALGORITHM
        Mingon Kang, PhD
        Department of Computer Science @ UNLV
KNN
   K-Nearest Neighbors (KNN)
   Simple, but a very powerful classification algorithm
   Classifies based on a similarity measure
   Non-parametric
   Lazy learning
     Does not “learn” until a test example is given
     Whenever we have a new example to classify, we find its
      K nearest neighbors from the training data
Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors
KNN: Classification Approach
   Classifies by a “MAJORITY VOTE” of its neighbors’
    classes
     A test example is assigned to the most common class amongst its
      K nearest neighbors (found by measuring the “distance” between
      data points)
Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors
KNN: Example
Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors
KNN: Pseudocode
Ref: https://www.slideshare.net/PhuongNguyen6/text-categorization
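Since the pseudocode figure is not reproduced here, the following is a minimal sketch of the KNN classification procedure in Python/NumPy; the function and variable names are illustrative, not taken from the slides.

import numpy as np

def knn_classify(X_train, y_train, x_test, k=3):
    """Classify one test example by a majority vote of its k nearest neighbors."""
    # Euclidean distance from the test example to every training example
    dists = np.sqrt(np.sum((X_train - x_test) ** 2, axis=1))
    # Indices of the k closest training examples
    nearest = np.argsort(dists)[:k]
    # Majority vote among their labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy usage
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [6.0, 6.0], [5.8, 6.2]])
y_train = np.array([0, 0, 1, 1])
print(knn_classify(X_train, y_train, np.array([5.5, 5.9]), k=3))  # -> 1

Note that, because the algorithm is lazy, all the work happens at classification time; there is no training step beyond storing the data.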
KNN: Example
Ref: http://www.scholarpedia.org/article/K-nearest_neighbor
KNN: Euclidean distance matrix
Ref: http://www.scholarpedia.org/article/K-nearest_neighbor
Decision Boundaries
   Voronoi diagram
     Describes  the areas that are nearest to any given point,
      given a set of data.
     Each line segment is equidistant between two points of
      opposite class
Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors
Decision Boundaries
   With a large number of examples and possible noise
    in the labels, the decision boundary can become
    nasty!
     The “overfitting” problem
Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors
Effect of K
   A larger K produces a smoother decision boundary
   When K == N, KNN always predicts the majority class
Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors
Discussion
   Which model is better between K=1 and K=15?
   Why?
How to choose k?
   Empirically optimal k?
Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors
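One common way to choose k empirically is cross-validation: evaluate several candidate values and keep the one with the best validation accuracy. The sketch below is an assumption of mine (it uses scikit-learn and the iris dataset, neither of which appears in the slides).

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Evaluate several candidate values of k with 5-fold cross-validation
scores = {}
for k in [1, 3, 5, 7, 9, 15]:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)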
Pros and Cons
   Pros
     Learning and implementation are extremely simple and
      intuitive
     Flexible decision boundaries
   Cons
     Irrelevant or correlated features have a high impact and
      must be eliminated
     Typically difficult to handle high dimensionality
     Computational cost: memory and classification-time
      computation
Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors
Similarity and Dissimilarity
   Similarity
     Numerical measure of how alike two data objects are.
     Is higher when objects are more alike.
     Often falls in the range [0,1]
   Dissimilarity
     Numerical measure of how different two data objects are
     Lower when objects are more alike
     Minimum dissimilarity is often 0
     Upper limit varies
   Proximity refers to a similarity or dissimilarity
Euclidean Distance
   Euclidean Distance
               $\mathrm{dist} = \sqrt{\sum_{k=1}^{p} (a_k - b_k)^2}$
Where p is the number of dimensions (attributes) and
$a_k$ and $b_k$ are, respectively, the k-th attributes
(components) of data objects a and b.
   Standardization is necessary, if scales differ.
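As a quick illustration of the formula, here is a small NumPy sketch (the vectors are made up for this example) computing the Euclidean distance between two p-dimensional points.

import numpy as np

a = np.array([3.0, 2.0, 0.0, 5.0])
b = np.array([1.0, 0.0, 0.0, 1.0])

# dist = sqrt( sum_k (a_k - b_k)^2 )
dist = np.sqrt(np.sum((a - b) ** 2))
print(dist)  # equivalent to np.linalg.norm(a - b)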
Euclidean Distance
Minkowski Distance
   Minkowski Distance is a generalization of Euclidean
    Distance
               $\mathrm{dist} = \left( \sum_{k=1}^{p} |a_k - b_k|^r \right)^{1/r}$
Where r is a parameter, p is the number of dimensions
(attributes), and $a_k$ and $b_k$ are, respectively, the k-th
attributes (components) of data objects a and b.
Minkowski Distance: Examples
   r = 1. City block (Manhattan, taxicab, L1 norm) distance.
       A common example of this is the Hamming distance, which is just
        the number of bits that are different between two binary vectors
   r = 2. Euclidean distance
   r →∞. “supremum” (𝐿𝑚𝑎𝑥 norm, 𝐿∞ norm) distance.
       This is the maximum difference between any component of the
        vectors
   Do not confuse r with p, i.e., all these distances are defined
    for all numbers of dimensions.
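To make the three special cases concrete, the sketch below (an illustrative addition, assuming NumPy) computes the L1, L2, and L∞ distances for the same pair of vectors.

import numpy as np

a = np.array([0.0, 3.0])
b = np.array([4.0, 0.0])

l1   = np.sum(np.abs(a - b))         # r = 1: city block / Manhattan -> 7.0
l2   = np.sqrt(np.sum((a - b) ** 2)) # r = 2: Euclidean              -> 5.0
linf = np.max(np.abs(a - b))         # r -> infinity: supremum       -> 4.0
print(l1, l2, linf)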
Cosine Similarity
   If 𝑑1 and 𝑑2 are two document vectors
            $\cos(d_1, d_2) = \dfrac{d_1 \cdot d_2}{\|d_1\| \, \|d_2\|}$,
Where $\cdot$ indicates the vector dot product and $\|d\|$ is the length of vector d.
   Example:
           𝑑1 = 3 2 0 5 0 0 0 2 0 0
           𝑑2 = 1 0 0 0 0 0 0 1 0 2
𝑑1 ∙ 𝑑2 = 3*1 + 2*0 + 0*0 + 5*0 + 0*0 + 0*0 + 0*0 + 2*1 + 0*0 + 0*2 = 5
||𝑑1|| = (3*3+2*2+0*0+5*5+0*0+0*0+0*0+2*2+0*0+0*0)^0.5 = (42)^0.5 = 6.481
||𝑑2|| = (1*1+0*0+0*0+0*0+0*0+0*0+0*0+1*1+0*0+2*2)^0.5 = (6)^0.5 = 2.449
cos(d1, d2) = 5 / (6.481 * 2.449) = 0.3150
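The arithmetic above can be checked with a short NumPy sketch (illustrative only, not part of the original slides).

import numpy as np

d1 = np.array([3, 2, 0, 5, 0, 0, 0, 2, 0, 0], dtype=float)
d2 = np.array([1, 0, 0, 0, 0, 0, 0, 1, 0, 2], dtype=float)

# cos(d1, d2) = (d1 . d2) / (||d1|| * ||d2||)
cos = d1 @ d2 / (np.linalg.norm(d1) * np.linalg.norm(d2))
print(cos)  # ~0.315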
Cosine Similarity
   $\cos(d_1, d_2) = \begin{cases} 1 & \text{exactly the same} \\ 0 & \text{orthogonal} \\ -1 & \text{exactly opposite} \end{cases}$
Feature scaling
   Standardize the range of independent variables
    (features of data)
   A.k.a. Normalization or Standardization
Standardization
   Standardization or Z-score normalization
     Rescale the data so that the mean is zero and the
      standard deviation is one (standard scores)
                       $x_{\mathrm{norm}} = \dfrac{x - \mu}{\sigma}$
       where $\mu$ is the mean and $\sigma$ is the standard deviation
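A minimal sketch of z-score standardization, assuming the features are the columns of a NumPy array (the data values are made up).

import numpy as np

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

# Standardize each column: zero mean, unit standard deviation
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_std.mean(axis=0), X_std.std(axis=0))  # ~[0, 0] and [1, 1]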
Min-Max scaling
   Scale the data to a fixed range – between 0 and 1
                $x_{\mathrm{norm}} = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}$
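The corresponding min-max sketch, mapping each column to [0, 1] (illustrative; it ignores the degenerate case where a column is constant).

import numpy as np

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

# Rescale each column to the range [0, 1]
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_minmax)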
Efficient implementation
   Consider data as a matrix or a vector
   Matrix/vector computation is much more efficient
    than computing with loops, as in the sketch below
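For instance, all test-to-training Euclidean distances can be computed in one vectorized NumPy expression instead of two nested Python loops (a sketch under the assumption that rows are examples and columns are features; the array sizes are arbitrary).

import numpy as np

X_train = np.random.rand(1000, 10)   # 1000 training examples, 10 features
X_test  = np.random.rand(200, 10)    # 200 test examples

# Broadcasting builds the full 200 x 1000 distance matrix at once,
# replacing a loop over test examples and a loop over training examples.
diff  = X_test[:, None, :] - X_train[None, :, :]
dists = np.sqrt(np.sum(diff ** 2, axis=2))
print(dists.shape)  # (200, 1000)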
Discussion
   Can we use KNN for regression problems?