AN FPGA-BASED HARDWARE ACCELERATOR FOR K-NEAREST
NEIGHBOR CLASSIFICATION FOR MACHINE LEARNING
by
MOKHLES AAMEL MOHSIN
B.S., University of Technology, 2010
A Thesis submitted to the Graduate Faculty of the
University of Colorado Colorado Springs
in partial fulfillment of the
requirements for the degree of
Master of Science
Department of Electrical and Computer Engineering
2017
© 2017
MOKHLES AAMEL MOHSIN
All Rights Reserved
This thesis for the Master of Science degree by
Mokhles Aamel Mohsin
has been approved for the
Department of Electrical and Computer Engineering
By
Darshika G. Perera, Chair
T.S. Kalkur
John Lindsey
Date 11/15/2017
Mohsin, Mokhles Aamel (M.S., Electrical Engineering)
AN FPGA-BASED HARDWARE ACCELERATOR FOR K-NEAREST NEIGHBOR
CLASSIFICATION FOR MACHINE LEARNING.
Thesis directed by Assistant Professor Darshika G. Perera.
ABSTRACT
Machine learning has become one of the cornerstones of information technology.
Machine learning is a type of artificial intelligence (AI) that enables a program (or a
computer) to learn without being explicitly programmed and without human intervention.
Machine learning algorithms can be categorized into supervised learning (classification)
and unsupervised learning (clustering).
Among many classification algorithms, the K-nearest neighbor (K-NN) classifier is
one of the most commonly used machine learning algorithms. This algorithm is typically
used for pattern recognition, data mining, image recognition, text categorization, and
predictive analysis.
Many machine learning algorithms, including K-NN classification, are compute- and
data-intensive, requiring significant processing power. For instance, the K-NN algorithm consists of
many complex and iterative computations, which include computing the distance measure
between the training dataset and the testing dataset, element by element, and
simultaneously sorting these measurements.
In this research work, our main objective is to investigate and provide an efficient
hardware architecture to accelerate the K-NN algorithm, while addressing the associated
requirements and constraints. Our hardware architecture is designed and developed on an
FPGA-based (Field Programmable Gate Array–based) development platform. We perform
experiments to evaluate our hardware architecture, in terms of speed-performance, space,
and accuracy. Our hardware architecture is also evaluated against its software counterpart
running on the same development platform. Experiments are performed using three
different benchmark datasets with varying data sizes.
We introduce unique techniques, including pre-fetching and burst transfers, to
reduce the external memory access latency issue that is common in embedded platforms.
We also address various issues with the hardware support for the K-NN algorithm proposed
in the existing literature.
This investigation demonstrates that machine learning algorithms can indeed
benefit from hardware support. Our proposed hardware architecture is generic and
parameterized. Our hardware design is scalable to support varying data sizes. Our
experimental results and analysis illustrate that our proposed hardware is significantly
faster; for instance, our hardware architecture executed up to 127 times faster than its
software counterpart.
Dedicated to my family
ACKNOWLEDGEMENTS
I would like to thank my advisor, Dr. Darshika G. Perera, for giving me an
opportunity to conduct this research work.
I am also thankful to the thesis panel members, Dr. T.S. Kalkur and Dr. John Lindsey, for
setting aside time from their academic and professional commitments to review and
evaluate my research work.
A special thanks to my fellow researcher, S. Navid Shahrouzi, for providing significant
help during my research.
I would also like to give special thanks to my family for supporting me academically.
TABLE OF CONTENTS
CHAPTER
I. INTRODUCTION ............................................................................................................1
1.1 Our research objective ............................................................................................... 4
1.2 Thesis organization ................................................................................................... 4
II. BACKGROUND .............................................................................................................6
2.1 Machine learning ....................................................................................................... 6
2.2 Classification ............................................................................................................. 8
2.3 K-NN Classifier......................................................................................................... 9
2.4 Existing research work for hardware support for K-NN classifier ......................... 12
III. DESIGN APPROACH AND DEVELOPMENT PLATFORM ..................................18
3.1 Hierarchical Platform-based Design Approach....................................................... 18
3.2 System-Level Interface ........................................................................................... 19
3.2.1 MicroBlaze Soft Processor Core ...................................................................... 20
3.2.2 DDR3 Controller .............................................................................................. 20
3.2.3 DDR3-SDRAM External Memory ................................................................... 21
3.2.4 AXI Interconnect .............................................................................................. 22
3.2.5 AXI Timer ........................................................................................................ 22
3.2.6 UART Lite ........................................................................................................ 22
3.2.7 IP Version and Memory Map ........................................................................... 23
3.3 Experimental Platform ............................................................................................ 23
3.4 Benchmark Dataset ................................................................................................. 25
IV. EFFICIENT EMBEDDED ARCHITECTURE FOR K-NN ALGORITHM ..............26
4.1 Embedded Software Architecture ........................................................................... 26
4.2 Embedded Hardware Architecture .......................................................................... 30
4.2.1 Step 1 normalization ......................................................................................... 32
4.2.2 Step 2 distance measurement ............................................................................ 34
4.2.3 Step 3 sorting .................................................................................... 36
4.2.4 Step 4 new class determination ........................................................................ 37
4.3 Xilinx LogiCORE CoreGen IPs .............................................................................. 40
4.3.1 Multiplier .......................................................................................................... 40
4.3.2 Subtractor.......................................................................................................... 41
4.3.3 Adder ................................................................................................................ 41
4.3.4 Divider .............................................................................................. 42
4.3.5 Square root ........................................................................................ 42
4.3.6 Comparator ....................................................................................... 43
4.3.7 BRAM .............................................................................................. 43
4.4 Hardware Parameter List ......................................................................................... 44
4.5 Proposed parallel architecture ................................................................................. 44
V. EXPERIMENTAL RESULTS AND ANALYSIS .......................................................46
5.1 Execution time for CPU, Embedded HW, Embedded SW and classification
accuracy ......................................................................................................................... 46
5.1.1 Iris benchmark dataset .......................................................................................... 47
5.1.2 Heart benchmark .................................................................................................. 52
5.1.3 User knowledge modeling benchmark ................................................................. 56
5.2 Dataset effect on classification accuracy ................................................................ 61
5.3 Hardware space statistics ........................................................................ 61
VI. CONCLUSION AND FUTURE WORK ....................................................................63
6.1 Conclusion............................................................................................................... 63
6.2 Future work ............................................................................................................. 64
BIBLIOGRAPHY ..............................................................................................................66
LIST OF TABLES
TABLE
3-1. Memory map and IP Version......................................................................................23
4-1. Multiplier IP configuration .........................................................................................40
4-2. Subtractor IP configuration ........................................................................................41
4-3. Adder IP configuration ...............................................................................................41
4-4. Divider IP configuration .............................................................................................42
4-5. Square root IP configuration.......................................................................................42
4-6. Comparator IP configuration ......................................................................................43
4-7. BRAM IP configuration ..............................................................................43
4-8. Hardware parameters ..................................................................................................44
5-1. Execution time, speedup and accuracy for iris benchmark when K =1 .....................47
5-2. Execution time, speedup and accuracy for iris benchmark when K =3 .....................48
5-3. Execution time, speedup and accuracy for iris benchmark when K =5 .....................48
5-4. Execution time, speedup and accuracy for iris benchmark when K =7 .....................49
5-5. Execution time, speedup and accuracy for iris benchmark when K =9 .....................49
5-6. Execution time, speedup and accuracy for heart benchmark when K =1...................52
5-7. Execution time, speedup and accuracy for heart benchmark when K =3...................52
5-8. Execution time, speedup and accuracy for heart benchmark when K =5...................53
5-9. Execution time, speedup and accuracy for heart benchmark when K =7...................53
5-10. Execution time, speedup and accuracy for heart benchmark when K =9.................54
5-11. Execution time, speedup and accuracy for user knowledge modeling benchmark
when K =1 ..........................................................................................................................56
5-12. Execution time, speedup and accuracy for user knowledge modeling benchmark
when K =3 ..........................................................................................................................57
5-13. Execution time, speedup and accuracy for user knowledge modeling benchmark
when K =5 ..........................................................................................................................57
5-14. Execution time, speedup and accuracy for user knowledge modeling benchmark
when K =7 ..........................................................................................................................58
5-15. Execution time, speedup and accuracy for user knowledge modeling benchmark
when K =9 ..........................................................................................................................58
5-16. Utilization summary .................................................................................................61
LIST OF FIGURES
FIGURE
1. Hierarchical Platform-Based Design Approach.............................................................18
2. System level design........................................................................................................19
3. DDR3-SDRAM Organization........................................................................................21
4. Software flow design .....................................................................................................26
5. K=3, dataset has two classes. .........................................................................................29
6. K=3, dataset has three classes. .......................................................................................29
7. High-level architecture...................................................................................................30
8. Top-level module ...........................................................................................................31
9. Min-Max data path .........................................................................................................32
10. Normalization data path ...............................................................................................33
11. Distance measurement data path 1 ...............................................................................34
12. Distance measurement data path 2 ...............................................................................35
13. Sorting distance data path ............................................................................................36
14. Kth class data path .......................................................................................................37
15. Class counter data path ................................................................................................38
16. The data path to find the class of the new data ............................................................39
17. Proposed parallel architecture ......................................................................................45
18. Hardware execution time vs percentage of test set when K=1 ....................................50
19. MicroBlaze execution time vs percentage of test set when K=1 .................................51
20. Hardware Vs MicroBlaze Speedup ..............................................................................51
21. Hardware execution time vs percentage of test set when K=1 ....................................54
22. MicroBlaze speed vs test set percent K=1 ...................................................................55
23. Hardware Vs MicroBlaze Speedup ..............................................................................55
24. Hardware execution time vs percentage of test set when K=1 ....................................59
25. MicroBlaze speed vs test set percent K=1 ...................................................................59
26. Hardware Vs MicroBlaze Speedup ..............................................................................60
27. Dataset types ................................................................................................................61
CHAPTER I
INTRODUCTION
Machine learning is a type of artificial intelligence (AI), which enables a program
(or a computer) to learn, without being explicitly programmed and without human
intervention [33]. The basic premise of machine learning is to explore and construct
algorithms that can learn from the input data received, which includes performing statistical
analysis on the data and then providing accurate predictions based on it. Machine learning
allows the programs (or algorithms) to learn, adjust, develop, and grow autonomously,
based on the new data [34]; thus, the programs become more accurate in predicting
outcomes.
Machine learning has become one of the cornerstones of information
technology within the last two decades [35]. It has been incorporated into a
wide range of applications that have become part of our daily lives. A few examples include:
effective web searching; online recommendation engines that suggest friends on social
media networks (such as Facebook), or suggest movies/shows on Netflix; self-driving
Google cars; cyber-fraud detection; practical speech recognition; and a vastly improved
understanding of the human genome [34], [33]. Furthermore, applications such as data
security, personal security, financial trading, healthcare, marketing personalization, and natural
language processing all incorporate some form of machine learning techniques [39]. In
addition, according to market data [36], the global market for machine learning was $2.5
billion in 2014, and is expected to reach $12.5 billion in 2019. The above facts illustrate
that the machine learning market will continue to flourish over the long term as new and
autonomous applications emerge.
Machine learning involves many important data mining tasks. Data mining is the
science of extracting useful information from big datasets [42]. Data mining commonly
involves many high-level tasks such as classification, regression, clustering,
summarization, dependency modeling, and change and deviation detection [14].
Furthermore, depending on the algorithms, machine learning can be categorized into
supervised learning and unsupervised learning.
In this research work, we focus mainly on the classification task, which is a supervised
learning technique. Classification techniques are employed by many applications in
various domains including medical, security, and marketing. There exist many
classification algorithms. After our investigation of different classifiers, we decided to
focus on the K-nearest neighbor (K-NN) classification algorithm, which is one of the most
popular classification algorithms. The K-NN classification algorithm is widely used in many
machine learning applications.
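To make the classification procedure concrete, the following is a brief, illustrative sketch of the four K-NN steps that are detailed in Chapter IV: normalization, distance measurement, sorting, and new class determination. The sketch is written in Python purely for exposition; it is not the embedded software or hardware implementation developed in this thesis. The min-max normalization and Euclidean distance are assumptions based on the data paths described later, and the function names and toy dataset values are hypothetical.

# Illustrative K-NN sketch (Python, for exposition only; names and data are hypothetical).
import numpy as np
from collections import Counter

def min_max_normalize(train_x, test_x):
    # Step 1: scale every feature into [0, 1] using the training set's range.
    lo = train_x.min(axis=0)
    hi = train_x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against zero-range features
    return (train_x - lo) / span, (test_x - lo) / span

def knn_classify(train_x, train_y, test_x, k=3):
    # Steps 2-4: distance measurement, sorting, and majority vote per test sample.
    train_x, test_x = min_max_normalize(train_x, test_x)
    predictions = []
    for sample in test_x:
        # Step 2: Euclidean distance to every training sample, element by element.
        dists = np.sqrt(((train_x - sample) ** 2).sum(axis=1))
        # Step 3: sort the distances and keep the indices of the K smallest.
        nearest = np.argsort(dists)[:k]
        # Step 4: the new class is the most frequent label among the K neighbors.
        predictions.append(Counter(train_y[nearest]).most_common(1)[0][0])
    return np.array(predictions)

# Toy usage (made-up two-feature, two-class data):
X_train = np.array([[5.1, 3.5], [4.9, 3.0], [6.7, 3.1], [6.3, 2.9]])
y_train = np.array([0, 0, 1, 1])
X_test  = np.array([[5.0, 3.4], [6.5, 3.0]])
print(knn_classify(X_train, y_train, X_test, k=3))  # expected: [0 1]

Each of these four steps has a corresponding data path in the proposed hardware architecture (Sections 4.2.1 through 4.2.4).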
Most of today’s classification algorithms (including algorithms used for machine
learning) are becoming more complex (compute/data-intensive), requiring significant
processing power, thus affecting the speed-performance of these applications.
Furthermore, existing classification algorithms for machine learning are processor-based
software-only designs. These processor-based algorithms, as they are, are incapable of analyzing
and processing enormous amounts of data efficiently and effectively.
Consequently, some kind of hardware support is imperative to address the
constraints and requirements of the algorithms used for machine learning applications. In