ABSTRACT
Images are increasingly used to represent and transmit information, and extracting important
information from them has become correspondingly common.
Image recognition is an important research area because of its wide range of applications. In
the relatively young field of computer pattern recognition, one of the challenging tasks is the
accurate automated recognition of human handwriting. Optical Character Recognition (OCR) is a
subfield of Image Processing concerned with extracting text from images or scanned
documents. In this project, we focus on recognizing handwritten digits from the MNIST
database. The challenge is to use basic Image Correlation techniques, also known as Matrix
Matching, to maximize the accuracy of the handwritten digit recognizer without resorting to
sophisticated techniques such as machine learning.
Key Words: Image Processing, Optical Character Recognition, Handwritten Digits, Image
Correlation, Matrix Matching, Machine Learning.
CHAPTER 1
INTRODUCTION
The human brain processes and analyses images with ease. When the eye sees an image, the
brain can readily break it apart and identify its various aspects.
That process occurs automatically: it involves not only analysing the image but also
comparing its characteristics with what the brain already knows in order to recognize its
elements. Image Processing is the field of computer science that tries to do the same for
machines. It concerns the analysis of images to extract useful information from them. An
image is converted into a digital form readable by computers and certain algorithms are
applied to it; the result is either a better-quality image or a set of characteristics from which
information can be extracted. Image processing is used in many areas today, and a great deal
of software has been built on it. Self-driving cars that detect other cars and pedestrians to
avoid accidents are appearing. Some social media applications, such as Facebook, perform
facial recognition using these techniques. In addition, some software uses image processing
to identify characters in images, which is the concept of optical character recognition that we
discuss and explore in this project.
One of the narrower fields of image processing is recognizing characters from an image,
which is called Optical Character Recognition (OCR).
It is meant to read an image containing one or more characters, or a scanned text of typed or
handwritten characters, and recognize them. Much research has been done in this area to
develop techniques with high accuracy. Among the most widely used and best-performing
algorithms are machine learning algorithms such as Neural Networks and Support Vector
Machines. One application of OCR is recognizing handwritten characters, and we focus on
building a mechanism that recognizes handwritten digits. We read images of handwritten
digits obtained from the MNIST database and try to identify which digit each image
represents. To do this we use basic Image Correlation techniques, more commonly known as
Matrix Matching. This method is based on matrix manipulation, since it reads images as
matrices in which each element is a pixel.
1.2 OBJECTIVES
• To provide an easy user interface to input the object image.
• User should be able to upload the image.
• System should be able to upload the image.
• System should be able to preprocess the given input to suppress the background.
• System should be able to detect digit regions present in the image.
• System should retrieve the digits present in the image and display them to the user.
1.3 SCOPE
• Improvement of the human-computer interface for computer-illiterate people by providing
various computing services on inputs.
• Can be implemented on smartphones and tablets as a virtual keyboard.
• The system can create a paperless environment by digitizing handwritten characters.
CHAPTER 2
LITERATURE SURVEY
Handwriting recognition: the last frontiers, Proceedings of the 15th International
Conference on Pattern Recognition, Barcelona, ICPR-2000, Vol. 4, pp. 1-10.
The last frontiers of handwriting recognition are considered to have started in the last decade
of the second millennium. This paper summarizes the nature of the problem of handwriting
recognition, the state of the art of handwriting recognition at the turn of the new millennium,
the results of CENPARMI researchers in automatic recognition of handwritten digits,
touching numerals, cursive scripts, and dates formed by a mixture of the former 3 categories.
Wherever possible, comparable results have been tabulated according to techniques used,
databases, and performance. Aspects related to human generation and perception of
handwriting are discussed. The extraction and usage of human knowledge, and their
cooperation into handwriting recognition systems are presented. Challenges, aims, trends,
efforts and possible rewards, and suggestions for future investigations are also included.
Central Research Laboratory, Performance evaluation of pattern classifiers for
handwritten character recognition. International Journal on Document Analysis and
Recognition, Tokyo 185-8601, Japan.
This paper describes a performance evaluation study in which some efficient classifiers are
tested in handwritten digit recognition. The evaluated classifiers include a statistical classifier
(modified quadratic discriminant function, MQDF), three neural classifiers, and an LVQ
(learning vector quantization) classifier. They are efficient in that high accuracies can be
achieved at moderate memory space and computation cost. The performance is measured in
terms of classification accuracy, sensitivity to training sample size, ambiguity rejection, and
outlier resistance. The outlier resistance of neural classifiers is enhanced by training with
synthesized outlier data. The classifiers are tested on a large data set extracted from NIST
SD19. As results, the test accuracies of the evaluated classifiers are comparable to or higher
than those of the nearest neighbour (1-NN) rule and regularized discriminant analysis (RDA).
It is shown that neural classifiers are more susceptible to small sample size than MQDF,
although they yield higher accuracies on large sample size. As a neural classifier, the
polynomial classifier (PC) gives the highest accuracy and performs best in ambiguity
rejection. On the other hand, MQDF is superior in outlier rejection even though it is not
trained with outlier data. The results indicate that pattern classifiers have complementary
advantages and they should be appropriately combined to achieve higher performance.
A Shallow Convolutional Neural Network for Accurate Handwritten Digits
Classification, 13th International Conference, PRIP, Minsk, Belarus, pp. 77-85.
At present the deep neural network is the hottest topic in the domain of machine learning and
can accomplish a deep hierarchical representation of the input data. Due to deep architecture
the large convolutional neural networks can reach very small test error rates below 0.4%
using the MNIST database. In this work we have shown that high accuracy can be achieved
using a reduced shallow convolutional neural network without adding distortions for digits.
The main contribution of this paper is to show how a simplified convolutional neural network
can obtain a test error rate of 0.71% on the MNIST handwritten digit benchmark. This
reduces the computational resources needed to model a convolutional neural network.
Handwritten Digit String Recognition using Convolutional Neural Network. 2018 24th
International Conference on Pattern Recognition (ICPR).
String recognition is one of the most important tasks in computer vision applications.
Recently the combinations of convolutional neural network (CNN) and recurrent neural
network (RNN) have been widely applied to deal with the issue of string recognition.
However, RNNs are not only hard to train but also time-consuming. In this paper, we propose
a new architecture which is based on CNN only, and apply it to handwritten digit string
recognition (HDSR). This network is composed of three parts from bottom to top: feature
extraction layers, feature dimension transposition layers and an output layer. Motivated by the
superior performance of DenseNet, we utilize dense blocks to conduct feature extraction. At the
top of the network, a CTC (connectionist temporal classification) output layer is used to
calculate the loss and decode the feature sequence, while some feature dimension
transposition layers are applied to connect feature extraction and output layer. The
experiments have demonstrated that, compared to other methods, the proposed method
obtains significant improvements on ORAND-CAR-A and ORAND-CAR-B datasets with
recognition rates 92.2% and 94.02%.
Improved handwritten digit recognition using convolutional neural networks (CNN).
Sensors, 20(12), 3344.
Traditional systems of handwriting recognition have relied on handcrafted features and a
large amount of prior knowledge. Training an Optical character recognition (OCR) system
based on these prerequisites is a challenging task. Research in the handwriting recognition
field is focused around deep learning techniques and has achieved breakthrough performance
in the last few years. Still, the rapid growth in the amount of handwritten data and the
availability of massive processing power demands improvement in recognition accuracy and
deserves further investigation. Convolutional neural networks (CNNs) are very effective in
perceiving the structure of handwritten characters/words in ways that help in automatic
extraction of distinct features and make CNN the most suitable approach for solving
handwriting recognition problems. Our aim in the proposed work is to explore the various
design options like number of layers, stride size, receptive field, kernel size, padding and
dilation for CNN-based handwritten digit recognition. In addition, we aim to evaluate various
SGD optimization algorithms in improving the performance of handwritten digit recognition.
A network's recognition accuracy increases by incorporating ensemble architecture. Here, our
objective is to achieve comparable accuracy by using a pure CNN architecture without
ensemble architecture, as ensemble architectures introduce increased computational cost and
high testing complexity. Thus, a CNN architecture is proposed in order to achieve accuracy
even better than that of ensemble architectures, along with reduced operational complexity
and cost. Moreover, we also present an appropriate combination of learning parameters in
designing a CNN that leads us to reach a new absolute record in classifying MNIST
handwritten digits. We carried out extensive experiments and achieved a recognition accuracy
of 99.87% on the MNIST dataset.
CHAPTER 3
DESIGN AND IMPLEMENTATION
3.1 Image Processing
Image processing is a very wide field within computer science which deals mainly with
analysing images and extracting information from them. The image to be processed is
imported and then analysed using some computations, which result either in an image of
better quality or in some characteristics of the image, depending on the purpose of the
analysis. The field has several subfields, among them Optical Character Recognition, which
is the main concern of this project.
There are two ways to provide input to the system. The user can either upload an image of
the digit to be detected or use data from the MNIST dataset. The input images are
pre-processed. The accuracy of the recognized digits is compared across the different
classifiers, and the results obtained are displayed along with the accuracy.
3.2 Optical Character Recognition (OCR)
It is easy for the human eye to recognize a character spotted in a document; computers,
however, cannot directly identify the characters in an image or scanned document. A lot of
research has been done to make this possible, resulting in the development of several
algorithms. One of the fields that specializes in character recognition within Image
Processing is Optical Character Recognition (OCR). In OCR, a scanned document or an
image is read and segmented in order to decipher the characters it contains. The images are
pre-processed to remove noise and unify colours and shades; the characters are then
segmented and recognized one by one, producing a file of encoded text that can be easily
read by computers. Optical Character Recognition dates back to the early 1900s, when it was
developed in the United States in some reading aids for the blind. In 1914, Emanuel
Goldberg implemented a machine able to convert characters into "standard telegraph code".
In the 1950s, David Shepard, who was at that time an engineer at the Department of Defence,
developed a machine that he named Gismo, which was able to read characters and translate
them into machine language. In 1974, Ray Kurzweil decided to develop a machine that would
read text for blind and visually impaired people under his company, Kurzweil Computer
Products. In 1996, the United States Postal Service developed a mechanism, HWAI, which
recognizes handwritten mail addresses. Nowadays, many software programs use OCR in a
variety of applications.
3.3 Methods Used in OCR
A lot of research has been done in the field of OCR, and is still being done, which has
resulted in the development of several algorithms that enable computers to recognize
characters from images or scanned texts. Many of these techniques have attained very high
efficiency and a low error rate. However, these algorithms are still being investigated and
improved for better performance.
3.3.1 Machine Learning
Machine learning is a field that concerns making programs learn and know how to behave in
different situations using data. One of its applications is Optical Character Recognition.
3.3.2 Artificial Neural Network
An Artificial Neural Network (ANN) is a system that mimics the biological neural network
of the human brain. It is an algorithm used for machine learning, which means it uses data to
learn how to respond to different inputs. The ANN can be seen as a box which takes one or
more inputs and gives one output. Inside the box, there are several interconnected nodes
arranged in layers. The input is fed into the network, passes through its layers and nodes,
and produces an output using a transfer function.
Artificial Neural Networks are used for OCR and have shown a very high accuracy rate. In
this case, the ANN would "recognize a character based on its topological features such as
shape, symmetry, closed or open areas, and number of pixels". The high accuracy of this kind
of algorithm is mainly due to its ability to learn from the training set, which contains
characters with similar features.
Some Neural Networks have achieved very high performance. An implementation of the ANN
by Simard, Steinkraus, and Platt reduced the error rate of recognizing handwritten digits
from the MNIST dataset to as low as 0.7%.
3.3.3 Support Vector Machine
The Support Vector Machine (SVM) is another machine learning algorithm. SVMs are
known as high-performance pattern classifiers. While Neural Networks aim at minimizing
the training error, SVMs aim to minimize the "upper bound of the generalization error". The
learning algorithm in this technique is based on classification and regression analysis.
This kind of classifier has been used in the recognition of very complex characters, such as
those of the Khmer language, and has shown very high performance.
3.3.4 Image Correlation
Image Correlation is a technique used to recognize characters from images. This
approach, also referred to as Matrix Matching, uses mathematical computations in order to
analyse the images. By using this technique, the images are read as matrices, where each
element represents a pixel, which makes it easier to manipulate them using mathematical
approaches. The image to be identified is loaded as a matrix and compared to the images in
the reference set. The test image is overlapped with each image in the reference set to be able
to see how it matches with each one of them so as to tell which one represents it the most.
The decision can be made by seeing the pixels that match and the ones left out from either
one of the two images. This technique has many challenges and limitations, as it only
overlaps the images and tries to see how much they look alike. By using this method,
problems arise when having characters of different sizes, or when one of them is rotated by a
certain angle.
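The overlap comparison described above can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the project's Octave implementation: the scoring rule (fraction of pixels on which two binary images agree) and the names `match_score`, `classify`, and `reference_set` are assumptions made for the example, and tiny 3 by 3 arrays stand in for the 28 by 28 images.

```python
import numpy as np

def match_score(test_img, ref_img):
    """Fraction of pixels on which two binary images agree (assumed metric)."""
    return float(np.mean(test_img == ref_img))

def classify(test_img, reference_set):
    """Return (digit, score) for the reference image that overlaps best.

    reference_set maps a digit label to a list of binary reference images.
    """
    best_digit, best_score = None, -1.0
    for digit, refs in reference_set.items():
        score = max(match_score(test_img, ref) for ref in refs)
        if score > best_score:
            best_digit, best_score = digit, score
    return best_digit, best_score

# Tiny 3x3 stand-ins for the 28x28 reference images
one = np.array([[0, 1, 0],
                [0, 1, 0],
                [0, 1, 0]])
seven = np.array([[1, 1, 1],
                  [0, 0, 1],
                  [0, 0, 1]])
reference_set = {1: [one], 7: [seven]}

# A "1" with one extra foreground pixel is still matched to the "1" reference
noisy_one = np.array([[0, 1, 0],
                      [0, 1, 0],
                      [0, 1, 1]])
digit, score = classify(noisy_one, reference_set)

# The same stroke shifted one column to the right overlaps the "7" better
shifted_one = np.array([[0, 0, 1],
                        [0, 0, 1],
                        [0, 0, 1]])
shifted_digit, shifted_score = classify(shifted_one, reference_set)
```

In this toy example the noisy "1" is matched correctly, while the shifted "1" is assigned to the "7" reference, which illustrates the sensitivity to position mentioned above.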
3.3.5 Feature Extraction
Feature extraction is a technique based on pattern recognition. The main idea of feature
extraction is to analyse the images and derive from them some characteristics that identify
each specific element. Examples of these characteristics are the curvatures, the holes, the
edges, etc. In the case of digit recognition, these features could be the holes inside the
digits (for example for the eight, the six, and maybe the two as well) as well as the angles
between some straight lines (for example in the one, the four, and the seven).
Whenever an unknown image is to be recognized, its features are compared to these so that it
can be classified.
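As an illustration of one such feature, the number of holes in a digit can be counted on a binary image. The sketch below is an assumed approach, not a method taken from this project: it flood-fills the outside background first and then counts the remaining enclosed background regions. The function name `count_holes` and the small test images are hypothetical.

```python
from collections import deque

import numpy as np

def count_holes(img):
    """Count background regions fully enclosed by foreground in a binary image."""
    # Pad with a background border so all outside background forms one region.
    padded = np.pad(np.asarray(img), 1, constant_values=0)
    visited = padded.astype(bool)  # foreground pixels are treated as visited
    rows, cols = padded.shape

    def flood(start):
        # Breadth-first fill of one 4-connected background region.
        queue = deque([start])
        visited[start] = True
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= nr < rows and 0 <= nc < cols and not visited[nr, nc]:
                    visited[nr, nc] = True
                    queue.append((nr, nc))

    flood((0, 0))  # mark the outside background
    holes = 0
    for r in range(rows):
        for c in range(cols):
            if not visited[r, c]:  # unvisited background must be enclosed
                holes += 1
                flood((r, c))
    return holes

zero = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]
eight = [[1, 1, 1],
         [1, 0, 1],
         [1, 1, 1],
         [1, 0, 1],
         [1, 1, 1]]
one = [[0, 1, 0],
       [0, 1, 0],
       [0, 1, 0]]
hole_counts = [count_holes(zero), count_holes(eight), count_holes(one)]
```

A feature such as the hole count could then be compared between an unknown image and the known digits, alongside other features.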
3.4 Tools
This project's main objective is to be able to read the images containing the handwritten
digits and be able to identify those digits using basic image correlation techniques. These
images are normally represented and read as matrices, in which every element portrays a
pixel. The image correlation technique takes these matrices and compares them using some
algorithms so as to identify the match that represents the digit we are trying to figure out.
This project will be mainly using matrices and heavy numerical computations, that is why it
is very important to consider the tools that would provide us with a suitable environment for
performing these computations.
3.4.1 Octave
Octave is free and open-source software that uses a high-level programming
language. It offers much of the same functionality as MATLAB and is largely compatible
with it. It offers a simple and suitable interface for performing mathematical computations,
and it provides tools to solve mathematical problems such as common linear algebra
problems. It is also efficient in its use of resources, i.e., time and memory, for these
operations. It is easy to use when dealing with matrices, as it provides many functions and
operations that make them less costly to manipulate. In this project, we deal with images as
matrices in which each element represents a pixel, so it is necessary to choose a tool that
makes our computations easier and more efficient in terms of time and memory. Both
MATLAB and Octave are easy to learn and work with, and both provide a suitable
environment for this kind of project. We have opted for Octave as it is free and open source.
3.4.2 MNIST Database
The MNIST database, which stands for the Modified National Institute of Standards
and Technology database, is a large dataset containing tens of thousands of handwritten
digits. This dataset was created by mixing different sets inside the original National Institute
of Standards and Technology (NIST) sets, so as to have a training set containing several types
and shapes of handwritten digits, as the NIST set was divided into digits written by high
school students and digits written by Census Bureau workers. The MNIST dataset has
been the subject of a great deal of research on recognizing handwritten digits. This has
allowed the development and improvement of many different high-performance algorithms,
such as machine learning classifiers. In order to implement our recognizer and test its
performance, it is necessary to have a suitable dataset which contains a large number of
handwritten digits. This dataset should allow us to discover the challenges and limitations
of the image correlation technique and push us to look for ways and rules to enhance it and
assess its accuracy. We have opted to use this dataset for testing our program since it has
proved its reliability and importance in the field.
3.5 Feasibility Study
From a technical perspective, since this project makes heavy use of numerical computations,
using Octave is a wise choice as it will make the program more efficient. This software will
also provide us with some libraries to read and manipulate the images that will make the
implementation process easier.
As for the dataset to use in testing the project, we have chosen the MNIST database.
This database contains thousands of handwritten digits that have been used in the
development of programs with a similar aim. The dataset is open for public use at no
charge. It is also very convenient for our project and saves time, since we can use it
directly as a test set without having to make one ourselves.
Since all the tools to be used in this project are free of charge and very easy to use, we can
conclude that this project is very feasible in terms of financial resources as well as effort and
time.
CHAPTER 4
METHODOLOGY
4.1 Getting Familiar with the Tools
The first step in this project was getting familiar with the tools used, i.e., Octave and the
MNIST dataset. After setting up the Octave environment and downloading the dataset, we
started experimenting with both in order to get familiar with them and know how to use
them easily later on. Since all the programming is mainly done in Octave, we had to
download it along with its Graphical User Interface and learn a little about its functions
and how to use it. Octave is free software which makes it very easy to work with matrices
and vectors and is very efficient in performing calculations on them. We started learning
how to use it and looked for the main functions needed in the implementation of the project.
For that, we used some random images of digits to see how they can be read and modified
as well as how to apply some computations to them. Moreover, we had to investigate the
format of the MNIST dataset and get familiar with its representation. The MNIST dataset,
which was used to create our test set, contains thousands of handwritten digits, represented
as matrices. It has been used in the development of several programs and projects with the
same aim as ours. After downloading the file which contains the handwritten digits, we
loaded it in Octave in order to visualize the images and figure out how to use and
manipulate them.
4.2 Creating the reference and test set
One of the main steps in the project is creating the reference and the test set that will both be
used in the implementation phase. The test set is to be used in order to assess the performance
of the program and evaluate its success or error rate. It is to be taken from the MNIST
dataset, since it contains the handwritten digits that we intend to recognize and identify. As
for the reference set, it is used to compare the test images and be able to identify the digit
they represent. It is to be created using different fonts.
CHAPTER 5
DATA
It is very necessary to know the kind of data we are using before we start the design and the
implementation of the program. That is why we had to have a look at its format to understand
how it is represented before creating the reference and the test set.
5.1. Dataset Format
The dataset that we downloaded from the MNIST database contains 60,000 images of
handwritten digits, from zero to nine, all grouped in one file. Each of the images is of size 28
by 28 pixels and represents a digit. There is no pattern or order to the way the images are
organized in the file. The images are represented as matrices whose elements represent the
pixels. Each image also has a label that indicates the digit represented. This label was very
helpful later on when creating the test set.
Furthermore, the data did not contain noise or any major problems to deal with, so it was
used without preprocessing.
5.2. Reference Set
To recognize the digit represented by a certain image, it is necessary to compare it with
other images containing known digits. For that we need a reference set which contains all
these images. That is to say, each image we want to recognize is compared to the images in
the reference set. The image with the highest match is the one that represents the right
number. Since handwritten digits differ from one person to another, the reference set needs
to have digits in different fonts. For this reason, we have created six images of each digit
using the online image editor pixr.com, each one with a different font. The reference set
contains images with the same dimensions as the ones in the MNIST dataset, i.e., 28 by 28
pixels. These images have a black background and a white font, which made it easier to use
and manipulate them later on using Octave. To make the comparison easier, we have grouped
the six images representing the same digit into one file. The resulting reference set therefore
consists of ten files, each one representing a digit from zero to nine and containing six
images of that digit in different fonts. The pixels of these images are then changed into
zeros and ones, which makes the overlapping of the images easier. The black background was
initially represented as zeros, so it is left the same. As for the pixels of the white font,
each one of them was represented with a different non-zero value depending on the shade of
white. These non-zero values are all converted into ones. The following image displays the
digit "2" reference set.
The rest of the reference sets are in Appendix A.
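The conversion of pixel values into zeros and ones described above can be sketched as follows. The project performs this step in Octave; the Python/NumPy version below is only an illustrative equivalent, and the function name `binarize` and the sample values are assumptions.

```python
import numpy as np

def binarize(img):
    """Map every non-zero pixel to 1 and keep background zeros as 0."""
    return (np.asarray(img) != 0).astype(np.uint8)

# A small grayscale patch: 0 is black background, other values are shades of white
gray = np.array([[0, 128, 255],
                 [0,  64,   0]], dtype=np.uint8)
binary = binarize(gray)
```

Here `binary` holds `[[0, 1, 1], [0, 1, 0]]`: every shade of white collapses to 1, so two images can be compared pixel by pixel without regard to intensity.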
5.3. Test Set
The program to be developed needs to be tested against images that contain handwritten
digits in order to assess its performance and calculate its success rate, so it is necessary
to create a test set. The test set is a sample of images containing handwritten digits which
are compared to the images in the reference set in order to identify them. This set was
formed using the file from the MNIST database. The original file contained 60,000 images
representing different digits, which made it difficult to look up each number by its label
when testing the program. To make it easier to access each digit we want, we decided to
store a number of images of each digit in a separate file: 20 images of each digit, in ten
different files. That is to say, the resulting test set is in the form of ten files, each of
which represents a digit and contains 20 images of it. These images were extracted from the
initial file by reading them and their labels using Octave. In order to make the
manipulation of the matrices/images easier, we made the same modification to the elements
of all the matrices representing the test set. The black pixels were originally represented as
zeros, so they were left the same. As for the white ones, each of them had a different
non-zero value, so we turned them all into ones.
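The grouping of test images by label can be sketched as below. This is a hedged Python/NumPy illustration rather than the Octave code used in the project: the function name `build_test_set`, the `per_digit` parameter, and the synthetic random data standing in for the MNIST file are all assumptions made for the example.

```python
import numpy as np

def build_test_set(images, labels, per_digit=20):
    """Group the first `per_digit` images of each digit 0-9 by label."""
    test_set = {}
    for digit in range(10):
        # Indices of all images carrying this label, truncated to per_digit
        idx = np.flatnonzero(labels == digit)[:per_digit]
        # Binarize as described above: non-zero pixels become 1
        test_set[digit] = (images[idx] != 0).astype(np.uint8)
    return test_set

# Synthetic stand-in for the MNIST file: 30 unordered examples of each digit
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 30)
rng.shuffle(labels)
images = rng.integers(0, 256, size=(labels.size, 28, 28), dtype=np.uint8)

test_set = build_test_set(images, labels)
```

Each entry of `test_set` then plays the role of one of the ten files, holding 20 binary images of a single digit.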
CHAPTER 6
PROGRAM CODE AND SAMPLE OUTPUT
This chapter presents a handwritten digit recognition program written in Python that uses a
neural network trained on the popular MNIST dataset. We use the Keras library with the
TensorFlow backend to build and train the neural network model.
6.1. Step-by-Step Explanation and Code
1. Install and Import Libraries
Make sure you have the necessary libraries installed:
pip install tensorflow
Then import the required modules:
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.utils import to_categorical
2. Load the Dataset
The MNIST dataset is available through Keras (it is downloaded on first use) and consists
of 60,000 training images and 10,000 testing images of handwritten digits (0 to 9).
(x_train, y_train), (x_test, y_test) = mnist.load_data()
3. Preprocess the Data
Neural networks work better with normalized data. Therefore, we scale the pixel
values of images from a range of 0-255 to 0-1.
x_train, x_test = x_train / 255.0, x_test / 255.0
Also, we convert the labels to categorical (one-hot encoding), turning each label into a
vector where only the target index is 1 (e.g., 7 becomes [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]).
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
4. Build the Model
We use a simple neural network with one input layer, one hidden layer, and one output
layer. This basic architecture works well for simple image classification tasks such as
MNIST.
model = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])
5. Compile the Model
We specify the optimizer, loss function, and evaluation metric. For multi-class
classification, we use categorical cross-entropy as the loss function, and for
optimization, Adam is a commonly used optimizer.
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
6. Train the Model
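Train the model on the training data for 5 epochs with a batch size of 32. The
validation_split=0.2 argument holds out 20% of the training images (12,000 of 60,000)
for validation, so each epoch runs 48,000 / 32 = 1,500 steps. These settings can be
adjusted based on system capability and desired accuracy.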
model.fit(x_train, y_train, epochs=5, batch_size=32, validation_split=0.2)
7. Evaluate the Model
Check how the model performs on the test data.
test_loss, test_acc = model.evaluate(x_test, y_test)
print("Test accuracy:", test_acc)
8. Make Predictions
Finally, make predictions on new data (in this case, the test data) and display the first
5 predictions.
predictions = model.predict(x_test)
for i in range(5):
    print("Predicted label:", predictions[i].argmax(),
          "Actual label:", y_test[i].argmax())
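The preprocessing in step 3 can be checked by hand on a few values. The sketch below reproduces the pixel scaling and the one-hot encoding with plain NumPy; the helper `one_hot` is a hypothetical stand-in written for illustration and is not the Keras `to_categorical` function itself.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# Pixel scaling: 0-255 integers become floats in the range 0-1
pixels = np.array([0, 51, 255], dtype=np.uint8)
scaled = pixels / 255.0

# Label encoding: 7 becomes a vector with a 1 at index 7
encoded = one_hot([7, 2])
```

Taking `argmax` of an encoded row recovers the original label, which is why step 8 calls `y_test[i].argmax()` to print the actual digit.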
6.2. Explanation of Key Concepts:
Flatten Layer: This reshapes the 28x28 images into a flat array of 784 pixels for
input to the dense layer.
Dense Layer: Fully connected layers; the hidden layer has 128 neurons, and the
output layer has 10 neurons (one for each digit).
Activation Functions: ReLU for hidden layers, which helps introduce non-linearity,
and Softmax for output to ensure output values represent probabilities.
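To make these concepts concrete, the following sketch traces a single image through the same three stages using NumPy only. It is a hand-written approximation of what the layers compute, not Keras internals, and the weights are random placeholders rather than trained values, so the resulting probabilities carry no meaning beyond their shape and sum.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: negative values are clipped to zero."""
    return np.maximum(x, 0.0)

def softmax(x):
    """Turn a vector of scores into probabilities that sum to 1."""
    shifted = x - x.max(axis=-1, keepdims=True)  # subtract max for stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
image = rng.random((28, 28))            # placeholder for one scaled image

# Random placeholder weights with the same shapes as the model's layers
W1, b1 = rng.normal(0, 0.05, size=(784, 128)), np.zeros(128)
W2, b2 = rng.normal(0, 0.05, size=(128, 10)), np.zeros(10)

flat = image.reshape(-1)                # Flatten: 28x28 -> 784
hidden = relu(flat @ W1 + b1)           # Dense(128) with ReLU
probs = softmax(hidden @ W2 + b2)       # Dense(10) with Softmax

n_params = W1.size + b1.size + W2.size + b2.size
```

With these layer sizes the network has 784 × 128 + 128 + 128 × 10 + 10 = 101,770 trainable parameters, which is the value of `n_params`.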
6.3. FULL CODE:
Here’s the complete code:
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.utils import to_categorical
# Load dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Preprocess data
x_train, x_test = x_train / 255.0, x_test / 255.0
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
# Build model
model = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])
# Compile model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# Train model
model.fit(x_train, y_train, epochs=5, batch_size=32, validation_split=0.2)
# Evaluate model
test_loss, test_acc = model.evaluate(x_test, y_test)
print("Test accuracy:", test_acc)
# Make predictions
predictions = model.predict(x_test)
for i in range(5):
    print("Predicted label:", predictions[i].argmax(), "Actual label:", y_test[i].argmax())
This model reaches roughly 97% test accuracy on the MNIST dataset after 5 epochs (see
the sample output in Section 6.4) and serves as a simple, foundational introduction to
image classification.
6.4. SAMPLE OUTPUT
1. Model Training Process
During training, Keras prints the loss and accuracy for each epoch, on both the
training data and the validation split. Here’s a sample output for 5 epochs:
Epoch 1/5
1500/1500 [==============================] - 4s 2ms/step - loss: 0.2941 - accuracy: 0.9165 - val_loss: 0.1498 - val_accuracy: 0.9553
Epoch 2/5
1500/1500 [==============================] - 3s 2ms/step - loss: 0.1285 - accuracy: 0.9616 - val_loss: 0.1152 - val_accuracy: 0.9647
Epoch 3/5
1500/1500 [==============================] - 3s 2ms/step - loss: 0.0898 - accuracy: 0.9730 - val_loss: 0.1021 - val_accuracy: 0.9685
Epoch 4/5
1500/1500 [==============================] - 3s 2ms/step - loss: 0.0687 - accuracy: 0.9785 - val_loss: 0.0952 - val_accuracy: 0.9712
Epoch 5/5
1500/1500 [==============================] - 3s 2ms/step - loss: 0.0537 - accuracy: 0.9830 - val_loss: 0.0957 - val_accuracy: 0.9714
2. Model Evaluation on Test Data
After training, the model is evaluated on the test data. The output will include
the test loss and test accuracy:
313/313 [==============================] - 0s 1ms/step - loss: 0.0889 - accuracy: 0.9732
Test accuracy: 0.9732
This indicates that the model has achieved 97.32% accuracy on the test
dataset.
3. Predictions
The code makes predictions for the first five images in the test dataset and
compares them with the actual labels. A sample output might look like this:
Predicted label: 7 Actual label: 7
Predicted label: 2 Actual label: 2
Predicted label: 1 Actual label: 1
Predicted label: 0 Actual label: 0
Predicted label: 4 Actual label: 4
This output shows that the model correctly predicted the labels for the first five
test images.
The exact accuracy varies slightly from run to run, depending on factors such as
the training parameters, the hardware, and the random weight initialization, but it
should generally be around 97%.
CHAPTER 7
CONCLUSION AND FUTURE WORK
Conclusion
Optical Character Recognition is a broad field concerned with turning an image or a
scanned document containing a set of characters into encoded text that can be read by
machines. In this project, we attempted to build a recognizer for handwritten digits
using the MNIST dataset. The challenge was to rely on basic image correlation techniques
rather than sophisticated algorithms, and to see how accurate such a mechanism can be
made. We developed several versions and kept refining each one in order to reach a higher
accuracy. The last version reached an accuracy of 57%. Unfortunately, we could not compare
the performance of our mechanism with that of others designed or implemented previously,
because we did not find any academic paper that tackles this method. The accuracy we
reached is far lower than that of machine learning, which reaches 99.3%; however, it
could be further improved. The goal of this project was to explore the field of OCR and
to come up with techniques that can be used without heavy computation. Even if the final
result is not very reliable, its accuracy is still well above the roughly 10% expected
from random guessing among ten digits.
Future work
The next step would be to take a closer look at the results of all the versions in order
to find new rules. By extracting and implementing these rules, we would be able to enhance
the performance of these versions. Moreover, it would be worthwhile to modify both the
reference set and the rules in order to make our program more general and able to identify
both typed and handwritten digits. Furthermore, we could make good use of the matrices
that indicate the first maximum overlap of each test image with the reference images,
along with the number of pixels left out from both. These matrices could be used with
clustering algorithms to build a program able to recognize handwritten digits with a very
high efficiency. Last but not least, we considered using linear or high-level regression
in the versions we have developed in order to create more rules. Since regression can be
used for binary classification but is not well suited to choosing one digit out of ten,
this technique could be used to decide which candidate digit is the more suitable, the
first maximum or the second maximum. This would enable us to generate more rules and thus
reach a higher efficiency.