Auto Evaluation For Essay Assessment Using A 1D Convolutional Neural Network
ABSTRACT Traditional assessment methods often face a trade-off between accessibility and in-depth
evaluation. While multiple-choice exams offer easy grading, they may limit the ability to assess critical
thinking and analytical skills. On the other hand, essay exams provide a valuable tool to gauge these skills,
but their manual evaluation has several drawbacks. First, grading essays is a time-consuming process. Each
student’s response requires individual attention, leading to a significant workload for educators. Second,
subjectivity is a major concern. Factors like the evaluator’s mental state, fatigue, or even background can
influence their judgment, leading to inconsistencies and potential biases in grading. This paper proposes
solving these challenges using artificial intelligence (AI) for essay assessment. We present a novel approach
utilizing a One-dimensional Convolutional Neural Network (1D CNN) deep learning model. This model
is specifically designed to analyze image-based student answer sheets, automatically classifying them
according to the scores allocated for each question. The dataset used consists of answer sheets from
30 students in a coding class, each provided with a pre-annotated template. Our development process divided
the available data into a 60/40 split, with 60% dedicated to testing the model’s performance and 40% used
for training. The model achieved an average validation accuracy of 81.18% through this training. These
results suggest that the proposed 1D CNN model offers a promising avenue for mitigating the limitations of
manual essay grading. By automating the process and reducing subjectivity, this approach has the potential
to streamline the assessment workload for educators while promoting consistency and fairness in evaluating
student learning outcomes.
INDEX TERMS Deep learning, automatic essay scoring, handwritten image, 1D CNN.
exam evaluators in assigning scores to essay answers. The application will leverage deep learning technology with a classification approach.
In the realm of auto-grading and autoscoring for handwritten assessments, numerous studies have delved into enhancing evaluation processes. Notable methodologies explored in the literature include Convolutional Neural Networks (CNN) [1], [2], [3], [4], [5], k-Nearest Neighbors (KNN) [6], [7], [8], [9], Optical Handwriting Recognition (OHR) [10], [11], [12], Support Vector Machine (SVM) [13], [14], Deep Convolutional Neural Networks (DCNN) [15], MobileNet [16], [17], and Transfer Learning [18], [19]. These approaches aim to streamline the assessment of handwritten content, leveraging advanced neural network architectures, image processing techniques, and machine learning algorithms.
Many previous studies [9], [20], [21], [22], [23], [24], [25], [26], [27], [28] also used the Optical Character Recognition (OCR) method. OCR technology facilitates the automatic transformation of images into text that can be read and processed by machines. It can handle both printed and handwritten text and is utilized for digitizing various types of documents. These documents can include scanned papers, image files, or even photographs of scenes with text. OCR has a wide range of applications and is commonly used for converting printed materials into digital text, such as data records, invoices, checks, bank statements, computer-generated receipts, mail, documentation, printed pages, and business cards, among others [22].
The proposed system in [29] introduced a framework designed to recognize handwriting and present students’ final scores. Utilizing a dataset consisting of 250 answer sheets, the system was tested with two deep CNN models. Despite achieving a commendable accuracy of 92.86%, it is essential to note that this accuracy falls short of those reported in Section II, possibly because the system used its own handwritten dataset. Additionally, the system exhibited a limitation in terms of cropping, requiring manual intervention. Each dataset entry consists of an image captured by the student, accompanied by text-based information such as the student’s name and ID (Nomor Induk Mahasiswa/NIM), which are provided through the application.
The paper in [30] focused on a model tailored for evaluating responses of 40 words, surpassing typical student answers by 13 words. The methodology utilizes a dataset of 450 answers, allocating 70% for training and 30% for testing. The process includes Handwritten Text Recognition using a CNN, followed by Data Preprocessing, Stemming, and Vector Representation of Answers. Although the model achieved 80% accuracy on the test set, it has limitations, such as its inability to evaluate longer responses, images, or equations. Acknowledging a dependence on substantial training data, the study underscores the necessity for further refinement to enhance adaptability and accuracy across diverse assessment scenarios.
In addressing the existing limitations in automated scoring systems for multiple-choice tests, the paper in [31] proposed an enhanced approach by incorporating image processing to achieve a cost-effective and rapid evaluation of multiple-choice questions in questionnaires. The system allows users to print and scan answer sheets using a regular scanner, subsequently processing the scanned sheets on a personal computer. Notably, the system offers quick feedback to students by annotating returned answer sheets. It features handwriting recognition, recognition of student Identification (student ID), and the flexibility for students to modify their answers directly on the answer sheet. The method encompasses finder pattern recognition and optical mark recognition algorithms, as detailed in the paper, which also discusses related works in the field. The system’s effectiveness is underscored by the experimental results, with a 95% success rate in student ID recognition. Future work is aimed at enhancing the user experience and making the system available online, acknowledging a continual commitment to addressing evolving needs in the domain of automated multiple-choice test assessment.
An essay answer assessment system was developed by combining OHR with Automatic Essay Scoring (AES). The dataset used in this research was obtained from the Hewlett Foundation. Handwritten essays were processed using Multi-Dimensional Long Short-Term Memory (MDLSTM) along with several convolutional layers. The model also incorporated a pre-trained GloVe word vector model. The outcomes of the essay evaluations performed by OHR were compared to manual assessments, resulting in a Quadratic Weighted Kappa score of 0.88. This score suggests that the system performs effectively, even with some errors present [33].
The reviewed papers collectively highlight some common limitations in existing automated grading systems for handwritten assessments. These include challenges related to dataset diversity, with [29] relying on a proprietary dataset and [30] acknowledging the need for further refinement to enhance adaptability across various assessment scenarios. Additionally, both [30] and [31] reveal limitations in assessing longer responses or essays, pointing to a gap in the existing systems. Furthermore, the need for manual intervention in the cropping process, as identified by [29], suggests an opportunity for automation improvement. In summary, these limitations collectively underscore the motivation for the present paper, emphasizing the need for a more versatile, comprehensive, and user-friendly automated grading system for handwritten assessments. To achieve this goal, we utilize a dataset comprising 30 student answers, each pre-annotated with a template on the answer sheet. The study aims to create a system capable of automatically evaluating handwritten responses, utilizing image processing techniques and methods like CNN. The model in this paper is a discrete classification model: the dataset classes are discrete score levels, which simplifies the lecturer's manual assessment of the training dataset. This endeavor aims to
contribute to the advancement of a more adaptive, effective, and streamlined automated grading system tailored to the specific context of handwritten student responses.
The layout of this paper is outlined as follows: Section II provides an overview of the theories and technologies used in system development. Section III details the implementation of deep learning methods and models. Section IV presents the system development results, and Section V concludes the paper.
II. ARTIFICIAL INTELLIGENCE ARCHITECTURE
A. ARTIFICIAL NEURAL NETWORK
Artificial Neural Networks (ANN) [5], [34], [35] have become a cornerstone in the development of modern technology. Various types of ANN, such as Modular Neural Networks (MNN) [36], Recurrent Neural Networks (RNN) [37], Generative Adversarial Networks (GAN) [38], [39], Deep Neural Networks (DNN) [40], Spiking Neural Networks (SNN) [41], and Feedforward Neural Networks (FNN) [42], are designed to address specific tasks [32]. This diversity reflects the remarkable power and versatility of ANN in solving complex problems.
A comprehensive understanding of ANN requires interdisciplinary knowledge. ANN is extensively developed in various related fields [43]. Advancements in these fields continually enrich the realm of neural networks. Conversely, ANN serves as a catalyst for innovation across these disciplines by introducing new tools and representations. ANN functions as extensive parallel computing systems comprising numerous interconnected simple processors. They seek to replicate the organizational principles theorized to exist in the human brain. ANN are often described as weighted directed graphs, where artificial neurons act as nodes, and the weighted edges represent connections between neuron inputs and outputs. These networks are typically classified into two main categories: feed-forward networks, which lack feedback loops, and recurrent networks, where loops arise due to feedback connections. One of the primary advantages of ANN is its capacity to autonomously learn underlying patterns, such as input-output relationships, from a dataset of exemplar instances. This autonomous learning capability sets ANN apart from traditional expert systems, rendering them highly adaptable and applicable across various real-world scenarios. This attribute continues to fuel ongoing advancements in the realm of ANN [43].
B. DEEP LEARNING
Deep learning is a distinct area within machine learning focused on creating algorithmic models that replicate the structure of the human neural network. This artificial neural network is composed of interconnected layers of neurons,
where each neuron can receive input, process it, and pass its output to other neurons in the network. The process includes assigning weights and combining data, enabling the network to detect patterns and extract significant features from the input data. Deep learning has the ability to autonomously capture complex feature representations, making it highly effective in diverse applications, such as image recognition, natural language processing, handwriting recognition, and many other tasks.
CNN [44] stands out as a prevalent deep learning algorithm employed extensively across a spectrum of disciplines such as natural language processing, computer vision, and various other domains. CNN is capable of efficiently extracting features from data by utilizing various layers, including convolutional layers, pooling layers, activation layers, fully connected layers, and output layers [29], [30].
One-dimensional (1D) CNN is a derivative of CNN used to process sequentially ordered data or 1D data [45]. The operation of 1D CNN is similar to regular CNN, but convolution is performed along the 1D data sequence. A filter or kernel is passed over the 1D data sequence to identify important features within the data. The data initially existed as a two-dimensional (2D) image, so flattening is necessary to transform it into a 1D vector that can be processed by the 1D CNN model, as shown in Fig. 2 and Table 1.
FIGURE 2. Convolution and pooling in 1D CNN.
C. TWO-DIMENSIONAL CNN
Despite nearly three decades passing since the inception of the initial CNN, contemporary CNN structures still retain key characteristics of the original design, including convolutional and pooling layers. Furthermore, aside from a few modifications, the widely used training approach known as Back-Propagation has remained a shared feature since the 1990s. This section aims to offer a concise summary of traditional deep CNN while also introducing the fundamental concepts and foundational architectures from the past.
In a traditional Multi-Layer Perceptron (MLP) [46], [47], each hidden neuron encompasses singular weights, input, and output components. However, due to the two-dimensional structure of images, each neuron within a CNN encompasses two-dimensional weight planes, referred to as kernels, alongside input and output components known as feature maps. The foundational components of a representative CNN setup are designed for classifying grayscale images of 24 × 24 pixels into two categories. This specific network comprises two convolutional layers and two pooling layers, hosting 4 and 6 neurons, respectively. The final output from the last pooling layer is passed through a single dense layer, after which the result is produced by the output layer responsible for classification. The connections supplying the convolutional layers are governed by weight filters ($w$) characterized by a kernel size of $(K_x \times K_y)$. Convolution occurs while adhering to the image boundaries, leading to a reduction of $(K_x - 1)$ pixels in width and $(K_y - 1)$ pixels in height within the feature map dimensions. Pre-determined subsampling factors $(S_x \times S_y)$ are applied in the pooling layers. For the instance depicted in the figure, kernel sizes of 4 × 4 were chosen for the two convolutional layers, while subsampling factors of 3 × 3 and 4 × 4 were set for the first and second pooling layers, respectively. Importantly, these values were intentionally selected to ensure that the outputs of the final pooling layer (fed into the fully connected layer) are single scalar values (1 × 1): each side shrinks from 24 to 21 pixels after the first convolution, to 7 after the first pooling, to 4 after the second convolution, and to 1 after the second pooling. The output layer comprises two fully connected neurons, aligning with the number of categories to which the image is assigned.
D. ONE-DIMENSIONAL CNN
The traditional deep CNN discussed in the preceding section is specifically crafted to function on two-dimensional data types like images and videos. Consequently, they are commonly labelled as '2D CNN'. In contrast, a modified iteration of these, known as 1D CNN, has emerged as an alternative. Recent research underscores that 1D CNNs offer distinct benefits and have gained preference over their 2D equivalents for handling one-dimensional signals in particular applications. This inclination towards 1D CNN is attributed to the following factors:
1) There exists a significant contrast in computational complexity between 1D and 2D convolutions. For instance, when an image with dimensions $N \times N$ undergoes convolution with a $K \times K$ kernel, the resulting computational complexity is approximately $O(N^2 \times K^2)$. In contrast, the corresponding 1D convolution, operating with the same dimensions ($N$ and $K$), yields a complexity of around $O(N \times K)$. This implies that when subjected to equivalent conditions, encompassing matching configurations, networks, and hyperparameters, a 1D CNN exhibits markedly lower computational complexity than its 2D counterpart.
2) In many recent studies [45], [48], [49], a notable trend has emerged: the majority of 1D CNN applications tend to adopt compact configurations, often comprising just 1-2 hidden CNN layers, with networks featuring fewer than 10,000 parameters. In contrast, nearly all 2D CNN applications lean towards “deep” architectures, housing over 1 million parameters, commonly surpassing the 10 million mark. Evidently, networks with shallower architectures prove notably more manageable to train and implement.
FIGURE 4. A sample of 1D CNN with 2 MLP layers.
3) Notably, the training of intricate 2D CNN generally necessitates specialized hardware setups such as cloud computing resources or GPU clusters. Conversely, executing training for compact 1D CNN with only a few hidden layers (e.g., 2 or fewer) and a limited number of neurons (e.g., less than 50) is feasible and relatively swift on standard computers utilizing the Central Processing Unit (CPU).
4) Due to their modest computational demands, compact 1D CNN are highly suitable for real-time and cost-effective applications, particularly in scenarios involving mobile or handheld devices.
A. DATA ACQUISITION
The data used is student handwriting on standardized answer sheets, as shown in Fig. 6. The answer sheet used has borders and boxes to fill in the work steps and final answer. To train the model, we utilized only the final answer boxes from the exam answer sheets. This required preprocessing the data to obtain the relevant images.
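As a rough illustration of the ideas described above (keeping only the final answer box, flattening the 2D image into a 1D vector, and sliding a kernel along that vector), the following pure-Python sketch may help. It is not the authors' pipeline: the helper names, the box coordinates, the toy page, and the kernel values are all hypothetical, chosen only to make the steps concrete.

```python
from typing import List

Image = List[List[float]]  # grayscale image stored as rows of pixel intensities


def crop_answer_box(page: Image, top: int, left: int, height: int, width: int) -> Image:
    """Keep only the rectangle assumed to hold the final answer (hypothetical coordinates)."""
    return [row[left:left + width] for row in page[top:top + height]]


def flatten(image: Image) -> List[float]:
    """Turn a 2D image into a 1D vector by concatenating its rows."""
    return [pixel for row in image for pixel in row]


def conv1d_valid(signal: List[float], kernel: List[float]) -> List[float]:
    """Slide the kernel along the signal without padding.

    The output has len(signal) - len(kernel) + 1 entries, i.e. it shrinks by K - 1.
    """
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def max_pool1d(signal: List[float], size: int) -> List[float]:
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(signal[i:i + size]) for i in range(0, len(signal) - size + 1, size)]


# Toy 4x6 "page" whose answer box is the 2x3 block starting at row 1, column 2.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 2, 3, 0],
    [0, 0, 4, 5, 6, 0],
    [0, 0, 0, 0, 0, 0],
]
box = crop_answer_box(page, top=1, left=2, height=2, width=3)
vector = flatten(box)                                  # 1D input for a 1D CNN
features = conv1d_valid(vector, kernel=[1, 0, -1])     # one convolutional filter
pooled = max_pool1d(features, size=2)                  # one pooling step
```

With a signal of length N and a kernel of length K, each output entry costs K multiplications, so the whole pass costs on the order of N × K operations, which is the 1D complexity figure quoted in Section II.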
2) Removing vertical and horizontal lines
Once the image has been trimmed to include only the final answer area, lingering vertical and horizontal lines must be removed to obtain an image containing only the handwritten content. To address this, the Open Source Computer Vision Library (OpenCV) is employed to eliminate these lines. A comparison of the image before and after the removal of these lines is presented in Fig. 5.
3) Removing vacant spaces
FIGURE 10. The transformation of 2D images into a 1D representation.
After constructing the deep learning model with 1D CNN, the next step is to determine appropriate hyperparameters. In developing this system, we utilized the Adam optimizer set at a learning rate of 0.002 and configured a batch size of 32. The loss function applied was ‘categorical cross-entropy,’ while ‘accuracy’ was chosen as the evaluation metric. After splitting the dataset, we used the training data to train the 1D CNN deep learning model. This training is conducted separately for each exam question number, for all students, whose answers are classified based on their scores. There are 12 exam question numbers to be trained: 1a, 1b, 1c, 1d, 2a, 3a, 3b, 3c, 4a, 4b, 5a, and 5b. An example of the dataset can be seen in Fig. 11.
TABLE 1. Model summary of 1D CNN.
FIGURE 11. Example of the dataset.
D. MODEL EVALUATION
Data split testing involves determining the ideal balance between training and test data to maximize model performance. An ideal partition is obtained when the system reaches a high accuracy level while maintaining a larger test dataset relative to the training dataset. The test data sizes used in the data split scenarios are 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, and 0.9.
Hyperparameter testing is the process of finding the optimal value of a hyperparameter. There are three hyperparameter testing scenarios: the optimizer testing scenario, the learning rate testing scenario, and the batch size testing scenario. The optimizers used in the optimizer testing scenario are Stochastic Gradient Descent (SGD), Adaptive Moment Estimation (Adam), and Root-Mean-Square Propagation (RMSProp) [51]. Meanwhile, the learning rates compared for each type of optimizer are 0.002, 0.02, and 0.2.
Post-training and testing, we assessed the system’s effectiveness by calculating performance metrics. Confusion matrix tables, such as Table 2, provide a comparative overview between actual and predicted classification values, thereby helping in analyzing system performance and identifying areas that need to be improved or optimized to improve the overall deep learning model.
Several system performance measurement parameters, commonly referred to as performance metrics, are evaluated in the design of this paper, including accuracy, precision, recall, and F1-score. The meaning of each metric is as follows:
1) Accuracy
Accuracy assesses the overall correctness of the model’s outputs. It determines the proportion of accurate predictions relative to the total number of data points tested. Furthermore, accuracy signifies the system’s ability to generate outputs that match their respective classes. The formula is as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{1}$$
where $TP$, $TN$, $FP$, and $FN$ denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
2) Precision
Precision evaluates the model’s ability to correctly classify actual positive cases among the predicted positives. The calculation involves taking the number of true positives and dividing it by the overall count of positive predictions produced by the model, which is represented by the following formula:
$$\text{Precision} = \frac{TP}{TP + FP}. \tag{2}$$
3) Recall
Recall evaluates the model’s effectiveness in retrieving the actual positive data. It quantifies the ratio of true positives to the total actual positive data. The formula is as follows:
$$\text{Recall} = \frac{TP}{TP + FN}. \tag{3}$$
4) F1-Score
The F1-score unifies two key metrics, precision and recall, to evaluate a model’s effectiveness. It assesses how well the model identifies true positives while factoring in its ability to limit both false positives and false negatives. The formula is as follows:
$$\text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}. \tag{4}$$
5) Loss
The “loss” parameter signifies the system’s inaccuracy in making predictions to classify an input. There are several ways to calculate loss; the formula used here is as follows:
$$H(P, Q) = -\sum_{i=1}^{N} P(i) \log\big(Q(i)\big), \tag{5}$$
where $H(\cdot)$ is the cross-entropy function, $P$ is the target distribution, $Q$ is the estimated target distribution, $i$ is the index of the dataset, and $N$ is the size of the dataset.
TABLE 2. Model summary of 1D CNN.
IV. RESULT AND DISCUSSION
After completing the training of the data using 1D CNN, the next step involves evaluating the model’s performance by calculating accuracy and loss. We vary the training-to-testing dataset ratio from 9:1 to 1:9 for all question numbers in order to identify the optimal proportion between training and testing data. The result is shown in Fig. 14.
Once the optimal proportion is determined based on accuracy values, the deep learning model is evaluated to measure its reliability in predicting question scores per number. The evaluation results exhibit variations in outcomes for each question, as depicted in the following image.
In the first image, the training outcome for question 3b shows a validation accuracy of 45.83%, while in the second image, the training outcome for question 3c demonstrates a validation accuracy of 100%. From these findings, it can be inferred that the model possesses diverse capabilities in predicting question scores, contingent upon the complexity and characteristics of each individual question.
highest model efficacy. The compiled results, illustrating the mean validation accuracy for each question, are presented in Fig. 14 for reference and analysis.
Fig. 14 depicts the average validation accuracy yielded by the model after training on each question number. Throughout the testing process, the training procedure is executed with 500 epochs, incorporating a callback function that automatically halts training when the training accuracy reaches 100% and the validation accuracy no longer demonstrates improvement.
For the test data proportion of 0.1, an average validation accuracy of 79.00% is obtained for each question number. This figure then increases as the percentage of test data is increased, although the increments are not substantial. The highest average validation accuracy is achieved with a test data size of 0.6, namely a validation accuracy of 81.18%. Among the tested data sizes, a 0.6 test data ratio consistently demonstrated the highest average validation accuracy, indicating its superiority in terms of both accuracy and overall system performance. A testing data size of 0.6 implies that only 40.00% of the answers need manual examination by the user. This indicates that the constructed system efficiently assists in assessing 60.00% of the overall essay answers from students.
The results of hyperparameter testing on the deep learning model architecture with 1D CNN can be seen in Table 3, Table 4 and Table 5. From the test results of the three optimizers, it can be concluded that the Adam optimizer produces the best performance in terms of the accuracy and loss values obtained in Table 3, using number 3c as a sample. Regarding the learning rate experiments, a learning rate of 0.002 consistently outperformed the others, achieving the highest validation accuracy and the lowest loss. Finally, the optimal batch size is 32.
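The stopping rule used during the 500-epoch runs (halt once training accuracy reaches 100% and validation accuracy stops improving) could be expressed roughly as below. This is only a sketch under stated assumptions: the text does not say how many stagnant epochs are tolerated, so the `patience` parameter, the function names, and the example accuracy curves are all hypothetical.

```python
from typing import Sequence


def should_stop(train_acc: Sequence[float], val_acc: Sequence[float], patience: int = 5) -> bool:
    """Stop once training accuracy is 100% and validation accuracy has stalled.

    `patience` (an assumption of this sketch) is the number of most recent
    epochs that must pass without beating the earlier best validation accuracy.
    """
    if patience < 1:
        raise ValueError("patience must be at least 1")
    if not train_acc or train_acc[-1] < 1.0:
        return False  # training accuracy has not reached 100% yet
    if len(val_acc) <= patience:
        return False  # not enough history to judge stagnation
    best_before = max(val_acc[:-patience])
    best_recent = max(val_acc[-patience:])
    return best_recent <= best_before


def epochs_run(train_curve: Sequence[float], val_curve: Sequence[float],
               max_epochs: int = 500, patience: int = 5) -> int:
    """Replay recorded accuracy curves and return the epoch at which training halts."""
    limit = min(max_epochs, len(train_curve), len(val_curve))
    for epoch in range(1, limit + 1):
        if should_stop(train_curve[:epoch], val_curve[:epoch], patience):
            return epoch
    return limit


# Made-up accuracy curves, for illustration only.
train_curve = [0.60, 0.85, 1.00, 1.00, 1.00, 1.00]
val_curve = [0.50, 0.70, 0.80, 0.80, 0.79, 0.80]
stopped_at = epochs_run(train_curve, val_curve, patience=2)
```

Requiring both conditions keeps a run going while validation accuracy is still climbing, even after the training set has been fit perfectly.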
TABLE 3. Training results from different optimizers of 1D-CNN.
FIGURE 14. The Validation Accuracy for Different Training-Testing Ratio within the Dataset.
TABLE 5. Training results from different batch sizes of 1D-CNN.
Table 6 displays the training outcomes for each question at the 60.00% test data and 40.00% training data allocation, using the Adam optimizer, a learning rate of 0.002, and a batch size of 32. We use those hyperparameters based on the training results from Fig. 14, Table 3, Table 4, and Table 5.
Based on the data obtained in Table 6, it is evident that the highest validation accuracy is achieved for question number 3c with an accuracy reaching 100%, whereas the lowest validation accuracy is recorded for question number 3b with validation accuracy of 45.83%.
Table 7 and Table 8 present the results of classification analysis using a 1-5 scale, where classes 3, 4, and 5 represent specific classification levels. This classification is derived from manual correction, where answers can receive points 3, 4, or 5 based on the manual evaluation. After several answers are manually checked, the remaining unchecked answers will be classified using a system based on predefined classes, where the scale ranges from 1 to 5. The system evaluates the similarity of answers to the predefined classes to assign an appropriate classification score.
Therefore, the author will provide a detailed presentation of the experiment results for the highest accuracy, which is question number 3c, as well as for the lowest accuracy, which is question number 3b, to enable a thorough comparison. For further analysis of the remaining questions, please refer to Table 6.
1) Question number 3c
Based on the training outcomes with a test data size of 0.6, by applying the Adam optimization algorithm with a learning rate of 0.002 and a batch size of 32, the model reaches a perfect 100% validation accuracy when evaluated on question 3c. This indicates that the model accurately predicts all instances within the test dataset for question 3c, correctly assigning them to their respective classes. Furthermore, the model’s performance can be assessed through performance metrics, as indicated by the classification report and illustrated in the confusion matrix depicted in Table 7.
This commendable classification outcome is attributed to the fact that question 3c encompasses only two classes, specifically class 3 and class 5. Moreover, considering the complexity of the trained data, question 3c exhibits data that is not excessively intricate, featuring a noteworthy distinction between images in class 3 and those in class 5. Consequently, the model’s task of pattern recognition and classification remains relatively uncomplicated. Illustrations of an image from class 3 and an image from class 5 are presented in Fig. 15 and Fig. 16, respectively.
FIGURE 15. Sample image in class '3' for number 3c.
FIGURE 16. Sample image in class '5' for number 3c.
2) Question number 3b
Question number 3b exhibits the lowest validation accuracy compared to the other questions after undergoing training with a test data size of 0.6, using the Adam optimization algorithm, a learning rate of 0.002, and a batch size of 32. The obtained validation accuracy value is 45.50%, indicating that the model struggles to classify the data effectively and accurately. Subsequently, the performance metrics are presented in Table 8.
TABLE 8. Example of a classification report for number 3b.
In addition to precision, the classification report also reveals recall and F1-score values for this particular question. Class 3 has a recall value of 0.75, meaning the model successfully detects 75% of the entire class 3 dataset, while for class 5, its recall is 0.43, implying the model accurately identifies 43% of the overall class 5 data. Class 4 exhibits a recall of 0.22, indicating that the model only manages to detect 22% of the entire class 4 data.
From the precision and recall data, F1-scores for each class can be derived. The F1-score represents the harmonic average of both precision and recall. The F1-scores for each class are as follows: 0.50 for class 3, 0.46 for class 5, and 0.36 for class 4. In conclusion, this model demonstrates a tendency towards lower performance with relatively low F1-scores for each class. This can be attributed to various factors, one of which is image complexity. In question number 3b, the classified images are complex, and there’s a degree of similarity between different classes, making it challenging for the model to discern patterns and leading to suboptimal classification. Images from each class can be observed in Fig. 17, 18, and 19.
TABLE 9. Comparison scenario for number 3B.
TABLE 10. Comparison scenario for number 5A.
TABLE 11. Comparison scenario for number 3C.
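To make the per-class recall and F1 values discussed for question 3b more concrete, the sketch below derives precision, recall, and F1 for each class from a multi-class confusion matrix, treating each class in turn as the "positive" class. The function names are hypothetical and the counts are made up; they are not the values behind Table 8.

```python
from typing import Dict, List


def per_class_metrics(confusion: List[List[int]], labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Compute precision, recall and F1 per class from a square confusion matrix.

    confusion[i][j] counts samples whose true class is labels[i] and whose
    predicted class is labels[j] (one-vs-rest view of a multi-class problem).
    """
    report = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(len(labels))) - tp  # predicted k, actually another class
        fn = sum(confusion[k]) - tp                                  # actually k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


def accuracy(confusion: List[List[int]]) -> float:
    """Share of all samples that lie on the diagonal (correct predictions)."""
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total if total else 0.0


# Made-up counts for three score classes; NOT the data behind Table 8.
labels = ["3", "4", "5"]
confusion = [
    [3, 0, 1],  # true class 3
    [2, 2, 4],  # true class 4
    [1, 0, 5],  # true class 5
]
report = per_class_metrics(confusion, labels)
overall = accuracy(confusion)
```

In this toy matrix, class 4 is rarely predicted but never predicted wrongly, so its precision is high while its recall and F1 are low. This is the kind of imbalance that a low per-class F1 can hide behind a moderate overall accuracy.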
TABLE 13. Comparison with other methods.
lowest loss compared to the other tested methods. These results substantiate the claim that the 1D CNN method offers significant advantages in terms of accuracy and effectiveness. The findings indicate that the 1D CNN approach not only excels in performance but also makes a substantial contribution to the development of efficient and accurate automated assessment systems. This advantage underscores the relevance of the 1D CNN method in enhancing the quality and reliability of automated evaluations within the studied context.
V. CONCLUSION
This paper proposes a system for automatically assessing
handwritten essay answers using a classification approach
with a 1D CNN deep learning model. The dataset comprised
handwritten essay answers from 30 students, covering
12 question numbers. Following the image processing stages,
the data were used to train the 1D CNN model. Training was
conducted for all question numbers, with training-to-testing
ratios ranging from 9:1 to 1:9.
Our analysis of the average validation accuracy showed that
the best proportion between training and test data was
40% training data and 60% test data, yielding an average
validation accuracy of 81.18%. However, validation accuracy
varied across questions, with some achieving high accuracy
while others exhibited lower accuracy.
Specifically, question 3c achieved the highest validation
accuracy in training and the highest F1-score in testing,
with a validation accuracy of 100%. Conversely, question 3b
had the lowest validation accuracy in training and the lowest
F1-score in testing, with a validation accuracy of 45.50%
and an F1-score of 42%. This variance in validation accuracy
across questions can be attributed to factors such as
the number of classes and image complexity.
In conclusion, this paper highlights the potential of
employing deep learning models for automating the assessment
of handwritten essay answers. Further research could
explore strategies to improve accuracy, particularly for
questions with lower validation accuracy, potentially enhancing
the effectiveness of automated essay grading systems. Future
work will include expanding the dataset by incorporating
larger and more diverse data, adopting improved methods
or algorithms to enhance performance, and training the
model not only on final answers but also on the step-by-step
reasoning process to capture a deeper understanding of
student responses.
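The evaluation protocol summarized in the conclusion (sweeping the training-to-testing ratio from 9:1 to 1:9 and reporting an F1-score per question) can be illustrated with a minimal sketch. This is not the authors' code: the helper names `split_sizes` and `macro_f1` are hypothetical, and because the averaging scheme for the F1-score is not stated here, macro averaging over the observed classes is assumed.

```python
def split_sizes(n_samples, train_parts, total_parts=10):
    """Return (n_train, n_test) for a ratio train_parts:(total_parts - train_parts)."""
    if not 0 < train_parts < total_parts:
        raise ValueError("train_parts must lie strictly between 0 and total_parts")
    n_train = round(n_samples * train_parts / total_parts)
    return n_train, n_samples - n_train


def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over every class seen in y_true or y_pred (assumed scheme)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Sweep the ratios 9:1 down to 1:9 for the 30 answer sheets per question.
    for train_parts in range(9, 0, -1):
        n_train, n_test = split_sizes(30, train_parts)
        print(f"{train_parts}:{10 - train_parts} -> train={n_train}, test={n_test}")
```

Under these assumptions, the 40%/60% split reported above corresponds to 12 training and 18 test answer sheets per question. Small test sets at the extreme ratios make per-question accuracy and F1 estimates noisy.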

ACKNOWLEDGMENT
The authors would like to thank the Research and Community
Service Directorate, Telkom University.
[22] D. K. Tayal, S. Vij, G. Malik, and A. Singh, ‘‘An OCR based automated
REFERENCES method for textual analysis of questionnaires,’’ Indian J. Comput. Sci. Eng.,
vol. 8, no. 2, pp. 184–191, 2017.
[1] S. P. Jenal, D. Rana, and S. K. Pradhan, ‘‘A hand written digit [23] P. Sijimol and S. M. Varghese, ‘‘Handwritten short answer evaluation
recognition based learning Android application,’’ PalArch’s J. Archaeol. system (HSAES),’’ Int. J. Sci. Res. Sci. Technol., vol. 4, no. 2,
Egypt/Egyptol., vol. 17, no. 9, pp. 2151–2163, 2020. pp. 1514–1518, 2018.
[2] M. B. Bora, D. Daimary, K. Amitab, and D. Kandar, ‘‘Handwritten [24] A. Ueda, W. Yang, and K. Sugiura, ‘‘Switching text-based image
character recognition from images using CNN-ECOC,’’ Proc. Comput. encoders for captioning images with text,’’ IEEE Access, vol. 11,
Sci., vol. 167, pp. 2403–2409, Jan. 2020. pp. 55706–55715, 2023.
[25] I. H. El-Shal, O. M. Fahmy, and M. A. Elattar, ‘‘License plate image
[3] P. Chakraborty, S. Roy, S. N. Sumaiya, and A. Sarker, ‘‘Handwritten
analysis empowered by generative adversarial neural networks (GANs),’’
character recognition from image using CNN,’’ in Micro-Electronics
IEEE Access, vol. 10, pp. 30846–30857, 2022.
and Telecommunication Engineering. Cham, Switzerland: Springer, 2023,
[26] Q.-D. Nguyen, N.-M. Phan, P. Krömer, and D.-A. Le, ‘‘An efficient
pp. 37–47.
unsupervised approach for OCR error correction of Vietnamese OCR text,’’
[4] A. Abdulraheem and I. Y. Jung, ‘‘Effective digital technology enabling IEEE Access, vol. 11, pp. 58406–58421, 2023.
automatic recognition of special-type marking of expiry dates,’’ Sustain- [27] W. Ohyama, M. Suzuki, and S. Uchida, ‘‘Detecting mathematical
ability, vol. 15, no. 17, p. 12915, Aug. 2023. expressions in scientific document images using a U-Net trained on a
[5] A. Karim, P. Ghosh, A. A. Anjum, M. S. Junayed, Z. H. Md, K. M. Hasib, diverse dataset,’’ IEEE Access, vol. 7, pp. 144030–144042, 2019.
and A. N. B. Emran, ‘‘A comparative study of different deep learning [28] Y. Jiang, H. Dong, and A. El Saddik, ‘‘Baidu meizu deep learning
model for recognition of handwriting digits,’’ in Proc. ICICNIS, 2020, competition: Arithmetic operation recognition using end-to-end learning
pp. 857–866. OCR technologies,’’ IEEE Access, vol. 6, pp. 60128–60136, 2018.
[29] E. Shaikh, I. Mohiuddin, A. Manzoor, G. Latif, and N. Mohammad, [51] L. Lakshmi, A. N. Kalyani, D. K. Madhuri, S. Potluri, G. S. Pandey, S. Ali,
‘‘Automated grading for handwritten answer sheets using convolutional M. I. Khan, F. A. Awwad, and E. A. A. Ismail, ‘‘Performance analysis
neural networks,’’ in Proc. 2nd Int. Conf. new Trends Comput. Sci. of cycle GAN in photo to portrait transfiguration using deep learning
(ICTCS), Oct. 2019, pp. 1–6. optimizers,’’ IEEE Access, vol. 11, pp. 136541–136551, 2023.
[30] M. A. Rahaman and H. Mahmud, ‘‘Automated evaluation of handwritten [52] D. P. Kingma and J. Ba, ‘‘Adam: A method for stochastic optimization,’’
answer script using deep learning approach,’’ Trans. Mach. Learn. Artif. 2014, arXiv:1412.6980.
Intell., vol. 10, no. 4, pp. 1–16, 2022. [53] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge,
[31] M. Alomran and D. Chai, ‘‘Automated scoring system for multiple choice MA, USA: MIT Press, 2016.
test with quick feedback,’’ Int. J. Inf. Educ. Technol., vol. 8, no. 8, [54] D. Choi, C. J. Shallue, Z. Nado, J. Lee, C. J. Maddison, and G. E. Dahl,
pp. 538–545, 2018. ‘‘On empirical comparisons of optimizers for deep learning,’’ 2019,
[32] M. Chen, U. Challita, W. Saad, C. Yin, and M. Debbah, ‘‘Artificial neural arXiv:1910.05446.
networks-based machine learning for wireless networks: A tutorial,’’ IEEE [55] M. Reyad, A. M. Sarhan, and M. Arafa, ‘‘A modified Adam algorithm for
Commun. Surveys Tuts., vol. 21, no. 4, pp. 3039–3071, 4th Quart., 2019. deep neural network optimization,’’ Neural Comput. Appl., vol. 35, no. 23,
[33] A. Sharma and D. B. Jayagopi, ‘‘Automated grading of handwritten pp. 17095–17112, Aug. 2023.
essays,’’ in Proc. 16th Int. Conf. Frontiers Handwriting Recognit.
(ICFHR), Aug. 2018, pp. 279–284.
[34] P. Darshni, B. S. Dhaliwal, R. Kumar, V. A. Balogun, S. Singh,
and C. I. Pruncu, ‘‘Artificial neural network based character recognition
using SciLab,’’ Multimedia Tools Appl., vol. 82, no. 2, pp. 2517–2538,

NOVALANZA GRECEA PASARIBU was born in Blora, Central Java,
Indonesia. She received the bachelor’s degree in
telecommunication engineering from Telkom University, in 2023,
where she is currently pursuing the master’s degree in
electrical engineering with a focus on signal processing. She
is an enthusiastic individual with a high dedication to the
field of signal processing. She has published one article in
a reputed international journal.

GELAR BUDIMAN (Member, IEEE) was born in Bandung, West Java,
Indonesia, in 1978. He received the B.E. and M.E. degrees in
electrical engineering from Sekolah Tinggi Teknologi Telkom
(STT Telkom), in 2005, and the Ph.D. degree from the School of
Electrical and Informatics Engineering, Institut Teknologi
Bandung, in 2021. He began his academic career as a Lecturer at
the Telecommunication Engineering Department, Institut
Teknologi Telkom (IT Telkom), in 2008, which later evolved into
Telkom University under the Faculty of Electrical Engineering.
Since 2018, he has been an Assistant Professor. His academic
expertise spans signal processing, information hiding,
artificial intelligence, computer vision, wireless
communications, quantum signal processing, and security. He has
an extensive portfolio of research contributions, including ten
papers published in reputed international journals, ten papers
presented in Scopus-indexed international proceedings, nine
intellectual property rights, and one authored book.

INDRARINI DYAH IRAWATI (Member, IEEE) was born in Karanganyar,
Central Java, in 1978. She received the Ph.D. degree from the
School of Electrical and Information Engineering, Institut
Teknologi Bandung. She was with Telkom University as an
Instructor, from 2007 to 2014, an Assistant Professor, from
2014 to 2019, and has been an Associate Professor, since 2019,
and a Professor, since 2023. She has published 12 papers in
reputed international journals, nine papers in Scopus
international proceedings, ten intellectual property rights,
and one book. Her main research interests include compressive
sensing, watermarking, signal processing, artificial
intelligence, the Internet of Things, and computer networks.