Auto Evaluation For Essay Assessment Using A 1D Convolutional Neural Network

This paper presents a novel approach for automated essay assessment using a One-dimensional Convolutional Neural Network (1D CNN) to address the challenges of traditional grading methods, which are often time-consuming and subjective. The model, trained on a dataset of 30 student answer sheets, achieved an average validation accuracy of 81.18%, indicating its potential to streamline the grading process while promoting consistency and fairness. The research aims to enhance the efficiency of evaluating handwritten responses, contributing to the development of more adaptive automated grading systems.


Received 29 October 2024, accepted 2 December 2024, date of publication 11 December 2024, date of current version 20 December 2024.

Digital Object Identifier 10.1109/ACCESS.2024.3515837

Auto Evaluation for Essay Assessment Using a 1D Convolutional Neural Network
NOVALANZA GRECEA PASARIBU 1, GELAR BUDIMAN 1 (Member, IEEE),
AND INDRARINI DYAH IRAWATI 2 (Member, IEEE)
1 School of Electrical Engineering, Telkom University, Bandung 40257, Indonesia
2 School of Applied Sciences, Telkom University, Bandung 40257, Indonesia
Corresponding author: Gelar Budiman (gelarbudiman@telkomuniversity.ac.id)
This research activity is supported through Riset dan Inovasi untuk Indonesia Maju (RIIM) Kompetisi funding from the Indonesia
Endowment Fund for Education Agency or Lembaga Pengelola Dana Pendidikan (LPDP), Ministry of Finance of the Republic of
Indonesia and National Research and Innovation Agency of Indonesia according to the contract number: No. 81/IV/KS/05/2023
and No. 12/II.7/HK/2023.

ABSTRACT Traditional assessment methods often face a trade-off between accessibility and in-depth
evaluation. While multiple-choice exams offer easy grading, they may limit the ability to assess critical
thinking and analytical skills. On the other hand, essay exams provide a valuable tool to gauge these skills,
but their manual evaluation has several drawbacks. First, grading essays is a time-consuming process. Each
student’s response requires individual attention, leading to a significant workload for educators. Second,
subjectivity is a major concern. Factors like the evaluator’s mental state, fatigue, or even background can
influence their judgment, leading to inconsistencies and potential biases in grading. This paper proposes
solving these challenges using artificial intelligence (AI) for essay assessment. We present a novel approach
utilizing a One-dimensional Convolutional Neural Network (1D CNN) deep learning model. This model
is specifically designed to analyze image-based student answer sheets, automatically classifying them
according to the scores allocated for each question. The dataset used consists of answer sheets from
30 students in a coding class, each provided with a pre-annotated template. Our development process divided
the available data into a 60/40 split, with 60% dedicated to testing the model’s performance and 40% used
for training. The model achieved an average validation accuracy of 81.18% through this training. These
results suggest that the proposed 1D CNN model offers a promising avenue for mitigating the limitations of
manual essay grading. By automating the process and reducing subjectivity, this approach has the potential
to streamline the assessment workload for educators while promoting consistency and fairness in evaluating
student learning outcomes.

INDEX TERMS Deep learning, automatic essay scoring, handwritten image, 1D CNN.

The associate editor coordinating the review of this manuscript and approving it for publication was Andrea F. Abate.

2024 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ (VOLUME 12, 2024)

I. INTRODUCTION

Examinations are one of the ways to measure students' learning quality. There are two common types of exams used, namely multiple-choice exams and essay exams. Multiple-choice exams are conducted by providing questions and several answer options, so students only need to select one of the available answer choices. Additionally, there is a type of exam called the essay exam where students need to provide their answers.

In the process of grading exam answers, multiple-choice exams tend to be easier and quicker, while evaluating essay-type exams requires more time. This is because in evaluating essay answers, the evaluator must review each student's writing one by one. Furthermore, manual assessment of essay answers can lead to subjective grading due to human factors, such as the evaluator's mental state and health conditions, causing a decline in grading quality and inconsistency in essay assessment. To address the issues in essay answer evaluation, it is necessary to develop a software tool that can automatically assess essay answers. The developed software is an Android application that aids

exam evaluators in assigning scores to essay answers. The application will leverage deep learning technology with a classification approach.

In the realm of auto-grading and autoscoring for handwritten assessments, numerous studies have delved into enhancing evaluation processes. Notable methodologies explored in the literature include Convolutional Neural Networks (CNN) [1], [2], [3], [4], [5], k-Nearest Neighbors (KNN) [6], [7], [8], [9], Optical Handwriting Recognition (OHR) [10], [11], [12], Support Vector Machine (SVM) [13], [14], Deep Convolutional Neural Networks (DCNN) [15], MobileNet [16], [17], and Transfer Learning [18], [19]. These approaches aim to streamline the assessment of handwritten content, leveraging advanced neural network architectures, image processing techniques, and machine learning algorithms.

Many previous studies [9], [20], [21], [22], [23], [24], [25], [26], [27], [28] also used the Optical Character Recognition (OCR) method. OCR technology facilitates the automatic transformation of images into text that can be read and processed by machines. It can handle both printed and handwritten text and is utilized for digitizing various types of documents. These documents can include scanned papers, image files, or even photographs of scenes with text. OCR has a wide range of applications and is commonly used for converting printed materials into digital text, such as data records, invoices, checks, bank statements, computer-generated receipts, mail, documentation, printed pages, and business cards, among others [22].

The proposed system in [29] introduced a framework designed to recognize handwriting and present students' final scores. Utilizing a dataset consisting of 250 answer sheets, the system was tested with two deep CNN models. Despite achieving a commendable accuracy of 92.86%, it is essential to note that this accuracy falls short compared to those reported in section II, possibly attributed to the system utilizing its own handwritten dataset. Additionally, the system exhibited a limitation in terms of cropping, requiring manual intervention. Each dataset entry consists of an image captured by the student, accompanied by text-based information such as the student's name and ID (Nomor Induk Mahasiswa/NIM), which are provided through the application.

The paper in [30] focused on a model tailored for evaluating responses of 40 words, surpassing typical student answers by 13 words. The methodology utilizes a dataset of 450 answers, allocating 70% for training and 30% for testing. The process includes Handwritten Text Recognition using a CNN, followed by Data Preprocessing, Stemming, and Vector Representation of Answers. Although the model achieved 80% accuracy on the test set, it has limitations, such as its inability to evaluate longer responses, images, or equations. Acknowledging a dependence on substantial training data, the study underscores the necessity for further refinement to enhance adaptability and accuracy across diverse assessment scenarios.

In addressing the existing limitations in automated scoring systems for multiple-choice tests, the paper in [31] proposed an enhanced approach by incorporating image processing to achieve a cost-effective and rapid evaluation of multiple-choice questions in questionnaires. The system allows users to print and scan answer sheets using a regular scanner, subsequently processing the scanned sheets on a personal computer. Notably, the system offers quick feedback to students by annotating returned answer sheets. It features handwriting recognition, recognition of student Identification (student ID), and the flexibility for students to modify their answers directly on the answer sheet. The method encompasses finder pattern recognition and optical mark recognition algorithms, as detailed in the paper, which also discusses related works in the field. The system's effectiveness is underscored by the experimental results, with a 95% success rate in student ID recognition. Future work is aimed at enhancing the user experience and making the system available online, acknowledging a continual commitment to addressing evolving needs in the domain of automated multiple-choice test assessment.

An essay answer assessment system was developed by combining OHR with Automatic Essay Scoring (AES). The dataset used in this research was obtained from the Hewlett Foundation. Handwritten essays were processed using Multi-Dimensional Long Short-Term Memory (MDLSTM) along with several convolutional layers. The model also incorporated a pre-trained GloVe word vector model. The outcomes of the essay evaluations performed by OHR were compared to manual assessments, resulting in a Quadratic Weighted Kappa score of 0.88. This score suggests that the system performs effectively, even with some errors present [33].

The reviewed papers collectively highlight some common limitations in existing automated grading systems for handwritten assessments. These include challenges related to dataset diversity, with [29] relying on a proprietary dataset and [30] acknowledging the need for further refinement to enhance adaptability across various assessment scenarios. Additionally, both [30] and [31] reveal limitations in assessing longer responses or essays, pointing to a gap in the existing systems. Furthermore, the need for manual intervention in the cropping process, as identified by [29], suggests an opportunity for automation improvement. In summary, these limitations collectively underscore the motivation for the present paper, emphasizing the need for a more versatile, comprehensive, and user-friendly automated grading system for handwritten assessments. To achieve this goal, we utilize a dataset comprising 30 student answers, each pre-annotated with a template on the answer sheet. The study aims to create a system capable of automatically evaluating handwritten responses, utilizing image processing techniques and methods like CNN. The model in this paper is a discrete classification model, since the dataset classes are set to discrete score values in order to simplify the lecturer's manual assessment of the training dataset. This endeavor aims to


contribute to the advancement of a more adaptive, effective, and streamlined automated grading system tailored to the specific context of handwritten student responses.

The layout of this paper is outlined as follows: Section II provides an overview of the theories and technologies used in system development. Section III details the implementation of deep learning methods and models. Section IV presents the system development results, and Section V concludes the paper.

II. ARTIFICIAL INTELLIGENCE ARCHITECTURE

A. ARTIFICIAL NEURAL NETWORK

Artificial Neural Networks (ANN) [5], [34], [35] have become a cornerstone in the development of modern technology. Various types of ANN, such as Modular Neural Networks (MNN) [36], Recurrent Neural Networks (RNN) [37], Generative Adversarial Networks (GAN) [38], [39], Deep Neural Networks (DNN) [40], Spiking Neural Networks (SNN) [41], and Feedforward Neural Networks (FNN) [42], are designed to address specific tasks [32]. This diversity reflects the remarkable power and versatility of ANN in solving complex problems.

FIGURE 1. Summary of ANN (redrawn from [32]).

A comprehensive understanding of ANN requires interdisciplinary knowledge. ANN is extensively developed in various related fields [43]. Advancements in these fields continually enrich the realm of neural networks. Conversely, ANN serves as a catalyst for innovation across these disciplines by introducing new tools and representations. ANNs function as extensive parallel computing systems comprising numerous interconnected simple processors. They seek to replicate the organizational principles theorized to exist in the human brain. ANNs are often described as weighted directed graphs, where artificial neurons act as nodes, and the weighted edges represent connections between neuron inputs and outputs. These networks are typically classified into two main categories: feed-forward networks, which lack feedback loops, and recurrent networks, where loops arise due to feedback connections. One of the primary advantages of ANN is its capacity to autonomously learn underlying patterns, such as input-output relationships, from a dataset of exemplar instances. This autonomous learning capability sets ANN apart from traditional expert systems, rendering them highly adaptable and applicable across various real-world scenarios. This attribute continues to fuel ongoing advancements in the realm of ANN [43].

B. DEEP LEARNING

Deep learning is a distinct area within machine learning focused on creating algorithmic models that replicate the structure of the human neural network. This artificial neural network is composed of interconnected layers of neurons,


where each neuron can receive input, process it, and pass its output to other neurons in the network. The process includes assigning weights and combining data, enabling the network to detect patterns and extract significant features from the input data. Deep learning has the ability to autonomously capture complex feature representations, making it highly effective in diverse applications, such as image recognition, natural language processing, handwriting recognition, and many other tasks.

CNN [44] stands out as a prevalent deep learning algorithm employed extensively across a spectrum of disciplines such as natural language processing, computer vision, and various other domains. CNN is capable of efficiently extracting features from data by utilizing various layers, including convolutional layers, pooling layers, activation layers, fully connected layers, and output layers [29], [30].

One-dimensional (1D) CNN is a derivative of CNN used to process sequentially ordered data or 1D data [45]. The operation of 1D CNN is similar to regular CNN, but convolution is performed along the 1D data sequence. A filter or kernel is passed over the 1D data sequence to identify important features within the data. The data initially existed as a two-dimensional (2D) image, so flattening is necessary to transform it into a 1D vector that can be processed by the 1D CNN model, as shown in Fig. 2 and Table 1.

FIGURE 2. Convolution and pooling in 1D CNN.

C. TWO-DIMENSIONAL CNN

Despite nearly three decades passing since the inception of the initial CNN, contemporary CNN structures still retain key characteristics of the original design, including convolutional and pooling layers. Furthermore, aside from a few modifications, the widely used training approach known as Back-Propagation has remained a shared feature since the 1990s. This section aims to offer a concise summary of traditional deep CNN while also introducing the fundamental concepts and foundational architectures from the past.

In a traditional Multi-Layer Perceptron (MLP) [46], [47], each hidden neuron encompasses singular weights, input, and output components. However, due to the two-dimensional structure of images, each neuron within a CNN encompasses two-dimensional weight planes, referred to as kernels, alongside input and output components known as feature maps. The foundational components of a representative CNN setup are designed for classifying grayscale images of 24 × 24 pixels into two categories. This specific network comprises two convolutional layers and two pooling layers, hosting 4 and 6 neurons, respectively. The final output from the last pooling layer is passed through a single dense layer, after which the result is produced by the output layer responsible for classification. The connections supplying the convolutional layers are governed by weight filters (w) characterized by a kernel size of (Kx × Ky). Convolution occurs while adhering to the image boundaries, leading to a reduction of (Kx - 1) pixels in width and (Ky - 1) pixels in height within the feature map dimensions. Pre-determined subsampling factors (Sx × Sy) are applied in the pooling layers. For the instance depicted in the figure, kernel sizes of 4 × 4 were chosen for the two convolutional layers, while subsampling factors of 3 × 3 and 4 × 4 were set for the first and second pooling layers, respectively. Importantly, these values were intentionally selected to ensure that the outputs of the final pooling layer (fed into the fully connected layer) are single scalar values (1 × 1): each spatial dimension shrinks as 24 → 21 → 7 → 4 → 1 through the two convolution and pooling stages. The output layer comprises two fully connected neurons, aligning with the number of categories to which the image is assigned.

D. ONE-DIMENSIONAL CNN

The traditional deep CNN discussed in the preceding section is specifically crafted to function on two-dimensional data types like images and videos. Consequently, they are commonly labelled as '2D CNN'. In contrast, a modified iteration of these, known as 1D CNN, has emerged as an alternative. Recent research underscores that 1D CNN offer distinct benefits and have gained preference over their 2D equivalents for handling one-dimensional signals in particular applications. This inclination towards 1D CNN is attributed to the following factors:

1) There exists a significant contrast in computational intricacies between 1D and 2D convolutions. For instance, when an image with dimensions N × N undergoes convolution with a K × K kernel, the resulting computational complexity is approximately O(N² × K²). In contrast, the corresponding 1D convolution, operating with the same dimensions (N and K), yields a complexity of around O(N × K). This implies that when subjected to equivalent conditions, encompassing matching configurations, networks, and hyperparameters, a 1D CNN exhibits markedly lower computational complexity than its 2D counterpart.

2) In many recent studies [45], [48], [49], a notable trend has emerged: the majority of 1D CNN applications tend to adopt compact configurations, often comprising just 1-2 hidden CNN layers, with networks featuring fewer than 10,000 (K) parameters. In contrast, nearly all 2D CNN applications lean towards "deep" architectures, housing over 1 million (M) parameters, commonly surpassing the 10 million mark. Evidently, networks with shallower architectures prove notably more manageable to train and implement.


3) Notably, the training of intricate 2D CNN generally necessitates specialized hardware setups such as cloud computing resources or GPU clusters. Conversely, executing training for compact 1D CNN with only a few hidden layers (e.g., 2 or fewer) and a limited number of neurons (e.g., less than 50) is feasible and relatively swift on standard computers utilizing the Central Processing Unit (CPU).

4) Due to their modest computational demands, compact 1D CNN are highly suitable for real-time and cost-effective applications, particularly in scenarios involving mobile or handheld devices.

FIGURE 3. Illustration of CNN with convolution and fully connected layers.

FIGURE 4. A sample of 1D CNN with 2 MLP layers.

III. PROPOSED METHOD

This paper presents a methodology to develop a model aimed at classifying data based on their scores. The process encompasses data acquisition, pre-processing, and dataset training to construct a deep learning model. Data acquisition involves collecting images of student answer sheets from the Coding and Compression course. The student answer sheets will be scanned with high quality to avoid noise that may interfere with the evaluation. Subsequently, the acquired data undergoes pre-processing, including cropping the images to the desired area and eliminating vertical and horizontal lines to refine the image quality. Following pre-processing, the dataset is trained using a CNN architecture employing 1D convolutional layers.

FIGURE 5. Flowchart 1D CNN for classification.

The data used consists of 30 images of essay exam answer sheets for Coding and Compression that students have filled out. The answer sheets used by the students are standardized with a specific layout so that all student answers will have the exact same layout. The entire dataset is divided into 12 question numbers, which will be used for training using deep learning techniques. Before feeding the data into the deep learning model, the entire dataset undergoes preprocessing to help the model recognize patterns more precisely and reduce the risk of overfitting.

A. DATA ACQUISITION

The data used is student handwriting on standardized answer sheets, as shown in Fig. 6. The answer sheet used has borders and boxes to fill in the work steps and final answer. To train the model, we utilized only the final answer boxes from the exam answer sheets. This required preprocessing the data to obtain the relevant images.

B. IMAGE PREPROCESSING

Before the data in possession is utilized as training material, a preliminary round of image processing is conducted to improve the accuracy of the subsequent deep-learning


model. This image manipulation encompasses procedures like cropping, eradicating both horizontal and vertical lines, and eliminating vacant spaces.

1) Cropping
Within the answer sheets that have already been completed by students, the focus for training using the deep learning algorithm pertains to the final answers situated in the lower-right corner of the answer sheets. Consequently, it becomes imperative to trim the image in order to extract this specific region. The cropping procedure employs the Python Imaging Library (PIL), where the coordinates, length, and breadth of the designated cropped area are stipulated. The differences between the images prior to and following the cropping process are shown in Figs. 6 and 7.

FIGURE 6. The standardized answer sheet.

FIGURE 7. Final answer area as the output of the cropping step.

2) Removing vertical and horizontal lines
Once the image has been trimmed to include only the final answer area, there are lingering vertical and horizontal lines that must be eradicated to acquire an image containing only the handwritten content. To address this, the Open Source Computer Vision Library (OpenCV) is employed to eliminate these lines. A comparison of the image before and after the removal of these lines is presented in Fig. 8.

FIGURE 8. Convolution and pooling in 1D CNN.

3) Removing vacant spaces
As a result of the border removal, there is still a vacant area surrounding the handwritten text. To extract solely the handwritten content, it becomes essential to remove this empty space using the OpenCV library [50]. The image depicted in Fig. 9 stands as the conclusive picture, ready to be employed as input for the deep learning model.

FIGURE 9. The Final image that is ready to train with 1D CNN.

C. TRAINING DATA WITH 1D CNN

The collected dataset will be labeled with the actual values of each question, then divided into training data, to be processed using a 1D CNN-based deep learning model, and testing data. This architecture is slightly different, in that the data previously input into the deep learning model as 2D images is transformed into 1D. The representation of the transformation of the image from 2D to 1D can be seen in Fig. 10. This transformation is achieved through the reshape method applied to the images, altering their dimensions. Once reshaped, the image dimensions will become (height × width × channels, 1), wherein each image pixel becomes an element in the 1D representation vector. In this case, if the image initially had dimensions of (32, 32, 1), it will have dimensions of (1024, 1) after the transformation. Afterwards, the data will be fed into a 1D CNN model with layers as shown in the following Table 1.

FIGURE 10. The transformation of 2D images into a 1D representation.

After constructing the deep learning model with 1D CNN, the next step is to determine appropriate hyperparameters. In developing this system, we utilized the Adam optimizer set at a learning rate of 0.002 and configured a batch size of 32. The loss function applied was 'categorical cross-entropy,' while 'accuracy' was chosen as the evaluation metric. After splitting the dataset, we used the training data to train it with a 1D CNN deep learning model. This training is conducted


separately for each exam question number, for all students, which will be classified based on its score. There are 12 exam question numbers that will be trained, including 1a, 1b, 1c, 1d, 2a, 3a, 3b, 3c, 4a, 4b, 5a, and 5b. An example of the dataset can be seen in Fig. 11.

TABLE 1. Model summary of 1D CNN.

FIGURE 11. Example of the dataset.

D. MODEL EVALUATION

Data split testing involves determining the ideal balance between training and test data to maximize model performance. An ideal partition is obtained when the system reaches a high accuracy level while maintaining a larger test dataset relative to the training dataset. The test data sizes used in testing data separation scenarios are 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, and 0.9.

Hyperparameter testing is the process of finding the optimal value of a hyperparameter. There are three hyperparameter testing scenarios, including the optimizer testing scenario, learning rate testing scenario, and batch size testing scenario. The types of optimizers that will be used in testing optimizer scenarios are Stochastic Gradient Descent (SGD), Adaptive Moment Estimation (Adam), and Root-Mean-Square Propagation (RMSProp) [51]. Meanwhile, the learning rates that will be compared for each type of optimizer are 0.002, 0.02, and 0.2.

Post-training and testing, we assessed the system's effectiveness by calculating performance metrics. Confusion matrix tables, such as Table 2, provide a comparative overview between actual and predicted classification values, thereby helping in analyzing system performance and identifying areas that need to be improved or optimized to improve the overall deep learning model.

Several system performance measurement parameters, commonly referred to as performance metrics, are evaluated in the design of this paper, including accuracy, precision, recall, and F1-score. In the formulas below, TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. The meaning of each metric is as follows:

1) Accuracy
Accuracy assesses the overall correctness of the model's outputs. It determines the proportion of accurate predictions relative to the total number of data points tested. Furthermore, accuracy signifies the system's ability to generate outputs that match their respective classes. The formula is as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}. \tag{1}$$

2) Precision
Precision evaluates the model's ability to correctly classify actual positive cases among the predicted positives. The calculation involves taking the number of true positives and dividing it by the overall count of positive predictions produced by the model, which is represented by the following formula:

$$\text{Precision} = \frac{TP}{TP + FP}. \tag{2}$$

3) Recall
Recall evaluates the model's effectiveness in rediscovering positive data. It quantifies the ratio of true positives to the total actual positive data. The formula is as follows:

$$\text{Recall} = \frac{TP}{TP + FN}. \tag{3}$$

4) F1-Score
The F1-score unifies two key metrics, precision and recall, to evaluate a model's effectiveness. It assesses how well the model identifies true positives while factoring in its ability to limit both false positives and false negatives. The formula is as follows:

$$\text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}. \tag{4}$$

5) Loss
The "loss" parameter signifies the system's inaccuracy in making predictions to classify an input. There are several ways to calculate loss; the cross-entropy formula is as follows:

$$H(P, Q) = -\sum_{i=1}^{N} P(i) \log\big(Q(i)\big). \tag{5}$$
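As a minimal sketch of how the quantities in (1)-(5) could be computed, the Python fragment below evaluates accuracy, precision, recall, and F1-score from confusion-matrix counts, and the cross-entropy between a target distribution P and an estimated distribution Q. The function names `classification_metrics` and `cross_entropy`, the example counts, and the small `eps` guard against log(0) are illustrative assumptions and not part of the system described in this paper. For a multi-class score classifier, the counts would presumably be taken per class in a one-vs-rest manner.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1-score following (1)-(4).

    tp, tn, fp, fn are confusion-matrix counts for one class.
    Ratios with a zero denominator are reported as 0.0 (an assumption
    made here so that the sketch never divides by zero).
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def cross_entropy(p, q, eps=1e-12):
    """Cross-entropy H(P, Q) = -sum_i P(i) * log(Q(i)) as in (5).

    p is the target distribution (e.g. a one-hot label) and q the
    estimated distribution; eps (hypothetical) avoids log(0).
    """
    return -sum(pi * math.log(max(qi, eps)) for pi, qi in zip(p, q))


# Hypothetical counts and distributions, for illustration only.
m = classification_metrics(tp=8, tn=7, fp=2, fn=3)
loss = cross_entropy([0.0, 1.0, 0.0], [0.1, 0.8, 0.1])
print(m, loss)
```

With a one-hot target, only the term of the true class contributes to the sum in (5), so the loss reduces to the negative logarithm of the probability assigned to the correct score class.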


In (5), H(·) is the cross-entropy function, P is the target distribution, Q is the estimated target distribution, i is the index over the dataset, and N is the number of data in the dataset.

TABLE 2. Model summary of 1D CNN.

IV. RESULT AND DISCUSSION

After completing the training of the data using 1D CNN, the next step involves evaluating the model's performance by calculating accuracy and loss. We vary the training-to-testing dataset ratio from 9:1 to 1:9 for all question numbers in order to identify the most optimal proportion between training and testing data. We can see the result in Fig. 14.

Once the optimal proportion is determined based on accuracy values, the deep learning model is evaluated to measure its reliability in predicting question scores per number. The evaluation results exhibit variations in outcomes for each question, as depicted in Figs. 12 and 13. In Fig. 12, the training outcome for question 3b shows a validation accuracy of 45.83%, while in Fig. 13, the training outcome for question 3c demonstrates a validation accuracy of 100%. From these findings, it can be inferred that the model possesses diverse capabilities in predicting question scores, contingent upon the complexity and characteristics of each individual question.

FIGURE 12. Accuracy and Loss in 3b.

FIGURE 13. Accuracy and Loss in 3c.

The evaluation process involves conducting training across all question numbers, employing different allocations of test data ranging from 1:9 to 9:1. This holistic method allows for evaluating the model's performance across different proportions and helps in determining the most optimal balance. The culmination of this endeavour is the determination of the optimal test data proportion that yields the highest model efficacy. The compiled results, illustrating the mean validation accuracy for each question, are presented in Fig. 14 for reference and analysis.

Fig. 14 depicts the average validation accuracy yielded by the model after training on each question number. Throughout the testing process, the training procedure is executed with 500 epochs, incorporating a callback function that automatically halts training when the training accuracy reaches 100% and the validation accuracy no longer demonstrates improvement.

For the test data proportion of 0.1, an average validation accuracy of 79.00% is obtained for each question number. This figure then increases as the percentage of test data is augmented, although the increments achieved are not substantial. The highest average validation accuracy is achieved with a test data size of 0.6, namely a validation accuracy of 81.18%. Among the tested data sizes, a 0.6 test data ratio consistently demonstrated the highest average validation accuracy, indicating its superiority in terms of both accuracy and overall system performance. A testing data size of 0.6 implies that only 40.00% of the questions need manual examination by the user. This indicates that the constructed system efficiently assists in assessing 60.00% of the overall essay answers from students.

The results of hyperparameter testing on the deep learning model architecture with 1D CNN can be seen in Table 3, Table 4 and Table 5. From the test results of the three optimizers, it can be concluded that the Adam optimizer produces the best performance in terms of the accuracy and loss values reported in Table 3, using question 3c as a sample. Regarding the learning rate experiments, a learning rate of 0.002 consistently outperformed others, achieving the highest validation accuracy and the lowest loss, suggesting its optimal performance. Finally, the batch size tests indicate that the optimal batch size is 32.

TABLE 3. Training results from different optimizers of 1D-CNN.

TABLE 4. Training results from different learning rates of 1D-CNN.

The selection of hyperparameters as shown in Table 3, Table 4 and Table 5 is supported by numerous studies that emphasize the benefits of each method [52], [53], [54]. The Adam optimizer, which merges the advantages of AdaGrad and RMSprop, facilitates rapid and stable convergence due

188224 VOLUME 12, 2024

N. G. Pasaribu et al.: Auto Evaluation for Essay Assessment Using a 1D CNN

FIGURE 14. The Validation Accuracy for Different Training-Testing Ratio within the Dataset.

TABLE 5. Training results from different batch size of 1D-CNN. The training outcomes for each question at the 40.00%
test data and 60.00% training data allocation. Table 6
displays training results with hyperparameters, such as Adam
optimizer, learning rate 0.002 and batch size 32. We use those
hyperparameters based on the training result from Fig. 14,
Table 3, Table 4, and Table 5.

TABLE 6. Test size variation and validation accuracy average.

to its adaptive learning rate adjustments. This characteristic

makes it particularly effective in managing sparse gradients
and noisy data [55]. Conversely, RMSprop is renowned for its
ability to control overshooting by adjusting the learning rate
based on squared gradients, which aids in achieving faster
convergence across various scenarios. Despite its slower
convergence rate, SGD with momentum remains highly
valued for its simplicity and efficacy in generalizing well
across different datasets.
The chosen learning rates (0.002, 0.02, and 0.2) are crucial
for examining how different rates impact model convergence
and performance. These specific values were selected to
investigate the effects of small to large step sizes in the
optimization process, given that variations in learning rate can Table 6 presents the training and testing results on
significantly influence the model’s ability to identify optimal 12 datasets, where each dataset represents a single question
weights. comprising 30 recordings. This reflects the class structure,
Additionally, varying the batch sizes (16, 32, and 64) helps which has 30 students. Each recording in every dataset
assess the trade-off between gradient estimate stability and includes an image captured by the student, along with
computational efficiency. Smaller batch sizes introduce more additional text-based information such as name and student
noise into gradient estimates, offering a form of regular- ID (NIM) inputted by the student through the application
ization, whereas larger batch sizes provide more accurate used. Thus, the acquired data provides a fairly good
gradient estimates, which can accelerate convergence. representation of the system’s performance in classifying
By tuning these hyperparameters, we aim to optimize the images and text-based information from students. A deeper
1D-CNN model’s training process and overall performance. examination of the training and testing outcomes can provide
This approach is supported by empirical evidence from valuable information about the system’s ability to process
various studies that have demonstrated the effectiveness of various types of data and the sophistication of the questions
these hyperparameters in multiple deep learning tasks. asked.
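The stopping rule described in this section (training capped at 500 epochs and halted once training accuracy reaches 100% and validation accuracy stops improving) can be sketched without any deep learning framework. The following is a minimal illustration only: the function name `should_stop`, the `patience` parameter, and the sample histories are hypothetical, since the paper does not state how many stalled epochs the callback tolerates.

```python
def should_stop(train_acc, val_acc, patience=5, max_epochs=500):
    """Return True if training should halt after the latest epoch.

    train_acc, val_acc: per-epoch accuracy histories in [0, 1].
    patience: assumed number of epochs without a new best validation
              accuracy before stopping (not specified in the paper).
    """
    epoch = len(train_acc)
    if epoch >= max_epochs:
        return True                      # epoch budget exhausted
    if train_acc[-1] < 1.0:
        return False                     # training accuracy not yet 100%
    if epoch <= patience:
        return False                     # too little history to judge a stall
    best_before = max(val_acc[:-patience])
    best_recent = max(val_acc[-patience:])
    return best_recent <= best_before    # no improvement in the last `patience` epochs


# Hypothetical histories: training accuracy saturates, validation accuracy stalls.
train_history = [0.50, 0.80, 1.00, 1.00, 1.00, 1.00]
val_history = [0.40, 0.60, 0.70, 0.70, 0.70, 0.70]
print(should_stop(train_history, val_history, patience=3))
```

Both conditions must hold before the rule fires, so a model that memorizes the training set early keeps training for as long as validation accuracy is still rising.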

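The adaptive step size attributed to Adam in the hyperparameter discussion can be illustrated with a scalar version of the update rule from [52]. This is a sketch under stated assumptions: the learning rate 0.002 is the value selected in this study, while `beta1`, `beta2`, and `eps` are the commonly cited defaults and are not reported in the paper; the function name `adam_step` and the toy objective are hypothetical.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter; returns (theta, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Toy objective f(theta) = theta**2 with gradient 2 * theta (illustrative only).
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    grad = 2 * theta
    theta, m, v = adam_step(theta, grad, m, v, t)
print(theta)
```

Because the gradient is divided by the square root of its own running second moment, the first step has a size close to `lr` regardless of how large the raw gradient is, which is the adaptive behaviour the text refers to.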

Based on the data in Table 6, the highest validation accuracy is achieved for question number 3c, reaching 100%, whereas the lowest validation accuracy is recorded for question number 3b, at 45.83%.
Table 7 and Table 8 present the results of classification analysis using a 1-5 scale, where classes 3, 4, and 5 represent specific classification levels. This classification is derived from manual correction, where answers can receive 3, 4, or 5 points based on the manual evaluation. After several answers are manually checked, the remaining unchecked answers are classified by the system into predefined classes on a scale from 1 to 5. The system evaluates the similarity of answers to the predefined classes to assign an appropriate classification score.
We therefore present the experiment results in detail for the question with the highest accuracy, 3c, and for the question with the lowest accuracy, 3b, to enable a thorough comparison. For the remaining questions, please refer to Table 6.
1) Question number 3c
Based on the training outcomes with a test data size of 0.6, the Adam optimization algorithm, a learning rate of 0.002, and a batch size of 32, the model reaches a perfect 100% validation accuracy when evaluated on question 3c. This indicates that the model accurately predicts all instances within the test dataset for question 3c, correctly assigning them to their respective classes. Furthermore, the model’s performance can be assessed through performance metrics, as indicated by the classification report and confusion matrix in Table 7.

TABLE 7. Example of a classification report for number 3c.

The classification report in Table 7 shows that precision, recall, and F1-score values of 1.00 are achieved for question number 3c. These values indicate that the model predicts flawlessly for both existing classes, namely class 3 and class 5. A precision of 1.00 signifies the absence of false positives (FP), meaning no negative cases are incorrectly predicted as positive, while a recall of 1.00 indicates the absence of false negatives, implying no positive cases are wrongly predicted as negative. The F1-score of 1.00 demonstrates that the model achieves a balanced performance between precision and recall.
This classification outcome is attributed to the fact that question 3c encompasses only two classes, specifically class 3 and class 5. Moreover, the data for question 3c is not excessively intricate, with a clear distinction between images in class 3 and those in class 5. Consequently, the model’s task of pattern recognition and classification remains relatively uncomplicated. Illustrations of an image from class 3 and an image from class 5 are presented in Fig. 15 and Fig. 16, respectively.

FIGURE 15. Sample image in class ’3’ for number 3c.

FIGURE 16. Sample image in class ’5’ for number 3c.

2) Question number 3b
Question number 3b exhibits the lowest validation accuracy of all the questions after training with a test data size of 0.6, the Adam optimization algorithm, a learning rate of 0.002, and a batch size of 32. The obtained validation accuracy value is 45.50%, indicating that the model struggles to classify the data effectively and accurately. The performance metrics are presented in Table 8.

TABLE 8. Example of a classification report for number 3b.

Question number 3b encompasses data across three classes: class 3, class 4, and class 5. The classification report and confusion matrix provide insights into precision, which is the ratio of true positives (TP) to all positive predictions (TP + FP). Precision reflects the model’s accuracy in predicting a specific class. For class 3, the precision is 0.38, indicating that around 38% of predictions classified as class 3 are accurate. Class 5 has a precision of 0.50, signifying that approximately 50% of predictions classified as class 5 are correct. Class 4 achieves a precision of 1.00, indicating that all predictions classified as class 4 are accurate.


In addition to precision, the classification report also reveals recall and F1-score values for this question. Class 3 has a recall of 0.75, meaning the model successfully detects 75% of the class 3 data, while class 5 has a recall of 0.43, implying the model correctly identifies 43% of the class 5 data. Class 4 exhibits a recall of 0.22, indicating that the model only manages to detect 22% of the class 4 data.
From the precision and recall data, F1-scores for each class can be derived. The F1-score is the harmonic mean of precision and recall. The F1-scores for each class are as follows: 0.50 for class 3, 0.46 for class 5, and 0.36 for class 4. In conclusion, the model performs poorly on this question, with relatively low F1-scores for each class. This can be attributed to various factors, one of which is image complexity. In question number 3b, the classified images are complex, and there is a degree of similarity between different classes, making it challenging for the model to discern patterns and leading to suboptimal classification. Images from each class can be observed in Fig. 17, Fig. 18, and Fig. 19.

FIGURE 17. Sample image in class ’3’ for number 3b.

FIGURE 18. Sample image in class ’4’ for number 3b.

FIGURE 19. Sample image in class ’5’ for number 3b.

This paper also presents comparison tables of scenarios (Table 9, Table 10, and Table 11), featuring four distinct experimental setups that vary in the optimizer, learning rate, and batch size parameters. These tables report the F1-score results across three different question numbers, drawn from data samples exhibiting both good and poor validation accuracies as per Table 6. Four methods were examined, combining the Adam and RMSprop optimizers with batch sizes of 32 and 64, based on the validation accuracy observations in Table 5. Furthermore, the evaluation outcomes in Table 3 showed that the Adam and RMSprop optimizers perform well in terms of validation accuracy and loss.

TABLE 9. Comparison scenario for number 3B.

TABLE 10. Comparison scenario for number 5A.

TABLE 11. Comparison scenario for number 3C.

An analysis of the validation accuracy patterns reveals distinct trends across scenarios. Scenario 4 consistently demonstrates superior validation accuracy compared to the other scenarios for question number 3b, while for question number 5a, Scenarios 2, 3, and 4 exhibit similarly high F1-scores. Specifically, for question 3b, RMSprop with a batch size of 64 yields the highest validation accuracy at 50%. Similarly, for question 5a, both the Adam and RMSprop optimizers with a batch size of 64 achieve the highest F1-score of 90%. Question 3c shows a perfect F1-score across all scenarios, indicating strong model performance irrespective of the optimizer or batch size used. The selection of the Adam and RMSprop optimizers with batch sizes of 32 and 64 is based on the F1-score values observed in previous experiments, which showed them to be effective. These insights guide the choice of parameter configurations, aiming for a balance between the speed of convergence and the accuracy of the model in the context of the research. In addition, the sample answers provided by students for questions 3b, 5a, and 3c are shown in Fig. 20, Fig. 21, and Fig. 22, respectively.

FIGURE 20. Sample Image for Number 3b.

FIGURE 21. Sample Image for Number 5a.

FIGURE 22. Sample Image for Number 3c.
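The per-class F1-scores quoted for question 3b can be checked directly from the precision and recall values given in the text, since F1 = 2PR / (P + R). The short sketch below does that check; the function and variable names are illustrative and not taken from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) for question 3b as quoted in the text.
reported = {
    "class 3": (0.38, 0.75),
    "class 4": (1.00, 0.22),
    "class 5": (0.50, 0.43),
}

f1_by_class = {name: round(f1_score(p, r), 2) for name, (p, r) in reported.items()}
print(f1_by_class)  # {'class 3': 0.5, 'class 4': 0.36, 'class 5': 0.46}
```

The results agree with the reported 0.50, 0.36, and 0.46. Class 4 shows why the harmonic mean is used: a perfect precision of 1.00 cannot compensate for a recall of 0.22.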

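The four comparison scenarios combine two optimizers with two batch sizes, which can be enumerated as a small grid. The sketch below is an assumption-laden illustration: the scenario numbering (1 to 4) follows the enumeration order here and is not confirmed by the paper, the helper `best_scenario` is hypothetical, and the example scores are placeholders rather than values from Tables 9 to 11.

```python
from itertools import product

optimizers = ["Adam", "RMSprop"]
batch_sizes = [32, 64]

# Assumed numbering: scenarios are labelled in enumeration order.
scenarios = {
    number: {"optimizer": opt, "batch_size": bs}
    for number, (opt, bs) in enumerate(product(optimizers, batch_sizes), start=1)
}


def best_scenario(scores):
    """Return the scenario number with the highest score (ties go to the lowest number)."""
    return max(sorted(scores), key=lambda number: scores[number])


# Placeholder scores for illustration only, not results from the paper.
example_scores = {1: 0.40, 2: 0.45, 3: 0.45, 4: 0.50}
winner = best_scenario(example_scores)
print(winner, scenarios[winner])
```

Keeping the grid explicit makes it easy to add a third axis, such as the learning rate, without changing the selection logic.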

TABLE 12. Comparison with previous research.

Regarding the comparison shown in Table 12, it should be acknowledged that the comparison is not entirely equivalent due to variations in datasets and data processing techniques. Nonetheless, our study offers capabilities beyond those of the other research. It combines various existing approaches, allowing assessment without the need to convert images to text and handling a broader range of data variations (both characters and alphabets). While other studies tend to be limited to one type of data or aspect, our research covers more comprehensive aspects, even though different methods and datasets are used.
The proposed method offers efficient grading without the additional conversion steps often found in other approaches, such as the work of [29], which achieved a test accuracy of 92.86% at Prince Mohammad Bin Fahd University. Another advantage is that our system can achieve competitive performance after only one initial training with a continually expanding dataset. This shows that our approach is clearly distinct and makes a valuable contribution to the development of automated assessment systems. Thus, while comparisons with other methods such as CNN-LSTM [30], Segmented HCR [31], and OHRT-AES [33] highlight their respective strengths, our approach adds a further dimension to the evolving landscape of automated assessment systems.
After comparing the proposed method with previous research, further analysis was conducted to compare this method against others using the same dataset. This comparison is presented in Table 13, which shows that the 1D CNN method achieved the highest accuracy and the lowest loss of the tested methods. These results support the claim that the 1D CNN method offers significant advantages in terms of accuracy and effectiveness. The findings indicate that the 1D CNN approach not only performs well but also contributes to the development of efficient and accurate automated assessment systems. This advantage underscores the relevance of the 1D CNN method in enhancing the quality and reliability of automated evaluations within the studied context.

TABLE 13. Comparison with other method.

FIGURE 23. Accuracy and Loss comparison 1D CNN with other methods.


V. CONCLUSION
This paper proposes a system for automatically assessing handwritten essay answers using a classification approach with a 1D CNN deep learning model. The dataset comprised handwritten essay answers from 30 students, addressing 12 question numbers. Following the image processing stages, the data was used to train the 1D CNN model. Training was conducted for all question numbers, with training-to-testing ratios ranging from 9:1 to 1:9.
Our analysis of the average validation accuracy results revealed that the optimal proportion between training and test data was achieved at 40% training data and 60% test data, yielding an average validation accuracy of 81.18%. However, validation accuracy varied across different questions, with some achieving high accuracy while others exhibited lower accuracy.
Specifically, question 3c emerged with the highest validation accuracy for training and F1-score for testing, achieving a validation accuracy of 100%. Conversely, question 3b demonstrated the lowest validation accuracy for training and F1-score for testing, with a validation accuracy of 45.50% and F1-score of 42%. This variance in validation accuracy across different questions can be attributed to factors such as the number of classes and image complexity.
In conclusion, this paper highlights the potential of employing deep learning models for automating the assessment of handwritten essay answers. Further research could explore strategies to improve accuracy, particularly for questions with lower validation accuracy, potentially enhancing the effectiveness of automated essay grading systems. Future work will include expanding the dataset by incorporating larger and more diverse data, adopting improved methods or algorithms to enhance performance, and training the model not only on final answers but also on the step-by-step reasoning process to capture a deeper understanding of student responses.

ACKNOWLEDGMENT
The authors would like to thank the Research and Community Service Directorate, Telkom University.

REFERENCES
[1] S. P. Jenal, D. Rana, and S. K. Pradhan, “A hand written digit recognition based learning Android application,” PalArch’s J. Archaeol. Egypt/Egyptol., vol. 17, no. 9, pp. 2151–2163, 2020.
[2] M. B. Bora, D. Daimary, K. Amitab, and D. Kandar, “Handwritten character recognition from images using CNN-ECOC,” Proc. Comput. Sci., vol. 167, pp. 2403–2409, Jan. 2020.
[3] P. Chakraborty, S. Roy, S. N. Sumaiya, and A. Sarker, “Handwritten character recognition from image using CNN,” in Micro-Electronics and Telecommunication Engineering. Cham, Switzerland: Springer, 2023, pp. 37–47.
[4] A. Abdulraheem and I. Y. Jung, “Effective digital technology enabling automatic recognition of special-type marking of expiry dates,” Sustainability, vol. 15, no. 17, p. 12915, Aug. 2023.
[5] A. Karim, P. Ghosh, A. A. Anjum, M. S. Junayed, Z. H. Md, K. M. Hasib, and A. N. B. Emran, “A comparative study of different deep learning model for recognition of handwriting digits,” in Proc. ICICNIS, 2020, pp. 857–866.
[6] H. Wang, P. Xu, and J. Zhao, “Improved KNN algorithms of spherical regions based on clustering and region division,” Alexandria Eng. J., vol. 61, no. 5, pp. 3571–3585, May 2022.
[7] M. Zhao, D. Li, Z. Shi, S. Du, P. Li, and J. Hu, “Blur feature extraction plus automatic KNN matting: A novel two stage blur region detection method for local motion blurred images,” IEEE Access, vol. 7, pp. 181142–181151, 2019.
[8] V. Abrol, H. G. Pathak, and A. Shukla, “Power of image-based digit recognition with machine learning,” in Proc. Int. Conf. Emergent Converging Technol. Biomed. Syst. Cham, Switzerland: Springer, 2023, pp. 323–336.
[9] A. T. Azar, A. Khamis, N. A. Kamal, and B. J. Galli, “Machine learning techniques for handwritten digit recognition,” in Proc. Int. Conf. Artif. Intell. Comput. Vis. (AICV). Cham, Switzerland: Springer, Jan. 2020, pp. 414–426.
[10] A. Yavariabdi, H. Kusetogullari, T. Celik, S. Thummanapally, S. Rijwan, and J. Hall, “CArDIS: A Swedish historical handwritten character and word dataset,” IEEE Access, vol. 10, pp. 55338–55349, 2022.
[11] F. Ott, D. Rügamer, L. Heublein, B. Bischl, and C. Mutschler, “Auxiliary cross-modal representation learning with triplet loss functions for online handwriting recognition,” IEEE Access, vol. 11, pp. 94148–94172, 2023.
[12] A. F. De Sousa Neto, B. L. D. Bezerra, E. B. Lima, and A. H. Toselli, “HDSR-flor: A robust end-to-end system to solve the handwritten digit string recognition problem in real complex scenarios,” IEEE Access, vol. 8, pp. 208543–208553, 2020.
[13] A. Gummaraju, A. K. B. Shenoy, and S. N. Pai, “Performance comparison of machine learning models for handwritten devanagari numerals classification,” IEEE Access, vol. 11, pp. 133363–133371, 2023.
[14] A. A. A. Ali and S. Mallaiah, “Intelligent handwritten recognition using hybrid CNN architectures based-SVM classifier with dropout,” J. King Saud Univ. Comput. Inf. Sci., vol. 34, no. 6, pp. 3294–3300, Jun. 2022.
[15] T. M. Ghanim, M. I. Khalil, and H. M. Abbas, “Comparative study on deep convolution neural networks DCNN-based offline Arabic handwriting recognition,” IEEE Access, vol. 8, pp. 95465–95482, 2020.
[16] H. Pan, Z. Pang, Y. Wang, Y. Wang, and L. Chen, “A new image recognition and classification method combining transfer learning algorithm and MobileNet model for welding defects,” IEEE Access, vol. 8, pp. 119951–119960, 2020.
[17] K. Kadam, S. Ahirrao, K. Kotecha, and S. Sahu, “Detection and localization of multiple image splicing using MobileNet V1,” IEEE Access, vol. 9, pp. 162499–162519, 2021.
[18] G. C. M. Silvestre, F. Balado, O. Akinremi, and M. Ramo, “Online gesture recognition using transformer and natural language processing,” 2023, arXiv:2305.03407.
[19] M. Ramo and G. C. M. Silvestre, “A transformer architecture for online gesture recognition of mathematical expressions,” in Proc. Irish Conf. Artif. Intell. Cognit. Sci. Cham, Switzerland: Springer, Jan. 2022, pp. 55–67.
[20] R. Rajesh and R. Kanimozhi, “Digitized exam paper evaluation,” in Proc. IEEE Int. Conf. Syst., Comput., Autom. Netw. (ICSCAN), Mar. 2019, pp. 1–5.
[21] X. Y. T. Gong, “A deep learning technology based OCR framework for recognition handwritten expression and text,” in Proc. CONVERTER, Jul. 2021, pp. 1–10.
[22] D. K. Tayal, S. Vij, G. Malik, and A. Singh, “An OCR based automated method for textual analysis of questionnaires,” Indian J. Comput. Sci. Eng., vol. 8, no. 2, pp. 184–191, 2017.
[23] P. Sijimol and S. M. Varghese, “Handwritten short answer evaluation system (HSAES),” Int. J. Sci. Res. Sci. Technol., vol. 4, no. 2, pp. 1514–1518, 2018.
[24] A. Ueda, W. Yang, and K. Sugiura, “Switching text-based image encoders for captioning images with text,” IEEE Access, vol. 11, pp. 55706–55715, 2023.
[25] I. H. El-Shal, O. M. Fahmy, and M. A. Elattar, “License plate image analysis empowered by generative adversarial neural networks (GANs),” IEEE Access, vol. 10, pp. 30846–30857, 2022.
[26] Q.-D. Nguyen, N.-M. Phan, P. Krömer, and D.-A. Le, “An efficient unsupervised approach for OCR error correction of Vietnamese OCR text,” IEEE Access, vol. 11, pp. 58406–58421, 2023.
[27] W. Ohyama, M. Suzuki, and S. Uchida, “Detecting mathematical expressions in scientific document images using a U-Net trained on a diverse dataset,” IEEE Access, vol. 7, pp. 144030–144042, 2019.
[28] Y. Jiang, H. Dong, and A. El Saddik, “Baidu meizu deep learning competition: Arithmetic operation recognition using end-to-end learning OCR technologies,” IEEE Access, vol. 6, pp. 60128–60136, 2018.


[29] E. Shaikh, I. Mohiuddin, A. Manzoor, G. Latif, and N. Mohammad, “Automated grading for handwritten answer sheets using convolutional neural networks,” in Proc. 2nd Int. Conf. new Trends Comput. Sci. (ICTCS), Oct. 2019, pp. 1–6.
[30] M. A. Rahaman and H. Mahmud, “Automated evaluation of handwritten answer script using deep learning approach,” Trans. Mach. Learn. Artif. Intell., vol. 10, no. 4, pp. 1–16, 2022.
[31] M. Alomran and D. Chai, “Automated scoring system for multiple choice test with quick feedback,” Int. J. Inf. Educ. Technol., vol. 8, no. 8, pp. 538–545, 2018.
[32] M. Chen, U. Challita, W. Saad, C. Yin, and M. Debbah, “Artificial neural networks-based machine learning for wireless networks: A tutorial,” IEEE Commun. Surveys Tuts., vol. 21, no. 4, pp. 3039–3071, 4th Quart., 2019.
[33] A. Sharma and D. B. Jayagopi, “Automated grading of handwritten essays,” in Proc. 16th Int. Conf. Frontiers Handwriting Recognit. (ICFHR), Aug. 2018, pp. 279–284.
[34] P. Darshni, B. S. Dhaliwal, R. Kumar, V. A. Balogun, S. Singh, and C. I. Pruncu, “Artificial neural network based character recognition using SciLab,” Multimedia Tools Appl., vol. 82, no. 2, pp. 2517–2538, Jan. 2023.
[35] A. Malhotra, M. Gupta, R. Yadav, P. Tokas, and A. Sharma, “ANN-based handwritten digit recognition and equation solver,” in Proceedings of Data Analytics Management, vol. 1. Cham, Switzerland: Springer, Jan. 2022, pp. 77–86.
[36] M. P. Singh and G. Singh, “Two phase learning technique in modular neural network for pattern classification of handwritten Hindi alphabets,” Mach. Learn. With Appl., vol. 6, Dec. 2021, Art. no. 100174.
[37] J. Fernández, J. Chiachío, J. Barros, M. Chiachío, and C. S. Kulkarni, “Physics-guided recurrent neural network trained with approximate Bayesian computation: A case study on structural response prognostics,” Rel. Eng. Syst. Saf., vol. 243, Mar. 2024, Art. no. 109822.
[38] W. Serrano, “The deep learning generative adversarial random neural network in data marketplaces: The digital creative,” Neural Netw., vol. 165, pp. 420–434, Aug. 2023.
[39] N. S. Cho, C. S. Kim, C. Park, and K. R. Park, “GAN-based blur restoration for finger wrinkle biometrics system,” IEEE Access, vol. 8, pp. 49857–49872, 2020.
[40] E. Osa, P. E. Orukpe, and U. Iruansi, “Design and implementation of a deep neural network approach for intrusion detection systems,” e-Prime Adv. Electr. Eng., Electron. Energy, vol. 7, Mar. 2024, Art. no. 100434.
[41] L. Xiaoxue, Z. Xiaofan, Y. Xin, L. Dan, W. He, Z. Bowen, Z. Bohan, Z. Di, and W. Liqun, “Review of medical data analysis based on spiking neural networks,” Proc. Comput. Sci., vol. 221, pp. 1527–1538, Jan. 2023.
[42] A. M. Durán-Rosal, A. Durán-Fernández, F. Fernández-Navarro, and M. Carbonero-Ruz, “A multi-class classification model with parametrized target outputs for randomized-based feedforward neural networks,” Appl. Soft Comput., vol. 133, Jan. 2023, Art. no. 109914.
[43] A. K. Jain, J. Mao, and K. M. Mohiuddin, “Artificial neural networks: A tutorial,” Computer, vol. 29, no. 3, pp. 31–44, Mar. 1996.
[44] C. B. Gonçalves, J. R. Souza, and H. Fernandes, “CNN architecture optimization using bio-inspired algorithms for breast cancer detection in infrared images,” Comput. Biol. Med., vol. 142, Mar. 2022, Art. no. 105205.
[45] X. Li, R. Lu, Q. Wang, J. Wang, X. Duan, Y. Sun, X. Li, and Y. Zhou, “One-dimensional convolutional neural network (1D-CNN) image reconstruction for electrical impedance tomography,” Rev. Sci. Instrum., vol. 91, no. 12, pp. 1–10, 2020.
[46] M. Desai and M. Shah, “An anatomization on breast cancer detection and diagnosis employing multi-layer perceptron neural network (MLP) and convolutional neural network (CNN),” Clin. eHealth, vol. 4, pp. 1–11, Jan. 2021.
[47] Z. Zhao, S. Xu, B. H. Kang, M. M. J. Kabir, Y. Liu, and R. Wasinger, “Investigation and improvement of multi-layer perceptron neural networks for credit scoring,” Expert Syst. Appl., vol. 42, no. 7, pp. 3508–3516, May 2015.
[48] J. Gan, W. Wang, and K. Lu, “A new perspective: Recognizing online handwritten Chinese characters via 1-dimensional CNN,” Inf. Sci., vol. 478, pp. 375–390, Apr. 2019.
[49] S. Kiranyaz, O. Avci, O. Abdeljaber, T. Ince, M. Gabbouj, and D. J. Inman, “1D convolutional neural networks and applications: A survey,” Mech. Syst. Signal Process., vol. 151, Apr. 2021, Art. no. 107398.
[50] A. Anand, S. De, A. Jaiswal, S. Agrawal, and B. C. Narendra, “Automation of non-classroom courses using machine learning techniques,” in Proc. 6th Int. Conf. Educ. Technol. (ICET), Oct. 2020, pp. 90–96.
[51] L. Lakshmi, A. N. Kalyani, D. K. Madhuri, S. Potluri, G. S. Pandey, S. Ali, M. I. Khan, F. A. Awwad, and E. A. A. Ismail, “Performance analysis of cycle GAN in photo to portrait transfiguration using deep learning optimizers,” IEEE Access, vol. 11, pp. 136541–136551, 2023.
[52] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” 2014, arXiv:1412.6980.
[53] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[54] D. Choi, C. J. Shallue, Z. Nado, J. Lee, C. J. Maddison, and G. E. Dahl, “On empirical comparisons of optimizers for deep learning,” 2019, arXiv:1910.05446.
[55] M. Reyad, A. M. Sarhan, and M. Arafa, “A modified Adam algorithm for deep neural network optimization,” Neural Comput. Appl., vol. 35, no. 23, pp. 17095–17112, Aug. 2023.

NOVALANZA GRECEA PASARIBU was born in Blora, Central Java, Indonesia. She received the bachelor’s degree in telecommunication engineering from Telkom University, in 2023, where she is currently pursuing the master’s degree in electrical engineering with a focus on signal processing. She is an enthusiastic individual with a high dedication to the field of signal processing. She has published one article in reputed international journals.

GELAR BUDIMAN (Member, IEEE) was born in Bandung, West Java, Indonesia, in 1978. He received the B.E. and M.E. degrees in electrical engineering from Sekolah Tinggi Teknologi Telkom (STT Telkom), in 2005, and the Ph.D. degree from the School of Electrical and Informatics Engineering, Institut Teknologi Bandung, in 2021. He began his academic career as a Lecturer at the Telecommunication Engineering Department, Institut Teknologi Telkom (IT Telkom), in 2008, which later evolved into Telkom University under the Faculty of Electrical Engineering. Since 2018, he has been an Assistant Professor. His academic expertise spans signal processing, information hiding, artificial intelligence, computer vision, wireless communications, quantum signal processing, and security. He has an extensive portfolio of research contributions, including ten papers published in reputed international journals, ten papers presented in Scopus-indexed international proceedings, nine intellectual property rights, and one authored book.

INDRARINI DYAH IRAWATI (Member, IEEE) was born in Karanganyar, Central Java, in 1978. She received the Ph.D. degree from the School of Electrical and Information Engineering, Institut Teknologi Bandung. She was with Telkom University as an Instructor, from 2007 to 2014, an Assistant Professor, from 2014 to 2019, and has been an Associate Professor, since 2019, and a Professor, since 2023. She has published 12 papers in reputed international journals, nine papers in Scopus international proceedings, ten intellectual property rights, and one book. Her main research interests include compressive sensing, watermarking, signal processing, artificial intelligence, the Internet of Things, and computer networks.

188230 VOLUME 12, 2024

