Article
Artificially Intelligent Readers: An Adaptive Framework for
Original Handwritten Numerical Digits Recognition with
OCR Methods
Parth Hasmukh Jain 1, Vivek Kumar 2,*, Jim Samuel 1, Sushmita Singh 3, Abhinay Mannepalli 1 and Richard Anderson 1
Abstract: Advanced artificial intelligence (AI) techniques have led to significant developments in optical character recognition (OCR) technologies. OCR applications, using AI techniques for transforming images of typed text, handwritten text, or other forms of text into machine-encoded text, provide a fair degree of accuracy for general text. However, even after decades of intensive research, creating OCR with human-like abilities has remained elusive. One of the challenges has been that OCR models trained on general text do not perform well on localized or personalized handwritten text due to differences in the writing style of alphabets and digits. This study discusses the steps needed to create an adaptive framework for OCR models, with the intent of exploring a reasonable method for customizing an OCR solution to a unique dataset of English-language numerical digits developed for this study. We develop a digit recognizer by training our model on the MNIST dataset with a convolutional neural network and contrast it with multiple models trained on combinations of the MNIST and custom digits. Using our methods, we observed results comparable with the baseline and provide recommendations for improving OCR accuracy for localized or personalized handwritten text. This study also provides an alternative perspective on generating data using conventional methods, which can serve as a gold standard for custom data augmentation to help address the challenges of scarce data and data imbalance.

Keywords: OCR; adaptive; custom; digits; MNIST; informatics; machine learning; deep learning
                                  OCR is increasingly used for automation and enhancement of workflow and business
processes. Organizations use OCR to manage handwritten and printed text, to identify document types, and to parse data according to complex rules. OCR also has social and societal functional capabilities; for example, OCR can provide an assistive solution for people with vision impairment by converting images of text into speech through text-to-speech systems [5]. One of the significant challenges faced by hospitals is the loss of patient medical
                            information. OCR technologies can support electronic medical records and health infor-
                            mation capture. Additionally, OCR can be used to automate information extraction from
                            test results for input into hospital information management systems (HIMS), supporting
                            the effective representation of patients’ medical histories. OCR tools such as Scayne and
                            Expensify are used in finance, accounting and business process management for functions
                            such as receipt recognition [6]. OCR is also used in loan processing, security and fraud
                            detection, government sector services such as scanning legacy identification documents and
automated license plate detection, and for supporting cultural and historical heritage initiatives. While fairly robust solutions are available for typed or printed machine-readable texts, automated handwriting recognition systems have yet to provide high levels of accuracy due to the challenges of variations in handwriting styles and the composition of visual structure.
                            Resolving these challenges remains an open problem, and improving handwriting OCR can
                            lead to very valuable insights for decision support through the inclusion of handwritten
                            data in the evaluation of public sentiment and other textual analytics solutions [7,8].
                                  It is, thus, evident that OCR is an important technology and AI capability. It is, there-
                            fore, important to improve the accuracy of OCR tools with handwritten text, especially for
                            specific handwriting styles, such as the handwriting of a particular doctor or a handwriting
                            style from a particular region. Our work contributes to the body of knowledge of OCR
                            methods by introducing the idea of fine-tuning MNIST with training data from a custom
                            dataset belonging to a particular style to improve the accuracy for that specific style of
                            handwritten digits. The model, method, and process used for OCR with such custom
                            personalized or localized data can be extended to any handwritten style. Our work can also
                            provide an alternative mechanism for generating custom data and conversion to MNIST
                            format using conventional methods and can serve as a standardized mechanism for data
                            augmentation to address the challenges associated with scarce data and data imbalance.
                            The main contributions of this paper are as follows:
                            •    We create a new handwritten custom dataset from a single contributor to ensure
                                 overall consistency of style.
                            •    We train and apply a convolutional neural network to MNIST data for validation and
                                 then extend the method with custom data.
•    Furthermore, we provide insights into the effects of training the model with variations in the proportions of custom data, with and without rotations.
                            •    We provide a framework for adaptive OCR applicable to handwritten documents
                                 from a single source or belonging to a particular style of writing.
                            •    Finally, we provide a comprehensive discussion of how the adaptive framework could
                                 be implemented and potential improvements with alternate technological solutions.
The rest of the paper is organized as follows. Section 2 presents the literature background of notable OCR methods. Section 3 states the problem and provides details of the dataset and the preprocessing strategies used for the experiments. Section 4
                            presents the stages involved in OCR and the architecture of the employed deep learning
                            (DL) approaches. Section 5 sums up the total experiments performed and presents the
                            experimental outcomes. Section 6 presents the analyses of the obtained results and the
                            limitations of the employed methods. Finally, Section 7 provides the concluding remarks
                            and the future research direction.
                            2. Related Works
                                 In this section, we review the existing works related to OCR. OCR systems can be
classified into two groups depending on the kind of input: handwriting recognition and machine-printed text recognition. Recent models based on the transformer architecture have shown promise in reading handwritten text, but the need to improve accuracy further remains [19].
                                   The quality of OCR-generated text can impact various downstream natural lan-
                            guage processing tasks, including NER and dependency parsing in certain workflows.
                            Retrieval results on OCR-generated text should be viewed cautiously and manual super-
                            vision is recommended, as they can result in false positives. Topic models can also be
                            affected by OCR quality in certain workflows, leading to increasingly divergent topics.
It is, therefore, recommended to use high-quality OCR output with accuracy of 90 percent or better [20].
                            A new architecture for OCR-based image captioning called Long Short-Term Memory
                            Relational Aware Pointer Network (LSTM-R) uses Long Short-Term Memory and a relation-
                            aware pointer network to incorporate the geometrical relationships between OCR tokens.
The relation-aware pointer network copies words from OCR tokens based on their geometric
                            relationship, and the model is optimized with multi-label loss. Experimental results demon-
                            strated the effectiveness of LSTM-R over benchmarks, achieving state-of-the-art results on
                            the TextCaps dataset [21]. It is also important to consider potential directions for developing
                            post-OCR processing approaches. It is possible to create artificial materials to train machine
                            learning models, utilize external resources for neural network-based models, improve error
                            detection and correction tasks, and analyze performance with different OCR errors; it is
                            also useful to focus on ‘handling errors’ involving word segmentation, consider ‘different
                            weights for evaluation metrics’, create datasets with more detailed information, and de-
                            velop ‘post-processing approaches for languages other than English’ [22]. Analysis by [23]
                            revealed that most of the OCR software is incapable of accurately detecting variations in
                            fonts and formats. Furthermore, while it can effectively convert text images, it fails to appro-
                            priately convert mathematical equations and symbols, providing only plain text as output.
This highlights the need for new and improved OCR tools to be developed in the future
                            that can effectively perform the identification of mathematical equations and symbols [23].
                                   According to [24], creating a dataset specific to a particular language and an associated
                            period is highly beneficial for OCR correction workflows. Interestingly, using only a
                            language-specific dataset based on the Bible created more errors than corrections. Ref. [24]
                            considered this as being similar to a person from the biblical era attempting to fix OCR
                            errors in modern texts. Furthermore, the error generation algorithm for Hebrew historical
                            newspapers required only 105 manually corrected articles, which is significantly less than
                            the amount of labeled training data needed for a neural network [24]. Ref. [25] showed
that Document AI and Textract had consistently lower error rates than Tesseract on various
                            documents with different noise levels and fonts in English and Arabic.
                                   Tesseract was more sensitive to noise and performed better only on noise-free doc-
                            uments in English. Further analysis by noise type showed that Textract and Tesseract
                            performed better on grayscale images than on color images and struggled with blur, and
                            Tesseract was more sensitive to salt and pepper noise. In conclusion, Document AI and
                            Textract had better OCR accuracy than Tesseract, and accuracy varied across languages and
                            noise types [25].
                                   It is also important to take a quick view of some of the state-of-the-art solutions
available currently. Popular OCR solutions include Tesseract, EasyOCR, Keras OCR,
                            PyPDF2, DocTr, and TrOCR. Tesseract was initially proposed in [26]. Tesseract was one of
the first architectures to handle white-on-black text with ease [26]. Tesseract was initially developed at HP between 1984 and 1994, was open-sourced in 2005, and its development was subsequently taken over by Google; it works well with TensorFlow. Tesseract solves the problem of text localization,
                            identifying where the text is in the document or an image [27]. Tesseract 4.00 has a new
neural network subsystem as a text line recognizer, which has its origins in [28]. EasyOCR is one of the most straightforward ways to implement OCR and can be used with just a few lines of code. It is a Python package maintained by Jaided AI [29]. It was released in July 2020 and received a recent update in June 2022. EasyOCR uses PyTorch and OpenCV in the backend [30].
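For illustration, a minimal EasyOCR invocation could look like the following sketch (the image path is a placeholder, not from the original study):

```python
# Minimal EasyOCR sketch: recognize English text in an image.
import easyocr

reader = easyocr.Reader(['en'])                   # loads the English detection/recognition models
results = reader.readtext('digits_sample.png')    # placeholder image path
for bbox, text, confidence in results:            # each result: (bounding box, text, confidence)
    print(text, confidence)
```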
Keras OCR [31] provides an end-to-end training pipeline for building new OCR models. It is a packaged and slightly improved version of the Keras CRNN implementation and the CRAFT text detection model, and it provides a very accurate API for training a text detection and OCR pipeline [32]. The Convolutional Recurrent Neural Network (CRNN) combines a CNN, an RNN, and a CTC (Connectionist Temporal Classification) loss for image-based sequence
                                recognition applications, such as scene text recognition and OCR [33]. The convolution is
used to extract the feature sequence from an input image. The recurrent neural network
                                predicts the label distribution for each frame. Furthermore, a transcription layer translates
                                the per-frame predictions into a final label sequence [34]. A convolutional neural network
                                is used to construct Character Region Awareness For Text (‘CRAFT’) Detection to provide
                                the character region and affinity scores. Individual characters are localized in the image
                                using the region score, and each character is grouped into one instance using the affinity
                                score [35].
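For illustration, the keras-ocr pipeline combining CRAFT detection with CRNN recognition can be sketched as follows (the image path is a placeholder):

```python
# Minimal keras-ocr sketch: CRAFT text detection followed by CRNN recognition.
import keras_ocr

pipeline = keras_ocr.pipeline.Pipeline()               # downloads pretrained detector and recognizer
images = [keras_ocr.tools.read('scanned_page.png')]    # placeholder image path
prediction_groups = pipeline.recognize(images)         # per image: list of (word, box) pairs
for word, box in prediction_groups[0]:
    print(word)
```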
PyPDF2 is a pure-Python library that Fenniak [36] created as a PDF toolkit. It can extract document information [37], split and merge documents page by page, crop pages, combine numerous pages into one, and encrypt and decrypt PDF files.
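A minimal sketch of these operations with recent versions of PyPDF2 (file names are placeholders):

```python
# Minimal PyPDF2 sketch: read metadata and copy a page into a new PDF.
from PyPDF2 import PdfReader, PdfWriter

reader = PdfReader('report.pdf')       # placeholder file name
print(reader.metadata)                 # document information
writer = PdfWriter()
writer.add_page(reader.pages[0])       # copy the first page
with open('first_page.pdf', 'wb') as f:
    writer.write(f)
```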
DocTR was published by Mindee in March 2021 [38], and its latest version was released in March 2022. It achieves end-to-end OCR in a two-stage approach: the first stage implements text detection using a ResNet-50 architecture, and the second stage performs text recognition using a CRNN. It can process both PDFs and images, handles rotated images, and can report text localization along with confidence scores; overall, it is well implemented and gives accurate results [39]. More recently, we have seen the
                                use of transformers for converting textual or handwritten data to machine-encoded text.
                                The transformer concept and architecture were first introduced in the famous paper
                                by [40], and since then, transformers have been increasingly used in NLP, computer vi-
sion, and other AI tasks. A Transformer architecture is a stack of encoder and decoder layers, with the encoder and decoder each composed of modules stacked on top of each other multiple times. The encoder converts the input text into numerical representations. Those numerical representations are fed into the decoder, and the decoder converts them into the desired output for further processing. Microsoft's TrOCR models are encoder–decoder models with a text Transformer as the decoder and an image Transformer as the encoder [41].
                                The Hugging Face version of the TrOCR model is popular because of its longevity and
simplicity. With many other implementations, it is observed that the same model or library cannot recognize both handwritten and typed text with high accuracy. The TrOCR model
                                can convert both text types into machine-encoded text. Furthermore, experiments have
                                shown that the TrOCR model outperforms the current state-of-the-art models on printed
and handwritten text recognition tasks. Table 1 lists the notable open-source OCR models.
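As an illustration of how such a model can be applied, the following is a minimal inference sketch using the Hugging Face transformers API (the checkpoint name and image path are illustrative choices, not the exact ones used in this study):

```python
# Minimal TrOCR inference sketch (checkpoint and image path are illustrative).
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

processor = TrOCRProcessor.from_pretrained('microsoft/trocr-base-handwritten')
model = VisionEncoderDecoderModel.from_pretrained('microsoft/trocr-base-handwritten')

image = Image.open('handwritten_line.png').convert('RGB')                 # placeholder path
pixel_values = processor(images=image, return_tensors='pt').pixel_values  # image encoder input
generated_ids = model.generate(pixel_values)                              # autoregressive decoding
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
```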
                            3.2. Datasets
                                For our work, we have used two datasets, namely the Modified National Institute of
                            Standards and Technology database (MNIST) and our own generated dataset. For ease of
                            understanding, we termed our dataset as the Custom dataset.
1.   MNIST: MNIST is a benchmark dataset [44] with a training set of 60,000 handwritten digits and a test set of 10,000 (10 class labels), with each example represented as a 28 × 28 gray-scale image (a loading sketch follows this list). Images are normalized to fit in a 20 × 20 pixel box while maintaining their aspect ratio and are centered in the 28 × 28 image. In tabular form, there is one column for each of the 28 × 28 = 784 pixels, plus one for the label. Pixel values are between 0 and 255, where 0 means black and 255 means white.
2.   Custom: The Custom dataset consists of 240 training digits, 40 testing digits, and 20 validation digits, ensuring that the model is trained on a diverse set of data and can generalize well to new data points.
The Custom dataset is also made available on our GitHub repository [45].
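For reference, MNIST can be loaded directly through the Keras datasets API; a minimal sketch:

```python
# Loading MNIST with Keras and scaling pixel values to [0, 1].
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.astype('float32') / 255.0   # 60,000 training images, 28 x 28 each
x_test = x_test.astype('float32') / 255.0     # 10,000 test images
print(x_train.shape, x_test.shape)            # (60000, 28, 28) (10000, 28, 28)
```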
Each pixel is first set to be black or white. A white canvas of 28 × 28 pixels, the standard size for all MNIST images, is created. The width and height of the image are then determined, and the image is resized such that the greatest dimension is 20 pixels, while the smaller dimension is scaled proportionally. To improve the image quality, the antialiasing filter of the Pillow library is applied, which smooths jagged edges by averaging the colors of the pixels at the boundary and sharpening the result. Subsequently, the resized image is pasted onto the 28 × 28 pixel white canvas, with a 4-pixel offset from the top or side along the largest dimension to center the picture; along the smallest dimension, the image is placed halfway between its scaled size and the canvas size of 28 to center it. After obtaining the pixel values of the new image, the values are adjusted to a range of 0 to 1, where 1 represents black and 0 represents white, using a formula to invert and normalize the values. As a result of these steps, the resulting image of the handwritten number is clear and accurate, making it easier to analyze and process. The original image (Figure 1b) was transformed into the MNIST-style image shown in Figure 1c using the process described above. Figure 1a and our transformed digit in Figure 1c are quite similar in appearance; however, our transformation process resulted in a sharper image compared to the original figure.
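A minimal sketch of this conversion with the Pillow library is given below; the function and file names are our own illustrations, and the exact implementation used for the Custom dataset may differ:

```python
# Sketch of converting a handwritten-digit image to MNIST style (illustrative names).
import numpy as np
from PIL import Image

def to_mnist_style(path):
    img = Image.open(path).convert('L')                  # grayscale
    w, h = img.size
    scale = 20.0 / max(w, h)                             # largest dimension becomes 20 px
    new_size = (max(1, round(w * scale)), max(1, round(h * scale)))
    img = img.resize(new_size, Image.LANCZOS)            # antialiased resize
    canvas = Image.new('L', (28, 28), 255)               # 28 x 28 white canvas
    left = (28 - new_size[0]) // 2                       # center horizontally
    top = (28 - new_size[1]) // 2                        # 4 px offset for the 20 px dimension
    canvas.paste(img, (left, top))
    arr = np.asarray(canvas, dtype='float32')
    return (255.0 - arr) / 255.0                         # invert and normalize: 1 = black, 0 = white

digit = to_mnist_style('digit_photo.png')                # placeholder path
```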
                                           Item                           Specification
                                           CPU                            Intel Core i3-7100 (-HT-MCP-) CPU @ 3.90 GHz
                                           GPU                            NVIDIA T4
                                           CUDA                           Version 10.1
                                           OS                             Ubuntu 17.10
                                           Python                         Version 3.10
                                           Tensorflow                     Version 2.12.0
For our experiments, we built a CNN architecture, as shown in Figure 3, with the following parameters (a code sketch follows the list):
                            1.   In our model, we stack multiple Conv2D layers to extract increasingly complex
                                 features from an input image.
                            2.   Each Conv2D layer has parameters such as the number of filters, kernel size, padding,
                                 and activation function, which are tuned to optimize the performance of the network.
                                 The Conv2D layers are used to build feature maps from the data.
3.   The kernel size of the filters is usually (3 × 3), and we use padding to ensure that the filters fit the input image. We also use the ReLU activation function to introduce nonlinearity in our model. Max-pooling is used to reduce dimensionality: in max-pooling, the output value is simply the maximum input value in each patch (for example, the maximum pixel in a span of three pixels).
4.    The next step is to flatten the output from the last pooling layer, because the fully-connected layer takes a 1D vector as input rather than a 2D matrix; to flatten means to convert a 2D matrix into a 1D vector.
                            5.    A fully-connected hidden layer is added to perform classification. The fully-connected
                                  layers combine the features extracted from the convolutional layers to create a model.
                                  Between fully-connected layers, dropout layers are added to remove specific neurons’
                                  contributions to the following layer while leaving the rest intact. Dropout layers are
                                  applied to reduce overfitting.
6.    Finally, a softmax activation function is used to classify the outputs as the digits 0, 1, 2, . . . , 9.
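The following Keras sketch is consistent with the description above; the filter counts, dense layer size, and dropout rate are illustrative values rather than the exact hyperparameters used:

```python
# Keras sketch of the described CNN (hyperparameter values are illustrative).
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), padding='same', activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),              # max-pooling reduces dimensionality
    layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                         # 2D feature maps -> 1D vector
    layers.Dense(128, activation='relu'),     # fully-connected hidden layer
    layers.Dropout(0.5),                      # reduce overfitting
    layers.Dense(10, activation='softmax'),   # digits 0..9
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```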
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

$$F1 = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{2}$$
where TP, FP, TN, and FN are true positives, false positives, true negatives, and false negatives, respectively.
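These metrics can be computed directly in code, for example with scikit-learn; in the sketch below, y_true and y_pred are placeholder arrays standing in for the test labels and model predictions:

```python
# Computing accuracy (Equation (1)) and F1 (Equation (2)) with scikit-learn.
from sklearn.metrics import accuracy_score, f1_score

y_true = [2, 0, 4, 4, 1]    # placeholder ground-truth digits
y_pred = [2, 0, 4, 5, 1]    # placeholder model predictions
print(accuracy_score(y_true, y_pred))               # Equation (1)
print(f1_score(y_true, y_pred, average='macro'))    # Equation (2), averaged over the classes
```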
                            5.1. Experiment 1
                                 In this experiment, we trained the CNN Model on 60,000 MNIST images and tested
                            the model on 10,000 MNIST test data points. The output from this experiment is not shown
                            in Table 4 because it was a standard experiment with MNIST data, and we only used it to
                            validate our base process.
                                 Results: Overall, the model performed well and generalized well to different data
points, with very few mispredicted labels, as shown in Figure 4a. Furthermore, Figure 4b depicts the loss and accuracy plotted against epochs, demonstrating that the validation accuracy and validation loss were consistent with the training accuracy and training loss.
                            The small gap between training and validation accuracy indicates that the model does
                            not overfit and can generalize well. This model will serve as our baseline for comparison
                            to other models. The model achieved an accuracy of 97.90% and a validation accuracy
                            of 99.18%.
Figure 4. Experiment 1 outcome ((a) classification results; (b) accuracy and loss graphs).
                            5.2. Experiment 2
                                 In this experiment, the model was trained on a dataset of 60,000 images from the
                            MNIST dataset and was then tested on our Custom test dataset consisting of 40 images. The
                            MNIST test set, which consists of 10,000 images, was used as the validation dataset for
                            this model.
                                 Results: To evaluate the model’s performance on our Custom dataset, we used the
                            same model as before and tested it on 40 images. The results showed that five of the
                            40 digits were misclassified, as seen in Figure 5. Notably, the digit “2” had the lowest
                            accuracy of classification.
                            5.3. Experiment 3
                                  The model was trained on a combined dataset comprising 60,000 images from the
                            MNIST dataset and 240 images from our Custom dataset. To prevent overfitting and ensure
                            that the model generalizes well, we used the MNIST test set, which contains 10,000 images,
                            along with 20 digits from our Custom validation dataset, as the validation dataset for this
                            model. After the training phase, we evaluated the model’s performance on our Custom test
                            dataset of 40 images.
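The combination step can be sketched as follows; the zero arrays merely stand in for the 240 Custom training digits, which are not reproduced here:

```python
# Sketch of combining MNIST with the Custom training digits (placeholder arrays).
import numpy as np
import tensorflow as tf

(x_mnist, y_mnist), _ = tf.keras.datasets.mnist.load_data()
x_custom = np.zeros((240, 28, 28), dtype=x_mnist.dtype)   # placeholder Custom digits
y_custom = np.zeros(240, dtype=y_mnist.dtype)             # placeholder Custom labels

x_combined = np.concatenate([x_mnist, x_custom])          # 60,000 + 240 images
y_combined = np.concatenate([y_mnist, y_custom])
perm = np.random.permutation(len(x_combined))             # shuffle so the sources are mixed
x_combined, y_combined = x_combined[perm], y_combined[perm]
```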
                                  Results: The evaluation of the model’s performance on our Custom test dataset re-
                            vealed that two of the 40 digits were misclassified, as shown in Figure 6a. Notably, the
                            digits “4” and “5” were the ones that were misclassified. The graph shown in Figure 6b
indicates that there is only a tiny gap between training and validation accuracy and loss, suggesting that the model has maintained its generalizability and can fine-tune itself on our Custom dataset, achieving a low number of misclassifications. The model achieved
                            an accuracy of 97.07% on the combined dataset and a validation accuracy of 98.91%.
                            These results suggest that the model performs well on both the MNIST and our Custom
                            datasets, indicating that it has learned relevant common features across different datasets.
                            These findings demonstrate the effectiveness of using a combined dataset and diverse
                            validation data to improve the model’s performance on new data points.
Figure 6. Experiment 3 outcome ((a) classification results; (b) accuracy and loss graphs).
                            5.4. Experiment 4
                                 This model was trained on a combined dataset of 60,000 images from the MNIST
                            dataset, 240 images from our Custom dataset, and 720 rotated digits. The rotated digits
were generated by applying a random rotation angle between −15 and 15 degrees to
                            our 240 digits. For validation, we used the MNIST test set, which contains 10,000 images,
                            and added 20 digits from our Custom validation dataset. Following the training phase, we
                            evaluated the model’s performance on our Custom test dataset of 40 images.
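The rotation augmentation can be sketched as follows; the function, the array names, and the use of scipy are our illustrative assumptions rather than the paper's exact procedure:

```python
# Sketch of generating randomly rotated copies of the Custom digits (illustrative).
import numpy as np
from scipy.ndimage import rotate

def augment_with_rotations(images, labels, copies=3, max_angle=15.0, seed=0):
    rng = np.random.default_rng(seed)
    out_images, out_labels = [], []
    for _ in range(copies):                  # 3 copies of 240 digits -> 720 rotated digits
        for img, lab in zip(images, labels):
            angle = rng.uniform(-max_angle, max_angle)     # random angle in [-15, 15] degrees
            out_images.append(rotate(img, angle, reshape=False,
                                     mode='constant', cval=0.0))  # fill corners with background
            out_labels.append(lab)
    return np.stack(out_images), np.array(out_labels)

# Placeholder arrays standing in for the 240 MNIST-style Custom digits.
x_custom = np.zeros((240, 28, 28), dtype='float32')
y_custom = np.zeros(240, dtype='int64')
x_rot, y_rot = augment_with_rotations(x_custom, y_custom)  # 720 rotated digits
```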
                                 Results: The evaluation of the model’s performance on our Custom test dataset re-
                            vealed that two of the 40 digits were misclassified, as shown in Figure 7a. As seen in the
                            graph presented in Figure 7b, there is a slightly increased gap between the training and
validation accuracy, which was expected due to the addition of our enlarged Custom dataset.
                            Although the model’s generalizability decreased slightly, it still performed well on the
                            Custom test dataset. The model achieved an accuracy of 96.30% on the combined dataset
                            and a validation accuracy of 98.83%.
Figure 7. Experiment 4 outcome ((a) classification results; (b) accuracy and loss graphs).
                            5.5. Experiment 5
                                 This model was trained on a combined dataset of 60,000 images from the MNIST
                            dataset, 240 images from our Custom dataset, and 9600 rotated digits. The rotated digits
were generated by applying a random rotation angle between −15 and 15 degrees to
                            our 240 digits. For validation, we used the MNIST test set, which contains 10,000 images,
                            and added 20 digits from our Custom validation dataset. Following the training phase, we
                            evaluated the model’s performance on our Custom test dataset of 40 images.
                                 Results: The evaluation of the model’s performance on our Custom test dataset re-
                            vealed that two of the 40 digits were misclassified, as shown in Figure 8a. As seen in the
graph presented in Figure 8b, there is a drastically increased gap between the training and validation accuracy, which was expected due to the addition of our larger Custom dataset.
                            Although the model’s generalizability decreased slightly, it still performed well on the
                            Custom test dataset. The model achieved an accuracy of 85.30% on the combined dataset
                            and a validation accuracy of 98.96%.
Figure 8. Experiment 5 outcome ((a) classification results; (b) accuracy and loss graphs).
                            5.6. Experiment 6
                                   The model used in this experiment is based on the transformer architecture and was
                            first introduced in the paper by Li et al. (2021) [41]. It was trained on the IAM dataset [49],
                            and the specific model used in this study is a pre-trained model fine-tuned on the MNIST
                            dataset accessible via [50]. On the MNIST dataset, the model achieved an accuracy of
                            99.52%. However, when tested on our dataset, it misclassified two digits, as seen in
                            Figure 9.
                            typed data and associated textual data distributions [56]. It would be very useful to
                            develop OCR solutions for handwritten documents to create a seamless integration with
                            NLP solutions, such as sentiment analysis and NLP-based socioeconomic modeling [57–59].
                            OCR is a mature discipline with industry-level solutions for identifying and ‘reading’
                            images of machine-printed text. However, due to the high degree of variations, OCR for
                            handwriting recognition needs additional work. Based on the early-stage success of TrOCR,
                            we believe there is significant potential for improving OCR solutions for handwritten text
                            with transformer-based applications. We intend to explore the potential of fine-tuning and
                            limited shot learning with pretrained transformer models to cater to user-specific digit
                            recognition needs. Furthermore, the additional use of AI methods and tools to mimic
                            human intelligence’s capability to identify text in varying colors, mixed sizes, and styles
                            and other complex forms holds great promise. As an applied direction for future research,
                            it is possible to use OCR methods to generate data for domains such as heritage culture
                            and preservation from images of historical texts. One of the most important research areas
                            to build upon would be the capability of an OCR application to be ‘flexible’ with custom
                            handwriting styles. Incorporating the on-demand flexibility of OCR models would be a
                            powerful way to advance the effectiveness of OCR with Custom data and variations in font
                            styles. We also aim to incorporate world knowledge in the form of triples to address the
                            domain adaptation challenges in identifying subtle sentiments [60].
                            Author Contributions: Conceptualization, P.H.J., V.K., J.S., S.S., A.M. and R.A.; methodology, P.H.J.,
                            V.K., J.S., S.S., A.M. and R.A.; software, P.H.J. and J.S.; formal analysis, P.H.J., V.K. and J.S.; inves-
                            tigation, P.H.J., V.K. and J.S.; resources, P.H.J., J.S. and A.M.; data curation, P.H.J., J.S. and A.M.;
                            writing—original draft preparation, P.H.J., V.K. and J.S.; writing—review and editing, P.H.J., V.K.,
J.S., S.S., A.M. and R.A.; visualization, P.H.J., V.K. and J.S.; supervision, V.K. and J.S.; project administration, J.S.; funding acquisition, J.S. All authors have read and agreed to the published version of
                            the manuscript.
                            Funding: This research was funded by the NJ State Policy Lab (https://policylab.rutgers.edu/ (ac-
                            cessed on 11 April 2023)) and the Public Informatics program at Bloustein School, Rutgers University
                            (https://bloustein.rutgers.edu/graduate/public-informatics/mpi/ (accessed on 11 April 2023)).
                            Institutional Review Board Statement: Not applicable.
                            Informed Consent Statement: Not applicable.
                            Data Availability Statement: The MNIST dataset is available at (https://archive-beta.ics.uci.edu/
                            dataset/683/mnist+database+of+handwritten+digits (accessed on 11 April 2023)) and our Custom
                            dataset is available at the Github repository (https://github.com/ay7n/OCR-RUCILDigits-4 (ac-
                            cessed on 11 April 2023)).
                            Conflicts of Interest: The authors declare no conflict of interest.
                            Abbreviations
                            The following abbreviations are used in this manuscript:
                            AI          Artificial Intelligence
CNN         Convolutional Neural Network
                            DL          Deep Learning
                            ML          Machine Learning
                            OCR         Optical Character Recognition
                            LSTM        Long Short-Term Memory
NLP         Natural Language Processing
                            RNN         Recurrent Neural Network
                            NER         Named-Entity Recognition
                            ReLU        Rectified Linear Unit
                            LSTM-R      Long Short-Term Memory plus Relation-aware pointer network
                            HMM         Hidden Markov Models
                            DNN         Deep Neural Network
References
1.    Samuel, J.; Kashyap, R.; Samuel, Y.; Pelaez, A. Adaptive cognitive fit: Artificial intelligence augmented management of
      information facets and representations. Int. J. Inf. Manag. 2022, 65, 102505. [CrossRef]
2.    Thorat, C.; Bhat, A.; Sawant, P.; Bartakke, I.; Shirsath, S. A detailed review on text extraction using optical character recognition.
      In ICT Analysis and Applications; Springer: Singapore, 2022; pp. 719–728.
3.    Singh, S. Optical character recognition techniques: A survey. J. Emerg. Trends Comput. Inf. Sci. 2013, 4, 2009–2015.
4.    Samuel, J. A call for proactive policies for informatics and artificial intelligence technologies. Scholars Strategy Network, 19
      December 2021.
5.    Srivastava, N.; Singh, S. Netra: Smart Hand Gloves Comprises Obstacle Detection, Object Identification & OCR Text to Speech
      Converter for Blinds. In Proceedings of the 2018 5th IEEE Uttar Pradesh Section International Conference on Electrical, Electronics
      and Computer Engineering (UPCON), Gorakhpur, India, 2–4 November 2018; pp. 1–4.
6.    Januszewski, A.; Kujawski, J.; Buchalska-Sugajska, N. Benefits of and obstacles to RPA implementation in accounting firms.
      Procedia Comput. Sci. 2021, 192, 4672–4680. [CrossRef]
7.    Samuel, J.; Rahman, M.M.; Ali, G.M.N.; Samuel, Y.; Pelaez, A.; Chong, P.H.J.; Yakubov, M. Feeling Positive About Reopening?
      New Normal Scenarios From COVID-19 US Reopen Sentiment Analytics. IEEE Access 2020, 8, 142173–142190. [CrossRef]
8.    Ali, G.M.N.; Rahman, M.M.; Hossain, M.A.; Rahman, M.S.; Paul, K.C.; Thill, J.C.; Samuel, J. Public perceptions of COVID-19
      vaccines: Policy implications from US spatiotemporal sentiment analytics. Healthcare 2021, 9, 1110. [CrossRef]
9.    Manwatkar, P.M.; Yadav, S.H. Text recognition from images. In Proceedings of the 2015 International Conference on Innovations
      in Information, Embedded and Communication Systems (ICIIECS), Coimbatore, India, 19–20 March 2015; pp. 1–6.
10.   Srihari, S.N.; Shekhawat, A.; Lam, S.W. Optical character recognition (OCR). In Encyclopedia of Computer Science; Wiley: London,
      UK, 2003; pp. 1326–1333.
11.   Srihari, S.N.; Kuebert, E.J. Integration of hand-written address interpretation technology into the united states postal service
      remote computer reader system. In Proceedings of the Fourth International Conference on Document Analysis and Recognition,
      Ulm, Germany, 18–20 August 1997; Volume 2, pp. 892–896.
12.   Kornai, A. An experimental HMM-based postal ocr system. In Proceedings of the 1997 IEEE International Conference on
      Acoustics, Speech, and Signal Processing, Munich, Germany, 21–24 April 1997; Volume 4, pp. 3177–3180.
13.   Ifhaam, M.; Jayalal, S. Sinhala handwritten postal address recognition for postal sorting. In Proceedings of the 2019 International
      Research Conference on Smart Computing and Systems Engineering (SCSE), Colombo, Sri Lanka, 28 March 2019; pp. 134–141.
14.   Patel, M.; Reddy, S.L. An impact of grid based approach in offline handwritten Kannada word recognition. In Proceedings of
      the 2014 International Conference on Contemporary Computing and Informatics (IC3I), Mysore, India, 27–29 November 2014;
      pp. 630–633.
15.   Nagy, G. Disruptive developments in document recognition. Pattern Recognit. Lett. 2016, 79, 106–112. [CrossRef]
16.   Faizullah, S.; Ayub, M.S.; Hussain, S.; Khan, M.A. A Survey of OCR in Arabic Language: Applications, Techniques, and
      Challenges. Appl. Sci. 2023, 13, 4584. [CrossRef]
17.   Al-Hadhrami, A.A.; Allen, M.; Moffatt, C.; Jones, A.E. National characteristics and variation in Arabic handwriting. Forensic Sci.
      Int. 2015, 247, 89–96. [CrossRef]
18.   Bhagyasree, P.; James, A.; Saravanan, C. A proposed framework for recognition of handwritten cursive english characters
      using DAG-CNN. In Proceedings of the 2019 1st International Conference on Innovations in Information and Communication
      Technology (ICIICT), Chennai, India, 25–26 April 2019; pp. 1–4.
19.   Bhunia, A.K.; Khan, S.; Cholakkal, H.; Anwer, R.M.; Khan, F.S.; Shah, M. Handwriting transformers. In Proceedings of the
      IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 1086–1094.
20.   Van Strien, D.; Beelen, K.; Ardanuy, M.C.; Hosseini, K.; McGillivray, B.; Colavizza, G. Assessing the impact of OCR quality on
      downstream NLP tasks. In Proceedings of the 12th International Conference on Agents and Artificial Intelligence, Valletta, Malta,
      22–24 February 2020.
21.   Wang, J.; Tang, J.; Yang, M.; Bai, X.; Luo, J. Improving OCR-based image captioning by incorporating geometrical relationship. In
      Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 1306–1315.
22.   Nguyen, T.T.H.; Jatowt, A.; Coustaty, M.; Doucet, A. Survey of post-OCR processing approaches. ACM Comput. Surv. (CSUR)
      2021, 54, 1–37. [CrossRef]
23.   Vijayarani, S.; Sakila, A. Performance comparison of OCR tools. Int. J. UbiComp (IJU) 2015, 6, 19–30.
24.   Suissa, O.; Elmalech, A.; Zhitomirsky-Geffet, M. Optimizing the Neural Network Training for OCR Error Correction of Historical
      Hebrew Texts; iConference 2020 Proceedings; iSchools Inc.: Grandville, MI, USA, 2020; pp. 1–10.
25.   Hegghammer, T. OCR with Tesseract, Amazon Textract, and Google Document AI: A benchmarking experiment. J. Comput. Soc.
      Sci. 2022, 5, 861–882. [CrossRef]
26.   Smith, R. An overview of the Tesseract OCR engine. In Proceedings of the Ninth International Conference on Document Analysis
      and Recognition (ICDAR 2007), Curitiba, Brazil, 23–26 September 2007; Volume 2, pp. 629–633.
27.   Ramiah, S.; Liong, T.Y.; Jayabalan, M. Detecting text based image with optical character recognition for English translation
      and speech using Android. In Proceedings of the 2015 IEEE Student Conference on Research and Development (SCOReD),
      Kuala Lumpur, Malaysia, 13–14 December 2015; pp. 272–277.
28.   Breuel, T.M. The OCRopus open source OCR system. Proc. Doc. Recognit. Retr. SPIE 2008, 6815, 120–134.
29.   Kittinaradorn, R. EasyOCR. 2020. Available online: https://github.com/JaidedAI/EasyOCR/tree/master (accessed on 11 April 2023).
30.   Awalgaonkar, N.; Bartakke, P.; Chaugule, R. Automatic license plate recognition system using ssd. In Proceedings of the 2021
      International Symposium of Asian Control Association on Intelligent Robotics and Industrial Automation (IRIA), Goa, India,
      20–22 September 2021; pp. 394–399.
31.   Troller, M. Practical OCR System Based on State of Art Neural Networks. Bachelor’s Thesis, Czech Technical University in Prague,
      Dejvice, Czech Republic, 2017.
32.   Alrasheed, N.; Prasanna, S.; Rowland, R.; Rao, P.; Grieco, V.; Wasserman, M. Evaluation of Deep Learning Techniques for Content
      Extraction in Spanish Colonial Notary Records. In Proceedings of the 3rd Workshop on Structuring and Understanding of
      Multimedia heritAge Contents, Virtual, 20 October 2021; pp. 23–30.
33.   Chen, Y.; Yang, J. Research on scene text recognition algorithm based on improved CRNN. In Proceedings of the 2020 4th
      International Conference on Digital Signal Processing, Chengdu, China, 19–21 June 2020; pp. 107–111.
34.   Shi, B.; Bai, X.; Yao, C. An end-to-end trainable neural network for image-based sequence recognition and its application to scene
      text recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 2298–2304. [CrossRef]
35.   Baek, Y.; Lee, B.; Han, D.; Yun, S.; Lee, H. Character region awareness for text detection. In Proceedings of the IEEE/CVF
      Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9365–9374.
36.   Fenniak, M. PyPDF2 · PyPI. Available online: https://pypi.org/project/PyPDF2/ (accessed on 4 April 2023).
37.   Kekare, A.; Jachak, A.; Gosavi, A.; Hanwate, P. Techniques for Detecting and Extracting Tabular Data from PDFs and Scanned
      Documents: A Survey. Tabula 2020, 7, 415–417.
38.   Mindee. docTR: Document Text Recognition. Available online: https://github.com/mindee/doctr (accessed on 4 April 2023).
39.   Batra, P.; Phalnikar, N.; Kurmi, D.; Tembhurne, J.; Sahare, P.; Diwan, T. OCR-MRD: Performance Analysis of Different Optical
      Character Recognition Engines for Medical Report Digitization. 2023. Available online: https://www.researchsquare.com/
      article/rs-2513255/v1 (accessed on 4 April 2023).
40.   Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All You Need.
arXiv 2017, arXiv:1706.03762.
41.   Li, M.; Lv, T.; Cui, L.; Lu, Y.; Florencio, D.; Zhang, C.; Li, Z.; Wei, F. Trocr: Transformer-based optical character recognition with
pre-trained models. arXiv 2021, arXiv:2109.10282.
42.   Smith, R.W. The Extraction and Recognition of Text from Multimedia Document Images. Ph.D. Thesis, University of Bristol,
      Bristol, UK, 1987.
43.   Ai, J. EasyOCR. Available online: https://github.com/JaidedAI/EasyOCR (accessed on 4 April 2023).
44.   Le Cun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998,
      86, 2278–2324. [CrossRef]
45.   Lab, R. OCR-RUCILDigits-4. 2022. Available online: https://github.com/ay7n/OCR-RUCILDigits-4 (accessed on 4 April 2023).
46.   Eikvil, L. Optical Character Recognition. 1993. Available online: http://home.nr.no/~eikvil/OCR.pdf (accessed on 4 April 2023).
47.   Agrawal, A.K.; Shrivas, A.; Kumar Awasthi, V. A Robust model for handwritten digit recognition using machine and deep
      learning technique. In Proceedings of the 2021 2nd International Conference for Emerging Technology (INCET), Belagavi, India,
      21–23 May 2021; pp. 1–4.
48.   Kumar, V.; Recupero, D.R.; Riboni, D.; Helaoui, R. Ensembling Classical Machine Learning and Deep Learning Approaches for
      Morbidity Identification From Clinical Notes. IEEE Access 2021, 9, 7107–7126. [CrossRef]
49.   Cheng, L.; Bing, L.; He, R.; Yu, Q.; Zhang, Y.; Si, L. IAM: A Comprehensive and Large-Scale Dataset for Integrated Argument
Mining Tasks. arXiv 2022, arXiv:2203.12257.
50.   Aico. TROCR Digit. Available online: https://huggingface.co/spaces/aico/TrOCR-digit (accessed on 4 April 2023).
51.   Karthick, K.; Ravindrakumar, K.; Francis, R.; Ilankannan, S. Steps involved in text recognition and recent research in OCR:
      A study. Int. J. Recent Technol. Eng. 2019, 8, 2277–3878.
52.   Yu, Y.; Si, X.; Hu, C.; Zhang, J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput.
      2019, 31, 1235–1270. [CrossRef] [PubMed]
53.   Williams, G.; Baxter, R.; He, H.; Hawkins, S.; Gu, L. A comparative study of RNN for outlier detection in data mining.
      In Proceedings of the 2002 IEEE International Conference on Data Mining, Maebashi City, Japan, 9–12 December 2002;
      pp. 709–712.
54.   Mueller-Gastell, J.; Sena, M.; Tan, C.Z. A Multi-Digit OCR System for Historical Records (Computer Vision). Available online:
      http://cs230.stanford.edu/projects_spring_2020/reports/38792124.pdf (accessed on 4 April 2023).
55.   Goodfellow, I.J.; Bulatov, Y.; Ibarz, J.; Arnoud, S.; Shet, V. Multi-digit number recognition from street view imagery using deep
convolutional neural networks. arXiv 2013, arXiv:1312.6082.
56.   Samuel, J.; Palle, R.; Soares, E.C. Textual Data Distributions: Kullback Leibler Textual Distributions Contrasts on GPT-2 Generated
      Texts, with Supervised, Unsupervised Learning on Vaccine & Market Topics & Sentiment. arXiv 2022, arXiv:2107.02025.
57.   Rahman, M.M.; Ali, G.M.N.; Li, X.J.; Samuel, J.; Paul, K.C.; Chong, P.H.; Yakubov, M. Socioeconomic factors analysis for COVID-19
      US reopening sentiment with Twitter and census data. Heliyon 2021, 7, e06200. [CrossRef]
58.   Samuel, J.; Ali, G.G.M.N.; Rahman, M.M.; Esawi, E.; Samuel, Y. COVID-19 public sentiment insights and machine learning for
      tweets classification. Information 2020, 11, 314. [CrossRef]
59.   Bhandari, A.; Kumar, V.; Thien Huong, P.T.; Thanh, D.N. Sentiment analysis of COVID-19 tweets: Leveraging stacked word
      embedding representation for identifying distinct classes within a sentiment. In Artificial Intelligence in Data and Big Data
      Processing, Proceedings of ICABDE 2021, Ho Chi Minh City, Vietnam, 18–19 December 2022; Springer: Berlin/Heidelberg, Germany,
      2022; pp. 341–352.
60.   Kumar, V.; Reforgiato Recupero, D.; Helaoui, R.; Riboni, D. K-LM: Knowledge Augmenting in Language Models Within the
      Scholarly Domain. IEEE Access 2022, 10, 91802–91815. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.