Sādhanā (2019) 44:141 © Indian Academy of Sciences
https://doi.org/10.1007/s12046-019-1126-9

Devanagari ancient documents recognition using statistical feature extraction techniques

SONIKA NARANG1, M K JINDAL2 and MUNISH KUMAR3,*
1 Department of Computer Science, DAV College, Abohar, India
2 Department of Computer Science and Applications, Panjab University Regional Centre, Muktsar, India
3 Department of Computational Sciences, Maharaja Ranjit Singh Punjab Technical University, Bathinda, India
e-mail: sonikanarang@davcollegeabohar.com; manishphd@rediffmail.com; munishcse@gmail.com

MS received 19 July 2018; revised 14 November 2018; accepted 4 March 2019; published online 13 May 2019

Abstract. Recognition of ancient Devanagari documents is drawing considerable attention from researchers. These ancient documents contain a wealth of knowledge. However, they are not available to all because of their fragile condition. A Devanagari ancient manuscript recognition system is designed for digital archiving. This system includes image binarization, character segmentation and recognition phases. It incorporates automatic recognition of scanned and segmented characters. Segmented characters may include basic characters (vowels and consonants), modifiers (matras) and various compound characters (characters formed by joining more than one basic character). In this paper, a recognition system for handwritten Devanagari ancient manuscripts is presented using statistical feature extraction techniques. In the feature extraction phase, intersection points, open endpoints, centroid, horizontal peak extent and vertical peak extent features are extracted. For classification, Convolutional Neural Network, Neural Network, Multilayer Perceptron, RBF-SVM and random forest techniques are considered in this work. Various feature extraction and classification techniques are considered and compared for the recognition of basic characters segmented from Devanagari ancient manuscripts. A data set of 6152 pre-segmented samples of Devanagari ancient documents is considered for experimental work. Authors have achieved 88.95% recognition accuracy using a combination of all features and a combination of all classifiers considered in this work by a simple majority voting scheme.

Keywords. Ancient manuscripts; Devanagari historical documents; feature extraction; classification.

*For correspondence

1. Introduction

The recognition of characters from scanned printed or handwritten documents can be made using one major area of pattern recognition called Optical Character Recognition (OCR). OCR involves many steps like image pre-processing and segmentation followed by feature extraction, classification and post-processing. Text recognition of ancient documents poses many challenges because of the degraded quality of documents. Degradation of ancient documents may be due to age or writing style. There may be documents with ink stains, faded ink, uneven space between text lines, overlapping of text lines or characters, different layouts, broken characters or torn pages. India has a very rich history and culture and ancient documents of India are a wealth of knowledge. These documents are preserved by libraries and museums, but the purpose of the preservation has not been served [1]. These documents are not available for public owing to their delicate condition. To preserve our cultural heritage and for automated processing of documents, libraries and national archives have initiated work on digitizing historical documents [1]. Hence, this work has motivated us to offer a system for recognition of Devanagari ancient documents. In this paper, authors have recognized characters of Devanagari ancient documents using various features, namely, intersection and open endpoints (F1), centroid (F2), horizontal peak extent (F3) and vertical peak extent (F4). For classification, authors have considered five classifiers, namely, MLP (C1), Neural Network (C2), Convolution Neural Network (CNN, C3), RBF-SVM (C4) and random forest (C5), for performing the classification task. This paper is structured as follows. Characteristics of Devanagari script are discussed in section 2. Prior work is presented in section 3. Section 4 describes the proposed methodology. Results and discussion are given in section 5. Concluding notes and future directions are presented in section 6.

2. Characteristics of Devanagari script

India is a multi-script and multi-lingual country. One of the most popular scripts in India is the Devanagari script. Devanagari is used to write Hindi, Marathi, Nepali and


Sanskrit languages. Devanagari has 11 vowels, 33 consonants and 3 common conjuncts. These are known as basic characters. We can write vowels as independent characters or by using some special diacritical marks. These diacritical marks are known as modifiers or matras. Characters formed using modifiers are called conjuncts. Some characters, formed by combining two or more consonants, are called compound characters. The set of basic Devanagari characters is delineated in tables 1 and 2. There is a horizontal line at the upper part of every Devanagari character, known as Shirorekha or headline. Various characters of the same word are joined by the headline. The headline poses a challenge while segmenting characters. There are many similarly shaped characters, which make character recognition a challenging problem [2].

Table 1. Vowels and corresponding modifiers.

Table 2. Consonants.

3. Related work

Kleber et al [3] presented a method to generalize missing parts of a degraded ancient document based on a priori knowledge. They introduced an algorithm for ruling estimation of Glagolitic texts that is based on the extraction of text lines. The proposed algorithm is appropriate for degraded manuscripts. For line extraction, they have considered the approach based on connected components.

Bansal and Sinha [4] developed a complete recognition system for text in Devanagari script. They observed real-life printed text in Devanagari that contained character fusions and a noisy environment. For feature extraction, they employed region coverage of the core strip, vertical bar feature, horizontal zero crossings, number of positions of the vertex points, moments and structural descriptors of the characters as features. By employing a decision tree classifier, they attained an accuracy of about 93.0%. Kim et al [5] presented a dedicated OCR system for Hanja historical documents. Sousa et al [6] proposed an OCR system based on fuzzy logic for ancient printed documents. Their proposed OCR builds fuzzy membership functions from oriented features extracted using Gabor filter banks. Cecotti and Belaid [7] presented a hybrid combination approach complemented by specialized ICR for ancient documents. They proposed a model for combining several OCRs and specialized intelligent character recognition based on the CNN. Diem and Sablatnig [8] presented a work to recognize degraded characters using local features. This work was done on ancient manuscripts where the characters were washed out (partially visible) due to age. Because of the washed-out characters, the manuscripts were not suitable for binarization. Hence, a segmentation-free approach based on local descriptors was developed. Local descriptors are classified using Support Vector Machine (SVM) and then identified by a voting scheme of neighbouring regional descriptors.

Raghuraj et al [9] presented a scheme to develop complete OCR for five different fonts and sizes of Devanagari characters. They used various approaches like matrix matching, fuzzy logic, feature extraction, structural analysis and neural networks. They used a histogram-based threshold approach to convert images to two-tone images, median filters for salt and pepper noise and derivative operators to enhance edges. They used three features: mean distance, histogram of projection based on the spatial position of pixels and the histogram of projection based on pixel value. They used an artificial neural network (ANN) approach for classification. Holambe et al [10] presented an overview of feature extraction and selection methods for recognition of numerals and characters of the Devanagari script. They used Zernike moments for feature extraction and a k-NN classifier based on Euclidean distance. Yadav et al [11] proposed an OCR system for printed Hindi recognition, using ANN. They used projection profiles for segmentation and histograms of projection based on mean distance, histogram of projection based on pixel value, vertical zero crossing for feature extraction and back-propagation neural network with two hidden layers for classification. They achieved a recognition rate of about 90.0%. Yunxue et al [12] introduced a restoration method for character images in order to recognize unconstrained handwritten Chinese characters. They modelled the character image by combining the ideal character image with two types of noise images, namely, omitted stroke noise image and added stroke noise image. Restoration is done in order to maintain the original gradient features. The determined features are then employed to differentiate similar characters.

Katiyar and Mehfuz [13] developed a hybrid recognition system to recognize offline handwritten characters. They merged multiple features extracted using seven different approaches. In order to optimize the number of features, a Genetic Algorithm is employed. An adaptive Multi-Layer Perceptron (MLP) classifier is employed for classification purpose. For experiments, the CEDAR (Centre of Excellence for Document Analysis and Recognition) database on English alphabet is used. Belhe et al [14] developed a recognition system for handwritten words in Hindi. For

recognition, HMM and tree classifiers are employed. As a result, a recognition accuracy of 89.0% based on 10,000 Hindi words has been attained. Lehal and Singh [15] worked on feature extraction and classification for OCR of Gurmukhi script. They developed two sets of features: Primary feature set and Secondary feature set. They used binary tree classifier and nearest neighbours for classification. In order to recognize offline handwritten Gurmukhi characters, Kumar et al [16] proposed a novel feature extraction technique. Various feature extraction techniques based on curvature features have also been proposed by them for recognition of offline characters handwritten in Gurmukhi script [17]. Kumar et al [18] also presented a survey for character and numeral recognition of various non-Indic and Indic scripts. Based on the related work, authors noticed that a lot of work has been done for printed and handwritten text recognition of different scripts. However, a text recognition system for ancient documents is not provided. Hence, in this paper, the authors have presented a Devanagari ancient character recognition system that combines multiple classifiers with the majority voting scheme.

4. Proposed system

For recognition of Devanagari ancient manuscripts, the proposed system consists of various phases like image acquisition, pre-processing, segmentation, feature extraction and classification. Block diagram of the proposed system is depicted in figure 1.

Figure 1. Block diagram of the proposed system (Digitization, Pre-processing, Segmentation, Feature extraction, Classification, Text Recognized).

4.1 Image acquisition and digitization

For this work, Devanagari ancient documents are collected from various libraries and museums. A sample of the Devanagari ancient document is shown in figure 2. Digitization means converting a paper-based document into an electronic form. The electronic conversion is obtained by scanning or by using a digital camera. A bitmap image of the original document is produced in this phase.

Figure 2. A sample of Devanagari ancient document.

4.2 Image pre-processing

In image pre-processing, three steps are considered. In the first step, the document image is enhanced using an auto-correct feature of Office image viewer. In the second step, the input image is transformed into a binary image. For binarization, the global threshold value is used. If the global threshold value does not give satisfactory results, then the local threshold value is used in the third step.

4.3 Segmentation

Segmentation phase is used to partition the input document into lines, words and characters.

4.3a Line segmentation: For line segmentation, piecewise projection profile is used. Devanagari ancient documents have a slant and touching/overlapping lines. Hence, authors have considered piecewise projection profiles to segment lines. In this method, the document image is partitioned into vertical strips and then piecewise horizontal projection profile (HPP) is used to segment lines. After this, average line height was used to check the correctness of segmentation. Based on average line height, some lines were found to be over-segmented and some lines were found to be under-segmented.
A general algorithm for line segmentation is given here.

Algorithm: for line_segmentation

Step 1: Divide the document image into fixed size vertical stripes.
Step 2: Calculate horizontal projection profile (HPP) in each row of each stripe.

Step 3: If HPP ≤ threshold value for any row, then that row is considered as piece-wise separating line (PSL).
Step 4: Consecutive PSLs are reduced to one PSL only.
Step 5: Average line height (avg_line_height) is computed.
Step 6: Based on avg_line_height, over-segmentation is detected and handled.
Step 7: Based on avg_line_height, under-segmentation is detected and handled.
Step 8: Finally, the lines are separated.

4.3b Word segmentation: For word segmentation, vertical projection profiles (VPP) are used. If the number of consecutive columns with VPP = 0 is greater than a given threshold value, then this is considered as a word boundary.

4.3c Character segmentation: It is very difficult to segment characters from a word because (i) neighbouring characters in a word may touch and (ii) neighbouring characters may not touch each other but they can overlap. Character segmentation in Devanagari documents becomes very easy if the headline (Shirorekha) is removed, but ancient Devanagari documents have a thick and uneven headline. Hence, it is very difficult to remove the headline from such documents. Characters are segmented without removing the headline. Characters of the document image are segmented in multiple iterations. In the first iteration, connected components in the document image are found. Owing to the writing style of ancient documents, most of the characters are already segmented correctly at this stage, because in most of the ancient documents there is a slight break in the headline after every character as shown in figure 3.
To find touching/overlapping characters, the aspect ratio of all the connected components was found. If the aspect ratio of any component is greater than the threshold value, it is identified as having touching characters.

Figure 3. Characters already segmented because of writing style.

4.3d Character normalization: Initially, all the segmented images of Devanagari ancient documents are normalized into 64 × 64 using Nearest Neighbourhood Interpolation (NNI) algorithm.

4.4 Feature extraction

In this work, authors have extracted four types of statistical features, namely, intersection and open endpoints, centroid, horizontal peak extent and vertical peak extent, for Devanagari ancient character recognition. Kumar et al [19] extracted various types of features for offline handwritten Gurmukhi character recognition and they presented a study of these features with different classifiers. They concluded that intersection and open endpoints, centroid, horizontal peak extent and vertical peak extent features perform better than other techniques for handwritten text recognition. A pixel that has more than one pixel in its neighbourhood is known as an intersection point. A pixel that has only one pixel in its neighbourhood is known as an open endpoint. The centroid is the point that can be considered as the centre of a two-dimensional image. In this work, authors have considered the centroid of foreground pixels in each zone of a character image as features. In horizontal peak extent based features, a character image is divided into 85 zones. Then the sum of successive black pixels in each row of a zone is computed. Maximum values (peak values) in each row of a zone are added. The resulting sum is the required feature value of that zone. Authors get a feature set of length 85 for 85 zones. Similarly, in features based on vertical peak extent the sum of successive foreground pixels in the vertical direction in each column of a zone is computed. Hence, authors considered these four techniques in the present work. These features are calculated on different zones of a character image of 64 × 64 size. Authors used the hierarchical zoning method to obtain zones in an image. For hierarchical zoning, first of all, the entire image is considered as one zone and features are calculated for this zone. After this, this image is divided into 4 zones of size 32 × 32 each and features of these 4 zones are calculated. Next, these 4 zones are further divided into 16 zones of size 16 × 16 each as depicted in figure 4. Features are calculated for these 16 zones. These 16 zones are further divided into 64 zones of size 8 × 8 each and features are calculated for these 64 zones. Hence, a feature vector of 85 (1 + 4 + 16 + 64) zones has been extracted.
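The hierarchical zoning, centroid and peak extent computations of section 4.4 can be illustrated with a short sketch. This is one reading of the text rather than the authors' implementation. It assumes a binary 64 × 64 image stored as nested Python lists with 1 for foreground, takes the peak extent of a row or column to be its longest run of consecutive foreground pixels, and returns (0.0, 0.0) as the centroid of an empty zone. All function names are illustrative.

```python
def hierarchical_zones(img, levels=4):
    """Split a square image into 1, 4, 16, 64, ... zones (coarse to fine)."""
    side = len(img)
    zones = []
    for level in range(levels):
        n = 2 ** level      # number of zones per side at this level
        step = side // n    # side length of one zone in pixels
        for r in range(n):
            for c in range(n):
                zones.append([row[c * step:(c + 1) * step]
                              for row in img[r * step:(r + 1) * step]])
    return zones


def longest_run(line):
    """Length of the longest run of consecutive foreground pixels."""
    best = run = 0
    for pixel in line:
        run = run + 1 if pixel else 0
        best = max(best, run)
    return best


def horizontal_peak_extent(zone):
    """Sum over rows of the longest foreground run in each row."""
    return sum(longest_run(row) for row in zone)


def vertical_peak_extent(zone):
    """Sum over columns of the longest foreground run in each column."""
    return sum(longest_run(col) for col in zip(*zone))


def centroid(zone):
    """Mean (row, column) of foreground pixels; (0.0, 0.0) if the zone is empty (assumption)."""
    points = [(r, c) for r, row in enumerate(zone)
              for c, value in enumerate(row) if value]
    if not points:
        return (0.0, 0.0)
    return (sum(r for r, _ in points) / len(points),
            sum(c for _, c in points) / len(points))


# Toy image: a single horizontal stroke of 20 pixels on row 10.
image = [[0] * 64 for _ in range(64)]
for col in range(5, 25):
    image[10][col] = 1

zones = hierarchical_zones(image)
features = [(horizontal_peak_extent(z), vertical_peak_extent(z), centroid(z))
            for z in zones]
```

For a 64 × 64 image the sketch produces 85 zones, so each feature type contributes one value (or one coordinate pair, for the centroid) per zone.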

Figure 4. A character image: (a) one zone, (b) 4 zones, (c) 16 zones and (d) 64 zones.
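The definitions of open endpoints and intersection points in section 4.4 amount to counting foreground neighbours of each foreground pixel. The sketch below is an illustration under stated assumptions, not the authors' code: it uses the 8-neighbourhood, expects a binary image as nested lists, and exposes the intersection threshold as a hypothetical parameter `min_neighbours`. The default of 2 follows the wording "more than one pixel"; on a one-pixel-wide skeleton, a value of 3 is a common stricter choice.

```python
def neighbour_count(img, r, c):
    """Number of foreground pixels among the 8 neighbours of (r, c)."""
    height, width = len(img), len(img[0])
    total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr or dc) and 0 <= r + dr < height and 0 <= c + dc < width:
                total += img[r + dr][c + dc]
    return total


def endpoints_and_intersections(img, min_neighbours=2):
    """Return (open endpoints, intersection points) as lists of (row, col)."""
    endpoints, intersections = [], []
    for r, row in enumerate(img):
        for c, value in enumerate(row):
            if not value:
                continue
            n = neighbour_count(img, r, c)
            if n == 1:
                endpoints.append((r, c))       # exactly one neighbour
            elif n >= min_neighbours:
                intersections.append((r, c))   # threshold is an assumption
    return endpoints, intersections


# Toy image: a horizontal stroke from column 1 to column 5 on row 2.
stroke = [[0] * 7 for _ in range(5)]
for col in range(1, 6):
    stroke[2][col] = 1

ends, crossings = endpoints_and_intersections(stroke)
```

To obtain zone-level features, the same function could be applied to each zone of the hierarchical zoning, for example by counting the points found in a zone; how the points are aggregated per zone is not specified in the text.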


4.5 Classification

Classification techniques are described in this section. Features extracted in the feature extraction stage are used for classification. Classification decides the class membership and thus makes the decisions in a character recognition system. For the classification task, MLP (C1), Neural Network (C2), CNNs (C3), RBF-SVM (C4) and random forest (C5) are considered. Finally, results are computed by a simple majority voting scheme with all the classification techniques considered in this paper. The following subsections briefly describe the classifiers used in the present work.

4.5a Multilayer perceptron: MLP is a feed-forward neural network. It has one or more layers between the input and output layer. In a feed-forward network, data flows from the input layer to the output layer (forward). The back-propagation learning algorithm is used to train such a network. Multi-Layer Perceptrons are used to solve non-linearly separable problems.

4.5b ANNs: ANNs can be considered as structures that contain adaptive simple processing elements called artificial neurons, which are tightly inter-connected. These artificial neurons have the capacity to perform extensively parallel computations for data processing and knowledge representation [22]. Neural networks are typically layered structures. There are interconnected nodes in each layer. Nodes contain an activation function. The input layer receives input data, the output layer gives classification results and the others are hidden layers present between the input and output layer. Connections between nodes have weights that are modified according to the input and output provided. Input is passed into the first layer. Individual neurons receive the inputs, with each of them receiving a specific value. After this, an output is produced based on these values. The outputs of the first layer are then passed into the second layer to be processed. This continues until the final output is produced. The assumption is that the correct output is predefined. The network, in effect, trains itself.

4.5c CNN: CNN is a supervised deep learning algorithm that is trained using the back-propagation algorithm. CNNs can extract features automatically. CNN is utilized to learn complex, high-dimensional data, and diverge on the basis of investigation of convolutional and sub-sampling layers [20]. CNNs are considered as the hierarchical architecture of MLPs where the succeeding alternating layers are designed. These layers are designed to learn successively higher-level features, and the classification results are produced by the last layer [21]. The two basic operations, namely, convolution and sub-sampling, are provided by the alternating layers of CNNs.

4.5d SVM: SVM is a widely used technique for classification. SVMs are used for supervised learning. They are based on statistics. SVM classification is used to assign data to various classes. Authors have used the SVM classifier for the same data set with linear, polynomial and RBF kernels. However, SVM with RBF kernel performs better than other kernels. Hence, in the present paper, authors have considered only the RBF kernel of SVM.

4.5e Random forests: A random forest can be considered as a collection of decision trees. Decision trees are white box models, which implies that the inner workings of these models are clearly understood. In the case of classification, the data are segregated based on a series of questions. Any new data point is assigned to the selected leaf node. In a decision tree, start at the root of the tree and use the decision algorithm to split the data on the feature resulting in the largest information gain (IG). This splitting procedure is then repeated in an iterative process at each child node until the leaves are classes. This means that the data samples at each node belong to the same class. From the training set, a sample of size n is drawn randomly. A decision tree from the bootstrap sample is grown. Grow 'k' such trees using a subset of samples. Aggregate the prediction of each tree for a new data point and use majority vote (pick the group selected by the most number of trees and assign the new data point to that group) to assign the class label.

5. Experimental results and discussion

In this work, authors have considered 6152 characters (34-class problem), which are segmented from Devanagari ancient manuscripts collected from various libraries and museums. In recognition of vowels and consonants of ancient Devanagari manuscripts, Intersection and open endpoint features (F1), Centroid features (F2), Horizontal peak extent features (F3), Vertical peak extent features (F4) and all of their possible combinations in serial mode are also used to improve recognition accuracy. For classification, MLP (C1), Neural Network (C2), CNN (C3), RBF-SVM (C4) and random forest (C5) are used. CNN and random forest perform better than other classification techniques considered in this work, because CNN can extract topological properties of an image and they are learnt with a version of the back-propagation algorithm. They can recognize patterns with extreme variability. Random forest classifier achieves the best recognition accuracy because initially it does efficient feature selection for classification. It then builds trees based on good features and favours these trees over other trees that are built based on noisy features. Experimental results are obtained using 5-fold cross-validation technique. In the MLP classifier, learning rate is set to 0.3 and the momentum is set to 0.2. In CNN classifier, authors have taken the patch size as 3 × 3 and pool size as 2 × 2. For F1 feature set, the accuracy achieved is 39.59%. For F2 features, accuracy improves to 66.95%. For F3 set of features, achieved recognition accuracy is 74.64% and for F4 feature set, accuracy is 60.78%. Different combinations of these features have been experimented to

Table 3. Confusion matrix based on a combination of all features (F1 + F2 + F3 + F4) and majority voting scheme.

Character  Total samples  Accurately recognized  Confused with characters
           56             27                     च(4) ह(4) क(10) म(1) प(3) व(7)
           21             2                      द(2) ह(6) र(9) व(2)
           30             13                     ह(3) र(7) स(5) य(2)
           277            268                    च(1) ह(3) ज(2) म(1) र(1) त(1)
           56             28                     च(1) क(2) म(17) न(6) र(2)
           16             0                      अ(1) ध(1) क(1) स(1) य(12)
           148            74                     ह(1) म(1) न(2) त(1) व(69)
           12             2                      ध(1) ह(5) प(1) थ(1) व(1) य(1)
           89             51                     क(2) ल(2) म(2) न(11) त(21)
           59             46                     द(3) ह(4) म(1) र(2) स(1) त(2)
           42             17                     द(3) ह(16) र(3) त(3)
           25             7                      ए(1) ह(5) क(4) ल(2) र(1) स(5)
           705            668                    ह(4) Õ(3) ल(3) म(4) न(18) र(5)
           84             40                     ह(2) क(1) प(10) त(1) व(3) य(27)
           202            142                    ह(13) न(1) प(2) र(15) ट(2) व(25) य(2)
           80             33                     ह(1) प(11) र(2) थ(1) व(8) य(24)
           498            437                    ध(1) क(1) म(9) र(4) त(36) व(10)
           354            292                    ध(2) ह(5) म(7) य(48)
           21             1                      च(1) ह(2) क(17)
           82             31                     म(35) न(7) स(1) त(7) य(1)
           594            559                    ह(1) क(1) न(8) प(1) र(4) स(8) त(12)
           410            362                    च(10) द(3) ध(1) ह(2) प(27) म(5)
           445            426                    द(3) ह(3) न(5) प(2) व(6)
           158            106                    द(1) ह(2) म(4) न(12) स(11) त(22)
           693            665                    च(5) द(1) ह(2) म(2) न(6) र(6) त(6)
           23             2                      ह(4) ज(1) क(5) म(2) स(9)
           54             3                      ह(1) म(4) प(25) र(3) व(9) य(9)
           352            304                    ह(3) क(1) ल(1) म(28) व(4) त(11)
           294            250                    द(5) क(1) र(7) स(6) त(21) व(4)
           36             17                     द(1) ह(2) क(2) न(1) स(8) त(4) व(1)
           87             40                     च(3) ह(6) ल(1) म(2) र(3) त(3) व(29)
           17             1                      ह(2) र(1) स(7) त(6)
           38             3                      अ(3) ह(4) प(6) त(1) थ(1) व(6) य(14)
           94             50                     ह(1) म(6) प(9) र(25) स(1) व(2)
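Table 3 reports the outcome of the simple majority voting scheme over classifiers C1 to C5. As a minimal sketch of that decision rule, assuming each classifier outputs exactly one label per character, the function below returns the label with the most votes. The tie-breaking rule (prefer the tied label that appears first in classifier order) is an assumption, since the text does not say how ties are resolved, and the function name is illustrative.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the largest number of classifiers.

    predictions: list of labels, one per classifier, in a fixed classifier order.
    Ties are broken in favour of the earliest classifier (an assumption).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label


# Hypothetical outputs of C1..C5 for a single character image.
votes = ["क", "क", "म", "क", "व"]
winner = majority_vote(votes)
```

Because the rule only counts labels, it needs no parameters beyond the trained classifiers themselves.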

Figure 5. Recognition results using different features and classifiers.

improve the accuracy. Authors have achieved the recognition accuracy of 79.41% using a combination of features F2 + F3 + F4 with MLP classifier. A recognition accuracy of 82.80% using a neural network classifier with a combination of all features (F1 + F2 + F3 + F4) has been achieved. The CNN is the best classifier in the field of computer vision and pattern recognition.
In this work, authors have considered LeNet architecture for CNN. Recognition accuracy of 78.87% has been achieved using a combination of all features (F1 + F2 + F3 + F4) and random forest classifier. Finally, the classifiers considered in this paper were combined by a majority voting scheme to test and improve the accuracy of Devanagari ancient character recognition. Simple majority voting is a decision rule that chooses one of many alternatives. This selection is based on the predicted classes with the majority votes. Once the training of individual classifiers is done, majority voting does not require tuning of any parameter [23]. Using a majority voting scheme of classifiers (C1–C5), the maximum recognition accuracy of 88.95% has been achieved with a combination of all features (F1 + F2 + F3 + F4). Confusion matrix for this case is depicted in table 3. These recognition results are illustrated in figure 5.
Kumar et al [24] used diagonal features, centroid features, horizontal peak extent and vertical peak extent features with hierarchical zoning (similar to proposed work) for offline handwritten Gurmukhi character recognition. They experimented with different combinations of features. They used SVM for classification. Table 4 depicts the comparison of their best accuracy on full feature set with present work.

Table 4. Comparison with existing work.

Authors           Database                   Classifier                                                             Accuracy (%)
Kumar et al [24]  Handwritten Gurmukhi text  Linear-SVM                                                             90.08
Proposed work     Devanagari ancient text    Simple majority voting on MLP, NN, CNN, decision tree, random forest  83.92

6. Conclusion and future directions

Authors have presented a character recognition system for the ancient Devanagari documents. Ancient documents were collected from libraries and museums and characters were segmented from these documents. For the recognition of these characters, 4 features are extracted. These features are intersection and open endpoints, centroid, horizontal peak extent and vertical peak extent. All the possible combinations of these features are used. Authors have used five classifiers, namely, MLP (C1), Neural Network (C2), CNN (C3), RBF-SVM (C4) and Random Forest (C5) classifiers. Finally, a simple majority voting scheme with different combinations of these classifiers is used. Maximum accuracy of 88.95% was achieved when all the features were combined and simple majority voting was used for classification. This work considers only basic characters of the Devanagari script. In the future, authors will include modifiers and conjuncts for recognition. Also, more efficient features will be extracted to increase the recognition accuracy and other classification techniques will be tried.

References

[1] Shah K R and Badgujar D D 2013 Devnagari handwritten character recognition (DHCR) for ancient documents: a review. In: Proceedings of the 2013 IEEE Conference on Information and Communication Technology. 656–660

[2] Sarkar R, Malakar S, Das N, Basu S and Nasipuri M 2010 A script independent technique for extraction of characters from handwritten word images. Int. J. Comput. Appl. 1(23): 83–88
[3] Kleber F, Sablatnig R, Gau M and Miklas H 2008 Ancient document analysis based on text line extraction. In: Proceedings of the 19th International Conference on Pattern Recognition. 1–4
[4] Bansal V and Sinha R M K 2001 A complete OCR for printed Hindi text in Devanagari script. In: Proceedings of the 6th International Conference on Document Analysis and Recognition. 800–804
[5] Kim M S, Jang M D, Choi H L, Rhee T H, Kim J H and Kwag H K 2004 Digitalizing scheme of handwritten Hanja historical documents. In: Proceedings of the First International Workshop on Document Image Analysis for Libraries. 321–327
[6] Sousa J M C, Pinto J R C, Ribeiro C S and Gil J M 2005 Ancient document recognition using fuzzy methods. In: Proceedings of the IEEE International Conference on Fuzzy Systems. 833–836
[7] Cecotti H and Belaid A 2005 Hybrid OCR combination approach complemented by a specialized ICR applied on ancient documents. In: Proceedings of the 8th International Conference on Document Analysis and Recognition. 1045–1049
[8] Diem M and Sablatnig R 2009 Recognition of degraded handwritten characters using local features. In: Proceedings of the 10th International Conference on Document Analysis and Recognition. 221–225
[9] Raghuraj S, Yadav C S, Verma P and Yadav V 2010 Optical Character Recognition (OCR) for printed Devanagari script using artificial neural network. Int. J. Computer Science & Communication 1(1): 91–95
[10] Holambe A N, Thool R C and Jagade S M 2011 A brief review and survey of feature extraction methods for Devnagari OCR. In: Proceedings of the 9th International Conference on ICT and Knowledge Engineering. 99–104
[11] Yadav D, Sánchez-Cuadrado S and Morato J 2013 OCR for Hindi language using a neural network approach. J. Inf. Process. Syst. 9(1): 117–140
[12] Yunxue S Y, Wang C and Xiao B 2015 A character image restoration method for unconstrained handwritten Chinese character recognition. Int. J. Doc. Anal. Recognit. 18(1): 73–86
[13] Katiyar G and Mehfuz S 2016 A hybrid recognition system for off-line handwritten characters. SpringerPlus 5: 1–18
[14] Belhe S, Paulzagade C, Deshmukh A, Jetley S and Mehrotra K 2012 Hindi handwritten word recognition using HMM and symbol tree. In: Proceedings of the Workshop on Document Analysis and Recognition (DAR). 9–14
[15] Lehal G S and Singh C 1999 Feature extraction and classification for OCR of Gurmukhi script. Vivek 12(2): 2–12
[16] Kumar M, Sharma R K and Jindal M K 2013 A novel feature extraction technique for offline handwritten Gurmukhi character recognition. IETE J. Res. 59(6): 687–692
[17] Kumar M, Sharma R K and Jindal M K 2014 Efficient feature extraction techniques for offline handwritten Gurmukhi character recognition. Natl. Acad. Sci. Lett. 37(4): 381–391
[18] Kumar M, Sharma R K and Jindal M K 2018 Character and numeral recognition for non-Indic and Indic scripts: a survey. Artif. Intell. Rev. https://doi.org/10.1007/s10462-017-9607-x
[19] Kumar M, Jindal M K and Sharma R K 2012 Offline handwritten Gurmukhi character recognition: study of different features and classifiers combinations. In: Proceedings of the International Workshop on Document Analysis and Recognition, IIT Bombay. 94–99
[20] Elleuch M, Maalej R and Kherallah M 2016 A new design based-SVM of the CNN classifier architecture with dropout for offline Arabic handwritten recognition. Procedia Computer Science 80: 1712–1723
[21] Lecun Y, Bottou L, Bengio Y and Haffner P 1998 Gradient-based learning applied to document recognition. Proc. IEEE 86(11): 2278–2324
[22] Niu X Y, Xia L Y, Wang T X and Zhang X Y 2010 Application of BP-ANN and LS-SVM to discrimination of rice origin based on trace metals. Proc. Int. Conf. Mach. Learn. Cybern. 3: 1426–1430
[23] Zhang Y, Liu B and Yang F 2016 Differential evolution based selective ensemble of extreme learning machine. In: IEEE Trustcom/BigDataSE/ISPA. 1327–1333
[24] Kumar M, Sharma R K and Jindal M K 2014 A novel hierarchical technique for offline handwritten Gurmukhi character recognition. Natl. Acad. Sci. Lett. 37(6): 567–572



