
IJCSNS International Journal of Computer Science and Network Security, VOL.16 No.9, September 2016

Segmentation Accuracy for Offline Arabic Handwritten Recognition Based on Bounding Box Algorithm
Ismail. A. Humied
Assistant Professor of Computer Science, Faculty of Police, Policy Academic, Ministry of interior, Sana'a, Yemen

ABSTRACT

Character segmentation plays an important role in Arabic optical character recognition (OCR) systems, because incorrectly segmented letters lead to unrecognized characters. The accuracy of character recognition depends mainly on the segmentation algorithm used. The domain of off-line handwriting in the Arabic script presents unique technical challenges and has been addressed more recently than other domains. Many different segmentation algorithms for off-line Arabic handwriting recognition have been proposed and applied to various types of word images. This paper presents a modified segmentation algorithm based on bounding boxes that improves segmentation accuracy using two main stages: a preprocessing stage and a segmentation stage. The preprocessing stage uses a set of methods (noise removal, binarization, skew correction, thinning and slant correction) that retain the shape of the character. The segmentation stage applies the modified bounding box algorithm. In this algorithm a distance analysis is performed on the bounding boxes of two kinds of connected components (CCs): main CCs and auxiliary CCs. The modified algorithm is presented and proceeds according to three cases. Cut points are also determined using structural features for character segmentation. The modified bounding box algorithm has been successfully tested on 450 images of Arabic handwritten words. The results were very promising, indicating the efficiency of the suggested approach.

Keywords:
Arabic OCR, Off-line Handwriting Segmentation, Connected Components, Pattern Recognition.

1. Introduction

Optical character recognition technology has grown from simple character recognition tools into widely used and specialized technologies capable of enhancing numerous business processes, and research on the recognition of handwritten letters has received increasing attention in recent years. Handwriting recognition is generally considered a difficult task because of the differences between handwritings and the irregularity of the writing of the same writer [1]. Although handwritten OCR has been researched for more than 50 years, there are still many open issues that must be resolved. Systems for off-line Arabic handwriting recognition still face most of these challenges. Due to the nature of Arabic characters, most published works are based on recognition of a whole word without segmentation [2]. On the other hand, the complete dominance of the Internet as the main source of knowledge requires a proper conversion of knowledge into an editable and searchable format, so that such priceless knowledge can be not only preserved but also mined for information [3]. Nowadays, OCR systems built upon segmentation-free algorithms are successfully put into service in a number of application areas, such as the automatic reading of postal addresses and bank checks and the processing of documents such as forms [4]. Given the importance and the difficulty of the segmentation problem, solving it would be a great achievement in the field of OCR applications [5]. Hence, the contribution of this article is a modified bounding box segmentation algorithm that improves segmentation accuracy.

The rest of the paper is organized as follows. Sect. 2 gives a brief review of related literature. Sect. 3 describes the concept of OCR and its phases. Sect. 4 describes Arabic writing characteristics, preprocessing, the methodology used in the segmentation of off-line handwritten Arabic characters, the modified bounding box segmentation algorithm, and cut points. Simulation results and discussions are presented in Sect. 5. Finally, conclusions are provided in Sect. 6.

2. Previous works

In the literature, multiple research works have reported on the segmentation of off-line Arabic words. One segmentation algorithm for Arabic handwritten words was proposed by Lorigo and Govindaraju in 2005. They proposed a new algorithm for segmenting Arabic handwritten words into sub-words. This algorithm over-segments the words and derives information near the baseline, an imaginary horizontal line [6]. Xiu, P. et al. in 2006 proposed a probabilistic segmentation model that performs contour-based over-segmentation of the text image and, as a result, produces a set of units called graphemes [7]. Abdulla, S. et al. in 2008 proposed an algorithm that begins with the assessment of strokes, where a stroke is the curve between any two structure points (end points or branch points), or with curved segments in words, and extracts the upper contour of the smoothed word image. Then

Manuscript received September 5, 2016

Manuscript revised September 20, 2016

using the chain-code representation of the upper contour, adjacent points are paired and the slope of the line connecting each pair is calculated [8]. AlKhateeb et al. in 2009 proposed a technique for word segmentation in which words are detected and extracted from Arabic handwriting. The technique is based on the distances between words and sub-words. The distances between connected components are measured and analyzed to determine the optimal threshold for word segmentation [9]. Al Hamad and Abu Zitar in 2010 proposed a segmentation algorithm and validation strategy for handwritten Arabic words. The technique segments a word into its primitives and uses an ANN to validate segmentation points on the basis of certain features. It works in three phases, starting from an over-segmentation obtained from the modified vertical histogram of the thinned word. First, a heuristic algorithm segments the Arabic word into primitives; then the structural features of the characters are extracted using the modified direction features method; finally, these features are passed to the ANN for training and testing, which validates the segmentation points [10]. Elzobi, M., et al. in 2011 suggested a segmentation algorithm with two phases. The algorithm starts with a preprocessing stage that handles issues such as word skew and slant correction. The second stage, segmentation, detects and resolves overlapping sub-words and then applies segmentation using topographical features through a set of heuristic rules [11]. Lawgali et al. in 2011 proposed a segmentation algorithm for Arabic handwritten words that exploits the segmentation points occurring between the end of one letter and the start of the next, which are located in the region surrounding the baseline. The algorithm starts by segmenting the word into sub-words and computing the baseline of every sub-word. It then deletes all descending sub-words that have a beginning point below the baseline. Vertical projection is used to find the candidate segmentation points [12]. Eraqi & Abdelazeem in 2012 combined neighborhood geometric characteristics and local writing direction information in a new efficient explicit method for offline Arabic handwriting segmentation, which segments the text into graphemes [13]. Samoud et al. in 2012 proposed two combined methods for segmenting Arabic handwritten words into characters. The first method is based on the analysis of contour minima and maxima and on the projection. The second is based on the Hough transform and mathematical morphology operators [14]. Al Hamad, H. in 2013 proposed fusion equations for improving the segmentation of Arabic words; this method has two phases. In the first phase the method applies the Arabic Heuristic Segmentation algorithm to place the prospective segmentation points in each part of the word image. In the second phase the method applies a neural-based segmentation technique to examine all prospective segmentation points and identify the invalid ones [15]. Osman, Y. in 2013 proposed an algorithm for the segmentation of handwritten Arabic words. The idea of this algorithm is to segment the image into lines and sub-words and then to trace each sub-word and its contour. The algorithm detects the points where the contour changes from a horizontal line to a vertical line [16]. Elnagar, A. & Bentrcia, R. in 2015 proposed an effective method for segmenting handwritten Arabic words. The method uses a multi-agent technique to segment words and relies on recognition to verify the validity of the candidate segmentation points. The use of an artificial neural network along with a compiled set of rules leads to good treatment of the over-segmentation problem in Arabic handwriting. This is due to an agent that takes the appropriate decisions to determine the candidate segmentation points. The segments are passed to recognition, and the rules and the agent pool are applied to the unrecognized segments before they are passed to recognition again [17].

In summary, researchers have proposed several algorithms for the segmentation of words. They typically use a simple algorithm based on the horizontal and vertical projections of the word image and search for minima to segment characters from words. The Arabic Heuristic Segmentation algorithm is used to segment a word into primitives; subsequently, the structural features of the characters are extracted using the Modified Direction Features technique. Another algorithm begins by segmenting the word into sub-words or connected components and then computes the baseline of every sub-word. A variant of that uses the projection of a segment across the baseline to avoid the problems of overlapping characters and holes. Some researchers use the minima of the upper profiles of words. Many of the algorithms presume that the characters are connected at the baseline. Other methods use the upper contour instead of projections. Many researchers over-segment the text and finalize the segmentation after recognition by combining segments until characters are formed; in this case they use all possible combinations of consecutive segments. Yet other algorithms thin the word or use its skeleton to simplify the segmentation. Building on these previous works, the bounding box segmentation algorithm is modified in this paper to improve segmentation accuracy for handwritten Arabic text recognition.
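The projection-based approach that this review names as the usual starting point can be illustrated with a minimal sketch. It is not the method of any cited work: it assumes a binary word image stored as a NumPy array with ink pixels equal to 1, and the names `vertical_projection`, `candidate_cut_columns` and the parameter `max_ink` are hypothetical.

```python
import numpy as np

def vertical_projection(binary_word):
    """Count ink pixels (assumed to be 1) in each column of the word image."""
    return np.asarray(binary_word).sum(axis=0)

def candidate_cut_columns(binary_word, max_ink=1):
    """Return columns that are local minima of the vertical projection.

    max_ink is a hypothetical tuning parameter (not taken from the paper):
    only minima with at most this many ink pixels are kept as candidates.
    """
    profile = vertical_projection(binary_word)
    cuts = []
    for x in range(1, len(profile) - 1):
        is_local_min = profile[x] <= profile[x - 1] and profile[x] <= profile[x + 1]
        if is_local_min and profile[x] <= max_ink:
            cuts.append(x)
    return cuts

# Two blobs joined by a one-pixel-thick stroke along the bottom row.
word = np.array([
    [1, 1, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
])
print(candidate_cut_columns(word))
```

Every column of the thin connecting stroke is reported as a candidate, which is one reason such methods tend to over-segment and need a later validation step.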

3. Optical Character Recognition

Optical character recognition (OCR) is a research field in pattern recognition, artificial intelligence and computer vision. It is widely used as a form of data entry from printed paper records such as passports, invoices, bank statements, computerized receipts, business cards, mail, printouts of static data, or any other suitable documents. It is a common way of digitizing printed text so that it can be electronically edited, searched, stored more compactly, displayed on the Internet, and used in processes such as machine translation, text-to-speech, and data and text mining. Some archives also use OCR as a means of transforming massive amounts of handwritten historical documents into searchable, easily accessible digital forms. In general, handwritten letter recognition systems are divided, according to how the data (image) is acquired, into two main kinds: on-line and off-line systems. An OCR system performs many phases one after another to carry out the whole task [4]: data acquisition, pre-processing, segmentation, feature extraction, classification, and post-processing. These phases are described as follows:

3.1 Data Acquisition

Any OCR system takes its input in one of two ways, online or offline. In online handwriting recognition, a special pen is used to write on a digital tablet, and the writing is stored in digital form. In the offline case, a handwritten word image is scanned and converted into a digital image. The word images experimented on are gray-scale images taken from a database under construction; a conventional flatbed scanner is used to capture the text at 350 dpi resolution. The test form includes 450 words. The set of words used includes all the forms of the Arabic characters in all positions. The images are saved in the PNG graphic file format rather than in other formats such as TIFF, BMP, JPEG, or GIF, since PNG files are relatively small with no loss in quality.

3.2 Pre-processing

The importance of the preprocessing phase of a character recognition system lies in its ability to address some of the problems that may occur as a result of certain factors. Pre-processing techniques enhance the image and prepare it for the next phase of the character recognition system. Preprocessing is a collection of processes applied to the digital image to enhance and smooth it so that the subsequent steps of character recognition become simple and accurate. It includes noise removal, binarization, skew correction, skeletonization (thinning) of the thresholded image, and slant correction. Skew is the tilt in the image that occurs during scanning if the paper is not fed straight into the scanner. Skeletonization reduces the stroke width in the image from many pixels to a single pixel [18]. After the above-mentioned imperfections are removed, the preprocessed image is used as input to the next phase. To achieve the highest recognition rates, an effective preprocessing phase is necessary; effective processing algorithms make the OCR system powerful mainly through precise image enhancement, noise removal, thresholding, thinning, and skew and slant correction, as described in detail in Section 4.

3.3 Segmentation

Segmentation is a very important stage in any recognition system. It includes the separation of text into lines, lines into words, and words into characters. Handwritten text poses many problems, such as touching characters, which lead to inappropriate segmentation, and segmentation errors can reduce the recognition rate. Therefore, efforts should be made to develop good segmentation techniques. Two techniques have been applied to divide printed and handwritten Arabic words into characters: explicit segmentation and implicit segmentation. In explicit segmentation, words are externally segmented into pseudo-letters, which are individually recognized. Implicit segmentation is usually designed with rules that try to identify all segmentation points in the image in order to segment words directly into letters. Implicit segmentation is performed by several methods, such as region-based segmentation, edge-based segmentation, threshold-based segmentation, clustering techniques, and the bounding box algorithm [19]. In this paper the segmentation is improved using the bounding box algorithm.

3.4 Feature Extraction

Feature extraction extracts useful information from the text that can be used for recognition; it is therefore very important to determine meaningful features. Feature extraction is done before the recognition of any character. The recognition accuracy of an OCR system also depends directly on the accuracy of feature extraction. Since handwritten characters vary greatly in slant and size, efforts should be made to determine slant- and size-invariant features. The key goal of feature extraction is to map the input image onto points in a feature space for the classification and recognition stage. Features may be separated into statistical and structural features. The

statistical features are extracted from the statistical distribution of pixels and describe the characteristic measurements of the input pattern. The structural features include the geometrical and topological features of an input image [20]. This paper uses topological features to extract useful information from the text image that can be used for recognition.

3.5 Classification

The classification phase is the main decision-making phase of any OCR system. It classifies an unknown character into one of several classes based on the extracted features. A class is a region of the feature space in which a particular character falls. Different algorithms are used to classify characters: pixel-based, statistical, structural, and neural network. Typical character classification systems classify each character image on the basis of the similarity of its feature vector to the character classes. There are various classifier structures for isolated handwritten character classification, such as simple linear classifiers, two-phase tree classifiers, and hierarchical classifiers. According to test results on handwritten characters, combining multiple classifiers is an effective way to produce extremely reliable classification decisions [21].

3.6 Post-processing

Post-processing is the main stage for correcting segmentation and classification errors without human intervention. Some characters that cannot be properly segmented can be recognized during post-processing, and the word can also be interpreted as a whole. The output of the classification process can also go through an error detection and correction phase. Post-processing includes dictionary look-up and the application of language-specific information to unrecognized words. Contextual post-processing based on lexical knowledge uses dictionary comparison as a top-down approach and statistical algorithms as a bottom-up approach. Finally, post-processing of OCR results can also take into account knowledge of the context of words [22].

4. Offline Handwritten Arabic Character Segmentation

This section first describes Arabic writing characteristics to help the reader fully understand the methodology used in the segmentation of off-line handwritten Arabic characters. It then describes, with examples, the main methods of the pre-processing phase of OCR systems mentioned in the previous section, namely noise removal, binarization, skew correction, thinning and slant correction. Finally, it explains the segmentation phase for off-line handwritten Arabic characters.

4.1 Arabic Writing Characteristics

The Arabic alphabet consists of 28 characters and has numerous characteristics. Arabic writing differs from English: Arabic is written from right to left and is generally cursive. The alphabet set can expand to 84 different forms depending on the location of the character in the word, in addition to the style of writing (Nasekh, Roqa'a, Farisi and others). Table 1 shows the different forms of Arabic characters depending on their location within the word. Of the 28 basic Arabic characters, six can be connected only on the right side: dal (د), raa (ر), waw (و), alef (ا), thal (ذ), and zay (ز), while the other 22 can be connected on both sides. Most of these six characters have only two forms, the stand-alone form and the last form, whereas the other characters can take any of four forms: beginning, middle, last, and stand-alone. Therefore, an Arabic word may contain one or more connected components [19].

The secondary components (dots) play an important role in Arabic characters. Many characters have a similar shape and differ only in the number and position of their dots, which may be placed either above or below the character, as in (ب, ت, ث). Two characters in the alphabet have three dots; three have two dots and ten have one dot. In a handwritten word, two dots can appear as two distinct dots or be connected into a line. The difficulty in recognizing the secondary components is due to fast writing, as writers draw them connected to the main body.

In Arabic, small marks such as the "hamza" may be located above or below five distinct characters or can appear as isolated characters. In addition to the alphabet there are so-called diacritic symbols, which are used to indicate vowels and are written in a very small size (compared to the letter size) above or below a letter.

The cursive nature of Arabic text means that the characters of a word are connected along an imaginary horizontal line known as the baseline. Arabic writing is cursive in printed type as well. However, it differs from cursive English handwriting in that several characters can be connected on one side only. Several Arabic characters have a loop, such as (ص, ف, و) [23].

Table 1. The different forms of Arabic characters depending on their location within the word.

Name    Isolated    First    Middle    Last
Alif    ا    ا    ا    ا
Baa    ب    بـ    ـبـ    ـب
Taa    ت    تـ    ـتـ    ـت
Thaa    ث    ثـ    ـثـ    ـث
Geem    ج    جـ    ـجـ    ـج
Hha    ح    حـ    ـحـ    ـح
Kha    خ    خـ    ـخـ    ـخ
Dal    د    د    ـد    ـد
Thal    ذ    ذ    ـذ    ـذ
Raa    ر    ر    ـر    ـر
Zain    ز    ز    ـز    ـز
Seen    س    سـ    ـسـ    ـس
Sheen    ش    شـ    ـشـ    ـش
Saad    ص    صـ    ـصـ    ـص
Dhad    ض    ضـ    ـضـ    ـض
Tta    ط    طـ    ـطـ    ـط
Zha    ظ    ظـ    ـظـ    ـظ
Ain    ع    عـ    ـعـ    ـع
Ghain    غ    غـ    ـغـ    ـغ
Faa    ف    فـ    ـفـ    ـف
Gaf    ق    قـ    ـقـ    ـق
Kaf    ك    كـ    ـكـ    ـك
Lam    ل    لـ    ـلـ    ـل
Meem    م    مـ    ـمـ    ـم
Noon    ن    نـ    ـنـ    ـن
Haa    ه    هـ    ـهـ    ـه
Waw    و    و    ـو    ـو
Yaa    ي    يـ    ـيـ    ـي

4.2 Pre-processing

The purpose of this phase is to enhance the readability of the text image and to remove details that have no discriminatory power in the recognition process. Pre-processing is a series of operations performed on the scanned input text image. It essentially makes the resulting image suitable for segmentation and includes noise removal, binarization, skew correction, thinning and slant correction, as described in the following [16, 21, 24]:

4.2.1 Noise Removal

The spatial noise model of concern here is "salt and pepper" noise, which is the most common noise model in OCR systems and image processing applications. Median filtering is a non-linear technique for removing noise from images, and it is especially efficient at removing "salt and pepper" noise [25]. In this research a median filter of size 3X3 is used to remove noise from the text image. Fig. 1 shows the noise removal in a word image using this technique.

Figure 1: Noise removal in a word image using median filtering technique with size 3X3 (a) Grey scale word image with noise. (b) Image (a) without noise.

4.2.2 Binarization

Global thresholding is used to convert the image into a binary image; it iteratively evaluates all possible threshold values and computes their variance [3]. The output binary image has the value 0 (black) for the foreground pixels and 1 (white) for all other (background) pixels [10]. Fig. 2 shows the binary word image of Fig. 1 (b).

Figure 2: The binary word image of Fig. 1 (b).

4.2.3 Skew correction and baseline estimation

Skew correction is based on the estimation of the line that the writer followed during the writing process. In this paper, this line is estimated by linear regression of the local minima of the word image skeleton (LMR) [26]. Since most of the local minima (LM) points usually occur on or near the baseline, the problem of finding the baseline can be reduced to a linear fitting problem over the local minima points. However, this point set also contains points from descending letters, which distort the baseline estimate. As a consequence, those spurious points have to be filtered out prior to the baseline estimation. A linear regression on the remaining points then estimates the skew of the baseline of the word image, and the skew is corrected by rotation. The baseline detection algorithm has two steps:

First step: a line is fitted to the local minima points according to the following equations:

$$y = a + bx \quad (1)$$

where a and b are coefficients calculated as follows:

$$a = \bar{y} - b\bar{x} \quad (2)$$

$$b = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} \quad (3)$$

where $\bar{x}$ and $\bar{y}$ are the statistical means of the x and y coordinates, respectively. The slope angle α of the fitted line is calculated according to the following equation:

$$\alpha = \arctan(b) \quad (4)$$
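As an illustration of Eqs. (1)-(4), the following minimal sketch fits the line to a list of local minima points and returns the slope angle. It assumes the spurious points from descending letters have already been filtered out; the function names `fit_baseline` and `skew_angle_degrees` are hypothetical and do not come from the paper.

```python
import math

def fit_baseline(points):
    """Least-squares line y = a + b*x through (x, y) points, Eqs. (1)-(3)."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    # Numerator and denominator of Eq. (3).
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all points share the same x coordinate")
    b = sxy / sxx
    a = y_mean - b * x_mean  # Eq. (2)
    return a, b

def skew_angle_degrees(points):
    """Slope angle alpha = arctan(b) of the fitted line, Eq. (4), in degrees."""
    _, b = fit_baseline(points)
    return math.degrees(math.atan(b))

# Hypothetical local minima lying exactly on the line y = 1 + x.
local_minima = [(0, 1), (1, 2), (2, 3)]
a, b = fit_baseline(local_minima)
angle = skew_angle_degrees(local_minima)
```

In practice the word image would then be rotated by the negative of this angle to correct the skew; the rotation itself is left out of the sketch.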

Second step: the baseline is computed within an area limited by a given range of θ. First, the parameters θ and ρ are discretized; then, for each point (x_i, y_i) in the image area, ρ' is calculated as stated in Eq. (5):

$$\rho' = x_i \sin\theta' + y_i \cos\theta' \quad \forall\, \theta' \in [\alpha - \varepsilon, \alpha + \varepsilon] \quad (5)$$

where ε is a constant used to offset the random error that can be produced in the first step. Experimentally, it was found that ε = 10° gives the most accurate results. Next, each point in the image space votes for the bins that could have generated it in the Hough accumulator A, and the votes are accumulated in A according to Eq. (6):

$$A(\rho', \theta') = A(\rho', \theta') + 1 \quad (6)$$

Finally, the ρ' and θ' with the maximum number of votes (the global maximum) are taken as the parameters of the word baseline, as shown in the following equation:

$$\arg\max_{\rho', \theta'} A(\rho', \theta') \quad (7)$$

Fig. 3 shows an example of the results, where Fig. 3(a) is the binary word image and its baseline estimation, and Fig. 3(b) is the skew corrected image with LMs: skew angle = -1.749.

Figure 3: Skew correction and baseline estimation, (a) Binary word image and its baseline estimation (b) Skew corrected using linear regression.

4.2.4 Thinning

Thinning is a process that reduces the foreground regions in a binary image to a skeletal remnant that largely preserves the extent and connectivity of the original region while discarding most of the original foreground pixels. It is commonly used in pattern recognition, digital image processing and image analysis. The thinning process is applied to the enhanced word images. Skeleton algorithms have proven effective in a wide range of image processing applications, including OCR. A skeleton algorithm finds a single-pixel-thick representation showing the centerlines of the text. Generally, for a skeletonization algorithm to be effective, it should ideally compress the data while retaining the important features of the pattern. For handwritten Arabic it is hard to find a robust and useful skeleton algorithm that retains the significant features of the pattern, due to the variety of handwritten Arabic writing styles. This paper uses a thinning algorithm based on the Zhang-Suen thinning algorithm [27]. The Zhang-Suen thinning algorithm for extracting the skeleton of a picture consists of removing all the contour points of the picture except those points that belong to the skeleton. In order to preserve the connectivity of the skeleton, each iteration is divided into two sub-iterations. In the first sub-iteration, the contour point P1 is deleted from the digital pattern if it satisfies the following conditions:

a) 2 <= N(P1) <= 6
b) S(P1) = 1
c) P2*P4*P6 = 0
d) P4*P6*P8 = 0

where S(P1) is the number of 01 patterns in the ordered sequence P2, P3, ..., P8, P9 and N(P1) is the number of nonzero neighbors of P1, that is,

N(P1) = P2 + P3 + ... + P8 + P9.

In the second sub-iteration, conditions (a) and (b) remain the same, but conditions (c) and (d) are changed to

c') P2*P4*P8 = 0
d') P2*P6*P8 = 0

The first step is applied to every pixel in the binary region under consideration. If one or more of the conditions (a) to (d) are violated, the value of the point in question is not changed. If all of the conditions are met, the point is flagged for deletion. It is important to note that the point is not deleted until all the points have been processed; this prevents the structure of the data from changing during the execution of the algorithm. After the first step has been applied to every pixel, the flagged points are deleted. The second step is then applied to the resulting data in exactly the same manner as the first step. Fig. 4 shows the thinned word image.

Figure 4: The thinned word image of Fig. 3(b)

4.2.5 Slant correction

A handwritten word is usually characterized by slanted characters. The slanted characters slope either from left to right or vice versa. Different deviations may appear not only within a word but also within a single character. Slant correction does not affect the connectivity of the word, and the resulting characters look natural. Slant is an individual variation in handwritten words; to lessen the consequences of this variation, the slant angle must be detected and shear normalization has to be done efficiently at the contour level. One of the measurable variables of handwriting is the slant angle between the longest strokes in a character and the vertical direction. Slant correction can be used to bring all characters into a regular form. Coordinates of the beginning and end-points

of each line component provide the slant angle. The algorithm used is one in which projection profiles are computed for a number of angles away from the vertical direction [28].

This paper uses the vertical projection histogram for slant correction [29]. The histogram of a word that is written upright has larger and more distinct peaks. The histogram of the word can therefore be examined at different shear angles, and the angle that gives the highest peaks is taken. This is done for angles between -45 and 45 degrees, which covers the most common slant angles in regular writing. For every angle, the vertical histogram is calculated and a measure function that scores the height of the peaks is applied. The angle with the highest measure wins and is used as the shear angle. To save time, the range is first scanned in large steps, say 5 degrees; the angles with the highest measures are identified, and the neighborhood of each of them is then searched with smaller steps.

The projection profiles are calculated on the horizontal gradient image at different shear angles in the range [±45] and used to estimate the slant angle. First, working on the horizontal gradient image emphasizes vertical strokes at the expense of horizontal ones; second, it reduces the computational cost, because relatively few pixels need to be processed. The extreme points of the contour of the handwritten Arabic word in the horizontal direction are determined. It is presumed that the sum of the absolute differences between the x coordinates of the left (or right) end points of five vertically successive runs, with the current run in the center, is an intrinsic indicator of the slope at the endpoint. Each pixel is sheared as

$$x' = x - y\tan(\alpha), \qquad y' = y \tag{8}$$

where α ∈ [±45] is the shearing angle. For each sheared image, the vertical histogram H is calculated as stated in Eq. 9:

$$H(x'_L; \alpha) = \sum_{k=0}^{\infty} i'(x'_L, y'_k) \tag{9}$$

A variation analysis is then applied to every histogram profile according to Eq. 10:

$$V(\alpha) = \sum_{L=0}^{\infty} \left[ H(x'_L; \alpha) - H(x'_L + 1; \alpha) \right]^2 \tag{10}$$

The shear angle is the angle associated with the maximum variation, according to Eq. 11:

$$\alpha' = \arg\max_{\alpha} V(\alpha) \tag{11}$$

Fig. 5 shows a slant-corrected example: Fig. 5(a-c) shows the binary word image of Fig. 2, the thinned word image of Fig. 4 and the slant-corrected version of Fig. 4, respectively, together with their corresponding vertical projection profiles.

Figure 5: Slant correction (a) A binary word image (Fig. 2), (b) The thinned word image (Fig. 4), (c) The slant corrected version of Fig. 4. The corresponding vertical projection histograms are on the left of the figure.

4.3 Segmentation

Following the preprocessing phase, character recognition systems segment the text to be recognized. Generally, segmentation of a binary text depends on re-grouping the connected components (CCs). Arabic writing is text in which words are divided by spaces, while a word might include many CCs, each of which is a part of the word containing one or more connected characters. The CCs of the word must therefore be determined. The objective of the CC phase is to form minimum-sized rectangles around all the connected objects in the image. The CCs are acquired by an iterative process that checks each black pixel for connectivity with another, and bounding rectangles are extended to enclose any group of connected black pixels. In this paper, the 8-neighborhood is used to extract the connected components by scanning the image pixel by pixel and checking for pixel connectivity, and segmentation is improved using the bounding box algorithm as mentioned above.
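The pixel-by-pixel scan with 8-neighbor connectivity and growing bounding rectangles can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the implementation used in the paper: the image is assumed to be a list of rows in which 1 marks an ink pixel, and the function name `connected_components` and the box layout `(top, left, bottom, right)` are hypothetical choices made here.

```python
from collections import deque


def connected_components(image):
    """Return one bounding box per 8-connected group of foreground pixels.

    `image` is a list of rows; 1 marks a foreground (ink) pixel, 0 background.
    Each box is (top, left, bottom, right) with inclusive coordinates.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search over the 8-neighborhood of each pixel,
            # extending the rectangle as new connected pixels are reached.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Toy word image: a lone dot, a diagonal pair and a small three-pixel stroke.
word = [
    [0, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 0],
]
print(connected_components(word))
```

In the toy image the pixels at (1, 0) and (2, 1) touch only diagonally, so they form a single component under 8-connectivity; with 4-connectivity they would be reported as two.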

Firstly, let P denote a foreground pixel in the skeleton word g(x, y), and consider the 8-neighborhood set of P. Second, by analyzing every P ∈ g(x, y), a set of feature points is identified, called the main feature points (MFPs). Third, the bounding box segmentation algorithm is applied. Finally, a new set called the cut points (CP) is generated. These steps are described in the following sub-sections.

4.3.1 The Main Feature Points

Structural and statistical features are the most commonly used features in character recognition. Choosing the type of features and extracting them from the characters is a very critical step. Feature extraction transforms the two-dimensional image into a set of vectors, so that each letter is represented by a set of numerical values that is passed on to the recognition stage. Since the words are represented in the system by their skeletal pattern, most of the topological features are suitable for this representation [30]. The topological features chosen are the loop point, branch point, dot point and end point, all of which operate on a skeletonized word and are described in more detail in the following:

(i) Loop Points (LP)

First, the starting locations are converted to linear indices, because linear indices can be used to extract the pixel values at all of the locations. The eight neighbors N8(P) are then selected by computing the index offsets of all the neighbors of a pixel. The north, east, south and west neighbors of all these pixels are found by adding the neighbor offsets to each linear index, and the flood-fill algorithm [31] is then carried out. In the flood-fill (ff) algorithm on a binary image, a background pixel is specified as the starting point, and flood-fill changes connected background pixels (1) to foreground pixels (0), stopping when it reaches object boundaries. The boundaries are determined by the type of neighborhood specified.

LP = {P | P ∈ ff(i(x, y))} (12)

(ii) Branch Points (BP)

Branch points are skeleton pixels with three or four neighbors. They are found by examining every pixel of the skeleton bitmap: if the total of the eight neighbors N8(P) is 3 or 4, the pixel is a branch point.

BP = {P | N8(P) = 3 ∨ N8(P) = 4} (13)

(iii) Dots Points (DP)

Dots are short, isolated strokes that lie significantly above or below the middle line of the word and are treated as potential points. The number of dots and their location relative to the main skeleton of the character must be determined for every part: there can be one, two or three dots, and they can lie above or below the main skeleton of the character. Each piece is examined by following the track that starts at each of its end points. If the track reaches another end point after fewer pixels than a threshold, and the path is longer than one pixel, the piece is joined into a single feature at its midpoint; it is then added to the connected components (CCs) and its end points are cleared, leaving one dot feature point. On the contour, if the width of the piece is twice its height, the stroke is considered to be several dots. Therefore, the dot points (DP) are the union of the set of all isolated pixels and the set of pixels that belong to CCs smaller than an adaptive threshold T, which is proportional to the estimated character size calculated on the thinned text image.

DP = {P | N8(P) = 0} ∪ {P | P ∈ CC ∧ size(CC) < T} (14)

(iv) End Points (EP)

An end point is the beginning or end of a word segment. It is a skeleton pixel with only one neighbor, which also marks the completion of a stroke. End points are found by examining every individual pixel of the skeleton bitmap: if the total of the eight neighbors N8(P) is one, the pixel is an end point.

EP = {P | N8(P) = 1} (15)

Fig. 6 shows a thinned word image with all possible MFPs that will be used to guide the character segmentation process.

Figure 6: Thinned word image in Fig. 5 with all possible MFPs.

The rest of the sub-sections detail the modified bounding box segmentation algorithm and then present the cut points used to segment words into characters.

4.3.2 Modified bounding box segmentation algorithm

One of the most important ways to improve segmentation accuracy is to obtain nonoverlapping connected components (CCs) using the bounding box segmentation algorithm. First, the word baseline is found as stated above. Then, once the baseline is found, two types of CCs are distinguished. The first are called main CCs,

which are all CCs that intersect the baseline y coordinate. The second are called auxiliary CCs, which are all CCs that do not intersect the baseline y coordinate. After the main CCs and auxiliary CCs have been identified, the bounding boxes of the CCs are computed along the y-axis [32]. Fig. 7(a) shows a simple example of the CCs of a word image, where the main CCs are 2, 3 and 4, the auxiliaries are 1, 5 and 6, and the horizontal red line represents the baseline.

Figure 7: (a) Overlapped CCs within a given word image, (b) nonoverlapping CCs of (a).

A distance analysis is first conducted on the bounding boxes along the y-axis, in order to identify the main CCs that overlap on the baseline and their corresponding overlapping distances, taking into account that Arabic text is written from right to left. The right border of the bounding box is computed as the farthest right border among all bounding box elements. In Fig. 7(a), for example, the main CCs that overlap are (2, 3) and (3, 4). Another distance analysis is performed on the auxiliary CCs, so that each can be assigned to its corresponding main CC according to a collection of columns (Ci), in which each column intersects the box of the auxiliary CC. Fig. 7(a) shows three column sets Ci (green dashed lines), where i, the auxiliary CC number, is equal to 1, 5 and 6. There are three cases:

(i) If Ci does not intersect any main CC, the auxiliary CC is assigned to the directly next main CC on the left. This is because Arabic text is written from right to left, and writers usually write the main CC first and then the auxiliaries. For example, in Fig. 7, auxiliary CC number “1” is assigned to main CC number “2”.

(ii) If Ci intersects only one main CC, the auxiliary CC is assigned to this main CC. For example, in Fig. 7(a), auxiliary CC number “6” is assigned to main CC number “4”.

(iii) If two or more main CCs intersect at least one column of Ci, the absolute distance along the y-axis between the right border of the bounding box of the auxiliary CC and the right border of the bounding box of each intersecting main CC is calculated; the main CC with the minimal distance wins the auxiliary. This is the case for C5 of auxiliary CC number “5”, which intersects both main CCs “3” and “4”. Auxiliary CC number “5” is assigned to main CC number “4” because the absolute distance along the y-axis between the right bounding box border of auxiliary CC number “5” and that of main CC number “4” (d1) is less than the corresponding distance to main CC number “3” (d2), that is, d1 < d2.

Even though the aforementioned rules resolve almost all cases, there are some extreme cases in which an auxiliary CC lies in the wrong position, over another main CC that appears suitable. For example, suppose the character د (Dal) is written before the character ز (Zain) in Fig. 7(a). The column Ci then intersects the main CC د (Dal), which appears suitable for this auxiliary CC, so the auxiliary CC is assigned to this main CC according to case (ii). Such problems can be solved in subsequent recognition phases, for example in the post-processing phase, where the recognition results can be corrected against lexicons using different text retrieval techniques.

Figure 8: (a) Overlapped connected components within a given word image, (b) nonoverlapping connected components.

Figure 8 shows another example of overlapped CCs. Fig. 8(a) shows that the main CCs that overlap are 1 and 3, the auxiliary CCs are 2 and 4, and the horizontal red line represents the baseline. When the modified bounding box segmentation algorithm is applied to this figure, auxiliary CC number “2” is assigned to main CC number “1” according to case (ii) above, and auxiliary CC number “4” is assigned to main CC number “3” because d1 < d2, as discussed above in case (iii). Finally, a distance analysis is performed on the new sub-word borders, and the nonoverlapping CCs are obtained by shifting the overlapped CCs apart. Figs. 7(b) and 8(b) show the overlap-free versions of Figs. 7(a) and 8(a).
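The three assignment cases can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the paper's implementation: boxes are given as `(top, left, bottom, right)` tuples keyed by CC number, the baseline is a single row index, the distance between right borders is taken as the difference of their column positions, and the function name `assign_auxiliaries` and the toy coordinates (chosen only to imitate the layout described for Fig. 7(a)) are hypothetical.

```python
def assign_auxiliaries(boxes, baseline):
    """Map each auxiliary CC label to the label of a main CC.

    `boxes` maps a CC label to (top, left, bottom, right), inclusive.
    A CC is 'main' if its box crosses the baseline row, else 'auxiliary'.
    """
    mains = {k: b for k, b in boxes.items() if b[0] <= baseline <= b[2]}
    auxiliaries = {k: b for k, b in boxes.items() if k not in mains}
    assignment = {}
    for label, (_, left, _, right) in auxiliaries.items():
        # Main CCs that share at least one column with the auxiliary box.
        hits = [m for m, (_, ml, _, mr) in mains.items()
                if ml <= right and left <= mr]
        if not hits:
            # Case (i): nearest main CC lying entirely to the left, if any.
            to_left = [m for m, b in mains.items() if b[3] < left]
            assignment[label] = (max(to_left, key=lambda m: mains[m][3])
                                 if to_left else None)
        elif len(hits) == 1:
            # Case (ii): exactly one main CC under or over the auxiliary.
            assignment[label] = hits[0]
        else:
            # Case (iii): smallest distance between right borders wins.
            assignment[label] = min(hits,
                                    key=lambda m: abs(right - mains[m][3]))
    return assignment


# Hypothetical coordinates: mains 2, 3, 4 run right to left along row 10.
boxes = {
    1: (1, 58, 3, 60),    # auxiliary, right of every main CC
    2: (7, 38, 12, 55),   # main
    3: (5, 18, 11, 40),   # main
    4: (6, 0, 12, 20),    # main
    5: (13, 16, 15, 22),  # auxiliary below the baseline, over mains 3 and 4
    6: (0, 5, 2, 8),      # auxiliary above main 4 only
}
print(assign_auxiliaries(boxes, baseline=10))
```

With these toy boxes the sketch assigns auxiliary 1 to main 2, and auxiliaries 5 and 6 to main 4, which matches the outcome the text describes for Fig. 7(a). Returning `None` when no main CC lies to the left in case (i) is a choice made here; the text does not say what happens in that situation.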
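Because the cut-point step relies on the main feature points, it may help to see how the neighbor-count rules of Eqs. 13 and 15, and the isolated-pixel part of Eq. 14, could be evaluated on a skeleton. The sketch below is an illustration only: it assumes the skeleton is a list of rows with 1 for skeleton pixels, the name `feature_points` is hypothetical, and the loop points of Eq. 12 and the small-CC part of Eq. 14 are left out.

```python
def feature_points(skeleton):
    """Classify skeleton pixels by their number of 8-neighbors.

    `skeleton` is a list of rows with 1 for skeleton pixels and 0 elsewhere.
    Returns three lists of (row, col): isolated, end and branch points.
    """
    rows, cols = len(skeleton), len(skeleton[0])
    isolated, ends, branches = [], [], []
    for r in range(rows):
        for c in range(cols):
            if skeleton[r][c] != 1:
                continue
            # N8(P): number of skeleton pixels among the eight neighbors.
            n8 = sum(
                skeleton[r + dy][c + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0)
                and 0 <= r + dy < rows and 0 <= c + dx < cols
            )
            if n8 == 0:
                isolated.append((r, c))   # first set in Eq. 14
            elif n8 == 1:
                ends.append((r, c))       # Eq. 15
            elif n8 in (3, 4):
                branches.append((r, c))   # Eq. 13
    return isolated, ends, branches


# A Y-shaped stroke with one isolated dot in the bottom-right corner.
skeleton = [
    [1, 0, 0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 1],
]
print(feature_points(skeleton))
```

On strokes that are not perfectly one pixel wide, several adjacent pixels around a junction can satisfy the branch rule at once, so a practical system would probably merge neighboring branch points before using them.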

Figure 9: Bounded by a rectangle in the thinned word image of Fig. 6.

4.3.3 Cut points of character

After bounding box segmentation is performed, the characters are segmented. The segmentation algorithm presented in [32] is adopted as the basis of the segmentation algorithm. Firstly, a new set called the cut points (CP) is generated, on the premise that Arabic characters have their boundaries in columns C with the minimum number of pixels. The cut-point set is a collection of candidate columns, each of which is a column of the thinned word that contains only a single pixel. The next step is to exclude from the candidate set all columns that intersect any MFP. Excluding these points reduces the number of cut points so that the characters are segmented properly, and no character is split into more than one segment after the exclusion. Moreover, to mark the start and/or the end of all letters, segmentation candidates are inserted directly before and after each main CC. Finally, each set of pixels between every two segmentation candidates in the binary image is assumed to represent a letter in the word.

To extract the most accurate letter image possible, and to eliminate isolated pixels belonging to neighboring letters that may appear as a result of the cropping process, the reconstruction module is performed according to [33], using those sets of pixels as masks and their counterparts in the thinned image as markers. The letter image is then constructed and saved as the result of the segmentation. Fig. 10 illustrates the final segmentation results for the word image.

Figure 10: (a) The segmented character borders on the thinned image of Fig. 6. (b) The segmented characters.

5. Experimental Results

The modified bounding box segmentation algorithm was tested on 450 word images; Figures 11, 12 and 13 illustrate some of the results. Figure 11 illustrates cases of segmentation of word images without overlapping of main CCs or auxiliary CCs, which represent 87% of all cases. Partial overlapping cases of auxiliary CCs or main CCs are reported in 13% of all cases, as illustrated in Figs. 12 and 13.

Partial overlapping between auxiliary CCs and main CCs represents 2.4% of all cases. It arises when an auxiliary CC lies in the wrong position, over another main CC that appears suitable, as discussed in subsection 4.3.2. Fig. 12 illustrates such cases: the auxiliary CCs (dots) of the characters ي in the left column and ن in the right column.

The other 10.6% of cases are partial overlaps in main CCs. They happen when MFPs occur inside the character instead of on its borders and the character is at the end of the word, which leads to what is called over-segmentation, because part of a stroke is regarded as representing a character when in fact it does not. This problem is specific to the characters س and ش (SIEN and SHIEN). The first row in Fig. 13 illustrates such a case. The second row in Fig. 13 illustrates another case of partial overlapping, which happens when ﻛـ (KAF) occurs in the middle of two connected characters, with the upper part of ك vertically overlapping the previous character on the right. Partial overlapping also appears occasionally when no MFPs exist between two consecutive characters, so that they are considered to represent one character. This problem is called under-segmentation, and it is specific to cases in which the second character to the left is a connected ا (ALF) or a connected ل (LAM) with a shear distortion angle to the left. The last row in Fig. 13 illustrates such a case. These problems may be solved by expanding the MFP set to contain more feature points, such as local minima points, and then modifying and adding heuristic rules accordingly.

[Figure 11 rows, top to bottom: original image for LMs; corrected binary image with LMs skew angle; image after thinning; corrected image with thinning slant; bounded by a rectangle in the CCs; the segmented character borders on the thinned image; the segmented characters.]

Figure 11: Examples of segmentation of word images without overlapping of main CCs or auxiliary CCs (The skew angle = -3.096 in the left word and -0.142 in the right).

The experiments show that the algorithm segments characters very effectively by combining the bounding box segmentation algorithm with the cut points, which in turn use a set of rules that allow characters to be segmented correctly once the loop, branch, dot and end points have been identified. In some cases dot points are not detected, because some people write the dots in a tangled way.

The effort was to enhance the current status of off-line handwritten Arabic character segmentation. Although each of the algorithms summarized in Section 2 has its own drawbacks and strengths, the reported segmentation results of the different systems seem quite successful. It is extremely difficult to judge the success of segmentation systems, especially in terms of segmentation rates, because of the different databases, constraints and sample rates. For words that are handwritten under poor conditions, or for freestyle handwriting, there is still an intensive demand for research in virtually all the phases of handwritten Arabic character segmentation.

[Figure 12 rows, top to bottom: original image for LMs; corrected binary image with LMs skew angle; image after thinning; corrected image with thinning slant; bounded by a rectangle in the CCs; the segmented character borders on the thinned image; the segmented characters.]

Figure 12: Examples of segmentation of word images with partial overlapping between auxiliary CC and main CCs (The skew angle = -0.32 in the left word and 1.187 in the right).

[Figure 13 columns: the binary image; segmented character borders on the thinned image.]

Figure 13: Examples of segmentation of word images with partial overlapping of main CCs.

6. Conclusion

Character segmentation is an essential phase in a handwriting recognition system. There is no universally accepted solution among automated handwritten document recognition techniques. In this paper, a modified bounding box segmentation algorithm was used to improve the segmentation of words into characters. The approach is based on a distance analysis of the bounding boxes of two kinds of CCs: main CCs and auxiliary CCs. The modified bounding box segmentation algorithm was presented and operates according to three cases. Cut points are also determined, using structural features, for character segmentation. The proposed modified bounding box segmentation algorithm has been successfully tested on 450 images of Arabic handwritten words. The results were very promising, indicating the efficiency of the suggested approach. Furthermore, this technique can be applied in an automated manner to segment several off-line Arabic words.

References

[1] Elzobi, M., Al-Hamadi, A., Dinges, L., Michaelis, B.: A structural features based segmentation for off-line handwritten Arabic text. In: 2010 5th International Symposium on I/V Communications and Mobile Network (ISVC), pp. 1–4. Rabat, Morocco (2010).
[2] Belaïd, A., Choisy, C.: Human reading based strategies for offline arabic word recognition. In: Proceedings of the 2006 Conference on Arabic and Chinese Handwriting Recognition, SACH’06, pp. 36–56. Springer-Verlag, Berlin, Heidelberg (2008).
[3] Al Aghbari, Z., Brook, S.: HAH manuscripts: a holistic paradigm for classifying and retrieving historical arabic handwritten documents. Expert Syst. Appl. 36(8), pp. 10942–10951 (2009).
[4] Lavrenko, V., Rath, T.M., Manmatha, R.: Holistic word recognition for handwritten historical documents. In: Proceedings of the First International Workshop on Document Image Analysis for Libraries, pp. 278–287. ACM, New York (2004).
[5] Blumenstein, M.: Cursive character segmentation using neural network techniques. In: Marinai, S., Fujisawa, H. (eds.) Machine Learning in Document Analysis and Recognition, vol. 90 of Studies in Computational Intelligence, pp. 259–275. Springer, Berlin (2008).
[6] Lorigo, L., and Govindaraju, V.: Segmentation and Pre-Recognition of Arabic Handwriting. In Proceedings of the

Eighth International Conference on Document Analysis and Recognition, vol. 2, pp. 605-609 (2005).
[7] Xiu, P., Peng, L., Ding, X., and Wang, H.: Offline Handwritten Arabic Character Segmentation with Probabilistic Model. Document Analysis Systems VII, pp. 402-412 (2006).
[8] Abdulla, S., Al-Nassiri, A., and Salam, R.A.: Offline Arabic Handwriting Word Segmentation Using Rotational Invariant Segments Features. The International Arab Journal of Information Technology, vol. 5, no. 2, pp. 200-208 (2008).
[9] AlKhateeb, J.H., Jiang, J., Ren, J., & Ipson, S.: Component-based Segmentation of words from handwritten Arabic text. International Journal of Computer Systems Science and Engineering, 5(1) (2009).
[10] Al-Hamad H.A., Zitar R. A.: Development of an efficient neural-Based Segmentation Technique for Arabic Handwriting Recognition. Pattern Recognition, vol. 43, no. 8, pp. 2773–2798 (2010).
[11] Elzobi, M., Al-Hamadi, A., Al Aghbari, Z.: Off-line Handwritten Arabic Words Segmentation Based on Structural Features and Connected Components Analysis. In I/V Communications and Mobile Network (ISVC) (2011).
[12] Lawgali, A., Bouridane, A., Angelova, M., and Ghassemlooy, Z.: Automatic segmentation for Arabic characters in handwriting documents. In Image Processing (ICIP), 18th International Conference on IEEE, pp. 3529-3532. IEEE (2011).
[13] Eraqi, H., M., and Abdelazeem. S.: A new Efficient Graphemes Segmentation Technique for Offline Arabic Handwriting. Frontiers in Handwriting Recognition (ICFHR), International Conference on. IEEE, 2012.
[14] Samoud, F.B., Maddouri, S.S., and Amiri, H.: Three Evaluation Criteria's towards a Comparison of Two Characters Segmentation Methods for Handwritten Arabic Script. Frontiers in Handwriting Recognition (ICFHR), International Conference on IEEE (2012).
[15] Al Hamad, Husam A.: Neural-Based Segmentation Technique for Arabic Handwriting Scripts. 21st International Conference on Computer Graphics, Visualization and Computer Vision, WSCG (2013).
[16] Osman, Y.: Segmentation Algorithm for Arabic Handwritten Text based on Contour Analysis. International Conference on computing, Electrical and Engineering (ICCEEE) (2013).
[17] Elnagar, A., and Bentrcia, R.: A Recognition-Based Approach to Segmenting Arabic Handwritten Text. Journal of Intelligent Learning Systems and Applications, 7, pp. 93-103 (2015).
[18] MELHI, M., H.: Off-Line Arabic Cursive Handwriting Recognition Using Artificial Neural Networks; PhD thesis. Department of Cybernetics, Internet and Virtual Systems. Bradford, University Bradford (2001).
[19] PLAMONDON, R., and SRIHARI, S. N.: Online and off-line handwriting recognition: a comprehensive survey. Pattern Analysis and Machine Intelligence, IEEE Transactions on, pp. 22, 63-84 (2000).
[20] W. M. Newman and R. F. Sproull: Principles of Interactive Computer Graphics. Sec 17.2. 2nd edition, McGraw Hill (1989).
[21] Ali, A., Shaout, A., Elhafiz, M.: Two stage classifier for Arabic Handwritten Character Recognition, International Journal of Advanced Research in Computer and Communication Engineering, pp. 646-650 (2015).
[22] Bassil, Y., Alwani, M.: Ocr Post-Processing Error Correction Algorithm Using Google's Online Spelling Suggestion, Journal of Emerging Trends in Computing and Information Sciences, Vol. 3, No. 1 (2012).
[23] Zeki, A.M.: The segmentation problem in Arabic character recognition: The state of the art. First International Conference on Information and Communication Technologies, ICICT, pp. 11–26 (2005).
[24] Farooq, F., Govindaraju, V., and Perrone, M.: Pre-processing Methods for Handwritten Arabic Documents, In Eighth International Conference on Document Analysis and Recognition, vol. 1, pp. 267–271 (2005).
[25] Gonzalez, R., and Woods, R., Digital Image Processing (3rd Edition), Prentice Hall, August (2008).
[26] Boubaker, H., Kherallah, M., Alimi, A.M.: New Algorithm of Straight or Curved Baseline Detection for Short Arabic Handwritten Writing. 10th International Conference on Document Analysis and Recognition. ICDAR '09, pp. 778-782. Washington, DC, USA. IEEE Computer Society (2009).
[27] ZHANG, T., and SUEN C.: A Fast Parallel Algorithm for Thinning Digital Patterns, Communications of the ACM, Volume 27 Number 3, pp. 236-239 (1984).
[28] Bunke, H. and Wang, P.S.P.: Handbook of Character Recognition and Document Image Analysis. Chapter Image Processing Methods for Document Image Analysis, pp. 15, 19. World Scientific (1997).
[29] Slavik, P., Govindaraju, V. (eds.): Equivalence of different methods for slant and skew corrections in word recognition applications. IEEE Trans. Pattern Anal. Mach. Intell. 23(3), pp. 323–326 (2001).
[30] Al Aghbari, Z.: HAH manuscripts: A holistic paradigm for classifying and retrieving historical Arabic handwritten documents, Expert Systems with Applications, Vol 36, pp. 10943-10951 (2009).
[31] El-Abed, H., and Margner, V.: Comparison of Different Preprocessing and Feature Extraction Methods for Offline Recognition of Handwritten Arabic Words. In Ninth International Conference on Document Analysis and Recognition, vol. 2, pp. 974-978 (2007).
[32] Elzobi, M., Al-Hamadi, A., Al Aghbari, Z., and Dings, L.: IESK-ArDB: a database for handwritten Arabic and an optimized topological segmentation approach, In International Journal on Document Analysis and Recognition (IJDAR) (2012).
[33] Vincent, L.: Morphological grayscale reconstruction in image analysis: applications and efficient algorithms. IEEE Trans. Image Process. 2, pp. 176–201 (1993).

No ratings yet
Building E-Commerce and E-Learning Models: Hassanin M. Al-Barhamtoshy
25 pages
Lecture No1
No ratings yet
Lecture No1
23 pages
Oracle 10G: E. Seham AL-Seragee
No ratings yet
Oracle 10G: E. Seham AL-Seragee
30 pages
نماذج اجتماعيات الصف التاسع 2021 2022ـ
No ratings yet
نماذج اجتماعيات الصف التاسع 2021 2022ـ
20 pages
Sample Midterm
No ratings yet
Sample Midterm
9 pages
نماذج اختبارات منوع
No ratings yet
نماذج اختبارات منوع
2 pages
Summary Report
No ratings yet
Summary Report
1 page
Bakel
No ratings yet
Bakel
6 pages
CPE740 Fall10 Project3
No ratings yet
CPE740 Fall10 Project3
2 pages
2
No ratings yet
2
11 pages
Lexmark Cx820 Cx827 Xc6152 Xc6150 MFP Service Manual
No ratings yet
Lexmark Cx820 Cx827 Xc6152 Xc6150 MFP Service Manual
1,000 pages
XP-400 XP-300 XP-200 XP-100 ME-101 Seriese
No ratings yet
XP-400 XP-300 XP-200 XP-100 ME-101 Seriese
78 pages
Agile Risk Management: A Remote Deposit Capture Imperati Ve
No ratings yet
Agile Risk Management: A Remote Deposit Capture Imperati Ve
24 pages
Dowsing Resources & Techniques Joe Smith Diagnosing Technique
No ratings yet
Dowsing Resources & Techniques Joe Smith Diagnosing Technique
6 pages
HP Smart Tank Wireless 615
No ratings yet
HP Smart Tank Wireless 615
2 pages
Baymax Scanner: Medical Innovation
No ratings yet
Baymax Scanner: Medical Innovation
23 pages
ICT-Full Version-PDF-1
No ratings yet
ICT-Full Version-PDF-1
162 pages
PCTEL RF Solutions Presentation Scanner MX EXf IBf
100% (1)
PCTEL RF Solutions Presentation Scanner MX EXf IBf
35 pages
Product Guide Ineo 163 - 213
No ratings yet
Product Guide Ineo 163 - 213
40 pages
Epson WP-4590 4540 4530 4520 4510 4090 4020 4010 Series
100% (1)
Epson WP-4590 4540 4530 4520 4510 4090 4020 4010 Series
107 pages
101 Testing Vlsi
No ratings yet
101 Testing Vlsi
21 pages
8MP Film Scanner Manual
No ratings yet
8MP Film Scanner Manual
30 pages
ALE-1000 + ST-1200 Edger Operation Manuals
No ratings yet
ALE-1000 + ST-1200 Edger Operation Manuals
26 pages
DH-IPC-HFW2231T-ZS-S2: 2MP WDR IR Bullet Network Camera
No ratings yet
DH-IPC-HFW2231T-ZS-S2: 2MP WDR IR Bullet Network Camera
3 pages
The TC52/TC57
0% (1)
The TC52/TC57
34 pages
Mo2 - Business Resources
No ratings yet
Mo2 - Business Resources
41 pages
Bookkeeping NC III Curriculum Guide
90% (31)
Bookkeeping NC III Curriculum Guide
83 pages
Magnetic Flux Leakage MFL Inspection Limitations
100% (4)
Magnetic Flux Leakage MFL Inspection Limitations
30 pages
Equitrac Ver4.2 User Guide Ver1.3
No ratings yet
Equitrac Ver4.2 User Guide Ver1.3
20 pages
Visual Systems Design Dec 2007
No ratings yet
Visual Systems Design Dec 2007
60 pages
Photoshop Notes Down
No ratings yet
Photoshop Notes Down
55 pages
Votinng System Presented By:: Miss - Kamble Ankita Laxman
No ratings yet
Votinng System Presented By:: Miss - Kamble Ankita Laxman
15 pages
LACTOSENSE R Guidelines Product Sheet - PREV
No ratings yet
LACTOSENSE R Guidelines Product Sheet - PREV
2 pages
Konica-Minolta BizHub Pro 1050,1050P Theory of Operation
100% (10)
Konica-Minolta BizHub Pro 1050,1050P Theory of Operation
460 pages
Pagepack™ Center 1.4: User Guide
No ratings yet
Pagepack™ Center 1.4: User Guide
37 pages
CUDA Parallel Prefix Sum Guide
No ratings yet
CUDA Parallel Prefix Sum Guide
21 pages
Introduction To Computer by Norton CH 02B
No ratings yet
Introduction To Computer by Norton CH 02B
12 pages
Ubfb3043/Ubfb3743 Financial Markets and Institutions
No ratings yet
Ubfb3043/Ubfb3743 Financial Markets and Institutions
2 pages
1
No ratings yet
1
13 pages
233 DW
No ratings yet
233 DW
4 pages



Manuscript received September 5, 2016

