Segmentation Accuracy For Offline Arabic Handwritten Recognition Based On Bounding Box Algorithm
9, September 2016
using the chain-code representation of the upper contour; adjacent points are paired and the slope of the line connecting each pair is calculated [8]. AlKhateeb et al. (2009) proposed a word segmentation technique in which words are detected and extracted from Arabic handwriting. The technique is based on the distances between words and between sub-words: the distances between connected components are measured and analyzed to determine the optimal threshold for word segmentation [9]. Al Hamad and Abu Zitar (2010) proposed a segmentation algorithm and a validation strategy for handwritten Arabic words. The technique segments a word into its primitives and uses an ANN to validate the segmentation points on the basis of certain features. It works in three phases, starting from an over-segmentation obtained from a modified vertical histogram of the thinned word. First, a heuristic algorithm segments the Arabic word into primitives; then the structural features of the characters are extracted using the modified direction features method; finally, these features are passed to the ANN for training and testing in order to validate the segmentation points [10]. Elzobi et al. (2011) suggested a segmentation algorithm with two phases. The first is a preprocessing stage that handles issues such as word skew and slant correction. The second is a segmentation stage that detects and resolves overlapping sub-words and then applies segmentation using topographical features through a set of heuristic rules [11]. Lawgali et al. (2011) proposed a segmentation algorithm for handwritten Arabic words that exploits the segmentation points occurring between the end of one letter and the start of the next, which are also located in the region surrounding the baseline. The algorithm starts by segmenting the word into sub-words and computing the baseline of every sub-word. It then deletes all descending sub-words whose starting point lies below the baseline, and uses the vertical projection to find the candidate segmentation points [12]. Eraqi and Abdelazeem (2012) combined neighborhood geometric characteristics with local writing direction information to propose a new, efficient explicit method for offline Arabic handwriting segmentation, which segments the text into graphemes [13]. Samoud et al. (2012) proposed two combined methods for segmenting handwritten Arabic words into characters. The first method is based on the analysis of the contour minima and maxima and of the projection. The second is based on the Hough Transform and Mathematical Morphology operators [14]. Al Hamad (2013) proposed fusion equations for improving the segmentation of Arabic words; the method has two phases. In the first phase, the Arabic Heuristic Segmenter places the prospective segmentation points in each part of the word image. In the second phase, a neural-based segmentation technique examines all prospective segmentation points and identifies the invalid ones [15]. Osman (2013) proposed an algorithm for the segmentation of handwritten Arabic words. The idea is to segment the image into lines and sub-words, and then to trace every sub-word and its contour. The algorithm detects the points at which the contour changes from a horizontal state to a vertical state [16]. Elnagar and Bentrcia (2015) proposed an effective method for segmenting handwritten Arabic words. The method uses a multi-agent technique to segment words and relies on recognition to verify the validity of the candidate segmentation points. The use of an artificial neural network together with a set of rules handles the over-segmentation problem of Arabic handwriting well. This is due to the agents, which take the appropriate decisions to determine the candidate segmentation points. The segments are passed to recognition, and the rules and the agent pool are invoked and applied to the unrecognized segments before they are passed to recognition again [17].

In summary, researchers have proposed several algorithms for the segmentation of words. They typically use the simple approach of taking horizontal and vertical projections of the word image and searching for minima to segment characters from words. The Arabic Heuristic Segmenter is used to segment a word into primitives, after which the structural features of the characters are extracted with the Modified Direction Features technique. Another algorithm begins by segmenting the word into sub-words or connected components and then computes the baseline of every sub-word. A variant of this uses the projection of a segment across the baseline to avoid the problems of overlapping characters and holes. Some researchers use the minima of the upper profiles of words. Many of the algorithms presume that the characters are connected at the baseline. Other methods use the upper contour rather than projections. Many researchers over-segment the text and finalize the segmentation after recognition by combining segments until characters are formed; in this case they try all possible combinations of consecutive segments. Yet other algorithms thin the word or use the skeleton of the word to simplify the segmentation. Building on these previous works, this paper modifies the bounding box segmentation algorithm to improve segmentation accuracy for handwritten Arabic text recognition.
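The projection-based approach summarized above can be illustrated with a minimal sketch. It is not taken from any of the cited works: the function names, the `max_ink` parameter and the toy image are hypothetical, and ink pixels are assumed to be stored as 1 for simplicity.

```python
def vertical_projection(image):
    """Count the ink pixels (value 1) in every column of a binary image."""
    return [sum(column) for column in zip(*image)]


def candidate_cuts(image, max_ink=0):
    """Return the middle column of each interior run of low-ink columns.

    A column belongs to a run when its ink count is <= max_ink. Runs that
    touch the left or right image border are margins and are skipped.
    """
    profile = vertical_projection(image)
    cuts, start = [], None
    for x, ink in enumerate(profile):
        if ink <= max_ink:
            if start is None:
                start = x          # a low-ink run begins here
        elif start is not None:
            if start > 0:          # ignore a run that starts at the border
                cuts.append((start + x - 1) // 2)
            start = None
    return cuts


# Hypothetical toy word image: three strokes separated by two gaps.
word = [
    [1, 1, 0, 0, 0, 1, 0, 1, 1],
    [1, 0, 0, 0, 0, 1, 0, 0, 1],
]
print(vertical_projection(word))
print(candidate_cuts(word))
```

Raising `max_ink` above zero also accepts thin connecting strokes as cut candidates, which is closer to the "search for minima" used for cursive text, at the cost of more over-segmentation.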
100 IJCSNS International Journal of Computer Science and Network Security, VOL.16 No.9, September 2016
3. Optical Character Recognition

Optical character recognition (OCR) is a research field in pattern recognition, artificial intelligence and computer vision. It is widely used as a form of data entry from printed paper records such as passports, invoices, bank statements, computerized receipts, business cards, mail, printouts of static data, or any suitable documents. Digitizing printed text is common so that it can be electronically edited, searched, stored more compactly, displayed on the Internet, and used in machine processes such as automatic machine translation, text-to-speech, key data extraction and text mining. OCR is also used by some archives to convert massive amounts of handwritten historical documents into searchable, easily accessible digital forms. In general, handwriting recognition systems are divided, according to how the input data (image) is acquired, into two main types: on-line and off-line systems. OCR systems perform many phases one after another to carry out the whole task [4]: data acquisition, pre-processing, segmentation, feature extraction, classification, and post-processing. These phases are described as follows:

3.1 Data Acquisition

An OCR system takes its input data in one of two ways, online or offline. In online handwriting recognition, a special pen is used to write on a digital tablet, and the writing is stored in digital form. In the offline case, a handwritten word image is scanned and converted into a digital image. The word images used in the experiments are gray-scale images taken from a database under construction; a conventional flatbed scanner is used to capture the text at 350 dpi resolution. The test form includes 450 words. The set of words is chosen to include all forms of the Arabic characters in all positions. The images are saved in the PNG graphic file format rather than other formats such as TIFF, BMP, JPEG or GIF, since PNG files are relatively small in size with no loss in quality.

3.2 Pre-processing

[...] skeletonization (thinning) of the thresholded image, and slant correction. Skew is the tilt in the image that occurs during scanning if the paper is not fed straight into the scanner. Skeletonization reduces the strokes of the image from a width of many pixels to a single pixel width [18]. After the above-mentioned imperfections are removed, the preprocessed image is used as input to the following phases. Achieving the highest recognition rates requires an effective preprocessing phase; effective preprocessing algorithms make the OCR system robust, mainly through precise image enhancement, noise removal, image thresholding, thinning, and skew and slant correction, as described in detail in section 4.

3.3 Segmentation

Segmentation is a very important stage in any recognition system. It includes the separation of text into lines, lines into words, and words into characters. Handwritten text poses many problems, such as touching characters, which lead to incorrect segmentation, and errors in the segmentation can reduce the recognition rate. Therefore, efforts should be made to develop good segmentation techniques. Two techniques have been applied to segment machine-printed and handwritten Arabic words into characters: explicit segmentation and implicit segmentation. In explicit segmentation, words are externally segmented into pseudo-letters, which are then recognized individually. Implicit segmentation is usually designed with rules that try to identify all segmentation points in the image so that words are segmented directly into letters. Implicit segmentation is performed by several methods, such as region-based segmentation, edge-based segmentation, threshold-based segmentation, clustering techniques, and the bounding box algorithm [19]. In this paper, segmentation is improved using the bounding box algorithm.

3.4 Feature Extraction
Statistical features are extracted from the statistical distribution of pixels and describe the characteristic measurements of the input image pattern. Structural features include the geometrical and topological features of an input image [20]. This paper uses topological features to extract useful information from the text image that can be used for recognition.

3.5 Classification

The classification phase is the main decision-making phase of any OCR system. It assigns an unknown character to one of several classes based on the extracted features. A class is a feature space or region in which a particular character falls. Different kinds of algorithms are used to classify characters: pixel-based, statistical, structural and neural network. Typical character classification systems extract several features from each character image and then try to classify it on the basis of the similarity of its feature vector to those of the character classes. There are various classifier structures for isolated handwritten character classification, such as simple linear classifiers, two-phase tree classifiers, and hierarchical classifiers. According to test results on handwritten characters, combining multiple classifiers is an effective way to produce highly reliable classification decisions [21].

3.6 Post-processing

Post-processing is the main stage for correcting segmentation and classification errors without human intervention. Some characters that cannot be properly segmented in a text can be recognized during post-processing, and a word can also be interpreted as a whole. The output of the classification process can go through an error detection and correction phase. Post-processing includes dictionary look-up and the application of language-specific information to unrecognized words. Contextual post-processing applies lexical knowledge either top-down, by comparison against a dictionary, or bottom-up, with statistical algorithms. Finally, contextual post-processing of OCR results can also take into account knowledge of the context of words [22].

4. Offline Handwritten Arabic Character Segmentation

This section first describes the characteristics of Arabic writing, to help the reader fully understand the methodology used in the segmentation of off-line handwritten Arabic characters. It then describes, with examples, the main methods of the pre-processing phase of OCR systems mentioned in the previous section: noise removal, binarization, skew correction, thinning and slant correction. Finally, it explains the segmentation phase for off-line handwritten Arabic characters.

4.1 Arabic Writing Characteristics

The Arabic alphabet consists of 28 characters and has numerous distinctive characteristics. Arabic writing differs from English: it is written from right to left and is generally cursive. The alphabet set can grow to 84 different forms depending on the location of the character and on the style of writing (Nasekh, Roqa'a, Farisi and others). Table 1 shows the different forms of Arabic characters depending on their location within the word. Of the 28 basic Arabic characters, six can be connected only on the right side: dal (د), raa (ر), waw (و), alef (ا), thal (ذ), and zay (ز); the other 22 can be connected on both sides. Most of these six characters have only two forms, the stand-alone form and the last form, whereas the other characters can take any of four forms: the beginning, the middle, the last, and the stand-alone form. Therefore, an Arabic word may contain one or more connected components [19].

The secondary components (dots) play an important role in Arabic characters. Many characters have a similar shape and differ only in the number and position of their dots, which can occur either above or below the character, as in (ب, ت, ث). Two characters in the alphabet have three dots, three have two dots and ten have one dot. In a handwritten word, two dots can appear as two distinct dots or can be connected in a line. The difficulty in recognizing the secondary components comes from fast writing, as writers draw them connected to the main body.

In Arabic, small marks such as the "hamza" may be located above or below five distinct characters or can appear as isolated characters. In addition to the alphabet there are so-called diacritic symbols, which are used to indicate vowels and are written in a very small size (compared with the letter size) above or below a letter.

The cursive nature of Arabic text means that the characters of a word are connected along an imaginary horizontal line known as the baseline. Arabic writing is cursive even in printed type. It differs from cursive English handwriting, however, in that several characters can be connected on one side only. Several Arabic characters have a loop, such as (ص, ف, و) [23].
Table 1. The different forms of Arabic characters depending on their location within the word.
Name | Isolated | First | Middle | Last
Alif | ا | ا | ا | ا
Baa | ب | ﺑـ | ـﺑـ | ـب
Taa | ت | ﺗـ | ـﺗـ | ـت
Thaa | ث | ﺛـ | ـﺛـ | ـث
Geem | ج | ﺟـ | ـﺟـ | ـﺞ
Hha | ح | ﺣـ | ـﺣـ | ـﺢ
Kha | خ | ﺧـ | ـﺧـ | ـﺦ
Dal | د | د | ـد | ـد
Thal | ذ | ذ | ـذ | ـذ
Raa | ر | ر | ـر | ـر
Zain | ز | ز | ـز | ـز
Seen | س | ﺳـ | ـﺳـ | ـس
Sheen | ش | ﺷـ | ـﺷـ | ـش
Saad | ص | ﺻـ | ـﺻـ | ـص
Dhad | ض | ﺿـ | ـﺿـ | ـض
Tta | ط | طـ | ـطـ | ـط
Zha | ظ | ظـ | ـظـ | ـظ
Ain | ع | ﻋـ | ـﻌـ | ـﻊ
Ghain | غ | ﻏـ | ـﻐـ | ـﻎ
Faa | ف | ﻓـ | ـﻔـ | ـف
Gaf | ق | ﻗـ | ـﻘـ | ـﻖ
Kaf | ك | ﻛـ | ـﻛـ | ـك
Lam | ل | ﻟـ | ـﻠـ | ـل
Meem | م | ﻣـ | ـﻣـ | ـم
Noon | ن | ﻧـ | ـﻧـ | ـن
Haa | ه | ھـ | ـﮭـ | ـﮫ
Waw | و | و | ـو | ـو
Yaa | ي | ﯾـ | ـﯾـ | ـﻲ

4.2 Pre-processing

The purpose of this phase is to enhance the readability of the text image and to remove details that have no discriminatory power in the recognition process. Pre-processing is a series of operations performed on the scanned input text image. It essentially makes the resulting image suitable for segmentation, and includes noise removal, binarization, skew correction, thinning and slant correction, as described in the following [16, 21, 24]:

4.2.1 Noise Removal

The spatial noise model of concern here is "salt and pepper" noise. It is the most common noise model found in OCR systems and image processing applications. Median filtering is a non-linear technique that is helpful for removing noise from images, and it is especially effective at removing "salt and pepper" noise [25]. In this research, a median filter of size 3x3 is used to remove noise from the text image. Fig. 1 shows noise removal in a word image using this technique.

Figure 1: Noise removal in a word image using the median filtering technique with size 3x3. (a) Grey-scale word image with noise. (b) Image (a) without noise.

4.2.2 Binarization

Global thresholding is used to convert the image into a binary image; it iterates over all possible threshold values and evaluates their variance [3]. The output binary image has the value 0 for the foreground pixels (black), that is, for the input pixels on the dark side of the threshold, and the value 1 for the background pixels (white), that is, for all other pixels [10]. Fig. 2 shows the binary word image of Fig. 1(b).

Figure 2: The binary word image of Fig. 1(b).

4.2.3 Skew correction and baseline estimation

Skew correction is based on estimating a line fitted to the line the writer followed during the writing process. In this paper, this line is estimated by linear regression on the local minima of the word image skeleton (LMR) [26]. Because most local minima (LM) points usually lie on or near the baseline, the problem of finding the baseline can be reduced to a linear fitting problem over the local minima points. However, the set of local minima also contains points from descending letters, which distort the baseline estimate. Consequently, those spurious points have to be filtered out prior to the baseline estimation. A linear regression on the remaining points then estimates the skew of the baseline of the word image, and the skew is corrected by rotation. The baseline detection algorithm is a two-step procedure.

The first step fits a line to the local minima points according to the following equations:

$y = a + bx$ (1)

where a and b are coefficients calculated as follows:

$a = \bar{y} - b\bar{x}$ (2)

$b = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$ (3)

where $\bar{x}$ and $\bar{y}$ are the statistical means of the x and y coordinates, respectively. The slope angle α of the fitted line is calculated according to the following equation:

$\alpha = \arctan(b)$ (4)
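The first regression step of Eqs. (1) to (4) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the sample points are hypothetical, and the local minima are assumed to have already been filtered of spurious points.

```python
import math


def fit_baseline(points):
    """Least-squares line y = a + b*x through (x, y) points, Eqs. (1)-(3)."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all points share the same x; slope is undefined")
    b = sxy / sxx               # Eq. (3)
    a = y_mean - b * x_mean     # Eq. (2)
    return a, b


def slope_angle_degrees(b):
    """Slope angle alpha of the fitted line, Eq. (4), in degrees."""
    return math.degrees(math.atan(b))


# Hypothetical local-minima points lying on a slightly tilted baseline.
minima = [(0, 10), (10, 11), (20, 12), (30, 13)]
a, b = fit_baseline(minima)
print(a, b, slope_angle_degrees(b))
```

Rotating the image by the negative of the returned angle would then undo the estimated skew; the sign depends on the image coordinate convention, which is not fixed here.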
The second step computes the baseline using a Hough transform over a limited range of θ. First, the parameters θ and ρ are discretized; then, for each point $(x_i, y_i)$ in the image area, ρ´ is calculated as stated in Eq. (5):

$\rho' = x_i \sin\theta' + y_i \cos\theta', \quad \forall\, \theta' \in [\alpha - \varepsilon, \alpha + \varepsilon]$ (5)

where ε is a constant used to offset the random error that can be produced in the first step. Experimentally, it was found that ε = 10° gives the most accurate results. Next, each point in the image space votes for the bins that could have generated it in the Hough accumulator A, and the votes are accumulated in A according to Eq. (6):

$A(\rho', \theta') = A(\rho', \theta') + 1$ (6)

Finally, the ρ´ and θ´ with the maximum number of votes (the global maximum) are taken as the parameters of the word baseline, as shown in the following equation:

$\arg\max_{\rho', \theta'} A(\rho', \theta')$ (7)

Fig. 3 shows an example of the results, where Fig. 3(a) is the binary word image with its baseline estimation and Fig. 3(b) is the skew-corrected image with the LMs; the skew angle is -1.749.

Figure 3: Skew correction and baseline estimation. (a) Binary word image and its baseline estimation. (b) Skew corrected using linear regression.

4.2.4 Thinning

The thinning algorithm consists of removing all the contour points of the image except those points that belong to the skeleton. In order to preserve the connectivity of the skeleton, each iteration is divided into two sub-iterations. In the first sub-iteration, the contour point $P_1$ is deleted from the digital pattern if it satisfies the following conditions:

a) $2 \le N(P_1) \le 6$
b) $S(P_1) = 1$
c) $P_2 \cdot P_4 \cdot P_6 = 0$
d) $P_4 \cdot P_6 \cdot P_8 = 0$

where $S(P_1)$ is the number of 01 patterns in the ordered sequence $P_2, P_3, \ldots, P_8, P_9$ and $N(P_1)$ is the number of nonzero neighbors of $P_1$, that is,

$N(P_1) = P_2 + P_3 + \cdots + P_8 + P_9$

In the second sub-iteration, conditions (a) and (b) remain the same, but conditions (c) and (d) are changed to

c') $P_2 \cdot P_4 \cdot P_8 = 0$
d') $P_2 \cdot P_6 \cdot P_8 = 0$

The first sub-iteration is applied to every pixel in the binary region under consideration. If one or more of conditions (a) to (d) are violated, the value of the point in question is not changed. If all the conditions are satisfied, the point is flagged for deletion. It is important to note that the point is not deleted until all the points have been processed; this prevents the structure of the data from changing during the execution of the algorithm. After the first sub-iteration has been applied to every pixel, the points that were flagged are deleted. The second sub-iteration is then applied to the resulting data in exactly the same manner as the first. Fig. 4 shows the thinned word image.
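The two sub-iterations with conditions (a) to (d) and (c'), (d') can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions rather than the code used in the paper: foreground pixels are stored as 1 (the convention in which the conditions are written), the image is assumed to have a one-pixel background border, and the function names are hypothetical.

```python
def neighbours(img, r, c):
    """Return [P2, ..., P9]: the 8 neighbours clockwise, starting at north."""
    return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
            img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]


def transitions(p):
    """S(P1): number of 0 -> 1 patterns in the cyclic sequence P2..P9, P2."""
    return sum(a == 0 and b == 1 for a, b in zip(p, p[1:] + p[:1]))


def thin(image):
    """Iterate both sub-iterations until no pixel is deleted."""
    img = [row[:] for row in image]          # work on a copy
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            flagged = []
            for r in range(1, len(img) - 1):
                for c in range(1, len(img[0]) - 1):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(img, r, c)
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if not (2 <= sum(p) <= 6 and transitions(p) == 1):
                        continue             # condition (a) or (b) fails
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if ok:
                        flagged.append((r, c))
            # Delete only after the whole pass, so the pass sees stable data.
            for r, c in flagged:
                img[r][c] = 0
            changed = changed or bool(flagged)
    return img


# Hypothetical 3-pixel-thick bar with a background border.
bar = [[0] * 7] + [[0, 1, 1, 1, 1, 1, 0] for _ in range(3)] + [[0] * 7]
for row in thin(bar):
    print("".join("#" if v else "." for v in row))
```

Collecting the flagged pixels in a list and deleting them only after the pass mirrors the requirement that no point is removed until all points have been examined.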
[...] of each line component provide the slant angle. In another algorithm, projection profiles are computed for a number of angles away from the vertical direction [28].

In this paper, the slant correction technique uses the vertical projection histogram [29]. The histogram of the word is computed at different shear angles, and the one with the highest peaks is taken. This is done for the angles between -45 and 45 [...] measure and will be used as the shear angle. To save time, [...] steps. [...] are calculated on the image of the horizontal gradient at different shear angles in the range [±45], and used to estimate the slant angle. For the first time, the focus will [...]

Fig. 5 shows slant correction: Fig. 5(a-c) show the binary word image of Fig. 2, the thinned word image of Fig. 4 and the slant-corrected version of Fig. 4, respectively, together with their corresponding vertical projection profiles.

[Figure 5: panels (a) to (c), each with its vertical projection profile.]
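The search over shear angles with the vertical projection histogram can be sketched as follows. Several details are assumptions made only for illustration: foreground pixels are given as (x, y) coordinates, the "highest peaks" criterion is scored as the sum of squared column counts, the angle step of 5 degrees is arbitrary, and the sign of the shear depends on the image coordinate convention.

```python
import math


def sheared_profile(pixels, angle_deg):
    """Vertical projection histogram after shearing x by y * tan(angle)."""
    t = math.tan(math.radians(angle_deg))
    profile = {}
    for x, y in pixels:
        column = round(x + y * t)
        profile[column] = profile.get(column, 0) + 1
    return profile


def estimate_shear_angle(pixels, step=5):
    """Try angles in [-45, 45] and keep the one with the peakiest profile."""
    best_angle, best_score = 0, -1
    for angle in range(-45, 46, step):
        counts = sheared_profile(pixels, angle).values()
        score = sum(n * n for n in counts)   # rewards tall, narrow peaks
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


# Hypothetical strokes: one slanted at 45 degrees, one already upright.
slanted = [(i, i) for i in range(10)]
upright = [(3, i) for i in range(10)]
print(estimate_shear_angle(slanted), estimate_shear_angle(upright))
```

Once the best angle is found, applying the same shear to every foreground pixel yields the slant-corrected image; a coarse step followed by a finer search around the best angle is one way to save time.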
First, let P denote a foreground pixel in the word skeleton g(x, y), and let N8(P) denote the 8-neighborhood set of P. Second, by analyzing every P ∈ g(x, y), a set of feature points is identified, called the main feature points (MFPs). Third, the bounding box segmentation algorithm is applied. Finally, a new set called the cut points (CP) is generated. These steps are described in the following sub-sections:

4.3.1 The Main Feature Points

Structural and statistical features are the most commonly used features in character recognition. Choosing the type of features and extracting them from the characters is a very critical step. Feature extraction transforms the two-dimensional image into a set of vectors, so that the letter is represented by a set of numerical values that is passed on to recognition. Since words are represented in the system by their skeletons, most topological features are suitable for this representation [30]. The topological features chosen are the loop point, branch point, dot point and end point, which all operate on a skeletonized word and are described in more detail in the following:

(i) Loop Points (LP)

First, the starting locations are converted to linear indices, because linear indices can be used to extract the pixel values at all of the locations. The eight neighbors N8(P) are then obtained by adding the neighbor offsets to the linear indices of the group of pixels: the north, east, south and west neighbors of all these pixels are found, and all the neighbor offsets are added to each linear index. A flood-fill algorithm is then carried out [31]. In a flood-fill (ff) operation on a binary image, a background pixel is specified as the starting point, and the fill changes connected background pixels (1) to foreground pixels (0), stopping when it reaches object boundaries. The boundaries are determined by the type of neighborhood specified.

(ii) Branch Points (BP)

[...]

(iii) Dot Points (DP)

[...] below the half line as potential dots. The number of dots and their location relative to the main skeleton of the character must be determined in every part. The number of dots can be one, two or three, and they can lie above or below the main skeleton of the character. All dots are found by following the track from each end point. If another end point is reached along the track and the track is shorter than the threshold, it is taken as a dot. If the path has more than one pixel and joins a feature point halfway, it is added to the connected components (CCs) and the end point is cleared at this feature point (one dot). If the width of the dot contour is twice its height, the stroke is considered to be several dots. Therefore, the dot points (DP) are the union of the set of all isolated pixels and the set of pixels that belong to CCs smaller in size than an adaptive threshold T, which is proportional to the estimated character size calculated on the thinned text image.

$DP = \{P \mid N_8(P) = 0\} \cup \{P \mid P \in CC \land \mathrm{size}(CC) < T\}$ (14)

(iv) End Points (EP)

An end point is the beginning or end of a word segment. It is a skeleton pixel with only one neighbor, which also marks the end of a stroke. End points are determined by examining each individual pixel in the skeleton bitmap: in a skeleton, an end point has exactly one foreground pixel among its eight neighbors. Therefore, if the eight-neighbor total N8(P) is one, P is an end point.

$EP = \{P \mid N_8(P) = 1\}$ (15)

Fig. 6 shows a thinned word image with all possible MFPs that will be used to guide the character segmentation process.

Figure 6: Thinned word image in Fig. 5 with all possible MFPs.
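The end point and dot point sets of Eqs. (14) and (15) can be computed directly from the eight-neighbor counts, as in the following sketch. It is only an illustration under assumptions: skeleton pixels are stored as 1, the threshold T is passed in as a plain number rather than derived from the estimated character size, and all function names are hypothetical.

```python
from collections import deque

OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def foreground_neighbours(skel, r, c):
    """List the coordinates of the foreground 8-neighbours of pixel (r, c)."""
    found = []
    for dr, dc in OFFSETS:
        rr, cc = r + dr, c + dc
        if 0 <= rr < len(skel) and 0 <= cc < len(skel[0]) and skel[rr][cc]:
            found.append((rr, cc))
    return found


def connected_components(skel):
    """Group foreground pixels into 8-connected components."""
    seen, comps = set(), []
    for r, row in enumerate(skel):
        for c, value in enumerate(row):
            if not value or (r, c) in seen:
                continue
            seen.add((r, c))
            queue, comp = deque([(r, c)]), []
            while queue:                      # breadth-first traversal
                pixel = queue.popleft()
                comp.append(pixel)
                for nb in foreground_neighbours(skel, *pixel):
                    if nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
            comps.append(comp)
    return comps


def end_points(skel):
    """EP = {P | N8(P) = 1}, Eq. (15)."""
    return {(r, c) for r, row in enumerate(skel) for c, v in enumerate(row)
            if v and len(foreground_neighbours(skel, r, c)) == 1}


def dot_points(skel, threshold):
    """DP = isolated pixels plus pixels of CCs smaller than T, Eq. (14)."""
    isolated = {(r, c) for r, row in enumerate(skel) for c, v in enumerate(row)
                if v and not foreground_neighbours(skel, r, c)}
    small = {p for comp in connected_components(skel)
             if len(comp) < threshold for p in comp}
    return isolated | small


# Hypothetical skeleton: one 4-pixel stroke and one isolated dot above it.
skeleton = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
print(sorted(end_points(skeleton)))
print(sorted(dot_points(skeleton, threshold=3)))
```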
[...] which are all CCs that intersect the baseline "y" coordinate. The second are called auxiliary CCs, which are all CCs that do not intersect the baseline "y" coordinate. After the main and auxiliary CCs have been identified, the bounding boxes of the CCs are computed along the y-axis [32]. Fig. 7(a) shows a simple example of the CCs of a word image, where the main CCs are 2, 3 and 4, the auxiliary CCs are 1, 5 and 6, and the horizontal red line represents the baseline.

[...] Fig. 7(a), auxiliary CC number "6" will be assigned to the main CC number "4".

(iii) If two or more main CCs intersect at least one Ci, the absolute distance along the y-axis between the right bounding box of the auxiliary CC and the right bounding box of each intersecting main CC is calculated; the main CC with the minimal distance wins the auxiliary. An example is case C5 of auxiliary CC number "5", which intersects both main CCs "3" and "4". The auxiliary CC number "5" is assigned to the main CC number "4" because the absolute distance along the y-axis between the right bounding box of auxiliary CC number "5" and the right bounding box of main CC number "4" (d1) is less than the absolute distance along the y-axis between the right bounding box of auxiliary CC number "5" and the right bounding box of main CC number "3" (d2); that is, d1 < d2.

Even though the aforementioned rules resolve almost all cases, there are some extreme cases in which an auxiliary CC lies in the position of a main CC other than the one it belongs to. For example, if the character د (Dal) is written before the character ز (Zain) in Fig. 7(a), the column Ci intersects the main CC د (Dal), which is then treated as suitable for this auxiliary CC, and the auxiliary CC is assigned to this main CC according to case (ii). Such problems can be solved in subsequent recognition phases, for example in the post-processing phase, where the recognition results can be corrected against lexicons using different text retrieval techniques.
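The assignment of an auxiliary CC to a main CC can be sketched with bounding box extents alone. The sketch makes several assumptions that are not stated in the text: each bounding box is reduced to a hypothetical (left, right) column range, the column Ci of an auxiliary CC is taken to be its own column range, and the case in which no main CC is intersected is left unassigned. The coordinates are invented to mirror the configuration of main CCs "3" and "4" and auxiliary CCs "5" and "6".

```python
def overlaps(a, b):
    """True when two (left, right) column ranges share at least one column."""
    return a[0] <= b[1] and b[0] <= a[1]


def assign_auxiliary(aux_box, main_boxes):
    """Pick the main CC for one auxiliary CC from bounding box extents.

    One intersecting main CC: it receives the auxiliary CC.
    Several intersecting main CCs: the one whose right edge is closest to
    the right edge of the auxiliary CC wins (case (iii) in the text).
    No intersecting main CC: None, since that case is not modelled here.
    """
    candidates = [label for label, box in main_boxes.items()
                  if overlaps(aux_box, box)]
    if not candidates:
        return None
    return min(candidates, key=lambda label: abs(main_boxes[label][1] - aux_box[1]))


# Hypothetical column ranges for two main CCs and two auxiliary CCs.
mains = {3: (40, 70), 4: (10, 45)}
aux_5 = (38, 48)   # overlaps both main CCs
aux_6 = (12, 20)   # overlaps main CC 4 only
print(assign_auxiliary(aux_5, mains), assign_auxiliary(aux_6, mains))
```

With these invented extents, the distances for auxiliary CC "5" are 3 to main CC "4" and 22 to main CC "3", so the rule d1 < d2 selects main CC "4".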
Eighth International Conference on Document Analysis and Recognition, vol. 2, pp. 605-609 (2005).
[7] Xiu, P., Peng, L., Ding, X., and Wang, H.: Offline Handwritten Arabic Character Segmentation with Probabilistic Model. Document Analysis Systems VII, pp. 402-412 (2006).
[8] Abdulla, S., Al-Nassiri, A., and Salam, R.A.: Offline Arabic Handwriting Word Segmentation Using Rotational Invariant Segments Features. The International Arab Journal of Information Technology, vol. 5, no. 2, pp. 200-208 (2008).
[9] AlKhateeb, J.H., Jiang, J., Ren, J., and Ipson, S.: Component-based Segmentation of words from handwritten Arabic text. International Journal of Computer Systems Science and Engineering, 5(1) (2009).
[10] Al-Hamad, H.A., Zitar, R.A.: Development of an efficient neural-Based Segmentation Technique for Arabic Handwriting Recognition. Pattern Recognition, vol. 43, no. 8, pp. 2773-2798 (2010).
[11] Elzobi, M., Al-Hamadi, A., Al Aghbari, Z.: Off-line Handwritten Arabic Words Segmentation Based on Structural Features and Connected Components Analysis. In I/V Communications and Mobile Network (ISVC) (2011).
[12] Lawgali, A., Bouridane, A., Angelova, M., and Ghassemlooy, Z.: Automatic segmentation for Arabic characters in handwriting documents. In Image Processing (ICIP), 18th International Conference on IEEE, pp. 3529-3532. IEEE (2011).
[13] Eraqi, H.M., and Abdelazeem, S.: A new Efficient Graphemes Segmentation Technique for Offline Arabic Handwriting. Frontiers in Handwriting Recognition (ICFHR), International Conference on. IEEE (2012).
[14] Samoud, F.B., Maddouri, S.S., and Amiri, H.: Three Evaluation Criteria's towards a Comparison of Two Characters Segmentation Methods for Handwritten Arabic Script. Frontiers in Handwriting Recognition (ICFHR), International Conference on. IEEE (2012).
[15] Al Hamad, Husam A.: Neural-Based Segmentation Technique for Arabic Handwriting Scripts. 21st International Conference on Computer Graphics, Visualization and Computer Vision, WSCG (2013).
[16] Osman, Y.: Segmentation Algorithm for Arabic Handwritten Text based on Contour Analysis. International Conference on Computing, Electrical and Engineering (ICCEEE) (2013).
[17] Elnagar, A., and Bentrcia, R.: A Recognition-Based Approach to Segmenting Arabic Handwritten Text. Journal of Intelligent Learning Systems and Applications, 7, pp. 93-103 (2015).
[18] Melhi, M.H.: Off-Line Arabic Cursive Handwriting Recognition Using Artificial Neural Networks. PhD thesis, Department of Cybernetics, Internet and Virtual Systems, University of Bradford, Bradford (2001).
[19] Plamondon, R., and Srihari, S.N.: Online and off-line handwriting recognition: a comprehensive survey. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22, pp. 63-84 (2000).
[20] Newman, W.M., and Sproull, R.F.: Principles of Interactive Computer Graphics. Sec. 17.2. 2nd edition, McGraw Hill (1989).
[21] Ali, A., Shaout, A., Elhafiz, M.: Two stage classifier for Arabic Handwritten Character Recognition. International Journal of Advanced Research in Computer and Communication Engineering, pp. 646-650 (2015).
[22] Bassil, Y., Alwani, M.: OCR Post-Processing Error Correction Algorithm Using Google's Online Spelling Suggestion. Journal of Emerging Trends in Computing and Information Sciences, vol. 3, no. 1 (2012).
[23] Zeki, A.M.: The segmentation problem in Arabic character recognition: The state of the art. First International Conference on Information and Communication Technologies, ICICT, pp. 11-26 (2005).
[24] Farooq, F., Govindaraju, V., and Perrone, M.: Pre-processing Methods for Handwritten Arabic Documents. In Eighth International Conference on Document Analysis and Recognition, vol. 1, pp. 267-271 (2005).
[25] Gonzalez, R., and Woods, R.: Digital Image Processing (3rd Edition). Prentice Hall, August (2008).
[26] Boubaker, H., Kherallah, M., Alimi, A.M.: New Algorithm of Straight or Curved Baseline Detection for Short Arabic Handwritten Writing. 10th International Conference on Document Analysis and Recognition, ICDAR '09, pp. 778-782. Washington, DC, USA. IEEE Computer Society (2009).
[27] Zhang, T., and Suen, C.: A Fast Parallel Algorithm for Thinning Digital Patterns. Communications of the ACM, vol. 27, no. 3, pp. 236-239 (1984).
[28] Bunke, H., and Wang, P.S.P.: Handbook of Character Recognition and Document Image Analysis. Chapter: Image Processing Methods for Document Image Analysis, pp. 15, 19. World Scientific (1997).
[29] Slavik, P., Govindaraju, V. (eds.): Equivalence of different methods for slant and skew corrections in word recognition applications. IEEE Trans. Pattern Anal. Mach. Intell. 23(3), pp. 323-326 (2001).
[30] Al Aghbari, Z.: HAH manuscripts: A holistic paradigm for classifying and retrieving historical Arabic handwritten documents. Expert Systems with Applications, vol. 36, pp. 10943-10951 (2009).
[31] El-Abed, H., and Margner, V.: Comparison of Different Preprocessing and Feature Extraction Methods for Offline Recognition of Handwritten Arabic Words. In Ninth International Conference on Document Analysis and Recognition, vol. 2, pp. 974-978 (2007).
[32] Elzobi, M., Al-Hamadi, A., Al Aghbari, Z., and Dings, L.: IESK-ArDB: a database for handwritten Arabic and an optimized topological segmentation approach. International Journal on Document Analysis and Recognition (IJDAR) (2012).
[33] Vincent, L.: Morphological grayscale reconstruction in image analysis: applications and efficient algorithms. IEEE Trans. Image Process. 2, pp. 176-201 (1993).