Thinning Algorithms for Arabic OCR
M.Tellache', M.A.Sid-Ahmed' and B.Abaza'
           First author was a student at the Univ. of Windsor, now in
           Algeria I.
       *   Second author is with the University of Windsor.
           Third author is with B t Y research lab.
                 Abstract                         A thinning algorithm is
           Two parallel thinning al-         considered useful if:
   gorithms are presented. One is                 1. It preserves the shape
   based on local operations to              of the original image.
   detect edge points, end-points                 2. It preserves the con-
   and break-points. The other is            nectivity.
   a matching algorithm in which a                3. It does not leave ex-
   set of eight templates and two            traneous pixels (branches).
   images (the current image and a
   ::.c?rk:nu ,maqa) arc. used in the                   2. Procedure.
   processiilg     .                                Most skeletonization al-
                                             gorithms consists of iterative-
               1. Introduction.              ly executing many passes over
        L a b i f 'cysad text consists       the pattern where iz eneL .-ass
   of connected characters forming           a few dark points are deleted.
   portrons Df words and isolated                   In any pass, .=i dark point
   characters. Thischaracteristic            to be deleted from the pattern
   makes recognition of typed text           must satisfy the following in-
   a more challenging one than               tuitive criteria:
   English text. The isolation of            a. It is an edge point.
   the characters from the connec-           b. It is not an end point.
   ted Arabic text is required a             c. It is not a break point.
   preprocessing step. Algorithms            d. Its deletion must not cause
   that do not require character             excessive erosion.
   isolation such as the vector-                    The processing of a thinn-
   stroke approach (11 processes             ing algorithm can either be
   the thinned image of a scanned            serial (sequential) or parall-
   page. Such algorithms are more            el. If g(i,j) and g'(i,j)l de-
   suited for the cursive nature             note respectively the input and
   of the Arabic text. Thinning              output of an elementary step in
   plays a major role in such an             point (i,j) then serial pro-
   application, andsince recogni-            cessing is defined by
   tion is dependent in part on                g' (i ,j)=F [: g' (i-1,j-1) ,gat( f -
   the effectiveness of the thin-            W + l )      '(i-1, j+l) , g f(I,]-
   ning algorithm, attention is              1) ,s(i,;;J,g(i,jtl) tg(i+ltj-l)
   given in this paper to the de-            g(l+l/I),g(i+L1+1)3 *
   velopment of effective thinning                  In parallel processing all
   algorithms for the purpose of             the points can be processed
   developing an Arabic OCR.                 simultaneously, then
        Thinning (or skeletoniza-              g' (i,j) =F[g (i-1, j-1) ,g (i-
   tion) of a binary pattern con-            1,j+1) ,g(i-1, j + l >,g(i,j-
   sists of successive deletion of           1) ,g(i,j),g(i,j+l),g(i+l,j-l),
   dark points (i.e. changing them           g ( i + W ) ,g(i+l,j+1! 1
   to white points) along the edg-                  In the following two stlz-
   es of the pattern until it is             tions two parallel algorithms
   thinned to a line.                        are developed.
IEEE Pac Rim '93                  -248   -                            0 1993 IEEE
                                                  0-7803-0971-5/93/$3.00
     3. PARALU3L-G                                        .
                                                         3 P2xP4xP6==0
                                                         4. P2xP4xP8=0
AuJo9sx1Rf153,
                                                         3.2   wn8tion      of   the
                               3 neighbor-
hood of the input image:
                                                    because they are the same for
                   Psi Pl P4                        all the sub-iterations, then
                                                    candit$ons 3 and 4 for each
                                                    sub-iteration is explained.
                                                          1. By co-ndition 1 the end-
         The algorithm is as fol-                   points of the skeleton line are
 lows:                                             preserved, i.e. this verifies
         In each pass, the input                   that PI is not am end-point.
pattern is scanned from the                              2. Condition 2 verifies
upper left hand corner to the                      that PI to be deleted is not a
i.--v      right- hand corner. In                  break-point, i.e. its deletiorr
each pass a dark point Pi is                       must not break the connected-
flagged if at least one of each
sub-iteration is satisfied.                              3. &
                                                                 . .
                                                   ness of the original pattern.
                                                                               d 4 or
         Nets +&et each of the sub-                the first sub-iteretion.
icerations is divided into four
               -.
                                                         By f % b e o o n d i t h s the
b4,ririditions                                     poirrt P1 which has been removed
F i r s t sub i-tion-                              might be an east border point
         1. 2SB(Pl)S6                              or south border point or north-
         2. A(P1)=1                                west corner point [2],[3].
         3. P2XP4XP6=0                                   The solution to the set of
         4. PIXP6XP8=0                             equations are:
where                                                    P4=0 or P6=0 or P2-0 and
   A(P1) is the number of 01
pattern in the ordered P2, P3,
                                                   P8=0.          . . s 3 and 4 for
       .
P4,. 8P9 and P2 that are the
eight neighbors of P1.
                                                         4. -tion
                                                   the second. mub-iteratian.
                                                         By these conditions the
         B(P1) is the number of                    point P1 which has been removed
nonhero neighbors of Pi. That                      might be a south border point
is                                                 or a west border point or a
B(Pl)=P2+P3+P4+P+P6+P+P8+P9                        north-east corner point.
                   -
Second sub iteration.                                    The solution to the set of
                                                   equations are:
     1. 21B(P1)16                                        P6=0 or P8=0 or P2=0 and
     2. A(P1)=1                                    P4=0.
     3. P2XP6XP8=0                                       5. Con-         3 and 4 for
     4. P4XP6XP8=0                                 the third sub iteration.
                                                                 h
               -
T h i r d s u b i t e r a t ion.                         By these conditions the
                                                   point P1 which has been removed
         1. 2SB(P1)56                              might he a north border point
           .
         2. A(Pl)=l
         3 P2XP4XP8=0
         4. P2XP6XP8=0
                                                   or a west border point or a
                                                   south-east corner point.
                                                         The solution to the set of
                -
Fourth sub i t a r p t i on .                      equations are:
                                                         P2=0 or P8=0 or P4=0 and
     1. 2SB(P1)56                                  P6=0.
     2. A(P1)=1
                                        -249   -
        6. Conditions 3 and 4 for             templates. In the templates
t            a            .                   zeros must match 0 valued pix-
     By these conditions the                  els and ones must match 1 val-
point P1 which has been removed               ued pixels and asterisks must
might be a north border point                 match either 1 or 0 valued pix-
or an east border point or a                  els in the current image.
south-west corner point.                           Initially the current im-
     The solution to the set of               age and the working image are
equations are:                                identical copies of the origi-
     P2=0 or P4=0 or P6=0 and                 nal input image. Template A1 is
P8=0.                                         compared with all pixels having
     The above algorithm can                  a value 1 and their neighbors
also be implemented using tem-                in the current image. If the
plates.                                       match is obtained, then the
     The process is repeated                  central pixel is removed in the
until no more pixels are delet-               working image. After processing
ed from the pattern, thus the                 with template A l l the current
final skeleton is obtained.                   image is discarded and the wor-
     Fig.1 shows the result of                king image becomes the new cur-
applying the above algorithm.                 rent image, and the new working
                                              image is obtained by copying
                                              the new current image. The pro-
                                              cess is repeated with templates
                                              A2, A3 , ...,B4 forming a com-
                                              plete cycle until no more pix-
                                              els are removed (i.e. the skel-
                                              eton is obtained.)
                                                   This method preserves the
                                              shape of the original image and
                                              maintains connectivity. The
                                              algorithm leaves a few extran-
                                              eous pixels only on the dots of
                                              the characters as shown in
F i g u r e 1. Result of applying             Fig.3. To avoid this problem
the parallel thinning al-                     the character and the dots are
gorithm.
                                              o o *   * o o    * 1 *    * 1 *
                                              0 1 1 1 1 0      1 1 0    0 1 1
                                              * 1 * * 1 *      * o o    o o *
        4.   MATCHING       ALGO-               A1     A2         A3      A4
RITHM.                                        0 0 0   1 * 0    * 1 1    o * *
        4 . 1 Alcroritbm.                     * 1 *   1 1 0    * 1 *    0 1 1
     Matching algorithm util-                 1 1 *   * * o    0 0 0    0 * 1
izes   eight    templates   [4]                 B1        B2      B3      B4
(Fig.2) and two images. The two
images are the current image                  F i g u r e 2 . Templates f o r the
and a working image of the same               matching algorithm. Note t I * t t
size which is updated when tem-               can either match to 1 or 0.
plates are matched.
     Pixels are removed by com-
paring each pixel having a val-               initially separated.
ue 1 and its neighbors in the
current image with a set of
                                    - 250 -
                                            Acknowledgement.
                                                 The support provided by
                                            the Defense Industrial Research
                                            Program (DIRP), Department of
                                            National Defense, Ottawa, is
                                            greatly appreciated.
     WO parallel thinning al-
gorich0 w a e pr-ted.     Both
algorithur preserve very well
the shap of the original image
and yield results that can be
effectively used in Arabic OCR
algorithms.
1. H.Almuallim and S.Yamaguchi,
"A method of recognition of
Arabic    Cursive    Handwriting" ,
IEEE Trans on PAUI Sept. 1987 ,
Vol. PAnr-9 "her 5, pp.715-
723.
2. T.Y.Zhang and C.Y.Suen, "A
fast parallel algorithm for
thinning digital patterns",
CO".    mn 27, 1988, pp. 236-
239.
3. Y.S.Chen   and W.H.HSU,   "A
modified fast parallel algo-
rithm for thinning digital pat-
terns",    Pattern    Recognition
Letters 7, 1988, pp.99-106.
4. C.Arcellf, L-Cordella and
S.Levialdi, "Parallel thinning
of binary pictures* , Electronic
Letters, Vol. 11, pp.148-149,
July 1975.
                                 -251   -