Script Identification Using Steerable Gabor Filters

W. M. Pan [1, 2], C. Y. Suen [1], T. D. Bui [2]

[1] Centre for Pattern Recognition and Machine Intelligence, Concordia University
[2] Dept. of Computer Science and Software Engineering, Concordia University
{wumo_pan, suen}@cenparmi.concordia.ca
bui@cs.concordia.ca

Abstract

Multi-channel Gabor filtering has been widely used in texture classification. In this paper, Gabor filters are applied to the problem of script identification in printed documents. Our work is divided into two stages. First, a Gabor filter bank is designed so that the extracted rotation-invariant features can handle scripts that are similar in shape and even share many characters. Second, the steerability property of Gabor filters is exploited to reduce the high computation cost resulting from frequent image filtering, which is a common problem in Gabor filter related applications. Results from preliminary experiments, in which Chinese, Japanese, Korean and English are considered, are quite promising. A language identification rate of over 98.5% can be achieved while image filtering operations are reduced by 40%.

1. Introduction

Script identification is an important step in the automatic processing of document images in a multilingual environment. Early determination of the language used in a document can greatly facilitate further processing, such as character recognition, document image indexing and translation. Many algorithms have been proposed in the past years, which can be generally divided into two categories: local methods, where local features are extracted at connected component level or text line level, and global methods, where script identification is treated as a texture classification problem.

Methods falling into the first category include [1, 2, 3, 4, 6, 7, 8, 9]. In [1], the vertical distributions of the upward concavities, the optical density statistics and the most frequently occurring word shape tokens are applied to identify seven Han-based and Latin-based scripts. This work has been extended in [2, 3] to cope with complex degraded document images and with more languages. Cluster-based templates of textual symbols are used in [4] for language identification. In [6], the length and position of the longest horizontal run in a text line are used to classify non-Indian scripts (English, Chinese, Arabic) and Indian scripts (Devanagari and Bangla). The three non-Indian scripts are then differentiated by means of vertical black run information, character density, the distribution of the lowermost points of the components and some features based on a water overflow analogy. For Devanagari and Bangla scripts, some specific character-level stroke configurations [7] are used to achieve the identification goal. A comparative study of the features extracted at text line level and word level in English/Arabic identification is made in [9]. An automatic feature selection scheme for language identification has been proposed in [8]. In general, these methods require preprocessing such as deskewing, page decomposition and connected component analysis.

Global methods formulate the language identification problem as a texture classification problem. In [5], representative features for each script are obtained by computing the mean of the channel output of Gabor filters at a given radial frequency and several orientations. After some preprocessing and normalization, Gabor filters are also applied to identify handwritten scripts in [10]. Unlike local methods, these methods do not require connected component analysis or image skew removal.

In this paper, a Gabor filter bank is designed so that the extracted rotation-invariant features can handle scripts that are similar in shape and even share many characters. Then, the steerability property of Gabor filters is exploited to reduce the high computation cost resulting from frequent image filtering, which is a common problem in Gabor filter related applications.

This paper is organized as follows: the rotation invariant features extracted by the Gabor filter bank are introduced in section 2. In section 3, the steerability property of Gabor filters with respect to rotation is discussed and basis filters are calculated to approximate the exact Gabor

Proceedings of the 2005 Eight International Conference on Document Analysis and Recognition (ICDAR’05)
1520-5263/05 $20.00 © 2005 IEEE
filters. Data preparation and preliminary results are given in section 4. Section 5 concludes this paper.

2. Rotation invariant features via Gabor filter bank

2.1. Gabor filter bank design

Gabor filters are complex sinusoidal gratings modulated by 2-D Gaussian functions in the space domain, and shifted Gaussian functions in the frequency domain. They can be configured to have various shapes, bandwidths, orientations and center frequencies. For the sake of convenience, we assume in this research that the modulating Gaussians of the filters have the same orientation as the complex sine grating. Mathematically, these filters can be defined as

$$h(x, y; F, \phi) = g\big(r_x(x, y, \phi),\, r_y(x, y, \phi)\big) \cdot \exp\big(2\pi j F\, r_x(x, y, \phi)\big) \tag{1}$$

where $r_x(x, y, \phi) = x\cos\phi + y\sin\phi$, $r_y(x, y, \phi) = -x\sin\phi + y\cos\phi$, $F$ is the radial frequency, $\phi$ stands for the orientation and

$$g(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right) \tag{2}$$

with $\sigma$ being the scale parameter. The frequency response of (1) is

$$H(u, v; F, \phi) = \exp\left\{-2\pi^2\sigma^2\left[\big((u\cos\phi + v\sin\phi) - F\big)^2 + (-u\sin\phi + v\cos\phi)^2\right]\right\} \tag{3}$$

What we need to do in Gabor filter design here is to configure the channel parameters $F$, $\phi$ and $\sigma$. In this research, three radial frequencies F = 16, 32, and 48 are used, instead of only one (F = 16) as in [5], so that more features can be extracted to differentiate scripts that are similar in shape or even share many characters. For each radial frequency, an orientation sample interval of 11.25° is used. This means that we will get 16 Gabor filters at different orientations. The constant $\sigma$ of these channels, which determines the channel bandwidths, is chosen to be inversely proportional to the central frequencies of the channels [12]. Frequency responses of the Gabor filters used in feature extraction are shown in Fig. 1.

Fig. 1. Frequency responses of the Gabor filters used in feature extraction. There are 48 Gabor channels in total. The image has been rescaled for better visibility.

2.2. Rotation invariant features

Each input image is first filtered by the Gabor filters in the filter bank. Then, we calculate the mean and standard deviation of the filtered image, which are denoted as $m(F, \phi)$ and $s(F, \phi)$ respectively.

For a given radial frequency, the rotation of the input image would only result in a shift in $\phi$ of both $m(F, \phi)$ and $s(F, \phi)$, and the amplitude of the 1-D Fourier transform in $\phi$ of these two quantities would remain invariant. Therefore, we can obtain rotation invariant features by taking the 1-D Fourier transform in $\phi$ of both $m(F, \phi)$ and $s(F, \phi)$. In this research, only the amplitudes of the first 5 coefficients of the DFT are selected as features, since the amplitudes of the other coefficients are too small and insignificant. Similar features may also be found in [5, 13]. In total, we extract 30 features for each image (3 radial frequencies × 2 statistics × 5 coefficients).

3. Steerable Approximation of Gabor Filters

The feature extraction mechanism discussed in the previous section is very time consuming. It requires filtering the given input image 48 times! Therefore, we need to find a method to speed up the feature extraction procedure.

Formula (3) shows that we actually get 16 rotated versions of the same Gabor filter at each radial frequency in the filter bank. This property suggests that the approximation method introduced below could be applied to save computation costs.

3.1. Steerable filter

The concept of steerability was first proposed by Freeman and Adelson in [14] and was further discussed by others in [15, 11, 16]. A function $f(x, y): \mathbb{R}^2 \to \mathbb{C}$ is steerable with respect to rotation if:

$$f^{\theta}(x, y) = \sum_{j=1}^{M} k_j(\theta)\, \varphi_j(x, y) \tag{4}$$

Here $f^{\theta}(x, y)$ is the rotated version (by an angle $\theta$) of $f(x, y)$. $\{\varphi_j(x, y)\}$ (j = 1, …, M) are the basis functions, which are independent of the rotation angle $\theta$. $\{k_j(\theta)\}$ (j = 1, …, M) are called the steering functions of $f$ associated with the basis functions $\{\varphi_j(x, y)\}$ and depend solely on $\theta$.

It is well known that convolution is a linear operation. Therefore, if a filter is steerable with respect to rotation, the filter output of a rotated version of this filter can be obtained by linearly combining the filter outputs of its associated basis functions, or specifically,

$$f^{\theta}(x, y) * I(x, y) = \sum_{j=1}^{M} k_j(\theta) \cdot \big[\varphi_j(x, y) * I(x, y)\big] \tag{5}$$

where $I(x, y)$ is the image to be filtered and $*$ denotes convolution.

If M is smaller than the number of orientation samples we need, we can save some computations. In cases where no finite M can be found, as for the Gabor function [17], some approximation has to be made. In this paper, the method proposed by Perona [11] is used, where singular value decomposition (SVD) is used to compute the least-squares optimal set of basis functions. For a given tolerable amount of error, this method also gives the minimum number of basis functions. Furthermore, it provides us with the discretized version of the steering functions needed to synthesize the exact Gabor functions.

3.2. Fast Feature Extraction Using Basis Functions

Feature extraction using exact Gabor filters has already been discussed in section 2. Here we show in pseudo-code how the filter output is synthesized using a limited set of basis functions. Features are extracted from these synthesized filter outputs in the same way as described in section 2.

Initialize a desired approximation level $\tau$ between 0 and 1;
For each F in {16, 32, 48} Do
    - Calculate the 16 rotated replicas of the discretized Gabor spectrum function $H(u, v; F, 0)$ based on formula (3);
    - Find the basis functions that can approximate these Gabor spectrum functions up to the level $\tau$;
    - Use these basis functions to filter the input image;
    - Synthesize the original 16 filter outputs using formula (5);
    - Extract features as described in section 2.
End For

4. Experimental Results

Our experiments are divided into three steps. First, we use features extracted via exact Gabor filters to do classifier training and testing. Second, we investigate the performance of features extracted from the filter output synthesized using the basis functions. From here we can compare the performance difference of these two methods. Lastly, we compare the proposed method with that in [4], which is a template-matching based method with templates selected using clustering analysis.

4.1. Data preparation

In this preliminary research, four languages are investigated: Chinese, Korean, Japanese and English. We collected 300 non-overlapping sample image blocks for English and 275 non-overlapping sample image blocks for each of the other languages. These image blocks are extracted from newspapers, books, magazines and computer printouts, which are scanned at a resolution of 200 dpi. Each sample image block has an area of 256×256 pixels.

Font variations are also taken into consideration during data preparation. For example, SongTi, KaiTi, HeiTi and Fang Song are used for Chinese, and Times New Roman, Arial and Courier are used for English. For the other 2 languages, different fonts are also used. Fig. 2 shows some sample images in our database.

In order to test the rotation invariant property of the extracted features, we rotate each sample by an angle randomly chosen within [1°, 180°]. Therefore, we get 600 samples for English and 550 samples for each of the remaining languages. Among these samples, 300 samples are chosen for training and the rest for testing.

Fig. 2. Sample images in our database.

4.2. Classification method

With extracted features, language identification can be treated as a standard pattern recognition problem, and many classifiers can be used. A two-layer feed-forward back propagation neural network is used as classifier in

our experiments. The last layer contains 4 neurons corresponding to the four language classes and the output of these neurons falls into [0, 1]. A class-assignment strategy with rejection is used:

- The class label for each sample is the index of the neuron which achieves the largest output among the four neurons, if the following conditions are satisfied:

  $L_1 > T_1$ and $(L_1 - L_2) / L_1 > T_2$

  Here $L_1$ and $L_2$ stand for the largest and the second largest outputs among the four neurons, and $T_1$ and $T_2$ are two thresholds. In our experiments, $T_1 = 0.6$ and $T_2 = 0.4$.
- Otherwise, the given sample is rejected.

With this class-assignment strategy, we can further define the reliability of the script identification algorithm as:

$$\text{reliability} = 1 - \frac{\text{Error\_sample}}{\text{Total\_sample} - \text{Rejected\_sample}} \tag{6}$$

4.3. Results

Table 1 gives the script identification performance using features extracted from the exact Gabor filter bank. These features correctly identify all the images in the training set and achieve an average correct rate of 99.05% on the testing set. The reliability of the proposed recognition method is also very high.

Table 1. Performance of the rotation invariant features extracted by exact Gabor filters. (‘C’ stands for “Correct Rate”, ‘E’ stands for “Error Rate” and ‘R’ stands for “Reject Rate”.)

              Training Set              Testing Set
              C(%)    E(%)    R(%)      C(%)    E(%)    R(%)
Chinese       100.0   0.0     0.0       98.80   0.80    0.40
Japanese      100.0   0.0     0.0       99.60   0.0     0.40
Korean        100.0   0.0     0.0       99.20   0.0     0.80
English       100.0   0.0     0.0       98.67   0.0     1.33
Average       100.0   0.0     0.0       99.05   0.2     0.75
Reliability   100                       99.81

Table 2 shows the results using features extracted from synthesized Gabor filter outputs. The approximation level is set to 99%. Compared with exact Gabor filters, there is a slight drop in the average language identification rate while the reliability of these features remains very high. Furthermore, only about 60% of the image filtering operations are needed with the proposed feature extraction strategy (8 + 10 + 11 = 29 basis filters instead of 48 Gabor filters).

Table 2. Performance of the features extracted by synthesized Gabor filter outputs at different approximation levels. (‘C’ stands for “Correct Rate”, ‘E’ stands for “Error Rate” and ‘R’ stands for “Reject Rate”.)

$\alpha(n) = 0.99$ ($n|_{F=16} = 8$, $n|_{F=32} = 10$, $n|_{F=48} = 11$)

              Training Set              Testing Set
              C(%)    E(%)    R(%)      C(%)    E(%)    R(%)
Chinese       100     0.0     0.0       97.6    0.0     2.4
Japanese      100     0.0     0.0       99.2    0.0     0.8
Korean        100     0.0     0.0       98.80   0.0     1.2
English       100     0.0     0.0       99.33   0.33    0.33
Average       100     0.0     0.0       98.76   0.1     1.14
Reliability   100                       99.9

Finally, we take the template matching based method proposed in [4] as a comparison. We exclude the rotated images from the training and testing sets since the method in [4] accepts only images with no skew. Therefore, we have 300 samples for English and 275 samples for each of the other languages. 150 samples for each language are selected for training and the rest for testing. Following the procedure given in [4], we get 267 clusters for Chinese, 80 clusters for English, 297 clusters for Japanese and 177 clusters for Korean. Recognition performances under three reliability thresholds (R values in [4]) are investigated during the matching or recognition procedure. We use all textual symbols found in the given image block. For each given textual symbol, we find the template within each script that is the best match (Hamming distance is used here). If the best template has an R value higher than the given threshold, we record the matching score; otherwise, this textual symbol is ignored. The script with the best mean matching score is picked as the script for the input image. No rejection criterion is applied in this experiment. The results are given in Table 3.

Table 3. Comparison between the proposed method and the method in [4] (R stands for the reliability threshold used there). "Train" and "Test" denote the training set and the testing set.

              Template Matching                                         Proposed Method
              R = 0.9             R = 0.7             R = 0.5             $\alpha(n) = 0.99$
              Train(%)  Test(%)   Train(%)  Test(%)   Train(%)  Test(%)   Train(%)  Test(%)
Chinese       98.67     98.4      100       100       100       100       100       99.2
Japanese      100       100       100       100       100       100       100       100
Korean        100       97.6      100       100       100       100       100       100
English       75.3      77.3      98.67     97.33     100       98.67     100       100
Average       93.5      92.57     99.67     99.24     100       99.62     100       99.81
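The class-assignment rule of section 4.2 and the reliability measure of formula (6) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `assign_class` and `reliability` are hypothetical, and the margin is taken as (L1 - L2)/L1, the reading under which the second condition can actually be satisfied. The counts in the last line (2 errors and 8 rejections out of 1050 testing samples) are assumed values chosen to be consistent with the percentages in Table 1; they are not stated in the paper.

```python
from typing import Optional, Sequence

T1 = 0.6  # threshold on the largest output (value from section 4.2)
T2 = 0.4  # threshold on the relative margin (value from section 4.2)


def assign_class(outputs: Sequence[float], t1: float = T1, t2: float = T2) -> Optional[int]:
    """Return the index of the winning neuron, or None if the sample is rejected."""
    # Rank neuron indices by their output, largest first.
    ranked = sorted(range(len(outputs)), key=lambda i: outputs[i], reverse=True)
    l1, l2 = outputs[ranked[0]], outputs[ranked[1]]
    # Accept only a strong winner that clearly beats the second largest output.
    if l1 > t1 and (l1 - l2) / l1 > t2:
        return ranked[0]
    return None


def reliability(errors: int, total: int, rejected: int) -> float:
    """Formula (6): one minus the error fraction among the accepted samples."""
    return 1.0 - errors / (total - rejected)


print(assign_class([0.05, 0.92, 0.10, 0.02]))  # clear winner: class 1
print(assign_class([0.05, 0.70, 0.55, 0.02]))  # margin too small: rejected
print(assign_class([0.05, 0.50, 0.10, 0.02]))  # winner too weak: rejected
# Assumed counts (not given in the paper), consistent with Table 1.
print(round(100 * reliability(errors=2, total=1050, rejected=8), 2))
```

Under these assumed counts the last line reproduces the 99.81% testing reliability of Table 1. Rejected samples are removed from the denominator, so the measure describes how trustworthy an accepted decision is rather than how many samples receive a label.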

Table 3 shows that the proposed method performs better on average. When a high reliability threshold is selected, fewer symbols pass the thresholding test during the matching procedure in our case, since we have only a limited number of symbols. We may then not be able to get enough information for script identification. That is why the template matching method works relatively poorly when we set R = 0.9. The performance improves when we decrease the reliability threshold.

5. Conclusion

In this paper, script identification using features extracted by Gabor filters is investigated and preliminary results on four languages, namely Chinese, Korean, Japanese and English, are given. Experimental results show the effectiveness of these features, even under some font variations and different orientations. Furthermore, the steerability property of Gabor filters proves to be very useful in reducing the computation needed for feature extraction, which is usually too high in Gabor filtering related applications. In our experiments, about 40% of the computations are saved with only a slight decrease in the performance of script identification. In our future research, more languages will be investigated. We will also compare the performance of different texture feature extraction methods.

References:

[1] A.L. Spitz, “Determination of the Script and Language Content of Document Images,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 3, pp. 235-245, 1997.

[2] D.S. Lee, C.R. Nohl, and H.S. Baird, “Language Identification in Complex, Unoriented, and Degraded Document Images,” Proc. IAPR Workshop on Document Analysis Syst., pp. 76-98, Oct. 1996.

[3] J. Ding, L. Lam, and C.Y. Suen, “Classification of Oriental and European Scripts by Using Characteristic Features,” Proc. 4th ICDAR, pp. 1023-1027, 1997.

[4] J. Hochberg et al., “Automatic Script Identification from Images Using Cluster-Based Templates,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 2, pp. 176-181, 1997.

[5] T.N. Tan, “Rotation Invariant Texture Features and Their Use in Automatic Script Identification,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 7, pp. 751-756, 1998.

[6] U. Pal, B.B. Chaudhuri, “Automatic identification of English, Chinese, Arabic, Devnagari and Bangla script line,” Proc. 6th ICDAR, pp. 790-794, 2001.

[7] U. Pal, B.B. Chaudhuri, “Automatic separation of words in multi-lingual multi-script Indian documents,” Proc. 4th ICDAR, pp. 576-579, 1997.

[8] V. Ablavsky, M.R. Stevens, “Automatic Feature Selection with Applications to Script Identification of Degraded Documents,” Proc. 7th ICDAR, pp. 750-754, 2003.

[9] A.M. Elgammal, M.A. Ismail, “Techniques for language identification for hybrid Arabic-English document images,” Proc. 6th ICDAR, pp. 1100-1104, 2001.

[10] V. Singhal, N. Navin, D. Ghosh, “Script-based classification of hand-written text documents in a multilingual environment,” Proc. 13th International Workshop on RIDE-MLIM 2003, pp. 47-54.

[11] P. Perona, “Deformable Kernels for Early Vision,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 17, no. 5, pp. 488-499, 1995.

[12] T.N. Tan, “Texture Feature Extraction via Cortical Channel Modeling,” Proc. 11th Int'l Conf. Pattern Recognition, vol. III, pp. 607-610, 1992.

[13] G.M. Haley, B.S. Manjunath, “Rotation-invariant texture classification using modified Gabor filters,” Proc. Int’l Conf. Image Processing, vol. I, pp. 262-265, 1995.

[14] W. Freeman and E. Adelson, “The design and use of steerable filters,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 13, no. 9, pp. 891-906, 1991.

[15] E. Simoncelli, W. Freeman, E. Adelson, and D. Heeger, “Shiftable multiscale transforms,” IEEE Trans. Information Theory, vol. 38, no. 2, pp. 587-607, 1992.

[16] P.C. Teo and Y. Hel-Or, “Design of Multi-Parameter Steerable Functions Using Cascade-Basis Reduction,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 21, no. 6, pp. 552-556, 1999.

[17] P.C. Teo and Y. Hel-Or, “Lie Generators for computing Steerable Functions,” Pattern Recognition Letters, vol. 19, Issue 1, pp. 7-17, 1998.
