0% found this document useful (0 votes)
6 views17 pages

De 2017

The document presents a novel technique for hierarchical vectorization of electrical drawings from document images, involving recognition of electrical symbols and their interconnections through morphological operations and geometric analysis. The process includes generating adjacency lists and performing super-component analysis to compactly represent circuit connections, ultimately leading to efficient storage through binary and ASCII encoding. Experimental results demonstrate the proposed method's efficiency and robustness in vectorizing complex electrical diagrams.

Uploaded by

hanoma7736
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
6 views17 pages

De 2017

The document presents a novel technique for hierarchical vectorization of electrical drawings from document images, involving recognition of electrical symbols and their interconnections through morphological operations and geometric analysis. The process includes generating adjacency lists and performing super-component analysis to compactly represent circuit connections, ultimately leading to efficient storage through binary and ASCII encoding. Experimental results demonstrate the proposed method's efficiency and robustness in vectorizing complex electrical diagrams.

Uploaded by

hanoma7736
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 17

APPLIED

PROBLEMS

Hierarchical Vectorization of Electrical Drawings in Document


Images by Connectivity Analysis of Symbols and Super-Components1
Paramita Dea,*, Sekhar Mandalb,**, and Partha Bhowmickc,***
a
Department of Computer Science and Engineering, MCKV Institute of Engineering, Howrah, West Bengal
b
Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology,
Shibpur, West Bengal, India
c
Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal, India
*e-mail: paramitade13@gmail.com
**e-mail: sekhar@cs.becs.ac.in
***e-mail: pb@cse.iitkgp.ernet.in

Abstract—A novel integrated technique is proposed for hierarchical vectorization of electrical drawings in
document images. Its first step includes recognition of different electrical symbols and their interconnections
based on morphological operations and geometric analysis in three well-distinguished subspaces. This is fol-
lowed by a hierarchical analysis for detecting the (series-or parallel-connected) super-components in an iter-
ative manner. Finally a compact collection of circuit adjacency lists is produced, which are reduced further
by binary encoding. Reconstruction algorithm has also been explained to merit the overall efficacy of the vec-
torization. Experimental results have been furnished to demonstrate its efficiency and robustness.

Keywords: electrical drawings, circuit symbols, geometric analysis, morphological operation, super-compo-
nent, vectorization
DOI: 10.1134/S1054661817020079

1. INTRODUCTION are different other techniques, such as morphological


processing and run-length compression [3, 19],
In the field of electrical engineering, architecture, region-based classification [9], orthogonal zig-zag
and mechanical engineering, different graphic nota-
technique [7], sparse pixel vectorization [8], and
tions or symbols are used to prepare different engi-
neering drawings. Automated systems that can convert object-oriented vectorization [29].
images of these drawings into suitable vector forms are Although some works can be found in the literature
in high demand today. A vectorized representation has on recognition of electrical symbols, there is no pub-
multi-faceted advantages like reduced memory lished work till date about an integrated approach on
requirement, ease of maintenance, and a hierarchical recognizing different classes of electrical symbols and
representation of their structure and content. In addi- then vectorizing the entire circuit in a scientific way.
tion, such a representation can readily be used for edit- For example, the earlier approaches considered only
ing, browsing, indexing, and filing of document logic circuit diagrams (made of OR, AND, and
images [17, 30, 32]. Exclusive systems have been inverter only) as input [10, 14, 22]; other electrical
designed, therefore, over the last few decades, e.g., symbols are not taken into consideration in the input
image and diagram extraction [11, 16], logo detection
drawings.
[4, 25], etc. An overview of these, along with their per-
formance evaluation, may be seen in [32]. There exist several techniques for electrical symbol
Vectorization approaches are of two categories: off- recognition. These include moment-based feature
line [1, 5, 13] and on-line [15, 21], and the techniques analysis, stroke analysis, neural network, and graph-
mostly depend on the domain knowledge. For exam- theoretic heuristics [1, 5, 13, 31, 33]. We specially refer
ple, for architectural drawings, classification of line to [18] on graph-structure representation of technical
components is important [24], whereas for map vec- drawings for recognition of repetitive sub-structures
torization, color-based partitioning is useful [28]. by graph parsing. Recently, it is shown in [2] that a
Similarly, for mechanical engineering drawings, there hierarchical graph-theoretic representation of symbols
1 The article is published in the original.
can indeed help graph matching methods in dealing
with low-level vectorization errors. The rationale of
our hierarchical approach for electrical drawing vec-
torization truly reckons with all these contemporary
Received March 20, 2016 works on symbol recognition and vectorization.

ISSN 1054-6618, Pattern Recognition and Image Analysis, 2017, Vol. 27, No. 2, pp. 309–325. © Pleiades Publishing, Ltd., 2017.
310 DE et al.

Electrical Symbol Connectivity Adjacencymatrix Adjacencylist Binary ASCII


drawing recognition analysis generation generation encoding encoding

yes

Super-
stop no Super-component Hierarchical
components
analysis vectorization
found?

Fig. 1. Schematic flow of the proposed algorithm on vectorization.

1.1. Our Contribution for circuit reconstruction is also given in this section.
Section 5 presents experimental results. We conclude
The objective of our work is to design an integrated the paper in Section 5 with a few interesting extensions
technique for vectorization of electrical drawings of of the work pertaining to electrical drawings in docu-
various sorts that may be present in document images. ment images.
Its first step is based on certain morphological opera-
tions, which, in turn, help in implementing a novel
idea of three spaces containing the horizontal line seg- 2. PREPROCESSING
ments (H-space), vertical line segments (V-space), and The input gray-scale image is first binarized and
(possibly broken) circuit symbols (C-space). These skew-corrected using the existing techniques [6, 23].
three spaces are built without thinning, and used in an From the histograms of height and width, the median
efficient way for searching and scanning during the
geometric analysis so as to derive the necessary relation height h and the median width w of the connected
among the constituent primitives of the symbols to be components are computed, and subsequently the
recognized using their structural features. After recog- components having height and width less than h and
nition, the center coordinates of the bounding box of w are removed as text. Morphological opening opera-
each component are stored, and the adjacency lists tion [12] with a structuring element of size 2 × 2w (2 ×
describing the connectivity of the circuit elements are 2h ) is applied to separate the horizontal (vertical) line
generated. This list can be used to regenerate the cir- segments. Three spaces H-, V-, and С-spaces are used
cuit from its vectorized representation. to store horizontal line segments, vertical line seg-
Figure 1 shows a schematic flow of the proposed ments, and circuit symbols, respectively. The length of
technique of vectorization. From the given image con- each line segment stored in the corresponding H or V
taining one or more electrical circuits, all the electrical space, is at least 2 × h or 2 × w (see Fig. 2).
symbols are recognized first using the information in While constructing H- and V-spaces, some of the
H-, V-, and C-spaces. Then the connectivities among components in C-space may get disconnected. To
the recognized symbols are analyzed, and the connec- restore the connectedness, a joining procedure is
tivity information is stored in an adjacency matrix and applied in C-space using the information from H- and
a set of adjacency lists. In addition, a hierarchical V-spaces. For example, in Fig. 3a, the two primitives
super-component analysis is done to find all the super- (for TRN) remain unchanged after joining, whereas
components present in the circuit. A super-component the three disconnected components in Fig. 3b become
is a subset of the circuit in which some symbols or a single component after joining, as shown in Fig. 3c.
components are connected either in series or in paral- The information about basic geometric primitives are
lel. At each hierarchical level of vectorization, adja- finally stored in C-space, as explained next.
cency lists are generated for the corresponding circuit.
A binary encoding is done to represent each adjacency
list. The encoded binary string is further converted 2.1. Inclined Line Segments
into its equivalent ASCII string to finally minimize the
Horizontal (vertical) scans are done from left to
storage space.
right (from top to bottom) to count the number of
The rest of this paper is organized as follows. Sec- black pixels in each scan. If these counts are nearly
tion 2 describes the preprocessing to separate text from equal, and then a hypothetical line segment joining
graphics and hence to extract line segments present in the two centroids of the black points corresponding to
a circuit. Section 3 explains the recognition of sym- the two extreme vertical scan lines, is considered. If
bols. Connectivity analysis, generation and represen- the points lying on the hypothetical segment are black,
tation of adjacency lists, and super-component analy- then the segment is identified as a line segment, and its
sis are presented in Section 4. An efficient algorithm length and angle of orientation are stored in C-space.

PATTERN RECOGNITION AND IMAGE ANALYSIS Vol. 27 No. 2 2017


HIERARCHICAL VECTORIZATION 311

(a) (b) (c)

Input Text-segmented graphics H-space


(d) (e) (f)

V-space C-space Recognized symbols

Fig. 2. Step-by-step demonstration of our symbol recognition procedure on a portion of a typical document page containing an
electrical drawing.

The scans are performed before joining, since two from left; otherwise, if they first decrease and then
inclined segments may get joined in the joining phase increase, then the component is concave from right. A
(e.g., OPA, DIO, etc.), resulting to difficulty in recog- concave segment may have four types of concavities
nizing them as inclined segments. (left, right, top, bottom), which are stored in C-space.

2.2. Circles 3. RECOGNITION OF ELECTRICAL SYMBOLS


Circles are detected after joining phase. The
bounding box (BB) is considered for a connected A symbol is a graphical entity with a particular
component in C-space. If the BB is found to be squar- meaning in the context of a specific application
ish, then four scan lines are considered whose orienta- domain. In case of electrical circuit diagrams, apart
tions are 0° (horizontal), 45°, 90° (vertical), and from standardized notations used to denote the sym-
135°—each passing through the center of BB. The BB bols, their rectilinear connections are also important.
is scanned from its exterior to verify whether the dis- The acronyms and the typical digital images of the
tance (measuring the inner diameter of the circle) symbols used in our work is given in Table 1.
from the point of black-to-white transition to the
point of white-to-black transition along each scan line
nearly equals that of each other. (a) (b) (c)

2.3. Concave Segments


Concave segments are also detected after joining
phase using the BB. The orientation is decided by the
height-width ratio of BB. For vertical orientation, hor-
izontal scan lines from top to bottom are considered.
Along each scan line, the distance of the white-to-
black transition point from the left boundary of BB is Fig. 3. Inclined line segments of a transistor (TRN) lying
in C-space. (a) Two primitives without enclosing circle,
measured. If in the ordered set of distances corre- (b) three components constituted by two primitives and
sponding to the scan lines, the distances first increase split-off arcs of the enclosing circle, (c) a single compo-
and then decrease, then the component is concave nent after joining the three components in (b).

PATTERN RECOGNITION AND IMAGE ANALYSIS Vol. 27 No. 2 2017


312 DE et al.

Table 1. Symbols detected by our algorithm. They are represented by their acronyms (Column 1), their orientations (Col-
umn 2), number of orientations (Column 3), and the corresponding binary codes (Column 4). Out of eight orientations of
EFT, only four have been shown here

Active components
Passive components
OPA 4 00000
RES 2 01101
DIO 4 00001
IND 2 01110
TRN
TRF 2 01111
8 00010
CAP 4 10000
NMOS 4 00011
PMOS 4 00100 Sources
SIG 2 10001
FET 8 00101
BAT 4 10010
Logic gates
Earthing
INV 4 00110 ETH 1 10011
AND 4 00111
OR 4 01000
NAND 4 01001
NOR 4 01010
XOR 4 01011
XNOR 4 01100

A symbol can have more than one orientation but is 3.1. Passive Components
always uniquely characterizable irrespective of that. As shown in Fig. 5, the height and the width of each
These unique properties are used for their detection, component are first computed in C-space. If the width-
as explained in this section. For example, a resistor to-height (or height-to-width) ratio is greater than 2,
(RES) is symbolized by three or more zig-zag-cum- then horizontal (vertical) scans are done to record the
periodic strokes of lines; whereas, an inductor (IND), number of white-to-black transitions (wb) for each
although closely resembles RES apropos its periodic scan. In Fig. 5a for the bottom scan, wb’s are at p1, p2,
strokes, is distinguishable by its constituent spiral and р3, and for the top scan, they are at p4, p5, and p6.
lines. A transformer (TRF), on the contrary, consists Let 8 denote the universal set of symbols, as enu-
of two oppositely faced inductors with two or more merated in Table 1. We explain here a typical case of
line segments of equal length between them. All other recognizing and distinguishing between two passive
symbols can indeed be distinguished from each other components, namely RES and IND. For this, let 6 1 =
based on the combination, dimensional proportion, {RES, IND} and 6 1' = 8\ 6 1 . For other symbols, the
and arrangement of constituent geometric elements rules are designed based on their structure and com-
like line segments, triangles, circles, and circular position, as mentioned earlier. Let the respective
shapes, concave and convex arcs, etc. number of wb’s for the bottom and the top scans be tmin
and tmax. Let xi be the x-coordinate of pi for i = 1, 2, …,
During the recognition procedure each time a sym- 6 (Fig. 5) and Δxi = xi + 1 – xi for i = 1, 2, …, 5. To rec-
bol is identified, a name and number like R1, R2, … ognize a symbol 6 ∈ 8 , we use the following empiri-
for resistor, C1, C2, … for capacitor like that is cal formula.
assigned. All the necessary information like shape or
orientation, number of terminals, center coordinates ⎧6 if (t min = t max ≥ 3) ∧ (Δ x max
of bounding boxes of the symbols are kept in a data ⎪⎪ 1
6 = ⎨− Δ x min ≤ αΔ x min ), (1)
structure. Figure 4 shows the schematic flow of recog- ⎪
nizing various symbols in stages. ⎪⎩6 1' otherwise,

PATTERN RECOGNITION AND IMAGE ANALYSIS Vol. 27 No. 2 2017


HIERARCHICAL VECTORIZATION 313

Passive components No Logic gates


Yes Yes Yes
IsRES? IsDIO? IsINV? Yes
IsXOR?
No No
No No
No Yes Yes Yes
IsIND? IsOPA? IsAND? Yes
IsXNOR?
No
No No
Yes Yes Yes
IsTRF? IsTRN? IsNAND? Yes
IsSIG?
No No
No No
Yes Yes Yes
IsCAP? IsFET? IsOR? Other symbols

Yes No No
No
No Yes Yes Yes
IsETH? IsBAT? IsMOS? IsNOR?
Yes
No No
Active components

Fig. 4. Proposed scheme on step-by-step detection of different symbols.

(a) (b)
y y
Δ x4 Δ x5 Δ x4 Δ x5
ymax ymax
p4 p5 p6
p4 p5 p6
3 3
h h h h
4 4
p1 p2 p3 p1 p2 p3
ymin ymin
w w
xmin x1 x4 x2 x5 x3 x6 xmax x xmin = x1 x4 x2 x5 x3 x6 xmax x
Δ x1 Δ x2 Δ x1 Δ x2

Fig. 5. Horizontal scanning of a circuit element, which results in distinctive recognition of a resistor (a) and an inductor (b).

where, Δxmin = min{Δxi : i = 1, 2, …, 5} and Δxmax{Δxi : it lies in C-space, and its bounding box is used to verify
i = 1, 2, …, 5}. The parameter α is set empirically; in the above conditions.
our experiments, we have taken a decision is taken α =
1/8. Next, to distinguish between RES and IND, the
decision is taken as follows. 3.2. Active Components
For a symbol like NMOS, the main structure being
⎧RES if nb < nw ,
6=⎨ (2) similar to CAP, we verify condition (c1) and an appro-
⎩IND otherwise, priate modification of condition (c2) of CAP. For
PMOS, in condition (c2), we search and verify the
where, nb denotes the number of black pixels and nw existence of a small circle (in C-space: Section 2)
that of white pixels, for the top scan (Fig. 5). touching the midpoint of one of the line segments
For many of the other symbols, the arrangement of constituting the PMOS. For all other symbols (FET,
the component lines is taken into consideration. For TRN, OPA, DIO) in this group, similar geometric
example, for the symbol CAP (capacitor), which is analysis is done.
formed by two vertical (horizontal) parallel lines,
searching is done in V(H)-space. For these two seg-
3.3. Logic Gates
ments, if (c1) their lengths lie in [2h , 4h ] and (c2)
there exists a line segment in H-space lying left (right) For this group of symbols, the major geometric
of and touching the midpoint of left (right) segment, primitives are lines, triangle, circle, and circular
then the two components under consideration form a shapes. As in the two previous groups, here also black-
CAP. If one of the two plates is a curve segment, then to-white transitions and geometric analysis are done

PATTERN RECOGNITION AND IMAGE ANALYSIS Vol. 27 No. 2 2017


314 DE et al.

1 2 3 4
(1)
Si

e
Scanline H-space

ac

ac
sp

sp
with

C-

C-
min{nb} li

V-space
Si (1) (2)
li li
(1')
li
(2)
Fig. 6. Detecting an inverter in C-space. (3)
Si
Si

e
e

ac
ac

sp
sp

C-
C-
to identify the gate symbols. For example, INV is rec-
ognized by its triangle and a small adjoining circle
(Fig. 6), AND (OR) gate by its convex (concave) seg- Fig. 7. Illustration showing (partial) connectivity analysis
ment from C-space and the matching line segments for the symbol Si.
from H- and V-spaces, and so forth.

Figure 7 provides an illustration. For each of l i(1)


4. CONNECTIVITY ANALYSIS
and l i(2) belonging to Li, C-space is searched for new
An electrical circuit can be represented by an adja- circuit symbols. On searching, it is found that the sym-
cency matrix or a set of adjacency lists, which bol S i(1) (resistor) is connected with l i(1) and another
describes the connectivity among different symbols in
the circuit. To identify the connectivity among the cir- symbol S i(2) (resistor) with l i(2). As in the previous step,
cuit symbols, the following procedure is applied. searching is performed again in H-space to find new
horizontal line segments that are incident on or are
Step 1. The orientation (horizontal/vertical) and
the number of terminals for each symbol Si are deter- intersecting with l i(1) . Here, the horizontal line seg-
mined. ment l i(1') is found in H-space, which is incident on l i(1) ,
Step 2. Based on the orientation and the number of from which the circuit symbol S i(3) (inverter) attached
terminals of Si, a search is applied in H(V)-space for with l i(1') is found in C-space.
the horizontal (vertical) line segments that are inci- Step 5. For each symbol Si, the neighborhood
dent on the terminals of Si. For example, for a hori- information is stored according to the type and the ori-
zontally oriented inverter, the search is done in entation of Si and those of the symbols with which Si is
H-space only on the left and the right terminals of the connected.
symbol. As illustrated in Fig. 7, we get li in H-space
corresponding to the right terminal of Si.
4.1. Adjacency List Generation
Step 3. If a horizontal (vertical) line segment li is
present, then a search is carried on in C-space for any Connectivity analysis is done to generate the adja-
cency matrix. As shown in Fig. 8a, each identified
other circuit symbol S i( j ) attached to li. A search is also symbol is surrounded by a gray-colored rectangle and
made to prepare the set of vertical (horizontal) line denoted by an uppercase character, such as R for resis-
segments, namely Li = {l i(1) , l i(2), …}, which are incident tor, F for FET, and so on. The connectivity analysis for
on or intersecting with li. In Fig. 7, corresponding to the symbols is done in accordance with the procedure
the horizontal line segment li found in H-space, we get explained in Section 4, and their adjacency lists are
generated to capture the connectivity or neighborhood
the vertical line segment l i(1) intersecting with li and l i(2) information. Before generation of adjacency lists,
incident on li. adjacency matrix is generated in order to easily sepa-
rate out the different circuits present within the same
Step 4. If Li ≠ 0/ , then for each vertical (horizontal) document page.
line segment if l i( j ) ∈ Li, H(V)-space is recursively As shown in Fig. 8b, the adjacency matrix for the
searched for horizontal (vertical) line segments that circuit is a sparse matrix, which is quite natural for any
are incident on or are intersecting with if. Then electrical circuit. Hence, use of adjacency lists is more
C-space is searched for any other circuit symbols con- scientific compared to adjacency matrix.
nected with these line segments, as explained in For each symbol, the characters L, R, T, В repre-
Step 3. sent left, right, top, and bottom terminals, respec-

PATTERN RECOGNITION AND IMAGE ANALYSIS Vol. 27 No. 2 2017


HIERARCHICAL VECTORIZATION 315

(a) (b)
R1 R2 R3 R4 R5 R6 F1 N1 N2 N3
R1 0 1 1 0 0 0 0 1 0 0
R2 1 0 1 0 0 0 0 0 1 0
R3 1 1 0 0 0 0 0 0 0 1
R4 0 0 0 0 0 0 1 0 0 0
R5 0 0 0 0 0 1 1 1 1 1
R6 0 0 0 0 1 0 1 1 1 1
F1 0 0 0 1 1 1 0 1 1 1
N1 1 0 0 0 1 1 1 0 1 1
N2 0 1 0 0 1 1 1 1 0 1
N3 0 0 1 0 1 1 1 1 1 0

(c)
R1 LR (x, y) R2 LL R3 LL N1 RL

F1 TLB (x, y) R4 TB R5 LB R6 LT N1 LR N2 LR N3 LR R6 BB

Fig. 8. Recognition of symbols and formation of adjacency lists for the image imaged, (a) the recognized symbols, surrounded by
gray-colored rectangles, (b) adjacency matrix after first level, (c) adjacency lists of symbols R1 and F1, which capture their neigh-
borhood information at the first level.

tively. In case of a symbol with two terminals, such as to the start symbol (Fig. 8c). The first field of ν1 (i ≥ 1)
resistor or capacitor, the terminals are “left” and contains the type of the symbol and its identification
“right” if the orientation is horizontal; and the termi- number. In the second field of νi excepting the first
nals are “top” and “bottom” if the symbol is vertically node ν1 that corresponds to the start symbol, the first
oriented. For a symbol with three terminals, such as character represents the terminal of the start symbol
FET or transistor, there are top, left, and bottom ter- stored in ν1 and the second character represents the
minals, or top, right, and bottom terminals, according terminal of the symbol in ν1. For the first node ν1, the
to their orientation. For a 4-terminal symbol, e.g., second field contains the terminals for the symbol
transformer, the types of four terminals are treated in stored in ν1, and the third field contains the coordi-
a conceptual manner instead of their physical orienta- nates ( x , y ) of the center of the bounding box of this
tion. For example, a transformer can have two left ter- symbol. For example, in Fig. 8c, in the adjacency list
minals and two right terminals if it is horizontally ori- of R1, the first field of the first node stores the name
ented. In this case, the upper left terminal is treated as R1. As R1 is horizontally oriented and has two termi-
T, and the lower left as L; the upper right terminal is nals, the second field of the first node contains L and
treated as R and the lower right as B. Similarly, a ver- R to denote its respective left and right terminals.
tically oriented transformer has two top terminals and During connectivity analysis, it is found that the left
two bottom terminals; in this case, the upper left ter- terminal of R2 is connected to the left terminal of R1.
minal is treated as L, the lower left as B, the upper right So in the second node, R2 is kept in the first field and
as T, and the lower right as R. LL in the second field, where the first L denotes the
Consider the resistor R1 in Fig. 8a. For this sym- left terminal of R1 and the second L denotes the left
bol, searching is performed in H-space—and subse- terminal of R2. Similarly, for symbol F1, there are
quently in V- and H-spaces in an iterative manner—to three terminals: top (T), left (L), and bottom (B), as
find the sequence of horizontal and vertical line seg- shown in the adjacency list of Fig. 8c. In Fig. 8a, total
ments adjacent to the left and the right sides of R1. The number of recognized symbols is ten. Hence, ten adja-
search result determines that the left terminal of R1 is cency lists are created for these ten symbols, two of
which being shown in Fig. 8c.
connected to left terminals of two symbols, namely R2
and R3, and the right terminal of R1 is connected to
the left terminal of N1. This entire connectivity infor- 4.2. Super-Component Analysis
mation is stored in the adjacency matrix correspond-
ing to R1, R2, R3, and N1 (Fig. 8b), from which their The symbols in an electrical or electronic circuit
adjacency lists are prepared in the next step. In an are connected in different ways. The frequently used
adjacency list, there are two fields corresponding to forms of connection are series and parallel. In a series
each node ν1 excepting the first node ν1 corresponding connection, symbols are connected along a single path

PATTERN RECOGNITION AND IMAGE ANALYSIS Vol. 27 No. 2 2017


316 DE et al.

(a) (a)

(b) (b)
S1 LR (x, y) S2 LL S3 LL S2 RR P1 LR (x̄, ȳ) R5 RB R6 RT F1 RL

S3 RR R5 RB R6 RT F1 RL
Fig. 10. Recognition of super-components at third level
using the second-level information shown in Fig. 8. (a) All
Fig. 9. Recognition of super-components at second level the parallel connections (surrounded by dark gray-colored
corresponding to the circuit of Fig. 8. (a) Super-compo- rectangles), (b) adjacency list of super-component P1.
nents (light gray-colored rectangles) formed by symbols
(gray-colored rectangles) connected in series, (b) adja-
cency list of the super-component S1.

so that the same current flows through all the symbols. cated by light gray-colored rectangles and are named
On the contrary, the voltage across all the symbols as S1, S2, and S3, as shown in Fig. 9a. As the super-
connected in parallel is same, and thus the connected components are formed, the discrete components
symbols form a loop. The symbols connected in series connected in series are decreased in count, thereby
or parallel are replaced by a single component, which reducing the number of adjacency lists.
is called a super-component. Each of the components The super-components form the basis of the sec-
forming a super-component must have two terminals. ond level of hierarchical vectorization. At this level, a
The adjacency list is reduced in size by finding out the new set of adjacency lists is created by analyzing the
super-components present in a circuit diagram. adjacency lists obtained at the first level. Clearly, the
After getting the first-level adjacency lists of a cir- number as well as the total size of adjacency lists at this
cuit, all the super-components are recognized so that level is less than that at the first level. The adjacency
the circuit can be represented in a more compact form. lists generated at this level are used to identify the
To identify the super-components, the symbols con- series or parallel connections among the super-com-
nected with one another in series or in parallel are ponents in the circuit. As an example, the adjacency
considered and analyzed using the horizontal and ver- list for the super-component S1 is shown in Fig. 9b.
tical line segments found in the respective H- and While searching the adjacency list of S1, it is found
V-spaces. that each of S2 and S3 is connected with both ends of
The information about the super-components S1. Hence, each of S2 and S3 occurs twice in the adja-
includes type of connection (series or parallel), identi- cency list of S1—once with LL and once with RR.
fication number of the super-component, and also the That is, left sides of S1, S2, and S3 are connected at a
identification numbers of symbols that constitute the common point, and the right sides of S1, S2, and S3
concerned super-component. Consider the adjacency are also connected. Same information is obtained by
list of R1 in Fig. 8c, where it is found that only one checking the adjacency lists of S2 and S3. So, S1, S2,
symbol N1 is connected to the right terminal of R1. and S3 form a parallel connection, namely P1, as
Similarly, the left terminal of N1 is connected to only shown in Fig. 10a by a dark gray-colored rectangle.
one symbol, which is R1, as found from the adjacency The adjacency list of super-component P1 is shown in
list of N1. No other symbol is connected in between Fig. 10b.
R1 and N1. So, the symbols Rl and N1 are connected Adjacency list analysis. The adjacency lists gener-
in series. Hence, Rl and N1 form a super-component, ated for the super-components are analyzed again to
namely S1. By this analysis, for the circuit in Fig. 8a, determine the existence of larger super-components.
three series connections are found. These are indi- If such super-components are formed, then the algo-

PATTERN RECOGNITION AND IMAGE ANALYSIS Vol. 27 No. 2 2017


HIERARCHICAL VECTORIZATION 317

Table 2. Space requirement for the binary string corresponding to the adjacency list of R1 shown in Fig. 11j
TS SL AL x y AS SL TRS TRD AS SL TRS TRD AS SL TRS TRD

R 1 3 173 231 R 2 B T T 1 B T T 2 T L
01101 0001 0011 000101 000111 01101 0010 11 10 01111 0001 11 10 01111 0010 10 00
01101 00111
Start symbol Size Center coordinates Next symbol Next symbol Next symbol
(13 bits) (22 bits) (13 bits) (13 bits) (13 bits)
35 bits 13 · value (AL) = 39 bits

rithm for generation of adjacency lists of these super- of a particular symbol in a particular image is 16, and
components is applied. The steps are as mentioned it has at most 16 neighbors. So, we need 4 bits to rep-
above. For example, in Fig. 10a, one parallel connec- resent the serial number SL of a particular symbol and
tion is formed by three series connections. This is done 4 bits to represent the size AL of its adjacency list (i.e.,
in an iterative manner until no super-component is number of its adjacent nodes). Clearly, with the
found. In this way, the circuit adjacency lists are pre- increase in the number of occurrence of a particular
pared in a hierarchical fashion in order to achieve symbol and with increasing interconnections among
multi-level vectorization. the symbols, the bit requirement per symbol will also
increase. Only two bits are needed for terminal infor-
mation, as there are at most four terminals (left, right,
4.3. Encoding of Adjacency List top, and bottom) for a particular symbol.
The hierarchical arrangement of all the adjacency Binary code for each type of symbols are already
lists generated for the symbols and the super-compo- shown in Table 1. Codes for the four possible terminals
nents present in a circuit effectively captures the con- are: L = 00, R = 01, T = 10, B = 11. Binary code used
nectivity information among the various circuit sym- for a super-component representing series connection
bols. For a compact representation, we apply a well- is 11110 and that for one representing parallel connec-
defined binary encoding to each adjacency list, which tion is 11111. Table 2 shows the binary string for a par-
generates a binary string. This binary string takes a few ticular adjacency list. First row of the table denotes the
bytes of memory space. As a result, instead of storing acronyms of the fields of information that form the
the image of the circuit, only the binary string is stored binary string. Second row shows the adjacency list
to succinctly represent a given circuit. The informa- information for a particular symbol. Third row gives
tion stored in the binary string comprises type of sym- the equivalent binary code for each field in the second
bol (TS), serial number (SL) of the symbol, size of the row. It is easy to observe that 9 bits are required to rep-
adjacency list (AL), center coordinates of the bound- resent the start symbol and 13 bits for each other sym-
ing box of the source symbol ( x , y ), adjacent symbol bol. Third field of the encoded string represents the
(AS), terminal name (TRS) of the source symbol, and numbers of nodes in a particular adjacency list, which
terminal name (TRD) of the destination symbol, in is encoded by 4 bits. So, first 13 bits of the binary string
this particular order. for each adjacency list represents the start symbol and
Number of bits required for each field of informa- the number of its neighbor symbols. Therefore, size of
tion in the encoded binary string is shown in Table 3. each binary string depends on the value of AL, as
Our algorithm can recognize 20 different symbols shown by an example in Table 2.
(given in Table 1), and a specific acronym is assigned
for each type of symbol. Hence, to represent the types The length of the binary string Li corresponding to
of symbol in binary, at least 5 bits are required. In our the ith adjacency list is given by
dataset, the maximum image size is 1870 × 1121,
thereby requiring 11 + 11 = 22 bits to represent the Li = 13(1 + value( ALi )) + 22
center coordinates of the bounding box for each sym- = 35 + 13 value( ALi ).
bol. We have assumed that the maximum occurrence
Let N be the number of symbols. Then the number of
adjacency lists is also N. Hence, the length of the con-
Table 3. Bit length for binary string catenated binary string is
TS SL AL x y AS SL TRS TRD N N

5 4 4 11 11 5 4 2 2 ∑L i = 35N + 13 ∑ value( AL ). i (3)


i =1 i =1

PATTERN RECOGNITION AND IMAGE ANALYSIS Vol. 27 No. 2 2017


318 DE et al.

(a) Input (b) Text-segmented image (с) H-space

(d) V-space (e) C-space (f) Recognized symbols

R1 R2 R3 T1 T2 C1 S1
R1 0 1 0 1 1 0 0
R2 1 0 0 1 0 0 0
R3 0 0 0 0 0 1 1
T1 1 1 0 0 0 0 1
T2 1 0 0 0 0 0 1
C1 0 0 1 0 0 0 1
S1 0 0 1 1 1 1 0

(g) Individual symbols (h) Super-component “Z1” (dark-gray rectangle) (i) Adjacency matrix

Fig. 11. Step-by-step results of our algorithm on a portion of a typical document page containing an electrical drawing (continued
to next page).

To convert the concatenated binary string into its Step-by-step snapshots of our algorithm on a
ASCII equivalent, it is subdivided into a sequence of small-yet-representative image containing an electri-
substrings so that each substring consists of 8 bits. If cal circuit are shown in Fig. 11. All its seven adjacency
the binary string length is not divisible by 8, then lists at 1st level are shown in Fig. 11j. Note that for an
required number of zeros (at most 7) are padded with earthing symbol, no adjacency list is prepared. How-
it to make it divisible by 8. Hence, the length of the ever, for each symbol s with an earthing symbol adja-
ASCII string in bytes is given by cent to it, the necessary information is stored in the
adjacency list of s, which is sufficient. Table 2 shows
the binary code for the adjacency list of R1. The seven
⎡1 ⎛ N
⎞⎤
⎢ ⎜ ⎜
⎢⎣8 ⎝
35N + 13 ∑ value( ALi ) ⎟⎥ .
⎟⎥
(4) lines of the binary string shown in Fig. 11k corre-
sponds to the binary encoding of the seven adjacency
i =1 ⎠⎦ lists shown in Fig. 11j. The total length of this binary
Each of these 8-bit substrings is represented by its string is 531. The ASCII string, as shown in Fig. 11l, is
equivalent ASCII character, thereby replacing the equivalent to the binary string shown in Fig. 11k, and
binary string by its equivalent ASCII string in order to its length is 67 bytes.
minimize the storage. Decoding of the ASCII string The second level (and subsequent levels, if any) of
can easily be done by using the aforesaid information vectorization is also done by forming super-compo-
in its well-defined format. nents after the analysis of the adjacency lists generated

PATTERN RECOGNITION AND IMAGE ANALYSIS Vol. 27 No. 2 2017


HIERARCHICAL VECTORIZATION 319

R1 TB 173 231 -> R2 BT -> T1 BT -> T2 TL


R2 TB 463 225 -> R1 TB -> T1 TT -> E1 BT
R3 TB 528 598 -> C1 TT -> S1 TB -> E1 BT
T1 TRLB 347 369 -> R1 TB -> R2 TT -> S1 RL
T2 LBTR 156 649 -> R1 LT -> S1 BT -> E1 RT
C1 TB 531 774 -> R3 TT -> S1 TB -> E1 BT
S1 TLB 308 569 -> R3 BT -> T1 LR -> T2 TB
-> C1 BT

(j) Adjacency listsat 1st level (k) Binary string corresponding to the adjacency list
at 1st level

(l) Equivalent ASCII string for the binary string in (k)


Z1 TB 528 695 -> S1 TB -> E1 BT
R1 TB 173 231 -> R2 BT -> T1 BT -> T2 TL
R2 TB 463 225 -> R1 TB -> T1 TT -> E1 BT
T1 TRLB 347 369 -> R1 TB -> R2 TT -> S1 RL
T2 LBTR 156 649 -> R1 LT -> S1 BT -> E1 RT
S1 TLB 308 569 -> Z1 BT -> T1 LR -> T2 TB

(m) Adjacency lists at 2nd level (n)Binary string corresponding to the adjacency list
at 2nd level

(o) Equivalent ASCII string for the binary stringin (n)

Fig. 11. (Contd.)

Level 1 Level 2 Level 3

Fig. 12. Reconstructed images from three-level vectorization corresponding to the circuit shown in Fig. 10.

at the first level (or previous level). In Fig. 11h, a par- 4.4. Circuit Reconstruction
allel connection is found formed by the symbol R3
and C1 (namely Z1) as the top terminals of both the We have prepared a symbol pool containing all the
symbols are connected at the same point, and the symbols that are recognized by our symbol recognition
bottom terminals of both the symbols are connected algorithm. During the reconstructed, the symbols are
with earthing symbols (Fig. 11g). Figure 11m shows used from this symbol pool. Figure 12 shows the step-
the adjacency lists at the second level. The equivalent by-step reconstruction results obtained from the vec-
binary string (431 bytes) and ASCII string (54 bytes) torized representation of image 01 (Fig. 8). These
for these adjacency lists are shown in Figs. 11n, 11o images correspond to the adjacency lists at hierarchi-
respectively. cal levels 3, 2, and 1.

PATTERN RECOGNITION AND IMAGE ANALYSIS Vol. 27 No. 2 2017


320 DE et al.

ment stage (Section 4.4.1). The node νj is accessed by


using the pointer from A[j], as mentioned in
Section 4.4.1, in constant time, without traversing any
other node of + . Using the terminal coordinates of Si
and those of Sj, as obtained from νi and νj, respectively,
the appropriate terminals of Si and Sj are joined in
Manhattan layout by applying a line drawing algo-
rithm. Once this is done for all symbols in Ai, the node
νi is deleted from + , and the pointer from A[i] is set to
NULL. This procedure has a two-fold advantage. The
first one is that the connection of Si with all its adja-
cent nodes is done in time linear in the size of its adja-
Fig. 13. Reconstructed image from the vectorization result cency list, i.e., size of Ai. The second one is that while
corresponding to the circuit Fig. 11a. making the wire connection of Sj ∈ Ai to Si ∈ Aj, no
action will be taken as the pointer in A[i] is found to be
NULL. Thus, a new wire connection between two
For reconstruction of the circuit from the vector-
ized result, the binary string is generated at first from adjacent nodes Si and Sj is done only when the respec-
the ASCII string, with necessary truncation of the tive pointers from A[i] and A[j] are both not NULL.
zero-padded bits. The adjacency lists are then gener- This ensures a fast decision in avoiding duplicate con-
ated from the binary string, using which the circuit is nections.
reconstructed at the concerned level. Some major
issues involved with the reconstruction are explained As an example, consider the adjacency lists in
below. Fig. 11j. The bottom terminal of R1 is connected with
the top terminal of R2 in the first list. In the second
4.4.1. Symbol Placement. The source node of each list, the top terminal of R2 is connected with the bot-
adjacency list provides the center coordinates of the tom terminal of R1. Here, these two connections being
bounding box, the type, and the orientation of the
equivalent, we make the connection just once.
source symbol. Based on the symbol type, the corre-
sponding symbol image is fetched from the symbol If a neighbor node corresponds to the earthing
pool; then using its orientation and center coordinates symbol, then according to the terminals and the orien-
of the bounding box, a rectangular symbol box is tation of the source symbol, the position at which the
placed on the reconstructed image canvas and the earthing symbol will be placed, is calculated. The
symbol image is drawn inside this symbol box. The
earthing symbol is then retrieved from the symbol
dimensions of the symbol box are set in a manner so as
to just contain the symbol image. database and placed in the appropriate position inside
the symbol box. A line is then drawn to join the earth-
The necessary information about the symbol image ing symbol and the source symbol. The reconstructed
are stored in a linked list, namely + , which is used circuit corresponding to the adjacency lists in Fig. 11j
while drawing the wire connections (Section 4.4.2). is shown in Fig. 13.
For each symbol image, the coordinates of its termi-
nals and the ID of the concerned symbol constitute a 4.4.3. Time complexity. As explained in
node of + . For efficient reconstruction, a 1D array A Sections 4.4.1 and 4.4.2, the terminals of two adjacent
is used, whose functionality is explained in symbols are obtained from + , using the appropriate
Section 4.4.2. Each element of A contains a symbol ID pointers from A, and are located on the canvas in con-
and a pointer to the corresponding node in + . When- stant time. Hence, for each symbol Si, connections with
ever the image of a new symbol Si (with ID = i) is all its adjacent symbols are done in Ti = O(|Ai| + Wi)
drawn on the canvas, its corresponding node νi is time, where |Ai| denotes the size of the adjacency list of
inserted in + , and the pointer in A[i] is made to point Si, and Wi denotes the sum of lengths of all lines drawn
to νi. by the line drawing algorithm to make the wire con-
4.4.2 Wire connection. After placement of the sym- nections of all the symbols of Ai with Si. Summing up
bols within their respective symbol boxes, the adja- the time Ti over all symbols, we get the total time com-
cency list of each symbol Si is scanned to connect Si plexity for circuit image reconstruction as O(|A| + W),
with all its adjacent (neighbor) symbols. Let Ai denote where |A| denotes the total size of all the adjacency lists
the set of symbols adjacent to Si. Each symbol Sj ∈ Ai and W denotes the sum of lengths of all the wires. The
has a unique ID, i.e., j, whose information has already reconstruction procedure thus runs with an optimal
been stored in the node νj of + during symbol place- time complexity.

PATTERN RECOGNITION AND IMAGE ANALYSIS Vol. 27 No. 2 2017


HIERARCHICAL VECTORIZATION 321

Input Text-segmented graphics


(a) (b)

H -space V-space

(c) (d)

C-space Recognized symbols

(e) (f)

(g) Series connections “Y1,” “Y2” (light gray rectangles) (h) Parallel connections “Z1,” “Z2” (dark-gray rectangles)

Fig. 14. Result of our algorithm on a portion of the image of a typical document page containing an electrical drawing.

5. EXPERIMENTAL RESULTS C-spaces are successfully performed on all the docu-


ment images. Detailed results on three typical images
The proposed algorithm has been implemented in are shown in Figs. 11, 14, and 15. As evident from these
С language and tested on a diverse set of document results, connectivity analysis is performed successfully
images scanned from different books [20, 26, 27]. The by our algorithm on different circuit diagrams. As an
dataset contains 43 document images, and each image example, consider the circuit diagram shown in
contains one or more circuit diagrams. All experi- Fig. 11g. There is a super-component formed by the
ments have been carried out by us in Intel (R) core symbols R3 and C1, which is correctly identified by
(TM) 2 Duo CPU T6400 2 GHz machine, the OS our algorithm, as shown in Fig. 11h. The adjacency
being Ubuntu 12.04 LTS. matrix and the adjacency lists of this circuit are also
As explained in Sections 2 and 3, the morphologi- correctly reported, as shown in Figs. 11i, 11j. Details
cal operations and geometric analysis in H-, V-, and about their encoding and subsequent reconstruction

PATTERN RECOGNITION AND IMAGE ANALYSIS Vol. 27 No. 2 2017


322 DE et al.

(a) Input (b) Text-segmented graphics

(c) H-space (d) V-space

(e) C-space (f) Recognized symbols

(g) Individual symbol (h) Super-component “Z1”−“Z4” (dark-gray rectangles)

Fig. 15. Tlesult of our algorithm on a portion of a typical document page containing two electrical drawings. The fact that these
two drawings are disjoint, is correctly recognized by our algorithm.

have already been explained in Sections 4.3 and 4.4. the wire connections among these symbols, the series
Figure 14 shows the result on a larger image, which has and the parallel connections are also correctly recog-
six different types of symbols. These are resistor, nized, as shown in Figs. 14g, 14h.
capacitor, transistor, battery, signal, and earthing,
which are all correctly identified by our algorithm. As The document image shown in Fig. 15a is of
evident from Fig. 14b, all the text annotations are cor- another class in which the image contains two (or
rectly removed before recognizing the wire lines and more) distinct circuits. The adjacency matrix for the
the symbol components in H-, V-, and C-spaces, as image as a whole is generated at first. Then the adja-
shown in Figs. 14c–14e. The recognized symbols are cency matrix A is analyzed to identify and separate the
correctly restored from their fragmented components
in C-space, as evident from the resultant image shown two individual circuits. A queue is used to resolve the
in Fig. 14f. Note that, in this image, the symbol names disjoint matrices, and a depth-first-search is applied
written beside the corresponding symbols are assigned on A to detect the disconnected circuits one by one.
by the algorithm, and hence not conforming to the Step-by-step results shown in Figs. 15b–15h depict
annotated names in the input image. After analyzing the fitness of the algorithm in dealing with this class of

PATTERN RECOGNITION AND IMAGE ANALYSIS Vol. 27 No. 2 2017


HIERARCHICAL VECTORIZATION 323

R1 TB 337 463 ->R3 BT -> I1 TT ->C2 BR ->C4 TT ->S1 BL


R3 TB 582 463 ->R1 TB -> C2 TR ->S1 TL ->E1 BT
R5 TB 624 665 ->C6 TT -> S1 TB ->E1 BT
I1 TB 284 726 ->R1 TT -> C1 BL ->C4 TT ->C4 BB ->S1 BT
C1 LR 379 775 ->I1 LB -> C4 LB ->S1 LT
C2 LR 457 389 ->R1 RB -> R3 RT ->S1 RL
C4 TB 286 621 ->R1 TT -> I1 TT ->I1 BB ->C1 BL ->S1 BT
C6 TB 619 812 ->R5 TT -> S1 TB ->E1 BT
S1 TLB 459 639 ->R1 LB -> R3 LT ->R5 BT ->I1 TB ->C1 TL -> C2 LR -> C4 TB -> C6 BT
R2 TB 342 1276 ->R4 BT -> I2 TT ->C3 BR ->C5 TT ->S2 BL
R4 TB 585 1275 ->R2 TB -> C3 TR ->S2 TL ->E1 BT
R6 TB 626 1456 ->C7 TT -> S2 TB ->E1 BT
I2 TB 286 1518 -> R2 TT -> C5 TT ->C5 BB ->S2 BT
C3 LR 458 1202 -> R2 RB -> R4 RT ->S2 RL
C5 TB 289 1414 -> R2 TT -> I2 TT ->I2 BB ->S2 BT
C7 TB 621 1602 ->R6 TT -> S2 TB ->E1 BT
S2 TLB 461 1432 ->R2 LB -> R4 LT ->R6 BT ->I2 TB -> C3 LR -> C5 TB -> C7 BT
I3 TB 286 1564 ->E1 BT

Z1 TB 624 747 ->S1 TB -> E1 BT


Z3 TB 284 663 ->R1 TT -> C1 BL -> S1 BT
R1 TB 337 463 ->Z3 TT -> R3 BT -> C2 BR -> S1 BL
R3 TB 582 463 ->R1 TB -> C2 TR -> S1 TL -> E1 BT
C1 LR 379 775 ->Z3 LB -> S1 LT
C2 LR 457 389 ->R1 RB -> R3 RT -> S1 RL
S1 TLB 459 639 ->Z1 BT -> Z3 TB -> R1 LB -> R3 LT -> C1 TL -> C2 LR
Z2 TB 626 1537 ->S2 TB -> E1 BT
Z4 TB 286 1455 ->R2 TT -> S2 BT
R2 TB 342 1276 ->Z4 TT -> R4 BT -> C3 BR -> S2 BL
R4 TB 585 1275 ->R2 TB -> C3 TR -> S2 TL -> E1 BT
C3 LR 458 1202 ->R2 RB -> R4 RT -> S2 RL
S2 TLB 461 1432 ->Z2 BT -> Z4 TB -> R2 LB -> R4 LT -> C3 LR
I3 TB 286 1564 ->E1 BT

Fig. 16. 1st level adjacency list (top) and 2nd level adjacency list (bottom) for the images of Figs. 15g, 15h.

images. The adjacency lists generated in 1st and 2nd fied series connections (Ns), number of identified par-
levels for the two circuits of Fig. 15 are shown in allel connections (Np), number of hierarchical levels of
Fig. 16. vectorization (L), and the total CPU time required to
Table 4 shows all the information about image size, execute the proposed algorithm. The CPU time
includes the time for recognition of symbols, connec-
number of identified symbols (N), number of identi-
tivity analysis of the recognized symbols, and super-
component identification. As a result, the CPU time
Table 4. Summary of experimental results for different varies not only with the size of the image, but also with
electrical drawings the total number of symbols identified in the input
CPU image, types of connections among these symbols,
Input Image size N Ns Np L and on the number of super-components present in
time, s
the image.
Image 01 (Fig. 10) 702 × 613 10 3 1 3 2.94
Figure 17 shows two plots on the efficiency of our
Image 02 (Fig. 15) 1524 × 578 18 0 4 2 8.34 vectorization algorithm. The x-axis represents the
Image 03 (Fig. 11) 671 × 634 7 0 1 2 2.28 number of symbols recognized from a circuit, and the
y-axis represents the size of the ASCII string obtained
Image 04 (Fig. 14) 647 × 1776 20 2 2 2 4.77 by vectorization corresponding to that circuit. The
Image 05 768 × 552 9 0 1 2 3.90 solid-stroke and dashed-stroke curves indicate the
first and the final levels of vectorization respectively.
Image 06 922 × 589 8 2 1 2 2.67 The dashed-stroke curve lies below the solid-stroke
Image 07 1316 × 626 11 2 1 2 3.66 curve because after finding the super-components in a
circuit, the adjacency lists are smaller in size, thus
Image 08 1412 × 592 13 4 1 2 3.96 reducing the size of the ASCII string at a higher level.
Image 09 596 × 576 8 2 1 2 2.13 The vertical gap between two curves signifies the num-
ber of super-components present in the circuit.
Image 10 519 × 575 6 1 1 2 1.30
Hence, this vertical gap increases with the increase in

PATTERN RECOGNITION AND IMAGE ANALYSIS Vol. 27 No. 2 2017


324 DE et al.

Size of ASCII string in bytes in graphical document images,” in Proc. SSPR/SPR


350 (Hiroshima, 2012), pp. 529–538.
3. R. Cardoner and F. Thomas, “Efficient morphological
300 set transformations on line drawings,” Int. J. Pattern
Recogn. Artficial Intellig. 11 (6), 947–959 (1997).
250
4. J. Chen, M. K. Leung, and Y. Gao, “Noisy logo recog-
200 nition using line segment Hausdorff distance,” Pattern
Recogn. 36 (4), 943–955 (2003).
150 5. T. Cheng, J. Khan, H. Liu, and D. Yun, “A symbol rec-
ognition system,” in Proc. 2nd Int. Conf. on Document
100 1st level Analysis and Recognition, ICDAR’93 (Tsukuba, 1993),
2nd level pp. 918–921.
50
6. A. Das and B. Chanda, “A fast algorithm for skew
detection of document images using morphology,” Int.
5 10 15 20 25 J. Doc. Anal. Recogn. 4, 109–114 (2001).
Number of symbol 7. D. Dori, “Orthogonal zig-zag: An algorithm for vector-
izing engineering drawings compared with Hough
Fig. 17. Plot of vector size versus the number of symbols transform,” Adv. Eng. Software 28 (1), 11–24 (1997).
identified from a circuit, showing the quality of vectoriza- 8. D. Dori and W. Liu, “Sparse pixel vectorization: an
tion by the proposed algorithm. algorithm and its performance evaluation,” IEEE
Trans. Pattern Anal. Mach. Intellig. 21 (3), 202–215
(1999).
the number of super-components, and decreases with
9. D. Elliman, “A really useful vectorization algorithm,”
the decrease of the latter. in Proc. 3rd Int. Workshop on Graphics Recognition,
Recent Advances, GREC’99 (2000), pp. 19–27.
CONCLUSION 10. Y. Fukada, “A primary algorithm for the understanding
of logic circuit diagrams,” Pattern Recogn. 17 (1), 125–
The proposed work is the first of its kind in devel- 134 (1984).
oping an integrated system for vectorization of electri- 11. R. P. Futrelle, M. Shao, C. Cieslik, and A. E. Grimes,
cal drawings present in a document image. Its “Extraction, layout analysis and classification of dia-
approach towards hierarchical analysis of an electrical grams in pdf documents, in Proc. 7th Int. Conf. on Doc-
drawing can be useful for interpretation of a large and ument Analysis and Recognition, ICDAR’03 (Edinburgh,
complex circuit in symbol-wise level as well as in the 2003), pp. 1007–1014.
level of super-components. The electrical circuit has 12. R. C. Gonzalez and R. E. Woods, Digital Image Pro-
been vectorized to an optimized form, as the adjacency cessing, 2nd ed. (Addison-Wesley, Longman Publ.,
lists are represented by a binary string, and subse- 1992).
quently to an ASCII string to reduce the storage fur- 13. F. C. A. Groen, A. C. Sanderson, and J. F. Schlag,
ther. The novel idea of hierarchical vectorization may “Symbol recognition in electrical diagrams using prob-
be extended to represent a circuit diagram by an abilistic graph matching,” Pattern Recogn. Lett. 3 (5),
appropriate set of graphs corresponding to different 343–350 (1985).
levels of abstraction, which may be quite useful in 14. S. Kim, J. Suh, and J. Kim, “Recognition of logic dia-
determining whether two or more circuits of different grams by identifying loops and rectilinear polylines,” in
physical layouts are topologically similar or not. Proc. 2nd Int. Conf. on Document Analysis and Recogni-
tion, ICDAR’93 (Tsukuba, 1993), pp. 349–352.
15. H. Kojima and T. Toida, “Online hand-drawn line-fig-
ACKNOWLEDGMENTS ure recognition and its application,” Pattern Recogn. 2,
1138–1142 (1988).
The authors deeply acknowledge the anonymous 16. J. Li, A. Najmi, and R. Gray, “Image classification by
reviewers for their insightful and thorough review a two-dimensional hidden Markov model,” IEEE
works, which helped enhancing the content and qual- Trans. Signal Proc. 48 (2), 517–533 (2000).
ity of the paper.
17. W. Liu, “On-line graphics recognition: state-of-art,”
Proc. GREC 2003 Lecture Notes Comput. Sci. 4958,
291–304 (2004).
REFERENCES
18. J. Lladós and G. Sánchez, “Graph matching versus
1. M. Boraie and A. Balghonaim, “Optical recognition of graph parsing in graphics recognition: A combined
electrical circuit drawings,” in Proc. IEEE Pacific Rim approach,” Int. J. Pattern Recogn. Artificial Intellig. 18
Conf. on Networking the Pacific Rim (1997), Vol. 2, (3), 455–473 (2004).
pp. 843–846. 19. H. Luo, G. Agam, and I. Dinstein, “Directional math-
2. K. Broelemann, A. Dutta, X. Jiang, and J. Lladós, ematical morphology approach for line thinning and
“Hierarchical graph representation for symbol spotting extraction of character strings from maps and line

PATTERN RECOGNITION AND IMAGE ANALYSIS Vol. 27 No. 2 2017


HIERARCHICAL VECTORIZATION 325

drawings,” in Proc. 3rd Int. Conf. on Document Analysis Paramita De received her B.Sc. and
and Recognition, ICDAR’95 (1995), pp. 257–260. M.Sc. degree in Computer Science
20. M. M. Mano, Digital Logic and Computer Design (Pren- from Vidyasagar University, Midna-
tice Hall PTR, 1979). pur, India, in 2003 and 2005, respec-
tively. She received her M. Tech. de-
21. H. Murase and T. Wakahara, “Online hand-sketched gree in Computer Science and Engi-
figure recognition,” Pattern Recogn. 19 (2), 147–160 neering from West Bengal
(1986). University of Technology, Kolkata,
22. A. Okazaki, S. Tsunekawa, T. Kondo, K. Mori, and India in 2008 and Ph.D. from Indian
E. Kawamoto, “An automatic circuit diagram reader Institute of Engineering Science and
with loop-structure-based symbol recognition,” IEEE Technology, Shibpur, Howrah, In-
Trans. Pattern Anal. Mach. Intell. 10 (3), 331–341 dia in 2015. Presently, she is an As-
(1988). sistant Professor at Techno India
23. N. Otsu, “A threshold selection method from gray-level University, Salt lake, Kolkata, India. Her current research
histogram,” IEEE Trans. Syst. Man, Cybern. 9 (1), 62– area is digital image processing and pattern recognition. She
66 (1979). has authored several research papers published in interna-
tional journals and conferences.
24. Y. J. Park and Y. B. Kwon, “An effective vector
extraction method on architectural imaging using
drawing characteristics,” in Proc. 4th Int. Workshop on
Graphics Recognition Algorithms and Applications,
GREC’01 (Springer-Verlag, London, 2002), pp. 299– Sekhar Mandal did his B.Tech. and
309. M.Tech. from University of Calcut-
25. T. D. Pham, “Unconstrained logo detection in docu- ta, India, and his PhD from [Bengal
ment images,” Pattern Recogn. 36 (12), 3023–3025 Engineering and Science University,
(2003). Shibpur, Howrah, India. He is cur-
26. D. Pucknell and K. Eshraghian, Basic VLSI Design, 3rd rently an Associate Professor in
ed. (PHI, 2000). Computer Science and Technology
Department of Bengal Engineering
27. M. Rashid, Spice for Circuits and Electronics Using and Science University, Shibpur,
PSPICE, 2nd ed. (Prentice Hall of India, 2000). Howrah, India. His research interest
28. B. Sandhya, A. Agarwal, C. R. Rao, and R. Wankar, mainly lies in digital image process-
“Automatic gap identification towards efficient contour ing and pattern recognition. So far
line reconstruction in topographic maps,” in Proc. 3rd he has published 36 research papers
Asia Int. Conf. on Modelling & Simulation, AMS’09 in international journals, edited volumes, and refereed con-
(Bandung, 2009), pp. 309–314. ference proceedings.
29. J. Song, F. Su, C. L. Tai, and S. Cai, “An object-ori-
ented progressive-simplification-based vectorization
system for engineering drawings: Model, algorithm,
and performance,” IEEE Trans. Pattern Anal. Mach. Part ha Bhowmick graduated from the
Intellig. 24 (8), 1048–1060 (2002). Indian Institute of Technology,
30. S. Tabbone, “Indexing of technical line drawing based Kharagpur, India, and received Ihis
on f-signature,” in Proc. 6th Int. Conf. on Document Masters and PhD degrees from the In-
Analysis and Recognition, ICDAR’01 (Seattle, 2001), dian Statistical Institute, Kolkata, In-
pp. 1220–1224. dia. Currently he is Associate Profes-
31. J. Valois, M. Côté, and M. Cheriet, “Online recogni- sor in Computer Science and Engi-
tion of sketched electrical diagrams,” in Proc. 6th Int. neering Department, Indian Institute
Conf. on Document Analysis and Recognition, ICDAR’01 of Technology, Kharagpur, India. His
(Seattle, 2001), pp. 460–464. research focus primarily is digital ge-
ometry, with applications to combina-
32. Y. Wang, I. T. Phillips, and R. M. Haralick, “Docu- torial image analysis and computer
ment zone content classification and its performance graphics. He has coauthored over 100
evaluation,” Pattern Recogn. 39 (1), 57–73 (2006). research papers in these areas, which have been published in
33. Y. Yu, A. Samal, and S. C. Seth, “A system for recogni- peer-reviewed international journals, edited volumes, and in-
tion a large class of engineering drawing,” IEEE Trans. ternational conference proceedings. He has also coauthored
Pattern Anal. Mach. Intellig. 19 (8), 868–890 (1997). one book in digital geometry, and he holds 4 US patents.

PATTERN RECOGNITION AND IMAGE ANALYSIS Vol. 27 No. 2 2017

You might also like