Prefix Circuit
Prefix Circuit
net/publication/3948467
Conference Paper in Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks, I-SPAN · February 2002
DOI: 10.1109/ISPAN.2002.1004267 · Source: IEEE Xplore
CITATIONS READS
4 64
2 authors, including:
Yen-Chun Lin
Chang Jung Christian University
68 PUBLICATIONS 395 CITATIONS
SEE PROFILE
All content following this page was uploaded by Yen-Chun Lin on 29 March 2015.
Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN’02)
1087-4089/02 $17.00 © 2002 IEEE
example, Fig. 1 shows that s (S ) = d (S ) = n – 1; Prefix circuit Q (n) with fan-out at most 4 has
thus, S is optimal. been presented in [14].
A third important parameter is fan-out. The fan-out
of a node is its outdegree. A node having a smaller Property 1 [14]: The depth of Q(n) is
fan-out is faster and smaller in VLSI implementation lg n if n ≤ 7,
[18]. A node has unbounded fan-out if the fan-out is t–1 t–2
2t – 4 if t ≥ 4 and 2 ≤ n < 3 × 2 ,
not fixed and is a function of n; otherwise, the fan-out t–2 t
of the node is a constant, or is bounded. The fan-out of 2t – 3 if t ≥ 4 and 3 × 2 ≤ n < 2 .
prefix circuit D , fo(D ), is the maximum fan-out of W(n) is a prefix circuit defined with the following
all nodes in D . For example, fo(S) = 2. A circuit operation nodes [11]:
has unbounded fan-out if one of its nodes has G i = {(i + 1, i + 2)}, i = 1, 2,…, n – 2;
unbounded fan-out. A circuit should have a small Gn –1 = {(1, i) | i = 2, 3,…, n}.
bounded fan-out for it to be of practical use.
For ease of presentation, let i:j represent the result For example, W (3) and W (4) are shown in Fig. 2.
of computing x i ⊗ x i+1 ⊗ ... ⊗ x j , where i ≤ j. We By definition, d ( W ) = n – 1, s ( W ) = 2n – 3,
ia(W) = n – 2, and l(W) = n – 1 .
use ia(D ) = a to denote that line 1 of prefix circuit
D has a duplication node at level a and has no W(3) W(4) 1 2 3 4 5 6
duplication nodes at any level less than a . In 1 2 3 1 2 3 4
addition, we use l(D ) = b to denote that line n of
1
D obtains 1:n at level b. 1 1
Many previous parallel prefix circuits are briefly 2
2 2
reviewed in [11]. Some of them with smaller depth are
in the following. L Y D is an optimal prefix circuit 3 3
with unbounded fan-out, whose depth is between Fig. 2. W(3), W(4), and W(3) • W(4).
2 l g n – 6 and 2 lg n – 3 [9]. M is also an
optimal prefix circuit with unbounded fan-out, whose 2.2. Composition of prefix circuits, size
depth is between 2 lg n – 5 and 2 lg n – 3 [12]. optimality, and other properties
H4 is an optimal prefix circuit with fan-out 4, whose
depth is between 2 lg n – 5 and 2 lg n – 3 [11]. Assume that A (n 1 ) and B (n 2 ) are two prefix
Z4 is an optimal prefix circuit with fan-out 4, whose
depth is between 2 lg n – 6 and 2 lg n – 3 [6]. circuits with n 1 and n 2 inputs, respectively. A (n 1 )
In this paper, we introduce a new optimal parallel and B(n2 ) can be composed into a prefix circuit with
prefix circuit, denoted as W E 4 , with fan-out 4. In n1 + n 2 – 1 inputs by merging the operation node that
many cases of n, it has the smallest depth among all
produces 1:n 1 on line n 1 of A (n 1 ) with the first, if
known prefix circuits. Section 2 briefly reviews
previous results. Sections 3 and 4 present prefix not unique, duplication node on line 1 of B(n 2 ); the
circuits WL and WV, respectively; these circuits will resulting circuit is denoted by A (n 1 ) • B (n 2 ) [16].
be used in Section 5 to construct W E 4 . Section 6 For example, Fig. 2 also gives W(3) • W(4). Three or
compares W E 4 with the optimal parallel prefix more prefix circuits can also be composed into a single
circuits mentioned in the previous paragraph in some prefix circuit, and the composition operation is
detail. Section 7 concludes this paper. associative.
Definition 2 [11]: For any prefix circuit A with n
2. Brief review of previous results
inputs, if i a (A ) = a , l(A ) = b , and s (A ) = 2n – 2
+ a – b , then A is size optimal; we say that A is
A prefix circuit D can be defined with sets of
SOPC(n, a, b).
operation nodes at level i, i = 1, 2,... , d(D) [16]:
G i = {(x , y ) | at level i on line y there Property 3 [11]: Q(n) is SOPC(n, 0, lg n).
is an operation node whose left input Property 4 [11]: W (n ) is S O P C (n , n – 2, n –
is the output of a node on line x at 1) with fan-out n.
level i – 1}. Property 5 [11]: If A is S O P C (n , 0, b ) and
If (x, y) ∈ G i , the corresponding operation node d(A) = b, then A is optimal.
can be denoted as (x, y)i. Property 6 [11]: If A ( n 1 ) and B ( n 2 ) are
SOPC(n1 , a , b ) and S O P C ( n 2 , c , d ) ,
2.1. Prefix circuits Q and W respectively, where b ≥ c , then A (n 1 ) • B (n 2 ) is
Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN’02)
1087-4089/02 $17.00 © 2002 IEEE
1 2 3 6 9 0 1 12 13 4 15 6 17 8 9 20 1 22 3 24 25
SOPC(n1 + n 2 – 1, a , b – c + d ) with depth
max{d(A(n1 )), d(B(n2 )) + b – c}.
Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN’02)
1087-4089/02 $17.00 © 2002 IEEE
G6 = {(1, 2), (1, 3), (4, 5), (4, 6), (7, 8), (7, 9)}. by Definition 2, WV(6) is SOPC(19, 5, 6).
Fig. 5 shows that d(W V (5)) = 6, fo(W V (5)) = 4, The above method of obtaining WV(6) from WV(5)
i a (W V (5)) = 4, and l(W V (5)) = 5. In addition, can be generalized to derive W V (t + 1) from W V (t),
s(W V (5)) = 17 = 2 × 10 – 2 + 4 – 5. Therefore, by t ≥ 5.
Definition 2, WV(5) is SOPC(10, 4, 5). Algorithm B: Let N be the set of nodes at levels
t through 2t – 4 of W V (t), where t ≥ 5. We can
1 2 3 4 5 6 7 8 9 10 obtain WV(t + 1) by the following procedures.
t–5
1. Move down the operation node (1, 9 × 2 + 1)t of
1 W V (t) by 1 level to become (1, 9 × 2
t– 5
+ 1)t+ 1 ,
2 and move down all the other operation nodes in N
3 by 2 levels, resulting in VA(t).
2. Move down all the operation nodes in N by 2
4 levels to obtain VB(t).
t– 5 t– 4
5 3. Delete the node (9 × 2 + 1, 9 × 2 + 1)t+ 2 of
6 the composed prefix circuit V A (t) • V B (t), and
t–5 t– 4
Fig. 5. WV(5). add operation nodes (9 × 2 + 1, 9 × 2 + 1)t
t–4
and (1, 9 × 2 + 1)t+1 , resulting in WV(t + 1).
We can move down the operation node (1, 10)5 of t– 5
Theorem 8: WV(t) is SOPC(9 × 2 + 1, t – 1,
WV(5) by 1 level to be (1, 10)6 , and move the other
t) with depth 2t – 4 and fan-out 3, for t ≥ 6.
operation nodes at level 5 and level 6 downward by 2 Proof: This theorem can be proved by induction on
levels, resulting in a new prefix circuit VA(5). Note t. The proof is omitted because of space limitation.
that d (V A (5)) = 8, f o (V A (5)) = 3, and V A (5) is
S O P C (10, 5, 6). We can also obtain V B (5) by 5. Depth-size optimal prefix circuit WE4
moving down all the nodes at levels 5 and 6 of WV(5)
by 2 levels. Clearly, d(V B (5)) = 8, fo(V B (5)) = 4, For ease of presentation, let W G (t) = W L (t) •
and VB(5) is SOPC(10, 6, 7). W L ( t – 1) • ... • W L (5) • W (4), for t ≥ 5, and
By Property 6, VA(5) • VB(5) is SOPC(19, 5, 7). WG(4) = W(4).
We can delete the operation node (10, 19)7 of VA(5) • t– 2
Lemma 9: W G (t) is SOPC(3 × 2 – 8, t , 2t –
VB(5), and then add operation nodes (10, 19)5 and
3) with depth 2t – 3, for t ≥ 5.
(1, 19)6 , obtaining WV(6) as shown in Fig. 6. Clearly, t– 3
Proof: By Theorem 7, WL(t) is SOPC(3 × 2 +
d (W V (6)) = 8, fo(W V (6)) = 3, and ia(W V (6)) = 5. 1, t, t + 1) with depth 2t – 3. By Property 6,
Since only operation nodes on line 19 are modified and t– 3 t– 4
W V (6) obtains 1:19 at level 6, clearly W V (6) is a W L (t) • W L (t – 1) is S O P C (3 × 2 + 3 × 2 +
prefix circuit and l(WV(6)) = 6. 1, t, t + 2) with depth 2t – 3. Using Property 6
repeatedly, we can obtain that W L (t) • W L (t – 1)
t–3 t– 4 2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 •...• WL(5) is SOPC(3 × 2 +3× 2 +…+ 3 × 2
t– 2
1 + 1, t, 2t – 4), i.e., S O P C (3 × 2 – 11, t, 2t – 4),
2
with depth 2t – 3. Since W (4) is SOPC(4, 2, 3) with
depth 3, by Property 6, W G (t) = W L (t) • W L (t – 1)
3 t– 2
4 •...• W L (5) • W (4) is SOPC(3 × 2 – 8, t, 2t – 3)
with depth 2t – 3. Q.E.D.
5
Assume that n ≥ 21 and t = lg n, we define n-
6
input prefix circuit WE4(n) as follows:
7 1. When 21 ≤ n ≤ 28 or 63 ≤ n ≤ 64,
8 t–5
WE4(n) = Q(n – 21 × 2 + 9) • WV(t) •
Fig. 6. WV(6). WG(t – 1).
2. When 29 ≤ n ≤ 32;
Since VA(5) • VB(5) is S O P C (19, 5, 7), s(VA(5) t–5 t–2
37 × 2 – 9 ≤ n ≤ 5 × 2 – 8, t ≥ 6; or
• VB(5)) = 2 × 19 – 2 + 5 – 7 = 34, Thus, s(WV(6)) = t–2
s (V A (5) • V B (5)) – 1 + 2 = 2 × 19 – 2 + 5 – 6. n = 7 × 2 – 9, t ≥ 5,
t–2
Furthermore, since ia(WV(6)) = 5 and l(WV(6)) = 6, WE4(n) = Q(n – 3 × 2 + 8) • S(2) • WG(t).
Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN’02)
1087-4089/02 $17.00 © 2002 IEEE
3. When 48 ≤ n ≤ 62; Case 8: When 7 × 2
t– 3 t
– 8 ≤ n ≤ 2 , where t ≥ 7,
t–3 t–3
5×2 –7≤n≤7×2 – 10, t ≥ 6; let m = n – 3 × 2
t–3
+ 9.
t–3 t
7 × 2 – 8 ≤ n ≤ 2 , t ≥ 7; or Case 9: When 2
t–1
<n ≤9 × 2
t– 4
– 10, where t ≥
t t–3
2 < n ≤ 9 × 2 – 10, t ≥ 7, 8, let m = n – 3 × 2
t–4
+ 9.
t–3
WE4(n) = Q(n – 3 × 2 + 9) • WG(t – 1). Case 10: When 9 × 2
t– 4
– 9 ≤ n ≤ 37 × 2
t– 6
– 10,
t–4 t–6
4. When 9 × 2 – 9 ≤ n ≤ 37 × 2 – 10, t ≥ 8, where t ≥ 8, let m = n – 21 × 2
t–6
+ 9.
t–5
WE4(n) = Q(n – 21 × 2 + 9) • WV(t – 1) By Case 2, when 29 ≤ n ≤ 32, we have d(W E 4 )
• WG(t – 2). = 2 lg n – 3. By Cases 1, 3, 4, 7, and 8, when 21 ≤
As an example, Fig. 7 shows WE4(27). t– 3 t
n ≤ 28, 47 ≤ n ≤ 64, or 7 × 2 – 9 ≤ n ≤ 2 , where
3 5 6 9 0 1 12 3 4 15 16 7 18 9 20 21 2 23 4 5 26 7 t ≥ 7, we have d(W E 4 ) = 2 lg n – 4. By Case 6,
1
when t = 6, 33 ≤ n ≤ 46, we have d ( W E 4 ) =
t–6
2 2 lg n – 5. By Cases 5 and 6, when 37 × 2 – 9 ≤
3 t– 3
4 n ≤7× 2 – 10, where t ≥ 7, we have d (W E 4 ) =
t– 1
5
6
2 lg n – 5. Finally, by Cases 9 and 10, when 2
t– 6
Fig. 7. WE4(27) = Q(15) • WV(5) • W(4). < n ≤ 37 × 2 – 10, where t ≥ 8, we have d(W E 4 )
= 2 lg n – 6.
Theorem 10: WE4(n) is an optimal prefix circuit As already mentioned, the fan-out of Q is at most
with fan-out 4 and depth 4 and the fan-out of WV(5) is 4. By Theorem 8, the
2 lg n – 3 when 29 ≤ n ≤ 32; fan-out of W V (t) is 3, t ≥ 6. By Property 4 and
2 lg n – 4 when 21 ≤ n ≤ 28, 47 ≤ n ≤ 64, or Theorem 7, the fan-out of W (4) and WL is 4, Hence,
t–3 t WE4(n) has a fan-out of 4. Q.E.D.
7 × 2 – 9 ≤ n ≤ 2 , where t ≥ 7;
t–6
2 lg n – 5 when 33 ≤ n ≤ 46 or 37 × 2 – 9 6. Comparisons of optimal prefix circuits
t–3
≤n≤7×2 – 10, where t ≥ 7;
t–1 t–6 For easy and exact comparisons, Table 1 gives the
2 lg n – 6 when 2 < n ≤ 37 × 2 – 10,
numbers of inputs that can be processed by
where t ≥ 8.
representative optimal parallel prefix circuits with
Proof: The proof is through distinguishing 10 cases.
specific depths. From Table 1, for n ≥ 21, d(WE4) ≤
Because the processes for the cases each are very
d(H4); that is, WE4 is better than H4.
similar, we will give details for the first case only and
Also from Table 1, when n = 29, 65 ≤ n ≤ 67, 139
give tips for the others.
≤ n ≤ 145, 287 ≤ n ≤ 303, 583 ≤ n ≤ 621, 1175 ≤ n
Case 1: When 21 ≤ n ≤ 28, let m = n – 12; thus 9
≤ 1259, 2359 ≤ n ≤ 2537, or 4727 ≤ n ≤ 5095, we get
≤ m ≤ 16 and lg m = 4. By Property 1,
d(Z 4 ) = d(W E 4 ) – 1; however, when 45 ≤ n ≤ 4 6 ,
we have 4 ≤ d(Q (m )) ≤ 6. By Property 3,
99 ≤ n ≤ 102, 209 ≤ n ≤ 214, 431 ≤ n ≤ 438, 877 ≤
Q ( m ) is S O P C ( m , 0, 4). As already
n ≤ 886, 1771 ≤ n ≤ 1782, or 3561 ≤ n ≤ 3574, we
mentioned, W V (5) is SOPC(10, 4, 5) with
have d (W E 4 ) = d (Z 4 ) – 1; otherwise, d (W E 4 ) =
depth 6. Thus, by Property 6, Q (m ) •
d(Z4).
W V (5) is S O P C (m + 9, 0, 5). Since W ( 4 )
We now compare WE4 with optimal prefix circuits
is SOPC(4, 2, 3), by Property 6, WE4(n) =
with unbounded fan-out. It can be checked that
Q (m ) • W V (5) • W (4) is S O P C (m + 12,
d ( W E 4 ) ≤ d ( M ) except when n = 29. As for
0, 6) with depth 6 = 2 lg n – 4. Finally,
L Y D , when 21 ≤ n ≤ 77, d(L Y D ) ≤ d(W E 4 ); when
by Property 5, WE4(n) is optimal.
Case 2: When 29 ≤ n ≤ 32, let m = n – 16. 78 ≤ n ≤ 446, d (W E 4 ) ≤ d (L Y D ); when n ≥ 447,
Case 3: When 48 ≤ n ≤ 62, let m = n – 15. d (W E 4 ) < d (L Y D ). That is, although both M and
Case 4: When 63 ≤ n ≤ 64, let m = n – 33. LYD have unbounded fan-out, only when n is small,
t– 6 t– 3 d(M ) and d(LYD) have some chances to be smaller
Case 5: When 37 × 2 – 9 ≤ n ≤5 × 2 – 8,
t–3 than d (W E 4 ) by 1. Specifically, when n > 29, M
where t ≥ 7, let m = n – 3 × 2 + 8. can be replaced by WE4; when n > 77, LYD can be
t– 3 t– 3
Case 6: When 5 × 2 – 7 ≤ n ≤7 × 2 – 10, replaced by WE4.
t–3
where t ≥ 6, let m = n – 3 × 2 + 9.
t– 3
Case 7: When n = 7 × 2 – 9, where t ≥ 6, let m
t–3
=n–3×2 + 8.
Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN’02)
1087-4089/02 $17.00 © 2002 IEEE
Table 1. The numbers of inputs that some References
optimal parallel prefix circuits
with specific depths can [1] S.G. Akl, Parallel Computation: Models and Methods ,
process. Prentice-Hall, Upper Saddle River, NJ, 1997.
[2] A. Bilgory and D.D. Gajski, "A heuristic for suffix
Depth WE4 Z4 H4 solutions," IEEE Trans. Comput. , vol. C-35, Jan. 1986,
pp. 34-42.
6 21–28 19–29 19–26 [3] G.E. Blelloch, "Scans as primitive operations," IEEE
Trans. Comput. , vol. 38, Nov. 1989, pp. 1526-1538.
7 29–46 30–44 27–41 [4] R.P. Brent and H.T. Kung, "A regular layout for parallel
8 47–64 45–67 42–57 adders," IEEE Trans. Comput., vol. C-31, Mar. 1982,
pp. 260-264.
9 65–102 68–98 58–87 [5] D.A. Carlson and B. Sugla, "Limited width parallel
prefix circuits," J. Supercomput. , vol. 4, June 1990, pp.
10 103–138 99–145 88–119 107-129.
[6] J.-N. Chen, Constructing Depth-Size Opimal Parallel
11 139–214 146–208 120–179 Prefix Circuit Z4 , Master Thesis, Dept. of Electronic
Engineering, National Taiwan University of Science
12 215–286 209–303 180–243 and Technology, 2001.
13 287–438 304–430 244–363 [7] C.P. Kruskal, L. Rudolph, and M. Snir, "The power of
parallel prefix," IEEE Trans. Comput. , vol. C-34, Oct.
14 439–582 431–621 364–491 1985, pp. 965-968.
[8] R.E. Ladner and M.J. Fischer, "Parallel prefix
15 583–886 622–876 492–731 computation," J. ACM , vol. 27, Oct. 1980, pp. 831-
838.
16 887–1174 877–1259 732–987 [9] S. Lakshmivarahan and S.K. Dhall, P a r a l l e l
Computing Using the Prefix Problem, Oxford
17 1175–1782 1260–1770 988–1467 University Press, Oxford, UK,1994.
[10] Y.-C. Lin, "Optimal parallel prefix circuits with fan-out
18 1783–2358 1771–2537 1468–1979 2 and corresponding parallel algorithms," Neural,
19 2359–3574 2538–3560 1980–2939 Parallel & Scientific Computations , vol. 7, Mar. 1999,
pp. 33-42.
20 3575–4726 3561–5095 2940–3963 [11] Y.-C. Lin, Y.-H. Hsu, and C.-K. Liu, "Depth-size
optimal parallel prefix circuits with fan-out 4 and small
depth," in Proc. Int. Conf. on Parallel and Distributed
Computing, Applications, and Technologies , 2001, pp.
7. Conclusion 74-82.
[12] Y.-C. Lin and C.-K. Liu, "Constructing optimal parallel
In this paper, we have presented Algorithm A and prefix circuits," in Proc. National Computer Symp.,
Algorithm B for constructing size optimal prefix 1999, pp. C-313-320.
circuits W L and W V ; together with Q and W , [13] Y.-C. Lin and C.-K. Liu, "Finding optimal parallel
prefix circuits with fan-out 2 in constant time," Inform.
theses circuits can be used to compose depth-size Process. Lett. , vol. 70, May 1999, pp. 191-195.
optimal parallel prefix circuit W E 4 . With our new [14] Y.-C. Lin and C.-C. Shih, "Optimal parallel prefix
approach, tedious and lengthy proofs are not required. circuits with fan-out at most 4," in Proc. 2nd IASTED
W E 4 can replace all the other optimal parallel Int. Conf. on Parallel and Distributed Computing and
prefix circuits with fan-out 4 except for Z4. We have Networks , 1998, pp. 312-317.
seen that d(W E 4 ) may be greater than d(Z4) by 1, [15] Y.-C. Lin and C.-C. Shih, "A new class of depth-size
d(W E 4 ) may be less than d(Z 4 ) by 1, and in most optimal parallel prefix circuits," J. Supercomput., vol.
cases of n, d(WE4) = d(Z4). 14, July 1999, pp. 39-52.
[16] M. Snir, "Depth-size trade-offs for parallel prefix
computation," J. Algorithms, vol. 7, 1986, pp. 185-201.
Acknowledgment [17] H. Wang, A. Nicolau, and K.S. Siu, "The strict time
lower bound and optimal schedules for parallel prefix
This research was supported in part by the National with resource constraints," IEEE Trans. Comput., vol.
Science Council of the R.O.C. under contract NCS 45, Nov. 1996, pp. 1257-1271.
89-2213-E-011-099. [18] N.H.E. Weste and K. Eshraghian, Principles of CMOS
VLSI Design: A System Perspective , 2nd ed.,
Addison-Wesley, Reading, MA, 1993.
Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN’02)
1087-4089/02 $17.00 © 2002 IEEE
View publication stats