0% found this document useful (0 votes)

118 views13 pages

Eager Aggregation and Lazy Aggregation

Group-by push down and group-by pull up are new query optimizations. They allow a query optimizer to move group-by operations up and down the query tree. Eager Aggregation partially pushes a groupby past a join. Laty aggregates combines two groupby operations into one.

Uploaded by

sameer250

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

118 views13 pages

Eager Aggregation and Lazy Aggregation

Uploaded by

sameer250

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 13

Eager Aggregation and Lazy Aggregation

Weipeng P. Yan Per-Bike Larson

Department of Computer Science,
University of Waterloo, Waterloo, Ontario, Canada N2L 3Gl
{pwyan,palarson}@bluebox.uwaterloo.ca

We proposed a new query optimization technique,

group-by push down and group-by pull up, which inter-
Abstract changes the order of group-by and joinspL94, YL95].
Groupby push down is to push groupby past a join.
Efficient processing of aggregation queries is Its major benefit is that the group-by may reduce the
essential for decision support applications. number of input rows of the join. Group-by pull up is
This paper describes a class of query trans- to delay the processing of groupby until after a join.
formations, called eager aggregation and laty Its major benefit is that the join may reduce the num-
aggregation, that allows a query optimizer to ber of input rows to the group-by, if the join is selec-
move group-by operations up and down the tive. Figure 1 shows the idea of commuting group-by
query tree. Eager aggregation partially pushes and join. In Figure l(a), we join Table Tl(Gl,Jl,Sl)
a groupby past a join. After a group-by is and T2(G2,J2) on join columns Jl and 52 then group
partially pushed down, we still need to per- the result on grouping columns Gl and G2, followed
form the original groupby in the upper query by aggregation on Sl. Figure 1(b) shows an alter-
block. Eager aggregation reduces the number native way where group-by is performed before join.
of input rows to the join and thus may result Note that group-by and join commutation cannot al-
in a better overall plan. The reverse trans- ways be done. The necessary and sufficient condition
formation, lazy aggregation, pulls a group-by is provided in [YL94, YL95].
above a join and combines two group-by op- SUM(S1) +
erations into one. This transformation is typ-
ically of interest when an aggregation query
@$jg Jl=J&
references a grouped view (a view containing
a groupby). Experimental results show that
t sum(s1,
ASss/ z
the technique is very beneficial for queries in
Jl=JZa T T2
the TPC-D benchmark.
WJ2)
/‘\
1 Introduction Tl T2 Tl
(Gl,Jl,Sl)
(Gl,Jl,Sl) (G2,52)
Aggregation is widely used in decision support sys-
tems. All queries in the TPC-D[Raa95] benchmark (a) Group-by Pull up (b) Group-by Push down
contain aggregation. Efficient processing of aggrega-
tion ,queries is essential for performance in decision Figure 1: Group-by and Join Commutation
support applications and large scale applications.
The technique to only partially push down a group-
Permission to copy without fee all or part of this material is by past a join can be extended. For some queries
granted provided that the copies arc not made or distributed for containing joins and groupby, we can perform group
direct commercial advantage, the VLDB copyright notice and by on some of the tables, then the join, and finally
the title of the publication and its date appear, and notice is another group-by. The first groupby, which we call
given that copying is by permission of the Very Large Data Base
Endowment. To copy otherwise, 01 to republish, requires a fee eager group-by, reduces the number of input rows to
and/or special pcmzission from the Endowment. the join and thus may result in a better plan. We
Proceedings of the 21st VLDB Conference call the groups generated by the early groupby par-
Zurich, Swizerland, 1995 tial groups because they will be merged by the second

345
group-by. When the amount of data reduction does
not justify the cost of eager group-by, we should prob- SUMISUM’
R
ably delay group-by until after the join, which we term ‘@&
lazy group-by. Both directions of the transformation
SUM& i
should be considered in query optimization. We call
the technique of performing aggregation before join
t J1=J2G9
eager aggregation, and delaying aggregation until after SUM(S1) AS SW/ \
join lazy aggregation.
Figure 2(a) and (b) show the basic idea of ea-
“‘=a m (ZJ2)
ger/lazy group-by. Eager group-by performs eager ag-
Tl
/‘\ T2
aI ’
gregation on all tables containing aggregation columns. (Gl,Jl.Sl)
Tl
(62952)
Lazy group-by is its reverse transformation. (Gl,Jl,Sl)

The following examples illustrate the basic idea of (a) Lazy grwW (b) Eager group-by
eager group-by and lazy group-by. The examples are
based on a subset of the TPC-D database[Raa95]. The
tables are defined in Appendix A.
Example 1 : Find the total loss of revenue on OT-
ders handled by each clerk due to parts being returned
by customers. Output clerk and loss of revenue.
SELECT O-CLERK,
SDM(L-EXTENDEDPRICE
* (I-L-DISCOUNT))
FROM LINEITEM, ORDERS Tl T2
WHERE O-ORDhKEY = L-ORDERKEY T2
AND LJETURNFLAG= 'R'
(Gl,Jl,Sl) (WJ2) mJJ2)
GROUPBY O-CLERK (c) Lazy Count (d) Eager Count
A
Each order is handled by one clerk so we can first
find the loss of revenue for each order. We then join
the aggregated view with table ORDERSto find the total
loss for each clerk. suM~s’& 51352 a
SELECT O-CLERK, SUM(REVENDE)
FROM (SELECT L-ORDERKEY.SUM(L-EXTENDEDPRICE
t f’\
‘XT
*(l-L-DISCOUNT)) AS REVENUE J,=>&k-& sAs
FROM LINEITEM
WHERE L-RETURNFLAG = 'R' t t
GROUPBY L-ORDERKEY)AS LOSS, ORDERS Tl l2 Tl T2
WHERE O-ORDBRKEY = L-ORDERKEY (Gl,Jl,Sl) WC2) (Gl,Jl,Sl) K-352)
GROUPBY O-CLERK (e) Double Lazy (t) Double Eager
The eager (inner) group-by reduces the number of
input rows to the join. If the LINEITEM table is clus-
tered on L-ORDERKEY,the eager group-by can be done
at almost no additional cost. Experiment on DB2
JlJ2 &
V2 Beta3 confirms that eager group-by reduces the
elapsed time by 16%. The following example shows I \

that lazy group-by can be beneficial.

Example 2 : Find the total loss of revenue on OT-
ders from May 1995 handled by each clerk due to parts
being returned by customers. Output clerk and loss of Tl T.2 Tl T2
revenue. (Gl,Jl,Sl) (G2J23) (Gl,Jl,Sl) W&W
SELECT O-CLERK, SUM(REVENUE)
FROM ORDERS,LOSS-BY-ORDER (9) Lazy Split (h) Eager Split
WHERE O-ORDERKEY = L,ORDERKEY Lazy Aggregation Eager Aggregation
O-ORDERDATE BETWEEN"1996-05-01" _______-______-- -----------------
AND
AND "1995-05-31"
GROUPBY O-CLERK Figure 2: Eager and Lazy Aggregation

346
where LOSS-BY-ORDERis an aggregated view defined by six queries. For example, it reduces the elapsed time
of Query 5 by a factor of ten.
CREATEVIEW LOSS-BY-ORDER(L-ORDERKEY,REVENUE)
(SELECT L-ORDERKEY,
~~~~(L-E~TENDEDPRI~E
* (~-L-DISCOUNT)) 1.1 Organization of This Paper
FROM LINEITEM The rest of the paper is organized as follows. Section 2
WHERE LJETURNFLAG= 'R' reviews aggregation functions in SQL2 and introduces
GROUPBY L-ORDERKEY); the concepts of decomposable aggregation functions,
We can merge the view with the query and rewrite the and class C and class D aggregation functions. Sec-
query as tion 3 defines the class of queries that we consider and
introduces notations. Section 4 presents the formalism
SELECT O-CLERK, that our results are based on. Section 5 introduces and
SUM(L-EXTENDEDPRICE*(l-L-DISCOUNT)) proves our main theorem. Sections 6, 7, 8 and 9 in-
FROM LINEITEM. ORDERS troduce corollaries for eager/lazy group-by, eager/lazy
WHERB O-ORDERKEY = L,ORDERKEY
count, double eager/lazy and eager/lazy split trans-
AND LJETURNFLAG= 'R'
AND O-ORDERDATB BETWEEN "1995-05-01" formations. Section 10 proposes algorithms for find-
AND "1995-05-31" ing all possible eager/lazy transformations for a query
GROUPBY O-CLERK and discusses the way to integrate eager/lazy aggre-
gation and group-by push down/pull up into existing
The predicate on 0-ORDERDATEis highly selective. optimizers. In order to simplify the proofs we have not
In this case, we should delay the group-by until after considered HAVING in the theorem and corollaries. Sec-
the join. A nested loop join with LINEITEM as the inner tion 11 considers the case when the HAVING clause is
and ORDERSas the outer looks like a very promising present. Section 12 shows that eager/lazy aggregation
evaluation strategy. Experiment on DB2 V2 Beta3 and group-by push down/pull up is very beneficial for
confirms that lazy group-by reduces the elapsed time TPC-D official queries. Section 13 discusses related
by 60%. work. Section 14 concludes the paper.
These examples show that both directions (eager
group-by and lazy group-by) should be considered in 2 Aggregation Functions
query optimization. There may be several ways of per-
In SQLP, a value expression may include aggregation
forming eager group-by when there are more than two
functions. There are five aggregation functions: SUM,
tables in the FROMclause[Yan95].
AVGs MIN, MAXand COUNT.Consider the query
Figure 2 and 3 show the eager/lazy transformations
introduced in this paper. Eager count transformation SELECT2*SUN(Tl.Cl)/COUNT(DISTINCT T2.C2)*
performs eager aggregation on tables not containing MIN(Tl.C3*T2.C3)
aggregation columns, as shown in Figure 2 (d). It first FROMTl,T2
counts the number of rows in each group in the early We can rewrite this query as
aggregation, then performs the join, and finally ag-
SELECT2*NCl/NC2*NC3
gregates the original aggregation columns. Lasy count FROM(SELECTmM(cl) AS NCI,
transformation is its reverse transformation. COUNT(DISTINCTT2.C2) AS NC2,
Double eager performs eager count on tables not #IN(Tl.C3*T2.C3) AS NC3
containing aggregation columns and eager group-by on FROMTl,T2 ) TMP-VIEW;
the remaining tables which may or may not contain
Any query block that has an arbitrary value ex-
aggregation columns, as shown in Figure 2 (f). The
reverse transformation is double lazy. pression containing more than one aggregation func-
tions, can always be rewritten so that the new query
Eager groupby-count, as shown in Figure 3, per-
block is a SELECT on top of a view that contains value
forms eager aggregation on a subset of tables contain-
ing the aggregation columns. Its reverse transforma- expression having at most one aggregation function.
Therefore, without loss of generality, we assume that
tion is called lazy groupby-count.
our query contains no value expression that has more
Eager split, as shown in Figure 2 (h), performs eager
than one aggregation functions.
groupby-count on both input streams before the join,
when both input streams are involved in aggregation.
2.1 Decomposable Aggregation Functions
Its reverse transformation is called lazy split.
Our experiments show that we can apply group- All sets in this paper are multi sets. Let U, denote set
by push down/pull up and eager/lazy aggregation to union preserving duplicates, and ud denote set union
twelve of the seventeen queries in the TPC-D bench- eliminating duplicates. These operations exist in SQL2
mark. This significantly reduces the elapsed time for as UNION ALL and UNION, respectively.

347
Definition 1 : (Decomposable Aggregation FROMLINBITEH
Function) An aggregation function F is decompos- GROUPBY L-ORDERKEY)AS COUNT-BY-ORDER.
able if there exist aggregation functions F1 and F,?2 ORDERS
such that F(&Ua&) = F2(Fl(Sr),Fl(S2)), where WHERE 0sORDERKEY = L-ORDERKEY
S1 and & are two sets of values. We call S1 and Sz GROUPBY O-CLERK
partial groups.
COUNTEY-ORDERcounts the number of lineitems for
SUM(C) is decomposable since SUM(S~U,S~) = each orders, then joins with table ORDERSto find the
SUM(SUM(Sl), SUM(S2)); count required. We call this transformation eager
COUNT(C) is decomposable since COUNT(SlU,S2) = count, and its corresponding reverse transformation
lazy count. Note that, this time, we are performing
SUM(COUNT(Sl), COUNT(S2));
and MIN(C) is decomposable since MIN(Slu,S2) = eager aggregation on a table which contains no aggre-
gation columns. Eager count performs eager aggrega-
MIN(MIN(Sl), MIN(S2)); and AVG(C) can be han-
dled as SUM(C) and COUNT(NOTNULL C) and thus is tion on tables not containing any aggregation columns.
decomposable’. Experiment on DB2 V2 Beta3 shows that eager count
For aggregation functions like COUMT(DISTINCT for this query reduces the elapsed time by 40%.
Cl), it is not trivial to determine whether it is de- When performing eager count and the original ag-
composable. There may be two rows with the same gregation function is either SUM or COUNT, we need
Cl value in Sl and S2. These two rows would then to count the number of rows in each group produced
contribute 2 instead of 1 in the final count. However, by the inner group-by and multiply the count with
if we know in advance that column Ci cannot contain the result from the later group by. We call aggrega-
duplicate values, then COUMT(DISTINCT Cl) is decom- tion functions satisfying this property class C aggre-
posable. Note that, even though Cl has duplicate val- gation functions(C stands for COUNT), and the count
ues, there may be other conditions which ensure that obtained from the inner group-by duplication factor. If
rows with the same Cl value belong to the same par- the original aggregation function is SUM(DISTINCT),
tial groups(e.g., Cl is a grouping column). Therefore, COUNT(DISTINCT), MIN, MAX, or AVG, we can discard
an aggregation function may or may not be decompos- the count in the subquery block. In other words, we
able. Aggregation functions MIN and MAX are always
can use a DISTINCT in the subquery block. We call
decomposable; SUMand COUNTare decomposable when aggregation functions satisfying this property class D
aggregation functions(r) stands for DISTINCT). And we
they contain no DISTINCT. The issue of determining
whether an aggregation function is decomposable will call this transformation eager distinct, and its corre-
not be discussed further. From now on we will assume sponding reverse transformation lazy distinct. There-
fore, combining this with whether the function is de-
that we have the knowledge about whether an aggre-
composable or not, we can have four types of aggre-
gation function is decomposable.
gation functions. Class D aggregation functions are
2.2 Class C and Class D Aggregation hnc- insensitive to duplication factors.
tions
Find the total number of urgent
3 Class of Queries Considered
Example 3 : OT
high priority lineitems handled by each clerk. Any column occurring as an operand of an aggregation
function (COUNT, MIN, MAX, SUM, AVG) in the SELECT
SELECT O-CLERK,
SUM(CASEWHENO-ORDERFRIORITY='l-URGENT' clause is called an aggregation column. Any column
OR O-ORDERF'RIORITY='2-HIGH' occurring in the SELECTclause which is not an aggrega-
THEN 1 ELSE 0 END) tion column is called a selection column. Aggregation
FROM LINEITEM, ORDERS columns may belong to more than one tables. We par-
WHERE O-ORDERKEY = L-ORDERKEY tition the tables in the FROMclause into two groups:
GROUPBY O-CLERK those tables that contain aggregation columns and
those that may or may not contain any such columns.
It is equivalent to the following query. Technically, each group can be treated as a single ta-
SELECT O-CLERK, ble consisting of the Cartesian product of the member
SUM(CASEWHENO-ORDERPRIORITY='l-URGENT' tables. Therefore, without loss of generality, we can
OR O-ORDERPRIORITY='2-HIGH' assume that the FROMclause contains only two tables,
THEN 1 ELSE 0 END) * CNT & and &. Let Rd denote the table containing aggre-
FROM (SELECTL-ORDERKEY,COUNT(*) AS CNT gation columns and R, the table that may or may not
lspL2 does not support COUMT(IOT IULL Cl) operation, but contain any such columns.
it is fairly easy to implement in any existing systems. The search conditions in the NMEREclause can be

348
expressed as Cd A Co A C,,, where Cd, Co, and C,, are GA: : E Gdd U a(&) - R,, i.e., the columns of &
in COnjUnCtiVe normal form, Cd Only involves columns participating in the join and grouping;
in &, C,, only involves columns in R,, and each dis- GA:: E Gdd U a(Co) - &, i.e., the columns of R,
junctive component in Co involves columns from both participating in the join and grouping
& and R,. Note that subqueries are allowed. FAA: resulting columns of the application of function
The grouping columns mentioned in the GROUP BY array F on AA in the fhst group-by when eager
clause may contain columns from & and R,, de- group-by is performed on the above query.
noted by C& and GA,,, respectively. According
to SQL2[ISO92], the selection columns in the SELECT 4 Formalization
clause must be a subset of the grouping columns. We
denote the selection columns as SC& and SGA, , sub- In this section we define the formal “machinery” we
sets of GAd and GA,,, respectively. For the time being, need for the theorems and proofs to follow.
we assume that the query does not contain a HAVING SQL2[ISO92] represents missing information by a
clause(relaxed in Section 11). The columns of & par- special value NULL. It adopts a three-valued logic in
ticipating in the join and grouping is denoted by GA:, evaluating a conditional expression. We define func-
and the columns of R, participating in the join and tional dependencies using strict SQL2 semantics tak-
grouping is denoted by GA,+. ing into account the effect of NULLS in SQLP. When
In summary, we consider queries of the following NULLS do not occur in the the columns involved in
form: a functional dependency, our definition of functional
dependency is the same as the traditional functional
SELECT [ALL/DISTINCT] SGdd, SGA,, F(AA)
dependency. The detailed definitions are included in
FROl4 &is Ru
WHERE cd A c,, A c,
pL94]. Due to space limitation, they are not included
GROUP BY Gdci,GA, here. Let A and B be two sets of columns, A fimction-
ally determines B is denoted by A-B.
where
Gdd: grouping columns of table Rd; 4.1 An Algebra for Representing SQL Queries
GA,: grouping columns of table R,; GAd and GA, Specifying operations using standard SQL is tedious.
cannot both be empty.
As a shorthand notation, we define an algebra whose
S&id: selection columns, must be a subset of grouping basic operations are defined by simple SQL statements.
cohmns G.&i; Because all operations are defined in terms of SQL,
SGA,,: selection columns, must be a subset of grouping there is no need to prove the semantic equivalence be-
columns GA,; tween the algebra and the SQL statements. Note that
AA: aggregation columns of table & and possible transformation rules for “standard” relational algebra
table R,. When considering eager/lazy group- do not necessarily apply to this new algebra. The op
by, eager/lazy count and double eager/lazy, AA erations are defined as follows.
belong to Rd. When considering eager/lazy
groupby-count and eager/lazy split, AA belong l B[GA] R: Group table R on grouping columns
to & and R, and is denoted by the union of GA = {GAI, GAz, .... GA,}. This operation is
aggregation columns AA,, and A&, where AA, defined by the query 2 SELECT * FROM R ORDER
and AAd belong to & and R, respectively.
BY GA. The result of this operation is a grouped
cd: conjunctive predicates on columns of table &; table.
c,: conjunctive predicates on columns of table R,;
l RI x R2: The Cartesian product of table RI and
co: conjunctive predicates involving columns of both
R2.
tables Rd and R,, e.g., join predicates;
a(C0): columns involved in CO; l a[C]R: Select all rows of table R that satisfy con-
F: array of aggregation functions and/or arithmetic dition C. Duplicate rows are not eliminated. This
aggregation expressions applied on AA (may be operation is defined by the query SELECT * FROM
empty). When considering eager/lazy groupby- R WHERE C.
count and eager/lazy split, F is denoted by the
union of aggregation functions Fa and F,,, where l xd[B]R, where d = A or D: Project table R on
Fd and Fd are applied on AAd and AA, respec- columns B, without eliminating duplicates when
tively.
2Certainly, this query does more than GROUP BY by ordering
F(AA): application of aggregation functions and/or the resulting groups. However, this appears to be the only valid
,arithmetic aggregation expressions F on aggre- SQL query that can represent this operation. It is appropriate
gation columns AA; for our purpose as long as we keep the difference in mind.

349
d = A and with duplicate elimination when d = 2. F contains both class C and class D aggregation
D. This operation is defined by the query SELECT functions. In this case, we need to use a COUNT
CALL /DISTINCT] B FROMR. aggregation function in the SELECTlist of the sub-
query block. The aggregation value of a class C
l $Jr;l~;A”[“=“l yAUd$4, fz@A),.... fn(AA)), aggregation function f is the count multiplied by
1, 2, ....A.,}, and F =
the value resulting from applying f. Therefore,
{fl, f2, . . ..f”}. AA are aggregation columns of
we need to change F into F,, in which every
grouped table R and F are arithmetic aggrega-
class C aggregation function f of F is replaced
tion expressions operating on AA. We must em-
by f * count. For example, if F(C1, C2, C3) is
phasis the requirement that table R is grouped
(SUM(Cl>,COUNT(C2),MIN(C3)), then
by some grouping columns C. All rows of table
F,(Cl, C2, C3, count)
R must agree on the values of all columns except
is (SUM(C1) ,MAX(C2) ,MIN(C3))o(count, 1, I) =
AA columns. Each fi, where i = 1,2, . . . . n, is an
(SUM(Cl)*count,MAX(C2) ,MIN(C3)). The oper-
arithmetic expression(which can simply be an ag-
ator o is vector product. We call F, the duplicated
gregation function) applied to some columns in
aggregation functions of F. As a shorthand nota
AA of each group of R and yields one value. An
tion, we use F(C1, C2, . .. . C,,) * count to repre-
example of fi(AA) is COUNT(A1)+ sUM(A2+ As).
sent F,, while keeping in, mind that we only need
Duplicates in the overall result are not eliminated.
to multiply class C aggregation function by the
This operation is defined by the query SELECT
count. Note that we need an additional argument
GA,A, F(AA) FROMR GROUPBY GA,whereGAis
to F,e
the grouping columns of R, and A is a set of
none grouping columns that are functionally de- Note that, it is not necessary that the functions in
termined by GA and may be empty. Note that F be decomposable.
this is not a syntactically valid SQL2 statement
since the columns A in the SELECT clause are not SUM(SSl), SUM(S2)‘CNT t
mentioned in the GROUPBY clause. However, since WWP SY
GA+ A, from a query processing point of view, SUM(Sl), t SUM(S2) 9
this is semantically sound.
Therefore, the class of query we consider can be
expressed as
nd[SGA,j, SGA,, FAA] F[AA]rA[GAd, GA,, , AA]
G[GAd, G&]+‘d A co A Cu](& x &a)
where d = A or d = D, and FAA are the aggre- Tl T2
gation values after applying F[AA] on each group. (Gl,Jl,Sl) (G2,J2,S2) (Gl ;,Sl)
The last projection simply projects the rows on the
Lazy groupbycount Eager groupby-count
columns wanted, and may eliminate duplicates. If
all the columns wanted are the same as all existing Lazy aggregation Eager aggregation
columns, and the projection does not eliminate dupli-
cates, then we usually omit the last projection in the Figure 3: The Main Theorem
expression. Consider the query to the left of Figure 3. It aggre-
All sets in this paper are multisets which may con- gates columns from both input streams. In the query
tain duplicates. Td, T” denote instances of table & and on the right, we can first perform aggregation on one
R,; T[SJ is used as a shorthand for nA[qT, where S of the input stream. We need to not only find the sum
is a set of columns and T is a grouped or ungrouped of partial groups, but also keep track of the number
table, or a row. of rows in each partial group for the aggregation on
the table(T2) that are aggregated only after the join.
5 Main Theorem This is the basic idea of eager groupby-count.
In the following theorem, let (1) NGAd denote a set
When performing eager count, we need to consider two of columns in Rd; (2) CNT the column produced by
cases: COUNT(*) after grouping a[Cd]& on NG&; (3) FA&
1. F contains only class D aggregation functions. We the rest of the columns produced by Fd in the first
can simply add a DISTINCT to the SELECTlist of group-by of table U[Cd]Td on NG&; and (4) F,, the
the subquery block and no modification to the duplicated aggregation function of F,,. Also assume
original aggregation functions is needed. that (1) AA = AAd ud AA, where AAd contains only

350
columns in &, and AA,, contains only columns in R,; S,. Note that the above statements hold for all joins,
(2) F = Fd ud F, where Fd applies to AAd and F,, not just equijoins.
applies to AA,,. Since S,, depends on Gd, we denote the set of rows
joining with Gd as S, (Gd). The set resulting from
Theorem 1 (Eager/Lazy Groupby-Count(Main
the join of Gd and S, is Gd x S,(Gd), i.e., a Carte-
Theorem)): The expcpressions sian product. (Fdl[A&], COUNT~)Gd denotes the
El : F[A&, A&]n[G&, GA,, AAd, AA,] row resulting from applying Fdr and COUNT on AAd of
the group Gd.
G[G&, G&]@d A co A cu](& x R,) Let Gdi, Gds be two (partial groups) produced by
and B[NGA,+[Cd]rd. We have two cases to consider.
Case 1: Gdi[G&] = Gdz[G&] and S, (Gdl)[GA,] =
E2 : ?Td[G&, GA,, FAA] S,,(Gds)[GA,]. In Es, after the join, all rows in
(Fua[AAu, CNT], FddFA&])
((Fdl [A&], COUNTIJ)
flA [G& GA,,, AA,,, PAAd, CNll]
(“[NGAd, GA,+, A&]‘&) x St, (Gdl)
G[GAd, GAMCo, ‘%](((Fdl[-‘h], C0UNTl-j)
and
c#=‘b, GA,+, A&]~[NGAd]+‘d]Rd) x R,)
((Fdl[A&], COUNTU)
are equivalent if (1) aggregation functions Fd contain
only decomposable aggregation functions and can be de- TINGAd, GA,+, AA&h) x su (Gd2)
composed into Fdl and Fd2; (2) F,, contain class C OT are merged into the same group by the second group-
D aggregation functions and (3) NGAd+ GA: hold by(after the join).
in U[cd]&. In El, each row in Gdi and Gd2 joins with each row
in &(Gdl) and &(Gd2), respectively. Therefore, all
The main theorem is illustrated in Figure 3. The ag-
rows in Gdr x S, (Gdl) and Gds x S,(Gds) are merged
gregation columns are split into two sets, which belong
into the same group by the group-by. Since every ag-
to & and R,, tables respectively. For the transforma-
gregation function in Fd can be decomposed as Fdi
tion from El to E2 (eager aggregation), we push down
and Fds, the aggregation values in the row produced
the & tables and perform eager aggregation on AAd
and obtain the count before the join. After the join, we by
then perform aggregation on FA& and AA,. There- Fd[A-‘ti]~A[G&, GA,, A&]
fore, we basically split the aggregation into two parts, ((Gdl X &(Gdl))Ua(Gd2 X &(&a)))
one is pre-evaluated before the join and one is evalu-
ated after the join. We call the transformation from in El are equal to the aggregation values produced by
El to E2 eager groupby-count and its reverse transfor- Fd2 [FA&]m [G-b, G-L, FA&]
mation lazy gmupby-count.
The requirement NG&+ GA: is not a neces- (((Fdl[AA&bdNG&, GA,+, A&]‘%) x s&h))
sary conditions. If NG& ti G& in some instance Ua((Fdl[A&]m[NG&, GA,+, A&]Gda) x su(Gd2)))
of a[Cd]&, then the first group-by of E2 may group in E2*
rows together when they do not belong to the same Since every aggregation function in F,, is either class
group in El. However, incorrectly assigned rows may C or D, the aggregation values in the row produced by
be eliminated by the join and we may still get the cor-
rect result. If NG& does not functionally determine E&%&r&&, GA,, AAul((Gdl x & (Gdl))
the join columns of table &, the join in E2 is un- UaGdz( Xsu (Gdz)))
defined since a group may contain different values on
in El are equal to the aggregation values produced by
the join columns. To obtain necessary and sufficient
condition, we need to extend the meaning of F[AA], J’u,[AA,, CNT]ci[G&, GA,, AA,, CNT]
which is beyond the scope of this paper.
(((COUNTbA[N’=d, GAd+]Gdl) x & (‘%l))Uo
Proof:
Consider a group Gd in g[NG&]u[cd]Td for some
((COUNTbA[NG&, GA,+]Gdz) x St,(h)))
instance Td of Rd. Since NG&+ GA:, all rows in in E2.
G,j have the same G& value and have the same value Case 2: Gdl [G&l # G-&G&] 01 & (Gdl)[G&] #
for the join columns of Rd. Therefore, if one row of S,, (Gd2) [GA,]. In Ez? the rows in
G,j qualifies in the join of b[Cd A Co A C&l (Td X T,),
all rows of Gd qualify. If one row of Gd joins with a ((Fdl[AAd], COUWI)
set of rows S, from ~[C,,]T,,, all rows of Gd join with n[NG&, GA,+, A&]&l) x St, (Gdl)

351
and and
((Fdl[AAd], CouNTO) E2 : F2 [FAA&A [G&, GA,, FAA&(G& GA,]
4NG&, GA:, AA&da) x &(&a) xA[G&, GA,, FAAd]@o A Ct,]
are not merged into the same group by the second ( (S [A&A [NG& GA,f 7 AAl
group-by(after the join). In Er, each row in Gdr and ~[NG&]~[Cd]Rd) X Ru)
Gds joins with each row in & (Gdi) and S, (Gdz), re-
spectively. However, the rows in Gdr x S,, (Gdr) and are equivalent if NG&-+ GA: holds in b[cd]Rd and
Gdz x S,,(Gdz) are not merged into the same group by all aggregation functions in F[AA] are decomposable
the group-by. Since F is decomposable, the aggrega- and can be decomposed into FI and F2.
tion values in Eager group-by transformation introduces a new
group-by, and lazy group-by transformation eliminates
Fd [A&]nA [G&, GA,, A&]
a group-by.
(Gdl X su (‘&I) The proof of the corollary is straightforward. Since
in Er are equal to the aggregation values in AA,, is empty, Fua[AAU, CNTJ is empty. Deleting all
terms relating to AA,, in E2 of the Main Theorem gives
Fd@‘&]u[GAd, GA”, F&z] Ez of the corollary.
((h[A&]m[NG&, GA,+, AA&al) X sty (‘&l))
7 Eager/Lazy Count and Eager/Lazy
in Ez.
Also, e aggregation values in the row produced by Distinct
In the Main Theorem, if we let GA, contain all the
u,CNT]%t[Gfti,
+L.l
Fu[A GA,,AAl aggregation columns, that is, all aggregation columns
(GdlX$Gdl)) belong to R, tables, then we obtain the following corol-
in El are equal ‘to the aggregation values produced by lary. In the following corollary, NGAd denotes a set
of grouping columns belonging to &, and CNT the
column produced by COUNT(*)after grouping o[c,j]&
on NGAa.

in E2. Cl “1 Corollary 2 (Eager Count/Lazy Count): The

The Main Theorem assumes that the final se- expressions
lection columns are the same as the grouping
EI : F[A~]~&‘&, GA,, AA]B[GAa, GA,]
columns(GAd, GA,) and the final projection must
be an ALL projection. We&an actually relaxes +d A co A cu] (Rd X R,)

these two restrictions, i.e., the fin\ 1 selection columns and

may be a subset(SGAd, SGA,)\of the grouping
columns(GAd, GA,), and the final projection may be E2 : F,,[AA, CNT]nA[GAd, GA,, AA, CNT]
a DISTINCT projection. This is also true for all other g[GAd, GA&A [GAd, GA,, AA, CNT]
corollaries in this paper. For a formal description of
fl[Co, CU]((COUNT@A[NG&, G&+]
the transformation and proof, please refer to [yan95].
s[NGAd]a[Cd]Rd) X &)
6 Eager Group-by and Lazy Group-by are equivalent if F are class C or class D aggregation
In the Main Theorem, if we let GAd contain all the fin&ions and NGAd+ GA: hold in a[Cd]Rd.
aggregation columns, that is, all aggregation columns In E2 above, COUNT(1 after the inner group-by in E2
belong to & tables, then we obtain the following corol- means that we add a COUNT(*)to the select list of the
lary. subquery block.
In the following corollary, let NGAd denote a set of The proof of the corollary is straightforward. Since
columns in table Rd. and FAAd the columns produced AAd is empty, Fd, Fdr and Fdz are all empty. Re-
by applying F[AA] after grouping table Rd on NG&. moving all terms relating to AAd in Ez of the Main
Corollary 1 (Eager Group-by and Lazy Group- Theorem gives E2 of the corollary.
by) : The expcpressions We call the transformation from El to E2 eager
count and from E2 to El lazy count.
El : F[AA]nA[G&, GA,, AA]B[GAd, GA,] Clearly, when F in the theorem contains only class
fl[Cd A co A cti](Rd X &) D aggregation functions, we can simply use a DISTINCT

352
in the subquery block. We then call the transformation Corollary 4
from El to Ez eager distinct and from Ea to El lazy (Double Group-by Push-Down/Double Group-
distinct. Note that in this case, F, is the same as F. by Pull-Up): A ssume that the conditions in Co~ol-
lary 3 holds. If, in addition, (1) GA;---, NGAd holds
in u[Cd]Rd, (2) GA,++ NGA, holds in u[C,,] R,,
8 Double Eager and Double Lazy
and (3) (GA,, Gdd) functionally determines the join
Now we are ready to tackle the double eager and dou- columns in r[Cd A Co A CU] (Rd x R,), then the es
ble lazy transformations. Consider the query in Fig- pressions
ure 2(e). It aggregates the columns belonging to one
El : F[A+A[G&, GA,, AA]B[GAd, GA,]
input stream (Tl). In the query in Figure 2(f), we per-
form eager group-by on the stream (Tl) containing c[cd A co A Cu](Rd X Ru)

aggregation columns and eager count on the stream and

(T2) not containing any aggregation columns. We call
the transformation double eager. Double eager can be E2 : ?rA[GAd, GA,, FAA * CNT@[GAd, GA,,]
understood as an eager group-by followed by an eager u[COI((COUNTO~~A[NGA,, GA,+]B[NGA,]
count transformation. The reverse transformation is 4WL) x (F[A+A[NG&, G&+, AA]
called double lazy.
i?[NG&]u[Cd]&))
In the following corollary, NGA, denotes a set of
columns in IS,,, NGAd a set of grouping columns be- are equivalent.
longing to & tables, FAA the columns produced by
This Corollary eliminates the groupby at the top
Fl in the first group-by of table u[Cd]& on NG&,
query block. This can be viewed as a more general
and CNT the column produced by COUNT(*) after
case of groupby push down, which pushs down group-
grouping g[CU]R, on NGA,. Also assume that AA
by into two lower query blocks. We call the transfor-
belongs to Rd.
mation double group-by push down. Its reverse trans-
formation, which pull up group-by’s from two lower
Corollary 3 (Double Eager/Double Lazy): the
query blocks, is called double group-by pull up. Please
expressions
refer to pan951 for the proof.
A simple way to ensure that the conditions of the
El : F[AAITA[GA~, GA,, AA]B[GAd, GA,]
corollary hold is to use GA: and GA: as NGAd and
a[Cd A co A cu](Rd X Ru) NGA, . Then, if (GA,, G&) functionally determines
the join columns, we can apply the Corollary.
and I Similarly, it is also possible to eliminate the group
by at the top query block after eager count, eager
Ez : F,[Fz[FAA], CNT]rA[GAd, GA,,, FAA, CNT] groupby-count and eager split to obtain the push down
C?[GAd, GA&[Co]((COUNTO versions for these transformations, and analogly, the
flA[NGAu, GAu+]G[NGAu]&L]Ru) pull up versions for the lazy aggregations. Due to
space limitation, we cannot provide the conditions
x (FI[AA]TA[NG& G&+, AA]G[NGAd]
here. Please refer to pan951 for detailed conditions
@dRd)) and proofs. Note that, the push down/pull up version
of eager/lazy groupby is group-by push down/pull up.
are equivalent if (1) NGA,+ GA,, holds in
+Z’,,]R,, (2) NGAd+ GAd holds in a[&]&, (3)
all aggregation functions in F are decomposable and
9 Eager Split and Lazy Split
can be decomposed as Fl and F2, (4) all aggregation If we apply eager groupby-count twice to Rd and R,
functions in F are class C OT D and its duplicated ag- respectively, we can perform eager aggregation on both
gregation function is F,. tables before the join. We call this transformation ea-
ger split since the aggregation is computed separately
The proof of this corollary is straightforward. It before the join. We call the reverse transformation
can be done by first performing an eager/lazy group- lazy split. Both transformations are illustrated in Fig-
by and then an eager/lazy count. ure 2(g) and (h).
Again, when F in the corollary contains only class D In the following corollary, (1) NG& and NGA, de-
aggregation functions, we can simply use a DISTINCT notes a set of columns in Rd and R,, respectively; (2)
in the subquery block of R,. Note that in this case, F, CNT, the column produced by COUNT(*) after group-
is the same as F. The following corollary shows when ing u[Cd]& on NGAd; (3) CNTz the column pro-
the group-by at the top query block may be eliminated. duced by COUNT(*) after grouping u[C,]R, on NGA,;

353
(4) FAAd the columns produced by Fd in the first ag- columns(NC&). According to Corollary 1, we can
gregation of table 6[Cd]Rd on NGAd; (5) FAA, the add more Rd columns to NGAd without changing the
columns produced by F,, in the first aggregation of result of the query. Normally we want to choose a
table o[C,]R, on NGA,; and (6) Fd,, and F,,, the new set of grouping columns only if the new set has
duplicated aggregation function of Fd and F,, respec- some ordering properties that save sorting time. For
tively. Also assume that (1) AA = AAd ud AA,, where example, if the ordering property on a new column
AAd contains only columns in &, and AA,, contains is supported by a clustering index, then after the new
only columns in R,; (2) F = Fd ud F, where Fd ap- column is added into NC&, it can be used as the ma-
plies to AAd and F,, applies to AA,,. jor of the sorting columns(assuming sorting is used for
GROUPBY). The subsequent sort may be faster since
Corollary 5 (Eager Split and Lazy Split:) The the minor columns are sorted in a smaller range, plus
expressions the advantage of sequential fetching of data rows. In
this case, even if one of the GA$ columns has an index,
El : F[A&, ~A,]~A[G& GA,, AAd, AA,]
since the index is not clustered, it may be more expen-
g[GAd, G&]&‘d A co A Cu](Rd x Ru) sive to perform the grouping using GA: as the group-
and ing columns than using the clustering index column
and GA: as the grouping columns. Therefore, we want
E2 : %[G&, GA,, FAA] to consider possible beneficial addition of columns to
(L#‘uz[FA&], CNrr,], Fda[Fdz[FA&],CNTz])GA,+as eager grouping columns. We call such columns
promising columns. Since adding new columns is of-
nA[GA,j, GA,, FAA,, FAAd, CNTI, CNTz]
ten not beneficial, a good heuristic might be not to
8[GAd, G&l+‘o, cu]((((Fdl[AAd], COUNT[) add grouping columns beyond GA$.
r.dN’=d, GA:, AA&[NG&]u[‘%]Rd) When performing eager aggregation, our objective
x ((Ful[AAwl, COUNTU) is to achieve data reduction before the join, so we want
each partial group to contain as many rows as possible.
AA[NG&, GA:, AAMNGA&[G]~u))
Therefore, if NC& contains a unique key of a[Cd]&,
are eqUiVaknt if (I) aggregation function3 Fd contain we should immediately abandon using this set for eager
only decomposable aggregation functions that can be group-by.
decomposed into Fdl and Fd2; (2) aggregation fwac-
tions F,, contain only decomposable aggregation junc- 10.1.2 Table Partitioning
tions that can be decomposed into Ful and F,,z; (3) When the query contains more than two tables, there
F, and Fd contain class C OT D aggregation func- may be several ways of performing eager aggregation.
tions; (4) NG&+ GA,+ holds in 6[Cd]Rd; (5) The question is how to partition the tables in the FROM
NGA,+ GA: holds in u[C&] R, . clause into Rd tables and R, tables. Section 10.3 dis-
The proof of this corollary is also straightforward. It cusses the way to partition tables to obtain all pos-
can be done by first performing an eager/lazy groupby- sible transformations. We assume that table parti-
tioning has been done before calling the algorithm in
count on Rd, and then an eager/lazy groupby-count on
Section 10.1.3.
&a.
10.1.3 The Algorithm
10 Algorithms and Implementation
Assuming table partitioning is done, we have the fol-
10.1 Algorithm for Eager Aggregation
lowing algorithm for finding valid eager aggregation.
In this section, we present a practical algorithm for In this algorithm, we choose not to add new columns
recognizing all valid eager transformations for a given to either NC& or NGA,. In the following algorithm,
query. We assume that & tables contain aggregation & tables must contain aggregation COhnUS.
columns and R, tables do not. That is, all queries
belong to the class of queries specified in Section 3. Algorithm 1 Eager Aggregation
Inputs: input query, &, R,, AA
10.1.1 Finding the Eager Grouping Columns Output: all possible rewritten queries
for Eager Aggregation
1 NG& := GA,+ and NGA, := GA:
Given two sets of tables, Rd and R,, with Rd ta 2 eagerd = false, eageru = false
bles containing aggregation columns and R, tables 3 if NC& is not a unique key of U[Cd]Rd
not, let’s first consider eager group-by. We can 4 eUgeTd = true
start with NC& using GA: as the eager grouping 5 end if

354
6 if NGA, is not a unique key of (r[C,,]R, an optimizer, we can first perform group-by pull up
7 eager, = true and lazy aggregation to obtain a canonical form in
8 end if which all group-bys are delayed as late as possible.
9 if eager, and eageTd Then, during dynamic programming process, when-
10 if no aggregation cohmns in R, ever a table access plan or join plan is constructed, we
11 Apply double eager on & end R, can consider adding a group-by on top of the plan. All
12 Output the reuritten query
tables in the query are then partitioned into two sets,
13 else
the set containing all tables in the current join plan,
14 Apply eager split on both .& and R,
15 Output the rearitten query and the set containing the remaining tables. We can
16 Apply eager groupby-count on Rd then apply Algorithm Eager Aggregation to find all
17 Output the reuritten query possible eager aggregations. There can be several pos-
18 end if sible ways for adding an aggregation on top of a plan.
19 else if eageTd and not eager, The optimizer may want to choose the cheapest way
20 if no aggregation columns in R, for each plan to reduce optimization cost. Then, for
21 Apply eager group-by on & each original join plan, there is at most one additional
22 else plan that performs a group-by at the top. On the
23 Apply eager groupby-count on Rd other hand, when considering join plan for two input
24 end if streams, the optimizer can consider the alternatives of
25 Output the reuritten query taking the streams with or without aggregation. If the
26 else if not eagerd and eager, optimizer employs an exhaust search and considers all
27 if no aggregation cohmns in R,
possible join plans in the dynamic programming pro-
28 Apply eager count on &,
cess(e.g., Starburst), all possible transformations can
29 else
be found in this process. This approach is also suitable
30 Apply eager groupby-count on &
31 end if
for dynamic programming process that generates only
32 Output the rearitten query left deep trees or right deep trees. However, it might
33 else overlook some possible rewrites.
34 Output “No transformation”
35 end if 11 Queries Including HAVING
END Algorithm 1
A query with a HAVING clause can always be trans-
10.2 Algorithm for Lazy Aggregation formed into one without. This technique is well known
and is used in existing database systems. For exam-
Now consider lazy aggregation. Whenever a query ple, the Starburst optimizer always transforms a query
matches the form in any one of our theorems and sat- with a HAVING into one without at the beginning of
isfies their conditions, we can perform a lazy aggre- the query rewrite phase[PHH92]. After the HAVING is
gation to eliminate one GROUPBY(or DISTINCT), and eliminated, we can perform eager aggregation trans-
delay grouping until after the join. Lazy aggregation is formation on the view created.
especially useful when the join is highly selective. The Now consider lazy aggregation. When a HAVING is
algorithm to find all valid lazy aggregation transforma- eliminated in a subquery block with an aggregation
tion for a given query is to iterate through each avail- (either groupby or distinct), and the HAVING clause
able transformation and output the rewritten forms. contains no aggregations, then the predicate in the
Please refer to [Yan95] for a detailed description for HAVING clause can be moved to the WHEREclause and
the algorithm. we can then try to apply one of our lazy aggregation
theorems. If the HAVING clause contains aggregations,
10.3 Implementation we usually give up performing lazy aggregations be
We need to find a way to efficiently integrate ea- cause the HAVING predicates have to be evaluated be
ger/lazy aggregation and group-by push down/pull fore the join. However, it is possible to perform lazy
up into existing optimizers. The standard technique aggregation when the HAVING clause of a query con-
for determinating join order in a cost-based opti- tains aggregation.
mizer is dynamic programming in a bottom up (e.g., We formally proved our theorem for the conditions
Starburst[LohM]) fashion. During the dynamic proof groupby push down transformation for queries con-
gramming process, plans for table accesses, two-table taining a HAVING clause in [yL95]. The process to
joins, three-table joins and joins involving more tables prove conditions of eager aggregation for queries with
are constructed and kept until the final query plan is a HAVING clause is completely analog to our previous
obtained. To integrate the transformations into such effort. Due to space limitation we shall not present the

355
conditions and proof here.
Table 2: Ratio Between Best And Worst Elapsed Time
12 TPC-D Queries For All TPC-D Queries That Can Be Transformed
We can apply group-by push down/pull up and ea- 1 Q ) # of I Worst 1 Best 1 W/B t
ger/lazy aggregation to twelve of the seventeen queries rewrites Formulation Formulation Ratio
in the TPC-D benchmark and significantly reduce the 3 3 PD on L/O Oligbl 3.93
elapsed time of six queries on DB2 V2 Beta 3, as shown 5 7 EG on EG on 43.71
in Table 1. For example, it improves the elapsed time w/w L/O/C
7 7 Origilld EGonL 2.58
of Query 5 by a factor of ten. Table 2 shows the ratio
8 14 EG on L/S EG on 501.02
between best and worst elapsed time for all TPC-D
official queries that can be transformed3. The perfor-
mance difference between a badly formed query and a
better formed query can be very significant. Partic-
ularly, in applications where queries are generated by
tools or inexperienced users, automatic transformation I I 1 for both 1 I I
of queries is indeed very important. aggregations
In both Table 1 and 2, each table is represented by 12 2 EConL Otigilld 1.02
the first letter of its name, except that table PART- 13 2 EGonL Olighl 16.59
SUPP is represented by PS. Also, we use PD, PU, EG, 14 2 Origin&l EGonL 1.07
EC and DC to represent group-by push down, group 15 2 PU Ori& 2.51
by pull up, eager group-by, eager count and query de-
correlation transformations respectively.
original query. The access plan must maintain a count
Table 1: TPC-D Queries With Reduced Elapsed Time of the number of duplicates being removed. Then, af-
(Compared With Original Formulation) ter or during the join, the access plan must restore the
duplicates. Chaudhuri and Shim[CS95] also general-
ized group-by pull up to handle the case when the join
is a many to many join.

14 Conclusion
Group-by push down and group-by pull up interchange
the order of join and group-by. The number of group
by’s is unchanged. Eager aggregation introduces an
additional group-by before a join, and lazy aggregation
eliminates a group-by before a join. Groupby push
13 Related Work
down and eager aggregation reduces the number of
We proposed the idea of eager aggregation and lazy rows participating in a join, groupby pull up and lazy
aggregation in [Yan94]. Chaudhuri and Shim[CS94] aggregation reduces the number of input rows to the
also independently discovered eager group-by and ea- groupby. Both directions of transformation should be
ger count. Their simple coalescing grouping and gen- considered during query optimization.
eralized coalescing grouping correspond to our eager We classify eager aggregation into five different
group-by and eager count transformation, respectively. types: eager group-by, eager count, double eager, ea-
They also proposed an algorithm to integrate group- ger groupby-count and eager split. Eager groupby
by push down, eager groupby and eager count into a partially pushs down a groupby on the tables that
greedy join enumeration algorithm which produces left contain all aggregation columns; eager count partially
deep trees in a cost based optimizer. However, they pushs down a groupby on the tables that do not con-
did not discuss lazy aggregation transformation in the tain any aggregation columns; double eager partially
paper. pushs down a groupby on both types of tables; eager
Gupta, Harinarayan and Quass[GHQ95] general- groupby-count partially pushs a groupby into a sub-
ized group-by push down in another fashion. They set of tables containing the aggregation columns; eager
showed that it is possible to perform early duplicate re- split splits a group-by into two group-bys and partially
moval before a join when there is no aggregation in the pushs the groupbys down the two input streams of the
3The ratio marked as ‘infinity’ means that the query with the join. As a special case of double eager, we can com-
worst formulation ran out of system space and did not finish. pletely push down group-by into two input streams,

356
CUSTOMERS(C_) PARTSUPP(PSJ SUPPLIERS ORDERS(O-) LINEITEM (L_)
15K 1K 1K 160K 6ooK

tTAX

Legend:
l Scale factor 1
l The highlighted column names in each table form its primary key
l The number below a table name shows the number of rows of the table.
1 COMMENT

Figure 4: Subset of the TPC-D Database

which is call double groupby push down. Similarly, References
we classify lazy aggregation into lazy groupby, lazy [CS94] S. Chaudhuri and K. Shim. Including group-by
count, double lazy, lazy groupby-count and lazy split, in query optimization. In Proc. VLDB Conf.,
which perform the reverse transformations of their ea- pages 354-366, Santiago, Chile, Sep. 1994.
ger counterparts. We also provide practical algorithms [CS95] S. Chaudhuri and K. Shim. Optimizing com-
for identifying all possible transformations. The algo- plex queries: A unifying approach. Tech. Re-
rithms do not restrict the join order of the query. port HPL-DTD-95-20, HP Lab, Mar. 1995.
Future work includes: (1) finding the conditions of [GHQ95] A. Gupta, V. Harinarayan, and D. Quass.
lazy transformation for subqueries containing an ag- Aggregate-query processing in data warehous-
gregating HAVING clause; (2) finding necessary and suf- ing environments. In Proc. VLDB Conf., 1995.
ficient conditions for all the transformations in the pa [IS0921 1.30. Information Technology - Database lan-
per and (3) finding eager/lazy aggregation transforma- guages - SQL. Reference number ISO/IEC
tions involving other binary relational operations (e.g., 9075:1992(E), Nov. 1992.
UNION, INTERSECT, EXCEPT and OUTER JOIN). [Loh88] G. M. Lohman. Grammar-like functional rules
for representing query optimization altern*
tives. In Proc. ACM SIGMOD Conf., pages 18-
Acknowledgements 27, Chicago, Illinois, June 1988.
[PHH92] H. Pirahesh, J. M. Heiierstein, and W. Hasan.
We thank the referees for their many useful comments. Extensible/rule based query rewrite optimiza-
We also would like to thank Guy M. Lohman for sug- tion in STARBURST. In Proc. SIGMOD Conf.,
gesting the word eager and K. Bernhard Schiefer for pages 39-48, San Diego, California, June 1992.
his help with the TPC-D experiments. We also want [Raa95] F. Raab, editor. TPC Benchmark D (Deci-
to extend our appreciation to Surajit Chaudhuri and sion Support), Working Draft 9.1. Transaction
Kyuseok Shim for their valuable comments. Processing Performance Council, San Jose CA,
95112-6311, USA, February 1995.
[yan94] w. P. Y an. Query optimization techniques for
aggregation queries. Research Proposal, Univer-
A The TPC-D Database
sity of Waterloo, April 1994.
TPC-D is a decision support benchmark proposed by pan951 W. P. Yan. Rewrite optimization of SQL queries
the the Transaction Processing Performance Coun- containing GROUP-BY. PhD thesis, Depart-
ment of Comp. Sci., University of Waterloo,
cil(TPC). It is a suite of business oriented queries to
Sep. 1995.
be executed against a database that allows continuous
access as well as concurrent updates[Raa95]. The size w 941 W. P. Yan and Per-Ake Larson. Performing
group-by before join. In Proc. IEEE ICDE,
of the database is scalable adjusted by a scale factor.
pages 89-100, Houston, Texas, Feb. 1994.
The scale factor for a 1OOMB database is 0.1. The
size of the database we used through out this paper IF951 W. P. Yan and Per-&ce Larson. Interchanging
the order of grouping and join. Technical Report
is 100MB. Figure 4 shows the subset of the TPC-D
CS 95-09, University of Waterloo, Feb. 1995.
database we used in this paper.

357

Including Group-By in Query Optimization VLDB
No ratings yet
Including Group-By in Query Optimization VLDB
13 pages
Subquery
No ratings yet
Subquery
11 pages
Session 9 XP
No ratings yet
Session 9 XP
89 pages
Step 5: Except For Calls To GROUPING, Each Aggregate Function Call in The
No ratings yet
Step 5: Except For Calls To GROUPING, Each Aggregate Function Call in The
1 page
Grouping and Summarizing Data
No ratings yet
Grouping and Summarizing Data
34 pages
SAP HANA Query Optimization Guide
No ratings yet
SAP HANA Query Optimization Guide
17 pages
Group by Clause
No ratings yet
Group by Clause
3 pages
db2 Olap
No ratings yet
db2 Olap
20 pages
Week-10b Rollup Cube Grouping
No ratings yet
Week-10b Rollup Cube Grouping
26 pages
3CS520IC24 - ADF Minor Lect-12 Accessing Data With DataReaders, Disconnected Operations With Data Tables and Data Sets, Connection Pool (1) 11111
No ratings yet
3CS520IC24 - ADF Minor Lect-12 Accessing Data With DataReaders, Disconnected Operations With Data Tables and Data Sets, Connection Pool (1) 11111
28 pages
Lab: INNER JOIN, GROUP BY, and HAVING Clauses
No ratings yet
Lab: INNER JOIN, GROUP BY, and HAVING Clauses
33 pages
Lab 6
No ratings yet
Lab 6
31 pages
Relational Algebra Ii: CS121: Relational Databases Fall 2018 - Lecture 3
No ratings yet
Relational Algebra Ii: CS121: Relational Databases Fall 2018 - Lecture 3
43 pages
Advanced SQL GROUP BY Enhancements
No ratings yet
Advanced SQL GROUP BY Enhancements
22 pages
SQL Techniques
No ratings yet
SQL Techniques
37 pages
Assignment 2
No ratings yet
Assignment 2
10 pages
Handling One-to-Many Relations - Grouping and Aggregation: No Class/lab Next Tuesday
No ratings yet
Handling One-to-Many Relations - Grouping and Aggregation: No Class/lab Next Tuesday
12 pages
'Jensen': Sum Sum Sum
No ratings yet
'Jensen': Sum Sum Sum
6 pages
Bai 3
No ratings yet
Bai 3
24 pages
LECTURE - 9 Aggregation and Grouping
No ratings yet
LECTURE - 9 Aggregation and Grouping
39 pages
Postgres Optimization Guide
No ratings yet
Postgres Optimization Guide
31 pages
Class 12 PYTHON Ch-14 Grouping Records, Joins in SQL (Sumita Arora Latest Edition)
No ratings yet
Class 12 PYTHON Ch-14 Grouping Records, Joins in SQL (Sumita Arora Latest Edition)
9 pages
Lecture 6 SQL Part 2
No ratings yet
Lecture 6 SQL Part 2
56 pages
SQL Aggregation and Joins Guide
No ratings yet
SQL Aggregation and Joins Guide
41 pages
Advanced Concepts in SQL
No ratings yet
Advanced Concepts in SQL
5 pages
Relational Algebra and Data Aggregation
No ratings yet
Relational Algebra and Data Aggregation
43 pages
SYBCA DBMS Exam Solutions
No ratings yet
SYBCA DBMS Exam Solutions
18 pages
SQL Decorrelation and Window Functions - in Data Engineering
No ratings yet
SQL Decorrelation and Window Functions - in Data Engineering
75 pages
Advanced D.base 4
No ratings yet
Advanced D.base 4
20 pages
SQL and Relational Algebra Concepts
No ratings yet
SQL and Relational Algebra Concepts
46 pages
Week 5
No ratings yet
Week 5
11 pages
Sorting and Grouping Data
No ratings yet
Sorting and Grouping Data
60 pages
CubeRollup Slides PDF
No ratings yet
CubeRollup Slides PDF
27 pages
Files & Databases: Relational Dynamics and Triggers Lab Report
No ratings yet
Files & Databases: Relational Dynamics and Triggers Lab Report
20 pages
Aggregate Functions
No ratings yet
Aggregate Functions
11 pages
DB Chapter3
No ratings yet
DB Chapter3
56 pages
Database Nest Quiz
No ratings yet
Database Nest Quiz
22 pages
DBMS Lab 3 Tasks
No ratings yet
DBMS Lab 3 Tasks
2 pages
Lab Manual 4
No ratings yet
Lab Manual 4
17 pages
Working With A Single Grouping Set: Lesson 1: Writing Grouped Queries
No ratings yet
Working With A Single Grouping Set: Lesson 1: Writing Grouped Queries
2 pages
Oracle SQL Tips for Developers
No ratings yet
Oracle SQL Tips for Developers
37 pages
Unit 5
No ratings yet
Unit 5
18 pages
GROUP BY Clause in SQL - 1keydata
No ratings yet
GROUP BY Clause in SQL - 1keydata
2 pages
Aggregate Function
No ratings yet
Aggregate Function
2 pages
Relational Algebra
No ratings yet
Relational Algebra
11 pages
How Are Analytic Functions Different From Group or Aggregate Functions?
No ratings yet
How Are Analytic Functions Different From Group or Aggregate Functions?
4 pages
Chapter 4. Intermediate SQL: Objectives
No ratings yet
Chapter 4. Intermediate SQL: Objectives
25 pages
MySQL - Common Queries
100% (1)
MySQL - Common Queries
92 pages
Duplicate Records ETL Doctor
No ratings yet
Duplicate Records ETL Doctor
20 pages
Crack Your Data Engineering SQL Round
100% (1)
Crack Your Data Engineering SQL Round
112 pages
Advanced SQL Concepts
No ratings yet
Advanced SQL Concepts
38 pages
Week11 Relational Algebra & SQL - Aggregation and Grouping Operation
No ratings yet
Week11 Relational Algebra & SQL - Aggregation and Grouping Operation
23 pages
SQL Lectures Ptit
No ratings yet
SQL Lectures Ptit
3 pages
Basic Concepts of Query Processing
No ratings yet
Basic Concepts of Query Processing
10 pages
Examples of Using The SELECT Statement
No ratings yet
Examples of Using The SELECT Statement
13 pages
Dbms Record4
No ratings yet
Dbms Record4
6 pages
10.2 - Chapter 6 - Full Relations Operations
No ratings yet
10.2 - Chapter 6 - Full Relations Operations
22 pages
Vmethod Forensic Toxicology SCRN
No ratings yet
Vmethod Forensic Toxicology SCRN
4 pages
ThinkPad X1 Carbon 5th Gen Datasheet en
No ratings yet
ThinkPad X1 Carbon 5th Gen Datasheet en
2 pages
Chrysler STD Metric Washers
No ratings yet
Chrysler STD Metric Washers
45 pages
Q2KDocv2 07
100% (1)
Q2KDocv2 07
102 pages
Importance of Studying English
100% (2)
Importance of Studying English
2 pages
Shankar UPSC-Mains-New-syllabus PDF
50% (2)
Shankar UPSC-Mains-New-syllabus PDF
4 pages
She Is A Born Leader (Autosaved)
No ratings yet
She Is A Born Leader (Autosaved)
54 pages
Relationship Strategies: American Media Incorporated Presents
No ratings yet
Relationship Strategies: American Media Incorporated Presents
63 pages
Microwave Moisture Meters 94T & 94L
No ratings yet
Microwave Moisture Meters 94T & 94L
4 pages
Life Path 1: Embrace Independence
No ratings yet
Life Path 1: Embrace Independence
51 pages
Any Rail 4 en
No ratings yet
Any Rail 4 en
102 pages
How To Defend Your Thesis Statement
100% (3)
How To Defend Your Thesis Statement
4 pages
2024-2025 KBC Self Sponsor Application Form
No ratings yet
2024-2025 KBC Self Sponsor Application Form
3 pages
Marksheet R220820007263 1
No ratings yet
Marksheet R220820007263 1
1 page
Cutting and Engraving Parameters Lightburn Laser GRBL (DIYKIT 5.5W.pdf Versión 1
No ratings yet
Cutting and Engraving Parameters Lightburn Laser GRBL (DIYKIT 5.5W.pdf Versión 1
8 pages
The Bionicle Story
No ratings yet
The Bionicle Story
523 pages
Production Planning for Managers
100% (2)
Production Planning for Managers
3 pages
MLA 9 Annotated Bibliography Sample
No ratings yet
MLA 9 Annotated Bibliography Sample
2 pages
Software Takes Command
No ratings yet
Software Takes Command
289 pages
Energy Performance Certificate: Estimated Energy Costs of This Home
100% (1)
Energy Performance Certificate: Estimated Energy Costs of This Home
4 pages
Electrical Resistivity Data Interpretation For Groundwater Detection in Tittagudi Taluk
No ratings yet
Electrical Resistivity Data Interpretation For Groundwater Detection in Tittagudi Taluk
7 pages
STAT Presentation
No ratings yet
STAT Presentation
14 pages
Peerlist PHP
No ratings yet
Peerlist PHP
3 pages
Literature Review
No ratings yet
Literature Review
3 pages
Performance of Tri-Axial Compression Test On Cohesive Soils Under Consolidated Drained (CD) Condition
No ratings yet
Performance of Tri-Axial Compression Test On Cohesive Soils Under Consolidated Drained (CD) Condition
14 pages
MUST 2014/2015 Undergraduate Admissions
No ratings yet
MUST 2014/2015 Undergraduate Admissions
13 pages
End Ii Semester Analysis Report Even 2023-24
No ratings yet
End Ii Semester Analysis Report Even 2023-24
6 pages
Hollow Section Connection Design:: Input Data
No ratings yet
Hollow Section Connection Design:: Input Data
3 pages
Sierra Club's June Election Guide
No ratings yet
Sierra Club's June Election Guide
10 pages
Netarhat Application Form
No ratings yet
Netarhat Application Form
3 pages

Eager Aggregation and Lazy Aggregation

Uploaded by

Eager Aggregation and Lazy Aggregation

Uploaded by

Eager Aggregation and Lazy Aggregation

Weipeng P. Yan Per-Bike Larson

We proposed a new query optimization technique,

that lazy group-by can be beneficial.

in E2. Cl “1 Corollary 2 (Eager Count/Lazy Count): The

these two restrictions, i.e., the fin\ 1 selection columns and

aggregation columns and eager count on the stream and

Figure 4: Subset of the TPC-D Database

You might also like