0% found this document useful (0 votes)
31 views32 pages

3 Join Optimization

The document provides an overview of Database Management Systems (DBMS) and various relational algebra operations, including union, cross-product, and joins. It discusses the requirements for these operations and outlines different join techniques such as nested-loops, single-loop, sort-merge, and hash-joins, along with their performance implications. Additionally, it emphasizes the importance of query processing strategies in optimizing database operations.

Uploaded by

Mx A
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
31 views32 pages

3 Join Optimization

The document provides an overview of Database Management Systems (DBMS) and various relational algebra operations, including union, cross-product, and joins. It discusses the requirements for these operations and outlines different join techniques such as nested-loops, single-loop, sort-merge, and hash-joins, along with their performance implications. Additionally, it emphasizes the importance of query processing strategies in optimizing database operations.

Uploaded by

Mx A
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 32

Database Management System (DBMS)

Web Embedded Interactive


Applications Forms SQL SQL

SQL Commands
DBMS
Query
Evaluation Engine

Files and Access Methods


Concurrency Recovery
Control Buffer Manager Manager
Disk Space Manager

Database Data Indexes Catalog

1
Relational Algebra Operations

2
Union () - Example
SID SName Rating Age
22 Dustin 7 45
S1  S2
31 Lubber 8 55
58 Rusty 10 35
S1
SID SName Rating Age
SID SName Rating Age 22 Dustin 7 45
28 Yuppy 9 35 28 Yuppy 9 35
31 Lubber 8 55 31 Lubber 8 55
44 Guppy 5 35 44 Guppy 5 35
58 Rusty 10 35 58 Rusty 10 35

S2

3
Union()
 Takes two input relations, which must be union-
compatible:
1. Same number of fields
2. Corresponding fields have the same type

 Same requirements for:


1. Intersection ()
2. Set-difference (−)

4
Cross-product () - Example
S1 × R1
SID SName Rating Age SID BID Day
22 Dustin 7 45 22 101 10/10/96
31 Lubber 8 55 58 103 11/12/96
58 Rusty 10 35
R1
S1
SID1 SName Rating Age SID2 BID Day
22 Dustin 7 45 22 101 10/10/96
22 Dustin 7 45 58 103 11/12/96
31 Lubber 8 55 22 101 10/10/96
31 Lubber 8 55 58 103 11/12/96
58 Rusty 10 35 22 101 10/10/96
58 Rusty 10 35 58 103 11/12/96

5
Cross-product ()
 Each row of S1 is paired with each row of R1

 Result schema has one field per field of S1 and R1

 Field names inherited if possible

 Conflict: Both S1 and R1 have a field called “SID”

 Rename to SID1 and SID2

6
Relational Algebra Operations

7
Joins
 Condition/Theta Join Defined as: R condition S
 Equijoin: a special case of condition join where the
condition contains only equalities
 Natural Join: Equijoin on all common fields

 Join can be defined as cross-product followed by


selection:
R condition S = condition (RS)

 But a join result has fewer tuples than cross-product!


 might be able to compute more efficiently (next slides..)
8
Condition Join - Example
SID SName Rating Age SID BID Day
22 Dustin 7 45 22 101 10/10/96
31 Lubber 8 55 58 103 11/12/96
58 Rusty 10 35
R1
S1
S1 S1.sid<R1.sid R1

SID1 SName Rating Age SID2 BID Day


22 Dustin 7 45 58 103 11/12/96
31 Lubber 8 55 58 103 11/12/96

9
Equijoin - Example
SID SName Rating Age SID BID Day
22 Dustin 7 45 22 101 10/10/96
31 Lubber 8 55 58 103 11/12/96
58 Rusty 10 35
R1
S1
S1 S1.sid=R1.sid R1

SID SName Rating Age BID Day


22 Dustin 7 45 101 10/10/96
58 Rusty 10 35 103 11/12/96

10
Query Processing: Natural Joins

1. Nested-loops join (tuple and block – based)

2. Single-loop join (index nested loops join)

3. Sort-merge join

4. Hash-join

11
Join

Student

Transcript

12
Nested-loops join (tuple-based)
 Join relations T and S
 A is the common attribute in T,
B is the common attribute in S

For each record t in T /*outer loop*/


{
For each record s in S /*inner loop*/
{
if (t.A = s.B)
add <t,s> to result
}
}

13
Nested-loops join

Read Block 1st Iteration Read Block

2nd Iteration
Read Block 3rd Iteration Read Block

4th Iteration

Relation T: Read Block


NT records in BT blocks

Read Block
Total number of block reads=

BT + Number of iterations (??) ×


Relation S:
BS =
NS records in BS blocks
BT + NT × BS
14
Nested-loops join
 Join relations T and S
 T: NT=50 records stored in BT=10 disk blocks
 S: NS=6000 records stored in BS=2000 disk blocs
 Relation T (outer loop) is scanned one time
 I/O cost = BT
 Relation S (inner loop) is scanned NT times
 I/O cost = NT x BS
 Per-tuple Implementation:
 I/O Cost = BT + NT BS = 10 + 50 * 2000 block access
 If block access = 10 ms, total cost = 16.6 mins!

15
Nested-loops join
 In per-tuple Implementation:
For each record t in T
For each record s in S
Compare record s with record t /*in-memory operation*/
 However, because of block access:
 an entire block of records is already read from T in memory

 Idea: compare tuple s with the:


 entire block of records from T  reduce iterations

 Page-at-a-time implementation Next slide…

16
Nested-loops join (page-based)
For each block Tblock in T { /*outer loop*/
For each block Sblock in S { /*inner loop*/

For each record t in Tblock { /* in-memory matching*/


For each record s in Sblock {
if (t.A = s.B)
add <t,s> to result
}
}
} /*end inner loop*/
} /*end outer loop*/

17
Nested-loops join: page-at-a-time

Read Block Read Block


1st Iteration

Read Block Read Block


2nd Iteration

Relation T: Read Block


NT records in BT blocks

Read Block
Total number of block reads=
BT + Number of iterations (??) ×
BS = Relation S:
NS records in BS blocks
BT + BT × BS
18
Nested-loops join: page-at-a-time
 Join relations T and S
 T: NT=50 records stored in BT=10 disk blocks
 S: NS=6000 records stored in BS=2000 disk blocs
 Relation T (outer loop) is scanned one time
 I/O cost = BT
 Relation S (inner loop) is scanned BT times
 I/O cost = BT x BS
 Per-page Implementation:
 I/O Cost = BT + BT BS = 10 + 10 * 2000 block access
 If block access = 10 ms, total cost = 3.3 mins!

19
Block nested-loops join
 Observation: having more tuples from T in memory
reduces the number of iterations 
 However, In both of the previous implementations:
 We read only one block from T 

 Idea: Read a chunk of blocks from T instead of only one!


 But how many?!

20
Block nested-loops join
 Let buffer size be NB memory blocks, we need:
 One block for buffering result, and
 One block for reading from the inner file (i.e., S)
 Then, remaining NB-2
 Read as much as:

NB-2 blocks at a time from the outer file (i.e., T)

21
Block nested-loops join

For each NB-2 blocks Tblocks in T { /*outer loop*/


For each block Sblock in S { /*inner loop*/

For each record t in Tblocks { /* in-memory matching*/


For each record s in Sblock {
if (t.A = s.B)
add <t,s> to result
}
}
} /*end inner loop*/
} /*end outer loop*/

22
Block nested-loops join

Relation S:
NS records in BS blocks
Relation T:
NT records in BT blocks
23
Block nested-loops join

Read Blocks Read Block

1st Iteration
Read Block

Read Blocks Read Block

2nd Iteration
Read Block

Relation S:
NS records in BS blocks
Relation T:
NT records in BT blocks
24
Block nested-loops join

Read Blocks Total number of block reads=

1st Iteration BT + Number of iterations (??) ×


BS
Number of iterations
Read Blocks = number of chunks= BT / (NB-2)

2nd Iteration
Total = BT + (BT /(NB-2)) × BS

Relation T:
NT records in BT blocks
25
Order matters!
 When, T: outer & S: inner
 BT + BS * (BT/(NB-2))
 When: T: inner & S: outer
 BS + BT * (BS/(NB-2))
 In general:
BOuter + BInner × (BOuter/(NB-2))

 If BOuter < Binner:


 Has no impact on the term: BInner * (BOuter/(NB-2))
 Reduces the term: Bouter

 Overall, the smaller file should be the outer file


26
Query Processing

1. Nested-loops join

2. Single-loop join (index loop join)

3. Sort-merge join

4. Hash-join

27
Single-loop join (index loop join)
 Join relations T and S
 A is the common attribute in T,
B is the common attribute in S

 Only works if an index exists for one of the two join


attributes (A or B).
 Assume an index on attribute B in relation S, then:

For each record t in T /*outer loop*/


{
Locate records s from S, that satisfy:
t.A = s.B
}
28
Single-loop join
SID Name Age SID CID GPA
Probe: 546007
546007 Peter 18 546007 INFS 18
Read Block
546100 Bob 19 546007 MMDS 19

546200 Ann 21 Probe: 546100 546200 INFS 21


Read Block
546207 Jane 20 546100 ENGG 20

546007 ENGG 24

546200 MMDS 18

Probe: 546200
546007 ELEC 27

546200 ELEC 20

B+ tree index

29
Single-loop join performance
 If for outer relation:
 number of blocks BOuter ,and
 number of records NOuter

 If for inner relation:


 number of index levels LIndex

 Cost in number block accesses:


BOuter + (NOuter × (LIndex + 1))
 Order matters for single-loop join as well

30
Query Processing: Natural Joins

1. Nested-loop join

2. Single-loop join

3. Sort-merge join

4. Hash-join

31
Putting it together

Nested-loop?
Single-loop?
 S.sname Inner and outer?

S.id = E.sid

 S.gpa > 4  E.cid = CSE454

Linear Search? Linear Search?


Binary Search? Binary Search?
Index? S E Index?

32

You might also like