Database Management System (DBMS)
Web Embedded Interactive
Applications Forms SQL SQL
SQL Commands
DBMS
Query
Evaluation Engine
Files and Access Methods
Concurrency Recovery
Control Buffer Manager Manager
Disk Space Manager
Database Data Indexes Catalog
1
Relational Algebra Operations
2
Union () - Example
SID SName Rating Age
22 Dustin 7 45
S1 S2
31 Lubber 8 55
58 Rusty 10 35
S1
SID SName Rating Age
SID SName Rating Age 22 Dustin 7 45
28 Yuppy 9 35 28 Yuppy 9 35
31 Lubber 8 55 31 Lubber 8 55
44 Guppy 5 35 44 Guppy 5 35
58 Rusty 10 35 58 Rusty 10 35
S2
3
Union()
Takes two input relations, which must be union-
compatible:
1. Same number of fields
2. Corresponding fields have the same type
Same requirements for:
1. Intersection ()
2. Set-difference (−)
4
Cross-product () - Example
S1 × R1
SID SName Rating Age SID BID Day
22 Dustin 7 45 22 101 10/10/96
31 Lubber 8 55 58 103 11/12/96
58 Rusty 10 35
R1
S1
SID1 SName Rating Age SID2 BID Day
22 Dustin 7 45 22 101 10/10/96
22 Dustin 7 45 58 103 11/12/96
31 Lubber 8 55 22 101 10/10/96
31 Lubber 8 55 58 103 11/12/96
58 Rusty 10 35 22 101 10/10/96
58 Rusty 10 35 58 103 11/12/96
5
Cross-product ()
Each row of S1 is paired with each row of R1
Result schema has one field per field of S1 and R1
Field names inherited if possible
Conflict: Both S1 and R1 have a field called “SID”
Rename to SID1 and SID2
6
Relational Algebra Operations
7
Joins
Condition/Theta Join Defined as: R condition S
Equijoin: a special case of condition join where the
condition contains only equalities
Natural Join: Equijoin on all common fields
Join can be defined as cross-product followed by
selection:
R condition S = condition (RS)
But a join result has fewer tuples than cross-product!
might be able to compute more efficiently (next slides..)
8
Condition Join - Example
SID SName Rating Age SID BID Day
22 Dustin 7 45 22 101 10/10/96
31 Lubber 8 55 58 103 11/12/96
58 Rusty 10 35
R1
S1
S1 S1.sid<R1.sid R1
SID1 SName Rating Age SID2 BID Day
22 Dustin 7 45 58 103 11/12/96
31 Lubber 8 55 58 103 11/12/96
9
Equijoin - Example
SID SName Rating Age SID BID Day
22 Dustin 7 45 22 101 10/10/96
31 Lubber 8 55 58 103 11/12/96
58 Rusty 10 35
R1
S1
S1 S1.sid=R1.sid R1
SID SName Rating Age BID Day
22 Dustin 7 45 101 10/10/96
58 Rusty 10 35 103 11/12/96
10
Query Processing: Natural Joins
1. Nested-loops join (tuple and block – based)
2. Single-loop join (index nested loops join)
3. Sort-merge join
4. Hash-join
11
Join
Student
Transcript
12
Nested-loops join (tuple-based)
Join relations T and S
A is the common attribute in T,
B is the common attribute in S
For each record t in T /*outer loop*/
{
For each record s in S /*inner loop*/
{
if (t.A = s.B)
add <t,s> to result
}
}
13
Nested-loops join
Read Block 1st Iteration Read Block
2nd Iteration
Read Block 3rd Iteration Read Block
4th Iteration
Relation T: Read Block
NT records in BT blocks
Read Block
Total number of block reads=
BT + Number of iterations (??) ×
Relation S:
BS =
NS records in BS blocks
BT + NT × BS
14
Nested-loops join
Join relations T and S
T: NT=50 records stored in BT=10 disk blocks
S: NS=6000 records stored in BS=2000 disk blocs
Relation T (outer loop) is scanned one time
I/O cost = BT
Relation S (inner loop) is scanned NT times
I/O cost = NT x BS
Per-tuple Implementation:
I/O Cost = BT + NT BS = 10 + 50 * 2000 block access
If block access = 10 ms, total cost = 16.6 mins!
15
Nested-loops join
In per-tuple Implementation:
For each record t in T
For each record s in S
Compare record s with record t /*in-memory operation*/
However, because of block access:
an entire block of records is already read from T in memory
Idea: compare tuple s with the:
entire block of records from T reduce iterations
Page-at-a-time implementation Next slide…
16
Nested-loops join (page-based)
For each block Tblock in T { /*outer loop*/
For each block Sblock in S { /*inner loop*/
For each record t in Tblock { /* in-memory matching*/
For each record s in Sblock {
if (t.A = s.B)
add <t,s> to result
}
}
} /*end inner loop*/
} /*end outer loop*/
17
Nested-loops join: page-at-a-time
Read Block Read Block
1st Iteration
Read Block Read Block
2nd Iteration
Relation T: Read Block
NT records in BT blocks
Read Block
Total number of block reads=
BT + Number of iterations (??) ×
BS = Relation S:
NS records in BS blocks
BT + BT × BS
18
Nested-loops join: page-at-a-time
Join relations T and S
T: NT=50 records stored in BT=10 disk blocks
S: NS=6000 records stored in BS=2000 disk blocs
Relation T (outer loop) is scanned one time
I/O cost = BT
Relation S (inner loop) is scanned BT times
I/O cost = BT x BS
Per-page Implementation:
I/O Cost = BT + BT BS = 10 + 10 * 2000 block access
If block access = 10 ms, total cost = 3.3 mins!
19
Block nested-loops join
Observation: having more tuples from T in memory
reduces the number of iterations
However, In both of the previous implementations:
We read only one block from T
Idea: Read a chunk of blocks from T instead of only one!
But how many?!
20
Block nested-loops join
Let buffer size be NB memory blocks, we need:
One block for buffering result, and
One block for reading from the inner file (i.e., S)
Then, remaining NB-2
Read as much as:
NB-2 blocks at a time from the outer file (i.e., T)
21
Block nested-loops join
For each NB-2 blocks Tblocks in T { /*outer loop*/
For each block Sblock in S { /*inner loop*/
For each record t in Tblocks { /* in-memory matching*/
For each record s in Sblock {
if (t.A = s.B)
add <t,s> to result
}
}
} /*end inner loop*/
} /*end outer loop*/
22
Block nested-loops join
Relation S:
NS records in BS blocks
Relation T:
NT records in BT blocks
23
Block nested-loops join
Read Blocks Read Block
1st Iteration
Read Block
Read Blocks Read Block
2nd Iteration
Read Block
Relation S:
NS records in BS blocks
Relation T:
NT records in BT blocks
24
Block nested-loops join
Read Blocks Total number of block reads=
1st Iteration BT + Number of iterations (??) ×
BS
Number of iterations
Read Blocks = number of chunks= BT / (NB-2)
2nd Iteration
Total = BT + (BT /(NB-2)) × BS
Relation T:
NT records in BT blocks
25
Order matters!
When, T: outer & S: inner
BT + BS * (BT/(NB-2))
When: T: inner & S: outer
BS + BT * (BS/(NB-2))
In general:
BOuter + BInner × (BOuter/(NB-2))
If BOuter < Binner:
Has no impact on the term: BInner * (BOuter/(NB-2))
Reduces the term: Bouter
Overall, the smaller file should be the outer file
26
Query Processing
1. Nested-loops join
2. Single-loop join (index loop join)
3. Sort-merge join
4. Hash-join
27
Single-loop join (index loop join)
Join relations T and S
A is the common attribute in T,
B is the common attribute in S
Only works if an index exists for one of the two join
attributes (A or B).
Assume an index on attribute B in relation S, then:
For each record t in T /*outer loop*/
{
Locate records s from S, that satisfy:
t.A = s.B
}
28
Single-loop join
SID Name Age SID CID GPA
Probe: 546007
546007 Peter 18 546007 INFS 18
Read Block
546100 Bob 19 546007 MMDS 19
546200 Ann 21 Probe: 546100 546200 INFS 21
Read Block
546207 Jane 20 546100 ENGG 20
546007 ENGG 24
546200 MMDS 18
Probe: 546200
546007 ELEC 27
546200 ELEC 20
B+ tree index
29
Single-loop join performance
If for outer relation:
number of blocks BOuter ,and
number of records NOuter
If for inner relation:
number of index levels LIndex
Cost in number block accesses:
BOuter + (NOuter × (LIndex + 1))
Order matters for single-loop join as well
30
Query Processing: Natural Joins
1. Nested-loop join
2. Single-loop join
3. Sort-merge join
4. Hash-join
31
Putting it together
Nested-loop?
Single-loop?
S.sname Inner and outer?
S.id = E.sid
S.gpa > 4 E.cid = CSE454
Linear Search? Linear Search?
Binary Search? Binary Search?
Index? S E Index?
32