Architecture and Implementation of Database Systems HS 07 Indexing
Architecture and Implementation of Database Systems HS 07 Indexing
(12, Simon,
Simon, Schmidt)
Schmidt)
(42, Hugo, Müller)
(77, Frank, Meier)
(12, Simon, Schmidt) (12, Simon, Schmidt)
(77, Frank, Meier)
October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 5 October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf
Migration of a Tuple to Another Page Migration of a Tuple to Another Page
TID TID
42 2 42 2
October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf
October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 9 October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 10
October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf
Migration of Tuples: Mapping Table Mapping Table
mapping table
! Drawbacks of a mapping table:
11 77 3 ! 2 block accesses (1 mapping table block + 1 data block)
43 ..
page 42 page 77 ! Advantage:
Head Head ! no space wasted for forward TIDs
October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 14
Mapping Table (optimized, aka PPP) Mapping Table (optimized, aka PPP)
! Drawbacks of a mapping table: optimized old
Algorithm to find a record method method
! 2 block accesses (1 mapping table block + 1 data block)
IF record address contained in cache:
! Optimization: no I/O access on mapping table
! Access to mapping table can be avoided if frequently accessed entries address := cache-addresse 11 42 2 11 77 3
are kept in a separate cache in main memory if record was not found at address: 43 .. 43 ..
I/O-access on mapping table
ELSE
cache mapping table
I/O-access on mapping table
11 42 2 11 77 3
43 .. 43 ..
October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 15 October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 16
- memory usage
October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 17 October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 18
Record Layout Record Layout
! fixed-sized part: ! NULL-values
! stores all values having a type of fixed size ! small bitmap of fixed size at the beginning of each record
! e.g. numeric(10,2), date, char[42] ! “1” if attribute is set to zero, else “0”
! Advantage: direct addressing ! Advantage: simple and efficient
address = sizeof(type) * pos
! variable-sized part:
! e.g. varchars
! store size and pointer in fixed-size part
! store actual values in variable-sized part
! Disadvantage: indirect addressing
address = Zeiger
! Important: if variable-sized types are used for any attribute of a record
the direct addressing of the fixed-sized part is still possible!
October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 19 October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 20
October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf
! alternatively: split table into one-column table storing the RID implicitly
(array-like representation)
Hugo Jens Schmidt Dittrich
76 25 Hans Hugo Meier Müller
11 42 12 77 Simon Frank Schmidt Meier
October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 23 October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 24
Decomposition Storage Model (DSM) Partition Attributes Across (PAX)
! optimized for accessing few attributes ! Idea: colocate values of the same attribute inside a page
! Advantage: very efficient when only
few attributes need to be accessed head
Key fname lname
! Disadvantage: inefficient when many
77 Frank Meier
attributes are accessed
12 Simon Schmidt
! Disadvantage: rows distributed to
42 Hugo Müller subpage 1
many pages Schmidt Dittrich
11 Hans Meier Meier Müller Schmidt Meier
25 Jens Dittrich
76 Hugo Schmidt
Hugo Jens subpage 2
! Literature Hans Hugo Simon Frank
October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 27 October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 28
! 2. Solution ! Discussion
! store record in separate storage space (e.g. file system of the OS) ! very efficient insert
! store only a link to the separate storage on the DB-page ! poor memory usage (deletes?)
October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 29 October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 30
Append Only(n) Best Fit, First Fit, Next Fit
! Generalization of append only: ! Best Fit:
! only consider the n last created pages ! start search from the beginning of the page list
! if record fits into one of this pages: OK ! find optimal page
! Else: create a new page ! Cost = linear search in list
! Discussion ! First Fit:
! very fast insert ! start search from the beginning of the page list
! still: poor memory usage (deletes?) ! take the first page that fits
! disadvantage: pages at the beginning of the list will soon be full
(waste of time to search through these pages)
! Next Fit:
! like first fit, but: start search from the last position
October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 31 October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 32
page 1 2 3 4 5 2 byte/entry
bytes avail. f1 f2 f3 f4 f5 k bits/entry
October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 33 October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 34
October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 35 October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 36
Motivation
Indexing ! Example queries:
! What is the address of the student having ID 424342?
! Which students attend less than 2 lectures in this semester?
1. One-dimensional Index Structures ! Which students live in canton Zurich?
! Which students do not live in canton Zurich?
October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 39 October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 40
Primary vs. Secondary Access Path Secondary Access Path and Inversion
! Queries using a secondary access path may return more than one
! Tow classes of access paths: result
1. Access paths for primary key (1:n relationship among keys and results)
Example:
SELECT *
TID Key fname lname
FROM employees fname 1,0 77 Frank Meier
WHERE empID = 42 Frank
1,1 12 Simon Schmidt
Hans
2. Access paths for secondary key 1,2 42 Hugo Müller
Hugo
Example: 1,3 11 Hans Meier
Jens
1,4 25 Jens Dittrich
SELECT * Simon
1,5 76 Hugo Schmidt
FROM employees
WHERE canton = ‘ZH’
October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 41 October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 42
Onde-dimensional Access Paths Onde-dimensional Access Paths
one-dimensional access paths one-dimensional access paths
sequential structures tree structures hashed structures sequential structures tree structures hashed structures
chains lists B-trees static hash dynamic chains lists B-trees static hash dynamic
methods hashing methods hashing
methods methods
logical physical constant dynamic logical physical B+-trees constant dynamic
October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 43 October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 44
! Discussion
! Discussion ! better I/O-performance
! poor I/O-performance ! Worst case: 1 random access for each page
! Worst case: 1 random access for each record ! used in DBMSs
! not used in DBMSs
October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf
! b-trees
! Discussion ! most important index structure for DBMSs
! optimal I/O-performance ! advantage: very versatile, easy to extend
! Worst case: 1 random access + sequential access ! invented 30 years ago, still being improved
! very important for DBMSs (especially read-mostly environments) ! several index structures exist that are based on similar ideas
! Drawback: hard to maintain in the presence of inserts and updates (e.g. R-tree, M-tree)
October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 47 October 4, 2007 Dr. Jens Dittrich / Institut of Information Systems / jens.dittrich@inf 48
B-trees
! Basics: see also lectures “Introduction to Databases“ and “IS-K“
Next Week: Indexing
! Agenda (following slides):
! basics (repetition) 1. One-dimensional Index Structures (Part 2)
! ISAM
! Bulk-loading
! prefix B+-trees
! prefix/suffix-compression
! cache-conscious B+-trees
! primary vs. secondary B+-trees
! clustered B+-trees
! secondary index and search engines (e.g. Google)
! ...