Disk Organization
1
2 Disk Space Management
Lowest layer of DBMS software manages space on
disk.
Higher levels call upon this layer to:
allocate/de-allocate a page
read/write a page
Higher levels don’t need to know how this is done, or
how free space is managed.
3 Keeping Track of Free Blocks
Linked List
Bitmap
A bitmap also allows very fast identification and
allocation of contiguous areas on disk.
4 Buffer Management in a
DBMS
Data must be in RAM for DBMS to operate on it!
Table of <frame#, pageid> pairs is maintained.
Page Requests from Higher Levels
BUFFER POOL
disk page
free frame
MAIN MEMORY
DISK choice of frame dictated
DB by replacement policy
5 More on Buffer Management
Page in pool may be requested many times,
a pin count is used (the number of current users of the
page)
A page is a candidate for replacement if pin count = 0.
A page is dirty, if its contents have been changed
after writing
Buffer Manager keeps a dirty bit
To remove P :
If P is dirty, we write it to disk
If P is not dirty, then what?
6 When a page is requested…
A page is the unit of memory we request
If Page in the pool
Great no need to go to disk!
increments the pin_count of that frame
If not?
If there is a free frame, use it!
If not? We need to choose a page to remove!
If frame is dirty, write it to disk
Read requested page into chosen frame
Pin the page and return its address.
Frame is chosen for replacement by a replacement policy
When the code releases the page, the pin_count is
decremented
7 Buffer Replacement Policy
Frame is chosen for replacement by a replacement
policies:
Least-recently-used (LRU),
Clock, A variant of LRU,
MRU
8 Least Recently Used (LRU)
This can be implemented in the buffer manager using
a queue of pointers to frames with pin_count 0.
A frame is added to the end of the queue when the
pin_count goes to 0.
The page chosen for replacement is the one in the
frame at the head of the queue
LRU is expensive (why ?)
9 The Clock Approximation
Instead we maintain a “last used clock”
Think of pages ordered 1…N around a clock
uses current variable that takes on values 1 through N in
circular order.
Pages keep a “ref bit”
Whenever the page pin_count goes to 0 set the bit
The current frame is considered for replacement
If current page has ref bit == false choose it
else, unset ref bit and move on
“Approximates LRU” since a recently referenced
pages is less likely to be replaced.
10 MRU (Most Recently Used)
The LRU and clock policies are not always the best
replacement strategies
# buffer frames < # pages in file
#buffer frames=10, # pages in file =11,
means each page request causes an I/O.
Why would you ever want to use MRU?
11 Files of Records
FILE: A collection of pages, each containing a
collection of records.
File must support:
insert/delete/modify record
read a particular record (specified using record id)
scan all records (possibly with some conditions on the
records to be retrieved)
12 Unordered Files (Heap Files)
Simplest file structure contains records in no
particular order.
As file grows and shrinks, disk pages are allocated
and de-allocated.
To support record level operations, we must:
keep track of the pages in a file
keep track of free space on pages
keep track of the records on a page
There are many alternatives for keeping track of
this.
13 Alternative 1:
Heap File Implemented as List
Maintain a table containing pairs of:
<heap_file_name, head_page_address>
Each page contains 2 `pointers’ (rid)
plus data.
Data Data Data Full Pages
Page Page Page
Header
Page
Data Data Data
Pages with
Page Page Page
Free Space
14 Heap File Implemented as a
List
Insert a new page into heap file
Disk manager adds a new free space page into link
Delete a page from heap file
Removed from the list
Disk manager deallocates it
Disadvantages:
If records are of variable length, all pages will be in free list.
because it is likely that every page has at least a few free bytes
Retrieve and examine several pages for enough
space.
15 Alternative 2: Heap File Using
Page Directory
In directory, each entry
Data for a page includes
Header Page 1 number of free bytes
Page on page.
Data The directory is a
Page 2 collection of pages
(linked list
implementation is just
one alternative).
Data Much smaller than
DIRECTORY Page N linked list of all HF
pages!
16 Alternative 2: Heap File Using
a Page Directory
Advantage of Page Directory :
The size of directory is very small (much smaller than heap
file.)
Searching space is very efficient, because find free
space without looking at actual heap data pages.
17 Page Formats
Identify a record:
Most cases, use <page_id, slot_number> as rid.
How to arrange records in pages?
Alternative approaches to manage slots on a
page
How to support insert/deleting/searching?
18 Page Formats: Fixed Length
Records
Alternative 1:
Store records in the first N slots
Whenever a record is deleted move the last record on
the page into the vacated slot.
Advantage:
Can locate the ith record on the page with simple offset
calculation.
All empty slots appear at the end of the page.
Disadvantage:
Does not work if there are external references to the record that
moved
19 Page Formats: Fixed Length
Records
Alternative 2:
Handle deletion using array of bit
One bit per slot to keep track of free slots
When a page is deleted, its bit is turned off (i.e. 0).
Locating records on the page requires scanning the bit
array to locate slots whose bit is on.
20 Page Formats: Fixed Length
Records
Record id = <page id, slot #>.
Note: In first alternative, moving records for free space
management changes rid; may not be acceptable if
existing external references to the record that is moved.
Slot 1 Slot 1
Slot 2 Slot 2
Free
... Space
...
Slot N Slot N
Slot M
N 1 . . . 0 1 1M
number
number M ... 3 2 1
of slots
PACKED of records UNPACKED, BITMAP
21 Page Formats: Variable
Length Records
Cannot divide the page into fixed-length slots
Challenges:
When a new record is to be inserted, we have to find an
empty slot of just the right length.
We must ensure that the free space on the page is
contiguous.
So
The ability to move records on a page becomes very
important
22 Page Formats: Variable
Length Records
Directory of slots
<record offset, record length> per slot
record offset: offset in bytes from the start of the data
area
Deletion: setting record offset to -1
rid <page id, slot id> does not change when a record
moves.
23 Page Formats: Variable
Length Records
Rid = (i,N) Length = 20 Offset of
Page i record
from start
Rid = (i,2) Length = 16 of data
area
Rid = (i,1) Length = 24
20 16 24 N Pointer
N ... 2 1 # slots to start
of free
space
SLOT DIRECTORY
Slot directory = {<record_offset, record_length>}
24 Page Formats: Variable
Length Records
Maintain a pointer to the start of the free space area
When a new record does not fit into the remaining
free space
move records in the page to reclaim space deleted
earlier.
Cannot always remove a slot of a deleted record (or
the rid of the other slots will change).
When a new record is inserted, the directory is
scanned for an element not pointing to a record.
25 Page Formats: Variable
Length Records
Slot directory = {<record_offset, record_length>}
Advantages:
+ Moving: rid is not changed
+ Deletion: offset = -1 (rid changed?
Can we delete slot? Why?)
+ Insertion: Reuse deleted slot.
Only insert if none available.
26 Records Formats: Fixed
Length Record
Information about field types same for all
records in a file
Stored record format in system catalogs.
Finding i’th field does not require scan of
record, just offset calculation.
F1 F2 F3 F4
L1 L2 L3 L4
Base address (B) Address = B+L1+L2
27 Record Formats: Variable
Length
Alternative 1:
Store fields consecutively
Separate fields by delimiters
Disadvantage: requires the scan of the whole record to
reach a field
Alternative 2:
Reserve space at the beginning of the record for use as
an array of integer offsets
The ith entry is the starting address of the ith field relative
to the start of the record.
28 Record Formats: Variable
Length
Two alternative formats (# fields is fixed):
F1 F2 F3 F4
4 $ $ $ $
Field
Fields Delimited by Special Symbols
Count
F1 F2 F3 F4
Array of Field Offsets
+ Second offers direct access to i’th field
+ efficient storage of nulls ;
- small directory overhead.
29 Issues in Variable Length
Records
Modifying a field may cause it to grow
Need to shift subsequent fields to make room.
A modified record may no longer fit in the available
space on its page, May need to move to another
page
rid will change → causing problems
We must leave a forwarding address in the old place
A record may grow so large that it cannot fit on any
page
Break a record into smaller records
Chain them together