Advanced Database Systems
Spring 2025
Lecture #06:
Files, Pages, Records
R&G: Chapters 9.5-9.7
2
O VERVIEW : R EPRESENTATIONS
Record Table
12344 Jones CS 18 sid name dept age
Int Varchar Varchar Int 12344 Jones CS 18
12355 Smith Physics 23
12366 Gold CS 21
Byte Representation of Record
a b c d
Database File
Slotted Page
Header
Page1 Page2
Page3 Page4
Record #4 Record #3
Page5 Page6
Record #2 Record #1
3
O UTLINE
Storage Media SQ L C lie nt
Disk Space Management Q uer y Pl anning
Buffer Management O pera tor Exe cuti on
F ile s & Index Mana ge me nt
File Layout
Buf fe r Mana ge me nt
Page Layout
Di sk S pace Ma nag em e nt
Record Layout
Da taba se
4
F ILES OF PAGES OF R ECORDS
Tables stored as logical (database) files Header Header
A file consists of one or more pages Record #4
Record #2
Record #3
Record #1
Record #4
Record #2
Record #3
Record #1
Each page contains one or more records Header Header
Record #4 Record #3 Record #4 Record #3
Pages are managed
Record #2 Record #1 Record #2 Record #1
Header Header
On disk by the disk space manager Record #4 Record #3 Record #4 Record #3
Record #2 Record #1 Record #2 Record #1
Pages read/written to physical disk/files
Database File
In memory by the buffer manager
Higher levels of DBMS only operate in memory
Page management is oblivious to their actual content
5
F ILES OF PAGES OF R ECORDS
Database file: A collection of pages, each containing a collection of records
Could span multiple OS files and even machines
SQ L Clien t
API for higher levels of the DBMS:
Q u e r y P l a n n i ng
Create/delete a file
O p e r a t o r E xe c ut i o n
Insert/delete/modify a record
F i l e s & In d e x M a na g e m e n t
Fetch a particular record by record ID
B uf f e r M a n a g e m e n t
Record ID = (page ID, location on page)
Scan all records D i s k S p a ce M a n a g e m e n t
possibly with a predicate on the desirable records
D a t a b a se
6
DB F ILE O RGANISATION
Method of arranging a file of records
At this point in the hierarchy, we do not care what is page format
Different types exist, each ideal for some situations & not so good in others:
Heap Files
Records placed arbitrarily across pages
Sorted Files
Pages and records are in sorted order
Index Files
B+ trees, hash-based files
May contain records or point to records in other files
7
H EAP F ILE
Most important type of files in a database
Collection of records in no particular order
Not to be confused with “heap” data-structure
As file shrinks/grows, pages allocated/deallocated
To support record level operations, we must
Keep track of the pages in a file
Keep track of free space on pages
Keep track of the records on a page
8
H EAP F ILE : L INKED L IST
Doubly linked lists of pages
Header page allocated when the file is created
Header page ID stored in the system catalog full data pages
Initially both page lists are empty
Data Data
Each page keeps track of Page Page
the free space in itself Header
Data Data
Page Page
Easy to implement, but
pages w/ free space
Most pages end up in the free space list
Finding a page with sufficient empty space may search many pages
9
H EAP F ILE : PAGE D IRECTORY
Directory = set of special pages storing metadata about data pages
Each directory entry Header Data
Page
Identifies a data page & records #free bytes on it
Free space search more efficient Data
Far fewer pages read to find a page to fit a record Page
One header page reveals free space of many pages
Header pages accessed often ⇒ likely in cache Data
Page
Small memory overhead to host the directory
Page Directory
10
O UTLINE
Storage Media SQ L C lie nt
Disk Space Management Q uer y Pl anning
Buffer Management O pera tor Exe cuti on
F ile s & Index Mana ge me nt
File Layout
Buf fe r Mana ge me nt
Page Layout
Di sk S pace Ma nag em e nt
Record Layout
Da taba se
11
PAGE L AYOUT
How to organize the data stored inside the page
Header
Header stores metadata about the page
page ID, #records, free space, next/prev pointers, etc.
Find record by record ID, insert, delete records Data
record ID (rid) = (page ID, offset in page)
Things to consider:
Record length? Fixed or variable Page
Page packing? Packed or unpacked
12
F IXED -L ENGTH R ECORDS
Records are made up of multiple fields
Fields = values for columns in a table
We have fixed-length records when field lengths are consistent
The first field always has N bytes, the second field always has M bytes, etc.
⇒ Record lengths are fixed
Every record is always the same number of bytes
Notice that the implication might not be true the other way
We can store fixed-length records in two ways: packed and unpacked
13
F IXED -L ENGTH R ECORDS , PACKED
No gaps between records Header
# records = 4 A
Record ID = (page ID, position in page)
B
Easy to compute the offset of a record in the page:
C D
sizeof (Header) + (position – 1) * sizeof (Record)
Record ID:
(Page 2, 3) Free Space
Page
Note:
Data is always stored in linear order
Unused space
For presentation, we “wrap around” the linear order into a rectangle
14
F IXED -L ENGTH R ECORDS , PACKED
No gaps between records Header
# records = 5
4 A
Record ID = (page ID, position in page)
B
Easy to compute the offset of a record in the page:
C D
sizeof (Header) + (position – 1) * sizeof (Record)
E
Insert record
Free Space
Just append a new record to the end Record ID:
sizeof (Header) + # records * sizeof (Record) (Page 2, 5)
Page
Unused space
15
F IXED -L ENGTH R ECORDS , PACKED
No gaps between records Header
# records = 4 A
Record ID = (page ID, position in page)
Easy to compute the offset of a record in the page:
C D
sizeof (Header) + (position – 1) * sizeof (Record)
E
Delete record
Free Space
Move the last record to the emptied slot
But this changes the last record’s ID!
Not always desirable
Page
Updating record ID pointers could be expensive if they’re in other files
Unused space
16
F IXED -L ENGTH R ECORDS , PACKED
No gaps between records Header
# records = 4 A
Record ID = (page ID, position in page)
E
Easy to compute the offset of a record in the page:
C D
sizeof (Header) + (position – 1) * sizeof (Record)
Delete record
Record ID: Free Space
Move the last record to the emptied slot
(Page 2, 2)
But this changes the last record’s ID!
Not always desirable
Page
Updating record ID pointers could be expensive if they’re in other files
Unused space
17
F IXED -L ENGTH R ECORDS , U NPACKED (B ITMAP )
Allow gaps between records Header
Bitma p A
Record ID = (page ID, position in page)
Use a bitmap to keep track of where the gaps are
C D
Insert record E
Record ID:
Find first empty slot by scanning the bitmap
(Page 2, 3) Free Space
Delete record
Clear bit in the bitmap
Page
18
F IXED -L ENGTH R ECORDS , U NPACKED (F REE L IST )
Alternative to using bitmap Header
Free list A
Link all free slots into a free list
Each link points to the beginning of a free slot, last is null
C D
Insert record
Insert into slot pointed by head of free list E
Set next free slot as new head Free Space
Delete record
Set slot of deleted record as new head
Page
19
F IXED -L ENGTH R ECORDS , U NPACKED (F REE L IST )
Alternative to using bitmap Header
Free list A
Link all free slots into a free list
Each link points to the beginning of a free slot, last is null
D
Insert record
Insert into slot pointed by head of free list E
Set next free slot as new head Free Space
Delete record
Set slot of deleted record as new head
Page
Example: after deleting record C
20
VARIABLE -L ENGTH R ECORDS
Variable-length records may not have field lengths consistent
E.g.: The third field may take 0 to 4 bytes
How do we know where each record begins?
Header
What happens when we add and delete records? A B
C
Page
21
S LOTTED PAGES
Most common layout scheme is called slotted pages
Slot directory maps “slots” to the Slot directory
records’ starting position offsets Header
Record ID = (page ID, slot ID)
Header keeps track of:
The number of used slots
The offset of the last slot used Record D Record C
Record B Record A
Records stored at the end of page
Fixed/Var-length records
22
S LOTTED PAGES
Records can be moved without changing rid
Delete record Slot directory
Set slot offset to -1, delete slot only if last Header
Move records to fill up the whole or
defragment space periodically
Insert record
Find a slot with offset -1 or create if none Record D Record C
Allocate just the right amount of space Record B Record A
Defragment if not enough free space
Fixed/Var-length records
23
O UTLINE
Storage Media SQ L C lie nt
Disk Space Management Q uer y Pl anning
Buffer Management O pera tor Exe cuti on
F ile s & Index Mana ge me nt
File Layout
Buf fe r Mana ge me nt
Page Layout
Di sk S pace Ma nag em e nt
Record Layout
Da taba se
24
R ECORD L AYOUT
Relational model
Each record in table has some fixed type
We do not need to store metadata about the schema
Information about field types is stored in the system catalog
System catalog is just another set of tables
Goals:
Records should be compact in memory & disk format
Fast access to fields (why?)
Easy case: Fixed-length fields
Interesting case: Variable-length fields
25
R ECORD L AYOUT : F IXED -L ENGTH R ECORDS
Each field has a fixed length Base address + 6B
Base address
Direct access to record fields
Done via arithmetic (fast) a b c d
Each record can have a header storing metadata 4B 2B 8B 2B
E.g., bitmap for NULL values
Fixed field lengths
No need to store information about the schema
26
VARIABLE -L ENGTH R ECORDS
Some fields have variable length
Two ways to store variable length records:
1. Fields delimited by special symbols
Access to fields requires a scan of the record a $ b $ c $ d $
Special symbols in fields require “escaping”
27
VARIABLE -L ENGTH R ECORDS
2. Array of field offsets
Direct access to fields & no “escaping” a b c d
Useful for fixed-length records too
Clean way of dealing with NULL values
Aside: this is actually not sufficient for storing NULL values for all types
Cannot distinguish between empty string (“”) and NULL
Need some extra metadata (e.g. bitmap in record header or special char in field),
which varies widely between different DBMSs
28
VARIABLE -L ENGTH R ECORDS
Possible implementation
Fields are stored in order
Variable-length fields represented by fixed size (offset, length), with actual data stored
after all fixed-length fields
NULL values represented by NULL-value bitmap
Record
12344 Jones CS 18
BigInt Varchar Varchar Int
NULL Bitmap (stored in 1 byte)
0000
Byte Representation of Record
12344 21, 5 26, 2 18 Jones CS
Bytes 0 8 12 16 20 21 26 28
29
S UMMARY
DB file contains pages, and records within pages
Heap files: unordered records organized with directories
Page layout 12344
Record
Jones CS 18 sid
Table
name dept age
Int Varchar Varchar Int
Fixed-length packed and unpacked
12344 Jones CS 18
12355 Smith Physics 23
12366 Gold CS 21
Variable length records in slotted pages Byte Representation of Record
a b c d
Database File
Slotted Page
Header
Variable-length record layout
Page1 Page2
Page3 Page4
Direct access to i-th field and NULL values
R e c o rd # 4 R e c o rd # 3
Page5 Page6
R ec o rd # 2 R e c o rd # 1