Introduction to DBMS Internals
DBMS Architecture
Data storage
Alvin Cheung
Aditya Parameswaran
Reading: R & G Chapter 9
Course Overview
• Unit 1: Relational model and SQL
• Unit 2: Storage and indexing
• Unit 3: Query execution
• Unit 4: Query optimization
• Unit 5: Transactions
• Unit 6: Recovery
• Unit 7: Conceptual design
• Unit 8: Advanced topics (time permitting)
Slide Deck Title
DBMS Architecture
Architecture of a DBMS: SQL Client
• How is a SQL query executed?
SQL Client
Database
Management
System
File System Slide Deck Title
DBMS: Parsing & Optimization
Purpose:
Parse, check, and verify the SQL
SQL Client
SELECT S.sid, S.sname, R.bid
FROM Sailors R, Reserves R
WHERE S.sid = R.sid and S.age > 30 Query Parsing & Optimization
GROUP BY age
And translate into an efficient Database
relational query plan Management
System
File System Slide Deck Title
DBMS: Relational Operators
Purpose: Execute query plan by
operating on records and files
SQL Client
Query Parsing & Optimization
Relational Operators
GroupBy (Age)
Database
Indexed Join Management
System
Heap Scan Indexed Scan
Reserves Sailors
File System Slide Deck Title
DBMS: Files and Index Management
Purpose: Organize tables and
Records as groups of pages in
a logical file SQL Client
SSN Last First Age Salary
Name Name
123 Adams Elmo 31 $400
Query Parsing & Optimization
443 Grouc Oscar 32 $300
h
Relational Operators
244 Oz Bert 55 $140
134 Sande
rs
Ernie 55 $400
Database
Files and Index Management
Management
System
File System Slide Deck Title
DBMS: Buffer Management
Purpose:
Provide the illusion of operating
in memory SQL Client
Query Parsing & Optimization
Page Page Page Relational Operators
1 2 3
Database
Files and Index Management
Management
Buffer Management
RAM System
File System Slide Deck Title
Disk Space Management
DBMS: Disk Space Management
Purpose: Translate page requests into
physical bytes on one or more device(s)
SQL Client
Query Parsing & Optimization
Block Block Block Relational Operators
1 2 3 Database
Files and Index Management
Management
Buffer Management
System
Disk Space Management
File System Slide Deck Title
Architecture of a DBMS
• Organized in layers
• Each layer abstracts the layer below
• Manage complexity SQL Client
• Performance assumptions
• Example of good systems design Query Parsing & Optimization
Relational Operators
• Many non-relational DBMSs are Database
Files and Index Management
structured similarly Management
Buffer Management
System
Disk Space Management
File System Slide Deck Title
DBMS: Concurrency & Recovery
Two cross-cutting issues related to
storage and memory management:
SQL Client
Query Parsing & Optimization
Relational Operators
Concurrency Control Database
Files and Index Management
Recovery
Management
Buffer Management
System
Disk Space Management
File System Slide Deck Title
STORAGE MEDIA
Disks
• Most database systems were originally designed for
magnetic “spinning” disks
• Disk are a mechanical anachronism!
• Instilled design ideas that apply to using solid state disks as well
• Major implications:
• Disk API:
• READ: transfer “page” of data from disk to RAM.
• WRITE: transfer “page” of data from RAM to disk.
• No random reads / writes!!
• Both API calls are very, very slow! Slide Deck Title
• Plan carefully!
Storage Hierarchy
.5 ns Small, Fast Registers
L1 cache reference
L1 Cache
L2 Cache
100.0 ns – Currently used data
main memory reference RAM
SSD Likewise if available
.5 ns – Backup and logs
20x106 ns to read 1MB Big, Slow Disk Secondary & tertiary storage
sequentially Slide Deck Title
Economics
• $1000 at NewEgg 2018:
• Mag Disk: ~40TB for $1000
• SSD: ~2.3TB for $1000 TB/$1000
• RAM: 80GB for $1000 50
40
30
20
10
0
RAM SSD Hard Drive
Slide Deck Title
Components of a Disk
• Platters spin (say 15000 rpm)
• Arm assembly moved in or out to position a head
on a desired track Disk head
Spindle
• Tracks under heads make
a “cylinder”
Platters
Tracks
Slide Deck Title
Arm assembly
Components of a Disk
• Platters spin (say 15000 rpm)
• Arm assembly moved in or out to position a head
on a desired track Disk head
Spindle
• Tracks under heads make
a “cylinder”
Arm movement Platters
Tracks
Slide Deck Title
Arm assembly
Components of a Disk
• Platters spin (say 15000 rpm)
• Arm assembly moved in or out to position a head
on a desired track Disk head
Spindle
• Tracks under heads make Sector
a “cylinder”
• Only one head reads/writes at any one time
Platters
• Block/page size is a multiple of (fixed) sector size Arm movement
Tracks
Slide Deck Title
Arm assembly
An Analogy
Slide Deck Title
Accessing a Disk page
• Time to access (read/write) a disk block:
• seek time (moving arms to position disk head on track)
• ~2-3 ms on average
• rotational delay (waiting for block to rotate under head)
• ~0-4 ms (15000 RPM)
• transfer time (actually moving data to/from disk surface)
• ~0.25 ms per 64KB page
• Key to lower I/O cost: reduce seek/rotational delays
Slide Deck Title
Flash (SSD)
• Organized into “cells”
• Current generation (NAND)
• Random reads and writes, but:
• Fine-grain reads (4-8K reads), coarse-grain writes
(1-2MB writes)
Slide Deck Title
Flash (SSD), Pt. 2
• So… read is fast and predictable
• 4KB random reads: ~500MB/sec
• But write is not!
• 4KB random writes: ~120 MB/sec
• Why? Only 2k-3k erasures before failure
• so keep moving write units around (“wear leveling”)
Slide Deck Title
DISK SPACE MANAGEMENT
Block Level Storage
• Read and Write large chunks of sequential bytes
• Sequentially: “Next” disk block is fastest
• Maximize usage of data per Read/Write
• “Amortize” seek delays (HDDs) and writes (SSDs):
if you’re going all the way to Pluto, pack the spaceship full!
• Predict future behavior
• Cache popular blocks
• Pre-fetch likely-to-be-accessed blocks
• Buffer writes to sequential blocks
• More on these as we go Slide Deck Title
A Note on Terminology
• Block = Unit of transfer for disk read/write
• 64KB – 128KB is a good number today
• Book says 4KB
• We’ll use this unit for all storage devices
• Page: a common synonym for “block”
• In some texts, “page” = a block-sized chunk of RAM
• We’ll treat “block” and “page” as synonyms
Slide Deck Title
Disk Space Management
• Lowest layer of DBMS, manages space
on disk SQL Client
• Purpose: Query Parsing & Optimization
• Map pages to locations on disk Relational Operators
• Load pages from disk to memory Database
Files and Index Management
• Save pages back to disk & ensuring writes Management
Buffer Management
System
Disk Space Management
• Higher levels call upon this layer to:
• Read/write a page File System
Slide Deck Title
• Allocate/de-allocate logical pages
Disk Space Management:
Requesting Pages
• page = getFirstPage(“Sailors”);
while (!done) {
process(page);
page = page.nextPage();
}
• Physical details hidden from higher levels of system
• Higher levels may “safely” assume nextPage is fast
• Hence sequential runs of pages are quick to scan
Slide Deck Title
Disk Space Management: Implementation
• Proposal 1: Talk to the storage device directly
• Could be very fast if you knew the device well
• Hard to program when each device has its own API
• What happens when devices change?
Slide Deck Title
Disk Space Management: Implementation
• Proposal 2: Run our own over filesystem (FS)
• Bypass the OS, allocate single large “contiguous” file on
an empty disk
• assume sequential/nearby byte access are fast
• Most FS optimize disk layout for sequential access
• Gives us more or less what we want if we start with an
empty disk
• DBMS “file” may span multiple FS files on multiple
disks/machines Slide Deck Title
Disks and Files: Summary
• Magnetic (hard) disks and SSDs
• Basic HDD and SSD mechanics
• Concept of “near” pages and how it relates to cost of
access
• Relative cost of
• Random vs. sequential disk access (10x)
• Disk (Pluto) vs RAM (Sacramento) vs. registers (your head)
• Big, big differences! Slide Deck Title
Files: Summary
• DB File storage
• Typically over FS file(s)
• Disk space manager loads and stores pages
• Block level reasoning
• Abstracts device and file system; provides fast
“next page”
Slide Deck Title