Chapter 10: Storage and File Structure

The original presentation is infused with more information and slides

by Verena Kantere

Database System Concepts, 6th and 7th Ed.

©Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use
Chapter 10: Storage and File Structure

 Overview of Physical Storage Media

 Magnetic Disks
 RAID
 Storage Access
 File Organization
 Organization of Records in Files
 Data-Dictionary Storage

Database System Concepts - 6th and 7th Edition 10.2 ©Silberschatz, Korth and Sudarshan
Classification of Physical Storage Media

 Speed with which data can be accessed

 Cost per unit of data
 Reliability
 data loss on power failure or system crash
 physical failure of the storage device
 Can differentiate storage into:
 volatile storage: loses contents when power is switched off
 non-volatile storage:
   Contents persist even when power is switched off.
   Includes secondary and tertiary storage, as well as battery-backed main memory.

Physical Storage Media

 Cache – fastest and most costly form of storage; volatile; managed by the computer system hardware.
 Main memory:
 fast access (10s to 100s of nanoseconds; 1 nanosecond = 10^-9 seconds)
 generally too small (or too expensive) to store the entire database
   capacities of up to a few gigabytes widely used currently
   capacities have gone up and per-byte costs have decreased steadily and rapidly (roughly a factor of 2 every 2 to 3 years)
 Volatile – contents of main memory are usually lost if a power failure or system crash occurs.

Physical Storage Media (Cont.)
 Flash memory
 Data survives power failure
 Data can be written at a location only once, but location can be
erased and written to again
   Can support only a limited number (10K to 1M) of write/erase cycles.
   Erasing of memory has to be done to an entire bank of memory
 Reads are roughly as fast as main memory
 But writes are slow (a few microseconds), and erase is slower
 Widely used in embedded devices such as digital cameras,
phones, and USB keys
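The write-once-then-erase behaviour described above can be sketched as a toy model. This is only an illustration: the class name `FlashBank`, its methods, and the default cycle limit are assumptions made for the example, not part of any real flash API.

```python
class FlashBank:
    """Toy model of one erase bank (hypothetical, for illustration only)."""

    def __init__(self, size, max_erase_cycles=10_000):
        self.cells = [None] * size                # None marks an erased, writable location
        self.erase_count = 0
        self.max_erase_cycles = max_erase_cycles  # assumed limit (low end of 10K to 1M)

    def write(self, index, value):
        # A location can be written only once between erases.
        if self.cells[index] is not None:
            raise ValueError("location already written; erase the bank first")
        self.cells[index] = value

    def erase(self):
        # Erasure always clears the entire bank, never a single location.
        if self.erase_count >= self.max_erase_cycles:
            raise RuntimeError("bank has reached its write/erase cycle limit")
        self.cells = [None] * len(self.cells)
        self.erase_count += 1

    def update(self, index, value):
        # Changing one location means: save the bank, erase it, write everything back.
        saved = list(self.cells)
        saved[index] = value
        self.erase()
        for i, v in enumerate(saved):
            if v is not None:
                self.write(i, v)


bank = FlashBank(4)
bank.write(0, "a")
bank.write(1, "b")
bank.update(0, "x")      # costs one erase cycle for the whole bank
print(bank.cells, bank.erase_count)
```

The `update` method shows why in-place updates are expensive on flash: a single changed location costs a full erase of the bank plus a rewrite of everything it held.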

Physical Storage Media (Cont.)
 Magnetic-disk
 Data is stored on spinning disk, and read/written magnetically
 Primary medium for the long-term storage of data; typically stores
entire database.
 Data must be moved from disk to main memory for access, and written
back for storage
   Much slower access than main memory (more on this later)
 direct-access – possible to read data on disk in any order, unlike magnetic tape
 Capacities typically range up to several TB
   Much larger capacity and lower cost per byte than main memory or flash memory
   Growing constantly and rapidly with technology improvements (factor of 2 to 3 every 2 years)
 Survives power failures and system crashes
   disk failure can destroy data, but is rare
Physical Storage Media (Cont.)
 Optical storage
 non-volatile, data is read optically from a spinning disk using a laser
 CD-ROM (640 MB) and DVD (4.7 to 17 GB) most popular forms
 Blu-ray disks: 25 GB to 128 GB
 Write-once, read-many (WORM) optical disks used for archival storage (CD-R, DVD-R, DVD+R)
 Multiple write versions also available (CD-RW, DVD-RW, DVD+RW,
and DVD-RAM)
 Reads and writes are slower than with magnetic disk
 Juke-box systems, with large numbers of removable disks, a few
drives, and a mechanism for automatic loading/unloading of disks
available for storing large volumes of data

Physical Storage Media (Cont.)
 Tape storage
 non-volatile, used primarily for backup (to recover from disk failure),
and for archival data
 sequential-access – much slower than disk
 very high capacity (20 to 100 TB tapes available – in December 2020 IBM and Fujifilm announced a new tape with 580 TB capacity)
 tape can be removed from drive ⇒ storage costs much cheaper than disk, but drives are expensive
 Tape jukeboxes available for storing massive amounts of data
   hundreds of terabytes (1 terabyte = 10^12 bytes) to even multiple petabytes (1 petabyte = 10^15 bytes)

Tape Storage Example
CERN is still using tapes as the primary permanent storage:
Up to about 1.6B particle collisions per second inside the LHC
experiment's detectors (updated 2024).
The CERN Data Centre processes on average 1 PB of data per day. The
LHC experiments produce over 45 PB of data per week, and other (non-LHC)
experiments produce hundreds of PB more per year.
Magnetic tapes are used as the main long-term storage medium and
data from the archive is continuously migrated to newer technology,
higher density tapes.
The CERN storage system, EOS, was created for the extreme LHC
computing requirements. EOS instances at CERN are more than 2B. EOS
has expanded for other data storage needs beyond high-energy physics,
with AARNET, the Australian Academic and Research Network, and the
EU Joint Research Centre for Digital Earth and Reference Data adopting
it for their big-data systems.
https://information-technology.web.cern.ch/sites/information-technology.web.cern.ch/files/CERNDataCentre_KeyInformation_June2020V1.pdf

CERN Data Center
 Underfloor space of about 1m in the Data Center for cables

CERN-CO-0307026-01 (photo copyright of CERN)

CERN Data Center
 Grafana monitoring of CERN Data Center resources

https://monit-grafana-open.cern.ch/d/000000884/it-overview?orgId=16

 The custodial copy of all of CERN’s physics data is stored on magnetic tapes at
the CERN Data Centre, also called the WLCG Tier-0
 According to CERN’s report of November 2021, 32 244 tapes store about 380 PB
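As a rough check on the November 2021 figures, the average amount of data held per tape can be worked out directly. This is a small sketch; the function name is made up for the example, and decimal units (1 PB = 10^15 bytes) are assumed.

```python
PB = 10**15  # decimal petabyte (assumed unit convention)
TB = 10**12  # decimal terabyte


def average_bytes_per_tape(total_bytes, tape_count):
    """Average data stored per tape cartridge."""
    return total_bytes / tape_count


avg = average_bytes_per_tape(380 * PB, 32_244)
print(f"{avg / TB:.1f} TB per tape on average")
```

The result, a little under 12 TB per tape, is an average over cartridges of different generations, so individual tapes may hold considerably more or less.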

CERN New Data Center
 On 23 February 2024, a new data centre was inaugurated on CERN’s site:
 It spans more than 6000 square metres and includes six rooms for IT
equipment, each with a cooling capacity of 2 MW
 The centre will host CPU (central processing unit) servers for physics
data processing, as well as a small number of CPU servers and some storage
capacity for business continuity and disaster recovery (for example,
when data is corrupted).
 CERN’s main data centre on the Meyrin site (Switzerland) will continue
to house the majority of the Organization’s data storage capacity.
 The data from these experiments is fed into the Worldwide LHC Computing
Grid (WLCG), a collaboration of around 170 data centres distributed across
more than 40 countries, with a storage capacity of about 3 exabytes and one
million CPU cores distributed across the network.

https://home.cern/news/news/computing/new-data-centre-cern

Storage Hierarchy
cache

main memory

flash memory

magnetic disk

optical disk

magnetic tapes

Storage Hierarchy (Cont.)

 primary storage: Fastest media but volatile (cache, main memory).
 secondary storage: next level in hierarchy, non-volatile,
moderately fast access time
 also called on-line storage
 E.g. flash memory, magnetic disks
 tertiary storage: lowest level in hierarchy, non-volatile, slow
access time
 also called off-line storage
 E.g. magnetic tape, optical storage

Magnetic Hard Disk Mechanism
[Figure: schematic of a magnetic hard disk, with labels for the spindle, platters, track t, sector s, cylinder c, read–write head, arm, arm assembly, and direction of rotation]
NOTE: Diagram is schematic, and simplifies the structure of actual disk drives

Magnetic Disks
 Read-write head
 Positioned very close to the platter surface (almost touching it)
 Reads or writes magnetically encoded information
 Surface of platter divided into circular tracks
 Typically 50K to 100K tracks or more per platter on hard disks
 Each track is divided into sectors
 A sector is the smallest unit of data that can be read or written
 Sector size typically 512 bytes
 Typical sectors per track: 500 to 1000 (on inner tracks) to 1000 to 2000 (on
outer tracks)
 To read/write a sector
 disk arm swings to position head on the correct track
 platter spins continually; data is read/written as sector passes under head
 Head-disk assemblies
 multiple disk platters on a single spindle (1 to 5 usually)
 one head per platter, mounted on a common arm
 Cylinder i consists of the i-th track of all the platters
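The geometry above (platters, tracks, sectors) determines raw capacity. The following is a minimal sketch under simplifying assumptions: the function name and parameters are invented for the example, one recording surface per platter is assumed (matching "one head per platter"), and a single average sectors-per-track figure stands in for the inner/outer variation.

```python
SECTOR_BYTES = 512  # typical sector size quoted above


def disk_capacity_bytes(platters, tracks_per_platter, avg_sectors_per_track,
                        sector_bytes=SECTOR_BYTES):
    """Raw capacity = platters x tracks x sectors per track x bytes per sector."""
    return platters * tracks_per_platter * avg_sectors_per_track * sector_bytes


# Illustrative values taken from the ranges above: 5 platters,
# 100K tracks per platter, 1000 sectors per track on average.
capacity = disk_capacity_bytes(platters=5,
                               tracks_per_platter=100_000,
                               avg_sectors_per_track=1_000)
print(capacity / 10**9, "GB")  # 256.0 GB
```

Real drives record on both surfaces of a platter and vary the number of sectors by zone, so this should be read as an order-of-magnitude estimate only.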
Magnetic Disks (Cont.)

 Earlier generation disks were susceptible to head-crashes

 Surface of earlier generation disks had metal-oxide coatings which would
disintegrate on head crash and damage all data on disk
 Current generation disks are less susceptible to such disastrous failures,
although individual sectors may get corrupted
 Disk controller – interfaces between the computer system and the disk
drive hardware.
 accepts high-level commands to read or write a sector
 initiates actions such as moving the disk arm to the right track and
actually reading or writing the data
 Computes and attaches checksums to each sector to verify that data is
read back correctly
   If data is corrupted, with very high probability the stored checksum won’t match the recomputed checksum
 Ensures successful writing by reading back sector after writing it
 Performs remapping of bad sectors
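The checksum step performed by the controller can be illustrated in software. This is a sketch only: real controllers use their own error-detecting and error-correcting codes in hardware, and CRC-32 from Python's standard `zlib` module is used here purely as a stand-in. The function names are hypothetical.

```python
import zlib


def write_sector(data: bytes) -> bytes:
    """Return the stored form of a sector: data followed by a 4-byte checksum."""
    checksum = zlib.crc32(data)
    return data + checksum.to_bytes(4, "big")


def read_sector(stored: bytes) -> bytes:
    """Recompute the checksum on read and compare it with the stored one."""
    data = stored[:-4]
    stored_checksum = int.from_bytes(stored[-4:], "big")
    if zlib.crc32(data) != stored_checksum:
        raise IOError("checksum mismatch: sector is corrupted")
    return data


stored = write_sector(b"x" * 512)
assert read_sector(stored) == b"x" * 512        # clean read succeeds

corrupted = bytes([stored[0] ^ 0xFF]) + stored[1:]  # flip the bits of one byte
try:
    read_sector(corrupted)
except IOError as err:
    print(err)
```

On a mismatch a real controller would typically retry the read and, if the sector stays bad, remap it, as listed in the last bullet above.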

Disk Subsystem

 Multiple disks connected to a computer system through a controller
 Controller functionality (checksum, bad sector remapping) often carried out by individual disks; reduces load on controller
 Disk interface standards families
 ATA (AT Attachment) range of standards
 SATA (Serial ATA)
 SCSI (Small Computer System Interface) range of standards
 SAS (Serial Attached SCSI)
 Several variants of each standard (different speeds and capabilities)

Disk Subsystem
 Disks usually connected directly to computer system
 In Storage Area Networks (SAN), a large number of disks are
connected by a high-speed network to a number of servers
 In Network Attached Storage (NAS), networked storage provides a
file system interface using a networked file system protocol (running over TCP/IP),
instead of providing a disk system interface
 The difference is in how the data is accessed. A SAN accesses data
as blocks, while a NAS accesses data as files

Performance Measures of Disks
 Access time – the time it takes from when a read or write request is issued
to when data transfer begins. Consists of:
 Seek time – time it takes to reposition the arm over the correct track.
   Average seek time is 1/2 the worst case seek time.
   4 to 10 milliseconds on typical disks
 Rotational latency – time it takes for the sector to be accessed to
appear under the head.
   Average latency is 1/2 of the worst case latency.
   A full rotation takes 4 to 11 milliseconds on typical disks (15000 down to 5400 r.p.m.), so the average latency is 2 to 5.5 milliseconds
 Data-transfer rate – the rate at which data can be retrieved from or stored
to the disk.
 25 to 100 MB per second max rate, lower for inner tracks
 Multiple disks may share a controller, so the rate that controller can
handle is also important
   E.g. SATA: 150 MB/sec, SATA-II 3Gb (300 MB/sec)
   Ultra 320 SCSI: 320 MB/s, SAS (3 to 6 Gb/sec)
   Fibre Channel (FC 2Gb or 4Gb): 256 to 512 MB/s
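The measures above combine into a simple estimate of how long one request takes. The sketch below assumes the definitions given on this slide (average rotational latency is half a rotation; access time is seek time plus rotational latency); the function names and the sample values are chosen for illustration.

```python
def avg_rotational_latency_ms(rpm):
    """Half of one full rotation, in milliseconds."""
    full_rotation_ms = 60_000 / rpm
    return full_rotation_ms / 2


def avg_access_time_ms(avg_seek_ms, rpm):
    """Access time = seek time + rotational latency."""
    return avg_seek_ms + avg_rotational_latency_ms(rpm)


def transfer_time_ms(num_bytes, rate_mb_per_s):
    """Time to move num_bytes at the given rate (1 MB = 10^6 bytes assumed)."""
    return num_bytes / (rate_mb_per_s * 10**6) * 1000


for rpm in (5400, 15000):
    print(rpm, "rpm:", round(avg_rotational_latency_ms(rpm), 2), "ms average latency")

# One 4 KB block on a 15000 rpm disk with a 4 ms average seek and 100 MB/s transfer
total = avg_access_time_ms(4, 15000) + transfer_time_ms(4096, 100)
print(round(total, 3), "ms")
```

With these numbers the transfer itself takes about 0.04 ms, so almost all of the roughly 6 ms is spent positioning the arm and waiting for the sector, which is why reducing seeks matters far more than raw transfer rate for small random reads.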

Performance Measures (Cont.)

 Mean time to failure (MTTF) – the average time the disk is expected to run continuously without any failure.
 Typically 3 to 5 years
 Probability of failure of new disks is quite low, corresponding to a
“theoretical MTTF” of 500,000 to 1,200,000 hours for a new disk
(about 57 to 136 years)
   E.g., an MTTF of 1,200,000 hours for a new disk means that given 1000 relatively new disks, on average one will fail every 1200 hours
 MTTF decreases as disk ages
 Most disks have an expected life span of about 5 years, and
have significantly higher rates of failure once they become more
than a few years old.
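The MTTF arithmetic above can be reproduced in a few lines. This is a sketch that assumes a 365-day year and independent failures at a constant rate, which is what the "one failure every 1200 hours" reading implies; the function names are made up for the example.

```python
HOURS_PER_YEAR = 24 * 365  # assumed: 365-day year


def mttf_years(mttf_hours):
    """Convert an MTTF given in hours to years."""
    return mttf_hours / HOURS_PER_YEAR


def hours_between_failures(mttf_hours, disk_count):
    """Expected time between failures somewhere in a population of disks."""
    return mttf_hours / disk_count


print(round(mttf_years(500_000), 1), "years")       # about 57 years
print(round(mttf_years(1_200_000), 1), "years")     # just under 137 years
print(hours_between_failures(1_200_000, 1000), "hours between failures for 1000 disks")
```

The second function is the practically useful one: a large installation sees failures regularly even when each individual disk has a very long theoretical MTTF.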

Optimization of Disk-Block Access
 Block – a contiguous sequence of sectors from a single track
 data is transferred between disk and main memory in blocks
 sizes range from 512 bytes to several kilobytes
   Smaller blocks: more transfers from disk
   Larger blocks: more space wasted due to partially filled blocks
   Typical block sizes today range from 4 to 16 kilobytes
 Disk-arm-scheduling algorithms order pending accesses to tracks so
that disk arm movement is minimized
 elevator algorithm:
[Figure: pending requests R6, R3, R1, R5, R2, R4 shown left to right between the inner track and the outer track]
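One way to picture the elevator algorithm is as a function that orders a fixed set of pending requests: the arm keeps moving in its current direction, serving requests as it passes them, and reverses once none remain ahead. The sketch below is an illustration under stated assumptions: track numbers are taken to increase from the inner track to the outer track, no new requests arrive during the sweep, and the function name and sample request list are invented for the example.

```python
def elevator_order(pending_tracks, head, moving_outward=True):
    """Order in which pending track requests are served by one elevator sweep."""
    if moving_outward:
        # Serve everything at or beyond the head on the way out, then sweep back in.
        first = sorted(t for t in pending_tracks if t >= head)
        second = sorted((t for t in pending_tracks if t < head), reverse=True)
    else:
        # Serve everything at or inside the head on the way in, then sweep back out.
        first = sorted((t for t in pending_tracks if t <= head), reverse=True)
        second = sorted(t for t in pending_tracks if t > head)
    return first + second


requests = [98, 183, 37, 122, 14, 124, 65, 67]
print(elevator_order(requests, head=53))
# [65, 67, 98, 122, 124, 183, 37, 14]
```

For a fixed set of requests the service order is the same whether the arm travels all the way to the edge before reversing or turns around at the last pending request; the two variants differ only in how far the arm moves.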

Optimization of Disk-Block Access

Some algorithms for disk scheduling

 First Come-First Serve (FCFS)
 Shortest Seek Time First (SSTF)
 Elevator (SCAN)
 Circular SCAN (C-SCAN)
 LOOK
 C-LOOK
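To see why the choice of scheduling algorithm matters, the total arm movement of two of the listed policies can be compared on the same request queue. This is a minimal sketch: the request list and starting head position are sample values, the function names are hypothetical, and all requests are assumed to be present before scheduling starts.

```python
def total_head_movement(start, order):
    """Sum of track distances travelled when serving requests in the given order."""
    moved, position = 0, start
    for track in order:
        moved += abs(track - position)
        position = track
    return moved


def fcfs_order(requests):
    """First Come-First Serve: serve requests in arrival order."""
    return list(requests)


def sstf_order(requests, head):
    """Shortest Seek Time First: always serve the request nearest to the head."""
    remaining, order = list(requests), []
    while remaining:
        nearest = min(remaining, key=lambda t: abs(t - head))
        remaining.remove(nearest)
        order.append(nearest)
        head = nearest
    return order


requests = [98, 183, 37, 122, 14, 124, 65, 67]
head = 53
print("FCFS:", total_head_movement(head, fcfs_order(requests)))        # 640
print("SSTF:", total_head_movement(head, sstf_order(requests, head)))  # 236
```

SSTF moves the arm far less here, but it can starve requests that are far from the head when new requests keep arriving nearby, which is one motivation for the sweep-based policies in the list.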

Optimization of Disk-Block Access