Storage and Indexing

The document provides an overview of storage and indexing in databases, emphasizing the importance of organization for efficient data retrieval. It discusses various storage devices like hard disk drives and solid-state drives, and different data organization methods such as heap files, sorted files, and indexing techniques like B+ trees and hash indexes. The document concludes with the acknowledgment of trade-offs in database systems, highlighting the balance between space, time, read/write performance, and consistency.

OVERVIEW OF STORAGE AND INDEXING
When you hear about database storage and indexing, you
might think
"This sounds like a fancy way of saying 'how
do we put stuff away so we can find it
again?'"
That's exactly what it is!
But as always, the devil is in the details.
Think about your bedroom.
If you just throw everything on the floor, sure, it's
all there
but good luck finding your favorite socks
when you're late for work
But if you organize things
socks in one drawer
shirts in another
books on shelves arranged by topic
suddenly finding things becomes much
easier
That's indexing.
It's the art of organized laziness.
WHY CAN'T WE JUST SEARCH EVERYTHING?
Imagine you have a library with a million books
and someone asks you to find every book
that mentions "quantum mechanics"
If the books are just randomly scattered on
shelves, you'd have to look through every
single book.
That's a million books to check!
If each book takes you just one second to check
(and that's being generous)
You'd need about 11.5 days of non-stop
searching.
But wait - what if ten people come in with
different requests?
Now you're looking at 115 days of work.
This doesn't scale, as they say in the
computer business.
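The arithmetic behind the library example can be checked in a few lines. This is only a back-of-the-envelope sketch: the one-second-per-book figure comes from the example above, and the variable names are made up for illustration.

```python
# Cost of scanning everything, using the figures from the library example.
SECONDS_PER_DAY = 24 * 60 * 60

books = 1_000_000        # books in the library
seconds_per_book = 1     # assumed time to check one book
requests = 10            # ten people, each needing a full scan

days_per_scan = books * seconds_per_book / SECONDS_PER_DAY
days_for_all = days_per_scan * requests

print(f"one full scan: {days_per_scan:.2f} days")
print(f"{requests} full scans: {days_for_all:.1f} days")
```

The cost grows in direct proportion to both the number of books and the number of requests, which is the scaling problem in a nutshell.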
This is exactly the problem databases face.
Data storage is cheap.
We can throw terabytes of information onto
disks without breaking a sweat.
But finding specific information?
THE PHYSICAL REALITY WE CANNOT IGNORE
Before we get too abstract, let's talk about the
physical world.
When you store data, it doesn't just float around
in some mystical digital cloud.
It sits on actual, physical devices: spinning
disks, flash memory, magnetic tapes.
HARD DISK DRIVES
THE PATIENT ELEPHANTS
A hard disk drive is like a very patient elephant
with an excellent memory but terrible knees.
It can remember enormous amounts of
information
but it takes time to lumber over to where the
information is stored
Picture a record player, because that's essentially
what a hard drive is.
You have a spinning disk (or several of them)
and a read head that moves back and forth.
To read data, the head has to move to the
right track (seek time)
wait for the right spot to spin around
(rotational latency)
and then read the data sequentially
If your data is scattered all over the disk
the poor read head is dancing back and forth
like a frantic conductor
But if related data is stored together
the head can read it all in one smooth
motion
Sequential reads are roughly 100 times faster than
random reads on traditional hard drives.
That's not just a little better
that's like the difference between walking
and taking a jet plane
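A toy cost model makes the gap concrete. Every number below is an assumption chosen for illustration (8 ms to position the head, 100 MB/s transfer rate, 8 KB pages), not a measurement of any real drive, and the function names are hypothetical.

```python
# Toy cost model for a spinning disk. All numbers are illustrative assumptions.
ACCESS_MS = 8.0             # assumed seek time + rotational latency per random access
TRANSFER_MB_PER_S = 100.0   # assumed sustained transfer rate
PAGE_KB = 8                 # assumed page size

def transfer_ms(pages: int) -> float:
    """Time spent actually moving the bytes for `pages` pages."""
    megabytes = pages * PAGE_KB / 1024
    return megabytes / TRANSFER_MB_PER_S * 1000

def sequential_read_ms(pages: int) -> float:
    """Position the head once, then read everything in one smooth motion."""
    return ACCESS_MS + transfer_ms(pages)

def random_read_ms(pages: int) -> float:
    """Pay the positioning cost again for every single page."""
    return pages * ACCESS_MS + transfer_ms(pages)

pages = 10_000
sequential = sequential_read_ms(pages)
scattered = random_read_ms(pages)
print(f"sequential: {sequential:.0f} ms")
print(f"random:     {scattered:.0f} ms")
print(f"ratio:      {scattered / sequential:.0f}x")
```

With these particular assumptions the ratio lands on the order of 100. Different assumed numbers move it up or down, but the positioning cost dominates either way.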
SOLID STATE DRIVES
THE CAFFEINATED CHEETAHS
SSDs are different beasts.
They're like caffeinated cheetahs, incredibly fast
but they get tired if you work them too hard
and they're more expensive to feed
With SSDs, there's no mechanical movement.
Data access is nearly instantaneous regardless of
where it's stored.
But SSDs have their own quirks
they can only be written to a limited number
of times
and they prefer to work with data in blocks
MEMORY
THE BRILLIANT BUT FORGETFUL FRIEND
RAM is like having a brilliant friend who can
instantly recall anything you've told them
recently.
but completely forgets everything when they
go to sleep
It's incredibly fast.
hundreds to thousands of times faster than even SSDs
but it's volatile and expensive
We want to keep frequently accessed data in the
fastest storage available.
and less frequently accessed data in cheaper,
slower storage
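One common way to decide what stays in fast storage is to evict whatever was used least recently. The slides do not name a policy, so the LRU cache below is a swapped-in illustration, and `LRUCache` and `slow_read` are hypothetical names.

```python
from collections import OrderedDict

class LRUCache:
    """Small 'fast storage' in front of a slow read function (illustrative only)."""

    def __init__(self, capacity, slow_read):
        self.capacity = capacity
        self.slow_read = slow_read      # stands in for a disk read
        self.pages = OrderedDict()      # least recently used entry comes first
        self.hits = 0
        self.misses = 0

    def get(self, page_id):
        if page_id in self.pages:
            self.hits += 1
            self.pages.move_to_end(page_id)   # mark as most recently used
            return self.pages[page_id]
        self.misses += 1
        value = self.slow_read(page_id)
        self.pages[page_id] = value
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)    # evict the least recently used page
        return value

cache = LRUCache(capacity=2, slow_read=lambda page_id: f"contents of page {page_id}")
for page_id in [1, 2, 1, 3, 1, 2]:
    cache.get(page_id)

print(cache.hits, cache.misses)   # 2 4
print(list(cache.pages))          # [1, 2]
```

Page 1 is requested often, so it stays in the cache the whole time, while pages 2 and 3 take turns being evicted.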
FILE ORGANIZATION
Now that we understand our storage devices, let's
talk about how we organize data on them.
There are several approaches, each with its own
trade-offs.
HEAP FILES
THE JUNK DRAWER APPROACH
A heap file is like that junk drawer/box.
New records just get thrown in wherever there's
space.
It's simple and fast for insertions: just find any
empty spot and plop the data down.
But searching?
That's where heap files show their weakness.
To find a specific record, you might have to look
through the entire file.
It's like searching for your passport in that junk
drawer: you might get lucky and find it right away,
or you might have to empty the entire drawer
onto the counter.
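The junk drawer can be sketched in a few lines. This is a minimal in-memory stand-in, not how any particular database lays out pages; `HeapFile` and the `"id"` field are assumptions made for the example.

```python
class HeapFile:
    """Unordered record storage: cheap inserts, linear-scan lookups."""

    def __init__(self):
        self.slots = []   # each slot holds a record, or None if it is free
        self.free = []    # slot numbers that can be reused

    def insert(self, record):
        """Drop the record into any free slot, else append. Returns the slot number."""
        if self.free:
            slot = self.free.pop()
            self.slots[slot] = record
        else:
            slot = len(self.slots)
            self.slots.append(record)
        return slot

    def delete(self, slot):
        self.slots[slot] = None
        self.free.append(slot)

    def find(self, key):
        """Scan slot by slot. Returns (record or None, slots examined)."""
        for examined, record in enumerate(self.slots, start=1):
            if record is not None and record["id"] == key:
                return record, examined
        return None, len(self.slots)

heap = HeapFile()
for record_id in [42, 7, 99, 13]:
    heap.insert({"id": record_id})

print(heap.find(42))     # ({'id': 42}, 1)  lucky: first slot
print(heap.find(13))     # ({'id': 13}, 4)  unlucky: last slot
print(heap.find(1000))   # (None, 4)        not there: emptied the whole drawer
```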
SORTED FILES
Sorted files are like having a perfectly organized
library.
Every book is in exactly the right place according
to some ordering (say, by author's last name).
This makes finding a specific item much faster:
you can use binary search, which is logarithmic in
time complexity.
If you have a million records, you can find any
specific record in at most 20 steps.
But there's a catch.
Maintaining sorted order is expensive.
Every time you want to insert a new record, you
might have to move thousands of other records to
make room.
It's like trying to insert a new book into a tightly
packed bookshelf - you might have to shift half
the books to make space.
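Both halves of that trade-off show up in a short sketch: the search takes at most 20 steps over a million keys, and inserting near the front would shift almost everything. The key values and helper name are made up for illustration.

```python
import bisect

def binary_search(sorted_keys, target):
    """Return (index or None, number of probes made)."""
    lo, hi = 0, len(sorted_keys) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_keys[mid] == target:
            return mid, steps
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, steps

keys = list(range(0, 2_000_000, 2))   # one million sorted keys (even numbers)

# Probe a mix of present and absent keys and record the worst case seen.
probes = [0, 999_998, 1_999_998, 1_234_567, -1, 2_000_001]
worst_steps = max(binary_search(keys, probe)[1] for probe in probes)
print(f"worst case over the probes: {worst_steps} steps")

# Inserting key 1 keeps the order only if every later record moves over by one.
position = bisect.bisect_left(keys, 1)
records_shifted = len(keys) - position
print(f"inserting key 1 would shift {records_shifted:,} records")
```

The 20-step bound follows from halving: a million records can be cut in half at most 20 times before nothing is left to search.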
CLUSTERED VS. UNCLUSTERED ORGANIZATION
Think of clustered organization like a filing
cabinet where related documents are physically
stored together.
If you're looking for all invoices from January
2023, they're all in the same drawer, in the same
section.
Unclustered organization is like having a card
catalog that tells you where to find each
document
but the actual documents might be scattered
all over the building
The catalog is fast to search, but then you have to
run all over the place collecting the actual
documents.
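A rough way to see the difference is to count how many pages a query has to touch. The page size, record counts, and the random scattering below are all assumptions for illustration.

```python
import random

RECORDS_PER_PAGE = 100   # assumed number of records that fit on one page

def pages_touched(record_positions):
    """Count the distinct pages that hold the given record positions."""
    return len({position // RECORDS_PER_PAGE for position in record_positions})

total_records = 100_000
matching = 1_000         # e.g. the invoices from January 2023

# Clustered: the matching records sit next to each other on disk.
clustered_positions = range(5_000, 5_000 + matching)

# Unclustered: the index finds them quickly, but they are scattered everywhere.
rng = random.Random(0)
unclustered_positions = rng.sample(range(total_records), matching)

clustered_pages = pages_touched(clustered_positions)
unclustered_pages = pages_touched(unclustered_positions)
print(f"clustered:   {clustered_pages} pages")
print(f"unclustered: {unclustered_pages} pages")
```

The clustered case reads 10 pages. The unclustered case ends up touching a large share of the file's 1,000 pages to collect the same records.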
INDEXING
An index is like having a really smart friend who
knows where everything is.
You don't ask them to get the thing for you; you
ask them where to find it.
THE PHONE BOOK PRINCIPLE
Remember phone books?
Phone books were examples of indexing.
You wanted to find Taylor Batumbakal's phone
number.
You didn't start at page 1 and read every entry
until you found Taylor Batumbakal.
Instead, you flipped to the B section, then to Ba,
then to Batumbakal, and finally to Taylor
Batumbakal.
Each step eliminated vast portions of the search
space.
This is what a B+ tree index does
but instead of alphabetical order, it uses
whatever ordering makes sense for your
data.
B+ TREES
B+ trees are like multi-level phone books.
The top level gives you broad categories
the next level breaks those down into smaller
categories
and so on, until you get to the actual data.
A B+ tree is a variant of a B tree where all data is in
the leaf nodes
and the leaf nodes are also linked together in a list
A B tree is like a binary search tree, but each node
can hold many keys and have more than two children
They're always balanced.
No matter how much data you add or remove, every
leaf stays at the same depth.
This means that finding any piece of data always
takes roughly the same amount of time.
Think of it like a perfectly balanced pyramid.
Whether you're looking for something that should
be on the left side or the right side
you always have to climb the same number
of levels to get there.
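The lookup path can be sketched with a deliberately simplified, read-only B+ tree that is bulk-loaded from sorted data. A real B+ tree also splits and merges nodes on insert and delete, which is left out here. The class names, the fanout of 4, and the assumption of non-empty input are all choices made for the example.

```python
import bisect

class Leaf:
    def __init__(self, keys, values):
        self.keys = keys
        self.values = values
        self.next = None      # link to the next leaf (the linked list part)

class Inner:
    def __init__(self, separators, children):
        self.separators = separators   # separators[i] = smallest key under children[i + 1]
        self.children = children

def smallest_key(node):
    while isinstance(node, Inner):
        node = node.children[0]
    return node.keys[0]

def build(sorted_items, fanout=4):
    """Bulk-load from a non-empty list of (key, value) pairs sorted by key."""
    leaves = []
    for i in range(0, len(sorted_items), fanout):
        chunk = sorted_items[i:i + fanout]
        leaves.append(Leaf([k for k, _ in chunk], [v for _, v in chunk]))
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    level = leaves
    while len(level) > 1:              # stack inner levels until one root remains
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            parents.append(Inner([smallest_key(c) for c in group[1:]], group))
        level = parents
    return level[0]

def find_leaf(root, key):
    """Descend to the leaf that would hold `key`. Returns (leaf, levels visited)."""
    node, levels = root, 1
    while isinstance(node, Inner):
        node = node.children[bisect.bisect_right(node.separators, key)]
        levels += 1
    return node, levels

def search(root, key):
    """Return (value or None, levels visited)."""
    leaf, levels = find_leaf(root, key)
    i = bisect.bisect_left(leaf.keys, key)
    if i < len(leaf.keys) and leaf.keys[i] == key:
        return leaf.values[i], levels
    return None, levels

def range_scan(root, low, high):
    """Find the first leaf, then walk the leaf links until past `high`."""
    leaf, _ = find_leaf(root, low)
    results = []
    while leaf is not None:
        for key, value in zip(leaf.keys, leaf.values):
            if key > high:
                return results
            if key >= low:
                results.append(value)
        leaf = leaf.next
    return results

root = build([(k, f"row {k}") for k in range(0, 100, 2)])
print(search(root, 0))            # ('row 0', 3)
print(search(root, 98))           # ('row 98', 3)
print(search(root, 51))           # (None, 3)
print(range_scan(root, 10, 20))   # rows 10 through 20
```

Every lookup visits the same three levels, whether the key is at the far left, the far right, or missing. The leaf links are what make the range scan cheap once the first leaf is found.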
HASH INDEXES
Hash indexes are like having a magical filing
system where you whisper the name of what
you're looking for
and a genie (almost) instantly hands it to you
Here's how they work:
you take your search key (say, a customer ID)
run it through a mathematical function called
a hash function, and that tells you exactly
which bucket to look in.
If the hash function is good and your data is
distributed evenly, you can find any record in
constant time on average
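The steps above can be sketched as a small bucketed index. It uses Python's built-in `hash` and a fixed number of buckets; `HashIndex`, the bucket count, and the `"page N"` locations are hypothetical choices for the example.

```python
class HashIndex:
    """Maps a search key to a record location through hashed buckets."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # The hash function tells us exactly which bucket to look in.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, location):
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, location)   # overwrite an existing entry
                return
        bucket.append((key, location))

    def get(self, key):
        # Only one bucket is searched, however many keys the index holds.
        for existing, location in self._bucket(key):
            if existing == key:
                return location
        return None

index = HashIndex()
for customer_id in range(1000):
    index.put(customer_id, f"page {customer_id // 100}")

print(index.get(742))    # page 7
print(index.get(5000))   # None
```

With a fixed bucket count the buckets grow as data is added, so a practical index also resizes or adds buckets over time. That part is omitted here.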
BITMAP INDEXES
Bitmap indexes are elegant in their simplicity.
For each possible value of a field, you create a
bitmap
a string of 1s and 0s indicating which records
have that value.
Let's say you have a million customer records, and
you want to index the "gender" field.
You create one bitmap for "Male" and one for
"Female".
The Male bitmap might look like: 1001011010...
where 1 means "this record is Male" and 0 means
"this record is not Male".
Bitmaps are also used in chess engines.
The thing about bitmaps is that you can combine
them using simple boolean operations.
Want to find all male customers in California?
Just take the Male bitmap AND the California
bitmap.
The computer can do these operations incredibly
quickly because they work at the bit level.
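Here is a minimal sketch of the male-customers-in-California query, using Python integers as bitmaps (bit i stands for record i). The five sample records and the helper names are invented for the example.

```python
def build_bitmaps(records, field):
    """Map each distinct value of `field` to an int whose bit i is 1 if record i has it."""
    bitmaps = {}
    for i, record in enumerate(records):
        value = record[field]
        bitmaps[value] = bitmaps.get(value, 0) | (1 << i)
    return bitmaps

def matching_rows(bitmap):
    """List the record numbers whose bit is set."""
    rows = []
    i = 0
    while bitmap:
        if bitmap & 1:
            rows.append(i)
        bitmap >>= 1
        i += 1
    return rows

customers = [
    {"gender": "Male", "state": "California"},
    {"gender": "Female", "state": "California"},
    {"gender": "Male", "state": "Texas"},
    {"gender": "Male", "state": "California"},
    {"gender": "Female", "state": "Texas"},
]

gender = build_bitmaps(customers, "gender")
state = build_bitmaps(customers, "state")

# One bitwise AND answers the whole query.
male_in_california = gender["Male"] & state["California"]
print(matching_rows(male_in_california))   # [0, 3]
```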
THE TRADE-OFFS
THERE'S NO FREE LUNCH
In database systems, there are fundamental
trade-offs.
You can't optimize everything at once.
SPACE VS. TIME
Indexes speed up queries, but they take up space.
More importantly, they slow down writes
because every time you modify data, you
might have to update multiple indexes.
It's like having multiple copies of your address
book
one sorted by name, one by phone number,
one by address.
Finding information is fast because you can use
whichever copy is most convenient.
But every time someone moves or changes
their phone number, you have to update all
the copies.
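The address book analogy can be sketched by counting the extra work each insert causes. `AddressBook` and its fields are hypothetical, and plain dictionaries stand in for real index structures.

```python
class AddressBook:
    """Rows plus one lookup dictionary per indexed field (illustrative only)."""

    def __init__(self, indexed_fields):
        self.rows = {}
        self.indexes = {field: {} for field in indexed_fields}
        self.index_writes = 0   # extra work caused by keeping indexes up to date

    def insert(self, row_id, row):
        self.rows[row_id] = row
        for field, index in self.indexes.items():
            index.setdefault(row[field], set()).add(row_id)
            self.index_writes += 1

    def find(self, field, value):
        """Fast path if the field is indexed, full scan otherwise."""
        if field in self.indexes:
            return [self.rows[i] for i in sorted(self.indexes[field].get(value, ()))]
        return [row for row in self.rows.values() if row[field] == value]

indexed = AddressBook(["name", "phone", "address"])
plain = AddressBook([])
for i in range(1000):
    row = {"name": f"person {i}", "phone": f"555-{i:04d}", "address": f"{i} Main St"}
    indexed.insert(i, row)
    plain.insert(i, row)

print(indexed.index_writes)   # 3000
print(plain.index_writes)     # 0
print(indexed.find("phone", "555-0042") == plain.find("phone", "555-0042"))   # True
```

Both books return the same answer. The indexed one finds it without scanning, and pays for that with three extra writes on every insert.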
READ VS. WRITE PERFORMANCE
Systems optimized for reading (like data warehouses)
often use different strategies than systems optimized
for writing (like transaction processing systems).
For read-heavy systems
you might create lots of indexes
store data in sorted order
and use compression
For write-heavy systems
you might minimize indexes
accept some disorder in exchange for faster
insertions
and avoid expensive maintenance operations
CONSISTENCY VS. PERFORMANCE
In distributed systems:
you can have strong consistency (everyone
sees the same data at the same time)
or high performance
but it's very difficult to have both
It's like trying to keep a group of friends perfectly
synchronized when they're scattered around the
world.
The more you insist on perfect synchronization
the more time you spend coordinating
and the less time you spend actually doing
useful work
CONCLUSION
All the complexity ultimately comes down to a few
simple principles:
1. Locality matters: Keep related things close
together
2. Prediction helps: If you can guess what will be
needed next, you can prepare for it
3. Trade-offs are inevitable: You can't optimize
everything at once
4. Simple ideas scale: The best solutions are often
elegant applications of basic principles
The next time someone tries to impress you with
fancy database terminology
remember that they're really just talking
about organized ways to put things away so
you can find them again.
Everything else is just details
important details
fascinating details
but details nonetheless
