0% found this document useful (0 votes)

14 views14 pages

Database Storage and Indexing

This is a book for any one to follow for Indexing in dbms

Uploaded by

Laxmi Priya Dash

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

14 views14 pages

Database Storage and Indexing

This is a book for any one to follow for Indexing in dbms

Uploaded by

Laxmi Priya Dash

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 14

Database Storage and Indexing (Parts 1–3)

1. Data on External Storage

Definition:
In database systems, external storage refers to the storage devices outside the main memory
(RAM) used to store large volumes of data. Common examples include hard disks (HDDs),
solid-state drives (SSDs), and magnetic tapes. External storage is non-volatile, meaning data
persists even when the system is powered off. DBMSs manage external storage efficiently to
handle read, write, and update operations.

Importance of External Storage:

 Memory Limitation: RAM is limited and costly.

 Large Data Volumes: Databases often contain gigabytes or terabytes of data that cannot
fit into memory.
 Cost-effective: Disks provide a cheaper alternative for large storage.

Characteristics of External Storage:

1. Block-based Access:
o Data is stored in blocks/pages of fixed size (typically 4 KB–16 KB).
o DBMS reads or writes data in blocks, not individual bytes.
2. Seek Time:
o Time required to move the disk arm to the correct track.
3. Rotational Latency:
o Time waiting for the desired block to rotate under the read/write head.
4. Transfer Rate:
o Speed at which data is transferred once the correct block is reached.

Example:
Suppose we have a table:

Student(ID, Name, Marks)

 Total records: 10 million

 Block size: 8 KB

Scenario:

 Query: SELECT * FROM Student WHERE ID=12345;

 Without proper indexing, DBMS would need to scan multiple blocks sequentially.
 Efficient block-based storage allows DBMS to read only the required block, reducing
disk I/O and time.

2. File Organizations and Indexing

2.1 File Organizations

Definition:
File organization refers to the way records are physically arranged and stored in files on disk.
The choice of file organization impacts the efficiency of search, insertion, deletion, and
update operations.

Types of File Organizations:

1. Heap (Unordered) File Organization

 Records stored in no particular order.

 New records are simply appended to the end of the file.
 Search: Requires full file scan, slow for large datasets.
 Best for: Small tables, bulk loading.

Example:
Finding a student with ID=50 in a heap file may require scanning all records until the desired
record is found.

2. Sequential (Ordered) File Organization

 Records stored sorted by a key (e.g., Student ID).

 Search: Efficient using binary search.
 Insertion/Deletion: Expensive; requires shifting/reordering records.
 Best for: Read-heavy applications with rare updates.

Example:

 Records sorted by ID.

 Searching ID=50 is fast (binary search).
 Inserting ID=25 requires moving records to maintain order.

3. Hashed File Organization

 A hash function computes the storage location (bucket) for each record.
 Search: Fast for equality queries (ID=50).
 Limitation: Poor for range queries (ID BETWEEN 10 AND 50).
 Best for: Applications with frequent exact-match lookups.

Example:

 Hash function: hash(ID) = ID mod 100

 ID=50 → bucket 50
 ID=150 → bucket 50
 Searching for ID=50 requires accessing one bucket only.

2.2 Indexing

Definition:
An index is a separate data structure that improves the speed of data retrieval operations by
pointing to the physical location of records. Like the index in a book, it avoids scanning the
whole file.

Purpose of Indexing:

 Quickly locate records without scanning the entire file.

 Reduce disk I/O and improve query performance.

Example:

 Table: 1 million student records (heap file)

 Query: SELECT * FROM Student WHERE ID=12345;
 Without index: Scan millions of records.
 With index: Directly access the block containing ID=12345.

3. Index Data Structures

Definition:
An index in a database is like the index at the back of a book. It helps you find the information
you want without reading the whole book.

 It contains key values (like topics) and pointers to the actual data (like page numbers).
 In databases, indexes help the DBMS find records quickly on the disk.
Why Indexes Are Important:

1. Searching in a huge table without index is slow.

2. Index reduces the number of disk reads.
3. Index helps in fast sorting and range queries.

Structure of an Index:

 Search Key: Column(s) you want to search quickly (e.g., Student.ID)

 Pointer: Physical location/address of the record

3.1 B+ Tree Index (Most Common)

Definition:
A B+ Tree is a tree structure where data is sorted, and all the data pointers are in the leaf
nodes.

Key Points:

 Keys are sorted → easy to find data.

 Good for searching a single value or range of values.
 Tree is balanced, so all leaf nodes are at the same level → same search time.

Example (Easy):

Suppose we have students with IDs: 10, 20, 30, 40, 50, 60, 70

B+ Tree (simplified view):

[30, 50]
/ | \
[10,20] [30,40] [50,60,70]

Explanation:

 To search ID=40:
1. Look at root [30,50] → 40 > 30 and < 50 → go to middle child [30,40]
2. Find ID=40 in leaf node → get pointer to the record

Why B+ Tree is useful:

 Can search for single ID or range like ID 30–50 efficiently.

 Only a few nodes need to be checked → fast even for millions of records.
3.2 Hash Index
Definition:
A hash index uses a hash function to decide where the record will be stored.

Key Points:

 Very fast for exact match searches (=).

 Not good for range searches (BETWEEN).

Easy Example:

 Hash function: hash(ID) = ID mod 10

 Student IDs: 12, 22, 32, 45

Bucket positions:

 12 → bucket 2
 22 → bucket 2
 32 → bucket 2
 45 → bucket 5

Search Example:

 Find ID=32 → hash(32) = 2 → go to bucket 2 → check records → found!

 Very fast O(1) time for exact match.

Limitation:

 Searching for ID < 30 → buckets not in order → slow

3.3 Bitmap Index

Definition:
A bitmap index uses bits (0 or 1) to indicate whether a record has a certain value.

Key Points:

 Best for columns with few distinct values (like Gender, Yes/No).
 Works well in data warehouses with large datasets.
Easy Example:

Table: Students and Gender

Student Gender
1 M
2 F
3 M
4 F

Bitmap for Male: 1 0 1 0

Bitmap for Female: 0 1 0 1

Query: “Find all male students”

 Use bitmap → 1 0 1 0 → Students 1 & 3

Why Bitmap is useful:

 Uses very little space

 Fast filtering using bitwise operations

Comparison of Index Types (Simple Table)

Index Type Best For Range Queries? Easy Example
B+ Tree Sorted data, equality & range queries Yes Find students with ID 30–50
Hash Exact match queries No Find student with ID=32
Bitmap Low-cardinality columns Limited Find all male students

✅ Key Takeaways Index = shortcut to find records quickly

1. B+ Tree: Good for both single value and range searches

2. Hash Index: Super fast for exact matches
3. Bitmap Index: Very efficient for columns with few possible values
4. Choice depends on type of queries and data characteristics
4. Comparison of File Organizations
Definition:
File organization is the way records are stored on disk. Different types affect how fast we can
search, insert, or delete records. Choosing the right one depends on how you use the data.

Types & Comparison:

1. Heap (Unordered) File

 Definition: Records are stored in the order they arrive; no sorting.

 Advantages:
o Very fast to insert new records
o Simple and easy to implement
 Disadvantages:
o Searching is slow → may need to scan the whole file
o Not efficient for large tables
 Example: Add students one by one → searching ID=50 → scan every record until found.
 Best for: Small tables or bulk loading data

2. Sequential (Ordered) File

 Definition: Records are stored in sorted order based on a key (e.g., ID).
 Advantages:
o Searching is fast → can use binary search
o Supports range queries (e.g., ID BETWEEN 10 AND 50)
 Disadvantages:
o Insertion/deletion is slow → need to shift records to maintain order
 Example: Students sorted by ID → search for ID=40 → binary search finds it quickly.
Insert ID=25 → shift records to keep order.
 Best for: Read-heavy tables, rarely updated

3. Hashed File

 Definition: Uses a hash function to decide where each record is stored.

 Advantages:
o Extremely fast equality search (ID=50)
 Disadvantages:
o Cannot efficiently do range queries (ID BETWEEN 10 AND 50)
 Example: hash(ID) = ID mod 100 → ID=50 → bucket 50 → access record
immediately.
 Best for: Frequent equality lookups
Quick Comparison Table:

File
Advantages Disadvantages Best Use Case
Organization
Heap - Very slow search - Not Small tables, bulk
- Fast insertion - Simple
(Unordered) efficient for large datasets loading
- Fast search using binary Read-heavy
Sequential - Slow insertion and deletion
search - Supports range applications with rare
(Ordered) - Requires shifting records
queries updates
- Very fast equality search - Poor for range queries - Frequent equality
Hashed
- Direct access to records Buckets may overflow lookups

5. Indexes and Performance Tuning

Definition:
Performance tuning is the process of making database queries run faster. Indexes are one of
the most important tools.

How Indexes Help:

1. Faster Searches: Can directly locate records using pointers.

2. Less Disk Access: Only necessary blocks are read.
3. Sorting: Queries with ORDER BY can use indexes.
4. Joins: Indexes on foreign keys speed up joins.

Trade-offs:

 Indexes take extra space on disk

 Slows down INSERT, UPDATE, DELETE because indexes must be updated

Example:

CREATE INDEX idx_student_name ON Student(Name);

 Query: SELECT * FROM Student WHERE Name='Ravi'; → fast

 Inserting new student → index must update → small delay

Tip for Students: Think of index as a shortcut. It saves time when reading, but adds a little
work when writing.
6. Guidelines for Index Selection
Definition:
Index selection is the process of deciding which columns of a table should have an index so
that queries run faster, while minimizing the cost of extra storage and slower updates. Good
index selection is crucial for database performance tuning.

Why Guidelines are Important:

 Indexes make search queries faster but consume storage.

 Each index slows down INSERT, UPDATE, DELETE, because the DBMS must update
indexes as well as the table.
 Poor index selection → wasted space, slower updates, and little performance
improvement.

Step-by-Step Guidelines
1. Always Index Primary Keys

 Reason: Primary keys are unique identifiers and frequently used in queries and joins.
 Example:

CREATE INDEX idx_student_id ON Student(ID);

 Query Use: SELECT * FROM Student WHERE ID=123; → fast lookup

 Tip: Almost all DBMS automatically index primary keys, but if not, create one manually.

2. Index Foreign Keys

 Reason: Foreign keys are used in joins between tables. Indexing them speeds up join
operations.
 Example:

CREATE INDEX idx_enrollment_studentid ON Enrollment(StudentID);

 Query Use:

SELECT * FROM Student S

JOIN Enrollment E ON S.ID = E.StudentID;

 Without index → full table scan

 With index → DBMS quickly finds matching records

3. Index Columns Frequently Used in WHERE, ORDER BY, or JOIN

 Reason: Columns that are often searched, filtered, or sorted benefit most from
indexing.
 Example 1: WHERE Name='Ravi' → index Name
 Example 2: ORDER BY Marks DESC → index Marks
 Example 3: Joining TeacherID → index TeacherID

Tip:

 Check query logs to see which columns are frequently used in filtering.

4. Avoid Indexing Small Tables

 Reason: Small tables can be scanned quickly. Indexes add overhead without benefit.
 Example: Table with 10–20 records → full scan faster than index lookup
 Tip: Only index tables with hundreds or thousands of rows.

5. Avoid Over-Indexing

 Reason: Each index slows down INSERT, UPDATE, DELETE. Too many indexes →
overhead.
 Tip: Only index columns that are frequently used in queries.
 Example: Don’t index rarely searched columns like Notes or Comments unless
necessary.

6. Use Composite Indexes for Multi-Column Queries

 Reason: If queries often filter on multiple columns together, create a composite index
instead of separate indexes.
 Example:
Query: WHERE Name='Ravi' AND Marks>80

CREATE INDEX idx_name_marks ON Student(Name, Marks);

 Benefit: DBMS uses single index → faster than two separate indexes
Tip:

 Order of columns matters: The first column in the composite index should be the most
selective (has more unique values).
 Example: Marks may have few distinct values → Name should come first.

7. Consider Column Cardinality

 Definition: Cardinality = number of distinct values in a column

 High-cardinality columns (e.g., ID, Email) → good for indexing
 Low-cardinality columns (e.g., Gender, Yes/No) → may be better for bitmap indexes

8. Use Covering Indexes When Possible

 Definition: An index that contains all columns needed for a query

 Benefit: DBMS can answer query directly from the index, without reading the main
table → faster
 Example:

CREATE INDEX idx_marks_grade ON Student(Marks, Grade);

Query: SELECT Marks, Grade FROM Student WHERE Marks>90;

→ DBMS uses only index → very fast

9. Monitor and Tune Regularly

 Reason: Database usage changes → queries evolve → some indexes become useless
 Tip:
o Drop unused indexes
o Add indexes for new frequent queries
o Use EXPLAIN PLAN (or DBMS query analyzer) to see which queries benefit
from indexes
7. Basic Examples of Index Selection
Definition:
Index selection examples show how to create indexes on database tables to make queries
faster. Choosing the right index depends on the type of query and the columns involved.

1. Single-Column Index
Definition:
An index created on one column to speed up searches that filter using that column.

Example:

CREATE INDEX idx_student_id ON Student(ID);

Explanation:

 Query:

SELECT * FROM Student WHERE ID=12345;

 Without index → DBMS scans all records (slow).

 With index → DBMS directly goes to the record → very fast.

Tip:

 Best for primary keys or columns frequently searched individually.

 Easy to maintain; small overhead for INSERT/UPDATE/DELETE.

2. Composite Index (Multi-Column Index)

Definition:
An index created on two or more columns. Helps queries that filter using multiple columns
together.

Example:

CREATE INDEX idx_student_name_marks ON Student(Name, Marks);

Explanation:

 Query:

SELECT * FROM Student WHERE Name='Ravi' AND Marks>80;

 The DBMS uses one composite index to quickly find records matching both conditions.
 Order of columns matters:
o The first column should be most selective (more unique values).
o In this example, Name comes first because Marks may have many repeated values.

Tip:

 Use composite indexes when queries often filter multiple columns together.
 Avoid creating separate single-column indexes for the same columns → less efficient.

3. Covering Index
Definition:
An index that includes all columns needed by a query. DBMS can answer the query directly
from the index without touching the main table.

Example:

CREATE INDEX idx_student_marks_grade ON Student(Marks, Grade);

 Query:

SELECT Marks, Grade FROM Student WHERE Marks > 90;

Explanation:

 The index contains both Marks and Grade.

 DBMS reads the index only → no need to access the main table → faster query
execution.

Tip:

 Best for read-heavy queries that access a few columns frequently.

 Reduces disk I/O significantly.
 Slightly larger index size due to multiple columns.
4. When to Use Which Index
Index Type When to Use Advantages Example Query
Single- Frequently searched WHERE ID=123
Simple, fast
Column column
Filter on multiple Efficient multi-column WHERE Name='Ravi' AND
Composite Marks>80
columns together search
Query needs only indexed Avoid table access, SELECT Marks, Grade WHERE
Covering Marks>90
columns faster

5. Key Tips -
1. Analyze your queries before creating indexes → choose columns used often in WHERE,
JOIN, ORDER BY.
2. Single-column indexes → fast for equality search.
3. Composite indexes → use when filtering on multiple columns together.
4. Covering indexes → use when query only needs indexed columns → avoids table
lookup.
5. Maintain balance: Indexes improve reads but slightly slow writes.

Module Iippt
No ratings yet
Module Iippt
27 pages
File Structure and Indexing
No ratings yet
File Structure and Indexing
7 pages
Database Storage & Indexing Guide
No ratings yet
Database Storage & Indexing Guide
41 pages
V Unit
No ratings yet
V Unit
36 pages
V Unit
No ratings yet
V Unit
15 pages
Chap. 2 File Organization and Indexing: Abel J.P. Gomes
No ratings yet
Chap. 2 File Organization and Indexing: Abel J.P. Gomes
20 pages
Dbms 3 Sem
No ratings yet
Dbms 3 Sem
31 pages
Indexing
No ratings yet
Indexing
62 pages
Lecture9 PDF
No ratings yet
Lecture9 PDF
45 pages
Lesson 9 Lecture9
No ratings yet
Lesson 9 Lecture9
45 pages
Class 6
No ratings yet
Class 6
15 pages
DBMS File & Index Organization
No ratings yet
DBMS File & Index Organization
10 pages
Unit 5 DBMS
No ratings yet
Unit 5 DBMS
38 pages
DBMS Unit 5
No ratings yet
DBMS Unit 5
58 pages
Unit5 File Organization
No ratings yet
Unit5 File Organization
112 pages
Lec 7
No ratings yet
Lec 7
34 pages
DBMS Storage and Indexing
No ratings yet
DBMS Storage and Indexing
80 pages
Lecture12 (CNC 312)
No ratings yet
Lecture12 (CNC 312)
36 pages
Lt20 21 Index
No ratings yet
Lt20 21 Index
28 pages
DBMS-U5 Notes
No ratings yet
DBMS-U5 Notes
16 pages
File Organization
No ratings yet
File Organization
9 pages
Unit-5 DBMS
No ratings yet
Unit-5 DBMS
28 pages
Unit08 DBMS
100% (1)
Unit08 DBMS
45 pages
Unit 4
No ratings yet
Unit 4
18 pages
File Organizations and Indexing: R&G Chapter 8
No ratings yet
File Organizations and Indexing: R&G Chapter 8
40 pages
Storage and File Management
100% (1)
Storage and File Management
16 pages
Dbms Imp Qs Chatgpt
No ratings yet
Dbms Imp Qs Chatgpt
19 pages
Efficient Database File Organization
No ratings yet
Efficient Database File Organization
26 pages
Database File Organization Guide
No ratings yet
Database File Organization Guide
26 pages
W5 Storage Files Indexing pt1
No ratings yet
W5 Storage Files Indexing pt1
61 pages
DINLect 1
No ratings yet
DINLect 1
69 pages
22-File Organization-06-09-2024
No ratings yet
22-File Organization-06-09-2024
23 pages
DBMS Unit 5 Notes
No ratings yet
DBMS Unit 5 Notes
28 pages
File Storage and Indexing Guide
No ratings yet
File Storage and Indexing Guide
13 pages
Mod4 Chap10 - 11 Indexing
No ratings yet
Mod4 Chap10 - 11 Indexing
77 pages
DBMS Unit5
No ratings yet
DBMS Unit5
40 pages
Indexing
No ratings yet
Indexing
6 pages
DBMS Unit 4
No ratings yet
DBMS Unit 4
22 pages
File Storage and Indexing: Lesson 13 Cs 3200 Kathleen Durant PHD
No ratings yet
File Storage and Indexing: Lesson 13 Cs 3200 Kathleen Durant PHD
46 pages
Unit - Iii
No ratings yet
Unit - Iii
16 pages
DBMS Unit9
No ratings yet
DBMS Unit9
44 pages
Lecture 16
No ratings yet
Lecture 16
19 pages
Dbms PPT For Chapter 7
No ratings yet
Dbms PPT For Chapter 7
45 pages
DBMS Unit 3
No ratings yet
DBMS Unit 3
81 pages
Dbms Unit 5 Notes
No ratings yet
Dbms Unit 5 Notes
23 pages
UNIT 5 Dbms
No ratings yet
UNIT 5 Dbms
25 pages
Dbms r18 Unit 5 Notes
No ratings yet
Dbms r18 Unit 5 Notes
24 pages
SQL Indexes
No ratings yet
SQL Indexes
20 pages
Index 1
No ratings yet
Index 1
25 pages
Layers of A DBMS
No ratings yet
Layers of A DBMS
38 pages
Lec20Indexing v1
No ratings yet
Lec20Indexing v1
57 pages
Storage and Indexing Methods
No ratings yet
Storage and Indexing Methods
43 pages
File Organization
No ratings yet
File Organization
11 pages
Basic Daily Prayers 5
No ratings yet
Basic Daily Prayers 5
29 pages
TD Form 4 W6 Session 16 - Quadrilaterals Part 2
No ratings yet
TD Form 4 W6 Session 16 - Quadrilaterals Part 2
29 pages
Chapter 10
No ratings yet
Chapter 10
45 pages
Research Proposal RMU
No ratings yet
Research Proposal RMU
5 pages
Venus's Position in Lord Sri Ram's Birth Chart
No ratings yet
Venus's Position in Lord Sri Ram's Birth Chart
5 pages
Inspiring Insights From An English Teaching Scene PDF
100% (1)
Inspiring Insights From An English Teaching Scene PDF
195 pages
72 Cleanliness US
No ratings yet
72 Cleanliness US
6 pages
Omnivue Traffic Control System
No ratings yet
Omnivue Traffic Control System
6 pages
Passive Voice Exercise
No ratings yet
Passive Voice Exercise
4 pages
Enki's Puzzle Chapter 1
No ratings yet
Enki's Puzzle Chapter 1
11 pages
Madurez Escolar PDF Download
No ratings yet
Madurez Escolar PDF Download
3 pages
David's Secret
No ratings yet
David's Secret
4 pages
Operating Systems Quiz Guide
No ratings yet
Operating Systems Quiz Guide
64 pages
01 인공지능
100% (1)
01 인공지능
372 pages
UNIT 05 Extra Grammar Exercises
50% (2)
UNIT 05 Extra Grammar Exercises
2 pages
Bauminger (2002) PDF
No ratings yet
Bauminger (2002) PDF
16 pages
Cultural Body Language Guide
No ratings yet
Cultural Body Language Guide
3 pages
Red Hat Enterprise Linux 5 Logical Volume Manager Administration
No ratings yet
Red Hat Enterprise Linux 5 Logical Volume Manager Administration
122 pages
Education Paths for School-Leavers
No ratings yet
Education Paths for School-Leavers
7 pages
Day2 Cor 015
No ratings yet
Day2 Cor 015
5 pages
7IG Volume 4 GA Syllabus and Schedule 1-2
No ratings yet
7IG Volume 4 GA Syllabus and Schedule 1-2
2 pages
Photoelectric Effect Answers
No ratings yet
Photoelectric Effect Answers
4 pages
Acorn Yaskawa 5
No ratings yet
Acorn Yaskawa 5
1 page
Mdcs Lab Questions
No ratings yet
Mdcs Lab Questions
5 pages
Unseen Comprehention Passage: 1. Read The Passage and Answer The Questions That Follow
No ratings yet
Unseen Comprehention Passage: 1. Read The Passage and Answer The Questions That Follow
4 pages
Theology & Christian Education
No ratings yet
Theology & Christian Education
29 pages
Aiwa Cs p700
No ratings yet
Aiwa Cs p700
18 pages
Teaching and Assessment of Grammar
No ratings yet
Teaching and Assessment of Grammar
11 pages
English Practice Test This Sample English A2 Core Test Provides A Test Structure
No ratings yet
English Practice Test This Sample English A2 Core Test Provides A Test Structure
17 pages
Journey to Nirvikalpa Samadhi
No ratings yet
Journey to Nirvikalpa Samadhi
19 pages