0% found this document useful (0 votes)
14 views14 pages

Database Storage and Indexing

This is a book for any one to follow for Indexing in dbms

Uploaded by

Laxmi Priya Dash
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views14 pages

Database Storage and Indexing

This is a book for any one to follow for Indexing in dbms

Uploaded by

Laxmi Priya Dash
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 14

Database Storage and Indexing (Parts 1–3)

1. Data on External Storage


Definition:
In database systems, external storage refers to the storage devices outside the main memory
(RAM) used to store large volumes of data. Common examples include hard disks (HDDs),
solid-state drives (SSDs), and magnetic tapes. External storage is non-volatile, meaning data
persists even when the system is powered off. DBMSs manage external storage efficiently to
handle read, write, and update operations.

Importance of External Storage:

 Memory Limitation: RAM is limited and costly.


 Large Data Volumes: Databases often contain gigabytes or terabytes of data that cannot
fit into memory.
 Cost-effective: Disks provide a cheaper alternative for large storage.

Characteristics of External Storage:

1. Block-based Access:
o Data is stored in blocks/pages of fixed size (typically 4 KB–16 KB).
o DBMS reads or writes data in blocks, not individual bytes.
2. Seek Time:
o Time required to move the disk arm to the correct track.
3. Rotational Latency:
o Time waiting for the desired block to rotate under the read/write head.
4. Transfer Rate:
o Speed at which data is transferred once the correct block is reached.

Example:
Suppose we have a table:

Student(ID, Name, Marks)

 Total records: 10 million


 Block size: 8 KB

Scenario:

 Query: SELECT * FROM Student WHERE ID=12345;


 Without proper indexing, DBMS would need to scan multiple blocks sequentially.
 Efficient block-based storage allows DBMS to read only the required block, reducing
disk I/O and time.

2. File Organizations and Indexing


2.1 File Organizations

Definition:
File organization refers to the way records are physically arranged and stored in files on disk.
The choice of file organization impacts the efficiency of search, insertion, deletion, and
update operations.

Types of File Organizations:

1. Heap (Unordered) File Organization

 Records stored in no particular order.


 New records are simply appended to the end of the file.
 Search: Requires full file scan, slow for large datasets.
 Best for: Small tables, bulk loading.

Example:
Finding a student with ID=50 in a heap file may require scanning all records until the desired
record is found.

2. Sequential (Ordered) File Organization

 Records stored sorted by a key (e.g., Student ID).


 Search: Efficient using binary search.
 Insertion/Deletion: Expensive; requires shifting/reordering records.
 Best for: Read-heavy applications with rare updates.

Example:

 Records sorted by ID.


 Searching ID=50 is fast (binary search).
 Inserting ID=25 requires moving records to maintain order.

3. Hashed File Organization


 A hash function computes the storage location (bucket) for each record.
 Search: Fast for equality queries (ID=50).
 Limitation: Poor for range queries (ID BETWEEN 10 AND 50).
 Best for: Applications with frequent exact-match lookups.

Example:

 Hash function: hash(ID) = ID mod 100


 ID=50 → bucket 50
 ID=150 → bucket 50
 Searching for ID=50 requires accessing one bucket only.

2.2 Indexing

Definition:
An index is a separate data structure that improves the speed of data retrieval operations by
pointing to the physical location of records. Like the index in a book, it avoids scanning the
whole file.

Purpose of Indexing:

 Quickly locate records without scanning the entire file.


 Reduce disk I/O and improve query performance.

Example:

 Table: 1 million student records (heap file)


 Query: SELECT * FROM Student WHERE ID=12345;
 Without index: Scan millions of records.
 With index: Directly access the block containing ID=12345.

3. Index Data Structures


Definition:
An index in a database is like the index at the back of a book. It helps you find the information
you want without reading the whole book.

 It contains key values (like topics) and pointers to the actual data (like page numbers).
 In databases, indexes help the DBMS find records quickly on the disk.
Why Indexes Are Important:

1. Searching in a huge table without index is slow.


2. Index reduces the number of disk reads.
3. Index helps in fast sorting and range queries.

Structure of an Index:

 Search Key: Column(s) you want to search quickly (e.g., Student.ID)


 Pointer: Physical location/address of the record

3.1 B+ Tree Index (Most Common)


Definition:
A B+ Tree is a tree structure where data is sorted, and all the data pointers are in the leaf
nodes.

Key Points:

 Keys are sorted → easy to find data.


 Good for searching a single value or range of values.
 Tree is balanced, so all leaf nodes are at the same level → same search time.

Example (Easy):

Suppose we have students with IDs: 10, 20, 30, 40, 50, 60, 70

B+ Tree (simplified view):

[30, 50]
/ | \
[10,20] [30,40] [50,60,70]

Explanation:

 To search ID=40:
1. Look at root [30,50] → 40 > 30 and < 50 → go to middle child [30,40]
2. Find ID=40 in leaf node → get pointer to the record

Why B+ Tree is useful:

 Can search for single ID or range like ID 30–50 efficiently.


 Only a few nodes need to be checked → fast even for millions of records.
3.2 Hash Index
Definition:
A hash index uses a hash function to decide where the record will be stored.

Key Points:

 Very fast for exact match searches (=).


 Not good for range searches (BETWEEN).

Easy Example:

 Hash function: hash(ID) = ID mod 10


 Student IDs: 12, 22, 32, 45

Bucket positions:

 12 → bucket 2
 22 → bucket 2
 32 → bucket 2
 45 → bucket 5

Search Example:

 Find ID=32 → hash(32) = 2 → go to bucket 2 → check records → found!


 Very fast O(1) time for exact match.

Limitation:

 Searching for ID < 30 → buckets not in order → slow

3.3 Bitmap Index


Definition:
A bitmap index uses bits (0 or 1) to indicate whether a record has a certain value.

Key Points:

 Best for columns with few distinct values (like Gender, Yes/No).
 Works well in data warehouses with large datasets.
Easy Example:

Table: Students and Gender

Student Gender
1 M
2 F
3 M
4 F

Bitmap for Male: 1 0 1 0


Bitmap for Female: 0 1 0 1

Query: “Find all male students”

 Use bitmap → 1 0 1 0 → Students 1 & 3

Why Bitmap is useful:

 Uses very little space


 Fast filtering using bitwise operations

Comparison of Index Types (Simple Table)


Index Type Best For Range Queries? Easy Example
B+ Tree Sorted data, equality & range queries Yes Find students with ID 30–50
Hash Exact match queries No Find student with ID=32
Bitmap Low-cardinality columns Limited Find all male students

✅ Key Takeaways Index = shortcut to find records quickly

1. B+ Tree: Good for both single value and range searches


2. Hash Index: Super fast for exact matches
3. Bitmap Index: Very efficient for columns with few possible values
4. Choice depends on type of queries and data characteristics
4. Comparison of File Organizations
Definition:
File organization is the way records are stored on disk. Different types affect how fast we can
search, insert, or delete records. Choosing the right one depends on how you use the data.

Types & Comparison:

1. Heap (Unordered) File

 Definition: Records are stored in the order they arrive; no sorting.


 Advantages:
o Very fast to insert new records
o Simple and easy to implement
 Disadvantages:
o Searching is slow → may need to scan the whole file
o Not efficient for large tables
 Example: Add students one by one → searching ID=50 → scan every record until found.
 Best for: Small tables or bulk loading data

2. Sequential (Ordered) File

 Definition: Records are stored in sorted order based on a key (e.g., ID).
 Advantages:
o Searching is fast → can use binary search
o Supports range queries (e.g., ID BETWEEN 10 AND 50)
 Disadvantages:
o Insertion/deletion is slow → need to shift records to maintain order
 Example: Students sorted by ID → search for ID=40 → binary search finds it quickly.
Insert ID=25 → shift records to keep order.
 Best for: Read-heavy tables, rarely updated

3. Hashed File

 Definition: Uses a hash function to decide where each record is stored.


 Advantages:
o Extremely fast equality search (ID=50)
 Disadvantages:
o Cannot efficiently do range queries (ID BETWEEN 10 AND 50)
 Example: hash(ID) = ID mod 100 → ID=50 → bucket 50 → access record
immediately.
 Best for: Frequent equality lookups
Quick Comparison Table:

File
Advantages Disadvantages Best Use Case
Organization
Heap - Very slow search - Not Small tables, bulk
- Fast insertion - Simple
(Unordered) efficient for large datasets loading
- Fast search using binary Read-heavy
Sequential - Slow insertion and deletion
search - Supports range applications with rare
(Ordered) - Requires shifting records
queries updates
- Very fast equality search - Poor for range queries - Frequent equality
Hashed
- Direct access to records Buckets may overflow lookups

5. Indexes and Performance Tuning


Definition:
Performance tuning is the process of making database queries run faster. Indexes are one of
the most important tools.

How Indexes Help:

1. Faster Searches: Can directly locate records using pointers.


2. Less Disk Access: Only necessary blocks are read.
3. Sorting: Queries with ORDER BY can use indexes.
4. Joins: Indexes on foreign keys speed up joins.

Trade-offs:

 Indexes take extra space on disk


 Slows down INSERT, UPDATE, DELETE because indexes must be updated

Example:

CREATE INDEX idx_student_name ON Student(Name);

 Query: SELECT * FROM Student WHERE Name='Ravi'; → fast


 Inserting new student → index must update → small delay

Tip for Students: Think of index as a shortcut. It saves time when reading, but adds a little
work when writing.
6. Guidelines for Index Selection
Definition:
Index selection is the process of deciding which columns of a table should have an index so
that queries run faster, while minimizing the cost of extra storage and slower updates. Good
index selection is crucial for database performance tuning.

Why Guidelines are Important:

 Indexes make search queries faster but consume storage.


 Each index slows down INSERT, UPDATE, DELETE, because the DBMS must update
indexes as well as the table.
 Poor index selection → wasted space, slower updates, and little performance
improvement.

Step-by-Step Guidelines
1. Always Index Primary Keys

 Reason: Primary keys are unique identifiers and frequently used in queries and joins.
 Example:

CREATE INDEX idx_student_id ON Student(ID);

 Query Use: SELECT * FROM Student WHERE ID=123; → fast lookup


 Tip: Almost all DBMS automatically index primary keys, but if not, create one manually.

2. Index Foreign Keys

 Reason: Foreign keys are used in joins between tables. Indexing them speeds up join
operations.
 Example:

CREATE INDEX idx_enrollment_studentid ON Enrollment(StudentID);

 Query Use:

SELECT * FROM Student S


JOIN Enrollment E ON S.ID = E.StudentID;

 Without index → full table scan


 With index → DBMS quickly finds matching records

3. Index Columns Frequently Used in WHERE, ORDER BY, or JOIN

 Reason: Columns that are often searched, filtered, or sorted benefit most from
indexing.
 Example 1: WHERE Name='Ravi' → index Name
 Example 2: ORDER BY Marks DESC → index Marks
 Example 3: Joining TeacherID → index TeacherID

Tip:

 Check query logs to see which columns are frequently used in filtering.

4. Avoid Indexing Small Tables

 Reason: Small tables can be scanned quickly. Indexes add overhead without benefit.
 Example: Table with 10–20 records → full scan faster than index lookup
 Tip: Only index tables with hundreds or thousands of rows.

5. Avoid Over-Indexing

 Reason: Each index slows down INSERT, UPDATE, DELETE. Too many indexes →
overhead.
 Tip: Only index columns that are frequently used in queries.
 Example: Don’t index rarely searched columns like Notes or Comments unless
necessary.

6. Use Composite Indexes for Multi-Column Queries

 Reason: If queries often filter on multiple columns together, create a composite index
instead of separate indexes.
 Example:
Query: WHERE Name='Ravi' AND Marks>80

CREATE INDEX idx_name_marks ON Student(Name, Marks);

 Benefit: DBMS uses single index → faster than two separate indexes
Tip:

 Order of columns matters: The first column in the composite index should be the most
selective (has more unique values).
 Example: Marks may have few distinct values → Name should come first.

7. Consider Column Cardinality

 Definition: Cardinality = number of distinct values in a column


 High-cardinality columns (e.g., ID, Email) → good for indexing
 Low-cardinality columns (e.g., Gender, Yes/No) → may be better for bitmap indexes

8. Use Covering Indexes When Possible

 Definition: An index that contains all columns needed for a query


 Benefit: DBMS can answer query directly from the index, without reading the main
table → faster
 Example:

CREATE INDEX idx_marks_grade ON Student(Marks, Grade);

Query: SELECT Marks, Grade FROM Student WHERE Marks>90;


→ DBMS uses only index → very fast

9. Monitor and Tune Regularly

 Reason: Database usage changes → queries evolve → some indexes become useless
 Tip:
o Drop unused indexes
o Add indexes for new frequent queries
o Use EXPLAIN PLAN (or DBMS query analyzer) to see which queries benefit
from indexes
7. Basic Examples of Index Selection
Definition:
Index selection examples show how to create indexes on database tables to make queries
faster. Choosing the right index depends on the type of query and the columns involved.

1. Single-Column Index
Definition:
An index created on one column to speed up searches that filter using that column.

Example:

CREATE INDEX idx_student_id ON Student(ID);

Explanation:

 Query:

SELECT * FROM Student WHERE ID=12345;

 Without index → DBMS scans all records (slow).


 With index → DBMS directly goes to the record → very fast.

Tip:

 Best for primary keys or columns frequently searched individually.


 Easy to maintain; small overhead for INSERT/UPDATE/DELETE.

2. Composite Index (Multi-Column Index)


Definition:
An index created on two or more columns. Helps queries that filter using multiple columns
together.

Example:

CREATE INDEX idx_student_name_marks ON Student(Name, Marks);


Explanation:

 Query:

SELECT * FROM Student WHERE Name='Ravi' AND Marks>80;

 The DBMS uses one composite index to quickly find records matching both conditions.
 Order of columns matters:
o The first column should be most selective (more unique values).
o In this example, Name comes first because Marks may have many repeated values.

Tip:

 Use composite indexes when queries often filter multiple columns together.
 Avoid creating separate single-column indexes for the same columns → less efficient.

3. Covering Index
Definition:
An index that includes all columns needed by a query. DBMS can answer the query directly
from the index without touching the main table.

Example:

CREATE INDEX idx_student_marks_grade ON Student(Marks, Grade);

 Query:

SELECT Marks, Grade FROM Student WHERE Marks > 90;

Explanation:

 The index contains both Marks and Grade.


 DBMS reads the index only → no need to access the main table → faster query
execution.

Tip:

 Best for read-heavy queries that access a few columns frequently.


 Reduces disk I/O significantly.
 Slightly larger index size due to multiple columns.
4. When to Use Which Index
Index Type When to Use Advantages Example Query
Single- Frequently searched WHERE ID=123
Simple, fast
Column column
Filter on multiple Efficient multi-column WHERE Name='Ravi' AND
Composite Marks>80
columns together search
Query needs only indexed Avoid table access, SELECT Marks, Grade WHERE
Covering Marks>90
columns faster

5. Key Tips -
1. Analyze your queries before creating indexes → choose columns used often in WHERE,
JOIN, ORDER BY.
2. Single-column indexes → fast for equality search.
3. Composite indexes → use when filtering on multiple columns together.
4. Covering indexes → use when query only needs indexed columns → avoids table
lookup.
5. Maintain balance: Indexes improve reads but slightly slow writes.

You might also like