0% found this document useful (0 votes)
16 views22 pages

DBMS Unit 4

Physical storage media encompasses various hardware devices for digital data storage, including magnetic, optical, solid-state, and magneto-optical types. The storage device hierarchy organizes these devices based on speed, cost, capacity, and proximity to the CPU, while file organization affects data retrieval efficiency through different methods. Indexing techniques enhance data access in databases, with hashing providing fast retrieval through key mapping, and B+ Trees offering efficient indexing for large datasets.

Uploaded by

fosefan745
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
16 views22 pages

DBMS Unit 4

Physical storage media encompasses various hardware devices for digital data storage, including magnetic, optical, solid-state, and magneto-optical types. The storage device hierarchy organizes these devices based on speed, cost, capacity, and proximity to the CPU, while file organization affects data retrieval efficiency through different methods. Indexing techniques enhance data access in databases, with hashing providing fast retrieval through key mapping, and B+ Trees offering efficient indexing for large datasets.

Uploaded by

fosefan745
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 22

Overview of Physical Storage Media – Notes

🔹 Definition:
Physical storage media refers to hardware devices used to store digital data, either
temporarily or permanently.

🔹 Types of Physical Storage Media:


1. Magnetic Storage
Examples: Hard Disk Drive (HDD), Magnetic Tapes, Floppy Disks
Features:
High storage capacity
Cost-effective
Slower than solid-state drives
Suitable for long-term storage
2. Optical Storage
Examples: CD, DVD, Blu-ray Disc
Features:
Uses laser technology for reading/writing
Portable and inexpensive
Commonly used for media distribution and backups
3. Solid-State Storage
Examples: Solid-State Drive (SSD), USB Flash Drive, SD Card
Features:
No moving parts (more durable)
High speed and performance
Higher cost per GB than HDD
Common in laptops, smartphones
4. Magneto-Optical Storage
Examples: MO Disks
Features:
Combines magnetic and optical technology
Rewritable and durable
Rarely used in modern systems

🔹 Key Performance Metrics:


Metric Description

Access Time Time to locate and access data

Data Transfer Rate Speed of reading/writing data

Storage Capacity Total data it can hold

Durability Resistance to physical damage or


environmental wear

Cost per Bit Expense of storing each unit of data

📘 Storage Device Hierarchy – Notes


🔹 Definition:
The storage device hierarchy refers to the organization of storage devices in a computer
system based on their speed, cost, capacity, and proximity to the CPU.

🔹 Key Characteristics:
Level Speed Cost per Bit Capacity Volatility

Top (Closer to Fastest Most Expensive Small Volatile


CPU)

Bottom (Farther Slowest Cheapest Very Large Non-volatile


from CPU)

🔹 Typical Hierarchical Levels:


1. Registers
Located inside the CPU
Fastest and most expensive
Very limited in size
Volatile
2. Cache Memory
Small memory close to the CPU
Stores frequently accessed data
Very fast and expensive
Volatile
3. Main Memory (RAM)
Temporary storage for running programs
Faster than secondary storage
Volatile
4. Secondary Storage
Examples: HDD, SSD
Used for long-term data storage
Slower and cheaper than RAM
Non-volatile
5. Tertiary Storage
Examples: Optical Discs, Magnetic Tapes
Used for backup and archival
Very large capacity
Very slow and cheap
Non-volatile
6. Cloud/Remote Storage (Optional Layer)
Online storage (e.g., Google Drive, AWS)
Depends on internet speed
Accessible anywhere
Used for backup, sync, collaboration

🔹 Summary:
As we go up the hierarchy: Speed increases, Cost increases, Capacity decreases
As we go down the hierarchy: Speed decreases, Cost decreases, Capacity increases

File Organization – Notes


🔹 Definition:
File organization refers to the way records are arranged and stored in a file on storage media.
It affects the efficiency of data retrieval, insertion, and deletion.

🔹 Types of File Organization:


1. Sequential File Organization
Records are stored in a linear sequence, one after the other.
Best suited for: Batch processing, large volume of data.
Pros:
Simple to implement
Efficient for reading the entire file
Cons:
Slow for random access
Insertion/deletion is costly
2. Heap (Unordered) File Organization
Records are stored randomly as they come.
Best suited for: Small files or when no specific order is needed.
Pros:
Fast insertions
Cons:
Slow search
Inefficient for updates/deletes
3. Hashing File Organization
Uses a hash function to compute the address of a record.
Best suited for: Direct access to individual records.
Pros:
Fast search for exact matches
Cons:
Poor performance on range queries
Collisions need handling
4. Indexed File Organization
An index is created to map keys to their locations in the file.
Best suited for: Applications with frequent searching.
Pros:
Fast access
Good for large datasets
Cons:
Overhead of maintaining index
Complex implementation
5. Clustered File Organization
Related records from different tables are stored together (used in DBMS).
Improves performance for join operations.

🔹 Comparison Table:
Type Search Speed Insert/Delete Ordered? Use Case

Sequential Slow (for random) Slow Yes Payroll, logs

Heap Slow Fast No Temporary data

Hashing Very Fast Moderate No Exact match queries

Indexed Fast Moderate Yes Large databases, search-

heavy apps

Clustered Fast (for joins) Moderate Yes Relational DB operations

Spanned vs Unspanned Records – Notes


🔹 Definition:
When storing records in blocks (or pages), a record may either:

Fit entirely within a single block (Unspanned), or


Be split across two or more blocks (Spanned)

🔹 1. Unspanned Records
A record must be stored entirely within one block.
If a record does not fit, it is moved to the next block.

✅ Advantages:
Simpler to manage
Faster access (only one block needs to be read)

❌ Disadvantages:
Wastes space (internal fragmentation)
Less flexible for variable-length records

🔹 2. Spanned Records
A record can be split across multiple blocks.
The system keeps track of parts of the record across blocks.

✅ Advantages:
Better space utilization (less wasted space)
Can store large records even in small blocks

❌ Disadvantages:
More complex to manage
Slower access (requires reading multiple blocks)

🔹 Example:
Assume block size = 100 bytes, record size = 70 bytes

Unspanned: Only 1 record per block (30 bytes wasted)


Spanned: First block stores 70 bytes of record A and 30 bytes of record B; next block
continues record B

🔹 Summary Table:
Feature Spanned Unspanned

Record split allowed Yes No

Space utilization High Low

Access speed Slower (may need 2+ blocks) Faster (1 block per record)

Complexity Higher Lower

Indexing – Notes
🔹 Definition:
Indexing is a data structure technique used to quickly locate and access the data in a
database table, just like an index in a book.

🔹 Why Use Indexing?


To speed up data retrieval operations (SELECT queries)
To avoid scanning the entire table (full table scan)
Especially useful for large databases

🔹 Types of Indexing:
1. Primary Index
Built on the primary key
One index entry per record
Records are stored in sorted order of the key
Dense or Sparse indexing can be used
2. Secondary Index
Built on non-primary key attributes
Can be many for a table
Useful for searching on non-unique fields (like Department, City)
3. Clustered Index
Data is physically sorted based on the indexed column
Only one clustered index per table
Faster retrieval for range-based queries
4. Non-Clustered Index
Index and data are stored separately
Can have multiple non-clustered indexes
Slower than clustered for range queries, but still effective
5. Multilevel Index
Index built on top of another index to reduce access time
Used when the first-level index is too large
Similar to B-tree or B+ tree structures

🔹 Dense vs Sparse Index:


Type Every Record Indexed? Space Used Speed

Dense Yes More Fast

Sparse No (only some entries) Less Slower

🔹 Indexing Data Structures:


B-Tree / B+ Tree: Balanced tree structures for multilevel indexing
Hash Indexing: Uses hash function for fast equality search
Bitmap Indexing: Efficient for columns with few distinct values (used in data warehouses)

🔹 Advantages of Indexing:
Faster data retrieval
Improves performance of SELECT queries
Efficient range and exact match searching

🔹 Disadvantages:
Takes extra disk space
Slows down INSERT, UPDATE, DELETE due to index maintenance
Indexing in DBMS – Detailed Explanation
🔹 1. Primary Index
🔸 Description:
Built on primary key of the table.
The data file is sorted on the primary key field.
One entry for each block, not each record (Sparse index).
Also known as sparse primary index if one entry per block.

🔸 Example:
Suppose a table of Students is sorted by Roll_No. A primary index is created on Roll_No.

Roll_No Name

101 A

102 B

103 C

...

Index might store:

Key (Roll_No) Block Address

101 0

110 1

120 2
🔸 Advantages:
Fast access using key
Simple to implement

🔸 Disadvantages:
Only one per table (based on sorting)

🔹 2. Secondary Index
🔸 Description:
Created on fields that are not primary keys.
Table may or may not be sorted on that field.
Can be dense (every record has an index entry).

🔸 Example:
Creating a secondary index on Department in the Employee table.

Emp_ID Name Department

101 A HR

102 B IT

103 C HR

Index (dense):

Department List of Record Pointers

HR [101, 103]

IT [102]

🔸 Advantages:
Allows fast search on non-key fields
Supports multiple indexes

🔸 Disadvantages:
Additional storage
Slower than primary index

🔹 3. Clustered Index
🔸 Description:
Data is physically sorted based on the indexed column.
Only one clustered index per table.
The index entries contain actual data records, not just pointers.

🔸 Example:
A clustered index on Order_Date would arrange the entire table based on dates.

🔸 Advantages:
Fast retrieval for range queries (e.g., date ranges)
Better performance for sorted scans

🔸 Disadvantages:
Insertion/Deletion costly
Only one per table

🔹 4. Non-Clustered Index
🔸 Description:
Index is stored separately from the data.
Index contains pointers to the data rows.
Can create multiple non-clustered indexes.

🔸 Example:
Non-clustered index on Customer_Name while table is sorted by Customer_ID.
Customer_ID Name

101 Zaid

102 Aamir

103 Vikram

Index:

Name Pointer

Aamir → 102

Vikram → 103

Zaid → 101

🔸 Advantages:
Multiple indexes allowed
Faster point lookup

🔸 Disadvantages:
Slower than clustered for range queries
Overhead on storage and maintenance

🔹 5. Dense vs Sparse Index


Feature Dense Index Sparse Index

Records Indexed Every record One per block

Space More storage Less storage


needed

Search Speed Faster (direct) Slightly slower

Maintenance More complex Easier to maintain


during updates

🔹 6. Multilevel Index
🔸 Description:
Index built on top of another index.
Helps when primary index is too large to be stored in memory.
Implemented as B-trees or B+ trees.

🔸 Structure:
Level 1: Index of blocks
Level 2: Index of index blocks
And so on…

🔸 Advantages:
Reduces disk I/O
Logarithmic search time (like binary search)

🔹 7. Hash Indexing
🔸 Description:
Uses a hash function to map keys to block addresses.
Good for exact match queries, not for range queries.

🔸 Example:
Hash(Employee_ID) = Block Number
🔸 Advantages:
Very fast access for equality searches
Simple to implement

🔸 Disadvantages:
Doesn’t support range searches
Collision handling needed (e.g., chaining)

🔹 8. Bitmap Indexing
🔸 Description:
Used for columns with few distinct values (e.g., gender, department).
Stores a bit array for each value.

🔸 Example:
For a Gender column:

ID Gender

1 M

2 F

3 M

Bitmap:

M→101
F→010

🔸 Advantages:
Efficient for AND/OR operations
Small storage space for low-cardinality fields

🔸 Disadvantages:
Not good for high-cardinality fields
Slower for frequent updates

📌 Summary Table:
Index Sorted Supports Speed Space Use Case
Type Range (Exact
Queries Match)

Primary Yes Yes Fast Low Primary


Index key access

Secondary No Yes Moderate Medium Search on


Index non-key
fields

Clustered Yes Yes Very Fast Medium Sorting,


Index range
scans

Non- No Yes Fast High Multiple


Clustered key
searches

Multilevel Yes Yes Very Fast Medium Large


Index datasets

Hash No No Very Fast Medium Equality


Index search

Bitmap No No Fast for Low Categoric


Index low values al
attributes

Hashing in DBMS – Notes


🔹 What is Hashing?
Hashing is a technique used to quickly access data by applying a hash function to a search
key. It maps a key to a location (or "bucket") in a hash table.

🔹 Why Use Hashing?


To enable fast data retrieval (especially for equality searches)
Used in indexing, hash-based joins, and partitioning
Efficient in large databases where quick access is needed

🔹 Hash Function
A hash function converts a search key (e.g., Employee ID) into a bucket address.

Formula example:

text

CopyEdit

h(k) = k mod m

Where:

k = search key
m = number of buckets

🔹 Types of Hashing:
1. Static Hashing

Number of buckets is fixed.


Hash function does not change.
Each bucket can hold multiple records.

🔸 Example:
If m = 10, then h(107) = 107 % 10 = 7

🔸 Drawbacks:
Overflow may occur (more records than bucket space)
Poor for growing datasets

2. Dynamic Hashing

Number of buckets grows/shrinks as data changes


Handles overflows gracefully
Common techniques: Extendible Hashing, Linear Hashing

🔹 Hash Collisions
Occurs when two keys hash to the same bucket.

🔸 Methods to Handle:
1. Chaining – Use a linked list at each bucket.
2. Open Addressing – Find next available bucket using probing:
Linear Probing: check next slots sequentially
Quadratic Probing: check i² positions ahead
Double Hashing: use a second hash function

🔹 Extendible Hashing
Uses a directory of pointers to buckets.
Directory size = 2^d, where d is the number of bits in hash prefix.
Buckets split when full; directory doubles if needed.

🔸 Advantage:
Grows incrementally
Avoids frequent rehashing

🔹 Linear Hashing
Buckets are split incrementally over time.
No directory is used.
Splits are triggered by load factor.

🔸 Advantage:
Simpler than extendible hashing
Automatically adapts to data growth

🔹 Applications of Hashing in DBMS


Hash-based indexing
Hash join algorithms
Partitioning large tables
Fast equality-based search

🔹 Advantages of Hashing
✅ Constant time access: O(1)
✅ Efficient for large data
✅ Good for equality searches (e.g., Emp_ID = 101)
🔹 Disadvantages of Hashing
❌ Not suitable for range queries
❌ Collisions can degrade performance
❌ Rehashing is expensive in static hashing
📌 Summary Table:
Feature Static Hashing Dynamic Hashing

Buckets Fixed Grows/Shrinks

Performance Degrades with size Maintains efficiency

Collision Handling Chaining/Probing Bucket split/directory


use

Range Queries Not supported Not supported

B+ Tree in DBMS – Notes


🔹 What is a B+ Tree?
A B+ Tree is a balanced tree data structure used for indexing in databases and file systems. It
is a type of self-balancing search tree that maintains sorted data and allows searches,
sequential access, insertions, and deletions in logarithmic time.

🔹 Why Use B+ Tree in DBMS?


Efficient for large-scale disk-based databases
Ideal for range queries
Used in multilevel indexing

🔹 Structure of B+ Tree
1. Internal Nodes:
Only store keys (not data)
Directs search by guiding to the correct child node
2. Leaf Nodes:
Store keys + actual data pointers
Linked together to form a linked list for sequential access
3. Root Node:
Can be a leaf node (if the tree is small) or an internal node

🔹 Properties of B+ Tree (Order n):


Feature Description

Order (n) Maximum number of children per node

Internal node Has at most n children and at least ⌈n/2⌉

Leaf node Contains ⌈(n−1)/2⌉ to n−1 keys

Root node Has minimum 2 children if not a leaf

Balanced All leaves appear on the same level

Data stored in Only leaf nodes, internal nodes hold keys for
navigation only

Leaf node linkage Leaf nodes are connected via linked list for
range traversal

🔹 Operations in B+ Tree
✅ 1. Search
Start at root and move down based on key comparison.
Final search is in leaf node.
Efficient due to balanced structure (Time: O(log n))

✅ 2. Insertion
Insert key in proper leaf node.
If overflow occurs:
Split the leaf
Promote key to parent
May recursively split and grow the tree

✅ 3. Deletion
Remove key from leaf.
If underflow:
Borrow from sibling, or
Merge with sibling
Update parent keys accordingly

🔹 Advantages of B+ Tree
Benefit Explanation

Efficient search O(log n) due to balanced structure

Fast range queries Leaf nodes linked for sequential traversal

Space efficient Internal nodes only store keys (not full


data)

Good for disk-based access High fan-out reduces tree height (fewer
disk I/Os)

Dynamic Supports insertion/deletion efficiently

What is a B-Tree?
A B-Tree is a self-balancing tree data structure that maintains sorted data and allows
searches, sequential access, insertions, and deletions in logarithmic time. It is commonly
used in databases and file systems to efficiently store and retrieve large blocks of data.

Why B-Trees in DBMS?

Databases often store data on disks, where accessing data is slow compared to memory.
B-Trees minimize disk reads by maximizing the number of keys stored in each node,
reducing the height of the tree.
This helps optimize operations such as search, insert, update, and delete.

Characteristics of B-Trees

1. Balanced tree: All leaf nodes are at the same level.


2. Nodes have multiple keys: Unlike binary trees, a B-tree node can have more than two
children.
3. Order (m): Defines the maximum number of children each node can have.
4. Node properties:
Each node can have a minimum of ceil(m/2) children (except root).
Each node can have a maximum of m children.
The number of keys in each node is one less than the number of children.
5. Keys in nodes are sorted, and child pointers separate key ranges.

Basic Operations in B-Tree

Search: Similar to binary search but over multiple keys in a node.


Insertion: Insert in the correct leaf node; if the node is full, split the node and propagate
the split upwards.
Deletion: Remove the key, and if underflow occurs (too few keys), redistribute keys or
merge nodes to maintain properties.

Example (Order = 4)

Each node can have up to 3 keys and 4 children.


Minimum children per node (except root) = 2.
Keys in node: [K1, K2, K3]
Children: [P0, P1, P2, P3]
Keys in subtree P0 < K1, keys in subtree P1 between K1 and K2, etc.

Use of B-Trees in DBMS

Indexing large tables.


Supporting range queries efficiently.
Used in storage engines like MySQL's InnoDB.
Handles large blocks of data with minimal disk I/O.

Summary

Feature Description

Data Structure Balanced multiway search tree

Purpose Efficient disk-based data


storage

Node Capacity Multiple keys and children

Height Low, to reduce disk reads

Operations Time O(log n) for search, insert,


delete

Use Case Indexing in databases

Heap File Organization (DBMS)


Definition:
A file organization method where records are stored in no particular order. New records
are simply added at the end of the file (or in the first available space).

Key Characteristics

Unordered storage: No sorting or indexing on records.


Insertion: Fast, simply append record at the end.
Search: Linear scan required to find a record.
Deletion: Mark record as deleted (using a flag) or remove physically, which may require
reorganization.
Update: Locate record by scanning, then modify.

Advantages

Simple and easy to implement.


Efficient for bulk loading and when no queries require sorted access.
Fast insertions because no order is maintained.

Disadvantages

Inefficient for searching (requires full file scan).


Not suitable for queries that require ordered data or fast retrieval.
Deletion and updates may lead to fragmentation, requiring periodic reorganization.

Use Cases

Small tables or temporary tables.


When no indexing or search on specific fields is required.
When insertions are more frequent than searches.

Summary Table

Operation Efficiency

Insertion Fast (append at end)

Search Slow (linear scan)

Deletion Slow (requires scanning)

Update Slow (search + modify)

You might also like