0% found this document useful (0 votes)
28 views9 pages

File Organization

Uploaded by

starlord68736
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
28 views9 pages

File Organization

Uploaded by

starlord68736
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 9

Physical Database Design issues :

Physical database design issues include:

1. Storage Space: Plan how much space is needed for tables, indexes, and growth. Poor
planning can cause storage problems.

2. Indexes: Indexes make searching faster but take up space and can slow updates. Use the
right type of index.

3. Data Partitioning: Break large data into smaller parts to improve speed and maintenance.

4. File Organization: Decide how data is stored, like sequential or hashed files, based on needs.

5. Backup and Recovery: Regularly save data and have a quick recovery plan to avoid data loss
or delays.

6. Concurrency Control: Allow multiple users to work at the same time while ensuring data
safety and avoiding conflicts.

7. Performance Tuning: Optimize storage and speed by using the right methods, such as
caching and compression.

Storage Types in DBMS

The records in databases are stored in file formats. Physically, the data is stored in
electromagnetic format on a device

HOW DATA IS STORED ON HDDS

How Data is Stored on Hard Disks

• Blocks and Pages:


Data is divided into fixed-size units called blocks (or pages), which are stored on the disk.

o A block is the smallest unit of data that can be read or written to the disk.

o Blocks are managed by the DBMS to organize and retrieve data efficiently.

• File Organization:
Databases use specific techniques (e.g., sequential, hashed, or indexed) to store and manage
these blocks.

Techniques to Optimize Disk Storage

• Buffering: Frequently accessed data is kept in memory to reduce disk reads.

• Indexing: Helps locate data quickly without scanning the entire disk.

• Clustering: Related data is stored together to minimize disk seeks.

1) What is file organization?

1
it refers to the logical relationship between various records that constitute the file particularly with
respect to means of identification and access to any specific record

in simple words , storing the files in a certain order is called file organization

2) OBJECTIVE OF FILE ORGANIZATION

• It helps in faster selection of records , i.e. makes the process faster


• Different operations like inserting,deleting,updating different records are faster and easier
• Prevents us from entering duplicate records via various operations
• Helps in storing the records or the data very efficiently at a minimal cost

3) TYPES OF FILE ORGANIZATION

• Sequential File Organization


• Heap File Organization
• Hash File organization
• B+tree file organization
• Clustered file organization
• Indexed Sequential Access Method

3.1) Sequential File organization

• Records are stored in sequential order based on a specific key field


• Records are accessed sequentially starting from the beginning of the file
• It is best for batch processing applications where records are processed in specific order
• Inefficient for random access and updates

3.1.1) PILE FILE METHOD

• We store the records in a sequence that is one after the other in the order in which they are
inserted into the tables
• New record is simply added to the end of the list (like linked list)

3.1.2) Sorted File method (sorting is done based on PK)

• The records are stored in sorted order


• whenever a new record has to be inserted it has to be inserted in the correct sorted
position

2
Advantages of Sequential File Organization
• Fast and efficient method for huge amounts of data.
• Simple design.
• Files can be easily stored in magnetic tapes i.e. cheaper storage mechanism.

Disadvantages of Sequential File Organization


• Time wastage as we cannot jump on a particular record that is required, but we have to
move in a sequential manner which takes our time.
• The sorted file method is inefficient as it takes time and space for sorting records.

3.2) Heap File Organization

• It works with data blocks , in this method records are inserted at the end of the file into the
data blocks
• No sorting or ordering is required in the method
• If one data block is full the new record is stored in some other block , which can be any block
in the memory

>)If we want to search,delete or update data in heap file org we will traverse the data from
the beginning of the file till we get the requested record
>)Thus if db is very huge , the above processes will take a lot of time

Advantages of Heap File Organization

• Fetching and retrieving records is faster than sequential records but only in the case of small
databases.

• When there is a huge number of data that needs to be loaded into the database at a time,
then this method of file Organization is best suited.

Disadvantages of Heap File Organization

• The problem of unused memory blocks.

• Inefficient for larger databases.

3.3)Hashed File Organization

A hash function is used to calculate the location (address) of a record in memory or on disk. The
output of the hash function decides where the record will be stored.

• Operations:
3
1. Retrieve: When you want to fetch a record, the hash key is used to find its address, and
the record is retrieved directly.

2. Insert: The hash key generates the address, and the record is stored at that location.

3. Delete/Update: The address is calculated using the hash key, and the record is either
removed or modified at that location.

• Benefits:

o No need to search or sort the entire file.

o Records are stored randomly, making access faster and efficient.

This method simplifies data retrieval and management using direct addressing.

Hashing is better than Indexing because hashing helps calculate the location of the data that is
present in the disk without the help of index structure.

3.4)Indexed File Organization

it involves creating an index , which is a data structure that maps key values to the physical
location of records which allows faster access to specific records especially when the DB is large
and complex

INDEX CREATION :

• An index is created on a specific attribute(or a combination of attributes) called the indexing


key
• The index stores key values and pointers to the corresponding records
• The index is typically sorted by the key values making it efficient for searching

RECORD ACCESS:

• When a query is made the DB system first searches for the index of the specified key value
• Once the key value is found the index provides the pointer to the physical location of the
record
• The DB system then directly accesses the record using the pointer

ADVANTAGES :

• Efficient retrieval of records


• Reduces the time required for queries and updates
4
• Facilitates both sequential and random access
• Ensures uniqueness and consistency of indexed values

DISADVANTAGES :

• Requires additional spaces for index structures


• Index structures needs to be updated when records are inserted , deleted or modified
• The implementation of indexed file organization can be complex

Types of Indexes:

1. Primary Index:

o Based on the primary key of the table.

o Unique and non-null.

o Ensures data integrity and efficient record access.

2. Secondary Index:

o Created on non-primary key attributes.

o Can be unique or non-unique.

o Allows for efficient searching and sorting based on different criteria.

3. Clustered Index:

o The physical order of records in the file is the same as the order of the index.

o Improves performance for range queries and sequential access.

4. Non-Clustered Index:

o The physical order of records is independent of the index order.

o Useful for quick access to specific records based on the index key.

3.5) INDEXED SEQUENTIAL ACCESS METHOD

It is an advanced sequential file organization. In this method , records are stored in the file using
the primary key . An index value is generated for each primary key and mapped with the record .
This index contans the address of the record in the file

If any record has to be retrieved based on its index value, then the address of the data block is
fetched and the record is retrieved from the memory.

5
Pros

• Quick and easy record search using data block addresses.

• Supports range and partial retrieval (e.g., retrieving names starting with 'JA').

Cons

• Requires additional disk space for index storage.

• Inserting new records may require file reconstruction.

• Deleting records can lead to unused space, slowing down performance.

Tree Structures in DBMS

• Definition: Tree-based structures like B-trees and B+ trees are commonly used for indexing in
DBMS to maintain a balanced hierarchy and optimize searches, insertions, and deletions.

• Types:

o B-Tree: Balanced tree where keys are stored at internal and leaf nodes. Suitable for
indexing but less efficient for sequential access.

o B+ Tree: Enhanced B-tree where data is stored only at leaf nodes, while internal nodes
store pointers, making sequential access and range queries faster.

Features of B+ trees :

• B+ Trees are self-balancing, which means that as data is added or removed from the tree, it
automatically adjusts itself to maintain a balanced structure. This ensures that the search time
remains relatively constant, regardless of the size of the tree.
• B+ Trees are multi-level data structures, with a root node at the top and one or more levels of
internal nodes below it. The leaf nodes at the bottom level contain the actual data.
• B+ Trees maintain the order of the keys in the tree, which makes it easy to perform range
queries and other operations that require sorted data.
• B+ Trees are often used for disk-based storage systems because they are efficient at storing
and retrieving data from disk.

6
Why Use B+ Tree?

• B+ Trees are the best choice for storage systems with sluggish data access because they
minimize I/O operations while facilitating efficient disc access.

• B+ Trees are a good choice for database systems and applications needing quick data
retrieval because of their balanced structure, which guarantees predictable performance for a
variety of activities and facilitates effective range-based queries

MULTI KEY FILE ORGANIZATION IN DBMS

When a file records are made accessed based on more than one key are called as Multikey file
organization.

E.g. In banking system we keep records of accounts in file. Now account holder needs account
information which can be access through account no, while loan officer needs account records with
a given value of overdue limit. So we need to provide to access path to the record based on
different need. Generally these files are index sequential file in which file is stored sequentially
based on primary key and more than one index table are provided based on different keys

Types

1. Inverted File Organization:

In this file organization a key’s inversion index contain all of the values that the key presently has
in the records of the data file. Each key-value entry in the inversion index points to all of the data
records that have the corresponding value. The data file is said to be inverted on that key. Inverted
files are sorted on inversion index so that binary search can be applied to find out index of record.
Whenever record is added in data file its corresponding entry has to be made in inverted file.

Advantages:

1. Efficient Searching:

o Inverted file organization enables fast searching by secondary keys.

o Each key-value in the index points directly to all matching records.

2. Flexible Data Storage:

o The records can be stored in any format (sequential, random, or linked), as the indexing
structure is independent of the data storage method.

3. Space Saving:

o When key fields are not required for retrieval, they can be removed from the records,
reducing storage requirements.

7
Comparison with Multilist File Organization:

• Similarities:

o Both inverted and multilist file organizations have indexes for each secondary key.

o Both maintain index entries for distinct values of secondary keys.

• Differences:

o In multilist organization, the index entry points to only the first record, which is linked to
others.

o In inverted file organization, the index entry points to all records with that key value,
allowing direct access.

2. Multi List File Organization:

In multi list file organization the index contain all values that the secondary key has in data file
same as inverted file but the difference is that the entry in the multi index for a secondary key
value is pointer to the first data record with that key value. That data record contains pointer to
second record having same key. Thus there is a linked list of data records for each value of
secondary key. Multi list chains usually are bidirectional and occasionally are circular to improve
update operation.

Advantages:

1. Efficient Insertion and Deletion:

o Records are linked by primary key values, making updates easier once the position is
identified.

2. Reduced Search Time:

o Indexes shorten the lists and speed up searches for both primary and secondary keys.

3. Support for Multiple Keys:

o Multiple indexes allow retrieval by different secondary keys.

Limitations:

1. Sequential Search Without Index:

o If no index is available, searching is slow as it requires scanning the entire list.

2. Complexity in Maintenance:

o Managing multiple indexes and links can increase overhead.

8
Why Do We Need Multiple Access Paths in RDBMS?

In Relational Database Management Systems (RDBMS), multiple access paths are needed to make
data retrieval and manipulation faster and more efficient.

Benefits of Multiple Access Paths

1. Improved Query Speed: The system can choose the best path to quickly execute queries.

2. Flexibility: Different paths are available for different types of queries, ensuring smooth
performance.

3. Better Data Retrieval: Allows fetching data from different angles, such as by ID, name, or
date.

4. Enhanced Indexing: Makes use of indexes to avoid scanning the entire table.

5. Efficient Data Updates: Makes insert, update, and delete operations faster.

Examples

• Indexing: Separate indexes for Employee ID and Department.

• Grid File Organization: Handles queries on multiple dimensions like Price and Location.

• Hashing: Multiple hash functions for different attributes.

Real-Life Examples

• Customer Database: Search by ID, name, email, or phone.


• E-Commerce Platform: Retrieve products by category, price, or brand.
• Financial Transactions: Fetch records by date, account number, or type.

You might also like