DBMS Unit-5 Notes
UNIT-5
STORAGE AND INDEXING
1. Data on External Storage
Disks:
Can retrieve random page at fixed cost
But reading several consecutive pages is much cheaper than
reading them in random order
Tapes:
Can only read pages in sequence
Cheaper than disks; used for archival storage
Indexing:
An index or database index is a data structure which is used to quickly locate and access the
data in a database table.
• The first column is the Search key that contains a copy of the primary key or candidate
key of the table. These values are stored in sorted order so that the corresponding data
can be accessed quickly (Note that the data may or may not be stored in sorted order).
• The second column is the Data Reference which contains a set of pointers holding the
address of the disk block where that particular key value can be found.
An index contains a collection of data entries, for which there are three alternatives: a data entry k* can be (1) the actual data record with search key value k, (2) a <k, rid> pair, or (3) a <k, rid-list> pair, where rid is the record id of a data record.
❖ Alternative 1:
▪ At most one index on a given collection of data records can use Alternative 1.
(Otherwise, data records are duplicated, leading to redundant storage and potential
inconsistency.)
▪ If data records are very large, the number of pages containing data entries is high.
This implies that the size of auxiliary information in the index is typically also large.
❖ Alternatives 2 and 3:
▪ Data entries typically much smaller than data records. So, better than Alternative
1 with large data records, especially if search keys are small. (Portion of index
structure used to direct search, which depends on size of data entries, is much
smaller than with Alternative 1.)
▪ Alternative 3 more compact than Alternative 2, but leads to variable sized data
entries even if search keys are of fixed length.
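As an illustration, the three alternatives can be sketched as Python values. This is a minimal sketch: the `Rid` type and the sample key "Smith" are hypothetical, not taken from any particular system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rid:
    """Record id: (page, slot) address of a record on disk (illustrative)."""
    page_id: int
    slot: int

# Alternative 1: the data entry is the actual data record itself.
alt1_entry = ("Smith", ("Smith", 22, 3.5))

# Alternative 2: <k, rid> -- a pointer to one matching record.
alt2_entry = ("Smith", Rid(page_id=4, slot=2))

# Alternative 3: <k, rid-list> -- compact when a key matches many records,
# but entries become variable-sized even for fixed-length keys.
alt3_entry = ("Smith", [Rid(4, 2), Rid(7, 0)])
```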
2. Types of Indexing:
• Primary Index − Primary index is defined on an ordered data file. The data file is ordered
on a key field. The key field is generally the primary key of the relation.
• Secondary Index − Secondary index may be generated from a field which is a candidate
key and has a unique value in every record, or a non-key with duplicate values.
• Clustering Index − Clustering index is defined on an ordered data file. The data file is
ordered on a non-key field. (or)
If the order of data records is the same as, or 'close to', the order of data entries, then the
index is called a clustered index. Alternative 1 implies clustered; in practice, clustered also
implies Alternative 1 (since sorted files are rare). A file can be clustered on at most one
search key. The cost of retrieving data records through an index varies greatly based on
whether the index is clustered or not!
Suppose that Alternative (2) is used for data entries, and that the data records are stored
in a Heap file.
▪ To build clustered index, first sort the Heap file (with some free space on each
page for future inserts).
▪ Overflow pages may be needed for inserts. (Thus, the order of data records is 'close
to', but not identical to, the sort order.)
Database Management Systems
Indexes can also be classified as:
o Dense Index
o Sparse Index
Dense Index
In a dense index, there is an index record for every search key value in the database. This
makes searching faster but requires more space to store the index records themselves. Index
records contain the search key value and a pointer to the actual record on the disk.
Sparse Index
In a sparse index, index records are not created for every search key. An index record here
contains a search key and an actual pointer to the data on the disk. To search a record, we first
follow the index record to reach the approximate location of the data. If the data we are
looking for is not where we arrive by following the index, the system starts a sequential
search until the desired data is found.
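The difference can be sketched in Python. This is a minimal model, assuming a file of sorted blocks; the block layout and key values are invented for illustration.

```python
from bisect import bisect_right

# The "file" is a list of disk blocks; each block holds keys in sorted order.
blocks = [[2, 5, 8], [12, 15, 18], [21, 25, 29]]

# Dense index: one entry per search-key value -> block number.
dense = {k: b for b, blk in enumerate(blocks) for k in blk}

# Sparse index: one entry per block, holding that block's first key.
sparse_keys = [blk[0] for blk in blocks]          # [2, 12, 21]

def sparse_lookup(key):
    # Find the last block whose first key <= key, then scan it sequentially.
    b = bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    return (b, key) if key in blocks[b] else None

# Dense index answers directly; sparse index scans within one block.
assert dense[15] == 1
assert sparse_lookup(15) == (1, 15)
```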
Multilevel Index
Index records comprise search-key values and data pointers. Multilevel index is stored on the
disk along with the actual database files. As the size of the database grows, so does the size of
the indices. There is a strong need to keep the index records in main memory so as to speed
up the search operations. If a single-level index is used, then a large index cannot be kept in
memory, which leads to multiple disk accesses.
Indexes must be chosen to speed up important queries (and perhaps some updates!).
Related data and information are stored collectively in file formats. A file is a sequence of
records stored in binary format. A disk drive is formatted into several blocks that can store
records. File records are mapped onto those disk blocks.
File Organization
File Organization defines how file records are mapped onto disk blocks. We have four types of
File Organization to organize file records:
• Heap File Organization
• Sequential File Organization
• Hash File Organization
• Clustered File Organization
When a file is created using Heap File Organization, the Operating System allocates memory
area to that file without any further accounting details. File records can be placed anywhere in
that memory area. It is the responsibility of the software to manage the records. Heap File does
not support any ordering, sequencing, or indexing on its own.
Every file record contains a data field (attribute) to uniquely identify that record. In sequential
file organization, records are placed in the file in some sequential order based on the unique key
field or search key. Practically, it is not possible to store all the records sequentially in physical
form.
Hash File Organization uses Hash function computation on some fields of the records. The
output of the hash function determines the location of disk block where the records are to be
placed.
Clustered file organization is not considered good for large databases. In this mechanism,
related records from one or more relations are kept in the same disk block, that is, the ordering
of records is not based on primary key or search key.
File Operations
• Update Operations
• Retrieval Operations
Update operations change the data values by insertion, deletion, or update. Retrieval operations,
on the other hand, do not alter the data but retrieve them after optional conditional filtering. In
both types of operations, selection plays a significant role. Other than creation and deletion of a
file, there could be several operations, which can be done on files.
• Open − A file can be opened in one of the two modes, read mode or write mode. In
read mode, the operating system does not allow anyone to alter data. In other words,
data is read only. Files opened in read mode can be shared among several entities. Write
mode allows data modification. Files opened in write mode can be read but cannot be
shared.
• Locate − Every file has a file pointer, which tells the current position where the data is
to be read or written. This pointer can be adjusted accordingly. Using find (seek)
operation, it can be moved forward or backward.
• Read − By default, when files are opened in read mode, the file pointer points to the
beginning of the file. There are options where the user can tell the operating system
where to locate the file pointer at the time of opening a file. The very next data to the
file pointer is read.
• Write − User can select to open a file in write mode, which enables them to edit its
contents. The modification can be deletion, insertion, or an update. The file pointer can be
positioned at the time of opening or can be dynamically changed if the operating system
allows it.
• Close − This is the most important operation from the operating system’s point of view.
When a request to close a file is generated, the operating system
o removes all the locks (if in shared mode),
o saves the data (if altered) to the secondary storage media, and
o releases all the buffers and file handlers associated with the file.
The organization of data inside a file plays a major role here. The process of locating the file
pointer at a desired record inside a file varies based on whether the records are arranged
sequentially or clustered.
1. Hash-based Indexing: For a huge database structure, it can be nearly impossible to
search all the index values through all its levels and then reach the destination data
block to retrieve the desired data. Hashing is an effective technique to calculate the
direct location of a data record on the disk without using an index structure.
Hashing uses hash functions with search keys as parameters to generate the address of a data
record.
• Bucket − A hash file stores data in bucket format. Bucket is considered a unit of
storage. A bucket typically stores one complete disk block, which in turn can store one
or more records.
• Hash Function − A hash function, h, is a mapping function that maps all the set of
search-keys K to the address where actual records are placed. It is a function from
search keys to bucket addresses.
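A minimal sketch of this mapping, assuming a mod-based hash function and four buckets; the key values are invented for illustration.

```python
# A mod-based hash function mapping search keys to bucket addresses.
NUM_BUCKETS = 4
buckets = [[] for _ in range(NUM_BUCKETS)]

def h(key):
    # Hash function: search key -> bucket address.
    return key % NUM_BUCKETS

for key in [10, 15, 20, 27, 33]:
    buckets[h(key)].append(key)

# 10 % 4 = 2, 15 % 4 = 3, 20 % 4 = 0, 27 % 4 = 3, 33 % 4 = 1
assert buckets == [[20], [33], [10], [15, 27]]
```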
The lowest level of the tree, called the leaf level, contains the data entries. This allows
us to efficiently locate all data entries with search key values in a desired range. All
searches begin at the topmost node, called the root, and the contents of pages in non-leaf
levels direct searches to the correct leaf page. Non-leaf pages contain node pointers
separated by search key values. The pointer to the left of a key value k points to the
subtree that contains only data entries less than k. The node pointer to the right of a key
value k points to a subtree that contains only data entries greater than or equal to k.
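This routing rule can be sketched as follows. The dict-based nodes and the two-level tree over a handful of sample keys are hypothetical, for illustration only.

```python
from bisect import bisect_right

def child_index(node_keys, search_key):
    # Number of keys <= search_key selects the subtree to follow:
    # pointer left of k covers entries < k, pointer right of k covers >= k.
    return bisect_right(node_keys, search_key)

def search(tree, key):
    node = tree["root"]
    while not node["leaf"]:
        node = node["children"][child_index(node["keys"], key)]
    return key in node["keys"]

# Tiny two-level tree over the leaf entries 10 ... 40.
tree = {"root": {
    "leaf": False, "keys": [20, 33],
    "children": [
        {"leaf": True, "keys": [10, 15]},
        {"leaf": True, "keys": [20, 27]},
        {"leaf": True, "keys": [33, 37, 40]},
    ]}}

assert search(tree, 27) and not search(tree, 28)
```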
Measuring the number of page I/Os ignores the gains of pre-fetching a sequence of pages;
thus, even I/O cost is only approximated. This is an average-case analysis, based on several
simplistic assumptions.
There are two index data structures, called ISAM and B+ trees, based on tree organizations. These
structures provide efficient support for range searches, including sorted file scans as a special case.
Unlike sorted files, these index structures support efficient insertion and deletion. They also
provide support for equality selections, although they are not as efficient in this case as hash-
based indexes.
Tree-structured indexing techniques support both range searches and equality searches:
ISAM: static structure;
B+ tree: dynamic, adjusts gracefully under inserts and deletes.
Problems with concurrent access to ISAM files were solved with the addition of a client-server
framework which marshals client requests and maintains ordering. This is the basic concept
behind a database management system (DBMS), which is a client layer over the underlying
data store.
The indexed access method of reading or writing data only provides the desired outcome if in
fact the file is organized as an ISAM file with the appropriate, previously defined keys. Access
to data via the previously defined key(s) is extremely fast. Multiple keys, overlapping keys and
key compression within the hash tables are supported.
Example:
Index entries: <the search key value, block/page id>
they direct search for data entries in leaves.
Example where each node can hold 2 entries; 10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63*
97*
(* represents key entry)
8. B+ Tree:
B Trees: B Trees are multi-way trees; that is, each node contains a set of keys and pointers. A
B Tree with four keys and five pointers represents the minimum size of a B Tree node. A B
Tree contains only data pages.
B Trees are dynamic. That is, the height of the tree grows and contracts as records are added
and deleted.
B+ Trees A B+ Tree combines features of ISAM and B Trees. It contains index pages and data
pages. The data pages always appear as leaf nodes in the tree. The root node and intermediate
nodes are always index pages. These features are similar to ISAM. Unlike ISAM, overflow
pages are not used in B+ trees.
The index pages in a B+ tree are constructed through the process of inserting and deleting
records. Thus, B+ trees grow and contract like their B Tree counterparts. The contents and the
number of index pages reflects this growth and shrinkage.
B+ Trees and B Trees use a "fill factor" to control the growth and the shrinkage. A 50% fill
factor would be the minimum for any B+ or B tree. As our example we use the smallest page
structure. This means that our B+ tree conforms to the following guidelines.
Number of keys/page: 4
Number of pointers/page: 5
Fill factor: 50%
Minimum keys in each page: 2
As this table indicates, each page must have a minimum of two keys. The root page may violate
this rule. The following table shows a B+ tree. As the example illustrates this tree does not have
a full index page. (We have room for one more key and pointer in the root page.) In addition,
one of the data pages contains empty slots.
The key value determines a record's placement in a B+ tree. The leaf pages are maintained in
sequential order AND a doubly linked list (not shown) connects each leaf page with its sibling
page(s). This doubly linked list speeds data movement as the pages grow and contract.
We must consider three scenarios when we add a record to a B+ tree. Each scenario causes a
different action in the insert algorithm. The scenarios are:
Adding a record when the leaf page is full but the index page is not
Next, we're going to insert a record with a key value of 70 into our B+ tree. This record should
go in the leaf page containing 50, 55, 60, and 65. Unfortunately this page is full. This means
that we must split the page as follows:
Left leaf page: 50 55        Right leaf page: 60 65 70
The middle key of 60 is placed in the index page between 50 and 75.
The following table shows the B+ tree after the addition of 70.
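The split step above can be sketched as a small helper. This is hypothetical illustration code; real systems split pages of a fixed byte size, not Python lists.

```python
def split_leaf(keys, new_key):
    """Split a full leaf page on insert; return (left, right, promoted key)."""
    keys = sorted(keys + [new_key])
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    # The first key of the right page is copied up into the index page.
    return left, right, right[0]

# A full leaf [50, 55, 60, 65] receives 70 and splits.
left, right, up = split_leaf([50, 55, 60, 65], 70)
assert left == [50, 55]
assert right == [60, 65, 70]
assert up == 60   # 60 is placed in the index page between 50 and 75
```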
Adding a record when both the leaf page and the index page are full
As our last example, we're going to add a record containing a key value of 95 to our B+ tree.
This record belongs in the page containing 75, 80, 85, and 90. Since this page is full we split it
into two pages:
Left leaf page: 75 80        Right leaf page: 85 90 95
The middle key, 85, rises to the index page. Unfortunately, the index page is also full, so we
split the index page:
Left index page: 25 50        Right index page: 75 85        New root: 60
The following table illustrates the addition of the record containing 95 to the B+ tree.
Deletion in B+ Tree:
Example (figures omitted): first deleting 19* and 20*, then deleting 24*.
We must consider three scenarios when we delete a record from a B+ tree. Each scenario causes
a different action in the delete algorithm. For example, when the leaf page falls below the fill
factor (YES) but the index page does not (NO), we combine the leaf page and its sibling and
change the index page to reflect the change, either by deleting the parent or merging the
parent-level indexes.
Hash Organization
Static Hashing
In static hashing, when a search-key value is provided, the hash function always computes the
same address. For example, if a mod-4 hash function is used, then it generates only four
values (0 through 3).
The output address is always the same for that function. The number of buckets provided
remains unchanged at all times.
Operation
• Insertion − When a record is required to be entered using static hash, the hash
function h computes the bucket address for search key K, where the record will be
stored.
• Search − When a record needs to be retrieved, the same hash function can be used to
retrieve the address of the bucket where the data is stored.
Bucket Overflow
The condition of bucket-overflow is known as collision. A static hash function cannot grow
the number of buckets to avoid it; in this case, overflow chaining can be used.
• Overflow Chaining − When buckets are full, a new bucket is allocated for the same hash
result and is linked after the previous one. This mechanism is called Closed Hashing.
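A minimal sketch of overflow chaining, assuming each bucket holds at most two records; the capacity and key values are invented for illustration.

```python
CAPACITY = 2      # records per bucket (assumed for the example)
NUM_BUCKETS = 4

class Bucket:
    def __init__(self):
        self.records = []
        self.overflow = None   # link to the chained overflow bucket

table = [Bucket() for _ in range(NUM_BUCKETS)]

def insert(key):
    b = table[key % NUM_BUCKETS]
    while len(b.records) == CAPACITY:   # walk the chain to a non-full bucket
        if b.overflow is None:
            b.overflow = Bucket()       # allocate and link an overflow bucket
        b = b.overflow
    b.records.append(key)

def search(key):
    b = table[key % NUM_BUCKETS]
    while b is not None:                # follow the overflow chain
        if key in b.records:
            return True
        b = b.overflow
    return False

for k in [4, 8, 12, 16]:   # all hash to bucket 0, so a chain forms
    insert(k)
assert search(16) and not search(20)
```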
The problem with static hashing is that it does not expand or shrink dynamically as the size of
the database grows or shrinks. Dynamic hashing provides a mechanism in which data buckets
are added and removed dynamically and on-demand. Dynamic hashing is also known
as extended hashing.
Hash function, in dynamic hashing, is made to produce a large number of values and only a few
are used initially.
The prefix of the entire hash value is taken as a hash index; only a portion of the hash value is
used for computing bucket addresses. Every hash index has a depth value to signify how many
bits are used for computing bucket addresses. These n bits can address 2^n buckets. When all
these buckets are full, the depth value is increased by one and twice as many buckets are
allocated.
Hashing is not favorable when the data is organized in some ordering and the queries require a
range of data. When data is discrete and random, hash performs the best.
Computing a hash function is more complex than the comparisons used in indexing, but each
hash operation completes in constant time.
• The hash function generates values over a large range, typically b-bit integers, with
b = 32.
• At any time, only a prefix of the hash value is used to index into a table of bucket
addresses.
• Let the length of the prefix be i bits, 0 ≤ i ≤ 32. The bucket address table size is 2^i;
initially i = 0. The value of i grows and shrinks as the size of the database grows and
shrinks.
• Multiple entries in the bucket address table may point to the same bucket; thus, the
actual number of buckets is at most 2^i. The number of buckets also changes
dynamically due to merging and splitting of buckets.
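The prefix computation can be sketched as follows. This is a simplified model; a real implementation also maintains per-bucket local depths to drive merging and splitting.

```python
def bucket_address(hash_value, i, b=32):
    """Return the i-bit prefix of a b-bit hash value (indexes a 2^i table)."""
    if i == 0:
        return 0
    return hash_value >> (b - i)

h = 0b1011_0000_0000_0000_0000_0000_0000_0000  # a 32-bit hash value

assert bucket_address(h, 1) == 0b1      # table has 2^1 = 2 entries
assert bucket_address(h, 3) == 0b101    # table has 2^3 = 8 entries
# Growing i from 3 to 4 doubles the table to 2^4 = 16 entries.
assert bucket_address(h, 4) == 0b1011
```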
Linear Hashing:
LH handles the problem of long overflow chains without using a directory, and handles
duplicates.
Idea: Use a family of hash functions h0, h1, h2, ...
• h_i(key) = h(key) mod (2^i · N), where N is the initial number of buckets.
• h is some hash function (its range is 0 to 2^MachineBitLength − 1).
• If N = 2^d0 for some d0, then h_i consists of applying h and looking at the last d_i
bits, where d_i = d0 + i.
• h_{i+1} doubles the range of h_i (similar to directory doubling).
The directory is avoided in LH by using overflow pages and by choosing the bucket to split
in round-robin order.
• Splitting proceeds in 'rounds'. A round ends when all N_R initial (for round R) buckets
have been split.
• Buckets 0 to Next−1 have been split; buckets Next to N_R have yet to be split.
• The current round number is called Level.
The Linear Hashing scheme has m initial buckets labelled 0 through m−1, an initial hashing
function h0(k) = f(k) % m that is used to map any key k into one of the m buckets (for simplicity
assume h0(k) = k % m), and a pointer p which points to the bucket to be split next whenever an
overflow page is generated (initially p = 0). An example is shown in Figure 1. Bucket Split:
When the first overflow occurs (it can occur in any bucket), bucket 0, which is pointed to by p, is
split (rehashed) into two buckets: the original bucket 0 and a new bucket m. A new empty page
is also added in the overflown bucket to accommodate the overflow. The search values originally
mapped into bucket 0 (by the hash function h0) are now distributed between buckets 0 and m
using a new hashing function h1.
As an example, Figure 2 shows the layout of the Linear Hashing of Figure 1 after inserting a
new record with key 11. The circled records are the existing records that are moved to the new
bucket. In more detail, the record is inserted into bucket 11 % 4 = 3. The bucket overflows and
an overflow page is introduced to accommodate the new record. Bucket 0 is split and the records
originally in bucket 0 are distributed between bucket 0 and bucket 4, using a new hash function
h1(k) = k % 8. The next bucket overflow, such as one triggered by inserting two records into
bucket 2 or four records into bucket 3 in Figure 2, will cause a new split that attaches a new
bucket m+1, and the contents of bucket 1 will be distributed using h1 between buckets 1 and
m+1. A crucial property of h1 is that search values that were originally mapped by h0 to some
bucket j must be remapped either to bucket j or to bucket j+m. This is a necessary property for
Linear Hashing to work. An example of such a hashing function is h1(k) = k % 2m. Further
bucket overflows will cause additional bucket splits in a linear bucket-number order (increasing
p by one for every split).
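The round of splits described above can be simulated in simplified form. A bucket capacity of two records is assumed for illustration, and the split is triggered as soon as a bucket overflows; real linear hashing pages hold more records and use actual overflow pages.

```python
CAPACITY = 2        # records per bucket page (assumed for the example)
m, p = 4, 0         # m initial buckets, split pointer p
buckets = {i: [] for i in range(m)}

def insert(k):
    global p
    b = k % m                    # h0(k) = k % m
    if b < p:                    # bucket already split this round:
        b = k % (2 * m)          # use h1(k) = k % 2m instead
    overflow = len(buckets[b]) >= CAPACITY
    buckets[b].append(k)         # an overflow page absorbs the extra record
    if overflow:                 # any overflow splits bucket p (round-robin)
        old = buckets[p]
        buckets[p] = [x for x in old if x % (2 * m) == p]
        buckets[p + m] = [x for x in old if x % (2 * m) == p + m]
        p += 1

for k in [0, 4, 1, 5, 2, 3, 7, 11]:   # inserting 11 overflows bucket 3
    insert(k)

# Bucket 0 was split: 0 stays (0 % 8 == 0), 4 moves to the new bucket 4.
assert buckets[0] == [0] and buckets[4] == [4]
assert p == 1
```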