CST 204 Dbms Module - 3 Physical Data Organization
CST 204 Dbms Module - 3 Physical Data Organization
Module -3
                                     Physical Data Organization
Prepared by,
Tintu Alphonsa Thomas,
Asst. Professor, CSE,
Amal Jyothi College of Engineering
                                     Contents
Physical Data Organization
 •Review of terms: physical and logical records,
 •Blocking factor, spanned and unspanned organization.
 • Heap files,
 •Indexing,
 • Single level indices,
 •numerical examples,
 •Multi-level-indices, numerical examples,
 • B-Trees & B+-Trees (structure only)
 •Extendible Hashing, Indexing on multiple keys
 •Grid files
                                                         2
                                        Physical Files
•Physical files contain the actual data that is stored on the system, and a description of how data
is to be presented to or received from a program.
•They contain only one record format, and one or more members.
•Records in database files can be externally or program- described.
                                                                                                  3
                                         Logical Files
•Logical files do not contain data.
•They contain a description of records found in one or more physical files.
•A logical file is a view or representation of one or more physical files.
•Logical files that contain more than one format are referred to as multi-format logical files.
•If your program processes a logical file which contains more than one record format, you can
use a read by record format to set the format you wish to use.
                                                                                                  4
•Typical database applications need only a small portion of the database at a time for
processing.
•Whenever a certain portion of the data is needed, it must be located on disk, copied to
main memory for processing, and then rewritten to the disk if the data is changed.
•The data stored on disk is organized as files of records.
•Each record is a collection of data values that can be interpreted as facts about
entities, their attributes, and their relationships.
•Records should be stored on disk in a manner that makes it possible to locate them
efficiently when they are needed.
                                                                                           5
                                      File Organizations
•File organization refers to the organization of the data of a file into records, blocks, and access
structures; this includes the way records and blocks are placed on the storage medium and
interlinked.
•An access method, on the other hand, provides a group of operations that can be applied to a
file.
•In general, it is possible to apply several access methods to a file organization
                                                                                                  6
                                 Primary File Organizations
 •Primary file organizations determines how the file records are physically placed on the
 disk, and hence how the records can be accessed.
1. Heap file (or unordered file) places the records on disk in no particular order by
  appending new records at the end of the file.
2. Sorted file (or sequential file) keeps the records ordered by the value of a particular
  field (called the sort key).
3. Hashed file uses a hash function applied to a particular field (called the hash key) to
  determine a record’s placement on disk.
4. Other primary file organizations, such as B-trees, use tree structures
                                                                                             7
                                 Secondary Organization
•A secondary organization or auxiliary access structure allows efficient access to file records
based on alternate fields than those that have been used for the primary file organization
                                                                                                  8
                               Records and Record Types
•Data is usually stored in the form of records.
•Each record consists of a collection of related data values or items, where each value is
formed of one or more bytes and corresponds to a particular field of the record.
                                                                                             9
            Files, Fixed-Length Records, and Variable- Length Records
•A file is a sequence of records.
•In many cases, all records in a file are of the same record type.
•If every record in the file has exactly the same size (in bytes), the file is said to be
made up of fixed-length records.
•If different records in the file have different sizes, the file is said to be made up of
variable-length records.
                                                                                            10
11
12
                       Spanned and Unspanned Organization
•A block is the unit of data transfer between disk and memory
•When the block size is larger than the record size, each block will contain numerous
records, although some files may have unusually large records that cannot fit in one
block.
•Part of the record can be stored on one block and the rest on another.
•A pointer at the end of the first block points to the block containing the remainder of the
record. This organization is called spanned because records can span more than one
block.
•Whenever a record is larger than a block, we must use a spanned organization.
•If records are not allowed to cross block boundaries, the organization is called unspanned.
                                                                                               13
                                  Spanned Organization
                                       Block-1
                                         R1
              File                       R2
               R1                        R3
               R2
R3
               R4
                                        Block-2
               R5
                                          R3
R4
                                          R5
Each record has Size 40 Bytes .
Each Block has Size 100 Bytes
                                                         14
                             Unspanned Organization
                                   Block-1
                                     R1
              File                   R2
               R1                  Unused
               R2
R3
R4 Block-2
R5 R3
R4
Unused
                                                                                               16
•In general, R may not divide B exactly, so we have some unused space in each block
equal to
               B − (bfr * R) bytes
•To utilize this unused space, we can store part of a record on one block and the rest on
another.
•A pointer at the end of the first block points to the block containing the remainder of the
record in case it is not the next consecutive block on disk.
•This organization is called spanned because records can span more than one block.
•Whenever a record is larger than a block, we must use a spanned organization.
•If records are not allowed to cross block boundaries, the organization is called unspanned
•If the average record is large, it is advantageous to use spanning to reduce the lost space in
each block
                                                                                                  17
•For variable-length records using spanned organization, each block may store a
different number of records.
•In this case, the blocking factor bfr represents the average number of records per block
for the file.
•We can use bfr to calculate the number of blocks b needed for a file of r records:
                • b = 𝖥(r/bfr)⎤ blocks
•where the 𝖥(x)⎤ (ceiling function) rounds the value x up to the next integer.
                                                                                            18
                                               Question
•Number of records=100000
= 19523 blocks
                                                                                           20
                                  Allocating File Blocks on Disk
     •Contiguous allocation
    • File blocks are allocated to consecutive disk blocks.
    • File needs 'n' number of blocks in the disk and the file begins with a block at position'x', the
       next blocks to be assigned to it will be x+1,x+2,x+3,...,x+n-1 so that they are in a contiguous
       manner..
 Advantages
•It is very easy to implement.
•There is a minimum amount of seek time.
•The disk head movement is minimum.
•Memory access is faster.
•It supports sequential as well as direct access.
 Disadvantages
•At the time of creation, the file size must be initialized.
•As it is pre-initialized, the size cannot increase.
                                                                                                   21
 •Linked allocation
• File which we store on the hard disk is stored in a scattered manner according to the space
  available on the hard disk.
• Linked File Allocation , pointers are used to point to the next block of the same file,
  therefore along with the entry of each file each block also stores the pointer to the next
  block.
•Clusters allocation
  A combination of the two allocates clusters of consecutive disk blocks, and the clusters
  are linked. Clusters are sometimes called file segments or extents.
                                                                                                22
 •Indexed allocation
• The indexed file allocation also uses pointers and all the pointers are put together into one
  location which is called index block.
• We will get all the locations of blocks in one index file.
• The blocks and pointers were spread over the memory in the Linked Allocation method,
  where retrieval was accomplished by visiting each block sequentially.
• But here in indexed allocation, it becomes easier with the index block to retrieve.
                                                                                            23
                                         File Headers
•A file header or file descriptor contains information about a file that is needed by the
system programs that access the file records.
•The header includes information to determine the disk addresses of the file blocks as
well as to record format descriptions.
 • which may include field lengths and the order of fields within a record for fixed-length
  unspanned records and field type codes, separator characters, and record type codes for
  variable-length records.
                                                                                              24
•To search for a record on disk, one or more blocks are copied into main memory
buffers.
•Programs then search for the desired record or records within the buffers, using the
information in the file header.
•If the address of the block that contains the desired record is not known, the search
programs must do a linear search through the file blocks.
•Each file block is copied into a buffer and searched until the record is located or all
the file blocks have been searched unsuccessfully.
•This can be very time-consuming for a large file. The goal of a good file organization
is to locate the block that contains a desired record with a minimal number of block
transfers.
                                                                                           25
                               Typical file operations
•▫ OPEN: Readies the file for access, and associates a pointer that will refer to a
current file record at each point in time.
•▫ FIND: Searches for the first file record that satisfies a certain condition, and makes it
the current file record.
•▫ FINDNEXT: Searches for the next file record (from the current record) that satisfies
a certain condition, and makes it the current file record.
•▫ READ: Reads the current file record into a program variable.
•▫ INSERT: Inserts a new record into the file & makes it the current file record.
                                                                                               26
•DELETE: Removes the current file record from the file, usually by marking the
record to indicate that it is no longer valid.
•▫ MODIFY: Changes the values of some fields of the current file record.
•▫ CLOSE: Terminates access to the file.
•▫ REORGANIZE: Reorganizes the file records.
•▫ For example, the records marked deleted are physically removed from the file or a
new organization of the file records is created.
•▫ READ_ORDERED: Read the file blocks in order of a specific field of the file.
                                                                                       27
                          Files of Unordered Records (Heap Files)
 •Records are placed in the file in the order in which they are inserted, so new records are
  inserted at the end of the file.
 •This organization is often used with additional access paths, such as the secondary
  indexes
 •Insertion
• ▫ Inserting a new record is very efficient.
  •▫ The last disk block of the file is copied into a buffer, the new record is added, and the
   block is then rewritten back to disk.
• ▫ The address of the last file block is kept in the file header.
                                                                                                 28
                                      Searching
•▫ searching a record using any search condition involves a linear search through the file
block by block an expensive procedure.
•▫ If only one record satisfies the search condition, then, on the average, a program will
read into memory and search half the file blocks before it finds the record.
•▫ For a file of b blocks, this requires searching (b/2) blocks, on average.
•▫ If no records or several records satisfy the search condition, the program must read
and search all b blocks in the file
                                                                                             29
                                        Deletion
• a program must first find its block, copy the block into a buffer, delete the record from
the buffer, and finally rewrite the block back to the disk.
• This leaves unused space in the disk block.
• Deleting a large number of records in this way results in wasted storage space.
• Another technique used for record deletion is to have an extra byte or bit, called a
deletion marker, stored with each record.
• A record is deleted by setting the deletion marker to a certain value. A different value
for the marker indicates a valid (not deleted) record. Search programs consider only valid
records in a block when conducting their search.
• Both of these deletion techniques require periodic reorganization of the file to reclaim
the unused space of deleted records.
                                                                                              30
•We can use either spanned or unspanned organization for an unordered file, and it may be used
with either fixed- length or variable-length records.
•Modifying a variable-length record may require deleting the old record and inserting a modified
record because the modified record may not fit in its old space on disk
•For a file of unordered fixed-length records using unspanned              blocks and contiguous
allocation, it is straightforward to access any record by its position in the file.
•If the file records are numbered 0, 1, 2, ..., r − 1 and the records in each block are numbered 0, 1,
..., bfr − 1, where bfr is the blocking factor, then the ith record of the file is located in block ⎣
(i/bfr)⎦and is the (i mod bfr)th record in that block.
•Such a file is often called a relative or direct file because records can easily be accessed
directly by their relative positions.
•Accessing a record by its position does not help locate a record based on a search condition;
however, it facilitates the construction of access paths on the file, such as the indexes.
                                                                                                         31
                          Files of Ordered Records (Sorted Files)
•We can physically order the records of a file on disk based on the values of one of their
 fields called the ordering field.
•This leads to an ordered or sequential file.
•If the ordering field is also a key field of the file a field guaranteed to have a unique
 value in each record then the field is called the ordering key for the file.
                                                                                             32
                 Ordered records advantages over unordered files
•First, reading the records in order of the ordering key values becomes extremely
efficient because no sorting is required.
•Second, finding the next record from the current one in order of the ordering key usually
requires no additional block accesses because the next record is in the same block as the
current one
•Third, using a search condition based on the value of an ordering key field results in
faster access when the binary search technique is used, which              constitutes an
improvement over linear searches, although it is not often used for disk files.
•Ordered files are blocked and stored on contiguous cylinders to minimize the seek time
                                                                                             33
•A binary search for disk files can be done on the blocks rather than on the records.
•Suppose that the file has b blocks numbered 1, 2, ..., b; the records are ordered by
ascending value of their ordering key field; and we are searching for a record whose
ordering key field value is K.
•A binary search usually accesses log2(b) blocks, whether the record is found or not an
improvement over linear searches, where, on the average, (b/2) blocks are accessed when
the record is found and b blocks are accessed when the record is not found.
•Ordering does not provide any advantages for random or ordered access of the records
based on values of the other non-ordering fields of the file. In these cases, we do a linear
search for random access.
•To access the records in order based on a non-ordering field, it is necessary to create
another sorted copy in a different order of the file
                                                                                               34
•Inserting and deleting records are expensive operations for an ordered file because the
records must remain physically ordered.
•To insert a record, we must find its correct position in the file, based on its ordering field
value, and then make space in the file to insert the record in that position.
•For a large file this can be very time consuming because, on the average, half the
records of the file must be moved to make space for the new record.
•This means that half the file blocks must be read and rewritten after records are moved
among them.
•For record deletion, the problem is less severe if deletion markers            and periodic
reorganization are used.
                                                                                                  35
                                    Average Access Times
The following table shows the average access time to access a specific record for a given type of
file
                                                                                               36
•Consider an unordered file of 100000 records with record size of 100 bytes stored on
block of 4096byte with an unspanned record organization. Assume that no system
relation information stored in block. On average how many block would be access to
find particular records
                                                                                        37
  •Consider an sorted file of 100000 records with record size of 100 bytes stored on block
   of 4096byte with an unspanned record organization. Assume that no system relation
   information stored in block. On average how many block would be access to find
   particular records if search is based on ordering field .
• Ans. No. of records per block=└4096/100┘=40
  • No. of blocks=100000/40=2500 blocks
  • Average number of block access to search a record=log2 b
                        • =log2 2500=12 blocks
                                                                                             38
                                          CST 204 DBMS
                                              Module -3
                                     Physical Data Organization
                                         Static and Dynamic Hashing
Prepared by,
Tintu Alphonsa Thomas,
Asst. Professor, CSE,
Amal Jyothi College of Engineering
                                          Hashed Files
✔ Hashing for disk files is called External Hashing
✔ The file blocks are divided into M equal-sized buckets, numbered bucket0, bucket1, ...,
    bucketM-1
    ◾ Typically, a bucket corresponds to one (or a fixed number of) disk block.
✔   One of the file fields is designated to be the hash key of the file.
✔   The record with hash key value K is stored in bucket i, where i=h(K), and h is the
    hashing function.
✔   Search is very efficient on the hash key.
✔   Collisions occur when a new record hashes to a bucket that is already full.
    ◾   An overflow file is kept for storing such records.
    ◾   Overflow records that hash to each bucket can be linked together.
                                                                                            40
• The hash function's output determines the location of disk block where the records are to be
  placed.
                                                                                                 41
                                  Static Hashing Operations
• Search a Record
• When a record is needed, the very same hash function is used in order to get the address of that
  bucket in which the data is kept.
• Insert a Record
• When a new record is entered into the table, the hash key is used to construct an address for the
  new record, and the record is placed there.
• Delete a Record
• To delete a record, we must first retrieve the record that will be destroyed. The records for this
  address will then be deleted from memory.
                                                                                                42
Update a Record
• To edit a record, we’ll use a hash function to find it first, then change the data record.
• If we wish to add a new record to the file, but the address of the data bucket formed by the
  hash function isn’t empty, or information already exists in that address, we can’t add the
  record. Bucket overflow is a term used in static hashing to describe this occurrence.
                                                                                                 43
                                         Open Hashing
• Whenever a hash function generates any address that already contains data, the next bucket is
  assigned to it, and this process is called Linear probing.
• For instance, if R3 is a new address that needs to be entered, the hash function will generate
  110 as R3’s address. However, the address that was produced is already full; as a result, the
  system selects 113 as the next available data bucket and assigns R3 to it.
                                                                                                   44
                                         Close Hashing
• When a data bucket is filled, a new one is created for the very same hash result and connected
  after the old one, and this method is called Overflow chaining.
• For example, if R3 is a new address that has to be added to the database, the hash function will
  assign it the address 110. However, this bucket is too full to accommodate the additional data.
  In this scenario, a new bucket is placed and linked to the end of 110 buckets.
                                                                                                45
■   Multiple hashing: The program applies a second hash function if the first results
    in a collision.
■   If another collision results, the program uses open addressing or applies a third
    hash function and then uses open addressing if necessary.
                                                                                        46
                             Static External Hashing for Disk Files
■   Hashing for disk files is called external hashing.
■   To suit the characteristics of disk storage, the target address space is made of buckets,
    each of which holds multiple records.
■   A bucket is either one disk block or a cluster of contiguous disk blocks.
■   The hashing function maps a key into a relative bucket number, rather than assigning an
    absolute block address to the bucket.
■   A table maintained in the file header converts the bucket number into the
    corresponding disk block address
■   Here we have fixed number of buckets allocated so it is also called static external
    hashing
                                                                                                47
48
•       To reduce overflow records, a hash file is typically kept 70-80% full.
•       The hash function h should distribute the records uniformly among the buckets
        Otherwise, search time will be increased because many overflow records will exist.
Main disadvantages of static external hashing:
    •    Fixed number of buckets M is a problem if the number of records in the file grows or
         shrinks.
    •    Ordered access on the hash key is quite inefficient (requires sorting the records).
    •    searching for a record given a value of some field other than the hash field is as expensive
         as in the case of an unordered file
                                                                                                   49
Hashed Files - Overflow handling
                                   50
                      Dynamic And Extendible Hashed Files
Dynamic and Extendible Hashing Techniques
    ✔    Hashing techniques are adapted to allow the dynamic growth and shrinking of the number
         of file records.
    ✔    These techniques include the following: dynamic hashing, extendible hashing, and
         linear hashing.
✔       Both dynamic and extendible hashing use the binary representation of the hash value
        h(K) in order to access a directory.
    ✔    In dynamic hashing the directory is a binary tree.
    ✔    In extendible hashing the directory is an array of size 2d where d is called the global depth.
                                                                                                    51
• The dynamic hashing approach is used to solve problems like bucket overflow that can occur
  with static hashing.
• As the number of records increases or decreases, data buckets grow or shrink in this manner.
  This method makes hashing dynamic, allowing for insertion and deletion without causing
  performance issues.
                                                                                                 52
                                     Extendible Hashing
• Extendible Hashing is a dynamic hashing method wherein directories, and buckets are used to
  hash data. It is an aggressively flexible method in which the hash function also experiences
  dynamic changes.
                                                                                                  53
54
Extendible Hashing
                     55
• Directories: These containers store pointers to buckets. Each directory is given a unique id which may change
   each time when expansion takes place. The hash function returns this directory id which is used to navigate to the
   appropriate bucket. Number of Directories = 2^Global Depth.
• Buckets: They store the hashed keys. Directories point to buckets. A bucket may contain more than one pointers
   to it if its local depth is less than the global depth.
• Global Depth: It is associated with the Directories. They denote the number of bits which are used by the hash
   function to categorize the keys. Global Depth = Number of bits in directory id.
• Local Depth: It is the same as that of Global Depth except for the fact that Local Depth is associated with the
   buckets and not the directories. Local depth in accordance with the global depth is used to decide the action that to
   be performed in case an overflow occurs. Local Depth is always less than or equal to the Global Depth.
• Bucket Splitting: When the number of elements in a bucket exceeds a particular size, then the bucket is split into
   two parts.
• Directory Expansion: Directory Expansion Takes place when a bucket overflows. Directory Expansion is
   performed when the local depth of the overflowing bucket is equal to the global depth
                                                                                                                      56
                              Basic Working of Extendible Hashing
                                                                              Bucket
                                                                                         57
                                       If Overflow occurs
•Local depth is less than or equal to the global depth. Then choose one of the cases below.
Case1: If the local depth of the overflowing Bucket is equal to the global depth, then Directory
Expansion, as well as Bucket Split, needs to be performed. Then increment the global depth and
Directory expansion will double the number of directories present in the hash structure.
•Case2: In case the local depth is less than the global depth, then only Bucket Split takes place.
Then increment only the local depth value by 1. And, assign appropriate pointers.
                                                                                                     58
                               Example –Dynamic Hashing
• Consider the following classification of keys into buckets based on their hash address prefix
                                                                                                  59
• Last two bits in 2 and 4 are 00, they will be placed in bucket B0.
• Because the last two bits of the numbers 5 and 6 are 01, they will be placed in bucket B1.
  Since the last two parts of 1 and 3 add up to 10, they will be placed in bucket B2, and as the
  last two bits in 7 are 11, they will be placed in B3.
                 2                  4
000                                     B0
001              5
                                    9   B1
010
                 1                  3
011                                     B2
100
                          7             B3
101
110 6 B5
111
                                             62
                                            Indexing
• Indexing is used to optimize the performance of a database by minimizing the number of disk
  accesses required when a query is processed.
• The index is a type of data structure. It is used to locate and access the data in a database table
  quickly.
• Primary Indexes
• Clustering Indexes
    • Secondary Indexes
 • Multilevel Indexes
 • Dynamic Multilevel Indexes Using B-Trees and B+-Trees Indexes on Multiple Keys
                                                                                                  63
                                      Index Structures
•Indexing is a data structure technique to efficiently retrieve records from the database
files based on some attributes on which the indexing has been done.
•An index on a database table provides a convenient mechanism for locating a row
(data record) without scanning the entire table and thus greatly reduces the time it
takes to process a query.
                                                                                               65
                                       Types of index
•Indexes can be characterized as
 1. Dense index
 2. Sparse index
• A dense index has an index entry for every search key value (and hence every record) in the
 data file.
• A sparse (or nondense) index, on the other hand, has index entries for only some of the
 search values.
                                                                                            66
                                        Structure of index
                                                                                             67
                                         Primary Index
•Primary index is defined on an ordered data file. The data file is ordered on a key field.
•The key field is generally the primary key of the relation.
•A primary index is a nondense (sparse) index, since it includes one entry per disk block of
the data file, holding the key of that block's anchor (first) record rather than an entry for
every search value.
                                                                                               68
69
• Example
  •Suppose we have an ordered file with 30,000 records stored on a disk with block size
  1024 bytes; records are of fixed size, unspanned organization.
  •Record length = 100 bytes. How many block accesses are needed if a primary index is
  used, with an ordering key field of 9 bytes and a block pointer of 6 bytes?
                                                                                          70
• Ans. Ordering key field V = 9 bytes, block pointer P = 6 bytes.
• Data file: blocking factor = ⌊1024/100⌋ = 10 records per block, so no. of data blocks =
  ⌈30000/10⌉ = 3000.
• Index entry size = 9 + 6 = 15 bytes, so blocking factor of the index = ⌊1024/15⌋ = 68
  entries per block.
• The primary index has one entry per data block, so no. of index blocks bi = ⌈3000/68⌉ = 45.
• Binary search on the index: ⌈log2 bi⌉ = ⌈log2 45⌉ = 6 block accesses.
  •To retrieve the record we need one additional block access: 6 + 1 = 7 block
   accesses.
                                                                                        71
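The arithmetic above can be checked with a short script. The variable names are mine; the formulas are the standard blocking-factor calculations used in the worked example:

```python
import math

block_size, num_records, record_size = 1024, 30000, 100
key_size, ptr_size = 9, 6

# Data file (unspanned): records per block, then total data blocks
bfr_data = block_size // record_size               # 10 records/block
data_blocks = math.ceil(num_records / bfr_data)    # 3000 blocks

# Primary (sparse) index: one entry per data block
entry_size = key_size + ptr_size                   # 15 bytes
bfr_index = block_size // entry_size               # 68 entries/block
index_blocks = math.ceil(data_blocks / bfr_index)  # 45 blocks

# Binary search on the index, plus one access for the data block
accesses = math.ceil(math.log2(index_blocks)) + 1  # 6 + 1 = 7
print(data_blocks, index_blocks, accesses)
```

Note the asymmetry: blocking factors round down (a record cannot be split in an unspanned file), while block counts round up.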
                                        Clustering Index
•If file records are physically ordered on a nonkey field that does not have a distinct value for
each record, that field is called the clustering field and the data file is called a clustered file.
•A clustering index speeds up retrieval of all the records that have the same value for the
clustering field.
•This differs from a primary index, which requires that the ordering field of the data file have a
distinct value for each record.
                                                                                                      72
A Clustering Index on Dept_number Ordering Nonkey Field of a File
                                                                    73
• Question
  •Suppose we have an ordered file with 300,000 records stored on a disk with block size
   4096 bytes; records are of size 100 bytes, unspanned organization.
  •The file is ordered on a nonkey field (department_name). How many block accesses
   are needed if a clustering index is used, with a clustering field of 5 bytes and a
   block pointer of 6 bytes? There are 1000 departments and 300 employees per
   department.
                                                                                              74
•No. of records = 300,000; block size = 4096 B; record size = 100 B; block pointer = 6 B;
clustering field = 5 B.
•Without index:
• ▫ No. of data blocks = ⌈300000/40⌉ = 7500 (blocking factor = ⌊4096/100⌋ = 40)
• ▫ No. of block accesses = ⌈log2 7500⌉ = 13
•With clustering index:
• ▫ No. of index records per block = ⌊4096/(6+5)⌋ = ⌊4096/11⌋ = 372
• ▫ Total no. of index records = 1000 (one per department)
• ▫ No. of clustering index blocks = ⌈1000/372⌉ = 3
• ▫ No. of block accesses = ⌈log2 3⌉ + 1 = 2 + 1 = 3
                                                                                       75
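The same calculation as a sketch, comparing binary search on the data file against search via the clustering index (variable names are mine; the formulas follow the example):

```python
import math

block_size, num_records, record_size = 4096, 300000, 100
field_size, ptr_size, num_depts = 5, 6, 1000

# Without index: binary search over the ordered data file
bfr_data = block_size // record_size                 # 40 records/block
data_blocks = math.ceil(num_records / bfr_data)      # 7500 blocks
no_index = math.ceil(math.log2(data_blocks))         # 13 accesses

# Clustering index: one entry per distinct clustering-field value
bfr_index = block_size // (field_size + ptr_size)    # 372 entries/block
index_blocks = math.ceil(num_depts / bfr_index)      # 3 blocks
with_index = math.ceil(math.log2(index_blocks)) + 1  # 2 + 1 = 3 accesses
```

The index wins because it has only 1000 entries (one per department), not 300,000.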
• Secondary Index
•A secondary index provides a secondary means of accessing a file for which some primary
 access already exists.
•The secondary index may be on a field which is a candidate key and has a unique value in
every record, or on a nonkey field with duplicate values.
•The index is an ordered file with two fields.
•The first field is of the same data type as some nonordering field of the data file, which is
the indexing field.
•The second field is either a block pointer or a record pointer.
•There can be many secondary indexes (and hence indexing fields) for the same file.
•A secondary index on a key field includes one entry for each record in the data file; hence, it is a dense index.
                                                                                           76
• Example
•Consider an unordered file with 10^8 records, each of size 400 bytes, with unspanned
organization. Suppose we construct a single-level secondary index for the file, where the
search key field is 16 bytes and a block pointer is 4 bytes. Assume the disk block size
is 4096 bytes.
  •How many blocks are required to store the data file?
  •How many blocks are required to store the index file?
                                                                                             77
•Disk file
• ▫ No. of data records per block = ⌊4096/400⌋ = 10
• ▫ No. of data blocks required = ⌈10^8/10⌉ = 10^7
•Index file
• ▫ Size of one index entry = 16 + 4 = 20 bytes
• ▫ No. of index entries = no. of data records = 10^8 (dense secondary index on a key field)
• ▫ No. of index entries per block = ⌊4096/20⌋ = 204
• ▫ No. of index blocks required = ⌈10^8/204⌉ = 490197
                                                       78
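A quick check of the dense-index arithmetic above (names are mine; formulas follow the example):

```python
import math

block_size, num_records, record_size = 4096, 10**8, 400
key_size, ptr_size = 16, 4

# Data file (unspanned): whole records per block only
data_blocks = math.ceil(num_records / (block_size // record_size))  # 10**7

# Dense secondary index: one (key, pointer) entry per record
entries_per_block = block_size // (key_size + ptr_size)    # 204
index_blocks = math.ceil(num_records / entries_per_block)  # 490197
```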
• Example 2
•Consider an unordered file with 10^8 records, each of size 400 bytes, with spanned
 organization. Suppose we construct a single-level secondary index for the file, where the
 search key field is 16 bytes and a block pointer is 4 bytes. Assume the disk block size
 is 4096 bytes.
  •How many blocks are required to store the data file?
  •How many blocks are required to store the index file?
                                                                                            79
 •Disk file
• ▫ No. of data blocks required = ⌈10^8 × 400/4096⌉ = 9765625 (spanned: records may cross block boundaries)
 •Index file
• ▫ Size of one index entry = 16 + 4 = 20 bytes
• ▫ No. of index entries = no. of data records = 10^8
• ▫ No. of index blocks required = ⌈10^8 × 20/4096⌉ = 488282
                                                            80
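With spanned organization there is no per-block rounding of records: a record (or index entry) may cross a block boundary, so total bytes are divided by the block size directly. A sketch of the calculation (names are mine):

```python
import math

block_size, num_records, record_size = 4096, 10**8, 400
entry_size = 16 + 4  # search key + block pointer, in bytes

# Spanned organization: divide total bytes by block size directly,
# instead of first computing whole records per block.
data_blocks = math.ceil(num_records * record_size / block_size)   # 9765625
index_blocks = math.ceil(num_records * entry_size / block_size)   # 488282
```

Compare with the unspanned case: spanning saves blocks (488282 vs. 490197 index blocks) by eliminating the wasted space at the end of each block.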
• Example
  •Consider an unordered file with 30,000 records, each of size 100 bytes, with unspanned
  organization. Suppose we construct a secondary index for the key field, with a key field
  size of 9 bytes and a block pointer of 6 bytes. Assume the disk block size is 1024 bytes.
  •Average number of disk block accesses to search for a record without the index
  •Average number of disk block accesses to search for a record with the index
                                                                                               81
 •Disk file
• ▫ No. of data records per block = ⌊1024/100⌋ = 10; no. of data blocks b = ⌈30000/10⌉ = 3000
• ▫ Without index (unordered file, linear search): average = b/2 = 3000/2 = 1500 block accesses
 •Index file
• ▫ Index entry size = 9 + 6 = 15 bytes; entries per block = ⌊1024/15⌋ = 68
• ▫ Dense index: no. of index entries = 30000, so index blocks = ⌈30000/68⌉ = 442
• ▫ With index: ⌈log2 442⌉ + 1 = 9 + 1 = 10 block accesses
                                                                                              86
Two-Level Primary Index
                          87
• Example
  •Consider a file of 16384 records. Each record is 32 bytes long and its key field is of size
  6 bytes. The file is ordered on a non-key field, and the file organization is unspanned.
  The file is stored in a file system with block size 1024 bytes, and the size of a block
  pointer is 10 bytes. If the secondary index is built on the key field of the file, and a
  multi-level index scheme is used to store the secondary index, the number of first-level
  and second-level blocks in the multi-level index are?
                                                                                                 88
•Number of records in file = 16384
•Record size = 32 bytes
•Key Size = 6 bytes
•Block Size on file system = 1024 bytes
•Size of Block Pointer = 10 bytes
•Size of a record or index Entry = 10 + 6 = 16
•     ordered on a non-key field
    •secondary index is built on the key field of the file, multi-level index scheme is used
                                                                                               89
•Data file (without indexing)
• ▫ No. of data records per block = ⌊1024/32⌋ = 32
• ▫ No. of data blocks = 16384/32 = 512
•Indexing, first level:
• ▫ The secondary index is built on the key field, so it is dense: no. of index entries = 16384
• ▫ No. of index entries per block = ⌊1024/(6+10)⌋ = 64
• ▫ No. of first-level index blocks = ⌈16384/64⌉ = 256
                                                          90
 •Indexing, second level
• ▫ No. of index entries = 256 (one per first-level block)
• ▫ No. of index entries per block = ⌊1024/16⌋ = 64
• ▫ No. of second-level index blocks = ⌈256/64⌉ = 4
                                              91
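The multi-level construction above can be sketched as a loop: each level indexes the blocks of the level below until one block suffices. Variable names are mine; the formulas follow the example:

```python
import math

block_size, num_records = 1024, 16384
key_size, ptr_size = 6, 10
entry_size = key_size + ptr_size              # 16 bytes per index entry
fan_out = block_size // entry_size            # 64 entries per index block

# Dense first level: one entry per record
first_level = math.ceil(num_records / fan_out)     # 256 blocks

# Second level: one entry per first-level block
second_level = math.ceil(first_level / fan_out)    # 4 blocks
```

A third level would need ⌈4/64⌉ = 1 block, at which point the index is complete.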
• Example 2
• Consider an EMPLOYEE file with 10000 records where each record is of size 80 bytes. The
  file is sorted on employee number (15 bytes long), which is the primary key. Assuming
  unspanned organization and block size of 512 bytes, compute the number of block accesses
  needed for selecting records based on employee number if,
• i. No index is used