CST 204 Dbms Module - 3 Physical Data Organization
CST 204 Dbms Module - 3 Physical Data Organization
Module -3
                                     Physical Data Organization
Prepared by,
Tintu Alphonsa Thomas,
Asst. Professor, CSE,
Amal Jyothi College of Engineering
                                     Contents
Physical Data Organization
 •Review of terms: physical and logical records,
 •Blocking factor, spanned and unspanned organization.
 • Heap files,
 •Indexing,
 • Single level indices,
 •numerical examples,
 •Multi-level-indices, numerical examples,
 • B-Trees & B+-Trees (structure only)
 •Extendible Hashing, Indexing on multiple keys
 •Grid files
                                                         2
                                        Physical Files
•Physical files contain the actual data that is stored on the system, and a description of how data
is to be presented to or received from a program.
•They contain only one record format, and one or more members.
•Records in database files can be externally or program- described.
                                                                                                  3
                                         Logical Files
•Logical files do not contain data.
•They contain a description of records found in one or more physical files.
•A logical file is a view or representation of one or more physical files.
•Logical files that contain more than one format are referred to as multi-format logical files.
•If your program processes a logical file which contains more than one record format, you can
use a read by record format to set the format you wish to use.
                                                                                                  4
•Typical database applications need only a small portion of the database at a time for
processing.
•Whenever a certain portion of the data is needed, it must be located on disk, copied to
main memory for processing, and then rewritten to the disk if the data is changed.
•The data stored on disk is organized as files of records.
•Each record is a collection of data values that can be interpreted as facts about
entities, their attributes, and their relationships.
•Records should be stored on disk in a manner that makes it possible to locate them
efficiently when they are needed.
                                                                                           5
                                      File Organizations
•File organization refers to the organization of the data of a file into records, blocks, and access
structures; this includes the way records and blocks are placed on the storage medium and
interlinked.
•An access method, on the other hand, provides a group of operations that can be applied to a
file.
•In general, it is possible to apply several access methods to a file organization
                                                                                                  6
                                 Primary File Organizations
 •Primary file organizations determines how the file records are physically placed on the
 disk, and hence how the records can be accessed.
1. Heap file (or unordered file) places the records on disk in no particular order by
  appending new records at the end of the file.
2. Sorted file (or sequential file) keeps the records ordered by the value of a particular
  field (called the sort key).
3. Hashed file uses a hash function applied to a particular field (called the hash key) to
  determine a record’s placement on disk.
4. Other primary file organizations, such as B-trees, use tree structures
                                                                                             7
                                 Secondary Organization
•A secondary organization or auxiliary access structure allows efficient access to file records
based on alternate fields than those that have been used for the primary file organization
                                                                                                  8
                               Records and Record Types
•Data is usually stored in the form of records.
•Each record consists of a collection of related data values or items, where each value is
formed of one or more bytes and corresponds to a particular field of the record.
                                                                                             9
            Files, Fixed-Length Records, and Variable- Length Records
•A file is a sequence of records.
•In many cases, all records in a file are of the same record type.
•If every record in the file has exactly the same size (in bytes), the file is said to be
made up of fixed-length records.
•If different records in the file have different sizes, the file is said to be made up of
variable-length records.
                                                                                            10
11
12
                       Spanned and Unspanned Organization
•A block is the unit of data transfer between disk and memory
•When the block size is larger than the record size, each block will contain numerous
records, although some files may have unusually large records that cannot fit in one
block.
•Part of the record can be stored on one block and the rest on another.
•A pointer at the end of the first block points to the block containing the remainder of the
record. This organization is called spanned because records can span more than one
block.
•Whenever a record is larger than a block, we must use a spanned organization.
•If records are not allowed to cross block boundaries, the organization is called unspanned.
                                                                                               13
                                  Spanned Organization
                                       Block-1
                                         R1
              File                       R2
               R1                        R3
               R2
R3
               R4
                                        Block-2
               R5
                                          R3
R4
                                          R5
Each record has Size 40 Bytes .
Each Block has Size 100 Bytes
                                                         14
                             Unspanned Organization
                                   Block-1
                                     R1
              File                   R2
               R1                  Unused
               R2
R3
R4 Block-2
R5 R3
R4
Unused
                                                                                               16
•In general, R may not divide B exactly, so we have some unused space in each block
equal to
               B − (bfr * R) bytes
•To utilize this unused space, we can store part of a record on one block and the rest on
another.
•A pointer at the end of the first block points to the block containing the remainder of the
record in case it is not the next consecutive block on disk.
•This organization is called spanned because records can span more than one block.
•Whenever a record is larger than a block, we must use a spanned organization.
•If records are not allowed to cross block boundaries, the organization is called unspanned
•If the average record is large, it is advantageous to use spanning to reduce the lost space in
each block
                                                                                                  17
•For variable-length records using spanned organization, each block may store a
different number of records.
•In this case, the blocking factor bfr represents the average number of records per block
for the file.
•We can use bfr to calculate the number of blocks b needed for a file of r records:
                • b = 𝖥(r/bfr)⎤ blocks
•where the 𝖥(x)⎤ (ceiling function) rounds the value x up to the next integer.
                                                                                            18
                                               Question
•Number of records=100000
= 19523 blocks
                                                                                           20
                                  Allocating File Blocks on Disk
     •Contiguous allocation
    • File blocks are allocated to consecutive disk blocks.
    • File needs 'n' number of blocks in the disk and the file begins with a block at position'x', the
       next blocks to be assigned to it will be x+1,x+2,x+3,...,x+n-1 so that they are in a contiguous
       manner..
 Advantages
•It is very easy to implement.
•There is a minimum amount of seek time.
•The disk head movement is minimum.
•Memory access is faster.
•It supports sequential as well as direct access.
 Disadvantages
•At the time of creation, the file size must be initialized.
•As it is pre-initialized, the size cannot increase.
                                                                                                   21
 •Linked allocation
• File which we store on the hard disk is stored in a scattered manner according to the space
  available on the hard disk.
• Linked File Allocation , pointers are used to point to the next block of the same file,
  therefore along with the entry of each file each block also stores the pointer to the next
  block.
•Clusters allocation
  A combination of the two allocates clusters of consecutive disk blocks, and the clusters
  are linked. Clusters are sometimes called file segments or extents.
                                                                                                22
 •Indexed allocation
• The indexed file allocation also uses pointers and all the pointers are put together into one
  location which is called index block.
• We will get all the locations of blocks in one index file.
• The blocks and pointers were spread over the memory in the Linked Allocation method,
  where retrieval was accomplished by visiting each block sequentially.
• But here in indexed allocation, it becomes easier with the index block to retrieve.
                                                                                            23
                                         File Headers
•A file header or file descriptor contains information about a file that is needed by the
system programs that access the file records.
•The header includes information to determine the disk addresses of the file blocks as
well as to record format descriptions.
 • which may include field lengths and the order of fields within a record for fixed-length
  unspanned records and field type codes, separator characters, and record type codes for
  variable-length records.
                                                                                              24
•To search for a record on disk, one or more blocks are copied into main memory
buffers.
•Programs then search for the desired record or records within the buffers, using the
information in the file header.
•If the address of the block that contains the desired record is not known, the search
programs must do a linear search through the file blocks.
•Each file block is copied into a buffer and searched until the record is located or all
the file blocks have been searched unsuccessfully.
•This can be very time-consuming for a large file. The goal of a good file organization
is to locate the block that contains a desired record with a minimal number of block
transfers.
                                                                                           25
                               Typical file operations
•▫ OPEN: Readies the file for access, and associates a pointer that will refer to a
current file record at each point in time.
•▫ FIND: Searches for the first file record that satisfies a certain condition, and makes it
the current file record.
•▫ FINDNEXT: Searches for the next file record (from the current record) that satisfies
a certain condition, and makes it the current file record.
•▫ READ: Reads the current file record into a program variable.
•▫ INSERT: Inserts a new record into the file & makes it the current file record.
                                                                                               26
•DELETE: Removes the current file record from the file, usually by marking the
record to indicate that it is no longer valid.
•▫ MODIFY: Changes the values of some fields of the current file record.
•▫ CLOSE: Terminates access to the file.
•▫ REORGANIZE: Reorganizes the file records.
•▫ For example, the records marked deleted are physically removed from the file or a
new organization of the file records is created.
•▫ READ_ORDERED: Read the file blocks in order of a specific field of the file.
                                                                                       27
                          Files of Unordered Records (Heap Files)
 •Records are placed in the file in the order in which they are inserted, so new records are
  inserted at the end of the file.
 •This organization is often used with additional access paths, such as the secondary
  indexes
 •Insertion
• ▫ Inserting a new record is very efficient.
  •▫ The last disk block of the file is copied into a buffer, the new record is added, and the
   block is then rewritten back to disk.
• ▫ The address of the last file block is kept in the file header.
                                                                                                 28
                                      Searching
•▫ searching a record using any search condition involves a linear search through the file
block by block an expensive procedure.
•▫ If only one record satisfies the search condition, then, on the average, a program will
read into memory and search half the file blocks before it finds the record.
•▫ For a file of b blocks, this requires searching (b/2) blocks, on average.
•▫ If no records or several records satisfy the search condition, the program must read
and search all b blocks in the file
                                                                                             29
                                        Deletion
• a program must first find its block, copy the block into a buffer, delete the record from
the buffer, and finally rewrite the block back to the disk.
• This leaves unused space in the disk block.
• Deleting a large number of records in this way results in wasted storage space.
• Another technique used for record deletion is to have an extra byte or bit, called a
deletion marker, stored with each record.
• A record is deleted by setting the deletion marker to a certain value. A different value
for the marker indicates a valid (not deleted) record. Search programs consider only valid
records in a block when conducting their search.
• Both of these deletion techniques require periodic reorganization of the file to reclaim
the unused space of deleted records.
                                                                                              30
•We can use either spanned or unspanned organization for an unordered file, and it may be used
with either fixed- length or variable-length records.
•Modifying a variable-length record may require deleting the old record and inserting a modified
record because the modified record may not fit in its old space on disk
•For a file of unordered fixed-length records using unspanned              blocks and contiguous
allocation, it is straightforward to access any record by its position in the file.
•If the file records are numbered 0, 1, 2, ..., r − 1 and the records in each block are numbered 0, 1,
..., bfr − 1, where bfr is the blocking factor, then the ith record of the file is located in block ⎣
(i/bfr)⎦and is the (i mod bfr)th record in that block.
•Such a file is often called a relative or direct file because records can easily be accessed
directly by their relative positions.
•Accessing a record by its position does not help locate a record based on a search condition;
however, it facilitates the construction of access paths on the file, such as the indexes.
                                                                                                         31
                          Files of Ordered Records (Sorted Files)
•We can physically order the records of a file on disk based on the values of one of their
 fields called the ordering field.
•This leads to an ordered or sequential file.
•If the ordering field is also a key field of the file a field guaranteed to have a unique
 value in each record then the field is called the ordering key for the file.
                                                                                             32
                 Ordered records advantages over unordered files
•First, reading the records in order of the ordering key values becomes extremely
efficient because no sorting is required.
•Second, finding the next record from the current one in order of the ordering key usually
requires no additional block accesses because the next record is in the same block as the
current one
•Third, using a search condition based on the value of an ordering key field results in
faster access when the binary search technique is used, which              constitutes an
improvement over linear searches, although it is not often used for disk files.
•Ordered files are blocked and stored on contiguous cylinders to minimize the seek time
                                                                                             33
•A binary search for disk files can be done on the blocks rather than on the records.
•Suppose that the file has b blocks numbered 1, 2, ..., b; the records are ordered by
ascending value of their ordering key field; and we are searching for a record whose
ordering key field value is K.
•A binary search usually accesses log2(b) blocks, whether the record is found or not an
improvement over linear searches, where, on the average, (b/2) blocks are accessed when
the record is found and b blocks are accessed when the record is not found.
•Ordering does not provide any advantages for random or ordered access of the records
based on values of the other non-ordering fields of the file. In these cases, we do a linear
search for random access.
•To access the records in order based on a non-ordering field, it is necessary to create
another sorted copy in a different order of the file
                                                                                               34
•Inserting and deleting records are expensive operations for an ordered file because the
records must remain physically ordered.
•To insert a record, we must find its correct position in the file, based on its ordering field
value, and then make space in the file to insert the record in that position.
•For a large file this can be very time consuming because, on the average, half the
records of the file must be moved to make space for the new record.
•This means that half the file blocks must be read and rewritten after records are moved
among them.
•For record deletion, the problem is less severe if deletion markers            and periodic
reorganization are used.
                                                                                                  35
                                    Average Access Times
The following table shows the average access time to access a specific record for a given type of
file
                                                                                               36
•Consider an unordered file of 100000 records with record size of 100 bytes stored on
block of 4096byte with an unspanned record organization. Assume that no system
relation information stored in block. On average how many block would be access to
find particular records
                                                                                        37
  •Consider an sorted file of 100000 records with record size of 100 bytes stored on block
   of 4096byte with an unspanned record organization. Assume that no system relation
   information stored in block. On average how many block would be access to find
   particular records if search is based on ordering field .
• Ans. No. of records per block=└4096/100┘=40
  • No. of blocks=100000/40=2500 blocks
  • Average number of block access to search a record=log2 b
                        • =log2 2500=12 blocks
                                                                                             38
                                          CST 204 DBMS
                                              Module -3
                                     Physical Data Organization
                                         Static and Dynamic Hashing
Prepared by,
Tintu Alphonsa Thomas,
Asst. Professor, CSE,
Amal Jyothi College of Engineering
                                          Hashed Files
✔ Hashing for disk files is called External Hashing
✔ The file blocks are divided into M equal-sized buckets, numbered bucket0, bucket1, ...,
    bucketM-1
    ◾ Typically, a bucket corresponds to one (or a fixed number of) disk block.
✔   One of the file fields is designated to be the hash key of the file.
✔   The record with hash key value K is stored in bucket i, where i=h(K), and h is the
    hashing function.
✔   Search is very efficient on the hash key.
✔   Collisions occur when a new record hashes to a bucket that is already full.
    ◾   An overflow file is kept for storing such records.
    ◾   Overflow records that hash to each bucket can be linked together.
                                                                                            40
• The hash function's output determines the location of disk block where the records are to be
  placed.
                                                                                                 41
                                  Static Hashing Operations
• Search a Record
• When a record is needed, the very same hash function is used in order to get the address of that
  bucket in which the data is kept.
• Insert a Record
• When a new record is entered into the table, the hash key is used to construct an address for the
  new record, and the record is placed there.
• Delete a Record
• To delete a record, we must first retrieve the record that will be destroyed. The records for this
  address will then be deleted from memory.
                                                                                                42
Update a Record
• To edit a record, we’ll use a hash function to find it first, then change the data record.
• If we wish to add a new record to the file, but the address of the data bucket formed by the
  hash function isn’t empty, or information already exists in that address, we can’t add the
  record. Bucket overflow is a term used in static hashing to describe this occurrence.
                                                                                                 43
                                         Open Hashing
• Whenever a hash function generates any address that already contains data, the next bucket is
  assigned to it, and this process is called Linear probing.
• For instance, if R3 is a new address that needs to be entered, the hash function will generate
  110 as R3’s address. However, the address that was produced is already full; as a result, the
  system selects 113 as the next available data bucket and assigns R3 to it.
                                                                                                   44
                                         Close Hashing
• When a data bucket is filled, a new one is created for the very same hash result and connected
  after the old one, and this method is called Overflow chaining.
• For example, if R3 is a new address that has to be added to the database, the hash function will
  assign it the address 110. However, this bucket is too full to accommodate the additional data.
  In this scenario, a new bucket is placed and linked to the end of 110 buckets.
                                                                                                45
■   Multiple hashing: The program applies a second hash function if the first results
    in a collision.
■   If another collision results, the program uses open addressing or applies a third
    hash function and then uses open addressing if necessary.
                                                                                        46
                             Static External Hashing for Disk Files
■   Hashing for disk files is called external hashing.
■   To suit the characteristics of disk storage, the target address space is made of buckets,
    each of which holds multiple records.
■   A bucket is either one disk block or a cluster of contiguous disk blocks.
■   The hashing function maps a key into a relative bucket number, rather than assigning an
    absolute block address to the bucket.
■   A table maintained in the file header converts the bucket number into the
    corresponding disk block address
■   Here we have fixed number of buckets allocated so it is also called static external
    hashing
                                                                                                47
48
•       To reduce overflow records, a hash file is typically kept 70-80% full.
•       The hash function h should distribute the records uniformly among the buckets
        Otherwise, search time will be increased because many overflow records will exist.
Main disadvantages of static external hashing:
    •    Fixed number of buckets M is a problem if the number of records in the file grows or
         shrinks.
    •    Ordered access on the hash key is quite inefficient (requires sorting the records).
    •    searching for a record given a value of some field other than the hash field is as expensive
         as in the case of an unordered file
                                                                                                   49
Hashed Files - Overflow handling
                                   50
                      Dynamic And Extendible Hashed Files
Dynamic and Extendible Hashing Techniques
    ✔    Hashing techniques are adapted to allow the dynamic growth and shrinking of the number
         of file records.
    ✔    These techniques include the following: dynamic hashing, extendible hashing, and
         linear hashing.
✔       Both dynamic and extendible hashing use the binary representation of the hash value
        h(K) in order to access a directory.
    ✔    In dynamic hashing the directory is a binary tree.
    ✔    In extendible hashing the directory is an array of size 2d where d is called the global depth.
                                                                                                    51
• The dynamic hashing approach is used to solve problems like bucket overflow that can occur
  with static hashing.
• As the number of records increases or decreases, data buckets grow or shrink in this manner.
  This method makes hashing dynamic, allowing for insertion and deletion without causing
  performance issues.
                                                                                                 52
                                     Extendible Hashing
• Extendible Hashing is a dynamic hashing method wherein directories, and buckets are used to
  hash data. It is an aggressively flexible method in which the hash function also experiences
  dynamic changes.
                                                                                                  53
54
Extendible Hashing
                     55
• Directories: These containers store pointers to buckets. Each directory is given a unique id which may change
   each time when expansion takes place. The hash function returns this directory id which is used to navigate to the
   appropriate bucket. Number of Directories = 2^Global Depth.
• Buckets: They store the hashed keys. Directories point to buckets. A bucket may contain more than one pointers
   to it if its local depth is less than the global depth.
• Global Depth: It is associated with the Directories. They denote the number of bits which are used by the hash
   function to categorize the keys. Global Depth = Number of bits in directory id.
• Local Depth: It is the same as that of Global Depth except for the fact that Local Depth is associated with the
   buckets and not the directories. Local depth in accordance with the global depth is used to decide the action that to
   be performed in case an overflow occurs. Local Depth is always less than or equal to the Global Depth.
• Bucket Splitting: When the number of elements in a bucket exceeds a particular size, then the bucket is split into
   two parts.
• Directory Expansion: Directory Expansion Takes place when a bucket overflows. Directory Expansion is
   performed when the local depth of the overflowing bucket is equal to the global depth
                                                                                                                      56
                              Basic Working of Extendible Hashing
                                                                              Bucket
                                                                                         57
                                       If Overflow occurs
•Local depth is less than or equal to the global depth. Then choose one of the cases below.
Case1: If the local depth of the overflowing Bucket is equal to the global depth, then Directory
Expansion, as well as Bucket Split, needs to be performed. Then increment the global depth and
Directory expansion will double the number of directories present in the hash structure.
•Case2: In case the local depth is less than the global depth, then only Bucket Split takes place.
Then increment only the local depth value by 1. And, assign appropriate pointers.
                                                                                                     58
                               Example –Dynamic Hashing
• Consider the following classification of keys into buckets based on their hash address prefix
                                                                                                  59
• Last two bits in 2 and 4 are 00, they will be placed in bucket B0.
• Because the last two bits of the numbers 5 and 6 are 01, they will be placed in bucket B1.
  Since the last two parts of 1 and 3 add up to 10, they will be placed in bucket B2, and as the
  last two bits in 7 are 11, they will be placed in B3.
                 2                  4
000                                     B0
001              5
                                    9   B1
010
                 1                  3
011                                     B2
100
                          7             B3
101
110 6 B5
111
                                             62
                                            Indexing
• Indexing is used to optimize the performance of a database by minimizing the number of disk
  accesses required when a query is processed.
• The index is a type of data structure. It is used to locate and access the data in a database table
  quickly.
• Primary Indexes
• Clustering Indexes
    • Secondary Indexes
 • Multilevel Indexes
 • Dynamic Multilevel Indexes Using B-Trees and B+-Trees Indexes on Multiple Keys
                                                                                                  63
                                      Index Structures
•Indexing is a data structure technique to efficiently retrieve records from the database
files based on some attributes on which the indexing has been done.
•An index on a database table provides a convenient mechanism for locating a row
(data record) without scanning the entire table and thus greatly reduces the time it
takes to process a query.
                                                                                               65
                                       Types of index
•Indexes can be characterized as
 1. Dense index
 2. Sparse index
• A dense index has an index entry for every search key value (and hence every record) in the
 data file.
• A sparse (or nondense) index, on the other hand, has index entries for only some of the
 search values.
                                                                                            66
                                        Structure of index
                                                                                             67
                                         Primary Index
•Primary index is defined on an ordered data file. The data file is ordered on a key field.
•The key field is generally the primary key of the relation.
•A primary index is a nondense (sparse) index, since it includes one entry per disk block of
the data file, holding the key of that block's anchor (first) record rather than an entry for
every search value.
                                                                                               68
69
• Example
  •Suppose we have an ordered file with 30,000 records stored on a disk with block size
  1024 bytes; records are of fixed size, unspanned organization.
  •Record length = 100 bytes. How many block accesses are needed if a primary index is
  used, with an ordering key field of 9 bytes and a block pointer of 6 bytes?
                                                                                          70
• Ans. Ordering key field V = 9 bytes, block pointer P = 6 bytes.
• Data file: blocking factor = ⌊1024/100⌋ = 10 records per block, so no. of data blocks =
  ⌈30000/10⌉ = 3000.
• Index entry size = 9 + 6 = 15 bytes, so blocking factor of the index = ⌊1024/15⌋ = 68
  entries per block.
• The primary index has one entry per data block, so no. of index blocks bi = ⌈3000/68⌉ = 45.
• Binary search on the index: ⌈log2 bi⌉ = ⌈log2 45⌉ = 6 block accesses.
  •To retrieve the record we need one additional block access: 6 + 1 = 7 block
   accesses.
                                                                                        71
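The arithmetic above can be checked with a short script. The variable names are mine; the formulas are the standard blocking-factor calculations used in the worked example:

```python
import math

block_size, num_records, record_size = 1024, 30000, 100
key_size, ptr_size = 9, 6

# Data file (unspanned): records per block, then total data blocks
bfr_data = block_size // record_size               # 10 records/block
data_blocks = math.ceil(num_records / bfr_data)    # 3000 blocks

# Primary (sparse) index: one entry per data block
entry_size = key_size + ptr_size                   # 15 bytes
bfr_index = block_size // entry_size               # 68 entries/block
index_blocks = math.ceil(data_blocks / bfr_index)  # 45 blocks

# Binary search on the index, plus one access for the data block
accesses = math.ceil(math.log2(index_blocks)) + 1  # 6 + 1 = 7
print(data_blocks, index_blocks, accesses)
```

Note the asymmetry: blocking factors round down (a record cannot be split in an unspanned file), while block counts round up.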
                                        Clustering Index
•If file records are physically ordered on a nonkey field that does not have a distinct value for
each record, that field is called the clustering field and the data file is called a clustered file.
•A clustering index speeds up retrieval of all the records that have the same value for the
clustering field.
•This differs from a primary index, which requires that the ordering field of the data file have a
distinct value for each record.
                                                                                                      72
A Clustering Index on Dept_number Ordering Nonkey Field of a File
                                                                    73
• Question
  •Suppose we have an ordered file with 300,000 records stored on a disk with block size
   4096 bytes; records are of size 100 bytes, unspanned organization.
  •The file is ordered on a nonkey field (department_name). How many block accesses
   are needed if a clustering index is used, with a clustering field of 5 bytes and a
   block pointer of 6 bytes? There are 1000 departments and 300 employees per
   department.
                                                                                              74
•No. of records = 300,000; block size = 4096 B; record size = 100 B; block pointer = 6 B;
clustering field = 5 B.
•Without index:
• ▫ No. of data blocks = ⌈300000/40⌉ = 7500 (blocking factor = ⌊4096/100⌋ = 40)
• ▫ No. of block accesses = ⌈log2 7500⌉ = 13
•With clustering index:
• ▫ No. of index records per block = ⌊4096/(6+5)⌋ = ⌊4096/11⌋ = 372
• ▫ Total no. of index records = 1000 (one per department)
• ▫ No. of clustering index blocks = ⌈1000/372⌉ = 3
• ▫ No. of block accesses = ⌈log2 3⌉ + 1 = 2 + 1 = 3
                                                                                       75
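The same calculation as a sketch, comparing binary search on the data file against search via the clustering index (variable names are mine; the formulas follow the example):

```python
import math

block_size, num_records, record_size = 4096, 300000, 100
field_size, ptr_size, num_depts = 5, 6, 1000

# Without index: binary search over the ordered data file
bfr_data = block_size // record_size                 # 40 records/block
data_blocks = math.ceil(num_records / bfr_data)      # 7500 blocks
no_index = math.ceil(math.log2(data_blocks))         # 13 accesses

# Clustering index: one entry per distinct clustering-field value
bfr_index = block_size // (field_size + ptr_size)    # 372 entries/block
index_blocks = math.ceil(num_depts / bfr_index)      # 3 blocks
with_index = math.ceil(math.log2(index_blocks)) + 1  # 2 + 1 = 3 accesses
```

The index wins because it has only 1000 entries (one per department), not 300,000.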
• Secondary Index
•A secondary index provides a secondary means of accessing a file for which some primary
 access already exists.
•The secondary index may be on a field which is a candidate key and has a unique value in
every record, or on a nonkey field with duplicate values.
•The index is an ordered file with two fields.
•The first field is of the same data type as some nonordering field of the data file, which is
the indexing field.
•The second field is either a block pointer or a record pointer.
•There can be many secondary indexes (and hence indexing fields) for the same file.
•A secondary index on a key field includes one entry for each record in the data file; hence, it is a dense index.
                                                                                           76
• Example
•Consider an unordered file with 10^8 records, each of size 400 bytes, with unspanned
organization. Suppose we construct a single-level secondary index for the file, where the
search key field is 16 bytes and a block pointer is 4 bytes. Assume the disk block size
is 4096 bytes.
  •How many blocks are required to store the data file?
  •How many blocks are required to store the index file?
                                                                                             77
•Disk file
• ▫ No. of data records per block = ⌊4096/400⌋ = 10
• ▫ No. of data blocks required = ⌈10^8/10⌉ = 10^7
•Index file
• ▫ Size of one index entry = 16 + 4 = 20 bytes
• ▫ No. of index entries = no. of data records = 10^8 (dense secondary index on a key field)
• ▫ No. of index entries per block = ⌊4096/20⌋ = 204
• ▫ No. of index blocks required = ⌈10^8/204⌉ = 490197
                                                       78
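A quick check of the dense-index arithmetic above (names are mine; formulas follow the example):

```python
import math

block_size, num_records, record_size = 4096, 10**8, 400
key_size, ptr_size = 16, 4

# Data file (unspanned): whole records per block only
data_blocks = math.ceil(num_records / (block_size // record_size))  # 10**7

# Dense secondary index: one (key, pointer) entry per record
entries_per_block = block_size // (key_size + ptr_size)    # 204
index_blocks = math.ceil(num_records / entries_per_block)  # 490197
```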
• Example 2
•Consider an unordered file with 10^8 records, each of size 400 bytes, with spanned
 organization. Suppose we construct a single-level secondary index for the file, where the
 search key field is 16 bytes and a block pointer is 4 bytes. Assume the disk block size
 is 4096 bytes.
  •How many blocks are required to store the data file?
  •How many blocks are required to store the index file?
                                                                                            79
 •Disk file
• ▫ No. of data blocks required = ⌈10^8 × 400/4096⌉ = 9765625 (spanned: records may cross block boundaries)
 •Index file
• ▫ Size of one index entry = 16 + 4 = 20 bytes
• ▫ No. of index entries = no. of data records = 10^8
• ▫ No. of index blocks required = ⌈10^8 × 20/4096⌉ = 488282
                                                            80
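With spanned organization there is no per-block rounding of records: a record (or index entry) may cross a block boundary, so total bytes are divided by the block size directly. A sketch of the calculation (names are mine):

```python
import math

block_size, num_records, record_size = 4096, 10**8, 400
entry_size = 16 + 4  # search key + block pointer, in bytes

# Spanned organization: divide total bytes by block size directly,
# instead of first computing whole records per block.
data_blocks = math.ceil(num_records * record_size / block_size)   # 9765625
index_blocks = math.ceil(num_records * entry_size / block_size)   # 488282
```

Compare with the unspanned case: spanning saves blocks (488282 vs. 490197 index blocks) by eliminating the wasted space at the end of each block.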
• Example
  •Consider an unordered file with 30,000 records, each of size 100 bytes, with unspanned
  organization. Suppose we construct a secondary index for the key field, with a key field
  size of 9 bytes and a block pointer of 6 bytes. Assume the disk block size is 1024 bytes.
  •Average number of disk block accesses to search for a record without the index
  •Average number of disk block accesses to search for a record with the index
                                                                                               81
 •Disk file
• ▫ No. of data records per block = ⌊1024/100⌋ = 10; no. of data blocks b = ⌈30000/10⌉ = 3000
• ▫ Without index (unordered file, linear search): average = b/2 = 3000/2 = 1500 block accesses
 •Index file
• ▫ Index entry size = 9 + 6 = 15 bytes; entries per block = ⌊1024/15⌋ = 68
• ▫ Dense index: no. of index entries = 30000, so index blocks = ⌈30000/68⌉ = 442
• ▫ With index: ⌈log2 442⌉ + 1 = 9 + 1 = 10 block accesses
                                                                                              86
Two-Level Primary Index
                          87
• Example
  •Consider a file of 16384 records. Each record is 32 bytes long and its key field is of size
  6 bytes. The file is ordered on a non-key field, and the file organization is unspanned.
  The file is stored in a file system with block size 1024 bytes, and the size of a block
  pointer is 10 bytes. If the secondary index is built on the key field of the file, and a
  multi-level index scheme is used to store the secondary index, the number of first-level
  and second-level blocks in the multi-level index are?
                                                                                                 88
•Number of records in file = 16384
•Record size = 32 bytes
•Key Size = 6 bytes
•Block Size on file system = 1024 bytes
•Size of Block Pointer = 10 bytes
•Size of a record or index Entry = 10 + 6 = 16
•     ordered on a non-key field
    •secondary index is built on the key field of the file, multi-level index scheme is used
                                                                                               89
•Data file (without indexing)
• ▫ No. of data records per block = ⌊1024/32⌋ = 32
• ▫ No. of data blocks = 16384/32 = 512
•Indexing, first level:
• ▫ The secondary index is built on the key field, so it is dense: no. of index entries = 16384
• ▫ No. of index entries per block = ⌊1024/(6+10)⌋ = 64
• ▫ No. of first-level index blocks = ⌈16384/64⌉ = 256
                                                          90
 •Indexing, second level
• ▫ No. of index entries = 256 (one per first-level block)
• ▫ No. of index entries per block = ⌊1024/16⌋ = 64
• ▫ No. of second-level index blocks = ⌈256/64⌉ = 4
                                              91
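The multi-level construction above can be sketched as a loop: each level indexes the blocks of the level below until one block suffices. Variable names are mine; the formulas follow the example:

```python
import math

block_size, num_records = 1024, 16384
key_size, ptr_size = 6, 10
entry_size = key_size + ptr_size              # 16 bytes per index entry
fan_out = block_size // entry_size            # 64 entries per index block

# Dense first level: one entry per record
first_level = math.ceil(num_records / fan_out)     # 256 blocks

# Second level: one entry per first-level block
second_level = math.ceil(first_level / fan_out)    # 4 blocks
```

A third level would need ⌈4/64⌉ = 1 block, at which point the index is complete.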
• Example 2
• Consider an EMPLOYEE file with 10000 records where each record is of size 80 bytes. The
  file is sorted on employee number (15 bytes long), which is the primary key. Assuming
  unspanned organization and block size of 512 bytes, compute the number of block accesses
  needed for selecting records based on employee number if,
• i. No index is used