CS470 Introduction to Database Management Systems

File Organizations
(Chapters 13 and 14 of the textbook)
A file is a collection or set (ordered or unordered) of data elements stored on a storage medium. A system software module is responsible for managing (reading, processing, deleting, etc.) a file. Let us see how much work goes into a simple file operation. The system performs the following steps to write a character to a sequential file.

Example: write(f1, c); where f1 is the text file name and c is a char variable containing the value 'P'.

1. In the beginning, 'P' is in memory location c.
2. The program asks the O/S to write the contents of c to file f1.
3. The O/S passes the job to the file manager.
4. The file manager looks up f1 in the directory to see whether f1 is open and available for use, which access types are allowed, and what physical file the logical name f1 corresponds to.
5. The file manager searches the file allocation table for the physical location of the (last) sector that is to contain the byte.
6. The file manager makes sure that the last sector of the file has been read into a system I/O buffer in main memory, then deposits the 'P' into its proper position in the buffer.
7. The file manager gives instructions to the I/O processor about where the byte is stored in main memory and where it needs to be sent on the disk.
8. The I/O processor puts the data into the proper format for the disk. It may also buffer the data to send it out in chunks of the proper size for the disk.
9. The I/O processor sends the data to the disk controller.
10. The controller instructs the drive to move the read/write head to the proper track, waits for the desired sector to come under the head, then sends the byte to the drive to be deposited, bit by bit, on the surface of the disk. (Remember that this byte travels with its companions in the sector, and each companion is deposited on the disk bit by bit.)
To complete this entire set of operations, roughly 10,000-12,000 assembly instructions are executed.
Record: a set of logically related fields. The cardinality of this set may be fixed or variable, i.e., a record size may be fixed or variable. A file, therefore, is a collection of logically related records.
[Diagram: a field (e.g., SSN); a record (SSN, Name, Age, Phone #); a file (a set of such records).]
A record can be of fixed or variable size; most commonly, fixed-size records are used. The file structure has been illustrated here in tabular form, but note that a file is not just a table: its records can be organized in many different ways, and each way defines a unique organization.

Operations on a file
Create: create a new file.
Write: write to a file.
Read: read a part of a file.
Rewind: go to the beginning of the file.
Delete: remove a file from the system.
Close: close a file (at the end of manipulation).
Open: open a file for processing.
Modify: modify a field value.
Relationship of a file with the storage media
Common storage media are disk, optical disk, and tape. A disk is a random-access storage device because a record of a file can be accessed directly at the place it is stored. In some cases a file acquires the attributes of the storage medium on which it is stored, e.g., magnetic tape: to access a record of a file stored on magnetic tape, unlike disk files, the file has to be searched sequentially up to the record.

Disk file structure: several platters, several cylinders, several tracks per cylinder, several sectors per track (sector = block, fixed by the manufacturer).
Access path: Disk No, Cylinder No, Track No, Sector No.
Unit of data transfer between memory and disk: the block.

Relationship between logical and physical units
A fixed record size may be larger than, equal to, or smaller than a disk block. For storage purposes a unit called the blocking factor (Bfr) is defined, which is the number of records in a block and is computed as follows:

Blocking factor Bfr (number of records in a disk block) = floor(B/R),

where B is the disk block size in bytes and R is the record size in bytes. Thus the unused space in a block is B - (Bfr * R) bytes.

Disk parameters: seek time, rotational latency time, and block transfer time.
Seek time: the time it takes to move the read/write arm to the correct cylinder. It is the largest cost; the average seek time is the time it takes to traverse one third of the cylinders.
Rotational latency time: the time the disk unit takes to position the read/write head at the beginning of the sector where the file records are stored.
Block transfer time: the time for the read/write head to pass over a disk block. During the block transfer time the bytes of data can be transferred between the disk and main memory.
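As a quick check of the blocking-factor arithmetic above, here is a minimal Python sketch (the record size is an arbitrary example, not a value from these notes):

    # Blocking factor: number of whole records that fit in one disk block.
    B = 2400          # block size in bytes (IBM 3380 block size, used below)
    R = 350           # record size in bytes (example value)

    bfr = B // R                 # Bfr = floor(B / R) = 6 records per block
    unused = B - bfr * R         # unused space per block = 300 bytes
    print(bfr, unused)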
Rotational latency and block transfer time
Average rotational latency (r) = 1/2 of one disk revolution. With rpm = number of rotations per minute, the time for one revolution = (60 * 1000)/rpm ms, so the time for half a revolution = (0.5 * 60 * 1000)/rpm ms. A common disk rotation speed is 3600 rpm, so the average latency time = (0.5 * 60 * 1000)/3600 = 8.3 ms.
Let the data transfer speed be t bytes/ms and the block size B; then the block transfer time (btt) = B/t ms. In data formatted on the disk there is a gap between two consecutive blocks, in which some control information (about the following block) is stored. In reading a set of blocks these gaps are also read, so we need to take the time to read them into account. Let t' = the data transfer speed over formatted data (block plus gap); the effective block transfer time is then ebt = B/t'.

We use the IBM 3380 disk in our examples to compute parameter values. The total capacity of this disk is over 2 billion bytes. Table 1 lists the values of some useful parameters.

Table 1
Parameter  Definition                                      Value
B          Block size                                      2400 bytes
t          Data transfer speed                             3000 bytes/ms
btt        Block transfer time                             0.8 ms
t'         Transfer speed over formatted data (with gap)   2857 bytes/ms
ebt        Effective block transfer time (= B/t')          0.84 ms
N          No. of cylinders                                885 per spindle
r          Average rotational latency                      8.3 ms
s          Average seek time                               16 ms
Time to read 10 blocks sequentially
This is the sum of the average seek time, the average rotational latency, and 10 effective block transfer times. The effective block transfer time must be used, since the head must pass over the interblock gaps. So the reading time = s + r + 10 * ebt = 16 + 8.3 + 8.4 = 32.7 ms. Note that the seek time and the rotational latency are significant with respect to the transfer time.

Time to read 10 blocks randomly
In this case an individual seek and rotational latency are required for each block. So the reading time = 10 * (s + r + btt) = 10 * (16 + 8.3 + 0.8) = 10 * 25.1 = 251 ms. The random time for reading 10 blocks is higher than the sequential time by a factor of about 8. Likewise, the time for reading 100 sequential blocks = 0.1083 sec, while the time for reading 100 random blocks = 2.51 sec.
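The two cost formulas are easy to script. A small Python sketch using the Table 1 values:

    # IBM 3380 parameters (Table 1), all times in ms
    s, r, btt, ebt = 16.0, 8.3, 0.8, 0.84

    def sequential_read_ms(k):
        # one seek and one latency, then k consecutive blocks (gaps included)
        return s + r + k * ebt

    def random_read_ms(k):
        # every block pays its own seek and latency
        return k * (s + r + btt)

    print(sequential_read_ms(10), random_read_ms(10))    # ~32.7 ms vs ~251 ms
    print(sequential_read_ms(100), random_read_ms(100))  # ~108.3 ms vs ~2510 ms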
The time for reading 100 blocks randomly is about 23 times larger than the time for reading 100 blocks sequentially. Conclusions: seek time and rotational latency dominate the cost of random access, so records that are processed together should be stored close together, and random block accesses should be minimized.
File Organizations
One of the main objectives of file organization is to speed up file retrieval, that is, to reduce the I/O time. This can be done by improving disk access speed, but that has a limit because of hardware constraints. The other way is to store file records in such a way that records can be accessed concurrently or in parallel. A hardware solution is the RAID structure: a RAID is an array of several disks that can be accessed in parallel. A software solution is a suitable file organization. We discuss a number of file organizations. Note that no single file organization is the most efficient for all types of data access; for this reason a database management system uses a number of different file organizations for storing the same set of data. There are basically three categories of file organization: (a) sequential organization, (b) organization using an index, and (c) random organization. We begin with the sequential category; files of this category are called sequential files. We proceed as follows: (a) understand the file organization, (b) understand how a set of operations is performed, and (c) analyze the time required to perform each operation. This will help us identify which file is suitable for what kind of data processing.
Sequential file
In this organization records are written consecutively when the file is created. Records in a sequential file can be stored in two ways. Each way identifies a file organization. Pile file: Records are placed one after another as they arrive (no sorting of any kind). Sorted file: Records are placed in ascending or descending values of the primary key.
Pile file: The total time to fetch (read) a record from a pile file requires seek time (s), rotational latency time (r), Block transfer time (btt), and number of blocks in the file (B). We also need a key field of the record (K). In pile file we do not know the address of a record, so we search sequentially. The possibilities are (a) the desired record may be in the first block or (b) it may be in some middle block or (c) it may be in the last block.
If the record is equally likely to be in any block, the average number of blocks examined is (1/B) * sum_{i=1..B} i = (B + 1)/2, which is approximately B/2. Thus the time to find (read) a record in a pile file is approximately TF = (B/2) * ebt. The fetching (searching) process is shown in the flow diagram (Figure 1), and the example illustrates the time involved.
[Flow diagram: read the next record in the block; if its key equals K, exit. Otherwise, if it is not the last record of the block, continue with the next record; if it is, move to the next block. If all blocks have been searched, display the message "Record does not exist" and exit the search.]
Figure 1. Search in a sequential file organization.

File reorganization: all records marked as deleted are removed, and all inserted records are moved to their correct places (sorting). The reorganization steps are:
1. Read the entire file (all blocks) into RAM.
2. Remove all the deleted records.
3. Write all the modified blocks to a different place on the disk.
Time to read all the old file blocks = B * ebt, where B is the number of blocks in the file. Time to write out the new file blocks for the n remaining records = (n/Bfr) * ebt. Total reorganization time TR = (B + n/Bfr) * ebt.

Example environment (hospital file):
Record size: 400 bytes
File size: 100,000 records (40 megabytes)
Storage disk: IBM 3380
Block size: 2400 bytes
Number of blocks in the file (B): 40 MB / 2400 = 16,667
Total time to read a record: TF = (B/2) * ebt = 8333 * 0.84 ms = 7 sec.
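The 7-second figure can be reproduced directly from TF = (B/2) * ebt; a sketch with the values above:

    ebt = 0.84                        # effective block transfer time in ms
    B = 40_000_000 // 2400            # number of blocks in the file (~16,666)
    TF = (B / 2) * ebt                # average blocks scanned * time per block
    print(TF / 1000, "seconds")       # -> ~7.0 seconds per fetch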
This time is tolerable for infrequent reads; however, if the frequency of reads is large, it is not acceptable. Suppose we want to look up 10^4 names of patients: the total search time is 7 sec * 10^4 = 70,000 sec, or about 19 hrs. The time to read through the entire file with independent fetches is TX = 100,000 * 7000 ms > 8 days. Missouri has about 5 million personal income-tax records of 400 bytes each, which fit on one IBM 3380 drive. The time to read this file with independent searches is

TX = (5,000,000 * 7000) / (1000 * 60 * 60 * 24 * 365) years, or roughly 1.1 years.
Inserting a record: the record is placed at the end of the file, so there is no need to keep the file sorted (in ascending or descending order). However, the file may then contain duplicate records. The total time involved is TI (insert time) = s (seek) + r (latency) + btt + 2r (a full revolution to write the block back), about 42 ms on the IBM 3380.

Deleting or modifying a record: fetch the block containing the record, find the record in the block and mark it deleted (or modify it), then write the modified block back to the disk. Total time required: TD or TM = TF + 2r.

Sorted sequential file: records are stored in order of the values of the key field. A new record is first inserted at the end of the file and then moved to its correct (ascending or descending) location.
[Diagram: Record 1, Record 2, ..., Record n stored one after another.]
A sequential file organization

A sequential file usually has an overflow area, which avoids sorting the file after every deletion, insertion, and/or modification. All records added after the file was first populated go to this overflow area. The overflow area is not itself sorted; it is a pile file with fixed-size records. At some convenient time the entire file is sorted, and the records from the overflow area go to their correct positions in the main file.

Record ordering: stored in ascending or descending order of the key K. Record Rj precedes record Rk iff K(Rj) <= K(Rk).

Retrieval: records are retrieved in consecutive order; the order of record storage determines the order of retrieval. During retrieval, several required operations (partial result output, etc.) can be performed simultaneously.

Insert, delete and modify (update): sequential files are usually not updated in place. Each operation generates a new version of the file, which contains the up-to-date information; the old version is usually saved for recovery. The new version then becomes the old version, and the next update uses it to generate the next new version.

The intensity or frequency of use of a sequential file is characterized by a parameter called the hit ratio, which is defined as follows:
Hit ratio = (no. of records accessed in responding to a query) / (total number of records in the file).
Desirable: a high hit ratio, meaning that a large fraction of the records is accessed in responding to a query. Interactive transactions have a very low hit ratio.

Advantages of sequential files
- Good for batch transactions.
- Simple to implement.
- Good for report generation, statistical computation, and inventory control.

Disadvantages
- Not good for interactive transactions.
- High overheads in file processing for simple queries.
Indexing

An index maps key values to the blocks of the data file that hold the corresponding records.
Nondense index: no. of entries in the index file << no. of records in the data file.
Dense index: no. of entries in the index file = no. of records in the data file.

Example: how indexing reduces record search time.

Sequential search (no indexing):
File size = 30,000 records. Record size (R) = 100 bytes. Block size (B) = 1024 bytes.
Blocking factor = floor(1024/100) = 10 records per block.
No. of blocks for the data file = 30,000/10 = 3000 blocks.
A binary search on the data file takes ceil(log2 3000) = 12 block accesses.

With an index:
Primary key = 9 bytes. Block pointer = 6 bytes. Index entry size = 9 + 6 = 15 bytes.
Index file blocking factor = floor(1024/15) = 68 entries per block.
No. of blocks for the index file = ceil(3000/68) = 45.
A binary search on the index file takes ceil(log2 45) = 6 block accesses.
Total no. of accesses = 6 + 1 = 7.
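The same arithmetic, parameterized as a small Python sketch (the function names are illustrative; the values are the ones from the example):

    import math

    def data_accesses(n_records, R, block):
        bfr = block // R                         # records per block
        blocks = math.ceil(n_records / bfr)      # data blocks
        return math.ceil(math.log2(blocks))      # binary search over blocks

    def index_accesses(n_data_blocks, key, ptr, block):
        per_block = block // (key + ptr)         # nondense: one entry per data block
        iblocks = math.ceil(n_data_blocks / per_block)
        return math.ceil(math.log2(iblocks)) + 1   # +1 for the final data block

    print(data_accesses(30_000, 100, 1024))      # -> 12
    print(index_accesses(3_000, 9, 6, 1024))     # -> 7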
Indexed sequential (ISAM) organization
Organization: data records are stored on disk tracks in ascending key order.

Track 1: 10/rec10  14/rec14  18/rec18  20/rec20
Track 2: 26/rec26  34/rec34  41/rec41  60/rec60
Track 3: 72/rec72  73/rec73  74/rec74  75/rec75
Track 4: 77/rec77  78/rec78  79/rec79  82/rec82
Track 5: 89/rec89  91/rec91  93/rec93  98/rec98
To locate a data record, a track index is required:

Track index
Track No.  Highest key
1          20
2          60
3          75
4          82
5          98

Operation on the file: read record 79.
Sequential access: search sequentially from track 1, record 10. Very slow.
Semi-random access: (a) search the track index; (b) find the first track whose highest key is greater than or equal to 79, which takes us to track 4; (c) search track 4 sequentially to find record 79.
Problem: for a large file the track index would be large, and a large index is not convenient to manage.
Solution: create a cylinder index to manage the track indexes.
Cylinder index
Cylinder  Highest key
1         98
2         184
3         278

Track index for cylinder 1    Track index for cylinder 2    Track index for cylinder 3
Track  Highest key            Track  Highest key            Track  Highest key
1      20                     1      107                    1      201
2      60                     2      122                    2      210
3      75                     3      148                    3      223
4      82                     4      163                    4      259
Find the record with key value 248. Steps:
1. Search the cylinder index to find the correct cylinder: cylinder 3.
2. Search the track index for cylinder 3: track 4.
3. Search track 4 sequentially for the desired record.
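This two-level lookup is easy to mimic in a few lines; a toy sketch in which in-memory lists stand in for the on-disk cylinder and track indexes (contents as in the tables above):

    # (cylinder, highest key) pairs, and (track, highest key) pairs per cylinder
    cylinder_index = [(1, 98), (2, 184), (3, 278)]
    track_index = {
        1: [(1, 20), (2, 60), (3, 75), (4, 82)],
        2: [(1, 107), (2, 122), (3, 148), (4, 163)],
        3: [(1, 201), (2, 210), (3, 223), (4, 259)],
    }

    def locate(key):
        cyl = next(c for c, hi in cylinder_index if key <= hi)    # step 1
        trk = next(t for t, hi in track_index[cyl] if key <= hi)  # step 2
        return cyl, trk       # step 3: scan that track sequentially

    print(locate(248))        # -> (3, 4)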
Problem: the cylinder index may become very large. Example: file size = 5000 tracks of data. Disk type: IBM 3350, 16 platters/volume, i.e., 30 usable surfaces/volume, and 555 tracks/surface, i.e., 555 cylinders/volume. A cylinder has 30 tracks, so storing the entire file requires 5000/30 = 167 cylinders (cylinder-wise data storage, i.e., store file records on the first track of cylinder 1, then the first track of cylinder 2, and so on). Apart from the file there is other controlling information, such as the track, cylinder, and master indexes, so not all 30 tracks are used to store data: 25 tracks are used for data and the rest for control information. Result: 5000/25 = 200 cylinders for indexes and data, which means there are 200 entries in the cylinder index. Each index entry may be large, and the size of the cylinder index grows with the file size. To reduce the search time on the cylinder index, a master index is created, which holds the addresses of the cylinders storing the records.

Overflow area: accommodates overflow records from the prime area. If a record's key indicates that it must go on a track that is full, then some records from that track are moved to the overflow area. With an overflow area the track index takes the following form:

Track No.  Highest key on track  Highest key in overflow area  Addr. of first overflow
1          20                    20                            Null
2          60                    60                            Null
3          75                    75                            Null
4          82                    82                            Null
5          98                    98                            Null
File processing
Deletion: simple. Find the desired record and put a special marker into it to indicate that it has been deleted.
Insertion: cumbersome. Example: insert record 55, whose destination is track 2. Track 2 is full, so record 60 is moved to the overflow area and the track index is modified, as shown below.
Prime area after inserting 55:
Track 1: 10 14 18 20
Track 2: 26 34 41 55
Track 3: 72 73 74 75
Track 4: 77 78 79 82
Track 5: 89 91 93 98
Overflow area: 60

Track index:
Track  Highest key on track  Highest key in overflow  Addr. of first overflow
1      20                    20                       Null
2      55                    60                       Address of rec 60
3      75                    75                       Null
4      82                    82                       Null
Now add record 42 (destination track 2 again): record 55 moves to the overflow area, giving the prime area and track index shown below. Other records can be inserted similarly. When the overflow area is full, records are inserted in an overflow area on the disk; however, this seldom happens.
Prime area after inserting 42:
Track 1: 10 14 18 20
Track 2: 26 34 41 42
Track 3: 72 73 74 75
Track 4: 77 78 79 82
Track 5: 89 91 93 98
Overflow area: 55, 60

Track index:
Track  Highest key on track  Highest key in overflow  Addr. of first overflow
1      20                    20                       Null
2      42                    60                       Address of rec 55
3      75                    75                       Null
4      82                    82                       Null
Implementation: the cylinder index is usually kept in memory; it gives the address of the correct track index, which is located on the disk. The track index is fetched and the address of the record is found.
Performance: time to fetch a record TF = s + r + btt + r + btt (one access for the track index, one for the data block).
As new records are added, an ISAM file degrades in performance: it becomes slow and often has to be reorganized at high cost. This is why ISAM is not the indexing method of choice for new systems. Suppose all overflow records are on the same cylinder. To look up a record, we obtain the cylinder's track index and search the prime area. If the record is not there, we must follow the chain of pointers to records in the overflow area of the cylinder. If this overflow area does not contain the record, the search continues in a second overflow area on another cylinder (not necessarily an adjacent one), so a search for a record may require several seeks. Similarly, reading the file in order after many insertions becomes quite complicated: each linked list must be followed, record by record, which can require a large number of seeks. ISAM is no longer used; it has been replaced by the B+-tree structure.
VSAM
In an index set (block), the index records are maintained physically in ascending sequence by primary key value. The index records in the index set are nondense, meaning that there is only one index record per lower-level index block.
Index sequence set: nondense; each record in this set points to one control interval.
Control interval: a set of data records physically stored in ascending key order.
Control area: an ordered set of control intervals and free control intervals for a file.
Distributed free space: the set of free control intervals in a control area.
The sequence set contains index records that point to every control interval allocated in the file. The following diagram illustrates the index sequence set, control intervals, distributed free space, and a control area. The size of a control area and the number of control intervals may be predefined; at the end of file initialization, the unfilled control intervals are set aside as distributed free space.
[Diagram: a control area. Sequence-set entries D, H, M, P, S point to their control intervals, which are followed by free control intervals (the distributed free space).]
File processing
Sequential access: can begin from the beginning of the file or from any other point.
Random access: through the index set to the index sequence set to the data.
Update (insert a new record): one of the following situations may occur:
a. Space is available in the target control interval.
b. The new record does not fit in the target control interval.
c. The control area is completely full.

Space available in the control interval:
1. Bring the desired control interval into the buffer.
2. Move records with key values higher than the new record up within the control interval.
3. Insert the new record into the control interval.
4. Write the control interval back to the file.
[Diagram: record C is inserted into a control interval holding A, D, F; D and F are moved up and the interval becomes A, C, D, F.]
Not enough room in the target control interval: the new record does not fit in its control interval, so we must get a new control interval (a control-interval split). If there is a free control interval in the control area, the records with the higher key values from the full control interval are copied into the new control interval, and the new record is inserted in its proper place. If there is no free control interval in the control area, then a new control area is allocated and initialized. The initialization process copies all the control intervals holding records with values smaller than the new record; the new record is then copied into its proper place, and the rest of the records are copied after it. The old control area is discarded, but this seldom happens.
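A rough sketch of cases (a) and (b) in Python, with plain lists standing in for control intervals; the capacity CAP and the helper's name are illustrative assumptions, not VSAM's actual interface:

    CAP = 4    # records per control interval (assumed for illustration)

    def insert(interval, free_intervals, rec):
        # Insert rec in key order; split into a free interval if full.
        if len(interval) < CAP:                  # case (a): room available
            interval.append(rec)
            interval.sort()
            return None
        new = free_intervals.pop()               # case (b): control-interval split
        interval.append(rec)
        interval.sort()
        mid = len(interval) // 2
        new.extend(interval[mid:])               # higher keys move to the new interval
        del interval[mid:]
        return new

    ci, free = ['A', 'D', 'F'], [[]]
    insert(ci, free, 'C')                        # fits: A C D F
    new = insert(ci, free, 'E')                  # now full: triggers a split
    print(ci, new)                               # ['A', 'C'] ['D', 'E', 'F']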
[Diagram: a control-interval split. Record C is to be added to a full control interval; the records with the higher key values are copied to a free control interval, and C is then inserted in its proper place.]
The control area is completely full: an entire new control area is created, added to the file, and then populated.

File size and file creation: the user may specify the number and size of control intervals and control areas; if not specified, VSAM uses default values. The size of a control interval dictates the size of the unit of I/O data transfer, so for performance reasons it is important to make the control interval a multiple of the actual physical blocks on the disk itself.

File name: given by the user; this name is stored in the master catalogue.

Freespace: the user can allocate distributed free space throughout the file. This space is not used when the file is first populated, but it is available for control-interval splitting.

Record size: greater flexibility. The user must specify not only the maximum record length but also the average record size; VSAM can use the average size to calculate, for example, how large a control interval must be.

File population: one of the main ways is to sort all data records by their key field and present them to the system. This is called mass insert, and it can optimize the process of moving records within control intervals. The following diagram illustrates this process:
[Diagram: mass insert into an empty file: records A, B, C, D are added, then E, F, G, H, I, filling control intervals in key order. In an existing file holding A, D, K, Z, adding record L places it after K, with the following records moved as needed.]
Direct File
Indexing provides the most powerful and flexible tools for organizing files. However, (a) indexes take up space and time to keep them organized and (b) the amount of I/O increases with
the size of the index. This problem can be minimized by direct file organization, where the address of the desired record can be found directly (no indexing or sequential search is needed). Such files are created using a hashing function, so this is called hashed organization and the files are called hashed files.
Hashing
Two types of hashing: (a) static and (b) dynamic.
Static: the size of the address space is predefined and does not grow or shrink with the file.
Dynamic: the size of the address space can grow and shrink with the file.
Key space: the set of primary keys. Address space: the set of home addresses.
Hash function(primary key) -> home address of the record; home address -> (file manager) -> disk address.
Requirements:
1. A predefined set of home addresses (the address space).
2. A hashing function.
3. A primary key (the hashed key).
There is a unique relationship between the key and the record address. Any add, delete, or change of a record must not break this relationship; if this bond is broken, the record is no longer accessible by hashing.
Address distribution:
1. With hashing, the generated addresses appear random.
2. There is no obvious connection between the key and the home address (for this reason hashing is sometimes referred to as randomizing).
Record distribution in the address space may be (a) uniform or (b) random, as illustrated below.
[Diagram: three key-to-address distributions of records A-G over addresses 1-9. Best: every record has its own home address (no synonyms). Worst: all records hash to the same home address (all synonyms). Acceptable: records are spread out with only a few synonyms.]
Collision: When a hashing function generates the same home address for more than one record. Solutions for collision: Progressive overflow (also called open addressing)
[Diagram: progressive overflow. A record's home address H(K) = 5 is full (record N), as are 6 (R), 7 (J), and 8 (M); successive tries move down the file until the record is placed in the next open slot, address 9. A second diagram shows the wrap-around case: when H(T) falls near the end of the address space and the end is reached, the search wraps around to the beginning of the file.]
Problems:
- What happens if there is a search for a record, but the record was never placed in the file?
- What happens if a record is hashed to a filled address and only the predecessor location is empty?
These problems increase the average search length. Example: consider the following keys, home addresses, and actual addresses:

Key    Home address  Actual address  Accesses to retrieve
Adams  20            20              1
Bates  21            21              1
Cole   21            22              2
Dean   22            23              2
Evans  20            24              5

Average search length = (1 + 1 + 2 + 2 + 5)/5 = 2.2. An acceptable average search length is about 2.0; anything higher is regarded as high.

Improvement (reducing the average search length): store more than one record at a single home address.

Storing records in buckets
Bucket: a logical unit of storage holding more than one record, where the set of records is retrieved in one disk access (one I/O). The records in a bucket share the same home address. On sector-addressing disks a bucket typically consists of one or more sectors; on block-addressing disks a bucket might be a block (physical, defined by the manufacturer). Consider the following set of keys to be loaded into a hash file, with each bucket capable of holding three synonyms:

Key    Home address     Buckets
Green  30               30: Green, Hall
Hall   30               31: (empty)
Jenks  32               32: Jenks
King   33               33: King, Land, Marx
Land   33
Marx   33               Nutt is an overflow record
Nutt   33

Each address (30, 31, etc.) identifies a bucket. Only the record for Nutt cannot be accommodated at its home address.

Accessing record Green:
1. Load the bucket into a memory buffer.
2. Search the bucket serially.

Deletions: more complicated than adding a record, for two reasons:
1. The slot freed by the deletion must not be allowed to hinder later searches.
2. It should be possible to reuse the freed slot for later additions.
In progressive overflow, a search for a record terminates when an open address is encountered.

Problems in searching a record
Four keys: Adams, Jones, Morris, and Smith. Collisions: Adams and Smith at address 5, and Jones and Morris at 6. The following table illustrates the setup:

Record  Home address  Actual address
Adams   5             5
Jones   6             6
Morris  6             7
Smith   5             8

File before deletion: address 5 holds Adams, 6 holds Jones, 7 holds Morris, 8 holds Smith.
Now suppose Morris is deleted, leaving an empty slot. A search for Smith starts at address 5, then looks at addresses 6 and 7. Since address 7 is now empty, it is reasonable for the program to conclude that Smith's record is not in the file. Solution: mark the deleted record with a tombstone; the search for a record does not stop at a tombstone. This solution makes insertion difficult. Example: record deleted: Morris; record to be inserted: Smith. The program hashes to address 5 and tries to find an empty slot. During the search it encounters the tombstone and inserts Smith there. Problem: the file now has a duplicate Smith. Solution: the program must first search the entire chain for the key and only then go back to the first tombstone and insert the record. Another subtlety with tombstones is that one need not be inserted when the deleted record has no successor in its chain (e.g., the last record), since nothing follows it.
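The tombstone rules can be made concrete with a toy in-memory sketch (the table size and the sentinel value are assumptions for illustration):

    SIZE = 11
    TOMB = "<deleted>"               # tombstone sentinel (assumes no real key equals it)
    table = [None] * SIZE            # None = open (never used) address

    def probe(home):                 # progressive overflow with wrap-around
        return ((home + i) % SIZE for i in range(SIZE))

    def search(key, home):
        for a in probe(home):
            if table[a] == key:
                return a
            if table[a] is None:     # an open address stops the search...
                return None          # ...but a tombstone does not
        return None

    def insert(key, home):
        reuse = None
        for a in probe(home):
            if table[a] == key:      # whole chain scanned first:
                return a             # never insert a duplicate
            if table[a] == TOMB and reuse is None:
                reuse = a            # remember the first reusable slot
            if table[a] is None:     # end of the chain
                slot = reuse if reuse is not None else a
                table[slot] = key
                return slot

    def delete(key, home):
        a = search(key, home)
        if a is not None:
            table[a] = TOMB          # mark; do not leave an open address

    for k, h in [("Adams", 5), ("Jones", 6), ("Morris", 6), ("Smith", 5)]:
        insert(k, h)
    delete("Morris", 6)
    print(search("Smith", 5))        # -> 8: the tombstone at 7 keeps the chain alive
    print(insert("Smith", 5))        # -> 8: the tombstone is not misused for a duplicate

Note how insert scans the whole chain before falling back to the first tombstone, which is exactly the duplicate-Smith fix described above.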
Virtual hashing
Virtual hashing uses multiple hashing functions. The functions are related, and the selection of the function depends on the level of bucket splits. Division is the basis of hashing: with N = number of buckets, address = key mod N (hashing into any bucket). When a bucket is full, then to accommodate an insert:
- The bucket is split.
- A different but related hashing function is used to distribute the records of the old bucket between the old and the new buckets.
- The desired record is inserted into these buckets using the current hashing function.
Example: N = 100, C = 4 (bucket size). Record keys: 508, 17308, 208, 408. Hashing function: address = key mod 100. All four records hash into bucket 8:

Bucket 8: 508, 17308, 208, 408 (full)

The bucket is full when a new record with key 1208 arrives, so the insertion of 1208 forces a split. A new bucket is acquired on the disk. The reorganization of the records is done first in a memory buffer, then the records are written to the disk buckets (old and new). The five records are rehashed using the function address = key mod 2N (key mod 200). After the distribution of the four old records and the insertion of 1208:

Bucket 8:   208, 408, 1208
Bucket 108: 508, 17308

Suppose record 1608 is now inserted into bucket 8 (which then holds 208, 408, 1208, 1608) and then record 5808 arrives. We need to split bucket 8 a second time, and on this occasion we use address = key mod 4N (key mod 400); bucket 208 is allocated to the file. The file after the second split:

Bucket 8:   408, 1208, 1608
Bucket 108: 508, 17308
Bucket 208: 208, 5808
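A compact sketch of this splitting scheme, with dictionaries standing in for disk buckets; each bucket records the divisor currently in force for it (an implementation convenience for the sketch, not part of the method's on-disk layout):

    N, C = 100, 4                    # initial divisor and bucket capacity

    buckets = {}                     # address -> list of keys
    divisor = {}                     # address -> divisor in force for that bucket

    def insert(key):
        d, addr = N, key % N
        while divisor.get(addr, d) > d:       # bucket was split: rehash
            d = divisor[addr]
            addr = key % d
        divisor.setdefault(addr, d)
        bucket = buckets.setdefault(addr, [])
        if len(bucket) < C:
            bucket.append(key)
            return
        d2 = d * 2                            # split: key mod 2N, then 4N, ...
        records = bucket + [key]
        buckets[addr] = []
        divisor[addr] = d2
        for k in records:                     # redistribute old + new records
            a = k % d2
            buckets.setdefault(a, []).append(k)
            divisor[a] = d2

    for k in [508, 17308, 208, 408, 1208, 1608, 5808]:
        insert(k)
    print(buckets)   # {8: [408, 1208, 1608], 108: [508, 17308], 208: [208, 5808]}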
In general, if we are splitting a bucket that has been the destination of S previous splits, we use the function: address = key mod (2^(S+1) * N).

Disadvantages
- Waste of space (unused buckets after each split).
- If two buckets n and n+j are in use, then all buckets between n and n+j must be available to the algorithm for inserting records.
- The history of insertions must be saved in order to access a record, because many related hashing functions are used in the search operation.

Dynamic hashing
In this scheme there is no positional relationship between the buckets: the position of one bucket is not related to the position of any other bucket, which was the case in virtual hashing. Initially the file is organized in N buckets. These buckets reside on the disk, but each of them is pointed to by a cell in memory, so the bucket address is not as important as in virtual hashing.
Suppose N (number of buckets) = 20 and C (bucket size in number of records) = 3. We have the following layout:

[Diagram: level-0 index elements 1, 2, 3, ... in main memory, each pointing to a bucket (Buc. 1, Buc. 2, Buc. 3) on secondary storage.]

A hash function transforms a key into the address of a level-0 index element (1 through 3 in this case), and the appropriate bucket is accessed through its pointer.
Insertion: the hashing function gives the address of an index element, and the record is inserted into the bucket pointed to by that node. If the target bucket is full, a new bucket is allocated to the file, the C + 1 records are redistributed between the two buckets, and the two buckets are then pointed to by the children of the original index node, as shown in the following diagram:
[Diagram: hash file after a bucket split. Index element 1 at level 0 now points to two level-1 index nodes, which point to the old bucket and the newly allocated bucket; the other elements still point directly to their buckets.]
Two questions:
1. When redistributing records between two buckets (a bucket split), how do we decide which record goes to which bucket?
2. A record search begins from the level-0 index. When searching for a record, how do we find which of the two buckets pointed to by the index node contains it?

Solution: a second function is used. Given a (usually primary) key, it returns a binary string of 1s and 0s of fixed, predefined length, referred to here as B(Key):

Key   H(Key)  B(Key)
157   2       10100...
95    1       00011...
88    1       01100...
205   2       10010...
13    1       10111...
125   1       10001...
6     1       01000...
301   1       00110...
How is B(Key) used? The bits of B identify the right bucket when inserting and when searching for a record. If the bucket being split is pointed to from a node at level I of the tree, then we use the (I + 1)st digit of B(key) to determine whether a record goes to the left (old) or the right (new) bucket. The following diagram shows the insertion of the first five records:
[Diagram: Bucket 1 holds 95, 88, 13; Bucket 2 holds 157, 205.]
File after 5 insertions.
Insert 125 and 6: hashing key 125 gives 1, so it should be inserted in the bucket pointed to by node 1 of level 0. That bucket is full, so it splits and a new bucket is allocated to the file. The full bucket was pointed to by a node at level 0, so the first digit of each B(key) string decides each record's destination bucket: records 95 and 88 (first bit 0) go to the left bucket, and records 13 and 125 (first bit 1) go to the right bucket. Record 6 (first bit 0) then goes into Bucket 1 without a split.
[Diagram: index element 1 now has two level-1 children, 0 and 1. Bucket 1 holds 95, 88, 6; Bucket 2 holds 157, 205; Bucket 3 holds 13, 125.]
Split after inserting 125; no split when inserting 6.
Insert record 301: H(301) = 1, so its B(key) takes it to Bucket 1. Bucket 1 is full, so a split occurs. The bucket being split is pointed to by a leaf at index level 1, so the 2nd bit of B(key) (index level 1 + 1 = 2) is used to distribute the old records and the new record 301. The structure of the file after inserting 301:
[Diagram: under index element 1, the level-1 node 0 now has two level-2 children: 0 points to Bucket 1 (95, 301) and 1 points to Bucket 3 (88, 6). Bucket 2 holds 157, 205; Bucket 4 holds 13, 125.]
Bucket split to accommodate 301.
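A minimal sketch of the index trie described above. The H and B tables are hard-coded from the table given earlier; a real implementation would compute both functions from the key:

    # H(key): level-0 index element; B(key): fixed-length bit string
    H = {157: 2, 95: 1, 88: 1, 205: 2, 13: 1, 125: 1, 6: 1, 301: 1}
    B = {157: "10100", 95: "00011", 88: "01100", 205: "10010",
         13: "10111", 125: "10001", 6: "01000", 301: "00110"}
    C = 3                                    # bucket capacity

    class Node:                              # leaf: bucket; internal: two children
        def __init__(self):
            self.bucket, self.children = [], None

    root = {1: Node(), 2: Node()}            # level-0 index elements

    def insert(key):
        node, bit = root[H[key]], 0
        while node.children:                 # descend using successive B(key) bits
            node = node.children[int(B[key][bit])]
            bit += 1
        node.bucket.append(key)
        if len(node.bucket) > C:             # overflow: split this leaf
            node.children = [Node(), Node()]
            for k in node.bucket:            # redistribute on the next bit
                node.children[int(B[k][bit])].bucket.append(k)
            node.bucket = []

    for k in [157, 95, 88, 205, 13, 125, 6, 301]:
        insert(k)
    # Buckets now: [95, 301], [88, 6], [13, 125] under element 1; [157, 205] under 2.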
Indexing and hashing
Hashing of any form is not good for sequential access, so a process that requires a few logically contiguous records may not be handled efficiently; indexing is the best organization for such requirements. The performance of an index organization degrades as the volume of updates and insertions increases. There are a large number of other file organizations, such as extendible hashing, linear hashing, and the B-tree family (B+-tree, B-tree, etc.), but all are based on some form of indexing or hashing. In a database system a number of different file organizations are used to store data. In
many cases the same data is stored in multiple file organizations to improve data retrieval for different types of queries (key queries, range queries, multi-key queries, etc.). Students are strongly advised to study these file organizations in other books.
Disk-Space Management
The system allocates disk space to file blocks. Disk-space management has two problems: (a) disk access is several orders of magnitude slower than memory access, and (b) there are orders of magnitude more blocks to deal with. Moreover, the variability and dynamic changes of file sizes make prediction of their resource requirements difficult and unreliable in most cases. A good space-management mechanism should take into consideration:
1. Processing speed of sequential and random access to files, and of allocation and deallocation of blocks.
2. Ability to make use of multisector and multitrack transfers.
3. Disk utilization.
4. Main-memory requirements of a given algorithm.

Processing speed: primarily the speed of sequential and random access to files, rather than the allocation and deallocation of blocks.
Multisector and multitrack transfer: the least number of I/O commands for large transfers, i.e., an entire track or a large number of sectors.
Disk utilization: the percentage of disk space allocatable to files, i.e., minimal fragmentation.
Main-memory utilization: low table fragmentation.

Methods of space allocation: contiguous and noncontiguous.

Contiguous
In this method disk blocks are allocated contiguously, which also maintains logical contiguity. If we represent the disk space as a two-dimensional matrix, a contiguous allocation can be arranged as follows (LB indicates a logical block of the file).
[Diagram: disk blocks 0-19 drawn as a grid; the file's logical blocks LB0-LB4 occupy the contiguous physical blocks 10-14.]
Steps:
1. Estimate the file size.
2. Search the BFD (Basic File Directory) to find (first fit or best fit) contiguous disk blocks.
3. Allocate the set of disk blocks to the file.
4. Update the BFD.
5. Update the free-disk table.
Possible access types: sequential and random.
Sequential: the address of the first block and the file size are given in the BFD; read all the blocks sequentially.
Random: address of the first block + (block number * block size).
Sequential access is fast, and multitrack or multisector transfers can be used conveniently.
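The random-access rule is a single multiply and add; a tiny sketch using the figure's layout (LB0 starting at physical block 10):

    def byte_address(first_block_addr, block_no, block_size):
        # address of the first block + (block number * block size)
        return first_block_addr + block_no * block_size

    # LB0 starts at physical block 10, so its byte address is 10 * 2400:
    print(byte_address(10 * 2400, 3, 2400) // 2400)   # -> 13: LB3 is block 13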
Deallocation: easy. Find the file in the BFD and overwrite just the type field, leaving the address and size fields as they are.

Advantages of contiguous allocation
- Easy implementation.
- Very little directory and memory space is needed for keeping track of contiguous files.

Disadvantages
- Disk fragmentation: internal and external.
- File size must be pre-declared and pre-claimed, especially when creating files (this is all right when copying files). For example, what would be the size of a file created with an editor?
- An overflow area must be provided for managing file expansion.
Noncontiguous allocation: chaining
A disk-based version of a linked list. Implementation: a few bytes of each disk block are reserved for a forward pointer to the next block in the sequence, and the directory contains the head pointer. The following figure shows the allocation.
[Diagram: chained allocation. The directory points to block 10, which holds LB0 (next = 3); block 3 holds LB1 (next = 18); block 18 holds LB2 (next = 20); block 20 holds LB3 (next = 8); block 8 holds LB4 (next = nil).]
Properties
- Well suited for sequential access, but multisector transfer is difficult.
- Direct access is very slow; in fact it is not easily possible, since all intermediate disk blocks must be visited before reaching the desired block.
- Expansion and contraction of files is easily managed, so pre-declaring the file size is not necessary.
- Minimal external fragmentation, so disk compaction is not necessary.
- The space for the disk pointers is usually below 1%, so the effect on disk utilization is negligible.
- Allows easy handling of bad blocks by omitting them from both the file chain and the free chain.
- Insertion and deletion of file blocks in the middle of the chain is easy.
- Pointers are sensitive data, and their correctness can be compromised.
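As a sketch of why direct access is slow under chaining, the following follows the pointer chain from the figure above; reaching logical block i costs i intermediate block reads:

    # forward pointer stored in each allocated block (from the figure)
    next_block = {10: 3, 3: 18, 18: 20, 20: 8, 8: None}
    head = 10                            # head pointer kept in the directory

    def physical_block(i):
        b = head
        for _ in range(i):               # every intermediate block must be read
            b = next_block[b]
        return b

    print(physical_block(4))             # -> 8 (LB4), after visiting 10, 3, 18, 20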
Noncontiguous allocation: indexing
Based on the indexed sequential idea. The first disk block of the file is reserved as an index block, and the address of the index block is stored in the file directory. The following figure illustrates the allocation.

[Diagram: indexed allocation. Block 0 is the index block; its entries point to the data blocks: LB0 in block 4, LB1 in block 17, LB2 in block 18, LB3 in block 7, LB4 in block 16.]
Properties
- Both random and sequential access of indexed files requires one disk access per level of indexing to locate the address of the target block.
- Multisector transfer is not easy.