File System
Himadri Vaidya
File Concept
Computers can store information on various storage media such as, magnetic disks,
magnetic tapes, optical disks.
The physical storage is converted into a logical storage unit by operating system.
The logical storage unit is called FILE.
A file is a collection of similar records.
A record is a collection of related fields that can be treated as a unit by some application
program.
A field is some basic element of data.
Any individual field contains a single value.
A data base is collection of related data
<date/time> <footer> 2
DATA FILE
Student name, Marks in sub1, sub2, Fail/Pass is fields.
The collection of fields is called a RECORD.
RECORD:
Collection of these records is called a data file.
<date/time> <footer> 3
FILE ATTRIBUTES
Name : A file is named for the convenience of the user and is referred by its name. A name is usually
a string of characters.
Identifier : This unique tag, usually a number ,identifies the file within the file system.
Type : Files are of so many types. The type depends on the extension of the file.
Example:
.exe Executable file
.obj Object file
.src Source file
Location : This information is a pointer to a device and to the location of the file on that device.
Size : The current size of the file (in bytes, words,blocks).
Protection : Access control information determines
<date/time> <footer>who can do reading, writing, executing and
4 so on.
FILE OPERATIONS
Creating a file : Two steps are needed to create a file. They are:
Check whether the space is available or not.
If the space is available then made an entry for the new file in the directory. The entry includes name of the file, path of the
file,etc...
Writing a file : To write a file, we have to know 2 things.
One is name of the file and second is the information or data to be written on the file, the system searches the entire given
location for the file.
If the file is found, the system must keep a write pointer to the location in the file where the next write is to take place.
Reading a file : To read a file, first of all we search the directories for the file, if the file is found, the system needs
to keep a read pointer to the location in the file where the next read is to take place. Once the read has taken
place, the read pointer is updated.
Repositioning within a file : The directory is searched for the appropriate entry and the current file position
pointer is repositioned to a given value. This operation is also called file seek.
Deleting a file : To delete a file, first of all search the directory for named file, then released the file space and
<date/time> <footer> 5
erase the directory entry.
FILE TYPES
The name of the file split into 2 parts.
One is name and second is Extension.
The file type is depending on extension of the file.
<date/time> <footer> 6
FILE TYPES
<date/time> <footer> 7
FILE ACCESS METHODS
Files stores information, this information must be accessed and
read into computer memory.
There are so many ways that the information in the file can be
accessed.
Sequential file access
Direct access
Indexed Sequential File access
<date/time> <footer> 8
Sequential file access
Information in the file is processed in order i.e. one record after
the other.
Magnetic tapes are supporting this type of file accessing.
Eg : A file consisting of 100 records, the current position of read/write
head is 45th record, suppose we want to read the 75th record then, it
access sequentially from 45, 46, 47 ........ 74, 75.
So the read/write head traverse all the records between 45 to 75.
<date/time> <footer> 9
Direct access
Direct access is also called relative access.
Here records can read/write randomly without any order.
The direct access method is based on a disk model of a file, because disks allow
random access to any file block.
Eg : A disk containing of 256 blocks, the position of read/write head is at 95th block.
The block is to be read or write is 250th block.
Then we can access the 250th block directly without any restrictions.
Eg : CD consists of 10 songs, at present we are listening song 3, If we want to listen
song 10, we can shift to 10.
<date/time> <footer> 10
Indexed Sequential File access
The main disadvantage in the sequential file is,
it takes more time to access a Record.
Records are organized in sequence based on
a key field.
Eg : A file consisting of 60000 records,the master
index divide the total records into 6 blocks, each
block consisting of a pointer to secondary index.
The secondary index divide the 10,000 records
into 10 indexes.
Each index consisting of a pointer to its orgin
allocation.
Each record in the index file consisting of 2 field, A
key field and a pointer field.
<date/time> <footer> 11
DIRECTORY STRUCTURE
Sometimes the file system consisting of millions of files,at that
situation it is very hard to manage the files.
To manage these files grouped these files and load one group
into one partition.
Each partition is called a directory .
A directory structure provides a mechanism for organizing many
files in the file system.
<date/time> <footer> 12
OPERATION ON THE DIRECTORIES
Search for a file : Search a directory structure for required file.
Create a file : New files need to be created, added to the directory.
Delete a file : When a file is no longer needed,we want to remove it from the
directory.
List a directory : We can know the list of files in the directory.
Rename a file : When ever we need to change the name of the file, we can
change the name.
Traverse the file system : We need to access every directory and every file
with in a directory structure we can
<date/time> traverse the file system.
<footer> 13
Directory Structures
Single level directory
Two level directory
Tree structured directory
Acyclic graph directory
General graph directory
<date/time> <footer> 14
Single level directory
The directory system having only one directory, it consisting of
all files some times it is said to be root directory.
<date/time> <footer> 15
Two level directory
The problem in single level directory is different user may be
accidentally use the same name for their files.
To avoid this problem each user need a private directory,
Names chosen by one user don't interfere with names chosen
by a different user.
<date/time> <footer> 16
Tree Structured Directory
Two level directory eliminates name conflicts among
users but it is not satisfactory for users with a large
number of files.
To avoid this create the sub- directory and load the
same type of files into the sub-directory.
So, here each can have as many directories are
needed.
There are 2 types of path:
Absolute path
Relative path
Absolute path : Begging with root and follows a path
down to specified files giving directory, directory name
on the path.
<date/time> <footer> 17
Acyclic Graph Directory
Multiple users are working on a project, the project files can be stored in a
comman sub-directory of the multiple users.
This type of directory is called acyclic graph directory.
The common directory will be declared a shared directory.
The graph contain no cycles with shared files, changes made by one user
are made visible to other users.
A file may now have multiple absolute paths.
When shared directory/file is deleted, all pointers to the directory/ files also to
be removed.
<date/time> <footer> 18
General graph directory
When we add links to an
existing tree structured
directory, the tree structure
is destroyed, resulting is a
simple graph structure.
Advantages :- Traversing
is easy. Easy sharing is
possible.
<date/time> <footer> 19
File system structure
Disk provides the bulk of secondary storage on which a file
system is maintained.
They have 2 characteristics that make them a convenient
medium for storing multiple files.
A disk can be rewritten in place.
It is possible to read a block from the disk, modify the block, and write
it back into same place.
A disk can access directly any block of information it contains.
<date/time> <footer> 20
File system structure
<date/time> <footer> 21
File system structure
I/O Control consists of device drivers and interrupt handlers to transfer
information between the main memory and the disk system.
The device driver writes specific bit patterns to special locations in the
I/O controller’s memory to tell the controller which device location to
act on and what actions to take.
The Basic File System needs only to issue commands to the
appropriate device driver to read and write physical blocks on the disk.
Each physical block is identified by its numeric disk address (Eg. Drive
1, cylinder 73, track2, sector 10).
<date/time> <footer> 22
File system structure
The File Organization Module knows about files and their logical blocks and
physical blocks.
By knowing the type of file allocation used and the location of the file, file
organization module can translate logical block address to physical
addresses for the basic file system to transfer.
Each file’s logical blocks are numbered from 0 to n.
So, physical blocks containing the data usually do not match the logical
numbers.
A translation is needed to locate each block.
<date/time> <footer> 23
File system structure
The Logical File System manages all file system structure
except the actual data (contents of file).
It maintains file structure via file control blocks.
A file control block contains information about the file,
ownership, permissions, location of the file contents.
<date/time> <footer> 24
Allocation Methods
Contiguous allocation
Non-Contiguous allocation
<date/time> <footer> 25
Contiguous allocation
An allocation method refers to how disk
blocks are allocated for files:
Contiguous allocation – each file
occupies set of contiguous blocks.
Best performance in most cases
Simple – only starting location (block #)
and length (number of blocks) are
required
Problems include finding space for file,
knowing file size, external fragmentation,
need for compaction off-line (downtime)
or on-line
<date/time> <footer> 26
Linked allocation
Each file a linked list of blocks.
File ends at nil pointer
No external fragmentation
Each block contains pointer to next block
No compaction, external fragmentation
Free space management system called when new block needed
Improve efficiency by clustering blocks into groups but increases internal fragmentation
Reliability can be a problem
Locating a block can take many I/Os and disk seeks FAT (File Allocation Table) variation
<date/time> <footer> 27
Beginning of volume has table, indexed by block number
Linked allocation
<date/time> <footer> 28
Indexed allocation
Each file has its own index block(s) of pointers to its data blocks
<date/time> <footer> 29
Free-Space Management
File system maintains free-space list to track available
blocks/clusters Linked list (free list)
Cannot get contiguous space easily
No waste of space
No need to traverse the entire list
<date/time> <footer> 30
Bitmap or Bit vector
A Bitmap or Bit Vector is series or collection of bits where each
bit corresponds to a disk block.
The bit can take two values: 0 and 1: 0 indicates that the block is
allocated and 1 indicates a free block.
The given instance of disk blocks on the disk in Figure 1 (where
green blocks are allocated) can be represented by a bitmap of
16 bits as: 0000111000000110.
Advantages –
Simple to understand.
Finding the first free block is efficient.
It requires scanning the words (a group of 8 bits) in a bitmap for a
non-zero word.
(A 0-valued word has all bits 0).
<date/time> <footer> 31
The first free block is then found by scanning for the first 1 bit in the
Linked Free Space List on Disk
In this approach, the free disk blocks
are linked together i.e. a free block
contains a pointer to the next free
block.
The block number of the very first disk
block is stored at a separate location
on disk and is also cached in memory.
<date/time> <footer> 32
Grouping
Modify linked list to store address of next n-1 free blocks in first
free block, plus a pointer to next block that contains free-block-
pointers.
An advantage of this approach is that the addresses of a group
of free disk blocks can be found easily.
<date/time> <footer> 33
Counting
Because space is frequently contiguously used and freed, with
contiguous- allocation allocation, extents, or clustering.
Keep address of first free block and count of following free
blocks.
Free space list then has entries containing addresses and
counts.
<date/time> <footer> 34
Directory Implementation
Linear List
Hash Table
<date/time> <footer> 35
Linear List
In this algorithm, all the files in a directory are maintained as singly lined list.
Each file contains the pointers to the data blocks which are assigned to it
and the next file in the directory.
Characteristics:
When a new file is created, then the entire list is checked whether the new file name
is matching to a existing file name or not. In case, it doesn't exist, the file can be
created at the beginning or at the end. Therefore, searching for a unique name is a
big concern because traversing the whole list takes time.
The list needs to be traversed in case of every operation (creation, deletion, updating,
etc) on the files therefore the systems become inefficient.
<date/time> <footer> 36
Hash Table
To overcome the drawbacks of singly linked list implementation of directories,
there is an alternative approach that is hash table.
This approach suggests to use hash table along with the linked lists.
A key-value pair for each file in the directory gets generated and stored in the
hash table.
The key can be determined by applying the hash function on the file name while
the key points to the corresponding file stored in the directory.
Now, searching becomes efficient due to the fact that now, entire list will not be
searched on every operating.
Only hash table entries are checked using the key and if an entry found then the
corresponding file will be fetched using
<date/time> the value.
<footer> 37
Hash Table
<date/time> <footer> 38
Efficiency and Performance
Efficiency dependent on:
Disk allocation and directory algorithms
Types of data kept in file’s directory entry
Performance
Disk cache – separate section of main memory for frequently used
blocks
free-behind and read-ahead techniques to optimize sequential access
improve PC performance by dedicating section of memory as virtual disk,
or RAM disk
<date/time> <footer> 39
I/O Hardware
I/O devices
Input/output devices are the devices that are responsible for the
input/output operations in a computer system.
Basically there are following two types of input/output devices:
Block devices
Character devices
<date/time> <footer> 40
Block Devices
A block device stores information in block with fixed-size and
own-address.
It is possible to read/write each and every block independently
in case of block device.
In case of disk, it is always possible to seek another cylinder
and then wait for required block to rotate under head without
mattering where the arm currently is.
Therefore, disk is a block addressable device.
<date/time> <footer> 41
Character Devices
A character device accepts/delivers a stream of characters
without regarding to any block structure.
Character device isn't addressable.
Character device doesn't have any seek operation.
There are too many character devices present in a computer
system such as printer, network interfaces etc.
<date/time> <footer> 42