File
A file is a named collection of related information that is recorded on secondary storage such as
magnetic disks, magnetic tapes and optical disks.
It also serves as the medium through which a program receives its input and delivers its
output.
In general, a file is a sequence of bits, bytes, or records whose meaning is defined by the file
creator and user.
Every file has a logical location that is used for its storage and retrieval.
File structure
A file structure must follow a format that the operating system can understand.
Three common file structures in an OS:
A text file: It is a series of characters that is organized in lines.
An object file: It is a series of bytes that is organized into blocks.
A source file: It is a series of functions and procedures.
File Type
File type refers to the ability of the operating system to distinguish between different types of
files, such as text files, source files and binary files.
Most operating systems support several types of files.
Ordinary files:
These are the files that contain user information.
These may contain text, databases or executable programs.
The user can apply various operations to such files, such as adding, modifying or deleting
content, or even removing the entire file.
Directory files:
These files contain a list of file names and other information related to those files.
Special files:
These files are also known as device files.
These files represent physical devices such as disks, terminals, printers, networks and tape drives.
These files are of two types −
Character special files − data is handled character by character as in case of terminals or printers.
Block special files − data is handled in blocks as in the case of disks and tapes.
File Attributes
Name: It is the only information stored in a human-readable form.
Identifier: Every file is identified by a unique tag number within a file system known as an
identifier.
Location: Points to file location on device.
Type: This attribute is required for systems that support various types of files.
Size: This attribute records the current size of the file.
Protection: This attribute assigns and controls the access rights for reading, writing, and executing
the file.
Time, date and security: This information is used for protection, security and usage monitoring.
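As a rough illustration, most of these attributes can be inspected from a program. The sketch below assumes a system where Python's os.stat is available; the temporary file and the dictionary keys are made up for the example, and the inode number is used as a stand-in for the identifier.

```python
import os
import stat
import tempfile
import time

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as f:
    f.write(b"hello")
    path = f.name

info = os.stat(path)
attributes = {
    "name": os.path.basename(path),               # human-readable name
    "identifier": info.st_ino,                    # unique number within the file system
    "size": info.st_size,                         # current size in bytes
    "protection": stat.filemode(info.st_mode),    # access rights as a string
    "modified": time.ctime(info.st_mtime),        # time of last modification
}
for key, value in attributes.items():
    print(f"{key}: {value}")

os.remove(path)
```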
Operations on the File
Create
Creation of the file is the most important operation on the file.
Different types of files are created by different methods.
For example, text editors are used to create text files, word processors are used to create word
documents and image editors are used to create image files.
Write
Writing the file is different from creating the file.
The OS maintains a write pointer for every file, which points to the position in the file at which the
next data will be written.
Read
A file can be opened in one of three modes: read, write and append.
A read pointer is maintained by the OS, pointing to the position up to which the data has been read.
Re-position
Re-positioning is simply moving the file pointers forward or backward depending upon the
user's requirement.
It is also called seeking.
Delete
Deleting the file not only deletes all the data stored inside the file, it also deletes all the
attributes of the file.
The space which is allocated to the file will now become available and can be allocated to the
other files.
Truncate
Truncating deletes the contents of the file while keeping its attributes.
The file itself is not deleted, but the information stored inside it is erased and can be replaced.
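The operations above can be tried out with a short sketch. It uses Python's built-in file API on a temporary file; the file name demo.bin and the variable names are only illustrative.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo.bin")

# Create + write: mode "wb" creates the file and starts writing at offset 0.
with open(path, "wb") as f:
    f.write(b"hello world")

# Read + re-position: read() advances the pointer, seek() moves it explicitly.
with open(path, "rb") as f:
    first = f.read(5)     # pointer is now at offset 5
    f.seek(6)             # re-position (seek) to offset 6
    rest = f.read()       # read from offset 6 to the end

# Truncate: the contents are discarded, the file and its attributes remain.
with open(path, "r+b") as f:
    f.truncate(0)
size_after_truncate = os.path.getsize(path)
exists_after_truncate = os.path.exists(path)

# Delete: the file and its attributes are removed and the space is released.
os.remove(path)
exists_after_delete = os.path.exists(path)
os.rmdir(workdir)

print(first, rest, size_after_truncate, exists_after_truncate, exists_after_delete)
```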
File Access Methods
File access mechanism refers to the manner in which the records of a file may be accessed.
There are several ways to access files −
Sequential access
Direct/Random access
Indexed sequential access
Sequential access
In sequential access, the records are accessed in some sequence, i.e., the
information in the file is processed in order, one record after the other.
This access method is the most primitive one. Example: Compilers usually access files in this
fashion.
Direct/Random access
Random access file organization provides direct access to the records.
Each record has its own address in the file, with the help of which it can be directly accessed
for reading or writing.
The records need not be in any sequence within the file and they need not be in adjacent
locations on the storage medium.
Indexed sequential access
This mechanism is built on top of sequential access.
An index is created for each file, which contains pointers to various blocks.
The index is searched sequentially and its pointer is used to access the file directly.
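The three access methods can be compared with a small sketch. It assumes fixed-size records (a 4-byte id followed by a 16-byte name), which is a layout chosen only for this example, and it uses an in-memory byte stream in place of a real file.

```python
import io
import struct

RECORD = struct.Struct("<i16s")   # hypothetical layout: 4-byte id + 16-byte name

def pack(rec_id, name):
    return RECORD.pack(rec_id, name.encode().ljust(16, b"\0"))

def unpack(raw):
    rec_id, name = RECORD.unpack(raw)
    return rec_id, name.rstrip(b"\0").decode()

# An in-memory "file" holding three records.
data = [(10, "ann"), (20, "bob"), (30, "cy")]
f = io.BytesIO(b"".join(pack(i, n) for i, n in data))

def read_sequential(f):
    """Sequential access: process records in order, one after the other."""
    f.seek(0)
    records = []
    while True:
        chunk = f.read(RECORD.size)
        if not chunk:
            break
        records.append(unpack(chunk))
    return records

def read_direct(f, n):
    """Direct access: jump straight to record number n by computing its address."""
    f.seek(n * RECORD.size)
    return unpack(f.read(RECORD.size))

# Indexed sequential access: a small index maps each key to a record number.
index = [(rec_id, pos) for pos, (rec_id, _) in enumerate(read_sequential(f))]

def read_indexed(f, key):
    """Search the index sequentially, then use its pointer for direct access."""
    for rec_id, pos in index:
        if rec_id == key:
            return read_direct(f, pos)
    raise KeyError(key)

print(read_sequential(f))
print(read_direct(f, 1))
print(read_indexed(f, 30))
```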
Directory
A directory is a container that holds files and other directories (folders).
It organizes files and folders in a hierarchical manner.
The directory may store some or all of the file attributes.
A hard disk can be divided into a number of partitions of different sizes. The partitions are also
called volumes or mini disks.
Each partition must have at least one directory in which, all the files of the partition can be listed.
A directory entry is maintained for each file in the directory which stores all the information
related to that file.
A directory can be viewed as a file that contains the metadata of a group of files. Every
directory supports a number of common operations on files:
File Creation
Search for the file
File deletion
Renaming the file
Traversing Files
Listing of files
Directory Structure
The File name field contains the name of the concerned file in the directory.
The Type field indicates the kind or category of the file.
The Location Info field indicates the location where the file is stored.
The Protection Info field records whether the file can be accessed by other users of
the system.
The Flag field contains the kind of directory entry: the value D indicates that the file is a
directory, the value L indicates that the file is a link, and the value M indicates that the file is a
mounted file system.
The Misc Info field contains miscellaneous information such as the owner of the
file, the time of its creation and the time at which the file was last modified.
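A directory entry with these fields could be modelled roughly as follows. This is only a sketch: the class name, the field types and the sample values are assumptions made for illustration, not the layout of any particular file system.

```python
from dataclasses import dataclass, field

@dataclass
class DirectoryEntry:
    file_name: str
    type: str
    location_info: int                 # assumed here to be a starting block number
    protection_info: str
    flag: str                          # "D" directory, "L" link, "M" mounted file system
    misc_info: dict = field(default_factory=dict)

# A directory is then a collection of such entries (sample values are made up).
directory = [
    DirectoryEntry("docs", "dir", 120, "rwxr-xr-x", "D", {"owner": "alice"}),
    DirectoryEntry("a.txt", "text", 348, "rw-r--r--", "", {"owner": "alice"}),
]

def search(directory, name):
    """Search for a file by scanning the directory entries one by one."""
    for entry in directory:
        if entry.file_name == name:
            return entry
    return None

print(search(directory, "docs"))
```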
Types of Directory Structures
Single Level Directory
Two Level Directory
Tree Structured Directory
Acyclic-Graph Structured Directories
Single Level Directory
The simplest method is to have one big list of all the files on the disk.
The entire system will contain only one directory which is supposed to mention all the files
present in the file system.
The directory contains one entry for each file present on the file system.
Advantages:
Implementation is very simple.
If the number of files is small, searching is fast.
File creation, searching and deletion are very simple since there is only one directory.
Disadvantages:
We cannot have two files with the same name.
The directory may be very big, so searching for a file may take a long time.
Protection cannot be implemented for multiple users.
There is no way to group files of the same kind.
Choosing a unique name for every file is difficult.
Two Level Directory
In two level directory systems, we can create a separate directory for each user.
There is one master directory which contains separate directories dedicated to each user.
For each user, there is a separate directory at the second level, containing that user's
files.
The system doesn't let a user enter another user's directory without permission.
Characteristics:
Each file has a path name of the form /user-name/file-name.
Different users can have the same file name.
Searching becomes more efficient as only one user's list needs to be traversed.
Files of the same kind cannot be grouped into a separate directory for a particular user.
The operating system maintains a variable such as PWD, which contains the present directory name
(present user name), so that searching can be done appropriately.
Tree Structured Directory
In a tree structured directory system, any directory entry can be either a file or a sub directory.
Files of a similar kind can now be grouped in one directory.
Each user has their own directory and cannot enter another user's directory.
However, a user has permission to read the root directory's data but cannot write or modify it.
Only the administrator of the system has complete access to the root directory.
Searching is more efficient in this directory structure. The concept of a current working directory is
used.
A file can be accessed by two types of path, either relative or absolute.
Absolute path is the path of the file with respect to the root directory of the system while relative
path is the path with respect to the current working directory of the system.
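The relation between the two kinds of path can be shown with a short sketch. The directory and file names are hypothetical, and POSIX-style paths are assumed.

```python
import posixpath

cwd = "/home/alice/projects"      # hypothetical current working directory
relative = "notes/todo.txt"       # path relative to the current working directory

# Absolute path = current working directory + relative path, starting from the root.
absolute = posixpath.normpath(posixpath.join(cwd, relative))

# Going the other way: express the absolute path relative to the working directory.
back = posixpath.relpath(absolute, start=cwd)

print(absolute)
print(back)
```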
Acyclic-Graph Structured Directories
The tree structured directory system doesn't allow the same file to exist in multiple directories,
therefore sharing is a major concern in a tree structured directory system.
We can provide sharing by making the directory an acyclic graph.
In this system, two or more directory entries can point to the same file or sub directory.
That file or sub directory is shared between the directory entries.
These kinds of directory graphs can be made using links or aliases.
We can have multiple paths for the same file.
Links can be either symbolic (logical, also called soft links) or hard (physical).
If a file gets deleted in an acyclic graph structured directory system, then:
In the case of a soft link, the file just gets deleted and we are left with a dangling pointer.
In the case of a hard link, the actual file is deleted only when all the references to it are deleted.
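The difference between the two kinds of link on deletion can be observed with a small experiment. This sketch assumes a POSIX-like system where os.link and os.symlink are permitted; the file names are made up.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "data.txt")
hard = os.path.join(workdir, "hard.txt")
soft = os.path.join(workdir, "soft.txt")

with open(target, "w") as f:
    f.write("shared")

os.link(target, hard)       # hard link: a second directory entry for the same file
os.symlink(target, soft)    # symbolic link: a small file that stores the target's path

links_before = os.stat(target).st_nlink   # number of hard references to the file

os.remove(target)           # delete the original directory entry

# The hard link still reaches the data, because one reference remains.
with open(hard) as f:
    hard_content = f.read()

# The symbolic link still exists but now points at nothing (dangling pointer).
soft_dangling = os.path.islink(soft) and not os.path.exists(soft)

print(links_before, hard_content, soft_dangling)

os.remove(hard)
os.remove(soft)
os.rmdir(workdir)
```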
Mass-storage structure (Disk structure)
Secondary storage devices are those devices whose memory is non-volatile.
Secondary storage is also called auxiliary storage.
Secondary storage is less expensive than primary memory such as RAM.
A few examples are magnetic disks and magnetic tapes.
Magnetic Disk Structure
Most of the secondary storage is in the form of magnetic disks.
Knowing the structure of a magnetic disk is necessary to understand how the data on the disk is
accessed by the computer.
A magnetic disk contains several platters. Each platter is divided into circular shaped tracks. The
length of the tracks near the center is less than the length of the tracks farther from the center. Each
track is further divided into sectors, as shown in the figure.
Tracks at the same distance from the center on all platters form a cylinder. A read-write head is
used to read data from, and write data to, a sector of the magnetic disk.
The speed of the disk is measured in two parts:
Transfer rate: This is the rate at which the data moves from disk to the computer.
Random access time: It is the sum of the seek time and rotational latency.
Seek time is the time taken by the arm to move to the required track. Rotational latency is
the time taken for the required sector to rotate under the read-write head.
Disk Access Time = Rotational Latency + Seek Time + Transfer Time
Question 01: Suppose there are 8 platters in a hard disk drive. As
each platter has two surfaces, there are 16 surfaces in the hard
drive. Therefore, 16 R/W heads are required. Suppose there are
1,024 cylinders and 128 sectors in each track. The sector size is 512
bytes.
Then A) Calculate the disk size. B) How many bits are required to
represent the disk size?
Solution:
A) Size of Hard Disk = Cylinders x Heads x Sectors x Sector-Size
= 1,024 x 16 x 128 x 512 Bytes
= 2^10 x 2^4 x 2^7 x 2^9 Bytes
= 2^30 Bytes = 1 GB
B) As 1 GB = 2^30 bytes, 30 bits are required to represent a 1 GB
hard disk.
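The arithmetic in this question can be checked with a few lines of code. The variable names are chosen for this sketch only; the numbers come from the question.

```python
cylinders = 1024
heads = 16                # 8 platters x 2 surfaces
sectors_per_track = 128
sector_size = 512         # bytes

# Disk size = cylinders x heads x sectors per track x sector size
disk_bytes = cylinders * heads * sectors_per_track * sector_size

# Bits needed to give every byte on the disk its own address.
address_bits = (disk_bytes - 1).bit_length()

print(disk_bytes, "bytes =", disk_bytes // 2**30, "GB")
print(address_bits, "bits")
```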
Disk Scheduling Algorithms
First Come First Serve
Shortest Seek Time First (SSTF)
SCAN algorithm
C-SCAN algorithm
First Come First Serve
This algorithm services requests in the same order in which they arrive. Let's take an example
where the queue has the following requests with cylinder numbers as follows:
(82,170,43,140,24,16,190)
And current position of Read/Write head is : 50.
So, total seek time:
=(82-50)+(170-82)+(170-43)+(140-43)+(140-24)+(24-16)+(190-16) =642
Advantages:
Every request gets a fair chance
No indefinite postponement
Disadvantages:
Does not try to optimize seek time
May not provide the best possible service
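A minimal sketch of the FCFS calculation, using the request queue and head position from the example. The function name fcfs_seek is made up for illustration, and seek time is counted as the number of cylinders the head moves.

```python
def fcfs_seek(requests, head):
    """Total head movement when requests are served strictly in arrival order."""
    total = 0
    for cylinder in requests:
        total += abs(cylinder - head)   # distance from current position to the request
        head = cylinder                 # the head is now at the serviced cylinder
    return total

requests = [82, 170, 43, 140, 24, 16, 190]
print(fcfs_seek(requests, head=50))   # 642
```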
Shortest Seek Time First (SSTF)
Here the request which is closest to the current head position is served first. Consider the
previous example where the disk queue looks like,
(82,170,43,140,24,16,190)
And current position of Read/Write head is : 50
So, total seek time:
=(50-43)+(43-24)+(24-16)+(82-16)+(140-82)+(170-140)+(190-170) =208
Advantages:
Average Response Time decreases
Throughput increases
Disadvantages:
Overhead to calculate seek time in advance
Can cause Starvation for a request if it has higher seek time as compared to incoming requests
High variance of response time as SSTF favours only some requests
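A minimal sketch of SSTF on the same example. The function name sstf_seek is hypothetical, and seek time is counted as the number of cylinders the head moves.

```python
def sstf_seek(requests, head):
    """Repeatedly serve the pending request closest to the current head position."""
    pending = list(requests)
    total = 0
    order = []
    while pending:
        nearest = min(pending, key=lambda c: abs(c - head))
        total += abs(nearest - head)
        head = nearest
        pending.remove(nearest)
        order.append(nearest)
    return total, order

total, order = sstf_seek([82, 170, 43, 140, 24, 16, 190], head=50)
print(order)   # [43, 24, 16, 82, 140, 170, 190]
print(total)   # 208
```

The min() call inside the loop is the "overhead to calculate seek time in advance" listed among the disadvantages: every scheduling decision has to look at all pending requests.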
SCAN algorithm
This algorithm is also called the elevator algorithm because of its behavior.
In this algorithm, the head moves in one direction, serving all the requests in its path, until it
reaches the end of the disk.
After that, it reverses its direction and serves the remaining requests in its
path.
This behavior is similar to that of an elevator. Let's take the previous example,
82,170,43,140,24,16,190, on a disk with cylinders 0-199. The Read/Write arm is at 50,
and it is also given that the disk arm should move “towards the larger value”.
Therefore, the seek time is calculated as:
=(199-50)+(199-16) =332
Advantages:
High throughput
Low variance of response time
Good average response time
Disadvantages:
Long waiting time for requests for locations just visited by disk arm
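A minimal sketch of SCAN for the case where the head first moves towards larger cylinder numbers, as in the example. The function name scan_seek and the parameter last_cylinder are assumptions for this sketch; the disk is taken to have cylinders 0-199.

```python
def scan_seek(requests, head, last_cylinder):
    """SCAN with the head initially moving towards larger cylinder numbers."""
    upper = sorted(c for c in requests if c >= head)
    lower = sorted((c for c in requests if c < head), reverse=True)
    # Serve everything on the way up, touch the end of the disk, then sweep back down.
    path = upper + [last_cylinder] + lower
    total = 0
    for cylinder in path:
        total += abs(cylinder - head)
        head = cylinder
    return total

print(scan_seek([82, 170, 43, 140, 24, 16, 190], head=50, last_cylinder=199))   # 332
```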
C-SCAN algorithm
It stands for "Circular SCAN". This algorithm is almost the same as the SCAN algorithm; the
difference lies in what the head does after it reaches one end of the disk.
The disk arm moves toward the end of the disk and serves the requests in its path.
After reaching the end of the disk, it reverses its direction and moves to the other
end of the disk, but while going back it does not serve any requests. It then starts serving
requests again in the original direction.
Advantages:
• The waiting time is uniformly distributed among the requests.
• Response time is good in it.
Disadvantages:
• The time taken by the disk arm to locate a spot is increased here.
• The head keeps going to the end of the disk.
Suppose the requests to be addressed are 82,170,43,140,24,16,190 on a disk with cylinders 0-199. The
Read/Write arm is at 50, and it is also given that the disk arm should move “towards the larger value”.
Seek time is calculated as:
=(199-50)+(199-0)+(43-0) =391
Advantages:
Provides more uniform wait time compared to SCAN
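A minimal sketch of C-SCAN for the same example. The function name cscan_seek and the parameter last_cylinder are made up for illustration, and the return sweep to cylinder 0 is counted as head movement, as in the calculation above.

```python
def cscan_seek(requests, head, last_cylinder):
    """C-SCAN with the head initially moving towards larger cylinder numbers."""
    upper = sorted(c for c in requests if c >= head)
    lower = sorted(c for c in requests if c < head)
    # Serve requests on the way up and continue to the end of the disk.
    path = upper + [last_cylinder]
    if lower:
        # Return to cylinder 0 without serving, then serve the rest going up again.
        path += [0] + lower
    total = 0
    for cylinder in path:
        total += abs(cylinder - head)
        head = cylinder
    return total

print(cscan_seek([82, 170, 43, 140, 24, 16, 190], head=50, last_cylinder=199))   # 391
```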
LOOK - scheduling algorithm
In this algorithm, the disk arm moves only as far as the last request in its direction of
travel, servicing requests along the way. After reaching the last request, it reverses its
direction and services the requests on the way back.
It does not go to the end of the disk; instead, it goes only to the end of the requests.
Advantages:
• Starvation does not occur.
• Since the head does not go to the end of the disk, the time is not wasted
here.
Disadvantage:
• There is extra overhead in finding the last request in each direction.
Example: Consider a disk having 200 tracks (0-199). The request sequence (82,170,43,140,24,16,190)
is shown in the given figure and the head position is at 50.
Solution: The disk arm starts from 50 and serves requests in one direction,
but instead of going to the end of the disk, it goes only to the last request, i.e. 190.
It then reverses and serves the remaining requests on the way back, down to the last
request on the other side, i.e. 16.
Hence, Seek time =(190-50) + (190-16) =314
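A minimal sketch of LOOK for this example, with the head first moving towards larger track numbers. The function name look_seek is hypothetical.

```python
def look_seek(requests, head):
    """LOOK with the head initially moving towards larger track numbers."""
    upper = sorted(c for c in requests if c >= head)
    lower = sorted((c for c in requests if c < head), reverse=True)
    # Go up only as far as the last request, then reverse and serve the rest.
    total = 0
    for track in upper + lower:
        total += abs(track - head)
        head = track
    return total

print(look_seek([82, 170, 43, 140, 24, 16, 190], head=50))   # 314
```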
C-LOOK - scheduling algorithm
The C-LOOK algorithm is almost the same as the LOOK algorithm. The only
difference is that after reaching the last request in one direction, the head reverses
and moves straight to the first request at the other end.
While moving back, it does not serve any requests.
Advantages:
• The waiting time is decreased.
• If there are no requests till the end, it reverses the head direction immediately.
• Starvation does not occur.
• The time taken by the disk arm to find the desired spot is less.
Disadvantage:
• There is extra overhead in finding the last request in each direction.
Example: Suppose a disk has 200 tracks (0-199). The request
sequence (82,170,43,140,24,16,190) is shown in the given figure and the head position is at
50.
Solution: The disk arm starts from 50 and serves requests in one direction,
but instead of going to the end of the disk, it goes only to the last request, i.e. 190.
It then moves back to the first request at the other end, i.e. 16, without serving any requests
on the way, and from there serves the remaining requests in the original direction, up to 43.
Hence, Seek Time = (190-50) + (190-16) + (43-16) =341
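A minimal sketch of C-LOOK for this example, with the head first moving towards larger track numbers. The function name clook_seek is hypothetical, and the jump back to the lowest request is counted as head movement, as in the calculation above.

```python
def clook_seek(requests, head):
    """C-LOOK with the head initially moving towards larger track numbers."""
    upper = sorted(c for c in requests if c >= head)
    lower = sorted(c for c in requests if c < head)
    # Serve upwards to the last request, jump to the lowest request, serve upwards again.
    total = 0
    for track in upper + lower:
        total += abs(track - head)
        head = track
    return total

print(clook_seek([82, 170, 43, 140, 24, 16, 190], head=50))   # 341
```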
Protection in File System
Types of Access :
Read
Reading from a file.
Write
Writing or rewriting the file.
Execute
Loading the file into memory and executing it.
Append
Writing new information to an already existing file; the new data can only be added at the end of the
existing file.
Delete
Deleting a file which is of no use and freeing its space for other data.
List
Listing the name and attributes of the file.
Access Control :
Owner
The owner is the user who created the file.
Group
A group is a set of users who have similar needs and share the same file.
Universe
All other users in the system fall under the category called universe.
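One common way to record read, write and execute rights for these three classes is a set of permission bits, as used on UNIX-like systems. The sketch below uses the constants from Python's stat module; the mode value 0o754 and the helper name allowed are chosen only for illustration.

```python
import stat

# Permission bits for each class of user: (read, write, execute).
BITS = {
    "owner": (stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),
    "group": (stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),
    "universe": (stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),
}

def allowed(mode, who):
    """Return which types of access the given class of user has."""
    r, w, x = BITS[who]
    return {"read": bool(mode & r), "write": bool(mode & w), "execute": bool(mode & x)}

mode = 0o754   # hypothetical file: owner rwx, group r-x, universe r--

for who in ("owner", "group", "universe"):
    print(who, allowed(mode, who))

print(stat.filemode(stat.S_IFREG | mode))   # -rwxr-xr--
```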
Other Protection Approaches:
Access to a file can also be controlled by a password.
If the passwords are chosen randomly and changed often, this can effectively limit
access to a file.
The use of passwords has a few disadvantages:
The number of passwords a user has to remember can become very large, which makes the scheme impractical.
If one password is used for all the files, then once it is discovered, all files are accessible;
protection is on all-or-none basis.