UNIT 4
FILE SYSTEM
1) It is system software.
2) It manages how data is stored on and retrieved from the storage medium, such as hard disks,
SSDs, and flash memory.
3) It provides permanent storage of data on hard disks or SSDs.
4) User-created files are placed in folders/directories (collections of related files), and the file system manages those files.
5) Data resides in the sectors of a disk. The file system logically divides data into blocks, and these
blocks are then mapped to sectors on the disk, where the data is finally stored.
Properties of File System
1. Files are stored on storage medium such as disk and do not vanish when a user logs out of the
computer system.
2. Each file has access permissions that control the sharing of files.
3. Files may form complex structures.
4. Files are grouped into directories.
Difference between Logical and Physical Files
1. A physical file occupies a portion of memory and contains the original data; a logical file does not occupy space and does not contain data.
2. A physical file has 1 record format; a logical file can contain up to 32 record formats.
3. A physical file can exist without a logical file; a logical file cannot exist without a physical file.
4. We can't delete a physical file unless we delete its dependent logical files; we can delete a logical file without deleting the physical file.
5. The CRTPF command is used to create a physical file; the CRTLF command is used to create a logical file.
6. A physical file represents the real data on the iSeries (AS/400) system and describes how the data should be displayed or retrieved by a program; a logical file represents one or more physical files.
FILE
A file is a named collection of information stored on a computer's secondary storage devices like hard
drives, solid state drives or even optical media. This information can be anything from text documents
and images to software programs and system configurations.
E.g., creating a Word file and adding data to it.
1. Operations on files
1. Creating - attributes (metadata) are generated as the file is created.
A space must be found for the file, and an entry for the new file must be made in the directory.
2. Reading - we make a system call that specifies the name of the file and where the next
block of the file should be put in memory.
3. Writing - we make a system call specifying the file name and the data to be written at the current position.
4. Deleting - the whole file is deleted along with its attributes.
5. Truncating - the contents of the file are deleted without deleting its attributes.
6. Repositioning (seek) - moving the read/write pointer to a given position within the file.
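These operations can be sketched with Python's standard file API (the file name here is illustrative):

```python
import os

# Creating: a directory entry is made and space is allocated as needed.
with open("fs_demo.txt", "w") as f:
    f.write("hello world\n")        # Writing: data goes into the file's blocks

# Reading: the OS keeps a per-open-file position pointer.
with open("fs_demo.txt", "r") as f:
    first = f.read(5)               # reads "hello"; the pointer advances by 5

# Repositioning (seek): move the read/write pointer without doing I/O.
with open("fs_demo.txt", "rb+") as f:
    f.seek(6)                       # jump to byte offset 6
    rest = f.read()                 # the remainder of the file
    f.truncate(5)                   # Truncating: drop data past offset 5;
                                    # attributes (name, permissions) survive

size = os.path.getsize("fs_demo.txt")
os.remove("fs_demo.txt")            # Deleting: entry and attributes removed
```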
2. File attributes
1. Name
2. Extension (Type)
3. Identifier - unique number for identification
4. Location - stored where
5. Size
6. Modified date, created date
7. Protection/permissions - who can access the file, and can they read, write, or control it?
8. Encryption / compression
FILE TYPES
When we design a file system, the OS should recognize and support file types.
A common technique for implementing file types is to include the type as part of the file name. The name is
split into two parts, a name and an extension, usually separated by a period character.
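This name/extension split is exactly what Python's os.path.splitext performs; the extension-to-type mapping below is a made-up illustration of what an OS might keep:

```python
import os.path

# A toy mapping from extension to file type (illustrative, not a real OS table).
FILE_TYPES = {".txt": "text", ".exe": "executable", ".pdf": "document"}

base, ext = os.path.splitext("report.pdf")   # split at the last period
kind = FILE_TYPES.get(ext, "unknown")
```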
ACCESS METHODS
1. Files contain information that computers need to access and read into memory.
2. Different systems offer various methods to access file information.
3. Some systems offer only one way to access files, while others support multiple methods.
4. Choosing the appropriate access method for a specific application is a significant design
challenge.
There are 3 ways to access a file into computer systems-
1. Sequential access-
● It is the simplest access method.
● Information in the file is processed in order, one record after another.
● For example, editors and compilers usually access files in this fashion.
● In sequential access, the OS reads the file word by word. A pointer is maintained which
initially points to the base address of the file. If the user wants to read the first word of the
file then the pointer provides that word to the user and increases its value by 1 word. This
process continues till the end of the file.
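The pointer mechanism above can be sketched as follows; the fixed record size and file contents are assumptions for illustration:

```python
RECORD_SIZE = 4                      # assumed fixed-length records
data = b"AAAABBBBCCCCDDDD"           # a toy file of 4 records

def read_next(file_bytes, pointer):
    """Return the record at `pointer` and the advanced pointer."""
    record = file_bytes[pointer:pointer + RECORD_SIZE]
    return record, pointer + RECORD_SIZE

ptr = 0                              # pointer starts at the base of the file
records = []
while ptr < len(data):               # process records in order, one by one
    rec, ptr = read_next(data, ptr)
    records.append(rec)
```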
2. Direct Access-
● Also known as the relative access method.
● It uses fixed-length logical records that allow the program to read and write records rapidly in
no particular order.
● The direct access is based on the disk model of a file since disk allows random access to
any file block.
● In most of the cases, we need filtered information from the database. The sequential
access can be very slow and inefficient in such cases.
● Suppose every block of storage holds 4 records and we know that the record we
need is stored in the 10th block. Sequential access would be inefficient here
because it would traverse all the preceding blocks to reach the needed record.
● Direct access gives the required result directly, although the operating system has to
perform some extra work, such as determining the desired block number. Direct access
is commonly used in database applications.
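The block-number computation in the example above can be sketched directly; the record size is an assumed value:

```python
RECORDS_PER_BLOCK = 4        # as in the example above
RECORD_SIZE = 8              # assumed record length in bytes

def locate(record_number):
    """Map a 0-based logical record number to (block, byte offset in block)."""
    block = record_number // RECORDS_PER_BLOCK
    offset_in_block = (record_number % RECORDS_PER_BLOCK) * RECORD_SIZE
    return block, offset_in_block

# Record 39 (0-based) lives in block 9 -- the "10th block" -- no scan needed.
block, offset = locate(39)
```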
3. Indexed access-
● If a file can be sorted on any of the fields then an index can be assigned to a group of
certain records.
● A particular record can be accessed by its index. The index is nothing but the address of a
record in the file.
● With indexed access, searching a large database becomes quick and easy, but we need
some extra space in memory to store the index values.
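A minimal sketch of an index: a key field maps to the record's address in the file, so lookup is a single probe at the cost of the extra index table (the keys, addresses, and records are made up):

```python
# Toy file: byte address -> record stored at that address.
records = {0: b"alice", 40: b"bob", 80: b"carol"}

# Index: key field -> address of the record in the file.
index = {1: 0, 7: 40, 9: 80}

def fetch(key):
    address = index[key]        # one probe instead of a linear scan
    return records[address]
```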
DIRECTORY STRUCTURE
A directory is a container used to contain folders and files. It organizes files and folders in a hierarchical
manner.
A directory can be viewed as a file which contains the metadata of a collection of files.
Following are the logical structures of a directory, each providing a solution to the problem faced in the
previous type of directory structure.
1. Single level directory -
● It is the simplest directory structure.
● All files are contained in the same directory which makes it easy to support and
understand.
● A single level directory has a significant limitation, however, when the number of files
increases or when the system has more than one user. Since all the files are in the same
directory, they must have a unique name. If two users call their dataset test, then the
unique name rule is violated.
Advantages
● Since there is only a single directory, implementation is easy.
● If the files are smaller in size, searching will become faster.
● Basic file operations become easy since we have only one directory.
Disadvantages
● We cannot have 2 files with the same name.
● The directory may be very big, searching becomes difficult and time consuming.
● Protection cannot be implemented for multiple users.
● There are no ways to group the same kind of files.
● Choosing a unique name for every file is complex, and it limits the number of files in
the system because most operating systems limit the number of characters that can be used in a file
name.
2. Two level directory-
As we have seen, a single level directory often leads to confusion of files names among different
users. The solution to this problem is to create a separate directory for each user.
In two level directory systems, we can create a separate directory for each user. There is one
master directory which contains separate directories dedicated to each user. For each user, there is
a different directory present at the second level, containing that user's files. The system
does not let a user enter another user's directory without permission.
Advantages-
● Files with the same name can exist in different user directories.
● Security is improved, since one user cannot access another user's files.
● Searching for files becomes easier in this structure.
Disadvantages-
● The security that is an advantage is also a disadvantage: a user
cannot share a file with other users.
● Although users can create their own files, they do not have the ability
to create subdirectories.
● Grouping is limited, because a user cannot group the same types of files
together.
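A two-level directory can be sketched as a master directory of per-user directories; identical file names in different user directories no longer collide (the user and file names are made up):

```python
# Master directory: user name -> that user's own file directory.
master = {
    "alice": {"test": "alice's data"},
    "bob":   {},
}

def create(user, filename, data):
    udir = master[user]
    if filename in udir:                  # names must be unique only per user
        raise FileExistsError(filename)
    udir[filename] = data

create("bob", "test", "bob's data")       # fine: a different user directory
same_name_ok = "test" in master["alice"] and "test" in master["bob"]
```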
3. Tree Structure
● Hierarchical Organization: Files and folders are organized in a hierarchy, resembling an
upside-down tree. The root directory sits at the top, with subdirectories branching out
from it. This allows for nested structures and categorization.
● Unique File Names: Each file within the entire system has a unique name, preventing
conflicts and confusion.
● Subdirectories for Grouping: Users can create subdirectories to group related files
together. This improves organization and simplifies finding specific files.
● Individual User Directories: Every user has their own designated directory to store
personal files, ensuring privacy and separation.
● Limited Root Access: Users typically have read access to the root directory but cannot
modify it. Only administrators have full control over the root.
● Current Working Directory: The system maintains a "current working directory" for each
user, streamlining file access within that specific location.
● Path Names:
○ Absolute Paths: Specify the complete path from the root directory down to the
desired file, providing a definitive location.
○ Relative Paths: Define the path relative to the current working directory, offering
a more concise way to navigate within the current directory structure.
● User Privileges: Users have the ability to create new files and directories within their
designated areas, allowing for personal organization.
● Search Efficiency: The hierarchical structure facilitates efficient file searching by allowing
users to navigate through directories or use specific paths to locate files.
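The absolute/relative path distinction above can be illustrated with pure path arithmetic (the directory names are hypothetical):

```python
from pathlib import PurePosixPath

cwd = PurePosixPath("/home/alice/projects")      # current working directory
absolute = PurePosixPath("/home/alice/projects/report.txt")
relative = PurePosixPath("report.txt")

# Resolving a relative path means joining it onto the current directory.
resolved = cwd / relative
```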
Advantages:
● This directory structure allows subdirectories inside a directory.
● Searching is easier.
● File sorting of important and unimportant data becomes easier.
● This directory is more scalable than the other two directory structures explained.
Disadvantages:
● As the user isn’t allowed to access other user’s directory, this prevents file sharing
among users.
● As the user has the capability to make subdirectories, if the number of
subdirectories increases the searching may become complicated.
● Users cannot modify the root directory data.
● If files do not fit in one directory, they may have to be placed in other directories.
4. Acyclic graph structure
Traditional directory structures, like trees, struggle with sharing files efficiently. Acyclic graphs
offer a solution, allowing for shared access to files and directories across locations.
Key Points:
● Sharing is Central: Unlike tree structures, acyclic graphs enable files and directories to be
accessed from multiple locations. This is ideal for collaboration, where changes made by
one person are immediately visible to others with access.
● Two Ways to Share:
○ Links: These act like pointers, directing users to the actual file location. This
saves storage space by avoiding duplicate copies.
○ Duplicate Entries: While less efficient, some systems create complete copies of
information within each sharing directory. This can lead to redundancy.
● Complexity Increase: Managing links and ensuring consistency across shared files adds
complexity compared to simpler tree structures.
Challenges to Consider:
● Multiple Paths: A file can have multiple names due to different access points, potentially
causing confusion.
● Deletion: Deleting a shared file requires careful handling of remaining links to avoid
"dangling links" pointing to non-existent files.
● Navigation: Traversing the entire file system (e.g., for backups) can be more complex due
to the possibility of multiple paths and shared directories.
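The link idea can be simulated with a reference count: several directory entries point at one shared file object, an update through any name is visible through all of them, and deletion must account for the remaining links (a toy sketch, not a real file system):

```python
class SharedFile:
    def __init__(self, data):
        self.data = data
        self.link_count = 0

directory = {}                     # path -> shared file object

def link(name, shared):
    directory[name] = shared
    shared.link_count += 1

def unlink(name):
    shared = directory.pop(name)
    shared.link_count -= 1
    return shared.link_count       # data survives while links remain

f = SharedFile("v1")
link("team/doc", f)
link("alice/doc", f)               # a second path to the same file
f.data = "v2"                      # the change is visible through both names
remaining = unlink("team/doc")     # one link left: no dangling reference yet
```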
5. General Graph directory
Acyclic graphs offer a powerful way to manage files by enabling sharing across directories.
However, once arbitrary links are allowed the structure can become a general graph, and the key challenge is preventing cycles.
Benefits of Acyclic Graphs:
● Efficient Sharing: Files and directories can be accessed from multiple locations,
streamlining collaboration.
● Reduced Redundancy: Links (pointers) save storage space by referencing existing files
instead of creating duplicates.
● Simplified Algorithms: Traversing the graph and finding unused files can be relatively
straightforward.
● Avoiding Redundant Searches: Shared sections are explored only once, improving
performance.
Challenge: Avoiding Cycles:
● Cycles occur when links create loops within the directory structure.
● This can lead to:
○ Infinite Loops: Search algorithms get stuck endlessly traversing the cycle.
○ Incorrect Results: Files might be missed or counted multiple times.
Solutions to Avoid Cycles:
● Limited Access Depth: Restrict the number of directories a search can explore, preventing
infinite loops in complex structures (may not be ideal for all scenarios).
● Bypassing Links: During directory traversal, ignore links altogether. This ensures cycles
are avoided but might miss files accessible only through links.
Acyclic Graphs: A Trade-off
While acyclic graphs offer significant benefits for sharing, they require careful management to
avoid cycles. Choosing the appropriate strategy (limited access depth or bypassing links) depends
on the specific needs and priorities of the system.
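Traversal over a graph-shaped directory can guard against cycles by remembering which directories it has already visited (a standard visited-set sketch; the directory names are made up):

```python
def traverse(graph, start):
    """Collect directories reachable from `start`, visiting each only once."""
    visited, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue               # a cycle or shared subtree: skip re-visit
        visited.add(node)
        stack.extend(graph.get(node, []))
    return visited

# "c" links back to "a", forming a cycle; traversal still terminates.
dirs = {"a": ["b", "c"], "b": [], "c": ["a"]}
reached = traverse(dirs, "a")
```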
FILE PROTECTION
Keeping information safe is crucial in any computer system. This involves two main aspects:
1. Reliability: Protecting data from physical damage (like disk failures or power surges). This is
often achieved by making backup copies of files.
2. Protection: Preventing unauthorized access to your data.
Here's a breakdown of how file protection works:
Why Protection is Needed:
● Multi-user systems allow access to files by multiple people. This creates the need to control who
can access what.
● Protection mechanisms grant controlled access by limiting what users can do with files (read,
write, execute, etc.).
Types of File Access:
● Read: View the contents of a file.
● Write: Modify or overwrite a file's content.
● Execute: Run a program stored in the file.
● Append: Add new information to the end of a file.
● Delete: Remove a file permanently.
● List: See the name and properties of a file.
● Other: Additional actions like renaming, copying, or editing files might also be controlled.
Access Control Methods:
1. Access Lists and Groups:
○ Each file/directory has an access list specifying users and their allowed access types
(read, write, etc.).
○ Users are categorized as:
■ Owner: The creator of the file has full control.
■ Group: A set of users sharing the file with similar access needs.
■ Universe: All other users on the system.
○ This method can become cumbersome if many users need access.
2. Password Protection:
○ Each file can have a separate password for access.
○ While secure (if passwords are strong and changed frequently), managing numerous
passwords is impractical.
○ Using a single password for all files compromises security if it's discovered.
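An owner/group/universe access list can be sketched with rwx permission bits, in the style of UNIX (the users, group, and modes are made up):

```python
R, W, X = 4, 2, 1                     # read / write / execute permission bits

class FileACL:
    def __init__(self, owner, group, modes):
        self.owner, self.group = owner, group
        self.modes = modes            # (owner_bits, group_bits, universe_bits)

    def allowed(self, user, user_groups, want):
        if user == self.owner:
            bits = self.modes[0]
        elif self.group in user_groups:
            bits = self.modes[1]
        else:
            bits = self.modes[2]      # everyone else: the "universe" class
        return bits & want == want

acl = FileACL("alice", "staff", (R | W, R, 0))   # rw- r-- ---
owner_can_write = acl.allowed("alice", [], W)
other_can_read = acl.allowed("eve", [], R)
```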
FILE SYSTEM STRUCTURES
1. File systems provide efficient access to the disk by allowing data to be stored, located and
retrieved.
2. Most OS use layering for every task including file systems. Each layer is responsible for some
activities.
3. The layers of the file system, from top to bottom, are: application programs, the logical file system, the file organization module, the basic file system, I/O control (device drivers), and the devices.
How these layers work
● The application program asks for a file, and the request is sent to the logical file system. This layer
contains the metadata of the file and the directory structure. If the application program does not have
permission for the file, an error is thrown. It also verifies the path of the file.
● Files are divided into blocks and are stored in and retrieved from hard disks, which are divided
into tracks and sectors. To store a file, its logical blocks need to be mapped to physical blocks. This
mapping is done by the file organization module, which is also responsible for free space management.
● At the time of an I/O operation, the file organization module first decides which physical block the
application program needs and passes this information to the basic file system. The basic file
system is then responsible for issuing commands to I/O control.
● I/O control contains the code through which the hard disk is accessed; this code is called a device driver.
I/O control also handles interrupts.
FILE ALLOCATION METHODS
File systems logically divide data into blocks, which are stored in the sectors of the disk where the data
physically resides.
How blocks are placed on the disk is determined by the allocation method.
Advantages of a good allocation method:
1) Efficient disk utilization
2) Faster access
Contiguous Allocation-
● Each file occupies a contiguous set of blocks on the disk.
● Continuous memory allocation / sequential data storage / direct access.
● In this scheme, each file occupies a contiguous set of blocks on the disk. For example, if a file
requires n blocks and is given a block b as the starting location, then the blocks assigned to the
file will be: b, b+1, b+2,……b+n-1. This means that given the starting block address and the
length of the file (in terms of blocks required), we can determine the blocks occupied by the file.
● For example, suppose a directory contains 3 files A, B, and C, with lengths of 2, 5, and 4 blocks
respectively, starting at blocks 0, 6, and 14.
Advantages
1) Easy to implement.
2) Excellent read performance.
Disadvantages
1) The disk becomes fragmented (both internal and external fragmentation).
Internal fragmentation occurs when the last block of a file is only partly used.
External fragmentation occurs when no run of contiguous empty blocks is large enough for a
new file.
2) It is difficult to grow a file, since the blocks following it may already be occupied.
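The A/B/C example above can be computed directly: from a start block b and a length n, a file occupies blocks b through b+n-1, and the k-th block of a file is simply start + k:

```python
def blocks(start, length):
    """Blocks occupied under contiguous allocation: start .. start+length-1."""
    return list(range(start, start + length))

# Directory entries: file name -> (starting block, length in blocks).
directory = {"A": (0, 2), "B": (6, 5), "C": (14, 4)}
layout = {name: blocks(s, n) for name, (s, n) in directory.items()}

# Direct access to the k-th block needs no traversal at all.
b_third_block = directory["B"][0] + 2
```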
Linked List Allocation-
● It comes under non-contiguous allocation.
● Each file is a linked list of disk blocks which need not be contiguous.
● The directory entry contains a pointer to the starting and the ending file block. Each block
contains a pointer to the next block occupied by the file.
● The last block will contain -1 in the pointer.
Advantages
1) No external fragmentation
2) File size can increase
Disadvantages
1) Because the file blocks are distributed randomly on the disk, a large number of seeks are needed
to access every block individually. This makes linked allocation slower.
2) It does not support random or direct access. We can not directly access the blocks of a file. A
block k of a file can be accessed by traversing k blocks sequentially (sequential access ) from the
starting block of the file via block pointers.
3) Overhead of pointers- some memory will be allocated to pointers as well.
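Linked allocation can be sketched as a next-block table: reaching block k of a file requires following k pointers from the start, which is exactly why direct access is poor here (the block numbers are made up):

```python
# next_block[b] is the block following b in the file; -1 marks the last block.
next_block = {9: 16, 16: 1, 1: 25, 25: -1}

def read_file(start):
    """Follow the chain from the directory's starting-block pointer."""
    chain, b = [], start
    while b != -1:
        chain.append(b)
        b = next_block[b]
    return chain

def kth_block(start, k):
    b = start
    for _ in range(k):            # k sequential pointer hops: O(k) access
        b = next_block[b]
    return b

chain = read_file(9)
```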
Indexed allocation-
● In this method, some blocks are used to store indexes, which record where the blocks of each file
are stored.
● a special block known as the Index block contains the pointers to all the blocks occupied by a file.
● Each file will have its own index.
● The directory entry contains the address of the index block.
Advantages
1) Support direct access - through index block which contain addresses of all blocks.
2) No external fragmentation - due to non contiguous blocks
Disadvantages
1) Pointer overhead
2) When the index of a file is very large and needs more memory than a
single block, a multilevel index must be used, which adds overhead.
For files that are very large, a single index block may not be able to hold all the pointers, so one of the following schemes is used:
● Linked Scheme- This scheme links two or more index blocks together for holding the
pointers. Every index block would then contain a pointer or the address to the next index
block.
● Multilevel Index- In this policy, a first level index block is used to point to the second
level index blocks which in turn points to the disk blocks occupied by the file. This can be
extended to 3 or more levels depending on the maximum file size.
● Combined Scheme- In this scheme, a special block called the Inode (information Node)
contains all the information about the file such as the name, size, authority, etc and the
remaining space of Inode is used to store the Disk Block addresses which contain the
actual file. The first few of these pointers in Inode point to the direct blocks i.e the
pointers contain the addresses of the disk blocks that contain data of the file. The next few
pointers point to indirect blocks. Indirect blocks may be single indirect, double indirect or
triple indirect. Single Indirect block is the disk block that does not contain the file data
but the disk address of the blocks that contain the file data. Similarly, double indirect
blocks do not contain the file data but the disk address of the blocks that contain the
address of the blocks containing the file data.
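The basic single-index case can be sketched as one index block per file holding the addresses of its data blocks; direct access to the k-th block is then a single table lookup (the file name and block numbers are made up):

```python
# One index block per file: a list of data-block addresses, in file order.
index_blocks = {"report": [9, 16, 1, 25]}

def kth_block(name, k):
    """Direct access: no chain traversal, just index the k-th entry."""
    return index_blocks[name][k]

third = kth_block("report", 2)
```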
FREE SPACE MANAGEMENT
Free space management is a critical aspect of operating systems as it involves managing the available
storage space on the hard disk or other secondary storage devices
The system keeps track of free disk blocks for allocating space to files when they are created, and to
reuse the space freed after files are deleted. The system maintains a free-space list of disk blocks that are
not allocated to any file or directory.
Free space list can be implemented as
1. Bitmap and Bit Vector- A bitmap or bit vector is a series or collection of bits where each bit
corresponds to a disk block. A bit can take 2 values, 0 and 1: 0 means the block is allocated and
1 means the block is free.
An example 16-bit bitmap: 0000111000000110.
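Using the convention above (1 = free), finding a free block is a scan for a set bit; the 16-bit map is the one quoted:

```python
bitmap = "0000111000000110"        # 1 = free, 0 = allocated, as above

def first_free(bits):
    """Index of the first free block, or -1 if the disk is full."""
    for i, b in enumerate(bits):
        if b == "1":
            return i
    return -1

def allocate(bits, i):
    """Mark block i as allocated and return the updated bitmap."""
    return bits[:i] + "0" + bits[i + 1:]

free = first_free(bitmap)          # block 4 is the first free block
bitmap2 = allocate(bitmap, free)
```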
2. Linked List- the free disk blocks are linked together i.e. a free block contains a pointer to the next
free block. The block number of the very first disk block is stored at a separate location on disk
and is also cached in memory.
For example, the free space list head points to block 5, which points to block 6, the next free block, and so on.
The last free block would contain a null pointer indicating the end of the free list. A drawback of
this method is the I/O required for free space list traversal.
3. Grouping- This approach stores the addresses of free blocks in the first free block. The first free
block stores the addresses of, say, n free blocks. Of these n blocks, the
first n-1 are actually free, and the last contains the addresses of the next n free blocks.
4. Counting- This approach stores the address of the first free disk block and a number n of free
contiguous disk blocks that follow the first block. Every entry in the list would contain:
● Address of first free disk block
● A number n
DIRECTORY IMPLEMENTATION
It is done using Linked list and hash tables.
Directory allocation and directory management algorithms affect the efficiency, reliability and
performance of file systems.
1. Using singly linked list - The implementation of directories using singly linked list is easy to
program but is time consuming to execute. We use a linear list of file names with pointers to the
data blocks.
● To create a new file the entire list has to be checked such that the new directory does not
exist previously.
● It can be added at the beginning or at the end.
● In order to delete we first need to search the directory with the name of the file to be
deleted. After searching we can delete that file by releasing the space allocated.
● To reuse the directory entry we can mark that entry as unused or we can append it to the
list of free directories.
● For deleting a file, a linked list is a good choice, as deletion takes little time.
Disadvantage
The main disadvantage of using a linked list is that when the user needs to find a file the user has
to do a linear search. In today’s world directory information is used quite frequently and linked
list implementation results in slow access to a file. So the operating system maintains a cache to
store the most recently used directory information.
2. Using a Hash table- It overcomes the drawback of linked lists. In this method we use a hash table
along with a linked list.
In the hash table for each pair in the directory key-value pair is generated. The hash function on
the file name determines the key and this key points to the corresponding file stored in the
directory. This method efficiently decreases the directory search time as the entire list will not be
searched on every operation. Using the keys the hash table entries are checked and when the file
is found it is fetched.
Disadvantage
The major drawback of using a hash table is that it generally has a fixed size, and its
performance depends on that size. Still, this method is usually faster than a linear search through an
entire directory implemented as a linked list.
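Python's dict is itself a hash table, so the scheme can be sketched directly: the file name is hashed to locate its entry, avoiding the linear scan (the entries are made up):

```python
directory = {}                      # hash table: file name -> metadata

def create(name, metadata):
    if name in directory:           # O(1) average duplicate-name check
        raise FileExistsError(name)
    directory[name] = metadata

def lookup(name):
    return directory.get(name)      # O(1) average, vs. O(n) linear list scan

create("notes.txt", {"size": 120})
create("a.out", {"size": 4096})
found = lookup("notes.txt")
```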
FILE MODELS IN DISTRIBUTED SYSTEMS
● In distributed file systems , multiple machines are used to provide the file system’s facility.
● Different file systems often employ different conceptual models.
● The models based on structure and mobility are commonly used for the modeling of files.
There are 2 types of file models
1) Structured and Unstructured files
2) Mutable and Immutable files
Unstructured Files:
1. The simplest and most commonly used model.
2. In the unstructured model, a file is an unstructured sequence of data.
3. There is no substructure within the file.
4. Each file is an uninterpreted sequence of bytes; its meaning is imposed by the applications
that use it (as in UNIX or DOS).
5. Most modern operating systems prefer the unstructured file model because it makes sharing files
between different applications easy.
Structured Files:
1. It is a rarely used file model.
2. The file system sees a file as an ordered sequence of records.
3. Records of different files belonging to the same file system may be of
different sizes.
4. Files can thus have different properties despite belonging to the same file system.
5. The smallest unit of data that can be retrieved is called a record.
6. Read and write operations are performed on a set of records.
7. Various file attributes describe the file; each attribute has a name and a value.
9. 2 types
a. Files with Non - indexed records: the retrieving of records is performed concerning a
position in the file.
b. Files with indexed records: one or more key fields exist in each record, each of which can
be addressed by providing its value.
Mutable files:
1. Used in most existing operating systems.
2. The existing file contents are overwritten by new contents on each update.
3. Because the same file is updated in place again and again, the file is described as a single sequence of
records.
Immutable Files:
1. The Cedar file system uses the immutable file model.
2. A file cannot be changed once it is created.
3. To implement updates, a new version of the file is created for each change, so multiple versions of
the same file exist.
4. This helps maintain the consistency of multiple copies, since distributed systems support caching and
replication.
5. Drawbacks
a. Increased space utilization
b. Increased disk allocation overhead
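The immutable model can be sketched by creating a new version on every update; old versions stay intact, at the cost of the extra space the drawbacks above describe:

```python
versions = {}                       # file name -> list of immutable versions

def write(name, data):
    """Updates never overwrite in place: each write appends a new version."""
    versions.setdefault(name, []).append(data)

def read(name, version=-1):
    return versions[name][version]  # latest by default; history is retained

write("doc", "v1 contents")
write("doc", "v2 contents")
latest = read("doc")
history_len = len(versions["doc"])  # space grows with every update
```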