Unit -5
File Systems
[Chapter 4 from Tanenbaum 3rd edition]
Prachi Shah
IT Dept.
BVM
Contents
1. Files: Naming, Structure, Types, Access, Attributes, Operations
2. Directories: Single-level, Hierarchical Directory Systems, Path names,
Operations
3. File System Implementation: File system layout, Implementing Files,
Implementing Directories, and Shared Files
4. File System Management: File System Backups, File System Consistency
5. Example of File Systems: MS-DOS, UNIX.
File Systems
• Requirements for long-term information storage:
1. It must be possible to store a very large amount of information.
2. The information must survive the termination of the process using it.
3. Multiple processes must be able to access the information concurrently.
• Disks are used to store files
Information is stored in blocks on the disks
Can read and write blocks
• Use file system as an abstraction to deal with accessing the
information kept in blocks on a disk
File Systems
• Files are logical units of information created by a processes
• Thousands of them on a disk managed by the OS
• OS structures them, names them, access them, use them, protect
them, implement them and manage them
• Part of the operating system dealing with files is known as the File
System.
• Two ways of looking at file system
User-how do we name a file, protect it, organize the files
Implementation-how are they organized on a disk
• Start with user, then go to implementer
• The user point of view
Naming
Structure
Directories
Naming
One to 8 letters in all current OS’s
Unix, MS-DOS (Fat16) file systems discussed
In Unix, file naming is case-sensitive, whereas not in MS-DOS.
FAT (16 and 32) were used in first Windows systems
Latest Window systems use Native File System
Naming
Many operating systems support two-part file names, with the two
parts separated by a period, as in prog.c.
The part following the period is called the file extension and
usually indicates something about the file.
In MS-DOS, for example, file names are 1 to 8 characters, plus an
optional extension of 1 to 3 characters.
In UNIX, the size of the extension, if any, is up to the user, and a file
may even have two or more extensions, as in homepage.html.zip
Typical file extensions
File Structure
Byte sequences
Maximum flexibility-can put anything in
Unix and Windows use this approach
Fixed length records
Tree of records- uses key field to find records in the tree
File Structure
Three kinds of files.
(a) Byte sequence. (b) Record sequence. (c) Tree.
File Types
1. Regular – contains user information
2. Directories – system files for maintaining structure of the
file system
3. Character special files – model serial I/O devices (e.g.
printers)
4. Block special files – model disks
1. Regular Files
a) ASCII
• Printable
• Can use pipes to connect programs if they produce / consume
ASCII
b) Binary File Types
• Two Unix examples
Executable (magic field identifies file as being executable)
Archive – compiled, not linked library procedures
• Every OS must recognize its own executable
Binary File Types (Early Unix)
(a) An executable file
(magic number
identifiies it as an
executable file)
(b)An archive-library
procedures compiled but
not linked
File Access
1. Sequential access – read from the beginning, can’t skip around
• Corresponds to magnetic tape
2. Random access – start where you want to start
• Came into play with disks
• Necessary for many applications, e.g. airline reservation
system
• Two methods are used for specifying where to start reading:
read (gives position in file)
seek (set the current position)
File Attributes (hypothetical OS)
Operations or System Calls for files
Name Purpose
1 Create with no data, sets some attributes
2 Delete to free disk space
3 Open after create, gets attributes and disk addresses into main memory
4 Close frees table space used by attributes and addresses
usually from current pointer position. Need to specify buffer into which data
5 Read
is placed
6 Write usually to current position (file size increases or data are overwritten)
7 Append at the end of the file
8 Seek puts file pointer at specific place in file. Read or write from that position on
e.g. make needs most recent modification times to arrange for group
9 Get Attributes
compilation
10 Set Attributes e.g. protection attributes
11 Rename change the name of an existing file
Directories
• Files which are used to organize a collection of files
• Also called folders in some OS’s
.
Single Level Directory Systems
A single-level directory system containing four files.
Used in devices like telephones, digital cameras and some
portable music players
Hierarchical Directory Systems
Path names
Windows \usr\ast\mailbox
UNIX /usr/ast/mailbox
MULTICS >USr>ast>mailbox
• Absolute: cp /usr/ast/mailbox /usr/ast/mailbox.bak
• to access specific file; no consideration of current directory
• Relative: cp mailbox mailbox.bak
• to do the working in current directory
• . Refers to current (working) directory
• .. Refers to parent of current directory
E.g. copy the file /usr/lib/dictionary to its own directory using the command
cp ../lib/dictionary
Path Names
A UNIX directory tree.
Unix cp commands involving dots
Cp ../lib/dictionary .
• .. says go to parent (usr)
• . says that target of the copy is current directory
• cp /usr/lib/dictionary dictionary works
• cp /usr/lib/dictionary /usr/ast/dictionary also works
Directory Operations
Name Purpose
1 Create creates directory
2 Delete directory has to be empty to delete it
3 Opendir Must be done before any operations on directory
4 Closedir Directory is closed to free up internal table space
5 Readdir returns next entry in open directory
6 Rename renames the directory
7 Link links file to another directory [Types: Hard link and Symbolic link]
8 Unlink Gets rid of directory entry
File Implementation
• Users are concerned with
how files are named
what operations are allowed on them
what the directory tree looks like, and
similar interface issues
• Implementors are interested in
how files and directories are stored
how disk space is managed, and
how to make everything work efficiently and reliably
File System Layout
• Files stored on disks. Disks broken up into one or more partitions, with
separate file system on each partition
• Sector 0 of disk is the Master Boot Record (MBR); which is used to
boot the computer
• End of MBR has partition table. Has starting and ending addresses of
each partition.
• One of the partitions is marked active in the master boot table
• Boot computer => BIOS reads & executes MBR
• MBR finds active partition and reads in first block (boot block)
• Program in boot block locates the OS for that partition and reads it in
• All partitions start with a boot block
A Possible File System Layout
• Superblock – contains magic number to identify info about the file
system (e.g. type of file system, number of blocks, …)
• i-nodes – an array of data structures, one per file, telling all about the
file
Allocating Blocks to files / Implementing files
• Most important implementation issue – keeping track of which disk
blocks go with which file.
• Four Methods:
1. Contiguous allocation
2. Linked list allocation
3. Linked list using table
4. I-nodes
1. Contiguous Allocation
(a) Contiguous allocation of disk space for 7 files.
(b) The state of the disk after files D and F have been removed.
1. Contiguous Allocation
The good
• Easy to implement – just remember two numbers: the disk address
of the first block and the number of blocks in the file
• Read performance is great. Only need one seek to locate the first
block in the file. The rest is easy.
The bad-disk becomes fragmented over time
• CD-ROM’s use contiguous allocation because the file system size is
known in advance
• DVD’s store a few consecutive 1 GB files because standard for
DVD only allows a 1 GB file max
2. Linked List Allocation
Storing a file as a linked list of disk blocks
The good – Gets rid of fragmentation
The bad – Random access is slow. Need to chase pointers to get to a block
3. Linked lists using a table in memory
• Put pointers in table in memory
• That table is called File Allocation Table (FAT)
• The bad – table becomes really big
• E.g 200 GB disk with 1 KB blocks needs a 600 MB table
• Growth of the table size is linear with the growth of the disk size
3. Linked List Allocation Using a Table in Memory – The
Solution
Storing a file as a linked list of disk blocks Linked list allocation using a file
allocation table in main memory
4. I-nodes (Index-node)
• Keep data structure in memory only for active files
• Data structure lists disk addresses of the blocks and
attributes of the files
• K active files, N blocks per file => k*n blocks (maximum)
• Solves the growth problem
• How big is N?
• Solution: Last entry in table points to disk block which
contains pointers to other disk blocks
4. I-nodes (Index-node)
Figure 4-13. An example i-node.
Implementing Directories
Open file, path name used to locate directory
Directory specifies block addresses by providing
Address of first block (contiguous)
Number of first block (linked)
Number of i-node
Implementing Directories
(a) fixed-size entries with the disk addresses and attributes (DOS)
(b) each entry refers to an i-node. Directory entry contains attributes. (Unix)
Implementing Directories
How do we deal with variable length names?
Problem is that names have gotten very long
Two approaches
1. Fixed header followed by variable length names
2. Heap-pointer points to names
Implementing Directories
Two ways of handling long file names in a directory.
(a) In-line. (b) In a heap.
Shared Files
File system containing a shared file.
File systems is a directed acyclic tree (DAG)
Shared files
If B or C adds new blocks, how does other owner find out?
Use special i-node for shared files-indicates that file is
shared
Use symbolic link - a special file put in B’s directory if C is
the owner. Contains the path name of the file to which it is
linked
I-node problem
If C removes file, B’s directory still points to i-node for
shared file
If i-node is re-used for another file, B’s entry points to
wrong i-node
Solution is to leave i-node and reduce number of owners
I node problem and solution
(a) Situation prior to linking. (b) After the link is created. (c) After
the original owner removes the file.
Symbolic links
Symbolic link solves problem
Can have too many symbolic links and they take time to
follow
Big advantage-can point to files on other machines
File System Management and Optimization-
the dirty work
File System Backups
File System Consistency
File System Backups (1)
• Backups to tape are generally made to handle one of two
potential problems:
Recover from disaster (e.g. disk crash, pit bull eats
computer)
Recover from stupidity (e.g. someone removes file by
stepping on keyboard)
• Moral:
(1) don’t allow anybody near computers (2) back up your files
• Tapes hold hundreds of gigabytes and are very cheap
Back up issues
1. Not to back up whole file system
Can get binaries from manufacturer’s CD’s
Temporary files don’t need to be backed up
Special files (I/O) don’t need it
2. Incremental dumps: Complete dump weekly/monthly and daily dump of
modified files since big dump
minimizes dumping time
It makes recovery more complicated; because first the most
recent full dump has to be restored, and then, all the incremental
dumps in reverse order.
need better algorithms
Back up issues
3. to compress data before dumping it
but what if part of the tape is bad
4. hard to dump when system is active (i.e. being used)
leads to inconsistency
Algorithms have been devised for making rapid snapshots of the
file system state
5. many nontechnical problems can happen into an organization
Suppose the best online security system in the world may be
useless if the system administrator keeps all the backup tapes in
his office and leaves it open and unguarded
Dumping strategies
1. Physical dump
2. Logical dump
1. Physical dump
dump the whole thing
The good – it is simple to implement & has great speed
Don’t want to dump:
a) unused blocks – program needs access to the unused block list
and must write block number for used blocks on tape
b) bad blocks – Disk controller must detect and replace them or the
program must know where they are (they are kept in a bad block
area by the OS)
The bad –
o Can’t skip a particular directory
o Can’t make incremental dumps
o Can’t restore individual files
2. Logical Dumps
o Starts at directory and recursively dumps all files/directories below it
which have changed since a given time
o Examine standard algorithm used by Unix. Dumps files/directories on
path to modified file/directory because
Can restore path on different computer
Have the ability to restore a single file
[Example of file /usr/jhs/proj/nr3/plans/summary in a directory whose
parent directories are removed]
Logical Dump Algorithm
o Uses bitmap indexed by i-node
o 4 phases
1) Phase 1 – starts at root and marks bits for modified
files and all directories (a)
2) Phase 2 – walks the tree, unmarks directories without
modified files in/under them (b)
3) Phase 3 – go through i-nodes and dump marked
directories (c)
4) Phase 4 – dump files (d)
File System Backups - Example
Restoring file on disk
o Start with empty file system on disk
o Restore most recent full dump. Directories first, then files.
o Then do restore incremental dumps (they are in order on the tape)
o Issues:
Free block list must be reconstructed. Easy to do-it is the
complement of the used blocks
Links to files have to be carefully restored
Files can have holes in them. Don’t want to restore files with lots of
zeroes in them
Pipes can’t be dumped
Tape density is not increasing => need lots of tapes and robots to get
at them.
File System Consistency
o Crash before blocks written out results in an inconsistent state
o It is critical if those blocks are i-node blocks, directory blocks, or
blocks containing the free list
o Need utility program to check for consistency in blocks and files (e.g.,
fsck in Unix, scandisk in Windows)
o Two types of consistency check: by blocks & by files
o For block consistency, it Uses two tables
1. How many times is block present in a file
2. How many times block is present in free list
Device then reads all of the i-nodes, increments counters
File System Consistency – Example
Figure 4-27. File system states. (a) Consistent. (b) Missing block.
(c) Duplicate block in free list. (d) Duplicate data block.
File System Consistency – Example
o Missing block (b)-put on free list
o Block occurs twice on free list (c)-re-build free list
o Block occurs twice on blocks in use (d)- notify user. (If one
file is removed, then block appears on both lists)
o Solution to duplication:
Look at files instead of blocks
Use table of counters, one per file
The MS-DOS File System-directory entry
.
The MS-DOS File System –maximum partition size
Figure 4-32. Maximum partition size for different block sizes. The
empty boxes represent forbidden combinations.
The UNIX V7 File System (1)
Figure 4-33. A UNIX V7 directory entry.
The UNIX V7 File System (2)
Figure 4-34. A UNIX i-node.
The UNIX V7 File System (3)
Figure 4-35. The steps in looking up /usr/ast/mbox.