
OPERATING SYSTEMS FOUNDATIONS

MSIT 5170

University of Massachusetts Lowell


Department of Computer Science
Spring Semester 2020

Week #8 Lesson

1. The material in this lesson follows the reading assignment for ch 4.1 through ch 4.3 from
the Tanenbaum text.
2. Please read the text before you read through this lesson.
3. The material in chapter 4 will focus on persistent storage and file systems.
4. We continue to look at system level tools, and I encourage students to experiment with
these tools. We will also build some simple applications in the Linux environment to explore
some of the aspects of file systems.

File Systems:

Re-visiting our most fundamental view of a Von Neumann style computing system, we envision
a processor (CPU), a working memory of some size (RAM), some sort of interface to the outside
world (an I/O bridge), and a system bus to tie these basic components together.

Previously, we’ve considered the CPU and Memory roles in some detail, and we will now move
on to the I/O piece of the system. The actual I/O bridge details are not our first concern (we will
discuss more of this in chapter 5), but we want to focus now a little further down the I/O path to
the controllers that connect to persistent storage devices. These controllers come in various
flavors, such as SCSI/SAS, ATAPI/SATA, Fibre Channel, Fire Wire, USB, etc., but each of
these controllers are of interest to us here because they allow various types of block storage
devices to be attached through the controller to the I/O bridge interface, and thus to the system.
The most common type of storage device in this context is still the venerable rotating disk drive,
but flash drives, hybrid SSD/HDD drives and other formats are becoming more important on a
daily basis.

The vintage system shown above (2006 era) shows some details of the components we’ve
previously discussed. The functionality of the “Bridge chip” (often called a north bridge chip)
has been largely moved onto the CPU package in today’s architectures (i.e., memory controllers,
graphics processing and PCIe lanes), but storage devices are still accessed by controllers either
attached to the PCI bus or located on some type of super-IO chip (often called a south bridge
chip) like the “ATAPI controller” depicted above.

What makes storage devices of interest here is their innate ability to transfer data in some size
block per operation. We call these devices block devices, and at their finest granularity, we
expect these devices to transfer 512 bytes per read/write minimum (the historical block size of
rotating disks since the early 70’s). The 512 byte block size is typically called a sector size
among the disk community, and while the size is subject to some variation from device to
device, operating systems like Linux and Windows have used the 512 byte sector size as part of
their file system architectures for decades. Even if a device has a different native block size
(e.g., flash devices usually have a block size of 256 KiB or larger, and newer HDDs (SATA and
SAS) have a native block size of 4 KiB), there is always a firmware interface available on the
device that allows the operating system (and a file system hosted by the operating system) to
talk to the drive in 512 byte units, since that is what many legacy applications and file
systems still depend on. Most newer revisions of certain file systems, such as NTFS in Windows
and the latest Ext4 version in Linux, now provide native support for the newer 4 KiB block size,
providing better performance by avoiding the firmware translation layer.
The controllers for these block devices typically support a transfer mechanism called Direct
Memory Access (DMA), where the driver tells the controller what blocks it wants from disk (in a
read operation) and where those blocks should be copied into physical RAM. The controller
then works with the I/O bridge and the disk itself to effect the transfer of the blocks requested
without any further need of CPU operations. When the transfer is complete, the controller
sends an interrupt to the CPU, forcing the currently running thread on the CPU to enter the
kernel and run the exception handler for that particular interrupt. The exception handler code
typically verifies status, clears the interrupt from the controller, and checks to see what thread(s)
was/were waiting for that transfer to complete. Any threads in the blocked state that were
waiting for that event are now moved to the ready state, and the exception handling is complete,
allowing the body-snatched thread that ran the exception handler to leave the handler code
and return to whatever it was doing before the interrupt.
File systems leverage these block devices for persistent storage. A file system can be viewed
as a collection of software that manages the blocks on a block storage device. In a sense, a
file system is a high level generic block driver layered over a lower level block device driver. A
given file system driver doesn’t care what block device driver it uses (SCSI, USB, etc.), it only
cares that the block device driver provides the necessary interfaces for transferring 512 byte (or
multiples of 512 byte) block units in and out of RAM.
Because a file system is usually expected to be an indexed collection of information (a name
space), you can imagine that we need to store two kinds of items on a target block device to
have a file system. We must store the actual data that make up the content of a file object (if
the file object has content), and we must store the meta-data that define the file system itself as
well as each object that exists in the file system. When we say that we “format” a file system
onto a block device, we are actually writing a collection of data structures (the meta-data)
onto the block device to lay out the file system for use. The formatting operation is then
followed by some sort of mounting operation that allows the newly formatted block device to
become visible as a file system to the operating system, and available for use by
applications. During the mounting operation, a collection of meta-data known as the
“superblock” structures is copied from the block device into the operating system’s file
system cache (in DRAM), where it will remain for as long as the file system remains
mounted and visible to the user community. The superblock structures are something like a
table of contents for the mounted file system, providing a root directory node from which all
content on the file system can be reached, and a number of structures that point to the available
free blocks (for allocating new objects or extending the size of existing objects). Whenever any
part of the file system is modified, the in-memory updated superblock is written back to the file
system device, so the in-memory information and the information stored on the file system
device are kept in close synchronization. When we want to remove the file system from
visibility, we un-mount it, forcing any cached data or meta-data back to the disk so the file
system is up-to-date and consistent. If a system crash occurs, we don’t get a clean shutdown
of the file system, and there are likely to be modified blocks still in RAM that never got written
to the file system. In this case, the file system may be corrupted, and we must wait for the
operating system to put the file system back into a consistent state before the file system can
be made available again after reboot.
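
To make the mounting operation concrete, here is a minimal C sketch of what the mount and umount shell utilities ultimately do, using the Linux mount(2) and umount(2) system calls. The device and mount point paths are just hypothetical examples, and the program must run as root:

#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
    /* Make the ext4 file system on /dev/sdb1 visible at /mnt/data
       (hypothetical paths; requires root privilege). */
    if (mount("/dev/sdb1", "/mnt/data", "ext4", 0, NULL) != 0) {
        perror("mount");
        return 1;
    }

    /* ... the file system contents are now reachable under /mnt/data ... */

    /* Un-mount, forcing cached data and meta-data back to the device. */
    if (umount("/mnt/data") != 0) {
        perror("umount");
        return 1;
    }
    return 0;
}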

Files:

We generally refer to the named objects within a file system as files, but the term is very
generic, and a given file system may have many different types of files, each type having
some specific set of attributes. For example, in the common Linux Ext<n> file systems
(where we have ext2, ext3 and ext4 variants today), there are 7 file types. Of the 7 types, the
2 most common are an ordinary (sometimes called regular) type, and a directory type. An
ordinary type Linux file includes just about every file object that can hold one or another type
of data, including text files, object files, binary data files, executable files, etc. A directory file,
on the other hand, is a file that holds only translation information in the form of a file name (one
of the files in said directory), and some indexing information that says where to find the
controlling structure (meta-data) for that named file. In Linux, a directory entry has very
little information about the files that are listed as members of that directory, but for each
named file, it translates the human readable name of the file into the address of the controlling
data structure for the file (called an inode). Everything that the file system knows about a
given file object (except for the human readable name) is kept in the object’s inode. So
ordinary file objects have arbitrary content, but directory file objects have a specific collection of
name to inode translations for the directory they represent. We will look at the remaining 5
Linux file types later.
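
As a quick experiment with file types, the following small C program (a minimal sketch) uses the lstat(2) system call to ask the file system what type of object a given path names, using the S_ISREG and S_ISDIR tests for the two common types discussed above:

#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    struct stat sb;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <path>\n", argv[0]);
        return 1;
    }
    if (lstat(argv[1], &sb) != 0) {   /* fills sb from the object's inode */
        perror("lstat");
        return 1;
    }

    if (S_ISREG(sb.st_mode))
        printf("%s is an ordinary (regular) file\n", argv[1]);
    else if (S_ISDIR(sb.st_mode))
        printf("%s is a directory\n", argv[1]);
    else
        printf("%s is one of the other 5 file types\n", argv[1]);
    return 0;
}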

Of the attributes that a file object may possess, its human readable name is among the most
important. Most file systems have rules about the characters that can be used in a file name,
and how long the filename (and possibly extension names) can be. Linux is very permissive in
this sense, allowing all characters on a standard keyboard except the forward slash ( / ) , and
most versions use a maximum file name length of 255 characters (though this is a
configuration parameter, and can be changed for a given build of a Linux kernel). For name
portability, however, it is generally recommended that names be composed of upper and lower
case letters, digits and a few special characters like the - (dash), . (dot, period) and
_ (underscore). The forward slash ( / ) character is reserved for two purposes. The root
directory of the entire file system hierarchy is given the name / , so all absolute path names
that lead to an object in a given file system must start with a /. For example, on mercury, my
home directory (the working directory of my login shell process) is:

/usr/cs/faculty/bill

which means that the name of my directory is bill but to find it you must first look in the root
directory / and locate the usr directory and then look in the usr directory and locate the cs
directory, and then look in the cs directory and locate the faculty directory, and finally look in
the faculty directory and locate the bill directory. This kind of a string is called a path name,
beginning with the root directory, and finishing with the target of interest (which could be any of
the 7 types of file system objects). In Linux, all file objects in a given system’s file system
hierarchy (which can be composed of multiple discrete file systems, but must have at least one
file system in the form of the root file system) must have at least one unique path name
(unique in the sense that exactly one object in the file system can be reached with a given
path name). Remember, a pathname allows the kernel to find the human readable file object
name in some directory and translate it into the address of a piece of meta-data (i.e., the
object’s inode). It’s possible to have more than one pathname for a given file object, since
a name merely connects to the inode of the object, but each valid pathname must be unique.
It’s really the inode that is the object. Think of an inode as a dog, and a pathname as a leash;
we can have many different leashes on a given dog, but they all lead to the same place
(removing one leash does not free the dog if another leash is still there). Besides a name, and
depending on a given file system, there may be many other attributes associated with a file
object, such as its size, owner, access permissions, and creation/modification timestamps.
These are the kinds of details we would normally keep on a file control block (the FCB in
Linux/UNIX is called an inode). In addition to keeping attributes for each file object, the file
system must also support a set of operations on file objects, including create, delete, open,
close, read, write, and seek.

The following lines of C code provide an example of some of these operations as shown for an
ordinary file. These lines are part of a larger program, but they show how two system calls are
needed to retrieve data from an ordinary file object. The first call opens the ordinary file object
for reading, and the second reads 100 bytes of the file content into a buffer:

#include <fcntl.h>     /* for open() and O_RDONLY */
#include <unistd.h>    /* for read() */

int file_channel, read_count;
char my_buffer[100];

file_channel = open("/usr/cs/faculty/bill/sample_file", O_RDONLY, 0);
read_count = read(file_channel, my_buffer, 100);
The file_channel and read_count variables are just integers, and my_buffer is just an array of
100 bytes. The open() call is used to establish a connection to the file in my login directory
called sample_file, and the call says that we only want read access to the file. If the open() call
succeeds, then the returned file_channel variable will have a valid file channel number with
which we can do I/O. The read() call now uses the valid file_channel to read 100 bytes of data
from the file into the local buffer called my_buffer. The first read after an open, always starts at
byte zero in the file, so in the case above, we would have copied bytes 0 – 99 from the file into
my_buffer. The file system in conjunction with the operating system maintains a file pointer for
all open files, and this pointer is automatically moved ahead by the number of bytes either
read or written, so a second read with the same parameters as the first would return bytes 100 –
199. Of course, Linux also has a seek system call (called lseek) that allows the file pointer of
an open file to be positioned anywhere within the file for the next read or write.
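
Continuing the fragment from above, a minimal sketch of lseek in action might look like this (file_channel is assumed to be the channel returned by the earlier successful open() call):

/* Move the file pointer to byte 500 (an absolute seek), then read
   bytes 500 - 599 of the file, skipping everything before them. */
off_t new_position;

new_position = lseek(file_channel, 500, SEEK_SET);
read_count = read(file_channel, my_buffer, 100);

The SEEK_SET flag says the offset is measured from the beginning of the file; SEEK_CUR and SEEK_END measure from the current file pointer and from the end of the file, respectively.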

Directories:

One of the 7 file types in Linux is a directory. You can imagine a directory as a file whose
contents contain information about other files that logically reside within the given directory. As
we said earlier, in the Linux/UNIX world we always begin the file system hierarchy at a top level
directory whose name is / (we call this the root directory).

Since directories are just a kind of file, they have the attributes of a file, as listed above. The
operations on a directory, however, are somewhat different from those on an ordinary file, and
include operations such as creating and deleting a directory, opening and closing it for
scanning, reading its entries, renaming entries, and linking/unlinking entries.

In the Linux environment, directories are just translation structures as we said above. Each
file object in a Linux file system is represented by a data structure called an inode, and each
inode in a file system has a unique number. During the formatting of a file system onto a
storage device, a set of inodes is allocated and each inode is marked free. Whenever we
create any kind of file object within the file system, we must find a free inode, and use that
inode to represent the new object. Whenever an object is deleted from a file system, its inode is
returned to the free state, so it can be used again when a new file is created. This is a
one-to-one relationship, with exactly one inode for each file object. A directory simply contains
the human readable name for a file object, and its corresponding inode number. People like to
use names for things, but the operating system knows all files by their inode numbers; file
names are kept in directories only for looking up those inode numbers. Most contemporary file
systems work this way, with directories serving as translation tables, and all file details found on
a file control block (inode in Linux) that the directory associates with a human readable name.
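
We can watch this name-to-inode translation directly. The following short C program (a minimal sketch) uses opendir(3) and readdir(3) to print each entry in the current directory along with the inode number that the directory entry maps the name to:

#include <stdio.h>
#include <dirent.h>

int main(void)
{
    DIR *dp = opendir(".");       /* "." = the current working directory */
    struct dirent *entry;

    if (dp == NULL) {
        perror("opendir");
        return 1;
    }

    /* Each directory entry is essentially a (name, inode number) pair. */
    while ((entry = readdir(dp)) != NULL)
        printf("inode %8lu  <-  %s\n",
               (unsigned long)entry->d_ino, entry->d_name);

    closedir(dp);
    return 0;
}

You can check the output against the shell command ls -i, which prints the same inode numbers.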

From the above discussion, you can see that some of the blocks on a file system store data,
and some store meta-data. The meta-data that we’ve seen so far for the Linux example
include the file system superblock, and the inodes. During formatting (for traditional
Linux/UNIX file systems like the EXT family), the superblock and a fixed number of inodes are
created in some of the available blocks on the file system device, putting those blocks into
permanent use. The rest of the blocks are then available to hold data or additional meta-data
as needed.
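
A running system will happily report these formatting decisions. The following C sketch uses statvfs(3) to query a mounted file system (the root file system, in this example) for its block size, its total and free blocks, and its total and free inodes (the fixed pool created at format time):

#include <stdio.h>
#include <sys/statvfs.h>

int main(void)
{
    struct statvfs vfs;

    if (statvfs("/", &vfs) != 0) {
        perror("statvfs");
        return 1;
    }

    printf("block size   : %lu bytes\n", vfs.f_bsize);
    printf("blocks total : %llu  free: %llu\n",
           (unsigned long long)vfs.f_blocks,
           (unsigned long long)vfs.f_bfree);
    printf("inodes total : %llu  free: %llu\n",
           (unsigned long long)vfs.f_files,
           (unsigned long long)vfs.f_ffree);
    return 0;
}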

File System Implementation:

The book describes three general methods for implementing a file system:
• contiguous allocation
• linked list allocation
• indexed allocation.

Contiguous allocation is appealing because it only requires us to remember a starting disk
block number and the number of disk blocks in the file object. But because allocations have to
be contiguous, if a file needs to grow and the disk blocks following its current deployment are in
use by some other file, then we have to find a suitably large contiguous region for the new size
of the file, and copy the original data to this new location. This is very inefficient, so these
kinds of file systems are not used very often.

Linked list file systems also have the appeal of needing very little information to access a file,
just the starting block location and where in the file you want to go. Of course, to get to a
destination, you must read every block in between, since each block contains a pointer to the
next block in the file. This solves the contiguity problem, but introduces serial access
semantics, which perform very poorly.
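
To see why linked list allocation serializes access, consider this toy in-memory model (all block numbers are made up for illustration). To reach the n-th block of a file, we have to follow the chain one link at a time, and on a real device each hop would cost a disk read:

#include <stdio.h>

#define END_OF_FILE -1

int main(void)
{
    /* next[b] holds the block that follows block b in some file's chain.
       Here the file occupies disk blocks 0 -> 3 -> 6 -> 2. */
    int next[8] = { 3, END_OF_FILE, END_OF_FILE, 6,
                    END_OF_FILE, 1, 2, END_OF_FILE };

    int block  = 0;   /* the file's starting block, kept in its control structure */
    int target = 3;   /* we want the 4th block (index 3) of the file              */
    int hops;

    for (hops = 0; hops < target && block != END_OF_FILE; hops++)
        block = next[block];   /* one disk read per hop on a real device */

    printf("file block index %d lives in disk block %d\n", target, block);
    return 0;
}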

The indexed allocation model is the most widely used model, since it has no contiguity
requirements, and no serialization semantics. This is the kind of file system found in
Linux/UNIX, Windows, and most other general purpose platforms. In fact, the term inode
stands for index-node.

An indexed file system uses some type of file control structure to hold information about a file
object, including an index of the file system blocks that contain the data belonging to that file.
The main problem with this type of deployment is managing enough index entries to keep
track of large file objects. After all, we can’t afford to have inodes that are big, since we need
potentially millions of them to keep track of all of the files in a file system. The Linux/UNIX
solution to this problem is to keep inodes quite small (usually 256 bytes or less), and use an
indirection scheme that allows a simple small inode to keep track of every byte of a file that may
be larger than a terabyte. We will look at inodes in more detail in the next lesson, and in
the weekly exercises.
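
To preview how that indirection works, here is a highly simplified sketch (the sizes and the read_indirect() helper are hypothetical, and a real ext2/ext3 inode also has double and triple indirect pointers). It maps a byte offset within a file to the disk block that holds it, using 12 direct pointers plus a single-indirect block:

#include <stdio.h>

#define BLOCK_SIZE 4096
#define N_DIRECT   12

struct toy_inode {
    unsigned direct[N_DIRECT];   /* pointers to the file's first 12 blocks */
    unsigned single_indirect;    /* block that is full of further pointers */
};

/* Stand-in for reading one pointer out of an on-disk indirect block;
   a real implementation would read that block from the device. */
static unsigned read_indirect(unsigned indirect_block, unsigned index)
{
    return indirect_block * 1000 + index;   /* fabricated value for the demo */
}

static unsigned offset_to_block(const struct toy_inode *ino, unsigned long offset)
{
    unsigned long idx = offset / BLOCK_SIZE;   /* which file block holds it? */

    if (idx < N_DIRECT)
        return ino->direct[idx];               /* cheap: direct pointer */
    return read_indirect(ino->single_indirect, idx - N_DIRECT);
}

int main(void)
{
    struct toy_inode ino = {
        { 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112 }, 500
    };

    printf("byte 0      -> disk block %u\n", offset_to_block(&ino, 0));
    printf("byte 100000 -> disk block %u\n", offset_to_block(&ino, 100000));
    return 0;
}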

Regardless of what kind of file system we may be using, we typically try and cache as much of
the file system as practical in RAM for performance purposes. When files change, the change
is not permanent until the new data and meta-data are written to disk, but because of delayed
writes for performance, if a system crashes while it has dirty buffers, then the files on the file
system may become inconsistent. Crashes lead to 2 major problems: lost data (those dirty
buffers in RAM) and slow restarts while the operating system tries to put the file system back
into a consistent state (in Linux we use a tool called fsck, the file system checker, for this
purpose). To help mitigate these problems, many file systems include options for what is called
journaling. The idea of journaling is to write a journal entry immediately to disk whenever a file
object changes (the journal entry is small and fast, and takes much less effort than writing the
actual data and meta-data changes). The journal entry represents a transaction that’s in
progress, but has not yet completed. When the actual changes that the journal describes are
finally written completely to disk, the journal entry is discarded. So the journal is generally quite
small, and only includes those transactions that are in progress. If the system crashes, upon
recovery, the journal specifies what parts of the file system could possibly be corrupted, saving
the exhaustive search that fsck used to have to do to put a file system back online.
Depending on how much we journal (what settings we select for a given file system), the journal
can also help with saving what would otherwise have been lost data. Linux has three settings
for journaling on its ext3 and ext4 file systems, the most elaborate of which will slow down your
system, but give you the maximum chance of saving data and restarting quickly after a crash.
The simplest journaling option has very little impact on file system performance, but cannot
recover dirty buffers at all after a crash. It is still very useful for bringing a file system back on
line quickly, however.
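
Applications that can’t afford to lose dirty buffers don’t have to rely on journaling alone; they can force their own data to the device. Here is a minimal sketch using fsync(2) (the file path is just an example):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/tmp/important.dat", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    write(fd, "critical record\n", 16);

    /* Block until the data (and its meta-data) actually reach the
       device, so a crash after this point cannot lose the record. */
    if (fsync(fd) != 0)
        perror("fsync");

    close(fd);
    return 0;
}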

Week #8 Summary:

File systems are designed to provide persistent storage for computing systems. You can think
about a specific file system implementation as a generic block driver that, when deployed in an
operating system, will be layered over some specific block device driver (like a SCSI disk driver
for example). This means that file systems must have an interface to a block device upon which
to store and retrieve bytes. The classic block device is a rotational disk, which can only transfer
data in block size units (commonly 512 byte or 4 KiB units called disk sectors). When we talk to
a disk to request a read, for example, we must tell it what sector to start with, and how many
contiguous sectors that we want to transfer. We must also tell it where in RAM we want these
bytes to go.

The information that a file system stores on a block device includes both data and meta-data.
Not all file system objects actually contain data (for example, a file may exist but be empty), but
every file object must be represented by some meta-data. The basic units of meta-data are
deposited on a block device when we format that device for some sort of file system. These
items typically include a kind of table-of-contents called a superblock, and a collection of file
control structures (called inodes in the Linux/UNIX world).

When a file system is made available to an operating system (via some sort of mounting
operation), the superblock item is cached into RAM, and remains there as long as the file
system remains mounted. This newly mounted file system appears somewhere within the file
system hierarchy of the running system. Once mounted, a file system’s contents become
available beginning at some path that represents the top level directory of that file system. In
Linux, the first file system mounted at boot time is called the root file system, and the directory
named / becomes the root of the entire file hierarchy. Subsequent file systems can now be
mounted on any directory that’s available from the root file system. For example, if we
imagine that we have a root file system that has a directory at /dir1/dir2 (here dir1 is a directory
in / and dir2 is a directory in dir1), and that we have a block device found at the path
/dev/sda3 with an ext3 file system formatted on it, we could run the following shell command:

bash #> mount -t ext3 /dev/sda3 /dir1/dir2

Of course you must be logged in as root (superuser) to do this, but when the command
completes, then we have access to all that’s on the file system on the block device /dev/sda3
by using the path /dir1/dir2 as a prefix to any file access command. For example, if there is an
object called mydoc.txt in the base level directory of the newly mounted file system, we could
cat its contents to the screen with the shell command:

bash $> cat /dir1/dir2/mydoc.txt

A given Linux system may have several discrete file systems on multiple disks (or disk
partitions). When the system boots, the root file system will be mounted (it is a condition of a
successful boot), but the other file systems will not be available until each one is mounted (this
is usually done by system initialization scripts, which run after actual boot).

A given file system will generally support an ordinary file object and a directory file object as a
minimum number of file object types, but there may be more, as we’ll see next lesson. In Linux,
ordinary files are containers that can hold arbitrary bytes; sometimes the bytes represent
binary executable files, sometimes plain text files, but Linux treats them all as ordinary file
objects. Directory file objects, on the other hand, do not have arbitrary bytes in them, but
contain specific information in the form of a file object’s human readable name and the
unique inode number that identifies that file object to the operating system. File object names
are for use by people, but the system recognizes a file object by its unique internal inode
number, so a directory is really just a translation table. Directory entries really don’t know any
details about the objects they list; they just know how to find the inode for each of these
objects, and it’s the inode that has all the details of any file object of any type.

There are different ways to implement file systems, but in today’s world, most contemporary
implementations use some form of indexed allocation mechanism. This is true in Windows,
Linux/UNIX, OSX, etc. The indexing element in Linux is called an inode, and there is exactly
one inode put into use for each and every file object in a Linux file system. The inode contains a
collection of state information about the object it’s tied to, as well as a list of disk blocks that
the object is currently using (if any). Whenever a file object changes (for example, more data is
written to an existing file) the file object’s inode must be updated as well, to represent the new
details about the object.

File systems tend to try and cache as much content as they can in RAM, delaying updates to
disk until they’ve collected enough information to make the write-to-disk operation efficient.
This becomes a problem when a system crashes, since changes to file objects may be stuck in
RAM cache and never make it to disk. When we try and reboot the system, the file system
driver will not allow file systems to be remounted until they are checked for consistency, and
this can take a long time using the default exhaustive check strategy. Journaling was
introduced some years ago in an attempt to expedite this situation. Almost all contemporary file
systems now allow system administrators to enable journaling in some form. Journaling
introduces additional I/O overhead, but it can save significant amounts of time, and potentially
salvage what would have been lost data upon recovering from a crash.

• The Ext3/Ext4 filesystems can be configured to log the operations affecting both metadata
and data blocks
• The system administrator decides what must be logged:
– Journal
• All data and metadata changes are logged into the journal
– Ordered
• Only changes to filesystem metadata are logged into the journal
• The Ext3 filesystem groups metadata and related data blocks so that data
blocks are written to disk before the metadata
• This way, the chance of having data corruption inside the files is reduced;
for instance, each write access that enlarges a file is guaranteed to be
fully protected by the journal
• This is the default Ext3 journaling mode
– Writeback
• Only changes to filesystem metadata are logged, with no ordering between
data and metadata writes (so files may contain stale data after a crash)
• This is the method used by many other journaling filesystems and is the fastest
