COMP 293
System Security
The File System
File System Architecture
Structural view of the file system
File System Architecture
User space contains the applications (the users of the file
system) and the GNU C Library (glibc), which provides the
user interface for the file system calls (open, read, write,
close). The system call interface acts as a switch,
funneling system calls from user space to the appropriate
endpoints in kernel space.
The VFS is the primary interface to the underlying file
systems. This component exports a set of interfaces and
then abstracts them to the individual file systems, which
may behave very differently from one another.
Two caches exist for file system objects (inodes and
dentries – directory entries). Each provides a pool of
recently-used file system objects.
File System Architecture
Simplified further, it looks like the figure above
~100 file systems are supported, many of them legacy
A common VFS interface is one way to support legacy file systems
Virtual File System Layer
VFS acts as the root level of the file system
interface. The VFS keeps track of the currently-
supported file systems, as well as those file
systems that are currently mounted
File systems can be dynamically added/removed
Kernel keeps list of currently-supported file
systems, which can be viewed from user space
through the /proc file system.
The /proc virtual file system also shows the devices
currently associated with the file systems
We will get back to /proc
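As a rough illustration of viewing the supported file systems from user space, the sketch below parses text in the layout of /proc/filesystems (an optional "nodev" flag, a tab, then the file system name). The function name and the sample text are assumptions made for this example, not part of the slides.

```python
def parse_proc_filesystems(text):
    """Return (name, needs_block_device) pairs from /proc/filesystems-style text."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is "<flag>\t<name>"; the flag is either "nodev" or empty.
        flag, _, name = line.rpartition("\t")
        entries.append((name.strip(), flag.strip() != "nodev"))
    return entries


# Illustrative sample only; real contents vary from system to system.
SAMPLE = "nodev\tsysfs\nnodev\tproc\n\text4\n\tvfat\n"

supported = parse_proc_filesystems(SAMPLE)
for name, needs_device in supported:
    print(f"{name:8} {'block device' if needs_device else 'no device (virtual)'}")
```

On a Linux machine the same function can be given the text read from /proc/filesystems; /proc/mounts lists the file systems that are currently mounted.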
Buffer Cache
Buffer cache keeps track of r/w requests from the:
Individual file system implementations
Physical devices (through device drivers)
Linux maintains a cache of the requests
Avoids going back to physical device for all
requests - efficient
The most-recently used buffers (pages) are
cached thus can be quickly provided back to
the individual file systems
• Not found in cache? – Page fault
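The caching idea above can be sketched as a small least-recently-used (LRU) cache. This is a deliberately simplified model, not how the Linux kernel implements its cache; the class name, the capacity, and the fake device function are all assumptions made for illustration.

```python
from collections import OrderedDict


class BufferCache:
    """Toy LRU cache of disk blocks, keyed by block number."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block  # called only on a cache miss
        self.blocks = OrderedDict()   # oldest entry first
        self.hits = 0
        self.misses = 0

    def get(self, block_no):
        if block_no in self.blocks:
            self.blocks.move_to_end(block_no)  # mark as most recently used
            self.hits += 1
            return self.blocks[block_no]
        self.misses += 1
        data = self.read_block(block_no)       # go to the (slow) device
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict the least recently used
        return data


device_reads = []


def fake_device_read(block_no):
    """Stand-in for a device driver read; records each physical access."""
    device_reads.append(block_no)
    return f"block-{block_no}".encode()


cache = BufferCache(capacity=2, read_block=fake_device_read)
for n in (1, 2, 1, 3, 2):
    cache.get(n)
print("hits:", cache.hits, "misses:", cache.misses, "device reads:", device_reads)
```

The second request for block 1 is served from memory; block 2 has to be read again because it was evicted when block 3 arrived.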
Page Faults
The thread experiencing the page fault is put into
a Wait state while the operating system finds the
specific page on disk and restores it to physical
memory
Takes time
• Sends trap to the OS
• Save user registers and process state
• Determine location of the page on the disk
Large numbers of page faults are an indication of
insufficient RAM
Also cause page reads (see disk counters)
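A process can inspect its own fault counters. The sketch below uses Python's standard resource module (Unix only); the helper name and the 32 MiB allocation are assumptions chosen for illustration, and the exact numbers depend on the system.

```python
import resource


def fault_counts():
    """Return this process's page fault counters as reported by getrusage."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    # Minor faults are serviced without disk I/O; major faults require it.
    return {"minor": usage.ru_minflt, "major": usage.ru_majflt}


before = fault_counts()
buffer = bytearray(32 * 1024 * 1024)  # touching fresh memory usually causes minor faults
after = fault_counts()

print("minor faults during allocation:", after["minor"] - before["minor"])
print("major faults during allocation:", after["major"] - before["major"])
```

A steadily growing major fault count is the symptom the slide associates with insufficient RAM.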
Page Faults
In top, press f to add fault fields such as nMaj (major page faults); these are display fields, not command-line flags
Block & Character Devices
Block devices move data in fixed-size blocks
(such as disk sectors)
They support buffering and random access
(blocks need not be read sequentially; any block
can be accessed at any time). Block devices
include hard drives, CD-ROMs, RAM disks.
Character devices differ in that they do not
have a physically addressable medium. Character
devices include serial ports and tape devices, in
which data is streamed character by character.
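The distinction is visible in a device node's file mode. The sketch below classifies a mode with the standard stat module; the function names and the example paths are assumptions, and the paths may not exist on every system.

```python
import os
import stat


def device_kind(mode):
    """Classify a st_mode value as a block device, character device, or other."""
    if stat.S_ISBLK(mode):
        return "block"
    if stat.S_ISCHR(mode):
        return "character"
    return "other"


def kind_of_path(path):
    """Classify an existing path by looking at its mode."""
    return device_kind(os.stat(path).st_mode)


# Example paths only; which ones exist depends on the machine.
for path in ("/dev/null", "/dev/sda"):
    if os.path.exists(path):
        print(path, "->", kind_of_path(path))
```

On a typical Linux system /dev/null reports as a character device, while a disk node such as /dev/sda reports as a block device.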
Journaling
Journaling records the major steps of recent file
system operations
If the system crashes, it can boot, fall back to the
last known good state, and replay the journal to
recover to the point of the crash
Hard drives may have their own write caches;
journaling thus forces the device to flush its cache
at certain points in the journal (called barriers in
ext3 and ext4)
ACID (atomicity, consistency, isolation, durability):
a set of properties that guarantee database
transactions are processed reliably
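The recovery idea can be sketched with a toy journal: after a crash, only transactions that reached a commit record are applied. The record layout and names below are invented for illustration and do not reflect the on-disk journal format of ext3 or ext4.

```python
def replay(journal):
    """Rebuild state by applying writes from committed transactions only."""
    committed = {record[1] for record in journal if record[0] == "commit"}
    state = {}
    for record in journal:
        if record[0] == "write" and record[1] in committed:
            _, _txid, key, value = record
            state[key] = value
    return state


# Hypothetical journal contents at the moment of a crash.
journal = [
    ("begin", 1),
    ("write", 1, "inode 12 size", 4096),
    ("commit", 1),
    ("begin", 2),
    ("write", 2, "inode 12 size", 8192),  # crash happened before the commit record
]

recovered = replay(journal)
print(recovered)
```

Transaction 2 never committed, so its write is discarded and the file system returns to the state left by transaction 1.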
Vocabulary
Atomicity (database systems): a property of
database transactions which are guaranteed to
either completely occur, or have no effects.
Atomicity states that database modifications must
follow an “all or nothing” rule. Each transaction is
said to be “atomic.” If one part of the transaction
fails, the entire transaction fails. It is critical that
the database management system maintain the
atomic nature of transactions in spite of any
DBMS, operating system or hardware failure
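The "all or nothing" rule can be demonstrated with Python's built-in sqlite3 module. The table, the account names, and the simulated failure are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount, fail=False):
    """Move money between accounts as one transaction; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            if fail:
                raise RuntimeError("simulated failure mid-transaction")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
        return True
    except RuntimeError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


failed_ok = transfer(conn, "alice", "bob", 30, fail=True)
balances_after_failure = balances(conn)
success_ok = transfer(conn, "alice", "bob", 30)
balances_after_success = balances(conn)
print(balances_after_failure, balances_after_success)
```

The failed transfer leaves both balances untouched even though its first UPDATE had already run; only the complete transfer changes the data.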
Linux File Systems
Linux has several standard file systems
ext2 - legacy, general purpose, based on UFS
ext3 – journaling extension of ext2
avoids long file system checks after system crash
ext4 – journaling successor to ext3
ReiserFS – journaling file system (infamous)
XFS – IRIX’s file system (good for streaming media)
ZFS – Sun’s combined file system & volume mgr
procfs – interface to internal kernel structures
swap – used to support virtual memory
Additional File Systems
Several foreign file systems are supported
Easier to exchange files with another OS
Work just like native ones, except:
May lack some usual UNIX features
• Long File Name support
• UNIX permissions
May have curious limitations/oddities
CDROM file systems supported
isofs – iso9660 CDROM file system
Joliet - Microsoft CDROM filesystem extensions
Why is it called Joliet, you ask?
Joliet ISO 9660 Extension
At the time of Windows NT 4.0 and Windows
95/98, the file system of choice to record files
with names of up to 128 characters was called Romeo.
Romeo didn’t contain the ISO 9660 file system
and broke backward compatibility with DOS
Joliet combined Romeo and ISO 9660
A comment in the Joliet.doc metadata file of the MS Win95
Driver Development Kit reads: "Joliet is a small town
just outside of Chicago, where a man named Jake
did some time in The Blues Brothers."
Elwood J. Blues, "Joliet" Jake B. Blues, Bluesmobile
Additional File Systems
NFS - Network file system allows multiple users or
hosts to share the same files using a client/server
methodology
NTFS – preferred Microsoft file system since NT;
provides ACLs and journaling
FAT, FAT32 – potential complications with 8.3 vs. long
filenames (LFN) and with permissions
Commonly used on USB flash drives
vfat – LFN compatible
Partitioning
Partitioning is dividing a
single hard drive into
several logical drives.
A partition is a contiguous
set of blocks on a drive,
treated as an independent
disk.
A partition table is an index
that relates sections of the
hard drive to partitions.
Multiple Partitions
Multiple partitions reduce risk of
system failure should a partition
become full
Segregating OS and user data
space protects operating system if
allocated disk space is exhausted
Partitions can contain different
operating systems
Partitions can contain different
file systems
A partition is called a slice in BSD,
Solaris, and GNU Hurd
Physical Disk Structure
1 or more circular platters
Platter has upper & lower
oxide-coated surface
Heads - min 1 per surface
Mounted on arms
Heads float very close to
platter surfaces,
normally never touching them
Contact causes a disk crash
Partition Fields
Device: the partition's device name
Start: drive sector where partition begins
End: drive sector where partition ends
Size: partition's size (in MB)
Type: partition type (e.g. ext2, ext3, or vfat)
Mount Point: where partition will be mounted
within directory hierarchy (e.g. /, /var, /usr)
Tools - fdisk, cfdisk
Always document settings – hard copy
Master Boot Record
MBR – 512-byte sequence at first sector of drive
MBR used for one or more of:
Holding a partition table (thus called a partition sector)
Bootstrapping an operating system.
• PC BIOS loads the MBR from disk and passes
execution to machine code instructions at
the beginning of the MBR
Identifying the disk with a 32-bit disk signature
MSDOS fdisk /mbr rewrites MBR – undocumented
GRUB and LILO can write to the MBR
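A minimal sketch of reading these fields from a 512-byte sector, assuming the classic MBR layout: disk signature at byte offset 440, four 16-byte partition entries starting at offset 446, and the boot signature 0x55 0xAA in the last two bytes. The function name and the synthetic sector are made up for illustration; no real disk is read.

```python
import struct

SECTOR_SIZE = 512
DISK_SIGNATURE_OFFSET = 440   # 32-bit disk signature
PARTITION_TABLE_OFFSET = 446  # four 16-byte entries follow
ENTRY_SIZE = 16
BOOT_SIGNATURE = b"\x55\xaa"  # last two bytes of the sector


def parse_mbr(sector):
    """Return (disk_signature, partitions) from a classic MBR sector."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR is exactly 512 bytes")
    if sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("missing 0x55AA boot signature")
    (disk_signature,) = struct.unpack_from("<I", sector, DISK_SIGNATURE_OFFSET)
    partitions = []
    for index in range(4):
        offset = PARTITION_TABLE_OFFSET + index * ENTRY_SIZE
        # status, CHS start, type, CHS end, first LBA sector, sector count
        status, _chs_first, ptype, _chs_last, lba_start, count = struct.unpack_from(
            "<B3sB3sII", sector, offset
        )
        if ptype == 0:  # type 0 marks an empty slot
            continue
        partitions.append(
            {"slot": index + 1, "bootable": status == 0x80, "type": ptype,
             "lba_start": lba_start, "sectors": count}
        )
    return disk_signature, partitions


# Build a synthetic sector with one bootable Linux (type 0x83) partition.
sector = bytearray(SECTOR_SIZE)
struct.pack_into("<I", sector, DISK_SIGNATURE_OFFSET, 0x1234ABCD)
struct.pack_into("<B3sB3sII", sector, PARTITION_TABLE_OFFSET,
                 0x80, b"\x00\x00\x00", 0x83, b"\x00\x00\x00", 2048, 1_000_000)
sector[510:512] = BOOT_SIGNATURE

signature, parts = parse_mbr(bytes(sector))
print(hex(signature), parts)
```

Because the table has room for only four entries, this layout is also where the limit of four primary partitions comes from.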
Master Boot Record
MBR in more detail
The partition
definitions in the
MBR can place
partitions anywhere
on the disk; they
do not need to be
ordered from outermost
to innermost
Master Boot Record
The organization of the
partition table in the MBR
limits the maximum
addressable storage space
of a partitioned disk to 2
TiB (2^32 × 512 bytes).
The MBR-based partitioning
scheme is in the process of
being superseded by the
GUID (Globally Unique
Identifier) Partition Table
(GPT). GPT can co-exist
with an MBR.
GPT is part of the UEFI
standard
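The 2 TiB figure follows directly from a 32-bit sector address and 512-byte sectors. A quick check of the arithmetic, with constant names chosen for this example:

```python
SECTOR_BYTES = 512   # sector size used on the slide
LBA_FIELD_BITS = 32  # width of the sector address in an MBR partition entry
TIB = 2**40          # one tebibyte in bytes

max_bytes = 2**LBA_FIELD_BITS * SECTOR_BYTES
print(max_bytes, "bytes =", max_bytes // TIB, "TiB")
```

The same 32-bit field on a disk with 4096-byte sectors would address 16 TiB, which is why the limit is usually quoted together with the sector size.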
cfdisk
Disk Partition
IDE Disk Partition Example
Note: Physical drives can contain max. 4 primary partitions
/dev/hda (Primary Master Disk)
/dev/hda1 (First Primary Partition)
/dev/hda2 (Second Primary Partition)
/dev/hdb (Primary Slave Disk)
/dev/hdb1
/dev/hdc (Secondary Master Disk)
/dev/hdc1
Disk Partition
SATA and SCSI Disk Partition Example
Note: Physical drives can contain max. 4 primary partitions
/dev/sda (First Disk)
/dev/sda1 (First Primary Partition)
/dev/sda2 (Second Primary Partition)
/dev/sdb (Second Disk)
/dev/sdb1
/dev/sdc (Third Disk)
/dev/sdc1
Warning
MultiBoot with Windows & Linux can be tricky
Partitioning can be complicated
Be careful altering partitions
Assume it will go bad
Back up data prior to making any changes
Read the documentation
Research what others have done
Check with other students
Have a backout plan
Linux Disk Utilities
fdisk /dev/hda            Linux/DOS drive partitioning tool
cfdisk /dev/hda           Easier to use
sfdisk -l                 Lists the partition tables
parted /dev/hda           Partition manipulation tool
fsck -t ext2 /dev/hda2    Check & repair file system
fsck runs automatically at boot if the OS detects that the file
system wasn't properly shut down. Run it when the file
system is unmounted or mounted read-only.
Similar to MS scandisk or chkdsk
sfdisk
root@tea:~# sfdisk -l
Disk /dev/hda: 1027 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
   Device Boot  Start    End   #cyls    #blocks   Id  System
/dev/hda1   *      0+    995    996-   8000338+   83  Linux
/dev/hda2        996    1026     31     249007+   82  Linux swap
/dev/hda3          0       -      0          0     0  Empty
/dev/hda4          0       -      0          0     0  Empty
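The units in this listing can be checked by hand. The sketch below assumes 512-byte sectors, which the listing does not state explicitly; the variable and function names are chosen for this example.

```python
SECTOR_BYTES = 512  # assumption: the usual sector size, not shown in the listing

heads, sectors_per_track, cylinders = 255, 63, 1027  # from the "Disk /dev/hda" line

cylinder_bytes = heads * sectors_per_track * SECTOR_BYTES
disk_bytes = cylinders * cylinder_bytes


def cylinders_to_blocks(count, block_bytes=1024):
    """Convert a cylinder count to 1024-byte blocks, as in the #blocks column."""
    return count * cylinder_bytes / block_bytes


swap_blocks = cylinders_to_blocks(31)  # /dev/hda2 spans 31 cylinders

print("bytes per cylinder:", cylinder_bytes)
print("disk size in bytes:", disk_bytes)
print("swap partition blocks:", swap_blocks)
```

The cylinder size matches the "cylinders of 8225280 bytes" line, and 31 cylinders come to 249007.5 blocks, consistent with the 249007+ entry, where the trailing sign marks a rounded value.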
sfdisk
sudo sfdisk -l (edited), Ubuntu
gpt
UEFI
Inconsistent State
A non-graceful shutdown (crash, power loss) can
leave a file system in an Inconsistent State. Prior
to journaling file systems, it was common for an
improperly shut-down Unix system's file system to
develop a corrupted superblock.
Running fsck to fix this could take minutes to hours
(depending on volume size and disk I/O throughput)
The consequences of fsck not being able to fix the
error are not good
Hence: “fsck” and “fscked”
fsck on OS-X
Note: Spelling auto-
correct doesn’t play
well with UNIX terms
Remember
MBR: Master Boot Record
Superblock: record of a file system's own metadata (e.g. size and block size)
Big-endian machine stores most significant byte first
Little-endian machine stores least significant byte first
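The two byte orders can be seen by packing the same 32-bit value both ways with the standard struct module. The value 0x0A0B0C0D is an arbitrary example.

```python
import struct
import sys

value = 0x0A0B0C0D

big = struct.pack(">I", value)     # most significant byte first
little = struct.pack("<I", value)  # least significant byte first

print("big-endian:   ", big.hex(" "))
print("little-endian:", little.hex(" "))
print("this machine is", sys.byteorder, "endian")
```

On-disk structures such as the MBR fields store multi-byte integers in a fixed byte order, so a parser must state the order explicitly rather than rely on the host machine.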
Ponder this
Journaling file systems
RAID
What if you don't have ECC?
Error correction code memory (ECC memory) uses an error correction
code (ECC) to detect and correct n-bit data corruption which occurs in
memory. ECC memory is used in most computers where data
corruption cannot be tolerated, like industrial control applications,
critical databases, and infrastructural memory caches.