File Systems and Storage
Dr. Bilgin Avenoğlu
Storage
• All computer systems require some way to store data permanently in order to boot, run and be useful.
• These bits reside on some sort of storage system.
• Even though data can be stored remotely or in memory, it is most frequently stored on local hard disks.
• Over the last few years, more and more of our files have moved “into the cloud”, giving easy access to large amounts of storage over the Internet.
Storage Models
• We distinguish different storage models by how the device keeping the bits in place interacts with the higher layers:
• by where raw block device access is made available,
• by where a file system is created to make the disk space available as a useful unit,
• by which means and protocols the OS accesses the file system.
• Three main components are involved:
• the storage device itself, i.e. the actual medium;
• the file system, which makes the block-level storage media accessible to the operating system;
• the application software.
Storage Models
• Direct Attached Storage
• Network Attached Storage
• Storage Area Networks
• Cloud Storage
Direct Attached Storage - DAS
• Hard drives are attached directly to the server (commonly via a host bus adapter and a few cables).
• The OS detects the block devices and maintains a file system on them, thus allowing access with the smallest level of indirection.
• The vast majority of hosts (laptops, desktops and servers alike) use this method.
• The term used nowadays is Direct Attached Storage (DAS).
Direct Attached Storage - DAS
• All components are within the control of a single server’s OS,
• frequently located within the same physical case;
• multiple servers each have their own storage system.
• It is common for servers to have multiple direct attached devices.
• Multiple direct attached disks can be combined to create a single logical storage unit through the use of a Logical Volume Manager (LVM) or a Redundant Array of Independent Disks (RAID).
• This allows for improved performance, an increased amount of storage and/or redundancy.
Redundant Array of Independent Disks (RAID)
• RAID 0 (Striping)
• RAID 1 (Mirroring)
• RAID 5 (Striping with Parity)
• RAID 6 (Striping with Double Parity)
• RAID 10 (Striping and Mirroring)
https://www.youtube.com/watch?v=U-OCdTeZLac
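RAID 5 and RAID 6 rely on parity. In RAID 5 the parity block of a stripe is the bitwise XOR of its data blocks, so any single lost block can be recomputed by XOR-ing everything that survived. The Python sketch below illustrates only that arithmetic for one stripe; the block contents and the four-disk layout are made-up assumptions, and it ignores parity rotation across disks, RAID 6's second parity block, and everything else a real controller does.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_parity(data_blocks):
    # RAID 5: the parity block of a stripe is the XOR of its data blocks.
    return xor_blocks(data_blocks)

def rebuild_missing(surviving_blocks):
    # XOR of all surviving blocks (data + parity) yields the lost block,
    # because x ^ x == 0 cancels every block that is still present.
    return xor_blocks(surviving_blocks)

# One stripe on a hypothetical 4-disk array: 3 data blocks + 1 parity block.
d0, d1, d2 = b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"
p = make_parity([d0, d1, d2])

# Simulate losing the disk that held d1, then reconstruct it.
recovered = rebuild_missing([d0, d2, p])
print(recovered == d1)  # True
```

The same XOR trick is why RAID 5 survives exactly one disk failure: with two blocks of a stripe missing, one equation cannot recover two unknowns.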
Direct Attached Storage - DAS
• DAS need not be physically located in the same case (or even rack) as the server using it.
• We differentiate between
• internal storage media, attached inside the server with no immediate external exposure, and
• external storage media, attached to a server’s interface ports such as Fibre Channel or USB, with cables whose maximum length depends on the technology used.
Direct Attached Storage - DAS
• The advantages of DAS should be obvious:
• there is no network or other additional layer between the OS and the hardware,
• so the possibility of failure on that level is eliminated,
• and no performance penalty due to network latency is incurred.
• There are also some disadvantages.
• Because the storage media is directly attached, it is isolated to a certain degree from other systems on the network.
• This is both an advantage and a drawback:
• each server requires certain data to be private or unique to its OS;
• but data on one machine cannot immediately be made available to other systems.
Network Attached Storage - NAS
• We want to be able to access certain data from multiple servers.
• For example, we store all our users’ data on shared disks that are made available to all clients over the network.
• When a user logs into hostA, she expects to find all her files in place, just as when she logs into hostB.
• This requires two things:
• (1) the host’s file system has to know how to get the data from a central location;
• (2) the central storage has to be accessible over the network.
Network Attached Storage - NAS
• One host functions as the “file server”;
• multiple clients access the file system over the network.
• The file server may be a general-purpose Unix/Linux system or a special network appliance.
• It provides access to a number of disks or other storage media,
• which, within this system, are effectively direct attached storage.
Network Attached Storage - NAS
• From the clients’ perspective, the job of managing storage has become simpler:
• I/O operations are performed on the file system much as they would be on a local file system,
• with the complexity of shuffling the data over the network being handled by the protocol in question.
• Even though the file system is created on the file server,
• the clients still require support for the network file system protocol (e.g., NFS or SMB).
Network Attached Storage - NAS
• In contrast to DAS, a dedicated file server generally contains significantly more and larger disks;
• RAID or LVM may likewise be considered a requirement in this solution,
• to ensure both performance and failover.
• Given the additional overhead of transferring data over the network,
• a certain performance penalty (mainly due to network speed or congestion) is incurred.
Network Attached Storage - NAS
• NAS allows multiple clients to access the same file system over the network.
• This means that all clients must support that specific network file system.
• The NAS file server creates and manages the file systems on the storage media and allows for shared access,
• overcoming many limitations of DAS.
QNAP Sample NAS Device
Storage Area Networks
• A Storage Area Network (SAN) is a dedicated network in which central storage media is accessed using high-performance interfaces and protocols such as Fibre Channel or iSCSI, making the exposed devices appear local on the clients.
• Key properties of SANs:
• storage size,
• data availability,
• data redundancy,
• performance
Storage Area Networks
• Fibre Channel (FC) is a high-speed data transfer protocol providing in-order, lossless delivery of raw block data.
• Internet Small Computer System Interface (iSCSI) works on top of TCP and allows SCSI commands to be sent end-to-end over LANs, WANs or the Internet.
Storage Area Networks
• A SAN providing access to three devices:
• one host accesses parts of the available storage as if it were DAS,
• while a file server manages other parts as NAS for two clients.
Comparison
• Direct Attached Storage (DAS): block-level access
• Network Attached Storage (NAS): file-level access through distributed file systems
• Storage Area Network (SAN): block-level access over a dedicated network
Cloud Storage
• Cloud storage services offer customers a way not only to store their files, but also to access them from different devices and locations:
• they effectively provide network attached storage over the largest of WANs, the Internet.
• Examples: Dropbox, Google Drive, Apple’s iCloud and Microsoft’s OneDrive.
Oracle Exadata
File
• Files are logical units of information created by processes.
• A disk contains thousands or even millions of them, each one independent of the others.
• Files can be thought of as a kind of address space, used to model the disk instead of modeling the RAM.
• Processes can read existing files and create new ones if needed.
• Information stored in files must be persistent,
• that is, not affected by process creation and termination.
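To make the “file as address space” idea concrete, here is a minimal Python sketch: it writes bytes at two offsets, closes the file, reopens it, and reads one region back. The temporary path is an assumption for illustration, and reopening the file in the same script only stands in for a later process.

```python
import os
import tempfile

# A file behaves like a byte-addressable space: any offset can be written
# with seek() + write() and read back later.
path = os.path.join(tempfile.mkdtemp(), "demo.bin")  # hypothetical file name

with open(path, "wb") as f:
    f.write(b"hello")
    f.seek(100)          # jump past the end; the gap reads back as zero bytes
    f.write(b"world")

# Reopening simulates a later process: the data persisted on storage.
with open(path, "rb") as f:
    f.seek(100)
    tail = f.read(5)
size = os.path.getsize(path)

print(tail, size)  # b'world' 105
```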
File Structure
• Files can be structured in several ways.
• There are three common possibilities:
Three kinds of files. (a) Byte sequence. (b) Record sequence. (c) Tree.
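Unix and Windows both treat files as (a) plain byte sequences and leave any further structure to programs. As a hedged illustration of how (b) a record sequence can be layered on top of bytes, the sketch uses a hypothetical fixed-size record layout (a 4-byte id, an 8-byte float, a 16-byte name), so record i is found directly at byte offset i * record size.

```python
import struct

# Hypothetical fixed-size record: little-endian int id, double balance, 16-byte name.
RECORD = struct.Struct("<id16s")   # 4 + 8 + 16 = 28 bytes, no padding with "<"

def pack_records(records):
    """Serialize (id, balance, name) tuples into one byte sequence."""
    # "16s" pads shorter names with NUL bytes (and truncates longer ones).
    return b"".join(RECORD.pack(rid, bal, name.encode()) for rid, bal, name in records)

def read_record(data, i):
    # Fixed-size records: record i starts at byte i * RECORD.size,
    # so it can be located without scanning the records before it.
    rid, bal, raw = RECORD.unpack_from(data, i * RECORD.size)
    return rid, bal, raw.rstrip(b"\0").decode()

data = pack_records([(1, 10.5, "alice"), (2, 99.0, "bob")])
print(len(data), read_record(data, 1))  # 56 (2, 99.0, 'bob')
```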
Types of Files
• The file types generally recognized by operating systems are:
• regular,
• directory,
• special.
• However, the operating system uses many variations of these basic
types.
Regular Files
• ASCII File
• Defined as a file that consists of ASCII characters.
• Usually created by a text editor like emacs, TextEdit, vi, Notepad, etc.
• ASCII files are also binary files, because they store binary numbers.
• ASCII files store 0's and 1's.
• ASCII code is a 7-bit code stored in a byte.
• The eighth bit was originally reserved for error checking (parity).
• Binary File
• A full, general binary file has no restrictions.
• Any of the 256 possible 8-bit patterns can be used in any byte of a binary file.
• We work with binary files all the time.
• Executables, object files, image files, sound files, and many other file formats are binary.
• What makes them binary is merely the fact that each byte of a binary file can be any of the 256 possible bit patterns.
• They're not restricted to the ASCII codes.
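Since ASCII is a 7-bit code, the difference between the two kinds of files shows up in the high bit of each byte. The sketch treats data as ASCII only if every byte is below 128; this is a heuristic, since a binary file can happen to contain only such bytes. Python's built-in `bytes.isascii()` (Python 3.7+) performs the same test.

```python
def is_ascii_text(data: bytes) -> bool:
    # A pure ASCII file has the eighth (most significant) bit of every byte
    # set to 0, i.e. every byte value is below 0x80.
    return all(b < 0x80 for b in data)

print(is_ascii_text(b"Hello, world\n"))                # True
print(is_ascii_text(bytes([0x89, 0x50, 0x4E, 0x47])))  # False: PNG files start with 0x89
```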
Executable Files
• A Portable Executable (PE) file, the executable format used by Windows, is a data structure that holds the information the OS loader needs to load that executable into memory and execute it.
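As a small, hedged illustration of such a data structure, the sketch checks the two signatures every PE file carries: the legacy MS-DOS header starting with "MZ", and the 4-byte little-endian field `e_lfanew` at offset 0x3C that points to the "PE\0\0" signature. It parses nothing beyond that, and the input is a synthetic buffer rather than a real executable.

```python
import struct

def find_pe_header(data: bytes):
    """Return the offset of the PE signature, or None if data is not a PE image."""
    # Every PE file starts with an MS-DOS header whose first two bytes are "MZ".
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    # e_lfanew (32-bit little-endian) at offset 0x3C points to the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        return None
    return e_lfanew

# Synthetic image: just enough bytes to satisfy both signature checks.
image = bytearray(0x80)
image[0:2] = b"MZ"
struct.pack_into("<I", image, 0x3C, 0x40)
image[0x40:0x44] = b"PE\0\0"

print(find_pe_header(bytes(image)))    # 64
print(find_pe_header(b"#!/bin/sh\n"))  # None
```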
File System
• Files are managed by the OS.
• How they are structured, named, accessed, used, protected,
implemented, and managed are major topics in OS design.
• The part of the OS dealing with files is known as the file system.
Some popular file systems
• Windows systems:
• FAT (old DOS, smaller USB drives)
• FAT32 (32-bit version for Windows, larger USB drives)
• NTFS (modern Windows)
• ReFS (Resilient File System, introduced with Windows Server 2012)
• Linux:
• ext2 (second extended file system, standard until several years ago, no journaling; still good for flash drives)
• ext3 (ext2 + journaling)
• ext4 (ext3 + extents)
• Plus about 40 others, including FAT and NTFS
• macOS:
• APFS (current macOS, successor to HFS+)
• HFS+ (previously used in Mac OS X, successor to HFS)
• Cross-platform (optical media):
• CD (ISO 9660)
• DVD (Universal Disk Format, UDF)
File Systems’ Properties
Storage Device Organization
• A general-purpose computer system can have multiple storage
devices, and those devices can be sliced up into partitions, which hold
volumes, which in turn hold file systems.
• Depending on the volume manager, a volume may span multiple
partitions.
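How a volume that spans several partitions maps its blocks can be sketched with a simple linear (concatenated) layout, similar in spirit to a linear logical volume: logical blocks fill the first partition, then continue in the next. The partition names, offsets and sizes are hypothetical, and real volume managers add metadata, extents and other layouts (such as striping) that are not modeled here.

```python
from bisect import bisect_right
from itertools import accumulate

class LinearVolume:
    """Concatenate partitions into one logical block address space."""

    def __init__(self, partitions):
        # partitions: list of (name, first_block, block_count)
        self.partitions = partitions
        # starts[k] = first logical block that lives in partition k
        self.starts = [0] + list(accumulate(count for _, _, count in partitions))[:-1]
        self.size = sum(count for _, _, count in partitions)

    def map(self, logical_block):
        """Translate a volume block number into (partition, physical block)."""
        if not 0 <= logical_block < self.size:
            raise IndexError("logical block outside the volume")
        k = bisect_right(self.starts, logical_block) - 1
        name, first, _ = self.partitions[k]
        return name, first + (logical_block - self.starts[k])

vol = LinearVolume([("sda1", 2048, 1000), ("sdb1", 2048, 500)])  # hypothetical
print(vol.map(999))   # ('sda1', 3047)
print(vol.map(1000))  # ('sdb1', 2048)
```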
File-System Layout
• Sector 0 of the disk is called the MBR (Master Boot Record) and is used to boot the computer.
• The partition table, at the end of the MBR, gives the starting and ending addresses of each partition.
• One of the partitions in the table is marked as active.
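The classic MBR can be parsed in a few lines. This sketch assumes the standard layout: 446 bytes of boot code, four 16-byte partition entries starting at offset 446 (0x1BE), and the signature bytes 0x55 0xAA at offsets 510 and 511. Besides CHS start/end fields, each entry stores a status byte (0x80 marks the active partition), a type byte, the first sector as an LBA and a sector count, from which the ending address follows. The input sector is synthetic; GPT disks and extended partitions are not handled.

```python
import struct

SECTOR_SIZE = 512
PART_TABLE_OFFSET = 446              # 0x1BE: four 16-byte entries follow the boot code
ENTRY = struct.Struct("<B3sB3sII")   # status, CHS first, type, CHS last, LBA first, sector count

def parse_mbr(sector0: bytes):
    """Return the non-empty partition entries of a classic MBR."""
    if len(sector0) < SECTOR_SIZE or sector0[510:512] != b"\x55\xaa":
        raise ValueError("missing MBR boot signature 0x55AA")
    partitions = []
    for i in range(4):
        status, _, ptype, _, lba_first, sectors = ENTRY.unpack_from(
            sector0, PART_TABLE_OFFSET + i * ENTRY.size)
        if ptype == 0:               # type 0x00 marks an unused slot
            continue
        partitions.append({
            "index": i,
            "active": status == 0x80,          # the bootable ("active") flag
            "type": ptype,
            "start": lba_first,
            "end": lba_first + sectors - 1,    # derived from the LBA start and count
        })
    return partitions

def active_partition(partitions):
    # The MBR boot code looks for the entry flagged 0x80 and loads its boot block.
    return next((p for p in partitions if p["active"]), None)

# Synthetic sector 0 with one active partition (0x83: Linux partition type).
mbr = bytearray(SECTOR_SIZE)
mbr[510:512] = b"\x55\xaa"
ENTRY.pack_into(mbr, PART_TABLE_OFFSET, 0x80, b"\0\0\0", 0x83, b"\0\0\0", 2048, 204800)
boot = active_partition(parse_mbr(bytes(mbr)))
print(boot["start"], boot["end"])  # 2048 206847
```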
File-System Layout
• The MBR program locates the active partition, reads in its first block, which is called the boot block, and executes it.
• The program in the boot block loads the OS contained in that
partition.
• Other than starting with a boot block, the layout of a disk partition
varies a lot from file system to file system.
File-System Layout
• The superblock contains all the key parameters about the file system and is read into memory when the computer is booted or the file system is first touched.
File-System Layout
• Next might come information about free blocks in the file system, for
example in the form of a bitmap or a list of pointers.
• This might be followed by the i-nodes, an array of data structures,
one per file, telling all about the file.
• After that, might come the root directory, which contains the top of
the file-system tree.
• Finally, the remainder of the disk contains all the other directories
and files.
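The free-block bitmap mentioned above can be sketched with one bit per block. In this illustration 1 means "in use" and 0 means "free" (an assumption; the convention differs between file systems), and allocation is a naive first-fit scan.

```python
class BlockBitmap:
    """Free-space bitmap: bit i is 1 when block i is in use."""

    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self.bits = bytearray((n_blocks + 7) // 8)   # 8 blocks per byte

    def is_used(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self):
        # First-fit scan for a 0 bit; real file systems search more cleverly.
        for block in range(self.n_blocks):
            if not self.is_used(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        raise OSError("no free blocks")

    def free(self, block):
        # Clear the bit; "& 0xFF" keeps the value a valid byte.
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF

bm = BlockBitmap(10)
print(bm.allocate(), bm.allocate())  # 0 1
bm.free(0)
print(bm.allocate())                 # 0 (the freed block is reused)
```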
Block
• A track is divided into blocks.
• The block size B is fixed for each system.
• Typical block sizes range from B = 512 bytes to B = 4096 bytes.
• Whole blocks are transferred between disk and main memory for processing.
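Because only whole blocks are transferred and allocated, a file's size is effectively rounded up to a multiple of B. The hypothetical helpers below compute the block count and the unused bytes in the last block for the two typical block sizes; the 10,000-byte file is an arbitrary example.

```python
def blocks_needed(file_size: int, block_size: int) -> int:
    # Round up with integer arithmetic: a partial block still costs a block.
    return (file_size + block_size - 1) // block_size

def slack_bytes(file_size: int, block_size: int) -> int:
    # Unused space left in the last block.
    return blocks_needed(file_size, block_size) * block_size - file_size

for B in (512, 4096):
    print(B, blocks_needed(10_000, B), slack_bytes(10_000, B))
# 512 20 240
# 4096 3 2288
```

Larger blocks mean fewer transfers per file but more wasted space in each file's last block.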
File Allocation Methods
• There are three methods:
• Contiguous
• Linked
• Indexed
Contiguous Allocation
• Every file occupies a set of consecutive blocks on the storage device.
• Each directory entry contains the file name, the starting address of the first block, and the number of blocks (size).
• Growing a file runs into a limit as soon as it reaches the next allocated block.
• Deleting or shrinking a file fragments the free space on the disk.
• Free space must then be compacted;
• otherwise, there is external fragmentation.
• However, the advantages are efficient access to a file's contents (random access) and easy disk management.
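A toy model of contiguous allocation, under simplifying assumptions (a 10-block disk, a dict as the directory, first-fit placement), shows both the constant-time address calculation start + i and how deletions leave free space too fragmented for a new file.

```python
class ContiguousDisk:
    """Toy contiguous allocator; the directory maps name -> (start, length)."""

    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self.directory = {}

    def free_runs(self):
        """Yield (start, length) for each run of free blocks, in disk order."""
        pos = 0
        for start, length in sorted(self.directory.values()):
            if start > pos:
                yield pos, start - pos
            pos = start + length
        if pos < self.n_blocks:
            yield pos, self.n_blocks - pos

    def create(self, name, n):
        # First fit: use the first free run that is large enough.
        for start, length in self.free_runs():
            if length >= n:
                self.directory[name] = (start, n)
                return start
        # Enough free blocks may exist in total, just not consecutively.
        raise OSError(f"no contiguous run of {n} blocks")

    def delete(self, name):
        del self.directory[name]

    def block_address(self, name, i):
        # Random access: logical block i of the file is simply start + i.
        start, length = self.directory[name]
        if not 0 <= i < length:
            raise IndexError(i)
        return start + i


disk = ContiguousDisk(10)
disk.create("a", 4)   # blocks 0-3
disk.create("b", 3)   # blocks 4-6
disk.create("c", 3)   # blocks 7-9
disk.delete("a")
disk.delete("c")      # 7 blocks are now free, but only as runs of 4 and 3
print(disk.block_address("b", 2))  # 6
try:
    disk.create("d", 5)
except OSError as err:
    print(err)                     # no contiguous run of 5 blocks
```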
Linked Allocation
• Each data block in the file contains the address of the next block.
• Each directory entry contains the file name, the address of the first block, and (optionally) a pointer to the last block.
• In this file allocation method, each file is treated as a linked list of disk blocks.
• Disk blocks need not be assigned to a file contiguously on the disk.
• Files can grow or shrink with almost no limit.
• File size need not be declared when the file is created.
• One disadvantage:
• direct access to an arbitrary block is not possible (no random access);
• access is sequential, just as with linked lists.
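Following the pointer chain is what makes random access expensive under linked allocation. In this sketch a dict stands in for the "next" pointer stored inside each data block, and the block numbers are arbitrary; reaching block n requires reading the n blocks before it.

```python
NIL = -1  # marks the last block of a file

# Hypothetical disk: disk_next[b] is the "next" pointer stored inside block b.
disk_next = {9: 16, 16: 1, 1: 10, 10: 25, 25: NIL}

def blocks_of(disk_next, first_block):
    """Sequential access: walk the chain from the directory's first block."""
    block = first_block
    while block != NIL:
        yield block
        block = disk_next[block]

def nth_block(disk_next, first_block, n):
    """Reaching block n costs n pointer reads; there is no shortcut."""
    block = first_block
    for _ in range(n):
        block = disk_next[block]
        if block == NIL:
            raise IndexError("file has fewer blocks")
    return block

print(list(blocks_of(disk_next, 9)))  # [9, 16, 1, 10, 25]
print(nth_block(disk_next, 9, 3))     # 10
```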
Indexed Allocation
• A set of pointers is maintained in an index table.
• The index table is in turn stored in one or more index blocks.
• In an index block, the ith entry (the pointer at position i) holds the disk address of the ith file block.
• All the advantages of linked allocation plus direct access to any block.
• Allows random access to any block/byte of the file when using fixed-size blocks.
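With an index block, locating a byte takes two cheap steps: divide the offset by the block size, then read one index entry. The 4096-byte block size and the addresses in `index_block` are assumptions for illustration.

```python
def block_address(index_block, i):
    """Indexed allocation: entry i of the index block holds file block i's address."""
    if not 0 <= i < len(index_block):
        raise IndexError(i)
    return index_block[i]   # one lookup, however large i is

def byte_to_block(offset, block_size=4096):
    # Fixed-size blocks: byte offset -> (logical block, offset within that block).
    return divmod(offset, block_size)

index_block = [9, 16, 1, 10, 25]   # hypothetical disk addresses of file blocks 0..4
blk, within = byte_to_block(10_000)
print(blk, within, block_address(index_block, blk))  # 2 1808 1
```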
Indexed Allocation
• If the index block is too small, it will not be able to hold enough pointers for a large file.
• Linked scheme
• An index block is normally one storage block.
• To allow for large files, we can link together several index blocks.
• Multi-level index
• A first-level index block points to a set of second-level index blocks, which in turn point to the file blocks.
• Combined scheme
• Keep the first, say, 15 pointers of the index block in the file’s inode.
• The first 12 of these pointers point to direct blocks;
• that is, they contain addresses of blocks that contain data of the file.
• Thus, the data for small files (of no more than 12 blocks) does not need a separate index block.
• The next three pointers point to indirect blocks: a single, a double, and a triple indirect block.
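The combined scheme can be made concrete with a little arithmetic. Assuming 4096-byte blocks and 4-byte block pointers (both assumptions), one index block holds 1024 pointers, so the 12 direct pointers plus one single, one double and one triple indirect pointer reach 12 + 1024 + 1024² + 1024³ blocks. The sketch maps a logical block number to the pointer (and index positions) that reach it; real file systems impose additional limits not modeled here.

```python
BLOCK_SIZE = 4096                        # bytes per block (assumed)
PTR_SIZE = 4                             # bytes per block pointer (assumed)
N_DIRECT = 12                            # direct pointers in the inode
PTRS_PER_BLOCK = BLOCK_SIZE // PTR_SIZE  # 1024 pointers per index block

def locate(n):
    """Return which inode pointer reaches file block n, plus the index positions."""
    if n < N_DIRECT:
        return ("direct", n)
    n -= N_DIRECT
    if n < PTRS_PER_BLOCK:
        return ("single", n)
    n -= PTRS_PER_BLOCK
    if n < PTRS_PER_BLOCK ** 2:
        return ("double", *divmod(n, PTRS_PER_BLOCK))
    n -= PTRS_PER_BLOCK ** 2
    if n < PTRS_PER_BLOCK ** 3:
        top, rest = divmod(n, PTRS_PER_BLOCK ** 2)
        return ("triple", top, *divmod(rest, PTRS_PER_BLOCK))
    raise ValueError("block number exceeds the maximum file size")

max_blocks = N_DIRECT + PTRS_PER_BLOCK + PTRS_PER_BLOCK ** 2 + PTRS_PER_BLOCK ** 3
print(max_blocks * BLOCK_SIZE)  # 4402345721856, slightly over 4 TiB
```

Small files are served entirely by the direct pointers, so only large files pay for the extra index-block reads.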
Oracle Logical Storage Structures
THE END
Questions & Answers