
spcl.inf.ethz.ch @spcl_eth

ADRIAN PERRIG & TORSTEN HOEFLER

Networks and Operating Systems (252-0062-00)


Chapter 9: I/O Subsystems
Never underestimate the KISS principle!

Cache re-load and a magic trick


 Last time
 On-disk data structures
   File representation
   Block allocation
   Directories
 FAT32 case study
   Very simple block interface
   Single table
 FFS case study
   Blocked interface
   Uses inodes; direct and (single, double, triple, …) indirect blocks
 NTFS case study
   Extent interface
   Direct and indirect extent pointers


Our Small Quiz


 True or false (raise hand)
1. Directory structures can never contain cycles
2. Access control lists scale to large numbers of principals
3. Capabilities are stored with the principals and revocation can be complex
4. POSIX (Unix) access control is scalable to large numbers of files
5. Named pipes are just (special) files in Unix
6. Memory mapping improves sequential file access
7. Accessing different files on disk can have different speeds
8. The FAT filesystem enables fast random access
9. FFS enables fast random access for small files
10. The minimum storage for a file in FFS is 8kB (4kB inode + block)
11. Block groups in FFS are used to simplify the implementation
12. Multiple hard links in FFS are stored in the same inode
13. NTFS stores files that are contiguous on disk more efficiently than FFS
14. The volume information in NTFS is a file in NTFS


In-memory data structures



Opening a file
 Directories translated into kernel data structures on demand:

[Diagram: open(“foo”) walks the directory on disk, building a directory structure in the kernel and caching the file’s inode; the path crosses user space, kernel, and disk.]



Reading and writing


 Per-process open file table → index into…
 System open file table → cache of inodes

[Diagram: read(5, …) indexes the per-process open file table, which points into the system open file table (a cache of inodes), which in turn refers to the file’s blocks on disk; the path crosses user space, kernel, and disk.]


Efficiency and Performance


 Efficiency dependent on:
 disk allocation and directory algorithms
 types of data kept in file’s directory entry

 Performance
 disk cache – separate section of main memory for frequently used blocks
 free-behind and read-ahead – techniques to optimize sequential access
 improve PC performance by dedicating a section of memory as a virtual disk (RAM disk)

Page Cache
 A page cache caches pages rather than disk blocks using virtual
memory techniques

 Memory-mapped I/O uses a page cache

 Routine I/O through the file system uses the buffer (disk) cache

 This leads to the following figure



Two layers of caching?

[Diagram: memory-mapped files go through the page cache; file access with read()/write() goes through the buffer cache; both caches sit above the file system.]

Unified Buffer Cache

[Diagram: both memory-mapped files and read()/write() access go through a single unified buffer cache above the file system.]
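Both access paths are visible from user space. A minimal sketch contrasting explicit read() with mmap() (assuming a non-empty file named data.bin):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.bin", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* Path 1: explicit I/O. The kernel copies from its cache into buf. */
    char buf[4096];
    ssize_t n = read(fd, buf, sizeof buf);
    if (n <= 0) { perror("read"); return 1; }

    /* Path 2: memory-mapped I/O. Cached pages appear directly in our
       address space; no copy into a private buffer. */
    off_t len = lseek(fd, 0, SEEK_END);
    char *p = mmap(NULL, (size_t)len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    printf("first byte via read(): %d, via mmap(): %d\n", buf[0], p[0]);
    munmap(p, (size_t)len);
    close(fd);
    return 0;
}

With a unified buffer cache, both paths end up reading the same cached pages.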

Filesystem Recovery
 Consistency checking – compares data in the directory structure with data blocks on disk, and tries to fix inconsistencies

 Use system programs to back up data from disk to another storage device (floppy disk, magnetic tape, other magnetic disk, optical)

 Recover a lost file or disk by restoring data from backup



Disks, Partitions and Logical Volumes



Partitions
[Diagram: a partition table at the start of the disk (LBA 0) divides one disk into contiguous block ranges holding file systems A, B, and C, addressed by logical block address (LBA).]

 Multiplex single disk among >1 file systems


 Contiguous block ranges per FS
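A minimal sketch of the translation a partition layer performs, assuming a toy in-memory partition table (not a real on-disk format):

#include <stdint.h>
#include <stdio.h>

struct partition { uint64_t start_lba, num_blocks; };

/* Toy partition table: three contiguous block ranges on one disk. */
static const struct partition ptable[] = {
    { 2048,    1048576 }, /* file system A */
    { 1050624,  524288 }, /* file system B */
    { 1574912, 2097152 }, /* file system C */
};

/* Translate a file-system-relative block number to an absolute LBA. */
int fs_block_to_lba(int part, uint64_t fs_block, uint64_t *lba)
{
    if (fs_block >= ptable[part].num_blocks) return -1; /* out of range */
    *lba = ptable[part].start_lba + fs_block;
    return 0;
}

int main(void)
{
    uint64_t lba;
    if (fs_block_to_lba(1, 100, &lba) == 0)
        printf("FS B block 100 -> disk LBA %llu\n", (unsigned long long)lba);
    return 0;
}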

Logical volumes

[Diagram: parts 1, 2, and 3 of file system A live on Disks 1, 2, and 3; together they form a single logical volume holding file system A.]

 Emulate 1 virtual disk from >1 physical ones


 Single file system spanning >1 disk

Multiple file systems


 How to name files in multiple file systems?
 Top-level volume names:
   Windows A:, B:, C:, D:, etc. (problematic)
   \\fs-systems.ethz.ch\
 Bind “mount points” in the name space:
   Unix, etc. (flexible); see the mount(2) sketch below
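On Linux, binding a file system at a mount point is the mount(2) system call. A minimal sketch (device path, mount point, and file system type are illustrative, and the call requires root):

#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
    /* Attach the ext4 file system on /dev/sdb1 at /mnt/data. */
    if (mount("/dev/sdb1", "/mnt/data", "ext4", MS_RDONLY, NULL) != 0) {
        perror("mount");
        return 1;
    }
    /* ... paths under /mnt/data now name files on /dev/sdb1 ... */
    if (umount("/mnt/data") != 0) perror("umount");
    return 0;
}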

Mount points

File hierarchy with mounts

[Diagram: a directory tree with top-level directories home, etc, dev, var, usr; below them htor, netos, shm, run, lock, bin. Some nodes are mount points, the rest normal directories.]

Virtual File Systems


 Virtual File Systems (VFS) provide an object-oriented way of
implementing file systems.

 VFS allows the same system call interface (the API) to be used
for different types of file systems.

 The API is to the VFS interface, rather than any specific type of
file system.

Virtual File System

[Diagram: the file system interface sits on top of the VFS interface, which dispatches to concrete implementations: the EXT4 file system, the FAT file system, and the NFS network file system.]

Advanced: check out Linux’ FUSE (Filesystem in Userspace)
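The “object-oriented way” boils down to a table of function pointers that each file system fills in. A minimal sketch with hypothetical names (the real Linux counterparts are structures like file_operations and inode_operations):

#include <stddef.h>
#include <sys/types.h>

/* Hypothetical VFS interface: each file system fills in this vtable. */
struct vnode;
struct vfs_ops {
    struct vnode *(*lookup)(struct vnode *dir, const char *name);
    ssize_t (*read)(struct vnode *v, void *buf, size_t len, off_t off);
    ssize_t (*write)(struct vnode *v, const void *buf, size_t len, off_t off);
};

struct vnode {
    const struct vfs_ops *ops; /* set by ext4, FAT, NFS, ... */
    void *fs_private;          /* per-file-system state */
};

/* The system-call layer only ever calls through the vtable. */
ssize_t vfs_read(struct vnode *v, void *buf, size_t len, off_t off)
{
    return v->ops->read(v, buf, len, off);
}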



Rest of today: I/O


1. Recap: what devices look like

2. Device drivers

3. The I/O subsystem



Recap from CASP:


What does a device look like?

Recap: What is a device?


Specifically, to an OS programmer:
 Piece of hardware visible from software
 Occupies some location on a bus
 Set of registers
 Memory mapped or I/O space
 Source of interrupts
 May initiate Direct Memory Access transfers

Recap: Registers

 Details of registers given in chip “datasheets” or “data books”
 Information is rarely trusted by OS programmers :-(

[Figure: register map from the data sheet for the PC16550 UART (the standard PC serial port).]

Registers

 Slightly more readable version:
   From Barrelfish, in a language called “Mackerel”
   Compiler generates the code to do the “bit-banging”

Using registers

 From the Barrelfish console driver
 Very simple!

 Note the issues:
   Polling loop on send
   Polling loop on receive
     Only a good idea for debug
   CPU must write all the data
     not much in this case

Very simple UART driver


 Actually, far too simple!
 But this is how the first version always looks…
 No initialization code, no error handling
 Uses Programmed I/O (PIO)
   CPU explicitly reads and writes all values to and from registers
   All data must pass through CPU registers
 Uses polling
   CPU polls a device register, waiting, before each send/receive
     Tight loop!
   Can’t do anything else in the meantime
     Although could be extended with threads and care…
   Without CPU polling, no I/O can occur
(A sketch of such a polled driver follows below.)
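A minimal sketch of such a polled PIO driver for a 16550-style UART. The register offsets and status bits follow the PC16550 data sheet; the x86 port I/O helpers would normally be provided by the kernel:

#include <stdint.h>

#define COM1      0x3f8   /* standard PC base port for the first UART */
#define REG_DATA  0       /* RBR (read) / THR (write) */
#define REG_LSR   5       /* line status register */
#define LSR_DR    0x01    /* data ready */
#define LSR_THRE  0x20    /* transmit holding register empty */

/* x86 port I/O. */
static inline uint8_t inb(uint16_t port)
{
    uint8_t v;
    __asm__ volatile("inb %1, %0" : "=a"(v) : "Nd"(port));
    return v;
}
static inline void outb(uint16_t port, uint8_t v)
{
    __asm__ volatile("outb %0, %1" : : "a"(v), "Nd"(port));
}

/* Programmed I/O with polling: spin until the device is ready. */
void uart_putc(char c)
{
    while (!(inb(COM1 + REG_LSR) & LSR_THRE))
        ;                     /* tight loop: the CPU can do nothing else */
    outb(COM1 + REG_DATA, (uint8_t)c);
}

char uart_getc(void)
{
    while (!(inb(COM1 + REG_LSR) & LSR_DR))
        ;                     /* poll until a byte has arrived */
    return (char)inb(COM1 + REG_DATA);
}

Note exactly what the slide warns about: no initialization, no error handling, and the CPU spins while waiting.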

Recap: Interrupts
 CPU Interrupt-request line triggered by I/O device

 Interrupt handler receives interrupts

 Maskable to ignore or delay some interrupts

 Interrupt vector to dispatch interrupt to correct handler
   Based on priority
   Some nonmaskable

 Interrupt mechanism also used for exceptions



Interrupt-driven I/O cycle

[Diagram: CPU and device over time]
1. Process A performs a blocking I/O operation; the driver initiates the operation with the device.
2. The device starts the I/O; meanwhile the scheduler blocks process A and switches to other processes.
3. The I/O completes (or an error occurs); the device raises an interrupt.
4. The interrupt handler processes the data; the CPU resumes the interrupted process.
5. Process A unblocks and the operation returns.

Recap: Direct Memory Access


 Avoid programmed I/O for lots of data
 E.g. fast network or disk interfaces
 Requires DMA controller
 Generally built-in these days
 Bypasses CPU to transfer data directly between I/O device and
memory
 Doesn’t take up CPU time
 Can save memory bandwidth
 Only one interrupt per transfer
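What “initiating” a DMA transfer looks like from the driver side, as a minimal sketch assuming a hypothetical memory-mapped controller with source, destination, length, and control registers:

#include <stdint.h>

/* Hypothetical memory-mapped DMA controller registers. */
struct dma_regs {
    volatile uint64_t src;   /* physical source address */
    volatile uint64_t dst;   /* physical destination address */
    volatile uint32_t len;   /* bytes to transfer */
    volatile uint32_t ctrl;  /* bit 0: start; device clears it when done */
};

#define DMA_START 0x1

/* Start a transfer; completion is signalled later by a single interrupt,
   not by the CPU moving any data itself. */
void dma_start(struct dma_regs *dma, uint64_t src_pa,
               uint64_t dst_pa, uint32_t nbytes)
{
    dma->src  = src_pa;
    dma->dst  = dst_pa;
    dma->len  = nbytes;
    dma->ctrl = DMA_START;   /* CPU is now free until the interrupt */
}

The addresses handed to the device are physical (or device-virtual, once an IOMMU translates them), which is exactly why unchecked DMA can bypass memory protection.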

I/O protection
I/O operations can be dangerous to normal system operation!

 Dedicated I/O instructions usually privileged
 I/O performed via system calls
   Register locations must be protected
 DMA transfers must be carefully checked
   Bypass memory protection!
   How can that happen today?
     Multiple operating systems on the same machine (e.g., virtualized)
 IOMMUs are beginning to appear…

IOMMUs

IOMMU does the same for the I/O devices as the MMU does for the CPU!
➔ Translates device addresses (so-called DVAs) into physical ones
➔ Uses a so-called IOTLB (I/O TLB)
➔ Works for DMA-capable devices :-)
➔ Examples:
  ➔ Intel VT-d
  ➔ AMD IOMMU
  ➔ …very similar in functionality

Source: Wikipedia

IOMMUs
➔ Security features for VMs
➔ Possibility to assign different devices to different address domains
➔ By address remapping we can isolate the domains from one another, thus ‘sandboxing’ untrusted devices

Source: Intel VT-d specification



IOMMUs
➔ IOMMUs were designed for enhancing virtualization
➔ Remapping & security features can be applied to guest virtual machines
➔ Better performance than software-based I/O virtualization

Source: Intel VT-d specification



IOMMUs
➔ IOMMUs take as the ‘input request’ an ID consisting of:
  ➔ Bus ID, stored in root tables (support for multiple buses)
  ➔ Device ID, stored in context tables (support for multiple devices within each bus)
  ➔ Function ID, also stored in context tables (support for multiple functions within each device)
➔ A different page table per I/O device

Source: Intel VT-d specification



IOMMUs - Address remapping


➔ IOMMUs support page remapping
➔ Some PCI devices use only 32-bit addressing; without remapping this requires ‘bounce buffers’
➔ IOMMU page tables are similar to ‘standard’ multi-level page tables
  ➔ Write-only / read-only bits
  ➔ Support for huge pages
  ➔ Currently no support for more extended features (e.g., reference bits)

Source: http://codingrelic.geekhold.com/

IOMMUs

➔ IOMMUs are a much broader topic
➔ They also provide:
  ➔ Interrupt remapping (you can control interrupts in a similar way as memory accesses)
  ➔ Device I/O TLBs (Intel VT-d)
  ➔ Fault logging
  ➔ …

➔ You can think of many interesting use cases for them :-)
➔ Interested? New ideas?

Device drivers

Device drivers
 Software object (module, object, process, hunk of code) which
abstracts a device
 Sits between hardware and rest of OS
 Understands device registers, DMA, interrupts
 Presents uniform interface to rest of OS

 Device abstractions (“driver models”) vary…


 Unix starts with “block” and “character” devices

Device driver structure: the basic problem


 Hardware is interrupt driven.
 System must respond to unpredictable I/O events
(or events it is expecting, but doesn’t know when)

 Applications are (often) blocking


 Process is waiting for a specific I/O event to occur

 Often considerable processing in between


 TCP/IP processing, retries, etc.
 File system processing, blocks, locking, etc.

Example: network receive

[Diagram: a user process calls recv() and blocks in the kernel; later it unblocks and recv() returns. At the bottom, a packet arrives and raises an interrupt, which runs the interrupt handler. In between lies considerable kernel work: demultiplexing, TCP processing, retransmissions, timeouts, port allocation, etc.]

Example: network receive

[Same diagram, annotated: constraints on the interrupt handler]
• Can’t take too long
  • Interrupts disabled?
• Can’t change much
  • Interrupt context
  • Arbitrary system state
  • Can’t hold locks

Example: network receive

[Same diagram, annotated: at interrupt time]
• The process is blocked
• We don’t even know it’s this process until demux

Solution 1: driver threads

[Diagram: as before, but a dedicated in-kernel driver thread now sits between the interrupt handler and the blocked user process.]

Solution 1: driver threads

[Diagram as before.]
1. Interrupt handler
   i. Masks the interrupt
   ii. Does minimal processing
   iii. Unblocks the driver thread

Solution 1: driver threads

[Diagram as before.]
2. Driver thread
   i. Performs all necessary packet processing
   ii. Unblocks user processes
   iii. Unmasks the interrupt

Solution 1: driver threads

[Diagram as before.]
3. User process
   i. Per-process handling
   ii. Copies the packet to user space
   iii. Returns from the kernel

(A sketch of the interrupt-handler / driver-thread pair follows.)
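Putting the three steps together, a minimal sketch using POSIX-style primitives as stand-ins for kernel ones (mask_irq, unmask_irq, and the packet helpers are hypothetical):

#include <pthread.h>
#include <semaphore.h>

static sem_t work;                     /* interrupt handler -> driver thread */

/* Hypothetical hardware and protocol hooks. */
void mask_irq(int irq);
void unmask_irq(int irq);
int  device_fetch_packet(void *buf);   /* returns 0 when no packet pending */
void process_packet(void *buf);        /* demux, TCP processing, timers, ... */
void wake_waiting_process(void *buf);  /* unblocks the recv()ing process */

/* 1. Interrupt handler: mask, do minimal processing, unblock the thread. */
void interrupt_handler(int irq)
{
    mask_irq(irq);
    sem_post(&work);                   /* async-signal-safe wakeup */
}

/* 2. Driver thread: the heavy lifting runs in schedulable thread context. */
void *driver_thread(void *arg)
{
    int irq = *(int *)arg;
    static char buf[2048];
    for (;;) {
        sem_wait(&work);               /* sleep until an interrupt arrives */
        while (device_fetch_packet(buf)) {
            process_packet(buf);       /* may hold locks, may block */
            wake_waiting_process(buf); /* step 3 then runs in the process */
        }
        unmask_irq(irq);
    }
    return NULL;
}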

Terminology – very confused!


 1st-level Interrupt Handler (FLIH)
 Linux calls this the “top half”.
 In contrast to every other OS on the planet.

 Thread is an “interrupt handler thread” in Solaris


 Other names in other systems…

Solution 2: deferred procedure calls (DPCs)

[Diagram: the FLIH enqueues a DPC (a closure); all pending DPCs run later, before the kernel returns to user space; the user process then unblocks and recv() returns.]

Deferred Procedure Calls


 Instead of using a thread, execute on the next process to be dispatched
   Before it leaves the kernel
 Solution in most versions of Unix
   Don’t need kernel threads
   Saves a context switch
   Can’t account processing time to the right process
 3rd solution: demux early, run in user space
   Covered in the Advanced OS course!

(A minimal DPC sketch follows.)
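A minimal sketch of the DPC mechanism itself; the queue and its locking are simplified (a real kernel would protect it by disabling interrupts):

/* A deferred procedure call: a closure queued by the FLIH. */
struct dpc {
    void (*fn)(void *arg);
    void *arg;
    struct dpc *next;
};

static struct dpc *dpc_queue;    /* simplified: assume interrupts disabled */

/* Called by the first-level interrupt handler: just enqueue and return. */
void dpc_enqueue(struct dpc *d)
{
    d->next = dpc_queue;
    dpc_queue = d;
}

/* Called on the next process to be dispatched, before it leaves the
   kernel: run everything that is pending. */
void dpc_run_pending(void)
{
    while (dpc_queue) {
        struct dpc *d = dpc_queue;
        dpc_queue = d->next;
        d->fn(d->arg);           /* e.g., TCP processing for a packet */
    }
}

Whichever process happens to be dispatched pays for the work in dpc_run_pending(), which is exactly the accounting problem noted above.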

More confusing terminology


 DPCs: also known as:
 2nd-level interrupt handlers
 Soft interrupt handlers
 Slow interrupt handlers
 In Linux ONLY: bottom-half handlers

 Any non-Linux OS (the way to think about it):


 Bottom-half = FLIH + SLIH, called from “below”
 Top-half = Called from user space (syscalls etc.), “above”

Life cycle of an I/O request

[Diagram: layers from user process down to physical device, over time]
• User process: requests I/O via a system call; when the I/O is complete, the system call returns.
• I/O subsystem: if it can already satisfy the request (e.g., from cache), it transfers data to/from user space and returns a completion code; otherwise it sends the request to the driver and blocks the process if needed. On completion it demultiplexes, determining the source of the request.
• Device driver: issues commands to the device; blocks until interrupted.
• Interrupt handler: handles the interrupt and signals the device driver.
• Physical device: performs the I/O and generates an interrupt when it completes.

The I/O subsystem



Generic I/O functionality


 Device drivers essentially move data to and from I/O devices
 Abstract hardware
 Manage asynchrony

 OS I/O subsystem includes generic functions for dealing with this data
   Such as…

The I/O subsystem


 Caching - fast memory holding copy of data
 Always just a copy
 Key to performance

 Spooling - hold output for a device


 If device can serve only one request at a time
 E.g., printing

The I/O subsystem


 Scheduling
 Some I/O request ordering via per-device queue
 Some OSs try fairness

 Buffering - store data in memory while transferring between devices or memory
   To cope with device speed mismatch
   To cope with device transfer size mismatch
   To maintain “copy semantics” (see the sketch below)
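“Copy semantics” means write() must capture the buffer’s contents at call time. A minimal sketch of how the I/O subsystem can guarantee this (device_queue_write is a hypothetical asynchronous transfer hook):

#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical: queue a buffer for an asynchronous device transfer;
   the buffer is freed once the device is done with it. */
void device_queue_write(void *kernel_buf, size_t len);

/* Copy the user's data into a kernel buffer *before* returning, so the
   application may immediately reuse its buffer without corrupting the
   still-in-flight I/O. */
ssize_t io_write(const void *user_buf, size_t len)
{
    void *kbuf = malloc(len);        /* kernel-side staging buffer */
    if (!kbuf) return -1;
    memcpy(kbuf, user_buf, len);     /* snapshot at call time */
    device_queue_write(kbuf, len);
    return (ssize_t)len;
}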

Naming and discovery


 What are the devices the OS needs to manage?
 Discovery (bus enumeration)
 Hotplug / unplug events
 Resource allocation (e.g. PCI BAR programming)

 How to match driver code to devices?


 Driver instance ≠ driver module
 One driver typically manages many models of device

 How to name devices inside the kernel?

 How to name devices outside the kernel?



Matching drivers to devices

 Devices have unique (model) identifiers
   E.g. PCI vendor/device identifiers
 Drivers recognize particular identifiers
   Typically a list…
 Kernel offers a device to each driver in turn
   Driver can “claim” a device it can handle
   Creates a driver instance for it (a minimal sketch follows)
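A minimal sketch of this matching loop in generic C (not the actual Linux PCI driver API; the structures are hypothetical):

#include <stddef.h>
#include <stdint.h>

struct device { uint16_t vendor, device; void *instance; };

struct match_id { uint16_t vendor, device; };

/* Each driver module exports a list of identifiers it recognizes. */
struct driver {
    const char *name;
    const struct match_id *ids;
    size_t nids;
    void *(*probe)(struct device *dev);   /* creates a driver instance */
};

/* The kernel offers the device to each driver in turn; the first driver
   whose list matches "claims" it by returning a driver instance. */
int match_device(struct device *dev, struct driver *drivers[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < drivers[i]->nids; j++)
            if (drivers[i]->ids[j].vendor == dev->vendor &&
                drivers[i]->ids[j].device == dev->device) {
                dev->instance = drivers[i]->probe(dev);
                return dev->instance != NULL;
            }
    return 0;                             /* no driver claimed the device */
}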

Naming devices in the Unix kernel


(Actually, naming device driver instances)

 Kernel creates identifiers for
   Block devices
   Character devices
   [ Network devices – see later… ]

 Major device number:
   Class of device (e.g. disk, CD-ROM, keyboard)
 Minor device number:
   Specific device within a class
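From user space, the two numbers of an existing device can be read back with stat(2); a minimal sketch (glibc provides major() and minor() in sys/sysmacros.h):

#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor() */

int main(void)
{
    struct stat st;
    if (stat("/dev/sda", &st) != 0) { perror("stat"); return 1; }
    printf("%s device: major %u (class), minor %u (instance)\n",
           S_ISBLK(st.st_mode) ? "block" : "character",
           major(st.st_rdev), minor(st.st_rdev));
    return 0;
}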

Unix block devices


 Used for “structured I/O”
 Deal in large “blocks” of data at a time

 Often look like files (seekable, mappable)


 Often use Unix’ shared buffer cache

 Mountable:
 File systems implemented above block devices

Character devices
 Used for “unstructured I/O”
 Byte-stream interface – no block boundaries
 Single character or short strings get/put
 Buffering implemented by libraries

 Examples:
 Keyboards, serial lines, mice

 Distinction with block devices somewhat arbitrary…



Naming devices outside the kernel


 Device files: special type of file
 Inode encodes <type, major num, minor num>
 Created with mknod() system call

 Devices are traditionally put in /dev


 /dev/sda – First SCSI/SATA/SAS disk
 /dev/sda5 – Fifth partition on the above
 /dev/cdrom0 – First DVD-ROM drive
 /dev/ttyS1 – Second UART
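Creating such a device file is a single system call. A minimal sketch (the path is illustrative; major 4, minor 65 is ttyS1 on Linux, and the call requires root):

#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

int main(void)
{
    /* Character device: major 4 (tty class), minor 65 (ttyS1 on Linux). */
    if (mknod("/dev/my-ttyS1", S_IFCHR | 0600, makedev(4, 65)) != 0) {
        perror("mknod");
        return 1;
    }
    return 0;
}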

Pseudo-devices in Unix
 Devices with no hardware!
 Still have major/minor device numbers. Examples:

/dev/stdin
/dev/kmem
/dev/random
/dev/null
/dev/loop0

etc.

Old-style Unix device configuration


 All drivers compiled into the kernel
 Each driver probes for any supported devices
 System administrator populates /dev
 Manually types mknod when a new device is purchased!
 Pseudo devices similarly hard-wired in kernel

Linux device configuration today


 Physical hardware configuration readable from /sys
 Special fake file system: sysfs
 Plug events delivered by a special socket
 Drivers dynamically loaded as kernel modules
 Initial list given at boot time
 User-space daemon can load more if required
 /dev populated dynamically by udev
 User-space daemon which polls /sys

Next time:
 Network stack implementation
 Network devices and network I/O
 Buffering
 Memory management in the I/O subsystem
