Os - 10 - IO Subsystems

The document discusses I/O subsystems in operating systems, focusing on device management, including device drivers, caching, spooling, scheduling, and buffering. It covers the naming and discovery of devices, the distinction between block and character devices, and the dynamic configuration of devices in modern Linux systems. Additionally, it outlines the network I/O interface and the internal workings of packet transmission and reception in BSD and Linux environments.


ADRIAN PERRIG & TORSTEN HOEFLER

Networks and Operating Systems (252-0062-00)


Chapter 10: I/O Subsystems (2)

BE CAREFUL WITH I/O DEVICES!



Our Small Quiz


 True or false (raise hand)
 Open files are part of the process’ address-space
 Unified buffer caches improve access times
 A partition table can unify the view of multiple disks
 Unix allows binding arbitrary file systems to arbitrary locations
 The virtual file system interface improves modularity of OS code
 Programmed I/O is efficient for the CPUs
 DMA enables devices to access virtual memory of processes
 IOMMUs enable memory protection for devices
 IOMMUs improve memory access performance
 First level interrupt handlers process the whole request from the hardware
 Software interrupts reduce the request processing latency
 Deferred procedure calls execute second-level interrupt handlers


The I/O subsystem



Generic I/O functionality


 Device drivers essentially move data to and from I/O devices
 Abstract hardware
 Manage asynchrony

 The OS I/O subsystem includes generic functions for dealing with this data
 Such as…

The I/O Subsystem


 Caching - fast memory holding copy of data
 Always just a copy
 Key to performance

 Spooling - hold output for a device


 If device can serve only one request at a time
 E.g., printing

The I/O Subsystem


 Scheduling
 Some I/O request ordering via per-device queue
 Some OSs try fairness

 Buffering - store data in memory while transferring between devices or memory
 To cope with device speed mismatch
 To cope with device transfer size mismatch
 To maintain “copy semantics”

Naming and Discovery


 What are the devices the OS needs to manage?
 Discovery (bus enumeration)
 Hotplug / unplug events
 Resource allocation (e.g., PCI BAR programming)

 How to match driver code to devices?


 Driver instance ≠ driver module
 One driver typically manages many models of device

 How to name devices inside the kernel?

 How to name devices outside the kernel?



Matching drivers to devices

 Devices have unique (model) identifiers


 E.g., PCI vendor/device identifiers

 Drivers recognize particular identifiers


 Typically a list…

 Kernel offers a device to each driver in turn


 Driver can “claim” a device it can handle
 Creates driver instance for it.
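
As an illustration of "claiming", here is a minimal sketch of how a Linux PCI driver declares the IDs it recognizes and gets offered matching devices (the vendor/device numbers and all my_* names are made up for the example):

/* Sketch: a Linux PCI driver declaring and claiming supported devices. */
#include <linux/module.h>
#include <linux/pci.h>

static const struct pci_device_id my_ids[] = {
    { PCI_DEVICE(0x1234, 0x5678) },  /* vendor/device IDs (made up) */
    { 0, }
};
MODULE_DEVICE_TABLE(pci, my_ids);    /* lets the kernel match devices to us */

static int my_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
    /* Called when the kernel offers us a matching device:
     * returning 0 "claims" it and creates a driver instance. */
    return pci_enable_device(pdev);
}

static void my_remove(struct pci_dev *pdev)
{
    pci_disable_device(pdev);
}

static struct pci_driver my_driver = {
    .name     = "mydrv",
    .id_table = my_ids,
    .probe    = my_probe,
    .remove   = my_remove,
};
module_pci_driver(my_driver);
MODULE_LICENSE("GPL");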

Naming devices in the Unix kernel


(Actually, naming device driver instances)

 Kernel creates identifiers for


 Block devices
 Character devices
 [ Network devices – see later… ]

 Major device number:


 Class of device (e.g., disk, CD-ROM, keyboard)

 Minor device number:


 Specific device within a class

Unix Block Devices


 Used for “structured I/O”
 Deal in large “blocks” of data at a time

 Often look like files (seekable, mappable)


 Often use Unix’ shared buffer cache

 Mountable:
 File systems implemented above block devices

Character Devices
 Used for “unstructured I/O”
 Byte-stream interface – no block boundaries
 Single character or short strings get/put
 Buffering implemented by libraries

 Examples:
 Keyboards, serial lines, mice

 Distinction with block devices somewhat arbitrary…



Mid-lecture mini-quiz
 Character or block device (raise hand)
 Video card
 USB stick
 Microphone
 Screen (graphics adapter)
 Network drive


Naming devices outside the kernel


 Device files: special type of file
 Inode encodes <type, major num, minor num>
 Created with mknod() system call

 Devices are traditionally put in /dev


 /dev/sda – First SCSI/SATA/SAS disk
 /dev/sda5 – Fifth partition on the above
 /dev/cdrom0 – First DVD-ROM drive
 /dev/ttyS1 – Second UART
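
For illustration, a node like the ones above can be created programmatically via the mknod() system call (the path and device numbers here are hypothetical):

/* Sketch: create a character-device node, major 42, minor 0 (made up). */
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* makedev() */

int main(void)
{
    /* The inode of the special file encodes <type, major, minor>. */
    return mknod("/dev/mydev", S_IFCHR | 0600, makedev(42, 0)) ? 1 : 0;
}

(Running this normally requires root privileges; on a modern system udev creates such nodes automatically, as described later.)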

Pseudo-devices in Unix
 Devices with no hardware!
 Still have major/minor device numbers. Examples:

/dev/stdin (historically a real device node, now a symlink)


/dev/kmem
/dev/random
/dev/null
/dev/loop0

etc.

Old-style Unix device configuration


 All drivers compiled into the kernel
 Each driver probes for any supported devices
 System administrator populates /dev
 Manually types mknod when a new device is purchased!
 Pseudo devices similarly hard-wired in kernel

Linux device configuration today


 Physical hardware configuration readable from /sys
 Special fake file system: sysfs
 Plug events delivered by a special socket
 Drivers dynamically loaded as kernel modules
 Initial list given at boot time
 User-space daemon can load more if required
 /dev populated dynamically by udev
 User-space daemon which polls /sys

Interface to network I/O



Unix interface to network I/O


 You will soon know the data path
 BSD sockets
 bind(), listen(), accept(), connect(), send(), recv(), etc.
 Have not yet seen:
 Device driver interface
 Configuration
 Routing

Software routing

[Diagram: a routing daemon in user space exchanges routing-protocol control messages with the network and installs entries into the FIB inside the kernel's protocol stack.]

 OS protocol stacks include routing functionality
 Routing protocols typically run in a user-space daemon
 Non-critical
 Easier to change
 Forwarding information (the FIB, forwarding information base) typically lives in the kernel
 Needs to be fast
 Integrated into the protocol stack

Network stack implementation



Networking stack
 Probably most important peripheral
 GPU is increasingly not a peripheral
 Disk interfaces look increasingly like a network
 But…
 NO standard OS textbook talks about the network stack!
 Good references:
 The 4.4BSD book (for Unix at least)
 George Varghese: “Network Algorithmics” (up to a point)

Receiving a packet in BSD

[Diagram: applications read from stream (TCP) and datagram (UDP) sockets; TCP, UDP and ICMP sit above IP, which drains a receive queue filled by the network interface inside the kernel. The next three slides annotate the inbound path.]

Receiving a packet in BSD

1. Interrupt (from the network interface)
1.1 Allocate buffer
1.2 Enqueue packet on the receive queue
1.3 Post software interrupt

Receiving a packet in BSD

2. Software interrupt (high priority, runs in any process context)
 Defragmentation
 TCP processing
 Enqueue on socket

Receiving a packet in BSD

3. Application
 Copy buffer to user space
 Runs in the application's process context

Sending a packet in BSD

[Diagram: the same stack, now with a send queue between IP and the network interface. The next three slides annotate the outbound path.]

Sending a packet in BSD

1. Application
 Copy from user space to buffer
 Call TCP code and process
 Possibly enqueue on socket queue

Sending a packet in BSD

2. Software interrupt (any process context)
 Remaining TCP processing
 IP processing
 Enqueue on interface send queue

Sending a packet in BSD

3. Interrupt
 Send packet
 Free buffer

The TCP state machine

[Diagram: the TCP state machine; transitions are labeled event / action.
 Closed → SYN_sent: active open / SYN (also Listen → SYN_sent: send / SYN)
 Closed → Listen: passive open
 Listen → SYN_rcvd: SYN / SYNACK
 SYN_sent → SYN_rcvd: SYN / SYNACK
 SYN_sent → Established: SYNACK / ACK
 SYN_rcvd → Established: ACK
 Established → FIN_wait_1: close / FIN
 Established → Close_wait: FIN / ACK
 FIN_wait_1 → FIN_wait_2: ACK
 FIN_wait_1 → Closing: FIN / ACK
 FIN_wait_2 → Time_wait: FIN / ACK
 Closing → Time_wait: ACK
 Close_wait → Last_Ack: close / FIN
 Last_Ack → Closed: ACK
 Time_wait → Closed: timeout after 2 segment lifetimes]

OS TCP state machine


 More complex! Also needs to handle:
 Congestion control state (window, slow start, etc.)
 Flow control window
 Retransmission timeouts
 Etc.
 State transitions triggered when:
 User request: send, recv, connect, close
 Packet arrives
 Timer expires
 Actions include:
 Set or cancel a timer
 Enqueue a packet on the transmit queue
 Enqueue a packet on the socket receive queue
 Create or destroy a TCP control block
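
A minimal sketch of this event-driven structure (states and events abbreviated; a real control block also tracks windows, timers, and queues - this is not the BSD implementation):

/* Sketch: an event-driven TCP state machine over a control block. */
enum tcp_state { CLOSED, LISTEN, SYN_SENT, SYN_RCVD, ESTABLISHED /* ... */ };
enum tcp_event { E_CONNECT, E_RECV_SYN, E_RECV_SYNACK, E_RECV_ACK /* ... */ };

struct tcb {                  /* TCP control block */
    enum tcp_state state;
    /* windows, timers, queues, ... */
};

static void tcp_input(struct tcb *tcb, enum tcp_event ev)
{
    switch (tcb->state) {
    case CLOSED:
        if (ev == E_CONNECT)     { /* send SYN, set timer */ tcb->state = SYN_SENT; }
        break;
    case LISTEN:
        if (ev == E_RECV_SYN)    { /* send SYNACK */ tcb->state = SYN_RCVD; }
        break;
    case SYN_SENT:
        if (ev == E_RECV_SYNACK) { /* send ACK, cancel timer */ tcb->state = ESTABLISHED; }
        break;
    case SYN_RCVD:
        if (ev == E_RECV_ACK)    { tcb->state = ESTABLISHED; }
        break;
    default:
        break;
    }
}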

In-kernel protocol graph

[Diagram: TCP, UDP and ICMP sit above IP; below IP are e.g. ARP, tunneling, and Ethernet, terminating at the Ethernet device. Interfaces between nodes can be standard (e.g. x-kernel, Windows) or protocol-specific (e.g. Unix).]

Protocol graphs
Graph nodes can be:
 Per-protocol (handle all flows)
 Packets are “tagged” with demux tags
 Per-connection (instantiated dynamically)
 Multiple interfaces as well as connections
 Ethernet → Ethernet: bridging
 IP → IP: IP routing

Memory management

Memory management
 Problem: how to ship packet data around
 Need a data structure that can:
 Easily add, remove headers
 Avoid copying lots of payload
 Uniformly refer to half-defined packets
 Fragment large datasets into smaller units
 Solution:
 Data is held in a linked list of “buffer structures”

BSD Unix mbufs (Linux equivalent: sk_buffs)

[Diagram: an mbuf holds a small header - next (next mbuf for this object), offset and length (describing which bytes of the buffer are valid, e.g. 36 bytes starting at offset 24), type (e.g. DATA) - followed by a 112-byte data area and a next-object pointer (next object in a list).]
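
As a sketch, the classic 4.2BSD-era mbuf looked roughly like this (field names follow old BSD kernel headers; the exact layout varied between releases):

/* Rough sketch of a classic BSD mbuf - historic code, not compilable today. */
#include <sys/types.h>

struct mbuf {
    struct mbuf *m_next;     /* next mbuf holding data for this object */
    u_long       m_off;      /* offset of valid data within m_dat */
    short        m_len;      /* number of valid bytes */
    short        m_type;     /* DATA, HEADER, ... */
    u_char       m_dat[112]; /* the data itself */
    struct mbuf *m_act;      /* next object (e.g. next packet) in a list */
};

Prepending a header is then just a matter of chaining another mbuf in front of the list, and stripping one means advancing m_off - no payload is copied.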

Case Study: Linux 3.x

• Implementing a simple protocol over Ethernet


• Why?
 You want to play with networking equipment (well, RAW sockets are easier)
 You want to develop specialized protocols
E.g., application-specific “TCP”
E.g., for low-latency cluster computing
 You’ll understand how it works!

Sending Data in Linux 3.x

• Many layers
• Most use the sk_buff struct

[Diagram: the send path - socket (char* from the application) → tcp_send_msg → tcp_transmit_skb (the sk_buff now carries the TCP header) → ip_queue_xmit → ip_route_output_flow / ip_fragment / ip_forward → dev_queue_xmit → driver.]

Register a receive hook

• Fill a packet_type struct:
 .type = your ethertype
 .func = your receive function
• Receive handler recv_hook(...)
 Gets sk_buff, packet_type, net_device, ...
 Called for each incoming frame
 Reads the skb->data field and processes protocols

[Table: receive hook table mapping ethertypes to handlers, e.g. 0x0800 → IPv4 handler, 0x8864 → PPPoE handler, 0x8915 → RoCE handler, …]
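
A minimal sketch of such a registration (0x88B5 is an ethertype reserved for local experiments; error handling omitted):

/* Sketch: register a receive hook for one ethertype. */
#include <linux/netdevice.h>
#include <linux/skbuff.h>

static int my_recv_hook(struct sk_buff *skb, struct net_device *dev,
                        struct packet_type *pt, struct net_device *orig_dev)
{
    /* Called once per incoming frame of our type; payload is at skb->data. */
    kfree_skb(skb);   /* consume the frame */
    return 0;
}

static struct packet_type my_ptype = {
    .type = htons(0x88B5),   /* "local experimental" ethertype */
    .func = my_recv_hook,
};

/* In module init:  dev_add_pack(&my_ptype);    */
/* In module exit:  dev_remove_pack(&my_ptype); */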

Interaction with applications

 Socket Interface
 Need to implement handlers for connect(), bind(), listen(), etc.

 Call sock_register(struct net_proto_family*)


 Register a protocol family
 Enables user to create socket of this type
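
A hedged sketch of that registration (the family number and names are made up; a real family must also supply struct proto_ops implementing the socket calls above):

/* Sketch: register a new protocol family (illustrative numbers/names). */
#include <linux/module.h>
#include <linux/net.h>
#include <net/sock.h>

static int my_create(struct net *net, struct socket *sock,
                     int protocol, int kern)
{
    /* Allocate a struct sock, attach our proto_ops (connect/bind/...). */
    return -EAFNOSUPPORT;   /* placeholder */
}

static struct net_proto_family my_family = {
    .family = 27,           /* illustrative protocol family number */
    .create = my_create,
    .owner  = THIS_MODULE,
};

/* In module init:  sock_register(&my_family);                 */
/* Afterwards user space can call socket(27, SOCK_DGRAM, 0).   */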

Anatomy of struct sk_buff

 Called “skb” in Linux jargon


 Allocate via alloc_skb() (or dev_alloc_skb() if in driver)
 Free with kfree_skb() (dev_kfree_skb())
 Use pskb_may_pull(skb, len) to check if data is available
 skb_pull(skb, len) to advance the data pointer
... it even has a webpage! http://www.skbuff.net/
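
For instance, a receive handler that strips a hypothetical 4-byte protocol header might look roughly like this:

/* Sketch: consume a 4-byte header from an incoming skb. */
#include <linux/skbuff.h>

#define MYPROTO_HLEN 4   /* hypothetical header length */

static int my_handle(struct sk_buff *skb)
{
    /* Ensure at least MYPROTO_HLEN contiguous bytes are present. */
    if (!pskb_may_pull(skb, MYPROTO_HLEN)) {
        kfree_skb(skb);
        return -1;
    }
    /* Our header starts at skb->data; parse it here. */
    skb_pull(skb, MYPROTO_HLEN);   /* advance data pointer past our header */
    /* Hand the remaining payload to the next layer... */
    return 0;
}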

SKB fields
 Doubly-linked list; each skb has .next/.prev
 .data contains the payload (the size of the data area is set by alloc_skb())
 .sk is the socket this skb is owned by
 .mac_header, .network_header, .transport_header contain the headers of the various layers
 .dev is the device this skb uses
 ... 58 member fields total

Case study: IP fragmenting

 Linux <2.0.32, fragment reassembly, annotated with two well-formed fragments (#1: offset 0, length 100; #2: offset 100, length 100):

// Determine the position of this fragment.
end = offset + iph->tot_len - ihl;                 // #1: 100, #2: 200

// Check for overlap with preceding fragment, and, if needed,
// align things so that any overlaps are eliminated.
if (prev != NULL && offset < prev->end) {
    i = prev->end - offset;
    offset += i;    /* ptr into datagram */
    ptr += i;       /* ptr into fragment data */
}

// Initialize segment structure.
fp->offset = offset;                               // #1: 0,   #2: 100
fp->end = end;                                     // #1: 100, #2: 200
fp->len = end - offset;                            // #1: 100, #2: 100

// ... collect multiple such fragments in a queue ...

// Process each fragment.
if (count + fp->len > skb->len) {
    error_to_big;
}
memcpy((ptr + fp->offset), fp->ptr, fp->len);
count += fp->len;
fp = fp->next;

Case study: IP fragmenting

 The same code fed overlapping fragments (#1: offset 0, length 100; #2: offset 10, length 20) - the "teardrop" attack:

// Determine the position of this fragment.
end = offset + iph->tot_len - ihl;                 // #1: 100, #2: 30

// Check for overlap with preceding fragment, and, if needed,
// align things so that any overlaps are eliminated.
if (prev != NULL && offset < prev->end) {
    i = prev->end - offset;                        // #2: 100 - 10 = 90
    offset += i;    /* ptr into datagram */        // #2: 100
    ptr += i;       /* ptr into fragment data */
}

// Initialize segment structure.
fp->offset = offset;                               // #1: 0,   #2: 100
fp->end = end;                                     // #1: 100, #2: 30
fp->len = end - offset;                            // #1: 100, #2: -70 (!)

// ... collect multiple such fragments in a queue ...

// Process each fragment.
if (count + fp->len > skb->len) {                  // (size_t)-70 = 4294967226
    error_to_big;
}
memcpy((ptr + fp->offset), fp->ptr, fp->len);      // copies a "huge" length
count += fp->len;
fp = fp->next;
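
A sketch of the kind of sanity check that fixes this (not the literal 2.0.32 patch): treat a fragment whose aligned end does not lie past its aligned offset as malformed, and drop it before the length is ever used:

if (end <= offset) {
    /* The overlap alignment swallowed the whole fragment:
     * malformed (malicious) packet - drop it rather than queue it. */
    kfree_skb(skb);
    return;
}
fp->len = end - offset;   /* now guaranteed positive */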

2.0.32 … that’s so last century!



Performance issues

Life cycle of an I/O request

[Diagram, request path down the left, completion path back up the right:
 User process: requests I/O via a system call; eventually returns from the system call when the I/O is complete.
 I/O subsystem: if it can already satisfy the request (e.g. from the cache), it transfers data to/from user space and returns a completion code; otherwise it sends the request to the device driver and blocks the process if needed. On completion, it performs the transfer and returns.
 Device driver: issues commands to the device and blocks until interrupted; on the way back it demultiplexes the I/O-complete event and determines the source of the request.
 Interrupt handler: handles the interrupt and signals the device driver.
 Physical device: performs the I/O and generates an interrupt when it completes.]

Consider 10 Gb/s Ethernet



At full line rate for 1 x 10 Gb port


 ~1GB (gigabyte) per second
 ~ 700k full-size Ethernet frames per second
 At 2GHz, must process a packet in ≤ 3000 cycles

 This includes:
 IP and TCP checksums
 TCP window calculations and flow control
 Copying packet to user space
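
Rough arithmetic behind these numbers (assuming full-size frames of roughly 1.5 kB):

10 Gb/s ÷ 8 bits/byte ≈ 1.25 GB/s  (≈ 1 GB per second)
1.25 GB/s ÷ ~1.5 kB/frame ≈ 800k frames/s  (on the order of the ~700k above)
2×10^9 cycles/s ÷ ~7×10^5 frames/s ≈ 3000 cycles per packet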

A few numbers…
 L3 cache miss (64-byte lines) ≈ 300 cycles
 At most 10 cache misses per packet
Note: DMA ensures cache is cold for the packet!

 Interrupt latency ≈ 500 cycles


 Kernel entry/exit
 Hardware access
 Context switch / DPC
 Etc.

Plus…
 You also have to send packets.
 Card is full duplex → it can simultaneously send at 10 Gb/s
 You have to do something useful with the packets!
 Can an application make use of 1.5 kB of data every 1000 machine cycles or so?
 This card has two 10 Gb/s ports.

And Plus …

 And if you thought that was fast …
 Mellanox EDR 100 Gb/s Adapter
 Impossible to use without advanced features:
 RDMA
 SR-IOV
 TOE
 Interrupt coalescing

What to do?
 TCP offload (TOE)
 Put TCP processing into hardware on the card
 Buffering
 Transfer lots of packets in a single transaction
 Interrupt coalescing / throttling
 Don’t interrupt on every packet
 Don’t interrupt at all if load is very high
 Receive-side scaling
 Parallelize: direct interrupts and data to different cores

Linux New API (NAPI)


 Mitigate interrupt pressure
1. Each packet interrupts the CPU
2. Versus: the CPU polls the driver
 NAPI switches between the two!
 NAPI-compliant drivers
 Offer a poll() function
 Which calls back into the receive path …

Linux NAPI Balancing

 Driver enables polling with netif_rx_schedule(struct net_device *dev)
 Disables interrupts
 Driver deactivates polling with netif_rx_complete(struct net_device *dev)
 Re-enables interrupts
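
A rough sketch of the resulting driver pattern, using the older API named above (modern kernels use napi_schedule()/napi_complete() on a struct napi_struct; the my_* helpers are hypothetical device-specific functions):

/* Sketch: NAPI-style switch between interrupts and polling. */
#include <linux/interrupt.h>
#include <linux/netdevice.h>

static void my_disable_rx_interrupts(struct net_device *dev); /* hypothetical */
static void my_enable_rx_interrupts(struct net_device *dev);  /* hypothetical */
static int  my_process_rx_ring(struct net_device *dev, int n);/* hypothetical */
static int  my_rx_ring_empty(struct net_device *dev);         /* hypothetical */

/* Interrupt handler: stop per-packet interrupts, switch to polling. */
static irqreturn_t my_isr(int irq, void *dev_id)
{
    struct net_device *dev = dev_id;

    my_disable_rx_interrupts(dev);
    netif_rx_schedule(dev);          /* ask the kernel to poll us */
    return IRQ_HANDLED;
}

/* Poll function: the kernel calls this instead of taking interrupts. */
static int my_poll(struct net_device *dev, int *budget)
{
    int done = my_process_rx_ring(dev, *budget);

    *budget -= done;
    if (my_rx_ring_empty(dev)) {
        netif_rx_complete(dev);      /* leave polling mode */
        my_enable_rx_interrupts(dev);
        return 0;                    /* no more work */
    }
    return 1;                        /* more work: keep polling */
}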

 … but where does the data go?



Buffering
Key ideas:

 Decouple sending and receiving


 Neither side should wait for the other
 Only use interrupts to unblock host

 Batch requests together


 Spread cost of transfer over several packets

Producer-consumer buffer descriptor rings

[Diagram: a circular ring of descriptors. A consumer pointer and a producer pointer divide it into full descriptors (produced, not yet consumed) and free descriptors. Descriptor format: physical address, size in bytes, misc. flags.]
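
A minimal sketch of such a descriptor and ring in C (the layout and names are illustrative, not any specific NIC's format):

#include <stdint.h>

/* One DMA descriptor: what the device reads/writes. */
struct desc {
    uint64_t phys_addr;   /* physical address of the buffer */
    uint32_t size;        /* size in bytes */
    uint32_t flags;       /* misc. flags (ownership, end-of-packet, ...) */
};

#define RING_SIZE 256     /* a power of two simplifies wrapping */

struct ring {
    struct desc d[RING_SIZE];
    uint32_t prod;        /* producer pointer (index) */
    uint32_t cons;        /* consumer pointer (index) */
};

/* Host side of the transmit path: enqueue one full descriptor. */
static int ring_produce(struct ring *r, uint64_t pa, uint32_t len)
{
    uint32_t next = (r->prod + 1) % RING_SIZE;

    if (next == r->cons)
        return -1;        /* ring full: host must block or back off */
    r->d[r->prod] = (struct desc){ .phys_addr = pa, .size = len, .flags = 1 };
    r->prod = next;       /* on real hardware: write the NIC's producer register */
    return 0;
}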

Buffering for network cards

Producer and consumer pointers are NIC registers

 Transmit path:
 Host updates producer pointer, adds packets to ring
 Device updates consumer pointer
 Receive path:
 Host updates consumer pointer, adds empty buffers to ring
 Device updates producer pointer, fills buffers with received packets

More complex protocols are possible…

Example transmit state machine

[Diagram: three states - Running, Idle, and Running/host blocked.
 Running → Running: sends packet; more packets in descriptor ring
 Running → Idle: sends last packet; none left in descriptor ring
 Idle → Running: host updates producer pointer
 Running → Running/host blocked: host updates producer pointer; ring now full
 Running/host blocked → Running/host blocked: sends packet; ring still nearly full
 Running/host blocked → Running: sends packet; ring occupancy below threshold]

Transmit interrupts

[Diagram: the same transmit state machine as on the previous slide.]

 Interrupt when the ring becomes empty
 All packets sent
 Device going idle
 Interrupt when ring occupancy drops below the threshold
 Host can now send again
 Device continues running

Exercise: devise a similar state machine for receive!

Buffering summary
 DMA used twice
 Data transfer
 Reading and writing descriptors
 Similar schemes used for any fast DMA device
 SATA/SAS interfaces (such as AHCI)
 USB2/USB3 controllers
 etc.
 Descriptors transfer ownership of memory regions
 Flexible – many variations possible:
 Host can send lots of regions in advance
 Device might allocate out of regions, send back subsets
 Buffers might be used out-of-order
 Particularly powerful with multiple send and receive queues…

Receive-side scaling
 Insight:
 Too much traffic for one core to handle
 Cores aren’t getting any faster
 Must parallelize across cores
 Key idea: handle different flows on different cores
 But: how to determine flow for each packet?
 Can’t do this on a core: same problem!
 Solution: demultiplex on the NIC
 DMA packets to per-flow buffers / queues
 Send interrupt only to core handling flow
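
Conceptually the NIC does something like the following for each packet (a sketch; real NICs typically use a Toeplitz hash over the 4-tuple and an indirection table):

#include <stdint.h>

#define NQUEUES 8   /* one RX ring + MSI(-X) vector per core (illustrative) */

/* Sketch of NIC-side demultiplexing: same flow -> same queue/core. */
static unsigned pick_queue(uint32_t saddr, uint32_t daddr,
                           uint16_t sport, uint16_t dport)
{
    uint32_t h = saddr ^ daddr ^ (((uint32_t)sport << 16) | dport);

    h ^= h >> 16;        /* cheap mixing; real hardware differs */
    return h % NQUEUES;  /* selects the DMA ring and interrupt target */
}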

Receive-side scaling

[Diagram: a hash of the received packet's header (IP source + destination, TCP source + destination, etc.) indexes a flow table; each entry points to per-flow state: a ring buffer (the DMA address) and a message-signaled interrupt targeting the core that handles the flow.]

Receive-side scaling
 Can balance flows across cores
 Note: doesn’t help with one big flow!
 Assumes:
 n cores processing m flows is faster than one core
 Hence:
 Network stack and protocol graph must scale on a multiprocessor.
 Multiprocessor scaling: topic for later

Next (final) week


 Virtual machines
 Multiprocessor operating systems
