COA
1. I/O Organization
Input/Output (I/O) Organization refers to the methods and structures used to facilitate
communication between the central processing unit (CPU), main memory, and external peripheral
devices.
A. I/O Interface
An I/O Interface (or I/O module) is a specialized hardware component that acts as a bridge between
the system bus (connecting CPU and memory) and the peripheral device's controller.
       Purpose: To resolve the differences between the central computer (fast, digital, word-based)
        and peripherals (slower, electromechanical, character-based). It handles tasks like data
        buffering, control signal conversion, and status reporting.
Concept Diagram of an I/O Interface:
               +---------------------------------+
               |               CPU               |
               +---------------------------------+
                        ^       ^       ^
                        |       |       |  System Bus (Address, Data, Control)
                        v       v       v
+----------------------------------------------------------------+
|                         I/O INTERFACE                          |
|                                                                |
| +---------------+   +-----------------+   +------------------+ |
| | Data Register |   | Status Register |   | Control Register | |
| +---------------+   +-----------------+   +------------------+ |
|            ^                                                   |
|            | Port Logic (Data, Status, Control signals)        |
|            v                                                   |
+----------------------------------------------------------------+
                        ^       ^       ^
                        |       |       |
                        v       v       v
               +---------------------------------+
               |        Peripheral Device        |
               |     (e.g., Keyboard, Disk)      |
               +---------------------------------+
B. Common I/O Buses
       PCI (Peripheral Component Interconnect): A parallel bus standard for attaching hardware
        devices in a computer. It was common for connecting graphics cards, sound cards, and
        network cards directly to the motherboard. It has largely been replaced by PCIe (PCI
        Express), which is a high-speed serial bus.
       SCSI (Small Computer System Interface): A set of standards for physically connecting and
        transferring data between computers and peripheral devices. The original SCSI is a parallel
        bus, traditionally used for high-performance devices like hard disks and tape drives in
        servers; its serial successor is SAS (Serial Attached SCSI).
       USB (Universal Serial Bus): A high-speed serial bus that has become the de facto
        standard for connecting most peripherals (keyboards, mice, printers, external drives,
        phones).
            o Example: When you plug a USB drive into your computer, the OS uses the USB
                protocol to detect the device, assign resources, and manage data transfer serially.
C. Data Transfer Modes
1. Serial vs. Parallel Transfer
 Feature    Serial Transfer                          Parallel Transfer
 Concept    Bits are sent one after another over     Multiple bits are sent simultaneously
            a single channel.                        over multiple channels.
 Diagram    Sender > 10110101 > Receiver             Sender > 1 > Receiver
            (single line)                                   > 0 >
                                                            > 1 >
                                                            ...   (multiple lines)
 Speed      Fewer bits per clock cycle, but can      More bits per clock cycle, but limited
            achieve higher clock rates over long     by clock skew and crosstalk over
            distances.                               distance.
 Cost       Cheaper (fewer wires and pins).          More expensive (more wires and pins).
 Examples   USB, SATA, Ethernet, HDMI                Older printer ports (Centronics),
                                                     IDE/PATA, PCI
2. Synchronous vs. Asynchronous Transfer
        Synchronous Data Transfer: The sender and receiver share a common clock signal, and
         every transfer takes place on a clock edge. It is fast and efficient over short
         distances.
        Asynchronous Data Transfer: No shared clock is used. On a serial line, start and stop
         bits frame each character/byte; on a bus, control signals (handshaking) coordinate each
         transfer. It is more flexible but has higher overhead.
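The start/stop framing used on an asynchronous serial line can be sketched in a few lines of Python. This is a minimal illustration, not a driver: it assumes the common UART-style layout of one start bit (0), eight data bits sent least-significant bit first, and one stop bit (1), with no parity. The function names `frame_byte` and `unframe_byte` are hypothetical.

```python
from typing import List


def frame_byte(value: int) -> List[int]:
    """Frame one byte as: start bit (0), 8 data bits LSB first, stop bit (1)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    data_bits = [(value >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]


def unframe_byte(bits: List[int]) -> int:
    """Check the start and stop bits, then rebuild the byte from the data bits."""
    if len(bits) != 10 or bits[0] != 0 or bits[-1] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


frame = frame_byte(0x41)          # ASCII 'A'
print(frame)                      # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
print(hex(unframe_byte(frame)))   # 0x41
```

Under this assumed framing, 10 bits travel on the line for every 8 bits of data, which is the "higher overhead" mentioned above.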
Flowchart: Asynchronous Handshaking (Source-initiated)
graph TD
       A[Source places data on bus] --> B{Asserts 'Data Valid' signal};
       B --> C[Destination detects 'Data Valid'];
       C --> D[Destination reads data from bus];
       D --> E{Asserts 'Data Accepted' signal};
       E --> F[Source detects 'Data Accepted'];
       F --> G{De-asserts 'Data Valid' signal};
       G --> H[Destination detects 'Data Valid' is low];
       H --> I{De-asserts 'Data Accepted' signal};
       I --> J[Ready for next transfer];
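The handshake sequence in the flowchart can also be walked through in code. The sketch below is a single-threaded Python model under simplifying assumptions: the two control lines are plain booleans, and "detecting" a signal is just reading it. The class `HandshakeBus` and the function `handshake_transfer` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HandshakeBus:
    """Lines shared by source and destination (simplified model)."""
    data: int = 0
    data_valid: bool = False
    data_accepted: bool = False
    log: List[str] = field(default_factory=list)


def handshake_transfer(bus: HandshakeBus, value: int) -> int:
    """Move one value across the bus using the four handshake phases."""
    # Phase 1: source places data on the bus and asserts Data Valid.
    bus.data = value
    bus.data_valid = True
    bus.log.append("source: data on bus, Data Valid asserted")

    # Phase 2: destination detects Data Valid, reads the data,
    # and asserts Data Accepted.
    received = bus.data
    bus.data_accepted = True
    bus.log.append("destination: data read, Data Accepted asserted")

    # Phase 3: source detects Data Accepted and de-asserts Data Valid.
    bus.data_valid = False
    bus.log.append("source: Data Valid de-asserted")

    # Phase 4: destination detects Data Valid low and de-asserts
    # Data Accepted; the bus is ready for the next transfer.
    bus.data_accepted = False
    bus.log.append("destination: Data Accepted de-asserted")
    return received


bus = HandshakeBus()
print(hex(handshake_transfer(bus, 0xB5)))  # 0xb5
for entry in bus.log:
    print(entry)
```

Both control lines return to their idle (low) state at the end, so the same function can be called again for the next word.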
D. Direct Memory Access (DMA)
DMA is a feature that allows I/O devices to access main memory directly, without involving the CPU
in the data transfer itself. This significantly reduces CPU overhead and improves system
performance.
Components: A DMA Controller (DMAC) manages the transfer.
Flowchart of a DMA Transfer:
graph TD
       subgraph CPU
             A["1. CPU programs the DMAC with: <br>- Memory start address <br>- Word count to transfer <br>- I/O device address <br>- Transfer direction (read/write)"]
       end
       subgraph DMAC
             B[3. DMAC sends Bus Request to CPU]
             D[5. DMAC takes control of the bus]
             E["6. DMAC performs direct data transfer <br> between I/O device and Main Memory"]
             G["8. When transfer is complete, <br> DMAC sends an Interrupt to the CPU"]
       end
       subgraph System Bus
             F[BUS]
       end
       A --> Z[2. CPU continues with other tasks];
       B -- Bus Request --> C{CPU grants bus};
       C -- Bus Grant --> D;
       D <--> F;
       E <--> F;
       E --> G;
       G -- Interrupt --> H["9. CPU handles the interrupt <br> and knows the transfer is done"];
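The division of labor in a DMA transfer can be illustrated with a small Python model. It is a sketch under stated assumptions: main memory and the device buffer are `bytearray` objects, bus request/grant arbitration is not modeled, and the interrupt is reduced to a flag. The class `DMAController` and its method names are hypothetical.

```python
class DMAController:
    """Toy DMAC: the CPU programs it once, then it moves the whole block."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.address = 0          # memory start address
        self.count = 0            # number of bytes to transfer
        self.to_memory = True     # direction: device -> memory if True
        self.interrupt_pending = False

    def program(self, address: int, count: int, to_memory: bool = True) -> None:
        """Step done by the CPU: load the DMAC's registers."""
        if address < 0 or count < 0 or address + count > len(self.memory):
            raise ValueError("transfer does not fit in memory")
        self.address, self.count, self.to_memory = address, count, to_memory
        self.interrupt_pending = False

    def run(self, device: bytearray) -> None:
        """Step done by the DMAC: copy the block, then raise the interrupt."""
        if self.count > len(device):
            raise ValueError("device buffer is smaller than the word count")
        for i in range(self.count):
            if self.to_memory:
                self.memory[self.address + i] = device[i]
            else:
                device[i] = self.memory[self.address + i]
        self.interrupt_pending = True  # tells the CPU the transfer is done


memory = bytearray(16)
dmac = DMAController(memory)
dmac.program(address=4, count=3, to_memory=True)
dmac.run(bytearray(b"abc"))
print(bytes(memory[4:7]), dmac.interrupt_pending)  # b'abc' True
```

The CPU's only work here is the single `program` call and reacting to `interrupt_pending`; the per-byte copying happens inside the controller model.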
E. I/O Processor (IOP)
An I/O Processor, also known as a channel, is a more sophisticated version of a DMA controller. It is
a dedicated processor that can execute I/O-specific instructions ("channel programs") from main
memory, making it capable of managing complex I/O operations with minimal CPU intervention.
2. Memory Organization
This describes how different types of memory are arranged and managed in a computer system.
A. Memory Hierarchy
A pyramid structure that organizes memory based on speed, cost, and capacity. Memory closer to
the CPU is faster, more expensive, and smaller.
Diagram of Memory Hierarchy:
                  /\
                 /  \               <-- Registers (On-CPU, fastest)
                /----\
               /Caches\             <-- L1, L2, L3 Cache
              /--------\
             /   Main   \           <-- RAM (DRAM)
            /   Memory   \
           /--------------\
          /   Secondary    \        <-- Magnetic Disk (HDD), Flash (SSD), Optical
         /------------------\
        /  Tertiary/Offline  \      <-- Magnetic Tape, Cloud Storage (slowest)
       /----------------------\
          Principle of Locality: Caching works because programs tend to access data and
           instructions in predictable patterns.
               o Temporal Locality: If an item is referenced, it will tend to be referenced again soon.
               o Spatial Locality: If an item is referenced, items whose addresses are close by will
                    tend to be referenced soon.
B. Main and Secondary Memory
 Type               Sub-Types & Description
 Main Memory        Primary storage, directly addressable by the CPU, for programs and data
                    currently in use.
                    - RAM (Random Access Memory): Volatile; can be read from and written to.
                    - ROM (Read-Only Memory): Non-volatile; holds firmware like the BIOS.
 Secondary Memory   Non-volatile, long-term storage.
                    - Magnetic Disk (HDD): Spinning platters with read/write heads.
                    - Optical Storage: CDs, DVDs, Blu-ray discs read by a laser.
                    - Flash Memory (SSD): No moving parts, faster than HDDs.
C. Cache Memory
A small, extremely fast memory (SRAM) that sits between the CPU and main memory to store
frequently accessed data.
1. Cache Mapping Schemes
This defines how main memory blocks are placed into cache lines.
 Scheme              Description                            Pros / Cons
 Direct Mapped       Each memory block has only one         Pros: Simple, fast lookup.
                     specific line in the cache where it    Cons: High chance of "conflict
                     can be placed.                         misses" if two frequently used
                     Cache Line = (Memory Block Address)    blocks map to the same line.
                     % (Number of Cache Lines)
 Fully Associative   A memory block can be placed in any    Pros: Most flexible, lowest
                     available line of the cache. The       conflict misses.
                     entire cache must be searched to       Cons: Very expensive and slow to
                     find a block.                          search (requires parallel
                                                            comparators).
 Set-Associative     A compromise. The cache is divided     Pros: Good balance of performance
                     into sets. A memory block maps to a    and cost.
                     specific set, but can be placed in     Cons: More complex than
                     any line within that set.              direct-mapped.
                     Set = (Memory Block Address)
                     % (Number of Sets)
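The two modulo formulas in the table can be tried out directly. The Python sketch below assumes a hypothetical cache of 8 lines and, for the set-associative case, 2 lines per set (4 sets); the function names are illustrative only.

```python
def direct_mapped_line(block_address: int, num_lines: int) -> int:
    """Cache Line = (Memory Block Address) % (Number of Cache Lines)."""
    return block_address % num_lines


def set_associative_set(block_address: int, num_lines: int, ways: int) -> int:
    """Set = (Memory Block Address) % (Number of Sets)."""
    num_sets = num_lines // ways
    return block_address % num_sets


NUM_LINES = 8   # assumed cache size, in lines
WAYS = 2        # assumed associativity (lines per set)

for block in (3, 11, 19):
    line = direct_mapped_line(block, NUM_LINES)
    cache_set = set_associative_set(block, NUM_LINES, WAYS)
    print(f"block {block:2d} -> line {line} (direct), set {cache_set} (2-way)")
```

Blocks 3, 11, and 19 all map to line 3 in the direct-mapped cache, so each one evicts the previous one (conflict misses). In the 2-way cache they all map to set 3, but that set holds two lines, so two of the three blocks can be resident at once.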
2. Replacement Algorithms
When a cache miss occurs and the cache is full, a replacement algorithm decides which existing
block to evict.
      LRU (Least Recently Used): Replaces the block that has been accessed least recently.
       Usually gives the best hit rate, but exact LRU is complex to implement in hardware.
      FIFO (First-In, First-Out): Replaces the block that has been in the cache the longest.
       Simple but can evict frequently used blocks.
      Random: Replaces a random block. Very simple and surprisingly effective.
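The difference between LRU and FIFO shows up on even a short reference string. The Python sketch below counts misses for a small fully associative cache; the capacity of 3 blocks and the reference string are made-up values chosen for illustration, and the function names are hypothetical.

```python
from collections import OrderedDict, deque
from typing import Iterable


def count_misses_lru(refs: Iterable[int], capacity: int) -> int:
    """Count misses when the least recently used block is evicted."""
    cache: "OrderedDict[int, bool]" = OrderedDict()
    misses = 0
    for block in refs:
        if block in cache:
            cache.move_to_end(block)        # mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict least recently used
            cache[block] = True
    return misses


def count_misses_fifo(refs: Iterable[int], capacity: int) -> int:
    """Count misses when the oldest-loaded block is evicted."""
    cache = set()
    load_order = deque()
    misses = 0
    for block in refs:
        if block not in cache:
            misses += 1
            if len(cache) >= capacity:
                cache.discard(load_order.popleft())  # evict oldest arrival
            cache.add(block)
            load_order.append(block)
    return misses


refs = [1, 2, 3, 1, 4, 1, 5, 1]
print(count_misses_lru(refs, 3))   # 5
print(count_misses_fifo(refs, 3))  # 6
```

Block 1 is reused constantly. LRU keeps it because each hit refreshes it, while FIFO evicts it when block 4 arrives simply because it was loaded first, which costs one extra miss.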
D. Virtual Memory
A memory management technique that provides an "idealized" view of storage to a program. It
allows a program to be larger than the physical main memory by keeping only the necessary parts in
RAM and the rest on the disk.
Concept Diagram: Address Translation
The Memory Management Unit (MMU) translates a virtual address from the CPU into a physical
address in RAM using a Page Table.
+---------+   Virtual Address   +--------------+   Physical Address   +-------------+
|   CPU   | ------------------> |     MMU      | -------------------> | Main Memory |
+---------+                     | (Page Table) |                      +-------------+
                                +--------------+
                                       |
                                       | Page Fault (if not in RAM)
                                       v
                               +---------------+
                               | Secondary Mem |
                               +---------------+
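The translation step performed by the MMU can be sketched in Python. This is a simplified model, assuming a page size of 4096 bytes and a single-level page table represented as a dictionary from page number to frame number; a missing entry stands for a page that is not in RAM. `PAGE_SIZE`, `PageFault`, `translate`, and the table contents are all hypothetical.

```python
from typing import Dict

PAGE_SIZE = 4096  # assumed page size in bytes


class PageFault(Exception):
    """Raised when the requested page is not resident in main memory."""


def translate(virtual_address: int, page_table: Dict[int, int]) -> int:
    """Split the virtual address into (page, offset) and look up the frame."""
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    frame_number = page_table.get(page_number)
    if frame_number is None:
        # The OS would now load the page from secondary memory and retry.
        raise PageFault(f"page {page_number} is not in RAM")
    return frame_number * PAGE_SIZE + offset


page_table = {0: 5, 1: 2}  # made-up mapping: page -> frame

print(hex(translate(0x1234, page_table)))  # 0x2234 (page 1 -> frame 2)

try:
    translate(0x3000, page_table)          # page 3 has no entry
except PageFault as fault:
    print("page fault:", fault)
```

Only the page number is translated; the offset within the page is carried over unchanged, which is why 0x1234 becomes 0x2234.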
3. Multiprocessors
A multiprocessor system contains two or more CPUs that share access to a common RAM and
peripherals, enabling simultaneous execution of multiple threads or processes.
A. Multiprocessor Structures (Architectures)
 Type                  Description
 UMA (Uniform          All processors have equal access time to all parts of the shared
 Memory Access)        memory. Also called Symmetric Multiprocessing (SMP). Common in most
                       modern consumer multicore PCs.
 NUMA (Non-Uniform     Memory access time depends on the memory location relative to the
 Memory Access)        processor. Accessing local memory is faster than accessing remote
                       memory (memory connected to another processor). Common in high-end
                       servers.
B. Inter-processor Arbitration & Communication
       Arbitration: A process to decide which processor gets access to a shared resource (like the
        system bus) when multiple processors request it at the same time. Example: A centralized
        bus arbiter grants access one processor at a time.
       Communication & Synchronization:
            o Communication: Processors communicate by writing to and reading from shared
               memory locations.
            o Synchronization: Essential to avoid race conditions and ensure data consistency.
               This is done using synchronization primitives like locks, mutexes, and
                    semaphores, which are often implemented using atomic hardware instructions
                    (e.g., Test-and-Set, Compare-and-Swap).
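Communication through shared memory and the need for a lock can be shown with threads standing in for processors. The Python sketch below is an analogy, not hardware: it uses the standard `threading.Lock` as the synchronization primitive, and the function name, thread count, and increment count are arbitrary choices for illustration.

```python
import threading


def run_shared_counter(num_threads: int = 4, increments: int = 10_000) -> int:
    """Have several threads update one shared variable under a lock."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            # The read-modify-write below is the critical section; the lock
            # ensures only one thread executes it at a time.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


print(run_shared_counter())  # 40000
```

Without the lock, two threads could read the same old value of `counter` and both write back old value + 1, losing an update; that lost update is the race condition the primitives above are meant to prevent.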
C. Concept of Pipelining
Pipelining is an implementation technique where multiple instructions are overlapped in execution.
The instruction cycle is broken down into a series of stages.
Diagram: 5-Stage Instruction Pipeline
Each instruction (I1, I2, I3...) moves through the stages one clock cycle at a time.
Clock Cycle ->     1     2     3     4     5     6     7     8
---------------------------------------------------------------
Instruction 1:     IF    ID    EX    MEM   WB
Instruction 2:           IF    ID    EX    MEM   WB
Instruction 3:                 IF    ID    EX    MEM   WB
Instruction 4:                       IF    ID    EX    MEM   WB
         Stages:
             o IF: Instruction Fetch
             o ID: Instruction Decode & Register Fetch
             o EX: Execute / Calculate Address
             o MEM: Memory Access
             o WB: Write Back result to register
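The overlap of stages can be generated programmatically, which also makes the timing benefit easy to compute. The Python sketch below assumes an ideal pipeline: one instruction enters per cycle and there are no stalls or hazards. The function names are hypothetical.

```python
from typing import Dict, List

STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(num_instructions: int) -> List[Dict[int, str]]:
    """For each instruction, map clock cycle -> stage occupied in that cycle."""
    schedule = []
    for index in range(num_instructions):
        first_cycle = index + 1  # instruction i enters IF in cycle i + 1
        schedule.append(
            {first_cycle + s: stage for s, stage in enumerate(STAGES)}
        )
    return schedule


def pipelined_cycles(num_instructions: int, num_stages: int = len(STAGES)) -> int:
    """Ideal pipeline: k cycles to fill, then one instruction finishes per cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


def unpipelined_cycles(num_instructions: int, num_stages: int = len(STAGES)) -> int:
    """Without overlap, each instruction takes all k cycles on its own."""
    return num_stages * num_instructions


for number, stages in enumerate(pipeline_schedule(4), start=1):
    row = " ".join(f"{stages.get(cycle, ''):<4}" for cycle in range(1, 9))
    print(f"I{number}: {row}")

print(pipelined_cycles(4), unpipelined_cycles(4))  # 8 20
```

With 5 stages, four instructions complete in 8 cycles instead of 20, and the gap widens as more instructions flow through the pipeline.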
D. RISC and CISC
This refers to the design philosophy of a CPU's instruction set.
  Feature         CISC (Complex Instruction Set Computer)   RISC (Reduced Instruction Set Computer)
  Goal            Make programming easier with powerful     Make hardware simpler and faster.
                  instructions.
  Instructions    Large number of complex, multi-cycle      Small number of simple, single-cycle
                  instructions.                             instructions.
  Memory Access   Many instructions can access memory       Only specific load and store
                  directly.                                 instructions access memory.
  Pipelining      Difficult to pipeline due to variable     Easy to pipeline.
                  instruction length and complexity.
  Examples        Intel x86, AMD x86-64                     ARM (in smartphones), MIPS, PowerPC
E. Multicore Processors (Intel, AMD)
A multicore processor is a single integrated circuit (chip) with two or more independent processing
units called "cores".
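            Concept: It is a form of shared-memory multiprocessing on a single physical chip.
             Cores on one chip typically see uniform memory access (UMA); multi-socket systems
             built from several such chips are usually NUMA.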
            Example: A modern Intel Core i7 or AMD Ryzen 7 processor.
            Diagram of a simple Dual-Core Processor:
+---------------------------------------------------------------+
|                        Processor Chip                         |
|                                                               |
|   +-----------------------+       +-----------------------+   |
|   |        CORE 0         |       |        CORE 1         |   |
|   | +-------+   +-------+ |       | +-------+   +-------+ |   |
|   | | L1-I$ |   | L1-D$ | |       | | L1-I$ |   | L1-D$ | |   |
|   | +-------+   +-------+ |       | +-------+   +-------+ |   |
|   |     |           |     |       |     |           |     |   |
|   |   +---------------+   |       |   +---------------+   |   |
|   |   |      L2$      |   |       |   |      L2$      |   |   |
|   |   +---------------+   |       |   +---------------+   |   |
|   +-----------|-----------+       +-----------|-----------+   |
|               |                               |               |
|               +---------------+---------------+               |
|                               |                               |
|                       +---------------+                       |
|                       |  Shared L3$   |                       |
|                       +---------------+                       |
|                               |                               |
|                     +-------------------+                     |
|                     | Memory Controller | <----> To Main Memory (RAM)
|                     +-------------------+                     |
|                                                               |
+---------------------------------------------------------------+