UNIT-4
1. Explain the Multiprocessors and Multi computers
Ans: Multiprocessors: A multiprocessor is a computer system with two or more central
processing units (CPUs) that share full access to a common RAM.
  ▪ The main objective of using a multiprocessor is to boost the system’s execution speed,
    with other objectives being fault tolerance and application matching.
  ▪ There are two types of multiprocessors, one is called shared memory multiprocessor
    and another is distributed memory multiprocessor.
  ▪ In shared memory multiprocessors, all the CPUs share a common memory, but in a
    distributed memory multiprocessor, every CPU has its own private memory.
  ▪   The interconnection between two or more processors and the shared memory can be
      implemented using three methods.
      1)Time shared common bus
      2)Multiport memories
      3)Crossbar switch network
1)Time shared common bus
In this method, a single shared bus connects all the processors and memory units, and all
communication takes place over this bus.
If CPU-1 is interacting with a memory unit over the common shared bus, then all other
processors must remain idle, since there is only one bus available for communication.
Advantage:
   •   Simple to implement.
   •   Because only a single common bus is used, the implementation cost is very low.
Disadvantage:
   •   Data transfer rate is low, since only one transfer can take place at a time.
2)Multiport memories
Unlike the shared common bus method, this method provides a separate bus for each processor to
communicate with each memory module.
Suppose CPU-1 wants to access memory module 1; then port mm1 is enabled. Similarly, if
CPU-4 wants to access memory module 4, port mm4 is enabled. Hence all the processors can
communicate in parallel. If more than one CPU requests the same memory module at the same
time, priority is given in the order CPU-1, CPU-2, CPU-3, CPU-4 (see the sketch below).
                               Multiport memories architecture
                         Remaining ans in spectrum pg 4.6 Q5
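The following is a minimal sketch (illustrative, not from the text) of the fixed-priority arbitration described above: when several CPUs request the same memory module in the same cycle, the port of the highest-priority CPU is enabled and the others retry later. The CPU names and the single-module scope are assumptions for this example.

```python
# Hypothetical sketch: fixed-priority arbitration for one memory module of a
# multiport memory, assuming the CPU-1 > CPU-2 > CPU-3 > CPU-4 order above.

PRIORITY_ORDER = ["CPU-1", "CPU-2", "CPU-3", "CPU-4"]

def grant_port(requests):
    """Return the CPU whose port is enabled for this module in this cycle.

    `requests` is the set of CPUs requesting the same module at the same time;
    the highest-priority requester wins, the rest retry in a later cycle.
    """
    for cpu in PRIORITY_ORDER:
        if cpu in requests:
            return cpu
    return None  # no CPU requested this module

# Example: CPU-2 and CPU-4 both request memory module 1 in the same cycle.
print(grant_port({"CPU-2", "CPU-4"}))   # -> CPU-2 (higher priority)
```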
3)Crossbar switch network
Here, instead of multiple memory ports as in the multiport memory method, a switch is placed
at every crosspoint between a CPU and a memory module. The switch decides whether or not to
pass a request on to a particular memory module, based on the destination of the request.
                                    cross-bar switch network
Advantage:
   •    High data throughput.
Disadvantage:
   •    Complex to implement, as many switches are involved.
   •    Costlier to implement.
Remaining ans in spectrum pg 4.5 Q3
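As a rough illustration of the crossbar idea, the sketch below models a 4x4 crossbar in which a crosspoint switch closes only if the requested memory module is currently free, so non-conflicting CPU-to-module requests proceed in parallel. The arrival-order conflict resolution is an assumption; a real crosspoint design uses its own arbitration logic.

```python
# Hypothetical sketch: a 4x4 crossbar where a switch at each (CPU, module)
# crosspoint closes only if the target memory module is free. Non-conflicting
# requests proceed in parallel; conflicting ones are resolved by arrival order
# here purely for illustration.

def connect(requests, num_modules=4):
    """requests: list of (cpu, module) pairs; returns the granted connections."""
    busy = [False] * num_modules
    granted = []
    for cpu, module in requests:
        if not busy[module]:
            busy[module] = True        # close the crosspoint switch
            granted.append((cpu, module))
    return granted

# CPU-0 -> module 2 and CPU-1 -> module 0 proceed in parallel;
# CPU-2 -> module 2 is blocked because module 2 is already taken.
print(connect([(0, 2), (1, 0), (2, 2)]))   # -> [(0, 2), (1, 0)]
```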
Applications of Multiprocessor –
   1. As a uniprocessor, such as single instruction, single data stream (SISD).
   2. As a multiprocessor, such as single instruction, multiple data stream (SIMD), which is
      usually used for vector processing.
   3. Multiple series of instructions on a single data stream, such as multiple instruction,
      single data stream (MISD), which is used to describe hyper-threaded or pipelined
      processors.
   4. Inside a single system for executing multiple, individual series of instructions in multiple
      perspectives, such as multiple instruction, multiple data stream (MIMD).
Benefits of using a Multiprocessor –
   1.   Enhanced performance.
   2.   Multiple applications.
   3.   Multi-tasking inside an application.
   4.   High throughput and responsiveness.
   5.   Hardware sharing among CPUs.
Advantages:
   1.   Improved performance
   2.   Better scalability
   3.   Increased reliability
   4.   Reduced cost
   5.   Enhanced parallelism
Disadvantages:
   1.   Increased complexity
   2.   Higher power consumption
   3.   Difficult programming
   4.   Synchronization issues
   5.   Limited performance gains
Multicomputer: A multicomputer is a computer system with multiple processors that are
connected together to solve a problem. Each processor has its own private memory, accessible
only by that processor, and the processors communicate with each other via an
interconnection network.
Since a multicomputer supports message passing between the processors, a task can be divided
among the processors so that they complete it together. Hence, a multicomputer can be used
for distributed computing. It is more cost-effective and easier to build a multicomputer
than a multiprocessor.
Multiprocessor vs. Multicomputer:
   •   A multiprocessor consists of multiple processors within a single computer, whereas a
       multicomputer is a set of interlinked autonomous computers.
   •   In a multiprocessor, a single shared memory is attached to the processing elements; in
       a multicomputer, memory is distributed among the processing elements.
   •   In a multiprocessor, the processing elements must communicate with each other; in a
       multicomputer, communication between the processing elements is not strictly required.
   •   A multiprocessor uses a dynamic interconnection network, while a multicomputer uses a
       static interconnection network.
   •   A multiprocessor supports parallel computing; a multicomputer supports distributed
       computing.
   •   An example of a multiprocessor is the Sequent Symmetry S-81; an example of a
       multicomputer is a message-passing multicomputer.
2. Explain the message passing mechanisms
Ans: Message Passing Mechanism
Message passing is a paradigm for inter-process communication (IPC) and parallel computing.
It involves exchanging messages between processes or nodes to coordinate their actions and
share data, allowing for synchronization and coordination in computations. Message passing
in a multicomputer network demands special hardware and software support.
Message-Routing Schemes:
Message Format:
     ▪ Information units used in message routing are specified in Fig. A message is the logical
       unit for internode communication. It is often assembled from an arbitrary number of
       fixed-length packets; thus, it may have a variable length.
   ▪   A packet is the basic unit containing the destination address for routing purposes.
       Because different packets may arrive at the destination asynchronously, a sequence
       number is needed in each packet to allow reassembly of the message transmitted.
   ▪   A packet can be further divided into a number of fixed-length flits (flow control
       digits), which store the following:
          a) Routing information
          b) Sequence number
          c) Data
   ▪   The routing information (destination) and the sequence number occupy the header
       flits; the remaining flits are the data elements of the packet.
   ▪   The packet length is determined by the routing scheme and network implementation.
       Typical packet lengths range from 64 to 512 bits. The sequence number may occupy
       one to two flits depending on the message length.
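A small, hypothetical sketch of the message format just described: a packet is split into fixed-length flits, with the destination and sequence number carried in the header flit and the payload spread across the data flits. The flit size and field layout are assumptions chosen only for illustration.

```python
# Hypothetical sketch of the packet format described above: a packet is broken
# into fixed-length flits, with the destination address and sequence number in
# the header flit and the payload spread over the data flits.

FLIT_BYTES = 8  # assumed flit size

def make_packet(destination, seq_no, payload: bytes):
    header_flits = [("HEADER", destination, seq_no)]
    data_flits = [("DATA", payload[i:i + FLIT_BYTES])
                  for i in range(0, len(payload), FLIT_BYTES)]
    return header_flits + data_flits

# A 20-byte payload becomes 1 header flit + 3 data flits.
for flit in make_packet(destination=5, seq_no=1, payload=b"x" * 20):
    print(flit)
```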
Store-and-Forward Routing:
   ▪ Packets are the basic unit of information flow in store-and-forward routing. The
      concept is illustrated in Fig. a.
   ▪   Each node is required to use a packet buffer. A packet is transmitted from a source node
       to a destination node through a sequence of intermediate nodes.
   ▪   When a packet reaches an intermediate node, it is first stored in the buffer. Then it is
       forwarded to the next node if the desired output channel and a packet buffer in the
       receiving node are both available.
   ▪   The latency in store-and-forward networks is directly proportional to the distance (the
       number of hops) between the source and the destination.
   ▪   This routing scheme was implemented in the first generation of multicomputers.
Wormhole Routing:
  ▪ This scheme was implemented in later generations of multicomputers. Packets are divided
    into several smaller flits, and flit buffers are used in the hardware routers attached
    to the nodes.
  ▪ Transmission from the source node to the destination node is done through a sequence
    of routers.
  ▪ All the flits of the same packet are transmitted in order, as inseparable companions, in
    a pipelined fashion.
  ▪ The flits of a packet are transmitted in sequence, one behind the other, with the header
    flit leading and the data flits following. Only the header flit knows the path; it leads
    the data flits to the desired destination.
  ▪ Flits from different packets can be interleaved during transmission. However, the flits
    of different packets cannot be mixed up; otherwise they may be routed to the wrong
    destinations.
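To make the difference between the two schemes concrete, the sketch below computes first-order latency estimates: store-and-forward latency grows with the packet length times the number of hops, while wormhole routing adds only one flit time per hop once the pipeline is set up. The formulas are the usual textbook approximations (ignoring contention and per-node overheads), and the parameter values are illustrative.

```python
# First-order latency estimates (a common textbook approximation, used here
# only as an illustration of the store-and-forward vs. wormhole trade-off).

def store_and_forward_latency(L, W, D):
    """L: packet length (bits), W: channel bandwidth (bits/cycle), D: hops."""
    return (L / W) * (D + 1)

def wormhole_latency(L, Lf, W, D):
    """Lf: flit length (bits); header sets up the path, data flits pipeline behind it."""
    return L / W + (Lf / W) * D

# Example: 512-bit packet, 8-bit flits, 8-bit-per-cycle channels, 5 hops.
print(store_and_forward_latency(512, 8, 5))    # 384.0 cycles
print(wormhole_latency(512, 8, 8, 5))          # 69.0 cycles
```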
Asynchronous Pipelining:
   ▪ The pipelining of successive flits in a packet is done asynchronously using a
     handshaking protocol as shown in Fig.
   ▪ Along the path, a 1-bit ready/request (R/A) line is used between adjacent routers.
     When the receiving router (D) is ready to receive a flit (i.e. the flit buffer is available),
     it pulls the R/A line low. (Fig. a)
   ▪ When the sending router (S) is ready, it raises the line high and transmits flit i
     through the channel. (Fig. b)
   ▪ While the flit is being received by D, the R/A line is kept high. (Fig. c)
   ▪ After flit i is removed from D's buffer (i.e. is transmitted to the next node), the cycle
     repeats itself for the transmission of the next flit i + 1 until the entire packet is
     transmitted. (Fig.d)
   ▪ Asynchronous pipelining can be very efficient, and the clock used can be faster than
     that used in a synchronous pipeline.
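Below is a minimal sketch of the R/A handshake steps (a) to (d) for one channel; the receiving router's single flit buffer and the printout are assumptions added purely for illustration.

```python
# Minimal sketch of the 1-bit ready/request (R/A) handshake described above,
# between a sending router S and a receiving router D on one channel.

def transmit_packet(flits):
    d_buffer = None                       # D's single flit buffer
    for flit in flits:
        # (a) D pulls R/A low when its flit buffer is free -> ready to receive
        assert d_buffer is None, "D not ready"
        # (b) S raises R/A high and transmits the flit through the channel
        # (c) R/A stays high while the flit is being received by D
        d_buffer = flit
        print(f"S -> D : {flit}")
        # (d) D forwards the flit to the next node, freeing the buffer,
        #     and the cycle repeats for the next flit
        d_buffer = None

transmit_packet(["header", "data-1", "data-2", "data-3"])
```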
Deadlock and Virtual Channels:
Virtual Channel: A virtual channel is a logical link between two nodes.
   ▪ It is formed by a flit buffer in the source node, a physical channel between them, and
      a flit buffer in the receiver node.
   ▪ Figure shows the concept of four virtual channels sharing a single physical channel.
      Four flit buffers are used at the source node and receiver node, respectively.
   ▪ One source buffer is paired with one receiver buffer to form a virtual channel when the
      physical channel is allocated for the pair.
Deadlock Avoidance:
   ▪ By adding two virtual channels, V3 and V4 (Fig. c), the deadlock cycle can be broken.
   ▪ A modified channel-dependence graph is obtained by using the virtual channels V3 and
     V4 after the use of channel C2, instead of reusing C3 and C4.
   ▪ The cycle in Fig. b is thereby converted into a spiral, avoiding the deadlock. Channel
     multiplexing can be done at the flit level, or at the packet level if the packet length
     is sufficiently short.
   ▪ Virtual channels can be implemented with either unidirectional channels or
     bidirectional channels. The use of virtual channels may reduce the effective channel
     bandwidth available to each request.
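A hypothetical sketch of virtual-channel multiplexing: four virtual channels, each a pair of source and receiver flit buffers, share one physical channel. The round-robin multiplexing policy and the packet contents are assumptions made for the example.

```python
# Hypothetical sketch of four virtual channels multiplexed over one physical
# channel: each virtual channel is a (source flit buffer, receiver flit buffer)
# pair, and the physical channel is time-shared among the virtual channels
# that currently have a flit to send.

from collections import deque
from itertools import cycle

virtual_channels = {vc: {"src": deque(), "dst": deque()} for vc in range(4)}

# Two packets travel on different virtual channels over the same physical link.
virtual_channels[0]["src"].extend(["A-hdr", "A-data"])
virtual_channels[3]["src"].extend(["B-hdr", "B-data"])

for vc in cycle(range(4)):                       # round-robin channel multiplexing
    if all(not ch["src"] for ch in virtual_channels.values()):
        break
    src = virtual_channels[vc]["src"]
    if src:                                      # this VC holds the channel for one flit time
        flit = src.popleft()
        virtual_channels[vc]["dst"].append(flit)
        print(f"physical channel carries VC{vc}: {flit}")
```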
3. Describe the Multiprocessors System interconnect Architectures
Ans:
Spectrum pg.no 4.3 Q1 generalized multiprocessor system
 Q2 Hierarchical bus system
Q3 crossbar networks
Q4 crosspoint switch design
Q5 multiport memory
Q6 routing in butterfly networks
4. Explain directory-based cache coherence protocols
Ans:
CACHE COHERENCE : Cache coherence refers to the consistency of shared data stored in multiple caches
within a multiprocessor.
In systems with multiple processors, each processor typically has a private cache to reduce memory access
latency. However, when multiple caches store copies of the same memory block, cache coherence issues
arise:
    • Inconsistent Data: A processor modifies a memory block in its cache, but other caches hold stale
        copies.
    • Race Conditions: Multiple processors attempt to modify the same data simultaneously.
Cache coherence ensures that all the processors have a consistent view of shared memory even when data
is cached in multiple locations.
Directory-based cache coherence protocols:
  ▪ Directory-based cache coherence protocols are crucial for maintaining data consistency
    in large-scale multiprocessor systems.
  ▪ Directory-based cache coherence protocols maintain a centralized directory that tracks
    the state and location of cached data across all processors in a multiprocessor system
  ▪ The directory acts as a global point of control, managing coherence by storing
    information about which caches hold copies of each memory block and their respective
    states (shared, exclusive, modified)
  ▪ When a processor requests access to a memory block, it sends a message to the
    directory, which consults its records to determine the appropriate actions required to
    maintain coherence
        o If the block is not present in any cache, the directory fetches it from main
           memory and sends it to the requesting processor
        o If the block is present in other caches, the directory coordinates the necessary
           invalidation or update messages to ensure coherence before granting access to
           the requesting processor
  ▪   Directory-based protocols typically employ a set of coherence states (MESI, MOESI) to track the status
      of each cached block and enforce the coherence invariants.
  ▪   Common coherence states include:
          o Modified (M): The block is exclusively owned by a single cache and has been modified
          o Exclusive (E): The block is exclusively owned by a single cache but has not been modified
          o Shared (S): The block is shared among multiple caches and is read-only
          o Invalid (I): The block is not present in the cache or is outdated
  ▪   Coherence invariants ensure that:
          o At most one cache can have a block in the Modified state
          o If a block is in the Shared state, no cache can have it in the Modified or Exclusive state
          o A block in the Exclusive state cannot coexist with copies in other caches
  ▪   The directory maintains a presence vector or a sharing list to keep track of which processors have
      copies of each memory block, enabling efficient invalidation or update operations
  ▪   Directory-based protocols offer better scalability compared to snooping-based
      protocols,
          o In snooping-based protocols, all processors monitor a shared bus for coherence
             transactions, which can lead to increased traffic and limited scalability as the
             number of processors grows
     o Directory-based protocols avoid the need for a shared bus and centralized
       snooping, reducing the communication overhead and enabling more efficient
       use of interconnect bandwidth
▪ Protocol Categorization:
     o Full Map directories
     o Limited directories
     o Chained directories
1. Full Map directories:
     •   The Full Map Directory Protocol is a directory-based cache coherence protocol
         used to maintain consistency across multiple caches in a shared-memory
         multiprocessor system. This protocol is called "Full Map" because it maintains a
         complete directory for every cache line in the shared memory, allowing precise
         tracking of which caches hold copies of a given memory block.
      •   The full-map protocol implements directory entries with one bit per processor
          and a dirty bit (indicating that the cache block has been modified and that the
          cache copy is the only valid one in the system). Each presence bit represents the
          status of the block in the corresponding processor's cache (present or absent).
      •   If the dirty bit is set, then one and only one processor's bit is set and that
          processor can write into the block.
     •   Each directory entry contains N pointers, where N is the number of processors.
      •   There could be up to N cached copies of a particular block, shared by all processors.
     •   For every memory block, an N bit vector is maintained, where N equals the
         number of processors in the shared memory system. Each bit in the vector
         corresponds to one processor.
     •   The full-map protocol provides a useful upper bound for the performance of
         centralized directory-based cache coherence. However, it is not scalable due to
         excessive memory overhead.
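As a rough illustration of the full-map entry format and its memory cost, the sketch below builds an entry with one presence bit per processor plus a dirty bit, and computes the directory storage for a given machine size. The processor and block counts are arbitrary example values.

```python
# Illustrative sketch (not from the text) of a full-map directory entry:
# one presence bit per processor plus a dirty bit, and the resulting
# directory overhead, which grows linearly with the processor count.

def full_map_entry(num_processors):
    return {"present": [0] * num_processors, "dirty": 0}

def directory_bits(num_processors, num_memory_blocks):
    # (N presence bits + 1 dirty bit) per memory block
    return (num_processors + 1) * num_memory_blocks

entry = full_map_entry(8)
entry["present"][2] = 1          # processor 2 caches the block
entry["present"][5] = 1          # processor 5 caches the block (clean, shared)
print(entry)

# Overhead for 64 processors and one million memory blocks:
print(directory_bits(64, 1_000_000), "bits of directory storage")
```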
2. Limited directories:
     •   The Limited Directory Protocol (LDP) is a memory coherence protocol used to
         maintain consistency in shared memory systems, particularly in distributed
         shared memory (DSM) or multiprocessor systems. It is a variation of the
         directory-based coherence protocol, optimized to reduce the storage and
         communication overhead associated with directory maintenance.
     •   Limited directory protocols are designed to solve the directory size problem.
         Restricting the number of simultaneously cached copies of any particular block
         of data limits the growth of the directory to a constant factor.
      •   It has a fixed number of pointers per directory entry, regardless of the number of
          processors.
      •   These protocols are considered scalable with respect to memory overhead
          because the resources required to implement them grow approximately linearly
          with the number of processors in the system.
3. Chained directories:
     •   The Chained Directory Protocol is a variant of directory-based coherence
         protocols that uses a distributed and linked data structure to track cache
         coherence. Unlike centralized directory protocols, where a single directory
         maintains coherence information, the chained directory protocol links
         directories across distributed memory modules in a chain-like structure.
     •   Chained directories realize the scalability of limited directories without
         restricting the number of shared copies of data blocks. This type of cache
         coherence scheme is called a chained scheme because it keeps track of shared
         copies of data by maintaining a chain of directory pointers.
     •   Instead of broadcasting coherence messages to all processors or a centralized
         directory, coherence requests (e.g., read, write, or invalidate) are propagated
         through the chain.
     •   Each directory node forwards the request to the next node in the chain until the
         coherence operation is completed.
      •   Chained directories emulate the full-map scheme by distributing the directory among
          the caches, thereby solving the directory size problem without restricting the
          number of shared block copies.
     •   Chained directories keep track of shared copies of a particular block by
         maintaining a chain of directory pointers.
     •   This scheme should be called a gossip protocol [as opposed to a snoopy
         protocol] because information is passed from individual to individual rather than
         being spread by covert observation.
      •   Although the chained protocols are more complex than the limited directory
          protocols, they are still scalable in terms of the amount of memory used for the
          directories. The number of pointers per cache or memory block is independent
          of the number of processors.
IF ASKED IN DETAIL
   1. Full Map directories:
         •   The Full Map Directory Protocol is a directory-based cache coherence protocol
             used to maintain consistency across multiple caches in a shared-memory
             multiprocessor system. This protocol is called "Full Map" because it maintains a
             complete directory for every cache line in the shared memory, allowing precise
             tracking of which caches hold copies of a given memory block.
Components of the Full Map Directory Protocol
   1. Directory:
         o A centralized or distributed structure that keeps metadata about every memory
            block.
         o Each entry in the directory corresponds to a memory block and contains:
               ▪ State: Indicates the coherence state of the block (e.g., Modified, Shared,
                   Invalid).
               ▪ Presence Vector: A bit vector where each bit corresponds to a specific
                   cache. A 1 in a bit indicates the cache holds a copy of the block.
   2. Cache States (for each block in individual caches):
         o Modified (M): The block is updated in the cache and not consistent with
            memory; the cache owns the block.
         o Shared (S): The block is consistent across all caches and the memory.
         o Invalid (I): The block is not valid in the cache.
   3. Interconnect:
         o A mechanism (e.g., bus or network) that facilitates communication between
            caches, processors, and the directory.
Operations in the Protocol
The Full Map Directory Protocol handles four primary operations: Read, Write, Invalidate,
and Evict.
a. Read Miss (Cache Read Request)
   •   If a processor requests a memory block not in its cache:
           1. The cache sends a read request to the directory.
           2. The directory checks the block's state:
                 ▪ If the block is Shared, the directory forwards the block from memory or
                    one of the caches with a copy.
                 ▪ If the block is Modified, the directory requests the cache with the block to
                    write it back to memory, then forwards the block to the requesting cache.
           3. The directory updates the presence vector to include the requesting cache.
b. Write Miss (Cache Write Request)
   •   If a processor wants to write to a block not in its cache or in the Shared state:
           1. The cache sends a write request to the directory.
           2. The directory checks the block's state:
                 ▪ If the block is Shared, the directory invalidates the block in all other caches
                    and updates the state to Modified.
                 ▪ If the block is Modified, the directory requests the owning cache to write
                    back the block, then forwards it to the requesting cache.
           3. The directory updates the presence vector to reflect the requesting cache as the
              sole owner.
c. Invalidate
   •   When a cache needs exclusive access to a block (e.g., for a write operation):
         1. The directory sends invalidate messages to all caches listed in the presence
            vector.
         2. Once all invalidations are acknowledged, the block is marked as Modified in the
            requesting cache.
d. Eviction
   •   When a cache evicts a block:
         1. The cache informs the directory to remove its entry from the presence vector.
         2. If the block is Modified, the evicting cache writes it back to memory to ensure
            consistency.
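The sketch below ties the read-miss, write-miss, and invalidate operations together in a tiny, hypothetical full-map directory simulation for a single memory block. It deliberately simplifies messaging (the "forwarded" block is just a returned value) and omits eviction handling.

```python
# A minimal, hypothetical simulation of the full-map directory operations
# described above (read miss, write miss, invalidate on write).

class FullMapDirectory:
    def __init__(self, num_procs, init_value=0):
        self.present = [False] * num_procs   # presence vector
        self.state = "Uncached"              # Uncached, Shared, or Modified
        self.owner = None
        self.memory = init_value

    def read_miss(self, proc, caches):
        if self.state == "Modified":
            # owning cache writes the block back before it can be shared
            self.memory = caches[self.owner]
            self.owner = None
        self.state = "Shared"
        self.present[proc] = True
        caches[proc] = self.memory           # block forwarded to the requester
        return self.memory

    def write_miss(self, proc, value, caches):
        # invalidate every other copy listed in the presence vector
        for p, has_copy in enumerate(self.present):
            if has_copy and p != proc:
                caches[p] = None
                self.present[p] = False
        self.present[proc] = True
        self.state, self.owner = "Modified", proc
        caches[proc] = value

caches = {0: None, 1: None, 2: None}
d = FullMapDirectory(num_procs=3, init_value=10)
d.read_miss(0, caches)            # P0 reads -> Shared, present = [T, F, F]
d.read_miss(1, caches)            # P1 reads -> Shared, present = [T, T, F]
d.write_miss(2, 99, caches)       # P2 writes -> P0/P1 invalidated, Modified
print(d.state, d.present, caches) # Modified [False, False, True] {0: None, 1: None, 2: 99}
```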
   2. Limited directories:
          •   The Limited Directory Protocol (LDP) is a memory coherence protocol used to
              maintain consistency in shared memory systems, particularly in distributed
              shared memory (DSM) or multiprocessor systems. It is a variation of the
              directory-based coherence protocol, optimized to reduce the storage and
              communication overhead associated with directory maintenance.
Key Features of Limited Directory Protocol:
   1. Compact Directory Representation:
         o  Traditional directory protocols store information for all processors in the system,
            which can lead to significant memory overhead in large-scale systems.
        o LDP limits the number of processors that can be explicitly tracked for a cache
            line. Instead of maintaining a full list of sharers, it uses a limited sharer list to
            track only a subset of processors that have a copy of the data.
  2. Fixed-Size Directory Entries:
        o Each directory entry has a fixed size, with space for a limited number of sharers.
        o If the number of sharers exceeds the directory's capacity, it enters a fallback
            state, such as broadcasting invalidation or downgrading the block.
  3. Coherence States:
        o Like other directory protocols, LDP supports standard states such as Modified,
            Shared, and Invalid (often abbreviated as MSI or MESI with Exclusive).
        o The directory keeps track of the coherence state and the subset of processors
            holding a valid copy.
  4. Efficient for Sparse Sharing:
        o LDP performs well in scenarios where data sharing is limited to a small number
            of processors at any given time. This is common in many parallel applications
            where data locality is high.
Operation of Limited Directory Protocol:
1. Read Request:
  •   If a processor requests a cache line, the directory checks its state:
          o Shared or Uncached State: The requesting processor is added to the limited
             sharer list, and the data is forwarded.
          o Modified State: The directory sends an invalidation request to the current
             owner, fetches the latest data, and updates the state to Shared.
2. Write Request:
  •   For a write request, the protocol ensures exclusive access:
         o If the data is in Shared State, invalidation messages are sent to all sharers in the
            directory.
         o The writer is granted exclusive ownership, and the state is updated to Modified.
3. Overflow Handling:
  •   When the number of sharers exceeds the directory's capacity:
        o Broadcasting: The directory resorts to broadcasting coherence messages to all
           processors, ensuring all sharers are updated or invalidated.
        o Coarser Tracking: The directory might switch to a coarse-grained mode,
           marking the cache line as globally shared without tracking individual sharers.
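A minimal sketch of the overflow handling described above: the directory entry tracks at most a fixed number of sharers, and when that limit is exceeded it falls back to a coarse "broadcast" mode instead of tracking individual sharers. The two-pointer capacity is an assumed example value.

```python
# Hypothetical sketch of a limited directory entry with room for at most
# MAX_POINTERS sharers; overflow switches the entry into a broadcast mode.

MAX_POINTERS = 2   # assumed directory entry capacity

def add_sharer(entry, proc):
    """entry: {'sharers': set, 'broadcast': bool}"""
    if entry["broadcast"]:
        return entry                       # already coarse-grained; nothing to track
    if proc in entry["sharers"] or len(entry["sharers"]) < MAX_POINTERS:
        entry["sharers"].add(proc)
    else:
        # overflow: stop tracking individuals, invalidations must now be broadcast
        entry["sharers"].clear()
        entry["broadcast"] = True
    return entry

entry = {"sharers": set(), "broadcast": False}
for p in [0, 1, 2]:                        # third sharer overflows the 2-pointer entry
    add_sharer(entry, p)
print(entry)                               # {'sharers': set(), 'broadcast': True}
```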
3. Chained Directory Protocol
The Chained Directory Protocol is a variant of directory-based coherence protocols that
uses a distributed and linked data structure to track cache coherence. Unlike centralized
directory protocols, where a single directory maintains coherence information, the chained
directory protocol links directories across distributed memory modules in a chain-like
structure.
Key Features
   1. Distributed Directory: Each memory module maintains a local directory that tracks
      which caches hold copies of its memory blocks. These directories are linked in a
      chain.
   2. Chaining Mechanism:
          o Instead of broadcasting coherence messages to all processors or a centralized
              directory, coherence requests (e.g., read, write, or invalidate) are propagated
              through the chain.
          o Each directory node forwards the request to the next node in the chain until
              the coherence operation is completed.
   3. Scalability:
          o The chained structure reduces bottlenecks in centralized directories and scales
              better with an increasing number of processors.
   4. Reduced Network Traffic:
          o By chaining requests instead of broadcasting them, the protocol minimizes
              unnecessary message traffic, making it suitable for large-scale systems.
How It Works
  1. Data Access and Directory Chain
        o When a processor accesses a memory block, it checks its local cache.
        o If the block is not found, a request is sent to the memory module that owns the
            block. The local directory at the memory module identifies which processors
            have cached copies of the block.
        o If coherence actions are needed (e.g., invalidating other copies), the request
            propagates through the chain of directories.
  2. Read Operations
        o If a processor requests a read, the directory checks whether the block is
            available in shared state.
        o If available, the block is sent to the requesting processor.
        o If the block is modified in another cache, the protocol fetches the updated
            block from that cache, writes it back to memory, and updates the requestor.
  3. Write Operations
        o For a write request, the protocol ensures that no stale copies exist in other
            caches.
        o It sends invalidate messages to all caches holding the block, propagating the
            request through the chain.
  4. Fault Tolerance
        o If a link in the chain fails, the protocol can often reroute requests to maintain
            functionality.
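To illustrate the chained structure, the sketch below represents the sharers of one block as a linked list of directory pointers and propagates an invalidation request node by node along the chain. The node layout (head pointer at memory, one next pointer per caching node) is a simplified assumption.

```python
# Hypothetical sketch of chained directory pointers: the memory-side directory
# holds a head pointer, and each caching node stores a pointer to the next
# sharer, forming the chain along which invalidations are propagated.

class Sharer:
    def __init__(self, proc, next_sharer=None):
        self.proc = proc
        self.next = next_sharer            # pointer to the next cache in the chain

def invalidate_chain(head):
    """Walk the chain node by node, invalidating each shared copy in turn."""
    node = head
    while node is not None:
        print(f"invalidate copy in cache of P{node.proc}")
        node = node.next
    return None                            # chain is empty once the write completes

# Block is shared by P3 -> P1 -> P6; a write by another processor tears down the chain.
head = Sharer(3, Sharer(1, Sharer(6)))
head = invalidate_chain(head)
```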
5. Explain snoopy bus protocol
Ans: Snoopy bus Protocol:
The Snoopy Bus Protocol is a cache coherence mechanism used in multiprocessor systems
where multiple processors share a common bus and memory. This protocol ensures that data
consistency is maintained across all processor caches. It operates by "snooping" on, or
monitoring, the shared bus to observe memory transactions. Two key protocols used to
achieve this coherence through bus snooping are the write-invalidate protocol and the
write-update protocol. Here's a detailed breakdown:
Write Invalidate Protocol
When multiple caches share the same memory, a protocol is needed to keep them in sync.
The Write Invalidate Protocol ensures this by:
   •   When a processor writes to a memory location, all other caches with that memory
       line invalidate (nullify) their copies.
   •   This ensures no other processor reads outdated data, and only the writing processor
       has the latest data.
It prevents mismatched or outdated information in the system.
Advantages
   •   Reduces Bandwidth Usage: Only invalidation messages are transmitted on the bus, and
       these are much smaller than transmitting the actual data.
   •   Simpler Implementation: The protocol is relatively easy to implement, since it reduces
       the cache coherence problem to tracking invalidations.
   •   Prevents Stale Data Access: By invalidating the copies in other caches, it ensures
       that no processor reads outdated data.
Disadvantages
   •   Increased Cache Misses: Invalidating copies may result in more cache misses, since
       other processors may then have to re-read the data from main memory or from the
       cache of the writing processor.
   •   Performance Overhead: Frequent invalidations can add latency when there is high
       write contention.
Write Update Protocol
The Write Update Protocol (or Write Broadcast Protocol) is a cache coherence method
used in multiprocessor systems.
  •   When a processor writes to a memory location, it updates all other caches with the
      new value instead of invalidating them.
  •   This ensures all caches have the latest data without needing to nullify any copies.
Advantages
  •   Reduced Cache Misses: Because the updated data is available in all the caches,
      processors do not need to fetch it again, reducing the number of cache misses.
  •   Improved Performance for Read-Heavy Workloads: Applications with frequent reads
      benefit the most, since the latest data is present in all the caches.
  •   Consistent Data Across Caches: Ensures that all caches hold up-to-date data, thus
      improving data synchronization.
Disadvantages
  •   Higher Bandwidth Consumption: Broadcasting the updated data to all caches requires
      more bus bandwidth than sending invalidation messages only.
  •   Complex Implementation: Coordinating updates and ensuring that all of the caches
      hold the correct data is more challenging.
  •   Potential for Increased Traffic: With many processors, the number of update messages
      can grow large, causing congestion on the bus.
  •   Consider three processors (P1, P2, and Pn) maintaining consistent copies of a block X
      in their local caches (Fig. a) and in the shared-memory module marked X.
  •   Using a write-invalidate protocol, the processor P1 modifies (writes) its cached copy
      from X to X', and all other copies are invalidated via the bus (denoted 1 in Fig. b).
      Invalidated blocks are sometimes called dirty, meaning they should not be used.
  •   The write-update protocol (Fig. c) demands that the new block content X' be broadcast
      to all cache copies via the bus. The memory copy is also updated if write-through
      caches are used. When write-back caches are used, the memory copy is updated later,
      at block replacement time.
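The short sketch below replays this example for both policies: under write-invalidate, P1's write leaves the other cached copies invalid, while under write-update the new value X' is propagated to every copy. Cache state machines and the memory copy are omitted for brevity.

```python
# Illustrative sketch of the two snoopy policies applied to the example above:
# P1 writes X' over X while the other caches hold copies of X. Bus traffic is
# reduced to a dictionary update here; a real protocol would also involve the
# memory copy and the cache state machines.

def write_invalidate(caches, writer, new_value):
    for p in caches:
        if p != writer:
            caches[p] = None               # other copies are invalidated via the bus
    caches[writer] = new_value
    return caches

def write_update(caches, writer, new_value):
    for p in caches:
        caches[p] = new_value              # new block content broadcast to all copies
    return caches

shared = {"P1": "X", "P2": "X", "Pn": "X"}
print(write_invalidate(dict(shared), "P1", "X'"))  # {'P1': "X'", 'P2': None, 'Pn': None}
print(write_update(dict(shared), "P1", "X'"))      # all caches now hold X'
```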
Write-Through Caches: The states of a cache block copy change with respect to read, write,
and replacement operations in the cache. The figure shows the state transitions for two
basic write-invalidate snoopy protocols developed for write-through and write-back caches,
respectively. A block copy of a write-through cache i attached to processor i can assume one
of two possible cache states: valid or invalid (Fig. a).
Write-Back Caches: The valid state of a write-back cache can be further split into two
cache states, labeled RW (read-write) and RO (read-only), as shown in Fig. b. The INV
(invalidated or not-in-cache) cache state is equivalent to the invalid state mentioned
before. This three-state coherence scheme corresponds to an ownership protocol.
Write-Once Protocol: James Goodman proposed a cache coherence protocol for bus-based
multiprocessors. This scheme combines the advantages of both write-through and write-back
invalidations. In order to reduce bus traffic, the very first write of a cache block uses a
write-through policy.