
UNIT-4

1. Explain Multiprocessors and Multicomputers


Ans: Multiprocessors: A multiprocessor is a computer system in which two or more central
processing units (CPUs) share full access to a common RAM.
▪ The main objective of using a multiprocessor is to boost the system’s execution speed,
with other objectives being fault tolerance and application matching.
▪ There are two types of multiprocessors: shared memory multiprocessors and
distributed memory multiprocessors.
▪ In a shared memory multiprocessor, all the CPUs share a common memory, whereas in a
distributed memory multiprocessor, every CPU has its own private memory.

▪ The interconnection between two or more processors and shared memory can be
implemented in three ways:
1) Time-shared common bus
2) Multiport memories
3) Crossbar switch network
1) Time-shared common bus
In this method, a single shared bus connects all the processors and memory units, and all
communication passes through it.
If CPU-1 is interacting with a memory unit over the shared bus, all other processors must
remain idle, because only one bus is available for communication.
Advantage:
• Simple to implement.
• Because only a single common bus is needed, the implementation cost is very low.
Disadvantage:
• Data transfer rate is slow.
2) Multiport memories
Unlike the time-shared common bus method, each processor has a separate bus to
communicate with each memory module.
Suppose CPU-1 wants to interact with memory module 1; then port MM1 is enabled. Similarly,
if CPU-4 wants to interact with memory module 4, port MM4 is enabled. Hence all processors
can communicate in parallel. If more than one CPU requests the same memory module at the
same time, priority is given in the order CPU-1, CPU-2, CPU-3, CPU-4.

Multiport memories architecture


Remaining answer: see Spectrum, pg. 4.6, Q5.
3) Crossbar switch network
Here, instead of the ports used in multiport memories, a switch is placed at each crosspoint
between a memory module and a CPU. The switch is responsible for deciding whether to pass
a request on to a particular memory module, based on the request made.
Crossbar switch network
Advantage:
• High data throughput.
Disadvantage:
• Complex to implement as more switches involved.
• Costlier to implement.
Remaining answer: see Spectrum, pg. 4.5, Q3.

Applications of Multiprocessors:
1. As a uniprocessor, such as single instruction, single data stream (SISD).
2. As a multiprocessor, such as single instruction, multiple data stream (SIMD), which is
usually used for vector processing.
3. Multiple instruction streams operating on a single data stream, i.e. multiple instruction,
single data stream (MISD), which is used to describe hyper-threaded or pipelined
processors.
4. Multiple independent instruction streams operating on multiple data streams within a
single system, i.e. multiple instruction, multiple data stream (MIMD).
Benefits of using a Multiprocessor –
1. Enhanced performance.
2. Multiple applications.
3. Multi-tasking inside an application.
4. High throughput and responsiveness.
5. Hardware sharing among CPUs.
Advantages:
1. Improved performance
2. Better scalability
3. Increased reliability
4. Reduced cost
5. Enhanced parallelism
Disadvantages:
1. Increased complexity
2. Higher power consumption
3. Difficult programming
4. Synchronization issues
5. Limited performance gains

Multicomputer: A multicomputer is a computer system with multiple processors that are
connected together to solve a problem. Each processor has its own memory, which is
accessible only by that particular processor, and the processors communicate with each
other via an interconnection network.

Because a multicomputer is capable of message passing between processors, a task can be
divided among the processors so that they complete it together. Hence, a multicomputer can
be used for distributed computing. It is more cost effective and easier to build a
multicomputer than a multiprocessor.

Multiprocessor vs. Multicomputer:
1. A multiprocessor consists of multiple processors within a single computer, whereas a
multicomputer is a set of interlinked autonomous computers.
2. In a multiprocessor, a single shared memory is attached to the processing elements; in a
multicomputer, the memory attached to the processing elements is distributed.
3. In a multiprocessor, the processing elements must communicate with each other; in a
multicomputer, the processing elements are not required to communicate.
4. A multiprocessor uses a dynamic interconnection network, while a multicomputer uses a
static network.
5. A multiprocessor supports parallel computing; a multicomputer supports distributed
computing.
6. An example of a multiprocessor is the Sequent Symmetry S-81; an example of a
multicomputer is a message-passing multicomputer.

2. Explain the message passing mechanisms

Ans: Message Passing Mechanism


Message passing is a paradigm for inter-process communication (IPC) and parallel computing.
It involves exchanging messages between processes or nodes to coordinate their actions and
share data, allowing for synchronization and coordination in computations. Message passing
in a multicomputer network demands special hardware and software support.
Message-Routing Schemes:
Message Format:
▪ Information units used in message routing are specified in Fig. A message is the logical
unit for internode communication. It is often assembled from an arbitrary number of
fixed-length packets; thus, it may have a variable length.

▪ A packet is the basic unit containing the destination address for routing purposes.
Because different packets may arrive at the destination asynchronously, a sequence
number is needed in each packet to allow reassembly of the message transmitted.
▪ A packet can be further divided into a number of fixed-length flits (flow control digits),
which store the following:
a) Routing information
b) Sequence number
c) Data
▪ The routing information (destination) and the sequence number occupy the header flits.
The remaining flits are the data elements of the packet.
▪ The packet length is determined by the routing scheme and network implementation.
Typical packet lengths range from 64 to 512 bits. The sequence number may occupy
one to two flits depending on the message length.
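To make the message/packet/flit hierarchy concrete, the sketch below models a message being split into fixed-length packets and each packet into flits. It is only an illustration: the class names, the number of data flits per packet, and the field layout are assumptions, not values prescribed by any particular network.

```python
# Hypothetical sketch of the message -> packet -> flit hierarchy.
# Field layout, names, and packet length are illustrative assumptions.
from dataclasses import dataclass
from typing import List

PACKET_DATA_FLITS = 6   # assumed number of data flits per packet

@dataclass
class Flit:
    kind: str            # "header" or "data"
    payload: object      # routing info, sequence number, or a data word

@dataclass
class Packet:
    destination: int     # routing information (carried in a header flit)
    sequence: int        # sequence number for reassembly (header flit)
    flits: List[Flit]

def packetize(message: List[int], destination: int) -> List[Packet]:
    """Split a message (a list of data words) into fixed-length packets."""
    packets = []
    for seq, start in enumerate(range(0, len(message), PACKET_DATA_FLITS)):
        chunk = message[start:start + PACKET_DATA_FLITS]
        flits = [Flit("header", destination), Flit("header", seq)]
        flits += [Flit("data", word) for word in chunk]
        packets.append(Packet(destination, seq, flits))
    return packets

def reassemble(packets: List[Packet]) -> List[int]:
    """Packets may arrive out of order; the sequence number restores order."""
    ordered = sorted(packets, key=lambda p: p.sequence)
    return [f.payload for p in ordered for f in p.flits if f.kind == "data"]

msg = list(range(20))
assert reassemble(list(reversed(packetize(msg, destination=3)))) == msg
```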

Store-and-Forward Routing:
▪ Packets are the basic unit of information flow in a store-and-forward Routing. The
concept is illustrated in Fig. a.
▪ Each node is required to use a packet buffer. A packet is transmitted from a source node
to a destination node through a sequence of intermediate nodes.
▪ When a packet reaches an intermediate node, it is first stored in the buffer. Then it is
forwarded to the next node if the desired output channel and a packet buffer in the
receiving node are both available.
▪ The latency in store-and-forward networks is directly proportional to the distance (the
number of hops) between the source and the destination.
▪ This routing scheme was implemented in the first generation of multicomputers.

Wormhole Routing:
▪ Wormhole routing was implemented in later generations of multicomputers. These
systems divide a packet into smaller flits and use flit buffers in the hardware routers
attached to the nodes.
▪ The transmission from the source node to the destination node is done through a
sequence of routers.
▪ All the flits in the same packet are transmitted in order, as inseparable companions, in a
pipelined fashion.
▪ The flits of a packet are transmitted in a cascade, with the header flit leading and the
data flits following. Only the header flit knows the path, and it leads the data flits to the
desired destination.
▪ Different packets can be interleaved during transmission. However, the flits from
different packets cannot be mixed up; otherwise they may be routed to the wrong
destinations.
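The difference between the two schemes can be made concrete with first-order latency estimates: store-and-forward latency grows with the product of packet length and hop count, while wormhole routing adds only one flit time per hop. The sketch below uses these textbook-style approximations (contention is ignored); the parameter names and the example numbers are assumptions.

```python
def store_and_forward_latency(L, W, D):
    """Approximate latency: the whole packet of L bits crosses each of the
    D + 1 channels on the path, one after another, at bandwidth W bits/s."""
    return (L / W) * (D + 1)

def wormhole_latency(L, Lf, W, D):
    """Approximate latency: the Lf-bit header flit pipelines through the D
    hops, and the remaining flits of the packet stream right behind it."""
    return L / W + (Lf / W) * D

# Assumed example: 512-bit packets, 32-bit flits, 20 Mbit/s channels, 5 hops.
# Wormhole latency is nearly independent of the distance travelled.
print(store_and_forward_latency(512, 20e6, 5))   # ~1.5e-4 s
print(wormhole_latency(512, 32, 20e6, 5))        # ~3.4e-5 s
```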

Asynchronous Pipelining:
▪ The pipelining of successive flits in a packet is done asynchronously using a
handshaking protocol as shown in Fig.
▪ Along the path, a 1-bit ready/request (R/A) line is used between adjacent routers.
When the receiving router (D) is ready to receive a flit (i.e. the flit buffer is available),
it pulls the R/A line low. (Fig. a)
▪ When the sending router(S) is ready, it raises the line high and transmits flit i through
the channel. (Fig.b)
▪ While the flit is being received by D, the R/A line is kept high. (Fig. c)
▪ After flit i is removed from D's buffer (i.e. is transmitted to the next node), the cycle
repeats itself for the transmission of the next flit i + 1 until the entire packet is
transmitted. (Fig.d)

▪ Asynchronous pipelining can be very efficient, and the clock used can be faster than
that used in a synchronous pipeline.
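As a rough illustration of this handshake, the sketch below steps two adjacent routers through the R/A cycle for each flit of a packet. The single-flit buffer, the class names, and the sequential loop are simplifying assumptions; a real router performs these steps asynchronously and in parallel.

```python
# Minimal sketch of the 1-bit ready/request (R/A) handshake between an
# upstream router S and a downstream router D. Names are assumptions.
class Router:
    def __init__(self, name):
        self.name = name
        self.flit_buffer = None          # one flit buffer per channel

    def ready(self):                     # buffer free => D pulls R/A low
        return self.flit_buffer is None

def send_packet(flits, sender, receiver):
    delivered = []
    for flit in flits:
        # Sender waits for R/A low, then raises R/A high and transmits flit i.
        assert receiver.ready(), "R/A is high: receiver buffer still occupied"
        receiver.flit_buffer = flit
        # Receiver forwards flit i onward, freeing its buffer (R/A low again),
        # and the cycle repeats for flit i + 1.
        delivered.append(receiver.flit_buffer)
        receiver.flit_buffer = None
    return delivered

S, D = Router("S"), Router("D")
print(send_packet(["header", "seq", "data0", "data1"], S, D))
```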

Deadlock and Virtual Channels:


Virtual Channel: A virtual channel is a logical link between two nodes.
▪ It is formed by a flit buffer in the source node, a physical channel between them, and
a flit buffer in the receiver node.
▪ Figure shows the concept of four virtual channels sharing a single physical channel.
Four flit buffers are used at the source node and receiver node, respectively.
▪ One source buffer is paired with one receiver buffer to form a virtual channel when the
physical channel is allocated for the pair.

Deadlock Avoidance:
▪ By adding two virtual channels, V3 and V4 (Fig. c), one can break the deadlock cycle.
▪ A modified channel-dependence graph is obtained by using the virtual channels V3 and
V4 after the use of channel C2, instead of reusing C3 and C4.
▪ The cycle in Fig. b is thus converted to a spiral, avoiding deadlock. Channel
multiplexing can be done at the flit level, or at the packet level if the packet length is
sufficiently short.
▪ Virtual channels can be implemented with either unidirectional channels or
bidirectional channels. The use of virtual channels may reduce the effective channel
bandwidth available to each request.
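The sketch below illustrates how four virtual channels can time-multiplex flits over a single physical channel while keeping the flit streams of different packets separate. The round-robin arbitration policy, the queue-based buffers, and all names are assumptions made for the example.

```python
# Illustrative sketch: four virtual channels sharing one physical channel.
from collections import deque

class VirtualChannel:
    def __init__(self, vc_id):
        self.vc_id = vc_id
        self.src_buffer = deque()   # flit buffer at the source node
        self.dst_buffer = deque()   # paired flit buffer at the receiver node

def multiplex(vcs, cycles):
    """Each cycle the physical channel carries at most one flit, tagged with
    its virtual-channel id, so flits of different packets never mix."""
    log = []
    for cycle in range(cycles):
        vc = vcs[cycle % len(vcs)]              # assumed round-robin arbiter
        if vc.src_buffer:
            flit = vc.src_buffer.popleft()
            vc.dst_buffer.append(flit)          # delivered to the paired buffer
            log.append((cycle, vc.vc_id, flit))
    return log

vcs = [VirtualChannel(i) for i in range(4)]
for i, vc in enumerate(vcs):
    vc.src_buffer.extend([f"P{i}-hdr", f"P{i}-d0", f"P{i}-d1"])
for entry in multiplex(vcs, 12):
    print(entry)
```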

3. Describe the Multiprocessors System interconnect Architectures


Ans:
Spectrum pg.no 4.3 Q1 generalized multiprocessor system
Q2 Hierarchical bus system
Q3 crossbar networks
Q4 crosspoint switch design
Q5 multiport memory
Q6 routing in butterfly networks

4. Explain directory-based cache coherence protocols


Ans:
CACHE COHERENCE: Cache coherence refers to the consistency of shared data stored in multiple caches
within a multiprocessor.
In systems with multiple processors, each processor typically has a private cache to reduce memory access
latency. However, when multiple caches store copies of the same memory block, cache coherence issues
arise:
• Inconsistent Data: A processor modifies a memory block in its cache, but other caches hold stale
copies.
• Race Conditions: Multiple processors attempt to modify the same data simultaneously.
Cache coherence ensures that all the processors have a consistent view of shared memory even when data
is cached in multiple locations.
Directory-based cache coherence protocols:
▪ Directory-based cache coherence protocols are crucial for maintaining data consistency
in large-scale multiprocessor systems.
▪ Directory-based cache coherence protocols maintain a centralized directory that tracks
the state and location of cached data across all processors in a multiprocessor system
▪ The directory acts as a global point of control, managing coherence by storing
information about which caches hold copies of each memory block and their respective
states (shared, exclusive, modified)
▪ When a processor requests access to a memory block, it sends a message to the
directory, which consults its records to determine the appropriate actions required to
maintain coherence
o If the block is not present in any cache, the directory fetches it from main
memory and sends it to the requesting processor
o If the block is present in other caches, the directory coordinates the necessary
invalidation or update messages to ensure coherence before granting access to
the requesting processor

▪ Directory-based protocols typically employ a set of coherence states (MESI, MOESI) to track the status
of each cached block and enforce the coherence invariants.
▪ Common coherence states include:
o Modified (M): The block is exclusively owned by a single cache and has been modified
o Exclusive (E): The block is exclusively owned by a single cache but has not been modified
o Shared (S): The block is shared among multiple caches and is read-only
o Invalid (I): The block is not present in the cache or is outdated
▪ Coherence invariants (see the sketch below) ensure that:
o At most one cache can have a block in the Modified state
o If a block is in the Shared state, no cache can have it in the Modified or Exclusive state
o A block in the Exclusive state cannot coexist with copies in other caches
▪ The directory maintains a presence vector or a sharing list to keep track of which processors have
copies of each memory block, enabling efficient invalidation or update operations
▪ Directory-based protocols offer better scalability compared to snooping-based
protocols:
o In snooping-based protocols, all processors monitor a shared bus for coherence
transactions, which can lead to increased traffic and limited scalability as the
number of processors grows
o Directory-based protocols avoid the need for a shared bus and centralized
snooping, reducing the communication overhead and enabling more efficient
use of interconnect bandwidth
▪ Protocol Categorization:
o Full Map directories
o Limited directories
o Chained directories
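Before looking at each organization in detail, the sketch below checks the coherence invariants listed above against a snapshot of per-cache states for one block. The state letters follow MESI, and the helper function is purely hypothetical, not part of any real protocol implementation.

```python
# Hypothetical helper: verify MESI coherence invariants for one memory block,
# given a mapping of cache id -> state ("M", "E", "S", or "I").
def check_invariants(states):
    owners = [c for c, s in states.items() if s in ("M", "E")]
    sharers = [c for c, s in states.items() if s == "S"]
    assert len(owners) <= 1, "at most one cache may hold the block in M or E"
    assert not (owners and sharers), "M/E cannot coexist with Shared copies"
    return True

print(check_invariants({"P0": "M", "P1": "I", "P2": "I"}))   # single owner
print(check_invariants({"P0": "S", "P1": "S", "P2": "I"}))   # read-only sharing
```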

1. Full Map directories:


• The Full Map Directory Protocol is a directory-based cache coherence protocol
used to maintain consistency across multiple caches in a shared-memory
multiprocessor system. This protocol is called "Full Map" because it maintains a
complete directory for every cache line in the shared memory, allowing precise
tracking of which caches hold copies of a given memory block.
• The full-map protocol implements directory entries with one bit per processor and a
dirty bit (indicating that the cache block has been modified and that the cached copy is
the only valid copy in the system). Each bit represents the status of the block in the
corresponding processor's cache (present or absent), as illustrated in the sketch after
this list.
• If the dirty bit is set, then one and only one processor's bit is set, and that processor is
allowed to write into the block.
• Each directory entry contains N pointers, where N is the number of processors.
• There could be N cached copies of a particular block shared by all processors
• For every memory block, an N bit vector is maintained, where N equals the
number of processors in the shared memory system. Each bit in the vector
corresponds to one processor.
• The full-map protocol provides a useful upper bound for the performance of
centralized directory-based cache coherence. However, it is not scalable due to
excessive memory overhead.
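As referenced above, a full-map directory entry can be pictured as an N-bit presence vector plus a dirty bit. The integer-bitmask encoding and the names below are assumptions made for illustration.

```python
# Illustrative full-map directory entry: one presence bit per processor plus
# a dirty bit. The bitmask encoding and method names are assumptions.
class FullMapEntry:
    def __init__(self, n_processors):
        self.n = n_processors
        self.presence = 0      # N-bit vector; bit i set => cache i has a copy
        self.dirty = False

    def add_sharer(self, cpu):
        self.presence |= (1 << cpu)

    def set_dirty(self, cpu):
        # When the dirty bit is set, exactly one presence bit remains set:
        # the bit of the processor that is allowed to write the block.
        self.presence = (1 << cpu)
        self.dirty = True

    def sharers(self):
        return [i for i in range(self.n) if self.presence & (1 << i)]

entry = FullMapEntry(n_processors=8)
entry.add_sharer(0)
entry.add_sharer(3)
print(entry.sharers())                 # [0, 3] -> block shared read-only
entry.set_dirty(3)
print(entry.sharers(), entry.dirty)    # [3] True -> only the writer holds it
```

Since each entry needs one bit per processor for every memory block, directory storage grows in proportion to N, which is the memory-overhead problem mentioned above.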

2. Limited directories:
• The Limited Directory Protocol (LDP) is a memory coherence protocol used to
maintain consistency in shared memory systems, particularly in distributed
shared memory (DSM) or multiprocessor systems. It is a variation of the
directory-based coherence protocol, optimized to reduce the storage and
communication overhead associated with directory maintenance.
• Limited directory protocols are designed to solve the directory size problem.
Restricting the number of simultaneously cached copies of any particular block
of data limits the growth of the directory to a constant factor.
• It uses a fixed number of pointers per directory entry, regardless of the number of
processors.
• These protocols are considered scalable with respect to memory overhead because the
resources required to implement them grow approximately linearly with the number of
processors in the system.

3. Chained directories:
• The Chained Directory Protocol is a variant of directory-based coherence
protocols that uses a distributed and linked data structure to track cache
coherence. Unlike centralized directory protocols, where a single directory
maintains coherence information, the chained directory protocol links
directories across distributed memory modules in a chain-like structure.
• Chained directories realize the scalability of limited directories without
restricting the number of shared copies of data blocks. This type of cache
coherence scheme is called a chained scheme because it keeps track of shared
copies of data by maintaining a chain of directory pointers.
• Instead of broadcasting coherence messages to all processors or a centralized
directory, coherence requests (e.g., read, write, or invalidate) are propagated
through the chain.
• Each directory node forwards the request to the next node in the chain until the
coherence operation is completed.
• Chained directories emulate the full-map scheme by distributing the directory among
the caches, solving the directory size problem without restricting the number of shared
block copies.
• Chained directories keep track of shared copies of a particular block by
maintaining a chain of directory pointers.
• This scheme should be called a gossip protocol [as opposed to a snoopy
protocol] because information is passed from individual to individual rather than
being spread by covert observation.
• Although chained protocols are more complex than limited directory protocols, they
are still scalable in terms of the amount of memory used for the directories. The number
of pointers per cache or memory block is independent of the number of processors.
IF ASKED IN DETAIL

1. Full Map directories:


• The Full Map Directory Protocol is a directory-based cache coherence protocol
used to maintain consistency across multiple caches in a shared-memory
multiprocessor system. This protocol is called "Full Map" because it maintains a
complete directory for every cache line in the shared memory, allowing precise
tracking of which caches hold copies of a given memory block.

Components of the Full Map Directory Protocol

1. Directory:
o A centralized or distributed structure that keeps metadata about every memory
block.
o Each entry in the directory corresponds to a memory block and contains:
▪ State: Indicates the coherence state of the block (e.g., Modified, Shared,
Invalid).
▪ Presence Vector: A bit vector where each bit corresponds to a specific
cache. A 1 in a bit indicates the cache holds a copy of the block.
2. Cache States (for each block in individual caches):
o Modified (M): The block is updated in the cache and not consistent with
memory; the cache owns the block.
o Shared (S): The block is consistent across all caches and the memory.
o Invalid (I): The block is not valid in the cache.
3. Interconnect:
o A mechanism (e.g., bus or network) that facilitates communication between
caches, processors, and the directory.

Operations in the Protocol

The Full Map Directory Protocol handles four primary operations: Read, Write, Invalidate,
and Evict.

a. Read Miss (Cache Read Request)


• If a processor requests a memory block not in its cache:
1. The cache sends a read request to the directory.
2. The directory checks the block's state:
▪ If the block is Shared, the directory forwards the block from memory or
one of the caches with a copy.
▪ If the block is Modified, the directory requests the cache with the block to
write it back to memory, then forwards the block to the requesting cache.
3. The directory updates the presence vector to include the requesting cache.

b. Write Miss (Cache Write Request)

• If a processor wants to write to a block not in its cache or in the Shared state:
1. The cache sends a write request to the directory.
2. The directory checks the block's state:
▪ If the block is Shared, the directory invalidates the block in all other caches
and updates the state to Modified.
▪ If the block is Modified, the directory requests the owning cache to write
back the block, then forwards it to the requesting cache.
3. The directory updates the presence vector to reflect the requesting cache as the
sole owner.

c. Invalidate

• When a cache needs exclusive access to a block (e.g., for a write operation):
1. The directory sends invalidate messages to all caches listed in the presence
vector.
2. Once all invalidations are acknowledged, the block is marked as Modified in the
requesting cache.

d. Eviction

• When a cache evicts a block:


1. The cache informs the directory to remove its entry from the presence vector.
2. If the block is Modified, the evicting cache writes it back to memory to ensure
consistency.
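The toy model below ties the read-miss, write-miss, and eviction paths together in one place. It is a sequential, message-free simplification under assumed names (Directory, read_miss, write_miss, evict); it illustrates the bookkeeping only, not the real message exchange or timing of an actual machine.

```python
# Toy full-map directory sketch: read miss, write miss (with invalidation),
# and eviction. All names and the single-block scope are assumptions.
class Directory:
    def __init__(self):
        self.state = "I"        # block state at the directory: I, S, or M
        self.presence = set()   # ids of caches holding a copy
        self.memory = 0         # the memory copy of the block

    def read_miss(self, cpu, caches):
        if self.state == "M":                      # owner writes back first
            owner = next(iter(self.presence))
            self.memory = caches[owner]["data"]
            caches[owner]["state"] = "S"
        self.presence.add(cpu)                     # update the presence vector
        self.state = "S"
        caches[cpu] = {"state": "S", "data": self.memory}

    def write_miss(self, cpu, caches, value):
        if self.state == "M":                      # fetch latest value first
            owner = next(iter(self.presence))
            self.memory = caches[owner]["data"]
        for other in self.presence - {cpu}:        # invalidate all other copies
            caches[other]["state"] = "I"
        self.presence = {cpu}                      # requester is the sole owner
        self.state = "M"
        caches[cpu] = {"state": "M", "data": value}

    def evict(self, cpu, caches):
        if caches.get(cpu, {}).get("state") == "M":
            self.memory = caches[cpu]["data"]      # write back dirty data
            self.state = "I"
        self.presence.discard(cpu)                 # drop from presence vector
        caches[cpu] = {"state": "I", "data": None}

caches = {}
d = Directory()
d.read_miss(0, caches)        # P0 reads: block becomes Shared
d.write_miss(1, caches, 42)   # P1 writes: P0 invalidated, block Modified
d.read_miss(2, caches)        # P2 reads: P1 writes back, block Shared again
print(caches, d.memory)
```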

2. Limited directories:
• The Limited Directory Protocol (LDP) is a memory coherence protocol used to
maintain consistency in shared memory systems, particularly in distributed
shared memory (DSM) or multiprocessor systems. It is a variation of the
directory-based coherence protocol, optimized to reduce the storage and
communication overhead associated with directory maintenance.

Key Features of Limited Directory Protocol:

1. Compact Directory Representation:


o Traditional directory protocols store information for all processors in the system,
which can lead to significant memory overhead in large-scale systems.
o LDP limits the number of processors that can be explicitly tracked for a cache
line. Instead of maintaining a full list of sharers, it uses a limited sharer list to
track only a subset of processors that have a copy of the data.
2. Fixed-Size Directory Entries:
o Each directory entry has a fixed size, with space for a limited number of sharers.
o If the number of sharers exceeds the directory's capacity, it enters a fallback
state, such as broadcasting invalidation or downgrading the block.
3. Coherence States:
o Like other directory protocols, LDP supports standard states such as Modified,
Shared, and Invalid (often abbreviated as MSI or MESI with Exclusive).
o The directory keeps track of the coherence state and the subset of processors
holding a valid copy.
4. Efficient for Sparse Sharing:
o LDP performs well in scenarios where data sharing is limited to a small number
of processors at any given time. This is common in many parallel applications
where data locality is high.

Operation of Limited Directory Protocol:

1. Read Request:

• If a processor requests a cache line, the directory checks its state:


o Shared or Uncached State: The requesting processor is added to the limited
sharer list, and the data is forwarded.
o Modified State: The directory requests the current owner to write back the
latest data, fetches it, and updates the state to Shared.

2. Write Request:

• For a write request, the protocol ensures exclusive access:


o If the data is in Shared State, invalidation messages are sent to all sharers in the
directory.
o The writer is granted exclusive ownership, and the state is updated to Modified.

3. Overflow Handling:

• When the number of sharers exceeds the directory's capacity:


o Broadcasting: The directory resorts to broadcasting coherence messages to all
processors, ensuring all sharers are updated or invalidated.
o Coarser Tracking: The directory might switch to a coarse-grained mode,
marking the cache line as globally shared without tracking individual sharers.
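The sketch below shows the core idea of a limited directory entry with an assumed capacity of two pointers and a broadcast fallback when the pointer set overflows. The capacity, the fallback policy, and the names are illustrative assumptions; real designs may instead evict one of the existing pointers or switch to coarse-grained tracking, as described above.

```python
# Illustrative limited-directory entry: at most MAX_PTRS explicit sharers,
# with a broadcast fallback on overflow. Names and policy are assumptions.
MAX_PTRS = 2

class LimitedEntry:
    def __init__(self):
        self.sharers = set()       # explicit pointers, at most MAX_PTRS
        self.broadcast = False     # overflow mode: no precise tracking

    def add_sharer(self, cpu):
        if self.broadcast:
            return                 # already coarse-grained; nothing to record
        if len(self.sharers) < MAX_PTRS:
            self.sharers.add(cpu)
        else:
            self.broadcast = True  # pointer overflow: give up precise tracking
            self.sharers.clear()

    def invalidate_targets(self, all_cpus):
        """Caches that must receive an invalidation on a write."""
        return sorted(all_cpus) if self.broadcast else sorted(self.sharers)

entry = LimitedEntry()
entry.add_sharer(0)
entry.add_sharer(1)
print(entry.invalidate_targets(range(4)))   # [0, 1]: precise invalidation
entry.add_sharer(2)                         # a third sharer overflows the entry
print(entry.invalidate_targets(range(4)))   # [0, 1, 2, 3]: broadcast fallback
```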
3. Chained Directory Protocol
The Chained Directory Protocol is a variant of directory-based coherence protocols that
uses a distributed and linked data structure to track cache coherence. Unlike centralized
directory protocols, where a single directory maintains coherence information, the chained
directory protocol links directories across distributed memory modules in a chain-like
structure.
Key Features
1. Distributed Directory: Each memory module maintains a local directory that tracks
which caches hold copies of its memory blocks. These directories are linked in a
chain.
2. Chaining Mechanism:
o Instead of broadcasting coherence messages to all processors or a centralized
directory, coherence requests (e.g., read, write, or invalidate) are propagated
through the chain.
o Each directory node forwards the request to the next node in the chain until
the coherence operation is completed.
3. Scalability:
o The chained structure reduces bottlenecks in centralized directories and scales
better with an increasing number of processors.
4. Reduced Network Traffic:
o By chaining requests instead of broadcasting them, the protocol minimizes
unnecessary message traffic, making it suitable for large-scale systems.

How It Works
1. Data Access and Directory Chain
o When a processor accesses a memory block, it checks its local cache.
o If the block is not found, a request is sent to the memory module that owns the
block. The local directory at the memory module identifies which processors
have cached copies of the block.
o If coherence actions are needed (e.g., invalidating other copies), the request
propagates through the chain of directories.
2. Read Operations
o If a processor requests a read, the directory checks whether the block is
available in shared state.
o If available, the block is sent to the requesting processor.
o If the block is modified in another cache, the protocol fetches the updated
block from that cache, writes it back to memory, and updates the requestor.
3. Write Operations
o For a write request, the protocol ensures that no stale copies exist in other
caches.
o It sends invalidate messages to all caches holding the block, propagating the
request through the chain.
4. Fault Tolerance
o If a link in the chain fails, the protocol can often reroute requests to maintain
functionality.
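A minimal way to picture the chain is as a linked list of cached copies headed by the memory-side directory entry, as in the sketch below. The singly linked structure and the names are assumptions made for illustration; real chained schemes differ in detail (some, for instance, use doubly linked lists to make deletions cheaper).

```python
# Illustrative chained-directory entry: the directory stores only a head
# pointer, and each cached copy stores a pointer to the next sharer.
class ChainedEntry:
    def __init__(self):
        self.head = None                 # most recently added sharer, if any

class CacheCopy:
    def __init__(self, cpu, next_cpu=None):
        self.cpu = cpu
        self.next = next_cpu             # next cache in the chain, or None

def add_sharer(entry, copies, cpu):
    # A new reader is linked at the head; each copy costs one pointer,
    # independent of the total number of processors.
    copies[cpu] = CacheCopy(cpu, next_cpu=entry.head)
    entry.head = cpu

def invalidate_chain(entry, copies):
    # A write walks the chain, invalidating each copy in turn.
    invalidated, cur = [], entry.head
    while cur is not None:
        invalidated.append(cur)
        cur = copies.pop(cur).next
    entry.head = None
    return invalidated

entry, copies = ChainedEntry(), {}
for cpu in (0, 3, 5):
    add_sharer(entry, copies, cpu)
print(invalidate_chain(entry, copies))   # [5, 3, 0]: request follows the chain
```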
5. Explain snoopy bus protocol
Ans: Snoopy bus Protocol:
The Snoopy Bus Protocol is a cache coherence mechanism used in multiprocessor systems
where multiple processors share a common bus and memory. The protocol ensures that data
consistency is maintained across all processor caches. It operates by "snooping", i.e.
monitoring the shared bus to observe memory transactions. Two key protocols used to
maintain coherency through bus snooping are the write-invalidate protocol and the
write-update protocol. Here's a detailed breakdown:
Write Invalidate Protocol
When multiple caches share the same memory, a protocol is needed to keep them in sync.
The Write Invalidate Protocol ensures this by:
• When a processor writes to a memory location, all other caches with that memory
line invalidate (nullify) their copies.
• This ensures no other processor reads outdated data, and only the writing processor
has the latest data.
It prevents mismatched or outdated information in the system.
Advantages
• Reduces Bandwidth Usage: Only invalidation messages are transmitted on the bus, and
these are much smaller than the actual data.
• Simpler Implementation: The protocol is relatively easy to implement, because
invalidation keeps the cache coherence problem simple.
• Prevents Stale Data Access: By invalidating the copies held in other caches, it ensures
that no processor reads outdated data.
Disadvantages
• Increased Cache Misses: Invalidating cached copies may result in more cache misses,
since other processors then have to re-read the data from main memory or from the
cache of the writing processor.
• Performance Overhead: Frequent invalidations can add latency when there is high
write contention.
Write Update Protocol
The Write Update Protocol (or Write Broadcast Protocol) is a cache coherence method
used in multiprocessor systems.
• When a processor writes to a memory location, it updates all other caches with the
new value instead of invalidating them.
• This ensures all caches have the latest data without needing to nullify any copies.
Advantages
• Reduced Cache Misses: Because the updated data is available in all the caches,
processors do not have to fetch it again, which reduces the number of cache misses.
• Improved Performance for Read-Heavy Workloads: Applications with frequent reads
benefit most, since the latest data is present in every cache.
• Consistent Data Across Caches: Ensures all caches hold up-to-date data, improving
data synchronization.
Disadvantages
• Higher Bandwidth Consumption: Broadcasting the updated data to all caches requires
more bus bandwidth than sending invalidation messages only.
• Complex Implementation: Coordinating the updates and ensuring that all caches hold
the correct data is more challenging.
• Potential for Increased Traffic: With many processors involved, the number of update
messages can grow large, causing congestion on the bus.
• Consider three processors (P1, P2, and Pn) maintaining consistent copies of a block X
in their local caches (Fig. a) and in the shared-memory module marked X.
• Using a write-invalidate protocol, the processor P1 modifies (writes) its cache from X
to X’, and all other copies are invalidated via the bus (denoted 1 in Fig. b). Invalidated
blocks are sometimes called dirty, meaning they should not be used.
• The write-update protocol (Fig.c) demands the new block content X' be broadcast to
all cache copies via the bus. The memory copy is also updated if write-through caches
are used. In using write-back caches, the memory copy is updated later at block
replacement time.
Write-Through Caches: The states of a cache block copy change with respect to read, write,
and replacement operations in the cache. The figure shows the state transitions for two basic
write-invalidate snoopy protocols, developed for write-through and write-back caches
respectively. A block copy of a write-through cache i attached to processor i can assume one
of two possible cache states: valid or invalid (Fig. a).
Write-Back Caches: The valid state of a write-back cache can be further split into two cache
states, labeled RW (read-write) and RO (read-only), as shown in Fig. b. The INV (invalidated
or not-in-cache) state is equivalent to the invalid state mentioned before. This three-state
coherence scheme corresponds to an ownership protocol.
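To make the two-state (valid/invalid) write-through protocol of Fig. a concrete, the sketch below models caches that snoop a shared bus and invalidate their copy when another processor writes. The bus is reduced to a simple loop over the caches, a single block X is tracked, and all names are assumptions for the example.

```python
# Minimal sketch of a write-invalidate snoopy protocol for write-through
# caches (two states: VALID / INVALID). Bus modelling and names are assumed.
class SnoopyCache:
    def __init__(self, cpu):
        self.cpu = cpu
        self.state = "INVALID"
        self.data = None

    def read(self, memory):
        if self.state == "INVALID":          # read miss: fetch from memory
            self.data = memory["X"]
            self.state = "VALID"
        return self.data

    def write(self, memory, value, bus):
        self.data = value
        self.state = "VALID"
        memory["X"] = value                  # write-through to main memory
        bus.broadcast_invalidate(self)       # others snoop and invalidate

class Bus:
    def __init__(self, caches):
        self.caches = caches

    def broadcast_invalidate(self, writer):
        for cache in self.caches:
            if cache is not writer and cache.state == "VALID":
                cache.state = "INVALID"      # snooped invalidation

memory = {"X": 10}
caches = [SnoopyCache(i) for i in range(3)]
bus = Bus(caches)
print([c.read(memory) for c in caches])     # every cache loads X = 10
caches[0].write(memory, 99, bus)            # the first cache writes X' = 99
print([c.state for c in caches])            # ['VALID', 'INVALID', 'INVALID']
print(caches[1].read(memory))               # a re-read now returns 99
```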

Write-Once Protocol: James Goodman proposed a cache coherence protocol for bus-based
multiprocessors. This scheme combines the advantages of both write-through and
write-back invalidations. In order to reduce bus traffic, only the very first write to a cache
block uses a write-through policy; subsequent writes are handled as write-back.
