MP IA2 Q and A
1. Flag Register of 80386
The 80386 is a microprocessor released by Intel in the mid-1980s. It is a 32-bit
processor that can execute a variety of instructions, including conditional instructions
and control instructions.
Here are six conditional flags of the 80386:
1. Carry Flag (CF): This flag is set if an arithmetic operation produces a carry out of
(or a borrow into) the most significant bit, i.e. the unsigned result does not fit in
the data type being used. For example, if two unsigned numbers are added and the
result is larger than the maximum value that can be represented by the data type,
the Carry Flag will be set.
2. Zero Flag (ZF): This flag is set if the result of an operation is zero. For example, if
two numbers are subtracted and the result is zero, the Zero Flag will be set.
3. Sign Flag (SF): This flag is set if the result of an operation is negative, i.e. the
most significant bit of the result is 1. For example, if two signed numbers are
subtracted and the result is negative, the Sign Flag will be set.
4. Overflow Flag (OF): This flag is set if the result of a signed arithmetic operation
falls outside the range of the data type being used. For example, if two signed
numbers are added and the result is larger than the maximum positive value that
can be represented by the data type, the Overflow Flag will be set.
5. Parity Flag (PF): This flag is set if the low-order byte of the result has an even
number of 1's in its binary representation. For example, if the result of an
operation is 01101010, which has four 1's, the Parity Flag will be set.
6. Auxiliary Carry Flag (AF): This flag is set if there is a carry out of the lower nibble
(4 bits) of the result of an arithmetic operation. For example, if two unsigned
numbers are added and there is a carry out of the lower nibble, the Auxiliary Carry
Flag will be set.
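
The six flag definitions above can be checked with a small sketch. It models an 8-bit addition only; the function name add8_flags and the dictionary layout are assumptions made for illustration, not part of the 80386 itself.

```python
def add8_flags(a: int, b: int) -> tuple:
    """Add two 8-bit values and derive the six status flags (illustrative model)."""
    a &= 0xFF
    b &= 0xFF
    full = a + b              # unbounded sum, used to detect the carry
    result = full & 0xFF      # what an 8-bit register would hold
    flags = {
        "CF": int(full > 0xFF),                        # carry out of bit 7
        "ZF": int(result == 0),                        # result is zero
        "SF": (result >> 7) & 1,                       # copy of the most significant bit
        # signed overflow: both operands differ in sign from the result
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
        "PF": int(bin(result).count("1") % 2 == 0),    # even number of 1's in the byte
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),     # carry out of the lower nibble
    }
    return result, flags


if __name__ == "__main__":
    for x, y in [(0x7F, 0x01), (0xFF, 0x01), (0x60, 0x0A)]:
        print(f"{x:#04x} + {y:#04x} ->", add8_flags(x, y))
```

The third pair produces 01101010, the parity example used in the text.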
Here are three control flags of the 80386:
1. Interrupt Flag (IF): This flag is used to enable or disable interrupts. If the Interrupt
Flag is set, the processor will respond to interrupt requests. If it is cleared, the
processor will ignore interrupt requests.
2. Direction Flag (DF): This flag is used to control the direction of string operations. If
the Direction Flag is set, string operations will decrement the memory pointer after
each iteration. If it is cleared, string operations will increment the memory pointer
after each iteration.
3. Trap Flag (TF): This flag is used for single-stepping through code during debugging.
If the Trap Flag is set, the processor will execute one instruction at a time and then
generate a trap interrupt. This allows a debugger to examine the state of the
processor after each instruction.
Here are four system flags of the 80386:
1. Input/Output Privilege Level (IOPL): The two IOPL bits are used by the processor
and the operating system to determine an application’s access to I/O facilities.
2. Nested Task (NT): This flag is set when one system task invokes another task.
3. Resume Flag (RF): This flag is used with the debug register breakpoints. It is
checked at the start of every instruction cycle and, if it is set, any debug fault is
ignored during that instruction cycle.
4. Virtual Mode Flag (VM): Indicates operating mode of 80386. When VM flag is set,
80386 switches from protected mode to virtual 8086 mode.
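
A sketch of how the flags listed in this answer sit inside the 32-bit flag register. The bit positions are not given in these notes; they follow Intel's documented EFLAGS layout, and the helper name decode_eflags is an assumption for illustration.

```python
# Bit positions follow Intel's documented EFLAGS layout (not stated in these notes).
EFLAGS_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7,   # status flags
    "TF": 8, "IF": 9, "DF": 10, "OF": 11,          # control flags and OF
    "NT": 14, "RF": 16, "VM": 17,                  # system flags
}


def decode_eflags(value: int) -> dict:
    """Split a raw EFLAGS value into named single-bit flags plus the 2-bit IOPL."""
    fields = {name: (value >> bit) & 1 for name, bit in EFLAGS_BITS.items()}
    fields["IOPL"] = (value >> 12) & 0b11          # IOPL occupies bits 12 and 13
    return fields


if __name__ == "__main__":
    # 0x3202: IF set, IOPL = 3 (bit 1 is a reserved bit that always reads as 1)
    print(decode_eflags(0x3202))
```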
2. Control Register of 80386
Control Registers: The control registers CR0-CR3 control various features. CR0,
CR2, and CR3 hold global machine status that affects all tasks in the system,
independent of the task being executed.
Control register CR0: It contains six control bits. Its lower 16 bits form the
Machine Status Word (MSW).
The six control bits are:
1) PG (Paging): It enables or disables the paging mechanism (PG = 1 enables paging).
2) EM (Emulate co-processor): This bit is made ‘1’ in the absence of a math
co-processor so that if a co-processor instruction is encountered, the processor
raises a coprocessor-not-available exception and the instruction can be executed
by a software emulator. If this bit is ‘0’, co-processor instructions are
executed by the 80387 or 80287, whichever is present in the system.
3) MP (Math co-processor present): This bit is made ‘1’ to indicate that a math
coprocessor is present.
4) TS (Task Switched): The processor sets TS = 1 whenever a task switch is
performed. The TSS of the current task then has a back link to the previous task.
5) PE (Protection Enable): This bit is set to ‘1’ to enter protected mode. On
reset this bit is ‘0’. It is the only bit of CR0 which is also available in Real mode.
6) ET (Extension Type): This bit informs the 80386 DX whether the numeric
processor is an 80287 or 80387. If ET = 0, it selects the 80287 co-processor, and
if ET = 1, it selects the 80387 co-processor.
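
As a rough illustration of the six CR0 bits, the sketch below decodes a raw CR0 value. The bit positions (PE = 0, MP = 1, EM = 2, TS = 3, ET = 4, PG = 31) are taken from Intel's documented CR0 layout rather than from these notes, and decode_cr0 is a hypothetical helper name.

```python
# Bit positions follow Intel's documented CR0 layout (not stated in these notes).
CR0_BITS = {"PE": 0, "MP": 1, "EM": 2, "TS": 3, "ET": 4, "PG": 31}


def decode_cr0(value: int) -> dict:
    """Return the six CR0 control bits by name."""
    return {name: (value >> bit) & 1 for name, bit in CR0_BITS.items()}


def machine_status_word(value: int) -> int:
    """The lower 16 bits of CR0."""
    return value & 0xFFFF


if __name__ == "__main__":
    cr0 = 0x80000011  # example value: paging on, 80387 selected, protected mode on
    print(decode_cr0(cr0), hex(machine_status_word(cr0)))
```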
Control register CR1: It is not used in the 80386 DX.
Control register CR2: It holds the linear address for which a page fault (the
required page not being present in physical memory) has occurred. Using this
address, the operating system can load the required page into physical memory
from secondary memory.
Control register CR3: (Page Directory Base Register-PDBR)
The 80386 implements a two-level page translation mechanism.
*Information about various pages is stored in various page tables.
*Addresses of these page tables are stored in the page directory.
*CR3 gives the base address (starting address) of the page directory.
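
A minimal sketch of the two-level translation described above, assuming the usual 10/10/12 split of a 32-bit linear address (directory index, table index, offset into a 4 KB page). The tables dictionary, the PageFault class and the function names are hypothetical stand-ins for real memory and hardware behaviour.

```python
class PageFault(Exception):
    """Raised when a required page is not present; cr2 mimics what CR2 would hold."""

    def __init__(self, linear: int):
        super().__init__(f"page fault at linear address {linear:#010x}")
        self.cr2 = linear


def split_linear(linear: int) -> tuple:
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    return (linear >> 22) & 0x3FF, (linear >> 12) & 0x3FF, linear & 0xFFF


def translate(linear: int, cr3: int, tables: dict) -> int:
    """Walk page directory -> page table -> page frame.

    tables maps the base address of a directory or table to {index: next base}.
    """
    dir_index, table_index, offset = split_linear(linear)
    try:
        page_table_base = tables[cr3][dir_index]          # CR3 locates the directory
        frame_base = tables[page_table_base][table_index]
    except KeyError:
        raise PageFault(linear) from None
    return frame_base + offset


if __name__ == "__main__":
    tables = {0x1000: {1: 0x2000}, 0x2000: {2: 0x5000}}
    print(hex(translate(0x00402034, cr3=0x1000, tables=tables)))
```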
3. Explain Protected mode operation in 80386
• All the capabilities of 80386 are available for utilization in its protected mode of
operation.
• The 80386 in protected mode supports all software written for the 80286 and 8086,
which executes under the control of the memory management and protection
abilities of the 80386.
• Protected mode allows the use of the additional instructions, addressing modes
and capabilities of the 80386.
ADDRESSING IN PROTECTED MODE
• The paging unit is a memory management unit enabled only in protected mode.
• The paging mechanism allows handling of large segments of memory in terms of
pages of 4Kbyte size.
• The paging unit operates under the control of segmentation unit.
• The paging unit, if enabled, converts linear addresses into physical addresses
in protected mode.
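
The order of the two units can be sketched as follows: the segmentation unit first forms a linear address, and the paging unit (only if enabled) then maps it to a physical address. This is a simplified model; the segment base/limit arguments, the flat page_map and both function names are assumptions for illustration.

```python
def to_linear(base: int, limit: int, offset: int) -> int:
    """Segmentation step: segment base + offset, with a limit check."""
    if offset > limit:
        # stands in for the protection fault the processor would raise
        raise ValueError("offset exceeds segment limit")
    return (base + offset) & 0xFFFFFFFF


def to_physical(linear: int, paging_enabled: bool, page_map: dict) -> int:
    """Paging step: map the 4 KB page number through page_map when paging is on."""
    if not paging_enabled:
        return linear                     # linear address is used as the physical address
    page, offset = linear >> 12, linear & 0xFFF
    return (page_map[page] << 12) | offset


if __name__ == "__main__":
    linear = to_linear(base=0x10000, limit=0xFFFF, offset=0x234)
    print(hex(linear), hex(to_physical(linear, True, {0x10: 0x80})))
```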
4. Draw and explain code cache in Pentium
Code Cache:
• It is an 8 KB cache dedicated to supply instructions to processor’s execution
pipeline.
• It is a 2-way set-associative cache with a line size of 32 bytes.
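
The three figures above (8 KB, 2-way, 32-byte lines) fix the cache geometry. The sketch below derives the number of sets and shows how an address would split into tag, set index and byte offset; the function name and the plain modulo indexing are assumptions for illustration.

```python
CACHE_BYTES = 8 * 1024   # total code cache size
WAYS = 2                 # 2-way set associative
LINE_BYTES = 32          # line size

SETS = CACHE_BYTES // (WAYS * LINE_BYTES)   # lines per way


def split_address(addr: int) -> tuple:
    """Return (tag, set index, byte offset within the line) for an address."""
    offset = addr % LINE_BYTES
    index = (addr // LINE_BYTES) % SETS
    tag = addr // (LINE_BYTES * SETS)
    return tag, index, offset


if __name__ == "__main__":
    print("sets:", SETS)
    print(split_address(0x12345))
```

With these parameters there are 128 sets, so 5 address bits select the byte and 7 bits select the set.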
Prefetch Buffers:
• Four prefetch buffers within the processor work as two independent pairs.
• When instructions are prefetched from the cache, they are placed into one set of
prefetch buffers.
• The other set is used when a branch operation is predicted.
• The prefetch buffer sends a pair of instructions to the instruction decoder.
Instruction Decode Unit:
• Decoding occurs in two stages: Decode1 (D1) and Decode2 (D2).
• D1 checks whether instructions can be paired.
• D2 calculates the addresses of memory-resident operands.
5. Explain instruction pairing rules for U and V pipeline in Pentium Integer Pipeline
• The pipelines are called ‘u’ and ‘v’ pipes.
• The u-pipe can execute any instruction, while the v-pipe can execute only “simple”
instructions as defined in the ‘Instruction Pairing Rules’.
• When instructions are paired, the instruction issued to the v-pipe is always the next
sequential instruction after the one issued to the u-pipe.
The integer pipeline stages are as follows:
1. Prefetch (PF):
• Instructions are prefetched from the on-chip instruction cache.
2. Decode1 (D1):
• Two parallel decoders attempt to decode and issue the next two sequential
instructions.
• It checks whether the instructions can be paired.
• It decodes the instruction to generate a control word.
• A single control word causes direct execution of an instruction.
• Complex instructions require microcoded control sequencing
3. Decode2 (D2):
• Decodes the control word.
• Addresses of memory-resident operands are calculated.
4. Execute (EX):
• The instruction is executed in the ALU.
• The data cache is accessed at this stage.
• Instructions that need both an ALU operation and a data cache access require
more than one clock.
5. Writeback (WB):
• The CPU stores the result and updates the flags
Integer Instruction Pairing Rules
• To issue two instructions simultaneously they must satisfy the following conditions:
• Both instructions in the pair must be ‘simple’.
• There must be no read-after-write (RAW) or write-after-write (WAW) register
dependencies between them.
RAW:
i1. R2 ← R1 + R3
i2. R4 ← R2 + R3
WAW:
i1. R2 ← R4 + R7
i2. R2 ← R1 + R3
• Neither instruction may contain both a displacement and an immediate.
• Instructions with prefixes (e.g., lock, repne) can only be issued to the u-pipe.
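
The pairing conditions listed above can be expressed as a small check. This is only a sketch of those rules: the Instr record and its field names are hypothetical, and the real processor applies further restrictions not covered in these notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical description of one instruction, just enough for the rules above."""
    dest: str                        # register written
    srcs: tuple                      # registers read
    simple: bool = True              # is it a 'simple' instruction?
    has_disp_and_imm: bool = False   # contains both a displacement and an immediate
    has_prefix: bool = False         # carries a prefix such as lock or repne


def can_pair(u: Instr, v: Instr) -> bool:
    """Return True if u (u-pipe) and the next sequential instruction v (v-pipe) may pair."""
    if not (u.simple and v.simple):
        return False
    if u.dest in v.srcs:             # RAW: v reads what u writes
        return False
    if u.dest == v.dest:             # WAW: both write the same register
        return False
    if u.has_disp_and_imm or v.has_disp_and_imm:
        return False
    if v.has_prefix:                 # prefixed instructions only go to the u-pipe
        return False
    return True


if __name__ == "__main__":
    raw = (Instr("R2", ("R1", "R3")), Instr("R4", ("R2", "R3")))
    waw = (Instr("R2", ("R4", "R7")), Instr("R2", ("R1", "R3")))
    print(can_pair(*raw), can_pair(*waw))
```

The two pairs in the usage lines are the RAW and WAW examples from the text; neither can be issued together.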
6. Explain branch prediction logic used in Pentium
• The Branch Target Buffer (BTB) is a look-aside cache that sits off to the side of the
D1 stages of the two pipelines and monitors for branch instructions.
• The first time that a branch instruction enters either pipeline, the BTB uses its
source memory address to perform a lookup in the cache.
• Since the instruction has not been seen before, this results in a BTB miss.
• This means the prediction logic has no history on the instruction.
• It then predicts that the branch will not be taken and program flow is not altered.
• Even unconditional jumps will be predicted as not taken the first time that they are
seen by BTB.
• When the instruction reaches the execution stage, the branch will be either taken
or not taken.
• If taken, the next instruction to be executed should be the one fetched from branch
target address.
• If not taken, the next instruction is the next sequential memory address
• When the branch is taken for the first time, the execution unit provides feedback
to the branch prediction logic.
• The branch target address is sent back and recorded in BTB.
• A directory entry is made containing the source memory address and history bits
set as strongly taken.
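
The sequence above (miss, predict not taken, allocate an entry as strongly taken once the branch is actually taken) can be modelled with a toy BTB. The class and method names are hypothetical, and the later history updates use a 2-bit saturating counter, which is an assumption beyond what these notes state.

```python
STRONGLY_TAKEN = 3   # 2-bit history: 3 = strongly taken ... 0 = strongly not taken


class BranchTargetBuffer:
    """Toy BTB keyed by the branch's source memory address."""

    def __init__(self):
        self.entries = {}   # source address -> [target address, history]

    def predict(self, source: int):
        """Return the predicted target, or None when the prediction is 'not taken'."""
        entry = self.entries.get(source)
        if entry is None:
            return None                      # BTB miss: no history, predict not taken
        target, history = entry
        return target if history >= 2 else None

    def update(self, source: int, taken: bool, target=None):
        """Feedback from the execution stage once the branch outcome is known."""
        entry = self.entries.get(source)
        if entry is None:
            if taken:                        # first taken branch: allocate the entry
                self.entries[source] = [target, STRONGLY_TAKEN]
            return
        if taken:
            entry[0] = target
            entry[1] = min(STRONGLY_TAKEN, entry[1] + 1)   # assumed saturating counter
        else:
            entry[1] = max(0, entry[1] - 1)


if __name__ == "__main__":
    btb = BranchTargetBuffer()
    print(btb.predict(0x100))          # first sighting: miss
    btb.update(0x100, taken=True, target=0x200)
    print(hex(btb.predict(0x100)))     # now predicted taken
```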
7. Explain Intel NetBurst microarchitecture
a) Front end:
Fetches the instructions, decodes them and sends them to the out-of-order execution
core.
There are three parts to it:
• Fetch/Decode Unit: Fetches instructions from the L2 cache, decodes them into
micro-ops and stores the micro-ops in the execution trace cache (the L1 instruction
cache).
• Execution Trace Cache: Stores decoded instructions, so when there is a
misprediction there is no need to re-decode the instructions and decode latency is
reduced.
• BTB/Branch Prediction: Determines the next instruction to be fetched from the L2
cache in case of a trace cache miss.
b) Integer and Floating-Point Units
This is the unit where the instructions are actually executed.
It has two parts:
L1 data cache: Used for both integer and FP loads and stores. It is a 4-way
set-associative, write-through cache (all data written to L1 is also written to L2).
It is 8 KB in size and very fast.
c) Execution unit:
Executes micro-ops, taking data from the L1 cache and leaving results in registers.
- Up to 4 integer arithmetic operations per clock cycle
- 1 floating-point operation per clock cycle
- A memory load and a store operation each clock cycle
d) Memory Subsystem:
- This includes the L2 cache and the system bus.
- The L2 cache stores both instructions and data that cannot fit in the Execution
Trace Cache and the L1 data cache.
- The system bus is used for accessing main memory when there is an L2 cache miss.
- It is also used for accessing I/O devices.
- System bus bandwidth: 3.2 GB/s
- System bus width: 64 bits
- System bus clock rate: 400 MHz
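
The three bus figures are consistent with each other, as a quick calculation shows. The sketch assumes one 64-bit transfer per 400 MHz cycle and 1 GB = 10^9 bytes; the function name is made up for illustration.

```python
def peak_bandwidth_bytes_per_s(width_bits: int, rate_hz: int) -> int:
    """Peak bandwidth = bytes per transfer x transfers per second."""
    return (width_bits // 8) * rate_hz


if __name__ == "__main__":
    bw = peak_bandwidth_bytes_per_s(64, 400_000_000)
    print(bw / 1e9, "GB/s")   # 8 bytes x 400 million transfers per second
```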
e) Out of Order Engine:
This is where the instructions are prepared for execution. It keeps the execution
units busy by allocating as many instructions as possible that have their operands ready.
There are two parts to it:
• Out of order Execution Logic:
- Allows maximum Utilization
- Schedules micro-ops
- Based on data dependence and resources
- May speculatively execute
- Execute independent instructions that are ready to execute.
• Retirement Unit
- Ensures that the instructions are put back in order.
- The retirement unit reorders the instructions, executed in an out-of-order manner,
back to the original program order.
- This logic also reports branch history information to the branch predictors at the
front end of the machine so they can train with the latest known-good branch-
history information.
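
A toy model of the idea above: micro-ops start as soon as their operands are ready, so they may finish out of order, but they retire strictly in program order. It assumes unlimited execution units and made-up latencies; the function name schedule and the tuple format are hypothetical.

```python
def schedule(program):
    """program: list of (name, deps, latency) in program order.

    Returns (finish cycle per op, completion order, retire cycle per op).
    Assumes unlimited execution units, so only data dependences delay an op.
    """
    finish = {}
    for name, deps, latency in program:
        start = max((finish[d] for d in deps), default=0) + 1   # wait for operands
        finish[name] = start + latency - 1

    # Out-of-order completion: sort by finish cycle (ties keep program order).
    completion_order = sorted(finish, key=lambda n: finish[n])

    # In-order retirement: an op retires only after all earlier ops have retired.
    retire_cycle, last = {}, 0
    for name, _, _ in program:
        last = max(last, finish[name])
        retire_cycle[name] = last
    return finish, completion_order, retire_cycle


if __name__ == "__main__":
    prog = [("load", (), 3), ("add", ("load",), 1), ("mul", (), 1), ("sub", ("mul",), 1)]
    finish, done, retired = schedule(prog)
    print("completion order:", done)
    print("retire cycles:", retired)
```

In this example mul and sub finish before the older load, yet none of them retires before load does.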
8. Give significance of Hyper-Threading Technology used in Pentium 4
• HT technology enables a single physical processor to execute two or more
separate code streams (called threads) concurrently.
• HT technology allows one physical processor to appear as two or more logical
processors to software (OS and applications).
• HT technology is one form of hardware multithreading capability of a processor.
• Each logical processor has its own architecture state, with its own set of
general-purpose and control registers and some machine state registers.
• Logical processors share a single set of physical resources (caches,
execution units, branch predictors, control logic and buses).
• The OS views logical processors as physical processors.
• It schedules threads to logical processors as in a multiprocessor system.
• Each logical processor has its own interrupt controller (interrupts sent to a
specific logical processor are handled only by it).
9. Compare RM, PM and VM
| Sr. No. | Real Mode | Protected Mode | Virtual Mode |
|---|---|---|---|
| 1. | Only one task can be executed at any given instant. | Multiple tasks can be executed simultaneously. | Only one task can be executed at any given instant. |
| 2. | Switching between real and protected mode requires a complicated process. | Switching between protected and virtual mode is easier compared to real mode. | Switching between virtual and protected mode is easy compared to that of real mode. |
| 3. | Maximum memory accessible is 1 MB + 64 KB - 16 bytes. | Memory accessible is the entire 4 GB. | Memory accessible is the entire 4 GB. |
| 4. | Memory addressing is similar to that of the 8086. | Memory addressing is done using descriptors and selectors. | Memory accessing appears similar to that of the 8086. |
| 5. | No protection amongst tasks. | Protection is implemented amongst tasks. | No protection amongst tasks. |
| 6. | It is the default operating mode on reset. Its main function is to initialize the 80386 for protected mode operation. | It allows system software to utilize memory segmentation, paging, multitasking and protection. | It allows execution of real mode applications that are incapable of running directly in protected mode while the processor is running a protected mode operating system. |
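
The real mode limit quoted in row 3 of the comparison follows from 8086-style addressing, where the physical address is segment x 16 + offset. The sketch below checks the arithmetic; it assumes no wraparound at 1 MB (address line A20 enabled), and the function name is illustrative.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """8086-style address: 16-bit segment shifted left by 4, plus 16-bit offset."""
    return (segment << 4) + offset   # assumes no wraparound at 1 MB


if __name__ == "__main__":
    top = real_mode_address(0xFFFF, 0xFFFF)   # highest reachable address
    print(hex(top), top + 1 == 1024 * 1024 + 64 * 1024 - 16)
```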