21CS1302 - Camp - Unit 4 Notes
COMMON SIGNALS
The common signals for minimum and maximum mode are listed in Table. The lower
sixteen lines of the address are multiplexed with data, and the upper four lines of the
address are multiplexed with status signals. During the first clock period of a bus
cycle, the entire 20-bit address is available on these lines. During all other clock
periods of a bus cycle, the data and status signals are available on these lines.
The status signals S3 and S4 specify the segment register used for calculating the
physical address. The outputs on S3 and S4 when the processor is accessing the
various segments are listed in Table.
3. BASIC CONFIGURATION
The 8086 can operate in two modes: Minimum mode and Maximum mode.
The mode is decided by a signal at MN/MX pin.
When the MN/MX is tied high, it works in minimum mode and the system is
called a uniprocessor system.
When MN/MX is tied low, it works in maximum mode and the system is called
a multiprocessor system.
MINIMUM MODE SIGNALS [MN/MX = VCC (Logic high)]
The minimum mode signals of an 8086 are listed in Table. For minimum mode of
operation, the MN/MX pin is tied to VCC (logic high). In minimum mode, the 8086
itself generates all bus control signals. The minimum mode signals are explained
below:
DT/R (Data Transmit / Receive)- It is an output signal from the processor to control
the direction of data flow through the data transceivers.
DEN (Data Enable) - It is an output signal from the processor used as output enable
for the data transceivers.
ALE (Address Latch Enable) - It is used to demultiplex the address and data lines
using external latches.
WR (Write) - It is a write control signal and it is asserted low whenever the processor
writes data to memory or IO port.
INTA (Interrupt Acknowledge) - The 8086 outputs low on this line to acknowledge
that it has accepted an interrupt request.
HOLD - It is an input signal to the processor from other bus masters, requesting
control of the bus. It is usually used by a DMA controller to gain control of the
bus.
MAXIMUM MODE SIGNALS [MN/MX = GND (Logic low)]
The signals explained below are available when the 8086 operates in maximum mode.
RQ/GT0, RQ/GT1 (Bus Request/Bus Grant) - These pins are used by other local bus
masters to force the processor to release the local bus at the end of the
processor's current bus cycle. The pins are bidirectional. A request on RQ/GT0 has
higher priority than a request on RQ/GT1.
LOCK - It is an active-low output signal, activated by the LOCK prefix instruction.
It remains low until the instruction prefixed by LOCK completes. While LOCK is low,
other bus masters are prevented from gaining control of the system bus.
QS1, QS0 (Queue Status) - The processor provides the status of the queue on these
lines. An external device can use the queue status to track the internal status of the
queue in the 8086. QS0 and QS1 are valid during the clock period following any
queue operation. The output on QS1 and QS0 can be interpreted as follows:
QS1  QS0  Queue status
0    0    No operation
0    1    First byte of an opcode taken from the queue
1    0    Queue is emptied
1    1    Subsequent byte taken from the queue
S0, S1, S2 (Status Signals) - These are status signals and they are used by the 8288
bus controller to generate the bus timing and control signals. The status signals are
decoded as shown in Table.
4. 8086 MICROPROCESSOR ARCHITECTURE
The 8086 is a 16-bit microprocessor with a 20-bit address bus and a 16-bit data bus.
Thus it can directly access 2^20 = 1,048,576 (1 MB) memory locations and can
read/write 8-bit or 16-bit data from/to memory or I/O.
The internal architecture of 8086 has two functional units:
1. Bus interface unit (BIU)
2. Execution unit (EU)
The BIU and EU function independently.
The BIU fetches instructions, reads data from memory and I/O ports, and
writes data to memory and I/O ports. The BIU contains segment registers,
instruction pointer, instruction queue, address generation unit and bus control
unit.
The EU executes instructions that have already been fetched by the BIU.
BUS INTERFACE UNIT:
The BIU performs all bus operations for the execution unit, and is responsible for
executing all bus cycles. The BIU contains Bus interface logic, Segment Registers,
Instruction Pointer and an Instruction Queue.
Segment Registers
In the 8086, the 1 MB physical memory is accessed through four segments at a time:
Code Segment, Data Segment, Stack Segment and Extra Segment. Each segment has a
memory space of up to 64 KB. Each segment is addressed by its own 16-bit segment
register: CS, DS, SS and ES respectively.
The 8086 memory address is 20 bits. The segment register supplies the higher-order
16 bits of the 20-bit memory address. Every memory address of the 8086 is computed
by shifting the contents of the segment register left by 4 bits and adding the
16-bit offset address.
Instruction Pointer
This register is also referred to as the program counter.
It is used to calculate the actual memory addresses of instructions.
It stores the offset of the next instruction within the code segment.
During an instruction fetch, the code segment register contents are shifted
left by 4 bits and the IP contents are added to them.
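The address calculation described above can be sketched in a few lines of Python. This is only an illustration, not part of the original notes: the function name physical_address and the register values are assumptions chosen for the example.

```python
def physical_address(segment: int, offset: int) -> int:
    """Return the 20-bit physical address for a 16-bit segment:offset pair."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # Shift the segment left by 4 bits (multiply by 16), add the offset,
    # and keep 20 bits because the 8086 has only 20 address lines.
    return ((segment << 4) + offset) & 0xFFFFF


cs, ip = 0x2000, 0x0100  # hypothetical CS and IP contents
print(hex(physical_address(cs, ip)))  # prints 0x20100
```

The final mask models the wrap-around that occurs when the sum exceeds 20 bits, for example with segment FFFFH and a large offset.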
Instruction Queue
The 8086 overlaps instruction fetch with instruction execution (pipelining).
When the EU is busy decoding or executing the current instruction, the buses of the
8086 may not be in use.
At that time, the BIU can use the buses to fetch up to six instruction bytes for the
following instructions.
The BIU stores these pre-fetched bytes in a FIFO register called the Instruction Queue.
When the EU is ready for its next instruction, it simply reads the instruction from
the queue in the BIU.
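The FIFO behaviour of the queue can be modelled as below. This is a simplified sketch under stated assumptions: the class InstructionQueue and its method names are hypothetical, memory is a plain bytes object, and the model fetches one byte at a time, whereas the real BIU timing is more involved.

```python
from collections import deque

QUEUE_SIZE = 6  # the queue holds up to six instruction bytes


class InstructionQueue:
    def __init__(self):
        self._bytes = deque()

    def prefetch(self, memory: bytes, address: int) -> int:
        """BIU side: fetch bytes from `address` until the queue is full.

        Returns the address of the next byte that has not been fetched yet.
        """
        while len(self._bytes) < QUEUE_SIZE and address < len(memory):
            self._bytes.append(memory[address])
            address += 1
        return address

    def next_byte(self):
        """EU side: take the oldest byte (first in, first out)."""
        return self._bytes.popleft() if self._bytes else None


memory = bytes(range(0x10, 0x20))      # hypothetical instruction bytes
queue = InstructionQueue()
next_fetch = queue.prefetch(memory, 0)  # fills the queue with six bytes
first = queue.next_byte()               # EU consumes the oldest byte
next_fetch = queue.prefetch(memory, next_fetch)  # BIU refills the free slot
```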
EXECUTION UNIT:
The execution unit (EU) contains the complete infrastructure required to execute an
instruction, i.e. Instruction Decoder, Control Circuitry Unit, Arithmetic Logic Unit,
General Purpose Registers, Flag Registers, Pointer Registers, and Index Registers.
Instruction Decoder
The Instruction Decoder translates instructions fetched from memory into a series of
actions which EU carries out.
BX register:
In addition to serving as a general purpose register, BX can be used as a
base register while computing the data memory address.
CX register:
In addition to serving as a general purpose register, it can be used to hold the
count in multi-iteration instructions. Several 8086 instructions can be made
to repeat or to loop. In such instructions, CX holds the desired number of
repetitions and is automatically decremented after each iteration. When
CX becomes zero, the execution of the instruction is terminated.
DX register:
In addition to serving as a general-purpose data register, DX may be used
in I/O instructions, multiply and divide instructions. DX contains the
addresses of the I/O ports in certain types of I/O instructions. In 32-bit
multiply and divide instructions, DX is used to hold the high-order word
operand.
Flag Registers
The 8086's PSW contains 16 bits, but 7 of them are not used. Each bit in the
PSW is called a flag. The 8086 flags are divided into the conditional flags, which
reflect the result of the previous operation involving the ALU, and the control
flags, which control the execution of special functions.
Conditional Flags :
CY (Carry Flag) - An addition causes this flag to be set if there is a carry out
of the MSB, and a subtraction causes it to be set if a borrow is needed.
P (Parity Flag) - It is set to 1 if the low-order 8 bits of the result contain an
even number of 1s; otherwise it is cleared.
AC (Auxiliary Carry Flag) - It is set if there is a carry out of bit 3 during
an addition or a borrow by bit 3 during a subtraction. This flag is used
exclusively for BCD arithmetic.
Z (Zero Flag) - It is set to 1 if the result of an operation is 0, and reset
to 0 if the result is nonzero.
S (Sign Flag) - After an arithmetic or logic operation, the sign flag is set if
the MSB of the result is 1, which indicates that the result is negative.
O (Overflow Flag) - It is set if an overflow occurs, i.e., a signed result is out
of range. More specifically, for addition this flag is set when there is a carry
into the MSB and no carry out of the MSB, or vice versa. For subtraction, it is set
when the MSB needs a borrow and there is no borrow from the MSB, or vice
versa.
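The flag definitions above can be checked with a small model of an 8-bit addition. This is an illustrative sketch, not processor documentation: the function name add8_flags and the dictionary layout are assumptions, and the flag names CF, PF, AF, ZF, SF and OF stand for CY, P, AC, Z, S and O.

```python
def add8_flags(a: int, b: int) -> dict:
    """Add two 8-bit values and report the conditional flags."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    carry_out = total > 0xFF                           # carry out of the MSB
    carry_into_msb = ((a & 0x7F) + (b & 0x7F)) > 0x7F  # carry from bit 6 to 7
    return {
        "result": result,
        "CF": int(carry_out),
        "PF": int(bin(result).count("1") % 2 == 0),    # even number of 1s
        "AF": int(((a & 0x0F) + (b & 0x0F)) > 0x0F),   # carry out of bit 3
        "ZF": int(result == 0),
        "SF": result >> 7,                             # copy of the MSB
        "OF": int(carry_into_msb != carry_out),
    }


print(add8_flags(0x7F, 0x01))  # signed overflow: +127 + 1
print(add8_flags(0xFF, 0x01))  # carry out and zero result
```

In the first call the result is 80H, so the sign and overflow flags are set while the carry flag stays clear. In the second call the result is 00H, so the carry and zero flags are set and the overflow flag is clear.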
Control Flags:
T (Trap Flag) - If set, it puts the processor into single-step mode for
debugging.
I (Interrupt Enable Flag) - This flag enables the 8086 to recognize
external interrupt requests. When IF = 0, all maskable interrupts are disabled.
It has no effect on either non-maskable interrupts or internally generated
interrupts.
D (Direction Flag) - It is used with string instructions. When set, it causes the
string instructions to auto-decrement, i.e. to process the string from right to left.
Otherwise the string instructions auto-increment, i.e. process the string from left
to right.
Pointer Registers
The registers in this group are:
Stack pointer (SP)
Base pointer (BP)
Stack pointer (SP):
SP register holds a 16-bit offset from the start of stack segment to
the top of the stack
The stack pointer is used in instructions which use stack, i.e.
PUSH, POP, CALL, RET, etc.
It always points to a location in memory known as stack top.
Base Pointer (BP)
The chief purpose of this register is to provide indirect access to data in
the stack segment.
It may also be used for general data storage
Index Registers
The registers in this group are:
Source index (SI)
Destination Index (DI)
Source index (SI) and Destination Index (DI)
These registers may be used for general data storage. However, the main purpose of
these registers is to store offsets in indexed, based indexed and relative based
indexed addressing modes.
Ex:
MOV AX,[SI] ; AL ← [SI] ; AH ← [SI+1]
Ex:
MOV AX,[BP+2] ; AL ← [BP+2] ; AH ← [BP+3]
Ex:
MOV AX,[BX+SI+6] ; AL ← [BX+SI+6] ; AH ← [BX+SI+7]
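The last example can be traced with a short Python model. It is a sketch under simplifying assumptions: the segment base is ignored (only the 16-bit offset within one segment is modelled), and the helper names effective_address and read_word, the register values and the memory contents are all hypothetical.

```python
def effective_address(base: int = 0, index: int = 0, displacement: int = 0) -> int:
    """Offset within the segment, wrapped to 16 bits."""
    return (base + index + displacement) & 0xFFFF


def read_word(memory: bytearray, offset: int) -> int:
    """Read a 16-bit word stored low byte first."""
    low = memory[offset]                   # byte that ends up in AL
    high = memory[(offset + 1) & 0xFFFF]   # byte that ends up in AH
    return (high << 8) | low


memory = bytearray(0x10000)   # one 64 KB segment
memory[0x001A] = 0x34         # hypothetical data
memory[0x001B] = 0x12
bx, si = 0x0010, 0x0004       # hypothetical register values

# Models MOV AX,[BX+SI+6]
ax = read_word(memory, effective_address(base=bx, index=si, displacement=6))
print(hex(ax))  # prints 0x1234
```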
Operand - The data on which the operation acts. Operands may be register values or
memory values. The CPU executes the instruction using the information present in this
field. It may be 8-bit data or 16-bit data.
Assembler - It converts the instructions into a sequence of binary bits, so that these
bits can be read by the processor.
Mnemonic - A symbolic code for an instruction or command that performs a
particular function.
E.g. MOV, ADD, SUB etc.
Instruction  Description
MOV          Moves data from register to register, register to memory, memory to
             register, memory to accumulator, accumulator to memory, etc.
             Ex: MOV AX, BX
             Ex: MOV AX, 5000H
LDS          Loads a word from the specified memory locations into the specified
             register. It also loads a word from the next two memory locations
             into the DS register.
             Ex: LDS REG, MEM
LES          Loads a word from the specified memory locations into the specified
             register. It also loads a word from the next two memory locations
             into the ES register.
             Ex: LES REG, MEM
LAHF         Loads the low-order 8 bits of the flag register into the AH register.
             Ex: LAHF
SAHF         Stores the content of the AH register into the low-order bits of the
             flag register.
             Ex: SAHF
XCHG         Exchanges the contents of the specified 16-bit or 8-bit register with
             the contents of the AX register, a specified register or memory
             locations.
             Ex: XCHG [5000H], AX
POP          Pops (reads) two bytes from the top of the stack and keeps them in a
             specified register or memory location(s).
             Ex: POP AX
             Ex: POP [5000H]
PUSHF        Pushes (writes) two bytes from the flag register to the top of the
             stack.
             Ex: PUSHF
POPF         Pops (reads) two bytes from the top of the stack and keeps them in
             the flag register.
             Ex: POPF
2. Arithmetic Instructions
Instructions of this group perform addition, subtraction, multiplication, division,
increment, decrement, comparison, ASCII and decimal adjustment, etc. The following
instructions come under this category:
Instruction  Description
ADC          Adds the specified operands and the carry status (i.e. carry of the
             previous stage).
             Ex: ADC AX, BX
             Ex: ADC AX, 0100H
             Ex: ADC AX, [1000H]
SBB          Subtracts the source operand (immediate data, register or memory)
             and the borrow (carry flag) from the destination operand.
             Ex: SBB AX, BX
             Ex: SBB AX, 0100H
             Ex: SBB AX, [1000H]
DAA          Decimal Adjust after BCD Addition: When two BCD numbers are added,
             DAA is used after the ADD or ADC instruction to get the correct
             answer in BCD.
             Ex: DAA
DAS          Decimal Adjust after BCD Subtraction: When two BCD numbers are
             subtracted, DAS is used after the SUB or SBB instruction to get the
             correct answer in BCD.
             Ex: DAS
AAA          ASCII Adjust for Addition: When the ASCII codes of two decimal digits
             are added, AAA is used after the addition to get the correct answer
             in unpacked BCD.
             Ex: AAA
AAD          Adjust AX Register for Division: It converts two unpacked BCD digits
             in AX to the equivalent binary number. It is used before the division.
             Ex: AAD
AAM          Adjust result of BCD Multiplication: This instruction is used after
             the multiplication of two unpacked BCD digits.
             Ex: AAM
AAS          ASCII Adjust for Subtraction: This instruction is used to get the
             correct result in unpacked BCD after the ASCII code of one number is
             subtracted from the ASCII code of another number.
             Ex: AAS
NEG          Obtains the 2's complement (i.e. negative) of the content of a
             specified 8-bit or 16-bit register or memory location(s).
             Ex: NEG AL
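The effect of ADD followed by DAA on packed BCD operands can be modelled as below. This is a sketch of the commonly documented adjustment rule, written from memory rather than taken from these notes, so treat it as an assumption; the function name add_packed_bcd is hypothetical.

```python
def add_packed_bcd(al: int, operand: int):
    """Model ADD AL, operand followed by DAA. Returns (AL, CF)."""
    al &= 0xFF
    operand &= 0xFF
    total = al + operand
    af = ((al & 0x0F) + (operand & 0x0F)) > 0x0F  # carry out of bit 3
    cf = total > 0xFF                              # carry out of bit 7
    result = total & 0xFF                          # AL after the binary ADD

    old_al, old_cf = result, cf
    # Low digit is not a valid BCD digit, or it produced a carry: add 6.
    if (result & 0x0F) > 9 or af:
        result += 0x06
    # High digit overflowed the BCD range: add 60H and report a carry.
    if old_al > 0x99 or old_cf:
        result += 0x60
        cf = True
    return result & 0xFF, int(cf)


al, cf = add_packed_bcd(0x38, 0x45)  # decimal 38 + 45
print(hex(al), cf)                   # prints 0x83 0
```

The binary sum 7DH is not valid BCD; adding 6 to the low digit gives 83H, which is the decimal answer 83.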
3. Logical Instructions
Instructions of this group perform logical AND, OR, XOR, NOT and TEST
operations. The following instructions come under this category:
Instruction  Description
AND          Performs bit by bit logical AND operation of two operands and places
             the result in the specified destination.
             Ex: AND AX, BX
             Ex: AND AX, 0100H
             Ex: AND AX, [1000H]
OR           Performs bit by bit logical OR operation of two operands and places
             the result in the specified destination.
             Ex: OR AX, BX
             Ex: OR AX, 0100H
             Ex: OR AX, [1000H]
XOR          Performs bit by bit logical XOR operation of two operands and places
             the result in the specified destination.
             Ex: XOR AX, BX
             Ex: XOR AX, 0100H
             Ex: XOR AX, [1000H]
TEST         Performs logical AND operation of a specified operand with another
             specified operand. Only the flags are updated; the result is not
             stored and neither operand is changed.
             Ex: TEST AX, BX
4. Rotate Instructions
The following instructions come under this category:
Instruction  Description
RCL          Rotates all bits of the operand left by the specified number of bits
             through the carry flag.
             Ex: RCL CX, 1
             Ex: RCL BL, CL
RCR          Rotates all bits of the operand right by the specified number of bits
             through the carry flag.
             Ex: RCR CX, 1
             Ex: RCR BL, CL
ROL          Rotates all bits of the operand left by the specified number of bits.
             Ex: ROL CX, 1
             Ex: ROL BL, CL
ROR          Rotates all bits of the operand right by the specified number of bits.
             Ex: ROR CX, 1
             Ex: ROR BL, CL
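The difference between a plain rotate and a rotate through the carry flag is easiest to see on a concrete value. The following is an illustrative 8-bit model; the function names rol8 and rcl8 and the tuple return value (result, CF) are assumptions made for this sketch.

```python
def rol8(value: int, count: int, cf: int = 0):
    """ROL on an 8-bit value: the MSB re-enters at the LSB and is copied to CF."""
    value &= 0xFF
    for _ in range(count):
        cf = (value >> 7) & 1
        value = ((value << 1) & 0xFF) | cf
    return value, cf


def rcl8(value: int, count: int, cf: int):
    """RCL on an 8-bit value: CF takes part, giving a 9-bit rotation."""
    value &= 0xFF
    for _ in range(count):
        new_cf = (value >> 7) & 1
        value = ((value << 1) & 0xFF) | cf  # old CF enters at the LSB
        cf = new_cf                         # old MSB becomes the new CF
    return value, cf


print(rol8(0x81, 1))      # prints (3, 1)
print(rcl8(0x81, 1, 0))   # prints (2, 1)
```

With ROL the bit shifted out of the MSB reappears immediately in the LSB. With RCL it is held in the carry flag and enters the operand only on the next rotation.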
5. Shift Instructions
The following instructions come under this category:
Instruction  Description
SAL or SHL   Shifts each bit of the operand left by the specified number of bits
             and puts zero in the LSB position.
             Ex: SAL CX, 1
             Ex: SAL AX, CL
SAR          Shifts each bit of the operand right by the specified number of bits
             and copies the old MSB into the new MSB.
             Ex: SAR CX, 1
             Ex: SAR AX, CL
SHR          Shifts each bit of the operand right by the specified number of bits
             and puts zero in the MSB position.
             Ex: SHR CX, 1
             Ex: SHR AX, CL
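The three shifts differ only in what is shifted in. The sketch below models them on 8-bit values; the function names shl8, shr8 and sar8 and the (result, CF) return value are assumptions for illustration, with CF holding the last bit shifted out.

```python
def shl8(value: int, count: int):
    """SAL/SHL: shift left, zero enters the LSB, old MSB goes to CF."""
    value &= 0xFF
    cf = 0
    for _ in range(count):
        cf = (value >> 7) & 1
        value = (value << 1) & 0xFF
    return value, cf


def shr8(value: int, count: int):
    """SHR: shift right, zero enters the MSB, old LSB goes to CF."""
    value &= 0xFF
    cf = 0
    for _ in range(count):
        cf = value & 1
        value >>= 1
    return value, cf


def sar8(value: int, count: int):
    """SAR: shift right, the old MSB (sign bit) is copied into the new MSB."""
    value &= 0xFF
    cf = 0
    for _ in range(count):
        cf = value & 1
        value = (value >> 1) | (value & 0x80)
    return value, cf


print(shr8(0x81, 1))  # prints (64, 1)
print(sar8(0x81, 1))  # prints (192, 1)
```

Because SAR preserves the sign bit, it divides a signed number by two per shift, while SHR does the same for an unsigned number.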
6. Branch Instructions
These are also called program execution transfer instructions. Instructions of this
group transfer program execution from the normal sequence of instructions to the
specified destination or target. The following instructions come under this category:
Instruction Description
CALL Calls a procedure whose address is given in the instruction and saves
its return address on the stack.
Here,
CF = Carry Flag
ZF = Zero Flag
OF = Overflow Flag
SF = Sign Flag
CX = Count register
ESC Escape: makes the bus free for an external master such as a coprocessor or
peripheral device.
WAIT When the WAIT instruction is executed, the processor enters an idle state in
which it does no processing. It remains idle until the TEST pin goes low.
LOCK It is a prefix instruction. It keeps the LOCK pin low until the execution of
the next instruction is complete.
Instruction Description
CLC Clear Carry Flag: This instruction resets the carry flag CF to 0.
CLD Clear Direction Flag: This instruction resets the direction flag DF to 0.
CLI Clear Interrupt Flag: This instruction resets the interrupt flag IF to 0.
9. String Instructions
A string is a series of bytes or words stored in sequential memory locations. The
8086 provides instructions which handle string operations such as string
movement, comparison, scan, load and store.
Instruction Description
c) Bus Timings
During T1:
• The address is placed on the Address/Data bus.
• Control signals M/IO, ALE and DT/R specify memory or I/O, latch the
address onto the address bus and set the direction of data transfer on the data bus.
During T2:
• The 8086 issues the RD or WR signal, DEN, and, for a write, the data.
• DEN enables the memory or I/O device to receive the data for writes and the
8086 to receive the data for reads.
During T3:
• This cycle is provided to allow memory to access data.
• READY is sampled at the end of T2.
• If low, T3 becomes a wait state.
• Otherwise, the data bus is sampled at the end of T3.
During T4:
• All bus signals are deactivated, in preparation for the next bus cycle.
• Data is sampled for reads, writes occur for writes.
Setup time – The time before the rising edge of the clock during which the data
must be valid and constant.
Hold time – The time after the rising edge of the clock during which the data must
remain valid and constant.
e) Wait State
A wait state (Tw) is an extra clocking period, inserted between T2 and T3, to
lengthen the bus cycle, allowing slower memory and I/O components to
respond.
The READY input is sampled at the end of T2, and again, if necessary in the
middle of Tw. If READY is ‘0’ then a Tw is inserted.
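The insertion of wait states can be illustrated by listing the clock states of one bus cycle for a given sequence of READY samples. This is a sketch that follows the description above, not a timing-accurate model; the function name bus_cycle and the list-of-samples input are assumptions.

```python
def bus_cycle(ready_samples):
    """Return the sequence of clock states for one bus cycle.

    ready_samples holds the READY values in the order they are sampled:
    the first at the end of T2, each later one during a wait state.
    1 means ready, 0 means not ready.
    """
    states = ["T1", "T2"]
    for ready in ready_samples:
        if ready:
            return states + ["T3", "T4"]
        states.append("Tw")  # READY was 0, so one wait state is inserted
    raise ValueError("READY never went high; the bus cycle cannot complete")


print(bus_cycle([1]))        # no wait states
print(bus_cycle([0, 0, 1]))  # two wait states for a slow device
```

A fast memory that keeps READY high gives the basic four-state cycle. Each sample that finds READY low lengthens the cycle by one clock period.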
8. ASSEMBLER DIRECTIVES
1. ASSUME - The ASSUME directive tells the assembler which logical segment
name to use for a specified segment register.
7. END (End program) - This directive indicates to the assembler that this is the
end of the program module. The assembler ignores any statements after an
END directive.
9. ENDS (End Segment) - This directive is used with the name of the segment to
indicate the end of that logical segment.
10. EQU (Equate) - The EQU directive is used to give a name to some value or to
a symbol.
11. PROC (Procedure) - The PROC directive is used to identify the start of a
procedure.
12. PTR (Pointer) - The PTR operator is used to assign a specific type to a variable
or to a label.
13. ORG (Originate) - The ORG statement changes the starting offset address of
the data.
Directives Examples
1. ASSUME CS:CODE (CS => code segment)
2. ORG 3000
3. NAME DB 'COMPUTER'
4. NUMBER DD 12341234H
9. MODULAR PROGRAMMING
Generally, industry programming projects consist of thousands of lines of
instructions or operation code.
2. PROCEDURES
Calls, Returns, and Procedure Definitions
The branch to a procedure is referred to as the call, and the corresponding branch
back is known as the return. The return is always made to the instruction
immediately following the call regardless of where the call is located. If, as shown
in Fig. 1.29(a), several calls are made to the same procedure, the return after each
call is made to the instruction following that call.
Use of procedures
The CALL instruction branches to the indicated address, and also pushes the
return address onto the stack. The RET instruction pops the return address
from the stack and resumes execution at that address.
The addressing modes for the CALL instruction are the same as those for the
JMP instruction. A CALL may be direct or indirect and intrasegment or
intersegment.
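The call and return mechanism can be sketched with a tiny model of the instruction pointer and the stack. This is a simplified illustration covering only the direct intrasegment case: the class TinyCPU, its method names and the addresses are hypothetical, a Python list stands in for the stack segment, and a CALL length of 3 bytes is assumed.

```python
class TinyCPU:
    def __init__(self):
        self.ip = 0       # offset of the instruction being executed
        self.stack = []   # Python list standing in for the stack segment

    def call(self, target: int, instruction_length: int = 3):
        """CALL: push the offset of the instruction after the CALL, then branch."""
        self.stack.append((self.ip + instruction_length) & 0xFFFF)
        self.ip = target

    def ret(self):
        """RET: pop the return address and resume execution there."""
        self.ip = self.stack.pop()


cpu = TinyCPU()
cpu.ip = 0x0100        # a CALL located at offset 0100H
cpu.call(0x0200)       # branch to the procedure at 0200H
cpu.ret()              # back to the instruction following the CALL
print(hex(cpu.ip))     # prints 0x103
```

Since every CALL pushes its own return address, several calls to the same procedure each return to the instruction that follows the respective call.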
3. MACROS
GPU ARCHITECTURE
The first GPU was the GeForce 256, released by NVIDIA in 1999.
These GPU chips can process a minimum of 10 million polygons per second.
The GPU consists of up to 128 cores on a single chip.
Each core can handle 8 threads of instructions, and hence 1,024 (8 * 128)
threads are executed concurrently on a single GPU.
A GPU is a hardware device which contains multiple small hardware units called
Streaming Multiprocessors (SM).
Multiple SMs can be built on a single GPU chip.
Each SM can execute many threads concurrently.
Each SM is associated with a private L1 data cache.
Every Memory Controller (MC) is associated with a shared L2 cache for
faster access to the cached data. Both MC and L2 are on-chip.
Each SM has 32 CUDA cores (16 * 32 = 512 CUDA cores in total).
DESCENDING ORDER