0% found this document useful (0 votes)
19 views23 pages

COA Unit 2

The document provides an overview of computer arithmetic, focusing on number representation including unsigned and signed integers, as well as fixed and floating-point representations. It explains the methods for addition and subtraction of integers, including the use of 2's complement for signed numbers, and discusses various adder circuits like ripple carry and carry look-ahead adders. Additionally, it covers normalization of floating-point numbers and the IEEE 754 standard for representing real numbers.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
19 views23 pages

COA Unit 2

The document provides an overview of computer arithmetic, focusing on number representation including unsigned and signed integers, as well as fixed and floating-point representations. It explains the methods for addition and subtraction of integers, including the use of 2's complement for signed numbers, and discusses various adder circuits like ripple carry and carry look-ahead adders. Additionally, it covers normalization of floating-point numbers and the IEEE 754 standard for representing real numbers.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 23

Computer Arithmetic

NUMBER REPRESENTATION:

NUMBERS

Integers FLOATING POINT


NUMBERS

UNSIGNED
NUMBERS
(ONLY POSITIVE)

REAL WORLD INSIDE COMPUTER SYSTEM


STORED IN BINARY SIGNED NUMBERS
(DECIMAL SYSTEM)
(HEX FORMAT) (BOTH POSTIVE AND
NEGATIVE

UNSIGNED INTEGERS

These are binary numbers that are always assumed to be positive.Here all available bits of the number are
used to represent the magnitude of the number.No bits are used to indicate itssign, hence they are called
unsigned numbers.
E.g.: Roll Numbers, Memory addresses etc

SIGNED INTEGERS

These are binary numbers that can be either positive or negative. The MSB of the number indicates
whether it is positive or negative. If MSB is 0 then the number is Positive. If MSB is 1 then the
number is Negative. Negative numbers are always stored in 2’s complement form.

Three systems are used forrepresenting such numbers:

• Signed magnitude

• 1’s-complement

• 2’s-complement

In all three systems, the leftmost bit is 0 for positive numbers and 1 for negative numbers.Positive values
have identical representations in all systems, but negative values have different representations.
In the signed magnitude system, negative values are represented by changing the mostsignificant bit
from 0 to 1.For example, +5 is represented by 0101, and −5 is represented by 1101.

In 1’s-complement representation, negative values are obtained by complementing eachbit of the


corresponding positive number. Thus, the representation for −3 is obtainedby complementing each bit in
the vector 0011 to yield 1100.The same operation, bitcomplementing, is done to convert a negative
number to the corresponding positive value.

Fig: Binary signed number Representations

Two’s complement gives a unique representation for zero.Any other system gives a separate
representation for +0 and for -0. This is absurd. In two’s complement system, -(x) is stored as two’s
complement of (x). Applying the same rule for 0, -(0) should be stored as two’s complement of 0. 0 is
stored as 000. So –(0) should be stored as two’s complement of 000, which again is 000. Hence two’s
complement gives a unique representation for 0.It produces an additional number on the negative
side. As two’s complement system produces a unique combination for 0, it has a spare combination
‘1000’ in the above case, and can be used to represent –(8).
Fixed and Floating point Representations:
There are two major approaches to store real numbers (i.e., numbers withfractional component) in modern
computing. These are
(i) Fixed Point Notation and
(ii) Floating Point Notation.
Fixed Point Notation:In fixed point notation, there are a fixed number of digits after the decimal point,
whereas floating point number allows for avarying number of digits after the decimal point.
This representation has fixed number of bits for integer part and for fractional part. For example, if given
fixed-point representation is IIII.FFFF, then you can store minimum value is 0000.0001 and maximum
value is 9999.9999. There are three parts of a fixed-point number representation: the sign field, integer
field, and fractional field.

Assume number is using 32-bit format which reserve 1 bit for the sign, 15 bits for the integer part and 16
bits for the fractional part. Then, -43.625 is represented as following:

Where, 0 is used to represent + and 1 is used to represent -. 000000000101011 is 15-bit binary value for
decimal 43 and 1010000000000000 is 16-bit binary value for fractional 0.625.
The advantage of using a fixed-point representation is performance and disadvantage is relatively limited
range of values that they can represent. So, it is usually inadequate for numerical analysis as it does not
allow enough numbers and accuracy. A number whose representation exceeds 32 bits would have to be
stored inexactly.
Floating Point Representation:
In some numbers, which have a fractional part, the position of the decimal point is not fixed as the
number of bits before (or after) the decimal point may vary. Eg: 0010.01001, 0.0001101, -1001001.01
etc. the position of the decimal point is not fixed, instead it"floats" in the number.Such numbers are
called Floating Point Numbers. Floating Point Numbers are stored in a "Normalized" form.
NORMALIZATION OF A FLOATING POINTNUMBER:

Normalization is the process of shifting the point, left or right, so that there is only one non-zero digit to
the left of the point.

01010.01 (-1)0 x 1.01001 x 23

11111.01 (-1)0 x 1.111101 x 24

-10.01 (-1)1 x 1.001 x 21


A normalized form of a number is:
-1s x1.MX2E
Where: S = Sign, M = Mantissa and E = Exponent.

As Normalized numbers are of the 1.M format, the "1" is not actually stored, it is instead assumed. Also
the Exponent is stored in the biased form by adding an appropriate bias value to it so that -ve exponents
can be easily represented.
Advantages of Normalization.
1. Storing all numbers in a standard for makes calculations easier and faster.
2. By not storing the 1 (of 1.M format) for a number, considerable storage space is saved.
3. The exponent is biased so there is no need for storing its sign bit (as the biased exponent cannot be -
ve).
SHORT REAL FORMAT / SINGLE PRECISION FORMAT / IEEE 754: 32 BIT FORMAT:
1. 32 bits are used to store the number.
2. 23 bits are used for the Mantissa.
3. 8 bits are used for the Biased Exponent.
4. 1 bit used for the Sign of the number.
5. The Bias value is (127)10.

Range:

LONG REAL FORMAT / DOUBLE PRECISION FORMAT / IEEE 754: 64 BIT FORMAT
1. 64 bits are used to store the number.
2. 52 bits are used for the Mantissa.
3. 11 bits are used for the Biased Exponent.
4. 1 bit used for the Sign of the number.
5. The Bias value is (1023)10.
6. The range is +10-308 to +10308approximately.

s Biased Exponent Mantissa


1 bit 11-bits (Bias value:1023) 52-bits

Extreme cases of floating point numbers:


Floating point numbers are represented in IEEE formats.Consider IEEE 754 32-bit format also
called Single Precision format or Short real format.
Overflow:
For a value, 1.0 the normalized form will be
(-1)0 x 1.0 x 20
Herethe True Exponent is 0.

This is because the 8-bit biased exponent cannot hold a value more than 255.Hence, all cases where the
TE = 128 or more, the BE will be represented as 1111 1111.This indicates as exception (error) called
OVERFLOW. The number is called NaN (Not a Number).It is identified by Exponent being all 1s (1111
1111).Here, the Mantissa can be anything!The Extreme case of NaN is Infinity.It is also an
OVERFLOW and hence the Exponent will be 1111 1111.To differentiate Infinity from NaN, the
Mantissa in infinity is 0000 0000.Hence Infinity is identified as Exponent all 1s and Mantissa all 0s.
Suppose the number is 0.1.It will be normalized as
(-1)0 x 1.0 x 2-1
The true exponent here is -1.

Underflow: All cases where the TE = -127 or less, the BE will be represented as 0000 0000.This
indicates as exception (error) called UNDERFLOW.
The number is called De-Normal Number.It is identified by Exponent being all 0s (0000
0000).Here, the Mantissa can be anything.The Extreme case of De-Normal Number is Zero.
It is also an UNDERFLOW and hence the Exponent will be 0000 0000.To differentiate Zero from
De-Normal Number, the Mantissa in Zero is 0000 0000.Hence Zero is identified as Exponent all
0s and Mantissa all 0s.This means Zero is represented as all 0s.
Example:Convert 2A3BH into Short Real format.

Soln: Converting the number into binary we get:


0010 1010 0011 1011
Normalizing the number we get:
(-1)0x 1.0101000111011 x 213
Here S = 0; M = 0101000111011; True Exponent = 13.
Bias value for Short Real format is 127:
Biased Exponent (BE) = True Exponent + Bias
= 13 + 127
= 140.
Converting the Biased exponent into binary we get:
Biased Exponent (BE) = (1000 1100)
Representing in the required format we get:
0 10001100 010100011101100…

S Biased Exp Mantissa


(1) (8) (23)
Computer Arithmetic

Integer Addition:

Addition of Unsigned Integers: Addition of 1-bit numbers is illustrated below.The sum of 1 and
1 is the 2-bit vector 10, which represents the value 2. We say that the sum is 0 and the carry-out is
1. In order to add multiple-bit numbers, We add bit pairs starting from the low-order (right)
end of the bit vectors, propagating carries toward the high-order (left) end. The carry-out from a
bit pair becomes the carry-in to the next bit pair to the left. The carry-in must be added to a bit pair
in generating the sum and carry-out at that position. For example, if both bits of a pair are 1 and
the carry-in is 1, then the sum is 1 and the carry-out is 1, which represents the value 3.

Fig: Addition of 1-bit Numbers


Addition and Subtraction of Signed Integers:
 To add two numbers, add their n-bit representations, ignoring the carry-out bit fromthe most
significant bit (MSB) position. The sum will be the algebraically correct value in2’s-complement
representation if the actual result is in the range−(2n−1) through+2n−1– 1.
 To subtract two numbers X and Y, that is, to perform X − Y , form the 2’s-complement of Y , then
add it to X using the add rule. Again, the result will be the algebraically correct value in 2’s-
complement representation if the actual result is in the range −(2n−1) through+2n−1.
X-Y = X+(-Y) = X+(2’S Complement of Y)

Example: To perform 7-3 using 2’s complement addition


If we ignore the carry-out from the fourth bit position in this addition, we obtain the correct answer.
Few more examples:

Sign Extension:We often need to represent a value given in a certain number of bits by using a larger
number of bits. For a positive number, this is achieved by adding 0s to the left. For a negative number in
2’s-complement representation, the leftmost bit, which indicates the sign of the number, is a 1. A longer
number with the same value is obtained by replicating the sign bit to the left as many times as needed.
Overflow in Integer Arithmetic:Using 2’s-complement representation, n bits can represent values in the
range −(2n−1) through+2n−1.For example, the range of numbers that can be represented by 4 bits is
−8through +7.When the actualresult of an arithmetic operation isoutside the representable range, an
arithmetic overflow has occurred.
Introduction to adder circuits:

ONE BIT ADDITION: FULL ADDER


1) It is a 1-bit adder circuit.
2) It adds two 1-bit inputs Xi & Yi, along with a Carry Input Cin.
3) It produces a sum Zi and a Carry output Cout.
4) As it considers a carry input, it can be used in combination to add large numbers.
5) Hence it is called a Full Adder.
Fig: Circuit for Sum

Fig: Circuit for carry


RIPPLE CARRY ADDER( For Multiple bit addition ):
1) A Full Adder can add two “1-bit” numbers with a Carry input.
2) It produces a “1-bit” Sum and a Carry output.
3) Combining many of these Full Adders, we can add multiple bits.
4) One such method is called Serial Adder.
5) Here, bits are added one-by-one from Least significant bit(LSB) onwards.
6) The carries are connected in a chain through the full adders. The Carry of each stage is propagated
(Rippled) into the next stage.
7) Hence, these adders are also called Ripple Carry Adders.
Advantage: They are very easy to construct.
Drawback: As addition happens bit-by-bit, they are slow.
8) Number of cycles needed for the addition is equal to the number of bits to be added.
Inputs:
Assume X and Y are two “4-bit” numbers to be added, along with a Carry input CIN.
X = X0 X1 X2 X3 (X0 is the MSB … X3 is the LSB)
Y = Y0 Y1 Y2 Y3 (Y0 is the MSB … Y3 is the LSB)
CIN = Carry Input
Outputs:
Assume Z to be a “4-bit” output, and COUT to be the output Carry
Z = Z0 Z1 Z2 Z3 (Z0 is the MSB … Z3 is the LSB)(Here Z represents the sum)
COUT = Carry Output

Fig:4-bit Ripple Carry Adder

Carry Look ahead Adder(For multiple bit Addition):

1) This is also called as parallel adder. It is used to add multiple bits simultaneously.
2) While adding multiple bits, the main issue is that of the intermediate carries.
3) In Serial Adders, we therefore added the bits one-by-one.
4) This allowed the carry at any stage to propagate to the next stage.
5) But this also made the process very slow.
6) If we “PREDICT” the intermediate carries, then all bits can be added simultaneously.
7) This is done by the Carry Look Ahead Generator Circuit.
8) Once all carries are determined beforehand, then all bits can be added simultaneously.
Advantage: This makes the addition process extremely fast.
Drawback: Circuit is complex.
Inputs:
Assume X and Y are two “4-bit” numbers to be added, along with a Carry input CIN.
X = X0 X1 X2 X3 (X0 is the MSB … X3 is the LSB); Y = Y0 Y1 Y2 Y3 & CIN = Carry Input
Outputs:
Assume Z to be a “4-bit” output, and COUT to be the output Carry
Z = Z0 Z1 Z2 Z3 & COUT = Carry Output

Fig: Circuit for Carry Look ahead Adder

We can “Predict” (Look Ahead) all the intermediate carries in the followingmanner:
The carry at any stage can be calculated as:

This implies Ci = Gi + Pi.CIN

We need to predict the Carries: C3, C2, C1 and C0

C3 = G3 + P3CIN (I)
C2 = G2 + P2C3
Substituting the value of C3, we get:
C2 = G2 + P2G3 + P2P3CIN (II)
C1 = G1 + P1C2
Substituting the value of C2, we get:
C1 = G1 + P1G2 + P1P2G3 + P1P2P3CIN (III)
C0 = G0 + P0C1
Substituting the value of C1, we get:
C0 = G0 + P0G1 + P0P1G2 + P0P1P2G3 + P0P1P2P3CIN ( IV)

From the above four equations, it is clear that the values of all the four Carries (C3, C2, C1, C0) can be
determined beforehand even without doing the respective additions. To do this we need the values of all
G’s (Xi.Yi) and all P’s (Xi+Yi) and the original carry input CIN. This is done by the Carry Look Ahead
Generator Circuit.

Cycle 1: g1, p1, g2, p2, g3, p3, g0, p0are given to the carry look ahead generator.
Cycle 2: Input carries are given to the adders by the carry generator.
Cycle 3: Results are produced.
Total number of cycles required :3

Multiplication:
1) Shift and Add: This method is used to multiply two unsigned numbers. When we multiply two “N-
bit” numbers, the answer is “2 x N” bits. Three registers A, Q and M, are used for this process. Q
contains the Multiplier and M contains the Multiplicand. A (Accumulator) is initialized with 0. At the end
of the operation, the Result will be stored in (A & Q) combined. The process involves addition and
shifting. That is why it is called shift and add method.
Algorithm:
The number of steps required is equal to the number of bits in the multiplier.
1) At each step, examine the current multiplier bit starting from the LSB.
2) If the current multiplier bit is “1”, then the Partial-Product is the Multiplicand itself.
3) If the current multiplier bit is “0”, then the Partial-Product is the Zero.
4) At each step, ADD the Partial-Product to the Accumulator.
5) Now Right-Shift the Result produced so far (A & Q combined).
Repeat steps 1 to 5 for all bits of the multiplier.
The final answer will be in A & Q combined.
Fig: Shift and Add Multiplication

Example: Let us consider 7X6

Step C A Q M Explanation
Carry Accumulator Multiplier Multiplicand
0 0000 0110 0111 Initial Value

1 0 0000 0110 Current Multiplier bit is


0 0000 0011 “0” so ADD “0” to
Accumulator and
Right-Shift
2 0 0111 0011 Current Multiplier bit is
0 0011 1001 “1” so ADD Multiplicand
to Accumulator and
Right-Shift
3 0 1010 1001 Current Multiplier bit is
0 0101 0100 “1” so ADD Multiplicand
to Accumulator and
Right-Shift
4 0 0101 0100 Current Multiplier bit is
0 0010 1010 “0” so ADD “0” to
Accumulator and
Right-Shift

2) Booth Multiplier(For signed Multiplication):

Booth’s Algorithm is used to multiply two SIGNED numbers. When we multiply two “N-bit”
numbers, the answer is “2 x N” bits. Three registers A, Q and M, are used for this process.Q contains the
Multiplier and M contains the Multiplicand.A (Accumulator) is initialized with 0.At the end of the
operation, the Result will be stored in (A & Q) combined.The process involves addition, subtraction
and shifting.
Algorithm:
The number of steps required is equal to the number of bits in the multiplier.
At the beginning, consider an imaginary “0” beyond LSB of Multiplier
1) At each step, examine two adjacent Multiplier bits from Right to Left.
2) If the transition is from “0 to 1” then Subtract M from A and Right-Shift (A & Q) combined.
3) If the transition is from “1 to 0” then ADD M to A and Right-Shift.
4) If the transition is from “0 to 0” then simply Right-Shift.
5) If the transition is from “1 to 1” then simply Right-Shift.
Repeat steps 1 to 5 for all bits of the multiplier.
The final answer will be in A & Q combined.
Flowchart for Booth’s Algorithm:

Example: -9x10=-90
Multiplicand (M): -9 = 10111 9 = 01001. (Two’s Complement Form)
Multiplier (Q): 10 = 01010. -10 = 10110 (Two’s Complement Form)
step A Q Q(-1) M
Accumulator Multiplier Multiplicand
Initial 00000 01010 0 10111
1) (0 ç 0) 00000 01010 0
No Add or Sub 00000 00101 0
Right-Shift
2) (1 ç 0) 01001 00101 0
Perform (A - M) 00100 10010 1
Right-Shift
3) (0 ç 1) 11011 10010 1
Perform (A + M) 11101 11001 0
Right-Shift
4) (1 ç 0) 00110 11001 0
Perform (A - M) 00011 01100 1
Right-Shift
5) (0 ç 1) 11010 01100 1
Perform (A + M) 11101 00110 0
Right-Shift

Restoring and Non-Restoring Division:

Non Restoring Division:

1) Let Q register hold the divided, M register holds the divisor and A register is 0.
2) On completion of the algorithm, Q will get the quotient and A will get the remainder.

Algorithm:
The number of steps required is equal to the number of bits in the Dividend.
1) At each step, left shift the dividend by 1 position.
2) Subtract the divisor from A (perform A - M).
3) If the result is positive then the step is said to be “Successful”. In this case quotient bit will be “1” and
Restoration is NOT Required. The Next Step will also be Subtraction.
4) If the result is negative then the step is said to be “Unsuccessful”. In this case quotient bit will be “0”.
Here Restoration is NOT Performed. Instead, the next step will be ADDITION in place of subtraction.
As restoration is not performed, the method is called Non-Restoring Division.
Repeat steps 1 to 4 for all bits of the Dividend.
Example: (7) / (5)
Dividend (Q) = 7
Divisor (M) = 5
Accumulator (A) = 0
7 = 0111 5 = 0101
-7 = 1001-5 = 1011
Accumulator Dividend Divisor
A(0) Q(7) M(5)
Initial Values 0000 0111 0101
Step 1:Left shift 0000 111_
A-M +1011
Unsuccessful(-ve) 1011 1110
Next step: Add
Step 2:Left shift 0111 110_
A+M +0101
Unsuccessful(-ve) 1100 1100
Next step: Add
Step 3:Left shift 1001 100_
A+M +0101
Unsuccessful(-ve) 1110 1000
Next step: Add
Step 4:Left shift 1101 000_
A+M +0101
successful(+ve) 0010 0001

Remainder:2 Quotient:1

RESTORING DIVISION (For unsigned Numbers)

1) Let Q register hold the divided, M register holds the divisor and A register is 0.
2) On completion of the algorithm, Q will get the quotient and A will get theremainder.

Algorithm:
The number of steps required is equal to the number of bits in the Dividend.
1) At each step, left shift the dividend by 1 position.
2) Subtract the divisor from A (perform A - M).
3) If the result is positive then the step is said to be “Successful”.In this case quotient bit will be “1” and
Restoration is NOT Required.
4) If the result is negative then the step is said to be “Unsuccessful”.In this case quotient bit will be
“0”.Here Restoration is performed by adding back the divisor.
Hence the method is called Restoring Division.Repeat steps 1 to 4 for all bits of the Dividend.
Example: (6) / (4)
Dividend (Q) = 6
Divisor (M) = 4
Accumulator (A) = 0
6 = 0110 4 = 0100
-6 = 1010 -4 = 1100
Accumulator Dividend Divisor
A(0) Q(6) M(4)
Initial Values 0000 0110 0100
Step 1:Left shift 0000 110_
A-M + 1100
Unsuccessful(-ve) 1100
Restoration: 0000 1100
Step 2:Left shift 0001 100_
A-M +1100
Unsuccessful(-ve) 1101
Restoration: 0001 1000
Step 3:Left shift 0011 000_
A-M +1100
Unsuccessful(-ve) 1111
Restoration: 0011 0000
Step 3:Left shift 0110 000_
A-M +1100
Successful(+ve) 0010
No Restoration 0001
Remainder(2) Quotient(1)

RESTORING DIVISION FOR SIGNED NUMBERS:

1) Let M register hold the divisor, Q register hold the divided.


2) A register should be the signed extension of Q.
3) On completion of the algorithm, Q will get the quotient and A will get the remainder.

Algorithm:
The number of steps required is equal to the number of bits in the Dividend.
1) At each step, left shift the dividend by 1 position.
2) If Sign of A and M is the same then Subtract the divisor from A (perform A - M),
Else Add M to A
3) After the operation,If Sign of A remains the same or the dividend (in A and Q) becomes zero,then the
step is said to be “Successful”.In this case quotient bit will be “1” and Restoration is NOT Required.
4) If Sign of A changes, then the step is said to be “Unsuccessful”.In this case quotient bit will be
“0”.Here Restoration is Performed.Hence, the method is called Restoring Division.Repeat steps 1 to 4 for
all bits of the Dividend.

Note: The result of this algorithm is such that, the quotient will always bepositive and the remainder
will get the same sign as the dividend.
Example: (-19) / (7)
19 = 010011 7 = 000111
-19 = 101101 -7 = 111001
Accumulator Dividend Divisor
A(Sign Extension) Q(-19) M(7)
Initial Values 111111 101101 000111
Step 1: Left-shift 111111 01101_
Sign(A,M) Different: A+M + 000111
Sign changes: Unsuccessful 000110
Restore 111111 011010
Step 2: Left-shift 111110 11010_
Sign(A,M) Different: A+M + 000111
Sign changes: Unsuccessful 000101
Restore 111110 110100
Step 3: Left-shift 111101 10100_
Sign(A,M) Different: A+M + 000111
Sign changes: Unsuccessful 000100
Restore 111101 101000
Step 4: Left-shift 111011 01000_
Sign(A,M) Different: A+M + 000111
Sign changes: Unsuccessful 000010
Restore 111011 010000
Step 5: Left-shift 110110 10000_
Sign(A,M) Different: A+M + 000111
Sign still same: Successful 111101
Restoration not required 111101 100001
Step 6: Left-shift 111011 00001_
Sign(A,M) Different: A+M + 000111
Sign changes: Unsuccessful 000010
Restore 111011 000010
Remainder(-5) Quotient(2)

You might also like