Computing Machines
shirazi@pequod.ee.vt.edu
Stage 2:
• Shift 1.f2 to the right by (e1 - e2) places, the amount calculated
  in the previous stage.
• Add 1.f1 to 1.f2 if s1 equals s2, else subtract 1.f2 from 1.f1.
• Set the sign and the exponent of the final result, v3, to the sign
  and the exponent of the greater value v1.

Stage 3:
• Normalize f3 by shifting it to the left until the high order bit is
  a one.
• Adjust the exponent of the result, e3, by subtracting from it the
  number of positions that f3 was shifted left.

[Figure 4 diagram not recoverable; labels: Shift Value, Shift Right,
+/-, Exponent Adjust, Normalize, Stage 1, Stage 2, Stage 3, registers
R1, R7, R10, R11, R12. Rx = x-Bit Register.]
Figure 4: Three stage 18-bit Floating Point Adder.

Therefore:  s1 = 0  e1 = 1000011  1.f1 = 1.1000000011
            s2 = 1  e2 = 1000011  1.f2 = 1.1001011010

Stage 1:
• Swap v1 and v2 since e1 = e2 and f2 > f1.
  Now:  s1 = 1  e1 = 1000011  1.f1 = 1.1001011010
        s2 = 0  e2 = 1000011  1.f2 = 1.1000000011
• Since e1 - e2 = 0, 1.f2 does not need to be shifted in the next
  stage.

Stage 2:
• Since s1 does not equal s2, 1.f3 = 1.f1 - 1.f2.
• Also, s3 = s1 and e3 = e1 since they are the sign and exponent of
  the greater value.
  After stage 2:  s3 = 1  e3 = 1000011  1.f3 = 0.0001010111

Stage 3:
• Normalize f3 by shifting it 4 places to the left.
• Adjust the exponent, e3, by subtracting 4 from it.
  After final stage:  s3 = 1  e3 = 0111111  1.f3 = 1.0101110000

The result, v3, after addition is shown as follows:

      Decimal      Binary             18 Bit Format
v3    -1.359375    -1.010111 x 2^0    1 0111111 0101110000

3.3 Optimization
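As an illustration of the three adder stages and the worked example above, the following Python sketch models an 18-bit value as a (sign, 7-bit exponent, 10-bit fraction) triple. The names FP18, stage1, stage2, stage3, fp18_add and to_float are hypothetical. The bias of 63 is inferred from the example (e = 0111111 corresponds to 2^0), and the right shift on carry-out and the all-zero result on exact cancellation are assumptions that the stage descriptions do not cover. It is a behavioural model with truncation and no overflow or underflow handling, not a description of the hardware.

```python
from typing import NamedTuple

FRAC_BITS = 10            # stored fraction bits
HIDDEN = 1 << FRAC_BITS   # the implicit leading 1 of 1.f
BIAS = 63                 # assumed from the example: e = 0111111 <-> 2^0


class FP18(NamedTuple):
    s: int  # sign bit
    e: int  # 7-bit biased exponent
    f: int  # 10-bit fraction (hidden 1 not stored)


def stage1(a: FP18, b: FP18):
    """Swap so that v1 has the larger magnitude; compute the shift e1 - e2."""
    v1, v2 = (a, b) if (a.e, a.f) >= (b.e, b.f) else (b, a)
    return v1, v2, v1.e - v2.e


def stage2(v1: FP18, v2: FP18, shift: int):
    """Align 1.f2, then add or subtract; sign and exponent come from v1."""
    m1 = HIDDEN | v1.f
    m2 = (HIDDEN | v2.f) >> shift          # bits shifted out are truncated
    m3 = m1 + m2 if v1.s == v2.s else m1 - m2
    return v1.s, v1.e, m3


def stage3(s: int, e: int, m: int) -> FP18:
    """Normalize the mantissa and adjust the exponent to match."""
    if m == 0:
        return FP18(0, 0, 0)               # assumption: exact cancellation
    if m >= 2 * HIDDEN:                    # assumption: carry out of the add
        m, e = m >> 1, e + 1
    while m < HIDDEN:                      # shift left until the top bit is 1
        m, e = m << 1, e - 1
    return FP18(s, e, m & (HIDDEN - 1))


def fp18_add(a: FP18, b: FP18) -> FP18:
    return stage3(*stage2(*stage1(a, b)))


def to_float(v: FP18) -> float:
    """Decode a normalized, nonzero value for checking."""
    return (-1.0) ** v.s * (1 + v.f / HIDDEN) * 2.0 ** (v.e - BIAS)


v1 = FP18(0, 0b1000011, 0b1000000011)
v2 = FP18(1, 0b1000011, 0b1001011010)
v3 = fp18_add(v1, v2)
print(v3, to_float(v3))
```

Run on the two operands of the example, the sketch swaps them in stage1, produces the unnormalized mantissa 0.0001010111 in stage2, and shifts left four places in stage3, giving s3 = 1, e3 = 0111111, f3 = 0101110000.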
A floating-point division technique is presented here that uses the
pipelined multiplier discussed earlier. Division is performed by
multiplying the dividend by the reciprocal of the divisor, so that
the division becomes the multiplication A x (1/B) = Q. Because the
floating point fields are operated on independently, the design is
easy to pipeline.

[Figure 6 diagram not recoverable; labels: 1 / f1, Memory, Data, x,
Stage 3-5, Q.]
Figure 6: Three stage 18 bit Floating Point Divider.
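The reciprocal formulation A x (1/B) = Q can be sketched in Python as follows. This is a minimal behavioural model under several assumptions that are not taken from the text: the bias is 63, the reciprocal memory (RECIP) is indexed by the 10-bit fraction and holds the truncated fraction of the normalized reciprocal, the reciprocal exponent is 2*BIAS - 1 - e with a separate case for a mantissa of exactly 1.0, and fp18_mul is a simple truncating stand-in for the pipelined multiplier. All names are hypothetical.

```python
from typing import NamedTuple

FRAC_BITS = 10
HIDDEN = 1 << FRAC_BITS   # the implicit leading 1 of 1.f
BIAS = 63                 # assumed exponent bias


class FP18(NamedTuple):
    s: int  # sign bit
    e: int  # 7-bit biased exponent
    f: int  # 10-bit fraction


# Hypothetical reciprocal memory. For f != 0, 1/(1.f) lies in (0.5, 1),
# so the table stores the fraction of 2/(1.f), which lies in [1, 2).
# For f == 0 the reciprocal mantissa is exactly 1.0 (fraction 0).
RECIP = [0] + [(2 * HIDDEN * HIDDEN) // (HIDDEN + f) - HIDDEN
               for f in range(1, HIDDEN)]


def fp18_recip(b: FP18) -> FP18:
    """Reciprocal by table lookup on the mantissa and negation of the exponent."""
    if b.f == 0:
        e = 2 * BIAS - b.e          # special case: mantissa is exactly 1.0
    else:
        e = 2 * BIAS - 1 - b.e      # extra -1 renormalizes 1/(1.f) to 2/(1.f)
    return FP18(b.s, e, RECIP[b.f])


def fp18_mul(a: FP18, b: FP18) -> FP18:
    """Stand-in multiplier: truncating, no overflow or underflow checks."""
    e = a.e + b.e - BIAS
    m = (HIDDEN | a.f) * (HIDDEN | b.f)     # product of 1.f1 and 1.f2
    if m >= 2 * HIDDEN * HIDDEN:            # product is 2.0 or more
        m, e = m >> 1, e + 1
    return FP18(a.s ^ b.s, e, (m >> FRAC_BITS) & (HIDDEN - 1))


def fp18_div(a: FP18, b: FP18) -> FP18:
    return fp18_mul(a, fp18_recip(b))


def to_float(v: FP18) -> float:
    return (-1.0) ** v.s * (1 + v.f / HIDDEN) * 2.0 ** (v.e - BIAS)


six = FP18(0, 65, 0b1000000000)    # 1.5 x 2^2
two = FP18(0, 64, 0)               # 1.0 x 2^1
three = FP18(0, 64, 0b1000000000)  # 1.5 x 2^1
print(to_float(fp18_div(six, two)), to_float(fp18_div(six, three)))
```

Dividing by 2.0 exercises the 1.0 special case and is exact, while dividing by 3.0 returns a value slightly below 2.0, because both the table entry and the product are truncated in this sketch.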
although a special case for 1.0 has to be made. The normalization
process is done automatically with k1. Once the addition is done, the
result becomes the new exponent passed on to Stage 2. The mantissa in
Stage 1 goes directly to the memory address buffer to obtain the new
mantissa, but the old mantissa continues into Stage 2 and is replaced
in Stage 3. Stage 2 of the pipeline waits for the data to become
available from the memory, which occurs at Stage 3. The new mantissa
is inserted into the final operand to be passed to the multiplier.
Although three pipeline stages are shown here, the pipelined
multiplier adds further stages, for a total of five.

                          Adder/
                          Subtracter   Multiplier   Divider
FG Function Generators    28%          44%          46%
Flip Flops                14%          14%          34%
Stages                    3            3            5
Speed                     8.6 MHz      4.9 MHz      4.7 MHz
Tested Speed              10 MHz       10 MHz       10 MHz

TABLE 5. Summary of 18 bit Floating Point Units.
6.0 Summary and Conclusions
description of the floating point arithmetic units. The Xil-

                          Adder/
                          Subtracter   Multiplier   Divider
FG Function Generators    26%          36%          38%
Flip Flops                13%          13%          32%
Stages                    3            3            5
Speed                     9.3 MHz      6.0 MHz      5.9 MHz

[Figure diagram not recoverable; labels: 16 x 16 Partial Convolution
Sum, P2_Register, P3_Register, state mux, +, S3.]
[Figure diagram not recoverable; labels: Im_register, PE 1, PE 2,
PE 3, PE 4, inputs f(x).re and f(x).im (18), WkN.re and WN.im (16),
products f(x).re*WN.re, f(x).re*WN.im, f(x).im*WN.re, f(x).im*WN.im,
constants "0" and "1", outputs result.re and result.im.]
KEY: *  Floating Point Multiply
     +  Floating Point Add
     -  Floating Point Subtract
     16 or 18 Bit Multiplexor
     18-Bit Delay Register
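The four products named in the figure labels above (f(x).re*WN.re, f(x).im*WN.im, f(x).im*WN.re, f(x).re*WN.im) are the real multiplications of a complex product. The Python sketch below shows the standard identity that combines them with one subtraction and one addition. It uses ordinary Python floats rather than the 18-bit format, the function and parameter names are hypothetical, and it makes no claim about how the products are scheduled across the processing elements in the figure.

```python
def complex_multiply(f_re: float, f_im: float,
                     w_re: float, w_im: float) -> tuple[float, float]:
    """Return (result.re, result.im) for (f_re + j f_im) * (w_re + j w_im)."""
    result_re = f_re * w_re - f_im * w_im   # two multiplies, one subtract
    result_im = f_im * w_re + f_re * w_im   # two multiplies, one add
    return result_re, result_im


# Cross-check against Python's built-in complex type.
re, im = complex_multiply(1.0, 2.0, 3.0, 4.0)
reference = complex(1.0, 2.0) * complex(3.0, 4.0)
print((re, im), reference)
```

In an FFT butterfly the second operand would be the twiddle factor, so each butterfly needs four floating point multiplies plus one add and one subtract for this product alone.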
Each of the floating point arithmetic units has been incorporated
into two applications: a 2-D FFT [6] and a FIR filter [7]. The FFT
application operates at 10 MHz and the results of the transform are
stored in memory on the Splash-2 array board. These results were
checked by doing the same transform on a SPARC workstation. An FIR
tap design using a floating point adder and multiplier unit is shown
in Figure 7. The complex floating point multiplier used in the 2-D
FFT butterfly calculation is shown in Figure 8.

Acknowledgments

We wish to express our gratitude to Dr. J. T. McHenry and Dr. D.
Buell. We would also like to thank Professor J. A. DeGroat of The
Ohio State University for technical advice.
This research has been supported in part by the National Science
Foundation (NSF) under grant MIP-9308390.

References

[1] J.M. Arnold, D.A. Buell and E.G. Davis, “Splash 2,” Proceedings
of the 4th Annual ACM Symposium on Parallel Algorithms and
Architectures, pp. 316-322, June 1992.

[2] J.A. Eldon and C. Robertson, “A Floating Point Format for Signal
Processing,” Proceedings IEEE International Conference on Acoustics,
Speech, and Signal Processing, pp. 717-720, 1982.

[3] K. Eshraghian and N.H.E. Weste, Principles of CMOS VLSI Design,
A Systems Perspective, 2nd Edition, Addison-Wesley Publishing
Company, 1993.

[4] B. Fagin and C. Renard, “Field Programmable Gate Arrays and
Floating Point Arithmetic,” IEEE Transactions on VLSI, Vol. 2, No. 3,
pp. 365-367, September 1994.

[5] IEEE Task P754, “A Proposed Standard for Binary Floating-Point
Arithmetic,” IEEE Computer, Vol. 14, No. 12, pp. 51-62, March 1981.

[6] N. Shirazi, Implementation of a 2-D Fast Fourier Transform on an
FPGA Based Computing Platform, VPI&SU Master's Thesis in progress.

[7] A. Walters, An Indoor Wireless Communications Channel Model
Implementation on a Custom Computing Platform, VPI&SU Master's Thesis
in progress.

[8] Xilinx, Inc., The Programmable Logic Data Book, San Jose,
California, 1993.