
COA Module6 FloatingPoint

The document discusses floating-point numbers, focusing on their representation, limitations, and encoding methods, particularly the IEEE-754 standard. It explains how floating-point arithmetic works, including addition, subtraction, multiplication, and division, with examples for clarity. Special values, rounding methods, and encoding examples are also provided to illustrate the concepts.

Uploaded by

s.sarthak1357


10/14/24

Floating-point Numbers

Representing Fractional Numbers

• A binary number with a fractional part
$B = b_{n-1} b_{n-2} \ldots b_1 b_0 \,.\, b_{-1} b_{-2} \ldots b_{-m}$
corresponds to the decimal number
$D = \sum_{i=-m}^{n-1} b_i \, 2^i$

• Also called fixed-point numbers.

• The position of the radix point is fixed.

• If the radix point is allowed to move, we call it a floating-point representation.


Some Examples
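$1011.1_2 \rightarrow 1\times2^3 + 0\times2^2 + 1\times2^1 + 1\times2^0 + 1\times2^{-1} = 11.5$
$101.11_2 \rightarrow 1\times2^2 + 0\times2^1 + 1\times2^0 + 1\times2^{-1} + 1\times2^{-2} = 5.75$
$10.111_2 \rightarrow 1\times2^1 + 0\times2^0 + 1\times2^{-1} + 1\times2^{-2} + 1\times2^{-3} = 2.875$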

Some Observations:
• Shift right by 1 bit means divide by 2
• Shift left by 1 bit means multiply by 2
• Numbers of the form $0.111111\ldots_2$ have a value less than 1.0 (one).
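
The positional weights used in the examples above can be checked with a short script. This is a minimal sketch; the function name `binary_to_decimal` is an illustrative choice and does not come from the slides.

```python
def binary_to_decimal(bits: str) -> float:
    """Evaluate a binary string such as '1011.1' as the sum of b_i * 2**i."""
    integer_part, _, fraction_part = bits.partition(".")
    value = 0.0
    # Integer bits carry weights 2**0, 2**1, ... counted from the radix point.
    for position, bit in enumerate(reversed(integer_part)):
        value += int(bit) * 2.0 ** position
    # Fraction bits carry weights 2**-1, 2**-2, ...
    for position, bit in enumerate(fraction_part, start=1):
        value += int(bit) * 2.0 ** -position
    return value


for text in ("1011.1", "101.11", "10.111"):
    print(text, "->", binary_to_decimal(text))
```

Moving the radix point one place to the left in the input halves the result, which matches the shift observations.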

Limitations of Representation

• In the fractional part, we can only represent numbers of the form $x/2^k$ exactly.
• Other numbers have repeating bit representations (i.e. the expansion never terminates).
• Examples:
3/4 = 0.11
7/8 = 0.111
5/8 = 0.101
1/3 = 0.0101010101 [01] ….
1/5 = 0.001100110011 [0011] ….
1/10 = 0.0001100110011 [0011] ….
• The more bits we use, the more accurate the representation.
• We sometimes see: (1/3)*3 ≠ 1.
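
The repeating expansions can be generated by repeated doubling. The sketch below uses exact rational arithmetic from Python's standard `fractions` module; the helper name `binary_fraction_bits` is hypothetical.

```python
from fractions import Fraction


def binary_fraction_bits(value: Fraction, count: int) -> str:
    """Return the first `count` bits after the radix point for 0 <= value < 1."""
    bits = []
    for _ in range(count):
        value *= 2              # shift the next bit into the integer position
        if value >= 1:
            bits.append("1")
            value -= 1
        else:
            bits.append("0")
    return "".join(bits)


print(binary_fraction_bits(Fraction(5, 8), 3))    # terminates: x / 2**k
print(binary_fraction_bits(Fraction(1, 5), 12))   # repeats 0011
print(binary_fraction_bits(Fraction(1, 10), 13))  # repeats 0011 after a leading 0
print(0.1 + 0.2 == 0.3)                           # rounding makes this False
```

The last line shows the practical consequence: since 1/10 and 1/5 have no finite binary expansion, the stored values are approximations and simple identities can fail.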


Floating-point Number Representation (IEEE-754)

• For representing numbers with fractional parts, we can assume that the radix point sits at a fixed position within the number (say, n bits in the integer part, m bits in the fraction part). → Fixed-point representation
• Lacks flexibility.
• Cannot be used to represent very small or very large numbers (for example: $2.53 \times 10^{-26}$, $1.7562 \times 10^{+35}$, etc.).

• Solution :: use floating-point number representation.

• A number F is represented as a triplet <s, M, E> such that
$F = (-1)^s \, M \times 2^E$
• s is the sign bit indicating whether the number is negative (=1) or positive (=0).
• M is called the mantissa, and is normally a fraction in the range [1.0, 2.0).
• E is called the exponent, which weights the number by a power of 2.

Encoding:
• Single-precision numbers: total 32 bits (s 1 bit, E 8 bits, M 23 bits)
• Double-precision numbers: total 64 bits (s 1 bit, E 11 bits, M 52 bits)
• Field order, from most significant bit to least: s | E | M


Points to Note
• The number of significant digits depends on the number of bits in M.
• 7 significant decimal digits for a 24-bit mantissa (23 bits + 1 implied bit).
• The range of the number depends on the number of bits in E.
• About $10^{-38}$ to $10^{38}$ for an 8-bit exponent.

How many significant digits?
$2^{24} = 10^x$
$24 \log_{10} 2 = x \log_{10} 10$
$x \approx 7.2$ → 7 significant decimal digits

Range of exponent?
$2^{127} = 10^y$
$127 \log_{10} 2 = y \log_{10} 10$
$y \approx 38.2$ → maximum exponent value 38 (in decimal)
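
The same two estimates can be reproduced numerically. This is a small sketch with illustrative function names; it also applies the formula to double precision, assuming a 53-bit mantissa (52 stored bits + 1 implied bit) and a maximum exponent of 1023.

```python
import math


def decimal_digits(mantissa_bits: int) -> float:
    """Solve 2**mantissa_bits = 10**x for x."""
    return mantissa_bits * math.log10(2)


def max_decimal_exponent(max_binary_exponent: int) -> float:
    """Solve 2**max_binary_exponent = 10**y for y."""
    return max_binary_exponent * math.log10(2)


print(round(decimal_digits(24), 1), round(max_decimal_exponent(127), 1))    # single precision
print(round(decimal_digits(53), 1), round(max_decimal_exponent(1023), 1))   # double precision
```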

“Normalized” Representation
• We shall now see how E and M are actually encoded.
• Assume that the actual exponent of the number is EXP (i.e. the number is $M \times 2^{EXP}$).
• Permissible range of E in single precision: 1 ≤ E ≤ 254 (the all-0 and all-1 patterns are reserved for special values).
• Encoding of the exponent E:
The exponent is encoded as a biased value: E = EXP + BIAS
where BIAS = 127 ($2^{8-1} - 1$) for single-precision, and
BIAS = 1023 ($2^{11-1} - 1$) for double-precision.


• Encoding of the mantissa M:

• The mantissa is coded with an implied leading 1 (so a single-precision mantissa is effectively 24 bits long).
M = 1 . xxxx...x
• Here, xxxx…x denotes the bits that are actually stored for the mantissa. We get the extra
leading bit for free.
• When xxxx…x = 0000…0, M is minimum (= 1.0).
• When xxxx…x = 1111…1, M is maximum (= 2.0 – ε).
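
To make the minimum and maximum concrete, the sketch below turns the stored bits into the value of M by prepending the implied 1. The function name and the default width of 23 bits are assumptions chosen for single precision.

```python
def mantissa_value(stored_bits: int, width: int = 23) -> float:
    """Value of M = 1.xxxx...x, where `stored_bits` holds the `width` stored bits."""
    return 1.0 + stored_bits / (1 << width)


print(mantissa_value(0))                 # all zeros: minimum
print(mantissa_value((1 << 23) - 1))     # all ones: just below 2.0
print(2.0 - mantissa_value((1 << 23) - 1) == 2.0 ** -23)
```

Here ε equals $2^{-23}$, the weight of the least significant stored bit.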

An Encoding Example

• Consider the number F = 15335

$15335_{10} = 11101111100111_2 = 1.1101111100111 \times 2^{13}$

• Mantissa will be stored as: M = $1101111100111\,0000000000_2$

• Here, EXP = 13, BIAS = 127. → E = 13 + 127 = 140 = $10001100_2$

0 10001100 11011111001110000000000 → 466F9C00 in hex


Another Encoding Example

• Consider the number F = -3.75

$-3.75_{10} = -11.11_2 = -1.111 \times 2^{1}$

• Mantissa will be stored as: M = $11100000000000000000000_2$

• Here, EXP = 1, BIAS = 127. → E = 1 + 127 = 128 = $10000000_2$

1 10000000 11100000000000000000000 → C0700000 in hex (the sign bit is 1, so the leading hex digit is C)
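
Encodings like these can be cross-checked on any machine whose `float` packing follows IEEE-754, using Python's standard `struct` module. This is a minimal sketch; `float32_fields` is a hypothetical helper name.

```python
import struct


def float32_fields(value: float) -> tuple[int, int, int, str]:
    """Split the single-precision encoding of `value` into (s, E, M, hex word)."""
    # Pack as a big-endian 32-bit float, then reinterpret the bytes as an integer.
    (word,) = struct.unpack(">I", struct.pack(">f", value))
    sign = word >> 31
    exponent = (word >> 23) & 0xFF      # biased exponent E
    mantissa = word & 0x7FFFFF          # 23 stored mantissa bits
    return sign, exponent, mantissa, f"{word:08X}"


for number in (15335.0, -3.75):
    s, e, m, hex_word = float32_fields(number)
    print(number, s, f"{e:08b}", f"{m:023b}", hex_word)
```

For 15335 the script reports E = 140 and the hex word 466F9C00. For -3.75 it reports E = 128 and C0700000.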

Special Values

• When E = 000…0
• M = 000…0 represents the value 0. Zero is represented by the all-zero string.
• M ≠ 000…0 represents numbers very close to 0. Also referred to as de-normalized numbers.

• When E = 111…1
• M = 000…0 represents the value ∞ (infinity).
• M ≠ 000…0 represents Not-a-Number (NaN).

NaN represents cases when no numeric value can be determined, like uninitialized values, ∞*0, ∞-∞, square root of a negative number, etc.
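
The case analysis on E and M translates directly into code. The sketch below classifies a raw 32-bit word and, separately, decodes it with the standard `struct` module; both function names are illustrative.

```python
import struct


def classify_float32(word: int) -> str:
    """Classify a 32-bit single-precision pattern by its E and M fields."""
    exponent = (word >> 23) & 0xFF
    mantissa = word & 0x7FFFFF
    if exponent == 0:
        return "zero" if mantissa == 0 else "denormalized"
    if exponent == 0xFF:
        return "infinity" if mantissa == 0 else "NaN"
    return "normalized"


def decode_float32(word: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", word))[0]


print(classify_float32(0x00000001), decode_float32(0x00000001))  # smallest denormal
print(classify_float32(0x7FC00000))                              # a NaN pattern
print(classify_float32(0xC0700000), decode_float32(0xC0700000))
```

The sign bit is ignored by the classifier, so both +0 and -0 are reported as zero.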


Summary of Number Encodings

[Figure: number line of encodings, from left to right: NaN, -∞, -Normalized, -Denorm, -0, +0, +Denorm, +Normalized, +∞, NaN]

Denormal numbers have very small magnitudes (close to 0) such that trying to
normalize them will lead to an exponent that is below the minimum possible value.
• Mantissa with leading 0’s and exponent field equal to zero.
• Number of significant digits gets reduced in the process.

Rounding
• Suppose we are adding two numbers (say, in single-precision).
• We add the mantissa values after shifting one of them right for exponent alignment.
• We take the first 23 bits of the sum, and discard the residue R (remaining bits).

• IEEE-754 format supports four rounding modes:

a) Truncation
b) Round to +∞ (similar to ceiling function)
c) Round to -∞ (similar to floor function)
d) Round to nearest


• To implement rounding, two temporary bits are maintained:

• Round Bit (r): This is equal to the MSB of the residue R.
• Sticky Bit (s): This is the logical OR of the rest of the bits of the residue R.

• Decisions regarding rounding can be taken based on these bits (R is measured as a fraction of the least significant retained bit):

a) R > 0: if r + s = 1
b) R = 0.5: if r.s’ = 1
c) R > 0.5: if r.s = 1
Here ‘+’ is logical OR, ‘.’ is logical AND, and s’ is the complement of s.

• Renormalization after Rounding:

• If the process of rounding generates a result that is not in normalized form, then we need to re-normalize the result.
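
The round and sticky bits are enough to implement round to nearest. The sketch below works on plain integers and assumes that ties (R = 0.5) go to the even neighbour, which is the IEEE-754 default; the slides do not state the tie rule, and the function name is hypothetical.

```python
def round_nearest_even(value: int, drop: int) -> int:
    """Discard the low `drop` bits of `value`, rounding to nearest, ties to even."""
    if drop <= 0:
        return value
    kept = value >> drop
    residue = value & ((1 << drop) - 1)
    round_bit = (residue >> (drop - 1)) & 1                       # MSB of the residue
    sticky_bit = int((residue & ((1 << (drop - 1)) - 1)) != 0)    # OR of the remaining bits
    # R > 0.5 (r.s = 1) always rounds up; R = 0.5 (r.s' = 1) rounds up only if kept is odd.
    if round_bit and (sticky_bit or (kept & 1)):
        kept += 1   # a carry out of the top bit means the caller must renormalize
    return kept


print(bin(round_nearest_even(0b10011, 2)))   # residue 11: above one half, round up
print(bin(round_nearest_even(0b10010, 2)))   # residue 10: tie, kept value is even
print(bin(round_nearest_even(0b10110, 2)))   # residue 10: tie, kept value is odd
```

Truncation corresponds to returning `kept` unchanged. The directed modes would use only the test R > 0 (r + s = 1) together with the sign.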

Some Exercises

Decode the following single-precision floating-point numbers.

a) 0011 1111 1000 0000 0000 0000 0000 0000
b) 0100 0000 0110 0000 0000 0000 0000 0000
c) 0100 1111 1101 0000 0000 0000 0000 0000
d) 1000 0000 0000 0000 0000 0000 0000 0000
e) 0111 1111 1000 0000 0000 0000 0000 0000
f) 0111 1111 1101 0101 0101 0101 0101 0101


Floating-point Arithmetic


Floating Point Addition/Subtraction

• Two numbers: $M_1 \times 2^{E_1}$ and $M_2 \times 2^{E_2}$, where $E_1 > E_2$ (say).
• Basic steps:
• Select the number with the smaller exponent (i.e. E2) and shift its mantissa right by (E1-E2) positions.
• Set the exponent of the result equal to the larger exponent (i.e. E1).
• Carry out M1 ± M2, and determine the sign of the result.
• Normalize the resulting value, if necessary.


Addition Example
• Suppose we want to add F1 = 270.75 and F2 = 2.375
$F_1 = (270.75)_{10} = (100001110.11)_2 = 1.0000111011 \times 2^{8}$
$F_2 = (2.375)_{10} = (10.011)_2 = 1.0011 \times 2^{1}$
• Shift the mantissa of F2 right by 8 – 1 = 7 positions, and add:
  1000 0111 0110 0000 0000 0000
+ 0000 0001 0011 0000 0000 0000 0000 000
= 1000 1000 1001 0000 0000 0000 0000 000
(the last 7 bits, shifted out beyond the 24-bit mantissa, form the residue)

• Result: $1.00010001001 \times 2^{8}$


Subtraction Example
• Suppose we want to subtract F2 = 224 from F1 = 270.75
$F_1 = (270.75)_{10} = (100001110.11)_2 = 1.0000111011 \times 2^{8}$
$F_2 = (224)_{10} = (11100000)_2 = 1.11 \times 2^{7}$
• Shift the mantissa of F2 right by 8 – 7 = 1 position, and subtract:
  1000 0111 0110 0000 0000 0000
- 0111 0000 0000 0000 0000 0000 0
= 0001 0111 0110 0000 0000 0000 0
• For normalization, shift mantissa left 3 positions, and decrement E by 3.
• Result: $1.01110110 \times 2^{5}$
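
The align, add, and normalize steps can be sketched on integer significands. The code below is an illustration under stated assumptions, not a hardware-accurate model: significands are signed 24-bit integers scaled by $2^{23}$, exponents are unbiased, residue bits are truncated rather than rounded, and all names are hypothetical.

```python
import math

SCALE_BITS = 23  # fraction bits of a single-precision significand


def from_float(value: float) -> tuple[int, int]:
    """(signed 24-bit significand, unbiased exponent); exact for nonzero values of at most 24 bits."""
    fraction, exponent = math.frexp(value)   # value = fraction * 2**exponent, 0.5 <= |fraction| < 1
    return int(fraction * (1 << (SCALE_BITS + 1))), exponent - 1


def to_float(significand: int, exponent: int) -> float:
    return significand / (1 << SCALE_BITS) * 2.0 ** exponent


def fp_add(m1: int, e1: int, m2: int, e2: int) -> tuple[int, int]:
    """Add two (significand, exponent) pairs; subtraction is addition of a negative significand."""
    if e1 < e2:                               # make operand 1 the one with the larger exponent
        m1, e1, m2, e2 = m2, e2, m1, e1
    shift = e1 - e2
    total = (m1 << shift) + m2                # same as shifting m2 right, but keeps the residue bits
    if total == 0:
        return 0, 0
    sign = -1 if total < 0 else 1
    magnitude = abs(total)
    excess = magnitude.bit_length() - (SCALE_BITS + 1)
    if excess >= 0:
        magnitude >>= excess                  # drop the residue (truncation)
    else:
        magnitude <<= -excess                 # left shift to renormalize after cancellation
    return sign * magnitude, e2 + excess


print(to_float(*fp_add(*from_float(270.75), *from_float(2.375))))
print(to_float(*fp_add(*from_float(270.75), *from_float(-224.0))))
```

For the two worked examples the resulting exponents are 8 and 5, matching the hand calculations.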


Floating-point Multiplication

• Two numbers: $M_1 \times 2^{E_1}$ and $M_2 \times 2^{E_2}$

• Basic steps:
• Add the exponents E1 and E2 and subtract the BIAS.
• Multiply M1 and M2 and determine the sign of the result.
• Normalize the resulting value, if necessary.


Multiplication Example

• Suppose we want to multiply F1 = 270.75 and F2 = -2.375

$F_1 = (270.75)_{10} = (100001110.11)_2 = 1.0000111011 \times 2^{8}$
$F_2 = (-2.375)_{10} = (-10.011)_2 = -1.0011 \times 2^{1}$

• Add the exponents: 8 + 1 = 9

• Multiply the mantissas: 1.01000001100001
• Result: $-1.01000001100001 \times 2^{9}$
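
The multiplication steps can be sketched with integer fields. This is an illustration only: mantissas are unsigned 24-bit integers (implied 1 included, scaled by $2^{23}$), exponents are biased single-precision values, the low product bits are truncated rather than rounded, and overflow, underflow, and special values are ignored. The names are hypothetical.

```python
BIAS = 127


def fp_multiply(s1: int, m1: int, e1: int, s2: int, m2: int, e2: int) -> tuple[int, int, int]:
    """Multiply two (sign, 24-bit mantissa, biased exponent) triples."""
    sign = s1 ^ s2                       # signs combine with XOR
    exponent = e1 + e2 - BIAS            # add biased exponents, remove one bias
    product = m1 * m2                    # up to 48 bits, scaled by 2**46
    if product >> 47:                    # product of mantissas is in [2, 4): normalize
        product >>= 1
        exponent += 1
    mantissa = product >> 23             # back to 24 bits, scaled by 2**23 (truncation)
    return sign, mantissa, exponent


# 270.75 = 1.0000111011 x 2**8 and -2.375 = -1.0011 x 2**1
f1 = (0, 0b100001110110000000000000, 8 + BIAS)
f2 = (1, 0b100110000000000000000000, 1 + BIAS)
s3, m3, e3 = fp_multiply(*f1, *f2)
print(s3, f"{m3:024b}", e3 - BIAS)
print((-1) ** s3 * m3 / 2 ** 23 * 2.0 ** (e3 - BIAS))
```

The last line converts the result back to a decimal value so it can be compared with 270.75 × (-2.375).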

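[Figure: floating-point multiplier datapath. Inputs are (s1, E1, M1) and (s2, E2, M2) with field widths 1, 8, and 23 bits. E1 and E2 feed an 8-bit adder; its 9-bit sum goes to a 9-bit subtractor that removes the bias (1111111). M1 and M2, each with the implied leading 1, feed a 24 x 24 multiplier that produces a 48-bit product. s1 and s2 are combined by XOR. A normalizer produces the outputs s3, E3 (8 bits), and M3 (23 bits).]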


Floating-point Division

• Two numbers: $M_1 \times 2^{E_1}$ and $M_2 \times 2^{E_2}$

• Basic steps:
• Subtract the exponents E1 and E2 and add the BIAS.
• Divide M1 by M2 and determine the sign of the result.
• Normalize the resulting value, if necessary.


Division Example

• Suppose we want to divide F1 = 270.75 by F2 = -2.375

$F_1 = (270.75)_{10} = (100001110.11)_2 = 1.0000111011 \times 2^{8}$
$F_2 = (-2.375)_{10} = (-10.011)_2 = -1.0011 \times 2^{1}$

• Subtract the exponents: 8 – 1 = 7

• Divide the mantissas: 0.1110010
• Result: $-0.1110010 \times 2^{7}$
• After normalization: $-1.110010 \times 2^{6}$
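
Division follows the same pattern with the exponent arithmetic reversed. The sketch below makes the same simplifying assumptions as a basic integer model: unsigned 24-bit mantissas with the implied 1, biased single-precision exponents, a truncated quotient, and no handling of zero divisors or special values. The names are hypothetical.

```python
BIAS = 127


def fp_divide(s1: int, m1: int, e1: int, s2: int, m2: int, e2: int) -> tuple[int, int, int]:
    """Divide two (sign, 24-bit mantissa, biased exponent) triples."""
    sign = s1 ^ s2                       # signs combine with XOR
    exponent = e1 - e2 + BIAS            # subtract biased exponents, add the bias back
    quotient = (m1 << 24) // m2          # ratio lies in (0.5, 2), scaled by 2**24
    if quotient >> 24:                   # ratio >= 1: already of the form 1.xxx
        quotient >>= 1                   # rescale to 2**23
    else:                                # ratio < 1: normalize with one left shift
        exponent -= 1                    # (the 2**24 scaling already supplies the shift)
    return sign, quotient, exponent


# 270.75 = 1.0000111011 x 2**8 divided by -2.375 = -1.0011 x 2**1
f1 = (0, 0b100001110110000000000000, 8 + BIAS)
f2 = (1, 0b100110000000000000000000, 1 + BIAS)
s3, m3, e3 = fp_divide(*f1, *f2)
print(s3, f"{m3:024b}", e3 - BIAS)
print((-1) ** s3 * m3 / 2 ** 23 * 2.0 ** (e3 - BIAS))
```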

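[Figure: floating-point divider datapath. Inputs are (s1, E1, M1) and (s2, E2, M2) with field widths 1, 8, and 23 bits. E1 and E2 feed an 8-bit subtractor; its 9-bit result goes to a 9-bit adder that adds the bias (1111111). M1 and M2, each with the implied leading 1, feed a 24-bit divider. s1 and s2 are combined by XOR. A normalizer produces the outputs s3, E3 (8 bits), and M3 (23 bits).]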

FLOATING-POINT ARITHMETIC in MIPS32


• The MIPS32 architecture defines the following floating-point registers (FPRs).

• 32 32-bit floating-point registers F0 to F31, each of which is capable of storing a single-precision floating-point number.
• Double-precision floating-point numbers can be stored in even-odd pairs of FPRs (e.g., (F0,F1), (F10,F11), etc.).

• In addition, there are five special-purpose FPU control registers.

[Figure: the 32 floating-point registers (FPRs) F0 to F31, and the five special-purpose registers FIR, FCCR, FEXR, FENR, and FCSR.]

Typical Floating Point Instructions in MIPS32

• Load and Store instructions

• Load Word from memory
• Load Double-word from memory
• Store Word to memory
• Store Double-word to memory

• Data Movement instructions

• Move data between integer registers and floating-point registers
• Move data between integer registers and floating-point control registers


• Arithmetic instructions
• Floating-point absolute value
• Floating-point compare
• Floating-point negate
• Floating-point add
• Floating-point subtract
• Floating-point multiply
• Floating-point divide
• Floating-point square root
• Floating-point multiply add
• Floating-point multiply subtract


• Rounding instructions:
• Floating-point truncate
• Floating-point ceiling
• Floating-point floor
• Floating-point round

• Format conversions:
• Single-precision to double-precision
• Double-precision to single-precision


Example: Add a scalar s to a vector A

for (i=1000; i>0; i--)
    A[i] = A[i] + s;

R1: initially points to A[1000]
(F2,F3): contains the scalar s
R2: initialized such that 8(R2) is the address of A[1]
We assume double precision (64 bits):
• Numbers stored in (F0,F1), (F2,F3), and (F4,F5).

Loop: L.D   F0,0(R1)     # load A[i] into (F0,F1)
      ADD.D F4,F0,F2     # (F4,F5) = A[i] + s
      S.D   F4,0(R1)     # store the sum back to A[i]
      ADDI  R1,R1,-8     # step back one element (8 bytes)
      BNE   R1,R2,Loop   # repeat until R1 equals R2
