
Floating Point Representation of Numbers

FP is useful for representing numbers over a wide range: from very small to very large. It is widely used in the scientific world. Consider the following FP representation of a number:

  Sign bit   Exponent E   Significand F (also called mantissa)
    +/-      x x x x      y y y y y y y y y y y y
In decimal, it means (+/-) 1.yyyyyyyyyyyy x 10^xxxx

In binary, it means (+/-) 1.yyyyyyyyyyyy x 2^xxxx
(The leading 1 is implied.)
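For example, the decimal value -0.75 is -1.1 x 2^-1 in this binary form: the sign is negative, the exponent is -1, and only the .1 part of the significand needs to be stored, since the leading 1 is implied.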

IEEE 754 single-precision (32 bits)


  s         xxxxxxxx        yyyyyyyyyyyyyyyyyyyyyyy
  sign      exponent        significand
  (1 bit)   (8 bits)        (23 bits)

Largest  = 1.111...1 x 2^+127   (roughly 2 x 10^+38)
Smallest = 1.000...0 x 2^-126   (roughly 1 x 10^-38)

These can be positive or negative, depending on s.
(But there are exceptions too.)

IEEE 754 double precision (64 bits)

  s         xxxxxxxxxxx     yyy ... y
  sign      exponent        significand
  (1 bit)   (11 bits)       (52 bits)

Largest  = 1.111...1 x 2^+1023
Smallest = 1.000...0 x 2^-1022

Overflow and underflow in FP


An overflow occurs when a number is too large in magnitude to fit in the given format. An underflow occurs when a number is too small in magnitude (too close to zero) to fit in the given format.
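As a small C sketch (not from the original slides), the following shows single-precision overflow producing infinity and underflow losing all significance:

#include <stdio.h>
#include <float.h>

int main(void) {
    float big  = FLT_MAX;        /* largest finite single-precision value      */
    float tiny = FLT_MIN;        /* smallest normalized single-precision value */

    float over  = big * 2.0f;    /* overflow: too large for the format, becomes +inf */
    float under = tiny / 1e20f;  /* underflow: too close to zero, drops to 0         */

    printf("overflow : %g\n", over);    /* prints inf */
    printf("underflow: %g\n", under);   /* prints 0   */
    return 0;
}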

How do we represent zero?


IEEE standards committee solved this by
making zero a special case: if every bit is zero
(the sign bit being irrelevant), then the
number is considered zero.

Then how do we represent 1.0?



It should have been 1.0 x 2^0, but then its bit pattern would be all zeros, the same as zero! The way out of this is that the interpretation of the exponent bits is not straightforward. The exponent of a single-precision float is "shift-127" encoded (a biased representation), meaning that the actual exponent is (xxxxxxxx minus 127). So, thankfully, we can get an exponent of zero by storing 127.

Exponent = 11111111 (i.e. 255) means 255 - 127 = +128
Exponent = 01111111 (i.e. 127) means 127 - 127 = 0
Exponent = 00000001 (i.e. 1)   means 1 - 127   = -126
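As an illustration (not part of the original slides), a small C sketch that extracts the three fields of a single-precision float and applies the shift-127 rule; the helper name decode is made up for the example:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Decode the sign, biased exponent, and mantissa fields of a single-precision float. */
static void decode(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);              /* reinterpret the 32 bits of the float */

    unsigned sign     = (bits >> 31) & 0x1;      /* 1 bit                      */
    unsigned exponent = (bits >> 23) & 0xFF;     /* 8 bits, shift-127 encoded  */
    unsigned mantissa = bits & 0x7FFFFF;         /* 23 bits, leading 1 implied */

    printf("%g: sign=%u  stored exponent=%u  actual exponent=%d  mantissa=0x%06X\n",
           f, sign, exponent, (int)exponent - 127, mantissa);
}

int main(void) {
    decode(1.0f);    /* stored exponent 127 -> actual exponent 0        */
    decode(0.5f);    /* stored exponent 126 -> actual exponent -1       */
    decode(-2.0f);   /* sign 1, stored exponent 128 -> actual exponent +1 */
    return 0;
}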

More on Biased Representation


The consequence of shift-127: the exponent field 00000000 (reserved for zero) can no longer be used to represent the smallest number. We forego something at the lower end of the spectrum of representable exponents (which would have been 2^-127). That said, it seems wise to give up the smallest exponent rather than give up the ability to represent 1 or zero!

More special cases


Zero is not the only "special case" float. There are also representations for positive and negative infinity, and for a not-a-number (NaN) value, used for results that do not make sense (for example, non-real numbers, or the result of an operation like infinity times zero). How do these work? A number is infinite if every bit of the exponent is 1 and the mantissa is all zeros (yes, we lose another exponent value), and it is NaN if every bit of the exponent is 1 and at least one mantissa bit is 1. The sign bit still distinguishes +inf from -inf and +NaN from -NaN. Here are a few sample floating point representations:
Exponent    Mantissa    Object
0           0           Zero
0           Nonzero     Denormalized number*
1-254       Anything    +/- FP number
255         0           +/- infinity
255         Nonzero     NaN (like 0/0 or 0 x infinity)

* Any non-zero number whose magnitude is smaller than the smallest normal number is a denormalized number. The production of a denormal is sometimes called gradual underflow, because it allows a calculation to lose precision slowly when the result is small.
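A minimal C sketch (again an illustration, not from the slides) that classifies a float using exactly the exponent/mantissa rules of the table above; the helper name classify is invented for the example:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Classify a single-precision float according to the table above. */
static const char *classify(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);

    unsigned exponent = (bits >> 23) & 0xFF;   /* 8-bit exponent field  */
    unsigned mantissa = bits & 0x7FFFFF;       /* 23-bit mantissa field */

    if (exponent == 0)   return mantissa ? "denormalized number" : "zero";
    if (exponent == 255) return mantissa ? "NaN" : "+/- infinity";
    return "+/- FP number";                    /* exponent in 1..254 */
}

int main(void) {
    float zero = 0.0f;
    printf("%s\n", classify(0.0f));          /* zero                */
    printf("%s\n", classify(1.0e-40f));      /* denormalized number */
    printf("%s\n", classify(3.14f));         /* +/- FP number       */
    printf("%s\n", classify(1.0f / zero));   /* +/- infinity        */
    printf("%s\n", classify(0.0f / zero));   /* NaN                 */
    return 0;
}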

Floating point operations in MIPS


There are 32 separate single precision FP registers in MIPS: $f0, $f1, $f2, ..., $f31. They can also be used as 16 double precision registers: $f0, $f2, $f4, ..., $f30 ($f0 means the pair $f0,$f1; $f2 means $f2,$f3; and so on). These reside in coprocessor 1 (CP1), in the same package as the CPU.

Operations supported:

  add.s  $f2, $f4, $f6     # f2 = f4 + f6 (single precision)
  add.d  $f2, $f4, $f6     # f2 = f4 + f6 (double precision)

  (Subtract, multiply, and divide have the same format: sub.s/sub.d, mul.s/mul.d, div.s/div.d.)

  lwc1   $f1, 100($s2)     # f1 = M[s2 + 100]   (32-bit load)
  mtc1   $t0, $f0          # f0 = t0 (move to coprocessor 1)
  mfc1   $t1, $f1          # t1 = f1 (move from coprocessor 1)

Sample program
Evaluation of a polynomial a.x^2 + b.x + c

  # $f0 --- x
  # $f2 --- sum of terms
  .....

  # Evaluate the quadratic
  l.s    $f2, a            # sum = a            (l.s is a pseudoinstruction)
  mul.s  $f2, $f2, $f0     # sum = ax
  l.s    $f4, b            # get b
  add.s  $f2, $f2, $f4     # sum = ax + b
  mul.s  $f2, $f2, $f0     # sum = (ax + b)x = ax^2 + bx
  l.s    $f4, c            # get c
  add.s  $f2, $f2, $f4     # sum = ax^2 + bx + c
  ......

         .data
  a:     .float 1.0
  b:     .float 1.0
  c:     .float 1.0
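Note that the code evaluates the polynomial in Horner form, (ax + b)x + c, so it needs only two multiplications and two additions and never computes x^2 explicitly.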

Floating Point Addition

Example using decimal


A = 9.999 x 10^1, B = 1.610 x 10^-1, A + B = ?

Step 1. Align the smaller exponent with the larger one.
        B = 0.0161 x 10^1 = 0.016 x 10^1 (round off)
Step 2. Add significands.
        9.999 + 0.016 = 10.015, so A + B = 10.015 x 10^1
Step 3. Normalize.
        A + B = 1.0015 x 10^2
Step 4. Round off.
        A + B = 1.002 x 10^2

Now, try to add 0.5 and 0.4375 in binary.
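As a sketch of these four steps (not from the original slides), the procedure can be coded in C on 4-digit decimal significands; the struct dfp representation and helper names below are invented for the illustration, and non-negative operands are assumed:

#include <stdio.h>

/* A decimal FP value with a 4-digit significand: (sig/1000) x 10^exp,
   e.g. sig = 9999, exp = 1 represents 9.999 x 10^1. */
struct dfp { long sig; int exp; };

/* Divide by 10^n, rounding to the nearest integer (round off). */
static long shift_right(long sig, int n) {
    long p = 1;
    for (int i = 0; i < n; i++) p *= 10;
    return (sig + p / 2) / p;
}

/* Add two (non-negative) decimal FP numbers following the four steps above. */
static struct dfp dfp_add(struct dfp a, struct dfp b) {
    /* Step 1: align the smaller exponent with the larger one. */
    if (a.exp < b.exp) { struct dfp t = a; a = b; b = t; }
    b.sig = shift_right(b.sig, a.exp - b.exp);    /* 1610 -> 16, i.e. 0.016 x 10^1 */

    /* Step 2: add significands. */
    struct dfp r = { a.sig + b.sig, a.exp };      /* 9999 + 16 = 10015 */

    /* Step 3: normalize so one digit precedes the point
       (the right shift here also performs Step 4's round off). */
    while (r.sig >= 10000) { r.sig = shift_right(r.sig, 1); r.exp += 1; }
    while (r.sig != 0 && r.sig < 1000) { r.sig *= 10; r.exp -= 1; }

    return r;
}

int main(void) {
    struct dfp a = { 9999, 1 };    /* 9.999 x 10^1  */
    struct dfp b = { 1610, -1 };   /* 1.610 x 10^-1 */
    struct dfp s = dfp_add(a, b);
    printf("%ld.%03ld x 10^%d\n", s.sig / 1000, s.sig % 1000, s.exp);  /* 1.002 x 10^2 */
    return 0;
}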

Floating Point Multiplication

Example using decimal


A = 1.110 x 10^10, B = 9.200 x 10^-5, A x B = ?

Step 1. Exponent of A x B = 10 + (-5) = 5
Step 2. Multiply significands.
        1.110 x 9.200 = 10.212000
Step 3. Normalize the product.
        10.212 x 10^5 = 1.0212 x 10^6
Step 4. Round off.
        A x B = 1.021 x 10^6
Step 5. Decide the sign of A x B (+ x + = +).

So, A x B = +1.021 x 10^6
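A matching C sketch of the five multiplication steps (again invented for illustration, assuming positive operands), using the same hypothetical 4-digit decimal representation as the addition sketch:

#include <stdio.h>

/* (sig/1000) x 10^exp, so sig = 1110, exp = 10 means 1.110 x 10^10. */
struct dfp { long sig; int exp; };

/* Divide by 10^n, rounding to the nearest integer (round off). */
static long shift_right(long sig, int n) {
    long p = 1;
    for (int i = 0; i < n; i++) p *= 10;
    return (sig + p / 2) / p;
}

/* Multiply two (positive) decimal FP numbers following the five steps above. */
static struct dfp dfp_mul(struct dfp a, struct dfp b) {
    struct dfp r;

    /* Step 1: add the exponents. */
    r.exp = a.exp + b.exp;                     /* 10 + (-5) = 5 */

    /* Step 2: multiply significands; the product is in units of 1e-6. */
    long prod = a.sig * b.sig;                 /* 1110 * 9200 = 10212000, i.e. 10.212000 */

    /* Steps 3 and 4: normalize to one digit before the point, rounding back to 4 digits. */
    r.sig = shift_right(prod, 3);              /* back to units of 1e-3: 10212 */
    while (r.sig >= 10000) {                   /* 10.212 x 10^5 -> 1.021 x 10^6 */
        r.sig = shift_right(r.sig, 1);
        r.exp += 1;
    }

    /* Step 5: sign of the product (+ x + = +; sign handling omitted for brevity). */
    return r;
}

int main(void) {
    struct dfp a = { 1110, 10 };   /* 1.110 x 10^10 */
    struct dfp b = { 9200, -5 };   /* 9.200 x 10^-5 */
    struct dfp p = dfp_mul(a, b);
    printf("%ld.%03ld x 10^%d\n", p.sig / 1000, p.sig % 1000, p.exp);  /* 1.021 x 10^6 */
    return 0;
}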
