COMP 30660: Computer Architecture and Organization (CONV)
Lecture 3: Data Representation in Computer Systems
                                        Madhusanka Liyanage
                                            School of Computer Science
                                        University College Dublin, Ireland
                                                 madhusanka@ucd.ie
                                                                                                        1
Learning Objectives
• Understand the fundamentals of numerical data
  representation in digital computers.
• Gain familiarity with the most popular character codes.
• Become aware of the differences between how data is
  stored in computer memory and how it is transmitted
  over networks.
• Understand the concepts of error detecting and
  correcting codes.
                                                            2
Data and Information
• Data can be defined as a representation of facts,
  concepts, or instructions in a formalized manner
  suitable for communication, interpretation, or
  processing by humans or electronic machines.
• Information is organized or classified data, which has
  some meaningful values for the receiver.
• Information is the processed data on which decisions
  and actions are based.
                                                         3
Basic Unit of Data
• Units of data are used to indicate the capacity of a
  data storage system or communication channel.
• Common units:
   –   Bit
   –   Byte
   –   Nibble
   –   Crumb
   –   Word
                                                   4
Bit
• A bit is the most basic unit of data in a computer.
   – It is a state of “on” or “off” in a digital circuit.
   – Sometimes these states are “high” or “low”
     voltage instead of “on” or “off”
                                                        5
Byte
• A byte is a group of eight bits.
   – On most modern machines,
     a byte is the smallest
     addressable unit
     of computer storage.
   – The term, “addressable,”
     means that a particular
     byte can be retrieved
     according to its location in
     memory.
                                     6
Nibble
• A group of four bits is called a nibble (or nybble).
   – Half a byte
   – Bytes, therefore, consist of two nibbles: a
     “high-order/Upper nibble” and a “low-
     order/lower nibble”.
   – Nibble is most often used in the context of
     hexadecimal number representations, since a
     nibble has the same amount of information as
     one hexadecimal digit.
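
A minimal sketch of the nibble/hex-digit relationship. The helper name `nibbles` is an assumption made for illustration; it splits one byte into its high-order and low-order nibbles with a shift and a mask.

```python
def nibbles(byte):
    """Split an 8-bit value into (high nibble, low nibble)."""
    high = (byte >> 4) & 0xF   # upper four bits
    low = byte & 0xF           # lower four bits
    return high, low

high, low = nibbles(0xA7)
# Each nibble prints as exactly one hexadecimal digit.
print(format(high, "X"), format(low, "X"))   # A 7
```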
                                                         7
Crumb
• A group of two bits (a quarter of a byte) is
  sometimes called a crumb.
   – The term was mostly used in early 8-bit computing.
                                                      8
      Word
• A word is a contiguous group of
  bytes.
   – Words can be any number of
     bits or bytes.
   – Word sizes of 16, 32, or 64
     bits are most common.
   – In a word-addressable
     system, a word is the
     smallest addressable unit of
     storage.
   – The number of bits in a word
     is usually defined by the size
     of the registers in the
     computer's CPU
                                      9
Data Representation
• Computers work with binary numbers.
• Therefore, numbers, letters, and other
  symbols must be converted into their binary
  equivalents.
Integers
           12
Integer Representation (Recap)
• The representation of a positive integer
  is quite straightforward
   – but we want to represent negative as well
     as positive numbers.
• Add a sign bit to the representation.
• For a positive number, the sign bit is set to 0;
  for a negative number, the sign bit is set to 1.
Integer Representation (Recap)
▪ An integer can be represented using a fixed-point
  representation.
▪ The leftmost bit is treated as the sign bit.
▪ The magnitude of the number is represented by the
  rest of the bits.
                                                 14
Integer Representation (Recap)
▪ A signed integer can be
  represented in one of the following three ways:
1. Signed magnitude representation.
2. Signed 1’s complement representation.
3. Signed 2’s complement representation.
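
The three representations can be compared side by side with a short sketch. The function names and the 8-bit default width are assumptions chosen for illustration, and the code assumes the value fits in the given width.

```python
def signed_magnitude(n, bits=8):
    """Sign bit followed by the magnitude in plain binary."""
    sign = "1" if n < 0 else "0"
    return sign + format(abs(n), f"0{bits - 1}b")

def ones_complement(n, bits=8):
    """Negative numbers: invert every bit of the positive pattern."""
    mask = (1 << bits) - 1
    value = n if n >= 0 else (~abs(n)) & mask
    return format(value, f"0{bits}b")

def twos_complement(n, bits=8):
    """Negative numbers: one's complement plus one (same as n mod 2**bits)."""
    mask = (1 << bits) - 1
    return format(n & mask, f"0{bits}b")

for f in (signed_magnitude, ones_complement, twos_complement):
    print(f.__name__, f(5), f(-5))
```

All three agree for positive values; they differ only in how the negative value is encoded.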
But how do we represent floating-point
numbers?
                                     16
Floating-Point Representation
• The signed magnitude, one’s
  complement, and two’s
  complement representations that
  we have just presented deal with
  integer values only.
• Without modification, these
  formats are not useful in
  scientific or business applications
  that deal with real number
  values.
• Floating-point representation
  solves this problem.
                                        17
Floating-Point: Scientific Notation
 • Scientific notation is a way of expressing numbers
   that are too large or too small to be conveniently
   written in decimal form.
    – For example:
       0.125 = 1.25 × 10⁻¹
       5,000,000 = 5.0 × 10⁶
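
As a quick check of the two examples, Python's `e` format specifier prints a number in decimal scientific notation. The helper name `sci` is an assumption used only for this sketch.

```python
def sci(x, digits):
    """Format x in scientific notation with the given number of fraction digits."""
    return f"{x:.{digits}e}"

print(sci(0.125, 2))        # 1.25e-01, i.e. 1.25 x 10^-1
print(sci(5_000_000, 1))    # 5.0e+06, i.e. 5.0 x 10^6
```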
                                                  18
Scientific Notation
• Scientific notation has a single nonzero digit to the left of the decimal point.
• Numbers written in scientific notation have three components: a sign, a significand (mantissa), and an exponent.
                                                                          19
 Floating-Point Representation
• Computers use a form of scientific notation for
  floating-point representation.
• The computer representation of a floating-point number
  consists of three fixed-size fields: a sign, an exponent,
  and a significand.
• The standard arrangement is the sign first, then the
  exponent, then the significand.
                                                      20
Floating-Point Representation
• The one-bit sign field is the sign of the stored value.
• The size of the exponent field determines the range
  of values that can be represented.
• The size of the significand (mantissa) determines the
  precision of the representation.
                                                      21
Example:
  For illustrative purposes, we use a 14-bit model with a 1-bit
  sign, a 5-bit exponent, and an 8-bit significand.
  • Example:
     – Express 32₁₀ in the simplified 14-bit floating-
       point model.
  • We know that 32 is 2⁵. So in (binary) scientific
    notation 32 = 1.0 × 2⁵.
  • Using this information, we put 101 (= 5₁₀) in the
    exponent field and 1 in the significand as shown.
                                                                  22
Example: synonymous forms
                  32 = 1.0 × 2⁵ = 0.1 × 2⁶ = 0.01 × 2⁷ = 0.001 × 2⁸ = 0.0001 × 2⁹
• The forms above are all
  equivalent representations
  for 32 using our
  simplified model.
• Not only do these
  synonymous
  representations waste
  space, but they can also
  cause confusion.
                                                                          23
Floating-Point Representation: Negative
exponents
  • Another problem with our system is that we have made
    no allowances for negative exponents.
  • E.g. there is no way to express 0.25 = 1/4 = 1.0 × 2⁻² = 0.1 × 2⁻¹
     – Notice that there is no sign in the exponent field!
                                                             24
IEEE-754 Representation
• A technical standard for floating-point arithmetic by
  the Institute of Electrical and Electronics Engineers
  (IEEE).
• The standard defines several interchange formats,
  including the single precision and double precision formats.
                                                    26
IEEE-754 Representation: How to Solve
synonymous Issue
• To resolve the problem of synonymous forms, we
  adopt a normalization rule: in our 14-bit model the
  first digit of the significand (the bit just after the
  binary point) must be 1, and the integer part must
  be zero.
       • e.g. 32 = 0.1 × 2⁶ (not 1.0 × 2⁵ or 0.01 × 2⁷)
• This results in a unique pattern for each floating-point
  number.
   – IEEE-754 itself normalizes the significand to the form
     1.f: the leading 1 sits to the left of the binary point
     and is implied, so it is not stored in the significand field.
                                                             27
IEEE-754 Representation: How to
Solve negative exponents
 • To provide for negative exponents, IEEE-754 uses a
   biased exponent.
 • A bias is a number that is approximately midway in
   the range of values expressible by the exponent.
 • The exponent field in IEEE-754 is filled by adding the
   bias to the true exponent value.
    – So we subtract the bias from the value in the
      exponent field to determine the true exponent.
 • Stored exponent values less than the bias correspond to
   negative true exponents, representing fractional numbers.
                                                            28
IEEE-754 Representation
• The IEEE-754 single precision floating-point
  format uses a bias of 127 with its 8-bit exponent.
• The double precision format uses a bias of 1023
  with its 11-bit exponent.
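
The single precision layout can be inspected with a short sketch that uses Python's standard `struct` module to obtain the 32-bit pattern of a value. The function name `float32_fields` is an assumption made for illustration; the field widths (1, 8, 23) and the bias of 127 are those of the single precision format.

```python
import struct

def float32_fields(x):
    """Return (sign, stored exponent, fraction) of x as an IEEE-754 single."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # raw 32-bit pattern
    sign = bits >> 31                  # 1 bit
    exponent = (bits >> 23) & 0xFF     # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF         # 23 bits, leading 1 is implied
    return sign, exponent, fraction

for value in (32.0, 0.25, -26.625):
    sign, exponent, fraction = float32_fields(value)
    print(value, sign, format(exponent, "08b"), format(fraction, "023b"),
          "true exponent:", exponent - 127)
```

Subtracting 127 from the stored exponent recovers the true exponent: 5 for 32.0 and -2 for 0.25.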
                                                       29
Example 1:
     – Express 32₁₀ in the revised 14-bit
       floating-point model with a 5-bit
       exponent and an 8-bit significand. Use
       16 as bias.
  • We know that 32 = 1.0 × 2⁵ = 0.1 × 2⁶.
  • To use our excess-16 biased exponent, we add 16 to
    6, giving 22₁₀ (= 10110₂).
  • The stored pattern is 0 10110 10000000
    (sign, exponent, significand).
                                                         30
Example 2:
    – Express 0.0625₁₀ in the revised 14-bit
      floating-point model with a 5-bit
      exponent and an 8-bit significand. Use
      16 as bias.
 • We know that 0.0625 is 2⁻⁴. So, in (binary) scientific
   notation 0.0625 = 1.0 × 2⁻⁴ = 0.1 × 2⁻³.
 • To use our excess-16 biased exponent, we add
   16 to -3, giving 13₁₀ (= 01101₂).
                                                            31
Example 3 (To Do):
     – Express -26.625₁₀ in the revised 14-bit
       floating-point model with a 5-bit
       exponent and an 8-bit significand. Use 16
       as bias.
  • We find 26.625₁₀ = 11010.101₂. Normalizing, we have:
    26.625₁₀ = 0.11010101 × 2⁵.
  • To use our excess-16 biased exponent, we add 16 to 5,
    giving 21₁₀ (= 10101₂).
  • We also need a 1 in the sign bit (for a negative
    number).
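
The three worked examples can be checked with a small sketch of the revised 14-bit model. The function name `encode14` is an assumption made for illustration. The code assumes a nonzero input, normalizes to the 0.1xxxxxxx form used in the examples, and simply truncates any significand bits beyond the eighth.

```python
import math

def encode14(value, bias=16):
    """Encode a nonzero value as 'sign exponent significand' in the 14-bit model."""
    sign = 1 if value < 0 else 0
    # frexp gives abs(value) = frac * 2**exp with 0.5 <= frac < 1,
    # which is exactly the 0.1xxxxxxx (binary) normalization.
    frac, exp = math.frexp(abs(value))
    significand = int(frac * 256)      # first 8 bits after the binary point
    biased = exp + bias                # excess-16 exponent
    if not 0 <= biased <= 31:
        raise OverflowError("exponent does not fit in 5 bits")
    return f"{sign} {biased:05b} {significand:08b}"

print(encode14(32))        # 0 10110 10000000
print(encode14(0.0625))    # 0 01101 10000000
print(encode14(-26.625))   # 1 10101 11010101
```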
                                                       32
What about Characters?
                         33
Character Codes
                  34
Character Codes
• Calculations are not useful until their results can
  be displayed in a manner that is meaningful to
  people.
• We also need to store the results of calculations and
  give meaning to data input.
• Thus, human-understandable characters must be
  converted to computer-understandable bit patterns
  (and vice versa) using some sort of character
  encoding scheme.
• Character codes are used for this purpose.
                                                        35
Character Codes :
Binary-coded decimal (BCD)
• The earliest computer coding systems used six bits.
• Binary-coded decimal (BCD) was one of these early
  codes.
• In BCD, each digit is represented by a fixed number
  of bits, usually four or eight.
• It was used by IBM mainframes in the 1950s and
  1960s.
• As computers have evolved, character codes have
  evolved.
• Larger computer memories and storage devices
  permit richer character codes.
                                                        36
Character Codes : EBCDIC
• In 1964, BCD was extended to an 8-bit code,
  Extended Binary-Coded Decimal Interchange
  Code (EBCDIC).
• EBCDIC was one of the first widely-used computer
  codes that supported upper and lowercase
  alphabetic characters, in addition to special
  characters, such as punctuation and control
  characters.
• EBCDIC and BCD are still in use by IBM
  mainframes today.
                                                37
ASCII (American Standard Code for
Information Interchange)
 • Other computer manufacturers chose the 7-bit
   ASCII (American Standard Code for Information
   Interchange) as a replacement for 6-bit codes.
 • Until recently, ASCII was the dominant character
   code outside the IBM mainframe world.
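
A short sketch of the character-to-bit-pattern mapping for ASCII. The helper name `ascii_codes` is an assumption made for illustration; it relies on Python's built-in `ord`, which returns the same values as ASCII for these characters.

```python
def ascii_codes(text):
    """Return (character, decimal code, 7-bit pattern) for each ASCII character."""
    return [(ch, ord(ch), format(ord(ch), "07b")) for ch in text]

for ch, code, bits in ascii_codes("OK"):
    print(ch, code, bits)
# O 79 1001111
# K 75 1001011
```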
                                                    39
The ASCII Code
                 40
Unicode
Unicode
• Many of today’s systems embrace Unicode, which was
  originally designed as a 16-bit code and now has a
  code space running from U+0000 to U+10FFFF, enough
  for the characters of every language in the world.
• Version 14.0 defines 144,697 characters covering 159 modern and
  historic scripts, as well as symbols, emoji, and
  non-visual control and formatting codes.
• Maintained by the Unicode Consortium
                                                      43
 Unicode
• The Unicode code
  space allocation is
  shown at the right.
• The lowest-numbered
  Unicode characters
  comprise the ASCII
  code.
• The highest provide for
  user-defined codes.
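
A small sketch showing that the lowest Unicode code points coincide with ASCII and that some characters lie beyond the 16-bit range. The helper name `describe` is an assumption, and the UTF-8 and UTF-16 encodings it uses are not covered on these slides; they are included only to show how code points end up as bytes.

```python
def describe(ch):
    """Return (code point, UTF-8 bytes in hex, number of 16-bit UTF-16 units)."""
    code_point = ord(ch)
    utf8_hex = ch.encode("utf-8").hex()
    utf16_units = len(ch.encode("utf-16-be")) // 2
    return f"U+{code_point:04X}", utf8_hex, utf16_units

print(describe("A"))            # same code as ASCII: 65 = 0x41
print(describe("\u20ac"))       # euro sign, fits in 16 bits
print(describe("\U0001F600"))   # emoji above U+FFFF, needs two 16-bit units
```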
                            44
Data Recording and Transmission
                                  45
Codes for Data Recording and
Transmission
• When character codes or numeric values are stored in
  computer memory, their values are unambiguous (Fixed).
• However, this is not always the case when data is stored
  on magnetic disk or transmitted over a distance of more
  than a few feet.
   – Owing to the physical irregularities of data
     storage and transmission media, bytes can
     become distorted or garbled.
• Data errors are reduced by use of suitable coding
  methods as well as through the use of various error-
  detection techniques.
                                                         46
  Codes for Data Recording
  and Transmission
• To transmit data, pulses of “high” and “low” voltage
  are sent across communications media.
• To store data, changes are induced in the magnetic
  polarity of the recording medium.
• The period of time during which a bit is transmitted,
  or the area of magnetic storage within which a bit is
  stored is called a bit cell.
                                                         47
 Non-Return-to-Zero (NRZ)
• The simplest data recording and transmission code
  is the non-return-to-zero (NRZ) code.
• NRZ encodes 1 as “high” and 0 as “low.”
• The coding of OK (in ASCII) is shown below.
       The problem with NRZ code is that long strings of
       zeros and ones cause synchronization loss.
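
A minimal sketch of NRZ, writing "H" for a high level and "L" for a low level in each bit cell. The function names are assumptions, and each ASCII character is assumed to be sent as 8 bits with a leading zero.

```python
def ascii_bits(text):
    """Concatenate the 8-bit patterns of the characters in text."""
    return "".join(format(b, "08b") for b in text.encode("ascii"))

def nrz(bits):
    """NRZ: 1 is sent as high, 0 as low, one level per bit cell."""
    return "".join("H" if b == "1" else "L" for b in bits)

bits = ascii_bits("OK")
print(bits)        # 0100111101001011
print(nrz(bits))   # LHLLHHHHLHLLHLHH
```

The run "HHHH" in the output contains no transition, which is where a receiver can lose track of the bit cell boundaries.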
                                                           48
  Non-return-to-zero-invert (NRZI)
• Non-Return-to-Zero-Invert (NRZI) reduces this
  synchronization loss by providing a transition (either
  low-to-high or high-to-low) for each binary 1 and no
  transition for binary zero (0)
       Although it prevents loss of synchronization over long
       strings of binary ones, NRZI coding does nothing to
       prevent synchronization loss within long strings of zeros
                                                                   49
  Manchester coding
• Manchester coding (also known as phase modulation)
  prevents this problem by encoding a binary one with an
  “up” transition and a binary zero with a “down” transition.
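
A minimal sketch of Manchester coding using the convention stated above: each bit cell is split into two halves, a 1 is low-then-high ("up") and a 0 is high-then-low ("down"). The function name is an assumption made for illustration.

```python
def manchester(bits):
    """Manchester: 1 -> 'LH' (up transition), 0 -> 'HL' (down transition)."""
    return "".join("LH" if b == "1" else "HL" for b in bits)

print(manchester("10"))     # LHHL
print(manchester("0000"))   # HLHLHLHL: a transition in every bit cell
print(manchester("1111"))   # LHLHLHLH
```

Every bit cell contains a mid-cell transition, so long runs of identical bits no longer cause synchronization loss, at the cost of two signal levels per bit.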
                                                         50
Error Detection and Correction
                                 51
Error Detection and Correction
 • It is physically impossible for any data recording or
   transmission medium to be 100% perfect 100% of the
   time over its entire expected useful life.
 • As more bits are packed onto a square centimeter of
   disk storage, and as communications transmission speeds
   increase, the likelihood of error increases.
 • Thus, error detection and correction is critical to
   accurate data transmission, storage and retrieval.
                                                         52
    Types of Error
• Single bit error
   – Only one bit in the
     data unit has
     changed.
• Burst error
   – Two or more bits
     in the data unit
     have changed.
                           53
           Error detection/correction
• Error detection
  – Checks whether any error has occurred
  – The number of errors does not matter
  – The positions of the errors do not matter
• Error correction
  – Needs to know the number of errors
  – Needs to know the positions of the errors
  – More difficult
                                            54
Error Detection
• An error-detecting code includes
  only enough redundancy to allow
  the receiver to deduce that an error
  occurred, but not which error, so that
  it can request a retransmission.
• Error detection uses the concept of
  redundancy, which means adding
  extra bits for detecting error at the
  destination.
                                          55
Redundancy
• For error detection, a
  shorter group of bits may
  be appended to the end
  of each unit.
• This technique is called
  Redundancy because the
  extra bits are redundant
  to the information.
• They are discarded as
  soon as the accuracy of
  the transmission has
  been determined.
                              56
Error Detection Techniques
• Some popular techniques for error detection are:
   –   Parity check
   –   Checksum
   –   Cyclic redundancy check
   –   Cryptographic hash function
                                                 57
    Parity check
•   A check bit (parity bit) is added to each block of data.
•   Two methods
     – Even parity checking
     – Odd parity checking
•   Even parity checking
     – 1 is added to the block if the data
       contains an odd number of 1’s,
     – 0 is added if the data contains an even
       number of 1’s.
     – Adding the parity bit makes the total
       number of 1’s even, which is
       why it is called even parity checking.
•   Odd parity checking
     – 0 is added to the block if the data
       contains an odd number of 1’s,
     – 1 is added if the data contains an even
       number of 1’s.
     – Adding the parity bit makes the total
       number of 1’s odd, which is
       why it is called odd parity checking.
•   Limitations
     – Can detect only an odd number of bit errors.
     – Only useful for detecting errors, not for correcting them.
                                                                           58
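
A minimal sketch of parity checking on bit strings. The function names `parity_bit` and `parity_ok` are assumptions made for illustration, and the parity bit is assumed to be appended at the end of the block.

```python
def parity_bit(data_bits, even=True):
    """Return the parity bit ('0' or '1') for a string of data bits."""
    odd_ones = data_bits.count("1") % 2      # 1 if the data has an odd number of 1s
    return str(odd_ones if even else 1 - odd_ones)

def parity_ok(block, even=True):
    """Check a received block (data bits followed by the parity bit)."""
    return block.count("1") % 2 == (0 if even else 1)

data = "1001111"                        # ASCII 'O', five 1s
block = data + parity_bit(data)         # even parity appends '1'
print(block, parity_ok(block))          # 10011111 True
print(parity_ok("00011111"))            # one bit flipped: False (detected)
print(parity_ok("01011111"))            # two bits flipped: True (missed)
```

The last line shows the limitation: an even number of bit errors leaves the parity unchanged and goes undetected.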
     Checksum
• A small data block derived
  from transmitted/stored digital
  data for the purpose of
  detecting errors that may have
  been introduced during its
  transmission or storage.
• The procedure which
  generates this checksum is
  called a checksum function
  or checksum algorithm.
• E.g. a checksum of a message
  can be a modular arithmetic
  sum of message code words of
  a fixed word length
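
A minimal sketch of a modular-sum checksum, assuming 8-bit code words and a sum taken modulo 256. The function name `checksum8` is an assumption; real protocols define their own word lengths and variants.

```python
def checksum8(data: bytes) -> int:
    """Sum all bytes of the message modulo 256."""
    return sum(data) % 256

message = b"OK"
check = checksum8(message)              # sender computes and transmits this
print(check)                            # 154  (79 + 75)

print(checksum8(b"OL") == check)        # corrupted message: False (detected)
print(checksum8(b"KO") == check)        # reordered bytes: True (not detected)
```

A plain sum does not depend on the order of the code words, which is one reason stronger techniques exist.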
                                    59
Homework
• Find out what the following are:
   – Cyclic redundancy check
   – Cryptographic hash function
                                   60
Summary
• Understand the fundamentals of numerical data
  representation in digital computers.
• Gain familiarity with the most popular character
  codes.
• Become aware of the differences between how
  data is stored in computer memory and how it is
  transmitted over telecommunication lines.
• Understand the concepts of error detecting and
  correcting codes.
                                                     61
Thank You
            62