Introduction to
Information Theory
          Nguyen Phuong Thai
About me
Name: Nguyễn Phương Thái
Address: Natural Language Processing Laboratory, Institute of
 Artificial Intelligence, VNU UET
Email: thainp@vnu.edu.vn Phone number: 090 205 9945
Homepage: https://nlp.uet.vnu.edu.vn
Research interests
    Natural language processing
    Machine learning
    Linguistic annotation
Related Courses in the CS Program
Probability and statistics
Elective courses (Group 1)
  Artificial intelligence
  Machine learning
  Natural language processing
  Bioinformatics
  Speech processing
  …
What is Information Theory?
Information theory studies the quantification,
 representation, and communication of information.
It was originally proposed by Claude Shannon in 1948.
What is information theory?
(A video from Khan Academy)
https://youtu.be/d9alWZRzBWk
Claude Shannon, 1916-2001
Communication
Communication Theory
In 1948, Claude E. Shannon of Bell Labs published the
 paper “A Mathematical Theory of Communication”
 and single-handedly created this field. (The paper is
 available on the web.)
Shannon’s work grew out of solving problems of
 electrical communication during WWII, but information
 theory applies to many other fields as well.
Key idea: Communication is sending information from
 one place and/or time to another place and/or time over a
 medium that might have errors.
Main Contributions of the Paper
It was the first time probability theory was applied to
 communication problems.
The innovative idea was that “information” (from any
 source) has a digital nature.
The concept of “entropy” was invented and used to
 measure the “complexity” or the “randomness” of the
 source.
The paper also set a foundation for data compression,
 coding, and decoding with error detection and correction.
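As a small illustration of the entropy idea above (a sketch of my own; the function name is not from the lecture), the entropy of a discrete source can be computed directly from its symbol probabilities:

```python
import math

def entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A uniform 4-symbol source is maximally random: 2 bits per symbol.
print(entropy([0.25, 0.25, 0.25, 0.25]))  # → 2.0

# A skewed source is more predictable, so its entropy is lower.
print(entropy([0.7, 0.1, 0.1, 0.1]))
```

The more “random” the source, the higher its entropy; a completely predictable source has entropy zero.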
Shannon’s Communication Model
The paper introduced a model for communication:
 encoding, sending a signal through a channel, and
 decoding.
Source
Voice
Words
Pictures
Music, art
Human cells about to reproduce
Human parents about to reproduce
Sensory input of biological organism
or any signal at all (any binary data)
Channel
Telephone line
High frequency radio link
Storage (disk, tape, internet, TCP/IP, Facebook, Twitter):
 transmission through time rather than space; there could be
 degradation due to decay
Biological organism (send message from brain to foot, or
 from ear to brain)
Receiver
The destination of the information transmitted
Person,
Computer
Disk
Analog Radio or TV
Internet streaming audio system
Noise
Represents our imperfect understanding of the universe.
Thus, we treat it as random, though it often obeys some
 rules, such as those of a probability distribution.
Encoder
Processing done before placing information into the channel
First stage: data reduction (keep only the important bits,
 removing source redundancy), followed by redundancy
 insertion catered to the channel.
A code = a mechanism for representing the information
 of one signal by another.
An encoding = a representation of information in another
 form.
Decoder
Exploit and then remove redundancy
Detect and fix any transmission errors
Restore the information to its original form
Example: Human Speech
Source = human thought, the speaker’s brain
Encoder = Human Vocal Tract
Channel = air, sound pressure waves
Noise = background noise
Decoder = human auditory system
Receiver = human thought, the listener’s brain
Example: Morse Code
 Morse code: a series of dots and
  dashes representing letters
 The most frequent letter (E) is sent
  with the shortest code: a single dot
 Note: codewords may be prefixes
  of each other (e.g., “E” and “F”)
 Uses only binary data (single-current
  telegraph, a size-two “alphabet”);
  more symbols are possible (three, with a
  double-current telegraph), but that is
  more susceptible to noise (hence binary
  rather than ternary in computers)
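The points above can be sketched with a tiny fragment of the International Morse table (only these five standard codewords; the function name is my own):

```python
# A few International Morse codewords; frequent letters get shorter codes.
# Note that "E" (".") is a prefix of "F" ("..-."): Morse is not prefix-free,
# which works only because pauses separate the codewords on the wire.
MORSE = {"E": ".", "T": "-", "A": ".-", "N": "-.", "F": "..-."}

def to_morse(text):
    """Encode a word as space-separated Morse codewords."""
    return " ".join(MORSE[ch] for ch in text.upper())

print(to_morse("ant"))  # → ".- -. -"
```

The space between codewords plays the role of the telegraph operator’s pause; without it, a dot could be the start of “E”, “A”, or “F”.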
On Error
How do we decrease errors in a communication system?
  Physical approaches: use more reliable components in circuitry,
   broaden spectral bandwidth, use more precise and expensive
   electronics, increase signal power.
  All of this is more expensive.
Question: Given a fixed, imperfect analog channel and
 transmission equipment, can we still achieve error-free
 communication?
  Yes: the key is to add redundancy to the signal.
  The encoder adds redundancy appropriate for the channel; the
   decoder exploits and then removes it.
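The simplest instance of this idea is a repetition code with majority-vote decoding (a minimal sketch of my own, not the lecture’s example): the encoder adds redundancy by repeating each bit, and the decoder exploits and removes it by voting.

```python
def rep_encode(bits, r=3):
    """Repetition code: add redundancy by sending each bit r times."""
    return [b for b in bits for _ in range(r)]

def rep_decode(received, r=3):
    """Majority vote over each group of r bits corrects isolated flips."""
    return [int(sum(received[i:i + r]) > r // 2)
            for i in range(0, len(received), r)]

message = [1, 0, 1, 1]
sent = rep_encode(message)
sent[4] ^= 1                 # the noisy channel flips one bit
print(rep_decode(sent))      # the error is corrected → [1, 0, 1, 1]
```

The price of this reliability is rate: we send three channel bits per message bit. Shannon’s surprise was that much cleverer codes can drive the error rate to zero without the rate going to zero.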
The Meaning of Information
For years, telegraph messages were “compressed” by leaving
  certain words out, or by sending key words that stood for longer
  messages, since costs were determined by the number of
  words sent.
    Yet people could easily read these abbreviated messages, since
     they mentally supplied the predictable words, such as “a” and “the.”
For Shannon, information consists of symbols that carry
  unpredictable news, like our sentence, “only infrmatn esentil
  to understandn mst b tranmitd.”
    The predictable symbols that we can leave out, which Shannon
     calls redundancy, are not really news.
The Meaning of Information
Another example is coin flipping. Each time we flip a
 coin, we can transmit which way it lands, “heads” or
 “tails,” by transmitting a code of “zero” or “one.”
But what if the coin has two “heads” and everyone knows
 it? Since there is no uncertainty concerning the outcome
 of a flip, no message need be sent at all.
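The coin example can be made precise with the binary entropy function (a sketch of my own; the name is not from the lecture): a fair coin carries one bit per flip, while the two-headed coin carries none, so no message is needed.

```python
import math

def coin_entropy(p):
    """Entropy, in bits, of a coin that lands heads with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # no uncertainty, nothing to transmit
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(coin_entropy(0.5))  # fair coin → 1.0 bit per flip
print(coin_entropy(1.0))  # two-headed coin → 0.0 bits
```

Any bias between these extremes gives an entropy strictly between 0 and 1 bit, which is exactly what makes compression possible.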
Outline of the Course
 Probability and statistics
 Entropy, relative entropy, and
  mutual information
 Data compression
 Gambling
 Channel capacity
Grading Policy
Class attendance (x 0.1)
Midterm examination (x 0.3)
Final examination (x 0.6)
THANK YOU!