Prepared by: Imran Raza Khan
(Assistant Professor, ACET, Aligarh)
           INFORMATION THEORY FOR CYBER SECURITY
               Unit I: Introduction To Information Theory
1. Shannon’s Foundation of Information Theory
Introduction
   •   Claude E. Shannon, in 1948, introduced the mathematical theory of
       communication, which laid the foundation of modern information theory.
   •   Information theory is a mathematical framework for quantifying
       information, data compression, and transmission; in machine learning, it
       also provides powerful tools for analyzing and improving algorithms.
Key Concepts Introduced by Shannon
1.1 Bit – The Basic Unit of Information
   •   A bit (binary digit) is the most basic unit of information.
   •   It represents a choice between two equally probable alternatives (0 or 1).
   •   If a message conveys one of two equally likely outcomes, it carries 1 bit of
       information.
1.2 Information as Reduction of Uncertainty
   •   Information is fundamentally linked to uncertainty.
   •   When we receive new data, it reduces our uncertainty about the system or
       environment.
1.3 Entropy – Measuring Information
   •   Entropy quantifies the average amount of information produced by a source.
   •   It is a measure of the uncertainty in a random variable.
Example:
For a fair coin toss:
   •   Probability of heads = 0.5, tails = 0.5
   •   Entropy: H = -(0.5 log₂ 0.5 + 0.5 log₂ 0.5) = 1 bit
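The entropy of this coin toss can be computed numerically. Below is a minimal sketch in plain Python (no external libraries; the function name is ours, not from any particular library):

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)), skipping zero-probability terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin (two equally likely outcomes) carries exactly 1 bit.
print(shannon_entropy([0.5, 0.5]))   # 1.0
# A biased coin is more predictable, so it carries less information.
print(shannon_entropy([0.9, 0.1]))   # ≈ 0.469
```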
1.4 Mutual Information
   •   Mutual information measures the amount of information obtained about one
       random variable through another random variable. It quantifies the dependency
       between variables.
   •   Interpretation: Mutual information is zero if X and Y are independent;
       higher values indicate greater dependency.
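These two facts (zero for independence, larger for stronger dependency) can be verified on toy joint distributions. The sketch below uses a small dict-based representation of our own choosing:

```python
import math

def mutual_information(joint):
    """I(X; Y) = sum over (x, y) of p(x, y) * log2(p(x, y) / (p(x) * p(y))).

    `joint` maps pairs (x, y) to probabilities; marginals are derived from it."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Independent fair bits: knowing Y says nothing about X, so I(X; Y) = 0.
indep = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
# Perfectly correlated bits (Y = X): Y reveals X fully, so I(X; Y) = H(X) = 1 bit.
copied = {(0, 0): 0.5, (1, 1): 0.5}
print(mutual_information(indep))    # 0.0
print(mutual_information(copied))   # 1.0
```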
1.5 Kullback-Leibler (KL) Divergence
KL divergence measures the difference between two probability distributions. It is often
used in machine learning to compare the predicted probability distribution with the true
distribution.
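As a sketch of that comparison, the snippet below computes D_KL(P || Q) in bits for small discrete distributions (the function and the example distributions are illustrative only):

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum p_i * log2(p_i / q_i); assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

true_dist = [0.5, 0.5]    # true distribution of a fair bit
predicted = [0.8, 0.2]    # a model's (mis)estimate of it
print(kl_divergence(true_dist, true_dist))   # 0.0: identical distributions
print(kl_divergence(true_dist, predicted))   # ≈ 0.32 bits: penalty for the mismatch
```

Note that KL divergence is asymmetric: D_KL(P || Q) generally differs from D_KL(Q || P).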
Why Shannon’s Theory is Important in Cyber Security
   •     Provides limits on compression and encryption.
   •     Helps in understanding perfect secrecy and data leakage.
   •     Forms the basis for designing efficient and secure communication systems.
2. Random Variables
Definition
A random variable is a fundamental concept in statistics that bridges the gap
between theoretical probability and real-world data. Formally, a random variable
is a function that assigns a real value to each outcome in the sample space of a
random experiment. For example, if you roll a die, you can assign a number to
each possible outcome.
There are two basic types of random variables:
   •     Discrete Random Variables (which take on specific values).
   •     Continuous Random Variables (assume any value within a given range).
We define a random variable as a function that maps from the sample space of an
experiment to the real numbers. Mathematically, Random Variable is expressed as,
X: S →R
where,
   •     X is Random Variable (It is usually denoted using capital letter)
   •     S is Sample Space
   •     R is Set of Real Numbers
1. Discrete Random Variable
   •   Takes on finite or countably infinite values.
   •   Example: Number shown on a rolled die → Values: {1, 2, 3, 4, 5, 6}
Probability Mass Function (PMF)
   •   The PMF gives p(x) = P(X = x), the probability that X takes the exact
       value x, with ∑ p(x) = 1.
2. Continuous Random Variable
   •   Takes uncountably infinite values over an interval.
   •   Example: Temperature measurement → Any value in a real range
Probability Density Function (PDF)
   •   The PDF f(x) gives relative likelihood; probabilities come from areas:
       P(a ≤ X ≤ b) = ∫ f(x) dx over [a, b], with total area 1.
3. Cumulative Distribution Function (CDF)
   •   Works for both types of random variables.
   •   It gives the probability that a random variable is less than or equal to
       a certain value: F(x) = P(X ≤ x).
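The PMF and CDF can be sketched for the die example in plain Python (the continuous PDF case would need integration, so only the discrete functions are shown; the names are ours):

```python
def die_pmf(x):
    """PMF of a fair six-sided die: P(X = x) = 1/6 for x in 1..6, else 0."""
    return 1 / 6 if x in range(1, 7) else 0.0

def die_cdf(x):
    """CDF: F(x) = P(X <= x), obtained by summing the PMF up to x."""
    return sum(die_pmf(k) for k in range(1, int(x) + 1))

print(die_pmf(3))    # 1/6 ≈ 0.1667
print(die_cdf(4))    # 4/6 ≈ 0.6667: probability of rolling 4 or less
```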
4. Expectation and Variance
   •   Expectation (mean): E[X] = ∑ x p(x) for a discrete variable (an integral
       for a continuous one).
   •   Variance: Var(X) = E[(X - E[X])²] = E[X²] - (E[X])².
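For the fair die, these two moments can be computed directly; a minimal sketch with a dict-based PMF of our own choosing:

```python
def expectation(pmf):
    """E[X] = sum of x * p(x) over a discrete distribution given as {value: prob}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E[X^2] - (E[X])^2."""
    mean = expectation(pmf)
    return sum(x * x * p for x, p in pmf.items()) - mean ** 2

die = {x: 1 / 6 for x in range(1, 7)}
print(expectation(die))   # 3.5
print(variance(die))      # 35/12 ≈ 2.9167
```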
Relevance to Information Theory
   •   Random variables model the behavior of data sources and channels.
   •   Used in computing entropy, mutual information, and uncertainty.
   •   Help in assessing security threats by modeling attackers' knowledge
       probabilistically.
3. Probability Distribution Factors
What is a Probability Distribution?
A probability distribution is a mathematical function or rule that describes how the
probabilities of different outcomes are assigned to the possible values of a random
variable. It provides a way of modeling the likelihood of each outcome in a random
experiment.
While a frequency distribution shows how often outcomes occur in a sample or
dataset, a probability distribution assigns probabilities to outcomes abstractly,
theoretically, regardless of any specific dataset. These probabilities represent the
likelihood of each outcome occurring.
1. Types of Probability Distributions
1. Discrete Probability Distribution
   •   For discrete random variables
   •   Described using a Probability Mass Function (PMF)
2. Continuous Probability Distribution
   •   For continuous random variables
   •   Described using a Probability Density Function (PDF)
Property         Discrete Case                  Continuous Case
Described by     PMF p(x) = P(X = x)            PDF f(x)
Probability      P(X = x) = p(x)                P(a ≤ X ≤ b) = ∫ f(x) dx
Normalization    ∑ p(x) = 1                     ∫ f(x) dx = 1
3. Important Distributions in Information Theory
Distribution   Type                  Use in Cyber Security / Info Theory
Uniform        Discrete/Continuous   Ideal case for maximum entropy
Bernoulli      Discrete              Binary communication channels
Binomial       Discrete              Repeated trials (e.g., packet loss)
Geometric      Discrete              First success timing (e.g., retries)
Poisson        Discrete              Modeling rare events (e.g., attacks)
Gaussian       Continuous            Noise in communication channels
Exponential    Continuous            Time between random events
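As one worked instance from this table, the binomial row (packet loss) can be sketched with the standard PMF formula; the loss probability below is a made-up figure for illustration:

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) * p^k * (1-p)^(n-k): k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical link: 10 packets sent, each lost independently with probability 0.1.
p_no_loss = binomial_pmf(0, 10, 0.1)   # all packets arrive
p_some_loss = 1 - p_no_loss            # at least one packet lost
print(round(p_no_loss, 4))     # 0.3487
print(round(p_some_loss, 4))   # 0.6513
```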
4. Application in Information Security
   •   Entropy is calculated using a distribution’s probabilities.
   •   Mutual information and conditional entropy also depend on joint and marginal
       distributions.
   •   Used to model uncertainty about system states and measure information
       leakage.
4. Uncertainty and Entropy Information Measures
What is Uncertainty in Information Theory?
   •   Uncertainty refers to lack of knowledge about the outcome of a random
       variable.
   •   The more uncertain you are about an event, the more information is gained
       when it occurs.
   •   In information theory, uncertainty is quantified by entropy, which measures the
       average amount of information needed to describe the outcome of a random
       variable. Essentially, it reflects how unpredictable or random a system is. A
       higher entropy indicates greater uncertainty, while a lower entropy signifies less
       uncertainty.
What is Entropy?
   •   Entropy (H) quantifies the average uncertainty (or information) in a random
       variable.
   •   Proposed by Claude Shannon as the core of information theory.
   •   In information theory, entropy is a measure of uncertainty or randomness
       associated with a random variable or a data source. It quantifies the average
       amount of information needed to describe an outcome from that source. A
       higher entropy indicates more randomness and unpredictability, while a lower
       entropy suggests more order and predictability.
Shannon Entropy Formula
For a random variable X with outcome probabilities p(x):
       H(X) = -∑ p(x) log₂ p(x)
With base-2 logarithms, entropy is measured in bits.
Properties of Entropy
   •   H(X) ≥ 0, with equality only when one outcome is certain.
   •   For n possible outcomes, H(X) ≤ log₂ n; the maximum is reached by the
       uniform distribution.
Special Cases
   •   Certain outcome (probability 1): H = 0 (no uncertainty).
   •   Fair coin: H = 1 bit. Fair die: H = log₂ 6 ≈ 2.58 bits.
Other Entropy-Based Measures
   •   Mutual information I(X; Y): measures the information shared between X and Y.
   •   Conditional entropy H(X | Y): the uncertainty about X that remains once
       Y is known.
Relevance in Cyber Security
   •   Entropy helps:
          o   Assess password strength
          o   Measure information leakage
          o   Evaluate encryption quality
   •   High entropy ⇒ harder to predict ⇒ more secure
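The password-strength point can be made concrete: for a password drawn uniformly at random, entropy grows linearly with length and logarithmically with alphabet size. A minimal sketch (the sizes below are illustrative):

```python
import math

def password_entropy_bits(alphabet_size, length):
    """Entropy of a uniformly random password: H = length * log2(alphabet_size)."""
    return length * math.log2(alphabet_size)

# 8 lowercase letters vs 12 mixed-case letters and digits (26 vs 62 symbols).
print(round(password_entropy_bits(26, 8), 1))    # ≈ 37.6 bits: weak
print(round(password_entropy_bits(62, 12), 1))   # ≈ 71.5 bits: far harder to brute-force
```

This only holds for randomly generated passwords; human-chosen passwords have far less entropy than their length suggests.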
5. Leakage and Quantifying Leakage
What is Information Leakage?
   •   Information leakage refers to the unintended or unauthorized disclosure of
       sensitive information.
   •   Even if a system is encrypted, patterns or partial data may leak useful
       information to an attacker.
   •   Information leakage, also known as data leakage, refers to the unauthorized
       exposure or disclosure of sensitive or confidential information. This can occur
       both internally, through employee negligence or malicious intent, and externally,
       through cyberattacks or misconfigured systems. Information leakage can have
       severe consequences, including financial losses, reputational damage, and legal
       repercussions.
Causes of Information Leakage:
   •   Human error: Negligence, misconfigurations, or accidental sharing of
       information through email or messaging platforms.
   •   Insider threats: Malicious or disgruntled employees intentionally exfiltrating
       data.
   •   Vulnerabilities in systems and applications: Exploiting weaknesses in
       software or network configurations.
   •   Improper disposal of sensitive information: Discarding documents or devices
       without proper sanitization.
   •   Lack of security awareness: Insufficient training or understanding of security
       protocols among employees.
Types of Information Leakage
Type                  Description
Explicit Leakage      Directly observable data leaks (e.g., error messages, logs)
Implicit Leakage      Leaks through system behavior or patterns (e.g., timing, size of responses)
Side-channel Leakage  Leaks via physical observation (e.g., power consumption, electromagnetic signals)
Measuring Information Leakage
1. Mutual Information Leakage
   •   Leakage = I(X; Y): the mutual information between the secret X and the
       attacker's observation Y.
2. Min-Entropy Leakage
   •   Focuses on the worst-case probability of guessing the secret.
   •   Leakage = H∞(X) - H∞(X | Y), where H∞(X) = -log₂ (max p(x)).
3. Shannon Leakage
   •   Based on average information leaked.
   •   Uses the expected reduction in entropy: Leakage = H(X) - H(X | Y).
Examples
   1. Password System
           o   User has a 4-digit PIN → 10,000 possibilities → H = log₂ 10000 ≈ 13.29 bits
           o   If attacker learns first digit is 3 → uncertainty drops → leakage occurs
   2. Side-Channel Example
           o   Power trace from a cryptographic device leaks key bits during encryption
           o   Leakage is quantified by how much the power usage correlates with the
               secret key bits
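The PIN example above can be checked numerically; the arithmetic is just a difference of logarithms:

```python
import math

# Uncertainty about a 4-digit PIN before and after the attacker learns one digit.
h_before = math.log2(10_000)   # ≈ 13.29 bits: 10,000 equally likely PINs
h_after = math.log2(1_000)     # ≈ 9.97 bits: first digit known, 1,000 PINs remain
leakage = h_before - h_after   # ≈ 3.32 bits leaked (exactly log2(10))
print(round(h_before, 2), round(h_after, 2), round(leakage, 2))   # 13.29 9.97 3.32
```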
Partitioning and Leakage
   •   The concept of partitions comes into play in systems where data is grouped into
       observable categories.
   •   Leakage can be seen as a function of how finely or coarsely these partitions are
       made.
Security Goals Related to Leakage
Goal                    Requirement
Perfect Secrecy         I(X; Y) = 0 (no information leaked)
Computational Secrecy   Leakage is computationally infeasible to exploit
Differential Privacy    Controlled leakage when releasing statistical information
Why This Matters in Cyber Security
   •   Even if encryption is used, metadata, system behavior, or execution timing
       may leak information.
   •   Quantifying leakage helps in:
          o   Designing more secure systems
          o   Evaluating risk
          o   Setting privacy guarantees
6. Partitions and Lower Bounds on Key Size
In Information Theory, a partition refers to dividing a set of elements into non-
overlapping subsets, called blocks, that collectively cover the entire set. These
partitions are fundamental to understanding information content and relationships
between data. They are used to define concepts like entropy, mutual information, and
variation of information.
A. Partitions in Information Theory
A partition is a way to divide the possible values of a random variable into non-
overlapping subsets. In the context of cyber security:
   •   Partitions represent what an attacker can distinguish based on their
       observations.
   •   Finer partitions ⇒ More detailed observations ⇒ Higher information leakage
Example:
If a secret key K can be in {00, 01, 10, 11}:
   •   No partition: attacker knows nothing
   •   Partition into {00, 01} and {10, 11}: attacker knows which half, but not the exact
       key
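The key-partition example above amounts to a one-line entropy calculation; a sketch:

```python
import math

# Four equally likely 2-bit keys: initial uncertainty is log2(4) = 2 bits.
keys = ["00", "01", "10", "11"]
h_full = math.log2(len(keys))

# Partition into blocks {00, 01} and {10, 11}: the attacker learns which block
# holds the key, leaving log2(2) = 1 bit of uncertainty about the exact key.
h_after = math.log2(2)

print(h_full - h_after)   # 1.0 bit leaked by this coarse observation
```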
B. Importance of Partitioning
   •   Helps model the information access level of an attacker
   •   Used to design systems that minimize leakable information
   •   Critical in designing protocols with provable privacy guarantees
C. Lower Bounds on Key Size
1. Shannon’s Theorem on Perfect Secrecy
   •   A system provides perfect secrecy if the ciphertext C reveals nothing
       about the message M:
              H(M | C) = H(M), equivalently I(M; C) = 0
Implication:
   •   Perfect secrecy requires H(K) ≥ H(M): the key must carry at least as
       much entropy as the message itself.
   •   Example: One-Time Pad (OTP) uses a random key of the same length as the
       message.
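The OTP itself is a few lines of XOR. The sketch below uses Python's `secrets` module for the key; the function and variable names are ours, and the same function both encrypts and decrypts:

```python
import secrets

def otp(message: bytes, key: bytes) -> bytes:
    """One-Time Pad: XOR each byte with a random key byte.

    The key must be at least as long as the message and must never be reused."""
    assert len(key) >= len(message)
    return bytes(m ^ k for m, k in zip(message, key))

msg = b"ATTACK AT DAWN"
key = secrets.token_bytes(len(msg))   # random key, same length as the message
ct = otp(msg, key)                    # encrypt
print(otp(ct, key))                   # b'ATTACK AT DAWN': XOR again decrypts
```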
2. Secrecy Capacity
   •   Secrecy capacity: Maximum rate at which data can be transmitted securely over
       a channel.
   •   If key is too short, attacker can brute-force the key, leading to:
          o     Higher computational leakage
          o     Unreliable confidentiality
3. Lower Bounds in Real-World Systems
   •   Security level (e.g., 128-bit AES) defines minimum key size to prevent brute-
       force.
   •   Longer key size ⇒ Higher computational difficulty for attackers
Security Level Approximate Brute-Force Time (2025 Est.)
56-bit (DES)     Crackable within hours on commodity hardware
128-bit (AES)    Secure for decades
256-bit (AES)    Practically unbreakable
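The table's orders of magnitude follow from simple arithmetic. A back-of-envelope sketch, assuming a hypothetical attacker testing 10^12 keys per second:

```python
GUESSES_PER_SECOND = 1e12   # assumed attacker speed (hypothetical)
SECONDS_PER_YEAR = 3.154e7

def years_to_exhaust(key_bits):
    """Worst-case time to try every key in a 2**key_bits key space, in years."""
    return 2 ** key_bits / GUESSES_PER_SECOND / SECONDS_PER_YEAR

print(f"{years_to_exhaust(56):.1e}")    # ≈ 2.3e-03 years: under a day for 56-bit keys
print(f"{years_to_exhaust(128):.1e}")   # ≈ 1.1e+19 years: far beyond feasibility
```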
7. Secret Sharing
A. What is Secret Sharing?
Secret sharing in information theory is a cryptographic method to distribute a secret
among a group of participants such that the secret can only be reconstructed when a
specific number or combination of participants pool their shares together. This
technique ensures that no single individual or unauthorized subset of participants can
gain access to the secret.
   •   Secret sharing is a method of distributing a secret among a group of
       participants.
   •   Each participant gets a share, and the secret can only be reconstructed when a
       specific number (threshold) of shares are combined.
B. Purpose in Cyber Security
   •   To enhance fault tolerance, confidentiality, and security.
   •   Used in:
          o   Distributed key storage
          o   Threshold cryptography
          o   Secure multiparty computations
C. (k, n) Threshold Secret Sharing
   •   Secret is divided among n participants.
   •   Any k (k ≤ n) participants can reconstruct the secret.
   •   Fewer than k shares reveal no information about the secret.
D. Shamir’s Secret Sharing Scheme (1979)
   •   Based on polynomial interpolation over a finite field.
Example:
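As the example, here is a toy (k = 3, n = 5) sketch of Shamir's scheme over a small prime field. The field size and the use of Python's `random` module are for illustration only; real implementations use cryptographically secure randomness and carefully chosen fields:

```python
import random

P = 2_147_483_647   # prime modulus (2^31 - 1); all arithmetic is done mod P

def make_shares(secret, k, n):
    """Hide `secret` as the constant term of a random degree-(k-1) polynomial,
    then hand out its values at x = 1..n as the n shares."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the constant term (the secret)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

shares = make_shares(secret=42, k=3, n=5)
print(reconstruct(shares[:3]))    # 42: any 3 of the 5 shares recover the secret
```

Fewer than 3 shares leave the polynomial underdetermined, so they reveal nothing about the secret.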
F. Properties of Shamir’s Scheme
Property                        Description
Perfect Secrecy                 Fewer than k shares reveal nothing
Linear                          Shares can be combined algebraically
Information-Theoretic Security  Doesn't rely on computational hardness
G. Applications
   •   Cryptographic key management
   •   Blockchain wallet recovery
   •   Distributed trust systems
   •   Backup & disaster recovery in secure environments
8. Provable Security
A. What is Provable Security?
Provable security in information theory refers to demonstrating the security of a
cryptographic system through mathematical proofs based on established hardness
assumptions. This approach provides a high level of confidence in the system's
resistance to attacks by reducing its security to the difficulty of solving a known hard
problem.
   •   Provable security is a method of formally verifying the strength of
       cryptographic algorithms using mathematical proofs.
   •   It defines security in terms of well-established complexity assumptions and
       tries to prove that breaking the system is as hard as solving a known difficult
       problem.
B. Models of Provable Security
Model                            Description
Information-Theoretic Security   Unbreakable even with unlimited computational power. Based on entropy and randomness. Example: One-Time Pad.
Computational Security           Secure against feasible (polynomial-time) attacks; relies on the difficulty of certain problems (e.g., factoring, discrete log).
C. Common Security Proof Techniques
   1. Reduction Proofs:
           o   Show that breaking the cryptosystem would solve a known hard
               problem (e.g., factoring a product of two large primes).
           o   Example: RSA’s security is reducible to the difficulty of factoring.
   2. Simulation-Based Proofs:
          o   Simulate a scenario in which no adversary can distinguish between the
              real and ideal executions of the protocol.
          o   Common in Zero-Knowledge Proofs and Secure Multiparty
              Computation.
D. Security Assumptions
   •   Factoring: multiplying two large primes is easy, but factoring their
       product is believed to be hard (basis of RSA).
   •   Discrete logarithm: recovering x from g^x in a large group is believed
       to be hard (basis of Diffie-Hellman and DSA).
   •   Elliptic-curve discrete logarithm (ECDLP): the analogous problem on
       elliptic curves (basis of ECC).
E. Security Models
Model                                Use
IND-CPA (Chosen Plaintext Attack)    Attacker can encrypt plaintexts of their choice
IND-CCA (Chosen Ciphertext Attack)   Attacker can also decrypt ciphertexts, except the target
ROM (Random Oracle Model)            Models hash functions as ideal random functions
UC (Universal Composability)         Protocols remain secure when combined with others
F. Why Provable Security Matters
   •   Helps us trust cryptographic algorithms by providing rigorous guarantees.
   •   Useful in:
          o   Protocol design (e.g., SSL, TLS)
          o   Evaluating trade-offs between performance and security
          o   Setting realistic key sizes and usage durations
G. Real-World Application
   •   Many algorithms (AES, RSA, ECC) are designed to meet provable security goals.
   •   Standards (like NIST) often require formal security proofs before recommending
       algorithms.
9. Computational Security and Symmetric Ciphers
A. Computational Security
Computational security in cybersecurity refers to the practice of building and analyzing
cryptographic systems that are secure against attackers with limited computational
resources. Instead of aiming for perfect, unconditional security, computational security
assumes that certain problems are computationally hard to solve, meaning they would
take an infeasible amount of time for an attacker to break.
   •   A system is computationally secure if it cannot be broken within a reasonable
       time, given current computational capabilities.
Key Idea:
   •   Security is based on the assumption that some problems are hard to solve,
       such as:
            o   Integer factorization (used in RSA)
            o   Discrete logarithms (used in Diffie-Hellman)
            o   Elliptic Curve problems (used in ECC)
Example:
   •   AES-128 has 2^128 possible keys. Even an attacker testing 10^18 keys per
       second would need on the order of 10^13 years to exhaust the key space,
       so the cipher is computationally secure even though it is breakable in
       principle.
B. Characteristics of Computational Security
Feature                 Description
Relies on Assumptions   No formal proof; based on the hardness of known problems
Breakable in Theory     But not in feasible time with current technology
Used in Practice        Almost all modern cryptographic algorithms (AES, RSA, ECC)
C. Security Levels and Key Sizes
Security Level   Status (as of 2025)
80-bit           Obsolete
128-bit          Sufficient for general security
256-bit          High-security applications (long-term confidentiality)
D. Symmetric Ciphers
Definition:
   •     A cipher where the same secret key is used for both encryption and
         decryption.
Types of Symmetric Ciphers
1. Block Ciphers
   •     Encrypt fixed-size blocks (e.g., 128 bits)
   •     Common examples:
            o   AES (Advanced Encryption Standard)
            o   DES (Data Encryption Standard) — now obsolete
2. Stream Ciphers
   •   Encrypt data bit-by-bit or byte-by-byte.
   •   Common examples:
            o   RC4 (deprecated)
            o   Salsa20 / ChaCha20 (modern stream ciphers)
Security Properties of Symmetric Ciphers
Property                Description
Confusion               Obscures the relationship between key and ciphertext
Diffusion               Spreads plaintext information across the ciphertext
Resistance to Attacks   Should withstand brute-force, differential, and linear cryptanalysis
E. Security Models for Symmetric Ciphers
IND-CPA (Indistinguishability under Chosen Plaintext Attack):
   •   Adversary cannot distinguish between encryptions of two chosen plaintexts.
IND-CCA (Chosen Ciphertext Attack):
   •   Stronger: adversary can also query a decryption oracle (excluding the target
       ciphertext).
F. Example: AES
   •   Block size: 128 bits
   •   Key sizes: 128, 192, 256 bits
   •   Rounds: 10/12/14 depending on key size
   •   Security: Considered highly secure when implemented properly
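In practice AES should always come from a vetted library, never hand-rolled. Purely to illustrate the symmetric principle (the same key encrypts and decrypts), the toy sketch below XORs data with a SHA-256-derived keystream; it is NOT a secure cipher, just a demonstration:

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: concatenated SHA-256(key || counter) blocks. NOT secure."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Symmetric by construction: the identical call encrypts and decrypts."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))

key = b"shared-secret-key"
ct = xor_cipher(b"confidential report", key)
print(xor_cipher(ct, key))   # b'confidential report'
```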
G. Use Cases in Cyber Security
   •   Data-at-rest encryption (files, databases)
   •   Network encryption (VPNs, TLS)
   •   Authentication protocols
   •   Secure messaging