
Prepared by: Imran Raza Khan

(Assistant Professor, ACET, Aligarh)

INFORMATION THEORY FOR CYBER SECURITY


Unit I: Introduction To Information Theory

1. Shannon’s Foundation of Information Theory


Introduction

• In 1948, Claude E. Shannon introduced the mathematical theory of communication, which laid the foundation of modern information theory.

• Information theory is a mathematical framework for quantifying information, data compression, and transmission. In machine learning, information theory provides powerful tools for analyzing and improving algorithms.

Key Concepts Introduced by Shannon

1.1 Bit – The Basic Unit of Information

• A bit (binary digit) is the most basic unit of information.

• It represents a choice between two equally probable alternatives (0 or 1).

• If a message conveys one of two equally likely outcomes, it carries 1 bit of information.

1.2 Information as Reduction of Uncertainty

• Information is fundamentally linked to uncertainty.

• When we receive new data, it reduces our uncertainty about the system or
environment.

1.3 Entropy – Measuring Information

• Entropy quantifies the average amount of information produced by a source.


• It is a measure of the uncertainty in a random variable.

Example:

For a fair coin toss:

• Probability of heads = 0.5, tails = 0.5

• H = −(0.5 log₂ 0.5 + 0.5 log₂ 0.5) = 1 bit (checked in the sketch below)
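The arithmetic can be verified with a short Python sketch (illustrative, not part of the original notes):

import math

def shannon_entropy(probs):
    # H(X) = -sum p(x) * log2 p(x), skipping zero-probability outcomes
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))  # fair coin -> 1.0 bit
print(shannon_entropy([0.9, 0.1]))  # biased coin -> ~0.47 bits (more predictable)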

1.4 Mutual Information

• Mutual information measures the amount of information obtained about one random variable through another random variable. It quantifies the dependency between variables: I(X; Y) = H(X) − H(X | Y).

• Interpretation: Mutual information is zero if X and Y are independent, and higher values indicate greater dependency.

1.5 Kullback - Leibler (KL) Divergence

KL divergence measures the difference between two probability distributions: D_KL(P ‖ Q) = Σ p(x) log₂ (p(x) / q(x)). It is often used in machine learning to compare the predicted probability distribution with the true distribution, as in the sketch below.
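An illustrative sketch (my own, with made-up example distributions) that computes KL divergence directly from the definition:

import math

def kl_divergence(p, q):
    # D_KL(P || Q) in bits; assumes q(x) > 0 wherever p(x) > 0
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

true_dist = [0.5, 0.5]   # true distribution
predicted = [0.8, 0.2]   # a model's predicted distribution
print(kl_divergence(true_dist, predicted))  # ~0.32 bits of extra "surprise"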

Why Shannon’s Theory is Important in Cyber Security

• Provides limits on compression and encryption.

• Helps in understanding perfect secrecy and data leakage.

• Forms the basis for designing efficient and secure communication systems.

2 Random Variables
Definition

A random variable is a fundamental concept in statistics that bridges the gap between theoretical probability and real-world data: it is a function that assigns a real value to each outcome in the sample space of a random experiment.

For example: if you roll a die, you can assign a number to each possible outcome.

There are two basic types of random variables:

• Discrete Random Variables (which take on specific values).

• Continuous Random Variables (assume any value within a given range).

We define a random variable as a function that maps from the sample space of an experiment to the real numbers. Mathematically, a random variable is expressed as

X : S → R

where,

• X is the random variable (usually denoted by a capital letter)

• S is the sample space

• R is the set of real numbers


1. Discrete Random Variable

• Takes on finite or countably infinite values.

• Example: Number shown on a rolled die → Values: {1, 2, 3, 4, 5, 6}

Probability Mass Function (PMF): p(x) = P(X = x), with p(x) ≥ 0 and Σ p(x) = 1.

2. Continuous Random Variable

• Takes uncountably infinite values over an interval.

• Example: Temperature measurement → Any value in a real range

Probability Density Function (PDF): f(x) ≥ 0 with ∫ f(x) dx = 1; probabilities come from areas, P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx.

3. Cumulative Distribution Function (CDF)

• Works for both types of random variables.

• It gives the probability that a random variable is less than or equal to a certain value: F(x) = P(X ≤ x).

4. Expectation and Variance

• Expectation (mean): E[X] = Σ x·p(x) for a discrete X, or ∫ x·f(x) dx for a continuous X.

• Variance: Var(X) = E[(X − E[X])²] = E[X²] − (E[X])². For example, the fair die in the sketch below has E[X] = 3.5 and Var(X) ≈ 2.92.
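A quick illustrative check of these formulas for the fair-die example (plain Python, my own sketch):

pmf = {v: 1/6 for v in range(1, 7)}  # fair die: uniform PMF over {1,...,6}

mean = sum(v * p for v, p in pmf.items())               # E[X] = 3.5
second_moment = sum(v * v * p for v, p in pmf.items())  # E[X^2] ~ 15.17
variance = second_moment - mean ** 2                    # Var(X) ~ 2.92
print(mean, variance)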

Relevance to Information Theory

• Random variables model the behavior of data sources and channels.

• Used in computing entropy, mutual information, and uncertainty.

• Help in assessing security threats by modeling attackers' knowledge probabilistically.

3 Probability Distribution Factors


What is a Probability Distribution?

A probability distribution is a mathematical function or rule that describes how the probabilities of different outcomes are assigned to the possible values of a random variable. It provides a way of modeling the likelihood of each outcome in a random experiment.

While a frequency distribution shows how often outcomes occur in a sample or dataset, a probability distribution assigns probabilities to outcomes theoretically, independent of any specific dataset. These probabilities represent the likelihood of each outcome occurring.

1. Types of Probability Distributions

1. Discrete Probability Distribution

• For discrete random variables


• Described using a Probability Mass Function (PMF)

2. Continuous Probability Distribution

• For continuous random variables

• Described using a Probability Density Function (PDF)

Property        Discrete Case              Continuous Case
Described by    PMF p(x) = P(X = x)        PDF f(x)
Normalization   Σ p(x) = 1                 ∫ f(x) dx = 1
CDF             F(x) = Σ over t ≤ x p(t)   F(x) = ∫ from −∞ to x f(t) dt


3. Important Distributions in Information Theory

Distribution   Type                  Use in Cyber Security / Info Theory
Uniform        Discrete/Continuous   Ideal case for maximum entropy
Bernoulli      Discrete              Binary communication channels
Binomial       Discrete              Repeated trials (e.g., packet loss)
Geometric      Discrete              First success timing (e.g., retries)
Poisson        Discrete              Modeling rare events (e.g., attacks)
Gaussian       Continuous            Noise in communication channels
Exponential    Continuous            Time between random events

4. Application in Information Security

• Entropy is calculated using a distribution’s probabilities.

• Mutual information and conditional entropy also depend on joint and marginal
distributions.

• Used to model uncertainty about system states and measure information leakage.

4 Uncertainty and Entropy Information Measures


What is Uncertainty in Information Theory?

• Uncertainty refers to lack of knowledge about the outcome of a random variable.

• The more uncertain you are about an event, the more information is gained
when it occurs.

• In information theory, uncertainty is quantified by entropy, which measures the average amount of information needed to describe the outcome of a random variable. Essentially, it reflects how unpredictable or random a system is. A higher entropy indicates greater uncertainty, while a lower entropy signifies less uncertainty.

What is Entropy?

• Entropy (H) quantifies the average uncertainty (or information) in a random variable.

• Proposed by Claude Shannon as the core of information theory.

• In information theory, entropy is a measure of uncertainty or randomness associated with a random variable or a data source. It quantifies the average amount of information needed to describe an outcome from that source. A higher entropy indicates more randomness and unpredictability, while a lower entropy suggests more order and predictability.

Shannon Entropy Formula

For a discrete random variable X with probabilities p(x):

H(X) = −Σ p(x) log₂ p(x)   (measured in bits)

Properties of Entropy

• H(X) ≥ 0 (entropy is never negative).

• H(X) is maximized when all outcomes are equally likely: H(X) = log₂ n for n equally likely outcomes.

• H(X) = 0 when the outcome is certain.

Special Cases

• Fair coin (p = 0.5): H = 1 bit.

• Completely biased coin (p = 0 or p = 1): H = 0 bits.

Other Entropy-Based Measures

• Mutual information I(X; Y) measures the information shared between X and Y.

• Conditional entropy H(X | Y) measures the uncertainty remaining about X after observing Y.

Relevance in Cyber Security

• Entropy helps:

o Assess password strength

o Measure information leakage

o Evaluate encryption quality

• High entropy ⇒ harder to predict ⇒ more secure
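To make the password-strength point concrete, here is a small illustrative sketch; the character-set sizes are assumptions for the example:

import math

def password_entropy_bits(charset_size, length):
    # entropy of a uniformly random password: length * log2(charset size)
    return length * math.log2(charset_size)

print(password_entropy_bits(10, 4))   # 4-digit PIN              -> ~13.3 bits
print(password_entropy_bits(62, 8))   # 8 random alphanumerics   -> ~47.6 bits
print(password_entropy_bits(94, 16))  # 16 printable ASCII chars -> ~104.9 bits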

5 Leakage and Quantifying Leakage


What is Information Leakage?

• Information leakage refers to the unintended or unauthorized disclosure of sensitive information.

• Even if a system is encrypted, patterns or partial data may leak useful information to an attacker.

• Information leakage, also known as data leakage, refers to the unauthorized exposure or disclosure of sensitive or confidential information. This can occur both internally, through employee negligence or malicious intent, and externally, through cyberattacks or misconfigured systems. Information leakage can have severe consequences, including financial losses, reputational damage, and legal repercussions.
Causes of Information Leakage:

• Human error: Negligence, misconfigurations, or accidental sharing of information through email or messaging platforms.

• Insider threats: Malicious or disgruntled employees intentionally exfiltrating data.

• Vulnerabilities in systems and applications: Exploiting weaknesses in software or network configurations.

• Improper disposal of sensitive information: Discarding documents or devices without proper sanitization.

• Lack of security awareness: Insufficient training or understanding of security protocols among employees.

Types of Information Leakage

Type                   Description
Explicit Leakage       Directly observable data leaks (e.g., error messages, logs)
Implicit Leakage       Leaks through system behavior or patterns (e.g., timing, size of responses)
Side-channel Leakage   Leaks via physical observation (e.g., power consumption, electromagnetic signals)

Measuring Information Leakage

1. Mutual Information Leakage

• Measures the average information an observable Y reveals about a secret X: I(X; Y) = H(X) − H(X | Y).

2. Min-Entropy Leakage

• Focuses on the worst-case probability of guessing the secret.

• Based on min-entropy, H∞(X) = −log₂ max p(x); leakage is the drop in min-entropy after an observation.

3. Shannon Leakage

• Based on average information leaked.

• Uses the expected reduction in entropy: Leakage = H(X) − H(X | Y).
Examples

1. Password System

o A user has a 4-digit PIN → 10,000 possibilities → H = log₂ 10000 ≈ 13.29 bits

o If the attacker learns the first digit is 3 → uncertainty drops → leakage occurs (quantified in the sketch below)

2. Side-Channel Example

o A power trace from a cryptographic device leaks key bits during encryption.

o Leakage is quantified by how much the power usage correlates with the secret key bits.
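To quantify the PIN example (an illustrative Python sketch; learning the first digit leaves 1,000 equally likely PINs):

import math

h_before = math.log2(10_000)  # all 4-digit PINs: ~13.29 bits
h_after = math.log2(1_000)    # first digit known: ~9.97 bits
print(h_before - h_after)     # ~3.32 bits leaked (= log2 10)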

Partitioning and Leakage

• The concept of partitions comes into play in systems where data is grouped into
observable categories.

• Leakage can be seen as a function of how finely or coarsely these partitions are
made.
Security Goals Related to Leakage

Goal                    Requirement
Perfect Secrecy         I(X; Y) = 0: no information leaked
Computational Secrecy   Leakage is computationally infeasible to exploit
Differential Privacy    Controlled leakage when releasing statistical information

Why This Matters in Cyber Security

• Even if encryption is used, metadata, system behavior, or execution timing may leak information.

• Quantifying leakage helps in:

o Designing more secure systems

o Evaluating risk

o Setting privacy guarantees

6 Partitions and Lower Bounds on Key Size


In information theory, a partition refers to dividing a set of elements into non-overlapping subsets, called blocks, that collectively cover the entire set. These partitions are fundamental to understanding information content and relationships between data. They are used to define concepts like entropy, mutual information, and variation of information.

A. Partitions in Information Theory

A partition is a way to divide the possible values of a random variable into non-overlapping subsets. In the context of cyber security:

• Partitions represent what an attacker can distinguish based on their observations.

• Finer partitions ⇒ more detailed observations ⇒ higher information leakage.


Example:

If a secret key K can be in {00, 01, 10, 11}:

• No partition: attacker knows nothing

• Partition into {00, 01} and {10, 11}: attacker knows which half, but not the exact
key
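In entropy terms, this example can be checked with a two-line sketch (illustrative):

import math

keys = ["00", "01", "10", "11"]
h_no_partition = math.log2(len(keys))  # 2 bits: attacker knows nothing
h_half_known = math.log2(2)            # 1 bit left once the half is known
print(h_no_partition - h_half_known)   # the partition leaks exactly 1 bit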

B. Importance of Partitioning

• Helps model the information access level of an attacker

• Used to design systems that minimize leakable information

• Critical in designing protocols with provable privacy guarantees

C. Lower Bounds on Key Size

1. Shannon's Theorem on Perfect Secrecy

• A system provides perfect secrecy if the ciphertext reveals nothing about the message: I(M; C) = 0, or equivalently P(M | C) = P(M) for every message M and ciphertext C.

Implication:

• Perfect secrecy requires H(K) ≥ H(M): the key must carry at least as much entropy as the message, which lower-bounds the key size.

• Example: One-Time Pad (OTP) uses a random key of the same length as the message (see the sketch below).
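A minimal illustrative OTP sketch (XOR with a fresh random key of equal length; reusing the key would destroy the secrecy):

import os

def otp_xor(data: bytes, key: bytes) -> bytes:
    # XOR each byte with the key; the same function encrypts and decrypts
    assert len(key) == len(data), "OTP key must be as long as the message"
    return bytes(d ^ k for d, k in zip(data, key))

message = b"ATTACK AT DAWN"
key = os.urandom(len(message))   # truly random, used once, never reused
ciphertext = otp_xor(message, key)
print(otp_xor(ciphertext, key))  # b'ATTACK AT DAWN'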
2. Secrecy Capacity

• Secrecy capacity: Maximum rate at which data can be transmitted securely over
a channel.

• If the key is too short, an attacker can brute-force it, leading to:

o Higher computational leakage

o Unreliable confidentiality

3. Lower Bounds in Real-World Systems

• The target security level (e.g., 128-bit, as with AES) defines the minimum key size needed to prevent brute-force attacks.

• Longer key size ⇒ Higher computational difficulty for attackers

Security Level   Approximate Brute-Force Time (2025 Est.)
56-bit (DES)     Crackable within seconds or minutes
128-bit (AES)    Secure for decades
256-bit (AES)    Practically unbreakable
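These orders of magnitude can be sanity-checked with a back-of-the-envelope sketch; the 10^12 guesses-per-second rate is an assumption for illustration:

SECONDS_PER_YEAR = 60 * 60 * 24 * 365

def brute_force_years(key_bits, guesses_per_second=1e12):
    # expected years to search half the keyspace at the given guess rate
    return 2 ** (key_bits - 1) / guesses_per_second / SECONDS_PER_YEAR

for bits in (56, 128, 256):
    print(f"{bits}-bit key: ~{brute_force_years(bits):.3g} years")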

7 Secret Sharing
A. What is Secret Sharing?

Secret sharing in information theory is a cryptographic method to distribute a secret among a group of participants such that the secret can only be reconstructed when a specific number or combination of participants pool their shares together. This technique ensures that no single individual or unauthorized subset of participants can gain access to the secret.

• Secret sharing is a method of distributing a secret among a group of participants.

• Each participant gets a share, and the secret can only be reconstructed when a
specific number (threshold) of shares are combined.
B. Purpose in Cyber Security

• To enhance fault tolerance, confidentiality, and security.

• Used in:

o Distributed key storage

o Threshold cryptography

o Secure multiparty computations

C. (k, n) Threshold Secret Sharing

• Secret is divided among n participants.

• Any k (k ≤ n) participants can reconstruct the secret.

• Fewer than k shares reveal no information about the secret.

D. Shamir's Secret Sharing Scheme (1979)

• Based on polynomial interpolation over a finite field.

• To share a secret s with threshold k, pick a random polynomial f(x) of degree k − 1 with f(0) = s; participant i receives the share (i, f(i)). Any k shares determine f by Lagrange interpolation, recovering s = f(0).

E. Example:

• Let s = 11, k = 2, n = 3 over Z₁₃ (small numbers chosen for illustration). Choose f(x) = 11 + 7x mod 13. The shares are (1, 5), (2, 12), (3, 6). Any two shares fix the line and recover f(0) = 11; a single share is consistent with every possible secret.
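A compact illustrative implementation in Python (the prime modulus and function names are my own choices, not part of the notes):

import random

PRIME = 2**127 - 1  # a Mersenne prime; all arithmetic is in this field

def make_shares(secret, k, n):
    # random polynomial of degree k-1 with f(0) = secret
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation evaluated at x = 0
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * -xj % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = make_shares(1234, k=3, n=5)
print(reconstruct(shares[:3]))  # any 3 of the 5 shares recover 1234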

F. Properties of Shamir's Scheme

Property                         Description
Perfect Secrecy                  Fewer than k shares reveal nothing
Linear                           Shares can be combined algebraically
Information-Theoretic Security   Does not rely on computational hardness

G. Applications

• Cryptographic key management

• Blockchain wallet recovery

• Distributed trust systems

• Backup & disaster recovery in secure environments

8 Provable Security
A. What is Provable Security?

Provable security in information theory refers to demonstrating the security of a cryptographic system through mathematical proofs based on established hardness assumptions. This approach provides a high level of confidence in the system's resistance to attacks by reducing its security to the difficulty of solving a known hard problem.
• Provable security is a method of formally verifying the strength of cryptographic algorithms using mathematical proofs.

• It defines security in terms of well-established complexity assumptions and tries to prove that breaking the system is as hard as solving a known difficult problem.

B. Models of Provable Security

Model                            Description
Information-Theoretic Security   Unbreakable even with unlimited computational power. Based on entropy and randomness. Example: One-Time Pad.
Computational Security           Secure against feasible (polynomial-time) attacks. Security relies on the difficulty of certain problems (e.g., factoring, discrete log).

C. Common Security Proof Techniques

1. Reduction Proofs:

o Show that breaking the cryptosystem would solve a known hard problem (like factoring large primes).

o Example: RSA's security is reducible to the difficulty of factoring.

2. Simulation-Based Proofs:

o Simulate a scenario in which no adversary can distinguish between the real and ideal executions of the protocol.

o Common in Zero-Knowledge Proofs and Secure Multiparty Computation.
D. Security Assumptions

• Security proofs rest on hardness assumptions, such as integer factorization (RSA), the discrete logarithm problem (Diffie-Hellman), and elliptic-curve discrete logarithms (ECC).

E. Security Models

Model                                Use
IND-CPA (Chosen Plaintext Attack)    Attacker can encrypt plaintexts of their choice
IND-CCA (Chosen Ciphertext Attack)   Attacker can also decrypt ciphertexts, except the target
ROM (Random Oracle Model)            Models hash functions as ideal random functions
UC (Universal Composability)         Protocols remain secure when combined with others

F. Why Provable Security Matters

• Helps us trust cryptographic algorithms by providing rigorous guarantees.

• Useful in:

o Protocol design (e.g., SSL, TLS)

o Evaluating trade-offs between performance and security

o Setting realistic key sizes and usage durations


G. Real-World Application

• Many algorithms (AES, RSA, ECC) are designed to meet provable security goals.

• Standards (like NIST) often require formal security proofs before recommending
algorithms.

9 Computational Security and Symmetric Cipher

A. Computational Security

Computational security in cybersecurity refers to the practice of building and analyzing cryptographic systems that are secure against attackers with limited computational resources. Instead of aiming for perfect, unconditional security, computational security assumes that certain problems are computationally hard to solve, meaning they would take an infeasible amount of time for an attacker to break.

• A system is computationally secure if it cannot be broken within a reasonable time, given current computational capabilities.

Key Idea:

• Security is based on the assumption that some problems are hard to solve,
such as:

o Integer factorization (used in RSA)

o Discrete logarithms (used in Diffie-Hellman)

o Elliptic Curve problems (used in ECC)

Example:

• RSA-2048 is considered computationally secure: factoring its 2048-bit modulus is possible in principle, but would take an infeasible amount of time with the best known classical algorithms.
B. Characteristics of Computational Security

Feature                 Description
Relies on Assumptions   No formal proof; based on the hardness of known problems
Breakable in Theory     But not in feasible time with current technology
Used in Practice        Almost all modern cryptographic algorithms (AES, RSA, ECC)

C. Security Levels and Key Sizes

Security Level   Status (as of 2025)
80-bit           Obsolete
128-bit          Sufficient for general security
256-bit          High-security applications (long-term confidentiality)

D. Symmetric Ciphers

Definition:

• A cipher where the same secret key is used for both encryption and
decryption.

Types of Symmetric Ciphers

1. Block Ciphers

• Encrypt fixed-size blocks (e.g., 128 bits)

• Common examples:

o AES (Advanced Encryption Standard)

o DES (Data Encryption Standard) — now obsolete


2. Stream Ciphers

• Encrypt data bit-by-bit or byte-by-byte.

• Common examples:

o RC4 (deprecated)

o Salsa20 / ChaCha20 (modern stream ciphers)

Security Properties of Symmetric Ciphers

Property                Description
Confusion               Obscures the relationship between key and ciphertext
Diffusion               Spreads plaintext information across the ciphertext
Resistance to Attacks   Should withstand brute-force, differential, and linear cryptanalysis

E. Security Models for Symmetric Ciphers

IND-CPA (Indistinguishability under Chosen Plaintext Attack):

• Adversary cannot distinguish between encryptions of two chosen plaintexts.

IND-CCA (Chosen Ciphertext Attack):

• Stronger: adversary can also query a decryption oracle (excluding the target
ciphertext).

F. Example: AES

• Block size: 128 bits

• Key sizes: 128, 192, 256 bits

• Rounds: 10/12/14 depending on key size

• Security: Considered highly secure when implemented properly
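As a quick illustration, a sketch using the third-party Python `cryptography` package (assumed installed), running AES-256 in CTR mode:

import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)    # 256-bit key
nonce = os.urandom(16)  # CTR nonce; must never repeat under the same key

encryptor = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
ciphertext = encryptor.update(b"symmetric cipher demo") + encryptor.finalize()

decryptor = Cipher(algorithms.AES(key), modes.CTR(nonce)).decryptor()
print(decryptor.update(ciphertext) + decryptor.finalize())  # original plaintext

Note that the same key is used for both directions, which is exactly the defining property of a symmetric cipher.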


G. Use Cases in Cyber Security

• Data-at-rest encryption (files, databases)

• Network encryption (VPNs, TLS)

• Authentication protocols

• Secure messaging
