
Probability, Statistics & Mathematics

(Cheat Sheet)

Author: Anik Chakraborty


Contents

1 Probability
  1.1 Theory of Probability
    1.1.1 Sigma Field
    1.1.2 Properties
    1.1.3 Conditional Probability
    1.1.4 Stochastic Independence
  1.2 Random Variable
    1.2.1 Univariate
    1.2.2 Bivariate
    1.2.3 Results
  1.3 Generating Functions
    1.3.1 Moments
    1.3.2 Cumulants
    1.3.3 Characteristic Function
    1.3.4 Probability Generating Function
  1.4 Inequalities
    1.4.1 Markov & Chebyshev
    1.4.2 Cauchy-Schwarz
    1.4.3 Jensen
    1.4.4 Lyapunov
  1.5 Theoretical Distributions
    1.5.1 Discrete
    1.5.2 Continuous
    1.5.3 Multivariate
    1.5.4 Truncated Distribution
  1.6 Sampling Distributions
    1.6.1 Chi-square, t, F
    1.6.2 Order Statistics
  1.7 Distribution Relationships
    1.7.1 Binomial
    1.7.2 Negative Binomial
    1.7.3 Poisson
    1.7.4 Normal
    1.7.5 Gamma
    1.7.6 Beta
    1.7.7 Cauchy
    1.7.8 Others
  1.8 Transformations
    1.8.1 Orthogonal
    1.8.2 Polar
    1.8.3 Special Transformations

2 Statistics
  2.1 Point Estimation
    2.1.1 Minimum MSE
    2.1.2 Consistency
    2.1.3 Sufficiency
    2.1.4 Completeness
    2.1.5 Exponential Family
    2.1.6 Methods of finding UMVUE
    2.1.7 Cramer-Rao Inequality
    2.1.8 Methods of Estimation
  2.2 Testing of Hypothesis
    2.2.1 Tests of Significance
  2.3 Interval Estimation
    2.3.1 Methods of finding C.I.
    2.3.2 Wilk's Optimum Criteria
    2.3.3 Test Inversion Method
  2.4 Large Sample Theory
    2.4.1 Modes of Convergence

3 Mathematics
  3.1 Basics
    3.1.1 Combinatorial Analysis
    3.1.2 Difference Equation
  3.2 Linear Algebra
    3.2.1 Vectors & Vector Spaces
    3.2.2 Matrices
    3.2.3 Determinants
    3.2.4 System of Linear Equation
Chapter 1

Probability

1.1 Theory of Probability


1.1.1 Sigma Field
Ω: universal set. A non-empty class A of subsets of Ω is said to form a sigma field (σ-field) on Ω if it satisfies the following properties -

(i) A ∈ A =⇒ Ac ∈ A (closed under complementation)

(ii) A1 , A2 , . . . , An , . . . ∈ A =⇒ ⋃_{n=1}^{∞} An ∈ A (closed under countable union)

Theorems
(1) A σ-field is closed under finite unions.

(2) A σ-field must include the null set, ∅, and the whole set, Ω.

(a) A = {∅, Ω} is the smallest/minimal σ-field on Ω.


(b) If A ⊆ Ω, then A = {∅, A, Ac , Ω} is the minimal σ-field containing A, on Ω.
(c) The power set of Ω (the set of all subsets of Ω) is the largest σ-field on Ω.

(3) A σ-field is closed under countable intersections.

1.1.2 Properties
(1) For two sets A, B ∈ A -

(a) Monotonic Property: If A ⊆ B, P (A) ≤ P (B)


(b) P (A ∪ B) = P (A) + P (B) − P (A ∩ B) =⇒ P (A ∪ B) ≤ P (A) + P (B)
(c) P (A ∪ B) = P (A − B) + P (B − A) + P (A ∩ B)
(d) P (A ∩ B) ≤ min{P (A), P (B)} =⇒ P (A ∩ B) ≤ √(P (A) · P (B))
(e) P (A ∩ B) ≥ P (A) + P (B) − 1
(f) P (A) = P (A ∩ B) + P (A ∩ B c ) =⇒ P (A − B) = P (A) − P (A ∩ B)


(2) For any n events A1 , A2 , . . . , An ∈ A -

(a) Boole's inequality: P(⋃_{i=1}^{n} Ai) ≤ Σ_{i=1}^{n} P(Ai)

(b) Bonferroni's inequality: P(⋂_{i=1}^{n} Ai) ≥ Σ_{i=1}^{n} P(Ai) − (n − 1)

(c) Σ_{i=1}^{n} P(Ai) − Σ_{1≤i1<i2≤n} P(Ai1 ∩ Ai2) ≤ P(⋃_{i=1}^{n} Ai) ≤ Σ_{i=1}^{n} P(Ai)

(d) Poincare's theorem (inclusion-exclusion):

P(⋃_{i=1}^{n} Ai) = Σ_{i=1}^{n} P(Ai) − Σ_{1≤i1<i2≤n} P(Ai1 ∩ Ai2) + Σ_{1≤i1<i2<i3≤n} P(Ai1 ∩ Ai2 ∩ Ai3) − · · · + (−1)^{n−1} P(⋂_{i=1}^{n} Ai)

(e) Jordan's theorem (C(a, b) denotes the binomial coefficient "a choose b"):

i. The probability that exactly m of the n events will occur is -

P(m) = Sm − C(m+1, m) Sm+1 + C(m+2, m) Sm+2 − · · · + (−1)^{n−m} C(n, m) Sn

ii. The probability that at least m of the n events will occur is -

Pm = P(m) + P(m+1) + · · · + P(n)
   = Sm − C(m, m−1) Sm+1 + C(m+1, m−1) Sm+2 − · · · + (−1)^{n−m} C(n−1, m−1) Sn

where, Sr = Σ_{1≤i1<i2<···<ir≤n} P (Ai1 ∩ Ai2 ∩ · · · ∩ Air ), r = 1(1)n

1.1.3 Conditional Probability


Consider a probability space (Ω, A, P).

(1) Compound probability: n events A1 , A2 , . . . , An ∈ A are such that P(⋂_{i=1}^{n−1} Ai) > 0. Then,

P (A1 ∩ A2 ∩ · · · ∩ An ) = P (A1 ) · P (A2 /A1 ) · P (A3 /A1 ∩ A2 ) · · · P (An / ⋂_{i=1}^{n−1} Ai )

(2) Total Probability Theorem: If (B1 , B2 , . . . , Bn ) is a partition of Ω with P (Bi ) > 0 ∀i, then for any event A ∈ A, P (A) = Σ_{i=1}^{n} P (Bi ) · P (A/Bi )

(3) Bayes' Theorem: P (Bi /A) = P (Bi ) P (A/Bi ) / Σ_{k=1}^{n} P (Bk ) P (A/Bk ), i = 1(1)n, if P (A) > 0
k=1

(4) Bayes' theorem with future events: Let C ∈ A be an event under the previous conditions with P (A/Bi ) > 0, i = 1(1)n. Then,

P (C/A) = Σ_{i=1}^{n} P (Bi ) P (A/Bi ) P (C/A ∩ Bi ) / Σ_{i=1}^{n} P (Bi ) P (A/Bi )
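A quick numerical illustration of the total probability theorem (2) and Bayes' theorem (3) above. This sketch is not part of the original sheet; the priors and likelihoods below are made-up values chosen only to show the mechanics.

# Bayes' theorem over a partition B1, B2, B3 of the sample space (hypothetical numbers).
prior = [0.5, 0.3, 0.2]          # P(B1), P(B2), P(B3) -- must sum to 1
likelihood = [0.02, 0.05, 0.10]  # P(A/B1), P(A/B2), P(A/B3)

# Total probability theorem: P(A) = sum_i P(Bi) * P(A/Bi)
p_a = sum(p * l for p, l in zip(prior, likelihood))

# Bayes' theorem: P(Bi/A) = P(Bi) * P(A/Bi) / P(A)
posterior = [p * l / p_a for p, l in zip(prior, likelihood)]

print(f"P(A) = {p_a:.4f}")                            # 0.5*0.02 + 0.3*0.05 + 0.2*0.10 = 0.045
print("P(Bi/A):", [round(x, 4) for x in posterior])   # posterior probabilities, sum to 1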

1.1.4 Stochastic Independence


(1) For two independent events A, B -

P (A/B) = P (A/B c ) = P (A) ⇐⇒ P (A ∩ B) = P (A) · P (B)

(2) Pairwise independence: n events A1 , A2 , . . . , An ∈ A are said to be ‘pairwise’ independent if -

P (Ai1 ∩ Ai2 ) = P (Ai1 ) · P (Ai2 ), ∀i1 < i2

(3) Mutual independence: The above events are ‘mutually’ independent if -

P (Ai1 ∩ Ai2 ) = P (Ai1 ) · P (Ai2 ), ∀i1 < i2

P (Ai1 ∩ Ai2 ∩ Ai3 ) = P (Ai1 ) · P (Ai2 ) · P (Ai3 ), ∀i1 < i2 < i3


..
.
P (A1 ∩ A2 ∩ · · · ∩ An ) = P (A1 ) · P (A2 ) · · · P (An )

1.2 Random Variable


1.2.1 Univariate
X : Ω → R, such that {ω : X(ω) ≤ x} ∈ A, ∀x ∈ R is a random variable on {Ω, A}.

(1) The same function X(·) may be a R.V. for a particular choice of σ-field but not for another choice of σ-field.

(2) X is a R.V. on (Ω, A) =⇒ f (X) is also a R.V. on (Ω, A). (for any f )

(3) Continuity theorem of Probability: (A1 ⊂ A2 ⊂ . . .) or (A1 ⊃ A2 ⊃ . . .)

=⇒ lim_{n→∞} P (An ) = P (lim_{n→∞} An )

(4) CDF: FX (x) = P [{ω : X(ω) ≤ x}], ∀x ∈ R

(a) Non-decreasing: −∞ < x1 < x2 < ∞ =⇒ F (x1 ) ≤ F (x2 )


(b) Normalized: lim F (x) = 0, lim F (x) = 1
x→−∞ x→+∞

(c) Right Continuous: lim_{x→a+} F (x) = F (a), ∀a ∈ R

For any R.V. X with CDF F (·) -

P (a < X < b) = F (b − 0) − F (a) P (a ≤ X ≤ b) = F (b) − F (a − 0)

P (a < X ≤ b) = F (b) − F (a) P (a ≤ X < b) = F (b − 0) − F (a − 0)

(5) Decomposition theorem: F (x) = αFc (x) + (1 − α)Fd (x) where, 0 ≤ α ≤ 1 and Fc (x), Fd (x)
are continuous and discrete D.F., respectively.

(a) α = 0 =⇒ X is purely discrete.


(b) α = 1 =⇒ X is purely continuous.
(c) 0 < α < 1 =⇒ X is mixed.

(6) X is non-negative with E(X) = 0 =⇒ P (X = 0) = 1

(7) P (a ≤ X ≤ b) = 1 =⇒ V ar(X) ≤ (b − a)²/4

(8) P (|X| ≤ M ) = 1 for some 0 ≤ M < ∞ =⇒ µ′r exists ∀ r

(9) P (X ∈ {0, 1, 2, . . .}) = 1 =⇒ E(X) = Σ_{x=0}^{∞} {1 − F (x)}

(10) P (X ∈ [0, ∞)) = 1 =⇒ lim_{x→∞} x{1 − F (x)} = 0, if E(X) exists.

(11) E(X) = ∫_{0}^{∞} {1 − F (x)} dx for any non-negative R.V. X. More generally,

E(X^r ) = r ∫_{0}^{∞} x^{r−1} {1 − F (x)} dx

(12) ln(GMX ) = E(ln X)

(13) pth quantile: ξp such that F (ξp − 0) ≤ p ≤ F (ξp ). For continuous case, F (ξp ) = p

Symmetry
X has a symmetric distribution about ‘a’ if any of the following holds -

(a) P (X ≤ a − x) = P (X ≥ a + x), ∀x ∈ R

(b) F (a − x) + F (a + x) = 1 + P (X = a + x)

Again, if X is continuous then F (a − x) + F (a + x) = 1 or f (a − x) = f (a + x), ∀x ∈ R

ˆ E(X) = a, if it exists

ˆ Med (X) = a

1.2.2 Bivariate
(X, Y ) : Ω → R², such that {ω : X(ω) ≤ x, Y (ω) ≤ y} ∈ A, ∀(x, y) ∈ R², is a bivariate random variable on (Ω, A).

(1) CDF: F (x, y) = P [{ω : X(ω) ≤ x, Y (ω) ≤ y}], ∀(x, y) ∈ R2

(a) F (x, y) is non-decreasing and right continuous w.r.t. each of the arguments x and y.
(b) F (−∞, y) = F (x, −∞) = 0, F (+∞, +∞) = 1
(c) For x1 < x2 , y1 < y2 -

P (x1 < X < x2 , y1 < Y < y2 ) = F (x2 , y2 ) − F (x1 , y2 ) − F (x2 , y1 ) + F (x1 , y1 ) ≥ 0

Marginal CDF

FX (x) = lim_{y→∞} FX,Y (x, y), FY (y) = lim_{x→∞} FX,Y (x, y)

• FX (x) + FY (y) − 1 ≤ FX,Y (x, y) ≤ √(FX (x) · FY (y)), ∀(x, y) ∈ R²

(2) Joint distribution cannot be determined uniquely from the marginals.

(3) fX,Y (x, y) = fX (x) · fY |X (y|x) = fY (y) · fX|Y (x|y)


(4) fX,Y (x, y; α) = fX (x) fY (y) {1 + α · (2FX (x) − 1) · (2FY (y) − 1)}, α ∈ [−1, 1]

Stochastic Independence
(5) FX,Y (x, y) = FX (x) · FY (y), ∀(x, y) ∈ R2 =⇒ fX,Y (x, y) = fX (x) · fY (y), ∀(x, y)

(6) X ⊥⊥ Y =⇒ f (X) ⊥⊥ g(Y ) (converse is true when f , g is 1-1)

(7) X ⊥⊥ Y iff fX,Y (x, y) = k · f1 (x) · f2 (y), ∀ x, y ∈ R for some k > 0.

1.2.3 Results
(1) Sum Law: E(X + Y ) = E(X) + E(Y ), if all exists

(2) Product Law: X ⊥⊥ Y =⇒ E(XY ) = E(X) · E(Y )


Cov (X, Y ) = 0 ⇏ X ⊥⊥ Y

(3) X, Y identically distributed ⇏ P (X = Y ) = 1

(4) X1 , X2 , . . . , Xn are iid and continuous R.V.s =⇒ n! arrangements are equally likely
(5) X, Y iid =⇒ (X − Y ) is symmetric about ‘0’

(6) PDF of U = max{X, Y } : fU (u) = ∫_{−∞}^{u} {f (u, t) + f (t, u)} dt, where f is the joint PDF of (X, Y )

Conditional Distribution
(7) X ⊥⊥ Y =⇒ E(Y |X = x) = k, some constant ∀ x
 
(8) X ⊥⊥ (Y − ρ(σY /σX )X) =⇒ E(Y |X = x) is linear in x

(9) E(Y ) = E[E(Y |X)] or E(X) = E[E(X|Y )]


(10) V ar (Y ) = V ar {E(Y |X)} + E{V ar (Y |X)}
(11) Correlation ratio: η²_{Y X} = V ar {E(Y |X)} / V ar (Y )

(12) Wald's equation: {Xn }: sequence of iid R.V.s; N : a positive integer-valued R.V. independent of {Xn } (P (N ∈ N) = 1). Define, SN = Σ_{i=1}^{N} Xi

=⇒ E(SN ) = E(X1 ) E(N )

=⇒ V ar (SN ) = V ar (X1 ) · E(N ) + E 2 (X1 ) · V ar (N )

1.3 Generating Functions


1.3.1 Moments
 
(1) MGF: MX (t) = E(e^{tX} ), |t| < t0 , for some t0 > 0 [if E(e^{tX} ) < ∞]
It determines a distribution uniquely.

(2) µ′r : coefficient of t^r /r! in the expansion of MX (t), r = 0, 1, 2, . . .

(3) If the power series Σ_{r=0}^{∞} t^r µ′r /r! converges absolutely, then the sequence of moments {µ′r } determines a distribution uniquely. For a bounded R.V. this always holds.

(4) Xi are independent with MGF Mi (t) =⇒ MS (t) = Π_{i=1}^{n} Mi (t), where S = Σ_{i=1}^{n} Xi

(5) Bivariate MGF: MX,Y (t1 , t2 ) = E(e^{t1 X + t2 Y} ) for |ti | < hi for some hi > 0, i = 1, 2

(6) µ′r,s : coefficient of t1^r t2^s /(r! s!) in the expansion of MX,Y (t1 , t2 )

(7) Also, ∂^{r+s} /(∂t1^r ∂t2^s ) MX,Y (t1 , t2 ) |_{(t1 =0, t2 =0)} = µ′r,s

(8) Marginal MGF: MX (t) = MX,Y (t, 0) & MY (t) = MX,Y (0, t)

(9) X & Y are independent ‘iff’ MX,Y (t1 , t2 ) = MX,Y (t1 , 0) · MX,Y (0, t2 ), ∀(t1 , t2 )

1.3.2 Cumulants
(1) CGF: KX (t) = ln{MX (t)}, provided the expansion is a convergent power series.

(2) k1 = µ′1 (mean), k2 = µ2 (variance), k3 = µ3 & k4 = µ4 − 3k2²

(3) For two independent R.V. X & Y , kr (X + Y ) = kr (X) + kr (Y )



1.3.3 Characteristic Function



(1) CF: ϕX (t) = E(e^{itX} )

(2) ϕX (0) = 1, |ϕX (t)| ≤ 1

(3) ϕX (t) is continuous on R and always exists for t ∈ R

(4) ϕX (−t) = ϕ̄X (t), the complex conjugate of ϕX (t)

(5) If X has a symmetric distribution about ‘0’ then ϕX (t) is real valued and an even function of t.

(6) Uniqueness property and independence as of MGF.


(7) Inversion theorem: If ∫_{−∞}^{∞} |ϕX (t)| dt < ∞, then the pdf of the distribution is -

f (x) = (1/2π) ∫_{−∞}^{∞} e^{−itx} ϕX (t) dt

1.3.4 Probability Generating Function


(1) PGF: PX (t) = E(t^X ), for |t| ≤ 1

(2) It generates probabilities and factorial moments. It also determines a distribution uniquely.

(3) rth order factorial moment: µ[r] = d^r /dt^r PX (t) |_{t=1} , r = 0, 1, . . .

(4) X1 , X2 , . . . , Xn are independent with PGF Pi (t) =⇒ PS (t) = Π_{i=1}^{n} Pi (t), where S = Σ_{i=1}^{n} Xi

1.4 Inequalities
1.4.1 Markov & Chebyshev
(1) Markov: For a non-negative R.V. X, P (X ⩾ a) ⩽ E(X)/a, for a > 0.
‘=’ holds if X has a two-point distribution.

(2) Chebyshev: P (|X − µ| ⩾ tσ) ⩽ 1/t², t > 0, where µ = E(X) & σ² = V ar(X) < ∞.
‘=’ holds if X is such that -

f (x) = 1/(2t²) if x = µ ± tσ, and f (x) = 1 − 1/t² if x = µ   (t > 1)

(3) One-sided Chebyshev: E(X) = 0, V ar(X) = σ² < ∞

P (X ⩾ t) ⩽ σ²/(σ² + t²), if t > 0
P (X ⩾ t) ⩾ t²/(σ² + t²), if t < 0

(4) If also µ4 < ∞ then,

P (|X − µ| ⩾ tσ) ⩽ (µ4 − σ⁴)/{µ4 − σ⁴ + (t² − 1)² σ⁴}

It is an improvement over Chebyshev's inequality if t² ≥ µ4 /σ⁴

(5) Bivariate Chebyshev: (X1 , X2 ) is a bivariate R.V. with means (µ1 , µ2 ), variances (σ1², σ2²) & correlation ρ. Then for t > 0,

P (|X1 − µ1 | ⩾ tσ1 or |X2 − µ2 | ⩾ tσ2 ) ⩽ {1 + √(1 − ρ²)}/t²
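A quick Monte Carlo check of the Markov and Chebyshev bounds above. This sketch is not part of the original sheet; it assumes numpy is available and uses an exponential distribution purely as an example.

import numpy as np

# Empirical check of Markov and Chebyshev bounds for X ~ Exp(mean = 2).
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1_000_000)
mu, sigma = x.mean(), x.std()

a, t = 5.0, 2.0
markov_lhs = np.mean(x >= a)                      # P(X >= a)
markov_rhs = mu / a                               # E(X)/a
cheb_lhs = np.mean(np.abs(x - mu) >= t * sigma)   # P(|X - mu| >= t*sigma)
cheb_rhs = 1 / t**2

print(f"Markov   : {markov_lhs:.4f} <= {markov_rhs:.4f}")
print(f"Chebyshev: {cheb_lhs:.4f} <= {cheb_rhs:.4f}")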

1.4.2 Cauchy-Schwarz
If a bivariate R.V. (X, Y ) has finite variances and E(XY ) exists, then -

E 2 (XY ) ≤ E(X 2 )E(Y 2 )

‘=’ holds iff X & Y are linearly related passing through the origin i.e. P (X + λY = 0) = 1, for any
λ.

1.4.3 Jensen
f (·) is convex function and E(X) exists, then E [f (X)] ≥ f [E(X)]

Note: A function f (·) is said to be convex on an interval I if, for all x1 , x2 ∈ I and all λ ∈ [0, 1],
f [λx1 + (1 − λ)x2 ] ≤ λf (x1 ) + (1 − λ)f (x2 )

If f (·) is twice differentiable then f ′′ (x) ≥ 0 is the condition for convexity.

1.4.4 Lyapunov
For a R.V. X, define βr = E(|X|^r ) (assuming it exists). Then {βr^{1/r} } is non-decreasing in r, i.e. βr^{1/r} ≤ β_{r+1}^{1/(r+1)}

1.5 Theoretical Distributions


1.5.1 Discrete

U {x1 , . . . , xN }: CDF = #{i : xi ⩽ x}/N ; PMF = 1/N ; E(X) = (N + 1)/2 ; Var(X) = (N² − 1)/12 ; MGF = e^t (e^{N t} − 1)/{N (e^t − 1)}

Bernoulli (p): PMF = p^x (1 − p)^{1−x} , x = 0, 1 ; E(X) = p ; Var(X) = p(1 − p) ; MGF = (1 − p + pe^t )

Bin (n, p): CDF = I_{1−p}(n − x, x + 1)[1] ; PMF = C(n, x) p^x (1 − p)^{n−x} ; E(X) = np ; Var(X) = np(1 − p) ; MGF = (1 − p + pe^t )^n

Hyp (N, n, p): PMF = C(N p, x) C(N − N p, n − x)/C(N, n) ; E(X) = np ; Var(X) = np(1 − p) (N − n)/(N − 1) ; MGF: no simple closed form

Geo (p): CDF = 1 − (1 − p)^{x+1} ; PMF = p(1 − p)^x , x = 0, 1, 2, . . . ; E(X) = (1 − p)/p ; Var(X) = (1 − p)/p² ; MGF = p/{1 − (1 − p)e^t }

NB (n, p): CDF = I_p (n, x + 1)[1] ; PMF = C(n + x − 1, n − 1) p^n (1 − p)^x ; E(X) = n(1 − p)/p ; Var(X) = n(1 − p)/p² ; MGF = [p/{1 − (1 − p)e^t }]^n

Poisson (λ): CDF = ∫_λ^∞ e^{−t} t^x /Γ(x + 1) dt ; PMF = e^{−λ} λ^x /x! ; E(X) = λ ; Var(X) = λ ; MGF = e^{λ(e^t − 1)}

[1] I_p (k, n − k + 1) = ∫_0^p t^{k−1} (1 − t)^{n−k} /B(k, n − k + 1) dt (Incomplete Beta Function)
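A hedged cross-check of a few table entries, assuming scipy is available; this is not part of the original sheet. Note that the geometric distribution here counts failures before the first success (support 0, 1, 2, . . .), while scipy's geom counts trials, hence the loc shift.

from scipy import stats

# Binomial(n, p): mean np, variance np(1 - p)
n, p = 10, 0.3
b = stats.binom(n, p)
assert abs(b.mean() - n * p) < 1e-12
assert abs(b.var() - n * p * (1 - p)) < 1e-12

# Poisson(lam): mean = variance = lam
lam = 4.0
po = stats.poisson(lam)
assert abs(po.mean() - lam) < 1e-12 and abs(po.var() - lam) < 1e-12

# Geometric as tabulated here has support 0, 1, 2, ...; scipy's geom starts at 1,
# so shift by loc = -1 to match the parametrization of the table.
g = stats.geom(0.25, loc=-1)
print(g.mean(), (1 - 0.25) / 0.25)   # both 3.0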

Properties
Binomial
(1) Mode: [(n + 1) p] if (n + 1) p is not an integer, else {(n + 1) p − 1} and (n + 1) p.

(2) Factorial Moment: µ(r) = (n)r pr


(3) Bin (n, p) is symmetric iff p = 1/2

(4) Variance of Bin (n, p) is maximum iff p = 1/2, and the maximum variance = n/4.

(5) X, Y iid∼ Bin (n, 1/2) =⇒ P (X = Y ) = C(2n, n) (1/2)^{2n}

Geometric
(1) X : number of trials required to get the 1st success, then E(X) = 1/p and V ar(X) = (1 − p)/p²

(2) Lack of Memory: X ∼ Geo (p) ⇐⇒ P (X > m + n | X > m) = P (X ⩾ n), ∀m, n ∈ N

Negative Binomial
(1) Mode: [(n − 1)(1 − p)/p] if (n − 1)(1 − p)/p is not an integer, else {(n − 1)(1 − p)/p − 1} and (n − 1)(1 − p)/p.

(2) NB (n, p) ≡ Bin (−n, P ) where P = −(1 − p)/p

(3) Y : number of trials required to get the rth success. Then -

P (Y = y) = C(y − 1, r − 1) p^r (1 − p)^{y−r} , y = r, r + 1, . . .

Here, Y is discrete waiting time R.V. (Pascal Distribution)

(4) X ∼ Bin (n, p), Y ∼ NB (r, p) =⇒ P (X ≥ r) = P (Y ≤ n)

Poisson
(1) Mode: [λ] if λ is not an integer, else (λ − 1) and λ.

1.5.2 Continuous

U (a, b): CDF = (x − a)/(b − a) ; PDF = I{a<x<b} /(b − a) ; E(X) = (a + b)/2 ; Var(X) = (b − a)²/12 ; MGF = (e^{tb} − e^{ta} )/{t(b − a)}

Gamma (n, θ): CDF = Γ_x (n, θ)[2] ; PDF = e^{−x/θ} x^{n−1} /{θ^n Γ(n)} ; E(X) = nθ ; Var(X) = nθ² ; MGF = 1/(1 − tθ)^n

Exp (θ): CDF = 1 − e^{−x/θ} ; PDF = (1/θ) e^{−x/θ} ; E(X) = θ ; Var(X) = θ² ; MGF = 1/(1 − tθ)

Beta (m, n): CDF = I_x (m, n) ; PDF = x^{m−1} (1 − x)^{n−1} /B(m, n) ; E(X) = m/(m + n) ; Var(X) = mn/{(m + n)² (m + n + 1)} ; MGF: no simple closed form

Beta2 (m, n): PDF = x^{m−1} /{B(m, n) (1 + x)^{m+n} } ; E(X) = m/(n − 1) (n > 1) ; Var(X) = m(m + n − 1)/{(n − 2)(n − 1)²} (n > 2)

N (µ, σ²): CDF = Φ((x − µ)/σ) ; PDF = {1/(σ√(2π))} e^{−(x−µ)²/(2σ²)} ; E(X) = µ ; Var(X) = σ² ; MGF = e^{tµ + t²σ²/2}

Λ (µ, σ²) (lognormal): CDF = Φ((ln x − µ)/σ) ; PDF = {1/(xσ√(2π))} e^{−(ln x−µ)²/(2σ²)} ; E(X) = e^{µ + σ²/2} ; Var(X) = e^{2µ+σ²} (e^{σ²} − 1) ; MGF does not exist

C (µ, σ) (Cauchy): CDF = 1/2 + (1/π) tan^{−1}((x − µ)/σ) ; PDF = σ/{π(σ² + (x − µ)²)} ; mean, variance and MGF do not exist

SE (µ, σ) (shifted exponential): CDF = 1 − e^{−(x−µ)/σ} ; PDF = (1/σ) e^{−(x−µ)/σ} , x > µ ; E(X) = µ + σ ; Var(X) = σ² ; MGF = e^{tµ} /(1 − tσ)

DE (µ, σ) (double exponential): CDF = (1/2) e^{(x−µ)/σ} for x ≤ µ, 1 − (1/2) e^{−(x−µ)/σ} for x > µ ; PDF = {1/(2σ)} e^{−|x−µ|/σ} ; E(X) = µ ; Var(X) = 2σ² ; MGF = e^{tµ} /(1 − t²σ²)

Pareto (x0 , θ): CDF = 1 − (x0 /x)^θ ; PDF = θ x0^θ /x^{θ+1} , x > x0 ; E(X) = θx0 /(θ − 1) (θ > 1) ; Var(X) = θx0² /{(θ − 2)(θ − 1)²} (θ > 2)

Logistic (α, β): CDF = 1/{1 + e^{−(x−α)/β} } ; PDF = e^{−(x−α)/β} /[β {1 + e^{−(x−α)/β} }²] ; E(X) = α ; Var(X) = β²π²/3 ; MGF = πβt e^{tα} /sin(πβt)

[2] Γ_x (n, θ) = ∫_0^x e^{−t/θ} t^{n−1} /{θ^n Γ(n)} dt (Incomplete Gamma Function)

Properties
Uniform
(1) µ′r = (b^{r+1} − a^{r+1})/{(r + 1)(b − a)}

(2) X ∼ U (0, n), n ∈ N =⇒ X − [X] ∼ U (0, 1)


(3) Classical & Geometric definition of probability is based on ‘Uniform distribution’ over discrete
& continuous space, respectively.

Gamma
(1) Moments: µ′r = θ^r Γ(n + r)/Γ(n), if r > −n

(2) HM: (n − 1) θ, if n > 1


(3) Mode: Mode is at (n − 1) θ, if n > 1; 0, if n = 1 and for 0 < n < 1 no mode.

Exponential
(1) µ′r = θ^r r!

(2) ξp = −θ ln(1 − p) =⇒ Median = θ ln 2

(3) Mode is at x = 0; mean deviation about the mean: MD_θ = 2θ/e

(4) Lack of Memory: X ∼ Exp (θ) ⇐⇒ P (X > m + n | X > m) = P (X > n), ∀m, n > 0

(5) F ′(x)/{1 − F (x)} = constant ∀x > 0 ⇐⇒ X ∼ Exponential

(6) X ∼ Exp (λ) =⇒ [X] ∼ Geo (p = 1 − e^{−1/λ}) and (X − [X]) ⊥⊥ [X]

(7) Xi iid∼ DE (θ, 1) =⇒ P [X(1) ≤ θ ≤ X(n) ] = 1 − (1/2)^{n−1}

Beta
(1) µ′r = B(r + m, n)/B(m, n), if r + m > 0

(2) HM: (m − 1)/(m + n − 1), if m > 1

(3) Mode: (m − 1)/(m + n − 2), if m > 1, n > 1

(4) If m = n, median = 1/2, ∀n > 0 and mode = 1/2, if n > 1, else no mode.

(5) Beta (1, 1) ≡ U (0, 1) (identical in distribution)

Beta2

(1) µ′r = B(r + m, n − r)/B(m, n), if −m < r < n

(2) HM: (m − 1)/n, if m > 1

(3) Mode: (m − 1)/(n + 1), if m > 1; for 0 < m < 1, no mode.

Normal
(1) median = mode = µ and bell-shaped (unimodal)
(2) µ_{2r−1} = 0, µ_{2r} = (2σ²)^r Γ(r + 1/2)/√π = {(2r − 1)(2r − 3) · · · 5 · 3 · 1} σ^{2r}

(3) MD_µ = σ √(2/π)

(4) ∫ t ϕ(t) dt = −ϕ(t) + c

(5) For x > 0, (1/x − 1/x³) ϕ(x) < 1 − Φ(x) < ϕ(x)/x =⇒ 1 − Φ(x) ≃ ϕ(x)/x, for large x (x > 3)

(6) X ∼ N (0, 1) =⇒ E([X]) = −1/2 ([·]: greatest integer function)

Lognormal
(1) µ′r = e^{rµ + r²σ²/2}

(2) HM: e^{µ − σ²/2} , GM: e^µ , Median: e^µ , Mode: e^{µ − σ²}

=⇒ Mean > Median > Mode =⇒ Positively skewed

(3) Xi iid∼ Λ (µ, σ²) =⇒ GM(X) ∼ Λ (µ, σ²/n)
Cauchy
(1) µ′r exists for −1 < r < 1

(2) Median = Mode = µ

1.5.3 Multivariate
′
A ‘p’-component (dimensional) Random Vector (R.V.), X p×1 = X1 X2 · · · Xp defined on
(Ω, A) is a vector of p real-valued functions X1 (·), X2 (·), . . . , X˜p (·) defined on ‘Ω’ such that -
′
{ω : X1 (ω) ≤ x1 , X2 (ω) ≤ x2 , · · · , Xp (ω) ≤ xp } ∈ A, ∀x = x1 x2 · · · xp ∈ Rp is a random vector.
˜
 
(1) CDF: FX (x) = P {ω : X1 (ω) ≤ x1 , X2 (ω) ≤ x2 , · · · , Xp (ω) ≤ xp } , ∀x ∈ Rp
˜ ˜ ˜
(a) FX (x) is non-decreasing and right continuous w.r.t. each of x1 , x2 , . . . , xp .
˜ ˜
(b) FX (+∞, +∞, . . . , +∞) = 1, lim FX (x) = 0, ∀i = 1(1)p
˜ xi →−∞ ˜ ˜

(c) For h1 , h2 , . . . , hp > 0 -

P (x1 < X1 < x1 + h1 , x2 < X2 < x2 + h2 , . . . , xp < Xp < xp + hp ) ≥ 0


p
P p
(2) FX (xi ) − (p − 1) ≤ FX (x) ≤ p
FX1 (x1 ) FX2 (x2 ) · · · FXp (xp )
i=1 ˜ ˜ ˜

(3) The distribution of any sub-vector is a marginal distribution. There are (2p − 1) marginals.
 
p×1 X(1)
(4) Independence: X = ˜ , X(1) ⊥⊥ X(2) ⇐⇒ FX (x) = FX(1) (x(1) )·FX(2) (x(2) ), ∀x ∈ Rp
˜ X (2) ˜ ˜ ˜ ˜ ˜ ˜ ˜ ˜ ˜
˜

(5) E(a′ X) = a′ µ, V ar(a′ X) = a′ Σ a, Cov(a′ X, b′ X) = a′ Σ b for non-stochastic vectors a, b ∈ Rp


˜ ˜ ˜˜ ˜ ˜ ˜ ˜ ˜ ˜ ˜˜ ˜ ˜ ˜ ˜
′ ′
(6) E(AX) = Aµ, D(AX) = AΣA , Cov(AX, B X) = AΣB for non-stochastic matrices Aq×p , B r×p
 ˜ ˜ ˜ ˜ ˜
(7) E (X − α) A(X − α) = trace(AΣ) + (µ − α)′ A(µ − α)

˜ ˜ ˜ ˜ ˜ ˜ ˜ ˜
(8) A matrix Σ = (σij ) is a dispersion matrix if and only if it is n.n.d.
(9) Generalized variance: det(Σ), where Σ = E (X − µ)(X − µ)′ = E(X X ′ )−µµ′ = D (X p×1 )

˜ ˜ ˜ ˜ ˜ ˜ ˜˜ ˜
(10) Σ is p.d. iff there is no a ̸= 0 for which P (a′ X = c) = 1
Σ is p.s.d. iff there is a ˜vector
˜ a ̸= 0 for which
˜ ˜ P a′ (X − µ) = 0 = 1
˜ ˜ ˜ ˜ ˜
(11) det(Σ) > 0 =⇒ Non-singular, det(Σ) = 0 =⇒ Singular Distribution
(12) Σ = BB ′ for any dispersion matrix Σ, where B is n.n.d.
(13) Σ is p.d. =⇒ Σ = BB ′ , B is non-singular and let, Y = B −1 (X−µ) =⇒ E(Y ) = 0, D(Y ) = Ip
˜ ˜ ˜ ˜ ˜ ˜
ρ12 − ρ23 ρ31
(14) ρ12· 3 = p p
1 − ρ213 1 − ρ223

Multinomial
PMF: (X1 , X2 , . . . , Xk ) ∼ Multinomial (n; p1 , p2 , . . . , pk )
(a) Singular -
k k

n! X X
px1 pxk · · · pxkk

 , if xi = n, pi = 1
fX (x1 , x2 , . . . , xk ) = x1 ! x2 ! · · · xk ! 1 k i=1 i=1
˜ 

0 , otherwise
k−1 k−1
 k−1 k−1

P P P P
(b) Non-singular - xi ≤ n, pi < 1 xk = n − xi , p k = 1 − pi
i=1 i=1 i=1 i=1
Properties
(
npi (1 − pi ) , if i = j
(1) E(Xi ) = npi , Cov(Xi , Xj ) = , i, j = 1, 2, . . . , k − 1
−npi pj ̸ j
, if i =
r
pi pj  k
P 
(2) ρij = ρ(Xi , Xj ) = − , i ̸= j As Xi = n, Xi ↑ =⇒ Xj ↓ on an average
(1 − pi )(1 − pj ) i=1

D = diag (p1 , p2 , . . . , pk−1 )


(3) det(Σ) = nk−1 det(D − P P ′ ) = nk−1 det(D)(1 − P ′ D−1 P )
˜˜ ˜ ˜ P = (p1 , p2 , . . . , pk−1 )′
˜
k−1
(4) X k−1×1 ∼ Multinomial (n; p1 , . . . , pk−1 ),
P
pi < 1
˜ i=1
 
k−1
 X p1 
=⇒ X1 | (X2 = x2 , . . . , Xk−1 = xk−1 ) ∼ Bin n − xi ,
 
k−1

 P 
i=2 1− pi
i=2

=⇒ the regression of X1 on X2 , X3 , . . . , Xk−1 is linear and the distribution is heteroscedastic.


 k−1
n
t′ X

pi (eti − 1)
P
(5) MGF: E e˜ ˜ = 1+
i=1
Multiple Correlation
(6) For singular case, ρ1· 23···k = 1
k−1
P
p1 · pi
i=2
(7) For non-singular case, ρ21· 23···k−1 =  k−1

P
(1 − p1 ) 1 − pi
i=2

p1 p2
ρ12· 34···k−1 = −p p
(1 − p2 − p3 − · · · − pk−1 ) (1 − p1 − p3 − · · · − pk−1 )

Bivariate Normal
(X, Y ) ∼ BN(µ1 , µ2 , σ1 , σ2 , ρ)
 
(1) X ∼ N (µ1 , σ12 ), Y | X = x ∼ N µ2 + ρ σσ22 (x − µ1 ), σ22 (1 − ρ2 )
 2     2 
1 X−µ1 X−µ1 Y −µ2 Y −µ2
(2) Q(X, Y ) = 1−ρ2 σ1
− 2ρ σ1 σ2
+ σ2 = U 2 + V 2 ∼ χ22

Y − µ2 − ρ σσ21 (X − µ1 ) X − µ1 iid
where, U = p , V = ∼ N (0, 1)
σ2 1 − ρ2 σ1
(3) (X, Y ) is independent ⇐⇒ ρ = 0
(4) (X, Y ) ∼ BN(0, 0, 1, 1, ρ)
 q 
X+Y
(a) (X + Y ) ⊥⊥ (X − Y ) =⇒ X−Y ∼ C 0, 1+ρ 1−ρ
q  q 
(b) E{max(X, Y )} = 1−ρ π
, PDF: f U (u) = 2ϕ(u)Φ u 1−ρ
1+ρ

(c) ρ(X 2 , Y 2 ) = ρ2
(
−1, X < 0
(5) (X, Y ) ∼ BN(0, 0, 1, 1, 0), Y1 = X1 sgn(X2 ), Y2 = X2 sgn(X1 ), where sgn(X) =
1, X>0
2
=⇒ (Y1 , Y2 ) ≁ BN, ρ(Y1 , Y2 ) = π

1.5.4 Truncated Distribution


Univariate
F (x) be the CDF of X over the sample space X. Let, A = (a, b] ⊂ X, then the CDF of X over
truncated space A is -


 0 ,x ≤ a

 F (x) − F (a)
G(x) = P (X ≤ x | X ∈ A) = ,a < x ≤ b

 F (b) − F (a)

1 ,x > b

f (x)
PMF/PDF: P (X∈A)
, x∈A

Results
ˆ E(X) = E(X|A) · P (X ∈ A) + E(X|Ac ) · P (X ∈ Ac )

ˆ X ∼ Geo (p) =⇒ (X − k) | X ≥ k ∼ Geo (p)

ˆ Truncated Normal distribution is platykurtic.

ˆ Truncated Cauchy distribution has finite moments.

Bivariate
(X, Y ): bivariate R.V. with PDF, f (x, y) over the sample space, X ⊆ R2 . Let, A ⊂ X, then the PDF
over the truncated space is -

f (x, y)
g(x, y) =   , if (x, y) ∈ A
P (X, Y ) ∈ A

• µ′r,s (A) = E(X r Y s |A) = xr y s  f (x,y)  dx dy


RR
A P (X,Y )∈A

1.6 Sampling Distributions


1.6.1 Chi-square, t, F
χ2n
(1) E(X) = n, Var (X) = 2n
Γ( n2 + r)
(2) µ′r =2 ·r
n , if r > − n2
Γ( 2 )
D n

(3) χ2n ≡ Gamma 2
,2 , n∈N

tn
n
(1) E(X) = 0 (n > 1), Var (X) = n−2
(n > 2)

Γ( 21 + r) Γ( n2 − r)
(2) µ′2r = nr · √ · , if −1 < 2r < n
π Γ( n2 )
D
(3) t1 ≡ C (0, 1)
D
(4) t2n ≡ F1,n

Fn1 ,n2
n2 2n22 (n1 + n2 − 2)
(1) E(X) = (if n2 > 2), Var (X) = (if n2 > 4)
n2 − 2 n1 (n2 − 2)2 (n2 − 4)
n2 (n1 − 2)
(2) Mode: (if n1 > 2) =⇒ Mean > 1 > Mode (if n1 , n2 > 2)
n1 (n2 + 2)
r
Γ( n21 + r) Γ( n22 − r)

n2
(3) µ′r = · · , if −n1 < 2r < n2
n1 Γ( n21 ) Γ( n22 )

(4) ξp and ξp′ are pth quantile of Fn1 ,n2 and Fn2 ,n1 respectively =⇒ ξp ξ1−p =1
1 D
(5) F ∼ Fn,n =⇒ F ≡ and median (F ) = 1
F
n1 n n 
1 2
(6) F ∼ Fn1 ,n2 =⇒ F ∼ Beta2 ,
n2 2 2
(7) Points of inflexion are equidistant from mode (if n > 4)

1.6.2 Order Statistics


Order Statistics: X(1) ≤ X(2) ≤ · · · ≤ X(n)
n
n
{F (x)}k {1 − F (x)}n−k = IF (x) (r, n − r + 1)
P 
(1) FX(r) (x) = k
k=r

(2) FX(1) ,X(n) (x1 , x2 ) = {F (x2 )}n − {F (x2 ) − F (x1 )}n , x1 < x2

Only for Absolutely Continuous Random Variable - CDF: F (x), PDF: f (x)
(3) fX(r) (x) = n!
(r−1)!(n−r)!
{F (x)}r−1 f (x) {1 − F (x)}n−r , x ∈ R

x<y
(4) Joint PDF: n!
(r−1)!(s−r−1)!(n−r)!
{F (x)}r−1 f (x) {F (y) − F (x)}s−r−1 f (y) {1 − F (y)}n−s ,
(r < s)
R∞
(5) Sample Range: fR (r) = n(n − 1) {F (r + s) − F (s)}n−2 f (r + s)f (s) ds, 0 < r < ∞
−∞

Results
(6) Xi iid∼ U(0, 1) =⇒ X(r) ∼ Beta (r, n − r + 1), r = 1, 2, . . . , n (see the simulation sketch at the end of this subsection)
iid   σ   σ
(7) X1 , X2 ∼ N (µ, σ 2 ) =⇒ E X(1) = µ − √ , E X(2) = µ + √
π π
iid 1
(8) X1 , X2 , X3 ∼ N (µ, σ 2 ) =⇒ Sample Range: 2
(|X1 − X2 | + |X2 − X3 | + |X3 − X1 |)
iid
(9) Xi ∼ Exp (θ) =⇒ E[X(n) ] = θ 1 + 21 + 31 · · · + 1

n

iid ⊥
⊥ θ

(10) Xi ∼ Exp (θ) =⇒ Ui = X(i) − X(i−1) ∼ Exp n−i+1
, X(0) = 0

(11) X1 , X2 , . . . , X2k+1 : random sample from a continuous distribution, symmetric about µ


=⇒ Distribution of X̃ is also symmetric about µ
 
X(1) +X(n)
=⇒ E(X̃) = µ, E 2
= µ (if exists)

(12) Xi iid∼ Shifted Exp (θ, 1) =⇒ (n − i + 1){X(i) − X(i−1) } iid∼ Exp (1)

=⇒ 2n{X(1) − θ} ∼ χ²_2 ⊥⊥ 2 Σ_{i=2}^{n} {X(i) − X(1) } ∼ χ²_{2n−2}
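A Monte Carlo check of result (6) above; this sketch is not part of the original sheet and assumes numpy is available. The r-th order statistic of U(0, 1) samples should match the Beta(r, n − r + 1) mean and variance.

import numpy as np

# Check: for X1,...,Xn iid U(0,1), X_(r) ~ Beta(r, n - r + 1).
rng = np.random.default_rng(1)
n, r, reps = 10, 3, 200_000

samples = rng.uniform(size=(reps, n))
x_r = np.sort(samples, axis=1)[:, r - 1]        # r-th order statistic of each sample

beta_mean = r / (n + 1)                          # mean of Beta(r, n-r+1)
beta_var = r * (n - r + 1) / ((n + 1) ** 2 * (n + 2))
print(x_r.mean(), beta_mean)                     # ~0.2727 vs 3/11
print(x_r.var(), beta_var)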

1.7 Distribution Relationships


1.7.1 Binomial
n
iid P
(1) Xi ∼ Bernoulli (p) =⇒ Xi ∼ Bin (n, p)
i=1

k
 k


⊥ P P
(2) Xi ∼ Bin (ni , p) =⇒ Xi ∼ Bin ni , p
i=1 i=1

iid
(3) X1 , X2 ∼ Bin (n, 12 ) =⇒ X1 − X2 is symmetric about ‘0’.
m
 k

⊥⊥ P P nk
(4) Xi ∼ Bin (ni , p) =⇒ Xk Xi = t ∼ Hyp N = ni , t, , k = 1(1)m
i=1 i=1 N
(5) Bin (n, p) → Poisson (λ = np), for n → ∞ and p → 0 such that np is finite.

(6) Bin (n, p) → N np, np(1 − p) , for large n and moderate p.

1.7.2 Negative Binomial


iid
(1) Xi ∼ Geo (p) =⇒ X(1) ∼ Geo (1 − q n ), where q = 1 − p
iid
(2) X, Y ∼ Geo (p) ⇐⇒ X X + Y = t ∼ U {0, 1, 2, . . . , t}
iid
(3) X, Y ∼ Geo (p) =⇒ min{X, Y } ⊥⊥ (X − Y )
n
iid P
(4) Xi ∼ Geo (p) =⇒ Xi ∼ NB (n, p)
i=1

k
 k


⊥ P P
(5) Xi ∼ NB (ni , p) =⇒ Xi ∼ NB ni , p
i=1 i=1

(6) NB (n, p) → Poisson (λ = n(1 − p)), for n → ∞ and p → 1 such that n(1 − p) is finite.

1.7.3 Poisson
k
 k


⊥ P P
(1) Xi ∼ Poisson (λi ) =⇒ Xi ∼ Poisson λi
i=1 i=1

m
  m

⊥ P λk P
(2) Xi ∼ Poisson (λi ) =⇒ Xk Xi = t ∼ Bin t, p = , k = 1(1)m, where λ = λi
i=1 λ i=1

k

⊥ P
(3) Xi ∼ Poisson (λi ) =⇒ (X1 , X2 , . . . , Xk ) Xi = t ∼ Multinomial (t, p1 , p2 , . . . , pk ), where
i=1
λi
pi = k
, i = 1, 2, . . . , k
P
λi
i=1

1.7.4 Normal
D
(1) X ∼ N (0, σ 2 ) =⇒ X ≡ −X
n
 n n



N (µi , σi2 ) a2i σi2
P P P
(2) Xi ∼ =⇒ ai X i ∼ N ai µ i ,
i=1 i=1 i=1
n n
iid
(3) Xi ∼ N (µ, σ 2 ),
P P
ai Xi ⊥⊥
bi Xi ⇐⇒ a.b = 0
i=1 ˜˜
i=1
X̄ and (X1 − X̄, X2 − X̄, . . . , Xn − X̄) are independently distributed

1.7.5 Gamma
iid θ

(1) Xi ∼ Exp (θ) =⇒ X(1) ∼ Exp n
iid
(2) X, Y ∼ Exp (θ) =⇒ X X + Y = t ∼ U(0, t)
(3) X ∼ Shifted Exp (µ, θ) =⇒ (X − µ) ∼ Exp (θ)
X−µ
(4) X ∼ DE (µ, σ) =⇒ σ
∼ Exp (θ = 1) and |X| ∼ Shifted Exp (µ, σ)
n
iid P
(5) Xi ∼ Exp (θ) ≡ Gamma (n = 1, θ) =⇒ Xi ∼ Gamma (n, θ)
i=1
n
 k


⊥ P P
(6) Xi ∼ Gamma (ni , θ) =⇒ Xi ∼ Gamma ni , θ
i=1 i=1

1.7.6 Beta
X
(1) X ∼ Beta (m, n) =⇒ 1−X
∼ Beta2 (m, n)
X
(2) X ∼ Beta2 (m, n) =⇒ 1+X
∼ Beta (m, n)

(3) X1 ∼ Beta (n1 , n2 ) & X2 ∼ Beta (n1 + 21 , n2 ), independently =⇒ X1 X2 ∼ Beta (2n1 , 2n2 )

1.7.7 Cauchy
iid
(1) Xi ∼ C (µ, σ) =⇒ X̄n ∼ C(µ, σ)
iid iid
(2) Xi ∼ C (0, σ) =⇒ X1i ∼ C 0, σ1 =⇒ HMX ∼ C(0, σ)

˜
n
n n

⊥⊥ P P P
(3) Xi ∼ C (µi , σi ) =⇒ Xi ∼ C µi , σi
i=1 i=1 i=1

1.7.8 Others
n
iid
Xi2 ∼ χ2n
P
(1) Xi ∼ N (0, 1) =⇒
i=1

(2) U ∼ N (0, 1), V ∼ χ2n , independently =⇒ √U ∼ tn


V /n


⊥ U1 /n1
(3) Ui ∼ χ2ni =⇒ U2 /n2
∼ Fn1 ,n2
D
(4) X is symmetric about ‘0’ =⇒ X ≡ −X

1.8 Transformations
1.8.1 Orthogonal
y = T (x) = An×n xn×1 → Linear Transformation. [If det(A) ̸= 0, Jacobian: J = det(A−1 )].
˜ ˜ ˜
(1) If T (x) is orthogonal transformation then AT A = In =⇒ det(A) = ±1 & |J| = 1
˜
(2) y y = xT x =⇒ |y|2 = |x|2 (length is preserved)
T
˜ ˜ ˜ ˜ ˜ ˜
n
iid
Xi2 = X T A1 X + X T A2 X, where A1 , A2 are n.n.d.
P
(3) Cochran’s theorem: Xi ∼ N (0, 1) &
i=1 ˜ ˜ ˜ ˜
matrices with ranks r1 , r2 , r1 + r2 = n

=⇒ X T A1 X ∼ χ2r1 and X T A2 X ∼ χ2r2 , independently.


˜ ˜ ˜ ˜

1.8.2 Polar
(1) For a point with Cartesian coordinates (x1 , x2 , . . . , xn ) in Rn -

x1 = r cos θ1

x2 = r sin θ1 cos θ2
x3 = r sin θ1 sin θ2 cos θ3
..
.
xn−1 = r sin θ1 · · · sin θn−2 cos θn−1
xn = r sin θ1 · · · sin θn−2 sin θn−1
n
where, r2 = x2i ,
P
0 < r < ∞ and 0 < θ1 , θ2 , . . . , θn−2 < π, 0 < θn−1 < 2π
i=1

Jacobian: |J| = rn−1 (sin θ1 )n−2 (sin θ2 )n−3 · · · sin θn−2

(2) X = R cos θ, Y = R sin θ, 0 < R < ∞, 0 < θ < 2π


iid θ ∼ U (0, 2π)
X, Y ∼ N (0, 1) ⇐⇒ D , independently.
R2 ∼ Exp (2) ≡ χ22

(3) θ ∼ U(0, 2π) ⊥⊥ R2 ∼ χ22 =⇒ R sin(θ + θ0 ) ∼ N (0, 1), θ0 is a fixed quantity

1.8.3 Special Transformations


X−a

(1) X ∼ U(a, b) =⇒ − ln b−a
∼ Exp (1)
iid
(2) X1 , X2 ∼ U(0, 1) =⇒ X1 + X2 ∼ Triangular (0, 2), |X1 − X2 | ∼ Beta (1, 2)

(3) Box-Muller Transformation (see the sketch at the end of this list):

X1 , X2 iid∼ U (0, 1) =⇒ Y1 = √(−2 ln X1 ) cos(2πX2 ), Y2 = √(−2 ln X1 ) sin(2πX2 ) iid∼ N (0, 1)

(4) X ∼ Gamma (n1 , θ), Y ∼ Gamma (n2 , θ)


X
=⇒ X + Y ∼ Gamma (n1 + n2 , θ), X+Y
∼ Beta (n1 , n2 ), independently
X
=⇒ X + Y ∼ Gamma (n1 + n2 , θ), Y
∼ Beta2 (n1 , n2 ), independently
iid X X
(5) X, Y ∼ N (0, 1) =⇒ ,
Y |Y |
∼ C(0, 1)
iid 1
=⇒ (X1 − X2 ), RX1 − (1 − R)X2 ∼ DE 0, 1θ
 
(6) X1 , X2 ∼ Exp (θ), R ∼ Bernoulli 2

(7) X ∼ Beta (a, b) ⊥⊥ Y ∼ Beta (a + b, c) =⇒ XY ∼ Beta (a, b + c)

(8) X ∼ U − π2 , π2 ⇐⇒ tan X ∼ C(0, 1)




iid
(9) Dirichlet Transformation: Xi ∼ Exp (θ)
n−k+1
P
n
Xi
P i=1
=⇒ Y1 = Xi ∼ Gamma (n, θ), Yk = n−k+2
∼ Beta (n − k + 1, 1), k = 2, 3, . . . , n
i=1 P
Xi
i=1

Y1 , Y2 , . . . , Yn are independently distributed.


iid
(10) X1 , X2 , X3 , X4 ∼ N (0, 1) =⇒ X1 X2 ± X3 X4 ∼ DE (0, 1) (valid for any combination)
n
iid 2
Xi ∼ χ22n
P
(11) Xi ∼ Exp (θ) =⇒ θ
i=1

(12) X ∼ Beta (θ, 1) =⇒ − ln X ∼ Exp 1θ




 
(13) X ∼ Pareto (θ, x0 ) =⇒ ln xX0 ∼ Exp 1

θ

(14) X1 , X2 iid∼ χ²_2 =⇒ (aX1 + bX2 )/(X1 + X2 ) ∼ U(a, b) (a < b)
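A minimal sketch of the Box-Muller transformation from item (3) of this list; not part of the original sheet and it assumes numpy is available.

import numpy as np

def box_muller(n, rng=None):
    """Generate n pairs of independent N(0,1) variates via the Box-Muller transform."""
    rng = rng or np.random.default_rng()
    x1 = rng.uniform(size=n)
    x2 = rng.uniform(size=n)
    y1 = np.sqrt(-2 * np.log(x1)) * np.cos(2 * np.pi * x2)
    y2 = np.sqrt(-2 * np.log(x1)) * np.sin(2 * np.pi * x2)
    return y1, y2   # two independent N(0,1) samples of size n

y1, y2 = box_muller(100_000, np.random.default_rng(2))
print(y1.mean(), y1.std(), np.corrcoef(y1, y2)[0, 1])   # ~0, ~1, ~0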
Chapter 2

Statistics

2.1 Point Estimation


2.1.1 Minimum MSE
(1) Measures of Closeness: T : Statistic/Estimator, ψ(θ): Parametric function
Destroying the randomness, general measures of closeness are -

(a) E|T − θ|r , for some r > 0 (smaller value is better)


 
(b) P |T − θ| < ϵ , for ϵ > 0 (higher value is better)

(2) Mean Square Error: MSE_{ψ(θ)}(T ) = E[T − ψ(θ)]² = V ar(T ) + b²(ψ(θ), T ), where b(ψ(θ), T ) is the bias.
T can be called a ‘good estimator’ of ψ(θ) if it has a small MSE.

(3) E(m′r ) = µ′r , if µ′r exists =⇒ E(X̄) = µ, provided µ = E(X1 ) exists


 n

2 1 2
= σ 2 , population variance (if exists) but E(m2 ) ̸= σ 2 = µ2
P
(4) E s = n−1
(Xi − X̄)
i=1

 n
  rn 
iid 2
pπ 1
P 2
P 2
(5) Xi ∼ N (0, σ ) =⇒ E T1 = 2
· n
|Xi | = σ = E T2 = Cn Xi .
i=1 i=1
2
n n Γ( n2 )
Xi2 ,
P P
Here, T1 , T2 are two UEs of σ based on |Xi | and respectively. (Cn = √ )
i=1 i=1 2 Γ( n+1
2
)

iid n−1 1

(6) Xi ∼ Exp (θ) =⇒ E(X̄) = θ, E nX̄
= θ

iid
(7) Xi ∼ N (µ, σ 2 ) =⇒ T ′ = n−1
n+1
· s2 has the smallest MSE in the class {bs2 : b > 0} i.e. a biased

estimator T is better than an UE s2 , in terms of MSE.

(8) X ∼ Poisson (λ) =⇒ T (X) = (−1)^X is the UMVUE of e^{−2λ}, which is an absurd UE.
Note: an absurd unbiased estimator is an unbiased estimator that can take values outside the parameter space.


iid
(9) Xi ∼ Poisson (λ) =⇒ Tα = αX̄ + (1 − α)s2 is an UE of λ for any α ∈ [0, 1]
=⇒ There may be infinitely many UEs

(10) Estimable Parametric Functions

(a) X ∼ Bin (n, p) =⇒ E[(X)r ] = (n)r pr , r = 1, 2, . . . , n


=⇒ Only polynomials of degree ≤ n are estimable.
(b) X ∼ Bernoulli (p) =⇒ Only ψ(p) = a + bp is estimable.

(c) X ∼ Poisson (λ) =⇒ E[(X)r ] = λr , r = 1, 2, . . . =⇒ e−λ is estimable but not λ1 , λ.

1
(11) X ∼ Bernoulli (θ), T1 (X) = X and T2 (X) = 2
=⇒ Between T1 and T2 none are uniformly better than the other, in terms of MSE.

iid    
(12) Xi ∼ f (x; θ), E T (X1 ) = θ, V ar T (X1 ) < ∞
=⇒ lim V ar(Sn ) = 0, where Sn is the UMVUE of θ
n→∞

(13) Best Linear Unbiased Estimator (BLUE)


T1 , T2 , . . . , Tk be UEs of ψ(θ) with known variances v1 , v2 , . . . , vk and are independent
k
1 X Ti
=⇒ BLUE of ψ(θ) : T = k
P 1 v
i=1 i
vi
i=1

2.1.2 Consistency
 
P |Tn − θ| < ϵ → 1
Tn is consistent for θ ⇐⇒  or  as n → ∞, ∀θ ∈ Ω for every ϵ > 0
P |Tn − θ| > ϵ → 0

(1) Sufficient Condition


P
E(Tn − θ)2 → 0 ⇐⇒ E(Tn ) → θ, V ar(Tn ) → 0 as n → ∞ =⇒ Tn −→ θ

n
P
(2) m′r = 1
xri −→ µ′r = E(X1r ), r = 1, 2, . . . , k (if k th order moment exists)
P
n
i=1

P
(3) If Tn −→ θ then -
P
(a) bn Tn −→ θ, if bn → 1 as n → ∞
P
(b) an + Tn −→ θ, if an → 0 as n → ∞

This also shows that, ‘unbiasedness’ and ‘consistency’ are not interrelated.

P P
(4) Invariance Property: Tn −→ θ =⇒ ψ(Tn ) −→ ψ(θ), provided ψ(·) is continuous

2.1.3 Sufficiency
S is sufficient for θ ⇐⇒ (X1 , X2 , . . . , Xn ) | S = s is independent of θ, ∀ s
S is sufficient for θ ⇐⇒ T | S = s is independent of θ, ∀ s, for all statistic T .

(1) Any one-to-one function of a sufficient statistic is also sufficient for a parameter.

(2) Factorization Theorem


n
Y 
f (xi ; θ) = g T (x); θ · h(x) ⇐⇒ T (X) is sufficient for θ
i=1
˜ ˜ ˜

where, g T (x); θ depends on θ and on x only through T (x) and h(x) is independent of θ.
˜ ˜ ˜ ˜
(3) Trivial Sufficient Statistic: (X1 , X2 , . . . , Xn ) and (X(1) , X(2) , . . . , X(n) ).
Sufficiency means “space reduction without losing any information”. In this aspect, the
order statistics, (X(1) , X(2) , . . . , X(n) ) is better as a sufficient statistic than the whole sample
i.e. (X1 , X2 , . . . , Xn ), with respect to data summarization.

(4) T1 , T2 are two sufficient statistic for θ =⇒ they are related


n
iid P
(5) Xi ∼ DE (µ, σ), ∃ non-trivial sufficient statistic if µ is known (say, µ0 ) and that is |Xi − µ0 |.
i=1

Minimal Sufficient Statistic


(6) T0 is a minimal sufficient of θ if,

(a) T0 is sufficient
(b) T0 is a function of every sufficient statistic
(7) Theorem: If, for every two sample points x and y, the ratio f (x; θ)/f (y; θ) is independent of θ if and only if T (x) = T (y), then T (X) is minimal sufficient for θ.
˜ ˜ ˜

2.1.4 Completeness
T is complete for θ ⇐⇒ “E[h(T )] = 0, ∀θ ∈ Ω =⇒ P [h(T ) = 0] = 1, ∀θ ∈ Ω”

Remark
If a two component statistic (T1 , T2 ) is minimal sufficient for a single component parameter θ, then
in general (T1 , T2 ) is not complete.
It is possible to find h1 (T1 ) and h2 (T2 ) such that,

E[h1 (T1 )] = ψ(θ) = E[h2 (T2 )], ∀θ

=⇒ E[h(T1 , T2 )] = 0, ∀θ where, h(T1 , T2 ) = h1 (T1 ) − h2 (T2 ) ̸= 0


=⇒ (T1 , T2 ) is not complete.

2.1.5 Exponential Family


One Parameter
An one parameter family of PDFs or PMFs, {f (x; θ) : θ ∈ Ω} that can be expressed in the form -
 
f (x; θ) = exp T (x)u(θ) + v(θ) + w(x) , x ∈ S

with the following regularity conditions -


C1 : The support, S = {x : f (x; θ) > 0} is independent of θ
C2 : The parameter space, Ω is an open interval in R i.e. Ω = {θ : a < θ < b}
C3 : {1, T (x)} and {1, u(θ)} are linearly independent i.e. T (x) and u(θ) are non-constant functions
is called One Parameter Exponential Family (OPEF)

K Parameter
A K-parameter family of PDFs or PMFs, {f (x; θ) : θ ∈ Ω ⊆ Rk } satisfying the form -
" k ˜ ˜ #
X
f (x; θ) = exp Ti (x)ui (θ) + v(θ) + w(x) , x ∈ S
˜ i=1
˜ ˜

with the following regularity conditions -


C1 : The support, S = {x : f (x; θ) > 0} is independent of θ
˜ ˜
C2 : The parameter space, Ω ⊆ Rk is an open rectangle in Rk i.e. ai < θi < bi , i = 1(1)k
C3 : {1, T1 (x), . . . , Tk (x)} and {1, u1 (θ), . . . , uk (θ)} are linearly independent
˜ ˜
is called K-parameter Exponential Family

Theorem
n
iid P
(a) X ∼ f (x; θ) ∈ OPEF =⇒ T (Xi ) is complete and sufficient for the family.
˜ i=1
 n n

iid P P
(b) X ∼ f (x; θ) ∈ K-parameter Exponential Family =⇒ T1 (Xi ), . . . , Tk (Xi ) is complete
˜ ˜ i=1 i=1
and sufficient for the family.

Distributions in Exponential Family



a(x) θx
a(x) θx
P
(a) f (x; θ) = g(θ)
, x = 0, 1, 2, . . . ; 0 < θ < ρ, a(x) ≥ 0, g(θ) = (Power Series)
x=0
=⇒ Binomial (n known), Poisson, Negative Binomial (n known) are in OPEF.
(b) Normal, Exponential, Gamma, Beta, Pareto (x0 known) are in the Exponential family.
(c) Uniform, Cauchy, Laplace, Shifted Exponential, {N (θ, θ2 ) : θ ̸= 0}, {N (θ, θ) : θ > 0} are not
in the Exponential Family.
The last two families are identified by Lehmann as ‘Curved Exponential Family’.

2.1.6 Methods of finding UMVUE


Theorem 2.1.6.1 (Necessary & Sufficient n Condition for UMVUE) Let X has a distribution
o
   
from {f (x; θ) : θ ∈ Ω}. Define, Uψ = T (X) : Eθ T (X) = ψ(θ), V arθ T (X) < ∞, ∀θ ∈ Ω and
n     o
U0 = u(X) : Eθ T (X) = 0, V arθ u(X) < ∞, ∀θ ∈ Ω . Then, T0 ∈ Uψ is UMVUE of θ if and
only if Covθ (T0 , u) = 0, ∀θ ∈ Ω, ∀u ∈ U0

Results
ˆ UMVUE if exists, is unique
k k
ˆ Ti is UMVUE of ψ(θ) =⇒
P P
ai Ti is UMVUE of ai ψi (θ)
i=1 i=1

ˆ T is UMVUE =⇒ T k is UMVUE =⇒ any polynomial function, f (T ) is UMVUE of their


expectations

Theorem 2.1.6.2 (Rao-Blackwell) Let X has a distribution from {f (x; θ) : θ ∈ Ω} and h be a


statistic from Uψ = {h : E(h) = ψ(θ), V ar(h) < ∞, ∀θ ∈ Ω}. Let, T be a sufficient statistic for θ.
Then-

(a) E(h | T ) is an UE of ψ(θ)


 
(b) V ar E(h | T ) ≤ V ar(h), ∀θ ∈ Ω

Implication: UMVUE is necessarily a function of minimal sufficient statistic

Theorem 2.1.6.3 (Lehmann-Scheffe) Let X has a distribution from {f (x; θ) : θ ∈ Ω} and T be


a complete sufficient statistic for θ. Then-
 
(a) If E h(T ) = ψ(θ), then UMVUE of ψ(θ) is the unique UE, h(T )

(b) If h∗ is an UE of ψ(θ), then E(h∗ | T ) is the UMVUE of ψ(θ)

UMVUE of Different Families


Binomial
n
iid P
Xi ∼ Bernoulli (p) =⇒ Complete Sufficient: T = Xi
i=1

T
(1) p = E(X1 ) : n

T (n−T )
(2) p(1 − p) = V ar(X1 ) : n(n−1)

(T )r T (T −1)···(T −r+1)
(3) pr : (n)r
= n(n−1)···(n−r+1)
, r = 1, 2, . . . , n

n
iid P
Xi ∼ Bin (n, p) =⇒ Complete Sufficient: T = Xi
i=1
X̄n T
→ p: = 2
n n

Poisson
n
iid P
Xi ∼ Poisson (λ) =⇒ Complete Sufficient: T = Xi
i=1

(T )r
(1) λr : nr
, r = 1, 2, . . .
k T (n−1)T −k
(2) e−k λk! = P (X1 = k) :

k nT

k T
(3) e−kλ = P (X1 = 0, X2 = 0, . . . , Xk = 0) : 1 −

n
, 1≤k<n

Geometric
n
iid P
Xi ∼ Geo (p) =⇒ Complete Sufficient: T = Xi
i=1
n−1
→ p = P (X1 = 0) : n−1+T

Uniform
iid
(1) Discrete: Xi ∼ U{1, 2, . . . , N } =⇒ Complete Sufficient: T = X(n)
T n+1 −(T −1)n+1
→N : T n −(T −1)n

iid
(2) Continuous: Xi ∼ U(0, θ) =⇒ Complete Sufficient: T = X(n)
n ′ o
→ ψ(θ) : T ψn(T ) + ψ(T ) ψ(θ) = θr : n+r
  r
n
T
iid 
Also if, Xi ∼ U(θ1 , θ2 ) =⇒ Complete Sufficient: T = X(1) , X(n)
nX(1) −X(n) nX(n) −X(1)
→ θ1 : n−1
θ2 : n−1

Gamma
n
iid P
Xi ∼ Exp (θ) =⇒ Complete Sufficient: T = Xi
i=1

T
(1) θ = E(X1 ) : n
1 n−1
(2) θ
: T
k k n−1
(3) P (X1 > k) = e− θ : 1 −

T
, if k < T
k (n−1) k n−2
(4) f (k; θ) = 1θ e− θ :

T
1− T
, if k < T
 n n

iid P P
Xi ∼ Gamma (p, θ) =⇒ Complete Sufficient: ln Xi , Xi (mean = p θ)
i=1 i=1

Γ(np)
For known p, θr : T r , r > −np
Γ(np + r)
iid σ0
Xi ∼ Shifted Exp(θ, σ0 ) =⇒ Complete Sufficient: X(1) → θ : X(1) −
n
n Γ(n)
iid
|Xi − µ0 | → σ r : T r , r > −n
P
Xi ∼ DE(µ0 , σ) =⇒ Complete Sufficient: T =
i=1 Γ(n + r)

Beta
 n n

iid P P
Xi ∼ Beta (θ1 , θ2 ) =⇒ Complete Sufficient: ln Xi , ln(1 − Xi )
i=1 i=1
n−1
For θ2 = 1, UMVUE of θ1 is n
P
− ln Xi
i=1

Normal
iid
n σ02
Xi ∼ N (µ, σ02 ) =⇒ Complete Sufficient: µ2 : X̄ 2 −
P
Xi or X̄ → µ : X̄
i=1 n
n
iid
Xi ∼ N (µ0 , σ 2 ) =⇒ Complete Sufficient: (Xi − µ0 )2 or S02 → σ r : S0r Kn,r , r > −n
P
i=1

 n n

iid 
Xi ∼ N (µ, σ 2 ) =⇒ Complete Sufficient: Xi2 or X̄, S 2
P P
Xi ,
i=1 i=1

(1) µ = E(X1 ) : X̄
" r
#
(n − 1) 2 Γ( n−1
2
)
(2) σ 2 : S 2 , σ r : S r Kn−1,r , r > −(n − 1) Kn−1,r = r n−1+r
2 2 Γ( 2 )
µ
(3) : X̄ · Kn−1,−r S −r , r < (n − 1)
σr
(4) pth quantile of X1 = ξp = µ + σ Φ−1 (p) : X̄ + Kn−1,1 S Φ−1 (p)
h i
iid
Xi ∼ N (θ, 1) ϕ(x; µ, σ 2 ) : PDF of N (µ, σ 2 )
 r 
n
(1) Φ(k − θ) = P (X1 ≤ k) : Φ (k − X̄)
n−1
(2) ϕ k; θ, 1) : ϕ(k; X̄, n−1

n

h2
(3) ehθ : ehX̄− 2n

Others
n n
iid Q P
Xi ∼ Pareto(θ, x0 ) =⇒ Complete Sufficient: Xi or ln Xi (x0 known)
i=1 i=1
P  Xi  r
n 
1 Γ(n)
→ r : ln x0 , r > −n
θ Γ(n + r) i=1
n−1
Special case: r = −1 =⇒ θ : P
n  
ln Xi
x0
i=1

iid
Xi ∼ Pareto(θ0 , α) =⇒ Complete Sufficient: X(1) (θ0 known)
 
r r r
 
→α : 1− X(1) if r < nθ0
nθ0

2.1.7 Cramer-Rao Inequality


Let X has a distribution from {f (x; θ) : θ ∈ Ω} satisfying the following regularity conditions -
(i) The parameter space, Ω is an open interval in R i.e. Ω = {θ : a < θ < b}

(ii) The support, S = {x : f (x; θ) > 0} is independent of θ



 
(iii) For each x ∈ S, ∂θ ln f (x; θ) exists and finite
P R
(iv) The identity “ f (x; θ) = 1” or “ f (x; θ)dx = 1” can be differentiated under the summation
x∈S S
or integral sign.
n     o
(v) T ∈ Uψ = T (X) : Eθ T (X) = ψ(θ), V arθ T (X) < ∞, ∀θ ∈ Ω is an UE of ψ(θ) such that
 
the derivative of ψ(θ) = E T (X) with respect to θ can be evaluated by differentiating under
the summation or integral sign.
′ (θ)}2 2
Then, V ar T (X) ≥ {ψI(θ)
  ∂
where I(θ) = E ∂θ ln f (x; θ) >0

Equality Case
‘=’ holds in CR Inequality iff -
∂  I(θ)
ln f (x; θ) = ± ′ {T − ψ(θ)} a.e. . . . . . . (∗)
∂θ ψ (θ)
⇐⇒ the family {f (x; θ) : θ ∈ Ω} belongs to OPEF
→ (∗) is the necessary and sufficient condition for attaining CRLB by an UE, T (X) of ψ(θ).

Remarks
 
(1) Even in OPEF, the only parametric function for which T (X) attains CRLB, is that E T (X)
′ (θ)
(2) If MVBUE T (X) of ψ(θ) exists, then it is given by, T (X) = ψ(θ) ± ψI(θ) ∂

· ∂θ ln f (X; θ)
MVBUE is also the UMVUE but UMVUE may not be MVBUE always -
ˆ Non-regular case: one of the regularity conditions does not hold, eg. {U(0, θ) : θ > 0}
ˆ If all the regularity conditions hold but CRLB is not attainable, then there may exist
UMVUE but that is not the MVBUE
(3) Fisher’s Information
∂ 2 h 2 i

(a) I(θ) = E ∂θ ln f (X; θ) = E − ∂θ2 ln f (X; θ)

(b) IX (θ) = n · IX1 (θ), if the regularity conditions hold


˜
iid
(c) X ∼ {f (x; θ) : θ ∈ Ω} =⇒ for any statistic T (X), IT (X) (θ) ≤ IX (θ)
˜ ˜ ˜ ˜
‘=’ holds if and only if T (X) is sufficient
˜ ∂ 2
E(T ) {ψ ′ (θ) + b′ (θ)}2
(4) Lower bound for the MSE of any estimator: MSEψ(θ) (T ) ≥ ∂θ =
I(θ) I(θ)
(5) {C(θ, 1) : θ ∈ R} is a regular family as the CR inequality holds, but CRLB is not attainable

2.1.8 Methods of Estimation


Method of Moments
If the sample drawn is a good representation of the population, then this method is quite reasonable.
Equate ‘k’ sample moments m′r with corresponding population moments µ′r and solve for k unknowns
for a k-parameter family.

Method of Least Squares


Here we minimize the sum of squares of errors with respect to the parameter (θ1 , θ2 , . . . , θk )

Model: yi = E(Y | X = xi ) + zi

Assumptions: Conditional distribution of Y | X = xi is homoscedastic.

Method of Maximum Likelihood


(1) Bernoulli (p)

(a) p ∈ (0, 1) =⇒ No MLE of p when x = 0 or x = 1, else X̄


˜ ˜ ˜ ˜
(b) Ω = p : p ∈ {Q′ ∩ [0, 1]} =⇒ No MLE of p ∈ Ω


iid
(2) Xi ∼ U(0, θ), θ > 0 =⇒ θ̂ = X(n)
 
iid
(3) Xi ∼ U(α, β), α < β =⇒ θˆ = α̂, β̂ = X(1) , X(n)

˜
(4) MLE is not unique
iid
Xi ∼ U(θ − k1 , θ + k2 ) =⇒ θ̂ = α(X(n) − k2 ) + (1 − α)(X(1) + k1 ), α ∈ [0, 1]

iid 
(5) Xi ∼ U(−θ, θ), θ > 0 =⇒ θ̂ = max {|Xi |} = max −X(1) , X(n)
i=1(1)n

   n

iid 2 2 1
P 2
(6) Xi ∼ N (µ, σ ) =⇒ µ̂, σ = X̄, n (Xi − X̄)
b
i=1

iid X̄
(7) Xi ∼ Gamma (p0 , θ) =⇒ θ̂ = p0
(p0 known)
iid n
(8) Xi ∼ Beta (θ, 1) =⇒ θ̂ = n
P
− ln Xi
i=1

 
 
iid
(9) Xi ∼ Pareto (x0 , θ) =⇒ xˆ0 , θ̂ = X(1) , n
n 
P Xi
ln X(1)
i=1

 n

iid 1
P
(10) Xi ∼ DE (µ, σ) =⇒ (µ̂, σ̂) = X̃, n |Xi − X̃|
i=1

iid 
(11) Xi ∼ Shifted Exp (µ, σ) =⇒ (µ̂, σ̂) = X(1) , X̄ − X(1)
In particular if µ = σ > 0, then µ̂ = X(1)

iid 1 3

(12) Truncated parameter: Xi ∼ Bernoulli (p), p ∈ ,
4 4
. Here, the MLE of p is -
1 1


 , if X̄ <


 4 4
1 3

p̂(X) = X̄, if ≤ X̄ ≤
˜ 
 4 4
 3 , if 3


X̄ >

4 4
1 3
It is better than the UMVUE, X̄ of p ∈ 4 , 4 , in terms of variability

Properties
(13) MLE, if exists is a function of (minimal) sufficient statistic
(14) Under the regularity conditions of CR inequality MVBUE exists, then that is the MLE
(15) Invariance property: θ̂ is the MLE of θ =⇒ h(θ̂) is the MLE of h(θ) for any function h(·)
(16) For large n, the bias of MLE become insignificant
(17) Under normality, LSE ≡ MLE.
(18) Asymptotic property
(a) Under certain regularity conditions, the MLE θ̂ of θ is consistent and also
√ 
   
a 1 1 
a 1
θ̂ ∼ N θ, = or n θ̂ − θ ∼ N 0,
nI1 (θ) In (θ) I1 (θ)

(b) In OPEF, if θ̂ is the MLE of θ then -

√n (θ̂ − θ) ∼a N (0, 1/I1 (θ)) =⇒ √n {ψ(θ̂) − ψ(θ)} ∼a N (0, {ψ′(θ)}²/I1 (θ))
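A hedged numerical illustration of two MLEs listed above, item (2) θ̂ = X(n) for U(0, θ) and item (6) (µ̂, σ̂²) for the normal; not part of the original sheet, assumes numpy is available, and the parameter values are made up.

import numpy as np

rng = np.random.default_rng(3)

# (2) U(0, theta): the MLE of theta is the sample maximum X_(n)
theta = 5.0
x = rng.uniform(0, theta, size=1_000)
theta_hat = x.max()
print("U(0, theta)  MLE:", theta_hat)          # slightly below 5 (the MLE is biased downward)

# (6) N(mu, sigma^2): MLEs are the sample mean and (1/n) * sum (xi - xbar)^2
y = rng.normal(loc=2.0, scale=3.0, size=1_000)
mu_hat = y.mean()
sigma2_hat = np.mean((y - mu_hat) ** 2)        # note: divisor n, not n - 1
print("Normal MLEs :", mu_hat, sigma2_hat)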

2.2 Testing of Hypothesis


2.2.1 Tests of Significance
Univariate Normal
(1) H0 : µ = µ0 against H1 : µ ̸= µ0
n √ o √ 
(a) σ = σ0 (known) → ω = x : n(x̄−µ σ0
0)
> τ α
2
n(x̄−µ0 )
σ0
> τα if H1 : µ > µ0
˜
n √ o n
(b) σ unknown → ω = x : n(x̄−µ 0) 2 1
(Xi − X̄)2
P
s
> t α
2
; n−1 , s = n−1
˜ i=1

(2) H0 : σ = σ0 against H1 : σ ̸= σ0
n o
ns20 2 2
(a) µ = µ0 (known), Z = σ02
→ ω = Zobs > χ α ; n or Zobs < χ1− α ; n
2 2
n o
(n−1)s2
(b) µ unknown, Z = σ02
→ ω = Zobs > χ2α ; n−1 or Zobs < χ21− α ; n−1
2 2

Two Independent Univariate Normal


(1) H0 : µ1 − µ2 = ξ0 (known) against H0 : µ1 − µ2 ̸= ξ0

(a) σ1 , σ2 are known Z = X̄r1 −σ2X̄2 −ξ



σ 2
0
→ ω = |Zobs | > τ α2
1+ 2
n1 n2

1 −X̄2 −ξ0
X̄q
 (n1 −1)s21 +(n2 −1)s22
(b) σ1 = σ2 = σ (unknown), Z = → ω = |Zobs | > t α2 ; n1 +n2 −2 , s2 = n1 +n2 −2
s n1 + n1
1 2

σ1 σ1
(2) H0 : σ2
= ξ0 (known) against H1 : σ2
̸= ξ0
n o
s210 1 1
(a) µ1 , µ2 are known, F = · → ω = Fobs > F 2 ; n1 ,n2 or Fobs > F 2 ; n2 ,n1
s220 ξ02
α α

n o
s21 1 1
(b) µ1 , µ2 are unknown F = s2 · ξ2 → ω = Fobs > F 2 ; n1 −1,n2 −1 or Fobs > F 2 ; n2 −1,n1 −1
α α
2 0

Bivariate Normal (Correlated Case)


n √ o
n(x̄−ȳ−ξ0 )
(1) H0 : µ1 − µ2 = ξ0 (known) → ω = (x, y) : sxy
> t 2 ; n−1 , s2xy = s2x + s2y + 2rsx sy
α
˜ ˜
n √ o
(2) H0 : ρ = 0 → ω = r√1−r n−2
2 > t α2 ; n−2 [r: sample correlation coefficient of (x, y)]
˜ ˜
 √ 
U = X + ξ0 Y
(3) H0 : σσ21 = ξ0 → ω = r√ uv n−2
> t α2 ; n−2
2
1−ruv V = X − ξ0 Y

Binomial Proportion
(I) Single Proportion - H0 : p = p0 , observed value: x0

(a) H1 : p > p0 , p-value = P1 = PH0 (X ≥ x0 )


(b) H1 : p < p0 , p-value = P2 = PH0 (X ≤ x0 )
(c) H1 : p ̸= p0 , p-value = P3 = 2 · min{P1 , P2 } (Reject H0 if p-value ≤ α)

(II) Two Proportions - H0 : p1 = p2 = p, observed value of X1 : x10 and X1 + X2 : x0

(a) H1 : p1 > p2 , p-value = P1 = PH0 (X1 ≥ x10 | X1 + X2 = x0 )


(b) H1 : p1 < p2 , p-value = P2 = PH0 (X1 ≤ x10 | X1 + X2 = x0 )
(c) H1 : p1 ̸= p2 , p-value = P3 = 2 · min{P1 , P2 } (Reject H0 if p-value ≤ α)
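A hedged sketch of the single-proportion p-values in (I) above, computed exactly with scipy.stats.binom; not part of the original sheet, and the numbers n, p0, x0 are hypothetical.

from scipy import stats

# Exact test of H0: p = p0 for X ~ Bin(n, p0), observed value x0 (hypothetical numbers).
n, p0, x0 = 20, 0.5, 15

p_greater = stats.binom.sf(x0 - 1, n, p0)   # P1 = P(X >= x0) under H0
p_less = stats.binom.cdf(x0, n, p0)         # P2 = P(X <= x0) under H0
p_two_sided = 2 * min(p_greater, p_less)    # P3, as defined above

print(p_greater, p_less, p_two_sided)       # reject H0 at level alpha if p-value <= alpha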

Poisson Mean
n
P
(I) Single Population - H0 : λ = λ0 , observed value of S = X i : s0
i=1

(a) H1 : λ > λ0 , p-value = P1 = PH0 (S ≥ s0 )


(b) H1 : λ < λ0 , p-value = P2 = PH0 (S ≤ s0 )
(c) H1 : λ ̸= λ0 , p-value = P3 = 2 · min{P1 , P2 } (Reject H0 if p-value ≤ α)

n1
P
(II) Two Populations - H0 : λ1 = λ2 = λ, observed value of S1 = X1i : s10 and S1 + S2 : s0
i=1

(a) H1 : λ > λ0 , p-value = P1 = PH0 (S1 ≥ s10 | S1 + S2 = s0 )


(b) H1 : λ < λ0 , p-value = P2 = PH0 (S1 ≤ s10 | S1 + S2 = s0 )
(c) H1 : λ ̸= λ0 , p-value = P3 = 2 · min{P1 , P2 } (Reject H0 if p-value ≤ α)

2.3 Interval Estimation



T : sufficient statistic and θ1 (T ), θ2 (T ) is a confidence interval with confidence coefficient (1 − α)
  
=⇒ P θ1 (T ), θ2 (T ) ∋ ψ(θ) = 1 − α ∀θ ∈ Ω

2.3.1 Methods of finding C.I.


Find a function ϕ(T, θ), whose sampling distribution is completely specified. This is the pivot.
Then find c1 , c2 based on the restriction: Pθ [c1 < ϕ(T, θ) < c2 ] = 1 − α

Note
For a parameter θ, the method of guessing θ is known as estimation and an interval estimate of θ
is known as confidence interval for θ.
For a R.V. Y , a method of guessing Y is known as prediction and an interval prediction of Y is
known as prediction limits.
 q 
iid
Xi ∼ N (µ, σ 2 ), i = 1(1)n =⇒ Prediction limits for Xn+1 : X̄ ∓ t α2 ; n−1 n+1
n
s

2.3.2 Wilk’s Optimum Criteria



Definition: A (1 − α) level confidence interval θ∗ (T ), θ (T ) of θ ∈ Ω, is said to be shortest length


confidence interval, in the class of all level (1 − α) confidence intervals based on a pivot ψ(T, θ), if

Eθ θ∗ (T ) − θ (T ) ≤ Eθ θ(T ) − θ(T ) , ∀θ ∈ Ω
   


whatever the other (1 − α) level confidence interval θ(T ), θ(T ) based on ψ(T, θ).

2.3.3 Test Inversion Method


Let A(θ0 ) be the “Acceptance Region” of a size ‘α’ test of H0 : θ = θ0 . Define,

I(x) = {θ ∈ Ω : A(θ) ∋ x} , x ∈ X
˜ ˜
then I(x) is a confidence interval for θ at confidence coefficient (1 − α).
˜

List of Confidence Intervals


 
(1) Xi iid∼ U(0, θ) : Pivot = X(n) /θ =⇒ ( X(n) , X(n) /α^{1/n} )

(2) Xi iid∼ Shifted Exp (µ, σ0 ) : Pivot = (n/σ0 )[X(1) − µ] =⇒ ( X(1) + (σ0 /n) ln α , X(1) ) (finite length)
Infinite length: ( −∞ , X(1) + (σ0 /n) ln(1 − α) )   (σ0 known)

(3) Xi iid∼ N (µ, σ²) : Pivot = √n (X̄ − µ)/S =⇒ ( X̄ ∓ t_{α/2; n−1} S/√n )

(4) Xi iid∼ Exp (θ) : Pivot = (2/θ) Σ_{i=1}^{n} Xi = 2T /θ =⇒ ( 2T /χ²_{α/2; 2n} , 2T /χ²_{1−α/2; 2n} )
Based on X(1) , Pivot = (2n/θ) X(1) =⇒ ( 2nX(1) /χ²_{α/2; 2} , 2nX(1) /χ²_{1−α/2; 2} )
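A hedged sketch of interval (3) above, the t pivot for a normal mean; not part of the original sheet, assumes numpy and scipy are available, and the simulated data are only an example.

import numpy as np
from scipy import stats

# 95% confidence interval for a normal mean using the t pivot (item (3) above).
rng = np.random.default_rng(4)
x = rng.normal(loc=10.0, scale=2.0, size=25)

n, alpha = len(x), 0.05
xbar, s = x.mean(), x.std(ddof=1)            # s uses the (n - 1) divisor
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
half_width = t_crit * s / np.sqrt(n)

print(f"({xbar - half_width:.3f}, {xbar + half_width:.3f})")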

2.4 Large Sample Theory


2.4.1 Modes of Convergence
(I) Convergence in Distribution
Definition: A sequence {Xn } of random variables with the corresponding sequence Fn (x) of D.F.s
is said to converge to a random variable X with D.F. F (x), if

lim Fn (x) = F (x), at every continuity point of F(x)


n→∞

Results
iid  D D
(1) Xn ∼ U(0, θ) =⇒ n θ − X(n) −→ Exp (θ) ←− nX(1)
iid  D
(2) Xn ∼ Shifted Exp (0, θ) =⇒ n X(n) − θ −→ Exp (1)
iid D
(3) Xn ∼ N (µ, σ 2 ) =⇒ X̄ −→ µ
D
(4) Xn ∼ Geo (pn = nθ ) =⇒ Xn
n
−→ Exp ( 1θ )
D
(5) X ∼ NB (n, p) =⇒ 2pX −→ χ22n as p → 0
Xn D
(6) Xn ∼ Gamma (n, β) =⇒ n
−→ β

Limiting MGF
D
(7) MGF → Xn : Mn (t), X : M (t), E(Xn ) exists ∀ n and Xn −→ X
If lim Mn (t), lim E(Xn ) is finite then MN (t) → M (t), E(Xn ) → E(X) as n → ∞
n→∞ n→∞

(8) Theorem: Let, {Fn } be a sequence of D.F.s with corresponding M.G.F.s {Mn } and suppose
that Mn (t) exists for |t| ≤ t0 , ∀n. If there exists a D.F. F with corresponding M.G.F. M ,
which exists for |t| ≤ t1 < t0 , such that Mn (t) → M (t) as n → ∞ for every t ∈ [−t1 , t1 ], then
W
Fn −→ F

(II) Convergence in Probability


Definition: Let, {Xn } be a sequence of R.V.s defined on the probability space (Ω, A, P). Then we
say that {Xn } converges in probability to a R.V. X, defined on (Ω, A, P), if for every ϵ > 0,

lim P [|Xn − X| < ϵ] = 1 or lim P [|Xn − X| > ϵ] = 0


n→∞ n→∞

Sufficient Condition: If {Xn } is a sequence of R.V.s such that E(Xn ) → C and V ar(Xn ) → 0 as
P
n → ∞ or E(Xn − C)2 → 0 as n → ∞, then Xn −→ C.
Counter example:

1

1 −
 ,x = k
P (Xn = x) = n =⇒ E(Xn − k)2 ̸→ 0 as n → ∞ but Xn −→ k
P
1
,x = k + n


n

Results
 n
 n1
iid Q P 1
(1) Xi ∼ U(0, 1) =⇒ Xi −→ e
i=1

P P
(2) Xn −→ X, lim an = a ∈ R =⇒ an Xn −→ aX
n→∞

D P
(3) Xn −→ X, lim an = ∞, an > 0 ∀n =⇒ a−1
n Xn −→ 0
n→∞

P P
(4) Limit Theorems: If Xn −→ X, Yn −→ Y , then,
P
(a) Xn ± Yn −→ X ± Y
P
(b) Xn Yn −→ XY
P
(c) g(Xn ) −→ g(X), if g(·) is continuous (Invariance Property)
(d) Xn /Yn →P X/Y , provided P (Yn = 0) = 0 = P (Y = 0)
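A small simulation illustrating convergence in probability of the sample mean (the sufficient condition above with C = µ); not part of the original sheet and it assumes numpy is available.

import numpy as np

# P(|Xbar_n - mu| > eps) shrinks as n grows (weak law of large numbers).
rng = np.random.default_rng(5)
mu, eps, reps = 1.0, 0.1, 2_000

for n in (10, 100, 1000, 5000):
    xbar = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)
    print(n, np.mean(np.abs(xbar - mu) > eps))   # decreasing toward 0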
Chapter 3

Mathematics

3.1 Basics
3.1.1 Combinatorial Analysis
(1) For a population with $n$ elements, the number of samples of size $r$ is -

$$\text{ordered sample} = \begin{cases} n^r, & \text{WR} \\ {}^{n}P_r \text{ or } (n)_r, & \text{WOR} \end{cases} \qquad \text{unordered sample} = {}^{n}C_r \text{ or } \binom{n}{r}, \ \text{WOR}$$

(2) Partition of population - The number of ways in which a population of $n$ elements can be divided into $k$ ordered parts of which the $i$th part consists of $r_i$ members, $i = 1, 2, \ldots, k$, is -

$$\binom{n}{r_1}\binom{n - r_1}{r_2}\cdots\binom{n - r_1 - r_2 - \cdots - r_{k-1}}{r_k} = \frac{n!}{r_1!\, r_2! \cdots r_k!} = \binom{n}{r_1\ r_2\ \cdots\ r_k}, \qquad r_1 + r_2 + \cdots + r_k = n$$

(a) The number of different distributions of $r$ identical balls into $n$ cells, i.e. the number of different solutions $(r_1, r_2, \ldots, r_n)$ of the equation
$$r_1 + r_2 + \cdots + r_n = r \quad \text{where } r_i \ge 0 \text{ are integers, is } \binom{n + r - 1}{r} = \binom{n + r - 1}{n - 1}$$

(b) The number of different distributions of $r$ indistinguishable balls into $n$ cells in which no cell remains empty, i.e. the number of different solutions $(r_1, r_2, \ldots, r_n)$ of the equation
$$r_1 + r_2 + \cdots + r_n = r \quad \text{where } r_i \ge 1 \text{ are integers, is } \binom{r - 1}{n - 1}$$
(a numerical check of (1) and (2) appears after item (3))

(3) (a) $\dbinom{n}{0}^2 + \dbinom{n}{1}^2 + \cdots + \dbinom{n}{n}^2 = \dbinom{2n}{n}$

(b) $\dbinom{n}{0}\dbinom{n}{m} + \dbinom{n}{1}\dbinom{n}{m+1} + \cdots + \dbinom{n}{n-m}\dbinom{n}{n} = \dbinom{2n}{n-m}$
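Some of these counts can be verified directly with Python's standard library; a small sketch (our own illustration, with arbitrary $n$ and $r$):

```python
# Sketch: sampling counts and occupancy (stars-and-bars) counts from the rules above.
# Purely illustrative; the chosen n and r are arbitrary.
from math import comb, perm

n, r = 5, 3
print(n ** r)            # ordered samples WR: 125
print(perm(n, r))        # ordered samples WOR: 60
print(comb(n, r))        # unordered samples WOR: 10

# r identical balls into n cells (r_i >= 0): C(n + r - 1, r)
print(comb(n + r - 1, r))     # 35
# no cell empty (r_i >= 1): C(r - 1, n - 1)  -- here r must be >= n
print(comb(7 - 1, 5 - 1))     # 15, for r = 7 balls into n = 5 cells
```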


3.1.2 Difference Equation


If $\{x_n\}$ is a sequence, then $x_n = f(x_{n-1}, \ldots, x_2, x_1)$ is a difference equation.
A.P.: $x_n = x_{n-1} + d$
G.P.: $x_n = r\, x_{n-1}$
$x_n = a_1 x_{n-1} + \cdots + a_p x_{n-p}$ is a linear difference equation of order $p$.

First Order Linear Difference Equation

$$x_n = ax_{n-1} + b,\ n \ge 1 \implies x_n - c = (x_1 - c)a^{n-1}, \quad \text{where } b = c(1 - a)\ \left(\text{i.e. } c = \tfrac{b}{1-a},\ a \ne 1\right)$$

Second Order Linear Difference Equation

$$x_n = ax_{n-1} + bx_{n-2},\ n \ge 2$$
Characteristic Equation: $u^2 - au - b = 0$ with roots $u_1, u_2$
Case I: $u_1 \ne u_2 \implies x_n = Au_1^n + Bu_2^n$
Case II: $u_1 = u_2 = u \implies x_n = (A + Bn)u^n$
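A minimal sketch (our own example) that solves the distinct-roots case numerically, using the Fibonacci recurrence $x_n = x_{n-1} + x_{n-2}$ as a test case:

```python
# Sketch: solve x_n = a*x_{n-1} + b*x_{n-2} via the characteristic equation
# u^2 - a*u - b = 0 (distinct-roots case only; illustrative, names our own).
import numpy as np

def solve_recurrence(a, b, x0, x1, n):
    u1, u2 = np.roots([1, -a, -b])                 # roots of u^2 - a*u - b
    # Fit x_n = A*u1^n + B*u2^n to the two initial values x0, x1
    A, B = np.linalg.solve([[1, 1], [u1, u2]], [x0, x1])
    return (A * u1 ** n + B * u2 ** n).real

print(solve_recurrence(1, 1, 0, 1, 10))            # 10th Fibonacci number: 55
```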

3.2 Linear Algebra


3.2.1 Vectors & Vector Spaces
(1) Length of $a$ : $|a| = \sqrt{\sum\limits_{i=1}^{n} a_i^2}$; also $\sum\limits_{i=1}^{n} a_i = \mathbf{1}\cdot a$, where $\mathbf{1} = (1, 1, \ldots, 1)$

(2) Distance between $a$ and $b$ $= |b - a| = \sqrt{(b - a)\cdot(b - a)} = \sqrt{\sum\limits_{i=1}^{n}(b_i - a_i)^2}$

(3) Angle ($\theta$) between two non-null vectors $a$ and $b$ is given by $\cos\theta = \dfrac{a\cdot b}{|a|\,|b|}$

(4) Cauchy-Schwarz: $(a\cdot b)^2 \le |a|^2 |b|^2$; '$=$' holds iff $b = \lambda a$ for some $\lambda$

(5) Triangle Inequality: $|a - b| + |b - c| \ge |a - c|$

Vector Spaces
A vector space V over a field F is a non-empty collection of elements, satisfying the following
axioms:

(a) a, b ∈ V =⇒ a + b ∈ V [closed under vector addition]

(b) a ∈ V =⇒ αa ∈ V, ∀ α ∈ F [closed under scalar multiplication]

Note: Every vector space must include the null vector $0$.
Useful Results

(1) "$\sum\limits_{i=1}^{r} \lambda_i a_i = 0 \implies \lambda_1 = \lambda_2 = \cdots = \lambda_r = 0$" $\iff$ $\{a_1, a_2, \ldots, a_r\}$ are Linearly Independent.
The set is L.D. iff there exist $\lambda_i$'s, not all zero, for which $\sum\limits_{i=1}^{r} \lambda_i a_i = 0$

(2) A set of vectors is LIN $\implies$ any subset is LIN
A set of vectors is LD $\implies$ any superset is LD

(3) Basis: a set of vectors that (i) spans $V$ and (ii) is LIN

(a) $\{1, t, t^2, \ldots, t^n\}$ : polynomials of degree $\le n$

(b) $\{e_1, e_2, \ldots, e_n\}$ : $n$ dimensional vector space

(4) The representation of every vector in terms of a basis is unique.

(5) Replacement theorem: $\{a_1, a_2, \ldots, a_r\}$ is a basis and $b = \sum\limits_{i=1}^{r} \lambda_i a_i$.
If $\lambda_i \ne 0$ then $a_i$ may be replaced by $b$, i.e. $\{a_1, a_2, \ldots, a_{i-1}, b, a_{i+1}, \ldots, a_r\}$ is also a basis.

(6) (a) Any basis of $V_n$ contains exactly $n$ vectors
(b) Any $n$ LIN vectors from $V_n$ form a basis for $V_n$
(c) Any set of $(n + 1)$ vectors from $V_n$ is LD

(7) Extension theorem: A set of m(< n) LIN vectors from Vn can be extended to a basis of Vn

(8) Dimension: Number of vectors in a basis or maximum number of LIN vectors in the space
Dimension of a subspace: {Total no. of vectors} - {No. of LIN restrictions}

(9) $V = \{0\}$ has no basis $\implies \dim(V)$ is undefined [We assume $\dim(V) = 0$]
(10) Consider two subspaces $S, T$ of a vector space $V$ over the field $F$ -

(a) $S \cap T$ is also a subspace and $\dim(S \cap T) \le \min\{\dim(S), \dim(T)\} \le \sqrt{\dim(S)\dim(T)}$
(b) $S + T = \{a + b : a \in S,\ b \in T\} \implies \dim(S + T) = \dim(S) + \dim(T) - \dim(S \cap T)$
(c) $S \subseteq T \implies \dim(S) \le \dim(T)$, and then $\dim(S) = \dim(T) \iff S = T$
In general, $\dim(S) = \dim(T) \not\Rightarrow S = T$

Orthogonal Vectors
(11) $a, b \in E^n$ are orthogonal ($\perp$) if $a\cdot b = 0$ [$0$ is orthogonal to every vector]

(12) The set of vectors $\{a_1, a_2, \ldots, a_n\}$ is mutually orthogonal if $a_i\cdot a_j = 0,\ \forall\, i \ne j$

(13) If a mutually orthogonal set includes the null vector then it is LD, else LIN

(14) $E^n \ni a \perp S_n \subseteq E^n \iff a$ is orthogonal to a basis of $S_n$

(15) Ortho-complement of $S_n$, $O(S_n)$ : collection of all vectors in $E^n$ which are orthogonal to $S_n$

(16) (a) $S_n \cap O(S_n) = \{0\}$ (b) $S_n + O(S_n) = E^n$ (c) $O\{O(S_n)\} = S_n$
where $S_n$ is a subspace of $E^n$

(17) The sum $S + T = \{x + y : x \in S,\ y \in T\}$ is called a direct sum, written $S \oplus T$, when every element of $S + T$ has a unique representation $x + y$ with $x \in S,\ y \in T$

(18) $S \oplus T \iff$ "$S \cap T = \{0\}$" $\iff$ "If $x \in S,\ y \in T,\ x, y \ne 0$ then $\{x, y\}$ is LIN"
$\iff$ "$x + y = 0 \implies x = y = 0$ for $x \in S,\ y \in T$" $\iff$ "$\dim(S + T) = \dim(S) + \dim(T)$"

(19) Subspaces $S, T$ of $V$ are said to be complementary if $S \oplus T = V$

(20) $M_{n\times n}(\mathbb{R})$ : vector space of all $(n \times n)$ real matrices
$S$ : $(n \times n)$ symmetric matrices, $T$ : $(n \times n)$ skew-symmetric matrices
$\implies S, T$ are subspaces of $M_{n\times n}$ and $S \oplus T = M_{n\times n}$, $\dim(S) = \dfrac{n(n+1)}{2}$, $\dim(T) = \dfrac{n(n-1)}{2}$
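Result (20) can be seen concretely: any square matrix splits uniquely into a symmetric and a skew-symmetric part. A short numpy sketch (our own illustration, with an arbitrary example matrix):

```python
# Sketch: unique decomposition A = S + T with S symmetric, T skew-symmetric,
# illustrating M_nxn = S (+) T (direct sum). Example matrix is arbitrary.
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 10.]])
S = (A + A.T) / 2            # symmetric part
T = (A - A.T) / 2            # skew-symmetric part

print(np.allclose(S, S.T), np.allclose(T, -T.T))   # True True
print(np.allclose(S + T, A))                        # True: A = S + T
```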

3.2.2 Matrices
(1) Consider a matrix $A_{m\times n}$

(a) Row Space: $R(A) = \{x'A : x \in \mathbb{R}^m\}$

(b) Column Space: $C(A) = \{Ax : x \in \mathbb{R}^n\}$

(c) Null Space: $N(A) = \{x \in \mathbb{R}^n : Ax = 0\}$

(d) Left Null Space: $N'(A) = \{x \in \mathbb{R}^m : x'A = 0'\}$

(2) $N(A) = O\{R(A)\} \implies \dim N(A) = n - \dim R(A) =$ nullity of $A$

(3) $A_{m\times n},\ B_{n\times p}$ with $AB = O \implies C(B) \subseteq N(A) \implies r(A) + r(B) \le n$

(4) $A_{n\times n}$ with $A^2 = A \implies C(I_n - A) = N(A)$

Rank
(5) $r(A_{m\times n}) \le \min\{m, n\}$

(6) $r(AB) \le \min\{r(A), r(B)\}$ if $AB$ is defined

(7) $r(AB) = r(A)$, if $\det B \ne 0$

(8) $A^2 = A \iff r(A) + r(I_n - A) = n$

(9) $r(A + B) \le r(A) + r(B)$

(10) $r(A) = r(A') = r(A'A) = r(AA')$

(11) $r(A) = r \implies A = \sum\limits_{k=1}^{r} M_k$ with $r(M_k) = 1,\ k = 1, 2, \ldots, r$

(12) $r(AB - I) \le r(A - I) + r(B - I)$

(13) $A_{m\times n},\ B_{s\times n}$ with $AB' = O \implies r(A'A + B'B) = r(A) + r(B)$

(14) $A^2 = A,\ B^2 = B,\ \det(I - A - B) \ne 0 \implies r(A) = r(B)$

(15) $r\begin{pmatrix} A & B \\ O & C \end{pmatrix} \ge r(A) + r(C)$

(16) Sylvester Inequality: for $A_{m\times n},\ B_{n\times p}$, $r(AB) \ge r(A) + r(B) - n$
$\implies r(A) + r(B) - n \le r(AB) \le \min\{r(A), r(B)\}$

(17) $r(A + B) \le r(A) + r(B) \le r(AB) + n$, provided $AB$ and $(A + B)$ are defined

(18) $r(AB) = r(B) - \dim\{N(A) \cap C(B)\}$

(19) $r(x'x) = r(x'y) = r(xy') = 1$, where $x, y \ne 0 \in \mathbb{R}^n$ (for $r(x'y) = 1$ one also needs $x'y \ne 0$)
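Several of the rank results above, e.g. (10) and (16), are easy to spot-check numerically; a small numpy sketch (our own illustration, with random integer matrices):

```python
# Sketch: numerical spot-checks of r(A) = r(A'A) = r(AA') and the Sylvester bound
# r(AB) >= r(A) + r(B) - n for random matrices (illustrative only).
import numpy as np

rng = np.random.default_rng(3)
A = rng.integers(-2, 3, size=(4, 6)).astype(float)
B = rng.integers(-2, 3, size=(6, 5)).astype(float)
rank = np.linalg.matrix_rank

print(rank(A), rank(A.T @ A), rank(A @ A.T))          # all equal
n = A.shape[1]
print(rank(A) + rank(B) - n <= rank(A @ B) <= min(rank(A), rank(B)))   # True
```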
Other Results
(20) Sum of all entries in a matrix $A$ : $\mathbf{1}'A\mathbf{1}$

(21) $A^2 = A \implies (I_n - A)$ is also idempotent

(22) $C_{m\times r} = A_{m\times n}B_{n\times r} \implies C = (c_{ij}),\ c_{ij} = \sum\limits_{k=1}^{n} a_{ik}b_{kj}$

(23) $tr(A + B) = tr(A) + tr(B)$
$tr(AB) = tr(BA)$

3.2.3 Determinants
(1) $\begin{vmatrix} 1 & a & a^2 \\ 1 & b & b^2 \\ 1 & c & c^2 \end{vmatrix} = (a - b)(b - c)(c - a)$

(2) $\begin{vmatrix} a & b & \cdots & b \\ b & a & \cdots & b \\ \vdots & \vdots & \ddots & \vdots \\ b & b & \cdots & a \end{vmatrix}_{n\times n} = \{a + (n - 1)b\}(a - b)^{n-1}$

(3) $\begin{vmatrix} a+b & b+c & c+a \\ b+c & c+a & a+b \\ c+a & a+b & b+c \end{vmatrix} = \begin{vmatrix} a & b & c \\ b & c & a \\ c & a & b \end{vmatrix} \times \begin{vmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{vmatrix} = 2\begin{vmatrix} a & b & c \\ b & c & a \\ c & a & b \end{vmatrix}$

(4) Tridiagonal matrix

$$A_n = \begin{vmatrix} a & b & 0 & 0 & \cdots & 0 & 0 \\ c & a & b & 0 & \cdots & 0 & 0 \\ 0 & c & a & b & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & c & a \end{vmatrix}_{n\times n} \implies A_n = aA_{n-1} - bc\,A_{n-2};$$
in particular, when $a = 1 + bc$, $A_n = 1 + bc + (bc)^2 + \cdots + (bc)^n$
 
(5) $\begin{vmatrix} x_1^2 + y_1^2 & x_1x_2 + y_1y_2 & \cdots & x_1x_n + y_1y_n \\ x_2x_1 + y_2y_1 & x_2^2 + y_2^2 & \cdots & x_2x_n + y_2y_n \\ \vdots & \vdots & \ddots & \vdots \\ x_nx_1 + y_ny_1 & x_nx_2 + y_ny_2 & \cdots & x_n^2 + y_n^2 \end{vmatrix}_{n\times n} = |A|\,|A'| = 0$, where $A = \begin{pmatrix} x_1 & y_1 & 0 & \cdots & 0 \\ x_2 & y_2 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ x_n & y_n & 0 & \cdots & 0 \end{pmatrix}_{n\times n}$

(6) $A^{n\times n}$ and $B^{n\times n}$ differ only by a single row (or column) $\implies |A + B| = 2^{n-1}\left(|A| + |B|\right)$

(7) $A = (a_{ij}),\ B = (b_{ij})$ with $b_{ij} = r^{i-j}a_{ij} \implies |B| = |A|$

(8) $\begin{vmatrix} I & O \\ O & A \end{vmatrix} = |A|$, $\begin{vmatrix} I & B \\ O & A \end{vmatrix} = |A| \implies \begin{vmatrix} A & B \\ O & C \end{vmatrix} = \begin{vmatrix} I & O \\ O & C \end{vmatrix} \times \begin{vmatrix} A & B \\ O & I \end{vmatrix} = |C||A| = |A||C|$

(9) $A^{n\times n}$ : $A(\text{adj}\,A) = (\text{adj}\,A)A = |A|\,I_n$

$$\sum_{i=1}^{n} a_{ri}A_{si} = \begin{cases} |A|, & r = s \\ 0, & r \ne s \end{cases} \qquad r, s = 1, 2, \ldots, n$$

(10) (a) $|\text{adj}\,A| = |A|^{n-1}$

(b) $(\text{adj}\,A)^{-1} = \text{adj}(A^{-1}) = \dfrac{A}{|A|}$

(c) $\text{adj}(\text{adj}\,A) = |A|^{n-2}A$

(d) $|\text{adj}(\text{adj}\,A)| = |A|^{(n-1)^2}$

(11) $|kA| = k^n|A|$, $\text{adj}(kA) = k^{n-1}\,\text{adj}\,A$

(12) $\text{adj}(AB) = \text{adj}(B)\,\text{adj}(A)$ [if $|A|, |B| \ne 0$]

(13) Adjoint of a symmetric matrix is symmetric
Adjoint of a skew-symmetric matrix is symmetric for odd order, skew-symmetric for even order
Adjoint of a diagonal matrix is diagonal

(14) $r(\text{adj}\,A) = \begin{cases} 0, & \text{if } r(A) \le n - 2 \\ 1, & \text{if } r(A) = n - 1 \\ n, & \text{if } r(A) = n \end{cases}$

Inverse of a Matrix
(15) (a) $(AB)^{-1} = B^{-1}A^{-1}$

(b) $(A^{-1})' = (A')^{-1}$

(c) $|A + B| \ne 0 \implies |A^{-1} + B^{-1}| \ne 0 \implies (B^{-1} + A^{-1})^{-1} = A(A + B)^{-1}B$
(16) $A^{n\times n} = \begin{pmatrix} A_{11}^{k\times k} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$, $|A| \ne 0$

$$|A| = \begin{cases} |A_{11}|\,|A_{22} - A_{21}A_{11}^{-1}A_{12}|, & \text{if } |A_{11}| \ne 0 \\ |A_{22}|\,|A_{11} - A_{12}A_{22}^{-1}A_{21}|, & \text{if } |A_{22}| \ne 0 \end{cases}$$
 
(17) $M = \begin{pmatrix} A & -u \\ v' & 1 \end{pmatrix}$, $|A| \ne 0 \implies |M| = |A|(1 + v'A^{-1}u) = |A + uv'|$

(18) $|A| \ne 0$ : $|A + uv'| \ne 0 \iff (1 + v'A^{-1}u) \ne 0$, and
$$(A + uv')^{-1} = A^{-1} - \frac{A^{-1}uv'A^{-1}}{1 + v'A^{-1}u}$$
(a numerical check of this appears after item (20))

(19) $A_{a,b}^{n\times n} = (a - b)I_n + b\,\mathbf{1}\mathbf{1}' \implies A_{a,b}^{-1} = A_{c,d}$ iff $\Delta = (a - b)\{a + (n - 1)b\} \ne 0$,
where $c = \dfrac{a + (n - 2)b}{\Delta}$, $d = -\dfrac{b}{\Delta}$

(20) $A^{n\times n} = (a_{ij})$, $|A| \ne 0$, $\sum\limits_{j=1}^{n} a_{ij} = k\ \forall\, i \implies \sum\limits_{j=1}^{n} b_{ij} = \dfrac{1}{k}\ \forall\, i$, where $A^{-1} = (b_{ij})$
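A numerical check of the rank-one update formula in (18) (our own sketch; the matrices are arbitrary, and $A$ is shifted to keep it comfortably nonsingular):

```python
# Sketch: verify the Sherman-Morrison identity
# (A + u v')^{-1} = A^{-1} - (A^{-1} u v' A^{-1}) / (1 + v' A^{-1} u)
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(4, 4)) + 4 * np.eye(4)     # well-conditioned, nonsingular
u = rng.normal(size=(4, 1))
v = rng.normal(size=(4, 1))

Ainv = np.linalg.inv(A)
lhs = np.linalg.inv(A + u @ v.T)
rhs = Ainv - (Ainv @ u @ v.T @ Ainv) / (1 + v.T @ Ainv @ u)
print(np.allclose(lhs, rhs))                     # True
```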

Orthogonal Matrix
 
(21) $A^{n\times n} = \begin{pmatrix} a_1' \\ a_2' \\ \vdots \\ a_n' \end{pmatrix}$ where $a_i' = \begin{pmatrix} a_{i1} & a_{i2} & \cdots & a_{in} \end{pmatrix}$, $i = 1, 2, \ldots, n$

$$AA' = I_n \implies a_i'a_j = \begin{cases} 1, & \text{when } i = j \\ 0, & \text{when } i \ne j \end{cases} \qquad |A| = \pm 1$$

(22) $AA' = A'A = I_n$, $|I_n + A| \ne 0 \implies (I_n + A)^{-1}(I_n - A)$ is skew-symmetric

(23) $A, B$ real orthogonal with $|A| + |B| = 0 \implies |A + B| = 0$

(24) $AA' = kI_n \implies A'A = kI_n$



Rank & Determinant


(25) Theorem: For a matrix Am×n the rank of A is the order of the “highest order non-vanishing
minor” of A.

(26) $A^{n\times n}$ : $r(A) = n \iff |A| \ne 0$

(27) Elementary Matrices: An elementary matrix is a matrix which differs from the identity matrix by a single row (or column) operation.

(28) Elementary Row Operation ≡ Pre-multiplying by corresponding elementary row matrix


Elementary Column Operation ≡ Post-multiplying by corresponding elementary column matrix
 
(29) $r(A_{m\times n}) = k \implies \exists\ P_{m\times m}, Q_{n\times n}$ with $|P|, |Q| \ne 0$ such that $PAQ = \begin{pmatrix} I_k & O \\ O & O \end{pmatrix}$ (Normal Form)
(30) Any non-singular matrix can be written as a product of elementary matrices

(31) Rank Factorization: Let $r(A_{m\times n}) = k$; a pair $(P_{m\times k}, Q_{k\times n})$ of matrices is said to be a rank factorization of $A$ if $A_{m\times n} = P_{m\times k}Q_{k\times n}$ (the factorization is not unique; see the SVD-based sketch after item (34))

(32) $A_{m\times n} = P_{m\times k}Q_{k\times n}$, $r(A) \le k$; the following statements are equivalent -

(a) $r(A) = k$, i.e. $(P, Q)$ is a rank factorization of $A$

(b) $r(P_{m\times k}) = r(Q_{k\times n}) = k$

(c) The columns of $P$ form a basis of $C(A)$ and the rows of $Q$ form a basis of $R(A)$

(33) $A^2 = A$, $A_{m\times m} = P_{m\times k}Q_{k\times m}$ where $k = r(A) \implies$ (i) $QP = I_k$ (ii) $r(A) = tr(A)$

(34) r(A | B) = r(A) ⇐⇒ B = AC, for some C
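A rank factorization as in (31)-(32) can be obtained numerically in many ways; one choice (our own sketch) uses the truncated SVD:

```python
# Sketch: build a rank factorization A = P Q with P (m x k), Q (k x n), k = r(A),
# from the truncated SVD (the factorization is not unique).
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 0., 1.]])
U, s, Vt = np.linalg.svd(A)
k = np.sum(s > 1e-10)                  # numerical rank
P = U[:, :k] * s[:k]                   # m x k
Q = Vt[:k, :]                          # k x n
print(k, np.allclose(P @ Q, A))        # rank and True
```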

3.2.4 System of Linear Equation


System of Linear Equations: $A_{m\times n}\, x_{n\times 1} = b_{m\times 1}$

(1) Homogeneous System: $Ax = 0$

(a) $Ax = 0$ is always consistent, since $x = 0$ is a trivial solution

(b) $A_{m\times n}x = 0$ has a non-trivial solution iff $r(A) < n$

(c) The number of LIN solutions of $Ax = 0$ is $\dim N(A) = n - r(A)$

(d) Elementary row operations on a matrix $A$ do not alter $N(A)$

(2) General System: $Ax = b,\ b \ne 0$

(a) $r(A\,|\,b)$ is either $r(A)$ or $r(A) + 1$; $C(A\,|\,b) \supseteq C(A)$

(b) $Ax = b$ is $\begin{cases} \text{Consistent} & \iff r(A\,|\,b) = r(A) \\ \text{Inconsistent} & \iff r(A\,|\,b) > r(A) \end{cases}$

(c) $A_{m\times n}x = b$ is consistent $\implies \begin{cases} \text{Unique solution} & \iff r(A) = n \\ \text{At least two solutions} & \iff r(A) < n \end{cases}$

(d) $Ax_1 = Ax_2 = b \implies \alpha x_1 + (1 - \alpha)x_2$ is also a solution
$\implies$ If a system has two distinct solutions then it has infinitely many solutions

(3) Theorem: Let $Ax = b$ be a consistent system with $x_0$ as a particular solution. Then the set of all possible solutions of $Ax = b$ is given by $x_0 + N(A) = \{x_0 + u : u \in N(A)\}$

• Points, lines and planes not necessarily passing through the origin are called 'flats'. If $W$ is a non-empty flat and $x_0$ is a fixed vector, then the translation of $W$ by $x_0$ is $x_0 + W = \{x_0 + w : w \in W\}$, which is a flat parallel to $W$.
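The "particular solution + null space" description in (3) is easy to see numerically; a short sketch (our own illustration) using numpy and scipy with a deliberately rank-deficient system:

```python
# Sketch: for a consistent system Ax = b, every x0 + (vector in N(A)) is again
# a solution. Example system is arbitrary and rank-deficient on purpose.
import numpy as np
from scipy.linalg import null_space

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])          # rank 1, so dim N(A) = 2
b = np.array([6., 12.])

x0, *_ = np.linalg.lstsq(A, b, rcond=None)   # a particular solution
N = null_space(A)                             # basis of N(A), shape (3, 2)

x = x0 + N @ np.array([1.5, -2.0])            # another solution
print(np.allclose(A @ x0, b), np.allclose(A @ x, b))   # True True
```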
