
STATISTICS-SUMMARY

PREM DAGAR

1. Probability

Definition 1.1. Sample Space. The set of all possible outcomes of an experiment is called the sample
space, and it is usually denoted by the letter S. Each outcome in a sample space is called an element of the
sample space, or simply a sample point.

Definition 1.2. Event. An event is a subset of a sample space.

Definition 1.3. Mutually Exclusive Events. Two events having no elements in common are said to be
mutually exclusive.

Definition 1.4. Conditional Probability. If A and B are any two events in a sample space S and P(A) ≠ 0,
the conditional probability of B given A is

P(B | A) = P(A ∩ B) / P(A).

Theorem 1.1. If A and B are any two events in a sample space S and P(A) ≠ 0, then

P(A ∩ B) = P(A) · P(B | A).

Theorem 1.2. If A, B, and C are any three events in a sample space S such that P(A ∩ B) ≠ 0, then

P(A ∩ B ∩ C) = P(A) · P(B | A) · P(C | A ∩ B).

Definition 1.5. Two events A and B are independent if and only if

P(A ∩ B) = P(A) · P(B).

Theorem 1.3. If the events B_1, B_2, . . . , B_k constitute a partition of the sample space S and P(B_i) ≠ 0 for
i = 1, 2, . . . , k, then for any event A in S,

P(A) = Σ_{i=1}^{k} P(B_i) · P(A | B_i).

Theorem 1.4 (Bayes' Theorem). If B_1, B_2, . . . , B_k constitute a partition of the sample space S and P(B_i) ≠ 0
for i = 1, 2, . . . , k, then for any event A in S such that P(A) ≠ 0,

P(B_r | A) = [P(B_r) · P(A | B_r)] / [Σ_{i=1}^{k} P(B_i) · P(A | B_i)]   for r = 1, 2, . . . , k.
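As an illustration of Theorems 1.3 and 1.4, the short Python sketch below computes P(A) by the rule of total probability and then the posterior probabilities P(B_r | A); the partition probabilities and conditional probabilities are made-up values for the example.

```python
# Hypothetical three-part partition B1, B2, B3 with prior probabilities P(Bi)
# and conditional probabilities P(A | Bi); the numbers are illustrative only.
prior = [0.5, 0.3, 0.2]          # P(B1), P(B2), P(B3)
likelihood = [0.02, 0.05, 0.10]  # P(A | B1), P(A | B2), P(A | B3)

# Theorem 1.3 (total probability): P(A) = sum_i P(Bi) * P(A | Bi)
p_a = sum(p * l for p, l in zip(prior, likelihood))

# Theorem 1.4 (Bayes' theorem): P(Br | A) = P(Br) * P(A | Br) / P(A)
posterior = [p * l / p_a for p, l in zip(prior, likelihood)]

print(p_a)        # 0.045
print(posterior)  # [0.222..., 0.333..., 0.444...]
```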

2. Distribution

Definition 2.1 (Random Variable). If S is a sample space with a probability measure and X is a real-valued
function defined over the elements of S, then X is called a random variable.

Definition 2.2 (Probability Distribution). If X is a discrete random variable, the function given by f (x) =
P (X = x) for each x within the range of X is called the probability distribution of X.

Theorem 2.1. A function can serve as the probability distribution of a discrete random variable X if and
only if its values, f(x), satisfy the conditions:

(1) f(x) ≥ 0 for each value within its domain;

(2) Σ_x f(x) = 1, where the summation extends over all the values within its domain.

Definition 2.3 (Distribution Function). If X is a discrete random variable, the function given by

F(x) = P(X ≤ x) = Σ_{t≤x} f(t)   for −∞ < x < ∞,

where f(t) is the value of the probability distribution of X at t, is called the distribution function or the
cumulative distribution of X.

Theorem 2.2. The values F (x) of the distribution function of a discrete random variable X satisfy the
following conditions:
(1) F (−∞) = 0 and F (∞) = 1;
(2) If a < b, then F (a) ≤ F (b) for any real numbers a and b.

Theorem 2.3. If the range of a random variable X consists of the values x1 < x2 < x3 < · · · < xn , then
f (x1 ) = F (x1 ) and f (xi ) = F (xi ) − F (xi−1 ) for i = 2, 3, . . . , n.

Definition 2.4 (Probability Density Function). A function with values f(x), defined over the set of all real
numbers, is called a probability density function of the continuous random variable X if and only if

P(a ≤ X ≤ b) = ∫_a^b f(x) dx

for any real constants a and b with a ≤ b.

Theorem 2.4. If X is a continuous random variable and a and b are real constants with a ≤ b, then
P (a ≤ X ≤ b) = P (a ≤ X < b) = P (a < X ≤ b) = P (a < X < b).

Theorem 2.5. A function can serve as a probability density of a continuous random variable X if its values,
f(x), satisfy the conditions:
(1) f(x) ≥ 0 for −∞ < x < ∞;
(2) ∫_{−∞}^{∞} f(x) dx = 1.

Definition 2.5 (Distribution Function). If X is a continuous random variable and the value of its probability
density at t is f(t), then the function given by

F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt   for −∞ < x < ∞

is called the distribution function or the cumulative distribution function of X.

Theorem 2.6. If f(x) and F(x) are the values of the probability density and the distribution function of X
at x, then

P(a ≤ X ≤ b) = F(b) − F(a)

for any real constants a and b with a ≤ b, and

f(x) = dF(x)/dx

where the derivative exists.
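A minimal numerical sketch of Theorem 2.6, assuming an exponential density f(x) = e^{−x} for x > 0 and an arbitrary interval [a, b]: integrating the density over [a, b] and evaluating F(b) − F(a) give the same probability.

```python
# Check of Theorem 2.6 for an exponential density (illustrative choice of density and interval).
from scipy import integrate, stats

a, b = 0.5, 2.0
f = lambda x: stats.expon.pdf(x)            # density f(x)
F = stats.expon.cdf                         # distribution function F(x)

prob_integral, _ = integrate.quad(f, a, b)  # P(a <= X <= b) as the integral of f over [a, b]
prob_cdf = F(b) - F(a)                      # P(a <= X <= b) = F(b) - F(a)

print(prob_integral, prob_cdf)              # both approximately 0.4712
```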

Definition 2.6 (Joint Probability Distribution). If X and Y are discrete random variables, the function given
by
f (x, y) = P (X = x, Y = y)
for each pair of values (x, y) within the range of X and Y is called the joint probability distribution of X and
Y.

Theorem 2.7. A bivariate function can serve as the joint probability distribution of a pair of discrete random
variables X and Y if and only if its values, f(x, y), satisfy the conditions:
(1) f(x, y) ≥ 0 for each pair of values (x, y) within its domain;
(2) Σ_x Σ_y f(x, y) = 1, where the double summation extends over all possible pairs (x, y) within its domain.

Definition 2.7 (Joint Probability Density Function). A bivariate function with values f(x, y) defined over
the xy-plane is called a joint probability density function of the continuous random variables X and Y if and
only if

P((X, Y) ∈ A) = ∫∫_A f(x, y) dx dy

for any region A in the xy-plane.

Theorem 2.8. A bivariate function can serve as a joint probability density function of a pair of continuous
random variables X and Y if its values, f(x, y), satisfy the conditions:
(1) f(x, y) ≥ 0 for −∞ < x < ∞, −∞ < y < ∞;
(2) ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1.

Definition 2.8 (Joint Distribution Function). If X and Y are continuous random variables, the function
given by

F(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f(s, t) ds dt   for −∞ < x < ∞, −∞ < y < ∞,

where f(s, t) is the joint probability density of X and Y at (s, t), is called the joint distribution function of
X and Y.

If X and Y are discrete random variables and f(x, y) is the value of their joint probability distribution at
(x, y), the function given by

g(x) = Σ_y f(x, y)

for each x within the range of X is called the marginal distribution of X. Correspondingly, the function given
by

h(y) = Σ_x f(x, y)

for each y within the range of Y is called the marginal distribution of Y.

Definition 2.9 (Marginal Density). If X and Y are continuous random variables and f(x, y) is the value of
their joint probability density at (x, y), the function given by

g(x) = ∫_{−∞}^{∞} f(x, y) dy   for −∞ < x < ∞

is called the marginal density of X. Correspondingly, the function given by

h(y) = ∫_{−∞}^{∞} f(x, y) dx   for −∞ < y < ∞

is called the marginal density of Y.

Definition 2.10 (Conditional Distribution). If f(x, y) is the value of the joint probability distribution of the
discrete random variables X and Y at (x, y) and h(y) is the value of the marginal distribution of Y at y, the
function given by

f(x | y) = f(x, y) / h(y),   h(y) ≠ 0,

for each x within the range of X is called the conditional distribution of X given Y = y. Correspondingly, if
g(x) is the value of the marginal distribution of X at x, the function given by

w(y | x) = f(x, y) / g(x),   g(x) ≠ 0,

for each y within the range of Y is called the conditional distribution of Y given X = x.

Definition 2.11 (Conditional Density). If f(x, y) is the value of the joint density of the continuous random
variables X and Y at (x, y) and h(y) is the value of the marginal density of Y at y, the function given by

f(x | y) = f(x, y) / h(y),   h(y) ≠ 0,

for −∞ < x < ∞, is called the conditional density of X given Y = y. Correspondingly, if g(x) is the value of
the marginal density of X at x, the function given by

f(y | x) = f(x, y) / g(x),   g(x) ≠ 0,

for −∞ < y < ∞, is called the conditional density of Y given X = x.

Definition 2.12 (Independence of Discrete Random Variables). If f (x1 , x2 , . . . , xn ) is the value of the joint
probability distribution of the discrete random variables X1 , X2 , . . . , Xn at (x1 , x2 , . . . , xn ) and fi (xi ) is the
value of the marginal distribution of Xi at xi for i = 1, 2, . . . , n, then the n random variables are independent
if and only if
f (x1 , x2 , . . . , xn ) = f1 (x1 ) · f2 (x2 ) · . . . · fn (xn )
for all (x1 , x2 , . . . , xn ) within their range.

3. Expectations

Definition 3.1 (Expected Value). If X is a discrete random variable and f(x) is the value of its probability
distribution at x, the expected value of X is

E(X) = Σ_x x · f(x).

Correspondingly, if X is a continuous random variable and f(x) is the value of its probability density at x,
the expected value of X is

E(X) = ∫_{−∞}^{∞} x · f(x) dx.

Theorem 3.1. If X is a discrete random variable and f(x) is the value of its probability distribution at x,
the expected value of g(X) is given by

E[g(X)] = Σ_x g(x) · f(x).

Correspondingly, if X is a continuous random variable and f(x) is the value of its probability density at x,
the expected value of g(X) is given by

E[g(X)] = ∫_{−∞}^{∞} g(x) · f(x) dx.

Theorem 3.2. If a and b are constants, then

E(aX + b) = aE(X) + b.

Corollary 3.3. If a is a constant, then


E(aX) = aE(X).

Theorem 3.4. If c_1, c_2, . . . , c_n are constants, then

E[Σ_{i=1}^{n} c_i g_i(X)] = Σ_{i=1}^{n} c_i E[g_i(X)].

Theorem 3.5. If X and Y are discrete random variables and f(x, y) is the value of their joint probability
distribution at (x, y), the expected value of g(X, Y) is

E[g(X, Y)] = Σ_x Σ_y g(x, y) · f(x, y).

Correspondingly, if X and Y are continuous random variables and f(x, y) is the value of their joint probability
density at (x, y), the expected value of g(X, Y) is

E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) · f(x, y) dx dy.

Theorem 3.6. If c_1, c_2, . . . , c_n are constants, then

E[Σ_{i=1}^{n} c_i g_i(X_1, X_2, . . . , X_k)] = Σ_{i=1}^{n} c_i E[g_i(X_1, X_2, . . . , X_k)].

Definition 3.2 (Moments about the Origin). The rth moment about the origin of a random variable X,
denoted by µ′_r, is the expected value of X^r; symbolically,

µ′_r = E(X^r) = Σ_x x^r · f(x)

for r = 0, 1, 2, . . . when X is discrete, and

µ′_r = E(X^r) = ∫_{−∞}^{∞} x^r · f(x) dx

when X is continuous.

Definition 3.3 (Mean of a Distribution). µ′1 is called the mean of the distribution of X, or simply the mean
of X, and it is denoted simply by µ.

Definition 3.4 (Moments about the Mean). The rth moment about the mean of a random variable X,
denoted by µ_r, is the expected value of (X − µ)^r; symbolically,

µ_r = E[(X − µ)^r] = Σ_x (x − µ)^r · f(x)

for r = 0, 1, 2, . . . when X is discrete, and

µ_r = E[(X − µ)^r] = ∫_{−∞}^{∞} (x − µ)^r · f(x) dx

when X is continuous.

Definition 3.5 (Variance). µ_2 is called the variance of the distribution of X, or simply the variance of X,
and it is denoted by σ^2, σ_X^2, var(X), or V(X). The positive square root of the variance, σ, is called the
standard deviation of X.

Theorem 3.7.

σ^2 = µ′_2 − µ^2

Theorem 3.8. If X has the variance σ^2, then

var(aX + b) = a^2 σ^2.
Theorem 3.9 (Chebyshev's Theorem). If µ and σ are the mean and the standard deviation of a random
variable X, then for any positive constant k the probability is at least 1 − 1/k^2 that X will take on a value
within k standard deviations of the mean; symbolically,

P(|X − µ| < kσ) ≥ 1 − 1/k^2,   σ ≠ 0.
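For a quick numerical illustration of Chebyshev's theorem, the sketch below (the exponential population and k = 2 are arbitrary choices) estimates P(|X − µ| < kσ) by simulation and compares it with the bound 1 − 1/k².

```python
# Empirical check of Chebyshev's theorem for an arbitrary (exponential) population.
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=100_000)   # mean 1, standard deviation 1

mu, sigma, k = x.mean(), x.std(), 2.0
within = np.mean(np.abs(x - mu) < k * sigma)   # estimate of P(|X - mu| < k*sigma)

print(within, 1 - 1 / k**2)                    # roughly 0.95 versus the bound 0.75
```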
Definition 3.6. The moment generating function of a random variable X, where it exists, is given by

M_X(t) = E(e^{tX}) = Σ_x e^{tx} · f(x)

when X is discrete, and

M_X(t) = E(e^{tX}) = ∫_{−∞}^{∞} e^{tx} · f(x) dx

when X is continuous.

Theorem 3.10.

d^r M_X(t)/dt^r |_{t=0} = µ′_r

Theorem 3.11. If a and b are constants, then

(1) M_{X+a}(t) = E[e^{(X+a)t}] = e^{at} · M_X(t);
(2) M_{bX}(t) = E[e^{bXt}] = M_X(bt);
(3) M_{(X+a)/b}(t) = E[e^{((X+a)/b)t}] = e^{(a/b)t} · M_X(t/b).

Definition 3.7 (Product Moments About the Origin). The rth and sth product moment about the origin of
the random variables X and Y, denoted by µ′_{r,s}, is the expected value of X^r Y^s; symbolically,

µ′_{r,s} = E(X^r Y^s) = Σ_x Σ_y x^r y^s · f(x, y)

for r = 0, 1, 2, . . . and s = 0, 1, 2, . . . when X and Y are discrete, and

µ′_{r,s} = E(X^r Y^s) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x^r y^s · f(x, y) dx dy

when X and Y are continuous.

Definition 3.8 (Product Moments About the Mean). The rth and sth product moment about the means of
the random variables X and Y, denoted by µ_{r,s}, is the expected value of (X − µ_X)^r (Y − µ_Y)^s; symbolically,

µ_{r,s} = E[(X − µ_X)^r (Y − µ_Y)^s] = Σ_x Σ_y (x − µ_X)^r (y − µ_Y)^s · f(x, y)

for r = 0, 1, 2, . . . and s = 0, 1, 2, . . . when X and Y are discrete, and

µ_{r,s} = E[(X − µ_X)^r (Y − µ_Y)^s] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − µ_X)^r (y − µ_Y)^s · f(x, y) dx dy

when X and Y are continuous.

Definition 3.9 (Covariance). µ_{1,1} is called the covariance of X and Y, and it is denoted by σ_{XY}, cov(X, Y),
or C(X, Y).

Theorem 3.12.

σ_{XY} = µ′_{1,1} − µ_X µ_Y

Theorem 3.13. If X and Y are independent, then

E(XY ) = E(X) · E(Y ) and σXY = 0.



Remark: It is of interest to note that the independence of two random variables implies a zero covariance,
but a zero covariance does not necessarily imply their independence.

Theorem 3.14. If X1 , X2 , . . . , Xn are independent, then


E(X1 X2 · · · Xn ) = E(X1 ) · E(X2 ) · · · E(Xn ).

Theorem 3.15. If X_1, X_2, . . . , X_n are random variables and

Y = Σ_{i=1}^{n} a_i X_i,

where a_1, a_2, . . . , a_n are constants, then

E(Y) = Σ_{i=1}^{n} a_i E(X_i)

and

var(Y) = Σ_{i=1}^{n} a_i^2 · var(X_i) + 2 Σ_{i<j} a_i a_j · cov(X_i, X_j),

where the double summation extends over all values of i and j, from 1 to n, for which i < j.
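As a concrete check of Theorem 3.15, the sketch below (the coefficients and the covariance matrix are made-up values) compares the formula for var(Y) with the direct matrix computation var(aᵀX) = aᵀΣa.

```python
# Numerical check of Theorem 3.15 for an illustrative three-variable case.
import numpy as np

a = np.array([2.0, -1.0, 0.5])            # constants a_1, a_2, a_3
cov = np.array([[1.0, 0.3, 0.1],          # an arbitrary covariance matrix of X_1, X_2, X_3
                [0.3, 2.0, -0.4],
                [0.1, -0.4, 0.5]])

# var(Y) = sum_i a_i^2 var(X_i) + 2 * sum_{i<j} a_i a_j cov(X_i, X_j)
var_formula = np.sum(a**2 * np.diag(cov)) + 2 * sum(
    a[i] * a[j] * cov[i, j] for i in range(3) for j in range(i + 1, 3))

# Direct computation: var(a'X) = a' Sigma a
var_direct = a @ cov @ a

print(var_formula, var_direct)            # the two values agree
```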

Corollary 3.16. If the random variables X_1, X_2, . . . , X_n are independent and

Y = Σ_{i=1}^{n} a_i X_i,

then

var(Y) = Σ_{i=1}^{n} a_i^2 · var(X_i).

Theorem 3.17. If X_1, X_2, . . . , X_n are random variables and

Y_1 = Σ_{i=1}^{n} a_i X_i   and   Y_2 = Σ_{i=1}^{n} b_i X_i,

where a_1, a_2, . . . , a_n and b_1, b_2, . . . , b_n are constants, then

cov(Y_1, Y_2) = Σ_{i=1}^{n} a_i b_i · var(X_i) + Σ_{i<j} (a_i b_j + a_j b_i) · cov(X_i, X_j).

Corollary 3.18. If the random variables X_1, X_2, . . . , X_n are independent and

Y_1 = Σ_{i=1}^{n} a_i X_i   and   Y_2 = Σ_{i=1}^{n} b_i X_i,

then

cov(Y_1, Y_2) = Σ_{i=1}^{n} a_i b_i · var(X_i).

Definition 3.10 (Conditional Expectation). If X is a discrete random variable and f(x | y) is the conditional
probability distribution of X given Y = y at x, then the conditional expectation of u(X) given Y = y is defined as

E[u(X) | y] = Σ_x u(x) · f(x | y).

Correspondingly, if X is a continuous random variable and f(x | y) is the conditional probability density of X
given Y = y at x, then the conditional expectation of u(X) given Y = y is

E[u(X) | y] = ∫_{−∞}^{∞} u(x) · f(x | y) dx.

4. Special Probability Distributions

Definition 1. Discrete Uniform Distribution.
A random variable X has a discrete uniform distribution and is called a discrete uniform random variable if
and only if its probability distribution is given by

f(x) = 1/k   for x = x_1, x_2, . . . , x_k,

where x_i ≠ x_j when i ≠ j.
Definition 2. Bernoulli Distribution.
A random variable X has a Bernoulli distribution and is called a Bernoulli random variable if and only if its
probability distribution is given by

f(x; θ) = θ^x (1 − θ)^{1−x}   for x = 0, 1,

where 0 ≤ θ ≤ 1.
Definition 3. Binomial Distribution. A random variable X has a binomial distribution and is called a
binomial random variable if and only if its probability distribution is given by

b(x; n, θ) = C(n, x) θ^x (1 − θ)^{n−x}   for x = 0, 1, 2, . . . , n,

where n is a non-negative integer and 0 ≤ θ ≤ 1.
Theorem 1.

b(x; n, θ) = b(n − x; n, 1 − θ)

Theorem 2. The mean and the variance of the binomial distribution are

µ = nθ   and   σ^2 = nθ(1 − θ).

Theorem 3. If X has a binomial distribution with parameters n and θ, and Y = X/n, then

E(Y) = θ   and   σ_Y^2 = θ(1 − θ)/n.

Now, if we apply Chebyshev's theorem with kσ = c, we can assert that for any positive constant c, the
probability is at least

1 − θ(1 − θ)/(nc^2)

that the proportion of successes in n trials falls between θ − c and θ + c.
Hence, as n → ∞, the probability approaches 1 that the proportion of successes will differ from θ by less
than any arbitrary constant c.
This result is called a law of large numbers, and it should be observed that it applies to the proportion of
successes, not to their actual number. It is a fallacy to suppose that when n is large, the number of successes
must necessarily be close to nθ.
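A small simulation sketch of this law of large numbers (θ and the sample sizes are illustrative choices): the proportion X/n concentrates around θ as n grows, while the raw count X typically drifts farther from nθ in absolute terms.

```python
# Simulation of the law of large numbers for the binomial proportion (illustrative values).
import numpy as np

rng = np.random.default_rng(1)
theta = 0.3

for n in (100, 10_000, 1_000_000):
    x = rng.binomial(n, theta)                        # number of successes in n trials
    print(n, abs(x / n - theta), abs(x - n * theta))
    # |X/n - theta| shrinks with n, while |X - n*theta| typically grows.
```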
Theorem 4. The moment-generating function of the binomial distribution is given by

M_X(t) = [1 + θ(e^t − 1)]^n.

Definition 4. Negative Binomial Distribution.
A random variable X has a negative binomial distribution, and is referred to as a negative binomial random
variable, if and only if its probability distribution is given by

b*(x; k, θ) = C(x − 1, k − 1) θ^k (1 − θ)^{x−k}   for x = k, k + 1, k + 2, . . .

Thus, the number of the trial on which the kth success occurs is a random variable having a negative binomial
distribution with the parameters k and θ. The name "negative binomial distribution" derives from the fact
that the values of b*(x; k, θ) for x = k, k + 1, k + 2, . . . are the successive terms of the binomial expansion of

[1/θ − (1 − θ)/θ]^{−k}.

In the literature of statistics, negative binomial distributions are also referred to as binomial waiting-time
distributions or as Pascal distributions.
Theorem 5.

b*(x; k, θ) = (k/x) · b(k; x, θ)

Theorem 6. The mean and the variance of the negative binomial distribution are

µ = k/θ   and   σ^2 = (k/θ)(1/θ − 1).
Definition 5. Geometric Distribution.
A random variable X has a geometric distribution and is referred to as a geometric random variable if and
only if its probability distribution is given by

g(x; θ) = θ(1 − θ)^{x−1}   for x = 1, 2, 3, . . .

Definition 6. Hypergeometric Distribution.
A random variable X has a hypergeometric distribution, and is referred to as a hypergeometric random
variable, if and only if its probability distribution is given by

h(x; n, N, M) = C(M, x) C(N − M, n − x) / C(N, n)   for x = 0, 1, 2, . . . , n,

subject to the constraints x ≤ M and n − x ≤ N − M.
Theorem 7. The mean and the variance of the hypergeometric distribution are

µ = nM/N   and   σ^2 = nM(N − M)(N − n) / [N^2 (N − 1)].
Definition 7. Poisson Distribution.
A random variable X has a Poisson distribution, and is referred to as a Poisson random variable, if and only
if its probability distribution is given by

p(x; λ) = λ^x e^{−λ} / x!   for x = 0, 1, 2, . . .
Theorem 8. The mean and the variance of the Poisson distribution are given by

µ = λ   and   σ^2 = λ.

Theorem 9. The moment-generating function of the Poisson distribution is given by

M_X(t) = e^{λ(e^t − 1)}.

Definition 1. Uniform Distribution.
A random variable X has a uniform distribution and is referred to as a continuous uniform random variable
if and only if its probability density function is given by

u(x; α, β) = 1/(β − α)   for α < x < β,   and 0 elsewhere.
Theorem 1. The mean and the variance of the uniform distribution are given by

µ = (α + β)/2   and   σ^2 = (β − α)^2 / 12.
Definition 2. Gamma Distribution.
A random variable X has a gamma distribution and is referred to as a gamma random variable if and only
if its probability density function is given by

g(x; α, β) = x^{α−1} e^{−x/β} / [β^α Γ(α)]   for x > 0,   and 0 elsewhere,

where α > 0 and β > 0.
Definition 3. Exponential Distribution.
A random variable X has an exponential distribution and is referred to as an exponential random variable if
and only if its probability density function is given by

g(x; θ) = (1/θ) e^{−x/θ}   for x > 0,   and 0 elsewhere,

where θ > 0.
Definition 4. Chi-Square Distribution.
A random variable X has a chi-square distribution and is referred to as a chi-square random variable if and
only if its probability density function is given by

f(x; ν) = x^{ν/2 − 1} e^{−x/2} / [2^{ν/2} Γ(ν/2)]   for x > 0,   and 0 elsewhere,

where ν > 0 is the degrees of freedom.
Theorem 2. The rth moment about the origin of the gamma distribution is given by

µ′_r = β^r Γ(α + r) / Γ(α).
Theorem 3. The mean and the variance of the gamma distribution are given by

µ = αβ   and   σ^2 = αβ^2.

Corollary 1. The mean and the variance of the exponential distribution are given by

µ = θ   and   σ^2 = θ^2.

Corollary 2. The mean and the variance of the chi-square distribution are given by

µ = ν   and   σ^2 = 2ν.

Definition 5. Beta Distribution.
A random variable X has a beta distribution and is referred to as a beta random variable if and only if its
probability density function is given by

f(x; α, β) = [Γ(α + β) / (Γ(α) Γ(β))] x^{α−1} (1 − x)^{β−1}   for 0 < x < 1,   and 0 elsewhere,

where α > 0 and β > 0.
Theorem 5. The mean and the variance of the beta distribution are given by

µ = α/(α + β)   and   σ^2 = αβ / [(α + β)^2 (α + β + 1)].
Definition 6. Normal Distribution.
A random variable X has a normal distribution and is referred to as a normal random variable if and only
if its probability density function is given by

n(x; µ, σ) = (1/(σ√(2π))) e^{−(1/2)((x − µ)/σ)^2}   for −∞ < x < ∞,

where σ > 0.
Theorem 6. The moment-generating function of the normal distribution is given by

M_X(t) = e^{µt + σ^2 t^2 / 2}.
Definition 7. Standard Normal Distribution.
The normal distribution with µ = 0 and σ = 1 is referred to as the standard normal distribution.
If a random variable X has a normal distribution with mean µ and standard deviation σ, then

Z = (X − µ)/σ

has the standard normal distribution.
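For instance (a short sketch using scipy; the parameters and the cutoff are arbitrary), if X is normal with µ = 100 and σ = 15, the probability that X exceeds 120 is found by standardizing:

```python
# Standardizing a normal random variable (illustrative parameters).
from scipy.stats import norm

mu, sigma, x = 100.0, 15.0, 120.0
z = (x - mu) / sigma                 # Z = (X - mu) / sigma
p = 1 - norm.cdf(z)                  # P(X > 120) = P(Z > 1.333...)

print(z, p)                          # approximately 1.333 and 0.0912
```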
Theorem 8. If X is a random variable having a binomial distribution with parameters n and θ, then the
moment-generating function of

Z = (X − nθ) / √(nθ(1 − θ))

approaches that of the standard normal distribution as n → ∞.
Definition 8. Bivariate Normal Distribution.
A pair of random variables X and Y has a bivariate normal distribution and are referred to as jointly normally
distributed random variables if and only if their joint probability density function is given by

f(x, y) = (1 / (2π σ_1 σ_2 √(1 − ρ^2)))
          · exp{ −[1/(2(1 − ρ^2))] [ ((x − µ_1)/σ_1)^2 − 2ρ((x − µ_1)/σ_1)((y − µ_2)/σ_2) + ((y − µ_2)/σ_2)^2 ] }

for −∞ < x < ∞ and −∞ < y < ∞, where σ_1 > 0, σ_2 > 0, and −1 < ρ < 1.
Theorem 9. If X and Y have a bivariate normal distribution, the conditional density of Y given X = x is a
normal distribution with mean

µ_{Y|x} = µ_2 + ρ(σ_2/σ_1)(x − µ_1)

and variance

σ_{Y|x}^2 = σ_2^2 (1 − ρ^2).

Similarly, the conditional density of X given Y = y is a normal distribution with mean

µ_{X|y} = µ_1 + ρ(σ_1/σ_2)(y − µ_2)

and variance

σ_{X|y}^2 = σ_1^2 (1 − ρ^2).

Theorem 10. If two random variables have a bivariate normal distribution, they are independent if and only if ρ = 0.
Theorem 3. If X_1, X_2, . . . , X_n are independent random variables and

Y = X_1 + X_2 + · · · + X_n,

then the moment-generating function of Y is

M_Y(t) = Π_{i=1}^{n} M_{X_i}(t),

where M_{X_i}(t) is the moment-generating function of X_i evaluated at t.

5. Sampling Distributions

Definition 1. Population.
A set of numbers from which a sample is drawn is referred to as a population. The distribution of the numbers
constituting a population is called the population distribution.
Definition 2. Random Sample.
If X1 , X2 , . . . , Xn are independent and identically distributed random variables, we say that they constitute
a random sample from the infinite population given by their common distribution.
Definition 3. Sample Mean and Sample Variance.
If X_1, X_2, . . . , X_n constitute a random sample, then the sample mean is given by

X̄ = (1/n) Σ_{i=1}^{n} X_i

and the sample variance is given by

S^2 = (1/(n − 1)) Σ_{i=1}^{n} (X_i − X̄)^2.
Theorem 1. If X_1, X_2, . . . , X_n constitute a random sample from an infinite population with mean µ and
variance σ^2, then

E(X̄) = µ   and   var(X̄) = σ^2/n.
Theorem 2. For any positive constant c, the probability that X̄ will take on a value between µ − c and µ + c is at least

1 − σ^2/(nc^2).

As n → ∞, this probability approaches 1.
Theorem 3. Central Limit Theorem.
If X_1, X_2, . . . , X_n constitute a random sample from an infinite population with mean µ, variance σ^2, and
moment-generating function M_X(t), then the limiting distribution of

Z = (X̄ − µ) / (σ/√n)

as n → ∞ is the standard normal distribution.
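A brief simulation sketch of the central limit theorem (the exponential parent population and the sample size are arbitrary choices): the standardized sample means behave approximately like a standard normal variable.

```python
# Simulation of the central limit theorem with an exponential parent population.
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, reps = 1.0, 1.0, 50, 20_000        # exponential(1): mu = sigma = 1

samples = rng.exponential(scale=1.0, size=(reps, n))
z = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(n))  # standardized sample means

print(z.mean(), z.std())                         # close to 0 and 1
print(np.mean(z < 1.96))                         # close to Phi(1.96) ≈ 0.975
```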
Theorem 4. If X̄ is the mean of a random sample of size n from a normal population with mean µ and variance σ^2, then
its sampling distribution is a normal distribution with mean µ and variance σ^2/n.

Definition 4. Random Sample—Finite Population.
If X_1 is the first value drawn from a finite population of size N, X_2 is the second value drawn, . . ., X_n is the
nth value drawn, and the joint probability distribution of these n random variables is given by

f(x_1, x_2, . . . , x_n) = 1 / [N(N − 1) · · · (N − n + 1)]

for each ordered n-tuple of values of these random variables, then X_1, X_2, . . . , X_n are said to constitute a
random sample from the given finite population.
Definition 5. Mean and Variance—Finite Population. The mean and the variance of the finite
population {c_1, c_2, . . . , c_N} are

µ = Σ_{i=1}^{N} c_i · (1/N)

and

σ^2 = Σ_{i=1}^{N} (c_i − µ)^2 · (1/N).
Theorem 5. If X_r and X_s are the rth and sth random variables of a random sample of size n drawn from
the finite population {c_1, c_2, . . . , c_N}, then

cov(X_r, X_s) = −σ^2/(N − 1).
Theorem 6. If X̄ is the mean of a random sample of size n taken without replacement from a finite
population of size N with the mean µ and the variance σ^2, then

E(X̄) = µ   and   var(X̄) = (σ^2/n) · (N − n)/(N − 1).
Definition. A random variable X has a chi-square distribution with ν degrees of freedom if its probability
density function is given by

f(x) = x^{ν/2 − 1} e^{−x/2} / [2^{ν/2} Γ(ν/2)]   for x > 0,   and 0 elsewhere.

Theorem 7. If X has the standard normal distribution, then X^2 has the chi-square distribution with ν = 1
degree of freedom.

Theorem 8. If X_1, X_2, . . . , X_n are independent random variables having standard normal distributions, then

Y = Σ_{i=1}^{n} X_i^2

has the chi-square distribution with ν = n degrees of freedom.
Theorem 9. If X_1, X_2, . . . , X_n are independent random variables having chi-square distributions with
ν_1, ν_2, . . . , ν_n degrees of freedom, then

Y = Σ_{i=1}^{n} X_i

has the chi-square distribution with ν_1 + ν_2 + · · · + ν_n degrees of freedom.
Theorem 10. If X1 and X2 are independent random variables, X1 has a chi-square distribution with ν1
degrees of freedom, and X1 + X2 has a chi-square distribution with ν > ν1 degrees of freedom, then X2 has a
chi-square distribution with ν − ν1 degrees of freedom.
Theorem 11. If X̄ and S^2 are the mean and the variance of a random sample of size n from a normal
population with mean µ and standard deviation σ, then
(1) X̄ and S^2 are independent;
(2) the random variable

(n − 1)S^2 / σ^2

has a chi-square distribution with n − 1 degrees of freedom.
Definition. If Y has a chi-square distribution with ν degrees of freedom, Z has the standard normal
distribution, and Y and Z are independent, then the distribution of

T = Z / √(Y/ν)

is given by

f(t) = [Γ((ν + 1)/2) / (√(πν) Γ(ν/2))] (1 + t^2/ν)^{−(ν+1)/2}   for −∞ < t < ∞,

and it is called the t distribution with ν degrees of freedom.
Theorem 13. If X̄ and S^2 are the mean and the variance of a random sample of size n from a normal
population with mean µ and variance σ^2, then

T = (X̄ − µ) / (S/√n)

has the t distribution with n − 1 degrees of freedom.
Theorem 14. If U and V are independent random variables having chi-square distributions with ν_1 and ν_2
degrees of freedom, then

F = (U/ν_1) / (V/ν_2)

is a random variable having an F distribution, that is, a random variable whose probability density function
is given by

g(f) = [Γ((ν_1 + ν_2)/2) / (Γ(ν_1/2) Γ(ν_2/2))] (ν_1/ν_2)^{ν_1/2} f^{ν_1/2 − 1} (1 + (ν_1/ν_2) f)^{−(ν_1 + ν_2)/2}   for f > 0,

and 0 elsewhere.

Theorem 15. If S_1^2 and S_2^2 are the variances of independent random samples of sizes n_1 and n_2 from normal
populations with the variances σ_1^2 and σ_2^2, then

F = (S_1^2/σ_1^2) / (S_2^2/σ_2^2) = σ_2^2 S_1^2 / (σ_1^2 S_2^2)

is a random variable having an F distribution with n_1 − 1 and n_2 − 1 degrees of freedom.
The F distribution is also known as the variance-ratio distribution.

6. Point Estimation

Definition 6.1 (Point Estimation). Using the value of a sample statistic to estimate the value of a population
parameter is called point estimation. The value of the statistic obtained from the sample is referred to as a
point estimate.

Definition 6.2 (Unbiased Estimator). A statistic θ̂ is an unbiased estimator of the parameter θ of a given
distribution if and only if
E(θ̂) = θ
for all possible values of θ.

Definition 6.3 (Asymptotically Unbiased Estimator). Letting b_n(θ) = E(θ̂) − θ express the bias of an
estimator θ̂ based on a random sample of size n from a given distribution, we say that θ̂ is an asymptotically
unbiased estimator of θ if and only if

lim_{n→∞} b_n(θ) = 0.

Theorem 6.1. If S^2 is the variance of a random sample from an infinite population with finite variance σ^2, then

E(S^2) = σ^2.

Definition 6.4. Minimum Variance Unbiased Estimator. The estimator for the parameter θ of a given
distribution that has the smallest variance of all unbiased estimators for θ is called the minimum variance
unbiased estimator, or the best unbiased estimator for θ.

Theorem 6.2. If θ̂ is an unbiased estimator of θ and

var(θ̂) = 1 / { n · E[(∂ ln f(X)/∂θ)^2] },

then θ̂ is a minimum variance unbiased estimator of θ.

Definition 6.5 (Consistent Estimator). A statistic θ̂ is a consistent estimator of the parameter θ of a given
distribution if and only if for each c > 0,

lim_{n→∞} P(|θ̂ − θ| < c) = 1.

Theorem 6.3. If θ̂ is an unbiased estimator of the parameter θ and


var(θ̂) → 0 as n → ∞,
then θ̂ is a consistent estimator of θ.

Definition 6.6. The statistic θ̂ is a sufficient estimator of the parameter θ of a given distribution if and only
if for each value of θ̂, the conditional probability distribution or density of the random sample X1 , X2 , . . . , Xn ,
given θ̂ = t, is independent of θ.

Theorem 6.4. The statistic θ̂ is a sufficient estimator of the parameter θ if and only if the joint probability
distribution or density of the random sample can be factored as
f (x1 , x2 , . . . , xn ; θ) = g(θ̂, θ) · h(x1 , x2 , . . . , xn ),
where g(θ̂, θ) depends only on θ̂ and θ, and h(x1 , x2 , . . . , xn ) does not depend on θ.

Definition 6.7 (Sample Moments). The kth sample moment of a set of observations x_1, x_2, . . . , x_n is the
mean of their kth powers and it is denoted by m_k; symbolically,

m_k = (1/n) Σ_{i=1}^{n} x_i^k.

Definition 6.8 (Maximum Likelihood Estimator). If x1 , x2 , . . . , xn are the values of a random sample from
a population with the parameter θ, the likelihood function of the sample is given by
L(θ) = f (x1 , x2 , . . . , xn ; θ)
for values of θ within a given domain. Here, f (x1 , x2 , . . . , xn ; θ) is the value of the joint probability distribution
or the joint probability density of the random variables X1 , X2 , . . . , Xn at X1 = x1 , X2 = x2 , . . . , Xn = xn .
We refer to the value of θ that maximizes L(θ) as the maximum likelihood estimator of θ.

Example 6.1. If x_1, x_2, . . . , x_n are the values of a random sample from an exponential population, find the
maximum likelihood estimator of its parameter θ.
Solution: The likelihood function is given by

L(θ) = f(x_1, x_2, . . . , x_n; θ) = Π_{i=1}^{n} f(x_i; θ) = (1/θ)^n e^{−(1/θ) Σ_{i=1}^{n} x_i}.

Differentiation of ln L(θ) with respect to θ yields

d[ln L(θ)]/dθ = −n/θ + (1/θ^2) Σ_{i=1}^{n} x_i.

Equating this derivative to zero and solving for θ, we get

θ̂ = (1/n) Σ_{i=1}^{n} x_i = x̄.

Hence, the maximum likelihood estimator is θ̂ = X̄.
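A small numerical sketch of this example (the data are simulated with an arbitrary true θ): maximizing ln L(θ) numerically gives the same answer as the sample mean, in agreement with the derivation above.

```python
# Maximum likelihood estimation of the exponential parameter theta (illustrative data).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
x = rng.exponential(scale=2.5, size=1_000)     # simulated sample; true theta = 2.5

# Negative log-likelihood: -ln L(theta) = n*ln(theta) + sum(x)/theta
neg_log_lik = lambda theta: len(x) * np.log(theta) + x.sum() / theta

result = minimize_scalar(neg_log_lik, bounds=(1e-6, 100.0), method="bounded")

print(result.x, x.mean())   # the numerical maximizer is (essentially) the sample mean
```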

7. Interval Estimation

Definition 7.1 (Confidence Interval). If θ̂1 and θ̂2 are values of the random variables Θ̂1 and Θ̂2 such that
P (Θ̂1 < θ < Θ̂2 ) = 1 − α
for some specified probability 1 − α, we refer to the interval
θ̂1 < θ < θ̂2
as a (1 − α) × 100% confidence interval for θ. The probability 1 − α is called the degree of confidence, and the
endpoints of the interval are called the lower and upper confidence limits.

Theorem 7.1. If X̄, the mean of a random sample of size n from a normal population with known variance
σ^2, is to be used as an estimator of the mean of the population, then the probability is 1 − α that the error will
be less than

z_{α/2} · σ/√n.
Theorem 7.2. If x̄ is the value of the mean of a random sample of size n from a normal population with
known variance σ^2, then

x̄ − z_{α/2} · σ/√n < µ < x̄ + z_{α/2} · σ/√n

is a (1 − α)100% confidence interval for the mean of the population.

Theorem 7.3. If x̄ and s are the values of the mean and the standard deviation of a random sample of size
n from a normal population, then

x̄ − t_{α/2, n−1} · s/√n < µ < x̄ + t_{α/2, n−1} · s/√n

is a (1 − α)100% confidence interval for the mean of the population.
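A sketch of the t interval of Theorem 7.3 (the data and the confidence level are illustrative), computed directly and then checked against scipy's built-in interval.

```python
# 95% t confidence interval for a normal mean with unknown variance (Theorem 7.3).
import numpy as np
from scipy import stats

x = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3])  # illustrative sample
n, alpha = len(x), 0.05

xbar, s = x.mean(), x.std(ddof=1)                # sample mean and standard deviation
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)    # t_{alpha/2, n-1}

lower = xbar - t_crit * s / np.sqrt(n)
upper = xbar + t_crit * s / np.sqrt(n)

print(lower, upper)
print(stats.t.interval(0.95, df=n - 1, loc=xbar, scale=s / np.sqrt(n)))  # same interval
```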

Theorem 7.4. If x̄_1 and x̄_2 are the values of the means of independent random samples of sizes n_1 and n_2
from normal populations with known variances σ_1^2 and σ_2^2, then

(x̄_1 − x̄_2) − z_{α/2} · √(σ_1^2/n_1 + σ_2^2/n_2) < µ_1 − µ_2 < (x̄_1 − x̄_2) + z_{α/2} · √(σ_1^2/n_1 + σ_2^2/n_2)

is a (1 − α)100% confidence interval for the difference between the two population means.

Theorem 7.5. If x̄_1, x̄_2, s_1, and s_2 are the values of the means and the standard deviations of independent
random samples of sizes n_1 and n_2 from normal populations with equal variances, then

(x̄_1 − x̄_2) − t_{α/2, n_1+n_2−2} · s_p √(1/n_1 + 1/n_2) < µ_1 − µ_2 < (x̄_1 − x̄_2) + t_{α/2, n_1+n_2−2} · s_p √(1/n_1 + 1/n_2)

is a (1 − α)100% confidence interval for the difference between the two population means, where the pooled
standard deviation is

s_p = √( [(n_1 − 1)s_1^2 + (n_2 − 1)s_2^2] / (n_1 + n_2 − 2) ).
Theorem 7.6. If X is a binomial random variable with parameters n and θ, n is large, and θ̂ = X/n, then

θ̂ − z_{α/2} · √(θ̂(1 − θ̂)/n) < θ < θ̂ + z_{α/2} · √(θ̂(1 − θ̂)/n)

is an approximate (1 − α)100% confidence interval for θ.
Theorem 7.7. If θ̂ = x/n is used as an estimate of θ, then with (1 − α)100% confidence, the error in the
estimate is less than

z_{α/2} · √(θ̂(1 − θ̂)/n).
Theorem 7.8. If X_1 is a binomial random variable with parameters n_1 and θ_1, and X_2 is a binomial random
variable with parameters n_2 and θ_2, where n_1 and n_2 are large, and θ̂_1 = x_1/n_1, θ̂_2 = x_2/n_2, then

(θ̂_1 − θ̂_2) − z_{α/2} · √(θ̂_1(1 − θ̂_1)/n_1 + θ̂_2(1 − θ̂_2)/n_2) < θ_1 − θ_2 < (θ̂_1 − θ̂_2) + z_{α/2} · √(θ̂_1(1 − θ̂_1)/n_1 + θ̂_2(1 − θ̂_2)/n_2)

is an approximate (1 − α)100% confidence interval for θ_1 − θ_2.

Theorem 7.9. If s^2 is the value of the variance of a random sample of size n from a normal population, then

(n − 1)s^2 / χ^2_{α/2, n−1} < σ^2 < (n − 1)s^2 / χ^2_{1−α/2, n−1}

is a (1 − α)100% confidence interval for σ^2.
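A sketch of Theorem 7.9 with illustrative data: the chi-square critical values come from scipy, and the resulting interval covers σ² with approximately (1 − α)100% confidence when the population is normal.

```python
# 95% confidence interval for a normal variance via the chi-square distribution (Theorem 7.9).
import numpy as np
from scipy import stats

x = np.array([4.1, 5.3, 3.8, 4.9, 5.1, 4.4, 4.7, 5.0, 4.2, 4.6])  # illustrative sample
n, alpha = len(x), 0.05
s2 = x.var(ddof=1)                                              # sample variance s^2

lower = (n - 1) * s2 / stats.chi2.ppf(1 - alpha / 2, df=n - 1)  # divide by chi^2_{alpha/2, n-1}
upper = (n - 1) * s2 / stats.chi2.ppf(alpha / 2, df=n - 1)      # divide by chi^2_{1-alpha/2, n-1}

print(s2, (lower, upper))
```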

Theorem 7.10. If s_1^2 and s_2^2 are the values of the variances of independent random samples of sizes n_1 and
n_2 from normal populations, then

(s_1^2/s_2^2) · (1/F_{α/2, n_1−1, n_2−1}) < σ_1^2/σ_2^2 < (s_1^2/s_2^2) · F_{α/2, n_2−1, n_1−1}

is a (1 − α)100% confidence interval for σ_1^2/σ_2^2.

No. | Distribution        | PDF/PMF                                                   | Mean       | Variance                     | MGF
1   | Uniform             | 1/(b − a),  a < x < b                                     | (a + b)/2  | (b − a)^2/12                 | (e^{tb} − e^{ta}) / (t(b − a))
2   | Bernoulli           | θ^x (1 − θ)^{1−x},  x = 0, 1                              | θ          | θ(1 − θ)                     | 1 − θ + θe^t
3   | Binomial            | C(n, x) θ^x (1 − θ)^{n−x},  x = 0, 1, . . . , n           | nθ         | nθ(1 − θ)                    | (1 − θ + θe^t)^n
4   | Geometric           | θ(1 − θ)^{x−1},  x = 1, 2, . . .                          | 1/θ        | (1 − θ)/θ^2                  | θe^t / (1 − (1 − θ)e^t)
5   | Negative Binomial   | C(x − 1, r − 1) θ^r (1 − θ)^{x−r},  x = r, r + 1, . . .   | r/θ        | r(1 − θ)/θ^2                 | [θe^t / (1 − (1 − θ)e^t)]^r
6   | Poisson             | e^{−λ} λ^x / x!,  x = 0, 1, 2, . . .                      | λ          | λ                            | e^{λ(e^t − 1)}
7   | Exponential         | λ e^{−λx},  x > 0                                         | 1/λ        | 1/λ^2                        | λ/(λ − t)
8   | Gamma               | λ^α x^{α−1} e^{−λx} / Γ(α),  x > 0                        | α/λ        | α/λ^2                        | [λ/(λ − t)]^α
9   | Beta                | x^{α−1} (1 − x)^{β−1} / B(α, β),  0 < x < 1               | α/(α + β)  | αβ / [(α + β)^2 (α + β + 1)] | —
10  | Normal              | (1/(σ√(2π))) e^{−(x−µ)^2/(2σ^2)},  −∞ < x < ∞             | µ          | σ^2                          | e^{µt + σ^2 t^2/2}
11  | Chi-square          | x^{ν/2−1} e^{−x/2} / (2^{ν/2} Γ(ν/2)),  x > 0             | ν          | 2ν                           | (1 − 2t)^{−ν/2}

(In rows 7 and 8 the exponential and gamma densities are written in terms of the rate λ, i.e., λ = 1/θ and λ = 1/β in the notation of Section 4.)
