Statistics Summary
PREM DAGAR
1. Probability
Definition 1.1. Sample Space. The set of all possible outcomes of an experiment is called the sample
space, and it is usually denoted by the letter S. Each outcome in a sample space is called an element of the
sample space, or simply a sample point.
Definition 1.3. Mutually Exclusive Events. Two events having no elements in common are said to be
mutually exclusive.
Definition 1.4. Conditional Probability. If A and B are any two events in a sample space S and P (A) ̸= 0,
the conditional probability of B given A is
P (B | A) = P (A ∩ B) / P (A).
Theorem 1.1. If A and B are any two events in a sample space S and P (A) ̸= 0, then
P (A ∩ B) = P (A) · P (B | A).
Theorem 1.2. If A, B, and C are any three events in a sample space S such that P (A ∩ B) ̸= 0, then
P (A ∩ B ∩ C) = P (A) · P (B | A) · P (C | A ∩ B).
Theorem 1.3. If the events B1 , B2 , . . . , Bk constitute a partition of the sample space S and P (Bi ) ̸= 0 for
i = 1, 2, . . . , k, then for any event A in S,
P (A) = ∑_{i=1}^{k} P (Bi) · P (A | Bi).
Theorem 1.4. If B1 , B2 , . . . , Bk constitute a partition of the sample space S and P (Bi ) ̸= 0 for i = 1, 2, . . . , k,
then for any event A in S such that P (A) ̸= 0,
P (Br | A) = [P (Br) · P (A | Br)] / [∑_{i=1}^{k} P (Bi) · P (A | Bi)]   for r = 1, 2, . . . , k.
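As a quick numerical illustration of Theorems 1.3 and 1.4, the following Python sketch computes P (A) by the rule of total probability and the posterior probabilities P (Br | A); the prior and conditional probabilities are assumed values chosen only for the example.

```python
# Sketch: total probability and Bayes' rule for a three-part partition.
# The priors and conditional probabilities below are illustrative
# assumptions, not values from the text.
priors = {"B1": 0.5, "B2": 0.3, "B3": 0.2}          # P(B_i), a partition of S
likelihoods = {"B1": 0.02, "B2": 0.05, "B3": 0.10}  # P(A | B_i)

# Theorem 1.3: P(A) = sum_i P(B_i) * P(A | B_i)
p_a = sum(priors[b] * likelihoods[b] for b in priors)

# Theorem 1.4: P(B_r | A) = P(B_r) * P(A | B_r) / P(A)
posteriors = {b: priors[b] * likelihoods[b] / p_a for b in priors}

print(p_a)          # 0.045
print(posteriors)   # the posteriors sum to 1
```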
2. Distribution
Definition 2.1 (Random Variable). If S is a sample space with a probability measure and X is a real-valued
function defined over the elements of S, then X is called a random variable.
Definition 2.2 (Probability Distribution). If X is a discrete random variable, the function given by f (x) =
P (X = x) for each x within the range of X is called the probability distribution of X.
Theorem 2.1. A function can serve as the probability distribution of a discrete random variable X if and only if its values, f (x), satisfy the conditions:
(1) f (x) ≥ 0 for each value x within its domain;
(2) ∑_x f (x) = 1, where the summation extends over all values within its domain.
Definition 2.3 (Distribution Function). If X is a discrete random variable, the function given by
F (x) = P (X ≤ x) = ∑_{t≤x} f (t) for −∞ < x < ∞,
where f (t) is the value of the probability distribution of X at t, is called the distribution function or the
cumulative distribution of X.
Theorem 2.2. The values F (x) of the distribution function of a discrete random variable X satisfy the
following conditions:
(1) F (−∞) = 0 and F (∞) = 1;
(2) If a < b, then F (a) ≤ F (b) for any real numbers a and b.
Theorem 2.3. If the range of a random variable X consists of the values x1 < x2 < x3 < · · · < xn , then
f (x1 ) = F (x1 ) and f (xi ) = F (xi ) − F (xi−1 ) for i = 2, 3, . . . , n.
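The following Python sketch illustrates Definition 2.3 and Theorem 2.3 for a small discrete distribution; the pmf (a fair four-sided die) is an assumed example, not taken from the text.

```python
# Sketch: distribution function of a discrete random variable and
# recovery of the probability distribution via Theorem 2.3.
import itertools

values = [1, 2, 3, 4]
f = {x: 0.25 for x in values}                     # f(x) = P(X = x), assumed pmf

# F(x) = sum of f(t) over t <= x
F = dict(zip(values, itertools.accumulate(f[x] for x in values)))

# f(x1) = F(x1) and f(xi) = F(xi) - F(x_{i-1}) for i >= 2
recovered = {values[0]: F[values[0]]}
for prev, cur in zip(values, values[1:]):
    recovered[cur] = F[cur] - F[prev]

print(F)          # {1: 0.25, 2: 0.5, 3: 0.75, 4: 1.0}
print(recovered)  # matches f
```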
Definition 2.4 (Probability Density Function). A function with values f (x), defined over the set of all real
numbers, is called a probability density function of the continuous random variable X if and only if
P (a ≤ X ≤ b) = ∫_a^b f (x) dx
for any real constants a and b with a ≤ b.
Theorem 2.4. If X is a continuous random variable and a and b are real constants with a ≤ b, then
P (a ≤ X ≤ b) = P (a ≤ X < b) = P (a < X ≤ b) = P (a < X < b).
Theorem 2.5. A function can serve as a probability density of a continuous random variable X if its values,
f (x), satisfy the conditions:
(1) f (x) ≥ 0 for −∞ < x < ∞;
(2) ∫_{−∞}^{∞} f (x) dx = 1.
Definition 2.5 (Distribution Function). If X is a continuous random variable and the value of its probability
density at t is f (t), then the function given by
F (x) = P (X ≤ x) = ∫_{−∞}^{x} f (t) dt for −∞ < x < ∞
is called the distribution function or the cumulative distribution function of X.
Theorem 2.6. If f (x) and F (x) are the values of the probability density and the distribution function of X
at x, then
P (a ≤ X ≤ b) = F (b) − F (a)
for any real constants a and b with a ≤ b, and
f (x) = dF (x)/dx
where the derivative exists.
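A numerical check of Theorem 2.6 for an assumed exponential density f (x) = e^{−x}, x > 0 (chosen only for illustration), comparing the integral of f over [a, b] with F (b) − F (a):

```python
# Sketch: P(a <= X <= b) computed two ways, per Theorem 2.6.
import math
from scipy import integrate

f = lambda x: math.exp(-x)          # assumed density on x > 0
F = lambda x: 1.0 - math.exp(-x)    # its distribution function for x >= 0

a, b = 0.5, 2.0
area, _ = integrate.quad(f, a, b)   # numerical integration of the density

print(area)            # ~0.4712
print(F(b) - F(a))     # same value, F(b) - F(a)
```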
Definition 2.6 (Joint Probability Distribution). If X and Y are discrete random variables, the function given
by
f (x, y) = P (X = x, Y = y)
for each pair of values (x, y) within the range of X and Y is called the joint probability distribution of X and
Y.
Theorem 2.7. A bivariate function can serve as the joint probability distribution of a pair of discrete random
variables X and Y if and only if its values, f (x, y), satisfy the conditions:
(1) f (x, y) ≥ 0 for each pair of values (x, y) within its domain;
(2) ∑_x ∑_y f (x, y) = 1, where the double summation extends over all possible pairs (x, y) within its domain.
Definition 2.7 (Joint Probability Density Function). A bivariate function with values f (x, y) defined over
the xy-plane is called a joint probability density function of the continuous random variables X and Y if and
only if
P ((X, Y ) ∈ A) = ∫∫_A f (x, y) dx dy
for any region A in the xy-plane.
Theorem 2.8. A bivariate function can serve as a joint probability density function of a pair of continuous
random variables X and Y if its values, f (x, y), satisfy the conditions:
(1) f (x, y) ≥ 0 for −∞ < x < ∞, −∞ < y < ∞;
(2) ∫_{−∞}^{∞} ∫_{−∞}^{∞} f (x, y) dx dy = 1.
Definition 2.8 (Joint Distribution Function). If X and Y are continuous random variables, the function
given by
F (x, y) = P (X ≤ x, Y ≤ y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f (s, t) ds dt for −∞ < x < ∞, −∞ < y < ∞,
where f (s, t) is the joint probability density of X and Y at (s, t), is called the joint distribution function of
X and Y .
If X and Y are discrete random variables and f (x, y) is the value of their joint probability distribution at (x, y), the function given by
g(x) = ∑_y f (x, y)
for each x within the range of X is called the marginal distribution of X. Correspondingly, the function given by
h(y) = ∑_x f (x, y)
for each y within the range of Y is called the marginal distribution of Y .
Definition 2.9 (Marginal Density). If X and Y are continuous random variables and f (x, y) is the value of
their joint probability density at (x, y), the function given by
g(x) = ∫_{−∞}^{∞} f (x, y) dy for −∞ < x < ∞
is called the marginal density of X, and, correspondingly, the function given by
h(y) = ∫_{−∞}^{∞} f (x, y) dx for −∞ < y < ∞
is called the marginal density of Y.
Definition 2.10 (Conditional Distribution). If f (x, y) is the value of the joint probability distribution of the
discrete random variables X and Y at (x, y) and h(y) is the value of the marginal distribution of Y at y, the
function given by
f (x | y) = f (x, y) / h(y),   h(y) ≠ 0,
for each x within the range of X is called the conditional distribution of X given Y = y. Correspondingly, if
g(x) is the value of the marginal distribution of X at x, the function given by
w(y | x) = f (x, y) / g(x),   g(x) ≠ 0,
for each y within the range of Y is called the conditional distribution of Y given X = x.
Definition 2.11 (Conditional Density). If f (x, y) is the value of the joint density of the continuous random
variables X and Y at (x, y) and h(y) is the value of the marginal density of Y at y, the function given by
f (x | y) = f (x, y) / h(y),   h(y) ≠ 0,
for −∞ < x < ∞, is called the conditional density of X given Y = y. Correspondingly, if g(x) is the value of
the marginal density of X at x, the function given by
f (y | x) = f (x, y) / g(x),   g(x) ≠ 0,
for −∞ < y < ∞, is called the conditional density of Y given X = x.
Definition 2.12 (Independence of Discrete Random Variables). If f (x1 , x2 , . . . , xn ) is the value of the joint
probability distribution of the discrete random variables X1 , X2 , . . . , Xn at (x1 , x2 , . . . , xn ) and fi (xi ) is the
value of the marginal distribution of Xi at xi for i = 1, 2, . . . , n, then the n random variables are independent
if and only if
f (x1 , x2 , . . . , xn ) = f1 (x1 ) · f2 (x2 ) · . . . · fn (xn )
for all (x1 , x2 , . . . , xn ) within their range.
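The following sketch computes marginal distributions from an assumed joint probability table and checks the independence criterion f (x, y) = g(x) · h(y), the bivariate case of Definition 2.12; the table entries are illustrative.

```python
# Sketch: marginals and the independence criterion for a small joint table.
joint = {          # f(x, y) = P(X = x, Y = y), assumed values
    (0, 0): 0.10, (0, 1): 0.30,
    (1, 0): 0.15, (1, 1): 0.45,
}

xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

g = {x: sum(joint[(x, y)] for y in ys) for x in xs}   # marginal of X
h = {y: sum(joint[(x, y)] for x in xs) for y in ys}   # marginal of Y

# X and Y are independent iff f(x, y) = g(x) * h(y) for every pair
independent = all(abs(joint[(x, y)] - g[x] * h[y]) < 1e-12
                  for x in xs for y in ys)
print(g, h, independent)   # this particular table factorizes, so True
```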
3. Expectations
Definition 3.1 (Expected Value). If X is a discrete random variable and f (x) is the value of its probability
distribution at x, the expected value of X is
E(X) = ∑_x x · f (x).
Correspondingly, if X is a continuous random variable and f (x) is the value of its probability density at x,
the expected value of X is
E(X) = ∫_{−∞}^{∞} x · f (x) dx.
Theorem 3.1. If X is a discrete random variable and f (x) is the value of its probability distribution at x,
the expected value of g(X) is given by
E[g(X)] = ∑_x g(x) · f (x).
Correspondingly, if X is a continuous random variable and f (x) is the value of its probability density at x,
the expected value of g(X) is given by
E[g(X)] = ∫_{−∞}^{∞} g(x) · f (x) dx.
In particular, if a and b are constants, then E(aX + b) = aE(X) + b.
Theorem 3.5. If X and Y are discrete random variables and f (x, y) is the value of their joint probability
distribution at (x, y), the expected value of g(X, Y ) is
E[g(X, Y )] = ∑_x ∑_y g(x, y) · f (x, y).
Correspondingly, if X and Y are continuous random variables and f (x, y) is the value of their joint probability
density at (x, y), the expected value of g(X, Y ) is
E[g(X, Y )] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) · f (x, y) dx dy.
Definition 3.2 (Moments about the Origin). The rth moment about the origin of a random variable X,
denoted by µ′r , is the expected value of X r ; symbolically,
µ′r = E(X^r) = ∑_x x^r · f (x)
for r = 0, 1, 2, . . . when X is discrete, and
µ′r = E(X^r) = ∫_{−∞}^{∞} x^r · f (x) dx
when X is continuous.
Definition 3.3 (Mean of a Distribution). µ′1 is called the mean of the distribution of X, or simply the mean
of X, and it is denoted simply by µ.
Definition 3.4 (Moments about the Mean). The rth moment about the mean of a random variable X,
denoted by µr , is the expected value of (X − µ)r , symbolically
µr = E[(X − µ)^r] = ∑_x (x − µ)^r · f (x)
for r = 0, 1, 2, . . . when X is discrete, and
µr = E[(X − µ)^r] = ∫_{−∞}^{∞} (x − µ)^r · f (x) dx
when X is continuous.
Definition 3.5 (Variance). µ2 is called the variance of the distribution of X, or simply the variance of X,
and it is denoted by σ², σX², var(X), or V (X). The positive square root of the variance, σ, is called the standard deviation of X.
Theorem 3.7.
σ² = µ′2 − µ²
var(aX + b) = a²σ²
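A quick numerical verification of the computing formula σ² = µ′2 − µ² for an assumed discrete distribution:

```python
# Sketch: sigma^2 = mu'_2 - mu^2 checked against the direct definition.
f = {0: 0.2, 1: 0.5, 2: 0.3}        # f(x) = P(X = x), assumed pmf

mu = sum(x * p for x, p in f.items())            # mu = E(X)
mu2_prime = sum(x**2 * p for x, p in f.items())  # mu'_2 = E(X^2)
var_direct = sum((x - mu) ** 2 * p for x, p in f.items())

print(mu2_prime - mu**2)   # 0.49
print(var_direct)          # same value
```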
Theorem 3.9 (Chebyshev’s Theorem). If µ and σ are the mean and the standard deviation of a random variable X, then for any positive constant k the probability is at least 1 − 1/k² that X will take on a value within k standard deviations of the mean; symbolically,
P (|X − µ| < kσ) ≥ 1 − 1/k²,   σ ≠ 0.
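A simulation sketch of Chebyshev’s bound; the exponential population, sample size, and values of k are illustrative assumptions.

```python
# Sketch: observed P(|X - mu| < k*sigma) compared with the bound 1 - 1/k^2.
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=100_000)   # assumed population: mu = 1, sigma = 1

for k in (1.5, 2.0, 3.0):
    observed = np.mean(np.abs(x - 1.0) < k * 1.0)
    print(k, observed, ">=", 1 - 1 / k**2)     # bound always satisfied
```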
Definition 3.6. The moment generating function of a random variable X, where it exists, is given by
MX (t) = E(e^{tX}) = ∑_x e^{tx} · f (x)
when X is discrete, and
MX (t) = E(e^{tX}) = ∫_{−∞}^{∞} e^{tx} · f (x) dx
when X is continuous.
Theorem 3.10. The rth derivative of MX (t) with respect to t, evaluated at t = 0, gives the rth moment about the origin; symbolically,
d^r MX (t)/dt^r |_{t=0} = µ′r for r = 1, 2, . . .
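A symbolic sketch of Theorem 3.10 using sympy, differentiating the Poisson moment-generating function e^{λ(e^t − 1)} (stated later in these notes) at t = 0; the choice of this MGF and of the symbol names is purely illustrative.

```python
# Sketch: moments about the origin from derivatives of an MGF at t = 0.
import sympy as sp

t, lam = sp.symbols("t lam", positive=True)
M = sp.exp(lam * (sp.exp(t) - 1))            # assumed MGF (Poisson)

mu1 = sp.diff(M, t, 1).subs(t, 0)            # first moment about the origin
mu2 = sp.diff(M, t, 2).subs(t, 0)            # second moment about the origin

print(sp.simplify(mu1))                      # lam
print(sp.simplify(mu2 - mu1**2))             # variance = lam, via Theorem 3.7
```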
Definition 3.7 (Product Moments About the Origin). The rth and sth product moment about the origin of
the random variables X and Y , denoted by µ′r,s , is the expected value of X^r Y^s ; symbolically,
µ′r,s = E(X^r Y^s) = ∑_x ∑_y x^r y^s · f (x, y)
for r = 0, 1, 2, . . . and s = 0, 1, 2, . . . when X and Y are discrete, and
µ′r,s = E(X^r Y^s) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x^r y^s · f (x, y) dx dy
when X and Y are continuous.
Definition 3.8 (Product Moments About the Mean). The rth and sth product moment about the means of
the random variables X and Y , denoted by µr,s , is the expected value of (X − µX )r (Y − µY )s ; symbolically,
µr,s = E[(X − µX)^r (Y − µY)^s] = ∑_x ∑_y (x − µX)^r (y − µY)^s · f (x, y)
for r = 0, 1, 2, . . . and s = 0, 1, 2, . . . when X and Y are discrete, and
µr,s = E[(X − µX)^r (Y − µY)^s] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − µX)^r (y − µY)^s · f (x, y) dx dy
when X and Y are continuous.
Definition 3.9 (Covariance). µ1,1 is called the covariance of X and Y , and it is denoted by σXY , cov(X, Y ),
or C(X, Y ).
Theorem 3.12.
σXY = µ′1,1 − µX µY
Remark: It is of interest to note that the independence of two random variables implies a zero covariance,
but a zero covariance does not necessarily imply their independence.
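A standard construction illustrating the remark: with X uniform on {−1, 0, 1} and Y = X², the covariance is zero although Y is completely determined by X (the distribution is an assumed example).

```python
# Sketch: zero covariance without independence.
xs = [-1, 0, 1]
f = {x: 1 / 3 for x in xs}                 # assumed distribution of X

mu_x = sum(x * f[x] for x in xs)           # 0
mu_y = sum(x**2 * f[x] for x in xs)        # E(Y) with Y = X^2
cov = sum((x - mu_x) * (x**2 - mu_y) * f[x] for x in xs)

print(cov)                                 # 0.0, yet Y is a function of X
```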
If X is a discrete random variable and f (x|y) is the value of the conditional probability distribution of X given Y = y at x, the conditional expectation of u(X) given Y = y is
E[u(X) | y] = ∑_x u(x) · f (x|y).
Correspondingly, if X is a continuous random variable and f (x|y) is the conditional probability density of X given Y = y at x, then the conditional expectation of u(X) given Y = y is
E[u(X) | y] = ∫_{−∞}^{∞} u(x) · f (x|y) dx.
4. Special Distributions
The binomial distribution b(x; n, θ) satisfies the identity b(x; n, θ) = b(n − x; n, 1 − θ).
Theorem 2. The mean and the variance of the binomial distribution are
µ = nθ and σ² = nθ(1 − θ)
Theorem 3. If X has a binomial distribution with parameters n and θ, and Y = X/n, then
E(Y ) = θ and σY² = θ(1 − θ)/n
Now, if we apply Chebyshev’s theorem with kσ = c, we can assert that for any positive constant c, the probability is at least
1 − θ(1 − θ)/(nc²)
that the proportion of successes in n trials falls between θ − c and θ + c.
Hence, as n → ∞, the probability approaches 1 that the proportion of successes will differ from θ by less than any arbitrary constant c.
This result is called a law of large numbers, and it should be observed that it applies to the proportion of
successes, not to their actual number. It is a fallacy to suppose that when n is large, the number of successes
must necessarily be close to nθ.
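A simulation sketch of this law of large numbers for the proportion of successes; θ, c, the sample sizes, and the number of replications are illustrative assumptions.

```python
# Sketch: the proportion of successes concentrates around theta as n grows.
import numpy as np

rng = np.random.default_rng(1)
theta, c = 0.3, 0.02

for n in (100, 1_000, 10_000, 100_000):
    props = rng.binomial(n, theta, size=2_000) / n       # 2,000 replications
    coverage = np.mean(np.abs(props - theta) < c)        # P(|X/n - theta| < c)
    bound = 1 - theta * (1 - theta) / (n * c**2)         # Chebyshev lower bound
    print(n, coverage, max(bound, 0.0))
```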
Theorem 4. The moment-generating function of the binomial distribution is given by
MX (t) = [1 + θ(e^t − 1)]^n
Thus, the number of the trial on which the kth success occurs is a random variable having a negative binomial
distribution with the parameters k and θ. The name “negative binomial distribution” derives from the fact
that the values of b∗ (x; k, θ) for x = k, k + 1, k + 2, . . . are the successive terms of the binomial expansion of
[(1/θ) − (1 − θ)/θ]^{−k}.
In the literature of statistics, negative binomial distributions are also referred to as binomial waiting-time
distributions or as Pascal distributions.
Theorem 5.
b∗(x; k, θ) = (k/x) · b(k; x, θ)
Theorem 6.
The mean and the variance of the negative binomial distribution are
µ = k/θ and σ² = (k/θ)(1/θ − 1)
Definition 5. Geometric Distribution.
A random variable X has a geometric distribution and is referred to as a geometric random variable if and
only if its probability distribution is given by
g(x; θ) = θ(1 − θ)^{x−1} for x = 1, 2, 3, . . .
The mean and the variance of the Poisson distribution are µ = λ and σ² = λ.
Theorem 9.
The moment-generating function of the Poisson distribution is given by
MX (t) = e^{λ(e^t − 1)}
Definition 1. Uniform Distribution.
A random variable X has a uniform distribution and is referred to as a continuous uniform random variable if and only if its probability density function is given by
u(x; α, β) = 1/(β − α) for α < x < β, and 0 elsewhere.
Theorem 1.
The mean and the variance of the uniform distribution are given by
µ = (α + β)/2 and σ² = (1/12)(β − α)²
Definition 2. Gamma Distribution.
A random variable X has a gamma distribution and is referred to as a gamma random variable if and only
if its probability density function is given by
g(x; α, β) = (1/(β^α Γ(α))) x^{α−1} e^{−x/β} for x > 0, and 0 elsewhere,
where α > 0 and β > 0.
Definition 3. Exponential Distribution.
A random variable X has an exponential distribution and is referred to as an exponential random variable if
and only if its probability density function is given by
g(x; θ) = (1/θ) e^{−x/θ} for x > 0, and 0 elsewhere,
where θ > 0.
Definition 4. Chi-Square Distribution.
A random variable X has a chi-square distribution and is referred to as a chi-square random variable if and
only if its probability density function is given by
f (x; ν) = (1/(2^{ν/2} Γ(ν/2))) x^{ν/2 − 1} e^{−x/2} for x > 0, and 0 elsewhere,
where ν > 0 is the degrees of freedom.
Theorem 2.
The rth moment about the origin of the gamma distribution is given by
µ′r = β^r · Γ(α + r)/Γ(α).
Theorem 3.
The mean and the variance of the gamma distribution are given by
µ = αβ and σ² = αβ².
Corollary 1.
The mean and the variance of the exponential distribution are given by
µ = θ and σ² = θ².
Corollary 2.
The mean and the variance of the chi-square distribution are given by
µ = ν and σ² = 2ν.
Definition 5. Beta Distribution.
A random variable X has a beta distribution and is referred to as a beta random variable if and only if its
probability density function is given by
f (x; α, β) = (Γ(α + β)/(Γ(α) Γ(β))) x^{α−1} (1 − x)^{β−1} for 0 < x < 1, and 0 elsewhere,
where α > 0 and β > 0.
Theorem 5.
The mean and the variance of the beta distribution are given by
µ = α/(α + β) and σ² = αβ/[(α + β)²(α + β + 1)].
Definition 6. Normal Distribution.
A random variable X has a normal distribution and is referred to as a normal random variable if and only
if its probability density function is given by
n(x; µ, σ) = (1/(σ√(2π))) e^{−(1/2)((x−µ)/σ)²} for −∞ < x < ∞,
where σ > 0.
Theorem 6.
The moment-generating function of the normal distribution is given by
MX (t) = e^{µt + σ²t²/2}.
Definition 7. Standard Normal Distribution.
The normal distribution with µ = 0 and σ = 1 is referred to as the standard normal distribution.
Theorem 7.
If a random variable X has a normal distribution with mean µ and standard deviation σ, then
Z = (X − µ)/σ
has the standard normal distribution.
Theorem 8.
If X is a random variable having a binomial distribution with parameters n and θ, then the moment-generating
function of
Z = (X − nθ)/√(nθ(1 − θ))
approaches that of the standard normal distribution as n → ∞.
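A numerical sketch of the approximation suggested by Theorem 8, comparing an exact binomial probability with the corresponding standard normal probability (with a continuity correction); n, θ, and the interval are assumed values.

```python
# Sketch: normal approximation to a binomial probability.
from scipy import stats

n, theta = 400, 0.25
mean = n * theta
sd = (n * theta * (1 - theta)) ** 0.5

# P(90 <= X <= 110) exactly, and via the standardized variable Z
exact = stats.binom.cdf(110, n, theta) - stats.binom.cdf(89, n, theta)
approx = stats.norm.cdf((110.5 - mean) / sd) - stats.norm.cdf((89.5 - mean) / sd)

print(exact, approx)   # close for large n
```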
Definition 8. Bivariate Normal Distribution.
A pair of random variables X and Y has a bivariate normal distribution and are referred to as jointly normally
distributed random variables if and only if their joint probability density function is given by
f (x, y) = (1/(2πσ1σ2√(1 − ρ²))) · exp{ −[1/(2(1 − ρ²))] [((x − µ1)/σ1)² − 2ρ((x − µ1)/σ1)((y − µ2)/σ2) + ((y − µ2)/σ2)²] }
for −∞ < x < ∞ and −∞ < y < ∞, where σ1 > 0, σ2 > 0, and −1 < ρ < 1.
Theorem 9.
If X and Y have a bivariate normal distribution, the conditional density of Y given X = x is a normal
distribution with mean
µY|x = µ2 + ρ(σ2/σ1)(x − µ1)
and variance
σ²Y|x = σ2²(1 − ρ²).
Similarly, the conditional density of X given Y = y is a normal distribution with mean
µX|y = µ1 + ρ(σ1/σ2)(y − µ2)
and variance
σ²X|y = σ1²(1 − ρ²).
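A simulation sketch of Theorem 9: sampling from an assumed bivariate normal distribution and comparing the conditional mean and variance of Y near X = x0 with the theoretical expressions (all parameter values are illustrative).

```python
# Sketch: conditional mean and variance of Y given X close to x0.
import numpy as np

rng = np.random.default_rng(2)
mu1, mu2, s1, s2, rho = 1.0, -2.0, 2.0, 0.5, 0.6          # assumed parameters
cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
xy = rng.multivariate_normal([mu1, mu2], cov, size=200_000)

x0 = 2.0
sel = xy[np.abs(xy[:, 0] - x0) < 0.05, 1]                  # Y values with X near x0
print(sel.mean(), mu2 + rho * (s2 / s1) * (x0 - mu1))      # conditional mean
print(sel.var(), s2**2 * (1 - rho**2))                     # conditional variance
```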
Theorem 10.
If two random variables have a bivariate normal distribution, they are independent if and only if ρ = 0.
Theorem 3.
If X1 , X2 , . . . , Xn are independent random variables and
Y = X1 + X2 + · · · + Xn ,
then the moment-generating function of Y is
MY (t) = ∏_{i=1}^{n} MXi (t),
where MXi (t) is the moment-generating function of Xi at t.
5. Sampling Distributions
Definition 1. Population.
A set of numbers from which a sample is drawn is referred to as a population. The distribution of the numbers
constituting a population is called the population distribution.
Definition 2. Random Sample.
If X1 , X2 , . . . , Xn are independent and identically distributed random variables, we say that they constitute
a random sample from the infinite population given by their common distribution.
Definition 3. Sample Mean and Sample Variance.
If X1 , X2 , . . . , Xn constitute a random sample, then the sample mean is given by
X̄ = (1/n) ∑_{i=1}^{n} Xi
and the sample variance is given by
S² = (1/(n − 1)) ∑_{i=1}^{n} (Xi − X̄)².
Theorem 1.
If X1 , X2 , . . . , Xn constitute a random sample from an infinite population with mean µ and variance σ², then
E(X̄) = µ and var(X̄) = σ²/n.
Theorem 2.
For any positive constant c, the probability that X̄ will take on a value between µ − c and µ + c is at least
1 − σ²/(nc²).
As n → ∞, this probability approaches 1.
Theorem 3. Central Limit Theorem.
If X1 , X2 , . . . , Xn constitute a random sample from an infinite population with mean µ, variance σ², and
moment-generating function MX (t), then the limiting distribution of
Z = (X̄ − µ)/(σ/√n)
as n → ∞ is the standard normal distribution.
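A simulation sketch of the central limit theorem for an assumed exponential population; the population, the sample size n, and the number of replications are illustrative choices.

```python
# Sketch: standardized sample means compared with the standard normal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
mu, sigma, n = 1.0, 1.0, 50                       # exponential(1) population

samples = rng.exponential(scale=1.0, size=(20_000, n))
z = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(n))

# compare a few empirical probabilities with standard normal values
for a in (-1.0, 0.0, 1.0):
    print(a, np.mean(z <= a), stats.norm.cdf(a))
```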
Theorem 4.
If X̄ is the mean of a random sample of size n from a normal population with mean µ and variance σ², then its sampling distribution is a normal distribution with mean µ and variance σ²/n.
Theorem 7. If X has the standard normal distribution, then X² has the chi-square distribution with ν = 1
degree of freedom.
Theorem 8. If X1 , X2 , . . . , Xn are independent random variables having standard normal distributions,
then
Y = ∑_{i=1}^{n} Xi²
has the chi-square distribution with ν = n degrees of freedom.
Theorem 9. If X1 , X2 , . . . , Xn are independent random variables having chi-square distributions with
ν1 , ν2 , . . . , νn degrees of freedom, then
Y = ∑_{i=1}^{n} Xi
has the chi-square distribution with ν1 + ν2 + · · · + νn degrees of freedom.
Theorem 10. If X1 and X2 are independent random variables, X1 has a chi-square distribution with ν1
degrees of freedom, and X1 + X2 has a chi-square distribution with ν > ν1 degrees of freedom, then X2 has a
chi-square distribution with ν − ν1 degrees of freedom.
Theorem 11. If X̄ and S² are the mean and the variance of a random sample of size n from a normal population with mean µ and standard deviation σ, then
(1) X̄ and S² are independent;
(2) (n − 1)S²/σ² is a random variable having a chi-square distribution with n − 1 degrees of freedom.
Theorem 15. If S1² and S2² are the variances of independent random samples of sizes n1 and n2 from normal populations with the variances σ1² and σ2², then
F = (S1²/σ1²)/(S2²/σ2²) = σ2²S1²/(σ1²S2²)
is a random variable having an F distribution with n1 − 1 and n2 − 1 degrees of freedom.
The F distribution is also known as the variance-ratio distribution.
6. Point Estimation
Definition 6.1 (Point Estimation). Using the value of a sample statistic to estimate the value of a population
parameter is called point estimation. The value of the statistic obtained from the sample is referred to as a
point estimate.
Definition 6.2 (Unbiased Estimator). A statistic θ̂ is an unbiased estimator of the parameter θ of a given
distribution if and only if
E(θ̂) = θ
for all possible values of θ.
Definition 6.3 (Asymptotically Unbiased Estimator). Letting bn (θ) = E(θ̂) − θ express the bias of an
estimator θ̂ based on a random sample of size n from a given distribution, we say that θ̂ is an asymptotically
unbiased estimator of θ if and only if
lim_{n→∞} bn(θ) = 0.
Theorem 6.1. If S² is the variance of a random sample from an infinite population with finite variance σ², then
E(S²) = σ².
Definition 6.4. Minimum Variance Unbiased Estimator. The estimator for the parameter θ of a given
distribution that has the smallest variance of all unbiased estimators for θ is called the minimum variance
unbiased estimator, or the best unbiased estimator for θ.
Definition 6.5 (Consistent Estimator). A statistic θ̂ is a consistent estimator of the parameter θ of a given
distribution if and only if for each c > 0,
lim_{n→∞} P (|θ̂ − θ| < c) = 1.
Definition 6.6. The statistic θ̂ is a sufficient estimator of the parameter θ of a given distribution if and only
if for each value of θ̂, the conditional probability distribution or density of the random sample X1 , X2 , . . . , Xn ,
given θ̂ = t, is independent of θ.
Theorem 6.4. The statistic θ̂ is a sufficient estimator of the parameter θ if and only if the joint probability
distribution or density of the random sample can be factored as
f (x1 , x2 , . . . , xn ; θ) = g(θ̂, θ) · h(x1 , x2 , . . . , xn ),
where g(θ̂, θ) depends only on θ̂ and θ, and h(x1 , x2 , . . . , xn ) does not depend on θ.
Definition 6.7 (Sample Moments). The kth sample moment of a set of observations x1 , x2 , . . . , xn is the
mean of their kth powers and it is denoted by mk ; symbolically,
mk = (1/n) ∑_{i=1}^{n} xi^k.
Definition 6.8 (Maximum Likelihood Estimator). If x1 , x2 , . . . , xn are the values of a random sample from
a population with the parameter θ, the likelihood function of the sample is given by
L(θ) = f (x1 , x2 , . . . , xn ; θ)
for values of θ within a given domain. Here, f (x1 , x2 , . . . , xn ; θ) is the value of the joint probability distribution
or the joint probability density of the random variables X1 , X2 , . . . , Xn at X1 = x1 , X2 = x2 , . . . , Xn = xn .
We refer to the value of θ that maximizes L(θ) as the maximum likelihood estimator of θ.
Example 6.1. If x1 , x2 , . . . , xn are the values of a random sample from an exponential population, find the
maximum likelihood estimator of its parameter θ.
Solution:
Since the likelihood function is given by
L(θ) = f (x1 , x2 , . . . , xn ; θ) = ∏_{i=1}^{n} f (xi ; θ) = (1/θ)^n · e^{−(1/θ) ∑_{i=1}^{n} xi },
taking logarithms gives ln L(θ) = −n ln θ − (1/θ) ∑_{i=1}^{n} xi . Differentiating with respect to θ and equating the derivative to zero yields −n/θ + (1/θ²) ∑_{i=1}^{n} xi = 0, so the maximum likelihood estimator of θ is θ̂ = (1/n) ∑_{i=1}^{n} Xi = X̄, the sample mean.
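A numerical cross-check of Example 6.1: minimizing the negative log-likelihood directly (here with scipy) recovers the sample mean; the data values are assumed for illustration.

```python
# Sketch: numerical MLE for the exponential parameter theta.
import numpy as np
from scipy import optimize

x = np.array([0.8, 2.1, 0.3, 1.7, 0.9, 1.2])     # assumed sample

def neg_log_likelihood(theta):
    # -ln L(theta) for the exponential density (1/theta) e^(-x/theta)
    return len(x) * np.log(theta) + x.sum() / theta

res = optimize.minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10.0),
                               method="bounded")
print(res.x)       # numerical maximizer of L(theta)
print(x.mean())    # theta_hat = x-bar, as derived above
```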
7. Interval Estimation
Definition 7.1 (Confidence Interval). If θ̂1 and θ̂2 are values of the random variables Θ̂1 and Θ̂2 such that
P (Θ̂1 < θ < Θ̂2 ) = 1 − α
for some specified probability 1 − α, we refer to the interval
θ̂1 < θ < θ̂2
as a (1 − α) × 100% confidence interval for θ. The probability 1 − α is called the degree of confidence, and the
endpoints of the interval are called the lower and upper confidence limits.
Theorem 7.1. If X̄, the mean of a random sample of size n from a normal population with known variance σ², is to be used as an estimator of the mean of the population, then the probability is 1 − α that the error will be less than
zα/2 · σ/√n.
Theorem 7.2. If x̄ is the value of the mean of a random sample of size n from a normal population with known variance σ², then
x̄ − zα/2 · σ/√n < µ < x̄ + zα/2 · σ/√n
is a (1 − α)100% confidence interval for the mean of the population.
Theorem 7.3. If x̄ and s are the values of the mean and the standard deviation of a random sample of size n from a normal population, then
x̄ − tα/2, n−1 · s/√n < µ < x̄ + tα/2, n−1 · s/√n
is a (1 − α)100% confidence interval for the mean of the population.
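A computational sketch of Theorem 7.3; the sample values and α = 0.05 are assumed for illustration.

```python
# Sketch: t confidence interval for the mean of a normal population.
import numpy as np
from scipy import stats

x = np.array([10.2, 9.8, 10.5, 10.1, 9.6, 10.4, 10.0, 9.9])   # assumed data
alpha = 0.05
n = len(x)

xbar = x.mean()
s = x.std(ddof=1)                                  # sample standard deviation
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)      # t_{alpha/2, n-1}

half_width = t_crit * s / np.sqrt(n)
print(xbar - half_width, xbar + half_width)        # 95% interval for mu
```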
Theorem 7.4. If x̄1 and x̄2 are the values of the means of independent random samples of sizes n1 and n2 from normal populations with known variances σ1² and σ2², then
(x̄1 − x̄2) − zα/2 · √(σ1²/n1 + σ2²/n2) < µ1 − µ2 < (x̄1 − x̄2) + zα/2 · √(σ1²/n1 + σ2²/n2)
is a (1 − α)100% confidence interval for the difference between the two population means.
Theorem 7.5. If x̄1 , x̄2 , s1 , and s2 are the values of the means and the standard deviations of independent random samples of sizes n1 and n2 from normal populations with equal variances, then
(x̄1 − x̄2) − tα/2, n1+n2−2 · sp √(1/n1 + 1/n2) < µ1 − µ2 < (x̄1 − x̄2) + tα/2, n1+n2−2 · sp √(1/n1 + 1/n2)
is a (1 − α)100% confidence interval for the difference between the two population means, where the pooled
standard deviation is
sp = √[((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2)].
Theorem 7.6. If X is a binomial random variable with parameters n and θ, n is large, and θ̂ = x/n, then
θ̂ − zα/2 · √(θ̂(1 − θ̂)/n) < θ < θ̂ + zα/2 · √(θ̂(1 − θ̂)/n)
is an approximate (1 − α)100% confidence interval for θ.
Theorem 7.7. If θ̂ = x/n is used as an estimate of θ, then with (1 − α)100% confidence, the error in the estimate is less than
zα/2 · √(θ̂(1 − θ̂)/n).
Theorem 7.8. If X1 is a binomial random variable with parameters n1 and θ1 , and X2 is a binomial random
variable with parameters n2 and θ2 , where n1 and n2 are large, and θ̂1 = x1/n1 , θ̂2 = x2/n2 , then
(θ̂1 − θ̂2) − zα/2 · √(θ̂1(1 − θ̂1)/n1 + θ̂2(1 − θ̂2)/n2) < θ1 − θ2 < (θ̂1 − θ̂2) + zα/2 · √(θ̂1(1 − θ̂1)/n1 + θ̂2(1 − θ̂2)/n2)
is an approximate (1 − α)100% confidence interval for θ1 − θ2 .
Theorem 7.9. If s² is the value of the variance of a random sample of size n from a normal population, then
(n − 1)s²/χ²α/2, n−1 < σ² < (n − 1)s²/χ²1−α/2, n−1
is a (1 − α)100% confidence interval for σ².
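A computational sketch of Theorem 7.9; the sample values and α = 0.05 are assumed for illustration.

```python
# Sketch: chi-square confidence interval for sigma^2.
import numpy as np
from scipy import stats

x = np.array([4.1, 5.3, 3.8, 4.7, 5.0, 4.4, 4.9, 5.6, 4.2, 4.8])   # assumed data
alpha = 0.05
n = len(x)
s2 = x.var(ddof=1)

lower = (n - 1) * s2 / stats.chi2.ppf(1 - alpha / 2, df=n - 1)
upper = (n - 1) * s2 / stats.chi2.ppf(alpha / 2, df=n - 1)
print(lower, upper)          # 95% interval for sigma^2
```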
Theorem 7.10. If s1² and s2² are the values of the variances of independent random samples of sizes n1 and n2 from normal populations, then
(s1²/s2²) · (1/Fα/2, n1−1, n2−1) < σ1²/σ2² < (s1²/s2²) · Fα/2, n2−1, n1−1
is a (1 − α)100% confidence interval for σ1²/σ2².
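A computational sketch of Theorem 7.10; the two samples and α = 0.05 are assumed for illustration.

```python
# Sketch: F confidence interval for the ratio sigma_1^2 / sigma_2^2.
import numpy as np
from scipy import stats

x1 = np.array([4.1, 5.3, 3.8, 4.7, 5.0, 4.4, 4.9, 5.6])            # assumed sample 1
x2 = np.array([6.0, 5.2, 6.8, 5.9, 6.4, 5.5, 6.1, 6.6, 5.8, 6.3])  # assumed sample 2
alpha = 0.05
n1, n2 = len(x1), len(x2)
ratio = x1.var(ddof=1) / x2.var(ddof=1)

lower = ratio / stats.f.ppf(1 - alpha / 2, dfn=n1 - 1, dfd=n2 - 1)
upper = ratio * stats.f.ppf(1 - alpha / 2, dfn=n2 - 1, dfd=n1 - 1)
print(lower, upper)      # 95% interval for sigma_1^2 / sigma_2^2
```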