Review of Statistics
Topics
Descriptive Statistics
Mean, Variance
Probability
Union event, joint event
Random Variables
Discrete and Continuous
Distributions, Moments
Two Random Variables
Covariance and correlation
Central Limit Theorem
Hypothesis testing
z-test, p-value
Simple Linear Regression
Statistical Methods
Two branches: descriptive statistics and inferential statistics.
Descriptive Statistics
Involves: collecting, presenting, and characterizing data.
Purpose: describe data.
[Example figure: bar chart of quarterly data (1st-4th Qtr) for East, West, and North.]
Inferential Statistics
Involves: estimation and hypothesis testing.
Purpose: make decisions about population characteristics.
Descriptive Statistics
Mean
Measure of central tendency
Acts as Balance Point
Affected by extreme values (outliers)
Formula:
X̄ = (1/n) Σ_{i=1}^{n} X_i = (X_1 + X_2 + … + X_n) / n
Median
Measure of central tendency
Middle value in ordered sequence
If n is odd: the middle value of the ordered sequence
If n is even: the average of the two middle values
Value that splits the distribution into two halves
Not affected by extreme values
Median (Example)
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8
Median = (16 + 16) / 2 = 16
Mode
Measure of Central Tendency
Value That Occurs Most Often
Not Affected by Extreme Values
There May Be Several Modes
Raw Data: 17 16 21 18 13 16 12 11
Ordered:  11 12 13 16 16 17 18 21
Mode = 16 (occurs most often)
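As a quick illustration, here is a minimal Python sketch (using the standard-library statistics module, our choice rather than the slides') that computes all three measures of central tendency for the slides' example data:

```python
# Central tendency for the slides' example data.
from statistics import mean, median, mode

data = [17, 16, 21, 18, 13, 16, 12, 11]

print(mean(data))    # 15.5  (balance point; pulled by outliers)
print(median(data))  # 16.0  (average of the two middle values, 16 and 16)
print(mode(data))    # 16    (value that occurs most often)
```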
Sample Variance
S² = Σ_{i=1}^{n} (X_i − X̄)² / (n − 1)
   = [ (X_1 − X̄)² + (X_2 − X̄)² + … + (X_n − X̄)² ] / (n − 1)
Note the n − 1 in the denominator! (Use n for the population variance.)
Sample Standard Deviation
S = √S² = √[ Σ_{i=1}^{n} (X_i − X̄)² / (n − 1) ]
  = √[ ( (X_1 − X̄)² + (X_2 − X̄)² + … + (X_n − X̄)² ) / (n − 1) ]
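A short sketch of these two formulas, computed both by hand (note the n − 1 denominator) and with the standard library; the data reuses the example above:

```python
# Sample variance and standard deviation with the n - 1 denominator.
# (statistics.pvariance would use n instead, for a population.)
from statistics import variance, stdev

data = [17, 16, 21, 18, 13, 16, 12, 11]

xbar = sum(data) / len(data)
s2 = sum((x - xbar) ** 2 for x in data) / (len(data) - 1)  # manual S^2

print(s2)              # 11.142857...
print(variance(data))  # same value via the standard library
print(stdev(data))     # S = sqrt(S^2), about 3.338
```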
Probability
Event, Sample Space
Event: one possible outcome
Sample space S: the collection of all possible outcomes
Probability of an outcome: the proportion of times that the outcome occurs in the long run
The complement of event A (symbol Ā): all the outcomes that are not part of event A
Properties of an Event
1. Mutually exclusive: two outcomes that cannot occur at the same time
2. Collectively exhaustive: one outcome in the sample space must occur
Example: observe the gender of one person; the outcomes are mutually exclusive and collectively exhaustive.
Joint Events
Joint event: an event that has two or more characteristics
A ∩ B means the intersection of event (set) A and event (set) B
Example: A and B (A ∩ B): female, under age 20
Compound Events
Union of event A and event B (A ∪ B): the total area of the two circles in a Venn diagram
A ∪ B contains all the outcomes that are part of event (set) A, part of event (set) B, or part of both A and B
Compound Probability Addition Rule
Used to get compound probabilities for unions of events:
P(A or B) = P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
For mutually exclusive events:
P(A or B) = P(A ∪ B) = P(A) + P(B)
Random variables
Random variable:
a numerical summary of a random outcome
a function that assigns a numerical value to each simple event in a sample space
Discrete or continuous random variables
Discrete: only a discrete set of possible values
=> summarized by a probability distribution: a list of all possible values of the variable and the probability that each value will occur
Continuous: a continuum of possible values
=> summarized by the probability density function (pdf)
Discrete Probability Distribution
1. A list of pairs [X_i, P(X_i)]
   X_i = value of the random variable (outcome)
   P(X_i) = probability associated with that value
2. Mutually exclusive (no overlap)
3. Collectively exhaustive (nothing left out)
4. 0 ≤ P(X_i) ≤ 1
5. Σ P(X_i) = 1
Joint Probability Using a Contingency Table

           Event B1      Event B2      Total
Event A1   P(A1 ∩ B1)    P(A1 ∩ B2)    P(A1)
Event A2   P(A2 ∩ B1)    P(A2 ∩ B2)    P(A2)
Total      P(B1)         P(B2)         1

Joint distribution: the cell entries P(Ai ∩ Bj)
Marginal distributions: the row and column totals P(Ai) and P(Bj)
Conditional distribution: P(A1 | B1) = P(A1 ∩ B1) / P(B1),  P(A2 | B1) = P(A2 ∩ B1) / P(B1)
Contingency Table Example
Joint event: draw 1 card; note kind and color.

Type      Red     Black   Total
Ace       2/52    2/52     4/52
Non-Ace  24/52   24/52    48/52
Total    26/52   26/52    52/52

Marginal probabilities: P(Red) = 26/52, P(Ace) = 4/52
Joint probability: P(Ace and Red) = 2/52
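To make the table concrete, here is a small sketch (using Python's fractions module for exact arithmetic, our choice) that reproduces the joint, marginal, and conditional probabilities from the card table, together with the addition rule from earlier:

```python
# Card-deck contingency table: joint, marginal, conditional probabilities.
from fractions import Fraction

F = Fraction
p_ace_red = F(2, 52)   # joint: P(Ace and Red)
p_ace     = F(4, 52)   # marginal: P(Ace)
p_red     = F(26, 52)  # marginal: P(Red)

# Addition rule: P(Ace or Red) = P(Ace) + P(Red) - P(Ace and Red)
p_ace_or_red = p_ace + p_red - p_ace_red
print(p_ace_or_red)       # 7/13  (= 28/52)

# Conditional probability: P(Ace | Red) = P(Ace and Red) / P(Red)
print(p_ace_red / p_red)  # 1/13  (= 2/26)
```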
Moments: Mean, Variance (Discrete Case)
Moment: a summary of a certain aspect of a distribution.
Mean (expected value) of a probability distribution — the weighted average of all possible values:
μ = E(X) = Σ X_i P(X_i)
Variance — the weighted average squared deviation about the mean:
σ² = E[(X − μ)²] = Σ (X_i − μ)² P(X_i)
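As a worked example of these formulas (a fair six-sided die, our example rather than the slides'):

```python
# Discrete moments for a fair die: each outcome 1..6 has probability 1/6.
outcomes = [(x, 1 / 6) for x in range(1, 7)]

mu = sum(x * p for x, p in outcomes)               # E(X)
var = sum((x - mu) ** 2 * p for x, p in outcomes)  # E[(X - mu)^2]

print(mu)   # 3.5
print(var)  # 2.9166...  (= 35/12)
```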
Statistical Independence
When the outcome of one event (B) does not affect the
probability of occurrence of another event (A), the events A
and B are said to be statistically independent.
Example: Toss a coin twice => no causality
Condition for independence: two events A and B are statistically independent if and only if (iff)
P(A | B) = P(A)
Bayes Theorem and Multiplication Rule
Bayes' theorem:
P(A | B) = P(A ∩ B) / P(B)
The difficult part is P(A ∩ B). Use the multiplication rule to derive it:
P(A and B) = P(A ∩ B) = P(A) P(B | A) = P(B) P(A | B)
For independent events:
P(A and B) = P(A ∩ B) = P(A) P(B)
Covariance
Measures the joint variability of two random variables
σ_XY = Σ_{i=1}^{N} (X_i − μ_X)(Y_i − μ_Y) P(X_i, Y_i)
Can take any value in the real numbers.
Depends on the units of measurement (e.g., dollars, cents, billions of dollars).
Example:
positive covariance = y and x are positively related;
when y is above its mean, x tends to be above its mean;
when y is below its mean, x tends to be below its mean.
Correlation
Standardized covariance, takes values in [-1, 1]
Does not depend on unit of measurement
Correlation coefficient (ρ):
ρ = cov(X, Y) / (σ_X σ_Y) = σ_XY / (σ_X σ_Y)
Covariance and correlation measure only linear dependence!
Example: Cov(X, Y) = 0 does not necessarily imply that Y and X are independent; they may be non-linearly related.
But if X and Y are jointly normally distributed, zero covariance does imply that they are independent.
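A brief NumPy sketch of sample covariance and correlation; the data below is made up purely for illustration:

```python
# Sample covariance and correlation with NumPy.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])  # roughly linear in x

cov_xy = np.cov(x, y)[0, 1]    # off-diagonal entry of the 2x2 covariance matrix
rho = np.corrcoef(x, y)[0, 1]  # standardized: always in [-1, 1]

print(cov_xy)  # positive: x and y move together
print(rho)     # close to +1: strong positive linear dependence
```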
Sum of Two Random Variables
Expected value of the sum of two random variables:
E(X + Y) = E(X) + E(Y)
Variance of the sum of two random variables:
Var(X + Y) = σ²_{X+Y} = σ²_X + σ²_Y + 2σ_XY
Continuous Probability Distributions: Normal Distribution
Bell-shaped, symmetrical
Mean, median, and mode are equal
Infinite range
68% of the data are within 1 standard deviation of the mean
95% of the data are within 2 standard deviations of the mean
In the early 1800s, the German mathematician and physicist Carl Gauss used it to analyze astronomical data; it is therefore also known as the Gaussian distribution.
Normal Distribution: Probability Density Function
f(X) = (1 / (σ √(2π))) · exp( −(1/2) ((X − μ) / σ)² )
where
f(X) = density of random variable X
π ≈ 3.14159; e ≈ 2.71828
σ = population standard deviation
X = value of the random variable (−∞ < X < ∞)
μ = population mean
Effect of Varying Parameters (μ & σ)
[Figure: three normal curves A, B, C illustrating how changing μ shifts the curve and changing σ widens or narrows it.]
Normal Distribution Probability
Probability is the area under the curve!
P(c ≤ X ≤ d) = ∫_c^d f(x) dx
An Infinite Number of Normal Distribution Tables?
Normal distributions differ by mean and standard deviation, so each distribution would require its own table. That's an infinite number!
Standardize the Normal Distribution
Z = (X − μ) / σ
Any normal distribution can be converted to the standardized normal distribution, which has μ_Z = 0 and σ_Z = 1.
One table!
Standardizing Example
Normal distribution: μ = 5, σ = 10, X = 6.2
Z = (X − μ) / σ = (6.2 − 5) / 10 = 0.12
Standardized normal distribution: μ_Z = 0, σ_Z = 1, Z = 0.12
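The standardizing example can be checked numerically; this sketch assumes SciPy is available and uses its norm distribution object:

```python
# Standardization: Z = (X - mu) / sigma, with the slides' numbers.
from scipy.stats import norm

mu, sigma, x = 5.0, 10.0, 6.2
z = (x - mu) / sigma
print(z)                       # 0.12

# The same probability from either parameterization:
print(norm.cdf(x, mu, sigma))  # P(X <= 6.2)
print(norm.cdf(z))             # P(Z <= 0.12) -- identical
```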
Moments: Mean, Variance
(Continuous Case)
Mean (expected value) of a probability distribution — the weighted average of all possible values:
μ = E(X) = ∫_{−∞}^{∞} X f(X) dX
Variance — the weighted average squared deviation about the mean:
σ² = E[(X − μ)²] = ∫_{−∞}^{∞} (X − μ)² f(X) dX
Moments: Skewness, Kurtosis
Skewness: S = E[(X − μ)³] / σ³
Measures asymmetry in a distribution. The larger the absolute size of the skewness, the more asymmetric the distribution. A large positive value indicates a long right tail, a large negative value indicates a long left tail, and a zero value indicates symmetry around the mean.
Kurtosis: K = E[(X − μ)⁴] / σ⁴
Measures the thickness of the tails of a distribution. A kurtosis above three indicates fat tails (leptokurtosis) relative to the normal, i.e. extreme events are more likely to occur.
Central Limit Theorem: Basic Idea
As the sample size gets large (n ≥ 30), the sampling distribution of the sample mean approaches a normal distribution.
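A small simulation sketch of this idea (assuming NumPy): sample means of a decidedly non-normal uniform distribution already behave normally at n = 30:

```python
# CLT simulation: means of uniform samples concentrate around the
# population mean with standard error sigma / sqrt(n).
import numpy as np

rng = np.random.default_rng(0)
n, reps = 30, 10_000
sample_means = rng.uniform(0, 1, size=(reps, n)).mean(axis=1)

print(sample_means.mean())  # close to 0.5 (the population mean)
print(sample_means.std())   # close to sqrt(1/12)/sqrt(30), about 0.0527
```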
Important Continuous Distributions
All derived from the normal distribution:
χ² distribution: arises from squared normal random variables
t distribution: arises from ratios of normal and χ² variables
F distribution: arises from ratios of χ² variables
[Figures: the χ² distribution; the t distribution (red) against the normal distribution (blue); the F distribution.]
Fundamentals of Hypothesis Testing
Identifying Hypotheses
1. Pose the question, e.g. test that the population mean is equal to 3
2. State the question statistically (H0: μ = 3)
3. State its opposite statistically (H1: μ ≠ 3)
   Hypotheses are mutually exclusive & exhaustive
   Sometimes it is easier to form the alternative hypothesis first.
4. Choose the level of significance α
   Typical values are 0.01, 0.05, 0.10
Rejection region of the sampling distribution: the unlikely values of the sample statistic if the null hypothesis is true
Identifying Hypotheses: Examples
1. Is the population average amount of TV viewing 12 hours?
   H0: μ = 12
   H1: μ ≠ 12
2. Is the population average amount of TV viewing different from 12 hours?
   H0: μ = 12
   H1: μ ≠ 12
Hypothesis Testing: Basic Idea
[Figure: sampling distribution of the sample mean under H0: μ = 50, with an observed sample mean of 20 far out in the tail.]
It is unlikely that we would get a sample mean of this value if in fact this were the population mean. Therefore, we reject the null hypothesis that μ = 50.
Example: Z-Test Statistic (σ known)
1. Convert the sample statistic (e.g., X̄) to a standardized Z variable:
Z = (X̄ − μ) / σ_X̄,  where σ_X̄ = σ / √n
2. Compare to the critical Z values:
If the Z-test statistic falls in the critical region, reject H0; otherwise, do not reject H0.
p-value
Probability of obtaining a test statistic at least as extreme as the actual sample value, given that H0 is true
The smallest value of α for which H0 can be rejected
Used to make the rejection decision:
If p-value ≥ α, do not reject H0
If p-value < α, reject H0
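Putting the Z-test and the p-value together, here is a sketch with made-up numbers (X̄ = 52.5, σ = 10, n = 100, testing H0: μ = 50), assuming SciPy for the normal cdf:

```python
# Two-sided z-test (sigma known): standardize the sample mean and
# compare the p-value to alpha.
from math import sqrt
from scipy.stats import norm

mu0, sigma, n, xbar = 50.0, 10.0, 100, 52.5  # hypothetical sample
z = (xbar - mu0) / (sigma / sqrt(n))         # Z-test statistic
p_value = 2 * (1 - norm.cdf(abs(z)))         # two-tailed p-value

alpha = 0.05
print(z, p_value)       # z = 2.5, p about 0.0124
print(p_value < alpha)  # True -> reject H0: mu = 50
```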
One-Tailed Test: Rejection Region
Left-tailed: H0: μ ≥ μ0, H1: μ < μ0 — reject H0 in the left tail; the sample statistic must be significantly below μ0.
Right-tailed: H0: μ ≤ μ0, H1: μ > μ0 — reject H0 in the right tail; here, small values don't contradict H0.
One-Tailed Z Test: Finding Critical Z Values
What is Z given α = 0.025?
.500 − .025 = .475; find .475 in the body of the standardized normal probability table (portion):

Z      .05     .06     .07
1.6   .4505   .4515   .4525
1.7   .4599   .4608   .4616
1.8   .4678   .4686   .4693
1.9   .4744   .4750   .4756

=> Z = 1.96
Two-Tailed Test: Rejection Regions
H0: μ = μ0, H1: μ ≠ μ0
The rejection region is split between the two tails of the sampling distribution, with probability α/2 in each tail. The nonrejection region in the middle has probability 1 − α (the level of confidence) and is bounded by a critical value on each side of μ0.
t-test, F-test
The test statistic may not be normally distributed => z-test not applicable.
Examples:
Variance unknown, but estimated; or testing the hypothesis that the slope of a regression line differs significantly from zero => t-test
Testing the hypothesis that the standard deviations of two normally distributed populations are equal => F-test
Jarque-Bera test
Assesses whether a given sample of data is normally distributed.
Aggregates information in the data about both skewness and kurtosis: a test of the hypothesis that S = 0 and K = 3, based on the sample estimates Ŝ and K̂.
Test statistic:
JB = (T/6) [ Ŝ² + (1/4)(K̂ − 3)² ]
(here T is the number of observations)
Under the null hypothesis of independent normally distributed observations, the Jarque-Bera statistic is distributed in large samples as a χ² random variable with 2 degrees of freedom.
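SciPy ships this test as scipy.stats.jarque_bera; a quick sketch on simulated data (our own, for illustration):

```python
# Jarque-Bera normality test on normal vs. fat-tailed samples.
import numpy as np
from scipy.stats import jarque_bera

rng = np.random.default_rng(0)
normal_data = rng.normal(size=1000)
fat_tailed = rng.standard_t(df=3, size=1000)  # leptokurtic sample

print(jarque_bera(normal_data))  # small JB, large p: do not reject normality
print(jarque_bera(fat_tailed))   # large JB, tiny p: reject normality
```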
Simple Linear Regression
Simple Linear Regression Model
y_t = β0 + β1 x_t + ε_t
y_t: dependent (response) variable
x_t: independent (explanatory) variable
β0: y-intercept
β1: slope
ε_t: random iid error, ε_t ~ (0, σ²)
Linear Regression Assumptions
1. x is exogenously determined
2. ε_t are iid(0, σ²)
   (iid = independently and identically distributed)
   Zero mean
   Independence of errors (no autocorrelation)
   Constant variance (homoscedasticity)
More things to think about:
Normality of ε_t (if not satisfied, inference procedures are only asymptotically valid)
Model specification (e.g. linearity, β1 constant over time?)
Simple Linear Regression Model
[Figure: population regression line with observed values scattered around it.]
y_t = β0 + β1 x_t + ε_t, where ε_t is the disturbance
E[y | x*] = β0 + β1 x*
Sample Linear Regression Model
[Figure: fitted line through the sampled observations; an unsampled observation lies off the line.]
y_i = b0 + b1 x_i + e_i, where e_i is the random error (residual)
ŷ_i = b0 + b1 x_i  (fitted line)
Ordinary Least Squares
OLS minimizes the sum of squared residuals (y_t − ŷ_t):
min_{β̂0, β̂1} Σ_{t=1}^{T} (y_t − β̂0 − β̂1 x_t)² = Σ_{t=1}^{T} e_t²
[Figure: observed values y_t = β0 + β1 x_t + ε_t scattered around the fitted line, with residuals e_1, e_2, e_3, e_4 measured vertically.]
Fitted value (in-sample forecast): ŷ_t = β̂0 + β̂1 x_t
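A minimal OLS sketch using the closed-form estimates, cross-checked against np.polyfit; the data is invented for illustration:

```python
# OLS slope and intercept from the closed-form formulas.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)               # OLS estimates

print(np.polyfit(x, y, 1))  # [b1, b0] -- same line, library version

residuals = y - (b0 + b1 * x)  # e_t = y_t - yhat_t
print(np.sum(residuals ** 2))  # the minimized sum of squared residuals
```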
On Thursday: Evaluating the Model
1. Examine variation measures
   coefficient of determination (goodness of fit)
   standard error of the estimate
2. Analyze the residuals e_t of the fitted model ŷ_t = β̂0 + β̂1 x_t
   serial correlation
3. Test coefficients for significance
Random Error Variation
1. Variation of actual Y from predicted Ŷ
2. Measured by the standard error of the estimate
   Sample standard deviation of the residuals e
   Denoted S_YX
3. Affects several factors
   Parameter significance
   Prediction accuracy
Measures of Variation in Regression
1. Total sum of squares (SST)
   Measures variation of the observed Y_i around their mean Ȳ
2. Explained variation (SSR)
   Variation due to the relationship between X and Y
3. Unexplained variation (SSE)
   Variation due to other factors
Variation Measures
[Figure: decomposition of the deviation of Y_i from Ȳ around the fitted line Ŷ_i = b0 + b1 X_i.]
Total sum of squares: Σ (Y_i − Ȳ)²
Unexplained sum of squares: Σ (Y_i − Ŷ_i)²
Explained sum of squares: Σ (Ŷ_i − Ȳ)²
SST = SSR + SSE
Coefficient of Determination
Proportion of variation explained by the relationship between X and Y:
r² = SSR / SST = Explained Variation / Total Variation,  0 ≤ r² ≤ 1
Computational form:
r² = [ b0 Σ Y_i + b1 Σ X_i Y_i − n Ȳ² ] / [ Σ Y_i² − n Ȳ² ]
Coefficients of Determination (r²) and Correlation (r)
[Figure: four scatter plots with fitted lines Ŷ_i = b0 + b1 X_i, showing r² = 1, r = +1; r² = 1, r = −1; r² = .8, r = +0.9; r² = 0, r = 0.]
Standard Error of Estimate
S_YX = √[ Σ_{i=1}^{n} (Y_i − Ŷ_i)² / (n − 2) ]
     = √[ ( Σ Y_i² − b0 Σ Y_i − b1 Σ X_i Y_i ) / (n − 2) ]
Residual Analysis
1. Graphical analysis of residuals
   Plot residuals vs. X_i values
   Residuals estimate the errors: the difference between actual Y_i and predicted Ŷ_i
2. Purposes
   Examine the functional form (linear vs. non-linear model)
   Evaluate violations of assumptions
Test of Slope Coefficient for Significance
1. Tests whether there is a linear relationship between X and Y
2. Hypotheses
   H0: β1 = 0 (no linear relationship)
   H1: β1 ≠ 0 (linear relationship)
3. Test statistic (with n − 2 degrees of freedom):
t = (b1 − β1) / S_b1,  where S_b1 = S_YX / √( Σ_{i=1}^{n} X_i² − n X̄² )
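To close the loop, here is a sketch of the slope test on the same kind of invented data as above, assuming SciPy for the t distribution:

```python
# Slope significance test: t = (b1 - 0) / S_b1 with n - 2 degrees of freedom.
import numpy as np
from scipy.stats import t as t_dist

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

s_yx = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))  # std error of estimate
s_b1 = s_yx / np.sqrt(np.sum((x - x.mean()) ** 2))          # std error of slope

t_stat = b1 / s_b1                              # under H0: beta1 = 0
p_value = 2 * t_dist.sf(abs(t_stat), df=n - 2)  # two-tailed
print(t_stat, p_value)  # large |t|, small p -> slope is significant
```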