Overview
• Terminology
• Confidence intervals
• Hypothesis testing
What is statistics?
• Statistics is about making decisions in the face of uncertainty
• In the simplest case, we are trying to understand uncertainty
when estimating a single mean
• Often, we characterize the uncertainty by assuming a
distribution
Notation
• The normal or Gaussian distribution can be written in one of
two ways.
G = Gaussian, N = Normal
G(µ, σ)   or   N(µ, σ²)
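As a quick software aside (a minimal sketch, assuming NumPy and SciPy): because the N(µ, σ²) form is written in terms of the variance, it is easy to mix up σ and σ² when simulating; the calls below take the standard deviation σ, not the variance.

```python
import numpy as np
from scipy import stats

mu, sigma = 5.0, 2.0

# both NumPy and SciPy parameterize the normal by the standard deviation sigma
draws = np.random.default_rng(0).normal(loc=mu, scale=sigma, size=100_000)
print(draws.std())                            # close to sigma = 2

print(stats.norm(loc=mu, scale=sigma).var())  # sigma**2 = 4.0
```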
Standard Error (se) and
Standard Deviation (sd)
• There are unfortunately no standard definitions of sd and se.
– Standard deviation: square root of the variance of a random variable
  sd(X) = √Var(X)
– Standard error: square root of the variance of a function of a random variable
  – e.g. se(X̄) = √Var(X̄)
Standard Error (se) and
Standard Deviation (sd)
• Some professors refer to standard deviations as the theoretical
quantities (the formula), and to standard errors as the
estimated quantities
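For the most common case, the standard error of the sample mean of n independent observations, the two quantities connect as follows (a short derivation; σ denotes sd(X)):

```latex
\operatorname{Var}(\bar{X})
   = \operatorname{Var}\!\left(\tfrac{1}{n}\textstyle\sum_{i=1}^{n} X_i\right)
   = \tfrac{1}{n^2}\textstyle\sum_{i=1}^{n}\operatorname{Var}(X_i)
   = \frac{\sigma^2}{n},
\qquad
\operatorname{se}(\bar{X})
   = \sqrt{\operatorname{Var}(\bar{X})}
   = \frac{\sigma}{\sqrt{n}}
   = \frac{\operatorname{sd}(X)}{\sqrt{n}} .
```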
Small Example
• Draw a sample from some population
• Data: x=4,5,6
• Clicker Question: What is the sample standard deviation sd(X)?
Answers:
a) sqrt(2/3)
b) 2/3
c) 1
d) 2
e) other
Small Example
• Data: 4,5,6
• Given that you have computed the sd, what is the standard error of the mean, se(X̄)?
• Clicker question: What is the se? Answers:
a) sd/n
b) sd/sqrt(n)
c) sd*n
d) sd*sqrt(n)
e) sd
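A minimal Python sketch (assuming NumPy) that checks both clicker answers for the data 4, 5, 6:

```python
import numpy as np

x = np.array([4.0, 5.0, 6.0])
n = len(x)

sd = x.std(ddof=1)     # sample standard deviation (divisor n - 1) -> 1.0, answer c)
se = sd / np.sqrt(n)   # standard error of the mean, sd / sqrt(n)  -> 0.577..., answer b)
print(sd, se)
```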
Symbols for estimator vs estimate
• An estimator is a formula, an estimate is a numerical value
• In Statistics we distinguish between an estimator
(tilde ~) and an estimate (hat ^)
Confidence intervals
• One sample model
Y = µ + ε
where ε is a residual or error
• What is a plausible range for µ ?
• We have a sample of Y’s
Confidence interval
• For a mean:
  ( ȳ − c·σ/√n ,  ȳ + c·σ/√n )
• Where c is a critical value and depends on the distribution of y
• Because of the central limit theorem, for large samples the distribution of ȳ is approximately normal
  – regardless of the distribution of y
• The critical value c needs to be obtained from a look-up table or a
computer
– For the 95% two-sided confidence interval c=1.96
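A quick sketch (assuming SciPy) of where the critical value comes from; for a two-sided 95% interval it is the upper α/2 = 2.5% quantile of the standard normal:

```python
from scipy import stats

alpha = 0.05
c = stats.norm.ppf(1 - alpha / 2)   # upper alpha/2 quantile of N(0, 1)
print(c)                            # 1.9599... ~ 1.96
```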
Sampling distribution of ȳ
• E.g. alpha = 5%
• We are 95% confident that the true mean is in the interval
• 95% of the time, when we calculate a confidence interval in this way, the true mean will be in the confidence interval
• Technically incorrect: "The probability of the true mean being in the interval is 95%"
[Figure: probability density of the sampling distribution of ȳ, centred at µ; the middle area is (1 − α), with α/2 in each tail. Because the total probability is 1, the total area under the curve is 1: α/2 + (1 − α) + α/2 = 1.]
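A small simulation sketch (assuming NumPy and SciPy; the true mean, sd, and sample size are made up for illustration) of the "95% of the time" interpretation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma, n, reps = 10.0, 2.0, 50, 10_000
c = stats.norm.ppf(0.975)

covered = 0
for _ in range(reps):
    y = rng.normal(mu, sigma, n)
    half = c * sigma / np.sqrt(n)               # half-width with known sigma
    covered += (y.mean() - half <= mu <= y.mean() + half)
print(covered / reps)                            # close to 0.95
```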
Sampling distribution assuming normality
• Suppose we assume normality:
  Y = µ + ε,  ε ∼ G(0, σ)
• Then the sampling distribution is t:
  (µ̂ − µ) / (σ̂ₙ₋₁ / √n)  ∼  tₙ₋₁
• The t-distribution looks very similar to the normal distribution, but
has heavier tails
t-distribution vs. normal
• The t-distribution has heavier tails
[Figure: t density with 4 df vs. normal density]
t-distribution vs. normal
• The t-distribution converges to the normal distribution as n
increases.
– Once n > 30, there is not much difference.
• For a 95% confidence interval:
– critical values of the t-distribution are always larger than 1.96.
– critical values converge to 1.96 for “large” sample sizes
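A short sketch (assuming SciPy) showing the two-sided 95% critical values of the t-distribution shrinking toward 1.96 as the degrees of freedom grow:

```python
from scipy import stats

for df in (2, 4, 10, 30, 100, 1000):
    print(df, round(stats.t.ppf(0.975, df), 3))
# 2 -> 4.303, 30 -> 2.042, 1000 -> 1.962
```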
Example
• A 95% confidence interval for the simple example with 3
observations:
CI = ( ȳ − t₂ · se(ȳ),  ȳ + t₂ · se(ȳ) )
   = ( 5 − 4.3 · 1/√3,  5 + 4.3 · 1/√3 )
   = ( 2.52, 7.48 )
• Note: If I had had a large number of observations, the critical value
would have been 1.96
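The same interval computed in Python (assuming SciPy), reproducing (2.52, 7.48):

```python
import numpy as np
from scipy import stats

y = np.array([4.0, 5.0, 6.0])
n = len(y)

se = y.std(ddof=1) / np.sqrt(n)          # 1 / sqrt(3)
t_crit = stats.t.ppf(0.975, df=n - 1)    # 4.30... for 2 degrees of freedom
print(y.mean() - t_crit * se, y.mean() + t_crit * se)   # about 2.52 and 7.48
```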
Hypothesis tests
• A hypothesis is a statement about a parameter
• The most common hypothesis test is
H0 : µ = 0
H1 : µ ≠ 0
• The alternative hypothesis affects the answer. Here it specifies that both large positive and large negative values are evidence against the null hypothesis
Rejection of hypotheses
• We either reject the null hypothesis or we
fail to reject the hypothesis
– We never accept the null hypothesis, because
we can never be sure it is true
• This parallels convictions in a court of law (H₀: innocent, H₁: not innocent):
  – "guilty" implies proof of guilt beyond reasonable doubt
  – "not guilty" does not mean innocence; it simply means you couldn't prove guilt
Hypotheses
• In hypothesis testing you compute the probability, assuming the hypothesis is true, of observing data at least as extreme as what was observed (the alternative determines what counts as extreme)
• If that probability is very small, you reject the hypothesis
Hypothesis
H₀: µ = µ₀
H₁: µ ≠ µ₀
[Figure: probability density of the test statistic under H₀, centred at µ₀; the middle area is (1 − α) and each rejection region in the tails has area α/2.]
Suppose that we flipped the coin 50 times, and observed Y = 22 heads.
The observed value of D = |Y − 25| is d = |22 − 25| = 3.
What is the probability of observing a value of D greater than or equal to 3 if H₀: θ = 0.5 is true?
Suppose that we repeated the same experiment
with many coins – each coin is tossed 50 times.
If the coins were fair, then we would expect about 48% of the experiments to have a value of D = |Y − 25| greater than or equal to the value d = |22 − 25| = 3.
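A sketch (assuming SciPy) of the exact binomial calculation behind the "about 48%" figure:

```python
from scipy import stats

n, theta0 = 50, 0.5
center = 25                    # E[Y] under H0: theta = 0.5
d = abs(22 - center)           # observed value of D, here 3

# P(D >= d | H0) = P(Y <= 25 - d) + P(Y >= 25 + d) for Y ~ Binomial(50, 0.5)
p_value = stats.binom.cdf(center - d, n, theta0) + stats.binom.sf(center + d - 1, n, theta0)
print(p_value)                 # roughly 0.48
```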
Important Note: This test cannot prove that the
coin is fair. However, in this case, based on the
evidence on hand (the data), there is no evidence
to suggest that the coin is biased
Key Concept: We care about the probability of
observing a value of D greater than or equal to the
observed value (d = 3 in this example) if the null
hypothesis were true.
We do not care about the probability of observing D
= 3 exactly.
Why?
If we repeated the experiment and got y = 22 again, the only surprise would be that we got the exact same result; any single exact outcome has small probability, so by itself it provides zero evidence against H₀: θ = 0.5.
Instead we ask: what is the probability of a result at least as surprising?
The probability calculated in this question is called
the p-value
The p-value represents the probability of
observing a value as extreme or more extreme
than the value observed, under the assumption
that the null hypothesis is true
It is a measure of the level of evidence against H₀ based on the observed data.
So, the smaller the p-value, the more evidence we have against H₀, or the less the data support the claim that the null hypothesis is true.
Hypothesis testing vs confidence interval
For a given significance level alpha:
• The null hypothesis H₀: µ = µ₀ is rejected if and only if the hypothesized value µ₀ falls outside of the (1 − alpha) confidence interval
• Therefore, hypothesis testing and confidence intervals always
lead to the same conclusion
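A sketch (assuming SciPy; the data are the 4, 5, 6 example from above) of this duality: the two-sided one-sample t-test rejects H₀: µ = µ₀ at level α exactly when µ₀ falls outside the (1 − α) confidence interval:

```python
import numpy as np
from scipy import stats

y = np.array([4.0, 5.0, 6.0])
mu0, alpha = 0.0, 0.05

# two-sided one-sample t-test of H0: mu = mu0
t_stat, p_value = stats.ttest_1samp(y, popmean=mu0)

# matching (1 - alpha) confidence interval
n = len(y)
se = y.std(ddof=1) / np.sqrt(n)
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
lo, hi = y.mean() - t_crit * se, y.mean() + t_crit * se

print(p_value < alpha, not (lo <= mu0 <= hi))   # both True here: reject, and mu0 is outside the CI
```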
Significance level
• The significance level is the cutoff for the p-value below which the null hypothesis is considered so unlikely that you no longer believe it
– usually alpha=0.05, sometimes alpha=0.01
– p-value is interpreted as “degree of evidence” against the hypothesis
– strong evidence, reasonable evidence, weak evidence against the
hypothesis
• Unfortunately, in practice you often have to decide: reject or
not reject
“Statistical significance” vs.
(English) significance
• “significant” has a different meaning in statistics than in the
English language
• “This is a significant change”. What does this mean?
– English: “This is an important change” or “This change is large
enough to matter in practice”
– Statistics: “The change is not zero” or “The change cannot be
explained by chance alone”
Highway or surface road?
• When Dr. Schonlau was living in Los Angeles working for the
RAND Corporation, he had two choices for his morning
commute:
– Surface streets (shorter distance)
– Highway (longer distance)
• It was not clear to him which of the two options took less time
• Therefore, he started to record how many minutes each
commute took
• We will now play the commuting game
Highway or surface road?
• Here are the times of the first 3 trips (in minutes) on the two routes (Surface Roads vs. Highway):
  n   Time
  1   19
  2   19
  3   23
• I will give you additional commuting times shortly
• When you think you know which route is shorter, write down the value of n
• From that point on you would only commute on the shorter route
One–sample t-test
• The one-sample t-test corresponds to the following model:
Y = µ + ε,  ε ∼ G(0, σ)
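A minimal sketch of running this test (assuming SciPy; the commute-style numbers are made up, framed as paired differences "surface minus highway" so that H₀: µ = 0 means the routes take equally long on average):

```python
import numpy as np
from scipy import stats

# hypothetical paired differences: surface-road time minus highway time, in minutes
diff = np.array([2.0, -1.0, 3.0, 4.0, 1.0, 2.0])

# one-sample t-test of H0: mu = 0 against H1: mu != 0
t_stat, p_value = stats.ttest_1samp(diff, popmean=0.0)
print(t_stat, p_value)
```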
Regression
• In regression, the systematic part of the model changes as a
function of other x-variables
Y = α + βx + ε,  ε ∼ G(0, σ)
• This is a generalization of the one sample model
– The one-sample t-test arises when x = 0 (then Y = α + ε, with α playing the role of µ)
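A minimal sketch (assuming SciPy; x, α, β, and σ are made up and the data are simulated) of fitting this model:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 30)
y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, size=x.size)   # alpha = 1, beta = 0.5, sigma = 1

fit = stats.linregress(x, y)
print(fit.intercept, fit.slope)    # estimates of alpha and beta
```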