Inferential Statistics
• Infer?
• What to Infer?
• Tests on Means
• Tests on Frequencies & proportions
Null Hypothesis Significance Testing (NHST)
Null Hypothesis vs. Alternate/Research Hypothesis
• Null: No difference between populations. Alternate: The populations are different.
• Null: No relationship between two variables. Alternate: There is a relationship between two variables.
• Null: The intervention does not make a difference/has no effect. Alternate: The intervention does make a difference/has an effect.
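The decision step of NHST can be sketched in a few lines of Python. This is only an illustration: the function name `nhst_decision` is hypothetical, and the default significance level of 0.05 is assumed from the threshold used elsewhere in these slides.

```python
def nhst_decision(p_value: float, alpha: float = 0.05) -> str:
    """Return the NHST decision for a p-value at significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    # A small p-value means the data would be unlikely if the null were true.
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"


# Illustrative p-values only, not results from any real data set.
print(nhst_decision(0.022))
print(nhst_decision(0.30))
```

Note that "fail to reject H0" is not the same as proving the null hypothesis true.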
Bivariate Analysis - Fit Y by X
Independent Variable (X) vs. Dependent Variable (Y)
• Continuous Y, Continuous X: (OLS) Regression (Bivariate Fit)
• Continuous Y, Categorical X: T Test / ANOVA
• Categorical Y, Continuous X: Logistic Regression
• Categorical Y, Categorical X: Chi square (Contingency Analysis)
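The grid above can be read as a simple lookup from variable types to analysis. A minimal Python sketch follows; the function name `choose_analysis` and the type labels are hypothetical, chosen only for illustration.

```python
def choose_analysis(y_type: str, x_type: str) -> str:
    """Map the types of Y (dependent) and X (independent) to an analysis."""
    grid = {
        ("continuous", "continuous"): "OLS regression (Bivariate Fit)",
        ("continuous", "categorical"): "t-test / ANOVA",
        ("categorical", "continuous"): "logistic regression",
        ("categorical", "categorical"): "chi-square (Contingency Analysis)",
    }
    return grid[(y_type.lower(), x_type.lower())]


# CTC is continuous and Gender is categorical, so this selects a t-test / ANOVA.
print(choose_analysis("continuous", "categorical"))
```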
T Test
One Sample t-test
• Synonyms: Student’s t-test
• No. of variables: One
• Type of variable: Continuous
• Purpose: Decide if the population mean is equal to a specified value or not
• Example: Mean weight of students is equal to 110 or not
• Distribution: Normal
Two-sample t-test
• Synonyms: Independent groups t-test, Independent samples t-test, Equal variance t-test, Pooled t-test, Unequal variances t-test
• No. of variables: Two
• Type of variable: Continuous and Categorical (to define groups)
• Purpose: Decide if the population means for two different groups are equal or not
• Example: Mean weights for male and female students are the same or not
• Distribution: Normal
Paired t-test
• Synonyms: Paired groups t-test, Dependent samples t-test, Repeated-samples t-test
• No. of variables: Two
• Type of variable: Continuous and Categorical (to define pairing within groups)
• Purpose: Decide if the difference between paired measurements for a population is zero or not
• Example: Mean difference in weight loss for a group of students before and after exercise is zero or not
• Distribution: Normal
Two Sample T Test
Is there any statistically significant difference in the CTC between Genders (Male and Female)?
Step 1: Analyze > Fit Y by X; Y = CTC, X = Gender; OK
Step 2: Under the Red Triangle dropdown, choose t Test
Look for the P value and interpret
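For readers who want to see what sits behind the t Test report, here is a rough Python sketch using only the standard library. It is not JMP's implementation. The data are made up, the function names are hypothetical, and the p-value comes from numerically integrating the Student t density with Simpson's rule rather than from a statistics package.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Equal-variance (pooled) two-sample t statistic and degrees of freedom."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(a, b):
    """Unequal-variances (Welch) t statistic and approximate degrees of freedom."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a) / n1, variance(b) / n2
    t = (mean(a) - mean(b)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * integral of the t density from 0 to |t|."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1 - 2 * total * h / 3)


# Made-up values standing in for CTC in two gender groups.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_pooled, df_pooled = pooled_t(group_a, group_b)
t_welch, df_welch = welch_t(group_a, group_b)
print(t_pooled, df_pooled, two_sided_p(t_pooled, df_pooled))
print(t_welch, df_welch, two_sided_p(t_welch, df_welch))
```

With equal group sizes the pooled and Welch t statistics coincide; only the degrees of freedom differ.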
ANOVA
Is there any statistically significant difference in the CTC between Class modes (Online, Offline, Hybrid)?
One Way ANOVA
Step 1: Analyze > Fit Y by X; Y = CTC, X = class mode; OK
Step 2: Under the Red Triangle dropdown, choose Means/ANOVA
Look for the P value and interpret
The Analysis of Variance report shows the standard
ANOVA information. You notice that the Prob > F (the p-
value) is 0.022, which supports your visual conclusion that
there are significant differences in the average CTC
value between the Class Modes.
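The F ratio in the Analysis of Variance report is the between-group mean square divided by the within-group mean square. A minimal Python sketch of that calculation follows. The data are made up, the function name is hypothetical, and the p-value (Prob > F) is left out because it needs the F distribution.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Made-up values standing in for CTC in three class modes.
online, offline, hybrid = [1, 2, 3], [2, 3, 4], [5, 6, 7]
print(one_way_anova([online, offline, hybrid]))
```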
Compare Means >
Look at the pairwise comparison
Is there any statistically significant association between Specialization and Gender?
Ho: The proportion of people in each gender is independent of the Specialization, i.e. Gender and
Specialization are independent of each other. There is no association between Gender and Specialization.
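The test of independence compares the observed counts with the counts expected if Gender and Specialization were independent. A hedged Python sketch follows, with a made-up 2 x 3 table and hypothetical function names. The closed-form p-value shown is valid only for 2 degrees of freedom, where the chi-square survival function is exp(-x/2).

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_df2(stat):
    """Upper-tail p-value, valid only when the test has 2 degrees of freedom."""
    return math.exp(-stat / 2)


# Made-up counts: rows are two genders, columns are three specializations.
counts = [[10, 20, 30], [20, 20, 20]]
stat, df = chi_square_independence(counts)
print(stat, df, p_value_df2(stat))
```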
Are Weight and Height related?
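With two continuous variables, the relationship is usually summarized by a correlation coefficient and a fitted line. A small Python sketch of both follows, using made-up numbers and a hypothetical function name.

```python
import math
from statistics import mean


def pearson_and_ols(x, y):
    """Return (r, slope, intercept) for the least-squares line y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = my - slope * mx
    return r, slope, intercept


# Made-up values standing in for Height (x) and Weight (y).
print(pearson_and_ols([1, 2, 3, 4], [1, 3, 2, 4]))
```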
Can Gender be predicted from CTC?
What is the probability that an individual is
Male or Female, based on the CTC?
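Logistic regression answers this kind of question by passing a linear function of CTC through the logistic curve. The Python sketch below shows only that final step. The intercept and slope are invented for illustration and are not fitted to any data.

```python
import math


def logistic_probability(x, intercept, slope):
    """Probability of the modeled category given x under a logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


# Hypothetical coefficients and CTC value, for illustration only.
p_male = logistic_probability(4.0, intercept=-2.0, slope=0.5)
p_female = 1.0 - p_male  # with two categories the probabilities sum to 1
print(p_male, p_female)
```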
Impact of Training on Sales
Paired T Test – Sales Increase.jmp
Step 1: Analyze > Specialized Modeling > Matched Pairs; Y, Paired Response = Before, After; OK
Look for the P value and interpret
There is a statistically significant
difference in the sales before and after
training.
(There is an increase in sales which is
statistically significant.)
Equivalently, the mean difference between
before and after is not zero.
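A paired t-test is a one-sample t-test on the per-subject differences. The Python sketch below computes the t statistic and degrees of freedom from made-up before/after figures. It does not use the Sales Increase data, and the function name is hypothetical. The p-value would come from a t distribution with n - 1 degrees of freedom.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t, df) for the null hypothesis that the mean difference is zero."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Made-up sales figures for four people, before and after training.
print(paired_t([10, 12, 9, 11], [12, 15, 10, 13]))
```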
Non-Parametric Tests
Nonparametric tests are useful when the usual analysis of
variance assumption of normality is not viable.
• Wilcoxon: Like a t-test using ranks (also called the Mann-Whitney U test), for only 2 groups.
• Kruskal-Wallis: Like one-way ANOVA using ranks, for more than 2 groups.
• Median: One-way ANOVA using the median, i.e. whether each point is above or below the median. Performs a test based on median rank scores.
• Van der Waerden: One-way ANOVA using normal quantile scores based on rank.
• Kolmogorov-Smirnov: (Available only when the X factor has two levels.) Performs a test based on the empirical distribution function, which tests whether the distribution of the response is the same across the groups.
• Friedman Rank: Like repeated measures ANOVA.
• Exact: Provides options for performing exact versions of the Wilcoxon, Median, van der Waerden, and Kolmogorov-Smirnov tests. These options are available only when the X factor has two levels. Results for both the approximate and the exact test are given.
Check out Nonparametric Multiple Comparisons also
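To show what "using ranks" means for the Wilcoxon / Mann-Whitney test, here is a rough Python sketch. The data are made up and the function names are hypothetical. The p-value uses a plain normal approximation with no tie or continuity correction, so it should not be expected to match JMP's output exactly.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U for group a, U for group b, approximate two-sided p-value)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    # Normal approximation to the null distribution of U (no corrections).
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, u2, p


# Made-up samples for two groups.
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```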
Analgesics Example / Exercise
Help > Sample Data > Analysis of Variance > Analgesics
Questions?
• Is Pain normally distributed?
• Is the mean of Pain statistically different from 9.1?
• Is the mean of Pain statistically different from 7.1?
• Is there any statistically significant difference in the pain between Genders (Male and Female)?
• Is there any statistically significant difference in the pain between different Drugs (Drug A, B & C)?
• Is there any statistically significant association between Drug Type and Gender?
Is Pain Normally distributed?
Analyze > Distribution > Y = Pain, OK
Red Triangle next to Pain > Normal Quantile Plot
Normality continued
Continuous Fit > Fit Normal
Fitted Distribution > Goodness of Fit
Null: The sample was taken from a normal distribution
(not rejected when the P value is greater than 0.05).
Alternate: The sample was taken from a non-normal
distribution (P value less than 0.05).
If the P value is GREATER than 0.05, we fail to reject the null,
so the data can be treated as normally distributed.
Pain is Normally distributed
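A normal quantile plot pairs each sorted observation with the standard normal quantile for its rank. If the data are roughly normal, the points fall close to a straight line. The Python sketch below computes the plot coordinates. The plotting position (i - 0.5) / n is an assumption made here, and JMP may use a different formula. The data are made up.

```python
from statistics import NormalDist


def normal_quantile_points(data):
    """Return (theoretical normal quantile, observed value) pairs, sorted by value."""
    n = len(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        (std_normal.inv_cdf((i - 0.5) / n), x)
        for i, x in enumerate(sorted(data), start=1)
    ]


# Made-up values standing in for Pain scores.
for quantile, value in normal_quantile_points([3, 1, 2]):
    print(quantile, value)
```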
One sample T Test
Is the mean of Pain statistically different from 9.1?
Test of Mean (One sample T Test)
Pain Red Triangle > Test Mean
Input Hypothesized value = 9.1 > OK
Since the P value is MORE than 0.05, there
is NO significant difference between the mean
pain and the hypothesized value of 9.1
One sample T Test
Is the mean of Pain statistically different from 7.1?
Test of Mean (One sample T Test)
Pain Red Triangle > Test Mean
Input Hypothesized value = 7.1 > OK
Since the P value is less than 0.05, there is a
significant difference between the mean pain
and the hypothesized value of 7.1
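The Test Mean report is built on the one-sample t statistic: the distance between the sample mean and the hypothesized value, measured in standard errors. A minimal Python sketch with made-up data and a hypothetical function name follows. It does not use the Analgesics table, and the p-value would come from a t distribution with n - 1 degrees of freedom.

```python
import math
from statistics import mean, stdev


def one_sample_t(data, hypothesized_mean):
    """Return (t, df) for testing whether the population mean equals hypothesized_mean."""
    n = len(data)
    standard_error = stdev(data) / math.sqrt(n)
    return (mean(data) - hypothesized_mean) / standard_error, n - 1


# Made-up scores; the sample mean is 8.
scores = [6, 8, 7, 9, 10]
print(one_sample_t(scores, 7))  # hypothesized value below the sample mean
print(one_sample_t(scores, 8))  # hypothesized value equal to the sample mean
```

The further the hypothesized value is from the sample mean, the larger |t| becomes and the smaller the p-value, which is why the same data can give opposite conclusions for two different hypothesized values.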
Two Sample T Test
Is there any statistically significant difference in the
pain between Genders (Male and Female)?
Step 1: Analyze > Fit Y by X; Y = Pain, X = Gender; OK
Step 2: Under the Red Triangle dropdown, choose t Test
Look for the P value and interpret
Is there any statistically significant
difference in the pain between different
Drugs (Drug A, B & C)?
One Way ANOVA
Data = Help > Sample Data > Analysis of Variance > Analgesics
Step 1: Analyze > Fit Y by X; Y = Pain, X = drug; OK
Step 2: Under the Red Triangle dropdown, choose Means/ANOVA
Look for the P value and interpret
The Analysis of Variance report shows the
standard ANOVA information. You notice that
the Prob > F (the p-value) is 0.0053, which
supports your visual conclusion that there are
significant differences in the average pain value
between the drugs.
Is there any statistically significant
difference in the pain between
• Drug A & Drug B?
• Drug B & Drug C?
• Drug C & Drug A?
One Way ANOVA – (Pairwise comparisons)
Data = Help > Sample Data > Analysis of Variance > Analgesics
Step 1: Analyze > Fit Y by X; Y = Pain, X = drug; OK
Step 2: Under the Red Triangle dropdown, choose Means/ANOVA
Step 3: Under the Red Triangle dropdown, choose Compare Means > All Pairs, Tukey HSD
Look for the P value and interpret
Is there any statistically significant association between
Drug Type and Gender?
Ho: The proportion of people in each gender is independent of the Drug Type