UNIT-5
Hypothesis Testing
• Hypothesis testing: one-tailed and two-tailed tests for means
of small samples (t-test) – F-test – one-way and two-way analysis of
variance (ANOVA) – chi-square test for a specified sample standard
deviation, independence of attributes and goodness of fit.
Sample Design
• All the items under consideration in any field constitute a
“Universe” or “Population”
• A complete enumeration of all the items in the
“population” is known as a “census enquiry”
• Since a complete census enquiry is generally not possible,
we select a ‘sample’ – a few items from the “universe” –
for our study
• Researcher selects the sample by using ‘sampling design’
– a definite plan determined before any data is actually
collected
Types of Sampling
Probability sampling techniques:
1. Simple Random Sampling
2. Systematic Sampling
3. Stratified Sampling
4. Cluster/area Sampling
5. Multi-stage Sampling
Types of Sampling
Non-Probability sampling techniques:
1. Deliberate Sampling
2. Quota Sampling
3. Sequential Sampling
4. Snowball sampling
5. Panel samples
Sampling Techniques
• Sample: A sample can be defined as a part of
the target population that represents the
total population.
• Sampling Process:
1. Define the population.
2. Identify the sampling frame.
3. Specify the sampling unit.
4. Selection of sampling method.
5. Determination of sample size.
6. Specify the sampling plan.
7. Selection of samples.
Sources of error affecting the reliability of a
survey/research (systematic errors):
1. Inappropriate sampling frame.
2. Defective measuring device.
3. Non-respondents.
4. Indeterminacy principle.
5. Natural bias in the reporting of data.
Sampling errors: these are the random
variations in the sample estimates around the
true population values. Sampling error decreases as
the size of the sample increases, and it
happens to be of smaller magnitude in the case of a
homogeneous population.
Determination of Sample Size
• To determine the sample size for a pilot study the
following formula is used:
• Sample size, n = (Zα σ / e)²
Where,
σ represents the SD of the population, Zα is the critical value
determined by the level of confidence chosen by the
researcher, and e represents the error allowed in
the study
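The formula above can be sketched in a few lines of standard-library Python. The numeric values (σ = 5, e = 1) are illustrative, not from the slides:

```python
import math
from statistics import NormalDist

def sample_size(sigma: float, e: float, confidence: float = 0.95) -> int:
    """n = (Z_alpha * sigma / e)^2, rounded up to the next whole unit."""
    # Two-sided critical value, e.g. 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / e) ** 2)

# Illustrative values: population SD = 5, allowed error e = 1, 95% confidence
n = sample_size(sigma=5, e=1, confidence=0.95)
print(n)  # 97, since (1.96 * 5 / 1)^2 ≈ 96.04
```

Rounding up is deliberate: a fractional sample size is always raised to the next integer so that the error bound is still met.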
Hypothesis Testing
• Hypothesis: in statistics, hypothesis is referred
to as a statement characterising the
population that the researcher wishes to
verify on the basis of available sample
information.
• Hypothesis testing: it is a process in which
a choice is made between two actions,
i.e., either to accept or to reject the
presumed statement.
Hypothesis Testing
• Terminologies:
1. Null hypothesis: it is a statement about the
population whose credibility or validity the
researcher wants to assess based on the sample.
It is formulated specifically to test for possible
rejection or nullification. It always states ‘no
difference’. The researcher’s main claim is tested
against this statement.
Eg: there is no significant difference in the
customers’ opinion on opening Walmart outlets
in Chennai city.
2. Alternative or alternate hypothesis:
the conclusion that we accept when the data
fail to support the null hypothesis. Eg: the
customers prefer kirana shops rather than
established outlets.
3. Significance level: it is a percentage value
that gives the probability of rejecting the
null hypothesis when it is true. Normally, 5%
and 1% significance levels are considered
for evaluation.
4. One-tailed test: a hypothesis test in
which there is only one rejection region, i.e.,
we are concerned with whether the
observed value deviates from the
hypothesised value in one direction only.
5. Two-tailed test: a hypothesis test in which
the null hypothesis is rejected if the
sample value is significantly higher or lower
than the hypothesised value. It is the test that
involves both rejection regions.
Hypothesis Testing
• Types of hypothesis:
Descriptive hypothesis.
Relational hypothesis.
Working hypothesis.
Null hypothesis.
Analytical hypothesis.
Statistical hypothesis.
Common sense hypothesis.
Simple and composite hypothesis.
Hypothesis Testing
• Sources of hypothesis:
1. Theory.
2. Observation.
3. Past experience.
4. Case studies.
5. Similarity.
Steps involved in Hypothesis
Testing
1. Formulate the hypothesis.
2. Select the level of significance.
3. Select an appropriate test.
4. Calculate the value.
5. Obtain the critical test value.
6. Make decisions.
Errors in hypothesis testing
1. Type I error – the null hypothesis is true but the test
rejects it (false positive).
2. Type II error – the null hypothesis is false but the test
accepts it (false negative).
Level of significance and
confidence
• Significance level means the percentage risk of
rejecting a null hypothesis when it is true;
it is denoted by 𝛼. It is generally taken as 1%,
5% or 10%.
• (1−𝛼) is the confidence level, the probability of
accepting the null hypothesis when it is true.
Two-tailed test at 5% significance level
(acceptance and rejection regions in the case of a two-tailed test)
• Suitable when H0: μ = μ0 and Ha: μ ≠ μ0.
• Rejection regions lie in both tails, each with
significance level α/2 = 0.025 (2.5%).
• The acceptance region in the middle corresponds to the
confidence level (1 − α) = 95%.
Left-tailed test at 5% significance level
(acceptance and rejection regions in the case of a left-tailed test)
• Suitable when H0: μ = μ0 and Ha: μ < μ0.
• The rejection region lies in the left tail, with
significance level α = 0.05 (5%).
• The acceptance region corresponds to the
confidence level (1 − α) = 95%.
Right-tailed test at 5% significance level
(acceptance and rejection regions in the case of a right-tailed test)
• Suitable when H0: μ = μ0 and Ha: μ > μ0.
• The rejection region lies in the right tail, with
significance level α = 0.05 (5%).
• The acceptance region corresponds to the
confidence level (1 − α) = 95%.
HYPOTHESIS TESTING
PROCEDURES
Z-test (Large Samples)
• The z-test is a hypothesis test in which the z-statistic follows
a normal distribution.
• The z-test is best used for sample sizes greater than 30 because,
under the central limit theorem, as the sample size
gets larger, the sample mean is considered to be
approximately normally distributed.
• A z-test is a statistical test used to determine whether two
population means are different when the variances
are known and the sample size is
large.
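A minimal standard-library sketch of a one-sample z-test with a known population SD; the numbers (mean 52, μ0 = 50, σ = 10, n = 100) are illustrative, not from the slides:

```python
import math
from statistics import NormalDist

def one_sample_z(xbar, mu0, sigma, n):
    """Two-tailed one-sample z-test when the population SD is known."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed p-value
    return z, p

# Illustrative numbers: a sample of 100 observations with mean 52,
# testing H0: mu = 50 against H1: mu != 50
z, p = one_sample_z(xbar=52, mu0=50, sigma=10, n=100)
print(round(z, 2), round(p, 4))  # z = 2.0, p ≈ 0.0455 < 0.05, so reject H0
```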
t-test (Small Samples)
• A t-test is an analysis of two population means through the
use of statistical examination; a t-test with two samples
is commonly used with small sample sizes, testing the
difference between the samples when the variances/SDs
of the two normal distributions are not known.
• A t-test looks at the t-statistic, the t-distribution and degrees
of freedom to determine the probability of a
difference between populations for hypothesis testing.
• The t-test is often called Student's t-test after "Student",
the pen name of its originator, William Sealy Gosset.
t-test : Test for a specified mean
Two tailed test hypothesis:
• H0 : μ = μ0
• H1 : μ ≠ μ0
• Test Statistic,
t = (x̄ − μ0) / (S / √n)
• Where S = √{(ns²) / (n−1)}, x̄ is the sample mean and s is the sample SD
Inference:
• Table Value: (n−1) is the degrees of freedom for the distribution. This value is used to find
the table value for the given level of significance.
• If the calculated value is less than the table value at the 5% or 1% significance level, the Null
Hypothesis is accepted.
• If the calculated value is more than the table value at the 5% or 1% significance level, the Null
Hypothesis is rejected. So, the alternative hypothesis will be accepted in that case.
• Note: A one-tailed test is performed the same way; the difference lies in the representation of the
hypothesis and the table values for significance.
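The statistic above can be sketched in pure Python, using the slides' convention that s is computed with divisor n and then corrected via S = √(ns²/(n−1)). The data are hypothetical:

```python
import math

def t_one_sample(data, mu0):
    """t = (xbar - mu0) / (S / sqrt(n)), with S = sqrt(n*s^2/(n-1)),
    where s is the SD computed with divisor n, as in the slides."""
    n = len(data)
    xbar = sum(data) / n
    s2 = sum((x - xbar) ** 2 for x in data) / n   # variance with divisor n
    S = math.sqrt(n * s2 / (n - 1))               # slides' corrected SD
    return (xbar - mu0) / (S / math.sqrt(n))

# Hypothetical data: five measurements, testing H0: mu = 12
t = t_one_sample([12, 15, 14, 16, 13], mu0=12)
print(round(t, 2))  # 2.83 with n-1 = 4 degrees of freedom;
# the two-tailed 5% table value for 4 df is 2.776, so H0 is rejected
```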
t-test : Test of significance for the difference between
two population means when the population SD’s are
not known
Two tailed test hypothesis:
• H0 : μ1 = μ2
• H1 : μ1 ≠ μ2
• Test Statistic,
t = (x̄1 − x̄2) / {Sp √(1/n1 + 1/n2)}
Where Sp = √{(n1s1² + n2s2²) / (n1 + n2 − 2)}
Inference:
• Table Value: (n1 + n2 − 2) is the degrees of
freedom for the distribution. This
value is used to find
the table value for the given level of
significance.
• If the calculated value is less than
the table value at the 5% or 1%
significance level, the Null
Hypothesis is accepted; otherwise it is rejected.
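The pooled two-sample statistic can be sketched directly from the formula, again using the slides' convention that s1 and s2 are computed with divisor n. The two samples are hypothetical:

```python
import math

def t_two_sample(x, y):
    """Pooled two-sample t with Sp = sqrt((n1*s1^2 + n2*s2^2)/(n1+n2-2)),
    where s1, s2 use divisor n as in the slides; df = n1 + n2 - 2."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    s1_2 = sum((v - m1) ** 2 for v in x) / n1
    s2_2 = sum((v - m2) ** 2 for v in y) / n2
    sp = math.sqrt((n1 * s1_2 + n2 * s2_2) / (n1 + n2 - 2))
    t = (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical samples from two groups
t, df = t_two_sample([10, 12, 14], [8, 9, 10, 11])
print(round(t, 2), df)  # 2.03 with 5 degrees of freedom
```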
t-test – Paired Observations
• The condition of independence may not hold good for all samples.
When the samples are related to each other, the t-test can be
performed for small samples by converting the given pair of samples into
a single data set by taking the differences. So, the formula will be:
t = d̄ / (Sd / √n)
Where d = x − y, d̄ is the mean difference and Sd represents the S.D. of the population (here).
Note: If the Sd value is taken from the sample, then the denominator will
be Sd / √(n−1)
• The inference is similar to the previous t-tests discussed.
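A pure-Python sketch of the paired case, following the slides' note (Sd computed from the sample with divisor n, so the denominator uses √(n−1)). The before/after scores are hypothetical:

```python
import math

def t_paired(x, y):
    """Paired t-test: d = x - y, t = dbar / (Sd / sqrt(n-1)),
    with Sd computed from the sample using divisor n (slides' note)."""
    n = len(x)
    d = [a - b for a, b in zip(x, y)]
    dbar = sum(d) / n
    sd = math.sqrt(sum((v - dbar) ** 2 for v in d) / n)
    return dbar / (sd / math.sqrt(n - 1))

# Hypothetical before/after scores for the same four subjects
t = t_paired([12, 13, 17, 17], [10, 12, 14, 16])
print(round(t, 2))  # 3.66 with n-1 = 3 degrees of freedom
```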
F-test
• This test is based on a test statistic that
follows the F-distribution.
• The F-test is used to check the equality of two
population variances.
Two tailed test hypothesis:
• H0 : σ1² = σ2²
• H1 : σ1² ≠ σ2²
F-test
• Test Statistic,
F = S1² / S2² (with the larger sample variance in the numerator)
• The value of F is calculated such that it is always
greater than 1
F-test
Table value Calculations:
• The value of (n1 – 1) degrees of freedom represents the
row & the value of (n2 – 1) degrees of freedom
represents the column.
• With this table value the final interpretation is made.
• If the calculated value is less than the table value,
Null Hypothesis is accepted.
• If the calculated value is more than the table value,
Null Hypothesis is rejected. So, the
alternative hypothesis will be accepted in that case.
ANOVA
• Analysis Of Variance.
• It is a technique to test equality of means when
more than 2 populations are considered.
• Between Sample Variation and within sample
variation.
• There are two types in this:
(i) One-way ANOVA &
(ii) Two-way ANOVA
One-Way Analysis of Variance
Methodology:
• Write down the hypothesis for one-way ANOVA, i.e.,
H0 : μ1 = μ2 = … = μk (all the population means are equal)
H1 : at least one population mean differs from the others
1. Calculate N (Total No. of Observations)
2. Calculate T (Total of all the observations)
3. Calculate the Correction Factor T²/N.
4. Calculate the Sums of Squares:
(i) Total Sum of Squares:
SST = [∑X1² + ∑X2² + ……. + ∑Xn²] – T²/N
(ii) Column Sum of Squares:
SSC = {[(∑X1)²/N1] + [(∑X2)²/N1] + ……. + [(∑Xn)²/N1]} – T²/N
Where N1 refers to the No. of elements in each column
One-Way Analysis of Variance
5. Prepare ANOVA table and Calculate F-ratio
(F-value is calculated such that F>1)
ANOVA TABLE
SOURCE OF SUM OF DEGREES OF MEAN SUM VARIANCE
VARIATION SQUARES FREEDOM OF SQUARES RATIO
Between SSC c-1 MSC = F=
Columns SSC/ (c-1) MSC / MSE
Within SSE N-c MSE = Or
Columns SSE / (N-c) F=
(Errors) MSE/MSC
Total SST N-1
One-Way Analysis of Variance
6. After calculating the F-ratio value, the final interpretation is
made on comparison with the respective table
value.
• Finding the table value (when F is calculated using the formula MSC/MSE):
  - (c−1) degrees of freedom – Column.
  - (N−c) degrees of freedom – Row.
Compare the respective table value with the calculated
value at the 5% or 1% level of significance.
• If calculated value < table value, the Null hypothesis
is accepted.
• If calculated value > table value, the Null hypothesis is
rejected & the Alternate hypothesis is accepted.
• Accordingly give the final interpretation in words.
Example- one way ANOVA
Example: The following three samples are obtained from normal
populations with equal variances. Test the
hypothesis at the 5% level of significance that the sample
means are equal.
8 7 12
10 5 9
7 10 13
14 9 12
11 9 14
Solution: H0 : μ1 = μ2 = μ3   H1 : the sample means are not all equal

        X1    (X1)²    X2    (X2)²    X3    (X3)²
         8      64      7      49     12     144
        10     100      5      25      9      81
         7      49     10     100     13     169
        14     196      9      81     12     144
        11     121      9      81     14     196
Total   50     530     40     336     60     734
1. Number of observations , N= 15
2. Total sum of all observations , T= 50 + 40 + 60 = 150
3. Correction factor = T2 / N=(150)2 /15= 22500/15=1500
4. Total sum of squares, SST= 530+ 336+ 734 – 1500= 100
5. Sum of squares between samples, SSC=(50)2/5 + (40)2 /5 + (60) 2 /5 - 1500=40
6. Sum of squares within samples, SSE= 100-40= 60
ANOVA Table

Source of Variation | Sum of Squares | Degrees of Freedom | Mean Sum of Squares          | Variance Ratio
Between Columns     | SSC = 40       | c−1 = 3−1 = 2      | MSC = SSC/(c−1) = 40/2 = 20  | F = MSC/MSE = 20/5 = 4
Within Columns      | SSE = 60       | N−c = 15−3 = 12    | MSE = SSE/(N−c) = 60/12 = 5  | (since MSC > MSE)
(Errors)            |                |                    |                              |
Total               | SST = 100      | N−1 = 15−1 = 14    |                              |
• F=4 (Calculated Value)
Solution…
• Table Value: V1 = 2 and V2 = 12 at the 5% level
of significance = 3.89
• Calculated value > Table value, so the Null
Hypothesis is rejected and the Alternate
Hypothesis is accepted.
• So, the population means are not equal at the 5%
level of significance.
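The worked example above can be reproduced with a few lines of standard-library Python following the same steps (N, T, correction factor, SST, SSC, SSE, then F = MSC/MSE):

```python
def one_way_anova(samples):
    """One-way ANOVA F-ratio, following the slides' steps.
    Works for unequal group sizes as well."""
    all_obs = [x for s in samples for x in s]
    N = len(all_obs)                            # step 1: total observations
    T = sum(all_obs)                            # step 2: grand total
    cf = T ** 2 / N                             # step 3: correction factor
    sst = sum(x ** 2 for x in all_obs) - cf     # step 4(i): total SS
    ssc = sum(sum(s) ** 2 / len(s) for s in samples) - cf  # step 4(ii)
    sse = sst - ssc                             # within-samples SS
    c = len(samples)
    msc, mse = ssc / (c - 1), sse / (N - c)     # step 5: mean squares
    return msc / mse

# The worked example: three samples of five observations each
F = one_way_anova([[8, 10, 7, 14, 11], [7, 5, 10, 9, 9], [12, 9, 13, 12, 14]])
print(F)  # 4.0, matching the ANOVA table; table value F(2, 12) at 5% is 3.89
```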
Two-Way Analysis of Variance
Methodology:
• Write down the Hypothesis for Two way ANOVA i.e.,
H0 : There is no significant difference between column means as well as between row
means.
H1 : There is a significant difference between column means as well as between row
means.
1. Calculate N (Total No.of Observations)
2. Calculate T (Total of all the observations)
3. Calculate Correction Factor T2/N.
4. Calculate the Sums of Squares:
(i) Total Sum of Squares:
SST = [∑X1² + ∑X2² + ……. + ∑Xn²] – T²/N
(ii) Column Sum of Squares:
SSC = {[(∑X1)²/N1] + [(∑X2)²/N1] + ……. + [(∑Xn)²/N1]} – T²/N
Where N1 refers to the No. of elements in each column
(iii) Row Sum of Squares:
SSR = {[(∑Y1)²/N2] + [(∑Y2)²/N2] + ……. + [(∑Yn)²/N2]} – T²/N
Where N2 refers to the No. of elements in each row
Two-Way Analysis of Variance
5. Prepare ANOVA table and Calculate F-ratio
(F-value is calculated such that F>1)
ANOVA TABLE

Source of Variation | Sum of Squares | Degrees of Freedom | Mean Sum of Squares   | Variance Ratio
Between Columns     | SSC            | c − 1              | MSC = SSC/(c−1)       | Fc = MSC/MSE
Between Rows        | SSR            | r − 1              | MSR = SSR/(r−1)       | FR = MSR/MSE
Residual (Errors)   | SSE            | N − c − r + 1      | MSE = SSE/(N−c−r+1)   |
Total               | SST            | N − 1              |                       |
Two-Way Analysis of Variance
6. After calculating the Fc and FR ratio values, the final interpretation is made on
comparison with the respective table values.
Finding the table value:
For Fc:
1. (c−1) degrees of freedom – Column.
2. (N−c−r+1) degrees of freedom – Row.
For FR:
3. (r−1) degrees of freedom – Column.
4. (N−c−r+1) degrees of freedom – Row.
Compare the respective table value with
the calculated value at the 5% or 1%
level of significance.
• If calculated value < table value, the Null
hypothesis is accepted.
• If calculated value > table value, the Null
hypothesis is rejected & the Alternate hypothesis is accepted.
Example – Two way ANOVA
• In a certain factory, production can be accomplished by four
different workers on five different machines. A sample study,
in the context of a two-way design without repeated values, is
being made with the twofold objective of examining whether the
four workers differ with respect to mean production and
whether the mean productivity is the same for all the 5 machines.
The researcher involved in this study reports the following
data:
i. Sum of squares for variation between machines = 35.2
ii. Sum of squares for variation between workmen = 53.8
iii. Sum of squares for total variation = 174.2
Set up ANOVA table for the given
information and draw the inference about variation at
5% level of significance.
Solution:
H0 : Workers do not differ significantly with
respect to mean productivity and the mean
productivity is same for 5 different machines
H1 : Workers differ significantly with respect to
mean productivity and the mean productivity
is not the same for the machines
Solution:
Two-way ANOVA Table

Source of Variation | Sum of Squares | Degrees of Freedom      | Mean Sum of Squares                  | Variance Ratio
Between Columns     | SSC = 35.2     | c−1 = 5−1 = 4           | MSC = SSC/(c−1) = 35.2/4 = 8.8       | Fc = MSC/MSE = 8.8/7.1 = 1.24
Between Rows        | SSR = 53.8     | r−1 = 4−1 = 3           | MSR = SSR/(r−1) = 53.8/3 = 17.93     | FR = MSR/MSE = 17.93/7.1 = 2.53
Residual (Errors)   | SSE = 85.2     | N−c−r+1 = 20−5−4+1 = 12 | MSE = SSE/(N−c−r+1) = 85.2/12 = 7.1  |
Total               | SST = 174.2    | N−1 = 20−1 = 19         |                                      |
Solution:
• Calculated values:
Fc = MSC/MSE = 1.24
FR = MSR/MSE = 2.53
• Table Values:
For Fc (Column = 4, Row=12) = 3.26
For FR(Column = 3, Row=12) = 3.49
• Final Interpretations:
Calculated Fc < Table value of Fc
Calculated FR < Table value of FR
• So, H0 is accepted. There is no significant difference
between mean productivity with respect to workers
and machines.
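The table arithmetic above can be checked in a few lines of Python, starting from the three reported sums of squares:

```python
# Reproducing the two-way ANOVA arithmetic from the reported sums of squares
SSC, SSR, SST = 35.2, 53.8, 174.2   # machines, workmen, total
c, r, N = 5, 4, 20                  # 5 machines x 4 workers = 20 observations
SSE = SST - SSC - SSR               # residual sum of squares = 85.2
MSC = SSC / (c - 1)                 # 35.2 / 4  = 8.8
MSR = SSR / (r - 1)                 # 53.8 / 3  ≈ 17.93
MSE = SSE / (N - c - r + 1)         # 85.2 / 12 = 7.1
Fc, FR = MSC / MSE, MSR / MSE
print(round(Fc, 2), round(FR, 2))   # 1.24 and 2.53, both below their table values
```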
Two-Way Analysis of Variance
• The coding method is another method of solving two-way
ANOVA.
• In this method the first step is to subtract a suitable constant from
all the values in the data set; then the regular methodology
discussed earlier is followed (the F-ratios are unchanged by coding).
NON-PARAMETRIC
METHODS
• Non-parametric tests can be applied when:
– Data don’t follow any specific distribution and no
assumptions about the population are made
(hence they are called distribution-free tests).
– Data are measured on any scale.
• Commonly used Non Parametric Tests are:
− Chi Square test
− The Sign Test
− Wilcoxon Signed-Ranks Test
− Mann–Whitney U test
CHI SQUARE TEST
• First used by Karl Pearson
• Simplest & most widely used non-parametric test in statistical work.
• Calculated using the formula
χ² = ∑ (O – E)² / E
where O = observed frequencies
and E = expected frequencies
• The greater the discrepancy between the observed and
expected frequencies, the greater shall be the value of χ².
• The calculated value of χ² is compared with the table value of χ² for the given
degrees of freedom.
CHI SQUARE TEST
• Applications of the chi-square test:
– The test of goodness of fit (determine whether the
actual numbers are similar to the
expected/theoretical numbers)
– Test for independence of attributes (disease &
treatment, vaccination & immunity)
– To test whether the population has a specified value
of variance.
CHI-SQUARE TEST FOR
INDEPENDENCE OF
ATTRIBUTES
• H0: In the population, the two categorical
variables are independent OR there is no
relationship between the two variables.
• H1: In the population, two categorical variables
are dependent OR there is a relationship between
the two variables.
• Summarize the data in the two-way contingency
table. One column representing the observed
counts and another for expected counts.
• Calculating the Expected count from the
observed data set through a formula
CHI-SQUARE TEST FOR INDEPENDENCE OF ATTRIBUTES
• Expected Count, E = [row total × column
total] / sample size.
• Then the table is expanded to calculate
χ² = ∑ (O – E)² / E
• The calculated value is compared with the
table value and the final interpretations are made.
• For the table value: degrees of freedom = (r − 1)(c − 1);
this gives the row of the chi-square table to be read under
the chosen significance-level column.
Example:
1. The following contingency table shows the
classification of 1000 workers in a factory according
to the disciplinary action taken by the management
and their promotional experience:
Disciplinary       Promotional Experience        Total
Action             Promoted     Not Promoted
Offenders             30            670           700
Non-offenders         70            230           300
Total                100            900          1000

Use the chi-square test to ascertain whether the disciplinary
action taken and promotional experience are
associated.
• Solution:
• H0: There is no significant relationship between disciplinary action taken and promotional
experience of workers
• H1: There is a significant relationship between disciplinary action taken and promotional
experience of workers
Observed Frequency Table:
Disciplinary       Promotional Experience        Total
Action             Promoted     Not Promoted
Offenders             30            670           700
Non-offenders         70            230           300
Total                100            900          1000

Expected Frequency Table:
Disciplinary       Promotional Experience        Total
Action             Promoted     Not Promoted
Offenders             70            630           700
Non-offenders         30            270           300
Total                100            900          1000
• Solution Ctd…
Observed     Expected     (O−E)     (O−E)²     (O−E)²/E
Values, O    Values, E
   30           70         −40       1600       22.86
  670          630          40       1600        2.54
   70           30          40       1600       53.33
  230          270         −40       1600        5.93

∑ (O−E)²/E = 84.66
• Calculated Value = 84.66
Table Value:
• Degrees of freedom = (r−1)(c−1)
• It is a 2×2 contingency
table. So, r = 2 & c = 2
• N.d.f = (2−1)(2−1) = 1
Final Interpretation:
• Calculated Value = 84.66
• Table value = 3.8415
• Calculated value > Table value
• The Null hypothesis is rejected and the alternate
hypothesis is accepted. So there exists a
significant relationship between the
disciplinary action taken and the promotional
experience of workers in the factory.
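The whole computation (expected counts plus the χ² sum) can be sketched in standard-library Python and checked against the worked example:

```python
def chi2_independence(observed):
    """Chi-square statistic for a contingency table (list of rows of counts).
    For each cell, E = row total * column total / grand total."""
    row_tot = [sum(r) for r in observed]
    col_tot = [sum(c) for c in zip(*observed)]
    grand = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_tot[i] * col_tot[j] / grand
            chi2 += (o - e) ** 2 / e
    return chi2

# The worked example: offenders/non-offenders vs promoted/not promoted
chi2 = chi2_independence([[30, 670], [70, 230]])
print(round(chi2, 2))  # 84.66, far above the table value 3.8415 for 1 df
```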
CHI SQUARE TEST FOR GOODNESS OF FIT
The chi-square goodness of fit test is appropriate
when the following conditions are met:
• The sampling method is simple random sampling
• The variable under study is categorical
• The expected value of the number of sample
observations in each level of the variable is at
least 5.
• This approach consists of four steps: (1) state the
hypothesis, (2) formulate an analysis plan, (3)
analyze sample data, and (4) interpret results.
CHI SQUARE TEST FOR GOODNESS OF FIT - ANALYSIS
Degrees of freedom: The degrees of freedom (DF) is equal to the number
of levels (k) of the categorical variable minus 1: DF = k - 1 .
• Expected frequency counts: The expected frequency counts at each level
of the categorical variable are equal to the sample size times the
hypothesized proportion from the null hypothesis i.e., Ei = npi
where Ei is the expected frequency count for the ith level of the categorical
variable, n is the total sample size, and pi is the hypothesized proportion
of observations in level i.
• Test statistic: The test statistic is a chi-square random variable (χ²)
defined by the following equation:
χ² = Σ [ (Oi − Ei)² / Ei ]
where Oi is the observed frequency count for the ith level of the
categorical variable, and Ei is the expected frequency count for the ith
level of the categorical variable.
• INTERPRETING THE FINAL RESULTS ON COMPARISON WITH THE TABLE VALUE.
Example:
1. A sample analysis of examination results of 500
students was made. It was found that 220 students
have failed, 170 have secured a third class, 90 have
secured a second class and rest first class. Does this
sample support the general belief that the above
categories are in the ratio 4:3:2:1 respectively?
Solution:
H0: Results of the four category students follow the ratio
4:3:2:1
H1: Results of the four category
students do not follow the ratio 4:3:2:1
Solution ctd…
• Calculating Expected Frequencies: Ei = npi
• E1 = np1 = 500 x (4/10) = 200
• E2 = np2 = 500 x (3/10) = 150
• E3 = np3 = 500 x (2/10) = 100
• E4 = np4 = 500 x (1/10) = 50
Categories      Observed     Expected     (O−E)     (O−E)²     (O−E)²/E
                Values, O    Values, E
Failures           220          200         20        400        2
Third Class        170          150         20        400        2.67
Second Class        90          100        −10        100        1
First Class         20           50        −30        900       18
Total              500                               ∑ (O−E)²/E = 23.67
Table Value:
• N.d.f = (k-1) = (4-1) = 3
• Level of Significance = 5%
Final Interpretation
• Calculated Value = 23.67
• Table Value = 7. 8147
• Calculated Value > Table Value
• So, Null hypothesis is rejected and
alternate hypothesis is accepted.
• Hence the results of the four category
of students do not follow the ratio of 4:3:2:1
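The worked example follows Ei = n·pi directly, which takes only a few lines of Python:

```python
def chi2_goodness_of_fit(observed, ratios):
    """Chi-square goodness-of-fit: E_i = n * p_i, with p_i taken from
    the hypothesised ratios (here 4:3:2:1)."""
    n = sum(observed)
    total = sum(ratios)
    expected = [n * r / total for r in ratios]   # [200, 150, 100, 50]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# The worked example: 500 results against the hypothesised 4:3:2:1 split
chi2 = chi2_goodness_of_fit([220, 170, 90, 20], [4, 3, 2, 1])
print(round(chi2, 2))  # 23.67 > 7.8147 (table value, 3 df at 5%), so reject H0
```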
CHI SQUARE TEST FOR A
SPECIFIED POPULATION VARIANCE
OR STANDARD DEVIATION
• To test a claim about the value of the variance
or the standard deviation of a population,
then the test statistic will follow a chi-square
distribution with n−1 degrees of freedom, and
is given by the following formula.
χ² = ns² / σ²
• where n = sample size, s = sample S.D.
and σ = population S.D.
Example:
1. A random sample of size 20 from a population
gives the sample standard deviation of 6. Test the
hypothesis that the population standard deviation
is 9 at 1% level of significance.
Solution:
H0: σ = 9 and H1: σ ≠ 9
Formula: χ2= ns2 / σ2
Given: n = 20, s=6 and σ = 9
Substituting the values in the formula,
χ² = (20 × 6²)/9² = 720/81 = 8.89
Final Interpretation
• Table Value:
• N.d.f = n-1 = 20-1 =19
• Level of Significance = 1%
• Table value = 36.1909
• Calculated value < Table value.
• So, Null hypothesis is accepted and hence the population
standard deviation for the given distribution is 9.
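The statistic in this example is a one-line computation; a sketch using the slides' form of the formula (with s the sample SD computed with divisor n):

```python
def chi2_for_sd(n, s, sigma0):
    """Test statistic chi^2 = n*s^2 / sigma0^2 (slides' form, with s the
    sample SD computed with divisor n); degrees of freedom = n - 1."""
    return n * s ** 2 / sigma0 ** 2

# The worked example: n = 20, s = 6, hypothesised sigma = 9
chi2 = chi2_for_sd(n=20, s=6, sigma0=9)
print(round(chi2, 2))  # 8.89 < 36.1909 (table value, 19 df at 1%), so accept H0
```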
Parametric vs Non-parametric
• Parametric tests => we have information
about the population, or can make certain assumptions:
– Assume a normal distribution for the population.
– Data are distributed normally.
– Population variances are the same.
• Non-parametric tests are used when
no assumptions are made about the population distribution.
– Also known as distribution-free tests.
– But information is known about the sampling distribution.