Testing of Hypothesis
Structure
19.0 Objectives
19.1 Introduction
19.2 Formulation of a Hypothesis
19.3 Rejection Region and Type of Errors
19.3.1 Rejection Region for Large Samples
19.3.2 One-tail and Two-tail Tests
19.3.3 Type I and Type II Errors
19.3.4 Rejection Region for Small Samples
19.4 Testing of Hypothesis for a Single Sample
19.4.1 Population Variance is Known
19.4.2 Population Variance not Known
19.5 Tests for Difference between two Samples
19.5.1 Population Variance is Known
19.5.2 Population Variance is not Known
19.6 Let Us Sum Up
19.7 Key Words
19.8 Some Useful Books
19.9 Answers/Hints to Check Your Progress Exercises
19.0 OBJECTIVES
19.1 INTRODUCTION
In the previous Unit we learnt about the estimation of confidence interval for population
mean on the basis of sample data. In the present Unit we will look into another aspect
of statistical inference, that is, hypothesis testing. Hypothesis is a statement or assertion
or claim about a population parameter. For example, suppose we have a hunch that
the per capita income of Chhattisgarh state is Rs. 20000 per annum. We could be sure
about the truth of this statement only if we undertook a complete census of households
in the state, that is, collected data on the income of all the households in
Chhattisgarh and calculated the per capita income of the state. However, constraints
such as time, money and manpower may compel us to conduct a sample survey instead and
draw a conclusion about the statement on the basis of sample information. This procedure
is the subject matter of hypothesis testing.
Hypothesis testing is applied widely in various fields and to various situations. For
example, suppose the effectiveness of a new drug in curing tuberculosis needs to be
tested. Obviously, the new drug need not be administered to all the patients suffering
from tuberculosis to see its effectiveness. What we need is a representative sample on which
to test whether the new drug is more effective than existing drugs. As another example, let
us take the case of a planner who asserts that the crude birth rate is the same in the
states of Bihar and Rajasthan. In this case it may not be possible for us to conduct a
census of all the births that took place in Bihar and Rajasthan during the
last year and calculate the crude birth rate. Instead, a sample survey is undertaken and
the assertion made by the planner is put to test.
In hypothesis testing we try to answer questions of the following types: Is the sample
under consideration drawn from a particular population? Is the difference between two
samples large enough that they are unlikely to belong to the same population?
We have to keep in mind that the null hypothesis ($H_0$) and the alternative hypothesis ($H_A$) are mutually
exclusive, that is, both cannot be true simultaneously. Secondly, $H_0$ and $H_A$ together
exhaust all possible options regarding the parameter, that is, there cannot be a third
possibility. For example, if the hypothesis is that the female literacy rate in Orissa is 51 per cent,
there are only two possibilities: the literacy rate is 51 per cent or it is not 51 per cent; a third
possibility does not exist.
It is a rare coincidence that the sample mean ($\bar{x}$) is exactly equal to the population mean ($\mu$). In most
cases we find a difference between $\bar{x}$ and $\mu$. Is the difference due to sampling
fluctuation, or is there a genuine difference between the sample and the population? In
order to answer this question we need a test statistic to test the difference between the
two. The result that we obtain by using the test statistic needs to be interpreted, and a
decision needs to be taken regarding the acceptance or rejection of the null hypothesis.
The development of a test statistic for hypothesis testing and the interpretation of results
require elaboration. Before discussing these two steps further, we present another
concept: the critical or rejection region.
19.3 REJECTION REGION AND TYPE OF ERRORS
The underlying idea behind hypothesis testing and interval estimation (discussed in the
previous Unit) is the same. Recall from Unit 18 that a confidence interval is built
around the sample mean with a certain confidence level. A confidence level of 95 per cent
implies that, in 95 per cent of repeated samples, the interval estimated from the sample mean
would contain the population mean. It follows that in 5 per cent of cases the interval
would not contain the population mean. Accordingly, when the population mean stated in the
null hypothesis does not lie within the confidence interval, we reject the
null hypothesis at the corresponding (here 5 per cent) level of significance.
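The link between the confidence interval and the two-tail test can be checked numerically. The sketch below is a minimal illustration, assuming a known population standard deviation and using hypothetical figures (sample mean 103, $\sigma = 15$, $n = 36$); the function name `ci_and_z_test` is our own. It uses `NormalDist` from Python's standard `statistics` module (Python 3.8+).

```python
from math import sqrt
from statistics import NormalDist

def ci_and_z_test(xbar, mu, sigma, n, alpha=0.05):
    """(1 - alpha) confidence interval for the mean, plus a two-tail z-test of H0: mean = mu."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 when alpha = 0.05
    se = sigma / sqrt(n)                           # standard error of the sample mean
    interval = (xbar - z_crit * se, xbar + z_crit * se)
    z = (xbar - mu) / se
    return interval, abs(z) > z_crit               # reject H0 when |z| exceeds z_crit

# Hypothetical data: sample mean 103, sigma 15, n 36 (standard error 2.5)
for mu in (100, 110):
    (low, high), reject = ci_and_z_test(103, mu, 15, 36)
    print(mu, round(low, 2), round(high, 2), "reject H0" if reject else "do not reject H0")
```

A hypothesised mean of 100 lies inside the 95 per cent interval and is not rejected, while 110 lies outside it and is rejected: the two procedures reach the same decision.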
On the other hand, if the population standard deviation is not known, the test statistic is

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

where $s$ is the sample standard deviation.
In the case of the t-statistic, however, the area under the curve (which represents probability)
changes according to the degrees of freedom. Thus, while finding the critical value of t we
should take into account the degrees of freedom. When the sample size is n, the degrees of
freedom are n - 1. Thus we should remember two things while finding the critical value of
t: i) the level of significance, and ii) the degrees of freedom.
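To see how the critical value of t depends on the degrees of freedom, the sketch below computes it directly. Python's standard library has no t-distribution, so this is an assumption-laden stand-in for the printed t-table: it integrates the t density with Simpson's rule and inverts the result by bisection (the helper names `t_pdf`, `t_cdf` and `t_critical` are hypothetical). If SciPy is available, `scipy.stats.t.ppf` does the same job.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2)
    return exp(log_c) / sqrt(df * pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus the Simpson's-rule area from 0 to x."""
    h = x / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(alpha, df, tails=2):
    """Critical value c with P(T > c) = alpha / tails, found by bisection."""
    target = 1 - alpha / tails
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:      # widen the bracket until it contains c
        lo, hi = hi, 2 * hi
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# One-tail 5 per cent critical values shrink towards the z value 1.645 as d.f. grow
for df in (5, 24, 100):
    print(df, round(t_critical(0.05, df, tails=1), 3))
```

For 24 degrees of freedom this gives about 1.71, the one-tail 5 per cent value quoted in Example 19.3.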
Suppose a sample of 100 students has a mean age of 12.5 years. Show through a
diagram the rejection region at 5 per cent level of significance to test the hypothesis
that the sample has a mean age greater than the population mean. Assume
that the population mean and standard deviation are 10 years and 2 years respectively.
Suppose we have collected income data from a sample of households of Raigarh district of
Chhattisgarh state and calculated the per capita income of these households. Our task is
then to test the hypothesis that the per capita income calculated from the sample is not
different from the per capita income of the district.
In the above example we can have two different situations: i) the population (in this case
all the households of the district) variance is known, ii) the population variance is not
known to us. We explain the steps to be followed in each case below.
From the central limit theorem we know that when the sample size is large, the sample mean is
approximately normally distributed. This is true even in cases where the parent population
is not normally distributed. Thus the example is appropriate for the application of the normal
distribution.
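The central limit theorem can be illustrated by simulation. In this sketch the parent population is an exponential distribution with mean 1 and standard deviation 1, a deliberately skewed and purely hypothetical choice; the seed, sample size and number of repetitions are also arbitrary.

```python
import random
from statistics import mean, stdev

def sample_means(n, reps, rng):
    """Means of `reps` samples of size n from an exponential population (mean 1, sd 1)."""
    return [mean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

rng = random.Random(42)                  # fixed seed so the run is reproducible
means = sample_means(n=100, reps=2000, rng=rng)

# The means centre on 1 with spread close to sigma / sqrt(n) = 1 / 10 = 0.1
print(round(mean(means), 3), round(stdev(means), 3))

# Share of means within 1.96 standard errors of 1; about 0.95 if approximately normal
within = sum(abs(m - 1) <= 1.96 * 0.1 for m in means) / len(means)
print(round(within, 3))
```

Even though individual observations are strongly skewed, roughly 95 per cent of the sample means fall within $\pm 1.96$ standard errors, as the normal approximation predicts.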
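Our null hypothesis in this case is

$$H_0: \mu = \text{per capita income of the district given in official records}$$

The null hypothesis states that the sample comes from a population whose mean equals the
official figure. In other words, the per capita income obtained from the sample differs from
the data provided in official records only because of sampling fluctuation.

Our alternative hypothesis is

$$H_A: \mu \neq \text{per capita income of the district given in official records}$$

Suppose we do not have any reason to say that the per capita income obtained from the
sample ($\bar{x}$) is greater than or smaller than the per capita income available in official
records. Thus our alternative hypothesis allows $\bar{x}$ to lie on either side of $\mu$.
Therefore, we should use a two-tail test, so that the rejection region is on both sides of the
standard normal curve, and the test statistic is

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \qquad \ldots(19.6)$$

where $\mu$ takes the value specified under $H_0$, $\sigma$ is the population standard deviation and $n$ is the sample size.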
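Recall the standard normal curve and the area for different values of z (see Table 15.1
in Unit 15). When z = 1.96, the area under the standard normal curve between 0 and z is
0.4750, so the area beyond ±1.96 in the two tails together is 2 × (0.5 - 0.4750) = 0.05.
Therefore, 1.96 is the critical value for a two-tail test at the 5 per cent level of
significance. Similarly, when z = 2.58 the area between 0 and z is approximately 0.4950,
leaving 2 × 0.0050 = 0.01 in the two tails; thus 2.58 is the critical value at the 1 per cent
level of significance.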
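The z-table areas and the critical values derived from them can be reproduced with the standard library's `statistics.NormalDist` (Python 3.8+). This is a small illustrative sketch; the helper names `area_zero_to` and `two_tail_critical` are our own.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def area_zero_to(z):
    """Area under the standard normal curve between 0 and z, as listed in a z-table."""
    return std_normal.cdf(z) - 0.5

def two_tail_critical(alpha):
    """z such that the two tails beyond -z and +z together hold probability alpha."""
    return std_normal.inv_cdf(1 - alpha / 2)

for z in (1.96, 2.58):
    area = area_zero_to(z)
    print(z, round(area, 4), round(2 * (0.5 - area), 4))   # table area, total tail area
print(round(two_tail_critical(0.05), 3), round(two_tail_critical(0.01), 3))
```

The exact 1 per cent value is about 2.576, which tables round to 2.58.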
In the above case, since z = 6.67 exceeds the critical value, the observed value lies in the
critical region and we reject the null hypothesis. Thus the per capita income obtained from
the sample is significantly different from the per capita income provided in official records.
The steps you should follow are:
1) Specify the null hypothesis.
2) Find out whether the problem requires a one-tail or a two-tail test, and accordingly identify
the critical region. This will help in specifying the alternative hypothesis.
3) Apply the sample values to the z-statistic given at (19.6).
4) Find out from the z-table the critical value according to the level of significance.
5) If the observed value of z (in absolute value, for a two-tail test) is lower than the
critical value, do not reject the null hypothesis.
6) If it is greater than the critical value, reject the null hypothesis and
accept the alternative hypothesis.
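The six steps above can be sketched as a single function. This is a minimal illustration, assuming a known population standard deviation; the function name `z_test`, its `tail` argument and the figures in the usage line are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu, sigma, n, alpha=0.05, tail="two"):
    """Single-sample z-test of H0: population mean = mu (steps 1-6).
    tail is 'two', 'upper' (H_A: mean > mu) or 'lower' (H_A: mean < mu).
    Returns (observed z, critical value, reject H0?)."""
    z = (xbar - mu) / (sigma / sqrt(n))              # step 3: statistic (19.6)
    if tail == "two":
        crit = NormalDist().inv_cdf(1 - alpha / 2)   # step 4: 1.96 at 5 per cent
        reject = abs(z) > crit
    elif tail == "upper":
        crit = NormalDist().inv_cdf(1 - alpha)       # 1.645 at 5 per cent
        reject = z > crit
    elif tail == "lower":
        crit = NormalDist().inv_cdf(1 - alpha)
        reject = z < -crit
    else:
        raise ValueError("tail must be 'two', 'upper' or 'lower'")
    return z, crit, reject                           # steps 5-6: the decision

# Hypothetical data: sample mean 52, hypothesised mean 50, sigma 6, n 36
print(z_test(52, 50, 6, 36))
```

Here z = 2.0, which exceeds 1.96 at the 5 per cent level but not 2.58 at the 1 per cent level, so the decision depends on the chosen level of significance.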
Example 19.1
Suppose the voltage generated by a certain brand of battery is normally distributed. A
random sample of 100 such batteries was tested and found to have a mean voltage of
1.4 volts. At the 0.01 level of significance, does this indicate that these batteries have a
general average voltage that is different from 1.5 volts? Assume that the population standard
deviation is 0.21 volts.
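Here $H_0: \mu = 1.5$. Since the average voltage of the sample can be different from the
average voltage of the population if it is either less than or more than 1.5 volts, our
rejection region is on both sides of the normal curve. Thus it is a case of a two-tail test
and the alternative hypothesis is $H_A: \mu \neq 1.5$. The observed value of the test statistic is

$$z = \frac{1.4 - 1.5}{0.21/\sqrt{100}} = \frac{-0.1}{0.021} \approx -4.76$$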
From the table for the area under the standard normal curve, we find that the
critical value at the 1 per cent level of significance is 2.58. Since the absolute value of the
observed z is greater than 2.58, we reject the null hypothesis at the 1% level and
accept the alternative hypothesis that the average voltage of the batteries is different
from 1.5 volts.
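The arithmetic of Example 19.1 can be reproduced in a few lines. This is a sketch only; the exact critical value comes from `statistics.NormalDist`, whereas the text reads the rounded value 2.58 from the table.

```python
from math import sqrt
from statistics import NormalDist

# Example 19.1: H0: mu = 1.5 volts against H_A: mu != 1.5, two-tail test at 1 per cent
xbar, mu, sigma, n, alpha = 1.4, 1.5, 0.21, 100, 0.01
z = (xbar - mu) / (sigma / sqrt(n))          # (1.4 - 1.5) / 0.021
crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 2.576; the text rounds it to 2.58
print(f"z = {z:.2f}, critical value = {crit:.3f}, reject H0: {abs(z) > crit}")
```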
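In case the sample size is small ($n \le 30$) we apply the t-statistic with n - 1 degrees of
freedom. The test statistic is

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \qquad \ldots(19.8)$$

where $s$ is the sample standard deviation.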
1) Specify the null hypothesis.
2) Find out whether the problem requires a one-tail or a two-tail test, and accordingly identify
the rejection region under the standard normal (or t) curve. This will help in specifying
the alternative hypothesis.
3) Check whether the sample size is large ($n > 30$) or small ($n \le 30$).
4) In case $n > 30$, apply the z-statistic given at (19.7).
5) Find out from the z-table the critical value according to the level of significance ($\alpha$).
6) In case $n \le 30$, apply the t-statistic given at (19.8).
7) Find out from the t-table (given in Table 15.3 in Block 5) the critical value for n - 1
degrees of freedom and level of significance ($\alpha$).
8) If the observed value (in absolute value, for a two-tail test) is lower than the critical
value, do not reject the null hypothesis.
9) If it is greater than the critical value, reject the null hypothesis and accept the
alternative hypothesis.
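The choice between z and t in steps 3 to 9 can be sketched as follows, assuming the population variance is unknown so that the sample standard deviation $s$ is used in both cases. Because Python's standard library has no t-distribution, the small-sample branch expects the critical value read from the t-table to be passed in; the function name `one_sample_test`, its parameters and the figures in the usage lines are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_test(xbar, mu, s, n, alpha=0.05, two_tail=True, t_crit=None):
    """Steps 3-9 when the population variance is unknown (s = sample sd).
    For n <= 30 the critical t for n - 1 d.f. must be read from a t-table
    and passed in as t_crit. The one-tail branch assumes H_A: mean > mu."""
    stat = (xbar - mu) / (s / sqrt(n))     # z-statistic (n > 30) or t-statistic (n <= 30)
    if n > 30:
        tail_prob = alpha / 2 if two_tail else alpha
        name, crit = "z", NormalDist().inv_cdf(1 - tail_prob)
    else:
        if t_crit is None:
            raise ValueError("n <= 30: supply t_crit for n - 1 d.f. from a t-table")
        name, crit = "t", t_crit
    observed = abs(stat) if two_tail else stat
    return name, stat, crit, observed > crit

# Hypothetical data, two-tail test at 5 per cent
print(one_sample_test(10.3, 10, 0.9, 81))                # large sample: z-test
print(one_sample_test(10.3, 10, 0.9, 16, t_crit=2.131))  # small sample: 15 d.f.
```

The value 2.131 is the two-tail 5 per cent entry for 15 degrees of freedom in standard t-tables. The same difference of 0.3 is significant with 81 observations (z = 3.0) but not with 16 (t ≈ 1.33).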
Example 19.2
At the 5 per cent level of significance the critical value of z is 1.96. Since the z
value that we have obtained is less than 1.96, we do not reject the null hypothesis.
Therefore, the data are consistent with a mean aspirin level of 10 mg.
Example 19.3
The population of Haripura district has a mean life expectancy of 60 years. Certain
health care measures are undertaken in the district. Subsequently, a random sample of
25 persons shows an average life expectancy of 60.5 years with a standard deviation
of 2 years. Can we conclude at the 0.05 level of significance that the average life
expectancy in the district has indeed gone up?
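Here, $H_0: \mu = 60$ and, since we want to know whether life expectancy has gone up,
$H_A: \mu > 60$, which calls for a one-tail test. The observed value of the test statistic is

$$t = \frac{60.5 - 60}{2/\sqrt{25}} = \frac{0.5}{0.4} = 1.25$$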
Since the sample size is 25, the degrees of freedom are 25 - 1 = 24. From the t-table we
find that for 24 degrees of freedom, 5 per cent level of significance, and a one-tail
test, the critical t-value is 1.71.
Since the t-value obtained above is less than the critical value, we do not reject the null
hypothesis. Therefore, the sample does not provide sufficient evidence, at the 5 per cent
level of significance, that the average life expectancy in the district has gone up after
the health care measures.
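The calculation for Example 19.3 is short enough to check directly. In this sketch the critical value 1.71 is taken from the text (t-table, 24 d.f., 5 per cent, one tail) rather than computed.

```python
from math import sqrt

# Example 19.3: one-tail t-test of H0: mu = 60 against H_A: mu > 60
xbar, mu, s, n = 60.5, 60, 2, 25
df = n - 1                          # 24 degrees of freedom
t = (xbar - mu) / (s / sqrt(n))     # observed t = 0.5 / 0.4
t_crit = 1.71                       # t-table: 24 d.f., 5 per cent, one tail
print(f"t = {t:.2f} with {df} d.f.; reject H0: {t > t_crit}")
```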
Check Your Progress 2
1) A report claimed that in the 'School Leaving Examination', the average marks
scored in Mathematics were 78 with a standard deviation of 16. However, a random
sample of 37 students showed an average of 84 marks in Mathematics. In
the light of this evidence, can we conclude that the average was actually more
than 78? Use 0.05 level of significance.
2) A passenger car company claims that the average fuel efficiency of its cars is 35 km
per litre of petrol. A random sample of 50 cars shows an average of 32 km per
litre with a standard deviation of 1.2 km. Does this evidence falsify the claim of
the passenger car company at 0.01 level of significance?
3) A random sample of 200 tins of coconut oil gave an average weight of 4.95 kg
per tin with a standard deviation of 0.21 kg. Do we accept the hypothesis of a net
weight of 5 kg per tin at 0.01 level of significance?
Here again we deal with two different situations: whether or not the variances of both
populations are known. Another consideration is the sample size: large or small.
The null hypothesis is the statement that the means of both populations are
the same. In notation,

$$H_0: \mu_1 = \mu_2 \qquad \ldots(19.9)$$

The alternative hypothesis is the statement that the two population means are different.
In notation,

$$H_A: \mu_1 \neq \mu_2 \qquad \ldots(19.10)$$
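When the standard deviations (positive square roots of the variances) of both populations
are known, we apply the z-statistic specified as follows:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \qquad \ldots(19.11)$$

In (19.11) above, subscript 1 refers to the first sample and subscript 2 refers to the
second sample; under the null hypothesis, $\mu_1 - \mu_2 = 0$. By applying the relevant data
in (19.11) we obtain the observed value of z and compare it with the critical value for the
specified level of significance.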
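The two-sample z-statistic (19.11) translates directly into code. This is a minimal sketch for a two-tail test with known population standard deviations; the function name `two_sample_z`, the `delta0` parameter (the hypothesised difference $\mu_1 - \mu_2$, zero under $H_0$) and the usage figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(x1, sigma1, n1, x2, sigma2, n2, alpha=0.05, delta0=0.0):
    """z-statistic (19.11) for H0: mu1 - mu2 = delta0 against H_A: mu1 - mu2 != delta0.
    Returns (z, two-tail critical value, reject H0?)."""
    se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)   # standard error of x1 - x2
    z = ((x1 - x2) - delta0) / se
    crit = NormalDist().inv_cdf(1 - alpha / 2)       # 1.96 at 5 per cent
    return z, crit, abs(z) > crit

# Hypothetical data: means 52 and 50, both sigmas 10, both samples of 100
print(two_sample_z(52, 10, 100, 50, 10, 100))
```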
Example 19.4
A bank wants to find out the average savings of its customers in Delhi and Kolkata. A
sample of 250 accounts in Delhi shows average savings of Rs. 22500, while a
sample of 200 accounts in Kolkata shows average savings of Rs. 21500. It is
known that the standard deviation of savings in Delhi is Rs. 150 and that in Kolkata is Rs.
200. Can we conclude at the 1 per cent level of significance that the banking pattern of
customers in Delhi and Kolkata is the same?
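In this case the null hypothesis is $H_0: \mu_1 = \mu_2$ and the alternative hypothesis is $H_A: \mu_1 \neq \mu_2$.

$\bar{x}_1 = 22500$, $\sigma_1 = 150$, $n_1 = 250$; $\bar{x}_2 = 21500$, $\sigma_2 = 200$, $n_2 = 200$

Since $\sigma_1$ and $\sigma_2$ are known, we apply the z-test. The test statistic is

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(22500 - 21500) - 0}{\sqrt{\dfrac{150^2}{250} + \dfrac{200^2}{200}}} = \frac{1000}{\sqrt{290}} \approx 58.72$$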
We find that at the 1 per cent level of significance the (two-tail) critical value obtained
from Table 19.2 is 2.58.
Since the observed value of z is greater than the critical value of z, the null hypothesis is
rejected and the alternative hypothesis is accepted. Thus the banking pattern of customers
in Delhi and Kolkata is different.
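The arithmetic of Example 19.4 can be checked as follows; the critical value 2.58 is the table value used in the text.

```python
from math import sqrt

# Example 19.4: Delhi (sample 1) versus Kolkata (sample 2), H0: mu1 = mu2
x1, sigma1, n1 = 22500, 150, 250
x2, sigma2, n2 = 21500, 200, 200

se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)  # sqrt(90 + 200) = sqrt(290)
z = ((x1 - x2) - 0) / se                        # mu1 - mu2 = 0 under H0
z_crit = 2.58                                   # two-tail critical value at 1 per cent
print(f"z = {z:.2f}; reject H0: {abs(z) > z_crit}")
```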
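On the other hand, if the samples are small ($n \le 30$) and the population variances are
not known, we apply the t-statistic as follows:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

where $s_1$ and $s_2$ are the sample standard deviations and $s_p$ is the pooled standard
deviation (this form assumes that the two population variances are equal). The statistic
has $n_1 + n_2 - 2$ degrees of freedom.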
Example 19.5
A mathematics teacher wants to compare the performance of Class X students in two
sections. She administers the same set of questions to 25 students in Section A and 20
students in Section B. She finds that Section A students have a mean score of 78 marks
with a standard deviation of 4 marks, while Section B students have a mean score of 75
marks with a standard deviation of 5 marks. Is the performance of students in the two
sections different at the 1 per cent level of significance?
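In this case the null hypothesis is $H_0: \mu_1 = \mu_2$ and the alternative hypothesis is
$H_A: \mu_1 \neq \mu_2$, so a two-tail test is required. The pooled variance is
$(24 \times 4^2 + 19 \times 5^2)/43 = 859/43 \approx 19.98$, and

$$t = \frac{|78 - 75| - 0}{\sqrt{19.98}\,\sqrt{\dfrac{1}{25} + \dfrac{1}{20}}} = \frac{3}{4.47 \times 0.30} \approx 2.24$$

with 25 + 20 - 2 = 43 degrees of freedom. This is less than the two-tail critical value of t
at the 1 per cent level for 43 d.f. (about 2.70), so we do not reject the null hypothesis: the
performance of the two sections does not differ significantly.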
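The pooled two-sample t-statistic can be sketched as a small function. This assumes normal populations with equal variances and sample standard deviations computed with divisor n - 1; the function name `pooled_t` and the `delta0` parameter are hypothetical. The critical value still has to be read from the t-table for the returned degrees of freedom.

```python
from math import sqrt

def pooled_t(x1, s1, n1, x2, s2, n2, delta0=0.0):
    """Two-sample t-statistic with pooled standard deviation; returns (t, d.f.).
    delta0 is the hypothesised mu1 - mu2 (0 under H0: mu1 = mu2)."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df   # pooled variance
    se = sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
    return ((x1 - x2) - delta0) / se, df

t, df = pooled_t(78, 4, 25, 75, 5, 20)   # figures of Example 19.5
print(f"t = {t:.2f} with {df} d.f.")
```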
$\sigma_1 = 6.1$, $\sigma_2 = 6.1$.
Test the following hypothesis.
2) From two normal populations two samples are drawn. The following information
is obtained:
$n_1 = 15$, $n_2 = 10$
3) Suppose that samples of size $n_1 = 20$ and $n_2 = 15$ are drawn from two normal
populations. The sample statistics are as follows:
Can we conclude at the 5% level of significance that $\mu_1 < \mu_2$?
19.6 LET US SUM UP
In the present Unit we discussed the methods of testing a hypothesis and drawing
conclusions about the population. A hypothesis is a statement about a parameter. In
order to test a hypothesis we formulate a test statistic on the basis of the information
available to us. In this Unit we considered two situations: i) description of a single
sample, and ii) comparison between two samples.
Construction of the test statistic depends on knowledge about the population variance
and on the sample size. When the population variance is known to us or the sample size is
large, we apply the normal distribution and use the z statistic to test the hypothesis. On the
other hand, when we do not know the population variance and the sample size is small,
we construct the test statistic on the basis of the t distribution. Remember that for large
samples the t distribution approximates the normal distribution, and therefore we can use
the z statistic.
19.9 ANSWERS/HINTS TO CHECK YOUR PROGRESS EXERCISES
Check Your Progress 2
1) Since it is a large sample with known variance, we apply the z-statistic. Since the
alternative hypothesis is $\mu > 78$, we apply a one-tail test. The observed value of z is 2.28
and the critical value of z at the 5% level of significance is 1.65. Since the observed
value is greater than the critical value, we reject the null hypothesis. Therefore,
we conclude that the average marks were more than 78.
2) It is a large sample with unknown variance. It requires a two-tail test. The observed
value of z is 17.68 (in absolute value) and the critical value of z at the 1% level of
significance is 2.58. Since the observed value is greater than the critical value, the null
hypothesis is rejected.
3) It is a large sample with unknown variance. It requires a two-tail test with the z-statistic.
The observed value of z is 3.37 (in absolute value), which exceeds the critical value of 2.58
at the 1% level, so the null hypothesis is rejected.
4) Since it is a large sample with known standard deviation, we apply the z-statistic. It
requires a two-tail test. The observed value of z is 3.00, which exceeds the critical value of
z at both the 5% (1.96) and the 1% (2.58) levels of significance, so the null hypothesis is
rejected. Therefore, the national average of annual income of government employees is
different from Rs. 24532.
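The observed z values quoted in answers 1 to 3 can be reproduced from the data given in the questions of Check Your Progress 2. This is an illustrative sketch; the helper `z_stat` and the `problems` table are our own, and the exact critical values from `statistics.NormalDist` (1.645 and 2.576) differ slightly from the rounded table values 1.65 and 2.58 used above.

```python
from math import sqrt
from statistics import NormalDist

def z_stat(xbar, mu, sd, n):
    """Single-sample z-statistic (x-bar - mu) / (sd / sqrt(n))."""
    return (xbar - mu) / (sd / sqrt(n))

# (label, xbar, mu, sd, n, alpha, tails) from Check Your Progress 2, questions 1-3
problems = [
    ("Q1 marks", 84, 78, 16, 37, 0.05, 1),
    ("Q2 fuel efficiency", 32, 35, 1.2, 50, 0.01, 2),
    ("Q3 coconut oil", 4.95, 5, 0.21, 200, 0.01, 2),
]
results = {}
for label, xbar, mu, sd, n, alpha, tails in problems:
    z = z_stat(xbar, mu, sd, n)
    crit = NormalDist().inv_cdf(1 - alpha / tails)
    # Q1 is an upper-tail test (H_A: mean > 78); Q2 and Q3 are two-tail
    reject = z > crit if tails == 1 else abs(z) > crit
    results[label] = (round(z, 2), round(crit, 3), reject)
    print(label, results[label])
```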
Check Your Progress 3
1) The sample sizes are large and the population standard deviations are known. Hence
we apply the z-statistic; the observed value of z is 2.58. Since the alternative hypothesis
is $\mu_1 \neq \mu_2$, we have a two-tail test, and at $\alpha = 0.05$ the critical value of z is 1.96.
The null hypothesis is rejected.
2) The sample sizes are small and $\sigma$ is not known. Hence we apply the t-statistic; the
observed value of t is 0.61. The hypotheses are $H_0: \mu_1 = \mu_2$ against $H_A: \mu_1 \neq \mu_2$.
Thus, it requires a two-tail test. For 23 d.f., at the 1% level of significance, the
critical value of t is 2.81. $H_0$ is not rejected.
3) The sample sizes are small and $\sigma$ is not known, so the t-statistic is applied. The
observed value of t is 0.72. $H_0$ is $\mu_1 = \mu_2$ and $H_A$ is $\mu_1 < \mu_2$. Hence a one-tail test
is required. The critical value of t at 33 d.f. for a one-tail test at the 5% level of
significance is about 1.69. Since 0.72 is smaller than 1.69 in absolute value, $H_0$ is not rejected.