
Sampling

The document provides an overview of sampling theory, including definitions of key terms such as population, sample, mean, standard deviation, and hypothesis testing. It explains the concepts of sampling distribution, standard error, and the Central Limit Theorem, as well as the procedures for conducting Z-tests for single and difference of means. Additionally, it outlines the steps for hypothesis testing and the significance levels used in statistical analysis.

Uploaded by

sandeepswamy54
Copyright
© © All Rights Reserved

SAMPLING THEORY

Basic Definitions
• Average (Mean) : For ungrouped data $\mu = \frac{\sum x_i}{n}$; for grouped data $\mu = \frac{\sum f_i x_i}{\sum f_i}$

• Standard Deviation : $\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$

• Population or Universe : The group of individuals from which we draw data for a study
• Sample : Finite subset of the population
• Sampling : The process of selecting a sample from a population
• Sample size : The number of individuals in a sample
• Parameter : A statistical constant of the population, such as its mean or S.D.
• Statistic : A statistical constant of the sample, such as its mean or S.D.
• Symbols used for population and sample:
  Population                      Sample
  Parameter                       Statistic
  Population size ($N$)           Sample size ($n$)
  Population mean ($\mu$)         Sample mean ($\mu_{\bar{x}}$)
  Population S.D. ($\sigma$)      Sample S.D. ($\sigma_{\bar{x}}$ or $s$)

• Sampling distribution : Consider a population of size $N$ and draw all possible samples of a given size $n$. For each of these samples, we compute a statistic (e.g., sample mean, sample variance, sample proportion). The value of the statistic may vary from sample to sample.
• “The distribution of values of the statistic for different samples of the same size is called the sampling distribution of the statistic”.
• When we obtain a distribution of means, it is called the sampling distribution of the mean, and when we obtain a distribution of proportions, it is called the sampling distribution of the proportion.
• Standard error : The standard deviation of the sampling distribution is called the standard error (S.E.)

• The relation between the mean of the sampling distribution and the population mean: $\mu_{\bar{x}} = \mu$
• The relation between the variance of the sampling distribution and the population variance:
• With replacement → $\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$; without replacement → $\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}$
Steps to find the mean $\bar{x}$ and standard deviation ($s$) on a CASIO fx-991ES (and later) calculator
For ungrouped data
Step1: MODE>>3: STAT>>1:1-VAR>>{Enter the x values} >> AC

Step2: Shift+1(STAT)>>4:Var>>{2: $\bar{x}$ for the mean, 3: $\sigma_x$ for $s$}

For grouped data


Step1: Shift+MODE>>More(Down Arrow)>>4:STAT>>Frequency 1:ON>>AC (Select 2:OFF for ungrouped data)

Step2: MODE>>3: STAT>>1:1-VAR>>{Enter the x values and Frequencies} >> AC

Step3: Shift+1(STAT)>>4:Var>>{2: $\bar{x}$ for the mean, 3: $\sigma_x$ for $s$}


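Example: Consider a population consisting of the four numbers 3, 7, 11, 15, and all possible samples of size 2. Find i) the mean of the population ($\mu$), ii) the S.D. of the population ($\sigma$), iii) the mean and variance of the sampling distribution of the mean ($\mu_{\bar{x}}$, $\sigma_{\bar{x}}^2$) with replacement, and show that $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$; also iv) show that $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}$ in the case of sampling without replacement.
• i) Population mean : $\mu = \frac{3 + 7 + 11 + 15}{4} = 9$

• ii) Population S.D. : $\sigma = \sqrt{\frac{(3-9)^2 + (7-9)^2 + (11-9)^2 + (15-9)^2}{4}} = \sqrt{20} \approx 4.47$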

• All possible samples of size 2 with replacement :{(3,3), (3,7), (3,11), (3,15), (7,3), (7,7), (7,11), (7,15), (11,3),
(11,7), (11,11), (11,15),(15,3),(15,7),(15,11),(15,15)}
• Sample means={3,5,7,9,5,7,9,11,7,9,11,13,9,11,13,15}
• Distribution of the sample means
𝑥 3 5 7 9 11 13 15
𝑓 1 2 3 4 3 2 1

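• Mean of the sample means : $\mu_{\bar{x}} = \frac{\sum f x}{\sum f} = \frac{144}{16} = 9$
• Variance of the sample means : $\sigma_{\bar{x}}^2 = \frac{\sum f x^2}{\sum f} - \mu_{\bar{x}}^2 = \frac{1456}{16} - 9^2 = 10$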
• All possible samples of size 2 without replacement : {(3,7), (3,11), (3,15), (7,3), (7,11), (7,15), (11,3), (11,7), (11,15), (15,3), (15,7), (15,11)}
• Sample means = {5, 7, 9, 5, 9, 11, 7, 9, 13, 9, 11, 13} (Example: the mean of (3,7) is (3+7)/2 = 5)
• Distribution of the sample means

𝑥 5 7 9 11 13
𝑓 2 2 4 2 2
• Mean of the sample means : $\mu_{\bar{x}} = \frac{\sum f x}{\sum f} = \frac{108}{12} = 9$
• Variance of the sample means : $\sigma_{\bar{x}}^2 = \frac{\sum f x^2}{\sum f} - \mu_{\bar{x}}^2 = \frac{1052}{12} - 9^2 = \frac{80}{12} = \frac{20}{3}$

Conclusion

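• Mean with or without replacement : $\mu_{\bar{x}} = 9 = \mu$
• Variance with replacement : $\sigma_{\bar{x}}^2 = 10 = \frac{20}{2} = \frac{\sigma^2}{n}$
• Variance without replacement : $\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1} = 10 \cdot \frac{4-2}{4-1} = \frac{20}{3}$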
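The example can be checked by brute-force enumeration. The following is a minimal Python sketch (the variable names are illustrative and not part of the original notes); as in the worked example, samples are treated as ordered pairs.

```python
from itertools import permutations, product
from statistics import fmean, pvariance

population = [3, 7, 11, 15]
N, n = len(population), 2

mu = fmean(population)          # population mean
sigma2 = pvariance(population)  # population variance (divisor N)

# Means of all ordered samples of size n, with and without replacement
means_wr = [fmean(s) for s in product(population, repeat=n)]
means_wor = [fmean(s) for s in permutations(population, n)]

cases = [
    ("with replacement", means_wr, sigma2 / n),
    ("without replacement", means_wor, sigma2 / n * (N - n) / (N - 1)),
]
for label, means, formula_variance in cases:
    # The enumerated variance should agree with the formula value
    print(label, fmean(means), pvariance(means), formula_variance)
```

Both enumerated means equal the population mean, and the two enumerated variances agree with $\sigma^2/n$ and $\frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}$ respectively.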
• Hypothesis: An assumption or concession made for the sake of argument
• Null Hypothesis(𝑯𝟎 ): A tentative assumption is made about the parameter for the sake of testing.
• Alternative Hypothesis 𝑯𝟏 : The hypothesis which is the opposite of what is stated in the null hypothesis.
• Types of Errors:
                      Decision
  Fact                Accept $H_0$         Reject $H_0$
  $H_0$ is true       Correct decision     Type I error
  $H_0$ is false      Type II error        Correct decision

• Test statistic : A quantity derived from the sample and used for statistical hypothesis testing.
• Critical region : The test procedure divides the possible values of the test statistic into two regions, namely an acceptance region for $H_0$ and a rejection region for $H_0$. The region where $H_0$ is rejected is known as the critical region.
• Level of Significance (LoS) : The probability of rejecting $H_0$ when it is true. Usually we take 5% or 1% LoS.
• One/Two tailed test : The nature of the critical region depends on the alternative hypothesis $H_1$
• For example, if $H_0: \mu = \mu_0$ and
  if $H_1: \mu < \mu_0$ then we choose the critical region from the one-tailed test (left)
  if $H_1: \mu > \mu_0$ then we choose the critical region from the one-tailed test (right)
  if $H_1: \mu \neq \mu_0$ then we choose the critical region from the two-tailed test

• Test procedure
The steps in the application of a statistical test procedure for testing a null hypothesis are as follows:
• Setting up the null hypothesis.
• Setting up the alternative hypothesis.
• Identifying the test statistic.
• Setting a suitable level of significance such as 1% or 5%.(Default 5%)
• Identifying the critical region.
• Making a decision based on the calculated value $T$ and the critical value $T_\alpha$.
• Accept $H_0$ if $|T| < T_\alpha$; reject $H_0$ if $|T| > T_\alpha$
Central Limit Theorem
The distribution of sample means approximates a normal distribution as the sample size gets larger, regardless of the population's distribution, i.e., the standardized sample mean is given by
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

Z-test
• For large samples (n ≥ 30), most of the sampling distributions tend to normality, and so, the test may be based
on normal distribution.
• Critical values $z_\alpha$ for the $z$-test:
  Level of significance     5%       1%
  Two-tailed test           1.96     2.58
  One-tailed test           1.645    2.33
Z-test for single mean
• Null Hypothesis $H_0: \mu = \mu_0$ (there is no significant difference between the means of the population and the sample)
• $z$-statistic is given by $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$, where
• $\bar{x}$ → mean of the sample,
• $\mu_0$ → population mean assumed under $H_0$,
• $\sigma$ → standard deviation of the population,
• $n$ → size of the sample
• Accept $H_0$ if $|z| < z_\alpha$; reject $H_0$ if $|z| > z_\alpha$
1) A sample of 900 members is found to have a mean of 3.4 cm. Can it be reasonably regarded as a truly random sample from a large population with mean 3.25 cm and S.D. 1.61 cm?
Solution: $n = 900$, $\bar{x} = 3.4$, $\sigma = 1.61$,
$H_0: \mu = 3.25$ (the sample is from the large population with mean 3.25),
$H_1: \mu \neq 3.25$ ⇒ two-tailed test ⇒ at 5% LoS $z_\alpha = 1.96$, at 1% LoS $z_\alpha = 2.58$
$|z| = \left|\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\right| = 2.795 > z_\alpha$ at both LoS
⇒ Reject $H_0$ at both 5% and 1% LoS.

2) An ambulance service claims that it takes on average less than 10 minutes to reach its destination in emergency calls. A sample of 36 calls has a mean of 11 minutes and a variance of 16. Test the significance at 0.05 level.
Solution: $n = 36$, $\bar{x} = 11$, $\sigma^2 = 16$,
$H_0: \mu = 10$ (the average service time is 10 mins),
$H_1: \mu < 10$ (the average service time is less than 10 mins) ⇒ one-tailed test ⇒ at 5% LoS $z_\alpha = 1.645$
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{11 - 10}{4/\sqrt{36}} = 1.5 < z_\alpha$ at 5% LoS
⇒ Accept $H_0$ at 5% LoS. Hence reject the ambulance service's claim.
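The two single-mean examples can be reproduced with a few lines of Python. This is a minimal sketch: the function names `z_single_mean` and `reject_h0` are illustrative assumptions, and the critical values are the tabulated ones quoted in the examples.

```python
from math import sqrt

def z_single_mean(sample_mean, mu0, sigma, n):
    """z-statistic for a single mean: (x_bar - mu0) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

def reject_h0(z, z_alpha, tail="two"):
    """Decide whether z falls in the critical region for the given tail."""
    if tail == "two":
        return abs(z) > z_alpha
    if tail == "left":
        return z < -z_alpha
    if tail == "right":
        return z > z_alpha
    raise ValueError("tail must be 'two', 'left' or 'right'")

# Example 1: sample of 900 members, two-tailed test
z1 = z_single_mean(3.4, 3.25, 1.61, 900)
print(round(z1, 3), reject_h0(z1, 1.96), reject_h0(z1, 2.58))

# Example 2: ambulance service, left-tailed test (H1: mu < 10)
z2 = z_single_mean(11, 10, sqrt(16), 36)
print(round(z2, 3), reject_h0(z2, 1.645, tail="left"))
```

For the left-tailed alternative the critical region is $z < -z_\alpha$, so a positive $z$ such as 1.5 can never lead to rejection of $H_0$.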
Z-test for difference of means
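• For two large samples ($n_1, n_2 \geq 30$), the difference of means is tested as follows
• Null Hypothesis $H_0: \mu_1 = \mu_2$ (there is no significant difference between the means)
• $z$-statistic is given by
$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
• If $\sigma_1$ and $\sigma_2$ are unknown, $s_1$ and $s_2$ are known, and $\sigma_1 \neq \sigma_2$, then take $\sigma_1 = s_1$ and $\sigma_2 = s_2$
• If $\sigma_1$ and $\sigma_2$ are unknown, $s_1$ and $s_2$ are known, and $\sigma_1 = \sigma_2$, then take $\sigma_1 = \sigma_2 = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}}$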

1) A company claims that its bulbs are superior to those of its main competitor. A study showed that a sample of 40 of its bulbs has a mean lifetime of 647 hrs of continuous use with a S.D. of 27 hrs, while a sample of 40 bulbs made by its main competitor had a mean lifetime of 638 hrs of continuous use with a S.D. of 31 hrs. Test the significance of the difference between the two means at 5% LoS and 1% LoS.
Solution: $n_1 = 40$, $\bar{x}_1 = 647$, $\sigma_1 = 27$, $n_2 = 40$, $\bar{x}_2 = 638$, $\sigma_2 = 31$
$H_0: \mu_1 = \mu_2$ (the quality of both companies' bulbs is the same),
$H_1: \mu_1 > \mu_2$ ⇒ one-tailed test ⇒ at 5% LoS $z_\alpha = 1.645$, at 1% LoS $z_\alpha = 2.33$
$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{647 - 638}{\sqrt{\frac{27^2}{40} + \frac{31^2}{40}}} = \frac{9}{6.5} \approx 1.38 < z_\alpha$ at both LoS
⇒ Accept $H_0$ at both 5% and 1% LoS, i.e., the company's claim is not supported by the data.
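As a check on the arithmetic, here is a short Python sketch of the two-sample $z$-statistic with the variances $\sigma_1^2/n_1$ and $\sigma_2^2/n_2$ under the square root. The function name `z_two_means` is an illustrative assumption. With the bulb data the standard error is $\sqrt{42.25} = 6.5$, so $z = 9/6.5 \approx 1.38$.

```python
from math import sqrt

def z_two_means(mean1, sd1, n1, mean2, sd2, n2):
    """z-statistic for the difference of two means (large samples)."""
    standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / standard_error

z = z_two_means(647, 27, 40, 638, 31, 40)
print(round(z, 3))

# One-tailed test (H1: mu1 > mu2) at the two usual levels of significance
for los, z_alpha in [("5%", 1.645), ("1%", 2.33)]:
    decision = "reject H0" if z > z_alpha else "accept H0"
    print(los, decision)
```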
Z-test for significance of proportions
• For large samples ($n \geq 30$), when the proportion of successes is given, the $z$-statistic is $z = \frac{X - nP}{\sqrt{nPQ}}$, where $Q = 1 - P$
• $X$ → number of successes in $n$ independent trials
• $P$ → probability of success in each trial
• $n$ → size of the sample

• If $P$ is unknown, let $p = \frac{X}{n}$ and $q = 1 - p$; the probable limits for the proportion in the population are then given by
$p \pm 2.58\sqrt{pq/n}$ (1% LoS) or $p \pm 1.96\sqrt{pq/n}$ (5% LoS)
• Accept $H_0$ if $|z| < z_\alpha$; reject $H_0$ if $|z| > z_\alpha$

1) A coin was tossed 400 times and the head turned up 216 times. Test the hypothesis that the coin is unbiased at 5% level of significance.
Solution: $n = 400$, $X = 216$, $P$ = prob. of head $= 1/2$ ⇒ $Q = 1/2$
$H_0$: the coin is unbiased ($P = 1/2$); $H_1$: the coin is biased ($P \neq 1/2$) ⇒ two-tailed test ⇒ at 5% LoS $z_\alpha = 1.96$
$|z| = \left|\frac{X - nP}{\sqrt{nPQ}}\right| = \frac{216 - 200}{\sqrt{100}} = 1.6 < z_\alpha$
⇒ $H_0$ is accepted, i.e., the coin is unbiased.
2) A die was thrown 9000 times and a throw of 5 or 6 was obtained 3240 times. On the assumption of random throwing, do the data indicate an unbiased die at 0.01 LoS?
(Hint: $n = 9000$, $X = 3240$, $P$ = prob. of getting 5 or 6 $= 2/6$)

3) A survey was conducted in a slum locality of 2000 families by selecting a sample of size 800. It was revealed that 180 families were illiterate. Find the probable limits of the illiterate families in the population of 2000 at 5% LoS.
Solution: $P$ is unknown, so $p = \frac{X}{n} = \frac{180}{800} = 0.225$ and $q = 0.775$
⇒ the probable limits at 5% LoS are $p \pm 1.96\sqrt{pq/n} = 0.225 \pm 0.0289$, i.e., 0.196 and 0.254
The probable number of illiterate families lies between $0.196 \cdot 2000 \approx 392$ and $0.254 \cdot 2000 \approx 508$

4) A sample of 900 days is taken from the meteorological records of a certain district and 100 of them are found to be foggy. What are the probable limits, in percentage, of foggy days in the district?
(Hint: $n = 900$, $X = 100$, $p = \frac{X}{n} = \frac{1}{9}$; $p \pm 3\sqrt{pq/n}$ ⇒ (7.97%, 14.25%), whereas $p \pm 1.96\sqrt{pq/n}$ ⇒ (9.06%, 13.16%))
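The proportion test and the probable limits can be sketched in Python as follows. The function names `z_proportion` and `probable_limits` are illustrative assumptions; the data are those of the coin and the illiteracy survey examples.

```python
from math import sqrt

def z_proportion(successes, n, p0):
    """z-statistic (X - nP) / sqrt(nPQ) for a hypothesised proportion P = p0."""
    return (successes - n * p0) / sqrt(n * p0 * (1 - p0))

def probable_limits(successes, n, z_alpha=1.96):
    """Limits p +/- z_alpha * sqrt(pq/n) when P is unknown."""
    p = successes / n
    margin = z_alpha * sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Coin tossed 400 times, 216 heads, H0: P = 1/2
z_coin = z_proportion(216, 400, 0.5)
print(round(z_coin, 3), abs(z_coin) < 1.96)

# Survey: 180 illiterate families in a sample of 800, population of 2000
low, high = probable_limits(180, 800)
print(round(low, 3), round(high, 3))
print(round(low * 2000), round(high * 2000))
```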
Student’s t- Test
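• Student’s t-test is used when the sample size is less than 30 and the population S.D. is unknown.
• The probability density function of the t-distribution with $\nu$ degrees of freedom is given by
$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$
• The number of degrees of freedom is the number of values in a set which may be assigned arbitrarily.

Example: Given the constraint $x + y + z = 1$, we are free to choose any two of the values (say $y$ and $z$) arbitrarily, and the third is then determined; hence the number of degrees of freedom is 2.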
Student’s t- Test for single mean
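The following assumptions are made for the t-test
• The sample size is small ($n < 30$)
• Null Hypothesis $H_0: \mu = \mu_0$ (sample mean = population mean)
• t-statistic is given by $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n-1}}$ with $n - 1$ degrees of freedom
• 95% and 99% confidence limits for $\mu$: $\bar{x} \pm t_{0.05}\frac{s}{\sqrt{n-1}}$ and $\bar{x} \pm t_{0.01}\frac{s}{\sqrt{n-1}}$
• $\bar{x}$, $s$ are the mean and S.D. of the sample, with $s^2 = \frac{\sum (x - \bar{x})^2}{n}$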
Problems
1) The nine items of a sample have the following values:45, 47, 50, 52, 48, 47, 49, 53, 51. Does the mean of
these differ significantly from the assumed mean of 47.5 at 5% LOS?
Hint: $H_0: \mu = 47.5$, $H_1: \mu \neq 47.5$, $\bar{x} = \sum x_i/n$, $s = \sqrt{\sum (x_i - \bar{x})^2/n} = 2.469$, $n = 9$, d.f. $= 8$
2) The heights of 10 males of a given locality are found to be 175,168,155,170, 152, 170, 175, 160, 160 and
165 cms. Based on this sample, find the 95% confidence limits for the height of males in that locality.
3) A fertilizer mixing machine is set to give 12 kg of nitrate for quintal bag of fertilizer. Ten 100 kg bags are
examined, the percentage of nitrate per bag are as follows: 11, 14, 13, 12, 13, 12, 13, 14, 11, 12. Are there
any reasons to believe that the machine is defective? Value of t for 9 degree of freedom is 2.262.
Hint: $H_0: \mu = 12$, $\bar{x} = \sum x_i/n$, $s = \sqrt{\sum (x_i - \bar{x})^2/n}$, $n = 10$, d.f. $= 9$
4) A machinist is making engine parts with an axle diameter of 0.7 inch. A random sample of 10 parts shows a mean diameter of 0.742 inch with a SD of 0.04 inch. On the basis of the sample, test whether the work is meeting the specification.
Hint: $H_0: \mu = 0.7$, $H_1: \mu \neq 0.7$ ⇒ two-tailed test, $\bar{x} = 0.742$, $s = 0.04$, $n = 10$, d.f. $= 9$
5) The mean lifetime of a sample of 25 bulbs is found as 1550 hours with a SD of 120h. The company
manufacturing the bulbs claims that the average life of their bulbs is 1600h. Is the claim acceptable at 5%
LOS?
6) The average breaking strength of steel rods is specified to be 18.5 thousand pounds. To test this a sample of
14 rods was tested. The mean and standard deviation obtained were 17.85 and 1.955 respectively. Is the
result of the experiment significant with 95% confidence?
7) Show that the 95% confidence limits for the mean $\mu$ of the population are $\bar{x} \pm t_{0.05}\frac{s}{\sqrt{n-1}}$. Deduce that for a random sample of 16 values with mean 41.5 inches and sum of squares of the deviations from the mean 135 inches², drawn from a normal population, the 95% confidence limits for the mean of the population are 39.9 and 43.1 inches.
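Problems 1 and 7 can be worked through with a short Python sketch. The helper names are illustrative assumptions, the S.D. uses the divisor $n$ as in these notes, and the value 2.131 is the standard two-tailed 5% table value of $t$ for 15 degrees of freedom.

```python
from math import sqrt
from statistics import fmean, pstdev

def t_single_mean(sample, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n - 1)), with s computed using divisor n."""
    n = len(sample)
    s = pstdev(sample)
    t = (fmean(sample) - mu0) / (s / sqrt(n - 1))
    return t, n - 1  # statistic and degrees of freedom

def confidence_limits(mean, s, n, t_alpha):
    """Limits x_bar +/- t_alpha * s / sqrt(n - 1)."""
    margin = t_alpha * s / sqrt(n - 1)
    return mean - margin, mean + margin

# Problem 1: nine items, assumed mean 47.5
t, df = t_single_mean([45, 47, 50, 52, 48, 47, 49, 53, 51], 47.5)
print(round(t, 3), df)

# Problem 7: n = 16, mean 41.5, sum of squared deviations 135
low, high = confidence_limits(41.5, sqrt(135 / 16), 16, 2.131)
print(round(low, 1), round(high, 1))
```

For Problem 1 the computed $t$ is then compared with the two-tailed 5% table value for 8 degrees of freedom.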
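Student’s t- Test for difference between means of two independent samples of sizes $n_1$ and $n_2$
t-statistic is given by $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$

which follows the t-distribution with $n_1 + n_2 - 2$ degrees of freedom. $s_1$ and $s_2$ are the S.D.s of the samples. If $n_1 = n_2 = n$ then $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2 + s_2^2}{n - 1}}}$ with $2n - 2$ degrees of freedom.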

Problems
1) A group of 10 rats fed on a diet A and another group of 8 rats fed on a different diet B recorded the following increases in weight in gms. Test whether diet A is superior to diet B.
Diet A   5  6  8  1  12  4   3  9  6  10
Diet B   2  3  6  8  1   10  2  8  -  -

Solution: $H_0: \mu_1 = \mu_2$ (no difference between Diet A and Diet B)

$H_1: \mu_1 > \mu_2$ (diet A is superior to diet B)

$\bar{x}_1 = \frac{\sum x_{1i}}{n_1} = 6.4$, $s_1 = \sqrt{\frac{\sum (x_{1i} - \bar{x}_1)^2}{n_1}} = 3.2$, $\bar{x}_2 = \frac{\sum x_{2i}}{n_2} = 5$, $s_2 = \sqrt{\frac{\sum (x_{2i} - \bar{x}_2)^2}{n_2}} = 3.2016$
At 5% LoS, $t_\alpha = 1.746$
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = 0.869$, d.f. $= 10 + 8 - 2 = 16$, $t < t_\alpha$
⇒ Accept $H_0$, i.e., reject the claim that Diet A is superior to Diet B
2) Samples of two types of electric bulbs were tested for length of life and the following data were obtained

           Size   Mean     SD
Sample 1   8      1234 h   36 h
Sample 2   7      1036 h   40 h
Is the difference in the means sufficient to warrant that type 1 bulbs are superior to type 2 bulbs?

3) The table gives the biological values of protein from 6 cows’ milk and 6 buffaloes’ milk. Examine whether the differences are significant.

Cow’s milk       1.8  2.0  1.9  1.6  1.8  1.5
Buffalo’s milk   2.0  1.8  1.8  2.0  2.1  1.9
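The pooled two-sample statistic can be sketched in Python as below, using the rat data of Problem 1. The function name `t_two_samples` is an illustrative assumption, and the sample variances use the divisor $n$ as in these notes. With these definitions the rat data give $s_1^2 = 10.24$, $s_2^2 = 10.25$ and $t \approx 0.87$.

```python
from math import sqrt
from statistics import fmean, pvariance

def t_two_samples(x, y):
    """Pooled t-statistic for two independent small samples."""
    n1, n2 = len(x), len(y)
    # n*s^2 is the sum of squared deviations, since s^2 uses divisor n
    pooled = (n1 * pvariance(x) + n2 * pvariance(y)) / (n1 + n2 - 2)
    t = (fmean(x) - fmean(y)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2  # statistic and degrees of freedom

diet_a = [5, 6, 8, 1, 12, 4, 3, 9, 6, 10]
diet_b = [2, 3, 6, 8, 1, 10, 2, 8]

t, df = t_two_samples(diet_a, diet_b)
print(round(t, 3), df)
print("accept H0" if t < 1.746 else "reject H0")  # one-tailed, 5% LoS, 16 d.f.
```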
Student’s t- Test for equality of means of two dependent samples
• Students average marks 𝑥 in Test 1 and average marks 𝑦 in Test 2 after remedial classes
• B.P. of patients undergoing YOGA treatment before and after treatment
In the above situations, we have to test whether the two means $\mu_1$ and $\mu_2$ are equal ($H_0$) or not ($H_1$).
For $n$ dependent (paired) samples $x$, $y$, let $d_i = x_i - y_i$, and let $\bar{d}$ and $s_d$ be the mean and S.D. of the differences (keeping their signs); then the t-statistic, with $n - 1$ degrees of freedom, is given by

$t = \frac{\bar{d}}{s_d/\sqrt{n-1}}$

1) Eleven school boys were given a test in drawing. They were given a month’s further tuition and a second test of equal difficulty was held at the end of it. Do the marks give evidence that the students have benefited from the extra coaching?

Boys            1   2   3   4   5   6   7   8   9   10  11
Marks I test    23  20  19  21  18  20  18  17  23  16  19
Marks II test   24  19  22  18  20  22  20  20  23  20  18

Solution: $H_0$: no improvement between the marks of Test 1 and Test 2

$H_1$: improvement in Test 2
$d = -1, 1, -3, 3, -2, -2, -2, -3, 0, -4, 1$
At 5% LoS (one-tailed, $11 - 1 = 10$ d.f.), $t_\alpha = 1.812$
$\bar{d} = \frac{\sum d_i}{n} = -1.09$, $s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n}} = 2.02$ ⇒ $|t| = 1.707 < t_\alpha$ ⇒ Accept $H_0$, i.e., the marks do not give significant evidence of improvement.
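A minimal Python sketch of the paired test, applied to the marks of the eleven boys, is given below. The function name `t_paired` is an illustrative assumption and $s_d$ uses the divisor $n$ as in these notes. For these data $\bar{d} = -12/11$ and $s_d \approx 2.02$, so $|t| \approx 1.707$ with 10 degrees of freedom.

```python
from math import sqrt
from statistics import fmean, pstdev

def t_paired(first, second):
    """Paired t-statistic d_bar / (s_d / sqrt(n - 1)) for d_i = first_i - second_i."""
    d = [x - y for x, y in zip(first, second)]
    n = len(d)
    t = fmean(d) / (pstdev(d) / sqrt(n - 1))
    return t, n - 1  # statistic and degrees of freedom

test_1 = [23, 20, 19, 21, 18, 20, 18, 17, 23, 16, 19]
test_2 = [24, 19, 22, 18, 20, 22, 20, 20, 23, 20, 18]

t, df = t_paired(test_1, test_2)
print(round(t, 3), df)
```

The sign of $t$ is negative because the second test marks are, on average, higher; the comparison with the table value uses $|t|$.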
𝑭-test
• F-test is used to test the equality of population variances
• The F distribution is given by
$f(x) = \frac{1}{x\,\beta\left(\frac{d_1}{2}, \frac{d_2}{2}\right)} \sqrt{\frac{(d_1 x)^{d_1}\, d_2^{d_2}}{(d_1 x + d_2)^{d_1 + d_2}}}$
where $d_1$ and $d_2$ are the degrees of freedom
• Suppose two samples $x_1, x_2, x_3, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$ have been drawn from normal populations having the same variance $\sigma^2$
• i.e., Null Hypothesis $H_0: \sigma_x^2 = \sigma_y^2 = \sigma^2$
• Test statistic is given by $F = \frac{u^2}{v^2}$, where $u^2$ is the larger of $\frac{n_1 s_1^2}{n_1 - 1}$ and $\frac{n_2 s_2^2}{n_2 - 1}$, and $v^2$ is the other one, with
$s_1^2 = \frac{\sum_{i=1}^{n_1} (x_i - \bar{x})^2}{n_1}$, $s_2^2 = \frac{\sum_{i=1}^{n_2} (y_i - \bar{y})^2}{n_2}$
• Test statistic is calculated with $n_1 - 1$, $n_2 - 1$ degrees of freedom (numerator d.f. first).
• Critical values $F_\alpha(n_1 - 1, n_2 - 1)$ are read from an F-table, in which columns specify the numerator degrees of freedom (DF1), while rows set the denominator’s (DF2).
1) A sample of size 13 gave an estimated population variance of 3.0, while another sample of size 15 gave an estimate of 2.5. Could both samples be from populations with the same variance?
Solution: $n_1 = 13$, $n_2 = 15$, $s_1^2 = 3$, $s_2^2 = 2.5$
$H_0: \sigma_x^2 = \sigma_y^2$
$\frac{n_1 s_1^2}{n_1 - 1} = 3.25$, $\frac{n_2 s_2^2}{n_2 - 1} = 2.6786$; $3.25 > 2.6786$ ⇒ $u^2 = 3.25$, $v^2 = 2.6786$
$F = u^2/v^2 = 1.2133$
$F_\alpha(12, 14) = 2.53$ ⟹ $F < F_\alpha$ ⟹ $H_0$ is accepted, i.e., both the samples are from populations with the same variance.
2) Two samples of sizes 9 and 8 gave the sums of squares of deviations from their respective
means equal to 160 and 91 respectively. Can they be regarded as drawn from the same normal
population?
3) Two independent samples of sizes 7 and 6 have the following values:
Sample A   28  30  32  33  33  29  34
Sample B   29  30  30  24  27  29  -
Examine whether the samples have been drawn from normal populations having the same variance.
Hint: $s_1^2 = \frac{\sum (x_{1i} - \bar{x}_1)^2}{n_1} = 4.490$ ($s_1 = 2.119$), $s_2^2 = \frac{\sum (x_{2i} - \bar{x}_2)^2}{n_2} = 4.472$ ($s_2 = 2.115$)
$\frac{n_1 s_1^2}{n_1 - 1} = 5.238$, $\frac{n_2 s_2^2}{n_2 - 1} = 5.367$ ⇒ $u^2 = 5.367$, $v^2 = 5.238$
⟹ $F = u^2/v^2 = 1.025$, $F_\alpha(5, 6) = 4.39$ ⟹ $F < F_\alpha$
Hence $H_0$ is accepted, i.e., the samples have been drawn from normal populations having the same variance
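The $F$ statistic as defined in this section (the larger of $\frac{n_1 s_1^2}{n_1 - 1}$ and $\frac{n_2 s_2^2}{n_2 - 1}$ over the smaller) can be sketched in Python as follows. The function names are illustrative assumptions, and $s^2$ is the sample variance with divisor $n$.

```python
from statistics import pvariance

def f_from_summary(n1, s1_sq, n2, s2_sq):
    """F = u^2 / v^2 with u^2 the larger of n*s^2/(n-1); returns F and (num, den) d.f."""
    est1 = n1 * s1_sq / (n1 - 1)
    est2 = n2 * s2_sq / (n2 - 1)
    if est1 >= est2:
        return est1 / est2, (n1 - 1, n2 - 1)
    return est2 / est1, (n2 - 1, n1 - 1)

def f_from_samples(x, y):
    """Same statistic computed from raw data."""
    return f_from_summary(len(x), pvariance(x), len(y), pvariance(y))

# Problem 1: sizes 13 and 15, variances 3.0 and 2.5
f1, df1 = f_from_summary(13, 3.0, 15, 2.5)
print(round(f1, 4), df1)

# Problem 3: raw samples A and B
sample_a = [28, 30, 32, 33, 33, 29, 34]
sample_b = [29, 30, 30, 24, 27, 29]
f3, df3 = f_from_samples(sample_a, sample_b)
print(round(f3, 4), df3)
```

Returning the degrees of freedom in (numerator, denominator) order makes it clear which column and row of the F-table to consult.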
𝝌𝟐 -test for goodness of fit
• $\chi^2$-test is used to test the goodness of fit, i.e., how well a theoretical distribution fits the observed data
• Let $o_1, o_2, o_3, \ldots, o_n$ and $e_1, e_2, e_3, \ldots, e_n$ be the observed and expected frequencies, such that $\sum o_i = N = \sum e_i$
• Null Hypothesis $H_0$: The theoretical distribution is a good fit to the observed frequencies.
• Test statistic is given by $\chi^2 = \sum_{i=1}^{n} \frac{(o_i - e_i)^2}{e_i}$
• The test statistic is evaluated with $\nu = n - c$ degrees of freedom, where $n$ is the number of observed classes and $c$ is the number of constraints required for the estimation of the expected frequencies. The constraints may be the following:
• $\sum o_i = N = \sum e_i$
(This constraint applies in all cases, hence we lose 1 d.f.)
• Number of parameters estimated from the observed distribution ($p$)
(Example: if we fit a Poisson distribution to the given data, we estimate its parameter $m$ from the observed frequencies, hence we lose 1 d.f. → $p = 1$)
• Pooling (if $e_i < 5$, it should be pooled with adjacent frequencies, i.e., if $k$ of the $e_i$ are $< 5$, then we lose $k$ d.f.)
• Critical values $\chi^2_\alpha$ at $n - (1 + p + k)$ d.f. are read from a $\chi^2$-table.
• Accept $H_0$ if $\chi^2 < \chi^2_\alpha$
1) The following data relate to the number of mistakes on each page of a book containing 180 pages.
No. of mistakes/page   0    1   2   3  4  ≥5  Total
No. of pages           130  32  15  2  1  0   180
Test whether the Poisson distribution is a good fit to this observed distribution.
Solution:
• $\bar{x} = \frac{\sum f x}{n} = \frac{72}{180} = 0.4$ ⇒ parameter of the Poisson distribution $m = 0.4$ ⇒ $p = 1$
• P.M.F.: $P(x = r) = \frac{e^{-m} m^r}{r!}$
• Construct the distribution using the above p.m.f.:
  x       P(x)     e_i = 180 P(x)   o_i
  0       0.6703   121              130
  1       0.2681   48               32
  2       0.0536   10               15
  3       0.0072   1                2
  4       0.0007   0                1
  ≥5      0.0001   0                0
  Total   1        180              180
• $H_0$: the Poisson distribution is a good fit for the data (the constraint $\sum o_i = \sum e_i$ costs 1 d.f.)
• Pool the frequencies $e_i$ having value less than 5 with the adjacent class: classes 2, 3, 4 and ≥5 combine to $e = 11$, $o = 18$ ⇒ $k = 3$
• Hence degrees of freedom $= n - (1 + p + k) = 6 - (1 + 1 + 3) = 1$
• $\chi^2 = \sum \frac{(o_i - e_i)^2}{e_i} = \frac{(130 - 121)^2}{121} + \frac{(32 - 48)^2}{48} + \frac{(18 - 11)^2}{11} = 10.457$
• $\chi^2_{0.05,1} = 3.84$; $\chi^2 > \chi^2_{0.05,1}$ ⇒ $H_0$ is rejected
2) The following data relates to the number of mistakes in each page of a book containing 180 pages.

Test whether the Poisson distribution is a good fit to this observed distribution.
3) To an observed frequency distribution, binomial distribution is fitted after estimating p from the
observed data. The observed and theoretical frequencies are given below

Test whether binomial distribution is a good fit.

4) 10000 digits are randomly chosen from a telephone directory and the following data are obtained. Test whether there is equi-distribution of the digits in the telephone directory at 1% level of significance.
5) According to a theory in Genetics, the proportion of beans of four types A, B, C and D in a
generation should be 9:3:3:1. In an experiment, among 1600 beans, the frequency of beans of
each of the above four types were 882, 313, 287 and 118 respectively. Does the result support the
theory?

Hint: the constraint $\sum o_i = \sum e_i$ costs 1 d.f.; no parameter is estimated from the data ⇒ $p = 0$; no pooling ⇒ $k = 0$; d.f. $= 4 - (1 + 0 + 0) = 3$
6) In order to test whether a die is biased, it is thrown 72 times and the results are tabulated as follows:

What is your conclusion?

Hint: d.f. $= 6 - (1 + 0 + 0) = 5$
7) A survey of 64 families with 3 children each is conducted and the number of male children in each family is noted. The results are tabulated as follows. Apply the chi-square test of goodness of fit to test whether male and female children are equiprobable.

Male children   0   1    2    3    Total
Families        6   19   29   10   64

Degrees of freedom $= n - (1 + p + k) = 4 - 1 = 3$
⇒ $\chi^2_\alpha$ with 3 d.f. $= 7.81$, hence $\chi^2 < \chi^2_\alpha$ ⇒ $H_0$ is accepted
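The goodness-of-fit statistic is straightforward to compute once the expected frequencies are known. The sketch below (function and variable names are illustrative assumptions) evaluates it for three of the problems in this section: the Poisson fit after pooling, using the rounded expected frequencies 121, 48 and 11; the 9:3:3:1 genetics ratio; and the 64 families, where equiprobable sexes imply binomial expected frequencies $64 \binom{3}{k} / 2^3$.

```python
from math import comb

def chi_square(observed, expected):
    """Sum of (o - e)^2 / e over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Poisson fit: classes 0, 1 and the pooled class (2 or more mistakes)
chi2_poisson = chi_square([130, 32, 18], [121, 48, 11])

# Genetics: 1600 beans expected in the ratio 9:3:3:1
beans = [882, 313, 287, 118]
expected_beans = [sum(beans) * r / 16 for r in (9, 3, 3, 1)]
chi2_beans = chi_square(beans, expected_beans)

# Families with 3 children: binomial(3, 1/2) expected frequencies
families = [6, 19, 29, 10]
expected_families = [sum(families) * comb(3, k) / 2**3 for k in range(4)]
chi2_families = chi_square(families, expected_families)

print(round(chi2_poisson, 3), round(chi2_beans, 3), round(chi2_families, 3))
```

Each value is then compared with the table value $\chi^2_\alpha$ at $n - (1 + p + k)$ degrees of freedom.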
𝝌𝟐 test of independent attributes

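• Let $A$ and $B$ be two attributes which classify the sample into $m$ and $n$ categories respectively, giving an $m \times n$ table of observed frequencies $o_{ij}$.
• $H_0$: The two attributes $A$ and $B$ are independent
• We compute the expected frequency $e_{ij}$ using $e_{ij} = \frac{o_{i*} \times o_{*j}}{N} = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$
• Compute $\chi^2 = \sum_{i}\sum_{j} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$ with $(m - 1) \times (n - 1)$ degrees of freedom.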

1) The following table shows the result of an experiment to investigate the effect of vaccination of animals against a particular disease. Use the $\chi^2$-test to test the hypothesis that there is no difference between the vaccinated and unvaccinated groups, i.e., vaccination and this disease are independent.

                 Got disease   Did not get disease   Total
Vaccinated       9             42                    51
Not vaccinated   17            28                    45
Total            26            70                    96

Hint: $o_1 = 9$, $o_2 = 42$, $o_3 = 17$, $o_4 = 28$,
$e_1 = \frac{51 \times 26}{96}$, $e_2 = \frac{51 \times 70}{96}$, $e_3 = \frac{45 \times 26}{96}$, $e_4 = \frac{45 \times 70}{96}$
Grand total $N = 96$, d.f. $= (2 - 1)(2 - 1) = 1$
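The expected frequencies and the statistic for a contingency table can be computed as in the following Python sketch (the function name `chi_square_independence` is an illustrative assumption). Applied to the vaccination table it gives $\chi^2 \approx 4.91$ with 1 degree of freedom, to be compared with the 5% table value 3.84.

```python
def chi_square_independence(table):
    """Chi-square statistic and d.f. for an m x n table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency = row total * column total / grand total
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Rows: vaccinated, not vaccinated; columns: got disease, did not get disease
chi2, df = chi_square_independence([[9, 42], [17, 28]])
print(round(chi2, 3), df)
```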
Exercise Problems:
1. A sample of 12 measurements of the diameter of a metal ball gave the mean $\bar{x} = 7.38$ mm with S.D. $s = 1.24$ mm. Find the i) 95% ii) 99% confidence limits for the actual diameter.
Hint: $n = 12 < 30$ ⇒ t-test, $\bar{x} = 7.38$, $s = 1.24$; find the confidence limits using $\bar{x} \pm t_\alpha \frac{s}{\sqrt{n-1}}$
2. A random sample of 16 values from a normal population showed a mean of 53 and a sum of squares of deviations from the mean equal to 150. Can this sample be regarded as taken from the population having 56 as mean? Obtain the 95% and 99% confidence limits of the mean of the population.
Hint: $n = 16 < 30$ ⇒ t-test, $\bar{x} = 53$, $\sum (x_i - \bar{x})^2 = 150$, find $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$, $H_0: \mu = 56$ (sample mean = population mean)
3. A certain stimulus administered to each of 12 patients resulted in the following changes in blood pressure: 5, 2, 8, −1, 3, 0, 6, −2, 1, 5, 0, 4. Can it be concluded that the stimulus will increase the blood pressure?
Hint: the $d_i$ are given, $n = 12 < 30$ ⇒ t-test for dependent samples, $H_0$: the stimulus does not change the blood pressure, $H_1$: the stimulus increases the blood pressure
4. Test the equality of standard deviations for the data given below at 5% level of significance: n1=10 ; n2=14; s1=1.5 ;
s2=1.2
Hint: Equality of s.d. ⇒ F-test
5. The mean height and the SD height of 8 randomly chosen soldiers are 166.9 and 8.29 cm respectively. The
corresponding values of 6 randomly chosen sailors are 170.3 and 8.50 cm respectively. Based on this data, can we
conclude that soldiers are, in general, shorter than sailors?
Hint: $n_1, n_2 < 30$ ⇒ t-test for two independent samples
6. According to the norms established for a mechanical aptitude test, persons who are 18 years old have an average height of
73.2 inches with a standard deviation of 8.6 inches. If 4 randomly selected persons of that age averaged 76.7, test the
hypothesis against the alternative hypothesis at the 0.01 level of significance.
Hint: $H_0: \mu = 73.2$, $n = 4 < 30$ ⇒ t-test for a single mean, $\bar{x} = 76.7$, $s = 8.6$

7. The heights of 10 males of a given locality are found to be 175,168,155,170, 152, 170, 175, 160, 160 and 165 cms. Based
on this sample, find the 95% confidence limits for the height of males in that locality.
Hint: $n = 10 < 30$ ⇒ t-test; compute $\bar{x}$, $s$; find the confidence limits using $\bar{x} \pm t_\alpha \frac{s}{\sqrt{n-1}}$

8. Samples of students were drawn from two universities and, from their weights in kilograms, the means and standard deviations were calculated as shown below. Make a large sample test to test the significance of the difference between the means.
               Mean   S.D.   Sample size
University A   55     10     400
University B   57     15     100
Hint: $\sigma_1 \neq \sigma_2$ are unknown and $s_1$, $s_2$ are known, $H_0: \mu_1 = \mu_2$ (there is no significant difference between the means)

9. A sample of 6 persons in an office revealed an average daily smoking of 10, 12, 8, 9, 16, 5 cigarettes. The average level of smoking in the whole office has to be estimated at 90% level of confidence. $t = 2.015$ for 5 degrees of freedom.
Hint: $n = 6 < 30$ ⇒ t-test; compute $\bar{x}$, $s$; find the confidence limits using $\bar{x} \pm t_{0.1} \frac{s}{\sqrt{n-1}}$, given $t_{0.1} = 2.015$ for 5 d.f.

10. Ten individuals are chosen at random from a population and their heights are found to be in inches 63, 63, 64, 66, 69, 69,
70, 70, 71. Discuss the suggestion that the mean height of universe is 65
Hint: $H_0: \mu = 65$, $n = 9 < 30$ ⇒ t-test for a single mean, calculate $\bar{x}$, $s$
11. The mean life time of sample of 100 fluorescent light bulbs produced by a company is computed to be 1570 hours with a
standard deviation of 120 hours. The company claims that the average life of the bulbs produced by it is 1600 hours. Using
the level of significance of 0.05, is the claim acceptable?
Hint: $n = 100 > 30$ ⇒ z-test, $\bar{x} = 1570$, $\sigma = 120$, $H_0: \mu = 1600$ (the average life of the bulbs produced by the company is 1600 hours)

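12. Fit a Poisson distribution for the following data and test the goodness of fit.
x:   0    1   2   3  4  5  6  Total
f:   273  70  30  7  7  2  1  390
Hint: $n = 7$ classes, goodness of fit ⇒ $\chi^2$-test
$H_0$: the Poisson distribution is a good fit; the constraint $\sum o_i = \sum e_i$ costs 1 d.f.
$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{195}{390} = 0.5 = m$ (parameter of the Poisson distribution) ⇒ $p = 1$,
$e_i = 390\,P(x_i)$: 236.55, 118.27, 29.57, 4.93, 0.62, 0.06, 0.01; four of the $e_i$ are $< 5$ and are pooled ⇒ $k = 4$ ⇒ d.f. $= n - (1 + p + k) = 1$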

13. Among 64 offspring of a certain cross between guinea pigs, 34 were red, 10 were black and 20 were white. According to the genetic model these numbers should be in the ratio 9:3:4. Are the data consistent with the model at 5% level?
Hint: $o_i$: 34, 10, 20; ratio 9:3:4 ⇒ total 16, $e_i$ → $9 \cdot \frac{64}{16}$, $3 \cdot \frac{64}{16}$, $4 \cdot \frac{64}{16}$ = 36, 12, 16
