Key Terms
Population
In statistics, we generally want to study a population.
You can think of a population as a collection of persons,
things, or objects under study.
Population
Determine the population:
We want to know the average (mean) amount of money first-year college
students at Christ University spend on school supplies (excluding books).
We randomly survey 100 first-year students at the college. Three of those
students spent Rs 150, Rs 200, and Rs 225, respectively.
Ans:
The population is all first-year students attending Christ University this
term.
Sample
To study the population, we select a sample.
The idea of sampling is to select a portion (or subset) of the
larger population and study that portion (the sample) to gain
information about the population.
Sample
Determine the sample:
Ans:
The sample is the 100 first-year students surveyed at the college.
Parameter
A parameter is a number that describes a property of the population.
Parameters are usually unknown: because it takes a lot of time and money
to examine an entire population, sampling is a very practical technique
for estimating them.
Parameter
Determine the parameter:
Ans:
The parameter is the average (mean) amount of money spent on school supplies
(excluding books) by all first-year college students at Christ University this term.
Statistic
A statistic is a number that represents a property of the
sample.
Statistic
Determine the statistic:
Ans:
The statistic is the average (mean) amount of money spent on school supplies
(excluding books) by the first-year college students in the sample.
   Relation Between a Population and its Samples
Population – Parameter
Sample – Statistic
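To make the distinction concrete, here is a minimal Python sketch. The population is simulated, an assumption made purely for illustration: 2,000 hypothetical students whose spending is drawn uniformly between Rs 100 and Rs 300. The population mean is the parameter, and the mean of a random sample of 100 students is the statistic that estimates it.

```python
import random
import statistics

# Hypothetical population (NOT real data): spending in Rs of 2,000 students,
# drawn uniformly between Rs 100 and Rs 300 purely for illustration.
random.seed(42)
population = [random.uniform(100, 300) for _ in range(2000)]

# Parameter: a property of the whole population (unknown in practice).
mu = statistics.mean(population)

# Statistic: the same property computed from a random sample of 100 students.
sample = random.sample(population, 100)
x_bar = statistics.mean(sample)

print(f"Parameter (population mean) mu = Rs {mu:.2f}")
print(f"Statistic (sample mean)  x_bar = Rs {x_bar:.2f}")
```

Rerunning with a different seed changes the sample, and therefore the statistic, while the parameter of a fixed population stays the same.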
Categories of Statistical Analysis
Statistical analysis falls into two categories: descriptive statistics and inferential statistics.
Inferential statistics
Inferential statistics allows you to make predictions (“inferences”) from data:
you take data from samples and make generalizations about a population.
Statistical Inference
• Estimation
  – Point
  – Interval
  – Bayesian
• Testing of Hypothesis
  – Parametric
      Large sample: tests based on the normal distribution
      Small sample: tests based on exact sampling distributions (such as t, χ² and F)
  – Non-parametric
  – Sequential: the sample size is not fixed in advance. Instead, data are evaluated
    as they are collected, and further sampling is stopped as soon as a predefined
    stopping rule is met.
  – Bayesian
Why Statistical Tests?
Statistical tests are used to decide, on the basis of sample data, whether a
hypothesis about the distribution of one or more populations should be
rejected or accepted.
Statistical Hypothesis
 ▪ A hypothesis is a contention, based on preliminary observation of what appear to be
   facts, which may or may not be true.
Some examples:
▪ The average rate of inflation in the 1970s was greater than the average rate of inflation in
 the 1990s.
▪ An increase in the proportion of workers belonging to labor unions increases the wage rate
 in a state, ceteris paribus (all other things being equal).
▪ An increase in the unemployment rate reduces the rate of inflation.
HYPOTHESIS AND HYPOTHESIS TESTING
A statistical hypothesis is a claim (assertion, statement, belief, or
assumption) about an unknown population parameter value.
The process that enables a decision maker to test the validity (or
significance) of a claim by analyzing the difference between the value of
the sample statistic and the corresponding hypothesized population
parameter value is called hypothesis testing.
Hypothesis Testing
•   Is also called significance testing
•   Tests a claim about a parameter using evidence (data in a sample)
•   The technique is introduced by considering a one-sample z test
•   The procedure is broken into five steps
•   Each element of the procedure must be understood
Step 1: State the Null Hypothesis (H0) and Alternative Hypothesis (H1 or Ha)
✔   The null hypothesis (H0) is a claim of “no difference in the population”
✔   The alternative hypothesis (Ha) claims “H0 is false”
The problem: In the 1970s, men aged 20–29 in India had a mean body weight μ
of 170 pounds, with a standard deviation σ of 40 pounds. We test whether the
mean body weight in the population now differs.
Null hypothesis H0: μ = 170 (“no difference”)
The alternative hypothesis can be either Ha: μ > 170 (one-sided test) or
Ha: μ ≠ 170 (two-sided test)
Step 2: State the Level of Significance, α (alpha)
The level of significance, usually denoted by α (alpha), is the probability of
rejecting the null hypothesis when it is true,
✔ that is, it is the risk a decision maker takes of rejecting the null hypothesis when
    it is really true.
For example, α = 0.05 means that, if the null hypothesis is true, there is a 5 percent
chance that the test will wrongly reject it. The corresponding confidence level is
1 − α = 95%.
Confidence Level      Critical z (two-tailed)      Level of Significance (α)
90%                   1.645                        10%
95%                   1.96                         5%
99%                   2.576                        1%
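The critical values in the table come from the standard normal distribution. A minimal sketch using only Python's standard library (`statistics.NormalDist`, available since Python 3.8); the helper name `critical_z` is our own, not a library function:

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def critical_z(alpha, two_tailed=True):
    """Critical value z such that the rejection region has total area alpha."""
    # A two-tailed test splits alpha equally between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return std_normal.inv_cdf(1 - tail_area)

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: two-tailed z = {critical_z(alpha):.3f}, "
          f"one-tailed z = {critical_z(alpha, two_tailed=False):.3f}")
```

The two-tailed values reproduce the table. A one-tailed test (Ha: μ > μ0 or Ha: μ < μ0) puts all of α in one tail, so at α = 0.05 its critical value is 1.645 rather than 1.96, and at α = 0.01 it is about 2.326.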
Step 3: Establish the Critical or Rejection Region
The acceptance region shown in the figure is the range of values of the sample statistic spread around the hypothesized
population parameter. The critical (rejection) region consists of the values outside it, and its total probability under H0
equals α. If the value of the sample statistic falls within the acceptance region, the null hypothesis is accepted (not
rejected); otherwise, it is rejected.
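For a two-tailed test of a mean, the acceptance region for the sample mean is μ0 ± z·σ/√n, where z is the critical value. The sketch below applies this to the body-weight example from Step 1; the sample size n = 100 is a hypothetical value chosen for illustration, since the example does not state one, and `acceptance_region` is our own helper name.

```python
from math import sqrt
from statistics import NormalDist

def acceptance_region(mu0, sigma, n, alpha=0.05):
    """Two-tailed acceptance region for the sample mean under H0: mu = mu0."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sigma / sqrt(n)
    return mu0 - z_crit * standard_error, mu0 + z_crit * standard_error

# Body-weight example: mu0 = 170 lb, sigma = 40 lb; n = 100 is HYPOTHETICAL.
low, high = acceptance_region(mu0=170, sigma=40, n=100, alpha=0.05)
print(f"Accept H0 if the sample mean lies between {low:.2f} and {high:.2f} pounds")
```

Any sample mean outside roughly 162.16 to 177.84 pounds would fall in the rejection region at α = 0.05 under these assumptions.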
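Step 4: Construct the Test Statistic
For the one-sample z test, the test statistic is z = (x̄ − μ0) / (σ/√n), where x̄ is the sample mean,
μ0 is the hypothesized population mean, σ is the population standard deviation and n is the sample size.
With large samples (commonly n ≥ 30), the sample standard deviation s may be used when σ is unknown.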
Step 5: Formulate a Decision Rule to Accept or Reject the Null Hypothesis
Compare the calculated value of the test statistic with the critical value (also called the table
value of the test statistic). The decision rules for the null hypothesis are as follows:
• Accept H0 if the test statistic value falls within the area of acceptance.
• Reject H0 otherwise.
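Steps 4 and 5 can be sketched together as one Python function, using only the standard library. The function name `z_test` and its `alternative` parameter are our own choices for illustration, not an existing library API; the sample mean of 178 pounds in the usage line is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test(x_bar, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """One-sample z test. alternative is 'two-sided', 'greater' or 'less'."""
    # Step 4: test statistic z = (x_bar - mu0) / (sigma / sqrt(n))
    z = (x_bar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    # Step 5: compare z with the critical value for the chosen alternative
    if alternative == "two-sided":
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > z_crit
    elif alternative == "greater":
        z_crit = std_normal.inv_cdf(1 - alpha)
        reject = z > z_crit
    elif alternative == "less":
        z_crit = std_normal.inv_cdf(1 - alpha)
        reject = z < -z_crit
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, z_crit, reject

# Body-weight example with a HYPOTHETICAL sample mean of 178 lb from n = 100 men
z, z_crit, reject = z_test(x_bar=178, mu0=170, sigma=40, n=100)
print(f"z = {z:.2f}, critical z = {z_crit:.2f}, reject H0: {reject}")
```

Here z = 2.00 exceeds 1.96, so H0: μ = 170 would be rejected at the 5 percent level in favour of Ha: μ ≠ 170.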
HYPOTHESIS TESTING FOR POPULATION PARAMETERS WITH LARGE SAMPLES
Hypothesis Testing for Single Population Mean
A packaging device is set to fill detergent powder packets with a mean weight of 5 kg, with a standard
deviation of 0.21 kg. The weight of packets can be assumed to be normally distributed. The weight of
packets is known to drift upwards over a period of time due to machine fault, which is not tolerable. A
random sample of 100 packets is taken and weighed. This sample has a mean weight of 5.03 kg. Can we
conclude that the mean weight produced by the machine has increased? Use a 5 per cent level of
significance.
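A worked sketch of this example in Python. Because the concern is an upward drift, the test is one-tailed: H0: μ = 5 kg against Ha: μ > 5 kg, with the population standard deviation σ = 0.21 kg known.

```python
from math import sqrt
from statistics import NormalDist

# Data from the packaging example
mu0, sigma, n, x_bar, alpha = 5.0, 0.21, 100, 5.03, 0.05

# H0: mu = 5 kg; Ha: mu > 5 kg (upward drift), so the test is right-tailed
z = (x_bar - mu0) / (sigma / sqrt(n))
z_crit = NormalDist().inv_cdf(1 - alpha)  # one-tailed critical value

print(f"z = {z:.3f}, critical z = {z_crit:.3f}")
print("Reject H0" if z > z_crit else "Fail to reject (accept) H0")
```

Here z ≈ 1.43 is below the one-tailed critical value 1.645, so at the 5 per cent level the sample does not show that the mean weight has increased.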
The mean lifetime of a sample of 400 fluorescent light bulbs produced by a
company is found to be 1600 hours, with a standard deviation of 150 hours. Test the
hypothesis that the mean lifetime of the bulbs produced in general is higher than
1570 hours, at the α = 0.01 level of significance.
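A worked sketch in Python. The population standard deviation is not given, so, as is usual for large samples, the sample standard deviation s = 150 hours is used in its place. The test is right-tailed: H0: μ = 1570 hours against Ha: μ > 1570 hours.

```python
from math import sqrt
from statistics import NormalDist

# Data from the light-bulb example; with n = 400 the sample standard
# deviation s stands in for the unknown population sigma.
mu0, s, n, x_bar, alpha = 1570, 150, 400, 1600, 0.01

# H0: mu = 1570 hours; Ha: mu > 1570 hours (right-tailed)
z = (x_bar - mu0) / (s / sqrt(n))
z_crit = NormalDist().inv_cdf(1 - alpha)  # one-tailed critical value

print(f"z = {z:.3f}, critical z = {z_crit:.3f}")
print("Reject H0" if z > z_crit else "Fail to reject (accept) H0")
```

Here z = 30 / 7.5 = 4.0, well above 2.326, so H0 is rejected: the data support a mean lifetime higher than 1570 hours at α = 0.01.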
An ambulance service claims that it takes, on average, 8.9 minutes to reach its destination on
emergency calls. To check this claim, the agency that licenses ambulance services timed
50 emergency calls and obtained a mean of 9.3 minutes with a standard deviation of 1.8
minutes. Does this constitute evidence, at the 1 per cent significance level, that the claimed
figure is too low?
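A worked sketch in Python. "The claimed figure is too low" means the true mean is higher than claimed, so the test is right-tailed: H0: μ = 8.9 minutes against Ha: μ > 8.9 minutes. With n = 50, the sample standard deviation s = 1.8 minutes is used in place of σ.

```python
from math import sqrt
from statistics import NormalDist

# Data from the ambulance example (large sample, so s replaces sigma)
mu0, s, n, x_bar, alpha = 8.9, 1.8, 50, 9.3, 0.01

# H0: mu = 8.9 minutes; Ha: mu > 8.9 minutes (right-tailed)
z = (x_bar - mu0) / (s / sqrt(n))
z_crit = NormalDist().inv_cdf(1 - alpha)  # one-tailed critical value

print(f"z = {z:.3f}, critical z = {z_crit:.3f}")
print("Reject H0" if z > z_crit else "Fail to reject (accept) H0")
```

Here z ≈ 1.57 is below 2.326, so at the 1 per cent level the data do not provide sufficient evidence that the claimed figure is too low.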
Type I Error (False Positive Error)
A Type I error occurs when the null hypothesis is actually
true but is rejected by the test.
Let’s use a shepherd and wolf example.
Let’s say that our null hypothesis is that there is “no
wolf present.”
That is, the actual condition was that there was no wolf
present; however, the shepherd wrongly indicated
there was a wolf present by calling “Wolf! Wolf!”
This is a type I error or false positive error.
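The level of significance α is exactly the Type I error rate. A small simulation can illustrate this: draw many samples from a population where H0 really is true and count how often a two-sided z test rejects it. The population settings (mean 170, σ = 40, normal, n = 30) are assumptions borrowed loosely from the body-weight example; the function name is our own.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_i_error_rate(mu0=170, sigma=40, n=30, alpha=0.05, trials=5000, seed=1):
    """Estimate P(reject H0 | H0 true) for a two-sided one-sample z test."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Sample from a normal population where H0 (mean = mu0) is really true
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:  # "crying wolf" although no wolf is present
            rejections += 1
    return rejections / trials

print(f"Estimated Type I error rate: {type_i_error_rate():.3f} (alpha = 0.05)")
```

The estimate should land close to 0.05; it is not exactly 0.05 because of random variation across simulated samples.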
Type II Error (False Negative)
A Type II error occurs when the null hypothesis is
actually false but is not rejected (accepted as true) by
the test.
Again, our null hypothesis is that there is “no wolf
present.”
That is, the actual situation was that there was a
wolf present; however, the shepherd wrongly
indicated there was no wolf present.
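The Type II error rate (often written β) depends on how far the truth is from H0. The sketch below estimates β by simulation and compares it with the exact value from the normal distribution. The true mean of 190 pounds, σ = 40, n = 30 and normal population are hypothetical settings chosen for illustration; the function names are our own.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_ii_error_rate(true_mu, mu0=170, sigma=40, n=30, alpha=0.05,
                       trials=5000, seed=2):
    """Estimate P(fail to reject H0 | H0 false) for a two-sided z test."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    misses = 0
    for _ in range(trials):
        # Sample from a population where H0 is false (true mean is true_mu)
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / (sigma / sqrt(n))
        if abs(z) <= z_crit:  # the wolf is there, but the shepherd does nothing
            misses += 1
    return misses / trials

def exact_type_ii(true_mu, mu0=170, sigma=40, n=30, alpha=0.05):
    """Exact beta for the same test: P(-z_crit <= Z + shift <= z_crit)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = (true_mu - mu0) / (sigma / sqrt(n))
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)

beta_sim = type_ii_error_rate(true_mu=190)
beta_exact = exact_type_ii(true_mu=190)
print(f"Simulated beta = {beta_sim:.3f}, exact beta = {beta_exact:.3f}")
```

Under these assumptions β is roughly 0.22; moving the true mean further from 170, or increasing n, makes a Type II error less likely.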
Let’s start with our shepherd/wolf example
Null Hypothesis          Type I Error / False Positive        Type II Error / False Negative
Wolf is not present      Shepherd thinks wolf is present      Shepherd thinks wolf is NOT present
                         (shepherd cries wolf) when no wolf   (shepherd does nothing) when a wolf
                         is actually present                  is actually present
Cost Assessment          Costs (actual costs plus shepherd    Replacement cost for the sheep eaten
                         credibility) associated with         by the wolf, and replacement cost
                         scrambling the townsfolk to kill     for hiring a new shepherd
                         the non-existent wolf
                           Null hypothesis is true           Null hypothesis is false
Reject null hypothesis     Type I error (false positive)     Correct outcome (true positive)
Fail to reject null        Correct outcome (true negative)   Type II error (false negative)
hypothesis
A Type I error can be thought of as “convicting an innocent person” and a Type II error as
“letting a guilty person go free”.
Illustration: The Shawshank Redemption
Null Hypothesis              Type I Error / False Positive       Type II Error / False Negative
Person is not guilty of      Person is judged guilty when they   Person is judged not guilty when
the crime                    did not commit the crime            they actually did commit the crime
                             (convicting an innocent person)     (letting a guilty person go free)
Cost Assessment              Social costs of sending an          Risks of letting a guilty criminal
                             innocent person to prison and       roam the streets and commit
                             denying them their personal         future crimes
                             freedoms (which in our society
                             is considered an almost
                             unbearable cost)
Null Hypothesis    Type I Error / False Positive         Type II Error / False Negative
Medicine A cures   (H0 true, but rejected as false)      (H0 false, but accepted as true)
Disease B          Medicine A cures Disease B, but is    Medicine A does not cure Disease B,
                   rejected as false                     but is accepted as true
Cost Assessment    Lost opportunity cost for rejecting   Unexpected side effects (maybe even
                   an effective drug that could cure     death) for using a drug that is not
                   Disease B                             effective
Hence, many textbooks and instructors say that a Type I (false positive) error
is worse than a Type II (false negative) error.