
Hypothesis Testing Basics Guide

The document defines key terminology used in hypothesis testing:

• The null and alternative hypotheses represent the two opposing possibilities being tested. The null hypothesis is assumed true by default.

• The test statistic summarizes the sample data so that its value can be judged plausible or implausible under the null hypothesis.

• The rejection region is the range of test statistic values considered too extreme to support the null hypothesis; a critical value separates this region from the rest.

• The significance level sets the probability of rejecting the null hypothesis when it is actually true, with smaller levels making rejection less likely.


Terminology

A hypothesis test takes two opposing possibilities and checks which one is better supported by the available data.
Specifically, the data is summarized by a single value which is judged to be either a plausible or implausible outcome
using probability. A plausible value supports one possibility, while an implausible value supports the other. This logic is
consistent with the intuition expressed in the introductory coin example.

To define the boundary that separates "plausible" from "implausible", we need to be familiar with the terminology
associated with hypothesis testing.

Null and Alternative Hypotheses


The two "opposing possibilities" mentioned are called the null hypothesis and the alternative hypothesis. They are often
denoted as H_0 and H_1 , respectively. These hypotheses are usually mathematical statements about parameters of
interest. For example, "a coin is fair" can be expressed as the hypothesis: a Bernoulli parameter p = 0.5 .

The null hypothesis often takes a "status quo" position, meaning it is the statement assumed to be true by default. In
turn, the alternative hypothesis is typically the statement that a researcher has interest in proving.

In conducting a hypothesis test, the calculations are performed assuming the null hypothesis is true. After weighing the
evidence, the researcher decides to either:

• Fail to reject the null hypothesis, or

• Reject the null hypothesis in favor of the alternative hypothesis.

In other words, without sufficient evidence supporting H_1 , we keep assuming the default of H_0 . Otherwise, sufficient
evidence favoring H_1 would suggest that H_0 ought to be rejected.

MORE INFORMATION

Note that the first decision does not say "accept the null hypothesis". Strongly affirming that something is true is
arguably beyond the scope and ability of a hypothesis test. Thus, the phrase "fail to reject" is more accurate and
preferred.



Test Statistic
The test statistic is a statistic as defined in Section 1.2, which is used to reject or not reject the null hypothesis. This is
achieved by summarizing the sample observations while assuming the null hypothesis is true. Using its sampling
distribution, we can determine whether the calculated test statistic from the data is considered "plausible" or
"implausible".

COACH'S REMARKS

The literature on statistics tends to use certain terms rather loosely. For example, both \bar{X} and \bar{x} are often simply referred
to as "sample mean", in spite of the inherent difference between the two, as previously discussed.

The term "test statistic" is no different. However, less emphasis is placed on a test statistic being a random variable
in hypothesis testing. Therefore, we hereby use "test statistic" to refer only to the value calculated from the data.

If a test statistic is near either tail of the sampling distribution, then the data appears to be a rare occurrence (i.e.
implausible). Conversely, a test statistic closer to the center of the sampling distribution suggests that the data appears
to be a typical occurrence (i.e. plausible). Keep in mind that the sampling distribution is based on the null hypothesis
being true.

When a test statistic is in either tail of the sampling distribution, perhaps it is not true that the data was a rare
occurrence. Instead, the data may have come from a different distribution altogether. This implies that the null
hypothesis is actually incorrect. In other words, an extreme test statistic would support the alternative hypothesis more
than the null hypothesis.

To assist in learning the rest of the jargon, we will assume the following trivial setup throughout this subsection:

• There is one sample observation, X , with mean μ and variance σ² .

• The hypothesis test investigates the value of μ .

• The test statistic is x , the observed value of X .

• The sampling distribution is a normal distribution.

Rejection Region and Critical Value


The rejection region is the range of test statistic values that we consider "too extreme" and thus decide to reject H_0 in
favor of H_1 . A critical value is a value that separates the rejection region from the rest of the possible test statistic
values.



COACH'S REMARKS

Before continuing, it is important to distinguish between two-tailed tests and one-tailed tests.

• Two-tailed: both tails of the sampling distribution are included in the rejection region.

• One-tailed: only one tail of the sampling distribution is included in the rejection region.

We limit our discussion to two-tailed tests for now. The rejection region can be written as

[x ≤ a] ∪ [x ≥ b]

meaning a test statistic x is "too extreme" if it is smaller than a or greater than b , in which case H_0 would be rejected.
The critical values a and b are chosen such that both tails are symmetrical. Since X is normally distributed, the rejection
region can also be written in terms of the standard normal distribution, such as

[z \le -c] \cup [z \ge c] = \left\vert\, z \,\right\vert \ge c

where the test statistic is now z = \dfrac{x - \mu}{\sigma} . In this case, notice that we may avoid keeping track of two critical values, -c
and c , by taking the absolute value of z .
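As a minimal sketch of this standardization (the function name and the example numbers are illustrative assumptions, not from the text):

```python
# Minimal sketch: standardizing an observation into a z test statistic.
# The function name and the example numbers are illustrative assumptions.
def z_statistic(x, mu, sigma):
    """Return z = (x - mu) / sigma for an observation x under H0."""
    return (x - mu) / sigma

# An observation of 12 under H0: mu = 10, with sigma = 2
z = z_statistic(12, 10, 2)
print(abs(z))  # 1.0, to be compared against a critical value c
```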

Significance Level
The critical value sets the boundary for how extreme the test statistic must be in order to reject the null hypothesis. A
critical value is determined by setting a significance level denoted by α , where

\Pr(\left\vert\, Z \,\right\vert \ge c \mid H_0 \text{ is true}) = \alpha

The significance level is the probability of rejecting H_0 , assuming it is true. Clearly, we would prefer not to reject H_0
if it is true, hence \alpha is typically a small percent. The closer \alpha is to 0, the less likely that H_0 will be rejected; a
test statistic would need to be more extreme to provide evidence against H_0 .
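Under the normal setup we have been assuming, the critical value can be recovered from \alpha with an inverse-CDF call. A sketch using Python's standard library (`statistics.NormalDist`, available since Python 3.8; the function name is an assumption):

```python
from statistics import NormalDist

def two_tailed_critical_value(alpha):
    """Critical value c satisfying Pr(|Z| >= c) = alpha for standard normal Z."""
    # Each tail holds alpha/2, so c is the (1 - alpha/2) quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)

print(round(two_tailed_critical_value(0.05), 2))  # 1.96
```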

The following graph illustrates the concepts. It shows a standard normal distribution assuming H_0 is true.



From start to finish, a hypothesis test looks like this:

• Determine an appropriate significance level for the test. A common value is \alpha = 0.05 .

• From the sampling distribution which assumes H_0 is true, determine the critical value that corresponds to the
chosen \alpha .

• Collect data, and use it to calculate the test statistic.

• Compare the test statistic to the critical value based on the rejection region. For a two-tailed test under the setup
we have been assuming:

◦ If \left\vert\, z \,\right\vert \ge c , then it is in the rejection region; reject H_0 .

◦ If \left\vert\, z \,\right\vert < c , then it is not in the rejection region; do not reject H_0 .
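The steps above can be sketched end to end for the two-tailed normal setup. This helper function is an assumption built for illustration, not code from the text:

```python
from statistics import NormalDist

def two_tailed_z_test(x, mu0, sigma, alpha=0.05):
    """Two-tailed z-test of H0: mu = mu0 given one observation x.

    Returns (z, c, reject): H0 is rejected exactly when |z| >= c.
    """
    z = (x - mu0) / sigma                      # step 3: test statistic
    c = NormalDist().inv_cdf(1 - alpha / 2)    # step 2: critical value
    return z, c, abs(z) >= c                   # step 4: compare

# Illustrative numbers: observing x = 7 when H0 claims mu = 5, with sigma = 1
z, c, reject = two_tailed_z_test(7, 5, 1)
print(round(z, 2), round(c, 2), reject)  # 2.0 1.96 True
```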

Let's try performing a complete hypothesis test.

EXAMPLE 4.1.1

It is believed that the mean age of licensed drivers in 2017 is 43.7. To test whether this is the case, one licensed
driver's age is observed to be 60 in 2017. These ages are normally distributed with variance 80.

Test whether the mean age of licensed drivers in 2017 differs from 43.7 at the 5% significance level.



SOLUTION

Start by formally stating the null and alternative hypotheses. Letting \mu represent the mean age of licensed drivers in
2017,

H_0: \mu = 43.7


H_1: \mu \ne 43.7

Next, determine the critical value. Since \alpha = 0.05 ,

\Pr(\left\vert\, Z \,\right\vert \ge c) = 0.05

The graph indicates that c is the (2.5 + 95) = 97.5th percentile of Z . From the Z -table, obtain c = 1.96 .

Next, calculate the test statistic.

z = \dfrac{x - \mu}{\sigma} = \dfrac{60 - 43.7}{\sqrt{80}} = 1.82

Since \left\vert\, 1.82 \,\right\vert < 1.96 , the test statistic does not fall in the rejection region, and thus we do not reject
the null hypothesis of \pmb{\mu} \mathbf{\, = 43.7} at the 5% significance level. This means the observation of a 60-
year-old licensed driver is plausible if the mean age is in fact 43.7.
\tag*{$\blacksquare$}
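The arithmetic in this solution can be double-checked with a short script, using `statistics.NormalDist` from the Python standard library in place of a Z-table:

```python
from statistics import NormalDist

# Re-deriving the numbers of Example 4.1.1
z = (60 - 43.7) / 80 ** 0.5          # test statistic
c = NormalDist().inv_cdf(0.975)      # 97.5th percentile of Z
print(round(z, 2), round(c, 2))      # 1.82 1.96
print(abs(z) >= c)                   # False -> fail to reject H0
```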

While no additional terminology is required to perform a hypothesis test, there are a few others worth mentioning.



p-Value
A p -value is the probability of observing the test statistic or a value more extreme, assuming H_0 is true. Thus, for a
test statistic z , the p -value is

\Pr(\left\vert\, Z \,\right\vert \ge \left\vert\, z \,\right\vert \mid H_0 \text{ is true})

This is similar to the definition of a significance level, with \left\vert\, z \,\right\vert replacing the critical value c .
Therefore, instead of comparing \left\vert\, z \,\right\vert with c to make a decision, we may compare the p -value with
the significance level \alpha .

• If p -value \le \alpha , then reject H_0 .

• If p -value > \alpha , then do not reject H_0 .

It should be evident that the two comparisons are equivalent. A larger \left\vert\, z \,\right\vert results in a smaller p -
value, and vice versa.

What is the p -value for Example 4.1.1?

\begin{align} \Pr(\left\vert\, Z \,\right\vert \ge \left\vert\, z \,\right\vert) & = 2 \cdot \Pr(Z \ge \left\vert\, z \,\right\vert) \\ & = 2
\cdot \Pr(Z \ge 1.82) \\ & = 2(1 - \Pr(Z < 1.82)) \\ & = 2(1 - 0.9656) \\ & = \mathbf{0.0688} \end{align}

As expected, we arrive at the same conclusion to not reject H_0 , since the p -value of 6.88% is greater than the 5%
significance level.
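The p-value calculation can likewise be sketched in code (same standard-library normal distribution as before; the function name is an assumption):

```python
from statistics import NormalDist

def two_tailed_p_value(z):
    """p-value Pr(|Z| >= |z|) for a two-tailed z-test."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

p = two_tailed_p_value(1.82)
print(round(p, 4))  # 0.0688
print(p > 0.05)     # True -> do not reject H0
```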



Type I and Type II Errors
While we are unable to know for certain if a hypothesis is true, we try to make an informed decision with a hypothesis
test. This does not mean hypothesis test decisions are always right, even when all the necessary test assumptions are
met. However, we can assume H_0 to be either true or false, and then consider the impact of making a wrong decision.

A type I error occurs when H_0 is rejected while it is true. The probability of making this error is the significance level
\alpha . In other words, we are willing to make a wrong decision 100\alpha \% of the time when H_0 is true, balanced
by the possibility that H_0 is actually false.

A type II error occurs when H_0 fails to be rejected while it is false. The probability of making this error is denoted as
\beta .

                        H_0 is True         H_0 is False

Reject H_0              Type I Error        Correct Decision

Fail to Reject H_0      Correct Decision    Type II Error

In layman's terms, a type I error is a false positive, while a type II error is a false negative. Hypothesis test decisions are
not error-free; they merely provide sensible judgment given the data.
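A quick Monte Carlo sketch (the simulation setup is an illustration, not from the text) shows that when H_0 is true, a 5%-level two-tailed z-test commits a type I error about 5% of the time:

```python
import random
from statistics import NormalDist

random.seed(0)  # reproducible illustration
c = NormalDist().inv_cdf(0.975)  # 5%-level two-tailed critical value

# Simulate many tests in which H0 really is true (Z ~ standard normal);
# the fraction of rejections estimates the type I error rate.
trials = 100_000
rejections = sum(abs(random.gauss(0, 1)) >= c for _ in range(trials))
print(rejections / trials)  # close to alpha = 0.05
```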

Power of a Test
We would like to make the right decision of rejecting H_0 when it is false. The probability of making this right decision is
called the power of a test, which is denoted by 1 - \beta . This is because rejecting H_0 when it is false is the
complement of a type II error.

Since the condition is that H_0 is false, a power can only be calculated when an explicit alternative to H_0 is provided.

What is the power of the test in Example 4.1.1 if in reality \mu = 48 ?

To calculate this probability, we first need the critical values in the original unit of age.

\dfrac{a - 43.7}{\sqrt{80}}= -1.96 \quad \Rightarrow \quad a = 26.169


\dfrac{b - 43.7}{\sqrt{80}}= 1.96 \quad \Rightarrow \quad b = 61.231

In summary, the rejection region for the test is

[x \le 26.169] \cup [x \ge 61.231]

Therefore, if \mu = 48 , the power of the test is



\begin{align} \Pr([X \le 26.169] \cup [X \ge 61.231]) & = \Pr\left(\left[Z \le \dfrac{26.169 - 48}{\sqrt{80}}\right] \cup \left[Z
\ge \dfrac{61.231 - 48}{\sqrt{80}}\right]\right) \\ & = \Pr([Z \le -2.44] \cup [Z \ge 1.48]) \\ & = \Pr(Z \le -2.44) + \Pr(Z \ge
1.48) \\ & = [1 - \Pr(Z < 2.44)] + 1 - \Pr(Z < 1.48) \\ & = 1 - 0.9927 + 1 - 0.9306 \\ & = \mathbf{0.0767}\end{align}

Intuitively, this low power makes sense. If \mu = 48 , then H_0 should be rejected, since \mu \ne 43.7 . However, there
is not a major difference between \mu = 43.7 and \mu = 48 . As a result, the test is unlikely to detect the distinction, and
thus unlikely to correctly reject H_0 .
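The power computation can be verified numerically. Note that the exact answer (about 0.0769) differs slightly from 0.0767 because the calculation above rounds the z-values to two decimals before consulting the Z-table:

```python
from statistics import NormalDist

Z = NormalDist()
mu_true, sigma = 48, 80 ** 0.5
a, b = 26.169, 61.231  # critical values in the original unit (years)

# Power = Pr(reject H0 | mu = 48) = Pr(X <= a) + Pr(X >= b)
power = Z.cdf((a - mu_true) / sigma) + (1 - Z.cdf((b - mu_true) / sigma))
print(round(power, 4))  # ~0.0769 (0.0767 in the text after rounding z-values)
```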

OTHER INFORMATION

You may be curious what the rejection region is for the coin toss scenario. The null hypothesis is that p = 0.5 for a
binomial sampling distribution with 10 "trials". Note that

\Pr(\le 1 \text{ head}) + \Pr(\ge 9 \text{ heads}) = 0.0215

while \Pr(\le 2 \text{ heads}) + \Pr(\ge 8 \text{ heads}) would exceed 0.05. Therefore, at the 5% significance level, we
reject the null hypothesis if 0, 1, 9, or 10 heads are observed from 10 coin tosses.
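The binomial tail probabilities quoted here are easy to verify with `math.comb`:

```python
from math import comb

def binom_pmf(k, n=10, p=0.5):
    """Pr(k heads in n fair coin tosses)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

narrow = sum(binom_pmf(k) for k in (0, 1, 9, 10))
wide = sum(binom_pmf(k) for k in (0, 1, 2, 8, 9, 10))
print(round(narrow, 4))  # 0.0215
print(round(wide, 4))    # 0.1094, which exceeds 0.05
```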
