Z-test : Formula, Types, Examples

Last Updated : 30 Jan, 2025

After learning about inferential statistics, we now move on to a more specific technique for making decisions based on sample data: the Z-test. Studying entire populations can be time-consuming, costly and sometimes impossible, so instead we take a sample from the population.

This is where the Z-test becomes important. It helps us make inferences about the entire population based on the sample data. It allows us to answer questions like:

Is the sample mean significantly different from a known population mean?
Is there a significant difference between the means of two sample groups?
This article explains what the Z-test is, when to use it and how to perform it, in simple terms.

Understanding Z-Test

A Z-test is a type of hypothesis test that compares a sample's average with the population's average. It computes a Z-score, which tells us how far the sample average is from the population average relative to how much the data normally varies. It is particularly useful when the sample size is large (n > 30). The Z-score, also known as the Z-statistic, is given by:

[Tex]\text{Z-Score} = \frac{\bar{x}-\mu}{\sigma}[/Tex]

where,

[Tex]\bar{x} [/Tex]: mean of the sample.
[Tex]\mu [/Tex]: mean of the population.
[Tex]\sigma [/Tex]: Standard deviation of the population.

Let's understand this with an example. The average annual family income in India is 200k with a standard deviation of 5k, and the average annual family income in Delhi is 300k. The Z-score for Delhi is then:

[Tex]\begin{aligned}\text{Z-Score}&=\frac{\bar{x}-\mu}{\sigma}\\&=\frac{300-200}{5}\\&=20\end{aligned}[/Tex]

This indicates that the average family’s annual income in Delhi is 20 standard deviations above the mean of the population (India).
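The Delhi calculation above can be reproduced in a few lines of Python. This is only a sketch: the function name `z_score` and the variable `delhi_z` are our own and not part of any library.

```python
def z_score(sample_mean, population_mean, population_std):
    """Number of population standard deviations between the two means."""
    return (sample_mean - population_mean) / population_std

# Values from the example above, in thousands
delhi_z = z_score(sample_mean=300, population_mean=200, population_std=5)
print(delhi_z)  # 20.0
```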

For a z-test to provide reliable results these assumptions must be met:

Normal Distribution: The population from which the sample is drawn should be approximately normally distributed.
Equal Variance: The samples being compared should have the same variance.
Independence: All data points should be independent of one another.

Steps to perform Z-test

Identify the null and alternate hypotheses.
Determine the level of significance (α).
Find the critical value of z for that significance level.
Calculate the z-test statistic. Below is the formula for calculating it.
[Tex]Z = \frac{(\overline{x}- \mu)}{\left ( \sigma /\sqrt{n} \right )} [/Tex]
where
- [Tex]\bar{x} [/Tex]: mean of the sample.
- [Tex]\mu [/Tex]: mean of the population.
- [Tex]\sigma [/Tex]: Standard deviation of the population.
- n: sample size.
Compare the test statistic with the critical value and decide whether to reject or fail to reject the null hypothesis.
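The steps above can be sketched as a small helper for a two-tailed test. This is an illustration under stated assumptions, not a reference implementation: the function name `one_sample_z_test` is hypothetical, the input numbers are made up for the demonstration, and the hypotheses (step 1) are fixed as H₀: μ = `population_mean` against a two-sided alternative. It uses only Python's standard library (`statistics.NormalDist`).

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, population_mean, population_std, n, alpha=0.05):
    """Two-tailed one-sample z-test; returns (z, critical value, reject H0?)."""
    # Critical value for a two-tailed test at significance level alpha
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Test statistic: (x_bar - mu) / (sigma / sqrt(n))
    z = (sample_mean - population_mean) / (population_std / sqrt(n))
    # Decision: reject H0 when |z| exceeds the critical value
    return z, z_critical, abs(z) > z_critical

# Hypothetical numbers, chosen only for illustration
z, z_critical, reject = one_sample_z_test(
    sample_mean=52, population_mean=50, population_std=8, n=64
)
print(f"z = {z:.2f}, critical = {z_critical:.2f}, reject H0: {reject}")
```

With these inputs the standard error is 8 / √64 = 1, so the statistic is 2.0, which is just above the 5% critical value of about 1.96.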

Types of Z-test

There are mainly two types of Z-tests. Let’s understand them one by one:

One Sample Z test

A one-sample Z-test is used to determine whether the mean of a single sample is significantly different from a known population mean.

When to use:

The population standard deviation is known.
The sample size is large (usually n>30).
The data is approximately normally distributed.

Suppose a company claims that their new smartphone has an average battery life of 12 hours. A consumer group tests 100 phones and finds an average battery life of 11.8 hours with a known population standard deviation of 0.5 hours.

Step 1: Hypotheses:

H₀: μ=12

H₁: μ≠12

Step 2: Calculate the Z-Score
We can calculate the Z-score using the formula:

[Tex]z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}}[/Tex]

where x̄ = 11.8, μ = 12, σ = 0.5 and n = 100. Substituting these values, we get:

[Tex]z = \frac{11.8 - 12}{\frac{0.5}{\sqrt{100}}} = -4[/Tex]

Step 3: Decision
Since |Z| = 4 > 1.96 (the critical value for α = 0.05), we reject H₀. There is significant evidence against the company's claim.

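Now let's implement this in Python using NumPy and SciPy. Only summary statistics are available here, so the Z-statistic is computed directly from the formula with the known population standard deviation:

Python

import numpy as np
from scipy.stats import norm

# Summary statistics from the example
sample_mean = 11.8
sample_size = 100
population_mean = 12
population_std_dev = 0.5

# Z-statistic using the known population standard deviation
standard_error = population_std_dev / np.sqrt(sample_size)
z_statistic = (sample_mean - population_mean) / standard_error

# Two-tailed p-value from the standard normal distribution
p_value = 2 * norm.sf(abs(z_statistic))

print(f"Z-Statistic: {z_statistic:.4f}")
print(f"P-Value: {p_value:.4f}")

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: The average battery life is different from 12 hours.")
else:
    print("Fail to reject the null hypothesis: The average battery life is not significantly different from 12 hours.")

Output:

Z-Statistic: -4.0000

P-Value: 0.0001

Reject the null hypothesis: The average battery life is different from 12 hours.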

Two Sample Z test

In this test we have two normally distributed, independent populations, and we draw a random sample from each. Let μ₁ and μ₂ be the population means and X̄₁ and X̄₂ the observed sample means. The null hypothesis is:

[Tex]H_{0} : \mu_{1} - \mu_{2} = 0 [/Tex]

and the alternative hypothesis is:

[Tex]H_{1} : \mu_{1} - \mu_{2} \ne 0 [/Tex]

The formula for calculating the z-test statistic is:

[Tex]Z = \frac{\left ( \overline{X_{1}} - \overline{X_{2}} \right ) - \left ( \mu_{1} - \mu_{2} \right )}{\sqrt{\frac{\sigma_{1}^2}{n_{1}} + \frac{\sigma_{2}^2}{n_{2}}}} [/Tex]

where [Tex]\sigma_1[/Tex] and [Tex]\sigma_2[/Tex] are the population standard deviations, and n₁ and n₂ are the sizes of the samples drawn from the populations with means μ₁ and μ₂. Let's look at an example:

There are two groups of students preparing for a competition: Group A and Group B. Group A attended offline classes, while Group B attended online classes. After the examination, each student's score is recorded. We want to determine whether the mean scores of the online and offline classes differ.

Group A: Sample size = 50, Sample mean = 75, Sample standard deviation = 10
Group B: Sample size = 60, Sample mean = 80, Sample standard deviation = 12

Assuming a 5% significance level perform a two-sample z-test to determine if there is a significant difference between the online and offline classes.

Solution:

Step 1: Null & Alternate Hypothesis

Null Hypothesis: There is no significant difference in the mean scores between the online and offline classes.
[Tex] \mu_1 -\mu_2 = 0 [/Tex]
Alternate Hypothesis: There is a significant difference in the mean scores between the online and offline classes.
[Tex] \mu_1 -\mu_2 \neq 0 [/Tex]

Step 2: Significance Level

Significance Level: 5%
[Tex]\alpha = 0.05 [/Tex]

Step 3: Z-Score

[Tex]\begin{aligned}\text{Z-score} &= \frac{(\bar{x}_1-\bar{x}_2)-(\mu_1 -\mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}\\ &= \frac{(75-80)-0}{\sqrt{\frac{10^2}{50}+\frac{12^2}{60}}}\\ &= \frac{-5}{\sqrt{2+2.4}}\\ &= \frac{-5}{2.0976}\\&=-2.384\end{aligned}[/Tex]

Step 4: Look up the critical Z-Score value in the Z-Table for alpha/2 = 0.025

Critical Z-Score = 1.96

Step 5: Compare with the absolute Z-Score value

absolute(Z-Score) > Critical Z-Score, since 2.384 > 1.96
So we reject the null hypothesis: there is a significant difference between the online and offline classes.

Now we will implement the two-sample z-test using NumPy and SciPy:

Python

import numpy as np
import scipy.stats as stats

# Group A (Offline Classes)
n1 = 50
x1 = 75
s1 = 10

# Group B (Online Classes)
n2 = 60
x2 = 80
s2 = 12

# Null Hypothesis = mu_1-mu_2 = 0 
# Hypothesized difference (under the null hypothesis)
D = 0

# Set the significance level
alpha = 0.05

# Calculate the test statistic (z-score)
z_score = ((x1 - x2) - D) / np.sqrt((s1**2 / n1) + (s2**2 / n2))
print('Z-Score:', np.abs(z_score))

# Calculate the critical value
z_critical = stats.norm.ppf(1 - alpha/2)
print('Critical Z-Score:',z_critical)

# Compare the test statistic with the critical value
if np.abs(z_score) > z_critical:
    print("""Reject the null hypothesis.
There is a significant difference between the online and offline classes.""")
else:
    print("""Fail to reject the null hypothesis.
There is not enough evidence to suggest a significant difference between the online and offline classes.""")

# Approach 2: Using P-value

# P-Value: probability, under the null hypothesis, of a Z-score
# at least as extreme as the one observed (two-tailed)
p_value = 2 * (1 - stats.norm.cdf(np.abs(z_score)))
print('P-Value :',p_value)

# Compare the p-value with the significance level
if p_value < alpha:
    print("""Reject the null hypothesis.
There is a significant difference between the online and offline classes.""")
else:
    print("""Fail to reject the null hypothesis.
There is not enough evidence to suggest a significant difference between the online and offline classes.""")

Output:

Z-Score: 2.3836564731139807
Critical Z-Score: 1.959963984540054
Reject the null hypothesis.
There is a significant difference between the online and offline classes.
P-Value : 0.01714159544079563
Reject the null hypothesis.
There is a significant difference between the online and offline classes.

The Z-Table

Solved Examples

Problem 1: A company claims that the average battery life of their new smartphone is 12 hours. A consumer group tests 100 phones and finds the average battery life to be 11.8 hours with a population standard deviation of 0.5 hours. At a 5% significance level, is there evidence to refute the company’s claim?

Solution:

Step 1: State the hypotheses
H₀: μ = 12 (null hypothesis)
H₁: μ ≠ 12 (alternative hypothesis)

Step 2: Calculate the Z-score
Z = (x̄ – μ) / (σ / √n)
= (11.8 – 12) / (0.5 / √100)
= -0.2 / 0.05
= -4

Step 3: Find the critical value (two-tailed test at 5% significance)
Z₀.₀₂₅ = ±1.96

Step 4: Compare Z-score with critical value
|-4| > 1.96, so we reject the null hypothesis.

Conclusion: There is sufficient evidence to refute the company’s claim about battery life.
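As a cross-check of Problem 1, the same decision can be reached through the p-value instead of the critical value. The sketch below uses only Python's standard library; the variable names (`z_p1`, `p_value_p1`) are our own.

```python
from math import sqrt
from statistics import NormalDist

# Inputs from Problem 1
x_bar, mu, sigma, n = 11.8, 12, 0.5, 100
alpha = 0.05

# Test statistic: (x_bar - mu) / (sigma / sqrt(n))
z_p1 = (x_bar - mu) / (sigma / sqrt(n))

# Two-tailed p-value: probability, under H0, of a statistic at least this extreme
p_value_p1 = 2 * (1 - NormalDist().cdf(abs(z_p1)))

print(f"z = {z_p1:.2f}, p-value = {p_value_p1:.6f}")
print("Reject H0" if p_value_p1 < alpha else "Fail to reject H0")
```

The p-value comes out far below 0.05, which agrees with the critical-value comparison in Step 4.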

Problem 2: A researcher wants to compare the effectiveness of two different medications for reducing blood pressure. Medication A is tested on 50 patients, resulting in a mean reduction of 15 mmHg with a standard deviation of 3 mmHg. Medication B is tested on 60 patients, resulting in a mean reduction of 13 mmHg with a standard deviation of 4 mmHg. At a 1% significance level, is there a significant difference between the two medications?

Solution:

Step 1: State the hypotheses
H₀: μ₁ – μ₂ = 0 (null hypothesis)
H₁: μ₁ – μ₂ ≠ 0 (alternative hypothesis)

Step 2: Calculate the Z-score
Z = (x̄₁ – x̄₂) / √((σ₁²/n₁) + (σ₂²/n₂))
= (15 – 13) / √((3²/50) + (4²/60))
= 2 / √(0.18 + 0.2667)
= 2 / √0.4467
= 2 / 0.6683
= 2.99

Step 3: Find the critical value (two-tailed test at 1% significance)
Z₀.₀₀₅ = ±2.576

Step 4: Compare Z-score with critical value
2.99 > 2.576, so we reject the null hypothesis.

Conclusion: There is a significant difference between the effectiveness of the two medications at the 1% significance level.
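The arithmetic in Problem 2 is easy to get wrong by hand, so it can help to recompute the statistic from the stated inputs. The following is a minimal sketch using only the standard library; the variable names are our own, and the sample standard deviations are used in place of the population values, as in the worked solution.

```python
from math import sqrt
from statistics import NormalDist

# Inputs from Problem 2
mean_a, sd_a, n_a = 15, 3, 50   # Medication A
mean_b, sd_b, n_b = 13, 4, 60   # Medication B
alpha = 0.01

# Standard error of the difference between the two means
standard_error = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
z_p2 = (mean_a - mean_b) / standard_error

# Two-tailed critical value at the 1% level
z_crit_p2 = NormalDist().inv_cdf(1 - alpha / 2)

print(f"z = {z_p2:.2f}, critical = {z_crit_p2:.3f}")
print("Reject H0" if abs(z_p2) > z_crit_p2 else "Fail to reject H0")
```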

Z-test – FAQs

What is the main limitation of the z-test?

The main limitation of the Z-test is that the population standard deviation is usually unknown. In practice, when the population's variability is unknown, the sample's variability is used as an estimate of it.

What is the minimum sample for z-test?

A z-test can only be used if the population standard deviation is known and the sample size is 30 data points or larger. Otherwise, a t-test should be employed.

What is the application of z-test?

The z-test is used to determine whether a sample mean differs significantly from a known population mean, and whether there is a significant difference between the means of two independent samples. It can also be used to compare a population proportion to an assumed proportion, or to test the difference between the proportions of two populations.
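The proportion case is not worked through in this article, so here is a rough sketch of a one-proportion z-test using the standard statistic z = (p̂ − p₀) / √(p₀(1 − p₀)/n). The function name `one_proportion_z_test` and the data are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Two-tailed z-test of a sample proportion against an assumed proportion p0."""
    p_hat = successes / n
    # The standard error uses p0, the proportion assumed under H0
    standard_error = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 60 successes in 100 trials, assumed proportion 0.5
z_prop, p_prop = one_proportion_z_test(successes=60, n=100, p0=0.5)
print(f"z = {z_prop:.2f}, p-value = {p_prop:.4f}")
```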

What is the theory of the z-test?

The z test is a commonly used hypothesis test in inferential statistics that allows us to compare two populations using the mean values of samples from those populations, or to compare the mean of one population to a hypothesized value, when what we are interested in comparing is a continuous variable.

pawangfg
