The Z-test is especially useful when you have a large sample size and know the population’s standard deviation. In statistics, different tests are used to compare samples or groups and draw conclusions about populations. These statistical tests assess how probable the observed data would be under a particular assumption or hypothesis, and they offer a framework for evaluating the evidence for or against that hypothesis.
What is Z-Test?
A Z-test is a statistical test used to determine whether the mean of a sample is significantly different from a known population mean when the population standard deviation is known. It is particularly useful when the sample size is large (n > 30).
More generally, a Z-test is any test whose test statistic, under the null hypothesis, can be approximated by a normal distribution. It can also be used to determine whether two sample means are approximately the same or different when their variances are known and the sample sizes are large (typically n ≥ 30).
The Z-test measures the difference between the sample mean and the population mean in units of standard deviation. The resulting Z-score represents the number of standard deviations by which a value deviates from the population mean. This Z-score, also known as the Z-statistic, is in its simplest form:
[Tex]\text{Z-Score} = \frac{\bar{x}-\mu}{\sigma}
[/Tex]
where,
- [Tex]\bar{x}
[/Tex]: mean of the sample.
- [Tex]\mu
[/Tex]: mean of the population.
- [Tex]\sigma
[/Tex]: Standard deviation of the population.
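The z-test assumes that the test statistic (z-score) follows a standard normal distribution under the null hypothesis. When the value being standardized is the mean of a sample of size n, σ is replaced by the standard error σ/√n, as in the test-statistic formula in the steps below.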
Example
The average family annual income in India is 200k, with a standard deviation of 5k, and the average family annual income in Delhi is 300k.
Then the Z-score for Delhi is:
[Tex]\begin{aligned}
\text{Z-Score}&=\frac{\bar{x}-\mu}{\sigma}
\\&=\frac{300-200}{5}
\\&=20
\end{aligned}
[/Tex]
This indicates that the average family’s annual income in Delhi is 20 standard deviations above the mean of the population (India).
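The same arithmetic can be written as a small Python sketch. The helper name `z_score` is our own choice for illustration, and incomes are kept in thousands exactly as in the example.

```python
def z_score(value, population_mean, population_std):
    """Standard score: how many population standard deviations `value` lies from the mean."""
    return (value - population_mean) / population_std

# Figures from the income example (in thousands)
india_mean = 200   # population mean (India)
india_std = 5      # population standard deviation (India)
delhi_mean = 300   # value being standardized (Delhi)

z = z_score(delhi_mean, india_mean, india_std)
print("Z-score for Delhi:", z)  # 20.0
```

A positive result means the value lies above the population mean; a negative one would mean it lies below.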
When to Use Z-test
- The sample size should be large (typically greater than 30). For small samples where the population standard deviation is unknown, the t-test is used instead.
- Samples should be drawn at random from the population.
- The standard deviation of the population should be known.
- Samples that are drawn from the population should be independent of each other.
- The data should be normally distributed; however, for a large sample size, the sampling distribution of the mean is approximately normal because of the central limit theorem.
Hypothesis Testing
A hypothesis is an educated guess or claim about a particular property of a population. Hypothesis testing is a way to use sample data to evaluate such a claim.
- Null Hypothesis: The null hypothesis is a statement that the value of a population parameter (such as proportion, mean, or standard deviation) is equal to some claimed value. We either reject or fail to reject the null hypothesis. The null hypothesis is denoted by H0.
- Alternate Hypothesis: The alternative hypothesis is the statement that the parameter has a value that is different from the claimed value. It is denoted by HA.
- Level of significance: The probability of rejecting the null hypothesis when it is actually true; it sets the threshold for deciding whether to reject H0. Since no test can be 100% certain when accepting or rejecting a hypothesis, we select a level of significance in advance. It is denoted by alpha (α).
Steps to Perform a Z-Test
- First, identify the null and alternate hypotheses.
- Determine the level of significance (α).
- Find the critical value of z using the z-table.
- Calculate the z-test statistic. Below is the formula for calculating the z-test statistic.
[Tex]Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
[/Tex]
where,
- [Tex]\bar{x}
[/Tex]: mean of the sample.
- [Tex]\mu
[/Tex]: mean of the population.
- [Tex]\sigma
[/Tex]: Standard deviation of the population.
- n: sample size.
- Compare the test statistic with the critical value and decide whether to reject or fail to reject the null hypothesis.
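The steps above can be sketched in Python. This is a minimal, hypothetical helper (`one_sample_z_test` is our own name, not a library function); it uses the standard library's `statistics.NormalDist` in place of a printed z-table, and the `tail` argument selects the alternative hypothesis. The call at the end reuses the IQ example worked through below.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, population_mean, population_std, n,
                      alpha=0.05, tail="two"):
    """Sketch of a one-sample z-test.

    tail: "left", "right" or "two", matching the alternative hypothesis.
    Returns (z statistic, critical value, whether H0 is rejected).
    """
    std_normal = NormalDist()  # standard normal N(0, 1), replaces the z-table lookup
    # Test statistic using the standard error sigma / sqrt(n)
    z = (sample_mean - population_mean) / (population_std / math.sqrt(n))
    # Critical value for the chosen tail, then compare the statistic with it
    if tail == "right":
        z_crit = std_normal.inv_cdf(1 - alpha)
        reject = z > z_crit
    elif tail == "left":
        z_crit = std_normal.inv_cdf(alpha)  # negative critical value
        reject = z < z_crit
    elif tail == "two":
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > z_crit
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return z, z_crit, reject

# IQ example: right-tailed test at alpha = 0.05
z, z_crit, reject = one_sample_z_test(110, 100, 15, 50, alpha=0.05, tail="right")
print(round(z, 2), round(z_crit, 3), reject)  # 4.71 1.645 True
```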
Types of Z-test
Left-tailed Test
In this test, the region of rejection is located in the extreme left tail of the distribution. The null hypothesis is that the population mean is greater than or equal to the claimed value μ0 (H0: μ ≥ μ0), and the alternative is that it is less (HA: μ < μ0).
Right-tailed Test
In this test, the region of rejection is located in the extreme right tail of the distribution. The null hypothesis is that the population mean is less than or equal to the claimed value μ0 (H0: μ ≤ μ0), and the alternative is that it is greater (HA: μ > μ0).
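The only difference between the two one-tailed tests is which tail the critical value and p-value come from. A minimal sketch using `statistics.NormalDist` from the standard library; the helper name `one_tailed_p_value` and the statistic z = 2.0 are illustrative assumptions, not values from this article.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution N(0, 1)

def one_tailed_p_value(z, tail):
    """p-value for a one-tailed z-test.

    Left-tailed  (H_A: mu < mu_0): P(Z <= z)
    Right-tailed (H_A: mu > mu_0): P(Z >= z)
    """
    if tail == "left":
        return std_normal.cdf(z)
    if tail == "right":
        return 1 - std_normal.cdf(z)
    raise ValueError("tail must be 'left' or 'right'")

alpha = 0.05
# Critical values: the rejection region begins here in the chosen tail
print("left critical value :", round(std_normal.inv_cdf(alpha), 3))      # -1.645
print("right critical value:", round(std_normal.inv_cdf(1 - alpha), 3))  # 1.645

# The same statistic can be significant in one tail and not in the other
z = 2.0
print(round(one_tailed_p_value(z, "right"), 4))  # 0.0228 -> reject H0 at 5%
print(round(one_tailed_p_value(z, "left"), 4))   # 0.9772 -> fail to reject H0
```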
One-Tailed Test
A school claims that its students are more intelligent than those of an average school. On measuring the IQ scores of 50 students, the average turns out to be 110. The population mean IQ is 100 and the population standard deviation is 15. State whether the principal’s claim is right at a 5% significance level.
- First, we define the null hypothesis and the alternate hypothesis. Our null hypothesis will be:
[Tex]H_0 : \mu = 100
[/Tex]
and our alternate hypothesis will be:
[Tex]H_A : \mu > 100
[/Tex]
- State the level of significance. Here, the level of significance is given in the question (α = 0.05); if it is not given, we generally take α = 0.05.
- Now, we compute the Z-Score:
Sample mean (x̄) = 110
Population mean (μ) = 100
Population standard deviation (σ) = 15
Sample size (n) = 50
[Tex]\begin{aligned}
\text{Z-Score}&=\frac{\bar{x}-\mu}{\sigma/\sqrt{n}}
\\&=\frac{110-100}{15/\sqrt{50}}
\\&=\frac{10}{2.12}
\\&=4.71
\end{aligned}
[/Tex]
- Now, we look up the z-table. For α = 0.05, the critical z-score for a right-tailed test is 1.645.
- Here 4.71 > 1.645, so we reject the null hypothesis; the data support the principal’s claim at the 5% significance level.
- Had the z-test statistic been less than the critical z-score, we would have failed to reject the null hypothesis.
Code Implementations of One-Tailed Z-Test
Python
# Import the necessary libraries
import numpy as np
import scipy.stats as stats

# Given information
sample_mean = 110
population_mean = 100
population_std = 15
sample_size = 50
alpha = 0.05

# Compute the z-score using the standard error sigma / sqrt(n)
z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))
print('Z-Score :', z_score)

# Approach 1: Using the critical Z-score
# Right-tailed test, so the critical value cuts off the upper alpha of N(0, 1)
z_critical = stats.norm.ppf(1 - alpha)
print('Critical Z-Score :', z_critical)

# Hypothesis decision
if z_score > z_critical:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")

# Approach 2: Using the p-value
# p-value: probability of a Z-score at least this large if H0 is true (right tail)
p_value = 1 - stats.norm.cdf(z_score)
print('p-value :', p_value)

# Hypothesis decision
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")
Output:
Z-Score : 4.714045207910317
Critical Z-Score : 1.6448536269514722
Reject Null Hypothesis
p-value : 1.2142337364462463e-06
Reject Null Hypothesis
Two-tailed test
In this test, the region of rejection is split between both extremes (tails) of the distribution. The null hypothesis is that the population mean is equal to the claimed value μ0 (H0: μ = μ0), and the alternative is that it differs in either direction (HA: μ ≠ μ0).
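A two-tailed test splits α between both tails, so the critical value is the z at 1 - α/2 and the p-value doubles the one-tail area. A minimal sketch with the standard library's `statistics.NormalDist`; the helper `two_tailed_test` is a hypothetical name, and the statistics are illustrative (±2.384 matches the two-sample example worked through below, 1.5 is arbitrary).

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal N(0, 1)

def two_tailed_test(z, alpha=0.05):
    """Two-tailed z-test decision; alpha is split equally between both tails."""
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    p_value = 2 * (1 - std_normal.cdf(abs(z)))   # P(|Z| >= |z|) under H0
    return z_crit, p_value, abs(z) > z_crit

# A large deviation in either direction leads to rejection
for z in (-2.384, 2.384, 1.5):
    z_crit, p, reject = two_tailed_test(z)
    print(z, round(z_crit, 2), round(p, 4), reject)
```

Because the absolute value is used, -2.384 and 2.384 give the same p-value and the same decision.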
The two-sample z-test below is an example of a two-tailed test.
Two-sample Z-test
In this test, we are given two independent, normally distributed populations, and we draw a random sample from each. Here, μ1 and μ2 are the population means, and X̄1 and X̄2 are the observed sample means. Our null hypothesis could be:
[Tex]H_{0} : \mu_{1} -\mu_{2} = 0
[/Tex]
and the alternative hypothesis:
[Tex]H_{1} : \mu_{1} - \mu_{2} \ne 0
[/Tex]
and the formula for calculating the z-test score:
[Tex]Z = \frac{\left ( \overline{X_{1}} - \overline{X_{2}} \right ) - \left ( \mu_{1} - \mu_{2} \right )}{\sqrt{\frac{\sigma_{1}^2}{n_{1}} + \frac{\sigma_{2}^2}{n_{2}}}}
[/Tex]
where [Tex]\sigma_1
[/Tex] and [Tex]\sigma_2
[/Tex] are the population standard deviations, and n1 and n2 are the sample sizes drawn from the populations with means μ1 and μ2.
Example:
There are two groups of students preparing for a competition: Group A and Group B. Group A attended offline classes, while Group B attended online classes. After the examination, each student’s score is recorded. We want to determine whether there is a difference between the online and offline classes.
Group A: Sample size = 50, Sample mean = 75, Sample standard deviation = 10
Group B: Sample size = 60, Sample mean = 80, Sample standard deviation = 12
Treating the given standard deviations as the known population standard deviations and assuming a 5% significance level, perform a two-sample z-test to determine if there is a significant difference between the online and offline classes.
Solution:
Step 1: Null & Alternate Hypothesis
- Null Hypothesis: There is no difference between the mean scores of the online and offline classes.
[Tex] \mu_1 -\mu_2 = 0
[/Tex]
- Alternate Hypothesis: There is a significant difference in the mean scores between the online and offline classes.
[Tex] \mu_1 -\mu_2 \neq 0
[/Tex]
Step 2: Significance Level
- Significance Level: 5%
[Tex]\alpha = 0.05
[/Tex]
Step 3: Z-Score
[Tex]\begin{aligned}
\text{Z-score} &= \frac{(\bar{x}_1-\bar{x}_2)-(\mu_1 -\mu_2)}
{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}
\\ &= \frac{(75-80)-0}
{\sqrt{\frac{10^2}{50}+\frac{12^2}{60}}}
\\ &= \frac{-5}
{\sqrt{2+2.4}}
\\ &= \frac{-5}
{2.0976}
\\&=-2.384
\end{aligned}
[/Tex]
Step 4: Look up the critical Z-score in the Z-table for α/2 = 0.025, which gives 1.96.
Step 5: Compare the absolute Z-score with the critical value
- |Z-score| = 2.384 > 1.96 (critical Z-score)
- Reject the null hypothesis. There is a significant difference between the online and offline classes.
Code Implementations of Two-Sample Z-Test
Python
import numpy as np
import scipy.stats as stats

# Group A (Offline Classes)
n1 = 50   # sample size
x1 = 75   # sample mean
s1 = 10   # standard deviation, treated as the known population value
# Group B (Online Classes)
n2 = 60
x2 = 80
s2 = 12

# Null Hypothesis: mu_1 - mu_2 = 0
# Hypothesized difference (under the null hypothesis)
D = 0
# Set the significance level
alpha = 0.05

# Messages for the two possible decisions
reject_msg = ("Reject the null hypothesis.\n"
              "There is a significant difference between the online and offline classes.")
fail_msg = ("Fail to reject the null hypothesis.\n"
            "There is not enough evidence to suggest a significant difference "
            "between the online and offline classes.")

# Calculate the test statistic (z-score)
z_score = ((x1 - x2) - D) / np.sqrt((s1**2 / n1) + (s2**2 / n2))
print('Z-Score:', np.abs(z_score))

# Approach 1: Using the critical value (two-tailed, so alpha is split between both tails)
z_critical = stats.norm.ppf(1 - alpha / 2)
print('Critical Z-Score:', z_critical)

# Compare the absolute test statistic with the critical value
if np.abs(z_score) > z_critical:
    print(reject_msg)
else:
    print(fail_msg)

# Approach 2: Using the p-value
# Two-tailed p-value: probability of |Z| at least this large if H0 is true
p_value = 2 * (1 - stats.norm.cdf(np.abs(z_score)))
print('P-Value :', p_value)

# Compare the p-value with the significance level
if p_value < alpha:
    print(reject_msg)
else:
    print(fail_msg)
Output:
Z-Score: 2.3836564731139807
Critical Z-Score: 1.959963984540054
Reject the null hypothesis.
There is a significant difference between the online and offline classes.
P-Value : 0.01714159544079563
Reject the null hypothesis.
There is a significant difference between the online and offline classes.
Solved Examples:
Example 1: One-sample Z-test
Problem: A company claims that the average battery life of their new smartphone is 12 hours. A consumer group tests 100 phones and finds the average battery life to be 11.8 hours with a population standard deviation of 0.5 hours. At a 5% significance level, is there evidence to refute the company’s claim?
Solution:
Step 1: State the hypotheses
H₀: μ = 12 (null hypothesis)
H₁: μ ≠ 12 (alternative hypothesis)
Step 2: Calculate the Z-score
Z = (x̄ – μ) / (σ / √n)
= (11.8 – 12) / (0.5 / √100)
= -0.2 / 0.05
= -4
Step 3: Find the critical value (two-tailed test at 5% significance)
Z₀.₀₂₅ = ±1.96
Step 4: Compare Z-score with critical value
|-4| > 1.96, so we reject the null hypothesis.
Conclusion: There is sufficient evidence to refute the company’s claim about battery life.
Example 2: Two-sample Z-test
Problem: A researcher wants to compare the effectiveness of two different medications for reducing blood pressure. Medication A is tested on 50 patients, resulting in a mean reduction of 15 mmHg with a standard deviation of 3 mmHg. Medication B is tested on 60 patients, resulting in a mean reduction of 13 mmHg with a standard deviation of 4 mmHg. At a 1% significance level, is there a significant difference between the two medications?
Solution:
Step 1: State the hypotheses
H₀: μ₁ – μ₂ = 0 (null hypothesis)
H₁: μ₁ – μ₂ ≠ 0 (alternative hypothesis)
Step 2: Calculate the Z-score
Z = (x̄₁ – x̄₂) / √((σ₁²/n₁) + (σ₂²/n₂))
= (15 – 13) / √((3²/50) + (4²/60))
= 2 / √(0.18 + 0.2667)
= 2 / 0.6683
= 2.99
Step 3: Find the critical value (two-tailed test at 1% significance)
Z₀.₀₀₅ = ±2.576
Step 4: Compare Z-score with critical value
2.99 > 2.576, so we reject the null hypothesis.
Conclusion: There is a significant difference between the effectiveness of the two medications at the 1% significance level.
Example 3: One-proportion Z-test
Problem: A polling company claims that 60% of voters support a new policy. In a sample of 1000 voters, 570 support the policy. At a 5% significance level, is there evidence against the company’s claim?
Solution:
Step 1: State the hypotheses
H₀: p = 0.60 (null hypothesis)
H₁: p ≠ 0.60 (alternative hypothesis)
Step 2: Calculate the Z-score
p̂ = 570/1000 = 0.57 (sample proportion)
Z = (p̂ – p) / √(p(1-p)/n)
= (0.57 – 0.60) / √(0.60(1-0.60)/1000)
= -0.03 / √(0.24/1000)
= -0.03 / 0.0155
= -1.94
Step 3: Find the critical value (two-tailed test at 5% significance)
Z₀.₀₂₅ = ±1.96
Step 4: Compare Z-score with critical value
|-1.94| < 1.96, so we fail to reject the null hypothesis.
Conclusion: There is not enough evidence to refute the polling company’s claim at the 5% significance level.
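The one-proportion calculation can be sketched in Python as follows. `one_proportion_z_test` is a hypothetical helper name; like Example 3, it uses the hypothesized proportion p0 in the standard error and the normal approximation, with `statistics.NormalDist` standing in for the z-table.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0, alpha=0.05):
    """Two-tailed one-proportion z-test (normal approximation).

    The standard error uses the hypothesized proportion p0.
    Returns (z statistic, whether H0 is rejected).
    """
    p_hat = successes / n                     # sample proportion
    se = math.sqrt(p0 * (1 - p0) / n)         # standard error under H0
    z = (p_hat - p0) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z, abs(z) > z_crit

# Example 3: 570 of 1000 voters, claimed p0 = 0.60
z, reject = one_proportion_z_test(570, 1000, 0.60)
print(round(z, 2), reject)  # -1.94 False
```

Calling `one_proportion_z_test(220, 500, 0.40)` reproduces Example 6 below (z ≈ 1.83, fail to reject).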
Example 4: One-sample Z-test
Problem: A manufacturer claims that their light bulbs last an average of 1000 hours. A sample of 100 bulbs has a mean life of 985 hours. The population standard deviation is known to be 50 hours. At a 5% significance level, is there evidence to reject the manufacturer’s claim?
Solution:
H₀: μ = 1000
H₁: μ ≠ 1000
Z = (x̄ – μ) / (σ / √n)
= (985 – 1000) / (50 / √100)
= -15 / 5
= -3
Critical value (α = 0.05, two-tailed): ±1.96
|-3| > 1.96, so reject H₀.
Conclusion: There is sufficient evidence to reject the manufacturer’s claim at the 5% significance level.
Example 5: Two-sample Z-test
Problem: Two factories produce semiconductors. Factory A’s chips have a mean resistance of 100 ohms with a standard deviation of 5 ohms. Factory B’s chips have a mean resistance of 98 ohms with a standard deviation of 4 ohms. Samples of 50 chips from each factory are tested. At a 1% significance level, is there a difference in mean resistance between the two factories?
Solution:
H₀: μA – μB = 0
H₁: μA – μB ≠ 0
Z = (x̄A – x̄B) / √((σA²/nA) + (σB²/nB))
= (100 – 98) / √((5²/50) + (4²/50))
= 2 / √(0.5 + 0.32)
= 2 / 0.906
= 2.21
Critical value (α = 0.01, two-tailed): ±2.576
|2.21| < 2.576, so fail to reject H₀.
Conclusion: There is not enough evidence to conclude a difference in mean resistance at the 1% significance level.
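As a cross-check of the two-sample arithmetic in Examples 2 and 5, here is a sketch of the statistic computed from summary values. `two_sample_z` is a hypothetical helper name; like the examples, it treats the given standard deviations as known population values and tests at α = 0.01.

```python
import math
from statistics import NormalDist

def two_sample_z(mean_1, sd_1, n_1, mean_2, sd_2, n_2, hypothesized_diff=0.0):
    """Two-sample z statistic with known (or assumed known) population SDs.

    Returns (z statistic, standard error of the difference in means).
    """
    se = math.sqrt(sd_1**2 / n_1 + sd_2**2 / n_2)
    return ((mean_1 - mean_2) - hypothesized_diff) / se, se

z_crit = NormalDist().inv_cdf(1 - 0.01 / 2)  # two-tailed critical value, alpha = 0.01

# Example 2 (medications) and Example 5 (factories): mean, sd, n for each group
for name, args in [("Example 2", (15, 3, 50, 13, 4, 60)),
                   ("Example 5", (100, 5, 50, 98, 4, 50))]:
    z, se = two_sample_z(*args)
    decision = "reject H0" if abs(z) > z_crit else "fail to reject H0"
    print(name, round(se, 4), round(z, 2), decision)
```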
Example 6: One-proportion Z-test
Problem: A political analyst claims that 40% of voters in a certain district support a new tax policy. In a random sample of 500 voters, 220 support the policy. At a 5% significance level, is there evidence to reject the analyst’s claim?
Solution:
H₀: p = 0.40
H₁: p ≠ 0.40
p̂ = 220/500 = 0.44
Z = (p̂ – p) / √(p(1-p)/n)
= (0.44 – 0.40) / √(0.40(1-0.40)/500)
= 0.04 / 0.0219
= 1.83
Critical value (α = 0.05, two-tailed): ±1.96
|1.83| < 1.96, so fail to reject H₀.
Conclusion: There is not enough evidence to reject the analyst’s claim at the 5% significance level.
Example 7: Two-proportion Z-test
Problem: Two advertising methods are compared. Method A results in 150 sales out of 1000 contacts. Method B results in 180 sales out of 1200 contacts. At a 5% significance level, is there a difference in the effectiveness of the two methods?
Solution:
H₀: pA – pB = 0
H₁: pA – pB ≠ 0
p̂A = 150/1000 = 0.15
p̂B = 180/1200 = 0.15
p̂ = (150 + 180) / (1000 + 1200) = 0.15
Z = (p̂A – p̂B) / √(p̂(1-p̂)(1/nA + 1/nB))
= (0.15 – 0.15) / √(0.15(1-0.15)(1/1000 + 1/1200))
= 0 / 0.0153
= 0
Critical value (α = 0.05, two-tailed): ±1.96
|0| < 1.96, so fail to reject H₀.
Conclusion: There is no significant difference in the effectiveness of the two advertising methods at the 5% significance level.
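The two-proportion test pools both samples to estimate the common proportion under H₀. A minimal sketch of that calculation; `two_proportion_z_test` is a hypothetical helper name, and `statistics.NormalDist` replaces the z-table. It is applied here to Example 8.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x_a, n_a, x_b, n_b, alpha=0.05):
    """Two-tailed two-proportion z-test using the pooled proportion under H0.

    Returns (z statistic, standard error, whether H0 is rejected).
    """
    p_a, p_b = x_a / n_a, x_b / n_b
    p_pool = (x_a + x_b) / (n_a + n_b)  # common proportion if H0 is true
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z, se, abs(z) > z_crit

# Example 8: 120/400 recoveries in City A vs 140/500 in City B
z, se, reject = two_proportion_z_test(120, 400, 140, 500)
print(round(se, 4), round(z, 3), reject)  # 0.0304 0.658 False
```

With Example 7's counts (150/1000 vs 180/1200) both sample proportions equal 0.15, so the function returns z = 0.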
Example 8: Two-proportion Z-test
Problem: A new treatment for a disease is tested in two cities. In City A, 120 out of 400 patients recover. In City B, 140 out of 500 patients recover. At a 5% significance level, is there a difference in the recovery rates between the two cities?
Solution:
H₀: pA – pB = 0
H₁: pA – pB ≠ 0
p̂A = 120/400 = 0.30
p̂B = 140/500 = 0.28
p̂ = (120 + 140) / (400 + 500) = 0.2889
Z = (p̂A – p̂B) / √(p̂(1-p̂)(1/nA + 1/nB))
= (0.30 – 0.28) / √(0.2889(1-0.2889)(1/400 + 1/500))
= 0.02 / 0.0304
= 0.658
Critical value (α = 0.05, two-tailed): ±1.96
|0.658| < 1.96, so fail to reject H₀.
Conclusion: There is not enough evidence to conclude a difference in recovery rates between the two cities at the 5% significance level.
Example 9: One-sample Z-test
Problem: A company claims that their product weighs 500 grams on average. A sample of 64 products has a mean weight of 498 grams. The population standard deviation is known to be 8 grams. At a 1% significance level, is there evidence to reject the company’s claim?
Solution:
H₀: μ = 500
H₁: μ ≠ 500
Z = (x̄ – μ) / (σ / √n)
= (498 – 500) / (8 / √64)
= -2 / 1
= -2
Critical value (α = 0.01, two-tailed): ±2.576
|-2| < 2.576, so fail to reject H₀.
Conclusion: There is not enough evidence to reject the company’s claim at the 1% significance level.
Practice Problems
1) A cereal company claims that their boxes contain an average of 350 grams of cereal. A consumer group tests 100 boxes and finds a mean weight of 345 grams with a known population standard deviation of 15 grams. At a 5% significance level, is there evidence to refute the company’s claim?
2) A study compares the effect of two different diets on cholesterol levels. Diet A is tested on 50 people, resulting in a mean reduction of 25 mg/dL with a standard deviation of 8 mg/dL. Diet B is tested on 60 people, resulting in a mean reduction of 22 mg/dL with a standard deviation of 7 mg/dL. At a 1% significance level, is there a significant difference between the two diets?
3) A politician claims that 60% of voters in her district support her re-election. In a random sample of 1000 voters, 570 support her. At a 5% significance level, is there evidence to reject the politician’s claim?
4) Two different teaching methods are compared. Method A results in 80 students passing out of 120 students. Method B results in 90 students passing out of 150 students. At a 5% significance level, is there a difference in the effectiveness of the two methods?
5) A company claims that their new energy-saving light bulbs last an average of 10,000 hours. A sample of 64 bulbs has a mean life of 9,800 hours. The population standard deviation is known to be 500 hours. At a 1% significance level, is there evidence to reject the company’s claim?
6) The mean salary of employees in a large corporation is said to be $75,000 per year. A union representative suspects this is too high and surveys 100 randomly selected employees, finding a mean salary of $72,500. The population standard deviation is known to be $8,000. At a 5% significance level, is there evidence to support the union representative’s suspicion?
7) Two factories produce computer chips. Factory A’s chips have a mean processing speed of 3.2 GHz with a standard deviation of 0.2 GHz. Factory B’s chips have a mean processing speed of 3.3 GHz with a standard deviation of 0.25 GHz. Samples of 100 chips from each factory are tested. At a 5% significance level, is there a difference in mean processing speed between the two factories?
8) A new vaccine is claimed to be 90% effective. In a clinical trial with 500 participants, 440 develop immunity. At a 1% significance level, is there evidence to reject the claim about the vaccine’s effectiveness?
9) Two different advertising campaigns are tested. Campaign A results in 250 sales out of 2000 views. Campaign B results in 300 sales out of 2500 views. At a 5% significance level, is there a difference in the effectiveness of the two campaigns?
10) A quality control manager claims that the defect rate in a production line is 5%. In a sample of 1000 items, 65 are found to be defective. At a 5% significance level, is there evidence to suggest that the actual defect rate is different from the claimed 5%?
Type I Error and Type II Error
- Type I error: A Type I error occurs when we reject the null hypothesis even though it is true. Its probability is denoted by alpha (α), which is the significance level of the test.
- Type II error: A Type II error occurs when we fail to reject the null hypothesis even though it is false. Its probability is denoted by beta (β).
|  | Null Hypothesis is TRUE | Null Hypothesis is FALSE |
|---|---|---|
| Reject Null Hypothesis | Type I Error (False Positive) | Correct decision |
| Fail to Reject the Null Hypothesis | Correct decision | Type II Error (False Negative) |
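To make the link between α and Type I errors concrete, here is a small Monte Carlo sketch under assumed settings: the IQ example's μ = 100, σ = 15 and n = 50, a two-tailed test at α = 0.05, and a trial count and function name (`simulate_type_i_error`) chosen only for illustration. Every sample is drawn from a population where H₀ is true, so each rejection is a Type I error and the rejection rate should land near α.

```python
import math
import random
from statistics import NormalDist

def simulate_type_i_error(mu=100, sigma=15, n=50, alpha=0.05, trials=10000, seed=0):
    """Estimate how often a two-tailed z-test rejects a TRUE null hypothesis.

    Samples come from N(mu, sigma), so H0: mean = mu holds by construction
    and every rejection counts as a Type I error.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible estimate
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = (sample_mean - mu) / (sigma / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

rate = simulate_type_i_error()
print("Estimated Type I error rate:", rate)  # should be close to alpha = 0.05
```

Lowering `alpha` in the call (for example to 0.01) lowers the estimated Type I error rate accordingly, which is the trade-off against a larger Type II error rate.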
Summary
Z-tests are statistical tools used to determine whether there is a significant difference between a sample statistic and a population parameter, or between two population parameters. They are applicable when dealing with large sample sizes (typically n > 30) and known population standard deviations. Z-tests can be used to analyze means or proportions in both one-sample and two-sample scenarios. The process involves stating hypotheses, calculating a Z-score, comparing it to a critical value based on the chosen significance level (often 5% or 1%), and then deciding whether to reject or fail to reject the null hypothesis.
FAQs
What is the main limitation of the z-test?
The main limitation of the z-test is that the population standard deviation is usually unknown. In that case, we assume that the sample’s variability is a good basis for estimating the population’s variability, which, especially for small samples, leads to the t-test instead.
What is the minimum sample size for a z-test?
As a rule of thumb, a z-test is used when the population standard deviation is known and the sample size is 30 data points or larger. Otherwise, a t-test is usually employed.
What is the application of z-test?
The z-test is used to compare a sample mean with a known population mean and to determine whether there is a significant difference between the means of two independent samples. It can also be used to compare a sample proportion with an assumed population proportion, or to test the difference between the proportions of two populations.
What is the theory of the z-test?
The z test is a commonly used hypothesis test in inferential statistics that allows us to compare two populations using the mean values of samples from those populations, or to compare the mean of one population to a hypothesized value, when what we are interested in comparing is a continuous variable.
11 min read
Difference between Parametric and Non-Parametric Methods
Statistical analysis plays a crucial role in understanding and interpreting data across various disciplines. Two prominent approaches in statistical analysis are Parametric and Non-Parametric Methods. While both aim to draw inferences from data, they differ in their assumptions and underlying princi
8 min read
Probability Distribution - Function, Formula, Table
A probability distribution describes how the probabilities of different outcomes are assigned to the possible values of a random variable. It provides a way of modeling the likelihood of each outcome in a random experiment. While a frequency distribution shows how often outcomes occur in a sample or
15+ min read
Confidence Interval
In the realm of statistics, precise estimation is paramount to drawing meaningful insights from data. One of the indispensable tools in this pursuit is the confidence interval. Confidence intervals provide a systematic approach to quantifying the uncertainty associated with sample statistics, offeri
12 min read
Covariance and Correlation
Covariance and correlation are the two key concepts in Statistics that help us analyze the relationship between two variables. Covariance measures how two variables change together, indicating whether they move in the same or opposite directions. In this article, we will learn about the differences
6 min read
Program to find correlation coefficient
Given two array elements and we have to find the correlation coefficient between two arrays. The correlation coefficient is an equation that is used to determine the strength of the relation between two variables. The correlation coefficient is sometimes called as cross-correlation coefficient. The
8 min read
Robust Correlation
Correlation is a statistical tool that is used to analyze and measure the degree of relationship or degree of association between two or more variables. There are generally three types of correlation: Positive correlation: When we increase the value of one variable, the value of another variable inc
8 min read
Normal Probability Plot
The probability plot is a way of visually comparing the data coming from different distributions. These data can be of empirical dataset or theoretical dataset. The probability plot can be of two types: P-P plot: The (Probability-to-Probability) p-p plot is the way to visualize the comparing of cumu
3 min read
Quantile Quantile plots
The quantile-quantile( q-q plot) plot is a graphical method for determining if a dataset follows a certain probability distribution or whether two samples of data came from the same population or not. Q-Q plots are particularly useful for assessing whether a dataset is normally distributed or if it
8 min read
True Error vs Sample Error
True Error The true error can be said as the probability that the hypothesis will misclassify a single randomly drawn sample from the population. Here the population represents all the data in the world. Let's consider a hypothesis h(x) and the true/target function is f(x) of population P. The proba
3 min read
Bias-Variance Trade Off - Machine Learning
It is important to understand prediction errors (bias and variance) when it comes to accuracy in any machine-learning algorithm. There is a tradeoff between a model’s ability to minimize bias and variance which is referred to as the best solution for selecting a value of Regularization constant. A p
3 min read
Understanding Hypothesis Testing
Hypothesis testing is a fundamental statistical method employed in various fields, including data science, machine learning, and statistics, to make informed decisions based on empirical evidence. It involves formulating assumptions about population parameters using sample statistics and rigorously
15+ min read
T-test
In statistics, various tests are used to compare different samples or groups and draw conclusions about populations. These tests, known as statistical tests, focus on analyzing the likelihood or probability of obtaining the observed data under specific assumptions or hypotheses. They provide a frame
14 min read
Paired T-Test - A Detailed Overview
Student’s t-test or t-test is the statistical method used to determine if there is a difference between the means of two samples. The test is often performed to find out if there is any sampling error or unlikeliness in the experiment. This t-test is further divided into 3 types based on your data a
5 min read
P-value in Machine Learning
P-value helps us determine how likely it is to get a particular result when the null hypothesis is assumed to be true. It is the probability of getting a sample like ours or more extreme than ours if the null hypothesis is correct. Therefore, if the null hypothesis is assumed to be true, the p-value
6 min read
F-Test in Statistics
F test is a statistical test that is used in hypothesis testing, that determines whether or not the variances of two populations or two samples are equal. An f distribution is what the data in a f test conforms to. By dividing the two variances, this test compares them using the f statistic. Dependi
7 min read
Z-test : Formula, Types, Examples
Z-test is especially useful when you have a large sample size and know the population's standard deviation. Different tests are used in statistics to compare distinct samples or groups and make conclusions about populations. These tests, also referred to as statistical tests, concentrate on examinin
15+ min read
Residual Leverage Plot (Regression Diagnostic)
In linear or multiple regression, it is not enough to just fit the model into the dataset. But, it may not give the desired result. To apply the linear or multiple regression efficiently to the dataset. There are some assumptions that we need to check on the dataset that made linear/multiple regress
5 min read
Difference between Null and Alternate Hypothesis
Hypothesis is a statement or an assumption that may be true or false. There are six types of hypotheses mainly the Simple hypothesis, Complex hypothesis, Directional hypothesis, Associative hypothesis, and Null hypothesis. Usually, the hypothesis is the start point of any scientific investigation, I
3 min read
Mann and Whitney U test
Mann and Whitney's U-test or Wilcoxon rank-sum test is the non-parametric statistic hypothesis test that is used to analyze the difference between two independent samples of ordinal data. In this test, we have provided two randomly drawn samples and we have to verify whether these two samples is fro
4 min read
Wilcoxon Signed Rank Test
Prerequisites: Parametric and Non-Parametric Methods Hypothesis Testing Wilcoxon signed-rank test, also known as Wilcoxon matched pair test is a non-parametric hypothesis test that compares the median of two paired groups and tells if they are identically distributed or not. We can use this when: Di
4 min read
Kruskal Wallis Test
Kruskal Wallis Test: It is a nonparametric test. It is sometimes referred to as One-Way ANOVA on ranks. It is a nonparametric alternative to One-Way ANOVA. It is an extension of the Man-Whitney Test to situations where more than two levels/populations are involved. This test falls under the family o
4 min read
Friedman Test
Friedman Test: It is a non-parametric test alternative to the one way ANOVA with repeated measures. It tries to determine if subjects changed significantly across occasions/conditions. For example:- Problem-solving ability of a set of people is the same or different in Morning, Afternoon, Evening. I
5 min read
Probability Class 10 Important Questions
Probability is a fundamental concept in mathematics for measuring of chances of an event happening By assigning numerical values to the chances of different outcomes, probability allows us to model, analyze, and predict complex systems and processes. Probability Formulas for Class 10 It says the pos
4 min read
Probability and Probability Distributions
Mathematics - Law of Total Probability
Probability theory is the branch of mathematics concerned with the analysis of random events. It provides a framework for quantifying uncertainty, predicting outcomes, and understanding random phenomena. In probability theory, an event is any outcome or set of outcomes from a random experiment, and
13 min read
Bayes's Theorem for Conditional Probability
Bayes's Theorem for Conditional Probability: Bayes's Theorem is a fundamental result in probability theory that describes how to update the probabilities of hypotheses when given evidence. Named after the Reverend Thomas Bayes, this theorem is crucial in various fields, including engineering, statis
9 min read
Mathematics | Probability Distributions Set 1 (Uniform Distribution)
Prerequisite - Random Variable In probability theory and statistics, a probability distribution is a mathematical function that can be thought of as providing the probabilities of occurrence of different possible outcomes in an experiment. For instance, if the random variable X is used to denote the
4 min read
Mathematics | Probability Distributions Set 4 (Binomial Distribution)
The previous articles talked about some of the Continuous Probability Distributions. This article covers one of the distributions which are not continuous but discrete, namely the Binomial Distribution. Introduction - To understand the Binomial distribution, we must first understand what a Bernoulli
5 min read
Mathematics | Probability Distributions Set 5 (Poisson Distribution)
The previous article covered the Binomial Distribution. This article talks about another Discrete Probability Distribution, the Poisson Distribution. Introduction -Suppose an event can occur several times within a given unit of time. When the total number of occurrences of the event is unknown, we c
4 min read
Uniform Distribution | Formula, Definition and Examples
Uniform Distribution is the probability distribution that represents equal likelihood of all outcomes within a specific range. i.e. the probability of each outcome occurring is the same. Whether dealing with a simple roll of a fair die or selecting a random number from a continuous interval, uniform
11 min read
Mathematics | Probability Distributions Set 2 (Exponential Distribution)
The previous article covered the basics of Probability Distributions and talked about the Uniform Probability Distribution. This article covers the Exponential Probability Distribution which is also a Continuous distribution just like Uniform Distribution. Introduction - Suppose we are posed with th
5 min read
Mathematics | Probability Distributions Set 3 (Normal Distribution)
The previous two articles introduced two Continuous Distributions: Uniform and Exponential. This article covers the Normal Probability Distribution, also a Continuous distribution, which is by far the most widely used model for continuous measurement. Introduction - Whenever a random experiment is r
5 min read
Mathematics | Beta Distribution Model
The Beta Distribution is a continuous probability distribution defined on the interval [0, 1], widely used in statistics and various fields for modeling random variables that represent proportions or probabilities. It is particularly useful when dealing with scenarios where the outcomes are bounded
12 min read
Gamma Distribution Model in Mathematics
Introduction : Suppose an event can occur several times within a given unit of time. When the total number of occurrences of the event is unknown, we can think of it as a random variable. Now, if this random variable X has gamma distribution, then its probability density function is given as follows
2 min read
Chi-Square Test for Feature Selection - Mathematical Explanation
One of the primary tasks involved in any supervised Machine Learning venture is to select the best features from the given dataset to obtain the best results. One way to select these features is the Chi-Square Test. Mathematically, a Chi-Square test is done on two distributions two determine the lev
4 min read
Student's t-distribution in Statistics
As we know normal distribution assumes two important characteristics about the dataset: a large sample size and knowledge of the population standard deviation. However, if we do not meet these two criteria, and we have a small sample size or an unknown population standard deviation, then we use the
10 min read
Python - Central Limit Theorem
Central Limit Theorem (CLT) is a foundational principle in statistics, and implementing it using Python can significantly enhance data analysis capabilities. Statistics is an important part of data science projects. We use statistical tools whenever we want to make any inference about the population
7 min read
Limits, Continuity and Differentiability
Limits, Continuity, and Differentiability are fundamental concepts in calculus, essential for analyzing and understanding the behavior of functions. These concepts are crucial for solving real-world problems in physics, engineering, and economics. Table of Content LimitsKey Characteristics of Limits
10 min read
Implicit Differentiation
Implicit Differentiation is the process of differentiation in which we differentiate the implicit function without converting it into an explicit function. For example, we need to find the slope of a circle with an origin at 0 and a radius r. Its equation is given as x2 + y2 = r2. Now, to find the s
6 min read
Calculus for Machine Learning
Partial Derivatives in Engineering Mathematics
Partial derivatives are a basic concept in multivariable calculus. They convey how a function would change when one of its input variables changes, while keeping all the others constant. This turns out to be particularly useful in fields such as physics, engineering, economics, and computer science,
10 min read
Advanced Differentiation
Derivatives are used to measure the rate of change of any quantity. This process is called differentiation. It can be considered as a building block of the theory of calculus. Geometrically speaking, the derivative of any function at a particular point gives the slope of the tangent at that point of
8 min read
How to find Gradient of a Function using Python?
The gradient of a function simply means the rate of change of a function. We will use numdifftools to find Gradient of a function. Examples: Input : x^4+x+1 Output :Gradient of x^4+x+1 at x=1 is 4.99 Input :(1-x)^2+(y-x^2)^2 Output :Gradient of (1-x^2)+(y-x^2)^2 at (1, 2) is [-4. 2.] Approach: For S
2 min read
Optimization techniques for Gradient Descent
Gradient Descent is a widely used optimization algorithm for machine learning models. However, there are several optimization techniques that can be used to improve the performance of Gradient Descent. Here are some of the most popular optimization techniques for Gradient Descent: Learning Rate Sche
4 min read
Higher Order Derivatives
Higher order derivatives refer to the derivatives of a function that are obtained by repeatedly differentiating the original function. The first derivative of a function, f′(x), represents the rate of change or slope of the function at a point.The second derivative, f′′(x), is the derivative of the
6 min read
Taylor Series
Taylor Series is the series which is used to find the value of a function. It is the series of polynomials or any function and it contains the sum of infinite terms. Each successive term in the Taylor series expansion has a larger exponent or a higher degree term than the preceding term. We take the
10 min read
Application of Derivative - Maxima and Minima | Mathematics
The Concept of derivative can be used to find the maximum and minimum value of the given function. We know that information about and gradient or slope can be derived from the derivative of a function. We try to find a point which has zero gradients then locate maximum and minimum value near it. It
3 min read
Absolute Minima and Maxima
Absolute Maxima and Minima are the maximum and minimum values of the function defined on a fixed interval. A function in general can have high values or low values as we move along the function. The maximum value of the function in any interval is called the maxima and the minimum value of the funct
12 min read
Optimization for Data Science
From a mathematical foundation viewpoint, it can be said that the three pillars for data science that we need to understand quite well are Linear Algebra , Statistics and the third pillar is Optimization which is used pretty much in all data science algorithms. And to understand the optimization con
5 min read
Unconstrained Multivariate Optimization
Wikipedia defines optimization as a problem where you maximize or minimize a real function by systematically choosing input values from an allowed set and computing the value of the function. That means when we talk about optimization we are always interested in finding the best solution. So, let sa
4 min read
Lagrange Multipliers | Definition and Examples
In mathematics, a Lagrange multiplier is a potent tool for optimization problems and is applied especially in the cases of constraints. Named after the Italian-French mathematician Joseph-Louis Lagrange, the method provides a strategy to find maximum or minimum values of a function along one or more
8 min read
Lagrange's Interpolation
What is Interpolation? Interpolation is a method of finding new data points within the range of a discrete set of known data points (Source Wiki). In other words interpolation is the technique to estimate the value of a mathematical function, for any intermediate value of the independent variable. F
7 min read
Linear Regression in Machine learning
Machine Learning is a branch of Artificial intelligence that focuses on the development of algorithms and statistical models that can learn from and make predictions on data. Linear regression is also a type of machine-learning algorithm more specifically a supervised machine-learning algorithm that
15+ min read
Ordinary Least Squares (OLS) using statsmodels
In this article, we will use Python's statsmodels module to implement Ordinary Least Squares ( OLS ) method of linear regression. Introduction : A linear regression model establishes the relation between a dependent variable( y ) and at least one independent variable( x ) as : [Tex] \hat{y}=b_1x+b_0
4 min read
Regression in Machine Learning