Z-test : Formula, Types, Examples
Last Updated: 30 Jan, 2025
After learning about inferential statistics, we now move on to a more specific technique used for making decisions based on sample data: the Z-test. Studying entire populations can be time-consuming, costly and sometimes impossible, so instead we take a sample from that population.
This is where the Z-test becomes important. It helps us make inferences about the entire population based on the sample data. It allows us to answer questions like:
- Is the sample mean significantly different from a known population mean?
- Is there a significant difference between the means of two sample groups?
This article explains what the Z-test is, when to use it and how to perform it in simple terms.
Understanding Z-Test
A Z-test is a type of hypothesis test that compares a sample's average with the population's average. It calculates a Z-score, which tells us how far the sample average is from the population average, measured in units of how much the data normally varies (the standard deviation). It is particularly useful when the sample size is large (n > 30). In its basic form, the Z-score (also called the Z-statistic) is:
[Tex]\text{Z-Score} = \frac{\bar{x}-\mu}{\sigma}[/Tex]
where,
- [Tex]\bar{x}[/Tex]: mean of the sample.
- [Tex]\mu[/Tex]: mean of the population.
- [Tex]\sigma[/Tex]: standard deviation of the population.
Let’s understand this with an example. The average annual family income in India is 200k with a standard deviation of 5k, and the average annual family income in Delhi is 300k. The Z-score for Delhi is:
[Tex]\begin{aligned}\text{Z-Score}&=\frac{\bar{x}-\mu}{\sigma}\\&=\frac{300-200}{5}\\&=20\end{aligned}[/Tex]
This indicates that the average family’s annual income in Delhi is 20 standard deviations above the mean of the population (India).
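The calculation above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `z_score` is our own, and incomes are expressed in thousands, as in the example.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) mu."""
    return (x - mu) / sigma

# Delhi vs. India, income in thousands (values from the example above)
delhi = z_score(x=300, mu=200, sigma=5)
print(delhi)  # 20.0
```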
For a z-test to provide reliable results these assumptions must be met:
- Normal Distribution: The population from which the sample is drawn should be approximately normally distributed.
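- Known Variance: The population standard deviation should be known. The two-sample formula uses [Tex]\sigma_1[/Tex] and [Tex]\sigma_2[/Tex] separately, so the two populations do not need equal variances.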
- Independence: All data points should be independent of one another.
The steps to perform a Z-test are:
- Identify the null and alternate hypotheses.
- Determine the level of significance (α).
- Find the critical value of z for that significance level.
- Calculate the z-test statistic. When testing a sample mean, the population standard deviation is divided by [Tex]\sqrt{n}[/Tex], which gives the standard error of the mean:
[Tex]Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}[/Tex]
where,
- [Tex]\bar{x}[/Tex]: mean of the sample.
- [Tex]\mu[/Tex]: mean of the population.
- [Tex]\sigma[/Tex]: standard deviation of the population.
- n: sample size.
- Compare the z-statistic with the critical value and decide whether to reject or fail to reject the null hypothesis.
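The steps above can be sketched as a single Python function. This is an illustrative sketch, not a library API: `one_sample_z_test` is a hypothetical helper, it assumes a two-sided test with a known population standard deviation, and it uses only the standard library (`statistics.NormalDist`, available from Python 3.8). The call below reuses the battery-life figures from the one-sample example later in this article.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample z-test with a known population standard deviation.

    H0: mu == mu0   vs.   H1: mu != mu0
    Returns (z, z_critical, p_value, reject_h0).
    """
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    # Critical value for a two-sided test at level alpha
    z_critical = std_normal.inv_cdf(1 - alpha / 2)
    # Test statistic: (x̄ - μ) / (σ / √n)
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Two-sided p-value: P(|Z| >= |z|) when H0 is true
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    # Reject H0 when |z| exceeds the critical value
    reject_h0 = abs(z) > z_critical
    return z, z_critical, p_value, reject_h0

z, z_crit, p, reject = one_sample_z_test(sample_mean=11.8, mu0=12, sigma=0.5, n=100)
print(f"z = {z:.2f}, critical = ±{z_crit:.2f}, p = {p:.5f}, reject H0: {reject}")
```

Comparing |z| with the critical value and comparing the p-value with α always lead to the same decision for a two-sided test; the function returns both so either approach can be checked.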
Types of Z-test
There are mainly two types of Z-tests. Let’s understand them one by one:
One-Sample Z-test
A one-sample Z-test is used to determine whether the mean of a single sample is significantly different from a known population mean.
When to use:
- The population standard deviation is known.
- The sample size is large (usually n>30).
- The data is approximately normally distributed.
Suppose a company claims that their new smartphone has an average battery life of 12 hours. A consumer group tests 100 phones and finds an average battery life of 11.8 hours with a known population standard deviation of 0.5 hours.
Step 1: Hypotheses:
H₀: μ=12
H₁: μ≠12
Step 2: Calculate the Z-Score:
We can calculate the Z-score using the formula:
[Tex]z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}}[/Tex]
where [Tex]\bar{x} = 11.8[/Tex], [Tex]\mu = 12[/Tex], [Tex]\sigma = 0.5[/Tex] and n = 100. Substituting these values we get:
[Tex]z = \frac{11.8 - 12}{\frac{0.5}{\sqrt{100}}} = \frac{-0.2}{0.05} = -4[/Tex]
Step 3: Decision
Since |Z| = 4 > 1.96 (the critical value for α = 0.05), we reject H₀. There is significant evidence against the company’s claim.
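Now let’s implement this in Python. Because the population standard deviation is known, we compute the z-statistic directly from the formula with NumPy and take the p-value from SciPy's standard normal distribution. (Statsmodels' ztest estimates the standard deviation from the data itself, so feeding it 100 identical values of 11.8 gives a standard deviation of almost zero and a meaningless, enormous z-statistic.)
Python
import numpy as np
from scipy.stats import norm
# Summary statistics from the battery-life example
sample_mean = 11.8          # average battery life of the 100 tested phones
population_mean = 12        # claimed mean under H0
population_std_dev = 0.5    # known population standard deviation
n = 100
# z = (x̄ - μ) / (σ / √n)
z_statistic = (sample_mean - population_mean) / (population_std_dev / np.sqrt(n))
# Two-sided p-value: P(|Z| >= |z|) under H0
p_value = 2 * norm.sf(abs(z_statistic))
print(f"Z-Statistic: {z_statistic:.4f}")
print(f"P-Value: {p_value:.4f}")
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: The average battery life is different from 12 hours.")
else:
    print("Fail to reject the null hypothesis: The average battery life is not significantly different from 12 hours.")
Output:
Z-Statistic: -4.0000
P-Value: 0.0001
Reject the null hypothesis: The average battery life is different from 12 hours.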
Two-Sample Z-test
In this test we have two independent, normally distributed populations, and we draw a random sample from each. Let [Tex]\mu_1[/Tex] and [Tex]\mu_2[/Tex] be the population means and [Tex]\overline{X_1}[/Tex] and [Tex]\overline{X_2}[/Tex] the observed sample means. The null and alternative hypotheses could be:
- [Tex]H_{0} : \mu_{1} - \mu_{2} = 0[/Tex]
- [Tex]H_{1} : \mu_{1} - \mu_{2} \ne 0[/Tex]
The formula for the z-test statistic is:
[Tex]Z = \frac{\left ( \overline{X_{1}} - \overline{X_{2}} \right ) - \left ( \mu_{1} - \mu_{2} \right )}{\sqrt{\frac{\sigma_{1}^2}{n_{1}} + \frac{\sigma_{2}^2}{n_{2}}}}[/Tex]
where [Tex]\sigma_1[/Tex] and [Tex]\sigma_2[/Tex] are the population standard deviations, and [Tex]n_1[/Tex] and [Tex]n_2[/Tex] are the sizes of the samples drawn from the populations with means [Tex]\mu_1[/Tex] and [Tex]\mu_2[/Tex]. Let’s look at an example:
There are two groups of students preparing for a competition: Group A and Group B. Group A attended offline classes, while Group B attended online classes. After the examination, each student's score is recorded. We want to determine whether the mean scores of the offline and online classes differ.
- Group A: Sample size = 50, Sample mean = 75, Sample standard deviation = 10
- Group B: Sample size = 60, Sample mean = 80, Sample standard deviation = 12
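Because both samples are large, the sample standard deviations are used as estimates of [Tex]\sigma_1[/Tex] and [Tex]\sigma_2[/Tex]. Assuming a 5% significance level, perform a two-sample z-test to determine whether there is a significant difference between the online and offline classes.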
Solution:
Step 1: Null & Alternate Hypothesis
- Null Hypothesis: There is no difference between the mean scores of the online and offline classes.
[Tex]\mu_1 - \mu_2 = 0[/Tex]
- Alternate Hypothesis: There is a difference between the mean scores of the online and offline classes.
[Tex]\mu_1 - \mu_2 \neq 0[/Tex]
Step 2: Significance Level
- Significance Level: 5%
[Tex]\alpha = 0.05[/Tex]
Step 3: Z-Score
[Tex]\begin{aligned}\text{Z-score} &= \frac{(\bar{x}_1-\bar{x}_2)-(\mu_1 -\mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}\\ &= \frac{(75-80)-0}{\sqrt{\frac{10^2}{50}+\frac{12^2}{60}}}\\ &= \frac{-5}{\sqrt{2+2.4}}\\ &= \frac{-5}{2.0976}\\&=-2.384\end{aligned}[/Tex]
Step 4: Look up the critical Z-score in the Z-table for α/2 = 0.025, which gives 1.96.
Step 5: Compare it with the absolute Z-score
- |Z-score| = 2.384 > 1.96 = critical Z-score
- So we reject the null hypothesis: there is a significant difference between the online and offline classes.
Now we will implement the two-sample z-test in Python using NumPy and SciPy:
Python
import numpy as np
import scipy.stats as stats
# Group A (Offline Classes)
n1 = 50
x1 = 75
s1 = 10  # sample SD, used as an estimate of sigma_1
# Group B (Online Classes)
n2 = 60
x2 = 80
s2 = 12  # sample SD, used as an estimate of sigma_2
# Null Hypothesis: mu_1 - mu_2 = 0
# Hypothesized difference (under the null hypothesis)
D = 0
# Set the significance level
alpha = 0.05
# Calculate the test statistic (z-score)
z_score = ((x1 - x2) - D) / np.sqrt((s1**2 / n1) + (s2**2 / n2))
print('Z-Score:', np.abs(z_score))
# Approach 1: Using the critical value (two-sided test)
z_critical = stats.norm.ppf(1 - alpha/2)
print('Critical Z-Score:', z_critical)
# Compare the test statistic with the critical value
if np.abs(z_score) > z_critical:
    print("Reject the null hypothesis.\nThere is a significant difference between the online and offline classes.")
else:
    print("Fail to reject the null hypothesis.\nThere is not enough evidence to suggest a significant difference between the online and offline classes.")
# Approach 2: Using the p-value
# Two-sided p-value: probability of a Z-score at least as extreme as |z_score| under H0
p_value = 2 * (1 - stats.norm.cdf(np.abs(z_score)))
print('P-Value :', p_value)
# Compare the p-value with the significance level
if p_value < alpha:
    print("Reject the null hypothesis.\nThere is a significant difference between the online and offline classes.")
else:
    print("Fail to reject the null hypothesis.\nThere is not enough evidence to suggest a significant difference between the online and offline classes.")
Output:
Z-Score: 2.3836564731139807
Critical Z-Score: 1.959963984540054
Reject the null hypothesis.
There is a significant difference between the online and offline classes.
P-Value : 0.01714159544079563
Reject the null hypothesis.
There is a significant difference between the online and offline classes.
The Z-Table
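The Z-table lists the cumulative probability P(Z ≤ z) of the standard normal distribution. As a sketch, assuming Python 3.8+ (where `statistics.NormalDist` is available), the same lookups, including the critical values used in this article, can be computed instead of read from a printed table:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Forward lookup: cumulative probability P(Z <= z) for a few z values
for z in (-1.96, 0.0, 1.0, 1.96):
    print(f"P(Z <= {z:5.2f}) = {std_normal.cdf(z):.4f}")

# Reverse lookup: two-sided critical values z_(alpha/2)
for alpha in (0.10, 0.05, 0.01):
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    print(f"alpha = {alpha:.2f}: critical z = ±{z_crit:.3f}")
```

The reverse lookups give approximately 1.645, 1.960 and 2.576, which matches the 1.96 and 2.576 used in the solved examples.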

Solved Examples:
Problem 1: A company claims that the average battery life of their new smartphone is 12 hours. A consumer group tests 100 phones and finds the average battery life to be 11.8 hours with a population standard deviation of 0.5 hours. At a 5% significance level, is there evidence to refute the company’s claim?
Solution:
Step 1: State the hypotheses
H₀: μ = 12 (null hypothesis)
H₁: μ ≠ 12 (alternative hypothesis)
Step 2: Calculate the Z-score
Z = (x̄ – μ) / (σ / √n)
= (11.8 – 12) / (0.5 / √100)
= -0.2 / 0.05
= -4
Step 3: Find the critical value (two-tailed test at 5% significance)
Z₀.₀₂₅ = ±1.96
Step 4: Compare Z-score with critical value
|-4| > 1.96, so we reject the null hypothesis.
Conclusion: There is sufficient evidence to refute the company’s claim about battery life.
Problem 2: A researcher wants to compare the effectiveness of two different medications for reducing blood pressure. Medication A is tested on 50 patients, resulting in a mean reduction of 15 mmHg with a standard deviation of 3 mmHg. Medication B is tested on 60 patients, resulting in a mean reduction of 13 mmHg with a standard deviation of 4 mmHg. At a 1% significance level, is there a significant difference between the two medications?
Solution:
Step 1: State the hypotheses
H₀: μ₁ – μ₂ = 0 (null hypothesis)
H₁: μ₁ – μ₂ ≠ 0 (alternative hypothesis)
Step 2: Calculate the Z-score
Z = (x̄₁ – x̄₂) / √((σ₁²/n₁) + (σ₂²/n₂))
= (15 – 13) / √((3²/50) + (4²/60))
= 2 / √(0.18 + 0.2667)
= 2 / √0.4467
= 2 / 0.6683
≈ 2.99
Step 3: Find the critical value (two-tailed test at 1% significance)
Z₀.₀₀₅ = ±2.576
Step 4: Compare Z-score with critical value
2.99 > 2.576, so we reject the null hypothesis.
Conclusion: There is a significant difference between the effectiveness of the two medications at the 1% significance level.
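Hand arithmetic for a standard error is easy to get wrong, so a quick numerical check of Problem 2 is useful. This is a minimal sketch using only the standard library; it treats the reported standard deviations as the σ values, exactly as the solution does.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from Problem 2 (reduction in blood pressure, mmHg)
n1, mean1, sd1 = 50, 15, 3   # Medication A
n2, mean2, sd2 = 60, 13, 4   # Medication B
alpha = 0.01

# Standard error of the difference in means: sqrt(σ1²/n1 + σ2²/n2)
se = sqrt(sd1**2 / n1 + sd2**2 / n2)
z = (mean1 - mean2) / se     # hypothesized difference is 0 under H0

z_critical = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided test
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"SE = {se:.4f}, z = {z:.2f}, critical = ±{z_critical:.3f}, p = {p_value:.4f}")
print("Reject H0" if abs(z) > z_critical else "Fail to reject H0")
```

With these inputs the standard error is about 0.668 and z is about 2.99, which exceeds the critical value of 2.576, so H₀ is rejected at the 1% level.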
Z-test – FAQs
What is the main limitation of the z-test?
The main limitation of the z-test is that we usually don’t know the population standard deviation. In that case we use the sample’s variability as an estimate of the population’s variability, which works reasonably well only when the sample is large.
What is the minimum sample size for a z-test?
As a common rule of thumb, a z-test is used when the population standard deviation is known and the sample size is 30 data points or larger. Otherwise, a t-test should be used.
What is the application of the z-test?
The z-test is used to determine whether a sample mean differs significantly from a known population mean, and whether there is a significant difference between the means of two independent samples. It can also be used to compare a sample proportion with a hypothesized proportion, or to test the difference between the proportions of two populations using a sample from each.
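The proportion version follows the same pattern as the mean version. The sketch below shows a one-sample proportion z-test using the standard statistic z = (p̂ - p₀) / √(p₀(1 - p₀)/n). The function name `one_proportion_z_test` and the numbers in the usage line are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0, alpha=0.05):
    """Two-sided z-test of H0: p == p0 for a single sample proportion.

    Relies on the normal approximation, which assumes n*p0 and n*(1 - p0)
    are both reasonably large (rules of thumb of at least 5 or 10 are common).
    """
    p_hat = successes / n                 # observed sample proportion
    se = sqrt(p0 * (1 - p0) / n)          # standard error under H0
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha

# Hypothetical data: 230 "successes" out of 400 trials, tested against p0 = 0.5
z, p_value, reject = one_proportion_z_test(successes=230, n=400, p0=0.5)
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject}")
```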
What is the theory of the z-test?
The z test is a commonly used hypothesis test in inferential statistics that allows us to compare two populations using the mean values of samples from those populations, or to compare the mean of one population to a hypothesized value, when what we are interested in comparing is a continuous variable.
7 min read
Limits, Continuity and Differentiability
Limits, Continuity, and Differentiability are fundamental concepts in calculus, essential for analyzing and understanding the behavior of functions. These concepts are crucial for solving real-world problems in physics, engineering, and economics. Table of Content LimitsKey Characteristics of Limits
10 min read
Implicit Differentiation
Implicit Differentiation is the process of differentiation in which we differentiate the implicit function without converting it into an explicit function. For example, we need to find the slope of a circle with an origin at 0 and a radius r. Its equation is given as x2 + y2 = r2. Now, to find the s
6 min read
Calculus for Machine Learning
Partial Derivatives in Engineering Mathematics
Partial derivatives are a basic concept in multivariable calculus. They convey how a function would change when one of its input variables changes, while keeping all the others constant. This turns out to be particularly useful in fields such as physics, engineering, economics, and computer science,
10 min read
Advanced Differentiation
Derivatives are used to measure the rate of change of any quantity. This process is called differentiation. It can be considered as a building block of the theory of calculus. Geometrically speaking, the derivative of any function at a particular point gives the slope of the tangent at that point of
8 min read
How to find Gradient of a Function using Python?
The gradient of a function simply means the rate of change of a function. We will use numdifftools to find Gradient of a function. Examples: Input : x^4+x+1 Output :Gradient of x^4+x+1 at x=1 is 4.99 Input :(1-x)^2+(y-x^2)^2 Output :Gradient of (1-x^2)+(y-x^2)^2 at (1, 2) is [-4. 2.] Approach: For S
2 min read
Optimization techniques for Gradient Descent
Gradient Descent is a widely used optimization algorithm for machine learning models. However, there are several optimization techniques that can be used to improve the performance of Gradient Descent. Here are some of the most popular optimization techniques for Gradient Descent: Learning Rate Sche
4 min read
Higher Order Derivatives
Higher order derivatives refer to the derivatives of a function that are obtained by repeatedly differentiating the original function. The first derivative of a function, fâ²(x), represents the rate of change or slope of the function at a point.The second derivative, fâ²â²(x), is the derivative of the
6 min read
Taylor Series
A Taylor series represents a function as an infinite sum of terms, calculated from the values of its derivatives at a single point. Taylor series is a powerful mathematical tool used to approximate complex functions with an infinite sum of terms derived from the function's derivatives at a single po
8 min read
Application of Derivative - Maxima and Minima
Derivatives have many applications, like finding rate of change, approximation, maxima/minima and tangent. In this section, we focus on their use in finding maxima and minima. Note: If f(x) is a continuous function, then for every continuous function on a closed interval has a maximum and a minimum
6 min read
Absolute Minima and Maxima
Absolute Maxima and Minima are the maximum and minimum values of the function defined on a fixed interval. A function in general can have high values or low values as we move along the function. The maximum value of the function in any interval is called the maxima and the minimum value of the funct
12 min read
Optimization for Data Science
From a mathematical foundation viewpoint, it can be said that the three pillars for data science that we need to understand quite well are Linear Algebra , Statistics and the third pillar is Optimization which is used pretty much in all data science algorithms. And to understand the optimization con
5 min read
Unconstrained Multivariate Optimization
Wikipedia defines optimization as a problem where you maximize or minimize a real function by systematically choosing input values from an allowed set and computing the value of the function. That means when we talk about optimization we are always interested in finding the best solution. So, let sa
4 min read
Lagrange Multipliers | Definition and Examples
In mathematics, a Lagrange multiplier is a potent tool for optimization problems and is applied especially in the cases of constraints. Named after the Italian-French mathematician Joseph-Louis Lagrange, the method provides a strategy to find maximum or minimum values of a function along one or more
8 min read
Lagrange's Interpolation
What is Interpolation? Interpolation is a method of finding new data points within the range of a discrete set of known data points (Source Wiki). In other words interpolation is the technique to estimate the value of a mathematical function, for any intermediate value of the independent variable. F
7 min read
Linear Regression in Machine learning
Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. It provides valuable insights for prediction and data analysis. This article will explore its types, assumptions, implementation, advantages, and evaluation me
15+ min read
Ordinary Least Squares (OLS) using statsmodels
In this article, we will use Python's statsmodels module to implement Ordinary Least Squares ( OLS ) method of linear regression. Introduction : A linear regression model establishes the relation between a dependent variable( y ) and at least one independent variable( x ) as : [Tex] \hat{y}=b_1x+b_0
4 min read
Regression in Machine Learning