Unit 1
Descriptive Statistics
Descriptive statistics are used to summarize, organize, and simplify data. Their primary function is to
provide a clear and understandable presentation of the data collected from a sample or a
population. This is achieved through numerical measures and graphical representations.
Common numerical methods include measures of central tendency (such as mean, median, and
mode) and variability (such as range, variance, and standard deviation). These help to describe the
typical values and the spread of the data. Additionally, graphical tools such as bar charts, histograms,
pie charts, and boxplots are employed to visually represent data distributions and patterns.
The key point about descriptive statistics is that they deal only with the data at hand. They do not
attempt to draw conclusions or make generalizations beyond the specific group being studied. For
instance, calculating the average exam score of students in a class is a descriptive statistic—it
provides a snapshot of that particular group.
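As a minimal sketch (assuming Python and a made-up list of exam scores, used purely for illustration), the common descriptive measures can be computed as follows:

# Descriptive statistics for a hypothetical set of exam scores
import statistics

scores = [72, 85, 90, 66, 85, 78, 92, 70, 85, 74]   # illustrative data only

print("Mean:            ", statistics.mean(scores))
print("Median:          ", statistics.median(scores))
print("Mode:            ", statistics.mode(scores))
print("Range:           ", max(scores) - min(scores))
print("Sample variance: ", statistics.variance(scores))   # divides by n - 1
print("Sample std. dev.:", statistics.stdev(scores))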
Inferential Statistics
Inferential statistics, on the other hand, go a step beyond description. Their purpose is to make
predictions, decisions, or generalizations about a population based on data obtained from a sample.
Because studying entire populations is often impractical, inferential methods allow researchers to
infer characteristics of a larger group from a subset of it.
This branch of statistics uses methods such as hypothesis testing, confidence intervals, regression
analysis, and analysis of variance (ANOVA) to assess the reliability of conclusions drawn from sample
data. For example, inferential statistics would be used to determine whether a new teaching method
significantly improves student performance compared to a traditional method, based on data from a
sample of classrooms.
A central concept in inferential statistics is probability, which helps quantify the uncertainty
associated with making inferences. Because conclusions are based on incomplete data (a sample
rather than the full population), there is always a risk of error, which is addressed using concepts like
p-values, significance levels, and confidence levels.
Parametric and Non-Parametric Statistics
Parametric and non-parametric statistics represent two distinct approaches to analyzing data in the
behavioral sciences. Parametric statistics are based on several stringent assumptions about the
population from which samples are drawn. These include the assumption that the data come from
populations that are normally distributed and that the variances within each group are equal, a
condition known as homogeneity of variance. Additionally, parametric methods require that the data
be measured at an interval or ratio level. When these assumptions are met, parametric tests are
considered more powerful and efficient. Examples of such tests include the z-test, t-test, ANOVA, and
Pearson's correlation.
Non-parametric statistics, by contrast, do not assume a specific distribution for the population. They
are often referred to as distribution-free methods. These tests are especially useful when data are
measured at nominal or ordinal levels or when sample sizes are small and the assumptions of
parametric tests cannot be justified. Non-parametric tests are based on fewer and less stringent
assumptions. They include methods such as the Mann-Whitney U test, Wilcoxon Signed-Rank test,
Kruskal-Wallis test, and Spearman’s rank-order correlation. While non-parametric tests are easier to
compute and applicable to a broader range of data types, they are typically less powerful than their
parametric counterparts when the latter’s assumptions are satisfied.
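A brief sketch of how such distribution-free tests might be run in practice (assuming Python with SciPy and two small hypothetical samples of ordinal ratings):

# Comparing two unrelated groups without assuming normality
from scipy import stats

group_a = [3, 5, 4, 2, 5, 4, 3]   # illustrative ordinal ratings
group_b = [2, 3, 2, 1, 3, 2, 2]

u_stat, p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_value:.4f}")

# Rank-order association between two ordinal variables
x = [1, 2, 3, 4, 5, 6, 7]
y = [2, 1, 4, 3, 6, 5, 7]
rho, p_rho = stats.spearmanr(x, y)
print(f"Spearman's rho = {rho:.3f}, p = {p_rho:.4f}")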
Testing Hypothesis About Single Means (z and t)
In hypothesis testing for single means, the goal is to determine whether a sample mean significantly
differs from a known or hypothesized population mean. When the population standard deviation is
known and the sample size is large, the z-test is used. However, in most real-world scenarios, the
population standard deviation is unknown, and the sample size may be small. In such cases, the t-
test becomes the appropriate choice.
The t-test allows researchers to estimate the standard error using the sample standard deviation.
This introduces an additional layer of uncertainty, which is accounted for by using the Student’s t-
distribution rather than the standard normal distribution. The outcome of a t-test is a t-value, which
reflects how far the sample mean deviates from the hypothesized population mean in standard error
units. If this t-value falls in the critical region based on the chosen level of significance, the null
hypothesis is rejected.
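A minimal sketch of both approaches, assuming Python with SciPy, hypothetical sample scores, and a hypothesized population mean of 100:

# One-sample z-test (population sigma assumed known) and t-test (sigma unknown)
import math
from scipy import stats

sample = [104, 98, 110, 102, 95, 108, 99, 107, 103, 101]   # illustrative scores
mu_0 = 100                                 # hypothesized population mean
n = len(sample)
mean = sum(sample) / n

# z-test: requires the population standard deviation (assumed here to be 10)
sigma = 10
z = (mean - mu_0) / (sigma / math.sqrt(n))
p_z = 2 * (1 - stats.norm.cdf(abs(z)))     # two-tailed p-value
print(f"z = {z:.3f}, p = {p_z:.4f}")

# t-test: standard error estimated from the sample standard deviation
t_stat, p_t = stats.ttest_1samp(sample, mu_0)   # two-tailed by default
print(f"t = {t_stat:.3f}, p = {p_t:.4f}")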
Random Sampling Distribution of Means
The concept of the sampling distribution of the mean is foundational in inferential statistics. It refers
to the distribution of sample means over repeated sampling from the same population. If we
repeatedly draw samples of the same size from a population and calculate the mean of each, the
distribution of those means forms what is known as the sampling distribution. According to the
Central Limit Theorem, this distribution approximates normality as the sample size increases,
regardless of the population’s distribution. The mean of this sampling distribution equals the
population mean, while its standard deviation—known as the standard error—equals the population
standard deviation divided by the square root of the sample size.
This principle underlies the logic of hypothesis testing, as it enables researchers to estimate how
likely it is to obtain a particular sample mean under the assumption that the null hypothesis is true.
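The idea can be illustrated with a small simulation (a sketch assuming Python with NumPy and a deliberately skewed, artificial population):

# Simulating the sampling distribution of the mean from a skewed population
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)   # clearly non-normal

n = 30                                                  # sample size
sample_means = [rng.choice(population, size=n).mean() for _ in range(5_000)]

print("Population mean:           ", population.mean())
print("Mean of sample means:      ", np.mean(sample_means))        # ~ population mean
print("Theoretical standard error:", population.std() / np.sqrt(n))
print("SD of sample means:        ", np.std(sample_means))         # ~ standard error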
Null and Alternate Hypothesis
In hypothesis testing, two competing statements are formulated: the null hypothesis (H₀) and the
alternate hypothesis (H₁ or Ha). The null hypothesis asserts that there is no effect or no difference—it
is the hypothesis that any observed differences are due to chance. For example, it might state that
the average score of students who took a new learning course is equal to that of students who did
not.
The alternate hypothesis proposes that there is a true effect or a significant difference. It suggests
that any observed difference in the data is not likely due to random variation alone. Hypothesis
testing involves collecting data and evaluating whether the observed outcomes are sufficiently
improbable under the assumption of the null hypothesis, thus providing grounds to reject it in favor
of the alternate.
One-Tailed and Two-Tailed Hypothesis
When formulating hypotheses, researchers must decide whether they are testing for a directional or
non-directional effect. A one-tailed (directional) hypothesis posits that the population parameter
differs from the hypothesized value in a specific direction—either greater than or less than. For
example, a one-tailed hypothesis might predict that a training program increases productivity.
In contrast, a two-tailed (non-directional) hypothesis merely suggests that there is a difference,
without specifying the direction. It is used when the researcher is open to detecting a change in
either direction, such as testing whether a new medication affects blood pressure either positively or
negatively. Statistically, a one-tailed test allocates the entire rejection region to one end of the
distribution, which can increase power if the specified direction is correct. A two-tailed test divides
the rejection region between both ends, making it more conservative but also more flexible when
the direction of the effect is uncertain.
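The distinction shows up directly in how a test is specified. A sketch assuming SciPy (a version recent enough to support the alternative argument) and two hypothetical productivity samples:

# Two-tailed vs. one-tailed tests on the same (hypothetical) data
from scipy import stats

trained   = [34, 38, 41, 36, 39, 42, 37]   # productivity after training
untrained = [33, 35, 36, 34, 37, 35, 34]

# Two-tailed: is there a difference in either direction?
t2, p_two = stats.ttest_ind(trained, untrained, alternative="two-sided")

# One-tailed: is the trained group's mean specifically greater?
t1, p_one = stats.ttest_ind(trained, untrained, alternative="greater")

print(f"two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
# When the observed difference lies in the predicted direction, the
# one-tailed p-value is half the two-tailed value for the same data.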
Characteristics of Student’s Distribution of t
The Student’s t-distribution is a probability distribution used in estimating population parameters
when the sample size is small and the population standard deviation is unknown. Unlike the normal
distribution, the t-distribution is characterized by heavier tails, meaning it has more probability in the
tails and less in the centre. This accounts for the additional variability introduced by estimating the
population standard deviation from the sample. As the sample size increases, the t-distribution
approaches the shape of the normal distribution. This property makes it ideal for small-sample
inference, gradually transitioning to the standard normal as the number of degrees of freedom
increases.
Degrees of Freedom
Degrees of freedom (df) are a crucial concept in statistical testing and influence the shape of the t-
distribution. In the context of a one-sample t-test, the degrees of freedom equal the sample size
minus one (n - 1). This subtraction accounts for the fact that one parameter (the mean) has been
estimated from the data, leaving n - 1 independent pieces of information. In more complex designs,
such as comparing multiple groups or performing regression analysis, the formula for degrees of
freedom changes accordingly. The number of degrees of freedom directly affects the critical values of
the t-distribution and thus the stringency of the test.
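As a small numerical illustration (a sketch assuming SciPy), the two-tailed critical t-value at alpha = .05 shrinks toward the normal-distribution value of about 1.96 as the degrees of freedom grow:

# Critical t-values (two-tailed, alpha = .05) for increasing degrees of freedom
from scipy import stats

for df in (5, 10, 30, 120):
    t_crit = stats.t.ppf(0.975, df)       # upper 2.5% point of the t-distribution
    print(f"df = {df:>3}: critical t = {t_crit:.3f}")

print(f"normal:   critical z = {stats.norm.ppf(0.975):.3f}")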
Assumptions of the t-Test
The validity of conclusions drawn from a t-test depends on several assumptions. First, the data must
be randomly sampled from the population. While perfect random sampling is often impractical,
random assignment in experiments is generally considered an acceptable substitute. Second,
observations must be independent, meaning that the value of one observation does not influence or
relate to another. Third, the population from which the sample is drawn should be normally
distributed. However, thanks to the Central Limit Theorem, this assumption becomes less critical
with larger sample sizes. Finally, in independent samples t-tests, the assumption of homogeneity of
variance must be satisfied—the variances in the two groups being compared should be
approximately equal. While minor deviations from this assumption are tolerable, especially with
equal or large sample sizes, significant differences in variance combined with unequal sample sizes
can distort the results.
Levels of Significance Versus p-values
In hypothesis testing, the level of significance (denoted by alpha, α) represents the threshold at
which we reject the null hypothesis. It reflects the researcher’s tolerance for Type I error—the
probability of rejecting a true null hypothesis. Common significance levels are 0.05, 0.01, and 0.001.
A significance level of 0.05 means there is a 5% risk of concluding that an effect exists when it
actually does not.
The p-value, on the other hand, is the probability of obtaining a result at least as extreme as the
observed one, assuming the null hypothesis is true. Unlike the alpha level, which is chosen before
data collection, the p-value is calculated from the observed data. If the p-value is less than or equal
to the chosen alpha level, the null hypothesis is rejected. This result is said to be statistically
significant. The smaller the p-value, the stronger the evidence against the null hypothesis. While p-
values are widely reported in psychological research, they are often interpreted relative to standard
thresholds (e.g., p < .05, p < .01), rather than as exact values.
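The decision rule itself is simple. A sketch (assuming SciPy and an arbitrary, illustrative observed t-value) of how a two-tailed p-value is obtained and compared with alpha:

# Converting an observed t-value into a two-tailed p-value and comparing with alpha
from scipy import stats

t_obs, df, alpha = 2.30, 24, 0.05                 # illustrative numbers
p_value = 2 * stats.t.sf(abs(t_obs), df)          # sf = 1 - cdf (upper-tail area)

print(f"p = {p_value:.4f}")
if p_value <= alpha:
    print("Reject the null hypothesis (statistically significant).")
else:
    print("Fail to reject the null hypothesis.")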
Unit 2
Type I and Type II Errors
In hypothesis testing, statistical decisions are made based on sample data, which inherently carries a
risk of error. Two primary types of decision errors are recognized: Type I and Type II. A Type I error
occurs when the null hypothesis is rejected even though it is actually true. This kind of error is also
referred to as a false positive, as it falsely indicates the presence of an effect or difference when none
exists. The probability of making a Type I error is denoted by the Greek letter alpha (α), commonly
set at 0.05, meaning there is a 5% risk of rejecting a true null hypothesis.
In contrast, a Type II error arises when the null hypothesis is retained even though it is false. This is a
false negative error, where a real effect or difference is not detected by the statistical test. The
probability of committing a Type II error is represented by beta (β). For example, if a study finds no
difference between two treatments when, in reality, one treatment is more effective, it has
committed a Type II error. These errors illustrate the trade-off inherent in hypothesis testing:
reducing the risk of one type of error often increases the risk of the other.
Power of a Test
The power of a statistical test refers to the probability that it will correctly reject a false null
hypothesis. This concept is mathematically represented as (1 - β), where β is the probability of a Type
II error. Power is a crucial aspect of experimental design because it determines the likelihood of
detecting an effect if one truly exists. A test with low power may fail to identify meaningful
differences, while a test with high power is more reliable in confirming real effects.
Several factors influence statistical power. These include the sample size, the significance level (α),
the true effect size, and whether the test is one-tailed or two-tailed. Increasing the sample size and
choosing a one-tailed test (if justified) typically increases power. Lower variability in the data and a
larger true difference between means also contribute to higher power. Researchers are often advised
to aim for a power of 0.80 or higher, meaning there is an 80% chance of correctly rejecting a false
null hypothesis.
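Power can also be approximated by simulation. A sketch assuming Python with NumPy and SciPy, a true mean difference of half a standard deviation, and 30 participants per group (all values chosen for illustration):

# Monte Carlo estimate of power for an independent-samples t-test
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, effect_size, alpha, n_sims = 30, 0.5, 0.05, 5_000

rejections = 0
for _ in range(n_sims):
    group1 = rng.normal(loc=0.0, scale=1.0, size=n)
    group2 = rng.normal(loc=effect_size, scale=1.0, size=n)   # a true difference exists
    _, p = stats.ttest_ind(group1, group2)
    if p <= alpha:
        rejections += 1

print(f"Estimated power: {rejections / n_sims:.2f}")   # just under 0.5 for these settings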
Difference Between Two Means: t-test for Independent and Dependent Groups
When comparing the means of two groups, researchers use variations of the t-test depending on the
study design. The t-test for independent groups is applied when the two groups being compared are
separate and unrelated. For example, if one group of students uses a new teaching method while
another uses a traditional one, and we want to compare their average test scores, an independent t-
test is appropriate. This test evaluates whether the observed difference in group means is statistically
significant, assuming equal variances between the groups. It uses a pooled estimate of variance,
combining information from both samples to provide a more accurate estimate of the population
variance.
In contrast, the t-test for dependent (or paired) groups is used when the same subjects are measured
under two conditions, or when pairs of subjects are closely matched. An example is measuring a
group’s performance before and after an intervention. Since the scores are linked, the test focuses
on the differences within pairs rather than between independent groups. This often results in higher
statistical power because it removes variability between subjects as a source of error.
Both types of t-tests involve calculating a t-value, which is then compared to a critical value from the
t-distribution based on the chosen significance level and degrees of freedom. If the calculated t-value
exceeds the critical value, the null hypothesis (that there is no difference between means) is
rejected.
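A compact sketch of both designs, assuming SciPy and hypothetical score data:

# Independent-samples vs. paired-samples t-tests
from scipy import stats

# Two unrelated groups taught by different methods
new_method  = [78, 85, 92, 88, 76, 81, 90, 84]
traditional = [72, 80, 75, 79, 70, 77, 74, 78]
t_ind, p_ind = stats.ttest_ind(new_method, traditional)   # pooled variance by default
print(f"independent t = {t_ind:.3f}, p = {p_ind:.4f}")

# The same people measured before and after an intervention
before = [60, 64, 58, 71, 66, 63, 69, 62]
after  = [66, 67, 63, 74, 70, 66, 73, 65]
t_rel, p_rel = stats.ttest_rel(before, after)             # works on the pair differences
print(f"paired t = {t_rel:.3f}, p = {p_rel:.4f}")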
Confidence Intervals
A confidence interval provides a range within which the true population parameter is likely to fall,
given the observed sample data. Unlike a point estimate, which provides a single value (such as a
sample mean), a confidence interval offers a range of plausible values for the parameter, along with a
confidence level—commonly 95%. For example, if a study estimates the average number of hours
people exercise per week to be 5, with a 95% confidence interval from 4.6 to 5.4, it implies that if the
study were repeated many times, 95% of the intervals would contain the true population mean.
Confidence intervals are constructed using the formula:
Point Estimate ± (Critical Value × Standard Error).
The point estimate is usually the sample mean, the critical value depends on the chosen confidence
level (e.g., 1.96 for 95% under normal distribution), and the standard error reflects the variability in
the estimate. The width of the confidence interval is influenced by the sample size and the variability
in the data: larger samples and less variability produce narrower intervals, indicating greater
precision.
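The formula can be applied directly. A sketch assuming Python with SciPy and a small hypothetical sample of weekly exercise hours:

# 95% confidence interval for a mean: point estimate +/- (critical value x standard error)
import math
import statistics
from scipy import stats

hours = [4.5, 5.2, 5.0, 4.8, 5.6, 4.9, 5.3, 4.7, 5.1, 5.4]   # illustrative data
n = len(hours)
mean = sum(hours) / n
se = statistics.stdev(hours) / math.sqrt(n)    # sample SD (n - 1) over sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)          # two-tailed critical value for 95%

lower, upper = mean - t_crit * se, mean + t_crit * se
print(f"mean = {mean:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")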
Importantly, confidence intervals and hypothesis testing are closely related. If a confidence interval
for a mean difference does not include zero, it implies that the difference is statistically significant at
the corresponding alpha level. This dual role makes confidence intervals valuable for both estimation
and decision-making in inferential statistics.
Unit 3
One-Way ANOVA: Assumptions and Calculation
One-Way Analysis of Variance (ANOVA) is a statistical method used when comparing the means of
more than two independent groups. The objective is to determine whether the observed differences
among the sample means are statistically significant or if they are likely due to random variation. The
underlying logic is based on partitioning the total variance in the data into two components: variance
between groups (attributable to treatment or group differences) and variance within groups
(attributable to random error or individual differences).
For the ANOVA results to be valid, certain assumptions must be satisfied. The first assumption is
normality, which states that the scores within each group should be drawn from normally distributed
populations. Thanks to the Central Limit Theorem, minor deviations from normality are generally
acceptable, especially with larger sample sizes. The second assumption is homogeneity of variances,
meaning that the variances in each of the groups being compared should be approximately equal.
Finally, ANOVA assumes the independence of observations, indicating that each subject's score is not
influenced by the scores of others.
The F-statistic in ANOVA is computed by dividing the mean square between groups by the mean
square within groups. If this ratio is significantly larger than 1, it suggests that the group means are
not all equal, leading to the rejection of the null hypothesis that all group means are the same.
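A minimal sketch of the test, assuming SciPy and three hypothetical independent groups:

# One-way ANOVA comparing three independent groups
from scipy import stats

group1 = [23, 25, 28, 22, 26, 27]
group2 = [30, 29, 33, 31, 28, 32]
group3 = [24, 26, 25, 27, 23, 28]

f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
# A large F (mean square between / mean square within) with p below alpha
# leads to rejecting the hypothesis that all group means are equal.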
Comparison of t and F Statistics
The t and F statistics are both used to test hypotheses about group means, but they differ in their
scope and application. The t-test is typically used when comparing the means of two groups. It
evaluates whether the observed difference between two sample means is statistically significant. The
F-test, on the other hand, is used when there are more than two groups. It assesses whether there is
a significant difference among three or more group means simultaneously.
Mathematically, the square of the t-statistic (when comparing two groups) is equivalent to the F-
statistic. This means that if you perform a one-way ANOVA on two groups, the F-value you obtain will
be equal to the square of the t-value from an independent samples t-test. Despite this mathematical
connection, the interpretation and use of the two tests differ. The t-test provides a direct comparison
between two groups, while the F-test determines whether at least one group differs from the others
without specifying which one.
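The relationship can be verified numerically. A sketch assuming SciPy and any two hypothetical samples:

# With exactly two groups, F from a one-way ANOVA equals the square of t
from scipy import stats

a = [12, 15, 14, 10, 13, 16, 11]
b = [18, 17, 20, 16, 19, 21, 17]

t_stat, _ = stats.ttest_ind(a, b)       # pooled-variance t-test
f_stat, _ = stats.f_oneway(a, b)

print(f"t^2 = {t_stat**2:.4f},  F = {f_stat:.4f}")   # the two values match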
Post Hoc Comparisons: Conceptual Understanding
Post hoc comparisons are additional statistical tests conducted after obtaining a significant F-ratio in
ANOVA. These comparisons aim to identify exactly which groups differ from each other. The term
"post hoc" means "after the fact," and such tests are necessary because the F-test only indicates that
at least one group mean is different without revealing where the difference lies.
Conducting multiple comparisons increases the risk of committing Type I errors—that is, incorrectly
rejecting the null hypothesis. To address this, post hoc tests incorporate corrections that adjust the
criteria for statistical significance. Common post hoc tests include Tukey’s Honestly Significant
Difference (HSD), Scheffé’s test, and the Newman–Keuls procedure. These tests differ in their
stringency: Tukey's HSD balances the need to detect real differences without inflating the Type I error
rate, whereas Scheffé's test is more conservative and controls the error rate even more strictly.
The choice of post hoc test often depends on the research context, sample sizes, and the number of
comparisons. A crucial prerequisite for conducting post hoc tests is that the overall F-test must first
be significant. Only then is it appropriate to explore which specific group differences are driving the
overall effect.
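A sketch of this two-step logic, assuming a reasonably recent SciPy that provides stats.tukey_hsd and the same hypothetical groups used above:

# Tukey's HSD follow-up after a significant one-way ANOVA
from scipy import stats

group1 = [23, 25, 28, 22, 26, 27]
group2 = [30, 29, 33, 31, 28, 32]
group3 = [24, 26, 25, 27, 23, 28]

f_stat, p_value = stats.f_oneway(group1, group2, group3)
if p_value < 0.05:                          # only probe pairs if the overall F is significant
    result = stats.tukey_hsd(group1, group2, group3)
    print(result)                           # pairwise differences with adjusted p-values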
Chi-Square as a Measure of Discrepancy Between Expected and Observed Frequencies
The chi-square test is a non-parametric statistical procedure used to examine the discrepancy
between observed and expected frequencies in categorical data. It is especially useful in evaluating
hypotheses about distributions of frequencies across different categories or testing the
independence between two categorical variables. The chi-square statistic is calculated by summing
the squared differences between observed (fo) and expected (fe) frequencies, divided by the
expected frequencies for each category.
This statistic evaluates whether the differences between what was observed and what was expected
are too large to be attributed to chance. If the calculated chi-square value exceeds the critical value
from the chi-square distribution table (based on the degrees of freedom and significance level), the
null hypothesis is rejected. This implies that there is a statistically significant difference between the
observed and expected frequencies, suggesting that the observed pattern is unlikely due to chance
alone.
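A sketch of the computation, assuming SciPy and hypothetical observed counts tested against equal expected frequencies:

# Chi-square goodness-of-fit: observed vs. expected frequencies
from scipy import stats

observed = [30, 14, 34, 45, 57, 20]              # e.g., counts across six categories
expected = [sum(observed) / len(observed)] * 6   # equal frequencies under H0

chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
chi2_scipy, p_value = stats.chisquare(observed, f_exp=expected)

print(f"chi-square = {chi2:.2f} (SciPy: {chi2_scipy:.2f}), p = {p_value:.4f}")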
Assumptions and Calculations for the Chi-Square Test
Although often labeled as a non-parametric test, the chi-square test does rest on several
assumptions. First, the data should be collected from a random sample, and the observations should
be independent of one another. This means that the occurrence of one observation in a category
should not influence the probability of another observation appearing in any category. Second, the
expected frequency in each cell of the contingency table should be at least five. If this condition is
not met, particularly in small samples, a correction such as Yates' correction for continuity should be
applied, or Fisher’s exact test may be considered as an alternative.
The calculation of the chi-square statistic involves several straightforward steps. First, determine the
expected frequencies for each category under the null hypothesis. Then, subtract the expected from
the observed frequencies, square the difference, and divide by the expected frequency. Summing
these values across all categories yields the chi-square statistic. This value is then compared to a
critical value from the chi-square distribution table. If the calculated value exceeds the critical
threshold, the null hypothesis is rejected. The degrees of freedom for the test are calculated as the
number of categories minus one for a goodness-of-fit test or as (rows − 1) × (columns − 1) for a test
of independence.
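For a test of independence on a contingency table, a sketch assuming SciPy and invented counts:

# Chi-square test of independence for a 2 x 3 contingency table
from scipy import stats

table = [[20, 15, 25],     # rows: e.g., two groups
         [30, 25, 35]]     # columns: e.g., three response categories

chi2, p_value, df, expected = stats.chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, df = {df}, p = {p_value:.4f}")
# df = (rows - 1) * (columns - 1) = (2 - 1) * (3 - 1) = 2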