Statistics Notes

The document discusses inferential statistics, focusing on the Independent Sample t-test and Dependent Sample t-test, which are used to compare means from different populations or related samples, respectively. It outlines the assumptions, formulas, and applications of both tests, as well as the concept of confidence intervals (CIs) that provide a range of values for an unknown parameter with a specified confidence level. Additionally, it highlights the advantages of using confidence intervals over traditional hypothesis testing in statistical analysis.

INFERENTIAL STATISTICS

INDEPENDENT SAMPLE t-TEST

In the Independent Sample t-test, we compare two samples drawn from two different populations that are independent of each other. Under the null hypothesis, μX − μY = 0.

The Independent Samples t Test compares the means of two independent groups in order to determine whether
there is statistical evidence that the associated population means are significantly different. The Independent
Samples t Test is a parametric test.

Independent samples are samples drawn so that the selection of the elements comprising one sample (the Y scores) is in no way influenced by the selection of the elements comprising the other (the X scores), and vice versa.

Formula:
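The formula itself is not reproduced in these notes; a standard pooled-variance form (which assumes homogeneity of variance, with nX and nY scores in the two samples) is:

$$ t = \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{s_{\bar{X}-\bar{Y}}}, \qquad s_{\bar{X}-\bar{Y}} = \sqrt{s_p^2\left(\frac{1}{n_X} + \frac{1}{n_Y}\right)}, \qquad s_p^2 = \frac{(n_X - 1)s_X^2 + (n_Y - 1)s_Y^2}{n_X + n_Y - 2} $$

with df = nX + nY − 2, and μX − μY = 0 under the null hypothesis.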

The Independent Samples t Test is commonly used to test the following:

- Statistical differences between the means of two groups


- Statistical differences between the means of two interventions
- Statistical differences between the means of two change scores
ASSUMPTIONS

- Each sample is drawn at random from its respective population.

Under the normal curve model, the ideal condition requires that we know σX and σY. Use of the t statistic and evaluation of the outcome in terms of the t distribution avoids this requirement: the t distribution makes allowance for the estimation of σX and σY from the samples.

- Two random samples must be independently selected.

Example independent variables that meet this criterion include gender (2 groups: male or female),
employment status (2 groups: employed or unemployed), smoker (2 groups: yes or no), and so forth.

- Samples are drawn at random with replacement.

When we sample with replacement, the two sample values are independent. This means that what we get
on the first one doesn't affect what we get on the second. Mathematically, this means that the covariance
between the two is zero.

In sampling without replacement, the two sample values aren't independent. Practically, this means that what we got for the first one affects what we can get for the second one. Mathematically, this means that the covariance between the two isn't zero, and the resulting computation becomes complex.

- The sampling distribution of (X̄ − Ȳ) follows the normal curve.

Another requirement is that the sampling distribution of (X̄ − Ȳ) follows the normal distribution. The
sampling distribution will be normal only when the distribution of the population of scores is also
normal. However, according to the central limit theorem, the sampling distribution of the mean tends
toward normality, even when the population of scores is not normally distributed. The strength of this
tendency toward normality is pronounced when samples are large but is less so when samples are small.
(The effect of central limit theorem is rather pronounced unless sample size is quite small)

- The distributions should satisfy the assumption of homogeneity of variance.

This assumption applies only to inference about the difference between two independent means. For the t
distribution to be exactly applicable, homogeneity of variance is assumed. Although this assumption sounds formidable, several practical considerations make it less troublesome than it appears. First, experience suggests that the assumption of homogeneity of variance is reasonably well satisfied in numerous cases. Second, violation of the assumption causes less disturbance when samples are large than when they are small (as a rule of thumb, a moderate departure from homogeneity of variance will have little effect when each sample consists of 20 or more observations). Finally, the problem created by heterogeneity of variance is minimized when the two samples are of equal (or nearly equal) size.
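As a brief illustration, here is a minimal Python sketch (using SciPy, with invented scores for two independent groups): scipy.stats.ttest_ind applies the pooled-variance form by default, and equal_var=False switches to Welch's version when homogeneity of variance is doubtful.

import numpy as np
from scipy import stats

# hypothetical scores for two independently selected groups
group_x = np.array([12, 15, 14, 10, 13, 16, 11, 14])
group_y = np.array([18, 14, 17, 16, 19, 15, 17, 18])

# Levene's test gives a rough check on the homogeneity-of-variance assumption
levene_stat, levene_p = stats.levene(group_x, group_y)

# pooled-variance independent-samples t test (assumes homogeneity of variance)
t_pooled, p_pooled = stats.ttest_ind(group_x, group_y)

# Welch's t test, which does not assume equal variances
t_welch, p_welch = stats.ttest_ind(group_x, group_y, equal_var=False)

print(f"Levene p = {levene_p:.3f}")
print(f"pooled t = {t_pooled:.3f}, p = {p_pooled:.3f}")
print(f"Welch t  = {t_welch:.3f}, p = {p_welch:.3f}")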

DEPENDENT SAMPLE t-TEST

A dependent-samples design is one in which measurements in one sample are related to measurements in the other sample. There are
three basic ways in which dependent samples may be generated: (1) the same subjects are used for both
conditions of the study; (2) different subjects are used, but they are matched by the experimenter on some
variable related to performance on the variable being observed; and (3) natural pairings.

Repeated Measure design: It is a design in which observations are measured on the same subject under two or
more conditions. The potential advantage of this design is that it can reduce differences between the two groups
of scores due to random sampling. Thus, for a given pair of observations, the value of Y is in part determined by
(or related to) the particular value of X, and the samples cannot be said to be independent.

Matched Subject design: Samples may be dependent even when different subjects have been used. When
observations are paired this way, the value of any particular Y score will be in part related to the value of its
paired X score, and so the values of X and Y cannot be said to be completely independent. The matched subject
design is a design in which subjects from two or more samples are paired/matched by the experimenter. In a
matched-subjects design, it is not always necessary that the experimenter match subjects on some variable to
form pairs. Some pairs of observations occur naturally in the population. For example, investigators may choose to
study identical twins, fathers and sons, mothers and daughters, or husbands and wives. Many people would
prefer to call such studies matched-pairs investigations (rather than “matched-pairs experiments”).

Formula:
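The formula is not reproduced in these notes; the usual form, based on the difference scores D = X − Y computed for each of the n pairs, is:

$$ t = \frac{\bar{D} - \mu_D}{s_{\bar{D}}}, \qquad s_{\bar{D}} = \frac{s_D}{\sqrt{n}}, \qquad df = n - 1 $$

where μD = 0 under the null hypothesis of no difference.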


ASSUMPTIONS

Unlike the test of the difference between two independent means, the assumption of homogeneity of
variance is not required for the test between two dependent means.

- Hypothesis testing still assumes that the sample has been drawn at random with replacement and that the sampling distribution of (X̄ − Ȳ), or equivalently of D̄, is normally distributed.

Our choice of sample size is an important design consideration. In the behavioural sciences, most
dependent variables are not distributed normally. Researchers and statisticians talk about the dependent
t-test only requiring approximately normal data because it is quite "robust" to violations of normality,
meaning that the assumption can be a little violated and still provide valid results.
When sample size is large (n > 25), the central limit theorem tells us that the sampling distribution of D̄
should approximate the normal curve regardless of what the original population of scores look like. On
the other hand, if sample size is small (<25) and we have grounds to believe that the parent population is
not normal in shape, a ‘nonparametric’ statistical test may be a better alternative than t.
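A minimal Python sketch (invented pre/post scores for the same subjects) showing that the dependent-samples test is equivalent to a one-sample t test on the difference scores; scipy.stats.ttest_rel handles the pairing directly.

import numpy as np
from scipy import stats

# hypothetical paired measurements: the same subjects before and after a treatment
pre = np.array([24, 31, 28, 22, 30, 27, 25, 29])
post = np.array([27, 33, 27, 26, 34, 30, 27, 31])

# dependent (paired) samples t test
t_paired, p_paired = stats.ttest_rel(post, pre)

# equivalent one-sample t test on the difference scores D = post - pre
d = post - pre
t_diff, p_diff = stats.ttest_1samp(d, popmean=0)

print(t_paired, p_paired)  # identical to t_diff, p_diff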

CONFIDENCE INTERVAL

In statistics, a confidence interval (CI) is a type of estimate computed from the statistics of the observed data.
This gives a range of values for an unknown parameter (for example, a population mean). The interval has an
associated confidence level that gives the probability with which the estimated interval will contain the true
value of the parameter. The confidence level is chosen by the investigator. For a given estimation in a given
sample, using a higher confidence level generates a wider (i.e., less precise) confidence interval. In general terms,
a confidence interval for an unknown parameter is based on the sampling distribution of a corresponding estimator.

Through the confidence interval, we express how confident an investigator or researcher is that the constructed interval contains the population parameter.

When a researcher constructs intervals according to the standard rule, it is proper to say that the probability is .XX that an interval so constructed will include μX. However, once the specific limits are established for a given set of data, the interval thus obtained either does or does not cover μX. It is not proper to say that the probability is .XX that μX lies within the interval. Consequently, it is usual to substitute the term confidence for probability in speaking of a specific interval. The limits of the interval are referred to as confidence limits, and the statement of degree of confidence is called the confidence coefficient, C.

Let us substitute .95 for .XX, corresponding to the previous explanation. When a researcher says that he is 95% confident, he means that he does not know whether the particular interval covers μX, but that when intervals are constructed according to the rule, 95 of every 100 of them (on the average) will include μX. Since μX is a fixed value and does not vary, it is the interval that varies from estimate to estimate, not the value of μX. Investigators may expect that 95% of estimates will include μX within their range when t_p is selected to agree with C = .95.

The locations of the intervals vary because the sample means vary. The widths of the intervals differ, too. This is
because interval width depends in part on the varying estimates of the population standard deviation obtained
from the several samples. Factors affecting the width of the confidence interval include the size of the sample,
the confidence level, and the variability in the sample.

It is also important to bear in mind that s_X (and consequently s_X̄, the estimated standard error of the mean) is affected by sample size. Thus, for a given confidence coefficient, a small sample results in a wide confidence interval and a large sample results in a narrower one.

Formula:

One-sample t test: CI = X̄ ± t_p · s_X̄

Independent-samples t test: CI = (X̄ − Ȳ) ± t_p · s_(X̄−Ȳ)

*here, t_p refers to the critical value of t, which depends on the chosen confidence level and the degrees of freedom.
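A small Python sketch (sample values invented) of the one-sample version of this formula, with t_p taken from the t distribution with n − 1 degrees of freedom:

import numpy as np
from scipy import stats

scores = np.array([101, 97, 110, 104, 99, 106, 103, 108, 95, 102])  # hypothetical scores
n = len(scores)
mean = scores.mean()
se = scores.std(ddof=1) / np.sqrt(n)  # estimated standard error of the mean

confidence = 0.95
t_p = stats.t.ppf((1 + confidence) / 2, df=n - 1)  # critical value of t

lower, upper = mean - t_p * se, mean + t_p * se
print(f"{confidence:.0%} CI: ({lower:.2f}, {upper:.2f})")

# SciPy can produce the same interval directly
print(stats.t.interval(confidence, df=n - 1, loc=mean, scale=se))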

Example:

Average Height

We measure the heights of 40 randomly chosen men and get a mean height of 175 cm. We also know that the standard deviation of men's heights is 20 cm. The 95% confidence interval works out to:
175 cm ± 6.2 cm

This says the true mean of ALL men (if we could measure all their heights) is likely to be between
168.8cm and 181.2cm. However, it might not be the case. The "95%" says that 95% of experiments
like we just did will include the true mean, but 5% won't. Therefore, there is a 1-in-20 chance (5%)
that our Confidence Interval does NOT include the true mean.
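The ±6.2 cm margin can be reproduced from the normal-curve formula, since σ = 20 cm is treated as known here and z = 1.96 cuts off the middle 95% of the normal curve:

$$ \text{margin} = z \cdot \frac{\sigma}{\sqrt{n}} = 1.96 \times \frac{20}{\sqrt{40}} \approx 1.96 \times 3.16 \approx 6.2 \text{ cm}, \qquad \text{CI} = 175 \pm 6.2 \text{ cm} $$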

Ɯ RELATION BETWEEN CI AND HYPOTHESIS TESTING

Confidence intervals and hypothesis testing are two sides of the same coin. For most population parameters, the confidence interval contains all values of the parameter specified by H0 that would be retained had they been tested using α = 1 − C (for a nondirectional alternative hypothesis).

If the value specified in H0 falls within the interval, H0 would be retained in hypothesis testing, whereas if that value falls outside the interval, it would be rejected.

If a 95% confidence interval is constructed, all values in the interval are considered possible for the parameter being estimated; a value specified by the null hypothesis that falls outside the 95% interval would be rejected at the .05 alpha level. As an illustration, consider a problem in which X̄ = 112, s_X̄ = 5 and n = 25. When the null hypothesis H0: μX = 100 is tested, t = 12/5 = 2.4; the difference is significant at the .05 level (t_crit = ±2.064 for df = 24). If we calculate a confidence interval from exactly the same data, we get C(101.68 ≤ μX ≤ 122.32) = .95. This interval estimate says that the value of μX exceeds 100 by 1.68 to 22.32 points. The interval does not include the possibility that μX = 100, the same message conveyed by the hypothesis test. In addition, the confidence interval informs us what the true difference could be.
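As a check on these numbers, the margin of error and interval can be worked out directly:

$$ \text{margin} = t_{\text{crit}} \cdot s_{\bar{X}} = 2.064 \times 5 = 10.32, \qquad \text{CI} = 112 \pm 10.32 = (101.68,\ 122.32) $$

so μX is estimated to exceed 100 by between 1.68 and 22.32 points, in agreement with the rejection of H0: μX = 100.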

On the other hand, if a 99% confidence interval is constructed, all values in the interval are considered possible for the parameter being estimated; a value specified by the null hypothesis that falls outside the 99% interval would be rejected at the .01 alpha level. If a confidence interval reads C(−16.70 ≤ μX − μY ≤ +6.70) = .99, zero is among the possible values that the difference might take. If, instead of constructing a confidence interval, the researcher had tested the null hypothesis of no difference at α = .01, the test would have produced a t smaller than the value needed to reject the hypothesis.

It may be convenient to make the general correspondence that parameter values within a
confidence interval are equivalent to those values that would not be rejected by a hypothesis test,
but this would be dangerous. In many instances the confidence intervals that are quoted are only
approximately valid, perhaps derived from "plus or minus twice the standard error," and the
implications of this for the supposedly corresponding hypothesis tests are usually unknown.

It is worth noting that the confidence interval for a parameter is not the same as the acceptance
region of a test for this parameter, as is sometimes thought. The confidence interval is part of the
parameter space, whereas the acceptance region is part of the sample space. For the same reason,
the confidence level is not the same as the complementary probability of the level of significance.

Some statisticians have suggested that we use confidence intervals to test hypotheses. However,
when used in a particular way, confidence intervals have the same Type I and Type II error rates
as does null hypothesis significance testing.

ADVANTAGES OF CONFIDENCE INTERVALS

The behavioral sciences employ hypothesis testing much more often than confidence intervals,
but in many studies confidence intervals are superior. Some advantages of confidence intervals (both on their own and in comparison with hypothesis testing) are:

1. The final quantitative output of a confidence interval is a statement about the parameter of
interest. In hypothesis testing, the statement is about a derived score, such as z or t, or about a
probability, p. In both forms of inference, the question is about the parameter. In estimation,
unlike hypothesis testing, it is right before our eyes.

2. A confidence interval straightforwardly shows the influence of random sampling variation and,
in particular, sample size. For a given level of confidence, a small sample results in a wide
confidence interval and a large sample results in a narrower one. The width of the interval gives
the investigator a direct indication of whether the estimate is precise enough for the purpose at
hand.
For hypothesis testing, significance is subject to two influences that cannot be untangled without
further study: (1) the difference between what was hypothesized and what is true and (2) the
amount of sampling variation present. For example, a large value of t could occur in a study
where the true difference was small but the sample was large, or where the true difference was
large and the sample was small. Unless we look at the descriptors and n, we do not know which
one it is.

3. In hypothesis testing, it is easy to confuse a statistically significant difference with an important one. Essentially, this problem disappears when we use a confidence interval. Suppose that for a sample of IQ scores, X̄ = 103, s_X = 20, and n = 1600. If we test the hypothesis that μX = 100, we obtain a t of approximately +6 with a corresponding probability < .000000001. However, the 95% confidence interval for μX is C(102 ≤ μX ≤ 104) = .95, which brings us back to reality. It is up to
us to decide whether this result is important.

4. The outcome of testing a hypothesis is to declare that the one specific condition stated in the
null hypothesis could be true (Ho is retained) or is unlikely to be true (H0 is rejected). A
confidence interval, on the other hand, emphasizes the existence of a range of values that might
characterize the parameter in question. Where we can, it seems desirable to recognize the variety
of possible outcomes rather than develop a “yes–no” mindset toward one particular possibility.

5. Confidence intervals provide information about a range in which the true value lies with a
certain degree of probability, as well as about the direction and strength of the demonstrated
effect. This enables conclusions to be drawn about the statistical plausibility and clinical
relevance of the study findings.

6. CI informs the investigator and the reader about the power of the study and whether or not the
data are compatible with a clinically significant treatment effect. The width of the CI is an
indication of the precision of the point estimate – a narrower CI indicates a more precise
estimate, while a wide CI indicates a less precise estimate.

7. The lower limit of a CI, which is the limit closest to the null value, is typically used for
hypothesis testing. The higher limit, the limit furthest from the null value, can be used to indicate
whether or not a treatment effect or association is compatible with the data.

8. Confidence intervals also provide a more appropriate means of analysis for studies that seek to
describe or explain, rather than to make decisions about treatment efficacy. The logic of
hypothesis testing uses a decision-making mode of thinking which is more suitable to
randomized, controlled trials (RCTs) of health care interventions.

9. Confidence intervals attach a measure of accuracy to a sample statistic. Determining the accuracy of a point estimate is not possible. A statistic such as a sample mean is just one estimate
of the population mean, and, because the population mean is nearly always unknown, it is not
possible to know how good the estimate is.
10. The strengths of CIs become particularly evident in meta-analysis. Displaying the CI for each
study on what is known as a forest plot illustrates clearly the relative merits of the separate
studies. Those studies that are based on larger samples have correspondingly narrower CIs, and
the CI for the total odds ratio is the narrowest, as this CI is based on the aggregated sample.

UNIT-3
ANOVA

In the 1920s, Sir Ronald Fisher developed a better answer to the problems that arise when techniques take into consideration only the information provided by two groups at a time. The technique is called analysis of variance, abbreviated ANOVA. It allows us to compare several means simultaneously, with the level of significance specified by the investigator.

The one-way analysis of variance (ANOVA) is used to determine whether there are any statistically
significant differences between the means of three or more independent (unrelated) groups.

To complete the analysis of variance, we require a method of comparing s²_bet with s²_W. This is the F ratio, a statistic named in honor of Ronald Fisher. The F ratio is formed by the ratio of the two independent variance estimates:

F = s²_bet / s²_W

In one-way analysis of variance, there may be two or more treatment conditions, often referred to as different levels of the categorical independent variable, or factor. The number of treatment conditions is represented by k. The k treatments may be identified by letters, such as A, B, and C, and the corresponding population means as μA, μB, and μC.

The null hypothesis in ANOVA is often referred to as an omnibus hypothesis (i.e., covering many
situations at once), and ANOVA itself as an omnibus test. The alternative hypothesis, often stated
simply as “not Ho” is that the population means are different in some way. For example, two or more
group means may be alike and the remainder different, all may be different, and so on. The distinction between directional and nondirectional alternative hypotheses no longer makes sense when the number of means exceeds two.

COMPARISON OF F AND t

The t-test and analysis of variance (ANOVA) are two parametric statistical techniques used to test hypotheses. Because both rest on common assumptions (the population from which the sample is drawn is normally distributed, homogeneity of variance, random sampling of data, independence of observations, and measurement of the dependent variable on the interval or ratio level), the two are often confused.

One-way analysis of variance (ANOVA) is also known as the F-test, and the expression F = t² (for two groups) shows the relationship between ANOVA and t. Like the t distribution, the F distribution is actually a family of curves depending on the degrees of freedom. This distribution is positively skewed, which is intuitively reasonable: if the estimate of σ² in the numerator is smaller than that
in the denominator, F will be less than 1.00 but not less than zero (because variance estimates are
never negative). But if the estimate in the numerator is larger than that in the denominator, the F
ratio may be much larger than 1.00. The null hypothesis of equality of population means will be
rejected only if the calculated value of F is larger than that expected through random sampling if
the hypothesis is true. Consequently, the region of rejection is entirely in the upper tail of the F
distribution.

ANOVA is closely related to the t test and, for the special case of two groups, leads to exactly the
same conclusions as the t test. In fact, you can use ANOVA instead of t in the two-sample,
independent-groups design.
t = (obtained difference between sample means − hypothesized difference between population means) / (estimated standard error of the difference between the two means)

F = (variance estimate based on the differences between sample means) / (variance estimate within samples)

In the formulas of both F and t, the denominator reflects variation expected due to chance alone, that is, with no treatment effect. The main difference is that t is expressed in terms of an estimated standard error (a standard deviation), whereas F is expressed in terms of variance estimates.

The t-test is a method that determines whether two populations are statistically different from
each other, whereas ANOVA determines whether three or more populations are statistically
different from each other.

Like the t-test, ANOVA is used to test hypotheses about differences in the average values of some
outcome between two groups; however, while the t-test can be used to compare two means or
one mean against a known distribution, ANOVA can be used to examine differences among the
means of several different groups at once. More generally, ANOVA is a statistical technique for
assessing how nominal independent variables influence a continuous dependent variable.

It can be concluded that the t-test is a special case of ANOVA that can be used when we have only two population means to compare. If the t-test is used to compare more than two means concurrently, the chance of error (Type I error) increases, which is why ANOVA is used instead.
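A quick Python sketch (with arbitrary made-up data) illustrating the F = t² relationship for the special two-group case:

import numpy as np
from scipy import stats

# two hypothetical independent groups
a = np.array([5.1, 6.0, 5.5, 4.8, 6.2, 5.7])
b = np.array([6.4, 7.1, 6.8, 7.5, 6.9, 7.2])

t_stat, p_t = stats.ttest_ind(a, b)  # pooled-variance t test
f_stat, p_f = stats.f_oneway(a, b)   # one-way ANOVA on the same two groups

print(f"t^2 = {t_stat**2:.4f}, F = {f_stat:.4f}")     # these agree
print(f"p from t = {p_t:.4f}, p from F = {p_f:.4f}")  # and so do the p values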

THE BASIS OF ANOVA


- Within-groups variation (s²_W)

Within each sample, the individual scores vary around the sample mean. This is known as within-groups variation. Within-groups variation is a direct reflection of the inherent
variation (variation due to chance) among individuals given the same treatment. We can
present exactly the same stimulus to everyone in a group and still observe variation in
reaction times.

Each sample is looked at on its own. In other words, no interactions between samples are
considered.

Within-groups variation reflects only inherent variation. It does not reflect differences caused
by differential treatment. The reason is that within a particular treatment group, each subject
gets the same treatment.

- Between-groups variation (s²_bet)

The between-groups variation is a reflection of inherent (chance) variation among individuals. It is the variation among the means of the different treatment conditions (here,
means can vary because of inherent variation and treatment effect, if any exists).

Unlike within-groups variation, where the focus is on how individual scores differ from their own group mean, between-groups variation is concerned with how the means of the groups differ from each other.
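To make the two variance estimates concrete, here is a Python sketch (equal group sizes assumed, data invented) that computes s²_W and s²_bet by hand and checks the resulting F ratio against SciPy's one-way ANOVA:

import numpy as np
from scipy import stats

# three hypothetical treatment conditions, n = 5 scores each
groups = [np.array([4.0, 5.0, 6.0, 5.5, 4.5]),
          np.array([6.0, 7.0, 8.0, 7.5, 6.5]),
          np.array([5.0, 5.5, 6.5, 6.0, 5.5])]
n = len(groups[0])  # observations per group (equal n)

# within-groups estimate: pooled variance of scores around their own group means
s2_within = np.mean([g.var(ddof=1) for g in groups])

# between-groups estimate: n times the variance of the group means
group_means = np.array([g.mean() for g in groups])
s2_between = n * group_means.var(ddof=1)

F = s2_between / s2_within
F_check, p = stats.f_oneway(*groups)
print(F, F_check, p)  # the two F values agree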

ASSUMPTIONS ABOUT ANOVA

The analysis of variance can be presented in terms of a linear model, which makes the following
assumptions about the probability distribution of the responses. Since F = t², the assumptions
underlying the F test for one-way ANOVA are the same as those for the t test for independent
samples.

- The populations are normally distributed.

According to the central limit theorem, sample means tend to cluster around the central population value, and when sample size is large the distribution of the sample means will be approximately normal. For large sample sizes, formal tests of normality are therefore not very informative.

The one-way ANOVA is considered a robust test against the normality assumption. This
means that it tolerates violations to its normality assumption rather well. As regards the
normality of group data, the one-way ANOVA can tolerate data that is non-normal (skewed
or kurtotic distributions) with only a small effect on the Type I error rate.

A moderate departure from the specified normal distribution does not unduly disturb the
outcome of the test. This is especially true as sample size increases. Normality is really only
needed for small sample sizes, say n < 20 per group. However, with highly skewed
populations, ANOVA results in less accurate probability statements.

Platykurtosis can have a profound effect when your group sizes are small. This leaves you
with two options: (1) transform your data using various algorithms so that the shape of your
distributions become normally distributed or (2) choose the nonparametric Kruskal-Wallis H
Test which does not require the assumption of normality.

- The variances of the several populations are the same (homogeneity of variance).

With larger variances, the investigator can expect a greater number of observations at the
extremes of the distributions, and this can have real implications on inferences that the
researcher makes from comparisons between groups. Homogeneity is only needed if sample
sizes are very unequal. In this case, Levene's test indicates if it's met.

Violations of this assumption, homogeneity of variance, are also negligible when sample sizes
are equal. Heterogeneity of variance ordinarily becomes a problem only when σ²_largest ≥ 4σ²_smallest or, with more moderate differences, when sample sizes differ considerably. When there is a choice, the researcher should try to select samples of equal size for each
treatment condition. This minimizes the effect of failing to satisfy the condition of
homogeneity of variance.

There are two tests that you can run that are applicable when the assumption of homogeneity
of variances has been violated: (1) Welch or (2) Brown and Forsythe test. Alternatively, you
could run a Kruskal-Wallis H Test. For most situations it has been shown that the Welch test
is best.

- Selection of the elements comprising any particular sample is independent of the selection of the elements of any other sample.
The research samples have to come from a randomized or randomly sampled design, which
means that the rows in the data do not influence one another. This is an important assumption, which plays a significant role in simplifying the statistical analysis.

Observations may not be independent if: (1) repeated measurements are taken on the same subject, (2) observations are correlated in time, or (3) observations are correlated in space.
This assumption, independence of samples, reminds us that the ANOVA procedure described
to this point is not appropriate for repeated measures on the same subjects nor when matched
subjects have been assigned to treatment groups.

- Samples are drawn at random with replacement.

The biggest problem is probably that of obtaining random samples. In the real world most
researchers use random assignment of available subjects. Random sampling is more of a
premise than a requirement—it allows us to extend conclusions to the population.

POST HOC COMPARISONS


Post hoc tests are an integral part of ANOVA. When you use ANOVA to test the equality of at least
three group means, statistically significant results indicate that not all of the group means are equal.
However, ANOVA results do not identify which particular differences between pairs of means are
significant. Use Post hoc tests to explore differences between multiple group means while controlling
the experiment-wise error rate.

Post hoc comparisons, or a posteriori comparisons, are statistical tests performed after obtaining a
significant value of F that indicate which means are significantly different. Post hoc tests are only
used in conjunction with tests of group difference, such as ANOVA.

Post hoc tests allow researchers to locate those specific differences and are calculated only if the
omnibus F test is significant. If the overall F test is nonsignificant, then there is no need for the
researcher to explore for any specific differences.

They keep us from taking undue advantage of chance. If we are comparing many means, we would
expect some of the differences to be fairly substantial as a result of random sampling variation alone,
even if the null hypothesis is true.

Applying the conventional t test just to the largest or to all of the possible differences between
sample means would substantially increase the probability of making a Type I error—that is,
rejecting Ho when it is true. Instead of using a sampling distribution that compares the means of only
two samples, post hoc tests generally employ sampling distributions that compare the means of many
samples. In short, post hoc tests protect us from making too many Type I errors by requiring a bigger
difference before we can declare that difference to be statistically significant.

There are several commonly used Post hoc tests. The only real difference among them is that some
are more conservative than others. In ascending level of conservativeness with regard to Type I
errors, we could choose from among such tests as Duncan’s multiple-range test, the Newman–Keuls
test, Tukey’s HSD test, Bonferroni Procedure and the Scheffé test. To use any one of these tests, our F
ratio must first be significant.
Tukey’s HSD test does not inflate the probability of Type I error as much as many other tests,
yet it is not nearly as conservative as the Scheffé test. HSD stands for “honestly significant
difference.” The test involves determining a critical HSD value for the data. Although the test can be
used to make specific comparisons, most investigators make all possible pairwise comparisons. The
hypothesis of equal population means is rejected for any pair of samples for which the difference
between sample means is as large as or larger than the critical HSD value.
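A brief Python sketch of these pairwise comparisons after a significant omnibus F, using scipy.stats.tukey_hsd (available in SciPy 1.8 and later); the three groups are invented:

import numpy as np
from scipy import stats

g1 = np.array([24.5, 23.5, 26.4, 27.1, 29.9])
g2 = np.array([28.4, 34.2, 29.5, 32.2, 30.1])
g3 = np.array([26.1, 28.3, 24.3, 26.2, 27.8])

# follow up only if the omnibus F test is significant
f_stat, p_omnibus = stats.f_oneway(g1, g2, g3)

if p_omnibus < 0.05:
    result = stats.tukey_hsd(g1, g2, g3)  # all pairwise comparisons, family-wise error controlled
    print(result)                         # pairwise differences, confidence intervals and p values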

CONCERNS ABOUT POST HOC COMPARISONS

Although most researchers still follow an ANOVA with post hoc tests, many leading statisticians caution against their use, for several reasons:

First, most researchers do not need to see the results of comparing all possible pairs to answer the
question(s) they are interested in, and tests for all possible pairs are very conservative, thus
sacrificing power.

Second, the use of tests of all possible pairs often resembles a fishing expedition: Reviewers
should require writers to articulate their expectations well enough to reduce the likelihood of
post hoc rationalizations. Fishing expeditions are often recognizable by the promiscuity of their
explanations.

In place of post hoc tests, these statisticians urge that researchers examine trends or clusters of
effects. Most important, they advise, “If a specific contrast interests you, examine it”.

UNIT-4

NON-PARAMETRIC TESTS

In the broadest sense a nonparametric statistical method is one that does not rely for its validity or its
utility on any assumptions about the form of distribution that is taken to have generated the sample
values on the basis of which inferences about the population distribution are to be made. However, weaker assumptions are still required by most nonparametric statistical methods. Some nonparametric tests, such as the sign test, were used as early as the eighteenth century.

A non-parametric test does not require the population distribution to be described by specific parameters. It is a form of hypothesis testing that does not rely on an assumed underlying distribution; many non-parametric tests are instead based on the median. For this reason, such tests are also called distribution-free tests. The test variables are measured at the nominal or ordinal level, and non-parametric tests are usually performed when the variables are non-metric.
Nonparametric tests serve as an alternative to parametric tests such as T-test or ANOVA that can be
employed only if the underlying data satisfies certain criteria and assumptions.

Some tests using the nonparametric approach are: Wilcoxon signed-rank test, Kruskal-Wallis test,
Mann-Whitney U test, Spearman correlation, and Friedman’s ANOVA amongst others.

PARAMETRIC TESTS vs NONPARAMETRIC TESTS

Definition: A statistical test in which specific assumptions are made about the population parameters is known as a parametric test. A statistical test used in the case of non-metric independent variables is called a non-parametric test.

Knowledge of the population: The parametric test is one which has information about the population parameter. The nonparametric test is one where the researcher has no idea regarding the population parameter.

Assumptions: A parametric test assumes that the parameters and the distribution of the population are known. A non-parametric test does not make any assumptions about the distribution or parameters of the population.

Measure of central tendency: In parametric tests the mean is used; in nonparametric tests the median is used.

Level of measurement: The parametric test assumes that the variables of interest are measured on an interval or ratio level, whereas the nonparametric test assumes that they are measured on a nominal or ordinal scale.

Applicability: The parametric test applies to variables only, while the nonparametric test applies to both variables and attributes.

Measure of association: For measuring the degree of association between two quantitative variables, Pearson's coefficient of correlation is used in the parametric case and Spearman's rank correlation in the nonparametric case.

Statistical power: Parametric tests have greater statistical power; if an effect actually exists, a parametric analysis is more likely to detect it. Most nonparametric tests involve the conversion of data to ranks, which dramatically reduces the statistical power of the test.

Distribution: Parametric tests assume a normal distribution of values, or a "bell-shaped curve." For example, height is roughly normally distributed: if you were to graph the heights of a group of people, you would see a typical bell-shaped (Gaussian) curve. Non-parametric tests are often described as distribution-free statistics because they make no assumptions about the distribution. Common situations for using nonparametric tests are when the distribution is not normal (the distribution is skewed), the distribution is not known, or the sample size is too small (<30) to assume a normal distribution; that is, the nonparametric test assumes no definite shape or size of distribution.

Modality: The parametric test assumes a unimodal distribution; the nonparametric test does not necessarily assume one.

PARAMETRIC TEST → NONPARAMETRIC COUNTERPART

One Sample t-test → Sign Test or Wilcoxon One-Sample Test
Independent Sample t-test → Mann-Whitney U Test
Dependent Sample t-test → Wilcoxon Signed-Rank Test
One-way ANOVA → Kruskal-Wallis One-Way ANOVA
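For example, a short Python sketch (hypothetical skewed ratings from two independent groups) of the Mann-Whitney U test, the nonparametric counterpart of the independent-sample t-test:

import numpy as np
from scipy import stats

# hypothetical ordinal-style ratings from two independent groups
group_a = np.array([3, 4, 2, 5, 4, 3, 7, 4, 3, 2])
group_b = np.array([5, 6, 7, 5, 8, 6, 7, 9, 6, 5])

u_stat, p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.4f}")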

CHI-SQUARE TEST

The researcher compares the observed (sample) frequencies for the several categories of the
distribution with those frequencies expected according to his or her hypothesis. The difference
between observed and expected frequencies is expressed in terms of a statistic named chi-square (χ2),
introduced by Karl Pearson in 1900.

The chi-square test was developed for categorical data; that is, for data comprising qualitative
categories, such as eye color, gender, or political affiliation. Although the chi-square test is
conducted in terms of frequencies, it is best viewed conceptually as a test about proportions.

This test is used to determine whether the observed distribution differs significantly from the
theoretical or expected distribution.

In the chi-square test, the null hypothesis is that in the population, the proportional frequency of
cases in each category equals a specified value. The proportions hypothesized are derived from a
research question of interest to the researcher. When the null hypothesis is true and certain
conditions hold the sampling distribution formed by the values of χ2 calculated from repeated
random samples closely follows a known theoretical distribution. There is a family of sampling
distributions of χ2, each member corresponding to a given number of degrees of freedom.

Formula:

The chi-square statistic, χ², provides a measure of the difference between observed and expected frequencies. Its basic formula is:

χ² = Σ [ (fo − fe)² / fe ]

where fo is an observed frequency and fe is the corresponding expected frequency.
1. χ2 cannot be negative because all differences are squared; both positive and negative differences
make a positive contribution to the value of χ2.

2. χ2 will be zero only in the unusual event that each observed frequency exactly equals the
corresponding expected frequency.

3. Other things being equal, the larger the difference between the values of fo and their corresponding values of fe, the larger χ².

4. It is not the size of the difference alone that accounts for a contribution to the value of χ2; rather, it
is the size of the difference relative to the magnitude of the expected frequency. This is intuitively
reasonable.

5. The value of χ2 depends on the number of differences involved in its calculation. For example, if
we used three brands of soft drinks instead of four, there would be one less difference to contribute
to the total of χ2. We take this factor into account by considering the number of degrees of freedom
(df) associated with the particular χ2.

The calculated value of χ2 will be small when the difference between the values of fo and the values
of fe is small and large when it is not. To test the null hypothesis, we must learn what calculated
values of χ2 would occur under random sampling when the null hypothesis is true. Then, we will
compare the χ2 calculated from our particular sample with this distribution of values. If it is so large
that such a value would rarely occur when the null hypothesis is true, we will reject the hypothesis.

In the standard applications of this test, the observations are classified into mutually exclusive
classes. If the null hypothesis that there are no differences between the classes in the population is
true, the test statistic computed from the observations follows a χ2 frequency distribution. The
purpose of the test is to evaluate how likely the observed frequencies would be assuming the null
hypothesis is true.

Test statistics that follow a χ2 distribution occur when the observations are independent and
normally distributed, which assumptions are often justified under the central limit theorem. There
are also χ2 tests for testing the null hypothesis of independence of a pair of random variables based on
observations of the pairs.
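A minimal Python sketch (counts invented) of the goodness-of-fit use of χ², testing observed brand choices against hypothesized equal proportions:

import numpy as np
from scipy import stats

# observed choices among four soft-drink brands (hypothetical counts)
f_obs = np.array([45, 30, 35, 50])

# H0: equal preference, so each expected frequency is N/4
f_exp = np.full(4, f_obs.sum() / 4)

chi2, p = stats.chisquare(f_obs, f_exp)  # df = k - 1 = 3
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")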
ASSUMPTIONS

Many people refer to chi-square as a nonparametric test, but it does, in fact, assume the central
limit theorem and is therefore neither a “distribution-free” nor an “assumption-free” procedure.
The distribution of χ2 follows from certain theoretical considerations, and thus we may expect
obtained values of χ2 to follow this distribution exactly only when the assumptions appropriate to
this model are satisfied. Three such assumptions are:

1. It is assumed that the sample drawn is a random sample from the population about which
inference is to be made. In practice, this requirement is seldom fully met.

2. It is assumed that observations are independent.

3. It is assumed that, in repeated experiments, the observed frequencies will be normally distributed about the expected frequencies. With random sampling, this tends to be true.

CHI-SQUARE TEST FOR GOODNESS-OF-FIT

The goodness-of-fit test is a statistical hypothesis test used to see how well sample data fit a hypothesized distribution for the population. Put differently, this test shows whether your sample data represent the data you would expect to find in the actual population or whether they are somehow skewed. Goodness-of-fit establishes the discrepancy between the observed values and those that would be expected under the hypothesized model.

The goodness-of-fit problem asks whether the relative frequencies observed in the categories of a
sample frequency distribution are in agreement with the relative frequencies expected to be true
in the population (as stated in Ho).

For the goodness-of-fit problem, two cases can be distinguished: equal preferences and unequal preferences, that is, equal or unequal hypothesized proportions across the categories.

In any goodness-of-fit problem, the hypothesized proportion in each category is dictated by the
research question of interest; the hypothesized proportions may be equal or unequal.

To calculate a chi-square goodness-of-fit, it is necessary to set the desired alpha level of significance (e.g., if your confidence level is 95% or .95, then the alpha is .05), identify the
categorical variables to test, and define hypothesis statements about the relationships between
them. The null hypothesis asserts that no relationship exists between variables, and the
alternative hypothesis assumes that a relationship exists.

CHI-SQUARE AS A TEST FOR INDEPENDENCE BETWEEN TWO VARIABLES


Chi-square test can also be used to analyze bivariate frequency distributions, a distribution that
shows the relation between two variables.

The Chi-Square Test of Independence determines whether there is a significant association between categorical variables (i.e., whether the variables are independent or related).

Tabular formations of the bivariate frequency distributions can determine what cell frequencies
would be expected if the two variables are independent of each other in the population. Then we
may use chi-square to compare the observed cell frequencies with those expected under the null
hypothesis of independence. If the (fo- fe) differences are small, χ2 will be small, suggesting that
the two variables could be independent. Conversely, a large χ2 points toward a contingency
relationship.

In general, the null hypothesis of independence for a contingency table is equivalent to hypothesizing that in the population, the relative frequencies for any row (across the categories
of the column variable) are the same for all rows, or that in the population, the relative
frequencies for any column (across the categories of the row variable) are the same for all
columns.
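A short Python sketch (table values invented) of the test of independence for a bivariate frequency table; scipy.stats.chi2_contingency computes the cell frequencies expected under independence and the resulting χ² statistic:

import numpy as np
from scipy import stats

# hypothetical 2 x 3 contingency table: rows = gender, columns = political affiliation
observed = np.array([[30, 45, 25],
                     [35, 30, 35]])

chi2, p, df, expected = stats.chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {df}, p = {p:.4f}")
print(expected)  # cell frequencies expected if the two variables were independent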
