

Inferential Statistics: Practicum

Submitted by

Vartika Bhatia

UID: 23034528075

Semester IV

Department of Psychology, Kamala Nehru College

UPC: 2112102403

Index

S. NO TOPIC SIGNATURE

1.​ Introduction

2.​ t-test using IBM SPSS

3.​ ANOVA using IBM SPSS

4.​ References

Aim

To assess whether a statistically significant difference exists between two groups (clear job description and ambiguous job description) on their job performance scores using a t-test in IBM SPSS.

Introduction

​ Testing hypotheses of means involves evaluating whether there is a significant difference

between the averages of two groups. This is a key part of inferential statistics, where conclusions

about populations are drawn based on sample data. The process begins with setting up two

competing statements: the null hypothesis, which typically states that there is no difference

between the group means, and the alternative hypothesis, which proposes that a meaningful

difference does exist. A test statistic, such as the t-value, is then calculated using the sample

means, sample sizes, and variability within the data. This statistic is compared against a critical

value to determine whether the observed difference is likely due to random variation or reflects a

true effect. If the test statistic falls beyond the critical value, the null hypothesis is rejected,

suggesting that the group means are significantly different. This method is essential in many

research settings where the goal is to assess the impact of different treatments, interventions, or

conditions.

Null and alternative hypotheses

​ The null and alternative hypotheses play a crucial role in statistical hypothesis testing,

especially when evaluating the means of two groups. The null hypothesis (H₀) generally posits

that there is no significant difference between the population means of the two groups. This

hypothesis is grounded in the premise that any observed discrepancies in sample means arise

from random fluctuations rather than a genuine effect. For instance, in a study examining the

impact of various reading overlays on comprehension among dyslexic children, the null

hypothesis would claim that the mean scores of the populations under both overlay conditions

are identical. This framework enables researchers to determine whether the sample data provide

adequate evidence to refute this hypothesis.

Conversely, the alternative hypothesis (Hₐ) presents a statement that opposes the null

hypothesis. It indicates that a genuine difference exists between the population means of the two

groups. This difference may be directional, predicting which group will exhibit a higher mean, or

non-directional, simply asserting that the means are not equivalent. In the case of dyslexic

children, the researchers did not indicate which overlay would enhance comprehension; their

focus was solely on identifying any potential effect. Therefore, the alternative hypothesis would

assert that the mean comprehension scores for one overlay condition differ from those of the

other. This differentiation between the null and alternative hypotheses enables researchers to

utilize sample data to draw meaningful conclusions about effects at the population level.
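For the present practicum, for instance, the null hypothesis can be written as H0: μ(clear JD) = μ(ambiguous JD), and the non-directional alternative as Ha: μ(clear JD) ≠ μ(ambiguous JD).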

The random sampling distribution of the difference between two sample means

The concept of the random sampling distribution of the difference between two sample

means focuses on understanding the behavior of the difference between the means of two

samples that are randomly selected from two populations. This is an extension of the idea from

single-sample cases, where we analyze the sampling distribution of just one mean. In this case,

however, we are concerned with what happens when we subtract one sample mean from

another—denoted as X̄ − Ȳ. The aim is to determine what kind of differences we can expect to

occur by chance when there is no actual difference between the populations, i.e., when the null

hypothesis H0 is true and the population means are equal.

To illustrate this, the example provided draws from two identical populations of scores: 3, 5, and

7. From each population, all possible samples of size 2 are drawn with replacement, and their

means are computed. For each possible sample from population X, its mean is paired with every

possible mean from population Y, giving us 81 unique combinations. For every pair, the

difference X̄ − Ȳ is calculated. This forms the basis of the sampling distribution of the difference

between two sample means. These differences range from -4 to +4. Although the populations

themselves are simple and not normally distributed, the distribution of these differences, when

plotted, closely resembles a normal distribution.

Interestingly, despite the original populations being discrete and somewhat flat in distribution,

the differences in sample means still tend to fall into a normal pattern. This supports the idea that

the central limit theorem applies not just to single sample means, but also to differences between

sample means. The mean of this distribution is 0, which aligns with our assumption that the

population means are equal. This reinforces an important statistical idea: when the population

means are the same, the expected value of X̄ − Ȳ is zero. Furthermore, this distribution allows

researchers to assess how likely an observed difference is, assuming the null hypothesis is true,

which is crucial for hypothesis testing and determining statistical significance.
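As a rough illustration of this construction, the 81 differences described above can be enumerated directly. The sketch below (Python, not part of the SPSS procedure used later) assumes the same two identical populations of scores 3, 5 and 7:

```python
# Enumerate the sampling distribution of X-bar minus Y-bar for two identical
# populations of scores 3, 5 and 7, with samples of size 2 drawn with
# replacement from each population.
from itertools import product
from collections import Counter

population = [3, 5, 7]

# All 9 possible samples of size 2 (with replacement) and their means
sample_means = [(a + b) / 2 for a, b in product(population, repeat=2)]

# Pair every mean from population X with every mean from population Y: 81 pairs
differences = [mx - my for mx, my in product(sample_means, repeat=2)]

print(len(differences))                      # 81
print(min(differences), max(differences))    # -4.0 to +4.0
print(sum(differences) / len(differences))   # 0.0, the mean of the distribution
print(sorted(Counter(differences).items()))  # roughly bell-shaped frequencies
```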

Properties of RSDM

The properties of the sampling distribution of the difference between two means are

based on three fundamental characteristics that define any distribution: mean, standard deviation,

and shape. These properties help us understand how the differences between sample means

behave when repeatedly drawing samples from two populations.​

​ The mean of the sampling distribution of the difference between two sample means is

denoted by μX̄−Ȳ. This value is equal to the difference between the means of the two populations

from which the samples are drawn, i.e.,

μX̄−Ȳ = μX − μY

This means that if you repeatedly take samples from two populations and calculate the difference

between the sample means each time, the average of all those differences will be equal to the

actual difference between the population means. If the two populations have the same mean (i.e.,

μX = μY), then the mean of the sampling distribution will be zero.

The standard deviation of this sampling distribution is called the standard error of the

difference between two means, denoted by σX̄−Ȳ. This value tells us how much the difference

between sample means is expected to vary from sample to sample due to random chance. If the

samples are independent, the standard error is calculated using the formula:

σX̄−Ȳ = √(σ²X / nX + σ²Y / nY)

This formula combines the variances (squared standard deviations) of the individual

sample means. In practical terms, it helps quantify the expected variability in the difference

between sample means.

The shape of the sampling distribution of the difference between two means tends toward

a normal distribution under many conditions. If the original populations are normally distributed,

then the sampling distribution of the difference will also be normal. However, even if the original

populations are not normally distributed, the Central Limit Theorem tells us that as the sample

sizes increase, the sampling distribution of the difference in sample means will approximate a

normal shape. This makes it possible to use statistical methods that assume normality when

testing hypotheses about the difference between two means.
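To make this point concrete, the short simulation below (a sketch with an arbitrarily chosen, clearly non-normal population; not part of the practicum data) shows that the differences between sample means cluster symmetrically around zero and become tighter as the sample size grows:

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)  # a skewed, non-normal population

for n in (5, 30, 100):
    # Draw many pairs of samples of size n and record the difference of their means
    diffs = np.array([rng.choice(population, n).mean() - rng.choice(population, n).mean()
                      for _ in range(5_000)])
    print(n, round(diffs.mean(), 3), round(diffs.std(ddof=1), 3))
    # The mean of the differences stays near 0 and the spread shrinks as n increases
```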

Determining the formula for t

In determining a formula for t, the goal is to test whether there is a significant difference

between the means of two independent groups. To do this, we begin by estimating the standard

error of the difference between the sample means. Since we usually don’t know the population

standard deviations, we must estimate them from the sample data.

The estimated standard error of the difference between two means, denoted as sX̄−Ȳ, is calculated using the variances and sample sizes of the two groups. The formula is:

sX̄−Ȳ = √(s²X / nX + s²Y / nY)

This formula allows us to calculate an approximate z value, where the numerator is the

observed difference in sample means minus the hypothesized difference in population means,

and the denominator is the estimated standard error. The formula for this is:

z = [(X̄ − Ȳ) − (μX − μY)] / sX̄−Ȳ

When sample sizes are large, this z value is nearly normally distributed. However, as

sample sizes decrease, its distribution diverges from normality unless both sample sizes are

equal. In these cases, the statistic is better modeled using Student’s t distribution.

Sir Ronald A. Fisher proposed a modification assuming the variances of the two

populations are equal. This is known as the assumption of homogeneity of variance, where σ²X = σ²Y. Under this assumption, a pooled estimate of the population variance, s²P, is calculated by combining the sums of squares from both samples and dividing by the total degrees of freedom:

s²P = (SSX + SSY) / (nX + nY − 2)

where SSX = Σ(X − X̄)² and SSY = Σ(Y − Ȳ)² are the sums of squares of the two samples.

This pooled variance can be used to refine our estimate of the standard error of the difference

between means. When factored in, the formula becomes:

sX̄−Ȳ = √(s²P / nX + s²P / nY)



Substituting the pooled variance from the previous formula, we obtain:

sX̄−Ȳ = √[ ((SSX + SSY) / (nX + nY − 2)) · (1/nX + 1/nY) ]

If the sample sizes are equal (nX = nY = n), this formula simplifies further to:

sX̄−Ȳ = √[ (SSX + SSY) / (n(n − 1)) ]

With this estimated standard error, we can calculate the t statistic, which tests the null

hypothesis that the population means are equal. The formula for the t statistic is:

t = [(X̄ − Ȳ) − (μX − μY)] / sX̄−Ȳ

where the hypothesized difference (μX − μY) is zero under the null hypothesis of equal population means.

This t value follows a Student's t distribution with degrees of freedom equal to

(nX−1)+(nY−1). This formulation allows researchers to make inferences about population

differences using sample data, even when the population parameters are unknown.
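A compact numerical sketch of these formulas is given below, with made-up scores for the two groups; sX̄−Ȳ and t are computed exactly as above and then checked against SciPy's pooled-variance t-test:

```python
import numpy as np
from scipy import stats

x = np.array([85, 90, 78, 92, 88, 76, 95, 84])  # hypothetical scores, group X
y = np.array([72, 80, 69, 75, 78, 70, 74, 77])  # hypothetical scores, group Y

nx, ny = len(x), len(y)
ss_x = np.sum((x - x.mean()) ** 2)               # sum of squares, group X
ss_y = np.sum((y - y.mean()) ** 2)               # sum of squares, group Y

s2_p = (ss_x + ss_y) / (nx + ny - 2)             # pooled variance
se_diff = np.sqrt(s2_p / nx + s2_p / ny)         # estimated standard error of the difference

t = (x.mean() - y.mean()) / se_diff              # H0: the population means are equal
df = (nx - 1) + (ny - 1)
print(t, df)

print(stats.ttest_ind(x, y, equal_var=True))     # should reproduce the same t value
```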

Steps in calculating t using IBM SPSS

IBM SPSS (Statistical Package for the Social Sciences) is a powerful statistical software

used widely in research, social sciences, psychology, education, and other fields. It allows users

to enter data, manage variables, and perform a wide range of statistical tests without the need for

complex programming. With its user-friendly interface and menu-driven commands, SPSS is

ideal for performing hypothesis testing, such as the independent-samples t-test, which compares

the means of two unrelated groups. Using the example of comparing math test scores between

students who attended in-person classes and those who took online classes, a breakdown of the

steps encompassed within the procedure of using SPSS is elaborated below.

First, the data must be entered correctly. In the Data Editor, enter all math test scores into

a single column, regardless of which group they belong to. In a second column, indicate the

group for each score: enter 1 for in-person and 2 for online. Then go to the Variable View to

name and label the variables. Name the first column Scores and label it as Math Test Scores.

Name the second column ClassType and label it Type of Class. For clearer interpretation in the

output, assign value labels to the ClassType variable: set “1” as “In-Person Class” and “2” as

“Online Class”. Once the data and labels are entered, go to the Analyze menu. From there,

choose Compare Means, then click on Independent-Samples T Test. A dialog box will appear.

Move the Scores variable into the Test Variable(s) box and the ClassType variable into the

Grouping Variable box. Click Define Groups and enter “1” for Group 1 and “2” for Group 2 to

correspond with the group codes you used in the data. Click Continue, then OK.

SPSS will generate an output window with results. The Group Statistics table will show

the number of students in each group, along with their mean scores, standard deviation, and

standard error. For example, students in in-person classes might have an average score of 85,

while those in online classes average 72. Below that, the Independent Samples Test table

presents the results of the t-test. The first column shows Levene’s Test for Equality of Variances,

which checks if the variability of scores in the two groups is similar. If the significance value

here is above .05, you use the first row labeled Equal variances assumed.

Suppose the t value is 3.95, the degrees of freedom (df) is 28, and the significance

(2-tailed) is .000. Since the p-value is below .05, the difference in mean scores is statistically

significant. SPSS also shows the actual mean difference and the 95% confidence interval,

helping you interpret how much higher (or lower) the in-person scores are compared to the

online ones. In this way, SPSS helps you determine whether the method of class delivery has a

significant effect on student performance in math tests.
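For readers without access to SPSS, the same kind of analysis can be approximated in Python. The sketch below uses invented scores and SciPy functions; it mirrors the Group Statistics table, Levene's test, and the "equal variances assumed" row of the output described above (SciPy's Levene test defaults to a median-centred variant, so the mean-centred option is used here to stay closer to the classical Levene test):

```python
import numpy as np
from scipy import stats

in_person = np.array([88, 91, 84, 79, 95, 82, 87, 90, 85, 83, 89, 92, 86, 81, 88])
online    = np.array([70, 75, 68, 74, 77, 71, 66, 73, 72, 69, 76, 74, 70, 75, 71])

# "Group Statistics": n, mean, standard deviation, standard error of the mean
for label, group in (("In-Person Class", in_person), ("Online Class", online)):
    print(label, len(group), group.mean(), group.std(ddof=1), stats.sem(group))

# Levene's test for equality of variances (mean-centred, classical form)
print(stats.levene(in_person, online, center="mean"))

# Independent-samples t-test, "equal variances assumed" row
print(stats.ttest_ind(in_person, online, equal_var=True))
```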

Assumptions associated with inference about the difference between two independent means

When conducting inference about the difference between two independent means, there

are several important statistical assumptions that must be satisfied to ensure valid results. These

assumptions are rooted in the principles of the normal curve model. First and foremost, it is

essential that each sample is drawn at random from its respective population. Random sampling

ensures that the data collected is representative and not biased, which strengthens the

generalizability of the results.

Additionally, the two samples must be independently selected. This means that the

selection or characteristics of individuals in one group should not influence those in the other

group. Independence between samples is crucial because any relationship between the groups

could distort the comparison of their means. The statistical model also assumes that samples are

drawn with replacement, although in practical applications, this is often approximated. A critical

aspect of the model is that the sampling distribution of the difference between the sample means

(X̄ – Ȳ) follows a normal distribution. This assumption allows researchers to use the

t-distribution when the population variances are unknown, which is common in most real-world

scenarios.

In the ideal normal curve model, the population standard deviations (σₓ and σᵧ) are

known. However, since this is rarely the case, the t-distribution is used instead, as it accounts for

the estimation of these values from the sample data. Another important assumption is the

homogeneity of variance, which means that the variances of the two populations being compared

should be approximately equal. While this might seem like a stringent requirement, in practice, it

is often met or its violation has minimal impact—especially when the sample sizes are equal or

large. When samples are large, differences in variance tend to matter less, and using equal

sample sizes further reduces any negative effects caused by heterogeneity of variance. Moreover,

the central limit theorem plays a supportive role in validating the assumption of normality. It

states that regardless of the shape of the population distribution, the sampling distribution of the

sample mean will tend to follow a normal distribution as the sample size increases. This effect is

even stronger when sample sizes are large (such as 25 or more), making the t-test relatively

robust to violations of the normality assumption.

Method

Data

For the present practicum, secondary data was utilised, retrieved from an online repository

(source: https://osf.io/jv3kn/). The dataset comprised scores on job performance for two distinct

independent groups differentiated based on clarity in job descriptions (JD) — Clear job

description and ambiguous job description. The data is provided in Appendix 1. In the present

analysis, the hypotheses are structured to evaluate the impact of clarity of job description on job

performance. The total number of participants was two hundred and two (N = 202); of the two groups (Clear JD and Ambiguous JD), one hundred and one participants were given a clear job description (n = 101) and one hundred and one were in the ambiguous job description group (n = 101). The null hypothesis (H0) claims that there is no significant difference in job

performance between individuals with clear and ambiguous job descriptions.

Analytical Procedure

The analysis was conducted using IBM SPSS version 29.0.1. To conduct an independent t-test

using SPSS Statistics, we followed these steps: First, we accessed the main dialog box by

selecting Analyse > Compare Means. Then, we chose the outcome variable and dragged it to the

box labelled Test Variable(s). We specified our grouping variable, which distinguished between

the clear and ambiguous job description groups, by transferring it to the box labelled Grouping Variable.

Clicking the "define groups" button allowed us to input the numeric codes assigned to each

group. After defining the groups, we clicked "continue" to return to the main dialog box. In the

main dialog box, we could adjust the width of the confidence interval in the output by clicking

"options." The default setting was a 95% confidence interval. Once we set our preferences, we

clicked "ok" to run the analysis. The output from the independent t-test

included summary statistics for the experimental conditions and the main test statistics. The

summary statistics table provided information such as the number of participants in each group,

mean scores, standard deviations, standard errors, and confidence intervals for the means. The

test statistics table contained the main test statistics, including values for cases where equal

variances were assumed and not assumed.

Results and Analysis

The results of the t-test comparing the two groups (workers with clear and ambiguous job descriptions) are presented in Table 1.

Table 1

Group        N     Mean      SD          t (df)          p
Clear        101   51.6238   15.38561    -2.124 (200)    .035*
Ambiguous    101   55.8911   13.07968

Note: ***: p < .001, **: p < .01, *: p < .05, n.s.: not significant

As seen in Table 1, the sample size (N) for both the clear and ambiguous job description groups is 101. The mean score for job performance is higher for the ambiguous job description group (55.8911) than for the clear job description group (51.6238). The standard deviation is 15.38561 for the clear group and 13.07968 for the ambiguous group. The obtained t statistic is -2.124 with 200 degrees of freedom (df). The p-value is 0.035, which is statistically significant at the .05 level. The standard error of the mean is 1.53093 for the clear job description group and 1.30148 for the ambiguous job description group.
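As a quick arithmetic check (a sketch, not part of the SPSS procedure), the reported t value can be recovered from the summary statistics in Table 1:

```python
import math

n1, m1, sd1 = 101, 51.6238, 15.38561   # clear job description group
n2, m2, sd2 = 101, 55.8911, 13.07968   # ambiguous job description group

# Pooled variance and estimated standard error of the difference between means
s2_p = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
se = math.sqrt(s2_p * (1 / n1 + 1 / n2))

t = (m1 - m2) / se
df = n1 + n2 - 2
print(round(t, 3), df)   # approximately -2.124 with df = 200, as in Table 1
```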

Discussion

The aim of the study was to assess whether a statistically significant difference existed

between two groups (clear job description and ambiguous job description) on their job

performance scores using a t-test in IBM SPSS. The study uses secondary data retrieved from

an online repository (source: https://osf.io/jv3kn/) as its source of sample collection. The given

data was employed to assess the effect of type of job description over the job performance of the

employees. Hence, the data was segregated into two groups, clear job description and ambiguous job description, creating two levels of the independent variable under examination. The null hypothesis of the study claimed that there was no significant difference between the two groups in the employees' job performance. In contrast, the alternative hypothesis pointed towards a significant difference between the two groups.

For the analysis of the given data, as well as understanding the relationship between job

description and performance, t-test was the statistical tool employed. The t-test is a statistical

method used to determine whether there is a significant difference between the means of two

groups, which may be related in certain features. It works by comparing the observed difference

between sample means to the difference expected by chance under the null hypothesis, which

assumes no real effect. There are several types of t-tests: the independent samples t-test compares

means from two separate groups, the paired samples t-test compares means from the same group

at different times (like before and after an intervention), and the one-sample t-test compares the

sample mean to a known value or population mean. The formula for the independent samples

t-test involves the difference between sample means divided by the estimated standard error of

that difference. Assumptions for using a t-test include normal distribution of the populations,

homogeneity of variance, independence of observations, and random sampling. When these

conditions are met, the t-test provides a powerful tool for making inferences about population

means based on sample data.



An independent-samples t-test was used for the analysis of the data in IBM SPSS. With a sample size of 101 in each group, the mean job performance score of the clear job description group (51.6238) was lower than that of the ambiguous job description group (55.8911). Applying Student's t-test to compare the two groups, the obtained t value (-2.124) exceeds in magnitude the two-tailed critical value of 1.972 (α = .05, df = 200), and the p-value (.035) falls below the .05 level of significance. Therefore, on the basis of this statistical evidence, the null hypothesis, which claimed no significant difference between the two groups, is rejected. The result indicates that job performance was significantly higher in the ambiguous job description group than in the clear job description group. Although the alternative hypothesis was non-directional, the result runs counter to the researcher's expectation that a clear job description would be associated with higher, even if not significantly higher, job performance. That the difference lies in the opposite direction is nevertheless acceptable given the non-directional nature of the alternative hypothesis.

From the statistical results it could be inferred that ambiguous job descriptions, in which employees are unclear about their duties in the workplace, were associated with higher job performance in this sample. Multiple factors that were not measured in this study might be interacting to produce such a result. No causal claims can be made to answer the research question solely on the basis of the statistical analysis; however, from the rejection of the null hypothesis it can be concluded that there is a significant difference between the two groups in their job performance.



Null Hypothesis Significance Testing (NHST) has several limitations as a statistical tool

of analysis. One major issue is that it reduces complex data to a simple "significant" or "not

significant" outcome, often leading researchers to overlook the actual size or importance of an

effect (effect size). NHST also encourages p-hacking, where researchers may manipulate

analyses to achieve desirable p-values, and it heavily depends on sample size—large samples can

make even trivial effects seem significant. Additionally, NHST does not provide the probability

that a hypothesis is true, leading to frequent misinterpretation. To overcome these limitations,

researchers are encouraged to report and interpret effect sizes and confidence intervals alongside

p-values, pre-register their analyses to avoid bias, and consider using alternative methods like

estimation-based approaches that focus more on the magnitude and precision of effects rather

than just binary outcomes.
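As an illustration of this recommendation (a sketch using the Table 1 statistics; not part of the original analysis), the standardised effect size for the present comparison can be computed directly:

```python
import math

n1, m1, sd1 = 101, 51.6238, 15.38561   # clear job description group
n2, m2, sd2 = 101, 55.8911, 13.07968   # ambiguous job description group

# Cohen's d based on the pooled standard deviation
s_p = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
d = (m1 - m2) / s_p
print(round(d, 2))   # about -0.30, a small-to-medium effect by Cohen's benchmarks
```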
