NATIONAL TEACHERS’ INSTITUTE, KADUNA
(PGDE/DLS)
E-COURSE BOOK

FOREWORD
The introduction of e-books for the Post Graduate Diploma in Education
(PGDE) students of the Institute clearly indicates the commitment of the
members of the Executive Management to the integration of ICT into all
operations of the Institute, as contained in its 5-Year Strategic Plan.
There is no doubt that the use of printed course books as a learner support
mechanism is no longer fashionable or affordable. The advent and
availability of smart and Android phones in the hands of learners have
shifted learners’ interest from learning through books to learning
through devices, which is more convenient and accessible than using
printed copies of books.
I commend the foresight of the Director and Chief Executive, Professor
Musa Garba Maitafsir, for having the courage to accept the reality that ICT has
compelled Education Managers to employ ICT-driven techniques in
supporting learners, owing to the shift in learners’ interest in learning.
The e-course books will unavoidably bring an end to the era of course
book scarcity, complaints of over- and under-production of the books, and
the problem of access to selected courses that has bedevilled some centres of
the Institute.
I sincerely commend NTI for this giant stride, with the hope that it will
be extended to its other programmes.
Finally, I implore the Management of the Institute to look at the
possibility of providing access to the e-books to all teachers in
Nigeria by dedicating an Open Educational Resource (OER) site for their use
free of charge.
Malam Adamu Adamu
Honourable Minister of Education
January, 2023

NATIONAL TEACHERS’ INSTITUTE, KADUNA
POSTGRADUATE DIPLOMA IN EDUCATION
(PGDE/DLS)
E-COURSE BOOK
ON
PDE 710:
STATISTICAL METHODS IN EDUCATION

TABLE OF CONTENTS
MODULE 1: STATISTICAL METHODS IN EDUCATION I
Unit 1 | The Meaning of Statistics
Unit 2 | Descriptive Statistics
Unit 3 | Measures of Central Tendency and Location: Mean, Mode, Median and Graphical Location of Mode, Median, Quartiles, Deciles and Percentiles
Unit 4 | Measures of Variability or Dispersion, Standard Scores (Z-Scores and T-Scores) and the Normal Curve
Unit 5 | Measures of Relationship or Correlation and Regression
Unit 6 | Probability and Its Laws
Unit 7 | Distribution Functions of a Random Variable
MODULE 2: STATISTICAL METHODS IN EDUCATION II
Unit 1 | Testing Statistical Significance
Unit 2 | Sampling
Unit 3 | Parametric Statistics
Unit 4 | Non-Parametric Statistics: Computation Procedures
MODULE 1
PDE 710: STATISTICAL METHODS IN EDUCATION
INTRODUCTION
Educational research is usually an attempt to answer educational questions in a systematic,
objective and precise manner. To do this, the research is designed and carried out.
Measurements of various kinds are made, and from these a jumbled mass of numbers is
obtained. In order to make meaning out of these numbers, they are organized. From this
array of data, calculations are made and relationships described. In answering the research
question, some decisions are made. In carrying out all these, the researcher needs statistics
in addition to the appropriate research design.
Statistics means different things to different people. This is because statistics has its fingers
in virtually every pie of human endeavour. Whenever the need for sound judgment and
decision making arises in any life situation, reliance on statistics is considered wise. This is
because figures don't lie, though liars can figure!
Quality Teachers, Great Nation
UNIT ONE: THE MEANING OF STATISTICS
INTRODUCTION
The word “Statistics” conveys a variety of meanings to people. To some, it is a collection of
tables, charts, data or numbers. To others, it is an advanced component of mathematics.
However, to the researcher, it is a tool for collecting, presenting and analyzing data which
will be used in decision making. Statistics here is seen from its rigorous analytical and
applied perspective.
In this Unit, we shall explore the meaning of statistics and some of the concepts associated
with it so that we can clearly understand its use, significance and purpose in education.
OBJECTIVES
By the end of this Unit, you should be able to:
(1) Define the term “statistics” correctly.
(2) Distinguish between statistics and statistic.
(3) Discuss the place of statistics in education.
(4) Explain the relationship between statistics and probability.
(5) Explain clearly some basic statistical concepts and notations.
WHAT IS STATISTICS?
Every day, we are bombarded with statements such as:
> The number of accidents recorded on our roads between September and January this year is more than that of the same period last year.
> The Federal Government is to reduce the civil service workforce by 33% in its reform agenda.
> The following statistics were provided as the allocation to states in this quarter, etc.
In all these cases, statistics is used to inform the public. The use of statistics probably began
as early as the first century A.D., when governments used a census of land and property
for tax purposes. This was gradually extended to such local events as births, deaths and
marriages.
The science of statistics, which uses a sample to predict or estimate some characteristics of a
population, began its development during the nineteenth century. Statistics is defined as the
science comprising rules and procedures for collecting, organizing, summarizing,
describing, analyzing, presenting and interpreting numerical data which are used in
making decisions, valid estimates, predictions and generalizations.
Apart from being used to inform people, statistics plays a significant role in modern-day business
and educational decision making and forecasting. Statistical methods offer us the opportunity
to evaluate an uncertain future, using limited information to assess the likelihood of future
events occurring.
Because of this contemporary use, statistics has three distinct parts: Descriptive
Statistics, Inferential Statistics and Experimental Statistics. In descriptive statistics,
events or their outcomes are described without drawing conclusions. It is concerned only
with the collection, organization, summarizing, analysis and presentation of an array of
qualitative or quantitative data. Descriptive statistics include the mean, mode,
median, standard deviation, range, percentile, kurtosis, correlation coefficient,
proportions etc.
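To make the descriptive measures above concrete, here is a minimal Python sketch using the standard-library `statistics` module. The scores and the helper name `describe` are hypothetical, chosen only for illustration; note that `statistics.stdev` gives the sample standard deviation (divisor n - 1).

```python
import statistics

def describe(scores):
    """Summarize a list of test scores with common descriptive statistics."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        "range": max(scores) - min(scores),   # highest score minus lowest score
        "sd": statistics.stdev(scores),       # sample standard deviation (n - 1)
    }

scores = [45, 50, 52, 52, 60]  # hypothetical test scores
summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

These figures only describe the five scores; they say nothing yet about any wider population.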
Experimental statistics relates the design of experiments to the establishment of causes and effects.
Experimental and quasi-experimental designs (factorial, block etc.) and analyses such as ANOVA and
ANCOVA belong to this group.
Inferential statistics builds on descriptive statistics by going a step further to make
interpretations. The focus of inferential statistics is surmising the properties of a population
from the known properties of a sample of that population. Based on probability theory, valid
and reliable decisions, generalizations, predictions and conclusions can be made using these
statistics.
Inferential statistics find usefulness in stochastic (random) processes, queuing theory, game
theory, quality control etc. Statistical procedures like the chi-square test, t-test, F-test etc. belong to
inferential statistics.
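As a rough sketch of one inferential procedure named above, the code below computes a one-sample t statistic, t = (x̄ - μ0) / (s / √n), which compares a sample mean with an assumed population mean. The data, the hypothesized mean of 50 and the function name `one_sample_t` are illustrative assumptions; deciding significance would still require comparing t with a critical value from t-tables (n - 1 degrees of freedom), which is not shown here.

```python
import math
import statistics

def one_sample_t(sample, hypothesized_mean):
    """Return the one-sample t statistic: (sample mean - mu0) / (s / sqrt(n))."""
    n = len(sample)
    x_bar = statistics.mean(sample)   # statistic computed from the sample
    s = statistics.stdev(sample)      # sample standard deviation (n - 1 divisor)
    standard_error = s / math.sqrt(n)
    return (x_bar - hypothesized_mean) / standard_error

# Hypothetical scores from five students; mu0 = 50 is an assumed population mean.
t = one_sample_t([52, 48, 55, 60, 45], 50)
print(round(t, 3))
```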
As a student of education, you need to study statistics because of its usefulness in making
predictions and taking decisions on educational matters. Downie and Heath (1970) indicated
the following basic reasons for studying statistics:
(1) Daily Use: Statistics is of immediate and practical utility. It helps the educator to
get work done quickly and efficiently, for example in forecasting, testing, record keeping,
test reporting and interpretation.
(2) Problem Solving: When action research is conducted to solve immediate
problems, statistical methods are applied to the data. Issues bordering on curriculum
improvement, deciding on a better method of teaching, or predicting students’
enrolment and the required school plant will involve the use of statistics.
(3) Theoretical Research: Theories predict what we expect to observe in specific
circumstances, and they serve to organize information. Most research in the
behavioural sciences is now very sophisticated and therefore more quantitative.
In order to test these theories in education and the social sciences, we resort to
statistical methods. The advantages of statistical methods in research include:
(i) They permit the most exact kind of description
(ii) They force us to be definite and exact in our procedures and in our thinking.
(iii) They enable us to summarize our results in a meaningful and convenient form.
They give order to our data so that we can see the forest as well as the
individual trees.
(iv) They enable us to draw general conclusions in accordance with accepted
rules. They further establish how much faith can be put in the conclusions and
how far we can extend our generalizations.
(v) They enable us to predict “how much” of a given event will occur under
specified conditions that are known and measured.
(vi) They enable us to analyze some of the causal factors underlying complex and
otherwise bewildering events. Causal factors are usually best uncovered and
proved by means of experiments. In education and the social sciences, this may
not be possible in most cases. Statistical methods are therefore often a
necessary substitute for, and a constant companion of, experiment.
Thus, knowledge of some basic statistical procedures is essential for those proposing
to carry out research, in order to summarize and interpret their data well and
communicate their findings.
(4) Comprehension and Use of Research: The competent educator and researcher must be able
to read, with understanding, reports of applied and theoretical research. Learning in any field
comes largely through reading, and in any specialized field, reading is largely a matter of
enlarging vocabulary. Reading research reports means encountering statistical symbols,
concepts and ideas which must be understood. The reader should also be able to determine
when a given statistical procedure has been appropriately used in order to assess the
conclusions reached. To do this, he must come to grips with statistical ideas and
methods.
(5) Employment: Statistical logic, statistical thinking and statistical operations are necessary
components of the teaching profession. To the extent that the teacher uses common
technical instruments such as tests in his practice, the educator will depend upon a
statistical background in their administration and in the interpretation of the
results. Teachers who are unfamiliar with these procedures may have difficulty in
evaluating their students’ abilities and achievements. They will also find it difficult to
review research in their areas of specialization and to acquire up-to-date information.
Knowledge of statistics is also advantageous in other employment situations such as
engineering, accountancy and economics. Statistics has wide application in
several human endeavours.
Training in statistics is also training in scientific method. Statistical inference is
inductive inference: the making of general statements from the study of particular
cases. Many instances of this are encountered in life and in teaching.
ACTIVITY I
1. Give a clear definition of statistics.
2. Explain lucidly four reasons for studying statistics in education.
THE PURPOSE OF STATISTICS
As earlier mentioned, statistics is used in a variety of forms by different people. However,
the primary purposes of statistics are to:
(i) reduce large arrays of data to a manageable and comprehensible form;
(ii) aid in the study of populations and samples;
(iii) aid in making reliable inferences about events based on observational data; and
(iv) help in arriving at valid and reliable decisions and generalizations.
EDUCATIONAL STATISTICS
Educational statistics is simply the application of the science of statistics to solve problems
connected with various facets of education. It helps us to organize, summarize, present and
interpret results and data from educational measurements. Through it, the degrees of
association between educational variables are measured and inferences or predictions are made in
order to accomplish certain educational tasks.
According to Boyinbode (1984), such educational tasks may include the organization and
presentation of data, the measurement and description of individual or group performance,
the measurement of relationships, the design of experiments and testing of the significance of
their results, the drawing of inferences or the formation of models, and educational forecasting.
There are various role players in education: educational managers and administrators,
teachers, guidance and career counsellors, examiners and examining bodies, researchers,
parents and students. Each of these stakeholders in education will need information in order
to perform their roles well. Reliable information will be arrived at through the use of
statistics. Also, manipulating the information to a useful and productive end will
involve the use of statistics. Thus, each uses statistics in specific ways to achieve specific
educational tasks. It is therefore not surprising that statistics is used in education in the
following areas:
> Determination of the educational needs of the community: population, age
distribution, state finance, priorities, manpower, growth rate, existing institutions,
personnel etc.
> Planning for physical resources (school plant), e.g. when determining the number of
classrooms, the formula below is often applied:

$$R = \frac{C \times P \times D - W}{P \times D}$$

where
R = number of classrooms required
C = number of streams in the school
P = number of periods held per day
D = number of school days per week
W = number of periods per week spent outside normal classroom teaching
for recess, PHE, gardening, break, practicals in the laboratory etc.
> Planning for Human Resources
Accurate projections should be made based on population. From these, the number of
classes, teachers, students and non-teaching staff would be determined. The total
number of teachers required in a school used to be determined by a fixed ratio of teachers to
class streams. However, in recent times, the number of pupils enrolled for each subject
offered in the school, the number of periods per subject per week, the level of difficulty of
each subject, the level of academic attainment of students in each subject and the content
volume of each subject are input variables in the equation.
The other important areas where statistics is applied in education include:
> Educational budgeting
> Inspection and school record keeping
> Test development, test scoring and test reporting
> Continuous assessment, record keeping and reporting
In all these areas, statistics is applied by various stakeholders to solve educational problems.
Statistics is therefore of immense importance in education.
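The classroom-planning formula R = (C × P × D - W) / (P × D) given earlier in this list can be sketched in Python as follows. The parameter names and the example school (3 streams, 8 periods a day, 5 days a week, 30 periods a week outside the classroom) are hypothetical, and rounding a fractional R up to the next whole classroom is our assumption rather than part of the stated formula.

```python
import math

def classrooms_required(streams, periods_per_day, days_per_week, periods_outside):
    """R = (C x P x D - W) / (P x D)."""
    weekly_class_periods = streams * periods_per_day * days_per_week  # C x P x D
    periods_per_room = periods_per_day * days_per_week                # P x D
    return (weekly_class_periods - periods_outside) / periods_per_room

# Hypothetical school: C = 3, P = 8, D = 5, W = 30.
r = classrooms_required(3, 8, 5, 30)
print(r)             # 2.25
print(math.ceil(r))  # 3 (assumption: round up to whole classrooms)
```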
ACTIVITY
Briefly discuss the role of statistics in education.
SOME BASIC STATISTICAL CONCEPTS AND NOTATIONS
Variables and Constants:
A variable is a characteristic or property that can take on different values. It refers to a
property whereby the members of a group or set differ from one another.
Individuals in a class may differ in sex, age, intelligence, height etc. These properties are
variables.
Constants, on the other hand, do not assume different values.
Variables could be those that vary in quality or those that vary in quantity.
Quantitative Variables take values that vary in terms of magnitude. They are easy to
measure and compare with one another. These may be scores obtained in a test, weight,
height, age, distance, number etc.
Qualitative Variables are those that differ in kind. They are only categorized, as with
marital status, gender, nationality, socio-economic status, educational qualifications etc.
Quantitative variables may be discrete or continuous.
A discrete variable is one which can take only distinct, separate values, implying that fractional
values are usually not allowed. These variables are generated by a counting process, usually in
whole numbers, e.g. the number of goals scored in a football match, the number of teachers in
a school, the number of girls and boys in a class etc.
A continuous variable is one which can take on any value over a range of feasible values.
Measured data can be whole numbers or fractions, e.g. weight, height, distance etc.
Variables could also be dependent or independent, depending on their functions in a given
context. A variable that is dependent in one context may be independent in another.
The independent variable is one that is manipulated or treated. The effect of this
manipulation is manifested on the dependent variable. The value of the dependent variable
thus depends on that of the independent variable. Also, the value of the dependent variable is
usually predicted from that of the independent variable. When comparing the effects of two
teaching methods on students’ learning achievement, the teaching method is the
independent variable while learning achievement is the dependent variable.
Note that in graphing, the dependent variable is placed on the vertical Y-axis while the
independent variable is placed on the horizontal X-axis.
There are two types of independent variables: treatment or active variables, and organismic
or attribute variables.
A treatment or active variable is defined as one that can be directly manipulated by the
researcher and to which he or she assigns subjects. This group includes method of teaching,
method of grouping and reinforcement procedures.
Organismic or attribute variables are those variables that cannot be actively manipulated by
the researcher. These variables are sometimes called assigned variables, and they are
characteristics of individuals that cannot be manipulated at will. Such independent variables
as age, sex, aptitude, social class, race and intelligence level have already been determined, but
the researcher can decide to include or remove them as variables to be studied.
Confounding Variables: Confounding variables are those aspects of a study or sample that
might influence the dependent variable or the outcome measure and whose effect may be
confused with the effect of the independent variable. There are two types of these:
intervening and extraneous variables.
Intervening variables are those variables that cannot be measured directly or controlled but
may have an important effect upon the outcome. They are usually modifying variables that
intervene between the cause and the effect.
These may include anxiety, fatigue and motivation. These variables cannot be ignored in
experiments and must be controlled as much as practicable through the use of designs.
Extraneous variables: These are variables not manipulated by the researchers (uncontrolled
variables) that may have a significant effect on the outcome of a study. These may include
such variables as teacher competence or enthusiasm, the age, socio-economic status or
academic ability of the students in the study.
Though it is impossible to eliminate all extraneous variables in classroom research, using
robust experimental designs enables the researcher to neutralize their influence to a large
extent. Some other methods include removing the variable, randomization, matching cases,
group matching or balancing cases, and analysis of covariance.
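Randomization, one of the methods listed above, can be illustrated with a short sketch that randomly assigns subjects to an experimental and a control group, so that extraneous variables tend to be spread evenly across both groups. The subject labels, the fixed seed and the function name `randomly_assign` are hypothetical; the seed is used only so that the assignment can be reproduced.

```python
import random

def randomly_assign(subjects, seed=None):
    """Shuffle subjects and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)   # a seed makes the assignment reproducible
    pool = list(subjects)
    rng.shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]

# Hypothetical subject labels.
students = ["S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8"]
experimental, control = randomly_assign(students, seed=42)
print("Experimental:", experimental)
print("Control:", control)
```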
Data: This is a collection of information, qualitative or quantitative.
Distribution: This is the arrangement of a set of numbers classified according to
some property.
Population: This refers to the group of measurements that are of interest, i.e. the aggregate of
units to be covered. This may be people, objects, materials, measurements or things.
Populations could be finite or infinite. When the population is not too large and can be easily
counted, it is finite, e.g. the number of students in a school, the number of candidates that wrote
an examination etc.
However, when the members of a population are so numerous, like the grains of sand or the number
of women in West Africa, we treat it as infinite.
Sample: This is a part or subset of a population. It is any subgroup or sub-aggregate drawn by
some appropriate method from a population. The sample is usually the portion of the
population appropriately selected for observation.
Parameter: This is a descriptive measure or characteristic (true value) of a population. When
such characteristics as the mean, standard deviation or variance of a population are computed, they
are called parameters.
Statistic: This refers to a descriptive measure or characteristic of a sample.
When we calculate the average age of candidates who wrote JME, we are talking of a
parameter. However, if we compute the average age of candidates from a given school or
state, then the average age is a statistic. Note that this is also different from statistics as a
discipline.
Different symbols are used to denote statistics and parameters:

Characteristic        Parameter    Statistic
Mean                  μ            X̄ or M
Standard Deviation    σ            SD or S
Variance              σ²           SD² or S²
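The distinction between a parameter and a statistic, and the matching symbols in the table, can be sketched as follows. The population of ages is hypothetical. `statistics.pstdev` divides by N (as for σ), while `statistics.stdev` divides by n - 1, the usual convention for the sample S; the fixed seed only makes the sample reproducible.

```python
import random
import statistics

# Hypothetical population: ages of every candidate in a small examination centre.
population = [16, 17, 17, 18, 18, 18, 19, 19, 20, 21, 22, 23]

mu = statistics.mean(population)        # parameter: population mean (μ)
sigma = statistics.pstdev(population)   # parameter: population SD (σ), divides by N

sample = random.Random(1).sample(population, 5)  # simple random sample, n = 5
x_bar = statistics.mean(sample)         # statistic: sample mean (X̄)
s = statistics.stdev(sample)            # statistic: sample SD (S), divides by n - 1

print(f"parameters: mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"statistics: x_bar = {x_bar:.2f}, s = {s:.2f}")
```

A different sample would give a different X̄ and S, while μ and σ stay fixed for the population.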
Dichotomy: A categorical variable with only two categories, e.g. male/female.
Categorical Variable: A nominal variable on which positions and scores are not recorded as
numbers.
Scores: Any position on a numerical variable.
Skewness of a Distribution: A skewed distribution is one having a longer tail at one end than at the
other; it is an asymmetrical distribution.
Kurtosis: This is the extent of peakedness in a distribution.
Normal Distribution: This is a symmetrical distribution having its mean, mode and median
equal. Also, the frequencies of the variable extend equally both to the left and to the right of
the mode.
Parametric Tests: These are tests whose efficacy depends on the assumption that the variable being studied is
at least approximately normally distributed.
Non-Parametric Tests: These tests are developed without reference to the distribution of the
variables.
x: a variable
f: frequency of occurrence or observations
n: the sample size, i.e. the number of observations selected from a population
N = Σf: the total number of observations comprising a population of interest
Σ: pronounced “sigma”, the summation sign, which instructs us to “take the sum of” or “add”,
e.g. Σ(2, 4, 6) = 2 + 4 + 6 = 12
√: the square root sign, which directs us to find the square root of a number, e.g. √36 = 6
xⁿ: the exponent, which directs us to raise a quantity to the indicated power, e.g. 6² = 36
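The notation above maps directly onto everyday arithmetic. The following Python sketch evaluates Σx, N = Σf, Σfx, a square root and a power for a small frequency table; the values of x and f are invented for illustration.

```python
import math

x = [2, 4, 6]          # values of a variable x
f = [3, 5, 2]          # hypothetical frequency f of each value

sum_x = sum(x)                                   # Σx = 2 + 4 + 6
N = sum(f)                                       # N = Σf, total number of observations
sum_fx = sum(fi * xi for fi, xi in zip(f, x))    # Σfx = 3*2 + 5*4 + 2*6
root = math.sqrt(36)                             # √36
square = 6 ** 2                                  # 6 raised to the power 2

print(sum_x, N, sum_fx, root, square)
```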
ACTIVITY
Clearly distinguish between:
(i) “statistics” and “statistic”
(ii) continuous and discrete variables
(iii) parametric and non-parametric tests
(iv) discrete and continuous variables
(v) qualitative and quantitative variables
(vi) dependent and independent variables
SUMMARY
• In this unit, we have defined statistics as the science which comprises the rules and
procedures for collecting, organizing, summarizing, describing, analyzing, presenting
and interpreting numerical data which are used for decision making, predictions and
generalizations.
• The importance of statistics in general terms was discussed.
• Educational statistics is the application of statistics in the field of education.
• The uses and purposes of statistics in education were enumerated.
• Some basic statistical concepts and notations were also explained.
ASSIGNMENT
Discuss one application of statistical methods in teacher education.
REFERENCES
Ary, Donald et al. (1979): Introduction to Research in Education. U.S.A.: Holt, Rinehart and
Winston, Inc.
Best, J. W. and Kahn, J. V. (1986): Research in Education. London: Prentice-Hall International.
Boyinbode, I. R. (1984): Fundamental Statistical Methods in Education and Research. Ile-Ife:
DCS Books.
Gay, L. R. (1970): Educational Research: Competencies for Analysis and Application. Ohio:
Charles E. Merrill.
Guilford, J. P. and Fruchter, B. (1973): Fundamental Statistics in Psychology and
Education.
McCall, R. B. (1980): Fundamental Statistics for Psychology. U.S.A.: Harcourt Brace
Jovanovich Inc.
UNIT TWO: DESCRIPTIVE STATISTICS
INTRODUCTION
We have seen that various educational data can be obtained in various ways. These data must
be summarized and presented in a form that is easily understood. Statistics is used to do this.
The type of statistics will depend largely on the nature of data involved.
In this unit, we shall discuss the various scales of measurement, ways of organizing and
presenting data, and the calculation of some statistics.
OBJECTIVES
By the end of this Unit, you should be able to:
1. describe the four scales of measurement;
2. describe the organization and presentation of data using charts and graphs;
3. define terms associated with frequency distribution;
4. construct a frequency table for any set of data;
5. draw a histogram to represent a given set of data;
6. draw frequency polygons from frequency distributions;
7. draw a frequency curve for a large set of data; and
8. identify the different types of frequency curves.
SCALES OF MEASUREMENT
Quantification has been defined as a numerical method of describing observations of
materials or characteristics. When a defined portion of the material or characteristic is used
as a standard for measuring any sample, a valid and precise method of data description is
provided.
Measurement is a fundamental step in the conduct of research. Measurement is defined as
the process through which observations are translated into numbers. It is the assignment of
numerals to objects or events according to certain rules. Starting with variables, some rules
are then used to determine how these variables will be expressed in numerical form. This may
be through tests or actual measurements. The nature of the measurement process that
produces the numbers determines the interpretation that can be made from them and the
statistical procedures that can be meaningfully used with them. Scientists distinguish among
four levels of measurement, categorized in the scales of measurement as nominal,
ordinal, interval and ratio.
NOMINAL SCALE
Nominal data are counted data. Each individual can be a member of only one of a set of mutually
exclusive categories. Examples of such categories include nationality, gender, socio-economic
status, occupation, role, religious affiliation etc.
Numbers are often used at the nominal level, but only to identify the categories. The
numbers arbitrarily assigned to the categories serve mainly as labels or names. The numbers
do not represent absolute or relative amounts of any characteristic. For instance, the
numbers given to football players do not represent their degree of skill; they serve only for
recognition and to indicate positions.
The identifying numbers in a nominal scale cannot be arithmetically manipulated through
addition, subtraction, multiplication or division. However, statistical procedures based
on mere counting, such as reporting the number of observations in a category, can be used.
Thus, with this type of scale, we can only find the mode and percentages, draw charts, and
perform the chi-square test and some special types of correlation.
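Since nominal data can only be counted, a sketch of the permissible summaries (frequencies, mode and percentages) needs nothing more than `collections.Counter` from Python's standard library. The gender data are hypothetical.

```python
from collections import Counter

# Hypothetical nominal data: gender of students in a class.
genders = ["F", "M", "F", "F", "M", "F", "M", "F"]

counts = Counter(genders)                 # frequency of each category
mode = counts.most_common(1)[0][0]        # category with the highest frequency
percentages = {cat: 100 * n / len(genders) for cat, n in counts.items()}

print(counts)
print("mode:", mode)
print(percentages)
```

If the categories had been coded as numbers (say 1 and 2), the codes would still be labels only; averaging them would be meaningless.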
ORDINAL SCALE
Nominal scales show that things are different, but the ordinal scale shows the direction of the
differences. It shows the relative position of one thing to another but cannot specify the
magnitude of the interval between two measures. Ordinal scales thus only permit the ranking
of items or individuals from highest to lowest. The criterion for highest-to-lowest ordering is
expressed as relative position or rank in a group: 1st, 2nd, 3rd, ..., nth. This is why the ordinal
scale is also called the rank-order scale. Ordinal measures have no absolute values, and real
differences between adjacent ranks may not be equal. Neither the differences between the numbers
nor their ratios have meaning. When the numbers 1, 2, 3 and so on are used, there is no implication
that rank 1 is as much higher than rank 2 as rank 2 is than rank 3, and so on.
In ordinal measurement, the empirical procedure used for ordering objects must satisfy the
transitivity postulate. This postulate holds that the relationship must be such that
if object a is greater than object b, and object b is greater than object c, then object a is
greater than object c.
This is written as: if (a > b) and (b > c), then (a > c). Other phrases such as “stronger than”,
“precedes” and “has more of the attribute than” can be substituted for “greater than” in other situations.
The arithmetical operations of addition, subtraction, multiplication and division are not
useful with ordinal scales. The statistics that can be used with the nominal scale can also be used
with the ordinal scale.
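The point that adjacent ranks need not be equally far apart can be seen in a small sketch that converts hypothetical raw scores into ranks (1 = highest). The helper `rank_descending` is an illustrative name and assumes there are no tied scores.

```python
def rank_descending(scores):
    """Assign rank 1 to the highest score, 2 to the next, and so on (no ties assumed)."""
    ordered = sorted(scores, reverse=True)
    return {score: position for position, score in enumerate(ordered, start=1)}

scores = [70, 90, 88]                      # hypothetical raw scores
ranks = rank_descending(scores)
print(ranks)                               # {90: 1, 88: 2, 70: 3}

# Adjacent ranks always differ by 1, but the underlying score gaps do not:
ordered = sorted(scores, reverse=True)
gaps = [a - b for a, b in zip(ordered, ordered[1:])]
print(gaps)                                # [2, 18]
```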
INTERVAL SCALE
This is an arbitrary scale based on equal units of measurement which indicates how much of
a given characteristic is present. It provides equal intervals from an arbitrary origin.
An interval scale not only orders objects or events according to the amount of the attribute
they represent but also establishes equal intervals between the units of measure.
Equal differences in the numbers represent equal differences in the amount of the attribute
being measured. The difference in the amount of the characteristic possessed by persons with
scores of 60 and 65 is assumed to be equivalent to that between persons with scores of 70 and
75. The limitation here is the lack of a true zero: the zero point is arbitrary. An interval scale
cannot measure the complete absence of the trait, and a score of 30 does not mean
that the person has twice as much of the trait as someone who scored 15.
You should note that in most cases where we use interval scales, the intervals are equal in
terms of the measuring instrument itself but not necessarily in terms of the ability we are
measuring.
Common examples of interval data include time and temperature as measured on the Centigrade
and Fahrenheit scales, scores obtained in achievement tests, and similar measures.
We can also force an ordinal scale into an interval scale, as in the case of ratings like:

(a) 1. Strongly Agree        (b) Excellent
    2. Agree                     Good
    3. Undecided                 Average
    4. Disagree                  Weak
    5. Strongly Disagree         Poor

SA ---- A ---- UND ---- D ---- SD

If this is regarded as a continuum, where it is possible to choose any point, then we can
regard it as an interval scale.
Because interval scales lack a true zero, multiplication and division of the numbers are not
appropriate: ratios between the numbers on an interval scale are meaningless.
However, addition and subtraction are possible. Any statistical procedure based on adding
may be used with this scale, along with the procedures mentioned earlier as appropriate
for the lower-level scales. These include the mean, standard deviation, t-tests, Pearson r, analysis
of variance etc.
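The lack of a true zero can be illustrated with the temperature example above, using the standard conversion F = 1.8C + 32. In this sketch, 20 °C is numerically twice 10 °C, but the same two temperatures in Fahrenheit are not in a 2 : 1 ratio, whereas differences are preserved apart from the change of unit. The variable names are illustrative.

```python
def celsius_to_fahrenheit(c):
    """Convert a Celsius temperature to Fahrenheit: F = 1.8 * C + 32."""
    return 1.8 * c + 32

c_low, c_high = 10, 20
f_low, f_high = celsius_to_fahrenheit(c_low), celsius_to_fahrenheit(c_high)

# Ratios change when the arbitrary zero point moves...
print(c_high / c_low)   # 2.0
print(f_high / f_low)   # 68 / 50 = 1.36

# ...but differences are preserved up to the change of unit (1.8 * 10 = 18).
print(f_high - f_low)
```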
THE RATIO SCALE
The fourth and final type of scale is the ratio scale. It provides a true zero point as well as equal intervals.
The numerals of the ratio scale have the qualities of real numbers and can be added, subtracted, multiplied, divided and expressed in ratio relationships; e.g. 10 g is one half of 20 g, and 30 cm is three times 10 cm.
Examples of ratio data are usually found in the physical sciences and are seldom, if ever, obtained in education and the behavioural sciences.
In education, they are limited to educational performance and other physiological measurements. All types of statistical procedures are appropriate with a ratio scale.
ACTIVITY
Describe each type of measurement scale and give a situation in which each can be applied.
THE ORGANIZATION OF DATA
It is always difficult to make sense of a large set of data that has not been arranged. This may be data from your research work or students' scores on tests. You need a method of organizing the data in order to interpret them. Organizing research data is a fundamental step in statistics. There are two ways of organizing such data:
(i) arranging the measures into frequency distributions, and
(ii) presenting them in graphic forms.
When you have a small amount of ungrouped raw data, it is wise to arrange them in descending order of magnitude to produce what is known as an array of data. This process of arranging the raw data into an array is called ranking. For example, the scores of the students in your class in Statistics are as follows:
Musa - 70        Lawrence - 45
David - 50       Ade - 52
Audu - 88        Ofodile - 48
Hanatu - 60      Benedict - 55
Bunu - 90        Osun - 40
Ranking
90, 88, 70, 60, 55, 52, 50, 48, 45, 40
This is an array of raw data.
This array provides a more convenient arrangement, the highest score being 90 and the lowest 40.
From it, the range can easily be calculated.
The range is the difference between the highest score, H, and the lowest score, L, which here is 90 - 40 = 50.
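Ranking and the range are easy to reproduce in a few lines of Python. This is a minimal sketch: the dictionary layout and the names `scores` and `array` are illustrative choices, and Audu's score is taken as 88, the value that appears in the ranked list.

```python
# Class scores from the example (Audu's score taken as 88, as in the ranked list).
scores = {
    "Musa": 70, "Lawrence": 45, "David": 50, "Ade": 52, "Audu": 88,
    "Ofodile": 48, "Hanatu": 60, "Benedict": 55, "Bunu": 90, "Osun": 40,
}

# Ranking: arrange the raw scores in descending order of magnitude.
array = sorted(scores.values(), reverse=True)

# Range: highest score H minus lowest score L.
highest, lowest = array[0], array[-1]
score_range = highest - lowest

print("Array:", array)
print(f"Range = {highest} - {lowest} = {score_range}")
```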
FREQUENCY DISTRIBUTION
Given below are the raw scores of 60 students in another statistics test:
55  52  50  53  53  47  60  52
50  52  47  50  70  49  50  49
46  47  55  50  40  44  49  53
42  58  38  52  52  46  49  53
50  49  47  46  50  60  49  47
44  55  46  52  58  44  53  55
52  37  37  35  32  47  58  37
To make meaning out of this set of data, arrange the scores from highest to lowest. A systematic arrangement of individual measures from highest to lowest (or vice versa), showing how many times each measure occurs, is called a frequency distribution.
Rank-ordering the scores from highest to lowest and tallying them gives the frequency distribution table below:
SCORES    TALLIES     FREQUENCIES
70        |           1
60        ||          2
58        |||         3
55        ||||        4
53        卌 |        6
52        卌 |||      8
50        卌 ||       7
49        卌 |        6
47        卌 |        6
46        ||||        4
44        ||||        4
42        ||          2
40        |           1
38        |           1
37        |||         3
35        |           1
32        |           1
Total                 60
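The tallying step can also be sketched in Python. The helper names `tally` and `frequency_table` are hypothetical, the five-stroke gate is written as 卌, and, to keep the output short, only the first ten raw scores of the list above are used as input.

```python
from collections import Counter

def tally(count: int) -> str:
    """Tally marks in blocks of five: one '卌' per full block, then single strokes."""
    marks = ["卌"] * (count // 5)
    if count % 5:
        marks.append("|" * (count % 5))
    return " ".join(marks)

def frequency_table(scores):
    """Return (score, tally, frequency) rows, highest score first."""
    counts = Counter(scores)
    return [(s, tally(counts[s]), counts[s]) for s in sorted(counts, reverse=True)]

# First ten raw scores from the list above, used only as a small illustration.
sample = [55, 52, 50, 53, 53, 47, 60, 52, 50, 52]

rows = frequency_table(sample)
for score, marks, freq in rows:
    print(f"{score:>5}  {marks:<8} {freq}")
print("Total:", sum(freq for _, _, freq in rows))
```

Checking that the total equals the number of raw scores is the same check the text recommends for hand tallies.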
FREQUENCY DISTRIBUTIONS
When summarizing large masses of raw data, it is often useful to distribute the data into classes or categories and to determine the number of individuals belonging to each class, called the class frequency.
Definition: A tabular arrangement of data by classes, together with the corresponding class frequencies, is called a frequency distribution or frequency table.
As a preliminary to a full-scale traffic survey, it was necessary to have some information about the number of occupants of cars entering a certain town on Saturday afternoons, so an occupancy count was made on each of 40 cars. The results were:
1, 3, 2, e@seeedromalomliyee dyed >
143, 32% 3 2% 2 2
1 % S591 300, Bed 36 1,
4 ol Waal lay Cilia.
Are these variates discrete or continuous?
These are discrete variates, but figures like these can dazzle you, and you may find yourself unable to make any meaning out of numerical data like these just by looking at them.
A simple picture of the occupancy of the cars is obtained if the data are given in the form of a table showing the number of cars with 1 occupant, the number with 2 occupants, and so on.
To tabulate the data in this way, you will probably find it easiest to work systematically through the 40 counts, assigning each to the appropriate category with a tally mark as shown and working with blocks of five to make the final totalling easier. In fact, the observer might as well have recorded the data in this way in the first place.
Number of Occupants (X)   Tally Strokes   Number of Cars (F)
1                         卌 卌 卌        15
2                         卌 卌 ||        12
3                         卌 |||          8
4                         ||||            4
5                         |               1
Total                                     40
Table 2.1
This is a simple example of a frequency distribution (frequency table). The variate (which will henceforth be denoted by X) is in this case "number of occupants". The number of cars with X occupants shows the frequency with which that value of X occurred; F is usually used for frequency.
In order to get a better picture, raw data can also be grouped into a grouped frequency distribution.
In doing this, we have to decide on the number of groups required as well as the size of each interval. There is no fixed number of groups that is appropriate; however, between 5 and 20 groups are usually enough, depending on the range of the scores.
GROUPED DATA
Let us consider the results of life-testing 80 tungsten filament electric lamps. The life of each lamp is given to the nearest hour:
854 1284 1001 on 1168
1357 1090 1082 1494 1684
1355 1502 1281 1666 778
1550 628 1325 1073 1273
1608 1367 1152 1393 1399
1199 1155 822 1448 1623
1058 1930 1365 1291 683
811 1137 1185 392 937
963 1279 1494 798 1599
1281 590 960 1310 1848
1200 845 1454 919 1571
1710 1734 1928 1416 1465
1026 1299 1242 1508 705
1084 1220 1650 1091 210
1399 1198 518 1199 2074
945 1215 905 1810 1265
We now present these data in a grouped frequency table.
NOTE THE STEPS
1. The range (2074 - 210 = 1864) is found and divided into 10 groups.
2. Each group has a width of 200 (1864 ÷ 10 ≈ 186, rounded up to a convenient 200).
3. The tally method is used to determine the frequency in each group or class.
4. Always check that the sum of the frequencies is equal to the number of observations in the data, 80 in this case. Table 2.2 shows the grouped data.
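The four steps can be sketched in Python as follows. This is an illustration under stated assumptions: the lives are whole numbers of hours, the classes are 201 - 400, 401 - 600, ... as in the grouped table, the function names are hypothetical, and only ten lamp lives from the data above are used so that the counts are easy to check by hand.

```python
def grouped_frequencies(values, start=201, width=200, n_groups=10):
    """Count whole-number values in the classes start..start+width-1,
    start+width..start+2*width-1, and so on."""
    counts = [0] * n_groups
    for v in values:
        k = (v - start) // width          # index of the class containing v
        if not 0 <= k < n_groups:
            raise ValueError(f"{v} lies outside the chosen classes")
        counts[k] += 1
    return counts

def class_labels(start=201, width=200, n_groups=10):
    return [f"{start + k * width} - {start + (k + 1) * width - 1}" for k in range(n_groups)]

# Step 1: range and a rough width for 10 groups (rounded up to 200 in the text).
print("Range:", 2074 - 210, "-> rough width:", (2074 - 210) / 10)

# Steps 3 and 4 on a few lamp lives taken from the data above.
sample = [1357, 1090, 1082, 1494, 1684, 1355, 1502, 1281, 1666, 778]
freqs = grouped_frequencies(sample)
for label, f in zip(class_labels(), freqs):
    print(f"{label:>12}: {f}")

# Step 4: the frequencies must add up to the number of observations.
assert sum(freqs) == len(sample)
```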
STOP! Compare these grouped data with the ungrouped form of the same data. What differences do you observe? We can see the characteristics of the distribution more clearly and more quickly with the grouped data, and further statistics are made easier, as you will see later in this unit.
Table 1.2 Grouped Frequency Distribution
Life (hours)    Tally Marks        Number of lamps
X                                  F
201 - 400       |                  1
401 - 600       ||                 2
601 - 800       卌                 5
801 - 1000      卌 卌 ||           12
1001 - 1200     卌 卌 卌 ||        17
1201 - 1400     卌 卌 卌 卌        20
1401 - 1600     卌 卌 ||           12
1601 - 1800     卌 ||              7
1801 - 2000     |||                3
2001 - 2200     |                  1
Total                              80
Table 2.3
TERMS USED IN FREQUENCY DISTRIBUTIONS
The table below is a frequency distribution of masses (to the nearest kg) of 100 male students
at a certain College of Education in Nigeria.
Masses of 100 Students at a Certain College of Education
Mass (kg)    Number of Students
X            F
60 - 62      5
63 - 65      18
66 - 68      42
69 - 71      27
72 - 74      8
Total        100
Table 2.4
You should notice that, with the groups for this table defined in the way shown, there is always a gap between the right-hand endpoint of one group and the left-hand endpoint of the next one (i.e. between 62 and 63, 65 and 66, etc.). This may appear to make the data more discrete than continuous.
However, a mass recorded as 62 kg would in reality have been between 61.5 and 62.5 kg (see rounding off numbers in Unit 1), and similarly 63 kg covers true values between 62.5 and 63.5 kg. Thus, in reality the data are continuous. The true end points of the groups are as shown below, with continuous coverage along the mass scale.
End points given as measured values    End points given as true values
60 - 62                                 59.5 - 62.5
63 - 65                                 62.5 - 65.5
66 - 68                                 65.5 - 68.5
Table 2.5
It is important to choose groups whose end points do not coincide with actual observed data. The above explanation brings us to what is called:
Class Boundaries: The numbers indicated above, such as 59.5 and 62.5, are called class boundaries or true class limits. For the class 60 - 62, the smaller number, 59.5, is the lower class boundary and the larger number, 62.5, is the upper class boundary.
How to calculate class boundaries will be discussed later.
Class Intervals and Class Limits: A symbol defining a group, such as 60 - 62 in the above table, is called a class interval or class. The end numbers 60 and 62 are called class limits; the larger number is called the upper class limit and the smaller one the lower class limit.
An open class interval is one which has no upper class limit or no lower class limit, such as the class "75 years and over".
The Size of a Class Interval: The size or width of a class interval, also referred to as the class width, class size or class strength, is the difference between the lower and upper class boundaries. For example, in the data of Table 2.5 the class interval is
62.5 - 59.5 = 65.5 - 62.5 = 3, etc., or equivalently
63 - 60 = 66 - 63 = 3 (the difference between successive lower class limits), or
65 - 62 = 68 - 65 = 3 (the difference between successive upper class limits), since the classes are of equal size.
Calculation of Class Boundaries: Class boundaries are obtained by adding the upper class limit of one class to the lower class limit of the next higher class and dividing by 2. For example, the upper class boundary of the first class (60 - 62) of the data given in Table 2.4 is
$\frac{62 + 63}{2} = 62.5$, which is also the lower class boundary of the second class (63 - 65).
The upper class boundary of the second class (63 - 65) is $\frac{65 + 66}{2} = 65.5$, which is also the lower class boundary of the third class (66 - 68), and so on.
The lower class boundary of the first class (60 - 62), taking 59 as the upper limit of the empty class just below it, is $\frac{59 + 60}{2} = 59.5$.
CLASS MARK: The class mark, also called the class midpoint or class centre, is obtained by adding the lower and upper class limits and dividing by two. Thus, the class mark of the class 60 - 62 is
$$\frac{60 + 62}{2} = 61$$
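The rules for class boundaries, class marks and class size can be collected into one small Python function. It is a sketch that assumes, as in Table 2.4, that every pair of adjacent classes is separated by the same gap (1 kg here), so each boundary lies half a gap beyond a class limit; the function name `describe_classes` is hypothetical.

```python
def describe_classes(classes):
    """For classes given as (lower limit, upper limit), return one
    (lower boundary, upper boundary, class mark, class size) tuple per class.
    Assumes the same gap between every pair of adjacent classes."""
    gap = classes[1][0] - classes[0][1]        # e.g. 63 - 62 = 1
    rows = []
    for lower, upper in classes:
        lcb = lower - gap / 2                  # e.g. (59 + 60) / 2 = 59.5
        ucb = upper + gap / 2                  # e.g. (62 + 63) / 2 = 62.5
        mark = (lower + upper) / 2             # class mark (midpoint)
        rows.append((lcb, ucb, mark, ucb - lcb))
    return rows

table_2_4 = [(60, 62), (63, 65), (66, 68), (69, 71), (72, 74)]
for (lower, upper), (lcb, ucb, mark, size) in zip(table_2_4, describe_classes(table_2_4)):
    print(f"{lower}-{upper}: boundaries {lcb}-{ucb}, class mark {mark}, size {size}")
```

The same function works for the lamp classes, e.g. `describe_classes([(1401, 1600), (1601, 1800)])`, which gives boundaries 1600.5 and 1800.5 for the second class.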
ARRAYS: An array is an arrangement of numerical data in ascending or descending order of magnitude. The difference between the largest and smallest numbers is the range of the data.
Example: Look at Table 2.3 of the length of life of 80 lamps.
Find:
a. The lower limit of the 4th class.
b. The upper limit of the 5th class.
c. The class mark of the 3rd class.
d. The class boundaries of the 8th class.
e. The size of the 6th class.
f. Are all the classes of the same size?
g. The frequency of the 7th class.
h. Which class has the highest frequency?
Solution
a. The 4th class is 801 - 1000. The lower limit is 801.
b. The 5th class is 1001 - 1200. The upper limit is 1200.
c. The 3rd class is 601 - 800. The class mark is $\frac{601 + 800}{2} = 700.5$.
d. The 8th class is 1601 - 1800. The lower class boundary is $\frac{1600 + 1601}{2} = 1600.5$ and the upper class boundary is $\frac{1800 + 1801}{2} = 1800.5$.
e. The 6th class is 1201 - 1400 and its size is 1400.5 - 1200.5 = 200.
f. To determine whether all the classes are equal, compare their widths: 600.5 - 400.5 = 800.5 - 600.5 = 1000.5 - 800.5 = 200, etc., or 401 - 201 = 601 - 401 = 801 - 601 = 200, etc. All the classes are of equal size.
g. The 7th class is 1401 - 1600 and its frequency is 12.
h. The 6th class, 1201 - 1400, has the highest frequency, 20.
Table 1.4: Marks obtained in Mathematics by 80 Students
Marks      Frequency
X          F
50 - 54    1
55 - 59    2
60 - 64    11
65 - 69    10
70 - 74    12
75 - 79    21
80 - 84    6
85 - 89    9
90 - 94
95 - 99
With reference to this table, determine:
a. The lower limit of the 6th class.
b. The upper limit of the 4th class.
c. The class mark of the 10th class.
d. The class boundaries of the 5th class.
e. The size of the 9th class.
f. Are all the classes of equal size?
g. What is the frequency of the 6th class?
GRAPHICAL REPRESENTATION OF DATA - THE HISTOGRAM
Data may be presented in two-dimensional graphs, which allow more comparison than is possible with textual matter alone.
There are a number of graphs for doing this. These include line graphs, bar graphs, pictographs, pie graphs, histograms, frequency polygons, ogives and the smooth curve. Histograms, frequency polygons and the smooth curve are most commonly used in education.
HISTOGRAM
The histogram is a graph which uses bars to depict the way two variables are related. Each bar has the class interval as its base and the class frequency as its height.
Example
Definition: The chart of a frequency distribution is called a histogram.
Let us consider the frequency distribution of the length of life of the lamps discussed earlier.
[Fig. 1.1: Histogram of the length of life (hours) of 80 lamps]
The diagram of the length of life (in hours) of 80 lamps is shown in Fig. 1.1. The base of each rectangle extends from the lower class boundary to the upper class boundary on a scale representing the variable, in this case the length of life in hours. The true class boundaries must be used so that the horizontal scale representing length of life is covered continuously, with no breaks between the rectangles. Notice that the widths of the rectangles are equal, as shown. This is because the frequency distribution has equal class intervals. The bars are of different heights because the frequencies of the classes are different.
Looking at the histogram, you will notice that the heights of the rectangles represent the frequencies (where classes are of equal size). Note also that the left-hand edge of each rectangle represents the lower class boundary and the right-hand edge represents the upper class boundary. For the class 1201 - 1400, AB represents 1200.5 and CD 1400.5, which are the lower and upper class boundaries respectively.
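In general, when you combine n adjacent classes of equal width into one class, the height of the new rectangle becomes $\frac{1}{n}$ of the sum of their frequencies, so that the area of the rectangle still represents the combined frequency:
$$\text{height} = \frac{1}{n}\sum f$$
Thus, if classes two and three (frequencies 2 and 5) are combined, the height becomes $\frac{1}{2}(2 + 5) = 3.5$.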
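The rule for combining classes can be written as a tiny Python function. It is a sketch assuming equal-width classes, as in the lamp histogram; the name `merged_bar` is hypothetical, and the frequencies 2 and 5 are those of classes two and three of the grouped lamp table.

```python
def merged_bar(frequencies):
    """Combine n adjacent, equal-width classes into one rectangle.

    Returns (combined frequency, height). The new rectangle is n times as
    wide, so its height is 1/n of the summed frequency and its area
    (which represents frequency) is unchanged."""
    n = len(frequencies)
    total = sum(frequencies)
    return total, total / n

# Classes two and three of the grouped lamp table (401-600 and 601-800).
total, height = merged_bar([2, 5])
print(f"Combined frequency = {total}, height of new bar = {height}")
```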
FREQUENCY POLYGONS AND FREQUENCY CURVES
Frequency Polygons: The line graph of a frequency distribution is called a frequency polygon. It is obtained by plotting the class frequencies against the class marks and joining the points with straight lines. It can also be obtained by connecting the midpoints of the tops of the rectangles in the histogram (where the histogram has already been drawn).
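ACTIVITY:
Draw the frequency polygon from the histogram of the length of life (in hours) of 80 lamps.
Solution: All we need to do here is join the midpoints of the tops of the rectangles of the histogram already drawn. At the two extremes, the polygon is usually brought down to the horizontal axis at the class marks of the empty classes just below the first class and just above the last class.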
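The points of the frequency polygon can be listed with a short Python sketch. It assumes equal class widths and follows the common convention of adding an empty class at each end so that the polygon meets the horizontal axis; the function name is hypothetical and the frequencies are those of the grouped lamp table.

```python
def polygon_points(classes, frequencies):
    """(class mark, frequency) points of a frequency polygon.

    An empty class is added at each end (equal class widths assumed) so
    that the polygon starts and finishes on the horizontal axis."""
    marks = [(lower + upper) / 2 for lower, upper in classes]
    width = marks[1] - marks[0]
    inner = list(zip(marks, frequencies))
    return [(marks[0] - width, 0)] + inner + [(marks[-1] + width, 0)]

# Grouped lamp table: classes 201-400, 401-600, ..., 2001-2200.
lamp_classes = [(201 + 200 * k, 400 + 200 * k) for k in range(10)]
lamp_freqs = [1, 2, 5, 12, 17, 20, 12, 7, 3, 1]

for x, f in polygon_points(lamp_classes, lamp_freqs):
    print(f"({x}, {f})")
```

Joining these points in order with straight lines gives the polygon; a plotting library could draw them, but listing the coordinates is enough for plotting by hand.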
Frequency Curves
Most data are samples of a large population. Where the population is very large and many observations are possible, it becomes theoretically possible (for continuous data) to choose very small class intervals and still have quite a number of observations falling within each class. Thus, the frequency polygon for a large population will have so many small broken line segments that they closely approximate curves, which we call frequency curves.
Frequency curves can be obtained by smoothing frequency polygons. For this reason a frequency curve is sometimes called a smoothed frequency polygon. The smoothing removes irregularities in the curve but still approximates the same area.
ACTIVITIES
1. (a) Arrange the numbers 12, 56, 42, 21, 5, 18, 10, 3, 61, 34, 65, 24 in an array, and
   (b) determine the range.
2. If the class marks in a frequency distribution of lengths of laurel leaves are 129, 138, 147, 156, 165, 174 and 183 mm, find the class interval size, the class boundaries and the class limits.
REFERENCES
Ary, Donald et al. (1979): Introduction to Research in Education. U.S.A.: Holt, Rinehart and Winston, Inc.
Best, J. W. and Kahn, J. V. (1986): Research in Education. London: Prentice Hall International.
Boyinbode, I. R. (1984): Fundamental Statistical Methods in Education and Research. Ile-Ife: DCS Books.
Gay, L. G. (1970): Educational Research: Competencies for Analysis and Application. Ohio: Charles E. Merrill.
Guilford, J. P. and Fruchter, B. (1973): Fundamental Statistics in Psychology and Education.
McCall, R. B. (1980): Fundamental Statistics for Psychology. U.S.A.: Harcourt Brace Jovanovich Inc.
UNIT THREE: MEASURES OF CENTRAL TENDENCY
AND LOCATION: MEAN, MODE, MEDIAN
AND GRAPHICAL LOCATION OF MODE,
MEDIAN, QUARTILES, DECILES AND
PERCENTILES
INTRODUCTION
We have so far been dealing with the qualitative aspects of a distribution. However, some aspects of a distribution can be described in quantitative terms by calculating certain values from it. An average is a value which is typical or representative of a set of data. Since such typical values tend to lie centrally within a set of data arranged in an array, averages are also called measures of central tendency. There are several types of averages, the most common being the midrange, the arithmetic mean, the mode and the median. This unit concerns itself with these averages. These measures reveal the position or location of scores in a distribution.
OBJECTIVES
By the end of this unit, you should be able to:
(i) define and calculate the mean, median and mode of a distribution;
(ii) make observations about the mean, mode and median of a distribution;
(iii) find the median and mode using a graph; and
(iv) locate the quartiles, deciles and percentiles by means of a graph.
THE ARITHMETIC MEAN
When buying electric lamp bulbs, you can pay a little extra to get the "longer life" type. When tested, the lives (in hours) of 5 "standard" bulbs and 5 "longer life" bulbs were as follows:
"Standard" 1281 1090 1555 1494 1823
"Longer life" 2048 2741 2212 3319 3041
Here, it would be useful to have a measure which, for each type of bulb, would give a general indication of the time lasted. This is sometimes termed a "measure of location", as its aim is to indicate whereabouts the observations are located (in this case, on the time scale). These measures are also called measures of central tendency, since their values tend to lie centrally within a set of data arranged in an array. The measure most often used to meet this need is the arithmetic mean.
There are other types of means, such as the geometric mean and the harmonic mean, but they are not widely used, and it is the arithmetic mean which is referred to when the word "mean" is used.
$$\text{Arithmetic Mean} = \frac{\text{sum of all observations}}{\text{total number of observations}}$$
Its symbol is $\bar{x}$ (pronounced "x bar"). If $\sum$ (pronounced "sigma") means the sum of a series, and there is a set of N numbers $x_1, x_2, x_3, \dots, x_N$, then
$$\bar{x} = \frac{\sum x}{N}$$
The symbol $\sum_{i=1}^{N} x_i$ is used to denote the sum of all the $x_i$ from $i = 1$ to $i = N$.
Since the mean is an arithmetic average, it is classified as an interval statistic. Its use is appropriate for interval or ratio data but not for nominal or ordinal data.
Example: Using the data for the "standard" electric lamp bulbs gives their mean length of life as
$$\bar{x} = \frac{1281 + 1090 + 1555 + 1494 + 1823}{5} = \frac{7243}{5} = 1448.6 \text{ hours}$$
What is the mean length of life of the "longer life" bulbs? You should have done it like this:
$$\bar{x} = \frac{2048 + 2741 + 2212 + 3319 + 3041}{5} = \frac{13361}{5} = 2672.2 \text{ hours}$$
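More generally, if $x_1, x_2, \dots, x_n$ are n values of a variable x, then their arithmetic mean $\bar{x}$ is given by
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$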
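The formula translates directly into Python. The sketch below (the function name is hypothetical) reproduces both bulb calculations; Python's standard `statistics.mean` would give the same results.

```python
def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations."""
    return sum(values) / len(values)

standard = [1281, 1090, 1555, 1494, 1823]
longer_life = [2048, 2741, 2212, 3319, 3041]

print(f"Standard:    {arithmetic_mean(standard):.1f} hours")     # 7243 / 5
print(f"Longer life: {arithmetic_mean(longer_life):.1f} hours")  # 13361 / 5
```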
[Fig. 3.1: graph with Length (mm) on the horizontal axis]
GRAPHICAL REPRESENTATION
Graphically, the median is the value of X (the abscissa) corresponding to the vertical line drawn through the 50th percentile point on the cumulative frequency curve; this line divides the frequency into two equal halves.
Example 2: Using the data of the 40 laurel leaves in Example 1 above, obtain the median length of the 40 laurel leaves graphically.
Solution: The first step is to draw a smooth cumulative frequency curve or percentage ogive for the given data (see the graph, Fig. 3.2).
[Fig. 3.2: Cumulative frequency and cumulative relative frequency curve of the lengths (mm) of the 40 laurel leaves, with the class boundaries 117.5, 126.5, 135.5, 144.5, 153.5, 162.5, 171.5 and 180.5 on the horizontal axis]
We know that the median should be the 20th item (where N/2 is used) or the 20.5th item (where (N+1)/2 is used). Therefore, the value corresponding to a cumulative frequency of 20 or 20.5 is read along the X axis, which is 146.75 mm, as shown on the graph of Fig. 3.2.
A percentage ogive could also be drawn and the value corresponding to the 50th percentile read off. It is again 146.75 mm, as shown on the same graph.
THE MODE
As already defined, the mode of a set of values is the value which occurs with the greatest frequency.
Geometric Representation
The mode of a set of values can be obtained from the histogram of the distribution. To illustrate this, we present the following example.
Example: Find the modal age of adult males in a certain company from the following distribution.
Ages | Frequency
21-25 x
26-30 14
31-35 29
36-40 43
41-45 33
46-50 9
Solution: The fourth class, 36 - 40, is the modal class since it has the highest frequency. The mode therefore must lie within the modal class. To find the mode, only the histogram of three classes need be drawn, that is, the histogram of the class before the modal class (31 - 35), the modal class (36 - 40) and the class after the modal class (41 - 45). See the graph of Fig. 3.3.
[Fig. 3.3: Histogram of the classes 31 - 35, 36 - 40 and 41 - 45, with lines AC and BD giving the estimate of the mode, 38.9]
The lines AC and BD are drawn. The mode is determined by the X (abscissa) value of their intersection. In this case, the mode is found to be 38.9.
QUARTILES
ACTIVITY
1. What are quartiles? Define quartiles, deciles and percentiles.
If you cannot remember these definitions, go back to the beginning of this unit and study them again.
Just as the median splits the area under a frequency curve into two equal portions (see the diagram below), a frequency curve can also be split into more equal portions.
[Fig. 3.4: The median divides the area under the curve into two equal parts, Area A = Area B]
Extending this idea, we can split a frequency curve into as many equal portions as we wish. You will recall that the general name given to the values that split a curve into equal parts is quantiles. You will also recall the following:
1. The three values that split a distribution into four equal portions are known as quartiles. In order of magnitude, they are usually represented by Q1, Q2 and Q3 and called the first, second and third quartiles, respectively.
[Fig. 3.5: Quartiles of a frequency curve]
The second quartile is the median, since it divides the area under the curve into two equal portions.
2. The nine values that split a distribution into ten equal portions are known as deciles and are represented by D1, D2, ..., D9, the fifth decile D5 being the median. See Fig. 5.4 (b).
3. The ninety-nine values that split a distribution into one hundred equal portions are known as percentiles and are represented by P1, P2, ..., P99, where again P50 is the median.
GRAPHICAL REPRESENTATION OF QUANTILES
Example 1: The following data give the weights of 1200 duck eggs.
Weight (mid-point in grams):  57  60  63  66   69   72   75   78   81   84  87  90  93  96
No. of eggs:                   7  13  68  144  197  204  208  160  101  54  25  13   4   2
Find the median, the quartiles, D8 (the 8th decile) and P37 (the 37th percentile) using the graphical method.
Solution:
All we need to do is draw the percentage ogive of the distribution. From the percentage ogive it becomes relatively easy to find the median, which is the 50th percentile. The first quartile is the 25th percentile, the second quartile is the 50th percentile (the median) and the third quartile is the 75th percentile. The eighth decile D8 is the 80th percentile, and P37 is the 37th percentile.
(See the solution on the graph, Fig. 3.6.) The table is as shown in Table 3.1.
Weight (g)                  f      cf      Percentage cf
(upper class boundary)
58.5                        7      7       0.6
61.5                        13     20      1.7
64.5                        68     88      7.3
67.5                        144    232     19.3
70.5                        197    429     35.8
73.5                        204    633     52.8
76.5                        208    841     70.1
79.5                        160    1001    83.4
82.5                        101    1102    91.8
85.5                        54     1156    96.3
88.5                        25     1181    98.4
91.5                        13     1194    99.5
94.5                        4      1198    99.8
97.5                        2      1200    100
Table 3.1
[Fig. 3.6: Percentage ogive ("less than" curve) of the weights (g) of the eggs]
Median = 73.35 g
1st quartile ≈ 68.5 g
2nd quartile = 73.35 g
3rd quartile = 77 g
8th decile (D8) = 79 g
P37, 37th percentile = 70.7 g
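Readings taken from a hand-drawn curve are approximate. As a rough cross-check, the Python sketch below joins the points of Table 3.1 with straight lines instead of a smooth curve and reads off the same percentiles. The function name is hypothetical, and the starting point (55.5 g with cumulative frequency 0, the lower boundary of the first class 56 - 58) is an assumption added so that the first segment can be drawn. Because the segments are straight, its answers can differ slightly from values read off a smooth ogive.

```python
def ogive_value(upper_boundaries, cum_freqs, percent, start):
    """Value below which `percent` % of the items lie, read off a 'less than'
    ogive by joining its points with straight lines.

    `start` is the lower boundary of the first class (cumulative frequency 0)."""
    target = percent / 100 * cum_freqs[-1]
    xs = [start] + list(upper_boundaries)
    cs = [0] + list(cum_freqs)
    for k in range(1, len(xs)):
        if cs[k] >= target:
            fraction = (target - cs[k - 1]) / (cs[k] - cs[k - 1])
            return xs[k - 1] + fraction * (xs[k] - xs[k - 1])
    raise ValueError("percent must lie between 0 and 100")

# Table 3.1: upper class boundaries 58.5, 61.5, ..., 97.5 and cumulative frequencies.
uppers = [58.5 + 3 * k for k in range(14)]
cf = [7, 20, 88, 232, 429, 633, 841, 1001, 1102, 1156, 1181, 1194, 1198, 1200]

for name, pct in [("Median", 50), ("Q1", 25), ("Q3", 75), ("D8", 80), ("P37", 37)]:
    print(f"{name:>6}: {ogive_value(uppers, cf, pct, start=55.5):.2f} g")
```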
ACTIVITY
1. The annual salaries of five men were N5,500, N4,800, N7,000, N8,000 and N32,000.
a. Find the arithmetic mean of their salaries.
b. Find their median salary.
c. Would you say the mean is typical of the salaries?
d. Which of the two, (a) or (b), gives a more reliable average, and why?
2. The grades of a student in eight examinations were 50, 60, 75, 85, 67, 60, 56 and 72.
a. Find the mode of the grades.
b. Find the median of the grades.
c. Is the mode unique?
DERIVATION AND USE OF FORMULAE FOR THE MEASURES OF
CENTRAL TENDENCY FOR A FREQUENCY DISTRIBUTION
In the early part of this unit, we learnt about measures of central tendency and how to derive them from a set of numbers. Later, we also learnt how to locate them graphically. In this section, we will learn how to derive them from a frequency distribution.
What is the arithmetic mean or mean of a set of numbers?
If you cannot state what the mean is, go back to the opening section of this unit. You will recall that the mean is the sum of all the items in a group divided by the number of items in that group.
We have also learnt how to calculate the mean for a set or group. Let us now see how to calculate the mean for a frequency distribution.
MEAN FOR A FREQUENCY DISTRIBUTION
For a discrete frequency distribution taking values $x_1, x_2, \dots, x_n$ with corresponding frequencies $f_1, f_2, \dots, f_n$, the mean $\bar{x}$ is given by
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$
Proof: Now $x_1$ occurs exactly $f_1$ times, $x_2$ occurs $f_2$ times, and so on, so that the sum total of all items is
$$f_1x_1 + f_2x_2 + \dots + f_nx_n = \sum f_i x_i$$
and the total number of items is clearly $f_1 + f_2 + \dots + f_n = \sum f_i$.
But $\bar{x}$ is defined as the sum of all items divided by the number of items; hence
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$
Note: $\sum$ is a summation notation. The process of adding $x_1, x_2, x_3, \dots, x_n$ can be written as $x_1 + x_2 + x_3 + \dots + x_n$ and, using the $\sum$ notation, as
$$\sum_{i=1}^{n} x_i$$
i.e. the sum of all observations $x_1, x_2, \dots$ up to and including $x_n$.
Worked Example
Example: A group of 10 has a mean of 36 and a second group of 16 has a mean of 20. Find the mean of the combined group of 26.
Solution:
x     f     fx
36    10    360
20    16    320
      26    680
$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{680}{26} = 26.15$$
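The combined-group calculation is the frequency-distribution mean with the group means as values and the group sizes as frequencies. A minimal Python sketch (function name hypothetical):

```python
def mean_of_frequency_distribution(xs, fs):
    """x-bar = sum(f * x) / sum(f) for values xs with frequencies fs."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)

# Each group mean is treated as a value occurring as often as the group size.
combined = mean_of_frequency_distribution([36, 20], [10, 16])
print(f"Combined mean = {combined:.2f}")
```

Note that the simple average of the two group means, (36 + 20)/2 = 28, would be wrong here because the groups differ in size.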
Continuous Frequency Distribution
For a continuous frequency distribution or grouped discrete distribution, the last method cannot be used directly, since we do not have distinct x values but ranges of values of x.
What is done in this case is simply to take the midpoint of each class to represent the x value and proceed in the usual way, as in the last example.
Example 1: The weights (in kg) of 65 female adults of a certain female adult school are shown in the frequency distribution below. Find their mean weight.
Solution:
Class Weight (kg)   Midpoint x   Frequency f   fx
5.00 - 5.49         5.245        12            62.940
5.50 - 5.99         5.745        32            183.840
6.00 - 6.49         6.245        11            68.695
6.50 - 6.99         6.745        8             53.960
7.00 - 7.49         7.245        2             14.490
Total                            65            383.925
$\sum f = 65$, $\sum fx = 383.925$
$$\bar{x} = \frac{383.925}{65} = 5.91$$
The mean weight is 5.91 kg.
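The midpoint method can be sketched in Python as below. This is an illustration, not a library routine: the function name is hypothetical, classes are given by their limits, and the frequency of the third class is taken as 11, the value implied by fx = 68.695 and the total of 65.

```python
def grouped_mean(classes, frequencies):
    """Estimate the mean of grouped data: each item is treated as lying at
    its class midpoint, then x-bar = sum(f * x) / sum(f)."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    sum_fx = sum(f * x for x, f in zip(midpoints, frequencies))
    return sum_fx / sum(frequencies)

weight_classes = [(5.00, 5.49), (5.50, 5.99), (6.00, 6.49), (6.50, 6.99), (7.00, 7.49)]
weight_freqs = [12, 32, 11, 8, 2]
print(f"Mean weight = {grouped_mean(weight_classes, weight_freqs):.2f} kg")
```

Since only midpoints are used, the function does not require the classes to have equal widths.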
Example 2:
26 people were asked how many coins they had in their pockets and the following results were obtained:
No. of coins      0 - 4   5 - 7   8 - 10   11 - 12
No. of people     6       8       8        4
Find the mean number of coins.
Solution:
Class     Midpoint (x)   Frequency f   fx
0 - 4     2              6             12
5 - 7     6              8             48
8 - 10    9              8             72
11 - 12   11.5           4             46
Total                    26            178
$$\bar{x} = \frac{178}{26} = 6.85 \approx 7 \text{ coins to the nearest whole number of coins}$$
NOTE:
1. The fact that we have unequal class intervals makes no difference to the calculation of the mean.
2. The calculated mean (6.85) is not a possible value for a member of the distribution, since the data comprise whole numbers. However, when calculating statistical measures for discrete distributions, we often give the answer in continuous form unless otherwise specified, in which case an approximated mean value such as 7 can be used.
THE CODING METHOD
When dealing with large, awkward values of a variable, the calculation of the mean by the methods so far employed can become tedious; for this reason the coding method is introduced.
The method involves subtracting (or adding) a number from (or to) each of the original values and, if possible and convenient, dividing (or multiplying) these new values by another number to obtain a set of x' values which should be more manageable. We say that the x values have been coded (or transformed) into x' values. We then find the mean $\bar{x}'$ of the x' values and, by using a suitable decoding formula, obtain $\bar{x}$.
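Definition: If (a) the set $(x_1, x_2, \dots, x_n)$ is transformed to $(x'_1, x'_2, \dots, x'_n)$,
or (b) the frequency distribution
x:  $x_1 \; x_2 \; \dots \; x_n$
f:  $f_1 \; f_2 \; \dots \; f_n$
is transformed to
x': $x'_1 \; x'_2 \; \dots \; x'_n$
f:  $f_1 \; f_2 \; \dots \; f_n$
by means of the coding formula $x' = \frac{x - a}{b}$, and $\bar{x}'$ is found, we obtain $\bar{x}$ by means of the decoding formula
$$\bar{x} = a + b\bar{x}'$$
NOTE: a and b are chosen for convenience in order to make the x' values as simple as possible.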
Example:
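Find the mean of the set (15, 21, 24, 27, 30, 33, 36, 39, 42) using a method of coding.
Solution:
Subtract 27 (a central value) from each item. This is shown in the table below.
x             15   21   24   27   30   33   36   39   42
x' = x - 27   -12  -6   -3   0    3    6    9    12   15
$\sum x' = 24$, so $\bar{x}' = \frac{24}{9} = 2.67$ and, with a = 27 and b = 1, $\bar{x} = 27 + 2.67 = 29.67$.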
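The coding and decoding formulas can be sketched in Python. Here a = 27 as in the solution, and b = 3 is an assumed choice (every value differs from 27 by a multiple of 3), which makes the coded values small whole numbers; the decoded mean agrees with the direct mean 267/9. The function name is hypothetical.

```python
def coded_mean(values, a, b):
    """Mean by coding: x' = (x - a) / b, then decode with x-bar = a + b * x'-bar.
    Returns (decoded mean, list of coded values)."""
    coded = [(x - a) / b for x in values]
    coded_bar = sum(coded) / len(coded)
    return a + b * coded_bar, coded

data = [15, 21, 24, 27, 30, 33, 36, 39, 42]
mean_value, coded = coded_mean(data, a=27, b=3)
print("Coded values:", coded)
print(f"Mean = {mean_value:.2f}")
```

Any a and any non-zero b give the same decoded mean; they only change how easy the intermediate arithmetic is.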
MEDIAN
Recall This
The median of a set of numbers $x_1, x_2, \dots, x_n$ is defined as the middle value of the set when arranged in order of magnitude, or the mean of the two middle values if the set has an even number of items.
For a Frequency Distribution
For a discrete frequency distribution taking the values $x_1, x_2, \dots, x_n$ with corresponding frequencies $f_1, f_2, \dots, f_n$, the median is the
$$\left(\frac{\sum f + 1}{2}\right)\text{th value}$$
when the values are ranked. Here there is no distinction as to whether there is an even or odd number of items. The $\frac{\sum f + 1}{2}$ is sometimes replaced by $\frac{\sum f}{2}$ if $\sum f$ is fairly large.
It is usually desirable to include a column of cumulative frequencies when calculating the median for a discrete frequency distribution, as shown in the following example.
Example 1:
Find the median of the following discrete distribution:
x    0   1   2    3    4    5    6
f    6   4   10   20   20   30   10
Solution:
x    f    Cum f
0    6    6
1    4    10
2    10   20
3    20   40
4    20   60
5    30   90
6    10   100
     100
Here $\frac{\sum f + 1}{2} = \frac{101}{2} = 50.5$. Using the cumulative frequency column, the 50.5th item falls at x = 4, the fifth row.
Hence the median is 4.
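Reading the median off the cumulative frequency column can be sketched in Python. The function names are hypothetical; the sketch finds the two middle items by rank and averages them, which is what the $\left(\frac{\sum f + 1}{2}\right)$th position means when it falls halfway between two items.

```python
def value_at_rank(rank, xs, fs):
    """Value of the item at a given rank (1-based), using cumulative frequencies."""
    cumulative = 0
    for x, f in zip(xs, fs):
        cumulative += f
        if cumulative >= rank:
            return x
    raise ValueError("rank exceeds the number of items")

def discrete_median(xs, fs):
    """Median of a discrete frequency distribution (xs in ascending order)."""
    n = sum(fs)
    lower_middle = value_at_rank((n + 1) // 2, xs, fs)
    upper_middle = value_at_rank(n // 2 + 1, xs, fs)
    return (lower_middle + upper_middle) / 2

xs = [0, 1, 2, 3, 4, 5, 6]
fs = [6, 4, 10, 20, 20, 30, 10]
print("Median =", discrete_median(xs, fs))
```

With 100 items the 50th and 51st items both have x = 4, so the average of the two is 4.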
Grouped Data
When dealing with a continuous (or grouped discrete) distribution, we can only estimate a value for the median.
Example:
Consider the following distribution.
x             f     Cum f
10 - 19.9     2     2
20 - 29.9     14    16
30 - 39.9     38    54
40 - 49.9     23    77
50 - 59.9     6     83
60 - 69.9     1     84
              84
N = 84. Therefore the median should be the $\frac{84 + 1}{2} = 42.5$th item, which falls in the class 30 - 39.9. This class is called the median class. We need to find where in the median class the median is expected to lie. From the frequency distribution we see that there are 16 items up to 29.9 and 54 items up to 39.9. We require the 42.5th item.
We therefore need to find m such that there are 42.5 items up to m. Since there are 16 items up to 29.95 and 42.5 items up to m, there must be 42.5 - 16 = 26.5 items from 29.95 to m. Similarly, there must be 54 - 42.5 = 11.5 items from m to 39.95. Now there are a total of 38 items in the median class; therefore m must lie a fraction $\frac{26.5}{38}$ of the way along from 29.95 to 39.95.
The actual distance into the class must be $\frac{26.5}{38} \times 10$ (since 10 is the class width).
The median therefore lies at the point
$$29.95 + 10 \times \frac{26.5}{38} = 36.92$$
Note that all the numbers in the above expression are well-defined quantities: 29.95 is the lower class boundary of the median class; 26.5 is 42.5 - 16, that is, $\frac{N+1}{2}$ minus the cumulative frequency up to the lower class boundary of the median class; 38 is the median class frequency; and 10 is the median class width or interval.
This technique for estimating a median value is called the method of interpolation.
The general formula for working is therefore given, for a continuous (or grouped discrete) frequency distribution, by
$$\text{Median} = L_1 + \left(\frac{\frac{N+1}{2} - \left(\sum f\right)_1}{f_{\text{median}}}\right) C$$
or
$$\text{Median} = L_1 + \left(\frac{\frac{N}{2} - \left(\sum f\right)_1}{f_{\text{median}}}\right) C$$
where
$L_1$ = lower class boundary of the median class
N = number of items in the data
$(\sum f)_1$ = sum of the frequencies of all classes lower than the median class
$f_{\text{median}}$ = frequency of the median class
C = median class width
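The interpolation formula can be written as a short Python function. It is a sketch, not a library routine: the name `grouped_median` is hypothetical, classes are passed as a list of class boundaries plus a list of frequencies, and a flag chooses between the (N + 1)/2 and N/2 positions given above.

```python
def grouped_median(boundaries, frequencies, use_n_plus_1=True):
    """Median of grouped data by interpolation:
        L1 + ((position - cum_f_below) / f_median) * C
    `boundaries` holds the class boundaries [b0, b1, ..., bk] for k classes."""
    n = sum(frequencies)
    position = (n + 1) / 2 if use_n_plus_1 else n / 2
    cumulative = 0                                   # (sum f)_1 so far
    for i, f in enumerate(frequencies):
        if f and cumulative + f >= position:         # this is the median class
            lower = boundaries[i]                    # L1
            width = boundaries[i + 1] - boundaries[i]  # C
            return lower + (position - cumulative) / f * width
        cumulative += f
    raise ValueError("position lies beyond the last class")

# The distribution of 84 items worked through above.
bounds = [9.95, 19.95, 29.95, 39.95, 49.95, 59.95, 69.95]
freqs = [2, 14, 38, 23, 6, 1]
print(f"Median = {grouped_median(bounds, freqs):.2f}")
```

For the laurel-leaf data that follows (boundaries 117.5, 126.5, ..., 180.5 and frequencies 3, 5, 9, 12, 5, 4, 2), the (N + 1)/2 version gives 147.125, while the N/2 version gives 146.75, the same value as the graphical reading.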
Example: Find the median length of 40 laurel leaves using the interpolation formula and the interpolation method.
Length (mm)   f    cf
118 - 126     3    3
127 - 135     5    8
136 - 144     9    17
145 - 153     12   29
154 - 162     5    34
163 - 171     4    38
172 - 180     2    40
Solution: We include the cumulative frequency column and find the following:
Using the formula:
$\frac{N+1}{2} = 20.5$ (or $\frac{N}{2} = 20$), N = 40
$L_1 = 144.5$, $(\sum f)_1 = 17$, $f_{\text{median}} = 12$, C = 9
$$\text{Median} = L_1 + \left(\frac{\frac{N+1}{2} - \left(\sum f\right)_1}{f_{\text{median}}}\right) C = 144.5 + \left(\frac{20.5 - 17}{12}\right) 9 = 147.12 \text{ mm}$$
Using interpolation:
The median is the $\frac{N+1}{2} = 20.5$th item. Now the sum of the first three classes' frequencies is 17 (i.e. 17 items lie below 144.5). To reach 20.5 we require 3.5 more of the 12 cases in the fourth class. The median must therefore lie $\frac{3.5}{12}$ of the way between 144.5 and 153.5, to give the desired value.
The median therefore is:
$$144.5 + \frac{3.5}{12}(153.5 - 144.5) = 147.12 \text{ mm}$$
QUARTILES
We have just found how to calculate the median using a formula. Let us now look at other quantiles. For small sets of data, the value of calculating quantiles such as deciles or percentiles is not significant. However, they become useful for frequency distributions with a large number of items. Their location in an ordered set or a frequency distribution is calculated in a manner similar to that of the median.
Since quartiles split a distribution into four equal portions, the first and third quartiles $Q_1$ and $Q_3$ will be the $\frac{n+1}{4}$th and $\frac{3(n+1)}{4}$th items respectively in a distribution.
Similarly, $D_7$ (the seventh decile) will be the $\frac{7(n+1)}{10}$th item.
Also, $P_{23}$ (the 23rd percentile) is the $\frac{23(n+1)}{100}$th item.
In general, if a particular quantile splits a distribution into $s$ equal parts, the $j$th quantile of the set will be the $\frac{j(n+1)}{s}$th item of the size-ordered distribution.
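The position rule j(n + 1)/s can be written as a one-line helper. The sketch below uses Python's `fractions.Fraction` so that positions such as 25.25 are kept exactly; the function name `quantile_position` is illustrative only. The median is the special case j = 1, s = 2.

```python
from fractions import Fraction


def quantile_position(j, s, n):
    """Position j(n + 1)/s of the j-th s-quantile in a size-ordered set of n items.

    s = 2 gives the median, 4 quartiles, 10 deciles and 100 percentiles.
    """
    if not 1 <= j < s:
        raise ValueError("j must satisfy 1 <= j < s")
    return Fraction(j * (n + 1), s)  # exact, e.g. 101/4 rather than a rounded float


n = 100
for label, j, s in [("Q1", 1, 4), ("Q3", 3, 4), ("D7", 7, 10), ("P23", 23, 100)]:
    print(label, float(quantile_position(j, s, n)))
# Q1 25.25, Q3 75.75, D7 70.7, P23 23.23
```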
Let us now look at the following grouped distribution:
x        f    Cum f
70-72    5      5
73-75   18     23
76-78   42     65
79-81   27     92
82-84    8    100
Total  100
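In this distribution, $n = \sum f = 100$.
The 1st quartile $Q_1$ is given by the $\frac{100+1}{4}$th item and the 3rd quartile $Q_3$ is given by the $\frac{3(100+1)}{4}$th item.
It follows therefore that $Q_1$ is the 25.25th item and $Q_3$ the 75.75th item.
Since the cumulative frequency is 23 at the end of class 73-75 and 65 at the end of class 76-78, $Q_1$ occurs in the class 76 to 78, which is therefore the $Q_1$ class; similarly, 79-81 (cumulative frequency 92) is the 3rd quartile ($Q_3$) class.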
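Locating the quartile classes amounts to finding the first class whose cumulative frequency reaches the required position. A minimal sketch, assuming the table above is entered as Python lists, uses `itertools.accumulate` for the cumulative frequencies and `bisect.bisect_left` for the search; `class_containing` is a hypothetical helper name.

```python
from bisect import bisect_left
from itertools import accumulate

classes = ["70-72", "73-75", "76-78", "79-81", "82-84"]
freqs = [5, 18, 42, 27, 8]
cum = list(accumulate(freqs))  # cumulative frequencies: [5, 23, 65, 92, 100]
n = cum[-1]


def class_containing(position):
    """Index of the first class whose cumulative frequency reaches `position`."""
    return bisect_left(cum, position)


q1_pos = (n + 1) / 4       # 25.25
q3_pos = 3 * (n + 1) / 4   # 75.75
print(classes[class_containing(q1_pos)])  # 76-78
print(classes[class_containing(q3_pos)])  # 79-81
```

`bisect_left` returns the first index whose cumulative frequency is greater than or equal to the position, which matches the rule of reading down the cumulative frequency column.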
The general formulas, similar to the median interpolation method, for obtaining the 1st and 3rd quartiles are as follows:

$$Q_1 = l_1 + \left(\frac{\frac{N+1}{4} - \left(\sum f\right)_1}{f_{Q_1}}\right) C_1 \qquad Q_3 = l_3 + \left(\frac{\frac{3(N+1)}{4} - \left(\sum f\right)_3}{f_{Q_3}}\right) C_3$$

where $l_1$ and $l_3$ are the lower class boundaries of the 1st and 3rd quartile classes;
$N$ = total number of items in the distribution; $(\sum f)_1$ and $(\sum f)_3$ = cumulative frequencies lower than the respective quartile classes;
$f_{Q_1}$ and $f_{Q_3}$ = frequencies of the 1st and 3rd quartile classes; $C_1$ and $C_3$ = widths of the 1st and 3rd quartile classes.
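The quartile formulas differ from the median formula only in the position used, so one function can cover the median, quartiles, deciles and percentiles. The sketch below is our own generalisation under that reading (the name `grouped_quantile` is hypothetical): it finds the class from the cumulative frequencies and applies l + ((position - cumulative frequency below) / class frequency) × C. Applied to the 70-84 distribution above, it gives values of Q1 and Q3 that the text does not work out, so treat them as illustrative.

```python
from bisect import bisect_left
from itertools import accumulate


def grouped_quantile(boundaries, freqs, j, s):
    """j-th of the s-quantiles of a grouped distribution by interpolation.

    boundaries -- class boundaries in ascending order, one more entry than freqs
    freqs      -- class frequencies in ascending class order
    (j, s) = (1, 2) gives the median, (1, 4) gives Q1, (3, 4) gives Q3.
    """
    cum = list(accumulate(freqs))
    n = cum[-1]
    position = j * (n + 1) / s                 # position of the required item
    k = bisect_left(cum, position)             # first class whose cum f reaches it
    if k == len(freqs):
        raise ValueError("position lies beyond the last class")
    cum_below = cum[k - 1] if k > 0 else 0     # (sum f) of the classes below
    lower = boundaries[k]                      # l, lower class boundary
    width = boundaries[k + 1] - boundaries[k]  # C, class width
    return lower + (position - cum_below) / freqs[k] * width


# Distribution 70-72, ..., 82-84 (class boundaries 69.5, 72.5, ..., 84.5)
boundaries = [69.5, 72.5, 75.5, 78.5, 81.5, 84.5]
freqs = [5, 18, 42, 27, 8]
print(round(grouped_quantile(boundaries, freqs, 1, 4), 2))  # Q1: 75.66
print(round(grouped_quantile(boundaries, freqs, 3, 4), 2))  # Q3: 79.69
```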
Example:
Find, using the interpolation formula method, the median, quartiles and $P_{37}$ of the weights of 1200 ducks given below.
Weight (gms)    f     Cum f
56-58           7        7
59-61          13       20
62-64          68       88
65-67         144      232
68-70         197      429
71-73         204      633
74-76         208      841
77-79         160     1001
80-82         101     1102
83-85          54     1156
86-88          25     1181
89-91          13     1194
92-94           4     1198
95-97           2     1200
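Solution: The cumulative frequencies are shown in the table above.
(a) The median is the $\frac{N+1}{2}$ = 600.5th item.
Median class = 71-73
$L_1$ = 70.5, $(\sum f)_1$ = 429, $f_{\text{median}}$ = 204, $C$ = 3

$$\text{Median} = 70.5 + 3\left(\frac{600.5 - 429}{204}\right) = 73.02$$

(b) $Q_1$ is the $\frac{N+1}{4}$ = 300.25th item.
$Q_1$ class = 68-70
Thus, $l_1$ = 67.5, $(\sum f)_1$ = 232, $f_{Q_1}$ = 197, $C_1$ = 3

$$Q_1 = 67.5 + 3\left(\frac{300.25 - 232}{197}\right) = 68.54$$

$Q_3$ is the $\frac{3(N+1)}{4}$ = 900.75th item.
$Q_3$ class = 77-79
$l_3$ = 76.5, $(\sum f)_3$ = 841, $f_{Q_3}$ = 160, $C_3$ = 3

$$Q_3 = 76.5 + 3\left(\frac{900.75 - 841}{160}\right) = 77.62$$

(c) $P_{37}$ is the $\frac{37(N+1)}{100}$ = 444.37th item.
This lies in class 71-73.
$l$ = 70.5, $(\sum f)$ = 429, $f_{P_{37}}$ = 204, $C$ = 3

$$P_{37} = 70.5 + 3\left(\frac{444.37 - 429}{204}\right) = 70.73$$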
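The arithmetic in the duck example can be checked by feeding the quantities read off the table into the interpolation formula. The sketch below hard-codes the class values stated in the solution (lower boundary, cumulative frequency below, class frequency and width); only the positions are computed from N = 1200. The variable names are illustrative.

```python
N = 1200  # total number of ducks

# (label, j, s, lower boundary, cum. freq. below class, class frequency, width),
# taken from the worked solution
cases = [
    ("Median", 1, 2, 70.5, 429, 204, 3),
    ("Q1", 1, 4, 67.5, 232, 197, 3),
    ("Q3", 3, 4, 76.5, 841, 160, 3),
    ("P37", 37, 100, 70.5, 429, 204, 3),
]

results = {}
for label, j, s, lower, cum_below, f, c in cases:
    position = j * (N + 1) / s  # 600.5, 300.25, 900.75, 444.37
    results[label] = lower + (position - cum_below) / f * c

for label, value in results.items():
    print(f"{label}: {value:.2f}")
# Median: 73.02
# Q1: 68.54
# Q3: 77.62
# P37: 70.73
```

Unrounded, $P_{37}$ is about 70.726, which is why it rounds to 70.73 at two decimal places.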
THE MODE
The mode of a set of values is defined as the value that occurs with the greatest frequency.
For continuous or grouped discrete data, a method similar to interpolation is used.
This is illustrated by the following example. Consider the following distribution:
Class    f
21-25    2
26-30    4
31-35   29
36-40   43
41-45   33
46-50    9
The modal class is 36-40, since it is the class with the highest frequency. It is obvious that the modal value should lie in this class. Since the class (41-45) following the modal class is larger than the class (31-35) preceding it, the mode should be larger than the modal class midpoint. In general, the mode lies above or below the modal class midpoint depending on whether the class following the modal class is larger or smaller than the class previous to the modal class. This is illustrated in Fig. 3.1.
Fig. 3.1
The formula for the mode is given by

$$\text{Mode} = L_1 + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) c$$

where
$L_1$ = lower class boundary of the modal class
$\Delta_1$ = difference in frequencies between the modal class and the previous class
$\Delta_2$ = difference in frequencies between the modal class and the following class
$c$ = width of the modal class.
Note: The value $\frac{\Delta_1}{\Delta_1 + \Delta_2}$ always lies between 0 and 1.
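Using the given illustration:
$L_1$ = 35.5 (the lower class boundary of 36-40), $\Delta_1$ = 43 - 29 = 14, $\Delta_2$ = 43 - 33 = 10 and $c$ = 5

$$\text{Mode} = 35.5 + \left(\frac{14}{24}\right) 5 = 35.5 + 2.92 = 38.42$$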
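A minimal sketch of the mode formula, assuming the lower class boundary of the modal class 36-40 is 35.5 (found the same way as the boundaries 144.5 and 70.5 in the median examples). The function name `mode_by_formula` is hypothetical; it raises an error instead of returning a value when the formula does not apply.

```python
def mode_by_formula(lower_boundary, delta1, delta2, width):
    """Mode of a grouped distribution: L1 + (delta1 / (delta1 + delta2)) * c.

    delta1 -- modal class frequency minus the frequency of the class before it
    delta2 -- modal class frequency minus the frequency of the class after it
    """
    # The modal class has the highest frequency, so neither difference is negative;
    # this keeps the ratio between 0 and 1, i.e. the mode inside the modal class.
    if delta1 < 0 or delta2 < 0:
        raise ValueError("the modal class must have the highest frequency")
    if delta1 + delta2 == 0:
        raise ValueError("formula undefined: both neighbours equal the modal class")
    ratio = delta1 / (delta1 + delta2)
    return lower_boundary + ratio * width


# Illustration: modal class 36-40 (assumed lower boundary 35.5), width 5
print(round(mode_by_formula(35.5, 43 - 29, 43 - 33, 5), 2))  # 38.42
```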
Example:
The following is the distribution of marks of 62 students in a statistics test.
x            f
9.3-9.7      2
9.8-10.2     5
10.3-10.7   12
10.8-11.2   18
11.3-11.7   14
11.8-12.2    6
12.3-12.7    4
12.8-13.2    1
Find the mode.
Solution: Modal class = 10.8-11.2
$L_1$ = 10.75, $\Delta_1$ = 18 - 12 = 6, $\Delta_2$ = 18 - 14 = 4, $c$ = 0.5

$$\text{Mode} = 10.75 + \left(\frac{6}{10}\right)(0.5) = 11.05$$
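The whole procedure (pick the modal class, read off the two frequency differences, convert class limits to boundaries) can be automated. This sketch assumes equally spaced classes, so that every boundary lies halfway across the same gap between adjacent classes (0.1 in the marks table), and it treats a missing neighbour of the first or last class as having frequency 0; both are our assumptions, and `grouped_mode` is a hypothetical name.

```python
def grouped_mode(lower_limits, upper_limits, freqs):
    """Mode of a grouped distribution with equally spaced classes (an assumption).

    Picks the class with the highest frequency, turns its stated limits into
    class boundaries, then applies L1 + (delta1 / (delta1 + delta2)) * c.
    Needs at least two classes to measure the gap between adjacent classes.
    """
    k = max(range(len(freqs)), key=lambda i: freqs[i])  # first class with the top frequency
    gap = lower_limits[1] - upper_limits[0]           # e.g. 9.8 - 9.7 = 0.1
    lower_boundary = lower_limits[k] - gap / 2        # e.g. 10.8 - 0.05 = 10.75
    width = upper_limits[k] - lower_limits[k] + gap   # e.g. 0.4 + 0.1 = 0.5
    before = freqs[k - 1] if k > 0 else 0             # assumed 0 if there is no previous class
    after = freqs[k + 1] if k + 1 < len(freqs) else 0  # assumed 0 if there is no next class
    delta1, delta2 = freqs[k] - before, freqs[k] - after
    return lower_boundary + delta1 / (delta1 + delta2) * width


# Marks of 62 students in a statistics test
lower = [9.3, 9.8, 10.3, 10.8, 11.3, 11.8, 12.3, 12.8]
upper = [9.7, 10.2, 10.7, 11.2, 11.7, 12.2, 12.7, 13.2]
freqs = [2, 5, 12, 18, 14, 6, 4, 1]
print(round(grouped_mode(lower, upper, freqs), 2))  # 11.05
```

If both neighbours have the same frequency as the modal class, the formula is undefined and this sketch raises `ZeroDivisionError`.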
ASSIGNMENTS
1. The weights in kilogrammes recorded by 50 final-year students are as follows:
Weight (kg)   Number of students
54-57          5
58-61          7
62-65         10
66-69         12
70-73          6
74-77          5
78-81          4
82-85          1
Find the median, $Q_1$, $Q_3$ and the 60th percentile.
UNIT FOUR: MEASURES OF VARIABILITY OR
DISPERSION, STANDARD SCORES
(Z - SCORES AND T - SCORES) AND THE
NORMAL CURVE
INTRODUCTION
There is a need to determine these in any distribution, particularly when considering students' performance.
You are aware that when a teacher or an examiner marks or grades students' or candidates' answer scripts, he/she assigns marks or scores out of a maximum obtainable score. The fixed maximum obtainable score may be 10, 20, 30, 50 or, most often, 100. Scores may also be values of a variable (age, height, life span or weight of materials). Scores as presented above are referred to as raw scores: raw in the sense that such scores are not yet standardized or normed. Performance scores, barring examination malpractices or irregularities, depend upon the easiness or difficulty indices of items/tasks and the generosity or severity of the teacher or examiner. Other variable scores may depend upon defects in measuring instruments or errors in reading their calibrations. All these defects or errors make the interpretation of scores difficult.
Moreover, when a candidate/student gets a score of 70% in an examination, what would you make of it?
Is 70% a high score in terms of the standards of the task undertaken, or in relation to the scores of the other candidates/students who also took the examination? Supposing 70% is the highest or the lowest of all the scores earned by the students, how far apart are the other scores? To overcome the above errors, defects or undue influences on scores, norming and/or standardization of scores are devised and used.
OBJECTIVES
By the end of this unit, you should be able to:
1. define variability and give its measures;
2. calculate standard deviation;
3. convert raw scores to z-scores and vice versa;
4. transform a given z-score to a T-score and vice versa;
5. convert students' raw-score overall performance in a given set of tests or course to percentage scores and T-scores when the necessary statistics are given/obtained;
6. draw the normal curve;
7. interpret the areas of the normal curve.
MEASURES OF VARIABILITY (SPREAD) OR DISPERSION
Measures of spread, dispersion or variability indicate the degree to which the various points in a distribution deviate from the average. Measures of central tendency only describe a distribution in terms of an average or typical value, not the total picture of the distribution. Two distributions may have identical means and medians and yet differ greatly in spread, which the averages alone do not reveal. This is why measures of spread are necessary.
For illustration, consider the following distributions of scores of students in two subjects:
Distribution A    Distribution B
95                76
90                78
85                77
80                71
75                79
70                73
65                72
60                74
55                75
ΣX = 675          675
N = 9             9
Mean = 75         75
Median = 75       75
The scores in distribution B are homogeneous, with little difference between adjacent scores.
The scores in distribution A are heterogeneous, spreading far apart, and performance ranges from superior to very poor.
However, the mean and median in both distributions are the same. Therefore, there is a need for indices that describe the spread or dispersion of scores in a distribution. Several such measures are available. These include the range, quartile deviation, mean deviation, variance and standard deviation.
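The point of the two distributions can be checked numerically with Python's standard `statistics` module: the means and medians agree while the ranges (and the population standard deviations, dividing by N) do not. Only distribution A can be read with confidence from the scanned table; the B values below are an assumption consistent with the stated total of 675, mean 75 and median 75.

```python
from statistics import mean, median, pstdev

distribution_a = [95, 90, 85, 80, 75, 70, 65, 60, 55]
# Assumed values for distribution B (total 675, mean and median 75, as in the text)
distribution_b = [76, 78, 77, 71, 79, 73, 72, 74, 75]

for name, scores in (("A", distribution_a), ("B", distribution_b)):
    score_range = max(scores) - min(scores)  # highest score minus lowest score
    sd = pstdev(scores)                      # population standard deviation (divides by N)
    print(name, mean(scores), median(scores), score_range, round(sd, 2))
# A: mean 75, median 75, range 40, SD about 12.91
# B: mean 75, median 75, range 8,  SD about 2.58
```

Identical averages with a fivefold difference in range is exactly the situation the measures of spread are meant to expose.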
Range
The range is the simplest of all indices of variability. It is the difference between the highest
and lowest scores in a distribution.
The range may be inclusive or exclusive.
The exclusive range is usually quoted as the difference between the largest and the smallest scores in a distribution. However, the inclusive range is the difference between the upper