DATA COLLECTION AND SAMPLING
Learning Objective
Here, the learning objectives are as follows:
Plan how to collect statistical data to test a set of predictions.
Use data to make inferences and generalisations.
Look at alternative ways to choose a sample and decide which is the best
method to use.
This means at the end of this unit, you should be able to:
Carry out statistical investigations, collect data and sample, make
inferences and generalisations.
Introduction
To answer statistical questions, you need to collect data.
Statistical Data: the outcomes, counts or observations obtained from an
investigation or experiment.
Types of Data: There are different types of data:
1. Discrete data: data that can only take certain values. They are fixed
data values determined by counting.
The data can be counted and has a limited number of values, which usually come
in the form of whole numbers or integers. The values can therefore be 0, 1,
2, and so on.
Examples of discrete data are:
The number of people in a class
The number of test questions answered correctly
2. Continuous data: data that can take any value, including decimal
values. They are values obtained by measuring, for example over a particular
time interval. Height, weight, length, temperature, mass and time are all
examples of continuous data.
Examples are:
The weight of a baby in its first year
The temperature in a room throughout the day
Note:
Both discrete and continuous data are regarded as quantitative data, that
is, data about numeric values (number-based, countable or measurable:
how many, how much, how often, etc.).
3. Categorical data: data that can be grouped into categories instead of
being measured numerically. The data are not given in numbers but rather
in natural-language descriptions (words).
Note:
Categorical data is otherwise known as qualitative data. It is
interpretation-based, descriptive and related to language or words: why,
how or what …
Some examples are:
Colour of hair
Name of department in your school
Political affiliation in your country
The gender represented in a school basketball team.
So, data can in essence be categorised into two types:
1. Qualitative data: for example, categorical data.
2. Quantitative data: for example, discrete and continuous data.
Data
    Quantitative: Discrete Data, Continuous Data
    Qualitative: Categorical Data
Methods of Collecting Data
Earlier, in Grade 8, you learnt several ways in which data can be collected.
Some of these include:
Interviewing people
Using a questionnaire
Carrying out an experiment and observation
Taking measurements
Carrying out a survey
Suggested Teaching Approach
Design and use practical activities to test learners’ previous knowledge of:
Data and
Data collection
Data Collection
To answer questions in statistics, you need to collect data.
Data collection is a way of gathering and measuring information that
enables one to answer relevant questions and evaluate outcomes.
The following are the steps for collecting data:
First, decide which type of data you need to collect: discrete,
continuous or categorical.
Decide how to collect the data (method): interview,
questionnaire, observation, survey etc.
Evaluate the outcomes.
In your previous classes on data, the focus was on the collection of data (the
type of data and the method of collecting it). In this section, while planning
statistical investigations, you will not only collect data, you will also make
predictions, inferences and generalisations. It is therefore important to become
familiar with these.
Statistical Prediction
A statistical prediction is the expected result or outcome of a test, based
on an assumption, hypothesis or theory (a proposed explanation or
statement).
In the scientific method, the prediction is made before the relevant
research is carried out.
For instance:
From the study of a student’s performance, such as scores on tests and
exams, we may predict the student’s final grade with reasonable
accuracy.
Statistical Inference, Sample and Generalisation
Statistical Inference
Statistical inference is the practice of using sampled data to make judgements,
draw conclusions or make predictions about a larger group or
population (the people you are interested in).
It is used where you cannot question the whole population. In
such a case, you need to choose a sample.
In simple terms, inference is where data is collected from a
group of subjects and then used to make predictions about a larger group.
For example,
If you ask 100 students, from a school population,
whether they like physical school or virtual learning,
and 75 of the 100 students like virtual learning while 25 like
physical school,
then, using the data, you may infer or conclude that about 75 percent of
the general population of students in the school like virtual
learning while about 25 percent like physical school.
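The arithmetic behind this inference can be sketched in Python. This is only an illustration: the function name sample_proportions is an assumption made for this sketch, and the result is an estimate for the whole school only if the 100 students were chosen fairly.

```python
def sample_proportions(counts):
    """Return each category's share of the sample as a percentage."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}

# Results from the example: 100 students were asked.
survey = {"virtual learning": 75, "physical school": 25}
percentages = sample_proportions(survey)

for category, percent in percentages.items():
    print(f"{category}: {percent}% of the sample")
```

The same percentages are then used as the estimate for all students in the school.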
Statistical inference is therefore based on random sampling, i.e. it is based on
data sampled from a larger population.
Statistical inference therefore helps to make generalisations, i.e.
It helps to estimate a general parameter of a larger group or
population based on data acquired from samples.
It helps to predict how a larger group of subjects will perform based
on the performance of the existing subjects.
It helps to make educated predictions about how a set of results will
scale when applied to a larger population.
Generalisation
Generalisation is therefore a general statement or conclusion reached or
obtained by inference from specific cases.
Let’s get to work by looking at this example:
Social media is considered a challenge to students and their study.
You are asked to investigate the gender that spends more time on
social media.
Here you need to think about things like:
Activities or apps available on social media.
Gender in relation to the activities students spend time on
Age and gender in relation to time spent.
Sampling
1. Statistical Questions: first write some questions you could ask about the
gender that spends more time on social media.
What types of activities do students engage in on social media? WhatsApp,
Facebook, Instagram, playing games, watching events such
as sport, movies, trading, cartoons, etc.?
Which gender spends more time on social media, based on the
outcome for each individual activity?
How do age and gender determine the activity and who spends more time
on social media?
2. Make Predictions: from your questions, write some predictions you could
test.
Girls spend more time on Facebook and Instagram than boys, or
boys spend more time playing games or watching sport than girls.
While girls generally spend more time watching movies/soapies, boys
spend more time trading online.
Girls aged 11-13 spend more time watching cartoons.
3. Collect Data and Sample: Decide on some different ways of choosing a sample
to test one or more of your predictions.
To test some or all of your predictions, you need to think about the type of
data you will collect and how to collect it.
Here, you may think of collecting discrete and categorical data using
a questionnaire and/or a data sheet with headings to fill in or tick.
Then choose samples to test one or more of your predictions as shown
below:
i. Data Sheet
Fill the activities using M for Male and F for Female.
Grade Facebook Instagram Games Sport
ii. Questionnaire
Questions Yes or No
Do girls spend more time watching movies or soapies?
Are boys more involved in online trading than girls?
Questionnaire
Which group of students spends more time on cartoons?
Tick the appropriate age range and
For the gender, write either M for Male or F for Female.
Age Range Gender
11-13 14-16 17-18
4. Trial: Carry out a small trial investigation to test your data collection
method. Can you think of ways to improve your investigation?
Here, use the data sheet/questionnaire to get sample opinions from, say, 120
students from different levels of your school. For instance, 20 students
from each level (Grade 8 to A Level).
5. Generalisation: Use the results of your trial to make a generalisation about
the gender of students that spend more time on social media.
You may use your samples to generalise among:
Students as a whole.
Students at the lower secondary (Grade 8 and 9).
Students at the upper secondary (Grade 10 and 11).
Students at advanced level (AS and A levels).
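Choosing 20 students at random from each level, as suggested in the trial step, can be sketched in Python. The roster below is hypothetical (placeholder names, 50 per level), and the function name choose_sample is an assumption made for this sketch.

```python
import random

def choose_sample(students_by_level, per_level, seed=None):
    """Pick `per_level` students at random, without repeats, from every level."""
    rng = random.Random(seed)
    return {level: rng.sample(names, per_level)
            for level, names in students_by_level.items()}

# Hypothetical roster: 50 placeholder names for each of the six levels.
levels = ["Grade 8", "Grade 9", "Grade 10", "Grade 11", "AS Level", "A Level"]
roster = {level: [f"{level} student {i}" for i in range(1, 51)]
          for level in levels}

sample = choose_sample(roster, per_level=20, seed=1)
total_sampled = sum(len(chosen) for chosen in sample.values())
print(total_sampled)  # 6 levels x 20 students = 120
```

Because every student in a level has the same chance of being picked, no one is favoured by the person collecting the data.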
Do it Yourself Questions
1. You are going to investigate the impact of a school’s science fair on students
in the last five years.
a. Write some questions you could ask about the impact of science fair on
students in the last five years.
b. Write some predictions you can test.
c. Describe some different ways of choosing a sample to test one or more
predictions.
d. Which sample method is best? Give a reason for your answer.
e. Carry out a small trial of your investigation. Can you think of ways to
improve your investigation?
f. Use the results of your trial to make a generalisation about the impact of a
school’s science fair on students in the last five years.
Follow the guidelines in question one and answer the following questions:
2. A school has introduced the use of tablets at school. You are going to
investigate if the introduction of tablets at school helps any student.
3. The traffic controller of a police department wants to carry out an
investigation into controlling traffic during closing hours and the accidents
it may cause. Investigate how this section of the police may carry out the
investigation.
BIAS: POTENTIAL ISSUES AND SOURCES
Learning Objective
Here, the learning objectives shall be to learn:
about sources of bias
about ways to choose an unbiased sample
how to identify wrong or misleading information
This means at the end of this unit, you should be able to:
collect data that is representative of the whole population, that is,
unbiased.
Statistical bias occurs when statistics do not provide an accurate
representation of the population.
Sources of Bias
There are different possible sources of bias. Statistical bias can occur due to
factors such as the method of sampling, i.e. some data is biased because the
sample of people surveyed doesn’t accurately represent the population, or
certain groups are underrepresented in the sample data.
So, the accuracy or reliability of a statistical investigation depends on the
sample data collected. Sample data that does not represent the whole
population is biased.
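One common way to keep a sample representative is to choose from each group in proportion to its size. A minimal Python sketch follows. The school numbers are hypothetical, and the function name proportional_sample is an assumption made for this sketch.

```python
def proportional_sample(group_sizes, sample_size):
    """Share out `sample_size` places in proportion to each group's size."""
    population = sum(group_sizes.values())
    return {group: round(sample_size * size / population)
            for group, size in group_sizes.items()}

# Hypothetical school: 300 junior and 200 senior students, sample of 50.
allocation = proportional_sample({"junior": 300, "senior": 200}, 50)
print(allocation)  # {'junior': 30, 'senior': 20}
```

When the shares are not whole numbers, rounding may make the total differ slightly from the planned sample size, so the numbers should be checked and adjusted by hand.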
Example 1
A presidential candidate wanted to be the voice of the majority, and so he
carried out an investigation to test what people in a community think about
the meat industry.
He went to a vegetarian community and asked 7 people, and all of them
said the meat industry should be banned.
And so he concluded that everyone in the community wants the meat
industry to be banned.
In the election, he received less than 2% of the votes available.
Explain why.
His sample represents only one group (vegetarians),
who dislike meat; the entire population is not represented.
His sample data is therefore biased.
Sample data should be representative of the entire
population to be accurate. This is why bias is regarded
as prejudice in favour of or against one thing, person or
group compared with another.
Bias is bad, and our objective in this section is to minimise bias as much
as we can.
Other sources of bias are selection bias, omitted variable bias, observer bias,
recall bias, funding bias etc.
Example 2
A garage coordinator wants to find out the average number of taxis arriving at
the park in a day. On a cold day, he took a sample of taxis arriving from 7 am
to 12 pm and based his conclusion on this sample.
Explain why this is a biased sample.
It is biased in the sense that:
The time interval for the sample is short.
The sample is taken on only one day.
Being a cold day, people may have delayed leaving their
homes.
Activity
Suggest what he can do to achieve a reliable investigation.
Explain your suggestion
Do it Yourself Questions 1
1. Below are questions from an investigation:
a. Do you agree that ChatGPT will make students think less?
b. Do you think the cost of groceries is too high for minimum wage earners?
c. Do you prefer your new teacher of Philosophy?
i. Explain why these questions will give a biased result.
ii. How could the questions be written to prevent being biased?
2. A student carries out an investigation to test whether a family of 12
members, comprising 4 children, 2 parents and 6 others, prefers grape juice or
orange juice.
In her investigation, she only took a sample of the 4 children, who say they
prefer orange juice, and generalised that the whole family likes orange juice.
a. Explain the problem with her investigation.
b. Suggest how she can solve this problem.
3. A football fan base comprises 100 males and 50 females.
The football club wants to choose a representative sample of 60 fans to
support the team abroad.
i. Suggest how the selection should be done.
ii. Use your answer to determine how many males and females the football
club should choose.
4. A novelist wants to investigate whether the readers of her new literature
book find it fascinating or not. She managed to interview 64 readers out of the
sample of 256.
a. Work out the percentage of readers she interviewed.
b. The novelist thinks that the percentage could cause bias.
i. Explain why the novelist may be right.
ii. For better result, suggest another method she may use to collect her data.
Do it Yourself Questions 2
1. A teacher wants to investigate students’ low performance in Mathematics.
She developed a questionnaire, which she gave out to 320 students.
121 students returned the questionnaire. Work out and explain how this
number may cause bias.
2. LHDA is inviting all Basotho to suggest names that should be given to the
Tunnel Boring Machine (TBM). The winning name suggested will result in
unforgettable prizes from the LHDA for the winner.
When choosing the names, participants are advised to consider “A legend
from the Leribe district, the area of the tunnel excavation” as one of the
criteria.
Explain why this particular criterion might introduce bias.
3. A sample of students representing Grade 11 were given the chance to write
either Physics or Physical Science.
When they were asked, ‘Do you prefer Physical Science?’, 80% of the sampled
students said, ‘yes’.
a. Why do you think this result might be biased?
b. How would you ask the question to avoid bias?
4. The table below shows the numbers of students in a school house sport.
          Soccer   Basketball   Total
Red         48         34         82
Green       50         44         94
Total       98         78        176
The school wants to choose a representative sample of 35 students for district
competition.
a. How many students in the sample should be in green house?
b. How many students in the sample should play soccer?
c. Use a bar chart to show the data in the table.
REPRESENTING AND INTERPRETING DATA USING DIAGRAMS
Representing Data using Diagrams
For the purpose of interpreting data collected during an investigation, data
(categorical, discrete or continuous) may be represented using
diagrams, charts and graphs.
The following diagrams, charts and graphs may be used to
represent data:
Venn and Carroll diagrams.
Tally charts, frequency tables and two-way tables.
Dual and Compound bar charts.
Pie Charts
Line graphs, time series graphs and frequency polygons.
Scatter graphs
Stem-and-leaf and back-to-back stem-and-leaf diagrams.
Infographics
So, in order to interpret data, you need to choose which representation to use
for a given data. It is therefore very important to decide which type of
representation (diagram, chart, graph) is best to use to represent data.
The table below will help you to decide the right representation:
Type of Diagram,       When it is to be used
Chart or Graph
Venn diagram           When data are sorted into groups that have some
                       things in common.
Bar chart              When data being compared are discrete data.
Dual bar chart         When comparing two sets of discrete data.
Compound bar chart     When two or more sets of data are to be combined into
                       one bar, in order to identify the individual
                       quantities and the total quantity.
Frequency diagram      For comparing continuous data.
Line graph             To see how data changes over time.
Scatter graph          For comparing two sets of data points.
Pie chart              For comparing the portion of each sector with the
                       whole amount.
Infographic            For showing information in a quick and
                       easy-to-understand way.
Interpreting Data from Diagram
You have already learnt, and therefore know, how to use most of these in
representing and interpreting data (categorical, discrete or continuous).
However, our objective in this section shall be to interpret data represented
using:
Frequency Polygon
Scatter Graphs
Back-to-back and
Stem-and-leaf diagrams.
Learning Objective
Here, the learning objectives shall be to:
Draw and interpret frequency polygons for discrete and continuous
data.
Draw and interpret scatter graphs.
Draw and interpret back-to-back and stem-and-leaf diagrams.
This means at the end of this unit, you should be able to:
Interpret data from the above diagrams.
FREQUENCY POLYGON
A frequency polygon is a type of line graph obtained when the class frequency
is plotted against the class midpoint and the points are joined by straight
line segments.
So, to draw a frequency polygon, follow these steps:
Calculate the midpoint of each class interval (by finding
the mean or average of the end points of the class interval).
Prepare a table of class frequency and midpoint for each
class.
Plot a graph of class frequency (on the 𝑦-axis) against the
midpoint of the class (on the 𝑥-axis).
Then join the plotted points with straight line segments
(polygon).
Note:
The following are expected of you:
Label the axes with the quantities you have plotted on them.
Give the graph a title.
Below is an example:
Class Interval        Midpoint of               Frequency
Length, 𝑙 (mm)        Class Interval
0 < 𝑙 ≤ 20            (0 + 20) / 2 = 10            6
20 < 𝑙 ≤ 40           (20 + 40) / 2 = 30           9
40 < 𝑙 ≤ 60           (40 + 60) / 2 = 50          19
60 < 𝑙 ≤ 80           (60 + 80) / 2 = 70          16
80 < 𝑙 ≤ 100          (80 + 100) / 2 = 90          7
100 < 𝑙 ≤ 120         (100 + 120) / 2 = 110        3
The line graph is as shown below:
Frequency-Height Line Graph
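The points to plot for this frequency polygon can be worked out with a short Python sketch. It only prepares the (midpoint, frequency) pairs from the length table; drawing the graph is left to you. The function name class_midpoint is an assumption made for this sketch.

```python
def class_midpoint(lower, upper):
    """Midpoint of a class interval: the average of its two end points."""
    return (lower + upper) / 2

# Class intervals (in mm) and frequencies from the length table.
classes = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100), (100, 120)]
frequencies = [6, 9, 19, 16, 7, 3]

# Each point is (midpoint on the x-axis, frequency on the y-axis).
points = [(class_midpoint(lower, upper), frequency)
          for (lower, upper), frequency in zip(classes, frequencies)]
print(points)
```

Joining these points in order with straight line segments gives the frequency polygon.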
Activity
Use the table above and plot the frequency polygon if the frequencies are 2, 8,
9, 7 and 1 respectively.
SCATTER GRAPHS
A scatter graph is a statistical diagram that compares two sets of data. It
gives a visual representation of the relationship or correlation between
these two sets of data.
In scatter graphs, dots or crosses are used to represent values for these two
sets of data and the position of a dot or cross on the horizontal and vertical
axes indicates the values for an individual data point.
The two sets of data could have:
Positive Correlation: This is a case in which the data show an uphill pattern
as you move from left to right. As one value increases, the other also
increases and vice versa.
Negative Correlation: In this case, the data show a downhill pattern as you
move from left to right. This means as one value increases, the other
decreases and vice versa. This indicates a negative relationship between the
two sets of data.
No Correlation: This is a case when the data are random and do not have
any kind of pattern. This means there is no relationship between the two
sets of data.
Line of Best Fit on Scatter Graphs
Where two sets of data have positive or negative correlation, a line of best fit
may be drawn on the scatter graph.
The line of best fit shows the relationship that exists between the two
sets of data.
It helps to estimate the value of a variable. It may therefore be
referred to as an estimated line of best fit.
It can also be used to show the strength of the correlation:
Strong correlation: the case where most of the data points
are close to the line of best fit.
Weak correlation: the case where most of the data points
are not close to the line of best fit.
Interpolation
The line of best fit may also be used to:
Estimate the value of one variable when the value of the other
variable is given. This is called interpolation.
This is done as follows:
Identify the given value on its axis (vertical or horizontal
axis). Let’s say it is on the horizontal axis.
Draw a broken line from this value (on the horizontal axis) up to
the line of best fit.
Then, from the line of best fit, continue the line across to the
vertical axis and read off the value of the variable you are looking for.
We shall see an example of interpolation as we continue.
Note:
When drawing a line of best fit:
Make sure it is a straight line in the direction of the correlation,
with points distributed on each side of the line as equally as possible
along the line.
The line may pass directly through a number of points.
To plot a scatter graph, follow these steps:
Identify the independent and dependent variables.
Place the independent variable on the horizontal axis
(𝑥-axis) and the dependent variable on the vertical axis
(𝑦-axis).
Label the axes with the variable names and units (where
needed).
Then plot the points as on any other graph.
Give the graph a title.
Example:
The table below shows the height and weight of 10 students.
Student       1    2    3    4    5    6    7    8    9    10
Height (cm)   120  145  130  155  160  135  150  145  130  140
Weight (kg)   40   50   47   62   60   55   58   52   50   49
a. By placing height on the vertical axis and weight on the horizontal axis, plot
a scatter graph to show the relationship between the height and weight.
Plot each pair of values as a point, using your previous knowledge of graphs,
and mark it with a cross or dot.
Below is the graph.
b. Use the graph to describe the relationship between these two variables.
The graph shows a positive correlation. This means as one variable
increases, the other increases likewise and vice versa.
To be more specific, the graph shows that the taller a student is, the
heavier the student is as well, and vice versa.
c. Draw a line of best fit on your graph and describe the strength of the
correlation.
Most of the points are away from the line.
It therefore means that the correlation or relationship is a weak one.
d. Use the graph to estimate the height of a student with a weight of 56 kg.
First go to where the weight is 56kg on the horizontal axis.
From there, draw a line to the line of best fit.
From the line of best fit, draw another line to the vertical axis
representing the height.
The point at which this line touches the vertical axis is the height when
weight is 56kg.
See the graph below.
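The estimate can also be checked by calculation. The Python sketch below uses the least-squares method to find a line of best fit. This is an assumption made for this sketch: the text draws the line by eye, so a value read from a hand-drawn line may differ a little. The function name least_squares_line is also an assumption.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares straight line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Data from the table: weight on the horizontal axis, height on the vertical axis.
weights = [40, 50, 47, 62, 60, 55, 58, 52, 50, 49]
heights = [120, 145, 130, 155, 160, 135, 150, 145, 130, 140]

slope, intercept = least_squares_line(weights, heights)
estimated_height = slope * 56 + intercept  # interpolate at a weight of 56 kg
print(round(estimated_height))  # about 147 (cm)
```

A positive slope agrees with the positive correlation described in part b.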
Do it Yourself Questions
1. Describe and explain the correlation you would expect between each pair of
data below.
a. The age of a vehicle and its speedometer.
b. The amount of time fishing and the amount of bait in the bucket.
c. The number of passengers in a bus and the number of traffic lights on the
route.
2. The table below shows the height of the waves at Durban beach and the
number of surfers at the beach.
Wave height (feet)    5    8    7    3    6
Number of surfers     26   63   58   17   37
a. Plot a scatter graph to show the data
b. Describe the type and strength of the correlation between the two sets of data.
Explain your answer.
c. Draw a line of best fit for the graph.
d. Use the line of best fit to estimate the height of the wave if 15 surfers were
at the beach
3. The scatter graph below shows the number of lawns mowed by a gardener
during one week.
a. How many days does it take to mow 20 lawns?
b. About how many lawns can be mowed in 1 day?
c. Describe the relationship shown by the data.
4. The scatter graph shows the weights of a baby taken from birth through
some months.
a. What is the weight of the baby at birth?
b. What is the age of the baby when the weight is 15 pounds?
c. Does the data show a positive, a negative or no correlation?
5. Below is a scatter graph showing the relationship between the numbers of
boys and girls in different classrooms.
a. How many classrooms are there altogether?
b. Zainab and James described the relationship as follows:
Zainab says the scatter graph shows a negative correlation. This means that
the more boys there are, the fewer girls there are, and vice versa.
James says this can’t be true, that there is no relationship between the
number of boys and the number of girls.
Discuss with other learners and decide who is correct, Zainab or
James.
6. Here is a table showing the number of losses a gamer has while playing a
video game over 7 weeks.
Week 1 2 3 4 5 6 7
Losses 15 12 10 7 6 3 1
a. Plot a scatter graph for the data. Place Week on the horizontal axis and
Losses on the vertical axis.
b. Draw a line of best fit. Use your line of best fit to estimate how many losses
the gamer had in 3.5 weeks.
c. Describe the relationship and strength of the correlation.
STEM-AND-LEAF DIAGRAM
Data needs to be presented in a way that makes it easy to visualise and quickly
understand. Using a stem-and-leaf diagram is one of the many ways this may be
done.
A stem-and-leaf diagram is another way of representing data where each
number is split into two parts, namely the stem and the leaf, hence the
name.
The stem is the first few digits, that is, every digit before the last digit.
The leaf is the last digit (it must be one digit only).
The symbol ‘|’ is used to separate the stem and leaf values.
For instance,
In the number 173, 17 will be the stem and 3 the leaf.
In the number 46, 4 will be the stem and 6 the leaf.
In the number 3.9, 3 will be the stem and 9 the leaf.
A one-digit number like 7 may be considered as 07; it therefore has a
stem of 0 and a leaf of 7.
A stem-and-leaf diagram has the following features:
The numbers are arranged in line vertically and
horizontally.
The numbers are arranged in order of size, from the
smallest to the largest.
A key is used to show how to read the diagram.
How to make a stem-and-leaf diagram
First identify the smallest and largest numbers in the
data.
Identify the stems and the leaves.
Draw a vertical line and list the stem numbers to the
left of the line and each leaf number on the right next to
its corresponding stem.
Below is the table of fruits found in a bag:
Fruit Number
Apple 22
Orange 32
Pear 14
Banana 21
Cherry 4
Avocado 29
Watermelon 4
Pineapple 13
Lemon 29
Plum 20
Guava 24
Coconut 2
Grape fruit 12
Fig 1
Draw a stem-and-leaf diagram for the number of fruit.
Stem | Leaf
0    | 1 2 4 4
1    | 2 3 4
2    | 0 1 2 4 9 9
3    | 2
Key: 0 | 1 means 1 fruit
The numbers are between 1 and 32.
Set the numbers in order of size, with the stem on the left and the leaves on
the right.
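The same diagram can be built with a short Python sketch. It assumes the data are whole numbers, so that the last digit is the leaf and the digits before it are the stem. The function name stem_and_leaf is an assumption made for this sketch.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (digits before the last) to its leaves (last digits) in order."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 29 -> stem 2, leaf 9
        rows[stem].append(leaf)
    return dict(rows)

# Numbers of fruit from the table for the first bag.
bag_one = [22, 32, 14, 21, 4, 29, 4, 13, 29, 20, 24, 2, 12, 1]
diagram = stem_and_leaf(bag_one)

for stem in sorted(diagram):
    print(stem, "|", " ".join(str(leaf) for leaf in diagram[stem]))
```

Sorting the values first guarantees that the leaves in each row run from smallest to largest.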
You may then be asked to use your stem-and-leaf diagram to find other
information such as the mean, median, mode and range.
This shall be done extensively in the next topic. However, you already
have some knowledge of stem-and-leaf diagrams from your previous grade.
Suggested Teaching Approach
Design and use practical activities to test learners’ previous knowledge of:
Stem-and-leaf
BACK-TO-BACK STEM-AND-LEAF DIAGRAM
Here in this grade, the objective shall be to draw and interpret back-to-back
stem-and-leaf diagrams. It is a step beyond your previous knowledge
of stem-and-leaf diagrams.
A back-to-back stem-and-leaf diagram is a method of comparing two sets of data
by attaching two sets of leaves to the same stem in a stem-and-leaf diagram.
How to make a back-to-back stem-and-leaf diagram
This is similar to making a stem-and-leaf diagram for a single set of data, as
follows:
First identify the smallest and largest numbers in the
two sets of data.
Identify the stems and the leaves.
Draw two vertical lines and list the stem numbers
between the vertical lines.
Then set out, in order of size, the leaves for one set of data to
the left of the lines and the leaves for the other set to
the right of the lines.
Let us see the example below:
Earlier on we drew a stem-and-leaf diagram for the number of fruits in a bag.
Here is another table that shows the number of fruits found in another bag.
Fruit Number
Apple 19
Orange 27
Pear 29
Banana 33
Cherry 24
Avocado 21
Watermelon 5
Pineapple 5
Lemon 12
Plum 10
Guava 9
Coconut 13
Grape fruit 8
Fig 5
We are going to draw a back-to-back stem-and-leaf diagram to show the
number of fruits in the two bags, and so compare the two bags.
      bag 1   |   | bag 2
    4 4 2 1   | 0 | 5 5 5 8 9
      4 3 2   | 1 | 0 2 3 9
9 9 4 2 1 0   | 2 | 1 4 7 9
          2   | 3 | 3
Key: For bag 1, 1 | 0 means 1 fruit
For bag 2, 0 | 5 means 5 fruits
The numbers range from 1 to 33.
The leaves for bag 1 come out from the stem in order of size to the left,
while the leaves for bag 2 come out from the stem in order of size to the
right.
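A back-to-back diagram for the two bags can be produced with a Python sketch like the one below. It assumes whole-number data. The function names leaves_by_stem and back_to_back are assumptions made for this sketch.

```python
def leaves_by_stem(values):
    """Group the last digits (leaves) of sorted values under their stems."""
    rows = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        rows.setdefault(stem, []).append(leaf)
    return rows

def back_to_back(left_values, right_values):
    """Return one text line per stem: left leaves | stem | right leaves."""
    left, right = leaves_by_stem(left_values), leaves_by_stem(right_values)
    lines = []
    for stem in sorted(set(left) | set(right)):
        # Left leaves are reversed so the smallest leaf sits next to the stem.
        left_leaves = " ".join(str(d) for d in reversed(left.get(stem, [])))
        right_leaves = " ".join(str(d) for d in right.get(stem, []))
        lines.append(f"{left_leaves:>12} | {stem} | {right_leaves}")
    return lines

# Numbers of fruit from the two tables.
bag_one = [22, 32, 14, 21, 4, 29, 4, 13, 29, 20, 24, 2, 12, 1]
bag_two = [19, 27, 29, 33, 24, 21, 5, 5, 12, 10, 9, 13, 8, 5]

rows = back_to_back(bag_one, bag_two)
print("\n".join(rows))
```

Both sets of leaves share one column of stems, which is what makes the two bags easy to compare row by row.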
You may use the back-to-back stem-and-leaf diagram to answer questions like:
a. What fraction of the fruit in bag 1 is more than 10 but less than 20?
The data in this category are: 12, 13 and 14.
These numbers add up to 12 + 13 + 14 = 39.
The total number of fruits in bag 1 adds up to:
11 + 39 + 145 + 32 = 227
Then divide the sum of the fruits more than 10 but less than
20 by the total number of fruits in bag 1.
Therefore, the fraction of fruits in bag 1 that is more than 10 but
less than 20 = 39 / 227
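The fraction can be checked directly from the table for the first bag with a short Python sketch. The variable names are assumptions made for this sketch. Fraction comes from Python's standard library and reduces the fraction to its simplest form automatically.

```python
from fractions import Fraction

# Numbers of fruit from the table for the first bag.
bag_one = [22, 32, 14, 21, 4, 29, 4, 13, 29, 20, 24, 2, 12, 1]

# Keep only the values that are more than 10 but less than 20.
between = [n for n in bag_one if 10 < n < 20]

fraction = Fraction(sum(between), sum(bag_one))
print(sorted(between), sum(between), sum(bag_one), fraction)
```

Read from the table, the values between 10 and 20 are 12, 13 and 14, which total 39 out of 227 fruits.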
Activity
b. Which fruits in bag two have equal number?
c. Which bag has the highest number of fruits?
CALCULATING MEAN, MEDIAN, MODE AND RANGE FOR GROUPED DATA
Learning Objective
Here, the learning objective is to use:
Mean, Median, Mode and Range to compare two Grouped Data.
This means at the end of this unit, you should be able to:
Identify statistical trends and relationships between two sets of data.
You already have some basic knowledge of the statistical mean, median, mode
and range, especially how to work out the mean, median, mode and range for
individual data and for data represented in a frequency table.
In this objective, we shall look beyond individual data to calculating the
mean, median, mode and range for grouped data.
Note:
In grouped data, each group is also referred to as a “class interval”.
Calculating the mean, median, mode and range for grouped data requires a
different approach.
However, for a better understanding of this objective, we need
to remind ourselves of the following:
MEAN: is the sum of all the values divided by the number of values. It
is otherwise referred to as the average.
MEDIAN: is the middle value when the values are arranged in order
of increasing size.
MODE: is the most frequent value, that is, the value that appears most in a
set of data.
RANGE: is the largest value minus the smallest value.
Suggested Teaching Approach
Use activities to test learners’ previous knowledge of mean, median, mode and
range from:
Individual data and
Frequency table
Grouped Data
Grouped data is data that has been grouped or classified into
specific categories or ranges.
Grouping is used to make it easier to analyse and interpret large amounts of
data.
The frequency table below shows the weights of certain people. It is an example
of grouped data.
Weight (Kg) Frequency
20 < 𝑤 ≤ 30 2
30 < 𝑤 ≤ 40 13
40 < 𝑤 ≤ 50 7
50 < 𝑤 ≤ 60 6
The grouped data’s frequency table is different from the individual
data’s frequency table in the sense that the values representing the
quantity being measured are grouped, hence the name grouped data
(see the Weight column). Each group is otherwise known as a “class interval”.
The table shows that:
2 people out of the total number of people have a weight of more than 20 kg
and up to 30 kg. They are therefore in the group or class interval 20 < 𝑤 ≤ 30.
13 people have a weight of more than 30 kg and up to 40 kg, and are in the
group or class interval 30 < 𝑤 ≤ 40, and so on.
Calculating Mean, Median, Mode and Range of Grouped Data
Calculating Mean of Grouped Data
Steps for finding the mean of grouped data:
1. First work out the midpoint of each group or class interval. The
midpoint is the average of the end points of the class interval and is worked
out as shown:
20 < 𝑤 ≤ 30 is the first group or class interval in the
frequency table.
Its midpoint will be the average of 20 and 30 (class
interval), i.e.
(20 + 30) / 2 = 50 / 2 = 25 (midpoint)
So, 25 is the midpoint for class interval 20 < 𝑤 ≤ 30.
Activity
Repeat these steps and find the midpoint for each of the class intervals.
2. Then multiply each midpoint by the corresponding frequency, i.e.
25 × 2 = 50
Activity
Repeat this step for each class interval.
3. Add all the results obtained from multiplying midpoint by frequency
i.e.
50 + …
Activity
Add the results obtained from multiplying midpoint and frequency.
Also, add all the frequencies together.
4. Lastly, find the mean by dividing the total obtained by the sum of the
frequencies, i.e.
Mean equals the total of all the products of midpoint and
frequency divided by the total of all the frequencies:
Mean = (total of midpoint × frequency) / (sum of frequencies)
Activity
Using the formula, find the mean.
Round your answer to the nearest whole number.
Note:
These steps can simply be shown on a table by adding two columns
representing “midpoint” and “midpoint × frequency” to our frequency
table:
See the table below. The columns added are coloured.
Weight (kg) Midpoint Frequency Midpoint × Frequency
20 < 𝑤 ≤ 30 25 2 50
30 < 𝑤 ≤ 40 35 13 455
40 < 𝑤 ≤ 50 45 7 315
50 < 𝑤 ≤ 60 55 6 330
Total 28 1150
Mean = (total of midpoint × frequency) / (sum of frequencies) = 1150 / 28 ≈ 41.07 kg
Therefore, the estimate of the mean is 41 kg.
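The four steps can also be checked with a short Python sketch. It is only an illustration under stated assumptions: the names grouped_weights and estimate_grouped_mean are invented for this example, and each entry is assumed to hold the lower limit, upper limit and frequency of one class interval from the table.

```python
# Grouped weight data from the table: (lower limit, upper limit, frequency).
grouped_weights = [(20, 30, 2), (30, 40, 13), (40, 50, 7), (50, 60, 6)]


def estimate_grouped_mean(groups):
    """Estimate the mean of grouped data using the midpoint of each class interval."""
    total_product = 0    # running total of midpoint x frequency
    total_frequency = 0  # running total of the frequencies
    for lower, upper, frequency in groups:
        midpoint = (lower + upper) / 2          # step 1: midpoint of the class interval
        total_product += midpoint * frequency   # steps 2 and 3: multiply, then add up
        total_frequency += frequency
    return total_product / total_frequency      # step 4: divide by the sum of frequencies


mean_weight = estimate_grouped_mean(grouped_weights)
print(round(mean_weight, 2))  # 41.07
```

The result is an estimate because every weight in a class interval is treated as if it were equal to the midpoint of that interval.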
Calculating Median of Grouped Data
The steps for finding the median of grouped data are as follows:
First find the sum of all the frequencies.
Divide it by 2 (since the median is the middle value).
The median lies in the class interval or group in which the running
total of the frequencies first reaches the result of the division.
Let us find the median for our grouped data:
The sum of all the frequencies = 28
Then 28 / 2 = 14
From the frequency column, 14 falls in row 2 (because the 2 in
row 1 and the 13 in row 2 add up to 15, which is the first running total to reach 14).
And the class interval or group corresponding to row 2 is
30 < 𝑤 ≤ 40.
Therefore, the median is in the class interval 30 < 𝑤 ≤ 40.
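The search for the median class can be sketched in Python by keeping a running total of the frequencies. This is a minimal sketch, not a standard library routine: the names grouped_weights and find_median_class are invented for this example, and each entry is assumed to hold the lower limit, upper limit and frequency of one class interval.

```python
# Grouped weight data from the table: (lower limit, upper limit, frequency).
grouped_weights = [(20, 30, 2), (30, 40, 13), (40, 50, 7), (50, 60, 6)]


def find_median_class(groups):
    """Return the class interval in which the running total first reaches half the total frequency."""
    half_total = sum(frequency for _, _, frequency in groups) / 2
    running_total = 0
    for lower, upper, frequency in groups:
        running_total += frequency
        if running_total >= half_total:
            return (lower, upper)
    return None  # only reached if the table is empty


median_class = find_median_class(grouped_weights)
print(median_class)  # (30, 40)
```

Here the half total is 14. The running total is 2 after row 1 and 15 after row 2, so row 2 is the first row to reach 14.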
Calculating Mode of Grouped Data
The mode is regarded as the simplest to find of the statistical measures
(mean, median, mode and range).
For grouped data, it is the class interval or group with the highest frequency.
Steps for finding the mode:
From the table, identify the highest frequency.
Let us use our table as an example:
In our frequency table,
The greatest frequency is 13.
And the class interval or group corresponding to this frequency
is 30 < 𝑤 ≤ 40.
Therefore, the mode or modal class is 30 < 𝑤 ≤ 40.
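Finding the modal class can be sketched in Python as a search for the row with the greatest frequency. The names grouped_weights and find_modal_class are invented for this example, and each entry is assumed to hold the lower limit, upper limit and frequency of one class interval. If two intervals tie for the highest frequency, this sketch simply returns the first one.

```python
# Grouped weight data from the table: (lower limit, upper limit, frequency).
grouped_weights = [(20, 30, 2), (30, 40, 13), (40, 50, 7), (50, 60, 6)]


def find_modal_class(groups):
    """Return the class interval with the highest frequency (the first one if there is a tie)."""
    lower, upper, _ = max(groups, key=lambda group: group[2])
    return (lower, upper)


modal_class = find_modal_class(grouped_weights)
print(modal_class)  # (30, 40)
```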
Calculating Range of Grouped Data
The range, as defined, is the difference between the largest and the smallest
values, i.e. the largest value minus the smallest value.
So, in our frequency table,
The smallest possible value is 20 (the lower limit of the first class interval)
And the largest possible value is 60 (the upper limit of the last class interval)
So, the range = largest value – smallest value
i.e. range = 60 − 20 = 40 kg
The range therefore is estimated to be 40 kg.
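The estimate of the range can be sketched in Python from the limits of the class intervals. The names grouped_weights and estimate_grouped_range are invented for this example, and each entry is assumed to hold the lower limit, upper limit and frequency of one class interval.

```python
# Grouped weight data from the table: (lower limit, upper limit, frequency).
grouped_weights = [(20, 30, 2), (30, 40, 13), (40, 50, 7), (50, 60, 6)]


def estimate_grouped_range(groups):
    """Estimate the range as the largest upper limit minus the smallest lower limit."""
    smallest = min(lower for lower, _, _ in groups)
    largest = max(upper for _, upper, _ in groups)
    return largest - smallest


range_estimate = estimate_grouped_range(grouped_weights)
print(range_estimate)  # 40
```

The answer is only an estimate because the table does not record the actual smallest and largest weights, only the class intervals they fall in.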
Activity
1. From our example,
a. What do you observe about the mean and the range? Explain why both
answers are estimates.
b. What can you observe about the median and the mode? Explain your
observation.
2. The table shows the record of work submitted by two departments of a
school.
Monday Tuesday Wednesday Thursday Friday
Department 1 20 21 22 20 21
Department 2 30 15 12 36 28
a. Draw a back-to-back stem-and-leaf diagram for these departments.
b. Find the mean and range for each department.
c. Which department is more consistent?
d. Compare and comment on the records of work submitted by these two
departments.
e. The school thinks the record of work by Department 2 is better. Do you agree
or disagree? Justify your answer with an explanation.
Do it Yourself Questions
1. A school records the heights, in cm, of 51 of its students.
Class interval 100 ≤ ℎ < 110 110 ≤ ℎ < 120 120 ≤ ℎ < 130 130 ≤ ℎ < 140
Frequency 6 16 21 8
a. Estimate the mean and state the class interval in which the median height falls.
b. Find the modal class.
c. Work out an estimate for the range.
2. The data show the number of hours a sample of people spent viewing television
during one summer.
a. Complete the frequency table for this sample.
Viewing time/hours Number of people
0 ≤ ℎ < 10
10 ≤ ℎ < 20 27
20 ≤ ℎ < 30 33
30 ≤ ℎ < 40
40 ≤ ℎ < 50
50 ≤ ℎ < 60
b. Calculate an estimate of the mean viewing time for this sample of people.
c. Work out an estimate for the range.
d. State one difference you would expect to see in the data if the survey were carried out
during the winter.
3. A farmer buys 2 packets of seeds from two different companies. Each packet contains
20 seeds. The farmer records the number of plants that grow from each packet.
Company A 20 5 20 20 20 6 20 20 20 8
Company B 17 18 15 16 18 18 17 15 17 18
Draw
a. A scatter diagram of the two companies
b. A back-to-back stem-and-leaf diagram of the two companies.
c. Find the mean, median and mode for each company’s seeds.
d. Which company does the mode suggest is best?
e. Which company does the mean suggest is best?
f. Find the range of each company's seeds.
4. The list below shows the maximum daily temperature, in °F, in a certain month of the
year.
55.3 49.4 63.9 55.7 56.3 54.0 52.2 58.7 58.9 52.0
45.8 55.3 42.6 62.5 63.4 61.0 58.5 48.9 62.3 68.4
56.4 67.0 43.3 58.1 53.6 52.1 46.9 51.3 56.7 63.4
a. Complete the grouped frequency table below.
Temperature, T Tallies Frequency
40 < 𝑇 ≤ 44
44 < 𝑇 ≤ 48
48 < 𝑇 ≤ 52
52 < 𝑇 ≤ 56
56 < 𝑇 ≤ 60
60 < 𝑇 ≤ 64
64 < 𝑇 ≤ 68
68 < 𝑇 ≤ 72
b. Represent the data using:
i. Bar chart ii. Pie chart iii. Frequency polygon iv. Scatter diagram
v. Stem-and-leaf diagram
c. Calculate an estimate of the mean of the temperature.