2.representation of Data

The document discusses various methods for representing data, including stem-and-leaf diagrams, box-and-whisker plots, histograms, and cumulative frequency graphs. Each method is explained with its purpose, construction steps, and examples, highlighting their suitability for different types of data. Additionally, it covers the concept of skewness in data distributions and how to interpret it using measures of central tendency.

Uploaded by

周佳文

Section 2.

Representation of data

2.1 Stem-and-leaf diagrams

 Represent discrete data: specific values that cannot be subdivided, typically
integers that can be counted.

 Show all the raw data, grouped into class intervals of equal class
width.
Note: the class width here is the difference between two consecutive
lower (or upper) class limits.

E.g., in Fig 2.3, we divide the following discrete data points into 4 Classes of
width 10:

Classes: 0-9, 10-19, 20-29, 30-39.

 Consists of:
1) a stem: defines the scale for the data
2) leaves: where the data values are written in ascending order
3) a key: explains how to read the data
4) numbers in brackets: how many values are in each class interval; not always
included, but useful for a large dataset.
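
The components listed above can be sketched in Python. This is only an illustration: the data values are made up (they are not the values in Fig 2.3), and the function name stem_and_leaf and its stem_unit parameter are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values, stem_unit=10):
    """Return the rows of a stem-and-leaf diagram as strings."""
    rows = defaultdict(list)
    for v in sorted(values):                  # leaves end up in ascending order
        stem, leaf = divmod(v, stem_unit)     # e.g. 23 -> stem 2, leaf 3
        rows[stem].append(leaf)
    lines = []
    for stem in range(min(rows), max(rows) + 1):   # keep empty stems visible
        leaves = rows.get(stem, [])
        # The number in brackets is the count of values in that class.
        lines.append(f"{stem} | {' '.join(map(str, leaves))} ({len(leaves)})")
    return lines

data = [23, 7, 15, 31, 23, 18, 4, 26, 12, 29, 35, 21]   # made-up values
diagram = stem_and_leaf(data)
print("\n".join(diagram))
print("Key: 2 | 3 represents 23")
```

Each row is one class of width 10, so the longest row shows the modal class.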

 What are stem-and-leaf diagrams used for:

1) The data is arranged into classes, so it is easy to see the modal class
interval. E.g., the modal class in Fig 2.3 is the class 20-29, in which 23 is
the mode.
2) Since the data is in ascending order, it is easy to identify the median,
quartiles (LQ and UQ), maximum and minimum.
E.g., find them for Fig 2.3:

LQ = ____________, Median = ____________, UQ = ____________, therefore
IQR = ____________
Maximum = ____________, Minimum = ____________, therefore
Range = ____________

3) Outliers can be easily identified and removed. An outlier is a data point
that differs significantly from the other observations.
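
As a rough check on the statistics read from an ordered diagram, the sketch below computes them in Python. The data is made up, and the quartiles are taken as the medians of the lower and upper halves of the ordered data, which is only one of several conventions; your syllabus may define the quartile positions differently.

```python
from statistics import median, multimode

def summary(values):
    """Key statistics of a small data set (quartiles = medians of the halves)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # values below the median position
    upper = data[half + n % 2:]      # values above the median position
    q1, q2, q3 = median(lower), median(data), median(upper)
    return {
        "min": data[0], "LQ": q1, "median": q2, "UQ": q3, "max": data[-1],
        "IQR": q3 - q1,
        "range": data[-1] - data[0],
        "mode": multimode(data),     # list, because there may be several modes
    }

data = [4, 7, 12, 15, 18, 21, 23, 23, 26, 29, 31, 35]   # made-up values
stats = summary(data)
for name, value in stats.items():
    print(name, value)
```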

Example:

2.2 Back-to-back stem-and-leaf diagrams

 Useful when the data is to be split into two comparable categories, such as
boys/girls, children/adults, Chinese/Russian, etc.
E.g.,

Note:
1) There is one stem for both girls and boys, one set of leaves for each group,
and one key that explains both.
2) The leaves on the left-hand side of the stem (Boys) increase from the
center outwards.
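
The layout described in the note can be sketched as follows. The boys' and girls' values are made up, and back_to_back is a hypothetical helper name. The point to notice is that the left-hand leaves are reversed so that they increase outwards from the shared stem.

```python
from collections import defaultdict

def back_to_back(left, right, stem_unit=10):
    """Rows of a back-to-back stem-and-leaf diagram sharing one stem."""
    def leaves_by_stem(values):
        rows = defaultdict(list)
        for v in sorted(values):
            stem, leaf = divmod(v, stem_unit)
            rows[stem].append(leaf)
        return rows

    l_rows, r_rows = leaves_by_stem(left), leaves_by_stem(right)
    stems = range(min(min(l_rows), min(r_rows)),
                  max(max(l_rows), max(r_rows)) + 1)
    # Reverse the left-hand leaves so they increase from the stem outwards.
    left_text = {s: " ".join(str(x) for x in reversed(l_rows.get(s, [])))
                 for s in stems}
    width = max(len(t) for t in left_text.values())
    return [
        f"{left_text[s].rjust(width)} | {s} | "
        f"{' '.join(str(x) for x in r_rows.get(s, []))}"
        for s in stems
    ]

boys = [12, 15, 21, 24, 24, 33]     # made-up values, left-hand side
girls = [9, 14, 18, 22, 27, 31]     # made-up values, right-hand side
rows = back_to_back(boys, girls)
print("\n".join(rows))
print("Key: 5 | 1 | 4 represents 15 for boys and 14 for girls")
```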

Example:

The following stem and leaf diagrams show the times taken by some children
and adults to complete a level on a computer game.

Key: 2 | 3 represents a time of 23 seconds

(a) Compare the times taken to complete the level between the children and
the adults.

(b) It is later discovered that two of the adults’ times, 23 seconds and 42
seconds, had been omitted from the diagram. Briefly explain whether adding
these times would change the adults’ median time.

2.3 Box-and-Whisker Plots

 A graph that clearly shows key statistics, including the median, quartiles,
minimum, maximum and outliers.

 Used for both continuous and discrete data.

 Does not show any other individual data items.

 Steps to draw a box-and-whisker plot:

1) Write the data in order from smallest to largest: the individual data points
if the data is discrete, or the classes if it is grouped.
2) Draw a scale based on the classes and label the scale.
3) Determine Lower Quartile Q1, Median Q2, Upper Quartile Q3, minimum
and maximum.
4) Complete the box using Q1, Q2 (Median) and Q3.
5) Draw the whiskers using the minimum and maximum.
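
Steps 2) to 5) can be imitated with a text-only drawing, which also shows how two plots are lined up on one scale for comparison. This is a rough sketch: the key statistics are hypothetical numbers, ascii_box_plot is an invented helper, and a real plot would be drawn on graph paper or with a plotting library.

```python
def ascii_box_plot(minimum, q1, median, q3, maximum, lo, hi, width=41):
    """Draw one box plot as a line of text on a scale running from lo to hi."""
    def col(value):
        # Map a data value to a character column on the shared scale.
        return round((value - lo) * (width - 1) / (hi - lo))

    row = [" "] * width
    for c in range(col(minimum), col(maximum) + 1):
        row[c] = "-"                          # whiskers: minimum to maximum
    for c in range(col(q1), col(q3) + 1):
        row[c] = "="                          # box: Q1 to Q3
    row[col(minimum)] = row[col(maximum)] = "|"
    row[col(q1)], row[col(q3)] = "[", "]"
    row[col(median)] = "M"                    # median mark inside the box
    return "".join(row)

# Hypothetical key statistics for two groups, drawn on the same 0-40 scale.
plot_a = ascii_box_plot(4, 10, 16, 26, 36, lo=0, hi=40)
plot_b = ascii_box_plot(8, 18, 24, 28, 32, lo=0, hi=40)
print(plot_a)
print(plot_b)
```

Because both rows use the same lo and hi, the boxes and medians can be compared by eye, one above the other.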

 Box plots are often used to compare two sets of data:
1) Both plots are drawn one above the other, on the same scale on the
x-axis.
2) It is easy to see the main shape of each data distribution.

Example:

The incomplete box plot below shows the tail lengths in cm of some students’
pets.

(a) Given that the median tail length was 21 cm, complete the box plot. Mark
the key statistics including Median, UQ, LQ, Max and Min.

(b) Find the range and interquartile range of the tail lengths

2.4 Histograms

 Displays grouped continuous or discrete data. A histogram does not allow
gaps between the class intervals. If there is a gap, re-form the classes by
taking the midpoint between consecutive classes as their shared boundary,
regardless of whether the data type is discrete or continuous.
E.g. if the given classes leave gaps, whether they are written as continuous
intervals or as discrete classes such as 0-9, 10-19, you should transform them
into 0 ≤ x < 9.5 and 9.5 ≤ x < 19.5.
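
The gap-closing rule can be sketched in Python as below. The function name close_gaps is hypothetical. As an assumption that matches the example above, the lower end of the first class is left unchanged, and the upper end of the last class is extended by half of the preceding gap.

```python
def close_gaps(classes):
    """Turn class limits such as (0, 9), (10, 19) into touching boundaries."""
    boundaries = []
    last = len(classes) - 1
    for i, (lower, upper) in enumerate(classes):
        if i == 0:
            new_lower = lower                        # assumption: keep as given
        else:
            new_lower = (classes[i - 1][1] + lower) / 2
        if i < last:
            new_upper = (upper + classes[i + 1][0]) / 2
        elif i > 0:
            # Assumption: extend the last class by half the preceding gap.
            new_upper = upper + (lower - classes[i - 1][1]) / 2
        else:
            new_upper = upper                        # a single class: no gap
        boundaries.append((new_lower, new_upper))
    return boundaries

limits = [(0, 9), (10, 19), (20, 29)]
boundaries = close_gaps(limits)
print(boundaries)
```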

 Consists of an x-axis and a y-axis, where:

1) on the x-axis, the class intervals are plotted in order. Note: the intervals
are not necessarily of equal width.

2) on the y-axis, the frequency density of each class (the frequency per unit
of the data in that class) is plotted as the height of a rectangle/bar.

E.g

 Steps to make a histogram:
 Always check that there are no gaps between the classes.

1) Find the class width of each group by subtracting the lower boundary
from the upper boundary.

2) Calculate the frequency density: frequency density = frequency ÷ class
width.

3) Label the class intervals on the x-axis and plot the frequency density of
each class as the height of its bar on the y-axis. Note: the bars may have
different widths.
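
Steps 1) and 2) can be checked with a short Python sketch. The grouped table is made up, and frequency_densities is a hypothetical helper name. It also confirms that the area of each bar (width × frequency density) gives back the frequency.

```python
def frequency_densities(table):
    """table: list of (lower_boundary, upper_boundary, frequency) tuples."""
    result = []
    for lower, upper, freq in table:
        width = upper - lower                  # step 1: class width
        density = freq / width                 # step 2: frequency density
        result.append((lower, upper, width, density))
    return result

# Made-up grouped data with unequal class widths and no gaps.
table = [(0, 10, 4), (10, 20, 12), (20, 40, 16), (40, 80, 8)]
bars = frequency_densities(table)
for lower, upper, width, density in bars:
    area = width * density                     # area of the bar = frequency
    print(f"{lower}-{upper}: width {width}, density {density}, area {area}")
```

Note that the class 20-40 has the largest frequency but not the tallest bar, because its width is twice that of the class 10-20.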

Example:

2.5 Cumulative Frequency Graph (c.f.)

 Used with data that has been organized into a grouped frequency
table. Because the raw values are not available, it is not possible to find the
actual values of the mean, median and quartiles: we can only estimate them.

 Consists of:
1) on the x-axis, the classes. Note that the sample size is usually a large
number, so you need to examine the scale carefully before labelling the axes.

2) on the y-axis, the cumulative number of data points/observations up to the
upper boundary of each class: this counts both the frequency of the data
in that specific class and that of all the data in the classes below it.

 Steps to draw a cumulative frequency graph:
1) Draw the x- and y-axes with scales based on the classes. Label the x-axis
with the random variable and its unit. Label the y-axis with cumulative frequency.

2) Plot each point (x, y) as a dot on your graph, using the upper class
boundary of each class (x) and the cumulative frequency up to that point
(y).

3) Connect the dots with a smooth curve rather than straight lines. (IMPORTANT)
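
The points to plot in step 2) can be generated as in the sketch below. The grouped table is made up and cumulative_points is a hypothetical helper name. The extra starting point at the lower boundary of the first class with cumulative frequency 0 is a common convention, added here as an assumption.

```python
from itertools import accumulate

def cumulative_points(table):
    """table: list of (lower_boundary, upper_boundary, frequency) tuples."""
    uppers = [upper for _, upper, _ in table]
    running = list(accumulate(freq for _, _, freq in table))  # running totals
    start = (table[0][0], 0)       # assumption: the curve starts at zero
    return [start] + list(zip(uppers, running))

table = [(0, 10, 4), (10, 20, 12), (20, 40, 16), (40, 80, 8)]  # made-up data
points = cumulative_points(table)
print(points)
```

The last y value equals the total frequency, which is a quick check that no class has been missed.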

 Find estimated statistics from the c.f. graph, where n is the total frequency:

1) Lower Quartile Q1:

Draw a horizontal straight line at y = n/4. The x value at the point where it
intersects the curve is the estimate of Q1.

2) Median Q2:

Draw a horizontal straight line at y = n/2. The x value at the point where it
intersects the curve is the estimate of Q2.

3) Upper Quartile Q3:

Draw a horizontal straight line at y = 3n/4. The x value at the point where it
intersects the curve is the estimate of Q3.
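
Reading values off the graph can be approximated numerically. The sketch below uses linear interpolation between the plotted points, which is a substitute for reading a hand-drawn smooth curve, so its answers may differ slightly from a reading taken from the graph. The points are made up, the function name estimate_from_cf is hypothetical, and the quartile positions n/4, n/2 and 3n/4 are the usual convention for grouped data.

```python
def estimate_from_cf(points, target):
    """Estimate the x value at which the cumulative frequency reaches target."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target <= y1 and y1 > y0:
            # Straight-line interpolation inside this class.
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("target is outside the cumulative frequency range")

# Made-up points: (upper class boundary, cumulative frequency).
points = [(0, 0), (10, 4), (20, 16), (40, 32), (80, 40)]
n = points[-1][1]                        # total frequency
q1 = estimate_from_cf(points, n / 4)
q2 = estimate_from_cf(points, n / 2)
q3 = estimate_from_cf(points, 3 * n / 4)
print("Q1 =", q1, "Median =", q2, "Q3 =", q3, "IQR =", q3 - q1)
```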

Example:
The cumulative frequency graph below shows the lengths in cm, l, of a group
of puppies in a training group.

(a) Given that the group was one of the groups used in the data
collection, find the number of puppies that were in this group.

(b) Use the graph to find an estimate for the interquartile range of the
puppies.

2.6 Summary of Comparison between Graphs/Plots

 Stem-and-leaf diagrams:
1) used with discrete data of a single variable (with back-to-back it can be
categorized into two, but still single variable)
2) shows all raw data and shape of data distribution
3) used for datasets of small sample size (less than 30)

 Box-and-Whisker Plots:
1) used with discrete or continuous data of a single variable
2) shows the range, the IQR, and the quartiles Q1, Q2 and Q3.
3) useful for comparing data patterns quickly

 Histogram:
1) used with grouped continuous (more commonly) or discrete data of
a single variable
2) can be used with varying class widths/group sizes
3) shows the frequency of each class, represented by the area of its
bar.

 Cumulative frequency graphs:

1) used with grouped continuous data of a single variable
2) shows the cumulative frequencies that fall below the upper boundary
of each Class.

Example:

A student is collecting information on his friends’ interests and believes that
his friends who only have dogs spend more time outside than his friends who
only have cats. He has surveyed 20 friends with only cats and 20 friends with
only dogs and has written down the total amount of time, rounded to the
nearest hour, each of them spent outside last week. Describe, with a reason,
which diagram would be best for the student to use to display the data.

 Skewness

1) Skewness describes the direction in which the data in a non-symmetrical
distribution leans.

-- A distribution that has its tail on the right side has positive skew
(skewed to the right).
The tail extends to the right; the majority of the data is concentrated at lower values.

-- A distribution that has its tail on the left side has negative skew
(skewed to the left).
The tail extends to the left; the majority of the data is concentrated at higher values.

2) If the distribution is shown on a box plot, looking at the differences
between the quartiles can help decide how it is skewed:
-- If the median is closer to the lower quartile, then the distribution
has positive skew:
Q3 - Q2 > Q2 - Q1

 The majority of values are lower values that are consistent and less varied
on the left side, giving a smaller Q2 - Q1.
 The tail of higher values is spread out and more varied on the right side,
giving a larger Q3 - Q2.

-- If the median is closer to the upper quartile, then the distribution
has negative skew:
Q3 - Q2 < Q2 - Q1

 The majority of values are higher values that are consistent and less varied
on the right side, giving a smaller Q3 - Q2.
 The tail of lower values is spread out and more varied on the left side,
giving a larger Q2 - Q1.
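
The quartile comparison above can be written as a small Python function. The name quartile_skew and the example quartile values are hypothetical, and the case of equal gaps (reported here as roughly symmetrical) is an addition that the rule above does not state.

```python
def quartile_skew(q1, q2, q3):
    """Classify skew by comparing the two halves of the box."""
    upper_gap = q3 - q2          # spread of the upper half of the box
    lower_gap = q2 - q1          # spread of the lower half of the box
    if upper_gap > lower_gap:
        return "positive skew"   # median is closer to the lower quartile
    if upper_gap < lower_gap:
        return "negative skew"   # median is closer to the upper quartile
    return "roughly symmetrical" # assumption: equal gaps, no skew indicated

print(quartile_skew(15, 25, 37.5))   # upper gap 12.5, lower gap 10
print(quartile_skew(10, 30, 35))     # upper gap 5, lower gap 20
```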

3) Looking at the values of the statistics can help you decide whether a
distribution is positively skewed or negatively skewed.
-- In a positively skewed distribution:
mode < median < mean

 The majority of the data is concentrated at lower values, but the mean is
pushed up by the more varied higher values on the right, which gives a
higher mean.
 The tail doesn’t influence the median as much.
 The majority of the data is concentrated at lower values, which gives a lower
mode.

-- In a negatively skewed distribution:

mean < median < mode

 The majority of the data is concentrated at higher values, but the mean is
pulled down by the more varied lower values on the left, which gives a
lower mean.
 The tail doesn’t influence the median as much.
 The majority of the data is concentrated at higher values, which gives a higher
mode.
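
The ordering of the three averages can be checked on small data sets, as in the sketch below. Both data sets are made up, skew_from_averages is a hypothetical helper name, and the ordering is a rule of thumb that may not hold for every data set.

```python
from statistics import mean, median, mode

def skew_from_averages(values):
    """Apply the mode/median/mean ordering rule to a data set."""
    m, md, mo = mean(values), median(values), mode(values)
    if mo < md < m:
        return "positive skew"
    if m < md < mo:
        return "negative skew"
    return "no clear skew from this rule"

right_tailed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]         # long tail of high values
left_tailed = [1, 6, 10, 11, 12, 12, 13, 13, 13, 14]   # long tail of low values

for name, data in [("right_tailed", right_tailed), ("left_tailed", left_tailed)]:
    print(name, mode(data), median(data), mean(data), skew_from_averages(data))
```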

Example:

The graph below shows the distribution according to height of a group of jockeys at a south
Florida horse track. Select the statement that correctly describes a relationship between measures
of central tendency for this distribution.

A. The mean is less than the mode.

B. The mode and the mean are the same.
C. The median is greater than the mode.
D. The median and the mean are the same.

The graph below shows the distribution of a group of Gator fans according to the number of junk cars
in their back yards. Select the statement that correctly gives a relationship between measures of
central tendency for this distribution.

A. The mean is the same as the median.
B. The mean is less than the mode.
C. The mode is less than the mean.
D. The median is less than the mean.
