HS791: Data Presentation
Ambarish Kunwar
Department of Biosciences and Bioengineering
Indian Institute of Technology Bombay
Outline of this Lecture
 Types of variables and data
 Presenting your data using figures
 Making proper figures and tables
 Types of error bars
 Proper use of different types of error bars
Dr. Ambarish Kunwar, Department of Biosciences and Bioengineering, IIT Bombay
Variables and Data
VARIABLES: The values of the quantities that vary from
one measurement to another are called variables.
Examples: Gender and age of people at a party.
Two types: Qualitative and Quantitative
DATA: Data are values of qualitative or quantitative
variables, belonging to a set of measurements.
Qualitative Data: No numbers can be assigned
Quantitative Data: Numbers can be assigned
Quantitative Variable
A quantitative variable may be continuous, i.e. it can
assume any value within a certain range,
or it may only assume integer values (whole numbers)
and not fractions of integers
Continuous variables are usually measurements
Examples: heights, weights, lengths
Integer (Discontinuous) variables are usually counts
Examples: number of petals on a flower, number of fishes
in a pond
Qualitative Variable
Qualitative data arise when the observations fall into
separate distinct categories.
Examples:
Colour of eyes : blue, green, brown
Exam result: pass or fail
Socio-economic status: lower, middle or upper.
Qualitative data are classified as:
Nominal if there is no natural order between the categories
(e.g. eye color)
Ordinal if an ordering exists (e.g. grades, pain level)
Tutorial Assignment I
Please indicate type of variable by checking two appropriate boxes
from: integer, continuous, quantitative, qualitative, ordinal, and
nominal
Presenting your data (Discrete)
 A simple and effective way of summarizing discrete
(qualitative or quantitative) data is by counting the
number of observations falling into each category
 The number associated with each category is called the
frequency and the collection of frequencies over all
categories gives the frequency distribution of that
variable
The relative frequency is a number which describes the
proportion of observations falling in a given category
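As a minimal sketch of these definitions, the following counts how many observations fall into each category and converts the counts into relative frequencies and percentages. The eye-colour observations are hypothetical and serve only to illustrate the calculation.

```python
from collections import Counter

# Hypothetical observations of a qualitative variable (eye colour).
observations = ["brown", "blue", "brown", "green", "brown", "blue", "brown", "brown"]

# Frequency: number of observations falling into each category.
frequency = Counter(observations)
total = sum(frequency.values())

# Relative frequency: proportion of observations in each category.
relative_frequency = {cat: count / total for cat, count in frequency.items()}
percentage = {cat: 100 * rf for cat, rf in relative_frequency.items()}

for cat in frequency:
    print(f"{cat:<8}{frequency[cat]:>4}{relative_frequency[cat]:>8.3f}{percentage[cat]:>8.1f}")
```

The relative frequencies always sum to 1 (and the percentages to 100), which is a quick check on any frequency table.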
Presenting your data (Discrete)
Illustrated using data on the number of students in different
disciplines in a university
Discipline     Total Frequency   Relative Frequency   Percentage
Physics        70                0.175                17.5
Chemistry      85                0.225                22.5
Mathematics    30                0.075                7.5
Biology        90                0.225                22.5
Statistics     45                0.1125               11.25
History        32                0.0875               8.75
Economics      34                0.100                10
The frequency distribution of a variable is often presented
graphically as a Bar Chart
Presenting your data (Bar Chart)
The vertical axis can be:
frequencies
or, relative frequencies
or, percentages
On the horizontal axis:
 All boxes should have the same width
 Leave gaps between the boxes (no connection between them)
 Boxes can be in any order.
An alternative way of displaying the data is using a Pareto Chart
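A Pareto chart is a bar chart whose bars are sorted from the most frequent category to the least frequent, usually with a running cumulative percentage. The sketch below prepares the data for such a chart; the category names and counts are hypothetical.

```python
# Hypothetical category counts, used only for illustration.
counts = {"A": 12, "B": 30, "C": 8, "D": 50}

total = sum(counts.values())

# Pareto ordering: categories sorted by descending frequency.
ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)

cumulative = 0
pareto_rows = []  # (category, frequency, cumulative percentage)
for category, count in ordered:
    cumulative += count
    pareto_rows.append((category, count, 100 * cumulative / total))

for category, count, cum_pct in pareto_rows:
    print(f"{category:<4}{count:>5}{cum_pct:>8.1f}%")
```

The bars would be drawn in the order of `pareto_rows`, with the third column plotted as the cumulative line.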
Presenting your data (Pie Chart)
 However, for relative frequencies or percentages,
a Pie Chart is often more appropriate
 Each slice represents a proportion of the total
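Since each slice represents a proportion of the total, its angle is the relative frequency multiplied by 360 degrees. A short sketch, using the relative frequencies from the discipline table earlier in this lecture:

```python
# Relative frequencies taken from the discipline table.
relative_frequency = {
    "Physics": 0.175,
    "Chemistry": 0.225,
    "Mathematics": 0.075,
    "Biology": 0.225,
    "Statistics": 0.1125,
    "History": 0.0875,
    "Economics": 0.100,
}

# Angle of each slice in degrees: proportion of the full circle.
slice_angle = {name: rf * 360 for name, rf in relative_frequency.items()}

for name, angle in slice_angle.items():
    print(f"{name:<12}{angle:6.1f} degrees")
```

The angles add up to 360 degrees only because the relative frequencies add up to 1.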
Presenting your data (Scatter Plot)
Bar charts and pie charts are often used to describe the structure within a
particular data set.
Often, occasions arise where we want to examine the relationships
between two or more quantitative data sets. The data can be continuous or
integer.
Examples:
1. Is the number of caterpillars on an oak leaf (integer variable) related to
the size of the leaf (continuous variable) for a sample of 100 leaves?
2. Is the number of sparrows (integer variable) in a particular village
determined by the number of houses (integer variable)?
3. Is the length of the upper arm bone (continuous variable) related to the
length of the upper leg bone (continuous variable) in a group of 100
students?
Presenting your data (Scatter Plot)
 The first thing to think about is whether you believe one of the
variables (the INDEPENDENT variable) has the main effect on
the values of the other variable (the DEPENDENT variable), but
not vice versa.
 If so, plot the independent variable on the x-axis and the dependent
variable on the y-axis.
Identify the independent variable
and the dependent variable in the last
three examples
If neither variable is clearly dependent or
independent, then one can choose
either variable to be the dependent
variable
Presenting your data (Scatter Plot)
 Consider another scatter
plot: Relationship between
number of fishes and
pond size
 As pond size increases,
there is a tendency for the
number of fishes to also
increase. The relationship
isn't perfect (not all the points
lie on a single line)
but seems right.
 A point that lies well away from all the other points is called an OUTLIER,
and care should always be taken to identify outliers
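One simple way to flag candidate outliers, sketched below, is to fit a straight line to the scatter plot by least squares and look for points whose vertical distance from the line (the residual) is unusually large. The pond data are hypothetical, and the cut-off of two standard deviations of the residuals is an assumption chosen for illustration, not a universal rule. A flagged point should be investigated, not automatically discarded.

```python
from statistics import mean, pstdev

# Hypothetical data: pond size (independent, x) and number of fish (dependent, y).
pond_size = [1, 2, 3, 4, 5, 6, 7, 8]
fish_count = [10, 21, 29, 41, 50, 95, 70, 81]

# Least-squares straight line: y = slope * x + intercept.
x_bar, y_bar = mean(pond_size), mean(fish_count)
sxx = sum((x - x_bar) ** 2 for x in pond_size)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(pond_size, fish_count))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual: vertical distance of each point from the fitted line.
residuals = [y - (slope * x + intercept) for x, y in zip(pond_size, fish_count)]

# Assumed cut-off: more than 2 standard deviations of the residuals.
threshold = 2 * pstdev(residuals)
candidates = [
    (x, y)
    for x, y, r in zip(pond_size, fish_count, residuals)
    if abs(r) > threshold
]
print(candidates)
```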
Presenting your data (Scatter Plot)
 Why is this point different?
 Possible Reasons?
 Number of fish was overestimated?
 Area of the pond underestimated?
 Measurements correct but the water level of the pond was higher
than the other ponds and so supported more fish?
 We can't tell by looking at the data, and we would need
to study this pond in detail to find out.
 However, the scatter plot has helped us to visualize the
data and identify this outlier.
Presenting your data (Continuous Data)
 The frequency distribution of a discrete quantitative variable may
be summarized in a bar chart, or as relative percentages in a pie chart
 Similarly, the frequency distribution of a continuous quantitative
variable can be constructed in the same way by first grouping the
observations. Example: Body weight of 100 individuals
73.26  71.87  67.88  87.78  64.00  65.78  61.45  65.64  69.77  70.39
71.46  81.55  82.45  79.77  81.38  87.35  68.76  69.00  83.67  81.81
66.54  69.87  77.46  79.93  65.69  68.62  76.50  78.97  87.98  82.76
84.38  86.35  65.76  82.00  83.38  78.32  65.69  69.63  71.67  78.66
80.45  69.97  84.39  77.55  78.26  77.65  82.99  69.69  64.66  78.50
81.95  72.49  79.27  69.13  64.29  75.67  82.25  87.18  64.20  65.58
81.45  72.89  68.25  66.53  65.39  82.32  85.69  75.63  74.67  78.36
71.59  80.95  79.72  67.32  69.93  64.39  75.62  82.75  69.97  81.49
64.35  66.16  72.07  72.77  71.38  76.39  74.76  79.00  73.67  80.81
65.87  68.99  67.95  79.93  82.69  78.62  69.50  68.97  78.89  80.76
Put them in bins and find the frequency of each bin
Presenting your data (Histogram)
Charts which show the frequency distributions of continuous
variables are called histograms.
Unlike bar charts, they are drawn without gaps between the bars
because the x-axis is used to represent the class intervals
Not so good in Excel!
Sometimes bin centers are shown
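The grouping step behind a histogram can be sketched as follows: each observation is assigned to a class interval, the number of observations per interval is counted, and the bin centers are computed for labelling the x-axis. The weights are the first ten values of the body-weight example; the bin width of 5 and the starting value of 60 are assumptions made for this illustration.

```python
# First ten body weights from the example data set.
weights = [73.26, 71.87, 67.88, 87.78, 64.00, 65.78, 61.45, 65.64, 69.77, 70.39]

# Assumed class intervals: width 5, starting at 60, i.e. [60, 65), [65, 70), ...
start, width, n_bins = 60, 5, 6

# Count how many observations fall into each class interval.
frequency = [0] * n_bins
for w in weights:
    index = int((w - start) // width)
    frequency[index] += 1

# Bin centers, sometimes shown on the x-axis instead of the interval edges.
bin_centers = [start + width * (i + 0.5) for i in range(n_bins)]

for center, count in zip(bin_centers, frequency):
    print(f"{center:6.1f}  {count}")
```

Changing `width` changes the shape of the histogram, so the choice of class intervals should be stated with the figure.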
Tutorial Assignment II
A particular data set is described to you and you must choose
which of the following four graphs (Bar Chart, Histogram, Pie Chart
and Scatter Plot) is most appropriate for presentation of this data
How to make good figures
First, let's look at a bad figure
Problems with the figure?
 Appropriate axis and origin
 Tick marks
 Legends
Do not use figures unnecessarily. They take space and are costly!
Avoid them, especially when the result can be described in a single line of text
Figure source: http://abacus.bates.edu/~ganderso/biology/resources/writing/HTWtablefigs.html
Anatomy of a Figure
The Anatomy of a Figure
Use a figure rather than a table if the same information can be conveyed
using a figure. Tables often take more space than figures
Table source: http://abacus.bates.edu/~ganderso/biology/resources/writing/HTWtablefigs.html
Anatomy of a Table
Error Bars
 Researchers are often unsure how error bars
should be used and interpreted.
 We discuss some basic features of error bars and how
these can help communicate data and assist correct
interpretation.
 Error bars may show confidence intervals, standard
errors, standard deviations, or other quantities.
 Different types of error bars give quite different
information, and so figure legends must make
clear what the error bars represent
Error Bars (Descriptive and Inferential)
Error bars, if used properly:
EITHER give information describing the data
(Descriptive Error Bars)
OR, give information about what conclusions,
or inferences, are justified
(Inferential Error Bars)
Error Bars (Descriptive)
 Range and standard deviation (SD) are
descriptive error bars because they show
how the data are spread.
 Range error bars encompass the lowest
and highest values.
 Standard Deviation (SD) roughly gives the
average or typical difference between the
data points and their mean, M.
 For approximately normally distributed data, about
two thirds of the data points lie within the
mean ± 1 SD region, and ~95% of the
data points lie within mean ± 2 SD
Figure source: Cumming et al., JCB, Vol. 177, 7 (2007)
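The descriptive quantities above can be computed with Python's standard `statistics` module. This is a minimal sketch on hypothetical measurements; the fraction of points within 1 SD of the mean is close to two thirds only when the data are roughly normally distributed.

```python
from statistics import mean, stdev

# Hypothetical measurements, for illustration only.
data = [4.1, 5.0, 5.3, 5.8, 6.2, 6.4, 7.0, 7.1, 7.9, 9.2]

m = mean(data)                      # mean, M
sd = stdev(data)                    # sample standard deviation, SD
data_range = (min(data), max(data)) # range error bars: lowest and highest values

# Fraction of data points lying within mean +/- 1 SD.
within_1sd = sum(1 for x in data if abs(x - m) <= sd)
fraction_within_1sd = within_1sd / len(data)

print(f"M = {m:.2f}, SD = {sd:.2f}, range = {data_range}")
print(f"fraction within 1 SD: {fraction_within_1sd:.2f}")
```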
Error Bars (Descriptive)
 The mean of your results (M) will tend to get closer and closer
to the true mean μ as you increase the size of your
sample, or repeat the experiment more times. Therefore,
we can use M as our best estimate of the unknown μ.
 Similarly, if you repeat an experiment more and more times,
the SD of your experimental results will tend to more and
more closely approximate the true standard deviation (σ)
that you would get if the experiment were performed an
infinite number of times
 The SD of the experimental results will be approximately
equal to σ, whether n is large or small.
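A small simulation can illustrate this behaviour. The sketch below draws samples of increasing size from a normal distribution whose "true" mean and standard deviation are assumed values (50 and 10), and prints M and SD for each sample size. The SD should stay near σ at every n, while M settles closer to μ as n grows.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is reproducible

# Assumed "true" values of mu and sigma for the simulated experiment.
TRUE_MEAN, TRUE_SD = 50.0, 10.0

results = {}
for n in (10, 100, 10_000):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    results[n] = (mean(sample), stdev(sample))
    print(f"n = {n:>6}:  M = {results[n][0]:6.2f}   SD = {results[n][1]:5.2f}")
```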
Error Bars (Inferential)
 In biology and some other fields it is common to compare
samples from two groups, to see if they are different.
 Examples: wild-type vs mutant mice, or experimental result
with a control
 To make inferences from the data i.e. to make a judgment
whether the results are significantly different, or whether the
differences in results are due to random fluctuation or
chance, a different type of error bar can be used (inferential).
 These inferential error bars are standard error (SE/SEM)
bars and confidence intervals (CI/95% CI).
Error Bars (Inferential)
The mean (M) of the data, with SE or CI error bars, gives an
indication of the region where you can expect the mean of
the whole possible set of results (the true mean, μ) to lie.
This region defines the values
that are most plausible for μ.
Figure source: Cumming et al., JCB, Vol. 177, 7 (2007)
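The standard error of the mean is the SD divided by the square root of n, and a 95% confidence interval is M plus or minus a multiple of the SE. The sketch below uses hypothetical data with n = 10 and the rough multiplier of 2, which is an approximation that is reasonable only when n is about 10 or more. Smaller samples need a larger multiplier taken from the t distribution.

```python
import math
from statistics import mean, stdev

# Hypothetical results from n = 10 independent experiments.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

n = len(data)
m = mean(data)          # M, best estimate of the true mean
sd = stdev(data)        # SD, describes the spread of the data
se = sd / math.sqrt(n)  # SE, shrinks as n increases

# Rough 95% CI: M +/- 2 x SE (approximation, assumes n of about 10 or more).
ci_95 = (m - 2 * se, m + 2 * se)

print(f"M = {m:.2f}, SD = {sd:.2f}, SE = {se:.2f}")
print(f"approximate 95% CI: {ci_95[0]:.2f} to {ci_95[1]:.2f}")
```

Because SE depends on n, SE bars become shorter as more independent results are added, whereas SD bars do not.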
Types of Error Bars
Error bars can be descriptive or inferential, and could be any of
the bars discussed on previous slides (or something else too)
Error bars are meaningless and misleading if figure
legend does not state what kind they are.
Take Home Message I
When showing error bars, always describe in the figure
legends what they are
Use error bars with caution, especially when reporting data from
replicate measurements and representative experiments
Replicates or Independent Samples
Scientists handle the wide variation that occurs in nature by
measuring a number of independently sampled individuals,
independently conducted experiments, or independent
observations (n).
Take Home Message II
Value of n (sample size or number of independently
performed experiments) must be stated in figure legend
It is very important that n (the number of independent
results) is carefully distinguished from the number of
replicates
Replicates: Example
Replicates: repetition of measurement on one individual in a
single condition, or multiple measurements of the same or
identical samples
Consider a lab experiment to determine whether deletion of a
gene in mice affects tail length.
Option 1: choose one mutant mouse and one wild type, and
perform 10 measurements of each of their tails.
Option 2: measure the tail lengths of 10 wild type mice and 10
mutant mice
Replicates: Example
Option 1 cannot answer the central question, whether gene
deletion affects tail length, because n = 1 for each genotype,
no matter how often each tail was measured
To address this question successfully we must distinguish
the possible effect of gene deletion from natural animal-to-animal variation.
Therefore, Option 2 is the correct experiment to do, as n > 1 (here n = 10 for each genotype)
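The distinction between replicates and independent samples can be made concrete in code. In this sketch, with hypothetical tail lengths, each mouse is measured three times. The replicates are averaged to give one value per mouse, and n, the mean and the SD are then computed across mice only.

```python
from statistics import mean, stdev

# Hypothetical tail lengths (cm): three replicate measurements per mouse.
wild_type = {
    "wt1": [8.1, 8.0, 8.2],
    "wt2": [7.6, 7.7, 7.5],
    "wt3": [8.4, 8.5, 8.3],
    "wt4": [7.9, 8.0, 8.1],
}

# Replicates only reduce measurement error for one animal: average them first.
per_mouse = [mean(values) for values in wild_type.values()]

n = len(per_mouse)           # number of independent animals (4), not 12
group_mean = mean(per_mouse)
group_sd = stdev(per_mouse)  # reflects animal-to-animal variation

print(f"n = {n}, M = {group_mean:.3f}, SD = {group_sd:.3f}")
```

Treating all 12 measurements as independent would overstate n and make any error bars look narrower than the animal-to-animal variation justifies.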
Representative Experiments
Sometimes a figure shows only the data for a representative
experiment. This implies that several other similar
experiments were also performed. If a representative
experiment is shown, then n = 1, and no error bars should
be shown
Take Home Message III
Show error bars only for independently repeated
experiments, and never for replicates. Data from a
representative experiment should not have error bars,
because in such an experiment, n = 1
Comparing Results-Which Error Bars to use?
 Which error bars should be used when comparing experimental
results?
 For example, biologists usually try to compare experimental
results with controls.
 It is usually appropriate to show inferential error bars, such
as SE or CI, rather than SD when comparing experimental
results with some controls.
Table source: Cumming et al., JCB, Vol. 177, 7 (2007)
Summary - Common Error Bars
Take Home Message IV
Acknowledgments and Useful Resources
 I would like to thank all those whose scientific papers,
lecture notes and other course materials available on
the internet have directly/indirectly helped me to
prepare these lecture slides.
 Some of the useful resources are listed below:
1. Cumming, G.; Fidler, F.; Vaux, D. L. (2007). "Error bars in experimental biology". The Journal of Cell Biology 177 (1): 7–11. doi:10.1083/jcb.200611141
2. http://www.dur.ac.uk/stat.web/variab.htm
3. http://www.dur.ac.uk/~dbl0www6/cont_pres.htm
4. http://www.ruf.rice.edu/~bioslabs/tools/tools.html
5. http://abacus.bates.edu/~ganderso/biology/resources/writing/HTWtablefigs.html
Thank You !