Lecture 1
Let’s start at the very beginning!
Part I: Intro
Chapter objectives
• Know the different types of models in science
• Define alternative and null hypothesis
• Understand importance of null hypothesis
• Distinguish experiments from observational studies, and know the pros and cons
What is a model?
The explanation of an observed pattern
In a model, a series of statements (or
formulae) explains why the observations
have occurred.
Types of models
Verbal/conceptual models: non-mathematical explanations of how nature works
Mathematical
models
Empirical or
statistical models:
mathematical
descriptions of
relationships
resulting from
processes
These models use mathematical expressions that best describe the data.
Their coefficients have no technical meaning.
Theoretical, mechanistic, or process-based models:
mathematical
descriptions of
processes. These models
use mathematical
expressions that best
describe the physical or
biological process.
Coefficients have a physical or biological meaning.
Example: How far will the ball roll?
Two modeling options to figure this out:
Empirical: Use data from previous experiments to create a regression predicting distance from force (correlation).
Mechanistic: Use Newton’s second law and kinematics equation to predict how far a ball will roll based on its mass and force applied (causal).
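The two options can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the force and distance pairs are made-up numbers, and the mechanistic function assumes a constant force acting for a short push time on a ball starting at rest, followed by a constant deceleration until it stops. The names fit_line and mechanistic_distance are hypothetical helpers, not part of the lecture.

```python
# Illustrative sketch only: every number below is made up for the example.

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def mechanistic_distance(force, mass, push_time, decel):
    """Distance rolled after the push, from Newton's second law and kinematics.

    Assumes a constant force acting for push_time on a ball starting at rest,
    followed by a constant deceleration decel until the ball stops.
    """
    acceleration = force / mass              # F = m * a
    launch_speed = acceleration * push_time  # v0 = a * t
    return launch_speed ** 2 / (2 * decel)   # from 0 = v0^2 - 2 * decel * d

# Empirical option: hypothetical (force in N, distance in m) measurements.
forces = [1.0, 2.0, 3.0, 4.0]
distances = [0.9, 2.1, 2.9, 4.1]
intercept, slope = fit_line(forces, distances)
empirical_prediction = intercept + slope * 5.0

# Mechanistic option: hypothetical mass, push duration, and deceleration.
mechanistic_prediction = mechanistic_distance(
    force=5.0, mass=0.5, push_time=0.2, decel=0.4
)

print(f"empirical: {empirical_prediction:.2f} m")
print(f"mechanistic: {mechanistic_prediction:.2f} m")
```

In the empirical fit, the slope and intercept simply describe the data. In the mechanistic function, every quantity (mass, push time, deceleration) corresponds to something physical.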
Once we have a model, next
step is to form a hypothesis…
We can make a prediction or predictions
deduced from our conceptual model or
theory; these predictions are called
research hypotheses.
If a particular model is correct, we would
predict specific observations under a new
set of circumstances.
Bacteria were sampled from
three surfaces on the human
body with varying disturbances,
counted, and classified into
morphospecies.
Research question: Are
intermediate levels of
disturbance associated with high
bacterial diversity?
Hypothesis: Intermediate
disturbance levels will increase
the number of bacterial
morphospecies.
Statistical hypothesis testing
Hypothesis testing involves the careful
construction of two statements: the null
hypothesis and the alternative hypothesis.
These hypotheses can look very similar
but are actually different.
Null and alternative hypotheses are similar in
some ways:
• They’re both answers to the research question.
• They both make claims about the population.
• They’re both evaluated by statistical tests.
However, there are important differences
between the two types of hypotheses,
summarized in the following table.
                     | Null hypotheses (H0)                               | Alternative hypotheses (Ha)
Definition           | A claim that there is no effect in the population. | A claim that there is an effect in the population.
Also known as        | H0                                                 | Ha, H1
Typical phrases used | No effect                                          | An effect
                     | No difference                                      | A difference
                     | No relationship                                    | A relationship
                     | No change                                          | A change
                     | Does not increase                                  | Increases
                     | Does not decrease                                  | Decreases
Symbols used         | Equality symbol (=, ≥, or ≤)                       | Inequality symbol (≠, <, or >)
p ≤ α                | Rejected                                           | Supported
p > α                | Failed to reject                                   | Not supported
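The last two rows of the table are a decision rule, and a small Python sketch can make it concrete. The coin-toss setting (H0: the probability of heads is 0.5; Ha: it is not 0.5), the data (16 heads in 20 tosses), and the choice of α = 0.05 are all assumptions made for illustration. The p-value here comes from an exact two-sided binomial test, which is one possible test among many; the function names are hypothetical.

```python
from math import comb

def binomial_two_sided_p(successes, trials, p_null=0.5):
    """Exact two-sided p-value: the total probability, under H0, of every
    outcome that is no more likely than the observed one."""
    probs = [
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = probs[successes]
    # Small relative tolerance so ties are not lost to rounding.
    return sum(p for p in probs if p <= observed * (1 + 1e-9))

def decide(p_value, alpha=0.05):
    """Apply the decision rule from the table."""
    if p_value <= alpha:
        return "reject H0; Ha supported"
    return "fail to reject H0; Ha not supported"

# Hypothetical data: 16 heads in 20 tosses.
p_value = binomial_two_sided_p(16, 20)
print(f"p = {p_value:.4f}: {decide(p_value)}")
```

Note that the second outcome is worded "fail to reject", not "accept": a large p-value does not show that H0 is true.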
Why have a null hypothesis?
Philosophers have long recognized that proof of a theory or its derived hypothesis is logically impossible, because all observations related to the hypothesis must be made.
For example:
A long history of observations in Europe led to the hypothesis that swans were white.
However, only by observing all swans everywhere could we “prove” that all swans are white.
Obviously, this is impossible. In fact, there are black swans in Australia.
We specify a null hypothesis that includes all possibilities except your prediction, because it is much simpler logically to disprove a null hypothesis.
In other
words…
A null hypothesis is critical in
science because it allows us to
test predictions in a way that
emphasizes falsifiability.
Instead of trying to prove our
hypothesis directly, we aim to
disprove the null hypothesis,
which is simpler and more
rigorous.
Swans example
Initial Hypothesis: “All swans are white”
This is difficult to prove because it would require
observing every swan in the world.
Null Hypothesis: “Not all swans are white” (includes
the possibility of non-white swans).
The null hypothesis is broader and includes the
possibility of non-white swans. Importantly, it sets up
a testable statement.
Testing the Null:
We collect a sample of swans from all over the world.
If we find even one black (or any other color) swan,
this provides evidence supporting the null
hypothesis and disproves the initial hypothesis ("All
swans are white").
What Happens if We Don’t Find a Non-White Swan?
If we fail to find a non-white swan in our sample, we have no
evidence for the null hypothesis ('Not all swans are white'),
so we provisionally accept the initial hypothesis ('All
swans are white').
However, this does not prove the initial hypothesis
definitively. It only means that our current sample
does not provide evidence to refute it.
Falsifiability
What Makes a Hypothesis Not
Falsifiable:
1. Vagueness or Ambiguity
2. Unobservable Entities or Phenomena
3. Non-Specific Predictions
4. Overly Broad or General Statements
Exercise!
Come up with an imaginary research
hypothesis and the corresponding alternative
and null statistical hypotheses.
Now let’s talk about study design!
We do studies to gather information and test hypotheses. There are two types of studies.
• In an observational study, we measure or survey members of a sample without trying to affect them.
• In an experiment, we assign people or things to groups and apply some treatment to one of the groups, while the other group does not receive the treatment.
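What makes a study an experiment is that the researcher decides who gets the treatment. A minimal sketch of random assignment is shown here, assuming an even split into two groups. The function name randomly_assign and the participant labels are hypothetical.

```python
import random

def randomly_assign(units, seed=None):
    """Shuffle the units and split them into treatment and control groups."""
    rng = random.Random(seed)  # a seed makes the assignment repeatable
    shuffled = list(units)     # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical participant labels.
participants = [f"person_{i}" for i in range(1, 9)]
groups = randomly_assign(participants, seed=1)
print("treatment:", groups["treatment"])
print("control:  ", groups["control"])
```

Because chance decides group membership, other factors (age, habits, and so on) should balance out across the groups on average. In an observational study no such step exists, since the groups form on their own.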
Observational study
Benefits
• Depending on what is being studied, the
results can be more true to life because of the
natural circumstances.
• It can help researchers to study something
that would be unethical to ask participants to
do. For instance, the effect of an unhealthy
activity such as smoking.
Weaknesses
• The lack of a controlled environment usually
means there are other factors that could have
influenced the result.
Experiment
Benefits
• In certain fields, considered the ‘gold standard’
for producing reliable evidence.
• Simple experiments, with tight controls, can give
pretty conclusive results.
Weaknesses
• Research involving this method can frequently be
expensive and time-consuming to gain approval.
• Sometimes the unnatural circumstances can be
unethical either for animals or humans. For
instance, giving people medical placebos can be
disappointing or psychologically damaging.
Exercise!
Problem 1: Drinking tea before bedtime
A study took a random sample of adults and asked them
about their bedtime habits. The data showed that
people who drank a cup of tea before bedtime were
more likely to go to sleep earlier than those who didn't
drink tea.
Which type of study method is this?
• A) Observational study
• B) Experiment
Exercise!
Another study took a group of adults and randomly
divided them into two groups. One group was told to
drink tea every night for a week, while the other
group was told not to drink tea that week.
Researchers then compared when each group fell
asleep.
Which type of study method is this?
• A) Observational study
• B) Experiment
Exercise!
Problem 2: Social media and happiness
A study randomly assigned volunteers to one of two groups:
One group was directed to use social media sites as they
usually do. One group was blocked from social media sites.
The researchers looked at which group tended to be
happier.
Which type of study method is this?
• A) Observational study
• B) Experiment
Exercise!
Another study took a random sample of people and
examined their social media habits. Each person was
classified as either a light, moderate, or heavy social
media user. The researchers looked at which groups
tended to be happier.
Which type of study method is this?
• A) Observational study
• B) Experiment
Part II: Variables
and Probability
Chapter objectives
• Define continuous and discrete variables
• Understand observation uncertainty
• Understand the role of probability in statistics
Random variables
A random variable is simply a variable
whose values are not known for certain
before a sample is taken.
The set of all possible outcomes of a
random variable is called the sample
space.
Types of variables:
• Numerical
  • Discrete (countable)
  • Continuous (measurable)
• Categorical
  • Nominal
  • Ordinal
Discrete variables:
Discrete data refers to countable,
individualized items. These items are not
divisible. Instead, they only exist in set
increments or units.
• Numerical, whole, non-negative numbers
or values
• Easily visualized on charts, graphs,
histograms, etc
• “Simple,” in whole numbers or values
• Impossible to divide into parts
Continuous variables:
Continuous data are measured rather than
counted. Continuous variables use increments
that can always be divided into smaller parts.
In other words, between any two values there
are infinitely many possible values.
• Measurements using standard units
• Determined by the method used to
measure
• Often precise down to multiple decimal
places
Discrete Data                        | Continuous Data
Countable                            | Not Countable (Infinite); Requires External Measurement/Scale
Cannot Be Divided Past Its Base Unit | Subdivisible
Less Granularity                     | More Granularity
Uncertainty and
Probability
The distinction between
discrete and continuous
variables is important,
especially when trying to
measure uncertainty and
probability.
The single most important
characteristic of biological
data is their uncertainty
Exercise! https://g.co/kgs/Y9tFuv
Let X be a random variable representing the roll of a
fair 6-sided die. Complete the following table which
will represent the theoretical distribution of X (value
of the die and its corresponding probability).
We just *observed* observation uncertainty
Observed frequencies in our sample differ from this perfect distribution. These differences arise because we are working with a finite sample, not the full population.
This variability reflects sampling uncertainty: the natural randomness introduced when we observe a subset of the full population.
As sample size increases, the observed distribution will converge toward the theoretical one.
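A short simulation can show this convergence. The sketch below rolls a simulated fair die and reports the largest gap between any observed proportion and the theoretical value of 1/6. The sample sizes and the seed are arbitrary choices, and observed_distribution is a hypothetical helper name.

```python
import random
from collections import Counter

def observed_distribution(n_rolls, rng):
    """Roll a fair six-sided die n_rolls times; return observed proportions."""
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}

rng = random.Random(42)  # fixed seed so the run is repeatable
theoretical = 1 / 6
for n_rolls in (10, 100, 10_000):
    observed = observed_distribution(n_rolls, rng)
    largest_gap = max(abs(p - theoretical) for p in observed.values())
    print(f"n = {n_rolls:>6}: largest gap from 1/6 = {largest_gap:.3f}")
```

The gap tends to shrink as the number of rolls grows, although any single run can wobble because the rolls are random.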
Important vocabulary
• Sample: random
collection of
observations
• Population: all possible
observations
Think about the swan
example!
In most statistical
analyses, we view
uncertainty in terms of
probabilities and
understanding probability
is crucial to understanding
modern applied statistics.
As a simple example, and
one that is used in nearly
every introductory
statistics book, imagine
the toss of a coin. Most of
us would state that the
probability of heads is 0.5,
but what do we really
mean by that statement?
The classical interpretation of
probability is that it is the
relative frequency of an
event that we would expect
in the long run, or in a long
sequence of identical trials.
In the coin tossing example,
the probability of heads
being 0.5 is interpreted as
the expected proportion of
heads in a long sequence of
tosses.
Interpreting probabilities as
long-run relative frequencies
is the classical statistical
interpretation.
This is called the
frequentist approach, or
frequentist statistics.
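The long-run idea can be checked with a simulated coin. The sketch below tracks the proportion of heads after every toss in one long sequence. It assumes a fair coin simulated with Python's pseudo-random generator; the seed, the number of tosses, and the function name are arbitrary choices for illustration.

```python
import random

def running_proportion_of_heads(n_tosses, seed=None):
    """Simulate fair coin tosses; return the proportion of heads after each toss."""
    rng = random.Random(seed)
    heads = 0
    proportions = []
    for toss in range(1, n_tosses + 1):
        heads += rng.random() < 0.5  # True counts as 1, False as 0
        proportions.append(heads / toss)
    return proportions

proportions = running_proportion_of_heads(100_000, seed=7)
for checkpoint in (10, 1_000, 100_000):
    print(f"after {checkpoint:>7} tosses: {proportions[checkpoint - 1]:.4f}")
```

Early in the sequence the proportion can sit far from 0.5. Under the frequentist reading, "the probability of heads is 0.5" is a statement about where this proportion settles as the sequence gets long.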