Ch. 1 Notes
What is the goal of statistics? To learn about a large group by examining data from some of its
members.
Statistics – the science of collecting, organizing, summarizing, and analyzing information to draw
conclusions or answer questions. In addition, statistics is about providing a measure of confidence in
any conclusions.
Note that there are 4 parts to this definition:
1. Collect data
2. Organize & summarize (descriptive statistics)
3. Analyze and draw conclusions (remember our goal) using inferential statistics
4. Measure confidence: predictions cannot be 100% accurate, so how confident are you in the results?
Data – information that has been collected (used in drawing a conclusion or making a decision)
Population – the entire group of individuals to be studied (in most cases we cannot survey or analyze
EVERY member of the population)
Ex: all voters in America
Ex: all people in the world with a particular medical problem
Sample – A subset of the population being studied. We often analyze information from a sample to
make predictions for the whole population. (the sample data must be representative of the
population from which the data was drawn)
Ex: survey a sample of voters to predict who will win an election
Ex: give a sample of patients a new medication to see if it will help all patients
Note: if sound statistical procedures are followed, we can often use statistics (numerical summaries
of a sample) to estimate parameters (numerical summaries of the population). But different samples
can give different statistics, so we need to express how confident we are in the results.
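To see why different samples give different statistics, here is a minimal Python sketch. The population (10,000 ages) is made up for illustration, and the names population, parameter, and sample_means are assumptions, not part of the course material.

```python
import random
import statistics

random.seed(1)  # fixed seed only so the run is repeatable

# Hypothetical population: ages of 10,000 people (made-up data).
population = [random.randint(18, 90) for _ in range(10_000)]

# Parameter: a number that describes the whole population.
parameter = statistics.mean(population)

# Statistics: the same summary computed from three different samples of 50.
sample_means = [
    statistics.mean(random.sample(population, 50)) for _ in range(3)
]

print(f"population mean (parameter): {parameter:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean (statistic): {m:.2f}")
```

Each sample mean is an estimate of the same parameter, but the estimates generally differ from one another. That variation is why we attach a measure of confidence to any conclusion.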
Note that the 4 parts of our definition of statistics break our course into 2 main portions:
descriptive statistics and inferential statistics. Basically, we either DESCRIBE the data we
have or we use it to INFER (predict) what we can’t know for sure.
Descriptive Statistics – consists of organizing and summarizing data (numerical summaries, tables,
graphs)
Inferential Statistics – uses methods that take a result from a sample, extend it to the population,
and measure the reliability of the result. (we use statistics to estimate parameters)
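As a small illustration of the descriptive side, the sketch below summarizes a made-up data set (family sizes for 12 hypothetical families) with numerical summaries and a simple frequency table. The data and variable names are assumptions chosen for the example.

```python
import statistics
from collections import Counter

# Hypothetical data: number of people in each of 12 families (made up).
family_sizes = [2, 3, 4, 4, 5, 3, 2, 4, 6, 3, 4, 5]

# Numerical summaries
mean_size = statistics.mean(family_sizes)
median_size = statistics.median(family_sizes)

# Frequency table: how many families have each size
table = Counter(family_sizes)

print(f"mean = {mean_size:.2f}, median = {median_size}")
print(f"smallest = {min(family_sizes)}, largest = {max(family_sizes)}")
for size in sorted(table):
    # A row of stars works as a crude text bar graph.
    print(f"{size} people: {'*' * table[size]}")
```

Everything here only DESCRIBES the 12 families we have. Using the sample mean to say something about all families, and stating how reliable that claim is, would be the inferential step.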
Quantitative Variables – provide numerical measures of individuals. The values can be added or
subtracted and provide meaningful results. Examples: heights, weights, ages, shoe sizes
Discrete variable – a quantitative variable that has a finite or countable number of possible
values (i.e., 0, 1, 2, …)
Examples: number of books in your car, number of people in your family (counted items)
Continuous variable – a quantitative variable that has an infinite number of possible values
that are not countable.
Examples: gallons, weight, height (“measured amounts”: decimals/fractions make sense
even though we may round for practical purposes)
---------------------------------------------------------------------------------------------------------------------------------------
Section 1.2 – Studies vs. Experiments
In research, we wish to determine how varying the amount of an explanatory variable will affect the
value of the response variable.
Designed Experiment – when you assign the individuals in the study to a certain group, intentionally
change the value of an explanatory variable, and then record the value of the response variable for
each group. (You DO something)
Observational Study – when you observe the behavior of the individuals in the study, without trying
to influence the outcome of the study. (You WATCH what was already happening anyway)
In an observational study, the researcher can only claim association, not causation (some
underlying factor may have caused both things to happen).
Confounding in a study occurs when the effects of two or more explanatory variables are not
separated. Confounding can happen in one of two ways:
Lurking Variable: an explanatory variable that was not considered in a study, but that affects
the value of the response variable. (some underlying reason may have caused it to happen)
Confounding Variable: an explanatory variable that was considered in a study whose effect
cannot be distinguished from that of a second explanatory variable in the study. (Ex: a professor
tries two different teaching techniques, one in a morning class and the other in an
afternoon class. She knows the student performance results could be due to time of day
rather than her teaching method, but she can’t distinguish which of the two caused the
results.)
---------------------------------------------------------------------------------------------------------------------------------------
Sections 1.3 & 1.4 – Sampling Methods
Random Sampling – the process of using chance to select individuals from a population to be
included in the sample. (each individual member has an equal chance of being selected)
Simple Random Sampling (SRS) – when a sample of size n is obtained from a population of size N in
such a way that every possible sample of size n is equally likely to occur.
Although simple random sampling is the ideal, it requires us to have a frame (a list of all
individuals in the population), which we don’t always have. It can also be expensive in terms of
financial cost and/or time. So, sometimes we use other strategies.
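When a frame is available, a simple random sample can be drawn with a few lines of Python. This is a minimal sketch: the frame of 200 student IDs is hypothetical, and the seed is set only so the example is repeatable.

```python
import random

# Hypothetical frame: a list of every individual in the population (N = 200).
frame = [f"student_{i:03d}" for i in range(1, 201)]

n = 10  # sample size
rng = random.Random(42)  # seeded only so the example is repeatable

# rng.sample draws n distinct individuals without replacement, so every
# possible group of n individuals is equally likely to be chosen.
srs = rng.sample(frame, n)

print(srs)
```

Notice that the code cannot run without the full list in frame. That is the practical obstacle mentioned above: no frame, no simple random sample.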
---------------------------------------------------------------------------------------------------------------------------------------
Section 1.5 – Bias in Sampling
If the results of the sample are not representative of the population, then the sample has bias.
Sampling Bias – the technique used to obtain the sample’s individuals tends to favor one part
of the population over another and can lead to incorrect predictions.
Ex: using a phone book to call people and ask their preferences would leave out people without
landlines, including the homeless
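The phone book example can be mimicked with a small simulation. Everything in this sketch is made up for illustration: the 40% landline rate, the 70% vs. 40% preference rates, and the helper share_favoring_a are assumptions, not real survey figures.

```python
import random

rng = random.Random(0)  # seeded only so the example is repeatable

# Made-up population of 10,000 people. Assumptions for illustration only:
# 40% have a landline, and landline owners favor option A more often.
population = []
for _ in range(10_000):
    has_landline = rng.random() < 0.40
    p_favor_a = 0.70 if has_landline else 0.40
    population.append((has_landline, rng.random() < p_favor_a))

def share_favoring_a(people):
    """Proportion of the given (has_landline, favors_a) pairs that favor A."""
    return sum(favors for _, favors in people) / len(people)

true_share = share_favoring_a(population)

# Biased frame: the phone book lists only people who have a landline.
phone_book = [p for p in population if p[0]]
biased_sample = rng.sample(phone_book, 500)

# For comparison: a sample drawn from the whole population.
fair_sample = rng.sample(population, 500)

print(f"whole population:  {true_share:.2f}")
print(f"phone-book sample: {share_favoring_a(biased_sample):.2f}")
print(f"population sample: {share_favoring_a(fair_sample):.2f}")
```

Under these assumptions the phone-book sample tends to overstate support for option A, because the frame leaves out a part of the population that answers differently. Taking a larger sample from the same phone book would not fix this.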
Nonresponse Bias – when individuals who were selected for the sample but do not respond to the
survey have different opinions from those who do respond.
(this can be reduced by using callbacks for people who did not respond to mailed surveys, or by
offering an incentive)
Response Bias – when the answers on a survey do not reflect the true feelings of the
respondent. (can be the fault of interviewer error, misrepresented answers, wording of the
questions, ordering of the questions or words, type of question, or data-entry error)