Chapter 1: Statistics: The Art and Science of Learning from Data
Design, Description, Inference
Parameter – numerical summary of a population
Statistic – numerical summary of a sample
Chapter 2: Exploring Data with Graphs and Numerical Summaries
Categorical Variables:
Summarize with Counts and Percentages
Graphs – Bar Charts and Pie Charts
Quantitative Variables:
Graphs – Dotplots, Histograms, Stemplots, Boxplots
Measures of Center – Mean, Median, Mode
Measures of Spread – Range, IQR, Variance, Standard Deviation
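Choosing the best measures of center and spread for a distribution's shape (symmetric: mean and standard deviation; skewed: median and IQR)
How outliers affect measures of center and spread: the median and IQR are resistant; the mean, range, and standard deviation are not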
Empirical Rule (68% – 95% – 99.7%)
Quartiles and Percentiles
Five Number Summary
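The measures of center and spread above can be computed in a few lines. This is a minimal sketch using Python's standard `statistics` module on a small hypothetical data set with one deliberate outlier. The function name `five_number_summary` is illustrative, and it computes Q1 and Q3 as the medians of the lower and upper halves (excluding the overall median when n is odd); other software may use a slightly different quartile convention.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Q1 and Q3 are the medians of the lower and upper halves of the sorted data;
    when n is odd, the overall median is excluded from both halves.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

# Hypothetical data; 30 is a deliberate outlier.
data = [2, 4, 4, 5, 7, 9, 30]
mn, q1, med, q3, mx = five_number_summary(data)
iqr = q3 - q1

print("five-number summary:", (mn, q1, med, q3, mx))
print("mean:", round(statistics.mean(data), 2), "median:", med)
print("range:", mx - mn, "IQR:", iqr, "SD:", round(statistics.stdev(data), 2))

# 1.5 x IQR criterion, commonly used to flag potential outliers on a boxplot
outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
print("flagged outliers:", outliers)

# Dropping the outlier moves the mean much more than the median
trimmed = [x for x in data if x not in outliers]
print("without outlier: mean", round(statistics.mean(trimmed), 2),
      "median", statistics.median(trimmed))
```

Here the outlier pulls the mean (about 8.71) well above the median (5), while the median barely changes when the outlier is removed.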
Chapter 3: Association: Contingency, Correlation and Regression
Contingency Tables: Conditional Proportions
Correlation (r):
Measures the strength and direction of the linear association between two quantitative variables
positive, negative
strong, weak
number between -1 and +1, no units
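One common way to compute r is as the average product of z-scores, r = (1/(n - 1)) Σ z_x z_y. The sketch below assumes that formula and uses small hypothetical data; the function name `correlation` is illustrative (Python 3.10+ also ships `statistics.correlation`). Rescaling x shows that r has no units.

```python
from statistics import mean, stdev

def correlation(x, y):
    """Pearson correlation: r = (1 / (n - 1)) * sum of z_x * z_y."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)
    return sum(((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y)) / (n - 1)

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print("r =", round(correlation(x, y), 3))

# r has no units: changing the units of x (e.g. multiplying by 100) leaves r unchanged
print("r after rescaling x:", round(correlation([100 * v for v in x], y), 3))

# Perfect positive and perfect negative linear relationships
print(round(correlation([1, 2, 3], [2, 4, 6]), 3), round(correlation([1, 2, 3], [3, 2, 1]), 3))
```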
Regression:
Equation to predict y from x
x=explanatory (or predictor) variable
y=response variable
Regression Equation: ŷ = a + bx
slope – average change in y for a one-unit change in x
y-intercept – expected value of y when x = 0; interpret it only if x = 0 makes sense and is close to the x values observed in the data
Find the equation using the data summaries
Use the line to make predictions
Residuals = observed y – predicted y (prediction errors)
Least Squares Method: finds the line that minimizes the sum of squared residuals
R² = r²: the proportion of the variability in y that is explained by the regression on x
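Finding the line from data summaries uses b = r(s_y / s_x) and a = ȳ - b·x̄. The sketch below applies those formulas to hypothetical data, then computes residuals and R² directly as 1 - (sum of squared residuals)/(total sum of squares) to confirm it matches r². The function name `least_squares_line` is illustrative.

```python
from statistics import mean, stdev

def least_squares_line(x, y):
    """Return (a, b, r) for the least squares line y-hat = a + b*x.

    Uses the summary-statistic formulas b = r * (sy / sx) and a = ybar - b * xbar.
    """
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)
    r = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ((n - 1) * sx * sy)
    b = r * sy / sx
    a = ybar - b * xbar
    return a, b, r

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b, r = least_squares_line(x, y)

predicted = [a + b * xi for xi in x]
residuals = [yi - yhat for yi, yhat in zip(y, predicted)]   # observed - predicted

# R^2: proportion of the variability in y explained by the regression on x
sse = sum(e ** 2 for e in residuals)          # sum of squared residuals
sst = sum((yi - mean(y)) ** 2 for yi in y)    # total variability in y
r_squared = 1 - sse / sst

print("y-hat = %.2f + %.2f x" % (a, b))
print("residuals:", [round(e, 2) for e in residuals])
print("R^2 =", round(r_squared, 3), " r^2 =", round(r ** 2, 3))
```

For this data the line is ŷ = 2.2 + 0.6x and R² = r² = 0.6. Predictions are only trustworthy for x values near the observed range 1 to 5; plugging in x = 50 would be extrapolation.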
Cautions:
Extrapolation
Influential Outlier
Correlation (or Association) does not imply Causation
Simpson’s Paradox – a lurking variable can reverse the direction of the association between two categorical variables in a contingency table
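Conditional proportions and Simpson's Paradox can be seen together in a small calculation. The counts below are made up purely for illustration: the treatment labels "A" and "B" and the lurking-variable levels "mild" and "severe" are hypothetical. Within each level, A has the higher success proportion, yet after collapsing over the lurking variable, B looks better.

```python
# Hypothetical counts: (successes, total) for each treatment within each level
# of a lurking variable (severity of the case).
counts = {
    "mild":   {"A": (90, 100),   "B": (800, 1000)},
    "severe": {"A": (300, 1000), "B": (20, 100)},
}

# Conditional proportions of success within each level of the lurking variable
conditional = {}
for level, table in counts.items():
    conditional[level] = {t: s / n for t, (s, n) in table.items()}
    print(f"{level}: A = {conditional[level]['A']:.3f}, B = {conditional[level]['B']:.3f}")

# Collapse over the lurking variable by adding counts across levels
combined = {}
for t in ("A", "B"):
    successes = sum(counts[level][t][0] for level in counts)
    total = sum(counts[level][t][1] for level in counts)
    combined[t] = successes / total
print(f"combined: A = {combined['A']:.3f}, B = {combined['B']:.3f}")
```

The reversal happens because A was given mostly to severe cases and B mostly to mild ones, so severity is tangled up with treatment in the combined table.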
Chapter 4: Gathering Data
Experiments vs Observational Studies
Simple Random Sample
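In a simple random sample, every possible group of n subjects from the population is equally likely to be chosen. Python's `random.sample` draws without replacement in exactly that way. The sampling frame below (500 hypothetical student IDs) and the seed are assumptions for illustration; the seed only makes the run reproducible.

```python
import random

# Hypothetical sampling frame: a list of every member of the population
population = [f"student_{i}" for i in range(1, 501)]

rng = random.Random(2024)                # fixed seed only so the result is reproducible
sample = rng.sample(population, k=20)    # without replacement; all size-20 subsets equally likely
print(sample[:5])
```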
Surveys:
Margin of Error
Sampling Bias: Undercoverage, Volunteer Samples, Convenience Samples
Nonresponse Bias
Response Bias
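A widely used rough rule for a survey's 95% margin of error for a proportion is about 1/√n. The sketch below assumes that approximation (the function name is hypothetical) and shows that quadrupling the sample size only halves the margin of error. Note that the margin of error reflects random sampling variability only; it does not account for sampling, nonresponse, or response bias.

```python
from math import sqrt

def approx_margin_of_error(n):
    """Rough 95% margin of error for a sample proportion: about 1 / sqrt(n)."""
    return 1 / sqrt(n)

for n in (100, 400, 1600):
    print(n, f"{approx_margin_of_error(n):.1%}")
```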
Experiments:
Control: Placebos, Blind Study, Lurking Variables, Matched Pairs (Blocks)
Randomization
Replication
Experimental Units
Response Variable
Factors
Treatments
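Randomization in an experiment means using chance to decide which experimental units get which treatment. A minimal sketch, assuming 12 hypothetical subjects and two treatments ("treatment" and "placebo"): shuffle the units, then split them into equal groups. Having several units per group is the replication part; the seed is only for reproducibility.

```python
import random

units = [f"subject_{i}" for i in range(1, 13)]   # hypothetical experimental units
treatments = ["treatment", "placebo"]

rng = random.Random(7)       # seed only for reproducibility
shuffled = units[:]          # copy so the original list is untouched
rng.shuffle(shuffled)

# Split the shuffled units into equal-sized groups, one group per treatment
group_size = len(shuffled) // len(treatments)
assignment = {t: shuffled[i * group_size:(i + 1) * group_size]
              for i, t in enumerate(treatments)}
for t, group in assignment.items():
    print(t, group)
```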
Observational Studies:
Cross-sectional Studies
Retrospective Studies
Prospective Studies
Chapter 5: Probability in Our Daily Lives
Randomness
Probability
Independent Trials
Sample Space
Complement of an Event: P(Aᶜ) = 1 - P(A)
Disjoint Events A and B: P(A or B) = P(A) + P(B)
Conditional Probability: P(A | B) = P(A and B) / P(B)
Independent Events A and B: defined by P(A | B) = P(A)
Multiplication Rule for Independent Events: P(A and B) = P(A) x P(B)
P(at least one) = 1 - P(none)
Problems involving sensitivity and specificity (diagnostic testing)
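Two of the rules above are often combined in practice. The sketch uses exact `Fraction` arithmetic: first P(at least one) = 1 - P(none) with the multiplication rule for independent rolls of a fair die, then a diagnostic-testing problem solved with the conditional probability formula P(A | B) = P(A and B) / P(B). The prevalence, sensitivity, and specificity values are hypothetical, as is the function name.

```python
from fractions import Fraction

# P(at least one six in 4 rolls of a fair die) = 1 - P(no six in any roll)
p_six = Fraction(1, 6)
p_no_six_in_4 = (1 - p_six) ** 4          # independent trials: multiply
p_at_least_one_six = 1 - p_no_six_in_4    # complement of "none"
print("P(at least one six) =", p_at_least_one_six, "~", round(float(p_at_least_one_six), 4))

def p_condition_given_positive(prevalence, sensitivity, specificity):
    """P(condition | positive test) = P(condition and positive) / P(positive)."""
    p_pos_and_cond = prevalence * sensitivity                  # P(C) * P(+ | C)
    p_pos_and_no_cond = (1 - prevalence) * (1 - specificity)   # P(not C) * P(+ | not C)
    return p_pos_and_cond / (p_pos_and_cond + p_pos_and_no_cond)

# Hypothetical test: 1% prevalence, 90% sensitivity, 95% specificity
ppv = p_condition_given_positive(Fraction(1, 100), Fraction(9, 10), Fraction(19, 20))
print("P(condition | positive) =", ppv, "~", round(float(ppv), 4))
```

Even with a fairly accurate test, only about 15% of positives actually have the condition here, because the condition is rare and false positives from the large healthy group outnumber the true positives.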
Chapter 6: Probability Distributions
Discrete Random Variable:
Possible values are separate numbers, such as 0, 1, 2, ...
Probability Distribution: list, graph, or formula giving all possible values of X and their probabilities
Population Mean (expected value): μ = Σ x·P(x)
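As a small check of these definitions, take X = number of heads in 2 fair coin flips. The sketch verifies that the probabilities form a valid distribution and computes the mean as the probability-weighted sum of the values; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# X = number of heads in 2 fair coin flips
dist = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# A valid distribution: each probability is between 0 and 1, and they sum to 1
assert all(0 <= p <= 1 for p in dist.values())
assert sum(dist.values()) == 1

mu = sum(x * p for x, p in dist.items())   # mean = sum of x * P(x)
print(mu)  # 1
```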
Continuous Random Variables:
Infinite number of possible values
Probabilities are areas under a density curve (smooth) with a total area of 1
Assign probabilities to intervals, not individual values of X
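The idea that probabilities are areas can be checked numerically. This sketch assumes a hypothetical density f(x) = 2x on [0, 1] and approximates areas with the midpoint rule (the helper names are illustrative). The total area is 1, an interval gets a positive probability, and a single value gets probability 0.

```python
def density(x):
    """Hypothetical density f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def area(f, lo, hi, steps=100_000):
    """Approximate the area under f from lo to hi using the midpoint rule."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

print(round(area(density, 0, 1), 4))      # total area under the curve: 1.0
print(round(area(density, 0, 0.5), 4))    # P(0 <= X <= 0.5): 0.25
print(area(density, 0.3, 0.3))            # P(X = 0.3): an interval of width 0 has area 0.0
```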
Normal Probability Distributions:
Bell-shaped curves, indexed by their mean μ and standard deviation σ
Follows Empirical Rule
z-score: z = (x - μ)/σ, the number of standard deviations x lies above or below the mean
Empirical Rule
Using the Z table
area to the left, to the right, in between
value of x for the top 5%, bottom 20%, central 50%, etc.
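Python's `statistics.NormalDist` (3.8+) can stand in for the Z table: `cdf` gives the area to the left of a value, and `inv_cdf` works backward from an area to a value of x. The normal distribution with mean 70 and standard deviation 10 below is a hypothetical example.

```python
from statistics import NormalDist

# Hypothetical example: scores that are normal with mean 70 and standard deviation 10
mu, sigma = 70, 10
scores = NormalDist(mu, sigma)
std_normal = NormalDist()          # mean 0, standard deviation 1 (the Z table's curve)

x = 85
z = (x - mu) / sigma               # z-score: SDs above (+) or below (-) the mean
print("z =", z)
print("area to the left of 85: ", round(std_normal.cdf(z), 4))
print("area to the right of 85:", round(1 - std_normal.cdf(z), 4))
print("area between 60 and 80: ", round(scores.cdf(80) - scores.cdf(60), 4))

# Working backward from an area to a value of x
print("cutoff for the top 5%:    ", round(scores.inv_cdf(0.95), 2))
print("cutoff for the bottom 20%:", round(scores.inv_cdf(0.20), 2))
print("central 50%:", round(scores.inv_cdf(0.25), 2), "to", round(scores.inv_cdf(0.75), 2))

# Empirical Rule check: area within 1, 2, and 3 SDs of the mean
for k in (1, 2, 3):
    print(k, "SD:", round(std_normal.cdf(k) - std_normal.cdf(-k), 4))
```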
Binomial Distribution:
Each of n trials can have two possible outcomes: success or failure
Probability of success p is the same for each trial, and the trials are independent
Binomial Random Variable X counts the number of successes
Mean: μ = np and Standard Deviation: σ = √(np(1 - p))
Binomial Formula: P(x) = n! / [x!(n - x)!] · p^x (1 - p)^(n - x), for x = 0, 1, 2, ..., n
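The binomial formula, mean, and standard deviation can be cross-checked against the general definitions of a discrete distribution's mean and variance. A minimal sketch, assuming hypothetical values n = 10 and p = 0.3; `math.comb` (3.8+) supplies n! / [x!(n - x)!], and `binomial_pmf` is an illustrative name.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3   # hypothetical: 10 independent trials, success probability 0.3
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

print("P(X = 3): ", round(probs[3], 4))
print("P(X <= 2):", round(sum(probs[:3]), 4))
print("mean np =", n * p, " SD sqrt(np(1 - p)) =", round(sqrt(n * p * (1 - p)), 4))

# Cross-check: the general formulas sum of x * P(x) and sum of (x - mu)^2 * P(x)
mu = sum(x * px for x, px in enumerate(probs))
var = sum((x - mu) ** 2 * px for x, px in enumerate(probs))
print("mean from the distribution:", round(mu, 6), " SD:", round(sqrt(var), 4))
```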