Cheat Sheet

Stab22

Uploaded by Sahala

Scatterplots & Association. Scatterplot: A graphical display of two quantitative variables to show their relationship.

Types of Association: Positive Association: As one variable increases, the other also increases. Negative Association: As one variable increases, the other decreases. No Association: No discernible pattern between the two variables.

Correlation Coefficient. r is between -1 and 1. Closer to 1 means strong positive correlation; closer to -1 means strong negative correlation; 0 means no linear correlation. Properties: Symmetric: r is the same regardless of which variable is on the x-axis or y-axis. Not Affected by Shifting or Rescaling: Shifting or rescaling the variables does not affect r. Sensitive to Outliers: Outliers can drastically change the value of r.

Interpretation of Correlation Coefficient. r > 0: Positive relationship. r < 0: Negative relationship. Magnitude of r: The closer r is to ±1, the stronger the linear relationship.

Linear Regression. Purpose: Linear regression models the relationship between two
quantitative variables by fitting a straight line to the data.

Assumptions in Linear Regression. Linearity: The relationship between x and y should be linear. How to check: Look for patterns in the residual plot. A curved pattern suggests a violation of the linearity assumption. Constant Variance (Homoscedasticity): The variance of the residuals should be constant across all values of x. How to check: Look for funnel-shaped patterns in the residual plot. A funnel shape suggests heteroscedasticity. Normality of Residuals: The residuals should be normally distributed. How to check: Use Q-Q plots or histograms of the residuals. Independence of Residuals: Residuals should be independent of each other (especially in time series data). How to check: Use the Durbin-Watson test. A value close to 2 indicates no autocorrelation.

Key Regression Diagnostics. Residual Plot. Purpose: Used to check for linearity, constant variance, and independence. What to look for: Randomly scattered points around 0 suggest the model is appropriate. Problems: Curved pattern → non-linearity. Funnel shape → heteroscedasticity. Patterns over time → autocorrelation.
Q-Q Plot. Purpose: Used to check the normality of residuals. What to look for: Residuals should lie along a straight diagonal line. Deviations indicate non-normality.

Regression Diagnostics Summary. Linearity: Check the residual plot for random scatter; curves suggest non-linearity. Constant Variance: Look for a funnel shape in the residual plot to detect heteroscedasticity. Normality: Use Q-Q plots or histograms to check if the residuals follow a normal distribution. Independence: Use the Durbin-Watson test to check for autocorrelation in time series data.

Common Problems and Solutions. Multicollinearity: Issue: Independent variables are highly correlated. Solution: Remove one of the correlated variables or use Ridge regression. Heteroscedasticity: Issue: Non-constant variance in residuals. Solution: Use transformations (e.g., log), weighted least squares, or robust standard errors. Non-linearity: Issue: Curved relationships between variables. Solution: Use polynomial regression or transform variables. Outliers: Issue: Large residuals that may skew the model. Solution: Identify and remove outliers or use robust regression techniques.

R²: Note that R² reflects only the proportion of variance in the dependent variable explained by the model.

Re-expressing Data (Transformations): When the
assumptions of linear regression (such as linearity or constant variance of residuals) are violated, transforming the data can often help. Common transformations: Square (y²): Useful when the data is left-skewed. Square root (√y): Often applied to count data (such as the number of items or occurrences). Logarithmic (log(y)): Works well for right-skewed data or data that grows exponentially (such as interest rates or population growth). Inverse (1/y): Useful for ratios and rates (e.g., converting speed from km/h to h/km).

Outliers: These are points that deviate significantly from the rest of the data and can distort the regression line. High Leverage Points: These occur when an x-value is far from the mean of the other x-values. While high leverage points don't necessarily distort the regression line, they have the potential to do so if they are influential. Influential Points: These are points that have a significant effect on the slope of the regression line. Removing these points would greatly alter the fit of the line.

Linear Transformation. A linear transformation modifies a dataset by shifting or rescaling all values. Formula: Y = aX + b, where a is the scaling factor (multiplies each value) and b is the shifting factor (adds/subtracts a constant from each value). Effect of Shifting (Adding/Subtracting a Constant): Adds b to the mean and median. No effect on spread (standard deviation, IQR). Effect of Rescaling (Multiplying/Dividing by a Constant): Multiplies the mean and median by a, and the spread (standard deviation, IQR) by |a|.
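These effects can be checked directly; a minimal sketch with made-up data and the assumed values a = 2, b = 10:

```python
# Sketch: effect of the linear transformation Y = aX + b on summary
# statistics and z-scores (made-up data; a and b chosen for illustration).
from statistics import mean, stdev

x = [2, 4, 4, 6, 9]
a, b = 2, 10
y = [a * xi + b for xi in x]

# Shifting adds b to the center; rescaling multiplies center and spread by a.
print(mean(x), round(stdev(x), 3))  # mean 5, spread ~2.646
print(mean(y), round(stdev(y), 3))  # mean a*5 + b = 20, spread ~5.292

# Z-scores are unchanged: the shift and the scale factor both cancel.
zx = [(xi - mean(x)) / stdev(x) for xi in x]
zy = [(yi - mean(y)) / stdev(y) for yi in y]
print(all(abs(u - v) < 1e-9 for u, v in zip(zx, zy)))  # True
```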
Effect of Linear Transformations on Z-Scores: Shifting (adding/subtracting b): No effect on z-scores, because both the data point and the mean shift by the same amount. Rescaling (multiplying/dividing by a): No effect on z-scores, because both the data point and the standard deviation are scaled by the same factor, which cancels out.

Normal Distribution (Bell Curve). Characteristics: Symmetric and unimodal (one peak at the mean). Defined by two parameters: Mean (μ): The center of the curve. Standard deviation (σ): Controls the spread of the curve. 68-95-99.7 Rule: 68% of data falls within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations. Why Use the Normal Distribution? Many real-world datasets (heights, weights, IQ scores) follow a Normal distribution, and z-scores are especially useful for interpreting data within one.

General Density Curve: A smooth curve that describes the shape of the data distribution. Can be skewed, bimodal, or uniform. The area under the curve equals 1 (100% of the data). Normal Density Curve (Bell Curve): A specific type of density curve for normally distributed data. Symmetric and bell-shaped. Defined by mean (μ) and standard deviation (σ). When to Use Each: Normal Density Curve: When the data is symmetric, unimodal, and follows the 68-95-99.7 rule. General Density Curve: When the data is skewed, bimodal, or doesn't follow the Normal distribution.

Proportions and Z-Scores. You can
use z-scores to find what proportion of data falls above or below a specific value in a Normal distribution.

Simple Random Sampling (SRS). Definition: Every individual in the population has an equal chance of being selected. Example: Selecting 20 students randomly from a class of 575 using R (sample(1:575, 20)). Advantage: Minimizes bias; truly random selection. Disadvantage: Not always practical for large populations.

Sampling Frame. Definition: A list of individuals from which a sample is drawn. It should closely match the population to avoid bias. Example: If you use the Yellow Pages to sample businesses, but not all businesses are listed, this could introduce bias.

Stratified Sampling. Definition: The population is divided into strata (groups of similar individuals), and an SRS is taken within each stratum. Example: Dividing a student population by gender (40% male, 60% female) and ensuring the sample reflects this ratio. Advantage: Ensures all groups are represented, reducing bias. Use Case: When there are distinct subgroups in the population that might have different characteristics.

Cluster Sampling. Definition: Instead of sampling individuals directly, entire clusters (e.g., neighborhoods) are randomly selected, and then individuals within those clusters are surveyed. Advantage: Practical for large, dispersed populations. Disadvantage: May introduce bias if clusters are not representative of the whole population.

Multistage Sampling. Definition: A complex sampling method where sampling occurs in stages (e.g., first sampling cities, then neighborhoods, then households). Example: National surveys where different regions are sampled at various levels. Use Case: For large populations with hierarchical structures.

Systematic Sampling. Definition: Selects every i-th individual from a population after a random starting point. Example: Surveying every 5th customer in a store after randomly selecting a starting point. Advantage: Easy to implement when you have a list of the population.

Bias in Sampling. Undercoverage Bias.
Definition: Occurs when some groups in the population are left out or underrepresented. Example: A phone survey that only contacts landline users underrepresents younger people who primarily use mobile phones. Impact: Results may not reflect the true population characteristics.

Response Bias. Definition: Occurs when the way questions are asked influences respondents to answer in a particular way. Types of Response Bias: Leading Questions: A question like "How much do you agree that smoking is harmful?" nudges respondents toward agreement. Social Desirability Bias: People may answer in a socially acceptable way rather than truthfully (e.g., overstating recycling habits). Sensitive Topics: Respondents may underreport behaviors such as drug use due to fear of judgment. Recall Bias: Asking participants to remember past events (e.g., "How many times did you exercise last year?") can lead to inaccurate answers.

Population. Definition: The entire group you're studying (e.g., all Canadians, all students at a college). Example: All college students in a drug-use survey.

Sample. Definition: A smaller group selected from the population to represent the whole. Example: 100 students in your dorm surveyed about drug use.

Parameter. Definition: A numerical value describing a characteristic of the population. It is often unknown. Example: The true proportion of all students at a college who use drugs (e.g., 30%).

Statistic. Definition: A numerical value describing a characteristic of a sample. It is used to estimate the population parameter. Example: The proportion of students in your dorm who use drugs (e.g., 15%).

Sampling Variability. Definition: Results can vary depending on the sample selected from the population. Key Point: Larger samples tend to have less variability and provide more reliable estimates of the population parameter.

Sample Size. Effect on Accuracy: A larger sample size reduces sampling variability and improves the accuracy of estimates. Note: The sample size, not the population size, is the key factor in reducing variability, except when the sample is more than 10% of the population.

Voluntary Response Bias. Definition:
Only individuals with strong opinions (often negative) are likely to respond, skewing the results. Example: A survey asking people to rate their dissatisfaction with a product might only attract angry customers, leading to a biased conclusion.

Convenience Sampling. Definition: Including individuals who are easy to reach rather than selecting a representative sample. Example: Surveying students who are nearby in a common area rather than randomly selecting from the entire student body.

Undercoverage. Definition: Failing to sample certain groups within the population adequately. Example: Missing rural voters in a political survey conducted primarily in urban areas.

Experimental Designs. Completely Randomized Design (CRD): All experimental units are randomly assigned to treatments. Randomized Block Design (RBD): Random assignment is done within blocks of similar subjects.

Blocking. Blocking is used to ensure that groups with specific characteristics (e.g., age) are not unevenly distributed across treatment groups, which could bias the results.

Experiments. Placebo: A fake treatment used to prevent knowledge of the treatment from affecting the response. The control group may receive a placebo or the standard treatment. Blinding: Single-blind: Either the subjects or the evaluators don't know the treatment assignment. Double-blind: Both subjects and evaluators are unaware of treatment assignments, to avoid bias.

Confounding Variable: A variable that affects both the factor and the response, making it difficult to tell the true cause of the response.

Observational Studies vs. Experiments. Observational Studies: No control over the subjects' behavior; simply observing and measuring variables. Experiments: Different treatments are imposed to compare effects.

Data Collection Methods. Sample Surveys: Directly ask a sample for information. Observational Studies: Observe and record data without manipulating variables. Retrospective: Look at past data. Prospective: Collect data as events happen. Experiments: Assign different treatments to measure causal effects.

Principles of Experimental Design. Control: Make conditions similar for all groups except for the treatment. Randomize: Distribute unknown effects evenly across groups. Replicate: Take multiple measurements to ensure results aren't due to chance. Blocking: Group subjects by known factors to control variability.

Blinding in Experiments. Placebo-Controlled: Ensures the placebo effect doesn't interfere with results. Double-Blind: Both researchers and subjects are blinded to treatment assignments to prevent bias.
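The random assignment at the heart of a completely randomized design can be sketched as follows (hypothetical subject IDs and group sizes):

```python
# Sketch of a completely randomized design: randomly split 20 hypothetical
# subjects into two equal groups (treatment vs. control).
import random

random.seed(1)                  # fixed seed only for a reproducible illustration
subjects = list(range(1, 21))   # subject IDs 1..20
random.shuffle(subjects)        # random order spreads unknown effects evenly

treatment = subjects[:10]       # first half receives the treatment
control = subjects[10:]         # second half receives the placebo/control

print(sorted(treatment))
print(sorted(control))
```

For a randomized block design, the same shuffle-and-split step would instead be applied separately within each block of similar subjects.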
