EDA (EXPLORATORY DATA ANALYSIS)
1- variable identification
2- univariate analysis
3- bivariate analysis -> correlation
4- missing value treatment (see the sketch after this list) ->
Deletion
Mean/median/mode Imputation
Prediction model
KNN Imputation
5- outlier treatment
6- variable transformation
7- variable creation
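A minimal sketch of steps 4 and 5 above (missing value & outlier treatment), assuming pandas and scikit-learn are available; the column names and values are made up for illustration only.

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer, KNNImputer

df = pd.DataFrame({"age": [22, 25, np.nan, 40, 120],      # 120 acts as an outlier
                   "salary": [30, 35, 32, np.nan, 38]})

# mean imputation (median/mode imputation via strategy="median"/"most_frequent")
df[["salary"]] = SimpleImputer(strategy="mean").fit_transform(df[["salary"]])

# KNN imputation: fills a missing value from its k nearest rows
df[["age", "salary"]] = KNNImputer(n_neighbors=2).fit_transform(df[["age", "salary"]])

# outlier treatment with the IQR rule: cap values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
df["age"] = df["age"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
print(df)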
RAW DATA & CLEAN DATA
- REGRESSION & CLASSIFICATION (ML)
regression --> if the dependent variable is continuous in nature.
regression algorithms or regression models (a minimal example follows this list) -->
1- simple linear regression
2- multiple linear regression
3- gradient descent || sgd || bgd
4- polynomial regression
5- support vector regression
6- decision tree regression
7- random forest regression -> regularization techniques - L1 (Lasso) & L2 (Ridge) regression
8- time series
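A minimal sketch of simple linear regression (model 1 above), assuming scikit-learn is available; the data here is synthetic and used for illustration only.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))                # one independent variable
y = 3.0 * X.ravel() + 5.0 + rng.normal(0, 1, 100)    # continuous dependent variable

model = LinearRegression().fit(X, y)
print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("R^2:", model.score(X, y))   # explanatory power, defined later in these notes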
classification --> if the dependent variable is categorical (e.g., binary) in nature.
classification algorithms (a minimal example follows this list) -
1- logistic regression
2- support vector machine
3- knn
4- naive bayes
5- decision tree
6- random forest
7- ada boost | catboost
8- xgboost
9- lgbm
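A minimal classification sketch using logistic regression (algorithm 1 above), assuming scikit-learn is available; it uses scikit-learn's built-in breast cancer dataset, which has a binary dependent variable.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)          # binary target (0/1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))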
1- MEASURE OF CENTRAL TENDENCY
Mean-> the sum of all values in the given data/population divided by the
total number of values in the given data/population; it is preferred for numerical data
Median-> the middle value after putting the observations in
ascending order
Mode-> the most commonly observed value in a set of data; it is preferred for categorical data
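A quick illustration of the three measures using Python's built-in statistics module (no external libraries assumed):

import statistics

numeric = [2, 3, 3, 5, 7, 10]
print("mean:", statistics.mean(numeric))      # (2+3+3+5+7+10)/6 = 5
print("median:", statistics.median(numeric))  # middle of the sorted values -> 4
categorical = ["red", "blue", "red", "green"]
print("mode:", statistics.mode(categorical))  # most common category -> "red"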
2- MEASURE OF ASYMMETRY
Skewness-> measure of symmetry, or more precisely, the lack of symmetry;
every time we need to consider only 0 skewness (AKA normal distribution || gaussian || symmetrical)
+ve skewness --> (mean > median & mode) == data stays at the left and outliers are at the right
0 skewness --> (mean = median = mode) == data stays at the center & no outliers
-ve skewness --> (mode > median & mean) == data stays at the right & outliers are at the left
Kurtosis-> measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution;
every time we need to consider only mesokurtic
leptokurtic == +ve kurtosis == heavy tails / sharp peak
platykurtic == -ve kurtosis == light tails / flat peak
mesokurtic == normal distribution == mean = median = mode
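A small sketch of both measures of asymmetry, assuming scipy is available; scipy reports excess kurtosis by default (0 for a normal distribution).

import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(1)
normal_data = rng.normal(size=10_000)          # ~0 skewness, mesokurtic
right_skewed = rng.exponential(size=10_000)    # +ve skewness (tail/outliers on the right)

print("normal -> skew:", skew(normal_data), "kurtosis:", kurtosis(normal_data))
print("skewed -> skew:", skew(right_skewed), "kurtosis:", kurtosis(right_skewed))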
3- MEASURE OF VARIABILITY
Sample equations are considered every time
4- MEASURE OF RELATIONSHIP
Covariance
Correlation
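A minimal covariance vs. correlation sketch with numpy (assumed available); the numbers are made up.

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 6], dtype=float)

print("covariance:", np.cov(x, y)[0, 1])        # sign shows direction, scale-dependent
print("correlation:", np.corrcoef(x, y)[0, 1])  # standardized to the range [-1, 1]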
from normal distribution to standard normal distribution
z-score ==> converts the mean to 0 & the standard deviation to 1; z = (x - mean) / standard deviation (the z statistic for a sample mean divides by the standard error instead)
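A short z-score sketch with numpy (scipy.stats.zscore gives the same result); the data is made up.

import numpy as np

data = np.array([4.0, 8.0, 6.0, 5.0, 7.0])
z = (data - data.mean()) / data.std()     # z = (x - mean) / standard deviation
print(z, z.mean().round(10), z.std())     # standardized values, mean ~0, std = 1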
Type-I Error-> Reject a true null hypothesis
Type-II Error-> Accept a false null hypothesis
α = 1 - confidence level (e.g., 95% confidence --> α = 0.05)
p-value - determined from the standard normal distribution table using the z-value
Rule: you should reject the null hypothesis if p-value < α
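A sketch of this decision rule, assuming scipy is available; the z value (2.1) and α are made-up example numbers.

from scipy.stats import norm

z_value = 2.1
alpha = 0.05                           # 95% confidence level -> alpha = 1 - 0.95
p_value = 2 * norm.sf(abs(z_value))    # two-tailed p-value from the z statistic
print("p-value:", p_value, "-> reject H0" if p_value < alpha else "-> fail to reject H0")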
Z-Test -> is for population; used when the population variance is known
t-Test -> is for sample; used when the population variance is unknown
Confidence interval -> margin of error -> value obtained from the Z-Test/t-Test
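A minimal one-sample t-test sketch (population variance unknown), assuming scipy is available; the sample values and the hypothesised mean of 50 are made up.

from scipy.stats import ttest_1samp

sample = [52, 49, 55, 51, 53, 48, 54, 56]
t_stat, p_value = ttest_1samp(sample, popmean=50)
print("t:", t_stat, "p:", p_value)     # reject H0 (mean = 50) if p < alpha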
Central Limit Theorem-> no matter what the distribution of the entire dataset is, the mean of the samples you take approximates a normal distribution; when the sample size increases the standard error decreases & a bigger sample gives a better approximation
Standard Error-> standard deviation of the distribution formed by sample means
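A quick simulation of the Central Limit Theorem with numpy (assumed available): sample means are drawn from a clearly non-normal population, and their spread (the standard error) shrinks as the sample size grows.

import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(size=100_000)     # clearly non-normal population

for n in (5, 50, 500):
    sample_means = rng.choice(population, size=(2_000, n)).mean(axis=1)
    print(f"n={n:4d}  std of sample means (standard error) = {sample_means.std():.4f}")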
REGRESSION ANALYSIS
A strong correlation with the dependent variable is recommended when selecting relevant variables.
Sum of Squares Total (SST)
Sum of Squares Regression (SSR)
Sum of Squares Error (SSE)
SST = SSR + SSE
R Square-> R^2 = SSR / SST, range (0, 1)
least squares method (OLS)-> minimises SSE -> lower error means better explanatory power; this method aims to find the line which minimises the sum of squared errors
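A sketch tying OLS, SST/SSR/SSE and R^2 together with numpy (assumed available); the data is synthetic and np.polyfit is used here as the least-squares solver.

import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 2, 50)

slope, intercept = np.polyfit(x, y, deg=1)   # OLS: minimises the sum of squared errors
y_hat = slope * x + intercept

sst = np.sum((y - y.mean()) ** 2)            # total variation
ssr = np.sum((y_hat - y.mean()) ** 2)        # variation explained by the regression
sse = np.sum((y - y_hat) ** 2)               # unexplained (error) variation
print("SST ~ SSR + SSE:", sst, ssr + sse)
print("R^2 = SSR/SST =", ssr / sst)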