
UNIT II Part-2

Parametric Methods

Code: U18CST7002
Presented by: Nivetha R
Department: CSE
Multivariate Data
• When the data involves three or more variables, it is categorized as multivariate data.
• d inputs/features/attributes: d-variate
• N instances/observations/examples
• Each feature may be measured in different units
Multivariate Data
• Simplification – summarizing a large body of data by means of relatively few parameters (feature selection).
• Exploration – obtaining hypotheses about the data.
• Prediction – predicting the value of one variable from the values of the other variables:
  • multivariate classification – if the predicted variable is discrete
  • multivariate regression – if the predicted variable is numeric
Parameter Estimation
• For example, in deciding on a loan application, an observation vector is the information associated with a customer: it is composed of age, marital status, yearly income, and so forth, and we have N such past customers.
• These measurements may be on different scales, for example, age in years and yearly income in monetary units. Some, like age, may be numeric, and some, like marital status, may be discrete.
• Typically these variables are correlated. If they are not, there is no need for a multivariate analysis.
• Our aim may be simplification, that is, summarizing this large body of data by means of relatively few parameters.
Parameter Estimation Data

Mean: E[x] = μ = [μ1, …, μd]T

Covariance matrix: Σ = Cov(x) = E[(x – μ)(x – μ)T], with entries σij = Cov(xi, xj); the diagonal entries are the variances, and the correlation between xi and xj is ρij = σij / (σi σj).

If two variables are independent, their covariance, and hence their correlation, is 0.
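Given a data matrix X of N instances and d features, the standard sample estimators are the mean vector m (the per-column averages), the sample covariance matrix S, and the sample correlation matrix R. A minimal sketch with NumPy, using small made-up data for illustration:

    import numpy as np

    # Toy data: N = 5 customers, d = 3 numeric features (age, income, years employed)
    X = np.array([[25, 30000, 2],
                  [40, 52000, 10],
                  [35, 46000, 8],
                  [50, 61000, 20],
                  [29, 38000, 4]], dtype=float)

    m = X.mean(axis=0)                 # sample mean vector, shape (d,)
    S = np.cov(X, rowvar=False)        # sample covariance matrix, shape (d, d)
    R = np.corrcoef(X, rowvar=False)   # sample correlation matrix, shape (d, d)

    print("mean:", m)
    print("covariance:\n", S)
    print("correlation:\n", R)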
Estimation of Missing Values
• What to do if certain instances have missing attributes?
• Ignore those instances:
  • a good idea if the sample is large
  • not a good idea if the sample is small
• Use 'missing' as an attribute value: it may carry information.
• Estimate the missing data – imputation.
• Imputation: fill in the missing entries by estimating them.
• Mean imputation (sketched below):
  • the mean for a numeric attribute
  • the most likely (most frequent) value for a discrete attribute
• Imputation by regression:
  • predict the missing value from the other attributes
  • a separate classification or regression problem
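A minimal sketch of mean imputation with pandas, assuming a hypothetical DataFrame with a numeric column "income" and a discrete column "marital_status"; numeric gaps are filled with the column mean, discrete gaps with the most frequent value:

    import numpy as np
    import pandas as pd

    # Hypothetical data with missing entries
    df = pd.DataFrame({
        "income": [30000, np.nan, 46000, 61000, np.nan],
        "marital_status": ["single", "married", None, "married", "single"],
    })

    # Mean imputation: column mean for a numeric attribute
    df["income"] = df["income"].fillna(df["income"].mean())

    # Most likely value (mode) for a discrete attribute
    df["marital_status"] = df["marital_status"].fillna(df["marital_status"].mode()[0])

    print(df)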
Estimation of Missing Values
• In imputation by regression, we try to predict the value of a missing variable from the other variables whose values are known for that case. Depending on the type of the missing variable, we define a separate regression or classification problem that we train on the data points for which such values are known.
• If many different variables are missing, we take the means as the initial estimates and iterate the procedure until the predicted values stabilize.
• If the variables are not highly correlated, the regression approach is equivalent to mean imputation.
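Imputation by regression can be done by hand (fit one regressor per incomplete column on the complete cases), or with scikit-learn's IterativeImputer, which starts from simple initial estimates and iterates regressions until the imputed values stabilize, matching the procedure described above. A minimal sketch, assuming scikit-learn is available and using made-up numeric data:

    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates the feature)
    from sklearn.impute import IterativeImputer

    # Toy numeric data with missing values (np.nan)
    X = np.array([[25, 30000, 2],
                  [40, np.nan, 10],
                  [35, 46000, np.nan],
                  [50, 61000, 20],
                  [np.nan, 38000, 4]])

    # Each feature with missing values is regressed on the others;
    # the process repeats until the imputed values stabilize.
    imputer = IterativeImputer(initial_strategy="mean", max_iter=10, random_state=0)
    X_filled = imputer.fit_transform(X)
    print(X_filled)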
Multivariate Normal Distribution
• A multivariate normal distribution describes a vector of multiple normally distributed variables, such that any linear combination of the variables is also normally distributed.
• It is mostly useful in extending the central limit theorem to multiple variables, but it also has applications in Bayesian inference and thus machine learning, where the multivariate normal distribution is used to model real-valued feature vectors; for instance, in detecting faces in pictures.
Multivariate Normal Distribution

x ∼ Nd(μ, Σ)

Density: p(x) = (2π)^(–d/2) |Σ|^(–1/2) exp[ –(1/2) (x – μ)T Σ–1 (x – μ) ]
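The density above can be evaluated, and samples drawn, with SciPy's multivariate_normal; a minimal sketch for a bivariate case with illustrative values of μ and Σ:

    import numpy as np
    from scipy.stats import multivariate_normal

    mu = np.array([0.0, 0.0])                 # mean vector
    Sigma = np.array([[1.0, 0.6],             # covariance matrix with
                      [0.6, 2.0]])            # correlated components

    rv = multivariate_normal(mean=mu, cov=Sigma)

    x = np.array([1.0, -0.5])
    print("density p(x):", rv.pdf(x))         # value of the N_d(mu, Sigma) density at x

    samples = rv.rvs(size=1000, random_state=0)  # draw 1000 samples
    print("sample mean:", samples.mean(axis=0))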
Multivariate Normal Distribution

Mahalanobis distance: (x – μ)T Σ–1 (x – μ)

• It measures the distance from x to μ in terms of Σ (it normalizes for differences in variances and for correlations).

Bivariate case (d = 2):

Σ = [ σ1²      ρσ1σ2 ]
    [ ρσ1σ2    σ2²   ]
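The squared Mahalanobis distance is the quadratic form above; a minimal sketch computing it directly with NumPy for the bivariate Σ shown, using illustrative values σ1 = 1, σ2 = 2, ρ = 0.5:

    import numpy as np

    sigma1, sigma2, rho = 1.0, 2.0, 0.5        # illustrative parameters
    mu = np.array([0.0, 0.0])
    Sigma = np.array([[sigma1**2,             rho * sigma1 * sigma2],
                      [rho * sigma1 * sigma2, sigma2**2]])

    x = np.array([1.0, 1.0])
    diff = x - mu
    # Squared Mahalanobis distance: (x - mu)^T Sigma^{-1} (x - mu)
    d2 = diff @ np.linalg.solve(Sigma, diff)
    print("squared Mahalanobis distance:", d2)

    # Compare with the squared Euclidean distance, which ignores variances and correlation
    print("squared Euclidean distance:", diff @ diff)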
Multivariate classification
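One standard parametric approach to multivariate classification, consistent with the multivariate normal material above, models each class Ci with a multivariate normal density p(x | Ci) = Nd(μi, Σi), estimates μi and Σi from that class's training samples, and assigns a new x to the class with the largest log-likelihood plus log prior. A minimal sketch with NumPy and SciPy on made-up two-dimensional data:

    import numpy as np
    from scipy.stats import multivariate_normal

    # Toy training data: two classes in two dimensions
    X0 = np.array([[1.0, 1.2], [1.5, 0.8], [0.9, 1.0], [1.2, 1.4]])   # class 0
    X1 = np.array([[3.0, 3.2], [3.5, 2.8], [2.9, 3.0], [3.2, 3.6]])   # class 1

    # Estimate per-class mean, covariance, and prior
    classes = []
    N = len(X0) + len(X1)
    for Xc in (X0, X1):
        mu = Xc.mean(axis=0)
        Sigma = np.cov(Xc, rowvar=False)
        prior = len(Xc) / N
        classes.append((mu, Sigma, prior))

    def classify(x):
        # Discriminant: log p(x | Ci) + log P(Ci); pick the largest
        scores = [multivariate_normal(mu, Sigma).logpdf(x) + np.log(prior)
                  for mu, Sigma, prior in classes]
        return int(np.argmax(scores))

    print(classify(np.array([1.1, 1.1])))   # expected: 0
    print(classify(np.array([3.1, 3.1])))   # expected: 1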
Multivariate Regression
Multivariate regression is a method used to measure the degree to which more than one independent variable (the predictors) and more than one dependent variable (the responses) are linearly related. The method is broadly used to predict the behavior of the response variables associated with changes in the predictor variables, once a desired degree of relation has been established.

Exploratory question: Can a supermarket owner maintain stock of water, ice cream, frozen foods, canned foods and meat as a function of temperature, tornado chance and gas price during tornado season in June?
Multivariate Regression
From this question, several obvious assumptions can be drawn: if it is too hot, ice cream sales increase; if a tornado hits, water and canned food sales increase while ice cream, frozen foods and meat decrease; if gas prices increase, prices on all goods increase. A mathematical model based on multivariate regression analysis will address this and other more complicated questions.
Simple Regression
The simple regression model relates one predictor and one response.

Let the n observations be (x1, y1), (x2, y2), …, (xn, yn), pairs of predictors and responses, and let the errors ϵi ∼ N(0, σ²) be i.i.d. (independent and identically distributed). For fixed real numbers β0 and β1 (the parameters), the model is as follows:

yi = β0 + β1 xi + ϵi

The fitted model (fitted to the given data) is as follows:

ŷi = β̂0 + β̂1 xi

The estimated parameters are β̂1 = Σ(xi – x̄)(yi – ȳ) / Σ(xi – x̄)² and β̂0 = ȳ – β̂1 x̄, where x̄ and ȳ are the sample averages.
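A minimal sketch of the closed-form estimates β̂1 and β̂0 with NumPy, using small made-up (xi, yi) pairs:

    import numpy as np

    # Toy data: n pairs (x_i, y_i)
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

    x_bar, y_bar = x.mean(), y.mean()

    # beta1_hat = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
    beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # beta0_hat = y_bar - beta1_hat * x_bar
    beta0 = y_bar - beta1 * x_bar

    y_hat = beta0 + beta1 * x   # fitted values
    print("beta0:", beta0, "beta1:", beta1)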
Multiple Regression
The multiple regression model relates more than one predictor and one response.

Let Y be the n×1 response vector, and let X be an n×(q+1) matrix whose first column is all 1's and whose remaining columns hold the q predictors. Let ϵ be an n×1 vector such that the ϵi ∼ N(0, σ²) are i.i.d., and let β be a (q+1)×1 vector of fixed parameters. The model is as follows:

Y = Xβ + ϵ
Multivariate Regression
The multivariate regression model relates more than one predictor and more than one response.

Let Y be the n×p response matrix, and let X be an n×(q+1) matrix whose first column is all 1's and whose remaining columns hold the q predictors. Let B be a (q+1)×p matrix of fixed parameters, and let Ξ be an n×p error matrix whose rows are multivariate normally distributed with mean 0 and covariance matrix Σ. The model is as follows:

Y = XB + Ξ
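With these definitions, the standard least-squares estimate of the parameter matrix is B̂ = (XT X)–1 XT Y, computed column by column exactly as in multiple regression. A minimal sketch with NumPy on made-up data (n = 6, q = 2 predictors, p = 2 responses):

    import numpy as np

    # Toy data: n = 6 observations, q = 2 predictors, p = 2 responses
    Xraw = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
                     [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
    Y = np.array([[3.1, 1.0], [3.0, 2.9], [7.2, 2.1],
                  [6.9, 5.0], [11.1, 3.2], [10.8, 7.1]])

    n = Xraw.shape[0]
    X = np.hstack([np.ones((n, 1)), Xraw])     # prepend the column of 1's

    # Least-squares estimate B_hat = (X^T X)^{-1} X^T Y, one column per response
    B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    print("B_hat (shape (q+1) x p):\n", B_hat)

    Y_hat = X @ B_hat                          # fitted responses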
Multivariate Linear Regression
In multivariate linear regression, the numeric output r is assumed to be written as a linear function, that is, a weighted sum, of several input variables x1, . . . , xd, and noise. Actually, in the statistical literature this is called multiple regression; statisticians use the term multivariate when there are multiple outputs. The multivariate linear model is

r = w0 + w1 x1 + w2 x2 + · · · + wd xd + ϵ
Multivariate Linear Regression

• Unless the number of inputs is small, in multivariate regression we rarely use polynomials of an order higher than linear.
