
UNIT II Part-2

Parametric Methods

Code: U18CST7002
Presented by: Nivetha R
Department: CSE
Multivariate Data
• When the data involves three or more variables, it is categorized as multivariate data.
• d inputs/features/attributes: d-variate
• N instances/observations/examples
• Each feature may be measured in different units
Multivariate Data
• Simplification – summarizing a large body of data by means of relatively few parameters (feature selection).
• Exploration – obtaining hypotheses about the data.
• Prediction – predicting the value of one variable from the values of the other variables:
  • multivariate classification – if the predicted variable is discrete
  • multivariate regression – if the predicted variable is numeric
Parameter Estimation
• For example, in deciding on a loan application, an observation vector is the information associated with a customer: it is composed of age, marital status, yearly income, and so forth, and we have N such past customers.
• These measurements may be on different scales, for example, age in years and yearly income in monetary units. Some, like age, may be numeric, and some, like marital status, may be discrete.
• Typically these variables are correlated. If they are not, there is no need for a multivariate analysis.
• Our aim may be simplification, that is, summarizing this large body of data by means of relatively few parameters.
Parameter Estimation Data

Mean: E[x] = μ = [μ1, …, μd]T

Covariance matrix: Σ = Cov(x) = E[(x – μ)(x – μ)T], with entries σij = Cov(xi, xj); the diagonal entries are the variances, and the correlation between xi and xj is ρij = σij / (σi σj).

If two variables are independent, their covariance, and hence their correlation, is 0.
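Given a data matrix X of N instances and d features, the standard sample estimators are the mean vector m (the per-column averages), the sample covariance matrix S, and the sample correlation matrix R. A minimal sketch with NumPy, using small made-up data for illustration:

    import numpy as np

    # Toy data: N = 5 customers, d = 3 numeric features (age, income, years employed)
    X = np.array([[25, 30000, 2],
                  [40, 52000, 10],
                  [35, 46000, 8],
                  [50, 61000, 20],
                  [29, 38000, 4]], dtype=float)

    m = X.mean(axis=0)                 # sample mean vector, shape (d,)
    S = np.cov(X, rowvar=False)        # sample covariance matrix, shape (d, d)
    R = np.corrcoef(X, rowvar=False)   # sample correlation matrix, shape (d, d)

    print("mean:", m)
    print("covariance:\n", S)
    print("correlation:\n", R)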
Estimation of Missing Values
• What to do if certain instances have missing attributes?
• Ignore those instances:
  • a good idea if the sample is large
  • not a good idea if the sample is small
• Use 'missing' as an attribute value: it may carry information.
• Estimate the missing data – imputation.
• Imputation: fill in the missing entries by estimating them.
• Mean imputation (sketched below):
  • the mean for a numeric attribute
  • the most likely (most frequent) value for a discrete attribute
• Imputation by regression:
  • predict the missing value from the other attributes
  • a separate classification or regression problem
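A minimal sketch of mean imputation with pandas, assuming a hypothetical DataFrame with a numeric column "income" and a discrete column "marital_status"; numeric gaps are filled with the column mean, discrete gaps with the most frequent value:

    import numpy as np
    import pandas as pd

    # Hypothetical data with missing entries
    df = pd.DataFrame({
        "income": [30000, np.nan, 46000, 61000, np.nan],
        "marital_status": ["single", "married", None, "married", "single"],
    })

    # Mean imputation: column mean for a numeric attribute
    df["income"] = df["income"].fillna(df["income"].mean())

    # Most likely value (mode) for a discrete attribute
    df["marital_status"] = df["marital_status"].fillna(df["marital_status"].mode()[0])

    print(df)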
Estimation of Missing Values
• In imputation by regression, we try to predict the value of a missing variable from the other variables whose values are known for that case. Depending on the type of the missing variable, we define a separate regression or classification problem that we train on the data points for which such values are known.
• If many different variables are missing, we take the means as the initial estimates and iterate the procedure until the predicted values stabilize.
• If the variables are not highly correlated, the regression approach is equivalent to mean imputation.
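Imputation by regression can be done by hand (fit one regressor per incomplete column on the complete cases), or with scikit-learn's IterativeImputer, which starts from simple initial estimates and iterates regressions until the imputed values stabilize, matching the procedure described above. A minimal sketch, assuming scikit-learn is available and using made-up numeric data:

    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates the feature)
    from sklearn.impute import IterativeImputer

    # Toy numeric data with missing values (np.nan)
    X = np.array([[25, 30000, 2],
                  [40, np.nan, 10],
                  [35, 46000, np.nan],
                  [50, 61000, 20],
                  [np.nan, 38000, 4]])

    # Each feature with missing values is regressed on the others;
    # the process repeats until the imputed values stabilize.
    imputer = IterativeImputer(initial_strategy="mean", max_iter=10, random_state=0)
    X_filled = imputer.fit_transform(X)
    print(X_filled)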
Multivariate Normal Distribution
• A multivariate normal distribution describes a vector of multiple normally distributed variables, such that any linear combination of the variables is also normally distributed.
• It is mostly useful in extending the central limit theorem to multiple variables, but it also has applications in Bayesian inference and thus machine learning, where the multivariate normal distribution is used to model real-valued feature vectors; for instance, in detecting faces in pictures.
Multivariate Normal Distribution

x ∼ Nd(μ, Σ)

Density: p(x) = (2π)^(–d/2) |Σ|^(–1/2) exp[ –(1/2) (x – μ)T Σ–1 (x – μ) ]
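The density above can be evaluated, and samples drawn, with SciPy's multivariate_normal; a minimal sketch for a bivariate case with illustrative values of μ and Σ:

    import numpy as np
    from scipy.stats import multivariate_normal

    mu = np.array([0.0, 0.0])                 # mean vector
    Sigma = np.array([[1.0, 0.6],             # covariance matrix with
                      [0.6, 2.0]])            # correlated components

    rv = multivariate_normal(mean=mu, cov=Sigma)

    x = np.array([1.0, -0.5])
    print("density p(x):", rv.pdf(x))         # value of the N_d(mu, Sigma) density at x

    samples = rv.rvs(size=1000, random_state=0)  # draw 1000 samples
    print("sample mean:", samples.mean(axis=0))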
Multivariate Normal Distribution

Mahalanobis distance: (x – μ)T Σ–1 (x – μ)

• It measures the distance from x to μ in terms of Σ (it normalizes for differences in variances and for correlations).

Bivariate case (d = 2):

Σ = [ σ1²      ρσ1σ2 ]
    [ ρσ1σ2    σ2²   ]
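The squared Mahalanobis distance is the quadratic form above; a minimal sketch computing it directly with NumPy for the bivariate Σ shown, using illustrative values σ1 = 1, σ2 = 2, ρ = 0.5:

    import numpy as np

    sigma1, sigma2, rho = 1.0, 2.0, 0.5        # illustrative parameters
    mu = np.array([0.0, 0.0])
    Sigma = np.array([[sigma1**2,             rho * sigma1 * sigma2],
                      [rho * sigma1 * sigma2, sigma2**2]])

    x = np.array([1.0, 1.0])
    diff = x - mu
    # Squared Mahalanobis distance: (x - mu)^T Sigma^{-1} (x - mu)
    d2 = diff @ np.linalg.solve(Sigma, diff)
    print("squared Mahalanobis distance:", d2)

    # Compare with the squared Euclidean distance, which ignores variances and correlation
    print("squared Euclidean distance:", diff @ diff)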
Multivariate classification
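One standard parametric approach to multivariate classification, consistent with the multivariate normal material above, models each class Ci with a multivariate normal density p(x | Ci) = Nd(μi, Σi), estimates μi and Σi from that class's training samples, and assigns a new x to the class with the largest log-likelihood plus log prior. A minimal sketch with NumPy and SciPy on made-up two-dimensional data:

    import numpy as np
    from scipy.stats import multivariate_normal

    # Toy training data: two classes in two dimensions
    X0 = np.array([[1.0, 1.2], [1.5, 0.8], [0.9, 1.0], [1.2, 1.4]])   # class 0
    X1 = np.array([[3.0, 3.2], [3.5, 2.8], [2.9, 3.0], [3.2, 3.6]])   # class 1

    # Estimate per-class mean, covariance, and prior
    classes = []
    N = len(X0) + len(X1)
    for Xc in (X0, X1):
        mu = Xc.mean(axis=0)
        Sigma = np.cov(Xc, rowvar=False)
        prior = len(Xc) / N
        classes.append((mu, Sigma, prior))

    def classify(x):
        # Discriminant: log p(x | Ci) + log P(Ci); pick the largest
        scores = [multivariate_normal(mu, Sigma).logpdf(x) + np.log(prior)
                  for mu, Sigma, prior in classes]
        return int(np.argmax(scores))

    print(classify(np.array([1.1, 1.1])))   # expected: 0
    print(classify(np.array([3.1, 3.1])))   # expected: 1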
Multivariate Regression
Multivariate regression is a method used to measure the degree to which more than one independent variable (the predictors) and more than one dependent variable (the responses) are linearly related. The method is broadly used to predict the behavior of the response variables associated with changes in the predictor variables, once a desired degree of relation has been established.

Exploratory question: Can a supermarket owner maintain stock of water, ice cream, frozen foods, canned foods and meat as a function of temperature, tornado chance and gas price during tornado season in June?
Multivariate Regression
From this question, several obvious assumptions can be drawn: if it is too hot, ice cream sales increase; if a tornado hits, water and canned food sales increase while ice cream, frozen foods and meat decrease; if gas prices increase, prices on all goods increase. A mathematical model based on multivariate regression analysis will address this and other more complicated questions.
Simple Regression
The simple regression model relates one predictor and one response.

Let the n observations be (x1, y1), (x2, y2), …, (xn, yn), pairs of predictors and responses, and let the errors ϵi ∼ N(0, σ²) be i.i.d. (independent and identically distributed). For fixed real numbers β0 and β1 (the parameters), the model is as follows:

yi = β0 + β1 xi + ϵi

The fitted model (fitted to the given data) is as follows:

ŷi = β̂0 + β̂1 xi

The estimated parameters are β̂1 = Σ(xi – x̄)(yi – ȳ) / Σ(xi – x̄)² and β̂0 = ȳ – β̂1 x̄, where x̄ and ȳ are the sample averages.
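A minimal sketch of the closed-form estimates β̂1 and β̂0 with NumPy, using small made-up (xi, yi) pairs:

    import numpy as np

    # Toy data: n pairs (x_i, y_i)
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

    x_bar, y_bar = x.mean(), y.mean()

    # beta1_hat = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
    beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # beta0_hat = y_bar - beta1_hat * x_bar
    beta0 = y_bar - beta1 * x_bar

    y_hat = beta0 + beta1 * x   # fitted values
    print("beta0:", beta0, "beta1:", beta1)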
Multiple Regression
The multiple regression model relates more than one predictor and one response.

Let Y be the n×1 response vector, and let X be an n×(q+1) matrix whose first column is all 1's and whose remaining columns hold the q predictors. Let ϵ be an n×1 vector such that the ϵi ∼ N(0, σ²) are i.i.d., and let β be a (q+1)×1 vector of fixed parameters. The model is as follows:

Y = Xβ + ϵ
Multivariate Regression
The multivariate regression model relates more than one predictor and more than one response.

Let Y be the n×p response matrix, and let X be an n×(q+1) matrix whose first column is all 1's and whose remaining columns hold the q predictors. Let B be a (q+1)×p matrix of fixed parameters, and let Ξ be an n×p error matrix whose rows are multivariate normally distributed with mean 0 and covariance matrix Σ. The model is as follows:

Y = XB + Ξ
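With these definitions, the standard least-squares estimate of the parameter matrix is B̂ = (XT X)–1 XT Y, computed column by column exactly as in multiple regression. A minimal sketch with NumPy on made-up data (n = 6, q = 2 predictors, p = 2 responses):

    import numpy as np

    # Toy data: n = 6 observations, q = 2 predictors, p = 2 responses
    Xraw = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
                     [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
    Y = np.array([[3.1, 1.0], [3.0, 2.9], [7.2, 2.1],
                  [6.9, 5.0], [11.1, 3.2], [10.8, 7.1]])

    n = Xraw.shape[0]
    X = np.hstack([np.ones((n, 1)), Xraw])     # prepend the column of 1's

    # Least-squares estimate B_hat = (X^T X)^{-1} X^T Y, one column per response
    B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    print("B_hat (shape (q+1) x p):\n", B_hat)

    Y_hat = X @ B_hat                          # fitted responses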
Multivariate Linear Regression
In multivariate linear regression, the numeric output r is assumed to be written as a linear function, that is, a weighted sum, of several input variables x1, . . . , xd, and noise. Actually, in the statistical literature this is called multiple regression; statisticians use the term multivariate when there are multiple outputs. The multivariate linear model is

r = w0 + w1 x1 + w2 x2 + · · · + wd xd + ϵ
Multivariate Linear Regression

• Unless the number of inputs is small, in multivariate regression we rarely use polynomials of an order higher than linear.
