0% found this document useful (0 votes)
76 views2 pages

Problem.:, I, I, I, I, I, I

The document outlines the process for performing Principal Component Analysis (PCA) using both eigendecomposition of the correlation matrix and singular value decomposition (SVD) on a numeric matrix without missing values. It details the input requirements, output expectations, and algorithm steps for both methods, including data normalization, matrix computation, and checks for validity. Additionally, it emphasizes the importance of testing the implementation against built-in functions and ensuring correctness through various checks.

Uploaded by

Shaikh Firdous
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
76 views2 pages

Problem.:, I, I, I, I, I, I

The document outlines the process for performing Principal Component Analysis (PCA) using both eigendecomposition of the correlation matrix and singular value decomposition (SVD) on a numeric matrix without missing values. It details the input requirements, output expectations, and algorithm steps for both methods, including data normalization, matrix computation, and checks for validity. Additionally, it emphasizes the importance of testing the implementation against built-in functions and ensuring correctness through various checks.

Uploaded by

Shaikh Firdous
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 2

Problem. Write from-scratch code to perform principal component analysis on given data.

Use eigendecomposition of the correlation matrix for this purpose.

Input. X: n × p numeric matrix (rows: cases/samples, columns: variables/factors); without


any missing values.

Output. Suppose k = min(n, p).

1. Loadings/rotations: p × k matrix.

2. Principal components/scores: n × k matrix.

3. Standard deviations: k-vector.

Checks on input arguments. Valid values in the input arguments, no missing values, etc.
Treat end-cases such as n < 2 and p < 2 separately.

Algorithm.

1. Shift each column of X by its own mean; i.e., Y·,i = X·,i − Mean(X·,i ) for each column
i = 1, . . . , p. Scale each column of Y by its own standard error; i.e., Y·,i = Y·,i /SE(Y·,i ) for
each column i = 1, . . . , p.

2. Compute the p × p correlation matrix C = Y T Y /(n − 1). C should be symmetric, all 1s


on the diagonal, and all other elements between −1 and +1.

3. Compute the eigendecomposition of C using the in-built function eigen(). This gives a
p-vector of eigenvalues d and a p × p matrix V with eigenvectors as its columns. Formally,
the eigendecomposition is C = V T · diag(d) · V .

4. (a) Check if the eigenvalues d > 0. If not, take an appropriate course of action.
(b) Check if the eigenvalues d are in descending order. If not, then reorder d in descending
order. Reorder the columns of V to match the changed order.

5. Compute the output quantities:

(a) Rotation/loading matrix R is the matrix of first k columns of V .


(b) Scores/principal component matrix (with PCs as columns) is Y R.
(c) Standard deviations of the PCs are the square roots of first k eigenvalues in d.

Testing.

1. At each step of the algorithm, put appropriate checks that reflect the assumptions made
about the computed quantities.
2. Compare the results of your implementation with the output of the in-built function
princomp() applied to a standard data set such as USArrests or iris without the species
column. The simplest artificial test data set would be a 2-variable (X1 , X2 ) data set where
X2 = mX1 + c + ϵ where the noise ϵ is normal with mean 0 and standard deviation σ > 0.
3. Demonstrate that your code produces correct results.

1
Problem. Write from-scratch code to perform principal component analysis on given data.
Use singular value decomposition of the data matrix for this purpose.

Input. X: n × p numeric matrix (rows: cases/samples, columns: variables/factors); without


any missing values.

Output. Suppose k = min(n, p).

1. Loadings/rotations: p × k matrix.

2. Principal components/scores: n × k matrix.

3. Standard deviations: k-vector.

Checks on input arguments. Valid values in the input arguments, no missing values, etc.
Treat end-cases such as n < 2 and p < 2 separately.

Algorithm.

1. Shift each column of X by its own mean; i.e., Y·,i = X·,i − Mean(X·,i ) for each column
i = 1, . . . , p. Scale each column of Y by its own standard error; i.e., Y·,i = Y·,i /SE(Y·,i ) for
each column i = 1, . . . , p.

2. Compute the singular value decomposition of Y using the in-built function svd(). SVD
gives a k-vector of singular values d, a n × k matrix U (left singular vectors as columns),
and a p × k matrix V (right singular vectors as columns). Formally, the singular value
decomposition is Y = U · diag(d) · V T .

3. (a) Check if the singular values d > 0. If not, take an appropriate course of action.
(b) Check if the singular values d are in descending order. If not, then reorder d in
descending order. Reorder the columns of U and V to match the changed order.

4. Compute the output quantities:

(a) Rotation/loading matrix R is the matrix V .


(b) Scores/principal component matrix (with PCs as columns) is Y R.

(c) Standard deviations of the PCs are d/ n.

Testing.

1. At each step of the algorithm, put appropriate checks that reflect the assumptions made
about the computed quantities.
2. Compare the results of your implementation with the output of the in-built function
princomp() applied to a standard data set such as USArrests or iris without the species
column. The simplest artificial test data set would be a 2-variable (X1 , X2 ) data set where
X2 = mX1 + c + ϵ where the noise ϵ is normal with mean 0 and standard deviation σ > 0.
3. Demonstrate that your code produces correct results.

You might also like