Aiml Exp 5 Viva

SPPU Mechanical Engineering

Created by @vaibhavpandit_tele

4.5 Feature Selection and PCA
Basic Fundamentals

1. What is the goal of dimensionality reduction?


To reduce the number of variables in a dataset while preserving as much relevant information as possible,
simplifying analysis and improving model performance.

2. List three advantages of PCA.

• Removes correlated features.

• Enhances algorithm performance by reducing dimensionality.

• Enables better visualization of high-dimensional data.

3. How does a low-variance filter work?


It removes features with variance below a set threshold, assuming low-variance features carry little
information.
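A minimal sketch of this filter using scikit-learn's VarianceThreshold (the toy matrix and the 0.01 threshold are illustrative assumptions):

import numpy as np
from sklearn.feature_selection import VarianceThreshold

# toy data: column 0 is constant, columns 1 and 2 vary
X = np.array([[0.0, 1.0, 0.1],
              [0.0, 2.0, 0.5],
              [0.0, 3.0, 0.9],
              [0.0, 4.0, 0.3]])

selector = VarianceThreshold(threshold=0.01)   # drop features with variance below 0.01
X_reduced = selector.fit_transform(X)

print(selector.variances_)   # per-feature variances estimated from the data
print(X_reduced.shape)       # (4, 2): the constant column has been removed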

4. What is the interpretation of principal components?


Principal components are new uncorrelated variables formed as linear combinations of original features
that capture maximum variance in the data.

5. How does correlation-based feature selection work?


It selects features highly correlated with the target variable but uncorrelated with each other to reduce
redundancy and improve predictive power.

6. Why is PCA sensitive to feature scaling?


Because PCA relies on variance, features with larger scales dominate the principal components unless
the data is standardized.
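A short sketch of this effect (the synthetic two-feature data is an illustrative assumption):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 1, 200),       # small-scale feature
                     rng.normal(0, 1000, 200)])   # large-scale feature

print(PCA(n_components=2).fit(X).explained_variance_ratio_)
# without scaling, PC1 is dominated almost entirely by the large-scale feature

X_std = StandardScaler().fit_transform(X)
print(PCA(n_components=2).fit(X_std).explained_variance_ratio_)
# after standardization the two (uncorrelated) features contribute roughly equally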

7. What is the scree plot used for in PCA?


To visualize the eigenvalues of principal components and help decide how many components to retain.
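A minimal scree-plot sketch (the Iris dataset is only an illustrative choice):

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_iris().data)
pca = PCA().fit(X)

plt.plot(range(1, len(pca.explained_variance_) + 1),
         pca.explained_variance_, marker='o')
plt.xlabel('Principal component')
plt.ylabel('Eigenvalue (explained variance)')
plt.title('Scree plot')
plt.show()   # look for the "elbow" where the eigenvalues level off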

8. Define "eigenvalue" in the context of PCA.


An eigenvalue represents the amount of variance captured by its corresponding principal component.
9. How does LDA differ from PCA?
LDA is supervised and maximizes class separability, while PCA is unsupervised and maximizes variance
without considering class labels.

10. What is the Kaiser criterion for component selection?


Retain principal components with eigenvalues greater than 1 (when PCA is performed on standardized
data), as each such component explains more variance than a single original variable.

Medium Level

11. How would you determine the optimal number of principal components?
By analyzing the scree plot, cumulative explained variance (e.g., 90%), or using cross-validation to balance
dimensionality and accuracy.
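A sketch of the cumulative-variance approach (the 90% target and the digits dataset are illustrative assumptions):

import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_digits().data)
pca = PCA().fit(X)

cumulative = np.cumsum(pca.explained_variance_ratio_)
n_components = int(np.argmax(cumulative >= 0.90)) + 1   # first component count reaching 90%
print(n_components, cumulative[n_components - 1])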

12. Compare forward selection and backward elimination for feature selection.

• Forward selection starts with no features and adds them iteratively based on improvement.

• Backward elimination starts with all features and removes the least significant iteratively (a sketch of both appears below).
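A sketch of both directions with scikit-learn's SequentialFeatureSelector (the estimator, dataset, and target of 5 features are illustrative assumptions; the backward run is slower because it starts from all features):

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

forward = SequentialFeatureSelector(model, n_features_to_select=5,
                                    direction='forward').fit(X, y)
backward = SequentialFeatureSelector(model, n_features_to_select=5,
                                     direction='backward').fit(X, y)

print(forward.get_support(indices=True))    # indices added one at a time
print(backward.get_support(indices=True))   # indices surviving the eliminations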

13. Explain how to apply PCA for image compression.


Flatten images, apply PCA to reduce dimensionality, store only top components, and reconstruct images
from these components to save space.
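A sketch on the built-in 8x8 digit images (keeping 16 of 64 components is an illustrative choice):

import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data                 # each row is a flattened 8x8 image (64 pixels)
pca = PCA(n_components=16).fit(X)      # keep 16 of the 64 possible components

compressed = pca.transform(X)          # only 16 numbers stored per image
reconstructed = pca.inverse_transform(compressed)

print(compressed.shape)                    # (1797, 16)
print(np.mean((X - reconstructed) ** 2))   # mean squared reconstruction error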

14. What are the limitations of PCA for non-linear data?


PCA cannot capture non-linear relationships as it is a linear method, leading to poor representation of
complex data structures.

15. How does multicollinearity impact feature selection?


Highly correlated features can cause redundancy and instability in models, making feature selection
necessary to remove them.

16. Propose a method to validate selected features using cross-validation.


Split data into folds, train models on selected features in training folds, and evaluate performance on
validation folds to ensure generalization.
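A sketch using a Pipeline so that feature selection is re-run inside each fold and never sees the validation data (the selector, estimator, and k=10 are illustrative assumptions):

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([('select', SelectKBest(f_classif, k=10)),       # selection step
                 ('model', LogisticRegression(max_iter=5000))])  # estimator step

scores = cross_val_score(pipe, X, y, cv=5)   # selection happens per training fold
print(scores.mean(), scores.std())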

17. How can PCA be used for noise reduction?


By discarding components with low eigenvalues, which mostly capture noise, and reconstructing the data
from the leading components that represent the signal.
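A sketch of PCA denoising (the noise level and the 20 retained components are illustrative assumptions):

import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = load_digits().data
X_noisy = X + rng.normal(0, 4.0, X.shape)    # add Gaussian pixel noise

pca = PCA(n_components=20).fit(X_noisy)      # keep only the leading components
X_denoised = pca.inverse_transform(pca.transform(X_noisy))

print(np.mean((X - X_noisy) ** 2))       # error of the noisy images
print(np.mean((X - X_denoised) ** 2))    # error after reconstruction; typically lower,
                                         # since the dropped components carry mostly noise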
18. What are the assumptions of ICA compared to PCA?
ICA assumes statistical independence of source signals and non-Gaussianity, unlike PCA which assumes
orthogonality and maximizes variance.

19. How does t-SNE address PCA's limitations for visualization?


t-SNE captures non-linear relationships and preserves local structure, providing better visualization of
complex data clusters.

20. Explain the role of singular value decomposition (SVD) in PCA.


SVD decomposes the data matrix into singular vectors and values, facilitating efficient computation of
principal components.
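A sketch showing the correspondence (the random matrix is an illustrative assumption): the right singular vectors of the centred data are the principal components, and the squared singular values divided by (n - 1) are the eigenvalues.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Xc = X - X.mean(axis=0)                 # centre the data

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
eigenvalues = S ** 2 / (len(X) - 1)     # variance captured by each component

pca = PCA().fit(X)
print(np.allclose(eigenvalues, pca.explained_variance_))   # True
print(np.allclose(np.abs(Vt), np.abs(pca.components_)))    # True (up to sign flips)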

Hard Level

21. Derive the relationship between covariance matrix and PCA components.
PCA components are eigenvectors of the covariance matrix; eigenvectors define directions of maximum
variance, eigenvalues quantify variance along those directions.
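A numerical check of this relationship (the 2-D Gaussian sample is an illustrative assumption):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0, 0], [[3.0, 1.0], [1.0, 2.0]], size=500)

cov = np.cov(X, rowvar=False)                     # sample covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov)   # eigh returns ascending order
order = np.argsort(eigenvalues)[::-1]             # re-sort descending

pca = PCA().fit(X)
print(np.allclose(eigenvalues[order], pca.explained_variance_))   # True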

22. Design a hybrid dimensionality reduction method combining PCA and t-SNE.
First apply PCA to reduce dimensionality and noise, then apply t-SNE on the PCA output for detailed
non-linear visualization.
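A sketch of the hybrid pipeline (the 50-component intermediate size and the digits dataset are illustrative assumptions):

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X = load_digits().data

X_pca = PCA(n_components=50, random_state=0).fit_transform(X)   # linear pre-reduction
X_2d = TSNE(n_components=2, perplexity=30,
            random_state=0).fit_transform(X_pca)                # non-linear embedding

print(X_2d.shape)   # (1797, 2), ready for a scatter plot coloured by digit label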

23. How would you apply PCA to streaming data with concept drift?
Use incremental PCA algorithms that update components as new data arrives, adapting to changes in data
distribution over time.
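A sketch of the incremental update loop with scikit-learn's IncrementalPCA (batch size, feature count, and component count are illustrative assumptions):

import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
ipca = IncrementalPCA(n_components=10)

for _ in range(20):                        # simulate a stream of 20 batches
    batch = rng.normal(size=(100, 50))     # 100 new samples, 50 features
    ipca.partial_fit(batch)                # update the components with this batch

print(ipca.explained_variance_ratio_[:3])

Note that IncrementalPCA weights all past batches equally, so under strong concept drift the components may also need periodic re-fitting on recent data.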

24. Critique the interpretability challenges of PCA-transformed features.


Principal components are linear combinations of original features, making it difficult to attribute specific
meanings to them.

25. Propose a method to integrate domain knowledge into automated feature selection.
Incorporate expert-defined feature importance as priors or constraints within feature selection
algorithms.

26. Analyze the failure modes of PCA for categorical data.


PCA assumes numeric continuous data and linear relationships; it fails to handle categorical variables
properly without encoding.
27. How can reinforcement learning optimize feature selection pipelines?
Model feature selection as sequential decisions by an RL agent, rewarding selections that improve model
performance.

28. Evaluate the computational complexity of incremental PCA.


Incremental PCA updates components with new data batches, reducing complexity compared to batch
PCA but still dependent on data size and component count.

29. Design a metric to quantify information loss in dimensionality reduction.


Use reconstruction error or the proportion of variance not explained by retained components.
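A sketch of both variants of the metric (the digits dataset and 10 components are illustrative assumptions):

import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data
pca = PCA(n_components=10).fit(X)

X_hat = pca.inverse_transform(pca.transform(X))   # project down, then back up
reconstruction_error = np.mean((X - X_hat) ** 2)
unexplained_variance = 1.0 - pca.explained_variance_ratio_.sum()

print(reconstruction_error, unexplained_variance)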

30. How does kernel PCA extend traditional PCA for non-linear data?
Kernel PCA applies PCA in a high-dimensional feature space via kernel functions, capturing non-linear
structures in the original data.
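A sketch on the classic concentric-circles example (the RBF kernel and gamma=10 are illustrative assumptions):

from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA, PCA

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

X_linear = PCA(n_components=2).fit_transform(X)     # the two rings stay nested
X_kernel = KernelPCA(n_components=2, kernel='rbf',
                     gamma=10).fit_transform(X)     # the rings separate in the new space

print(X_linear.shape, X_kernel.shape)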

This completes your clean Q&A set on Feature Selection and PCA for effective viva preparation.
