Machine Learning Questions and Answers
1. Performance measures associated with a classification model
- Error Rate: (FP + FN) / (TP + TN + FP + FN)
- Accuracy: (TP + TN) / (TP + TN + FP + FN) = 1 - Error Rate
- True Positive Rate (TPR) / Recall / Sensitivity: TP / (TP + FN)
- False Positive Rate (FPR): FP / (FP + TN)
- Precision: TP / (TP + FP)
- Specificity: TN / (TN + FP) = 1 - FPR
- Receiver Operating Characteristic (ROC) Curve: Plot of TPR against FPR as the classification threshold varies.
- Area Under Curve (AUC): Area under the ROC curve; summarizes classifier performance across all thresholds (1.0 = perfect, 0.5 = random guessing).
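A minimal Python sketch of how these quantities follow from confusion-matrix counts (the TP/TN/FP/FN values below are illustrative, not from any real model):

    # Illustrative confusion-matrix counts
    tp, tn, fp, fn = 40, 45, 5, 10

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total          # 1 - error rate
    error_rate = (fp + fn) / total
    recall = tp / (tp + fn)               # TPR / sensitivity
    fpr = fp / (fp + tn)                  # false positive rate
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)          # 1 - FPR

    print(accuracy, error_rate, recall, fpr, precision, specificity)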
2. Measures used to select features for root and internal nodes in a decision tree
- Entropy: Measures impurity in a dataset.
- Gini Index: Measures the probability of misclassifying a randomly chosen sample if it were labeled according to the class distribution at the node.
- Information Gain: Reduction in entropy when a feature is used.
- Gain Ratio: Information gain normalized by the split information, penalizing attributes that produce many small partitions.
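A short sketch of the impurity measures above in plain Python (the label counts below are illustrative):

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def gini(labels):
        n = len(labels)
        return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

    def information_gain(parent, splits):
        # splits: list of label lists produced by splitting on a feature
        n = len(parent)
        return entropy(parent) - sum(len(s) / n * entropy(s) for s in splits)

    parent = ['yes'] * 9 + ['no'] * 5
    left = ['yes'] * 6 + ['no'] * 1
    right = ['yes'] * 3 + ['no'] * 4
    print(entropy(parent), gini(parent), information_gain(parent, [left, right]))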
3. Multivariate Classifier
- Considers multiple features simultaneously.
- Examples: Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA),
Multivariate Decision Trees.
4. Differences between Random Forest and Decision Tree
Decision Tree:
- Simple, interpretable model.
- Can overfit to training data.
Random Forest:
- Collection of multiple decision trees.
- Uses bagging to reduce overfitting.
- More accurate but less interpretable.
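A rough comparison sketch, assuming scikit-learn is available (the dataset and parameters are illustrative):

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    tree = DecisionTreeClassifier(random_state=0)          # single, interpretable tree
    forest = RandomForestClassifier(n_estimators=100, random_state=0)  # bagged ensemble

    print("tree  :", cross_val_score(tree, X, y, cv=5).mean())
    print("forest:", cross_val_score(forest, X, y, cv=5).mean())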
5. Challenges in Linear Support Vector Machines (SVM)
- Sensitivity to noise and outliers.
- Difficulty handling non-linearly separable data.
- High computational complexity for large datasets.
- Feature scaling required for optimal results.
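A sketch of the scaling point above, standardizing features before fitting a linear SVM, assuming scikit-learn is available (dataset and parameters are illustrative):

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import LinearSVC

    X, y = load_breast_cancer(return_X_y=True)
    # Scale features to zero mean / unit variance, then fit a linear SVM
    model = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=10000))
    print(cross_val_score(model, X, y, cv=5).mean())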
6. K-Nearest Neighbors (KNN) Classifier
- Lazy learner (stores training data and classifies new instances based on nearest neighbors).
- Distance-based approach (uses Euclidean, Manhattan, Minkowski distances).
- Sensitive to choice of K (odd K values prevent ties in binary classification).
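A minimal from-scratch KNN sketch (the toy points and the choice of k are illustrative):

    import math
    from collections import Counter

    def knn_predict(train_X, train_y, query, k=3):
        # Sort training points by Euclidean distance to the query, keep the k closest
        neighbors = sorted(zip(train_X, train_y),
                           key=lambda pair: math.dist(pair[0], query))[:k]
        # Majority vote among the k nearest labels
        votes = Counter(label for _, label in neighbors)
        return votes.most_common(1)[0][0]

    train_X = [(1, 1), (1, 2), (5, 5), (6, 5)]
    train_y = ['a', 'a', 'b', 'b']
    print(knn_predict(train_X, train_y, (1.5, 1.5), k=3))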
7. Kernel Function & High Dimensionality Handling
- Kernel Function: Computes inner products in a higher-dimensional feature space without explicitly transforming the data (the kernel trick); e.g., polynomial and Gaussian/radial basis function (RBF) kernels.
- Handling High Dimensionality: Apply feature selection or dimensionality reduction methods such as PCA.
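A sketch of two common kernel functions, assuming NumPy is available (the gamma, degree, and sample vectors are illustrative):

    import numpy as np

    def rbf_kernel(x, y, gamma=0.5):
        # Gaussian / RBF kernel: exp(-gamma * ||x - y||^2)
        return np.exp(-gamma * np.sum((x - y) ** 2))

    def polynomial_kernel(x, y, degree=3, c=1.0):
        # Polynomial kernel: (x . y + c)^degree
        return (np.dot(x, y) + c) ** degree

    x, y = np.array([1.0, 2.0]), np.array([2.0, 0.5])
    print(rbf_kernel(x, y), polynomial_kernel(x, y))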
8. Distance Metrics
- Euclidean Distance: d(x,y) = sqrt(sum (x_i - y_i)^2)
- Manhattan Distance: d(x,y) = sum |x_i - y_i|
- Minkowski Distance: d(x,y) = (sum |x_i - y_i|^p)^(1/p).
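Direct implementations of the three formulas above (the sample vectors are illustrative):

    def euclidean(x, y):
        return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

    def manhattan(x, y):
        return sum(abs(a - b) for a, b in zip(x, y))

    def minkowski(x, y, p=3):
        return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

    x, y = (1, 2, 3), (4, 0, 3)
    # Minkowski with p=1 equals Manhattan; with p=2 it equals Euclidean
    print(euclidean(x, y), manhattan(x, y), minkowski(x, y, p=1))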
9. Difference between Regression and Classification
- Regression: Predicts continuous values.
- Classification: Predicts discrete labels/classes.
10. Difference between Probabilistic Generative and Discriminative Classifiers
- Generative Classifiers (e.g., Naïve Bayes, Gaussian Mixture Model): Model joint probability
P(X, Y).
- Discriminative Classifiers (e.g., Logistic Regression, SVM): Model conditional probability P(Y
| X).
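A sketch contrasting a generative model (Gaussian Naive Bayes) with a discriminative one (logistic regression), assuming scikit-learn is available (the dataset is illustrative):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB

    X, y = load_iris(return_X_y=True)
    # Generative: models class-conditional densities P(X | Y) and priors P(Y)
    print("generative    :", cross_val_score(GaussianNB(), X, y, cv=5).mean())
    # Discriminative: models P(Y | X) directly
    print("discriminative:", cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean())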
11. Applications of Clustering
- Image segmentation
- Customer segmentation
- Anomaly detection
- Document clustering
- Bioinformatics.
12. Approaches to Finding K in K-Means Clustering
- Elbow Method
- Silhouette Score
- Gap Statistic.
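A sketch of the elbow method, assuming scikit-learn is available (the synthetic data and the range of K values are illustrative):

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
    inertias = []
    for k in range(1, 9):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        inertias.append(km.inertia_)   # within-cluster sum of squares

    # The "elbow" is the K after which inertia stops dropping sharply
    print(list(zip(range(1, 9), [round(i, 1) for i in inertias])))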
13. Properties of Clustering
- Homogeneity: Items in a cluster should be similar.
- Separation: Clusters should be distinct.
- Scalability: Should handle large datasets.
- Robustness: Should handle noise and outliers.
14. Types of Clustering
- Hard Clustering (e.g., K-Means)
- Soft Clustering (e.g., Gaussian Mixture Model)
- Hierarchical Clustering
- Density-Based Clustering (e.g., DBSCAN).
15. Feature Selection vs. Feature Extraction
- Feature Selection: Choosing the most relevant features.
- Feature Extraction: Transforming features into a new space (e.g., PCA, LDA).
16. Curse of Dimensionality Solutions
- Feature selection to reduce dimensions.
- Principal Component Analysis (PCA).
- Manifold Learning (e.g., t-SNE, LLE).
- Regularization techniques.
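A sketch of PCA-based dimensionality reduction, assuming scikit-learn is available (the dataset and the variance threshold kept are illustrative):

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA

    X, _ = load_digits(return_X_y=True)    # 64-dimensional inputs
    pca = PCA(n_components=0.95)           # keep components explaining 95% of the variance
    X_reduced = pca.fit_transform(X)
    print(X.shape, "->", X_reduced.shape)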
17. Spectral Clustering
- Uses eigenvalues of similarity matrices to perform clustering.
- Effective for non-convex clusters.
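A sketch of spectral clustering on a non-convex (two-moons) dataset, assuming scikit-learn is available (parameters are illustrative):

    from sklearn.cluster import SpectralClustering
    from sklearn.datasets import make_moons

    X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
    # Build a nearest-neighbor similarity graph and cluster via its eigenvectors
    labels = SpectralClustering(n_clusters=2, affinity='nearest_neighbors',
                                random_state=0).fit_predict(X)
    print(labels[:20])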
18. LVM (Latent Variable Model)
- Models observed data through unobserved (latent) variables; commonly used for dimensionality reduction.
- Examples: Principal Component Analysis (probabilistic PCA), Hidden Markov Models (HMMs).
19. Difference between Gaussian Mixture Model (GMM) and Dirichlet Mixture Model (DMM)
- GMM: Finite mixture of Gaussian components; the number of components is fixed in advance.
- DMM: Uses a Dirichlet Process prior over mixture components, allowing the effective number of clusters to grow with the data.
20. Applications of Topic Mixture Models
- Text Classification
- Document Clustering
- Sentiment Analysis
- Recommender Systems.
21. Difference between Soft and Hard Clustering
- Hard Clustering: Data point belongs to only one cluster (e.g., K-Means).
- Soft Clustering: Data point has probabilities for multiple clusters (e.g., Gaussian Mixture
Models).
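A sketch contrasting hard assignments (K-Means) with soft, probabilistic assignments (GMM), assuming scikit-learn is available (the synthetic data is illustrative):

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.mixture import GaussianMixture

    X, _ = make_blobs(n_samples=200, centers=2, random_state=0)
    hard = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    soft = GaussianMixture(n_components=2, random_state=0).fit(X).predict_proba(X)
    print(hard[:5])           # one label per point
    print(soft[:5].round(3))  # per-cluster membership probabilities per point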
22. Why is K Odd in KNN?
- Prevents ties in classification.
- Ensures a majority vote in binary classification.