This project implements the K-Nearest Neighbors (KNN) algorithm for gender classification based on height, weight, and age. Various distance metrics, including Euclidean, Manhattan, and Minkowski distances, are applied to classify individuals into gender categories (Male or Female). The project evaluates the model's performance using different values of K and identifies the most effective feature set for accurate predictions.
- Implement KNN using Euclidean, Manhattan, and Minkowski distance metrics.
- Evaluate the impact of different `K` values on classification accuracy.
- Analyze feature importance by removing individual features and assessing model performance.
- Use cross-validation to evaluate model robustness.
- Programming Language: Python
- Libraries:
  - `numpy`: For numerical computations.
  - `math`: For mathematical operations.
- Training Data (`Training_Data.txt` and `Training_Data.csv`):
  - Contains height, weight, age, and gender labels (`M` for Male, `W` for Female).
  - Example:
    - `(( 1.6530190426733, 72.871146648479, 24), W)`
    - `(( 1.6471384909498, 72.612785314988, 34), W)`
- Test Data (`Test_Data.txt` and `Test_Data.csv`):
  - Contains height, weight, and age without labels for prediction.
  - Example: `(1.62065758929, 59.376557437583, 32)`
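The labelled record format shown above can be read with a small regular expression. This is only a sketch: the function name `parse_training_text` and the pattern are illustrative assumptions, not code taken from the project, and the pattern assumes every record looks exactly like the examples.

```python
import re

# Hypothetical pattern for one labelled record such as "(( 1.65, 72.87, 24), W)".
RECORD_RE = re.compile(
    r"\(\(\s*([\d.]+)\s*,\s*([\d.]+)\s*,\s*(\d+)\s*\)\s*,\s*([MW])\s*\)"
)


def parse_training_text(text):
    """Return a list of ((height, weight, age), label) tuples found in text."""
    records = []
    for height, weight, age, label in RECORD_RE.findall(text):
        records.append(((float(height), float(weight), int(age)), label))
    return records


# The two example records from the training data description.
sample = (
    "(( 1.6530190426733, 72.871146648479, 24), W) "
    "(( 1.6471384909498, 72.612785314988, 34), W)"
)
parsed = parse_training_text(sample)
```

Each parsed record is a `(features, label)` pair, a convenient shape for a KNN routine that needs the feature tuple and the label separately.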
- Implemented KNN using the following distance metrics:
  - Euclidean Distance: $\text{distance} = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
  - Manhattan Distance: $\text{distance} = \sum_{i=1}^{n} |x_i - y_i|$
  - Minkowski Distance: $\text{distance} = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$
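The three formulas translate directly into Python. The sketch below uses only the standard `math` module. The function names are illustrative and may differ from the project's own. Minkowski distance reduces to Manhattan at `p = 1` and to Euclidean at `p = 2`.

```python
import math


def euclidean(x, y):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of the absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def minkowski(x, y, p):
    """Generalised distance with exponent p (p >= 1)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


# Small check on a 3-4-5 right triangle.
d_euclid = euclidean((0, 0), (3, 4))    # 5.0
d_manhat = manhattan((0, 0), (3, 4))    # 7
d_mink_1 = minkowski((0, 0), (3, 4), 1)
d_mink_2 = minkowski((0, 0), (3, 4), 2)
```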
- Tested the model with various values of `K` (e.g., 1, 3, 5, 7).
- Observed classification accuracy for each `K` using cross-validation.
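One way to compare values of `K` is a majority-vote classifier scored with k-fold cross-validation. The sketch below rests on several assumptions: the names `knn_predict` and `cross_val_accuracy`, the interleaved fold split, and the toy data are all illustrative. The project's actual fold scheme is not described here, and the toy data is synthetic rather than taken from `Training_Data.txt`.

```python
import math
from collections import Counter


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_predict(train, query, k, distance=euclidean):
    """Majority label among the k training records closest to query."""
    neighbours = sorted(train, key=lambda rec: distance(rec[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(data, k, folds=5, distance=euclidean):
    """Fraction of records classified correctly across interleaved folds."""
    correct = 0
    for f in range(folds):
        test = data[f::folds]  # every folds-th record, starting at f
        train = [rec for i, rec in enumerate(data) if i % folds != f]
        correct += sum(knn_predict(train, x, k, distance) == y for x, y in test)
    return correct / len(data)


# Synthetic, well-separated toy data (not the project's data set).
toy = [((1.60 + 0.01 * i, 55 + i, 25), "W") for i in range(5)]
toy += [((1.80 + 0.01 * i, 80 + i, 25), "M") for i in range(5)]

accuracy_by_k = {k: cross_val_accuracy(toy, k) for k in (1, 3)}
```

Odd values of `K` avoid tied votes when there are only two classes, which may be why the values tested are all odd.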
- Removed features (e.g., age) to evaluate their impact on model performance.
- Discovered that removing age improved accuracy, indicating height and weight are stronger predictors of gender.
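A feature-removal experiment of this kind can be sketched by scoring the classifier once per dropped feature. Everything here is an assumption made for illustration: the helper names, the use of leave-one-out scoring with unscaled Euclidean distance, and the toy data. The toy data is synthetic and deliberately built so that age misleads the classifier. It shows only the mechanism and is not the project's result.

```python
import math
from collections import Counter

FEATURES = ("height", "weight", "age")


def select(features, keep):
    """Keep only the feature positions listed in keep."""
    return tuple(features[i] for i in keep)


def loo_accuracy(data, k, keep):
    """Leave-one-out KNN accuracy using only the kept feature positions."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        q = select(x, keep)
        nearest = sorted(
            train, key=lambda rec: math.dist(select(rec[0], keep), q)
        )[:k]
        vote = Counter(label for _, label in nearest).most_common(1)[0][0]
        correct += vote == y
    return correct / len(data)


def ablation(data, k):
    """Accuracy with all features ('none' dropped) and with each one dropped."""
    all_idx = tuple(range(len(FEATURES)))
    scores = {"none": loo_accuracy(data, k, all_idx)}
    for drop, name in enumerate(FEATURES):
        keep = tuple(i for i in all_idx if i != drop)
        scores[name] = loo_accuracy(data, k, keep)
    return scores


# Synthetic data: ages pair each W record with an M record of similar age.
toy = [
    ((1.60, 55, 20), "W"), ((1.62, 57, 60), "W"), ((1.61, 56, 40), "W"),
    ((1.80, 65, 21), "M"), ((1.82, 67, 59), "M"), ((1.81, 66, 41), "M"),
]
scores = ablation(toy, k=1)
```

Because the features are on very different scales (metres, kilograms, years), the feature with the widest numeric range dominates an unscaled distance. This is one plausible reason a noisy feature such as age can hurt accuracy.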
| Distance Metric | K=1 | K=3 | K=5 | K=7 | K=9 |
|---|---|---|---|---|---|
| Euclidean | 100% | 95% | 92% | 90% | 88% |
| Manhattan | 98% | 96% | 93% | 91% | 89% |
| Minkowski (p=3) | 99% | 96% | 93% | 90% | 89% |
- Removing `Age` improved model accuracy across all metrics and `K` values.
- Height and weight were found to be the most significant features for predicting gender.