KNN Age Prediction Model for Asian Dataset Based on 19 Lengths of Left-Hand Bones
Male Dataset
Using Library Train_Test_Split
Step 1:
Open Anaconda Navigator and launch Jupyter Notebook.
In the CSV file, add a helper column filled with =RANDBETWEEN(x,y), where x is 1 and y is the number of the last row.
Sort the rows by that column, then split them into two files: one for training data and one for testing data.
Save both datasets in the same folder as your code.
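The spreadsheet shuffle above can also be done in pandas. The following is only a minimal sketch, not the procedure used in this document: the function name shuffle_and_split, the 70/30 fraction, the seed, and the small demo table are all assumptions made for illustration.

```python
import pandas as pd

def shuffle_and_split(df, train_fraction=0.7, seed=0):
    """Shuffle the rows once, then cut the table into a training and a testing part."""
    # sample(frac=1) returns every row in random order; a fixed seed keeps
    # that order identical on every run, like a spreadsheet sort that was saved.
    shuffled = df.sample(frac=1, random_state=seed).reset_index(drop=True)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled.iloc[:n_train], shuffled.iloc[n_train:]

# Small stand-in table (hypothetical values); real data would come from pd.read_csv(...)
demo = pd.DataFrame({"No": range(1, 11),
                     "ChrAge": [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]})

train_part, test_part = shuffle_and_split(demo)
print(len(train_part), len(test_part))

# The two parts could then be saved for later steps, for example:
# train_part.to_csv("train.csv", index=False)
# test_part.to_csv("test.csv", index=False)
```

Because the seed is fixed, the same rows end up in the training part each time, which matches the behavior of a split that is saved to two files.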
Step 2:
Import the pandas library.
This library is used to read the CSV files.
Load the training and testing datasets.
import pandas as pd
f_train = pd.read_csv('C:/Users/ariny/OneDrive/Documents/female_train.csv')
f_test = pd.read_csv('C:/Users/ariny/OneDrive/Documents/female_test.csv')
Step 3:
Divide each dataset into input and target variables.
Drop the columns that are not bone lengths from the input data, and use the ChrAge column as the target.
from sklearn.model_selection import train_test_split
input_training = f_train.drop(['No', 'Race', 'Gender', 'DOB', 'Exam Date', 'Tanner', 'Weight(kg)',
'Height(cm)', 'Trunk HT (cm)', 'ChrAge'], axis=1)
target_training = f_train['ChrAge']
input_testing = f_test.drop(['No', 'Race', 'Gender', 'DOB', 'Exam Date', 'Tanner', 'Weight(kg)',
'Height(cm)', 'Trunk HT (cm)', 'ChrAge'], axis=1)
target_testing = f_test['ChrAge']
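The same list of dropped columns is applied to both the training and the testing file, so it can be kept in one place. The sketch below assumes the column names from the female files shown above; the helper name split_input_target and the bone_1 and bone_2 columns in the demo table are hypothetical.

```python
import pandas as pd

# Columns that are not bone lengths (names taken from the female CSV files above)
NON_FEATURE_COLUMNS = ['No', 'Race', 'Gender', 'DOB', 'Exam Date', 'Tanner',
                       'Weight(kg)', 'Height(cm)', 'Trunk HT (cm)']
TARGET_COLUMN = 'ChrAge'

def split_input_target(df):
    """Return (inputs, target) for one dataset."""
    # errors='ignore' skips any listed column that a given file does not have
    inputs = df.drop(columns=NON_FEATURE_COLUMNS + [TARGET_COLUMN], errors='ignore')
    target = df[TARGET_COLUMN]
    return inputs, target

# Hypothetical three-row table standing in for a real file
demo = pd.DataFrame({'No': [1, 2, 3],
                     'Gender': ['F', 'F', 'F'],
                     'ChrAge': [7, 9, 12],
                     'bone_1': [3.1, 3.6, 4.2],
                     'bone_2': [2.0, 2.4, 2.9]})

inputs, target = split_input_target(demo)
print(list(inputs.columns))
```

Note that errors='ignore' also hides misspelled column names, so it is worth printing the remaining columns once to confirm that only bone lengths are left.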
Step 4:
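Import KNeighborsClassifier from sklearn.neighbors.
The classifier treats each distinct ChrAge value as a class label and predicts the label that is most common among the nearest neighbors.
Set n_neighbors to the number of nearest neighbors to use.
Train the model using knn.fit and predict using knn.predict.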
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(input_training, target_training)
y_pred = knn.predict(input_testing)
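Since the target is an age and the result is scored with mean squared error, a regressor is a possible alternative to the classifier. The sketch below is not the method used in this document: it swaps in KNeighborsRegressor next to KNeighborsClassifier on made-up data (synthetic ages and two invented "bone length" features) only to show how the two kinds of prediction differ.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

rng = np.random.default_rng(0)

# Synthetic data: whole-year ages and two invented features that grow with age
ages = rng.integers(5, 19, size=60)
X = np.column_stack([ages * 1.5, ages * 0.8]) + rng.normal(0, 0.5, size=(60, 2))

X_train, y_train = X[:45], ages[:45]
X_test = X[45:]

clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
reg = KNeighborsRegressor(n_neighbors=3).fit(X_train, y_train)

clf_pred = clf.predict(X_test)   # majority vote: always one of the training ages
reg_pred = reg.predict(X_test)   # mean of the 3 neighbors: can be fractional

print(clf_pred[:5])
print(reg_pred[:5])
```

With n_neighbors=1 both models return the age of the single closest training sample, so the difference only appears for larger values of n_neighbors.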
Step 5:
Import metrics from sklearn to calculate mean squared error
from sklearn import metrics
mse = metrics.mean_squared_error(target_testing, y_pred)
print("Mean Squared Error:", mse)
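To make the metric concrete, the mean squared error can be worked out by hand on a few numbers and compared with the sklearn result. The ages below are invented for illustration only.

```python
import numpy as np
from sklearn import metrics

# Invented example values, not taken from the dataset
actual = np.array([10, 12, 15, 9])
predicted = np.array([11, 12, 13, 9])

errors = actual - predicted            # [-1, 0, 2, 0]
mse_manual = np.mean(errors ** 2)      # (1 + 0 + 4 + 0) / 4 = 1.25
mse_sklearn = metrics.mean_squared_error(actual, predicted)
rmse = np.sqrt(mse_manual)             # square root puts the error back in years

print("Manual MSE:", mse_manual)
print("sklearn MSE:", mse_sklearn)
print("RMSE:", rmse)
```

Because the errors are squared, one prediction that is off by two years contributes four times as much as a prediction that is off by one year.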
Output:
Using Manually Random Ordered
Step 1:
Open Anaconda Navigator and launch Jupyter Notebook.
In the CSV file, add a helper column filled with =RANDBETWEEN(x,y), where x is 1 and y is the number of the last row.
Sort the rows by that column, then split them into two files: one for training data and one for testing data.
Save both datasets in the same folder as your code.
Step 2:
Import the pandas library.
This library is used to read the CSV files.
Load the training and testing datasets.
import pandas as pd
m_train = pd.read_csv('C:/Users/ariny/OneDrive/Documents/male_train.csv')
m_test = pd.read_csv('C:/Users/ariny/OneDrive/Documents/male_test.csv')
Step 3:
Divide each dataset into input and target variables.
Drop the columns that are not bone lengths from the input data, and use the ChrAge column as the target.
from sklearn.model_selection import train_test_split
input_training = m_train.drop(['Bil', 'Race', 'Gender', 'DOB', 'Exam Date', 'Tanner', 'Weightkg',
'Heightcm', 'Trunk HTcm', 'ChrAge'], axis=1)
target_training = m_train['ChrAge']
input_testing = m_test.drop(['Bil', 'Race', 'Gender', 'DOB', 'Exam Date', 'Tanner', 'Weightkg',
'Heightcm', 'Trunk HTcm', 'ChrAge'], axis=1)
target_testing = m_test['ChrAge']
Step 4:
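Import KNeighborsClassifier from sklearn.neighbors.
The classifier treats each distinct ChrAge value as a class label and predicts the label that is most common among the nearest neighbors.
Set n_neighbors to the number of nearest neighbors to use.
Train the model using knn.fit and predict using knn.predict.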
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(input_training, target_training)
y_pred = knn.predict(input_testing)
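One way to decide which n_neighbors value to use is to try several and compare the mean squared error of each. The loop below is a sketch on synthetic data (invented ages and features); the candidate values 1, 3, 5, 7 and 9 are an arbitrary choice, not a recommendation from this document.

```python
import numpy as np
from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)

# Synthetic stand-in for the bone-length data
ages = rng.integers(5, 19, size=120)
X = np.column_stack([ages * 1.5, ages * 0.8]) + rng.normal(0, 1.0, size=(120, 2))
X_train, X_test = X[:84], X[84:]
y_train, y_test = ages[:84], ages[84:]

results = {}
for k in (1, 3, 5, 7, 9):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    results[k] = metrics.mean_squared_error(y_test, knn.predict(X_test))
    print("n_neighbors =", k, "MSE =", results[k])

# Value of k with the lowest error on this particular split
best_k = min(results, key=results.get)
print("Lowest MSE at n_neighbors =", best_k)
```

Comparing on the testing data gives only a quick impression; the value that looks best can change with a different split.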
Step 5:
Import metrics from sklearn to calculate mean squared error
from sklearn import metrics
mse = metrics.mean_squared_error(target_testing, y_pred)
print("Mean Squared Error:", mse)
Output:
Jupyter screenshot:
Female Dataset
Using Library Train_Test_Split
Step 1:
Open Anaconda Navigator and launch Jupyter Notebook.
Save the xray_image_dataset_female.csv dataset in the same folder as your code.
Create a new Python file and start coding.
Step 2:
Import the pandas library.
This library is used to read the CSV file.
Load the dataset.
import pandas as pd
f_data = pd.read_csv('C:/Users/ariny/OneDrive/Documents/xray_image_dataset_female.csv')
f_data
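After loading, it can help to check the table before training, because KNN in sklearn does not accept missing values. The sketch below reads a tiny inline CSV instead of the real file so that it runs on its own; the rows and the bone_1 and bone_2 column names are invented.

```python
import io
import pandas as pd

# Invented three-row CSV; the last row has an empty bone_2 value
csv_text = """No,Gender,ChrAge,bone_1,bone_2
1,F,7,3.1,2.0
2,F,9,3.6,2.4
3,F,12,4.2,
"""

f_demo = pd.read_csv(io.StringIO(csv_text))

print(f_demo.shape)         # (rows, columns)
print(f_demo.dtypes)        # column types, e.g. whether ChrAge is a whole number
print(f_demo.isna().sum())  # number of missing values per column
```

The same three checks can be run on f_data to see how many rows were loaded and whether any bone length is missing.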
Step 3:
Split the dataset into training and testing datasets using the function train_test_split.
This function selects the training data and testing data randomly.
Import train_test_split from sklearn.model_selection.
Drop the columns that are not bone lengths from the dataset.
Set y to the age to be predicted (ChrAge).
Set test_size to 0.3 so that 70% of the rows are used for training and 30% for testing.
from sklearn.model_selection import train_test_split
X = f_data.drop(['No', 'Race', 'Gender', 'DOB', 'Exam Date', 'Tanner', 'Weight(kg)', 'Height(cm)',
'Trunk HT (cm)', 'ChrAge'], axis=1)
y = f_data['ChrAge']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
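Since train_test_split picks rows randomly, each run produces a different split unless a seed is given. The sketch below shows the optional random_state parameter on a small invented array; the seed value 42 is arbitrary, and the code above does not set one.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten invented samples with two features each
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Two calls with the same seed
a_train, a_test, _, _ = train_test_split(X, y, test_size=0.3, random_state=42)
b_train, b_test, _, _ = train_test_split(X, y, test_size=0.3, random_state=42)

same_split = np.array_equal(a_test, b_test)
print("Same testing rows with the same seed:", same_split)
print("Training rows:", len(a_train), "Testing rows:", len(a_test))
```

Leaving random_state out keeps the split random on every run, which is the behavior of the code in this step.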
Step 4:
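Import KNeighborsClassifier from sklearn.neighbors.
The classifier treats each distinct ChrAge value as a class label and predicts the label that is most common among the nearest neighbors.
Set n_neighbors to the number of nearest neighbors to use.
Train the model using knn.fit and predict using knn.predict.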
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
Step 5:
Import metrics from sklearn to evaluate the predictions.
Calculate the mean squared error for the dataset.
from sklearn import metrics
mse = metrics.mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
Output:
Using Manually Random Ordered
Step 1:
Open Anaconda Navigator and launch Jupyter Notebook.
In the CSV file, add a helper column filled with =RANDBETWEEN(x,y), where x is 1 and y is the number of the last row.
Sort the rows by that column, then split them into two files: one for training data and one for testing data.
Save both datasets in the same folder as your code.
Step 2:
Import the pandas library.
This library is used to read the CSV files.
Load the training and testing datasets.
import pandas as pd
f_train = pd.read_csv('C:/Users/ariny/OneDrive/Documents/female_train.csv')
f_test = pd.read_csv('C:/Users/ariny/OneDrive/Documents/female_test.csv')
Step 3:
Divide each dataset into input and target variables.
Drop the columns that are not bone lengths from the input data, and use the ChrAge column as the target.
from sklearn.model_selection import train_test_split
input_training = f_train.drop(['No', 'Race', 'Gender', 'DOB', 'Exam Date', 'Tanner', 'Weight(kg)',
'Height(cm)', 'Trunk HT (cm)', 'ChrAge'], axis=1)
target_training = f_train['ChrAge']
input_testing = f_test.drop(['No', 'Race', 'Gender', 'DOB', 'Exam Date', 'Tanner', 'Weight(kg)',
'Height(cm)', 'Trunk HT (cm)', 'ChrAge'], axis=1)
target_testing = f_test['ChrAge']
Step 4:
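Import KNeighborsClassifier from sklearn.neighbors.
The classifier treats each distinct ChrAge value as a class label and predicts the label that is most common among the nearest neighbors.
Set n_neighbors to the number of nearest neighbors to use.
Train the model using knn.fit and predict using knn.predict.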
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(input_training, target_training)
y_pred = knn.predict(input_testing)
Step 5:
Import metrics from sklearn to calculate mean squared error
from sklearn import metrics
mse = metrics.mean_squared_error(target_testing, y_pred)
print("Mean Squared Error:", mse)
Output:
Jupyter screenshot:
Advantages and disadvantages
Library Train_Test_Split
Advantages:
Less time consumed
Easier to implement and code because the library function already exists
Disadvantages:
Inconsistent results
The dataset is split again, and the model re-trained and re-tested, every time the code is executed
Manually Random Ordered
Advantages:
Consistent results
The split can be customized based on specific conditions
Disadvantages:
Time consuming
Model performance can be biased