
KNN Age Prediction Model for Asian Dataset Based on 19 Lengths of Left-Hand Bones

Male Dataset
Using the train_test_split Library

Step 1:
Open Anaconda Navigator and launch Jupyter Notebook
In the CSV file, add a column with =RANDBETWEEN(x,y), where x is 1 and y is the number of the last row
Sort the rows by this column, then split them into two files: one for training data and one for testing data (a pandas equivalent is sketched below)
Save the datasets in the same folder as your code
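
As a rough pandas equivalent of the Excel-based shuffle, the same random ordering and 70/30 split could be done in code; the input file name and the random_state value below are illustrative assumptions, not taken from this document.

import pandas as pd

# Load the full dataset (file name assumed for illustration)
data = pd.read_csv('xray_image_dataset_female.csv')

# Shuffle the rows, similar to sorting by a =RANDBETWEEN column in Excel
shuffled = data.sample(frac=1, random_state=42).reset_index(drop=True)

# Keep roughly 70% of the rows for training and 30% for testing (assumed ratio)
split_point = int(len(shuffled) * 0.7)
shuffled.iloc[:split_point].to_csv('female_train.csv', index=False)
shuffled.iloc[split_point:].to_csv('female_test.csv', index=False)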

Step 2:
Import the pandas library in the code.
This library is used to read the CSV files from the same folder
Load the datasets into the code

import pandas as pd

# Read the pre-split training and testing sets
f_train = pd.read_csv('C:/Users/ariny/OneDrive/Documents/female_train.csv')
f_test = pd.read_csv('C:/Users/ariny/OneDrive/Documents/female_test.csv')

Step 3:
Divide the datasets into input and target variables
Drop the unrelated columns to form the training and testing inputs, and keep ChrAge as the target

# train_test_split is imported here but not used, because the data is already split into separate CSV files
from sklearn.model_selection import train_test_split


input_training = f_train.drop(['No', 'Race', 'Gender', 'DOB', 'Exam Date', 'Tanner', 'Weight(kg)',
'Height(cm)', 'Trunk HT (cm)', 'ChrAge'], axis=1)
target_training = f_train['ChrAge']
input_testing = f_test.drop(['No', 'Race', 'Gender', 'DOB', 'Exam Date', 'Tanner', 'Weight(kg)',
'Height(cm)', 'Trunk HT (cm)', 'ChrAge'], axis=1)
target_testing = f_test['ChrAge']

Step 4:
Import KNeighborsClassifier from sklearn.neighbors
The classifier predicts an age by looking at the closest training samples, treating each age value as a class label
Change n_neighbors according to the number of nearest neighbours to use
Train the model using knn.fit and predict using knn.predict

from sklearn.neighbors import KNeighborsClassifier


# n_neighbors=1: each prediction uses the single nearest training sample
knn = KNeighborsClassifier(n_neighbors=1)

knn.fit(input_training, target_training)
y_pred = knn.predict(input_testing)

Step 5:
Import metrics from sklearn to calculate mean squared error

from sklearn import metrics


mse = metrics.mean_squared_error(target_testing, y_pred)
print("Mean Squared Error:", mse)

Output:
Using Manual Random Ordering

Step 1:
Open Anaconda Navigator and launch Jupyter Notebook
In the CSV file, add a column with =RANDBETWEEN(x,y), where x is 1 and y is the number of the last row
Sort the rows by this column, then split them into two files: one for training data and one for testing data
Save the datasets in the same folder as your code

Step 2:
Import the pandas library in the code.
This library is used to read the CSV files from the same folder
Load the datasets into the code

import pandas as pd
m_train = pd.read_csv('C:/Users/ariny/OneDrive/Documents/male_train.csv')
m_test = pd.read_csv('C:/Users/ariny/OneDrive/Documents/male_test.csv')

Step 3:
Divide the datasets into input and target variables
Drop the unrelated columns to form the training and testing inputs, and keep ChrAge as the target

from sklearn.model_selection import train_test_split


input_training = m_train.drop(['Bil', 'Race', 'Gender', 'DOB', 'Exam Date', 'Tanner', 'Weightkg',
'Heightcm', 'Trunk HTcm', 'ChrAge'], axis=1)
target_training = m_train['ChrAge']
input_testing = m_test.drop(['Bil', 'Race', 'Gender', 'DOB', 'Exam Date', 'Tanner', 'Weightkg',
'Heightcm', 'Trunk HTcm', 'ChrAge'], axis=1)
target_testing = m_test['ChrAge']

Step 4:
Import KNeighborsClassifier from sklearn.neighbors
The classifier predicts an age by looking at the closest training samples, treating each age value as a class label
Change n_neighbors according to the number of nearest neighbours to use
Train the model using knn.fit and predict using knn.predict

from sklearn.neighbors import KNeighborsClassifier


knn = KNeighborsClassifier(n_neighbors=1)

knn.fit(input_training, target_training)
y_pred = knn.predict(input_testing)

Step 5:
Import metrics from sklearn to calculate mean squared error

from sklearn import metrics


mse = metrics.mean_squared_error(target_testing, y_pred)
print("Mean Squared Error:", mse)

Output:

Jupyter screenshot:
Female Dataset
Using the train_test_split Library

Step 1:
Open Anaconda Navigator and launch Jupyter Notebook
Save the xray_image_dataset_female.csv dataset in the same folder as your code
Create a new Python notebook and start coding

Step 2:
Import the pandas library in the code.
This library is used to read the CSV file from the same folder
Load the dataset into the code

import pandas as pd
f_data = pd.read_csv('C:/Users/ariny/OneDrive/Documents/xray_image_dataset_female.csv')
f_data

Step 3:
Split the dataset into training and testing sets using the function train_test_split.
This function selects the training and testing rows randomly.
Import train_test_split from sklearn.model_selection
Drop the unrelated columns from the dataset
Set y as the age to be predicted (ChrAge)
Set test_size to 0.3 so that 70% of the data is used for training and 30% for testing

from sklearn.model_selection import train_test_split


X = f_data.drop(['No', 'Race', 'Gender', 'DOB', 'Exam Date', 'Tanner', 'Weight(kg)', 'Height(cm)',
'Trunk HT (cm)', 'ChrAge'], axis=1)
y = f_data['ChrAge']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
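
Each call to train_test_split draws a new random split, so the MSE can change from run to run. Passing a fixed random_state makes the split reproducible; the value 42 below is an arbitrary choice for illustration, not from the original document.

# Reproducible variant of the split above (random_state value is assumed)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)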

Step 4:
Import KNeighborsClassifier from sklearn.neighbors
The classifier predicts an age by looking at the closest training samples, treating each age value as a class label
Change n_neighbors according to the number of nearest neighbours to use
Train the model using knn.fit and predict using knn.predict

from sklearn.neighbors import KNeighborsClassifier


knn = KNeighborsClassifier(n_neighbors=1)

knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

Step 5:
Import metrics from sklearn
Calculate the mean squared error for the dataset

from sklearn import metrics


mse = metrics.mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

Output:
Using Manual Random Ordering

Step 1:
Open Anaconda Navigator and launch Jupyter Notebook
In the CSV file, add a column with =RANDBETWEEN(x,y), where x is 1 and y is the number of the last row
Sort the rows by this column, then split them into two files: one for training data and one for testing data
Save the datasets in the same folder as your code

Step 2:
Import the pandas library in the code.
This library is used to read the CSV files from the same folder
Load the datasets into the code

import pandas as pd
f_train = pd.read_csv('C:/Users/ariny/OneDrive/Documents/female_train.csv')
f_test = pd.read_csv('C:/Users/ariny/OneDrive/Documents/female_test.csv')

Step 3:
Divide the datasets into input and target variables
Drop the unrelated columns to form the training and testing inputs, and keep ChrAge as the target

from sklearn.model_selection import train_test_split


input_training = f_train.drop(['No', 'Race', 'Gender', 'DOB', 'Exam Date', 'Tanner', 'Weight(kg)',
'Height(cm)', 'Trunk HT (cm)', 'ChrAge'], axis=1)
target_training = f_train['ChrAge']
input_testing = f_test.drop(['No', 'Race', 'Gender', 'DOB', 'Exam Date', 'Tanner', 'Weight(kg)',
'Height(cm)', 'Trunk HT (cm)', 'ChrAge'], axis=1)
target_testing = f_test['ChrAge']

Step 4:
Import KNeighborsClassifier from sklearn.neighbors
The classifier predicts an age by looking at the closest training samples, treating each age value as a class label
Change n_neighbors according to the number of nearest neighbours to use
Train the model using knn.fit and predict using knn.predict

from sklearn.neighbors import KNeighborsClassifier


knn = KNeighborsClassifier(n_neighbors=1)

knn.fit(input_training, target_training)
y_pred = knn.predict(input_testing)

Step 5:
Import metrics from sklearn to calculate mean squared error

from sklearn import metrics


mse = metrics.mean_squared_error(target_testing, y_pred)
print("Mean Squared Error:", mse)

Output:

Jupyter screenshot:
Advantages and disadvantages

Library train_test_split

Advantages:
Less time consumed
Easier to implement and code because the library already exists

Disadvantages:
Inconsistent results, because a new random split is drawn on each run
The dataset has to be re-split, re-trained and re-tested every time the code is executed

Manual Random Ordering

Advantages:
Consistent results
The split can be customized based on specific conditions

Disadvantages:
Time consuming
Model performance can be biased
