Winter semester 23-24
Course code MDI4001
Course name Machine Learning for DataScience (LAB)
Submitted to:
Jyotismita Chaki
Jyotismita@vit.ac.in
FAT Exam
Submitted by :
Shiny. S (21MID0079)
Shiny.2021@vitstudent.ac.in
Date: 29 April 2024
a) Performing the preprocessing steps in the given dataset
CODE:
import pandas as pd
import numpy as np
data = pd.read_csv("agriculture_dataset.csv")
data.head()
data.info()
data.isnull().sum()
data.describe()
SCREENSHOT :
There are no null values in the dataset, so no further preprocessing steps are needed.
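The conclusion above rests on the null-value check alone. As a minimal sketch, assuming a small made-up frame (only the 'Plant type' column name comes from this report; the other column names and all values are hypothetical), the same kind of check can be extended to duplicate rows and non-numeric columns:

```python
import pandas as pd

# Hypothetical miniature frame; values and most column names are illustrative only.
df_demo = pd.DataFrame({
    "Soil pH": [6.5, 7.0, 6.5, 5.8],
    "Rainfall": [120.0, 95.0, 120.0, 140.0],
    "Plant type": ["A", "B", "A", "C"],
})

# Missing values per column
null_counts = df_demo.isnull().sum()
# Number of rows that exactly repeat an earlier row
n_duplicates = int(df_demo.duplicated().sum())
# Columns that would need encoding before being used as numeric features
non_numeric = df_demo.select_dtypes(exclude="number").columns.tolist()

print(null_counts)
print("duplicate rows:", n_duplicates)
print("non-numeric columns:", non_numeric)
```

If any of these checks reported a problem on the real dataset, a step such as dropping duplicates or encoding the column would be the natural follow-up.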
b) Divide the dataset into train, validation, and test sets.
CODE :
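from sklearn.model_selection import train_test_split
# Features: the first six columns; target: the 'Plant type' column
X = data.iloc[:, 0:6]
y = data['Plant type']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)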
SCREENSHOT:
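The code in this part produces a train set and a test set only. One possible way to obtain a separate validation set as well is to call train_test_split twice. The sketch below uses synthetic data, and every name ending in _demo is a hypothetical stand-in rather than a variable from this report; the 60/20/20 proportions are an assumption too.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the features and labels (hypothetical data).
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(100, 6))
y_demo = rng.integers(0, 3, size=100)

# Step 1: hold out 20% of all rows as the test set.
X_trainval, X_test_demo, y_trainval, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo)

# Step 2: take 25% of the remaining 80% as validation (0.25 * 0.8 = 0.2 overall).
X_train_demo, X_val_demo, y_train_demo, y_val_demo = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=42, stratify=y_trainval)

print(len(X_train_demo), len(X_val_demo), len(X_test_demo))
```

With a split like this, hyperparameters can be chosen on the validation set and the test set is touched only once, for the final score.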
c) Use a suitable hyperparameter-tuned ML model to train the dataset.
A Random Forest classifier is chosen for this dataset. A baseline model is trained first, and its hyperparameters are tuned in part d).
CODE :
from sklearn.ensemble import RandomForestClassifier
# Baseline forest with 100 trees
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(X_train, y_train)
y_pred = rf_classifier.predict(X_test)
SCREENSHOT:
d) After training, validate it and test the model’s performance
CODE:
from sklearn.metrics import accuracy_score
model = RandomForestClassifier(random_state=1, max_depth=10)
model.fit(X_train, y_train)
# Accuracy on the training data
pred_train = model.predict(X_train)
train_score = accuracy_score(y_train, pred_train)
print('train_accuracy_score', train_score)
# Accuracy on the held-out split (X_test doubles as the validation set here)
pred_val = model.predict(X_test)
val_score = accuracy_score(y_test, pred_val)
print('val_accuracy_score', val_score)
SCREENSHOT:
Tuning the hyperparameters for better validation accuracy:
CODE:
from sklearn.metrics import (accuracy_score, confusion_matrix, precision_score,
                             recall_score, ConfusionMatrixDisplay)
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint
param_dist = {'n_estimators': randint(50,500),'max_depth': randint(1,20)}
rf = RandomForestClassifier()
rand_search = RandomizedSearchCV(rf,param_distributions = param_dist, n_iter=5,
cv=5)
rand_search.fit(X_train, y_train)
# Create a variable for the best model
best_rf = rand_search.best_estimator_
# Print the best hyperparameters
print('Best hyperparameters:', rand_search.best_params_)
# Generate predictions with the best model
pred_train = best_rf.predict(X_train)
train_score = accuracy_score(y_train,pred_train)
print('train_accuracy_score',train_score)
pred_val = best_rf.predict(X_test)
val_score = accuracy_score(y_test,pred_val)
print('val_accuracy_score',val_score)
SCREENSHOT:
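The search above prints only the best hyperparameters. RandomizedSearchCV also exposes best_score_, the mean cross-validated score of the best candidate, which gives a validation estimate without touching the test set. The sketch below is an illustration on synthetic data (make_classification and the _demo names are assumptions, not part of the original experiment), with random_state set so that the sampled candidates are reproducible:

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (hypothetical; not the agriculture dataset).
X_demo, y_demo = make_classification(
    n_samples=150, n_features=6, n_informative=4, n_classes=3, random_state=0)

# Small search space and few iterations so the sketch runs quickly.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": randint(10, 50),
                         "max_depth": randint(1, 10)},
    n_iter=3, cv=3, scoring="accuracy", random_state=0)
search.fit(X_demo, y_demo)

print("Best hyperparameters:", search.best_params_)
print("Mean cross-validated accuracy:", search.best_score_)
```

Comparing best_score_ with the training accuracy is one way to judge whether the tuned model is overfitting.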
Creating the confusion matrix:
CODE:
cm = confusion_matrix(y_test,pred_val)
ConfusionMatrixDisplay(confusion_matrix=cm).plot()
SCREENSHOT:
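precision_score and recall_score are imported earlier but not used. For a multi-class target such as 'Plant type', both need an averaging strategy. The sketch below uses macro averaging on made-up labels; the class names and predictions are hypothetical stand-ins for y_test and pred_val.

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical true and predicted labels (illustrative only).
y_true_demo = ["wheat", "rice", "rice", "maize", "wheat", "maize"]
y_pred_demo = ["wheat", "rice", "maize", "maize", "wheat", "rice"]

# Macro averaging: compute the metric per class, then take the unweighted mean.
macro_precision = precision_score(y_true_demo, y_pred_demo, average="macro")
macro_recall = recall_score(y_true_demo, y_pred_demo, average="macro")

print("macro precision:", macro_precision)
print("macro recall:", macro_recall)
```

Macro averaging weights every class equally, which may or may not be the right choice if the classes are imbalanced; average="weighted" is an alternative in that case.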
a) Perform the pre-processing steps if needed. If the pre-processing steps are not needed
CODE:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense
data = pd.read_csv('University_dataset.csv')
data.head()
data.isnull().sum()
SCREENSHOT:
b) Divide the dataset into train, validation, and test sets.
CODE:
X = data.iloc[:, 1:6].values
y = data.iloc[:, 6].values
SCREENSHOT:
c) Can we use an ANN to train the dataset? If yes, then create an ANN and train and validate the
model by using the dataset and write a discussion on the performance of the model on the answer
booklet given. If no, then write your justification on the answer booklet given.
CODE:
scaler = StandardScaler()
X = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Build the ANN
model = Sequential()
model.add(Dense(128, input_dim=5, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(1, activation='linear'))
# Compile the model
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mean_absolute_error'])
# Train the model
model.fit(X_train, y_train, epochs=100, batch_size=32, validation_data=(X_test, y_test))
# Evaluate the model (returns the loss and the compiled metric, here MAE)
loss, mae = model.evaluate(X_test, y_test)
print(f'Loss: {loss}, Mean Absolute Error: {mae}')
# Make predictions
predictions = model.predict(X_test)
SCREENSHOT:
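In part c) the StandardScaler is fitted on the full feature matrix before the split, so statistics from the test rows influence the scaling of the training rows. A possible alternative, sketched here on synthetic data (all names ending in _demo and the generated values are hypothetical), is to split first and fit the scaler on the training portion only:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for five input features and a continuous target.
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(200, 5))
y_demo = rng.uniform(size=200)

# Split first, so the test rows stay unseen during scaling.
X_tr_demo, X_te_demo, y_tr_demo, y_te_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42)

# Fit the scaler on the training rows only, then reuse it for the test rows.
scaler_demo = StandardScaler()
X_tr_scaled = scaler_demo.fit_transform(X_tr_demo)
X_te_scaled = scaler_demo.transform(X_te_demo)

print(X_tr_scaled.mean(axis=0).round(6))
print(X_te_scaled.shape)
```

The scaled training columns have zero mean by construction, while the test columns are only approximately centred, which is the expected behaviour when the scaler has never seen them.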