Internship Report - ML
21INT68 - Innovation/Entrepreneurship/Societal Internship
Submitted by
1BI21CS140 Shreya VR
2023-2024
Certificate
ACKNOWLEDGEMENT
The satisfaction and euphoria that accompany the successful completion of any task
would be incomplete without thanking those who made it possible and whose
guidance and encouragement made my efforts successful. So, my sincere thanks to all
those who have supported me in completing this Internship successfully.
My sincere thanks to Dr. M. U. Aswath, Principal, BIT and Dr. J. Girija, HOD,
Department of CSE, BIT for their encouragement, support and guidance to the student
community in all fields of education. I am grateful to our institution for providing us with a
congenial atmosphere to carry out the Internship successfully.
I extend my sincere thanks to all the department faculty members and non-teaching staff
for supporting me directly or indirectly in the completion of this Internship.
NAME: SHREYA VR
USN: 1BI21CS140
TABLE OF CONTENTS
Chapter 1 - Introduction 1
1.1 Overview 1
1.2 Objective 1
1.3 Purpose, Scope and Applicability 2
1.3.1 Purpose 2
1.3.2 Scope 2
1.3.3 Applicability 3
1.4 Organization of Report 4
Chapter 2 - Problem Statement 5
Chapter 3 - Methodology/System Architecture/Algorithm 7
Chapter 4 - Tools/Technologies 9
Chapter 5 - Implementation 11
5.1 Source code 11
Chapter 6 - Results 17
Chapter 7 - Reflection Notes 20
Chapter 8 - References 21
Internship Certificate 22
LIST OF FIGURES
Figure No.  Description  Page No.
2.1  Sample Dataset  6
3.1  System Architecture  7
6.1  Results for Random Forest Classifier  17
6.2  Results for Decision Tree Classifier  17
6.3  Results for Support Vector Machine  18
6.4  (a) Prediction on new data_1; (b) Output for new data_1  18
6.5  (a) Prediction on new data_2; (b) Output for new data_2  19
Chapter 1
Introduction
1.1 Overview
Machine Learning models hold great promise in diabetes prediction by providing insights
into individual risk profiles and empowering both patients and healthcare providers to
take proactive steps towards better management and prevention of diabetes.
1.2 Objective
The objective of this project is to develop and evaluate machine learning models for the
prediction of diabetes based on healthcare data. The primary goal is to create predictive
models capable of accurately identifying individuals at risk of developing diabetes before
symptoms appear. Through data analysis, preprocessing, and model training, the project
aims to harness the predictive power of machine learning algorithms to improve early
detection and management of diabetes.
Furthermore, the project seeks to explore and compare the performance of different
machine learning classifiers, including Random Forest, Decision Tree, and Support Vector
Machine (SVM), in predicting diabetes. By employing techniques such as grid search for
hyperparameter tuning and cross-validation for robust evaluation, the project aims to
identify the most effective model for diabetes prediction. Ultimately, the project aims to
provide insights into the utility of machine learning approaches in healthcare and
contribute to the advancement of predictive modeling in diabetes care.
1.3 Purpose, Scope and Applicability
1.3.1 Purpose
The purpose of this project is to develop and evaluate a machine learning-based
predictive model for the early detection of diabetes. Diabetes is a prevalent chronic
disease that poses significant health risks and burdens on individuals and healthcare
systems worldwide.
Early detection plays a crucial role in preventing or delaying the onset of diabetes-related
complications, thus improving patient outcomes and reducing healthcare costs. By
leveraging machine learning algorithms and predictive modeling techniques, this project
aims to harness the power of data analytics to accurately predict an individual's risk of
developing diabetes based on various health parameters and risk factors.
1.3.2 Scope
The scope of this project encompasses the development, implementation, and evaluation
of a machine learning-based predictive model for diabetes detection. The project aims to
utilize a dataset containing relevant health parameters and risk factors to train and
evaluate various machine learning algorithms, including Random Forest, Decision Tree,
and Support Vector Machine classifiers. The project will explore different strategies for
handling missing data, outliers, and feature engineering to optimize the predictive model's
accuracy and reliability.
Additionally, the scope of the project extends to the evaluation and validation of the
predictive models using appropriate metrics such as accuracy, confusion matrices, and
classification reports.
Furthermore, the scope of the project extends to the practical application of the predictive
model in healthcare settings. This includes developing an interface for healthcare
professionals to input patient data and obtain predictions regarding their risk of
developing diabetes. Overall, the scope of this project is to provide a comprehensive
analysis of machine learning techniques for diabetes detection, with the aim of
contributing to the advancement of personalized, data-driven approaches to diabetes
management and prevention.
1.3.3 Applicability
The applicability of this project extends across various sectors of healthcare, offering
potential benefits to both patients and healthcare providers. Firstly, the developed
machine learning-based predictive model for diabetes detection can significantly improve
patient outcomes by enabling early intervention and personalized care. By accurately
identifying individuals at risk of developing diabetes before symptoms manifest,
healthcare professionals can initiate timely preventive measures, lifestyle modifications,
and medical interventions to mitigate the progression of the disease and reduce the
likelihood of complications.
The predictive model can be integrated into clinical practice to support healthcare
providers in making informed decisions regarding patient care. By incorporating the
predictive model into electronic health records or clinical decision support systems,
healthcare professionals can access real-time risk assessments and recommendations for
diabetes management during routine patient visits.
The applicability of this project extends beyond clinical settings to include wellness and
preventive healthcare programs. By empowering individuals with knowledge about their
risk of diabetes and providing them with tailored interventions and support, these
programs have the potential to improve overall population health outcomes and reduce
healthcare costs associated with diabetes-related complications.
The applicability of this project in healthcare spans across clinical practice, population
health management, and preventive healthcare initiatives, offering transformative benefits
to individuals, healthcare systems, and communities alike.
Chapter 2
Problem Statement
The binary outcome variable (Outcome) indicates whether a patient has diabetes (1) or
not (0). The goal is to create a robust predictive model that can assist in early diabetes
diagnosis and risk assessment, ultimately improving patient care and health outcomes.
Outcome: Class variable (either 0 or 1). 268 of the 768 values are 1, and the others are 0
(target variable).
Sample Input:
Number of Pregnancies: 5
Glucose Level: 130
Blood Pressure: 70 mmHg
Skin Thickness: 30 mm
Insulin Level: 80
Body Mass Index (BMI): 25
Diabetes Pedigree Function: 0.5
Age: 35
Sample Dataset: see Fig 2.1 (Sample Dataset).
Chapter 3
System Architecture
Data Collection:
The initial phase involves gathering relevant healthcare data from various sources,
including medical records, wearable devices, and health surveys. This data may include
patient demographics, medical history, biometric measurements (e.g., glucose levels,
blood pressure), and lifestyle factors. Techniques for data collection may involve
accessing electronic health records, utilizing health monitoring devices, and conducting
health assessments.
Data Preprocessing:
Raw healthcare data often requires preprocessing to ensure its quality and suitability for
model training. This crucial step involves cleaning the data to remove errors and
inconsistencies, handling missing values, and addressing outliers. Additionally, features
may be transformed or engineered to extract relevant information. For diabetes
prediction, preprocessing may include standardizing glucose levels, categorizing blood
pressure readings, and encoding categorical variables.
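As an illustration, a minimal preprocessing sketch in Python (assuming the Pima Indians Diabetes file name 'diabetes.csv' and its standard column names) could look like this:
# Minimal preprocessing sketch; the file name and column names are assumptions.
import pandas as pd
from sklearn.preprocessing import StandardScaler

data = pd.read_csv('diabetes.csv')
# Zero glucose readings are physiologically implausible, so treat them as missing
data['Glucose'] = data['Glucose'].replace(0, data['Glucose'].mean())
# Standardize all feature columns (everything except the Outcome label)
features = StandardScaler().fit_transform(data.drop('Outcome', axis=1))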
Model Training:
During this phase, machine learning models are trained on the preprocessed healthcare
data to predict the likelihood of diabetes occurrence.
Decision Trees are versatile models capable of handling both classification and regression
tasks. They partition the feature space into distinct regions based on feature values,
allowing for intuitive interpretation of decision-making processes.
The ANN model consists of multiple layers with dropout regularization to prevent
overfitting.
Hidden layers: Comprise several dense layers with ReLU activation functions to
introduce non-linearity.
Output layer: Single neuron with sigmoid activation function, producing binary
predictions (0 or 1) for diabetes outcome.
Model Construction:
Build an ANN model using the Keras Sequential API. Configure dense layers with various
activation functions and dropout regularization to prevent overfitting.
Model Compilation:
Compile the model using binary cross-entropy loss and the Adam optimizer. Set accuracy
as the evaluation metric.
Model Training:
Train the model on the training data for 50 epochs with a batch size of 128. Utilize a
validation split of 10% to monitor training progress, as sketched below.
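The compilation and training settings described above correspond to Keras calls along the following lines (a sketch only; the model definition and the training arrays X_train and y_train are assumed to be prepared as in Chapter 5):
# Compile with binary cross-entropy and Adam, tracking accuracy
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Train for 50 epochs, batch size 128, with a 10% validation split
history = model.fit(X_train, y_train, epochs=50, batch_size=128, validation_split=0.1)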
Prediction:
Once trained and evaluated, the predictive models can be deployed to make predictions
on new healthcare data. New patient data undergoes preprocessing similar to the training
data and is then fed into the trained models. The models leverage the learned patterns
from the training data to generate predictions regarding an individual's likelihood of
developing diabetes, aiding healthcare professionals in early detection and preventive
interventions.
These stages lay the foundation for deploying machine learning models as decision
support tools in clinical settings, empowering healthcare professionals with valuable
insights to tailor interventions, optimize treatment plans, and promote proactive strategies
for diabetes management and prevention.
Chapter 4
Tools/Technologies
To implement this project, I have used Jupyter Notebook and Python 3 (ipykernel). The
minimum requirements are:
Operating System:
Windows: The code can be executed on computers running Microsoft Windows
operating systems, including Windows 7, Windows 8, and Windows 10.
macOS: It is also compatible with macOS, the operating system used on Apple
Macintosh computers.
Linux: The code can run on various distributions of Linux, including Ubuntu,
Fedora, CentOS, and others.
Memory (RAM):
The amount of RAM required depends on the size of the dataset being processed
and the complexity of the machine learning models being trained.
For small to medium-sized datasets and relatively simple models, 4GB to 8GB of
RAM should be sufficient.
Processor (CPU):
The processor's speed and number of cores influence the code's execution time,
especially during data preprocessing and model training.
Any modern multi-core processor (e.g., Intel Core i5, i7, or i9 series for Intel
CPUs, or AMD Ryzen series for AMD CPUs) should be sufficient for running the
code.
Storage Space:
Recommended minimum: 200 MB free disk space for the VS Code installation.
Additional space will be required for your projects and dependencies.
Internet Connectivity:
Internet connectivity is not mandatory, but some features like extensions and
updates require an internet connection.
Additionally, one can use the Google Colab environment to run this code, which will
require an internet connection.
Programming Language:
Python is the programming language used in the code. Ensure that Python is installed on
the system. One can download and install Python from the official Python website.
Additionally, it's recommended to use a virtual environment management tool like
virtualenv or conda to create isolated Python environments for managing dependencies
and avoiding conflicts between different projects.
Development Environment:
A development environment is needed to write, execute, and manage the Python code. Popular
choices include Jupyter Notebooks, Google Colab, Anaconda, or any text editor/IDE
(Integrated Development Environment) like VSCode, PyCharm, or Sublime Text.
Chapter 5
Implementation
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
import seaborn as sns
import matplotlib.pyplot as plt
#Data Analysis
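# The loading step is not shown in the listing; the file name 'diabetes.csv'
# below is an assumption (Pima Indians Diabetes dataset).
data = pd.read_csv('diabetes.csv')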
data.head()
data.tail()
print("Number of Rows",data.shape[0])
print("Number of Columns",data.shape[1])
data.info()
data_copy = data.copy(deep=True)
data.columns
data['Glucose'] = data['Glucose'].replace(0,data['Glucose'].mean())
data['BloodPressure'] = data['BloodPressure'].replace(0,data['BloodPressure'].mean())
data['SkinThickness'] = data['SkinThickness'].replace(0,data['SkinThickness'].mean())
data['Insulin'] = data['Insulin'].replace(0,data['Insulin'].mean())
data['BMI'] = data['BMI'].replace(0,data['BMI'].mean())
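# Feature/target split and hold-out split (omitted in the listing); the 80/20
# split ratio and random_state are assumptions.
X = data.drop('Outcome', axis=1)
y = data['Outcome']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)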
#Training Model
pipeline_rf = Pipeline([
('imputer', SimpleImputer(strategy='mean')),
('scaler', StandardScaler()),
('classifier', RandomForestClassifier(random_state=42))
])
param_grid_rf = {
'classifier__n_estimators': [50, 100, 200],
'classifier__max_depth': [None, 10, 20],
'classifier__min_samples_split': [2, 5, 10],
'classifier__min_samples_leaf': [1, 2, 4],
'classifier__max_features': ['sqrt', 'log2']  # 'auto' is not accepted by recent scikit-learn
}
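# Grid search and evaluation for the Random Forest pipeline; this step is not
# shown in the listing, and 5-fold cross-validation is an assumption.
grid_rf = GridSearchCV(pipeline_rf, param_grid_rf, cv=5, n_jobs=-1)
grid_rf.fit(X_train, y_train)
y_pred_rf = grid_rf.predict(X_test)
accuracy_rf = accuracy_score(y_test, y_pred_rf)
conf_matrix_rf = confusion_matrix(y_test, y_pred_rf)
classification_report_str_rf = classification_report(y_test, y_pred_rf)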
print("RandomForestClassifier Results:")
print(f"Accuracy: {accuracy_rf}")
print(f"Confusion Matrix:\n{conf_matrix_rf}")
print("Classification Report:\n", classification_report_str_rf)
pipeline_dt = Pipeline([
('imputer', SimpleImputer(strategy='mean')),
('scaler', StandardScaler()),
('classifier', DecisionTreeClassifier(random_state=42))
])
param_grid_dt = {
'classifier__max_depth': [None, 10, 20],
'classifier__min_samples_split': [2, 5, 10],
'classifier__min_samples_leaf': [1, 2, 4],
'classifier__max_features': ['sqrt', 'log2']  # 'auto' is not accepted by recent scikit-learn
}
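# Grid search and evaluation for the Decision Tree pipeline; this step is not
# shown in the listing, and 5-fold cross-validation is an assumption.
grid_dt = GridSearchCV(pipeline_dt, param_grid_dt, cv=5, n_jobs=-1)
grid_dt.fit(X_train, y_train)
y_pred_dt = grid_dt.predict(X_test)
accuracy_dt = accuracy_score(y_test, y_pred_dt)
conf_matrix_dt = confusion_matrix(y_test, y_pred_dt)
classification_report_str_dt = classification_report(y_test, y_pred_dt)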
print("DecisionTreeClassifier Results:")
print(f"Accuracy: {accuracy_dt}")
print(f"Confusion Matrix:\n{conf_matrix_dt}")
print("Classification Report:\n", classification_report_str_dt)
pipeline_svm = Pipeline([
('imputer', SimpleImputer(strategy='mean')),
('scaler', StandardScaler()),
('classifier', SVC(random_state=42))
])
param_grid_svm = {
'classifier__C': [0.1, 1, 10],
'classifier__kernel': ['linear', 'rbf', 'poly'],
'classifier__gamma': ['scale', 'auto']
}
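# Grid search and evaluation for the SVM pipeline; this step is not shown in
# the listing, and 5-fold cross-validation is an assumption.
grid_svm = GridSearchCV(pipeline_svm, param_grid_svm, cv=5, n_jobs=-1)
grid_svm.fit(X_train, y_train)
y_pred_svm = grid_svm.predict(X_test)
print("Support Vector Machine Results:")
print(f"Accuracy: {accuracy_score(y_test, y_pred_svm)}")
print(f"Confusion Matrix:\n{confusion_matrix(y_test, y_pred_svm)}")
print("Classification Report:\n", classification_report(y_test, y_pred_svm))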
rf_model = RandomForestClassifier(n_estimators=100, max_depth=10, min_samples_split=2,
                                  min_samples_leaf=1, max_features='sqrt',  # 'auto' is not accepted by recent scikit-learn
                                  random_state=42)
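# Fit the fixed-hyperparameter Random Forest used for the new-data prediction
# below (the fit call is omitted in the listing).
rf_model.fit(X_train, y_train)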
# Example of new data (modify this according to your feature names and values)
new_data_example = pd.DataFrame({
'Pregnancies': [5],
'Glucose': [130],
'BloodPressure': [70],
'SkinThickness': [30],
'Insulin': [80],
'BMI': [25],
'DiabetesPedigreeFunction': [0.5],
'Age': [35]
})
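# Prediction on the new sample; the call itself is omitted in the listing,
# so reusing rf_model here is an assumption.
prediction = rf_model.predict(new_data_example)
print("Predicted Outcome for new data:", prediction[0])

# Imports for the ANN below; a TensorFlow/Keras backend is assumed.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout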
model = Sequential([
Dense(512, activation='relu', input_shape=(X_train.shape[1],)),
Dropout(0.5),
Dense(128, activation='relu'),
Dropout(0.5),
Dense(256, activation='relu'),
Dropout(0.5),
Dense(128, activation='relu'),
Dropout(0.5),
Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
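# Train the ANN as described in Chapter 3 (50 epochs, batch size 128, 10%
# validation split); this step is omitted in the listing.
history = model.fit(X_train, y_train, epochs=50, batch_size=128, validation_split=0.1)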
Chapter 6
Results
Fig 6.1: Results for Random Forest Classifier
Chapter 7
Reflection Notes
Delving into the realm of machine learning, Python emerged as a cornerstone due to its
rich ecosystem of libraries tailored for data analysis and modeling. scikit-learn, NumPy,
and Pandas stood out as indispensable tools, each playing pivotal roles in various stages
of the machine learning pipeline. NumPy provided a solid foundation for numerical
computations and efficient handling of arrays, while Pandas facilitated seamless data
manipulation and preprocessing tasks, thanks to its powerful DataFrame and Series
structures. Meanwhile, scikit-learn offered a comprehensive suite of machine learning
algorithms and utilities, simplifying the implementation and evaluation of predictive
models.
The Random Forest algorithm emerged as a particularly robust choice for predictive
modeling endeavors. Its ensemble nature, which harnesses the collective wisdom of
multiple decision trees, proved effective in enhancing predictive accuracy while
mitigating the risk of overfitting. By aggregating predictions from diverse individual
trees, Random Forests fostered robustness and resilience, making them well-suited for a
wide range of classification and regression tasks across different domains.
Chapter 8
References
6. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Vanderplas, J. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825-2830. https://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html