Stroke Prediction

The document outlines a stroke prediction project utilizing various machine learning models, including Logistic Regression, SVM, Decision Tree, and KNN. It details the data preprocessing steps, exploratory data analysis, and model training and evaluation processes. The dataset consists of health-related features, and the models achieved high accuracy scores in predicting stroke occurrences.

Uploaded by

Pulkita Aggarwal


7/5/23, 12:49 PM Stroke Prediction

Heart Stroke Prediction

Importing libraries
In [ ]: #Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics
from sklearn.metrics import accuracy_score
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import f1_score
from sklearn.metrics import mean_squared_error
from sklearn.metrics import log_loss

In [ ]: #Loading the dataset

df = pd.read_csv('healthcare-dataset-stroke-data.csv')
df.head()

Out[ ]:  id     gender  age   hypertension  heart_disease  ever_married  work_type      Residenc
0        9046   Male    67.0  0             1              Yes           Private
1        51676  Female  61.0  0             0              Yes           Self-employed
2        31112  Male    80.0  0             1              Yes           Private
3        60182  Female  49.0  0             0              Yes           Private
4        1665   Female  79.0  1             0              Yes           Self-employed

In [ ]: df.drop('id', axis=1, inplace=True)

Data Preprocessing
In [ ]: df.describe()

file:///E:/Data Science Course/Projects/Stroke Prediction.html 1/14


Out[ ]:  age          hypertension  heart_disease  avg_glucose_level  bmi          s
count    5110.000000  5110.000000   5110.000000    5110.000000        4909.000000  5110.00
mean     43.226614    0.097456      0.054012       106.147677         28.893237    0.04
std      22.612647    0.296607      0.226063       45.283560          7.854067     0.2
min      0.080000     0.000000      0.000000       55.120000          10.300000    0.00
25%      25.000000    0.000000      0.000000       77.245000          23.500000    0.00
50%      45.000000    0.000000      0.000000       91.885000          28.100000    0.00
75%      61.000000    0.000000      0.000000       114.090000         33.100000    0.00
max      82.000000    1.000000      1.000000       271.740000         97.600000    1.00

In [ ]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5110 entries, 0 to 5109
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 gender 5110 non-null object
1 age 5110 non-null float64
2 hypertension 5110 non-null int64
3 heart_disease 5110 non-null int64
4 ever_married 5110 non-null object
5 work_type 5110 non-null object
6 Residence_type 5110 non-null object
7 avg_glucose_level 5110 non-null float64
8 bmi 4909 non-null float64
9 smoking_status 5110 non-null object
10 stroke 5110 non-null int64
dtypes: float64(3), int64(3), object(5)
memory usage: 439.3+ KB

In [ ]: df['age'].astype(int)

Out[ ]: 0 67
1 61
2 80
3 49
4 79
..
5105 80
5106 81
5107 35
5108 51
5109 44
Name: age, Length: 5110, dtype: int32

In [ ]: #Checking for null values

df.isnull().sum()


Out[ ]: gender 0
age 0
hypertension 0
heart_disease 0
ever_married 0
work_type 0
Residence_type 0
avg_glucose_level 0
bmi 201
smoking_status 0
stroke 0
dtype: int64

In [ ]: #replacing the missing values with the most frequent value

df['bmi'].fillna(df['bmi'].mode()[0], inplace=True)
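The imputation step above can also be written by assigning the filled column back, which avoids calling an in-place method on a column selection. The following is a minimal sketch on a made-up `toy` frame (the values are illustrative assumptions, not rows from the dataset):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the 'bmi' column; the values are invented for illustration.
toy = pd.DataFrame({"bmi": [28.7, np.nan, 31.2, 28.7, np.nan, 24.0]})

# Most frequent value, the same statistic used in the cell above.
bmi_mode = toy["bmi"].mode()[0]

# Assign the filled column back instead of modifying the selection in place.
toy["bmi"] = toy["bmi"].fillna(bmi_mode)

print(bmi_mode)
print(toy["bmi"].isnull().sum())
```

Whether the mode, the median or another statistic is the better fill value depends on the distribution of `bmi`; the sketch only mirrors the choice made in the notebook.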

Check values and their count in the columns

In [ ]: print(df['ever_married'].value_counts())
print(df['work_type'].value_counts())
print(df['gender'].value_counts())
print(df['Residence_type'].value_counts())
print(df['smoking_status'].value_counts())

ever_married
Yes 3353
No 1757
Name: count, dtype: int64
work_type
Private 2925
Self-employed 819
children 687
Govt_job 657
Never_worked 22
Name: count, dtype: int64
gender
Female 2994
Male 2115
Other 1
Name: count, dtype: int64
Residence_type
Urban 2596
Rural 2514
Name: count, dtype: int64
smoking_status
never smoked 1892
Unknown 1544
formerly smoked 885
smokes 789
Name: count, dtype: int64

Replacing the values in columns with numerical values

Residence Type: Urban = 1, Rural = 0

Smoking Status: formerly smoked = 0, never smoked = 1, smokes = 2 (the code assigned to Unknown is cut off in the cell below)
Ever Married: Yes = 1, No = 0
Gender: Male = 1, Female = 0, Other = 2
Work Type: Private = 0, Self-employed = 1, children = 2, Govt_job = 3, Never_worked = 4

In [ ]: df['ever_married'].replace({'Yes':1, 'No':0}, inplace=True)

df['gender'].replace({'Male':1, 'Female':0,'Other':2}, inplace=True)
df['Residence_type'].replace({'Urban':1, 'Rural':0}, inplace=True)
df['smoking_status'].replace({'formerly smoked':0, 'never smoked':1, 'smokes':2,
df['work_type'].replace({'Private':0, 'Self-employed':1, 'children':2, 'Govt_job
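As a small illustration of the encoding step, the sketch below applies two of the mappings listed above to a made-up `toy` frame using `Series.map`. The frame and the dictionary names are hypothetical; only the Yes/No and Urban/Rural codes come from the notebook.

```python
import pandas as pd

# Toy frame with two of the categorical columns; rows are invented.
toy = pd.DataFrame({
    "ever_married": ["Yes", "No", "Yes"],
    "Residence_type": ["Urban", "Rural", "Urban"],
})

# Codes taken from the mapping described above.
married_codes = {"Yes": 1, "No": 0}
residence_codes = {"Urban": 1, "Rural": 0}

# map() returns a new column; categories missing from the dict would become NaN,
# which makes an incomplete mapping easy to spot.
toy["ever_married"] = toy["ever_married"].map(married_codes)
toy["Residence_type"] = toy["Residence_type"].map(residence_codes)

print(toy)
```

Note that integer codes impose an order on unordered categories such as work type, which distance-based models like KNN and SVM will treat as meaningful.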

Exploratory Data Analysis

Find correlation between the variables

In [ ]: df.corr()['stroke'][:-1].sort_values().plot(kind='bar')

Out[ ]: <Axes: >

In [ ]: plt.figure(figsize=(10,10))
sns.heatmap(df.corr(), annot=True)

Out[ ]: <Axes: >


In [ ]: # replace age with number wrt to age group

# 0 = 0-12 , 1 = 13-19 , 2 = 20-30 , 3 = 31-60 , 4 = 61-100
df['age'] = pd.cut(x=df['age'], bins=[0, 12, 19, 30, 60, 100], labels=[0, 1, 2,
df.head()

Out[ ]:  gender  age  hypertension  heart_disease  ever_married  work_type  Residence_type
0        1       4    0             1              1             0          1
1        0       4    0             0              1             1          0
2        1       4    0             1              1             0          0
3        0       3    0             0              1             1          0
4        0       4    1             0              1             1          0
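To make the binning concrete, here is a minimal sketch of `pd.cut` applied to the five ages shown in the first `df.head()` output (67, 61, 80, 49, 79). The label list `[0, 1, 2, 3, 4]` is taken from the comment in the cell above, since the code line itself is cut off; treat it as an assumption.

```python
import pandas as pd

# Ages of the first five rows, as shown in the first df.head() output.
ages = pd.Series([67.0, 61.0, 80.0, 49.0, 79.0])

# Bin edges from the cell above; each bin is right-inclusive, e.g. (30, 60].
bins = [0, 12, 19, 30, 60, 100]
labels = [0, 1, 2, 3, 4]  # assumed from the comment: 0 = 0-12 ... 4 = 61-100

age_group = pd.cut(x=ages, bins=bins, labels=labels)

print(age_group.astype(int).tolist())
```

The result is a categorical column, so it may need converting with `astype(int)` before being passed to functions that expect numeric input.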

Visualizing the data

In [ ]: sns.countplot(x = 'gender', data = df)


Out[ ]: <Axes: xlabel='gender', ylabel='count'>

In [ ]: fig, ax = plt.subplots(4,4,figsize=(20, 20))

sns.countplot(x = 'gender', data = df,hue = 'stroke', ax=ax[0,0])
sns.countplot(x = 'age', data = df,hue = 'hypertension', ax=ax[0,1])
sns.countplot(x = 'age', data = df,hue = 'heart_disease', ax=ax[0,2])
sns.countplot(x = 'age', data = df,hue = 'stroke', ax=ax[0,3])
sns.countplot(x = 'hypertension', data = df,hue = 'stroke', ax=ax[1,0])
sns.countplot(x = 'heart_disease', data = df,hue = 'stroke', ax=ax[1,1])
sns.countplot(x = 'ever_married', data = df,hue = 'stroke', ax=ax[1,2])
sns.countplot(x = 'age', data = df,hue = 'ever_married', ax=ax[1,3])
sns.countplot(x = 'work_type', data = df,hue = 'stroke', ax=ax[2,0])
sns.countplot(x = 'Residence_type', data = df,hue = 'stroke', ax=ax[2,1])
sns.countplot(x = 'smoking_status', data = df,hue = 'stroke', ax=ax[2,2])
sns.lineplot(x = 'bmi', y = 'avg_glucose_level', data = df,hue = 'stroke', ax=ax
sns.countplot(x = 'age', data = df,hue = 'smoking_status', ax=ax[3,0])
sns.countplot( x = 'work_type', data = df,hue = 'Residence_type', ax=ax[3,1])
sns.countplot(x = 'work_type', data = df,hue = 'smoking_status', ax=ax[3,2])
sns.countplot(x = 'Residence_type', data = df,hue = 'smoking_status', ax=ax[3,3]

Out[ ]: <Axes: xlabel='Residence_type', ylabel='count'>


Train Test Split

In [ ]: X_train, X_test, y_train, y_test = train_test_split(df.drop('stroke', axis=1), d
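Because stroke cases are rare in this dataset (the mean of the `stroke` column is about 0.04), it can help to keep the class ratio the same in both splits. The sketch below shows a stratified split on synthetic data; the array shapes, `test_size=0.2` and `random_state=42` are illustrative assumptions and are not read from the truncated cell above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 1000 rows, 3 features, 5% positive labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = np.zeros(1000, dtype=int)
y[:50] = 1

# stratify=y keeps the share of positives equal in the train and test parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(len(X_te), int(y_te.sum()), int(y_tr.sum()))
```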

Model Training

Logistic Regression
In [ ]: lr = LogisticRegression()
lr

Out[ ]: ▾ LogisticRegression

LogisticRegression()

In [ ]: #training the model

lr.fit(X_train, y_train)
lr.score(X_test, y_test)


C:\Users\DELL\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\sklearn\linear_model\_logistic.py:458: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
Out[ ]: 0.9393346379647749

In [ ]: #testing the model

lr_pred = lr.predict(X_test)
accuracy_score(y_test, lr_pred)

Out[ ]: 0.9393346379647749
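The ConvergenceWarning above suggests increasing `max_iter` or scaling the data. A minimal sketch of both, assuming a pipeline with `StandardScaler` and `max_iter=1000` (neither is used in the original notebook) and synthetic features on different scales:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (invented for illustration).
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(100, 45, 500),   # glucose-like column
    rng.normal(29, 8, 500),     # bmi-like column
    rng.integers(0, 2, 500),    # binary flag
])
y = (X[:, 0] + rng.normal(0, 45, 500) > 130).astype(int)

# Scale first, then fit; max_iter=1000 is an assumed, generous iteration cap.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

# Number of lbfgs iterations actually used by the fitted classifier.
n_iter = model.named_steps["logisticregression"].n_iter_[0]
print(n_iter)
```

Putting the scaler inside the pipeline means it is fitted on the training data only, so the same object can be used with `fit` on the training split and `score` on the test split.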

Support Vector Machine (SVM)

In [ ]: from sklearn.svm import SVC
svm = SVC()
svm

Out[ ]: ▾ SVC

SVC()

In [ ]: #training the model

svm.fit(X_train, y_train)
svm.score(X_test, y_test)

Out[ ]: 0.9393346379647749

In [ ]: #testing the model

sv_pred = svm.predict(X_test)
accuracy_score(y_test, sv_pred)

Out[ ]: 0.9393346379647749

Decision Tree Classifier

In [ ]: from sklearn.tree import DecisionTreeClassifier
dt = DecisionTreeClassifier()
dt

Out[ ]: ▾ DecisionTreeClassifier

DecisionTreeClassifier()

In [ ]: #training the model

dt.fit(X_train, y_train)
dt.score(X_test, y_test)


Out[ ]: 0.9099804305283757

In [ ]: #testing the model

dt_pred = dt.predict(X_test)
accuracy_score(y_test, dt_pred)

Out[ ]: 0.9099804305283757

K-Nearest Neighbors (KNN)

In [ ]: knn = KNeighborsClassifier()
knn

Out[ ]: ▾ KNeighborsClassifier

KNeighborsClassifier()

In [ ]: #training the model

knn.fit(X_train, y_train)
knn.score(X_test, y_test)

Out[ ]: 0.9373776908023483

In [ ]: #testing the model

knn_pred = knn.predict(X_test)
accuracy_score(y_test, knn_pred)

Out[ ]: 0.9373776908023483

Model Evaluation

Logistic Regression
In [ ]: sns.heatmap(metrics.confusion_matrix(y_test, lr_pred), annot=True, fmt='d')
plt.title('Accuracy Score: {}'.format(accuracy_score(y_test, lr_pred)))
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()


In [ ]: print('Logistic Regression Model Accuracy Score:',accuracy_score(y_test, lr_pred

print('Logistic Regression Model F1 score: ',metrics.f1_score(y_test, lr_pred))
print('Logistic Regression Model Mean Absolute Error: ',metrics.mean_absolute_er
print('Logistic Regression Model Mean Squared Error: ',metrics.mean_squared_erro
print('Logistic Regression Model log loss: ',log_loss(y_test, lr_pred))

Logistic Regression Model Accuracy Score: 0.9393346379647749

Logistic Regression Model F1 score: 0.0
Logistic Regression Model Mean Absolute Error: 0.060665362035225046
Logistic Regression Model Mean Squared Error: 0.060665362035225046
Logistic Regression Model log loss: 2.1866012819229583
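An accuracy of about 0.939 together with an F1 score of 0.0 is what a model produces when it never predicts the positive class correctly. The sketch below reproduces that pattern with hypothetical labels: 1022 test rows with 62 positives and a classifier that always predicts 0. These counts are an assumption chosen so that the accuracy matches the reported value; they are not taken from the notebook's confusion matrix.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, recall_score

# Hypothetical test labels: 62 stroke cases among 1022 rows (assumed counts).
y_true = [1] * 62 + [0] * 960
# A classifier that always predicts "no stroke".
y_pred = [0] * 1022

acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, zero_division=0)
rec = recall_score(y_true, y_pred, zero_division=0)
cm = confusion_matrix(y_true, y_pred)  # rows = actual, columns = predicted

print(acc, f1, rec)
print(cm)
```

On imbalanced data like this, recall and F1 for the positive class say more about the model than accuracy does.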

Support Vector Machine (SVM)

In [ ]: sns.heatmap(metrics.confusion_matrix(y_test, sv_pred), annot=True, fmt='d')
plt.title('Accuracy Score: {}'.format(accuracy_score(y_test, sv_pred)))
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()


In [ ]: print('SVM Model Accuracy Score:',accuracy_score(y_test, sv_pred))

print('SVM Model F1 score: ',metrics.f1_score(y_test, sv_pred))
print('SVM Model Mean Absolute Error: ',metrics.mean_absolute_error(y_test, sv_p
print('SVM Model Mean Squared Error: ',metrics.mean_squared_error(y_test, sv_pre
print('SVM Model log loss: ',log_loss(y_test, sv_pred))

SVM Model Accuracy Score: 0.9393346379647749

SVM Model F1 score: 0.0
SVM Model Mean Absolute Error: 0.060665362035225046
SVM Model Mean Squared Error: 0.060665362035225046
SVM Model log loss: 2.1866012819229583
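The log loss above is computed from hard 0/1 predictions rather than predicted probabilities, so it carries no more information than the error rate. A worked check, under the assumption (inferred from the numbers, not confirmed by the source) that each wrong hard prediction is clipped to machine epsilon before the logarithm is taken:

```python
import math
import sys

# Assumed clipping value: float64 machine epsilon.
eps = sys.float_info.epsilon

# Penalty contributed by one fully wrong hard prediction, about 36.04.
penalty_per_error = -math.log(eps)

# Error rate reported above as the mean absolute error.
error_rate = 0.060665362035225046

implied_log_loss = error_rate * penalty_per_error
print(implied_log_loss)
```

The product agrees with the reported log loss of about 2.1866. To obtain a log loss that reflects the model's confidence, the metric would need class probabilities (for example from `predict_proba`) instead of predicted labels.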

Decision Tree Classifier

In [ ]: sns.heatmap(metrics.confusion_matrix(y_test, dt_pred), annot=True, fmt='d')
plt.title('Accuracy Score: {}'.format(accuracy_score(y_test, dt_pred)))
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()


In [ ]: print('Decision Tree Model Accuracy Score:',accuracy_score(y_test, dt_pred))

print('Decision Tree Model F1 score: ',metrics.f1_score(y_test, dt_pred))
print('Decision Tree Model Mean Absolute Error: ',metrics.mean_absolute_error(y_
print('Decision Tree Model Mean Squared Error: ',metrics.mean_squared_error(y_te
print('Decision Tree Model log loss: ',log_loss(y_test, dt_pred))

Decision Tree Model Accuracy Score: 0.9099804305283757

Decision Tree Model F1 score: 0.2459016393442623
Decision Tree Model Mean Absolute Error: 0.09001956947162426
Decision Tree Model Mean Squared Error: 0.09001956947162426
Decision Tree Model log loss: 3.2446341602727773

K-Nearest Neighbors (KNN)

In [ ]: sns.heatmap(metrics.confusion_matrix(y_test, knn_pred), annot=True, fmt='d')
plt.title('Accuracy Score: {}'.format(accuracy_score(y_test, knn_pred)))
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()


In [ ]: print('KNN Model Accuracy Score:',accuracy_score(y_test, knn_pred))

print('KNN Model F1 score: ',metrics.f1_score(y_test, knn_pred))
print('KNN Model Mean Absolute Error: ',metrics.mean_absolute_error(y_test, knn_
print('KNN Model Mean Squared Error: ',metrics.mean_squared_error(y_test, knn_pr
print('KNN Model log loss: ',log_loss(y_test, knn_pred))

KNN Model Accuracy Score: 0.9373776908023483

KNN Model F1 score: 0.0
KNN Model Mean Absolute Error: 0.06262230919765166
KNN Model Mean Squared Error: 0.06262230919765166
KNN Model log loss: 2.2571368071462796

Model Comparison
In [ ]: models = ['Logistic Regression', 'SVM', 'Decision Tree', 'KNN']
accuracy = [accuracy_score(y_test, lr_pred), accuracy_score(y_test, sv_pred), ac
plt.figure(figsize=(10,5))
plt.bar(models, accuracy, color = 'Maroon', width = 0.4)
plt.xlabel('Models')
plt.ylabel('Accuracy')
plt.title('Model Accuracy')
plt.show()
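The comparison above plots accuracy only. As a small sketch, the accuracy and F1 values printed in the evaluation cells can be collected in one table so that both metrics are visible side by side; the `scores` frame is a hypothetical helper, and the numbers are copied from the outputs above.

```python
import pandas as pd

# Accuracy and F1 values copied from the evaluation outputs above.
scores = pd.DataFrame(
    {
        "accuracy": [0.9393346379647749, 0.9393346379647749,
                     0.9099804305283757, 0.9373776908023483],
        "f1": [0.0, 0.0, 0.2459016393442623, 0.0],
    },
    index=["Logistic Regression", "SVM", "Decision Tree", "KNN"],
)

# Rank by F1 to see which model finds any stroke cases at all.
ranked = scores.sort_values("f1", ascending=False)
print(ranked.round(3))
```

Ranked this way, the Decision Tree comes first even though it has the lowest accuracy of the four models.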


Conclusion
Logistic Regression and SVM reach the same test accuracy, 93.9 %, and KNN is close at 93.7 %.
The accuracy of the Decision Tree Classifier is 91.0 %. On accuracy alone, any of these models
could be used to predict stroke. However, Logistic Regression, SVM and KNN all have an F1
score of 0.0, which means they identified no stroke cases correctly in the test set. Only the
Decision Tree (F1 of about 0.25) did, so accuracy by itself overstates how useful the models are.

According to the graphs of age against hypertension and heart disease, the number of people
having a stroke shows a dependence on heart disease and hypertension. But when heart disease
and hypertension are plotted against stroke, more stroke cases appear among people without
hypertension or heart disease. This is peculiar and needs to be investigated further. In
addition, non-smokers show more stroke cases than smokers, which is also peculiar and needs
further investigation. People with a BMI between 20 and 50 have a higher chance of stroke.

Last but not least, other features such as marital status, residence type and work type also
show an effect on the chances of stroke.
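One way to follow up on the count-based observations is to compare stroke rates per group rather than raw counts, since a larger group can contain more cases while having a lower rate. A minimal sketch on an invented `toy` frame (the numbers are made up to illustrate the effect and are not drawn from the dataset):

```python
import pandas as pd

# Invented data: 90 people without hypertension (4 strokes),
# 10 people with hypertension (2 strokes).
toy = pd.DataFrame({
    "hypertension": [0] * 90 + [1] * 10,
    "stroke": [1] * 4 + [0] * 86 + [1] * 2 + [0] * 8,
})

# Count of cases, group size, and stroke rate for each group.
summary = toy.groupby("hypertension")["stroke"].agg(
    cases="sum", total="count", rate="mean"
)
print(summary)
```

In this toy example the group without hypertension has more stroke cases but a lower stroke rate, which is the kind of pattern a count plot alone can hide.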
