I’m happy to share the project I completed: "Prognosis of Cardiovascular Diseases with COVID Vaccines through Naive Bayes and Linear Regression Supervised ML Models".
🔗GitHub Link:
https://lnkd.in/gxksDf4J
☄️Published literature is available supporting the hypothesis that COVID vaccines trigger heart problems in elderly patients.
☄️One of the publications supporting the hypothesis/problem statement: https://lnkd.in/gctuJs8X
🔗VAERS Dataset Link:
https://lnkd.in/gRHjKGun
📌Data Collection & Data Wrangling:
1. Download the 2020, 2021, 2022, and 2023 vaccine data files from VAERS.
2. For each year, the Patients, Vaccines, and Adverse Events data are available as separate files. To create a single data file per year, combine their columns with a ‘Full Outer Join’ using VAERS_ID as the ‘Primary Key’, and save the result under the name of the respective year.
3. Compile the 4 yearly data files (2020, 2021, 2022, 2023) by appending all rows into a Single Master Tracker, which yields 13,34,040 patient records.
4. Keep only the adverse events that occurred less than 30 days after the vaccine shot, to establish a causal relationship between vaccine and adverse events (9,26,931 patient records retrieved).
5. Include patients who received COVID vaccines and exclude all other patient records. The final dataset comprises 8,00,192 records (about 8 lakh).
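The wrangling steps above can be sketched in pandas. This is a minimal illustration, not the project's actual script: the column names (VAERS_ID, NUMDAYS, VAX_TYPE, VAX_MANU, SYMPTOM1), the "COVID19" type code, the helper function names, and the tiny in-memory tables are all assumptions standing in for the real yearly VAERS files.

```python
import pandas as pd

def build_year_table(patients, vaccines, symptoms):
    """Full outer join of the three per-year tables on VAERS_ID."""
    merged = patients.merge(vaccines, on="VAERS_ID", how="outer")
    return merged.merge(symptoms, on="VAERS_ID", how="outer")

def build_master(yearly_tables):
    """Stack the yearly tables row-wise into one master tracker."""
    return pd.concat(yearly_tables, ignore_index=True)

def filter_covid_within_30_days(master):
    """Keep COVID vaccine records with onset less than 30 days after the shot."""
    within_30 = master["NUMDAYS"] < 30          # assumed onset-interval column
    is_covid = master["VAX_TYPE"] == "COVID19"  # assumed vaccine-type code
    return master[within_30 & is_covid].reset_index(drop=True)

# Hypothetical toy rows for a single year (not real VAERS records).
patients_2021 = pd.DataFrame(
    {"VAERS_ID": [1, 2, 3], "AGE_YRS": [70, 45, 82], "NUMDAYS": [2, 45, 10]}
)
vaccines_2021 = pd.DataFrame(
    {"VAERS_ID": [1, 2, 3],
     "VAX_TYPE": ["COVID19", "COVID19", "FLU3"],
     "VAX_MANU": ["MODERNA", "PFIZER", "OTHER"]}
)
symptoms_2021 = pd.DataFrame(
    {"VAERS_ID": [1, 2, 3], "SYMPTOM1": ["Chest pain", "Headache", "Pyrexia"]}
)

year_2021 = build_year_table(patients_2021, vaccines_2021, symptoms_2021)
master = build_master([year_2021])  # in practice: one table per year, 2020 to 2023
final = filter_covid_within_30_days(master)
print(final)
```

With real data, each yearly table would be read from the downloaded files before merging, and the same two filters would then be applied to the combined master tracker.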
📌Exploratory Data Analysis (EDA):
Performed EDA using Spotfire to draw insights, initially excluding rows with ‘Unknown’ Gender values. Since female records far outnumber male records, ‘Gender’ could not be considered as a variable for ML prediction. Hence the rows with ‘Unknown’ Gender values were added back to the dataset.
📌 Feature Engineering & Feature Scaling (Python script 🐍 on Google Colab):
1. To build the ML model, only 3 columns (Age Group, Vaccine Manufacturers, Symptoms Text) were selected out of the 52 columns available in the final dataset.
2. Transform the categorical features into numerical values of 0 or 1 so the ML algorithms can use them.
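One common way to turn categorical columns into 0/1 values is one-hot encoding. The sketch below uses `pandas.get_dummies`; whether the project used this exact technique is an assumption, and the column names (AGE_GROUP, VAX_MANU) and category values are hypothetical.

```python
import pandas as pd

# Hypothetical toy records; the real dataset has far more rows and categories.
records = pd.DataFrame(
    {
        "AGE_GROUP": ["65+", "18-64", "65+"],
        "VAX_MANU": ["MODERNA", "PFIZER", "JANSSEN"],
    }
)

# Each category becomes its own column holding 0 or 1.
encoded = pd.get_dummies(records, columns=["AGE_GROUP", "VAX_MANU"], dtype=int)
print(encoded)
```

Each original column is replaced by one indicator column per category, so every feature the model sees is either 0 or 1.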
📌ML Type:
Supervised Machine Learning was chosen since both the input variables (Vaccine Manufacturer, Age Group) and the output label (Heart Problems) are known.
📌ML Algorithms Used:
Naive Bayes using 2 features (Vaccine Manufacturer, Age Group) and Linear Regression using a single feature (Vaccine Manufacturer).
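A rough sketch of how the two models could be fitted with scikit-learn follows. The choice of `BernoulliNB` (a Naive Bayes variant for binary features), the 0.5 cut-off used to turn Linear Regression outputs into class labels, and the toy feature and label arrays are all assumptions; the post does not state these details.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.naive_bayes import BernoulliNB

# Hypothetical binary features: [age_group_65_plus, manufacturer_is_moderna]
X = np.array([[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]])
# Hypothetical label: 1 = heart problem reported, 0 = not reported
y = np.array([1, 0, 0, 0, 1, 0])

# Naive Bayes on both features (Vaccine Manufacturer, Age Group).
nb_model = BernoulliNB().fit(X, y)
nb_pred = nb_model.predict(X)

# Linear Regression on the single manufacturer feature.
X_manu = X[:, [1]]
lin_model = LinearRegression().fit(X_manu, y)
# Assumed rule: scores of 0.5 or more count as the positive class.
lin_pred = (lin_model.predict(X_manu) >= 0.5).astype(int)

print("Naive Bayes predictions:", nb_pred)
print("Linear Regression predictions:", lin_pred)
```

In a real evaluation the data would be split into training and test sets rather than predicting on the rows used for fitting.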
📌 Accuracy of ML model built: 87%
📌Confusion Matrix Value: (0,0)
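For reference, accuracy and the confusion matrix cells can be computed from true and predicted labels as in the plain-Python sketch below. The labels are hypothetical toy values, not the project's results, and the function names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels coded as 0 and 1."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return tn, fp, fn, tp

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    return (tn + tp) / (tn + fp + fn + tp)

# Hypothetical labels for illustration only.
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 0, 1, 0, 0, 1, 1, 0]

print(confusion_counts(y_true, y_pred))  # (4, 1, 1, 2)
print(accuracy(y_true, y_pred))          # 0.75
```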
📌Conclusion drawn with VAERS data:
No positive correlation was found between COVID vaccines and heart problems. The dataset was not used to assess whether heart problems pre-existed in the patients who experienced cardiovascular issues with COVID vaccines. Hence it was not explored whether patients experienced cardiac side effects as new adverse events or as an exacerbation of an existing condition.