Dsbda Miniproject Report
BACHELOR OF ENGINEERING
In
Information Technology
By
Gayatri Bhosale (IT-3005)
Sanika Deshmukh (IT-3005)
Isha Gaikwad (IT-3013)
Vrushali Gawai (IT-3015)
Narayan Gawas (IT-3016)
Shrikant Hundekar (IT-3019)
CERTIFICATE
Is a bonafide work carried out by them under the supervision and guidance of Prof.
Pravin Kamble and it is approved for partial fulfillment of the requirement for TE
(Information Technology Engineering) course Data Science And Big Data Analysis
of Savitribai Phule Pune University for the award of the Degree of Bachelor of
Engineering (Information technology Engineering).
This project report has not been earlier submitted to any other institute or university
for the award of any degree or diploma.
ACKNOWLEDGEMENT
We would like to take this opportunity to express our gratitude to all the people
who have, in various ways, helped in the successful completion of our project. We must convey
our gratitude to Prof. Pravin Kamble for being a constant source of inspiration and help
in preparing the project, personally correcting our work, and providing encouragement
throughout the project.
We express deep gratitude to Dr. V. S. Gaikwad, HOD, Information Technology
Engineering Department, for providing support and giving us his valuable time.
We also express gratitude to our parents, other family members, and our
friends for encouraging us with their valuable suggestions and motivating us from time to
time.
Group Members:-
DECLARATION
We, the undersigned, hereby declare that the Project entitled “COVID-19 vaccination
data analysis across Indian states” submitted by us to Trinity College of Engineering and
Research Pune, for the award of the degree of Bachelor of Engineering in Information
Technology Engineering, under the guidance of Prof. Pravin Kamble is our original work.
We further declare that to the best of our knowledge and belief, this work has not been
previously submitted to this or any other university.
Group Members:-
CONTENTS
ABSTRACT
1 INTRODUCTION
1.1 INTRODUCTION
1.2 RELATED WORK
1.3 AIMS/MOTIVATION
1.4 OBJECTIVE OF THE WORK
1.5 SCOPE OF PROJECT
1.6 PURPOSE
2 LITERATURE SURVEY
3 IMPLEMENTATION
3.1 PROBLEM STATEMENT
3.2 HARDWARE AND SOFTWARE REQUIREMENT
3.3 MODULES/LIBRARIES USED DETAILS
3.4 DATASET USED DETAILS
4 DESIGN AND SPECIFICATION
4.1 FLOWCHART
4.2 SYSTEM ARCHITECTURE
5 RESULTS AND OUTCOMES
6 CONCLUSION AND FUTURE SCOPE
7 REFERENCES
COVID-19 vaccination data analysis across Indian states
Abstract
The COVID-19 pandemic prompted one of the largest vaccination drives in human history,
with India undertaking a massive nationwide campaign to curb the spread of the virus. This
project aims to analyze COVID-19 vaccination data across various Indian states to assess the
distribution, coverage, and effectiveness of the vaccination efforts. Using data from official
sources, the study provides insights into state-wise vaccination trends, demographic coverage,
and disparities in vaccine access. Visualization techniques and statistical analysis are employed
to identify patterns, outliers, and correlations between vaccination rates and factors such as
population density, healthcare infrastructure, and case counts. The results highlight significant
progress in vaccination coverage while also revealing areas that require focused intervention.
This analysis can assist policymakers and health officials in making informed decisions for
future public health strategies and resource allocation.
Chapter 1
Introduction
1.1 Introduction
The outbreak of COVID-19 in late 2019 rapidly evolved into a global health crisis, affecting
millions of lives and overwhelming healthcare systems worldwide. India, with its vast and
diverse population, faced unique challenges in controlling the spread of the virus. In response,
the Government of India launched an extensive vaccination campaign in January 2021, aiming
to immunize the population and reduce the impact of the pandemic. This campaign became one
of the largest vaccination drives globally, targeting different age groups in phases and
deploying multiple vaccines such as Covishield, Covaxin, and later, others like Sputnik V.
Department of IT - 2024-25
1.2 RELATED WORK
Several studies and data visualization projects have explored COVID-19 vaccination
trends both globally and within India. Platforms like the CoWIN dashboard, MoHFW, and
MyGov.in have provided real-time vaccination data, while independent efforts such as
COVID19India.org and DataMeet helped in public tracking and analysis. Researchers have
examined vaccination coverage, disparities among states, and correlations with healthcare
infrastructure and population demographics. These works provide a valuable foundation for
understanding regional differences and guiding targeted public health strategies, which this
project builds upon through focused state-wise analysis.
In addition to government sources, academic studies and journal articles have analyzed
the effectiveness of India's vaccination rollout using statistical models and survey data. These
studies often highlight challenges such as vaccine hesitancy, digital access barriers in rural
areas, and logistical hurdles in remote regions. Comparative analyses between states have
revealed how factors like urbanization, literacy, and healthcare infrastructure significantly
influenced vaccination rates. This project extends such findings by providing a visual and data-
driven examination of state-wise trends to better understand the overall performance and
regional disparities in India’s vaccination campaign.
1.3 AIMS/MOTIVATION
This project is motivated by the need to leverage data to uncover patterns, gaps, and
successes in the vaccination campaign across Indian states. By analyzing vaccination data, we
can assess the effectiveness of policy decisions, identify states or regions that require more
attention, and support equitable vaccine distribution.
1.4 OBJECTIVE OF THE WORK
1. To collect and preprocess state-wise COVID-19 vaccination data from reliable sources.
2. To identify trends, patterns, and disparities in vaccination rates based on demographics
and geography.
3. To highlight states with high or low performance in vaccination rollout and explore
possible reasons.
4. To provide insights and recommendations for improving vaccine coverage and
accessibility.
5. To support data-driven decision-making for future public health planning and crisis
management.
6. To analyze the distribution and coverage of vaccinations across different Indian states.
7. To compare vaccination data with factors like population density, healthcare
infrastructure, and reported COVID-19 cases.
1.5 SCOPE OF PROJECT
The study explores regional disparities in vaccine distribution and uptake,
highlighting key factors that influenced vaccination trends, such as urban-rural divide, literacy
rate, and accessibility. Through data visualization and interpretation, the project aims to present
insights in a user-friendly format to assist policymakers, researchers, and public health
officials.
The scope is limited to secondary data collected from official and publicly available
sources such as the CoWIN portal, Ministry of Health and Family Welfare (MoHFW), and
government dashboards. While the project does not involve primary data collection or real-
time data analysis, it provides a strong foundation for further studies in vaccine impact
assessment and public health strategy planning.
1.6 PURPOSE
The primary purpose of this project is to analyze the COVID-19 vaccination data across
Indian states to gain meaningful insights into the reach, coverage, and effectiveness of the
national vaccination campaign. By studying state-wise data, the project aims to identify
patterns, regional disparities, and critical factors affecting vaccine distribution and uptake.
Chapter 2
LITERATURE SURVEY
Chapter 3
IMPLEMENTATION
The project began with data collection from sources such as the CoWIN dashboard and
COVID19India.org, which provided real-time vaccination statistics across Indian states. The
raw data was cleaned and preprocessed using Python and Pandas, handling missing values,
standardizing formats, and merging with demographic and healthcare data. After
preprocessing, statistical analysis was performed to identify trends and correlations, such as
the relationship between vaccination rates and factors like population density and healthcare
infrastructure.
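The cleaning, merging, and correlation steps described above can be sketched as follows. This is a minimal illustration, not the project's actual notebook: the column names (`total_doses`, `population`, `hospital_beds_per_1000`), the state labels, and every number are hypothetical placeholders, and dropping rows with missing dose counts is only one of several possible strategies.

```python
import pandas as pd

# Hypothetical vaccination records; all values are placeholders, not real data.
vaccinations = pd.DataFrame({
    "state": ["State A", "state b ", "State C", "State D"],
    "total_doses": [1200.0, None, 900.0, 400.0],
})

# Hypothetical demographic and healthcare table keyed by state.
demographics = pd.DataFrame({
    "state": ["State A", "State B", "State C", "State D"],
    "population": [1000, 2000, 1500, 800],
    "hospital_beds_per_1000": [1.5, 0.8, 1.1, 0.5],
})


def preprocess(vacc, demo):
    """Clean the vaccination table and merge it with demographic data."""
    vacc = vacc.copy()
    # Standardize formats: trim whitespace and normalize capitalization.
    vacc["state"] = vacc["state"].str.strip().str.title()
    # Handle missing values: here, rows without a dose count are dropped.
    vacc = vacc.dropna(subset=["total_doses"])
    # Merge with demographic and healthcare data on the state name.
    merged = vacc.merge(demo, on="state", how="inner")
    # Normalize by population so that states of different sizes are comparable.
    merged["doses_per_100"] = 100 * merged["total_doses"] / merged["population"]
    return merged


merged = preprocess(vaccinations, demographics)

# Pearson correlation between coverage and a healthcare infrastructure proxy.
corr = merged["doses_per_100"].corr(merged["hospital_beds_per_1000"])
print(merged[["state", "doses_per_100", "hospital_beds_per_1000"]])
print("Correlation:", round(corr, 3))
```

Normalizing by population before computing the correlation matters here, because raw dose counts mostly reflect how large a state is rather than how well it is covered.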
For data visualization, tools like Matplotlib, Seaborn, and Plotly were used to create charts,
heat maps, and interactive visualizations, making it easier to compare vaccination coverage
across states. The findings were then compiled into reports, highlighting regional disparities
and offering recommendations for improving vaccine distribution, especially in underserved
areas. The project aimed to provide actionable insights for policymakers and public health
officials to improve vaccination efforts across India.
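As an illustration of the kind of chart and heat map mentioned above, the following sketch draws a bar chart and a heat map side by side with Matplotlib and Seaborn. The state labels, column names, and percentages are hypothetical placeholders chosen only to make the example runnable; they are not results from this project.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical coverage table (percent of population); placeholder values only.
coverage = pd.DataFrame({
    "state": ["State A", "State B", "State C"],
    "dose1_pct": [80.0, 65.0, 72.0],
    "dose2_pct": [70.0, 50.0, 61.0],
    "booster_pct": [20.0, 8.0, 15.0],
}).set_index("state")


def plot_coverage(df):
    """Draw a ranked bar chart and a state-by-dose heat map on one figure."""
    fig, (ax_bar, ax_heat) = plt.subplots(1, 2, figsize=(11, 4))

    # Horizontal bars sorted by second-dose coverage make ranking easy to read.
    df["dose2_pct"].sort_values().plot.barh(ax=ax_bar, color="tab:blue")
    ax_bar.set_xlabel("Second-dose coverage (%)")
    ax_bar.set_title("Coverage by state")

    # The heat map shows every dose type for every state at a glance.
    sns.heatmap(df, annot=True, fmt=".0f", cmap="YlGnBu", ax=ax_heat,
                cbar_kws={"label": "Coverage (%)"})
    ax_heat.set_title("Coverage by dose type")

    fig.tight_layout()
    return fig


fig = plot_coverage(coverage)
```

Calling `fig.savefig("coverage.png")` afterwards would write the figure to disk; in a Jupyter notebook the figure can be displayed inline instead.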
Results:
3.1 PROBLEM STATEMENT
The project aims to analyze state-wise COVID-19 vaccination data across India to identify
disparities in vaccine coverage and uncover factors influencing vaccination rates, with the goal
of informing more effective and equitable public health strategies.
3.2 HARDWARE AND SOFTWARE REQUIREMENT
Hardware Requirements:
1. Processor: Intel Core i3 or higher (or equivalent)
2. RAM: 4 GB or more (8 GB recommended for better performance)
3. Storage: 1 GB of free disk space (for dataset and dependencies)
4. Graphics Card: Integrated graphics are sufficient, but a dedicated GPU can speed up
training for larger datasets (optional).
Software Requirements:
1. Operating System: Windows 10/11, macOS, or Linux
2. Python Version: Python 3.7 or higher
3. IDE: Any Python IDE (e.g., PyCharm, Jupyter Notebook, Visual Studio Code)
4. Libraries and Packages:
o Pandas (for data manipulation and analysis)
o NumPy (for numerical operations)
o Matplotlib (for data visualization)
o Seaborn (for enhanced visualizations)
o Plotly (for interactive dashboards)
o SciPy (for statistical analysis)
5. Additional Tools:
o Jupyter Notebook or Google Colab
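Before running the analysis, the software requirements listed above can be verified with a short script. The sketch below is one possible check, using only the Python standard library; the helper name `missing_packages` is introduced here for illustration.

```python
import importlib.util
import sys

# Packages listed in the software requirements above.
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "plotly", "scipy"]


def missing_packages(names):
    """Return the package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The report asks for Python 3.7 or higher.
if sys.version_info < (3, 7):
    print("Python 3.7 or higher is required, found", sys.version.split()[0])

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
    print("Install with: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```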
3.3 MODULES/LIBRARIES USED DETAILS
1. Pandas:
Purpose: Used for data manipulation and analysis.
Details: It helps load, clean, and preprocess the dataset. It provides data structures like
DataFrame for organizing the data into tables and offers powerful methods for data
wrangling.
Installation: pip install pandas
2. NumPy:
Purpose: Used for numerical operations and array handling.
Details: NumPy provides support for multi-dimensional arrays and matrices, along
with a collection of mathematical functions to operate on these arrays.
Installation: pip install numpy
3. Matplotlib:
Purpose: Used for data visualization.
Details: This library is used to create static, animated, and interactive visualizations. In
this project, it is used to plot charts that compare vaccination coverage across states.
Installation: pip install matplotlib
4. Seaborn:
Purpose: Provides enhanced visualization capabilities over Matplotlib.
Details: Seaborn is built on top of Matplotlib and is used for making attractive and
informative statistical graphics. It is particularly useful for visualizing complex data
relationships and distributions.
Installation: pip install seaborn
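To show how Pandas and NumPy work together in this kind of analysis, the sketch below computes a coverage percentage per state with a DataFrame and then standardizes it with NumPy array operations. The column names and all figures are hypothetical placeholders, not values from the project's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical state-level counts; placeholder values only.
df = pd.DataFrame({
    "state": ["State A", "State B", "State C", "State D"],
    "fully_vaccinated": [700, 900, 450, 200],
    "population": [1000, 2000, 1500, 800],
})

# Pandas: column arithmetic gives the coverage percentage for each state.
df["coverage_pct"] = 100 * df["fully_vaccinated"] / df["population"]

# NumPy: standardize the coverage values (z-scores) to compare states
# against the average of this sample.
values = df["coverage_pct"].to_numpy()
z_scores = (values - values.mean()) / values.std()

df["z_score"] = np.round(z_scores, 2)
df["above_average"] = z_scores > 0

print(df[["state", "coverage_pct", "z_score", "above_average"]])
```

A z-score is a simple way to flag states that sit well above or below the sample mean; other outlier rules (for example, interquartile range) could be substituted.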
3.4 DATASET USED DETAILS
COVID19India.org:
Source: COVID19India.org
Details: This platform offers detailed data on COVID-19 vaccination across Indian
states and union territories. It aggregates data from the CoWIN portal, but also
incorporates additional sources to ensure accuracy and completeness.
Format: Available in CSV format, which can be directly used for analysis.
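Since the data is distributed as CSV, it can be loaded directly with Pandas. The sketch below assumes a hypothetical layout with `date`, `state`, `first_dose`, and `second_dose` columns holding cumulative counts; the real file's column names and structure may differ, and the inline sample stands in for the downloaded file.

```python
import io

import pandas as pd

# Hypothetical sample in an assumed layout; the actual CSV columns may differ.
SAMPLE_CSV = """date,state,first_dose,second_dose
2021-03-01,State A,100,10
2021-03-02,State A,150,30
2021-03-01,State B,80,
2021-03-02,State B,120,20
"""


def load_vaccination_csv(source):
    """Read the CSV, parse dates, and fill missing dose counts with zero."""
    df = pd.read_csv(source, parse_dates=["date"])
    dose_cols = ["first_dose", "second_dose"]
    # Simplification: a blank cell is treated as zero doses reported.
    df[dose_cols] = df[dose_cols].fillna(0).astype(int)
    return df.sort_values(["state", "date"]).reset_index(drop=True)


# A file path can be passed instead of the in-memory sample.
df = load_vaccination_csv(io.StringIO(SAMPLE_CSV))

# With cumulative counts, the latest row per state is the current total.
latest = df.groupby("state").last()
print(latest)
```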
Chapter 4
DESIGN AND SPECIFICATION
4.1 FLOWCHART
The flowchart outlines the step-by-step process of a machine learning project aimed at
classifying COVID-19 vaccination status across Indian states. It begins with problem definition,
followed by data collection from sources like CoWIN API and government portals. The data
is then preprocessed by handling missing values, encoding categorical data, and selecting
important features. Exploratory Data Analysis (EDA) is performed to understand state-wise
distribution through visualizations. Various models such as Logistic Regression, Random
Forest, and XGBoost are considered, after which the selected models are trained and validated
using techniques like train-test split and cross-validation.
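The training and validation step can be sketched as below. This example assumes scikit-learn, which is not among the libraries listed in this report, and uses a synthetic dataset from `make_classification` in place of real vaccination features. XGBoost is left out to keep the sketch dependency-light; it could be added to the `models` dictionary in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the preprocessed feature matrix and status labels.
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           random_state=0)

# Train-test split: hold out 20% of the rows for a final check.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    # 5-fold cross-validation on the training portion only.
    cv_scores = cross_val_score(model, X_train, y_train, cv=5)
    # Refit on the full training set, then score on the held-out test set.
    model.fit(X_train, y_train)
    results[name] = {
        "cv_mean_accuracy": cv_scores.mean(),
        "test_accuracy": model.score(X_test, y_test),
    }

for name, scores in results.items():
    print(name, scores)
```

Cross-validating on the training portion and keeping the test set untouched until the end avoids selecting a model on the same data used to report its accuracy.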
4.2 SYSTEM ARCHITECTURE
Model Evaluation
Once the models are trained, their performance is evaluated using standard classification
metrics such as accuracy, precision, recall, and F1-score. These metrics help determine how
well the model distinguishes between different vaccination statuses (e.g., vaccinated,
unvaccinated, partially vaccinated). A confusion matrix may also be used to visualize the
number of correct and incorrect predictions. This evaluation ensures that only the most reliable
model is selected for final deployment or interpretation.
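The metrics named above can be computed as in the following sketch, which assumes scikit-learn's `sklearn.metrics` module (not listed among this report's libraries). The true and predicted labels are hypothetical and serve only to show how the numbers are derived; macro averaging is one reasonable choice for a three-class problem.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

# Hypothetical true and predicted statuses for eight records.
labels = ["vaccinated", "partial", "unvaccinated"]
y_true = ["vaccinated", "vaccinated", "vaccinated", "partial",
          "partial", "unvaccinated", "unvaccinated", "unvaccinated"]
y_pred = ["vaccinated", "vaccinated", "partial", "partial",
          "unvaccinated", "unvaccinated", "unvaccinated", "vaccinated"]

# Accuracy: share of records whose predicted status matches the true one.
accuracy = accuracy_score(y_true, y_pred)

# Macro averaging gives each class equal weight regardless of its size.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, average="macro"
)

# Rows are true classes, columns are predicted classes, in `labels` order.
cm = confusion_matrix(y_true, y_pred, labels=labels)

print("Accuracy:", accuracy)
print("Macro precision:", round(precision, 3))
print("Macro recall:", round(recall, 3))
print("Macro F1:", round(f1, 3))
print(cm)
```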
Visualization of Results
In the final stage, the model’s predictions are presented in a user-friendly format to support
decision-making. The results are visualized using charts, dashboards, or geographical maps to
show state-wise vaccination statuses clearly. Tools like Plotly, Tableau, or Streamlit can be
used to create interactive dashboards, allowing stakeholders to drill down into specific states
or trends. These visualizations are key to translating complex ML outputs into actionable
insights for policymakers and healthcare professionals.
Chapter 5
RESULTS AND OUTCOMES
The analysis of COVID-19 vaccination data across Indian states revealed notable
disparities in vaccine coverage. States with well-developed healthcare infrastructure and higher
levels of urbanization—such as Maharashtra, Kerala, and Delhi—achieved significantly higher
vaccination rates compared to rural or economically weaker states, particularly in parts of the
northeast and central India. A positive correlation was observed between literacy rates and
vaccination uptake, suggesting that public awareness and access to information played a key
role in vaccine acceptance. Additionally, while the majority of states achieved substantial first
and second dose coverage, booster dose uptake was considerably lower in many regions,
highlighting the need for continued awareness campaigns and follow-up initiatives.
The outcome of the project demonstrates the power of data analytics in uncovering
regional trends and guiding public health decisions. Through visualizations such as bar charts
and heatmaps, the project effectively identified high- and low-performing states, offering
valuable insights into the factors influencing vaccine distribution. These findings can support
policymakers and healthcare organizations in refining strategies for future vaccination drives,
ensuring more equitable and efficient coverage across all regions of the country.
Chapter 6
CONCLUSION AND FUTURE SCOPE
Conclusion:
The analysis of COVID-19 vaccination data across Indian states highlights
significant disparities in vaccine distribution and coverage, influenced by factors such as
healthcare infrastructure, literacy rates, urbanization, and public awareness. By leveraging data
analytics and visualization tools, the project successfully identified key patterns and gaps in
the vaccination rollout, offering insights that can support more targeted and equitable public
health strategies. The study not only emphasizes the importance of data-driven decision-
making in managing large-scale health campaigns but also lays the groundwork for further
research aimed at improving vaccine access and preparedness for future health crises.
Future Scope:
This project can be further enhanced by integrating real-time data through APIs and
expanding the analysis to a more granular level, such as district-wise or rural-urban
comparisons. Machine learning models can be employed to predict vaccination trends, identify
vulnerable regions, and optimize resource distribution. Additionally, future work could involve
comparing India’s vaccination performance with other countries or health programs to derive
best practices. Incorporating public feedback and sentiment analysis from social media can also
help understand public perception and guide more effective awareness campaigns.
Chapter 7
REFERENCES
[4] Ghosh, A., et al. (2021). Regional Disparities in COVID-19 Vaccination in India. Journal
of Public Health Policy, 42(3), 345-356.
[6] Indian Council of Medical Research (ICMR). (2021). COVID-19 Vaccine Effectiveness and
Coverage in India. https://www.icmr.gov.in