Project 1 Healthcare

This document outlines a project focused on analyzing patient data within the healthcare domain, specifically for dialysis patients. It details the data preparation process, including cleaning and correcting errors, as well as the steps for exploratory data analysis and visualization. The final deliverables include a clean dataset and a report summarizing key insights and recommendations based on the analysis.

Uploaded by

neonnoneed


Projects

Analysis Of Patient Data (Domain: Healthcare)

• This project requires learners to analyze data on patients suffering from different diseases and to summarize it from several angles. The analysis covers the facilities, chain organizations, and dialysis stations where the patients undergo dialysis. It also covers payment modes, including any discounts or payment reductions that were applied.

# Corrected data preparation

import random

import pandas as pd

# Set number of rows
num_rows = 55

# Define sample data with consistent length: every list holds num_rows entries
data = {
    # num_rows - 1 generated IDs plus one missing ID
    "Patient_ID": [f"P{str(i).zfill(3)}" for i in range(1, num_rows)] + [None],
    # The five-item patterns below repeat 11 times (5 * 11 = 55 rows)
    "Facility_Name": ["CityCare Hosp", "MediLife Center", "Wellness Clinic", None,
                      "CityCare Hosp"] * 11,
    "Chain_Organization": ["HealthPlus", "Independent", None, "HealthPlus",
                           "MediLife Group"] * 11,
    "Dialysis_Station_ID": ["Station_" + str(random.randint(1, 5)) for _ in range(num_rows)],
    "Disease_Type": ["Kidney Failure", "Acute Kidney Disease", "Chronic Kidney Disease",
                     None, "Kidney Failure"] * 11,
    "Payment_Mode": ["Cash", "Insurance", "Govt Aid", None, "Insurance"] * 11,
    # Amount columns mix numbers, missing values, and text on purpose
    "Total_Cost": [random.choice([5000, 4800, 5200, None, 5100, "Five Thousand", "4,800"])
                   for _ in range(num_rows)],
    "Discount_Applied": [random.choice([0, 500, 700, None, "Five Hundred", 600])
                         for _ in range(num_rows)],
    "Final_Amount": [random.choice([4500, 4800, 4700, 4400, None,
                                    "Four Thousand Five Hundred"])
                     for _ in range(num_rows)],
    "Visit_Date": pd.date_range(start="2024-01-01", periods=num_rows, freq="7D").tolist(),
}

# Create DataFrame
df = pd.DataFrame(data)

# Manually inject more errors
df.loc[5, 'Facility_Name'] = "medilife center"  # lowercase error
df.loc[10, 'Payment_Mode'] = "cash"  # lowercase
df.loc[15, 'Total_Cost'] = "five thousand"  # text instead of number
df.loc[20, 'Discount_Applied'] = "six hundred"  # text
df.loc[25, 'Final_Amount'] = "four thousand five hundred"  # text
df.loc[30, 'Disease_Type'] = "kidney failure"  # lowercase
df.loc[35, 'Chain_Organization'] = None  # missing value

# Save to CSV (the target directory must already exist; change the path if needed)
csv_path = '/mnt/data/patient_dialysis_data_with_errors_final.csv'
df.to_csv(csv_path, index=False)
print(csv_path)

1. Dataset (Expected Columns)

Column Name            Description
---------------------  ------------------------------------------
Patient_ID             Unique ID of each patient
Facility_Name          Hospital/facility name
Chain_Organization     Organization that owns the facility
Dialysis_Station_ID    Station where dialysis is done
Disease_Type           Type of kidney disease
Payment_Mode           Payment method (Cash, Insurance, Govt Aid)
Total_Cost             Original treatment cost
Discount_Applied       Discount amount given
Final_Amount           Final amount after discount
Visit_Date             Date of visit

2. Solution Guide — Step-by-Step

Step 1: Understand the Problem
• You need to analyze patient data, dialysis facilities, chains, stations, and payment modes.
• You need to find patterns, summaries, and insights.

Step 2: Load and Clean the Data

• Load Data: Load the CSV or Excel file into Python (Pandas) or Power BI/Excel.
• Check Null Values: If missing values exist, decide whether to fill or remove them.
• Correct Data Types: Ensure cost columns are numbers, dates are dates, etc.
• Remove Duplicates: Check for duplicates and clean them.

Tools: Pandas, Excel, or Power Query (Power BI).
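As a minimal sketch of the load-and-inspect step in Pandas: the three-row CSV text below is a hypothetical stand-in for the real file, and the variable names (`csv_text`, `null_counts`, `duplicate_rows`) are illustrative only. With a real file, the `io.StringIO(...)` argument would be replaced by the file path.

```python
import io

import pandas as pd

# Hypothetical three-row extract standing in for the real CSV file.
csv_text = """Patient_ID,Facility_Name,Total_Cost,Visit_Date
P001,CityCare Hosp,5000,2024-01-01
P002,,"4,800",2024-01-08
P002,,"4,800",2024-01-08
"""

# With a real file this would be pd.read_csv("your_file.csv").
df = pd.read_csv(io.StringIO(csv_text))

# Missing values per column.
null_counts = df.isna().sum()

# Column types: Total_Cost is read as text because of the "4,800" entries.
column_types = df.dtypes

# Number of rows that fully repeat an earlier row.
duplicate_rows = int(df.duplicated().sum())

print(null_counts)
print(column_types)
print("duplicate rows:", duplicate_rows)
```

These three checks only report problems; deciding how to fix them is covered by the cleaning steps.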

Step 3: Exploratory Data Analysis (EDA)

Here you find answers to important questions:

✅ Facility Level
• How many patients per facility?
• Which facility has the highest number of dialysis sessions?

✅ Chain Organization Level
• How many facilities belong to each chain?
• Compare the patient numbers between chains.

✅ Dialysis Station Level
• Which stations are used most?
• Are some stations underutilized?

✅ Payment Mode Analysis
• How are patients paying? (cash, insurance, govt aid, etc.)
• What percentage received discounts?
• Average discount amount?

✅ Revenue Analysis
• Total revenue = sum(Final_Amount)
• Revenue lost due to discounts = sum(Discount_Applied)

Tools: Pandas groupby, pivot tables, bar charts.
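Several of these questions reduce to a groupby or a column sum. The sketch below assumes the data has already been cleaned (numeric amounts, no missing values) and uses a hypothetical four-row sample; the variable names are illustrative, not part of the project brief.

```python
import pandas as pd

# Hypothetical cleaned sample: amounts are numeric and nothing is missing.
df = pd.DataFrame({
    "Patient_ID": ["P001", "P002", "P003", "P004"],
    "Facility_Name": ["CityCare Hosp", "CityCare Hosp",
                      "MediLife Center", "Wellness Clinic"],
    "Dialysis_Station_ID": ["Station_1", "Station_1", "Station_2", "Station_1"],
    "Discount_Applied": [0, 500, 700, 0],
    "Final_Amount": [5000, 4500, 4300, 5100],
})

# Facility level: distinct patients per facility.
patients_per_facility = df.groupby("Facility_Name")["Patient_ID"].nunique()

# Station level: number of rows (sessions) per station, busiest first.
station_usage = df["Dialysis_Station_ID"].value_counts()

# Discounts: share of rows with a discount and the average non-zero discount.
got_discount = df["Discount_Applied"] > 0
pct_discounted = got_discount.mean() * 100
avg_discount = df.loc[got_discount, "Discount_Applied"].mean()

# Revenue: the two sums given in the revenue analysis.
total_revenue = df["Final_Amount"].sum()
revenue_lost = df["Discount_Applied"].sum()

print(patients_per_facility)
print(station_usage)
print(pct_discounted, avg_discount, total_revenue, revenue_lost)
```

Whether "average discount" should include rows with a zero discount is a choice to document; this sketch averages only over rows where a discount was given.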

Step 4: Visualization

Create graphs and charts like:
• Bar chart of patients per facility
• Pie chart of payment mode distribution
• Line chart showing monthly revenue
• Heatmap of station usage

Tools: Power BI, Tableau, Matplotlib, or Excel Charts.
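If Matplotlib is the chosen tool, two of these charts could be drawn roughly as follows. This is a sketch on a hypothetical four-row sample; the figure layout, titles, and the in-memory `buffer` are assumptions, and in a notebook the `matplotlib.use("Agg")` line and the buffer can be dropped in favor of `plt.show()` or saving to a file name.

```python
import io

import matplotlib
matplotlib.use("Agg")  # draw off-screen; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical cleaned sample.
df = pd.DataFrame({
    "Facility_Name": ["CityCare Hosp", "CityCare Hosp",
                      "MediLife Center", "Wellness Clinic"],
    "Final_Amount": [5000, 4500, 4300, 5100],
    "Visit_Date": pd.to_datetime(
        ["2024-01-01", "2024-01-08", "2024-02-05", "2024-02-12"]),
})

# Data behind the charts.
patients_per_facility = df["Facility_Name"].value_counts()
monthly_revenue = df.groupby(df["Visit_Date"].dt.to_period("M"))["Final_Amount"].sum()

fig, (ax_bar, ax_line) = plt.subplots(1, 2, figsize=(10, 4))

# Bar chart: rows per facility.
ax_bar.bar(patients_per_facility.index.tolist(), patients_per_facility.values)
ax_bar.set_title("Patients per facility")
ax_bar.tick_params(axis="x", rotation=45)

# Line chart: revenue per calendar month.
ax_line.plot(monthly_revenue.index.astype(str).tolist(),
             monthly_revenue.values, marker="o")
ax_line.set_title("Monthly revenue")

fig.tight_layout()

# Write the PNG to memory; pass a file name instead to save it to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```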

Step 5: Insights and Recommendations

Write a summary report with conclusions, such as:
• "Facility X handles 30% of all dialysis cases."
• "Chain Y has the highest discount application, affecting revenue."
• "Most payments (60%) are made via insurance."
• "Stations 4 and 5 are underused; optimization is needed."

Step 6: Final Deliverables

You can submit:
• A clean Excel file, a Python notebook (Jupyter), or a Power BI dashboard.
• A PDF report or PowerPoint summarizing the key insights and charts.
• Optional: a few recommendations for management based on the findings.

The dataset deliberately contains mistakes/errors such as:
• Null values
• Wrong spellings and inconsistent capitalization
• Data type mistakes (text instead of numbers)

Learners should document all their assumptions.
(Example: if missing payment modes are filled with "Unknown", state that clearly.)

Data Cleaning Steps

👉 Step 1: Handle Missing Values
• Identify missing data (nulls) in key columns.
• Fill or remove rows depending on analysis needs.
  o Example: If Facility_Name or Patient_ID is missing → drop that row.
  o If Disease_Type is missing → can fill as 'Unknown'.
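The drop-or-fill rule from the two examples can be sketched in Pandas as below. The four-row sample and the variable names are hypothetical; which columns count as "key" is an assumption to document.

```python
import pandas as pd

# Hypothetical sample with one gap in each column.
df = pd.DataFrame({
    "Patient_ID": ["P001", None, "P003", "P004"],
    "Facility_Name": ["CityCare Hosp", "MediLife Center", None, "Wellness Clinic"],
    "Disease_Type": ["Kidney Failure", "Kidney Failure",
                     "Acute Kidney Disease", None],
})

# Identify: count the missing values in each column.
missing_per_column = df.isna().sum()

# Remove: drop rows where a key column (Patient_ID or Facility_Name) is missing.
cleaned = df.dropna(subset=["Patient_ID", "Facility_Name"]).copy()

# Fill: a missing Disease_Type becomes the explicit label "Unknown".
cleaned["Disease_Type"] = cleaned["Disease_Type"].fillna("Unknown")

print(missing_per_column)
print(cleaned)
```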

👉 Step 2: Standardize Text Data
• Correct lowercase/case issues.
  o Example: Change "cash" to "Cash", "medilife center" to "MediLife Center".
• Remove unnecessary spaces.
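One way to sketch this in Pandas is shown below. Plain title-casing would turn "medilife center" into "Medilife Center", so the sketch maps facility names through a lookup of canonical spellings instead; that lookup (`canonical_facilities`) is an assumption built from the facility names in the sample data, not something the brief prescribes.

```python
import pandas as pd

# Hypothetical sample with case and whitespace problems.
df = pd.DataFrame({
    "Facility_Name": ["MediLife Center", "medilife center", "  CityCare Hosp "],
    "Payment_Mode": ["Cash", "cash", " Insurance"],
})

# Assumed list of correct spellings, keyed by their lowercase form.
canonical_facilities = {
    name.lower(): name
    for name in ["CityCare Hosp", "MediLife Center", "Wellness Clinic"]
}

# Strip spaces, then look up the canonical spelling.
# Names that are not in the lookup keep their stripped original value.
stripped = df["Facility_Name"].str.strip()
df["Facility_Name"] = stripped.str.lower().map(canonical_facilities).fillna(stripped)

# Payment modes are ordinary words, so title case is enough here.
df["Payment_Mode"] = df["Payment_Mode"].str.strip().str.title()

print(df)
```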

👉 Step 3: Correct Data Types
• Convert Total_Cost, Discount_Applied, Final_Amount to numeric values.
  o Text like "Five Thousand" → convert to 5000
  o Remove commas inside numbers like "4,800" → 4800
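A minimal sketch of the conversion, assuming the only spelled-out amounts are the ones that appear in the sample data. The `WORD_AMOUNTS` table and the `to_number` helper are hypothetical names introduced for illustration; anything the helper cannot interpret becomes NaN so it can be reviewed rather than silently guessed.

```python
import pandas as pd

# Assumed lookup for the spelled-out amounts found in the sample data.
WORD_AMOUNTS = {
    "five thousand": 5000,
    "four thousand five hundred": 4500,
    "five hundred": 500,
    "six hundred": 600,
}


def to_number(value):
    """Convert one raw cell to a float; unreadable or missing cells give NaN."""
    if pd.isna(value):
        return float("nan")
    # Normalize: text form, no surrounding spaces, lowercase, no thousands commas.
    text = str(value).strip().lower().replace(",", "")
    if text in WORD_AMOUNTS:
        return float(WORD_AMOUNTS[text])
    # Anything that is not a plain number is coerced to NaN.
    return pd.to_numeric(text, errors="coerce")


raw = pd.Series([5000, "4,800", "Five Thousand", None, "six hundred", "n/a"])
clean = raw.map(to_number).astype(float)

print(clean)
```

The same helper can be applied to each of the three amount columns, for example `df[col] = df[col].map(to_number).astype(float)`.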

👉 Step 4: Check Duplicates
• Check if any Patient_ID is duplicated.
• Keep only the latest Visit_Date record if needed.
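In Pandas, both bullets could look like the sketch below, which uses a hypothetical three-row sample. Since a dialysis patient may legitimately appear on several visit dates, keeping only the latest record is treated here as an optional step for patient-level counts, not a rule.

```python
import pandas as pd

# Hypothetical sample: P001 appears on two visit dates.
df = pd.DataFrame({
    "Patient_ID": ["P001", "P002", "P001"],
    "Visit_Date": pd.to_datetime(["2024-01-01", "2024-01-08", "2024-01-15"]),
    "Final_Amount": [4500, 4800, 4700],
})

# Check: which Patient_ID values occur more than once?
is_repeated = df["Patient_ID"].duplicated(keep=False)
repeated_ids = df.loc[is_repeated, "Patient_ID"].unique()

# Optional: keep only the most recent visit for each patient.
latest_per_patient = (
    df.sort_values("Visit_Date")
      .drop_duplicates(subset="Patient_ID", keep="last")
      .reset_index(drop=True)
)

print(list(repeated_ids))
print(latest_per_patient)
```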

👉 Step 5: Validate Logical Consistency
• Check that:
$$\text{Final Amount} = \text{Total Cost} - \text{Discount Applied}$$
  o If not, fix or flag the incorrect rows.
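The check can be sketched as a flag column, assuming the three amount columns are already numeric. The column name `Amount_Mismatch` and the 0.01 tolerance are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample with numeric amounts; the last row is inconsistent.
df = pd.DataFrame({
    "Total_Cost": [5000.0, 4800.0, 5200.0],
    "Discount_Applied": [500.0, 0.0, 700.0],
    "Final_Amount": [4500.0, 4800.0, 4400.0],
})

# What Final_Amount should be according to the rule.
expected = df["Total_Cost"] - df["Discount_Applied"]

# Flag rows where the recorded amount differs from the expected one.
# A small tolerance avoids false alarms from floating-point rounding.
df["Amount_Mismatch"] = (df["Final_Amount"] - expected).abs() > 0.01

print(df[df["Amount_Mismatch"]])
```

Rows with a missing amount compare as not mismatched in this sketch, so they need a separate `isna()` check.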

Data Analysis Steps

👉 Step 6: Facility-wise Patient Count
• How many patients went to each facility?

👉 Step 7: Chain Organization Summary
• How many facilities are under each chain?
• Number of patients handled per chain.
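The chain summary can be sketched with one grouped aggregation. The five-row sample and the output column names (`facilities`, `patients`) are hypothetical.

```python
import pandas as pd

# Hypothetical cleaned sample.
df = pd.DataFrame({
    "Patient_ID": ["P001", "P002", "P003", "P004", "P005"],
    "Facility_Name": ["CityCare Hosp", "Wellness Clinic", "CityCare Hosp",
                      "MediLife Center", "MediLife Center"],
    "Chain_Organization": ["HealthPlus", "HealthPlus", "HealthPlus",
                           "MediLife Group", "MediLife Group"],
})

# Distinct facilities and distinct patients per chain, largest chain first.
chain_summary = (
    df.groupby("Chain_Organization")
      .agg(facilities=("Facility_Name", "nunique"),
           patients=("Patient_ID", "nunique"))
      .sort_values("patients", ascending=False)
)

print(chain_summary)
```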

👉 Step 8: Dialysis Station Usage
• Number of treatments per Dialysis_Station_ID.
• Find the busiest stations.

👉 Step 9: Payment Mode Analysis
• How many patients used Cash, Insurance, Govt Aid?
• Average Final Amount per payment mode.
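A sketch of the payment mode summary, assuming Payment_Mode is already standardized and Final_Amount is numeric. The sample rows and the output column names (`visits`, `avg_final_amount`, `share_pct`) are hypothetical.

```python
import pandas as pd

# Hypothetical cleaned sample.
df = pd.DataFrame({
    "Payment_Mode": ["Cash", "Insurance", "Insurance", "Govt Aid"],
    "Final_Amount": [4500.0, 4800.0, 4400.0, 4700.0],
})

# Rows and average final amount per payment mode.
payment_summary = df.groupby("Payment_Mode").agg(
    visits=("Final_Amount", "size"),
    avg_final_amount=("Final_Amount", "mean"),
)

# Share of all rows paid with each mode, in percent.
payment_summary["share_pct"] = (
    payment_summary["visits"] / payment_summary["visits"].sum() * 100
)

print(payment_summary)
```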

👉 Step 10: Discount Analysis
• How many patients received discounts?
• Total amount of discounts given (sum).

👉 Step 11: Timeline Trend
• Plot the number of visits per month.
• Are patients increasing or decreasing over time?
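The monthly counts behind such a plot can be sketched as follows, using six hypothetical visit dates. Comparing each month with the previous one is only one simple way to read the trend; the variable names are illustrative.

```python
import pandas as pd

# Hypothetical visit dates spread over three months.
df = pd.DataFrame({
    "Visit_Date": pd.to_datetime([
        "2024-01-01",
        "2024-02-05", "2024-02-12",
        "2024-03-04", "2024-03-11", "2024-03-18",
    ]),
})

# Number of visits in each calendar month.
visits_per_month = df.groupby(df["Visit_Date"].dt.to_period("M")).size()

# Month-over-month change: positive values mean more visits than the month before.
monthly_change = visits_per_month.diff()

print(visits_per_month)
print(monthly_change)
```

Calling `visits_per_month.plot()` on the result would draw the corresponding line chart when Matplotlib is installed.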

Reporting and Insights

👉 Step 12: Key Findings
• Facility with the most patients.
• Chain organization handling the most cases.
• Most popular payment mode.
• Average discount given.
• Busiest dialysis station.

👉 Step 13: Visualizations (Optional)
• Bar chart: Facility vs Number of Patients
• Pie chart: Payment Mode share
• Line chart: Visits over time

Step    Task                     Output
------  -----------------------  ------------------------
1       Handle Missing Values    Cleaned data
2       Standardize Text         Proper text formatting
3       Correct Data Types       Numeric columns fixed
4       Remove Duplicates        Unique patients
5       Validate Amounts         Logical data
6-11    Analyze                  Business insights
12-13   Report                   Final dashboard
