0% found this document useful (0 votes)
71 views6 pages

Project 1 Healthcare

This document outlines a project focused on analyzing patient data within the healthcare domain, specifically for dialysis patients. It details the data preparation process, including cleaning and correcting errors, as well as the steps for exploratory data analysis and visualization. The final deliverables include a clean dataset and a report summarizing key insights and recommendations based on the analysis.

Uploaded by

neonnoneed
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
71 views6 pages

Project 1 Healthcare

This document outlines a project focused on analyzing patient data within the healthcare domain, specifically for dialysis patients. It details the data preparation process, including cleaning and correcting errors, as well as the steps for exploratory data analysis and visualization. The final deliverables include a clean dataset and a report summarizing key insights and recommendations based on the analysis.

Uploaded by

neonnoneed
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 6

Projects

Analysis Of Patient Data (Domain: Healthcare)


 This project requires learners to analyze the patient data of those suffering from
different diseases across various summaries. The facility, chain organizations, and
dialysis stations analysis is required to be carried out where the patients are
undergoing dialysis. The project also focuses on the payment mode aspect wherein if
any discounts or reduction in payments have happened then those are analyzed.

# Correct the data preparation

# Set number of rows


num_rows = 55

# Define sample data with consistent length


data = {
"Patient_ID": [f"P{str(i).zfill(3)}" for i in range(1, num_rows - 1)] + [None],
"Facility_Name": ["CityCare Hosp", "MediLife Center", "Wellness Clinic", None, "CityCare
Hosp"] * 11,
"Chain_Organization": ["HealthPlus", "Independent", None, "HealthPlus", "MediLife
Group"] * 11,
"Dialysis_Station_ID": ["Station_" + str(random.randint(1, 5)) for _ in range(num_rows)],
"Disease_Type": ["Kidney Failure", "Acute Kidney Disease", "Chronic Kidney Disease",
None, "Kidney Failure"] * 11,
"Payment_Mode": ["Cash", "Insurance", "Govt Aid", None, "Insurance"] * 11,
"Total_Cost": [random.choice([5000, 4800, 5200, None, 5100, "Five Thousand", "4,800"])
for _ in range(num_rows)],
"Discount_Applied": [random.choice([0, 500, 700, None, "Five Hundred", 600]) for _ in
range(num_rows)],
"Final_Amount": [random.choice([4500, 4800, 4700, 4400, None, "Four Thousand Five
Hundred"]) for _ in range(num_rows)],
"Visit_Date": pd.date_range(start="2024-01-01", periods=num_rows, freq="7D").tolist()
}

# Create DataFrame
df = pd.DataFrame(data)
# Manually inject more errors
df.loc[5, 'Facility_Name'] = "medilife center" # Lowercase error
df.loc[10, 'Payment_Mode'] = "cash" # lowercase
df.loc[15, 'Total_Cost'] = "five thousand" # text instead of number
df.loc[20, 'Discount_Applied'] = "six hundred" # text
df.loc[25, 'Final_Amount'] = "four thousand five hundred" # text
df.loc[30, 'Disease_Type'] = "kidney failure" # lowercase
df.loc[35, 'Chain_Organization'] = None # missing value

# Save to CSV
csv_path = '/mnt/data/patient_dialysis_data_with_errors_final.csv'
df.to_csv(csv_path, index=False)
csv_path

1. Dataset (Expected Columns)


Column Name Description

Patient_ID Unique ID of each patient

Facility_Name Hospital/facility name

Chain_Organization Organization that owns the facility

Dialysis_Station_ID Station where dialysis is done

Disease_Type Type of kidney disease

Payment_Mode Payment method (Cash, Insurance, Govt Aid)

Total_Cost Original treatment cost

Discount_Applied Discount amount given

Final_Amount Final amount after discount

Visit_Date Date of visit

2. Solution Guide — Step-by-Step


Step 1: Understand the Problem
 You need to analyze patient data, dialysis facilities, chains, stations, and payment
modes.
 You need to find patterns, summaries, and insights.

Step 2: Load and Clean the Data


 Load Data: Load CSV or Excel file into Python (Pandas) or Power BI/Excel.
 Check Null Values: If missing values exist, decide to fill or remove them.
 Correct Data Types: Ensure cost columns are numbers, dates are dates, etc.
 Remove Duplicates: Check and clean.
Tools: Pandas, Excel, or Power Query (Power BI).

Step 3: Exploratory Data Analysis (EDA)


Here you find answers to important questions:

✅ Facility Level

 How many patients per facility?


 Which facility has the highest number of dialysis sessions?

✅ Chain Organization Level

 How many facilities belong to each chain?


 Compare the patient numbers between chains.

✅ Dialysis Station Level

 Which stations are used most?


 Are some stations underutilized?

✅ Payment Mode Analysis

 How are patients paying? (cash, insurance, govt aid, etc.)


 What percentage received discounts?
 Average discount amount?

✅ Revenue Analysis

 Total revenue = sum(Final_Amount)


 Revenue lost due to discounts = sum(Discount_Applied)
Tools: Pandas groupby, pivot tables, bar charts.

Step 4: Visualization
Create graphs and charts like:
 Bar chart of patients per facility
 Pie chart of payment mode distribution
 Line chart showing monthly revenue
 Heatmap of station usage
Tools: Power BI, Tableau, Matplotlib, or Excel Charts.

Step 5: Insights and Recommendations


Write a summary report with conclusions, such as:
 "Facility X handles 30% of all dialysis cases."
 "Chain Y has the highest discount application, affecting revenue."
 "Most payments happen via insurance — 60%."
 "Station 4 and 5 are underused — optimization needed."

Step 6: Final Deliverables


You can submit:
 A clean Excel file or Python notebook (Jupyter) or Power BI Dashboard.
 A PDF report or PowerPoint summarizing the key insights and charts.
 Optional: a few recommendations for management based on findings.

I even added mistakes/errors like:


 Null values
 Wrong spellings
 Data types mistakes (text instead of numbers)

 document all their assumptions.


(Example: If filling missing payment modes with "Unknown", write it clearly.)

Data Cleaning Steps

👉 Step 1: Handle Missing Values

 Identify missing data (nulls) in key columns.


 Fill or remove rows depending on analysis needs.
o Example: If Facility_Name or Patient_ID is missing → drop that row.
o If Disease_Type is missing → can fill as 'Unknown'.

👉 Step 2: Standardize Text Data

 Correct lowercase/case issues.


o Example: Change "cash" to "Cash", "medilife center" to "MediLife Center".
 Remove unnecessary spaces.

👉 Step 3: Correct Data Types

 Convert Total_Cost, Discount_Applied, Final_Amount to numeric values.


o Text like "Five Thousand" → convert to 5000
o Remove commas inside numbers like "4,800" → 4800

👉 Step 4: Check Duplicates

 Check if any Patient_ID is duplicated.


 Keep only the latest Visit_Date record if needed.

👉 Step 5: Validate Logical Consistency

 Check if:
Final Amount=Total Cost−Discount Applied\text{Final Amount} = \text{Total Cost} -
\text{Discount Applied}Final Amount=Total Cost−Discount Applied
o If not, fix or flag incorrect rows.

Data Analysis Steps

👉 Step 6: Facility-wise Patient Count

 How many patients went to each facility?

👉 Step 7: Chain Organization Summary

 How many facilities under each chain?


 Number of patients handled per chain.

👉 Step 8: Dialysis Station Usage

 Number of treatments per Dialysis_Station_ID.


 Find busiest stations.

👉 Step 9: Payment Mode Analysis

 How many patients used Cash, Insurance, Govt Aid?


 Average Final Amount per payment mode.

👉 Step 10: Discount Analysis

 How many patients received discounts?


 Total amount of discounts given (sum).

👉 Step 11: Timeline Trend

 Plot number of visits per month.


 Are patients increasing or decreasing over time?

Reporting and Insights

👉 Step 12: Key Findings

 Facility with most patients.


 Chain organization handling maximum cases.
 Popular payment mode.
 Average discount given.
 Busiest dialysis station.

👉 Step 13: Visualizations (Optional)

 Bar chart: Facility vs Number of Patients


 Pie chart: Payment Mode share
 Line chart: Visits over time

Step Task Output

1 Handle Missing Values Cleaned data

2 Standardize Text Proper text formatting

3 Correct Data Types Numeric columns fixed

4 Remove Duplicates Unique patients

5 Validate Amounts Logical data

6-11 Analyze Business insights

12-13 Report Final dashboard

You might also like