Real Estate Data Insights

This document analyzes housing data from India. It begins by importing necessary libraries and reading in a CSV file containing housing data. Some initial data preprocessing steps are performed, including handling missing values, dropping unnecessary features, and encoding categorical features. Exploratory data analysis is conducted through visualizations of categorical and numerical features. Correlations between features are also analyzed through a heatmap. The goal of this analysis is to understand patterns in the Indian housing data.

Uploaded by

mellouk ayoub


IndianHousingAnalysis By Ahmad Raza

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv('IndianHouses.csv')

df.head()

   Area   BHK  Bathroom  Furnishing      Locality                                           Parking  Price     Status         Transaction   Type           Per_Sqft
0  800.0  3    2.0       Semi-Furnished  Rohini Sector 25                                   1.0      6500000   Ready_to_move  New_Property  Builder_Floor  NaN
1  750.0  2    2.0       Semi-Furnished  J R Designers Floors, Rohini Sector 24             1.0      5000000   Ready_to_move  New_Property  Apartment      6667.0
2  950.0  2    2.0       Furnished       Citizen Apartment, Rohini Sector 13                1.0      15500000  Ready_to_move  Resale        Apartment      6667.0
3  600.0  2    2.0       Semi-Furnished  Rohini Sector 24                                   1.0      4200000   Ready_to_move  Resale        Builder_Floor  6667.0
4  650.0  2    2.0       Semi-Furnished  Rohini Sector 24 carpet area 650 sqft status R...  1.0      6200000   Ready_to_move  New_Property  Builder_Floor  6667.0

Data Preprocessing Part 1

df.select_dtypes(include='object').nunique()

Furnishing 3
Locality 365
Status 2
Transaction 2
Type 2
dtype: int64

df['Locality']

0 Rohini Sector 25
1 J R Designers Floors, Rohini Sector 24
2 Citizen Apartment, Rohini Sector 13
3 Rohini Sector 24
4 Rohini Sector 24 carpet area 650 sqft status R...
...
1254 Chittaranjan Park
1255 Chittaranjan Park
1256 Chittaranjan Park
1257 Chittaranjan Park Block A
1258 Chittaranjan Park
Name: Locality, Length: 1259, dtype: object

# We don't need the full locality string, only its main part, so split it and keep one token
df['Locality'].str.split(' ').str[1]

0 Sector
1 R
2 Apartment,
3 Sector
4 Sector
...
1254 Park
1255 Park
1256 Park
1257 Park
1258 Park
Name: Locality, Length: 1259, dtype: object

df['Locality'] = df['Locality'].str.split(' ').str[1]
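Taking the second whitespace-separated token yields values such as `Sector`, `R`, and `Apartment,`, which do not identify a neighbourhood. A minimal sketch of one alternative heuristic, which is not the notebook's method: keep the text after the last comma and take its first word. The sample strings come from the output above; the helper name `coarse_locality` is hypothetical.

```python
import pandas as pd

def coarse_locality(locality: pd.Series) -> pd.Series:
    """Heuristic: last comma-separated segment, then its first word."""
    last_segment = locality.str.split(',').str[-1].str.strip()
    return last_segment.str.split().str[0]

sample = pd.Series([
    'Rohini Sector 25',
    'J R Designers Floors, Rohini Sector 24',
    'Citizen Apartment, Rohini Sector 13',
    'Chittaranjan Park Block A',
])
print(coarse_locality(sample).tolist())
```

This heuristic still truncates two-word names such as Chittaranjan Park, so it would need checking against the full column before use.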

# Splitting the string leaves fewer unique values; plot the 50 most frequent to inspect them
plt.figure(figsize=(10,5))
df['Locality'].value_counts().head(50).plot(kind='bar')
plt.show()
df['Locality'].nunique()

119

# The number of unique values is lower than before but still too high to handle, so we drop the column
df.drop('Locality',axis=1,inplace=True)
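Dropping the column discards all location information. As a hedged alternative, not what the notebook does, the most frequent labels could be kept and the rest merged into a single bucket. The sketch below assumes a hypothetical helper `keep_top_n`; the toy series reuses labels seen in the output above with made-up counts.

```python
import pandas as pd

def keep_top_n(values: pd.Series, n: int, other: str = 'Other') -> pd.Series:
    """Keep the n most frequent labels and replace the rest with `other`."""
    top = values.value_counts().nlargest(n).index
    return values.where(values.isin(top), other)

# Made-up counts for illustration only, not taken from the dataset.
toy = pd.Series(['Park', 'Park', 'Park', 'Sector', 'Sector', 'R', 'Apartment,'])
reduced = keep_top_n(toy, n=2)
print(reduced.value_counts())
```

Note that missing values also fall into the `other` bucket, because `isin` returns False for NaN.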

Handle Missing Values

df.isnull().sum()

Area 0
BHK 0
Bathroom 2
Furnishing 5
Parking 33
Price 0
Status 0
Transaction 0
Type 5
Per_Sqft 241
dtype: int64

# Fill NaN values with the mean because Per_Sqft is a continuous numerical feature

df['Per_Sqft'] = df['Per_Sqft'].fillna(df['Per_Sqft'].mean())

# Fill NaN values with the mode because these features are categorical or discrete counts

df['Bathroom'] = df['Bathroom'].fillna(df['Bathroom'].mode()[0])
df['Parking'] = df['Parking'].fillna(df['Parking'].mode()[0])
df['Furnishing'] = df['Furnishing'].fillna(df['Furnishing'].mode()[0])
df['Type'] = df['Type'].fillna(df['Type'].mode()[0])
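The mean is sensitive to extreme values, so for a skewed feature the median is a common alternative fill value. The sketch below only compares the two on made-up numbers; it makes no claim about how `Per_Sqft` is actually distributed.

```python
import numpy as np
import pandas as pd

# Made-up, right-skewed values with one gap; not taken from the dataset.
per_sqft = pd.Series([5000.0, 6000.0, 6667.0, np.nan, 40000.0])

mean_filled = per_sqft.fillna(per_sqft.mean())
median_filled = per_sqft.fillna(per_sqft.median())

# The single large value pulls the mean fill far above the median fill.
print(mean_filled[3], median_filled[3])
```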

Handle Data Types

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1259 entries, 0 to 1258
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Area 1259 non-null float64
1 BHK 1259 non-null int64
2 Bathroom 1259 non-null float64
3 Furnishing 1259 non-null object
4 Parking 1259 non-null float64
5 Price 1259 non-null int64
6 Status 1259 non-null object
7 Transaction 1259 non-null object
8 Type 1259 non-null object
9 Per_Sqft 1259 non-null float64
dtypes: float64(4), int64(2), object(4)
memory usage: 98.5+ KB

df['Bathroom'] = df['Bathroom'].astype('int')

df['Parking'] = df['Parking'].astype('int')
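These casts succeed only because the missing values were filled first. A small sketch with made-up values shows what happens otherwise, and how the nullable `Int64` dtype could keep a gap if filling were not wanted.

```python
import numpy as np
import pandas as pd

bathrooms = pd.Series([2.0, np.nan, 3.0])  # made-up values

try:
    bathrooms.astype('int')
    cast_failed = False
except ValueError:
    # pandas refuses to cast NaN to a plain integer dtype
    cast_failed = True

# The nullable extension dtype keeps the gap as <NA> instead.
nullable = bathrooms.astype('Int64')
print(cast_failed, nullable.tolist())
```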

Exploratory Data Analysis

cat_vars = ['Furnishing', 'Status', 'Transaction', 'Type']
num_cols = len(cat_vars)

fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(15, 10))
axs = axs.flatten()

# One count plot per categorical feature
for i, var in enumerate(cat_vars):
    sns.countplot(y=var, data=df, ax=axs[i])
    axs[i].set_title(var)

# Remove any unused axes
if num_cols < len(axs):
    for i in range(num_cols, len(axs)):
        fig.delaxes(axs[i])

fig.tight_layout()
plt.show()

# Names of the numeric columns
int_vars = df.select_dtypes(include=['int', 'float']).columns.tolist()
num_cols = len(int_vars)

fig, axs = plt.subplots(nrows=3, ncols=2, figsize=(15, 10))
axs = axs.flatten()

# One histogram per numeric feature
for i, var in enumerate(int_vars):
    df[var].plot(kind='hist', ax=axs[i])
    axs[i].set_title(var)

# Remove any unused axes
if num_cols < len(axs):
    for i in range(num_cols, len(axs)):
        fig.delaxes(axs[i])

fig.tight_layout()
plt.show()

# Exploratory data analysis with KDE (kernel density estimation)

num = df.select_dtypes(include=['int', 'float']).columns.tolist()
col = len(num)

fig, axs = plt.subplots(nrows=col, ncols=2, figsize=(15, 20))
axs = axs.flatten()

# One histogram with a KDE overlay per numeric feature
for i, var in enumerate(num):
    sns.histplot(data=df, x=var, kde=True, ax=axs[i])
    axs[i].set_title(var)

# Remove the unused axes
if col < len(axs):
    for i in range(col, len(axs)):
        fig.delaxes(axs[i])

fig.tight_layout()
plt.show()
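With `nrows=col, ncols=2` the grid holds twice as many axes as there are plots, and half are then deleted. A minimal sketch of sizing the grid to the number of plots instead; `grid_shape` is a hypothetical helper, not part of the notebook.

```python
import math

def grid_shape(n_plots: int, ncols: int = 2) -> tuple[int, int]:
    """Return the smallest (nrows, ncols) grid that holds n_plots axes."""
    nrows = math.ceil(n_plots / ncols)
    return nrows, ncols

print(grid_shape(6))  # six numeric columns, as listed by df.info() above
print(grid_shape(4))  # four categorical columns
```

Unpacking the result into `plt.subplots(nrows=..., ncols=...)` would leave at most one spare axis to delete when the number of plots is odd.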

# Exploratory data analysis with box plots to identify outliers
num = df.select_dtypes(include=['int', 'float']).columns.tolist()
col = len(num)

fig, axs = plt.subplots(nrows=col, ncols=2, figsize=(15, 20))
axs = axs.flatten()

# One box plot per numeric feature
for i, var in enumerate(num):
    sns.boxplot(data=df, x=var, ax=axs[i])
    axs[i].set_title(var)

# Remove the unused axes
if col < len(axs):
    for i in range(col, len(axs)):
        fig.delaxes(axs[i])

fig.tight_layout()
plt.show()
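Box plots flag points beyond the whiskers, which by default extend 1.5 times the interquartile range past the quartiles. The sketch below applies that same rule numerically so the outliers can be counted or filtered; the helper name `iqr_outlier_mask` is hypothetical, and the prices are the five shown by `df.head()`.

```python
import pandas as pd

def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Prices from the first five rows shown by df.head() above.
prices = pd.Series([6500000, 5000000, 15500000, 4200000, 6200000])
print(prices[iqr_outlier_mask(prices)].tolist())
```

On these five rows only the 15500000 listing falls outside the fences; the result on the full column may differ.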

# EDA against the dependent feature, Price
cat = ['Furnishing', 'Status', 'Transaction', 'Type']
col = len(cat)

fig, axs = plt.subplots(nrows=col, ncols=2, figsize=(15, 15))
axs = axs.flatten()

# Price per category for each categorical feature
for i, var in enumerate(cat):
    sns.barplot(x='Price', y=var, data=df, ax=axs[i])
    axs[i].set_title(var)

# Remove the unused axes
if col < len(axs):
    for i in range(col, len(axs)):
        fig.delaxes(axs[i])

fig.tight_layout()
plt.show()
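By default `sns.barplot` draws the mean of the numeric variable for each category, so each bar length corresponds to a group mean. A minimal sketch of reading those numbers directly with `groupby`, using only the five rows shown by `df.head()`:

```python
import pandas as pd

# First five rows from df.head() above (Furnishing and Price only).
head = pd.DataFrame({
    'Furnishing': ['Semi-Furnished', 'Semi-Furnished', 'Furnished',
                   'Semi-Furnished', 'Semi-Furnished'],
    'Price': [6500000, 5000000, 15500000, 4200000, 6200000],
})

# Mean price per furnishing category, the quantity the bars represent.
mean_price = head.groupby('Furnishing')['Price'].mean()
print(mean_price)
```

Five rows are far too few to draw conclusions; the same call on the full `df` would give the values plotted above.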

Data Preprocessing Part 2

# Print the unique values of each object-dtype column before encoding them
for col in df.select_dtypes(include='object').columns:
    print(f'{col}: {df[col].unique()}')

Furnishing: ['Semi-Furnished' 'Furnished' 'Unfurnished']
Status: ['Ready_to_move' 'Almost_ready']
Transaction: ['New_Property' 'Resale']
Type: ['Builder_Floor' 'Apartment']

# Encode every object-dtype column with LabelEncoder

from sklearn import preprocessing

for col in df.select_dtypes(include=['object']).columns:
    label_encoder = preprocessing.LabelEncoder()
    label_encoder.fit(df[col].unique())
    df[col] = label_encoder.transform(df[col])
    print(f'{col} : {df[col].unique()}')

Furnishing : [1 0 2]
Status : [1 0]
Transaction : [0 1]
Type : [1 0]
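The printed codes follow the sorted order of the labels, which is how `LabelEncoder` assigns them. The sketch below reproduces the `Furnishing` mapping with `pd.factorize(sort=True)` purely to make the label-to-code correspondence visible; it is an illustration, not a replacement for the encoder used above.

```python
import pandas as pd

furnishing = pd.Series(['Semi-Furnished', 'Furnished', 'Unfurnished'])

# sort=True assigns codes in sorted label order, like LabelEncoder does.
codes, labels = pd.factorize(furnishing, sort=True)
mapping = dict(zip(labels, range(len(labels))))

print(codes.tolist())
print(mapping)
```

Integer codes imply an ordering between categories, which is worth keeping in mind when reading the correlation values for these columns.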

# Correlation heatmap
plt.figure(figsize=(15, 10))
sns.heatmap(df.corr(), fmt='.2g', annot=True)
plt.show()
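Since `Price` is the dependent feature, the most relevant part of the heatmap is its `Price` row. A hedged sketch of pulling that row out and ranking the other features by absolute correlation, using only the numeric values from the five rows shown by `df.head()`:

```python
import pandas as pd

# Numeric columns from the first five rows of df.head() above.
head = pd.DataFrame({
    'Area': [800.0, 750.0, 950.0, 600.0, 650.0],
    'BHK': [3, 2, 2, 2, 2],
    'Price': [6500000, 5000000, 15500000, 4200000, 6200000],
})

# Correlation of every other column with Price, strongest first by magnitude.
corr_with_price = head.corr()['Price'].drop('Price')
order = corr_with_price.abs().sort_values(ascending=False).index
ranked = corr_with_price.reindex(order)
print(ranked)
```

Correlations computed on five rows are not meaningful in themselves; the same three lines applied to the full `df` would summarise the heatmap above.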
