18BCS115

house price predictor using ml

Uploaded by

Aditya Prasad

House Price Prediction

ENGINEERING CLINIC PROJECT REPORT


Submitted by
SHIVANESH. B (18BCS074)
MOHAMED RIFAY. M (18BCS103)
VASANTH. M (18BCS115)
VIJAY V (18BCS116)
SUNIL KUMAR T (18BCS121)

Under the guidance of

Dr.R.Kalaiselvi, AP-III/ISE

In partial fulfilment of the requirements for the award of the degree

of

BACHELOR OF ENGINEERING

in

COMPUTER SCIENCE AND ENGINEERING


KUMARAGURU COLLEGE OF TECHNOLOGY
COIMBATORE-641 049
(An Autonomous Institution Affiliated to Anna University, Chennai)

December 2020

Verified by

(Dr.R.Kalaiselvi)
TABLE OF CONTENTS

CHAPTER NO.    TITLE

               ABSTRACT
1              INTRODUCTION
1.1            CONCEPTUAL STUDY OF THE PROJECT
1.2            OBJECTIVES OF THE PROJECT
1.3            SCOPE OF THE PROJECT
2              LITERATURE REVIEW
2.1            LITERATURE REVIEW OF JOURNAL
2.2            DATA DESCRIPTION
3              PROBLEM DEFINITION
4              DATA VISUALIZATION
               PROPOSED SYSTEM
4.1            METHODOLOGY
4.2            IMPLEMENTATION
5              CONCLUSION

ABSTRACT
A house price index usually summarizes price changes across residential
housing. Predicting the price of a single-family house, however, needs a more
accurate method based on location, house type, size, build year, local
amenities, and other factors that affect housing demand and supply. Given a
limited dataset and limited data features, this paper examines a practical,
composite approach to data pre-processing and creative feature engineering.

1. INTRODUCTION

House price prediction is expected to help people who plan to buy a house learn
the likely future price range so that they can plan their finances well.
House price predictions also help property investors understand the trend of
housing prices in a given location.

The goal of this statistical analysis is to help us understand the relationship
between house features and how these variables are used to predict house price.

Buying a home is a dream come true for many, but it can become a nightmare if
proper procedure is not followed. There have been many cases where buyers
have had to let go of a property because it was agricultural land, the title
was disputed, or there were pending dues.

Surprisingly, people in the metros also fall prey to sellers, leading to risky
deals in which the buyer ends up losing hard-earned money and peace of mind.
Most buyers are attracted by the lucrative offers made by property brokers,
mostly for new launches. It is therefore imperative that buyers examine every
aspect of a property deal before making a final decision.

1.1 CONCEPTUAL STUDY OF THE PROJECT

Everybody needs a roof over their head. It can be a house, a villa, or a flat.
Everybody, at some point in life, faces the choice of whether to buy a house
and, if so, which one. And why are they so expensive?

Regardless of the motives for buying a house, both sides must agree on a price.
It is always good to know how much a house is worth, that is, the expected
transaction price. It may be even more important to know why the price is what
it is and what influences it.

In this work, we want to answer both questions, with a stronger emphasis on the
latter. This paper is intended as a comprehensive use case showing data
scientists how to deal with a regression problem. Let us start with a few
problems that help define and understand house pricing.

The seller does not know how to increase the value of the apartment so that the
investment outlay is lower than the added value (e.g. building a pool may
increase the price, while renovating the bathroom may not be worth it).

The seller does not know how much to sell the apartment for (he posts an offer
on the portal without knowing whether the price is adequate).

The buyer does not know how much the apartment is worth (as above, whether the
price is adequate).

1.2 OBJECTIVES OF THE PROJECT

• Predict the house price.

• Compare two different models by how well each minimizes the difference
between the predicted and actual price.
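
One common way to quantify "the difference between predicted and actual price"
is the root mean squared error (RMSE). The report does not state which metric
was used, so the sketch below is an assumption; the function name, the prices,
and the two sets of predictions are all hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical sale prices and predictions from two candidate models.
actual_prices = [200000, 150000, 320000]
model_a_preds = [210000, 140000, 330000]
model_b_preds = [190000, 180000, 300000]

scores = {
    "model_a": rmse(actual_prices, model_a_preds),
    "model_b": rmse(actual_prices, model_b_preds),
}
# The model with the lower RMSE fits these prices more closely.
best_model = min(scores, key=scores.get)
```

A lower RMSE means predictions sit closer to the actual prices on average, and
large misses are penalized more heavily than small ones.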

1.3 SCOPE OF THE PROJECT
o Data Visualization
o Removing outliers
o Data preprocessing
o Prediction of sale price

2. LITERATURE REVIEW
Trends in housing prices reflect the current economic situation and are a
concern to both buyers and sellers. Many factors affect house prices, such as
the number of bedrooms and bathrooms. House price also depends on location: a
house with good access to highways, schools, malls, and employment
opportunities would command a higher price than a house without such access.
Predicting house prices manually is difficult and generally not very accurate,
so many systems have been developed for house price prediction. Sifei Lu,
Zengxiang Li, Zheng Qin, Xulei Yang, and Rick Siow Mong Goh [1] proposed an
advanced house price prediction system using linear regression. The aim of this
system was to build a model that gives a good house price prediction based on
other variables. They applied linear regression to the Ames dataset and
reported good accuracy. The house price prediction project had two modules,
Admin and User. The Admin can add and view locations and has the authority to
add density on a per-unit-area basis. Users can view a location and see the
predicted housing price for it.

2.2 DATA DESCRIPTION
Parameters                       Description                                          Datatype
OverallQual                      Rates the overall material and finish of the house   Numerical
YearBuilt                        Original construction date                           Numerical
TotalBsmtSF                      Total square feet of basement area                   Numerical
GrLivArea                        Above grade (ground) living area square feet         Numerical
FullBath                         Full bathrooms above grade                           Numerical
GarageCars                       Size of garage in car capacity                       Numerical
GarageArea                       Size of garage in square feet                        Numerical
WoodDeckSF                       Wood deck area in square feet                        Numerical
PoolArea                         Pool area in square feet                             Numerical
YrSold                           Year Sold (YYYY)                                     Numerical
SalePrice (Dependent Variable)   Selling price of the house                           Numerical

3. PROBLEM DEFINITION
Ask a home buyer to describe their dream house, and they probably
won't begin with the height of the basement ceiling or the proximity to
an east-west railroad. But this playground competition's dataset proves
that much more influences price negotiations than the number of
bedrooms or a white-picket fence.

With 79 explanatory variables describing (almost) every aspect of
residential homes in Ames, Iowa, this competition challenges you to
predict the final price of each home.
4. DATA VISUALIZATION

Finding the Null values
[Figures omitted]

HEATMAPS
[Figures omitted]

Scatter Plot
[Figures omitted]

Boxplot
[Figures omitted]

4.1. PROPOSED METHODOLOGY

Methodology describes the framework that is followed. It consists of various
milestones that must be reached to fulfil the objective. We applied several
data mining and machine learning concepts.

1) Data Collection:
The dataset used in this project was an open-source dataset from Kaggle. It
consists of 3000 records with 80 parameters that could affect property prices.
Of these 80 parameters, only the 37 considered most likely to affect housing
prices were chosen. These include area in square feet, overall quality (which
rates the overall condition and finish of the house), location, the year in
which the house was built, number of bedrooms and bathrooms, garage area and
the number of cars that fit in the garage, swimming pool area, the year the
house was sold, and the price at which it was sold. Selling price is the
dependent variable and depends on several independent variables. Some
parameters had numerical values and some were ratings. These ratings were
converted to numerical values.
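
The conversion of ratings to numerical values can be sketched as an ordinal
mapping. The report does not list the rating labels or the columns it
converted, so the scale below and the 'KitchenQual' column are assumptions used
only for illustration.

```python
import pandas as pd

# Hypothetical rating scale, ordered from worst to best.
RATING_TO_SCORE = {"Po": 1, "Fa": 2, "TA": 3, "Gd": 4, "Ex": 5}

# Toy frame standing in for one rating column of the dataset.
df = pd.DataFrame({"KitchenQual": ["Gd", "TA", "Ex", "Fa"]})

# Replace each label with its numeric score so a regression model can use it.
df["KitchenQual"] = df["KitchenQual"].map(RATING_TO_SCORE)
```

An ordered mapping keeps the ranking of the labels, which a plain
one-column-per-label encoding would lose.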

2) Data Preprocessing:
Data preprocessing transforms raw, complex data into a systematic,
understandable form. It involves finding missing and redundant data in the
dataset. The entire dataset is checked for NaN values, and any observation that
contains NaN is deleted, which brings uniformity to the dataset. In our
dataset, however, no missing values were found, meaning that every record
contained all of its feature values.
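
A minimal sketch of the NaN check and row deletion described above, using
pandas. The toy frame and its injected missing value are hypothetical; the
column names are taken from the data description table.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing value injected for illustration.
df = pd.DataFrame({
    "GrLivArea": [1710, 1262, np.nan, 1717],
    "GarageCars": [2, 2, 2, 3],
    "SalePrice": [208500, 181500, 223500, 140000],
})

# Count missing values in each column.
missing_per_column = df.isna().sum()

# Drop every observation (row) that contains at least one NaN.
cleaned = df.dropna().reset_index(drop=True)
```

Dropping rows is the simplest option; when many rows have gaps, filling
values in may lose less data.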

3) Data Analysis:
Before applying any model to our dataset, we need to understand its
characteristics. We therefore analyse the dataset and study the different
parameters and the relationships between them. We can also identify the
outliers present in the dataset. Outliers may arise from recording or
measurement errors, and such observations should be excluded from the dataset.
From the analysis we found that one or two outliers exist.

We then examined the general trend of sale price against different parameters.
'GrLivArea' and 'TotalBsmtSF' appear to be linearly related to 'SalePrice'. As
the overall quality and the area of the house rise, the sale price rises too.
However, in our analysis overall quality and number of bathrooms appeared
uncorrelated. Total basement area and ground living area are correlated with
each other. An outlier appears in all the graphs of total basement area. This
outlier could be due to a recording error, so that observation can be excluded.

A correlation coefficient gives the degree of linear association between two
variables. It lies between -1 and +1. A positive value represents a positive
correlation between the two variables, and a negative value a negative
correlation. A value of 0 indicates no linear relationship between the two
variables, although it does not by itself prove that they are independent. The
correlation matrix gives an in-depth view of the correlation among the various
parameters. Blocks with a correlation close to zero correspond to pairs of
parameters that carry largely separate information, so these parameters cannot
be neglected. Values close to +1 or -1 indicate a strong relationship between
two variables, so one of the pair can be dropped to reduce the number of
parameters.
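
The idea of dropping one parameter from each strongly correlated pair can be
sketched with a pandas correlation matrix. The helper name, the 0.9 threshold,
and the toy values are assumptions, not figures from the report.

```python
import numpy as np
import pandas as pd


def drop_highly_correlated(features, threshold=0.9):
    """Drop the later column of every pair whose |correlation| > threshold."""
    corr = features.corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return features.drop(columns=to_drop), to_drop


# Toy data: garage capacity and garage area move together, sale year does not.
features = pd.DataFrame({
    "GarageCars": [1, 2, 2, 3, 1, 2],
    "GarageArea": [250, 480, 500, 750, 240, 510],
    "YrSold": [2008, 2007, 2010, 2006, 2009, 2008],
})

reduced, dropped = drop_highly_correlated(features)
```

Here 'GarageArea' is dropped because it is almost perfectly correlated with
'GarageCars', while 'YrSold' is kept.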

4) Regression

A. Linear regression: Simple linear regression is a statistical method that
allows us to summarize and study the relationship between two continuous
quantitative variables. One variable, denoted x, is regarded as the
predictor, explanatory, or independent variable. The other variable,
denoted y, is regarded as the response, outcome, or dependent variable.
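
As an illustration, a simple linear regression y = slope * x + intercept can be
fitted by ordinary least squares in closed form. The function name and the toy
data (living area against sale price, chosen to be exactly linear) are
hypothetical.

```python
def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data generated from price = 100 * area + 50000.
living_area = [1000, 1500, 2000, 2500]
sale_price = [150000, 200000, 250000, 300000]

slope, intercept = fit_simple_linear_regression(living_area, sale_price)
predicted_price = slope * 1800 + intercept  # prediction for an 1800 sq ft house
```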

B. Multiple regression analysis: Multiple regression analysis is used to check
whether there is a statistically significant association between sets of
variables. It is used to find trends in those sets of data. Multiple
regression analysis is almost the same as simple linear regression. The
only difference between simple linear regression and multiple regression is
in the number of predictors (“x” variables) used in the regression. Simple
regression analysis uses a single x variable for each dependent “y”
variable, for example (x1, Y1). Multiple regression uses multiple “x”
variables for each dependent “y” value: ((x1)1, (x2)1, (x3)1, Y1). In
one-variable linear regression, you would input one dependent variable
(e.g. “sales”) against one independent variable (e.g. “profit”). But you
might be interested in how different types of sales affect the regression.
You could set your X1 as one type of sales, your X2 as another type of
sales, and so on.
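
A minimal sketch of multiple regression with two predictors, solved by least
squares with NumPy. This is not the report's implementation, which is not
recoverable from the text; the toy values and the coefficients used to
generate them are hypothetical.

```python
import numpy as np

# Toy predictors: columns are GrLivArea (sq ft) and OverallQual.
X = np.array([
    [1000, 5],
    [1500, 6],
    [2000, 7],
    [2500, 8],
    [1800, 5],
    [1200, 7],
], dtype=float)

# Prices generated from 20000 + 80 * area + 10000 * quality, so the fit is exact.
y = 20000 + 80 * X[:, 0] + 10000 * X[:, 1]

# Prepend a column of ones so the model also learns an intercept.
X_design = np.column_stack([np.ones(len(X)), X])

# Solve min ||X_design @ coef - y||^2 for the coefficient vector.
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, w_area, w_quality = coef
```

Each coefficient estimates how much the price changes when that predictor
rises by one unit while the other predictors stay fixed.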

4.2. IMPLEMENTATION

5. CONCLUSION

In this paper, we used machine learning algorithms to predict house prices.
We described the step-by-step procedure for analysing the dataset and
finding the correlation between the parameters. This lets us select
parameters that are not strongly correlated with each other.

REFERENCES

[1] R. J. Shiller, “Understanding recent trends in house prices and home
ownership,” National Bureau of Economic Research, Working Paper 13553, Oct.
2007. DOI: 10.3386/w13553. [Online]. Available:
http://www.nber.org/papers/w13553.

[2] D. Belsley, E. Kuh, and R. Welsch, Regression Diagnostics: Identifying
Influential Data and Sources of Collinearity. New York: John Wiley, 1980.

[3] J. R. Quinlan, “Combining instance-based and model-based learning,” Morgan
Kaufmann, 1993, pp. 236–243.

[4] S. C. Bourassa, E. Cantoni, and M. Hoesli, “Predicting house prices with
spatial dependence: a comparison of alternative methods,” Journal of Real Estate
Research, vol. 32, no. 2, pp. 139–160, 2010. [Online]. Available:
http://EconPapers.repec.org/RePEc:jre:issued:v:32:n:2:2010:p:139-160.

[5] S. C. Bourassa, E. Cantoni, and M. E. Hoesli, “Spatial dependence, housing
submarkets and house price prediction,” eng, 330; 332/658, 2007, ID:
unige:5737. [Online]. Available: http://archive-ouverte.unige.ch/unige:5737.

[6] N. Pow, E. Janulewicz, and L. Liu, “Applied Machine Learning Project 4:
Prediction of real estate property prices in Montréal,” 2014.

[7] V. Limsombunchai, “House price prediction: hedonic price model vs.
artificial neural network,” in New Zealand Agricultural and Resource Economics
Society Conference, 2004.
