Housing Price Prediction

Similarity Report

Paper name: housing price prediction.docx
Author: Subroto
Word count: 17792 Words
Character count: 107497 Characters
Page count: 63 Pages
File size: 512.7KB
Submission date: Aug 5, 2024 10:49 AM GMT+5:30
Report date: Aug 5, 2024 10:51 AM GMT+5:30

18% Overall Similarity
The combined total of all matches, including overlapping sources, for each database.
14% Internet database
16% Publications database
Crossref database
Crossref Posted Content database

Excluded from Similarity Report
Bibliographic material

Summary
HOUSING PRICE PREDICTION

Submitted in the partial fulfillment of the degree of
BACHELOR OF COMPUTER APPLICATION

BY
Samayak Jain [Enrollment No: 12021004009012]

UNDER THE GUIDANCE OF
PROF. BABLU KUMAR MAJHI

BACHELOR OF COMPUTER APPLICATION
UNIVERSITY OF ENGINEERING & MANAGEMENT, JAIPUR


Approval

This is to certify that the project report “Housing Price Prediction” submitted by Samayak Jain
(Enrollment No: 12021004009012) in partial fulfillment of the requirements of the degree of Bachelor of
Computer Application from University of Engineering and Management, Jaipur was carried out in a
systematic and procedural manner to the best of our knowledge. It is a bona fide work of the candidate and
was carried out under our supervision and guidance during the academic session 2021-2024.

Prof. Bablu Kumar Majhi


Project (BCA)
UEM,

Sayak Pramanik Mukesh Yadav


Pr
HO (BCA) PrDe
UEM, UEM,
ACKNOWLEDGEMENT

The endless thanks goes to Lord Almighty for all the blessings he has showered on me, which
have enabled me to write this last note in my research work. During the period of my research, as
in the rest of my life, I have been blessed by the Almighty with some extraordinary people who have
spun a web of support around me. Words can never be enough in expressing how grateful I am
to those incredible people in my life who made this thesis possible. I would like to attempt to
thank them for making my time during my research in the Institute a period I will treasure. I am
deeply indebted to my research supervisor, Professor Bablu Kumar Majhi, for giving me such an interesting
thesis topic. Each meeting with him added invaluable aspects to the implementation and
broadened my perspective. He has guided me with his invaluable suggestions, lit up the
way in my darkest times, and encouraged me a lot in my academic work.

Samayak Jain
ABSTRACT

Predicting house prices has long been a complex challenge tackled by numerous researchers.
Accurate predictions are essential for informing stakeholders in real estate, shaping
housing policies, and refining real estate appraisals. This project, "Comprehensive Guide to
House Price Prediction," provides an in-depth overview of various strategies used to forecast
house prices. This systematic literature review examines the data types and modeling approaches
employed in research from 1992 to 2021. We meticulously analyzed 93 articles that each
present unique techniques for predicting house prices. These works were evaluated and scored
based on the novelty of their models and data. Our cluster analysis maps the landscape of
property valuation, identifying key trends and shifts in the field. While traditional methods and
conventional data sources still dominate, the field of house price prediction is gradually
embracing more sophisticated techniques and innovative data inputs. Our review highlights
opportunities for integrating advanced data types, such as unstructured and complex spatial data,
and incorporating deep learning and customized methods. These advancements hold the potential
to guide future research and significantly improve the accuracy of house price predictions.
Table of Contents

1. List of
2. Introduction
Related Work
3. Chapter
3.1 Review
4. Chapter
5. Chapter
5.1 Results & Discussion
6. Chapter
6.1 Conclusion
7. Chapter
7.1 Future Work
8. Chapter
8.1 Bibliography
CHAPTER 1
INTRODUCTION

Housing Price Prediction, or valuing residential properties, is a complex challenge because real estate
valuations are influenced by more than just the physical attributes of the building. Factors like location, the
neighborhood, and public perception also play significant roles. Additionally, market prices are driven by
buyers' willingness to pay, adding another layer of complexity to establishing an objective value for residential
properties. Traditionally, specialists like notaries, real estate brokers, and property investors have relied on
years of experience to make these valuations, making the automation of this process a daunting task.

However, Automated Valuation Models (AVMs) can enhance the accuracy of valuations, benefiting buyers,
sellers, notaries, banks, and policymakers. The main challenge lies in the diversity of techniques and kinds of
data utilized in the forecast of home prices, which makes achieving consistent accuracy difficult. Most
researchers agree on using a hedonic approach, which considers variables describing the physical characteristics
of the house. Additionally, incorporating the effects of location is crucial due to spatial dependence (how
nearby house prices influence each other) and spatial heterogeneity (how these influences vary across different
areas).

The first law of geography states, "Everything is related to everything else, but near things are more related than
distant things," highlighting the importance of spatial factors in house price prediction. However, accounting for
these spatial effects is still a contentious issue. Some researchers use proxies like submarkets or distances to
central business districts, while others employ models that take the location into account, like spatial
econometrics, kriging, or spatially varying coefficient models. Another approach involves directly
incorporating location data (longitude and latitude) into machine learning algorithms to detect spatial
patterns.
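
To make the proxy idea concrete, the following is a minimal sketch of deriving a "distance to the central business district" feature from latitude and longitude using the haversine formula. The record layout, the field name `dist_cbd_km`, the helper names, and the coordinates are all hypothetical and chosen only for illustration; they are not taken from any of the reviewed studies.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def add_cbd_distance(houses, cbd_lat, cbd_lon):
    """Return copies of the house records with a 'dist_cbd_km' proxy feature added."""
    return [
        {**h, "dist_cbd_km": haversine_km(h["lat"], h["lon"], cbd_lat, cbd_lon)}
        for h in houses
    ]


# Hypothetical records; the coordinates are illustrative only.
houses = [
    {"id": 1, "lat": 26.90, "lon": 75.80},
    {"id": 2, "lat": 27.90, "lon": 75.80},
]
features = add_cbd_distance(houses, cbd_lat=26.90, cbd_lon=75.80)
```

The derived column can then sit alongside the raw coordinates in a feature table, so that a model may use either the proxy, the coordinates, or both.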

Temporal dependence between house prices is another challenge, which most researchers address through
modeling techniques. While supervised learning is the predominant approach, some researchers have explored
semi-supervised learning for house price prediction.

To address these challenges, our systematic literature review investigates the methods and data types used in
house price prediction, focusing on geospatial components. We identified trends through a cluster analysis of 93
articles based on their proposed methods and input data. The findings reveal an overrepresentation of
conventional models and traditional data types. Nonetheless, recent research has started to explore
more advanced methods and innovative data sources, such as advanced machine learning and deep learning,
often combined with tabular, image, and textual data.

Our investigation concludes with this trend analysis, which identifies important gaps and research opportunities.
The following is the project's structure: The relevant property value research is reviewed in Section 2. Section 3
provides an overview of the approach, which utilizes the PSALSAR framework. Section 4 presents the results.
Finally, Sections 5 and 6 offer an in-depth discussion and conclusions, respectively.

The House Price Index (HPI) is a widely utilized metric for gauging changes in residential housing prices
across various countries. Examples include the US Federal Housing Finance Agency (FHFA) HPI, the
S&P/Case-Shiller price index, the UK National Statistics HPI, the UK Land Registry’s HPI, the UK Halifax HPI,
the UK Rightmove HPI, and URA HPI. The FHFA HPI operates as a weighted, repeat-sales index, which
calculates average price changes based on repeated sales or refinancing of the same properties. This index relies
on analyzing repeat mortgage transactions on single-family properties, with data provided by Fannie Mae or
Freddie Mac since January 1975. By employing various analytical tools, it enables economists to assess shifts in
mortgage default rates, prepayments, and housing affordability within specific regions.

Despite its utility, the HPI is a generalized measure and may not accurately predict the price of individual
houses. Factors such as location, property age, and the number of floors are crucial in predicting specific house
prices. In recent years, machine learning has become an essential tool for more precise house price predictions.
This is because machine learning can utilize various property attributes and not just historical sales data.
Numerous studies have demonstrated the effectiveness of machine learning in this domain. However, many
have focused solely on comparing individual model performances without exploring the potential of
combining different models.

An exception is the work by S. Lu et al., who used a hybrid regression technique for house price forecasting
but required extensive parameter tuning. Recognizing the significance of model combination, this paper
explores the Stacked Generalization approach, a machine learning method aimed at optimizing
prediction accuracy. Using the "Housing Price in Beijing" dataset from Kaggle, we validated the
performance of multiple models. Our findings showed that the Stacked Generalization method yielded the lowest
Root Mean Squared Logarithmic Error (RMSLE) of 0.16350 on the test set.
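
For readers unfamiliar with the metric, RMSLE is commonly defined as the square root of the mean of (log(1 + predicted) - log(1 + actual))^2 over all test samples. The sketch below implements that standard definition in plain Python; the function name and the sample prices are illustrative assumptions, and it is not the evaluation code behind the 0.16350 figure.

```python
import math


def rmsle(actual, predicted):
    """Root Mean Squared Logarithmic Error for non-negative values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2  # log1p(x) = log(1 + x)
        for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# Hypothetical prices, for illustration only.
actual = [300_000, 450_000, 520_000]
predicted = [310_000, 430_000, 500_000]
score = rmsle(actual, predicted)
```

Because the error is computed on logarithms, RMSLE penalizes relative rather than absolute deviations, which suits prices that span several orders of magnitude.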
The structure of this paper is as follows: Section 2 details the methodology, Section 3 presents a comparison of
the results, and Section 4 discusses the findings, draws conclusions, and suggests directions for future research.

In today’s competitive real estate market, organizations strive to gain an edge over their competitors.
Simplifying the house pricing process for everyday users while delivering accurate results is essential. This
paper introduces a system that predicts house prices using a regression machine learning algorithm. If you’re
looking to sell a house, knowing the right price to list it at is crucial, and a computer algorithm can provide
a precise estimate. This regression model is designed to predict the prices not only of houses ready for sale but
also of those under construction.

Regression is a machine learning tool that helps make predictions by learning the relationship between a target
parameter and various independent parameters from existing statistical data. In this context, a house price
depends on factors such as the number of rooms, living area, and location. By applying machine learning to
these parameters, we can estimate house values in a specific region.
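
As a one-feature illustration of what "learning the relationship" means, the sketch below fits an ordinary least squares line relating living area to price and uses it to estimate a new house. The data values and function names are hypothetical, and a single feature is used only to keep the arithmetic visible; the system described in this report uses more features and a different model.

```python
def fit_simple_regression(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the feature must vary across observations")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Estimate the target for a new feature value using the fitted line."""
    return intercept + slope * x


# Hypothetical training data: living area (square metres) and sale price.
living_area = [50, 80, 100, 120]
price = [110_000, 170_000, 210_000, 250_000]

slope, intercept = fit_simple_regression(living_area, price)
estimate = predict(slope, intercept, 90)
```

Here the slope is the estimated price change per additional square metre, which is the kind of relationship a regression model extracts from existing data.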

The target feature is the price of the property, while the independent features include the number of
bedrooms, number of bathrooms, carpet area, built-up area, floor number, age of the property, zip
code, and the property’s latitude and longitude. In addition to these commonly used features, we have
incorporated air quality and crime rate, as higher values in these features typically lead to lower house prices.

The entire implementation is executed with Python as the programming language. For constructing the
predictive model, we utilize a Decision Tree Regressor from the “Scikit-learn” machine learning library. Grid
Search CV is employed to determine the optimal max-depth value for constructing the decision tree. Once the
model is trained, it is integrated with a user interface using Flask, a Python web framework.
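
The tuning step described above can be sketched as follows, assuming scikit-learn and NumPy are installed. The data here is synthetic and the feature columns, grid values, fold count, and scoring choice are assumptions made for illustration, not the project's actual dataset or settings; the Flask integration is omitted.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): living area, bedrooms, property age.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(40, 200, size=300),   # living area in square metres
    rng.integers(1, 6, size=300),     # number of bedrooms (1 to 5)
    rng.uniform(0, 50, size=300),     # age of the property in years
])
y = 2_000 * X[:, 0] + 15_000 * X[:, 1] - 500 * X[:, 2] + rng.normal(0, 5_000, size=300)

# Candidate tree depths to try; the grid itself is an illustrative choice.
param_grid = {"max_depth": [2, 4, 6, 8, 10]}
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    cv=5,                               # 5-fold cross-validation
    scoring="neg_mean_squared_error",   # higher (less negative) is better
)
search.fit(X, y)

best_depth = search.best_params_["max_depth"]
model = search.best_estimator_          # refit on all data with the best depth
sample_prediction = model.predict(X[:1])
```

Limiting `max_depth` is a simple guard against overfitting: a tree that is too deep memorizes individual sales, while one that is too shallow cannot separate distinct price segments.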

This approach ensures that the system is user-friendly and capable of providing reliable house price predictions,
thus aiding both sellers and buyers in making informed decisions. In this report, we present our system, "House
Price Prediction Using Machine Learning." Alongside fundamental needs like food and water, having a place
to call home is one of a person's most basic desires. In the real estate industry, accurately predicting house
prices is crucial as it helps buyers and sellers make informed decisions. With advancements in machine
learning, numerous algorithms have been developed to predict property prices accurately.

Our research leverages a dataset of real estate properties and employs XGBoost, an advanced gradient boosting
technique, to forecast house values. XGBoost is a powerful algorithm known for effectively managing
structured datasets and has demonstrated excellent performance in forecasting on complex datasets. It is
frequently used in machine learning competitions due to its robustness.

The goal of house price prediction is to develop a model that can accurately estimate the price of a new house
based on its attributes. This is achieved by using historical data on house features (such as square footage, the
number of bedrooms and bathrooms, and location) and their corresponding prices. In this project, we applied
five algorithms: linear regression, support vector machine, Lasso regression, Random Forest, and XGBoost, to
predict house prices using a dataset of real estate properties.

XGBoost, in particular, is highly effective for this purpose as it can handle a large number of features and
capture complex relationships between the features and the target (price). By assessing the performance
of these algorithms, especially XGBoost, we aim to provide a reliable method for house price prediction that
can benefit both buyers and sellers in the real estate market.

Machine learning (ML) algorithms are increasingly being used in automated valuation models
and in mass real estate appraisals. These mass appraisals involve standardized procedures that collect data
from real estate listings to estimate the value of large groups of properties, ensuring that the appraisals are
conducted impartially and consistently. Utilizing these technologies has the benefit of enabling rapid
execution of numerous valuations at a low cost per valuation. Mass appraisals using automated
systems are commonly employed for recurring annual taxes, as well as sporadic real estate taxes such as
property transfer taxes, capital taxes, and inheritance and gift taxes. They are also used in banking (for loans
and mortgage risk assessment), and by real estate marketers.

Recently, machine learning methods have been applied to estimate house prices, making these methods
relatively new in this field. For instance, Antipov et al. used the random forest (RF) technique for appraising
2,695 properties in St. Petersburg, Russia. They also put into practice methods like k-nearest neighbors (K-NN),
chi-squared automatic interaction detector (CHAID), and classification and regression tree (CART). Their
results demonstrated that these techniques are highly effective, even when dealing with significant
heteroscedasticity, categorical variables, outliers, and incorrect data.

There are various approaches to applying ML algorithms to appraisals of real estate. One approach compares
the performance of classical regression models, mainly hedonic price models (HPMs), against
models using different methods such as decision trees, logistic regression, Bayesian methods, and
more. Studies consistently show that HPMs are less effective in predicting house prices compared to ML
algorithms. Another approach focuses on identifying which ML algorithms best predict real estate prices.
Despite the extensive research, there is no consensus on which ML algorithm is most suitable for predicting
house prices. However, there is general agreement that ML algorithms outperform traditional linear models.
There is a dearth of research on the COVID-19 pandemic's effects on home values in Spain, particularly
small-scale studies that make use of microdata. Renigier-Biłozor et al. and other authors stress the necessity of
integrating new automated technology in addition to conventional real estate assessment techniques.

Using big datasets, the study examined how well various ensemble learning algorithms performed in
predicting home prices: bagging algorithms (Random Forest (RF) and Extra-Trees Regressor (ETR)) and boosting
methods (Gradient Boosting Regressor (GBR), Extreme Gradient Boosting (XGBM), and Light Gradient Boosting
Machine).

Chapter 2
Related Work
Although few reviews on valuations for residential properties have been released, the field has seen significant
advancements in both the methods and the types of data used. A previous review analyzed existing literature
and identified three main trends. The initial trend highlights the application of spatial techniques, embodied
in the catchphrase "location, location, location," which take into account spatial heterogeneity as well as spatial
dependence. Spatial dependence means that the prices of nearby properties are related, while spatial heterogeneity
indicates that the relationship between property value (the dependent variable) and its influencing factors
(the independent variables) varies by location. The review discusses advanced spatial methods such as spatial
econometric models and geographically weighted regression to tackle these issues.
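
One simple way to see spatial dependence in data is a "spatial lag" feature: for each house, the average price of its nearest neighbours. The sketch below is an illustrative assumption rather than a method taken from the reviewed literature; it uses planar coordinates in kilometres, plain Euclidean distance, and made-up prices.

```python
import math


def spatial_lag_prices(houses, k=2):
    """For each (x, y, price) record, average the prices of its k nearest other houses."""
    if k < 1 or k >= len(houses):
        raise ValueError("k must be between 1 and len(houses) - 1")
    lags = []
    for i, (xi, yi, _) in enumerate(houses):
        # Distance from house i to every other house, paired with that house's price.
        neighbours = sorted(
            (math.hypot(xi - xj, yi - yj), pj)
            for j, (xj, yj, pj) in enumerate(houses)
            if j != i
        )
        nearest = neighbours[:k]
        lags.append(sum(p for _, p in nearest) / k)
    return lags


# Hypothetical houses as (x_km, y_km, price): two clusters far apart.
houses = [
    (0.0, 0.0, 100),
    (0.0, 1.0, 110),
    (1.0, 0.0, 120),
    (10.0, 10.0, 300),
    (10.0, 11.0, 320),
    (11.0, 10.5, 310),
]
lags = spatial_lag_prices(houses, k=2)
```

When a house's own price tracks its spatial lag closely, as in these two clusters, nearby prices are related, which is the pattern that spatial econometric models are designed to capture.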

An earlier review categorizes these advanced spatial methods as third-generation techniques, which go beyond
the manual calculations of first- and second-generation techniques. First-generation methods involve
market segmentation and fitting models built on submarkets, whereas variables used in second-generation
approaches include coordinates, accessibility metrics, and neighborhood demarcation. Fuzzy logic,
autoregressive integrated moving average (ARIMA) models, and artificial neural networks (ANNs) are added
to the list of advanced techniques for appraisal in another review. Recent studies have focused on advanced
learning techniques due to their increasing popularity, including techniques based on geographic information
systems (GIS) and artificial intelligence (AI), as well as reviews specifically on ANNs in real estate appraisal.

In addition to improvements in modeling approaches, other changes noted in the literature include greater
study of land values, the breakdown of property values into land and structural components, and the impact of
sustainability policies like premiums for green buildings. Real estate laws and land values, however, are not
covered by this review of the literature.

To sum up, earlier research mainly concentrated on techniques for forecasting home values; however, this study
presents a more thorough two-dimensional examination, broadening the model viewpoint and giving special
attention to the data dimension. In line with past studies, this study emphasizes the significance of incorporating
geographical data because of the critical role of location in house price prediction.

2.1 Unlocking Property Insights:

The world of real estate is being revolutionized by the advent of advanced technologies, particularly machine
learning. Predicting housing prices with high accuracy has always been a challenge due to the myriad factors
influencing market values. However, machine learning algorithms are now unlocking insights that were
previously beyond reach, transforming how buyers, sellers, and real estate professionals navigate the market.

Machine learning leverages enormous volumes of data to find trends and forecast future events. In the
context of real estate, this entails examining a wide range of factors, including square footage, the number of
bedrooms and bathrooms, location, proximity to amenities, historical price trends, and even less obvious factors
like air quality and crime rates. By processing this data, machine learning models can provide a nuanced
understanding of property values, offering a level of precision that surpasses traditional methods.

One of the most powerful machine learning techniques used in real estate price prediction is XGBoost, an
advanced gradient boosting algorithm. XGBoost excels at handling structured data and capturing complex
interactions between variables, making it particularly well-suited for the multifaceted nature of real estate
markets. Studies have shown that XGBoost and similar algorithms outperform traditional linear
regression models, offering more accurate and reliable predictions.

The advantages of applying artificial intelligence to housing price prediction are manifold. For sellers, these
models can suggest optimal listing prices that maximize returns while minimizing the time a property spends
on the market. Buyers can use these forecasts to determine whether a property is reasonably priced, aiding in
negotiations and decision-making. Real estate professionals, such as agents and appraisers, gain a powerful tool
to enhance their expertise, providing clients with data-driven insights that instill confidence and trust.

Moreover, real-time adaptation to shifting market conditions is possible using machine learning models.
Conventional assessment techniques often rely on historical data, which may not accurately represent
current market conditions or recent market shifts. In contrast, machine learning algorithms can continuously
learn and update from new data, ensuring that predictions remain relevant and accurate even in dynamic
environments.

Another significant advantage is the ability to handle large datasets efficiently. In a typical real estate market,
thousands of transactions occur regularly, generating an overwhelming amount of data. Machine learning
algorithms are designed to process and analyze this data at scale, uncovering trends and patterns that would be
impossible for humans to discern manually. This capability not only enhances accuracy but also speeds up the
prediction process, making it feasible to perform mass appraisals swiftly and at a lower cost.

The integration of machine learning into real estate also paves the way for innovative applications. For
example, predictive models can be used in urban planning to forecast the impact of new developments on
surrounding property values. Financial institutions can assess mortgage risks more accurately by
incorporating these predictions into their lending criteria. Even policymakers can benefit from these insights
when designing housing policies and regulations.

In conclusion, the application of machine learning to housing price prediction is unlocking unprecedented
insights within the property industry. By harnessing the power of advanced algorithms and vast datasets, these
models provide accurate, timely, and actionable predictions. As the technology continues to evolve, it promises
to further revolutionize how we understand, interact with, and make decisions in the real estate industry,
ultimately helping purchasers and vendors.

2.2 Price Predictions Simplified:


Predicting prices, whether it's for a house, a car, or even a stock, is a complex task that involves understanding a
multitude of factors. While it might seem like magic, it's actually based on data and sophisticated models.

The Building Blocks

At the heart of price prediction is data. This could be information about a house (size, location, age), a car
(make, model, mileage), or a stock (company performance, market trends). The more data you have, the
better your predictions can be.

Once you have the data, you need a model to analyze it. This model can be a simple equation or a complex
algorithm. For instance, to predict a house price, you might consider factors like square footage, number of
bedrooms, location, and recent sales data. A model would then determine how these factors influence the price.

Machine Learning Magic

In recent years, machine learning has revolutionized price prediction. Instead of relying solely on simple
equations, computers can recognize intricate patterns by learning from enormous volumes of data. For example,
a machine learning model could analyze thousands of car sales to determine which features (like fuel efficiency
or safety ratings) have the biggest impact on price.

Challenges and Solutions

Predicting prices isn't always perfect. Factors like economic conditions, unexpected events, and human behavior
can all influence prices in ways that are difficult to predict. To overcome these challenges, experts often
combine multiple models and continuously update their data to improve accuracy.

Real-World Applications

Price prediction and closely related predictive techniques are used in countless ways. Financial institutions
use them to assess risks, and businesses use them to optimize pricing strategies. E-commerce platforms and
streaming services apply the same kind of pattern-finding to recommend products and shows you might like.

In essence, price prediction is about finding patterns in data and using those patterns to make informed
guesses about the future. While it's not always exact, it's a powerful tool that helps us make better decisions
every day.

2.3 Location, Location, Location:


The real estate mantra, “location, location, location,” isn’t just a catchy phrase; it's a fundamental truth. A
property's geographic position is arguably the most significant factor influencing its value.

Why Location Matters

• Proximity to Amenities: Homes near schools, parks, shopping centers, and public
transportation command higher prices. People want convenience and accessibility.
• Neighborhood Quality: Safe, clean, and well-maintained neighborhoods attract buyers and
maintain property values.
• Job Market: Areas with strong job markets typically see more demand for houses and higher prices.
• Schools: Strong school districts significantly boost property values as families seek quality
education for their children.
• Transportation: Easy access to highways, public transportation, and airports increases a
property's desirability.
• Views and Natural Amenities: Properties with water views, mountain vistas, or proximity to
parks often command premium prices.

Beyond the Obvious

While these factors are well-known, the nuances of location can be complex. For instance, a home situated on
a quiet street within a bustling neighborhood might be more desirable than one on a main road. Additionally,
micro-locations within a neighborhood can also impact value. A home near a popular park or a highly-rated
restaurant could see a price premium.

The Evolving Landscape

The importance of location is constantly evolving. With the rise of remote work, the appeal of suburban and
rural areas has increased. Factors like access to high-speed internet and proximity to outdoor recreation
have gained prominence.

Ultimately, location is a dynamic factor influenced by a myriad of variables. Understanding these nuances
is crucial for both homebuyers and sellers to make informed decisions.
Chapter 3

Review
Predicting housing prices, also known as residential property valuation, is a complex and multifaceted task
influenced by numerous factors beyond the physical attributes of the building itself. The property's location, the
characteristics of the neighborhood, and public perception all significantly impact a property's value.
Additionally, market prices are influenced by buyers' willingness to pay, further complicating the establishment
of objective valuations. Traditionally, professionals with years of experience and deep market knowledge, such
as real estate investors, notaries, and agents, have been trusted to determine property values, making the
automation of this process a complex endeavor.

Automated Valuation Models (AVMs) present a promising solution, potentially benefiting a wide range of
stakeholders, including banks, buyers, sellers, notaries, and legislators, by improving the consistency and
accuracy of property values. Nonetheless, there are many difficulties and discrepancies in housing price
forecasting because different approaches and data sets are used. The majority of researchers agree on using
a hedonic approach, which takes into account elements that characterize the house's physical features.
Moreover, incorporating location effects is crucial due to spatial dependence (the phenomenon where nearby
property prices influence each other) and spatial heterogeneity (the variability in relationships between property
values and influencing factors across different locations).

The first law of geography, which states "Everything is related to everything else, but near things are more
related than distant things," underscores the critical importance of spatial factors in house price prediction.
Despite this, accounting for these geographical impacts is still a difficult and controversial problem. While
some studies use location-aware models like kriging, spatial econometrics, or spatially varying coefficient
models, others use proxies like submarkets or distances to central business districts. An alternative strategy
entails feeding location data (longitude and latitude) directly into machine learning algorithms to find spatial
patterns. Temporal dependence between house prices adds another layer of complexity, which most
researchers address through sophisticated modeling techniques. Supervised learning is the predominant
approach, although some researchers have explored semi-supervised learning for house price prediction.

To address these multifaceted challenges, our systematic literature review delves into the techniques and
kinds of data used in home price forecasting, with an emphasis on geographic components. Through a cluster
analysis of 93 articles, we identified prevalent trends based on proposed methods and input data. Our findings
reveal an overrepresentation of conventional models and traditional data types. Nevertheless, recent research
has begun to explore increasingly sophisticated methods and cutting-edge data sources, such as advanced
machine learning and deep learning approaches paired with text, image, and graph data.

An essential component of our work, this trend analysis identifies important gaps and areas for future research in
the field of housing price prediction. The format of the paper is as follows: A summary of relevant property
valuation research is given in Section 2. The methodology based on the PSALSAR framework is described in
Section 3. The analytical results are shown in Section 4, and a thorough discussion and conclusions are
provided in Sections 5 and 6.

While few reviews specifically focus on residential property valuation, the field has seen significant
advancements in both methods and data usage. An earlier review identified three main trends, the first being
the use of spatial methods that account for both spatial heterogeneity and spatial dependence. Spatial
dependence refers to the phenomenon where prices of nearby properties are related, while spatial heterogeneity
indicates that the relationships between property values and their influencing factors vary across different
locations. This review discusses advanced spatial methods, such as spatial econometric models and
geographically weighted regression, to address these issues.

These sophisticated geographical methods, which go beyond first- and second-generation approaches involving
neighborhood delineation and market segmentation, are categorized as third-generation techniques by another
review. Fuzzy logic, autoregressive integrated moving average (ARIMA) models, and artificial neural networks
(ANNs) are added to the list of sophisticated valuation techniques in a third review. The increasing application
of advanced learning strategies has led to recent studies emphasizing techniques based on geographic
information systems (GIS) and artificial intelligence (AI), in addition to reviews that concentrate on ANNs in
real estate appraisal.

In addition to advancements in modeling methodologies, recent research has revealed several trends, such as the
growing emphasis on land values, the breakdown of property values into land and structural components, and the
influence of sustainability policies, including green building premiums. Nonetheless, real estate laws and land
values are regarded as falling outside the purview of this literature review.

To sum up, earlier studies have mostly concentrated on techniques for forecasting home values; however, this
work presents a more thorough two-dimensional examination, broadening the model viewpoint and giving
special attention to the data dimension. In line with other evaluations, this research emphasizes the significance
of taking geographic data into account, because geography is a key factor in predicting home prices. By filling
in these gaps while examining new developments in the field, this study seeks to offer insightful
information and direct future studies in housing price prediction.

Predicting housing prices is a complex endeavor that has captivated researchers and practitioners for decades.
The interplay of numerous factors, both tangible and intangible, makes it a challenging yet fascinating field.

Key Factors Influencing Housing Prices


At the core of housing price prediction lies a deep understanding of the factors that influence property values.
These factors can be broadly categorized as:

• Property-specific attributes: These include physical characteristics such as square footage, number of
bedrooms and bathrooms, lot size, age, state of repair, and any special characteristics.
• Location: Geographic location is paramount, encompassing factors like proximity to amenities, schools,
transportation, employment centers, and overall neighborhood quality.
• Economic indicators: Interest rates, employment rates, GDP growth, inflation, and consumer
confidence significantly impact housing demand and prices.
• Market dynamics: Supply and demand, inventory levels, and competition among buyers and
sellers influence price fluctuations.
• External factors: Natural disasters, political events, and regulatory changes can have profound
impacts on housing markets.

Modeling Techniques
A variety of statistical and machine learning techniques have been used to predict housing prices.
Traditional methods like multiple linear regression have been widely used, but with the advent of big data
and computational power, more sophisticated models have gained prominence.

• Hedonic pricing models: These models estimate the implicit prices of different housing attributes
by analyzing how property prices vary with changes in these characteristics.
• Machine learning: Algorithms such as decision trees, random forests, support vector machines, and
neural networks have shown promise in capturing complex relationships between features and prices.
• Time series analysis: For analyzing trends and seasonality in housing prices, time series models like
ARIMA and SARIMA are employed.
• Spatial analysis: Incorporating geographic information systems (GIS) to analyze spatial patterns and
relationships between properties and their surroundings can enhance prediction accuracy.

Challenges and Future Directions

Despite advancements in modeling techniques, predicting housing prices remains a challenging task due to:

• Data availability and quality: Access to comprehensive and accurate data is crucial but often limited.
• Market volatility: Housing markets are subject to rapid changes influenced by various economic
and social factors.
• Unforeseen events: Black swan events like pandemics or economic crises can disrupt
established patterns.

To address these challenges, future research should focus on:

• Advanced modeling techniques: Exploring deep learning, reinforcement learning, and hybrid
models to capture complex patterns.
• Alternative data sources: Incorporating data from social media, satellite imagery, and
other unconventional sources to enrich models.
• Explainable AI: Developing models that can provide transparent explanations for their predictions
to build trust.
• Dynamic pricing models: Considering real-time market conditions and incorporating feedback loops
to improve prediction accuracy.

By addressing these challenges and leveraging the power of data and advanced analytics, researchers and
practitioners can develop more accurate and robust housing price prediction models.

3.1. House Prices and Machine Learning

Machine Learning for Predicting House Prices: A Look at the Top Performers

Researchers are constantly exploring how machine learning can improve house price prediction accuracy.
This summary explores several studies that compared the effectiveness of different machine learning
algorithms for this task.
Key Findings:

• Random Forest (RF) emerges as a frequent winner, demonstrating strong performance in studies by Park et al.
[6], Banerjee et al. [26], Ceh et al. [8], Fan et al. [27], Ahmed Neloy et al. [7], and Hong et al. [12]. It excels in
capturing complex relationships within data.
• Gradient boosting techniques such as XGBM (Extreme Gradient Boosting) and LGBM (Light Gradient Boosting
Machine) also show impressive results in experiments conducted by Kok et al. [14], Fan et al. [27], and
Voutas Chatzidis [16]. They are adept at handling large datasets and reducing bias.
• Support Vector Machines (SVM) are recognized for their consistency and reliability, as shown in Banerjee et
al. [26]. They excel in finding clear boundaries between data points.
• Ensemble techniques that combine multiple models, like bagging and random forest, also prove effective
in studies by Alfaro-Navarro et al. [19]. Ensemble approaches leverage the strengths of different models for
potentially better results.
Chapter 4

Material & Method


This project adheres to the PSALSAR framework's systematic literature review requirements. Section 3.1
provides details on the Protocol stage; Section 3.2 covers the Search and Appraisal steps; and Section 3.3 covers
the Synthesis step. Sections 4 and 5 discuss the last phase in the analysis process.

4.1 Research issues

Determining the goals and scope of the research is the first step in the PSALSAR framework. The scope of this
study is model- and data-driven residential property valuation with a spatial component. This implies that the
spatial effects that affect home prices must be taken into account by either the model type or the input data.
Only private apartments and single-family dwellings qualify as "property." Only studies that offer a technique
for estimating the values of individual properties are considered.

Analyzing the techniques and input data types utilized for property appraisal is the study's dual focus. The
following research questions (RQs) result from this:

RQ 1: Which spatially-aware methodologies are applied in property valuation?
RQ 2: What kinds of input data are used to value properties?
RQ 3: What patterns and avenues for more study are there for data-driven property valuation?

4.2 Search Approach


In accordance with the study objectives and scope, the second phase is designing the search technique and the
inclusion/exclusion criteria. An overview of the four phases that make up the Search and Appraisal steps in this
study is shown in Figure 1.
Phase 1: Recognition
First, the pertinent scientific literature is found. The Web of Science (WoS) Core Collection and the Scopus
database were searched with the same query in order to gather a representative collection of literature. Table 1
lists the variations of the search strings for WoS and Scopus. Three elements had to be present in the title or
abstract: "house" or an equivalent (such as "real estate," "dwelling," or "residential property"); "valuation" or a
synonym (such as "price prediction," "price estimate," "determination," or "appraisal"); and "spatial" or
"geospatial." This search returned 412 publications from Scopus and 600 from WoS. A final batch of 799
unique papers was obtained after 213 duplicates were eliminated.
Phase 2: Examining

The literature list was then filtered according to document type and language. The English language filter
excluded 46 papers. A further 164 publications that were book chapters, conference proceedings, or
reviews were also left out.

[Figure 1: flow diagram]
Phase 1, Identification: Scopus search results N = 412; Web of Science search results N = 600; after combining
search results and eliminating duplicates (213 excluded), N1 = 799.
Phase 2, Screening: English language filter N = 753; document type filter N = 589 (210 excluded in total), N2 = 589.
Phase 3, Eligibility: title reading N = 142; abstract reading N = 115 (474 excluded in total), N3 = 115.
Phase 4, Literature study: full body reading N = 93 (22 excluded), N4 = 93.

Figure 1. Flow diagram depicting the Search and Appraisal steps consisting of four phases to identify all relevant scientific literature.

Table 1. Search strings.

Database: Scopus
Search string: TITLE-ABS(((real AND estate) OR house OR housing OR dwelling OR (residential AND property)) AND
(valuation OR (price W/1 predict*) OR appraisal OR (price W/1 determin*)) AND (geospatial OR spatial))
No. of articles: 412
Date of acquisition: 11 October 2021

Database: WoS
Search string: TI = (((real AND estate) OR house OR housing OR dwelling OR (residential AND property)) AND
(valuation OR (price AND predict*) OR appraisal OR (price AND determin*)) AND (geospatial OR spatial)) OR
AB = (((real AND estate) OR house OR housing OR dwelling OR (residential AND property)) AND
(valuation OR (price AND predict*) OR appraisal OR (price AND determin*)) AND (geospatial OR spatial))
No. of articles: 600
Date of acquisition: 11 October 2021

The article should predict the price of each individual house or apartment.
Additionally, three exclusion criteria were applied:

1. Articles analyzing correlations between house prices and other variables without focusing on
prediction methods were excluded.
2. Articles using cadastral values rather than market prices were not considered.
3. Articles predicting land values without addressing the value of homes were excluded.

To apply these criteria, we first reviewed the titles of the papers. Titles were evaluated based on
the presence or absence of specific keywords to infer the content. For instance, to identify
articles meeting the second inclusion criterion, titles were checked for terms like ‘model’,
‘approach’, ‘technique’, or ‘analytics’. Titles containing terms like ‘correlation’, ‘estimation’, or
‘analysis’ without referencing a prediction method were excluded based on the first exclusion
criterion. Articles mentioning ‘cadastral value’ and ‘land value’ were excluded according to the
second and third exclusion criteria, respectively. This initial screening retained 142 papers.

Next, we reviewed the abstracts of these papers using the same inclusion and exclusion criteria. This
step helped resolve any uncertainties from the title review. During this phase, three review papers
were found and eliminated from the list, although they are covered in Section 2. In total, title and
abstract screening excluded 474 papers, leaving a list of 115 relevant publications.
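
The counts reported for the four phases can be cross-checked with simple arithmetic. The following is a minimal sketch in Python; the variable names are illustrative and not part of the review protocol, while the numbers are those given in the text and in Figure 1.

```python
# Counts as reported in the text and in Figure 1.
scopus_hits = 412
wos_hits = 600
duplicates_removed = 213

# Phase 1: combine both searches and drop duplicates.
identified = scopus_hits + wos_hits - duplicates_removed

# Phase 2: language filter, then document type filter.
non_english_excluded = 46
doc_type_excluded = 164
screened = identified - non_english_excluded - doc_type_excluded

# Phase 3: title and abstract reading.
title_abstract_excluded = 474
eligible = screened - title_abstract_excluded

# Phase 4: full-text reading.
full_text_excluded = 22
included = eligible - full_text_excluded

for label, count in [
    ("Phase 1 (identification)", identified),
    ("Phase 2 (screening)", screened),
    ("Phase 3 (eligibility)", eligible),
    ("Phase 4 (literature study)", included),
]:
    print(f"{label}: N = {count}")
```

The four values agree with N1 to N4 in Figure 1, which suggests the reported exclusion counts are internally consistent.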

4.4. Literature Study

In the last stage, we used our university's subscription to get the complete papers and categorized
them based on two main aspects: the type of model described and the type of input data used.
This thorough review identified additional irrelevant articles. Specifically:

• Four articles did not focus on predicting house prices.
• Eleven articles predicted average house prices for zones or neighborhoods rather than individual
properties.
• One additional duplicate article was removed.
• One additional review paper was excluded.
• Five articles compared several models without suggesting a particular approach to
property assessment.

After this inspection, the final batch contained 93 pertinent research papers.

4.5. Sorting and Examining Documents

The chosen papers were grouped and examined in the Synthesis process according to the year of
publication, the journal outlet, the prediction techniques, and the input data kinds.

4.5.1. Publication Year and Channel

A recent upsurge in research on the spatial valuation of residential properties is highlighted by
Figure 2, which shows the distribution of publications based on the year of publication. The
journals that have published the most articles in this review are listed in Table 2. For instance,
The Journal of Real Estate Finance and Economics published eight papers relevant to this
literature review. All journals listed are related to real estate or geography, reflecting the
specialized nature of the research in this field.

4.5.2 Methods

The published research can be categorized into fourteen distinct modeling approaches: multiple
regression analysis, kriging, spatial econometric models, spatially varying coefficient models,
time series analysis, nearest neighbor techniques, fuzzy logic, decision/regression trees, support
vector machines, artificial neural networks (perceptrons), random forests, gradient boosted trees,
additional ensemble methods, and deep learning (for further information, see Table A1 in
Appendix A). It is common for multiple methods to be compared within a single study, often as a
benchmark for the proposed approach. Therefore, for this analysis, we focused solely on the
primary method outlined in each paper and summarized these in Table A1.

4.5.3 SYSTEM DESIGN AND ARCHITECTURE

Phase 1: Data Collection and Preparation

The foundation of any successful machine learning project is high-quality data. For our housing
price prediction model, we started by gathering information on Mumbai's real estate properties
from various online platforms. This data included essential details like location, property size
(carpet and built-up area), age, and zip code. It's crucial to ensure the data is structured and
quantifiable for effective analysis.

Before diving into modeling, we meticulously cleaned and prepared the dataset. This involved
handling missing values, which could be due to incomplete data or data entry errors. We
addressed these issues by either removing entries with excessive missing information or
imputing missing values with averages or other statistical measures. Outliers, which can skew
our model's performance, were also identified and treated appropriately, either by removing
them or capping their values.
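
A minimal sketch of the cleaning steps described above, assuming a single numeric column in which missing entries are marked as `None`. The column name, the sample values, the choice of median imputation, and the 1.5 × IQR capping rule are illustrative assumptions; the report does not state which statistic or threshold was actually used.

```python
from statistics import median, quantiles


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def cap_outliers_iqr(values, k=1.5):
    """Clip values to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


# Hypothetical carpet areas (square feet) with two gaps and one extreme value.
carpet_area_sqft = [650, 720, None, 810, 700, 9000, 680, None, 760]

imputed = impute_median(carpet_area_sqft)
capped = cap_outliers_iqr(imputed)
print(imputed)
print(capped)
```

Capping keeps the row in the dataset while limiting its influence; removing the row instead is the other option mentioned in the text.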

Phase 2: Model Training

With a clean and prepared dataset, we divided it into two separate subsets: a training set and a
testing set. The model is trained using the training set to learn the relationship between property
features and their corresponding prices.

We employed a decision tree regression algorithm for this task. Decision trees excel at handling
both numerical and categorical data, making them suitable for our diverse dataset. The algorithm
builds a tree-like structure wherein decision rules are represented by branches, features are
represented by internal nodes, and leaf nodes predict the property price.
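
To make the tree structure concrete, the following is a small from-scratch sketch of a train/test split and a regression tree, written in plain Python. It is not the project's implementation: the function names, the depth limit, the synthetic data, and the restriction to numeric features are all assumptions made for illustration.

```python
import random


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle rows reproducibly and split them into train and test lists."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def sse(targets):
    """Sum of squared errors around the mean."""
    mean = sum(targets) / len(targets)
    return sum((t - mean) ** 2 for t in targets)


def best_split(rows):
    """Return (score, feature index, threshold) minimizing the children's SSE."""
    best = None
    for f in range(len(rows[0][0])):
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= threshold]
            right = [y for x, y in rows if x[f] > threshold]
            if not left or not right:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best


def build_tree(rows, depth=0, max_depth=3):
    """Grow the tree recursively; each leaf stores the mean price of its rows."""
    targets = [y for _, y in rows]
    split = best_split(rows) if depth < max_depth and len(rows) >= 2 else None
    if split is None:
        return {"leaf": sum(targets) / len(targets)}
    _, f, threshold = split
    left = [r for r in rows if r[0][f] <= threshold]
    right = [r for r in rows if r[0][f] > threshold]
    return {
        "feature": f,
        "threshold": threshold,
        "left": build_tree(left, depth + 1, max_depth),
        "right": build_tree(right, depth + 1, max_depth),
    }


def predict(tree, x):
    """Follow the decision rules from the root down to a leaf."""
    while "leaf" not in tree:
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


# Synthetic rows: ((area, age), price). The pricing rule is made up.
data = [((area, age), 100.0 * area - 500.0 * age)
        for area in (400, 600, 800, 1000, 1200)
        for age in (1, 5, 10, 20)]
train, test = split_train_test(data)
tree = build_tree(train)
predictions = [predict(tree, x) for x, _ in test]
```

Each internal node holds one feature and one threshold, each branch is the rule "feature ≤ threshold" or its complement, and each leaf returns a price, which mirrors the description above.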

Phase 3: Model Evaluation and Deployment

Once the model was trained, we evaluated its performance using the testing dataset. This helped
us assess how accurately the model could predict prices for unseen data. Several metrics, such as
mean squared error (MSE) and R-squared, were used to measure the model's effectiveness.
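
For reference, both metrics can be computed directly from their definitions. The sketch below uses made-up actual and predicted prices; the function names are illustrative.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Share of the variance in the actual values explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical prices, in arbitrary units.
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]

mse = mean_squared_error(actual, predicted)
r2 = r_squared(actual, predicted)
print(f"MSE = {mse:.1f}, R-squared = {r2:.3f}")
```

A lower MSE and an R-squared closer to 1 indicate a better fit; MSE is expressed in squared price units, so it is only comparable between models evaluated on the same data.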

To make our model accessible to users, we integrated it with a user interface using Flask, a
Python web framework. This allowed users to input property details and receive a predicted price
almost instantly.

Note: While this outlines a basic approach, real-world projects often involve more complex
modeling techniques, hyperparameter tuning, and rigorous evaluation to achieve optimal results.
Additionally, incorporating feature engineering to create new informative features can
significantly enhance model performance.

4.5.4 Explanation
Data is the Foundation

The journey to predicting housing prices begins with a solid foundation: data. Gathering
relevant information is crucial. This includes factors like property size, location, number of
bedrooms, neighborhood amenities, and historical sales data. It's essential to collect data that
accurately reflects the market you're targeting.

Data Preparation: The Cleanup Crew

Raw data often contains inconsistencies and imperfections. Data preprocessing is the procedure
for converting unstructured data into a format that is clear and ready for analysis. This involves
handling missing values (by removing or imputing them), identifying and addressing outliers,
and converting data into a consistent format. For example, categorical data like location might
need to be converted into numerical form.
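
One common way to convert a categorical column into numbers is one-hot encoding. The following is a minimal sketch; the locality names are hypothetical examples, and the report does not state which encoding was used in the project.

```python
def one_hot_encode(values):
    """Map each category to a 0/1 indicator vector (one column per category)."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        encoded.append(row)
    return categories, encoded


# Hypothetical location labels for four listings.
locations = ["Andheri", "Bandra", "Andheri", "Powai"]
categories, encoded = one_hot_encode(locations)
print(categories)
print(encoded)
```

One-hot encoding avoids implying an order between locations, at the cost of one extra column per distinct category.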
Model Building: The Heart of the Process

Once the data is prepared, it's time to build the predictive model. Several machine learning
algorithms can be employed for this task. Popular choices include:

• Linear regression: Assumes a linear relationship between house prices and features.
• Decision trees: Create a tree-like model to make predictions based on decision rules.
• Random forests: An ensemble approach that boosts accuracy by combining several
decision trees.
• Support vector machines: Find the optimal hyperplane that divides data points into distinct
classes (in this case, price ranges).
• Neural networks: Complex models inspired by the human brain, capable of learning
intricate patterns.

The choice of algorithm depends on factors like dataset size, complexity, and desired prediction
accuracy.

Data Sourcing and Collection

The foundation of any robust housing price prediction model is a comprehensive and high-quality
dataset. In this research, data on multifamily properties in Alicante, Spain, was
meticulously collected from a real estate portal between May 2019 and December 2021. Key
information included:

• Property details: Size, number of bedrooms, bathrooms, and other relevant features.
• Building characteristics: Amenities like elevators, parking, and swimming pools.
• Geographic location: Precise coordinates for each property.
• Listing information: Asking price and listing duration.

To ensure data accuracy, the dataset was regularly updated to account for changes in property
status (sold, withdrawn, price adjusted) and new listings. Outliers and inconsistencies in the data
were identified and addressed through careful cleaning and preprocessing.

Data Enrichment

To increase the model's capacity for prediction, more information from various sources was
incorporated:

• Demographic data: Population density, age distribution, and income levels at the census
tract level were taken from the National Statistics Institute of Spain (INE).
• Geographic information: The National Geographic Institute (IGN), the Valencian
Cartographic Institute, and the Regional Ministry of Education, Culture, and Sport
provided information on green areas, schools, and transportation infrastructure.
• Proximity analysis: Distances to key amenities like schools, parks, and the coast were
calculated using network analysis to accurately reflect travel times.

Data Cleaning and Preprocessing

Data cleaning was a crucial step to ensure data quality and reliability. Outliers in property
features like area, number of bedrooms, and bathrooms were removed to prevent model bias.
Properties with missing essential information were also excluded. To avoid data leakage,
identical properties within the same building were merged.

Dataset Overview

The final dataset comprised approximately 39,943 unique multifamily properties in Alicante,
Spain. It included a rich set of features encompassing property characteristics, location,
neighborhood demographics, and accessibility to amenities. This comprehensive dataset
provided a solid foundation for building accurate housing price prediction models.

Affordability issues persist

The American dream of homeownership might be facing some roadblocks in the coming years.
While efforts are underway to lower mortgage rates, a new report suggests affordability will
remain a significant concern for many potential buyers.

Key Challenges:

• Soaring Home Prices: A combination of rising home prices in various markets and
increasing living expenses is pushing homeownership out of reach for many first-time
buyers, who traditionally represent a major segment (30-40%) of the housing market.
• Limited Progress on Affordability: Realtor.com predicts a slight decrease in home
purchase costs by 2024, but this still translates to a significant portion of income (around
35%), which is higher than historical averages.

A Glimpse into the Future:

• 2024: A modest decrease in home purchase costs is anticipated, making homes slightly
more affordable. However, affordability remains a major hurdle, especially for first-time
buyers.
• 2025: Goldman Sachs forecasts a 3.7% increase in home prices, driven by existing
market momentum. While mortgage rates are expected to dip slightly (around 6.3%),
affordability is still a central concern, particularly for first-time buyers.

Expert View:

Roger Ashworth, a Goldman Sachs analyst, emphasizes the demographic challenge:

"The largest demographic group in the US is 30-39 year olds, and this is expected to grow for
years to come. This age range often coincides with life milestones like starting families. While
some individuals will prioritize buying regardless of rental affordability, financing costs still play
a major role. With current high financing costs, renting remains the cheaper option. Based on our
forecasts, mortgage affordability is unlikely to see significant improvements in the near future."

Figure 2. Number of articles published per year.

Table 2. Most popular publication channels.

Source Title                                                  Number of Articles
The Journal of Real Estate Finance and Economics              8
International Journal of Housing Markets and Analysis         4
ISPRS International Journal of Geo-Information                4
Transportation Research Part D: Transport and Environment     3
Journal of Geographical Systems                               3
Journal of Property Research                                  3

4.5.3 Input Data

Furthermore, we classified the studies based on the kinds of input data that were used, as shown
in Table A2 in Appendix A. In contrast to the modeling methodologies, the majority of studies
used a combination of these data types. The source data can be categorized as follows:

• Property-related features: This traditional category includes structural, socioeconomic,
and environmental characteristics commonly used in property valuation, as well as data
on nearby points of interest (POIs).
• Standard spatial data: This encompasses basic geographic information such as
coordinates and straight-line distances between locations.
• Advanced spatial data: This category covers more complex spatial data like
topographical information and alternative distance metrics derived from road or public
transportation networks.
• Graph-structured data: This type of data represents relationships between entities as
networks.
• Unstructured data: This includes text and image data, which offer rich but less easily
processed information.
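
As an illustration of "standard spatial data", the straight-line distance between two coordinate pairs can be approximated with the haversine great-circle formula. This is a sketch only; the function name and the mean Earth radius of 6371 km are assumptions, and network-based distances (the "advanced" category) would require road or transit data instead.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator.
one_degree = haversine_km(0.0, 0.0, 0.0, 1.0)
print(f"{one_degree:.2f} km")
```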

4.5.4 Novelty Assessment

Model Novelty Score: We began by identifying the primary modeling technique for each study.
This was either the sole method employed or the one newly proposed by the authors. Then,
based on the presumption that more recent techniques typically outperform older ones in
prediction tasks, a novelty score was assigned. For example, studies show that MLP is superior
to MRA, that RF beats MRA and GWR, and that boosted trees exceed kriging in terms of home
value forecasting. Deep learning (DL) received the highest rating because it is currently the
most effective technique for handling unstructured data such as text and images. Scores ranged
from 1 (most traditional) to 8 (DL).

Data Novelty Score: Each publication was evaluated based on the types of input data included.
Similar to the model score, a novelty score was assigned to each data type. Common data like
property features and temporal information received lower scores, while spatial data, especially
advanced types, were ranked higher. Graph-structured and unstructured data (text, images) were
considered the most novel. The overall data novelty score was calculated by summing the scores
of all included data types. While a maximum score of 43 is theoretically possible, practical
limitations in combining data types typically resulted in scores from one to twenty.
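
The summation can be sketched in a few lines. The per-type scores below are the ones read from Table 4 (an assumption where the table is hard to read), and the dictionary keys and function name are illustrative.

```python
# Per-type scores as read from Table 4; keys are illustrative labels.
DATA_TYPE_SCORES = {
    "structural": 1,
    "temporal": 1,
    "socioeconomic": 1,
    "environmental": 1,
    "poi": 1,
    "standard_spatial": 4,
    "advanced_spatial": 6,
    "graph": 8,
    "image": 10,
    "text": 10,
}


def data_novelty_score(data_types):
    """Sum the scores of the distinct data types used in one publication."""
    return sum(DATA_TYPE_SCORES[t] for t in set(data_types))


# A hypothetical study using structural, temporal, and coordinate data.
example_score = data_novelty_score(["structural", "temporal", "standard_spatial"])
max_score = data_novelty_score(DATA_TYPE_SCORES)
print(example_score, max_score)
```

With these scores, a study using every data type would reach 43, which matches the theoretical maximum stated above.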

Table 3. Model novelty score.

Model                            Score
Multiple regression analysis     1
Kriging                          1
Spatial econometric              1
Spatially varying coefficient    1
Time series                      2
Nearest neighbor                 3
Fuzzy logic                      2
Decision trees                   3
Support vector machines          4
Artificial neural networks       4
Random forests                   6
Gradient boosted trees           6
Ensembles                        6
Deep learning                    8

4.5.5 Data Accessibility

Data accessibility varied considerably across the reviewed publications. While the vast majority
(96%) relied on proprietary data sets, a small number offered some level of public availability.
Three studies explicitly mentioned having publicly available data sets, while five offered the
possibility of requesting access. One such study provides a downloadable compressed
file containing both the data set and the codebase used for analysis (accessed February 1, 2022).
This specific data set contains fundamental spatial information, such as coordinates, and standard
property attributes.

However, transparency wasn't always as clear. Two publications referenced data sources (CINP
and Centadata) without providing specific details or links. Both data sets reportedly contain
standard property data, temporal information, and basic spatial features, and one of them goes a
step further by including socioeconomic factors and points of interest (POIs).

The remaining studies offering access upon request primarily dealt with basic property features,
with two extending to include network distances (considered advanced spatial data). Lastly,
one study offered access to a data set containing textual features in addition to property
information.

Table 4. Data novelty score.

Input Data                 Score
Standard features
  Structural               1
  Temporal                 1
  Socioeconomic            1
  Environmental            1
  POI                      1
Standard spatial data      4
Advanced spatial data      6
Graphs                     8
Unstructured data
  Images                   10
  Text                     10

Chapter 5
Result
This section categorizes the reviewed papers based on their modeling techniques and analyzes
their novelty over time. Table 5 provides a detailed breakdown of the papers by method,
including publication year ranges. Figure 3 visually represents the model novelty scores of each
study plotted against publication year and author name.

Our analysis reveals that traditional methods like multiple regression analysis, kriging, spatial
econometric models, and spatially varying coefficient models dominate the field. Most studies
employing these methods received low novelty scores, with many earning the lowest possible
rating. While time series and fuzzy logic models have been explored by a few researchers, they
are not as prevalent.

Recent years have witnessed a growing interest in methods capable of handling large datasets,
such as artificial neural networks and support vector machines. Additionally, the field is
embracing more advanced machine learning techniques, including ensemble methods like random
forests and gradient boosted trees, as well as deep learning. These approaches hold significant
promise for improving residential property valuation models.

Table 5. Model-based categorization of studies with count and the range of publication year per model.

Model              Studies              Count    Range
MRA                [29–44]              16       1996–2020
Kriging            [45–54]              10       1995–2019
SEM                [2,21–24,55–78]      29       1992–2021
SVC                [4,8,26,79–93]       18       2012–2021
Time Series        [94–97]              4        2004–2015
Fuzzy Logic        [98,99]              2        2006–2016
NN                 [100,101]            2        2017–2021
DT                 [25]                 1        2021–2021
SVM                [102]                1        2014–2014
ANN                [16,103,104]         3        2011–2021
RF                 [17,18,105]          3        2020–2021
GBT                [19]                 1        2021–2021
Other Ensembles    [27,106]             2        2020–2021
DL                 [28]                 1        2021–2021

Figure 3. Model novelty plot based on scores relative to study ID (sequence based on year). The
color of the data point relates to the publication year.

Figure 4. Model novelty plot based on scores relative to study ID (based on data).

Figure 5. Model novelty plot based on scores relative to study ID (based on data).

Figure 6. Model novelty plot based on scores relative to study ID (based on data).

Figure 7. Model novelty plot based on scores relative to study ID (sequence based on year). The
color of the data point relates to the publication year.

Figure 8. Model novelty plot based on scores relative to study ID (based on data).

Figure 9. Model novelty plot based on scores relative to study ID (based on data).

5.1.1. Multiple Regression Analysis


Hedonic price models have been used extensively to forecast home prices because they make the
assumption that property values are a function of many housing attributes. These models, which
have historically relied on multiple regression analysis (MRA), typically incorporate
structural, neighborhood, and locational factors. While linear regression is common, researchers
have also employed semi-log and double-log regressions, combined with various estimation
techniques such as stepwise regression, LASSO, least absolute deviation, generalized least
squares, and ordinary least squares.
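
For reference, the three functional forms mentioned above can be written as follows. This is the standard textbook notation rather than a formula taken from any specific reviewed study; the symbols $P_i$ (price of property $i$), $x_{ik}$ (attribute $k$ of property $i$), $\beta_k$ (coefficients), and $\varepsilon_i$ (error term) are introduced here for illustration.

```latex
\begin{align*}
\text{Linear:}     \quad & P_i = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik} + \varepsilon_i \\
\text{Semi-log:}   \quad & \ln P_i = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik} + \varepsilon_i \\
\text{Double-log:} \quad & \ln P_i = \beta_0 + \sum_{k=1}^{K} \beta_k \ln x_{ik} + \varepsilon_i
\end{align*}
```

In the semi-log form, $\beta_k$ is read as an approximate proportional change in price per unit change in attribute $k$; in the double-log form it is an elasticity.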

To capture non-linear relationships, generalized additive models (GAMs) have been introduced,
often incorporating spatial and temporal dimensions. Semiparametric models, including those
using geospatial splines, have been developed to further enhance flexibility. Additionally,
multilevel hedonic regression models have been proposed to account for hierarchical data
structures, such as properties within neighborhoods or communities. These models often employ
Bayesian estimation methods to handle the added complexity.

Despite the emergence of more sophisticated techniques, MRA remains a popular and
foundational approach in property valuation. Its simplicity and interpretability make it a
benchmark for comparison with newer methods.

5.1.2. Kriging

Kriging, a geostatistical method, has been another popular approach to valuing residential
properties. Ten studies within our review period employed this technique. Kriging predicts
property values at unobserved locations based on known values at nearby locations. It assumes
that property prices vary spatially following a random pattern with a consistent average and a
relationship between locations determined solely by distance.

To estimate property values, kriging calculates a variogram, which measures how property prices
vary with distance. A mathematical model is fitted to this variogram, and weights are assigned to
known property values to predict prices at unknown locations. This process minimizes prediction
error.
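
The first step, computing the empirical variogram, can be sketched as follows. The code uses the classical estimator, which averages half the squared price differences over all pairs of points whose separation falls in the same distance bin. The function name, the binning scheme, and the sample points are assumptions; fitting a variogram model and solving for the kriging weights are not shown.

```python
import math
from collections import defaultdict


def empirical_semivariogram(points, bin_width):
    """Return {bin midpoint: semivariance} for points given as (x, y, value)."""
    squared_diff_sums = defaultdict(float)
    pair_counts = defaultdict(int)
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            distance = math.hypot(xi - xj, yi - yj)
            b = int(distance // bin_width)  # index of the distance bin
            squared_diff_sums[b] += (zi - zj) ** 2
            pair_counts[b] += 1
    return {
        (b + 0.5) * bin_width: squared_diff_sums[b] / (2 * pair_counts[b])
        for b in sorted(pair_counts)
    }


# Three hypothetical properties on a line, with prices in arbitrary units.
sample = [(0.0, 0.0, 10.0), (1.0, 0.0, 12.0), (2.0, 0.0, 14.0)]
variogram = empirical_semivariogram(sample, bin_width=1.0)
print(variogram)
```

In this toy example the semivariance grows with distance, which is the pattern a fitted variogram model is meant to capture.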

Cokriging expands on this by considering additional factors beyond location that influence
property prices. It incorporates multiple variables and their spatial relationships to improve
predictions. Regression cokriging further refines this by modeling multiple interrelated
equations, allowing for complex relationships between variables and their residuals.

To address the limitation of assuming a constant average property price, regression kriging
combines traditional regression modeling with kriging. A regression model is initially fitted to
property characteristics, and kriging is then applied to the prediction errors. The final prediction
is a combination of both models.
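
In the usual notation, this combination can be written as below. The symbols are introduced here for illustration: $s_0$ is the prediction location, $\mathbf{x}(s)$ the vector of property characteristics, $\hat{\boldsymbol{\beta}}$ the fitted regression coefficients, $e(s_i)$ the regression residual at observed location $s_i$, and $\lambda_i(s_0)$ the kriging weights derived from the residual variogram.

```latex
\begin{align*}
e(s_i) &= z(s_i) - \mathbf{x}(s_i)^{\top} \hat{\boldsymbol{\beta}} \\
\hat{z}(s_0) &= \mathbf{x}(s_0)^{\top} \hat{\boldsymbol{\beta}}
              + \sum_{i=1}^{n} \lambda_i(s_0)\, e(s_i)
\end{align*}
```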
Finally, area-to-point kriging with external drift (A2PKED) incorporates average property values
for particular areas along with individual property data, providing a more
comprehensive approach to spatial prediction.

5.1.3. Spatial Econometrics

Spatial econometrics addresses the spatial interdependence often present in property data.
Techniques like spatial autoregressive (SAR) and spatial error models incorporate spatial
relationships into the traditional hedonic model. These models rely on spatial weight matrices,
which quantify the influence of neighboring properties.
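
A minimal sketch of one distance-based way to build such a matrix, assuming inverse-distance weights within a cutoff radius followed by row standardization. The function names, the cutoff, and the sample coordinates and prices are illustrative assumptions; the reviewed studies use a variety of constructions.

```python
import math


def inverse_distance_weights(coords, cutoff):
    """Row-standardized inverse-distance weights for neighbors within the cutoff."""
    n = len(coords)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a property is not its own neighbor
            distance = math.dist(coords[i], coords[j])
            if 0 < distance <= cutoff:
                weights[i][j] = 1.0 / distance
        row_sum = sum(weights[i])
        if row_sum > 0:
            weights[i] = [w / row_sum for w in weights[i]]
    return weights


def spatial_lag(weights, values):
    """Weighted average of neighboring values for each observation."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


# Three hypothetical properties on a line and their prices.
coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
prices = [100.0, 200.0, 400.0]
W = inverse_distance_weights(coords, cutoff=2.5)
lagged_prices = spatial_lag(W, prices)
print(lagged_prices)
```

After row standardization, the spatial lag of a property is simply a weighted average of its neighbors' prices.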

SAR models directly incorporate the average price of neighboring properties, while spatial error
models account for spatial autocorrelation in the error terms. The general spatial model combines
both approaches. Researchers have employed various methods to construct spatial weight
matrices, including distance-based criteria and more complex techniques like local anisotropic
methods.
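
The role of the spatial weight matrix can be sketched as follows. This is a minimal example, not
the construction used in any of the reviewed studies: it assumes inverse-distance weights with a
distance cutoff, row standardization, distinct property locations, and hypothetical toy data. The
resulting spatial lag is the weighted average neighbor price that a SAR model would use as a
regressor.

```python
import math


def row_standardized_weights(coords, cutoff):
    """Inverse-distance weights for neighbors within `cutoff`; each row sums to 1.

    Assumes all coordinates are distinct (no zero distances).
    """
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = math.dist(coords[i], coords[j])
                if d <= cutoff:
                    w[i][j] = 1.0 / d
        total = sum(w[i])
        if total > 0:
            w[i] = [x / total for x in w[i]]
    return w


def spatial_lag(w, prices):
    """Weighted average of neighboring prices for every property."""
    return [sum(wij * p for wij, p in zip(row, prices)) for row in w]


# Hypothetical toy data: three properties on a line.
coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
prices = [100.0, 200.0, 400.0]
weights = row_standardized_weights(coords, cutoff=2.5)
lagged = spatial_lag(weights, prices)
```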

To capture both spatial and temporal patterns, spatiotemporal models have been developed.
These models extend spatial econometrics to incorporate time-series elements. Additionally,
spatial quantile regression and spatial Durbin models have been used to address specific issues,
such as heteroscedasticity and the impact of spatially lagged independent variables.

While maximum likelihood, generalized method of moments, and Bayesian methods are
commonly used for estimation, alternative approaches like spatial expansion and modeling
spatial autocorrelation as a constant term have also been explored. To accommodate non-linear
relationships and spatial heterogeneity, semiparametric and hierarchical spatial models have been
proposed.

Spatial econometric methods have become increasingly prevalent in property valuation research,
with the spatial error model being particularly popular. These techniques offer significant
advantages in capturing the complex spatial dependencies that influence property prices.

5.1.4. Spatially Varying Coefficient Models


The Geographically Weighted Regression (GWR) model has been widely applied to the study of real
estate markets. It offers a robust way of understanding property values through spatially
varying coefficients, allowing house prices to be modeled and predicted from location-specific
factors.

GWR is a statistical method that estimates parameters for each observation individually,
accommodating the unique characteristics of each property. By using a distance matrix tailored
to each observation, the model provides highly localized insights, enhancing accuracy in property
valuation. Weighting approaches include bi-square and Gaussian kernel functions.
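
A minimal sketch of this local estimation is given below, assuming a single predictor (living
area), a fixed bandwidth, and hypothetical toy data in which the price per square meter differs
between two areas. The function names are illustrative and do not come from a GWR library.

```python
import math


def gaussian_kernel(d, bandwidth):
    """Weight that decays smoothly with distance d."""
    return math.exp(-0.5 * (d / bandwidth) ** 2)


def bisquare_kernel(d, bandwidth):
    """Weight that drops to exactly zero at and beyond the bandwidth."""
    if d >= bandwidth:
        return 0.0
    return (1.0 - (d / bandwidth) ** 2) ** 2


def local_fit(target, coords, x, y, kernel, bandwidth):
    """Weighted least squares of y on x, with weights centered on `target`.

    Returns the (intercept, slope) that apply locally at the target location.
    """
    w = [kernel(math.dist(target, c), bandwidth) for c in coords]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical toy data: price = 2 * area in the west, 3 * area in the east.
coords = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0)]
area = [50.0, 80.0, 100.0, 50.0, 80.0, 100.0]
price = [100.0, 160.0, 200.0, 150.0, 240.0, 300.0]
_, west_slope = local_fit((0, 0), coords, area, price, bisquare_kernel, 5.0)
_, east_slope = local_fit((10, 0), coords, area, price, bisquare_kernel, 5.0)
```

A single global regression would return one compromise slope for this data, while the local fits
recover a different coefficient at each location.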

Extensions of GWR go beyond traditional methods by incorporating temporal elements through
geographically and temporally weighted regression (GTWR). This addition permits the coefficients
to change over time as well as over space. Other variants incorporate travel distances based on
road networks and adapt to dynamic changes in the housing market.

Additionally, a semi-supervised regression approach uses co-training with GWR, employing both
Gaussian and bi-square kernel functions. This iterative training process enhances accuracy and
adaptability. For a more nuanced analysis, global regression is combined with local GWR in a
mixed-scale hedonic model, accommodating spatial stationarity and nonstationarity
simultaneously.

A Bayesian spatially varying coefficient process model offers further precision by predicting
house prices with hierarchical conditional modeling. This approach provides comprehensive
inference on model parameters, prediction intervals for new observations, and efficient handling
of sparse data.

Eigenvector Spatial Filtering (ESF) has also been used to capture localized spatial variation
and address multicollinearity issues, although it requires advanced computation and mathematical
knowledge. For a more accessible approach, a simplified ESF model is available, maintaining the
core selection methods while offering ease of use.

5.1.5. Time Series Models

A hierarchical trend model has been introduced to enhance property value predictions by
integrating general price trends with cluster-level trends and housing attributes. By
differentiating between trends for different kinds of houses, neighborhoods, and districts, this
model addresses temporal and spatial interdependence in property pricing.

Additionally, several time series models have been explored to predict property values across
various cities in the southern California region. These include vector autoregressive models,
vector error-correction models, and Bayesian variations of these with spatial and causality
priors. The findings suggest that the effectiveness of these time series models varies depending
on the area.

Another approach combines Geographically Weighted Regression (GWR) with exponential smoothing
through a three-phase forecasting process. First, GWR models are estimated for each year. Next,
a time series model predicts the future values of the resulting coefficients. Finally, a GWR
model uses these predicted coefficients to forecast future real estate values.
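
The second and third phases can be sketched as follows, assuming the yearly GWR coefficients for
one location are already available. Simple exponential smoothing, the smoothing factor, and the
toy coefficient values are assumptions for illustration; the source does not specify which
smoothing variant was used.

```python
def exp_smooth_forecast(series, alpha):
    """Simple exponential smoothing; returns the one-step-ahead forecast."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical yearly GWR coefficients (price per square meter) at one location.
yearly_coeffs = [1000.0, 1100.0, 1200.0]

# Phase 2: forecast next year's coefficient from the yearly estimates.
forecast_coeff = exp_smooth_forecast(yearly_coeffs, alpha=0.5)

# Phase 3: apply the forecast coefficient to a property of 80 square meters.
predicted_price = forecast_coeff * 80.0
```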

Despite their potential, time series models are relatively uncommon in property valuation studies,
with only a few articles exploring them between 2004 and 2015.

5.1.6. Fuzzy Logic

A novel methodology for property valuation combines fuzzy logic with geographical analysis and
Geographic Information Systems (GIS) tools. This approach involves using real estate variables
to design a knowledge base, generate fuzzy sets, and set the rules and the operator for
inference. The use of fuzzy logic helps handle the uncertainties and variations in property
data effectively.
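
To make the idea of fuzzy sets and rules concrete, the sketch below uses triangular membership
functions and the minimum as the fuzzy AND operator. The variables, breakpoints, and the single
rule are hypothetical and are not taken from the cited methodology.

```python
def triangular(x, a, b, c):
    """Membership in a triangular fuzzy set with feet a and c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def rule_strength(distance_km, area_m2):
    """Firing strength of the rule: IF distance is near AND area is large."""
    near = triangular(distance_km, -1.0, 0.0, 5.0)
    large = triangular(area_m2, 80.0, 150.0, 220.0)
    return min(near, large)  # minimum as the fuzzy AND operator
```

A property 2.5 km from the center with 115 square meters belongs to each set with degree 0.5, so
the rule fires with strength 0.5 rather than being simply true or false.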

Another novel technique applies a fuzzy Bayesian approach to real estate appraisal. This two-step
procedure begins with Bayesian regression analysis to estimate property values. In the second
stage, variables displaying deterministic variability, in contrast with those that vary
randomly, are converted into fuzzy vectors based on fuzzy membership functions. This
fuzzification allows for the creation of fuzzy Bayesian confidence intervals for regression
parameters, resulting in predictions expressed as confidence intervals.

Fuzzy logic models have been explored sporadically in property valuation studies, with examples
appearing in 2006 and 2016. These models, alongside time series approaches, were noted for
their novelty in studies conducted between 2004 and 2016.

5.1.7. Nearest Neighbors


In a recent study, the spatial autoregressive (SAR) model was refined to account for property
prices based on the prices of neighboring properties and the differences in attributes between a
property and its neighbors. This approach enhances the ability to capture the influence of
nearby properties on price.

Instead of focusing only on geographic closeness, another version of the SAR model
incorporates a distance matrix in the attribute space. The distances between observations are
computed using k-means clustering on this matrix. The best model combines attribute-based and
geographic closeness.

Nearest-neighbor models have been utilized in recent years, specifically in studies from 2017 and
2021. These models are noted for their novelty in the field, as reflected in their high model
novelty scores.

5.1.8. Decision Trees


Decision trees form the foundation for several advanced tree-based methods. In one study, the
effects of different factors on Kraków housing prices were examined using Chi-Square Automatic
Interaction Detector (CHAID) and Classification and Regression Trees (CART). CART is well known
for its adaptability in both regression and classification problems, while CHAID uses chi-square
tests to identify the best data splits.

These decision tree methods supported the results of Multiple Regression Analysis (MRA) by
highlighting that different districts within the city exhibit distinct pricing patterns. This study
also received a high novelty score for the model, comparable to nearest-neighbor models,
indicating its innovative approach.

5.1.9. Support Vector Machines

For property appraisal, a new model known as the Semiparametric Spatial Effect Least Squares
Support Vector Machine (SSELS-SVM) was presented. By adding a spatial effect term and a
semiparametric component, this model expands upon the least squares support vector machine.
Using a kernel for nonlinear transformations, the SSELS-SVM offers improved accuracy in
predicting house prices compared to both semiparametric Generalized Additive Models (GAMs) and
traditional parametric models. The SSELS-SVM model, used in 2014, has been recognized for its
high level of innovation and is rated with a model novelty score of 4.

5.1.10. Artificial Neural Networks

The Multilayer Perceptron (MLP) is a basic yet highly effective neural network architecture,
known for its ability to learn complex relationships between input and output data. While its
architecture is straightforward, MLPs excel in capturing dependencies that can significantly
enhance property valuation accuracy.

Research has shown that MLPs can outperform traditional Multiple Regression Analysis (MRA)
in real estate valuation, particularly in markets like Budapest. Additionally, MLPs have been
adapted for property valuation by integrating them with Geographic Information Systems (GIS),
allowing for a more nuanced analysis of geographic data.

Further advancements include spatial neural networks that use MLPs to analyze neighborhood
features derived from satellite images. MLPs have also been employed as meta-models in
stacking ensembles to compare performance across different models.

Despite their potential, MLPs have been featured in only a few studies since 2011, and they,
along with Support Vector Machines (SVMs), have received a high model novelty score of 4.
This highlights their innovative application in property valuation research.

5.1.11. Random Forests

The Random Forest (RF) method enhances property valuation by combining multiple decision
trees into an ensemble and introducing randomization. This technique improves robustness by
training each tree on a bootstrapped sample of the data and on a random selection of variables.
RF provides a more accurate estimate of home prices by averaging the estimates of these several
trees.
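
The three ingredients named above (bootstrapped samples, random variable selection, and
averaging) can be sketched in plain Python. This is a deliberately simplified illustration: each
"tree" is a one-split stump on one randomly chosen variable, whereas a full Random Forest grows
deeper trees and re-draws candidate variables at every split. All names and the toy data are
hypothetical.

```python
import random


def fit_stump(rows, targets, feature):
    """Best single split on `feature` by squared error; returns a predictor."""
    best = None
    values = sorted({r[feature] for r in rows})
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [t for r, t in zip(rows, targets) if r[feature] <= threshold]
        right = [t for r, t in zip(rows, targets) if r[feature] > threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    if best is None:  # the feature is constant in this bootstrap sample
        mean = sum(targets) / len(targets)
        return lambda row: mean
    _, threshold, lm, rm = best
    return lambda row: lm if row[feature] <= threshold else rm


def fit_forest(rows, targets, n_trees, seed=0):
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        feature = rng.randrange(n_features)         # random variable choice
        sample_rows = [rows[i] for i in idx]
        sample_targets = [targets[i] for i in idx]
        trees.append(fit_stump(sample_rows, sample_targets, feature))
    return trees


def predict(trees, row):
    """Average the estimates of all trees."""
    return sum(tree(row) for tree in trees) / len(trees)


# Hypothetical toy data: [living area, rooms] and prices in thousands.
rows = [[50.0, 2], [60.0, 2], [80.0, 3], [100.0, 4], [120.0, 4], [150.0, 5]]
prices = [110.0, 125.0, 170.0, 210.0, 240.0, 300.0]
forest = fit_forest(rows, prices, n_trees=50, seed=0)
estimate = predict(forest, [90.0, 3])
```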

Research has demonstrated that Random Forests outperform traditional hedonic multiple
regression (MRA) and Geographically Weighted Regression (GWR). A recent study compared
Random Forest with other machine learning approaches and applied Shapley values to explain
the RF model's predictions. Recent evidence suggests that this strategy has been especially
successful in the latter two years of the study period.

5.1.12. Gradient Boosted Trees

XGBoost, LightGBM, and Gradient Boosting are sophisticated tree-based techniques for property
valuation and geospatial analysis. In a recent study, Gradient Boosting was found to be the most
effective approach for integrating housing and Points of Interest (POI) data into geospatial
network embeddings.

Tree boosting involves sequentially adding trees to correct the errors of previous trees, thereby
lowering the final model's overall loss. Gradient Boosting implements this method, and XGBoost
and LightGBM provide optimized versions that improve performance and efficiency.
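
The sequential error correction can be sketched with one-split trees fitted to residuals, which
for squared-error loss are the negative gradient. This is a minimal illustration with a single
feature and hypothetical toy data; it omits the regularization and other optimizations that
XGBoost and LightGBM add.

```python
def fit_stump(x, residuals):
    """Least-squares split of `residuals` on a single feature `x`."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    return best[1:]


def boost(x, y, n_rounds=20, learning_rate=0.5):
    """Start from the mean price, then add stumps fitted to the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        threshold, lm, rm = fit_stump(x, residuals)
        stumps.append((threshold, lm, rm))
        preds = [p + learning_rate * (lm if xi <= threshold else rm)
                 for xi, p in zip(x, preds)]
    return base, stumps, learning_rate


def predict(model, xi):
    base, stumps, learning_rate = model
    return base + sum(learning_rate * (lm if xi <= t else rm) for t, lm, rm in stumps)


def mse(model, x, y):
    return sum((predict(model, xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(y)


# Hypothetical toy data: living area and price in thousands.
areas = [40.0, 60.0, 80.0, 100.0, 120.0]
prices = [100.0, 150.0, 210.0, 260.0, 330.0]
baseline = boost(areas, prices, n_rounds=0)   # mean-only model
model = boost(areas, prices, n_rounds=20)
```

Comparing `mse(model, areas, prices)` with `mse(baseline, areas, prices)` shows the training
loss falling as trees are added.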

These gradient boosted tree (GBT) methods have been used in recent studies, appearing in the
latest publication reviewed.

5.1.13. Other Ensembles

Stacking ensembles, a technique that combines predictions from base models, have been
employed in a few recent studies ([27, 106]). While offering improved accuracy, stacking often
comes at the cost of increased computational time. These studies utilized tree-based models like
Random Forest, Gradient Boosted Trees, and LightGBM as base models, with either linear
regression or neural networks as meta-models to combine predictions.

Unlike the more established ensemble methods like Random Forest and Gradient Boosted Trees,
stacking remains a relatively new approach in property valuation research, with only a handful of
studies adopting it within the last few years. Given its ability to leverage the strengths of multiple
models, stacking shows promise as a valuable tool for enhancing property valuation accuracy.
5.1.14. Deep Learning

Deep learning represents the most advanced modeling approach explored in property valuation
research thus far. While some studies have employed neural networks for creating embeddings
from complex data such as graphs and satellite images, these works primarily focused on feature
engineering rather than using deep learning as the core predictive model.

In our review, just one study used deep learning specifically to predict real estate prices. In order
to analyze and comprehend textual property descriptions, this study integrated self-attention
mechanisms with Long Short-Term Memory (LSTM) networks. Notably, this study has the highest
model novelty score, as shown by its placement at the far right of the novelty timeline, which
reflects its cutting-edge status in the field.

Despite its potential, deep learning remains relatively underutilized in property valuation
compared to more traditional methods.

5.2. Data-Based Categorization

Table 6 provides a detailed overview of how different input data types have been utilized across
the studied publications. To assess data novelty, we assigned scores based on the complexity and
uniqueness of each data type. Figure 4 visually represents these data novelty scores over time.

According to our investigation, the most often used input categories are structural, temporal,
point-of-interest (POI), and basic spatial data. Although some researchers have taken
socioeconomic and environmental factors into account since the beginning of the study period,
their use is less widespread. Advanced spatial data, including topographical data and
network-based distances, have become more and more prominent in recent research.

In contrast, graph, image, and textual data remain relatively underutilized, with only a few
studies exploring their potential. The diverse combinations of input data across the analyzed
papers result in a scattered pattern of data novelty scores over time, highlighting the varying
levels of data sophistication employed in the field.

Table 7. Data type-based categorization of studies, with count and range of publication years

Data type         Count  Range       Studies
Structural        91     1992–2021   [2,4,8,16–19,21–50,52–95,97–…]
Temporal          78     1995–20…    [4,8,18,19,21–23,25,28,29,31–34,36–49,51–57,59–63,65–77,79–100,102,104,105]
Socioeconomic     40     1992–2021   [2,8,18,19,21,23,29,31,36,37,39–41,44,57,59,61,63–66,68–70,73,75,77,78,80–82,85,89,92,95,97,100,103–105]
Environmental     13     1996–2020   [21,24,26,29,41,44,55,58,59,64,69,87,98]
POI               56     2002–2021   [16–19,21,23–27,30–35,38,39,44,46,48,49,52–54,56–59,64–66,69,71,73,74,76–81,83,86–89,92,95,97,98,100–102,104,105]
Basic spatial     81     1992–2021   [2,4,16–19,21–24,27–39,41–46,49–55,57–62,64–68,70–81,83–95,97,98,100–106]
Advanced spatial  16     2002–2020   [26,27,30–32,39,44,51,55,57–59,61,65,66,86]
Graphs            1      2021–2021   [19]
Images            1      2021–2021   [104]
Text              2      2021–2021   [28,106]

5.2.1. Structural Features

Structural property features are a fundamental component of nearly all property valuation
models. As evident from Table 6, an overwhelming majority of the reviewed studies
incorporated these characteristics. While the depth of structural detail varies widely, from a
single living area measure to extensive lists of attributes, it's clear that this information forms the
backbone of property valuation datasets.

Property size, often measured by living area, consistently emerges as a crucial factor. Several
studies, including those employing machine learning techniques, have highlighted its significant
impact on property value. For instance, research using four machine learning models found
house area to be the most influential factor, accounting for between 8% and 20% of property
value. Other key structural attributes, such as the number of bedrooms and the floors, also
frequently rank among the most important predictors.

5.2.2. Temporal Data

Time-related data, such as age, construction year, and transaction date, are commonly incorporated
into property valuation models. Often represented as numerical values or dummy variables, this
temporal information is frequently treated as a structural property characteristic.
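
The two encodings mentioned above can be sketched as follows. The function and feature names are
hypothetical; in a regression with an intercept, one of the year dummies would normally be
dropped to avoid perfect collinearity.

```python
def temporal_features(construction_year, sale_year, sale_years):
    """Encode building age as a number and the transaction year as dummy variables."""
    features = {"age": sale_year - construction_year}
    for year in sale_years:
        features[f"sold_{year}"] = int(sale_year == year)
    return features


# Hypothetical example: a house built in 1995 and sold in 2020.
example = temporal_features(1995, 2020, sale_years=[2019, 2020, 2021])
```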

While some studies estimate separate models for different years due to data limitations, time-
series models explicitly leverage temporal data to capture property value trends over time.
Another approach, the repeat sales method, analyzes price changes of the same property over
time to estimate market trends. However, this method is often combined with other valuation
techniques for better accuracy.

Given its widespread use and consistent inclusion in feature importance analyses, temporal data is
considered as crucial to property valuation as structural information. Factors like building age
and construction year frequently rank among the most influential predictors of property value.

5.2.3. Socioeconomic Features


Socioeconomic features are often integrated into hedonic pricing models to enhance the accuracy
and depth of property assessments.

Neighborhood-level demographic and economic data include population levels, age ratios,
single-person household statistics, owner occupation rates, and unemployment rates. These
variables describe neighborhood characteristics and support a more nuanced understanding of
property values and their determinants.

Detailed occupant-level socioeconomic features, such as household income, years of education,
and work experience, have also been used. These variables allow a more granular and
individualized assessment of property values, taking into account the unique attributes of each
household.

5.2.4. Environmental Features

Environmental features can play a significant role in property valuation as part of broader
neighborhood characteristics. For instance, a street quality index and an index of nonresidential
land use can be derived from specific data sets. Similarly, neighborhood land use or land cover
variables are often incorporated into models to reflect environmental quality.

Census variables that measure overall environmental quality, along with specific variables
related to air and noise pollution, are frequently used in these models. Unlike the direct
measurement of air pollution levels, environmental quality often relies on subjective
assessments, such as the perceived presence of greenery and general environmental conditions.

Some research creates land use and land cover variables, especially those linked to visibility,
using digital elevation models and Geographic Information System (GIS) data. Furthermore,
seismic activity data can be used as an explanatory variable, and the effect of water quality
on real estate values has also been investigated.

Overall, environmental data is included in a substantial number of property valuation studies,
underscoring its importance in creating accurate and comprehensive valuation models.

5.2.5. POI Data


Points of Interest (POIs) are frequently utilized to evaluate a home's neighborhood quality. The
most frequently encountered POIs include schools, hospitals, rail stations, highways, central
business districts (CBDs), and natural features. POI information is usually included in property
valuation models as a distance feature, namely each property's distance from the position of
the POI.

POIs can also be included as dummy variables to indicate whether a specific kind of POI is
present within a given radius of the property. The quantity of various POIs, like neighboring
eateries or schools, may also be taken into account.
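
The three POI encodings described above (distance, presence dummy, and count) can be sketched in
a few lines. The example assumes projected planar coordinates in kilometers and a hypothetical
list of school locations; the feature names are illustrative.

```python
import math


def poi_features(home, pois, radius):
    """Distance to the nearest POI, a presence dummy, and a count within `radius`."""
    distances = [math.dist(home, p) for p in pois]
    return {
        "nearest_distance": min(distances),
        "within_radius": int(any(d <= radius for d in distances)),
        "count_within_radius": sum(d <= radius for d in distances),
    }


# Hypothetical toy data: one home and three schools, coordinates in km.
schools = [(3.0, 4.0), (0.0, 1.0), (10.0, 0.0)]
features = poi_features((0.0, 0.0), schools, radius=5.0)
```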

In some studies, POI hotspots, such as green spaces and commercial or business areas, have
been identified using social media check-in data. POI data is utilized in more than half of
property valuation studies, highlighting its importance in creating accurate and comprehensive
valuation models.

5.2.6. Basic Spatial Data

This kind of input data comprises location data in the form of coordinates and distance features
calculated using common metrics like Euclidean distance.

Coordinates: Since "spatial" and "geospatial" are included in the search query, a lot of articles
include location information of some kind, particularly geographic coordinates. A total of 54
papers make use of this data in different ways or include coordinates as variables in their
models. For example, the spatial weights matrix in econometric models is based on distances
estimated from coordinates. Furthermore, coordinate data are needed for spatially varying
coefficient models in order to calculate the coefficients. In one study employing a Random
Forest (RF) model to forecast home values, latitude was shown to be one of the most important
features.

Distance Features: This type of input data includes actual distance or accessibility measures
as model features. Dummy variables indicating the presence of a POI within a certain radius are
considered POI data rather than distance features. Euclidean distance is frequently used for
calculating these features. The Haversine formula, which approximates the great-circle distance
on the Earth's surface, is another method used. One common distance feature in the literature is
the measure of distance to the central business district (CBD), which has been included in
hedonic models from the early days of property valuation studies.
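
For reference, a minimal implementation of the Haversine formula is sketched below. It assumes a
spherical Earth with a mean radius of 6371 km, which is why the result is an approximation; the
function name is illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, assuming a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

A distance-to-CBD feature would then be `haversine_km(home_lat, home_lon, cbd_lat, cbd_lon)`,
where the four arguments are placeholders for the property and CBD coordinates.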

Basic spatial data is widely used in property valuation models and has been consistently included
throughout the study period. However, the approach to using exact location data has evolved. In
the early years, coordinates were often implied through weight matrices in spatial econometric
models and kriging methods. More recently, a direct approach has been adopted, incorporating
longitude and latitude variables into predictive models.

5.2.7. Advanced Spatial Data

Advanced measurements based on networks and topographical data are taken into account in this
category.

Topographical Data: Visibility features are often created using digital elevation models.
These topographical details help identify the areas with particular land uses that are visible
from each property. Studies on noise pollution that may have an impact on properties also
include noise levels derived from noise data in their models.

Advanced Distance Features: A more sophisticated and practical way to incorporate distance in
valuation models is through road network-based distance metrics. To link road networks with
property and point-of-interest locations, these sometimes call for the use of a GIS application.
Measures of accessibility and travel durations, whether traveling by vehicle, by bicycle, or on
foot, are likewise regarded as distance features. For example, certain research applies principal
component analysis (PCA) to road network-based walking and driving travel times to points of
interest in order to create accessibility covariates. Furthermore, municipal and regional
accessibility measures are frequently derived from journey durations.

In certain studies, accessibility metrics are computed by hand for both local and global
characteristics. These models incorporate a number of traffic accessibility indices, including
metro accessibility, walking accessibility in the road network, and bus accessibility, derived
from bus, road, and station data and subway maps. Traffic variables often rank among the most
important features of machine learning models.

Centrality and Connectivity: The street network is used to compute variables pertaining to
centrality and connectivity, emphasizing the significance of these elements in property valuation
models.

While basic spatial information is widely used, advanced spatial data appear in only 16 studies,
seven published before 2010 and nine after. The presence of advanced spatial data typically raises
the data novelty score significantly. Among the studies with high data novelty scores, the
majority include advanced spatial information, often combining multiple input data types. This
combination of data types underscores the significant role advanced spatial information plays in
enhancing the novelty and accuracy of property valuation models.

5.2.8. Graph Data

In one study, the authors built their own graph by joining homes with regions, rail stations,
schools, and other points of interest (POIs), combining their properties into a location-based
structure. A graph neural network was then used to embed this graph, and the embeddings were
applied to the prediction model as a set of features. Applications of graphs in real estate
appraisal models have been few and have appeared only since 2021. Because it introduces graph
data, this article has a high data novelty score. Given that it was released in 2021, it appears
in Figure 4 close to the end of the x-axis.

5.2.9. Image Data


Visual aspects should be taken into account in property appraisal procedures since they are
frequently important to buyers. In one study, these visual characteristics were extracted from
satellite data and then convolutional neural networks (CNNs) were utilized to embed them in
vectors. Following that, these vectors were integrated with temporal, spatial, socioeconomic,
location, and point-of-interest characteristics to provide the final prediction model. Because
of its high data novelty score and its 2021 publication date, this publication appears in the
top-right corner of Figure 4.

5.2.10. Textual Data

Textual descriptions of houses have been added to the common features in property valuation
models in recent research. One method used a document-word frequency matrix to include these
descriptions in the valuation model. A different study used an LSTM model to process the textual
information. The models gave a lot of weight to amenities, transportation, and place names,
which are key characteristics of the apartments taken from the descriptions. This creative
application of textual data has high data novelty scores and appears in two of the most recent
publications, placing them in the upper-right corner of Figure 4.

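
A document-word frequency matrix of the kind mentioned above can be sketched as follows. The
whitespace tokenization, the lowercasing, and the two toy descriptions are assumptions for
illustration; the cited study's preprocessing is not described in this section.

```python
from collections import Counter


def doc_term_matrix(descriptions):
    """Return (vocabulary, matrix) where matrix[i][j] counts word j in description i."""
    tokenized = [d.lower().split() for d in descriptions]
    vocab = sorted({word for doc in tokenized for word in doc})
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        matrix.append([counts[word] for word in vocab])
    return vocab, matrix


# Hypothetical property descriptions.
vocab, matrix = doc_term_matrix([
    "bright flat near metro",
    "flat with garden near park",
])
```

Each row of the matrix can then be appended to the structural and spatial features of the
corresponding property.
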
5.2.11. Exterior for Your Home

Classic Charm: Wood Siding

• Clapboard: This timeless style features horizontal overlapping planks, offering a traditional
  and cozy appearance.
• Shingles: Known for their textured look, shingles can be made from cedar, asphalt, or other
  materials.
• Shiplap: This vertical plank siding brings a modern farmhouse aesthetic to your home.

Low-Maintenance Appeal: Vinyl and Fiber Cement

• Vinyl: Offering a wide range of colors and styles, vinyl siding is durable, low-maintenance,
  and often mimics the look of wood.
• Fiber cement: Known for its durability and fire resistance, fiber cement siding provides a
  versatile look that can be painted to match your desired color.

Timeless Elegance: Brick and Stone

• Brick: A classic choice that exudes timeless beauty and durability. It offers excellent
  insulation and requires minimal maintenance.
• Stone: For a luxurious and natural look, stone exterior can add significant curb appeal.
  However, it's typically more expensive and requires professional installation.

Modern and Sleek: Stucco and Metal

• Stucco: This cement-based material offers a smooth or textured finish and is popular in warmer
  climates for its insulating properties.
• Metal: Contemporary and low-maintenance, metal siding comes in various finishes, including
  aluminum, steel, and copper.

Factors to Consider

When selecting your exterior covering, consider these factors:

• Climate: Your local weather conditions will influence material durability.
• Maintenance: Evaluate your willingness to invest time in upkeep.
• Cost: Determine your budget for materials and installation.
• Style: Choose a material that complements your home's architecture.
• Energy efficiency: Consider materials with good insulation properties.

Rates the overall condition of the house

Determining a home's overall condition is crucial for both buyers and sellers. It involves a
comprehensive evaluation of various factors that contribute to the property's value and
livability.

Key Factors to Consider

• Structural Integrity: This examines the foundation, framing, roof, and overall stability of
  the house. Signs of cracks, leaks, or structural damage can significantly impact the condition
  rating.
• Exterior Condition: The assessment includes the state of siding, roofing, windows, doors, and
  landscaping. Factors like age, wear and tear, and maintenance history are considered.
• Interior Condition: Evaluating the condition of walls, floors, ceilings, kitchens, bathrooms,
  and overall cleanliness is essential. Up-to-date fixtures, appliances, and finishes contribute
  to a higher rating.
• Systems and Components: Checking the functionality and age of heating, cooling, electrical,
  plumbing, and ventilation systems is crucial.
• Maintenance History: A well-maintained home generally reflects a better overall condition.
  Evidence of regular upkeep and repairs is a positive indicator.

Rating Systems

While there's no standardized rating system, common terms used to describe a home's overall
condition include:

• Excellent: The home is in pristine condition with minimal wear and tear.
• Very Good: The home is well-maintained with minor cosmetic or functional issues.
• Good: The home is in average condition with some noticeable wear and tear.
• Fair: The home requires significant repairs or renovations.
• Poor: The home is in disrepair and may require extensive work.

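
For use in a price prediction model, such condition labels are typically converted to numbers.
The sketch below assumes a simple ordinal mapping from 1 to 5; the scores are hypothetical, since
no standardized scale exists.

```python
# Hypothetical ordinal scores for the condition labels listed above.
CONDITION_SCORES = {"Poor": 1, "Fair": 2, "Good": 3, "Very Good": 4, "Excellent": 5}


def encode_condition(label):
    """Map a condition label to its ordinal score, ignoring case and outer spaces."""
    return CONDITION_SCORES[label.strip().title()]
```
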
5.3 Discussion

Figure 5 illustrates the relationship between model novelty and data novelty across 93 articles,
with each data point representing an article and colored by publication year. The plot displays
two novelty scores for each article, and jittering is applied to improve legibility due to overlap in
scores. The plot is annotated with five inferred clusters, categorized by model and data novelty.

Cluster 1: This cluster uses traditional input data types with classic hedonic procedures. It
includes model types with novelty scores between 1 and … and data novelty values between 1 and
9. The input data types are structural, temporal, socioeconomic, environmental, point-of-interest,
and basic spatial information. This cluster's articles date from 1992 to 2021 and mostly use
approaches such as Multiple Regression Analysis (MRA), Spatial Econometric Models (SEM),
kriging, Spatially Varying Coefficient (SVC) models, time series, and Fuzzy Logic (FL).
Lately, this cluster's works have concentrated on using tried-and-true techniques while
broadening the scope of input data types.

Cluster 2: The 14 papers in this cluster have a model novelty score of 1 and a data novelty
score of … or higher, demonstrating the application of advanced spatial data along with
traditional hedonic approaches. Prior to 2015, the models in this cluster were mainly MRA and
SEM; kriging and SVC models appear subsequently. The cluster shows an emphasis on incorporating
advanced spatial data into conventional appraisal techniques.

Cluster 3: This cluster, which consists of six articles, uses conventional input data types but
includes more sophisticated model types, such as basic machine learning techniques.
Interestingly, this cluster's model novelty declines with time: papers released between 2011
and 2016 use Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs), while those
released between 2017 and 2021 use Decision Trees (DT) and Nearest Neighbors (NN). Data novelty
varies; just one article does not use basic spatial data.

Cluster 4: With three articles, the smallest cluster has low data novelty and strong model novelty.
Published in 2020 or 2021, these articles apply advanced machine learning algorithms, namely
Random Forest (RF) techniques, to conventional input data sources. The articles combine diverse
data types, including POI, structural, temporal, socioeconomic, and basic spatial details.

Cluster 5: The five articles in this most recent and innovative cluster have significant data and
model novelty. These papers, which were released in the latter two years of the study period,
make use of sophisticated input data types as well as advanced ML and deep learning techniques.
Although most articles in this cluster show high data novelty, one article has a lower data
novelty score due to its focus on advanced spatial information. This cluster includes a variety of
models, including deep learning, Gradient Boosting Trees (GBT), and ensemble methods, with
one outlier article using Multi-Layer Perceptrons (MLP) and Convolutional Neural Networks
(CNNs) for image data transformation.

5.3.1 Trends and Opportunities

Overall, the clusters highlight the evolution of property valuation methods from conventional
approaches to more advanced techniques incorporating novel data types and modeling strategies.

With respect to Research Question 3, the residential property valuation field often prioritizes
proven models and data types. The majority of papers, both old and new, show minimal data and
model novelty. Some nonetheless stand out because of high data novelty, high model novelty, or
both. Recent studies have begun experimenting with novel machine learning (ML) techniques,
shifting toward more sophisticated prediction models and away from conventional techniques. The
same tendency can be seen in the rising use of novel data forms such as graphs and unstructured
data, even though conventional characteristics like basic spatial, temporal, and structural data
remain common.
Several factors contribute to the reliance on conventional hedonic methods and traditional data
types in residential property valuation. One significant factor is data availability. While big data
and deep learning methods are becoming more accessible, high-quality, publicly available housing
datasets that include transaction prices, hedonic attributes, and spatial data are still scarce.
Existing datasets, such as those from King County (USA), Melbourne (Australia), Ames (Iowa, USA),
and Boston (USA), are often limited in spatial scope and require combining with separate, more
advanced spatial data sources. This can complicate data integration, leading to issues like
merging difficulties, missing data, and sparsity. Additionally, dealing with unstructured data
like graphs, images, and text presents challenges due to the resource-intensive nature of data
collection and preprocessing.

Another factor may be the gap between academic research and industry practice. While academic
research has traditionally focused on established methods and data types, industry may explore
more novel techniques and data types that are not always reflected in academic literature.

The progressive integration of advanced and traditional data sources with increasingly
sophisticated forecasting methodologies presents new opportunities for the property valuation
field. It is possible to make use of unstructured data, like text and image files, and to
create sophisticated spatial information and more intricate distance calculations. The use
of deep learning methods, which are well-suited for handling unstructured data, could further
enhance the effectiveness of these models. Future research might focus on integrating advanced
input data types with traditional ones and developing tailored ML and deep learning methods to
manage and extract valuable insights from diverse data sources.

Learning curves and training time according to training set

Selecting the optimal algorithm for predicting house prices is a complex task that involves
considering multiple factors beyond just raw performance metrics. While it's tempting to simply
choose the algorithm with the highest accuracy, a more nuanced approach is necessary.
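
As an illustration of the learning curves and training times this section refers to, the sketch below measures both for increasing training set sizes. It is a minimal example, not the procedure used in this project: the data are synthetic, the feature layout and the choice of a Random Forest are assumptions made only for demonstration, and scikit-learn's learning_curve helper does the measuring.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import learning_curve

# Synthetic stand-in for a housing dataset (hypothetical features and prices).
rng = np.random.default_rng(0)
n_samples = 400
X = rng.uniform(0, 1, size=(n_samples, 5))
y = 250000 * X[:, 0] + 80000 * X[:, 1] ** 2 + rng.normal(0, 10000, n_samples)

model = RandomForestRegressor(n_estimators=50, random_state=0)

# return_times=True additionally reports how long each fit and scoring call took.
train_sizes, train_scores, val_scores, fit_times, score_times = learning_curve(
    model,
    X,
    y,
    train_sizes=np.linspace(0.2, 1.0, 4),  # 20% to 100% of each training fold
    cv=5,
    scoring="r2",
    return_times=True,
)

# Average over the five folds to get one point per training set size.
for size, tr, va, t in zip(
    train_sizes,
    train_scores.mean(axis=1),
    val_scores.mean(axis=1),
    fit_times.mean(axis=1),
):
    print(f"n={int(size):4d}  train R2={tr:.3f}  validation R2={va:.3f}  fit time={t:.3f}s")
```

A validation score that keeps rising with the training set size suggests that more data would help, while a training score far above the validation score points to overfitting.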

Key Considerations:

 Performance Metrics: While accuracy (measured by metrics like R-squared) is crucial,
it's essential to consider other metrics such as Mean Squared Error (MSE) or Mean
Absolute Error (MAE) to get a comprehensive understanding of the model's performance.
 Overfitting: Overfitting occurs when a model performs remarkably well on training data
but poorly on unseen data. Algorithms that are good at avoiding overfitting include XGBoost
and LightGBM.
 Model Complexity and Training Time: Some algorithms, like deep neural networks,
require substantial computational resources and time for training. For large datasets,
simpler models like linear regression or decision trees might be more practical.
 Interpretability: Understanding the factors influencing the model's predictions can be
valuable. Linear regression models are generally more interpretable than complex
models like random forests or neural networks.
 Model Deployment: The size of the model can affect deployment. Smaller models are
often preferred for real-time applications.
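
The first two considerations above can be checked with a few lines of code. The following is a minimal sketch under stated assumptions: the data are synthetic, the feature names (floor_area, bathrooms) and the report helper are hypothetical, and scikit-learn's GradientBoostingRegressor stands in for any of the algorithms discussed.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a housing table (hypothetical features and prices).
rng = np.random.default_rng(0)
n = 500
floor_area = rng.uniform(40, 250, n)   # square metres
bathrooms = rng.integers(1, 4, n)      # 1 to 3 bathrooms
price = 1500 * floor_area + 20000 * bathrooms + rng.normal(0, 15000, n)
X = np.column_stack([floor_area, bathrooms])

X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.25, random_state=0
)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)


def report(y_true, y_pred):
    """Return R-squared, MSE and MAE for one set of predictions."""
    return {
        "r2": r2_score(y_true, y_pred),
        "mse": mean_squared_error(y_true, y_pred),
        "mae": mean_absolute_error(y_true, y_pred),
    }


train_scores = report(y_train, model.predict(X_train))
test_scores = report(y_test, model.predict(X_test))

# A large gap between training and test R-squared hints at overfitting.
overfit_gap = train_scores["r2"] - test_scores["r2"]
print("train:", train_scores)
print("test: ", test_scores)
print("R-squared gap (train - test):", round(overfit_gap, 4))
```

Reporting MAE next to MSE is useful because MSE penalizes a few large errors heavily, whereas MAE reflects the typical error in price units.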

Iterative Model Selection:

Given these complexities, a systematic approach is recommended:

1. Shortlist Algorithms: Based on initial performance metrics, select a few promising
algorithms (e.g., XGBoost, LightGBM, Random Forest).
2. Hyperparameter Tuning: Experiment with different hyperparameter settings for each
algorithm to optimize performance.
3. Cross-Validation: Use cross-validation to assess the performance of the models on several
data subsets and to spot possible overfitting.
4. Model Comparison: Compare the performance of the tuned models using various metrics.
5. Consider Additional Factors: Evaluate factors like training time, model size, and
interpretability to make a final decision.
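
The steps above can be sketched as a small loop. This is an illustrative example only: the data are synthetic, the hyperparameter grids are deliberately tiny, and scikit-learn's own LinearRegression, RandomForestRegressor and GradientBoostingRegressor are used as stand-ins so that the sketch does not depend on the XGBoost or LightGBM packages.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Synthetic data with one non-linear effect (hypothetical, for illustration).
rng = np.random.default_rng(1)
n = 300
X = rng.uniform(0, 1, size=(n, 4))
y = 200000 * X[:, 0] + 50000 * np.sin(6 * X[:, 1]) + rng.normal(0, 5000, n)

cv = KFold(n_splits=5, shuffle=True, random_state=1)

# Step 1: shortlist. Step 2: small illustrative hyperparameter grids.
candidates = {
    "linear": (LinearRegression(), {"fit_intercept": [True, False]}),
    "random_forest": (
        RandomForestRegressor(random_state=1),
        {"n_estimators": [50], "max_depth": [4, None]},
    ),
    "gradient_boosting": (
        GradientBoostingRegressor(random_state=1),
        {"n_estimators": [50, 100], "learning_rate": [0.1]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Steps 2 and 3: tune each candidate with cross-validated grid search.
    search = GridSearchCV(estimator, grid, cv=cv, scoring="r2")
    search.fit(X, y)
    # Step 4: per-fold scores of the tuned model. Reusing the tuning folds
    # makes these slightly optimistic; nested CV would avoid that.
    scores = cross_val_score(search.best_estimator_, X, y, cv=cv, scoring="r2")
    results[name] = {
        "params": search.best_params_,
        "mean_r2": scores.mean(),
        "std_r2": scores.std(),
    }

for name, res in results.items():
    print(f"{name:18s} R2={res['mean_r2']:.3f} +/- {res['std_r2']:.3f}  {res['params']}")
```

The standard deviation across folds gives a rough view of stability, which supports step 5 when two models have similar mean scores.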

Ensemble Methods:

Combining multiple models (ensembling) can often improve predictive performance. You can
experiment with methods like stacking, boosting, and bagging.
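
A minimal stacking sketch is shown below. It assumes synthetic data and uses scikit-learn's StackingRegressor with a Random Forest (a bagging-style model) and a Gradient Boosting model as base learners; the choice of RidgeCV as the final estimator is an assumption for illustration, not a recommendation from this study.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

# Synthetic data with an interaction effect (hypothetical, for illustration).
rng = np.random.default_rng(2)
n = 400
X = rng.uniform(0, 1, size=(n, 4))
y = 180000 * X[:, 0] + 60000 * X[:, 1] * X[:, 2] + rng.normal(0, 8000, n)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=2
)

base_models = [
    ("rf", RandomForestRegressor(n_estimators=50, random_state=2)),  # bagging-style
    ("gbr", GradientBoostingRegressor(random_state=2)),              # boosting
]

# Stacking: the final estimator learns how to weight the base models'
# out-of-fold predictions (cv=5).
stack = StackingRegressor(estimators=base_models, final_estimator=RidgeCV(), cv=5)
stack.fit(X_train, y_train)

# Test-set R-squared of each base model alone and of the stacked ensemble.
scores = {
    name: est.fit(X_train, y_train).score(X_test, y_test)
    for name, est in base_models
}
scores["stack"] = stack.score(X_test, y_test)
print(scores)
```

Whether the stacked model beats its best base model depends on the data, so the comparison should always be made on held-out data.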

Model Interpretation
Selecting the optimal algorithm for predicting house prices is a complex task that requires
consideration of various factors beyond just raw performance metrics. While the
Gradient Boosting Regressor (GBR) might initially appear to be the top contender due to
its strong performance in both the test dataset and cross-validation, other factors must be
weighed.

Key Considerations for Model Selection

 Performance: While accuracy (measured by R-squared) is crucial, other metrics like Mean
Squared Error (MSE) or Mean Absolute Error (MAE) should also be considered. Small
differences in performance might not justify the complexities of some models.
 Overfitting: Models that exhibit remarkable performance on training data but subpar
performance on unseen data generalize poorly and should be treated with caution.
 Stability: Consistency in performance across different data subsets is essential. Models
with low variance in cross-validation are generally preferred.
 Computational Efficiency: Training time and model complexity can impact the
feasibility of deployment, especially in real-time applications.
 Interpretability: Understanding the factors influencing predictions can be valuable.
Simpler models like linear regression are often more interpretable than complex ones.

Model Interpretation: Understanding Feature Importance

To gain insights into which factors drive house prices, feature importance analysis is crucial.
While many algorithms provide built-in methods, using a consistent approach like permutation
importance offers several advantages:

 Model Agnostic: It can be applied to any model.


 Interpretability: The results are easily understandable.
 Efficiency: It's computationally less intensive than other methods.

By shuffling feature values and measuring the impact on model performance, permutation
importance helps identify the most influential factors.
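
The shuffling procedure described above is available in scikit-learn as permutation_importance. The sketch below is illustrative only: the data are synthetic, the feature names (floor_area, bathrooms, random_noise) are hypothetical, and the third feature is deliberately unrelated to the price so that its importance should come out near zero.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: price depends on floor area and bathrooms, not on the noise column.
rng = np.random.default_rng(3)
n = 600
feature_names = ["floor_area", "bathrooms", "random_noise"]
floor_area = rng.uniform(40, 250, n)
bathrooms = rng.integers(1, 4, n).astype(float)
random_noise = rng.normal(0, 1, n)
X = np.column_stack([floor_area, bathrooms, random_noise])
y = 1500 * floor_area + 20000 * bathrooms + rng.normal(0, 10000, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=3
)
model = GradientBoostingRegressor(random_state=3).fit(X_train, y_train)

# Shuffle each feature 10 times on held-out data and record the drop in R-squared.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=3
)

importance = dict(zip(feature_names, result.importances_mean))
ranking = sorted(importance.items(), key=lambda item: item[1], reverse=True)
for name, value in ranking:
    print(f"{name:14s} mean drop in R2 = {value:.4f}")
```

Computing the importances on held-out data rather than on the training set keeps the ranking from rewarding features the model has merely memorized.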

The Path Forward

Given the trade-offs between different algorithms, it's often beneficial to experiment with
multiple models and hyperparameter tuning. Ensembling techniques, which combine predictions
from multiple models, can also improve overall performance.

Ultimately, the best model depends on the specific requirements of the project, such as the
desired level of accuracy, computational resources, and interpretability needs. A careful
evaluation of these factors will guide the selection of the most suitable algorithm.

Machine Learning Outperforms Traditional Methods

The study clearly demonstrates the superiority of machine learning algorithms over the
traditional linear model (OLS) in predicting house prices. This is attributed to the ability of these
algorithms to capture complex, non-linear relationships within the data, which is essential for
accurately modeling the real estate market.

Ensemble Methods: A Strong Contender

Among the machine learning algorithms tested, ensemble methods like Gradient Boosting
(GBM), XGBoost, and LightGBM exhibited exceptional performance. These algorithms excel at
handling complex datasets and reducing overfitting, making them ideal for housing price
prediction. While Random Forest (RF) also performed well, it was more prone to overfitting
compared to the boosting-based methods.

Key Factors Influencing House Prices

Feature importance analysis revealed that floor area, number of bathrooms, and the presence
of an elevator were the most significant predictors of house prices. Additionally, location,
specifically proximity to desirable areas like Playa de San Juan and El Cabo de la Huerta, plays a
crucial role. Neighborhood characteristics, such as net household income, also significantly
impact property values.

Impact of the COVID-19 Pandemic

The study highlights the temporary impact of the COVID-19 pandemic on the housing market in
Alicante. While there was an initial price decline, the market quickly recovered, and prices
surpassed pre-pandemic levels.

Limitations and Future Research

While the study provides valuable insights, it's essential to acknowledge limitations.
Relying solely on asking prices might not perfectly reflect actual transaction prices.
Additionally, the focus on Alicante limits the generalizability of findings to other regions.

Future research could explore:

 Longitudinal analysis: Tracking price changes over extended periods to identify trends
and patterns.
 Incorporation of additional features: Exploring the impact of factors like energy
efficiency, property age, and school district quality.

Chapter 6

Conclusions
Policymakers, buyers, sellers, and other real estate players all depend on accurate residential
property appraisal. A comprehensive analysis of the literature on this subject enhances research
and benefits society by improving market transparency, informing housing policies, and improving
real estate appraisal procedures.

Using the PSALSAR framework, this study conducted a thorough evaluation of prior research on
predictive techniques for geospatial data-based house price forecasting. The approach comprised
the following steps:

1. Protocol: Defined the research scope and questions.


2. Search and Appraisal: Developed a search strategy to find relevant studies.
3. Synthesis and Analysis: Categorized and analyzed the papers based on the techniques and
input data types employed.

Several model types (such as MRA, kriging, SEM, SVC, time series, FL, NN, DT, SVM, ANN, RF, GBT,
and DL) and input data types (such as structural, temporal, socioeconomic, environmental, POI,
basic spatial, advanced spatial, graphs, images, and text) were identified. A model novelty score
and a data novelty score were assigned to each paper. Model novelty scores were based on the
complexity of the methods used, while data novelty scores reflected the diversity and
advancement of input data types.

These scores were plotted over time and analyzed to identify five distinct clusters:

1. Conventional Methods with Traditional Data: This cluster, comprising nearly 70% of
the reviewed literature, indicates a dominance of traditional methods and data types,
showing low levels of novelty.
2. Conventional Methods with Advanced Spatial Data: This group combines traditional
methods with more advanced spatial information.
3. Basic ML Methods with Traditional Data: Includes basic machine learning methods
applied to traditional data types.
4. Advanced ML Methods with Traditional Data: Features advanced machine learning
techniques but still relies on traditional data.
5. Advanced ML Methods with Advanced Data: The most innovative cluster, using both
advanced machine learning methods and novel data types like images and text.

Although traditional approaches are still widely used, new studies are starting to look into
advanced spatial data, unstructured data, and complex machine learning and deep learning
methods.

This shift presents several research opportunities, including:

 Leveraging Unstructured Data: Using deep learning to handle and analyze images and
text data.
 Developing Advanced Spatial Features: Creating more complex spatial data and
distance measures.
 Tailoring Algorithms: Adapting machine learning and deep learning methods to
effectively combine diverse features.

Future research could focus on integrating advanced input data with cutting-edge predictive
methods to enhance the accuracy and robustness of property valuation models.

Chapter 7

Future works
Key Feature Categories

1. Property Characteristics:
o Basic details: square footage, number of bedrooms, bathrooms, and floors.
o Building quality: age, condition, renovations, and material type.
o Additional features: garage size, basement, fireplace, pool, and other amenities.
2. Location:
o Geographic coordinates: latitude and longitude.
o Neighborhood: crime rates, school districts, proximity to amenities.
o Zoning regulations: impact on property use and value.
3. Economic Factors:
o Local and national economic indicators: GDP, unemployment rate, interest rates.
o Housing market trends: supply and demand, price indices.
4. Time-Based Features:
o Seasonal variations: price fluctuations based on time of year.
o Market trends: historical price data to identify patterns.

Feature Engineering Techniques

 Handling Missing Values: Imputation or deletion based on data characteristics.


 Outlier Detection: Identifying and handling extreme values to prevent model bias.
 Feature Scaling: Normalizing or standardizing numerical features for better model
performance.
 One-Hot Encoding: Converting categorical features into numerical representations.
 Feature Creation: Deriving new features from existing ones (e.g., creating a "rooms per
person" feature).
 Feature Interaction: Combining features to capture complex relationships (e.g.,
interaction between square footage and number of bedrooms).
 Dimensionality Reduction: Using methods such as Principal Component Analysis (PCA) to
reduce the number of features while retaining most of the information in the data.
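
Several of the techniques listed above can be chained in one preprocessing pipeline. The following sketch is only an illustration under stated assumptions: the small table and its column names (sqft, bedrooms, year_built, property_type) are made up, the 1.5 x IQR rule is one of several possible outlier treatments, and the pipeline is built with scikit-learn's ColumnTransformer.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical listings with missing values and one extreme floor area.
homes = pd.DataFrame({
    "sqft": [1200.0, 1850.0, np.nan, 2400.0, 950.0, 15000.0],
    "bedrooms": [2, 3, 3, 4, 1, 4],
    "year_built": [1995, 2008, 1978, np.nan, 2015, 2001],
    "property_type": ["condo", "single-family", "townhouse", np.nan,
                      "condo", "single-family"],
})

# Outlier handling: cap sqft at the 1.5 * IQR fences (missing values stay missing).
q1, q3 = homes["sqft"].quantile([0.25, 0.75])
fence = 1.5 * (q3 - q1)
homes["sqft"] = homes["sqft"].clip(lower=q1 - fence, upper=q3 + fence)

# Feature interaction: floor area per bedroom.
homes["sqft_per_bedroom"] = homes["sqft"] / homes["bedrooms"]

numeric_cols = ["sqft", "bedrooms", "year_built", "sqft_per_bedroom"]
categorical_cols = ["property_type"]

numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),   # fill missing numbers
    ("scale", StandardScaler()),                    # standardize
    ("pca", PCA(n_components=2)),                   # reduce to 2 components
])
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer(
    [("num", numeric_steps, numeric_cols),
     ("cat", categorical_steps, categorical_cols)],
    sparse_threshold=0,  # always return a dense array
)

features = preprocess.fit_transform(homes)
print(features.shape)  # two PCA components plus three one-hot columns
```

Wrapping the steps in a pipeline means the imputation values, scaling statistics and PCA directions are learned from the training data only and then reused unchanged on new listings.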

Example Features

 Derived Features:
o Price per square foot
o Age of the property
o Distance to schools, parks, and public transportation
o Property tax rate
 Categorical Features:
o Property type (single-family, condo, townhouse)
o Heating and cooling systems
o Roof type
o Neighborhood quality
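
Some of the derived features above can be computed directly from a listings table. The sketch below is a hedged illustration: the table, its column names, the coordinates and the school location are all hypothetical, and the great-circle (haversine) formula is one simple choice of distance measure. Note that price per square foot is derived from the sale price itself, so it suits descriptive analysis or features built from comparable sales rather than use as a predictor of the same sale.

```python
import numpy as np
import pandas as pd


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (
        np.sin((lat2 - lat1) / 2) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))  # 6371 km: mean Earth radius


# Hypothetical listings; the values are made up for illustration.
listings = pd.DataFrame({
    "price": [300000.0, 450000.0, 210000.0],
    "sqft": [1500.0, 2250.0, 1000.0],
    "year_built": [2004, 1990, 2015],
    "sale_year": [2024, 2024, 2024],
    "lat": [47.61, 47.65, 47.55],
    "lon": [-122.33, -122.30, -122.38],
})
school_lat, school_lon = 47.61, -122.33  # hypothetical school location

listings["price_per_sqft"] = listings["price"] / listings["sqft"]
listings["property_age"] = listings["sale_year"] - listings["year_built"]
listings["dist_school_km"] = haversine_km(
    listings["lat"], listings["lon"], school_lat, school_lon
)

print(listings[["price_per_sqft", "property_age", "dist_school_km"]])
```

Straight-line distance is only a first approximation; travel time or road network distance may describe accessibility better where such data are available.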

Challenges and Considerations

 Data Availability: Access to comprehensive and accurate data is essential.


 Feature Relevance: Identifying features that significantly impact prices can be
challenging.
 Overfitting: Overfitting can result from adding too many features, which lowers the
generalization of the model.
 Computational Efficiency: Feature engineering can be computationally expensive,
especially with large datasets.

1. Structural Features: These include the physical characteristics of the property, such as its
size, number of bedrooms, bathrooms, and overall condition. These features provide a
fundamental basis for estimating property values.
2. Temporal Features: These account for changes over time, such as the age of the
property, historical price trends, and seasonal effects. They help capture how the value of
a property evolves and responds to market conditions.
3. Socioeconomic Features: These reflect the characteristics of the neighborhood and its
residents, including average income levels, employment rates, and educational
attainment. They offer insights into the broader economic context influencing property
values.
4. Environmental Features: These include factors like proximity to parks, noise levels,
and air quality. They help assess the livability and environmental quality of a location.
5. Points of Interest (POIs): Features related to nearby amenities such as schools,
hospitals, shopping centers, and transportation hubs. The presence and distance of these
amenities can significantly impact property values.
6. Spatial Features: These involve geographic and topographic data, such as the property’s
location within a city, views from the property, and accessibility based on road networks.
Advanced spatial data might include complex measures of distance and visibility.
7. Unstructured Data: This includes textual descriptions, images, and other forms of data
that can be analyzed using modern techniques like natural language processing (NLP).

Bibliography

[1] Rosen, S. Hedonic Prices and Implicit Markets: Product Differentiation in Pure Competition.
J. Political Econ. 1974, 82, 34–55.

[2] Can, A. Specification and estimation of hedonic housing price models. Reg. Sci. Urban
Econ. 1992, 22, 453–474.

[3] Kang, Y.; Zhang, F.; Peng, W.; Gao, S.; Rao, J.; Duarte, F.; Ratti, C. Understanding house
price appreciation using multi-source big geo-data and machine learning. Land Use Policy 2021,
111, 104919.

[4] Yacim, J.A.; Boshoff, D.G.B. A Comparison of Bandwidth and Kernel Function Selection in
Geographically Weighted Regression for House Valuation. Int. J. Technol. 2019, 10, 58.

[5] Tobler, W.R. A Computer Movie Simulating Urban Growth in the Detroit Region.
Econ. Geogr. 1970, 46, 234–240.

[6] Gao, Q.; Shi, V.; Pettit, C.; Han, H. Property valuation using machine learning algorithms on
statistical areas in Greater Sydney, Australia. Land Use Policy 2022, 123, 106409.

[7] Sisman, S.; Aydinoglu, A. Improving performance of mass real estate valuation through
application of the dataset optimization and Spatially Constrained Multivariate Clustering
Analysis. Land Use Policy 2022, 119, 106167.

[8] Yang, Y.; Liu, J.; Xu, S.; Zhao, Y. An Extended Semi-Supervised Regression Approach with
Co-Training and Geographical Weighted Regression: A Case Study of Housing Prices in
Beijing. ISPRS Int. J. Geo-Inf. 2016, 5, 4.

[9] Mengist, W.; Soromessa, T.; Legese, G. Method for conducting systematic literature review
and meta-analysis for environmental science research. MethodsX 2020, 7, 100777.

[10] Krause, A.L.; Bitter, C. Spatial econometrics, land values and sustainability: Trends in real
estate valuation research. Cities 2012, 29, S19–S25.

[11] Mccluskey, W.J.; Borst, R.A. Specifying the effect of location in multivariate valuation
models for residential properties: A critical evaluation from the mass appraisal perspective. Prop.
Manag. 2007, 25, 312–343.

[12] Pagourtzi, E.; Assimakopoulos, V.; Hatzichristos, T.; French, N. Real estate appraisal: A
review of valuation methods. J. Prop. Investig. Financ. 2003, 21, 383–401.

[13] Wang, D.; Li, V.J. Mass appraisal models of real estate in the 21st century: A systematic
literature review. Sustainability 2019, 11, 7006.

[14] Zhou, G.; Ji, Y.; Chen, X.; Zhang, F. Artificial Neural Networks and the Mass Appraisal of
Real Estate. Int. J. Online Eng. (IJOE) 2018, 14, 180.

[15] Geerts, M.; De Weerdt, J.; vanden Broucke, S. A Survey of Methods and Input Data Types
for House Price Prediction: Literature List. KU Leuven RDR 2022, V2.

[16] Kutasi, D.; Badics, M.C. Valuation methods for the housing market: Evidence from
Budapest. Acta Oecon 2016, 66, 527–546.

[17] Yilmazer, E.S.; Kocaman, S. A mass appraisal assessment study using machine learning
based on multiple regression and random forest. Land Use Policy 2020, 99, 104889.

[18] Zhang, Y.; Zhang, D.; Miller, E.J. Spatial Autoregressive Analysis and Modeling of
Housing Prices in City of Toronto. J. Urban Plan. Dev. 2021, 147, 05021003.

[19] Das, S.S.S.; Ali, M.E.; Li, Y.F.; Kang, Y.B.; Sellis, T. Boosting house price predictions
using geo-spatial network embedding. Data Min. Knowl. Discov. 2021, 35, 2221–2250.

[20] Bengio, Y.; Goodfellow, I.; Courville, A. Deep Learning; MIT Press: Cambridge, MA,
USA, 2017; Volume 1.

[21] Montero, J.M.; Mínguez, R.; Fernández-Avilés, G. Housing price prediction: Parametric
versus semi-parametric spatial hedonic models. J. Geogr. Syst. 2018, 20, 27–55.

[22] Nappi-Choulet, I.; Maury, T.P. A Spatial and Temporal Autoregressive Local Estimation
for the Paris Housing Market. J. Reg. Sci. 2011, 51, 732–750.

[23] Hui, E.C.M.; Zhong, J.; Yu, K. Heterogeneity in Spatial Correlation and Influential Factors
on Property Prices of Submarkets Categorized by Urban Dwelling Spaces. J. Urban Plan. Dev.
2016, 142, 04014047.

[24] Liao, W.C.; Wang, X. Hedonic house prices and spatial quantile regression. J. Hous. Econ.
2012, 21, 16–27.

[25] Jasińska, E.; Preweda, E. Statistical Modelling of the Market Value of Dwellings, on the
Example of the City of Kraków. Sustainability 2021, 13, 9339.

[26] Wu, C.; Ye, X.; Ren, F.; Wan, Y.; Ning, P.; Du, Q. Spatial and Social Media Data Analytics
of Housing Prices in Shenzhen, China. PLoS ONE 2016, 11, e0164553.

[27] Xue, C.; Ju, Y.; Li, S.; Zhou, Q.; Liu, Q. Research on accurate house price analysis by using
GIS technology and transport accessibility: A case study of Xi’an, China. Symmetry 2020, 12,
1329.

[28] Zhou, X.; Tong, W. Learning with self-attention for rental market spatial dynamics in the
Atlanta metropolitan area. Earth Sci. Inform. 2021, 14, 837–845.

[29] Adair, A.S.; Berry, J.N.; McGreal, W.S. Hedonic modelling, housing submarkets and
residential valuation. J. Prop. Res. 1996, 13, 67–83.

[30] Gultekin, B.; Yamamura, E. Predicting Housing Prices in Central Ankara, Turkey Based on
Spatial Dependence Analysis. Stud. Reg. Sci. 2002, 33, 217–227.

[31] Orford, S. Valuing Locational Externalities: A GIS and Multilevel Modelling Approach.
Environ. Plan. B Plan Des. 2002, 29, 105–127.

[32] Martínez, L.M.; Viegas, J.M. Effects of Transportation Accessibility on Residential
Property Values. Transp. Res. Rec. J. Transp. Res. Board. 2009, 2115, 127–137.

[33] Osland, L.; Thorsen, I. Predicting housing prices at alternative locations and under
alternative scenarios of the spatial job distribution. Lett. Spat. Resour. Sci. 2009, 2, 133–147.

[34] Filippova, O.; Rehm, M. The impact of proximity to cell phone towers on residential
property values. Int. J. Hous. Mark. Anal. 2011, 4, 244–267.

[35] Koramaz, T.K.; Dokmeci, V. Spatial Determinants of Housing Price Values in Istanbul. Eur.
Plan. Stud. 2012, 20, 1221–1237.

[36] Brunauer, W.A.; Lang, S.; Feilmayr, W. Hybrid multilevel STAR models for hedonic house
prices. Jahrb Reg. 2013, 33, 151–172.

[37] Brunauer, W.; Lang, S.; Umlauf, N. Modelling house prices using multilevel structured
additive regression. Stat. Model. 2013, 13, 95–123.

[38] Panduro, T.E.; Veie, K.L. Classification and valuation of urban green spaces—A hedonic
house price valuation. Landsc. Urban Plan. 2013, 120, 119–128.

[39] Franck, M.; Eyckmans, J.; De Jaeger, S.; Rousseau, S. Comparing the impact of road noise
on property prices in two separated markets. J. Environ. Econ. Policy 2015, 4, 15–44.

[40] Keskin, B.; Dunning, R.; Watkins, C. Modelling the impact of earthquake activity on real
estate values: A multi-level approach. J. Eur. Real Estate Res. 2017, 10, 73–90.

[41] Marmolejo-Duarte, C. Does urban centrality influence residential prices? An analysis for
the Barcelona Metropolitan Area. Rev. Constr. 2017, 16, 57–65.

[42] Hill, R.J.; Scholz, M. Can Geospatial Data Improve House Price Indexes? A Hedonic
Imputation Approach with Splines. Rev. Income Wealth 2018, 64, 737–756.

[43] Doumpos, M.; Papastamos, D.; Andritsos, D.; Zopounidis, C. Developing automated
valuation models for estimating property values: A comparison of global and locally weighted
approaches. Ann. Oper. Res. 2021, 306, 415–433.

[44] Osland, L.; Östh, J.; Nordvik, V. House price valuation of environmental amenities: An
application of GIS-derived data. Reg. Sci. Policy Pract. 2020, 14, 939–959.

[45] Bourassa, S.C.; Cantoni, E.; Hoesli, M. Spatial dependence, housing submarkets, and house
price prediction. J. Real Estate Financ. Econ. 2007, 35, 143–160.

[46] Chica Olmo, J. Spatial Estimation of Housing Prices and Locational Rents. Urban Stud.
1995, 32, 1331–1344.

[47] Chica-Olmo, J. Prediction of housing location price by a multivariate spatial method:
Cokriging. J. Real Estate Res. 2007, 29, 91–114.

[48] Yoo, E.H.; Kyriakidis, P. Area-to-point Kriging in spatial hedonic pricing models. J. Geogr.
Syst. 2009, 11, 381–406.

[49] Chica-Olmo, J.; Cano-Guervos, R.; Chica-Olmo, M. A Coregionalized Model to Predict
Housing Prices. Urban Geogr. 2013, 34, 395–412.

[50] Larraz, B.; Población, J. An online real estate valuation model for control risk taking: A
spatial approach. Investig. Anal. J. 2013, 42, 83–96.

[51] Szczepańska, A.; Senetra, A.; Wasilewicz-Pszczółkowska, M. The effect of road traffic
noise on the prices of residential property—A case study of the polish city of Olsztyn. Transp.
Res. Part D Transp. Environ. 2015, 36, 167–177.

[52] de Koning, K.; Filatova, T.; Bin, O. Improved Methods for Predicting Property Prices in
Hazard Prone Dynamic Markets. Environ. Resour. Econ. 2018, 69, 247–263.

[53] Chica-Olmo, J.; Cano-Guervos, R.; Chica-Rivas, M. Estimation of Housing Price Variations
Using Spatio-Temporal Data. Sustainability 2019, 11, 1551.

[54] Chica-Olmo, J.; Cano-Guervos, R.; Tamaris-Turizo, I. Determination of buffer zone for
negative externalities: Effect on housing prices. Geogr. J. 2019, 185, 222–236.

[55] Paterson, R.W.; Boyle, K.J. Out of Sight, Out of Mind? Using GIS to Incorporate Visibility
in Hedonic Property Value Models. Land Econ. 2002, 78, 417–425.

[56] Tse, R.Y.C. Estimating Neighbourhood Effects in House Prices: Towards a New Hedonic
Model Approach. Urban Stud. 2002, 39, 1165–1180.

[57] Thériault, M.; Des Rosiers, F.; Villeneuve, P.; Kestens, Y. Modelling interactions of
location with specific value of housing attributes. Prop. Manag. 2003, 21, 25–62.

[58] Cohen, J.P.; Coughlin, C.C. Spatial hedonic models of airport noise, proximity, and housing
prices. J. Reg. Sci. 2008, 48, 859–878.

[59] Zietz, J.; Zietz, E.N.; Sirmans, G.S. Determinants of House Prices: A Quantile Regression
Approach. J. Real Estate Financ. Econ. 2008, 37, 317–333.

[60] Zhu, B.; Füss, R.; Rottke, N.B. The Predictive Power of Anisotropic Spatial Correlation
Modeling in Housing Prices. J. Real Estate Financ. Econ. 2011, 42, 542–565.

[61] Cho, S.H.; Yu, T.H.E.; Kim, S.G.; Roberts, R.K.; Lee, D. Applying Directed Acyclic
Graphs to Assist Specification of a Hedonic Model. Hous. Stud. 2012, 27, 984–1007.

[62] Liu, X. Spatial and Temporal Dependence in House Price Prediction. J. Real Estate Financ.
Econ. 2013, 47, 341–369.

[63] Moreira de Aguiar, M.; Simões, R.; Braz Golgher, A. Housing market analysis using a
hierarchical–spatial approach: The case of Belo Horizonte, Minas Gerais, Brazil. Reg. Stud. Reg.
Sci. 2014, 1, 116–137.

[64] Chasco, C.; Sánchez, B. Valuation of environmental pollution in the city of Madrid: An
application with hedonic models and spatial quantile regression. Rev. Déconomie Reg. Urbaine
2015, 1, 343–370.
