Housing Price Prediction
UNIVERSITY OF ENGINEERING & MANAGEMENT, JAIPUR
Submitted in the partial fulfillment of the degree of
BACHELOR OF COMPUTER APPLICATION
BY
Samayak Jain [Enrollment No: 12021004009012]
UNDER THE GUIDANCE OF
This is to certify that the project report "Housing Price Prediction" submitted by Samayak Jain
(Enrollment No: 12021004009012) in partial fulfillment of the requirements of the degree of Bachelor of
Computer Application from University of Engineering and Management, Jaipur was carried out in a
systematic and procedural manner to the best of our knowledge. It is a bona fide work of the candidate and
was carried out under our supervision and guidance during the academic session 2021-2024.
Endless thanks go to the Lord Almighty for all the blessings he has showered on me, which have
enabled me to write this last note in my research work. During the period of my research, as
in the rest of my life, I have been blessed by the Almighty with some extraordinary people who have
spun a web of support around me. Words can never be enough to express how grateful I am
to those incredible people in my life who made this thesis possible. I would like to make an attempt to
thank them for making my time during my research in the Institute a period I will treasure. I am
deeply indebted to my research supervisor, Professor Bablu Kumar Majhi, for giving me such an interesting
thesis topic. Each meeting with him added invaluable aspects to the implementation and
broadened my perspective. He has guided me with his invaluable suggestions, lightened up the
way in my darkest times, and encouraged me a lot in my academic work.
Samayak Jain
                                     ABSTRACT
Predicting house prices has long been a complex challenge tackled by numerous researchers.
Accurate predictions are essential for informing stakeholders in real estate, shaping
housing policies, and refining real estate appraisals. This project, "Comprehensive Guide to
House Price Prediction," provides an in-depth overview of various strategies used to forecast
house prices. This systematic literature review examines the data types and modeling approaches
employed in research from 1992 to 2021. We meticulously analyzed 93 articles that each
present unique techniques for predicting house prices. These works were evaluated and scored
based on the novelty of their models and data. Our cluster analysis maps the landscape of
property valuation, identifying key trends and shifts in the field. While traditional methods and
conventional data sources still dominate, the field of house price prediction is gradually
embracing more sophisticated techniques and innovative data inputs. Our review highlights
opportunities for integrating advanced data types, such as unstructured and complex spatial data,
and incorporating deep learning and customized methods. These advancements hold the potential
to guide future research and significantly improve the accuracy of house price predictions.
Table of Contents
1. List of
2. Introduction
     Related Work
3. Chapter
     3.1 Review
4. Chapter
5. Chapter
     5.1 Results & Discussion
6. Chapter
     6.1 Conclusion
7. Chapter
     7.1 Future Work
8. Chapter
     8.1 Bibliography
                                           CHAPTER 1
INTRODUCTION
Housing price prediction, or valuing residential properties, is a complex challenge because real estate
valuations are influenced by more than just the physical attributes of the building. Factors like location, the
neighborhood, and public perception also play significant roles. Additionally, market prices are driven by
buyers' willingness to pay, adding another layer of complexity to establishing an objective value for residential
properties. Traditionally, specialists like notaries, real estate brokers, and property investors have relied on
years of experience to make these valuations, making the automation of this process a daunting task.
However, Automated Valuation Models (AVMs) can enhance the accuracy of valuations, benefiting buyers,
sellers, notaries, banks, and policymakers. The main challenge lies in the diversity of techniques and kinds of
data utilized in the forecast of home prices, which makes achieving consistent accuracy difficult. Most
researchers agree on using a hedonic approach, which considers variables describing the physical characteristics
of the house. Additionally, incorporating the effects of location is crucial due to spatial dependence (how
nearby house prices influence each other) and spatial heterogeneity (how these influences vary across different
areas).
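As a rough illustration of the two ideas above, a hedonic specification with a spatial-lag term can be written as follows. The log-linear form and all symbols are illustrative assumptions for this sketch and are not taken from any specific study in the review.

```latex
\ln P_i = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik} + \rho \sum_{j \neq i} w_{ij} \ln P_j + \varepsilon_i
```

Here $P_i$ is the price of house $i$, $x_{ik}$ are its physical characteristics, $w_{ij}$ is a spatial weight that is larger for nearby houses, and $\rho$ captures spatial dependence. Spatial heterogeneity would correspond to letting the coefficients $\beta_k$ vary with the location of house $i$.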
The first law of geography states, "Everything is related to everything else, but near things are more related than
distant things," highlighting the importance of spatial factors in house price prediction. However, accounting for
these spatial effects is still a contentious issue. Some researchers use proxies like submarkets or distances to
central business districts, while others employ models that take the location into account, like spatial
econometrics, kriging, or spatially varying coefficient models. Another approach involves directly
incorporating location data (longitude and latitude) into machine learning algorithms to detect spatial
patterns.
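The proxy and raw-coordinate strategies described above can be sketched in a few lines. This is a minimal sketch, assuming a single central business district (CBD) with known coordinates; the function names and the feature dictionary layout are hypothetical and chosen only for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def location_features(lat, lon, cbd_lat, cbd_lon):
    """Build location inputs: raw coordinates plus a distance-to-CBD proxy (hypothetical layout)."""
    return {
        "latitude": lat,
        "longitude": lon,
        "dist_to_cbd_km": haversine_km(lat, lon, cbd_lat, cbd_lon),
    }
```

The raw coordinates let a flexible learner find spatial patterns on its own, while the distance feature encodes one specific assumption about what matters in the location.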
Temporal dependence between house prices is another challenge, which most researchers address through
modeling techniques. While supervised learning is the predominant approach, some researchers have explored
semisupervised learning for house price prediction.
To address these challenges, our systematic literature review investigates the methods and data types used in
house price prediction, focusing on geospatial components. We identified trends through a cluster analysis of 93
articles based on their proposed methods and input data. The findings reveal an overrepresentation of
conventional models and traditional data types. Nonetheless, recent research has started to explore more advanced
methods and innovative data sources, such as advanced machine learning and deep learning, often combined with
tabular, image, and textual data.
Our investigation concludes with this trend analysis, which identifies important gaps and research opportunities.
The project is structured as follows: the relevant property valuation research is reviewed in Section 2. Section 3
provides an overview of the approach, which utilizes the PSALSAR framework. Section 4 presents the results.
Finally, Sections 5 and 6 offer an in-depth discussion and conclusions, respectively.
The House Price Index (HPI) is a widely utilized metric for gauging changes in residential housing prices
across various countries. Examples include the US Federal Housing Finance Agency HPI, the S&P/Case-Shiller
price index, the UK National Statistics HPI, the UK Land Registry's HPI, the UK Halifax HPI, the UK
Rightmove HPI, and Singapore's URA HPI. The FHFA HPI operates as a weighted, repeat-sales index, which
calculates average price changes based on repeated sales or refinancing of the same properties. This index relies
on analyzing repeat mortgage transactions on single-family properties, with data provided by Fannie Mae or
Freddie Mac. Combined with various analytical tools, it enables economists to assess shifts in
mortgage default rates, prepayments, and housing affordability within specific regions.
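The repeat-sales idea can be illustrated with a deliberately simplified calculation. The sketch below is unweighted and is not the FHFA methodology (which uses a weighted regression); the function names and the tuple layout of a sale pair are assumptions made for this example.

```python
import math


def average_repeat_sale_growth(pairs):
    """Mean per-period log price change over repeat-sale pairs.

    Each pair is (first_price, first_period, second_price, second_period).
    Pairs with non-positive prices or non-increasing periods are skipped.
    """
    rates = []
    for first_price, first_period, second_price, second_period in pairs:
        if first_price <= 0 or second_price <= 0 or second_period <= first_period:
            continue
        rates.append(math.log(second_price / first_price) / (second_period - first_period))
    if not rates:
        raise ValueError("no valid repeat-sale pairs")
    return sum(rates) / len(rates)


def index_level(base_level, growth_per_period, periods):
    """Index value after a number of periods at a constant log growth rate."""
    return base_level * math.exp(growth_per_period * periods)
```

Because each pair compares the same property with itself, differences in size or quality between houses largely cancel out, which is the main appeal of repeat-sales indices.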
      Despite its utility, the HPI is a generalized measure and may not accurately predict the price of individual
      houses. Factors such as location, property age, and the number of floors are crucial in predicting specific house
      prices. In recent years, machine learning has become an essential tool for more precise house price predictions.
      This is because machine learning can utilize various property attributes and not just historical sales data.
      Numerous studies have demonstrated the effectiveness of machine learning in this domain. However, many
      have focused solely on comparing individual model performances without exploring the potential of
      combining different models.
      An exception is the work by S. Lu et al., who used a hybrid regression technique for house price forecasting
      but required extensive parameter tuning. Recognizing the significance of model combination, this paper
      explores the Stacked Generalization approach, a machine learning ensemble method aimed at optimizing
      prediction accuracy. Using the "Housing Price in Beijing" dataset from Kaggle, we validated the
      performance of multiple models. Our findings showed that the Stacked Generalization method yielded
      the lowest Root Mean Squared Logarithmic Error (RMSLE) of 0.16350 on the test set.
      The structure of this paper is as follows: Section 2 details the methodology, Section 3 presents a comparison of
      the results, and Section 4 discusses the findings, draws conclusions, and suggests directions for future research.
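For readers unfamiliar with the RMSLE metric mentioned above, the following is a minimal sketch of how it is commonly computed. The function name is illustrative, and the sketch assumes non-negative prices and predictions.

```python
import math


def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error for non-negative values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # log1p(x) = log(1 + x), so a value of zero is handled without error
    squared_log_errors = (
        (math.log1p(pred) - math.log1p(true)) ** 2
        for true, pred in zip(y_true, y_pred)
    )
    return math.sqrt(sum(squared_log_errors) / len(y_true))
```

Because the error is measured on a logarithmic scale, RMSLE penalizes relative rather than absolute deviations, which suits house prices that span a wide range.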
      In today's competitive real estate market, organizations strive to gain an edge over their competitors.
      Simplifying the house pricing process for everyday users while delivering accurate results is essential. This
      paper introduces a system that predicts house prices using a regression machine learning algorithm. If you're
      looking to sell a house, knowing the right price to list it at is crucial, and a computer algorithm can provide
      a precise estimate. This regression model is designed not only to predict the price of houses ready for sale but
      also those under construction.
      Regression is a machine learning tool that helps make predictions by learning the relationships between a target
      parameter and various independent parameters from existing statistical data. In this context, a house price
      depends on factors such as the number of rooms, living area, and location. By applying machine learning to
      these parameters, we can estimate house values in a specific region.
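To make the idea of regression concrete, the sketch below fits a straight line relating one independent parameter (living area) to the target (price) by ordinary least squares. The numbers are made up for illustration and are not drawn from the project's dataset; the function name is likewise hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative (made-up) data: living area in square metres, price in thousands
living_area_m2 = [50.0, 70.0, 90.0, 120.0]
price_thousands = [150.0, 200.0, 250.0, 325.0]

slope, intercept = fit_simple_linear_regression(living_area_m2, price_thousands)
predicted_price = slope * 100.0 + intercept  # estimate for a 100 m2 house
```

Real housing models use many features at once, but the principle is the same: the fitted coefficients summarize how the target changes with each input.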
      The entire implementation is executed with Python as the programming language. For constructing the
      predictive model, we utilize a Decision Tree Regressor from the “Scikit-learn” machine learning library. Grid
      Search CV is employed to determine the optimal max-depth value for constructing the decision tree. Once the
      model is trained, it is integrated with a user interface using Flask, a Python web framework.
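The tuning step described above might look roughly like the following. This is a minimal sketch, not the project's actual code: it uses synthetic data from `make_regression` as a stand-in for the housing table, and the depth range, fold count, and scoring choice are assumptions. The Flask integration is not shown.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for a housing table (e.g. rooms, living area, location score)
X, y = make_regression(n_samples=300, n_features=3, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Candidate tree depths to evaluate with 5-fold cross-validation (assumed range)
param_grid = {"max_depth": list(range(1, 11))}
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X_train, y_train)

best_depth = search.best_params_["max_depth"]
model = search.best_estimator_  # refitted on the full training split
test_predictions = model.predict(X_test)
```

Searching over `max_depth` guards against the two failure modes of a decision tree: a tree that is too shallow underfits, while one that is too deep memorizes the training data.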
      This approach ensures that the system is user-friendly and capable of providing reliable house price predictions,
      thus aiding both sellers and buyers in making informed decisions. In this report, we present our system, "House
      Price Prediction Using Machine Learning." Alongside fundamental needs like food and water, having a place
      to call home is one of a person's most basic desires. In the real estate industry, accurately predicting house
      prices is crucial as it helps buyers and sellers make informed decisions. With advancements in machine
      learning, numerous algorithms have been developed to predict property prices accurately.
Our research leverages a dataset of real estate properties and employs XGBoost, an advanced gradient boosting
technique, to forecast house values. XGBoost is a powerful algorithm known for effectively managing
structured datasets and has demonstrated excellent performance in forecasting complex datasets. It is
frequently used in machine learning competitions due to its robustness.
The goal of house price prediction is to develop a model that can accurately estimate the price of a new house
based on its attributes. This is achieved by using historical data on house features (such as square footage, the
number of bedrooms and bathrooms, location, etc.) and their corresponding prices. In this project, we applied
five algorithms: linear regression, support vector machine, Lasso regression, Random Forest, and XGBoost, to
predict house prices using a dataset of real estate properties.
XGBoost, in particular, is highly effective for this purpose as it can handle a large number of features and
capture complex relationships between the features and the target variable (price). By assessing the performance
of these algorithms, especially XGBoost, we aim to provide a reliable method for house price prediction that
can benefit both buyers and sellers in the real estate market.
Machine learning (ML) algorithms are increasingly being used in automated valuation models
and in mass real estate appraisals. These mass appraisals involve standardized procedures that collect data
from real estate listings to estimate the value of large groups of properties, ensuring that the appraisals are
conducted impartially and consistently. These technologies have the benefit of enabling rapid
execution of numerous valuations at a low cost per valuation. Mass appraisals using automated
systems are commonly employed for recurring annual taxes, as well as sporadic real estate taxes such as property
transfer taxes, capital gains taxes, and inheritance and gift taxes. They are also used in banking (for loans and
mortgage risk assessment) and by real estate marketers.
Recently, machine learning methods have been applied to estimate house prices, making these methods
relatively new in this field. For instance, Antipov et al. used the random forest (RF) technique for appraising
2,695 properties in St. Petersburg, Russia. They also put into practice methods like k-nearest neighbors (K-NN),
chi-squared automatic interaction detector (CHAID), and classification and regression tree (CART). Their
results demonstrated that these techniques are highly effective, even when dealing with significant
heteroscedasticity, categorical variables, outliers, and incorrect data.
Using big datasets, the study examined how well various ensemble learning algorithms performed in
predicting home prices: bagging algorithms (Random Forest (RF) and Extra-Trees Regressor (ETR)) and boosting
methods (Gradient Boosting Regressor (GBR), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting
                                              Chapter 2
Related Work
Although few reviews on valuations for residential properties have been released, the field has seen significant
advancements in both the methods and the types of data used. A previous review analyzed existing literature
and identified three main trends. The first trend highlights the application of spatial techniques, embodied
in the catchphrase "location, location, location," which take into account spatial heterogeneity as well as spatial
dependence. Spatial dependence means that the prices of nearby properties are related, while spatial heterogeneity
indicates that the relationship between property value (the dependent variable) and its influencing factors
(the independent variables) varies by location. The review discusses advanced spatial methods such as spatial
econometric models and geographically weighted regression to tackle these issues.
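Spatial dependence as defined above can be quantified with Moran's I, a standard spatial autocorrelation statistic. The sketch below is a minimal pure-Python version; the function name and the representation of the spatial weights as a nested list are assumptions made for illustration.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weight matrix.

    weights[i][j] expresses how strongly location j neighbours location i
    (for example 1 for adjacent houses, 0 otherwise, with a zero diagonal).
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    variance_sum = sum(d * d for d in deviations)
    if total_weight == 0 or variance_sum == 0:
        raise ValueError("weights and values must both vary")
    cross_products = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / total_weight) * (cross_products / variance_sum)
```

Values well above zero indicate that neighbouring properties tend to have similar prices, while negative values indicate that neighbours tend to differ.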
An earlier review categorizes these advanced spatial methods as third-generation techniques, which go beyond
the techniques of the first and second generations. First-generation methods involve
market segmentation and fitting models built on submarkets, whereas second-generation
approaches use variables such as coordinates, accessibility metrics, and neighborhood demarcation. Fuzzy logic,
autoregressive integrated moving average (ARIMA) models, and artificial neural networks (ANNs) have been added
to the list of advanced techniques for appraisal. Because of the increasing popularity of advanced learning
techniques, recent studies have focused on techniques based on geographic information systems (GIS) and artificial
intelligence (AI), as well as reviews specifically on ANNs in real estate appraisal.
Other changes noted in the literature, in addition to improvements in modeling approaches, include the impact of
sustainability policies like premiums for green buildings, the breakdown of property values into land and
structural components, and greater study of land values. Real estate laws and land values, however, are not
covered by this review of the literature.
To sum up, earlier research mainly concentrated on techniques for forecasting home values; however, this study
presents a more thorough two-dimensional examination, broadening the model viewpoint and giving special
attention to the data dimension. In line with past studies, this study emphasizes the significance of incorporating
geographical data because of the critical role of location in house price prediction.
The world of real estate is being revolutionized by the advent of advanced technologies, particularly machine
learning. Predicting housing prices with high accuracy has always been a challenge due to the myriad factors
influencing market values. However, machine learning algorithms are now unlocking insights that were
previously beyond reach, transforming how buyers, sellers, and real estate professionals navigate the market.
Machine learning leverages enormous volumes of data to find trends and forecast future events. In the context of
real estate, this entails examining a wide range of factors, including square footage, the number of bedrooms
and bathrooms, location, proximity to amenities, historical price trends, and even less obvious factors like air
quality and crime rates. By processing this data, machine learning models can provide a nuanced understanding
of property values, offering a level of precision that surpasses traditional methods.
One of the most powerful machine learning techniques used in real estate price prediction is XGBoost, an
advanced gradient boosting algorithm. XGBoost excels at handling structured data and capturing complex
interactions between variables, making it particularly well-suited for the multifaceted nature of real estate
markets. Studies have shown that XGBoost and similar algorithms outperform traditional linear
regression models, offering more accurate and reliable predictions.
The advantages of applying artificial intelligence to housing price prediction are manifold. For sellers, these
models can suggest optimal listing prices that maximize returns while minimizing the time a property spends
on the market. Buyers can use these forecasts to determine if a property is reasonably priced, aiding in
negotiations and decision-making. Real estate professionals, such as agents and appraisers, gain a powerful tool
to enhance their expertise, providing clients with data-driven insights that instill confidence and trust.
Moreover, machine learning models can adapt in real time to shifting market conditions.
Conventional assessment techniques often rely on historical data, which may not accurately represent
current market conditions or trends. In contrast, machine learning algorithms can continuously learn
and update from new data, ensuring that predictions remain relevant and accurate even in dynamic environments.
Another significant advantage is the ability to handle large datasets efficiently. In a typical real estate market,
thousands of transactions occur regularly, generating an overwhelming amount of data. Machine learning
algorithms are designed to process and analyze this data at scale, uncovering trends and patterns that would be
impossible for humans to discern manually. This capability not only enhances accuracy but also speeds up the
prediction process, making it feasible to perform mass appraisals swiftly and at a lower cost.
The integration of machine learning into real estate also paves the way for innovative applications. For
example, predictive models can be used in urban planning to forecast the impact of new developments on
surrounding property values. Financial institutions can assess mortgage risks more accurately by
incorporating these predictions into their lending criteria. Even policymakers can benefit from these insights
when designing housing policies and regulations.
In conclusion, the application of machine learning to housing price prediction is unlocking unprecedented
insights within the property industry. By harnessing the power of advanced algorithms and vast datasets, these
models provide accurate, timely, and actionable predictions. As the technology continues to evolve, it promises
to further revolutionize how we understand, interact with, and make decisions in the real estate industry,
ultimately helping purchasers and vendors.
At the heart of price prediction is data. This could be information about a house (size, location, age), a car
(make, model, mileage), or a stock (company performance, market trends). The more data you have, the
better your predictions can be.
Once you have the data, you need a model to analyze it. This model can be a simple equation or a complex
algorithm. For instance, to predict a house price, you might consider factors like square footage, number of
bedrooms, location, and recent sales data. A model would then determine how these factors influence the price.
Machine Learning Magic
In recent years, machine learning has revolutionized price prediction. Instead of relying solely on simple
equations, computers can recognize intricate patterns by learning from enormous volumes of data. For example,
a machine learning model could analyze thousands of car sales to determine which features (like fuel efficiency
or safety ratings) have the biggest impact on price.
Predicting prices isn't always perfect. Factors like economic conditions, unexpected events, and human behavior
can all influence prices in ways that are difficult to predict. To overcome these challenges, experts often
combine multiple models and continuously update their data to improve accuracy.
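One simple way to combine models, sketched below, is to blend the predictions of two models with a single weight chosen to minimize squared error on held-out data. This is a basic linear blend offered only as an illustration of model combination; the function names are hypothetical and the approach is far simpler than a full stacking pipeline.

```python
def best_blend_weight(pred_a, pred_b, actual):
    """Weight w in [0, 1] minimizing squared error of w * a + (1 - w) * b."""
    numerator = sum((a - b) * (y - b) for a, b, y in zip(pred_a, pred_b, actual))
    denominator = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    if denominator == 0:
        return 0.5  # the two models agree everywhere, so any weight is equivalent
    return min(1.0, max(0.0, numerator / denominator))


def blend(pred_a, pred_b, weight):
    """Weighted average of two prediction lists."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(pred_a, pred_b)]
```

The weight should be estimated on data that neither model was trained on; otherwise the blend simply favours whichever model overfits the most.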
Real-World Applications
Price prediction is used in countless ways. E-commerce platforms use it to recommend products,
financial institutions use it to assess risks, and businesses use it to optimize pricing strategies. Even your
favorite streaming service uses it to suggest shows you might like.
In essence, price prediction is about finding patterns in data and using those patterns to make informed
guesses about the future. While it's not always exact, it's a powerful tool that helps us make better decisions
every day.
      Proximity to Amenities: Homes near schools, parks, shopping centers, and public
       transportation command higher prices. People want convenience and accessibility.
      Neighborhood Quality: Safe, clean, and well-maintained neighborhoods attract buyers and
       maintain property values.
      Job Market: Areas with strong job markets typically see more demand for houses, which raises prices.
      Schools: Strong school districts significantly boost property values as families seek quality
       education for their children.
      Transportation: Easy access to highways, public transportation, and airports increases a
       property's desirability.
      Views and Natural Amenities: Properties with water views, mountain vistas, or proximity to
       parks often command premium prices.
While these factors are well-known, the nuances of location can be complex. For instance, a home situated on
a quiet street within a bustling neighborhood might be more desirable than one on a main road. Additionally,
micro-locations within a neighborhood can also impact value. A home near a popular park or a highly-rated
restaurant could see a price premium.
The Evolving Landscape
The importance of location is constantly evolving. With the rise of remote work, the appeal of suburban and
rural areas has increased. Factors like access to high-speed internet and proximity to outdoor recreation
have gained prominence.
Ultimately, location is a dynamic factor influenced by a myriad of variables. Understanding these nuances
is crucial for both homebuyers and sellers to make informed decisions.
                                              Chapter 3
Review
Predicting housing prices, also known as residential property valuation, is a complex and multifaceted task
influenced by numerous factors beyond the physical attributes of the building itself. The property's location, the
characteristics of the neighborhood, and public perception all significantly impact a property's value.
Additionally, market prices are influenced by buyers' willingness to pay, further complicating the establishment
of objective valuations. Traditionally, professionals with years of experience and deep market knowledge, such as
real estate investors, notaries, and agents, have been trusted to determine property values, making the
automation of this process a complex endeavor.
Automated Valuation Models (AVMs) present a promising solution, potentially benefiting a wide range of
stakeholders, including banks, buyers, sellers, notaries, and legislators, by improving the consistency and
accuracy of property values. Nonetheless, there are many difficulties and discrepancies in housing price
forecasting because different approaches and data sets are used. The majority of researchers agree on using
a hedonic approach, which takes into account elements that characterize the house's physical features.
Moreover, incorporating location effects is crucial due to spatial dependence (the phenomenon where nearby
property prices influence each other) and spatial heterogeneity (the variability in relationships between property
values and influencing factors across different locations).
The first law of geography, which states "Everything is related to everything else, but near things are more
related than distant things," underscores the critical importance of spatial factors in house price prediction.
Despite this, accounting for these geographical impacts is still a difficult and controversial problem. While
some studies use location-aware models like kriging, spatial econometrics, or spatially varying coefficient
models, others use proxies like submarkets or distances to central business districts. An alternative strategy
entails feeding location data (longitude and latitude) directly into machine learning algorithms to find spatial
patterns. Temporal dependence between house prices adds another layer of complexity, which most
researchers address through sophisticated modeling techniques. Supervised learning is the predominant
approach, although some researchers have explored semi-supervised learning for house price prediction.
To address these multifaceted challenges, our systematic literature review delves into the techniques and
kinds of data used in home price forecasting, with an emphasis on geographic components. Through a cluster
analysis of 93 articles, we identified prevalent trends based on proposed methods and input data. Our findings
reveal an overrepresentation of conventional models and traditional data types. Nevertheless, recent research
has begun to explore increasingly sophisticated methods and cutting-edge data sources, such as advanced
machine learning and deep learning approaches paired with text, image, and graph data.
An essential component of our work, this trend analysis identifies important gaps and areas for future research in
the field of housing price prediction. The format of the paper is as follows: a summary of relevant property
valuation research is given in Section 2. The methodology based on the PSALSAR framework is described in
Section 3. The analytical results are shown in Section 4, and a thorough discussion and conclusions are
provided in Sections 5 and 6.
While few reviews specifically focus on residential property valuation, the field has seen significant
advancements in both methods and data usage. An earlier review identified three main trends, the first being the
use of spatial methods that account for both spatial heterogeneity and spatial dependence. Spatial dependence
refers to the phenomenon where prices of nearby properties are related, while spatial heterogeneity indicates that
the relationships between property values and their influencing factors vary across different locations. That
review discusses advanced spatial methods, such as spatial econometric models and geographically weighted
regression, to address these issues.
These sophisticated geographical methods, which go beyond first- and second-generation approaches involving
neighborhood delineation and market segmentation, are categorized as third-generation techniques by another
review. Fuzzy logic, autoregressive integrated moving average (ARIMA) models, and artificial neural networks
(ANNs) are added to the list of sophisticated valuation techniques in a third review. The increasing application
of advanced learning strategies has led to recent studies emphasizing techniques based on geographic
information systems (GIS) and artificial intelligence (AI), in addition to reviews that concentrate on ANNs in
real estate appraisal.
In addition to these advancements, recent research has revealed several trends in modeling methodologies, such as the
growing emphasis on land values, the breakdown of property values into land and structural components, and the
influence of sustainability policies, including green building premiums. Nonetheless, real estate laws and land
values are regarded as falling outside the purview of this literature review.
To sum up, earlier studies have mostly concentrated on techniques for forecasting home values; however, this
work presents a more thorough two-dimensional examination, broadening the model viewpoint and giving
special attention to the data dimension. In line with other evaluations, this research emphasizes the significance
of taking into account geographic data because geography is a key factor in predicting home prices. By filling
in the gaps while examining the new developments in the industry, this study seeks to offer insightful
information and direct future studies in housing price prediction.
Predicting housing prices is a complex endeavor that has captivated researchers and practitioners for decades.
The interplay of numerous factors, both tangible and intangible, makes it a challenging yet fascinating field.
     Hedonic pricing models: These models estimate the implicit prices of different housing attributes
       by analyzing how property prices vary with changes in these characteristics.
     Machine learning: Algorithms such as decision trees, random forests, support vector machines, and
       neural networks have shown promise in capturing complex relationships between features and prices.
     Time series analysis: For analyzing trends and seasonality in housing prices, time series models like
       ARIMA and SARIMA are employed.
     Spatial analysis: Incorporating geographic information systems (GIS) to analyze spatial patterns and
       relationships between properties and their surroundings can enhance prediction accuracy.
Despite advancements in modeling techniques, predicting housing prices remains a challenging task due to:
      Data availability and quality: Access to comprehensive and accurate data is crucial but often limited.
      Market volatility: Housing markets are subject to rapid changes influenced by various economic
       and social factors.
      Unforeseen events: Black swan events like pandemics or economic crises can disrupt
       established patterns.
Promising directions for future research include:
      Advanced modeling techniques: Exploring deep learning, reinforcement learning, and hybrid
       models to capture complex patterns.
      Alternative data sources: Incorporating data from social media, satellite imagery, and
       other unconventional sources to enrich models.
      Explainable AI: Developing models that can provide transparent explanations for their predictions
       to build trust.
      Dynamic pricing models: Considering real-time market conditions and incorporating feedback loops
       to improve prediction accuracy.
By addressing these challenges and leveraging the power of data and advanced analytics, researchers and
practitioners can develop more accurate and robust housing price prediction models.
Researchers are constantly exploring how machine learning can improve house price prediction accuracy.
This summary explores several studies that compared the effectiveness of different machine learning
algorithms for this task.
Key Findings:
      Random Forest (RF) emerges as a frequent winner, demonstrating strong performance in studies by Park et al.
       [6], Banerjee et al. [26], Ceh et al. [8], Fan et al. [27], Ahmed Neloy et al. [7], and Hong et al. [12]. It excels in
       capturing complex relationships within data.
      Gradient Boosting techniques like XGBM (Extreme Gradient Boosting) and LGBM (Light Gradient Boosting
       Machine) also show impressive results in experiments conducted by Kok et al. [14], Fan et al. [27], and
       Voutas Chatzidis [16]. They are adept at handling large datasets and reducing bias.
      Support Vector Machines (SVM) are recognized for their consistency and reliability, as shown in Banerjee et
       al. [26]. They excel in finding clear boundaries between data points.
      Ensemble Techniques that combine multiple algorithms, like bagging and random forest, also prove effective
       in studies by Alfaro-Navarro et al. [19]. Ensemble approaches leverage the strengths of different models for
       potentially better results.
                                              Chapter 4
Determining the goals and scope of the research is the first step in the PSALSAR framework. This study's
purview includes model- and data-driven residential property valuation with a spatial component. This implies
that the spatial effects that affect home prices must be taken into account by either the model type or the input
data. Only private apartments and single-family dwellings qualify as "property." Only studies that offer a technique
for estimating the individual properties' values are considered.
Analyzing the techniques and input data types utilized for property appraisal is the study's dual focus. The
following research questions (RQs) result from this:
[Figure 1 flow diagram, recoverable labels: Phase 1: Identification: results of search query on Scopus, N = 412;
N = 600; combining search results and eliminating duplicates, N1 = 799. Phase 2: English language filter, N = 753;
document type filter, N = 589; excluded: 210; N2 = 589. Excluded: 22. Phase 4: Literature study, N4 = 93.]
Figure 1. Flow diagram depicting the Search and Appraisal steps consisting of four phases to identify all relevant scientific
Table 1. Search 2
    1. Articles analyzing correlations between house prices and other variables without focusing on
       prediction methods were excluded.
    2. Articles using cadastral values rather than market prices were not considered.
    3. Articles predicting land values without addressing the value of homes were excluded.
To apply these criteria, we first reviewed the titles of the papers. Titles were evaluated based on
the presence or absence of specific keywords to infer the content. For instance, to identify
articles meeting the second inclusion criterion, titles were checked for terms like ‘model’,
‘approach’, ‘technique’, or ‘analytics’. Titles containing terms like ‘correlation’, ‘estimation’, or
‘analysis’ without referencing a prediction method were excluded based on the first exclusion
criterion. Articles mentioning ‘cadastral value’ and ‘land value’ were excluded according to the
second and third exclusion criteria, respectively. This initial screening retained 142 papers.
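The title screening described above can be sketched as a simple keyword filter. This is a minimal illustration rather than the exact procedure that was followed: the keyword lists are taken from the examples in the text, while the function name `screen_title` and the sample titles are hypothetical.

```python
# Keyword lists taken from the examples given in the text.
METHOD_TERMS = ("model", "approach", "technique", "analytics")
CORRELATION_TERMS = ("correlation", "estimation", "analysis")
EXCLUDED_VALUE_TERMS = ("cadastral value", "land value")


def screen_title(title: str) -> bool:
    """Return True if the title passes the keyword screening."""
    text = title.lower()
    # Second and third exclusion criteria: cadastral or land values.
    if any(term in text for term in EXCLUDED_VALUE_TERMS):
        return False
    has_method = any(term in text for term in METHOD_TERMS)
    has_correlation = any(term in text for term in CORRELATION_TERMS)
    # First exclusion criterion: correlation-style study with no prediction method.
    if has_correlation and not has_method:
        return False
    # Inclusion criterion: the title must reference a prediction method.
    return has_method


# Hypothetical titles, for illustration only.
titles = [
    "A random forest model for apartment price prediction",
    "Correlation between house prices and school quality",
    "A kriging approach to predicting land value",
]
kept = [t for t in titles if screen_title(t)]
print(kept)
```

A keyword filter of this kind only infers content from wording, which is why the abstracts were reviewed afterwards to resolve uncertain cases.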
Next, we reviewed the abstracts of these papers using the same inclusion and exclusion criteria. This
step helped resolve any uncertainties from the title review. During this phase, three review articles
were found and eliminated from the list, despite being covered in Section 2. This process led to
the exclusion of 474 papers, leaving a final list of 115 relevant publications.
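The counts reported for the screening phases can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text and in Figure 1. Pairing the exclusion counts of 210 and 22 with the filtering and full-text phases is an assumption, inferred from the fact that the numbers add up.

```python
# Counts reported in the text and in Figure 1.
after_duplicates = 799       # N1: combined search results without duplicates
after_filters = 589          # N2: after language and document type filters
excluded_by_screening = 474  # removed during title and abstract screening
final_count = 93             # N4: papers kept after the full-text review

after_screening = after_filters - excluded_by_screening
excluded_by_filters = after_duplicates - after_filters
excluded_full_text = after_screening - final_count

print(after_screening)      # 115
print(excluded_by_filters)  # 210
print(excluded_full_text)   # 22
```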
In the last stage, we used our university's subscription to get the complete papers and categorized
them based on two main aspects: the type of model described and the type of input data used.
This thorough review identified additional irrelevant articles. Specifically:
The chosen papers were grouped and examined in the Synthesis process according to the year of
publication, the journal outlet, the prediction techniques, and the input data kinds.
        4.5.1. Publication Year and Channel
4.5.2 Methods
The published research can be categorized into fourteen distinct modeling approaches: fuzzy
logic, nearest neighbor techniques, time series analysis, kriging, spatial econometric models,
spatially varying coefficient models, decision/regression trees, support vector machines,
artificial neural networks or multilayer perceptrons, random forests, gradient boosted trees,
additional ensemble methods, and deep learning (for further information, see Table A1 in
Appendix A). It's
common for multiple methods to be compared within a single study, often as a benchmark for
the proposed approach. Therefore, for this analysis, we focused solely on the primary method
outlined in each paper and summarized these in Table A1.
The foundation of any successful machine learning project is high-quality data. For our housing
price prediction model, we started by gathering information on Mumbai's real estate properties
from various online platforms. This data included essential details like location, property size
(carpet and built-up area), age, and zip code. It's crucial to ensure the data is structured and
quantifiable for effective analysis.
Before diving into modeling, we meticulously cleaned and prepared the dataset. This involved
handling missing values, which could be due to incomplete data or data entry errors. We
addressed these issues by either removing entries with excessive missing information or
imputing missing values with averages or other statistical measures. Outliers, which can skew
our model's performance, were also identified and treated appropriately, either by removing
them or capping their values.
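The cleaning steps above can be illustrated with a short sketch. It is a minimal example under stated assumptions, not the project's actual code: missing values are imputed with the median (one of the "other statistical measures", chosen here because it is not distorted by the outlier), outliers are capped with the common 1.5 x IQR rule, and the function names and sample areas are hypothetical.

```python
from statistics import median, quantiles


def impute_with_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def cap_outliers(values, k=1.5):
    """Clip values to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


# Hypothetical built-up areas in square feet; None marks a missing entry.
areas = [650, 700, None, 720, 680, 710, 690, 5000]
filled = impute_with_median(areas)
capped = cap_outliers(filled)
print(filled)
print(capped)
```

Capping keeps the row in the dataset while limiting its influence; removing the row instead is the other option mentioned above.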
        Phase 2: Model Training
With a clean and prepared dataset, we divided it into two separate subsets: a training set and a
testing set. The model is trained using the training set to learn the relationship between property
features and their corresponding prices.
We employed a decision tree regression algorithm for this task. Decision trees excel at handling
both numerical and categorical data, making them suitable for our diverse dataset. The algorithm
builds a tree-like structure wherein features are represented by internal nodes, decision rules are
represented by branches, and leaf nodes predict the property price.
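To make the structure of a regression tree concrete, the following is a minimal from-scratch sketch using only the Python standard library. It is not the implementation used in this project: it handles numerical features only, chooses splits by minimizing the sum of squared errors, and all function names and the sample data are hypothetical.

```python
import random


def train_test_split(rows, test_ratio=0.25, seed=0):
    """Shuffle the rows and split them into training and testing subsets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def mean_price(rows):
    return sum(price for _, price in rows) / len(rows)


def sse(rows):
    """Sum of squared errors around the mean price of the rows."""
    m = mean_price(rows)
    return sum((price - m) ** 2 for _, price in rows)


def build_tree(rows, depth=0, max_depth=3, min_rows=2):
    """Recursively pick the (feature, threshold) split with the lowest SSE."""
    if depth >= max_depth or len(rows) < 2 * min_rows:
        return {"leaf": mean_price(rows)}
    best = None
    for f in range(len(rows[0][0])):
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [r for r in rows if r[0][f] <= threshold]
            right = [r for r in rows if r[0][f] > threshold]
            if len(left) < min_rows or len(right) < min_rows:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    if best is None:
        return {"leaf": mean_price(rows)}
    _, f, threshold, left, right = best
    return {
        "feature": f,
        "threshold": threshold,
        "left": build_tree(left, depth + 1, max_depth, min_rows),
        "right": build_tree(right, depth + 1, max_depth, min_rows),
    }


def predict(tree, x):
    """Follow the decision rules from the root down to a leaf."""
    while "leaf" not in tree:
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


# Hypothetical data: ((area_sqft, age_years), price)
data = [((500 + 50 * i, i % 10), 40.0 + 5.0 * i) for i in range(20)]
train, test = train_test_split(data)
tree = build_tree(train)
print(predict(tree, (900, 5)))
```

Each internal node stores a feature index and a threshold (the decision rule), and each leaf stores the mean price of the training rows that reach it.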
Once the model was trained, we evaluated its performance using the testing dataset. This helped
us assess how accurately the model could predict prices for unseen data. Several metrics, such as
mean squared error (MSE) and R-squared, were used to measure the model's effectiveness.
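The two metrics named above follow standard definitions and can be computed directly. The sketch below is illustrative only: the function names and the price values are hypothetical and are not results from this project.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted prices."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Share of the variance in actual prices explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical prices, for illustration only.
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]
print(mean_squared_error(actual, predicted))  # 100.0
print(r_squared(actual, predicted))           # approximately 0.968
```

A lower MSE and an R-squared closer to 1 both indicate predictions that track the actual prices more closely.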
To make our model accessible to users, we integrated it with a user interface using Flask, a
Python web framework. This allowed users to input property details and receive a predicted price
almost instantly.
Note: While this outlines a basic approach, real-world projects often involve more complex
modeling techniques, hyperparameter tuning, and rigorous evaluation to achieve optimal results.
Additionally, incorporating feature engineering to create new informative features can
significantly enhance model performance.
4.5.4 Explanation
        Data is the Foundation
The journey to predicting housing prices begins with a solid foundation: data. Gathering
relevant information is crucial. This includes factors like property size, location, number of
bedrooms, neighborhood amenities, and historical sales data. It's essential to collect data that
accurately reflects the market you're targeting.
Raw data often contains inconsistencies and imperfections. Data preprocessing is the procedure
for converting unstructured data into a format that is clear and ready for analysis. This involves
handling missing values (by removing or imputing them), identifying and addressing outliers,
and converting data into a consistent format. For example, categorical data like location might need to
be converted into numerical values.
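One common way to convert a categorical feature such as location into numbers is one-hot encoding. The sketch below is a minimal illustration of that technique, not necessarily the encoding used in this project. The function name `one_hot_encode` and the locality names are hypothetical.

```python
def one_hot_encode(values):
    """Map each category to a 0/1 indicator vector, with columns in sorted order."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


# Hypothetical locality names, for illustration only.
locations = ["Andheri", "Bandra", "Andheri", "Powai"]
categories, encoded = one_hot_encode(locations)
print(categories)  # ['Andheri', 'Bandra', 'Powai']
print(encoded)     # [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
```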
        Model Building: The Heart of the Process
Once the data is prepared, it's time to build the predictive model. Several machine learning
algorithms can be employed for this task. Popular choices include:
      Linear regression: Assumes a linear relationship between house prices and features.
      Decision trees: Create a tree-like model to make predictions based on decision rules.
      Random forests: An ensemble approach that boosts accuracy by combining several
       decision trees.
      Support vector machines: Find the optimal hyperplane to divide data points into distinct
       classes (in this case, price ranges).
      Neural networks: Complex models inspired by the human brain, capable of learning
       intricate patterns.
The choice of algorithm depends on factors like dataset size, complexity, and desired prediction
accuracy.
The foundation of any robust housing price prediction model is a comprehensive and high-
quality dataset. In this research, data on multifamily properties in Alicante, Spain, was
meticulously collected from a real estate portal between May 2019 and December 2021. Key
information included:
      Property details: Size, number of bedrooms, bathrooms, and other relevant features.
      Building characteristics: Amenities like elevators, parking, and swimming pools.
      Geographic location: Precise coordinates for each property.
      Listing information: Asking price and listing duration.
To ensure data accuracy, the dataset was regularly updated to account for changes in property
status (sold, withdrawn, price adjusted) and new listings. Outliers and inconsistencies in the data
were identified and addressed through careful cleaning and preprocessing.
Data Enrichment
To increase the model's capacity for prediction, more information from various sources was
incorporated:
      Demographic data: Population density, age distribution, and income levels at the census
       tract level were taken from the National Statistics Institute of Spain (INE).
      Geographic information: The National Geographic Institute (IGN), the Valencian
       Cartographic Institute, and the Regional Ministry of Education, Culture, and Sport
       provided information on green areas, schools, and transportation
       infrastructure.
      Proximity analysis: Distances to key amenities like schools, parks, and the coast were
       calculated using network analysis to accurately reflect travel times.
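The proximity analysis above relies on network distances rather than straight-line distances. A minimal sketch of the idea, assuming the road network is available as a weighted graph, is a shortest-path search with Dijkstra's algorithm. The graph, node names, and travel times below are hypothetical and do not come from the Alicante data.

```python
import heapq


def shortest_travel_time(graph, start, goal):
    """Dijkstra's algorithm over a weighted graph of travel times in minutes."""
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        time, node = heapq.heappop(queue)
        if node == goal:
            return time
        if time > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, cost in graph.get(node, {}).items():
            candidate = time + cost
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return float("inf")  # goal not reachable


# Hypothetical road network: node -> {neighbor: travel time in minutes}
roads = {
    "property": {"junction_a": 4.0, "junction_b": 2.0},
    "junction_a": {"school": 3.0},
    "junction_b": {"junction_a": 1.0, "school": 7.0},
    "school": {},
}
print(shortest_travel_time(roads, "property", "school"))  # 6.0
```

In this example the fastest route passes through both junctions, which a straight-line distance would not reveal.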
Data cleaning was a crucial step to ensure data quality and reliability. Outliers in property
features like area, number of bedrooms, and bathrooms were removed to prevent model bias.
Properties with missing essential information were also excluded. To avoid data leakage,
identical properties within the same building were merged.
Dataset Overview
The final dataset comprised approximately 39,943 unique multifamily properties in Alicante,
Spain. It included a rich set of features encompassing property characteristics, location,
neighborhood demographics, and accessibility to amenities. This comprehensive dataset
provided a solid foundation for building accurate housing price prediction models.
The American dream of homeownership might be facing some roadblocks in the coming years.
While efforts are underway to lower mortgage rates, a new report suggests affordability will
remain a significant concern for many potential buyers.
Key Challenges:
      Soaring Home Prices: A combination of rising home prices in various markets and
       increasing living expenses is pushing homeownership out of reach for many first-time
       buyers, who traditionally represent a major segment (30-40%) of the housing market.
      Limited Progress on Affordability: Realtor.com predicts a slight decrease in home
       purchase costs by 2024, but this still translates to a significant portion of income (around
       35%) – higher than historical averages.
A Glimpse into the Future:
      2024: A modest decrease in home purchase costs is anticipated, making homes slightly
       more affordable. However, affordability remains a major hurdle, especially for first-
       time buyers.
      2025: Goldman Sachs forecasts a 3.7% increase in home prices, driven by existing
       market momentum. While mortgage rates are expected to dip slightly (around 6.3%),
       affordability is still a central concern, particularly for first-time buyers.
Expert View:
"The largest demographic group in the US is 30-39 year olds, and this is expected to grow for
years to come. This age range often coincides with life milestones like starting families. While
some individuals will prioritize buying regardless of rental affordability, financing costs still play
a major role. With current high financing costs, renting remains the cheaper option. Based on our
forecasts, mortgage affordability is unlikely to see significant improvements in the near future."
                                  Figure 2. Number of articles published per year.
4.5.3 Input Data
Furthermore, we classified the studies based on the kinds of input data that were used, as shown
in Appendix A's Table A2. In contrast to the modeling methodologies, the majority of studies used
a combination of these data sets. The source data can be categorized as follows:
Model Novelty Score: We began by identifying the primary modeling technique for each study.
This was either the sole method employed or the one newly suggested by the authors. Then,
based on the presumption that more recent techniques typically outperform older ones in
prediction tasks, a novelty score was assigned. For example, studies show that MLP is superior
to MRA, RF beats MRA and GWR, and gradient boosted trees exceed kriging in terms of home value
forecasts. Since deep learning (DL) is currently the most efficient technique for managing text
and image-based unstructured data, it received the highest rating. The range of scores was 1
(most traditional) to 8 (DL).
Data Novelty Score: Each publication was evaluated based on the types of input data included.
Similar to the model score, a novelty score was assigned to each data type. Common data like
property features and temporal information received lower scores, while spatial data, especially
advanced types, were ranked higher. Graph-structured and unstructured data (text, images) were
considered the most novel. The overall data novelty score was calculated by summing the scores
of all included data types. While it is theoretically possible to reach a maximum score of 43, practical
limitations in data combination typically resulted in scores from one to twenty.
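The data novelty score is a plain sum, which can be sketched as follows. The per-type scores are read from the input data score table in this report. They sum to the stated maximum of 43, which supports that reading. The dictionary keys and the function name are hypothetical labels introduced for illustration.

```python
# Per-type scores as listed in the input data score table of this report.
DATA_TYPE_SCORES = {
    "structural": 1,
    "temporal": 1,
    "socioeconomic": 1,
    "environmental": 1,
    "poi": 1,
    "standard_spatial": 4,
    "advanced_spatial": 6,
    "graphs": 8,
    "images": 10,
    "text": 10,
}


def data_novelty_score(data_types):
    """Sum the scores of the distinct data types used in a study."""
    return sum(DATA_TYPE_SCORES[t] for t in set(data_types))


# Hypothetical study using structural, temporal, and standard spatial data.
print(data_novelty_score(["structural", "temporal", "standard_spatial"]))  # 6
print(data_novelty_score(DATA_TYPE_SCORES))  # 43, the theoretical maximum
```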
                              Table 3. Model novelty score.
                                     Model                                      Score
                                     Kriging                                      1
                                     Spatial econometric models                   1
                                     Time series                                  2
                                     Nearest neighbor                             3
                                     Fuzzy logic                                  2
                                     Decision trees                               3
                                     Support vector machines                      4
                                     Artificial neural networks                   4
                                     Random forests                               6
                                     Gradient boosted trees                       6
                                     Ensembles                                    6
                                     Deep learning                                8
Data accessibility varied considerably across the reviewed publications. While the vast majority
(96%) relied on proprietary data sets, a small number offered some level of public availability.
Three studies explicitly mentioned having publicly available data sets, while five offered the
possibility of requesting access. One such instance provides a downloadable compressed
file containing both the data set and the codebase used for analysis (accessed February 1, 2022).
This specific data set contains fundamental spatial information such as coordinates and standard property
attributes.
However, transparency wasn't always as clear. Two publications referenced data sources (CINP
and Centadata) without providing specific details or links. Although both of these data sets
reportedly contain standard property data, temporal information, and basic spatial features, one goes a
step further by including socioeconomic factors and points of interest (POIs).
The remaining studies offering access upon request primarily dealt with basic property features,
with two extending to include network distances (considered advanced spatial data). Lastly, one study
offered access to a data set containing textual features in addition to property information.
             Input Data                      Score
             Standard features
               Structural                      1
               Temporal                        1
               Socioeconomic                   1
               Environmental                   1
               POI                             1
             Standard spatial data             4
             Advanced spatial data             6
             Graphs                            8
             Unstructured data
               Images                         10
               Text                           10
                                        Chapter 5
  Result
This section categorizes the reviewed papers based on their modeling techniques and analyzes
their novelty over time. Table 5 provides a detailed breakdown of the papers by method,
including publication year ranges. Figure 3 visually represents the model novelty scores of each
study plotted against publication year and author name. Our analysis reveals that traditional
methods like multiple regression analysis, kriging, spatial econometric models, and spatially
varying coefficient models dominate the field. Most studies employing these methods received low
novelty scores, with many earning the lowest possible rating. While time series and fuzzy logic
models have been explored by a few researchers, they are not as prevalent. Recent years have
witnessed a growing interest in methods capable of handling large datasets, such as artificial
neural networks and support vector machines. Additionally, the field is embracing more advanced
machine learning techniques, including ensemble methods like random forests and gradient boosted
trees, as well as deep learning. These approaches hold significant promise for improving
residential property valuation models.
Table 5. Model-based categorization of studies with count and the range of publication year per method.
Table 6. Cont.
          GBT                    [19]                   1             2021–2021
          Other Ensembles        [27,106]               2             2020–2021
          DL                     [28]                   1             2021–2021
Figure 4. Model novelty plot based on scores relative to study ID.
Figure 5. Model novelty plot based on scores relative to study ID.
Figure 6. Model novelty plot based on scores relative to study ID.
Figure 7. Model novelty plot based on scores relative to study ID (sequence based on year). The
color of the data relates to the publication year.
Figure 8. Model novelty plot based on scores relative to study ID.
Figure 9. Model novelty plot based on scores relative to study ID.
Hedonic price models have been used extensively in price prediction because they make the
assumption that property values are a function of many housing attributes. These models, which
have historically relied on multiple regression analysis (MRA), typically incorporate
structural, neighborhood, and locational factors. While linear regression is common, researchers
have also employed semi-log and double-log regressions, combined with various estimation
techniques such as stepwise regression, LASSO, least absolute deviation, generalized least squares,
and ordinary least squares.
To capture non-linear relationships, generalized additive models (GAMs) have been introduced,
often incorporating spatial and temporal dimensions. Semiparametric models, including those
using geospatial splines, have been developed to further enhance flexibility. Additionally,
multilevel hedonic regression models have been proposed to account for hierarchical data
structures, such as properties within neighborhoods or communities. These models often employ
Bayesian estimation methods for increased complexity.
Despite the emergence of more sophisticated techniques, MRA remains a popular and
foundational approach in property valuation. Its simplicity and interpretability make it a
benchmark for comparison with newer methods.
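As an illustration of the semi-log form mentioned above, the sketch below fits ln(price) = b0 + b1 * area by ordinary least squares with a single attribute. It is a deliberately simplified example: real hedonic models include many attributes, and the function name and data values are hypothetical.

```python
import math


def fit_simple_ols(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: floor area (square meters) and sale price.
area = [50.0, 70.0, 90.0, 110.0]
price = [100000.0, 130000.0, 170000.0, 220000.0]

# Semi-log form: ln(price) = b0 + b1 * area
b0, b1 = fit_simple_ols(area, [math.log(p) for p in price])
predicted = math.exp(b0 + b1 * 80.0)
print(round(b1, 4), round(predicted))
```

In the semi-log form, the coefficient b1 is read as the approximate proportional change in price for one additional unit of the attribute.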
5.1.2. Kriging
Kriging, a geostatistical method, has been another popular approach to valuing residential
properties. Ten studies within our review period employed this technique. Kriging predicts
property values at unobserved locations based on known values at nearby locations. It assumes
that property prices vary spatially following a random pattern with a consistent average and a
relationship between locations determined solely by distance.
To estimate property values, kriging calculates a variogram, which measures how property prices
vary with distance. A mathematical model is fitted to this variogram, and weights are assigned to
known property values to predict prices at unknown locations. This process minimizes prediction
error.
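The first step described above, measuring how prices vary with distance, can be sketched as an empirical semivariogram. This is a minimal illustration under simplifying assumptions (Euclidean distances, fixed-width distance bins) and does not include fitting a variogram model or computing kriging weights. The function name and sample data are hypothetical.

```python
import math
from collections import defaultdict


def empirical_variogram(points, values, bin_width):
    """Average 0.5 * (z_i - z_j)^2 over point pairs, grouped by distance bin."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            distance = math.dist(points[i], points[j])
            bin_index = int(distance // bin_width)
            sums[bin_index] += 0.5 * (values[i] - values[j]) ** 2
            counts[bin_index] += 1
    # Map the lower edge of each distance bin to its semivariance.
    return {b * bin_width: sums[b] / counts[b] for b in sorted(sums)}


# Hypothetical properties on a line, with prices that change with location.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
prices = [100.0, 102.0, 104.0, 106.0]
print(empirical_variogram(points, prices, bin_width=1.0))
```

A semivariance that grows with distance, as in this example, indicates that nearby properties have more similar prices than distant ones.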
Cokriging expands on this by considering additional factors beyond location that influence
property prices. It incorporates multiple variables and their spatial relationships to improve
predictions. Regression cokriging further refines this by modeling multiple interrelated
equations, allowing for complex relationships between variables and their residuals.
To address the limitation of assuming a constant average property price, regression kriging
combines traditional regression modeling with kriging. A regression model is initially fitted to
property characteristics, and kriging is then applied to the prediction errors. The final prediction
is a combination of both models.
Finally, area-to-point kriging with external drift (A2PKED) includes average property values for
particular locations along with individual property data, providing a more
comprehensive approach to spatial prediction.
To capture both spatial and temporal patterns, spatiotemporal models have been developed.
These models extend spatial econometrics to incorporate time-series elements. Additionally,
spatial quantile regression and spatial Durbin models have been used to address specific issues,
such as heteroscedasticity and the impact of spatially lagged independent variables.
While maximum likelihood, generalized method of moments, and Bayesian methods are
commonly used for estimation, alternative approaches like spatial expansion and modeling
spatial autocorrelation as a constant term have also been explored. To accommodate non-linear
relationships and spatial heterogeneity, semiparametric and hierarchical spatial models have been
proposed.
Spatial econometric methods have become increasingly prevalent in property valuation research,
with the spatial error model being particularly popular. These techniques offer significant
advantages in capturing the complex spatial dependencies that influence property prices.
Geographically weighted regression (GWR) is a sophisticated statistical method that estimates parameters for each observation
individually, accommodating the unique characteristics of each property. By using a distance
matrix tailored to each observation, our model provides highly localized insights, enhancing
accuracy in property valuation. The model integrates various weighting approaches, including bi-
square and Gaussian kernel functions, ensuring the most reliable results.
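The two kernel functions named above turn distances into observation weights. The sketch below shows one common form of each kernel. The exact parameterization varies between GWR implementations, so the formulas, function names, and sample distances should be read as illustrative assumptions.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel: weights decay smoothly and never reach exactly zero."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def bisquare_weight(distance, bandwidth):
    """Bi-square kernel: weights fall to zero at the bandwidth and beyond."""
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 2) ** 2


# Hypothetical distances (km) from the property being valued to its neighbors.
distances = [0.0, 0.5, 1.0, 2.0]
bandwidth = 1.0
print([round(gaussian_weight(d, bandwidth), 3) for d in distances])
print([bisquare_weight(d, bandwidth) for d in distances])
```

In both cases, closer observations receive larger weights in the local regression, and the bandwidth controls how quickly that influence fades.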
Our GWR model goes beyond traditional methods by incorporating temporal elements through
geographically and temporally weighted regression (GTWR). This addition permits the
coefficients to change across time as well as over space, incorporating travel distances based on
road networks and adapting to dynamic changes in the housing market.
Additionally, the model features a semi-supervised regression approach with co-training GWR,
employing both Gaussian and bi-square kernel functions. This iterative training process enhances
the model's accuracy and adaptability. For a more nuanced analysis, global regression is
combined with local GWR approaches in a mixed-scale hedonic model, accommodating spatial
stationarity and nonstationarity simultaneously.
The Bayesian spatially varying coefficient process model offers further precision by predicting
house prices with hierarchical conditional modeling. This approach provides comprehensive
inference on model parameters, prediction intervals for new observations, and efficient handling of
sparse data.
Our GWR model also incorporates Eigenvector Spatial Filtering (ESF) to capture localized
spatial variation and address multicollinearity issues, although it requires advanced computation
and mathematical knowledge. For a more accessible approach, a simplified ESF model is
available, maintaining the core selection methods while offering ease of use.
A hierarchical trend model has been introduced to enhance property value
predictions by integrating general price trends with trends at the cluster level and housing
attributes. By differentiating between trends for different kinds of houses, neighborhoods, and
districts, this model addresses temporal and spatial interdependence in property pricing.
Despite their potential, time series models are relatively uncommon in property valuation studies,
with only a few articles exploring them between 2004 and 2015.
 5.1.6. Fuzzy Logic
A novel methodology for property valuation combines fuzzy logic with geographical analysis and
Geographic Information Systems (GIS) tools. This approach involves using real estate variables
to design a knowledge base, generate fuzzy sets, and set rules and an operator for inference. The
use of fuzzy logic helps handle the uncertainties and variations in property data effectively.
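A minimal sketch of the fuzzy-set and rule steps described above follows. The triangular membership function is one common choice; the fuzzy sets, their breakpoints, the variable names, and the single rule are hypothetical and serve only to show the mechanics.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), 1 at the peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy sets for distance to the city center (km).
DISTANCE_SETS = {
    "near": (-1.0, 0.0, 5.0),
    "medium": (2.0, 6.0, 10.0),
    "far": (7.0, 15.0, 100.0),
}

# Hypothetical fuzzy sets for living area (square meters).
AREA_SETS = {
    "small": (0.0, 40.0, 80.0),
    "large": (60.0, 150.0, 400.0),
}


def fuzzify(value, fuzzy_sets):
    """Return the membership degree of a crisp value in every fuzzy set."""
    return {label: triangular(value, *params)
            for label, params in fuzzy_sets.items()}


distance = fuzzify(3.0, DISTANCE_SETS)
area = fuzzify(105.0, AREA_SETS)

# Example rule: IF distance is near AND area is large THEN value is high.
# The AND operator is taken as the minimum of the two membership degrees.
rule_strength = min(distance["near"], area["large"])
print(distance, area, rule_strength)
```

A full appraisal system would hold many such rules in its knowledge base and aggregate their outputs before defuzzifying to a price estimate.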
An additional novel technique applies a fuzzy Bayesian approach to real estate appraisal. To
estimate property values, this two-step procedure begins with Bayesian regression analysis. In
the second stage, variables displaying deterministic variability (in contrast to those that vary
randomly) are converted into fuzzy vectors based on fuzzy membership functions. This
fuzzification allows for the creation of fuzzy Bayesian confidence intervals for regression
parameters, resulting in predictions expressed as confidence intervals.
Fuzzy logic models have been explored sporadically in property valuation studies, with examples
appearing in 2006 and 2016. These models, alongside time series approaches, were noted for
their novelty in studies conducted between 2004 and 2016.
Instead of focusing only on geographic closeness, another version of the SAR model
incorporates a distance matrix in the attribute space. The distances between observations are
computed using k-means clustering on this matrix. The best model combines attribute-based and
geographic data closeness.
Nearest-neighbor models have been utilized in recent years, specifically in studies from 2017 and
2021. These models are noted for their novelty in the field, as reflected in their high model
novelty scores.
These decision tree methods supported the results of Multiple Regression Analysis (MRA) by
highlighting that different districts within the city exhibit distinct pricing patterns. This study
also received a high novelty score for the model, comparable to nearest-neighbor models,
indicating its innovative approach.
 5.1.9. Support Vector Machines
For property appraisal, a new model known as the Semiparametric Spatial Effect Least
Squares Support Vector Machine (SSELS-SVM) was presented. By adding a spatial effect term
and a semiparametric component, this model expands upon the least squares support vector
machine. Using a kernel for nonlinear transformations, the SSELS-SVM offers
improved accuracy in predicting house prices compared to both semiparametric Generalized
Additive Models (GAMs) and traditional parametric models. The SSELS-SVM model, used in
2014, has been recognized for its high level of innovation and is rated with a score of 4 for
model novelty.
Research has shown that MLPs can outperform traditional Multiple Regression Analysis (MRA)
in real estate valuation, particularly in markets like Budapest. Additionally, MLPs have been
adapted for property valuation by integrating them with Geographic Information Systems (GIS),
allowing for a more nuanced analysis of geographic data.
Further advancements include spatial neural networks that use MLPs to analyze neighborhood
features derived from satellite images. MLPs have also been employed as meta-models in
stacking ensembles to compare performance across different models.
Despite their potential, MLPs have been featured in only a few studies since 2011, and they,
along with Support Vector Machines (SVMs), have received a high model novelty score of 4.
This highlights their innovative application in property valuation research.
The Random Forest (RF) method enhances property valuation by combining multiple decision
trees into an ensemble and introducing randomization. This technique improves robustness by
constructing each tree from a bootstrapped sample of the data and training it on a random
selection of variables. RF provides a more accurate estimate of home prices by averaging the
estimates of these several trees.
Research has demonstrated that Random Forests outperform traditional hedonic multiple
regression analysis (MRA) and Geographically Weighted Regression (GWR). A recent study compared
Random Forest with other machine learning approaches and applied Shapley values to explain
the RF model's predictions. Recent evidence suggests that this strategy has been especially
successful.
XGBoost, LightGBM, and Gradient Boosting are sophisticated tree-based techniques for property
valuation and geospatial analysis. In a recent study, Gradient Boosting was found to be the most
effective approach for integrating housing and Points of Interest (POI) data into geospatial
network embeddings.
Tree boosting involves sequentially adding trees to correct the errors of previous trees, thereby
lowering the final model's overall loss. Gradient Boosting implements this method, and
XGBoost and LightGBM provide optimized versions that improve performance and efficiency.
These Gradient Boosting Techniques (GBT) have been used in recent studies, with GBT
methods appearing in the latest publication reviewed.
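The sequential error-correcting idea behind tree boosting can be sketched in a few lines. This is a simplified illustration for squared-error loss using one-split trees (stumps) on a single feature; it is not the XGBoost or LightGBM algorithm, and the data, function names, learning rate, and number of trees are all assumptions.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree that minimizes squared error."""
    best = None
    # Excluding the largest value keeps the right-hand leaf non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees=20, learning_rate=0.5):
    """Add stumps one at a time, each fitted to the current residuals."""
    base = sum(ys) / len(ys)
    trees, losses = [], []
    predictions = [base] * len(ys)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        predictions = [p + learning_rate * tree(x)
                       for x, p in zip(xs, predictions)]
        losses.append(sum((y - p) ** 2
                          for y, p in zip(ys, predictions)) / len(ys))

    def predict(x):
        return base + learning_rate * sum(tree(x) for tree in trees)

    return predict, losses


# Hypothetical data: living area (square meters) and price (thousands).
areas = [50, 60, 80, 100, 120, 150]
prices = [100, 115, 150, 190, 230, 300]

predict, losses = boost(areas, prices)
print(round(losses[0], 1), round(losses[-1], 1), round(predict(110), 1))
```

Each new stump is fitted to what the current ensemble still gets wrong, so the training loss does not increase from one round to the next.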
Stacking ensembles, a technique that combines predictions from multiple base models, have been
employed in a few recent studies ([27, 106]). While offering improved accuracy, stacking often
comes at the cost of increased computational time. These studies utilized tree-based models like
Random Forest, Gradient Boosted Trees, and LightGBM as base models, with either linear
regression or neural networks as meta-models to combine predictions.
Unlike the more established ensemble methods like Random Forest and Gradient Boosted Trees,
stacking remains a relatively new approach in property valuation research, with only a handful of
studies adopting it within the last few years. Given its ability to leverage the strengths of multiple
models, stacking shows promise as a valuable tool for enhancing property valuation accuracy.
 5.1.14. Deep Learning
In our review, just one study used deep learning specifically to predict real estate prices. To
analyze and comprehend textual property descriptions, this study integrated self-attention
mechanisms with Long Short-Term Memory (LSTM) networks. This study is notable for having
the highest model novelty score, and its placement at the far right of the novelty timeline
reflects its cutting-edge status in the field.
Despite its potential, deep learning remains relatively underutilized in property valuation
compared to more traditional methods.
Table 6 provides a detailed overview of how different input data types have been utilized across
the studied publications. To assess data novelty, we assigned scores based on the complexity and
uniqueness of each data type. Figure 4 visually represents these data novelty scores over time.
According to our investigation, the most often used input categories are structural, temporal,
point-of-interest (POI), and basic spatial data. Although some researchers have taken
socioeconomic and environmental factors into account since the beginning of the study period,
their use is less widespread. Advanced spatial data, including network-based distances and
topographical data, have become more and more prominent in recent research.
In contrast, graph, image, and textual data remain relatively underutilized, with only a few
studies exploring their potential. The diverse combinations of input data across the analyzed
papers result in a scattered pattern of data novelty scores over time, highlighting the varying
levels of data sophistication employed in the field.
    Table 7. Data type-based categorization of studies with count and range of publication years

Data type         Studies                                                       Count  Range
Structural        [2,4,8,16–19,21–50,52–95,97–                                  91     1992–2021
Temporal          [4,8,18,19,21–23,25,28,29,31–34,36–49,51–57,59–63,65–77,79–   78     1995–20
                  100,102,104,105]
Socioeconomic     [2,8,18,19,21,23,29,31,36,37,39–41,44,57,59,61,63–66,68–      40     1992–2021
                  70,73,75,77,78,80–82,85,89,92,95,97,100,103–105]
Environmental     [21,24,26,29,41,44,55,58,59,64,69,87,98]                      13     1996–2020
POI               [16–19,21,23–27,30–35,38,39,44,46,48,49,52–54,56–59,64–       56     2002–2021
                  66,69,71,73,74,76–81,83,86–89,92,95,97,98,100–102,104,105]
Basic spatial     [2,4,16–19,21–24,27–39,41–46,49–55,57–62,64–68,70–81,83–      81     1992–2021
                  95,97,98,100–106]
Advanced spatial  [26,27,30–32,39,44,51,55,57–59,61,65,66,86]                   16     2002–2020
Graphs            [19]                                                          1      2021–2021
Images            [104]                                                         1      2021–2021
Text              [28,106]                                                      2      2021–2021
    Structural property features are a fundamental component of nearly all property valuation
    models. As evident from Table 6, an overwhelming majority of the reviewed studies
    incorporated these characteristics. While the depth of structural detail varies widely, from a
    single living area measure to extensive lists of attributes, it's clear that this information forms the
    backbone of property valuation datasets.
    Property size, often measured by living area, consistently emerges as a crucial factor. Several
    studies, including those employing machine learning techniques, have highlighted its significant
    impact on property value. For instance, research using four machine learning models found
    house area to be the most influential factor, accounting for between 8% and 20% of property
    value.
    Other key structural attributes, such as the number of bedrooms and the floors, also
    frequently rank among the most important predictors.
 5.2.2. Temporal Data
Time-related data, such as age, construction year, and transaction date, are commonly incorporated
into property valuation models. Often represented as numerical values or dummy variables, this
temporal information is frequently treated as a structural property characteristic.
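The two representations mentioned above, numerical values and dummy variables, can be sketched as follows. The function names, the example dates, and the list of years are hypothetical.

```python
from datetime import date


def building_age(construction_year, transaction_date):
    """Numerical feature: age of the building at the time of sale."""
    return transaction_date.year - construction_year


def year_dummies(transaction_date, years):
    """Dummy features: one 0/1 indicator per candidate transaction year."""
    return {f"sold_{year}": int(transaction_date.year == year)
            for year in years}


# Hypothetical transaction: a house built in 1998 and sold in March 2020.
sale = date(2020, 3, 15)
features = {"age": building_age(1998, sale)}
features.update(year_dummies(sale, [2019, 2020, 2021]))
print(features)
```

In a regression with an intercept, one of the year indicators would normally be dropped to serve as the reference category.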
While some studies estimate separate models for different years due to data limitations, time-
series models explicitly leverage temporal data to capture property value trends over time.
Another approach, the repeat sales method, analyzes price changes of the same property over
time to estimate market trends. However, this method is often combined with other valuation
techniques for better accuracy.
Given its widespread use and consistent inclusion in feature importance analyses, temporal data is
considered as crucial to property valuation as structural information. Factors like building age
and construction year frequently rank among the most influential predictors of property value.
Our tool includes an extensive array of demographic and economic data, such as population
levels, age ratios, single-person household statistics, owner occupation rates, and unemployment
rates, ensuring a comprehensive analysis of neighborhood characteristics. This data is essential
for a nuanced understanding of property values and their determinants.
With its user-friendly interface and robust data integration capabilities, our tool simplifies the
process of incorporating these complex variables into your hedonic regression models. Whether
you are a real estate professional, researcher, or data analyst, this tool provides you with the
necessary resources to conduct thorough and accurate property valuations.
Experience the power of precise and comprehensive data with our Socioeconomic Data
Integration Tool and elevate your real estate valuation practice to new heights.
5.2.4. Environmental Features
Environmental features can play a significant role in property valuation as part of broader
neighborhood characteristics. For instance, a street quality index and an index of nonresidential
land use can be derived from specific data sets. Similarly, neighborhood land use or land
cover variables are often incorporated into models to reflect environmental quality.
Census variables that measure overall environmental quality, along with specific variables
related to air and noise pollution, are frequently used in these models. Unlike the direct
measurement of air pollution levels, environmental quality often relies on subjective assessments,
such as the perceived presence of greenery and general environmental conditions.
Some studies create land use and land cover variables, especially those linked to visibility
aspects, using digital elevation models and Geographic Information System (GIS) data.
Furthermore, seismic activity data can be used as an explanatory variable, and the effect of
water quality on real estate values has also been investigated.
For this kind of input data, distance features are calculated using common metrics like Euclidean
distance, employing location data in the form of coordinates.
Coordinates: Since "spatial" and "geospatial" are included in the search query, many articles
include location information of some kind, particularly geographic coordinates. A total of 54
papers make use of this data in different ways or include coordinates as variables in their models.
For example, the spatial weights matrix in spatial econometric models is based on distances
computed from coordinates. Furthermore, coordinate data are needed for spatially varying
coefficient models in order to calculate the coefficients. In one study employing a Random
Forest (RF) model to forecast home values, latitude was shown to be one of the most important
features.
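As an illustration of how a spatial weights matrix can be derived from coordinates, the sketch below builds a row-standardized inverse-distance matrix. Inverse-distance weighting is only one common specification and is assumed here rather than taken from the reviewed studies; the coordinates are hypothetical planar (projected) positions, and all points are assumed to be distinct.

```python
import math


def inverse_distance_weights(coords):
    """Row-standardized inverse-distance spatial weights matrix.

    Assumes planar coordinates and no two properties at the same location.
    """
    n = len(coords)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                weights[i][j] = 1.0 / math.dist(coords[i], coords[j])
        row_sum = sum(weights[i])
        # Row standardization: each property's neighbour weights sum to 1.
        weights[i] = [w / row_sum for w in weights[i]]
    return weights


# Hypothetical property locations in kilometres on a projected grid.
coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
W = inverse_distance_weights(coords)
for row in W:
    print([round(w, 3) for w in row])
```

The diagonal stays at zero so that a property is never treated as its own neighbour.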
Distance Features: This type of input data includes actual distance or accessibility measures as
model features. Dummy variables indicating the presence of a POI within a certain radius are
considered POI data rather than distance features. Euclidean distance is frequently used for
calculating these features. The Haversine formula, which approximates the great-circle distance
on the Earth's surface, is another method used. One common distance feature in the literature is
the measure of distance to the central business district (CBD), which has been included in
hedonic models from the early days of property valuation studies.
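The Haversine formula and the distinction between a distance feature and a POI dummy can be sketched as follows. The mean Earth radius of 6371 km is a standard approximation; the coordinates, the function names, and the 5 km radius are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a standard approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def within_radius(distance_km, radius_km):
    """POI dummy: 1 if the point of interest lies within the radius."""
    return int(distance_km <= radius_km)


# Hypothetical coordinates of a house and of the central business district.
house = (26.85, 75.80)
cbd = (26.92, 75.79)

distance_to_cbd = haversine_km(*house, *cbd)   # distance feature
cbd_within_5km = within_radius(distance_to_cbd, 5.0)  # POI-style dummy
print(round(distance_to_cbd, 2), cbd_within_5km)
```

The first value would enter a model as a continuous distance feature, while the second corresponds to the radius-based indicator that this review classifies as POI data.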
Basic spatial data is widely used in property valuation models and has been consistently included
throughout the study period. However, the approach to using exact location data has evolved. In
the early years, coordinates were often implied through weight matrices in spatial
models and kriging methods. More recently, a direct approach has been adopted, incorporating
longitude and latitude variables into predictive models.
Topographical Data: Visibility features are often created using digital elevation models.
These topographical details aid in identifying the regions with particular land uses that are
visible from each property. Studies on noise pollution that may have an impact on properties
also include noise variables derived from noise data in their models.
Advanced Distance Features: A more sophisticated and practical approach to incorporating distance
in valuation models is the use of road network-based distance metrics. To link road
networks with property and point-of-interest locations, these sometimes call for the use of a GIS
application. Measures of accessibility and travel time, whether traveling by vehicle, bicycle, or
foot, are likewise regarded as distance features. For example, some studies apply principal
component analysis (PCA) to road network-based walking and travel times to points of interest
to create accessibility covariates. Furthermore, local and regional accessibility
measures are frequently derived from travel times.
In certain studies, accessibility metrics are computed manually for both local and global
characteristics. These models incorporate a number of traffic accessibility indices, including
metro accessibility, walking accessibility in the road network, and bus accessibility based on bus
and road networks from station data and subway maps. Traffic variables often rank among the
most important features in machine learning models.
Centrality and Connectivity: The street network is used to compute variables pertaining to
centrality and connectivity, emphasizing the significance of these elements in property valuation
models.
While basic spatial information is widely used, advanced spatial data appear in only 16 studies,
seven published before 2010 and nine after. The presence of advanced spatial data typically raises
the data novelty score significantly. Among the studies with high data novelty scores, the majority
include advanced spatial information, often combining multiple input data types. This
combination of data types underscores the significant role advanced spatial information plays in
enhancing the novelty and accuracy of property valuation models.
In one study, the authors built their own graph by joining homes with regions, rail stations,
schools, and other points of interest (POIs), combining their properties into a structure based on
location. A graph neural network was then used to embed this graph, and the embedding was
applied to the prediction model as a collection of features. Applications of graphs in real estate
appraisal models have been few and have only appeared since 2021. The article that introduced
graph data is noteworthy for its high data novelty score. Given that it was published in 2021, it
appears in Figure 4 close to the end of the x-axis.
      Clapboard: This timeless style features horizontal overlapping planks, offering a traditional and
       cozy appearance.
      Shingles: Known for their textured look, shingles can be made from cedar, asphalt, or
       other materials.
      Shiplap: This vertical plank siding brings a modern farmhouse aesthetic to your home.
      Vinyl: Offering a wide range of colors and styles, vinyl siding is durable, low-maintenance,
       and often mimics the look of wood.
      Fiber cement: Known for its durability and fire resistance, fiber cement siding provides
       a versatile look that can be painted to match your desired color.
      Brick: A classic choice that exudes timeless beauty and durability. It offers excellent insulation
       and requires minimal maintenance.
      Stone: For a luxurious and natural look, stone exterior can add significant curb appeal.
       However, it's typically more expensive and requires professional installation.
      Stucco: This cement-based material offers a smooth or textured finish and is popular in warmer
       climates for its insulating properties.
      Metal: Contemporary and low-maintenance, metal siding comes in various finishes,
       including aluminum, steel, and copper.
       Factors to Consider
Determining a home's overall condition is crucial for both buyers and sellers. It involves a
comprehensive evaluation of various factors that contribute to a property's value and livability.
      Structural Integrity: This examines the foundation, framing, roof, and overall stability of the
       house. Signs of cracks, leaks, or structural damage can significantly impact the condition rating.
      Exterior Condition: The assessment includes the state of siding, roofing, windows, doors,
       and landscaping. Factors like age, wear and tear, and maintenance history are considered.
      Interior Condition: Evaluating the condition of walls, floors, ceilings, kitchens, bathrooms,
       and overall cleanliness is essential. Up-to-date fixtures, appliances, and finishes contribute to
       a higher rating.
      Systems and Components: Checking the functionality and age of heating, cooling, electrical,
       plumbing, and ventilation systems is crucial.
      Maintenance History: A well-maintained home generally reflects a better overall condition.
       Evidence of regular upkeep and repairs is a positive indicator.
Rating Systems
While there's no standardized rating system, common terms used to describe a home's overall
condition include:
      Excellent: The home is in pristine condition with minimal wear and tear.
      Very Good: The home is well-maintained with minor cosmetic or functional issues.
      Good: The home is in average condition with some noticeable wear and tear.
      Fair: The home requires significant repairs or renovations.
      Poor: The home is in disrepair and may require extensive work.
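Because these ratings are ordered, they can be turned into a numeric model input by ordinal encoding. The sketch below assumes a 1 to 5 scale; since no rating system is standardized, the mapping and the function name are hypothetical.

```python
# Hypothetical ordinal scale: a higher score means better condition.
CONDITION_SCORES = {
    "Excellent": 5,
    "Very Good": 4,
    "Good": 3,
    "Fair": 2,
    "Poor": 1,
}


def encode_condition(label):
    """Map a condition label to its ordinal score, ignoring case and spaces."""
    return CONDITION_SCORES[label.strip().title()]


listings = ["Excellent", "very good", " Fair ", "POOR"]
print([encode_condition(label) for label in listings])
```

An unknown label raises a `KeyError`, which makes data-entry problems visible instead of silently assigning a score.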
5.3 Discussion
Figure 5 illustrates the relationship between model novelty and data novelty across 93 articles,
with each data point representing an article and colored by publication year. The plot displays
two novelty scores for each article, and jittering is applied to improve legibility due to overlap in
scores. The plot is annotated with five inferred clusters, categorized by model and data novelty.
Cluster 1: This cluster uses traditional input data types with classic hedonic procedures. Model
types with low novelty scores and data novelty values between 1 and 9 are included.
The input data cover structural, temporal, socioeconomic, environmental, point-of-interest, and
basic spatial information. This cluster's articles span 1992 to 2021 and focus mostly on approaches
such as Support Vector Systems, Kriging, Multiple Regression Analysis (MRA), Spatial
Econometric Models (SEM), Fuzzy Logic (FL), time series, and Vector Classification (SVC).
Lately, this cluster's works have concentrated on using tried-and-true techniques while
broadening the scope of input data types.
Cluster 2: The 14 papers in this cluster have a model novelty score of 1 and a high data novelty
score, demonstrating the application of sophisticated spatial data along with traditional
hedonic approaches. The models in this cluster, mainly MRA and SEM prior to 2015 and
kriging and SVC subsequently, show an emphasis on incorporating advanced spatial data into
conventional techniques for appraisal.
Cluster 3: This cluster, which consists of six articles, uses conventional input data types but
includes more sophisticated model types, such as basic machine learning techniques.
Interestingly, this cluster's model novelty declines with time: papers released between 2011
and 2016 use Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs), while
those released between 2017 and 2021 use decision trees (DT) and neural networks (NN). Data
novelty varies; only one article does not use basic spatial data.
Cluster 4: With three articles, the smallest cluster has low data novelty and high model novelty.
Published in 2020 and 2021, the articles in this cluster apply sophisticated machine learning
algorithms such as Random Forest (RF) to conventional input data sources.
The articles combine diverse data types, including POI, structural, temporal,
socioeconomic, and basic spatial information.
Cluster 5: Although most articles in this cluster show high data novelty, one article has a lower
data novelty score due to its focus on advanced spatial information. This cluster includes a
variety of models, including deep learning, Gradient Boosting Trees (GBT), and ensemble methods,
with one outlier article using Multi-Layer Perceptrons (MLP) and Convolutional Neural Networks
(CNNs) for image data transformation.
Overall, the clusters highlight the evolution of property valuation methods from conventional
approaches to more advanced techniques incorporating novel data types and modeling strategies.
Regarding Research Question 3, the residential property valuation field often prioritizes proven
models and data types. The majority of papers, both old and new, show minimal data
and model novelty. Some nonetheless stand out because of their high data novelty, model
novelty, or both. Recent studies show a shift toward more sophisticated machine learning (ML)
prediction models, deviating from conventional techniques. This tendency can also be seen in the
rising use of novel data forms such as graphs and unstructured data, even though conventional
inputs like basic spatial, temporal, and structural data continue to be common.
Several factors contribute to the reliance on conventional hedonic methods and traditional data
types in residential property valuation. One significant factor is data availability. While big data
and deep learning methods have become more accessible, high-quality, publicly available housing
datasets that include transaction prices, hedonic attributes, and spatial data are still scarce.
Existing datasets, such as those from King County (USA), Melbourne (Australia), Ames (Iowa, USA),
and Boston (USA), are often limited in spatial scope and require combining with separate, more
advanced spatial data sources. This can complicate data integration, leading to issues like
merging difficulties, missing data, and sparsity. Additionally, dealing with unstructured data
like graphs, images, and text presents challenges due to the resource-intensive nature of data
collection and preprocessing.
Another factor may be the gap between academic research and industry practice. While academic
research has traditionally focused on established methods and data types, industry may explore
more novel techniques and data types that are not always reflected in academic literature.
The progressive integration of advanced and traditional data sources with increasingly
sophisticated forecasting methodologies opens new options for property valuation. It is possible
to make use of unstructured data, such as text and images, and to create sophisticated
geographical information and more intricate distance calculations. The use of deep learning
methods, which are well suited to handling unstructured data, could further enhance the
effectiveness of these models. Future research might focus on integrating advanced input data
types with traditional ones and developing tailored ML and deep learning methods to manage
and extract valuable insights from diverse data sources.
Selecting the optimal algorithm for predicting house prices is a complex task that involves
considering multiple factors beyond just raw performance metrics. While it's tempting to simply
choose the algorithm with the highest accuracy, a more nuanced approach is necessary.
Key Considerations:
Ensemble Methods:
Combining multiple models (ensembling) can often improve predictive performance. You can
experiment with methods like stacking, boosting, and bagging.
Model Interpretation
Selecting the optimal algorithm for predicting house prices is a complex task that requires
consideration of various factors beyond just raw performance metrics. While the
Gradient Boosting Regressor (GBR) might initially appear to be the top contender due to
its strong performance in both the test dataset and cross-validation, other factors must be
weighed.
      Performance: While predictive accuracy (measured by R-squared) is crucial, other metrics like
       Mean Squared Error (MSE) or Mean Absolute Error (MAE) should also be considered. Small
       differences in performance might not justify the complexities of some models.
      Overfitting: Models that perform remarkably well on training data but poorly on
       unseen data generalize badly and should be treated with caution.
      Stability: Consistency in performance across different data subsets is essential. Models
       with low variance in cross-validation are generally preferred.
      Computational Efficiency: Training time and model complexity can impact the
       feasibility of deployment, especially in real-time applications.
      Interpretability: Understanding the factors influencing predictions can be valuable.
       Simpler models like linear regression are often more interpretable than complex ones.
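The performance and stability criteria above can be made concrete with a short sketch. It computes MSE, MAE, and R-squared from their standard definitions and summarizes per-fold scores by their mean and spread. The function names and all numbers are illustrative assumptions, not results from this study.

```python
from statistics import mean, pstdev

def mse(y_true, y_pred):
    """Mean Squared Error: average of squared residuals."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean Absolute Error: average of absolute residuals."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def stability(fold_scores):
    """Mean and spread of per-fold scores; a small spread suggests a stable model."""
    return mean(fold_scores), pstdev(fold_scores)

# Illustrative values only (prices in thousands), not results from the study.
actual = [200.0, 250.0, 310.0, 180.0]
predicted = [210.0, 240.0, 300.0, 190.0]
print("MSE:", mse(actual, predicted))
print("MAE:", mae(actual, predicted))
print("R^2:", round(r_squared(actual, predicted), 3))

# Hypothetical per-fold R^2 scores from a cross-validation run.
print("CV mean and spread:", stability([0.86, 0.88, 0.90]))
```

Reporting the spread alongside the mean makes it easier to see when a small gain in average score comes at the cost of less consistent behavior across data subsets.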
To gain insights into which factors drive house prices, feature importance analysis is crucial.
While many algorithms provide built-in importance measures, a consistent, model-agnostic approach
such as permutation importance makes the results comparable across models. By shuffling the
values of one feature at a time and measuring the resulting drop in model performance,
permutation importance helps identify the most influential factors.
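The shuffling procedure can be sketched as follows. This is a minimal illustration under stated assumptions: perm_importance and toy_model are hypothetical names, the "model" is a hand-written function rather than a fitted estimator, and the data are invented. Importance is measured here as the average increase in MSE after shuffling one column.

```python
import random
from statistics import mean

def mse(y_true, y_pred):
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))

def perm_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mse(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(mean(increases))
    return importances

def toy_model(row):
    # Hypothetical fitted model: price depends on floor area only.
    floor_area, _unused = row
    return 2.0 * floor_area

# Illustrative data: [floor area in m^2, an irrelevant second feature].
X = [[50.0, 1.0], [65.0, 3.0], [80.0, 2.0], [95.0, 5.0], [120.0, 4.0], [150.0, 6.0]]
y = [toy_model(row) for row in X]
scores = perm_importance(toy_model, X, y)
print("Importance of floor area:", scores[0])
print("Importance of unused feature:", scores[1])
```

Because the toy model ignores the second feature, shuffling it leaves the error unchanged, while shuffling floor area increases the error. The same routine can wrap any fitted model that exposes a per-row prediction function.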
       The Path Forward
Among the machine learning algorithms tested, ensemble methods like Gradient Boosting
(GBM), XGBoost, and LightGBM exhibited exceptional performance. These algorithms excel at
handling complex datasets and reducing overfitting, making them ideal for housing price
prediction. While Random Forest (RF) also performed well, it was more prone to overfitting
compared to the boosting-based methods.
Feature importance analysis revealed that floor area, number of bathrooms, and the presence
of an elevator were the most significant predictors of house prices. Additionally, location,
specifically proximity to desirable areas like Playa de San Juan and El Cabo de la Huerta, plays a
crucial role. Neighborhood characteristics, such as net household income, also significantly
impact property values.
The study highlights the temporary impact of the COVID-19 pandemic on the housing market in
Alicante. While there was an initial price decline, the market quickly recovered, and prices
surpassed pre-pandemic levels.
         Limitations and Future Research
While the study provides valuable insights, it's essential to acknowledge limitations.
Relying solely on asking prices might not perfectly reflect actual transaction prices.
Additionally, the focus on Alicante limits the generalizability of findings to other regions.
Future research could address these limitations through:
      Longitudinal analysis: Tracking price changes over extended periods to identify trends
       and patterns.
      Incorporation of additional features: Exploring the impact of factors like energy
       efficiency, property age, and school district quality.
                                     Chapter 6
Conclusions
Policymakers, buyers, sellers, and other real estate players all depend on accurate residential
property appraisal. A comprehensive analysis of the literature on this subject advances research
and benefits society by enhancing market transparency and by improving real
estate appraisal procedures and housing policies.
Following the PSALSAR paradigm, this study conducted a thorough evaluation of prior research on
predictive techniques for geospatial data-based house price forecasting. The approach comprised
the following steps:
These scores were plotted over time and analyzed to identify five distinct clusters:
   1. Conventional Methods with Traditional Data: This cluster, comprising nearly 70% of
       the reviewed literature, indicates a dominance of traditional methods and data types,
       showing low levels of novelty.
   2. Conventional Methods with Advanced Spatial Data: This group combines traditional
       methods with more advanced spatial information.
   3. Basic ML Methods with Traditional Data: Includes basic machine learning methods
       applied to traditional data types.
   4. Advanced ML Methods with Traditional Data: Features advanced machine learning
       techniques but still relies on traditional data.
   5. Advanced ML Methods with Advanced Data: The most innovative cluster, using both
       advanced machine learning methods and novel data types like images and text.
Although traditional approaches are still widely used, recent studies are starting to explore
advanced geographical data, unstructured data, and complex machine learning and deep learning
methods. Promising directions include:
        Leveraging Unstructured Data: Using deep learning to handle and analyze image and
         text data.
        Developing Advanced Spatial Features: Creating more complex spatial data and
         distance measures.
        Tailoring Algorithms: Adapting machine learning and deep learning methods to
          effectively combine diverse features.
Future research could focus on integrating advanced input data with cutting-edge predictive
methods to enhance the accuracy and robustness of property valuation models.
                                 Chapter 7
Future works
     Key Feature Categories
 1. Property Characteristics:
       o Basic details: square footage, number of bedrooms, bathrooms, and floors.
       o Building quality: age, condition, renovations, and material type.
       o Additional features: garage size, basement, fireplace, pool, and other amenities.
 2. Location:
       o Geographic coordinates: latitude and longitude.
       o Neighborhood: crime rates, school districts, proximity to amenities.
       o Zoning regulations: impact on property use and value.
 3. Economic Factors:
       o Local and national economic indicators: GDP, unemployment rate, interest rates.
       o Housing market trends: supply and demand, price indices.
 4. Time-Based Features:
       o Seasonal variations: price fluctuations based on time of year.
       o Market trends: historical price data to identify patterns.
    Example Features
   Derived Features:
       o Price per square foot
       o Age of the property
       o Distance to schools, parks, and public transportation
       o Property tax rate
   Categorical Features:
       o Property type (single-family, condo, townhouse)
       o Heating and cooling systems
       o Roof type
       o Neighborhood quality
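The derived and categorical features listed above could be constructed along the following lines. This is a sketch under assumptions: the field names (price, sqft, year_built, lat, lon, property_type), the helper names, and the sample listing are hypothetical. Distance is computed with the haversine great-circle formula, and property type is one-hot encoded. Note that price per square foot is derived from the target itself, so it is better suited to exploratory analysis or to neighborhood-level aggregates than to use as a direct model input.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def one_hot(value, categories):
    """Encode a categorical value as 0/1 indicator columns."""
    return {f"type_{c}": int(value == c) for c in categories}

def build_features(listing, school, current_year, property_types):
    """Turn one raw listing (hypothetical schema) into derived and encoded features."""
    features = {
        "price_per_sqft": listing["price"] / listing["sqft"],
        "age": current_year - listing["year_built"],
        "dist_school_km": haversine_km(
            listing["lat"], listing["lon"], school[0], school[1]
        ),
    }
    features.update(one_hot(listing["property_type"], property_types))
    return features

# Invented example listing and school location, for illustration only.
listing = {
    "price": 300000.0, "sqft": 1500.0, "year_built": 2000,
    "lat": 38.35, "lon": -0.48, "property_type": "condo",
}
types = ["single-family", "condo", "townhouse"]
print(build_features(listing, school=(38.36, -0.47), current_year=2024, property_types=types))
```

The same pattern extends to distances to parks or public transportation by passing additional reference coordinates.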
that can be analyzed using modern techniques like natural language processing (NLP).
                                 Bibliography
[1] Rosen, S. Hedonic Prices and Implicit Markets: Product Differentiation in Pure Competition.
J. Political Econ. 1974, 82, 34–55.
[2] Can, A. Specification and estimation of hedonic housing price models. Reg. Sci. Urban
Econ. 1992, 22, 453–474.
[3] Kang, Y.; Zhang, F.; Peng, W.; Gao, S.; Rao, J.; Duarte, F.; Ratti, C. Understanding house
price appreciation using multi-source big geo-data and machine learning. Land Use Policy 2021,
111, 104919.
[4] Yacim, J.A.; Boshoff, D.G.B. A Comparison of Bandwidth and Kernel Function Selection in
Geographically Weighted Regression for House Valuation. Int. J. Technol. 2019, 10, 58.
[5] Tobler, W.R. A Computer Movie Simulating Urban Growth in the Detroit Region.
Econ. Geogr. 1970, 46, 234–240.
[6] Gao, Q.; Shi, V.; Pettit, C.; Han, H. Property valuation using machine learning algorithms on
statistical areas in Greater Sydney, Australia. Land Use Policy 2022, 123, 106409.
[7] Sisman, S.; Aydinoglu, A. Improving performance of mass real estate valuation through
application of the dataset optimization and Spatially Constrained Multivariate Clustering
Analysis. Land Use Policy 2022, 119, 106167.
[8] Yang, Y.; Liu, J.; Xu, S.; Zhao, Y. An Extended Semi-Supervised Regression Approach with
Co-Training and Geographical Weighted Regression: A Case Study of Housing Prices in
Beijing. ISPRS Int. J. Geo-Inf. 2016, 5, 4.
[9] Mengist, W.; Soromessa, T.; Legese, G. Method for conducting systematic literature review
and meta-analysis for environmental science research. MethodsX 2020, 7, 100777.
[10] Krause, A.L.; Bitter, C. Spatial econometrics, land values and sustainability: Trends in real
estate valuation research. Cities 2012, 29, S19–S25.
[11] Mccluskey, W.J.; Borst, R.A. Specifying the effect of location in multivariate valuation
models for residential properties: A critical evaluation from the mass appraisal perspective. Prop.
Manag. 2007, 25, 312–343.
[12] Pagourtzi, E.; Assimakopoulos, V.; Hatzichristos, T.; French, N. Real estate appraisal: A
review of valuation methods. J. Prop. Investig. Financ. 2003, 21, 383–401.
[13] Wang, D.; Li, V.J. Mass appraisal models of real estate in the 21st century: A systematic
literature review. Sustainability 2019, 11, 7006.
[14] Zhou, G.; Ji, Y.; Chen, X.; Zhang, F. Artificial Neural Networks and the Mass Appraisal of
Real Estate. Int. J. Online Eng. (IJOE) 2018, 14, 180.
[15] Geerts, M.; De Weerdt, J.; vanden Broucke, S. A Survey of Methods and Input Data Types
for House Price Prediction: Literature List. KU Leuven RDR 2022, V2.
[16] Kutasi, D.; Badics, M.C. Valuation methods for the housing market: Evidence from
Budapest. Acta Oecon 2016, 66, 527–546.
[17] Yilmazer, E.S.; Kocaman, S. A mass appraisal assessment study using machine learning
based on multiple regression and random forest. Land Use Policy 2020, 99, 104889.
[18] Zhang, Y.; Zhang, D.; Miller, E.J. Spatial Autoregressive Analysis and Modeling of
Housing Prices in City of Toronto. J. Urban Plan. Dev. 2021, 147, 05021003.
[19] Das, S.S.S.; Ali, M.E.; Li, Y.F.; Kang, Y.B.; Sellis, T. Boosting house price predictions
using geo-spatial network embedding. Data Min. Knowl. Discov. 2021, 35, 2221–2250.
[20] Bengio, Y.; Goodfellow, I.; Courville, A. Deep Learning; MIT Press: Cambridge, MA,
USA, 2017; Volume 1.
[21] Montero, J.M.; Mínguez, R.; Fernández-Avilés, G. Housing price prediction: Parametric
versus semi-parametric spatial hedonic models. J. Geogr. Syst. 2018, 20, 27–55.
[22] Nappi-Choulet, I.; Maury, T.P. A Spatial and Temporal Autoregressive Local Estimation
for the Paris Housing Market. J. Reg. Sci. 2011, 51, 732–750.
[23] Hui, E.C.M.; Zhong, J.; Yu, K. Heterogeneity in Spatial Correlation and Influential Factors
on Property Prices of Submarkets Categorized by Urban Dwelling Spaces. J. Urban Plan. Dev.
2016, 142, 04014047.
[24] Liao, W.C.; Wang, X. Hedonic house prices and spatial quantile regression. J. Hous. Econ.
2012, 21, 16–27.
[25] Jasińska, E.; Preweda, E. Statistical Modelling of the Market Value of Dwellings, on the
Example of the City of Kraków. Sustainability 2021, 13, 9339.
[26] Wu, C.; Ye, X.; Ren, F.; Wan, Y.; Ning, P.; Du, Q. Spatial and Social Media Data Analytics
of Housing Prices in Shenzhen, China. PLoS ONE 2016, 11, e0164553.
[27] Xue, C.; Ju, Y.; Li, S.; Zhou, Q.; Liu, Q. Research on accurate house price analysis by using
GIS technology and transport accessibility: A case study of Xi’an, China. Symmetry 2020, 12,
1329.
[28] Zhou, X.; Tong, W. Learning with self-attention for rental market spatial dynamics in the
Atlanta metropolitan area. Earth Sci. Inform. 2021, 14, 837–845.
[29] Adair, A.S.; Berry, J.N.; McGreal, W.S. Hedonic modelling, housing submarkets and
residential valuation. J. Prop. Res. 1996, 13, 67–83.
[30] Gultekin, B.; Yamamura, E. Predicting Housing Prices in Central Ankara, Turkey Based on
Spatial Dependence Analysis. Stud. Reg. Sci. 2002, 33, 217–227.
[31] Orford, S. Valuing Locational Externalities: A GIS and Multilevel Modelling Approach.
Environ. Plan. B Plan Des. 2002, 29, 105–127.
[33] Osland, L.; Thorsen, I. Predicting housing prices at alternative locations and under
alternative scenarios of the spatial job distribution. Lett. Spat. Resour. Sci. 2009, 2, 133–147.
[34] Filippova, O.; Rehm, M. The impact of proximity to cell phone towers on residential
property values. Int. J. Hous. Mark. Anal. 2011, 4, 244–267.
[35] Koramaz, T.K.; Dokmeci, V. Spatial Determinants of Housing Price Values in Istanbul. Eur.
Plan. Stud. 2012, 20, 1221–1237.
[36] Brunauer, W.A.; Lang, S.; Feilmayr, W. Hybrid multilevel STAR models for hedonic house
prices. Jahrb Reg. 2013, 33, 151–172.
[37] Brunauer, W.; Lang, S.; Umlauf, N. Modelling house prices using multilevel structured
additive regression. Stat. Model. 2013, 13, 95–123.
[38] Panduro, T.E.; Veie, K.L. Classification and valuation of urban green spaces—A hedonic
house price valuation. Landsc. Urban Plan. 2013, 120, 119–128.
[39] Franck, M.; Eyckmans, J.; De Jaeger, S.; Rousseau, S. Comparing the impact of road noise
on property prices in two separated markets. J. Environ. Econ. Policy 2015, 4, 15–44.
[40] Keskin, B.; Dunning, R.; Watkins, C. Modelling the impact of earthquake activity on real
estate values: A multi-level approach. J. Eur. Real Estate Res. 2017, 10, 73–90.
[41] Marmolejo-Duarte, C. Does urban centrality influence residential prices? An analysis for
the Barcelona Metropolitan Area. Rev. Constr. 2017, 16, 57–65.
[42] Hill, R.J.; Scholz, M. Can Geospatial Data Improve House Price Indexes? A Hedonic
Imputation Approach with Splines. Rev. Income Wealth 2018, 64, 737–756.
[43] Doumpos, M.; Papastamos, D.; Andritsos, D.; Zopounidis, C. Developing automated
valuation models for estimating property values: A comparison of global and locally weighted
approaches. Ann. Oper. Res. 2021, 306, 415–433.
[44] Osland, L.; Östh, J.; Nordvik, V. House price valuation of environmental amenities: An
application of GIS-derived data. Reg. Sci. Policy Pract. 2020, 14, 939–959.
[45] Bourassa, S.C.; Cantoni, E.; Hoesli, M. Spatial dependence, housing submarkets, and house
price prediction. J. Real Estate Financ. Econ. 2007, 35, 143–160.
[46] Chica Olmo, J. Spatial Estimation of Housing Prices and Locational Rents. Urban Stud.
1995, 32, 1331–1344.
[48] Yoo, E.H.; Kyriakidis, P. Area-to-point Kriging in spatial hedonic pricing models. J. Geogr.
Syst. 2009, 11, 381–406.
[50] Larraz, B.; Población, J. An online real estate valuation model for control risk taking: A
spatial approach. Investig. Anal. J. 2013, 42, 83–96.
[51] Szczepańska, A.; Senetra, A.; Wasilewicz-Pszczółkowska, M. The effect of road traffic
noise on the prices of residential property—A case study of the Polish city of Olsztyn. Transp.
Res. Part D Transp. Environ. 2015, 36, 167–177.
[52] de Koning, K.; Filatova, T.; Bin, O. Improved Methods for Predicting Property Prices in
Hazard Prone Dynamic Markets. Environ. Resour. Econ. 2018, 69, 247–263.
[53] Chica-Olmo, J.; Cano-Guervos, R.; Chica-Rivas, M. Estimation of Housing Price Variations
Using Spatio-Temporal Data. Sustainability 2019, 11, 1551.
[54] Chica-Olmo, J.; Cano-Guervos, R.; Tamaris-Turizo, I. Determination of buffer zone for
negative externalities: Effect on housing prices. Geogr. J. 2019, 185, 222–236.
[55] Paterson, R.W.; Boyle, K.J. Out of Sight, Out of Mind? Using GIS to Incorporate Visibility
in Hedonic Property Value Models. Land Econ. 2002, 78, 417–425.
[56] Tse, R.Y.C. Estimating Neighbourhood Effects in House Prices: Towards a New Hedonic
Model Approach. Urban Stud. 2002, 39, 1165–1180.
[57] Thériault, M.; Des Rosiers, F.; Villeneuve, P.; Kestens, Y. Modelling interactions of
location with specific value of housing attributes. Prop. Manag. 2003, 21, 25–62.
[58] Cohen, J.P.; Coughlin, C.C. Spatial hedonic models of airport noise, proximity, and housing
prices. J. Reg. Sci. 2008, 48, 859–878.
[59] Zietz, J.; Zietz, E.N.; Sirmans, G.S. Determinants of House Prices: A Quantile Regression
Approach. J. Real Estate Financ. Econ. 2008, 37, 317–333.
[60] Zhu, B.; Füss, R.; Rottke, N.B. The Predictive Power of Anisotropic Spatial Correlation
Modeling in Housing Prices. J. Real Estate Financ. Econ. 2011, 42, 542–565.
[61] Cho, S.H.; Yu, T.H.E.; Kim, S.G.; Roberts, R.K.; Lee, D. Applying Directed Acyclic
Graphs to Assist Specification of a Hedonic Model. Hous. Stud. 2012, 27, 984–1007.
[62] Liu, X. Spatial and Temporal Dependence in House Price Prediction. J. Real Estate Financ.
Econ. 2013, 47, 341–369.
[63] Moreira de Aguiar, M.; Simões, R.; Braz Golgher, A. Housing market analysis using a
hierarchical–spatial approach: The case of Belo Horizonte, Minas Gerais, Brazil. Reg. Stud. Reg.
Sci. 2014, 1, 116–137.
[64] Chasco, C.; Sánchez, B. Valuation of environmental pollution in the city of Madrid: An
application with hedonic models and spatial quantile regression. Rev. Déconomie Reg. Urbaine
2015, 1, 343–370.