=Paper=
{{Paper
|id=Vol-3698/paper6
|storemode=property
|title=Literature Review of Explainable Machine Learning in Real Estate
|pdfUrl=https://ceur-ws.org/Vol-3698/paper6.pdf
|volume=Vol-3698
|authors=Arnis Staško,Jānis Grundspeņķis
|dblpUrl=https://dblp.org/rec/conf/balt/StaskoG24
}}
==Literature Review of Explainable Machine Learning in Real Estate==
Arnis Staško and Jānis Grundspeņķis
Riga Technical University, 6A Kipsalas Street, Riga, LV-1048, Latvia
Abstract
A literature review is conducted on explainable machine learning methods used in real estate. It
identifies 17 relevant articles that reveal various subfields of real estate and the explainable
machine learning methods used. Among them, XGBoost and SHAP is the most commonly used
combination for explainable machine learning in the studied area. The study also identifies
research gaps that could be addressed through further studies on time factors, model
explainability, training set balance, and causal dependencies.
Keywords
Real estate, explainable machine learning, research methods, literature review
1. Introduction
The demand for artificial intelligence applications has grown significantly in the last decade.
Companies are looking for ways to integrate artificial intelligence solutions into their
processes to improve their product or service and competitiveness in the market, as well as
to reduce the required amount of labour or costs. Real estate companies are no exception.
There is a shortage of labour, and customers expect lower operating costs under competitive
conditions. It is essential to make the right decisions about real estate and its management,
where the number of influencing factors is large and difficult for a person to grasp.
Therefore, artificial intelligence solutions could help.
Artificial intelligence studies methods for developing intelligent machines or software
that imitate human behaviour. Although people usually talk about the need to implement
an artificial intelligence solution, in practice it often results in the development of machine
learning solutions. Machine learning is a subfield of artificial intelligence that creates
software models from training examples to perform prediction, recognition, or clustering.
Diverse machine learning algorithms allow us to train systems so that they gain autonomy,
but the disadvantage of the most common ones based on neural networks is the inability to
explain the obtained result (black box). Therefore, there is an increased interest in
explainable machine learning methods, which would not only provide predictions or
recommend decisions but would also argue for the recommended solution (white box). In
the field of real estate people are not ready to blindly trust artificial intelligence to make a
decision about the most expensive thing they own. Explainability is therefore critical.
Baltic DB&IS Conference Forum and Doctoral Consortium 2024
arnis.stasko@rtu.lv (A.Staško); janis.grundspenkis@rtu.lv (J.Grundspeņķis)
0000-0003-2876-5497 (A.Staško); 0000-0003-2526-4662 (J.Grundspeņķis)
© 2024 Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Accordingly, this research investigates in which areas of real estate machine learning is
used, what research methods and algorithms are used, why explainable machine learning
is chosen and what further research might be useful. The research object is explainable
machine learning methods.
The structure of the work is as follows. Chapter 2 explores the types of literature review
used in similar studies. Further, Chapter 3 describes the literature review approach. Then,
the literature review results are presented in Chapter 4. Finally, the conclusions and future
work are summarized in Chapter 5.
2. Method Selection for Literature Review
To choose a suitable literature review method for the research, a search for publications in
the ScienceDirect² database is carried out by searching ("machine learning" AND "literature
review") in article titles and limiting results to 2023. Journal articles from the last year
should be sufficient to reasonably identify the most current approaches. Of the 27 returned
articles, 24 are used; the remaining three are excluded because of availability or title relevance.
Briefly browsing the content of the articles and paying special attention to the research
method section, it is found that 20 out of 24 use a systematic literature review. On closer
examination, it is seen that the majority lean towards the Preferred Reporting Items for
Systematic reviews and Meta-Analyses (PRISMA) guidelines [20] or towards various
modifications of Kitchenham and Brereton's systematic review [12].
Considering that Kitchenham and Brereton's approach [12] specializes in software engineering
literature reviews, while the PRISMA guidelines [20] originate from the medical field,
Kitchenham and Brereton's version [12] is adopted within the scope of this study. The next
chapter describes the literature review approach.
3. Literature Review Protocol
The literature review adapted from Kitchenham and Brereton's version [12] is performed
as follows:
1. Define research questions for the literature review.
2. Perform an initial search in the ScienceDirect database by searching for review
articles related to research questions to ensure that a similar literature review
has not already been conducted.
3. Perform a manual search in the ScienceDirect database by searching for articles
related to research questions. Select candidate papers based on abstract & title.
4. Iteratively perform forward and backward snowballing in the Scopus abstract
and citation database³. Add any missed papers based on abstract & title analysis.
5. Read the full version of selected papers and apply detailed inclusion/exclusion
criteria during the data extraction and quality assessment process.
² https://www.sciencedirect.com/ (accessed December 21, 2023)
³ https://www.scopus.com/ (accessed January 6, 2024)
The authors believe that the use of the combination of ScienceDirect and Scopus provides
sufficient coverage of reliable literature sources.
3.1. Research Questions
The cornerstone of a systematic literature review is the definition of research questions. So,
to achieve the goals set for the research, the research questions are:
• RQ1. In what subfields of real estate is explainable machine learning applied?
• RQ2. What research methods are used to study explainable machine learning in
the field of real estate?
• RQ3. What machine learning methods are used in the field of real estate?
• RQ4. What explainable machine learning methods are used in the field of real
estate?
• RQ5. Why are explainable machine learning methods used in the field of real
estate?
• RQ6. What are the research gaps in explainable machine learning in the field of
real estate?
Further, the results of the availability of similar studies in the literature are analyzed.
3.2. Initial Search
To ensure that a similar reliable literature review is not available, ScienceDirect⁴ is searched
for keywords related to the research. Results for a search within article titles, abstracts
and keywords are summarized in Table 1.
Table 1
Initial search results
Search phrase | Results
("real estate" AND "explainable machine learning" AND "overview") | 0
("real estate" AND "explainable machine learning" AND "review") | 0
("real estate" AND "explainable machine learning" AND "survey") | 0
("real estate" AND "explainable artificial intelligence" AND "overview") | 0
("real estate" AND "explainable artificial intelligence" AND "review") | 0
("real estate" AND "explainable artificial intelligence" AND "survey") | 0
⁴ https://www.sciencedirect.com/ (accessed January 6, 2024)
The initial search returns no results, which indicates that a similar literature review is
not yet available in the searched database. It is therefore justified to carry out the
intended literature review. Next, manual search results are summarized.
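The six phrases in Table 1 combine one domain term with two topic terms and three review-type terms. As a minimal sketch (the helper name `build_phrases` is hypothetical and not part of the original study), they can be generated as a Cartesian product:

```python
from itertools import product

# Terms copied from Table 1; the helper name build_phrases is hypothetical.
TOPIC_TERMS = ["explainable machine learning", "explainable artificial intelligence"]
REVIEW_TERMS = ["overview", "review", "survey"]


def build_phrases(domain, topic_terms, review_terms):
    """Return one boolean AND phrase per (topic, review type) combination."""
    return [
        f'("{domain}" AND "{topic}" AND "{kind}")'
        for topic, kind in product(topic_terms, review_terms)
    ]


phrases = build_phrases("real estate", TOPIC_TERMS, REVIEW_TERMS)
for phrase in phrases:
    print(phrase)
```

Generating the phrases programmatically makes it harder to skip a combination by accident when the term lists grow.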
3.3. Manual Search
The manual search is performed in the ScienceDirect⁵ database by searching for research
articles by the phrase ("real estate" AND ("explainable machine learning" OR "explainable
artificial intelligence" OR "XAI")) in article titles, abstracts and keywords. A total of five
articles are found [1], [8], [10], [13], [19]. After reading the title and abstract, all are
accepted as relevant for further research. If there are other publications, the authors trust
that they will be discovered in the process of snowballing in the Scopus database.
3.4. Forward & Backward Snowballing
In forward snowballing, all articles citing the examined article are reviewed; in backward
snowballing, all articles referenced by the examined article are reviewed. Both rely on the
Scopus⁵ database, and the relevant articles are selected.
In the first iteration, the articles found during the manual search are examined. In every
subsequent iteration, the articles found in the previous iteration are examined. Articles are
accepted as relevant if they were published between 2019 and 2023, are available in full
text, and their title or abstract reflects a connection with the field of real estate and the
use of explainable machine learning methods. A total of three iterations are performed.
During the third iteration, no new articles are found, so the snowballing is not continued.
The summary of all iterations and results is given in Table 2. Snowballing adds 12 new
articles to the research.
Table 2
Summary of newly discovered and relevant articles discovered during forward and backward snowballing
Iter. | Source articles | Forward snowballing | Backward snowballing | Total
1 | 5 | 4: [2], [6], [15], [17] | 6: [3], [9], [14], [16], [21], [22] | 10
2 | 10 | 1: [5] | 1: [23] | 2
3 | 2 | 0 | 0 | 0
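The iterative procedure summarized in Table 2 can be sketched as a simple loop. This is an illustration under stated assumptions, not the authors' tooling: `cited_by` and `references` are hypothetical lookup tables standing in for the Scopus citation data, and `is_relevant` stands in for the manual title and abstract screening.

```python
def snowball(seeds, cited_by, references, is_relevant):
    """Repeat forward and backward snowballing until an iteration finds nothing new.

    cited_by and references are hypothetical dictionaries (article id -> list of
    article ids) that stand in for a citation database.
    """
    accepted = set(seeds)
    frontier = list(seeds)
    new_per_iteration = []
    while frontier:
        found = []
        for article in frontier:
            forward = cited_by.get(article, [])     # articles citing this one
            backward = references.get(article, [])  # articles this one cites
            for candidate in forward + backward:
                if candidate not in accepted and is_relevant(candidate):
                    accepted.add(candidate)
                    found.append(candidate)
        new_per_iteration.append(len(found))
        frontier = found  # the next iteration examines only the new articles
    return accepted, new_per_iteration


# Toy citation data, invented purely for illustration.
toy_cited_by = {"s1": ["a"], "a": ["c"]}
toy_references = {"s2": ["b", "x"]}
accepted, counts = snowball(
    ["s1", "s2"], toy_cited_by, toy_references, lambda art: art != "x"
)
print(sorted(accepted), counts)
```

As in Table 2, the loop ends with an iteration that contributes no new articles.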
Once a list of relevant articles for further research is obtained, during the data extraction
step the quality of articles is evaluated in detail and the answers to the research questions
are clarified.
3.5. Data Extraction
According to the research questions, data extraction and quality assessment are performed
by reading the full text of each article. While the answers to RQ1, RQ3 and RQ4 are readily
apparent, RQ2, RQ5 and RQ6 require additional effort. Almost none of the articles mention
the exact research method used. Some of them mention a case study [3], [9], [10], [14] or a
literature review [2], [8], [22]; however, closer examination shows that the primary
research method is a laboratory experiment. Similarly, the justification of the need for
explainable machine learning has to be worked out. Several articles take this need for
granted, so a detailed analysis of the benefits of explainability is performed to determine
the real need. Determining research gaps is the most difficult task. Therefore, the future
research questions mentioned in each article are identified first, and the actual research
gaps are discussed afterwards. The data extraction results are presented in Appendix A.
⁵ https://www.scopus.com/ (accessed January 7, 2024)
4. Results
The literature review discovered 17 publications from scientific journals with Scopus
CiteScore values between 3.3 and 14.8 (2023 data updated on 05.01.2024). While the journal
Habitat International⁶ is ranked first in terms of the number of articles, the journal
Reliability Engineering and System Safety⁷ has the highest CiteScore, 14.8. The full journal
list is presented in Table 3. These results show that all articles are published in
recognized journals.
Table 3
Journals presenting discovered articles
Journal | Cite Score 2023 | Articles
Reliability Engineering and System Safety | 14.8 | 1
Land Use Policy | 13.3 | 1
Expert Systems with Applications | 13.2 | 2
Finance Research Letters | 10.8 | 2
Habitat International | 10.2 | 3
Applied Geography | 7.8 | 1
Big Data Research | 7.8 | 1
Sensors | 6.9 | 1
International Journal of Geo-Information (ISPRS) | 6.7 | 1
Real Estate Economics | 4.0 | 1
Journal of Real Estate Finance and Economics | 3.7 | 1
Risks | 3.6 | 1
Buildings | 3.3 | 1
The literature study identified 64 authors publishing on the application of explainable
machine learning in real estate. In terms of citations, the most significant works are those
of Kang, Zhang et al. [9] with 86 citations, Chen, Yao et al. [3] with 39 citations and
Rico-Juan & Taltavull de La Paz [21] with 38 citations. A visualization demonstrates the
scope of the authors' contribution (Figure 1).
⁶ https://www.sciencedirect.com/journal/habitat-international
⁷ https://www.sciencedirect.com/journal/reliability-engineering-and-system-safety
Figure 1: Author work by citations.
The set of keywords that characterize the reviewed articles is also informative
(Figure 2). They represent the research area.
Figure 2: Keywords presenting discovered articles.
The subsections below summarize the answers to the research questions.
4.1. RQ1: Real estate subfields
The first research question RQ1 is “In what subfields of real estate is explainable machine
learning applied?” The literature study reveals 8 different research subfields in real
estate. The most frequently addressed issue is real estate price prediction [1], [3],
[8], [9], [10], [14], [21], [22], followed by real estate price estimation [5], [16] and real
estate rent price prediction [6], [13]. The remaining subfields are each represented by one
study: understanding of land use intensity [2], real estate fire loss prediction [23], building
thermal comfort requirement prediction [15], stadium fire risk assessment [17] and
credit default prediction of real estate companies [19]. This information indicates in
which areas similar studies could be repeated in a reader’s region, and it shows which
directions have not yet been covered if new research is planned.
4.2. RQ2: Research methods
The second research question RQ2 is “What research methods are used to study explainable
machine learning in the field of real estate?” Evaluating all articles, it can be concluded that
they all represent a laboratory experiment as a research method. This is quite
understandable since building a machine learning model consists of training a model and
evaluating its results using a testing set. Such an approach by default involves a laboratory
experiment.
In addition, it should be noted that in four articles it is mentioned that a case study is
conducted [3], [9], [10], [14]. On the other hand, from the content of three articles, it is
observable that a literature review is carried out [2], [8], [22].
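The laboratory experiment described in this subsection (train a model, then evaluate it on a held-out testing set) can be sketched as follows. Everything in the sketch is an assumption made for illustration: the data are synthetic, the single feature (floor area) and the price formula are invented, and a one-variable least-squares line stands in for the more complex models used in the reviewed articles.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic data (hypothetical): price = 1000 * area + 20000 + noise.
rng = random.Random(0)
areas = [rng.uniform(30, 120) for _ in range(200)]
prices = [1000 * a + 20000 + rng.gauss(0, 5000) for a in areas]

# Hold out the last 20 % as the testing set; the samples are generated
# independently, so no shuffling is needed before the split.
split = int(0.8 * len(areas))
train_x, test_x = areas[:split], areas[split:]
train_y, test_y = prices[:split], prices[split:]

slope, intercept = fit_line(train_x, train_y)
test_mae = mean_absolute_error(test_y, [slope * x + intercept for x in test_x])
print(f"slope={slope:.1f}, intercept={intercept:.1f}, test MAE={test_mae:.1f}")
```

The essential point is that the error is measured only on examples the model has not seen during training.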
4.3. RQ3: Machine learning methods
The third research question RQ3 is “What machine learning methods are used in the field of
real estate?” When searching for answers to this question, two aspects are evaluated:
firstly, which machine learning methods are used and, secondly, which of them achieves the
best results or is the only one tested. The list of the machine learning methods studied in
real estate is provided in Table 4.
The XGBoost method shows the best results or is chosen as appropriate in 7 out of 10
cases [3], [5], [6], [8], [10], [16], [22]. It is followed by Random Forest in 4 out of 10 cases
[2], [14], [17], [21] and LightGBM in 2 out of 4 cases [1], [15]. IBTEM [23], CatBoost [13],
AdaBoost [19] and Gradient boosting machine [9] are each selected in one study. The top
three methods, XGBoost [4], Random Forest [7] & LightGBM [11], are based on decision tree
algorithms. The results are useful as they allow researchers to make research-based choices
about the machine learning method for similar research.
Table 4
Machine learning methods studied in real estate
No | Method | Count | No | Method | Count
1 | XGBoost (#1) | 10 | 12 | EBM | 1
2 | Random Forest (#2) | 10 | 13 | Elastic net | 1
3 | LightGBM (#3) | 4 | 14 | GBDT | 1
4 | AdaBoost | 3 | 15 | GBR | 1
5 | KNN | 3 | 16 | IBTEM | 1
6 | Linear regression | 3 | 17 | Lasso regression | 1
7 | CatBoost | 2 | 18 | Logistic regression | 1
8 | Decision tree | 2 | 19 | Multiple linear regression | 1
9 | Gradient Boosting | 2 | 20 | Naïve Bayes | 1
10 | Ridge regression | 2 | 21 | Neural network (Multilayer perceptron) | 1
11 | SVR | 2 | 22 | SVM | 1
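The "selected in k out of n cases" figures quoted in this subsection can be turned into selection rates. The short sketch below uses only the counts stated in the text for the top three methods; the dictionary layout is an assumption made for illustration.

```python
from fractions import Fraction

# (times selected as best or only method, times tested), as stated in Section 4.3.
selection = {
    "XGBoost": (7, 10),
    "Random Forest": (4, 10),
    "LightGBM": (2, 4),
}

rates = {
    method: Fraction(selected, tested)
    for method, (selected, tested) in selection.items()
}
for method, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{method}: {float(rate):.0%}")
```

Ranking by rate rather than by raw count places LightGBM ahead of Random Forest, although its rate rests on only four studies and should be read with caution.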
4.4. RQ4: Explainable machine learning methods
The fourth research question RQ4 is “What explainable machine learning methods are used
in the field of real estate?” In the field of explainable machine learning, six different methods
are used in the literature: SHAP [1], [3], [6], [8], [10], [13], [15], [17], [19], [21], [23]; FI [2],
[9], [14], [16], [22]; PDPs [13], [14], [16], [22]; PFI [5], [8], [13]; ALE plots [2], [13], [16]; ICE
[19]. The SHAP [18] method and its various modifications are the most widely used. The
SHAP global and local explanations provide an opportunity to explain black box machine
learning techniques. This makes it possible to build a complex black-box machine learning
model that provides the highest possible results while still allowing its operation to be
understood and knowledge to be gained about the field under study.
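SHAP [18] builds on Shapley values from cooperative game theory, which distribute the difference between a prediction and a baseline prediction among the input features. The sketch below computes exact Shapley values for a small model. It is not the SHAP library and not the method of any reviewed article: the price model `toy_price`, the feature names and the baseline are hypothetical, and "removing" a feature is simplified to replacing it with its baseline value.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    names = list(instance)
    n = len(names)

    def value(coalition):
        mixed = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return predict(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Weight of every coalition of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = value(set(subset) | {name})
                without_feature = value(set(subset))
                total += weight * (with_feature - without_feature)
        phi[name] = total
    return phi


def toy_price(f):
    # Hypothetical price model with an area x location interaction; age is unused.
    return 50000 + 1000 * f["area"] + 20000 * f["central"] + 200 * f["area"] * f["central"]


instance = {"area": 80, "central": 1, "age": 10}
baseline = {"area": 50, "central": 0, "age": 10}
phi = shapley_values(toy_price, instance, baseline)
print(phi)
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, and a feature the model ignores receives a contribution of zero. Exact enumeration grows exponentially with the number of features, which is why practical tools rely on approximations or model-specific algorithms.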
4.5. RQ5: The reason for explainable machine learning
The fifth research question RQ5 is “Why are explainable machine learning methods used in
the field of real estate?” The reasons why the authors of the publications chose to use
explainable machine learning methods can be interpreted in different ways. However, all
the studies found in the field of real estate share one goal: to understand the decision or
forecast suggested by the model or to find correlations between the known information
and the predicted outcome. Explainability both provides knowledge of the researched field
and increases users' confidence in the obtained solution.
A detailed analysis can be found in Appendix A.
4.6. RQ6: Research gaps
The sixth research question RQ6 is “What are the research gaps in explainable machine
learning in the field of real estate?” This is the most difficult question to analyze when
studying the literature. The authors of each article indicate possible further work or
improvements as a continuation of their research. However, that does not always indicate
research gaps in general.
Of the 17 studies, 11 note the need to repeat the study with better quality, additional or
different types of data [1], [3], [5], [8], [9], [13], [14], [15], [17], [22], [23]. Eight studies
note the need to improve the performance of algorithms by tuning them or testing others [1],
[5], [9], [13], [14], [16], [21], [22]. Six studies propose trying the solution in a different
geographical location [2], [3], [6], [8], [14], [22]. Four studies encourage trying a solution in
real life or exploring specific aspects of real life [2], [6], [19], [23]. Three studies suggest
improving the speed of the algorithm [5], [16], [17], and three suggest including the time
factor [3], [9], [23] in the analysis of the problem domain. Only two studies suggest improving
model explainability [9], [21]. Finally, one study each encourages comparing the results of
different fields [3], solving the imbalance of the data set [17] or looking for the true causal
dependencies [5].
5. Conclusions
From the conducted literature review it is evident that explainable machine learning
methods in the field of real estate are used to determine property value, rent and price, as
well as land use intensity, fire damage, thermal comfort, fire risk and bankruptcy prediction.
In the field of machine learning, the most suitable research method is a laboratory
experiment, and it is useful to apply a literature review and/or case study, if necessary. The
study also indicates that the decision-tree-based XGBoost, Random Forest & LightGBM
machine learning methods and the SHAP explainable machine learning method are the most
suitable or most used in real estate, providing the results of the highest value. The use of
explainable machine learning is mainly necessary to understand the decision or forecast.
Moreover, it provides an understanding of the researched field and increases trust in the
obtained machine learning model.
On the other hand, the study of research gaps gives only general ideas for further
research. The reviewed studies propose making common improvements to existing solutions,
using additional data, replicating the experiment in other areas or trying the solution in
real-life situations. Scientific innovations could be sought in studies of time factors, model
explainability, training set balance, and causal dependencies. However, before starting
further research in these directions, additional research is needed to clarify what has been
done in specific technical areas that are not limited to real estate.
The results of this literature review can be used for further decisions on the
implementation of similar research in the reader’s region or for the initiation of new /
unexplored research directions in the field of real estate.
Acknowledgements
The research leading to these results is part of the research project "Multi-contextual data
analytics solutions for building management" jointly implemented by Riga Technical
University, SIA "Lursoft IT" and SIA "Hagberg".
References
[1] Baur K., Rosenfelder M. & Lutz B., “Automated real estate valuation with machine
learning models using property descriptions,” Expert Systems with Applications, vol.
213, pp.1-13, Mar. 2023.
[2] Belmiro C., Silveira Neto R.D.M., Barros A. & Ospina R., “Understanding the land use
intensity of residential buildings in Brazil: An ensemble machine learning approach,”
Habitat International, vol. 139, pp.1-12, Sep. 2023.
[3] Chen L., Yao X., Liu Y., Zhu Y., Chen W., Zhao X. & Chi T., “Measuring impacts of urban
environmental elements on housing prices based on multisource data – a case study of
Shanghai, China,” International Journal of Geo-Information (ISPRS), vol. 9 (no. 2), pp.1-
23, Feb. 2020.
[4] Chen T. & Guestrin C., “XGBoost: A Scalable Tree Boosting System,” in Proceedings of
the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining (KDD’16), 2016, pp. 785–794.
[5] Deppner J., von Ahlefeldt-Dehn B., Beracha E. & Schaefers W., “Boosting the Accuracy
of Commercial Real Estate Appraisals: An Interpretable Machine Learning Approach,”
Journal of Real Estate Finance and Economics, pp.1-38, Mar. 2023.
[6] Dou M., Gu Y. & Fan H., “Incorporating neighborhoods with explainable artificial
intelligence for modeling fine-scale housing prices,” Applied Geography, vol. 158, pp.1-
11, Sep. 2023.
[7] Ho T.K., “Random Decision Forests,” in Proceedings of the 3rd International Conference
on Document Analysis and Recognition (ICDAR’95), 1995, pp. 278–282.
[8] Iban M.C., “An explainable model for the mass appraisal of residences: The application
of tree-based Machine Learning algorithms and interpretation of value determinants,”
Habitat International, vol. 128, pp.1-11, Oct. 2022.
[9] Kang Y., Zhang F., Peng W., Gao S., Rao J., Duarte F. & Ratti C., “Understanding house
price appreciation using multi-source big geo-data and machine learning,” Land Use
Policy, vol. 111, pp.1-11, Dec. 2021.
[10] Karamanou A., Kalampokis E. & Tarabanis K., “Linked Open Government Data to Predict
and Explain House Prices: The Case of Scottish Statistics Portal,” Big Data Research, vol.
30, pp.1-15, Nov. 2022.
[11] Ke G., Meng Q., Finley T., Wang T., Chen W., Ma W., Ye Q. & Liu T.Y., “LightGBM: A Highly
Efficient Gradient Boosting Decision Tree,” in Proceedings of the Advances in Neural
Information Processing Systems 30 (NIPS’17), 2017, pp. 3148-3156.
[12] Kitchenham B. & Brereton P., “A systematic review of systematic review process
research in software engineering,” Information and Software Technology, vol. 55 (no.
12), pp. 2049-2075, Dec. 2013.
[13] Lenaers I. & De Moor L., “Exploring XAI techniques for enhancing model transparency
and interpretability in real estate rent prediction: A comparative study,” Finance
Research Letters, vol. 58, pp.1-9, Dec. 2023.
[14] Levantesi S. & Piscopo G., “The Importance of Economic Variables on London Real
Estate Market: A Random Forest Approach,” Risks, vol. 8 (no. 4), pp.1-17, Dec. 2020.
[15] Liu H. & Ma E., “An Explainable Evaluation Model for Building Thermal Comfort in
China,” Buildings, vol. 13 (no. 12), pp.1-20, Dec. 2023.
[16] Lorenz F., Willwersch J., Cajias M. & Fuerst F., “Interpretable machine learning for real
estate market analysis,” Real Estate Economics, vol. 51 (no. 5), pp. 1178-1208, Sep.
2023.
[17] Lu Y., Fan X., Zhang Y., Wang Y. & Jiang X., “Machine Learning Models Using SHapley
Additive exPlanation for Fire Risk Assessment Mode and Effects Analysis of Stadiums,”
Sensors, vol. 23 (no. 4), pp.1-19, Feb. 2023.
[18] Lundberg S.M. & Lee S.I., “A Unified Approach to Interpreting Model Predictions,” in
Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS’17),
2017, pp. 4768–4777.
[19] Ma Y., Zhang P., Duan S. & Zhang T., “Credit default prediction of Chinese real estate
listed companies based on explainable machine learning,” Finance Research Letters,
vol. 58, Dec. 2023.
[20] Moher D., Liberati A., Tetzlaff J., Altman D.G., Antes G., Atkins D., Barbour V., Barrowman
N., Berlin J.A., Clark J., Clarke M., Cook D., D'Amico R., Deeks J.J., Devereaux P.J., Dickersin
K., Egger M., Ernst E., Gøtzsche P.C., Grimshaw J., Guyatt G., Higgins J., Ioannidis J.P.A.,
Kleijnen J., Lang T., Magrini N., McNamee D., Moja L., Mulrow C., Napoli M., Oxman A.,
Pham B., Rennie D., Sampson M., Schulz K.F., Shekelle P.G., Tovey D., Tugwell P.,
“Preferred reporting items for systematic reviews and meta-analyses: The PRISMA
statement,” PLoS Medicine, vol. 6 (no. 7), pp.1-6, Jul. 2009.
[21] Rico-Juan J.R. & Taltavull de La Paz P., “Machine learning with explainability or spatial
hedonics tools? An analysis of the asking prices in the housing market in Alicante,
Spain,” Expert Systems with Applications, vol. 171, pp.1-14, Jun. 2021.
[22] Taecharungroj V., “Google Maps amenities and condominium prices: Investigating the
effects and relationships using machine learning,” Habitat International, vol. 118, pp.1-
12, Dec. 2021.
[23] Wang N., Xu Y. & Wang S., “Interpretable boosting tree ensemble method for
multisource building fire loss prediction,” Reliability Engineering and System Safety,
vol. 225, pp.1-17, Sep. 2022.
A. Data extraction and assessment results
A1 [8]
RQ1 Subfields: Real estate price prediction
RQ2 Research Method: Literature review, Lab experiment
RQ3 Machine Learning Method: XGBoost (Random Forest, XGBoost, LightGBM, Gradient Boosting)
RQ4 Explainable Method: PFI, SHAP
RQ5 Why explain: To identify what the model takes into account when estimating real estate prices
RQ6 Gaps: 1. Test the model in other cities; 2. Repeat the experiment with verified and reliable value data; 3. Repeat the experiment with the addition of socio-economic and demographic data.

A2 [10]
RQ1 Subfields: Real estate price prediction
RQ2 Research Method: Case study, Lab experiment
RQ3 Machine Learning Method: XGBoost
RQ4 Explainable Method: SHAP
RQ5 Why explain: 1. To justify the reliability of a predictive model; 2. To understand which factors affect and determine the prices of houses.
RQ6 Gaps: n/a

A3 [1]
RQ1 Subfields: Real estate price prediction
RQ2 Research Method: Lab experiment
RQ3 Machine Learning Method: LightGBM (Linear regression, elastic net, SVR, random forest, LightGBM)
RQ4 Explainable Method: SHAP
RQ5 Why explain: To understand how the model arrives at decisions
RQ6 Gaps: 1. Validate the influence of different descriptions on real estate price in a controlled laboratory experiment; 2. Prove that the difference between non-contextualized methods and contextualized embeddings increases even more through fine-tuning a pre-trained BERT model; 3. Repeat the experiment on real estate descriptions in other languages than English and German; 4. Extend the approach to the textual descriptions of short-term rent offers like hotel rooms or AirBnB offers.

A4 [13]
RQ1 Subfields: Real estate rent price prediction
RQ2 Research Method: Lab experiment
RQ3 Machine Learning Method: CatBoost (Ridge regression, XGBoost, CatBoost)
RQ4 Explainable Method: ALE plots, PDPs, PFI, SHAP
RQ5 Why explain: To gain a comprehensive understanding of the factors driving rent
RQ6 Gaps: Validate methodology on other real estate or financial-economic datasets and models to deepen our understanding of the substitutability, complementarity, benefits, and limitations of XAI techniques in finance

A5 [19]
RQ1 Subfields: Credit default prediction of real estate companies
RQ2 Research Method: Lab experiment
RQ3 Machine Learning Method: AdaBoost (AdaBoost, EBM, Logistic regression, Random forest, SVM)
RQ4 Explainable Method: ICE, SHAP
RQ5 Why explain: To clearly understand the ranking of feature importance and the impact on the prediction results
RQ6 Gaps: Implement results for practical applications.

A6 [2]
RQ1 Subfields: Understand the land use intensity
RQ2 Research Method: Literature review, Lab experiment
RQ3 Machine Learning Method: Random forest (Random forest, XGBoost)
RQ4 Explainable Method: ALE plots, FI
RQ5 Why explain: To understand the factors responsible for the higher urban land use intensity in cities
RQ6 Gaps: 1. Consider other urban realities; 2. Apply the model on commercial lots and its variation according to economic activities; 3. Investigate urban physical structure of urban centers.

A7 [6]
RQ1 Subfields: Real estate rent price prediction
RQ2 Research Method: Lab experiment
RQ3 Machine Learning Method: XGBoost
RQ4 Explainable Method: SHAP
RQ5 Why explain: To understand the relationships between housing units and their neighbourhoods
RQ6 Gaps: 1. Test the model in other cities; 2. Analyze neighbourhood characteristic interactive or synergetic impacts on housing prices.

A8 [15]
RQ1 Subfields: Building thermal comfort requirement prediction
RQ2 Research Method: Lab experiment
RQ3 Machine Learning Method: LightGBM (Bayesian-optimized LightGBM, KNN, Random forest, XGBoost, GBDT, SVR)
RQ4 Explainable Method: SHAP
RQ5 Why explain: To understand the thermal requirements of building occupants
RQ6 Gaps: Incorporate additional variables in the model

A9 [17]
RQ1 Subfields: Stadium fire risk assessment
RQ2 Research Method: Lab experiment
RQ3 Machine Learning Method: Random forest (Naïve Bayes, KNN, Decision tree, AdaBoost, LightGBM, Random forest)
RQ4 Explainable Method: SHAP
RQ5 Why explain: To find the complex nonlinear relationship between risk features and stadium fire risk.
RQ6 Gaps: 1. Repeat the experiment with additional data; 2. Explore ways to solve the label imbalance; 3. Increase operational efficiency and reduce time costs.

A10 [3]
RQ1 Subfields: Real estate price prediction
RQ2 Research Method: Case study, Lab experiment
RQ3 Machine Learning Method: XGBoost (Linear Regression, XGBoost, Random forest, GBR)
RQ4 Explainable Method: SHAP
RQ5 Why explain: To explain the impacts of urban environmental elements on housing prices
RQ6 Gaps: 1. Test the model in other cities; 2. Quantify the differences among cities; 3. Integrate multi-year data to analyze the temporal dynamics of the impacts of the urban environmental elements on housing prices; 4. Repeat the experiment with additional and improved data.

A11 [22]
RQ1 Subfields: Real estate price prediction
RQ2 Research Method: Literature review, Lab experiment
RQ3 Machine Learning Method: XGBoost (Random forest, XGBoost)
RQ4 Explainable Method: FI, PDPs
RQ5 Why explain: To investigate the relationship between neighborhood amenities and the prices of condominiums
RQ6 Gaps: 1. Repeat the experiment with additional data; 2. Additionally test and tune the model; 3. Identifying the similarities and differences in the importance of amenities in various geographical areas.

A12 [14]
RQ1 Subfields: Real estate price prediction
RQ2 Research Method: Case study, Lab experiment
RQ3 Machine Learning Method: Random forest
RQ4 Explainable Method: FI, PDPs
RQ5 Why explain: To better explain which variables have more importance in describing the evolution of the house price following an urban approach
RQ6 Gaps: 1. Repeat the experiment with additional and improved data; 2. Repeat the experiment with different machine learning algorithms; 3. Repeat the experiment on other real estate datasets.

A13 [9]
RQ1 Subfields: Real estate price prediction
RQ2 Research Method: Case study, Lab experiment
RQ3 Machine Learning Method: Gradient boosting machine (GBM) with decision trees (Gradient boosting machine (GBM), Multiple linear regression (MLR))
RQ4 Explainable Method: FI
RQ5 Why explain: To examine the effect of different variables on house price appreciation
RQ6 Gaps: 1. Test the model in other cities; 2. Repeat the experiment with DCNN; 3. Include dynamics of urban land use changes into the framework with richer datasets; 4. Add deeper exploration and more explanations; 5. Involve more time-series data and approaches from the economy to build causality relationships and improve the interpretability of the model.

A14 [21]
RQ1 Subfields: Real estate price prediction
RQ2 Research Method: Lab experiment
RQ3 Machine Learning Method: Random forest (KNN, Decision tree, Random forest, AdaBoost, CatBoost, Neural network (Multilayer perceptron), Linear regression, Ridge regression, Lasso regression)
RQ4 Explainable Method: SHAP
RQ5 Why explain: To observe non-linear relationships between housing prices and housing attributes
RQ6 Gaps: 1. Repeat the experiment with deep artificial neural networks; 2. Resolve explainability challenges in deep artificial neural networks.

A15 [16]
RQ1 Subfields: Real estate price estimation
RQ2 Research Method: Lab experiment
RQ3 Machine Learning Method: XGBoost
RQ4 Explainable Method: ALE plots, FI, PDPs
RQ5 Why explain: To justify decisions and generate new insights
RQ6 Gaps: 1. Enhance model building speed; 2. Improve the reliability and validity of algorithmic decision-making.

A16 [23]
RQ1 Subfields: Real estate fire loss prediction
RQ2 Research Method: Lab experiment
RQ3 Machine Learning Method: IBTEM (Catboost, XGBoost, LightGBM)
RQ4 Explainable Method: SHAP
RQ5 Why explain: To understand the reasons for making certain decisions or predictions
RQ6 Gaps: 1. Use a model to assist relevant departments in making timelier decisions regarding dispatching aid and mobilizing resources; 2. Repeat the experiment with updated data; 3. Include time series forecasting in the building fire loss prediction.

A17 [5]
RQ1 Subfields: Real estate price estimation
RQ2 Research Method: Lab experiment
RQ3 Machine Learning Method: XGBoost
RQ4 Explainable Method: PFI
RQ5 Why explain: To make informed decisions
RQ6 Gaps: 1. Improve data availability for machine learning experiments; 2. Justify the rationale behind patterns or determine causality in the relation between input and output data; 3. Enhance model building speed; 4. Improve machine learning algorithms for the field of real estate.