=Paper= {{Paper |id=Vol-3698/paper6 |storemode=property |title=Literature Review of Explainable Machine Learning in Real Estate |pdfUrl=https://ceur-ws.org/Vol-3698/paper6.pdf |volume=Vol-3698 |authors=Arnis Staško,Jānis Grundspeņķis |dblpUrl=https://dblp.org/rec/conf/balt/StaskoG24 }} ==Literature Review of Explainable Machine Learning in Real Estate== https://ceur-ws.org/Vol-3698/paper6.pdf
                                Literature Review of
                                Explainable Machine Learning in Real Estate
                                Arnis Staško and Jānis Grundspeņķis

                                Riga Technical University, 6A Kipsalas Street, Riga, LV-1048, Latvia

                                                Abstract
                                                A literature review is conducted on explainable machine learning methods used in real estate. It
                                                identifies 17 relevant articles that reveal various subfields of real estate and the explainable
                                                machine learning methods used. Among them, XGBoost and SHAP is the most commonly used
                                                combination for explainable machine learning in the studied area. The study also identifies
                                                research gaps that could be addressed through further studies on time factors, model
                                                explainability, training set balance, and causal dependencies.

                                                Keywords
                                                Real estate, explainable machine learning, research methods, literature review



                                1. Introduction
                                The demand for artificial intelligence applications has grown significantly in the last decade.
                                Companies are looking for ways to integrate artificial intelligence solutions into their
                                processes to improve their product or service and competitiveness in the market, as well as
                                to reduce the required amount of labour or costs. Real estate companies are no exception.
                                There is a shortage of labour, and customers expect lower operating costs under competitive
                                conditions. It is essential to make the right decisions about real estate and its management
                                where the number of influencing factors is large and difficult for a person to grasp.
                                Therefore, artificial intelligence solutions could help.
                                   Artificial intelligence studies methods for developing intelligent machines or software
                                that imitate human behaviour. Although people usually talk about the need to implement
                                an artificial intelligence solution, in practice it often results in the development of machine
                                learning solutions. Machine learning is a subfield of artificial intelligence that creates
                                software models from training examples to perform prediction, recognition, or clustering.
                                Diverse machine learning algorithms allow us to train systems so that they gain autonomy,
                                but the disadvantage of the most common ones based on neural networks is the inability to
                                explain the obtained result (black box). Therefore, there is an increased interest in
                                explainable machine learning methods, which would not only provide predictions or


                                Baltic DB&IS Conference Forum and Doctoral Consortium 2024
                                   arnis.stasko@rtu.lv (A.Staško); janis.grundspenkis@rtu.lv (J.Grundspeņķis)
                                   0000-0003-2876-5497 (A.Staško); 0000-0003-2526-4662 (J.Grundspeņķis)
                                             © 2024 Copyright for this paper by its authors.
                                             Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).




CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
recommend decisions but would also argue for the recommended solution (white box). In
the field of real estate, people are not ready to blindly trust artificial intelligence to make a
decision about the most expensive thing they own. Explainability is therefore critical.
   Accordingly, this research investigates in which areas of real estate machine learning is
used, which research methods and algorithms are applied, why explainable machine
learning is chosen and what further research might be useful. The research object is
explainable machine learning methods.
   The structure of the work is as follows. Chapter 2 explores the types of literature review
used in similar studies. Further, Chapter 3 describes the literature review approach. Then,
the literature review results are presented in Chapter 4. Finally, the conclusions and future
work are summarized in Chapter 5.

2. Method Selection for Literature Review
To choose a suitable literature review method for the research, the ScienceDirect² database
is searched for ("machine learning" AND "literature review") in article titles, with results
limited to 2023. Journal articles from the last year should be sufficient to reasonably identify
the most current approaches. Of the 27 returned articles, 24 are used; the remaining three
are excluded because the full text is unavailable or the title is not relevant.
    A brief review of the articles, with special attention to the research method section,
shows that 20 out of 24 use a systematic literature review. On closer examination, the
majority follow either the Preferred Reporting Items for Systematic reviews and
Meta-Analyses (PRISMA) guidelines [20] or one of the modifications of the systematic
review described by Kitchenham and Brereton [12].
    Since the approach of Kitchenham and Brereton [12] specializes in software engineering
literature reviews, while the PRISMA guidelines [20] originate from the medical field, this
study adopts the Kitchenham and Brereton version [12]. The next chapter describes the
literature review approach.

3. Literature Review Protocol
The literature review adapted from Kitchenham and Brereton's version [12] is performed
as follows:

      1. Define research questions for the literature review.
      2. Perform an initial search in the ScienceDirect database by searching for review
         articles related to research questions to ensure that a similar literature review
         has not already been conducted.
      3. Perform a manual search in the ScienceDirect database by searching for articles
         related to research questions. Select candidate papers based on abstract & title.
      4. Iteratively perform forward and backward snowballing in the Scopus abstract
         and citation database³. Add any missed papers based on abstract & title analysis.


² https://www.sciencedirect.com/ (accessed December 21, 2023)
³ https://www.scopus.com/ (accessed January 6, 2024)




      5. Read the full version of selected papers and apply detailed inclusion/exclusion
         criteria during the data extraction and quality assessment process.

The authors believe that the use of the combination of ScienceDirect and Scopus provides
sufficient coverage of reliable literature sources.

3.1. Research Questions
The cornerstone of a systematic literature review is the definition of research questions. So,
to achieve the goals set for the research, the research questions are:

       •   RQ1. In what subfields of real estate is explainable machine learning applied?
       •   RQ2. What research methods are used to study explainable machine learning in
           the field of real estate?
       •   RQ3. What machine learning methods are used in the field of real estate?
       •   RQ4. What explainable machine learning methods are used in the field of real
           estate?
       •   RQ5. Why are explainable machine learning methods used in the field of real
           estate?
       •   RQ6. What are the research gaps in explainable machine learning in the field of
           real estate?

Next, the availability of similar studies in the literature is analyzed.

3.2. Initial Search
To ensure that a similar reliable literature review is not already available, ScienceDirect⁴ is
searched for keywords related to the research. The results of a search within article titles,
abstracts and keywords are summarized in Table 1.

Table 1
Initial search results
                                     Search phrase                                   Results
            ("real estate" AND "explainable machine learning" AND "overview")          0
            ("real estate" AND "explainable machine learning" AND "review")            0
            ("real estate" AND "explainable machine learning" AND "survey")            0
            ("real estate" AND "explainable artificial intelligence" AND "overview")   0
            ("real estate" AND "explainable artificial intelligence" AND "review")     0
            ("real estate" AND "explainable artificial intelligence" AND "survey")     0




⁴ https://www.sciencedirect.com/ (accessed January 6, 2024)




   The initial search returns no results, which indicates that no similar literature review is
available in ScienceDirect. It is therefore justified to carry out the intended literature
review. Next, manual search results are summarized.

3.3. Manual Search
The manual search is performed in the ScienceDirect⁵ database by searching for research
articles with the phrase ("real estate" AND ("explainable machine learning" OR "explainable
artificial intelligence" OR "XAI")) in article titles, abstracts and keywords. A total of five
articles are found [1], [8], [10], [13], [19]. After reading the title and abstract, all are
accepted as relevant for further research. The authors trust that any other relevant
publications will be discovered in the process of snowballing in the Scopus database.
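
The boolean search phrase used for the manual search can be read as a simple predicate over a record's title, abstract and keywords. The following is a minimal sketch of that logic only. It does not reproduce the ScienceDirect search engine, and the function name `matches_query` and its whole-word, case-insensitive matching rule are assumptions made for illustration.

```python
import re


def matches_query(title: str, abstract: str, keywords: list[str]) -> bool:
    """Evaluate "real estate" AND ("explainable machine learning" OR
    "explainable artificial intelligence" OR "XAI") on one record.

    Hypothetical helper: matching is case-insensitive and on whole words,
    which is an assumption and not the documented database behaviour.
    """
    text = " ".join([title, abstract, *keywords]).lower()

    def contains(term: str) -> bool:
        return re.search(r"\b" + re.escape(term) + r"\b", text) is not None

    alternatives = (
        "explainable machine learning",
        "explainable artificial intelligence",
        "xai",
    )
    return contains("real estate") and any(contains(t) for t in alternatives)


# Illustrative records (invented for the example, not search results).
hit = matches_query("House prices", "We apply XAI.", ["real estate"])
miss = matches_query("Real estate prices", "A hedonic model", ["housing"])
print(hit, miss)
```

A record is kept only when both parts of the conjunction hold, which is why a real estate article that does not mention any of the three explainability terms is rejected.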

3.4. Forward & Backward Snowballing
In forward snowballing, all articles citing the examined article are reviewed; in backward
snowballing, all articles referenced by the examined article are reviewed. Both are carried
out in the Scopus⁵ database, and the relevant articles are selected.
   In the first iteration, the articles found during manual search are examined. In every
subsequent iteration, the articles found in the previous iteration are examined. An article is
accepted as relevant if it was published between 2019 and 2023, its full text is available, and
its title or abstract reflects a connection with the field of real estate and the use of
explainable machine learning methods. A total of three iterations are performed. During the
third iteration, no new articles are found, so the snowballing is stopped. The summary of all
iterations and results is given in Table 2. Snowballing adds 12 new articles to the research.

Table 2
Summary of newly discovered and relevant articles discovered during
forward and backward snowballing
    Iter.  Source articles  Forward snowballing      Backward snowballing                 Total
    1      5                4: [2], [6], [15], [17]  6: [3], [9], [14], [16], [21], [22]  10
    2      10               1: [5]                   1: [23]                              2
    3      2                0                        0                                    0
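
The iterative procedure summarized in Table 2 can be sketched as a loop that stops when an iteration yields no new article. This is a minimal sketch under stated assumptions: `cited_by`, `references` and `is_relevant` are hypothetical callbacks standing in for the Scopus lookups and the manual title and abstract screening, and the toy citation graph is invented for the example.

```python
def snowball(seeds, cited_by, references, is_relevant, max_iterations=10):
    """Repeat forward and backward snowballing until nothing new is found.

    cited_by(p) and references(p) are hypothetical lookups returning the
    articles that cite p and the articles that p cites; is_relevant(p)
    stands in for the manual title/abstract screening.
    """
    accepted = set(seeds)
    frontier = list(seeds)
    history = []  # (source articles, forward hits, backward hits) per iteration
    for _ in range(max_iterations):
        forward, backward = [], []
        for paper in frontier:
            for candidate in cited_by(paper):  # forward snowballing
                if candidate not in accepted and is_relevant(candidate):
                    accepted.add(candidate)
                    forward.append(candidate)
            for candidate in references(paper):  # backward snowballing
                if candidate not in accepted and is_relevant(candidate):
                    accepted.add(candidate)
                    backward.append(candidate)
        history.append((len(frontier), len(forward), len(backward)))
        frontier = forward + backward  # only new articles feed the next round
        if not frontier:
            break
    return accepted, history


# Toy citation graph (invented): "A" cites "S"; "S" cites "B"; "A" cites "C".
citing = {"S": ["A"]}
cited = {"S": ["B"], "A": ["C"]}
accepted, history = snowball(
    ["S"],
    cited_by=lambda p: citing.get(p, []),
    references=lambda p: cited.get(p, []),
    is_relevant=lambda p: True,
)
print(sorted(accepted), history)
```

As in the review, the last recorded iteration examines the articles found in the previous one and discovers nothing new, which ends the process.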

   Once a list of relevant articles for further research is obtained, during the data extraction
step the quality of articles is evaluated in detail and the answers to the research questions
are clarified.

3.5. Data Extraction
According to the research questions, data extraction and quality assessment are performed
by reading the full text of each article. While the answers to RQ1, RQ3 and RQ4 are readily
apparent, RQ2, RQ5 and RQ6 require additional effort. Almost none of the articles mention
the exact research method used. Some of them mention a case study [3], [9], [10], [14] or a
literature review [2], [8], [22]; however, closer examination shows that the primary research
method is a laboratory experiment. Similarly, the justification of the need for explainable
machine learning has to be derived. Several articles take it for granted, so a detailed
analysis of the benefits of explainability is performed to determine the real need. Research
gaps are the most difficult to determine. Therefore, the future research questions mentioned
in each article are identified first, and then the actual research gaps are discussed. The data
extraction results are presented in Appendix A.

⁵ https://www.scopus.com/ (accessed January 7, 2024)

4. Results
The literature review discovered 17 publications from scientific journals with Scopus
CiteScore values between 3.3 and 14.8 (2023 data, updated on 05.01.2024). While the journal
Habitat International⁶ is ranked first in terms of the number of articles, the journal
Reliability Engineering and System Safety⁷ has the highest CiteScore, 14.8. The full journal
list is presented in Table 3. These results show that all articles are published in recognized
journals.

Table 3
Journals presenting discovered articles
                            Journal                     Cite Score 2023 Articles
       Reliability Engineering and System Safety              14.8         1
       Land Use Policy                                        13.3         1
       Expert Systems with Applications                       13.2         2
       Finance Research Letters                               10.8         2
       Habitat International                                  10.2         3
       Applied Geography                                       7.8         1
       Big Data Research                                       7.8         1
       Sensors                                                 6.9         1
       International Journal of Geo-Information (ISPRS)        6.7         1
       Real Estate Economics                                   4.0         1
       Journal of Real Estate Finance and Economics            3.7         1
       Risks                                                   3.6         1
       Buildings                                               3.3         1

   The literature study identified 64 authors publishing on the application of explainable
machine learning in real estate. In terms of citations, the most significant works are those
of Kang et al. [9] with 86 citations, Chen et al. [3] with 39 citations and Rico-Juan &
Taltavull de La Paz [21] with 38 citations. Visualization is used to demonstrate the scope of
the authors' contribution (Figure 1).

⁶ https://www.sciencedirect.com/journal/habitat-international
⁷ https://www.sciencedirect.com/journal/reliability-engineering-and-system-safety




Figure 1: Author work by citations.


It is also informative to examine the set of keywords that illustrate the topics of the
reviewed articles (Figure 2). They represent the research area.




Figure 2: Keywords presenting discovered articles.




The subsections below summarize the answers to the research questions.


4.1. RQ1: Real estate subfields
The first research question RQ1 is “In what subfields of real estate is explainable machine
learning applied?” The literature study reveals 8 different research subfields in real
estate. The most frequently addressed issue is real estate price prediction [1], [3],
[8], [9], [10], [14], [21], [22], followed by real estate price estimation [5], [16] and real
estate rent price prediction [6], [13]. One study each addresses understanding of land use
intensity [2], real estate fire loss prediction [23], building thermal comfort requirement
prediction [15], stadium fire risk assessment [17] and credit default prediction of real
estate companies [19]. This information indicates the areas in which similar studies could be
repeated in a reader’s region, and it also shows which directions have not yet been covered
in case new research is undertaken.

4.2. RQ2: Research methods
The second research question RQ2 is “What research methods are used to study explainable
machine learning in the field of real estate?” Evaluating all articles, it can be concluded that
they all represent a laboratory experiment as a research method. This is quite
understandable since building a machine learning model consists of training a model and
evaluating its results using a testing set. Such an approach by default involves a laboratory
experiment.
   In addition, it should be noted that in four articles it is mentioned that a case study is
conducted [3], [9], [10], [14]. On the other hand, from the content of three articles, it is
observable that a literature review is carried out [2], [8], [22].
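
The laboratory experiment described here, training a model and then evaluating it on a separate testing set, can be illustrated with a hold-out split. The sketch below is not taken from any reviewed study: the data are synthetic, and a one-feature least-squares line stands in for the tree-based models discussed in the review so that the example needs only the standard library.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def mean_absolute_error(y_true, y_pred):
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Synthetic data (an assumption made for the example):
# price = 2000 * area + 50000 + Gaussian noise.
rng = random.Random(0)
areas = [rng.uniform(30, 150) for _ in range(200)]
prices = [2000 * a + 50_000 + rng.gauss(0, 10_000) for a in areas]

# Hold out 20 % of the shuffled examples as the testing set.
data = list(zip(areas, prices))
rng.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

# Train on the training set only, then evaluate on unseen examples.
slope, intercept = fit_line([a for a, _ in train], [p for _, p in train])
predictions = [slope * a + intercept for a, _ in test]
mae = mean_absolute_error([p for _, p in test], predictions)
print(f"slope={slope:.1f}, intercept={intercept:.1f}, test MAE={mae:.1f}")
```

The important design point is that the testing examples play no part in fitting, so the reported error estimates how the model behaves on data it has not seen.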

4.3. RQ3: Machine learning methods
The third research question RQ3 is “What machine learning methods are used in the field of
real estate?” To answer this question, two aspects are evaluated: firstly, which machine
learning methods are used and, secondly, which of them shows the best results or is the only
one tested. The list of the machine learning methods studied in real estate is provided in
Table 4.
    The XGBoost method shows the best results or is chosen as appropriate in 7 out of the 10
studies that test it [3], [5], [6], [8], [10], [16], [22]. It is followed by Random Forest in 4
out of 10 [2], [14], [17], [21] and LightGBM in 2 out of 4 [1], [15]. IBTEM [23], CatBoost [13],
AdaBoost [19] and Gradient boosting machine [9] each come first in one study. The top three
methods, XGBoost [4], Random Forest [7] and LightGBM [11], are based on decision tree
algorithms. The results are useful because they allow a research-based choice of machine
learning method for similar research.
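
The proportions quoted for RQ3 can be re-derived from the reference lists in the text and the study counts in Table 4. The short tally below is only a bookkeeping sketch; the variable names are chosen for illustration, and rates are computed only for the three methods whose denominators are stated in the text.

```python
# Studies in which each method performed best or was the chosen method
# (reference numbers as listed in the RQ3 discussion).
best_in = {
    "XGBoost": [3, 5, 6, 8, 10, 16, 22],
    "Random Forest": [2, 14, 17, 21],
    "LightGBM": [1, 15],
    "IBTEM": [23],
    "CatBoost": [13],
    "AdaBoost": [19],
    "Gradient boosting machine": [9],
}

# Number of studies that tested the method (counts from Table 4).
tested_in = {"XGBoost": 10, "Random Forest": 10, "LightGBM": 4}

# Every reviewed article should appear exactly once among the "best" lists.
all_refs = [ref for refs in best_in.values() for ref in refs]
total_articles = len(set(all_refs))

win_rate = {m: len(best_in[m]) / n for m, n in tested_in.items()}
for method, rate in win_rate.items():
    print(f"{method}: {len(best_in[method])}/{tested_in[method]} = {rate:.0%}")
print("articles covered:", total_articles)
```

The check on `total_articles` confirms that the seven lists partition the 17 reviewed articles, so each study contributes exactly one best-performing method.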




Table 4
Machine learning methods studied in real estate
      No    Method                Count      No    Method                                    Count
      1     XGBoost (#1)          10         12    EBM                                       1
      2     Random Forest (#2)    10         13    Elastic net                               1
      3     LightGBM (#3)         4          14    GBDT                                      1
      4     AdaBoost              3          15    GBR                                       1
      5     KNN                   3          16    IBTEM                                     1
      6     Linear regression     3          17    Lasso regression                          1
      7     CatBoost              2          18    Logistic regression                       1
      8     Decision tree         2          19    Multiple linear regression                1
      9     Gradient Boosting     2          20    Naïve Bayes                               1
      10    Ridge regression      2          21    Neural network (Multilayer perceptron)    1
      11    SVR                   2          22    SVM                                       1


4.4. RQ4: Explainable machine learning methods
The fourth research question RQ4 is “What explainable machine learning methods are used
in the field of real estate?” Six different explainable machine learning methods are used in
the literature: SHAP [1], [3], [6], [8], [10], [13], [15], [17], [19], [21], [23]; feature
importance (FI) [2], [9], [14], [16], [22]; partial dependence plots (PDPs) [13], [14], [16],
[22]; permutation feature importance (PFI) [5], [8], [13]; accumulated local effects (ALE)
plots [2], [13], [16]; and individual conditional expectation (ICE) [19]. The SHAP [18]
method and its various modifications are the most widely used. The global and local
explanations of SHAP provide an opportunity to explain black-box machine learning
techniques. This makes it possible to build a complex black-box machine learning model that
achieves the highest possible results while retaining the ability to understand its
operation and to gain knowledge about the field under study.
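
SHAP builds on Shapley values, which attribute a single prediction to the input features. As a rough illustration of the idea behind a local explanation, the sketch below computes exact Shapley values by enumerating all feature coalitions for a toy model. It does not use the SHAP library or its approximation algorithms; the model `predict`, the feature values and the choice of replacing absent features with a fixed baseline are all assumptions made for the example.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by brute-force enumeration of coalitions.

    Features outside a coalition take their baseline value (a simplifying
    assumption). The cost grows as 2**n, so this is for tiny examples only.
    """
    n = len(instance)

    def value(coalition):
        x = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(x)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def predict(x):
    """Hypothetical price model with an area x garden interaction."""
    area, rooms, garden = x
    return 50_000 + 2_000 * area + 10_000 * rooms + 500 * area * garden


instance = [100, 3, 1]  # property being explained (invented values)
baseline = [60, 2, 0]   # reference property (invented values)
phi = shapley_values(predict, instance, baseline)
print(dict(zip(["area", "rooms", "garden"], phi)))
```

The attributions sum to the difference between the prediction for the explained property and the prediction for the baseline, which is the property that lets such values be read as a decomposition of one forecast. Averaging absolute attributions over many instances is one common way to obtain a global view from local explanations.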

4.5. RQ5: The reason for explainable machine learning
The fifth research question RQ5 is “Why are explainable machine learning methods used in
the field of real estate?” The reasons why the authors of the publications chose explainable
machine learning methods can be interpreted in different ways. In fact, however, all the
studies found in the field of real estate are united by one goal: to understand the decision
or forecast suggested by the model, or to find correlations between the known information
and the predicted outcome. Explainability both provides knowledge of the researched field
and increases users' confidence in the obtained solution. A detailed analysis can be found
in Appendix A.

4.6. RQ6: Research gaps
The sixth research question RQ6 is “What are the research gaps in explainable machine
learning in the field of real estate?” This is the most difficult question to analyze when
studying the literature. The authors of each article indicate possible further work or




improvements as a continuation of their research. However, that does not always indicate
research gaps in general.
    Of the 17 studies, 11 note the need to repeat the study with better quality, additional or
different types of data [1], [3], [5], [8], [9], [13], [14], [15], [17], [22], [23]. Eight studies
note the need to improve the performance of algorithms by tuning them or testing others [1],
[5], [9], [13], [14], [16], [21], [22]. Six studies propose trying the solution in a different
geographical location [2], [3], [6], [8], [14], [22]. Four studies encourage trying the solution
in real life or exploring specific aspects of real life [2], [6], [19], [23]. Three studies
suggest improving the speed of the algorithm [5], [16], [17], and three suggest including the
time factor in the analysis of the problem domain [3], [9], [23]. Only two studies suggest
improving model explainability [9], [21]. Finally, one study each encourages comparing the
results of different fields [3], solving the imbalance of the data set [17] or looking for the
true causal dependencies [5].

5. Conclusions
From the conducted literature review it is evident that explainable machine learning
methods in the field of real estate are used to determine property value, rent and price, as
well as for land use intensity, fire damage, thermal comfort, fire risk and credit default
prediction.
   In the field of machine learning, the most suitable research method is a laboratory
experiment, and it is useful to supplement it with a literature review and/or case study, if
necessary. The study also indicates that the decision tree based XGBoost, Random Forest and
LightGBM machine learning methods and the SHAP explainable machine learning method are
the most suitable or most used in real estate, providing the best results. Explainable
machine learning is mainly needed to understand the decision or forecast. Moreover, it
provides an understanding of the researched field and increases trust in the obtained
machine learning model.
   On the other hand, the study of research gaps gives only general ideas for further
research. The reviewed articles suggest making common improvements to existing solutions,
using additional data, replicating the experiment in other areas or trying the solution in
real-life situations. Scientific innovations could be sought in studies of time factors,
model explainability, training set balance, and causal dependencies. However, before
starting further research in these directions, additional research is needed to clarify what
has been done in specific technical areas that are not limited to real estate.
   The results of this literature review can be used for further decisions on the
implementation of similar research in the reader’s region or for the initiation of new /
unexplored research directions in the field of real estate.

Acknowledgements
The research leading to these results is part of the research project "Multi-contextual data
analytics solutions for building management" jointly implemented by Riga Technical
University, SIA "Lursoft IT" and SIA "Hagberg".




References
[1] Baur K., Rosenfelder M. & Lutz B., “Automated real estate valuation with machine
     learning models using property descriptions,” Expert Systems with Applications, vol.
     213, pp.1-13, Mar. 2023.
[2] Belmiro C., Silveira Neto R.D.M., Barros A. & Ospina R., “Understanding the land use
     intensity of residential buildings in Brazil: An ensemble machine learning approach,”
     Habitat International, vol. 139, pp.1-12, Sep. 2023.
[3] Chen L., Yao X., Liu Y., Zhu Y., Chen W., Zhao X. & Chi T., “Measuring impacts of urban
     environmental elements on housing prices based on multisource data – a case study of
     Shanghai, China,” International Journal of Geo-Information (ISPRS), vol. 9 (no. 2), pp.1-
     23, Feb. 2020.
[4] Chen T. & Guestrin C., “XGBoost: A Scalable Tree Boosting System,” in Proceedings of
     the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data
     Mining (KDD’16), 2016, pp. 785–794.
[5] Deppner J., von Ahlefeldt-Dehn B., Beracha E. & Schaefers W., “Boosting the Accuracy
     of Commercial Real Estate Appraisals: An Interpretable Machine Learning Approach,”
     Journal of Real Estate Finance and Economics, pp.1-38, Mar. 2023.
[6] Dou M., Gu Y. & Fan H., “Incorporating neighborhoods with explainable artificial
     intelligence for modeling fine-scale housing prices,” Applied Geography, vol. 158, pp.1-
     11, Sep. 2023.
[7] Ho T.K., “Random Decision Forests,” in Proceedings of the 3rd International Conference
     on Document Analysis and Recognition (ICDAR’95), 1995, pp. 278–282.
[8] Iban M.C., “An explainable model for the mass appraisal of residences: The application
     of tree-based Machine Learning algorithms and interpretation of value determinants,”
     Habitat International, vol. 128, pp.1-11, Oct. 2022.
[9] Kang Y., Zhang F., Peng W., Gao S., Rao J., Duarte F. & Ratti C., “Understanding house
     price appreciation using multi-source big geo-data and machine learning,” Land Use
     Policy, vol. 111, pp.1-11, Dec. 2021.
[10] Karamanou A., Kalampokis E. & Tarabanis K., “Linked Open Government Data to Predict
     and Explain House Prices: The Case of Scottish Statistics Portal,” Big Data Research, vol.
     30, pp.1-15, Nov. 2022.
[11] Ke G., Meng Q., Finley T., Wang T., Chen W., Ma W., Ye Q. & Liu T.Y., “LightGBM: A Highly
     Efficient Gradient Boosting Decision Tree,” in Proceedings of the Advances in Neural
     Information Processing Systems 30 (NIPS’17), 2017, pp. 3148-3156.
[12] Kitchenham B. & Brereton P., “A systematic review of systematic review process
     research in software engineering,” Information and Software Technology, vol. 55 (no.
     12), pp. 2049-2075, Dec. 2013.
[13] Lenaers I. & De Moor L., “Exploring XAI techniques for enhancing model transparency
     and interpretability in real estate rent prediction: A comparative study,” Finance
     Research Letters, vol. 58, pp.1-9, Dec. 2023.
[14] Levantesi S. & Piscopo G., “The Importance of Economic Variables on London Real
     Estate Market: A Random Forest Approach,” Risks, vol. 8 (no. 4), pp.1-17, Dec. 2020.




[15] Liu H. & Ma E., “An Explainable Evaluation Model for Building Thermal Comfort in
     China,” Buildings, vol. 13 (no. 12), pp.1-20, Dec. 2023.
[16] Lorenz F., Willwersch J., Cajias M. & Fuerst F., “Interpretable machine learning for real
     estate market analysis,” Real Estate Economics, vol. 51 (no. 5), pp. 1178-1208, Sep.
     2023.
[17] Lu Y., Fan X., Zhang Y., Wang Y. & Jiang X., “Machine Learning Models Using SHapley
     Additive exPlanation for Fire Risk Assessment Mode and Effects Analysis of Stadiums,”
     Sensors, vol. 23 (no. 4), pp.1-19, Feb. 2023.
[18] Lundberg S.M. & Lee S.I., “A Unified Approach to Interpreting Model Predictions,” in
     Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS’17),
     2017, pp. 4768–4777.
[19] Ma Y., Zhang P., Duan S. & Zhang T., “Credit default prediction of Chinese real estate
     listed companies based on explainable machine learning,” Finance Research Letters,
     vol. 58, Dec. 2023.
[20] Moher D., Liberati A., Tetzlaff J., Altman D.G., Antes G., Atkins D., Barbour V., Barrowman
     N., Berlin J.A., Clark J., Clarke M., Cook D., D'Amico R., Deeks J.J., Devereaux P.J., Dickersin
     K., Egger M., Ernst E., Gøtzsche P.C., Grimshaw J., Guyatt G., Higgins J., Ioannidis J.P.A.,
     Kleijnen J., Lang T., Magrini N., McNamee D., Moja L., Mulrow C., Napoli M., Oxman A.,
     Pham B., Rennie D., Sampson M., Schulz K.F., Shekelle P.G., Tovey D., Tugwell P.,
     “Preferred reporting items for systematic reviews and meta-analyses: The PRISMA
     statement,” PLoS Medicine, vol. 6 (no. 7), pp.1-6, Jul. 2009.
[21] Rico-Juan J.R. & Taltavull de La Paz P., “Machine learning with explainability or spatial
     hedonics tools? An analysis of the asking prices in the housing market in Alicante,
     Spain,” Expert Systems with Applications, vol. 171, pp.1-14, Jun. 2021.
[22] Taecharungroj V., “Google Maps amenities and condominium prices: Investigating the
     effects and relationships using machine learning,” Habitat International, vol. 118, pp.1-
     12, Dec. 2021.
[23] Wang N., Xu Y. & Wang S., “Interpretable boosting tree ensemble method for
     multisource building fire loss prediction,” Reliability Engineering and System Safety,
     vol. 225, pp.1-17, Sep. 2022.




A. Data extraction and assessment results
 Columns: ID/REF; RQ1: Subfields; RQ2: Research Method; RQ3: Machine Learning Method;
 RQ4: Explainable Method; RQ5: Why explain; RQ6: Gaps

 A1 [8]
   RQ1: Real estate price prediction
   RQ2: Literature review, Lab experiment
   RQ3: XGBoost (Random Forest, XGBoost, LightGBM, Gradient Boosting)
   RQ4: PFI, SHAP
   RQ5: To identify what the model takes into account when estimating real estate prices
   RQ6: 1. Test the model in other cities;
        2. Repeat the experiment with verified and reliable value data;
        3. Repeat the experiment with the addition of socio-economic and demographic data.

 A2 [10]
   RQ1: Real estate price prediction
   RQ2: Case study, Lab experiment
   RQ3: XGBoost
   RQ4: SHAP
   RQ5: 1. To justify the reliability of a predictive model;
        2. To understand which factors affect and determine the prices of houses.
   RQ6: n/a

                                                                                                                  1. Validate the influence of different descriptions
                                                                                                                  on real estate price in a controlled laboratory
                                                                                                                  experiment;
                                                                                                                  2. Prove that the difference between non-
                                          LightGBM (Linear                                                        contextualized methods and contextualized
        Real estate                       regression,                             To understand how the           embeddings increases even more through fine-
 A3
        price         Lab experiment      elastic net, SVR,       SHAP            model arrives at                tuning a pre-trained BERT model.
 [1]
        prediction                        random forest,                          decisions                       3. Repeat the experiment on real estate
                                          LightGBM)                                                               descriptions in other languages than English and
                                                                                                                  German.
                                                                                                                  4. Extend the approach to the textual
                                                                                                                  descriptions of short-term rent offers like hotel
                                                                                                                  rooms or AirBnB offers.
                                                                                                                  Validate methodology on other real estate or
                                                                              To gain a comprehensive
        Real estate                       CatBoost (Ridge                                                         financial-economic datasets and models to
  A4                                                         ALE plots, PDPs, understanding of the
        rent price    Lab experiment      regression,                                                             deepen our understanding of the substitutability,
 [13]                                                          PFI, SHAP      factors
        prediction                        XGBoost, CatBoost)                                                      complementarity, benefits, and limitations of XAI
                                                                              driving rent
                                                                                                                  techniques in finance
  A5    Credit                            AdaBoost
                      Lab experiment                            ICE, SHAP         To clearly understand           Implement results for practical applications.
 [19]   default                           (AdaBoost, EBM,

                                                                             69
                                            RQ3: Machine             RQ4:
ID/       RQ1:          RQ2: Research
                                           Learning Method        Explainable             RQ5: Why explain                            RQ6: Gaps
REF     Subfields         Method
                                                                    Method
       prediction of                       Logistic regression,                        the ranking of feature
       real estate                         Random forest,                              importance and the
       companies                           SVM)                                        impact on the prediction
                                                                                       results
                                                                                                                 1. Consider other urban realities;
                                                                                       To understand the
       Understand                          Random forest                                                         2. Apply the model on commercial lots and its
A6                     Literature review                                               factors responsible for
       the land use                        (Random forest,        ALE plots, FI                                  variation according to economic activities;
[2]                    Lab experiment                                                  the higher urban land use
       intensity                           XGBoost).                                                             3. Investigate urban physical structure of urban
                                                                                       intensity in cities
                                                                                                                 centers.
                                                                                       To understand the         1. Test the model in other cities;
       Real estate
A7                                                                                     relationships between     2. Analyze neighbourhood characteristic
       rent price      Lab experiment      XGBoost                   SHAP
[6]                                                                                    housing units and their   interactive or synergetic impacts on housing
       prediction
                                                                                       neighbourhoods            prices.
                                           LightGBM
       Building                            (Bayesian-
       thermal                             optimized                                   To understand the
 A8
       comfort         Lab experiment      LightGBM, KNN,            SHAP              thermal requirements of     Incorporate additional variables in the model
[15]
       requirement                         Random forest,                              building occupants
       prediction                          XGBoost, GBDT,
                                           SVR)
                                           Random forest
                                           (Naïve Bayes, KNN,                          To find the complex         1. Repeat the experiment with additional data;
       Stadium fire
 A9                                        Decision tree,                              nonlinear relationship      2. Explore ways to solve the label imbalance;
       risk            Lab experiment                                SHAP
[17]                                       AdaBoost,                                   between risk features       3. Increase operational efficiency and reduce
       assessment
                                           LightGBM, Random                            and stadium fire risk.      time costs;
                                           forest)
                                                                                                                   1. Test the model in other cities;
                                                                                                                   2. Quantify the differences among cities;
                                           XGBoost (Linear                             To explain the impacts of
       Real estate                                                                                                 3. Integrate multi-year data to analyze the
A10                    Case study          Regression,                                 urban environmental
       price                                                         SHAP                                          temporal dynamics of the impacts of the urban
[3]                    Lab experiment      XGBoost, Random                             elements
       prediction                                                                                                  environmental elements on housing prices;
                                           forest, GBR)                                on housing prices
                                                                                                                   4. Repeat the experiment with additional and
                                                                                                                   improved data.


                                                                                  70
                                          RQ3: Machine            RQ4:
ID/       RQ1:        RQ2: Research
                                         Learning Method       Explainable           RQ5: Why explain                           RQ6: Gaps
REF     Subfields       Method
                                                                 Method
                                                                                  To investigate the         1. Repeat the experiment with additional data;
       Real estate                                                                relationship between       2. Additionally test and tune the model;
A11                  Literature review   XGBoost (Random
       price                                                    FI, PDPs          neighborhood amenities     3. Identifying the similarities and differences in
[22]                 Lab experiment       forest, XGBoost)
       prediction                                                                 and the prices of          the importance of amenities in various
                                                                                  condominiums               geographical areas.
                                                                                  To better explain which
                                                                                                             1. Repeat the experiment with additional and
                                                                                  variables have more
                                                                                                             improved data;
       Real estate                                                                importance in
A12                  Case study                                                                              2. Repeat the experiment with different machine
       price                             Random forest          FI, PDPs          describing the evolution
[14]                 Lab experiment                                                                          learning algorithms;
       prediction                                                                 of the house price
                                                                                                             3. Repeat the experiment on other real estate
                                                                                  following an urban
                                                                                                             datasets.
                                                                                  approach
                                                                                                             1. Test the model in other cities;
                                         Gradient boosting                                                   2. Repeat the experiment with DCNN;
                                           machine (GBM)                                                     3. Include dynamics of urban land use changes
       Real estate                       with decision trees                      To examine the effect of   into the framework with richer datasets;
A13                  Case study
       price                             (Gradient boosting        FI             different variables on     4. Add deeper exploration and more
[9]                  Lab experiment
       prediction                         machine (GBM),                          house price appreciation   explanations.
                                           Multiple linear                                                   5. Involve more time-series data and approaches
                                         regression (MLR))                                                   from the economy to build causality relationships
                                                                                                             and improve the interpretability of the model.
                                           Random forest
                                           (KNN, Decision
                                            tree, Random
                                         forest, AdaBoost,
                                                                                  To observe non-linear      1. Repeat the experiment with deep artificial
       Real estate                        CatBoost, Neural
A14                                                                               relationships between      neural networks;
       price         Lab experiment            network            SHAP
[21]                                                                              housing prices and         2. Resolve explainability challenges in deep
       prediction                             (Multilayer
                                                                                  housing attributes         artificial neural networks.
                                             perceptron),
                                         Linear regression,
                                         Ridge regression,
                                         Lasso regression)



                                                                             71
                                       RQ3: Machine         RQ4:
ID/       RQ1:        RQ2: Research
                                      Learning Method    Explainable              RQ5: Why explain                           RQ6: Gaps
REF     Subfields       Method
                                                           Method
       Real estate                                                                                        1. Enhance model building speed;
A15                                                      ALE plots, FI,        To justify decisions and
       price         Lab experiment       XGBoost                                                         2. Improve the reliability and validity of
[16]                                                        PDPs               generate new insights
       estimation                                                                                         algorithmic decision-making.
                                                                                                          1. Use a model to assist relevant departments in
                                                                               To understand the          making timelier decisions regarding dispatching
       Real estate                    IBTEM (Catboost,
A16                                                                            reasons for making         aid and mobilizing resources;
       fire loss     Lab experiment       XGBoost,           SHAP
[23]                                                                           certain decisions or       2. Repeat the experiment with updated data;
       prediction                        LightGBM)
                                                                               predictions                3. Include time series forecasting in the building
                                                                                                          fire loss prediction.
                                                                                                          1. Improve data availability for machine learning
                                                                                                          experiments;
                                                                                                          2. Justify the rationale behind patterns or
       Real estate
A17                                                                            To make informed           determine causality in the relation between input
       price         Lab experiment       XGBoost             PFI
[5]                                                                            decisions                  and output data;
       estimation
                                                                                                          3. Enhance model building speed;
                                                                                                          4. Improve machine learning algorithms for the
                                                                                                          field of real estate.




                                                                          72