<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Literature Review of Explainable Machine Learning in Real Estate</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Arnis Staško</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jānis Grundspeņķis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Riga Technical University</institution>
          ,
          <addr-line>6A Kipsalas Street, Riga, LV-1048</addr-line>
          ,
          <country country="LV">Latvia</country>
        </aff>
      </contrib-group>
      <fpage>58</fpage>
      <lpage>72</lpage>
      <abstract>
        <p>A literature review is conducted on explainable machine learning methods used in real estate. It identifies 17 relevant articles that reveal various subfields of real estate and the explainable machine learning methods used. Among them, XGBoost and SHAP is the most commonly used combination for explainable machine learning in the studied area. The study also identifies research gaps that could be addressed through further studies on time factors, model explainability, training set balance, and causal dependencies.</p>
      </abstract>
      <kwd-group>
        <kwd>Real estate</kwd>
        <kwd>explainable machine learning</kwd>
        <kwd>research methods</kwd>
        <kwd>literature review</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>recommend decisions but would also argue for the recommended solution (white box). In
the field of real estate people are not ready to blindly trust artificial intelligence to make a
decision about the most expensive thing they own. Explainability is therefore critical.</p>
      <p>This research therefore investigates in which
areas of real estate machine learning is used, what research methods and algorithms are
used, why explainable machine learning is chosen and what further research might be
useful. Accordingly, the research object is explainable machine learning methods.</p>
      <p>The structure of the work is as follows. Chapter 2 explores the types of literature review
used in similar studies. Further, Chapter 3 describes the literature review approach. Then,
the literature review results are presented in Chapter 4. Finally, the conclusions and future
work are summarized in Chapter 5.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Method Selection for Literature Review</title>
      <p>To choose a suitable literature review method, the ScienceDirect database is searched for
("machine learning" AND "literature review") in article titles, with results limited to 2023.
Journal articles from the last year should be sufficient to identify the most current
approaches. Of the 27 returned articles, 24 are used; the rest are excluded because of
availability or title relevance.</p>
      <p>
        Briefly browsing the content of the articles and paying special attention to the research
method section, it is found that 20 out of 24 use systematic literature review. On closer
examination, it is seen that the majority leans towards Preferred Reporting Items for
Systematic reviews and Meta-Analyses (PRISMA) guidelines [20] or in the direction of
Kitchenham and Brereton's various modifications of systematic review [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        Kitchenham and Brereton's approach [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] specializes in software engineering
literature reviews, while the PRISMA guidelines [20] originate from the medical field. This
study therefore adopts the Kitchenham and Brereton version [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The next
chapter describes the literature review approach.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Literature Review Protocol</title>
      <p>
        The literature review adapted from Kitchenham and Brereton's version [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] is performed
as follows:
1. Define research questions for the literature review.
2. Perform an initial search in the ScienceDirect database by searching for review
articles related to research questions to ensure that a similar literature review
has not already been conducted.
3. Perform a manual search in the ScienceDirect database by searching for articles
related to research questions. Select candidate papers based on abstract &amp; title.
4. Iteratively perform forward and backward snowballing in the Scopus abstract
and citation database. Add any missed papers based on abstract &amp; title analysis.
5. Read the full version of selected papers and apply detailed inclusion/exclusion
criteria during the data extraction and quality assessment process.
      </p>
      <p>The authors believe that the use of the combination of ScienceDirect and Scopus provides
sufficient coverage of reliable literature sources.</p>
      <sec id="sec-3-1">
        <title>3.1. Research Questions</title>
        <p>The cornerstone of a systematic literature review is the definition of research questions.
To achieve the goals set for the research, the research questions are:</p>
        <p>RQ1. In what subfields of real estate is explainable machine learning applied?
RQ2. What research methods are used to study explainable machine learning in
the field of real estate?
RQ3. What machine learning methods are used in the field of real estate?
RQ4. What explainable machine learning methods are used in the field of real
estate?
RQ5. Why are explainable machine learning methods used in the field of real
estate?
RQ6. What are the research gaps in explainable machine learning in the field of
real estate?</p>
        <p>Next, the availability of similar studies in the literature is analyzed.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Initial Search</title>
        <p>To check whether a similar, reliable literature review already exists, ScienceDirect is searched
for keywords related to the research. The results of a search within article titles, abstracts
and keywords are summarized in Table 1.</p>
        <p>Search phrase
("real estate" AND "explainable machine learning" AND "overview")
("real estate" AND "explainable machine learning" AND "review")
("real estate" AND "explainable machine learning" AND "survey")
("real estate" AND "explainable artificial intelligence" AND "overview")
("real estate" AND "explainable artificial intelligence" AND "review")
("real estate" AND "explainable artificial intelligence" AND "survey")</p>
        <p>The initial search results indicate that no similar literature review is
available, so it is justified to carry out the intended literature review. Next, the manual search
results are summarized.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Manual Search</title>
        <p>
          The manual search is performed in the ScienceDirect database by searching for research
articles with the phrase ("real estate" AND ("explainable machine learning" OR "explainable
artificial intelligence" OR "XAI")) in article titles, abstracts and keywords. A total of five
articles are found [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], [19]. After reading the title and abstract, all are
accepted as relevant for further research. The authors expect that any other relevant
publications will be discovered during snowballing in the Scopus database.
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Forward &amp; Backward Snowballing</title>
        <p>In forward snowballing, all articles citing the examined article are reviewed; in backward
snowballing, all articles referenced by the examined article are reviewed. Both lists are taken from the Scopus
database (https://www.scopus.com/, accessed January 7, 2024), and the relevant articles are selected.</p>
        <p>In the first iteration, the articles found during the manual search are examined. In each subsequent
iteration, the articles found in the previous iteration are examined. An article is accepted as relevant
if it was published between 2019 and 2023, its full text is available, and its title or abstract
shows a connection with the field of real estate and the use of explainable machine learning
methods. A total of three iterations are performed. During the third iteration,
no new articles are found, so the snowballing is stopped. The summary of all
iterations and results is given in Table 2. Snowballing adds 12 new articles to the
research.</p>
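        <p>The iteration scheme described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the tooling used in the study: the citation graph is a hypothetical in-memory dictionary (in the review itself, citing and referenced articles are looked up in Scopus), and the is_relevant callback is an assumed stand-in for the manual screening of titles and abstracts.</p>
        <preformat><![CDATA[
```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical toy citation graph: paper -> papers it references.
# All paper identifiers here are invented for illustration.
REFERENCES: Dict[str, Set[str]] = {
    "seed1": {"old1", "old2"},
    "new1": {"seed1", "old3"},
    "new2": {"new1"},
    "noise": {"seed1"},
}


def cited_by(paper: str, references: Dict[str, Set[str]]) -> Set[str]:
    """Forward snowballing: papers whose reference list contains `paper`."""
    return {p for p, refs in references.items() if paper in refs}


def snowball(
    seeds: Iterable[str],
    references: Dict[str, Set[str]],
    is_relevant: Callable[[str], bool],
) -> Tuple[Set[str], List[int]]:
    """Iterate forward and backward snowballing until no new paper is found.

    Returns the accepted papers and the number of papers added per iteration.
    """
    accepted = set(seeds)
    frontier = set(seeds)  # papers examined in the current iteration
    added_per_iteration: List[int] = []
    while frontier:
        candidates: Set[str] = set()
        for paper in frontier:
            candidates |= references.get(paper, set())  # backward step
            candidates |= cited_by(paper, references)   # forward step
        new = {p for p in candidates - accepted if is_relevant(p)}
        added_per_iteration.append(len(new))
        accepted |= new
        frontier = new  # the next iteration examines only the new papers
    return accepted, added_per_iteration
```
]]></preformat>
        <p>As in the review, the loop stops at the first iteration that adds no new article; each iteration examines only the articles found in the previous one.</p>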
        <p>Once a list of relevant articles for further research is obtained, during the data extraction
step the quality of articles is evaluated in detail and the answers to the research questions
are clarified.</p>
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Data Extraction</title>
        <p>
          Data extraction and quality assessment are performed by reading the full text of each article
with the research questions in mind. While the answers to RQ1, RQ3 and RQ4 are readily
apparent, RQ2, RQ5 and RQ6 require additional effort. Almost none of the articles mention
the exact research method used. In some of them a case study [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] or a
literature review [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], [22] is mentioned; on closer reading, however, the primary
research method is a laboratory experiment. Similarly, the justification
for using explainable machine learning has to be inferred: several articles take it for granted, so
the benefits of explainability are analyzed in detail to determine the real
need. Research gaps are the most difficult to determine. Therefore, the future research
questions mentioned in each article are identified first, and the actual research gaps are then
discussed. The data extraction results are presented in Appendix A.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>The literature review discovered 17 publications from scientific journals with Scopus CiteScore
values between 3.3 and 14.8 (2023 data, updated on 05.01.2024). The journal Habitat
International (https://www.sciencedirect.com/journal/habitat-international) ranks first by number of articles, while the journal Reliability
Engineering and System Safety (https://www.sciencedirect.com/journal/reliability-engineering-and-system-safety) has the highest CiteScore, 14.8. The full journal list is
presented in Table 3. These results show that all articles are published in recognized
journals.</p>
      <p>
        The literature study identified 64 authors publishing on the application of explainable
machine learning in real estate. The most cited works are those
of Kang &amp; Zhang et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] with 86 citations, Chen &amp; Yao et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] with 39 citations and
Rico-Juan &amp; Taltavull de La Paz [21] with 38 citations. A visualization illustrates the
scope of the authors' contributions (Figure 1).
      </p>
      <p>It is also useful to identify the set of keywords that illustrate the topics of the reviewed articles
(Figure 2), since they outline the research area.
The subsections below summarize the answers to the research questions.</p>
      <sec id="sec-4-1">
        <title>4.1. RQ1: Real estate subfields</title>
        <p>
          The first research question RQ1 is “In what subfields of real estate is explainable machine
learning applied?” The literature study reveals 8 different research subfields in real
estate, where the most frequently addressed issue is real estate price prediction [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ],
[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], [21], [22], followed by real estate price estimation [16], [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] and real
estate rent price prediction [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. One study each addresses the remaining subfields:
understanding of land use intensity [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], real estate fire loss prediction [23], building
thermal comfort requirement prediction [15], stadium fire risk assessment [17] and
credit default prediction of real estate companies [19]. This overview indicates
in which areas similar studies could be repeated in a reader’s region and
which directions have not yet been covered and are open to
new research.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. RQ2: Research methods</title>
        <p>The second research question RQ2 is “What research methods are used to study explainable
machine learning in the field of real estate?” Evaluating all articles, it can be concluded that
they all represent a laboratory experiment as a research method. This is quite
understandable since building a machine learning model consists of training a model and
evaluating its results using a testing set. Such an approach by default involves a laboratory
experiment.</p>
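        <p>The train-and-test procedure that makes these studies laboratory experiments can be illustrated with a minimal sketch. Everything in it is an assumption chosen for illustration and not taken from the reviewed articles: the data are synthetic, the model is a one-feature least-squares line, and the split is 80/20.</p>
        <preformat><![CDATA[
```python
import random
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_absolute_error(model: Tuple[float, float],
                        xs: List[float], ys: List[float]) -> float:
    """Average absolute difference between predictions and true values."""
    intercept, slope = model
    return sum(abs(intercept + slope * x - y) for x, y in zip(xs, ys)) / len(xs)


def holdout_experiment(n: int = 200, test_share: float = 0.2, seed: int = 0):
    """Train on one part of the data, evaluate on the held-out testing set."""
    rng = random.Random(seed)
    # Synthetic listings (assumed relation): price = 50 + 2 * area + noise.
    areas = [rng.uniform(20, 150) for _ in range(n)]
    prices = [50 + 2 * a + rng.gauss(0, 5) for a in areas]

    # Shuffle indices, then split them into testing and training sets.
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = int(n * test_share)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    model = fit_line([areas[i] for i in train_idx],
                     [prices[i] for i in train_idx])
    mae = mean_absolute_error(model,
                              [areas[i] for i in test_idx],
                              [prices[i] for i in test_idx])
    return model, mae
```
]]></preformat>
        <p>The model never sees the testing set during fitting, so the reported error estimates performance on unseen data under controlled conditions.</p>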
        <p>
          In addition, it should be noted that in four articles it is mentioned that a case study is
conducted [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. On the other hand, from the content of three articles, it is
observable that a literature review is carried out [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], [22].
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. RQ3: Machine learning methods</title>
        <p>The third research question RQ3 is “What machine learning methods are used in the field of
real estate?” To answer this question, two aspects are evaluated:
first, which machine learning methods are used, and second, which of them shows the
best results or is the only one tested. The list of the machine learning methods studied in
real estate is provided in Table 4.</p>
        <p>
          The XGBoost method shows the best results or is chosen as appropriate in 7 out of 10
cases [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], [16], [22]. It is followed by Random forest in 4 out of 10 cases
[
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], [17], [21] and LightGBM in 2 out of 4 cases [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], [15]. IBTEM [23],
CatBoost [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], AdaBoost [19] and Gradient boosting machine [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] are each selected in one study. The top three methods,
XGBoost [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], Random Forest [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and LightGBM [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], are based on decision tree algorithms. The
results are useful because they support evidence-based choices of machine learning
method in similar research.
        </p>
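        <table-wrap>
          <label>Table 4</label>
          <table>
            <thead>
              <tr><th>No</th><th>Method</th></tr>
            </thead>
            <tbody>
              <tr><td>1</td><td>XGBoost (#1)</td></tr>
              <tr><td>2</td><td>Random Forest (#2)</td></tr>
              <tr><td>3</td><td>LightGBM (#3)</td></tr>
              <tr><td>4</td><td>AdaBoost</td></tr>
              <tr><td>5</td><td>KNN</td></tr>
              <tr><td>6</td><td>Linear regression</td></tr>
              <tr><td>7</td><td>CatBoost</td></tr>
              <tr><td>8</td><td>Decision tree</td></tr>
              <tr><td>9</td><td>Gradient Boosting</td></tr>
              <tr><td>10</td><td>Ridge regression</td></tr>
              <tr><td>11</td><td>SVR</td></tr>
            </tbody>
          </table>
        </table-wrap>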
      </sec>
      <sec id="sec-4-4">
        <title>4.4. RQ4: Explainable machine learning methods</title>
        <p>
          The fourth research question RQ4 is “What explainable machine learning methods are used
in the field of real estate?” In the field of explainable machine learning, six different methods
are used in the literature – SHAP [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], [15], [17], [19], [21], [23]; FI [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ],
[
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], [16], [22]; PDPs [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], [16], [22]; PFI [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]; ALE plots [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], [16]; ICE
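[19]. Here SHAP stands for SHapley Additive exPlanations, FI for feature importance, PDPs for
partial dependence plots, PFI for permutation feature importance, ALE for accumulated local
effects and ICE for individual conditional expectation. The SHAP [18] method and its various
modifications are the most widely used. SHAP's global and local explanations provide an
opportunity to explain black-box machine learning techniques. This makes it possible to build a
complex black-box machine learning model that achieves the best possible results while retaining
the ability to understand its operation and to gain knowledge about the field under study.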
        </p>
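        <p>To make the idea behind SHAP's local explanations concrete, the sketch below computes exact Shapley values by brute force for a small hand-written pricing function. It does not use the shap library or any model from the reviewed studies: the function toy_price, its three features and the baseline instance are hypothetical, and features outside a coalition are simply set to their baseline values, which is one common simplification.</p>
        <preformat><![CDATA[
```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict


def toy_price(area: float, rooms: float, central: float) -> float:
    """Hypothetical black-box model with an area x location interaction."""
    return 1000 * area + 5000 * rooms + 200 * area * central


def shapley_values(model: Callable[..., float],
                   instance: Dict[str, float],
                   baseline: Dict[str, float]) -> Dict[str, float]:
    """Exact Shapley values of each feature for one prediction.

    Features inside a coalition take the instance value; all other
    features take the baseline value (an assumed simplification).
    """
    names = list(instance)
    n = len(names)

    def value(coalition) -> float:
        args = {k: (instance[k] if k in coalition else baseline[k])
                for k in names}
        return model(**args)

    phi: Dict[str, float] = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1
            # Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = value(set(subset) | {name})
                without_feature = value(set(subset))
                total += weight * (with_feature - without_feature)
        phi[name] = total
    return phi
```
]]></preformat>
        <p>For the instance area = 80, rooms = 3, central = 1 and the baseline area = 50, rooms = 2, central = 0, the attributions are 33000 for area, 5000 for rooms and 13000 for central. They sum to 51000, the difference between the prediction (111000) and the baseline prediction (60000). The brute-force enumeration grows exponentially with the number of features, so it only suits very small examples.</p>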
      </sec>
      <sec id="sec-4-5">
        <title>4.5. RQ5: The reason for explainable machine learning</title>
        <p>The fifth research question RQ5 is “Why are explainable machine learning methods used in
the field of real estate?” The reasons the authors give for choosing
explainable machine learning methods can be interpreted in different ways. However,
all the studies found in the field of real estate share one goal: to understand
the decision or forecast suggested by the model, or to find correlations between the known
information and the predicted outcome. Explainability provides
knowledge of the researched field and at the same time increases users' confidence in the obtained solution.
A detailed analysis can be found in Appendix A.</p>
      </sec>
      <sec id="sec-4-6">
        <title>4.6. RQ6: Research gaps</title>
        <p>The sixth research question RQ6 is “What are the research gaps in explainable machine
learning in the field of real estate?” This is the most difficult question to analyze when
studying the literature. The authors of each article indicate possible further work or
improvements as a continuation of their research. However, that does not always indicate
research gaps in general.</p>
        <p>
          11 studies out of 17 note the need to repeat the study with better quality, additional or
different types of data [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], [15], [17], [22], [23]. 8 studies note the
need to improve the performance of algorithms by tuning them or testing others [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ],
[
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], [16], [21], [22]. 6 studies propose to try the solution in a different
geographical location [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], [22]. 4 studies encourage trying the solution in real
life or exploring specific aspects of real life [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], [19], [23]. 3 studies suggest improving the
speed of the algorithm [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], [16], [17], and another 3 suggest including the time factor [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], [23] in the analysis
of the problem domain. Only 2 studies suggest improving model explainability [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], [21].
Finally, one study each encourages comparing the results of different fields [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ],
solving the imbalance of the data set [17] or looking for the true causal dependencies [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>The conducted literature review shows that explainable machine learning
methods in the field of real estate are used to estimate property value, rent and price, as
well as to predict land use intensity, fire damage, thermal comfort, fire risk and credit default.</p>
      <p>In the field of machine learning, the most suitable research method is a laboratory
experiment, supplemented by a literature review and/or case study where necessary. The
study also indicates that the decision tree based XGBoost, Random Forest &amp; LightGBM
machine learning methods and the SHAP explainable machine learning method are the most
used in real estate and most often give the best results. Explainable machine learning
is mainly needed to understand the decision or forecast.
Moreover, it provides an understanding of the researched field and increases trust in the
obtained machine learning model.</p>
      <p>On the other hand, the study of research gaps gives only general ideas for further
research. The reviewed articles propose common improvements to existing solutions, the use of additional
data, replication of the experiment in other areas and trials of the solution in real-life situations.
Scientific innovations could be sought in studies of time factors, model explainability,
training set balance, and causal dependencies. However, before starting further research in
these directions, additional research is needed to clarify what has been done in the corresponding technical
areas beyond real estate.</p>
      <p>The results of this literature review can be used for further decisions on the
implementation of similar research in the reader’s region or for the initiation of new /
unexplored research directions in the field of real estate.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>The research leading to these results is part of the research project "Multi-contextual data
analytics solutions for building management" jointly implemented by Riga Technical
University, SIA "Lursoft IT" and SIA "Hagberg".</p>
      <p>[15] Liu H. &amp; Ma E., “An Explainable Evaluation Model for Building Thermal Comfort in
China,” Buildings, vol. 13 (no. 12), pp.1-20, Dec. 2023.
[16] Lorenz F., Willwersch J., Cajias M. &amp; Fuerst F., “Interpretable machine learning for real
estate market analysis,” Real Estate Economics, vol. 51 (no. 5), pp. 1178-1208, Sep.
2023.
[17] Lu Y., Fan X., Zhang Y., Wang Y. &amp; Jiang X., “Machine Learning Models Using SHapley
Additive exPlanation for Fire Risk Assessment Mode and Effects Analysis of Stadiums,”
Sensors, vol. 23 (no. 4), pp.1-19, Feb. 2023.
[18] Lundberg S.M &amp; Lee S.I., “A Unified Approach to Interpreting Model Predictions,” in
Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS’17),
2017, pp. 4768–4777.
[19] Ma Y., Zhang P., Duan S. &amp; Zhang T., “Credit default prediction of Chinese real estate
listed companies based on explainable machine learning,” Finance Research Letters,
vol. 58, Dec. 2023.
[20] Moher D., Liberati A., Tetzlaff J., Altman D.G., Antes G., Atkins D., Barbour V., Barrowman
N., Berlin J.A., Clark J., Clarke M., Cook D., D'Amico R., Deeks J.J., Devereaux P.J., Dickersin
K., Egger M., Ernst E., Gøtzsche P.C., Grimshaw J., Guyatt G., Higgins J., Ioannidis J.P.A.,
Kleijnen J., Lang T., Magrini N., McNamee D., Moja L., Mulrow C., Napoli M., Oxman A.,
Pham B., Rennie D., Sampson M., Schulz K.F., Shekelle P.G., Tovey D., Tugwell P.,
“Preferred reporting items for systematic reviews and meta-analyses: The PRISMA
statement,” PLoS Medicine, vol. 6 (no. 7), pp.1-6, Jul. 2009.
[21] Rico-Juan J.R. &amp; Taltavull de La Paz P., “Machine learning with explainability or spatial
hedonics tools? An analysis of the asking prices in the housing market in Alicante,
Spain,” Expert Systems with Applications, vol. 171, pp.1-14, Jun. 2021.
[22] Taecharungroj V., “Google Maps amenities and condominium prices: Investigating the
effects and relationships using machine learning,” Habitat International, vol. 118,
pp.112, Dec. 2021.
[23] Wang N., Xu Y. &amp; Wang S., “Interpretable boosting tree ensemble method for
multisource building fire loss prediction,” Reliability Engineering and System Safety,
vol. 225, pp.1-17, Sep. 2022.</p>
    </sec>
    <sec id="sec-7">
      <title>A. Data extraction and assessment results</title>
      <p>RQ1:
Subfields</p>
      <sec id="sec-7-1">
        <title>1. Test the model in other cities;</title>
        <p>2. Repeat the experiment with verified and
reliable value data;
3. Repeat the experiment with the addition of
socio-economic and demographic data.</p>
        <p>RQ6: Gaps</p>
        <p>n/a
1. Validate the influence of different descriptions
on real estate price in a controlled laboratory
experiment;
2. Prove that the difference between
noncontextualized methods and contextualized
embeddings increases even more through
finetuning a pre-trained BERT model.
3. Repeat the experiment on real estate
descriptions in other languages than English and
German.
4. Extend the approach to the textual
descriptions of short-term rent offers like hotel
rooms or AirBnB offers.</p>
        <p>Validate methodology on other real estate or
financial-economic datasets and models to
deepen our understanding of the substitutability,
complementarity, benefits, and limitations of XAI
techniques in finance</p>
      </sec>
      <sec id="sec-7-2">
        <title>ICE, SHAP</title>
      </sec>
      <sec id="sec-7-3">
        <title>To clearly understand</title>
      </sec>
      <sec id="sec-7-4">
        <title>Implement results for practical applications.</title>
        <p>
          A6
[
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]
prediction of
real estate
companies
        </p>
      </sec>
      <sec id="sec-7-5">
        <title>Understand</title>
        <p>the land use
intensity</p>
      </sec>
      <sec id="sec-7-6">
        <title>Logistic regression, Random forest, SVM)</title>
      </sec>
      <sec id="sec-7-7">
        <title>Random forest (Random forest, XGBoost).</title>
      </sec>
      <sec id="sec-7-8">
        <title>LightGBM</title>
        <p>(Bayesianoptimized
LightGBM, KNN,
Random forest,
XGBoost, GBDT,
SVR)
Random forest
(Naïve Bayes, KNN,
Decision tree,
AdaBoost,
LightGBM, Random
forest)</p>
      </sec>
      <sec id="sec-7-9">
        <title>XGBoost (Linear Regression, XGBoost, Random forest, GBR)</title>
      </sec>
      <sec id="sec-7-10">
        <title>ALE plots, FI</title>
      </sec>
      <sec id="sec-7-11">
        <title>SHAP</title>
      </sec>
      <sec id="sec-7-12">
        <title>SHAP</title>
      </sec>
      <sec id="sec-7-13">
        <title>SHAP</title>
      </sec>
      <sec id="sec-7-14">
        <title>SHAP Lab experiment XGBoost</title>
        <p>To understand the 1. Consider other urban realities;
iftnahcteetohnrisgsihtryeerisnpucoribntiaseinbsllaenfodruse v32a..IArnipavpteilosytnitghaacetcemourodrdbienalgnotpnohceyocsmoicnmaolemsrticrciuaaclctltuoivrtseitaioenfsdu;ritbsan
centers.</p>
        <p>To understand the 1. Test the model in other cities;
relationships between 2. Analyze neighbourhood characteristic
housing units and their interactive or synergetic impacts on housing
neighbourhoods prices.</p>
      </sec>
      <sec id="sec-7-15">
        <title>To understand the</title>
        <p>thermal requirements of Incorporate additional variables in the model
building occupants
To find the complex
nonlinear relationship
between risk features
and stadium fire risk.</p>
        <p>1. Repeat the experiment with additional data;
2. Explore ways to solve the label imbalance;
3. Increase operational efficiency and reduce
time costs;
1. Test the model in other cities;
2. Quantify the differences among cities;
3. Integrate multi-year data to analyze the
temporal dynamics of the impacts of the urban
environmental elements on housing prices;
4. Repeat the experiment with additional and
improved data.</p>
      </sec>
      <sec id="sec-7-16">
        <title>To explain the impacts of urban environmental elements on housing prices</title>
        <p>A11
[22]</p>
      </sec>
      <sec id="sec-7-17">
        <title>FI, PDPs FI</title>
      </sec>
      <sec id="sec-7-18">
        <title>SHAP</title>
        <p>Real estate
price
estimation</p>
      </sec>
      <sec id="sec-7-19">
        <title>Real estate</title>
        <p>fire loss
prediction</p>
      </sec>
      <sec id="sec-7-20">
        <title>Lab experiment XGBoost Lab experiment IBTEM (Catboost,</title>
      </sec>
      <sec id="sec-7-21">
        <title>ALE plots, FI, PDPs</title>
      </sec>
      <sec id="sec-7-22">
        <title>SHAP Lab experiment XGBoost PFI</title>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Baur</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosenfelder</surname>
            <given-names>M.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Lutz</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          “
          <article-title>Automated real estate valuation with machine learning models using property descriptions</article-title>
          ,”
          <source>Expert Systems with Applications</source>
          , vol.
          <volume>213</volume>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          , Mar.
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Belmiro</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silveira Neto</surname>
            <given-names>R.D.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barros</surname>
            <given-names>A.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Ospina</surname>
            <given-names>R.</given-names>
          </string-name>
          , “
          <article-title>Understanding the land use intensity of residential buildings in Brazil: An ensemble machine learning approach</article-title>
          ,”
          <source>Habitat International</source>
          , vol.
          <volume>139</volume>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          , Sep.
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Chen</surname>
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yao</surname>
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            <given-names>X.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Chi</surname>
            <given-names>T.</given-names>
          </string-name>
          , “
          <article-title>Measuring impacts of urban environmental elements on housing prices based on multisource data - a case study of Shanghai, China</article-title>
          ,”
          <source>International Journal of Geo-Information (ISPRS)</source>
          , vol.
          <volume>9</volume>
          (no.
          <issue>2</issue>
          ), pp.
          <fpage>1</fpage>
          -
          <lpage>23</lpage>
          , Feb.
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Chen</surname>
            <given-names>T.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Guestrin</surname>
            <given-names>C.</given-names>
          </string-name>
          , “
          <article-title>XGBoost: A Scalable Tree Boosting System</article-title>
          ,”
          <source>in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'16)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>785</fpage>
          -
          <lpage>794</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Deppner</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>von Ahlefeldt-Dehn</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beracha</surname>
            <given-names>E.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Schaefers</surname>
            <given-names>W.</given-names>
          </string-name>
          , “
          <article-title>Boosting the Accuracy of Commercial Real Estate Appraisals: An Interpretable Machine Learning Approach</article-title>
          ,”
          <source>Journal of Real Estate Finance and Economics</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          , Mar.
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Dou</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gu</surname>
            <given-names>Y.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Fan</surname>
            <given-names>H.</given-names>
          </string-name>
          , “
          <article-title>Incorporating neighborhoods with explainable artificial intelligence for modeling fine-scale housing prices</article-title>
          ,”
          <source>Applied Geography</source>
          , vol.
          <volume>158</volume>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          , Sep.
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Ho</surname>
            <given-names>T.K.</given-names>
          </string-name>
          , “
          <article-title>Random Decision Forests</article-title>
          ,”
          <source>in Proceedings of the 3rd International Conference on Document Analysis and Recognition (ICDAR'95)</source>
          ,
          <year>1995</year>
          , pp.
          <fpage>278</fpage>
          -
          <lpage>282</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Iban</surname>
            <given-names>M.C.</given-names>
          </string-name>
          , “
          <article-title>An explainable model for the mass appraisal of residences: The application of tree-based Machine Learning algorithms and interpretation of value determinants</article-title>
          ,”
          <source>Habitat International</source>
          , vol.
          <volume>128</volume>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          , Oct.
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Kang</surname>
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peng</surname>
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rao</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duarte</surname>
            <given-names>F.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Ratti</surname>
            <given-names>C.</given-names>
          </string-name>
          , “
          <article-title>Understanding house price appreciation using multi-source big geo-data and machine learning</article-title>
          ,”
          <source>Land Use Policy</source>
          , vol.
          <volume>111</volume>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          , Dec.
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Karamanou</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalampokis</surname>
            <given-names>E.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Tarabanis</surname>
            <given-names>K.</given-names>
          </string-name>
          , “
          <article-title>Linked Open Government Data to Predict and Explain House Prices: The Case of Scottish Statistics Portal</article-title>
          ,”
          <source>Big Data Research</source>
          , vol.
          <volume>30</volume>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          , Nov.
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Ke</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meng</surname>
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finley</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ye</surname>
            <given-names>Q.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Liu</surname>
            <given-names>T.Y.</given-names>
          </string-name>
          , “
          <article-title>LightGBM: A Highly Efficient Gradient Boosting Decision Tree</article-title>
          ,”
          <source>in Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS'17)</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>3148</fpage>
          -
          <lpage>3156</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Kitchenham</surname>
            <given-names>B.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Brereton</surname>
            <given-names>P.</given-names>
          </string-name>
          , “
          <article-title>A systematic review of systematic review process research in software engineering</article-title>
          ,”
          <source>Information and Software Technology</source>
          , vol.
          <volume>55</volume>
          (no.
          <issue>12</issue>
          ), pp.
          <fpage>2049</fpage>
          -
          <lpage>2075</lpage>
          , Dec.
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Lenaers</surname>
            <given-names>I.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>De Moor</surname>
            <given-names>L.</given-names>
          </string-name>
          , “
          <article-title>Exploring XAI techniques for enhancing model transparency and interpretability in real estate rent prediction: A comparative study</article-title>
          ,”
          <source>Finance Research Letters</source>
          , vol.
          <volume>58</volume>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          , Dec.
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Levantesi</surname>
            <given-names>S.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Piscopo</surname>
            <given-names>G.</given-names>
          </string-name>
          , “
          <article-title>The Importance of Economic Variables on London Real Estate Market: A Random Forest Approach</article-title>
          ,”
          <source>Risks</source>
          , vol.
          <volume>8</volume>
          (no.
          <issue>4</issue>
          ), pp.
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          , Dec.
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>