<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>International Conference of the Italian Association for Artificial Intelligence, November</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Towards an Evaluation Framework for Indoor Recommender Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alessio Ferrato</string-name>
          <email>alessio.ferrato@uniroma3.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Engineering, Roma Tre University</institution>
          ,
          <addr-line>00146 Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Doctoral project supervised by Carla Limongelli (Roma Tre University) and Giuseppe Sansonetti (Roma Tre University).</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>0</volume>
      <fpage>6</fpage>
      <lpage>09</lpage>
      <abstract>
        <p>Recommender Systems (RSs) play a crucial role in shaping user experiences on the Web, yet their availability is limited when it comes to indoor environments. Indoor RSs face unique challenges, including user localization, privacy concerns, complex spatial layouts, and user adoption. While several evaluation frameworks exist, they are primarily designed for online domains and may not be suitable for indoor recommendations. This paper introduces an evaluation framework tailored for indoor RSs, addressing the scarcity of publicly available datasets and the complexity of model comparison and metric selection. We also emphasize the absence of a suitable dataset for indoor recommendations and propose the integration of a synthetic data generator to facilitate research in this domain. This paper reviews existing evaluation frameworks and identifies their limitations in the context of indoor recommendations, taking a first step toward a specialized framework. Our work aims to bridge the gap between traditional RSs and indoor environments, paving the way for more effective recommendations in physical spaces.</p>
      </abstract>
      <kwd-group>
        <kwd>Recommender Systems</kwd>
        <kwd>Evaluation Framework</kwd>
        <kwd>Indoor Environment</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        A Recommender System (RS) filters the content available in a given scenario and produces
personalized recommendations that help users make decisions. Today these systems are widespread
on the Web and, in many cases, shape the success of the platforms we use every day [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Outside the Web, however, and particularly in indoor environments, we rarely take advantage of
RSs [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ].
      </p>
      <p>
        The limited spread of indoor RSs results from a combination of factors, and this environment
poses some unique challenges compared to its online counterpart [
        <xref ref-type="bibr" rid="ref4">4</xref>
        , 5].
Key points to consider are user localization, privacy issues, complex environments, and,
most importantly, user adoption [6]. In particular, an indoor RS relies mainly on users’ physical
movements and interactions in the environment, which can be very intricate (e.g., multiple rooms,
floors, and different layouts). Moreover, convincing users to adopt this type of system can be
difficult given the amount of information it needs, but building it with anonymity-preserving
techniques can help to overcome this obstacle [6]. Indoor RSs must also face the usual issues
affecting any RS [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], such as cold-start, data sparsity, scalability,
and fairness.
      </p>
      <p>Indoor environments, such as retail stores, museums, and educational institutions, present
distinctive needs, and traditional online RSs may need to be adapted to serve these physical
spaces effectively. Although there are noteworthy works on indoor environments (e.g., see [5, 7]),
to the best of our knowledge there is still no established approach in these settings, owing to
several factors: the lack of publicly available datasets, the difficulty of comparing models, and
the selection of suitable evaluation metrics. To this end, we introduce and discuss an evaluation
framework for indoor RSs with a dedicated synthetic data generation module to simplify research.</p>
      <p>The following section discusses the background by introducing different evaluation
frameworks that could be employed in this domain. Next, we illustrate the research goals. Finally,
we present our framework and its main components, summarizing the work’s contributions
and its possible limitations.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Background</title>
      <p>
        Considering that “offline evaluations are often the first step in conducting evaluations and there
is a logical evolution from offline evaluations, through user studies to online analyses” [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
the development of an evaluation framework plays a key role in helping to reproduce
experiments [8]. Over the years, several evaluation frameworks for RSs have emerged, some of them
theoretical (e.g., see [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]), others freely distributed under license<sup>1</sup>, but to our knowledge, none of
them was explicitly built for evaluating indoor RSs.
      </p>
      <p>A review of the literature revealed five frameworks, summarized in Table 1, which could
be used and adapted to our scenario: DaisyRec [9], Mab2Rec<sup>2</sup>, RecBole [10], ReChorus [11],
and RecPack [12].</p>
      <p>DaisyRec, RecPack, and RecBole are created explicitly for Top-K recommendation but support
only one type of input, a matrix of user-item interactions. For this reason, these systems
are not suitable for our application domain, since no information about the placement of items
in the environment can be used. In contrast, Mab2Rec accepts a more complex representation
as input but only implements models based on multi-armed bandits, thus
limiting evaluation with different recommendation techniques. RecBole, on the other hand, is a
vast framework that allows the evaluation of many recommendation tasks. However, even though
more than 64 models are implemented for context-aware/session/sequential recommendation,
none of them is designed to suggest items specifically in an indoor environment.</p>
      <p>
        In addition, all these frameworks already provide and simplify the loading and preprocessing
of the most popular datasets in the literature. Unfortunately, none of these datasets comes from
an indoor scenario. Finally, the frameworks do not implement any mechanism for generating synthetic
data, even though it would be useful because “in cases where a natural real-world dataset that would be
sufficiently suitable for developing, training, and evaluating a RS is not available, a synthesized
dataset may be used” [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p><sup>1</sup> github.com/ACMRecSys/recsys-evaluation-frameworks (last access: February 25, 2024)</p>
      <p><sup>2</sup> github.com/fidelity/mab2rec (last access: February 25, 2024)</p>
    </sec>
    <sec id="sec-4">
      <title>3. Research Goals</title>
      <p>Given the above, the research goals can be summarized in three points:</p>
      <list list-type="order">
        <list-item><p>Implement a synthetic data generator that starts from a representation of the indoor environment (i.e., items and their location).</p></list-item>
        <list-item><p>Implement models used in indoor recommendation (e.g., [5, 7]).</p></list-item>
        <list-item><p>Identify metrics to be used for evaluation (e.g., crowdedness, coverage, popularity).</p></list-item>
      </list>
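      <p>As an illustration of the third goal, the following minimal Python sketch computes three list-based quantities on toy data. The function names, the toy data, and the exact definitions (in particular, crowding measured as the largest number of users directed to the same item) are assumptions made for this example; they are not definitions taken from the framework.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def item_coverage(recommendations, catalog):
    """Fraction of catalog items that appear in at least one recommendation list."""
    recommended = {item for items in recommendations.values() for item in items}
    return len(recommended.intersection(catalog)) / len(catalog)


def average_popularity(recommendations, interaction_counts):
    """Mean number of past interactions over all recommended items."""
    scores = [
        interaction_counts.get(item, 0)
        for items in recommendations.values()
        for item in items
    ]
    return sum(scores) / len(scores)


def peak_crowding(recommendations):
    """Largest number of users directed to the same item (a crude crowding proxy)."""
    counts = Counter(
        item for items in recommendations.values() for item in set(items)
    )
    return max(counts.values())


# Hypothetical toy data: four exhibits, three visitors, top-2 lists.
catalog = ["a", "b", "c", "d"]
interaction_counts = {"a": 10, "b": 4, "c": 1, "d": 0}
recommendations = {"u1": ["a", "b"], "u2": ["a", "c"], "u3": ["a", "b"]}

coverage = item_coverage(recommendations, catalog)
popularity = average_popularity(recommendations, interaction_counts)
crowding = peak_crowding(recommendations)
```
]]></preformat>
      <p>In this toy input the most popular exhibit appears in every list, which is the kind of concentration that coverage and crowding measures are meant to expose.</p>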
    </sec>
    <sec id="sec-5">
      <title>4. Framework Overview</title>
      <p>In every evaluation framework, we can identify three characteristic components: data module,
recommendation module, and evaluation module.</p>
      <p>The first element is the module in charge of data loading and processing. In our case, we want
to enrich this module with a component that generates synthetic data, to compensate for the lack
of indoor datasets in the literature. A generation strategy is presented in [13] where, starting from
a limited set of real data, the authors generate synthetic datasets to train context-aware RSs.
Different strategies can be used to generate the data, ranging from simple
simulations (e.g., random walks) to elaborate scenarios with different user preference profiles
(e.g., visiting styles in museums [14]), different user flows (e.g., Google Popular Times<sup>4</sup>), or added
dwell time [15].</p>
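      <p>The simplest of these strategies, a random walk, can be sketched as follows. This is a minimal illustration under stated assumptions, not the generator of the framework: the environment is assumed to be a graph whose nodes are items and whose edges connect physically reachable items, and the names <monospace>floor_plan</monospace>, <monospace>random_walk_visits</monospace>, and <monospace>generate_dataset</monospace> are hypothetical.</p>
      <preformat><![CDATA[
```python
import random


def random_walk_visits(adjacency, start, n_steps, rng):
    """Simulate one visitor moving between adjacent items for up to n_steps moves."""
    path = [start]
    for _ in range(n_steps):
        neighbours = adjacency[path[-1]]
        if not neighbours:  # dead end: the visit stops here
            break
        path.append(rng.choice(neighbours))
    return path


def generate_dataset(adjacency, entrance, n_users, n_steps, seed=0):
    """Return (user_id, position_in_visit, item_id) rows for simulated visitors."""
    rng = random.Random(seed)  # seeded so that experiments are reproducible
    rows = []
    for user in range(n_users):
        path = random_walk_visits(adjacency, entrance, n_steps, rng)
        rows.extend((user, step, item) for step, item in enumerate(path))
    return rows


# Hypothetical floor plan: each item lists the items reachable from it.
floor_plan = {
    "entrance": ["painting_1", "statue_1"],
    "painting_1": ["entrance", "painting_2"],
    "painting_2": ["painting_1", "statue_1"],
    "statue_1": ["entrance", "painting_2"],
}

dataset = generate_dataset(floor_plan, "entrance", n_users=5, n_steps=4)
```
]]></preformat>
      <p>More elaborate scenarios, such as preference profiles or dwell time, would replace the uniform choice among neighbours with a weighted one or attach a duration to each row.</p>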
      <p><sup>4</sup> blog.google/products/maps/maps101 (last access: February 25, 2024)</p>
      <p>Another central element is the recommendation module, where all the models available for
training are implemented. This part differs between frameworks in the number of models
implemented and in the recommendation task. In our framework, we will implement the models
found in the literature for offline recommendation (e.g., see [5]). However, we will also add
traditional models to test which approach is the most suitable in this domain.</p>
      <p>
        Finally, the evaluation module is used to evaluate the RSs through classical metrics (for a
complete list please see [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]), paying particular attention to the so-called fairness metrics [16] when
synthetic data are used, since real data (e.g., ratings) are then not available to compute some results
(e.g., prediction accuracy).
      </p>
      <p>The system will be modular enough that writing a configuration file suffices to start an
experiment, which makes it easy for new researchers to approach this topic.</p>
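      <p>A configuration-driven experiment of this kind could look like the sketch below. The JSON layout, the field names (<monospace>dataset</monospace>, <monospace>model</monospace>, <monospace>metrics</monospace>, <monospace>top_k</monospace>), and the values are all assumptions chosen for illustration; the actual configuration format of the framework is not specified here.</p>
      <preformat><![CDATA[
```python
import json
from dataclasses import dataclass, field, fields


@dataclass
class ExperimentConfig:
    """One experiment: which data, which model, which metrics."""
    dataset: str
    model: str
    metrics: list = field(default_factory=list)
    top_k: int = 10


def load_config(text):
    """Parse a JSON experiment description and reject unknown keys early."""
    raw = json.loads(text)
    allowed = {f.name for f in fields(ExperimentConfig)}
    unknown = set(raw) - allowed
    if unknown:
        raise ValueError(f"unknown configuration keys: {sorted(unknown)}")
    return ExperimentConfig(**raw)


# Hypothetical configuration; every name below is illustrative.
CONFIG_TEXT = """
{
  "dataset": "synthetic_random_walk",
  "model": "most_popular",
  "metrics": ["coverage", "popularity"],
  "top_k": 5
}
"""

config = load_config(CONFIG_TEXT)
```
]]></preformat>
      <p>Rejecting unknown keys at load time means that a misspelled option fails immediately instead of silently falling back to a default.</p>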
    </sec>
    <sec id="sec-6">
      <title>5. Conclusion</title>
      <p>In conclusion, the evaluation of indoor RSs is a challenging task, primarily due to the scarcity
of available datasets and models. Recognizing this limitation, we aim to fill this gap in an
underexplored domain by developing an evaluation framework with domain-specific features.
In this initial contribution, we outlined the research activities planned for this purpose.</p>
      <ref-list>
        <ref id="ref5">
          <mixed-citation>[5] M. Del Carmen Rodríguez-Hernández, S. Ilarri, R. Hermoso, R. Trillo-Lado, Towards trajectory-based recommendations in museums: evaluation of strategies using mixed synthetic and real data, Procedia computer science 113 (2017) 234–239.</mixed-citation>
        </ref>
        <ref id="ref6">
          <mixed-citation>[6] A. Friedman, B. P. Knijnenburg, K. Vanhecke, L. Martens, S. Berkovsky, Privacy aspects of recommender systems, Recommender systems handbook (2015) 649–688.</mixed-citation>
        </ref>
        <ref id="ref7">
          <mixed-citation>[7] J. Shin, C. Lee, C. Lim, Y. Shin, J. Lim, Recommendation in offline stores: A gamification approach for learning the spatiotemporal representation of indoor shopping, in: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’22, Association for Computing Machinery, New York, NY, USA, 2022, pp. 3878–3888. URL: https://doi.org/10.1145/3534678.3539199. doi:10.1145/3534678.3539199.</mixed-citation>
        </ref>
        <ref id="ref8">
          <mixed-citation>[8] M. Ferrari Dacrema, S. Boglio, P. Cremonesi, D. Jannach, A troubling analysis of reproducibility and progress in recommender systems research, ACM Transactions on Information Systems (TOIS) 39 (2021) 1–49.</mixed-citation>
        </ref>
        <ref id="ref9">
          <mixed-citation>[9] Z. Sun, H. Fang, J. Yang, X. Qu, H. Liu, D. Yu, Y.-S. Ong, J. Zhang, Daisyrec 2.0: Benchmarking recommendation for rigorous evaluation, IEEE Transactions on Pattern Analysis and Machine Intelligence (2022).</mixed-citation>
        </ref>
        <ref id="ref10">
          <mixed-citation>[10] W. X. Zhao, Y. Hou, X. Pan, C. Yang, Z. Zhang, Z. Lin, J. Zhang, S. Bian, J. Tang, W. Sun, et al., Recbole 2.0: Towards a more up-to-date recommendation library, in: Proceedings of the 31st ACM International Conference on Information &amp; Knowledge Management, 2022, pp. 4722–4726.</mixed-citation>
        </ref>
        <ref id="ref11">
          <mixed-citation>[11] C. Wang, M. Zhang, W. Ma, Y. Liu, S. Ma, Make it a chorus: knowledge- and time-aware item modeling for sequential recommendation, in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 109–118.</mixed-citation>
        </ref>
        <ref id="ref12">
          <mixed-citation>[12] L. Michiels, R. Verachtert, B. Goethals, Recpack: An(other) experimentation toolkit for top-n recommendation using implicit feedback data, in: Proceedings of the 16th ACM Conference on Recommender Systems, RecSys ’22, Association for Computing Machinery, New York, NY, USA, 2022, pp. 648–651. URL: https://doi.org/10.1145/3523227.3551472. doi:10.1145/3523227.3551472.</mixed-citation>
        </ref>
        <ref id="ref13">
          <mixed-citation>[13] M. Del Carmen Rodríguez-Hernández, S. Ilarri, R. Hermoso, R. Trillo-Lado, Datagencars: A generator of synthetic data for the evaluation of context-aware recommendation systems, Pervasive and Mobile Computing 38 (2017) 516–541.</mixed-citation>
        </ref>
        <ref id="ref14">
          <mixed-citation>[14] M. Zancanaro, T. Kuflik, Z. Boger, D. Goren-Bar, D. Goldwasser, Analyzing museum visitors’ behavior patterns, in: User Modeling 2007: 11th International Conference, UM 2007, Corfu, Greece, July 25-29, 2007. Proceedings 11, Springer, 2007, pp. 238–246.</mixed-citation>
        </ref>
        <ref id="ref15">
          <mixed-citation>[15] A. Ferrato, C. Limongelli, M. Mezzini, G. Sansonetti, A deep learning-based approach to model museum visitors, in: Proceedings of the ACM Intelligent User Interfaces workshops, 2022, pp. 217–221.</mixed-citation>
        </ref>
        <ref id="ref16">
          <mixed-citation>[16] Y. Deldjoo, D. Jannach, A. Bellogin, A. Difonzo, D. Zanzonelli, Fairness in recommender systems: research landscape and future directions, 2023.</mixed-citation>
        </ref>
      </ref-list>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Zangerle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bauer</surname>
          </string-name>
          ,
          <article-title>Evaluating recommender systems: survey and framework</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>55</volume>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F. E.</given-names>
            <surname>Walter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Battiston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yildirim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Schweitzer</surname>
          </string-name>
          ,
          <article-title>Moving recommender systems from on-line commerce to retail stores</article-title>
          ,
          <source>Information Systems and e-Business Management</source>
          <volume>10</volume>
          (
          <year>2012</year>
          )
          <fpage>367</fpage>
          -
          <lpage>393</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ferrato</surname>
          </string-name>
          ,
          <article-title>Challenges for anonymous session-based recommender systems in indoor environments</article-title>
          ,
          <source>in: Proceedings of the 17th ACM Conference on Recommender Systems</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>1339</fpage>
          -
          <lpage>1341</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L. M. F.</given-names>
            <surname>Otani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. F.</given-names>
            <surname>de Santana</surname>
          </string-name>
          ,
          <article-title>Practical challenges in indoor mobile recommendation</article-title>
          ,
          <source>arXiv preprint arXiv:2211.15810</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>