<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Data Quality Challenges in Multimodal Tourism Recommender Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zehui Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wolfram Höpken</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dietmar Jannach</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Applied Sciences Ravensburg-Weingarten</institution>
          ,
          <addr-line>Doggenriedstrasse, Weingarten, 88250</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Klagenfurt</institution>
          ,
          <addr-line>Universitätsstraße 65-67, Klagenfurt am Wörthersee, 9020</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>Modern recommender systems increasingly consider diverse data modalities to enhance model performance. However, the quality of the underlying multimodal data (e.g., data from social media platforms) has so far received limited attention. Focusing on tourism recommendation, we examine real-world challenges encountered when using datasets from Yelp and Instagram. Issues include noise from fake reviews, hallucinated content from large language models (LLMs), modality contradictions, and demographic bias. We outline future directions such as quality-aware fusion, structured and controlled LLM-based content generation to reduce hallucinations, and user group differentiation to improve the robustness and reliability of multimodal recommender systems.</p>
      </abstract>
      <kwd-group>
        <kwd>Multimodal Recommendation</kwd>
        <kwd>Data Quality</kwd>
        <kwd>Tourism</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Modality Fusion</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Motivation</title>
      <p>
        As a key application domain of recommender systems, tourism recommendation involves tasks such as
providing personalized suggestions for transportation, accommodation, and points-of-interest (POIs) [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1,
2, 3</xref>
        ]. Recently, researchers have increasingly explored the use of diverse information types to improve
model performance [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]. Beyond check-in records that incorporate timestamps and geographic
coordinates, additional sources such as POI profiles (e.g., textual attributes), social relationships (e.g.,
friendship networks), user feedback (e.g., reviews), and visual content (e.g., photos) have been integrated
through representation learning, attention-based neural architectures, or late fusion schemes [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>Despite the growing reliance on such heterogeneous data sources, the quality of multimodal
information remains largely underexplored. When data quality is not examined, noise, bias, and contradictions can propagate
into the recommendation process. In the following, we draw on findings from real-world datasets to
illustrate the challenges that arise when working with imperfect, incomplete, or biased multimodal
data in tourism recommender systems.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Challenges in Multimodal Data Quality</title>
      <p>
        Recent investigations in tourism-related user behavior analysis and next-POI recommendation have
revealed several fundamental issues regarding the quality of multimodal data. Based on two real-world
datasets from [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ], namely the Multimodal Yelp Dataset and an Instagram dataset collected from the
Lake Constance region in Germany, we summarize the key challenges as follows.
      </p>
      <sec id="sec-2-1">
        <title>Noise in User-Generated and Model-Generated Content</title>
        <p>
          Real-world datasets often contain noisy information that can significantly degrade the effectiveness of recommendation models. For
example, the multimodal Yelp dataset is constructed from user-generated reviews and check-in records
on the Yelp platform. However, some of these reviews originate from fake or automated accounts that
produce fabricated interactions, leading to distorted representations of businesses. Prior work [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] has
tackled this issue by analyzing behavioral patterns to identify non-human users. Suspicious accounts
are first flagged based on unusually high posting frequency compared to normal user activity. These
flagged accounts are then further examined for implausible movement speeds between consecutive
check-ins and eventually removed. In the Instagram dataset, burst postings (multiple posts within a few
minutes at the same location) are also observed. Retaining such bursts does not help in analyzing meaningful
sequential behavior patterns; instead, it dilutes the signal of genuine user trajectories
and skews the statistical distribution of activities.
        </p>
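        <p>The two-stage account filter and the burst handling described above can be sketched in Python as follows. This is a minimal illustration, not the procedure implemented in [<xref ref-type="bibr" rid="ref7">7</xref>]: the tuple layout (user_id, timestamp_s, lat, lon, poi_id), the frequency rule (more than ten times the median user's check-in count), the 300 km/h speed limit, and the five-minute burst window are hypothetical choices that would need tuning on real data.</p>
        <preformat preformat-type="code">
```python
import math
from collections import Counter, defaultdict
from statistics import median

# Hypothetical record layout: (user_id, timestamp_s, lat, lon, poi_id).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def flag_high_frequency(checkins, factor=10.0):
    """Stage 1: flag users with far more check-ins than the median user."""
    counts = Counter(c[0] for c in checkins)
    if not counts:
        return set()
    typical = median(counts.values())
    return {user for user, n in counts.items() if n > factor * typical}


def has_implausible_speed(user_checkins, max_kmh=300.0):
    """Stage 2: True if two consecutive check-ins imply travel faster than max_kmh."""
    ordered = sorted(user_checkins, key=lambda c: c[1])
    for prev, cur in zip(ordered, ordered[1:]):
        km = haversine_km(prev[2], prev[3], cur[2], cur[3])
        hours = (cur[1] - prev[1]) / 3600.0
        if km / max(hours, 1e-9) > max_kmh:  # guard against zero elapsed time
            return True
    return False


def remove_suspected_bots(checkins, factor=10.0, max_kmh=300.0):
    """Drop all check-ins of users who fail both stages."""
    flagged = flag_high_frequency(checkins, factor)
    by_user = defaultdict(list)
    for c in checkins:
        if c[0] in flagged:
            by_user[c[0]].append(c)
    bots = {u for u, cs in by_user.items() if has_implausible_speed(cs, max_kmh)}
    return [c for c in checkins if c[0] not in bots]


def collapse_bursts(posts, window_s=300):
    """Keep one post per burst: consecutive posts by the same user at the same
    POI, each within window_s seconds of the previous one, count as one burst."""
    kept, last_seen = [], {}
    for post in sorted(posts, key=lambda p: (p[0], p[1])):
        user, ts, poi = post[0], post[1], post[4]
        prev = last_seen.get(user)
        last_seen[user] = (ts, poi)
        if prev is not None and prev[1] == poi and window_s >= ts - prev[0]:
            continue  # continuation of a burst
        kept.append(post)
    return kept
```
        </preformat>
        <p>Because only flagged accounts are checked for implausible speeds, heavy but physically plausible users stay in the data; an account is removed only if it fails both tests. Collapsing a burst to its first post keeps the visit itself while preventing a single stop from dominating a user's trajectory.</p>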
        <p>
          In addition, with the increasing use of LLMs to process textual and visual modalities, a new form of
semantic noise has emerged. Hallucinated or irrelevant content generated by LLMs can significantly
degrade the quality of downstream embeddings and harm recommendation performance. One approach
to address this issue is to use structured summaries instead of open-ended generation. By enforcing
predefined fields in the summaries, the process becomes more controllable and less susceptible to
hallucination. In particular, for visual content, summaries are generated selectively, only for images that
contain clear and interpretable visual cues [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Ambiguous images are filtered out to avoid introducing
additional noise.
        </p>
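        <p>The structured-summary strategy can be illustrated with a validator that accepts an LLM response only if it fills a fixed set of fields with permitted values, plus a gate that summarizes only sufficiently clear images. In this sketch the schema (fields and vocabularies), the 120-character cap, the clarity score with its 0.7 threshold, and the summarize and clarity callables are hypothetical placeholders; the LLM call itself is not shown, and this is not the schema or prompt used in [<xref ref-type="bibr" rid="ref7">7</xref>].</p>
        <preformat preformat-type="code">
```python
import json

# Hypothetical schema: field -> allowed values; None marks a short free-text field.
POI_SCHEMA = {
    "category": {"restaurant", "cafe", "museum", "park", "beach", "other"},
    "ambience": {"quiet", "lively", "mixed", "unknown"},
    "price_level": {"low", "medium", "high", "unknown"},
    "highlights": None,
}
MAX_FREE_TEXT_CHARS = 120  # hypothetical cap to discourage open-ended generation


def validate_summary(raw_output, schema=POI_SCHEMA):
    """Return the parsed summary if it fills exactly the predefined fields
    with permitted values; otherwise return None so the output is discarded."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return None  # not valid JSON at all
    if not isinstance(data, dict) or set(data) != set(schema):
        return None  # missing fields, or extra fields that were never requested
    for field, allowed in schema.items():
        value = data[field]
        if not isinstance(value, str):
            return None
        if allowed is None and len(value) > MAX_FREE_TEXT_CHARS:
            return None
        if allowed is not None and value not in allowed:
            return None  # value outside the controlled vocabulary
    return data


def summarize_images(image_ids, summarize, clarity, min_clarity=0.7):
    """Summarize only images whose clarity score reaches min_clarity.

    summarize: hypothetical callable, image_id -> raw LLM output (JSON string)
    clarity:   hypothetical callable, image_id -> score in [0, 1]
    """
    results = {}
    for image_id in image_ids:
        if clarity(image_id) >= min_clarity:  # ambiguous images are skipped
            summary = validate_summary(summarize(image_id))
            if summary is not None:
                results[image_id] = summary
    return results
```
        </preformat>
        <p>Rejecting a malformed response outright, rather than repairing it, trades some coverage for reliability: a POI or image without a valid summary simply contributes no generated text to the downstream embedding.</p>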
      </sec>
      <sec id="sec-2-2">
        <title>Demographic Bias in User Behavior and Content Distribution</title>
        <p>
          Demographic bias occurs when there is a mismatch between the distribution of users in training data and the characteristics of the
target population [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. A typical case in tourism is the behavioral divergence between tourists and locals.
For example, Instagram posts collected from the Lake Constance region include contributions from both
travelers and nearby residents. If such data are used indiscriminately, the recommendation model may
overfit to local preferences that are irrelevant for tourist-oriented next-POI suggestions.
        </p>
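        <p>One simple way to separate the two groups before training is a duration heuristic: a user whose posts in the region span more than a month is treated as a resident, otherwise as a visitor. The rule and its 30-day threshold are assumptions made for illustration (the cluster-based and machine-learning approaches in [14, 15] are more elaborate), and posts are assumed to be (user_id, timestamp_s, lat, lon, poi_id) tuples.</p>
        <preformat preformat-type="code">
```python
DAY_S = 86_400  # seconds per day


def split_tourists_locals(posts, local_min_span_days=30):
    """Label each user 'local' if the time between their first and last post
    in the region exceeds local_min_span_days, else 'tourist'.

    posts: iterable of (user_id, timestamp_s, lat, lon, poi_id) tuples (assumed layout).
    The 30-day threshold is a hypothetical default, not a value from the text.
    """
    first, last = {}, {}
    for user, ts, *_ in posts:
        first[user] = min(first.get(user, ts), ts)
        last[user] = max(last.get(user, ts), ts)
    return {
        user: "local" if (last[user] - first[user]) / DAY_S > local_min_span_days else "tourist"
        for user in first
    }


def tourist_only(posts, labels):
    """Keep only posts by users labelled as tourists, e.g., for tourist-oriented next-POI training."""
    return [p for p in posts if labels[p[0]] == "tourist"]
```
        </preformat>
        <p>Residents who post only occasionally, or tourists on long stays, would be misclassified by such a rule, so the resulting labels would need validation, for example against profile information.</p>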
        <p>Demographic bias can also arise from population imbalances in content creation. For instance,
Instagram content may primarily come from younger users, whereas recommendations targeting
regions like Lake Constance are often aimed at an older audience. This mismatch can reduce the
relevance and effectiveness of recommendations. Moreover, the majority of media originates from the
DACH region (Germany, Switzerland, Austria), reflecting cultural and behavioral patterns specific to
this geographic area, which limits the generalizability of the dataset to broader international contexts.
One possible approach to mitigate this issue is to further analyze the user profiles to improve audience
segmentation and better align the system output with the preferences of the target user group.
</p>
      </sec>
      <sec id="sec-2-3">
        <title>Semantic Contradictions Across Modalities</title>
        <p>Contradictions refer to semantic inconsistencies
between content from different modalities. In the current datasets, such inconsistencies mainly stem
from user-generated content (e.g., misaligned captions, incorrect location tags, irrelevant hashtags) or
from business-provided information that is outdated or inaccurate. A common example arises in the
multimodal Yelp dataset, where an LLM-generated summary based on the majority of user reviews
might describe a restaurant as “clean and quiet”, while user-uploaded photos reveal a cluttered or noisy
environment. Similar inconsistencies can also be found in Instagram posts, where images, captions, and
location tags do not always align. A photo may be taken in one city but tagged to another, or hashtags
may be added that are unrelated to the actual content. Such contradictions reduce the coherence and
reliability of the fused multimodal representation, posing challenges for both modality alignment and
inference stability in downstream models.</p>
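        <p>Two of the contradiction types described above can be screened with simple consistency checks: a low cosine similarity between a text embedding (a caption or an LLM-generated summary) and the corresponding image embedding, and a large distance between a photo's GPS coordinates and its location tag. The sketch assumes both embeddings come from a shared text-image embedding space (for example a CLIP-style model, not shown); the dictionary keys and the 0.2 similarity and 25 km distance thresholds are hypothetical and would need calibration on real data.</p>
        <preformat preformat-type="code">
```python
import math

EARTH_RADIUS_KM = 6371.0


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def flag_contradictions(post, min_similarity=0.2, max_tag_km=25.0):
    """Return the reasons why a post's modalities disagree (empty list if none).

    post uses hypothetical keys:
      'image_emb', 'text_emb': vectors from a shared text-image embedding space
      'gps': (lat, lon) from photo metadata, or None
      'tag_location': (lat, lon) of the user-chosen location tag, or None
    """
    reasons = []
    similarity = cosine(post["image_emb"], post["text_emb"])
    if min_similarity > similarity:  # similarity below threshold
        reasons.append("text-image mismatch")
    gps, tag = post.get("gps"), post.get("tag_location")
    if gps and tag and haversine_km(*gps, *tag) > max_tag_km:
        reasons.append("location tag far from photo GPS")
    return reasons
```
        </preformat>
        <p>Flagged posts need not be discarded: the reasons could instead be used to lower the weight of the disagreeing modality during fusion.</p>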
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Possible Solutions &amp; Future Directions</title>
      <p>Given such challenges, we propose several directions for future research: (1) Future models should
adopt quality-aware multimodal learning, where the quality of each modality is explicitly modeled during
training. This way, low-quality modalities can be dynamically downweighted in the modality fusion
process [9, 10]; (2) Current fusion strategies assume consistent and complete input across modalities,
which rarely holds in real-world scenarios. Future research should develop robust fusion mechanisms that
tolerate missing, noisy, or redundant inputs [11, 12, 13]. This includes preserving
unique, modality-specific information, reducing cross-modal redundancy, and enhancing synergistic signals to better
adapt to imperfect data; (3) Future studies should also focus on modeling user intent divergence across
different groups, such as tourists versus locals. Identifying and representing such behavioral differences
during preprocessing or representation learning can mitigate distributional mismatches and improve
recommendation relevance [14, 15]; (4) In addition to these directions, we observe that videos (e.g., ones
posted by social media users or 360° immersive marketing videos) [16, 17] have not been leveraged much
in the literature for improved tourism recommendations. We see the incorporation of these additional
visual signals as a promising area for future work; (5) Given the increasing role of LLMs in multimodal
recommendation, we advocate for more research on structured and controlled LLM-based generation to
reduce hallucinations, thereby enhancing the reliability of generated content for downstream use [18].</p>
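      <p>Directions (1) and (2) can be illustrated with a minimal late-fusion sketch: each available modality embedding receives a softmax weight computed from a per-modality quality score, and missing modalities are masked out before the weights are normalized. This is not the method of [9] or [10], which learn quality estimates during training; here the quality scores, the shared embedding dimension, and the temperature parameter are all assumed inputs.</p>
      <preformat preformat-type="code">
```python
import math


def quality_weighted_fusion(embeddings, quality, temperature=1.0):
    """Late fusion with softmax weights over per-modality quality scores.

    embeddings:  dict modality -> list of floats, or None if the modality is missing
    quality:     dict modality -> float; higher means more trustworthy (assumed given)
    temperature: lower values concentrate weight on the best modality
    Returns (fused_vector, weights); missing modalities receive no weight.
    """
    present = [m for m, e in embeddings.items() if e is not None]
    if not present:
        raise ValueError("at least one modality must be available")
    scores = [quality[m] / temperature for m in present]
    top = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = {m: x / total for m, x in zip(present, exps)}

    dim = len(embeddings[present[0]])  # all embeddings assumed to share one dimension
    fused = [0.0] * dim
    for m in present:
        for i, value in enumerate(embeddings[m]):
            fused[i] += weights[m] * value
    return fused, weights
```
      </preformat>
      <p>A low temperature puts almost all weight on the highest-quality modality, while a high temperature approaches a plain average; the quality scores themselves could come, for instance, from noise filters or cross-modal consistency checks applied upstream.</p>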
      <p>In summary, integrating data quality considerations into both upstream preprocessing and
downstream modeling is essential for building trustworthy, personalized, and data-aligned multimodal
recommender systems.</p>
    </sec>
    <sec id="sec-4">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
      <ref-list>
        <ref id="ref9">
          <mixed-citation>[9] Q. Zhang, H. Wu, C. Zhang, Q. Hu, H. Fu, J. T. Zhou, X. Peng, Provable dynamic fusion for low-quality multimodal data, in: Proceedings of the 40th International Conference on Machine Learning, ICML’23, JMLR.org, 2023.</mixed-citation>
        </ref>
        <ref id="ref10">
          <mixed-citation>[10] S. Wei, Y. Luo, Y. Wang, C. Luo, Robust multimodal learning via representation decoupling, in: Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part XLII, Springer-Verlag, Berlin, Heidelberg, 2024, pp. 38–54. doi:10.1007/978-3-031-72946-1_3.</mixed-citation>
        </ref>
        <ref id="ref11">
          <mixed-citation>[11] R. Lin, H. Hu, MissModal: Increasing robustness to missing modality in multimodal sentiment analysis, Transactions of the Association for Computational Linguistics 11 (2023) 1686–1702. doi:10.1162/tacl_a_00628.</mixed-citation>
        </ref>
        <ref id="ref12">
          <mixed-citation>[12] Y.-H. H. Tsai, M. Ma, M. Yang, R. Salakhutdinov, L.-P. Morency, Multimodal routing: Improving local and global interpretability of multimodal language analysis, in: B. Webber, T. Cohn, Y. He, Y. Liu (Eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Online, 2020, pp. 1823–1833. URL: https://aclanthology.org/2020.emnlp-main.143/. doi:10.18653/v1/2020.emnlp-main.143.</mixed-citation>
        </ref>
        <ref id="ref13">
          <mixed-citation>[13] C. Xu, Y. Zhang, Z. Guan, W. Zhao, Trusted multi-view learning with label noise, in: Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI ’24, 2024. doi:10.24963/ijcai.2024/582.</mixed-citation>
        </ref>
        <ref id="ref14">
          <mixed-citation>[14] P. Sanchez, L. W. Dietz, Travelers vs. locals: The effect of cluster analysis in point-of-interest recommendation, in: Proceedings of the 30th ACM Conference on User Modeling, Adaptation and Personalization, UMAP ’22, Association for Computing Machinery, New York, NY, USA, 2022, pp. 132–142. doi:10.1145/3503252.3531320.</mixed-citation>
        </ref>
        <ref id="ref15">
          <mixed-citation>[15] A. Derdouri, T. Osaragi, A machine learning-based approach for classifying tourists and locals using geotagged photos: the case of Tokyo, Information Technology &amp; Tourism 23 (2021) 575–609. doi:10.1007/s40558-021-00208-3.</mixed-citation>
        </ref>
        <ref id="ref16">
          <mixed-citation>[16] M. Casillo, F. Colace, A. Lorusso, D. Santaniello, C. Valentino, Integrating physical and virtual experiences in cultural tourism: An adaptive multimodal recommender system, IEEE Access 13 (2025) 28353–28368. doi:10.1109/ACCESS.2025.3539205.</mixed-citation>
        </ref>
        <ref id="ref17">
          <mixed-citation>[17] L. Argyriou, D. Economou, V. Bouki, Design methodology for 360° immersive video applications: the case study of a cultural heritage virtual tour, Personal and Ubiquitous Computing 24 (2020) 843–859. doi:10.1007/s00779-020-01373-8.</mixed-citation>
        </ref>
        <ref id="ref18">
          <mixed-citation>[18] D. King, Z. Shen, N. Subramani, D. S. Weld, I. Beltagy, D. Downey, Don’t say what you don’t know: Improving the consistency of abstractive summarization by constraining beam search, in: A. Bosselut, K. Chandu, K. Dhole, V. Gangal, S. Gehrmann, Y. Jernite, J. Novikova, L. Perez-Beltrachini (Eds.), Proceedings of the Second Workshop on Natural Language Generation, Evaluation, and Metrics (GEM), Association for Computational Linguistics, Abu Dhabi, United Arab Emirates (Hybrid), 2022, pp. 555–571. doi:10.18653/v1/2022.gem-1.51.</mixed-citation>
        </ref>
      </ref-list>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Ricci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fuchs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Gretzel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Höpken</surname>
          </string-name>
          , Recommender Systems in Tourism, Springer International Publishing, Cham,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          . doi:10.1007/978-3-030-05324-6_26-1.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Borràs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moreno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Valls</surname>
          </string-name>
          ,
          <article-title>Intelligent tourism recommender systems: A survey</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>41</volume>
          (
          <year>2014</year>
          )
          <fpage>7370</fpage>
          -
          <lpage>7389</lpage>
          . doi:10.1016/j.eswa.2014.06.007.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Sarkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. R.</given-names>
            <surname>Panigrahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Roy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Pati</surname>
          </string-name>
          ,
          <article-title>Tourism recommendation system: a survey and future research directions</article-title>
          ,
          <source>Multimedia Tools and Applications</source>
          <volume>82</volume>
          (
          <year>2023</year>
          )
          <fpage>8983</fpage>
          -
          <lpage>9027</lpage>
          . doi:10.1007/s11042-022-12167-w.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Höpken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jannach</surname>
          </string-name>
          ,
          <article-title>A survey on point-of-interest recommendations leveraging heterogeneous data</article-title>
          ,
          <source>Information Technology &amp; Tourism</source>
          <volume>27</volume>
          (
          <year>2025</year>
          )
          <fpage>29</fpage>
          -
          <lpage>73</lpage>
          . doi:10.1007/s40558-024-00301-3.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-M.</given-names>
            <surname>Yiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <article-title>A survey on point-of-interest recommendation: Models, architectures, and security</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>37</volume>
          (
          <year>2025</year>
          )
          <fpage>3153</fpage>
          -
          <lpage>3172</lpage>
          . doi:10.1109/TKDE.2025.3551292.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schwarzenbacher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Höpken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fuchs</surname>
          </string-name>
          ,
          <article-title>Do travel destinations meet my expectations? A comparison of tourists' perceptions and destinations' self-presentation through Instagram posts by a convolutional neural network</article-title>
          , in:
          <person-group person-group-type="editor">
            <string-name>
              <given-names>L.</given-names>
              <surname>Nixon</surname>
            </string-name>
            ,
            <string-name>
              <given-names>A.</given-names>
              <surname>Tuomi</surname>
            </string-name>
            ,
            <string-name>
              <given-names>P.</given-names>
              <surname>O'Connor</surname>
            </string-name>
          </person-group>
          (Eds.),
          <source>Information and Communication Technologies in Tourism 2025</source>
          , Springer Nature Switzerland, Cham,
          <year>2025</year>
          , pp.
          <fpage>289</fpage>
          -
          <lpage>299</lpage>
          . doi:10.1007/978-3-031-83705-0_24.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Höpken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jannach</surname>
          </string-name>
          ,
          <article-title>Beyond visit trajectories: Enhancing POI recommendation via LLM-augmented text and image representations</article-title>
          , in:
          <source>Proceedings of the Nineteenth ACM Conference on Recommender Systems</source>
          , RecSys '25, Association for Computing Machinery, New York, NY, USA,
          <year>2025</year>
          , pp.
          <fpage>521</fpage>
          -
          <lpage>526</lpage>
          . doi:10.1145/3705328.3748014.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Neophytou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Stinson</surname>
          </string-name>
          ,
          <article-title>Revisiting popularity and demographic biases in recommender evaluation and effectiveness</article-title>
          , in:
          <source>Advances in Information Retrieval: 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10-14, 2022, Proceedings, Part I</source>
          , Springer-Verlag, Berlin, Heidelberg,
          <year>2022</year>
          , pp.
          <fpage>641</fpage>
          -
          <lpage>654</lpage>
          . doi:10.1007/978-3-030-99736-6_43.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>