<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Sequential Recommenders and Multimodal Inputs: Mitigating Data Quality Issues in Industry-Scale Recommenders</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jan Malte Lichtenberg</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matteo Rufini</string-name>
        </contrib>
        <aff>Albatross AI</aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>Poor data quality remains a key bottleneck for advancing recommender systems in industrial settings. In this talk, we argue that sequential transformer-based recommenders, particularly ID-based architectures such as SASRec [1], are more robust to several forms of data quality issues than traditional learning-to-rank approaches. However, they suffer acutely from item cold-start, which we treat as a missing-modality problem. We then discuss how multimodal content embeddings can address this challenge and present DenseRec [2], a simple but effective method to mitigate item cold-start by integrating dense content embeddings into sequential models.</p>
      </abstract>
      <kwd-group>
        <kwd>Data Quality</kwd>
        <kwd>Multimodal Recommenders</kwd>
        <kwd>Sequential Recommenders</kwd>
        <kwd>Item Cold Start</kwd>
        <kwd>Content Embeddings</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        embeddings with dense multimodal representations in a way that preserves the advantages of both.
This connects directly to the workshop’s focus on data quality in multimodal recommendation [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]:
item cold-start represents a severe form of missing-modality scenario where the behavioral modality is
completely absent [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        DenseRec: A Dual-Path Approach. We proposed DenseRec [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], a simple yet effective dual-path
method for sequential recommendation. DenseRec (Figure 1) retains a primary ID-based path while
adding a secondary “dense path” derived from pretrained content embeddings. A learnable projection
layer aligns the dense representations with the ID embedding space. During training, both paths are
stochastically activated with probability <italic>p</italic><sub>dense</sub>, a hyperparameter. At inference time, the dense path is
used only for cold-start items, while known items rely exclusively on the ID path. This ensures that
multimodal signals augment the model only where necessary, thereby reducing destructive interference
with strong behavioral signals. The approach treats item cold-start as a missing-modality problem and
applies a selective integration strategy to maintain recommendation quality across both warm and cold
items.
      </p>
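      <p>The path selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the dictionary-based embedding tables, and the fixed projection weights are assumptions made for illustration (in the actual model the projection layer is learned), and p_dense stands for the activation probability of the dense path.</p>
      <preformat preformat-type="code">
```python
import random


def project(dense_vec, weight, bias):
    """Linear projection mapping a content embedding into the ID embedding space."""
    return [sum(w * x for w, x in zip(row, dense_vec)) + b
            for row, b in zip(weight, bias)]


def item_embedding(item_id, id_table, content_table, weight, bias,
                   p_dense=0.5, training=False, rng=random):
    """Return the embedding of one item via the ID path or the dense path."""
    is_cold = item_id not in id_table  # no learned ID embedding available
    if training:
        # Training: take the dense path with probability p_dense, else the ID path.
        use_dense = is_cold or p_dense > rng.random()
    else:
        # Inference: dense path only for cold-start items.
        use_dense = is_cold
    if use_dense:
        return project(content_table[item_id], weight, bias)
    return id_table[item_id]


# Toy data (hypothetical): 2-d ID space, 3-d content embeddings.
id_table = {"a": [1.0, 0.0]}
content_table = {"a": [0.5, 0.5, 0.5], "new": [1.0, 2.0, 3.0]}
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
bias = [0.0, 0.5]

warm_vec = item_embedding("a", id_table, content_table, weight, bias)
cold_vec = item_embedding("new", id_table, content_table, weight, bias)
print(warm_vec)  # [1.0, 0.0]  (ID path)
print(cold_vec)  # [1.0, 5.5]  (projected content embedding)
```
      </preformat>
      <p>In this sketch a warm item never touches the dense path at inference time, which mirrors the selective integration described above.</p>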
      <p>
        Experimental Validation. We evaluated DenseRec on three Amazon Reviews 2023 datasets [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
using time-based train/validation/test splits with strict cold-start: unseen items appear only in
the test set. For dense content embeddings, we used the all-MiniLM-L6-v2 model<sup>1</sup> from the
sentence-transformers library [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] to embed item content. We compared DenseRec against a
strong ID-only SASRec baseline, applying hyperparameter optimization only to the baseline for
fairness. Results, measured in HR@100, show that DenseRec significantly mitigates item cold-start, with
performance varying smoothly as a function of <italic>p</italic><sub>dense</sub>. In particular, <italic>p</italic><sub>dense</sub> = 0.5 yielded a favorable
balance between ID and dense signals, demonstrating that careful integration of multimodal content
can address data quality issues arising from missing behavioral data. In ongoing work, we experiment
with multimodal embeddings to build upon the results presented in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
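      <p>For readers unfamiliar with the metric, the following sketch shows one common way to compute HR@k, the fraction of test cases whose held-out target item appears among the top-k recommendations. Reporting it separately for warm and cold targets is an assumption made here for illustration; the function names and toy data are hypothetical and do not come from the evaluation code used in [2].</p>
      <preformat preformat-type="code">
```python
def hit_rate_at_k(ranked_lists, targets, k=100):
    """Fraction of test cases whose target item appears in the top-k ranked items."""
    if not targets:
        return 0.0
    hits = sum(1 for ranked, target in zip(ranked_lists, targets)
               if target in ranked[:k])
    return hits / len(targets)


def hit_rate_by_temperature(ranked_lists, targets, train_items, k=100):
    """HR@k computed separately for warm targets (seen in training) and cold targets."""
    pairs = list(zip(ranked_lists, targets))
    warm = [(r, t) for r, t in pairs if t in train_items]
    cold = [(r, t) for r, t in pairs if t not in train_items]
    return {
        "warm": hit_rate_at_k([r for r, _ in warm], [t for _, t in warm], k),
        "cold": hit_rate_at_k([r for r, _ in cold], [t for _, t in cold], k),
    }


# Toy example: three test cases, two of whose targets never occurred in training.
ranked = [["a", "b"], ["c", "d"], ["x", "y"]]
targets = ["b", "z", "x"]
train_items = {"a", "b", "c", "d"}
print(hit_rate_by_temperature(ranked, targets, train_items, k=2))
# {'warm': 1.0, 'cold': 0.5}
```
      </preformat>
      <p>Under a strict cold-start split, the cold bucket contains exactly those targets that an ID-only model has no embedding for.</p>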
      <p>Conclusion. Sequential ID-based recommenders provide robustness against many data quality (DQ) issues
prevalent in traditional learning-to-rank (LTR) approaches, but struggle with item cold-start. DenseRec introduces a lightweight,
non-destructive mechanism to incorporate multimodal content embeddings, mitigating item cold-start while
retaining the strengths of ID-based modeling. More broadly, our findings highlight that understanding
DQ challenges, including missing modalities in the form of cold-start, can directly inform the design of
multimodal sequential models.</p>
      <p><sup>1</sup> https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2</p>
      <p>Declaration on Generative AI. The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>W.-C.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>McAuley</surname>
          </string-name>
          ,
          <article-title>Self-attentive sequential recommendation</article-title>
          ,
          <source>in: 2018 IEEE International Conference on Data Mining (ICDM)</source>
          , IEEE,
          <year>2018</year>
          , pp.
          <fpage>197</fpage>
          -
          <lpage>206</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Lichtenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>De Candia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rufini</surname>
          </string-name>
          ,
          <article-title>DenseRec: Revisiting dense content embeddings for sequential transformer-based recommendation</article-title>
          ,
          <source>arXiv preprint arXiv:2508.18442</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Pomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jannach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Malitesta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C. M.</given-names>
            <surname>Mancino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>McAuley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Melchiorre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nawaz</surname>
          </string-name>
          ,
          <article-title>First international workshop on data quality-aware multimodal recommendation (daquamrec)</article-title>
          ,
          <source>in: Proceedings of the Nineteenth ACM Conference on Recommender Systems</source>
          ,
          <year>2025</year>
          , pp.
          <fpage>1378</fpage>
          -
          <lpage>1382</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Vu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Keshavan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sathiamoorthy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Heldt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tandon</surname>
          </string-name>
          , et al.,
          <article-title>Better generalization with semantic ids: A case study in ranking for recommendations</article-title>
          ,
          <source>in: Proceedings of the 18th ACM Conference on Recommender Systems</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>1039</fpage>
          -
          <lpage>1044</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>McAuley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>Learning vector-quantized item representation for transferable sequential recommenders</article-title>
          ,
          <source>in: Proceedings of the ACM Web Conference 2023</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>1162</fpage>
          -
          <lpage>1171</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <article-title>Are id embeddings necessary? whitening pre-trained text embeddings for effective sequential recommendation</article-title>
          ,
          <source>in: 2024 IEEE 40th International Conference on Data Engineering (ICDE)</source>
          , IEEE,
          <year>2024</year>
          , pp.
          <fpage>530</fpage>
          -
          <lpage>543</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Ganhör</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Moscati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hausberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nawaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schedl</surname>
          </string-name>
          ,
          <article-title>A multimodal single-branch embedding network for recommendation in cold-start and missing modality scenarios</article-title>
          ,
          <source>in: Proceedings of the 18th ACM Conference on Recommender Systems</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>380</fpage>
          -
          <lpage>390</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>McAuley</surname>
          </string-name>
          ,
          <article-title>Bridging language and items for retrieval and recommendation</article-title>
          ,
          <source>arXiv preprint arXiv:2403.03952</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-bert: Sentence embeddings using siamese bert-networks</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics,
          <year>2019</year>
          . URL: https://arxiv.org/abs/1908.10084.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>