<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Knowledge is Power: Boosting Recommender Systems by Infusing LLMs with Domain Expertise</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alessandro Petruzzelli</string-name>
          <email>alessandro.petruzzelli@uniba.it</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cataldo Musto</string-name>
          <email>cataldo.musto@uniba.it</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco de Gemmis</string-name>
          <email>marco.degemmis@uniba.it</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pasquale Lops</string-name>
          <email>pasquale.lops@uniba.it</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Semeraro</string-name>
          <email>giovanni.semeraro@uniba.it</email>
        </contrib>
      </contrib-group>
      <abstract>
        <p>Large Language Models (LLMs) have emerged as a powerful new paradigm for recommender systems. However, their effectiveness is often constrained by the general-purpose knowledge acquired during pre-training, which may lack the domain-specific detail required for specialized recommendation tasks. To address this, we introduce a comprehensive pipeline for injecting multi-source knowledge directly into an LLM. Our methodology extracts and lexicalizes information from item descriptions (textual), knowledge graphs (structured), and user-item interactions (collaborative). This external knowledge is then infused into the model through a unified fine-tuning process that simultaneously adapts the LLM to a top-k re-ranking task. We conduct extensive experiments across movie, music, and book domains, demonstrating that our approach significantly enhances recommendation accuracy, especially in domains less covered by the LLM's original training data. Our knowledge-injected model achieves state-of-the-art performance, outperforming a wide array of baselines, including powerful zero-shot models like GPT-4, in the music and book domains. This paper serves as a discussion of the research originally presented in the paper referenced as [12].</p>
      </abstract>
      <kwd-group>
        <kwd>Recommender Systems</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Knowledge Injection</kwd>
        <kwd>Fine-Tuning</kwd>
        <kwd>Domain Adaptation</kwd>
        <kwd>Knowledge-Aware Systems</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The evolution of recommender systems has progressed from collaborative filtering [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which suffers
from data sparsity, towards Knowledge-Aware Recommender Systems (KARS) that leverage external
data [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The latest paradigm shift involves Large Language Models (LLMs), which offer unprecedented
zero-shot reasoning capabilities [11].
      </p>
      <p>
        Current LLM-based recommendation strategies fall into two categories. Non-tuning approaches use
pre-trained models like GPT-4 as-is, relying on sophisticated prompt engineering to elicit
recommendations [15]. This method is limited by the LLM’s static, general-purpose knowledge. Tuning approaches
adapt smaller, open-source LLMs (e.g., LLaMA [14]) to recommendation tasks via instruction tuning
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. While frameworks like P5 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] unify various recommendation tasks into a text-to-text format, they
primarily focus on task adaptation rather than enriching the model’s core knowledge base.
      </p>
      <p>We identify a critical gap: the need to explicitly infuse LLMs with curated, domain-specific knowledge.
This process, which we term knowledge injection, is vital for enhancing the model’s understanding of
items, particularly in niche domains (e.g., technical books, indie music) that are underrepresented in
general pre-training corpora.</p>
      <p>This paper introduces a novel pipeline for injecting multi-source knowledge into an LLM for top-k
recommendation. Our contributions are:
1. A modular pipeline for extracting, lexicalizing, and injecting knowledge from textual descriptions,
knowledge graphs, and collaborative signals into an LLM.
2. A comprehensive analysis across three domains demonstrating how different knowledge types
impact recommendation accuracy.
3. Evidence that our knowledge-injected model achieves state-of-the-art performance,
outperforming strong baselines, including GPT-4, particularly in specialized domains.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>Our goal is to improve top-k item recommendation by training an LLM to re-rank a candidate list
of items for each user. The methodology involves two primary phases: a unified training stage for
knowledge injection and task adaptation, followed by an inference stage. A detailed representation of
our methodology is illustrated in Figure 1.</p>
      <sec id="sec-2-1">
        <title>2.1. Knowledge Extraction and Lexicalization</title>
        <p>We extract knowledge from three heterogeneous sources and convert it into a natural language format
suitable for LLM consumption.</p>
        <p>
          • Textual Data: Raw item descriptions are used directly, as they are already in text format.
• Knowledge Graphs (KG): Structured KG triples (e.g., (Tenet, director, Christopher Nolan)) are
converted into sentences using predefined templates (e.g., "Tenet was directed by Christopher
Nolan.").
• Collaborative Data: We mine association rules from the user-item interaction matrix using
the Apriori algorithm [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Rules (e.g., {item A} → {item B, C}) are lexicalized into sentences like,
"People who like item A also tend to like item B and item C."
This multi-source approach is inspired by KARS research showing that combining these data types
yields robust performance [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
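<p>As a concrete illustration, this lexicalization step can be sketched in a few lines of Python. This is a minimal sketch under our own assumptions: the template strings, helper names, and default fallback are illustrative, not the exact templates used in [12].</p>

```python
# Predicate-specific templates turn structured KG triples into sentences.
# These templates and helper names are illustrative, not the paper's own.
KG_TEMPLATES = {
    "director": "{subj} was directed by {obj}.",
    "genre": "{subj} belongs to the {obj} genre.",
}

def lexicalize_triple(subj, pred, obj):
    """Render a (subject, predicate, object) KG triple as a sentence."""
    template = KG_TEMPLATES.get(pred, "{subj} has {pred} {obj}.")
    return template.format(subj=subj, pred=pred, obj=obj)

def lexicalize_rule(antecedent, consequent):
    """Render an association rule {A} -> {B, C} as a sentence."""
    liked = " and ".join(antecedent)
    also = " and ".join(consequent)
    return f"People who like {liked} also tend to like {also}."

print(lexicalize_triple("Tenet", "director", "Christopher Nolan"))
# -> Tenet was directed by Christopher Nolan.
print(lexicalize_rule(["item A"], ["item B", "item C"]))
# -> People who like item A also tend to like item B and item C.
```

<p>Textual descriptions need no such conversion, since they are already natural language.</p>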
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Unified Training Phase</title>
        <p>We adopt a unified training strategy that combines task adaptation with knowledge injection in a single
step. The model is fine-tuned on a dataset containing two types of examples:
1. Instruction-Tuning Data: For each user, we generate prompts that frame the re-ranking task.</p>
        <p>
          The input contains the user’s history and candidate items, and the target is the ground-truth
ranked list.
2. Knowledge-Tuning Data: The lexicalized textual, KG, and collaborative information for each
item is formatted as input-output pairs where the model learns to reconstruct this knowledge.
Training optimizes a total loss L_total = L_k + L_i, where L_k is the reconstruction loss for the knowledge
data and L_i is the prediction loss for the instruction-tuning (re-ranking) data. Both are standard
cross-entropy losses for next-token prediction. We use Low-Rank Adaptation (LoRA) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] for
parameter-efficient fine-tuning.
        </p>
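<p>The unified training set described above can be sketched as follows. Both example types reduce to plain input-target pairs trained with the same next-token cross-entropy, which is how the combined objective arises from a single fine-tuning run. The field names and prompt wording below are our own illustrative choices, not the paper's exact formats.</p>

```python
def make_instruction_example(history, candidates, ranked_target):
    """Instruction-tuning pair: re-rank candidates given the user's history."""
    prompt = (
        f"The user liked: {', '.join(history)}. "
        f"Re-rank these candidate items: {', '.join(candidates)}."
    )
    return {"input": prompt, "target": ", ".join(ranked_target)}

def make_knowledge_example(item, lexicalized_fact):
    """Knowledge-tuning pair: the model learns to reconstruct injected knowledge."""
    return {"input": f"What do you know about {item}?", "target": lexicalized_fact}

# A single merged fine-tuning set containing both example types; the
# reconstruction and re-ranking losses are summed implicitly by training
# on the union of the two subsets.
training_set = [
    make_instruction_example(["Tenet"], ["Dunkirk", "Barbie"], ["Dunkirk", "Barbie"]),
    make_knowledge_example("Tenet", "Tenet was directed by Christopher Nolan."),
]
```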
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Inference</title>
        <p>At inference time, the fine-tuned LLM is given a prompt containing a test user’s history and a list
of candidate items. The model generates a ranked list, which is parsed to extract the final top-k
recommendations.</p>
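<p>A minimal sketch of this parsing step, assuming the model emits a numbered list (the output format and regular expression below are our assumptions; production use would need more robust handling of malformed generations):</p>

```python
import re

def parse_topk(generated_text, k):
    """Extract item titles from a numbered list such as '1. Tenet' per line."""
    items = re.findall(r"^\s*\d+\.\s*(.+?)\s*$", generated_text, flags=re.MULTILINE)
    return items[:k]

output = "1. Tenet\n2. Dunkirk\n3. Barbie\n4. Oppenheimer"
print(parse_topk(output, 3))  # ['Tenet', 'Dunkirk', 'Barbie']
```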
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Setup</title>
      <p>Our experiments address three research questions: (RQ1) How do individual knowledge types affect
performance? (RQ2) How does combining knowledge sources impact performance? (RQ3) How does
our model compare to state-of-the-art baselines?
Datasets. We use three public datasets: MovieLens 1M (movies), Last.FM (music), and DBbook
(books). Item features (textual, graph) are mapped from DBpedia.</p>
      <p>Implementation. We use LLaMA 3 8B Instruct as our base model. The evaluation protocol follows a
standard user-based split (80% fine-tuning, 20% test).</p>
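<p>One plausible reading of this protocol, consistent with the cold-start discussion in Section 4, is a split over users, so that test users are unseen during fine-tuning. The sketch below is our own illustration of such a split; the helper name and random seed are assumptions.</p>

```python
import random

def split_users(user_ids, train_frac=0.8, seed=42):
    """Partition users into fine-tuning (80%) and test (20%) groups."""
    ids = sorted(user_ids)            # sort first so the split is reproducible
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_frac)
    return ids[:cut], ids[cut:]

train_users, test_users = split_users([f"u{i}" for i in range(10)])
print(len(train_users), len(test_users))  # 8 2
```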
      <p>
        Baselines. We compare our model against three families of baselines:
• Collaborative Filtering: BPR [13], MultiVAE [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and SimpleX [10].
• Graph-based: LightGCN [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and CFKG [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
• LLM-based: Zero-shot prompting with GPT-3.5, GPT-4, the base LLaMA 3 model, and the
tuned P5 model [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Results and Discussion</title>
      <p>Our results are summarized in Table 1 and organized by our research questions.</p>
      <sec id="sec-4-1">
        <title>4.1. RQ1 &amp; RQ2: Impact of Knowledge</title>
        <p>We first analyzed the effect of injecting different knowledge sources individually and in combination.
The findings are highly domain-dependent.</p>
        <p>For MovieLens, simply fine-tuning the LLM on the re-ranking task without any external knowledge
("LLaMA w/o knowledge") yielded the best results within our framework. Injecting additional knowledge
did not provide further gains and in some cases slightly degraded performance. This suggests that the
base LLaMA 3 model already possesses extensive knowledge about the popular movie domain, making
additional injection redundant. The massive, proprietary GPT-4 model performs best overall on this
dataset, likely due to its even larger scale and more comprehensive pre-trained knowledge base.</p>
        <p>For Last.FM (Music), the scenario is reversed. Here, injecting external knowledge provides a
substantial and statistically significant performance boost. The best-performing single source was
Textual data, which significantly outperformed the no-knowledge variant. Combining knowledge
sources did not yield further improvements over using textual data alone. This indicates that for music,
rich descriptive text is the most critical missing piece of information for the LLM.</p>
        <p>For DBbook (Books), knowledge injection was again highly effective. The best performance was
achieved by combining all three knowledge sources (Collaborative + Graph + Text), which delivered
a statistically significant improvement over the no-knowledge baseline. This suggests that the book
domain benefits from a more holistic set of information, blending content descriptions, structured
metadata, and user behavior patterns.</p>
        <p>A key takeaway is that the value of knowledge injection is inversely proportional to the
domain’s representation in the LLM’s pre-training data. For well-covered domains like movies,
task-tuning is sufficient. For specialized or niche domains like music and books, explicit knowledge
injection is crucial for achieving high accuracy.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. RQ3: Comparison with Baselines</title>
        <p>As shown in Table 1, our approach demonstrates state-of-the-art performance.</p>
        <p>First, fine-tuning LLaMA 3 (even without knowledge) dramatically outperforms zero-shot LLM
baselines (including GPT-4 in many cases) and traditional methods on the music and book datasets.
This highlights the power of adapting an LLM to the specific task and data distribution.</p>
        <p>Second, our final knowledge-injected model, LLaMA-KI, sets a new state of the art on the music
and book domains, decisively outperforming all other models, including the much larger GPT-4. This
is a critical finding: a smaller, open-source model, when infused with the right domain knowledge, can
surpass a massive, general-purpose model. This shows that targeted knowledge is a more efficient path
to high performance in specialized domains than simply scaling up the model size.</p>
        <p>In the movie domain, while our tuned model significantly outperforms traditional baselines and
other LLM approaches like P5, it does not surpass GPT-4 in terms of nDCG. However, it achieves higher
recall and recommends less popular items, indicating a better ability to handle the cold-start setting of
our evaluation.</p>
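<p>For reference, the two metrics contrasted above can be computed as in the sketch below (standard binary-relevance formulations; the cutoffs and normalization are not taken from the paper's setup):</p>

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top-k ranking."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant) if relevant else 0.0

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance nDCG: position-discounted gain over the ideal ranking."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0
```

<p>Unlike recall, nDCG rewards placing relevant items near the top of the list, which is why a model can lead on recall while trailing on nDCG.</p>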
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future Work</title>
      <p>We presented a versatile pipeline for injecting multi-source domain knowledge into LLMs for
recommendation. Our experiments demonstrate that this approach is highly effective, yielding state-of-the-art
results, particularly in domains where a general-purpose LLM’s pre-trained knowledge is insufficient.
The results underscore a key principle: for specialized recommendation tasks, targeted knowledge
injection can be more valuable than raw model scale.</p>
      <p>Future work will focus on: (1) integrating more diverse knowledge sources like user reviews and
multimodal data (e.g., images, audio); (2) developing methods to automatically assess the quality and
relevance of knowledge sources before injection; and (3) exploring the trade-offs between model size,
the amount of injected knowledge, and computational costs.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgement</title>
      <p>This research is partially funded by the PNRR project FAIR—Future AI Research (PE00000013),
Spoke 6—Symbiotic AI, under the NRRP MUR program supported by NextGenerationEU (CUP
H97G22000210007), and the PHaSE project — Promoting Healthy and Sustainable Eating through
Interactive and Explainable AI Methods, funded by MUR under the PRIN 2022 program - Finanziato
dall’Unione europea - NextGeneration EU, Missione 4 Componente 1 (CUP H53D23003530006).</p>
      <p>The models are developed using the Leonardo supercomputer with the support of CINECA-Italian
Supercomputing Resource Allocation, under class C projects IscrC_LLM-REC (HP10CTEUGX) and
IscrC_SYMBREC (HP10C1A4P8).</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors did not use any generative AI tool.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB '94), 1994.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] Qingyao Ai, Vahid Azizi, Xu Chen, and Yongfeng Zhang. Learning heterogeneous knowledge base embeddings for explainable recommendation. Algorithms, 11(9), 2018. ISSN 1999-4893. doi: 10.3390/a11090137. URL https://www.mdpi.com/1999-4893/11/9/137.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] Vito Walter Anelli, Pierpaolo Basile, Tommaso Di Noia, Francesco Maria Donini, Antonio Ferrara, Cataldo Musto, Fedelucio Narducci, Azzurra Ragone, and Markus Zanker, editors. Proceedings of the Sixth Knowledge-aware and Conversational Recommender Systems Workshop co-located with the 18th ACM Conference on Recommender Systems (RecSys 2024), Bari, Italy, October 18th, 2024, volume 3817 of CEUR Workshop Proceedings, 2024. URL https://ceur-ws.org/Vol-3817.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] Janneth Chicaiza and Priscila Valdiviezo-Diaz. A comprehensive survey of knowledge graph-based recommender systems: Technologies, development, and contributions. Information, 12(6):232, May 2021. ISSN 2078-2489. doi: 10.3390/info12060232. URL https://www.mdpi.com/2078-2489/12/6/232.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] Shijie Geng, Shuchang Liu, Zuohui Fu, Yingqiang Ge, and Yongfeng Zhang. Recommendation as language processing (RLP): A unified pretrain, personalized prompt &amp; predict paradigm (P5). In Jennifer Golbeck, F. Maxwell Harper, Vanessa Murdock, Michael D. Ekstrand, Bracha Shapira, Justin Basilico, Keld T. Lundgaard, and Even Oldridge, editors, RecSys '22: Sixteenth ACM Conference on Recommender Systems, Seattle, WA, USA, September 18-23, 2022, pages 299-315. ACM, 2022. doi: 10.1145/3523227.3546767. URL https://doi.org/10.1145/3523227.3546767.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yong-Dong Zhang, and Meng Wang. LightGCN: Simplifying and powering graph convolution network for recommendation. In Jimmy X. Huang, Yi Chang, Xueqi Cheng, Jaap Kamps, Vanessa Murdock, Ji-Rong Wen, and Yiqun Liu, editors, Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2020, Virtual Event, China, July 25-30, 2020, pages 639-648. ACM, 2020. doi: 10.1145/3397271.3401063. URL https://doi.org/10.1145/3397271.3401063.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] Jonathan L. Herlocker, Joseph A. Konstan, Al Borchers, and John Riedl. An algorithmic framework for performing collaborative filtering. SIGIR Forum, 51(2):227-234, 2017. doi: 10.1145/3130348.3130372. URL https://doi.org/10.1145/3130348.3130372.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] Jianxin Ma, Chang Zhou, Peng Cui, Hongxia Yang, and Wenwu Zhu. Learning disentangled representations for recommendation. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alché-Buc, Emily B. Fox, and Roman Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 5712-5723, 2019. URL https://proceedings.neurips.cc/paper/2019/hash/a2186aa7c086b46ad4e8bf81e2a3a19b-Abstract.html.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] Kelong Mao, Jieming Zhu, Jinpeng Wang, Quanyu Dai, Zhenhua Dong, Xi Xiao, and Xiuqiang He. SimpleX: A simple and strong baseline for collaborative filtering. In Gianluca Demartini, Guido Zuccon, J. Shane Culpepper, Zi Huang, and Hanghang Tong, editors, CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1-5, 2021, pages 1243-1252. ACM, 2021. doi: 10.1145/3459637.3482297. URL https://doi.org/10.1145/3459637.3482297.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] OpenAI. GPT-4 technical report. CoRR, abs/2303.08774, 2023. doi: 10.48550/ARXIV.2303.08774. URL https://doi.org/10.48550/arXiv.2303.08774.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] Alessandro Petruzzelli, Cataldo Musto, Marco de Gemmis, Giovanni Semeraro, and Pasquale Lops. Empowering recommender systems based on large language models through knowledge injection techniques. In Proceedings of the 33rd ACM Conference on User Modeling, Adaptation and Personalization, UMAP '25, pages 40-50, New York, NY, USA, 2025. Association for Computing Machinery. ISBN 9798400713132. doi: 10.1145/3699682.3728341. URL https://doi.org/10.1145/3699682.3728341.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. BPR: Bayesian personalized ranking from implicit feedback. In Jeff A. Bilmes and Andrew Y. Ng, editors, UAI 2009, Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, June 18-21, 2009, pages 452-461. AUAI Press, 2009. URL https://www.auai.org/uai2009/papers/UAI2009_0139_48141db02b9f0b02bc7158819ebfa2c7.pdf.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] Hugo Touvron, Louis Martin, Kevin Stone, et al. Llama 2: Open foundation and fine-tuned chat models. CoRR, abs/2307.09288, 2023.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] Lei Wang and Ee-Peng Lim. Zero-shot next-item recommendation using large pretrained language models. CoRR, abs/2304.03153, 2023. doi: 10.48550/ARXIV.2304.03153. URL https://doi.org/10.48550/arXiv.2304.03153.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>