<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Beyond Homogeneous Users: Simulating Diverse User Personas for Conversational Recommendation Datasets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alessandro Francesco Maria Martina</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Petruzzelli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cataldo Musto</string-name>
          <email>cataldo.musto@uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco de Gemmis</string-name>
          <email>marco.degemmis@uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pasquale Lops</string-name>
          <email>pasquale.lops@uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Semeraro</string-name>
          <email>giovanni.semeraro@uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Bari Aldo Moro</institution>
          ,
          <addr-line>Bari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Pisa</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Conversational Recommender Systems (CRSs) leverage multi-turn interactions to assist users in decision-making. The quality of these systems is heavily dependent on the datasets used for their training. Existing datasets, often created by crowdworkers or proprietary Large Language Models (LLMs), frequently lack conversational coherence, fail to model diverse user behaviors, and suffer from reproducibility issues. To address these limitations, we introduce DistillRecDial, a conversational recommendation dataset that incorporates varied user personas and interaction patterns. The dataset is generated using a Large-to-Small Language Model Distillation pipeline, enabling dialogue synthesis without reliance on closed, resource-intensive LLMs. We model user heterogeneity in preferences, goals, and dialogic behavior, resulting in a more realistic and diverse corpus. Human and automated evaluations indicate that DistillRecDial surpasses existing datasets in dialogue quality and diversity. To promote reproducible research, the dataset and generation code are publicly released and integrated into the CRSLab framework. This article is a discussion paper for our work accepted to the Reproducibility Track at ACM RecSys 2025; the code and resources can be found in [12].</p>
      </abstract>
      <kwd-group>
        <kwd>Conversational Recommendation</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Datasets</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Conversational Recommender Systems (CRSs) enhance user engagement by providing personalized
recommendations through multi-turn dialogue [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The development of end-to-end CRSs [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] has
intensified the need for high-quality conversational datasets. However, existing corpora present notable
challenges. Human-generated datasets, such as ReDial [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], often result from role-playing exercises that
lack genuine user intent, leading to superficial conversations [
        <xref ref-type="bibr" rid="ref2 ref8">2, 8</xref>
        ]. While recent methods using Large
Language Models (LLMs) for synthetic data generation [
        <xref ref-type="bibr" rid="ref4 ref8">4, 8</xref>
        ] have improved coherence, they
introduce two primary issues. First, they typically depend on proprietary LLMs, which limits accessibility
and reproducibility. Second, they model users as a homogeneous group, failing to capture the
behavioral diversity observed in real-world interactions where users may have varying levels of preference
clarity and initiative [
        <xref ref-type="bibr" rid="ref1 ref14">1, 14</xref>
        ]. To overcome these gaps, we present DistillRecDial, a conversational
recommendation dataset designed around explicit user personas. Our contributions are:
1. A dataset that models heterogeneous user behaviors by defining five distinct user stereotypes
based on preference and intention clarity.
2. A Large-to-Small knowledge distillation pipeline that enables scalable, high-quality dialogue
generation using open-source models, ensuring reproducibility.
3. A comprehensive evaluation demonstrating that DistillRecDial exhibits superior dialogue
quality and diversity compared to established datasets.
4. The public release of the dataset, generation code, and its integration into the CRSLab benchmark
[15].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>
        Early CRS datasets like ReDial [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], GoRecDial [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and INSPIRED [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] were generated by human
annotators. While foundational, these datasets are difficult to scale and can have inconsistent quality.
      </p>
      <p>
        Synthetic data generation using LLMs has emerged as a scalable alternative. Initial efforts converted
single-turn interactions to dialogues [
        <xref ref-type="bibr" rid="ref11 ref4">4, 11</xref>
        ], but often inherited the limitations of the source data.
More recent work, such as PEARL [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and LLM-REDIAL [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], employed GPT-3.5 as a user simulator.
These methods produce more coherent dialogues but rely on closed-source APIs, posing challenges for
reproducibility. Furthermore, they do not explicitly model the diversity of user interaction patterns.
      </p>
      <p>
        Our work differs by systematically modeling user variation through stereotypes. We utilize a
knowledge distillation pipeline [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], transferring the capabilities of a large "teacher" model to a smaller,
open-source "student" model. This approach allows for the creation of a high-quality, diverse, and
reproducible dataset without dependence on proprietary models.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset Construction</title>
      <p>The generation of DistillRecDial was designed to produce high-quality dialogues reflecting diverse
user behaviors. The pipeline involves grounding conversations in real user data, defining user
stereotypes to guide dialogue flow, and using knowledge distillation for scalable generation. A summary of
dataset statistics is provided in Table 1.</p>
      <sec id="sec-3-1">
        <title>3.1. Data Grounding and User Stereotypes</title>
        <p>To ground dialogues in realistic preferences, we utilized the Amazon Reviews dataset, focusing on the
Movies and TV category. After applying a 12-core filter to ensure user profiles were sufficiently dense,
the data contained 23,456 users and 15,597 items. We enriched item metadata with information from The
Movie Database (TMDB), including genres, keywords, and cast, to enable content-based conversations.</p>
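        <p>The 12-core filtering step described above can be sketched as an iterative pruning loop that alternately drops sparse users and items until a fixed point is reached. The function below is an illustrative reconstruction (the paper's exact preprocessing code may differ), demonstrated on a toy interaction list with the threshold set to 2.</p>
        <preformat>
```python
from collections import Counter

def k_core_filter(interactions, k=12):
    """Iteratively drop users and items with fewer than k interactions
    until every remaining user and item has at least k of them."""
    interactions = list(interactions)
    while True:
        user_counts = Counter(u for u, _ in interactions)
        item_counts = Counter(i for _, i in interactions)
        kept = [(u, i) for u, i in interactions
                if user_counts[u] >= k and item_counts[i] >= k]
        if len(kept) == len(interactions):  # fixed point reached
            return kept
        interactions = kept

# Toy example with k=2: user "c" and item "y" fall below the threshold.
data = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "z"), ("c", "z"), ("a", "z")]
dense = k_core_filter(data, k=2)
```
        </preformat>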
        <p>A core feature of our work is the modeling of user heterogeneity. We defined five user stereotypes by
combining two behavioral dimensions: Preference Expression (None, Implicit, Explicit) and Intention
Clarity (None, Vague, Specific), based on prior user modeling literature [14]. (1) Curious Newcomer:
No history, vague goal. (2) Focused Newcomer: No history, specific goal. (3) History-Based Browser:
Has history, no specific goal. (4) Guided Explorer: Has history, seeks novelty. (5) Preference-Driven
Searcher: Has history, expresses explicit and complex preferences.</p>
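        <p>For prompt construction, the stereotype taxonomy can be encoded as a small lookup table. The snippet below is a minimal sketch: the five names follow the text, but the field layout and the seeded per-user assignment helper are illustrative assumptions, not the paper's actual schema.</p>
        <preformat>
```python
import random

# The five user stereotypes, keyed along the two behavioral dimensions.
STEREOTYPES = {
    "curious_newcomer":           {"history": False, "intention": "vague"},
    "focused_newcomer":           {"history": False, "intention": "specific"},
    "history_based_browser":      {"history": True,  "intention": "none"},
    "guided_explorer":            {"history": True,  "intention": "novelty"},
    "preference_driven_searcher": {"history": True,  "intention": "explicit"},
}

def assign_stereotype(user_id, seed=42):
    """Deterministically draw one of the five stereotypes for a user,
    mirroring the random per-user assignment described in Section 3.2."""
    rng = random.Random(f"{seed}:{user_id}")
    return rng.choice(sorted(STEREOTYPES))
```
        </preformat>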
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Prompt Design and Generation Pipeline</title>
        <p>For each of the 23,456 users, we randomly assigned one of the five stereotypes. A structured prompt was
created containing: (1) a description of the user’s persona and goal, (2) a target item to be recommended,
and (3) the user’s interaction history (if applicable), with features conditioned on the assigned stereotype
(Table 2).
1. Teacher Generation: A teacher model (LLaMA 3.3 70B) generated high-quality dialogues for
10% of the prompts. These dialogues served as exemplars.
2. Knowledge Distillation: A smaller student model (LLaMA 3.1 8B) was fine-tuned on the
teacher-generated dialogues. This offline distillation transfers the teacher’s stylistic and structural
capabilities to the student [13].
3. Scalable Generation: The fine-tuned student model generated dialogues for all 23,456 prompts.</p>
        <p>After filtering for hallucinations, the final dataset contains 21,039 high-quality dialogues. Example
snippets are shown in Table 3.</p>
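        <p>The three-stage flow above can be summarized structurally as follows. This is only a schematic: the real teacher and student are LLaMA 3.3 70B and a fine-tuned LLaMA 3.1 8B served through an LLM inference and training stack, which are stubbed here as plain Python functions.</p>
        <preformat>
```python
def teacher_generate(prompt):
    # Stand-in for teacher (LLaMA 3.3 70B) inference.
    return f"[exemplar dialogue for: {prompt}]"

def fine_tune(exemplars):
    # Stand-in for supervised fine-tuning of the student (LLaMA 3.1 8B)
    # on (prompt, dialogue) pairs; here the "student" just replays the
    # teacher's output format.
    def student_generate(prompt):
        return f"[exemplar dialogue for: {prompt}]"
    return student_generate

def run_pipeline(prompts, teacher_fraction=0.1):
    # Stage 1: teacher generates exemplars for a 10% subset of prompts.
    n_teacher = max(1, int(len(prompts) * teacher_fraction))
    exemplars = [(p, teacher_generate(p)) for p in prompts[:n_teacher]]
    # Stage 2: offline distillation via fine-tuning on the exemplars.
    student = fine_tune(exemplars)
    # Stage 3: the student generates dialogues for ALL prompts, followed
    # by a (placeholder) hallucination filter.
    dialogues = {p: student(p) for p in prompts}
    return {p: d for p, d in dialogues.items() if d}
```
        </preformat>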
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation</title>
      <p>We evaluated DistillRecDial to address three research questions regarding dialogue quality (RQ1),
diversity (RQ2), and downstream task performance (RQ3). We compared our dataset with ReDial,
INSPIRED, and PEARL.</p>
      <sec id="sec-4-1">
        <title>4.1. Human Evaluation of Dialogue Quality (RQ1)</title>
        <p>We conducted a head-to-head human evaluation where 10 evaluators compared 100 dialogues from
DistillRecDial against 100 from each baseline. As shown in Figure 1, DistillRecDial was consistently
preferred across all criteria, including naturalness, relevance, and consistency. It outperformed
human-generated datasets (ReDial, INSPIRED) by a wide margin and also demonstrated higher quality than the
LLM-generated PEARL dataset.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Automated Evaluation of Dialogue Diversity (RQ2)</title>
        <p>To assess diversity, we computed the average pairwise cosine similarity of utterances at each turn for
DistillRecDial and PEARL. As illustrated in Figure 2, DistillRecDial exhibits significantly lower
turn-level similarity (0.422) compared to PEARL (0.598), indicating greater linguistic diversity. PEARL’s
high similarity in early turns is due to repetitive openings, whereas our stereotype-driven approach
generates more varied initial user requests.</p>
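        <p>The turn-level diversity metric reduces to the mean pairwise cosine similarity among the embeddings of all utterances sharing a turn index. A minimal NumPy sketch (no assumption is made here about which sentence-embedding model produced the vectors):</p>
        <preformat>
```python
import numpy as np

def avg_pairwise_cosine(embeddings):
    """Mean cosine similarity over all distinct pairs of rows of
    `embeddings`, e.g. all turn-t utterance vectors in a corpus.
    Assumes no row is the zero vector."""
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize rows
    sims = X @ X.T                                    # cosine similarity matrix
    n = len(X)
    # Average over off-diagonal pairs only (diagonal is always 1).
    return (sims.sum() - n) / (n * (n - 1))
```
        </preformat>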
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Downstream Task Performance (RQ3)</title>
        <p>We benchmarked standard CRS models on DistillRecDial using the CRSLab framework.</p>
        <p>Recommendation Performance: As shown in Table 4, BERT, which leverages rich textual context,
achieved the strongest recommendation performance (Hit@10 = 0.1728). Sequential models like SASRec
performed less favorably, suggesting that the context-rich dialogues in our dataset are well exploited
by pre-trained language models. Integrated CRS models like KBRD, INSPIRED, and ReDial struggled,
indicating that they do not generalize well to the complex conversational structures present in
DistillRecDial. This highlights the challenge of jointly optimizing for recommendation and dialogue on
realistic, user-aware data and points to opportunities for developing more advanced CRS architectures.</p>
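        <p>For reference, the Hit@k metric reported in Table 4 simply checks whether the ground-truth item appears among a model's top-k ranked candidates, averaged over evaluation instances. A minimal sketch:</p>
        <preformat>
```python
def hit_at_k(ranked_items, target, k=10):
    """1 if the ground-truth item is in the top-k of the ranked list, else 0."""
    return int(target in ranked_items[:k])

def mean_hit_at_k(predictions, targets, k=10):
    """Average Hit@k over a list of ranked candidate lists and their targets."""
    hits = [hit_at_k(ranked, t, k) for ranked, t in zip(predictions, targets)]
    return sum(hits) / len(hits)
```
        </preformat>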
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>We introduced DistillRecDial, a large-scale, diverse, and reproducible conversational recommendation
dataset. By modeling distinct user stereotypes and employing a knowledge distillation pipeline with
open-source models, we address key limitations of prior datasets related to behavioral homogeneity and
reliance on proprietary LLMs. Our evaluations confirm that DistillRecDial exhibits higher dialogue
quality and diversity. By integrating it into the CRSLab framework, we provide a robust benchmark to
spur the development of next-generation CRSs capable of adapting to a wide range of user interaction
styles.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgement</title>
      <p>This research is partially funded by the PNRR project FAIR—Future AI Research (PE00000013),
Spoke 6—Symbiotic AI, under the NRRP MUR program supported by NextGenerationEU (CUP
H97G22000210007), and the PHaSE project — Promoting Healthy and Sustainable Eating through
Interactive and Explainable AI Methods, funded by MUR under the PRIN 2022 program - Finanziato
dall’Unione europea - NextGeneration EU, Missione 4 Componente 1 (CUP H53D23003530006).</p>
      <p>The models are developed using the Leonardo supercomputer with the support of CINECA-Italian
Supercomputing Resource Allocation, under class C projects IscrC_LLM-REC (HP10CTEUGX) and
IscrC_SYMBREC (HP10C1A4P8).</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors did not use any AI tool.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Wanling</given-names>
            <surname>Cai</surname>
          </string-name>
          , Yucheng Jin, and
          <string-name>
            <given-names>Li</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>Impacts of personal characteristics on user trust in conversational recommender systems</article-title>
          .
          <source>In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, CHI '22. ACM</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Shuyu</given-names>
            <surname>Guo</surname>
          </string-name>
          , Shuo Zhang, Weiwei Sun, Pengjie Ren, Zhumin Chen, and
          <string-name>
            <given-names>Zhaochun</given-names>
            <surname>Ren</surname>
          </string-name>
          .
          <article-title>Towards explainable conversational recommender systems</article-title>
          .
          <source>In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , pages
          <fpage>2786</fpage>
          -
          <lpage>2795</lpage>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Shirley Anugrah</given-names>
            <surname>Hayati</surname>
          </string-name>
          , Dongyeop Kang, Qingxiaoyang Zhu,
          <string-name>
            <given-names>Weiyan</given-names>
            <surname>Shi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Zhou</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <article-title>INSPIRED: Toward sociable recommendation dialog systems</article-title>
          .
          <source>In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , pages
          <fpage>8142</fpage>
          -
          <lpage>8152</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Zhankui</given-names>
            <surname>He</surname>
          </string-name>
          , Zhouhang Xie, Rahul Jha, Harald Steck, Dawen Liang, Yesu Feng, Bodhisattwa Prasad Majumder, Nathan Kallus, and
          <string-name>
            <given-names>Julian</given-names>
            <surname>McAuley</surname>
          </string-name>
          .
          <article-title>Large language models as zero-shot conversational recommenders</article-title>
          .
          <source>In Proceedings of the 32nd ACM international conference on information and knowledge management</source>
          , pages
          <fpage>720</fpage>
          -
          <lpage>730</lpage>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Geoffrey</given-names>
            <surname>Hinton</surname>
          </string-name>
          , Oriol Vinyals, and
          <string-name>
            <given-names>Jeff</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <article-title>Distilling the knowledge in a neural network</article-title>
          .
          <source>arXiv preprint arXiv:1503.02531</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Dietmar</given-names>
            <surname>Jannach</surname>
          </string-name>
          , Ahtsham Manzoor, Wanling Cai, and
          <string-name>
            <given-names>Li</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>A survey on conversational recommender systems</article-title>
          .
          <source>ACM Computing Surveys (CSUR)</source>
          ,
          <volume>54</volume>
          (
          <issue>5</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>36</lpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Dongyeop</given-names>
            <surname>Kang</surname>
          </string-name>
          , Anusha Balakrishnan, Pararth Shah, Paul A. Crook,
          <string-name>
            <given-names>Y-Lan</given-names>
            <surname>Boureau</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jason</given-names>
            <surname>Weston</surname>
          </string-name>
          .
          <article-title>Recommendation as a communication game: Self-supervised bot-play for goal-oriented dialogue</article-title>
          .
          <source>In Proceedings of the 2019 Conference on EMNLP-IJCNLP</source>
          , pages
          <fpage>1951</fpage>
          -
          <lpage>1961</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Minjin</given-names>
            <surname>Kim</surname>
          </string-name>
          , Minju Kim, Hana Kim, Beong-woo Kwak, SeongKu Kang, Youngjae Yu,
          <string-name>
            <given-names>Jinyoung</given-names>
            <surname>Yeo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Dongha</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>PEARL: A review-driven persona-knowledge grounded conversational recommendation dataset</article-title>
          .
          <source>In Findings of the Association for Computational Linguistics: ACL 2024</source>
          , pages
          <fpage>1105</fpage>
          -
          <lpage>1120</lpage>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Raymond</given-names>
            <surname>Li</surname>
          </string-name>
          , Samira Ebrahimi Kahou, Hannes Schulz, Vincent Michalski, Laurent Charlin, and
          <string-name>
            <given-names>Chris</given-names>
            <surname>Pal</surname>
          </string-name>
          .
          <article-title>Towards deep conversational recommendations</article-title>
          .
          <source>Advances in neural information processing systems</source>
          ,
          <volume>31</volume>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Tingting</given-names>
            <surname>Liang</surname>
          </string-name>
          , Chenxin Jin, Lingzhi Wang, Wenqi Fan, Congying Xia, Kai Chen, and
          <string-name>
            <given-names>Yuyu</given-names>
            <surname>Yin</surname>
          </string-name>
          .
          <article-title>LLM-REDIAL: A large-scale dataset for conversational recommender systems created from user behaviors with LLMs</article-title>
          .
          <source>In Findings of the Association for Computational Linguistics: ACL 2024</source>
          , pages
          <fpage>8926</fpage>
          -
          <lpage>8939</lpage>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Yu</given-names>
            <surname>Lu</surname>
          </string-name>
          , Junwei Bao, Zichen Ma, Xiaoguang Han, Youzheng Wu, Shuguang Cui, and
          <string-name>
            <given-names>Xiaodong</given-names>
            <surname>He</surname>
          </string-name>
          .
          <article-title>AUGUST: An automatic generation understudy for synthesizing conversational recommendation datasets</article-title>
          .
          <source>In Findings of the Association for Computational Linguistics: ACL 2023</source>
          , pages
          <fpage>10538</fpage>
          -
          <lpage>10549</lpage>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Alessandro Francesco Maria</given-names>
            <surname>Martina</surname>
          </string-name>
          , Alessandro Petruzzelli, Cataldo Musto, Marco de Gemmis, Pasquale Lops, and Giovanni Semeraro.
          <article-title>DistillRecDial: A knowledge-distilled dataset capturing user diversity in conversational recommendation</article-title>
          .
          <source>In Proceedings of the Nineteenth ACM Conference on Recommender Systems, RecSys '25</source>
          . ACM, September
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] Anup Shirgaonkar, Nikhil Pandey, Nazmiye Ceren Abay, Tolga Aktas, and Vijay Aski. <article-title>Knowledge distillation using frontier open-source LLMs: Generalizability and the role of synthetic data</article-title>. <source>arXiv preprint arXiv:2410.18588</source>, <year>2024</year>.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] Junjie Zhang, Ruobing Xie, Yupeng Hou, Xin Zhao, Leyu Lin, and Ji-Rong Wen. <article-title>Recommendation as instruction following: A large language model empowered recommendation approach</article-title>. <source>ACM Trans. Inf. Syst.</source>, <year>2024</year>.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] Kun Zhou, Xiaolei Wang, Yuanhang Zhou, Chenzhan Shang, Yuan Cheng, Wayne Xin Zhao, Yaliang Li, and Ji-Rong Wen. <article-title>CRSLab: An open-source toolkit for building conversational recommender system</article-title>. <source>arXiv preprint arXiv:2101.00939</source>, <year>2021</year>.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>