<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the PIR Track at FIRE 2024: Evaluation of Personalised Information Retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pranav Kasela</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Braga</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Efrosyni Sokli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gian Carlo Milanese</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Georgios Peikos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sandip Modha</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Raganato</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Viviani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gabriella Pasi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Information and Knowledge Representation, Retrieval, and Reasoning (IKR3) Lab, Department of Informatics, Systems, and Communication (DISCo), University of Milano-Bicocca</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This abstract provides a short overview of the first edition of the shared task on Personalised Information Retrieval (PIR), organised at the 16th Forum for Information Retrieval Evaluation (FIRE 2024). A more detailed discussion of the approaches used by the participating teams is available in the track overview paper. PIR 2024 consisted of two sub-tasks. The first sub-task explores personalisation in community Question Answering (cQA) based on user profiles, following the standard IR pipeline. The second investigates personalisation in cQA based on user profiles using recent Large Language Models (LLMs) and prompt engineering. Although the tasks saw an enthusiastic response in registrations, with 10 teams requesting the dataset, only 1 team ultimately submitted runs, and 2 teams submitted working notes.</p>
      </abstract>
      <kwd-group>
        <kwd>Information Retrieval</kwd>
        <kwd>Question Answering</kwd>
        <kwd>Personalization</kwd>
        <kwd>Large Language Model</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Task Definition</title>
      <sec id="sec-2-1">
        <title>The first edition of PIR consisted of the following two sub-tasks:</title>
        <sec id="sec-2-1-1">
          <title>2.1. Task 1: Standard IR</title>
          <p>
            The cQA task will be tackled as a standard ad-hoc IR task, where the questions are treated
as the queries, and the collection from which the answers will be retrieved is composed of all the
answers available in the dataset. In this case, personalization can be tackled using any standard or
novel technique to create a user profile and inject it into the retrieval model. We plan to provide multiple
baselines that use, as first-stage retrievers, both classical approaches such as BM25 and neural
approaches based on BERT-like models [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]. As a second stage, we plan to provide re-rankers: cross-encoders such as
Mono-T5 [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] for the non-personalized baselines and, for the personalized baselines, a mix
of tags and historical documents related to the users, weighted according to their importance
for the current question.
          </p>
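          <p>
            As a rough illustration of how a tag-based profile could be injected into a second-stage score, the following minimal sketch interpolates a retrieval score with the share of the user's tag weight covered by a candidate. It is not the baseline implementation used in the track: the function names, the interpolation weight lam, and the idea of attaching tags to each candidate answer (for instance, the tags of its question) are assumptions made for this example.
          </p>
          <preformat>
```python
from collections import Counter


def tag_profile(user_tags: list[str]) -> dict[str, float]:
    """Turn a user's historical tags into normalised importance weights."""
    counts = Counter(user_tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()} if total else {}


def personalised_score(relevance: float, doc_tags: set[str],
                       profile: dict[str, float], lam: float = 0.2) -> float:
    """Interpolate a retrieval score with the profile weight matched by the candidate's tags."""
    profile_match = sum(w for tag, w in profile.items() if tag in doc_tags)
    return (1.0 - lam) * relevance + lam * profile_match


def rerank(candidates, profile, lam=0.2):
    """Sort (doc_id, relevance, doc_tags) triples by personalised score, best first."""
    scored = [(doc_id, personalised_score(rel, tags, profile, lam))
              for doc_id, rel, tags in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical user history and first-stage candidates (scores assumed to lie in [0, 1]).
profile = tag_profile(["history", "history", "latin", "poetry"])
candidates = [
    ("a1", 0.80, {"cooking"}),
    ("a2", 0.75, {"history", "latin"}),
]
ranking = rerank(candidates, profile, lam=0.2)
```
          </preformat>
          <p>
            In this toy example the answer a2 overtakes a1 because it matches the user's dominant tags; setting lam to 0 recovers the non-personalized ordering.
          </p>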
        </sec>
        <sec id="sec-2-1-2">
          <title>2.2. Task 2: Prompt-based IR</title>
          <p>
            Differently from the second stage of the standard IR task, the proposed prompt-based baselines
personalise the results by using models like Phi [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ] and GPT [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] with prompts similar to the following one:
“To which degree between 0 and 1 does the document [DOCUMENT] answer the question [QUESTION],
and is relevant to a user with the following profile [USER PROFILE]”, where the [USER PROFILE] is a
series of user interests that are inferred from their activities and ordered according to their timestamp
(most recent first) and importance.
          </p>
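          <p>
            The prompt described above can be assembled programmatically. The sketch below fills the quoted template and orders the user interests by recency and then importance; the Interest fields, the tie-breaking rule, and the parse_score helper are hypothetical choices for illustration, and no specific model API is assumed.
          </p>
          <preformat>
```python
from dataclasses import dataclass

PROMPT_TEMPLATE = (
    "To which degree between 0 and 1 does the document {document} "
    "answer the question {question}, and is relevant to a user with "
    "the following profile {profile}"
)


@dataclass
class Interest:
    label: str
    timestamp: int      # hypothetical: time of the activity, larger means more recent
    importance: float   # hypothetical: weight of the interest


def build_prompt(document: str, question: str, interests: list[Interest]) -> str:
    """Fill the template; interests are listed most recent first, ties broken by importance."""
    ordered = sorted(interests, key=lambda i: (i.timestamp, i.importance), reverse=True)
    profile = ", ".join(i.label for i in ordered)
    return PROMPT_TEMPLATE.format(document=document, question=question, profile=profile)


def parse_score(reply: str) -> float:
    """Clamp a textual model reply to [0, 1]; return 0.0 when it is not a number."""
    try:
        return min(1.0, max(0.0, float(reply.strip())))
    except ValueError:
        return 0.0


prompt = build_prompt(
    "D", "Q", [Interest("latin", 10, 0.5), Interest("history", 20, 0.9)]
)
```
          </preformat>
          <p>
            The resulting string would be sent to the chosen LLM, and the parsed degree could then be used as the re-ranking score for the candidate document.
          </p>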
          <p>
            More details about how this dataset was created can be found in the original resource paper [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ].
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset and Evaluation</title>
      <p>In this section, we discuss the datasets for each sub-task and the evaluation metrics used for each of
them.</p>
      <p>The PIR-FIRE track will use data from StackExchange, a very popular community Question Answering
(cQA) platform. The data is publicly available under a CC BY-SA 4.0 license. The dataset is composed of
questions and their answers, collected from fifty communities that can be categorized under the
large umbrella of humanistic communities. In Table 1 we report the basic statistics for the dataset.
Specifically: document length, measured in number of words; document score, which is the difference
between the number of up- and down-votes assigned by the community; answers’ count, the number of
answers given to a question; comments’ count, the number of user comments on a given question or
answer; favorite count, the number of users who flagged the question as their favorite,
showing their interest in that topic; and tags count, the number of tags associated with the question by the
asking user.</p>
      <p>
        The dump is curated and merged to tackle the cQA task as a retrieval task. The PIR-FIRE test collections
provide the traditional components used in IR experiments, i.e. access to a document collection, search
topics, and corresponding relevance judgments. Regarding the judgments, we consider as relevant only
the single answer that is explicitly labelled as the best answer by the user who submitted the question.
In addition, our PIR evaluation test collections [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] are accompanied by user-related information for
modelling and introducing profiles in evaluation experiments. The user-related information includes
the text and the number of views of the documents the users have generated and, in many cases, the tags
associated with these documents, the date on which they registered on the website, the badges they
obtained, their reputation score, and sometimes their autobiography. This information
can be used to personalise and adapt the search process to the current user,
e.g. by creating and exploiting personal user profiles.
      </p>
      <p>
        <bold>3.0.1. Evaluation setup.</bold> We will provide the participants with several baselines, including keyword-based and dense
representations of user profiles (anonymised information gathered about individual users), as part of our data
collection. For this shared task, we will use traditional IR evaluation metrics that can
also be applied to personalized search: Precision (P), Recall (R), Mean Average Precision (MAP), Mean
Reciprocal Rank (MRR), and (normalized) Discounted Cumulative Gain (nDCG).
      </p>
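      <p>
        To make the evaluation concrete, the following minimal sketch computes two of the listed metrics, MRR and nDCG at a cutoff, under the judgment scheme described in this section, where the only relevant document for a question is its best answer. The cutoff k = 10, the binary gain, and the toy identifiers are assumptions for the example; the official scores may have been produced with a different tool and configuration.
      </p>
      <preformat>
```python
import math


def reciprocal_rank(ranking: list[str], relevant: set[str]) -> float:
    """Return 1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


def ndcg_at_k(ranking: list[str], relevant: set[str], k: int = 10) -> float:
    """Binary-gain nDCG@k: DCG of the ranking divided by the DCG of an ideal ranking."""
    dcg = sum(1.0 / math.log2(pos + 1)
              for pos, doc_id in enumerate(ranking[:k], start=1)
              if doc_id in relevant)
    ideal = sum(1.0 / math.log2(pos + 1)
                for pos in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0


def evaluate(runs: dict[str, list[str]], qrels: dict[str, set[str]],
             k: int = 10) -> dict[str, float]:
    """Average MRR and nDCG@k over all judged queries (qrels is assumed non-empty)."""
    queries = list(qrels)
    mrr = sum(reciprocal_rank(runs.get(q, []), qrels[q]) for q in queries) / len(queries)
    ndcg = sum(ndcg_at_k(runs.get(q, []), qrels[q], k) for q in queries) / len(queries)
    return {"MRR": mrr, f"nDCG@{k}": ndcg}


# Toy judgments: each question has exactly one relevant (best) answer.
qrels = {"q1": {"a3"}, "q2": {"a7"}}
runs = {"q1": ["a3", "a1"], "q2": ["a5", "a7", "a2"]}
scores = evaluate(runs, qrels)
```
      </preformat>
      <p>
        With a single relevant answer per question, the ideal DCG is 1, so nDCG reduces to the logarithmic discount at the rank of the best answer, and MRR to the reciprocal of that rank.
      </p>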
    </sec>
    <sec id="sec-4">
      <title>4. Participation</title>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>A total of 10 teams registered across both sub-tasks. Only one team (Word Wizards) submitted runs, and only for
Task 1. Across both tasks, 2 teams submitted working notes.</p>
      <p>Table 2 shows the performance of our proposed baselines and the submitted runs.</p>
      <p>The Personalised Information Retrieval (PIR) track at FIRE’24 focused on the evaluation of Personalised
Information Retrieval, which remains an important topic both in research and in the development of
practical applications.</p>
      <p>In future efforts, we plan to address the challenges encountered by making datasets more
manageable, enhancing promotion, and broadening the scope of personalization to include diverse tasks in
Information Retrieval (IR) and possibly Natural Language Processing (NLP). By providing resources
such as pre-trained models, smaller datasets, and novel tasks, we aim to encourage a stronger focus on
personalization across varied domains. These steps should help attract a broader range of participants and
methodologies, driving greater engagement and impact.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <sec id="sec-6-1">
        <title>The author(s) have not employed any Generative AI tools.</title>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Kasela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Braga</surname>
          </string-name>
          , G. Pasi,
          <string-name>
            <given-names>R.</given-names>
            <surname>Perego</surname>
          </string-name>
          ,
          <article-title>SE-PQA: Personalized community question answering</article-title>
          ,
          <source>in: Companion Proceedings of the ACM on Web Conference 2024</source>
          , WWW '24, Association for Computing Machinery, New York, NY, USA,
          <year>2024</year>
          , pp.
          <fpage>1095</fpage>
          -
          <lpage>1098</lpage>
          . URL: https://doi.org/10.1145/3589335.3651445. doi:10.1145/3589335.3651445.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Braga</surname>
          </string-name>
          ,
          <article-title>Personalized large language models through parameter efficient fine-tuning techniques</article-title>
          ,
          <source>in: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>3076</fpage>
          -
          <lpage>3076</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Braga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kasela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Raganato</surname>
          </string-name>
          , G. Pasi,
          <article-title>Synthetic data generation with large language models for personalized community question answering</article-title>
          ,
          <source>arXiv preprint arXiv:2410.22182</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Kasela</surname>
          </string-name>
          , G. Pasi,
          <string-name>
            <given-names>R.</given-names>
            <surname>Perego</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tonellotto</surname>
          </string-name>
          ,
          <article-title>DESIRE-ME: Domain-enhanced supervised information retrieval using mixture-of-experts</article-title>
          ,
          <source>in: European Conference on Information Retrieval</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>111</fpage>
          -
          <lpage>125</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Salemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mysore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bendersky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zamani</surname>
          </string-name>
          ,
          <article-title>LaMP: When large language models meet personalization</article-title>
          ,
          <source>arXiv preprint arXiv:2304.11406</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>arXiv preprint arXiv:1810.04805</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Nogueira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pradeep</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>Document ranking with a pretrained sequence-to-sequence model</article-title>
          , in: T. Cohn, Y. He, Y. Liu (Eds.),
          <source>Findings of the Association for Computational Linguistics: EMNLP 2020</source>
          , Association for Computational Linguistics, Online,
          <year>2020</year>
          , pp.
          <fpage>708</fpage>
          -
          <lpage>718</lpage>
          . URL: https://aclanthology.org/2020.findings-emnlp.63. doi:10.18653/v1/2020.findings-emnlp.63.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Abdin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Jacobs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Awan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Aneja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Awadallah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Awadalla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bahree</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bakhtiari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Behl</surname>
          </string-name>
          , et al.,
          <article-title>Phi-3 technical report: A highly capable language model locally on your phone</article-title>
          ,
          <source>arXiv preprint arXiv:2404.14219</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Achiam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Adler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Akkaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. L.</given-names>
            <surname>Aleman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Altenschmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Altman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Anadkat</surname>
          </string-name>
          , et al.,
          <article-title>GPT-4 technical report</article-title>
          ,
          <source>arXiv preprint arXiv:2303.08774</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>