<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Context for Conversational Information Seeking</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Haitao Yu</string-name>
          <email>yuhaitao@slis.tsukuba.ac.jp</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lingzhen Zheng</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kaiyu Yang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sumio Fujita</string-name>
          <email>sufujita@lycorp.co.jp</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hideo Joho</string-name>
          <email>hideo@slis.tsukuba.ac.jp</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <kwd-group>
          <kwd>Conversational Information Seeking</kwd>
          <kwd>Personalized Context</kwd>
          <kwd>LLM</kwd>
        </kwd-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Graduate School of Comprehensive Human Sciences, University of Tsukuba</institution>
          ,
          <addr-line>Tsukuba City, Ibaraki</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Library, Information and Media Science, University of Tsukuba</institution>
          ,
          <addr-line>Tsukuba City, Ibaraki</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>LY Research, LY Corporation</institution>
          ,
          <addr-line>Tokyo</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Washington</institution>
          ,
          <addr-line>DC</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Conversational information seeking (CIS) extends classic search to a conversational setting and has attracted significant attention in recent years. Yet one size does not fit all: users with different personas often need high-quality personalized responses. For example, for a search about alternatives to cow's milk, the desired responses may differ considerably. In this work, we focus on CIS that accounts for personalized retrieval and response generation. Specifically, we follow the CIS paradigm presented in the TREC iKAT track, which consists of three core tasks, namely personal textual knowledge base (PTKB) statement ranking, passage ranking, and response generation. For PTKB statement ranking, we propose to fuse multiple large language models (LLMs). For passage ranking, we propose four different strategies for personalized retrieval. For response generation, we resort to zero-shot LLM-based answer generation that incorporates personalized context. The experimental results show that: (1) For PTKB statement ranking, our method achieves the best performance in terms of MRR on the set of iKAT organizers' assessments, and also outperforms the baseline based on GPT-4. This indicates that a fusion of multiple LLMs is a promising choice when tackling problems of this kind. (2) For passage ranking, one of our proposed strategies achieves performance comparable to the Llama2-based baseline; moreover, our analysis indicates that the way PTKB statements are incorporated for personalized retrieval matters, and a direct concatenation is not recommended. (3) For response generation, our proposed method generates grounded and natural personalized responses, and is comparable to the top-tier LLM-based baseline.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In recent years, conversational systems have attracted
considerable attention from both academic researchers and
industrial practitioners. In the field of information retrieval (IR),
conversational information seeking (CIS) has been identified
as one of the most important research directions.
Remarkable efforts have been made on different aspects,
including, but not limited to, conversational search
conceptualization [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ], conversational query re-writing [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ],
generating and selecting clarifying questions [
        <xref ref-type="bibr" rid="ref10 ref7 ref8 ref9">7, 8, 9, 10</xref>
        ] and
conversational response generation [
        <xref ref-type="bibr" rid="ref11 ref12 ref13">11, 12, 13</xref>
        ].
      </p>
      <p>
        Despite the successes achieved by the aforementioned
studies, fundamental research questions remain open. For
example, providing high-quality, user-specific responses is
still a challenging problem. Take the case by Aliannejadi et
al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] as an example: for a search about alternatives to
cow’s milk, two personas can be: (A) Alice is a vegan who
is deeply concerned about the environment; and (B) Bob has
been recently diagnosed with diabetes, has a nut allergy, and
is lactose intolerant. Given Alice’s and Bob’s personas, their
corresponding conversations with the system would evolve
and develop in very different ways. Put another way, the
responses that are helpful to Alice may not necessarily be
useful to Bob, and vice versa. In fact, information needs of
this kind are prevalent in daily information searches, including,
but not limited to, job finding, healthcare search and
online shopping. Given information needs expressed
as a sequence of search queries (or questions) under different
personas, a CIS system is expected to provide personalized
retrieval and response generation.
Our main contributions are summarized as follows:
• For statement ranking, we propose to fuse multiple
LLMs. Our method achieves the best performance
in terms of MRR on the set of iKAT organizers’
assessments, which relies on a larger assessment pool.
Moreover, our method also shows superior
performance over the GPT-4-based baseline. This
highlights that it is not straightforward to solve a
component task by merely tailoring a powerful LLM,
whereas a fusion of multiple LLMs can be a
promising choice when tackling problems of this kind.
• For passage ranking, we propose four different
strategies for personalized retrieval, which enable
us to investigate the impact of utterance
rewriting and of how personalized context is
incorporated. Through result analysis and comparison, we
found that although our proposed method for
selecting PTKB statements is relatively reliable, how the
selected PTKB statements are incorporated to
formulate the input for personalized retrieval matters a lot;
a direct concatenation is not suggested, given the
inferior performance of the corresponding strategies.
• For response generation, we resort to zero-shot
LLM-based answer generation by incorporating
personalized context. Our method is able to generate
grounded and natural personalized responses, and
is comparable to the top-tier LLM-based baseline.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Preliminaries</title>
      <p>Figure 1 describes our focused framework for CIS that
accounts for users’ personas. It assumes that there is a
personal textual knowledge base (PTKB), which consists of narrative
sentences providing personal information about the users.
A system following this framework consists of the following
key modules. (1) Statement ranking: given the context of the
conversation and the current user utterance, this module
returns a ranked list of PTKB statements based on their
relevance, which reflects the user’s persona; (2) Passage ranking:
given the context of the conversation, the current user
utterance, and the PTKB statements, this module is responsible
for retrieving a ranked list of passages from the document
collection; (3) Response generation: this module returns the
answer text as a response to the user. In particular, the
response should be a generative or abstractive summary of the
relevant passages. We recognize that a gap exists between
our focused framework for CIS and real-world search
scenarios. Since this topic is still in its infancy, we leave
exploring more complex frameworks to future work.</p>
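      <p>As a rough illustration of how the three modules compose, the sketch below wires them together with toy word-overlap scorers standing in for the LLM-based components; all names and data here are illustrative assumptions, not the actual implementation.

```python
# Minimal sketch of the three-module CIS framework.
# Toy word-overlap scorers stand in for the LLM-based components.

def rank_statements(ptkb, utterance):
    # Module 1: order PTKB statements by relevance to the current turn.
    words = set(utterance.lower().split())
    return sorted(ptkb, key=lambda s: len(words.intersection(s.lower().split())),
                  reverse=True)

def rank_passages(collection, utterance, statements):
    # Module 2: retrieve a ranked list of passages given the personalized context.
    query = set((utterance + " " + " ".join(statements)).lower().split())
    return sorted(collection, key=lambda p: len(query.intersection(p.lower().split())),
                  reverse=True)

def generate_response(passages, statements):
    # Module 3: produce a grounded response; here, simply echo the top passage.
    return passages[0]

ptkb = ["I am vegan.", "I have a nut allergy."]
collection = ["Oat milk is a vegan alternative to cow's milk.",
              "Almond milk contains nuts."]
stmts = rank_statements(ptkb, "alternatives to cow's milk")
passages = rank_passages(collection, "alternatives to cow's milk", stmts[:2])
print(generate_response(passages, stmts[:2]))
```
</p>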
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>Given the target paradigm for CIS in Section 2, we elaborate
below on the proposed methods for addressing each key
module.</p>
      <sec id="sec-3-1">
        <title>3.1. Statement Ranking by Fusing Multiple LLMs</title>
        <p>
          The key idea of our method (denoted as SR_FML) for
tackling statement ranking is to effectively fuse multiple LLMs
through a cascade of four steps. At the first step, we rewrite
each conversation turn’s utterance. Specifically, the
T5CANARD model [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] fine-tuned with the testing topics of
TREC CAsT 2022 [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] is used, and the preceding turns’
conversations (3 turns at most) are used as the context. At
the second step, given the candidate PTKB statements, we
perform binary logistic regression based on the BERT [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]
model. The candidate PTKB statements with a true label are
kept for later steps, and the statements with a false label are
filtered out. At the third step, we perform binary logistic
regression again over the remaining PTKB statements based
on MonoT5 [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] in the same way as the second step. In
addition, we use RankGPT [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] to sort the PTKB statements,
and assign the top half statements with a true label, and
a false label for the remaining bottom statements. At the
fourth step, we manage to unify the ranking information and
the binary classification results of the previous two steps via
a scoring function and an indicator function. The scoring
function assigns a weight to each PTKB statement p remaining
after the second step as follows:

score(p) = 1 − (rank_MonoT5(p) + rank_RankGPT(p)) / (2 · |P|)   (1)

where rank_MonoT5(p) and rank_RankGPT(p) represent the rank
positions according to the regression scores by MonoT5
and RankGPT, respectively, and |P| represents the number of
PTKB statements remaining after the second step. The
indicator function builds upon score(p) and a voting mechanism
as follows:

ind(p) = 1 if (v_BERT(p) + v_MonoT5(p) + v_RankGPT(p)) ≥ 2
and score(p) > 0.65, and 0 otherwise   (2)

where v_BERT(p), v_MonoT5(p), and v_RankGPT(p) respectively
represent the binary classification result by each adopted
model, where an output of 1 denotes a true label, and 0 a
false label.
        </p>
        <p>The final result list of PTKB statements is generated by
selecting statements with a positive output via the indicator
function and ranking them via the scoring function in
decreasing order.</p>
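        <p>The fourth-step fusion can be sketched as follows; the function and variable names are our own shorthand for the scoring and indicator functions, and the 0.65 threshold is the one stated in the text.

```python
# Sketch of the fourth-step fusion of ranking and classification signals.

def fusion_score(rank_monot5, rank_rankgpt, n_remaining):
    # Scoring function: closer to 1 when both rankers place the statement
    # near the top of the remaining PTKB statements.
    return 1 - (rank_monot5 + rank_rankgpt) / (2 * n_remaining)

def indicator(votes, score, threshold=0.65):
    # Indicator function: keep a statement only if at least 2 of the 3
    # models vote "relevant" and its fused score exceeds the threshold.
    return 1 if sum(votes) >= 2 and score > threshold else 0

# A statement ranked 1st by MonoT5 and 2nd by RankGPT among 5 remaining statements:
s = fusion_score(1, 2, 5)       # 1 - 3/10 = 0.7
keep = indicator([1, 1, 0], s)  # two positive votes and s above 0.65
```

The final list keeps the statements with a positive indicator output and sorts them by the fused score in decreasing order.</p>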
      </sec>
      <sec id="sec-3-3">
        <title>3.2. Zero-shot LLM-based Passage Ranking</title>
        <p>To cope with passage ranking, we resort to the typical
pipeline of retrieve-then-rank. Firstly, we use BM25 with
the default setting in Pyserini to retrieve the top 5
passages. Then we design 4 strategies (denoted as PR_S1, PR_S2,
PR_S3 and PR_S4, respectively) to re-rank the top 5 passages
using multiple specifically selected LLMs in a zero-shot
manner.</p>
        <p>To formulate the input, PR_S1, PR_S3, and PR_S4
concatenate the rewritten utterance and the top 2 relevant PTKB
statements returned by the module of statement ranking.
PR_S2 directly uses the rewritten utterance as the input.</p>
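        <p>This input formulation can be sketched as follows (the strategy names come from the text; the function itself is illustrative):

```python
# Input formulation for the four re-ranking strategies.

def formulate_input(strategy, rewritten_utterance, ranked_ptkb):
    if strategy in ("PR_S1", "PR_S3", "PR_S4"):
        # Concatenate the rewritten utterance with the top-2 PTKB statements
        # returned by the statement-ranking module.
        return rewritten_utterance + " " + " ".join(ranked_ptkb[:2])
    # PR_S2: the rewritten utterance alone.
    return rewritten_utterance

query = formulate_input("PR_S1", "What are alternatives to cow's milk?",
                        ["The user is vegan.", "The user has a nut allergy."])
```
</p>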
        <p>
          During the ranking process, the differences among
the four strategies are as follows: (1) PR_S1 and
PR_S2 assemble the results of multiple LLMs (i.e.,
"stabilityai/stablelm-tuned-alpha-7b",
"eachadea/vicuna-13b-1.1", "jondurbin/airoboros-7b",
"TheBloke/koala-13B-HF") [
          <xref ref-type="bibr" rid="ref20 ref21 ref22 ref23 ref24">20, 21, 22, 23, 24</xref>
          ] in a voting manner. Specifically,
given the information need represented by the input, we
ask each LLM to compare the candidate passages in a
pairwise manner. The passage that is identified as more
relevant than the other gets a vote. Finally, we rank the
passages by their cumulative number of votes in
decreasing order; (2) PR_S3 merely relies on MonoT5 with
the default setting in PyGaggle to rank the passages; (3)
PR_S4 relies on the idea of RankGPT to rank the passages,
where the GPT-3.5 API is used.
        </p>
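        <p>The pairwise voting used by PR_S1 and PR_S2 can be sketched as follows; the stub judges below simply prefer the longer passage and are a placeholder assumption for the actual zero-shot LLM comparisons.

```python
# Pairwise voting over candidate passages.
from itertools import combinations

def vote_rank(passages, judges):
    # Each judge compares every pair of passages; the passage judged more
    # relevant gets one vote. Passages are ranked by cumulative votes, descending.
    votes = {p: 0 for p in passages}
    for a, b in combinations(passages, 2):
        for judge in judges:
            votes[judge(a, b)] += 1
    return sorted(passages, key=lambda p: votes[p], reverse=True)

# Four stub judges standing in for the four LLMs.
judges = [lambda a, b: a if len(a) > len(b) else b] * 4
ranked = vote_rank(["short passage",
                    "a much longer and more detailed candidate passage"], judges)
```
</p>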
      </sec>
      <sec id="sec-3-4">
        <title>3.3. Personalized Response Generation</title>
        <p>
          For response generation, we aim to generate
personalized responses. Specifically, for each conversation turn,
the top-1 passage and the top-2 PTKB statements
representing the personalized context are used as the input. For
the base LLM, we resort to T5 [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], which is specifically
fine-tuned for the summarization task.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Setup</title>
      <sec id="sec-4-1">
        <title>4.1. Dataset</title>
        <p>
          We use the dataset released by TREC iKAT 2023, which
contains 25 testing topics, for evaluating effectiveness. Each topic
has 1∼3 subtree conversations that represent different
personas. For each personalized conversation, there is a
list of around 10 PTKB statements. Moreover, the passage
collection has 116,838,987 passages, which is derived from
a subset of ClueWeb22-B [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ].
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Baselines</title>
        <p>In order to make a fair and thorough analysis, we perform a
module-specific comparison by selecting the most
competitive and representative baseline methods from TREC iKAT
2023’s participants. We add a prefix of BS to each baseline
method for clarity.</p>
        <p>
          For statement ranking, BS_zs_Llama and BS_ft_Llama use
zero-shot and fine-tuned Llama-2-7b-chat [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] for rewriting
the utterance, respectively. Then they use MiniLM12 [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ]
to rank PTKB statements based on the rewritten utterance.
        </p>
        <p>For passage ranking, BS_Llama2 initially instructs
Llama2-7b-chat to reformulate the current utterance considering
previous conversation turns’ context. Then, the revised
conversation, along with a specific passage, is provided to
the model to assess the passage’s relevance.</p>
        <p>
          For response generation, BS_FastChatT5andLlama
creates a summary of each of the top passages retrieved
by BM25 using FastChatT5 [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ], then generates the
response to the current utterance based on the summaries in a
retrieval-generate loop. BS_DenseMonoT5 summarizes a final
response using different engines, including
conventional language models and Llama2, based on the top passages.
        </p>
        <p>
          Besides the above module-specific baseline methods,
BS_GPT-4 is compared across three modules, which
represents the method using the most powerful LLM (i.e., GPT-4
[
          <xref ref-type="bibr" rid="ref30">30</xref>
          ]). For statement ranking, BS_GPT-4 casts it as a binary
classification problem. The prompt includes the instruction,
context of the conversation, PTKB statements of the user,
and current user utterance. The output is a ranked list of
relevant statements. For passage ranking, BS_GPT-4 initially
generates an answer for each turn. Subsequently, GPT-4 is
employed to produce five queries for each answer. These
generated queries are used via BM25 to retrieve passages,
then the pre-trained MiniLM12 is deployed for ranking the
passages. For response generation, GPT-4 is prompted to
generate the answer, using the top-10 retrieved passages,
the top-3 PTKB statements, the context of the conversation
and the user utterance.
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Implementation Details</title>
        <p>
          All experiments were conducted on a server with two A100
(40GB) GPUs. The CUDA version is 12.2. For fine-tuning
T5-CANARD, the configuration is: training epochs: 5, batch
size: 4, learning rate: 1e-5. For SR_FML, bert-base-uncased
with default parameter settings is used as the backbone
model, which comes from transformers library provided by
HuggingFace [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ]. We iterate its predictions five times and
compute the average relevance scores for each statement.
For RankGPT, the configuration is: window size: 4, step size:
1. MonoT5 with default parameter settings in PyGaggle
is used. In PR_S4, the window size of RankGPT is adjusted
to 3. In PR_S1 and PR_S2, we set the prompt_max_length
of the four zero-shot LLMs to 2048. Additionally, we set the
decoding method to beam_search, output_max_length to
512, and temperature to 1.0 by default [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ]. For RG_SumT5,
t5-base-finetuned-summarize-news is employed with
configuration: input max_length: 512, output min_length: 50,
output max_length: 150, length_penalty: 2.0, num_beams:
4.
        </p>
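        <p>For reference, the hyperparameters listed above can be collected in one place as follows; the grouping and key names are our own, not an official configuration format.

```python
# Hyperparameters stated in Section 4.3, grouped for readability.
config = {
    "t5_canard_finetune": {"epochs": 5, "batch_size": 4, "learning_rate": 1e-5},
    "rankgpt": {"window_size": 4, "step_size": 1},
    "zero_shot_llms": {"prompt_max_length": 2048, "decoding": "beam_search",
                       "output_max_length": 512, "temperature": 1.0},
    "rg_sumt5": {"input_max_length": 512, "output_min_length": 50,
                 "output_max_length": 150, "length_penalty": 2.0,
                 "num_beams": 4},
}
```
</p>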
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results and Analysis</title>
      <p>In Table 1, Table 2 and Table 3, we show the overall
performance of the baseline approaches, and the proposed
methods for statement ranking, passage ranking and response
generation, respectively. Within each table, the best result
in terms of each metric is indicated in bold, and the
second-best result is underlined.</p>
      <p>
        For statement ranking, we note that there are two sets
of assessments which were created by the iKAT
organizers and NIST assessors, respectively. The key differences
are that: During topic generation, the organizers annotated
each turn in terms of their provenance to PTKB statements
and included their labels in the released topic files. During
the assessment of passage relevance, the NIST assessors
were also asked to judge the relevance of PTKB statements
to each turn. The assessment pool is smaller than the one
done by the organizers. The organizers judged all of the
turns, while the NIST assessors only judged the turns that
were selected for passage relevance [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. From Table 1,
we can observe that BS_zs_Llama outperforms the other
methods in terms of nDCG@3, P@3 and Recall@3. Though
BS_ft_Llama relies on the same LLM, its performance is
degraded by the utterances rewritten in the fine-tuned setting.
On the contrary, BS_GPT-4 relying on the powerful GPT-4
shows inferior performance across two sets of assessments.
This indicates that the usage of GPT-4 for statement
ranking is not straightforward, and further exploration is needed
for better performance. Over the set of iKAT organizers'
assessments, our proposed method (i.e., SR_FML) shows
performance competitive with BS_zs_Llama, and achieves the
best performance in terms of MRR. This indicates the benefit
of fusing multiple LLMs, which enables us to leverage
the advantages of different LLMs. In view of the fact that
the set of iKAT organizers' assessments is based on a larger
assessment pool, it is reasonable to say that the evaluation
over this set is more reliable.
      </p>
      <p>
        For passage ranking, the results in Table 2 show that
BS_GPT-4 significantly outperforms BS_Llama2 and our
proposed methods by a large margin. This echoes the findings
in prior studies [
        <xref ref-type="bibr" rid="ref19 ref33 ref34 ref35">19, 33, 34, 35</xref>
        ] which have shown the leading
capability of GPT-4 in the passage ranking task. One
probable reason is that the pipeline of generate-retrieve-generate
adopted by BS_GPT-4 is more suitable for passage ranking
than our adopted pipeline of retrieve-generate. Among our
proposed strategies for passage ranking, PR_S2 shows the
best performance, and also outperforms BS_Llama2.
Compared with BS_Llama2, a possible reason for the inferior
performance of the other three strategies is the way of
formulating the input. We directly concatenate the utterance
and related PTKB statements as the input, while BS_Llama2
rewrites the utterance with the statements using LLM.
Another possible reason for our inferior performance is that
we focus on the earlier positions and only re-rank the top-5
passages returned by BM25. As a result, this setting would
become a bottleneck for us to get relevant passages given
the limited retrieval ability of BM25.
      </p>
      <p>For response generation, the results are evaluated in terms
of groundedness and naturalness. Groundedness measures
whether the generated response can be attributed to the
passages that it is supposed to be generated from. Naturalness
measures the extent to which the response sounds
humanlike, such as the general fluency and understandability of
the generated response. GPT-4 is used to evaluate both
the groundedness and naturalness of the responses in each
turn. Finally, the mean of groundedness and naturalness
over all turns is reported. From Table 3, we can observe
that BS_GPT-4 again outperforms the other methods by
a large margin. Our proposed method (i.e., RG_SumT5)
outperforms BS_DenseMonoT5 and shows performance
competitive with BS_FastChatT5andLlama.</p>
      <p>It is noticeable that the evaluation results are likely to be
somewhat biased towards BS_GPT-4, since the evaluation
is conducted by GPT-4. We leave further testing of the
effectiveness of these methods for response generation
through human evaluation to future work.</p>
      <p>A joint look across Table 1, Table 2 and Table 3 reveals
that: First, we do not observe a clear correlation between
statement ranking and passage ranking, which seems
counterintuitive. For instance, though BS_GPT-4 shows inferior
performance in statement ranking, it outperforms the other
methods by a large margin in passage ranking. This
counterintuitiveness may arise from a number of possible reasons,
such as the strong zero-shot capability of GPT-4 and the
precise understanding of persona information underlying
selected PTKB statements. This is also worth
investigating in future work. Second, for both personalized
retrieval and response generation in the context of CIS, there
is still much room to improve performance.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this study, we focus on CIS that accounts for personalized
retrieval and response generation. By following the CIS
paradigm presented in the TREC iKAT track, we propose
different methods to tackle three core tasks, namely
personal textual knowledge base (PTKB) statement ranking,
passage ranking and response generation. We have shown
that fusing multiple LLMs is a promising way for addressing
PTKB statement ranking. Also, our analysis indicates that
an effective way of injecting the selected PTKB statements
is quite important for personalized retrieval. Since
conversational systems arise in a variety of applications, such as
recommender systems and question answering, we believe
that our work provides insights for developing
conversational systems that account for personalized retrieval and
response generation.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Acknowledgments</title>
      <p>This research has been supported by JSPS KAKENHI Grant
Number 19H04215.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Azzopardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dubiel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Halvey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dalton</surname>
          </string-name>
          ,
          <article-title>Conceptualizing agent-human interactions during the conversational search process</article-title>
          , in: The second international workshop on conversational approaches to information retrieval,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Deldjoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Trippas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zamani</surname>
          </string-name>
          ,
          <article-title>Towards multimodal conversational information seeking</article-title>
          ,
          <source>in: Proceedings of the 44th International ACM SIGIR conference on research and development in Information Retrieval</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1577</fpage>
          -
          <lpage>1587</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Radlinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Craswell</surname>
          </string-name>
          ,
          <article-title>A theoretical framework for conversational search</article-title>
          ,
          <source>in: Proceedings of the 2017 Conference on Conference Human Information Interaction and Retrieval</source>
          , CHIIR '17, Association for Computing Machinery, New York, NY, USA,
          <year>2017</year>
          , pp.
          <fpage>117</fpage>
          -
          <lpage>126</lpage>
          . URL: https://doi.org/10.1145/3020165.3020183. doi:10.1145/3020165.3020183.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bennett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Few-shot generative conversational query rewriting</article-title>
          ,
          <source>in: Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1933</fpage>
          -
          <lpage>1936</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Vakulenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Longpre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Tu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Anantha</surname>
          </string-name>
          ,
          <article-title>Question rewriting for conversational question answering</article-title>
          ,
          <source>in: Proceedings of the 14th ACM international conference on web search and data mining</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>355</fpage>
          -
          <lpage>363</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.-C.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-H.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Nogueira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-F.</given-names>
            <surname>Tsai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>Multi-stage conversational passage retrieval: An approach to fusing term importance estimation and neural query rewriting</article-title>
          ,
          <source>ACM Trans. Inf. Syst</source>
          .
          <volume>39</volume>
          (
          <year>2021</year>
          ). URL: https://doi.org/10.1145/3446426. doi:10.1145/3446426.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Aliannejadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zamani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          ,
          <article-title>Asking clarifying questions in open-domain information-seeking conversations</article-title>
          ,
          <source>in: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>475</fpage>
          -
          <lpage>484</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zamani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dumais</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Craswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bennett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lueck</surname>
          </string-name>
          ,
          <article-title>Generating clarifying questions for information retrieval</article-title>
          ,
          <source>in: Proceedings of The Web Conference 2020</source>
          , WWW '20, Association for Computing Machinery, New York, NY, USA,
          <year>2020</year>
          , pp.
          <fpage>418</fpage>
          -
          <lpage>428</lpage>
          . URL: https://doi.org/10.1145/3366423.3380126. doi:10.1145/3366423.3380126.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>I.</given-names>
            <surname>Sekulić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Aliannejadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <article-title>Towards facet-driven generation of clarifying questions for conversational search</article-title>
          ,
          <source>in: Proceedings of the 2021 ACM SIGIR International Conference on Theory of Information Retrieval</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>167</fpage>
          -
          <lpage>175</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zamani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lueck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Diaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. N.</given-names>
            <surname>Bennett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Craswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Dumais</surname>
          </string-name>
          ,
          <article-title>Analyzing and learning from user interactions for search clarification</article-title>
          ,
          <year>2020</year>
          . arXiv:2006.00166.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Quan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Multi-domain dialogue acts and response co-generation</article-title>
          ,
          <source>arXiv preprint arXiv:2004.12363</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-S.</given-names>
            <surname>Chua</surname>
          </string-name>
          ,
          <article-title>Structured and natural responses co-generation for conversational search</article-title>
          ,
          <source>in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>155</fpage>
          -
          <lpage>164</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>X.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. M.</given-names>
            <surname>Yoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-W.</given-names>
            <surname>Ha</surname>
          </string-name>
          ,
          <article-title>DialogBERT: Discourse-aware response generation via learning to recover and rank utterances</article-title>
          ,
          <source>Proceedings of the AAAI Conference on Artificial Intelligence</source>
          <volume>35</volume>
          (
          <year>2021</year>
          )
          <fpage>12911</fpage>
          -
          <lpage>12919</lpage>
          . URL: https://ojs.aaai.org/index.php/AAAI/article/view/17527. doi:10.1609/aaai.v35i14.17527.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Aliannejadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Abbasiantaeb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dalton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Azzopardi</surname>
          </string-name>
          ,
          <article-title>TREC iKAT 2023: The interactive knowledge assistance track overview</article-title>
          ,
          <source>in: Proceedings of the Thirty-Second Text REtrieval Conference (TREC 2023)</source>
          ,
          <year>2024</year>
          .
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.-C.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-H.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Nogueira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-F.</given-names>
            <surname>Tsai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>Conversational question reformulation via sequence-to-sequence architectures and pretrained language models</article-title>
          ,
          <source>arXiv preprint arXiv:2004.01909</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>P.</given-names>
            <surname>Owoicho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dalton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Aliannejadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Azzopardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Trippas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vakulenko</surname>
          </string-name>
          ,
          <article-title>TREC CAsT 2022: Going beyond user ask and system retrieve with initiative and response generation</article-title>
          ,
          <source>NIST Special Publication 500-338</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>arXiv preprint arXiv:1810.04805</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>R.</given-names>
            <surname>Nogueira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pradeep</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>Document ranking with a pretrained sequence-to-sequence model</article-title>
          , in:
          <string-name>
            <given-names>T.</given-names>
            <surname>Cohn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          (Eds.),
          <source>Findings of the Association for Computational Linguistics: EMNLP 2020</source>
          , Association for Computational Linguistics, Online,
          <year>2020</year>
          , pp.
          <fpage>708</fpage>
          -
          <lpage>718</lpage>
          . URL: https://aclanthology.org/2020.findings-emnlp.63. doi:10.18653/v1/2020.findings-emnlp.63.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>W.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <article-title>Is ChatGPT good at search? Investigating large language models as re-ranking agents</article-title>
          ,
          <year>2023</year>
          . arXiv:2304.09542.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>X.</given-names>
            <surname>Geng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gudibande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Wallace</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Abbeel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Levine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <article-title>Koala: A dialogue model for academic research</article-title>
          , Blog post,
          <year>2023</year>
          . URL: https://bair.berkeley.edu/blog/2023/04/03/koala/.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Anand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Nussbaum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Duderstadt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mulyar</surname>
          </string-name>
          ,
          <article-title>GPT4All: Training an assistant-style chatbot with large scale data distillation from GPT-3.5-Turbo</article-title>
          , https://github.com/nomic-ai/gpt4all,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>R.</given-names>
            <surname>Taori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gulrajani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dubois</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. B.</given-names>
            <surname>Hashimoto</surname>
          </string-name>
          ,
          <article-title>Stanford Alpaca: An instruction-following LLaMA model</article-title>
          , https://github.com/tatsu-lab/stanford_alpaca,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>W.-L.</given-names>
            <surname>Chiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Stoica</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. P.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <article-title>Vicuna: An open-source chatbot impressing GPT-4 with 90%* ChatGPT quality</article-title>
          ,
          <year>2023</year>
          . URL: https://lmsys.org/blog/2023-03-30-vicuna/.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>H.</given-names>
            <surname>Touvron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lavril</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Izacard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Martinet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-A.</given-names>
            <surname>Lachaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lacroix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Rozière</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hambro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Azhar</surname>
          </string-name>
          , et al.,
          <article-title>LLaMA: Open and efficient foundation language models</article-title>
          ,
          <source>arXiv preprint arXiv:2302.13971</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Matena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Exploring the limits of transfer learning with a unified text-to-text transformer</article-title>
          ,
          <source>The Journal of Machine Learning Research</source>
          <volume>21</volume>
          (
          <year>2020</year>
          )
          <fpage>5485</fpage>
          -
          <lpage>5551</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>A.</given-names>
            <surname>Overwijk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Callan</surname>
          </string-name>
          ,
          <article-title>ClueWeb22: 10 billion web documents with rich information</article-title>
          ,
          <source>in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>3360</fpage>
          -
          <lpage>3362</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>H.</given-names>
            <surname>Touvron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Stone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Albert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Almahairi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Babaei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bashlykov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Batra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhargava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bhosale</surname>
          </string-name>
          , et al.,
          <article-title>Llama 2: Open foundation and fine-tuned chat models</article-title>
          ,
          <source>arXiv preprint arXiv:2307.09288</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-BERT: Sentence embeddings using Siamese BERT-networks</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</source>
          ,
          <year>2019</year>
          . URL: https://arxiv.org/abs/1908.10084.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-L.</given-names>
            <surname>Chiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. P.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Stoica</surname>
          </string-name>
          ,
          <article-title>Judging LLM-as-a-judge with MT-Bench and Chatbot Arena</article-title>
          ,
          <year>2023</year>
          . arXiv:2306.05685.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          OpenAI,
          <article-title>GPT-4 technical report</article-title>
          ,
          <year>2023</year>
          . arXiv:2303.08774.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Delangue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cistac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Louf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Funtowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Davison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shleifer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>von Platen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jernite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Plu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. Le</given-names>
            <surname>Scao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gugger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Drame</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lhoest</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rush</surname>
          </string-name>
          ,
          <article-title>Transformers: State-of-the-art natural language processing</article-title>
          , in: Q. Liu, D. Schlangen (Eds.),
          <source>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</source>
          , Association for Computational Linguistics, Online,
          <year>2020</year>
          , pp.
          <fpage>38</fpage>
          -
          <lpage>45</lpage>
          . URL: https://aclanthology.org/2020.emnlp-demos.6. doi:10.18653/v1/2020.emnlp-demos.6.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>D.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>LLM-Blender: Ensembling large language models with pairwise ranking and generative fusion</article-title>
          ,
          <source>arXiv preprint arXiv:2306.02561</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>R.</given-names>
            <surname>Pradeep</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sharifymoghaddam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>RankVicuna: Zero-shot listwise document reranking with open-source large language models</article-title>
          ,
          <year>2023</year>
          . arXiv:2309.15088.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-R.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <article-title>Large language models for information retrieval: A survey</article-title>
          ,
          <year>2024</year>
          . arXiv:2308.07107.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>R.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ture</surname>
          </string-name>
          ,
          <article-title>Found in the middle: Permutation self-consistency improves listwise ranking in large language models</article-title>
          ,
          <year>2023</year>
          . arXiv:2310.07712.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>