<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Keyword Extraction for Improved Document Retrieval in Conversational Search</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oleg Borisov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohammad Aliannejadi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabio Crestani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Università della Svizzera italiana (USI)</institution>
          ,
          <addr-line>Lugano</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Amsterdam</institution>
          ,
          <addr-line>Amsterdam</addr-line>
          ,
          <country country="NL">Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Recent research has shown that mixed-initiative conversational search, based on the interaction between users and computers to clarify and improve a query, provides enormous advantages. Nonetheless, incorporating additional information provided by the user from the conversation poses some challenges. In fact, further interactions could confuse the system, as a user might use words that are irrelevant to the information need but crucial for correct sentence construction in the context of multi-turn conversations. To this aim, in this paper, we collect two conversational keyword extraction datasets and propose an end-to-end document retrieval pipeline incorporating them. Furthermore, we study the performance of two neural keyword extraction models, namely, BERT and sequence-to-sequence, in terms of extraction accuracy and human annotation. Finally, we study the effect of keyword extraction on end-to-end neural IR performance and show that our approach beats state-of-the-art IR models. We make the two datasets publicly available to foster research in this area.</p>
      </abstract>
      <kwd-group>
        <kwd>Conversational Search</kwd>
        <kwd>Mixed-Initiative Conversations</kwd>
        <kwd>Keyword Extraction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Recent developments in speech recognition and deep learning have led to intelligent assistants,
such as Google Assistant, Microsoft Cortana, and Apple Siri. Consequently, researchers and
users are exploring novel means of communication and information access, such as spoken
queries and conversations [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Research on information-seeking conversational systems has
gained considerable attention recently. Various shared evaluation tasks have been organized in the
community, focusing on single- [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and mixed-initiative [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ] conversational search systems.
The aim of research in mixed-initiative conversations is to enable a system to take the initiative
of the conversation when necessary, aiming to provide a better experience to the user [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. An
example of mixed-initiative interaction is asking clarifying questions, which has recently been
studied in the context of information-seeking conversations [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ] and Web search [
        <xref ref-type="bibr" rid="ref10 ref11 ref8 ref9">8, 9, 10, 11</xref>
        ].
      </p>
      <p>
        In Web search, where users usually type their queries, they take some time to formulate a
query and often do not follow common sentence structures. For example, they only focus on
using the most important words for their search. Consequently, a narrow focus is created for
the search engine, making it easier to inspect documents for the most relevant query words.
In contrast to this, conversational IR faces challenges due to the inclination of users to follow
their own speech patterns when formulating queries rhetorically. Here, users tend to include
some unnecessary terms that appear crucial for a proper sentence construction but might derail
the IR model in searching for relevant documents [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. This could also be magnified when a
conversation evolves into multiple turns [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ] and a new form of conversation is presented
to the user, such as when the system asks clarifying questions. This happens mainly due to the
context-dependent nature of multi-turn conversations and the new types of responses that could
emerge in a mixed-initiative conversation.
      </p>
      <p>
        While the effectiveness of conversational systems has been studied before [
        <xref ref-type="bibr" rid="ref15 ref16 ref6 ref7">6, 15, 7, 16</xref>
        ],
the main goal of this paper is to study if the identification of keywords retrieved from the
human-computer interaction will help achieve better retrieval results. To this aim, we collect
two keyword extraction datasets and study the effectiveness of multiple generative models
on them. Our first dataset is collected based on the performance of the retrieval model using
different keywords, while the other is collected from news articles online. Every news article
comes with a title and a set of keywords. Our intuition is that a neural model can learn to extract
useful keywords from news titles and use this external knowledge for more effective keyword
extraction in a conversation. We study the effect of various keyword extraction strategies on
non-neural and neural document retrieval pipelines. To the best of our knowledge, keyword
extraction in the context of mixed-initiative conversational IR has not been studied before.
      </p>
      <p>In our retrieval pipeline, after the conversational phase, where the system interacts with the
user to clarify query ambiguities, the conversational sequence is passed to the keyword
extractor, which identifies the most important terms in the sentences. In parallel, the
document retrieval model performs a first relevance ranking of the documents based on the
original conversation. Finally, the neural IR model performs re-ranking, taking as input the top
documents from Retrieval Phase 1 and the keywords obtained in the Keyword Extraction
Phase.</p>
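The pipeline above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the component functions (`first_phase_retrieve`, `extract_keywords`, `neural_rerank`), the stop-word list, and the three-document corpus are all hypothetical stand-ins for the real IR model, keyword extractor, and neural re-ranker.

```python
# Illustrative sketch of the two-phase retrieval pipeline described above.
# All component functions are hypothetical stand-ins, not the paper's code.

def first_phase_retrieve(conversation, k=100):
    """Phase 1: rank documents against the raw conversation (e.g., a classic IR model)."""
    corpus = ["swiss german phrases for summer", "beer preferences by politics",
              "conversational search systems"]
    scored = sorted(corpus,
                    key=lambda d: sum(w in d for w in conversation.lower().split()),
                    reverse=True)
    return scored[:k]

def extract_keywords(conversation):
    """Keyword Extraction Phase: keep content words (toy heuristic, not a neural model)."""
    stop = {"the", "a", "an", "is", "are", "do", "what", "about", "i", "want"}
    return [w for w in conversation.lower().split() if w not in stop]

def neural_rerank(keywords, candidates):
    """Phase 2: re-rank the top documents using only the extracted keywords."""
    return sorted(candidates,
                  key=lambda d: sum(k in d for k in keywords),
                  reverse=True)

conversation = "I want to know about beer preferences"
top_docs = first_phase_retrieve(conversation)
keywords = extract_keywords(conversation)
ranking = neural_rerank(keywords, top_docs)
```

The point of the sketch is the data flow: the re-ranker never sees the raw conversation, only the first-phase candidates and the extracted keywords.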
    </sec>
    <sec id="sec-2">
      <title>2. Data Collection</title>
      <p>As the topics addressed by this study have only recently surfaced in research, substantial
work was needed to establish whether keywords can support the IR model in document retrieval
tasks.</p>
      <p>
        As was discussed earlier, keyword extraction from short-sized documents using Deep Learning
is a relatively new topic. The previously created Inspec, SemEval-2010, SemEval-2017 datasets
are not suitable for this research, as they are focused on keyword and keyphrase extraction
from medium- and large-sized texts (e.g., abstracts or scientific articles) [
        <xref ref-type="bibr" rid="ref17 ref18 ref19">17, 18, 19</xref>
        ]. In contrast
to this, the main focus of this research is keyword extraction from short sentences of no more
than 20 words, which is the average English sentence length [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Therefore,
we collect and release two datasets: (i) a News-Keyword-based and (ii) an IR-Keyword-based
dataset.
      </p>
      <sec id="sec-2-1">
        <title>Data available at: https://github.com/aliannejadi/ConvKey</title>
        <sec id="sec-2-1-1">
          <title>2.1. News-Keywords Based dataset</title>
          <p>Online newspaper websites and other social network pages tend to follow a common content
structure, where articles consist of a title, the main text, and tags (sometimes hashtags).
Content creators try not only to select an appealing and interesting title but also to
summarize the content in one sentence, thus selecting the most important words to portray the
key message of an article. This can also be considered a reverse IR operation: the author, given
the document’s content, provides a title (considered the query in our case) that corresponds to
the article in the best possible way.</p>
          <p>Authors usually also choose some tags that either describe the article in the most general
way or place the story in the context of other related articles that one could find on the website.
From the user’s point of view, tags provide an opportunity to navigate to other related material;
however, having well-formulated tags is also crucial for Search Engine Optimization and could
impact the visibility of the website or the article [21].</p>
          <p>Taking into consideration that writers pay very close attention to the title and the tags used,
where it is not unusual for tag words to appear in the title, brings us to our first method of
keyword dataset construction: considering the title as the input text and the tags as the target
keywords. If a tag does not appear in the corresponding title, we do not add it to the keywords
list. For example (as shown in Table 1), the title “Five German words you’ll need to know this
summer” with tags [summer, holidays, members] yields the keyword list [summer].</p>
          <p>To create the dataset, we scraped the following news websites: BBC, The Local, and Salon.
In total, over 104,000 title-tag pairs were obtained using this method. After filtering out the
outliers and the items whose tags do not appear in the title, the dataset shrinks to 79,000
instances.</p>
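The filtering rule above is simple enough to sketch. This is a minimal, hypothetical helper (the name `tags_in_title` is ours), assuming a case-insensitive, whole-word match between tags and title words:

```python
# Sketch of the News-Keywords filtering rule: a tag is kept as a target
# keyword only if it appears in the article title (case-insensitive).
# The helper name and matching rule are illustrative assumptions.

def tags_in_title(title, tags):
    title_words = set(title.lower().split())
    return [t for t in tags if t.lower() in title_words]

title = "Five German words you'll need to know this summer"
tags = ["summer", "holidays", "members"]
keywords = tags_in_title(title, tags)  # only "summer" occurs in the title
```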
        </sec>
        <sec id="sec-2-1-2">
          <title>2.2. IR-Keyword-based dataset</title>
          <p>
            Classical IR systems perform only basic preprocessing of the query, such as the removal
of stopwords and punctuation. Having too many words could confuse the system and lead it
to retrieve unwanted results. Therefore, correct keyword identification could lead to better
retrieval performance, while selecting poor keywords will inevitably worsen the output
results. We developed the IR-Keyword-based dataset based on this assumption, building on the
previously created Qulac dataset [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]. To create a dataset in this context, we used Qulac’s first
conversational round, containing three components: query, question, and answer, which
together retrieve a set of relevant documents.
          </p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Taken on 30 June 2020 from</title>
        <p>https://www.thelocal.ch/20200630/five-german-words-youll-need-to-know-this-summer
3https://www.bbc.com/
4https://www.thelocal.ch/
5https://www.salon.com/</p>
        <p>The main idea is to identify the set of words from the query, question, and answer that leads to the greatest
relevance of the retrieved documents. To evaluate the system’s performance, we used the
Normalized Discounted Cumulative Gain at 20 (NDCG@20) metric. Due to the complexity of
permuting all potential keywords over the whole query-question-answer set, we decided to focus on one
component at a time. The procedure is presented in Algorithm 1. The main idea
is to choose x0, which could be the query, the question, or the answer. For example, let us consider x0
the query and x1, x2 to represent the question and answer. Afterward, we consider
all possible subsets of words of x0 (the query in our example), which form the set of potential
keywords. In mathematical terms, such an operation is known as the powerset. For instance, if
x0 is "How are you?", then the set of potential keywords would be: {"how", "are", "you", "how are",
"how you", "are you", "how are you"}. The cardinality of a powerset highly depends on the
number of words that the input sentence contains. To avoid an overly large powerset, we limit
the maximum size of a subset to four words.</p>
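The limited powerset can be generated with `itertools.combinations`, which already preserves the original word order within each subset. A minimal sketch (the function name and tokenization are ours):

```python
from itertools import combinations

# Sketch of the limited powerset: all order-preserving word subsets of a
# sentence, capped at four words to keep the candidate set tractable.

def limited_powerset(sentence, max_size=4):
    words = sentence.lower().replace("?", "").split()
    subsets = []
    for size in range(1, min(max_size, len(words)) + 1):
        # combinations() preserves the original word order within each subset
        for combo in combinations(words, size):
            subsets.append(" ".join(combo))
    return subsets

candidates = limited_powerset("How are you?")
# 2^3 - 1 = 7 non-empty subsets: "how", "are", "you", "how are",
# "how you", "are you", "how are you"
```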
        <p>Next, we consider one instance k from the potential keyword set and retrieve documents
by supplying k, x1, and x2 to the document retrieval model. We then evaluate the retrieved
documents’ relevance and save the obtained score. In the end, we
save x0 as the input text and k* as the set of keywords that led to the retrieval of the
most relevant documents. We repeat the same operation by considering x0 as the question,
with x1, x2 as the query and answer, and later x0 as the answer, with x1, x2 as the query and question.
We apply this process to all conversations from the Qulac dataset until we
obtain keywords for all queries, questions, and answers. Applying this approach, 15,320 data
samples were obtained. A benefit of this approach is that when the answer of a
user in an interaction is uncertain or ambiguous and does not provide any important
information, the system learns to ignore it. In this scenario, the system should ideally ask
another question or base the search only on the initial query. The proposed method
of dataset generation is therefore able to mimic this behavior.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed Methods</title>
      <p>This section describes our conversational IR framework. We start with the neural models that
we used for the keyword extraction task. Then we continue with the neural IR models and
describe how keyword extraction fits into our pipeline.</p>
      <sec id="sec-3-1">
        <title>3.1. Keyword Extraction Models</title>
        <p>For the Keyword Extraction Phase, we experimented with two different types of neural models:
a Sequence-to-Sequence architecture and a BERT model [22, 23]. The Sequence-to-Sequence architecture
uses a Gated Recurrent Unit (GRU) as the recurrent unit, an attention mechanism to
help the decoder, and pre-trained Word2Vec embeddings to improve performance on words outside
the training-set vocabulary [24, 25, 26]. We use Sequence-to-Sequence because it has been a
state-of-the-art architecture for many different NLP tasks and established new benchmarks for
neural machine translation [22].</p>
        <sec id="sec-3-1-1">
          <title>We also keep the original order in which the words appear in the text</title>
          <p>Algorithm 1: IR-Keyword-Based Dataset Creation Method.</p>
          <p>Method: find_keywords(query, question, answer)
for x0 in [query, question, answer] do
    x1, x2 = [query, question, answer].remove(x0);
    potential_keywords = PowerSet(x0, maxSubsetSize=4);
    scores_list = list();
    for k in potential_keywords do
        ranked_documents = IRmodel.retrieve(k, x1, x2);
        score = ranked_documents.evaluate(metrics="NDCG@20");
        scores_list.append(score);
    end
    max_score_index = argmax(scores_list);
    k* = potential_keywords[max_score_index];
    save("input text" = x0, "keywords" = k*)
end</p>
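A runnable sketch of Algorithm 1 follows. The scorer is a mock standing in for the real IR model plus NDCG@20 evaluation against Qulac judgments (the `mock_score` heuristic, the example turns, and all names are our assumptions, not the paper's code):

```python
from itertools import combinations

# Runnable sketch of Algorithm 1. mock_score is a stand-in for retrieving
# documents with (k, x1, x2) and evaluating them with NDCG@20.

def powerset(text, max_size=4):
    words = text.split()
    return [" ".join(c) for n in range(1, min(max_size, len(words)) + 1)
            for c in combinations(words, n)]

def mock_score(keywords, context):
    # Pretend content-bearing words retrieve well and extra words add noise.
    informative = {"dinosaurs", "extinction"}
    hits = sum(w in informative for w in keywords.split())
    return hits - 0.1 * len(keywords.split())

def find_keywords(query, question, answer):
    results = {}
    turns = [query, question, answer]
    for x0 in turns:
        x1, x2 = [t for t in turns if t is not x0]
        candidates = powerset(x0)
        scores = [mock_score(k, (x1, x2)) for k in candidates]
        best = candidates[scores.index(max(scores))]
        results[x0] = best  # save (input text, best-scoring keyword subset)
    return results

out = find_keywords("tell me about dinosaurs",
                    "do you mean their extinction",
                    "yes the extinction event")
```

With a real IR model in place of `mock_score`, the saved (input text, keywords) pairs form the training instances of the IR-Keyword-based dataset.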
          <p>Original Sentence: “Conservatives and liberals drink different beer”
Tokenized Sentence: ['conservatives', 'and', 'liberals', 'drink', 'different', 'beer']
Keywords: ['conservatives', 'liberals', 'beer']</p>
          <p>
            Named Entities: [1, 0, 1, 0, 0, 1]
In contrast to the Sequence-to-Sequence model, we also selected BERT, the most recently developed Transformer-based neural architecture in
the field of NLP. One of its biggest advantages is that it has been pre-trained on a large amount
of data using two main objectives: the Masked Language Model, which predicts
masked/hidden tokens in the input sentence, and Next Sentence Prediction, which has the
objective of predicting whether one sentence follows another. Therefore, by fine-tuning
the model, it is possible to achieve strong results in tasks such as Named Entity Recognition
(NER), sentence classification, question answering, and others [23].
          </p>
          <p>To train the selected architectures, we formulate keyword extraction as a
NER task, as shown in Table 2: a word is a keyword if its corresponding
entity is labeled "1", and not a keyword if it is labeled "0".</p>
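The conversion from a (tokens, keywords) pair to a 0/1 tag sequence is mechanical; a minimal sketch using the Table 2 example (the helper name `to_tag_sequence` is ours):

```python
# Sketch of casting keyword extraction as token tagging: each token gets
# label 1 if it is a keyword, else 0 (the Table 2 example).

def to_tag_sequence(tokens, keywords):
    keyword_set = set(keywords)
    return [1 if tok in keyword_set else 0 for tok in tokens]

tokens = ["conservatives", "and", "liberals", "drink", "different", "beer"]
labels = to_tag_sequence(tokens, ["conservatives", "liberals", "beer"])
# labels == [1, 0, 1, 0, 0, 1]
```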
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Neural IR models</title>
        <p>
          We extend the solution available from previous research [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] by adding Information Retrieval
Phase 2, represented by a neural IR model. We study the effectiveness of the following
two commonly used neural IR models:
1. Deep Relevance Matching Model (DRMM): this model puts more emphasis on the
        </p>
        <sec id="sec-3-2-1">
          <title>Retrieved on 15th of October 2019, from:</title>
          <p>https://www.salon.com/2013/02/27/conservatives_and_lilberals_drink_diferent_beer_partner/
relevance (both semantic and lexical) matching of the query rather than exclusively on
semantic matching. It considers three crucial factors: the “handling of the exact matching
signals, query term importance, and diverse matching requirements” [27].
2. Deep Semantic Similarity Model (DSSM): based on the Siamese network architecture,
DSSM mainly focuses on comparing cosine similarities of the vector representations
of a query and a document, where the vector representations are learned using deep
fully-connected layers [28]. Originally, such a model was only used for short-text matching
tasks (for example, matching questions with the most relevant answers); however,
DSSM later proved to be useful for tasks involving documents containing long texts, making it
a good choice for IR-related tasks [29].</p>
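DSSM's core scoring step is just cosine similarity between the learned query and document vectors. A minimal sketch, with toy hand-set vectors standing in for the learned representations:

```python
import math

# The core DSSM scoring step: cosine similarity between the learned query
# and document vectors. Vectors here are toy values, not learned embeddings.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

query_vec = [1.0, 0.0, 1.0]
doc_vecs = {"doc_a": [1.0, 0.0, 1.0], "doc_b": [0.0, 1.0, 0.0]}
ranking = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]),
                 reverse=True)
```

In the real model, the fully-connected towers map raw query and document text to these vectors; only the final similarity-and-sort step is shown here.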
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Results</title>
      <sec id="sec-4-1">
        <title>4.1. Experimental Setup</title>
        <p>Data. We use the publicly available Qulac dataset, which is built based on the TREC Web
Track 2009-2012, for our experiments. For keyword extraction experiments, we use the two
datasets described in Section 2.</p>
        <p>Metrics. We evaluate keyword extraction performance in two ways, namely, the accuracy of
extraction and end-to-end document retrieval. For extraction accuracy, we use the following
evaluation metrics: Precision, Recall, Average Tag Correct Identification (ATCI), and Correct
per Response Fill (CpRF). Also, we perform a human evaluation on the extracted keywords,
where we ask the human annotators to score each extracted keyword from 1 to 5. Our IR
evaluation follows the standard IR metrics, namely, Normalized Discounted Cumulative Gain at
n (nDCG@n), Precision at n (P@n), Mean Reciprocal Rank (MRR), and Mean Average Precision
(MAP).</p>
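As a reference for the main retrieval metric, a short sketch of nDCG@k: the DCG of the ranked relevance labels, normalised by the DCG of the ideal (sorted) ranking. We use the linear-gain DCG formulation here; other gain functions exist, and this is our illustration rather than the paper's evaluation code:

```python
import math

# Sketch of nDCG@k: DCG of the ranked relevance labels, normalised by the
# DCG of the ideal (descending-sorted) ranking. Linear gain is assumed.

def dcg(rels, k):
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))

def ndcg_at_k(ranked_rels, k=20):
    ideal_dcg = dcg(sorted(ranked_rels, reverse=True), k)
    return dcg(ranked_rels, k) / ideal_dcg if ideal_dcg > 0 else 0.0

perfect = ndcg_at_k([3, 2, 1, 0])   # ideal order
worse = ndcg_at_k([0, 1, 2, 3])     # reversed order, below the ideal
```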
        <p>Statistical Significance. We perform a two-tailed paired t-test (p &lt; 0.05) to determine
significant improvements on the IR metrics.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Keyword Extractors</title>
        <p>To evaluate the performance of Keyword Extracting Neural Networks, two methods have been
used. The first one relies on test dataset accuracy, while the second one is a human evaluation
method.</p>
        <p>Test dataset accuracy. Table 3 shows the performance of the keyword extractors. We also
created a simple non-neural approach to serve as a baseline. This method operates in a
very elementary way: word frequencies were calculated from the training dataset. Using
a brute-force approach, the optimal frequency threshold was found, which maximizes the
        <sec id="sec-4-2-1">
          <title>http://www.github.com/aliannejadi/qulac</title>
          <p>9tests the quality of the overall assigned tags, by checking if the model has correctly assigned keyword or not
keyword tag</p>
          <p>CpRF captures the ratio of fully correct and partially correct predictions to the total number of sentences in the
dataset (adopted from MUC-5).
correct identification of tagged keywords (if the frequency of a certain word is below the
threshold, the word is tagged as a keyword). If a word has not been seen before, it is
automatically tagged as a keyword, as it is considered rare enough given that it has not appeared
in the training corpus. In Table 3, we see that BERT appears to be the strongest solution to the
keyword extraction problem, as the model achieves a much better test-set performance than
the other keyword extractors.</p>
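The non-neural baseline can be sketched in a few lines. The function names, toy training sentences, and the particular threshold value are our assumptions; only the rule itself (rare or unseen words become keywords) comes from the text above:

```python
from collections import Counter

# Sketch of the non-neural baseline: a word is tagged as a keyword when its
# training-corpus frequency falls at or below a threshold (rare = informative);
# unseen words (frequency 0) are treated as keywords automatically.

def train_frequencies(sentences):
    return Counter(w for s in sentences for w in s.lower().split())

def frequency_baseline(sentence, freqs, threshold=2):
    # Counter returns 0 for unseen words, so they always pass the test.
    return [w for w in sentence.lower().split() if threshold >= freqs[w]]

freqs = train_frequencies([
    "the cat sat on the mat",
    "the dog sat on the rug",
    "quantum computing is the future",
])
keywords = frequency_baseline("the cat likes quantum physics", freqs)
# "the" is frequent and dropped; "likes"/"physics" are unseen, so kept
```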
          <p>Human Evaluation. While the test set evaluates the models’ performance in an environment
similar to training, it is also essential to test whether the extracted keywords would suit
human judgments. To address this question, Google’s Query Wellformedness dataset has been
used [30]. Judges were asked to select the smallest possible number of keywords given a sentence
and to rate the relevance of the keywords chosen by the keyword extractor on a scale from 1 to
5, where 5 is the best score. For the latter part, we asked judges to imagine themselves in a
situation where they have to answer the question based only on the keywords provided. In
the scenario that the neural network’s selected keywords were doubtful, we asked the judges
to plug the keywords into Google’s search engine to see whether sufficiently good results were
obtained.</p>
          <p>As can be observed from Table 3, Sequence-to-Sequence (Seq2seq) models appear to retrieve
better keywords than BERT, as the scores given to the model by the judges are higher.
Additionally, it is essential to focus on the words that the judges selected, for the Seq2seq and BERT models
alike, to describe the reasoning of the classifiers and compare it to the experts’ judgment. We
explore the locations in which keywords tend to appear in the sentences. As
clearly seen from Figure 1, keywords tend to be located closer to the end of the sentence,
and the neural networks have correctly learned this trend. However, it can be noted that, in general,
the models tend to underestimate the number of keywords in a sentence.</p>
          <p>Both evaluation approaches (test set and human evaluation) give interesting insights: we
can clearly see that BERT learned keywords that lead to the best document retrieval, while,
according to the human evaluation, Sequence-to-Sequence was able to retrieve
more relevant keywords. Therefore, we also study the impact of both models on end-to-end IR
performance to see how they eventually affect it.</p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Neural IR Models</title>
        <p>The performance of the neural IR models is presented in Table 4. As can be observed, in the
case of the DRMM neural IR model, the models provided with keywords achieved
similar performance and outperformed the non-neural IR model. Interestingly, the
DRMM supplied with the original conversational sequences showed the best performance
among the DRMM variants.</p>
        <p>Looking at DSSM, we can observe that providing keywords using BERT or the
Sequence-to-Sequence architecture yields much better results than using the original conversational
sequences or the non-neural model (the last two achieved relatively similar performance).
The DSSM models that used keywords achieved the overall best retrieval performance.
Keyword Extractor Influence on the IR model. Another interesting insight is provided by
considering how well the neural IR model performs with respect to the effectiveness of the keyword
extractor. In this case, we are interested in the precision of the keywords provided by the
Sequence-to-Sequence keyword extractor and how the produced keywords impact the performance of
the DSSM model.</p>
        <p>As we see in Table 5, the premise is that the better the quality of the produced keywords, the
better the IR model will perform. It is also interesting to see how much the neural IR model
benefits from high-quality keywords. First, we order the test dataset by the
relevance scores assigned by the neural IR models. The next step is to split the dataset into
three sub-parts, based on the precision of the keywords obtained from the keyword extractor’s
query-question-answer sequences.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>This research studied whether the application of keywords in the context of conversational IR
is advantageous. For this purpose, we created two keyword extraction datasets and studied
two types of keyword extractor, one based on a seq2seq architecture and the other based on
BERT. We evaluated keyword extraction in terms of extraction accuracy, as well
as end-to-end document retrieval performance. To do so, we tested the performance of two
state-of-the-art neural IR models, namely, DRMM and DSSM. Our experimental results showed
that neural IR models supplied with keywords extracted from conversational interactions with
users improve the relevance of retrieved documents. In addition, we showed that the higher
the keyword extractor’s precision, the better the performance of the DSSM IR model.</p>
      <p>For future work, it would be interesting to train the neural networks on a newly created
dataset manually labeled by humans. It is likely that the keyword dataset creation approach
proposed in this paper misses some important keywords that humans would identify easily.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgement</title>
      <p>The work constitutes part of the master thesis of Oleg Borisov at the Università della Svizzera
italiana (USI) on conversational search.</p>
      <p>[21] N. Yalçın, U. Köse, What is search engine optimization: SEO?, Procedia - Social and Behavioral Sciences 9 (2010) 487-493.
[22] I. Sutskever, O. Vinyals, Q. V. Le, Sequence to sequence learning with neural networks, in: Advances in Neural Information Processing Systems, 2014, pp. 3104-3112.
[23] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[24] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate, arXiv preprint arXiv:1409.0473 (2014).
[25] K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio, Learning phrase representations using RNN encoder-decoder for statistical machine translation, arXiv preprint arXiv:1406.1078 (2014).
[26] T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector space, arXiv preprint arXiv:1301.3781 (2013).
[27] J. Guo, Y. Fan, Q. Ai, W. B. Croft, A deep relevance matching model for ad-hoc retrieval, in: Proceedings of the 25th ACM International Conference on Information and Knowledge Management, 2016, pp. 55-64.
[28] P.-S. Huang, X. He, J. Gao, L. Deng, A. Acero, L. Heck, Learning deep structured semantic models for web search using clickthrough data, in: Proceedings of the 22nd ACM International Conference on Information &amp; Knowledge Management, 2013, pp. 2333-2338.
[29] B. Mitra, F. Diaz, N. Craswell, Learning to match using local and distributed representations of text for web search, in: Proceedings of the 26th International Conference on World Wide Web, 2017, pp. 1291-1299.
[30] M. Faruqui, D. Das, Identifying well-formed natural language questions, arXiv preprint arXiv:1808.09419 (2018).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Radlinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Craswell</surname>
          </string-name>
          ,
          <article-title>A theoretical framework for conversational search</article-title>
          , in: CHIIR, ACM,
          <year>2017</year>
          , pp.
          <fpage>117</fpage>
          -
          <lpage>126</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Dalton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Callan</surname>
          </string-name>
          , Cast-19:
          <article-title>A dataset for conversational information seeking</article-title>
          , in: SIGIR, ACM,
          <year>2020</year>
          , pp.
          <fpage>1985</fpage>
          -
          <lpage>1988</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Aliannejadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kiseleva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chuklin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dalton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Burtsev</surname>
          </string-name>
          ,
          <article-title>ConvAI3: Generating clarifying questions for open-domain dialogue systems (ClariQ)</article-title>
          , CoRR abs/2009.11352 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Aliannejadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Azzopardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zamani</surname>
          </string-name>
          , E. Kanoulas, P. Thomas,
          <string-name>
            <given-names>N.</given-names>
            <surname>Craswell</surname>
          </string-name>
          ,
          <article-title>Analysing mixed initiatives and search strategies during conversational search</article-title>
          , in: CIKM, ACM,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E.</given-names>
            <surname>Horvitz</surname>
          </string-name>
          ,
          <article-title>Principles of mixed-initiative user interfaces</article-title>
          , in: M. G. Williams,
          <string-name>
            <given-names>M. W.</given-names>
            <surname>Altom</surname>
          </string-name>
          (Eds.), CHI, ACM,
          <year>1999</year>
          , pp.
          <fpage>159</fpage>
          -
          <lpage>166</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Aliannejadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zamani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          ,
          <article-title>Asking clarifying questions in open-domain information-seeking conversations</article-title>
          , in: SIGIR, ACM,
          <year>2019</year>
          , pp.
          <fpage>475</fpage>
          -
          <lpage>484</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Hashemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zamani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          ,
          <article-title>Guided transformer: Leveraging multiple external sources for representation learning in conversational search</article-title>
          , in: SIGIR, ACM,
          <year>2020</year>
          , pp.
          <fpage>1131</fpage>
          -
          <lpage>1140</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zamani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Dumais</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Craswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. N.</given-names>
            <surname>Bennett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lueck</surname>
          </string-name>
          ,
          <article-title>Generating clarifying questions for information retrieval</article-title>
          ,
          <source>in: WWW, ACM / IW3C2</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>418</fpage>
          -
          <lpage>428</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zamani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lueck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Diaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. N.</given-names>
            <surname>Bennett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Craswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Dumais</surname>
          </string-name>
          ,
          <article-title>Analyzing and learning from user interactions for search clarification</article-title>
          , in: SIGIR, ACM,
          <year>2020</year>
          , pp.
          <fpage>1181</fpage>
          -
          <lpage>1190</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>I.</given-names>
            <surname>Sekulic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Aliannejadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <article-title>User engagement prediction for clarification in search</article-title>
          ,
          <source>in: ECIR, Lecture Notes in Computer Science</source>
          , Springer,
          <year>2021</year>
          , pp.
          <fpage>619</fpage>
          -
          <lpage>633</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lotze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Klut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Aliannejadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kanoulas</surname>
          </string-name>
          ,
          <article-title>Ranking clarifying questions based on predicted user engagement</article-title>
          ,
          <source>CoRR abs/2103.06192</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M. P.</given-names>
            <surname>Kato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yamamoto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ohshima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tanaka</surname>
          </string-name>
          ,
          <article-title>Cognitive search intents hidden behind queries: a user study on query formulations</article-title>
          ,
          <source>in: Proceedings of the 23rd International Conference on World Wide Web</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>313</fpage>
          -
          <lpage>314</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Aliannejadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Ríssola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <article-title>Harnessing evolution of multi-turn conversations for effective answer retrieval</article-title>
          , in: CHIIR, ACM,
          <year>2020</year>
          , pp.
          <fpage>33</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>N.</given-names>
            <surname>Voskarides</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kanoulas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>de Rijke</surname>
          </string-name>
          ,
          <article-title>Query resolution for conversational search with limited supervision</article-title>
          , in: SIGIR, ACM,
          <year>2020</year>
          , pp.
          <fpage>921</fpage>
          -
          <lpage>930</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Ai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          ,
          <article-title>Towards conversational search and recommendation: System ask, user respond</article-title>
          ,
          <source>in: Proceedings of the 27th ACM International Conference on Information and Knowledge Management</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>177</fpage>
          -
          <lpage>186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Krasakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Aliannejadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Voskarides</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kanoulas</surname>
          </string-name>
          ,
          <article-title>Analysing the effect of clarifying questions on document ranking in conversational search</article-title>
          , in: ICTIR, ACM,
          <year>2020</year>
          , pp.
          <fpage>129</fpage>
          -
          <lpage>132</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>I.</given-names>
            <surname>Augenstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Vikraman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>McCallum</surname>
          </string-name>
          ,
          <article-title>SemEval 2017 task 10: ScienceIE - extracting keyphrases and relations from scientific publications</article-title>
          ,
          <source>arXiv preprint arXiv:1704.02853</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hulth</surname>
          </string-name>
          ,
          <article-title>Improved automatic keyword extraction given more linguistic knowledge</article-title>
          ,
          <source>in: Proceedings of the 2003 conference on Empirical methods in natural language processing</source>
          ,
          <year>2003</year>
          , pp.
          <fpage>216</fpage>
          -
          <lpage>223</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>S. N.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Medelyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-Y.</given-names>
            <surname>Kan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Baldwin</surname>
          </string-name>
          ,
          <article-title>Automatic keyphrase extraction from scientific articles</article-title>
          ,
          <source>Language resources and evaluation 47</source>
          (
          <year>2013</year>
          )
          <fpage>723</fpage>
          -
          <lpage>742</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>M.</given-names>
            <surname>Cutts</surname>
          </string-name>
          ,
          <source>Oxford Guide to Plain English</source>
          , Oxford University Press, USA,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>