<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>On Building Benchmark Datasets for Understudied Information Retrieval Tasks: the Case of Semantic Query Labeling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Discussion Paper</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elias Bassani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gabriella Pasi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Consorzio per il Trasferimento Tecnologico - C2T</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Milano-Bicocca</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this manuscript, we review the work we undertook to build a large-scale benchmark dataset for an understudied Information Retrieval task called Semantic Query Labeling. This task is particularly relevant for search tasks that involve structured documents, such as Vertical Search, and consists of automatically recognizing the parts that compose a query and unfolding the relations between the query terms and the documents' fields. We first motivate the importance of building novel evaluation datasets for less popular Information Retrieval tasks. Then, we give an in-depth description of the procedure we followed to build our dataset.</p>
      </abstract>
      <kwd-group>
        <kwd>Vertical search</kwd>
        <kwd>Structured document search</kwd>
        <kwd>Semantic query labeling</kwd>
        <kwd>Dataset</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Community Question Answering platforms (e.g., Quora<sup>1</sup>, Yahoo! Answers<sup>2</sup>, Stack Overflow<sup>3</sup>, etc.).
Community Question Answering research datasets include the Quora Dataset<sup>4</sup>, the Yahoo! Answers
Dataset [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], SemEval-2017 Task 3 [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], CQADupStack [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], ComQA [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], and LinkSO [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
Moreover, some large private companies have actively contributed by providing expensive large-scale
benchmark datasets to the research community, such as Microsoft<sup>5</sup> with its MS MARCO [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]
dataset. Unfortunately, other tasks appear to be a research matter only for those companies that
can afford to produce the datasets needed for model training and evaluation, and
the majority of these datasets are never made available to the research community. As is well
known, this situation also poses reproducibility issues that can hardly be overcome.
      </p>
      <p>One of the IR sub-fields that has received limited attention from academics for the study of
the application of Deep Learning techniques is Vertical Search. However, nowadays, many
different kinds of vertical online platforms, such as e-commerce websites (e.g., Amazon<sup>6</sup>),
media streaming services (e.g., Netflix<sup>7</sup>, Spotify<sup>8</sup>), job-seeking platforms (e.g., LinkedIn<sup>9</sup>), digital
libraries (e.g., DBLP<sup>10</sup>), and several others, provide access to domain-specific information
through a search engine to millions of users every day. What makes Vertical Search interesting
from a research perspective and, potentially, for the application of sophisticated Machine
Learning-based approaches is that vertical platforms usually organize their information in
structured documents, which need to be treated appropriately during search to leverage the
additional information encoded in their structure. However, search functionalities on vertical
platforms are usually delivered as standard keyword-based search, or through uncomfortable
faceted search interfaces, which require additional effort from the user. Unlike in Web Search,
user queries in vertical systems often contain references to specific structured information
contained in the documents. Nevertheless, Vertical Search is often managed as a traditional
retrieval task, treating documents as unstructured texts and taking no advantage of the latent
structure carried by the queries. Exploiting this latent information could unfold the relations
between the query terms and the documents’ structure, thus enabling the search engine to
leverage the latter during retrieval.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Semantic Query Labeling</title>
      <p>
        Semantic Query Labeling [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] is the task of 1) locating the constituent parts of a query
(segmentation) and 2) assigning predefined and domain-specific semantic labels to each of them
(classification). Conducting this task in a pre-matching phase could allow a search engine to
leverage the structure and the semantics of the query terms, making it able to effectively take
advantage of the structure of the documents during retrieval, thus enhancing the matching
process. For example, in the movie domain, the query “alien ridley scott 1979” carries references
to structured information usually contained in the documents of a movie corpus: the title of a
movie, Alien, the name of a movie director, Ridley Scott, and a date, 1979. In this case, the query
could be segmented accordingly into alien, ridley scott, and 1979 and the query segments could
be tagged with the labels Title, Director, and Year, respectively.
      </p>
      <p><sup>1</sup> https://www.quora.com
<sup>2</sup> https://yahoo.com
<sup>3</sup> https://stackoverflow.com
<sup>4</sup> https://quoradata.quora.com/First-Quora-Dataset-Release-Question-Pairs
<sup>5</sup> https://www.microsoft.com
<sup>6</sup> https://www.amazon.com
<sup>7</sup> https://www.netflix.com
<sup>8</sup> https://www.spotify.com
<sup>9</sup> https://www.linkedin.com
<sup>10</sup> https://www.dblp.org</p>
      <p>
        Semantic Query Labeling is a challenging task that can add context and structure to
keyword-based queries, which are usually composed of a few terms that may be ambiguous. The main challenges
of this task are related to the vocabulary overlap among different semantic classes, which
could require the use of contextual information and disambiguation techniques, and vocabulary
mismatch [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] between the vocabulary employed by the users to express their information need
and the vocabulary used to describe the corresponding answers in the document collection.
Unfortunately, the production of an appropriate dataset to evaluate the effectiveness of automatic
query tagging approaches is costly, and publicly available datasets for
this task are currently lacking.
      </p>
      <p>
        Although Semantic Query Labeling could play an important role in Vertical Search, very little
work has been done in this regard. The majority of past efforts in this context come from private
companies, such as Microsoft ([
        <xref ref-type="bibr" rid="ref21 ref23">21, 23, 24, 25, 26</xref>
        ]) and Yahoo! ([27]). Due to privacy issues,
companies cannot release the datasets used in their studies. As is well known, this makes it hard
to reproduce their approaches and comparatively evaluate them. Moreover, the lack of public
datasets makes it difficult for academic researchers to propose novel Semantic Query Labeling
models and evaluate their effectiveness.
      </p>
      <p>As we strongly believe in the utility of advancing Vertical Search, we have recently
taken a step towards the definition of a benchmark dataset for this task<sup>11</sup>.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Building a Benchmark Dataset for Semantic Query Labeling</title>
      <p>In this section, we describe the dataset we have defined and shared [28], as well as the process
we followed for manually annotating each query term. Our dataset is composed of thousands
of manually-labeled real-world queries in the movie domain for training and evaluating novel
methods for Semantic Query Labeling.</p>
      <p>The choice of working in the movie domain is motivated by the fact that movie streaming
platforms are popular nowadays, but they still provide a sub-optimal search experience to their
users. Moreover, structured search is fundamental in this context: as we assessed during the
work described here, users tend to compose their queries referring to specific movie-related
information, such as the name of an actor or a director, a movie genre, a topic, and others, which
are usually available as metadata. By conducting a qualitative evaluation of the top 10 results
returned by the search engine of one of the most popular movie streaming services, we assessed
that it is not able to correctly retrieve movies even for simple queries. For example, “horror 2015”
retrieved only one horror movie from 2015; many of the other results were neither horror movies nor
movies from 2015. “2015 horror” did not retrieve any result at all. Neither “leone eastwood” nor
“sergio leone clint eastwood” retrieved any result, despite the presence on the platform of all the
movies directed by Sergio Leone and starring Clint Eastwood at the time of the experiment.</p>
      <p><sup>11</sup> https://github.com/AmenRa/semantic-query-tagging-dataset</p>
      <sec id="sec-3-1">
        <title>3.1. Query Gathering</title>
        <p>The first step in building a dataset suitable for studying Semantic Query Labeling is query
gathering. To collect the queries that are part of our dataset, we relied on a publicly available
large-scale query log of the AOL Web search engine<sup>12</sup>, which was shared by Pass et al. [29].
This query set comprises queries issued by real users between March 1, 2006, and May 31, 2006.
First of all, we defined a list of seed-terms for identifying movie-related queries: movie, movies,
film, and films. Leveraging these terms, we extracted 39 635 unique queries. Then, we manually
filtered out all the queries that did not fall into our category of interest: keyword-based queries
that resemble those used by users for searching movies on movie streaming platforms. As the
large majority of the initially extracted queries were related to theaters’ movie listings (note
that AOL offers a general-purpose Web search engine), we ended up collecting 9 752 candidate
queries. After removing the seed-terms used for gathering the queries, manually correcting
misspellings, normalizing strings, removing the stop-words, and applying lemmatization, our
dataset counts 6 749 unique queries.</p>
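        <p>The automatic part of this step can be sketched as follows. This is a minimal illustration and not the code used to build the dataset: the function names, the tokenization rule, and the tiny stop-word list are assumptions made for the example, and the manual filtering, spelling correction, and lemmatization described above are not reproduced.</p>
        <preformat><![CDATA[
```python
import re

# Seed terms listed in the text above.
SEED_TERMS = {"movie", "movies", "film", "films"}
# Tiny illustrative stop-word list (an assumption; the text does not name one).
STOP_WORDS = {"the", "a", "an", "of", "in", "with"}


def tokenize(query):
    """Lowercase a raw query and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", query.lower())


def is_movie_related(query):
    """True if the query contains at least one seed term as a whole token."""
    return any(token in SEED_TERMS for token in tokenize(query))


def clean(query):
    """Drop seed terms and stop-words, keeping the remaining token order."""
    kept = [
        token
        for token in tokenize(query)
        if token not in SEED_TERMS and token not in STOP_WORDS
    ]
    return " ".join(kept)


def gather(raw_queries):
    """Return unique, cleaned, non-empty movie-related queries in first-seen order."""
    seen = {}
    for query in raw_queries:
        if is_movie_related(query):
            cleaned = clean(query)
            if cleaned:
                seen.setdefault(cleaned, None)
    return list(seen)


# Hypothetical log entries, used only to exercise the functions.
log = [
    "Alien movie 1979",
    "cheap flights",
    "the alien MOVIES 1979",
    "films",
    "horror film with zombies",
]
candidates = gather(log)
```
]]></preformat>
        <p>Matching seed terms against whole tokens rather than substrings avoids selecting queries such as "filmstrip", and deduplicating after cleaning mirrors the order of operations described above, where the unique-query count is taken on the normalized strings.</p>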
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Semantic Labels Assessment</title>
        <p>The second step in the building process of our dataset was to define 1) the semantic label set to
use for the creation of the ground truth and 2) the procedure to follow to assign the semantic
labels to the query terms, ensuring the quality of the proposed dataset.</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. Semantic Labels</title>
          <p>After an initial analysis of the harvested queries, we defined the following semantic classes
to assign to each query term: Title, Country, Year, Genre, Director, Actor, Production company,
Tag (mainly topics and plot features), and Sort (e.g., new, best, popular, etc.). Following previous
work in Natural Language Processing and Sequence Labeling [30], we used the IOB2 labeling
format [31, 32] for manually assigning both semantic labels and segmentation delimiters. For
example, the query “alien by ridley scott 1979” is labeled as follows: “alien B-TITLE by O ridley
B-DIRECTOR scott I-DIRECTOR 1979 B-YEAR”, where the prefix B- indicates the beginning
of a segment, the prefix I- indicates that the term is inside a segment, and the tag O is used to
label terms with no semantic values, such as the preposition by in our example.</p>
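          <p>To make the format concrete, the following sketch parses a labeled query written as in the example above and groups its terms into labeled segments. The function names are hypothetical, and the lenient handling of an I- tag that does not continue an open segment is an assumption of this sketch, not a rule stated by the dataset.</p>
          <preformat><![CDATA[
```python
def parse_labeled_query(line):
    """Split 'term TAG term TAG ...' into parallel lists of terms and tags."""
    parts = line.split()
    return parts[0::2], parts[1::2]


def iob2_to_segments(tokens, tags):
    """Group IOB2-tagged tokens into (text, label) segments, skipping O tokens."""
    segments = []
    current_tokens, current_label = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        # An I- tag with the same label extends the open segment.
        if prefix == "I" and label == current_label:
            current_tokens.append(token)
            continue
        # Any other tag closes the open segment, if there is one.
        if current_tokens:
            segments.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None
        # B- opens a new segment; a stray I- is treated the same way (assumption).
        if prefix in ("B", "I"):
            current_tokens, current_label = [token], label
    if current_tokens:
        segments.append((" ".join(current_tokens), current_label))
    return segments


tokens, tags = parse_labeled_query(
    "alien B-TITLE by O ridley B-DIRECTOR scott I-DIRECTOR 1979 B-YEAR"
)
segments = iob2_to_segments(tokens, tags)
```
]]></preformat>
          <p>For the example query, the sketch yields the segments alien (TITLE), ridley scott (DIRECTOR), and 1979 (YEAR), while the preposition by, tagged O, is dropped.</p>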
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Creation of the Ground Truth</title>
          <p>One of the main reasons for choosing to work in the movie domain is the public availability of
movie-related information. We relied on this information to ensure the quality of the ground
truth labels we manually assigned to the query terms. In this regard, we consulted many
websites that contain movie-related information while labeling the queries, such as Wikipedia<sup>13</sup>,
IMDb<sup>14</sup>, and many others. Furthermore, particular attention was paid to discerning actors from
directors, as sometimes a single person is both an actor and a director, such as Ron Howard.
In these cases, we followed a simple rule: if the query contains elements pointing towards a
specific interpretation of the query, we labeled the query accordingly (e.g., in the query “1999
ron howard”, Ron Howard has been labeled as a Director, as in 1999 he directed the movie EDtv
and did not star in any movie); otherwise, we assigned the most likely label based on the number
of movies the person has directed or starred in. Therefore, we can state that, where meaningful,
we applied contextual labeling.</p>
          <p><sup>12</sup> https://www.aol.com
<sup>13</sup> https://www.wikipedia.org
<sup>14</sup> https://www.imdb.com</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Building a Fine-grained Evaluation Setting</title>
        <p>To promote a realistic evaluation setting, we split the dataset into train, dev, and test sets
temporally, using the queries issued in the first two months as the train set, and those from the two
subsequent two-week periods as the dev set and the test set. Temporal splitting also reduces query
term overlaps between the splits: we noticed that queries issued by users in the same search
session often share several terms. We also observed that not taking care of this aspect could
yield unrealistic results when training with real-world data.</p>
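        <p>A temporal split of this kind can be sketched as below. The boundary dates are assumptions chosen for illustration: the text only states that the first two months of the log (which covers March 1 to May 31, 2006) form the train set and that the two following two-week periods form the dev and test sets. The function names and the choice of one date per query are likewise hypothetical.</p>
        <preformat><![CDATA[
```python
from datetime import date

# Assumed boundaries: train covers March and April 2006, dev starts on May 1,
# and test starts two weeks later. The exact cut-off dates are not given in the text.
DEV_START = date(2006, 5, 1)
TEST_START = date(2006, 5, 15)


def assign_split(issued_on):
    """Map the date a query was issued to 'train', 'dev', or 'test'."""
    if issued_on >= TEST_START:
        return "test"
    if issued_on >= DEV_START:
        return "dev"
    return "train"


def temporal_split(dated_queries):
    """Partition (query, date) pairs into the three splits by issue date."""
    splits = {"train": [], "dev": [], "test": []}
    for query, issued_on in dated_queries:
        splits[assign_split(issued_on)].append(query)
    return splits


# Hypothetical dated queries, used only to exercise the functions.
sample = [
    ("alien 1979", date(2006, 3, 12)),
    ("horror 2015", date(2006, 5, 3)),
    ("ron howard", date(2006, 5, 20)),
]
splits = temporal_split(sample)
```
]]></preformat>
        <p>Because the assignment depends only on the issue date, queries from the same search session, which tend to share terms, generally end up in the same split rather than leaking across train and test.</p>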
        <p>To build a fine-grained evaluation setting, we created three different scenarios of increasing
difficulty by subsetting our benchmark dataset. The first scenario we built, Basic, comprises only
queries containing the following semantic components: Actor, Country, Genre, Title, Year, and O.
We then added the semantic components Director and Sort to create the Advanced scenario.
Finally, we added Production Company and Tag to create the Hard scenario. The rationale behind
these choices is as follows: the Basic scenario is composed of semantic components whose
vocabularies are disjoint; the Advanced scenario introduces vocabulary overlaps (actors/directors)
and a semantic class with few manually defined values; the Hard scenario introduces a semantic
class often subject to omissions, e.g., Walt Disney Pictures → disney, and a class, Tag, affected by
vocabulary overlaps with the others and vocabulary mismatch between queries and documents.
Table 1 reports some statistics regarding the proposed scenarios.</p>
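        <p>The subsetting can be expressed as a membership test on the labels of each query, as in the sketch below. The label strings other than TITLE, DIRECTOR, and YEAR (which appear in the IOB2 example of Section 3.2.1) are assumed spellings, and the function names are hypothetical.</p>
        <preformat><![CDATA[
```python
# Label sets per scenario, following the description above. The O tag is
# allowed everywhere, so it is ignored rather than listed.
BASIC = {"ACTOR", "COUNTRY", "GENRE", "TITLE", "YEAR"}
ADVANCED = BASIC | {"DIRECTOR", "SORT"}
HARD = ADVANCED | {"PRODUCTION_COMPANY", "TAG"}
SCENARIOS = {"basic": BASIC, "advanced": ADVANCED, "hard": HARD}


def labels_of(tags):
    """Semantic classes used in an IOB2 tag sequence, ignoring the O tag."""
    return {tag.split("-", 1)[1] for tag in tags if tag != "O"}


def scenarios_for(tags):
    """Names of the scenarios whose label set covers every class in the query."""
    used = labels_of(tags)
    return [name for name, allowed in SCENARIOS.items() if used.issubset(allowed)]


alien_tags = ["B-TITLE", "O", "B-DIRECTOR", "I-DIRECTOR", "B-YEAR"]
alien_scenarios = scenarios_for(alien_tags)
```
]]></preformat>
        <p>Under these assumptions, the query “alien by ridley scott 1979” belongs to the Advanced and Hard scenarios but not to Basic, since it contains a Director segment. Each scenario is a superset of the previous one, which is what makes the difficulty increase monotonically.</p>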
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>In this manuscript, we described the building process of a novel benchmark dataset we have
recently proposed. We hope our effort can stimulate research on the understudied task of
Semantic Query Labeling and encourage other researchers to build datasets for other less
popular Information Retrieval tasks that could greatly benefit from the recent advancements
in Deep Learning.</p>
      <p>[24] X. Li, Understanding the semantic structure of noun phrase queries, in: ACL 2010,
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics,
July 11-16, 2010, Uppsala, Sweden, The Association for Computer Linguistics, 2010, pp.
1337–1345.</p>
      <p>[25] N. Sarkas, S. Paparizos, P. Tsaparas, Structured annotations of web queries, in: Proceedings
of the 2010 ACM SIGMOD International Conference on Management of data, 2010, pp.
771–782.</p>
      <p>[26] J. Liu, X. Li, A. Acero, Y. Wang, Lexicon modeling for query understanding, in: Proceedings
of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP
2011, May 22-27, 2011, Prague Congress Center, Prague, Czech Republic, IEEE, 2011, pp.
5604–5607.</p>
      <p>[27] Z. Kozareva, Q. Li, K. Zhai, W. Guo, Recognizing salient entities in shopping queries, in:
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics,
ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 2: Short Papers, The Association
for Computer Linguistics, 2016.</p>
      <p>[28] E. Bassani, G. Pasi, Semantic query labeling through synthetic query generation, in:
SIGIR ’21: The 44th International ACM SIGIR Conference on Research and Development in
Information Retrieval, Virtual Event, Canada, July 11-15, 2021, ACM, 2021, pp. 2278–2282.</p>
      <p>[29] G. Pass, A. Chowdhury, C. Torgeson, A picture of search, in: Proceedings of the 1st
International Conference on Scalable Information Systems, Infoscale 2006, Hong Kong,
May 30-June 1, 2006, volume 152 of ACM International Conference Proceeding Series, ACM,
2006, p. 1.</p>
      <p>[30] E. F. Tjong Kim Sang, F. De Meulder, Introduction to the CoNLL-2003 shared task:
Language-independent named entity recognition, in: Proceedings of the Seventh
Conference on Natural Language Learning at HLT-NAACL 2003, 2003, pp. 142–147.</p>
      <p>[31] A. Ratnaparkhi, Maximum entropy models for natural language ambiguity resolution,
Ph.D. Dissertation in Computer and Information Science, University of Pennsylvania,
1998.</p>
      <p>[32] E. F. T. K. Sang, J. Veenstra, Representing text chunks, in: EACL 1999, 9th Conference of
the European Chapter of the Association for Computational Linguistics, June 8-12, 1999,
University of Bergen, Bergen, Norway, The Association for Computer Linguistics, 1999,
pp. 173–179.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Ai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zamani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          , X. Cheng,
          <article-title>A deep look into neural ranking models for information retrieval</article-title>
          ,
          <source>Inf. Process. Manag</source>
          .
          <volume>57</volume>
          (
          <year>2020</year>
          )
          <fpage>102067</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Craswell</surname>
          </string-name>
          ,
          <article-title>Neural models for information retrieval</article-title>
          ,
          <source>CoRR abs/1705</source>
          .01509 (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          , X. Cheng,
          <article-title>A deep top-k relevance matching model for ad-hoc retrieval</article-title>
          ,
          <source>in: Information Retrieval - 24th China Conference, CCIR</source>
          <year>2018</year>
          , Guilin, China,
          <source>September 27-29</source>
          ,
          <year>2018</year>
          , Proceedings, volume
          <volume>11168</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2018</year>
          , pp.
          <fpage>16</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wan</surname>
          </string-name>
          , X. Cheng,
          <article-title>Text matching as image recognition</article-title>
          ,
          <source>in: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17</source>
          ,
          <year>2016</year>
          , Phoenix, Arizona, USA, AAAI Press,
          <year>2016</year>
          , pp.
          <fpage>2793</fpage>
          -
          <lpage>2799</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          , X. Cheng,
          <article-title>DeepRank: A new deep architecture for relevance ranking in information retrieval</article-title>
          ,
          <source>in: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management</source>
          , CIKM
          <year>2017</year>
          , Singapore,
          <source>November 06 - 10</source>
          ,
          <year>2017</year>
          , ACM,
          <year>2017</year>
          , pp.
          <fpage>257</fpage>
          -
          <lpage>266</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Severyn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moschitti</surname>
          </string-name>
          ,
          <article-title>Learning to rank short text pairs with convolutional deep neural networks</article-title>
          ,
          <source>in: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, Chile, August</source>
          9-13
          ,
          <year>2015</year>
          , ACM,
          <year>2015</year>
          , pp.
          <fpage>373</fpage>
          -
          <lpage>382</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>X.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <article-title>Convolutional neural tensor network architecture for community-based question answering</article-title>
          ,
          <source>in: Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI</source>
          <year>2015</year>
          ,
          Buenos Aires
          , Argentina,
          <source>July 25-31</source>
          ,
          <year>2015</year>
          , AAAI Press,
          <year>2015</year>
          , pp.
          <fpage>1305</fpage>
          -
          <lpage>1311</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hamza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Florian</surname>
          </string-name>
          ,
          <article-title>Bilateral multi-perspective matching for natural language sentences</article-title>
          ,
          <source>in: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI</source>
          <year>2017</year>
          , Melbourne, Australia,
          <source>August 19-25</source>
          ,
          <year>2017</year>
          , ijcai.org,
          <year>2017</year>
          , pp.
          <fpage>4144</fpage>
          -
          <lpage>4150</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zamani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          ,
          <article-title>Neural matching models for question retrieval and next question prediction in conversation</article-title>
          ,
          <source>CoRR abs/1707</source>
          .05409 (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>E</surname>
          </string-name>
          ,
          <article-title>Joint learning of response ranking and next utterance suggestion in human-computer conversation system</article-title>
          ,
          <source>in: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , Shinjuku, Tokyo, Japan,
          <source>August</source>
          7-11
          ,
          <year>2017</year>
          , ACM,
          <year>2017</year>
          , pp.
          <fpage>685</fpage>
          -
          <lpage>694</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Voorhees</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Tice</surname>
          </string-name>
          ,
          <article-title>Building a question answering test collection</article-title>
          ,
          <source>in: SIGIR 2000: Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, July 24-28</source>
          ,
          <year>2000</year>
          , Athens, Greece,
          ACM
          ,
          <year>2000</year>
          , pp.
          <fpage>200</fpage>
          -
          <lpage>207</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yih</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Meek</surname>
          </string-name>
          ,
          <article-title>Wikiqa: A challenge dataset for open-domain question answering</article-title>
          ,
          <source>in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP</source>
          <year>2015</year>
          , Lisbon, Portugal,
          <source>September 17-21</source>
          ,
          <year>2015</year>
          , The Association for Computational Linguistics,
          <year>2015</year>
          , pp.
          <fpage>2013</fpage>
          -
          <lpage>2018</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Keikha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          ,
          <article-title>Evaluating answer passages using summarization measures</article-title>
          ,
          <source>in: The 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '14</source>
          , Gold Coast, QLD, Australia, July 06-11, 2014,
          <publisher-name>ACM</publisher-name>
          ,
          <year>2014</year>
          , pp.
          <fpage>963</fpage>
          -
          <lpage>966</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Glass</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Applying deep learning to answer selection: A study and an open task</article-title>
          ,
          <source>in: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015</source>
          , Scottsdale, AZ, USA, December 13-17, 2015,
          <publisher-name>IEEE</publisher-name>
          ,
          <year>2015</year>
          , pp.
          <fpage>813</fpage>
          -
          <lpage>820</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          ,
          <article-title>WikiPassageQA: A benchmark collection for research on non-factoid answer passage retrieval</article-title>
          ,
          <source>in: The 41st International ACM SIGIR Conference on Research &amp; Development in Information Retrieval, SIGIR 2018</source>
          , Ann Arbor, MI, USA, July 08-12, 2018,
          <publisher-name>ACM</publisher-name>
          ,
          <year>2018</year>
          , pp.
          <fpage>1165</fpage>
          -
          <lpage>1168</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>T.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rosenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tiwary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <article-title>MS MARCO: A human generated machine reading comprehension dataset</article-title>
          ,
          <source>in: Proceedings of the Workshop on Cognitive Computation: Integrating neural and symbolic approaches 2016 co-located with the 30th Annual Conference on Neural Information Processing Systems (NIPS 2016)</source>
          , Barcelona, Spain, December 9, 2016, volume
          <volume>1773</volume>
          of
          <series>CEUR Workshop Proceedings</series>
          ,
          <publisher-name>CEUR-WS.org</publisher-name>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hoogeveen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Màrquez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moschitti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mubarak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Baldwin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Verspoor</surname>
          </string-name>
          ,
          <article-title>SemEval-2017 task 3: Community question answering</article-title>
          ,
          <source>CoRR</source>
          abs/1912.00730 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hoogeveen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. M.</given-names>
            <surname>Verspoor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Baldwin</surname>
          </string-name>
          ,
          <article-title>CQADupStack: A benchmark data set for community question-answering research</article-title>
          ,
          <source>in: Proceedings of the 20th Australasian Document Computing Symposium, ADCS 2015</source>
          , Parramatta, NSW, Australia, December 8-9, 2015,
          <publisher-name>ACM</publisher-name>
          ,
          <year>2015</year>
          , pp.
          <fpage>3:1</fpage>
          -
          <lpage>3:8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Abujabal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Roy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yahya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          ,
          <article-title>ComQA: A community-sourced dataset for complex factoid question answering with paraphrase clusters</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019</source>
          , Minneapolis, MN, USA, June 2-7, 2019, Volume
          <volume>1</volume>
          (Long and Short Papers),
          <publisher-name>Association for Computational Linguistics</publisher-name>
          ,
          <year>2019</year>
          , pp.
          <fpage>307</fpage>
          -
          <lpage>317</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Leng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhai</surname>
          </string-name>
          ,
          <article-title>LinkSO: a dataset for learning to retrieve similar question answer pairs on software development forums</article-title>
          ,
          <source>in: Proceedings of the 4th ACM SIGSOFT International Workshop on NLP for Software Engineering, NL4SE@ESEC/SIGSOFT FSE 2018</source>
          , Lake Buena Vista, FL, USA, November 4, 2018,
          <publisher-name>ACM</publisher-name>
          ,
          <year>2018</year>
          , pp.
          <fpage>2</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Manshadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Semantic tagging of web search queries</article-title>
          ,
          <source>in: ACL 2009, Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the AFNLP</source>
          , 2-7 August 2009, Singapore,
          <publisher-name>The Association for Computer Linguistics</publisher-name>
          ,
          <year>2009</year>
          , pp.
          <fpage>861</fpage>
          -
          <lpage>869</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>G. W.</given-names>
            <surname>Furnas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. K.</given-names>
            <surname>Landauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Dumais</surname>
          </string-name>
          ,
          <article-title>The vocabulary problem in human-system communication</article-title>
          ,
          <source>Commun. ACM</source>
          <volume>30</volume>
          (
          <year>1987</year>
          )
          <fpage>964</fpage>
          -
          <lpage>971</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Acero</surname>
          </string-name>
          ,
          <article-title>Extracting structured information from user queries with semi-supervised conditional random fields</article-title>
          ,
          <source>in: Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009</source>
          , Boston, MA, USA, July 19-23, 2009,
          <publisher-name>ACM</publisher-name>
          ,
          <year>2009</year>
          , pp.
          <fpage>572</fpage>
          -
          <lpage>579</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>