<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LIG-Health at Adhoc and Spoken IR Consumer Health Search: expanding queries using UMLS and FastText.</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Philippe Mulhem</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gabriela Gonzalez Saez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aidan Mannion</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Didier Schwab</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jibril Frej</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Univ. Grenoble Alpes</institution>
          ,
          <addr-line>CNRS, Grenoble INP</addr-line>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper describes the work done by the LIG of Grenoble for the Adhoc and the Spoken Consumer Health Search tasks. Our focus for this participation is to study the effectiveness of simple query expansions for health-related retrieval. We explored several query expansions, using knowledge-based or embedding-based techniques, with and without weighting of expansions, and with and without Pseudo Relevance Feedback. The results obtained for Adhoc queries show that our baseline run outperforms the proposed query expansions. The results obtained for spoken queries show that different speakers lead to very different results, and that merging the results from several users improves the quality of the system.</p>
      </abstract>
      <kwd-group>
        <kwd>Query Expansion</kwd>
        <kwd>UMLS</kwd>
        <kwd>FastText</kwd>
        <kwd>Query fusion</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        This paper describes the experiments carried out by the LIG-Health team for the
CLEF 2020 evaluation campaign [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. We participated in the Consumer
Health Search task of CLEF eHealth 2020 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], and more specifically in the Adhoc
subtask and the spoken queries subtask [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The people involved in these
experiments are members of the Information Retrieval group (MRIM) and the
Natural Language Processing group (GETALP) of the Laboratoire d'Informatique
de Grenoble.
      </p>
      <p>
        Our work targeted the two subtasks proposed: adhoc and spoken queries.
For both subtasks, we explored the use of two query expansion methods: one
knowledge-supported, using the UMLS meta-thesaurus [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and one using
FastText embeddings [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Binary and weighted expansions were processed in both
cases. For the retrieval stage, we considered both "Straight" (SR) and Relevance
Feedback (RF) settings. We study how some simple processes may be adapted to
both text and spoken queries. In the case of spoken queries, query expansion
may be questionable because of possible errors in the speech-to-text step. In all
cases, we made use of the assessments of CLEF eHealth 2018 to select the
submissions.
      </p>
      <p>We tackled the spoken queries by considering all the transcriptions provided,
and applying the two expansions and the two retrieval settings described above,
in order to determine the best configuration to submit. For the fusion of runs, we
considered a simple fusion of result lists.</p>
      <p>The remainder of the paper is organized as follows. In Section 2 we describe
in detail the two expansion approaches used, before describing our proposal in
Section 3. Section 4 focuses on the features and parameters of the Information
Retrieval system used. Section 5 describes the submitted runs. The official results
are presented in Section 6. We discuss the results in Section 7 before concluding
in Section 8.</p>
    </sec>
    <sec id="sec-2">
      <title>Expansion Approaches</title>
      <sec id="sec-2-1">
        <title>FastText-based</title>
        <p>
          The first expansion relies on FastText [
          <xref ref-type="bibr" rid="ref10 ref3">3, 10</xref>
          ]. FastText provides a
framework to learn and manage word embeddings. It is able to consider
subwords (using character n-grams), as opposed to more classical embedding models such as
Word2Vec [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], which create embeddings only for whole-word tokens. The
FastText embedding vector of a word is the sum of the vectors of its component
n-grams.
        </p>
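As an illustration of this subword decomposition, the following sketch (our own toy example, not the actual FastText implementation) extracts FastText-style character n-grams with boundary markers and sums stand-in n-gram vectors into a word vector; the random embedding table is purely illustrative.

```python
import numpy as np

def char_ngrams(word, n=5):
    """Character n-grams of a word, with FastText-style boundary markers < and >."""
    marked = f"<{word}>"
    return [marked[i:i + n] for i in range(len(marked) - n + 1)]

rng = np.random.default_rng(0)
ngram_vectors = {}  # hypothetical n-gram embedding table (d = 300 in the paper)

def word_vector(word, d=300):
    """Word vector = sum of the vectors of its component n-grams."""
    vec = np.zeros(d)
    for g in char_ngrams(word):
        if g not in ngram_vectors:
            ngram_vectors[g] = rng.normal(size=d)  # stand-in for trained vectors
        vec += ngram_vectors[g]
    return vec

print(char_ngrams("where"))  # ['<wher', 'where', 'here>']
```

Because unseen words still decompose into known n-grams, this scheme yields vectors for out-of-vocabulary terms, which is why it suits noisy consumer health queries.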
        <p>We used the pre-trained word vectors for the English language, trained on
Common Crawl and Wikipedia using FastText. The features of the model used are
as follows:
- Continuous bag-of-words (CBOW) with positional weighting
- Vector embeddings of dimension d = 300
- Character n-grams of length 5
- Context window of size 5
- Sampling of 10 negative examples per positive example</p>
        <p>Using such embeddings, in our experiments, we expand each query using
terms whose cosine similarity with the original query terms is greater than an
experimentally determined threshold t. That is, denoting the cosine similarity
function as FT_cos, for each term w in the preprocessed query, we calculate its
FastText embedding vector f(w) and then add to the query all terms w' for which
FT_cos(f(w), f(w')) &gt;= t.</p>
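A minimal sketch of this expansion step, with a toy embedding table standing in for the pretrained FastText vectors (the words and vectors below are invented for illustration):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity FT_cos between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def expand_query(query_terms, embeddings, t=0.75):
    """Add every vocabulary term whose cosine similarity with some
    query term reaches the threshold t."""
    expansion = set()
    for q in query_terms:
        for w, vec in embeddings.items():
            if w not in query_terms and cosine(embeddings[q], vec) >= t:
                expansion.add(w)
    return list(query_terms) + sorted(expansion)

# Toy embedding table; real runs would load the Common Crawl + Wikipedia model.
emb = {
    "ache":  np.array([1.0, 0.1]),
    "pain":  np.array([0.9, 0.2]),
    "piano": np.array([0.0, 1.0]),
}
print(expand_query(["ache"], emb, t=0.75))  # ['ache', 'pain']
```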
      </sec>
      <sec id="sec-2-2">
        <title>UMLS-based</title>
        <p>
          The second expansion strategy used in this work relies on the Unified Medical
Language System (UMLS) Metathesaurus [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], a comprehensive biomedical
thesaurus incorporating a network of semantically related concepts linking a large
number of medical language resources. From the many information sources in the
UMLS Metathesaurus, we restricted our expansion search to one that is specifically
designed to deal with consumer-level medical vocabulary: the Open Access and
Collaborative Consumer Health Vocabulary
(https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/CHV/index.html),
known as the CHV, which contains
more than 88 000 synonyms for more than 57 000 concepts.
        </p>
        <p>The CHV is used to get the synonyms of query terms, and we denote the
function mapping a term to its CHV synonyms as CHV_syn in the following. As
the synonyms were often too general or too numerous in initial experiments,
we introduced in addition a filtering step based on the FastText similarity FT_cos.
Given that the goal of the UMLS-based expansion is to find expansion terms
that are semantically rather than syntactically related to the query terms, the
CHV synonyms were only included in the expanded query if their FastText
embedding had a cosine similarity less than 0.6 with the original query term
they were associated with.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Query expansions proposed</title>
      <p>The two query expansions proposed are described now. Each of them has two
versions: a binary one and a weighted one. As their names suggest, the binary
expansions do not assign any weight to query terms, while the weighted ones
are able to indicate a level of importance of a term in the query. We detail them
in the following.</p>
      <sec id="sec-3-1">
        <title>Embedding-based only expansion.</title>
        <p>
          This approach is quite similar to [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]; one major difference is that the embeddings
consider subwords, as described above in Section 2.1. We do not use any manually
defined knowledge for these expansions. Formulas 1 and 2 describe the binary
expansion based on FastText. In formula 1, the set VOC_FT denotes the
vocabulary that FastText manages. The manually defined threshold considered here,
0.75, is lower than the one in the UMLS expansion: it is consistent with [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] and
a trade-off between the quality of suggested terms and the quantity of terms found.
        </p>
        <p>T_exp_FT_binary(q_i) = { e | e ∈ VOC_FT \ q ∧ FT_cos(q_i, e) &gt;= 0.75 }   (1)

Exp_Query_FT_binary(q) = q ∪ (∪_{q_i ∈ q} T_exp_FT_binary(q_i))   (2)</p>
        <p>For the weighted FT-based expansion, the principle is the same as before, but:
- the initial query terms have a weight of 1;
- the expanded terms are weighted by the cosine similarity between their
FastText embedding and that of the query term they expand;
- if one expansion term occurs several times in the expansion, each (weighted)
occurrence is considered in the expansion.</p>
        <p>More formally, formulas 3 and 4 describe such an expansion:

T_exp_FT_weighted(q_i) = { (e, FT_cos(q_i, e)) | e ∈ VOC_FT \ q ∧ FT_cos(q_i, e) &gt;= 0.75 }   (3)

Exp_Query_FT_weighted(q) = { (q_i, 1) | q_i ∈ q } ∪ (∪_{q_i ∈ q} T_exp_FT_weighted(q_i))   (4)</p>
        <p>
          The use of knowledge-based query expansion is well studied, as in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. In
the specific case of medical search, the use of the UMLS meta-thesaurus is classical,
as in [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. The binary expansion is processed query term by query
term q_i, as described in formulas 5 and 6. In our experiments, for each query
term q_i from a query q, we look for the synonyms of q_i in the consumer health
vocabulary. Then, we apply a filtering that keeps the term if its similarity, using
FastText [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], is larger than 0.8. Again, this threshold has been manually defined
and is consistent with [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] (even if [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] showed that such a threshold cannot be
considered a rule of thumb). This filtering allows us to consolidate the trust we
have in the synonyms provided by the CHV.
        </p>
        <p>T_exp_UMLS_binary(q_i) = { e | e ∈ CHV_syn(q_i) ∧ FT_cos(q_i, e) &gt;= 0.8 }   (5)</p>
        <p>Exp_Query_UMLS_binary(q) = q ∪ (∪_{q_i ∈ q} T_exp_UMLS_binary(q_i))   (6)</p>
        <p>For the weighted UMLS-based expansion, the principle is the same as before, but:
- the initial query terms have a weight of 1;
- as the CHV does not weight synonymy relationships, we propose that the
expanded terms get the weight provided by FastText;
- if one expansion term occurs several times in the expansion, each (weighted)
occurrence is considered in the expansion.
More formally, formulas 7 and 8 describe such an expansion:

T_exp_UMLS_weighted(q_i) = { (e, FT_cos(q_i, e)) | e ∈ CHV_syn(q_i) ∧ FT_cos(q_i, e) &gt;= 0.8 }   (7)

Exp_Query_UMLS_weighted(q) = { (q_i, 1) | q_i ∈ q } ∪ (∪_{q_i ∈ q} T_exp_UMLS_weighted(q_i))   (8)</p>
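The weighted UMLS-based expansion described above can be sketched as follows; the CHV synonym table and the embedding vectors here are invented stand-ins (real runs would query the UMLS Metathesaurus and the pretrained FastText model):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity FT_cos between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical stand-ins for CHV_syn and the FastText embedding table.
chv_syn = {"hypertension": ["high blood pressure", "htn"]}
emb = {
    "hypertension":        np.array([1.0, 0.2]),
    "high blood pressure": np.array([0.9, 0.3]),
    "htn":                 np.array([0.1, 1.0]),
}

def umls_weighted_expansion(query_terms, threshold=0.8):
    """Original terms keep weight 1; a CHV synonym is kept, with its
    FastText cosine similarity as weight, when it reaches the threshold."""
    weighted = {q: 1.0 for q in query_terms}
    for q in query_terms:
        for s in chv_syn.get(q, []):
            w = cosine(emb[q], emb[s])
            if w >= threshold:
                weighted[s] = round(w, 3)
    return weighted

print(umls_weighted_expansion(["hypertension"]))
```

With these toy vectors, "high blood pressure" passes the 0.8 filter while "htn" is discarded, mirroring how the filter drops weakly related synonyms.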
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Information Retrieval System</title>
      <p>
        The information retrieval system used for the experiments is Terrier v5.2
(http://terrier.org/) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
We did not index the corpus ourselves, but used the index provided by the organizers.
This had an impact on the retrieval: simple tests revealed that the
index seems corrupted, leading to duplicate document identifiers in result lists.
We therefore post-processed the result lists to remove these duplicated
documents. Because this removal was applied on the top-1000 documents, our
result lists contain fewer than 1000 entries.
      </p>
      <p>
        The IR model used is BM25 [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], with b=0.75 after preliminary experiments and
other parameters left at their defaults. Many experiments show that BM25 is a very good
model to use [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. The Relevance Feedback model is Bose-Einstein (the Bo1
model of Terrier), with default parameters (top 3 documents considered, and
10 terms for expansion). The Bo1 relevance feedback model provides very good
results.
      </p>
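A minimal BM25 scoring sketch with b = 0.75 as used above; the value k1 = 1.2 and the exact idf variant are illustrative assumptions (Terrier's implementation differs in details such as term-frequency normalization):

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freqs, n_docs, avg_dl, k1=1.2, b=0.75):
    """Score one document for a query with a textbook BM25 formula.
    doc_freqs maps a term to its document frequency in the collection."""
    tf = Counter(doc_terms)
    dl = len(doc_terms)  # document length in tokens
    score = 0.0
    for t in query_terms:
        if t not in tf:
            continue
        df = doc_freqs.get(t, 0)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        norm = tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * dl / avg_dl))
        score += idf * norm
    return score

# Invented toy document and collection statistics, for illustration only.
doc = "diet advice for high blood pressure".split()
s = bm25_score(["blood", "pressure"], doc, {"blood": 10, "pressure": 12},
               n_docs=1000, avg_dl=6.0)
print(round(s, 3))
```

The parameter b controls how strongly long documents are penalized; with b = 0.75, length normalization is applied at three quarters of its full strength.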
    </sec>
    <sec id="sec-5">
      <title>Runs description</title>
      <p>The different runs submitted were the best four runs among several configurations.
As described above in Section 3, adding the expanded configurations, we get a total of
10 runs:
1. Noexp: no expansion, straight query processing (i.e., without Relevance Feedback) ‡;
2. Noexp RF: no expansion, RF query processing †;
3. FT Straight binary: FastText-based query expansion, binary expansion mode, no RF ‡;
4. FT Straight weighted: FastText-based weighted query expansion, no RF;
5. FT RF binary: FastText-based binary expansion, RF query processing †;
6. FT RF weighted: FastText-based weighted query expansion, RF query processing;
7. UMLS Straight binary: UMLS-based query expansion, binary expansion mode, straight query processing ‡;
8. UMLS Straight weighted: UMLS-based weighted query expansion, straight query processing;
9. UMLS RF binary: UMLS-based binary query expansion, RF query processing †;
10. UMLS RF weighted: UMLS-based weighted query expansion, RF query processing †.</p>
      <p>As described below, we select among these configurations our submissions for
the two subtasks: Adhoc (marked ‡) and Spoken queries (marked †).</p>
      <sec id="sec-5-1">
        <title>Adhoc subtask</title>
        <p>For the selection of our submitted runs, we evaluated the quality, using MAP on the qrels
of CLEF eHealth Adhoc 2018, of the 10 configurations above.
The results obtained are presented in Table 1. The best reference run between
Noexp and Noexp RF, plus the top three runs, were submitted as our official
runs. For the Adhoc subtask, dedicated to retrieving documents for a single
query, a set of 50 queries is provided. The runs with the best results over these
50 queries are chosen (marked with a ‡ in Section 5).</p>
        <p>From Table 1, we see that, using the CLEF eHealth 2018 reference, the best
run is the non-expanded and non-RF one. When binary configurations achieve
the same quality as their weighted counterparts, we choose the binary configuration.
This explains why the UMLS and FT-based binary expansions with straight
query processing are selected. Overall, we notice that Relevance Feedback
query processing underperforms straight query processing for the FastText-based
expansions, and that the weighted FastText expansions behave the same as
their binary counterparts.</p>
        <p>On the Spoken subtask, the 50 topics from the Adhoc task had been recorded
by six users (Participant 1 to Participant 6). Per participant, six
transcriptions are provided: default enhanced transcription, ESPNET commonvoice,
ESPNET librispeech, ESPNET librispeech rnnlm, phone enhanced, and video enhanced.
We explore all of these transcriptions for each participant. This leads to a total
of 36 (= 6 participants × 6 transcriptions) versions of the set of queries.</p>
        <p>The full selection of the four submitted runs per user considers the 10 configurations
described previously over these versions of the queries. It follows two steps:
the first one selects one transcription per user, and in a second step we choose
the configurations used for the submission. More precisely:</p>
      </sec>
      <sec id="sec-5-2">
        <title>1. Selection of one transcription per participant</title>
        <p>We first choose the transcriptions that achieve the highest MAP values
(according to the CLEF eHealth 2018 qrels) over the non-expanded runs. These
results are presented in Figure 2. We see that the transcription quality varies
a lot depending on the speaker: for instance, the default enhanced
transcription is very good for Participants 1, 2, 3 and 6, but fails for Participants
4 and 5. By analyzing this figure, we select the following transcriptions:
- default enhanced transcription for Participant 1
- default enhanced transcription for Participant 2
- video enhanced for Participant 3
- phone enhanced for Participant 4
- phone enhanced for Participant 5
- default enhanced transcription for Participant 6</p>
      </sec>
      <sec id="sec-5-3">
        <title>2. Selection of four configurations per participant over the chosen transcription of step 1</title>
        <p>Then, we computed the averaged MAP (over the 6 participants, using the CLEF
eHealth 2018 Adhoc assessments provided) for each IR configuration.
These results are presented in Figure 3. Considering these averaged MAPs, we
choose the top four configurations, with only one non-expanded run (marked
with a † in Section 5):
(a) Noexp RF: no expansion, RF query processing;
(b) FT RF binary: FastText-based binary query expansion, RF query processing;
(c) UMLS RF binary: UMLS-based binary query expansion, RF query processing;
(d) UMLS RF weighted: UMLS-based weighted query expansion, RF query processing.</p>
        <p>We notice that all the selected configurations use Relevance Feedback.</p>
      </sec>
      <sec id="sec-5-4">
        <title>Fused runs</title>
        <p>We also submitted 4 runs that fuse the results of the same configuration for
each participant. To integrate these results, we used a simple sum of scores.
This allows us to study whether the integration of several transcriptions from several
participants outperforms single-participant transcriptions. The MAP evaluation
of these four configurations on the CLEF eHealth 2018 assessments, as presented
in Figure 4, shows a slight increase compared to the top results per user.</p>
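The sum-of-scores fusion can be sketched as follows (a simple CombSUM over the per-participant ranked lists; the document ids and scores below are invented):

```python
from collections import defaultdict

def fuse_by_score_sum(result_lists, depth=1000):
    """Fuse per-participant result lists by summing document scores,
    then re-rank the fused pool by the summed score."""
    fused = defaultdict(float)
    for results in result_lists:      # one ranked (doc_id, score) list per participant
        for doc_id, score in results:
            fused[doc_id] += score
    ranked = sorted(fused.items(), key=lambda x: -x[1])
    return ranked[:depth]

p1 = [("d1", 9.0), ("d2", 7.5)]   # Participant 1's run
p2 = [("d2", 8.0), ("d3", 6.0)]   # Participant 2's run
print(fuse_by_score_sum([p1, p2]))  # d2 first with summed score 15.5
```

Documents retrieved by several participants accumulate score, which is how the fusion rewards agreement between transcriptions.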
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Results</title>
      <p>
        We present here the official results obtained by our runs, for the adhoc and
spoken queries subtasks. We consider the following evaluation measures: MAP,
to assess globally the quality of the configurations; Bpref [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which takes into
account the fact that the evaluation relies on incomplete assessments; and the
classical ndcg@10, which focuses on the relevance of the top-10 results. For these
runs, the other measures provided by the organizers, like the RBP-based ones, lead
to similar rankings of the configurations tested.
      </p>
      <sec id="sec-6-1">
        <title>Adhoc</title>
        <p>Our official results for the Adhoc query runs are presented in Table 2. In this
table, we see that the best MAP and ndcg@10 results are obtained without any
query expansion and without any relevance feedback. This means that none of
the query expansions is able to increase the quality with regard to these
evaluation measures. However, binary UMLS and FT-based expansions, with straight
query processing, slightly outperform the un-expanded runs with straight query
processing.</p>
        <p>We also studied the results obtained for the RBP measures, in Table 3. We see
that the UMLS expanded run outperforms the non-expanded one for RBP and
RBP readability.</p>
      </sec>
      <sec id="sec-6-2">
        <title>Spoken queries</title>
        <p>The official results per participant are presented in Table 4. This
table confirms our evaluations on the 2018 assessments: there are large variations in
quality (for all the measures) depending on the participant considered. Among our
four submitted configurations, the best MAP for Participant 6 is 0.1744 (Noexp
with RF), whereas for Participant 5 the best MAP (UMLS-based binary
expansion with RF) is only 0.1036. The best measures (among the 3 presented) per
user are obtained without any query expansion in 12 cases out of 24. The
UMLS-based binary expansion provides the best measures only twice (MAP and Bpref),
for Participant 5. The weighted expansion using UMLS outperforms other
configurations in 4 cases. The weighted UMLS expansion outperforms the binary
UMLS expansion in 12 cases out of 18. FT-based expansion never produces the
best results, but yields the second-best Bpref and ndcg@10 for Participant 1.</p>
        <p>For the submitted merged results, we see that the merging always increases
(over the best single participant) the evaluation measures for each configuration.
Here, the binary UMLS-based expansion outperforms its weighted
counterpart for Bpref and ndcg@10. FT-based expansions underperform UMLS-based
expansions.</p>
        <p>For the RBP evaluations of the merged spoken runs, the non-expanded run
still outperforms our other submissions.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Discussion</title>
      <p>We focus first on the Adhoc subtask. According to the MAP values obtained
on the 2018 assessments, our official results are consistent: the best run is the
non-expanded one without relevance feedback. We see that the UMLS expanded
query with relevance feedback performs as well as the non-expanded run for the first
results, as the P@10 and ndcg@10 are almost equal. Binary UMLS and FT-based
expansions slightly outperform our non-expanded runs for the Bpref measure;
this shows that such expansions can be beneficial. We see in Figures 5 and 6
(i.e., our two best results according to MAP) that the expansion is much more
unstable compared to the median results.</p>
      <p>When considering spoken participant runs, we show again that some of the
expansions proposed are able to outperform the non-expanded configurations. More
precisely, UMLS-based expansions obtain larger MAP and Bpref values than
non-expanded runs for 33% (= 2/6) of the participants.</p>
      <p>The fused evaluation results that we get from the spoken queries
are consistent with the adhoc results: the expansions underperform the
non-expanded runs. With merged results, the expansions are never close to the quality
of the non-expanded runs: the reason is that the expansions over each user tend to
disperse the initial query expression, which is already subject to transcription errors. We
present in Figures 7 and 8 (i.e., our two best results according to ndcg@10) the results
per query. We see again here that the expansion underperforms the median
results more often than the non-expanded run does.</p>
      <p>A more detailed fusion process may improve the overall quality, but in any
case merging results for spoken queries needs to be able to retrieve similar queries
asked by several users, which is not an easy task.</p>
      <p>This work focuses only on simple expansions, and these expansions do
not succeed in increasing the quality of the results. The expansion terms are
not strongly enough related to the initial query. Future experiments will be
conducted to check exactly why these expansions fail.</p>
      <p>For the official evaluation measures related to credibility, the UMLS
expanded runs outperform the non-expanded one by 4.3% (cRBP 0.80) for the
Adhoc search, but the non-expanded runs still achieve a higher quality than the
expanded ones for the spoken runs.</p>
    </sec>
    <sec id="sec-8">
      <title>Conclusion</title>
      <p>We presented in this paper the retrieval configurations for the Adhoc and
Spoken queries subtasks of the Consumer Health Search task of CLEF eHealth
2020. We focused our proposal on several query expansions. The query
expansions rely on the UMLS meta-thesaurus and on word embeddings using FastText.</p>
      <p>The main finding is that the expansions proposed for the classical Adhoc task
underperform simple retrieval (with or without the Relevance Feedback strategy). For
spoken runs, we were able to detect that some query expansions (based on UMLS)
do compete well with simple retrieval without expansion.</p>
      <p>Other approaches based on reranking should be studied in the future, so as
to avoid the noise generated by the expansion of queries.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgement</title>
      <p>This work was partially supported by the ANR Kodicare project, grant
ANR-19-CE23-0029 of the French Agence Nationale de la Recherche.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Mohannad</given-names>
            <surname>Almasri</surname>
          </string-name>
          , Catherine Berrut, and
          <string-name>
            <surname>Jean-Pierre Chevallet</surname>
          </string-name>
          .
          <article-title>A Comparison of Deep Learning Based Query Expansion with Pseudo-Relevance Feedback and Mutual Information</article-title>
          . In Conference ECIR, volume
          <volume>42</volume>
          , pages
          <fpage>369</fpage>
          {
          <fpage>715</fpage>
          ,
          <string-name>
            <surname>Padoue</surname>
          </string-name>
          , Italy,
          <year>March 2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Olivier</given-names>
            <surname>Bodenreider</surname>
          </string-name>
          .
          <article-title>The Unified Medical Language System (UMLS): Integrating Biomedical Terminology</article-title>
          .
          <source>Nucleic Acids Res</source>
          .,
          <volume>32</volume>
          (
          <string-name>
            <surname>Database-Issue</surname>
          </string-name>
          ):
          <volume>267</volume>
          {
          <fpage>270</fpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Piotr</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          , Edouard Grave, Armand Joulin, and
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          .
          <article-title>Enriching Word Vectors with Subword Information</article-title>
          . CoRR, abs/1607.04606,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Chris</given-names>
            <surname>Buckley and Ellen M. Voorhees</surname>
          </string-name>
          .
          <article-title>Retrieval Evaluation with Incomplete Information</article-title>
          .
          <source>In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , SIGIR '
          <volume>04</volume>
          , page
          <volume>25</volume>
          {
          <fpage>32</fpage>
          , New York, NY, USA,
          <year>2004</year>
          .
          <article-title>Association for Computing Machinery</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>A.</given-names>
            <surname>Elekes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schaeler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Boehm</surname>
          </string-name>
          .
          <article-title>On the Various Semantics of Similarity in Word Embedding Models</article-title>
          .
          <source>In 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL)</source>
          , pages
          <fpage>1</fpage>
          {
          <fpage>10</fpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Lorraine</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          , Hanna Suominen, Liadh Kelly, Zhengyang Liu, Gabriella Pasi, Gabriela Saez Gonzales, Marco Viviani, and
          <string-name>
            <given-names>Chenchen</given-names>
            <surname>Xu</surname>
          </string-name>
          .
          <article-title>Overview of the CLEF eHealth 2020 Task 2: Consumer Health Search with Ad Hoc and Spoken Queries</article-title>
          . In Working Notes of Conference and
          <article-title>Labs of the Evaluation (CLEF) Forum</article-title>
          , CEUR Workshop Proceedings,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Lorraine</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          , Hanna Suominen, Liadh Kelly, Antonio Miranda-Escalada, Martin Krallinger, Zhengyang Liu, Gabriella Pasi, Gabriela Saez Gonzales, Marco Viviani, and
          <string-name>
            <given-names>Chenchen</given-names>
            <surname>Xu</surname>
          </string-name>
          .
          <article-title>Overview of the CLEF eHealth Evaluation Lab 2020</article-title>
          . In Avi Arampatzis, Evangelos Kanoulas, Theodora Tsikrika, Stefanos Vrochidis, Hideo Joho, Christina Lioma, Carsten Eickhoff, Aurelie Neveol, Linda Cappellato, and Nicola Ferro, editors,
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction: Proceedings of the Eleventh International Conference of the CLEF Association (CLEF</source>
          <year>2020</year>
          ) , LNCS Volume number:
          <volume>12260</volume>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Kotov and ChengXiang Zhai</surname>
          </string-name>
          .
          <article-title>Tapping into Knowledge Base for Concept Feedback: Leveraging Conceptnet to Improve Search Results for Difficult Queries</article-title>
          . In Eytan Adar, Jaime Teevan, Eugene Agichtein, and Yoelle Maarek, editors,
          <source>WSDM</source>
          , pages
          <volume>403</volume>
          {
          <fpage>412</fpage>
          . ACM,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Craig</surname>
            <given-names>Macdonald</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Richard</surname>
            <given-names>McCreadie</given-names>
          </string-name>
          , Rodrygo LT Santos, and
          <string-name>
            <given-names>Iadh</given-names>
            <surname>Ounis</surname>
          </string-name>
          .
          <article-title>From Puppy to Maturity: Experiences in Developing Terrier</article-title>
          .
          <source>Proc. of OSIR at SIGIR</source>
          , pages
          <volume>60</volume>
          {
          <fpage>63</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Edouard Grave, Piotr Bojanowski,
          <string-name>
            <given-names>Christian</given-names>
            <surname>Puhrsch</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Armand</given-names>
            <surname>Joulin</surname>
          </string-name>
          .
          <article-title>Advances in Pre-Training Distributed Word Representations</article-title>
          . In
          <source>Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018)</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Ilya Sutskever, Kai Chen, Greg S. Corrado, and
          <string-name>
            <given-names>Jeff</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <article-title>Distributed Representations of Words and Phrases and their Compositionality</article-title>
          . In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>26</volume>
          , pages
          <fpage>3111</fpage>
          –
          <lpage>3119</lpage>
          . Curran Associates, Inc.,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Antonio</given-names>
            <surname>Miranda-Escalada</surname>
          </string-name>
          , Aitor Gonzalez-Agirre, Jordi Armengol-Estape, and Martin Krallinger.
          <article-title>Overview of Automatic Clinical Coding: Annotations, Guidelines, and Solutions for non-English Clinical Cases at CodiEsp Track of CLEF eHealth 2020</article-title>
          . In
          <source>Working Notes of Conference and Labs of the Evaluation (CLEF) Forum</source>
          , CEUR Workshop Proceedings,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Navid</given-names>
            <surname>Rekabsaz</surname>
          </string-name>
          , Mihai Lupu, and
          <string-name>
            <given-names>Allan</given-names>
            <surname>Hanbury</surname>
          </string-name>
          .
          <article-title>Exploration of a Threshold for Similarity Based on Uncertainty in Word Embedding</article-title>
          . In Joemon M. Jose, Claudia Hauff, Ismail Sengor Altıngövde, Dawei Song, Dyaa Albakour, Stuart Watt, and John Tait, editors,
          <source>Advances in Information Retrieval</source>
          , pages
          <fpage>396</fpage>
          –
          <lpage>409</lpage>
          , Cham,
          <year>2017</year>
          . Springer International Publishing.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Stephen</given-names>
            <surname>Robertson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Walker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Hancock-Beaulieu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Gatford</surname>
          </string-name>
          .
          <article-title>Okapi at TREC-3</article-title>
          . In
          <source>Overview of the Third Text REtrieval Conference (TREC-3)</source>
          , pages
          <fpage>109</fpage>
          –
          <lpage>126</lpage>
          . Gaithersburg, MD: NIST,
          <year>January 1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>Peilin</given-names>
            <surname>Yang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Hui</given-names>
            <surname>Fang</surname>
          </string-name>
          .
          <article-title>A Reproducibility Study of Information Retrieval Models</article-title>
          .
          <source>ICTIR '16</source>
          , pages
          <fpage>77</fpage>
          –
          <lpage>86</lpage>
          , New York, NY, USA,
          <year>2016</year>
          . Association for Computing Machinery
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16. Zhenyu Liu and Wesley W. Chu.
          <article-title>Knowledge-based Query Expansion to Support Scenario-specific Retrieval of Medical Free Text</article-title>
          .
          <source>Information Retrieval</source>
          ,
          <volume>10</volume>
          (
          <issue>2</issue>
          ):
          <fpage>173</fpage>
          –
          <lpage>202</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>