<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CUNI at the ShARe/CLEF eHealth Evaluation Lab 2014</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shadi Saleh</string-name>
          <email>saleh@ufal.mff.cuni.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pavel Pecina</string-name>
          <email>pecina@ufal.mff.cuni.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Formal and Applied Linguistics, Charles University in Prague</institution>
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <fpage>226</fpage>
      <lpage>235</lpage>
      <abstract>
        <p>This report describes the participation of the team of Charles University in Prague in the ShARe/CLEF eHealth Evaluation Lab 2014. We took part in Task 3 (User-Centered Health Information Retrieval) and both of its subtasks (monolingual and multilingual retrieval). Our system was based on the Terrier platform and its implementation of the Hiemstra retrieval model. We experimented with several methods for data cleaning and automatic spelling correction of query terms. For data cleaning, the most effective method was simple HTML markup removal; the more advanced cleaning methods, which remove boilerplate, decreased retrieval performance. Automatic correction of spelling errors in the English queries of the monolingual task proved to be effective and led to our best P@10 score of 0.5360. In the multilingual retrieval task, we employed the Khresmoi medical translation system developed at Charles University in Prague to translate the source queries from Czech, German, and French to English, and used the same retrieval system as in the monolingual task. The cross-lingual retrieval performance measured by P@10, relative to the scores obtained in the monolingual task, ranged between 80% and 90% depending on the source language of the queries.</p>
      </abstract>
      <kwd-group>
        <kwd>multilingual information retrieval</kwd>
        <kwd>data cleaning</kwd>
        <kwd>machine translation</kwd>
        <kwd>spelling correction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The amount of digital medical content available on-line has grown rapidly in recent years.
This growth has the potential to improve the experience of users of Web medical
information retrieval (IR) systems, which are more and more often consulted about
health-related issues. Recently, Fox [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] reported that about 80% of Internet
users in the U.S. look for health information on-line, and this number is expected
to grow in the future.
      </p>
      <p>
        In this report, we describe our participation in the ShARe/CLEF eHealth
Evaluation Lab 2014, Task 3 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which focuses on developing methods and data
resources for the evaluation of IR from the perspective of patients.
      </p>
      <p>
        Our system is built on Terrier [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and employs its implementation of the
Hiemstra retrieval model. The main contributions of our participation are the
examination of several methods for cleaning the document collection (provided as
raw documents with HTML markup), automatic correction of spelling errors
in query terms, and the handling of unknown words.
      </p>
      <p>In the remainder of the paper, we review the task specification, describe the
test collection and queries, present our experiments and their results, and conclude
with the main findings of this work.</p>
    </sec>
    <sec id="sec-2">
      <title>Task description</title>
      <p>The goal of Task 3 in the ShARe/CLEF eHealth Evaluation Lab 2014 is to design
an IR system which returns a ranked list of medical documents (English web
pages) from the provided test collection as a response to patients' queries. The
task is split into two subtasks:
– Task 3a is a standard TREC-style text IR task (http://trec.nist.gov/). The
participants had to develop monolingual retrieval techniques for a set of English
queries and return the top 1,000 relevant documents from the collection for each
query. They could submit up to seven ranked runs: Run 1 as a baseline, Runs 2–4
for experiments exploiting discharge summaries provided for each query, and Runs
5–7 for experiments not using the discharge summaries (see Section 3.2).
– Task 3b extends Task 3a by providing the queries in Czech, German, and
French. The participants were asked to use these queries to retrieve relevant
documents from the same collection as in Task 3a. They were allowed to
submit up to seven ranked runs for each language under the same restrictions as
in Task 3a. No restrictions were put on the techniques for translating the queries
to English.</p>
    </sec>
    <sec id="sec-3">
      <title>Data</title>
      <sec id="sec-3-1">
        <title>Document Collection</title>
        <p>
          The document collection for Task 3 consists of automatically crawled pages from
various medical web sites, including pages certified by the Health On the Net
Foundation (http://www.hon.ch/) and other well-known medical web sites and
databases. The collection was provided by the Khresmoi project (http://khresmoi.eu/)
and covers a broad set of medical topics. For details, see [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>The collection contains a total of 1,104,298 web pages. We excluded a small
portion of the pages because of missing or unreadable content (382 pages contained
a Flash-related error message, and 658 pages were unreadable binary files).</p>
        <p>Fig. 1. An example query from the test set (topic qtest2014.47):</p>
        <preformat>
&lt;topic&gt;
&lt;id&gt;qtest2014.47&lt;/id&gt;
&lt;discharge_summary&gt;
22821-026994-DISCHARGE_SUMMARY.txt
&lt;/discharge_summary&gt;
&lt;title&gt;
tretament for subarachnoid hemorrage
&lt;/title&gt;
&lt;desc&gt;
What are the treatments for subarachnoid hemorrage?
&lt;/desc&gt;
&lt;narr&gt;
Relevant documents should contain information on the treatment
for subarachnoid hemorrage.
&lt;/narr&gt;
&lt;profile&gt;
This 36 year old male patient does not remember how he was treated
in the hospital. Now he wants to know about the care for
subarachnoid hemorrage patients.
&lt;/profile&gt;
&lt;/topic&gt;
        </preformat>
        <p>
Two sets of queries were provided for Task 3 [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. The training set consists of 5 queries with matching relevance
assessments; the test set consists of 50 queries used for the main evaluation.
All queries were provided in English (for Task 3a) and in Czech, German, and
French (for Task 3b).
        </p>
        <p>The English queries were constructed by medical professionals from the main
disorder diagnosed in discharge summaries of real patients (i.e., documents
containing a summary of the important information from their entire hospitalization)
provided for Task 2. The queries were then translated to Czech, German, and
French by medical professionals and reviewed. Each query description consists
of the following fields (a sketch of how these fields can be parsed is shown below):
– title: text of the query,
– description: longer description of what the query means,
– narrative: expected content of the relevant documents,
– profile: main information on the patient (age, gender, condition),
– discharge summary: ID of the matching discharge summary.</p>
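        <p>To make the field structure concrete, the following minimal sketch (our
illustration in Python, not part of the official task tooling) parses such a topic
file and builds the query text from the title and, optionally, the narrative field;
the tag names follow the example in Figure 1:</p>
        <preformat>
# Sketch: read a Task 3 topic file (tag names as in Figure 1) and build
# the query text; an illustration, not the official task tooling.
import xml.etree.ElementTree as ET

def parse_topic(path):
    root = ET.parse(path).getroot()                 # the &lt;topic&gt; element
    return {child.tag: (child.text or "").strip() for child in root}

def build_query(fields, use_narrative=False):
    query = fields.get("title", "")                 # baseline: title only
    if use_narrative:                               # RUN7-style queries
        query += " " + fields.get("narr", "")
    return query.strip()
        </preformat>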
        <p>An example of a query is given in Figure 1, and some basic statistics of the
query sets are shown in Table 1.</p>
        <p>Our system consists of three main components: a document processing
component, a query processing component, and a search engine (see Figure 2). First,
the collection is processed and the data to be indexed is extracted from each
document in the collection. Second, the search engine is employed to index the data.
Third, each query is processed (and possibly translated) and enters the search
engine, which retrieves the top 1,000 ranked documents based on a retrieval model
and its parameters.</p>
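        <p>Schematically, the pipeline can be summarized by the following toy sketch;
the simple functions below only stand in for the actual components (HTML-Strip
for cleaning, Terrier for indexing and retrieval) described in the rest of this
section:</p>
        <preformat>
# Toy end-to-end sketch of the three components (document processing,
# indexing, retrieval); the real runs use HTML-Strip and Terrier instead.
import re
from collections import Counter

def clean_document(html):
    return re.sub(r"&lt;[^&gt;]+&gt;", " ", html)   # crude markup removal

def index_collection(docs):
    return [Counter(doc.lower().split()) for doc in docs]

def retrieve(index, query, k=1000):
    terms = query.lower().split()
    scores = [(sum(tf[t] for t in terms), doc_id)
              for doc_id, tf in enumerate(index)]
    ranked = sorted(scores, reverse=True)
    return [doc_id for score, doc_id in ranked[:k] if score &gt; 0]
        </preformat>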
        <p>
          The main evaluation metric for Task 3 is precision at the top 10 ranked
documents (P@10); however, we also present results for other well-known metrics
implemented in the standard trec_eval tool (http://trec.nist.gov/trec_eval/):
precision at the top 5 ranked documents (P@5), Normalized Discounted Cumulative
Gain at the top 5 and 10 ranked documents (NDCG@5, NDCG@10), Mean Average
Precision (MAP), precision after R documents have been retrieved, where R is the
number of known relevant documents (Rprec), binary preference (bpref), and the
number of relevant documents retrieved (rel_ret). In the remainder of this section,
we describe our retrieval system in more detail.
We employ Terrier 3.5 [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] as the search engine for indexing and retrieval. The
retrieval model is the standard Hiemstra language model [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] as implemented in
Terrier, where given a query Q and its terms Q = (t1, t2, ..., tn), each document
D in a collection C is scored using the following formula:
        </p>
        <p>
          <disp-formula>
            <tex-math>P(D; Q) = P(D) \prod_{i=1}^{n} \bigl( (1 - \lambda_i)\, P(t_i \mid C) + \lambda_i\, P(t_i \mid D) \bigr),</tex-math>
          </disp-formula>
where P(D) is the prior probability of D being relevant, estimated by summing
up the frequencies of the query terms in the document D over their frequencies
in the collection C. P(t<sub>i</sub>|C) and P(t<sub>i</sub>|D) are the probabilities of t<sub>i</sub> in the
collection C and the document D, respectively. They are estimated by maximum
likelihood estimation using the frequencies of the term t<sub>i</sub> in the collection C
and the document D, respectively.</p>
          <p>[Figure 3: Retrieval performance (MAP and P@10) on the training and test
query sets as a function of the interpolation coefficient λ.]</p>
          <p>λ<sub>i</sub> is a linear interpolation coefficient reflecting the overall importance of the
term t<sub>i</sub>. In our system, we used the same value λ for all the terms and tuned it
on the training query set by grid search to maximize MAP (see Figure 3). The highest
MAP value was achieved with λ = 0.087, which is used in all our experiments.
After the relevance assessments of the test queries were released, we also measured
the effect of λ on the test set performance; these results are shown in Figure 3 as
well. The figure also contains test and training curves for P@10, the official measure
for Task 3 in 2014, which was announced together with the evaluation results.</p>
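          <p>For concreteness, the scoring formula and the grid search over λ can be
sketched as follows (an illustration under the estimation assumptions stated above;
the experiments themselves use Terrier's implementation):</p>
          <preformat>
# Hiemstra language-model scoring as in the formula above, plus a grid
# search over lambda; an illustration only -- the experiments use Terrier.
# doc_tf and coll_tf are collections.Counter term frequencies, so missing
# terms count as zero.

def hiemstra_score(query_terms, doc_tf, doc_len, coll_tf, coll_len, lam):
    # P(D): query-term frequencies in D summed over those in C (the prior)
    prior = (sum(doc_tf[t] for t in query_terms)
             / max(1, sum(coll_tf[t] for t in query_terms)))
    score = prior
    for t in query_terms:
        p_tc = coll_tf[t] / coll_len          # P(t|C), maximum likelihood
        p_td = doc_tf[t] / doc_len            # P(t|D), maximum likelihood
        score *= (1 - lam) * p_tc + lam * p_td
    return score

def tune_lambda(evaluate_map, grid):
    # evaluate_map(lam) returns MAP on the training queries for a given lambda
    # e.g. tune_lambda(map_on_training, [i / 1000 for i in range(1, 1000)])
    return max(grid, key=evaluate_map)
          </preformat>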
          <p>
            We also perform Pseudo-Relevance Feedback (PRF), implemented in
Terrier as query expansion, which modifies (expands) a given query by adding the
most informative terms from the top retrieved documents and performs the retrieval
again with the expanded query. We use Terrier's implementation of Bo1
(Bose-Einstein 1) from the Divergence From Randomness framework [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ]. We expanded
each query by adding the ten highest-scored terms from the three top-ranked
documents. These values achieved the best results measured on the training set,
although they were still lower than the results without PRF.
          </p>
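          <p>A sketch of the expansion step follows; it assumes the standard Bo1 weight
w(t) = tf<sub>x</sub> · log2((1 + P<sub>n</sub>)/P<sub>n</sub>) + log2(1 + P<sub>n</sub>) with P<sub>n</sub> = F(t)/N, where tf<sub>x</sub> is
the term frequency in the pseudo-relevant set, F(t) the collection frequency, and N
the number of documents. The runs use Terrier's own implementation:</p>
          <preformat>
# Sketch of Bo1 (Bose-Einstein 1) query expansion: weight the terms of the
# top-ranked documents and add the best ones to the query; illustration only.
import math
from collections import Counter

def bo1_expand(query_terms, top_docs, coll_tf, n_docs, n_terms=10, n_top=3):
    pseudo = Counter()
    for doc_tf in top_docs[:n_top]:     # term counts of top-ranked documents
        pseudo.update(doc_tf)
    scores = {}
    for t, tf_x in pseudo.items():
        p_n = coll_tf[t] / n_docs       # expected frequency under randomness
        scores[t] = tf_x * math.log2((1 + p_n) / p_n) + math.log2(1 + p_n)
    best = sorted(scores, key=scores.get, reverse=True)[:n_terms]
    return list(query_terms) + [t for t in best if t not in query_terms]
          </preformat>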
      </sec>
      <sec id="sec-3-2">
        <title>Document processing</title>
        <p>The documents in the collection are provided as raw web pages, including all
the HTML markup and possibly also CSS style definitions and JavaScript
code, which should be removed before indexing. We employed three data
cleaning methods and evaluated their effect on the retrieval quality measured on
the training queries.</p>
        <p>First, we simply removed all markup, style definitions, and script code with
the Perl module HTML-Strip (http://search.cpan.org/dist/HTML-Strip/Strip.pm),
keeping the meta keywords and meta description tags. This reduces the total size
of the collection from 41,628 MB to 6,821 MB, which is about 16% of the original
size; the average document length is 911 tokens (words and punctuation marks).</p>
        <p>
          Although the size reduction is very substantial, the resulting documents still
contained a lot of noise (such as web page menus, navigation bars, and various
headers and footers), which is mostly not relevant to the main content of the
page. Such noise is often called boilerplate. We used two methods to remove it:
Boilerpipe [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] reduced the total number of tokens in the collection by an additional
58% (the average document length is 383 tokens) and jusText [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] by 55% (the
average document length is 409 tokens).
        </p>
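        <p>For illustration, both cleaning strategies can be approximated in a few lines
of Python; note that BeautifulSoup and the justext package below are stand-ins
for the original Perl and Java tools, so the exact token counts will differ:</p>
        <preformat>
# Sketch of the two cleaning strategies with Python stand-ins: tag stripping
# that keeps meta keywords/description (as with HTML-Strip), and boilerplate
# removal with the jusText package (the Python counterpart of [9]).
from bs4 import BeautifulSoup
import justext

def strip_markup(html):
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):           # drop scripts and CSS
        tag.decompose()
    meta = [m.get("content", "") for m in soup.find_all("meta")
            if m.get("name") in ("keywords", "description")]
    return " ".join(meta + [soup.get_text(" ")])

def remove_boilerplate(html):
    paragraphs = justext.justext(html.encode("utf-8"),
                                 justext.get_stoplist("English"))
    return " ".join(p.text for p in paragraphs if not p.is_boilerplate)
        </preformat>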
        <p>More details on the data cleaning phase are provided in Table 2. Table 3
then reports the IR results obtained by the Hiemstra model using the training set
and the collection processed by the three methods, compared with the case where
no cleaning was performed at all. Surprisingly, the most effective method is the
simple HTML-Strip tool. The two other methods are probably too aggressive and
remove some relevant material important for IR. In all the following experiments,
the collection is cleaned by HTML-Strip.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Query processing</title>
        <p>
          For Task 3a, the queries entering the search engine of our system are constructed
from the title and narrative description fields. For Task 3b, we translated the
queries to English with the Khresmoi translator described in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], which is tuned
        </p>
        <sec id="sec-3-3-1">
          <title>5 http://search.cpan.org/dist/HTML-Strip/Strip.pm</title>
          <p>
            specifically to translate user queries from the medical domain [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ]. For
comparison, we also provide results obtained by translating the queries with the
on-line translators Google Translate (http://translate.google.com/) and Bing
Translator (http://www.bing.com/translator/). In the baseline experiment,
we take the title terms as they are. As an additional query processing step, we
attempt to handle unknown words.
          <p>There are three types of unknown words in the queries. The first (and
frequent) type consists of words with spelling errors. Such errors can be detected
and corrected automatically. The second type consists of words which are correct
in the source language but are out-of-vocabulary (OOV) for the translation system
and thus remain untranslated. Such words can be modified/replaced by known
words (e.g., morphological variants) before translation, or translated ex-post by a
dictionary look-up or another translation system. The third type consists of query
terms which are correct (and correctly translated) but do not appear in the test
collection and thus are not indexed. In such a case, there is no straightforward
and easy way to deal with them (possibly they could be replaced by a synonym or
other related words).</p>
          <p><bold>Spelling correction.</bold> The queries for Task 3 were written by medical
professionals, but this does not guarantee that they are free of spelling errors; see,
e.g., the query in Figure 1, where the word tretament contains a spelling error.
To detect and correct the spelling errors, we employed the on-line English medical
dictionary MedlinePlus (http://www.nlm.nih.gov/medlineplus/). This dictionary
provides a definition for correctly spelled medical words, and for those which are
misspelled, it offers a possible correction.</p>
          <p>We automated this process: for each term in the title and narrative of the
English queries, we check whether the word exists in the dictionary. If the response
is "404 Not Found", we parse the returned page to get the closest word.</p>
          <p><bold>Two-step translation.</bold> After translation, our spell-checking script
reported some unknown words which were left untranslated by the Khresmoi system.
We passed all such words to Google Translate and obtained their translations,
which replaced the untranslated forms.</p>
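          <p>The procedure can be sketched as follows; the two translators and the
vocabulary of known words are placeholders supplied by the caller:</p>
          <preformat>
# Sketch of the two-step translation: words left untranslated by the primary
# system are passed to a fallback translator; both translators and the
# vocabulary are placeholder callables/sets supplied by the caller.
def two_step_translate(query, translate_primary, translate_fallback, known_words):
    translated = translate_primary(query)           # e.g. the Khresmoi system
    fixed = []
    for word in translated.split():
        if word.lower() in known_words:
            fixed.append(word)
        else:                                       # OOV: left untranslated
            fixed.append(translate_fallback(word))  # e.g. Google Translate
    return " ".join(fixed)
          </preformat>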
        <sec id="sec-3-3-2">
          <title>6 http://translate.google.com/</title>
        </sec>
        <sec id="sec-3-3-3">
          <title>7 http://www.bing.com/translator/</title>
        </sec>
        <sec id="sec-3-3-4">
          <title>8 http://www.nlm.nih.gov/medlineplus/</title>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments and results</title>
      <sec id="sec-4-1">
        <title>Task 3a: monolingual IR</title>
        <p>We submitted 4 runs (RUN1, RUN5, RUN6, and RUN7). We did not submit
RUNs 2–4 because we did not use the discharge summaries in our experiments.
In all the submitted runs, we apply the Hiemstra retrieval model with the tuned
parameter value λ on the test collection processed by HTML-Strip. The
submitted runs employ the techniques discussed in the previous section as follows:
RUN1 uses queries constructed from the titles only, without any processing.
RUN5 extends RUN1 by applying spelling correction to the query titles.
RUN6 extends RUN5 by applying PRF (query expansion).</p>
        <p>RUN7 extends RUN6 by adding query terms from both the title and narrative tags.</p>
        <p>The results of our runs submitted to Task 3a are summarized in Table 4.
The only improvement was achieved by RUN5, which implements spelling correction
of the English queries. We found 11 misspelled words in the English test queries,
affecting 7 queries in total. Neither query expansion using PRF in RUN6 nor
adding additional query terms from the narrative fields brought any improvement.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Task 3b: multilingual IR</title>
        <p>In the second part of Task 3, we apply the previous runs to the translated
queries using the same setup, but in RUN5 we handle the OOV problem by
post-translation instead of the spelling correction used in Task 3a. We found 7
untranslated words in the Czech test queries, 5 in the French ones, and 12 in the
German ones, which were post-translated.</p>
        <p>The best P@10 is achieved by Czech IR using RUN5, as shown in Table 5.
Solving the OOV issue enhances the results by 5.4% for Czech IR, by 1.2% for
French IR, and by 3.6% for German IR, while PRF does not help in any of the
multilingual runs. This might be due to complex morphological forms of
medical terms. The usage of the narrative tags does not improve the results either.</p>
        <p>
          Unofficially, we also show results for queries translated using
Google Translate (RUN1G) and Bing Translator (RUN1B). Google
Translate does better than Bing Translator on Czech and German, while Bing
Translator performs better than Google Translate when the source language is French.
However, both of these services outperform the Khresmoi translator on this test
set.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>
        We have described our participation in the ShARe/CLEF 2014 eHealth Evaluation
Lab, Task 3, and both of its subtasks. Our system was based on the Terrier platform
and its implementation of the Hiemstra retrieval model. We experimented with
several methods for data cleaning in the test collection and domain-specific
language processing (e.g., correction of spelling errors) and found that the optimal
cleaning method is a simple removal of HTML markup. In the future, we would like
to examine query expansion techniques based on the UMLS [
          <xref ref-type="bibr" rid="ref11">11</xref>
        ] thesaurus and
extend our work on spelling correction to languages other than English.
      </p>
      <p>
        This work was funded by the Czech Science Foundation (grant no. P103/12/G084)
and the EU FP7 project Khresmoi (contract no. 257528).
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Amati</surname>
          </string-name>
          , G.:
          <article-title>Probability models for information retrieval based on divergence from randomness</article-title>
          .
          <source>Ph.D. thesis</source>
          , University of Glasgow (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          : <article-title>Health Topics: 80% of internet users look for health information online</article-title>
          .
          <source>Tech. rep.</source>
          , Pew Research Center (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palotti</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pecina</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hanbury</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mueller</surname>
          </string-name>
          , H.:
          <article-title>ShARe/CLEF eHealth Evaluation Lab 2014, Task 3: User-centred health information retrieval</article-title>
          .
          <source>In: Proceedings of CLEF</source>
          <year>2014</year>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Hiemstra</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kraaij</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Twenty-one at TREC-7: ad-hoc and cross-language track</article-title>
          .
          <source>In: Proceedings of the seventh Text Retrieval Conference TREC-7</source>
          . pp. 227–238. US National Institute of Standards and Technology (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suominen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schreck</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leroy</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mowery</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velupillai</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chapman</surname>
            ,
            <given-names>W.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martinez</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palotti</surname>
          </string-name>
          , J.:
          <article-title>Overview of the ShARe/CLEF eHealth Evaluation Lab 2014</article-title>
          .
          <source>In: Proceedings of CLEF 2014. Lecture Notes in Computer Science (LNCS)</source>
          , Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. Kohlschutter,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Fankhauser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Nejdl</surname>
          </string-name>
          ,
          <string-name>
            <surname>W.</surname>
          </string-name>
          :
          <article-title>Boilerplate detection using shallow text features</article-title>
          .
          <source>In: Proceedings of the Third ACM International Conference on Web Search and Data Mining</source>
          . pp. 441–450. ACM
          , New York, NY, USA (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Ounis</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amati</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plachouras</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Macdonald</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lioma</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Terrier: A high performance and scalable information retrieval platform</article-title>
          .
          <source>In: Proceedings of ACM SIGIR'06 Workshop on Open Source Information Retrieval (OSIR 2006)</source>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Pecina</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dusek</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hajic</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hlavacova</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leveling</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marecek</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Novak</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Popel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosa</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tamchyna</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uresova</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Adaptation of machine translation for multilingual information retrieval in the medical domain</article-title>
          .
          <source>Artificial Intelligence in Medicine</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Pomikalek</surname>
          </string-name>
          , J.:
          <article-title>Removing Boilerplate and Duplicate Content from Web Corpora</article-title>
          .
          <source>Ph.D. thesis</source>
          , Masaryk University (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Uresova</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hajic</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pecina</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dusek</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Multilingual test sets for machine translation of search queries for cross-lingual information retrieval in the medical domain</article-title>
          . In: Calzolari, N. (Conference Chair), Choukri, K., Declerck, T., Loftsson, H., Maegaard, B., Mariani, J., Moreno, A., Odijk, J., Piperidis, S. (eds.)
          <source>Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)</source>
          .
          <source>European Language Resources Association</source>
          , Reykjavik, Iceland (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. U.S. National Library of Medicine: UMLS Reference Manual (2009), Metathesaurus. Bethesda, MD, USA
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>