<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The University of Iowa at CLEF 2014: eHealth Task 3</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Chao Yang</string-name>
          <email>chao-yang@uiowa.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sanmitra Bhattacharya</string-name>
          <email>sanmitra-bhattacharya@uiowa.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Padmini Srinivasan</string-name>
          <email>padmini-srinivasan@uiowa.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Iowa</institution>
          ,
          <addr-line>Iowa City, IA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <fpage>283</fpage>
      <lpage>295</lpage>
      <abstract>
        <p>Task 3 of the CLEF eHealth Evaluation Lab aims to help laypeople get more accurate information from health-related documents. For this task, we ran several experiments and tried different techniques to improve retrieval performance. We cleaned the original dataset and tried sentence-level retrieval. We explored different parameter settings for pseudo relevance feedback. The description and narrative fields were also used to expand the query. We also modified the Markov Random Field (MRF) model to expand the query using medical phrases only. On our training set (the 2013 test set), these methods significantly improve retrieval performance, by 8-15% over the baseline. We submitted 4 runs. Results on the 2014 test set suggest that the techniques we used, except MRF, have the potential to improve performance for the top 5 retrieved results.</p>
      </abstract>
      <kwd-group>
        <kwd>Information Retrieval</kwd>
        <kwd>Query Expansion</kwd>
        <kwd>Pseudo Relevance Feedback</kwd>
        <kwd>Markov Random Field</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The ShARe/CLEF eHealth Evaluation Lab [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is part of the CLEF 2014 Conference
and Labs of the Evaluation Forum. It aims to help laypeople better understand
health-related documents. We participated in Task 3: User-centred health
information retrieval [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Its goal is to develop more accurate retrieval strategies for
health-related documents. Specifically, participants were required to submit a
list of relevant health-related document ids for each query (topic). In 2014, Task
3 included a monolingual IR task (Task 3a) and a multilingual IR task (Task
3b). We participated in Task 3a only.
      </p>
      <p>In particular, we asked the following questions:</p>
      <p>1) Does sentence splitting on documents help improve retrieval performance?</p>
      <p>2) How does one optimize the parameters for pseudo relevance feedback?</p>
      <p>3) Is query expansion using descriptions and narratives more effective than
using titles only?</p>
      <p>4) Can we include medical phrase detection to make a better Markov random
field (MRF) model?</p>
    </sec>
    <sec id="sec-2">
      <title>Dataset</title>
      <p>The dataset for Task 3 is provided by the Khresmoi project. It consists of
medical-related documents in HTML format, drawn from well-known health
and medical sites and databases. The dataset is about 41 GB
uncompressed and contains 1,103,450 documents.</p>
      <sec id="sec-2-1">
        <title>HTML to Text</title>
        <p>Since the documents are in HTML, they contain many HTML tags and other
noise that may affect retrieval performance if indexed directly. We
used Lynx, a command-line browser, to convert the HTML files to plain text.
This reduced the dataset to 8.6 GB. We then replaced frequently occurring
broken UTF-8 characters. We named this text-only dataset "All Text".</p>
      </sec>
      <sec id="sec-2-2">
        <title>Content Cleaning</title>
        <p>Ideally, the text we extract should be the main article of the webpage.
However, a typical webpage has several sections, which may hold site
structure information, contact information, headlines, or even
advertisements. The example in Figure 1 shows the beginning of one text output
from Lynx. Except for the last two lines, all the information is unrelated to the
main article.</p>
        <preformat>* Home
* About
* Ask A Question
* Attract CME
Attract
NPHS Logo
Search Clinical Questions Enter search details Search
A total of 1713 clinical questions available
Quick Guide to ATTRACT
What is the evidence for betamethasone cream versus circumcision in phimosis?
Associated tags: child health, men's health, circumcision, phimosis, treatment, corticosteroid
...</preformat>
        <p>However, removing all of this irrelevant information is not trivial. To
keep only the main article, we used simple rules to remove headlines and
titles. In particular, we removed all lines with 3 or fewer
tokens. We also removed all lines that start with '*', '+', '-', 'o', '#',
or '@', which are the headline start symbols produced by Lynx.</p>
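        <p>The two rules above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions rather than the script used in our experiments: the names clean_lynx_text and is_headline are hypothetical, tokens are assumed to be whitespace-delimited, and the marker 'o' is assumed to count only when it stands alone as the first token, so that ordinary words beginning with "o" are kept.</p>
        <preformat>
```python
SYMBOL_MARKERS = ("*", "+", "-", "#", "@")  # headline start symbols named in the text


def is_headline(line):
    """Return True if the line starts with one of the Lynx headline markers."""
    tokens = line.split()
    if not tokens:
        return False
    # Assumption: 'o' is a bullet only when it is a token on its own.
    return line.lstrip().startswith(SYMBOL_MARKERS) or tokens[0] == "o"


def clean_lynx_text(text, max_short_tokens=3):
    """Drop lines with max_short_tokens or fewer tokens, and headline lines."""
    kept = []
    for line in text.splitlines():
        if len(line.split()) > max_short_tokens and not is_headline(line):
            kept.append(line.strip())
    return "\n".join(kept)
```
        </preformat>
        <p>Under these assumptions, navigation entries such as "* Home" and short fragments such as "NPHS Logo" are dropped, while the clinical question line from Figure 1 is kept.</p>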
        <p>After this cleaning, the dataset is about 5.4 GB.
We named this collection "Text Clean".</p>
      </sec>
      <sec id="sec-2-3">
        <title>Sentence Splitting</title>
        <p>
          Besides indexing whole documents, we also explored sentence-level retrieval. We
used the GENIA Sentence Splitter (GeniaSS) [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] to split each text
document from "All Text" into sentences. This sentence splitter is optimized for biomedical
documents and performs well. Keeping track of the original text
document id, we created 3 sentence-level datasets: "Sent 1", "Sent 2", and "Sent 3".
        </p>
        <p>"Sent 1" has only single sentences. (In other words, we treat each sentence
as a logical 'document'.)</p>
        <p>"Sent 2" has pairs of adjacent sentences.</p>
        <p>"Sent 3" has sequences of 3 adjacent sentences.</p>
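        <p>The three datasets can be viewed as windows of 1, 2 and 3 adjacent sentences. The sketch below is illustrative only: it assumes overlapping (sliding) windows, which the description above does not specify, and it assumes the sentences have already been produced by a splitter such as GeniaSS. The function name sentence_windows and the tuple layout are hypothetical.</p>
        <preformat>
```python
def sentence_windows(doc_id, sentences, size):
    """Return (doc_id, start_index, text) for every run of `size` adjacent sentences."""
    windows = []
    # A document shorter than `size` sentences yields no windows.
    for start in range(len(sentences) - size + 1):
        text = " ".join(sentences[start:start + size])
        # Keeping doc_id lets a retrieved window be mapped back to its document.
        windows.append((doc_id, start, text))
    return windows
```
        </preformat>
        <p>Calling the function with size 1, 2 and 3 would produce the logical documents for "Sent 1", "Sent 2" and "Sent 3" respectively.</p>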
      </sec>
      <sec id="sec-2-4">
        <title>Training Topic Set</title>
        <p>We did not use the training topics provided in CLEF eHealth 2014 because
there were only 5 topics and the coverage of the qrels file is small. Instead, we
used the CLEF eHealth 2013 test topics as our training topics. The 2013 test set
has 50 topics. Figure 2 shows an example training topic.</p>
        <preformat>&lt;query&gt;
  &lt;id&gt;qtest1&lt;/id&gt;
  &lt;discharge_summary&gt;00098-016139-DISCHARGE_SUMMARY.txt&lt;/discharge_summary&gt;
  &lt;title&gt;Hypothyreoidism&lt;/title&gt;
  &lt;desc&gt;What is hypothyreoidism&lt;/desc&gt;
  &lt;narr&gt;description of what type of disease hypothyreoidism is&lt;/narr&gt;
  &lt;profile&gt;A forty year old woman, who seeks information about her condition&lt;/profile&gt;
&lt;/query&gt;</preformat>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Baseline</title>
      <p>
        To choose our baseline strategy, we created separate indexes from the different
datasets ("All Text", "Text Clean", "Sent 1", "Sent 2" and "Sent 3") using
Indri [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. We filtered out stopwords during indexing and in the queries. We ran
Indri's query likelihood model, using the title only as the query, to retrieve documents from
each index; the configuration with the best performance is our baseline. For instance,
the query for the example in Section 2.4 is "#combine(Hypothyreoidism)".
      </p>
      <p>The evaluation focused on P@5, P@10, NDCG@5, and NDCG@10. These
results, along with MAP, are shown in Table 1. We also include the baselines and
the best performing runs from 2013. Bolded scores are the best for that measure
in the table.</p>
      <p>
        Title All Text is the retrieval strategy described above, which uses the title as the query and All Text
as the index. BM25 and BM25 FB (with pseudo relevance
feedback) are the official baselines from 2013. The two official baselines use only the
title as the query (the same strategy as Title All Text). Mayo2 and Mayo3 are
the best 2 runs from 2013, by Zhu et al. at Mayo Clinic [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Our Title All Text
is better than BM25 on all measures; it may have benefited from using
Lynx to produce the text output. It even outperforms Mayo2 and Mayo3 in terms
of NDCG@5 and NDCG@10 (but not in P@5, P@10 or MAP). However, using
the title only to retrieve from the Text Clean and Sent 1/2/3 indexes did not improve
performance. For Sent 1/2/3 in particular, performance on all
measures dropped significantly.
      </p>
      <p>Therefore, we use Title All Text as the baseline for the later experiments.
We drop Text Clean and the three sentence-level datasets since they do not
improve retrieval performance.</p>
    </sec>
    <sec id="sec-4">
      <title>Optimize Pseudo Relevance Feedback</title>
      <p>
        Pseudo relevance feedback is a popular and successful method for
expanding queries. As Table 1 shows, the official baseline BM25 FB outperforms
BM25 on almost all of the measures. We tried to improve on our baseline results
with Title All Text by optimizing the parameters of pseudo relevance
feedback (Lavrenko's relevance models [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) using Indri. There are 3 parameters that
need to be set. The first is the weight of the original query (Weight); the weight
for the expanded query is 1-Weight. The second is the number of documents used for pseudo
relevance feedback. The third is the number of terms selected for the feedback query.
      </p>
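      <p>The three parameters correspond to Indri's feedback options fbOrigWeight, fbDocs and fbTerms. As a minimal sketch, the helper below assembles an IndriRunQuery command line for one parameter setting. The helper name, the index path and the values in the usage note are illustrative assumptions, not the exact commands used in our experiments.</p>
      <preformat>
```python
def indri_prf_args(index_path, query, weight, fb_docs, fb_terms):
    """Return an IndriRunQuery argument list with pseudo relevance feedback enabled."""
    return [
        "IndriRunQuery",
        f"-index={index_path}",
        f"-query={query}",
        # Weight of the original query; the expanded query gets 1 - weight.
        f"-fbOrigWeight={weight}",
        # Number of top-ranked documents used for feedback.
        f"-fbDocs={fb_docs}",
        # Number of expansion terms selected for the feedback query.
        f"-fbTerms={fb_terms}",
    ]
```
      </preformat>
      <p>For example, indri_prf_args("/path/to/index", "#combine(hypothyreoidism)", 0.5, 5, 20) yields a list that could be passed to subprocess.run; sweeping one argument while holding the other two fixed reproduces the style of tuning described in the following subsections.</p>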
      <p>One important note: in the later experiments, if a retrieved
document ranked in the top 10 is not in the provided 2013 test qrels (the 2013 test topics
are our training topics), we judged it ourselves and added it to the 2013
test qrels. When judging documents, we always tried to follow how similar
documents were labeled in the official qrels. (In fact, many documents are almost
identical, but only some of them were labeled because of pooling.) By the end of
our experiments, we had added a total of 310 documents to the qrels (80 relevant and
230 non-relevant documents). Adding to the qrels may make the later
comparison against the 2013 official submitted runs and 2013 baselines unfair.
However, it would be impossible to improve our retrieval strategies without
labeling the unjudged top 10 retrieved documents.</p>
      <sec id="sec-4-1">
        <title>Weight of Original Query</title>
        <p>We experimented with Weight from 0.1 to 0.9. We set the initial values of #
terms and # docs to 20 and 5 respectively. The results are shown in Table 2.</p>
        <p>Weights between 0.6 and 0.9 seem strong across the measures. We favor 0.6
and 0.7 because they emphasize precision at high ranks.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Number of Documents</title>
        <p>We explored different values for the number of documents, from 5 to 50. We tried
both 0.6 and 0.7 for Weight, the best values from the previous experiment.
Again, the initial value for the number of terms is set to 20. Table 3 shows the results
for Weight=0.6, as it performs better than 0.7 in this experiment.</p>
        <p>The optimal value for the number of documents is 10 (for both Weight = 0.6 and
0.7).</p>
      </sec>
      <sec id="sec-4-3">
        <title>Number of Terms</title>
        <p>Next we explored values for the number of terms, from 5 to 50. We set Weight and
# Docs to 0.6 and 10 respectively, based on the previous experiments. Table 4
shows the results. We also show the baseline results (without the benefit of pseudo
relevance feedback).</p>
        <p>Both 40 and 45 are good values for # Terms. We choose 45 for the later
experiments since we would like to focus more on top 5 performance. (In the later official
evaluation, top 10 was used in the primary measures.) Our final parameters
for pseudo relevance feedback (Weight, number of Docs, number of Terms) are
0.6, 10, and 45 respectively.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Expanding the Query Using Description &amp; Narrative</title>
      <p>As the topic example in Section 2.4 shows, the title contains only the
minimum information for the topic. To better describe the information
needs of the user, we could expand the query using the description or narrative field
of the topic.</p>
      <p>We explored linear combinations of title and description, and of title and narrative,
to improve retrieval performance. Specifically, we weight the title by WeightT and
the description or narrative by 1-WeightT. (We also filtered out stopwords
from the description and narrative fields.)</p>
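      <p>One way to express such a linear combination in the Indri query language is with the #weight operator. The sketch below is an assumption about the query form, since the exact query used for this experiment is not shown here; the function name is hypothetical, and the term lists are assumed to have had stopwords removed already.</p>
      <preformat>
```python
def weighted_title_query(title_terms, extra_terms, weight_t):
    """Combine title terms and description/narrative terms with weights WeightT and 1-WeightT."""
    # Round to avoid floating-point noise such as 0.09999999999999998.
    w_extra = round(1.0 - weight_t, 2)
    return (
        f"#weight( {weight_t} #combine({' '.join(title_terms)}) "
        f"{w_extra} #combine({' '.join(extra_terms)}) )"
    )
```
      </preformat>
      <p>For the topic in Section 2.4 with WeightT=0.9, and assuming the narrative reduces to the terms description, type, disease and hypothyreoidism after stopword removal, this gives "#weight( 0.9 #combine(hypothyreoidism) 0.1 #combine(description type disease hypothyreoidism) )".</p>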
      <p>
        The results for the linear combinations of title and description, and of title and narrative,
are shown in Table 5 and Table 6 respectively. In both tables,
performance increases as WeightT increases. But even
with WeightT=0.9, it is still not as good as the baseline. Therefore, using the
description or narrative fields did not improve retrieval performance over the baseline. These
fields may require more sophisticated methods to extract keywords and combine
them with the title.
      </p>
      <p>
        Inspired by Zhu et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], we also explored the Markov Random Field (MRF) model [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Zhu et al. used the parameter settings described in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. For example,
if the topic title is "Coronary artery disease", the expanded Indri query using
the MRF model is:
      </p>
      <preformat>#weight( 0.8 #combine(coronary artery disease)
         0.1 #combine( #1(coronary artery) #1(artery disease) )
         0.1 #combine( #uw8(coronary artery) #uw8(artery disease) ) )</preformat>
      <p>In this section, we describe how we modified the MRF model and explored its
parameters. To distinguish the original MRF from our modified version,
we call the original MRF Bigram, since it expands the query using bigrams
in the query, and we call our modified version MRF MedPhrase.</p>
      <sec id="sec-5-1">
        <title>MRF Bigram</title>
        <p>There are 3 parameters for the MRF Bigram model: the weight of the title (WeightT;
the weights for the #1 part and the uw8 part are both equal to (1-WeightT)/2), the Window
Type (uw or od, meaning an unordered or ordered window over the terms), and the
Window Size (e.g., uw8 means an unordered window of size 8 in Indri). We began with
the experiment on WeightT. The initial values for Window Type &amp; Size were
set to uw and 8 respectively. The results are shown in Table 7.</p>
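        <p>The query construction for MRF Bigram can be sketched as follows. This is an illustrative helper with a hypothetical name; it assumes the title has already been lowercased, tokenized and stripped of stopwords, and it follows the weighting described above, with each dependence part receiving (1-WeightT)/2.</p>
        <preformat>
```python
def mrf_bigram_query(terms, weight_t=0.8, window="uw", size=8):
    """Build an Indri query with unigram, exact-bigram and windowed-bigram parts."""
    unigrams = " ".join(terms)
    bigrams = [f"{a} {b}" for a, b in zip(terms, terms[1:])]
    if not bigrams:
        # A single-term title has no bigrams, so fall back to the plain query.
        return f"#combine({unigrams})"
    # The two dependence parts share the remaining weight equally.
    w_dep = round((1.0 - weight_t) / 2, 4)
    exact = " ".join(f"#1({b})" for b in bigrams)
    windowed = " ".join(f"#{window}{size}({b})" for b in bigrams)
    return (
        f"#weight( {weight_t} #combine({unigrams}) "
        f"{w_dep} #combine( {exact} ) "
        f"{w_dep} #combine( {windowed} ) )"
    )
```
        </preformat>
        <p>With the default arguments, mrf_bigram_query(["coronary", "artery", "disease"]) reproduces the "Coronary artery disease" query shown earlier; passing window="od" or a different size gives the other Window Type &amp; Size settings.</p>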
        <p>The MRF Bigram model does improve retrieval performance compared to our
baseline (Title All Text). The optimal value for WeightT is 0.8 or 0.9. We
chose 0.8 since we focused more on top 5 performance. (Again, the official
evaluation later focused on the top 10.)</p>
        <p>Next, we examined whether changing the Window Type &amp; Size would affect
retrieval performance. Results exploring Window Type &amp; Size are shown in
Table 8.</p>
        <p>Therefore, WeightT 0.8 and uw5 are our optimal parameters for the MRF Bigram
model.</p>
        <p>MRF Bigram does improve retrieval performance, but using bigrams does
not always make sense. For example, the topic "facial cuts and scar tissue"
should ideally be interpreted as the phrases "facial cuts" and "scar tissue". The bigram "cuts
scar" (ignoring stopwords) does not make sense. Therefore, we modified the
original MRF model to use only medical phrases to expand the query. Using the
coronary artery disease example above, the MRF MedPhrase model should generate a query
like:</p>
        <preformat>#weight( 0.8 #combine(coronary artery disease)
         0.1 #combine( #1(coronary artery disease) )
         0.1 #combine( #uw5(coronary artery disease) ) )</preformat>
        <p>because "coronary artery disease" is a medical phrase. For another topic
example, "shortness breath swelling", the MRF MedPhrase model
should generate a query like:</p>
        <preformat>#weight( 0.8 #combine(shortness breath swelling)
         0.1 #combine( #1(shortness breath) swelling )
         0.1 #combine( #uw5(shortness breath) swelling ) )</preformat>
        <p>
          To identify the medical phrases, we use MetaMap [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] to parse the title of the topic.
As with MRF Bigram, we found that the optimal value for WeightT
is 0.8 and that the Window Type &amp; Size should be set to uw5.
        </p>
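        <p>The MRF MedPhrase construction can be sketched in the same style. The helper below is hypothetical and does not call MetaMap: it assumes the title has already been segmented into units, where a multi-word unit is a detected medical phrase and a single word is a term outside any phrase.</p>
        <preformat>
```python
def mrf_medphrase_query(units, weight_t=0.8, window="uw", size=5):
    """Build an Indri query in which only multi-word units become phrase operators."""
    def dependence_part(operator):
        # Multi-word units are wrapped in the operator; single words pass through.
        return " ".join(f"#{operator}({u})" if " " in u else u for u in units)

    # The two dependence parts share the remaining weight equally.
    w_dep = round((1.0 - weight_t) / 2, 4)
    unigrams = " ".join(units)
    window_op = f"{window}{size}"
    return (
        f"#weight( {weight_t} #combine({unigrams}) "
        f"{w_dep} #combine( {dependence_part('1')} ) "
        f"{w_dep} #combine( {dependence_part(window_op)} ) )"
    )
```
        </preformat>
        <p>With the default arguments, mrf_medphrase_query(["shortness breath", "swelling"]) yields the "shortness breath swelling" query given above, and mrf_medphrase_query(["coronary artery disease"]) yields the coronary artery disease query.</p>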
        <p>To extract medical phrases correctly, we also
enabled spell checking (SC) for the MRF models. Table 9 shows the comparison between
MRF Bigram and MRF MedPhrase. In the comparison, we also combined MRF with
pseudo relevance feedback (RF).</p>
        <p>Supporting our intuition, MRF MedPhrase model outperforms MRF Bigram
for all the measures.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Expand Medical Abbreviation</title>
      <p>Our best run, using MRF MedPhrase with spell checking and pseudo relevance
feedback, is significantly better than the best runs from 2013. One
more important step remains. The medical topics contain several abbreviations,
and expanding them would be very helpful. However, expanding
medical abbreviations is also not trivial. We tried several medical abbreviation
lists and found that the one from Wikipedia<sup>5</sup> may be the most appropriate for
our task. Even so, some abbreviations are still missed. In the 2014 test
data, we found that "L" could mean "left", which our method cannot expand.</p>
      <p>The result is shown in Table 10. Using medical abbreviation expansion does
help achieve higher performance.</p>
      <p>So far, we have run several experiments, including cleaning the web text, sentence-level
retrieval, pseudo relevance feedback, linear combination of title and
description/narrative, the MRF model, spell checking and abbreviation expansion. The
comparison between our best strategy and our baseline is shown in Table 11. Our
best strategy improved the measures on the top 5 retrieved results by about 15% over the baseline.
It also improved the measures on the top 10 retrieved results by about 8-9%.</p>
      <table-wrap id="tab11">
        <label>Table 11</label>
        <table>
          <thead>
            <tr><th>Run</th><th>P@5</th><th>P@10</th><th>NDCG@5</th><th>NDCG@10</th><th>MAP</th></tr>
          </thead>
          <tbody>
            <tr><td>MRF MedPhrase RF SC Abbr</td><td>0.5520 (+14.05%)</td><td>0.5120 (+7.56%)</td><td>0.5498 (+15.41%)</td><td>0.5257 (+9.27%)</td><td>0.2625 (+11.7%)</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-7">
      <title>Submitted Runs And Results</title>
      <p>Because the discharge summaries are very noisy, we did not develop retrieval
strategies that use them. We submitted 4 runs in our final submission. (The baseline is
Run 1; the experiments without discharge summaries are Runs 5-7, where 5 has
the highest priority and 7 the lowest.) Table 12 shows our runs and the
techniques used.</p>
      <p><sup>5</sup> http://en.wikipedia.org/wiki/List_of_medical_abbreviations:_A</p>
      <p>Run 1 is our baseline, which uses only the title to retrieve medical documents.
Run 5 is our best run: it uses the Markov Random Field (MRF) model, which
expands queries using only medical phrases, together with abbreviation expansion,
pseudo relevance feedback and spell checking. Run 6 is the same as Run 5, but
without pseudo relevance feedback. Run 7 is the same as Run 5, but without the
MRF model.</p>
      <p>Table 13 shows the final performance from the official evaluation.
Unfortunately, the runs do not differ significantly from each other. Our Run 7 has better
scores for P@5 and NDCG@5, which were our original focus. This suggests that pseudo
relevance feedback can achieve high-accuracy retrieval, especially
for the top 5 results. (In the final judgement, the Run 7 submission was not in the
judged pool, so the real performance of Run 7 could be even higher.)
But our baseline (Run 1) has better performance for P@10 and NDCG@10, which
are the primary official measures. The MRF model we trained on the 2013 test
data does not improve retrieval performance on the 2014 test dataset. The reason
could be that we overfitted the model, though we attempted to avoid that pitfall.</p>
      <p>Figure 3 shows our Run 1 (since it has the best top 10 performance among our
runs) against the median and best performance (P@10) across all systems
submitted to CLEF for each query topic. Topics 8, 13, 15, 28, 34, 44, and 50 are
easily handled by Run 1, but topics 7, 11, 22, 32, 38, 40, and 47 are difficult for it.</p>
    </sec>
    <sec id="sec-8">
      <title>Conclusion</title>
      <p>We explored cleaning of the dataset and sentence-level retrieval, and showed
that neither method improved retrieval performance. We
also tried linear combinations of title and description/narrative; combining them effectively appears to be
a non-trivial task. We ran experiments to find the optimal parameters for
pseudo relevance feedback and showed that it can achieve higher performance for the
top 5 retrieved items. We modified the Markov Random Field model to use
medical phrases to expand the query. This method achieves
higher performance on the 2013 queries but fails on the 2014 test dataset.
Planned future work includes a more sophisticated method to combine the title
and description/narrative/discharge summary, and avoiding overfitting of
the MRF model.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Aronson</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.-M.</given-names>
            <surname>Lang</surname>
          </string-name>
          .
          <article-title>An overview of MetaMap: historical perspective and recent advances</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          ,
          <volume>17</volume>
          (
          <issue>3</issue>
          ):
          <fpage>229</fpage>
          –
          <lpage>236</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pecina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          , G. Jones, and
          <string-name>
            <given-names>H.</given-names>
            <surname>Mueller</surname>
          </string-name>
          .
          <article-title>ShARe/CLEF eHealth Evaluation Lab 2014, Task 3: User-centred health information retrieval</article-title>
          .
          <source>In Proceedings of CLEF 2014</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Suominen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schreck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Leroy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>Mowery</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Velupillai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. W.</given-names>
            <surname>Chapman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Martinez</surname>
          </string-name>
          , G. Zuccon, and
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          .
          <article-title>Overview of the ShARe/CLEF eHealth Evaluation Lab 2014</article-title>
          .
          <source>In Proceedings of CLEF 2014, Lecture Notes in Computer Science (LNCS)</source>
          . Springer,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>V.</given-names>
            <surname>Lavrenko</surname>
          </string-name>
          and
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>Relevance based language models</article-title>
          .
          <source>In Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>120</fpage>
          –
          <lpage>127</lpage>
          . ACM,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>D.</given-names>
            <surname>Metzler</surname>
          </string-name>
          and
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>A Markov random field model for term dependencies</article-title>
          .
          <source>In Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>472</fpage>
          –
          <lpage>479</lpage>
          . ACM,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>R.</given-names>
            <surname>Saetre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yoshida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yakushiji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Miyao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Matsubayashi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Ohta</surname>
          </string-name>
          .
          <article-title>Akane system: protein-protein interaction pairs in biocreative2 challenge, ppi-ips subtask</article-title>
          .
          <source>In Proceedings of the Second BioCreative Challenge Workshop</source>
          , pages
          <fpage>209</fpage>
          –
          <lpage>212</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>T.</given-names>
            <surname>Strohman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Metzler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Turtle</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>Indri: A language model-based search engine for complex queries</article-title>
          .
          <source>In Proceedings of the International Conference on Intelligent Analysis</source>
          , volume
          <volume>2</volume>
          , pages
          <fpage>2</fpage>
          –
          <lpage>6</lpage>
          . Citeseer
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>James</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Carterette</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <article-title>Using discharge summaries to improve information retrieval in clinical domain</article-title>
          .
          <source>Proceedings of the ShARe/CLEF eHealth Evaluation Lab</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>