<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>in Consumer Health Information Search: UEVORA @ 2016 FIRE CHIS</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hua Yang</string-name>
          <email>huayangchn@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Teresa Gonçalves</string-name>
          <email>tcg@uevora.pt</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer science department, University of Évora</institution>
          ,
          <addr-line>Évora</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Computer science department, University of Évora</institution>
          ,
          <addr-line>Évora</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents our work at 2016 FIRE CHIS. Given a CHIS query and a document associated with that query, the task is to classify the sentences in the document as relevant to the query or not, and to further classify the relevant sentences as supporting, neutral or opposing with respect to the claim made in the query. We present two different approaches to this classification. In the first approach, we implement two models: an information retrieval model that retrieves the sentences relevant to the query, and a classification model, trained with a supervised learning method, that classifies the relevant sentences as support, oppose or neutral. In the second approach, we use only machine learning techniques to learn a single model that classifies the sentences into four classes (relevant &amp; support, relevant &amp; neutral, relevant &amp; oppose, irrelevant &amp; neutral). Our submission for CHIS uses the first approach.</p>
        <p>CCS Concepts: • Information systems ➝ Data management system engines</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Health information search; machine learning; IR</title>
      <sec id="sec-1-1">
        <title>1. INTRODUCTION</title>
        <p>
          Online search engines have become a common way of obtaining
health information; a Pew Internet &amp; American Life Project report shows that about 69% of
U.S. adults have used the Internet as a tool for
health information such as weight, diet and symptoms [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
Meanwhile, research interest in health information retrieval
(HIR) has also grown in recent years. Health
information is of interest to a variety of users, from physicians to
specialists, from practitioners to nurses, from patients to patients'
families, and from biomedical researchers to consumers (the general
public). Health information is also available from diverse
sources, such as electronic health records, personal health records,
the general web, social media, journal articles, and wearable devices
and sensors [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>While factual health information search has matured considerably,
complex health information search, where a query has more than one
correct answer, remains elusive. Consumer Health
Information Search (CHIS) at FIRE 2016 was proposed to
investigate complex health information search by laypeople. In
this scenario, laypeople search for health information with
multiple perspectives from diverse sources, both from medical
research and from real-world patient narratives.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>There are two sets of tasks:</title>
      <p>A) Given a CHIS query, and a document/set of documents
associated with that query, the task is to classify the
sentences in the document as relevant to the query or not.
The relevant sentences are those from that document, which
are useful in providing the answer to the query.</p>
      <p>B) These relevant sentences need to be further classified as
supporting the claim made in the query, or opposing the
claim made in the query.</p>
      <p>The five queries proposed in the task are shown in Figure 1.
Figure 2 gives an example of the output of the system. An annotated
data set is provided to participants.</p>
      <p>This paper is divided into four sections. In the first section, we
briefly introduce the background and the 2016 FIRE CHIS task.
In the second section, we describe the methods we use: two
different approaches are tried, and
each is discussed in turn. Experiments and results are
presented in the third section. Finally, we draw conclusions.</p>
    </sec>
    <sec id="sec-3">
      <title>Figure 1. The five queries proposed in the task</title>
      <p>Q1: Does sun exposure cause skin cancer?</p>
      <p>Q2: Are e-cigarettes safer than normal cigarettes?</p>
      <p>Q3: Can Hormone Replacement Therapy (HRT) cause cancer?</p>
      <p>Q4: Can MMR Vaccine lead to children developing autism?</p>
      <p>Q5: Should I take vitamin C for common cold?</p>
      <p>Figure 2. Example of the system output</p>
      <p>S2: David Peyton, a chemistry professor at Portland
State University who helped conduct the research,
says that the type of formaldehyde generated by
e-cigarettes could increase the likelihood it would get
deposited in the lung, leading to lung cancer.</p>
      <p>A) Relevant, B) Oppose</p>
      <p>S3: Harvey Simon, MD, Harvard Health Editor,
expressed concern that the nicotine amounts in
e-cigarettes can vary significantly.</p>
      <p>A) Irrelevant, B) Neutral</p>
      <p>Figure 3 depicts our model for task A. First, we input the original task
queries and the provided sentences into the IR model. The relevant
sentences are retrieved and ranked according to the weighting
method. The top-ranked relevant sentences (in our experiments, the top 3)
are used as the source for expanding the original
queries. The expanded queries are then used as input, and the IR model is
run again to retrieve sentences with them.</p>
      <p>The relevant sentences are used as the input of a classification model.
We regard all the sentences retrieved by our IR model as
relevant to the query, and we use them as the input of task B.</p>
      <sec id="sec-3-1">
        <title>2. METHODS</title>
        <p>We propose two different approaches to accomplish the task. For
ease of explanation, we name them program A and
program B. In program A, two different models are built
using state-of-the-art information retrieval and machine
learning techniques. In program B, we treat the task as a whole
and use only machine learning techniques; a single
classification model is trained. We discuss each
approach in detail below.</p>
      </sec>
      <sec id="sec-3-2">
        <title>2.1 Program A</title>
        <p>Since the task is divided into two sub-tasks, we implement two
different models, each handling one sub-task. For task A, we implement an information retrieval (IR)
model to retrieve relevant sentences. The retrieved sentences are
regarded as relevant to the query, and the non-retrieved ones as
irrelevant. For task B, we use a supervised learning algorithm to
train a classification model. The sentences retrieved in the first
step are then classified as supporting, opposing or neutral with respect to the claim
made in the query.</p>
        <sec id="sec-3-2-1">
          <title>2.1.1 An IR model for Task A</title>
          <p>In task A, the sentences provided by the organizer must be
classified as relevant to the queries or not. We implement an IR
model to do this classification. Retrieved sentences are regarded
as relevant to the query and non-retrieved ones as irrelevant.</p>
          <p>Terrier is used to implement a baseline IR model. All queries and
sentences are pre-processed: stop-words are removed, and stemming
and normalization are applied. The TF*IDF weighting model is used
to compute sentence scores with respect to the query.
The queries can be processed one by one or in batch. We use
pseudo relevance feedback to expand the original queries.
All Terrier parameters are left at their default values.</p>
          <p>
            Pseudo relevance feedback (also known as blind relevance feedback) is a
way to improve retrieval performance without user interaction
[
            <xref ref-type="bibr" rid="ref1">1</xref>
            ]. Previous work has shown its effectiveness in improving retrieval
performance [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ] [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ]. Figure 4 depicts how this technique can be
used in an IR model.
          </p>
          <p>
            This technique is used in our experiments to expand the original
query. The most informative terms are extracted from the
top-returned documents and added as expansion terms, as shown in
Figure 4. We use Bo1 [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] as the expansion term weighting model.
The Bo1 model is based on Bose-Einstein statistics and weights
terms by their occurrences in the top retrieved documents. In our experiments, 10
expansion terms are extracted from the top 3 retrieved documents.
No other query expansion techniques are used in our experiments.
          </p>
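          <p>The expansion step described above can be sketched in Python as follows. This is a minimal illustration and not the Terrier implementation used in the runs: the whitespace tokenizer and all function names are hypothetical, and the weight w(t) = tf_x * log2((1 + P_n) / P_n) + log2(1 + P_n), with P_n = F / N (F being the collection frequency of the term and N the number of documents), is an assumed form of the Bo1 formula that should be checked against the Terrier documentation.</p>
          <preformat><![CDATA[
```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real pre-processing)."""
    return text.lower().split()


def bo1_expansion_terms(docs, top_doc_ids, n_terms=10):
    """Rank candidate expansion terms with a Bo1-style weight (assumed formula).

    docs: list of sentence strings forming the whole collection.
    top_doc_ids: indices of the top-ranked sentences from the first pass.
    """
    n_docs = len(docs)
    # F: frequency of each term in the whole collection.
    collection_tf = Counter()
    for doc in docs:
        collection_tf.update(tokenize(doc))
    # tf_x: frequency of each term in the pseudo-relevant (top-ranked) set.
    feedback_tf = Counter()
    for i in top_doc_ids:
        feedback_tf.update(tokenize(docs[i]))
    weights = {}
    for term, tf_x in feedback_tf.items():
        p_n = collection_tf[term] / n_docs
        weights[term] = tf_x * math.log2((1 + p_n) / p_n) + math.log2(1 + p_n)
    # Highest weight first; ties are broken alphabetically for determinism.
    ranked = sorted(weights, key=lambda t: (-weights[t], t))
    return ranked[:n_terms]


def expand_query(query, docs, top_doc_ids, n_terms=10):
    """Append the selected expansion terms that are not already in the query."""
    terms = tokenize(query)
    for term in bo1_expansion_terms(docs, top_doc_ids, n_terms):
        if term not in terms:
            terms.append(term)
    return terms
```
]]></preformat>
          <p>With the settings reported above, such a function would be called with the indices of the top 3 sentences and n_terms=10, and the expanded term list would be submitted to the IR model for the second retrieval pass.</p>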
          <p>For task B, the annotated dataset provided by the organizer is first
pre-processed. The TF*IDF scheme is then used to extract features
from the text. These features are the input of the
learning system that trains a classification model. This model
further classifies the relevant sentences retrieved by the IR
model as supporting, opposing or neutral with respect to the claim stated in the
query.</p>
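          <p>The feature extraction and training steps can be illustrated with a small, self-contained Python sketch. It is not the code used in the experiments: it assumes a whitespace tokenizer, a smoothed IDF, and a multinomial Naive Bayes learner with add-one smoothing applied to TF*IDF weights, and every function name below is hypothetical.</p>
          <preformat><![CDATA[
```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real pre-processing)."""
    return text.lower().split()


def fit_idf(sentences):
    """Smoothed inverse document frequency for every training term."""
    n = len(sentences)
    df = Counter()
    for s in sentences:
        df.update(set(tokenize(s)))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf(sentence, idf):
    """TF*IDF weights for one sentence; terms unseen in training are dropped."""
    tf = Counter(t for t in tokenize(sentence) if t in idf)
    return {t: count * idf[t] for t, count in tf.items()}


def train_nb(sentences, labels, idf):
    """Multinomial Naive Bayes over TF*IDF weights with add-one smoothing."""
    class_docs = Counter(labels)
    term_mass = defaultdict(Counter)
    for s, y in zip(sentences, labels):
        term_mass[y].update(tfidf(s, idf))  # accumulate weights per class
    vocab_size = len(idf)
    model = {}
    for y in class_docs:
        total = sum(term_mass[y].values())
        log_prior = math.log(class_docs[y] / len(labels))
        log_like = {
            t: math.log((term_mass[y][t] + 1.0) / (total + vocab_size))
            for t in idf
        }
        model[y] = (log_prior, log_like)
    return model


def predict(sentence, model, idf):
    """Return the class with the highest posterior log score."""
    feats = tfidf(sentence, idf)

    def score(y):
        log_prior, log_like = model[y]
        return log_prior + sum(w * log_like[t] for t, w in feats.items())

    return max(sorted(model), key=score)
```
]]></preformat>
          <p>A model trained this way on the annotated sentences would then be applied only to the sentences returned by the IR model.</p>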
          <p>The TextBlob<sup>3</sup> tool is used for text processing. Naïve Bayes and
decision tree classifiers are used as learning methods. Only
TF*IDF features are extracted; no other features are used in
our experiments.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>2.1.3 Integration</title>
          <p>The sentences retrieved by the IR model are regarded as relevant
to the query and are further labeled as ‘neutral’, ‘support’ or
‘oppose’ by the classification model. The
non-retrieved sentences are regarded as irrelevant
to the query, and we assign the ‘neutral’ label to all of them.</p>
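          <p>The integration rule can be written down in a few lines of Python. The sketch below is only illustrative: the function name, the sentence identifiers, and the stance_of callback (standing in for the trained classification model) are hypothetical.</p>
          <preformat><![CDATA[
```python
def integrate(sentence_ids, retrieved_ids, stance_of):
    """Combine the task A and task B outputs into one record per sentence.

    sentence_ids: all sentence identifiers for one query.
    retrieved_ids: identifiers returned by the IR model (treated as relevant).
    stance_of: callable mapping a sentence id to 'support', 'oppose' or 'neutral'.
    """
    retrieved = set(retrieved_ids)
    results = {}
    for sid in sentence_ids:
        if sid in retrieved:
            # Retrieved sentences are relevant; the classifier decides the stance.
            results[sid] = ("relevant", stance_of(sid))
        else:
            # Non-retrieved sentences are irrelevant and always labeled neutral.
            results[sid] = ("irrelevant", "neutral")
    return results
```
]]></preformat>
          <p>Note that the classifier is never consulted for non-retrieved sentences, so any retrieval error propagates directly to the final labels.</p>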
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>2.2 Program B</title>
        <p>As another approach to the problem and to providing
multiple perspectives for users, we treat the task as a whole and
reorganize the annotated data with four different labels:
- irrelevant &amp; neutral
- relevant &amp; support
- relevant &amp; oppose
- relevant &amp; neutral</p>
        <p>Using the annotated data with the labels above, we train a
classification model, which is then used to classify the test
sentences into those four classes. The approach is the same as the
one described in sub-section 2.1.3, but here we use all the
sentences and have four classes instead of three, as
Figure 6 shows. The output is a sentence with one of the four
labels listed above. For example:</p>
        <p>Sentence: Harvey Simon, MD, Harvard Health Editor, expressed
concern that the nicotine amounts in e-cigarettes can vary
significantly.</p>
        <p>Output: Irrelevant &amp; Neutral</p>
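        <p>Reorganizing the two annotations of a sentence into one of the four labels can be sketched as below. The function name and the input strings are hypothetical, and the sketch assumes that every irrelevant sentence is mapped to the single 'irrelevant &amp; neutral' class regardless of its stance annotation.</p>
        <preformat><![CDATA[
```python
def four_class_label(relevance, stance):
    """Merge a relevance annotation and a stance annotation into one label."""
    relevance = relevance.strip().lower()
    stance = stance.strip().lower()
    if relevance == "irrelevant":
        # Assumption: irrelevant sentences only ever carry the neutral stance.
        return "irrelevant & neutral"
    if relevance != "relevant" or stance not in {"support", "oppose", "neutral"}:
        raise ValueError(f"unexpected annotation: {relevance!r}, {stance!r}")
    return f"relevant & {stance}"
```
]]></preformat>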
        <p>All the provided sentences are pre-processed and used to
train a classification model with supervised machine learning
techniques. We extract features with the TF*IDF scheme. Test data
must also be pre-processed before classification.</p>
        <p>2 Image from
http://www.slideshare.net/LironZighelnic/querydriftprevention-for-robust-query-expansion-presentation-43186077</p>
        <p>3 https://textblob.readthedocs.io/en/dev/</p>
      </sec>
      <sec id="sec-3-4">
        <title>3. EXPERIMENTS AND RESULTS</title>
        <p>In this section, we report the results of our experiments,
presented separately for each program proposed in the
previous section.</p>
      </sec>
      <sec id="sec-3-5">
        <title>3.1 Experiments of Program A</title>
        <sec id="sec-3-5-1">
          <title>3.1.1 Runs for task A</title>
          <p>The results for the different runs are shown in Table 1. The TrecEval<sup>4</sup>
program is used to evaluate the performance. We produce
different runs and compare their performance using the F1 score as the
evaluation measure.</p>
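          <p>For reference, the F1 score of one query can be computed from the set of retrieved sentences and the set of annotated relevant sentences as sketched below. This only illustrates the standard formula, F1 = 2PR / (P + R); it is not the TrecEval program, and the function and argument names are hypothetical.</p>
          <preformat><![CDATA[
```python
def precision_recall_f1(retrieved, relevant):
    """Precision, recall and F1 for one query, from two sets of sentence ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_pos = len(retrieved.intersection(relevant))
    precision = true_pos / len(retrieved) if retrieved else 0.0
    recall = true_pos / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```
]]></preformat>
          <p>Averaging the per-query F1 values over the five queries gives a macro-averaged score for a run.</p>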
          <p>-taskA.run1: process all the queries in batch without pseudo
relevance feedback</p>
          <p>-taskA.run2: process all the queries in batch with pseudo
relevance feedback</p>
          <p>-taskA.run3: process the queries individually without pseudo
relevance feedback</p>
          <p>-taskA.run4: process the queries individually with pseudo
relevance feedback</p>
          <p>We obtained our best results with run4, with an average F1 score of 0.73.
The results show that our IR model works well on query3,
query4 and query5.</p>
          <p>Regarding the processing mode, processing the
queries one by one is much better than processing all the queries in batch.
As a query expansion technique, pseudo relevance feedback (PRF) clearly improves
recall, which means that more relevant
documents are returned. The technique also behaves differently
depending on the processing mode. If all the queries are processed
in batch, using PRF decreases the F1 score
compared with the results without PRF. If the queries are
processed one by one, PRF increases the overall performance, although
some queries show a slightly lower score than without PRF.
We can also see that for query1 and query2, the score
improves sharply when using PRF. Considering the task and our
system, we adopt PRF as a way to improve the system
performance.</p>
        </sec>
        <sec id="sec-3-5-2">
          <title>3.1.2 Run for task B</title>
          <p>For task B, we use the traditional TF*IDF scheme to extract
features, and Naïve Bayes is used as the learning method. Table 2
presents our experimental results for this part.</p>
          <p>From the results, we can see that the average score for this
classification is 0.28, which is very low.</p>
          <p>The classification is based on the results of the IR model. Some
sentences that are actually irrelevant to the query may be retrieved
and therefore regarded as relevant; they are then passed to the
classification model. This
affects the performance of the system.</p>
          <p>4 http://trec.nist.gov/trec_eval/</p>
        </sec>
      </sec>
      <sec id="sec-3-6">
        <title>3.2 Experiments of Program B</title>
        <p>The average score for this model is 0.64. We get the highest score for
query 3 and the lowest one for query 5.</p>
        <p>In this paper, we present two different approaches to
the 2016 FIRE CHIS task.</p>
        <p>With the first approach, we implement both an IR model and a
classification model. The results show that our IR model generally works
well, except on query2. The classification model shows
low performance on all queries.</p>
        <p>With the second approach, we take the task as a whole and use
only machine learning techniques to do the classification.
Because the two approaches produce output in different forms,
we do not compare their
performance. The second approach presented
in our paper is simply another possible way to solve the problem
proposed by the organizer. Program A is used as the final
submission to the challenge.</p>
      </sec>
      <sec id="sec-3-7">
        <title>5. REFERENCES</title>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Christopher D</given-names>
            <surname>Manning</surname>
          </string-name>
          and
          <string-name>
            <given-names>Hinrich</given-names>
            <surname>Schütze</surname>
          </string-name>
          .
          <article-title>Foundations of statistical natural language processing</article-title>
          . MIT Press,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Yang</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yun</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Qinmin</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Liang</given-names>
            <surname>He</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E Mark</given-names>
            <surname>Haacke</surname>
          </string-name>
          .
          <article-title>ECNU at 2015 eHealth task 2: User-centred health information retrieval</article-title>
          .
          <source>Proceedings of the ShARe/CLEF eHealth Evaluation Lab</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Ellen M</given-names>
            <surname>Voorhees</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Donna K</given-names>
            <surname>Harman</surname>
          </string-name>
          , et al.
          <article-title>TREC: Experiment and evaluation in information retrieval</article-title>
          , volume
          <volume>1</volume>
          . MIT Press, Cambridge,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Susannah</given-names>
            <surname>Fox</surname>
          </string-name>
          and
          <string-name>
            <given-names>Maeve</given-names>
            <surname>Duggan</surname>
          </string-name>
          .
          <article-title>Tracking for health</article-title>
          .
          <source>Pew Research Center's Internet &amp; American Life Project</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Lorraine</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Gareth JF</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Liadh</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Henning</given-names>
            <surname>Müller</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Justin</given-names>
            <surname>Zobel</surname>
          </string-name>
          .
          <article-title>Medical information retrieval: introduction to the special issue</article-title>
          .
          <source>Information Retrieval Journal</source>
          ,
          <volume>19</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Giambattista</given-names>
            <surname>Amati</surname>
          </string-name>
          .
          <article-title>Probability models for information retrieval based on divergence from randomness</article-title>
          .
          <source>PhD thesis</source>
          , University of Glasgow,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>