<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Raksha Sanjay Jalan</string-name>
          <email>jalan.raksha@research.iiit.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pattisapu Nikhil Priyatam</string-name>
          <email>nikhil.pattisapu@research.iiit.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vasudeva Varma</string-name>
          <email>vv@iiit.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Search and Information Extraction Lab, IIIT Hyderabad</institution>
          ,
          <addr-line>Hyderabad</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <fpage>2</fpage>
      <lpage>5</lpage>
      <abstract>
        <p>The World Wide Web acts as one of the major sources of information for health related questions. However, there are often multiple conflicting answers to a single question, and it is hard to come up with "a single best correct answer". Therefore, it is highly desirable to identify conflicting perspectives about a particular question (or topic). In this paper, we describe our participation in the Consumer Health Information System (CHIS) task at FIRE 2016. There were two sub-tasks in this contest. The first sub-task deals with identifying whether a particular answer is relevant to a given question. The second sub-task deals with detecting whether a particular answer agrees with or refutes the claim posed in a given question. We pose both these tasks as supervised pair classification tasks. We report our results for various document representations and classification algorithms.</p>
      </abstract>
      <kwd-group>
        <kwd>Pair classification tasks</kwd>
        <kwd>document representations</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>Most of the research developments in the area of Question
Answering (QA), as fostered by TREC, have so far focused
on open-domain QA systems. Recently, however, the field
has witnessed a growing interest in restricted-domain QA.</p>
      <p>The health domain is one of the most information-critical
domains in need of intelligent Question Answering systems
that can effectively aid medical researchers and health care
professionals in their daily information search.</p>
      <p>The proposed CHIS task investigates complex health
information search in scenarios where users search for health
information with more than just a single correct answer, and
look for multiple perspectives from diverse sources both from
medical research and from real world patient narratives.</p>
      <p>Given a CHIS query and a document (or set of documents)
associated with that query, the task is to classify the sentences in
the document as relevant to the query or not. The relevant
sentences are those from the document which are useful in
answering the query. These relevant sentences
need to be further classified as supporting the claim made
in the query, or opposing the claim made in the query.</p>
      <p>We pose both these problems as pair classification tasks:
given a (question, answer) pair, the system has to
judge whether or not the answer is relevant to the query
and, if so, whether or not it supports the claim made in the
query. Consider the following example.
Question: Are e-cigarettes safer than normal cigarettes?
Sentence 1: Because some research has suggested that the
levels of most toxicants in vapor are lower than the levels in
smoke, e-cigarettes have been deemed to be safer than
regular cigarettes.</p>
      <p>Sentence 2: David Peyton, a chemistry professor at
Portland State University who helped conduct the research, says
that the type of formaldehyde generated by e-cigarettes could
increase the likelihood it would get deposited in the lung,
leading to lung cancer.</p>
      <p>Sentence 3: Harvey Simon, MD, Harvard Health Editor,
expressed concern that the nicotine amounts in e-cigarettes
can vary significantly.</p>
      <p>In the above example, Sentence 1 is relevant and supports
the claim made in the question, Sentence 2 is relevant but
refutes the claim made in the question, and Sentence 3 is
irrelevant to the question. For both tasks, we used K-fold
cross validation to evaluate our results.</p>
    </sec>
    <sec id="sec-2">
      <title>2. RELATED WORK</title>
      <p>Our proposed method solves the question answering task as
a classification task. A lot of research work has been done on
text categorization.</p>
      <p>
        Text representation is one of the key factors that affects
the performance of a classifier. The Paragraph Vector
algorithm by Le and Mikolov [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], also termed paragraph2vec, is
a powerful method to find suitable vector representations
for sentences, paragraphs and documents of variable length.
The algorithm tries to find embeddings for separate words
and paragraphs at the same time, through a procedure
similar to word2vec. De Boom, Van Canneyt et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
were the first to come up with a hybrid method for short text
representations that combines the strength of dense
distributed representations with the strength of tf-idf based
methods to automatically reduce the impact of less
informative terms. According to this paper, the combination of word
embeddings and tf-idf information leads to a better model
for semantic content within short text fragments.
Ruiz and Srinivasan [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] presented the
design and evaluation of a text categorization method based
on the Hierarchical Mixture of Experts model. This model
used a divide and conquer principle to define smaller
categorization problems based on a predefined hierarchical
structure. The final classifier was a hierarchical array of
neural networks. They showed that the use of the
hierarchical structure improves text categorization performance
with respect to an equivalent flat model.
      </p>
      <p>
        Dumais et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] experimented with different automatic
learning algorithms for text classification. Each document is
represented as a vector of words, as done in the vector space
representation of information retrieval [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. These vectors are then fed
to different classifiers for text categorization. Their experiments
showed that the linear Support Vector Machine (SVM) was
more promising than other classifiers on their
dataset. For our task, however, Naive Bayes outperformed the other classifiers.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. APPROACH</title>
      <p>In the pair classification task, i.e. categorizing the pair
(q_m, a_n), we create two labeled datasets for each query as
shown below.</p>
      <p>RelevanceDataset_qm = {(a_n, 1) : a_n is relevant to q_m} ∪ {(a_n, 0) : a_n is not relevant to q_m} (1)</p>
      <p>ClaimDataset_qm = {(a_n, 1) : a_n supports the claim made in q_m} ∪ {(a_n, 0) : a_n refutes the claim made in q_m} ∪ {(a_n, 2) : a_n is neutral to the claim made in q_m} (2)</p>
      <p>Note that we could use the above dataset creation
technique only because the number of questions was fixed and
known in advance.</p>
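      <p>A minimal sketch of this dataset construction (the record fields and toy labels here are illustrative, not from the released data):
```python
# Build per-query datasets for the two sub-tasks from annotated
# (question, answer) pairs. Labels are illustrative:
#   relevance: 1 = relevant, 0 = irrelevant
#   claim:     1 = supports, 0 = refutes, 2 = neutral
annotated = [
    {"query": "q1", "answer": "a1", "relevant": 1, "claim": 1},
    {"query": "q1", "answer": "a2", "relevant": 1, "claim": 0},
    {"query": "q1", "answer": "a3", "relevant": 0, "claim": 2},
]

def build_datasets(rows):
    relevance, claim = {}, {}
    for r in rows:
        # every answer enters the relevance dataset for its query
        relevance.setdefault(r["query"], []).append((r["answer"], r["relevant"]))
        # only relevant answers are further labeled for the claim sub-task
        if r["relevant"]:
            claim.setdefault(r["query"], []).append((r["answer"], r["claim"]))
    return relevance, claim

relevance, claim = build_datasets(annotated)
```
</p>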
      <p>We observed that labels were highly imbalanced in both
datasets, with a larger number of positive examples and fewer
negative examples. We used oversampling and undersampling
based techniques to mitigate this problem; for oversampling
we used the Synthetic Minority Over-sampling Technique
(SMOTE). After creating the datasets, we split the data
into train and test sets. We use tf-idf, doc2vec and
ensemble based representations to represent each answer (or
sentence), and train multiple supervised algorithms on each
of the above mentioned datasets.</p>
      <p>3.1 TF-IDF</p>
      <p>The TF-IDF representation is one of the most well established
document representation techniques in the field of text mining.
This kind of representation captures syntactic
similarities, as in the example (Is cancer curable?, Chemotherapy is
often used to cure cancer). However, TF-IDF based
representations are not efficient at capturing the semantic
similarities between sentences, as in the example: Does sun
exposure cause skin cancer?, Exposure to UV rays from the
sun or tanning beds is the most preventable risk factor for
melanoma. Note that melanoma and cancer are highly
similar concepts, but their similarity is not captured in the TF-IDF
representation. We therefore also experiment with
representations that are good at capturing the semantic relations
between texts. We have used the TF-IDF implementation of
scikit-learn.</p>
      <p>3.2 Doc2Vec</p>
      <p>
        Recently, Word2Vec [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] based models have been exploited
heavily for several tasks that require capturing semantic
relatedness between texts. Doc2Vec [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is one such model, which
is trained on huge text corpora for the task of word
prediction. The doc2vec algorithm has two variants: Distributed
Memory (DM) and Distributed Bag of Words (DBoW). For
this work, we use the Distributed Memory (DM) based model
due to its superior performance in previously reported tasks.
The architecture of DM is shown in figure 1.
      </p>
      <p>[Figure 1: Architecture of the Distributed Memory (DM) model: the document vector is concatenated with the vectors of the surrounding context words (input and projection layers) to predict the target word at the output layer.]</p>
      <p>The problem with doc2vec, or any other neural network
based model, is that it requires a huge amount of training
data. The main reason for this is the large number of
parameters that need to be learnt. In the
doc2vec model shown in figure 1, the vector
representations of 4 context words, the document representation, and the neural network
weights all have to be learnt. The number of sentences
available in the CHIS task is too low for such representation learning
schemes. To address this issue, we chose pre-trained word
vectors, which already capture semantic relatedness between
words to a large extent.</p>
      <p>Although Google released word vectors trained on the Google
News corpus using the word2vec algorithm, we did not choose
these vectors, as the number of hits was too low. The main
reason for this is the difference in domain: many words in
the health care domain found in the CHIS dataset were not
present in the Google News dataset. We therefore used the
vectors released by Pyysalo et al., who trained the word2vec
algorithm on the PubMed corpus. We used Gensim's
implementation of Doc2Vec1.</p>
    </sec>
    <sec id="sec-4">
      <title>3.3 Ensemble Representation</title>
      <p>1https://radimrehurek.com/gensim/models/doc2vec.html</p>
      <p>In order to capture both the syntactic and semantic
similarities efficiently, we use an ensemble approach: for
each sentence we obtain its TF-IDF and doc2vec
representations (from the previous sections) and concatenate
them to form an ensemble representation.</p>
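      <p>The concatenation itself is straightforward; a sketch with illustrative dimensions (the 400-d doc2vec size matches the experiments below, the toy TF-IDF vocabulary is made up):
```python
def ensemble_representation(tfidf_vec, doc2vec_vec):
    """Concatenate a TF-IDF vector and a dense doc2vec vector
    into one combined feature vector."""
    return list(tfidf_vec) + list(doc2vec_vec)

tfidf_vec = [0.0, 0.31, 0.0, 0.54]  # toy vocabulary of 4 terms
doc2vec_vec = [0.02] * 400          # 400-d document embedding
combined = ensemble_representation(tfidf_vec, doc2vec_vec)
```
The classifier then sees both the exact-term evidence (TF-IDF dimensions) and the semantic evidence (embedding dimensions) in a single feature space.</p>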
    </sec>
    <sec id="sec-5">
      <title>4. DATASET</title>
      <p>The CHIS dataset consists of 5 health related queries and
5 files containing labeled sentences for the respective queries.
Each sentence has two associated labels:</p>
      <p>Relevance Label (Relevant or Irrelevant)</p>
      <p>Support Variable (Support, Oppose or Neutral)</p>
      <p>The queries are of the following formats, where A and B
represent medical entities.</p>
      <sec id="sec-5-1">
        <title>Does A cause B?</title>
      </sec>
      <sec id="sec-5-2">
        <title>Does A cure B?</title>
      </sec>
      <sec id="sec-5-3">
        <title>Is A better than B?</title>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. EXPERIMENTS</title>
      <p>We used a document embedding size of 400 for all
experiments involving doc2vec; the word embedding size obtained
using word2vec was 200. We used Python's sklearn
library to realize the SVM and Naive Bayes algorithms, and
realized a neural network using the Keras library2 with Theano
as the backend. We used sigmoid as the activation function and
Binary Cross Entropy (BCE) as the loss function. Data is fed to
the network in mini-batches with a mini-batch size of 32.
We use 10-fold cross validation to evaluate all our results.</p>
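      <p>The 10-fold cross validation used for evaluation can be sketched in plain Python (a contiguous split analogous to scikit-learn's KFold, shown here for illustration):
```python
def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k near-equal contiguous folds;
    each fold serves once as the test set."""
    base, extra = divmod(n_samples, k)
    fold_sizes = [base + 1] * extra + [base] * (k - extra)
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    # pair each fold (test set) with the union of the other folds (train set)
    splits = []
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, test))
    return splits

splits = k_fold_indices(25, k=10)
```
Every sample appears in exactly one test fold, so the reported accuracy averages over predictions made on held-out data only.</p>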
    </sec>
    <sec id="sec-7">
      <title>6. RESULTS</title>
      <p>In this section we present the results of various document
representations and classi cation algorithms for both the
CHIS subtasks: predicting relevant answers and predicting
whether or not a given answer supports the claim made in
the question.</p>
      <sec id="sec-7-1">
        <title>Query Name Skin Cancer MMR HRT</title>
        <p>E-cigarettes</p>
        <p>Vitamin C
Average Accuracy</p>
        <p>Neural Network
14.62
8.45
10.11
17.79
6.05
11.404
2https://keras.io/keras-deep-learning-library-for-theanoand-tensor ow</p>
      </sec>
      <sec id="sec-7-2">
        <title>Query Name Skin Cancer MMR HRT</title>
        <p>E-cigarettes</p>
        <p>Vitamin C
Average Accuracy</p>
      </sec>
      <sec id="sec-7-3">
        <title>Query Name Skin Cancer MMR HRT</title>
        <p>E-cigarettes</p>
        <p>Vitamin C
Average Accuracy
Neural Network
28.66
12.35
15.92
20.81
19.76
19.5</p>
      </sec>
      <sec id="sec-7-4">
        <title>Query Name Skin Cancer MMR HRT</title>
        <p>E-cigarettes</p>
        <p>Vitamin C
Average Accuracy</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>7. CONCLUSION AND FUTURE WORK</title>
      <p>In this work, we have designed algorithms to detect whether an
answer is relevant to a particular health query and whether
or not it supports the claim made in the query. We pose both
these tasks as classification tasks. We experimented with
a combination of several document representation schemes
and classification algorithms. We note that the Naive Bayes
classifier outperformed the other classification algorithms by
a significant margin. We obtained an average accuracy of 73.03%
in sub-task 1 and 52.46% in sub-task 2. We additionally
note that our model predicted results with the highest
accuracy for the MMR query. The choice of training one classifier per
query also gave superior performance compared to
training one classifier per class. We observed that our model's
performance is highly sensitive to the quality of the
pre-trained word vectors and the choice of classifier.</p>
      <p>
        We wish to further extend this work by obtaining
pre-trained word vectors using other neural network based
algorithms such as GloVe [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], Skip-Thought [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], the Deep Structured
Semantic Model (DSSM) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and Convolutional Deep Structured
Semantic Models (CDSSM) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. We also wish to use these
algorithms to obtain richer document representations.
In this work, we have trained one classifier per query, but
such a setting is not feasible for building real applications,
where the queries are not known in advance. In such
scenarios we wish to categorize queries and train a single classifier
per query category.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>De Boom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Van Canneyt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bohez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Demeester</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Dhoedt</surname>
          </string-name>
          .
          <article-title>Learning semantic similarity for very short texts</article-title>
          .
          <source>In 2015 IEEE International Conference on Data Mining Workshop (ICDMW)</source>
          , pages
          <fpage>1229</fpage>
          -
          <lpage>1234</lpage>
          . IEEE,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dumais</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Platt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Heckerman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Sahami</surname>
          </string-name>
          .
          <article-title>Inductive learning algorithms and representations for text categorization</article-title>
          .
          <source>In Proceedings of the seventh international conference on Information and knowledge management</source>
          , pages
          <fpage>148</fpage>
          -
          <lpage>155</lpage>
          . ACM,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.-S.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Acero</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Heck</surname>
          </string-name>
          .
          <article-title>Learning deep structured semantic models for web search using clickthrough data</article-title>
          .
          <source>In Proceedings of the 22nd ACM international conference on Conference on information &amp; knowledge management</source>
          , pages
          <fpage>2333</fpage>
          -
          <lpage>2338</lpage>
          . ACM,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kiros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zemel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Urtasun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Torralba</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Fidler</surname>
          </string-name>
          .
          <article-title>Skip-thought vectors</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <fpage>3294</fpage>
          -
          <lpage>3302</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          .
          <article-title>Distributed representations of sentences and documents</article-title>
          . In ICML, volume
          <volume>14</volume>
          , pages
          <fpage>1188</fpage>
          -
          <lpage>1196</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Corrado</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pennington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          . Glove:
          <article-title>Global vectors for word representation</article-title>
          .
          <source>In EMNLP</source>
          , volume
          <volume>14</volume>
          , pages
          <fpage>1532</fpage>
          -
          <lpage>43</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Ruiz</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Srinivasan</surname>
          </string-name>
          .
          <article-title>Hierarchical text categorization using neural networks</article-title>
          .
          <source>Information Retrieval</source>
          ,
          <volume>5</volume>
          (
          <issue>1</issue>
          ):
          <fpage>87</fpage>
          -
          <lpage>118</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Salton</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Buckley</surname>
          </string-name>
          .
          <article-title>Term-weighting approaches in automatic text retrieval</article-title>
          .
          <source>Information processing &amp; management</source>
          ,
          <volume>24</volume>
          (
          <issue>5</issue>
          ):
          <fpage>513</fpage>
          -
          <lpage>523</lpage>
          ,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Deng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Mesnil</surname>
          </string-name>
          .
          <article-title>Learning semantic representations using convolutional neural networks for web search</article-title>
          .
          <source>In Proceedings of the 23rd International Conference on World Wide Web</source>
          , pages
          <fpage>373</fpage>
          -
          <lpage>374</lpage>
          . ACM,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>