=Paper= {{Paper |id=Vol-1737/T5-7 |storemode=property |title=Consumer Health Information System |pdfUrl=https://ceur-ws.org/Vol-1737/T5-7.pdf |volume=Vol-1737 |authors=Raksha Sanjay Jalan,Pattisapu Nikhil Priyatam,Vasudeva Varma |dblpUrl=https://dblp.org/rec/conf/fire/JalanPV16 }} ==Consumer Health Information System== https://ceur-ws.org/Vol-1737/T5-7.pdf
                        Consumer Health Information System

Raksha Sanjay Jalan
Search and Information Extraction Lab, IIIT Hyderabad, Hyderabad, India
jalan.raksha@research.iiit.ac.in

Pattisapu Nikhil Priyatam
Search and Information Extraction Lab, IIIT Hyderabad, Hyderabad, India
nikhil.pattisapu@research.iiit.ac.in

Vasudeva Varma
Search and Information Extraction Lab, IIIT Hyderabad, Hyderabad, India
vv@iiit.ac.in

ABSTRACT
The World Wide Web is one of the major sources of information for
health-related questions. However, there are often multiple conflicting
answers to a single question, and it is hard to come up with "a single best
correct answer". It is therefore highly desirable to identify conflicting
perspectives about a particular question (or topic). In this paper, we
describe our participation in the Consumer Health Information System (CHIS)
task at FIRE 2016. There were two sub-tasks in this contest. The first
sub-task deals with identifying whether a particular answer is relevant to a
given question. The second sub-task deals with detecting whether a particular
answer supports or refutes the claim posed in a given question. We pose both
tasks as supervised pair classification tasks. We report our results for
various document representations and classification algorithms.

Keywords
Pair classification tasks, document representations

1.   INTRODUCTION
  Most research in Question Answering (QA), as fostered by TREC, has so far
focused on open-domain QA systems. Recently, however, the field has witnessed
a growing interest in restricted-domain QA.
  The health domain is one of the most information-critical domains in need
of intelligent question answering systems that can effectively aid medical
researchers and health care professionals in their daily information search.

  The proposed CHIS task investigates complex health information search in
scenarios where users search for health information with more than just a
single correct answer, and look for multiple perspectives from diverse
sources, both from medical research and from real-world patient narratives.
  Given a CHIS query and a document (or set of documents) associated with
that query, the task is to classify the sentences in the document as relevant
to the query or not. The relevant sentences are those from that document
which are useful in providing the answer to the query. These relevant
sentences need to be further classified as supporting or opposing the claim
made in the query.
  We pose both these problems as pair classification tasks, where, given a
(question, answer) pair, the system has to judge whether or not the answer is
relevant to the query and, if so, whether or not it supports the claim made
in the query. Consider the following example.

  Question: Are e-cigarettes safer than normal cigarettes?

  Sentence 1: Because some research has suggested that the levels of most
toxicants in vapor are lower than the levels in smoke, e-cigarettes have been
deemed to be safer than regular cigarettes.

  Sentence 2: David Peyton, a chemistry professor at Portland State
University who helped conduct the research, says that the type of
formaldehyde generated by e-cigarettes could increase the likelihood it would
get deposited in the lung, leading to lung cancer.

  Sentence 3: Harvey Simon, MD, Harvard Health Editor, expressed concern that
the nicotine amounts in e-cigarettes can vary significantly.

  In the above example, Sentence 1 is relevant and supports the claim made in
the question. Sentence 2 is relevant but refutes the claim made in the
question. Sentence 3 is irrelevant to the question. For both tasks, we used
K-fold cross-validation to evaluate our results.

2.   RELATED WORK
  Our proposed method treats the question answering task as a classification
task. A lot of research has been done on text categorization.
  Text representation is one of the key factors that affect the performance
of a classifier. The Paragraph Vector algorithm by Le and Mikolov [5], also
termed paragraph2vec, is a powerful method for finding suitable vector
representations for sentences, paragraphs and documents of variable length.
The algorithm tries to find embeddings for separate words and paragraphs at
the same time through a procedure similar to word2vec. De Boom et al. [1]
were the first to come up with a hybrid method for short text representations
that combines the strength of dense distributed representations with the
strength of tf-idf based methods to automatically reduce the impact of less
informative terms. According to this paper, a combination of word embeddings
and tf-idf information leads to a better model for semantic content within
short text fragments.
  Ruiz and Srinivasan [8] presented the design and evaluation of a text
categorization method based on the Hierarchical Mixture of Experts model.
This model
uses a divide-and-conquer principle to define smaller categorization problems
based on a predefined hierarchical structure. The final classifier was a
hierarchical array of neural networks. They showed that the use of the
hierarchical structure improves text categorization performance with respect
to an equivalent flat model.
  Dumais et al. [2] experimented with different automatic learning algorithms
for text classification. Each document is represented as a vector of words,
as in the vector representation used in information retrieval [9]. These
vectors are then fed to different classifiers for text categorization. Their
experiments showed that linear Support Vector Machines (SVMs) are more
promising than other classifiers on their dataset. For our task, however,
Naive Bayes performed best.

3.    APPROACH
  In the pair classification task, i.e. categorizing the pair (q_m, a_n), we
create two labeled datasets for each query, as shown below.

\[
\mathrm{RelevanceDataset}_{q_m} =
  \{(a_n, 1) \mid a_n \text{ is relevant to } q_m\}
  \cup \{(a_n, 0) \mid a_n \text{ is not relevant to } q_m\}
\tag{1}
\]

\[
\mathrm{ClaimDataset}_{q_m} =
  \{(a_n, 1) \mid a_n \text{ supports the claim made in } q_m\}
  \cup \{(a_n, 0) \mid a_n \text{ refutes the claim made in } q_m\}
  \cup \{(a_n, 2) \mid a_n \text{ is neutral to the claim made in } q_m\}
\tag{2}
\]

   Note that we could use the above dataset creation techniques only because
the number of questions was fixed and known in advance.
   We observed that labels were highly imbalanced in both datasets, with a
larger number of positive examples and fewer negative examples. We use
oversampling and undersampling techniques to mitigate this problem (for
oversampling, the Synthetic Minority Over-sampling Technique (SMOTE)). After
creating the datasets, we split the data into train and test sets. We use
doc2vec, tf-idf and ensemble-based representations to represent each answer
(or sentence). We train multiple supervised algorithms on each of the
above-mentioned datasets.

3.1   TF-IDF
   The TF-IDF representation is one of the well-established document
representation techniques in the field of text mining. This kind of
representation captures syntactic similarities, as in the example (Is cancer
curable?, Chemotherapy is often used to cure cancer). However, TF-IDF based
representations are not efficient at capturing semantic similarities between
sentences, as in the example (Does sun exposure cause skin cancer?, Exposure
to UV rays from the sun or tanning beds is the most preventable risk factor
for melanoma). Note that melanoma and cancer are highly similar concepts, but
their similarity is not captured in the TF-IDF representation. We therefore
also experiment with representations that are good at capturing the semantic
relations between text. We have used the TF-IDF implementation of
scikit-learn.

3.2      Doc2Vec
   Recently, Word2Vec [6] based models have been exploited heavily for
several tasks that require capturing semantic relatedness between text.
Doc2Vec [5] is one such model, which is trained on huge text corpora for the
task of word prediction. The doc2vec algorithm has two variants: Distributed
Memory (DM) and Distributed Bag of Words (DBoW). For this work, we use
Distributed Memory (DM) based models due to their superior performance in
previously reported tasks. The architecture of DM is shown in Figure 1.

[Figure: input layer v(doc), v(t-2), v(t-1), v(t+1), v(t+2); projection
layer: concatenated representation; output layer: v(t)]
Figure 1: Architecture of Distributed Memory (DM) Model

   The problem with doc2vec, or any other neural network based model, is that
it requires a huge amount of training data. The main reason for this is the
large number of parameters that need to be learnt. Consider the example of
the doc2vec model shown in Figure 1. The vector representations of 4 words,
the document representation and the neural network weights all have to be
learnt. The number of sentences available in the CHIS task is too low for
such representation learning schemes. To address this issue, we choose
pre-trained word vectors, which already capture semantic relatedness between
words to a large extent.
   Although Google released word vectors trained on the Google News corpus
using the word2vec algorithm, we did not choose these vectors, as the number
of hits was too low. The main reason for this is the difference in domain
(many words in the health care domain found in the CHIS dataset were not
present in the Google News dataset). We therefore used the vectors released
by Pyysalo et al., who trained the word2vec algorithm on the PubMed corpus.
We used Gensim's implementation of Doc2Vec (footnote 1).

1 https://radimrehurek.com/gensim/models/doc2vec.html

3.3      Ensemble Representation
   In order to capture both the syntactic and semantic similarities
efficiently, we use an ensemble approach, where for each sentence we obtain
its TF-IDF and doc2vec representations (from the previous sections). We then
concatenate both these representations to form an ensemble representation.

      Query Name         Neural Network      SVM      Naive Bayes
      Skin Cancer             9.76          46.67       57.65
      MMR                     7.42          30.34       74.862
      HRT                     9.192         25.43       62.05
      E-cigarettes           12.41          25.21       54.785
      Vitamin C               7.05          32.51       54.28
      Average Accuracy        9.166         32.032      60.725

Table 2: Results (accuracy, %) obtained for sub-task 1 for TF-IDF
representations

      Query Name         Neural Network      SVM      Naive Bayes
      Skin Cancer            28.66          62.91       68.181
      MMR                    12.35          36.06       87.931
      HRT                    15.92          34.32       75
      E-cigarettes           20.81          52.23       71.875
      Vitamin C              19.76          50.67       62.162
      Average Accuracy       19.5           47.238      73.030

Table 3: Results (accuracy, %) obtained for sub-task 1 for Ensemble
representations

4.     DATASET
  The CHIS dataset consists of 5 health-related queries and 5 files
containing labeled sentences for the respective queries. Each sentence has
two associated labels:

     • Relevance Label (Relevant or Irrelevant)

     • Support Label (Support, Oppose or Neutral)

  The queries are of the following formats, where A and B represent medical
entities.

     • Does A cause B?

     • Does A cure B?

     • Is A better than B?
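
  The label scheme above maps onto the two per-query datasets of Equations
(1) and (2). The following is only a minimal sketch of that grouping: the
`LabeledSentence` schema, the string label names and the `build_datasets`
helper are hypothetical and not taken from the CHIS distribution, and the
"neutral" support label given to the irrelevant example sentence is an
assumption.

```python
from collections import defaultdict
from dataclasses import dataclass

# Integer codes follow Equations (1) and (2); the string keys are hypothetical.
RELEVANCE_CODES = {"relevant": 1, "irrelevant": 0}
CLAIM_CODES = {"support": 1, "oppose": 0, "neutral": 2}


@dataclass(frozen=True)
class LabeledSentence:
    """One sentence from a query file with its two labels (hypothetical schema)."""
    query: str
    text: str
    relevance: str  # "relevant" or "irrelevant"
    support: str    # "support", "oppose" or "neutral"


def build_datasets(sentences):
    """Group sentences by query into a relevance dataset and a claim dataset."""
    relevance = defaultdict(list)
    claim = defaultdict(list)
    for s in sentences:
        relevance[s.query].append((s.text, RELEVANCE_CODES[s.relevance]))
        claim[s.query].append((s.text, CLAIM_CODES[s.support]))
    return dict(relevance), dict(claim)


# Shortened versions of the e-cigarette example from the introduction.
# The "neutral" label on the irrelevant sentence is an assumption.
examples = [
    LabeledSentence("E-cigarettes",
                    "E-cigarettes have been deemed to be safer than regular cigarettes.",
                    "relevant", "support"),
    LabeledSentence("E-cigarettes",
                    "The formaldehyde generated by e-cigarettes could lead to lung cancer.",
                    "relevant", "oppose"),
    LabeledSentence("E-cigarettes",
                    "The nicotine amounts in e-cigarettes can vary significantly.",
                    "irrelevant", "neutral"),
]

relevance_sets, claim_sets = build_datasets(examples)
for text, label in relevance_sets["E-cigarettes"]:
    print(label, text)
```

  Keeping one dataset per query mirrors the one-classifier-per-query setting
used in this work; it only works because the queries are known in advance.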
      Query Name         Neural Network      SVM      Naive Bayes
      Skin Cancer            26.45          54.95       57.74
      MMR                    17.67          25.42       49.851
      HRT                    14.95          24.67       21.56
      E-cigarettes           16.67          32.96       41.65
      Vitamin C              11.96          35.78       31.41
      Average Accuracy       17.54          34.756      40.442

Table 4: Results (accuracy, %) obtained for sub-task 2 for Doc2Vec
representations

5.     EXPERIMENTS
  We used a document embedding size of 400 for all the experiments involving
doc2vec; the word embedding size obtained using word2vec was 200. We used
Python's sklearn library to realize the SVM and Naive Bayes algorithms. We
realized a neural network using the Keras library (footnote 2) with Theano as
the backend. We used sigmoid as the activation function and Binary Cross
Entropy (BCE) as the loss function. Data is fed to the network in
mini-batches of size 32. We use 10-fold cross-validation to evaluate all our
results.
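
  The 10-fold evaluation protocol can be illustrated with a small
standard-library sketch. It is not the code used for the experiments: the
shuffling seed, the contiguous fold assignment and the `majority_baseline`
scorer are assumptions made for illustration, and any function that trains on
the training pairs and returns an accuracy on the test pairs could be plugged
in instead.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) so that each sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_validate(samples, labels, train_and_score, k=10, seed=0):
    """Return the mean of the per-fold accuracies given by train_and_score."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(samples), k, seed):
        train = [(samples[i], labels[i]) for i in train_idx]
        test = [(samples[i], labels[i]) for i in test_idx]
        scores.append(train_and_score(train, test))
    return sum(scores) / len(scores)


def majority_baseline(train, test):
    """Hypothetical stand-in scorer: predict the most frequent training label."""
    train_labels = [y for _, y in train]
    majority = max(set(train_labels), key=train_labels.count)
    return sum(1 for _, y in test if y == majority) / len(test)


if __name__ == "__main__":
    toy_samples = [f"sentence {i}" for i in range(25)]
    toy_labels = [1] * 20 + [0] * 5  # imbalanced toy labels, not CHIS data
    print(cross_validate(toy_samples, toy_labels, majority_baseline, k=10))
```

  Averaging per-fold accuracies, as done here, is one common convention; the
paper does not state how its per-query accuracies were aggregated over folds.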
6.     RESULTS
  In this section we present the results of various document representations
and classification algorithms for both the CHIS sub-tasks: predicting
relevant answers and predicting whether or not a given answer supports the
claim made in the question.

      Query Name         Neural Network      SVM      Naive Bayes
      Skin Cancer            14.62          28.65       48.72
      MMR                     8.45          21.841      61.762
      HRT                    10.11          30.54       47.67
      E-cigarettes           17.79          21.67       41.985
      Vitamin C               6.05          23.45       41.567
      Average Accuracy       11.404         25.2302     48.3408

Table 1: Results (accuracy, %) obtained for sub-task 1 for Doc2Vec
representations

      Query Name         Neural Network      SVM      Naive Bayes
      Skin Cancer            28.96          57.65       59.54
      MMR                    19.45          25.24       62.89
      HRT                    18.65          29.56       35.42
      E-cigarettes           17.45          39.567      55.645
      Vitamin C              21.05          47.671      31.94
      Average Accuracy       21.112         39.817      49.087

Table 5: Results (accuracy, %) obtained for sub-task 2 for TF-IDF
representations

      Query Name         Neural Network      SVM      Naive Bayes
      Skin Cancer            34.79          60.67       62.5
      MMR                    21.676         29.508      68.96551724
      HRT                    21.25          34.66       37.5
      E-cigarettes           19.345         46.26       60.9375
      Vitamin C              22.197         50.66       32.43243243
      Average Accuracy       23.851         44.35       52.467

Table 6: Results (accuracy, %) obtained for sub-task 2 for Ensemble
representations

   For both sub-tasks, the highest average accuracies are achieved when
sentences are represented using ensemble representations and classification
is done using the Naive Bayes classifier.
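
   The best-performing setting uses the ensemble representation of Section
3.3, i.e. a TF-IDF vector concatenated with a dense document vector. The
sketch below shows that concatenation under stated assumptions: the TF-IDF
part is a plain-Python variant with smoothed IDF and L2 normalization rather
than the scikit-learn implementation used in this work, and the 3-dimensional
`dense_answer` vector is a hypothetical placeholder for a 400-dimensional
doc2vec embedding.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer that strips basic punctuation."""
    tokens = (tok.strip("?,.()").lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def fit_idf(documents):
    """Return the sorted vocabulary and a smoothed IDF weight per term."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    vocab = sorted(doc_freq)
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    return vocab, idf


def tfidf_vector(text, vocab, idf):
    """L2-normalized TF-IDF vector over the fitted vocabulary."""
    counts = Counter(tokenize(text))
    vec = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def ensemble_vector(text, dense, vocab, idf):
    """Concatenate the sparse TF-IDF part with a dense embedding."""
    return tfidf_vector(text, vocab, idf) + list(dense)


# Sentence pair taken from the TF-IDF example in Section 3.1.
question = "Is cancer curable?"
answer = "Chemotherapy is often used to cure cancer"
vocab, idf = fit_idf([question, answer])

# Hypothetical dense values standing in for a doc2vec embedding.
dense_answer = [0.12, -0.40, 0.05]
ensemble = ensemble_vector(answer, dense_answer, vocab, idf)
print(len(vocab), len(ensemble))
```

   Because the two parts live on different scales, a real pipeline may need
to normalize or weight them before classification; the paper does not say
whether any such scaling was applied.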
2 https://keras.io/keras-deep-learning-library-for-theano-and-tensorflow

7.   CONCLUSION AND FUTURE WORK
  In this work, we have designed algorithms to detect whether an answer is
relevant to a particular health query and whether or not it supports the
claim made in the query. We pose both these tasks as classification tasks. We
experimented with a combination of several document representation schemes
and classification algorithms. We note that the Naive Bayes classifier
outperformed the other classification algorithms by a significant margin. We
obtained an average accuracy of 73.03% in sub-task 1 and 52.46% in sub-task
2. We also note that our model predicted results with the highest accuracy
for the MMR query. The choice of training one classifier per query also gave
superior performance compared to training one classifier per class. We
observed that our model's performance is highly sensitive to the quality of
the pre-trained word vectors and the choice of classifier.
   We wish to further extend this work by obtaining pre-trained word vectors
using other neural network based algorithms like GloVe [7], Skip-thought [4],
the Deep Structured Semantic Model (DSSM) [3] and the Convolutional Deep
Structured Semantic Model (CDSSM) [10]. We also wish to use these algorithms
in order to obtain richer document representations. In this work, we have
trained one classifier per query, but such a setting is not feasible for
building real applications where the queries are not known in advance. In
such scenarios, we wish to categorize queries and train a single classifier
for each query category.

8.   REFERENCES
 [1] C. De Boom, S. Van Canneyt, S. Bohez,
     T. Demeester, and B. Dhoedt. Learning semantic
     similarity for very short texts. In 2015 IEEE
     International Conference on Data Mining Workshop
     (ICDMW), pages 1229–1234. IEEE, 2015.
 [2] S. Dumais, J. Platt, D. Heckerman, and M. Sahami.
     Inductive learning algorithms and representations for
     text categorization. In Proceedings of the Seventh
     International Conference on Information and
     Knowledge Management, pages 148–155. ACM, 1998.
 [3] P.-S. Huang, X. He, J. Gao, L. Deng, A. Acero, and
     L. Heck. Learning deep structured semantic models for
     web search using clickthrough data. In Proceedings of
     the 22nd ACM International Conference on
     Information & Knowledge Management, pages
     2333–2338. ACM, 2013.
 [4] R. Kiros, Y. Zhu, R. R. Salakhutdinov, R. Zemel,
     R. Urtasun, A. Torralba, and S. Fidler. Skip-thought
     vectors. In Advances in Neural Information Processing
     Systems, pages 3294–3302, 2015.
 [5] Q. V. Le and T. Mikolov. Distributed representations
     of sentences and documents. In ICML, volume 14,
     pages 1188–1196, 2014.
 [6] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and
     J. Dean. Distributed representations of words and
     phrases and their compositionality. In Advances in
     Neural Information Processing Systems, pages
     3111–3119, 2013.
 [7] J. Pennington, R. Socher, and C. D. Manning. GloVe:
     Global vectors for word representation. In EMNLP,
     volume 14, pages 1532–1543, 2014.
 [8] M. E. Ruiz and P. Srinivasan. Hierarchical text
     categorization using neural networks. Information
     Retrieval, 5(1):87–118, 2002.
 [9] G. Salton and C. Buckley. Term-weighting approaches
     in automatic text retrieval. Information Processing &
     Management, 24(5):513–523, 1988.
[10] Y. Shen, X. He, J. Gao, L. Deng, and G. Mesnil.
     Learning semantic representations using convolutional
     neural networks for web search. In Proceedings of the
     23rd International Conference on World Wide Web,
     pages 373–374. ACM, 2014.