=Paper=
{{Paper
|id=Vol-1737/T5-7
|storemode=property
|title=Consumer Health Information System
|pdfUrl=https://ceur-ws.org/Vol-1737/T5-7.pdf
|volume=Vol-1737
|authors=Raksha Sanjay Jalan,Pattisapu Nikhil Priyatam,Vasudeva Varma
|dblpUrl=https://dblp.org/rec/conf/fire/JalanPV16
}}
==Consumer Health Information System==
Raksha Sanjay Jalan, Pattisapu Nikhil Priyatam, Vasudeva Varma
Search and Information Extraction Lab, IIIT Hyderabad, Hyderabad, India
jalan.raksha@research.iiit.ac.in, nikhil.pattisapu@research.iiit.ac.in, vv@iiit.ac.in

ABSTRACT
The World Wide Web acts as one of the major sources of information for health-related questions. Often, however, there are multiple conflicting answers to a single question, and it is hard to come up with "a single best correct answer". It is therefore highly desirable to identify the conflicting perspectives on a particular question (or topic). In this paper, we describe our participation in the Consumer Health Information System (CHIS) task at FIRE 2016. The contest had two sub-tasks. The first sub-task deals with identifying whether a particular answer is relevant to a given question; the second deals with detecting whether a particular answer supports or refutes the claim posed in the question. We pose both tasks as supervised pair classification tasks and report our results for various document representations and classification algorithms.

Question: Are e-cigarettes safer than normal cigarettes?

Sentence 1: Because some research has suggested that the levels of most toxicants in vapor are lower than the levels in smoke, e-cigarettes have been deemed to be safer than regular cigarettes.

Sentence 2: David Peyton, a chemistry professor at Portland State University who helped conduct the research, says that the type of formaldehyde generated by e-cigarettes could increase the likelihood it would get deposited in the lung, leading to lung cancer.

Sentence 3: Harvey Simon, MD, Harvard Health Editor, expressed concern that the nicotine amounts in e-cigarettes can vary significantly.
Keywords
Pair classification tasks, document representations

1. INTRODUCTION
Most research developments in the area of Question Answering (QA), as fostered by TREC, have so far focused on open-domain QA systems. Recently, however, the field has witnessed a growing interest in restricted-domain QA. The health domain is one of the most information-critical domains in need of intelligent question answering systems that can effectively aid medical researchers and health care professionals in their daily information search.

The proposed CHIS task investigates complex health information search in scenarios where users search for health information with more than just a single correct answer, and look for multiple perspectives from diverse sources, both from medical research and from real-world patient narratives. Given a CHIS query and a document (or set of documents) associated with that query, the task is to classify the sentences in the document as relevant to the query or not. The relevant sentences are those from the document which are useful in providing the answer to the query. These relevant sentences then need to be further classified as supporting or opposing the claim made in the query.

We pose both problems as pair classification tasks: given a (question, answer) pair, the system has to judge whether or not the answer is relevant to the query and, if so, whether or not it supports the claim made in the query. Consider the example given above. Sentence 1 is relevant and supports the claim made in the question; Sentence 2 is relevant but refutes the claim; Sentence 3 is irrelevant to the question. For both tasks, we used K-fold cross-validation to evaluate our results.

2. RELATED WORK
Our proposed method solves the question answering task as a classification task, and a lot of research work has been done on text categorization. Text representation is one of the key factors affecting the performance of a classifier. The Paragraph Vector algorithm of Le and Mikolov [5], also termed paragraph2vec, is a powerful method for finding suitable vector representations for sentences, paragraphs and documents of variable length. The algorithm finds embeddings for separate words and paragraphs at the same time, through a procedure similar to word2vec. De Boom et al. [1] were the first to propose a hybrid method for short-text representation that combines the strength of dense distributed representations with the strength of tf-idf based methods to automatically reduce the impact of less informative terms. According to that paper, combining word embeddings with tf-idf information leads to a better model of the semantic content of short text fragments.
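The hybrid idea of [1] can be sketched in a few lines: weight each word's embedding by its idf and average. This is an illustrative reconstruction, not the authors' or [1]'s code; the two-dimensional embeddings below are made up for the example.

```python
from math import log

# Toy sketch of the hybrid representation idea in De Boom et al. [1]:
# combine word embeddings with idf weights via an idf-weighted mean.
def idf_weights(docs):
    # docs: list of token lists; idf(t) = log(N / df(t))
    n = len(docs)
    vocab = {t for d in docs for t in d}
    return {t: log(n / sum(1 for d in docs if t in d)) for t in vocab}

def weighted_sentence_vector(tokens, embeddings, idf):
    # idf-weighted average of the word vectors available for the tokens,
    # so frequent (less informative) terms contribute less
    dim = len(next(iter(embeddings.values())))
    vec, total = [0.0] * dim, 0.0
    for t in tokens:
        if t in embeddings:
            w = idf.get(t, 0.0)
            vec = [v + w * e for v, e in zip(vec, embeddings[t])]
            total += w
    return [v / total for v in vec] if total else vec

docs = [["skin", "cancer", "risk"], ["cancer", "cure"], ["vitamin", "c"]]
idf = idf_weights(docs)
# hypothetical 2-dimensional embeddings, for illustration only
emb = {"skin": [1.0, 0.0], "cancer": [0.0, 1.0]}
vec = weighted_sentence_vector(["skin", "cancer"], emb, idf)
```

Since "cancer" occurs in more documents than "skin", its idf is lower and it contributes less to the averaged sentence vector.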
Ruiz and Srinivasan [8] presented the design and evaluation of a text categorization method based on the Hierarchical Mixture of Experts model. This model uses a divide-and-conquer principle to define smaller categorization problems based on a predefined hierarchical structure; the final classifier is a hierarchical array of neural networks. They showed that the use of the hierarchical structure improves text categorization performance with respect to an equivalent flat model.

Dumais et al. [2] experimented with different automatic learning algorithms for text classification. Each document is represented as a vector of words, as in the vector representation of information retrieval [9]; these vectors are then fed to different classifiers for text categorization. Their experiments showed that linear Support Vector Machines (SVMs) are more promising than the other classifiers on their dataset. On our task, however, Naive Bayes outperformed the alternatives.
3. APPROACH
In the pair classification task, i.e. categorizing the pair (q_m, a_n), we create two labeled datasets for each query, as shown below:

RelevanceDataset_{q_m} = {(a_n, 1) : a_n is relevant to q_m} ∪ {(a_n, 0) : a_n is not relevant to q_m}    (1)

ClaimDataset_{q_m} = {(a_n, 1) : a_n supports the claim made in q_m} ∪ {(a_n, 0) : a_n refutes the claim made in q_m} ∪ {(a_n, 2) : a_n is neutral to the claim made in q_m}    (2)

Note that we could use these dataset creation techniques only because the number of questions was fixed and known in advance.

We observed that the labels were highly imbalanced in both datasets, with a larger number of positive examples and fewer negative examples. We used oversampling and undersampling based techniques to mitigate this problem (for oversampling, the Synthetic Minority Over-sampling Technique, SMOTE). After creating the datasets, we split the data into train and test sets.
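The per-query dataset construction of equation (1) and the rebalancing step can be sketched as follows. This is a simplified illustration, not the authors' code: SMOTE proper interpolates synthetic minority examples in feature space, whereas the stand-in below merely duplicates existing ones.

```python
import random

# Sketch of the dataset construction in equation (1): every answer
# sentence for a query q_m becomes one labeled example, after which the
# minority class is oversampled.
def relevance_dataset(answers):
    # answers: list of (sentence, is_relevant) pairs for one query
    return [(a, 1 if relevant else 0) for a, relevant in answers]

def oversample(dataset, minority_label):
    # Simplified stand-in for SMOTE: duplicate random minority-class
    # examples until the classes balance (real SMOTE creates *synthetic*
    # interpolated points instead of duplicates).
    minority = [x for x in dataset if x[1] == minority_label]
    majority = [x for x in dataset if x[1] != minority_label]
    while len(minority) < len(majority):
        minority.append(random.choice(minority))
    return majority + minority

data = relevance_dataset([("s1", True), ("s2", True), ("s3", True), ("s4", False)])
balanced = oversample(data, minority_label=0)
```

After oversampling, both labels are equally represented, which matches the imbalance mitigation described above.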
We use doc2vec, tf-idf and ensemble-based representations to represent each answer (or sentence), and train multiple supervised algorithms on each of the above-mentioned datasets.

3.1 TF-IDF
The TF-IDF representation is one of the most well-established document representation techniques in the field of text mining. This kind of representation captures syntactic similarities, as in the pair (Is cancer curable?, Chemotherapy is often used to cure cancer). However, TF-IDF based representations are not efficient at capturing semantic similarities between sentences, as in the pair: Does sun exposure cause skin cancer? and Exposure to UV rays from the sun or tanning beds is the most preventable risk factor for melanoma. Note that melanoma and cancer are highly similar concepts, but their similarity is not captured by the TF-IDF representation. We therefore also experiment with representations that are good at capturing semantic relations between texts. We used the TF-IDF implementation of scikit-learn.

3.2 Doc2Vec
Recently, Word2Vec [6] based models have been exploited heavily for several tasks that require capturing semantic relatedness between texts. Doc2Vec [5] is one such model, trained on huge text corpora on the task of word prediction. The doc2vec algorithm has two variants, Distributed Memory (DM) and Distributed Bag of Words (DBoW). For this work, we use DM-based models due to their superior performance in previously reported tasks. The architecture of DM is shown in Figure 1.

Figure 1: Architecture of the Distributed Memory (DM) model. [The model predicts the target word v(t) from the concatenated representation of the document vector v(doc) and the context word vectors v(t-2), v(t-1), v(t+1), v(t+2).]

The problem with doc2vec, or any other neural network based model, is that it requires a huge amount of training data, mainly because of the large number of parameters that need to be learnt. In the doc2vec model of Figure 1, the vector representations of the four context words, the document representation and the neural network weights all have to be learnt. The number of sentences available in the CHIS task is too low for such representation learning schemes. To address this issue, we chose pre-trained word vectors, which already capture semantic relatedness between words to a large extent.

Although Google released word vectors trained on the Google News corpus using the word2vec algorithm, we did not choose these vectors, as the number of hits was too low. The main reason for this is the difference in domain: many words in the health care domain found in the CHIS dataset are not present in the Google News dataset. We therefore used the vectors released by Pyysalo et al., who also train the word2vec algorithm on the PubMed corpus. We used Gensim's implementation of Doc2Vec (https://radimrehurek.com/gensim/models/doc2vec.html).
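The tf-idf weighting of Section 3.1 can be sketched from scratch (the paper itself uses scikit-learn's implementation; this only illustrates the arithmetic, on made-up toy sentences):

```python
from collections import Counter
from math import log

# From-scratch sketch of tf-idf: each document becomes a vocabulary-sized
# vector of term frequency times inverse document frequency.
def tfidf_vectors(docs):
    # docs: list of token lists
    n = len(docs)
    vocab = sorted({t for d in docs for t in d})
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)
        # tf(t, d) * log(N / df(t)); a term occurring in every document
        # gets weight log(1) = 0, discounting uninformative words
        vectors.append([tf[t] * log(n / df[t]) for t in vocab])
    return vocab, vectors

docs = [["sun", "exposure", "skin", "cancer"],
        ["uv", "rays", "melanoma"],
        ["skin", "cancer", "cure"]]
vocab, vecs = tfidf_vectors(docs)
```

The melanoma sentence shares no surface terms with the skin-cancer query, so their vectors have no overlapping non-zero dimensions, which is exactly the weakness Section 3.1 points out.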
3.3 Ensemble Representation
In order to capture both syntactic and semantic similarities efficiently, we use an ensemble approach: for each sentence we obtain its TF-IDF and doc2vec representations (from the previous sections) and concatenate these two representations to form an ensemble representation.

4. DATASET
The CHIS dataset consists of 5 health-related queries and 5 files containing labeled sentences for the respective queries. Each sentence has two associated labels:

• Relevance label (Relevant or Irrelevant)
• Support variable (Support, Oppose or Neutral)

The queries have the following formats, where A and B represent medical entities:

• Does A cause B?
• Does A cure B?
• Is A better than B?

5. EXPERIMENTS
We used a document embedding size of 400 for all experiments involving doc2vec; the word embedding size obtained using word2vec was 200. We used Python's sklearn library to realize the SVM and Naive Bayes algorithms, and realized a neural network with the Keras library (https://keras.io) using Theano as backend, with sigmoid as the activation function and binary cross-entropy (BCE) as the loss function. Data is fed to the network in mini-batches of size 32. We use 10-fold cross-validation to evaluate all our results.
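The 10-fold cross-validation protocol can be sketched as follows. This is an illustrative stand-in (not the authors' code, which could equally use scikit-learn's utilities); `train_and_score` is a hypothetical callback that trains a classifier on the training split and returns its test accuracy.

```python
# k-fold evaluation: hold each fold out once for testing, train on the
# remaining examples, and average the per-fold accuracies.
def kfold_indices(n, k=10):
    # assign example i to fold i mod k
    return [list(range(i, n, k)) for i in range(k)]

def cross_validate(examples, train_and_score, k=10):
    scores = []
    for fold in kfold_indices(len(examples), k):
        held_out = set(fold)
        test = [examples[i] for i in fold]
        train = [x for i, x in enumerate(examples) if i not in held_out]
        scores.append(train_and_score(train, test))
    # report the mean accuracy over all k held-out folds
    return sum(scores) / len(scores)

examples = list(range(20))
mean_acc = cross_validate(examples, lambda train, test: 1.0, k=10)
```

Every example is tested exactly once across the k folds, so the averaged score uses the whole dataset without ever testing on training data.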
6. RESULTS
In this section we present the results of the various document representations and classification algorithms for both CHIS sub-tasks: predicting relevant answers, and predicting whether or not a given answer supports the claim made in the question.

Table 1: Results obtained for sub-task 1 for Doc2Vec representations
Query Name        Neural Network   SVM      Naive Bayes
Skin Cancer       14.62            28.65    48.72
MMR               8.45             21.841   61.762
HRT               10.11            30.54    47.67
E-cigarettes      17.79            21.67    41.985
Vitamin C         6.05             23.45    41.567
Average Accuracy  11.404           25.230   48.341

Table 2: Results obtained for sub-task 1 for TF-IDF representations
Query Name        Neural Network   SVM      Naive Bayes
Skin Cancer       9.76             46.67    57.65
MMR               7.42             30.34    74.862
HRT               9.192            25.43    62.05
E-cigarettes      12.41            25.21    54.785
Vitamin C         7.05             32.51    54.28
Average Accuracy  9.166            32.032   60.725

Table 3: Results obtained for sub-task 1 for Ensemble representations
Query Name        Neural Network   SVM      Naive Bayes
Skin Cancer       28.66            62.91    68.181
MMR               12.35            36.06    87.931
HRT               15.92            34.32    75
E-cigarettes      20.81            52.23    71.875
Vitamin C         19.76            50.67    62.162
Average Accuracy  19.5             47.238   73.030

Table 4: Results obtained for sub-task 2 for Doc2Vec representations
Query Name        Neural Network   SVM      Naive Bayes
Skin Cancer       26.45            54.95    57.74
MMR               17.67            25.42    49.851
HRT               14.95            24.67    21.56
E-cigarettes      16.67            32.96    41.65
Vitamin C         11.96            35.78    31.41
Average Accuracy  17.54            34.756   40.442

Table 5: Results obtained for sub-task 2 for TF-IDF representations
Query Name        Neural Network   SVM      Naive Bayes
Skin Cancer       28.96            57.65    59.54
MMR               19.45            25.24    62.89
HRT               18.65            29.56    35.42
E-cigarettes      17.45            39.567   55.645
Vitamin C         21.05            47.671   31.94
Average Accuracy  21.112           39.817   49.087

Table 6: Results obtained for sub-task 2 for Ensemble representations
Query Name        Neural Network   SVM      Naive Bayes
Skin Cancer       34.79            60.67    62.5
MMR               21.676           29.508   68.966
HRT               21.25            34.66    37.5
E-cigarettes      19.345           46.26    60.938
Vitamin C         22.197           50.66    32.432
Average Accuracy  23.851           44.35    52.467

For both sub-tasks, the highest average accuracies are achieved when sentences are represented using ensemble representations and classification is done with the Naive Bayes classifier.
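The "Average Accuracy" rows are unweighted means over the five queries; for example, the Naive Bayes column under the ensemble representation for sub-task 1 (Table 3) reproduces the reported 73.03%:

```python
# Per-query Naive Bayes accuracies for sub-task 1 with the ensemble
# representation, as listed in Table 3.
naive_bayes_ensemble = {
    "Skin Cancer": 68.181,
    "MMR": 87.931,
    "HRT": 75.0,
    "E-cigarettes": 71.875,
    "Vitamin C": 62.162,
}
# Unweighted mean over the five queries
average = sum(naive_bayes_ensemble.values()) / len(naive_bayes_ensemble)
# average ≈ 73.03, the best result reported for sub-task 1
```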
7. CONCLUSION AND FUTURE WORK
In this work, we designed algorithms to detect whether an answer is relevant to a particular health query and whether or not it supports the claim made in the query, posing both tasks as classification tasks. We experimented with combinations of several document representation schemes and classification algorithms. We note that the Naive Bayes classifier outperformed the other classification algorithms by a significant margin: we obtained an average accuracy of 73.03% in sub-task 1 and 52.46% in sub-task 2. We additionally note that our model predicted results with the highest accuracy for the MMR query. The choice of training one classifier per query also gave superior performance compared to training one classifier per class. We observed that our model's performance is highly sensitive to the quality of the pre-trained word vectors and to the choice of classifier.

We wish to extend this work by obtaining pre-trained word vectors using other neural network based algorithms such as GloVe [7], Skip-Thought [4], the Deep Structured Semantic Model (DSSM) [3] and Convolutional Deep Structured Semantic Models (CDSSM) [10], and to use these algorithms to obtain richer document representations. In this work we trained one classifier per query, but such a setting is not feasible for building real applications where the queries are not known in advance. In such scenarios, we wish to categorize queries and train a single classifier for each query category.

8. REFERENCES
[1] C. De Boom, S. Van Canneyt, S. Bohez, T. Demeester, and B. Dhoedt. Learning semantic similarity for very short texts. In 2015 IEEE International Conference on Data Mining Workshop (ICDMW), pages 1229–1234. IEEE, 2015.
[2] S. Dumais, J. Platt, D. Heckerman, and M. Sahami. Inductive learning algorithms and representations for text categorization. In Proceedings of the Seventh International Conference on Information and Knowledge Management, pages 148–155. ACM, 1998.
[3] P.-S. Huang, X. He, J. Gao, L. Deng, A. Acero, and L. Heck. Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, pages 2333–2338. ACM, 2013.
[4] R. Kiros, Y. Zhu, R. R. Salakhutdinov, R. Zemel, R. Urtasun, A. Torralba, and S. Fidler. Skip-thought vectors. In Advances in Neural Information Processing Systems, pages 3294–3302, 2015.
[5] Q. V. Le and T. Mikolov. Distributed representations of sentences and documents. In ICML, volume 14, pages 1188–1196, 2014.
[6] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119, 2013.
[7] J. Pennington, R. Socher, and C. D. Manning. GloVe: Global vectors for word representation. In EMNLP, volume 14, pages 1532–1543, 2014.
[8] M. E. Ruiz and P. Srinivasan. Hierarchical text categorization using neural networks. Information Retrieval, 5(1):87–118, 2002.
[9] G. Salton and C. Buckley. Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5):513–523, 1988.
[10] Y. Shen, X. He, J. Gao, L. Deng, and G. Mesnil. Learning semantic representations using convolutional neural networks for web search. In Proceedings of the 23rd International Conference on World Wide Web, pages 373–374. ACM, 2014.