Team DA_IICT at Consumer Health Information Search @FIRE2016

Jainisha Sankhavara
IRLP Lab, Dhirubhai Ambani Institute of Information and Communication Technology
Gandhinagar, Gujarat, India
jainishasankhavara@gmail.com

ABSTRACT
The Consumer Health Information Search task focuses on retrieving multiple relevant perspectives for complex health search queries. It addresses queries that do not have a single definitive answer but instead have diverse points of view available. This paper reports the results of standard retrieval methods for identifying the orientation of the retrieved results towards the query.

Keywords
Consumer Health Information Search, Health Information Retrieval

1. INTRODUCTION
People increasingly use web search engines for health information. These search engines are well suited to answering straightforward health-related medical queries, but some queries are complex in that they do not have a single definitive answer; instead, multiple perspectives on the query exist, both for and against its hypothesis. The presence of multiple perspectives with different grades of supporting evidence (which changes dynamically over time with the arrival of new research and practice evidence) makes such searches all the more challenging for a lay searcher. Consumer Health Information Search (CHIS) targets such information retrieval tasks, for which there is no single best correct answer, but multiple and diverse perspectives/points of view are available on the web regarding the queried information.

The data is described in section 2. The experiments and results are described in sections 3 and 4 respectively, and we conclude in section 5.

2. CHIS TASK
There are two tasks:

A) Given a CHIS query and a document/set of documents associated with that query, classify the sentences in the document as relevant to the query or not. The relevant sentences are those from the document which are useful in providing the answer to the query.

B) These relevant sentences need to be further classified as supporting the claim made in the query or opposing it.

Example query: Are e-cigarettes safer than normal cigarettes?

S1: Because some research has suggested that the levels of most toxicants in vapor are lower than the levels in smoke, e-cigarettes have been deemed to be safer than regular cigarettes. A) Relevant, B) Support

S2: David Peyton, a chemistry professor at Portland State University who helped conduct the research, says that the type of formaldehyde generated by e-cigarettes could increase the likelihood it would get deposited in the lung, leading to lung cancer. A) Relevant, B) Oppose

S3: Harvey Simon, MD, Harvard Health Editor, expressed concern that the nicotine amounts in e-cigarettes can vary significantly. A) Irrelevant, B) Neutral

Five queries were provided, with 357 sentences across those queries. Performance is measured as the percentage accuracy of each task on each query, and the task-wise average over all five queries is used as the evaluation measure.

3. EXPERIMENTS
The experiments apply standard retrieval methods to identify relevant and irrelevant sentences (task A). To identify whether a sentence supports or opposes the claim (task B), a standard query expansion technique is used.

The experiments are carried out with the openly available Terrier toolkit [2]. They focus on how useful standard retrieval methods are for identifying relevance at the sentence level instead of the document level, and how they can help identify the supporting or opposing nature of sentences with respect to the hypothesis of the query. The BM25 model [1],[3] is used to identify relevant/non-relevant sentences, and TF-IDF [1] with query expansion is used to identify the supporting/opposing nature of the sentences.

Task A: Identify relevant/non-relevant sentences.
The sentences are indexed using Terrier and retrieval is performed against each query using the BM25 retrieval model. The retrieved sentences are marked as relevant for task A, and the others (non-retrieved sentences) are considered non-relevant to the query.

Task B: Identify the support/oppose/neutral nature of sentences.
The sentences are indexed using Terrier and the queries are run against the indexed sentences using the TF-IDF retrieval model with the Bo1 query expansion model, taking the top 5 sentences as feedback and 30 expansion terms. The sentences retrieved using query expansion which are identified as relevant according to task A are marked as supporting; the sentences which are not retrieved using query expansion but are retrieved in task A are marked as opposing, since they are still relevant to the query. All irrelevant sentences are considered neutral to the query.
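To make the two steps above concrete, the following is a minimal sketch of the pipeline in plain Python. It does not reproduce the Terrier runs reported here: BM25 scoring is hand-rolled, the Bo1 expansion model is approximated by simply taking the most frequent terms from the top feedback sentences, and "retrieved" is taken to mean a positive retrieval score. All class and function names are hypothetical and chosen only for illustration.

```python
# Minimal sketch of the two-step pipeline described above (not the Terrier setup
# used for the submitted runs). Assumes small, in-memory sentence collections.
import math
from collections import Counter


def tokenize(text):
    """Very simple whitespace tokenizer; Terrier's tokenization and stemming differ."""
    return [t for t in text.lower().split() if t.isalnum()]


class BM25:
    """Hand-rolled Okapi BM25 over a list of sentences."""

    def __init__(self, sentences, k1=1.2, b=0.75):
        self.docs = [tokenize(s) for s in sentences]
        self.N = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.N
        self.df = Counter(t for d in self.docs for t in set(d))
        self.k1, self.b = k1, b

    def score(self, query, idx):
        doc, tf, s = self.docs[idx], Counter(self.docs[idx]), 0.0
        for t in tokenize(query):
            if t not in tf:
                continue
            idf = math.log(1 + (self.N - self.df[t] + 0.5) / (self.df[t] + 0.5))
            norm = tf[t] + self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
            s += idf * tf[t] * (self.k1 + 1) / norm
        return s

    def rank(self, query):
        """Return (sentence index, score) pairs sorted by decreasing score."""
        return sorted(((i, self.score(query, i)) for i in range(self.N)),
                      key=lambda x: x[1], reverse=True)


def task_a(bm25, query):
    """Task A: sentences with a positive retrieval score are marked relevant."""
    return {i for i, s in bm25.rank(query) if s > 0}


def expand_query(bm25, query, fb_sentences=5, fb_terms=30):
    """Stand-in for Bo1 pseudo-relevance feedback: append the most frequent terms
    of the top feedback sentences to the query (top 5 sentences, 30 terms)."""
    top = [i for i, _ in bm25.rank(query)[:fb_sentences]]
    pool = Counter(t for i in top for t in bm25.docs[i])
    return query + " " + " ".join(t for t, _ in pool.most_common(fb_terms))


def task_b(bm25, query, relevant):
    """Task B: relevant sentences retrieved by the expanded query -> support;
    relevant but not retrieved by the expanded query -> oppose; the rest -> neutral."""
    retrieved_qe = {i for i, s in bm25.rank(expand_query(bm25, query)) if s > 0}
    return {i: ("support" if i in relevant and i in retrieved_qe
                else "oppose" if i in relevant
                else "neutral")
            for i in range(bm25.N)}
```

For example, building BM25 over the sentences associated with the e-cigarette query, calling task_a to obtain the relevance set, and passing that set to task_b yields the three-way support/oppose/neutral labelling used for task B.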
4. RESULTS
The percentage accuracy of the query-wise results obtained by the method described above is given in Table 1.

Query      Task A   Task B
Skincare   52.27    37.50
MMR        87.93    46.55
HRT        91.67    27.78
Ecig       54.69    46.88
Vitc       64.86    31.08
Overall    70.28    37.96

Table 1: Percentage accuracy for both tasks

Nine teams participated in task A and eight teams in task B. Figure 1 compares the overall percentage accuracy of our results with the maximum and the average over all teams.

Figure 1: Comparison with the maximum and average results

The results of task A are comparable to the average of all other systems, which means that standard information retrieval algorithms are sufficient to reach average results; for task B, however, standard information retrieval algorithms fail to achieve even average results. The standard algorithms are therefore not recommended for extracting supporting/opposing sentences, but they can certainly be used to extract relevant/non-relevant sentences.

5. CONCLUSION
This paper describes the results of standard information retrieval algorithms on complex medical queries for which multiple perspectives are available. Standard information retrieval algorithms give average results when identifying relevant/non-relevant sentences but below-average results when identifying supporting/opposing sentences. In task A, our results rank third among all participants.

6. REFERENCES
[1] I. Mogotsi. Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze: Introduction to Information Retrieval. Information Retrieval, 13(2):192–195, 2010.
[2] I. Ounis, G. Amati, V. Plachouras, B. He, C. Macdonald, and D. Johnson. Terrier information retrieval platform. In European Conference on Information Retrieval, pages 517–519. Springer, 2005.
[3] S. Robertson and H. Zaragoza. The Probabilistic Relevance Framework: BM25 and Beyond. Now Publishers Inc, 2009.