=Paper=
{{Paper
|id=Vol-2696/paper_159
|storemode=property
|title=Query-focused biomedical text summarization in BioASQ 8B
|pdfUrl=https://ceur-ws.org/Vol-2696/paper_159.pdf
|volume=Vol-2696
|authors=Jainisha Sankhavara,Prasenjit Majumder
|dblpUrl=https://dblp.org/rec/conf/clef/SankhavaraM20
}}
==Query-focused biomedical text summarization in BioASQ 8B==
Query-focused biomedical text summarization in BioASQ 8B Jainisha Sankhavara1[0000−0001−7460−1587] and Prasenjit Majumder1 Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, India {jainishasankhavara,prasenjit.majumder}@gmail.com Abstract. This paper presents query-sentence matching based and UMLS query-graph based summarization techniques for query-specific biomedical text summarization. The query-specific graphs, constructed using UMLS entities and relations, are used for matching the sentences. The core idea is to find candidate biomedical entities for query expansion which are semantically connected. The graph represents these connections and it was automatically constructed using UMLS knowledge source and biomedical text. The results of the proposed techniques experimented on previous BioASQ dataset are better as compared to the results of baseline techniques. The same techniques are applied on task 8B dataset for ideal answer generation and submitted to BioASQ8. These submitted results gave the highest scores among all participants’ submissions for automatic evaluation scores(ROUGE-2 Recall and ROUGE-SU4 Recall). Keywords: Biomedical text summarization · UMLS · Query-focused summarization. 1 Introduction Biomedical text and information on the web are growing exponentially nowadays. Text summarization attempts to provide the users with a summarized version of the text with maximum information content in a compact, quick and intelligible way. In recent years, substantial research has been conducted to develop and evaluate various summarization techniques in the biomedical domain. Recent research has focused on a hybrid technique comprising statistical, language processing and machine learning techniques [10]. Automatic text summarization of biomedical text is a promising method for helping clinicians and researchers to efficiently obtain and understand any topic by producing a summary from one or multiple documents. The goal of text summarization is to present a subset of the source text, which expresses the most important points with minimal redundancy. Thus, text summarization Copyright c 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CLEF 2020, 22-25 September 2020, Thessaloniki, Greece. may become an important tool to assist clinicians and researchers with their information and knowledge management tasks. Sometimes, there may exist a query for which the user seeks information and sometimes it may not. In the case of query-focused summarization, the generated summary should have the answer to the query in it. It is a very usual scenario that users want exact answers along with some related details in case of their medical related queries. Therefore, we are focusing here on query-focused biomedical multi-document summarization which will be helpful to clinicians and all other users who are seeking elaborated answers to their medical related queries. BioASQ1 organizes challenges which include biomedical semantic indexing and question answering. The question answering task uses benchmark datasets containing development and test questions, in English, along with gold standard (reference) answers constructed by a team of biomedical experts. The partici- pants have to respond with various types of answers. Specifically, task B has questions with their related documents and snippets for which exact answers and ideal answers need to be generated. Here we focus on generating ideal an- swers for the questions. The ideal answers are paragraph sized summaries with multiple sentences. We are focusing on generating ideal answers using extractive summarization on available snippets. The remainder of this paper is presented as follows: Section 2 shows the related works. Section 3 describes the baseline methods and the proposed methods for query-focused biomedical text summarization. Sections 4 presents the experiments and results with analysis. Finally, section 6 concludes it. 2 Related work A lot of research has been carried out in the field of biomedical text summariza- tion. A recent survey on the research in text summarization in the biomedical domain highlights that natural language processing and hybrid techniques were prominently used for summarization of multiple documents [10]. The graph-based summarization using named-entities has been presented as EntityRank algorithm which considers information about named entities in the process of multi-document graph-based summarization [17]. Their results show that the addition of named-entity information increases the performance of graph-based summarizers in the biomedical domain. [11] studied different feature selection approaches for identifying important concepts in a biomedical text and showed that the concept based summarization method outperforms other frequency-based, domain-independent and baseline methods. Query based biomedical text summarization techniques that rely on external ontology knowledge resource UMLS are proposed in the literature [7, 5, 4, 14, 18, 12, 3]. The ontology-based method of biomedical text summarization performed better when compared to keyword-only methods. [16] observed that an approach for query-focused summarization of medical text based on target-sentence-specific 1 http://bioasq.org/ and target-sentence-independent statistics along with domain-specific features outperforms other baseline and benchmark summarization systems. Text summarization approaches often rely on the similarity measure to model the text documents. [1] has studied the impact of the similarity measure on the performance of the summarization methods in the biomedical domain and found that exploiting both biomedical concepts and semantic types improvises the quality of summaries. Here we propose an approach for query-specific biomedical text summarization which uses ontology knowledge source UMLS [2] to generate a graph of candidate biomedical entities from the query and their semantically connected entities. The importance values of the entities in the query graph are then incorporated in the similarity measure using statistics from the dataset for selecting sentences. 3 Methods This section describes baseline summarization methods, query sentence matching based method, modified query sentence matching based method using UMLS query graph and modified lexrank using UMLS query-graph. 3.1 Baselines Two basic approaches of summarization TextRank [9] and LexRank [6] are used as baselines. In both TextRank and LexRank, a graph is constructed with vertex as each sentence in the document. The edges between sentences are based on some form of semantic similarity or content overlap. TextRank uses a very similar measure based on the number of words two sentences have in common while LexRank uses cosine similarity of TF-IDF vectors. tfw,s1 tf w,s2 (idfw )2 P w∈s1 ,s2 sim(s1 , s2 ) = r P r P (1) (tfw,s1 idfw )2 (tfw,s2 idfw )2 w∈s1 w∈s2 where tfw,si is the number of occurrences of the word w in the sentences si . In the graph, edges were formed between the sentences having similarity greater than the threshold. In both algorithms, the sentences are ranked by applying PageRank to the resulting graph. A summary is formed by combining the top ranking sentences, using a length cutoff to limit the size of the summary. 3.2 UMLS query graph based lexrank The UMLS query-graph based lexrank is a modified version of lexrank which uses query-specific graphs generated using UMLS to get the importance of words, matches sentences using weighted cosine similarity measure, generates a graph of sentences and then applies pagerank on the graph. Query-specific graph has been generated with the use of UMLS entities and relations as described in [15]. From the query, UMLS concepts are extracted and represented as nodes in the graph. Along with concepts, UMLS also contains relations between entities. These relations for query concepts are used to expand nodes. Each query node gets expanded by its related UMLS concepts considering all types of relations within UMLS. After the node expansion, the expanded graph contains all the related concepts as nodes and relations as edges. Two nodes in the graph can have an edge between them if and only if those two entities have some relation in UMLS. There are various types of relations present in UMLS and all types of relations are used. There can be some isolated nodes in the graph when any query concept is not related to any other query concept or it does not have any common related concept with another query concept. The graph is further refined by assigning weights to the edges and removing some of the edges in the graph. The edge weights are calculated based on the co-occurrence value of entities in the text to be summarized. For any edge between two entities, the co-occurrence value of two entities is used as the weight for that edge. The edges whose edge weights are less than some threshold are removed from the graph. Less edge weight for an edge between two entities means those two entities rarely occur together and hence share very less or no context. In the refined graph, the nodes are weighted using PageRank [13]. Node weight represents the importance of that node in the graph i.e. importance of that entity in the graph for that particular query. The main difference with lexrank method is UMLS query-graph weighted cosine similarity. The later processing is same as it is in lexrank. The UMLS query graph based weighted cosine similarity is: (tfw,s1 +Ww )(tf w,s2 +Ww )(idfw )2 P w∈s1 ,s2 sim(s1 ,s2 )= s s P ((tfw,s +Ww )idfw )2 ((tfw,s +Ww )idfw )2 P 1 2 w∈s1 w∈s2 where, Ww = importance of concept w from query-graph, if w is in query-graph = 0, otherwise tfw,q and tfw,s are the number of occurrences of the word w in query q and sentence s, respectively. idfw is the inverse of the number of sentences in which word w is present. The formula incorporates the weights of the extended query terms from query graph in the tf-idf vectors of every sentence containing those terms. 3.3 Query-Sentence matching The Query Sentence Matching (QSM) based summarization method compares all the sentences with the query and takes top similar sentences to query as summary. The queries and all the sentences in snippets are represented by vectors of tf-idf values of words in the sentences. The similarity measure used to match query vector and sentence vector is cosine similarity as given by equation 1. The only difference here is that the similarity is calculated between query and a sentence instead of similarity between two sentences. 3.4 UMLS query graph based query-sentence matching The UMLS querygraph QSM summarization method is a modified version of QSM which uses query-specific graphs generated using UMLS to get the importance of words. For each query, it generates a query-specific graph as described in [15]. This method uses concepts identified using graph based method along with weights. The weights are incorporated in the similarity measure while ranking the sentences for summary. The UMLS query-graph based cosine similarity between query and sentences are calculated using the following formula: P (tfw,q idfw +Ww,q )(tfw,s idfw ) w∈q,s sim(q, s) = r P rP (tfw,q idfw +Ww,q )2 (tfw,s idfw )2 w∈q w∈s where, Ww,q = importance of concept w from query-graph of q, if w is in query-graph = 0, otherwise tfw,q and tfw,s are the number of occurrences of the word w in query q and sentence s, respectively. idfw is the inverse of the number of sentences in which word w is present. Here, the weights are only considered for query vector. They are not incorpo- rated in sentence vectors unlike UMLS graph based lexrank. The intuition for updating only query vector was to see it as an query expansion procedure for query-focused text summarization. 4 Experiments and Results This section describes the experiments performed along with their results. For our experiments, we have used the dataset of BioASQ task 5B phase B and BioASQ task 8B phase B. BioASQ task 5B phase B dataset is used as a benchmark dataset which contains various questions in English, along with gold standard (reference) answers constructed by a team of biomedical experts. The test dataset has five different batches, each containing 100 questions. For each question, the relevant snippets are given and the ideal answer for that question needs to be generated. The ideal answers are paragraph sized summaries so it’s a case of multi-document summarization on relevant snippets. The evaluation is done using The ROUGE [8] measures: ROUGE-2 Recall, ROUGE-2 F-measure, ROUGE-SU4 Recall and ROUGE-SU4 F-measure. The same methods are applied on BioASQ task 8B phase B batch 5 dataset and the runs were submitted to BioASQ8 challenge. 4.1 Results on BioASQ 5B The results on all 5 batches of BIOASQ 5B phase B dataset are presented here. Table 1 and table 2 show a comparison of summarization methods (described in previous section) in terms of ROUGE-2 Recall and ROUGE-2 F-measure, respectively. Table 3 and table 4 shows a comparison of summarization methods in terms of ROUGE-SU4 Recall and ROUGE-SU4 F-measure, respectively. Table 1. ROUGE-2 Recall results on BIOASQ task 5B dataset. Bold represents highest results and * represents statistically significant diffrence with p < 0.05 when compared to baseline lexrank. Batch 1 Batch 2 Batch 3 Batch 4 Batch 5 textrank 0.5188 0.5322 0.6179 0.6169 0.5760 lexrank 0.5716 0.5618 0.6256 0.6150 0.6160 lexrank UMLS querygraph 0.5793 0.5542 0.6278 0.6092 0.6373* QSM 0.5395 0.5193 0.5828 0.5697 0.5514 UMLS querygraph QSM 0.5447 0.5127 0.5895 0.5776 0.5689 Table 2. ROUGE-2 F-measure results on BIOASQ task 5B dataset. Bold represents highest results and * represents statistically significant difference with p < 0.05 when compared to baseline lexrank. Batch 1 Batch 2 Batch 3 Batch 4 Batch 5 textrank 0.1984 0.1857 0.2089 0.2491 0.2185 lexrank 0.2305 0.2051 0.2321 0.2607 0.2456 lexrank UMLS querygraph 0.2324 0.2043 0.2346 0.2562 0.2522* QSM 0.2195 0.1992 0.2158 0.2516 0.2172 UMLS querygraph QSM 0.2200 0.1962 0.2174 0.2501 0.2205 Table 3. ROUGE-SU4 Recall results on BIOASQ task 5B dataset. Bold represents highest results. Batch 1 Batch 2 Batch 3 Batch 4 Batch 5 textrank 0.5419 0.5581 0.6248 0.6345 0.5801 lexrank 0.5887 0.5878 0.6358 0.6267 0.6169 lexrank UMLS querygraph 0.5951 0.5786 0.6384 0.6234 0.6360 QSM 0.5580 0.5432 0.5938 0.5808 0.5628 UMLS querygraph QSM 0.5607 0.5399 0.5988 0.5937 0.5785 Table 4. ROUGE-SU4 F-measure results on BIOASQ task 5B dataset. Bold represents highest results. Batch 1 Batch 2 Batch 3 Batch 4 Batch 5 textrank 0.1958 0.1804 0.2038 0.2419 0.2114 lexrank 0.2253 0.2013 0.2279 0.2518 0.2384 lexrank UMLS querygraph 0.2270 0.1999 0.2305 0.2480 0.2439 QSM 0.2154 0.1935 0.2126 0.2437 0.2117 UMLS querygraph QSM 0.2152 0.1917 0.2143 0.2428 0.2148 4.2 Discussion The results show that UMLS querygraph QSM gives an improvement over QSM. The method lexrank UMLS querygraph gives an improvement over lexrank for batch 1,3 and 5 of the dataset. For the other two batches, the results are comparable. For batch 5, the ROUGE-2 Recall and ROUGE-2 F-measure results of lexrank UMLS querygraph are statistically significantly better than lexrank. Fig. 1. query wise change in lexrank UMLS querygraph with respect to baseline lexrank and distribution of types of the queries The graphs in the first row of fig. 1 shows the query type wise change in the results of lexrank UMLS querygraph as compared to lexrank for every batch of the data while the second row shows the batch wise distribution of the queries based on their types. From the graphs, we can say that the ’yesno’ type of questions are getting improved in all batches (considering batch 2 where it is showing zero change: no improvement and no deterioration). The graph of batch 5 indicates that the major part of effectiveness of the method lexrank UMLS querygraph comes from the improvements in ’factoid’ and ’yesno’ type of queries with a small contribution from ’summary’ types of the queries. For batch 2 and 4 where lexrank UMLS querygraph failed, decrements in ’factoid’, ’list’ and ’summary’ type of queries must be the reason. 4.3 Results on BioASQ 8B Table 5 shows the results of submitted runs for BioASQ task 8B phase B batch 5 using the techniques described in section 3. Surprisingly, for BioASQ 8B batch 5, simple QSM approach outperformed textrank, lexrank and UMLS querygraph based approaches. Table 5. BioASQ task 8B Phase B Ideal answer generation results of batch 5 System R-2 Recall R-2 F-measure R-SU4 Recall R-SU4 F-measure DAIICT QSM 0.6646 0.3468 0.6603 0.3306 DAIICT text 0.6627 0.3425 0.6587 0.3261 DAIICT lex 0.6431 0.3351 0.6399 0.3207 DAIICT lex UMLSgraph 0.6411 0.3332 0.6382 0.3190 DAIICT QSM UMLSgraph 0.6411 0.3332 0.6382 0.3190 Among all participants’ submitted runs, these five submitted runs appeared to be top five(DAIICT QSM being the highest) for ROUGE-2 Recall and ROUGE- SU4 Recall. For ROUGE-2 F-measure, DAIICT QSM is second highest consid- ering all participants’ runs and it is the third highest in case of ROUGE-SU4 F-measure. 5 Conclusion This paper presents query sentence matching based summarization techniques and UMLS query graph based summarization techniques which were submitted to BioASQ8 challenge for ideal answer generation in task B on biomedical seman- tic question answering. These techniques incorporate weights of the candidate biomedical entities from queries and their semantically related entities identified by UMLS and do the matching using tf-idf vectors. The results of the proposed techniques on BioASQ 5B phase B dataset are compared with baselines textrank and lexrank. The analysis shows that the UMLS query graph based method gives comparable results with the baselines and helps to improve ’yesno’ type of questions. The results of these techniques on BioASQ task 8B phase B batch 5 dataset were the highest among all participants where simple QSM approach outperformed UMLS graph based QSM as well as UMLS graph based lexrank. References 1. Azadani, M.N., Ghadiri, N.: Evaluating different similarity measures for automatic biomedical text summarization. In: International Conference on Intelligent Systems Design and Applications. pp. 305–314. Springer (2017) 2. Bodenreider, O.: The unified medical language system (umls): integrating biomedical terminology. Nucleic acids research 32(suppl 1), D267–D270 (2004) 3. Cao, Y., Liu, F., Simpson, P., Antieau, L., Bennett, A., Cimino, J.J., Ely, J., Yu, H.: Askhermes: An online question answering system for complex clinical questions. Journal of biomedical informatics 44(2), 277–288 (2011) 4. Chen, P., Verma, R.: A query-based medical information summarization system using ontology knowledge. In: 19th IEEE Symposium on Computer-Based Medical Systems (CBMS’06). pp. 37–42. IEEE (2006) 5. Elhadad, N., Kan, M.Y., Klavans, J.L., McKeown, K.R.: Customization in a unified framework for summarizing medical literature. Artificial intelligence in medicine 33(2), 179–198 (2005) 6. Erkan, G., Radev, D.R.: Lexrank: Graph-based lexical centrality as salience in text summarization. Journal of artificial intelligence research 22, 457–479 (2004) 7. Fiszman, M., Rindflesch, T.C., Kilicoglu, H.: Abstraction summarization for manag- ing the biomedical research literature. In: Proceedings of the HLT-NAACL workshop on computational lexical semantics. pp. 76–83. Association for Computational Lin- guistics (2004) 8. Lin, C.Y.: Rouge: A package for automatic evaluation of summaries. In: Text summarization branches out. pp. 74–81 (2004) 9. Mihalcea, R., Tarau, P.: Textrank: Bringing order into text. In: Proceedings of the 2004 conference on empirical methods in natural language processing. pp. 404–411 (2004) 10. Mishra, R., Bian, J., Fiszman, M., Weir, C.R., Jonnalagadda, S., Mostafa, J., Del Fiol, G.: Text summarization in the biomedical domain: a systematic review of recent research. Journal of biomedical informatics 52, 457–467 (2014) 11. Moradi, M., Ghadiri, N.: Different approaches for identifying important concepts in probabilistic biomedical text summarization. Artificial intelligence in medicine 84, 101–116 (2018) 12. Morales, L.P., Esteban, A.D., Gervás, P.: Concept-graph based biomedical automatic summarization using ontologies. In: Proceedings of the 3rd textgraphs workshop on graph-based algorithms for natural language processing. pp. 53–56. Association for Computational Linguistics (2008) 13. Page, L., Brin, S., Motwani, R., Winograd, T.: The pagerank citation ranking: Bringing order to the web. Tech. rep., Stanford InfoLab (1999) 14. Reeve, L., Han, H., Brooks, A.D.: Biochain: lexical chaining methods for biomedical text summarization. In: Proceedings of the 2006 ACM symposium on Applied computing. pp. 180–184. ACM (2006) 15. Sankhavara, J., Dave, R., Dave, B., Majumder, P.: Query specific graph-based query reformulation using umls for clinical information access. Journal of Biomedical Informatics p. 103493 (2020) 16. Sarker, A., Mollá, D., Paris, C.: An approach for query-focused text summarisation for evidence based medicine. In: Conference on Artificial Intelligence in Medicine in Europe. pp. 295–304. Springer (2013) 17. Schulze, F., Neves, M.: Entity-supported summarization of biomedical abstracts. In: Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM2016). pp. 40–49 (2016) 18. Shi, Z., Melli, G., Wang, Y., Liu, Y., Gu, B., Kashani, M.M., Sarkar, A., Popowich, F.: Question answering summarization of multiple biomedical documents. In: Con- ference of the Canadian Society for Computational Studies of Intelligence. pp. 284–295. Springer (2007)