EnDSUM: Entropy and Diversity based Disaster Tweet Summarization

Piyush Kumar Garg¹, Roshni Chakraborty² and Sourav Kumar Dandapat³
¹ Indian Institute of Technology, Patna, Bihar, India
² Aalborg University, Denmark
³ Indian Institute of Technology, Patna, Bihar, India
piyush1_2021cs05@iitp.ac.in (P. K. Garg); roshnic@cs.aau.dk (R. Chakraborty); sourav@iitp.ac.in (S. K. Dandapat)
ORCID: 0000-0003-2266-9605 (P. K. Garg); 0000-0003-2043-2356 (S. K. Dandapat)

In: R. Campos, A. Jorge, A. Jatowt, S. Bhatia, M. Litvak (eds.): Proceedings of the Text2Story'22 Workshop, Stavanger (Norway), 10-April-2022

Abstract
The huge amount of information shared on Twitter during disaster events is utilized by government agencies and humanitarian organizations to ensure quick crisis response and provide situational updates. However, the huge number of tweets posted makes manual identification of the relevant tweets impossible. To address this information overload, there is a need to automatically generate a summary of all the tweets that highlights the important aspects of the disaster. In this paper, we propose an entropy and diversity based summarizer, termed EnDSUM, specifically for disaster tweet summarization. Our comprehensive analysis on 6 datasets indicates the effectiveness of EnDSUM and additionally highlights its scope of improvement.

Keywords: Entropy, Disaster tweets, Social media, Summarization

1. Introduction

Social media platforms, like Twitter, are highly important sources of information during disasters. For example, humanitarian organizations and government agencies rely on Twitter to identify relevant information on different categories, such as affected population, urgent need of resources, infrastructure damage, etc. [1]. However, the huge number of tweets posted and the high vocabulary diversity [2, 3] make it challenging to manually find the relevant information [4, 5]. In order to address this issue, several research works [6, 7] have proposed tweet summarization approaches specifically for disaster events.

Existing disaster tweet summarization approaches can be segregated into content based [6], graph based [7], deep learning based [8], and ontology based [9] approaches on the basis of the mechanism they follow. While content based approaches [6, 10] rely only on the importance of the words present in a tweet to determine its selection into the summary, deep learning based approaches [8] consider both the content and the contextual importance of the tweet. However, none of these approaches consider the vocabulary diversity and therefore fail to always ensure diversity in the summary and coverage of all the important categories present in the tweets. In order to address these issues, graph based approaches [7, 11] initially group similar tweets together by community detection algorithms, such that each group represents a category, thereby handling the vocabulary diversity, and then select representative tweets from each group to create the summary and ensure coverage. However, automatic community detection algorithms fail to segregate the tweets into different categories due to the vocabulary overlap among tweets of different categories.
Therefore, Garg et al. [9] initially identify the category of each tweet by an ontology based approach and then select tweets from each category to generate the summary. However, none of these approaches handle the vocabulary diversity simultaneously while selecting the tweets into the summary. For example, these existing approaches depend on identifying the categories initially, which leads to bad summaries, such as reduced diversity in the summary, if the categories are not identified correctly.

In order to resolve this, we propose EnDSUM, an entropy and diversity based disaster summarizer, where we automatically select into the summary the tweet that provides the best information coverage of all the tweets, i.e., entropy, and the most novel information, i.e., diversity. Therefore, EnDSUM can generate the summary automatically without explicitly identifying the category of a tweet. Although a few single and multiple document summarization approaches [12, 13, 14, 15, 16] have highlighted the relevance of entropy based selection of sentences into a summary, those approaches are not directly applicable to disaster tweets, owing to the informal structure of tweets, the absence of a storyline in tweets and the high vocabulary diversity in user generated tweets. Our evaluation of EnDSUM against existing state-of-the-art disaster tweet summarization approaches on 6 different disasters shows its high effectiveness on 5 datasets. However, we observe that the performance of EnDSUM degrades when there is considerable vocabulary overlap among tweets which belong to different categories of the same disaster event, as we consider only content based information for the calculation of entropy and diversity.

The organization of the paper is as follows. We discuss the problem definition and the proposed approach in Section 2, followed by the experiment details in Section 3 and conclusions in Section 4.

2. Proposed Approach

Given a disaster event, E, that consists of m tweets, T = {T_1, T_2, ..., T_m}, we aim to prepare a summary, S, by selecting l tweets from T such that it provides the maximum information coverage of T with minimum redundant information in the final summary. Therefore, we propose EnDSUM, where we iteratively select the tweet that can ensure maximum entropy over all the tweets and maximum diversity in the summary. While selection of the tweet with maximum entropy ensures information coverage of a category, selection of the tweet with maximum diversity ensures that multiple tweets from the same category are not selected [17, 18]. Although [17, 18] ensure maximization of diversity in the summary, they propose network stratification based approaches which require explicit grouping of similar tweets together by community detection to ensure maximum diversity. Moreover, [17, 18] are tweet summarization approaches for news events which are not directly applicable to disaster events, as community detection algorithms fail to group similar tweets in a disaster automatically [9]. Therefore, in EnDSUM, we propose an entropy and diversity based selection mechanism specific to tweets related to disaster events that does not require identification of similar groups and ensures better summary quality. At every iteration, we select the tweet, T*, which has the maximum score by Equation 1.
T^* = \arg\max_{T_i \in (T \setminus S')} \left( \alpha \cdot E(T_i, K) + \beta \cdot D(T_i, S') \right) \quad (1)

where E(T_i, K) represents the entropy of T_i and K is the list of tweets similar to T_i, where a tweet T_j is said to be similar to T_i if the content based cosine similarity P_ij between them is higher than 0 (as shown in [19]), and P_ij is the number of overlapping keywords between T_i and T_j normalized by the total number of overlapping keywords of T_i with any tweet. D(T_i, S') represents the information diversity provided by T_i with respect to the already selected tweets in the summary, S'. α and β are tunable parameters which represent the importance of E(T_i, K) and D(T_i, S') respectively. We set both α and β to 0.5 to provide equal importance to entropy and diversity.

Although there are several available mechanisms to calculate E(T_i, K), we rely on Karcı Entropy [19] for EnDSUM. Karcı Entropy can resolve the inherent vocabulary diversity in disaster tweets as it calculates the entropy of a tweet, E(T_i, K), by considering the similarity of T_i with the other tweets, as shown in Equation 2.

E(T_i, K) = \left| \sum_{j=1}^{|K|} P_{ij}^{\lambda} \log P_{ij}^{\lambda} \right|, \quad 0 < \lambda \quad (2)

where λ represents the importance of similarity. We set λ to 0.5 as highlighted by Hark et al. [19], who discuss that while a lower value of λ mostly considers the impact of the local effect of the keywords, a higher value considers the impact of the global effect. Furthermore, they observe that the Rouge-N score was maximum for the value 0.5 irrespective of the summary length, which we directly adopt as the value of λ in EnDSUM. As a future direction of EnDSUM, we intend to exhaustively experiment with and develop Karcı Entropy such that it is most suitable for disaster tweet summarization.

We calculate D(T_i, S') as (1 - Sim(T_i, S')), where Sim(T_i, S') represents the overlap in keywords between T_i and S', computed by Equation 3.

Sim(T_i, S') = \sum_{T_k \in S'} \frac{Overlap(T_i, T_k)}{Length(T_i)} \quad (3)

where Length(T_i) is the number of keywords of T_i. We follow Khan et al. [20] to identify the keywords of T_i as the nouns, verbs and adjectives present in T_i and, similarly, for S' we consider the distinct set of nouns, verbs and adjectives present in all the tweets of S'. Therefore, a lower Sim(T_i, S') ensures that T_i has minimum redundant content information with respect to the already generated summary, S', and a higher E(T_i, K) ensures that T_i has higher information coverage of the category.
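As an illustration of the selection procedure described above, the following is a minimal Python sketch of EnDSUM's greedy loop, assuming tweets have already been reduced to keyword sets (nouns, verbs and adjectives, following Khan et al. [20]). The function names, the set based keyword overlap, and the reading of Sim(T_i, S') as the fraction of T_i's keywords already covered by the distinct keyword set of the summary are illustrative assumptions, not the authors' implementation.

```python
import math

ALPHA, BETA, LAMBDA = 0.5, 0.5, 0.5  # entropy/diversity weights and similarity importance (Section 2)

def overlap(kw_a, kw_b):
    """Number of distinct keywords shared by two keyword sets."""
    return len(kw_a & kw_b)

def karci_entropy(i, keywords):
    """Entropy of tweet i following the form of Equation 2: P_ij is tweet i's keyword
    overlap with tweet j, normalized by tweet i's total keyword overlap with all other tweets."""
    total = sum(overlap(keywords[i], keywords[j]) for j in range(len(keywords)) if j != i)
    if total == 0:
        return 0.0
    entropy = 0.0
    for j in range(len(keywords)):
        if j == i:
            continue
        p = overlap(keywords[i], keywords[j]) / total
        if p > 0:  # K: only tweets with non-zero similarity to tweet i contribute
            entropy += (p ** LAMBDA) * math.log(p ** LAMBDA)
    return abs(entropy)

def diversity(i, summary, keywords):
    """D(T_i, S') = 1 - Sim(T_i, S'), with Sim read as the fraction of T_i's keywords
    already present in the distinct keyword set of the current summary (one reading of Equation 3)."""
    if not keywords[i]:
        return 0.0
    summary_kw = set().union(*(keywords[k] for k in summary)) if summary else set()
    return 1.0 - overlap(keywords[i], summary_kw) / len(keywords[i])

def endsum_summarize(keywords, length):
    """Greedy EnDSUM loop: at each iteration pick the unselected tweet that maximizes
    alpha * entropy + beta * diversity (Equation 1) until `length` tweets are chosen."""
    entropies = [karci_entropy(i, keywords) for i in range(len(keywords))]
    summary, remaining = [], set(range(len(keywords)))
    while remaining and len(summary) < length:
        best = max(remaining,
                   key=lambda i: ALPHA * entropies[i] + BETA * diversity(i, summary, keywords))
        summary.append(best)
        remaining.remove(best)
    return summary  # indices of the selected tweets, in selection order

# Example usage with toy keyword sets (hypothetical tweets):
# tweets_kw = [{"flood", "rescue", "team"}, {"bridge", "damage", "flood"}, {"donate", "relief", "fund"}]
# print(endsum_summarize(tweets_kw, 2))
```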
3. Experiments and Results

In this section, we provide details of the experiments and results. For the datasets, we consider the Los Angeles International Airport Shooting¹ (D1) provided by Olteanu et al. [21], Hurricane Matthew² (D2), Puebla Mexico Earthquake³ (D3), Pakistan Earthquake⁴ (D4) and Midwestern U.S. Floods⁵ (D5) provided by Alam et al. [22], and the Sandy Hook Elementary School Shooting⁶ (D6) provided by Dutta et al. [7]. As pre-processing, we perform lemmatization, convert the tweets to lower case, and remove Twitter specific keywords [23] and retweets. We consider the ground truth summaries provided by Garg et al. [9] for D1-D5 and by Dutta et al. [7] for D6. We compare EnDSUM with content based [24] (B1), graph based [7] (B2), sub-event based [25] (B3) and ontology based [9] (B4) disaster summarization approaches.

Results and Discussion: We evaluate the performance of EnDSUM and the existing approaches against the ground truth summary using the ROUGE-N [26] F1-score for N = 1, 2 and L. Our observations from Table 1 indicate that EnDSUM ensures a better ROUGE-N F1-score than all baselines for D2-D6. The improvement is highest over the B1 baseline and lowest over the B4 baseline. On D1, EnDSUM performs worse than B4 for all ROUGE-N scores and worse than B1 for ROUGE-2 and ROUGE-L. Therefore, although EnDSUM is highly effective in most scenarios, it sometimes fails to resolve the vocabulary overlap across different categories in a disaster, as seen for D1. To resolve this, we are working towards making EnDSUM resilient irrespective of the vocabulary diversity by considering semantic and contextual similarity along with the already considered content similarity for the entropy and diversity calculation.

Table 1
F1-scores of ROUGE-1, ROUGE-2 and ROUGE-L for EnDSUM and the baselines on the 6 datasets.

Dataset  Approach  ROUGE-1 F1  ROUGE-2 F1  ROUGE-L F1
D1       EnDSUM    0.55        0.21        0.27
D1       B1        0.49        0.22        0.29
D1       B2        0.48        0.18        0.25
D1       B3        0.52        0.21        0.23
D1       B4        0.56        0.23        0.29
D2       EnDSUM    0.52        0.17        0.24
D2       B1        0.48        0.13        0.22
D2       B2        0.47        0.14        0.22
D2       B3        0.44        0.12        0.22
D2       B4        0.49        0.15        0.23
D3       EnDSUM    0.52        0.14        0.26
D3       B1        0.45        0.13        0.23
D3       B2        0.46        0.14        0.24
D3       B3        0.44        0.14        0.23
D3       B4        0.48        0.16        0.25
D4       EnDSUM    0.51        0.16        0.24
D4       B1        0.20        0.04        0.20
D4       B2        0.47        0.14        0.21
D4       B3        0.45        0.11        0.21
D4       B4        0.50        0.15        0.23
D5       EnDSUM    0.52        0.13        0.24
D5       B1        0.19        0.04        0.18
D5       B2        0.48        0.10        0.20
D5       B3        0.50        0.12        0.22
D5       B4        0.51        0.13        0.22
D6       EnDSUM    0.55        0.27        0.44
D6       B1        0.53        0.26        0.33
D6       B2        0.52        0.22        0.29
D6       B3        0.48        0.20        0.27
D6       B4        0.51        0.20        0.29

¹ https://en.wikipedia.org/wiki/2013_Los_Angeles_International_Airport_shooting
² https://en.wikipedia.org/wiki/Hurricane_Matthew
³ https://en.wikipedia.org/wiki/2017_Puebla_earthquake
⁴ https://en.wikipedia.org/wiki/2019_Kashmir_earthquake
⁵ https://en.wikipedia.org/wiki/2019_Midwestern_U.S._floods
⁶ https://en.wikipedia.org/wiki/Sandy_Hook_Elementary_School_shooting
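As a point of reference for the scores in Table 1, the sketch below shows how a ROUGE-1 F1-score between a generated summary and a ground truth summary can be computed from unigram overlap. It is a simplified, self-contained illustration; the evaluation itself would rely on a standard ROUGE implementation [26], and the example strings are hypothetical.

```python
from collections import Counter

def rouge_1_f1(generated: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall between
    a generated summary and a reference (ground truth) summary."""
    gen_counts = Counter(generated.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: each unigram counts at most as often as it appears in the reference.
    hits = sum(min(count, ref_counts[token]) for token, count in gen_counts.items())
    if hits == 0:
        return 0.0
    precision = hits / sum(gen_counts.values())
    recall = hits / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

# Toy example (hypothetical summaries):
# print(rouge_1_f1("relief teams deployed after flood", "flood relief teams deployed in the city"))
```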
4. Conclusions and Future Works

In this paper, we propose a novel entropy and diversity based tweet summarizer, EnDSUM, for disaster events. Our experimental analysis on 6 disaster datasets indicates both the effectiveness of EnDSUM and its scope of improvement. For example, to handle the high vocabulary overlap among categories, we are working to include both semantic and contextual similarity while calculating entropy and diversity in EnDSUM. Furthermore, while most summarization algorithms generate a summary of predefined length, we intend to extend EnDSUM such that it provides complete information coverage of the disaster event, while maintaining diversity, automatically and without a predefined summary length. Currently, EnDSUM selects the most informative tweet into the summary at each iteration. As a future direction, we intend to modify EnDSUM such that it can select the best subset of tweets simultaneously as the summary.

References

[1] M. Imran, P. Mitra, C. Castillo, Twitter as a lifeline: Human-annotated twitter corpora for nlp of crisis-related messages, arXiv preprint arXiv:1605.05894 (2016).
[2] C. Castillo, Big crisis data: social media in disasters and time-critical situations, Cambridge University Press, 2016.
[3] R. Chakraborty, A. Kharat, A. Khatua, S. K. Dandapat, J. Chandra, Predicting tomorrow's headline using twitter deliberations, in: CIKM Workshops, 2018.
[4] S. Vieweg, C. Castillo, M. Imran, Integrating social media communications into the rapid assessment of sudden onset disasters, in: International Conference on Social Informatics, Springer, 2014, pp. 444–461.
[5] M. Imran, C. Castillo, F. Diaz, S. Vieweg, Processing social media messages in mass emergency: A survey, ACM Computing Surveys (CSUR) 47 (2015) 1–38.
[6] K. Rudra, S. Ghosh, N. Ganguly, P. Goyal, S. Ghosh, Extracting situational information from microblogs during disaster events: a classification-summarization approach, in: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, 2015, pp. 583–592.
[7] S. Dutta, V. Chandra, K. Mehra, A. K. Das, T. Chakraborty, S. Ghosh, Ensemble algorithms for microblog summarization, IEEE Intelligent Systems 33 (2018) 4–14.
[8] A. Dusart, K. Pinel-Sauvagnat, G. Hubert, Tssubert: Tweet stream summarization using bert, arXiv preprint arXiv:2106.08770 (2021).
[9] P. K. Garg, R. Chakraborty, S. K. Dandapat, Ontorealsumm: Ontology based real-time tweet summarization, arXiv preprint arXiv:2201.06545 (2022).
[10] K. Rudra, N. Ganguly, P. Goyal, S. Ghosh, Extracting and summarizing situational information from the twitter social media during disasters, ACM Transactions on the Web (TWEB) 12 (2018) 1–35.
[11] S. Dutta, S. Ghatak, M. Roy, S. Ghosh, A. K. Das, A graph based clustering technique for tweet summarization, in: 2015 4th International Conference on Reliability, Infocom Technologies and Optimization (ICRITO) (Trends and Future Directions), IEEE, 2015, pp. 1–6.
[12] A. Khurana, V. Bhatnagar, Investigating entropy for extractive document summarization, Expert Systems with Applications 187 (2022) 115820.
[13] G. Feigenblat, H. Roitman, O. Boni, D. Konopnicki, Unsupervised query-focused multi-document summarization using the cross entropy method, in: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017, pp. 961–964.
[14] S. Aji, R. Kaimal, Document summarization using positive pointwise mutual information, AIRCC's International Journal of Computer Science and Information Technology 4 (2012) 47–55.
[15] W. Luo, F. Zhuang, Q. He, Z. Shi, Effectively leveraging entropy and relevance for summarization, in: Asia Information Retrieval Symposium, Springer, 2010, pp. 241–250.
[16] G. Ravindra, N. Balakrishnan, K. Ramakrishnan, Multi-document automatic text summarization using entropy estimates, in: International Conference on Current Trends in Theory and Practice of Computer Science, Springer, 2004, pp. 289–300.
[17] R. Chakraborty, M. Bhavsar, S. K. Dandapat, J. Chandra, Tweet summarization of news articles: An objective ordering-based perspective, IEEE Transactions on Computational Social Systems 6 (2019) 761–777.
[18] R. Chakraborty, M. Bhavsar, S. Dandapat, J. Chandra, A network based stratification approach for summarizing relevant comment tweets of news articles, in: International Conference on Web Information Systems Engineering, Springer, 2017, pp. 33–48.
[19] C. Hark, A. Karcı, Karcı summarization: A simple and effective approach for automatic text summarization using karcı entropy, Information Processing & Management 57 (2020) 102187.
[20] M. A. H. Khan, D. Bollegala, G. Liu, K. Sezaki, Multi-tweet summarization of real-time events, in: 2013 International Conference on Social Computing, IEEE, 2013, pp. 128–133.
[21] A. Olteanu, S. Vieweg, C. Castillo, What to expect when the unexpected happens: Social media communications across crises, in: Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing, 2015, pp. 994–1009.
[22] F. Alam, U. Qazi, M. Imran, F. Ofli, Humaid: Human-annotated disaster incidents data from twitter with deep learning benchmarks, arXiv preprint arXiv:2104.03090 (2021).
[23] C. Arachie, M. Gaur, S. Anzaroot, W. Groves, K. Zhang, A. Jaimes, Unsupervised detection of sub-events in large scale disasters, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, 2020, pp. 354–361.
[24] K. Rudra, P. Goyal, N. Ganguly, M. Imran, P. Mitra, Summarizing situational tweets in crisis scenarios: An extractive-abstractive approach, IEEE Transactions on Computational Social Systems 6 (2019) 981–993.
[25] K. Rudra, P. Goyal, N. Ganguly, P. Mitra, M. Imran, Identifying sub-events and summarizing disaster-related information from microblogs, in: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 2018, pp. 265–274.
[26] C.-Y. Lin, Rouge: A package for automatic evaluation of summaries, in: Text Summarization Branches Out, 2004, pp. 74–81.