=Paper=
{{Paper
|id=Vol-1984/Mediaeval_2017_paper_31
|storemode=property
|title=Visual and Textual Analysis of Social Media and Satellite Images for Flood Detection @ Multimedia Satellite Task MediaEval 2017
|pdfUrl=https://ceur-ws.org/Vol-1984/Mediaeval_2017_paper_31.pdf
|volume=Vol-1984
|authors=Konstantinos Avgerinakis,Anastasia Moumtzidou,Stelios Andreadis,Emmanouil Michail,Ilias Gialampoukidis,Stefanos Vrochidis,Ioannis Kompatsiaris
|dblpUrl=https://dblp.org/rec/conf/mediaeval/AvgerinakisMAMG17
}}
==Visual and Textual Analysis of Social Media and Satellite Images for Flood Detection @ Multimedia Satellite Task MediaEval 2017==
Konstantinos Avgerinakis1, Anastasia Moumtzidou1, Stelios Andreadis1, Emmanouil Michail1, Ilias Gialampoukidis1, Stefanos Vrochidis1, Ioannis Kompatsiaris1

1 Centre for Research & Technology Hellas - Information Technologies Institute, Greece

koafgeri@iti.gr, moumtzid@iti.gr, andreadisst@iti.gr, michem@iti.gr, heliasgj@iti.gr, stefanos@iti.gr, ikom@iti.gr

Copyright held by the owner/author(s). MediaEval’17, 13-15 September 2017, Dublin, Ireland

ABSTRACT

This paper presents the algorithms that the CERTH team deployed in order to tackle disaster recognition tasks, and more specifically Disaster Image Retrieval from Social Media (DIRSM) and Flood Detection in Satellite Images (FDSI). Visual and textual analysis, as well as late fusion of their similarity scores, were deployed on social media images, while colour analysis in the RGB and near-infrared channels of satellite images was performed in order to discriminate flooded from non-flooded images. A Deep Convolutional Neural Network (DCNN), DBpedia Spotlight and combMAX were implemented to tackle DIRSM, while Mahalanobis distance-based classification and morphological post-processing were applied to deal with FDSI.

[Figure 1: Block diagram of our multimodal retrieval system]

1 INTRODUCTION

Security, surveillance and, more specifically, disaster prediction and classification from social media and satellite sources have raised a lot of interest in computer science over the last decade. The unobtrusive and abundant nature of these data has rendered them one of the most valuable sources for extracting early warnings or identifying an ongoing or imminent disaster.
The Multimedia Satellite task is a MediaEval challenge that comprises two subtasks: (a) Disaster Image Retrieval from Social Media (DIRSM) and (b) Flood Detection in Satellite Images (FDSI). DIRSM provides a great amount of social media images (YFCC100M dataset) and their metadata (Flickr), while FDSI comprises a large amount of 4-colour-channel satellite images (3 channels for the RGB spectrum and 1 for the near-infrared) from PlanetLabs [5]. Both tasks ask the participants to leverage any available technology in order to determine whether a flood event occurs in the provided test data. As far as visual data are concerned, a flood event is considered when an image shows an "unexpected high water level in industrial, residential, commercial and agricultural areas". The reader is referred to [1] for further information about the contest and the provided data.

In this work, CERTH presents its algorithms for the DIRSM and FDSI subtasks. For flood recognition in images, CERTH uses the output of the last pooling layer of a trained GoogleNet [4] for global keyframe representation and trains an SVM classifier to recognize images that are related to a flooding event. Textual information is also retrieved by leveraging the metadata of the social media images with the DBpedia Spotlight annotation tool [2]. Both modalities are fused with a novel multimodal approach which combines non-linear graph-based fusion [3] with combMAX scoring. For the FDSI subtask, CERTH performs Mahalanobis distance classification and applies several morphological and adaptive filters, so as to separate flood from non-flood areas inside the satellite image scene.

2 APPROACH

2.1 Flood detection from social media (DIRSM)

Social media were crawled in this task so as to acquire images and text about flood scenarios. For that purpose, two modalities were deployed and fused with a non-linear graph-based fusion approach.

The first modality concerned visual analysis and, more specifically, flood detection inside image samples by adopting a Deep Convolutional Neural Network (DCNN) framework. GoogleNet [4] was trained on 5055 ImageNet concepts, and the output of the last pooling layer, with dimension 1024, was used as a global keyframe representation. The provided development set was then split into two subsets and used to train an SVM classifier and define its optimal parameters: t (the kernel type) and g (gamma in the kernel function). The best results were achieved for t = 1 (polynomial kernel) and g = 0.5. The test environment that CERTH built included the evaluation of the precomputed features provided by the Multimedia Satellite challenge (i.e. acc, gabor, fcth, jcd, cedd, eh, sc, cl, and tamura) and of DCNN features that were produced from the Places205-GoogLeNet network by fusing the features from convolutional layers 3a and 3b. SVM classifiers were trained for all of these features, and the results showed that the proposed DCNN feature significantly outperformed most of them.
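Reduced to its essentials, the visual modality above can be sketched as follows with scikit-learn. The random arrays are only placeholders for the real 1024-dimensional pooling-layer descriptors and flood labels, which are not reproduced in this paper; the kernel and gamma values are the ones reported above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Placeholders for 1024-d last-pooling-layer DCNN descriptors of the dev set.
features = rng.normal(size=(200, 1024))
labels = rng.integers(0, 2, size=200)  # 1 = flood, 0 = no flood

# Split the development set into two subsets, as described above.
X_train, X_val, y_train, y_val = train_test_split(
    features, labels, test_size=0.5, random_state=0)

# t = 1 (polynomial kernel) and g = 0.5 were the best-performing parameters.
clf = SVC(kernel="poly", gamma=0.5, probability=True)
clf.fit(X_train, y_train)

# The probability of the "flood" class serves as the visual similarity score.
scores = clf.predict_proba(X_val)[:, 1]
```

The per-image probability scores can then be ranked to produce the visual retrieval list that enters the fusion stage.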
The second modality concerns the detection of flood-related text in social media metadata. For that purpose, DBpedia Spotlight [2] was adapted so as to detect flood, water and related keyphrases that were acquired from the training set metadata (i.e. title, description, user tags). A disambiguation algorithm followed, comparing the aforementioned phrases with the collection using Jaccard similarities.

The similarity scores of the two modalities were then combined with a late fusion approach that uses non-linear graph-based techniques (random walk, diffusion-based) in a weighted non-linear way [3]. The top-l multimodal objects are filtered with respect to textual concepts, leading to l × l similarity matrices S1, S2 and query-based l × 1 similarity vectors s1 and s2. More specifically, 10 positive examples were selected from the training set as queries so as to acquire 10 ranked lists, and combMAX late fusion was used to obtain the final list of relevant-to-the-flood multimodal objects. The overall block diagram of this approach is depicted in Fig. 1.
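The graph-based fusion of [3] is too involved for a short sketch, but the combMAX step that merges the 10 query-wise ranked lists can be illustrated as follows; the object identifiers and scores are invented for the example.

```python
# combMAX late fusion: each query run assigns a score to the retrieved
# objects; an object's fused score is its maximum over all runs.
def comb_max(runs):
    """runs: list of {object_id: score} dicts, one per query."""
    fused = {}
    for run in runs:
        for obj, score in run.items():
            fused[obj] = max(score, fused.get(obj, float("-inf")))
    # Final ranked list: objects sorted by fused score, best first.
    return sorted(fused, key=fused.get, reverse=True)

runs = [{"a": 0.2, "b": 0.9}, {"a": 0.7, "c": 0.4}]
print(comb_max(runs))  # ['b', 'a', 'c']
```

Taking the maximum rewards an object that is ranked highly by at least one of the 10 flood queries, which suits retrieval of a single target concept.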
2.2 Flood detection from satellite images (FDSI)

Satellite images were collected from PlanetLabs [5] so that we could evaluate our localization algorithm in real-case scenarios. Localization is based on a Mahalanobis classification framework and post-processing morphological operations.

Mahalanobis distances with stratified covariance estimates were computed to train our classifier by randomly selecting 10000 samples (RGB and near-infrared pixels) from each of the 7 sets of satellite images, leading to a final population of 70000 samples. Linear, diagonal linear, quadratic and diagonal quadratic discriminant functions were also computed, but Mahalanobis distances achieved the highest classification results.
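A minimal sketch of a two-class Mahalanobis-distance pixel classifier of this kind is given below; synthetic 4-channel (R, G, B, NIR) samples stand in for the real satellite training pixels, and the stratified covariance estimation is simplified to one covariance per class.

```python
import numpy as np

def mahalanobis_classifier(train_flood, train_nonflood):
    """Fit per-class mean and inverse covariance on 4-d (R,G,B,NIR) samples."""
    stats = []
    for cls in (train_nonflood, train_flood):  # index 0 = non-flood, 1 = flood
        mu = cls.mean(axis=0)
        inv_cov = np.linalg.inv(np.cov(cls, rowvar=False))
        stats.append((mu, inv_cov))

    def predict(pixels):
        # Squared Mahalanobis distance of every pixel to each class centre.
        dists = []
        for mu, inv_cov in stats:
            d = pixels - mu
            dists.append(np.einsum("ij,jk,ik->i", d, inv_cov, d))
        return np.argmin(np.stack(dists), axis=0)  # nearest class wins

    return predict

rng = np.random.default_rng(1)
nonflood = rng.normal(0.0, 1.0, size=(10000, 4))
flood = rng.normal(5.0, 1.0, size=(10000, 4))
predict = mahalanobis_classifier(flood, nonflood)
mask = predict(np.vstack([nonflood[:5], flood[:5]]))
```

Applied to every pixel of a 320 × 320 test image, `predict` yields the binary flood mask described in the next paragraph.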
For every image of the testing set, all pixels of the image were extracted, creating a four-dimensional (R, G, B, NIR) testing set consisting of 102400 samples (320 × 320 pixels) per image. The final outcome was a binary mask that denoted 1 for flooded pixels and 0 for non-flooded ones.

Post-processing was then deployed on the acquired binary masks in order to eliminate erroneous areas that resulted from the noisy nature of the dataset. A global filter was initially applied on the binary mask so as to eliminate flood-denoted pixels whose population as a whole did not surpass 5% of the image size. Similarly, a local filter followed so as to eliminate connected components of flood-denoted areas that did not surpass the size of 10 pixels. Image dilation and erosion were finally applied around each pixel and its surrounding area (a circular cell with a radius of 4 pixels) to eliminate small areas that were falsely denoted as flood, while preserving the larger ones.
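This post-processing chain can be approximated with SciPy as sketched below; the exact filter sequence is the one described above, but a single erosion/dilation pass with a radius-4 disk is assumed here as a simplification.

```python
import numpy as np
from scipy import ndimage

def postprocess(mask, global_frac=0.05, min_component=10, radius=4):
    """Clean a binary flood mask (True = flood) as described above."""
    mask = mask.astype(bool)
    # Global filter: drop all detections if they cover < 5% of the image.
    if mask.sum() < global_frac * mask.size:
        return np.zeros_like(mask)
    # Local filter: remove connected components smaller than 10 pixels.
    labels, n = ndimage.label(mask)
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    keep = np.isin(labels, 1 + np.flatnonzero(sizes >= min_component))
    # Circular structuring element with a radius of 4 pixels.
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    disk = xx**2 + yy**2 <= radius**2
    # Erosion then dilation removes small false areas, preserves large ones.
    return ndimage.binary_dilation(ndimage.binary_erosion(keep, disk), disk)

mask = np.zeros((320, 320), dtype=bool)
mask[50:200, 50:200] = True   # large flooded region: survives
mask[300, 300] = True         # isolated noisy pixel: removed
cleaned = postprocess(mask)
```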
3 RESULTS AND ANALYSIS

Social media results for flood situations (DIRSM) are gathered in Table 1. Two retrieval approaches were used: (a) a single cutoff scheme that returns the top-480 most similar samples, and (b) a multiple cutoff scheme that combines the results from 4 different thresholds, equal to 50, 100, 250 and 480, by averaging their scores so as to conclude to a final list.

Table 1: CERTH results in DIRSM task

Modalities | single cutoff | several cutoffs
Visual     | 78.82%        | 92.27%
Textual    | 36.15%        | 39.90%
Fusion     | 68.57%        | 83.37%

It is obvious that multiple cutoffs worked better than a single one. Furthermore, we can observe that the visual modality surpassed the textual one by far; this is mainly attributed to the fact that some keywords related to flood and water may be found in several irrelevant contexts, leading text retrieval to very low accuracy rates. Fusion is also affected by the low performance of the textual modality, which cannot leverage or complement the visual information in the final deduction, leading to lower accuracy rates than the visual modality achieves on its own.

Results on satellite images (FDSI) are presented in Table 2.

Table 2: CERTH results in FDSI task

loc01  | loc02  | loc03  | loc04  | loc05  | loc06  | loc07
81.71% | 68.33% | 82.08% | 47.01% | 45.84% | 64.92% | 56.27%

The accuracy rates are quite diverse among locations: we acquired very high rates in some locations, such as loc01 and loc03, while others, such as loc04 and loc05, were too low. From our point of view, this is attributed to the colour nature of the data in these areas: in the former, the separation of water was clear, while in the latter, non-flood areas had a colour similar to the flooded ones. Furthermore, the groundtruth masks marked some non-flood pixels as flood, and since our algorithm is pixel-wise, these were misclassified as positive samples, leading to poorer performance. Overall, our classifier led to a 74.67% localization accuracy rate.

4 DISCUSSION AND OUTLOOK

The Multimedia Satellite challenge gave us the opportunity to test our algorithms in real-case disaster scenarios. Social media and satellite sources proved extremely valuable and helped us separate flood scenarios from others. The high average precision rate that the visual features achieved proves that the computer vision community can become ever more helpful in disaster detection, and it is now clear that it can surpass the ambiguity that text can introduce into the decision. On the other hand, satellite images proved quite noisy and require deeper investigation in the future.

As future work, we plan to adopt deeper techniques that exist in the literature to recognize and discriminate places from each other, while we also plan to investigate hybrid representations that combine shallow with deep features so as to achieve even higher precision rates in the visual part of the system. Text approaches should undoubtedly be revised and tailored to disaster-related scenarios, while fusion approaches that consider "semantic filtering" stages based on textual concepts will also be revised. Regarding FDSI, we plan to build a shallow/deep representation scheme that will leverage both texture (i.e. LBP) and deep features so as to learn to separate flood from non-flood areas even more effectively.

ACKNOWLEDGMENTS

This work is supported by the beAWARE project, partially funded by the European Commission (H2020-700475).

REFERENCES

[1] Benjamin Bischke, Patrick Helber, Christian Schulze, Srinivasan Venkat, Andreas Dengel, and Damian Borth. 2017. The Multimedia Satellite Task at MediaEval 2017: Emergency Response for Flooding Events. In Proc. of the MediaEval 2017 Workshop (Sept. 13-15, 2017), Dublin, Ireland.
[2] Joachim Daiber, Max Jakob, Chris Hokamp, and Pablo N. Mendes. 2013. Improving Efficiency and Accuracy in Multilingual Entity Extraction. In Proceedings of the 9th International Conference on Semantic Systems (I-Semantics).

[3] Ilias Gialampoukidis, Anastasia Moumtzidou, Dimitris Liparas, Theodora Tsikrika, Stefanos Vrochidis, and Ioannis Kompatsiaris. 2017. Multimedia retrieval based on non-linear graph-based fusion and partial least squares regression. Multimedia Tools and Applications (2017), 1-21.

[4] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott E. Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going deeper with convolutions. In CVPR. IEEE Computer Society, 1-9. http://dblp.uni-trier.de/db/conf/cvpr/cvpr2015.html#SzegedyLJSRAEVR15

[5] Planet Team. 2017. Planet Application Program Interface: In Space for Life on Earth. (2017).