=Paper=
{{Paper
|id=Vol-2670/MediaEval_19_paper_47
|storemode=property
|title=AI-Based Flood Event Understanding and Quantifying Using Online Media and Satellite Data
|pdfUrl=https://ceur-ws.org/Vol-2670/MediaEval_19_paper_47.pdf
|volume=Vol-2670
|authors=Mirko Zaffaroni,Laura López Fuentes,Alessandro Farasin,Paolo Garza,Harald Skinnemoen
|dblpUrl=https://dblp.org/rec/conf/mediaeval/ZaffaroniLFGS19
}}
==AI-Based Flood Event Understanding and Quantifying Using Online Media and Satellite Data==
AI-based flood event understanding and quantification using online media and satellite data

Mirko Zaffaroni1,4,*, Laura Lopez-Fuentes2,5,*, Alessandro Farasin3,4,*, Paolo Garza3, Harald Skinnemoen5

1 University of Turin, Italy; mirko.zaffaroni@unito.it
2 University of the Balearic Islands, Spain; l.lopez@uib.es
3 Politecnico di Torino, Italy; {name.surname}@polito.it
4 LINKS Foundation, Italy; {name.surname}@linksfoundation.com
5 AnsuR Technologies, Norway; {name}@ansur.no
* These authors contributed equally to this work.

Copyright 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). MediaEval’19, 27-29 October 2019, Sophia Antipolis, France.

ABSTRACT

In this paper we study the problem of flood detection and quantification using online media and satellite data. We present three approaches: two based on neural networks and a third based on the combination of different bands of satellite images. This work aims to detect floods and to provide relevant information about the flood situation, such as the water level and the extent of the flooded regions, as specified in the three subtasks, for each of which we propose a specific solution.

1 INTRODUCTION

The frequency and intensity of natural disasters have risen significantly due to climate change. Flood events alone represent about 39% of the natural disasters occurring worldwide. During this type of natural disaster it is important for emergency responders to have as much information as possible about the magnitude of the disaster, the areas affected, and the situation and location of people in danger. To extract this information we consider two sources: online news articles and satellite spectral imagery. Thanks to rapid internet access, online news carries information about natural disasters in almost real time, while satellite spectral imagery can provide information on the extent of the flood. Using these two sources, we propose approaches for flood event understanding and quantification:

• An algorithm that determines whether an image extracted from an online news article contains relevant information about the flood: for example, images of the flood itself, but also images of emergency responders, people in danger, etc.
• An algorithm that, given an image extracted from online news, determines whether there is water in the image and, if so, whether the water level is above or below the knee level of the people in the scene, if there are any. It also contemplates the use of the news text as additional data for inference.
• An algorithm that, given spectral imagery from satellites, segments the water regions of the images and gives a flood/no-flood prediction and an estimate of the flood extent.

This work has been done in the context of MediaEval 2019, as a participation in the Multimedia Satellite task. Detailed information about the task and data can be found in [3].

2 RELATED WORK

Emergency prevention, detection, assistance, and understanding through computer vision and image processing techniques has been an open problem since the early stages of the field [13]. In the flood detection domain in particular, scientific work mostly focuses on flood detection either in social media or in satellite data [1, 2, 4]. Among the latter, several approaches in the literature exploit spectral bands and other sensor measurements [9, 14–16, 19] to retrieve suitable indicators.

This work builds on top of our multi-modal deep learning approach for flood detection [12], which used social media images together with their metadata to determine whether a social media post contained visual information about a flood, and on our deep learning models for passability detection of flooded roads [5, 11], which went a step further and provided information about the state of the roads during a flood event, information of the utmost importance for building a map of accessible roads for rescue and supply operations. Moving in this direction, in this paper we aim at estimating the water level.

3 APPROACH

In this section, each stage of the solution is briefly introduced.

News Image Topic Disambiguation (NITD). During a flood, the media normally update the information about the situation to keep readers informed. Due to the large number of online newspapers and media outlets, searching for relevant articles can be time consuming. To optimize the search it is possible to use natural language processing (NLP) algorithms or keyword searches. Since most of these articles contain images, in this first stage we refine the search using a computer vision algorithm that classifies those images as flood-event related or not flood-event related.

To train the classifier we use the training set for this task, composed of 5145 images that were retrieved from online news as containing information about a flood by an NLP or keyword algorithm and then manually classified. As for the algorithm, we use an ensemble of four state-of-the-art networks (InceptionV3 [18], MobileNet [10], VGG16, and VGG19 [17]) and cross-validation with two folds. Since the dataset is highly imbalanced, we balance it during training by randomly undersampling the negative class at each epoch. This way the dataset stays balanced while all the samples from both categories are used. Finally, we combine the networks by majority voting.

Multimodal Flood Level Estimation (MFLE). Given online articles with visual and textual information, we developed a textual, a textual-visual, and a visual-only model to estimate the flood level by predicting whether the water is above or below the knees of the people in the scene. The latter model is composed of two branches: (i) one takes as input image crops of people’s knees, extracted by a state-of-the-art pose estimator [6], and predicts whether the knee is under or above the water; (ii) the other takes as input a full image of the scene and predicts whether the image contains people with their knees underwater.

To create the training data for the first branch we used the pose estimator to extract a region around all the knees in the training set. Knees from images labelled as 0 (water below the knee) were labelled 0 by default, while those from images labelled as 1 were labelled manually, since people in the same image may have the water level above or below the knee. Both networks use a VGG19 [17] pre-trained on ImageNet [7] to extract deep features of the images, followed by a fully-connected (FC) layer. The information is then concatenated to combine the semantic features of the knee with the context information provided by the full-resolution image. This way the first branch also receives information about the context, while the second branch also receives information about the knees. Finally, one FC layer estimates whether the knee is above or below the water, and another whether the water is above or below knee level.

The two-branch system is proposed because a simple one-branch Convolutional Neural Network (CNN) would greedily learn to predict flooded images as the “water above the knee” class: since it lacks specific data about the knees in the scene, it would associate the features of a flooded area with the “water above the knee” class, because that class is solely composed of such examples. Finally, an image is classified as “water above the knee” if there is at least one knee in the scene that the knee branch classifies as “water above the knee” and the context branch also classifies the image as “water above the knee”.

We also combined the textual data of the articles to verify whether it could lead to a better predictor. This was achieved by building an ensemble composed of the previous model and an NLP module, consisting of a bidirectional Long Short-Term Memory (LSTM) network. The output of the LSTM is concatenated to the last FC layer of the image classifier. The text-only model is composed of the module described above alone.

City-centered Satellite Sequences (CCSS). Given a sequence of Sentinel-2 satellite images depicting a certain city over a certain period of time, this task aims to classify whether a flooding event was ongoing in that city at that time. We built an expert system that leverages both the spectral information and the related metadata. First, it computes a binary mask for each layer, in which white pixels represent areas with presence of water, while black pixels represent the other regions. The binary masks are obtained (i) by computing, for each pixel, the Modified Normalized Difference Water Index (MNDWI) [8] adapted to the Sentinel-2 bands (S2), according to Equation (1), and (ii) by setting to white the pixels with MNDWI_S2 ≥ 0 and to black the others.

MNDWI = (ρ_green − ρ_swir1) / (ρ_green + ρ_swir1),    MNDWI_S2 = (B03 − B11) / (B03 + B11)    (1)

Assuming that the dataset has no missing values lasting for the whole time series, we set the pixels of uncovered areas to white. Then we perform the pixel-wise intersection within each of two sets of layers: (i) the computed binary layers marked as FLOODED and (ii) those marked as NON-FLOODED in the metadata file. The two resulting images depict the water persistence in the flood and non-flood cases. Finally, to discriminate flooded regions from permanent water bodies (such as rivers or lakes), the pixel-wise difference between the two sets is computed. Although a binary mask representing the residual flood extent is available, to comply with the CCSS subtask the approach returns 1 if any white region remains in the resulting binary mask, and 0 otherwise.

4 RESULTS

The results, split by subtask, are reported in Table 1. For the NITD and MFLE subtasks, the F1-scores refer to the 20% of the development set used as validation set. Conversely, since the proposed CCSS approach is an expert system, the whole devset was used. For this last subtask, the confusion matrix on the devset (TP: 108, FP: 0, FN: 33, TN: 127) shows that the approach is robust against false positives, with a precision of 1.0.

Table 1: Results per subtask

Subtask | Data          | DevSet F-Score | TestSet F-Score
NITD    | Visual        | 0.8062         | 0.6628
MFLE    | Visual        | 0.7667         | 0.5428
MFLE    | Text          | 0.5213         | 0.4956
MFLE    | Visual & Text | 0.5454         | 0.5284
CCSS    | Satellite     | 0.8850         | 0.9118

5 ANALYSIS AND CONCLUSIONS

We summarize our insights per subtask. (NITD) Balancing the dataset during training and combining different models significantly improves the performance. (MFLE) (i) Merging global and local classifiers improves the performance; (ii) the text brings only limited information, and the approach gives better results processing images alone; (iii) water reflections of people degrade the performance of the pose estimation algorithm; (iv) the importance of the two branches is supported by an ablation study in which the two-branch model achieved a 0.79 F1-score on validation, while the full-image branch alone achieved 0.71 and the cropped-knees branch alone achieved 0.76. (CCSS) (i) B03 and B11 are highly informative for water segmentation; (ii) the approach is an expert system, so it needs no training set and is computationally fast.

ACKNOWLEDGMENTS

This work was supported by the European Commission H2020 SHELTER project, GA no. 821282, and by the Spanish grant TIN2016-75404-P. Laura Lopez-Fuentes benefits from the NAERINGSPHD fellowship of the Norwegian Research Council under the collaboration agreement Ref. 3114 with the UIB.

REFERENCES

[1] K. Avgerinakis, A. Moumtzidou, S. Andreadis, E. Michail, I. Gialampoukidis, S. Vrochidis, and I. Kompatsiaris. Visual and Textual Analysis of Social Media and Satellite Images for Flood Detection @ Multimedia Satellite Task MediaEval 2017. In Proc. of the MediaEval 2017 Workshop (Sept. 13-15, 2017). Dublin, Ireland.
[2] B. Bischke, P. Bhardwaj, A. Gautam, P. Helber, D. Borth, and A. Dengel. Detection of Flooding Events in Social Multimedia and Satellite Imagery using Deep Neural Networks. In Proc. of the MediaEval 2017 Workshop (Sept. 13-15, 2017). Dublin, Ireland.
[3] B. Bischke, P. Helber, S. Brugman, E. Basar, Z. Zhao, M. Larson, and K. Pogorelov. The Multimedia Satellite Task at MediaEval 2019: Estimation of Flood Severity. In Proc. of the MediaEval 2019 Workshop (Oct. 27-29, 2019). Sophia Antipolis, France.
[4] B. Bischke, P. Helber, Z. Zhao, J. De Bruijn, and D. Borth. The Multimedia Satellite Task at MediaEval 2018. In Proc. of the MediaEval 2018 Workshop (Oct. 29-31, 2018). Sophia Antipolis, France.
[5] B. Bischke, P. Helber, Z. Zhao, J. de Bruijn, and D. Borth. The Multimedia Satellite Task at MediaEval 2018: Emergency Response for Flooding Events. In Proc. of the MediaEval 2018 Workshop (Oct. 29-31, 2018). Sophia Antipolis, France.
[6] Z. Cao, T. Simon, S. Wei, and Y. Sheikh. 2017. Realtime multi-person 2D pose estimation using part affinity fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7291–7299.
[7] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei. 2009. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR 2009.
[8] G. Donchyts, J. Schellekens, H. Winsemius, E. Eisemann, and N. van de Giesen. 2016. A 30 m resolution surface water mask including estimation of positional and thematic differences using Landsat 8, SRTM and OpenStreetMap: a case study in the Murray-Darling Basin, Australia. Remote Sensing 8, 5 (2016), 386.
[9] A. Farasin and P. Garza. 2018. PERCEIVE: Precipitation Data Characterization by means of Frequent Spatio-Temporal Sequences. In ISCRAM.
[10] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam. 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. (2017).
[11] L. Lopez-Fuentes, A. Farasin, H. Skinnemoen, and P. Garza. Deep Learning Models for Passability Detection of Flooded Roads. In Proc. of the MediaEval 2018 Workshop (Oct. 29-31, 2018). Sophia Antipolis, France.
[12] L. Lopez-Fuentes, J. van de Weijer, M. Bolanos, and H. Skinnemoen. Multi-modal Deep Learning Approach for Flood Detection. In Proc. of the MediaEval 2017 Workshop (Sept. 13-15, 2017). Dublin, Ireland.
[13] L. Lopez-Fuentes, J. van de Weijer, M. González-Hidalgo, H. Skinnemoen, and A. D. Bagdanov. 2018. Review on computer vision techniques in emergency situations. Multimedia Tools and Applications 77, 13 (2018), 17069–17107.
[14] K. Osumi. 2019. Detecting land cover change using Sentinel-2. Abstracts of the ICA 1 (2019).
[15] S. Qiu, Z. Zhu, and B. He. 2019. Fmask 4.0: Improved cloud and cloud shadow detection in Landsats 4–8 and Sentinel-2 imagery. Remote Sensing of Environment 231 (2019), 111205.
[16] C. Rossi, F. S. Acerbo, K. Ylinen, I. Juga, P. Nurmi, A. Bosca, F. Tarasconi, M. Cristoforetti, and A. Alikadic. 2018. Early detection and information extraction for weather-induced floods using social media streams. International Journal of Disaster Risk Reduction 30 (2018), 145–157.
[17] K. Simonyan and A. Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR (2015).
[18] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. 2016. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2818–2826.
[19] Z. Zhu and C. E. Woodcock. 2012. Object-based cloud and cloud shadow detection in Landsat imagery. Remote Sensing of Environment 118 (2012), 83–94.
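The NITD classifier balances the training data by re-sampling the negative class at each epoch and combines four networks by majority voting. The two steps can be sketched in plain Python; the function names and list-based label representation are illustrative, not the authors' code:

```python
import random
from collections import Counter

def balanced_epoch(positives, negatives, rng=random):
    # Undersample the majority (negative) class so each epoch sees a
    # balanced set; over many epochs all negatives are eventually used.
    k = min(len(positives), len(negatives))
    return positives + rng.sample(negatives, k)

def majority_vote(per_model_predictions):
    # per_model_predictions: one list of 0/1 labels per network
    # (e.g. InceptionV3, MobileNet, VGG16, VGG19). The ensemble
    # label for each image is the most common vote across networks.
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*per_model_predictions)]
```

For example, `majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 0, 1]])` returns `[1, 0, 1]`: each image is labelled by the majority of the four per-network votes.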
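Equation (1) reduces the water detection in the CCSS approach to a per-pixel index threshold on two Sentinel-2 bands. A minimal sketch, assuming band reflectances are available as nested lists of floats (the helper names are illustrative):

```python
def mndwi_s2(b03, b11):
    # MNDWI_S2 = (B03 - B11) / (B03 + B11), per Equation (1).
    # A fully dark pixel (both bands zero) is mapped to 0.
    total = b03 + b11
    return 0.0 if total == 0 else (b03 - b11) / total

def water_mask(b03_band, b11_band):
    # Binary mask: True (white) where MNDWI_S2 >= 0, False (black) elsewhere.
    return [[mndwi_s2(g, s) >= 0.0 for g, s in zip(g_row, s_row)]
            for g_row, s_row in zip(b03_band, b11_band)]
```

Water strongly reflects in the green band (B03) and absorbs in the SWIR band (B11), so a water pixel such as (B03 = 0.3, B11 = 0.05) yields a positive index, while dry land typically yields a negative one.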
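The CCSS decision step can be read as three pixel-wise set operations: intersect the FLOODED layers, intersect the NON-FLOODED layers, and subtract the second persistence map from the first to discard permanent water bodies. A sketch under that reading (the mask representation and names are illustrative assumptions):

```python
def persistence(masks):
    # Pixel-wise AND over a list of binary water masks: a pixel survives
    # only if it is water (True) in every layer of the set.
    rows, cols = len(masks[0]), len(masks[0][0])
    return [[all(m[r][c] for m in masks) for c in range(cols)]
            for r in range(rows)]

def ccss_label(flooded_masks, non_flooded_masks):
    # Difference of the two persistence maps removes rivers and lakes;
    # any remaining white pixel means residual flood water is visible.
    flood = persistence(flooded_masks)
    normal = persistence(non_flooded_masks)
    residual = [[f and not n for f, n in zip(f_row, n_row)]
                for f_row, n_row in zip(flood, normal)]
    # Subtask output: 1 if any flood pixel remains, 0 otherwise.
    return 1 if any(any(row) for row in residual) else 0
```

With a permanent river present in both sets, only water that appears persistently in the FLOODED layers but not in the NON-FLOODED ones triggers the label 1.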