Deep Learning models for passability detection of flooded roads

Laura Lopez-Fuentes 1,2,3, Alessandro Farasin 4,5, Harald Skinnemoen 3, Paolo Garza 4
1 University of the Balearic Islands, Spain, 2 Autonomous University of Barcelona, Spain, 3 AnsuR Technologies, Norway, 4 Politecnico di Torino, Italy, 5 Istituto Superiore Mario Boella, Italy
l.lopez@uib.es, alessandro.farasin@ismb.it, harald@ansur.no, paolo.garza@polito.it

ABSTRACT
In this paper we study and compare several approaches to detect floods and evidence for the passability of roads by conventional means on Twitter. We focus on tweets containing both visual information (a picture shared by the user) and metadata, a combination of text and related extra information intrinsic to the Twitter API. This work has been done in the context of the MediaEval 2018 Multimedia Satellite Task.

1 INTRODUCTION
Social media are becoming, year by year, ever more popular and are widely used for sharing people's daily activities. This massive adoption has led to a large availability of content, such as texts and pictures, in the most varied sectors, making social media a great source of information. This large availability of data is precious for extracting knowledge and lays the basis for several applications. One of them falls in the context of emergency management for natural disasters, in which computer vision and machine learning techniques are investigated [2, 3, 11, 16] to extract key information and help first responders in their activities.

In this work we focus on analyzing social media posts to extract valuable information about roads affected by floods. In more detail, we propose a multi-modal deep learning network which processes flood-related social media pictures and related metadata (e.g., Twitter, Flickr, YFCC100M), both to provide (a) evidence of roads and (b) whether they are also passable. The approach presented in this paper takes its inspiration from (and aims to be an extension of) the work [10] proposed in the MediaEval challenge held in 2017.

2 RELATED WORKS
Recent literature leverages satellite [8, 12, 15] or ground acquisitions [5] to identify flood events. Other works focus more on the detection of urban elements such as roads [6, 9]. To the best of our knowledge, there are no existing works that determine road passability evidence during flood events.

3 DATA
The dataset used in this work was distributed by the MediaEval 2018 Multimedia Satellite Task [1, 4]. It consists of 5820 Twitter images with their related metadata, of which ∼36% of the images present flooded regions with evidence of roads. Only the images belonging to the former class are considered for the second task evaluation: among them, ∼45% present passable roads. Furthermore, for each image, metadata is available. Metadata is a set of information concerning the tweet itself and the user who wrote it. For instance, the text message and its language, the number of retweets and likes, the number of replies, the coordinates (and whether the tweet is geo-located) and the user's follower count are only a subset of the properties associated with a tweet.

4 APPROACH
To properly deal with heterogeneous data, we opted for a "divide-et-impera" approach: we created a model for each kind of data. In detail, we developed: (i) a classification model using only the metadata, (ii) three classification models using only the images, (iii) a model which combines the metadata and the visual information. In this section we briefly describe the three different systems.

4.1 Metadata only
We processed the metadata by (i) filtering out properties not available in the whole dataset and (ii) studying the correlation among the remaining ones and selecting the most relevant. As a result, we kept the text written by the user, the language in which the tweet was originally written, the number of retweets and the number of users who had favourited the tweet. As a pre-processing step we translated all texts to English and removed emojis, URLs and special characters; then we applied lemmatization and tokenization techniques. We represented the text features using a word embedding initialized with GloVe [7]. After that, we normalized between 0 and 1 the number of retweets and the number of times the tweet was favourited. Then, we binarized the original-language information by assigning 0 to English and 1 to any other language. Finally, we defined a neural network composed of a bidirectional Long Short-Term Memory (LSTM) network. The output of the LSTM is concatenated with the normalized extra fields and passed through two parallel fully-connected (FC) layers with a softmax classifier, one per task.

Both tasks have been trained in parallel. Initially, we labeled all the images which had no evidence of flood as having no passability issues either. However, this strategy introduced a large imbalance into an already imbalanced dataset, which made the training more difficult, so we finally decided to use only the images containing evidence of flood to train the passability classifier, while still doing the training in parallel.

4.2 Visual Information only
As a pre-processing step for the images we applied several data augmentation techniques: image rotation, width and height shifts, horizontal flip and zoom. We designed two different systems to process the images, which we compare below.

Copyright held by the owner/author(s). MediaEval'18, 29-31 October 2018, Sophia Antipolis, France

Table 1: F-Score (%) evaluated on the test set for the two sub-tasks. Firstly, it is computed on a subset of 50 tweets from the training set, manually annotated by 4 persons.
Then, it is computed on the test set for each developed approach: Metadata-only Approach (MA), Visual Approach 1 (VA1, Double-Ended Classifier with Compact Loss), Visual Approach 2 (VA2, Network Stacking with average aggregation), Visual Approach 3 (VA3, Network Stacking with average and voting aggregation).

                               EVIDENCE [%]                            PASSABILITY [%]
Approach \ Data                Metadata  Images                Meta+Imgs  Metadata  Images                Meta+Imgs
Human annotation               51.48     87.32                 -          18.18     47.71                 -
Metadata only [MA]             43.88     -                     -          19.3      -                     -
Image only [VA1 / VA2 / VA3]   -         85.6 / 86.43 / 87.79  -          -         24.09 / 67.13 / 68.38  -
Metadata and Image [VA1+MA]    -         -                     83.12      -         -                     28.34

Double-ended classifier with compact loss: We used the Inception V3 [14] network pre-trained on ImageNet, with two FC layers and a softmax classifier at the end. Each end of the network was trained for one task. These tasks can be cast as two separate one-class classification problems, in which the single class is the flood event in the first case and its passability condition in the second one. We took inspiration from [13] and customized the InceptionV3 optimization function as:

    ĝ = max_g D(g(t)) + λC(g(t)),

where: (a) g is the deep feature representation of the training data t, (b) λ is a positive constant, (c) D is the Descriptive loss function (within this approach, we used the cross-entropy) and (d) C is the Compactness loss function, which evaluates the batch intra-class deep feature distance for objects of the same class.

Network stacking: We used 9 state-of-the-art networks (InceptionV3, Xception, VGG16, VGG19, InceptionResNetV2, MobileNet, DenseNet121, DenseNet201, NASNetLarge), all of them pre-trained on ImageNet. They were separately trained for both problems on 5 different train-validation folds, which generated 90 networks (45 per task). The output of each network is a number between 0 and 1 which represents the probability of the picture containing evidence of roads in flooded regions and evidence of passable roads, respectively. Let n be the number of networks and pi the probability assigned by network i to class 1; we define p̄ as the average of the pi for all 1 ≤ i ≤ n, and voting(p1, ..., pn) = |{i : pi > 0.5, 1 ≤ i ≤ n}|, where |·| is the set cardinality. We used two different methods to aggregate the results of the networks:

(i) Average aggregation: pred(p1, ..., pn) = (p̄ > 0.5);
(ii) Average and voting aggregation: pred(p1, ..., pn) = 1 if (p̄ > 0.5 and voting(p1, ..., pn) > n/2 − 2) or (p̄ > 0.45 and voting(p1, ..., pn) ≥ n/2), and 0 otherwise.

4.3 Metadata and Visual Information
To combine the metadata and the images, we merged the Metadata-only and the Double-ended classifier with compact loss approaches. The two networks were taken without their respective double-ended fully-connected (FC) layers and merged with two new FC layers (one per task) with a softmax classifier.

5 RESULTS
To get a first idea of the upper bound for our task we asked 4 persons to perform the task on a subset of 50 images; the results are given in Table 1. In the subsequent rows we have included the results for the 5 different approaches introduced in this paper. As can be seen, our approach for flood evidence classification using metadata obtains very poor results, but close to the results obtained by human annotators, which means that the metadata was not very discriminative for this task. Since the error is cumulative, the results of both the humans and the metadata classifier drop significantly for the passability detection, with the F-scores again very close to one another. All the image classification approaches achieve similar results on the first task, while the network stacking achieves a small improvement with respect to the double-ended classifier with compact loss. Furthermore, aggregating the networks using average and voting slightly improves over using only the average. However, there is a big gap between the performance of the double-ended classifier with compact loss and the network stacking approaches on the second task. When we decided to use the same network body to perform both tasks, we thought that since the tasks were closely related, one task could benefit from the other's knowledge. However, since one task was more difficult than the other, the double-ended network seems to have specialized on the easy task while neglecting the difficult one. It is also remarkable that the network stacking algorithms achieve significantly better results than the human annotators, probably because: (i) road passability is subjective in several cases and (ii) while the network learnt over the whole training set, the human annotators were not given any examples of the task. Finally, combining metadata with images does not provide much improvement, or even worsens the results, due to the lack of discriminative features in the metadata.

6 CONCLUSIONS
In this paper we studied several approaches to perform flood and road passability detection, dealing with both textual and visual information. According to our tests, we discovered that when a network tries to accomplish several tasks of different difficulty, even if they are related, it focuses on one of them (presumably the simplest one), achieving good performance on that task but poor performance on the other.

ACKNOWLEDGMENTS
This work was partially supported by the Spanish Grants TIN2016-75404-P AEI/FEDER, UE, TIN2014-52072-P, TIN2016-79717-R, TIN2013-42795-P and the European Commission H2020 I-REACT project no. 700256. Laura Lopez-Fuentes benefits from the NAERINGSPHD fellowship of the Norwegian Research Council under the collaboration agreement Ref. 3114 with the UIB.
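As a concrete illustration of the compactness term C in the double-ended classifier objective of Section 4.2, the following is a minimal NumPy sketch of the batch intra-class feature distance in the spirit of Perera and Patel [13]; it is our reconstruction, not the original implementation, and the batch and feature sizes are illustrative.

```python
import numpy as np

def compactness_loss(feats):
    """Batch intra-class compactness: mean squared distance of each
    deep feature vector from the mean of the *other* vectors in the
    batch, averaged over batch size n and feature dimension k."""
    n, k = feats.shape
    total = feats.sum(axis=0)
    # For each sample i, the mean of the remaining n-1 feature vectors.
    others_mean = (total - feats) / (n - 1)
    z = feats - others_mean
    return float((z ** 2).sum() / (n * k))
```

A batch of identical feature vectors yields a loss of 0, so minimizing C together with the descriptive loss D pulls same-class features toward a compact cluster.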
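The two aggregation rules of the network stacking approach (Section 4.2) follow directly from the definitions of p̄ and voting; the sketch below is ours (function names are not from the paper), with the list of per-network probabilities as input.

```python
def voting(probs):
    """Number of networks assigning probability > 0.5 to class 1."""
    return sum(1 for p in probs if p > 0.5)

def pred_average(probs):
    """(i) Average aggregation: positive iff the mean probability > 0.5."""
    return int(sum(probs) / len(probs) > 0.5)

def pred_average_and_voting(probs):
    """(ii) Average and voting aggregation: a looser mean threshold is
    accepted when a majority of the networks also vote positive."""
    n = len(probs)
    mean = sum(probs) / n
    if mean > 0.5 and voting(probs) > n / 2 - 2:
        return 1
    if mean > 0.45 and voting(probs) >= n / 2:
        return 1
    return 0
```

For example, with probs = [0.75, 0.75, 0.25, 0.25] the mean is exactly 0.5, so rule (i) rejects, while rule (ii) accepts because the mean exceeds 0.45 and half of the networks vote positive.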
REFERENCES
[1] 2018. MediaEval 2018 Multimedia Satellite Task. http://www.multimediaeval.org/mediaeval2018/multimediasatellite/. (2018). Data released: 31 May 2018.
[2] Flavia Sofia Acerbo and Claudio Rossi. 2017. Filtering informative tweets during emergencies: a machine learning approach. In Proceedings of the First CoNEXT Workshop on ICT Tools for Emergency Networks and DisastEr Relief. ACM, 1–6.
[3] Federico Angaramo and Claudio Rossi. 2017. Online clustering and classification for real-time event detection in Twitter. (2017).
[4] Benjamin Bischke, Patrick Helber, Zhengyu Zhao, Jens de Bruijn, and Damian Borth. 2018. The Multimedia Satellite Task at MediaEval 2018: Emergency Response for Flooding Events. In Proc. of the MediaEval 2018 Workshop (Oct. 29-31, 2018). Sophia-Antipolis, France.
[5] Tom Brouwer, Dirk Eilander, Arnejan Van Loenen, Martijn J Booij, Kathelijne M Wijnberg, Jan S Verkade, and Jurjen Wagemaker. 2017. Probabilistic flood extent estimates from social media flood observations. Natural Hazards & Earth System Sciences 17, 5 (2017).
[6] Yinghua He, Hong Wang, and Bo Zhang. 2004. Color-based road detection in urban traffic scenes. IEEE Transactions on Intelligent Transportation Systems 5, 4 (2004), 309–318.
[7] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2018. GloVe: Global Vectors for Word Representation. (2018). https://nlp.stanford.edu/projects/glove/
[8] Victor Klemas. 2014. Remote sensing of floods and flood-prone areas: an overview. Journal of Coastal Research 31, 4 (2014), 1005–1013.
[9] Hui Kong, Jean-Yves Audibert, and Jean Ponce. 2010. General road detection from a single image. IEEE Transactions on Image Processing 19, 8 (2010), 2211–2220.
[10] Laura Lopez-Fuentes, Joost van de Weijer, Marc Bolanos, and Harald Skinnemoen. 2017. Multi-modal deep learning approach for flood detection. In Proc. of the MediaEval 2017 Workshop (Sept. 13–15, 2017). Dublin, Ireland.
[11] Laura Lopez-Fuentes, Joost van de Weijer, Manuel González-Hidalgo, Harald Skinnemoen, and Andrew D Bagdanov. 2018. Review on computer vision techniques in emergency situations. Multimedia Tools and Applications 77, 13 (2018), 17069–17107.
[12] Igor Ogashawara, Marcelo Pedroso Curtarelli, and Celso M Ferreira. 2013. The use of optical remote sensing for mapping flooded areas. Int. J. Eng. Res. Appl. 3, 5 (2013), 1956–1960.
[13] Pramuditha Perera and Vishal M Patel. 2018. Learning Deep Features for One-Class Classification. arXiv preprint arXiv:1801.05365 (2018).
[14] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. 2016. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2818–2826.
[15] CJ Ticehurst, P Dyce, and JP Guerschman. 2009. Using passive microwave and optical remote sensing to monitor flood inundation in support of hydrologic modelling. In Interfacing Modelling and Simulation with Mathematical and Computational Sciences, 18th World IMACS/MODSIM Congress. 13–17.
[16] Luca Venturini and Evelina Di Corso. 2017. Analyzing spatial data from Twitter during a disaster. In Big Data (Big Data), 2017 IEEE International Conference on. IEEE, 3779–3783.