Ensemble and Inference based Methods for Flood Severity Estimation Using Visual Data

Mir Murtaza, Muhammad Hanif, Muhammad Atif Tahir, Muhammad Rafi
National University of Computer and Emerging Sciences, Karachi Campus, Pakistan
{K173029,hanif.soomro,atif.tahir,muhammad.rafi}@nu.edu.pk

Copyright 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
MediaEval'19, 27-29 October 2019, Sophia Antipolis, France

ABSTRACT
This paper presents the contribution of the NUCES DSGP team to the Multimedia Satellite Task at MediaEval 2019. The tasks addressed are News Image Topic Disambiguation (NITD) and Multimodal Flood Level Estimation (MFLE) from news images. An ensemble based deep learning method has been applied to the News Image Topic Disambiguation task, where data augmentation and transfer learning were used for binary classification of images. During training, class imbalance is managed by data augmentation and by selecting an equal number of samples from each class. For the Multimodal Flood Level Estimation task, a person's lower body keypoints were detected along with image flood probability scores from two deep convolutional network architectures, namely ResNet50 and VGG19. The confidence scores of the detected keypoints and the convolutional networks' output probabilities were combined and passed to a Random Forest classifier for a final prediction score. On the test set, the proposed method for the NITD task achieved an F1-Micro score of 0.895 (3rd best score), while the method for the MFLE task achieved an F1-Macro score of 0.734 (2nd best score).

1 INTRODUCTION
An effective and efficient flood response system requires timely information about the event. The risk of damage could be reduced by appropriate actions taken on the basis of inferred information. Data collected through different mediums, including news articles, is easily available and could be used for a disaster response system. The NITD task of the "Multimedia Satellite Task at MediaEval, 2019" [1] posed the challenge of developing a binary image classifier using images published in different articles. The classifier should predict whether or not the article in which a particular image was included discusses a water related disaster.

In the MFLE task, we had to determine whether at least one person in the image is standing in water, and whether or not the level of water is above the knee. To solve this problem, we considered both the global and the local perspective of an image. Our approach was to first detect and localize all person(s) present in an image. Then a dataset was prepared which contains the persons' bounding boxes from all of the images. The ResNet50 Convolutional Neural Network was fine-tuned to extract the flood probability of a bounding box. The label of the image was attached to all bounding box(es) of person(s) present in the image. Then we stacked the ResNet50 probability with the confidence scores of four lower body keypoints of interest, namely Right Knee, Left Knee, Right Ankle and Left Ankle of a person, and a score that indicated belief in the calculated confidence scores. We also added an image level flood evidence probability obtained from VGG19 pre-trained on the Places365 dataset and fine-tuned on Multimedia Satellite 2017 flood images. This flood probability is the same for all the detected person(s) in an image. For inference, we trained a random forest on the keypoint scores and flood probability scores obtained for each person bounding box. If the maximum probability over the persons in an image exceeds 0.50, the image is classified as a positive instance. Our inspiration for this work comes from efforts on text and image based data from different sources to identify the type and intensity of disasters [5, 9, 11, 15].

2 APPROACH
A separate deep learning based approach was used for each of the two tasks. The details of the approaches are given below.

News Image Topic Disambiguation (NITD): An ensemble based deep learning approach has been adopted for the task of Image-based News Topic Disambiguation, MediaEval, 2019. The dataset for the task contains 5181 images for training and 1296 images for testing. There were 564 images in the first class and 4617 images in the second class. Initially, the class imbalance problem was addressed by data augmentation. The Augmentor [3] library was used to create multiple copies of minority class images using different operations, including rotate, flip and zoom. After balancing both classes, 3000 images from each class were randomly selected for training. A Visual Geometry Group (VGG16) [14] classification model pretrained on the Hybrid dataset (Places365 [16] and ImageNet [7]) was used for classification. VGG16 is a neural network introduced at the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), 2014, where it secured first and second positions in the image localization and image classification tasks, respectively. A dropout layer with a ratio of 0.3 was added after the first dense layer, and a batch normalization layer was added after the dropout layer. The last softmax layer was replaced with a sigmoid unit for binary classes. The last 6 layers of the model were retrained on the dataset of the task and the remaining layers were kept fixed during training. The parameters of the model were optimized using the Adam optimizer, with a learning rate of 10e-6. The model was retrained for 40 epochs and finally the trained model was applied for prediction on the 1296 test images. Five models were trained in this way and majority voting was applied for the final prediction, as depicted in Figure 1.

Figure 1: Data flow diagram showing the process for the task of News Image Topic Disambiguation.

Multimodal Flood Level Estimation from news (MFLE): For the prediction of at least one person standing in water above knee level in an image, we hypothesized that "a detected person in the flooded region with low knees and ankles visibility is a positive case". For this purpose, we prepared a separate dataset which consists of all the detected persons' bounding boxes from their respective images. The label attached to a person bounding box was the same as that of the parent image. For person bounding box detection, the Faster-RCNN model [13] is used, an end-to-end architecture that uses a Region Proposal Network to select regions of interest along with a VGG16 deep CNN for classification. We fine-tuned the last convolutional layer of the ImageNet ResNet50 CNN architecture on these detected person patches of the training images and replaced the last softmax layer with a sigmoid unit. As the dataset was highly imbalanced, balanced class weights were used, which is an effective method to train Convolutional Neural Network models on imbalanced data [6]. The purpose of using ResNet is to get a local estimate of the flood level in the person's bounding box.

Afterwards, for the person bounding boxes dataset, we obtained the visibility confidence scores for four lower body keypoints, i.e., Right Knee, Left Knee, Right Ankle and Left Ankle of a person, and an overall score indicating belief in the keypoint calculation. These probabilities help in estimating the visibility part of our hypothesis. We utilized the Regional Multi-person Pose Estimation (RMPE) approach of Fang et al. [8] for keypoint detection. The development dataset had images with no flood evidence. Therefore, we quantified the global flood evidence by acquiring a flood evidence probability from Places365 VGG19 [16] fine-tuned on the MediaEval Multimedia Satellite 2017 dataset [2]. All person bounding boxes of an image get the same global flood evidence probability.

Then a Random Forest [4] was trained with 321 trees and the gini splitting criterion on feature vectors comprising the ResNet50 local flood probability, Right Knee confidence, Left Knee confidence, Right Ankle confidence, Left Ankle confidence, keypoints Accuracy Score and VGG19 global flood evidence. For inference, we take an image, detect the person bounding box(es), then pass each person bounding box through ResNet50 to get a local flood probability. Then we extract the confidence probabilities for Right Knee, Left Knee, Right Ankle and Left Ankle of a person using RMPE, and the global flood evidence probability using Places365 VGG19. Each person's feature set is passed to the trained Random Forest and the final probability of that person being in water above knee level is obtained. For classification, we select the person in the image having the maximum probability. If the maximum probability is greater than 0.50, the image is classified as a positive instance, otherwise as a negative one, as shown in Figure 2.

Figure 2: MFLE Inference Pipeline. (a) An image (b) Detected Persons and Bounding Boxes (c) Persons keypoints (d) ResNet50 flood estimation (e) Random Forest Inference of the feature set (f) Max Probability from Persons and Thresholding.

3 TEST RESULTS FOR NITD AND MFLE
For all runs of the NITD task, the VGG16 ensemble was used. Runs 1 and 2 use class weights to handle the class imbalance, while Runs 3, 4 and 5 use data augmentation. Run 5 uses the EasyEnsemble [10, 12] approach for classification. Run 1 of MFLE uses the image feature set and the Random Forest classifier.

Table 1 shows the performance of the various runs on the test set. For NITD, the ensemble of five different VGG16 (Hybrid) models produced a micro-averaged F1 score of 0.895. For MFLE, the proposed method produced a macro-averaged F1 score of 0.734. We obtained an F1-Score (+ve class) of 0.694 on the MFLE validation set.

Table 1: Test set results from various runs of NITD (F1-Micro) and MFLE (F1-Macro).

Task   Run1    Run2    Run3    Run4    Run5
NITD   0.711   0.718   0.890   0.895   0.747
MFLE   0.734   -       -       -       -

4 CONCLUSION
Various approaches have been applied to the NITD task. An approach without data augmentation has been implemented by balancing the minority and majority classes through weights. Also, features of length 1365 have been extracted for each image and the EasyEnsemble classifier applied. However, the best performing approach utilized data augmentation to increase the number of minority class samples. We did an ablation study of the MFLE inference model. First, when the VGG19 global level flood estimation component was removed, the performance deteriorated. The same holds true for the ResNet local level flood estimation and for the Ankle keypoints. For the Random Forest probabilities, we computed different statistics such as the mean and median, but the maximum of the probabilities resulted in better performance on the validation set.

REFERENCES
[1] Benjamin Bischke, Patrick Helber, Simon Brugman, Erkan Basar, Martha Larson, and Konstantin Pogorelov. 2019. The Multimedia Satellite Task at MediaEval 2019: Estimation of Flood Severity. In Proc. of the MediaEval 2019 Workshop (Oct. 27-29, 2019). Sophia Antipolis, France.
[2] Benjamin Bischke, Patrick Helber, Christian Schulze, Venkat Srinivasan, Andreas Dengel, and Damian Borth. 2017. The Multimedia Satellite Task at MediaEval 2017, Emergency Response for Flooding Events. In Proc. of the MediaEval 2017 Workshop (Sept. 13-15, 2017). Dublin, Ireland.
[3] Marcus D Bloice, Peter M Roth, and Andreas Holzinger. 2019. Biomedical image augmentation using Augmentor. Bioinformatics (04 2019).
https://doi.org/10.1093/bioinformatics/btz259
[4] Leo Breiman. 2001. Random forests. Machine learning 45, 1 (2001), 5–32.
[5] Tom Brouwer, Dirk Eilander, Arnejan Van Loenen, Martijn J Booij, Kathelijne M Wijnberg, Jan S Verkade, and Jurjen Wagemaker. 2017. Probabilistic flood extent estimates from social media flood observations. Natural Hazards & Earth System Sciences 17, 5 (2017).
[6] Mateusz Buda, Atsuto Maki, and Maciej A Mazurowski. 2018. A systematic study of the class imbalance problem in convolutional neural networks. Neural Networks 106 (2018), 249–259.
[7] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition. IEEE, 248–255.
[8] Hao-Shu Fang, Shuqin Xie, Yu-Wing Tai, and Cewu Lu. 2017. RMPE: Regional multi-person pose estimation. In Proceedings of the IEEE International Conference on Computer Vision. 2334–2343.
[9] Ryan Lagerstrom, Yulia Arzhaeva, Piotr Szul, Oliver Obst, Robert Power, Bella Robinson, and Tomasz Bednarz. 2016. Image classification to support emergency situation awareness. Frontiers in Robotics and AI 3 (2016), 54.
[10] Guillaume Lemaître, Fernando Nogueira, and Christos K. Aridas. 2017. Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning. Journal of Machine Learning Research 18, 17 (2017), 1–5. http://jmlr.org/papers/v18/16-365.html
[11] Zhenlong Li, Cuizhen Wang, Christopher T Emrich, and Diansheng Guo. 2018. A novel approach to leveraging social media for rapid flood mapping: a case study of the 2015 South Carolina floods. Cartography and Geographic Information Science 45, 2 (2018), 97–110.
[12] Xu-Ying Liu, Jianxin Wu, and Zhi-Hua Zhou. 2008. Exploratory undersampling for class-imbalance learning. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 39, 2 (2008), 539–550.
[13] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems. 91–99.
[14] Karen Simonyan and Andrew Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. In International Conference on Learning Representations.
[15] Nataliya Tkachenko, Stephen Jarvis, and Rob Procter. 2017. Predicting floods with Flickr tags. PloS one 12, 2 (2017), e0172870.
[16] Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. 2017. Places: A 10 million image database for scene recognition. IEEE transactions on pattern analysis and machine intelligence 40, 6 (2017), 1452–1464.
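
The two decision rules described in Section 2 (majority voting over the five NITD models, and thresholding the maximum per-person Random Forest probability at 0.50 for MFLE) can be illustrated with a minimal sketch. It is not the code used for the submitted runs: the function names, the input layout, and the choice to label an image with no detected person as negative are assumptions made only for illustration.

```python
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine binary labels from several models by majority voting.

    predictions[m][i] is the 0/1 label that model m assigns to image i
    (hypothetical layout). With an odd number of models, such as the
    five described in Section 2, no ties can occur.
    """
    n_models = len(predictions)
    n_images = len(predictions[0])
    return [
        int(2 * sum(model[i] for model in predictions) > n_models)
        for i in range(n_images)
    ]


def classify_image(person_probs: Sequence[float], threshold: float = 0.50) -> int:
    """Image-level MFLE decision from per-person probabilities.

    person_probs holds, for each detected person, the probability of
    being in water above knee level. The image is positive (1) when the
    maximum probability is strictly greater than the threshold.
    Assumption: an image with no detected person is labelled negative.
    """
    if not person_probs:
        return 0
    return int(max(person_probs) > threshold)


if __name__ == "__main__":
    # Five hypothetical models voting on three images.
    votes = [[1, 0, 1], [1, 0, 0], [0, 0, 1], [1, 1, 0], [0, 0, 1]]
    print(majority_vote(votes))
    # Three hypothetical persons detected in one image.
    print(classify_image([0.20, 0.61, 0.40]))
```

Taking the maximum rather than the mean or median mirrors the observation in Section 4 that the maximum of the probabilities performed better on the validation set.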