Ensemble and Inference based Methods for Flood Severity Estimation Using Visual Data
Mir Murtaza, Muhammad Hanif, Muhammad Atif Tahir, Muhammad Rafi
National University of Computer and Emerging Sciences, Karachi Campus, Pakistan
{K173029,hanif.soomro,atif.tahir,muhammad.rafi}@nu.edu.pk

ABSTRACT
This paper presents the contribution of the NUCES DSGP team to the Multimedia Satellite Task at MediaEval 2019. The tasks addressed are News Image Topic Disambiguation (NITD) and Multimodal Flood Level Estimation (MFLE) from news images. An ensemble-based deep learning method was applied to the NITD task, where data augmentation and transfer learning were used for binary classification of images. During training, class imbalance was managed by augmenting the data and selecting an equal number of samples from each class. For the MFLE task, a person’s lower-body keypoints were detected, along with image flood probability scores from two deep convolutional network architectures, namely ResNet50 and VGG19. The confidence scores of the detected keypoints and the convolutional networks’ output probabilities were combined and passed to a Random Forest classifier for a final prediction score. On the test set, the proposed methods achieved an F1-Micro score of 0.895 on the NITD task (3rd best score) and an F1-Macro score of 0.734 on the MFLE task (2nd best score).

1   INTRODUCTION
An effective and efficient flood response system requires timely information about the event. The risk of damage can be reduced by taking appropriate actions on the basis of inferred information. Data collected through different media, including news articles, is easily available and could be used for a disaster response system. The NITD task of the "Multimedia Satellite Task at MediaEval, 2019" [1] posed the challenge of developing a binary image classifier using images published in different articles. The classifier should predict whether or not the article in which a particular image appeared discusses a water-related disaster.
   In the MFLE task, we had to determine whether at least one person in the image is standing in water, and whether or not the level of water is above the knee. To solve this problem, we considered both the global and the local perspective of an image. Our approach was to first detect and localize all persons present in an image. Then a dataset was prepared containing the persons’ bounding boxes from all of the images. The ResNet50 Convolutional Neural Network was fine-tuned to extract the flood probability of each bounding box. The label of the image was attached to all bounding boxes of persons present in the image. Then we stacked the ResNet50 probability with the confidence scores of four lower-body keypoints of interest, namely Right Knee, Left Knee, Right Ankle and Left Ankle of a person, and a score that indicated belief in the calculated confidence scores. We also added an image-level flood evidence probability obtained from VGG19 pre-trained on the Places365 dataset and fine-tuned on Multimedia Satellite 2017 flood images. This flood probability is the same for all the detected persons in an image. For inference, we trained a random forest on the keypoint scores and flood probability scores obtained for each person bounding box. If the maximum person-level probability in an image exceeds 0.50, the image is classified as a positive instance. Our inspiration for this work comes from efforts to use text- and image-based data from different sources to identify the type and intensity of disasters [5, 9, 11, 15].

2   APPROACH
A different deep learning based approach was used for each of the two tasks. The details of the approaches are given below:

News Image Topic Disambiguation (NITD): An ensemble-based deep learning approach was adopted for the Image-based News Topic Disambiguation task of MediaEval 2019. The dataset for the task contains 5181 images for training and 1296 images for testing. There were 564 images in the first class and 4617 images in the second class. Initially, the class imbalance problem was addressed using data augmentation. The Augmentor [3] library was used to create multiple copies of minority-class images using different operations, including rotate, flip and zoom. After balancing both classes, 3000 images from each class were randomly selected for training. The Visual Geometry Group (VGG16) [14] classification model pretrained on the Hybrid dataset (Places365 [16] and ImageNet [7]) was used for classification. VGG16 is a neural network introduced during the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014, where it secured first and second positions in the image localization and image classification tasks, respectively. A dropout layer with a ratio of 0.3 was added after the first dense layer, followed by a batch normalization layer. The last softmax layer was replaced with a sigmoid unit for binary classification. The last 6 layers of the model were retrained on the task dataset, and the remaining layers were kept fixed during training. The parameters of the model were optimized using the Adam optimizer with a learning rate of 10e-6. The model was retrained for 40 epochs, and finally the trained model was applied for prediction on the 1296 test images. Similarly, five models were trained and majority voting was applied for the final prediction, as depicted in Figure 1.
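
The majority voting step used for the NITD ensemble can be illustrated with a minimal sketch. It assumes that each of the five models outputs one sigmoid probability per image and that each model's vote is obtained by thresholding at 0.5; the per-model threshold, the function name `majority_vote` and the example probabilities are assumptions for illustration and are not taken from the paper.

```python
from typing import List, Sequence


def majority_vote(model_probs: Sequence[Sequence[float]],
                  threshold: float = 0.5) -> List[int]:
    """Combine per-model sigmoid outputs into one binary label per image.

    model_probs[m][i] is the probability that model m assigns to image i
    being in the positive class. The 0.5 threshold is an assumption.
    """
    if not model_probs:
        raise ValueError("need at least one model")
    n_images = len(model_probs[0])
    if any(len(probs) != n_images for probs in model_probs):
        raise ValueError("every model must score the same images")

    n_models = len(model_probs)
    labels = []
    for i in range(n_images):
        # Count how many models vote "positive" for image i.
        votes = sum(1 for probs in model_probs if probs[i] > threshold)
        # Positive only if a strict majority of the models agree.
        labels.append(1 if 2 * votes > n_models else 0)
    return labels


# Hypothetical outputs of five models on three images (made-up numbers).
example_probs = [
    [0.9, 0.2, 0.6],
    [0.8, 0.4, 0.4],
    [0.7, 0.6, 0.3],
    [0.4, 0.1, 0.7],
    [0.6, 0.3, 0.2],
]
print(majority_vote(example_probs))  # [1, 0, 0]
```

With an odd number of models, as in the five-model ensemble described above, a strict majority always exists, so no tie-breaking rule is needed.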
Copyright 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
MediaEval‘19, 27-29 October 2019, Sophia Antipolis, France

Figure 1: Data flow diagram showing the process for the task of News Image Topic Disambiguation

Multimodal Flood Level Estimation from news (MFLE): For the prediction of at least one person standing in water at a level above
the knee in an image, we hypothesized that "a detected person in a flooded region with low knee and ankle visibility is a positive case". For this purpose, we prepared a separate dataset consisting of all the detected persons’ bounding boxes from their respective images. The label attached to a person bounding box was the same as that of the parent image. For person bounding box detection, the Faster-RCNN model [13] was used, an end-to-end architecture that uses a Region Proposal Network to select regions of interest along with a VGG16 deep CNN for classification. We fine-tuned the last convolutional layer of the ImageNet ResNet50 CNN architecture on these detected person patches of the training images and replaced the last softmax layer with a sigmoid unit. As the dataset was highly imbalanced, balanced class weights were used, which is an effective method for training Convolutional Neural Network models on imbalanced data [6]. The purpose of using ResNet is to get a local estimate of the flood level in the person’s bounding box.
   Afterwards, for the person bounding boxes dataset, we obtained the visibility confidence scores for four lower-body keypoints, i.e. Right Knee, Left Knee, Right Ankle and Left Ankle of a person, and an overall score indicating belief in the keypoint calculation. These probabilities help in estimating the visibility part of our hypothesis. We used the Regional Multi-person Pose Estimation (RMPE) approach of Fang et al. [8] for keypoint detection. The development dataset had images with no flood evidence. Therefore, we quantified the global flood evidence by acquiring a flood evidence probability from Places365 VGG19 [16] fine-tuned on the MediaEval Multimedia Satellite 2017 dataset [2]. All person bounding boxes of an image get the same global flood evidence probability.
   Then a Random Forest [4] was trained with 321 trees and the Gini splitting criterion on feature vectors comprising the ResNet50 local flood probability, Right Knee confidence, Left Knee confidence, Right Ankle confidence, Left Ankle confidence, keypoints Accuracy Score and VGG19 global flood evidence. For inference, we take an image, detect the person bounding boxes, and pass each person bounding box through ResNet50 to get a local flood probability. Then we extract confidence probabilities for the Right Knee, Left Knee, Right Ankle and Left Ankle of a person using RMPE, and the global flood evidence probability using Places365 VGG19. Each person’s feature set is passed to the trained Random Forest, and the final probability of that person being in water above knee level is obtained. For classification, we select the person in the image with the maximum probability; if this maximum probability is greater than 0.50, the image is classified as a positive instance, otherwise as a negative one, as shown in Figure 2.

Figure 2: MFLE Inference Pipeline. (a) An image (b) Detected persons and bounding boxes (c) Persons’ keypoints (d) ResNet50 flood estimation (e) Random Forest inference on the feature set (f) Max probability from persons and thresholding

3   TEST RESULTS FOR NITD AND MFLE
For all runs of the NITD task, a VGG16 ensemble was used. Runs 1 and 2 use class weights to handle the class imbalance, while Runs 3, 4 and 5 use data augmentation. Run 5 uses the EasyEnsemble [10, 12] approach for classification. Run 1 of MFLE uses the image feature set and a Random Forest classifier.
   Table 1 shows the performance of the various runs on the test set. For NITD, the ensemble of five different VGG16 (Hybrid) models produced a micro-averaged F1 score of 0.895. For MFLE, the proposed method produced a macro-averaged F1 score of 0.734. We obtained an F1 score (positive class) of 0.694 on the MFLE validation set.

Table 1: Test set results from various runs of NITD and MFLE.

Task     Run1    Run2    Run3    Run4    Run5
NITD     0.711   0.718   0.890   0.895   0.747
MFLE     0.734     -       -       -       -

4   CONCLUSION
Various approaches were applied to the NITD task. An approach without data augmentation was implemented by balancing the minority and majority classes through weights. Also, features of length 1365 were extracted for each image and an EasyEnsemble classifier was applied. However, the best-performing approach used data augmentation to increase the size of the minority class. We carried out an ablation study of the MFLE inference model. First, when the VGG19 global flood estimation component was removed, the performance deteriorated. The same holds true for the ResNet local flood estimation and for the ankle keypoints. For aggregating the Random Forest probabilities, we computed different statistics such as the mean and median, but the maximum of the probabilities resulted in better performance on the validation set.
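
The final MFLE decision rule described in Section 2 (seven features per detected person, one Random Forest probability per person, then the maximum over persons compared against 0.50) can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionary keys, the function names and the handling of images with no detected person (treated as negative here) are hypothetical and not specified in the paper, and the per-person probabilities are taken as given rather than computed by a trained Random Forest.

```python
from typing import Dict, List, Sequence

# Feature order follows the description in Section 2; the key names
# themselves are hypothetical labels chosen for this sketch.
FEATURE_ORDER = [
    "resnet50_local_flood_prob",
    "right_knee_conf",
    "left_knee_conf",
    "right_ankle_conf",
    "left_ankle_conf",
    "keypoints_score",
    "vgg19_global_flood_prob",
]


def build_feature_vector(person: Dict[str, float],
                         global_flood_prob: float) -> List[float]:
    """Stack one person's local features with the image-level flood evidence.

    The global probability is shared by every person detected in the image.
    """
    values = dict(person, vgg19_global_flood_prob=global_flood_prob)
    return [values[name] for name in FEATURE_ORDER]


def classify_image(person_probs: Sequence[float],
                   threshold: float = 0.5) -> int:
    """Label an image positive if any person's probability exceeds threshold.

    person_probs holds one classifier probability per detected person.
    Returning 0 for an image without detected persons is an assumption.
    """
    if not person_probs:
        return 0
    return 1 if max(person_probs) > threshold else 0


# Made-up values for one detected person in one image.
person = {
    "resnet50_local_flood_prob": 0.8,
    "right_knee_conf": 0.1,
    "left_knee_conf": 0.2,
    "right_ankle_conf": 0.05,
    "left_ankle_conf": 0.1,
    "keypoints_score": 1.5,
}
features = build_feature_vector(person, global_flood_prob=0.9)
print(features)
print(classify_image([0.2, 0.7, 0.4]))
```

Taking the maximum means a single confidently flooded person is enough to mark the image positive, which matches the task definition of "at least one person" standing in water above the knee.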

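Section 3 reports a micro-averaged F1 score for NITD and a macro-averaged F1 score for MFLE. The difference between the two averages matters on imbalanced data, and the sketch below shows it on a small made-up example. It assumes single-label classification with averaging over both classes; how the official evaluation configured its averaging is not stated here, and the labels below are hypothetical, not results from the paper.

```python
from typing import List, Sequence, Tuple


def f1_micro_macro(y_true: Sequence[int],
                   y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (micro-averaged F1, macro-averaged F1) over all classes."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    classes = sorted(set(y_true) | set(y_pred))
    per_class_f1: List[float] = []
    tp_sum = fp_sum = fn_sum = 0
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        per_class_f1.append(2 * tp / denom if denom else 0.0)
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn

    # Micro: pool the counts of all classes, then compute one F1.
    micro = 2 * tp_sum / (2 * tp_sum + fp_sum + fn_sum)
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(per_class_f1) / len(per_class_f1)
    return micro, macro


# Hypothetical imbalanced example: 8 negatives, 2 positives, one positive missed.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
micro, macro = f1_micro_macro(y_true, y_pred)
print(round(micro, 3), round(macro, 3))  # 0.9 0.804
```

In this setting the micro-averaged score equals plain accuracy, so it is dominated by the majority class, while the macro average gives the minority class equal weight and drops more when minority examples are missed.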

REFERENCES
 [1] Benjamin Bischke, Patrick Helber, Simon Brugman, Erkan Basar,
     Martha Larson, and Konstantin Pogorelov. 2019. The Multimedia
     Satellite Task at MediaEval 2019: Estimation of Flood Severity. In Proc.
     of the MediaEval 2019 Workshop (Oct. 27-29, 2019). Sophia Antipolis,
     France.
 [2] Benjamin Bischke, Patrick Helber, Christian Schulze, Venkat Srinivasan,
     Andreas Dengel, and Damian Borth. 2017. The Multimedia
     Satellite Task at MediaEval 2017, Emergency Response for Flooding
     Events. In Proc. of the MediaEval 2017 Workshop (Sept. 13-15, 2017).
     Dublin, Ireland.
 [3] Marcus D Bloice, Peter M Roth, and Andreas Holzinger. 2019. Biomedical
     image augmentation using Augmentor. Bioinformatics (04 2019).
     https://doi.org/10.1093/bioinformatics/btz259 btz259.
 [4] Leo Breiman. 2001. Random forests. Machine learning 45, 1 (2001),
     5–32.
 [5] Tom Brouwer, Dirk Eilander, Arnejan Van Loenen, Martijn J Booij,
     Kathelijne M Wijnberg, Jan S Verkade, and Jurjen Wagemaker. 2017.
     Probabilistic flood extent estimates from social media flood observations.
     Natural Hazards & Earth System Sciences 17, 5 (2017).
 [6] Mateusz Buda, Atsuto Maki, and Maciej A Mazurowski. 2018. A
     systematic study of the class imbalance problem in convolutional
     neural networks. Neural Networks 106 (2018), 249–259.
 [7] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei.
     2009. Imagenet: A large-scale hierarchical image database. In 2009 IEEE
     conference on computer vision and pattern recognition. IEEE, 248–255.
 [8] Hao-Shu Fang, Shuqin Xie, Yu-Wing Tai, and Cewu Lu. 2017. RMPE:
     Regional multi-person pose estimation. In Proceedings of the IEEE
     International Conference on Computer Vision. 2334–2343.
 [9] Ryan Lagerstrom, Yulia Arzhaeva, Piotr Szul, Oliver Obst, Robert
     Power, Bella Robinson, and Tomasz Bednarz. 2016. Image classification
     to support emergency situation awareness. Frontiers in Robotics and
     AI 3 (2016), 54.
[10] Guillaume Lemaître, Fernando Nogueira, and Christos K. Aridas. 2017.
     Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced
     Datasets in Machine Learning. Journal of Machine Learning
     Research 18, 17 (2017), 1–5. http://jmlr.org/papers/v18/16-365.html
[11] Zhenlong Li, Cuizhen Wang, Christopher T Emrich, and Diansheng
     Guo. 2018. A novel approach to leveraging social media for rapid flood
     mapping: a case study of the 2015 South Carolina floods. Cartography
     and Geographic Information Science 45, 2 (2018), 97–110.
[12] Xu-Ying Liu, Jianxin Wu, and Zhi-Hua Zhou. 2008. Exploratory undersampling
     for class-imbalance learning. IEEE Transactions on Systems,
     Man, and Cybernetics, Part B (Cybernetics) 39, 2 (2008), 539–550.
[13] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster
     R-CNN: Towards real-time object detection with region proposal networks.
     In Advances in neural information processing systems. 91–99.
[14] Karen Simonyan and Andrew Zisserman. 2015. Very Deep Convolutional
     Networks for Large-Scale Image Recognition. In International
     Conference on Learning Representations.
[15] Nataliya Tkachenko, Stephen Jarvis, and Rob Procter. 2017. Predicting
     floods with Flickr tags. PloS one 12, 2 (2017), e0172870.
[16] Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio
     Torralba. 2017. Places: A 10 million image database for scene recognition.
     IEEE transactions on pattern analysis and machine intelligence 40,
     6 (2017), 1452–1464.