<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Flood detection from social multimedia and satellite images using ensemble and transfer learning with CNN architectures</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Danielle Dias</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ulisses Dias</string-name>
          <email>ulisses@ft.unicamp.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Campinas</institution>
          ,
          <country country="BR">Brazil</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>29</fpage>
      <lpage>31</lpage>
      <abstract>
        <p>In this paper we explore deep convolutional neural networks pretrained on ImageNet along with a transfer learning mechanism to detect whether an area has been affected by a flood in terms of access. We worked on two tasks with different datasets. The first dataset contains images from social media and the goal is to identify direct evidence for passability of roads by conventional means. The second dataset contains high resolution satellite imagery of partially flooded areas and the goal is to identify sections of roads that are potentially blocked. For both tasks, we used visual information only and our best models achieved averaged F1-Score values of 64.81% on the first task and 73.27% on the second task.</p>
      </abstract>
      <kwd-group>
        <kwd>Ensemble</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>Flooding events demand fast response. Rescue and medical teams
should move fast to the affected points and bring victims to safety
in a timely manner. Unfortunately, roads may be affected by the
flood in terms of access. Automatic road passability recognition
aids the support planning that will mitigate the impact of disasters.</p>
      <p>
        The “Multimedia Satellite Task 2018” studies the problem of road
passability classification, namely whether or not it is possible to
travel through a flooded region. Two tasks were proposed
depending on the source of information. In the first task, we should take
advantage of the high popularity of social media and filter the
information that provides direct evidence for passability of roads.
In the second task, we receive high resolution satellite imagery of
partially flooded areas and the goal is to identify whether it is possible to
go from a point A to a point B. More details can be found in the
task overview [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>APPROACH</title>
      <p>The dataset for the social media task consists of 7,387 images and
the dataset for the remote sensing task consists of 1,664 satellite
images. As both datasets are limited in size, we decided to use the
transfer learning mechanism in both cases with a similar workflow:
images are received as input, pre-trained convolutional neural
networks (CNNs) are used for feature extraction (Section 2.1), artificial
neural networks (ANNs) predict labels (Section 2.2), and an
ensemble is constructed from the individual classifiers (Section 2.3).</p>
      <p>While in the social media subtask images are the only source of
information, in the remote sensing subtask we receive the images
along with two points A and B. The question is whether or not we
can go from point A to point B. Thus, we preprocess the images to
embed these points within the image (Section 2.4).</p>
    </sec>
    <sec id="sec-3">
      <title>Transfer Learning Mechanism</title>
      <p>
        Many advanced CNN architectures have been trained on ImageNet
and are currently available. We selected 10 of them as feature
extractors: DenseNet121 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], DenseNet169 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], DenseNet201 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
InceptionResNetV2 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], InceptionV3 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], MobileNet [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], ResNet50 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
VGG16 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], VGG19 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], Xception [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. We also studied whether global
features extracted with Lire [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] could provide any
significant advantage over the pre-trained models, but since no
improvement was achieved, we decided to use only features extracted
from CNNs.
      </p>
      <p>We replaced the prediction layers of each architecture with new ANN
models, which are responsible for returning the classification
labels. For the social media subtask, the output labels account for (i)
no evidence, (ii) evidence/not passable, and (iii) evidence/passable.
For the remote sensing subtask, the output labels account for (i)
passable, and (ii) non passable. Accordingly, the ANN output layers in the former
subtask have three units, whereas those in the latter have only two.
The output layers use the softmax activation function.</p>
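      <p>As an illustration of the output layers described above, the following minimal sketch applies softmax to three output units (social media subtask) and to two output units (remote sensing subtask). The logit values and the helper name predict_label are hypothetical and chosen only for illustration; they are not taken from our implementation.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math

def softmax(logits):
    """Map raw output-unit scores to probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Label sets as described in the text.
SOCIAL_MEDIA_LABELS = ["no evidence", "evidence/not passable", "evidence/passable"]
REMOTE_SENSING_LABELS = ["passable", "non passable"]

def predict_label(logits, labels):
    """Return the label with the largest softmax probability, plus all probabilities."""
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs

# Hypothetical logits, chosen only for illustration.
label3, probs3 = predict_label([0.2, 1.5, -0.3], SOCIAL_MEDIA_LABELS)
label2, probs2 = predict_label([-1.0, 0.4], REMOTE_SENSING_LABELS)
```
]]></preformat>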
    </sec>
    <sec id="sec-4">
      <title>Prediction Layer Models</title>
      <p>Two approaches performed best in our 5-fold cross validation
analysis of prediction layers. They are hereafter called Model1 and Model2.</p>
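      <p>For readers unfamiliar with the protocol, a 5-fold split can be sketched as follows. This is a minimal sketch under assumptions: the shuffling, the seed, and the function name k_fold_indices are illustrative, and the text does not state whether the folds were shuffled or stratified.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold sizes differ by at most one sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

# 1,664 is the number of satellite images reported for the remote sensing task.
folds = list(k_fold_indices(1664, k=5))
```
]]></preformat>
      <p>Each sample appears in exactly one validation fold, so every candidate prediction model is scored once on every image.</p>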
      <p>Model1 is an ANN having only one hidden layer with 512 nodes.
Each node uses ReLU as the activation function. We added a Dropout
layer with a dropout ratio of 50% in the hidden layer, and L2
regularization to prevent overfitting.</p>
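      <p>The building blocks named above (a ReLU hidden layer, dropout, and an L2 penalty) can be sketched in plain Python as follows. This is a toy illustration, not our implementation: the layer sizes are reduced, the weights are random, the regularization strength is an assumed value, and the inverted form of dropout is an assumption about how the dropout layer behaves.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import random

def dense_relu(x, weights, biases):
    """Fully connected layer followed by ReLU; weights[j] feeds hidden node j."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def dropout(values, rate, rng, training=True):
    """Inverted dropout: zero each value with probability `rate` during
    training and rescale survivors so the expected activation is unchanged."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

def l2_penalty(weights, strength):
    """L2 regularization term that would be added to the training loss."""
    return strength * sum(w * w for row in weights for w in row)

rng = random.Random(0)
n_features, n_hidden = 8, 4  # toy sizes; Model1 in the text uses 512 hidden nodes
x = [rng.uniform(-1, 1) for _ in range(n_features)]
W = [[rng.gauss(0, 0.1) for _ in range(n_features)] for _ in range(n_hidden)]
b = [0.0] * n_hidden

hidden = dense_relu(x, W, b)
hidden_train = dropout(hidden, rate=0.5, rng=rng)  # 50% dropout, as in Model1
hidden_eval = dropout(hidden, rate=0.5, rng=rng, training=False)
penalty = l2_penalty(W, strength=1e-3)  # the strength is an assumed value
```
]]></preformat>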
      <p>Model2 is an ANN having two hidden layers. The first has 2048
nodes and the second has 128. Nodes in the hidden layers use ReLU as
the activation function and L2 regularization. We dropped out 80%
of the connections between the input layer and the first hidden layer.
We also applied a dropout ratio of 50% in each hidden layer.</p>
      <p>We have 10 CNN architectures to extract features and 2 ANN
architectures for prediction. Therefore, for each image we have 20
class predictions; each prediction is a vector of three floating point
numbers in the social media task and two floating point numbers
in the remote sensing task.</p>
      <p>To create an ensemble, we concatenate the class predictions and
use logistic regression to map the 20 × 3 = 60 dimensional vector to 3
output classes in the social media task, and to map the 20 × 2 = 40
dimensional vector to 2 output classes in the remote sensing task.</p>
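      <p>The concatenation step and the logistic regression mapping can be sketched as follows. The base predictions and the weights are placeholders chosen for illustration (the weights simply average the base predictions and are not trained); in practice the logistic regression parameters are fitted on the concatenated predictions.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math

N_CNNS, N_HEADS = 10, 2  # 10 feature extractors x 2 prediction models = 20 predictions

def concatenate(predictions):
    """Flatten a list of per-model probability vectors into one feature vector."""
    return [p for vector in predictions for p in vector]

def softmax(scores):
    """Convert class scores to probabilities that sum to 1."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def logistic_regression_predict(features, weights, biases):
    """Multinomial logistic regression: one linear score per class, then softmax."""
    scores = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)

# Hypothetical base predictions: every model outputs the same 3-class vector.
n_classes = 3
base_predictions = [[0.2, 0.5, 0.3] for _ in range(N_CNNS * N_HEADS)]
features = concatenate(base_predictions)  # 20 x 3 = 60 values

# Placeholder weights (not trained): class k averages the k-th entry
# of every base prediction.
weights = [[1.0 / 20 if i % n_classes == k else 0.0 for i in range(len(features))]
           for k in range(n_classes)]
biases = [0.0] * n_classes
ensemble_probs = logistic_regression_predict(features, weights, biases)
```
]]></preformat>
      <p>For the remote sensing task the same code applies with two-element prediction vectors, which yields a 40-dimensional input.</p>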
    </sec>
    <sec id="sec-5">
      <title>Preprocessing Satellite Images</title>
      <p>Figure 1 illustrates all the steps used to preprocess the satellite
images. Figure 1(a) shows one of the images in the development
dataset. Figures 1(b-d) have marks added for illustrative purposes
and to clarify the explanation; we did not add these marks
to the images provided to the CNNs. Blue and red marks represent
the input points A and B, respectively. Our ultimate goal is to
place these points in fixed locations, so that the model can learn how
to find a path between them. The steps are as follows.</p>
      <p>Figure 1 panels: (a) Original, (b) Marks Added, (c) Cropped, (d) Rotated.</p>
      <list list-type="order">
        <list-item>
          <p>We first compute a point C that lies halfway between A
and B, as shown by the yellow mark in Figure 1(b).</p>
        </list-item>
        <list-item>
          <p>We compute a circumference centered at C whose diameter is the
distance between A and B. Observe in
Figure 1(b) that only a small area of the image is inside the
circumference and that the area outside is not helpful in
answering whether there is a path between A and B. This
situation occurs in several cases.</p>
        </list-item>
        <list-item>
          <p>We crop the image to keep only the circumscribed square, as
shown in Figure 1(c).</p>
        </list-item>
        <list-item>
          <p>We rotate the image around C to place A on the left side,
where the circumference touches the circumscribed square,
and B at the corresponding point on the right side, as shown in Figure 1(d).</p>
        </list-item>
      </list>
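      <p>The geometry behind the four steps above can be sketched as follows. The coordinates of A and B and the function names are hypothetical; the sketch only computes the midpoint, the crop box, and the rotation angle, and it checks the result by rotating the two points. The actual cropping and rotation of pixels would be delegated to an image library, which is not shown here.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math

def preprocessing_geometry(a, b):
    """Return the midpoint C, circle radius, crop box, and rotation angle
    for input points a = (x, y) and b = (x, y) in pixel coordinates."""
    cx, cy = (a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0    # step 1: midpoint C
    radius = math.hypot(b[0] - a[0], b[1] - a[1]) / 2.0  # step 2: |AB| is the diameter
    # Step 3: circumscribed square as (left, top, right, bottom).
    box = (cx - radius, cy - radius, cx + radius, cy + radius)
    # Step 4: angle of the vector from A to B; rotating points by -angle
    # around C makes that vector point along the positive x axis.
    angle = math.atan2(b[1] - a[1], b[0] - a[0])
    return (cx, cy), radius, box, angle

def rotate_point(p, center, theta):
    """Rotate point p around center by theta radians (standard 2D rotation)."""
    dx, dy = p[0] - center[0], p[1] - center[1]
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return (center[0] + dx * cos_t - dy * sin_t,
            center[1] + dx * sin_t + dy * cos_t)

# Hypothetical coordinates for A and B.
A, B = (120.0, 300.0), (200.0, 240.0)
C, r, box, angle = preprocessing_geometry(A, B)
A_rot = rotate_point(A, C, -angle)
B_rot = rotate_point(B, C, -angle)
```
]]></preformat>
      <p>After the rotation, A and B lie at the midpoints of the left and right edges of the crop box, which is where the circumference touches the circumscribed square.</p>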
    </sec>
    <sec id="sec-6">
      <title>RESULTS AND ANALYSIS</title>
      <p>During the training phase, we evaluated the models using 5-fold
cross validation. We selected the four best models and the
ensemble to submit to the organizers, who performed their analysis on
unseen data and reported the results back to us. Table 1 and
Table 2 present the results using the averaged F1-Score metric for
the social media and remote sensing subtasks, respectively.</p>
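      <p>To make the metric concrete, the sketch below computes an averaged F1-Score as the unweighted mean of per-class F1 values. This averaging scheme is an assumption made for illustration; the exact definition used by the organizers is given in the task overview. The labels and predictions are made up.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def f1_per_class(y_true, y_pred, label):
    """F1-Score of a single class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def averaged_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1-Scores (assumed averaging scheme)."""
    return sum(f1_per_class(y_true, y_pred, label) for label in labels) / len(labels)

# Made-up ground truth and predictions for the remote sensing labels.
y_true = ["passable", "passable", "non passable", "non passable"]
y_pred = ["passable", "non passable", "non passable", "non passable"]
score = averaged_f1(y_true, y_pred, ["passable", "non passable"])
```
]]></preformat>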
      <p>In the social media task, the ensemble produced the best results,
obtaining an F1-Score of 64.81% against 62.93% yielded by ResNet50,
the best individual model. In the remote sensing task, the
ensemble achieved 71.72%, while the best individual model, DenseNet121,
reached 73.27%. We believe there is room for improvement if we
tune the ensemble again, or if we replace the logistic regressor with
other classification methods.</p>
    </sec>
    <sec id="sec-7">
      <title>CONCLUSION</title>
      <p>We used pre-trained CNNs as a starting point to create models that
predict if it is possible to travel through a flooded area.</p>
    </sec>
    <sec id="sec-8">
      <title>ACKNOWLEDGMENTS</title>
      <p>We thank CAPES, CNPq (grant 400487/2016-0), and FAPESP
(grant 2015/11937-9).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Benjamin</given-names>
            <surname>Bischke</surname>
          </string-name>
          , Patrick Helber,
          <string-name>
            <given-names>Zhengyu</given-names>
            <surname>Zhao</surname>
          </string-name>
          , Jens de Bruijn, and
          <string-name>
            <given-names>Damian</given-names>
            <surname>Borth</surname>
          </string-name>
          .
          <article-title>The Multimedia Satellite Task at MediaEval 2018: Emergency Response for Flooding Events</article-title>
          .
          In
          <source>Proc. of the MediaEval 2018 Workshop</source>
          (Oct. 29-31,
          <year>2018</year>
          ). Sophia-Antipolis, France.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>François</given-names>
            <surname>Chollet</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Xception: Deep Learning with Depthwise Separable Convolutions</article-title>
          .
          <source>CoRR</source>
          abs/1610.02357 (
          <year>2016</year>
          ). arXiv:1610.02357 http://arxiv.org/abs/1610.02357
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Andrew G.</given-names>
            <surname>Howard</surname>
          </string-name>
          , Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and
          <string-name>
            <given-names>Hartwig</given-names>
            <surname>Adam</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications</article-title>
          . (04
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Kaiming</given-names>
            <surname>He</surname>
          </string-name>
          , Xiangyu Zhang, Shaoqing Ren, and
          <string-name>
            <given-names>Jian</given-names>
            <surname>Sun</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Deep Residual Learning for Image Recognition</article-title>
          .
          <source>CoRR</source>
          abs/1512.03385 (
          <year>2015</year>
          ). arXiv:1512.03385 http://arxiv.org/abs/1512.03385
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Gao</given-names>
            <surname>Huang</surname>
          </string-name>
          , Zhuang Liu, and
          <string-name>
            <given-names>Kilian Q.</given-names>
            <surname>Weinberger</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Densely Connected Convolutional Networks</article-title>
          .
          <source>CoRR</source>
          abs/1608.06993 (
          <year>2016</year>
          ). arXiv:1608.06993 http://arxiv.org/abs/1608.06993
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Mathias</given-names>
            <surname>Lux</surname>
          </string-name>
          and
          <string-name>
            <given-names>Savvas A.</given-names>
            <surname>Chatzichristofis</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Lire: Lucene Image Retrieval: An Extensible Java CBIR Library</article-title>
          .
          <source>In Proceedings of the 16th ACM International Conference on Multimedia (MM '08)</source>
          . ACM, New York, NY, USA,
          <fpage>1085</fpage>
          -
          <lpage>1088</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Karen</given-names>
            <surname>Simonyan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Zisserman</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Very Deep Convolutional Networks for Large-Scale Image Recognition</article-title>
          .
          <source>CoRR</source>
          abs/1409.1556 (
          <year>2014</year>
          ). arXiv:1409.1556 http://arxiv.org/abs/1409.1556
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Christian</given-names>
            <surname>Szegedy</surname>
          </string-name>
          , Sergey Ioffe, and
          <string-name>
            <given-names>Vincent</given-names>
            <surname>Vanhoucke</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning</article-title>
          .
          <source>CoRR</source>
          abs/1602.07261 (
          <year>2016</year>
          ). arXiv:1602.07261 http://arxiv.org/abs/1602.07261
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Christian</given-names>
            <surname>Szegedy</surname>
          </string-name>
          , Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and
          <string-name>
            <given-names>Zbigniew</given-names>
            <surname>Wojna</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Rethinking the Inception Architecture for Computer Vision</article-title>
          .
          <source>CoRR</source>
          abs/1512.00567 (
          <year>2015</year>
          ). arXiv:1512.00567 http://arxiv.org/abs/1512.00567
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>