=Paper=
{{Paper
|id=Vol-2670/MediaEval_19_paper_7
|storemode=property
|title=The Multimedia Satellite Task at MediaEval 2019
|pdfUrl=https://ceur-ws.org/Vol-2670/MediaEval_19_paper_7.pdf
|volume=Vol-2670
|authors=Benjamin Bischke,Patrick Helber,Simon Brugman,Erkan Basar,Zhengyu Zhao,Martha Larson,Konstantin Pogorelov
|dblpUrl=https://dblp.org/rec/conf/mediaeval/BischkeHBB0LP19
}}
==The Multimedia Satellite Task at MediaEval 2019==
The Multimedia Satellite Task at MediaEval 2019: Estimation of Flood Severity

Benjamin Bischke (1,2), Patrick Helber (1,2), Simon Brugman (3), Erkan Basar (3,4), Zhengyu Zhao (3), Martha Larson (3), Konstantin Pogorelov (5,6)

1 German Research Center for Artificial Intelligence (DFKI), Germany
2 TU Kaiserslautern, Germany
3 Radboud University, Netherlands
4 FloodTags, Netherlands
5 Simula Research Laboratory, Norway
6 University of Oslo, Norway

{benjamin.bischke,patrick.helber}@dfki.de, {simon.brugman,erkan.basar,z.zhao,m.larson}@cs.ru.nl, konstantin@simula.no

Copyright 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). MediaEval'19, 27-29 October 2019, Sophia Antipolis, France.

ABSTRACT

This paper provides a description of the Multimedia Satellite Task at MediaEval 2019. The main objective of the task is to extract complementary information associated with events that is present in satellite imagery and news articles. Due to their high socio-economic impact, we focus on flooding events and build upon the last two years of the Multimedia Satellite Task. This year the task focuses on flood severity estimation and consists of three subtasks: (1) Image-based News Topic Disambiguation, (2) Multimodal Flood Level Estimation from news, and (3) Classification of city-centered satellite sequences. The task moves forward the state of the art in flood impact assessment by concentrating on aspects that are important but are not generally studied by multimedia researchers.

Figure 1: Sample images for the Multimodal Flood Level Estimation dataset, shown jointly with extracted pose keypoints. The goal of this subtask is to identify persons standing in water above knee level, based on visual and textual information of news articles.

1 INTRODUCTION

Floods can cause loss of life and substantial property damage. Moreover, the economic ramifications of flood damage disproportionately impact the most vulnerable members of society [12]. To assess the impact of a flooding event, satellite imagery is typically acquired and remote sensing specialists interpret it visually or semi-automatically [7, 11] to create flood maps that quantify the impact of such events. One major drawback of this approach, when relying only on satellite imagery, is that images from optical sensors can be unusable due to the presence of clouds and adverse constellations of non-geostationary satellites at particular points in time. To overcome this drawback, we additionally analyse complementary information from social multimedia and news articles. The larger goal of this task is to analyse and combine the information in satellite images and online media content in order to provide a comprehensive view of flooding events.

While there has been some work on disaster event detection [3, 5, 8] and disaster relief [6, 9, 10] based on social media, not much research has been done in the direction of flood severity estimation. In this task, participants receive multimedia data, news articles, and satellite imagery, and are required to train classifiers. The task moves forward the state of the art in flood impact assessment by concentrating on aspects that are important but are not generally studied by multimedia researchers. This year, we are also particularly interested in a closer analysis of both visual and textual information for severity estimation. In the following, we extend the series of the Multimedia Satellite Task [1, 2] and define three subtasks in the direction of flood severity estimation.

2 TASK DETAILS

2.1 Image-based News Topic Disambiguation

For the first subtask, participants receive links to a set of images that appeared in online news articles (English). They are asked to build a binary image classifier that predicts whether or not the article in which each image appeared mentioned a water-related natural-disaster event. All of the news articles in the data set contain a flood-related keyword, e.g., "flood", but their topics are ambiguous. For example, a news article might mention a "flood of flowers" without being an article on the topic of a natural-disaster flooding event.

Participants are allowed to submit 5 runs:
• Required run 1: using visual information only
• General runs 2, 3, 4, 5: everything automated allowed, including using data from external sources
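Purely as an illustration (not part of the official task materials), one way to realise the visual-only required run 1 is to fine-tune a pretrained CNN as a binary classifier. The folder layout (devset/0_not_flood, devset/1_flood), the choice of PyTorch/torchvision, the ResNet-18 backbone and all hyperparameters in the sketch below are assumptions of this example, not task requirements.

```python
# Illustrative visual-only baseline (run 1): fine-tune a pretrained CNN to decide
# "flood-event related" vs. "not flood-event related" for each news image.
# Assumed (not official) layout: devset/0_not_flood/*.jpg and devset/1_flood/*.jpg;
# ImageFolder assigns class indices by sorted folder name, so 1_flood -> label 1.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],    # ImageNet statistics
                         [0.229, 0.224, 0.225]),
])

train_set = datasets.ImageFolder("devset", transform=preprocess)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True, num_workers=4)

model = models.resnet18(weights="IMAGENET1K_V1")   # pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 1)      # single logit for the binary task
model = model.to(device)

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for epoch in range(5):                             # epoch count chosen arbitrarily
    for images, labels in train_loader:
        images = images.to(device)
        labels = labels.float().unsqueeze(1).to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```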
2.2 Multimodal Flood Level Estimation

In the second subtask, participants receive a set of links to online news articles (English) and links to accompanying images. The set has been filtered to include only news articles for which the accompanying image depicts a flooding event. Participants are asked to build a binary classifier that predicts whether or not the image contains at least one person standing in water above the knee. Participants can use image features only, but the task encourages a combination of image and text features, and even the use of satellite imagery. As in the previous subtask, participants are allowed to submit 5 runs:
• Required run 1: using visual information only
• Required run 2: using text information only
• Required run 3: using visual and text information only
• General runs 4, 5: everything automated allowed, including using data from external sources
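As a purely illustrative sketch of the encouraged image-plus-text combination (run 3), one could extract image features with a pretrained CNN, text features with TF-IDF over the article body, concatenate them, and train a linear classifier. The `articles` triples, the libraries (PyTorch/torchvision, scikit-learn) and all parameters below are assumptions of this sketch, not part of the task definition.

```python
# Illustrative feature-fusion baseline (run 3): CNN image features + TF-IDF text
# features, concatenated and fed to a linear classifier. The `articles` triples
# below are hypothetical placeholders for the real development set.
import numpy as np
import torch
from PIL import Image
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from torchvision import models, transforms

articles = [  # hypothetical (image_path, article_text, above_knee_label) triples
    ("images/0001.jpg", "Residents wade through chest-deep water ...", 1),
    ("images/0002.jpg", "The river burst its banks near the market ...", 0),
]

preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
backbone = models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()        # keep the 512-d penultimate features
backbone.eval()

def image_features(path):
    with torch.no_grad():
        x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        return backbone(x).squeeze(0).numpy()

image_paths, texts, labels = zip(*articles)

vectorizer = TfidfVectorizer(max_features=5000, stop_words="english")
text_feats = vectorizer.fit_transform(texts).toarray()
img_feats = np.stack([image_features(p) for p in image_paths])

fused = np.hstack([img_feats, text_feats])   # simple feature concatenation
clf = LogisticRegression(max_iter=1000).fit(fused, labels)
```

Feature concatenation is only one possible strategy; training separate per-modality classifiers and averaging their scores (late fusion) would satisfy run 3 equally well.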
2.3 City-centered satellite sequences

In this complementary subtask, participants receive a set of sequences of satellite images that depict a certain city over a certain length of time. They are required to create a binary classifier that determines whether or not there was a flooding event ongoing in that city at that time. Because this is the first year we work with sequences of satellite images, the data will be balanced so that the prior probability of an image sequence depicting a flooding event is 50%. This design decision will allow us to better understand the task. Challenges of the task include cloud cover and ground-level changes with non-flood causes. For this subtask, participants are allowed to submit the following five runs:
• Required run 1: using the provided satellite imagery
• General runs 2, 3, 4, 5: everything automated allowed, including using data from external sources

3 DATASET DETAILS

3.1 Image-based News Topic Disambiguation

The dataset for this task contains links to images that accompanied English-language news articles. News articles published in 2017 and 2018 were collected from ten local newspapers in multiple African countries (Kenya, Liberia, Sierra Leone, Tanzania and Uganda) if they contained at least one image and at least one occurrence of the word flood, floods or flooding in the text. This resulted in a set of 17,378 images. We noticed that there was a large number of duplicates in the dataset; we therefore applied a de-duplication algorithm and filtered the images so that we finally obtained a set of unique URLs for all images in the dataset. This filtering step decreased the size of the dataset to 6,477 images. The ground truth of the dataset consists of a class label (0 = not flood-event related, 1 = flood-event related) for each image. It was produced by three human annotators, who labeled the images based on the image and text content of each article. The images for this task were divided into a development set (5,181 images) and a test set (1,296 images) using stratified sampling with a split ratio of 80/20.

3.2 Multimodal Flood Level Estimation

The dataset for the Multimodal Flood Level Estimation task was extracted from the same African newspaper articles that were collected for the subtask described above. However, in contrast to the previous task, we provide participants not only with the images but with the complete articles. In total, we collected 6,166 articles containing the word flooding or floods. We annotated the images based on the image content. For the annotation we used the open-source VGG Image Annotator (VIA) from the Visual Geometry Group at Oxford [4] (available at https://github.com/multimediaeval/2019-Multimedia-Satellite-Task/raw/wiki-data/multimodal-flood-level-estimation/resources/via.html). We drew a bounding box around all people who are depicted with at least one of their feet occluded by water. Children are included in the definition of people, although they are shorter. In order to derive consistent labels, we were in particular interested in persons standing in water, in the sense that the part of the body that is under water should be in an upright position. For each of the bounding boxes we additionally collected a depth indicator: feet, knee, hip or chest. If one knee is occluded by water but not the hip, then we annotated knee, because the highest body part the water has reached is the knee. We followed the same approach as described above to divide the articles into a development set (4,932 articles) and a test set (1,234 articles).

3.3 City-centered satellite sequences

The dataset for the last subtask was derived from the Sentinel-2 satellite archive of the European Space Agency (ESA) and the Copernicus Emergency Management Service (EMS). We collected satellite images for past flooding events that have been mapped and validated by human annotators from the Copernicus EMS team. Rather than relying on a single satellite image to estimate flood severity, we consider a sequence of images. We provide multi-spectral Sentinel-2 images, since bands beyond the visible RGB channels contain vital information about water. Please note that we use L2A pre-processed Sentinel-2 images, which are already atmospherically corrected and consist of 12 bands (since L2A images contain Bottom-Of-Atmosphere corrected reflectance, Band 10 is missing, as it corresponds to cirrus clouds). For each flooding event, we determine and download the corresponding Sentinel-2 image sequences recorded 45 days before and 45 days after the flooding event. We compute the intersection of the satellite images with the ground truth obtained from the EMS service and split the image sequences into smaller patches of size 512 x 512 pixels. This resulted in a set of 335 image sequences. Depending on the constellation of the Sentinel-2 satellites, we obtained between 4 and 20 image patches for each sequence. For each image patch, we provide additional metadata such as cloud cover and the amount of black pixels due to errors in the data acquisition. The label is created based on the intersection of the images in each sequence with the manually annotated flood extent of EMS (0 = no overlap, 1 = overlap with image sequence). We split the sequences 80/20 into a development set and a test set.
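To illustrate why the non-visible bands matter, the sketch below computes the Normalised Difference Water Index (NDWI), which contrasts the green band (B03) with the near-infrared band (B08) and responds strongly to open water. The per-band GeoTIFF file names and the 0.2 threshold are assumptions of this illustration; the task does not prescribe NDWI or any particular water index.

```python
# Illustrative use of the non-visible bands: the Normalised Difference Water Index
# (NDWI) contrasts the green band (B03) with the near-infrared band (B08) and is
# high over open water. The per-band GeoTIFF names below are hypothetical; they do
# not reflect the official file naming of the released patches.
import numpy as np
import rasterio

def read_band(path):
    with rasterio.open(path) as src:
        return src.read(1).astype("float32")

green = read_band("patch_0001_B03.tif")   # Sentinel-2 band 3 (green)
nir = read_band("patch_0001_B08.tif")     # Sentinel-2 band 8 (near infrared)

ndwi = (green - nir) / (green + nir + 1e-6)   # small epsilon avoids division by zero

# Crude per-patch water fraction; the 0.2 threshold is an assumption of this sketch.
water_fraction = float((ndwi > 0.2).mean())
print(f"estimated water fraction: {water_fraction:.2%}")
```

Tracking such a per-patch water fraction across the images of a sequence, for example before and after a reported event, is one simple way to turn the multi-spectral information into features for the required run 1 classifier.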
4 EVALUATION

In order to evaluate the approaches, we use the F1-score as the metric for all three subtasks. The metric computes the harmonic mean of precision and recall for the corresponding class of the task.

ACKNOWLEDGMENTS

This work was supported by the BMBF project DeFuseNN (01IW17002) and the NVIDIA AI Lab (NVAIL) program. We would like to thank the FloodTags team for giving us access to the links of the news articles.

REFERENCES

[1] Benjamin Bischke, Patrick Helber, Christian Schulze, Venkat Srinivasan, Andreas Dengel, and Damian Borth. 2017. The Multimedia Satellite Task at MediaEval 2017. In Working Notes Proceedings of the MediaEval 2017 Workshop, co-located with the Conference and Labs of the Evaluation Forum (CLEF 2017), Dublin, Ireland, September 13-15, 2017.
[2] Benjamin Bischke, Patrick Helber, Zhengyu Zhao, Jens de Bruijn, and Damian Borth. 2018. The Multimedia Satellite Task at MediaEval 2018. In Working Notes Proceedings of the MediaEval 2018 Workshop, Sophia Antipolis, France, 29-31 October 2018.
[3] Tom Brouwer, Dirk Eilander, Arnejan Van Loenen, Martijn J. Booij, Kathelijne M. Wijnberg, Jan S. Verkade, and Jurjen Wagemaker. 2017. Probabilistic flood extent estimates from social media flood observations. Natural Hazards & Earth System Sciences 17, 5 (2017).
[4] Abhishek Dutta and Andrew Zisserman. 2019. The VIA Annotation Software for Images, Audio and Video. In Proceedings of the 27th ACM International Conference on Multimedia (MM '19). ACM, New York, NY, USA, 4. https://doi.org/10.1145/3343031.3350535
[5] Dirk Eilander, Patricia Trambauer, Jurjen Wagemaker, and Arnejan Van Loenen. 2016. Harvesting social media for generation of near real-time flood maps. Procedia Engineering 154 (2016), 176–183.
[6] Huiji Gao, Geoffrey Barbier, and Rebecca Goolsby. 2011. Harnessing the crowdsourcing power of social media for disaster relief. IEEE Intelligent Systems 26, 3 (2011), 10–14.
[7] Jessica Heinzelman and Carol Waters. 2010. Crowdsourcing crisis information in disaster-affected Haiti. US Institute of Peace, Washington, DC.
[8] Min Jing, Bryan W. Scotney, Sonya A. Coleman, Martin T. McGinnity, Stephen Kelly, Xiubo Zhang, Khurshid Ahmad, Antje Schlaf, Sabine Grunder-Fahrer, and Gerhard Heyer. 2016. Flood event image recognition via social media image and text analysis. In IARIA Conference COGNITIVE.
[9] Shamanth Kumar, Geoffrey Barbier, Mohammad Ali Abbasi, and Huan Liu. 2011. TweetTracker: An analysis tool for humanitarian and disaster relief. In Fifth International AAAI Conference on Weblogs and Social Media.
[10] Peter M. Landwehr and Kathleen M. Carley. 2014. Social media in disaster relief. In Data Mining and Knowledge Discovery for Big Data. Springer, 225–257.
[11] Ida Norheim-Hagtun and Patrick Meier. 2010. Crowdsourcing for crisis mapping in Haiti. Innovations: Technology, Governance, Globalization 5, 4 (2010), 81–89.
[12] Tim G. J. Rudner, Marc Rußwurm, Jakub Fil, Ramona Pelich, Benjamin Bischke, Veronika Kopačková, and Piotr Biliński. 2019. Multi3Net: Segmenting Flooded Buildings via Fusion of Multiresolution, Multisensor, and Multitemporal Satellite Imagery. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 702–709.