=Paper=
{{Paper
|id=Vol-1984/Mediaeval_2017_paper_12
|storemode=property
|title=WISC at MediaEval 2017: Multimedia Satellite Task
|pdfUrl=https://ceur-ws.org/Vol-1984/Mediaeval_2017_paper_12.pdf
|volume=Vol-1984
|authors=Nataliya Tkachenko,Arkaitz Zubiaga,Rob Procter
|dblpUrl=https://dblp.org/rec/conf/mediaeval/TkachenkoZP17
}}
==WISC at MediaEval 2017: Multimedia Satellite Task==
Nataliya Tkachenko (1,2), Arkaitz Zubiaga (2), Rob Procter (1,2,3)
(1) Warwick Institute for the Science of Cities, University of Warwick, Coventry CV4 7AL, UK
(2) Department of Computer Science, University of Warwick, Coventry CV4 7AL, UK
(3) The Alan Turing Institute, The British Library, London NW1 2DB, UK
{nataliya.tkachenko,a.zubiaga,rob.procter}@warwick.ac.uk

Copyright held by the owner/author(s). MediaEval'17, 13-15 September 2017, Dublin, Ireland.

ABSTRACT
This working note describes the work of the WISC team on the Multimedia Satellite Task at MediaEval 2017. We describe the runs that our team submitted to both the DIRSM and FDSI subtasks, as well as our evaluations on the development set. Our results demonstrate high accuracy in the detection of flooded areas from user-generated content in social media. In the first subtask, disaster image retrieval from social media, we found that the tags users assign to describe their images are very helpful for achieving high classification accuracy. In the second subtask, detecting floods in satellite images, we found that social media can increase the precision of analyses when combined with satellite images, by taking advantage of spatial and temporal overlaps between data sources.

1 INTRODUCTION
Accurate and timely designation of flooded areas is beneficial for building and maintaining situational awareness and for estimating the impact of natural hazards [4]. When it comes to the estimation of impact, there is no consistency across experts as to the different methods used to measure impact [3, 11]. Assessing and comparing disaster impact has traditionally been deemed a very challenging task, as systematic data or studies are hard to obtain. Moreover, historical data gathered from different sources covering different regions cannot be effectively used for new regions or at different points in time. Examples of valuation techniques used for impact assessments include market-based techniques, such as property destruction and reduction in income and sales, and non-market-based techniques, such as loss of life, various environmental consequences and psychological effects suffered by the affected individuals [3, 6, 11].
Owing to the importance of furthering impact estimation techniques, interest in the development of computational approaches has increased [2, 12]. Current methods for flood detection and flood impact estimation make use of contemporary, open data sources such as social media [7, 9, 10]. The objective of this shared task, and of the contribution by the Warwick Institute for the Science of Cities (WISC) team, is to assess the extent to which these techniques approximate the results obtained through traditional methods of flood detection, such as local sensor networks and satellite images [1]. This working note presents our efforts and achievements towards this objective.

2 RELATED WORK
Recent work has proposed combining traditional data sources such as sensor networks (e.g., river gauges and pluviometers) with user-generated data from social media platforms such as Twitter and Flickr (Tkachenko et al., under review). To the best of our knowledge, however, the combination of satellite images and georeferenced UGC has not yet been tackled in scientific research, potentially because it may be of limited use in areas with a high percentage of cloud cover (e.g., Northern Europe), or because imagery of sufficiently high spatio-temporal resolution can be very expensive.
With the growing availability of free or inexpensive multi- and hyperspectral image tiles, it is becoming increasingly important to understand how such data sources perform alongside new methods, and how their combined use can help overcome the limitations each has when used independently. With our participation in the Multimedia Satellite Task, we aimed to analyse how social media can be used to identify flooded areas, as well as to identify the best classification approaches.

3 APPROACH

3.1 DIRSM Subtask
3.1.1 Experiment Settings. We performed 10-fold cross-validation experiments. We used two different ways of evaluating our approaches: (1) precision, recall and F1 score over the positive (flood) class, and (2) Average Precision at X (AP@X) at various cutoffs, X = {50, 100, 200, 300, 400, 500}. Since the official evaluation relies on the latter, we chose our best submissions based on AP@X, looking especially at X = {50, 100, 200}, as the other values were rather high for our smaller test sets.

3.1.2 Features. We use combinations of the following features (a brief illustrative sketch of how these features can be assembled and scored is given at the end of Section 3.1):
• Visual features: having performed leave-one-out tests of combinations of the visual features provided by the organisers, we found the best combination to be the one including CEDD, CL and GABOR.
• Metadata: we combined three of the metadata fields provided with the dataset, namely description, title and tags. The features were all represented using a bag-of-words approach; however, we built three separate vectors, one for each metadata field, which were then concatenated into a single vector. For all three fields, we lowercased the texts and tokenised them by the space character. We also tokenised multi-word user tags.
• Word embeddings: we trained word embeddings from a large collection of titles, descriptions and user tags. We used the entire YFCC100m dataset [8] to obtain overall 215 million input texts combining all three types of features, which were fed into a Gensim word embedding model with 300 dimensions [5]. These word embeddings were then used to create vectors of 300 dimensions for each of the title, description and user tags of each image. To create a vector for each sentence, we averaged the word vectors of the words composing the sentence, as in [13].
• Machine translation: we used the Bing machine translation API to translate user tags into English, where a user tag was not originally in English. We used the translation package for Python (https://pypi.python.org/pypi/translation) to achieve this.

3.1.3 Classifiers. We tested different classifiers, including a Logistic Regression classifier, Random Forests, Multinomial Naive Bayes and a Multi-layer Perceptron. We opted to build our system using a Logistic Regression classifier based on the performance observed on the development set. We used the confidence scores provided by the classifier to rank the images.
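The working note does not include the implementation itself. The following is a minimal Python sketch of how the feature construction, Logistic Regression ranking and AP@X evaluation described above could fit together, assuming scikit-learn for the classifier and Gensim KeyedVectors for the embeddings. All identifiers (the image record fields, the embeddings file path, the helper names) are hypothetical and not taken from the paper, and the AP@K formulation shown is one common variant that may differ from the task's official scoring tool.

```python
import numpy as np
from gensim.models import KeyedVectors
from sklearn.linear_model import LogisticRegression

def average_word_vectors(text, kv, dim=300):
    """Average the embeddings of the tokens in a lowercased, space-tokenised text.
    Out-of-vocabulary words are skipped; an all-zero vector is returned for empty texts."""
    tokens = text.lower().split()
    vectors = [kv[t] for t in tokens if t in kv]
    return np.mean(vectors, axis=0) if vectors else np.zeros(dim)

def build_feature_vector(image, kv):
    """Concatenate visual descriptors with averaged word vectors for the
    title, description and user tags of a single (hypothetical) image record."""
    parts = [np.asarray(image["visual"])]  # e.g. CEDD + CL + GABOR descriptors
    for field in ("title", "description", "tags"):
        parts.append(average_word_vectors(image.get(field, ""), kv))
    return np.concatenate(parts)

def average_precision_at_k(ranked_labels, k):
    """One common formulation of AP@K over a ranked list of 0/1 relevance labels."""
    hits, precisions = 0, []
    for i, label in enumerate(ranked_labels[:k], start=1):
        if label:
            hits += 1
            precisions.append(hits / float(i))
    return sum(precisions) / len(precisions) if precisions else 0.0

# --- hypothetical usage ----------------------------------------------------
# kv = KeyedVectors.load("yfcc100m_300d.kv")  # pre-trained embeddings; path is an assumption
# X_train = np.stack([build_feature_vector(img, kv) for img in train_images])
# y_train = np.array([img["label"] for img in train_images])
# X_test = np.stack([build_feature_vector(img, kv) for img in test_images])
#
# clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# scores = clf.predict_proba(X_test)[:, 1]   # confidence for the "flood" class
# order = np.argsort(-scores)                # rank images by classifier confidence
# ranked_labels = [test_images[i]["label"] for i in order]
# print(average_precision_at_k(ranked_labels, 50))
```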
3.2 FDSI Subtask
In this subtask, we first performed a selection of the spectral indices that could be constructed from the 4-band spectral resolution data supplied. The indices in question were the LWI (Land Water Index), NDVI (Normalised Difference Vegetation Index) and NDWI (Normalised Difference Water Index). For the subsequent runs we used machine learning methods for supervised classification and for unsupervised clustering; these were applied to the NDWI, the best-performing spectral index in the first step of the FDSI task.
We also developed a second run, in which we used KMeans to obtain binary image segmentations on the basis of the spatial distribution of the spectrally concentrated and transitioned pixels.
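Below is a minimal sketch of an NDWI computation and a KMeans-based binary segmentation in the spirit of the second FDSI run, assuming scikit-learn's KMeans and the standard (green - NIR) / (green + NIR) formulation of the NDWI. The band order of the input tile and the rule for labelling the "wet" cluster as flooded are assumptions, not details taken from the note.

```python
import numpy as np
from sklearn.cluster import KMeans

def ndwi(green, nir, eps=1e-6):
    """Normalised Difference Water Index: (green - NIR) / (green + NIR)."""
    green = green.astype(np.float64)
    nir = nir.astype(np.float64)
    return (green - nir) / (green + nir + eps)

def kmeans_flood_mask(index_image, seed=0):
    """Cluster per-pixel index values into two groups and label the cluster
    with the higher mean NDWI (the 'wetter' cluster) as flooded (1)."""
    values = index_image.reshape(-1, 1)
    km = KMeans(n_clusters=2, random_state=seed, n_init=10).fit(values)
    wet_cluster = np.argmax(km.cluster_centers_.ravel())
    return (km.labels_ == wet_cluster).reshape(index_image.shape).astype(np.uint8)

# --- hypothetical usage ----------------------------------------------------
# `tile` is assumed to be an (H, W, 4) array with bands ordered R, G, B, NIR;
# the actual band order of the task data may differ.
# water_index = ndwi(tile[:, :, 1], tile[:, :, 3])
# flood_mask = kmeans_flood_mask(water_index)
```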
4 EXPERIMENTS AND RESULTS

4.1 DIRSM Subtask
Based on performance assessments, we chose the following five submissions:
• Run 1, visual information: only visual features.
• Run 2, metadata: only metadata features.
• Run 3, visual information and metadata: both sets of features, concatenating the two vectors.
• Run 4, word vectors: we concatenate five vectors: visual features, metadata, word vectors of titles, word vectors of user tags and word vectors of descriptions.
• Run 5, machine translation and word vectors: we concatenate five vectors: visual features, metadata, word vectors of titles, word vectors of machine-translated user tags and word vectors of descriptions.

Table 1: DIRSM results on the development set (AP@X).
Run no. | X=50  | X=100 | X=200 | X=300 | X=400
#1      | 0.980 | 0.990 | 0.995 | 0.793 | 0.596
#2      | 0.980 | 0.990 | 0.988 | 0.676 | 0.507
#3      | 0.980 | 0.990 | 0.979 | 0.671 | 0.504
#4      | 0.980 | 0.990 | 0.975 | 0.666 | 0.500
#5      | 0.980 | 0.990 | 0.974 | 0.665 | 0.500

Table 2: DIRSM results on the test set (AP@X).
Run no. | Avg. over X={50, 100, 250, 480} | X=480
#1      | 0.6275                          | 0.5095
#2      | 0.7437                          | 0.6678
#3      | 0.8087                          | 0.7226
#4      | 0.8161                          | 0.7197
#5      | 0.8199                          | 0.7210

Tables 1 and 2 show our results on the development and test sets, respectively. While results are similar over the development set, we observe remarkable differences in the test set. The metadata classifier (#2) performs better than the one based on visual features (#1); however, the combination of both leads to substantial improvements (#3). There is a further considerable improvement when we use deep learning to represent the features as word vectors (#4), and a further slight improvement when we use machine translation to have all tags consistently in English (#5).

4.2 FDSI Subtask

Table 3: FDSI results on the development set.
Run no. | Jaccard (Dev. Set)
#1      | 0.83
#2      | 0.87

Table 4: FDSI results on the test set.
Run no. | Jaccard (Test Set 1) | Jaccard (Test Set 2)
#1      | 0.80                 | 0.83
#2      | 0.81                 | 0.77

Tables 3 and 4 show our results on the development and test sets, respectively. They show the benefit of leveraging social media features (#2) over not using them (#1) when the training and testing data overlap spatially and temporally (Test Set 1). This is, however, not the case for Test Set 2, where the test data includes new locations, which we aim to explore further in future work.
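Tables 3 and 4 report the Jaccard index between predicted and reference flood masks. For reference, a small sketch of this metric for binary NumPy masks; the function and argument names are illustrative, not from the note.

```python
import numpy as np

def jaccard_index(pred_mask, true_mask):
    """Intersection over union of two binary flood masks (1 = flooded)."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return np.logical_and(pred, true).sum() / float(union)
```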
5 CONCLUSION
We have explored the use of classifiers to identify social media images of flooded areas. In the DIRSM task we found that combining visual features and social metadata can be beneficial, and that the use of external resources to train word embeddings and to translate the metadata into English can lead to further improvements. In the FDSI task, our results showed more accurate detection of flooded areas with the help of the social media classifiers. Social media can boost precision in combined analyses where the training and test data overlap spatially and temporally.

ACKNOWLEDGMENTS
We wish to thank the Alan Turing Institute for its support.

REFERENCES
[1] Benjamin Bischke, Patrick Helber, Christian Schulze, Srinivasan Venkat, Andreas Dengel, and Damian Borth. 2017. The Multimedia Satellite Task at MediaEval 2017: Emergency Response for Flooding Events. In Proc. of the MediaEval 2017 Workshop (Sept. 13-15, 2017). Dublin, Ireland.
[2] J Fohringer, D Dransch, H Kreibich, and K Schröter. 2015. Social media as an information source for rapid flood inundation mapping. Natural Hazards and Earth System Sciences 15, 12 (2015), 2725–2738.
[3] Valentina Gallina, Silvia Torresan, Andrea Critto, Anna Sperotto, Thomas Glade, and Antonio Marcomini. 2016. A review of multi-risk methodologies for natural hazards: Consequences and challenges for a climate change impact assessment. Journal of Environmental Management 168 (2016), 123–132.
[4] Benjamin Herfort, João Porto de Albuquerque, Svend-Jonas Schelhorn, and Alexander Zipf. 2014. Exploring the geographical relations between social media and flood phenomena to improve situational awareness. In Connecting a Digital Europe through Location and Place. Springer, 55–71.
[5] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. 3111–3119.
[6] OECD. 2013. OECD: New Data for Understanding the Human Condition. OECD Global Science Forum Report on Data and Research Infrastructure for the Social Sciences. (2013).
[7] Luke Smith, Qiuhua Liang, Phil James, and Wen Lin. 2015. Assessing the utility of social media as a data source for flood risk management using a real-time modelling framework. Journal of Flood Risk Management (2015).
[8] Bart Thomee, David A Shamma, Gerald Friedland, Benjamin Elizalde, Karl Ni, Douglas Poland, Damian Borth, and Li-Jia Li. 2016. YFCC100M: The new data in multimedia research. Commun. ACM 59, 2 (2016), 64–73.
[9] Nataliya Tkachenko, Stephen Jarvis, and Rob Procter. 2017. Predicting floods with Flickr tags. PLoS ONE 12, 2 (2017), e0172870.
[10] Nataliya Tkachenko, Rob Procter, and Stephen Jarvis. 2016. Predicting the impact of urban flooding using open data. Open Science 3, 5 (2016), 160013.
[11] Gisela Wachinger, Ortwin Renn, Chloe Begg, and Christian Kuhlicke. 2013. The risk perception paradox: implications for governance and communication of natural hazards. Risk Analysis 33, 6 (2013), 1049–1065.
[12] Huan Wu, Robert F Adler, Yudong Tian, George J Huffman, Hongyi Li, and JianJian Wang. 2014. Real-time global flood estimation using satellite-based precipitation and a coupled land surface and routing model. Water Resources Research 50, 3 (2014), 2693–2717.
[13] Arkaitz Zubiaga, Elena Kochkina, Maria Liakata, Rob Procter, and Michal Lukasik. 2016. Stance classification in rumours as a sequential task exploiting the tree structure of social media conversations. In Proceedings of the 26th International Conference on Computational Linguistics.