=Paper=
{{Paper
|id=Vol-1984/Mediaeval_2017_paper_46
|storemode=property
|title=BMC@MediaEval 2017 Multimedia Satellite Task via Regression Random Forest
|pdfUrl=https://ceur-ws.org/Vol-1984/Mediaeval_2017_paper_46.pdf
|volume=Vol-1984
|authors=Xiyao Fu,Yi Bin,Liang Peng,Jie Zhou,Yang Yang,Heng Tao Shen
|dblpUrl=https://dblp.org/rec/conf/mediaeval/FuBPZ0S17
}}
==BMC@MediaEval 2017 Multimedia Satellite Task via Regression Random Forest==
BMC@MediaEval 2017 Multimedia Satellite Task via Regression Random Forest

Xiyao Fu, Yi Bin, Liang Peng, Jie Zhou, Yang Yang, Heng Tao Shen
Center for Future Media and School of Computer Science and Engineering, University of Electronic Science and Technology of China
fu.xiyao.gm@gmail.com, yi.bin@hotmail.com, pliang951125@outlook.com, jiezhou0714@gmail.com, dlyyang@gmail.com, shenhengtao@hotmail.com

ABSTRACT

In the MediaEval 2017 Multimedia Satellite Task, we propose an approach based on regression random forests that can extract valuable information from a small number of images and their corresponding metadata. The experimental results show that, when processing social media images, the proposed method performs well in circumstances where the image features are low-level and the training samples are relatively few. Additionally, when the low-level color features of satellite images are too ambiguous to analyze, the random forest is also an effective way to detect flooded areas.

1 INTRODUCTION

The rise of social media provides us with an opportunity to address specific tasks, e.g., disaster prediction and specific scene identification. Such problems can be crucial for agriculture, urbanization and environmental monitoring. The MediaEval 2017 Multimedia Satellite Task consists of two subtasks: Disaster Image Retrieval from Social Media (DIRSM) and Flood Detection in Satellite Images (FDSI). The former requires the prediction system to identify flooding in social media pictures, while the latter aims to determine which districts in a certain area of a satellite image are suffering from flooding.

As to the theoretical basis of the task, existing work such as [1] uses Twitter as the main data source and analyzes the associated geographical, textual, temporal and social media information. The authors split the task into four components (metadata analysis, text analysis, image analysis and temporal aggregation), each of which represents one way of exploiting the data from tweets. For the processing of satellite images relevant to the FDSI subtask, Chaouch et al. [3] exploited and combined different low-level color selectors to identify flooded areas in different satellite pictures. However, they used an RGB color map to detect flooded areas by predicting the water level, which requires the images to have strong diversity; under conditions such as those in this subtask, where the colors of the images are dim, the method may have difficulty. In [6], the authors employed SVMs and low-level feature descriptors (e.g., the SIFT descriptor) to detect fire scenes. In [8], the authors aim to generate spatial variants of satellite images in order to map the flooded areas; as mentioned above, however, the histogram they use to map the variance is time-consuming to compute, and the data format is not well suited to the FDSI task.

In this paper, we propose to employ regression random forests to rank the relevance of flooding in both social media images taken by cameras and satellite images describing the overall situation of a certain district. We show that our method balances efficiency and effectiveness well; in other words, it achieves comparable performance with an extraordinarily short running time. More specifically, we design two prediction systems based on the regression random forest, which has been proven effective in handling high-dimensional data and in preventing overfitting when the training set is comparatively small [7]. In the rest of this paper, we discuss the approach developed for our systems and the evaluation of the experimental results.

2 APPROACH DESCRIPTION

2.1 DIRSM Subtask

The goal of the DIRSM subtask is to retrieve, from social media streams, all images which show direct evidence of a flooding event. The details of this subtask are described in [2]. The main challenge of this task is twofold: (a) discrimination of the water levels in different areas, and (b) consideration of different types of flooding events. In many cases the images can be confusing to classify (e.g., telling images showing a rushing river or a rainforest apart from real flooding scenes such as a flooded park).

2.1.1 Feature Extraction. In recent years, Convolutional Neural Networks (CNNs) have been dominating the field of computer vision, for example in recognition and detection. Therefore, in addition to the baseline features provided by the organizers, we also extract robust CNN features to improve the performance of our system. Specifically, we apply ResNet-152 [4] as the extraction network and employ the Caffe toolbox [5] to extract features from the training set. Each image is represented by the output of the bottom convolutional layer of ResNet, giving a feature vector of dimension 2048.
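For concreteness, the following is a minimal sketch of this feature extraction step. It substitutes torchvision's ResNet-152 for the Caffe toolbox that the paper actually uses, and the layer choice (the 2048-dimensional output after the final convolutional stage and global average pooling) is an assumption made to match the stated feature dimension rather than the authors' exact extraction point.

<pre>
# Hedged sketch: 2048-dim image features from a ResNet-152 backbone
# (torchvision stands in for the Caffe pipeline used in the paper).
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Load a pretrained ResNet-152 and drop the final classification layer,
# keeping everything up to the global average pool (2048-dim output).
resnet = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
feature_net = torch.nn.Sequential(*list(resnet.children())[:-1]).eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_feature(image_path: str) -> torch.Tensor:
    """Return a 2048-dim feature vector for one image."""
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        feat = feature_net(img)   # shape: (1, 2048, 1, 1)
    return feat.flatten()         # shape: (2048,)
</pre>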
2.1.2 Models Definition. In order to improve the performance of our system, the learning algorithm we choose must satisfy several requirements: (a) it should remain effective in circumstances where data is limited; (b) it should be competitive in accuracy with current algorithms; (c) it should handle thousands of input variables without variable deletion; and (d) it should be fast enough. Based on these considerations, we use a regression random forest for the ranking in all five runs.

As an important member of the family of ensemble methods, the random forest is a high-performance method for both classification and regression. When the number of training images is not large enough to exploit other learning methods (e.g., deep learning), a random forest can prevent overfitting and the imbalance of features in the dataset [7]. A random forest consists of many classification (or regression) trees and uses a bagging mechanism to learn the base estimators. As one of the main contributions of ensemble methods, bagging randomly allocates training data (including the learned features) to each classifier (or regressor) to train a base estimator.

Theoretically, the results improve as the number of trees in the random forest increases. However, the computation cost also grows, while the gain from each additional tree diminishes as the forest becomes larger. This indicates that the number of trees should be limited.

As for the details of the parameter settings, we set the bagging fraction of the forest to 0.9 [9] and the minimal leaf size to 10 (when the number of data points in a node falls below this value, we stop splitting), and we use 500 regression trees in the random forest. During training, we split the development set into a training set and a validation set containing 80 and 20 percent of the data, respectively.
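A minimal sketch of this setup is given below, assuming scikit-learn's RandomForestRegressor in place of whatever implementation the authors used. The parameter mapping (bagging fraction to max_samples, minimal leaf size to min_samples_leaf) is an assumption, and the feature and label files are hypothetical placeholders for the 2048-dim CNN features and the flooding-evidence labels of the development set.

<pre>
# Hedged sketch of the DIRSM regressor with the parameters reported above.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X = np.load("dirsm_resnet_features.npy")   # hypothetical file: (n_images, 2048)
y = np.load("dirsm_labels.npy")            # hypothetical file: (n_images,)

# 80/20 split of the development set into training and validation parts.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)

forest = RandomForestRegressor(
    n_estimators=500,       # 500 regression trees
    min_samples_leaf=10,    # stop splitting below 10 data points per leaf
    max_samples=0.9,        # each tree is bagged on 90% of the training data
    n_jobs=-1,
    random_state=0,
)
forest.fit(X_train, y_train)

# Predicted scores are used to rank images by flooding relevance.
val_scores = forest.predict(X_val)
ranking = np.argsort(-val_scores)
</pre>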
2.2 FDSI Subtask

The aim of the FDSI subtask is to develop a model that is able to identify regions in satellite imagery which are affected by flooding. As with the DIRSM subtask, the details can be found in [2]. The main challenge lies in defining the flooded area based on the situation of the adjoining areas. For example, in a satellite image a lake has clear bounds and belongs to the non-flooded area, while a river does not have intact bounds and partly belongs to the flooded area.

As before, we use ResNet and Caffe to extract the features of the satellite images, again taking the 2048-dimensional vectors from the bottom convolutional layer of ResNet. We likewise use a random forest to process the images in the development set, for the same reason: the number of features to learn from the satellite images is small, and the development set is even smaller than that of the DIRSM subtask. We set the bagging fraction of the forest to 0.8, the minimal leaf size to 20, and the number of trees to 400. During training, we use the first four development set folders as the training set and the other two as the validation set.

3 EXPERIMENTS AND RESULTS

3.1 DIRSM Subtask

Because of the different requirements of the five runs, we train with slightly different feature settings. In runs 1, 4 and 5, we only utilize the image features in training. To augment the dataset for runs 4 and 5, we randomly crop and horizontally flip the original images in the development set, obtaining 5320 additional images.
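The offline augmentation can be sketched as follows. The crop ratio, the number of cropped variants per image, and the file layout are illustrative assumptions and are not specified in the paper.

<pre>
# Hedged sketch of the offline augmentation (random crop + horizontal flip).
import os
import random
from PIL import Image, ImageOps

def augment_image(path, out_dir, n_crops=2, crop_ratio=0.8):
    """Write randomly cropped and horizontally flipped copies of one image."""
    img = Image.open(path).convert("RGB")
    w, h = img.size
    cw, ch = int(w * crop_ratio), int(h * crop_ratio)
    stem = os.path.splitext(os.path.basename(path))[0]
    for i in range(n_crops):
        left = random.randint(0, w - cw)
        top = random.randint(0, h - ch)
        crop = img.crop((left, top, left + cw, top + ch))
        crop.save(os.path.join(out_dir, f"{stem}_crop{i}.jpg"))
    # Horizontally flipped copy of the original image.
    ImageOps.mirror(img).save(os.path.join(out_dir, f"{stem}_flip.jpg"))
</pre>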
In run 2, we utilize the text given in the associated metadata of the development set. We represent each word in every image description as a 300-dimensional GloVe vector and constrain each sentence to the maximum sentence length, generating a matrix that includes all the sentences. In run 3, we use both the text and the development set images to train the random forest.

The mean average precision (mAP) scores obtained by the five runs are shown in Table 1. The mAP scores listed are the mean of the average precision at the top 50, 100, 200, 300, 400 and 500 rankings of each run.

Table 1: Performance on the test set of the DIRSM subtask (mAP)

        run 1   run 2   run 3   run 4   run 5
mAP     19.21   12.84   18.30   17.24   17.72

As we can see, the metadata-only model (run 2) performs much worse than the visual-information-only model (run 1), which indicates that the associated descriptions are much noisier than the visual information for flooding prediction. Intuitively, more features bring more information and should yield better performance. However, run 3 (the combination of visual and textual features) performs slightly worse than run 1. This further demonstrates that the textual descriptions introduce considerable noise and can even decrease the performance of the image-only model.

3.2 FDSI Subtask

For runs 1, 2 and 3, we only utilize the original satellite images and their ground-truth masks in training. For runs 4 and 5, we use the cropped satellite images and the horizontally flipped images as well.

Table 2 lists the intersection over union (IoU) of the experimental results.

Table 2: Performance on the test set of the FDSI subtask (IoU)

                run 1    run 2    run 3    run 4    run 5
location 01     0.3657   0.368    0.3525   0.3617   0.3678
location 02     0.3286   0.3125   0.3226   0.3221   0.3224
location 03     0.3408   0.3359   0.342    0.3486   0.3421
location 04     0.3107   0.32     0.3155   0.3129   0.3106
location 05     0.426    0.427    0.424    0.433    0.4341
new location    0.402    0.401    0.402    0.403    0.401

The best performance is obtained on location 05 and on the new location provided by the task organizers. A possible reason is that the number of images in these locations is relatively small, reducing the possibility of overfitting. Besides, the mean performance of the last two runs is better than that of the first three, indicating that the runs trained on the augmented data give better results. It is possible that the first three runs suffer from the variance among the images, but on the whole the results are satisfactory.

4 CONCLUSION

In this paper, we described our approach to the MediaEval 2017 Multimedia Satellite Task. In both subtasks, combining random forests with CNN features enhanced the detection performance. In the DIRSM subtask, combining the features learnt from text and images improved the regression-based labeling, but our method still suffers from noise. As for the FDSI subtask, the performance of the proposed method could be better when the number of test images is smaller; the best results were obtained on location 05 and on the new location. Overall, the proposed method achieved promising performance in processing both social media streams and satellite images.

REFERENCES

[1] Benjamin Bischke, Damian Borth, Christian Schulze, and Andreas Dengel. Contextual enrichment of remote-sensed events with social media streams. In Proceedings of the 2016 ACM on Multimedia Conference, 2016.
[2] Benjamin Bischke, Patrick Helber, Christian Schulze, Srinivasan Venkat, Andreas Dengel, and Damian Borth. The Multimedia Satellite Task at MediaEval 2017: Emergency response for flooding events. In Proc. of the MediaEval 2017 Workshop, Sept. 13-15, 2017.
[3] Naira Chaouch, Marouane Temimi, Scott Hagen, John Weishampel, Stephen Medeiros, and Reza Khanbilvardi. A synergetic use of satellite imagery from SAR and optical sensors to improve coastal flood mapping in the Gulf of Mexico. Hydrological Processes, 2012.
[4] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[5] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM International Conference on Multimedia, 2014.
[6] Ryan Lagerstrom, Yulia Arzhaeva, Piotr Szul, Oliver Obst, Robert Power, Bella Robinson, and Tomasz Bednarz. Image classification to support emergency situation awareness. Frontiers in Robotics and AI, 2016.
[7] Andy Liaw, Matthew Wiener, et al. Classification and regression by randomForest. R News, 2002.
[8] Igor Ogashawara, Marcelo Pedroso Curtarelli, and Celso M. Ferreira. The use of optical remote sensing for mapping flooded areas. International Journal of Engineering Research and Application, 2013.
[9] Vladimir Svetnik, Andy Liaw, Christopher Tong, J. Christopher Culberson, Robert P. Sheridan, and Bradley P. Feuston. Random forest: a classification and regression tool for compound classification and QSAR modeling. Journal of Chemical Information and Computer Sciences, 2003.