=Paper=
{{Paper
|id=Vol-2621/CIRCLE20_24
|storemode=property
|title=Automatic Annotation of Change in Earth Observation Imagery
|pdfUrl=https://ceur-ws.org/Vol-2621/CIRCLE20_24.pdf
|volume=Vol-2621
|authors=Nathalie Neptune
|dblpUrl=https://dblp.org/rec/conf/circle/Neptune20
}}
==Automatic Annotation of Change in Earth Observation Imagery==
Nathalie Neptune (nathalie.neptune@irit.fr), under the supervision of Josiane Mothe. IRIT, UMR 5505, CNRS, Toulouse, France.

===ABSTRACT===
Earth observation images are plentiful and have been used for several decades for many tasks such as classifying land cover, monitoring agriculture and, more generally, detecting changes happening on the surface of the Earth. There are, however, several challenges in using these images in information systems. One of them is the annotation of the images with semantic descriptors. To address this, as the main objective of this PhD, we propose a new approach for the semantic annotation of changes detected in satellite images using annotations from an unaligned text corpus. Using scientific publications on the areas of interest and related to the changes observed in the images, we will extract candidate keywords for the annotations. State-of-the-art image change detection techniques will be used to find the regions within the images where the changes have occurred. The keywords and change pixels will be matched by jointly embedding them into a low-dimensional space.

===CCS CONCEPTS===
• Computing methodologies → Matching; • Applied computing → Annotation; Environmental sciences.

===KEYWORDS===
Image annotation, Multimodal learning, Text-image matching

===1 INTRODUCTION===
The comparison of two earth observation (EO) images of an area, taken at different times, makes it possible to detect whether changes have occurred in that area during that time frame. This process is called change detection [21]. Various applications call for the use of change detection techniques, such as the monitoring of forest disturbance and loss [11, 25], the tracking of urban change [23], and the mapping of changes caused by natural disasters [3, 7].

Semantic annotation of images involves associating them with semantic keywords [5]. Likewise, adding semantic information to changes detected in EO images results in semantic change detection. There exist several collections of annotated EO data that can be used for semantic change detection, such as [11], [8], [17] and [9]. However, many are limited by their geographical coverage [17] [9], their temporal coverage [9], or their topical coverage [11]. Moreover, all the previously cited image collections are provided on a yearly basis only. Therefore, these datasets may not be suitable for semantic change detection in cases where the goal is to detect changes in a timely manner.

Therefore, to detect changes that are happening in an area of interest at a specific time, one might need to use images that have not yet been annotated. In fact, unannotated EO images are plentiful. The Landsat (https://www.usgs.gov/land-resources/nli/landsat) and Copernicus (https://www.copernicus.eu/en) programs provide free access to the images produced by their satellite missions, with new images made available every day. However, using these images for semantic change detection poses the challenge of correctly identifying the classes for each pixel of each image. Without labels, changes can still be detected by comparing the images, but the semantic information about those changes will be missing. Without this information it is not possible to tell, for example, that a piece of land previously covered by trees has now changed into a road. This is why class labels or annotations for each pixel of each image need to be acquired, either through human (expert) annotators or through automatic methods. The latter can fill the gap in a context where expert annotators are not available and few or no annotated images exist for a given area. In fact, machine learning algorithms can learn to automatically annotate images from previously annotated examples. A sufficient number of examples must still be provided to train these algorithms.

Supervised machine learning has been successfully used for semantic change detection [6]. Such models are trained on images along with their semantically segmented masks. In the case of deep learning models in particular, the scale of the data needed for training makes it impractical to have experts manually annotate all the images. Crowdsourcing has been used to provide image annotations at very large scale [19]. A similar approach is not well suited for EO images because some expertise or training might be required to properly identify and differentiate among classes. As a result, automatic and semi-automatic approaches to the annotation task are commonly used to build large EO data sets such as [20] and [11].

Scientific literature published by researchers who work with earth observation images is undoubtedly a source of expert knowledge in the field. Publications in Earth sciences can therefore be seen as a very large source of expertise that could be leveraged for interpreting EO images. Furthermore, it is available at a large scale: across all scientific disciplines, the number of publications has grown exponentially in the past decades [2].

With visual semantic embeddings we learn to represent textual and visual data in the same latent space. The closer those data points are semantically, the smaller their distance is in the joint space. Several approaches have been proposed, including the joint embedding of images and words into a common low-dimensional space for image classification [10], and the embedding of images and sentences into a common space for image description [22].

"Copyright © 2020 for this paper by its author. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)."
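At inference time, the joint image-and-word embedding idea of [10] reduces to a nearest-neighbour lookup in the shared space: project the image into the word-vector space, then pick the label whose vector is closest. The following is a minimal illustrative sketch with toy four-dimensional vectors and made-up label names, not the trained embeddings or the model proposed in this work:

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def nearest_label(image_embedding, label_vectors):
    """Return the label whose word vector is closest (by cosine
    similarity) to the projected image embedding: the DeViSE-style lookup."""
    return max(label_vectors, key=lambda w: cosine(image_embedding, label_vectors[w]))

# Toy "word vectors" (illustrative values, not trained ones).
labels = {
    "forest": np.array([1.0, 0.1, 0.0, 0.0]),
    "road":   np.array([0.0, 0.9, 0.2, 0.1]),
    "water":  np.array([0.0, 0.0, 1.0, 0.3]),
}

# Pretend a learned projection layer mapped an image patch into the same space.
patch = np.array([0.9, 0.2, 0.05, 0.0])
print(nearest_label(patch, labels))  # -> forest
```

The same lookup works per pixel or per changed segment; only the projection that produces `patch` is learned.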
While change detection can be applied to any type of image pair of the same scene or location, we focus on the case of tree cover loss in tree-covered areas to test our approach. In these cases, the texts may not be available at testing time. The model needs to learn from the labels and make predictions without extracting new annotations when images are provided without accompanying text at testing time. The text data in this case is considered privileged information as defined by [24] and is used only when training the model.

By using joint image and text embeddings we will automatically assign relevant annotations to image pairs and segments where changes are detected. We will therefore perform two core tasks: change detection and the annotation of the changes. We propose to use scientific publications as a source of annotations, which will be extracted using a neural language model. These annotations can be used in subsequent tasks such as image indexing and retrieval.

===2 BACKGROUND===
The first step in the semantic annotation of changes in EO images is to locate the changes within the images. Binary change detection methods only detect if and where a change occurred, while semantic change detection methods also specify semantic information about the change that is detected [14]. Both types of methods have been applied to satellite images to detect land cover changes, with the most recent ones using deep learning models [6, 16].

Daudt et al. [6] proposed a change detection deep learning model for satellite images based on U-Net [18], an encoder-decoder architecture with skip connections between the encoding and decoding streams. The U-Net architecture was initially proposed for the segmentation of biomedical images. Three variations of the model are proposed. The first model performs early fusion by taking the concatenation of the images as input, effectively treating them as different color channels. In this case, the change detection problem is posed as an image segmentation task with two classes: change and no-change. The second and third models are siamese variations of the U-Net architecture where the encoder part is duplicated to encode each image separately, and skip connections are used in two ways, by concatenating either the skip connections from both encoding streams or the absolute value of their difference. Another variation of U-Net with early fusion, which uses dense skip connections, was proposed by [16].

When the semantic information about the classes present in the image pair is inconsistent or lacking, one solution is to include other sources of information such as ontologies [3] or geo-referenced Wikipedia articles [23].

Our proposed approach combines visual semantic embeddings for annotating changes, and adds the semantic label to binary change detection. The semantic labels will be extracted, using word embeddings, from a text corpus made of publications related to the area and to the type of change of interest in time and space. The changes will be detected using change detection methods from the state of the art [6, 16], with the extracted labels added as privileged information in the learning process using heteroscedastic dropout, which was proposed by [12] as a way to learn using privileged information in deep neural networks. The variance of the heteroscedastic dropout is a function of the privileged information.

When a machine learning algorithm can, at testing time, identify classes that were not previously seen during training, it is performing "zero-shot" classification. Few-shot learning happens when the number of training examples of a class is very low. Both zero-shot and few-shot learning methods have been proposed for learning classes with missing labels or with insufficient examples in the training data for supervised land cover and land use classification models [13].

Our proposed approach differs from [13] in that we will perform change detection with a deep learning model, and our word vectors will also be trained on a specialized corpus of scientific documents. This should make our model more scalable while providing annotations that are relevant specifically for change detection as opposed to classification alone. Unlike [6, 16], our model is suitable for a change detection dataset that is not fully annotated, meaning that some annotations may be missing or incorrect. We are also not manually building an ontology like [4]. Our approach is closer to [23]; however, they only take into account the location of the text that is used to learn the annotations, and do not consider the temporal relation between the text and the images, which is useful for change detection.

===3 RESEARCH QUESTIONS===
Our primary goal is to perform semantic change detection on satellite images by combining image change detection with text information extraction, adding semantic annotations to the detected changes. To the best of our knowledge this is the first time this type of approach has been proposed for the semantic change detection problem. In addition, we seek to improve the performance of the image pair change detection task by using these extracted annotations when training the model.

====3.1 Hypotheses====
When related scientific documents are used, relevant annotations for change detection in image pairs can be extracted and used for semantic change detection in the absence of labels provided by experts. This implies that the most important changes have been studied and documented by scientists and can be found in their publications. With a visual-semantic model, we can make the link between the changed segments and the labels in zero- and few-shot learning cases. By using these annotations as privileged information when training the model, the change detection algorithm should be able to detect changes with higher accuracy compared to when images alone are used.

====3.2 Experimental setup====
To test our model, we will use images from an area of interest in Madre de Dios, in Peru, corresponding to the tile 19LCF from the Sentinel satellite missions (https://sentinel.esa.int/web/sentinel/missions). We will use all available Sentinel-1 and Sentinel-2 images of the area from 2015 to 2017. This region was chosen because it is an area with a large number of reported deforestation incidents. Madre de Dios was the department in Peru with the second largest area affected by deforestation in 2017, according to a report by the Ministry of Environment of Peru [15].

We collected abstracts of publications from the Web of Science to create our corpus. We retrieved a total of 298 publications using the keywords "Madre de Dios" for the topic and taking only articles and proceedings papers from the years 2000 to 2019. Using the name of the area allowed us to exclude articles that were not relevant to the location. By only taking papers published in this 19-year period we avoid those that were written too early, before the Sentinel missions started in 2014. However, limiting the collection only to articles published after 2014 yields too few results. Articles published after 2017 might also include changes observed after our most recent image, but including them allows for better coverage of the changes present in our image dataset, because changes are more likely to appear in a publication months or years after they occur.

For the image (change/no-change) labels we use data from two sources, the GLAD alert system [11] and the Geobosques system [25]. The alerts will be used as labels for a supervised change detection learning algorithm applied to images of tree-covered areas. For each pixel, these alerts indicate whether there was tree cover loss or not. We will use the definition of tree cover from the GLAD alert system: vegetation that is at least 5 meters tall with a minimum of 60% canopy cover.

Our deep learning model will be based on binary change detection methods [6, 16] for image pairs. The model will be adapted to use the semantic information extracted from our corpus to learn the annotations for the images and to annotate images with classes that may not have been seen during training.

Let us consider an example where there are confirmed tree cover loss alerts for January 2, 2017, in our area of interest. The alerts are images of the area with all pixels at 0 except for those for which an alert was generated. To learn to detect those change pixels, one image for each satellite is taken before and after the date of the alert. The images from the different satellites are stacked together to create a single image. In this case, one image from Sentinel-1 is taken on December 27, 2016 and one from Sentinel-2 on December 28, 2016. Then two other images are similarly taken from January 2 and January 3, 2017, to see the area after the alerts. The model learns to identify the change pixels from the alerts by comparing both images. In parallel, we take the corpus made of the publications about the same area and extract the word vectors. We then find the word vectors from the corpus which are most similar to our change class word vectors. This gives us a first set of additional labels for our change map. Then we cluster the word vectors from the corpus to find the ones that most often occur with our change word vectors.

===4 METHOD===
By integrating an ontology into the segmentation process of pre- and post-disaster images, [3] showed that overall accuracy went from 67.9% to 89.4% for images of their test area. With a reduced number of samples (200), [23] demonstrated that, using Wikipedia annotations for the task of semantic segmentation, the Intersection-over-Union (IoU) score was 51.70% compared to 50.75% when pre-training on ImageNet. In both cases the methods were tested on images of urban areas. While the use of the ontology created by experts in [3] greatly improved the accuracy of the classification algorithm, it came at the high cost of expert hours. The crowdsourcing approach using Wikipedia data in [23], while promising, resulted only in modest improvements for the semantic segmentation task. Ideally, we would like to combine the benefits of knowledge from experts and big data from crowdsourcing. We propose to use expert knowledge through relevant scientific articles from which annotations will be extracted. Adapted versions of state-of-the-art change detection models from [6] and [16] will be used and tested on images from our area of interest. Both Sentinel-2 optical imagery and Sentinel-1 radar imagery will be included. Using images from a tropical rainforest area means that there will be significant cloud cover, making a large number of pixels in the optical images unusable for change detection. The radar imagery will help mitigate this problem.

Our proposed method will therefore perform change detection in satellite image pairs by predicting change pixels. The method will also perform semantic annotation of the detected changes by predicting their labels.

For the annotations, we will use word embeddings trained on Wikipedia and align them with our text dataset using the RCSLS criterion [1]. These embeddings cover a large vocabulary on many topics, most of which are not relevant to our case. This is why we will realign them with embeddings from our specific corpus, to ensure that the distribution of words matches our intended topic. The word embeddings are used to look up our change labels. Using an approach similar to [10], for each pixel, the model will predict its label as its vector representation. In [10], the vector representations of images are projected into vectors of the same dimensions as the word vectors, and the model predicts the label vector using a similarity metric.

We will also add the text as privileged information during training using heteroscedastic dropout [12]. This heteroscedastic dropout will be used in order to improve the performance of the model on our relatively small dataset.

While the model does not explicitly take into account the temporal relation between the text and the images, we will use documents from a limited time period that covers the time before, during and after the changes occur. In doing so, we expect to limit the number of extracted words that are temporally out of scope.

We want a model that is suitable for environmental applications and will therefore test on this type of data first. However, applications in other domains should be possible, provided there exists a relevant corpus that can be used with the images. We will evaluate the performance of our proposed method using precision and recall statistics and the Intersection over Union score. We will compare our method to state-of-the-art change detection methods [6, 16] on the Madre de Dios data set described in section 3.2.

Figure 1: Overview of our proposed approach for automatic annotation of changes in EO images.

===5 CONCLUSION AND DISCUSSION===
The automatic annotation of change in satellite imagery using annotations extracted from relevant scientific literature will demonstrate how expert knowledge can be gathered from text, in an unsupervised way, to add semantic information to changes detected in images. Furthermore, for the change detection task, the use of these annotations as privileged information when training the algorithm should result in higher overall accuracy. This may provide a method that could be used in tools for non-experts to find and identify changes in satellite images of their areas of interest, using open data, even when no expert annotations are available. An extension of this work could be to extract triples of events with location and time from the full text of the publications, and to use them to construct a timeline of change events with the corresponding time series of satellite images. Unlike the loose temporal relation of the currently proposed method, the temporally annotated triples should provide more exact matches to the changes observed in the images, as they would represent a phenomenon along with its location and its temporal information. Our hypothesis is that this would also make the candidate labels less noisy overall.

===ACKNOWLEDGMENTS===
The author would like to thank Dr. Brice Mora and Julius Akinyemi for their feedback and advice on the research project. We are grateful for the suggestions and comments of the anonymous reviewers. This work is supported by the Schlumberger Foundation through the Faculty for the Future program.

===REFERENCES===
[1] Piotr Bojanowski, Onur Celebi, Tomas Mikolov, Edouard Grave, and Armand Joulin. 2019. Updating Pre-trained Word Vectors and Text Classifiers using Monolingual Alignment. arXiv preprint arXiv:1910.06241 (2019).
[2] Lutz Bornmann and Rüdiger Mutz. 2015. Growth rates of modern science: A bibliometric analysis based on the number of publications and cited references. Journal of the Association for Information Science and Technology 66, 11 (2015), 2215–2222.
[3] Hafidha Bouyerbou, Kamal Bechkoum, Nadjia Benblidia, and Richard Lepage. 2014. Ontology-based semantic classification of satellite images: Case of major disasters. In 2014 IEEE Geoscience and Remote Sensing Symposium. IEEE, 2347–2350.
[4] Hafidha Bouyerbou, Kamal Bechkoum, and Richard Lepage. 2019. Geographic ontology for major disasters: methodology and implementation. International Journal of Disaster Risk Reduction 34 (2019), 232–242.
[5] Gustavo Carneiro and Nuno Vasconcelos. 2005. Formulating semantic image annotation as a supervised learning problem. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), Vol. 2. IEEE, 163–168.
[6] Rodrigo Caye Daudt, Bertrand Le Saux, and Alexandre Boulch. 2018. Fully convolutional siamese networks for change detection. In 2018 25th IEEE International Conference on Image Processing (ICIP). IEEE, 4063–4067.
[7] Lingtong Du, Qingjiu Tian, Tao Yu, Qingyan Meng, Tamas Jancso, Peter Udvardy, and Yan Huang. 2013. A comprehensive drought monitoring method integrating MODIS and TRMM data. International Journal of Applied Earth Observation and Geoinformation 23 (2013), 245–253.
[8] ESA. 2017. Land Cover CCI Product User Guide Version 2. Tech. Rep. (2017). maps.elie.ucl.ac.be/CCI/viewer/download/ESACCI-LC-Ph2-PUGv2_2.0.pdf
[9] ESA. 2017. S2 prototype Land Cover 20m map of Africa 2016. http://2016africalandcover20m.esrin.esa.int/
[10] Andrea Frome, Greg S Corrado, Jon Shlens, Samy Bengio, Jeff Dean, Marc'Aurelio Ranzato, and Tomas Mikolov. 2013. DeViSE: A deep visual-semantic embedding model. In Advances in Neural Information Processing Systems. 2121–2129.
[11] Matthew C Hansen, Alexander Krylov, Alexandra Tyukavina, Peter V Potapov, Svetlana Turubanova, Bryan Zutta, Suspense Ifo, Belinda Margono, Fred Stolle, and Rebecca Moore. 2016. Humid tropical forest disturbance alerts using Landsat data. Environmental Research Letters 11, 3 (2016), 034008.
[12] John Lambert, Ozan Sener, and Silvio Savarese. 2018. Deep learning under privileged information using heteroscedastic dropout. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8886–8895.
[13] Aoxue Li, Zhiwu Lu, Liwei Wang, Tao Xiang, and Ji-Rong Wen. 2017. Zero-shot scene classification for high spatial resolution remote sensing images. IEEE Transactions on Geoscience and Remote Sensing 55, 7 (2017), 4157–4167.
[14] Dengsheng Lu, Paul Mausel, Eduardo Brondizio, and Emilio Moran. 2004. Change detection techniques. International Journal of Remote Sensing 25, 12 (2004), 2365–2401.
[15] MINAM. 2018. Cobertura y deforestación en los bosques húmedos amazónicos 2017. Technical Report. Programa Nacional de Conservación de Bosques para la Mitigación del Cambio Climático del Ministerio del Ambiente.
[16] Daifeng Peng, Yongjun Zhang, and Haiyan Guan. 2019. End-to-End Change Detection for High Resolution Satellite Images Using Improved UNet++. Remote Sensing 11, 11 (2019), 1382.
[17] DA Roberts, M Toomey, I Numata, TW Biggs, J Caviglia-Harris, MA Cochrane, C Dewes, KW Holmes, RL Powell, CM Souza, et al. 2013. LBA-ECO ND-01 Landsat 28.5-m Land Cover Time Series, Rondonia, Brazil: 1984-2010. ORNL DAAC (2013).
[18] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 234–241.
[19] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. 2015. ImageNet large scale visual recognition challenge. International Journal of Computer Vision 115, 3 (2015), 211–252.
[20] Yosio Edemir Shimabukuro, Valdete Duarte, Eliana Maria Kalil Mello, and José Carlos Moreira. 2000. Presentation of the Methodology for Creating the Digital PRODES. Technical Report. São José dos Campos.
[21] Ashbindu Singh. 1989. Review article: Digital change detection techniques using remotely-sensed data. International Journal of Remote Sensing 10, 6 (1989), 989–1003.
[22] Richard Socher, Andrej Karpathy, Quoc V Le, Christopher D Manning, and Andrew Y Ng. 2014. Grounded compositional semantics for finding and describing images with sentences. Transactions of the Association for Computational Linguistics 2 (2014), 207–218.
[23] Burak Uzkent, Evan Sheehan, Chenlin Meng, Zhongyi Tang, Marshall Burke, David Lobell, and Stefano Ermon. 2019. Learning to interpret satellite images in global scale using Wikipedia. arXiv preprint arXiv:1905.02506 (2019).
[24] Vladimir Vapnik. 2006. Estimation of Dependences Based on Empirical Data. Springer Science & Business Media.
[25] Christian Vargas, Joselyn Montalban, and Andrés Alejandro Leon. 2019. Early warning tropical forest loss alerts in Peru using Landsat. Environmental Research Communications 1, 12 (2019), 121002.
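To make the early-fusion setup from the worked example concrete, the sketch below stacks pre- and post-alert acquisitions channel-wise and derives a binary change mask with a simple thresholded difference. The band counts and the threshold are illustrative assumptions; in the proposed method the mask would be predicted by a learned U-Net-style model [6, 16], not by thresholding.

```python
import numpy as np

H, W = 64, 64
rng = np.random.default_rng(0)

# Illustrative band counts: 2 Sentinel-1 polarisations, 4 Sentinel-2 bands.
s1_before = rng.random((2, H, W))
s2_before = rng.random((4, H, W))
s1_after = s1_before.copy()
s2_after = s2_before.copy()
s2_after[:, 10:20, 10:20] += 1.0  # simulate a reflectance change (e.g. tree loss)

# Early fusion as in [6]: concatenate all acquisitions as channels of one input,
# so the network sees a single multi-channel image.
x = np.concatenate([s1_before, s2_before, s1_after, s2_after], axis=0)
assert x.shape == (12, H, W)

# Stand-in for the learned model: per-pixel mean absolute difference, thresholded.
before = np.concatenate([s1_before, s2_before], axis=0)
after = np.concatenate([s1_after, s2_after], axis=0)
change_mask = (np.abs(after - before).mean(axis=0) > 0.1).astype(np.uint8)

print(int(change_mask.sum()))  # -> 100 (the simulated 10x10 changed block)
```

The change mask plays the role of the GLAD/Geobosques alert raster: 1 where tree cover loss is flagged, 0 elsewhere.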