Published in CEUR Workshop Proceedings Vol-2621 (CIRCLE 2020): https://ceur-ws.org/Vol-2621/CIRCLE20_24.pdf
 Automatic Annotation of Change in Earth Observation Imagery
                                                                         Nathalie Neptune
                                                                     nathalie.neptune@irit.fr
                                                                   Under the supervision of
                                                                         Josiane Mothe
                                                                      IRIT, UMR 5505, CNRS
                                                                         Toulouse, France
ABSTRACT

Earth observation images are plentiful and have been used for several decades for many tasks such as classifying land cover, monitoring agriculture and, more generally, detecting changes happening on the surface of the Earth. There are, however, several challenges in using these images in information systems. One of them is the annotation of the images with semantic descriptors. To address this, as the main objective of our PhD, we propose a new approach for the semantic annotation of changes detected in satellite images using annotations from an unaligned text corpus. Using scientific publications about the areas of interest and related to the changes observed in the images, we will extract candidate keywords for the annotations. State-of-the-art image change detection techniques will be used to find the regions within the images where the changes have occurred. The keywords and change pixels will be matched by jointly embedding them into a low-dimensional space.

CCS CONCEPTS

• Computing methodologies → Matching; • Applied computing → Annotation; Environmental sciences.

KEYWORDS

Image annotation, Multimodal learning, Text-image matching

1 INTRODUCTION

The comparison of two earth observation (EO) images of an area, taken at different times, makes it possible to detect whether changes have occurred in that area during that time frame. This process is called change detection [21]. Various applications call for change detection techniques, such as the monitoring of forest disturbance and loss [11, 25], the tracking of urban change [23], and the mapping of changes caused by natural disasters [3, 7].

Semantic annotation of images involves associating them with semantic keywords [5]. Likewise, adding semantic information to changes detected in EO images results in semantic change detection. Several collections of annotated EO data can be used for semantic change detection, such as [11], [8], [17] and [9]. However, many are limited by their geographical coverage [17] [9], their temporal coverage [9], or their topical coverage [11]. Moreover, all the previously cited image collections are provided on a yearly basis only. Therefore, these datasets may not be suitable when the goal is to detect changes in a timely manner.

To detect changes that are happening in an area of interest at a specific time, one might need to use images that have not yet been annotated. In fact, unannotated EO images are plentiful. The Landsat and Copernicus programs provide free access to the images produced by their satellite missions, with new images made available every day. However, using these images for semantic change detection poses the challenge of correctly identifying the classes for each pixel of each image. Without labels, changes can still be detected by comparing the images, but the semantic information about those changes will be missing. Without this information it is not possible to tell, for example, that a piece of land previously covered by trees has been changed into a road. This is why class labels or annotations for each pixel of each image need to be acquired, either from human (expert) annotators or through automatic methods. The latter can fill the gap in a context where expert annotators are not available and few or no annotated images exist for a given area. Indeed, machine learning algorithms can learn to automatically annotate images from previously annotated examples, but a sufficient number of examples must still be provided to train them.

Supervised machine learning has been successfully used for semantic change detection [6]. Such models are trained on images along with their semantically segmented masks. For deep learning models in particular, the scale of the data needed for training makes it impractical to have experts manually annotate all the images. Crowdsourcing has been used to provide image annotations at very large scale [19]. A similar approach is not well suited for EO images because some expertise or training might be required to properly identify and differentiate among classes. As a result, automatic and semi-automatic annotation approaches are commonly used to build large EO data sets such as [20] and [11].

The scientific literature published by researchers who work with earth observation images is undoubtedly a source of expert knowledge in the field. Publications in the Earth sciences can therefore be seen as a very large source of expertise that could be leveraged for interpreting EO images. Furthermore, this source is available at a large scale: across all scientific disciplines, the number of publications has grown exponentially in the past decades [2].

With visual-semantic embeddings, textual and visual data are represented in the same latent space: the closer two data points are semantically, the smaller their distance in the joint space. Several approaches have been proposed, including the joint embedding of images and words into a common low-dimensional space [10] for image classification, and the embedding of images and sentences into a common space for image description [22].

"Copyright © 2020 for this paper by its author. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)."

https://www.usgs.gov/land-resources/nli/landsat
https://www.copernicus.eu/en
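As a toy illustration of the joint-embedding idea above (a minimal sketch, not the model of [10] itself), the following snippet places made-up image embeddings and word vectors in a shared space and labels an image by cosine similarity; all vectors, dimensions and label names are invented for the example.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Made-up word vectors in a 3-dimensional joint space.
word_vectors = {
    "forest": np.array([1.0, 0.1, 0.0]),
    "road":   np.array([0.0, 1.0, 0.2]),
    "water":  np.array([0.1, 0.0, 1.0]),
}

def predict_label(image_embedding, word_vectors):
    """Return the word whose vector is closest, in cosine similarity,
    to the image embedding in the joint space."""
    return max(word_vectors, key=lambda w: cosine(image_embedding, word_vectors[w]))

# An image embedding that happens to lie near the "road" vector.
img = np.array([0.1, 0.9, 0.1])
print(predict_label(img, word_vectors))  # road
```

In a trained visual-semantic model the image embedding would come from an image encoder and the word vectors from a text corpus; only the nearest-neighbour lookup in the joint space is shown here.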
While change detection can be applied to any type of image pair of the same scene or location, we focus on the case of tree cover loss in tree-covered areas to test our approach. In these cases, the texts may not be available at testing time. The model needs to learn from the labels and make predictions without extracting new annotations when images are provided without accompanying text at testing time. The text data in this case is considered privileged information as defined by [24] and is used only when training the model.

By using joint image and text embeddings we will automatically assign relevant annotations to image pairs and segments where changes are detected. We will therefore perform two core tasks: change detection and the annotation of the changes. We propose to use scientific publications as a source of annotations, which will be extracted using a neural language model. These annotations can be used in subsequent tasks such as image indexing and retrieval.

2 BACKGROUND

The first step in the semantic annotation of changes in EO images is to locate the changes within the images. Binary change detection methods only detect if and where a change occurred, while semantic change detection methods also provide semantic information about the detected change [14]. Both types of methods have been applied to satellite images to detect land cover changes, with the most recent ones using deep learning models [6, 16].

Daudt et al. [6] proposed a change detection deep learning model for satellite images based on U-Net [18], an encoder-decoder architecture with skip connections between the encoding and decoding streams that was initially proposed for the segmentation of biomedical images. Three variations of the model are proposed. The first performs early fusion by taking the concatenation of the images as input, effectively treating them as different color channels; the change detection problem is then posed as an image segmentation task with two classes, change and no-change. The second and third models are siamese variations of the U-Net architecture in which the encoder is duplicated to encode each image separately, and the skip connections are used in two ways: by concatenating either the skip connections from both encoding streams or the absolute value of their difference. Another early-fusion variation of U-Net, using dense skip connections, was proposed by [16].

When the semantic information about the classes present in the image pair is inconsistent or lacking, one solution is to include other sources of information such as ontologies [3] or geo-referenced Wikipedia articles [23].

Our proposed approach combines visual-semantic embeddings for annotating changes, adding the semantic label to binary change detection. The semantic labels will be extracted, using word embeddings, from a text corpus made of publications related to the area and to the type of change of interest in time and space. The changes will be detected using state-of-the-art change detection methods [6, 16], with the extracted labels added as privileged information in the learning process through heteroscedastic dropout, proposed by [12] as a way to learn with privileged information in deep neural networks. The variance of the heteroscedastic dropout is a function of the privileged information.

When a machine learning algorithm can, at testing time, identify classes that were not seen during training, it is performing "zero-shot" classification. Few-shot learning refers to the case where only a very small number of training examples of a class is available. Both zero-shot and few-shot learning methods have been proposed for learning classes with missing labels or with insufficient training examples in supervised land cover and land use classification models [13].

Our proposed approach differs from [13] in that we will perform change detection with a deep learning model, and our word vectors will also be trained on a specialized corpus of scientific documents. This should make our model more scalable while providing annotations that are more relevant specifically to change detection, as opposed to classification alone. Unlike [6, 16], our model is suitable for a change detection dataset that is not fully annotated, meaning that some annotations may be missing or incorrect. We are also not manually building an ontology as in [4]. Our approach is closest to [23]; however, they only take into account the location of the text that is used to learn the annotations, not the temporal relation between the text and the images, which is useful for change detection.

3 RESEARCH QUESTIONS

Our primary goal is to perform semantic change detection on satellite images by combining image change detection with text information extraction, adding semantic annotations to the detected changes. To the best of our knowledge, this is the first time this type of approach has been proposed for the semantic change detection problem. In addition, we seek to improve the performance of the image-pair change detection task by using these extracted annotations when training the model.

3.1 Hypotheses

When related scientific documents are used, relevant annotations for change detection in image pairs can be extracted and used for semantic change detection in the absence of labels provided by experts. This implies that the most important changes have been studied and documented by scientists and can be found in their publications. With a visual-semantic model, we can make the link between the changed segments and the labels in zero- and few-shot learning cases. By using these annotations as privileged information when training the model, the change detection algorithm should be able to detect changes with higher accuracy than when images alone are used.

3.2 Experimental setup

To test our model, we will use images from an area of interest in Madre de Dios, Peru, corresponding to tile 19LCF of the Sentinel satellite missions. We will use all available Sentinel-1 and Sentinel-2 images of the area from 2015 to 2017. This region was chosen because many deforestation incidents are reported there: Madre de Dios was the department in Peru with the second largest area affected by deforestation in 2017, according to a report by the Ministry of Environment of Peru [15].

https://sentinel.esa.int/web/sentinel/missions
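To make the binary change detection task of Section 2 concrete, here is a deliberately simple differencing baseline, far simpler than the learned models of [6, 16]: two co-registered images are compared pixel by pixel and the absolute difference is thresholded. The images and the threshold value are made up for the example.

```python
import numpy as np

def binary_change_map(before, after, threshold=0.2):
    """Toy binary change detection: mark a pixel as changed when the
    absolute difference between two co-registered images exceeds a
    threshold. Learned methods replace this fixed rule with a model."""
    diff = np.abs(after.astype(float) - before.astype(float))
    return (diff > threshold).astype(np.uint8)  # 1 = change, 0 = no change

before = np.array([[0.8, 0.8],
                   [0.8, 0.1]])
after  = np.array([[0.8, 0.1],   # top-right pixel changed
                   [0.8, 0.1]])
print(binary_change_map(before, after))  # [[0 1], [0 0]]
```

A supervised model such as those of [6, 16] would instead be trained to produce this change/no-change mask directly from the image pair, but the output format is the same.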
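The evaluation metrics named in Section 4 — precision, recall, and Intersection over Union — can be computed on binary change masks as in the following generic sketch (the masks are made up and the code is not tied to any particular model):

```python
import numpy as np

def change_metrics(pred, truth):
    """Precision, recall and IoU for binary (0/1) change masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)    # change pixels correctly detected
    fp = np.sum(pred & ~truth)   # false alarms
    fn = np.sum(~pred & truth)   # missed changes
    precision = float(tp / (tp + fp)) if tp + fp else 0.0
    recall = float(tp / (tp + fn)) if tp + fn else 0.0
    iou = float(tp / (tp + fp + fn)) if tp + fp + fn else 0.0
    return precision, recall, iou

truth = np.array([[1, 1], [0, 0]])
pred  = np.array([[1, 0], [1, 0]])
print(change_metrics(pred, truth))  # precision 0.5, recall 0.5, IoU ~ 0.33
```

Note that IoU, unlike overall accuracy, is insensitive to the large number of no-change pixels typical of change detection datasets, which is why it is commonly reported for this task.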
We collected abstracts of publications from the Web of Science to create our corpus. We retrieved a total of 298 publications using the keywords "Madre de Dios" for the topic, taking only articles and proceedings papers from the years 2000 to 2019. Using the name of the area allowed us to exclude articles that were not relevant to the location. By taking only papers published in this 19-year period, we avoid those written too long before the Sentinel missions started in 2014. However, limiting the collection to articles published after 2014 yields too few results. Articles published after 2017 might also describe changes observed after our most recent image, but including them allows for better coverage of the changes present in our image dataset, because a change is more likely to appear in a publication months or years after it occurs.

For the image (change/no-change) labels we use data from two sources, the GLAD alert system [11] and the Geobosques system [25]. The alerts will be used as labels for a supervised change detection learning algorithm applied to images of tree-covered areas. For each pixel, these alerts indicate whether there was tree cover loss. We will use the definition of tree cover from the GLAD alert system: vegetation that is at least 5 meters tall with a minimum of 60% canopy cover.

Our deep learning model will be based on binary change detection methods [6, 16] for image pairs. The model will be adapted to use the semantic information extracted from our corpus to learn the annotations for the images and to annotate images with classes that may not have been seen during training.

Let us consider an example where there are confirmed tree cover loss alerts for January 2, 2017, in our area of interest. The alerts are images of the area with all pixels at 0 except for those for which an alert was generated. To learn to detect those change pixels, one image from each satellite is taken before and after the date of the alert. The images from the different satellites are stacked together to create a single image. In this case, one image from Sentinel-1 is taken on December 27, 2016, and one from Sentinel-2 on December 28, 2016. Two other images are similarly taken from January 2 and January 3, 2017, to see the area after the alerts. The model learns to identify the change pixels from the alerts by comparing both images. In parallel, we take the corpus made of the publications about the same area and extract the word vectors. We then find the word vectors from the corpus that are most similar to our change-class word vectors. This gives us a first set of additional labels for our change map. We then cluster the word vectors from the corpus to find those that most often occur with our change word vectors.

4 METHOD

By integrating an ontology into the segmentation process of pre- and post-disaster images, [3] showed that overall accuracy went from 67.9% to 89.4% on images of their test area. With a reduced number of samples (200), [23] demonstrated that using Wikipedia annotations for the task of semantic segmentation yielded an Intersection-over-Union (IoU) score of 51.70%, compared to 50.75% when pre-training on ImageNet. In both cases the methods were tested on images of urban areas. While the use of the expert-built ontology in [3] greatly improved the classification accuracy, it came at the high cost of expert hours. The crowdsourcing approach using Wikipedia data in [23], while promising, resulted in only modest improvements for the semantic segmentation task.

Ideally, we would like to combine the benefits of expert knowledge and of big data from crowdsourcing. We propose to tap expert knowledge through relevant scientific articles from which annotations will be extracted. Adapted versions of the state-of-the-art change detection models of [6] and [16] will be tested on images from our area of interest. Both Sentinel-2 optical imagery and Sentinel-1 radar imagery will be included. Because our area of interest lies in a tropical rainforest, significant cloud cover will make a large number of pixels in the optical images unusable for change detection; the radar imagery will help mitigate this problem.

Our proposed method will therefore perform change detection in satellite image pairs by predicting change pixels. The method will also perform semantic annotation of the detected changes by predicting their labels.

For the annotations, we will use word embeddings trained on Wikipedia and align them with our text dataset using the RCSLS criterion [1]. These embeddings cover a large vocabulary on many topics, most of which are not relevant to our case. This is why we will realign them with embeddings from our specific corpus, to ensure that the distribution of words matches our intended topic.

The word embeddings are used to look up our change labels. Using an approach similar to [10], for each pixel the model will predict the label as a vector. In [10], the vector representations of images are projected into vectors of the same dimension as the word vectors, and the model predicts the label vector using a similarity metric.

We will also add the text as privileged information during training using heteroscedastic dropout [12], in order to improve the performance of the model on our relatively small dataset.

While the model does not explicitly take into account the temporal relation between the text and the images, we will use documents from a limited time period that covers the time before, during and after the changes occur. In doing so, we expect to limit the number of extracted words that are temporally out of scope.

We want a model that is suitable for environmental applications, and we will therefore test on this type of data first. However, applications in other domains should be possible, provided there exists a relevant corpus that can be used with the images. We will evaluate the performance of our proposed method using precision and recall statistics and the Intersection over Union score.

We will compare our method to state-of-the-art change detection methods [6, 16] on the Madre de Dios data set described in Section 3.2.
these annotations as privileged information when training the algorithm should result in higher overall accuracy. This may provide a method that could be used in tools for non-experts to find and identify changes in satellite images of their areas of interest, using open data, even when no expert annotations are available. An extension of this work could be to extract triples of events with location and time from the full text of the publications, and to use them to construct a timeline of change events with the corresponding time series of satellite images. Unlike the loose temporal relation of the currently proposed method, the temporally annotated triples should provide more exact matches to the changes observed in the images, as they would represent a phenomenon along with its location and its temporal information. Our hypothesis is that this would also make the candidate labels less noisy overall.

Figure 1: Overview of our proposed approach for automatic annotation of changes in EO images.

ACKNOWLEDGMENTS

The author would like to thank Dr. Brice Mora and Julius Akinyemi for their feedback and advice on the research project. We are grateful for the suggestions and comments of the anonymous reviewers. This work is supported by the Schlumberger Foundation through the Faculty for the Future program.

REFERENCES

[1] Piotr Bojanowski, Onur Celebi, Tomas Mikolov, Edouard Grave, and Armand Joulin. 2019. Updating Pre-trained Word Vectors and Text Classifiers using Monolingual Alignment. arXiv preprint arXiv:1910.06241 (2019).
[2] Lutz Bornmann and Rüdiger Mutz. 2015. Growth rates of modern science: A bibliometric analysis based on the number of publications and cited references. Journal of the Association for Information Science and Technology 66, 11 (2015), 2215–2222.
[3] Hafidha Bouyerbou, Kamal Bechkoum, Nadjia Benblidia, and Richard Lepage. 2014. Ontology-based semantic classification of satellite images: Case of major disasters. In 2014 IEEE Geoscience and Remote Sensing Symposium. IEEE, 2347–2350.
[4] Hafidha Bouyerbou, Kamal Bechkoum, and Richard Lepage. 2019. Geographic ontology for major disasters: methodology and implementation. International Journal of Disaster Risk Reduction 34 (2019), 232–242.
[5] Gustavo Carneiro and Nuno Vasconcelos. 2005. Formulating semantic image annotation as a supervised learning problem. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), Vol. 2. IEEE, 163–168.
[6] Rodrigo Caye Daudt, Bertrand Le Saux, and Alexandre Boulch. 2018. Fully convolutional siamese networks for change detection. In 2018 25th IEEE International Conference on Image Processing (ICIP). IEEE, 4063–4067.
[7] Lingtong Du, Qingjiu Tian, Tao Yu, Qingyan Meng, Tamas Jancso, Peter Udvardy, and Yan Huang. 2013. A comprehensive drought monitoring method integrating MODIS and TRMM data. International Journal of Applied Earth Observation and Geoinformation 23 (2013), 245–253.
[8] ESA. 2017. Land Cover CCI Product User Guide Version 2. Tech. Rep. (2017). maps.elie.ucl.ac.be/CCI/viewer/download/ESACCI-LC-Ph2-PUGv2_2.0.pdf
[9] ESA. 2017. S2 prototype Land Cover 20m map of Africa 2016. http://2016africalandcover20m.esrin.esa.int/
[10] Andrea Frome, Greg S Corrado, Jon Shlens, Samy Bengio, Jeff Dean, Marc'Aurelio Ranzato, and Tomas Mikolov. 2013. DeViSE: A deep visual-semantic embedding model. In Advances in Neural Information Processing Systems. 2121–2129.
[11] Matthew C Hansen, Alexander Krylov, Alexandra Tyukavina, Peter V Potapov, Svetlana Turubanova, Bryan Zutta, Suspense Ifo, Belinda Margono, Fred Stolle, and Rebecca Moore. 2016. Humid tropical forest disturbance alerts using Landsat data. Environmental Research Letters 11, 3 (2016), 034008.
[12] John Lambert, Ozan Sener, and Silvio Savarese. 2018. Deep learning under privileged information using heteroscedastic dropout. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8886–8895.
[13] Aoxue Li, Zhiwu Lu, Liwei Wang, Tao Xiang, and Ji-Rong Wen. 2017. Zero-shot scene classification for high spatial resolution remote sensing images. IEEE Transactions on Geoscience and Remote Sensing 55, 7 (2017), 4157–4167.
[14] Dengsheng Lu, Paul Mausel, Eduardo Brondizio, and Emilio Moran. 2004. Change detection techniques. International Journal of Remote Sensing 25, 12 (2004), 2365–2401.
[15] MINAM. 2018. Cobertura y deforestación en los bosques húmedos amazónicos 2017. Technical Report. Programa Nacional de Conservación de Bosques para la Mitigación del Cambio Climático del Ministerio del Ambiente.
[16] Daifeng Peng, Yongjun Zhang, and Haiyan Guan. 2019. End-to-End Change Detection for High Resolution Satellite Images Using Improved UNet++. Remote Sensing 11, 11 (2019), 1382.
[17] DA Roberts, M Toomey, I Numata, TW Biggs, J Caviglia-Harris, MA Cochrane, C Dewes, KW Holmes, RL Powell, CM Souza, et al. 2013. LBA-ECO ND-01 Landsat 28.5-m Land Cover Time Series, Rondonia, Brazil: 1984-2010. ORNL DAAC (2013).
[18] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 234–241.
[19] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. 2015. ImageNet large scale visual recognition challenge. International Journal of Computer Vision 115, 3 (2015), 211–252.
[20] Yosio Edemir Shimabukuro, Valdete Duarte, Eliana Maria Kalil Mello, and José Carlos Moreira. 2000. Presentation of the Methodology for Creating the Digital PRODES. Technical Report. São José dos Campos.
[21] Ashbindu Singh. 1989. Review article: Digital change detection techniques using remotely-sensed data. International Journal of Remote Sensing 10, 6 (1989), 989–1003.
[22] Richard Socher, Andrej Karpathy, Quoc V Le, Christopher D Manning, and Andrew Y Ng. 2014. Grounded compositional semantics for finding and describing images with sentences. Transactions of the Association for Computational Linguistics 2 (2014), 207–218.
[23] Burak Uzkent, Evan Sheehan, Chenlin Meng, Zhongyi Tang, Marshall Burke, David Lobell, and Stefano Ermon. 2019. Learning to interpret satellite images in global scale using Wikipedia. arXiv preprint arXiv:1905.02506 (2019).
[24] Vladimir Vapnik. 2006. Estimation of Dependences Based on Empirical Data. Springer Science & Business Media.
[25] Christian Vargas, Joselyn Montalban, and Andrés Alejandro Leon. 2019. Early warning tropical forest loss alerts in Peru using Landsat. Environmental Research Communications 1, 12 (2019), 121002.