=Paper=
{{Paper
|id=Vol-2622/paper3
|storemode=property
|title=Building Change Detection in Aerial Images
|pdfUrl=https://ceur-ws.org/Vol-2622/paper3.pdf
|volume=Vol-2622
|authors=Fatima Mroueh,Ihab Sbeity,Mohamad Chaitou
|dblpUrl=https://dblp.org/rec/conf/bdcsintell/MrouehSC19
}}
==Building Change Detection in Aerial Images==
Fatima Mroueh, Ihab Sbeity, Mohamad Chaitou
Computer Science Department, Lebanese University, Beirut, Lebanon
fatima.mroueh249@gmail.com, ihab.sbeity@gmail.com, mohamad.chaitou@ul.edu.lb
Abstract—In this paper, we provide an approach that detects the changes in buildings between two multi-temporal aerial images of different sources. Since the images in most cases are not perfectly aligned, our approach takes into consideration the differences in the geometric aspects of the images. Differences in scale, view point or overlapping regions may be present between the pair of images. Our approach relies on segmentation to extract building masks from the original aerial images. Changes are then found by comparing the features of the pair of masks using image matching algorithms. This procedure is applied to a set of 80 pairs of aerial images of different sizes and with different applied transformations, and an evaluation has been carried out against the corresponding ground truth references. The evaluation yields a building change detection rate of 92.7%. The results of our proposed approach suggest that automatic building change detection is possible, but further research should include improvement of the segmentation phase to better distinguish buildings and enhancement of the change detection method. Real-time application of the process is also a challenging perspective.

Index Terms—change detection, aerial images, image segmentation, image matching algorithms, SIFT, feature detection, feature description

I. INTRODUCTION

Aerial imagery is, as it sounds, the process of taking images from the air. It is a subset of a larger domain called Remote Sensing, which consists of acquiring data without making physical contact with the objects under study [1].

Aerial images such as satellite imagery or drone imagery are considered one of the richest sources of data and can be used in various applications. Change detection in aerial images is detecting new or disappeared objects in images registered at different moments of time and possibly under various lighting conditions, heights and camera calibrations [2]. Detecting the changes in aerial images of the same region taken at different times is useful and important in many domains, such as automatic map updating, field change assessment after catastrophic events, detection of illegal building areas and undeclared refugee camps, analysis of urban and suburban areas, automatic monitoring systems and some military applications. For these reasons, detecting changes in aerial images has become an important research topic [3].

In fact, several techniques and approaches have been designed and implemented to detect changes in aerial images. However, these techniques were motivated by the availability and fusion of different types of remote sensing data, such as data generated from Digital Elevation Models (DEM), Light Detection and Ranging (LiDAR) technology and other kinds of remote sensing technologies [1] [4], or they were limited to specific types of images such as GeoTiff images that contain accurate geographic information, e.g. coordinates in the global coordinate system. Furthermore, they are also limited to aligned images of the same scale and view point (same height, same camera calibrations, same coordinates, etc.).

The main problem with these techniques is that they rely too much on the information provided with the images, and therefore they cannot be applied to any image that is not enriched with such information, e.g. geo-spatial information.

Nowadays, automatic image analysis techniques are essential. Machine learning and computer vision techniques, and more specifically image matching algorithms, have proven to be very efficient in the field of image processing and comparison. Nevertheless, there are still few scientific methodologies for detecting changes in aerial images, especially images that differ in geometric aspects such as scale and orientation, without being limited to additional information about the images. Going deeper into the topic is essential to introduce new efficient insights in the field of change detection in aerial images.

Accordingly, this research provides a complete procedure for building change detection in aerial images using machine learning and computer vision techniques and algorithms. The main advantage of our approach is that it does not rely on any of the information delivered with the aerial images. It deals with aerial images as simple PNG or JPG files without any enrichment. More importantly, it can detect changes in aerial images that differ in scale and view point and in images that have overlapping regions. This way, our approach can be applied to any pair of aerial images regardless of their associated information or their geometric aspects.

II. PREVIOUS STUDIES

Detecting changes in aerial images has been a long journey, and changes in buildings in particular are an essential part of it.

Looking at the previous studies related to our topic, one can see that most of them rely on data fusion; they integrate multiple data sources to produce more consistent and accurate information than that provided by any individual data source.
Copyright © 2019 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).
For example, in their work, Nebiker et al. used image-based dense digital surface models (DSMs) in order to compute a depth value for every pixel of an image, combined with the aerial images, for the detection of individual buildings. They used these models with object-based image analysis to detect changes [5]. Likewise, the study of Chen Lin was based on multi-source data. They pre-processed the data using triangulation of an irregular network of data points collected by Light Detection and Ranging (LiDAR) technology, and the changes were then detected by finding differences in height, comparing the LiDAR point measurements with the estimates of the building models [6]. Furthermore, Alonso et al. applied the support vector machine (SVM) classification algorithm to a joint satellite and laser data set for the extraction of buildings. For change detection, they suggested comparing an old map with more recent spatial information instead of comparing a pair of images [7]. Many other studies took advantage of data sources other than the aerial image itself, such as Digital Elevation Models (DEMs), laser scanner data, the vegetation indicator (NDVI), the relationship between buildings and their shadows, and high resolution aerial images, in order to detect changes in buildings [8] [9] [10] [11] [12]. Most of these studies suffered from significant problems with small buildings and with buildings surrounded by high trees.

As for extracting the buildings before detecting changes, this step was included in numerous studies. Some of them used region-based classification, where each small region was classified as "building" or "no-building" based on a decision tree induced from training data (edge recordings of the buildings), and then classified as "change" or "no change" based on some conditions [13]. Others used the vegetation indicator NDVI to distinguish buildings from trees, since both have similar height information [14]. A neural network classifier was also employed to classify the regions of an aerial image into multiple classes (grove, building, tree, shadow, etc.) by feeding the network with many inputs such as area, average gray level, shape factor and compactness [3]. Region-based segmentation was also applied using a decision tree that relies on the geometric properties of the land cover objects such as elevation, spectral information, texture and shape [15]. The most important and precise segmentation was achieved using Convolutional Neural Networks, where the large imagery is divided into small patches, and a CNN is trained on those patches and their corresponding three-channel map patches (building, road and background) [16]. However, this work did not include change detection.

As for detecting changes in aerial images that have different views, Bourdis et al. stated that camera motion and viewpoint differences introduce parallax effects. Therefore, in order to be robust to viewpoint differences, they introduced an algorithm to distinguish between real changes and parallax effects based on optical flow constrained with epipolar geometry [17]. In other works concerning this point, knowing the calibration of the camera or the spatial information about the geographic area was essential in order to achieve the goal [13] [17].

Furthermore, ArcGIS Pro offers a tool that detects feature changes by finding where the update line features spatially match the base line features, and reports spatial changes, attribute changes, or both, as well as no change. However, all inputs to this tool must be in the same coordinate system [18], while in our case we aim to detect changes even if we do not know the spatial location of the geographic region we are working on.

To the best of our knowledge, we did not find any study that processes aerial images independently from any other source of information to extract buildings from them. Moreover, computer vision techniques such as image matching algorithms are not employed to detect changes, although they have proved to be very efficient in the comparison of images.

To overcome the two problems cited above, our approach works in several steps. As we are interested in small-scale change detection (buildings), the first step is the segmentation phase, in which we eliminate a large part of the scene without losing any actual building. This is possible by extracting buildings' footprints from the aerial images. Second, we use the SIFT image matching algorithm to check the correspondence of the pair of images, i.e. to make sure that the images correspond to the same geographic region. Third, we detect the type of transformation applied to one of the images with respect to the other (scale, rotation, overlap). The detected transformation is then reversed to get two images of the same scale and view. In the last step, the difference image is computed and post-processed; in this step, the changes in the buildings are detected.

III. BACKGROUND

A. Image Segmentation

Computer vision is a field that is intended to make computers accurately understand and efficiently process visual data like images. Extracting information from images and understanding image content are critical in many applications in this domain. Computer vision helps in extracting the features of an image in order to simplify image analysis [19].

In several cases, we may not be interested in all the components of the image, but only in some areas or objects that have certain characteristics related to our task. Image segmentation is one of the best techniques to handle this issue. This technique works by isolating objects from the rest of the image [20] [21] [22] [23] [24]. Image segmentation mainly has the role of classifying each pixel of an image into meaningful classes that refer to specific objects. It involves grouping the elements of an image by certain criteria of homogeneity [4]. It does not only make a class prediction for an input, but also provides additional information regarding the location of such classes.

Deep learning techniques have proven to be very efficient in solving such problems. These techniques can learn patterns in order to predict classes. The main deep learning architecture used for image segmentation, and for image processing in general, is the Convolutional Neural Network (CNN).
Frameworks like MaskRCNN and RetinaNet allow applying image segmentation using deep learning. However, the domain of application of some of them is restricted to scene images, and they cannot be used in the case of aerial images [25] [26]. Other frameworks that work with aerial images, such as ENVI, ERDAS Imagine, eCognition and others, are also available [27] [28]. Nevertheless, they have many limitations. Some of them do not have any vectorization tool to convert the segmented results for use in further analysis; others are confused by images where the building roofs are dark and have much lower intensities compared to other building objects [29].
B. Image Matching
In order to compare the images, we look for specific patterns or specific features that are unique in the images and that can be easily compared. A feature is a relevant piece of information: a specific structure in the image such as a point, an edge or a corner. The operation of finding the features of an image is called feature detection.

Feature detection is the process of transforming the visual information of the image into a vector space. It basically consists of finding keypoints (or interest points) in the image. A keypoint is a point that is unique in the local area around it, and it can be matched to a corresponding point in another image. The main purpose of detecting features is to give us the possibility to perform mathematical operations on them, and thus to find similar vectors that lead us to similar objects or scenes in the images. Ideally, this information should be invariant under image transformations, so that we can find the same features again even if the image is transformed in some way.

Using a specific feature detection algorithm, we search for such features in the first image and then we look for the same features in the other image. As a result, we get a set of points (x_i, y_i) for each image, where x_i and y_i are the coordinates of the point i detected as a feature in the image. After detecting interest points, we continue by computing a descriptor for each one of them. The regions around the features should be described so that the algorithm can find the similar features in the other image. This is called feature description.

The local appearance around each feature point is described in some way that is invariant under changes in translation, scale and rotation. Therefore, we end up with a descriptor vector for each feature point. Feature descriptors encode interesting information into a series of numbers and act as a sort of numerical "fingerprint" that can be used to differentiate one image from another. Once the features and the descriptors are extracted and computed, some preliminary feature matches between the images can be established.

Feature matching, or more generally image matching, is the task of establishing correspondences between two images. Keypoints between two images are matched by identifying their nearest neighbors, which is achieved by comparing the descriptors across the images to identify similar features. For any two images, we get a set of pairs (x_i, y_i), (x'_i, y'_i), where (x_i, y_i) is a feature in one image and (x'_i, y'_i) is its matching feature in the other image. We can summarize the process of image matching as follows: 1) find a set of distinctive keypoints; 2) define a region around each keypoint; 3) extract and normalize the region content; 4) compute a local descriptor from the normalized region; 5) match the local descriptors.

Many comparative studies have been published assessing the performance of image matching algorithms. The real challenge is to achieve truly invariant feature detection under any image transformation. The selection of the adequate algorithm to complete the matching task significantly depends on the type of the image to be matched and on the variations between an image and its matching pair in scale, orientation or other transformations. Most of these studies have stated that the Scale Invariant Feature Transform (SIFT) algorithm performs best under different image transformations [30] [31] [32] [33].

IV. METHODOLOGY

Fig. 1. Workflow of our approach.

Fig. 1 represents the overall process of our approach. First, the buildings' footprints are extracted from the acquired aerial image in order to use them for detecting changes instead of the original aerial images. To achieve this step, a segmentation model for extracting building masks from aerial images is built. Second, we suppose that a database is already prepared containing preprocessed aerial image masks of the region of interest. In this step, we look in the database for the mask that corresponds to our input mask. This step is achieved by computing a similarity measure between each couple of images using the SIFT image matching algorithm. Finally, after aligning the couple of masks, we detect changes by filtering their difference image.

A. Buildings' Footprints Extraction

Extracting buildings' footprints from aerial images is a kind of preprocessing of the images before matching. It helps us to get better results in detecting changes, since by segmenting the images we get rid of every element that is considered noise for us (not an object of interest).
A segmentation model is needed for this purpose. Many tools that implement this technique are available. The tool used to achieve our goal is RoboSat [34]. RoboSat is an end-to-end pipeline written in Python3 for feature extraction from aerial and satellite imagery. Features can be anything visually distinguishable in the imagery, such as buildings, roads or cars [34]. We chose to work with RoboSat since it is specially designed to work with aerial images and it has shown important results in this domain.

The data preparation tools in RoboSat help us to create and prepare the dataset for training feature extraction models. Also, the modelling tools in RoboSat help with training fully convolutional neural networks for segmentation [34].

Fig. 2 represents an aerial image with its corresponding buildings mask.

Fig. 2. Aerial image and its corresponding buildings mask.

a) Data Preparation: We first walk through creating a dataset for training the feature extraction model. Such a dataset consists of satellite imagery combined with the corresponding masks for the feature we want to extract, which is buildings in our case. We can think of these masks as binary images which take the value zero where there is no building and one for building areas.

This dataset will serve as the training set for the segmentation model. The goal is to have a model that accepts an aerial image and outputs its corresponding buildings' footprints. As mentioned before, the footprints will be used to detect changes instead of the original aerial images, in order to reduce all kinds of noise that may affect the accuracy of our application. Our objects of interest are only buildings.

We start by extracting geometries from the OpenStreetMap (OSM) project, and we then try to figure out where we need satellite imagery in order to complete the training set [35]. The OpenStreetMap project creates and provides free geographic data, and the OpenStreetMap Foundation is an international not-for-profit organization supporting it. The project maintains data about roads, buildings, trails, railway stations and much more, all over the world. OSM maps are saved on the internet and they are totally free. But the most important thing is that OSM is accurate and up to date (normally updated every day) [35].

There are two reasons for which we are building our own segmentation model instead of using OSM data directly. The first reason is that OSM data do not cover all the regions we are interested in. In Lebanon, for example, building masks are not provided for the whole country. So, we take advantage of the available geometries provided by OSM in order to build the segmentation model. Later, this model will provide us with the buildings' footprints for regions that are not covered by OSM. The second reason, which is the most important one, is that we may not be aware of the exact location of the image in the global coordinate system. In such a case, we cannot use OSM extracts.

The GeoFabrik server provides convenient and updated OSM extracts which we can work with [36]. The GeoFabrik team extracts, selects and processes free geodata for everyone. They create shape files, maps, and map tiles with a free-of-charge download service. The geometries extracted from the GeoFabrik server are shape files of extension .shp. A shape file is a simple format that is used for storing the geometric location and attribute information of geographic features that can be represented by points, lines or polygons. We are only interested in the polygon representation of the buildings. These shape files can be visualized as vector layers in GIS tools, which helps us to decide at what locations we need to download satellite imagery to complete the dataset.

Although the masks are not always perfect, a slightly noisy dataset will still work fine when training the model on thousands of images and masks.

The next step is to download the corresponding aerial imagery. Our aerial imagery is downloaded from Mapbox [37]. Mapbox Satellite is a full global base map. It uses global satellite and aerial imagery from commercial providers such as NASA and USAS. Mapbox provides an API that allows us to download the needed satellite imagery [37].

RoboSat works with the Slippy Map tile format to abstract away georeferenced imagery behind tiles of the same size. A Slippy Map is, in general, a term referring to modern web maps which let you zoom and pan around. By default, the Slippy Map renders tiles. Tiles are 256 x 256 pixel PNG files. Each tile is a file in a directory representing a column, and each column is a subdirectory that represents the zoom level. RoboSat offers the tool that is responsible for tiling the collected aerial images as well as the extracted geometries.

With downloaded satellite imagery and rasterized corresponding masks, our dataset is complete and ready. Fig. 3 shows the downloaded aerial imagery tiles with their corresponding buildings' footprints.

b) Training and Modelling: The RoboSat segmentation model is a kind of fully convolutional neural network which we train on pairs of aerial images and corresponding masks. The training process takes place on a GeForce GTX 1080 platform. When picking up the best checkpoint, the model
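The tile addressing described above follows the standard Slippy Map convention: a tile lives at `zoom/x/y.png`. A small helper converting WGS84 coordinates to tile indices might look as follows (a generic sketch of the public formula, not part of RoboSat itself):

```python
import math

def deg2tile(lat_deg, lon_deg, zoom):
    """Convert a WGS84 latitude/longitude to Slippy Map tile indices (x, y).

    x grows eastward from longitude -180, y grows southward from the
    Web Mercator limit (~85.0511 deg), and there are 2**zoom tiles per axis.
    """
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    return x, y
```

With such a helper, the polygon extents from the shape files can be turned into the list of tile coordinates for which imagery has to be downloaded.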
allows us to predict the segmentation probabilities for every pixel in an image. These segmentation probabilities indicate how likely each pixel is background or building. These probabilities are then turned into discrete segmentation masks. The same segmentation model is used for extracting buildings' footprints from the old imagery as well as from the input aerial image.

Fig. 3. Aerial image tiles with their corresponding footprints.

B. Image Correspondence

At this point, after extracting the buildings' footprints from the original input aerial image, we need to find its corresponding mask in the already prepared dataset. The pair of masks will not be perfectly aligned. Many types of transformations may be applied to one of the images with respect to the other; different scales, different views and overlapping regions between images are examples of such transformations.

Here, and because we do not know the exact location of the images in the global coordinate system, some similarity measure is needed to find the mask in the dataset that best matches our input image. This similarity measure will help us decide whether the two images show the same scene or not. For this purpose, the SIFT image matching algorithm is used.

The objective here is to find a similarity measure that tells us that the masks are extracted from the same geographic region despite the applied transformation.

First, we use the SIFT algorithm to detect the interest points in both masks (which have different transformations). Then, we compute the descriptors for each one of the images in order to use them in the matching process. The SIFT algorithm provides us with the coordinates of the detected keypoints, the set of matched keypoints between the pair of images and other useful information.

Fig. 4. SIFT matches between pairs of masks having different transformations.

Fig. 4 represents the matching points between pairs of images having different transformations. For visualization, the original image is put on the left side and the other image on the right side, and the matches are drawn as lines between both images.

Let n and m be the numbers of keypoints in the first and second masks respectively. Let S = {P_i : i ∈ {1, 2, …, n}} be the set of detected keypoints in the first mask and S' = {P'_i : i ∈ {1, 2, …, m}} be the set of detected keypoints in the second mask. Let M be the set of pairs of keypoint indices that match each other: M = {(i, j) : P_i ∈ S and P'_j ∈ S' are found as matched keypoints}. This notation will be used in all the following sections.

If both images are of the same scene, then there must be proportionality between the relative distances of the keypoints. Thus, in all cases, this condition must be satisfied:

    d(P_a, P_b) / d(P_c, P_d) ≈ d(P'_e, P'_f) / d(P'_g, P'_h)    (1)

such that a, b, c, d ∈ {1, 2, …, n}, e, f, g, h ∈ {1, 2, …, m} and (a, e), (b, f), (c, g), (d, h) ∈ M.

We compute this factor for the matched keypoints found for the pair of images. In some cases, there might be false matches which lead to some disparity in the values of the factor between the matching pairs. To remove this inconsistency, we remove all the matching pairs that give a factor which is far from the most frequent factor. Then we compute the ratio of the number of remaining matching pairs over the total number of good matches. We rely on this ratio as a similarity measure between the two images.
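A sketch of how such a consistency-based similarity measure could be computed with NumPy. This is our own hypothetical re-implementation of the idea, not the paper's code: the histogram bin count and the 10% tolerance around the most frequent ratio are illustrative parameters.

```python
import numpy as np

def similarity_measure(P, Q, bins=50):
    """P, Q: (k, 2) arrays of matched keypoint coordinates, row i of P
    matched to row i of Q. When both masks show the same scene, the
    cross-image distance ratios d(P_i, P_j) / d(Q_i, Q_j) cluster around
    one dominant value; the returned score is the share of pairs that
    agree with that dominant ratio."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    i, j = np.triu_indices(len(P), k=1)          # every pair of matches
    dP = np.linalg.norm(P[i] - P[j], axis=1)
    dQ = np.linalg.norm(Q[i] - Q[j], axis=1)
    ok = dQ > 0
    r = dP[ok] / dQ[ok]
    # Most frequent ratio (histogram mode), then the share of pairs close to it.
    hist, edges = np.histogram(r, bins=bins)
    mode = 0.5 * (edges[np.argmax(hist)] + edges[np.argmax(hist) + 1])
    return np.mean(np.abs(r - mode) <= 0.1 * mode)
```

For two masks related by a pure scaling, every ratio equals the scale factor and the score is 1.0; unrelated point sets spread their ratios widely and score low, which is what the 0.7 threshold below separates.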
In other cases, this similarity factor can vary. In order to obtain a threshold that can be used in any case, we computed the similarity factor for 408 pairs of masks with different sizes and different applied transformations. The average proportionality factor over the 408 pairs of masks was 0.88685. But since we assume that the pair of masks we need to compare have differences in buildings, we accept 0.7 as a threshold.
C. Change Detection
After finding the corresponding masks, SIFT matching
algorithm is very efficient in detecting the type of the transfor-
mation applied to one of the images with respect to the other.
We differentiate between three main types of transformations:
masks that have overlapping regions, masks that are different
in scale and masks that are different in rotation angle. We will
explain in details how to detect each type of these transfor-
mations by applying simple mathematics on the information
provided by SIFT algorithm.
a) Overlapping Regions: For this type of transformation, we use the template matching algorithm, which is available in the OpenCV computer vision library [20]. This algorithm proved to be very efficient in detecting the overlapping regions between two images.
After computing the similarity measure between the pair of masks and checking that the views correspond to the same scene, we try to find the overlapping region between the pair of masks. The bigger image is then cropped to be aligned with its overlapping region. We apply the template matching algorithm to search for the small mask in the bigger one. Although there are some differences in the buildings, the template matching algorithm gives us an accurate result. Now, we have two aligned masks that are ready for detecting the changes between them.

b) Scale Transformation: In this type of transformation, we aim to find the scale factor between the pair of masks. Once we get the scale ratio, we can transform both masks to the same scale. The process is very similar to the one performed when computing the similarity measure, since the ratio of distances computed there was in fact the scale factor. So

    s = d(P_a, P_b) / d(P'_e, P'_f) ≈ d(P_c, P_d) / d(P'_g, P'_h)    (2)

for all a, b, c, d ∈ {1, 2, …, n}, e, f, g, h ∈ {1, 2, …, m} and (a, e), (b, f), (c, g), (d, h) ∈ M, where s denotes the scale factor. We also remove inconsistencies caused by the presence of false matches. Now, we have two aligned masks that are ready for detecting the changes between them.

c) View Point Transformation (Orientation): In this type of transformation, we aim to find the rotation angle between the pair of masks. Once we get the rotation angle, we can transform both masks to the same orientation. To calculate the angle of rotation between the two masks, we have to find the angle between the lines that are formed by respective matched points. So

    θ = angle(P_a P_b, P'_e P'_f) ≈ angle(P_c P_d, P'_g P'_h)    (3)

where angle(·, ·) denotes the angle between the line through a pair of keypoints in the first mask and the line through their matches in the second mask, with (a, e), (b, f), (c, g), (d, h) ∈ M. We also remove inconsistencies caused by the presence of false matches. Now, we have two aligned masks that are ready for detecting the changes between them.

D. Difference Image

Whatever the transformation applied to one of the images was, at this point we have two aligned images. All we have to do is to find the difference image. Of course, the difference image will contain some noise because of differences in the resolution of the pair of masks, which drives us to filter the difference image.

Filtering the noise in the difference image consists of finding the contours in it. Contours are curves joining all the continuous points (along a boundary) having the same color or intensity. Contours are very helpful for shape detection and recognition. Since we are using binary images, we have a better chance of getting good accuracy. Finding such contours relies on detecting Canny edges [38].

Fig. 5. Difference image before and after filtering.

Fig. 5 represents the noisy difference image and the filtered one. The contours are finally projected onto one of the original images to show the differences clearly.
TABLE I
D ESCRIPTION OF THE TRANSFORMED MASKS
Description
A Scale factor = 1.56, 4 changes
B Rotation angle = 73°, 2 changes
C Have overlapping region, 3 changes
TABLE II
R ESULTS OF SIFT ALGORITHM APPLIED ON DIFFERENT PAIRS OF MASKS
No. of keypoints No. of keypoints No. of matches
in the original mask in the transformed mask
A 261 216 169
B 261 779 171
C 261 121 91
Fig. 6. Evaluation metrics for each epoch of the training process.

Fig. 7. Comparison between ground truth building masks and the predicted ones.

V. RESULTS AND DISCUSSION

Fig. 6 shows the evaluation metrics for both the training and validation sets during the training process. We pick the checkpoint of epoch 66, since it has the best values on the validation set: the lowest loss value, 0.0491, and the highest mean intersection over union, 0.757.

Fig. 7 shows a comparison between the ground truth masks (on the left) and the predicted masks (on the right).

For change detection, we used the accurate buildings extracted from OSM to evaluate the change detection procedure, in order to guarantee that the results of the image segmentation do not affect our evaluation.

To show the results of the whole workflow, refer to Fig. 2, which shows an aerial image and its corresponding mask. We suppose that this image has just been acquired from an aircraft, and that we have a database containing old masks (extracted from old aerial images). The goal is to find the mask in the database that corresponds to the mask of this aerial image by computing the similarity measure between each pair of masks.

After extracting the buildings' footprints from the image, we manually apply different transformations to the mask in order to evaluate our procedure. Table I describes the applied transformations. We also manually apply some changes between the masks.

First, we apply the SIFT algorithm to the original mask paired with each of the transformed masks. Table II reports the results of the SIFT algorithm, and Fig. 8 shows the matching points found by SIFT for each pair of masks.

Next, we compute the similarity measure and the geometric parameters of each pair of masks and compare them with the ground truth shown in Table I. The results are shown in Table III.

TABLE III
SIMILARITY MEASURES AND GEOMETRIC PARAMETERS COMPUTED FOR THE PAIRS OF MASKS

Mask | Similarity measure | Scale factor | Rotation angle
A    | 0.692308           | 1.53581      | 0.12881
B    | 0.88636            | 1.01337      | 73.0351
C    | 0.875              | 0.94348      | -0.00921

As shown in the table, all the similarity measures for the transformed masks with respect to the original mask are greater than or equal to the threshold. As for the scale factor, the difference between the computed scale factor and the real one does not exceed 0.1 for the four pairs of masks. Likewise, the difference between the computed rotation angle and the real one does not exceed 0.1°.
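Scale factors and rotation angles like those in Table III can be recovered from matched keypoint coordinates by fitting a similarity transform in the least-squares sense. The sketch below uses the standard complex-number formulation and synthetic point pairs; it is an illustrative reconstruction, not necessarily the authors' exact estimator.

```python
import numpy as np

def estimate_scale_rotation(src, dst):
    """Least-squares similarity transform (scale, rotation in degrees)
    mapping src points onto dst points; translation is removed by
    centering, and reflections are not handled."""
    src = src - src.mean(axis=0)
    dst = dst - dst.mean(axis=0)
    # Treat each point (x, y) as the complex number x + iy; the optimal
    # scale-rotation is the complex ratio <dst, src> / <src, src>.
    a = src[:, 0] + 1j * src[:, 1]
    b = dst[:, 0] + 1j * dst[:, 1]
    z = np.vdot(a, b) / np.vdot(a, a)   # vdot conjugates its first arg
    return abs(z), np.degrees(np.angle(z))

rng = np.random.default_rng(0)
pts = rng.uniform(0, 100, size=(40, 2))
theta = np.radians(73.0)               # ground-truth rotation (cf. mask B)
s = 1.56                               # ground-truth scale (cf. mask A)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
warped = s * pts @ R.T + np.array([10.0, -5.0])
scale, angle = estimate_scale_rotation(pts, warped)
print(round(scale, 2), round(angle, 1))  # → 1.56 73.0
```

With noisy real matches, a robust variant (e.g. RANSAC over the matched pairs) would typically replace the plain least-squares fit.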
Fig. 8. SIFT matches between each pair of masks.

Now, the difference image is computed for each pair of masks after aligning them. Fig. 9 shows the difference image of each of the four pairs of masks.
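Once a pair of masks is aligned, the difference image reduces to a per-pixel comparison; for binary building masks this is simply an exclusive-or. A minimal numpy sketch, where the 8x8 toy masks are assumptions for illustration:

```python
import numpy as np

def difference_image(mask_old, mask_new):
    """Per-pixel difference of two aligned binary building masks.
    A pixel set in only one of the masks marks a potential change."""
    old = mask_old > 0
    new = mask_new > 0
    diff = np.logical_xor(old, new)
    return diff.astype(np.uint8) * 255

old = np.zeros((8, 8), dtype=np.uint8)
new = np.zeros((8, 8), dtype=np.uint8)
old[1:4, 1:4] = 255           # footprint present only in the old mask
new[5:8, 5:8] = 255           # footprint newly appeared in the new mask
diff = difference_image(old, new)
print(int((diff > 0).sum()))  # → 18 changed pixels (two 3x3 regions)
```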
The procedure was applied to a test set of 80 pairs of aerial images with different characteristics and different applied transformations in order to evaluate our procedure. The following histograms show the accuracy rate of the change detection results, as well as of the geometric parameters, for each type of transformation.
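The per-transformation accuracy rates summarized in Table IV can be tallied with a simple evaluation loop. This sketch assumes a hypothetical record structure of (transformation type, correctly detected?) labels; the sample values are invented for illustration and do not reproduce the paper's figures.

```python
from collections import defaultdict

# Hypothetical per-pair evaluation records: (transformation, correct?).
results = [
    ("scale", True), ("scale", True),
    ("orientation", True), ("orientation", False),
    ("overlapping", True),
    ("mixed", True), ("mixed", False),
]

totals = defaultdict(int)
hits = defaultdict(int)
for kind, correct in results:
    totals[kind] += 1
    hits[kind] += correct          # True counts as 1, False as 0
accuracy = {k: 100.0 * hits[k] / totals[k] for k in totals}
print(accuracy["scale"])           # → 100.0
```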
It is clear from the obtained results that our procedure works best with the scale transformation as well as with overlapping regions. However, some errors were encountered with rotation and mixed transformations. These results are expected, since the SIFT algorithm is designed to be robust to scale transformation.

Fig. 9. Difference image between each pair of masks.

As an overall rate, our procedure gives 92.7% true change detection across the different types of transformations.
TABLE IV
ACCURACY RATE OF THE RESULTS OF THE CHANGE DETECTION WITH DIFFERENT TYPES OF TRANSFORMATIONS

Transformation | Scale | Orientation | Overlapping | Mixed
Accuracy (%)   | 100   | 85.5        | 99          | 86.3

The strengths of our procedure can be summarized by the following points: (1) it works with simple PNG aerial images without any additional metadata; (2) if the shapes of the buildings in another region differ from those in the training set, anyone can simply train on their own dataset and then use the same procedure to detect changes; (3) it can be extended to points of interest other than buildings; and (4) it is robust against different types of transformations.

However, this procedure has two main limitations: (1) its computation time is expensive, so it cannot act as a real-time application, and (2) the final results always depend on the accuracy of the segmentation phase.

VI. CONCLUSION

Building change detection in aerial images that differ in many geometric aspects, such as scale and view point, is a challenging research topic nowadays, and a complete solution to this problem has not yet been developed. This work has presented a complete procedure to detect new and demolished buildings in two aerial images taken at different times. Our procedure works in three steps. The first step, extracting building footprints from the original aerial images, is accomplished using a segmentation model; using machine learning, specifically a convolutional neural network, this model was built by training on a large number of aerial images coupled with their building masks. The second step, image correspondence, is done by calculating a similarity factor between each pair of images; at this point, the pair of images that represents the same geographic area is found. The last step, change detection, benefits from image matching algorithms, in particular the SIFT algorithm, which is applied to align the pair of images and then compute their difference in order to detect the changed buildings. This procedure showed a change detection rate of 92.7% for different types of transformations.

VII. CHALLENGES AND FUTURE WORK

A big challenge faced by our approach is its inability to be implemented as a real-time system. The image segmentation phase, as well as searching a database for the mask that corresponds to the input image, is computationally expensive, although building the model and preparing the dataset are carried out only once. Further studies must be accomplished in order to find suitable solutions for this critical issue.

Moreover, future work can adopt a more specific design for the experiment. The overall findings that emerged from our experiments gave us some promising directions to follow for building an optimal, operative, complete and automatic system in the future.

Furthermore, points of interest other than buildings can be taken into consideration in the process of change detection, including roads, vegetation and any other class of objects that can be present in aerial images.

Additionally, enhancing the segmentation model with a larger and more suitable dataset is essential in further research, due to its significant effect on the overall results of the approach, since detecting changes relies heavily on the extracted buildings' footprints.

REFERENCES

[1] J. D. Kiser and D. P. Paine, Aerial Photography And Image Interpretation, Canada: John Wiley & Sons, Inc., 2012.
[2] M. N. Favorskaya and L. C. Jain, Computer Vision in Control Systems, Aerial and Satellite Image Processing, vol. 135, M. N. Favorskaya and J. C. Lakhmi, Eds., Canberra: Springer International Publishing, 2018.
[3] N. Paparoditis, M. Jordan and J. P. Cocquerez, "Building Detection and Reconstruction from Mid- and High-Resolution Aerial Imagery," Computer Vision And Image Understanding, vol. 72, pp. 122-142, 1997.
[4] G. Wilhauck, "Comparison of Object Oriented Classification Techniques and Standard Image Analysis For the Use of Change Detection Between SPOT multispectral Satellite Images and Aerial Photos," International Archives of Photogrammetry and Remote Sensing, vol. XXXIII, 2000.
[5] S. Nebiker, N. Lack and M. Deuber, "Building change detection from historical aerial photographs using dense image matching and object-based image analysis," Remote Sensing, vol. 6, pp. 8310-8336, September 2014.
[6] L.-C. Chen and L.-J. Lin, "Detection of building changes from aerial images and light detection and ranging (LIDAR) data," Journal of Applied Remote Sensing, vol. 4, no. 1, 2010.
[7] M. C. Alonso, J. A. Malpica, F. Papi, A. Arozarena and A. Martinez-Agirre, "Change detection of buildings from satellite imagery and lidar data," International Journal of Remote Sensing, vol. 34, no. 5, p. 1652, March 2013.
[8] I. Tomljenovic, D. Tiede and T. Blaschke, "A building extraction approach for airborne laser scanner data utilizing the object based image analysis paradigm," International Journal of Applied Earth Observation and Geoinformation, vol. 52, pp. 137-148, October 2016.
[9] R. B. Irvin and D. M. McKeown, "Methods for exploiting the relationship between buildings and their shadows in aerial imagery," IEEE Transactions on Systems, Man and Cybernetics, vol. 19, no. 6, 1989.
[10] Y. Wang, "Automatic extraction of building outline from high resolution aerial imagery," The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vols. XLI-B3, 2016.
[11] M. Leena, J. Hyyppa and H. Kaartinen, "Automatic detection of changes from laser scanner and aerial image data for updating buildings map," Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci., vol. 35, pp. 434-439, July 2004.
[12] M. C. A. Turker and B. Cetinkaya, "Automatic detection of earthquake-damaged buildings using DEMs created from pre- and post-earthquake stereo aerial photographs," International Journal of Remote Sensing, vol. 26, no. 4, pp. 823-832, 16 August 2006.
[13] F. Jung, "Detecting building changes from multitemporal aerial stereopairs," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 58, no. 3-4, pp. 187-201, January 2004.
[14] F. Rottensteiner, S. Clode, J. C. Trinder and K. Kubik, "Fusing airborne laser scanner data and aerial imagery for the automatic extraction of buildings in densely built-up areas," International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. 35, 2006.
[15] L. Y. Chen, T.-A. Teo, Y.-C. Shao and Y.-C. Lai, "Fusion of LIDAR data and optical imagery for building modeling," International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. 35, 2004.
[16] S. Saito and Y. Aoki, "Building and road detection from large aerial imagery," in Image Processing: Machine Vision Applications VIII, San Francisco, 2015.
[17] N. Bourdis, M. Denis and H. Sahbi, "Constrained optical flow for aerial image change detection," in 2011 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Vancouver, Canada, 2011.
[18] "Arcgis.com," 2019. [Online]. Available: https://pro.arcgis.com/en/pro-app/tool-reference/data-management/detect-feature-changes.htm.
[19] S. Yuheng and Y. Hao, "Image Segmentation Algorithms Overview," 2017.
[20] G. Bradski and A. Kaehler, Learning OpenCV, United States: O'Reilly Media, Inc., December 2016.
[21] N. R. Pal and S. K. Pal, "A Review on Image Segmentation Techniques," Pattern Recognition, vol. 26, no. 9, pp. 1277-1294, September 1993.
[22] K. S. Fu and J. K. Mui, "A Survey on Image Segmentation," Pattern Recognition, vol. 13, no. 1, pp. 3-16, 1981.
[23] P.-g. Ho, Ed., Image Segmentation, 2011.
[24] A. P. Dhawan, "Image Segmentation," in Medical Image Analysis, Wiley-IEEE Press, 2011, pp. 229-264.
[25] "fizyr/keras-retinanet," 2019. [Online]. Available: https://github.com/fizyr/keras-retinanet.
[26] W. Abdullah, "Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow," GitHub Repository, 2017.
[27] "ENVI - The Leading Geospatial Analytics Software," 2019. [Online]. Available: https://www.harrisgeospatial.com/Software-Technology/ENVI.
[28] M. J. Canty, Image Analysis, Classification and Change Detection in Remote Sensing, New York: Taylor & Francis Group, 2014.
[29] V. Srivastava, "Evaluation of various segmentation tools for extraction of urban features using high resolution remote sensing data," The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. 34, no. XXX.
[30] J. Bauer, N. Sunderhauf and P. Protzel, "Comparing Several Implementations of Two Recently Published Feature Detectors," in International Conference on Intelligent and Autonomous Systems (ICAS), Toulouse, France, 2007.
[31] P. M. Panchal, S. R. Panchal and S. K. Shah, "A Comparison of SIFT and SURF," International Journal of Innovative Research in Computer and Communication Engineering, vol. 1, no. 2, April 2013.
[32] U. M. Babri, M. Tnavir and K. Khurshid, "Feature Based Correspondence: A Comparative Study on Image Matching Algorithms," International Journal of Advanced Computer Science and Applications (IJACSA), vol. 7, no. 3, 2016.
[33] E. Karami, S. Prasad and M. Shehata, "Image Matching Using SIFT, SURF, BRIEF and ORB: Performance Comparison for Distorted Images," in Newfoundland Electrical and Computer Engineering Conference, Canada, 2015.
[34] "GitHub - RoboSat," Mapbox, 2018. [Online]. Available: https://github.com/mapbox/robosat.
[35] "OpenStreetMap OSM," OpenStreetMap Foundation OSMF, 2010. [Online]. Available: www.openstreetmap.org.
[36] "GeoFabrik," OpenStreetMap, 2018. [Online]. Available: geofabrik.de.
[37] "Mapbox," Mapbox, 2010. [Online]. Available: www.mapbox.com.
[38] J. Canny, "A Computational Approach to Edge Detection," in Readings in Computer Vision, M. A. Fischler and O. Firschein, Eds., Elsevier, 1987, pp. 184-203.