Building Change Detection in Aerial Images

Fatima Mroueh, Ihab Sbeity, Mohamad Chaitou
Computer Science Department, Lebanese University, Beirut, Lebanon
fatima.mroueh249@gmail.com, ihab.sbeity@gmail.com, mohamad.chaitou@ul.edu.lb

Abstract—In this paper, we provide an approach that detects changes in buildings between two multi-temporal aerial images of different sources. Since the images are in most cases not perfectly aligned, our approach takes into consideration the differences in the geometric aspects of the images: differences in scale, view point or overlapping regions may be present between the pair of images. Our approach relies on segmentation to extract building masks from the original aerial images. Changes are then found by comparing the features of the pair of masks using image matching algorithms. The procedure is applied to a set of 80 pairs of aerial images of different sizes and with different applied transformations, and evaluated against the corresponding ground truth references. The evaluation yields a building change detection rate of 92.7%. The results of our proposed approach suggest that automatic building change detection is possible, but further research should include improving the segmentation phase to better distinguish buildings and enhancing the change detection method. Real-time application of the process is also a challenging perspective.

Index Terms—change detection, aerial images, image segmentation, image matching algorithms, SIFT, feature detection, feature description

I. INTRODUCTION

Aerial imagery is, as it sounds, the process of taking images from the air. It is a subset of a larger domain called Remote Sensing, which consists of acquiring data without making physical contact with the objects under study [1].

Aerial images, such as satellite or drone imagery, are among the richest sources of data and can be used in various applications. Change detection in aerial images is the detection of new or disappeared objects in images registered at different moments of time, possibly under various lighting conditions, heights and camera calibrations [2]. Detecting the changes in aerial images of the same region taken at different times is useful and important in many domains, such as automatic map updating, assessing field change after catastrophic events, detecting illegal building areas and undeclared refugee camps, analysis of urban and suburban areas, serving as a base for automatic monitoring systems, and some military applications. For these reasons, detecting changes in aerial images has become an important research topic [3].

In fact, several techniques and approaches have been designed and implemented to detect changes in aerial images. However, these techniques were motivated by the availability and fusion of different types of remote sensing data, such as data generated from Digital Elevation Models (DEM), Light Detection and Ranging (LiDAR) technology and other remote sensing technologies [1], [4], or they are limited to specific types of images, such as GeoTIFF images that carry accurate geographic information, e.g. coordinates in the global coordinate system. Furthermore, they are also limited to aligned images of the same scale and view point (same height, same camera calibration, same coordinates, and so on). The main problem with these techniques is that they rely too heavily on the information provided with the images, and therefore they cannot be applied to any image that is not enriched with such information, e.g. geo-spatial metadata.

Nowadays, automatic image analysis techniques are essential. Machine learning and computer vision techniques, and more specifically image matching algorithms, have proven to be very efficient in image processing and comparison. At the same time, there are still few sound scientific methodologies for detecting changes in aerial images, especially images that differ in geometric aspects such as scale and orientation, without being limited to additional information about the images. Going deeper into the topic is essential to introduce new, efficient insights into change detection in aerial images.

Accordingly, this research provides a complete procedure for building change detection in aerial images using machine learning and computer vision techniques and algorithms. The main advantage of our approach is that it does not depend on any information attached to the aerial images: it deals with aerial images as simple PNG or JPG files without any enrichment. More importantly, it can detect changes in aerial images that differ in scale and view point, and in images that have overlapping regions. This way, our approach can be applied to any pair of aerial images regardless of their associated information or their geometric aspects.

II. PREVIOUS STUDIES

Detecting changes in aerial images has been a long journey, and detecting changes in buildings in particular is an essential part of it.

Looking at the previous studies related to our topic, one can see that most of them rely on data fusion; they integrate multiple data sources to produce more consistent and accurate information than that provided by any individual data source.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

For example, in their work, Nebiker et al.
used image-based dense digital surface models (DSMs) to compute a depth value for every pixel of an image, combined with the aerial images, for the detection of individual buildings. They used these models with object-based image analysis to detect changes [5]. Likewise, the study of Chen Lin was based on multi-source data. They pre-processed the data using triangulation of an irregular network of data points collected by Light Detection and Ranging (LiDAR) technology, and the changes were then detected by finding differences in height between the LiDAR point measurements and the estimates of the building models [6]. Furthermore, Alonso et al. applied the support vector machine (SVM) classification algorithm to a joint satellite and laser data set for the extraction of buildings. For change detection, they suggested comparing an old map with more recent spatial information instead of comparing a pair of images [7]. Many other studies took advantage of data sources other than the aerial image itself, such as Digital Elevation Models (DEMs), laser scanner data, the vegetation index (NDVI), the relationship between buildings and their shadows, and high-resolution aerial images, in order to detect changes in buildings [8]–[12]. Most of these studies suffered from significant problems with small buildings and with buildings surrounded by high trees.

As for extracting the buildings before detecting changes, this step was included in numerous studies. Some of them used region-based classification, where each small region was classified as "building" or "no building" based on a decision tree induced from training data (edge recordings of the buildings), and then classified as "change" or "no change" based on some conditions [13]. Others used the vegetation index NDVI to distinguish buildings from trees, since both have similar height information [14]. A neural network classifier was also employed to classify the regions of an aerial image into multiple classes (grove, building, tree, shadow, ...) by feeding the network with inputs such as area, average gray level, shape factor and compactness [3]. Region-based segmentation was also applied using a decision tree that relies on the geometric properties of the land cover objects, such as elevation, spectral information, texture and shape [15]. The most important and precise segmentation was obtained using Convolutional Neural Networks, where the large imagery is divided into small patches and a CNN is trained on those patches and their corresponding three-channel map patches (building, road and background) [16]. However, this work did not include change detection.

As for detecting changes in aerial images that have different views, Bourdis et al. stated that camera motion and viewpoint differences introduce parallax effects. Therefore, in order to be robust to viewpoint differences, they introduced an algorithm to distinguish between real changes and parallax effects based on optical flow constrained by epipolar geometry [17]. In other works on this point, knowing the calibration of the camera or the spatial information about the geographic area was essential to achieve the goal [13], [17]. Furthermore, ArcGIS Pro offers a tool that detects feature changes by finding where the update line features spatially match the base line features, and reports spatial changes, attribute changes, or both, as well as no change. However, all inputs to this tool must be in the same coordinate system [18], while in our case we aim to detect changes even if we do not know the spatial location of the geographic region we are working on.

To the best of our knowledge, no existing study processes aerial images independently of any other source of information to extract buildings from them. Moreover, computer vision techniques such as image matching algorithms have not been employed to detect changes, although they have proven very efficient in the comparison of images.

To overcome the two problems cited above, our approach works in three steps. As we are interested in small-scale change detection (buildings), the first step is the segmentation phase, in which we eliminate a large part of the scene without losing any actual building. This is possible by extracting buildings' footprints from the aerial images. Second, we use the SIFT image matching algorithm to check the correspondence of the pair of images, i.e. to make sure that the images correspond to the same geographic region. Third, we detect the type of transformation applied to one of the images with respect to the other (scale, rotation, overlap). The detected transformation is then reversed to get two images of the same scale and view. In the last step, the difference image is computed and post-processed; in this step, the changes in the buildings are detected.

III. BACKGROUND

A. Image Segmentation

Computer vision is a field that aims to make computers accurately understand and efficiently process visual data such as images. Extracting information from images and understanding that information is critical in many applications in this domain. Computer vision helps in extracting features of an image in order to simplify image analysis [19].

In several cases, we may not be interested in all the components of the image, but only in some areas or objects that have certain characteristics related to our task. Image segmentation is one of the best techniques to handle this issue. It works by isolating objects from the rest of the image [20]–[24]. Image segmentation mainly has the role of classifying each pixel of an image into meaningful classes that refer to specific objects. It involves grouping the elements of an image by certain criteria of homogeneity [4]. It does not only predict classes for an input, but also provides additional information regarding the location of those classes.

Deep learning techniques have proven to be very efficient in solving such problems, as they can learn patterns in order to predict classes. The main deep learning architecture used for image segmentation, and for image processing in general, is the Convolutional Neural Network (CNN). Frameworks like Mask R-CNN and RetinaNet allow applying image segmentation using deep learning. However, the domain of application of some of them is restricted to scene images, and they cannot be used with aerial images [25], [26]. Other frameworks that work with aerial images, such as ENVI, ERDAS Imagine and eCognition, are also available [27], [28]. Nevertheless, they have many limitations: some of them do not have any vectorization tool to convert the segmented result for further analysis, and others are confused by images where the building roofs are dark, with intensities much lower than those of other building objects [29].
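As a concrete illustration of what a segmentation model outputs, the sketch below (our own minimal example, not code from any of the cited frameworks) binarizes per-pixel building probabilities into a discrete mask and scores it against a reference mask with intersection over union, a common metric for evaluating such masks:

```python
# Illustrative sketch: turning per-pixel class probabilities into a
# discrete binary building mask, and scoring it against a reference.

def probabilities_to_mask(probs, threshold=0.5):
    """Binarize a 2-D grid of 'building' probabilities (values in [0, 1])."""
    return [[1 if p >= threshold else 0 for p in row] for row in probs]

def intersection_over_union(mask_a, mask_b):
    """IoU between two binary masks of the same shape (1 = building)."""
    inter = sum(a and b for ra, rb in zip(mask_a, mask_b) for a, b in zip(ra, rb))
    union = sum(a or b for ra, rb in zip(mask_a, mask_b) for a, b in zip(ra, rb))
    return inter / union if union else 1.0

probs = [[0.9, 0.8, 0.1],
         [0.7, 0.2, 0.0],
         [0.1, 0.0, 0.0]]
mask = probabilities_to_mask(probs)
truth = [[1, 1, 0],
         [1, 1, 0],
         [0, 0, 0]]
print(intersection_over_union(mask, truth))  # 3 matching pixels / 4 in union = 0.75
```

The probability values and masks here are invented toy data; real models operate on full image tiles, but the thresholding and scoring steps are the same.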
B. Image Matching

In order to compare images, we look for specific patterns or features that are unique in the images and that can be easily compared. A feature is a relevant piece of information: a specific structure in the image such as a point, an edge or a corner. The operation of finding the features of an image is called feature detection.

Feature detection is the process of transforming the visual information of the image into a vector space. It basically consists of finding keypoints (or interest points) in the image. A keypoint is a point that is unique in the local area around it, and it can be matched to a corresponding point in another image. The main purpose of detecting features is to give us the possibility to perform mathematical operations on them, and thus to find similar vectors that lead us to similar objects or scenes in the images. Ideally, this information should be invariant under image transformations, so we can find the same features again even if the image is transformed in some way.

Using a specific feature detection algorithm, we search for such features in the first image and then look for the same features in the other image. As a result, we get a set of points (xi, yi) for each image, where xi and yi are the coordinates of the point i detected as a feature. After detecting interest points, we compute a descriptor for each one of them: the region around each feature is described so that the algorithm can find the similar features in the other image. This is called feature description. The local appearance around each feature point is described in a way that is invariant under changes in translation, scale and rotation, so we end up with a descriptor vector for each feature point. Feature descriptors encode interesting information into a series of numbers and act as a sort of numerical "fingerprint" that can be used to differentiate one image from another. Once the features and descriptors are extracted and computed, preliminary feature matches between the images can be established.

Feature matching, or more generally image matching, is the task of establishing correspondences between two images. Keypoints between two images are matched by identifying their nearest neighbors, which is achieved by comparing the descriptors across the images. For any two images, we get a set of pairs (xi, yi), (x'i, y'i), where (xi, yi) is a feature in one image and (x'i, y'i) is its matching feature in the other image. We can summarize the process of image matching as follows: 1) find a set of distinctive keypoints; 2) define a region around each keypoint; 3) extract and normalize the region content; 4) compute a local descriptor from the normalized region; 5) match the local descriptors.

Many comparative studies have been published assessing the performance of image matching algorithms. The real challenge is to achieve truly invariant feature detection under any image transformation. The selection of the adequate algorithm for the matching task depends significantly on the type of the image to be matched and on the variations between an image and its matching pair in scale, orientation or other transformations. Most of these studies have stated that the Scale Invariant Feature Transform (SIFT) algorithm performs best under different image transformations [30]–[33].
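Step 5 above, nearest-neighbour descriptor matching, can be sketched without any library: for each descriptor in the first image, take its closest descriptor in the second image, and keep the match only when the closest neighbour is clearly better than the second closest (the ratio test popularised with SIFT). The toy 2-D descriptors below are invented for illustration; real SIFT descriptors are 128-dimensional.

```python
import math

# Minimal sketch of nearest-neighbour descriptor matching with a ratio
# test. Toy low-dimensional descriptors stand in for 128-D SIFT vectors.

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Return (i, j) index pairs: descriptor i of image A matched to j of B."""
    matches = []
    for i, da in enumerate(desc_a):
        # distances from descriptor i to every descriptor of image B
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        best, second = dists[0], dists[1]
        # keep the match only if the best neighbour is unambiguously best
        if best[0] < ratio * second[0]:
            matches.append((i, best[1]))
    return matches

desc_a = [(0.0, 1.0), (5.0, 5.0), (9.0, 0.0)]
desc_b = [(9.1, 0.2), (0.1, 1.1), (5.0, 4.9)]
print(match_descriptors(desc_a, desc_b))  # [(0, 1), (1, 2), (2, 0)]
```

In practice OpenCV's SIFT detector and brute-force matcher perform these steps on real images; the logic of the ratio test is the same.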
get better results in detecting changes, since by segmenting the For any two images, we get a set of pairs (xi , yi ), (x0i , yi0 ) images, we get rid of every element that is considered noise where (xi , yi ) is a feature in one image and (x0i , yi0 ) is its for us (not an object of interest). 15 trails, railway stations and much more, all over the world. OSM maps are saved on the internet and they are totally free. But the most important thing is that OSM is accurate and up to date (normally updated every day) [35]. There are two reasons for which we are building our own segmentation model instead of using OSM data directly. The first reason is that OSM data do not cover all the regions we are interested in. In Lebanon for example, buildings masks are not provided for all the country. So, we take benefit from the available geometries provided by OSM in order to build the segmentation model. Later, this model will provide us with the buildings footprints for regions that are not covered by OSM. The second reason, which is the most important one, is that we may not be aware of the exact location of the image in the global coordinates system. In such case, we cannot use OSM extracts. GeoFabrik server from OSM provides a convenient and Fig. 2. Aerial image and its corresponding buildings mask. updated extracts which we can work with [36]. GeoFabrik team extract, select and process free geodata for everyone. They create shape files, maps, and map tiles with a free A segmentation model is needed for this purpose. Many of charge download service. The geometries extracted from tools that implement this techniques are available. The used GeoFabrik server are shape files of extension .shp. A shape tool to achieve our goal is RoboSat [34]. RoboSat is an file is a simple format that is used for storing the geometric end-to-end pipeline written in Python3 for feature extraction location and attribute information of geographic features that from aerial and satellite imagery. 
Features can be anything can be represented by points, lines or polygons. We are only visually distinguishable in the imagery such as buildings, roads interested in polygon representation of the buildings. These or cars [34]. We chose to work with RoboSat since it is shape files can be visualized as vector layer in GIS tools, specially designed to work with aerial images and it has shown which help us to decide at what locations we need to download important results in this domain. satellite imagery to complete the dataset. The data preparation tools in RoboSat help us to create Although the masks are not always perfect, but a slightly and prepare the dataset for training feature extraction models. noisy dataset will still work fine with training the model on Also, the modelling tools in RoboSat help with training fully thousands of images and masks. convolutional neural networks for segmentation [34]. The next step is to download the corresponding aerial Fig. 2 represents an aerial image with its corresponding imagery. Our aerial imagery is downloaded from Mapbox [37]. buildings mask. Mapbox satellite is a full global base map. It uses global a) Data Preparation: We first walk through creating a satellite and aerial imagery from commercial providers such dataset for training feature extraction model. Such dataset con- as NASA and USAS. Mapbox provides an API that allows us sists of satellite imagery combined with their corresponding to download the needed satellite imagery [37]. masks for the feature we want to extract, which is building RoboSat works with the Slippy Map tile format to abstract in our case. We can think of these masks as binary images away georeferenced imagery behind tiles of the same size. A which take the value zero where there is no building and one Slippy Map is, in general, a term referring to modern web for building areas. maps which let you zoom and pan around. 
By default, the This dataset will serve as training set for the segmentation Slippy Map renders tiles. Tiles are 256 x 256 pixel PNG model. The goal is to have a model that accepts an aerial files. Each tile is a file in a directory representing a column, image and outputs its corresponding buildings footprints. As and each column is a subdirectory that represents the zoom mentioned before, the footprints will be used to detect changes level. RoboSat offers the tool that is responsible for tiling the instead of original aerial images, and this is to reduce all kinds collected aerial images as well as extracted geometries. of noise that may affect the accuracy of our application. Our With downloaded satellite imagery and rasterized corre- objects of interest are only buildings. sponding masks, our dataset is complete and ready. Fig.3 We start by extracting geometries from OpenStreetMap shows the downloaded aerial imagery tiles with their corre- (OSM) project. We try then to figure out where we need sponding buildings footprints. satellite imagery in order to complete the training set [35]. b) Training and Modelling: The RoboSat segmentation OpenStreetMap (OSM) project creates and provides free ge- model is a kind of fully convolutional neural network which ographic data. The OpenStreetMap Foundation is an interna- we train on pairs of aerial images and corresponding masks. tional not-to-profit organization supporting the OpenStreetMap The training process takes place within a GeForce GTX 1080 project. This project maintains data about roads, buildings, platform. When picking up the best checkpoint, the model 16 Fig. 4. SIFT matches between pairs of masks having different transformations. Fig. 3. Aerial images tiles with their corresponding footprints. keypoints, the set of matched keypoints between the pair of images and many other useful information. Fig. 
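The z/x/y tile addressing used throughout this data preparation step is simple enough to compute directly. The sketch below is our own illustration, using the standard Web Mercator tiling formulas rather than RoboSat's code:

```python
import math

# Standard Slippy Map / Web Mercator tiling: at zoom z the world is a
# 2^z x 2^z grid of 256x256-pixel tiles, addressed as z/x/y.

def deg2tile(lat_deg, lon_deg, zoom):
    """Return the (x, y) tile indices containing the given WGS84 point."""
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    return x, y

# Beirut, Lebanon (~33.89 N, 35.50 E) at zoom 18
x, y = deg2tile(33.89, 35.50, 18)
print(f"18/{x}/{y}.png")
```

The resulting path names the single tile file covering that point, which is exactly how the imagery and mask tiles of the training set are addressed.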
b) Training and Modelling: The RoboSat segmentation model is a fully convolutional neural network which we train on pairs of aerial images and corresponding masks. The training process takes place on a GeForce GTX 1080 platform. When picking the best checkpoint, the model allows predicting the segmentation probabilities for every pixel in an image; these probabilities indicate how likely each pixel is background or building, and they are then turned into discrete segmentation masks. The same segmentation model is used for extracting buildings' footprints from the old imagery as well as from the input aerial image.

B. Image Correspondence

At this point, after extracting the buildings' footprints from the original input aerial image, we need to find its corresponding mask in the already prepared dataset. The pair of masks will not be perfectly aligned: many types of transformations may relate one of the images to the other, such as different scales, different views, or overlapping regions. Here, because we do not know the exact location of the images in the global coordinate system, some similarity measure is needed to find the mask from the dataset that best matches our input image. This similarity measure will help us decide whether the two images show the same scene or not. For this purpose, the SIFT image matching algorithm is used.

The objective here is to find a similarity measure that tells us that the masks are extracted from the same geographic region, regardless of the applied transformation. First, we use the SIFT algorithm to detect the interest points in both masks (which have different transformations). Then, we compute the descriptors for each image in order to use them in the matching process. The SIFT algorithm provides us with the coordinates of the detected keypoints, the set of matched keypoints between the pair of images, and other useful information.

Fig. 4 represents the matching points between pairs of masks having different transformations. For visualization, the original image is put on the left side, the other image on the right side, and the matches are drawn as lines between both images.

Fig. 4. SIFT matches between pairs of masks having different transformations.

Let n and m be the number of keypoints in the first and second mask respectively, let S = {Pi | i ∈ {1, 2, ..., n}} be the set of detected keypoints in the first mask, and let S' = {P'i | i ∈ {1, 2, ..., m}} be the set of detected keypoints in the second mask. Let M be the set containing the pairs of keypoint indices that match each other: M = {(i, j) | Pi ∈ S and P'j ∈ S' are found as matched keypoints}. This notation will be used in all the following sections.

Since both images show the same scene, there must be proportionality between the relative distances of the keypoints. Thus, in all cases, this condition must be satisfied:

    d(Pa, Pb) / d(Pc, Pd) ≈ d(P'e, P'f) / d(P'g, P'h)    (1)

such that a, b, c, d ∈ {1, 2, ..., n}, e, f, g, h ∈ {1, 2, ..., m} and (a, e), (b, f), (c, g), (d, h) ∈ M.

We compute this factor for the matched keypoints found for the pair of images. In some cases, there might be false matches, which lead to some disparity in the values of the factor between the matching pairs. To remove this inconsistency, we discard all the matching pairs that give a factor far from the most frequent factor. Then we compute the ratio of the number of remaining matching pairs over the total number of good matches, and we rely on this ratio as a similarity measure between the two images.

This similarity factor can vary from one case to another. In order to have a threshold usable in any case, we computed the similarity factor for 408 pairs of masks with different sizes and different applied transformations; the average proportionality factor over the 408 pairs was 0.88685. But since we assume that the pair of masks to be compared have differences in buildings, we accept 0.7 as a threshold.
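The similarity measure just described can be sketched as follows. This is our reading of the procedure, not the authors' code: compute a distance-ratio "factor" per matched keypoint pair, discard pairs whose factor strays from the most frequent (binned) factor, and report the fraction of surviving matches. The bin width and the choice of a fixed reference match are our own illustrative choices.

```python
import math
from collections import Counter

# Sketch of the similarity measure of Section IV-B (our interpretation):
# per-match distance-ratio factors, mode-based outlier removal, and the
# surviving fraction as the similarity score.

def similarity_measure(pts_a, pts_b, bin_width=0.1):
    """pts_a[k] and pts_b[k] are the coordinates of the k-th matched pair."""
    # distance-ratio factor of each match, measured against a reference match
    ref_a, ref_b = pts_a[0], pts_b[0]
    factors = []
    for pa, pb in zip(pts_a[1:], pts_b[1:]):
        da = math.dist(pa, ref_a)
        db = math.dist(pb, ref_b)
        if db > 0:
            factors.append(da / db)
    # keep only matches whose factor falls in the most frequent bin
    bins = Counter(round(f / bin_width) for f in factors)
    mode_bin, _ = bins.most_common(1)[0]
    kept = [f for f in factors if round(f / bin_width) == mode_bin]
    return len(kept) / len(factors)

# second mask = first mask scaled by 2, with one false match appended
pts_a = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5)]
pts_b = [(0, 0), (20, 0), (0, 20), (20, 20), (33, 7)]
print(similarity_measure(pts_a, pts_b))  # 0.75: 3 of 4 factors agree
```

On these toy points, the three true matches give a consistent factor of 0.5 while the planted false match does not, so the measure reports 3/4.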
C. Change Detection

After finding the corresponding mask, the SIFT matching algorithm is very efficient in detecting the type of transformation applied to one of the images with respect to the other. We differentiate between three main types of transformations: masks that have overlapping regions, masks that differ in scale, and masks that differ in rotation angle. We explain below in detail how to detect each type of transformation by applying simple mathematics to the information provided by the SIFT algorithm.

a) Overlapping Regions: For this type of transformation, we use template matching, an algorithm available in the OpenCV computer vision library [20] that proved very efficient in detecting the overlapping regions between two images. After computing the similarity measure between the pair of masks and checking that the views correspond to the same scene, we try to find the overlapping regions between the pair of masks. We apply the template matching algorithm to search for the small mask inside the bigger one, and the bigger image is then cropped to its overlapping region. Although there are some differences in the buildings, the template matching algorithm gives an accurate result. Now we have two aligned masks that are ready for detecting the changes between them.

b) Scale Transformation: In this type of transformation, we aim to find the scale factor between the pair of masks; once we get the scale ratio λ, we can transform both masks to the same scale. The process is very similar to the one performed when computing the similarity measure, since the ratio of distances computed there was in fact the scale factor. So

    λ = d(Pa, Pb) / d(P'e, P'f) ≈ d(Pc, Pd) / d(P'g, P'h)    (2)

for all a, b, c, d ∈ {1, 2, ..., n}, e, f, g, h ∈ {1, 2, ..., m} and (a, e), (b, f), (c, g), (d, h) ∈ M. We again remove inconsistencies caused by false matches. Now we have two aligned masks that are ready for detecting the changes between them.

c) View Point Transformation (Orientation): In this type of transformation, we aim to find the rotation angle between the pair of masks; once we get the rotation angle, we can transform both masks to the same orientation. To calculate the angle of rotation between the two masks, we have to find the angle between the lines that are formed by respective matched points:

    θ = ∠(PaPb, P'eP'f) ≈ ∠(PcPd, P'gP'h)    (3)

i.e., the angle between the line through a pair of matched keypoints in the first mask and the line through their correspondents in the second mask, for (a, e), (b, f), (c, g), (d, h) ∈ M. We again remove inconsistencies caused by false matches. Now we have two aligned masks that are ready for detecting the changes between them.
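Under our reading of Eqs. (2) and (3), both parameters can be estimated from the matched keypoints alone: each pair of matches gives one estimate (a distance ratio and an angle difference). The sketch below is an illustrative implementation; averaging the per-pair estimates is our own simplification of the outlier-removal step.

```python
import math

# Illustrative estimation of the scale factor (Eq. 2) and rotation angle
# (Eq. 3) from matched keypoints; pts_a[k] matches pts_b[k].

def estimate_scale_and_rotation(pts_a, pts_b):
    """Return (scale, rotation in degrees) relating mask B to mask A."""
    scales, angles = [], []
    for k in range(len(pts_a) - 1):
        (ax, ay), (bx, by) = pts_a[k], pts_a[k + 1]
        (cx, cy), (dx, dy) = pts_b[k], pts_b[k + 1]
        da = math.hypot(bx - ax, by - ay)
        db = math.hypot(dx - cx, dy - cy)
        if da == 0 or db == 0:
            continue
        scales.append(da / db)
        ang_a = math.atan2(by - ay, bx - ax)
        ang_b = math.atan2(dy - cy, dx - cx)
        diff = math.degrees(ang_a - ang_b)
        # normalize the angle difference to (-180, 180]
        angles.append((diff + 180.0) % 360.0 - 180.0)
    return sum(scales) / len(scales), sum(angles) / len(angles)

# mask B = mask A rotated by 90 degrees and shrunk by half
pts_a = [(0, 0), (10, 0), (10, 10), (0, 10)]
pts_b = [(0, 0), (0, 5), (-5, 5), (-5, 0)]
scale, angle = estimate_scale_and_rotation(pts_a, pts_b)
print(round(scale, 3), round(angle, 1))  # 2.0 -90.0
```

The angle normalization matters: without it, differences that wrap around ±180° would corrupt the average.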
D. Difference Image

Whatever transformation was applied to one of the images, at this point we have two aligned images, and all we have to do is compute the difference image. Of course, the difference image will contain some noise because of differences in the resolution of the pair of masks, which drives us to filter it.

Filtering the noise in the difference image consists of finding the contours in it. Contours are curves joining all the continuous points (along a boundary) having the same color or intensity, and they are very helpful for shape detection and recognition. Since we are using binary images, we have a better chance of good accuracy. Finding such contours relies on detecting Canny edges [38].

Fig. 5 represents the noisy difference image and the filtered one.

Fig. 5. Difference image before and after filtering.

The contours are finally projected onto one of the original images to show the differences clearly.
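The paper filters the difference image through OpenCV contours and Canny edges. As a library-free illustration of the same idea, the sketch below removes noise from a binary difference grid by discarding connected components smaller than a minimum area; the area threshold is our own stand-in for the contour-based criterion.

```python
from collections import deque

# Library-free stand-in for the contour-based filtering of Section IV-D:
# keep only connected regions of the binary difference image whose area
# reaches a minimum size; tiny specks are treated as alignment noise.

def filter_difference(diff, min_area=3):
    rows, cols = len(diff), len(diff[0])
    seen = [[False] * cols for _ in range(rows)]
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if diff[r][c] and not seen[r][c]:
                # flood-fill one 4-connected component
                comp, queue = [], deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if 0 <= ny < rows and 0 <= nx < cols \
                                and diff[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(comp) >= min_area:  # keep only plausible changes
                    for y, x in comp:
                        out[y][x] = 1
    return out

diff = [[1, 0, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 1]]
print(filter_difference(diff))  # the lone pixel is dropped, the 2x2 block kept
```

On real masks the minimum area would be chosen relative to the expected building size in pixels.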
V. RESULTS AND DISCUSSIONS

Fig. 6 shows the evaluation metrics for both the training and validation sets during the training process. We pick the checkpoint of epoch 66, since it has the best values on the validation set: this epoch has the lowest loss value, 0.0491, and at the same time the highest mean intersection over union value, 0.757.

Fig. 6. Evaluation metrics for each epoch of the training process.

Fig. 7 shows a comparison between the ground truth masks (on the left) and the predicted masks (on the right).

Fig. 7. Comparison between ground truth buildings masks and the predicted ones.

For change detection, we used the accurate buildings extracted from OSM to evaluate the change detection procedure, in order to guarantee that the results of the image segmentation do not affect our evaluation.

In order to show the results of the whole workflow, refer to Fig. 2, which shows an aerial image and its corresponding mask. We suppose that this image has just been acquired from an aircraft. We also suppose that we have a database containing old masks (extracted from old aerial images). The goal is to find the mask in the database that corresponds to the mask of this aerial image by computing the similarity measure between each pair of masks. After extracting the buildings' footprints from the image, we manually apply different transformations to the mask in order to evaluate our procedure. Table I describes the applied transformations. We also manually introduce some changes between the masks.

First, we apply the SIFT algorithm to the original mask together with each of the transformed masks. Table II reports the results of the SIFT algorithm, and Fig. 8 shows the matching points found by SIFT for each pair of masks.

Fig. 8. SIFT matches between each pair of masks.

Next, we compute the similarity measure and the geometric parameters of each pair of masks and compare them with the ground truth given in Table I. The results are shown in Table III. As the table shows, all the similarity measures of the transformed masks with respect to the original mask are greater than or equal to the threshold. As for the scale factor, the difference between the computed scale factor and the real one does not exceed 0.1 for the four pairs of masks. Likewise, the difference between the computed rotation angle and the real one does not exceed 0.1°.

TABLE III
SIMILARITY MEASURES AND GEOMETRIC PARAMETERS COMPUTED FOR THE PAIRS OF MASKS

     Similarity measure   Scale factor   Rotation angle
A    0.692308             1.53581         0.12881
B    0.88636              1.01337        73.0351
C    0.875                0.94348        -0.00921

Now, the difference image is computed for each pair of masks after aligning them. Fig. 9 shows the difference image of each of the four pairs of masks.

Fig. 9. Difference image between each pair of masks.

The procedure was then applied on a test set of 80 pairs of aerial images with different characteristics and different applied transformations in order to evaluate it. The accuracy rates of the change detection results, as well as of the geometric parameters, for each type of transformation are reported in Table IV.

TABLE IV
ACCURACY RATE OF THE RESULTS OF THE CHANGE DETECTION WITH DIFFERENT TYPES OF TRANSFORMATIONS

               Scale   Orientation   Overlapping   Mixed
Accuracy (%)   100     85.5          99            86.3

It is clear from the obtained results that our procedure works best with the scale transformation as well as with overlapping regions, while some errors were encountered with the rotation and mixed transformations. These results are expected, since the SIFT algorithm is designed to be robust to scale transformations. As an overall rate, our procedure achieves 92.7% true change detection across the different types of transformations.

The strengths of our procedure can be summarized by the following points: (1) it works with simple PNG aerial images without any additional metadata; (2) if the shapes of the buildings in another region differ from those in the training set, anyone can train on their own dataset and then use the same procedure to detect changes; (3) it can be extended to points of interest other than buildings; and finally (4) it is robust against different types of transformations.

However, two main limitations face this procedure: (1) it has an expensive computation time, so it cannot act as a real-time application, and (2) the final results always depend on the accuracy of the segmentation phase.
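As a concrete illustration of the differencing-and-filtering step evaluated above: the sketch below computes the difference image of two aligned binary masks and suppresses small noisy regions. The paper filters the noise by finding contours through Canny edge detection [38]; as a dependency-free stand-in, this sketch keeps only connected components above a minimum area, so the function name, the 4-connectivity, and the `min_area` threshold are our assumptions, not the authors' implementation.

```python
import numpy as np

def change_mask(mask_a, mask_b, min_area=20):
    """Difference image between two aligned binary building masks,
    keeping only changed regions of at least `min_area` pixels."""
    diff = np.asarray(mask_a, bool) ^ np.asarray(mask_b, bool)
    h, w = diff.shape
    out = np.zeros_like(diff)
    seen = np.zeros_like(diff)
    for sy in range(h):
        for sx in range(w):
            if diff[sy, sx] and not seen[sy, sx]:
                # flood-fill one 4-connected changed region
                seen[sy, sx] = True
                stack, comp = [(sy, sx)], []
                while stack:
                    y, x = stack.pop()
                    comp.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and diff[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                # drop small, noise-like blobs caused by resolution differences
                if len(comp) >= min_area:
                    for y, x in comp:
                        out[y, x] = True
    return out
```

Each surviving component corresponds to a candidate new or demolished building; projecting its outline onto one of the original images then visualizes the change, as described above.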
VI. CONCLUSION

Building change detection in aerial images that differ in many geometric aspects, such as scale and view point, is a challenging research topic nowadays, and a complete solution to this problem has not yet been developed. This work has presented a complete procedure to detect new and demolished buildings in two aerial images taken at different times. Our procedure works in three steps. The first step, extracting building footprints from the original aerial images, is accomplished using a segmentation model; using machine learning, specifically a convolutional neural network, this model was built by training on a large number of aerial images coupled with their building masks. The second step, image correspondence, is done by calculating a similarity factor between each pair of images; at this point, the pair of images that represent the same geographic area is found. The last step, change detection, benefits from image matching algorithms, in particular the SIFT algorithm, which is applied to align the pair of images and then compute their difference in order to detect the changed buildings. This procedure showed a change detection rate of 92.7% for different types of transformations.

VII. CHALLENGES AND FUTURE WORK

A big challenge faced by our approach is its inability to be implemented as a real-time system. The image segmentation phase, as well as searching a database for the mask that corresponds to the input image, is computationally expensive, although building the model and preparing the dataset are carried out only once. Further studies must be accomplished in order to find suitable solutions for this critical issue.

Moreover, future work can have a more specific design for the experiment. The overall findings that emerged from our experiments gave us some promising directions to follow for building an optimal, operative, complete and automatic system in the future.

Furthermore, points of interest other than buildings can be taken into consideration in the process of change detection; it can also include roads, vegetation and any other class of objects that can be present in aerial images. Additionally, enhancing the segmentation model with a larger and more suitable dataset is essential in further research, due to its considerable and significant effect on the overall results of the approach, since detecting changes relies heavily on the extracted buildings' footprints.

REFERENCES

[1] J. D. Kiser and D. P. Paine, Aerial Photography and Image Interpretation, Canada: John Wiley & Sons, Inc., 2012.
[2] M. N. Favorskaya and L. C. Jain, Computer Vision in Control Systems, Aerial and Satellite Image Processing, vol. 135, M. N. Favorskaya and L. C. Jain, Eds., Canberra: Springer International Publishing, 2018.
[3] N. Paparoditis, M. Jordan and J. P. Cocquerez, "Building Detection and Reconstruction from Mid- and High-Resolution Aerial Imagery," Computer Vision and Image Understanding, vol. 72, pp. 122-142, 1997.
[4] G. Wilhauck, "Comparison of Object Oriented Classification Techniques and Standard Image Analysis for the Use of Change Detection Between SPOT Multispectral Satellite Images and Aerial Photos," International Archives of Photogrammetry and Remote Sensing, vol. XXXIII, 2000.
[5] S. Nebiker, N. Lack and M. Deuber, "Building change detection from historical aerial photographs using dense image matching and object-based image analysis," Remote Sensing, vol. 6, pp. 8310-8336, September 2014.
[6] L.-C. Chen and L.-J. Lin, "Detection of building changes from aerial images and light detection and ranging (LIDAR) data," Journal of Applied Remote Sensing, vol. 4, no. 1, 2010.
[7] M. C. Alonso, J. A. Malpica, F. Papi, A. Arozarena and A. Martinez-Agirre, "Change detection of buildings from satellite imagery and lidar data," International Journal of Remote Sensing, vol. 34, no. 5, p. 1652, March 2013.
[8] I. Tomljenovic, D. Tiede and T. Blaschke, "A building extraction approach for airborne laser scanner data utilizing the object based image analysis paradigm," International Journal of Applied Earth Observation and Geoinformation, vol. 52, pp. 137-148, October 2016.
[9] R. B. Irvin and D. M. McKeown, "Methods for exploiting the relationship between buildings and their shadows in aerial imagery," IEEE Transactions on Systems, Man and Cybernetics, vol. 19, no. 6, 1989.
[10] Y. Wang, "Automatic extraction of building outline from high resolution aerial imagery," The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. XLI-B3, 2016.
[11] M. Leena, J. Hyyppa and H. Kaartinen, "Automatic detection of changes from laser scanner and aerial image data for updating buildings map," Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci., vol. 35, pp. 434-439, July 2004.
[12] M. C. A. Turker and B. Cetinkaya, "Automatic detection of earthquake-damaged buildings using DEMs created from pre- and post-earthquake stereo aerial photographs," International Journal of Remote Sensing, vol. 26, no. 4, pp. 823-832, 16 August 2006.
[13] F. Jung, "Detecting building changes from multitemporal aerial stereopairs," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 58, no. 3-4, pp. 187-201, January 2004.
[14] F. Rottensteiner, S. Clode, J. C. Trinder and K. Kubik, "Fusing airborne laser scanner data and aerial imagery for the automatic extraction of buildings in densely built-up areas," International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. 35, 2006.
[15] L. Y. Chen, T.-A. Teo, Y.-C. Shao and Y.-C. Lai, "Fusion of LIDAR data and optical imagery for building modeling," International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. 35, 2004.
[16] S. Saito and Y. Aoki, "Building and road detection from large aerial imagery," in Image Processing: Machine Vision Applications VIII, San Francisco, 2015.
[17] N. Bourdis, M. Denis and H. Sahbi, "Constrained optical flow for aerial image change detection," in 2011 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Vancouver, Canada, 2011.
[18] "Arcgis.com," 2019. [Online]. Available: https://pro.arcgis.com/en/pro-app/tool-reference/data-management/detect-feature-changes.htm.
[19] S. Yuheng and Y. Hao, "Image Segmentation Algorithms Overview," 2017.
[20] G. Bradski and A. Kaehler, Learning OpenCV, United States: O'Reilly Media, Inc., December 2016.
[21] N. R. Pal and S. K. Pal, "A Review on Image Segmentation Techniques," Pattern Recognition, vol. 26, no. 9, pp. 1277-1294, September 1993.
[22] K. S. Fu and J. K. Mui, "A Survey on Image Segmentation," Pattern Recognition, vol. 13, no. 1, pp. 3-16, 1981.
[23] P.-G. Ho, Ed., Image Segmentation, 2011.
[24] A. P. Dhawan, "Image Segmentation," in Medical Image Analysis, Wiley-IEEE Press, 2011, pp. 229-264.
[25] "fizyr/keras-retinanet," 2019. [Online]. Available: https://github.com/fizyr/keras-retinanet.
[26] W. Abdullah, "Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow," GitHub Repository, 2017.
[27] "ENVI - The Leading Geospatial Analytics Software," 2019. [Online]. Available: https://www.harrisgeospatial.com/Software-Technology/ENVI.
[28] M. J. Canty, Image Analysis, Classification and Change Detection in Remote Sensing, New York: Taylor & Francis Group, 2014.
[29] V. Srivastava, "Evaluation of various segmentation tools for extraction of urban features using high resolution remote sensing data," The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. 34, no. XXX.
[30] J. Bauer, N. Sunderhauf and P. Protzel, "Comparing Several Implementations of Two Recently Published Feature Detectors," in International Conference on Intelligent and Autonomous Systems (ICAS), Toulouse, France, 2007.
[31] P. M. Panchal, S. R. Panchal and S. K. Shah, "A Comparison of SIFT and SURF," International Journal of Innovative Research in Computer and Communication Engineering, vol. 1, no. 2, April 2013.
[32] U. M. Babri, M. Tanvir and K. Khurshid, "Feature Based Correspondence: A Comparative Study on Image Matching Algorithms," International Journal of Advanced Computer Science and Applications (IJACSA), vol. 7, no. 3, 2016.
[33] E. Karami, S. Prasad and M. Shehata, "Image Matching Using SIFT, SURF, BRIEF and ORB: Performance Comparison for Distorted Images," in Newfoundland Electrical and Computer Engineering Conference, Canada, 2015.
[34] "GitHub - RoboSat," Mapbox, 2018. [Online]. Available: https://github.com/mapbox/robosat.
[35] "OpenStreetMap (OSM)," OpenStreetMap Foundation (OSMF), 2010. [Online]. Available: www.openstreetmap.org.
[36] "GeoFabrik," OpenStreetMap, 2018. [Online]. Available: geofabrik.de.
[37] "Mapbox," Mapbox, 2010. [Online]. Available: www.mapbox.com.
[38] J. Canny, "A Computational Approach to Edge Detection," in Readings in Computer Vision, M. A. Fischler and O. Firschein, Eds., Elsevier, 1987, pp. 184-203.