=Paper=
{{Paper
|id=Vol-3611/paper12
|storemode=property
|title=Building height prediction using neural networks based on Sentinel multi spectral images
|pdfUrl=https://ceur-ws.org/Vol-3611/paper12.pdf
|volume=Vol-3611
|authors=Giedrius Stravinskas,Vytas Vadapolas,Arminas Pamakštis,Andrius Kriščiūnas,Ingrida Lagzdinytė-Budnikevičienė
|dblpUrl=https://dblp.org/rec/conf/ivus/StravinskasVPKL22
}}
==Building height prediction using neural networks based on Sentinel multi spectral images==
Giedrius Stravinskas*, Vytas Vadapolas*, Arminas Pamakštis, Andrius Kriščiūnas and Ingrida Lagzdinytė-Budnikevičienė

Kaunas University of Technology, Faculty of Informatics, Department of Applied Informatics

IVUS 2022: 27th International Conference on Information Technology

<nowiki>*</nowiki> Corresponding author.

© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

===Abstract===
Satellite imagery is a form of data that can be used for many applications, especially those focusing on change over time. In this article, we analyze methods of detecting buildings and predicting their height, as well as which key attributes are required for good predictions. Building detection and height estimation are done with neural network algorithms such as convolutional neural networks. Predictions are made based on additional data about the building and its area. In this paper, three different building height estimation models are implemented. The research showed that a model using a mixed dataset, which takes both Sentinel image patch data and numerical feature input of additional building data, performs well even with lower quality images.

Keywords: Building height prediction, Sentinel images, artificial neural networks

===1. Introduction===
Having a way of quickly predicting building height provides essential knowledge for sustainable urban development and plays a vital role in urban fields such as pollution transmission, building energy consumption and population estimation [1]. Essentially, building height information is crucial for a comprehensive understanding of urban development [2].

Determining urban development and its magnitude usually requires the aggregation of many criteria. This is a difficult process because of the time it takes to access this information and the changes that might happen while the data is being collected. Even with automated monitoring systems, this creates linked data which is hard to handle [3]. In addition, things that are not documented or finished will not be collected, and this makes the resulting predictions less accurate, because predictive accuracy goes down as data accuracy degrades [4]. Therefore, this problem calls for data that is easier to get and that reflects the current state precisely. Up-to-date satellite images are a great source of current data that is freely available.

There are many sources of satellite images, such as Lidar, Sentinel-1, Sentinel-2, InSAR, ICESat and others. Each of these specializes in a different area and covers different spectrums that can be used for different tasks. For example, Lidar and InSAR capture ground elevation and deformation and are well suited to tasks that focus on nature and surface changes [5][6]. ICESat measures ice sheet balance; its focus is ice, together with all-year measurements of cloud and aerosol distributions over land in polar regions [7].

Sentinel-1 and Sentinel-2 images come from the Copernicus programme, a European initiative for the implementation of information services dealing with the environment and security, based on observation data received from Earth Observation (EO) satellites and ground-based information. The Copernicus API provides access to satellite images obtained during the Sentinel missions, allowing comparison of the same locations within different time frames. The Sentinel-2 satellite is equipped with an opto-electronic multispectral sensor for surveying with a resolution of 10 to 60 m in the visible, near infrared (VNIR) and short-wave infrared (SWIR) spectral zones, including 13 spectral channels. This ensures the capture of differences in vegetation state, including temporal changes, and minimizes the impact of the atmosphere on the quality of the imagery. The orbit has an average height of 785 km, and the presence of two satellites in the mission allows repeated surveys every 5 days at the equator and every 2-3 days at middle latitudes.

An analysis of the literature in which satellite images are used has shown a variety of use cases, such as detecting specific crops and changes in the soil structure [8]. Other examples include continuous observation of ships moving on the sea surface [9] and retrieving significant wave height [10].

The focus of this work is the prediction of building height. There are already existing approaches that help to solve similar problems. For example, shadows in combination with gradient formulas are employed in high-resolution images for building height extraction [11], and salient objects are identified and their edges are then found [12]. A multi-scene building height estimation method uses shadow length calculation combined with fish net and the Pauta criterion [13]. Other building detection works include using the U-Net model to assign semantic labels to each pixel as building/non-building [14], using 3D building models in conjunction with satellite imagery to predict building height [15], and using deep convolutional neural networks (DCNN) for semantic segmentation and applying filters [16].

Convolutional neural networks (CNNs) could be one of the most promising options for building detection and height prediction. CNN models have been proven to be good at extracting mid and high-level abstract feature representations from small raw images [17] for classification purposes, by interleaving convolutional and pooling layers, i.e., spatially shrinking the feature maps layer by layer. Recently proposed network architectures also allow for dense per-pixel predictions [18]. Many of these models rely on high-resolution satellite imagery, the detail of which makes it easier for the models to identify and detect extremely small elements [19][20]. The spatial resolution of high-resolution satellite images is about 1 m/pixel [20]. The downside, however, is that these high-resolution images are not taken very often. This creates the preconditions for working with old data, which could potentially give a false picture of the current situation. Therefore, lower quality satellite imagery is a more appropriate choice in this case.

In this work, medium-resolution (10 m/pixel) Sentinel satellite imagery [21] has been used, which makes it possible to see the development of housing and infrastructure. The authors of this study performed experiments combining Sentinel imagery with additional geographic and time data in order to achieve feasible building height estimation accuracy.

The rest of the work is organized as follows. Section 2 describes the process of collecting images of the buildings, their height and additional data, as well as the peculiarities of forming a complete dataset. Section 3 describes the three different models that were used to estimate the height of the buildings and provides a comparative analysis of their results. Section 4 provides summaries and insights from the study.

===2. Dataset creation===
For the sake of building variety, the area of interest was bound to Europe. This gives the ability to have different types of buildings without covering the whole world. For the administrative information (building location, height, area), the "Planet OSM" database was used, which contains complete copies of the "OpenStreetMap" database. "OpenStreetMap" (OSM) is a collaborative project to create a free editable geographic database of the world. To get the building imagery data, the "Copernicus Open Access Hub" application programming interface (API) was used [22]. This allows getting satellite image patches with additional information (latitude, longitude, day of the year and time of day at which the pictures were taken). To further simplify the process, the "Copernicus Open Access Hub" API was used with "Python" programming language integration. This facilitated the download and processing of Sentinel-2 images. An example of a downloaded image can be seen in Figure 1.

Figure 1: Sentinel-2 example satellite image of Barcelona

Areas in Europe were selected by bounding the mentioned geographical zone in a two-point rectangle (58°59'42.0"N 10°14'20.4"W x 36°57'00.0"N 36°25'51.6"E). An example can be seen in Figure 2. According to these bounds, the "Planet OSM" data query was adapted to select only buildings in the bounded area. Also, buildings that were less than 20 m in height were filtered out.

Figure 2: Bound geographical zone with "geojson.io"

The "geojson.io" and building polygon (coming from the "Planet OSM" database) coordinate data was used to download large Sentinel images of the region. The polygon data from "OpenStreetMap" was iterated, and patches of 32x32 px were cut out by centering each patch on the center of the building polygon. This produced an image dataset consisting of 32x32 px image patches with a building in the center. Also, metadata was taken from the image (latitude, longitude, day of the year and time of day at which the pictures were taken). Then the image patch data was concatenated with the administrative building data. Each image of a building is accompanied by its height and its area in square meters (for the non-linear model). The dataset creation flowchart can be seen in Figure 3.

Figure 3: Dataset creation flowchart

The created dataset contained 9988 images. To have a more evenly distributed dataset, heights that had a lower number of images were removed. The 0.5 quantile of building height was calculated to be 30 m, and therefore buildings of height above 30 m were removed from the dataset (see histogram in Figure 4).

Figure 4: Data filtered under the 0.5 quantile (30 m). a) histogram of all dataset buildings, the red line marks the 0.5 quantile; b) histogram of building counts in the range 21-30 m only

After removing the mentioned buildings, the dataset contained 8622 images and was split into train, validation and test datasets. The distribution of heights after the split stayed the same. The train set contained 5980 images of 2887 unique buildings, the validation set contained 2237 images of 1352 unique buildings and the test set contained 405 images of 399 unique buildings (see Table 1). The splitting was made so that the same building does not appear in more than one of the train, validation and test sets.

{| class="wikitable"
|+ Table 1: Final dataset distribution
! Dataset !! Image count !! Unique building count
|-
| Train || 5980 || 2887
|-
| Validation || 2237 || 1352
|-
| Test || 405 || 399
|}

Most of the buildings were taken from the southern Europe area. Buildings higher than 30 m were removed from the dataset and are marked in red (see Figure 5).

Figure 5: Data distribution on the map and removed buildings. a) the initial dataset, where buildings are marked as blue dots; b) the initial dataset, where blue dots are buildings lower than 30 m and red dots are buildings that were removed because they are above 30 m in height

===3. Modeling and validation===
In general, building height may be calculated analytically from Sentinel images if such information as the Sentinel location in space, the sun position, and the building and building shadow contours are well known. Unfortunately, while the sun and Sentinel positions in space may be extracted from image metadata, the shape of the buildings in the image and their shadow information are generally unknown because of the relatively rough resolution of Sentinel images. In order to determine how well a model can approximate building height from rough resolution images, and to validate that the model itself is able to approximate building height from all theoretically necessary data features, three types of models were implemented. Firstly, a baseline non-linear model that predicts building height from the building's area was created. This gives a nonrandom initial height prediction which does not depend on Sentinel data. Secondly, a convolutional neural network (CNN) model with a DenseNet201 backbone was used to extract complex features from Sentinel-2 satellite imagery in order to find the relation between the building shape presented in a rough Sentinel image and its height.

Finally, a mixed data model was created, which expands the second model by including such information as latitude, longitude, day of the year and time of the day. This information enables the model to approximate the sun position. Taking into account that the Sentinel position relative to the ground in the image is always similar, the third model has all the information necessary for building height detection. The schemes of the different models are provided in Figure 6.

Figure 6: Prediction model schemes

The architectures of the neural network based models can be seen in Figure 7. On the left side (Model 2) is the CNN model, which takes an image of size 32x32 pixels and returns a height prediction. On the right side (Model 3) is the mixed data model, which similarly takes a 32x32 pixel image but additionally takes the numerical data latitude, longitude, day of the year and time of the day (input3 of size 4) and concatenates the image and numerical data layers into a single layer that returns the predicted height. Both models try to minimise the mean squared error (MSE) loss function.

Figure 7: Neural network based models architectures

To compare the accuracy of the models, three error metrics were used, namely Mean Squared Error (MSE), Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE). These metrics were chosen as the most popular error metrics for regression [23].

After training the implemented models, the results showed that when comparing MAE (which can be measured in meters), the worst performing model is the Sentinel image patch CNN-based regression with an error of 2.31 m, and the best performing model is the mixed data prediction model with an error of 1.89 m. The non-linear regression model fell between the two Sentinel-based models with an error of 2.02 m (see Table 2). Here, the mean absolute error (MAE) refers to how many meters, on average, each forecast differed from the actual height of the building.

{| class="wikitable"
|+ Table 2: Building height prediction errors on the test set using Logistic regression with weighted classes, Image patch CNN regression and mixed data prediction models
! Model !! MSE !! MAE !! MAPE
|-
| Logistic regression with weighted classes || 7.71 || 2.02 || 0.087
|-
| Image patch CNN based regression || 9.11 || 2.31 || 0.098
|-
| Mixed data prediction model || 6.2 || 1.89 || 0.079
|}

According to the results of the model predictions on the test set of buildings in the range 20-30 m, it is visible that without any additional information (only what comes with the Sentinel imagery: latitude, longitude, day of the year (0-365) and time of day (0-24) at which the pictures were taken) it is possible to predict building height with an MAE of 1.89 m. This is on average 13 cm more precise than using the additional data of building area, which may not be available if the building is newly built or not yet registered. Also, the mixed data model is on average 42 cm more precise than the CNN image model.

Figure 8: Error distribution of models by number of predictions according to error

As seen in the error distributions of the different models in Figure 8, most of the predictions of the mixed data model are scattered around 0 error. The two other models have more widely scattered errors. This means that the mixed data model has better accuracy, as it makes smaller deviations from the ground truth. It can also be seen that the mixed data model tends to predict a lower height than the building's original height.

In order to further investigate the performance of each model in predicting height, three buildings from Nice (France) were analyzed in detail (see Figure 9). Based on the height prediction results for building 1, it can be assumed that for larger buildings, such as sports arenas, the non-linear regression model provides a more accurate prediction than the other models, which are based on image recognition. In the photo shown, the shadow angles the building, making it appear larger. It is possible that for this reason the CNN model predicts a higher height for the building. As seen in the prediction results for building 2, the mixed data model is the most accurate. This could be because the additional data of latitude and longitude helps the model to consider what other buildings are in the surrounding area, thus making the prediction more accurate. For building 3, the most accurate prediction is also made by the mixed data model. This could be due to the fact that there is only one building clearly separated from other buildings in the area with a visible shadow, and there are some surrounding buildings for mixed data reference.

Figure 9: Image prediction test on three buildings in Nice (France). Each building is marked on the map with its ground truth height (GT) and the model predictions

To further check the correctness of the results, a more diverse dataset should be made. This dataset should contain images of a larger range of heights, not only 20-30 m, with a similar distribution of images between heights. It would also be wise to test how the models perform on areas that have dense or sparse building distribution. A feature importance tool, such as [24], can be used to determine whether part of a CNN model identifies a building shadow as an important feature.

For additional information, we compared the mixed data model to the IM2ELEVATION model used for building height estimation from single-view aerial imagery [25]. The two models use different datasets: the IM2ELEVATION model uses a multisensory fusion of aerial optical and aerial light detection and ranging (Lidar) data to prepare the training data. Both models use a convolutional neural network (CNN) architecture. The IM2ELEVATION model takes a single optical image as input and produces an estimated DSM image as output. The IM2ELEVATION model achieved a mean absolute error of 1.46, while the mixed data prediction model achieved a mean absolute error of 1.89. The mixed data model, however, makes use of satellite photos of lesser quality than the aerial ones. The mixed data model outperforms the IM2ELEVATION model where buildings are sparsely distributed in the scene, as it performs better when buildings are not as close together and have more distinct features.

===4. Conclusion===
The analysis of the literature showed that convolutional neural networks are one of the most promising options for extracting mid and high-level abstract feature representations from small raw images and can be used effectively for building height estimation.

A building dataset consisting of 8622 images was created using images and metadata collected from Sentinel-2 of the Copernicus programme to train and test the models. The "OpenStreetMap" database was used to add height and additional information for each building.

During the modeling and validation phase, three different models were created. The non-linear logistic regression model was trained using the building area as input data and was used as a baseline for the other models. The CNN model with a DenseNet201 backbone was trained only on the Sentinel image patches from the dataset. It performed worse than the baseline model. The mixed data model, which uses a CNN DenseNet201 backbone as a feature extractor, was trained on a dataset that takes both Sentinel image patch data and numerical feature input of additional building data. It performed better than both the baseline model and the CNN DenseNet201 backbone model that used satellite imagery alone.

Comparing the performance of the trained models, it turned out that the mixed data prediction model provides the best results. A mean absolute error of 1.89 m and a mean absolute percentage error of 0.079 were achieved. The reason for the better performance could be that the additional data of latitude and longitude helps the model to consider what other buildings are in the surrounding area and thus make the prediction more accurate.

A limitation of the models is that they are trained mainly on data from the southern Europe area and on buildings only up to 30 meters high. When estimating buildings from different regions, this could lead to inconsistent results. The area was chosen due to the abundance of data, while the height limit was chosen to ensure a more evenly distributed dataset, as data on smaller buildings greatly outnumbers data on large buildings. Testing the model with low buildings showed that a higher quality dataset is needed to apply the same model to smaller buildings. To improve predictions, images of buildings that are not as close together are required, as they have more distinct features.

===References===
[1] Lívia Tomás, Leila Fonseca, Cláudia Almeida, Fernando Leonardi, Madalena Pereira. "Urban population estimation based on residential buildings volume using IKONOS-2 images and lidar data." International Journal of Remote Sensing 37:sup1 (2016): 1-28. DOI: 10.1080/01431161.2015.1121301.

[2] Mahtta, Richa, Anjali Mahendra, and Karen C. Seto. "Building up or spreading out? Typologies of urban growth across 478 cities of 1 million+." Environmental Research Letters 14.12 (2019): 124077.

[3] Lim, Chiehyeon, Kwang-Jae Kim, and Paul P. Maglio. "Smart cities with big data: Reference models, challenges, and considerations." Cities 82 (2018): 86-99.

[4] Bansal, Arun, Robert J. Kauffman, and Rob R. Weitz. "Comparing the modeling performance of regression and neural networks as data quality varies: A business value approach." Journal of Management Information Systems 10.1 (1993): 11-32.

[5] Cheng Wang and Nancy F. Glenn. "Integrating LiDAR Intensity and Elevation Data for Terrain Characterization in a Forested Area." IEEE Geoscience and Remote Sensing Letters, August 2009.

[6] Kang, Y.; Lu, Z.; Zhao, C.; Xu, Y.; Kim, J.-W.; Gallegos, A.J. "InSAR monitoring of creeping landslides in mountainous regions: A case study in Eldorado National Forest, California." Remote Sensing of Environment 258 (2021): 112400.

[7] Zwally, H.J.; Schutz, B.; Abdalati, W.; Abshire, J.; Bentley, C.; Brenner, A.; Bufton, J.; Dezio, J.; Hancock, D.; Harding, D.; et al. "ICESat's Laser Measurements of Polar Ice, Atmosphere, Ocean, and Land." J. Geodyn. 34 (2002): 405-445.

[8] Lang, Nico, Konrad Schindler, and Jan Dirk Wegner. "Country-wide high-resolution vegetation height mapping with Sentinel-2." Remote Sensing of Environment 233 (2019): 111347.

[9] Yu, W.; You, H.; Lv, P.; Hu, Y.; Han, B. "A Moving Ship Detection and Tracking Method Based on Optical Remote Sensing Images from the Geostationary Satellite." Sensors 21 (2021): 7547. https://doi.org/10.3390/s21227547.

[10] Xue, Sihan, et al. "Significant wave height retrieval from Sentinel-1 SAR imagery by convolutional neural network." Journal of Oceanography 76.6 (2020): 465-477.

[11] Raju, P. L. N., Himani Chaudhary, and A. K. Jha. "Shadow analysis technique for extraction of building height using high resolution satellite single image and accuracy assessment." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences (2014).

[12] X. Cai, H. Sui, R. Lv, and Z. Song. "Automatic circular oil tank detection in high-resolution optical image based on visual saliency and Hough transform." In: Proc. of IEEE Workshop on Electronics, Computer and Applications, pp. 408-411, 2014.

[13] Xie, Yakun, et al. "Multi-Scene Building Height Estimation Method Based on Shadow in High Resolution Imagery." Remote Sensing 13.15 (2021): 2862.

[14] Irwansyah, Edy, Yaya Heryadi, and Alexander Agung. "Semantic Image Segmentation for Building Detection in Urban Area with Aerial Photograph Image using U-Net Models." 2021. DOI: 10.1109/AGERS51788.2020.9452773.

[15] David Frantz, Franz Schug, Akpona Okujeni, Claudio Navacchi, Wolfgang Wagner, Sebastian van der Linden, Patrick Hostert. "National-scale mapping of building height using Sentinel-1 and Sentinel-2 time series." Remote Sensing of Environment 252 (2021): 112128.

[16] Niemeyer, Joachim, Franz Rottensteiner, and Uwe Soergel. "Contextual classification of lidar data and building object detection in urban areas." ISPRS Journal of Photogrammetry and Remote Sensing 87 (2014): 152-165.

[17] Zhou Feiyan, Jin Linpeng and Dong Jun. "Review of convolutional neural networks." Journal of Computer Science 40.6 (2017): 1229-1251.

[18] Bearman, Amy, et al. "What's the point: Semantic segmentation with point supervision." European Conference on Computer Vision. Springer, Cham, 2016.

[19] Tobler W. "Measuring Spatial Resolution." 1987. https://www.researchgate.net/publication/291877360 Measuring spatial resolution.

[20] IKONOS-2. https://earth.esa.int/web/eoportal/satellite-missions/i/ikonos-2.

[21] Resolution and swath. https://sentinel.esa.int/web/sentinel/missions/sentinel-2/instrument-payload/resolution-and-swath.

[22] Copernicus Open Access Hub. https://scihub.copernicus.eu/.

[23] Botchkarev A. "Performance Metrics (Error Measures) in Machine Learning Regression, Forecasting and Prognostics: Properties and Typology." 2018. https://arxiv.org/abs/1809.03006.

[24] Ribeiro M. T., Singh S., Guestrin C. "'Why Should I Trust You?' Explaining the Predictions of Any Classifier." 2016. https://arxiv.org/pdf/1602.04938v1.pdf.

[25] Liu C-J, Krylov VA, Kane P, Kavanagh G, Dahyot R. "IM2ELEVATION: Building Height Estimation from Single-View Aerial Imagery." Remote Sensing 12.17 (2020): 2719. https://doi.org/10.3390/rs12172719
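
As a supplementary note on the three error metrics named in Section 3 (MSE, MAE and MAPE), the following is a minimal sketch of how they could be computed for a set of height predictions. It is not the authors' code. The function names and the sample heights are hypothetical, and MAPE is assumed to be reported as a fraction rather than a percentage, which appears consistent with the values in Table 2. The last lines only restate the MAE differences between the models using the MAE column of Table 2.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error (squared metres when heights are in metres)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error (metres when heights are in metres)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction (assumed convention)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical ground-truth and predicted heights in metres (not from the paper).
heights_true = [20.0, 25.0, 30.0, 22.0]
heights_pred = [22.0, 24.0, 27.0, 22.0]

sample_mse = mse(heights_true, heights_pred)
sample_mae = mae(heights_true, heights_pred)
sample_mape = mape(heights_true, heights_pred)

# MAE values taken from Table 2 (metres); the dictionary keys are shorthand.
table2_mae = {"baseline": 2.02, "cnn": 2.31, "mixed": 1.89}

# Differences in centimetres between the mixed data model and the other two.
gain_vs_baseline_cm = round((table2_mae["baseline"] - table2_mae["mixed"]) * 100)
gain_vs_cnn_cm = round((table2_mae["cnn"] - table2_mae["mixed"]) * 100)

print(sample_mse, sample_mae, sample_mape)
print(gain_vs_baseline_cm, gain_vs_cnn_cm)
```

Because MAE is expressed in the same unit as the heights, it is the metric that supports the centimetre-level comparisons made in the text, whereas MSE penalizes large individual errors more strongly and MAPE normalizes each error by the true height.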