=Paper=
{{Paper
|id=Vol-3611/paper12
|storemode=property
|title=Building height prediction using neural networks based on Sentinel multi spectral images
|pdfUrl=https://ceur-ws.org/Vol-3611/paper12.pdf
|volume=Vol-3611
|authors=Giedrius Stravinskas,Vytas Vadapolas,Arminas Pamakštis,Andrius Kriščiūnas,Ingrida Lagzdinytė-Budnikevičienė
|dblpUrl=https://dblp.org/rec/conf/ivus/StravinskasVPKL22
}}
==Building height prediction using neural networks based on Sentinel multi spectral images==
<pdf width="1500px">https://ceur-ws.org/Vol-3611/paper12.pdf</pdf>
<pre>
Building height prediction using neural networks based on
Sentinel multi spectral images
Giedrius Stravinskas1,*, Vytas Vadapolas1,*, Arminas Pamakštis1, Andrius Kriščiūnas1 and
Ingrida Lagzdinytė-Budnikė1
1 Kaunas University of Technology, Faculty of Informatics, Department of Applied Informatics


Abstract
Satellite imagery is a form of data that can be used for many applications, especially those focusing
on change over time. In this article, we analyze methods of detecting buildings and predicting their
height, as well as which key attributes are required for good predictions. Building detection and
height estimation are done using neural network algorithms such as convolutional neural networks.
Predictions are made based on additional data of the building and its area. In this paper, three
different building height estimation models are implemented. The research showed that a model using a
mixed dataset, which takes both Sentinel image patch data and numerical feature input of additional
building data, performs well even with lower quality images.

Keywords
Building height prediction, Sentinel images, artificial neural networks


IVUS 2022: 27th International Conference on Information Technology
* Corresponding author.
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

1. Introduction
Having a way of quickly predicting building height provides us with essential knowledge for
sustainable urban development and plays a vital role in the fields of urban, pollution transmission,
building energy consumption, and population estimation [1]. Essentially, building height information
is crucial for the comprehensive understanding of urban development [2].
   Determining urban development and its magnitude usually requires the aggregation of many criteria.
This is a difficult process due to the time it takes to access this information and the possible
changes that might happen while the data is being collected. Even with automated monitoring systems,
this creates linked data which is hard to handle [3]. In addition, some things that are not
documented or finished will not be collected, and this will make the resulting predictions less
accurate, because as data accuracy degrades, predictive accuracy goes down too [4]. Therefore,
approaching this problem requires data that is easier to get and reflects the current state
precisely. Up-to-date satellite images are a great source of current data that is freely available.
   There are many sources of satellite images, such as Lidar, Sentinel-1, Sentinel-2, InSar, ICESat
and others. All of these specialize in different areas and have different spectrums that can be used
for different tasks. For example, Lidar and InSar capture ground elevation/deformation and are great
for tasks that focus on nature/surface changes [5][6]. ICESat images measure ice sheet balance. The
focus is ice, and all-year measurements of clouds and aerosol distributions over land in Polar
Regions [7].
   Sentinel-1/Sentinel-2 images come from the Copernicus programme, which is a European initiative
for the implementation of information services dealing with the environment and security, based on
observation data received from Earth Observation (EO) satellites and ground-based information. The
Copernicus API provides access to satellite images obtained during the Sentinel missions, allowing
comparison of the same locations within different time frames. The Sentinel-2 satellite is equipped
with an opto-electronic multispectral sensor for surveying with a resolution of 10 to 60 m in the
visible, near infrared (VNIR), and short-wave infrared (SWIR) spectral zones, including 13 spectral
channels. This ensures the capture of differences in vegetation state, including temporal changes,
and minimizes impact on the quality of atmospheric photography. The orbit has an average height of
785 km, and the presence of two satellites in the mission allows repeated surveys every 5 days at
the equator and every 2-3 days at middle latitudes.
   An analysis of literature where satellite images are used has shown a variety of different use
cases, such as detecting specific crops and change in the soil structure [8]. Other examples include
continuous observation of ships moving on the sea surface [9] and retrieving significant wave
height [10].
   The focus of this work is on prediction of building height. There are already existing approaches
that help to solve similar problems. For example, shadows in combination with gradient formulas are
employed in high-resolution images for building height extraction [11], and salient objects are
identified and then their edges are found [12]. A multi-scene building height estimation


method uses shadow length calculation combined with fish net and the Pauta criterion [13]. Other
building detection works include using the U-Net model to assign semantic labels to each pixel as
building/non-building [14] or using 3D building models in conjunction with satellite imagery to
predict building height [15], as well as using deep convolutional neural networks (DCNN) for
semantic segmentation and applying filters [16].
   Convolutional neural networks (CNNs) could be one of the most promising options for building
detection and height prediction. CNN models have been proven to be good at extracting mid- and
high-level abstract feature representations from small raw images [17] for classification purposes,
by interleaving convolutional and pooling layers, i.e., spatially shrinking the feature maps layer
by layer. Recently proposed network architectures also allow for dense per-pixel predictions [18].
   Many of these models rely on high-resolution satellite imagery, the detail of which makes it
easier for the models to identify and detect extremely small elements [19][20]. The spatial
resolution of high-resolution satellite images is about 1 m/pixel [20]. The downside, however, is
that these high-resolution images are not taken very often. This creates the preconditions for
working with old data, which could potentially give a false picture of the current situation.
Therefore, lower quality satellite imagery is a more appropriate choice in this case.
   In this work, medium-resolution (10 m/pixel) Sentinel satellite imagery [21] has been used, which
makes it possible to see the development of housing and infrastructure. The authors of this study
performed experiments combining Sentinel imagery with additional geographic and time data in order
to achieve feasible building height estimation accuracy.
   The rest of the work is composed as follows: Chapter II describes the process of collecting
images of the buildings, their height and additional data, as well as the peculiarities of forming a
complete data set. Chapter III describes three different models that were used to estimate the
height of the buildings. Chapter IV provides a comparative analysis of the results of these models.
Chapter V provides summaries and insights from the study.


2. Dataset creation
For the sake of building variety, the area of interest was bound to Europe. This gives the ability
to have different types of buildings without covering the whole world. For the administrative
information (building location, height, area) the “Planet OSM” database was used, which contains
complete copies of the “OpenStreetMap” database. “OpenStreetMap” (OSM) is a collaborative project to
create a free editable geographic database of the world. For getting the building imagery data, the
“Copernicus Open Access Hub” application programming interface (API) was used [22]. This allows
getting satellite image patches with additional information (latitude, longitude, time of the year
and the day that the pictures were taken). To further simplify the process, the “Copernicus Open
Access Hub” API was used with “Python” programming language integration. This has facilitated the
download and processing of Sentinel-2 images. An example of a downloaded image can be seen below
(see Figure 1).

Figure 1: Sentinel-2 example satellite image of Barcelona

   Areas in Europe were selected by bounding the mentioned geographical zone in a two-point
rectangle (58°59’42.0"N 10°14’20.4"W x 36°57’00.0"N 36°25’51.6"E). An example can be seen below (see
Figure 2). According to these bounds, the “Planet OSM” data query was adapted to select only
buildings in the bounded area. Also, buildings that were less than 20 m in height were filtered out.

Figure 2: Bound geographical zone with “geojson.io”

   The “geojson.io” and building polygon (coming from the “Planet OSM” database) coordinate data was
used to download large Sentinel images of the region. Polygon
data from “OpenStreetMap” was iterated and patches of 32x32 px were cut out by centering each patch
on the center of the building polygon. This produced an image dataset consisting of 32x32 px image
patches with a building in the center. Also, metadata was taken from the image (latitude, longitude,
time of the year and the day the pictures were taken). Then the image patch data was concatenated
with the administrative building data. Each image of a building is accompanied by its height and
area in square meters (for the non-linear model). The dataset creation flowchart can be seen below
(see Figure 3).

Table 1
Final dataset distribution

Dataset      Image count   Unique building count
Train        5980          2887
Validation   2237          1352
Test         405           399
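
The patch cutting step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `polygon_center` and `patch_window` are hypothetical, the polygon is assumed to be already converted to pixel (row, column) coordinates, its center is taken as the mean of its vertices, and windows that would cross the image border are shifted inwards (the paper does not say how border cases were handled).

```python
def polygon_center(vertices):
    """Center of a polygon, taken here as the mean of its (row, col) vertices."""
    rows = [r for r, _ in vertices]
    cols = [c for _, c in vertices]
    return sum(rows) / len(rows), sum(cols) / len(cols)


def patch_window(center_row, center_col, height, width, size=32):
    """Return (top, left, bottom, right) of a size x size window centered on
    the given pixel, shifted inwards where it would cross the image border."""
    if height < size or width < size:
        raise ValueError("image is smaller than the patch")
    top = min(max(int(round(center_row)) - size // 2, 0), height - size)
    left = min(max(int(round(center_col)) - size // 2, 0), width - size)
    return top, left, top + size, left + size


# Hypothetical building footprint in pixel coordinates of a 100 x 120 image.
footprint = [(40, 50), (40, 60), (50, 60), (50, 50)]
row, col = polygon_center(footprint)
top, left, bottom, right = patch_window(row, col, height=100, width=120)
# With an array library the patch would then be image[top:bottom, left:right].
```

Returning window bounds rather than pixels keeps the sketch independent of any particular raster library.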


                                                              Figure 5: Data distribution on the map and removed buildings.
                                                              a) the initial dataset, where buildings are marked as blue
Figure 3: Dataset creation flowchart                          dots; b) the initial dataset, where blue dots are buildings
                                                              lower than 30 m and red dots are buildings that were
                                                              removed because they are taller than 30 m
   The created dataset contained 9988 images. To obtain a
more evenly distributed dataset, heights represented by
few images were removed. The 0.5 quantile of building
height was calculated to be 30 m, and therefore buildings
taller than 30 m were removed from the dataset                3. Modeling and validation
(see histogram in Figure 4).
                                                                 In general, building height may be calculated
                                                              analytically from Sentinel images if information such as
                                                              the Sentinel location in space, the sun position, and the
                                                              building and building shadow contours is well known.
                                                              Unfortunately, while the sun and Sentinel positions in
                                                              space may be extracted from image metadata, the shape of
                                                              the buildings in the image and their shadows are generally
                                                              unknown because of the relatively coarse resolution of
                                                              Sentinel images. In order to determine how well a model
                                                              can approximate building height from coarse-resolution
Figure 4: Data filtered under the 0.5 quantile (30 m).        images, and to validate that the model itself is able to
a) histogram of all dataset buildings, the red line marks     approximate building height from all theoretically
the 0.5 quantile; b) histogram of building counts only in     necessary data features, three types of models were
the range 21-30 m                                             implemented. Firstly, a baseline nonlinear model that
                                                              predicts building height from the building's area was
                                                              created. This gives a nonrandom initial
   After removing these buildings, the dataset contained      height prediction which does not depend on Sentinel
8622 images and was split into train, validation, and test    data. Secondly, a Convolutional Neural Network (CNN)
sets. The distribution of heights stayed the same after       with a DenseNet201 backbone was used to extract
the split. The train set contained 5980 images of 2887        complex features from Sentinel-2 satellite imagery in
unique buildings, the validation set 2237 images of 1352      order to find the relation between the building shape
unique buildings, and the test set 405 images of 399          shown in the coarse Sentinel image and its height.
unique buildings (see Table 1). The split was made so            Finally, a mixed data model has been created, which
that the same building does not appear in more than one       expands the second model by including such information
of the train, validation, or test sets.                       as latitude, longitude, day of the year and time of the day.
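
A building-wise split of the kind described above (all images of one building end up in exactly one subset) might look like the sketch below. The function name `split_by_building`, the record layout, the split fractions and the fixed seed are assumptions made for illustration; the paper does not state how its split was implemented.

```python
import random
from collections import defaultdict


def split_by_building(records, train_frac=0.7, val_frac=0.25, seed=0):
    """Split (building_id, image_id) records into train/validation/test so
    that every building appears in exactly one subset."""
    by_building = defaultdict(list)
    for building_id, image_id in records:
        by_building[building_id].append(image_id)

    # Shuffle buildings, not images, so images of one building stay together.
    building_ids = sorted(by_building)
    random.Random(seed).shuffle(building_ids)

    n_train = int(len(building_ids) * train_frac)
    n_val = int(len(building_ids) * val_frac)
    groups = {
        "train": building_ids[:n_train],
        "validation": building_ids[n_train:n_train + n_val],
        "test": building_ids[n_train + n_val:],
    }
    return {
        name: [(b, i) for b in ids for i in by_building[b]]
        for name, ids in groups.items()
    }


# Hypothetical data: 20 buildings with 3 images each.
records = [(b, k) for b in range(20) for k in range(3)]
parts = split_by_building(records)
```

Because the fractions apply to buildings rather than images, the image counts per subset only approximate the chosen fractions when buildings have different numbers of images.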
   Most of the buildings were taken from the southern         This information enables the model to approximate the sun
Europe area. Buildings taller than 30 m were removed from     position. Taking into account that the Sentinel position
the dataset and are marked in red (see Figure 5).             relative to the ground in the image is always similar, the
                                                              third model has all information necessary for building height
                                                              model with an error of 1.89 m. The nonlinear regression
                                                              model ranked between the two Sentinel-based models
                                                              with an error of 2.02 m (see Table 2). Here, the mean
                                                              absolute error (MAE) states by how many meters, on average,
                                                              a forecast differed from the actual height of the building.

                                                              Table 2
                                                              Building height prediction errors on the test set for logistic
                                                              regression with weighted classes, image patch CNN regression
                                                              and the mixed data prediction model

                                                              Model                                      MSE (m²)  MAE (m)  MAPE
                                                              Logistic regression with weighted classes  7.71      2.02     0.087
                                                              Image patch CNN based regression           9.11      2.31     0.098
Figure 6: Prediction model schemes                            Mixed data prediction model                6.2       1.89     0.079


detection. The schemes of the different models are shown         According to the models' predictions on the test set of
above (see Figure 6).                                         buildings in the 20-30 m range, it is visible that without
   The architectures of the neural network based models       any additional information (only what comes with the
are shown below (see Figure 7). On the left side (Model 2)    Sentinel imagery: latitude, longitude, time of the year
is the CNN model, which takes an image of 32x32 pixels        (0-365) and the time of day (0-24) at which the pictures
and returns a height prediction. On the right side            were taken) it is possible to predict building height with
(Model 3) is the mixed data model, which similarly takes      1.89 m MAE. This is on average 13 cm more precise than
a 32x32 pixel image but additionally takes the numerical      using the additional building area data, which may be
data latitude, longitude, day of the year and time of the     unavailable if a building is newly built or not yet
day (input3 of size 4) and concatenates the image and         registered. Also, the mixed data model is on average 42 cm
numerical data layers into a single layer that returns        more precise than the CNN image model.
the predicted height. Both models are trained to minimise
the mean squared error (MSE) loss function.
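
The two neural models and their shared loss can be written compactly as below. The notation is introduced here for illustration only and is not taken from the paper: g stands for the DenseNet201 feature extractor, f and f' for the regression layers on top, X for the 32x32 image patch with an unspecified number of channels C, and h_i for the true height of training example i.

```latex
% Model 2: CNN on the image patch only
\hat{h}^{(2)} = f\bigl(g(X)\bigr), \qquad X \in \mathbb{R}^{32 \times 32 \times C}

% Model 3: image features concatenated with the four numerical inputs
z = (\text{latitude},\ \text{longitude},\ \text{day of year},\ \text{time of day}) \in \mathbb{R}^{4}
\hat{h}^{(3)} = f'\bigl([\, g(X) \,;\, z \,]\bigr)

% Loss minimised by both models over N training images
\mathcal{L}_{\mathrm{MSE}} = \frac{1}{N} \sum_{i=1}^{N} \bigl(h_i - \hat{h}_i\bigr)^2
```

The only structural difference between the two models is the concatenation of z with the image features before the final regression layers.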


                                                              Figure 8: Error distribution of the models by number of
                                                              predictions according to error


Figure 7: Architectures of the neural network based models       As seen in the error distributions of the different
                                                              models in Figure 8, most of the predictions of the mixed
                                                              data model are scattered around 0 error. The two other
   To compare the accuracy of the models, three error         models have more widely scattered errors. This means
metrics were used, namely Mean Squared Error (MSE),           that the mixed data model has better accuracy, as it
Mean Absolute Error (MAE) and Mean Absolute Percentage        deviates less from the ground truth. It can also be seen
Error (MAPE). These metrics were chosen as the most           that the mixed data model tends to predict a lower
popular error metrics for regression [23].                    height than the building's actual height.
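
For reference, the three metrics named above can be computed as in the following sketch. The function name `regression_errors` and the example heights are illustrative assumptions, and MAPE is returned as a fraction because the values reported in Table 2 (for example 0.079) appear to be on that scale.

```python
def regression_errors(actual, predicted):
    """Return (MSE, MAE, MAPE) for two equal-length sequences of heights in meters."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    diffs = [p - a for a, p in zip(actual, predicted)]
    mse = sum(d * d for d in diffs) / n      # squared meters
    mae = sum(abs(d) for d in diffs) / n     # meters
    # MAPE as a fraction; undefined if any actual height is zero.
    mape = sum(abs(d) / abs(a) for a, d in zip(actual, diffs)) / n
    return mse, mae, mape


# Made-up heights for three buildings, only to show the call.
mse, mae, mape = regression_errors([20.0, 25.0, 30.0], [22.0, 24.0, 27.0])
```

MSE penalises large errors more strongly than MAE, which is one reason the ranking of models can differ between the two columns of a results table.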
   After training the implemented models, the results            In order to further investigate the performance of each
showed that when comparing MAE (which is measured in          model in predicting height, three buildings from Nice,
meters), the worst performing model is the Sentinel image     France were analyzed in detail (see Figure 9). Based on
patch CNN-based regression with an error of 2.31 m, and       the height prediction results for Building 1 (see Figure 9),
the best performing model is the mixed data prediction        it can be assumed that for larger buildings, such as sports
arenas, the nonlinear regression model provides a more accurate prediction than the other models,
which are based on image recognition. In the photo shown, the shadow angles the building, making it
appear larger. It is possible that for this reason the CNN model predicts a greater height for the
building. As seen in the prediction results for Building 2 (see Figure 9), the mixed data model is
the most accurate. This could be because the additional latitude and longitude data helps the model
to consider what other buildings are in the surrounding area, thus making the prediction more
accurate. For Building 3 (see Figure 9), the most accurate prediction is also made by the mixed data
model.

Figure 9: Image prediction test on three buildings in Nice (France). Each building is marked on the
map and has a ground truth height (GT) and model predictions

   This could be due to the fact that there is only one building clearly separated from other
buildings in the area with a visible shadow, and there are some surrounding buildings for mixed data
reference.
   To further check the correctness of the results, a more diverse dataset should be made. This
dataset should contain images of a larger range of heights, not only 20-30 m, with a similar
distribution of images between heights. It would also be wise to test how the models perform on
areas that have dense/sparse building distribution. A feature importance checking tool, such as
[24], can be used to determine whether part of a CNN model identifies a building shadow as an
important feature/property.
   For additional information we compared the Mixed data model to the IM2ELEVATION model used for
Building Height Estimation from Single-View Aerial Imagery [25]. The two models use different
datasets: the IM2ELEVATION model uses a multisensory fusion of aerial optical and aerial light
detection and ranging (Lidar) data to prepare the training data. Both models use a convolutional
neural network (CNN) architecture. The IM2ELEVATION model takes a single optical image as input and
produces an estimated DSM image as output. The IM2ELEVATION model achieved a mean absolute error of
1.46 while the mixed data prediction model achieved a mean absolute error of 1.89. The Mixed data
model, however, makes use of satellite photos of lesser quality than the aerial ones. The Mixed data
model outperforms the IM2ELEVATION model where buildings are sparsely distributed in the scene, as
it performs better when buildings are not as close together and have more distinct features.


4. Conclusion
The analysis of the literature showed that convolutional neural networks are one of the most
promising options for extracting mid- and high-level abstract feature representations from small
raw images and can be used effectively for building height estimation.
   A dataset of 8622 building images was created using images and metadata collected from Sentinel-2
of the Copernicus programme to train and test the models. The "OpenStreetMap" database was used to
add height and additional information for each building.
   During the modeling and validation phase, three different models were created. The non-linear
logistic regression model was trained using the building area as input data and was used as a
baseline for the other models. The CNN model with a DenseNet201 backbone was trained only on a
Sentinel image patch dataset containing additional data about the buildings. It performed worse than
the baseline model. The Mixed data model, consisting of a CNN DenseNet201 backbone feature
extractor, was trained on a dataset that takes both Sentinel image patch data and numerical feature
input of additional building data. It performed better than both the baseline model and the CNN
DenseNet201 backbone model that used satellite imagery alone.
   Comparing the performance of the trained models, it turned out that the mixed data prediction
model provides the best results. A mean absolute error of 1.89 m and a mean absolute percentage
error of 0.079 were achieved. The reason for the better performance could be that the additional
latitude and longitude data helps the model to consider what other buildings are in the surrounding
area and thus make the prediction more accurate.
   A limitation of the models is that they are trained mainly on data from the southern Europe area
and feature buildings only up to 30 meters. When estimating buildings from different regions, this
could lead to inconsistent results. The area was chosen due to the
abundance of data, while the height limit was decided              ary Satellite. Sensors 2021, 21, 7547. https://doi.org/
to ensure a more evenly distributed dataset, as smaller            10.3390/s21227547.
building data greatly outnumbers large building data.         [10] Xue, Sihan, et al. "Significant wave height retrieval
   Testing the model with low buildings showed that to             from Sentinel-1 SAR imagery by convolutional neu-
apply the same model to smaller buildings a higher-                ral network." Journal of Oceanography 76.6 (2020):
quality dataset is needed. To improve predictions, images          465-477.
of buildings that are not as close together are required, [11] Raju, P. L. N., Himani Chaudhary, and A. K. Jha.
as they have more distinct features.                               "SHADOW ANALYSIS TECHNIQUE FOR EXTRAC-
                                                                   TION OF BUILDING HEIGHT USING HIGH RES-
                                                                   OLUTION SATELLITE SINGLE IMAGE AND AC-
References                                                         CURACY ASSESSMENT." International Archives of
                                                                   the Photogrammetry, Remote Sensing and Spatial
 [1] Lívia Tomás, Leila Fonseca, Cláudia Almeida, Fer-
                                                                   Information Sciences (2014).
      nando Leonardi, Madalena Pereira (2016) Urban
                                                              [12] X. Cai, H. Sui, R.Lv, and Z. Song, “Automatic circular
      population estimation based on residential build-
                                                                   oil tank detection in high-resolution optical image
      ings volume using IKONOS-2 images and lidar data,
                                                                   based on visual saliency and Hough transform” In:
      International Journal of Remote Sensing, 37:sup1,
                                                                   Proc. of IEEE Workshop on Electronics, Computer
      1-28, DOI: 10.1080/01431161.2015.1121301.
                                                                   and Applications, pp.408-411, 2014.
 [2] Mahtta, Richa, Anjali Mahendra, and Karen C. Seto. "Building up or spreading out?
      Typologies of urban growth across 478 cities of 1 million+." Environmental Research
      Letters 14.12 (2019): 124077.
 [3] Lim, Chiehyeon, Kwang-Jae Kim, and Paul P. Maglio. "Smart cities with big data: Reference
      models, challenges, and considerations." Cities 82 (2018): 86-99.
 [4] Bansal, Arun, Robert J. Kauffman, and Rob R. Weitz. "Comparing the modeling performance of
      regression and neural networks as data quality varies: A business value approach."
      Journal of Management Information Systems 10.1 (1993): 11-32.
 [5] Cheng Wang and Nancy F. Glenn. Integrating LiDAR Intensity and Elevation Data for Terrain
      Characterization in a Forested Area. IEEE Geoscience and Remote Sensing Letters,
      August 2009.
 [6] Kang, Y.; Lu, Z.; Zhao, C.; Xu, Y.; Kim, J.-W.; Gallegos, A.J. InSAR monitoring of
      creeping landslides in mountainous regions: A case study in Eldorado National Forest,
      California. Remote Sensing of Environment 258 (2021): 112400.
 [7] Zwally, H.J.; Schutz, B.; Abdalati, W.; Abshire, J.; Bentley, C.; Brenner, A.; Bufton, J.;
      Dezio, J.; Hancock, D.; Harding, D.; et al. ICESat’s Laser Measurements of Polar Ice,
      Atmosphere, Ocean, and Land. J. Geodyn. 2002, 34, 405–445.
 [8] Lang, Nico, Konrad Schindler, and Jan Dirk Wegner. "Country-wide high-resolution
      vegetation height mapping with Sentinel-2." Remote Sensing of Environment 233 (2019):
      111347.
 [9] Yu, W.; You, H.; Lv, P.; Hu, Y.; Han, B. A Moving Ship Detection and Tracking Method Based
      on Optical Remote Sensing Images from the Geostation-
[13] Xie, Yakun, et al. "Multi-Scene Building Height Estimation Method Based on Shadow in High
      Resolution Imagery." Remote Sensing 13.15 (2021): 2862.
[14] Irwansyah, Edy, Heryadi, Yaya, Agung, Alexander. (2021). Semantic Image Segmentation for
      Building Detection in Urban Area with Aerial Photograph Image using U-Net Models.
      DOI: 10.1109/AGERS51788.2020.9452773.
[15] David Frantz, Franz Schug, Akpona Okujeni, Claudio Navacchi, Wolfgang Wagner, Sebastian
      van der Linden, Patrick Hostert. "National-scale mapping of building height using
      Sentinel-1 and Sentinel-2 time series." Remote Sensing of Environment 252 (2021): 112128.
[16] Niemeyer, Joachim, Franz Rottensteiner, and Uwe Soergel. "Contextual classification of
      lidar data and building object detection in urban areas." ISPRS journal of
      photogrammetry and remote sensing 87 (2014): 152-165.
[17] Zhou Feiyan, Jin Linpeng and Dong Jun, "Review of convolutional neural networks", Journal
      of computer science, vol. 40, no. 6, pp. 1229-1251, 2017.
[18] Bearman, Amy, et al. "What’s the point: Semantic segmentation with point supervision."
      European conference on computer vision. Springer, Cham, 2016.
[19] Tobler W. “Measuring Spatial Resolution” 1987.
      https://www.researchgate.net/publication/291877360 Measuring spatial resolution.
[20] IKONOS-2. https://earth.esa.int/web/eoportal/satellite-missions/i/ikonos-2.
[21] Resolution and swath.
      https://sentinel.esa.int/web/sentinel/missions/sentinel-2/instrument-payload/resolution-and-swath.
[22] Copernicus Open Access Hub. https://scihub.copernicus.eu/.
[23] Botchkarev A., “Performance Metrics (Error Measures) in Machine Learning Regression,
      Forecasting and Prognostics: Properties and Typology” 2018.
      https://arxiv.org/abs/1809.03006.
[24] Ribeiro M. T., Singh S., Guestrin C. “Why Should I Trust You?” Explaining the Predictions
      of Any Classifier. 2016. https://arxiv.org/pdf/1602.04938v1.pdf.
[25] Liu C-J, Krylov VA, Kane P, Kavanagh G, Dahyot R. IM2ELEVATION: Building Height Estimation
      from Single-View Aerial Imagery. Remote Sensing. 2020; 12(17):2719.
      https://doi.org/10.3390/rs12172719

</pre>