<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Improving the efficiency of damaged buildings detection based on ASPP technologies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Valerii Dymo</string-name>
          <email>dymovalery@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aleksandr Gozhyj</string-name>
          <email>alex.gozhyj@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Irina Kalinina</string-name>
          <email>irina.kalinina1612@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Petro Mohyla Black Sea National University</institution>
          ,
          <addr-line>St. 68 Desantnykiv 10, Mykolaiv, 54000</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The paper presents an approach to increasing the efficiency of detecting damaged buildings in satellite images by modifying the U-Net convolutional network model. Instead of the usual bottleneck, the use of atrous spatial pyramid pooling (ASPP) is proposed. As part of the study, the dataset was expanded to 100 images with dimensions of 512x512 pixels, and various augmentations were applied to increase the variability of the dataset, which contributed to more effective training on a limited dataset. Weighting coefficients were also added for each image in the dataset and used during training to address the predominance of pixels of one class over the others. Models of different configurations with an ASPP layer were built and compared with the base U-Net model without ASPP. Testing on the evaluation dataset showed an increase in the mean IoU of 5.39% over the classical architecture, a significant reduction in overall loss, and an increase in the mean IoU of about 2% on a separate testing dataset, indicating a corresponding increase in the model's efficiency. The proposed architecture can be used in further studies of segmentation of images of buildings damaged by hostilities.</p>
      </abstract>
      <kwd-group>
        <kwd>detection of damaged buildings</kwd>
        <kwd>semantic segmentation</kwd>
        <kwd>U-Net</kwd>
        <kwd>ASPP</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Recent studies [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1-3</xref>
        ] show that a large number of popular and effective neural network models for
detecting damage due to various disasters are built using convolutional networks, among which
YOLO, ResNet and U-Net stand out. At the same time, the application of such neural networks for
complex detection or segmentation tasks, such as the detection of damaged or destroyed
buildings for preliminary assessment of destruction, reveals existing difficulties: loss of accuracy
when there are many objects of different sizes, insufficient recognition of object edges, etc.
      </p>
      <p>Various approaches are used to improve efficiency. For example, weighting coefficients help to
train the model when the number of images or pixels belonging to each class differs. Another
approach is to increase the number of images through augmentation, or to change the parameter
settings of the model itself; this can affect the model's ability to take more features from the
images into account, but it does not solve problems such as the model's inability to “focus” on
specific regions of the image or to extract information at different scales, which requires the use
of more complex image processing methods or changes to the model architecture.</p>
      <p>
        The authors of the study [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] propose their own module Feature Pyramid Network, which can be
used in various convolutional networks to extract image features for both object detection and
segmentation. The module is built taking into account the separation of features “top-down”, which
allows detecting both low-level features and high-level ones.
      </p>
      <p>
        In studies [
        <xref ref-type="bibr" rid="ref5 ref6">5-6</xref>
        ], the use of spatial pyramid pooling was also proposed, which allows the use of
datasets of different dimensions without losing the accuracy and efficiency of object detection in
images. In turn, the authors of [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] propose atrous convolutions as a solution for expanding the
receptive field by using larger gaps between kernel elements, which allows capturing remote
features and more spatial information without increasing the number of parameters or reducing
resolution. Combining this with the pyramid pooling architecture, the authors proposed atrous spatial
pyramid pooling (hereinafter ASPP), which captures more information at different
scales by using parallel atrous convolutions with different rates.
      </p>
      <p>
        The authors applied the created ASPP technique in their own model DeepLabv3+ [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which
extends the previous model with an encoder-decoder architecture. This made it possible to improve the
segmentation of objects with complex structure and different scales in images.
      </p>
      <p>
        The authors of the following studies [
        <xref ref-type="bibr" rid="ref10 ref8 ref9">8-10</xref>
        ] applied the general principle of operation of ASPP to
solve various tasks related to remote recognition of objects on the ground. Thus, the work [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] aims
to build a horizontal U-shaped GAN model for segmentation of building images using an
intermediate ASPP module, which improves the localization and detection of buildings of
different sizes.
      </p>
      <p>
        In the study [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], a modified ASPP architecture is proposed using an additional feature extraction
channel and changing the design of the dilation rate, as well as introducing an attention
coordination mechanism. Despite the improvements obtained, the authors note the possibility of
some limitations of the model regarding the influence of shadows on the accuracy of building and
vegetation segmentation, which can be solved by more accurate color distribution, detection of
object edges, etc.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], a new Feature Residual Analysis Network model is presented, which offers a balance
between sufficient accuracy and speed of feature extraction, using Feature Pyramid Pooling
inspired by the corresponding ASPP module.
      </p>
      <p>
        In turn, although the study [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] does not address the topic of building detection in images, the
authors propose the application of U-Net using ASPP for segmentation of brain tumor images, and
note the corresponding improvements in accuracy and reduction of losses relative to the basic
U-Net architecture.
      </p>
      <p>Problem statement. The purpose of this paper is to improve the efficiency of the U-Net
convolutional network model by implementing an ASPP layer instead of a bottleneck to increase
the accuracy of segmentation of buildings damaged as a result of hostilities in satellite images.</p>
    </sec>
    <sec id="sec-2">
      <title>2. U-Net convolutional network architecture with atrous spatial pyramid pooling layer</title>
      <p>
        In the framework of a previous study [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], a model with the classical U-Net architecture was built,
originally developed by Olaf Ronneberger, Philipp Fischer, and Thomas Brox [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], which is based on a
fully convolutional neural network [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] with appropriate changes in the structure. The model was
used for semantic segmentation of buildings damaged as a result of hostilities in satellite images,
had 64 filters, and was trained on a smaller dataset (50 images with dimensions of 256 by 256
pixels) for 25 epochs. As a result, it was possible to achieve an overall accuracy of 84.21%, as well as
corresponding IoU indicators of 45.83% for damaged and 49.14% for intact buildings on the
evaluation dataset.
      </p>
      <p>At the same time, the constructed model had difficulty identifying buildings of different sizes
and shapes; in some images, homestead plots (which are common in the private sector of Ukrainian
cities and towns) were identified as buildings. It is also worth considering the peculiarities of
damage caused by hostilities, which differs from damage caused by natural phenomena such as
hurricanes or floods. There was therefore a need to improve the segmentation capabilities of the
model by changing its architecture or by using other processing methods.</p>
      <p>Figure 1 shows the architecture of the model for segmenting damaged buildings studied in a
previous work.</p>
      <p>The classic U-Net architecture consists of two main parts – Contracting and Expanding paths,
which are necessary for appropriate feature selection and image segmentation, which in turn is
similar to the principle of operation of encoder-decoder models.</p>
      <p>In addition, the model has various components, such as: Convolutional Block (a sequence of 3x3
convolution layers and activation functions), Encoder Block (a component of Contracting Path,
which uses Convolutional Block, 2x2 pooling layer and Dropout), Decoder Block (an analogue of
Encoder Block, which uses upsampling layers, concatenation of previous layers and corresponding
Dropout), as well as Bottleneck, which is the narrowest point of the model with the lowest image
resolution but the largest number of channels.</p>
      <p>In this study, we propose the use of Atrous Spatial Pyramid Pooling to improve the
model by applying multiple parallel convolutions with different dilation rates, which allows the
necessary features of objects in the image to be selected at different scales. Using the ASPP
module instead of the bottleneck reduces the computational complexity of the network and
improves image segmentation through the use of atrous convolutions.</p>
      <p>Deep convolutional neural networks are typically applied successfully to most segmentation
tasks due to their fully convolutional nature, although the constant repetition of max-pooling and
striding operations significantly reduces spatial resolution and creates many parameters to
compute, which complicates the model and increases the resources and time required for training.
Deconvolution layers can solve some of these problems, but require even more memory and time.</p>
      <p>
        Figure 2 shows an example of a visual representation of the atrous convolutions kernel (in other
works [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] a similar principle is also given as dilated convolutions) with different parameters of the
distance between the filters.
      </p>
      <p>
        In turn, the use of atrous convolution, proposed by the authors [
        <xref ref-type="bibr" rid="ref15 ref6">6, 15</xref>
        ] allows the output of a layer to be
computed at any resolution, and can be applied either after training the network or built into it
during training. As can be seen in the figure, whereas in a usual convolution the filter has a fixed
size and slides over the input feature map, multiplying values to calculate the output, in atrous
convolution the filter has “gaps”. The filter thus becomes larger and covers more of the receptive
field, but this does not increase the number of parameters to compute, since only the non-zero
filter values between the gaps enter the calculation.
      </p>
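The "gaps" idea can be illustrated with a small numpy sketch; the `dilate_kernel` helper is hypothetical, written here only to show how the receptive field grows while the parameter count stays fixed:

```python
import numpy as np

def dilate_kernel(kernel, rate):
    """Insert (rate - 1) zero gaps between kernel elements (an 'atrous' filter).

    The number of non-zero parameters stays the same; only the area the
    filter covers (its receptive field) grows.
    """
    k = kernel.shape[0]
    size = k + (k - 1) * (rate - 1)          # effective filter size
    out = np.zeros((size, size), dtype=kernel.dtype)
    out[::rate, ::rate] = kernel             # place weights, leave gaps as zeros
    return out

kernel = np.ones((3, 3))
for rate in (1, 2, 4):
    d = dilate_kernel(kernel, rate)
    print(rate, d.shape, int(np.count_nonzero(d)))
# rate 1 -> 3x3 field, rate 2 -> 5x5, rate 4 -> 9x9; always 9 parameters
```

In a framework such as Keras this corresponds to the `dilation_rate` argument of a convolution layer rather than to explicitly zero-padded kernels.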
      <p>
        Spatial Pyramid Pooling was used in the original work [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] in R-CNN to eliminate the need to
train a neural network on fixed-size images by using resampling of convolutional features at the
same scale, which inspired the authors to create a modified version – Atrous Spatial Pyramid
Pooling. ASPP uses multiple parallel convolutional layers with atrous convolutions implemented at
different sampling rates. ASPP extracts features for each sampling rate, which are processed in
separate layers, and then combined into one to obtain the final result [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>As can be seen, applying several filters with different dilation rates in parallel significantly
reduces the number of operations required (by reducing the parameters) while giving better
results in detecting both low-level and high-level features.</p>
      <p>The proposed ASPP module for modifying U-Net is shown in Figure 3.</p>
      <p>The data from the Contracting Path is fed into the ASPP module, which applies in parallel a 1x1
convolution, 3x3 convolutions with appropriate filter spacing (atrous convolutions), and a pooling
branch, and then concatenates all of these layers into one. At the end of the module, a 1x1
convolution is performed and the processed data is fed into the Expanding Path. This allows the
model to receive information at multiple scales, which in turn can improve the model's
ability to detect the shape of an object in an image more accurately, which is important when
segmenting damaged buildings.</p>
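The module described above can be sketched in Keras (the library used later in the paper). The dilation rates (6, 12, 18) and the 256 filters are assumptions borrowed from the DeepLab defaults, not the configurations from Table 1:

```python
import tensorflow as tf
from tensorflow.keras import layers

def aspp_block(x, filters=256, rates=(6, 12, 18)):
    """Sketch of an ASPP bottleneck: parallel branches, concatenated and projected."""
    # 1x1 convolution branch
    branches = [layers.Conv2D(filters, 1, padding="same", activation="relu")(x)]
    # parallel 3x3 atrous convolutions with different dilation rates
    for r in rates:
        branches.append(layers.Conv2D(filters, 3, padding="same",
                                      dilation_rate=r, activation="relu")(x))
    # image-level pooling branch, upsampled back to the feature-map size
    h, w = int(x.shape[1]), int(x.shape[2])
    pool = layers.GlobalAveragePooling2D(keepdims=True)(x)
    pool = layers.Conv2D(filters, 1, activation="relu")(pool)
    pool = layers.UpSampling2D(size=(h, w), interpolation="bilinear")(pool)
    branches.append(pool)
    # concatenate all branches and project with a final 1x1 convolution
    merged = layers.Concatenate()(branches)
    return layers.Conv2D(filters, 1, padding="same", activation="relu")(merged)

inp = layers.Input((32, 32, 512))   # output of the Contracting Path
out = aspp_block(inp)               # fed into the Expanding Path
model = tf.keras.Model(inp, out)
print(model.output_shape)           # (None, 32, 32, 256)
```

Spatial resolution is preserved through every branch, so the module can be dropped in wherever the plain bottleneck convolutions previously sat.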
    </sec>
    <sec id="sec-3">
      <title>3. Dataset, pre-processing and implementation of augmentations</title>
      <p>The model training set was created using Google Earth. Images of the private sector of the city of
Mariupol were used, taken around May 2022. The dataset contains 100 satellite images
with dimensions of 512 by 512 pixels, covering more than 2,500 unique instances of
buildings, as well as 100 segmentation masks, one for each image (Figure 4).</p>
      <p>As can be seen, each image contains dozens of buildings of different shapes and scales, as well
as other objects that pose significant obstacles to detection, such as roads, trees, small architectural
forms, etc. This study considers the segmentation of damaged and intact buildings, without
segmenting more complex and diverse objects to simplify the task.</p>
      <p>For the segmentation task, three classes were defined: background (pixels that do not
belong to buildings), normal (pixels belonging to buildings with no visible damage), and
damaged (damaged or destroyed buildings). It is worth noting that this approach does not make it
possible to determine the nature of damage inside a building; therefore, to improve model
performance and classify pixels correctly, some probably damaged buildings were assigned to the
normal class when significant combat damage could not be visually confirmed, or when such
damage was not significant in the context of this study.</p>
      <p>
        An example of annotated images is shown in Figure 5. Black pixels indicate background, while
green and red pixels indicate normal and damaged classes, respectively. The free to use Labelme
software was used in the annotation process [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>
        Image preprocessing included sharpening (reducing blur), normalization, and dimensionality
rescaling as needed (if the model input is smaller than the original image size). Augmentation was
also applied using the Albumentations library [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], which increased the variability and ability of
the model to learn from a limited dataset.
      </p>
      <p>Figure 6 shows an example of the applied augmentations. The following augmentation
configurations were used in the process.</p>
      <p>shift_limit=0.3, scale_limit=0.1, rotate_limit=270;
shift_limit=0.2, scale_limit=0.05, rotate_limit=90, blur_limit=3;
shift_limit=0.2, scale_limit=0.15, rotate_limit=180.</p>
      <p>The final step was to create a function that generates weights for the image samples, applied to
each feature in the dataset (including the corresponding segmentation masks). This approach makes
it possible to use a weight for each pixel of the corresponding class in order to balance the
classes present in the dataset.</p>
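The paper does not give the exact weighting formula; inverse-frequency balancing, shown in this sketch, is one common assumption for deriving per-pixel weight maps from the masks:

```python
import numpy as np

def sample_weights(masks, num_classes=3):
    """Per-pixel weights inversely proportional to class frequency.

    masks: integer array of shape (N, H, W) with values 0..num_classes-1.
    The balancing scheme here is an assumption, not the paper's formula.
    """
    counts = np.bincount(masks.ravel(), minlength=num_classes).astype(float)
    # Rare classes (e.g. damaged buildings) receive large weights,
    # the dominant background class a small one.
    class_w = counts.sum() / (num_classes * np.maximum(counts, 1.0))
    return class_w[masks]  # weight map with the same shape as masks

masks = np.random.randint(0, 3, (10, 256, 256))
w = sample_weights(masks)
print(w.shape)  # (10, 256, 256)
```

In Keras such a weight map can be passed to `model.fit(..., sample_weight=w)` so that the loss of under-represented classes is scaled up during training.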
    </sec>
    <sec id="sec-4">
      <title>4. Metrics and functions</title>
      <p>The U-Net architecture involves the use of a Softmax function (also known as a normalized
exponential function) in the last layer, which allows the original data to be transformed into a
probability distribution. In this case, this means that each pixel will be assigned a corresponding
probability value for belonging to each class, and the total sum will be equal to one, after which the
functions of obtaining the largest value can be applied to finally determine the most likely class to
which a particular pixel belongs.</p>
      <p>The cost function used is the categorical cross entropy (CCE), which is often used in semantic
segmentation problems. CCE calculates the “distance” between the actual class distribution and the
corresponding prediction; a lower score indicates a greater degree of agreement between the
predicted value and reality. Since CCE calculates the score based on probability distributions,
Softmax is used before applying the loss function.</p>
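The per-pixel Softmax and CCE described above can be sketched in numpy (illustrative shapes; a real model would use the framework's built-in loss):

```python
import numpy as np

def softmax(logits):
    # Softmax over the class axis (last): exp(x_c) / sum_j exp(x_j)
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))  # numerically stable
    return e / e.sum(axis=-1, keepdims=True)

def categorical_cross_entropy(probs, ground_truth):
    # ground_truth is one-hot per pixel; a lower score means better agreement
    return -np.sum(ground_truth * np.log(probs + 1e-9), axis=-1)

logits = np.random.randn(2, 4, 4, 3)        # batch, H, W, 3 classes
probs = softmax(logits)
print(np.allclose(probs.sum(axis=-1), 1.0)) # True: a distribution per pixel

one_hot = np.eye(3)[np.random.randint(0, 3, (2, 4, 4))]
loss = categorical_cross_entropy(probs, one_hot)
print(loss.shape)                           # (2, 4, 4): one loss value per pixel
```

Taking `argmax` over the last axis of `probs` then yields the most likely class for each pixel, as described above.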
      <p>
        Formulas 1 and 2 reflect the corresponding Softmax and CCE functions [
        <xref ref-type="bibr" rid="ref18">18-20</xref>
        ]:
      </p>
      <p>Softmax(image)_c = exp(image_c) / Σ_{j=1..k} exp(image_j), (1)</p>
      <p>where c is the current class in the range from 1 to k, j is the iteration index from 1 to k, image
is the output data (image), and exp is the exponential of the corresponding set of values.</p>
      <p>CCE(image, ground_truth) = −Σ_{c=1..k} ground_truth_c · log(Softmax(image)_c), (2)</p>
      <p>where c is the current class in the range from 1 to k, image is the original data (image), and
ground_truth is the segmentation mask for the corresponding image.</p>
      <p>Since the work considers semantic image segmentation, Intersection over Union (IoU) was
chosen as the metric for assessing the effectiveness of the model; it measures the overlap between
the predicted pixels and the corresponding segmentation map. During training and testing, IoU is
calculated for the two main classes, intact and damaged buildings, as well as the average IoU over
all classes, which gives a more complete picture of the model's performance [21].</p>
      <p>Formulas 3 and 4 for IoU are given below [22-24]:</p>
      <p>IoU_c = TP_c / (TP_c + FP_c + FN_c), (3)</p>
      <p>where c is the current class for calculation, TP_c is the number of ‘true positive’ pixels, FP_c
the number of ‘false positive’ pixels, and FN_c the number of ‘false negative’ pixels.</p>
      <p>mean IoU = (1 / Cmax) · Σ_{c=1..Cmax} IoU_c, (4)</p>
      <p>where c is the current class to calculate, Cmax is the total number of classes, and IoU_c is the
calculated IoU value for class c.</p>
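Formulas 3 and 4 can be sketched in numpy on illustrative 2x2 label maps:

```python
import numpy as np

def iou_per_class(pred, truth, num_classes=3):
    # IoU_c = TP_c / (TP_c + FP_c + FN_c), computed from integer label maps
    ious = []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (truth == c))
        fp = np.sum((pred == c) & (truth != c))
        fn = np.sum((pred != c) & (truth == c))
        denom = tp + fp + fn
        ious.append(tp / denom if denom else 0.0)
    return ious

pred  = np.array([[0, 1], [2, 2]])
truth = np.array([[0, 1], [2, 1]])
ious = iou_per_class(pred, truth)
print(ious)           # [1.0, 0.5, 0.5]
print(np.mean(ious))  # mean IoU over all classes (formula 4)
```

A class absent from both prediction and ground truth is scored 0.0 here; whether such classes are skipped or zeroed when averaging is a design choice not specified in the text.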
      <p>In turn, the use of weights allows the model to focus on more important classes, for example, on
building classes instead of the background. The corresponding function values are multiplied by the
weights, giving an adjusted result: the loss function takes the different importance of the classes
into account by increasing the loss for the targeted classes. The selection of weights is a rather
complex task and may require additional research.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Models training and testing. Comparative analysis</title>
      <p>The model building and training was performed in Python using the Tensorflow and Keras
libraries [25], as well as the Google Colab cloud machine learning service [26] using an NVIDIA T4
GPU.</p>
      <p>At the beginning of the study, two baseline models were compared: U-Net with the conventional
architecture and U-Net with ASPP as the bottleneck. Training was performed on 256 by 256 pixel
images due to the memory limitations of the GPU. Appropriate augmentations were applied to the
dataset, increasing its volume by a factor of five; 70% of the images were used for model training,
15% for model evaluation, and 15% as test images that were not used in training or evaluation,
which is necessary to obtain more objective results regarding the quality of the model.</p>
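The 70/15/15 split described above can be sketched as follows (random shuffling and the seed are assumptions; the paper does not state how samples were assigned):

```python
import numpy as np

def split_dataset(n, seed=42):
    """Split n sample indices into 70% train, 15% eval, 15% test."""
    rng = np.random.default_rng(seed)     # seed is illustrative
    idx = rng.permutation(n)
    n_train, n_eval = int(0.70 * n), int(0.15 * n)
    return (idx[:n_train],
            idx[n_train:n_train + n_eval],
            idx[n_train + n_eval:])

# 100 original images x 5 (augmentation) = 500 samples
train_idx, eval_idx, test_idx = split_dataset(500)
print(len(train_idx), len(eval_idx), len(test_idx))  # 350 75 75
```

Keeping the test indices out of both training and evaluation is what makes the later test-set figures an unbiased estimate of model quality.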
      <p>Figure 7 shows charts with the results of training and evaluation of the baseline model using
ASPP as a bottleneck. Overall, this model shows results comparable to the model without ASPP,
but has lower losses and a higher average IoU.</p>
      <p>Figure 8 shows some segmented images from the testing set. These images were not included in
the model training, so the results represent a realistic assessment of the model’s performance on
unique data.</p>
      <p>A potentially positive trend in classifying pixels into the corresponding classes can be noted,
although the model still lacks data on some objects, for example, adjacent plots. Since the task is
to segment two main classes, damaged and intact buildings, the model has difficulty recognizing
objects with “transitional” features between these two classes, as well as environments that share
features with buildings, as in the case of construction debris or completely destroyed buildings.
An acceptable solution may be to expand the problem with the segmentation of intermediate classes
that characterize the degree of damage.</p>
      <p>As a result, two basic U-Net models (classical and with ASPP) were compared, as well as the
three best models with other parameters. Table 1 shows the basic configuration of each model that
was tested in the study.</p>
      <p>(Table 1 columns: Model, Weights, Augmentations, Epochs, ASPP config; rows: base model
without ASPP, base model with ASPP, Models 1-3.)</p>
      <sec id="sec-5-10">
        <title>Results of the comparative analysis</title>
        <p>The results of the comparative analysis are given in Table 2.</p>
        <p>An improvement in the main metrics, such as loss, accuracy and mean IoU, can be noted for
the baseline model using ASPP. At the same time, the IoU values for damaged and intact buildings
are lower. This can be explained by the model reducing the probable area in which buildings are
classified “above” the background class, which partially reduced the per-class IoU but increased
the overall average IoU. As a result, the baseline model with ASPP has a 0.2364 lower loss on the
evaluation set, a 0.1 higher accuracy, and a 0.0539 higher average IoU than the classical model,
which indicates the positive impact of ASPP on the segmentation of damaged buildings.</p>
        <p>Model 3 shows the next best performance, with increased IoU values for damaged and intact
buildings and an average IoU of 0.541, close to that of the baseline model with ASPP, which may
indicate a positive impact of reducing the atrous rate in the context of the task.</p>
        <p>It is worth noting that Model 1 showed the best results among all models at 23 epochs on the
evaluation dataset: 0.4583 IoU for damaged, 0.5147 IoU for intact, with a 0.6072 average IoU value
among the three classes, making this model potentially the best among those built.</p>
        <p>Despite the improvements, the model still requires addressing other inaccuracies, such as the
possibility of confusing objects not related to buildings (running tracks, sports fields, the pixels of
which may coincide with pixels that are characteristic of buildings), or the inability to distinguish
completely destroyed objects from rubble (Figure 9).</p>
        <p>This can be addressed in several ways, including using a larger dataset, expanding the classes
for segmentation (e.g., adding the degree of damage to buildings), or changing the approach to data
annotation, as the impact of annotation on the model's accuracy in detecting damage to buildings
and the environment due to combat operations requires more detailed investigation, which will be
performed in future studies.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>The paper considered improving the efficiency of the U-Net convolutional network model using
atrous spatial pyramid pooling to increase the accuracy of segmentation of damaged buildings. The
implemented ASPP module was used in place of the bottleneck, which made it possible not only to
cover more image features at different levels, but also to reduce the computing power required for
training.</p>
      <p>For the study, the dataset was expanded to 100 original images, and various augmentations were
applied to increase the variability of the dataset, which generally positively affected the model's
ability to train on a limited number of images. Weights were also generated for each image, which
gave more weight to pixels in specific classes to balance out the data that predominated.</p>
      <p>As part of the study, a baseline model with ASPP was built, and the three best models with
different parameters were selected. It was determined that the implementation of ASPP has a
positive effect on training efficiency and the quality of the final results. According to the
comparative analysis, the average IoU value on the evaluation dataset increased by 5.39% for the
baseline model using ASPP. At the same time, Model 3 has higher IoU values for both classes of
buildings, which may indicate a positive effect of reducing the distance between the filters. It is
also worth noting the evaluation results of Model 1, which achieved the best results among all
models at 23 epochs: 45.83% and 51.47% IoU for damaged and intact buildings, respectively, as well
as an average IoU of 60.72%, which makes this model potentially the best among those built.</p>
      <p>The positive impact of ASPP application allows using this module in further studies aimed at
reducing the impact of other features of segmentation of damaged buildings as a result of
hostilities, such as the difficulties of detecting destroyed buildings from rubble and of accurately
determining the shape of damaged buildings and others.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
      <p>[19] A. Gozhyj, V. Nechakhin, I. Kalinina, Solar Power Control System based on Machine
Learning Methods, in: 2020 IEEE 15th International Conference on Computer Sciences and
Information Technologies (CSIT), 23-26 September 2020. doi:10.1109/CSIT49958.2020.9321953.</p>
      <p>[20] S. Babichev, B. Durnyak, V. Zhydetskyy, I. Pikh, V. Senkivskyy, Techniques of DNA
Microarray Data Pre-processing Based on the Complex Use of Bioconductor Tools and Shannon
Entropy, CEUR Workshop Proceedings 2353, Zaporizhzhia (2019) 365-377.
URL: https://ceur-ws.org/Vol-2353/paper29.pdf.</p>
      <p>[21] P. Bidyuk, A. Gozhyj, Z. Szymanski, I. Kalinina, V. Beglytsia, The Methods of Bayesian
Analysis of the Threshold Stochastic Volatility Model, in: 2018 IEEE 2nd International Conference
on Data Stream Mining and Processing (DSMP), Lviv (2018) 70-74. doi:10.1109/DSMP.2018.8478474.</p>
      <p>[22] A. A. Taha, A. Hanbury, Metrics for evaluating 3D medical image segmentation: analysis,
selection, and tool, BMC Med. Imaging 15 (2015) 29. doi:10.1186/s12880-015-0068-x.</p>
      <p>[23] V. Andrunyk, A. Vasevych, L. Chyrun, N. Chernovol, N. Antonyuk, A. Gozhyj, V. Gozhyj,
I. Kalinina, M. Korobchynskyi, Development of Information System for Aggregation and Ranking of
News Taking into Account the User Needs (2020). URL: https://ceur-ws.org/Vol-2604/paper74.pdf.</p>
      <p>[24] V. Senkivskyy, I. Pikh, N. Senkivska, I. Hileta, O. Lytovchenko, Y. Petyak, Forecasting
Assessment of Printing Process Quality, Journal of Graphic Engineering and Design 11(1) (2020)
27-35. doi:10.1007/978-3-030-54215-3_30.</p>
      <p>[25] M. Abadi, A. Agarwal, et al., TensorFlow: Large-scale machine learning on heterogeneous
distributed systems, 2016. arXiv:1603.04467.</p>
      <p>[26] Colaboratory, Google, 2024. URL: https://research.google.com/colaboratory/faq.html.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <article-title>Advances in rapid damage identification methods for postdisaster regional buildings based on remote sensing images: A survey</article-title>
          ,
          <source>Buildings</source>
          <volume>14</volume>
          (
          <year>2024</year>
          )
          <fpage>898</fpage>
          . doi:10.3390/buildings14040898.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C. L. Moreno</given-names>
            <surname>González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Montoya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lozano Garzón</surname>
          </string-name>
          ,
          <article-title>Toward reliable post-disaster assessment: Advancing building damage detection using You Only Look Once convolutional neural network and satellite imagery</article-title>
          ,
          <source>Mathematics</source>
          <volume>13</volume>
          (
          <year>2025</year>
          )
          <fpage>1041</fpage>
          . doi:10.3390/math13071041.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><given-names>J.</given-names> <surname>Liu</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Luo</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Wu</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Wang</surname></string-name>,
          <article-title>BDHE-Net: A novel building damage heterogeneity enhancement network for accurate and efficient post-earthquake assessment using aerial and remote sensing data</article-title>,
          <source>Appl. Sci.</source> <volume>14</volume> (<year>2024</year>) 3964.
          doi:10.3390/app14103964.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>T.-Y.</given-names> <surname>Lin</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Dollár</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Girshick</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>He</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Hariharan</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Belongie</surname></string-name>,
          <article-title>Feature pyramid networks for object detection</article-title>,
          in: <source>Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR 2017)</source>, Honolulu, HI, <year>2017</year>, pp. <fpage>936</fpage>-<lpage>944</lpage>.
          doi:10.1109/CVPR.2017.106.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><given-names>K.</given-names> <surname>He</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Ren</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Sun</surname></string-name>,
          <article-title>Spatial pyramid pooling in deep convolutional networks for visual recognition</article-title>,
          <source>IEEE Trans. Pattern Anal. Mach. Intell.</source> <volume>37</volume> (<year>2015</year>) <fpage>1904</fpage>-<lpage>1916</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><given-names>L.-C.</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Papandreou</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Kokkinos</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Murphy</surname></string-name>,
          <string-name><given-names>A. L.</given-names> <surname>Yuille</surname></string-name>,
          <article-title>DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs</article-title>,
          <source>IEEE Trans. Pattern Anal. Mach. Intell.</source> <volume>40</volume> (<year>2018</year>) <fpage>834</fpage>-<lpage>848</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><given-names>L.-C.</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Zhu</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Papandreou</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Schroff</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Adam</surname></string-name>,
          <article-title>Encoder-decoder with atrous separable convolution for semantic image segmentation</article-title>,
          <source>CoRR</source> (<year>2018</year>).
          doi:10.48550/arXiv.1802.02611.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name><given-names>M.</given-names> <surname>Yu</surname></string-name>,
          <string-name><given-names>W.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Liu</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Niu</surname></string-name>,
          <article-title>An end-to-end atrous spatial pyramid pooling and skip-connections generative adversarial segmentation network for building extraction from high-resolution aerial images</article-title>,
          <source>Appl. Sci.</source> <volume>12</volume> (<year>2022</year>) 5151.
          doi:10.3390/app12105151.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name><given-names>L.</given-names> <surname>Hu</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Zhou</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Ruan</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Li</surname></string-name>,
          <article-title>ASPP+-LANet: A multi-scale context extraction network for semantic segmentation of high-resolution remote sensing images</article-title>,
          <source>Remote Sens.</source> <volume>16</volume> (<year>2024</year>) 1036.
          doi:10.3390/rs16061036.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name><given-names>Y.</given-names> <surname>Miao</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Jiang</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Xu</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Wang</surname></string-name>,
          <article-title>Feature residual analysis network for building extraction from remote sensing images</article-title>,
          <source>Appl. Sci.</source> <volume>12</volume> (<year>2022</year>) 5095.
          doi:10.3390/app12105095.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name><given-names>Z.</given-names> <surname>Wang</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Wang</surname></string-name>,
          <string-name><given-names>Q.</given-names> <surname>Bao</surname></string-name>,
          <article-title>Improved Unet model for brain tumor image segmentation based on ASPP-coordinate attention mechanism</article-title>,
          in: <source>Proc. 5th Int. Conf. Big Data Artif. Intell. Softw. Eng. (ICBASE 2024)</source>, Wenzhou, China, <year>2024</year>, pp. <fpage>393</fpage>-<lpage>397</lpage>.
          doi:10.48550/arXiv.2409.08588.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name><given-names>A.</given-names> <surname>Gozhyj</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Kalinina</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Dymo</surname></string-name>,
          <article-title>Application of convolutional neural networks for detection of damaged buildings</article-title>,
          <source>CEUR-WS</source> <volume>3711</volume> (<year>2024</year>) <fpage>15</fpage>-<lpage>27</lpage>.
          URL: http://CEUR-WS.org/Vol-3711/paper2.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name><given-names>O.</given-names> <surname>Ronneberger</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Fischer</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Brox</surname></string-name>,
          <article-title>U-Net: Convolutional networks for biomedical image segmentation</article-title>,
          <year>2015</year>.
          URL: https://arxiv.org/pdf/1505.04597.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name><given-names>J.</given-names> <surname>Long</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Shelhamer</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Darrell</surname></string-name>,
          <article-title>Fully convolutional networks for semantic segmentation</article-title>,
          in: <source>Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR 2015)</source>, <year>2015</year>.
          doi:10.48550/arXiv.1411.4038.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name><given-names>F.</given-names> <surname>Yu</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Koltun</surname></string-name>,
          <article-title>Multi-scale context aggregation by dilated convolutions</article-title>,
          in: <source>Proc. Int. Conf. Learn. Representations (ICLR 2016)</source>, <year>2016</year>.
          doi:10.48550/arXiv.1511.07122.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          Labelme,
          <source>Image polygonal annotation with Python</source>,
          <year>2024</year>.
          URL: https://github.com/labelmeai/labelme.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name><given-names>A.</given-names> <surname>Buslaev</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Parinov</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Khvedchenya</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Iglovikov</surname></string-name>,
          <article-title>Albumentations: fast and flexible image augmentations</article-title>,
          <year>2018</year>.
          URL: https://arxiv.org/abs/1809.06839.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name><given-names>C. M.</given-names> <surname>Bishop</surname></string-name>,
          <source>Pattern Recognition and Machine Learning</source>,
          Springer, New York, NY, <year>2006</year>.
          ISBN: 0-387-31073-8.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>