=Paper=
{{Paper
|id=Vol-3101/Paper16
|storemode=property
|title=Application of semantic segmentation of clouds of points for preservation of cultural heritage
|pdfUrl=https://ceur-ws.org/Vol-3101/Paper16.pdf
|volume=Vol-3101
|authors=Nataliya Boyko,Mariia Rizhko
|dblpUrl=https://dblp.org/rec/conf/citrisk/BoykoR21
}}
==Application of semantic segmentation of clouds of points for preservation of cultural heritage==
Nataliya Boyko1 and Mariia Rizhko1
1Lviv Polytechnic National University, Profesorska Street 1, Lviv, 79013, Ukraine
Abstract
Artificial intelligence is evolving and emerging in many new areas, but a literature analysis suggests that
3D and AI-based technologies for monitoring cultural heritage have not been studied enough. Cultural
heritage requires continuous, detailed observation and protection, as buildings age and deteriorate over
time. This process is critical and demands considerable time, financial, and human resources. Since these
resources cannot always be provided, computational and information technologies are needed to build a
risk-informed system that analyzes cultural heritage and gives timely notification of changes. The
contribution of this paper is therefore potentially essential for this area. The study examines the ArCH
dataset and the best current techniques for segmenting 3D point clouds: Point-wise MLP with PointNet,
PointNet++, and RandLA-Net; Point Convolution with PointCNN; RNN-based with RSNet; and Graph-based
with DGCNN. The efficiency of these semantic segmentation models is compared on the S3DIS, ScanNet,
Semantic3D, and SemanticKITTI datasets.
Keywords
Artificial Intelligence, Point Cloud, Semantic Segmentation, Monitoring, Cultural Heritage, Risk-Informed
Systems, Information Technologies
1. Introduction
Cultural heritage plays a vital role in preserving the memory and knowledge of the past. Moreover,
its preservation is essential in developing modern infrastructure, constructing new cities, roads,
and railways. At the same time, we must not forget about the development of tourist services, an
adaptation of old buildings to modern needs, illegal archeological excavations, and other potential
risks related to destroying cultural heritage.
Preserving cultural heritage has three main risks. First of all, it is a time-consuming process
that is required to be done repeatedly. If it is impossible to do so, cultural heritage will get damaged
and require renovation, or in some cases, it can even be lost forever. The second concern is
financial resources. Non-stop monitoring takes much time and human resources; thus, it takes
much money. A lack of monitoring results in cultural heritage damage, and restoration takes even
more money. The last but not least risk is the people working with cultural heritage. They do a
monotonous job of checking cultural heritage for destruction, when they could instead spend their
time on research and renovation tasks.
CITRisk’2021: 2nd International Workshop on Computational & Information Technologies for Risk-Informed Systems, September
16–17, 2021, Kherson, Ukraine
EMAIL: nataliya.i.boyko@lpnu.ua (N.Boyko); mariia.rizhko.knm.2018@lpnu.ua (M.Rizhko)
ORCID: 0000-0002-6962-9363 (N.Boyko); 0000-0003-3885-4661 (M.Rizhko)
© 2021 Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)
Nowadays, cultural heritage monitoring is managed by cultural organizations, which are
constantly confronted with a large amount of data that needs to be processed and small resources
that they can use. The solution is to create a risk-informed system to automate data monitoring and
analysis based on 3D and AI technologies. By automating the processes in collecting and analyzing
information, it is possible to achieve significant cost savings, both human and financial.
The work aims to systematize the directions and technologies of 3D and AI for analyzing and
recognizing cultural heritage, and to develop a system for practical application.
The solution of the following tasks is required:
1. Review of existing 3D and AI solutions for monitoring and analysis of cultural heritage
preservation.
2. Research of requirements, methods, and algorithms to get a decision for the task.
3. Select and collect the necessary data of cultural heritage to be analyzed.
4. Development of architecture for monitoring and analysis of cultural heritage preservation.
5. Creating an application program - a system of semantic segmentation for cultural
heritage.
The practical importance of the study is the creation of a new risk-informed system for functional,
objective, and cost-effective monitoring of cultural heritage changes, one that can monitor many
facilities and react quickly.
2. Related Works
Risk analysis is one of the essential tools for preserving cultural heritage. It is used for decision-
making in the process of cultural heritage asset management and maintenance. For this purpose,
both quantitative and qualitative analysis are used [21].
Although risk categorization plays a vital role in risk management in other disciplines, it has
yet to be successfully applied to cultural heritage studies [22].
The importance of information search and systematization in the modern world has motivated the
thematic modeling of text document collections. Thematic models are used to identify trends in
scientific publications and news streams, to classify and categorize image documents and video
streams, and in information retrieval (including multilingual), web page tagging, spam detection,
recommendation, and other applications.
3D scanning is the construction of a computer model of a material object; it has been studied by
many researchers [5-8]. Currently, there are two leading 3D scanning technologies: laser scanning
and photogrammetry.
Laser scanning is a technology for obtaining information about terrain and objects using a laser.
This method has been studied by scientists [5-8]. The result of the laser scan is a cloud of laser
reflection points.
There are two types of laser scanning: mobile and stationary. During mobile scanning,
continuous measurement is performed while the vehicle is driving. During stationary scanning,
the device is mounted motionlessly, and measurement is carried out from several standing
points.
Photogrammetry is the science of determining the appearance, shape, and position of objects in
space by measuring their photographic images. It has been studied by researchers [5-8].
No special devices are required to use this method; a camera on a modern phone is enough.
When choosing photogrammetry for 3D scanning, an important question is what affects the
accuracy of the 3D models, and how. Accuracy can depend on many factors [1]: the optical and
digital characteristics of the camera, the spatial distribution of images, and ground control points.
Also, shooting with the help of remote control systems, such as drones, has been studied. This
method has an advantage over standard photogrammetry due to the aerial view [2].
In recent years, interest in preserving cultural heritage has begun to grow, so more and more
data is being digitized. This is very important for artificial intelligence, as model training is based
on data. However, collecting a large enough amount of data is still a problem because it is
time-consuming and requires human effort to annotate the elements correctly.
Machine learning technologies have become popular not only in computer science but in other
fields as well. One of the reasons for the growth, in particular, is the successful application of deep
learning methods for image classification [3], in which convolutional neural networks (CNN)
exceed the human ability to analyze objects [4].
The potential of deep learning technologies for image analysis achieved a remarkable
breakthrough in 2012, when the AlexNet model [5] showed excellent results in the ImageNet
competition. In 2014, GoogLeNet [6] won the ImageNet competition, achieving 93.3% top-5
classification accuracy. In 2015, Microsoft's ResNet [7] won the ImageNet competition,
achieving 96.4% accuracy.
3. Materials and Methods
Learning on point clouds is attracting more and more attention with the development of augmented
and virtual reality and their wide application in computer vision, autonomous driving, and robotics.
Deep learning is well researched for 2D problems, but for 3D cultural heritage data it is only
evolving and needs further research and the creation of new datasets to train effective models.
In this paper, 3D data are presented as point clouds, and the task is their semantic segmentation.
For this purpose, the rights to use the ArCH dataset (Architectural Cultural Heritage point clouds
for classification and semantic segmentation) were obtained [8]. The dataset consists of 17
annotated scenes, each point of which belongs to one of 10 categories: "arch": 0, "column": 1,
"moldings": 2, "floor": 3, "door_window": 4, "wall": 5, "stairs": 6, "vault": 7, "roof": 8, "other": 9.
Some of these scenes belong to the UNESCO heritage. Others are part of the historical heritage
and represent different historical periods and architectural styles.
Fifteen scenes are used for training and two for testing models. Scenes for training include
churches, chapels, porticos, loggias, pavilions, and monasteries. Two test scenes have different
characteristics. The first represents a simple, almost symmetrical one-level building with standard
and repetitive geometric elements. The second scene represents a complex, asymmetrical building
with two levels, shot both inside and outside, with different types of vaults, stairs, and windows.
Figure 1: Photography, 3D color image, and semantic segmentation of a training object
Figure 2: Photography, 3D color image, and semantic segmentation of the first testing object
Figure 3: Photography, 3D color image, and semantic segmentation of the second testing object
Figures 1-3 show a visualization of one of the training objects and two objects used to test the
quality of the models. Objects are represented as clouds of points with corresponding r / g / b
values for each point to indicate color and class. Data were obtained using various sensors
(cameras, scanners) and platforms (UAV and others). Preprocessing included spatial translation,
subsampling, and feature selection.
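For illustration, a voxel-grid subsampling step similar to the 1 cm / 1.5 cm subsampling listed in the tables below can be sketched as follows. This is a simplified sketch, not the preprocessing actually used to produce the ArCH dataset:

```python
import numpy as np

def grid_subsample(points, cell=0.01):
    """Keep one point per occupied voxel of the given cell size (in metres).

    points: (N, 3) float array of x-y-z coordinates.
    Returns the subsampled (M, 3) array, M <= N.
    """
    # Map each point to an integer voxel index.
    voxels = np.floor(points / cell).astype(np.int64)
    # Keep the first point that falls into each occupied voxel.
    _, keep = np.unique(voxels, axis=0, return_index=True)
    return points[np.sort(keep)]

# Example: 10,000 random points in a 10 cm cube collapse to at most
# 10 x 10 x 10 = 1000 occupied 1 cm voxels.
rng = np.random.default_rng(0)
dense = rng.uniform(0.0, 0.1, size=(10000, 3))
sparse = grid_subsample(dense, cell=0.01)
print(len(dense), len(sparse))
```

A production pipeline would typically also average colors and carry class labels through the subsampling; the cell size trades point density against memory and training time.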
Table 1
Information about training objects

| Name | Number of points | Scene | Getting data | Class number | Subsampling (cm) |
|---|---|---|---|---|---|
| 1_TR_cloister | 15,740,229 | Indoor/Outdoor | TLS + UAV | 8/9 | 1 |
| 2_TR_church | 20,862,139 | Indoor | TLS | 8/9 | 1 |
| 3_VAL_room | 4,188,066 | Indoor | TLS | 6/9 | 1 |
| 4_CA_church | 4,850,807 | Outdoor | TLS + UAV | 6/9 | 1 |
| 5_SMV_chapel_1 | 3,783,412 | Outdoor | TLS + UAV | 9/9 | 1 |
| 6_SMV_chapel_2to4 | 6,326,871 | Indoor/Outdoor | TLS + UAV | 9/9 | 1 |
| 7_SMV_chapel_24 | 3,571,064 | Outdoor | TLS + UAV | 9/9 | 1 |
| 8_SMV_chapel_28 | 3,156,753 | Outdoor | TLS + UAV | 9/9 | 1 |
| 9_SMV_chapel_10 | 2,193,189 | Indoor/Outdoor | TLS + UAV | 6/9 | 1 |
| 10_SStefano_portico_1 | 3,783,699 | Outdoor | Terrestrial photogrammetry | 8/9 | 1 |
| 11_SStefano_portico_2 | 10,047,392 | Outdoor | Terrestrial photogrammetry | 8/9 | 1 |
| 12_KAS_pavillion_1 | 598,384 | Indoor/Outdoor | TLS | 4/9 | 1 |
| 13_KAS_pavillion_2 | 325,822 | Indoor/Outdoor | TLS | 4/9 | 1 |
| 14_TRE_square | 9,409,239 | Outdoor | Terrestrial photogrammetry | 8/9 | 1.5 |
| 15_OTT_church | 13,302,903 | Indoor/Outdoor | TLS | 9/9 | 1.5 |
Table 2
Information about testing objects

| Name | Number of points | Scene | Getting data | Class number | Subsampling (cm) |
|---|---|---|---|---|---|
| A_SMG_portico | 17,798,012 | Outdoor | TLS + UAV | 9/9 | 1 |
| B_SMV_chapel_27to35 | 16,200,442 | Indoor/Outdoor | TLS + UAV | 9/9 | 1 |
Tables 1 and 2 provide more information about training and testing objects. The total number of
points for training is 102,139,969, for testing 33,998,454.
Point-based Networks are used for the segmentation (Fig. 4).
Figure 4: Point-based methods for clouds of points
The semantic segmentation task is to divide a cloud of points into parts according to the semantic
meaning of the points. This section will describe the best semantic segmentation techniques
nowadays: Point-wise MLP on PointNet [9], PointNet ++ [10] and RandLA-Net [11], Point
Convolution on PointCNN [12], RNN-based on RSNet [13], Graph-based on DGCNN [14].
Pointwise MLP methods typically use a shared MLP (Multi-Layer Perceptron) as the central
unit in their networks because of its high efficiency. However, point features obtained with a
shared MLP alone cannot capture the local geometry of point clouds or the interactions between
points. Therefore, various methods have been proposed, including PointNet, PointNet++, and
RandLA-Net, to provide more context for each point and explore deeper local structures.
Convolutional networks require highly regular input data in order to perform weight sharing and
other optimizations. Because a point cloud is not in a regular format, the data are usually
transformed into a voxel grid or a collection of images before being fed to the network. However,
this transformation makes the resulting data excessively voluminous and can also change the
nature of the data. That is why PointNet consumes a cloud of points directly, without such
transformations.
The PointNet architecture (Fig. 5) consists of three main modules: a max-pooling layer as a
symmetric function for aggregating information from all points, a structure for combining local
and global information, and two alignment networks that align the input points and point features.
Figure 5: PointNet architecture
The idea of this model is to approximate the general function defined on the set of points through
the application of a symmetric function on the transformed elements in the network (Formula 1):
f(x_1, ..., x_n) ≈ g(h(x_1), ..., h(x_n)),    (1)

where f : 2^(R^N) → R, h : R^N → R^K, and g : R^K × ... × R^K → R is a symmetric function.
h is approximated by the MLP network, and g by the composition of a function of one variable
and the max pooling function.
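As a toy numerical sketch (not the authors' code), the symmetric-function idea of Formula 1 can be checked directly: a shared one-layer "MLP" h is applied to every point independently, and max pooling over the point axis makes the aggregate invariant to the order of the points:

```python
import numpy as np

rng = np.random.default_rng(42)

# h: R^3 -> R^16, a toy one-layer "MLP" with shared weights and ReLU,
# applied to every point independently.
W, b = rng.normal(size=(3, 16)), rng.normal(size=16)
h = lambda x: np.maximum(x @ W + b, 0.0)

# g: symmetric aggregation -- max pooling over the point axis,
# followed by a simple function of the pooled feature vector.
def f(points):                           # points: (n, 3)
    return np.max(h(points), axis=0).sum()

cloud = rng.normal(size=(100, 3))
shuffled = rng.permutation(cloud)        # the same points, reordered
print(np.isclose(f(cloud), f(shuffled)))
```

Because max pooling is order-independent, reordering the rows of the input leaves f unchanged, which is exactly the permutation invariance PointNet requires of a set function.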
PointNet does not capture local structures induced by the metric space in which the points are
located, which limits its ability to recognize fine-grained patterns and generalize to complex
scenes. PointNet++ is a hierarchical neural network that applies PointNet recursively to nested
subsets of the input point set. By exploiting metric distances, PointNet++ can learn local features
at increasing contextual scales (Fig. 6).
Figure 6: PointNet++ architecture
RandLA-Net is a lightweight neural architecture (Figure 7) that can process large-scale point clouds
up to 200 times faster than other architectures, most of which rely on time-consuming
preprocessing and post-processing techniques. PointNet is computationally efficient but does not
capture the contextual information of each point. RandLA-Net handles large 3D point clouds in
one pass without requiring any pre/post-processing steps such as voxelization, block partitioning,
or graph construction. It relies only on random sampling within the network and therefore
requires much less memory and computation.
Figure 7: RandLA-Net architecture
The first step, Local Spatial Encoding (LocSE), is finding adjacent points. For each point, its
neighboring points are searched for by a simple K-nearest neighbors (KNN) algorithm based on
the point-wise Euclidean distance.
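A minimal brute-force KNN sketch of this neighbor search (illustrative only; real pipelines would use a KD-tree such as scipy.spatial.cKDTree for large clouds):

```python
import numpy as np

def knn(points, k):
    """Indices of the k nearest neighbours of every point,
    by point-wise Euclidean distance (brute force)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return np.argsort(d, axis=1)[:, :k]   # column 0 is the point itself

pts = np.array([[0.0, 0.0, 0.0],
                [0.1, 0.0, 0.0],
                [5.0, 0.0, 0.0]])
print(knn(pts, 2))
```

The brute-force distance matrix is O(N^2) in memory, which is exactly why RandLA-Net pairs the cheap KNN lookup with random sampling rather than denser neighborhood schemes.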
The next step is Relative Point Position Encoding. For each of the K nearest points
{p_i^1, ..., p_i^k, ..., p_i^K} of the central point p_i, the following is computed (Formula 2):

r_i^k = MLP(p_i ⊕ p_i^k ⊕ (p_i − p_i^k) ⊕ ||p_i − p_i^k||),    (2)

where p_i and p_i^k are the x-y-z coordinates of the points, ⊕ is the concatenation operation,
and ||·|| calculates the Euclidean distance between the adjacent and central points.
The last step in this part is Point Feature Augmentation. For each adjacent point p_i^k, the
encoded relative position r_i^k is concatenated with the corresponding point features f_i^k, and
the resulting augmented vector f̂_i^k is obtained.

The next step is Attentive Pooling, which consists of Computing Attention Scores and
Weighted Summation. Computing Attention Scores (Formula 3):

s_i^k = g(f̂_i^k, W),    (3)

where W is the set of learnable MLP weights.

Weighted Summation: f_i = Σ_{k=1}^{K} f̂_i^k · s_i^k.
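The attentive pooling step can be illustrated with a small numpy sketch. Here the scoring function g is reduced to a single weight matrix followed by softmax normalization over the K neighbors; the real RandLA-Net learns g as a shared MLP:

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attentive_pool(feats, W):
    """Attentive pooling over K augmented neighbour features.

    feats: (K, d) augmented features f̂ for one centre point.
    W:     (d, d) shared scoring weights (stand-in for the learned MLP g).
    The softmax-normalised scores act as a learned, differentiable
    alternative to hard max pooling.
    """
    scores = softmax(feats @ W, axis=0)   # attention score per neighbour
    return (feats * scores).sum(axis=0)   # weighted summation -> (d,)

rng = np.random.default_rng(1)
K, d = 16, 8
f_hat = rng.normal(size=(K, d))
W = rng.normal(size=(d, d))
pooled = attentive_pool(f_hat, W)
print(pooled.shape)
```

The output is a single d-dimensional feature per center point, so stacking this over all points keeps the memory footprint linear in the cloud size.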
Point Convolution uses spatial-local correlation in data presented densely in grids and provides
a basis for studying features from point clouds. One example of such an architecture is PointCNN
(Figure 8).
Figure 8: PointCNN architecture for classification (a and b) and segmentation (c)
PointCNN learns the transformation of input points to weigh the input features associated with the
points and rearrange the points in the canonical order. The PointCNN architecture contains two
designs: Hierarchical Convolution and χ-Conv Operator.
Hierarchical Convolution is recursively applied to local parts of the grid, often reducing data
to fewer representative points but with more saturated information.
The χ-Conv operator works on local regions: it accepts neighboring points as input and performs
convolution. Neighboring points are transformed into the local coordinate system of the
representative point, and these local coordinates are then individually lifted and combined with
the corresponding features (Formula 4):

F_p = χ-Conv(K, p, P, F) = Conv(K, MLP(P − p) × [MLP_δ(P − p), F]),    (4)

where MLP_δ is applied separately to each point, as in PointNet.
Most other semantic segmentation networks do not model the necessary local dependencies
between points. RNN-based models are presented here using RSNet as an example (Fig. 9). A
key component of the RSNet architecture is a highly efficient module for modeling local
dependencies between points. RSNet takes raw, unprocessed point clouds as input and returns
a semantic label for each point.
Figure 9: RSNet diagram
Input and output feature blocks are used to generate features independently; the local dependency
module sits between them. The input feature block receives the input points and generates
features, and the output block receives the processed features and returns final predictions for
each point. Both blocks use a sequence of multiple layers to create independent feature
representations for each point. The local dependency module combines an aggregation layer, a
bidirectional recurrent neural network (RNN) layer, and a separation layer. The problem of local
context is solved by first projecting the disordered points onto ordered features and then applying
traditional sequence learning algorithms.
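The projection of disordered points onto ordered features can be sketched as a slice pooling step. This is an illustrative simplification of RSNet's aggregation layer: points are binned into ordered slices along one axis and max-pooled per slice, yielding the ordered sequence an RNN layer can consume:

```python
import numpy as np

def slice_pool(points, feats, n_slices=8, axis=2):
    """Max-pool point features into n_slices ordered bins along one axis.

    points: (N, 3) coordinates; feats: (N, d) per-point features.
    Returns an (n_slices, d) ordered sequence of slice features.
    """
    coord = points[:, axis]
    edges = np.linspace(coord.min(), coord.max(), n_slices + 1)
    idx = np.clip(np.digitize(coord, edges) - 1, 0, n_slices - 1)
    out = np.full((n_slices, feats.shape[1]), -np.inf)
    for s in range(n_slices):
        mask = idx == s
        if mask.any():
            out[s] = feats[mask].max(axis=0)   # max pooling within the slice
    return out

rng = np.random.default_rng(0)
pts, ft = rng.uniform(size=(1000, 3)), rng.normal(size=(1000, 4))
seq = slice_pool(pts, ft)
print(seq.shape)
```

RSNet additionally runs such slicing along all three axes and "unpools" the RNN outputs back to the original points; the sketch only shows the ordering trick itself.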
Graph-based networks are used to capture the shapes and geometric structures of three-
dimensional point clouds. First, a point cloud is represented as a set of simple interconnected
shapes and superpoints; then a superpoint graph is used to capture the structure and contextual
information. After that, the large-scale point cloud segmentation problem is divided into three
subtasks: geometrically homogeneous partitioning, superpoint embedding, and contextual
segmentation.
One example of a Graph-based architecture is DGCNN (Figure 10). DGCNN is built around
EdgeConv, an operation suitable for CNN-based point cloud tasks, including classification and
segmentation. EdgeConv operates on graphs that are dynamically recomputed at each layer of
the network. It captures the local geometric structure while preserving permutation invariance.
Instead of generating point features directly from their embeddings, EdgeConv generates edge
features that describe the relationships between a point and its neighbors. EdgeConv is designed
to be invariant to the ordering of neighbors and is therefore permutation invariant.
Figure 10: DGCNN architecture
Because EdgeConv creates a local graph and learns embeddings for the edges, the model can
group points both in Euclidean space and in semantic space. Instead of working on individual
points, as PointNet does, DGCNN uses local geometric structures to construct a local graph of
adjacent points and applies operations on the edges connecting adjacent pairs of points.
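The edge features EdgeConv operates on can be illustrated as follows. This is a static, numpy-only sketch: for each directed edge (i, j) it builds the concatenation [x_i, x_j − x_i], combining the global position of the point with the local offset to its neighbor; DGCNN itself recomputes the graph dynamically at every layer and learns the function applied to these features:

```python
import numpy as np

def edge_features(points, k=4):
    """EdgeConv-style input features for each point and its k neighbours.

    points: (N, 3). Returns (N, k, 6): for every edge (i, j) the
    concatenation [x_i, x_j - x_i].
    """
    d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    nbrs = np.argsort(d, axis=1)[:, 1:k + 1]         # exclude the point itself
    x_i = np.repeat(points[:, None], k, axis=1)      # (N, k, 3)
    x_j = points[nbrs]                               # (N, k, 3)
    return np.concatenate([x_i, x_j - x_i], axis=-1)

pts = np.random.default_rng(2).normal(size=(50, 3))
print(edge_features(pts).shape)
```

In the full network a shared MLP maps each 6-dimensional edge feature to a learned embedding, which is then max-pooled over the k edges of each point.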
4. Experiments
The results of the analyzed methods in the previous section are compared on the datasets S3DIS
[15], ScanNet [16], Semantic3D [17], and SemanticKITTI [18]. For this purpose, mean class
accuracy (mAcc), overall accuracy (oAcc), and mean class intersection over union (mIoU) metrics
are used. The data are taken from the articles of the corresponding algorithms and datasets.
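The three metrics can all be computed from a per-point confusion matrix. A small sketch, assuming every class appears at least once in the ground truth so no division by zero occurs:

```python
import numpy as np

def segmentation_metrics(y_true, y_pred, n_classes):
    """Return (oAcc, mAcc, mIoU) from per-point labels and predictions."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)                 # confusion matrix
    tp = np.diag(cm).astype(float)
    o_acc = tp.sum() / cm.sum()                        # overall accuracy
    m_acc = np.mean(tp / cm.sum(axis=1))               # mean per-class accuracy
    iou = tp / (cm.sum(axis=1) + cm.sum(axis=0) - tp)  # per-class IoU
    return o_acc, m_acc, iou.mean()

# Toy example with 3 classes and 6 points.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
print(segmentation_metrics(y_true, y_pred, 3))
```

For this toy input, four of six points are correct (oAcc = 2/3), per-class accuracies are 1/2, 1, 1/2 (mAcc = 2/3), and the per-class IoUs 1/3, 2/3, 1/2 average to mIoU = 0.5.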
S3DIS: all point clouds are obtained without manual intervention using a Matterport scanner. The
dataset consists of 271 rooms belonging to 6 large-scale indoor scenes from 3 different buildings,
covering about 6,020 sq. m. These areas mainly include offices, educational and exhibition
spaces, and conference rooms.
ScanNet: annotations contain estimated calibration parameters, camera poses, three-dimensional
surface reconstructions, textured meshes, dense object-level semantic segmentation, and CAD
models. The dataset contains annotated RGB-D scans of environments. In total, there are 2.5M
images in 1513 scans obtained in 707 different locations.
Semantic3D: includes about 4 billion 3D points obtained using static ground-based laser
scanners, covering up to 160x240x30 meters of space. Point clouds belong to 8 classes (urban and
rural) and contain coordinates, RGB information, and intensity.
SemanticKITTI: an extensive outdoor dataset containing a detailed point annotation of 28
classes. The dataset contains labels for the whole horizontal 360-degree field of view of the
rotating laser sensor.
Table 3
Effectiveness evaluation of semantic segmentation models on S3DIS, ScanNet, Semantic3D, and
SemanticKITTI datasets

| Model | S3DIS mAcc | S3DIS mIoU | ScanNet oAcc | ScanNet mIoU | Semantic3D oAcc | Semantic3D mIoU | SemanticKITTI mAcc | SemanticKITTI mIoU |
|---|---|---|---|---|---|---|---|---|
| PointNet | 48.98 | 41.09 | - | 14.69 | - | - | 29.9 | 17.9 |
| PointNet++ | - | 50.04 | 71.4 | 34.26 | - | - | - | - |
| RandLA-Net | - | - | - | - | 94.8 | 77.4 | - | 53.9 |
| PointCNN | 63.86 | 57.26 | 85.1 | 45.8 | - | - | - | - |
| RSNet | 59.42 | 56.5 | - | 39.35 | - | - | - | - |
| DGCNN | - | 56.1 | - | - | - | - | - | - |
Table 3 shows the results reported in the original articles for the methods and datasets. As can be
seen from the table, the RandLA-Net model achieves the best metrics on the Semantic3D and
SemanticKITTI datasets, and the PointCNN model on the S3DIS and ScanNet datasets. However,
quite a few results on the various datasets are unknown. Accordingly, we can assume that the
RandLA-Net or PointCNN models will work best on the ArCH dataset, although, due to the
missing values, one of the other models may still turn out to be better than these two.
5. Results
The experiments section demonstrates the performance of semantic segmentation models on the
S3DIS, ScanNet, Semantic3D, and SemanticKITTI datasets. The presented models are of different
types: PointNet, PointNet++, and RandLA-Net are Point-wise MLP models, PointCNN is Point
Convolution, RSNet is RNN-based, and DGCNN is Graph-based. Furthermore, these methods
were tested on different types of environments: offices, educational and exhibition spaces,
conference rooms, cities and towns, and open and enclosed spaces.
The results are shown in Table 3. They show that different models perform best on different
datasets and metrics. For the S3DIS dataset, the best model is PointCNN with an mAcc of 63.86,
while PointNet has an mAcc of 48.98 and RSNet 59.42. PointCNN also shows the best result on
the mIoU metric, 57.26, compared with 41.09 for PointNet, 50.04 for PointNet++, 56.5 for
RSNet, and 56.1 for DGCNN.
For the ScanNet dataset, the best model is also PointCNN with oAcc equal to 85.1, while it is
71.4 in PointNet++. Also, PointCNN shows the best results on the mIoU metric, which is equal to
45.8 as opposed to 14.69 in PointNet, 34.26 in PointNet ++, 39.35 in RSNet.
For the Semantic3D dataset, the best model is RandLA-Net, which shows high results with
oAcc = 94.8 and mIoU = 77.4. In the two previous datasets, the maximum value of oAcc was 85.1
and mIoU 57.26.
The SemanticKITTI dataset is also poorly researched. The mAcc metric is shown for PointNet
only and is 29.9. The mIoU metric is presented for PointNet and RandLA-Net and is 17.9 and
53.9, respectively.
Therefore, comparing the presented methods PointNet, PointNet++, RandLA-Net, PointCNN,
RSNet, and DGCNN on the S3DIS, ScanNet, Semantic3D, and SemanticKITTI datasets, we can
draw the following conclusions. Further research is needed on the presented methods on the
relevant datasets, as not all possible combinations were considered in the official articles in which
the models and datasets were first presented. Further research should report the same metrics,
computing mAcc, oAcc, and mIoU for all combinations of models and datasets. Once the same
metrics are obtained, it will be possible to compare which models work best on which datasets.
The last step will be to evaluate the presented models on the ArCH dataset against the same
metrics mAcc, oAcc, and mIoU.
6. Conclusion
The protection of cultural heritage in an era of urbanization and city development is critical for
preserving history. However, this is an enormous task with many risks connected to time,
financial, and human resources. Therefore, a solution for automating the monitoring and analysis
of data through semantic segmentation of point clouds was presented. A risk-informed system
based on computational and information technologies will reduce these risks and increase the
efficiency of resource use.
The existing solutions were considered, the methods and datasets that correspond to the goal
were analyzed, and their results on different metrics were collected and analyzed.
The following steps in continuing this study will be: conducting experiments on the presented
methods for the respective datasets; comparison of experimental results on the same metrics;
verification of the presented methods on the ArCH dataset.
References
[1] M.Bolognesi, A.Furini, V.Russo, A.Pellegrinelli, P.Russo, Accuracy of cultural heritage
3D models by RPAS and terrestrial photogrammetry, Int. Arch. Photogramm. Remote Sens.
Spat. Inf. Sci. ISPRS Arch, 40, 2014, pp. 113–119
[2] E.Karachaliou, E.Georgiou, D.Psaltis, E.Stylianidis, UAV for mapping historic buildings:
From 3D modeling to BIM, ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci., XLII-
2, 2019, pp. 397–402
[3] A.Krizhevsky, I.Sutskever, G.Hinton, Imagenet classification with deep convolutional
neural networks. NIPS, 1, 2012, pp. 1097–1105
[4] F.Radenovi´c, G.Tolias, O.Chum, CNN image retrieval learns from BoW: Unsupervised
finetuning with hard examples. Eur. Conf. Comput. Vis. ECCV, 2016, pp. 1–17
[5] A.Krizhevsky, I.Sutskever, G.Hinton, Imagenet classification with deep convolutional
neural networks. NIPS 1, 2012, pp. 1097–1105
[6] C.Szegedy, W.Liu, Y.Jia, P.Sermanet, S.Reed, D.Anguelov, D.Erhan, V.Vanhoucke,
A.Rabinovich, Going Deeper with Convolutions, CVPR, 2015, pp. 1–9
[7] K.He, X.Zhang, S.Ren, J.Sun, Deep Residual Learning for Image Recognition, 2015, 1–12
[8] F.Matrone, A.Lingua, R.Pierdicca, E. S.Malinverni, M.Paolanti, E.Grilli, F.Remondino,
A.Murtiyoso, T.Landes, A benchmark for large-scale heritage point cloud semantic
segmentation, Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., XLIII-B2-2020,
2020, pp. 1419–1426
[9] R.Q.Charles, S.Hao, M.Kaichun, J.G.Leonidas, Deep learning on point sets for 3d
classification and segmentation, in: Proceedings of the IEEE conference on computer vision
and pattern recognition, 2017, pp. 652–660
[10] Ch.R.Qi, L.Yi, H.Su, J.G.Leonidas, Pointnet++: Deep hierarchical feature learning on point
sets in a metric space, Advances in neural information processing systems, 30, 2017, pp.
5099–5108
[11] Q.Hu, B.Yang, L.Xie, S.Rosa, Y.Guo, Z.Wang, N.Trigoni, A.Markham, RandLA-Net:
Efficient semantic segmentation of large-scale point clouds, in: Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11108–
11117
[12] Y.Li, R.Bu, M.Sun, W.Wu, X.Di, B.Chen, Pointcnn: Convolution on x-transformed points.
Advances in neural information processing systems, 31, 2018, pp. 820–830
[13] Q.Huang, W.Wang, U.Neumann, Recurrent slice networks for 3d segmentation of point
clouds, in: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, 2018, pp. 2626–2635
[14] Y.Wang, Y.Sun, Z.Liu, S.E.Sarma, M.Bronstein, M.Justin, Dynamic graph cnn for learning
on point clouds. Acm Transactions On Graphics (tog), 38, 5, 2019, pp. 1–12
[15] I.Armeni, O.Sener, A.R.Zamir, H.Jiang, I. Brilakis, M. Fischer, S. Savarese, 3d semantic
parsing of large-scale indoor spaces, in: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, 2016, pp. 1534–1543
[16] A.Dai, A.Chang, M.Savva, M.Halber, T.Funkhouser, M.Nießner, ScanNet: Richly-annotated
3d reconstructions of indoor scenes, in: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, 2017, pp. 5828–5839
[17] T.Hackel, N.Savinov, L.Ladicky, J.D.Wegner, K.Schindler, M.Pollefeys, Semantic3d. net:
A new large-scale point cloud classification benchmark, arXiv preprint arXiv:1704.03847,
2017
[18] J.Behley, M.Garbade, A.Milioto, J.Quenzel, S.Behnke, C.Stachniss, J.Gall,
SemanticKITTI: A dataset for semantic scene understanding of lidar sequences, in:
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019,
pp. 9297–9307
[19] N.Boiko, The issue of access sharing to data when building enterprise information model,
in: IX International Scientific and Technical conference, Computer science and information
technologies (CSIT 2014), Lviv, Ukraine, 2014, pp. 23-24
[20] N.Boyko, R.Hlynka, Application of Machine Algorithms for Classification and Formation
of the Optimal Plan, in: Proceedings of the 5th International Conference on Computational
Linguistics and Intelligent Systems (COLINS 2021), Vol. 1: Main Conference Lviv,
Ukraine, April 22-23, 2021, pp. 1853-1865.
[21] V.Rajcic, Risks and resilience of cultural heritage assets, in: International Conference:
Europe and the Mediterranean: Towards a Sustainable Built Environment At: Malta, Vol.
1, 2016. https://www.researchgate.net/publication/
299395298_Risks_and_resilience_of_cultural_heritage_assets
[22] R.Sharifi, Risk Characterization for Preserving Cultural Heritage Assets, 2016.
https://www.chnt.at/wp-content/uploads/eBook_CHNT22_Sharifi.pdf