Cultural heritage image classification using transfer learning for feature extraction: a comparison

Radmila Janković Babić
Mathematical Institute of the Serbian Academy of Sciences and Arts, Kneza Mihaila 36, Belgrade, Serbia

Abstract

Image classification in the domain of cultural heritage becomes extremely important with the development of digitisation practices. This study aims to analyze how classification performance on a small dataset representing cultural heritage changes depending on the feature extraction method. The dataset comprises 150 images belonging to three classes: (i) archaeological sites, (ii) frescoes, and (iii) monasteries. Five transfer learning architectures were used to extract features from the images, while classification was performed using four traditional machine learning algorithms, namely Random forest, Naïve Bayes, Decision tree, and the Multilayer perceptron classifier. The results suggest that Random forest and the Multilayer perceptron are the most suitable algorithms for classification of cultural heritage images, especially when used in combination with the DenseNet121 pre-trained architecture. Naïve Bayes also performed well, with a maximum accuracy of 100% obtained when features are extracted using EfficientNetB0. However, the Decision tree algorithm reached only moderate performance.

Keywords
Cultural heritage, classification, transfer learning

1. Introduction

Preservation of cultural heritage remains one of the most important tasks in the era of digitisation. In general, cultural heritage can be classified into tangible or physical heritage (such as buildings, monuments, archaeological remains, artworks, and artifacts) and intangible heritage (such as traditions, language, rituals, skills, and folklore). As cultural heritage represents the values, traditions, and beliefs of a national identity, it shapes future generations and creates a strong bond to their history and surroundings. Since cultural heritage can easily be damaged and destroyed, it is essential to find adequate ways to restore and preserve it. Digital technologies play a vital role in the preservation and restoration of cultural heritage.

Recently, the use of Machine Learning (ML) techniques has proven to be an appropriate way to deal with the preservation of cultural heritage. However, such use does not come without barriers. Major problems in this domain are the quality and size of the available datasets [1]. To tackle the dataset size problem, transfer learning architectures can be utilized, where deep convolutional neural networks (CNNs) are trained on very large image datasets and then applied to smaller data, while ML-based approaches are frequently used to enhance heritage objects using image reconstruction approaches [2, 3, 4].

Recent contributions of classification techniques in the domain of cultural heritage include the use of the multilayer perceptron (MLP), averaged one-dependence estimators, forest by penalizing attributes, and k-nearest neighbor rough sets and analogy-based reasoning for classification of altar, gargoyle, dome, column, and vault images [5], where the performance was compared to a CNN. Feature extraction using VGG16 and classification using Random forest (RF) were performed in [6] with the aim of classifying Batik types.
Multiple linear regression and fuzzy inference models were used to predict the service life of built cultural heritage in [7], while a logistic regression approach was compared to maximal entropy for predictive modeling of archaeological site locations in [8]. Chronological classification of ancient paintings was performed in [9] using the Support Vector Machine (SVM) classifier.

Although traditional ML approaches have proven to be accurate in classification of cultural heritage, more recent approaches focus on deep learning, and specifically on transfer learning. One of the major advantages of these approaches lies in the fact that feature extraction from the images is performed automatically; however, the selection of the network configuration is more complex. In addition, deep learning approaches usually require large datasets to learn from, so transfer learning approaches were developed to reduce the computational effort. Some of the recent contributions of deep learning and transfer learning approaches for cultural heritage include classification of Indian heritage sites using the MobileNetV2 architecture [10], multimodal classification of cultural datasets [11], and classification of four cultural heritage sites, namely Baalshamin, the Temple of Bel, the Tetrapylon, and the Roman theatre at Palmyra, using the SVM algorithm, transfer learning based on the AlexNet architecture, a cloud vision approach, and a full CNN [12]. Classification of architectural heritage was performed in [13] using the deep learning networks AlexNet, InceptionV3, ResNet, and Inception-ResNet-v2, while in [14] a CNN model trained from scratch achieved good performance on the same dataset. A comparison of pre-trained CNN networks for cultural heritage image classification was carried out in [15, 16], while in [17] the authors compared the performance of ResNet-18 and a custom CNN with SVM and RF used as classifiers of Iberian ceramics.

However, to the author's knowledge, no studies have aimed to compare the performance of several ML algorithms when different pre-trained architectures are used for feature extraction. Hence, the aim of this paper is to compare the performance of four traditional ML algorithms when feature extraction is performed using five pre-trained architectures: (i) MobileNet, (ii) InceptionV3, (iii) Xception, (iv) EfficientNetB0, and (v) DenseNet121. The focus of this study is on small sets of data, which are by nature harder to classify correctly, as complex models usually require more data to learn from accurately.

This paper is structured as follows. Section 2 presents the data and describes the methodology, while Section 3 discusses the obtained results. Section 4 presents the conclusions.

2. Data and methodology

2.1. Data

The dataset used in this study was first introduced in [18], where the aim was to compare the performances of decision tree classifiers.
Here, however, the aim is to observe to what extent the classification performance changes depending on the pre-trained architectures that are used to extract features from the images. The dataset consists of 150 color images obtained from Google Images and Flickr, belonging to three classes: (i) archaeological sites, (ii) frescoes, and (iii) monasteries. All images are of size 150×150 pixels. Samples of images from each class are shown in Figure 1. The dataset was divided into training and test sets, where 35 images per class were used for training and 15 images per class were used for testing the models.

Figure 1: Examples of images from the dataset – (a) archaeological site, (b) fresco, (c) monastery

2.2. Methodology

The methodology used in this study consists of several steps. First, pre-trained architectures are used to extract features from the images. The features are then normalized, after which classification is performed using traditional ML algorithms. Finally, the performance of each model is evaluated.

2.2.1. Pre-trained architectures

In this study, five transfer learning architectures were used for feature extraction, namely MobileNet, DenseNet121, Xception, InceptionV3, and EfficientNetB0. The loaded weights were pre-trained on ImageNet.

The main idea behind MobileNet is that it is built using depthwise separable convolutions. It consists of 28 layers, where all layers are followed by batch normalization and use the ReLU activation function, except for the last fully connected layer, which uses a softmax function [19].

DenseNet121 is a pre-trained architecture consisting of 120 convolutional layers, four average pooling layers, and one fully connected layer. DenseNet is very similar to ResNet, with the main difference being the concatenation of the output feature maps with the inputs [20].

InceptionV3 is a deep learning architecture that applies convolutional transformations and max-pooling in parallel within each module and then concatenates the results into an output. InceptionV3 consists of 42 layers.

Xception is a deep convolutional neural network architecture that is based on depthwise separable convolution layers [21]. It is an extension of the Inception architecture, but instead of the Inception modules it uses depthwise separable convolutions. The Xception architecture is 71 layers deep and consists of 36 convolutional layers.

EfficientNetB0 is a deep convolutional neural network that was developed in an attempt to answer the question of whether there is a way to scale up ConvNets in order to obtain better accuracy and efficiency [22]. This was made possible by jointly scaling the network depth, channel width, and image resolution, as the authors proposed [22]. EfficientNetB0 is 237 layers deep.
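To make the feature extraction step concrete, the sketch below shows how per-image feature vectors can be obtained from one of the pre-trained backbones (DenseNet121 is used as an example). It is a minimal sketch assuming the Keras applications API with ImageNet weights; the file handling, batching, and min-max normalization shown here are illustrative assumptions rather than the exact implementation used in this study.

```python
import numpy as np
from tensorflow.keras.applications import DenseNet121
from tensorflow.keras.applications.densenet import preprocess_input
from tensorflow.keras.preprocessing import image

# Pre-trained backbone without its classification head; global average
# pooling collapses the final feature maps into one vector per image.
backbone = DenseNet121(weights="imagenet", include_top=False,
                       pooling="avg", input_shape=(150, 150, 3))

def extract_features(paths):
    """Return a (n_images, n_features) array for a list of image file paths."""
    batch = np.stack([
        image.img_to_array(image.load_img(p, target_size=(150, 150)))
        for p in paths
    ])
    batch = preprocess_input(batch)  # architecture-specific input scaling
    return backbone.predict(batch, verbose=0)

# Example usage (hypothetical file lists), followed by min-max normalization:
# X_train = extract_features(train_paths)
# X_train = (X_train - X_train.min(0)) / (X_train.max(0) - X_train.min(0) + 1e-9)
```

The other backbones (MobileNet, InceptionV3, Xception, EfficientNetB0) can be swapped in the same way, each with its own preprocess_input function.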
2.2.2. ML algorithms

Classification was performed using four ML algorithms, in particular RF, the Multilayer Perceptron classifier (MLPC), Naïve Bayes, and Decision tree.

RF is an ensemble machine learning algorithm based on bootstrap aggregation. It constructs a number of decision trees that work as an ensemble, where each tree predicts a class and the prediction outcome is the class with the most votes [23]. The RF classification process starts by randomly selecting samples from the training set, constructing a decision tree for each sample, generating an output for each decision tree, and finally selecting the most voted outcome.

MLP is a feedforward neural network-based algorithm that consists of an input layer, one or more hidden layers, and an output layer. The algorithm works by assigning weights to the inputs of a neuron, summing the weighted inputs and passing the sum through an activation function, and then propagating the result to the next layer, until the output layer is reached. The error between the expected and the actual output is then calculated and backpropagated through the network, with the aim of minimizing the cost function.

The Naïve Bayes classifier is based on Bayes' theorem and assumes strong feature independence, meaning that each feature is not affected by the other features. The algorithm works by first calculating the prior probabilities for each class, then the likelihood probabilities, and finally the posterior probabilities for each class. The prediction outcome is the class with the highest posterior probability.

The Decision tree algorithm is a supervised learning approach that uses a tree-structured classifier to perform classification. Decision trees consist of nodes and branches, namely the root node, which represents the dataset; branches, which represent the decision rules; and leaf nodes, which represent the outcomes.

The configuration of each algorithm was kept the same across all experiments. The RF model consisted of 70 trees, and the quality of a split was measured using Gini impurity. Naïve Bayes and Decision tree used the default configuration as described in the scikit-learn documentation [24], while the MLPC consisted of three hidden layers with 150, 100, and 50 neurons, respectively, with the rectified linear unit (ReLU) activation function and a stochastic gradient-based optimizer (Adam). The maximum number of iterations was set to 300, the L2 regularization term was set to 0.0001, and the learning rate was set to 0.001.

2.2.3. Performance evaluation

Performance evaluation for each model was done using widely known metrics: precision, recall, F1-score, and accuracy.

Precision is the ratio of true positive cases (TP) to the total number of predicted positive cases (the sum of TP and false positive cases, FP), and is calculated as:

P = TP / (TP + FP)    (1)

Recall is the ratio of correctly predicted positive cases to all cases in the positive class, i.e., the number of TP divided by the sum of TP and false negative (FN) cases:

R = TP / (TP + FN)    (2)

The F1-score represents the harmonic mean of precision and recall, and can be calculated as:

F1-score = 2 · (P · R) / (P + R)    (3)

Finally, accuracy is calculated from TP, FP, FN, and true negative (TN) cases as:

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (4)
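As an illustration of how the configurations above translate into code, the following sketch sets up the four classifiers and reports the metrics defined in Eqs. (1)–(4). It assumes scikit-learn, pre-extracted feature matrices and labels with placeholder names, and the Gaussian variant of Naïve Bayes, so it should be read as a sketch rather than the exact implementation used in this study.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report

classifiers = {
    # 70 trees, split quality measured with Gini impurity
    "RF": RandomForestClassifier(n_estimators=70, criterion="gini"),
    # three hidden layers (150, 100, 50), ReLU, Adam, 300 iterations,
    # L2 term alpha=0.0001, initial learning rate 0.001
    "MLPC": MLPClassifier(hidden_layer_sizes=(150, 100, 50),
                          activation="relu", solver="adam", max_iter=300,
                          alpha=0.0001, learning_rate_init=0.001),
    # default configurations as described in the scikit-learn documentation
    "Naive Bayes": GaussianNB(),
    "Decision tree": DecisionTreeClassifier(),
}

def evaluate_all(X_train, y_train, X_test, y_test):
    for name, clf in classifiers.items():
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)
        print(name, "accuracy:", accuracy_score(y_test, y_pred))
        # per-class and macro-averaged precision, recall, and F1-score
        print(classification_report(y_test, y_pred, digits=2))
```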
3. Results and discussion

While feature extraction was performed using the pre-trained architectures, classification was done using the traditional ML models. The smallest differences in performance between the pre-trained architectures were observed for RF: for features extracted using MobileNet, DenseNet121, Xception, and EfficientNetB0 the obtained accuracy is 97.78%, while for InceptionV3 the accuracy is lower, 88.89%. For features extracted using DenseNet121 and InceptionV3, the MLPC obtained a 100% accuracy, while for EfficientNetB0, MobileNet, and Xception the accuracies were 97.78%, 91.11%, and 86.67%, respectively. In contrast, Naïve Bayes and Decision tree obtained the lowest performance in terms of accuracy. When feature extraction was performed using EfficientNetB0, Naïve Bayes obtained a 100% accuracy; however, for the other pre-trained architectures the accuracies range from 53.33% (InceptionV3) to 95.56% (MobileNet). Considering classification using the Decision tree, the highest accuracy was obtained when features were extracted using DenseNet121 and Xception (82.22%), while for the other pre-trained architectures the accuracies range from 73.33% (MobileNet and InceptionV3) to 75.56% (EfficientNetB0). These results are presented in Table 1.

Table 1
Model accuracies

Pre-trained architecture   Metric         RF      MLPC    Naïve Bayes   Decision tree
MobileNet                  Accuracy (%)   97.78   91.11   95.56         73.33
                           Precision      0.98    0.91    0.96          0.73
                           Recall         0.98    0.93    0.96          0.76
                           F1-score       0.98    0.91    0.96          0.74
DenseNet121                Accuracy (%)   97.78   100     64.44         82.22
                           Precision      0.98    1.00    0.64          0.82
                           Recall         0.98    1.00    0.83          0.88
                           F1-score       0.98    1.00    0.63          0.82
Xception                   Accuracy (%)   97.78   86.67   84.44         82.22
                           Precision      0.98    0.87    0.84          0.82
                           Recall         0.98    0.90    0.89          0.83
                           F1-score       0.98    0.87    0.84          0.81
InceptionV3                Accuracy (%)   88.89   100     53.33         73.33
                           Precision      0.89    1.00    0.53          0.73
                           Recall         0.91    1.00    0.77          0.76
                           F1-score       0.89    1.00    0.49          0.73
EfficientNetB0             Accuracy (%)   97.78   97.78   100           75.56
                           Precision      0.98    0.98    1.00          0.76
                           Recall         0.98    0.98    1.00          0.76
                           F1-score       0.98    0.98    1.00          0.76

Note: Macro average values of precision, recall, and F1-score are shown. Macro average values are calculated as the arithmetic mean of the individual classes' scores.

It is interesting to observe which classes the models confuse the most. When DenseNet121 was used for feature extraction, RF misclassified only one image of an archaeological site as a fresco, while Naïve Bayes classified 10 images of frescoes as archaeological sites and six images of frescoes as monasteries. The Decision tree misclassified five images of archaeological sites as frescoes and three images of archaeological sites as monasteries (Figure 2). Hence, MLPC was found to be the most successful in classifying images of cultural heritage when the DenseNet121 architecture is used for feature extraction.

Figure 2: Confusion matrices obtained from traditional ML classification when the DenseNet121 architecture was used for feature extraction
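The confusion matrices discussed here and below (Figures 2–6) can be derived from the test-set predictions of each classifier. A minimal sketch is shown next; it assumes scikit-learn and the classifier dictionary from the previous sketch, and the class label strings and variable names are placeholders.

```python
from sklearn.metrics import confusion_matrix

CLASSES = ["archaeological site", "fresco", "monastery"]  # placeholder label values

def print_confusion_matrices(classifiers, X_train, y_train, X_test, y_test):
    for name, clf in classifiers.items():
        clf.fit(X_train, y_train)
        # rows correspond to the true class, columns to the predicted class
        cm = confusion_matrix(y_test, clf.predict(X_test), labels=CLASSES)
        print(name)
        print(cm)
```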
Considering the EfficientNetB0 architecture, Naïve Bayes correctly classified all the images from the test set, while RF incorrectly classified only one image belonging to the class of archaeological sites as a monastery, and MLPC incorrectly classified one image of a fresco as an archaeological site. However, the Decision tree incorrectly classified three images belonging to the fresco class as archaeological sites, one image of a monastery as an archaeological site, three images of archaeological sites as frescoes, one image of a monastery as a fresco, and three images of archaeological sites as monasteries (Figure 3).

Figure 3: Confusion matrices obtained from traditional ML classification when the EfficientNetB0 architecture was used for feature extraction

When using the InceptionV3 architecture, the best performing model is MLPC, as it classified all images from the test set correctly. RF incorrectly classified four images of archaeological sites as frescoes and one image of a monastery as an archaeological site. Furthermore, the Decision tree algorithm misclassified two images of archaeological sites as frescoes, three images of frescoes as archaeological sites, five images of frescoes as monasteries, and two images of monasteries as archaeological sites. Finally, Naïve Bayes misclassified one image of an archaeological site as a fresco, seven images of frescoes as archaeological sites, and 13 images of frescoes as monasteries. Hence, MLPC is clearly the best choice when using the InceptionV3 architecture for feature extraction, while Naïve Bayes is the worst choice, as it misclassified most of the images, assigning them to the archaeological sites and monasteries classes (Figure 4).

Figure 4: Confusion matrices obtained from traditional ML classification when the InceptionV3 architecture was used for feature extraction

Classification of features extracted using the MobileNet architecture produced good results for most models. In particular, RF misclassified only one image of an archaeological site as a fresco, and Naïve Bayes incorrectly classified one image of a fresco as an archaeological site and one image of a fresco as a monastery. MLPC misclassified one image of a monastery as an archaeological site and three images of monasteries as frescoes. Finally, the Decision tree algorithm misclassified one image of a fresco as an archaeological site, three images of monasteries as archaeological sites, six images of frescoes as monasteries, and two images of monasteries as frescoes (Figure 5).

Figure 5: Confusion matrices obtained from traditional ML classification when the MobileNet architecture was used for feature extraction

When using the Xception architecture, all models misclassified at least one image from the test set. RF misclassified one image of a fresco as an archaeological site, while MLPC misclassified three images of monasteries as archaeological sites and three images of monasteries as frescoes. Naïve Bayes incorrectly classified six images of frescoes as archaeological sites and one image of a fresco as a monastery. Finally, the Decision tree incorrectly classified one image of an archaeological site as a fresco, three images of frescoes as archaeological sites, three images of monasteries as archaeological sites, and one image of a monastery as a fresco (Figure 6).

Figure 6: Confusion matrices obtained from traditional ML classification when the Xception architecture was used for feature extraction

These findings suggest the following. The RF algorithm obtained the best results when feature extraction was performed using MobileNet, DenseNet121, Xception, or EfficientNetB0. For these four pre-trained architectures, RF reached the same value of accuracy, precision, recall, and F1-score of 0.98. For MLPC, the highest values of the performance metrics were obtained when features were extracted using the DenseNet121 and InceptionV3 architectures. In contrast, features extracted using the Xception architecture and classified using MLPC reached an accuracy of 86.67%. Naïve Bayes obtained its highest accuracy of 100% when the EfficientNetB0 architecture was used for feature extraction. However, when feature extraction was performed using InceptionV3, the accuracy of the Naïve Bayes classification was only slightly above 50%, i.e., 53.33%. Finally, considering the Decision tree, this algorithm performed best when features were extracted using the DenseNet121 and Xception architectures, with accuracies of 82.22%. However, when using the MobileNet, InceptionV3, and EfficientNetB0 architectures, its performance is somewhat weaker.

In terms of misclassified samples, the results show that in most cases the RF model misclassified images of archaeological sites as frescoes, suggesting that this algorithm is not able to completely differentiate between the features that represent these two classes. Naïve Bayes, on the other hand, frequently confused images of frescoes, classifying them as monasteries or archaeological sites.
Furthermore, the MLPC made only a few incorrect classifications, in which it misclassified images of monasteries as archaeological sites or as frescoes. Finally, the Decision tree was not able to successfully differentiate between the classes, as it made incorrect classifications in each class.

Based on the above results, it can be concluded that RF and MLPC are the most suitable algorithms for classification of cultural heritage images when pre-trained architectures are used for feature extraction. Comparing the obtained results to the results in [18], it can be concluded that using transfer learning approaches for feature extraction improves the classification performance on small sets containing images of cultural heritage. In terms of precision, recall, and F1-score, the performance of RF in [18] reached 0.93, while in this study, when using transfer learning for feature extraction, the RF algorithm obtained a precision, recall, and F1-score of 0.98 for four out of the five pre-trained architectures. This is a significant improvement that confirms the suitability of transfer learning approaches for feature extraction.

4. Conclusion

This study aimed to observe the differences in classification performance of traditional ML algorithms on a small set of cultural heritage images whose features were extracted using five pre-trained deep learning architectures. The dataset used in this study consists of only 150 cultural heritage images belonging to three classes (50 images per class). Feature extraction was performed using the MobileNet, DenseNet121, EfficientNetB0, InceptionV3, and Xception architectures, while classification was done using the RF, Decision tree, MLPC, and Naïve Bayes algorithms. The results suggest that the best performance was reached using the RF and MLPC algorithms, especially when the extraction of features was done using the DenseNet121 architecture. Although the differences in the performance of the RF and MLPC algorithms between the pre-trained architectures are not large, the results of this study confirm the importance of a careful selection of the feature extraction method. Finally, it should be noted that the Decision tree obtained the lowest performance, with accuracies across the pre-trained architectures ranging between 73.33% and 82.22%.

5. Acknowledgements

This work was supported by the Ministry of Education, Science and Technological Development of the Republic of Serbia through the Mathematical Institute of the Serbian Academy of Sciences and Arts.

6. References

[1] M. Fiorucci, M. Khoroshiltseva, M. Pontil, A. Traviglia, A. Del Bue, S. James, Machine learning for cultural heritage: A survey, Pattern Recognition Letters 133 (2020) 102-108.
[2] A. Belhi, A. Bouras, A. K. Al-Ali, S. Foufou, A machine learning framework for enhancing digital experiences in cultural heritage, Journal of Enterprise Information Management (2020).
[3] S. Zhou, Y. Xie, Intelligent restoration technology of mural digital image based on machine learning algorithm, Wireless Communications and Mobile Computing (2022).
[4] R. Hermoza, I. Sipiran, 3D reconstruction of incomplete archaeological objects using a generative adversarial network, in: Proceedings of Computer Graphics International, 2018, pp. 5-11.
[5] R. Janković, Machine learning models for cultural heritage image classification: Comparison based on attribute selection, Information 11 (2019).
[6] D. M. S. Arsa, A. A. N. H. Susila, VGG16 in batik classification based on random forest, in: International Conference on Information Management and Technology, ICIMTech, IEEE, 2019, pp. 295-299.
[7] A. J. Prieto, A. Silva, J. de Brito, J. M. Macías-Bernal, F. J. Alejandre, Multiple linear regression and fuzzy logic models applied to the functional service life prediction of cultural heritage, Journal of Cultural Heritage 27 (2017) 20-35.
[8] I. Wachtel, R. Zidon, S. Garti, G. Shelach-Lavi, Predictive modeling for archaeological site locations: Comparing logistic regression and maximal entropy in north Israel and north-east China, Journal of Archaeological Science 92 (2018) 28-36.
[9] L. Chen, J. Chen, Q. Zou, K. Huang, Q. Li, Multi-view feature combination for ancient paintings chronological classification, Journal on Computing and Cultural Heritage 10 (2017) 1-15.
[10] U. Kulkarni, S. M. Meena, S. V. Gurlahosur, U. Mudengudi, Classification of cultural heritage sites using transfer learning, in: Fifth International Conference on Multimedia Big Data, BigMM, IEEE, 2019, pp. 391-397.
[11] A. Belhi, A. Bouras, S. Foufou, Leveraging known data for missing label prediction in cultural heritage context, Applied Sciences 8 (2018) 1768.
[12] A. Yasser, K. Clawson, C. Bowerman, M. Lévêque, Saving cultural heritage with digital make-believe: machine learning and digital techniques to the rescue, in: Proceedings of the 31st British Computer Society Human Computer Interaction Conference, ACM Press, 2017, pp. 1-5.
[13] J. Llamas, P. M. Lerones, R. Medina, E. Zalama, J. Gomez-Garcia-Bermejo, Classification of architectural heritage images using deep learning techniques, Applied Sciences 7 (2017) 1-26.
[14] M. Ćosović, R. Janković, CNN classification of the cultural heritage images, in: 19th International Symposium INFOTEH-JAHORINA, IEEE, 2020, pp. 1-6.
[15] A. Belhi, H. O. Ahmed, T. Alfaqheri, A. Bouras, A. H. Sadka, S. Foufou, Study and evaluation of pre-trained CNN networks for cultural heritage image classification, in: Data Analytics for Cultural Heritage, Springer, Cham, 2021, pp. 47-69.
[16] M. Sabatelli, M. Kestemont, W. Daelemans, P. Geurts, Deep transfer learning for art classification problems, in: Proceedings of the European Conference on Computer Vision Workshops, 2018.
[17] P. Navarro, C. Cintas, M. Lucena, J. M. Fuertes, C. Delrieux, M. Molinos, Learning feature representation of Iberian ceramics with automatic classification models, Journal of Cultural Heritage 48 (2021) 65-73.
[18] R. Janković, Classifying cultural heritage images by using decision tree classifiers in WEKA, in: Proceedings of the 1st International Workshop on Visual Pattern Extraction and Recognition for Cultural Heritage Understanding, co-located with the 15th Italian Research Conference on Digital Libraries, IRCDL, 2019, pp. 119-127.
[19] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861, 2017.
[20] N. Hasan, Y. Bao, A. Shawon, Y. Huang, DenseNet convolutional neural networks application for predicting COVID-19 using CT image, SN Computer Science 2 (2021) 1-11.
[21] F. Chollet, Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2017, pp. 1251-1258.
[22] M. Tan, Q. Le, EfficientNet: Rethinking model scaling for convolutional neural networks, in: International Conference on Machine Learning, PMLR, 2019, pp. 6105-6114.
[23] L. Breiman, Random forests, Machine Learning 45 (2001) 5-32.
[24] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825-2830.