=Paper=
{{Paper
|id=Vol-2696/paper_158
|storemode=property
|title=Herbarium-Field Triplet Network for Cross-domain Plant Identification. NEUON Submission to LifeCLEF 2020 Plant
|pdfUrl=https://ceur-ws.org/Vol-2696/paper_158.pdf
|volume=Vol-2696
|authors=Sophia Chulif,Yang Loong Chang
|dblpUrl=https://dblp.org/rec/conf/clef/ChulifC20
}}
==Herbarium-Field Triplet Network for Cross-domain Plant Identification. NEUON Submission to LifeCLEF 2020 Plant==
<pdf width="1500px">https://ceur-ws.org/Vol-2696/paper_158.pdf</pdf>
<pre>
      Herbarium-Field Triplet Network for
       Cross-Domain Plant Identification
    NEUON Submission to LifeCLEF 2020 Plant

                       Sophia Chulif and Yang Loong Chang

      Department of Artificial Intelligence, NEUON AI, 94300 Sarawak, Malaysia
                                  https://neuon.ai/
                      {sophiadouglas,yangloong}@neuon.ai


        Abstract. This paper presents the implementation and performance of
        a Herbarium-Field triplet loss network that evaluates the herbarium-field
        similarity of plants, which corresponds to the cross-domain plant
        identification challenge in PlantCLEF 2020. A two-streamed triplet loss
        network is trained to maximize the embedding distance of different plant
        species and at the same time minimize the embedding distance of the same
        plant species given herbarium-field pairs. The team submitted seven runs,
        of which the best achieved a Mean Reciprocal Rank score of 0.121 and 0.108
        for the whole test set and the sub-set of the test set respectively.

        Keywords: Cross-domain plant identification, computer vision, triplet
        loss, convolutional neural networks


1     Introduction
Plant specimens in herbaria have been used by novices and experts alike to
study and confirm plant species, as well as for many other useful applications
described in [4]. Many works are being carried out to improve the access to and
preservation of these specimens, as they are considerably less expensive to
obtain than field images. Despite the large collections available, the
application of herbarium specimens to the identification of real-world plants
requires more research [14].
    The objective in PlantCLEF 2020 [5,6] involves a task of cross-domain plant
classification between herbarium specimens and field (real-world plant) images.
In this paper, we present our approach using a two-streamed network, namely the
Herbarium-Field triplet loss network, to evaluate the similarity of
herbarium-field pairs corresponding to the aforementioned task.
    We adopt the triplet loss function to optimize the plant embeddings, which
regulate the measure of plant similarity. The implemented network is trained to
maximize the embedding distance of herbarium-field pairs from different species
and minimize the embedding distance of pairs from the same species. It learns
the similarity between herbarium sheets and field images instead of directly
classifying plant species as conventional convolutional neural networks (CNN)
[9] do.

    Copyright (c) 2020 for this paper by its authors. Use permitted under Creative
    Commons License Attribution 4.0 International (CC BY 4.0). CLEF 2020, 22-25
    September 2020, Thessaloniki, Greece.

Fig. 1. The triplet loss concept mainly revolves around minimizing the distances
between samples of the same class and maximizing the distances between samples
of different classes. (a) shows two classes with their herbarium counterparts;
the image embedding is compared with its own herbarium and the herbarium from
another class (as indicated by the arrows). (b) The distances between
herbarium-field pairs of the same species have to be less than those of
herbarium-field pairs of different species (the red and blue boxes denote the
class labels).


2   Related Works


FaceNet: A Unified Embedding for Face Recognition and Clustering
The authors in [12] introduce a triplet loss function that uses a CNN to
optimize face embeddings which correspond to a measure of face similarity.
Instead of training an intermediate layer, the embeddings are directly optimized
in a Euclidean space for face verification. Likewise, this triplet loss function
is adopted in our networks to learn the optimized plant embeddings.


Plant Disease Recognition with Siamese Network The authors in [2] introduce
Few-Shot Learning algorithms that classify leaf images with deep learning. They
employ a Siamese network with triplet loss and show that high accuracy can be
achieved with small datasets. In addition, the authors in [3] address the
classification problem using real-world images. They also show that the image
embeddings extracted from the employed Siamese network perform better than
transfer learning. In the same way, we employ a two-streamed triplet loss
network which works similarly to classify plants, utilising the herbarium and
field embeddings.
      Fig. 2. Network Architecture of the Herbarium-Field Triplet Loss Network.


3     Methodology

This section describes our approach in PlantCLEF 2020, the implemented network
architecture and the training stages involved. The training process is split
into three stages: the pre-trained Herbarium network, the pre-trained Field
network and the two-stream triplet loss network. The Herbarium and Field
networks are trained individually to construct networks that can model
generalized herbarium and field features. A triplet network is then employed to
model the triplet distances between herbarium and field features. The network is
trained so that (i) the herbarium features (or embeddings) of a species are
closer to the field features of the same class, and (ii) the herbarium features
of a species are further from the field features of a different class. Fig. 1
illustrates the concept of triplet learning for herbarium-field pairs.
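
The two conditions above can be illustrated with the basic triplet loss of [12].
The following is only a minimal sketch in plain Python: the margin value and the
function names are assumptions made for illustration, and it is not the
semi-hard variant used in our implementation.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embeddings of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Basic triplet loss: zero once the anchor is closer to the positive
    (same species) than to the negative (different species) by `margin`.
    The margin of 1.0 is an assumed value, not taken from the paper."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(gap + margin, 0.0)


# Herbarium embedding as anchor, field embeddings as positive and negative.
herbarium = [1.0, 0.0]
field_same_species = [1.0, 0.0]
field_other_species = [0.0, 1.0]
loss_good = triplet_loss(herbarium, field_same_species, field_other_species)
loss_bad = triplet_loss(herbarium, field_other_species, field_same_species)
```

A triplet that already satisfies conditions (i) and (ii) by the margin gives a
loss of zero, while a violated triplet gives a positive loss.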


3.1    Network Architecture

The network architecture implemented in our approach is illustrated in Fig. 2.
This Herbarium-Field triplet loss network is constructed with two Inception-v4
CNNs [13], namely the Herbarium CNN and the Field CNN, which were initialized
with weights pre-trained on PlantCLEF 2020 [5] and PlantCLEF 2017 [7]
respectively. Both networks are formed to cater for the generalization of
herbarium and field features. At the final embedding layer of each network, a
batch normalization layer is added and its output is fed into a fully-connected
layer, which reduces the feature size from 1536 to 500. Subsequently, these
outputs are L2 normalized in the L2 layer and concatenated to give an output
size of (n + m) × 500, where n and m are the batch sizes of the Herbarium and
Field networks respectively. This concatenated embedding is then passed into the
triplet loss layer (see footnote 1), through which the network learns to compute
the herbarium and field embeddings with respect to their optimum embedding
space. The network is trained to maximize the embedding distance of different
species in herbarium-field pairs and minimize the embedding distance of the
same species. The classification of species depends on the computed embedding
space, in which a large embedding distance denotes different species and a
small embedding distance indicates the same species. Two training methods were
investigated, i.e., frozen front layers and non-frozen front layers.


Frozen Front Layers In this method, the front layers of the pre-trained
Herbarium and Field networks, or simply the extractor layers of the networks,
are frozen. Only the weights in the newly added layers are therefore updated.


Non-Frozen Layers This method, on the other hand, trains all layers in the
network. It allows the network to relearn and recompute the embeddings of
herbarium and field images with respect to their optimized embedding space
from the triplet loss. The new layers are set to have a higher learning rate
than the migrated layers.


3.2    Training stages

Herbarium Network As mentioned in Section 3.1, a Herbarium network based on
the Inception-v4 model [13] is set up to make up the Herbarium-Field triplet
loss network. The Herbarium network is initialized with weights pre-trained on
ImageNet [11] and trained with the PlantCLEF 2020 dataset (herbarium images) [5].

Field Network Likewise, the Field network adopts the Inception-v4 [13] network
architecture. It is also initialized with weights pre-trained on ImageNet [11]
but trained with the PlantCLEF 2017 dataset (field images) [7] instead.

Herbarium-Field Triplet Loss Network Once the Herbarium and Field networks are
trained, the Herbarium-Field Triplet Loss network is set up. The network is
trained with the PlantCLEF 2020 dataset [5], consisting of both herbarium and
field images. The network trained in the Non-Frozen Layers setup uses a learning
rate of 0.00001 in the migrated layers and 0.0001 in the newly added layers,
whereas the Frozen Front Layers setup uses a learning rate of zero in the
migrated layers.
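
The learning-rate assignment of the two setups can be summarised as a small
lookup. This is a hedged sketch only: the group and setup labels are
hypothetical names, and the rate of the newly added layers in the frozen setup
is assumed to equal the initial learning rate of 0.0001.

```python
# Rates quoted in this section; the labels below are hypothetical.
MIGRATED_LR = 0.00001
NEW_LAYER_LR = 0.0001


def learning_rate(layer_group, setup):
    """Return the learning rate of a layer group under a training setup.

    layer_group: "migrated" (pre-trained Inception-v4 layers) or
                 "new" (layers added on top of the final embedding layer).
    setup:       "frozen" (Frozen Front Layers) or
                 "non_frozen" (Non-Frozen Layers).
    """
    if layer_group not in ("migrated", "new"):
        raise ValueError("unknown layer group: %r" % layer_group)
    if setup not in ("frozen", "non_frozen"):
        raise ValueError("unknown setup: %r" % setup)
    if layer_group == "new":
        # Assumption: new layers use the same rate in both setups.
        return NEW_LAYER_LR
    return 0.0 if setup == "frozen" else MIGRATED_LR
```

A learning rate of zero leaves the migrated weights unchanged, which is how the
frozen setup is expressed here.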

1   The triplet loss is computed using the triplet semi-hard loss function
    provided in TensorFlow 1.13 [1].
           Table 1. Training dataset distribution for different networks

                                   Number of images        Number of classes
  Network                         Herbarium       Field    Herbarium    Field
  Herbarium                         305,531           -          997        -
  Field                                   -   1,187,484            -   10,000
  Herbarium-Field Triplet Loss      197,552       6,257          435      435


4     Training Setup

4.1   Data Preparation

As mentioned in the task description, field images were provided for only a
subset of species to allow learning a mapping between the herbarium and field
domains. We separated the species which possess both herbarium and field images
to be used for the mapping. Out of 997 classes, 435 classes were identified as
having both herbarium and field images. These classes were then used for
training. Although the number of training classes was reduced from 997 to 435
species, the network was still trained to map the embedding space of 997
classes.
    During the training of the Herbarium-Field triplet loss network, the images
used for each batch were picked to be balanced for each class. For instance, in a
batch of size 16, each class may not comprise more than 4 images, meanwhile the
minimum number of images in each class is 2. This allows a balanced selection
of anchors for the triplet loss.
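
One way to draw such a batch is sketched below. The sketch assumes a mapping
from class label to image identifiers; the function name and the greedy filling
strategy are illustrative assumptions and not necessarily the sampler used in
our training code.

```python
import random


def sample_balanced_batch(images_by_class, batch_size=16,
                          min_per_class=2, max_per_class=4, rng=None):
    """Draw (label, image) pairs so that every class present in the batch
    contributes between min_per_class and max_per_class images."""
    rng = rng or random.Random()
    labels = list(images_by_class)
    rng.shuffle(labels)
    batch = []
    for label in labels:
        remaining = batch_size - len(batch)
        if remaining == 0:
            break
        upper = min(max_per_class, len(images_by_class[label]), remaining)
        # Keep only counts that leave the rest of the batch fillable.
        options = [k for k in range(min_per_class, upper + 1)
                   if remaining - k == 0 or remaining - k >= min_per_class]
        if not options:
            continue
        for image in rng.sample(images_by_class[label], max(options)):
            batch.append((label, image))
    if len(batch) != batch_size:
        raise ValueError("not enough images to fill a balanced batch")
    return batch


# Toy data: 10 classes with 3 images each (hypothetical identifiers).
toy = {"class_%d" % i: ["img_%d_%d" % (i, j) for j in range(3)]
       for i in range(10)}
batch = sample_balanced_batch(toy, rng=random.Random(0))
```

Because every class in the batch has at least two images, each anchor has at
least one positive sample of the same class available.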


4.2   Data Augmentation

In order to improve the network generalization and increase the training sample
size, data augmentation was applied to the training images. Random cropping,
horizontal flipping and colour distortion (brightness, saturation, hue, and
contrast) were performed on the training dataset. As a result, the network can
learn features that are invariant to these transforms and to their original
locations, consequently reducing the chance of overfitting [10].


4.3   Training Dataset and Hyperparameters

The training dataset distributions and network setup parameters are summarized
in Table 1 and Table 2 respectively.


5     Experiments

The experiments were conducted using TensorFlow 1.13 [1] alongside the slim
packages. The code is available at
https://github.com/NeuonAI/plantclef2020_challenge
                          Table 2. Network training parameters

  Parameter               Herbarium and Field Network   Herbarium-Field Triplet Loss Network
  Batch Size              256                           16
  Input Image Size        299 × 299 × 3                 299 × 299 × 3
  Optimizer               Adam Optimizer [8]            Adam Optimizer [8]
  Initial Learning Rate   0.0001                        0.0001
  Weight Decay            0.00004                       0.00004
  Loss Function           Softmax Cross Entropy         Triplet Loss


5.1     Dataset

Due to the limited field training samples, a sample of images from each of the
“herbarium photo associations” and “photo” folders was randomly set aside for
validation prior to training. In total, 1,219 field images were separated to
form the test set, leaving 5,038 field images for training instead of the 6,257
stated in Table 1. The number of images and classes present in the experimented
training and testing datasets are summarized in Table 3. Nevertheless, the
number of classes in the Herbarium-Field triplet loss network remains 997 in the
Herbarium stream and 10,000 in the Field stream.


       Table 3. Dataset of experimented Herbarium-Field Triplet Loss Network.

                                      Herbarium              Field
                                    Train      Test      Train     Test
  Number of images                153,867    43,685      5,038    1,219
  Number of classes present           435       434        435      345


5.2     Inference Procedure

Herbarium dictionary
For inference, the embeddings of the 997 herbarium classes were first extracted
using the trained Herbarium-Field triplet loss network to form the reference
embeddings, which serve as a herbarium dictionary. Random samples from each
class were picked and fed into the network to obtain the embeddings. The
extracted embeddings were then averaged to get a single embedding
representation for each class, which was subsequently saved in the dictionary.
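
The averaging step can be sketched as follows. This is a minimal plain-Python
illustration in which embeddings are lists of floats; the function name and the
toy values are assumptions, and real embeddings would have 500 dimensions.

```python
def build_herbarium_dictionary(embeddings_by_class):
    """Average the sampled embeddings of each class into one reference
    embedding per class."""
    dictionary = {}
    for label, embeddings in embeddings_by_class.items():
        count = len(embeddings)
        # zip(*embeddings) iterates over the embedding dimensions.
        dictionary[label] = [sum(dim) / count for dim in zip(*embeddings)]
    return dictionary


# Toy example with 2-dimensional embeddings (hypothetical values).
samples = {
    "species_a": [[1.0, 0.0], [0.0, 1.0]],
    "species_b": [[2.0, 2.0], [4.0, 0.0], [0.0, 4.0]],
}
herbarium_dictionary = build_herbarium_dictionary(samples)
```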
    Note that the extraction was done with two different types of image
cropping, namely Center Crop, and Center and Corner Crop. The Center Crop
approach crops the centre region of the herbarium sample. The Corner Crop
approach, on the other hand, crops the top left, top right, bottom left, and
bottom right regions of the herbarium sample. Each region was cropped and
resized, then passed into the network for the extraction of herbarium
embeddings.

      Table 4. Validation Accuracy with Center Crop Herbarium Extraction.

                            Top 1                         Top 5
  Networks        Center Crop   Center Crop +   Center Crop   Center Crop +
                                Corner Crop                   Corner Crop
  FL                27.48 %       28.63 %         50.78 %       52.42 %
  NFL               32.65 %       32.73 %         59.97 %       58.98 %
  NFL-ENS           36.42 %       37.33 %         65.14 %       67.51 %
  NFL-AUG           18.05 %       18.46 %         42.49 %       42.49 %
  NFL-AUG-ENS       36.42 %       37.33 %         65.14 %       67.51 %

Table 5. Validation Accuracy with Center and Corner Crop Herbarium Extraction.

                            Top 1                         Top 5
  Networks        Center Crop   Center Crop +   Center Crop   Center Crop +
                                Corner Crop                   Corner Crop
  FL                27.40 %       29.20 %         50.78 %       52.17 %
  NFL               33.06 %       34.29 %         59.80 %       58.98 %
  NFL-ENS           36.10 %       37.57 %         63.82 %       66.45 %
  NFL-AUG           18.29 %       18.79 %         41.84 %       42.74 %
  NFL-AUG-ENS       36.10 %       37.57 %         63.82 %       66.45 %
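
The Center and Corner Crop regions described in this section can be sketched as
five crop boxes. The crop size is an assumption here, as it is not stated in the
text, and the function name is hypothetical.

```python
def five_crop_boxes(width, height, crop_w, crop_h):
    """Return (left, top, right, bottom) boxes for the centre crop and the
    four corner crops of an image of size width x height."""
    if crop_w > width or crop_h > height:
        raise ValueError("crop is larger than the image")
    left_c = (width - crop_w) // 2
    top_c = (height - crop_h) // 2
    return {
        "center": (left_c, top_c, left_c + crop_w, top_c + crop_h),
        "top_left": (0, 0, crop_w, crop_h),
        "top_right": (width - crop_w, 0, width, crop_h),
        "bottom_left": (0, height - crop_h, crop_w, height),
        "bottom_right": (width - crop_w, height - crop_h, width, height),
    }


# Example: a 400 x 300 image with an assumed 200 x 100 crop.
boxes = five_crop_boxes(400, 300, 200, 100)
```

Each box would then be cropped and resized to the 299 × 299 network input.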

Feature similarity
After obtaining the single embedding representation of each class, the saved
dictionary is used to compare the embedding distance between the 997 herbarium
representations and the test image. During validation, Center and Corner Crop
were also applied together with horizontal flip in obtaining the test images’
embeddings. This resulted in 10 different variations for each image, which were
then averaged to obtain their similarity probability. Cosine similarity was
used as the similarity measure between embeddings. The cosine distance was then
obtained by subtracting the cosine similarity from 1. Finally, inverse distance
weighting was performed on the cosine distance to obtain the probabilities of
each class.
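
The chain from cosine similarity to class probabilities can be sketched as
below. The power of 1 in the inverse distance weighting and the small epsilon
that guards against division by zero are assumptions, since neither is specified
in the text; the function names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def class_probabilities(query, dictionary, eps=1e-12):
    """Turn cosine distances to every class embedding into probabilities
    by inverse distance weighting (weight = 1 / distance)."""
    weights = {}
    for label, reference in dictionary.items():
        distance = 1.0 - cosine_similarity(query, reference)
        weights[label] = 1.0 / (distance + eps)  # eps is an assumed guard
    total = sum(weights.values())
    return {label: weight / total for label, weight in weights.items()}


# Toy herbarium dictionary with 2-dimensional embeddings.
toy_dictionary = {"species_a": [1.0, 1.0], "species_b": [0.0, 1.0]}
probabilities = class_probabilities([1.0, 0.0], toy_dictionary)
```

Classes whose herbarium embedding lies closer to the test embedding receive a
larger share of the probability mass.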

5.3   Network and Results
The experimental results are tabulated in Table 4 and Table 5 for the Center
Crop and the Center and Corner Crop herbarium extraction methods respectively.
The networks were tested on the same validation set of 1,219 images, on which
the Top 1 and Top 5 predictions were evaluated. Center Crop and Corner Crop
were also applied on the field test set before validation. Five different
Herbarium-Field triplet loss networks were tested, i.e.:

Network 1: Frozen Front Layers (FL) A network trained with frozen front
layers.
Network 2: Non-Frozen Layers (NFL) A network trained with non-frozen
layers, or put simply, trained with all layers.


Network 3: Non-Frozen Layers Ensemble Model (NFL-ENS) An ensemble
of 3 different models trained on all layers.


Network 4: Non-Frozen Layers Increased Augmentation (NFL-AUG)
A network trained with all layers whereby the training images were pre-processed
with more transformations and augmentation.


Network 5: Non-Frozen Layers Increased Augmentation Model En-
semble (NFL-AUG-ENS) An ensemble of Network 3 and Network 4.
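
The Top 1 and Top 5 figures reported in this section can be read as top-k
accuracy over ranked predictions. A minimal sketch follows, in which the
function name and toy labels are illustrative.

```python
def top_k_accuracy(rankings, truths, k):
    """Fraction of samples whose true label is among the first k entries
    of the ranked prediction list."""
    hits = sum(1 for ranking, truth in zip(rankings, truths)
               if truth in ranking[:k])
    return hits / len(truths)


# Three toy samples whose true label is always "a".
toy_rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
toy_truths = ["a", "a", "a"]
top1 = top_k_accuracy(toy_rankings, toy_truths, 1)
top2 = top_k_accuracy(toy_rankings, toy_truths, 2)
```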


5.4   Discussion

From the experiments, it can be seen that the NFL ensemble models performed
the best among the networks. The ensemble of these networks increased the
robustness of the system and returned better predictions. On the other hand, the
FL network performed worse than the NFL network, which suggests that training
all layers helps the prediction model more than freezing the front (extractor)
layers of the network. It can also be seen that the ensemble model with
increased augmentation performed the same as the ensemble model without
increased augmentation, which suggests that the increased augmentation may not
have produced enough significant new information for the network to learn.
Since a portion of the field images was separated from the training set to
serve as the test set, some of the classes may miss some field information. In
addition, the trained model does not represent all classes, as some classes
have no field images. Consequently, the networks did not perform as well
because they were not fed with sufficient images to represent the field domain.
One approach to increasing the prediction accuracy would be to add field
training samples for the classes that have none in the training set.


6     Submission

6.1   Inference Procedure

The procedure adopted to produce the submitted results is as follows:

   (i) Construct the herbarium dictionary by extracting samples of herbarium
       embeddings for all 997 plant species using the trained Herbarium-Field
       triplet loss network.
       (a) Apply Center and Corner Crops on the images before extraction.
       (b) Average the cropped herbarium embeddings for each species and save
           them.
  (ii) Group the test images belonging to the same observation ID.
 (iii) For each image under the same observation ID, apply Center and Corner
       Crops, which result in 5 images each.
  (iv) Subsequently flip the images horizontally, resulting in 10 images each.
   (v) Average the 10 images and pass them to the Herbarium-Field triplet loss
       network.
  (vi) Obtain the image embeddings.
 (vii) Compute the cosine similarity between each of the extracted embeddings
       and the saved 997 herbarium embeddings.
(viii) Obtain the cosine distance by subtracting the cosine similarity from 1.
  (ix) Apply inverse distance weighting on the cosine distance.
   (x) Obtain the probabilities of the embedding distance.
  (xi) Average the probabilities over the total number of images for each
       observation ID.
 (xii) Repeat steps (iii) to (xi) for the remaining observation IDs.
(xiii) Collect the predictions, probabilities and ranks for each observation ID.
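
The observation-level averaging and ranking in the final steps can be sketched
as follows. The sketch assumes that the per-image class probabilities have
already been computed; the function name and toy values are illustrative.

```python
def aggregate_observation(per_image_probabilities):
    """Average the class probabilities of all images sharing one observation
    ID and return the averaged probabilities with a best-first ranking."""
    count = len(per_image_probabilities)
    labels = per_image_probabilities[0].keys()
    mean = {label: sum(p[label] for p in per_image_probabilities) / count
            for label in labels}
    ranking = sorted(mean, key=mean.get, reverse=True)
    return mean, ranking


# Two images of the same observation (hypothetical probabilities).
observation = [
    {"species_a": 0.6, "species_b": 0.4},
    {"species_a": 0.2, "species_b": 0.8},
]
mean_probabilities, ranked_species = aggregate_observation(observation)
```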


6.2   Submitted Runs

The team submitted a total of seven runs based on the networks described in
Section 5.3.


Run 1 This model was based on FL. Unlike the rest of the runs, this network
was trained with frozen front layers and did not apply image flipping during
validation. Moreover, the embedding distances were normalized, inverted and
then passed through softmax to obtain the probabilities. In addition, the
probabilities were based on the averaged embedding instead of all embeddings
for each observation ID.


Run 2 This model was based on NFL. It is similar to Run 1, but it was trained
with all layers of the network, and the embeddings of each observation ID were
averaged before Cosine Similarity and Inverse Distance Weighting were applied
to obtain the probabilities.


Run 3 This model was based on NFL. It is similar to Run 2, but the
probabilities of each embedding were first computed using Cosine Similarity and
Inverse Distance Weighting and then averaged for each observation ID.


Run 4 This model was based on NFL. It is similar to Run 3, but the
probabilities take into account all embeddings of each observation ID
multiplied by their croppings, which consist of 10 variations.
                   Table 6. MRR Score of the Submitted Runs

                 Run           MRR Whole            MRR Sub-Set
                   7              0.121                 0.107
                   5              0.111                 0.108
                   3              0.103                 0.094
                   2              0.099                 0.076
                   6              0.093                 0.066
                   4              0.088                 0.073
                   1              0.081                 0.061


Run 5 This model was based on NFL-ENS. Unlike Runs 1 to 4, the network was
trained with the full dataset as stated in Table 1. It is also an ensemble of
the predictions from 3 models of the same network.

Run 6 This model was based on NFL-AUG. Like Run 5, it was trained with the
full dataset; however, it is not an ensemble of models and was trained with
increased image processing transformations and augmentations.

Run 7 This model was based on NFL-AUG-ENS. This run is the ensemble of the
predictions from Run 5 and Run 6.

6.3   Submission Results
Our best submitted runs scored a Mean Reciprocal Rank (MRR) of 0.121 and
0.108 for the first and second metric respectively. Our results are tabulated in
Table 6. The results by all the participating teams are summarised in Fig. 3 and
Fig. 4.
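
For reference, the MRR averages the reciprocal of the rank at which the correct
species first appears in each observation's ranked predictions. The sketch below
uses toy labels; scoring a missing label as zero is an assumption about how
unranked species are treated.

```python
def mean_reciprocal_rank(rankings, truths):
    """Average of 1 / rank of the true label over all observations.
    A true label absent from the ranking contributes 0 (assumption)."""
    total = 0.0
    for ranking, truth in zip(rankings, truths):
        if truth in ranking:
            total += 1.0 / (ranking.index(truth) + 1)
    return total / len(truths)


# Three toy observations: correct at rank 1, at rank 2, and not ranked.
toy_rankings = [["a", "b"], ["b", "a"], ["c", "a"]]
toy_truths = ["a", "a", "b"]
mrr = mean_reciprocal_rank(toy_rankings, toy_truths)
```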

6.4   Discussion
Similar to the experimental results, the ensemble models performed the best
among the networks. The ensemble model with increased augmentation (Run 7)
performed best on the whole test set. In addition, the MRR scores of the
networks for the first and second metrics are relatively close despite the few
training photos in the sub-set species. This suggests that the number of
training samples for each class does not directly influence the performance of
the model. Other than filling in the missing training samples of the field
classes, the methods for obtaining the herbarium embedding representation can
also be examined to increase prediction accuracy. Such methods involve finding
the best herbarium dictionary representation. Various image processing methods
such as flipping can be performed before extracting the herbarium embeddings.
Meanwhile, finding the best model of the Herbarium-Field Triplet Loss Network
and using it for the extraction of the herbarium embeddings would be
significant as well.
             Fig. 3. Official Results of PlantCLEF 2020.


Fig. 4. Official Results of PlantCLEF 2020 (Second Metric Evaluation).
                     Table 7. MRR Score of Post-challenge Runs

                 Run               MRR Whole             MRR Sub-Set
                      8              0.101                  0.094
                      9              0.114                  0.105
                     10              0.110                  0.107

Table 8. Post-challenge Validation Accuracy with Center Crop Herbarium Dictionary.

                            Top 1                         Top 5
  Run             Center Crop   Center Crop +   Center Crop   Center Crop +
                                Corner Crop                   Corner Crop
  8                 44.71 %       45.94 %         75.80 %       77.19 %
  9                 36.42 %       37.33 %         65.14 %       67.51 %
  10                36.42 %       37.33 %         65.14 %       67.51 %

Table 9. Post-challenge Validation Accuracy with Center and Corner Crop Herbarium
Dictionary.

                            Top 1                         Top 5
  Run             Center Crop   Center Crop +   Center Crop   Center Crop +
                                Corner Crop                   Corner Crop
  8                 46.02 %       48.32 %         74.98 %       76.95 %
  9                 36.10 %       37.57 %         63.82 %       66.45 %
  10                36.10 %       37.57 %         63.82 %       66.45 %


7   Post-challenge Runs
In addition to the submitted results, the team trained another three runs which
were based on a continuation of Run 6. However, these runs did not outperform
our best submitted runs. Since the runs were trained with the whole dataset, we
believe the drop in performance is due to overfitting, as there was no baseline
to determine when to stop training the model. The MRR scores of the runs are
tabulated in Table 7.

Run 8 This model was based on NFL-AUG. This run was a continuation of the
training from Run 6, trained for more iterations.

Run 9 This model was based on NFL-AUG-ENS. This run was an ensemble of the
Run 8 and Run 5 predictions.

Run 10 This model was based on NFL-ENS. This run was an ensemble of 3
different models from Run 8.

   We tested the post-challenge runs on our segregated test set as well, and
the results are tabulated in Table 8 and Table 9 for the Center Crop and the
Center and Corner Crop herbarium dictionary construction methods respectively.
In contrast with its MRR score, which was the lowest among the post-challenge
runs, Run 8 shows the best performance in the experimental validation setup.
This is likely due to overfitting, as mentioned.


8    Conclusion

In this paper we have presented our approach in PlantCLEF 2020, which focused
on cross-domain plant identification between herbarium sheets and in-field
photos. We adopted a two-streamed Herbarium-Field triplet loss network which
performed comparably even when few field training images were given. The
similar scores for MRR metrics 1 and 2 suggest that the proposed network is not
directly affected by the plant class but learns to perceive the similarity
between a given field image and herbarium images. The results show that even
with a minimal amount of field images for each species, cross-domain plant
identification can be performed. The identification of real-world plants based
on herbarium sheets alone is indeed a challenging task. Although our models did
not perform as well with missing field classes, which is the case in the real
world, the approach shows that with sufficient data it offers a step towards
alleviating the tedious task of herbarium-field classification, which requires
a high level of expertise. For future work, field images for the classes that
are not present in the training dataset can be added to improve the
predictions. This would allow the model to learn the whole representation of
plant species with respect to their herbarium and field domains. Furthermore,
the extraction of herbarium embeddings to form a more powerful dictionary can
be investigated to find the best representation of herbarium embeddings for the
herbarium-field similarity comparison.


Acknowledgment

The resources of this project are supported by NEUON AI SDN. BHD., Malaysia.


References

 1. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado,
    G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A.,
    Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg,
    J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J.,
    Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V.,
    Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng,
    X.: TensorFlow: Large-scale machine learning on heterogeneous systems (2015),
    https://www.tensorflow.org/, software available from tensorflow.org
 2. Argüeso, D., Picon, A., Irusta, U., Medela, A., San-Emeterio, M.G., Bereciartua,
    A., Alvarez-Gila, A.: Few-shot learning approach for plant disease classification
    using images taken in the field. Computers and Electronics in Agriculture 175,
    105542 (2020)
 3. Chandra, M., Patil, P.S., Roy, S., Redkar, S.S.: Classification of various plant dis-
    eases using deep siamese network (2020)
 4. Funk, V.A.: 100 uses for an herbarium: well at least 72. American Society of Plant
    Taxonomists Newsletter (2003)
 5. Goëau, H., Bonnet, P., Joly, A.: Overview of the lifeclef 2020 plant identification
    task. In: CLEF working notes 2020, CLEF: Conference and Labs of the Evaluation
    Forum, Sep. 2020, Thessaloniki, Greece. (2020)
 6. Joly, A., Deneu, B., Kahl, S., Goëau, H., Ruiz De Castaneda, R., Champ, J.,
    Eggel, I., Cole, E., Bonnet, P., Botella, C., Dorso, A., Glotin, H., Lorieul, T.,
    Servajean, M., Stöter, F.R., Vellinga, W.P., Müller, H.: Lifeclef 2020: Biodiversity
    identification and prediction challenges. In: Proceedings of CLEF 2020, CLEF:
    Conference and Labs of the Evaluation Forum, Sep. 2020, Thessaloniki, Greece.
    (2020)
 7. Joly, A., Goëau, H., Glotin, H., Spampinato, C., Bonnet, P., Vellinga, W.P., Lom-
    bardo, J.C., Planque, R., Palazzo, S., Müller, H.: Lifeclef 2017 lab overview: multi-
    media species identification challenges. In: International Conference of the Cross-
    Language Evaluation Forum for European Languages. pp. 255–274. Springer (2017)
 8. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint
    arXiv:1412.6980 (2014)
 9. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep con-
    volutional neural networks. In: Advances in neural information processing systems.
    pp. 1097–1105 (2012)
10. Mikolajczyk, A., Grochowski, M.: Data augmentation for improving deep learn-
    ing in image classification problem. In: 2018 international interdisciplinary PhD
    workshop (IIPhDW). pp. 117–122. IEEE (2018)
11. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z.,
    Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: Imagenet large
    scale visual recognition challenge. International journal of computer vision 115(3),
    211–252 (2015)
12. Schroff, F., Kalenichenko, D., Philbin, J.: Facenet: A unified embedding for face
    recognition and clustering. In: Proceedings of the IEEE conference on computer
    vision and pattern recognition. pp. 815–823 (2015)
13. Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, inception-resnet
    and the impact of residual connections on learning. In: Thirty-First AAAI Confer-
    ence on Artificial Intelligence (2017)
14. Wäldchen, J., Rzanny, M., Seeland, M., Mäder, P.: Automated plant species
    identification—trends and future directions. PLoS computational biology 14(4),
    e1005993 (2018)

</pre>