A deep learning-based approach for segmenting and
counting reproductive organs from digitized
herbarium specimen images using refined Mask
Scoring R-CNN
Abdelaziz Triki1 , Bassem Bouaziz1 , Jitendra Gaikwad2 and Walid Mahdi1
1
    MIRACL/CRNS, University of Sfax, Tunisia
2
    Friedrich Schiller University Jena, Germany


                                         Abstract
The accurate segmentation and counting of reproductive organs within herbarium specimens play an important role in studying the impact of climate change on plant development over time. Recently, researchers have gained considerable knowledge about plant phenology owing to herbaria's digitization efforts, which may accelerate plant phenology research by making large digitized specimen collections publicly available. Nevertheless, the automatic segmentation and counting of reproductive organs remains a challenging problem because of the high variability of these organs, which differ in size, shape, orientation, and color. Machine learning techniques, including deep learning, have recently been shown to be helpful in this endeavor. We propose in this paper a deep learning method based on a refined Mask Scoring R-CNN to segment and count reproductive organs, including buds, flowers, and fruits, from specimen images. Our proposed method achieved a precision rate of 94.5% and a recall rate of 93%.

                                         Keywords
Reproductive Organs, Mask Scoring R-CNN, Digitized Herbarium Specimen Images


   Nowadays, the impact of climate change on organisms has received significant attention from scientists. Shifts in phenological events are particularly important where species fail to react phenologically to climate change, with serious consequences for long-term diversity preservation [1].
   Information about plant phenology found on herbarium specimens, including flowers, buds, and fruits, has recently been shown to be of considerable importance. Herbaria across the globe have collected millions of physical specimen sheets, whose physical-only form limits their use and accessibility by scientists. Yet these specimen sheets contain information relevant to research questions such as the flowering period [2].

   To make millions of stored specimen images more easily accessible, herbaria worldwide
have started a massive digitization process [3] of their collections, providing not only access
to meta-data, but also to the content of the specimen. Consequently, scientists employed the
digitized herbarium collections to track the effects of phenological events on climate change
across locations and time [4].
   The remainder of this paper is structured as follows. Section 1 reviews related work on the segmentation and counting of reproductive organs, Section 2 describes the specimen dataset, and Section 3 presents our proposed approach. Section 4 reports the experiments we conducted to assess how well our method performs, Section 5 discusses the results, and Section 6 presents the conclusion and future directions.


1. Related Work
Numerous studies using digitized herbarium specimen (DHS) images have focused on a single
phenological event such as the flowering period to investigate climate change. Only a few
studies have investigated several reproductive organs at the same time (such as measuring the
fruit area) and determined how different phenological events are linked [5].
   To estimate flowering time, Park et al. [6] proposed the first initiative, which exploits phenological annotations, such as whether or not flowers or fruits are present, together with the day of the year on which the specimen was gathered. Nevertheless, that study completely overlooked other organs such as buds, seeds, and roots.
   Recently, with the advent of deep learning techniques, several methods based on Convolutional Neural Networks (CNNs) have been developed to assess the phenological stage of a specimen. The study by Lorieul et al. [7] showed that a CNN-based method could detect the presence or absence of fruits or flowers within DHS images, although it ignored buds. However, the detection accuracy for flowers and fruits was low (≈ 85% and ≈ 80%, respectively). Besides, Ellwood et al. [8] established a new method for
classifying reproductive organs into two major groups: fertile and sterile specimens. However,
the proposed approach is unable to segment and count the reproductive organs accurately
within the DHS images.
   On the other hand, Goëau et al. [9] developed a deep learning-based approach using Mask R-CNN [10] to segment and count the reproductive organs of a small dataset containing one species (Streptanthus tortuosus). The testing findings showed that flowers within DHS images are more consistently recognized and counted than fruits. To overcome the limitations of Goëau's approach, Davis et al. [11] built a deep learning model that copes with a larger number of species. Their solution locates and counts buds, flowers, and fruits within the specimen sheets, and was trained on 3000 specimens of five common wildflower species from the eastern United States of America. According to the experimental results, the counting was accurate but varied across the reproductive organs: flower counts were less accurate than bud and fruit counts, due to the morphological variability of flowers within the scanned specimens.
   In this paper, we developed a deep learning-based approach for segmenting and counting
reproductive organs, including buds, fruits, and flowers, from DHS images using refined Mask
Scoring R-CNN [12]. Our method generates a boundary mask for each segmented object
and classifies it based on that mask. Furthermore, our suggested solution was trained and
evaluated using a dataset of DHS images collected from the Herbarium Haussknecht of Jena
(https://www.herbarium.uni-jena.de/).


2. Specimen Dataset Collection
We used specimens of several species gathered from the Herbarium Haussknecht of Jena, Germany, to train and test our proposed approach. This dataset comprises 4000 DHS images covering different families of species with clear phenotypic changes over the reproductive cycle, so that no single morphology dominates.
   The collected specimen images, which contain objects of varied sizes and shapes, were resized from 6400 × 3400 pixels to 2048 × 1024 pixels (height × width). The resizing minimizes unnecessary memory usage while maintaining the aspect ratio of the scanned herbarium sheets. Moreover, each specimen was annotated by hand with the VGG Image Annotator (VIA) tool [13]; the resulting JSON file stores the bounding mask of each reproductive organ inside the scanned specimen image. Because of their irregular forms, organs are delineated with bounding polygons rather than rectangles.
   Finally, the dataset was split at random into two subsets: 70% was used to train our approach, and 30% was used to perform the test phase.
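   For concreteness, the following is a minimal sketch of this preparation pipeline in Python, assuming a VIA 2.x polygon JSON export; the directory layout, file names, and helper names are hypothetical and not part of the original pipeline.

```python
import json
import random
from pathlib import Path

from PIL import Image

TARGET_HEIGHT = 2048  # target height used in this study (originals are 6400 px tall)

def resize_specimen(src: Path, dst: Path) -> float:
    """Resize a scanned sheet to the target height, keeping its aspect ratio,
    and return the applied scale factor."""
    img = Image.open(src)
    scale = TARGET_HEIGHT / img.height
    img.resize((round(img.width * scale), TARGET_HEIGHT), Image.LANCZOS).save(dst)
    return scale

def rescale_via_polygons(via_json: Path, scale: float) -> dict:
    """Rescale VIA polygon annotations so the organ masks stay aligned
    with the resized images."""
    project = json.loads(via_json.read_text())
    for entry in project.values():
        for region in entry.get("regions", []):
            shape = region["shape_attributes"]  # VIA polygon: all_points_x/_y
            shape["all_points_x"] = [round(x * scale) for x in shape["all_points_x"]]
            shape["all_points_y"] = [round(y * scale) for y in shape["all_points_y"]]
    return project

# 70/30 random train/test split of the annotated sheets
sheets = sorted(Path("dhs_images").glob("*.jpg"))
random.seed(42)
random.shuffle(sheets)
cut = int(0.7 * len(sheets))
train_sheets, test_sheets = sheets[:cut], sheets[cut:]
```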


3. Proposed Approach
The quantitative analysis of reproductive organs within DHS images, including buds, flowers, and fruits, will substantially enhance the use of specimen collections in climate change research. Given the massive number of DHS images stored in the Haussknecht herbarium in Jena and the diversity of the reproductive organs appearing in the herbarium scans (in scale, orientation, shape, and color), we propose a deep learning-based approach for counting these reproductive organs and improving their segmentation accuracy. The developed approach is a refined version of the state-of-the-art instance segmentation technique Mask Scoring RCNN [12], in which the backbone was altered by proposing a new Feature Pyramid Network (FPN).

3.1. Mask Scoring RCNN
Mask Scoring RCNN [12] is an improved variant of Mask R-CNN [10] with better instance segmentation accuracy. In comparison to Faster R-CNN [14], Mask R-CNN supports pixel-level instance segmentation and is more efficient in terms of time and accuracy. However, Mask R-CNN uses the classification confidence as the quality metric of a predicted mask, and mask accuracy is often uncorrelated with classification confidence, resulting in sub-optimal precision and reliability of the predicted masks. Mask Scoring RCNN overcomes this problem by learning to predict mask quality with a dedicated network block: this block takes the instance-specific features and the predicted mask as inputs and regresses the intersection-over-union (IoU) between the predicted mask and the ground truth. The predicted IoU is then used to score the quality of the mask, which enables Mask Scoring RCNN to achieve much better instance segmentation results than Mask R-CNN.
   Architecturally, the Mask Scoring RCNN model is divided into three parts. The first part is a standard convolutional backbone that extracts fine-grained features. The resulting feature maps are fed into the Region Proposal Network (RPN), which generates and refines region proposals. The last part of the model fine-tunes the bounding boxes and the resulting masks.
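   As a sketch of the scoring idea, the MaskIoU head below fuses RoI features with the predicted mask and regresses a per-class mask IoU. The layer sizes here are simplified assumptions; the head in [12] uses four convolutions and three fully connected layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskIoUHead(nn.Module):
    """Simplified MaskIoU head: regresses the IoU between the predicted mask
    and the ground truth from RoI features concatenated with that mask."""

    def __init__(self, in_channels: int = 256, num_classes: int = 3):
        super().__init__()
        # the predicted mask is stacked onto the RoI features as an extra channel
        self.convs = nn.Sequential(
            nn.Conv2d(in_channels + 1, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 7 * 7, 1024), nn.ReLU(),
            nn.Linear(1024, num_classes),  # one IoU estimate per class
        )

    def forward(self, roi_feats: torch.Tensor, pred_mask: torch.Tensor):
        # roi_feats: (N, 256, 14, 14); pred_mask: (N, 1, 28, 28)
        mask = F.max_pool2d(pred_mask, kernel_size=2)        # 28x28 -> 14x14
        x = self.convs(torch.cat([roi_feats, mask], dim=1))  # -> (N, 256, 7, 7)
        return self.fc(x)

# the final mask score multiplies the classification confidence by the
# predicted mask IoU, so low-quality masks are penalized
```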

3.2. Feature Extraction
Deep convolutional networks with many weight layers are frequently employed for image feature extraction. However, simply stacking more convolution layers can raise the training error and lower the classification accuracy. ResNet effectively tackles this issue by learning residual mappings between inputs and outputs, which speeds up training and improves prediction accuracy.
   The conventional Mask Scoring RCNN model uses ResNet [15] in conjunction with FPN [16] as the backbone to extract image features. However, extracting the reproductive organs from DHS images with this backbone may produce irrelevant predictions and lower performance, because the large scale variations among organs affect how well they can be extracted from DHS images.
   Scale variation is one of the weaknesses of object instance segmentation with Mask Scoring R-CNN. Owing to the wide variety of object sizes and shapes within the DHS images, detectors have difficulty segmenting the smallest objects, such as buds. To remedy this, we present in this paper a refined FPN that enhances the backbone structure of the Mask Scoring R-CNN model.
   The conventional FPN of Mask Scoring R-CNN comprises a bottom-up and a top-down path, forming a two-channel feature extraction model. The high-level features are thereby enriched, and all FPN features with a decent classification ability are improved. Nevertheless, this two-channel path does not provide sufficient coverage to propagate semantic feature information between the bottom layer and the upper layer of the pyramid network.
   To resolve the issues involved with the conventional two-channel FPN pathways, we developed a three-channel FPN pathway. As illustrated in Figure 1, more paths are available to propagate the features, which improves segmentation accuracy and reduces the training time. Adding the third channel path updates the model as follows (a loose sketch is given after Figure 1):

    • A 3 × 3 convolution kernel with a step size of 3 is applied to layer a to reduce the spatial size.
    • A lateral join with the second-channel feature layer x creates the merged feature layer.
    • As in the second step, layers are added one at a time until layer d is reached. The final a, b, c layers constitute the third-channel feature extraction layers added along the bottom-up path.
  Figure 1 represents the refined FPN structure, which facilitates continuous information
propagation and improves the segmentation performance.




Figure 1: Refined FPN structure
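   The following is a loose sketch of the added bottom-up third channel described above: each extra layer downsamples its predecessor with a 3 × 3 convolution and is laterally merged with the corresponding second-channel map. A stride of 2 is used here so the shapes align with standard FPN levels (the text above states a step size of 3); the channel count and number of levels are illustrative assumptions, not the exact configuration.

```python
import torch.nn as nn

class ThirdChannelPath(nn.Module):
    """Sketch of the third (bottom-up) channel added to the two-channel FPN.
    Each level downsamples the previous third-channel layer and merges it
    laterally with the corresponding second-channel feature map."""

    def __init__(self, channels: int = 256, num_levels: int = 4):
        super().__init__()
        self.downsample = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1)
            for _ in range(num_levels - 1)
        )

    def forward(self, second_channel_feats):
        # second_channel_feats: FPN maps ordered from the highest resolution
        # (layer a) down to the coarsest level (layer d)
        outs = [second_channel_feats[0]]
        for conv, lateral in zip(self.downsample, second_channel_feats[1:]):
            outs.append(conv(outs[-1]) + lateral)  # downsample, then lateral join
        return outs
```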




4. Experimental Results
In this paper, we implemented our proposed approach using Facebook's open-source Maskrcnn-Benchmark project (https://github.com/facebookresearch/maskrcnn-benchmark) and employed Google Colaboratory to perform the training and testing phases. Our approach was trained for 100 epochs of 300 steps each, using stochastic gradient descent (SGD) with a learning rate of 0.001, a momentum of 0.9, and a weight decay of 0.0005.
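   A minimal sketch of this training schedule is given below, assuming `model` and `train_loader` come from the maskrcnn-benchmark setup; the loop structure itself is an illustration, not the library's own trainer.

```python
import itertools

import torch

# SGD with the hyperparameters reported above
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.001, momentum=0.9, weight_decay=0.0005
)

for epoch in range(100):  # 100 epochs ...
    for images, targets in itertools.islice(train_loader, 300):  # ... of 300 steps
        loss_dict = model(images, targets)  # box, mask, and mask-IoU losses
        loss = sum(loss_dict.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```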
   During the test phase, we randomly selected 300 DHS images of five common species (Aplopappus stoloniferous DC. var.; Eupatorium scordonioides A. Gray; Penstemon pulchellus Lindl.; Senecio chapalensis Watson; and Russelia trachypleura Rob.). The reproductive organs on each herbarium sheet vary in size and position. We identified 1450 reproductive organs to serve as ground truth data for this research, comprising 262 buds, 483 fruits, and 705 flowers.
   To examine the model's consistency and measure the segmentation performance, we computed the Precision (Eq. 1), Recall (Eq. 2), and Average Precision (AP) metrics for our method, the original Mask Scoring R-CNN, and Mask R-CNN, using ResNet-50/101 backbone networks.

$$\text{Precision} = \frac{\text{TruePositives}}{\text{TruePositives} + \text{FalsePositives}} \tag{1}$$

$$\text{Recall} = \frac{\text{TruePositives}}{\text{TruePositives} + \text{FalseNegatives}} \tag{2}$$
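   In code, these metrics reduce to a few lines; here a detection counts as a true positive when its predicted mask overlaps a ground-truth organ above a chosen IoU threshold (the matching itself is assumed to happen upstream).

```python
def precision(tp: int, fp: int) -> float:
    """Eq. 1: fraction of predicted organs that match a ground-truth organ."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Eq. 2: fraction of ground-truth organs that were detected."""
    return tp / (tp + fn)
```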
   As shown in Table 1, our proposed method achieved an overall precision of 88.9% and an overall recall of 89.1%.
Table 1
The precision/recall of our approach
                         Metrics       Buds    Fruits    Flowers    Overall
                         Precision     86%     89.2%     91.6%      88.9%
                         Recall        88.5%   91.7%     87.1%      89.1%


   A total of 19 buds, for example, were mistakenly labeled as flowers owing to the overlapping of various organs within the DHS images.
   According to the segmentation findings in Table 1, our method can correctly identify the presence or absence of reproductive organs within the DHS images, and segmenting smaller objects (such as buds) under varying occlusion levels is also feasible. Our architecture performs best on flower segmentation, achieving a precision of 91.6% and a recall rate of 87.1%.

4.1. The impact of data augmentation approaches on segmentation results
Over the last few years, deep learning methods have shown impressive object segmentation results. However, data scarcity remains a significant drawback, since large labeled datasets are required to improve model accuracy and prevent overfitting. As a workaround for the absence of large annotated datasets, we applied several data augmentation strategies, including rotation, translation, scaling, and brightness changes [17], and studied the effect of these augmentation techniques on the segmentation of reproductive organs (a sketch of such a pipeline is shown below).
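   As a sketch, the strategies named above could be expressed with torchvision transforms as follows; the parameter ranges are assumptions, and for instance segmentation the same geometric transform must also be applied to the polygon masks.

```python
import torchvision.transforms as T

# rotation, translation, and scaling in one affine transform, plus brightness
augment = T.Compose([
    T.RandomAffine(degrees=15, translate=(0.05, 0.05), scale=(0.9, 1.1)),
    T.ColorJitter(brightness=0.2),
])
```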
   To examine our model's stability when counting the reproductive organs, we built a confusion matrix (Table 2), which illustrates the reliability of our approach by comparing the actual and predicted counts of the different reproductive organs within the DHS images.
   As shown in Tables 2 and 3, raising the amount of training data through data augmentation improved our model's performance significantly: the overall precision of our approach increased by 5.6% and the overall recall by 3.8%. This demonstrates that the proposed method with data augmentation enhances the model's capacity to conduct instance segmentation.

4.2. The counting evaluation
To evaluate the counting capability of our proposed approach, we measured the Mean Absolute
Error (MAE) for each reproductive organ within the DHS images. The MAE is a common metric
used to evaluate the model’s accuracy. It measures the average absolute value of the counting
error of each reproductive organ as follows (Eq. 3):
$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| x_i - x \right| \tag{3}$$

   where $n$ is the number of test samples, $x_i$ is the predicted count, and $x$ is the corresponding true count.
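   Equivalently, as a small helper (a sketch; per-organ lists of predicted and true counts are assumed):

```python
def mean_absolute_error(predicted: list[int], actual: list[int]) -> float:
    """Eq. 3: average absolute difference between predicted and true counts."""
    n = len(predicted)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / n

# e.g. mean_absolute_error(buds_predicted, buds_true) for the bud organ class
```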
Table 2
The per-species confusion matrix of our approach when applying the data augmentation approaches (ground-truth vs. detected counts)

  Species Names              No. of DHS   Buds (GT / Det.)   Fruits (GT / Det.)   Flowers (GT / Det.)
  Aplopappus stoloniferous   54           53 / 49            97 / 94              142 / 139
  Eupatorium scordonioides   46           49 / 44            88 / 82              134 / 130
  Penstemon pulchellus       65           46 / 39            89 / 80              127 / 120
  Senecio chapalensis        72           51 / 43            101 / 92             140 / 136
  Russelia trachypleura      63           63 / 56            108 / 103            162 / 158
  Total                      300          262 / 228          483 / 451            705 / 683

Table 3
The precision/recall of our approach when using the data augmentations
                           Metrics     Buds    Fruits    Flowers    Overall
                           Precision   93.8%   96.1%     93.6%      94.5%
                           Recall      87%     93.3%     98.2%      93%

Table 4
Predicted and true counts of reproductive organs, including buds, flowers, and fruits
                                                           Buds    Fruits      Flowers   All
         True Number of reproductive organs                262     483         705       1450
         Predicted Number of reproductive organs           228     451         683       1362
         Mean Absolute Error (MAE)                         0.34    0.32        0.22      0.8


   As shown in Table 4, the MAE of our approach was very low for all kinds of reproductive organs, with the lowest value obtained for the flower organs.

4.3. Experimental results and evaluation of multiple segmentation approaches
To assess the efficacy of our proposed approach in segmenting the reproductive organs from the
herbarium sheets, we compared it to other state-of-the-art approaches with various backbone
models. Table 5 presents the findings of the assessment. By employing ResNet101 as our
backbone network, our method outperformed Mask Scoring RCNN and Mask-RCNN when
identifying reproductive organs.
   Using the original Mask Scoring RCNN approach, many reproductive organs were incorrectly segmented, resulting in lower overall results (an overall precision of 84% and an overall recall of 86%).
Table 5
The comparison results of our approach using different ResNet50/101 backbones
                      Models                            Precision   Recall
                      Original Mask Scoring RCNN
                                                        86.3%       88%
                      with ResNet50
                      Original Mask Scoring RCNN
                                                        84%         86%
                      with ResNet101
                      Mask-RCNN with ResNet50           84%         82%
                      Mask-RCNN with ResNet101          85%         83%
                      Our approach with ResNet50        90.6%       91.3%
                      Our approach with ResNet101       94.5%       93%


   Furthermore, the accuracy curve in Figure 2 confirms the reliability of our proposed model and gives additional insight into its behavior. Figure 3 shows that the loss stabilizes after 100 epochs, while model training takes around 21 hours. With ResNet101 as the backbone network, the total loss was minimized and the accuracy increased during the training phase.




Figure 2: Accuracy curve of our proposed approach




5. Discussion
We presented in this paper a new deep learning-based approach for segmenting and counting reproductive organs, such as buds, flowers, and fruits, from DHS images. For this research, 4000 different herbarium images containing reproductive organs were manually annotated. However, several factors affect the performance of our proposed approach, including the size of the training dataset and the high variability of the reproductive organs within the scanned herbarium sheets, which vary in size, shape, orientation, and color.
Figure 3: Loss curve of our proposed approach

   We also investigated the effect of organ size on the segmentation of reproductive organs within the herbarium sheets: smaller objects, such as buds, perform worse than larger ones. Buds perform worse for several reasons. One is the small number of bud organs found in the DHS images, where about 70% of the samples contained none. Another is visual distinctiveness: flowers and fruits are more visually distinctive, whereas buds are smaller and less distinctive. As a result, flowers are the best-segmented organ, followed by fruits and buds, because they are larger than buds and not as widely distributed as fruits and buds (Figure 4).
   Further analysis of the counting results presented in the confusion matrix (Table 2) and in Table 4 reveals that the overall number of reproductive organs counted by our approach was quite close to the actual value (ground truth). For example, the lowest MAE was obtained for the flower counts, owing to the morphological uniformity and abundance of flowers within the herbarium sheets: as shown in Table 2, the total number of counted flower organs was 683, against a true value of 705 (MAE = 0.22). Conversely, as a result of their irregular shape and scarcity within the DHS images, the MAE for counting buds was considerably higher than for flowers or fruits (MAE = 0.34).
   By comparing our approach with other state-of-the-art techniques, we found that our method
performed well for segmenting and counting the reproductive organs within the DHS images
(Table 5). The precision and recall reached 94.5% and 93%, respectively. However, additional
research is needed to determine if variations in the appearance of reproductive organs among
dried and pressed specimens complicate the automated segmentation of phenological features.


6. Conclusion and Future Directions
To segment and count the reproductive organs within DHS images, we presented in this paper a deep learning-based approach in which the FPN network was refined into a three-channel pathway. Our method focuses on precisely segmenting buds, fruits, and flowers of different sizes and orientations within the DHS images, and then predicts the number of reproductive organs by counting their occurrences. Based on the findings, our approach can segment reproductive organs with high precision and recall.

Figure 4: Outputs of our proposed approach
   It is important to note that the success of our method also depends on the size of the training dataset. Enriching the dataset with specimens from various spatio-temporal scales may thus significantly improve performance.


Acknowledgments
This work was part of the MAMUDS project (Management Multimedia Data for Science),
supported by BMBF, Germany (Project No. 01D16009) and MHESR, Tunisia.


References
 [1] S. Piao, Q. Liu, A. Chen, I. Janssens, Y. H. Fu, J. Dai, L. Liu, X. Lian, M. Shen, X. Zhu, Plant
     phenology and global climate change: Current progresses and challenges., Global change
     biology 25 6 (2019) 1922–1940.
 [2] T. Borsch, A. Stevens, E. Häffner, A. Güntsch, W. Berendsohn, M. Appelhans, C. Bari-
     laro, B. Beszteri, F. Blattner, O. Bossdorf, H. Dalitz, S. Dressler, R. Duque-Thüs, H. Esser,
     A. Franzke, D. Goetze, M. Grein, U. Grünert, F. Hellwig, J. Hentschel, E. Hörandl, T. Janssen,
     N. Jürgens, G. Kadereit, T. Karisch, M. Koch, F. Müller, J. Müller, D. Ober, S. Porembski,
     P. Poschlod, C. Printzen, M. Röser, P. Sack, P. Schlüter, M. Schmidt, M. Schnittler, M. Scholler,
     M. Schultz, E. Seeber, J. Simmel, M. Stiller, M. Thiv, H. Thüs, N. Tkach, D. Triebel, U. Warnke,
     T. Weibulat, K. Wesche, A. M. Yurkov, G. Zizka, A complete digitization of German herbaria
     is possible, sensible and should be started now, Research Ideas and Outcomes 6 (2020).
 [3] P. W. Sweeney, B. Starly, P. J. Morris, Y. Xu, A. Jones, S. Radhakrishnan, C. Grassa, C. Davis,
     Large-scale digitization of herbarium specimens: Development and usage of an automated,
     high-throughput conveyor system, Taxon (2018).
 [4] D. F. Lima, J. H. F. Mello, I. T. Lopes, R. Forzza, R. Goldenberg, L. Freitas, Phenological
     responses to climate change based on a hundred years of herbarium collections of tropical
     melastomataceae, PLoS ONE 16 (2021).
 [5] C. G. Willis, E. R. Ellwood, R. B. Primack, C. C. Davis, K. D. Pearson, A. S. Gallinat,
     J. M. Yost, G. Nelson, S. J. Mazer, N. L. Rossington, T. H. Sparks, P. S. Soltis, Old
     plants, new tricks: Phenological research using herbarium specimens, Trends in Ecology
     and Evolution 32 (2017) 531–546. URL: https://www.sciencedirect.com/science/article/pii/
     S0169534717300939. doi:https://doi.org/10.1016/j.tree.2017.03.015.
 [6] D. Park, A. Williams, E. Law, A. Ellison, C. Davis, Assessing plant phenological patterns in
     the eastern united states over the last 120 years, Environmental Data Initiative (2018).
 [7] T. Lorieul, K. D. Pearson, E. R. Ellwood, H. Goëau, J. Molino, P. W. Sweeney, J. Yost, J. Sachs,
     E. Mata-Montero, G. Nelson, P. Soltis, P. Bonnet, A. Joly, Toward a large-scale and deep
     phenological stage annotation of herbarium specimens: Case studies from temperate,
     tropical, and equatorial floras, Applications in Plant Sciences 7 (2019).
 [8] E. R. Ellwood, K. D. Pearson, G. Nelson, Emerging frontiers in phenological research,
     Applications in Plant Sciences 7 (2019).
 [9] H. Goëau, A. Mora-Fallas, J. Champ, N. Love, S. Mazer, E. Mata-Montero, A. Joly, P. Bonnet,
     A new fine-grained method for automated visual analysis of herbarium specimens: A case
     study for phenological data extraction, Applications in Plant Sciences 8 (2020).
[10] K. He, G. Gkioxari, P. Dollár, R. Girshick, Mask R-CNN, in: 2017 IEEE International
     Conference on Computer Vision (ICCV), 2017, pp. 2980–2988. doi:10.1109/ICCV.2017.322.
[11] C. Davis, J. Champ, D. S. Park, I. K. Breckheimer, G. M. Lyra, J. Xie, A. Joly, D. Tarapore,
     A. Ellison, P. Bonnet, A new method for counting reproductive structures in digitized
     herbarium specimens using Mask R-CNN, Frontiers in Plant Science 11 (2020).
[12] Z. Huang, L. Huang, Y. Gong, C. Huang, X. Wang, Mask Scoring R-CNN, 2019 IEEE/CVF
     Conference on Computer Vision and Pattern Recognition (CVPR) (2019) 6402–6411.
[13] A. Dutta, A. Zisserman, The VIA annotation software for images, audio and video, Pro-
     ceedings of the 27th ACM International Conference on Multimedia (2019).
[14] S. Ren, K. He, R. B. Girshick, J. Sun, Faster R-CNN: Towards real-time object detection
     with region proposal networks, IEEE Transactions on Pattern Analysis and Machine
     Intelligence 39 (2015) 1137–1149.
[15] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, 2016 IEEE
     Conference on Computer Vision and Pattern Recognition (CVPR) (2016) 770–778.
[16] T.-Y. Lin, P. Dollár, R. B. Girshick, K. He, B. Hariharan, S. J. Belongie, Feature pyramid
     networks for object detection, 2017 IEEE Conference on Computer Vision and Pattern
     Recognition (CVPR) (2017) 936–944.
[17] C. Shorten, T. Khoshgoftaar, A survey on image data augmentation for deep learning,
     Journal of Big Data 6 (2019) 1–48.