Sabanci-Okan System in LifeCLEF 2015 Plant Identification Competition

Mostafa Mehdipour Ghazi¹, Berrin Yanikoglu¹, Erchan Aptoula², Ozlem Muslu¹, and Murat Can Ozdemir¹

¹ Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul, Turkey
² Computer Engineering Department, Okan University, Istanbul, Turkey
{mehdipour,berrin,ozlemmuslu,ozdemirmcan}@sabanciuniv.edu
erchan.aptoula@okan.edu.tr

Abstract. We present our deep learning based plant identification system for LifeCLEF 2015. The approach is built on a simple deep convolutional network called PCANet, which does not require large amounts of training data because its weights are learned through principal component analysis. After learning multistage filter banks, simple binary hashing is applied to the filtered data, and features are pooled from block histograms. A multiclass linear support vector machine is then trained, and the system is evaluated on the plant task datasets of LifeCLEF 2014 and 2015. As announced by the organizers, our submission achieved an overall inverse rank score of 0.153 in the image-based task and 0.162 in the observation-based task of LifeCLEF 2015, as well as an inverse rank score of 0.51 on the LeafScan dataset of LifeCLEF 2014.

Keywords: plant identification, deep learning, PCANet, support vector machine, inverse rank score

1 Overview

In recent years, research on automatic plant identification from photographs has concentrated around the annual plant identification competitions organized within the CLEF campaigns, including ImageCLEF [1–3] and LifeCLEF [4–6]. CLEF is devoted to promoting and evaluating multilingual and multimodal information retrieval systems, and the main goal of these competitions is to benchmark the challenging task of content-based identification and retrieval of plant species, which is of immense importance in botany, agriculture, plant taxonomy, pharmacy, and pharmacology. The task is carried out using images of different types of plant parts, such as leaves, branches, stems, flowers, and fruits.

Since 2011, competitive submissions for the plant identification task have been made to ImageCLEF and LifeCLEF, and the participating systems have used widely different approaches; still, the problem is far from solved due to several challenges, including large variations in color, illumination, background, size, and shape. Deep learning approaches are relatively new and well suited to problems with large amounts of intra-class variability [7].

There are two tasks within the LifeCLEF 2015 campaign: image-based and observation-based plant identification. The image-based task requires identification given a single image, while the goal of the observation-based task is to identify plants from a multi-image query. The latter corresponds to the scenario in which a photographer uses the same camera to take snapshots of various organs of a plant species from different views, under similar lighting conditions and on the same day. The campaign started in 2011 with the image-based task covering over 70 tree species, and the observation-based task became the main track in 2014. By 2015 [6], the number of species had reached about 1,000, covering the entire flora of a given region.

In this work, we have utilized a system different from our previous submissions [8–11] to recognize plant species using a new deep convolutional network known as PCANet [7].
Although the PCANet method is suboptimal in comparison with common convolutional neural networks (CNNs) [12, 13], our experiments using PCANet resulted in good performance for aligned images such as scanned leaves.

2 PCANet

PCANet is a recently proposed convolutional network architecture that combines the strengths of principal component analysis (PCA) and deep learning [7]. In comparison with a CNN, which attempts to find optimal filters for feature mapping, PCANet is suboptimal in that it learns its filter banks by applying PCA to the input data. On the other hand, its advantage is that it requires neither large amounts of data nor long training times, while still using the core concepts of the deep convolutional network architecture. The general structure of PCANet and our proposed architecture for plant identification are presented in this section.

2.1 General PCANet Architecture

PCANet starts by applying principal component analysis to overlapping patches of all images. The selected principal components form the first-layer filters, and the projections of the patches onto the principal components form the responses of the units in the first layer. We then repeat this methodology to form a cascaded linear map in the next layers of the deep convolutional network architecture. Next, the method applies binary quantization and hashing to the multi-stage filtered image sets to concatenate them in decimal form. Finally, local histograms are extracted from the blocks of the quantized images, and spatial pyramid pooling is applied to these histograms to extract features. The algorithm is explained in detail as follows.

The training data contains $N$ images $I_i$, $i = 1, 2, \ldots, N$, of size $m \times n$. In the first stage, patches of size $k_1 \times k_2$ pixels are extracted around each pixel of the image $I_i$. Afterwards, all such overlapping patches are collected, vectorized, and mean-subtracted to obtain $X_i$. Repeating this operation for all images, we obtain a patch collection $X$ as

$$X = [X_1, X_2, \ldots, X_N] \in \mathbb{R}^{k_1 k_2 \times N m n} \tag{1}$$

Next, in order to calculate the desired bank of orthonormal filters $V$, PCA minimizes the reconstruction error to compute the $L_1$ principal components. The constrained optimization is formulated as

$$\min_{V \in \mathbb{R}^{k_1 k_2 \times L_1}} \; \| X - V V^T X \|_F^2 \quad \text{subject to} \quad V^T V = I_{L_1} \tag{2}$$

where $\| \cdot \|_F$ is the Frobenius norm and $I_{L_1}$ is the identity matrix of size $L_1 \times L_1$; the solution simply consists of the $L_1$ principal eigenvectors of $X X^T$. The PCA filters of the first layer therefore form the weights $W_{l_1}^1$, $l_1 = 1, 2, \ldots, L_1$, obtained by reshaping the eigenvectors into matrices of size $k_1 \times k_2$. Hence, the $l_1$th filtered image is calculated by convolving the $l_1$th filter with the $i$th patch-mean-removed image $\bar{I}_i$ as

$$I_i^{l_1} = \bar{I}_i * W_{l_1}^1 \tag{3}$$

We can repeat the same approach to learn $L_2$ PCA filters for the second layer, creating double-filtered images. For this purpose, all the overlapping patches of each filtered image $I_i^{l_1}$ are collected, vectorized, and mean-subtracted to obtain $Y_i^{l_1}$. Repeating this procedure for all filtered images, we obtain

$$Y = [Y_1^1, \ldots, Y_N^1, Y_1^2, \ldots, Y_N^2, \ldots, Y_1^{L_1}, \ldots, Y_N^{L_1}] \in \mathbb{R}^{k_1 k_2 \times L_1 N m n} \tag{4}$$

Similarly, the PCA filters of the second layer, $W_{l_2}^2$ for $l_2 = 1, 2, \ldots, L_2$, are obtained by finding the $L_2$ principal eigenvectors of $Y Y^T$ and rearranging them as matrices of size $k_1 \times k_2$.
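For concreteness, the following NumPy sketch implements one stage of the PCA filter learning of Eqs. (1)–(3). It is a minimal illustration under our own assumptions (function names, defaults matching the settings of Section 2.2, interior patches only), not the authors' released code.

```python
import numpy as np
from scipy.signal import convolve2d

def learn_pca_filters(images, k1=7, k2=7, num_filters=10):
    """Learn one stage of PCA filters (Eq. 2) from 2-D grayscale arrays.

    All overlapping k1 x k2 patches are vectorized and patch-mean
    removed; the leading eigenvectors of X X^T are reshaped into
    k1 x k2 filters. Border pixels are skipped for brevity.
    """
    cols = []
    for img in images:
        m, n = img.shape
        for r in range(m - k1 + 1):
            for c in range(n - k2 + 1):
                patch = img[r:r + k1, c:c + k2].astype(float).ravel()
                cols.append(patch - patch.mean())      # patch-mean removal
    X = np.stack(cols, axis=1)                         # (k1*k2, #patches)
    eigvals, eigvecs = np.linalg.eigh(X @ X.T)         # small k1k2 x k1k2 matrix
    order = np.argsort(eigvals)[::-1]                  # descending eigenvalues
    V = eigvecs[:, order[:num_filters]]                # solution of Eq. (2)
    return [V[:, j].reshape(k1, k2) for j in range(num_filters)]

def apply_filters(img, filters):
    """Compute the filtered images of Eq. (3). The paper convolves the
    patch-mean-removed image; the raw image is used here for brevity."""
    return [convolve2d(img.astype(float), W, mode='same') for W in filters]
```

The second-stage filters of Eq. (4) follow from calling `learn_pca_filters` again, this time on the collection of all first-stage feature maps.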
The double-filtered image, computed sequentially with the $l_1$th and $l_2$th filters, is then obtained by convolving the $l_2$th filter with the $i$th patch-mean-removed filtered image $\bar{I}_i^{l_1}$ as

$$O_i^{l_1, l_2} = \bar{I}_i^{l_1} * W_{l_2}^2 \tag{5}$$

As can be seen, the output $O$ contains $L_1 \times L_2$ real-valued double-filtered images per input image. To decrease the number of images, they are first binarized using the Heaviside step function $H(\cdot)$. Next, for each pixel, the $L_2$ quantized binary bits are mapped to a decimal number as

$$T_i^{l_1} = \sum_{l_2=1}^{L_2} 2^{l_2 - 1} H(\bar{I}_i^{l_1} * W_{l_2}^2) \tag{6}$$

In effect, this conversion maps the $L_2$ binary bits taken from corresponding pixels of the double-filtered binary images into a single gray-level pixel in the range $[0, 2^{L_2} - 1]$.

Finally, we partition each of the $L_1$ decimal images $T_i^{l_1}$ into $B$ blocks and compute block histograms (with $2^{L_2}$ bins) over all $L_1$ images as the features of the $i$th image:

$$f_i = [\mathrm{hist}_1(T_i^1), \ldots, \mathrm{hist}_B(T_i^1), \ldots, \mathrm{hist}_1(T_i^{L_1}), \ldots, \mathrm{hist}_B(T_i^{L_1})] \in \mathbb{R}^{1 \times 2^{L_2} L_1 B} \tag{7}$$

where $\mathrm{hist}_j(\cdot)$ denotes the histogram of the $j$th block of the partitioned image. Using local histograms provides translation invariance in the extracted features. Figure 1 displays the block diagram of a two-stage PCANet.

Fig. 1: Block diagram of a two-stage PCANet: the input layer is followed by two stages of mean removal and PCA filtering, and an output layer performing binarization, decimal mapping, and feature pooling.
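The binarization, decimal hashing, and block-histogram pooling of Eqs. (6)–(7) can be sketched as follows; the non-overlapping histogram blocks and the helper names are our assumptions for illustration.

```python
import numpy as np

def hash_to_decimal(responses):
    """Fuse the L2 double-filtered maps of one first-stage image into a
    single decimal image (Eq. 6); H(.) is realized as (O > 0)."""
    T = np.zeros_like(responses[0], dtype=np.int64)
    for l2, O in enumerate(responses, start=1):
        T += (2 ** (l2 - 1)) * (O > 0).astype(np.int64)
    return T                                   # values in [0, 2^L2 - 1]

def block_histograms(T, block_h=20, block_w=10, num_bins=256):
    """Pool per-block histograms of a decimal image (Eq. 7), assuming
    non-overlapping blocks; incomplete border blocks are ignored."""
    feats = []
    H, W = T.shape
    for r in range(0, H - block_h + 1, block_h):
        for c in range(0, W - block_w + 1, block_w):
            block = T[r:r + block_h, c:c + block_w]
            hist, _ = np.histogram(block, bins=num_bins, range=(0, num_bins))
            feats.append(hist)
    return np.concatenate(feats)
```

The feature vector $f_i$ concatenates these block histograms over all $L_1$ decimal images; with $L_2 = 8$ as in Section 2.2, each histogram has $2^8 = 256$ bins.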
2.2 PCANet Architecture for Plant Identification

To process each plant image, color images are converted from RGB to the HSY color space [14] and uniformly scaled to 128 × 128 pixels. We apply PCANet to each color component of the scaled images, using a two-stage convolutional network with $L_1 = 10$ and $L_2 = 8$ filters in the first and second stages, respectively, overlapping image patches of size 7 × 7, and a histogram block size of 20 × 10. Because of the massive size of the data obtained after feature pooling, we chose a multi-class linear support vector machine (SVM) as the final-stage classifier, for reasons of both complexity and accuracy. For the SVM implementation, we used the LIBLINEAR toolbox [15] with the dual L2-regularized L2-loss model and a misclassification penalty cost of one.

2.3 Score Fusion in the Observation-Based Task

For the observation-based task, we applied the Borda count fusion method [16] to the outputs of the proposed system to combine the scores obtained from the different photographs of an individual plant. In this method, each class appearing in the list of top classes returned by the classifier receives a vote that is inversely proportional to its rank in that list. Note that for each observation with k images, there are k such class lists. We modified the Borda count for this problem so that votes are distributed not only to the class itself but also to the members of the same genus, as illustrated in the sketch below.
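A minimal sketch of this modified Borda count follows; since the paper does not specify how votes are shared within a genus, the `genus_weight` parameter and the data layout are our own assumptions.

```python
from collections import defaultdict

def fuse_observation(ranked_lists, genus_of, genus_weight=0.5):
    """Fuse the ranked species lists of one observation's k images.

    A species at rank r receives a vote of 1/r; a reduced vote
    (genus_weight / r, an assumed weighting) also goes to the other
    candidate species of the same genus.
    """
    votes = defaultdict(float)
    candidates = {s for lst in ranked_lists for s in lst}
    for lst in ranked_lists:
        for r, species in enumerate(lst, start=1):
            votes[species] += 1.0 / r
            for other in candidates:
                if other != species and genus_of[other] == genus_of[species]:
                    votes[other] += genus_weight / r
    return sorted(votes, key=votes.get, reverse=True)  # best first

# Example: two images of the same plant, three candidate species.
genus_of = {"Acer campestre": "Acer", "Acer platanoides": "Acer",
            "Quercus robur": "Quercus"}
lists = [["Acer campestre", "Quercus robur", "Acer platanoides"],
         ["Acer platanoides", "Acer campestre", "Quercus robur"]]
print(fuse_observation(lists, genus_of))  # Acer species ranked first
```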
3 Performance Analysis

In this section, we describe the competition datasets and tune the parameters of our proposed system on validation sets extracted from the training data. We then define the performance metrics and report experimental results on the validation and official test sets.

3.1 Dataset

The plant identification task in LifeCLEF 2015 involves identifying 1,000 species of trees, herbs, and ferns from photographs of their different organs, mostly taken in France by different users. The collected dataset contains 113,205 pictures: 91,759 images for training and 21,446 images for testing. Table 1 shows the details of the provided datasets.

Table 1: Details of the datasets for the plant identification task within LifeCLEF 2015

Category   # Training Samples   # Test Samples   # Classes
Branch      8,130                2,088            891
Entire     16,235                6,113            993
Flower     28,225                8,327            997
Fruit       7,720                1,423            755
Leaf       13,367                2,690            899
LeafScan   12,605                  221            351
Stem        5,476                  584            649

To validate our results, we used proportionate stratified random sampling: we randomly split the provided training data into training and validation subsets such that images of each plant species appear in both, with a validation-to-training ratio of roughly 1:3. In other words, for each dataset, we randomly selected one-fourth of the available samples of each class, where possible, as the validation set.

3.2 Results

We applied the proposed plant identification system to the resulting training and validation sets for each plant category. Table 2 shows the system performance in terms of first-rank classification accuracy. As expected, the flower, fruit, and stem photographs are relatively easier to classify than the branch, leaf, and entire categories.

Table 2: Top-rank classification accuracies of the proposed plant identification system

Category   # Training Samples   # Validation Samples   Accuracy
Branch      6,447                1,683                  23.53 %
Entire     12,567                3,668                  25.08 %
Flower     21,531                6,694                  34.02 %
Fruit       6,072                1,648                  40.41 %
Leaf       10,367                3,000                  33.80 %
LeafScan    9,576                3,029                  90.49 %
Stem        4,344                1,132                  37.19 %

Instead of total classification accuracy, the LifeCLEF lab itself employs a user-based metric called the average inverse rank score [17]. The average inverse rank score $S$ is defined as

$$S = \frac{1}{U} \sum_{u=1}^{U} \frac{1}{P_u} \sum_{p=1}^{P_u} \frac{1}{N_{u,p}} \sum_{n=1}^{N_{u,p}} s_{u,p,n} \tag{8}$$

where $U$ is the number of users who have taken the query pictures; $P_u$ is the number of individual plants observed by the $u$th user; $N_{u,p}$ is the number of pictures taken of the $p$th plant observed by the $u$th user; and $s_{u,p,n}$ is the inverse of the rank of the correct species for the given image, ranging from 0 to 1.
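For clarity, here is a small sketch of Eq. (8), assuming each query picture is recorded as a (user, plant, rank) tuple where `rank` is the 1-based rank of the correct species (`None` if absent, scoring 0); the function and data layout are illustrative.

```python
from collections import defaultdict

def average_inverse_rank(records):
    """Compute S of Eq. (8): average inverse ranks per plant, then per
    user, then over all users."""
    per_plant = defaultdict(list)            # (user, plant) -> [s_{u,p,n}]
    for user, plant, rank in records:
        per_plant[(user, plant)].append(1.0 / rank if rank else 0.0)

    per_user = defaultdict(list)             # user -> per-plant means
    for (user, _), scores in per_plant.items():
        per_user[user].append(sum(scores) / len(scores))

    user_means = [sum(v) / len(v) for v in per_user.values()]
    return sum(user_means) / len(user_means)

# Two users: u1's plant scores average to 0.5625, u2 scores 0; S = 0.28125.
print(average_inverse_rank([("u1", "p1", 1), ("u1", "p1", 4),
                            ("u1", "p2", 2), ("u2", "p3", None)]))
```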
Using this metric, we applied PCANet to the test sets with the parameters learned in the training step and submitted our predictions to the organizers of LifeCLEF 2015 for official evaluation. Table 3 displays the inverse rank scores of our best run in the image-based task of LifeCLEF 2015 for the different categories. In the observation-based task, our approach using the Borda count achieved an inverse rank score of 0.162.

Table 3: Official average inverse rank scores of our best run in the image-based task of LifeCLEF 2015

Branch   Entire   Flower   Fruit   Leaf   LeafScan   Stem   Overall
0.053    0.106    0.189    0.143   0.111  0.216      0.120  0.153

Bearing in mind that this large dataset consists of 1,000 classes of similar categories, our official overall score of 0.153 in the image-based task represents a fair performance for our submission. Comparing the official test results in Table 3 with the higher accuracies reported in Table 2, we conclude that a considerable amount of overfitting occurred during validation. This could have been mitigated with more data, but time complexity was an issue even with this simple architecture.

Furthermore, since only a very small subset of the LifeCLEF 2015 test data consisted of scanned leaves, we skipped the preprocessing and segmentation phase [18] that had been applied to the LeafScan category in our previous submissions [8–11]. When we applied PCANet to the segmented and preprocessed LeafScan images of LifeCLEF 2014, we achieved an inverse rank score of 0.51, as measured by the campaign organizers.

3.3 Time Complexity

We measured the complexity of our system in terms of the running time for feature extraction and for training the classifier. On average over all categories, PCANet took 1.37 seconds/image for feature extraction and 6.13 seconds/image for training. All code was implemented in MATLAB and run on a machine with 80 GB of RAM and two 2.50 GHz processors.

Although the LifeCLEF 2015 campaign allowed the use of external training data, we restricted ourselves to the provided LifeCLEF datasets, both because PCANet does not require large amounts of data to learn its weights and because external datasets would have increased the processing time of our system.

3.4 Effects of Parameter Selection

Parameters of the proposed PCANet-based plant identification system were selected experimentally through validation. To avoid a combinatorial explosion, we adjusted one parameter at a time until an optimum was found. For instance, we observed that performance dropped rapidly when the image patch size was increased or decreased from 7 × 7; the same held for the histogram block size of 20 × 10.

On the other hand, we observed that increasing the normalization size of the input images and/or the number of filters (especially in the first stage) improved performance. However, there was a trade-off between complexity and accuracy: increasing the input image size and/or the number of filters improved accuracy only slightly while also expanding the output feature vectors, which in turn drastically increased the training time. We therefore evaluated our system with the parameter values given in Section 2.2.

4 Summary and Discussions

In this work, we used a simple deep convolutional approach called PCANet to identify the plant species of the LifeCLEF 2015 plant identification dataset. Our best run showed a fair performance, with overall inverse rank scores of 0.153 and 0.162 in the image-based and observation-based tasks, respectively. The proposed system appears fast and effective for aligned images such as preprocessed scanned leaves.

Acknowledgments. This project is supported by the Scientific and Technological Research Council of Turkey (TUBITAK) under project number 113E499.

References

1. Goëau, H., Bonnet, P., Joly, A., Boujemaa, N., Barthelemy, D., Molino, J.F., Birnbaum, P., Mouysset, E., Picard, M.: The CLEF 2011 plant images classification task. In: CLEF (Notebook Papers/Labs/Workshop), Amsterdam (2011)
2. Goëau, H., Bonnet, P., Joly, A., Yahiaoui, I., Barthelemy, D., Boujemaa, N., Molino, J.F.: The ImageCLEF 2012 plant identification task. In: CLEF (Online Working Notes/Labs/Workshop), Rome (2012)
3. Goëau, H., Bonnet, P., Joly, A., Bakic, V., Barthelemy, D., Boujemaa, N., Molino, J.F.: The ImageCLEF 2013 plant identification task. In: CLEF (Working Notes), Valencia (2013)
4. Goëau, H., Joly, A., Bonnet, P., Selmi, S., Molino, J.F., Barthelemy, D., Boujemaa, N.: LifeCLEF plant identification task 2014. In: CLEF (Working Notes), Sheffield (2014) 598–615
5. Joly, A., Goëau, H., Spampinato, C., Bonnet, P., Vellinga, W.P., Planqué, R., Rauber, A., Palazzo, S., Fisher, B., Müller, H.: LifeCLEF 2015: multimedia life species identification challenges. In: CLEF 2015 Proceedings. Springer LNCS (2015)
6. Goëau, H., Joly, A., Bonnet, P.: LifeCLEF plant identification task 2015. In: CLEF (Working Notes), Toulouse (2015)
7. Chan, T.H., Jia, K., Gao, S., Lu, J., Zeng, Z., Ma, Y.: PCANet: A simple deep learning baseline for image classification? Computing Research Repository (CoRR) (2014) arXiv:1404.3606v2
8. Yanikoglu, B., Aptoula, E., Tirkaz, C.: Sabanci-Okan system at ImageClef 2011: Plant identification task. In: CLEF (Notebook Papers/Labs/Workshop) (2011)
9. Yanikoglu, B., Aptoula, E., Tirkaz, C.: Sabanci-Okan system at ImageClef 2012: Combining features and classifiers for plant identification. In: CLEF (Online Working Notes/Labs/Workshop) (2012)
10. Yanikoglu, B., Aptoula, E., Yildiran, S.T.: Sabanci-Okan system at ImageClef 2013 plant identification competition. In: CLEF (Working Notes) (2013)
11. Yanikoglu, B., Yildiran, S.T., Tirkaz, C., Aptoula, E.: Sabanci-Okan system at LifeCLEF 2014 plant identification competition. In: CLEF (Working Notes) (2014) 771–777
12. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11) (1998) 2278–2324
13. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Neural Information Processing Systems (2012) 1106–1114
14. Hanbury, A.: A 3D-polar coordinate colour representation well adapted to image analysis. In: Proceedings of the 13th Scandinavian Conference on Image Analysis (2003) 804–811
15. Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research 9 (2008) 1871–1874
16. Mladenov, V., Koprinkova-Hristova, P., Palm, G., Villa, A.E.P., Appollini, B., Kasabov, N.: Artificial neural networks and machine learning. In: Proceedings of the 23rd International Conference on Artificial Neural Networks (2013) 8–9
17. Müller, H., Clough, P., Deselaers, T., Caputo, B.: ImageCLEF: Experimental Evaluation in Visual Information Retrieval. Volume 32 of The Information Retrieval Series. Springer (2010)
18. Yanikoglu, B., Aptoula, E., Tirkaz, C.: Automatic plant identification from photographs. Machine Vision and Applications 25(6) (2014) 1369–1383