Sabanci-Okan System in LifeCLEF 2015 Plant Identification Competition

Mostafa Mehdipour Ghazi¹, Berrin Yanikoglu¹, Erchan Aptoula², Ozlem Muslu¹, and Murat Can Ozdemir¹

¹ Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul, Turkey
² Computer Engineering Department, Okan University, Istanbul, Turkey
{mehdipour,berrin,ozlemmuslu,ozdemirmcan}@sabanciuniv.edu
erchan.aptoula@okan.edu.tr

Abstract. We present our deep learning based plant identification system for LifeCLEF 2015. The approach is built on a simple deep convolutional network called PCANet, which does not require large amounts of training data because its weights are learned through principal component analysis. After learning multistage filter banks, simple binary hashing is applied to the filtered data, and features are pooled from block histograms. A multiclass linear support vector machine is then trained, and the system is evaluated on the plant task datasets of LifeCLEF 2014 and 2015. As announced by the organizers, our submission achieved an overall inverse rank score of 0.153 in the image-based task and 0.162 in the observation-based task of LifeCLEF 2015, as well as an inverse rank score of 0.51 on the LeafScan dataset of LifeCLEF 2014.

Keywords: plant identification, deep learning, PCANet, support vector machine, inverse rank score

1 Overview

In recent years, research on automatic plant identification from photographs has concentrated around the annual plant identification competitions organized within the CLEF campaigns, including ImageCLEF [1–3] and LifeCLEF [4–6]. CLEF is devoted to promoting and evaluating multilingual and multimodal information retrieval systems, and the main goal of these competitions is to benchmark the challenging task of content-based identification and retrieval of plant species, which is of immense importance in botany, agriculture, plant taxonomy, pharmacy, and pharmacology. The task is carried out using images of different types of plant parts, such as leaves, branches, stems, flowers, and fruits.

Since 2011, competitive submissions for the plant identification task have been made to ImageCLEF and LifeCLEF, and the participating systems have used widely different approaches; still, the problem is far from solved due to several challenges, including large variations in color, illumination, background, size, and shape. Deep learning approaches are relatively new and well suited to problems with large amounts of intra-class variability [7].

There are two tasks within the LifeCLEF 2015 campaign: image-based and observation-based plant identification. The image-based task requires identification given a single image, while the goal of the observation-based task is to identify plants from a multi-image query. The latter corresponds to the scenario in which a photographer uses the same camera to take snapshots of various organs of a plant species from different views, under similar lighting conditions and on the same day. The campaign started in 2011 with the image-based task covering over 70 tree species, and the observation-based task became the main track in 2014. By 2015 [6], the number of species had reached about 1,000, covering the entire flora of a given region.

In this work, we have utilized a system different from our previous submissions [8–11] to recognize plant species using a new deep convolutional network known as PCANet [7].
Although the PCANet method is suboptimal in comparison with common convolutional neural networks (CNNs) [12, 13], our experiments using PCANet resulted in good performance for aligned images such as scanned leaves.

2 PCANet

PCANet is a recently proposed convolutional network architecture that combines the strengths of principal component analysis (PCA) and deep learning [7]. In comparison with a CNN, which attempts to find optimal filters for feature mapping, PCANet is suboptimal in that it learns its filter banks by applying PCA to the input data. On the other hand, its advantage is that it requires neither large amounts of data nor long training times, while still using the core concepts of the deep convolutional network architecture. The general structure of PCANet and our proposed architecture for plant identification are presented in this section.

2.1 General PCANet Architecture

PCANet starts by applying principal component analysis to overlapping patches of all images. The selected principal components form the first-layer filters, and the projections of the patches onto the principal components form the responses of the units in the first layer. We then repeat this methodology to form a cascaded linear map in the next layers of the deep convolutional network architecture. Next, the method applies binary quantization and hashing to the multi-stage filtered image sets to concatenate them in decimal form. Finally, local histograms are extracted from the blocks of the quantized images, and spatial pyramid pooling is applied to these histograms to extract features. The algorithm is explained in detail as follows.

The training data contains $N$ images $I_i$, $i = 1, 2, \ldots, N$, of size $m \times n$. In the first stage, patches of size $k_1 \times k_2$ pixels are extracted around each pixel of the image $I_i$. Afterwards, all such overlapping patches are collected, vectorized, and mean-subtracted to obtain $X_i$. Repeating this operation for all images, we obtain a patch collection $X$ as

$$X = [X_1, X_2, \ldots, X_N] \in \mathbb{R}^{k_1 k_2 \times N m n} \tag{1}$$

Next, in order to calculate the desired bank of orthonormal filters $V$, PCA minimizes the reconstruction error to compute the $L_1$ principal components. The constrained optimization is formulated as

$$\min_{V \in \mathbb{R}^{k_1 k_2 \times L_1}} \; \| X - V V^T X \|_F^2 \quad \text{subject to} \quad V^T V = I_{L_1} \tag{2}$$

where $\| \cdot \|_F$ is the Frobenius norm and $I_{L_1}$ is the identity matrix of size $L_1 \times L_1$; the solution simply consists of the $L_1$ principal eigenvectors of $X X^T$. The PCA filters of the first layer therefore form the weights $W_{l_1}^1$, $l_1 = 1, 2, \ldots, L_1$, obtained by reshaping the eigenvectors into matrices of size $k_1 \times k_2$. Hence, the $l_1$th filtered image is calculated by convolving the $l_1$th filter with the $i$th patch-mean-removed image $\bar{I}_i$ as

$$I_i^{l_1} = \bar{I}_i * W_{l_1}^1 \tag{3}$$

We can repeat the same approach to learn $L_2$ PCA filters for the second layer, creating double-filtered images. For this purpose, all the overlapping patches of each filtered image $I_i^{l_1}$ are collected, vectorized, and mean-subtracted to obtain $Y_i^{l_1}$. Repeating this procedure for all filtered images, we obtain

$$Y = [Y_1^1, \ldots, Y_N^1, Y_1^2, \ldots, Y_N^2, \ldots, Y_1^{L_1}, \ldots, Y_N^{L_1}] \in \mathbb{R}^{k_1 k_2 \times L_1 N m n} \tag{4}$$

Similarly, the PCA filters of the second layer, $W_{l_2}^2$ for $l_2 = 1, 2, \ldots, L_2$, are obtained by finding the $L_2$ principal eigenvectors of $Y Y^T$ and rearranging them as matrices of size $k_1 \times k_2$.
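For concreteness, the following NumPy sketch implements one stage of the PCA filter learning of Eqs. (1)–(3). It is a minimal illustration under our own assumptions (function names, defaults matching the settings of Section 2.2, interior patches only), not the authors' released code.

```python
import numpy as np
from scipy.signal import convolve2d

def learn_pca_filters(images, k1=7, k2=7, num_filters=10):
    """Learn one stage of PCA filters (Eq. 2) from 2-D grayscale arrays.

    All overlapping k1 x k2 patches are vectorized and patch-mean
    removed; the leading eigenvectors of X X^T are reshaped into
    k1 x k2 filters. Border pixels are skipped for brevity.
    """
    cols = []
    for img in images:
        m, n = img.shape
        for r in range(m - k1 + 1):
            for c in range(n - k2 + 1):
                patch = img[r:r + k1, c:c + k2].astype(float).ravel()
                cols.append(patch - patch.mean())      # patch-mean removal
    X = np.stack(cols, axis=1)                         # (k1*k2, #patches)
    eigvals, eigvecs = np.linalg.eigh(X @ X.T)         # small k1k2 x k1k2 matrix
    order = np.argsort(eigvals)[::-1]                  # descending eigenvalues
    V = eigvecs[:, order[:num_filters]]                # solution of Eq. (2)
    return [V[:, j].reshape(k1, k2) for j in range(num_filters)]

def apply_filters(img, filters):
    """Compute the filtered images of Eq. (3). The paper convolves the
    patch-mean-removed image; the raw image is used here for brevity."""
    return [convolve2d(img.astype(float), W, mode='same') for W in filters]
```

The second-stage filters of Eq. (4) follow from calling `learn_pca_filters` again, this time on the collection of all first-stage feature maps.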
The double-filtered image, computed sequentially with the $l_1$th and $l_2$th filters, is then obtained by convolving the $l_2$th filter with the $i$th patch-mean-removed filtered image $\bar{I}_i^{l_1}$ as

$$O_i^{l_1, l_2} = \bar{I}_i^{l_1} * W_{l_2}^2 \tag{5}$$

As can be seen, the output $O$ contains $L_1 \times L_2$ real-valued double-filtered images per input image. To decrease the number of images, they are first binarized using the Heaviside step function $H(\cdot)$. Next, for each pixel, the $L_2$ quantized binary bits are mapped to a decimal number as

$$T_i^{l_1} = \sum_{l_2=1}^{L_2} 2^{l_2 - 1} H(\bar{I}_i^{l_1} * W_{l_2}^2) \tag{6}$$

In effect, this conversion maps the $L_2$ binary bits taken from corresponding pixels of the double-filtered binary images into a single gray-level pixel in the range $[0, 2^{L_2} - 1]$.

Finally, we partition each of the $L_1$ decimal images $T_i^{l_1}$ into $B$ blocks and compute block histograms (with $2^{L_2}$ bins) over all $L_1$ images as the features of the $i$th image:

$$f_i = [\mathrm{hist}_1(T_i^1), \ldots, \mathrm{hist}_B(T_i^1), \ldots, \mathrm{hist}_1(T_i^{L_1}), \ldots, \mathrm{hist}_B(T_i^{L_1})] \in \mathbb{R}^{1 \times 2^{L_2} L_1 B} \tag{7}$$

where $\mathrm{hist}_j(\cdot)$ denotes the histogram of the $j$th block of the partitioned image. Using local histograms provides translation invariance in the extracted features. Figure 1 displays the block diagram of a two-stage PCANet.

Fig. 1: Block diagram of a two-stage PCANet: the input layer is followed by two stages of mean removal and PCA filtering, and an output layer performing binarization, decimal mapping, and feature pooling.
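The binarization, decimal hashing, and block-histogram pooling of Eqs. (6)–(7) can be sketched as follows; the non-overlapping histogram blocks and the helper names are our assumptions for illustration.

```python
import numpy as np

def hash_to_decimal(responses):
    """Fuse the L2 double-filtered maps of one first-stage image into a
    single decimal image (Eq. 6); H(.) is realized as (O > 0)."""
    T = np.zeros_like(responses[0], dtype=np.int64)
    for l2, O in enumerate(responses, start=1):
        T += (2 ** (l2 - 1)) * (O > 0).astype(np.int64)
    return T                                   # values in [0, 2^L2 - 1]

def block_histograms(T, block_h=20, block_w=10, num_bins=256):
    """Pool per-block histograms of a decimal image (Eq. 7), assuming
    non-overlapping blocks; incomplete border blocks are ignored."""
    feats = []
    H, W = T.shape
    for r in range(0, H - block_h + 1, block_h):
        for c in range(0, W - block_w + 1, block_w):
            block = T[r:r + block_h, c:c + block_w]
            hist, _ = np.histogram(block, bins=num_bins, range=(0, num_bins))
            feats.append(hist)
    return np.concatenate(feats)
```

The feature vector $f_i$ concatenates these block histograms over all $L_1$ decimal images; with $L_2 = 8$ as in Section 2.2, each histogram has $2^8 = 256$ bins.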
2.2 PCANet Architecture for Plant Identification

To process each plant image, color images are converted from RGB to the HSY color space [14] and uniformly scaled to 128 × 128 pixels. We apply PCANet to each color component of the scaled images, using a two-stage convolutional network with $L_1 = 10$ and $L_2 = 8$ filters in the first and second stages, respectively, overlapping image patches of size 7 × 7, and a histogram block size of 20 × 10. Because of the massive size of the data obtained after feature pooling, we chose a multi-class linear support vector machine (SVM) as the final-stage classifier, for reasons of both complexity and accuracy. For the SVM implementation, we used the LIBLINEAR toolbox [15] with the dual L2-regularized L2-loss model and a misclassification penalty cost of one.

2.3 Score Fusion in the Observation-Based Task

For the observation-based task, we applied the Borda count fusion method [16] to the outputs of the proposed system to combine the scores obtained from the different photographs of an individual plant. In this method, each class appearing in the list of top classes returned by the classifier receives a vote that is inversely proportional to its rank in that list. Note that for each observation with k images, there are k such class lists. We modified the Borda count for this problem so that votes are distributed not only to the class itself but also to the members of the same genus, as illustrated in the sketch below.
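A minimal sketch of this modified Borda count follows; since the paper does not specify how votes are shared within a genus, the `genus_weight` parameter and the data layout are our own assumptions.

```python
from collections import defaultdict

def fuse_observation(ranked_lists, genus_of, genus_weight=0.5):
    """Fuse the ranked species lists of one observation's k images.

    A species at rank r receives a vote of 1/r; a reduced vote
    (genus_weight / r, an assumed weighting) also goes to the other
    candidate species of the same genus.
    """
    votes = defaultdict(float)
    candidates = {s for lst in ranked_lists for s in lst}
    for lst in ranked_lists:
        for r, species in enumerate(lst, start=1):
            votes[species] += 1.0 / r
            for other in candidates:
                if other != species and genus_of[other] == genus_of[species]:
                    votes[other] += genus_weight / r
    return sorted(votes, key=votes.get, reverse=True)  # best first

# Example: two images of the same plant, three candidate species.
genus_of = {"Acer campestre": "Acer", "Acer platanoides": "Acer",
            "Quercus robur": "Quercus"}
lists = [["Acer campestre", "Quercus robur", "Acer platanoides"],
         ["Acer platanoides", "Acer campestre", "Quercus robur"]]
print(fuse_observation(lists, genus_of))  # Acer species ranked first
```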
3 Performance Analysis

In this section, we describe the competition datasets and tune the parameters of our proposed system on validation sets extracted from the training data. We then define the performance metrics and report experimental results on the validation and official test sets.

3.1 Dataset

The plant identification task in LifeCLEF 2015 involves identifying 1,000 species of trees, herbs, and ferns from photographs of their different organs, mostly taken in France by different users. The collected dataset contains 113,205 pictures: 91,759 images for training and 21,446 images for testing. Table 1 shows the details of the provided datasets.

Table 1: Details of the datasets for the plant identification task within LifeCLEF 2015

Category   # Training Samples   # Test Samples   # Classes
Branch      8,130                2,088            891
Entire     16,235                6,113            993
Flower     28,225                8,327            997
Fruit       7,720                1,423            755
Leaf       13,367                2,690            899
LeafScan   12,605                  221            351
Stem        5,476                  584            649

To validate our results, we used proportionate stratified random sampling: we randomly split the provided training data into training and validation subsets such that images of each plant species appear in both, with a validation-to-training ratio of roughly 1:3. In other words, for each dataset, we randomly selected one-fourth of the available samples of each class, where possible, as the validation set.

3.2 Results

We applied the proposed plant identification system to the resulting training and validation sets for each plant category. Table 2 shows the system performance in terms of first-rank classification accuracy. As expected, the flower, fruit, and stem photographs are relatively easier to classify than the branch, leaf, and entire categories.

Table 2: Top-rank classification accuracies of the proposed plant identification system

Category   # Training Samples   # Validation Samples   Accuracy
Branch      6,447                1,683                  23.53 %
Entire     12,567                3,668                  25.08 %
Flower     21,531                6,694                  34.02 %
Fruit       6,072                1,648                  40.41 %
Leaf       10,367                3,000                  33.80 %
LeafScan    9,576                3,029                  90.49 %
Stem        4,344                1,132                  37.19 %

Instead of total classification accuracy, the LifeCLEF lab itself employs a user-based metric called the average inverse rank score [17]. The average inverse rank score $S$ is defined as

$$S = \frac{1}{U} \sum_{u=1}^{U} \frac{1}{P_u} \sum_{p=1}^{P_u} \frac{1}{N_{u,p}} \sum_{n=1}^{N_{u,p}} s_{u,p,n} \tag{8}$$

where $U$ is the number of users who have taken the query pictures; $P_u$ is the number of individual plants observed by the $u$th user; $N_{u,p}$ is the number of pictures taken of the $p$th plant observed by the $u$th user; and $s_{u,p,n}$ is the inverse of the rank of the correct species for the given image, ranging from 0 to 1.
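For clarity, here is a small sketch of Eq. (8), assuming each query picture is recorded as a (user, plant, rank) tuple where `rank` is the 1-based rank of the correct species (`None` if absent, scoring 0); the function and data layout are illustrative.

```python
from collections import defaultdict

def average_inverse_rank(records):
    """Compute S of Eq. (8): average inverse ranks per plant, then per
    user, then over all users."""
    per_plant = defaultdict(list)            # (user, plant) -> [s_{u,p,n}]
    for user, plant, rank in records:
        per_plant[(user, plant)].append(1.0 / rank if rank else 0.0)

    per_user = defaultdict(list)             # user -> per-plant means
    for (user, _), scores in per_plant.items():
        per_user[user].append(sum(scores) / len(scores))

    user_means = [sum(v) / len(v) for v in per_user.values()]
    return sum(user_means) / len(user_means)

# Two users: u1's plant scores average to 0.5625, u2 scores 0; S = 0.28125.
print(average_inverse_rank([("u1", "p1", 1), ("u1", "p1", 4),
                            ("u1", "p2", 2), ("u2", "p3", None)]))
```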
Using this metric, we applied PCANet to the test sets with the parameters learned in the training step and submitted our predictions to the organizers of LifeCLEF 2015 for official evaluation. Table 3 displays the inverse rank scores of our best run in the image-based task of LifeCLEF 2015 for the different categories. In the observation-based task, our approach using the Borda count achieved an inverse rank score of 0.162.

Table 3: Official average inverse rank scores of our best run in the image-based task of LifeCLEF 2015

Branch   Entire   Flower   Fruit   Leaf   LeafScan   Stem   Overall
0.053    0.106    0.189    0.143   0.111  0.216      0.120  0.153

Bearing in mind that this large dataset consists of 1,000 classes of similar categories, our official overall score of 0.153 in the image-based task represents a fair performance for our submission. Comparing the official test results in Table 3 with the higher accuracies reported in Table 2, we conclude that a considerable amount of overfitting occurred during validation. This could have been mitigated with more data, but time complexity was an issue even with this simple architecture.

Furthermore, since only a very small subset of the LifeCLEF 2015 test data consisted of scanned leaves, we skipped the preprocessing and segmentation phase [18] that had been applied to the LeafScan category in our previous submissions [8–11]. When we applied PCANet to the segmented and preprocessed LeafScan images of LifeCLEF 2014, we achieved an inverse rank score of 0.51, as measured by the campaign organizers.

3.3 Time Complexity

We measured the complexity of our system in terms of the running time for feature extraction and for training the classifier. On average over all categories, PCANet took 1.37 seconds/image for feature extraction and 6.13 seconds/image for training. All code was implemented in MATLAB and run on a machine with 80 GB of RAM and two 2.50 GHz processors.

Although the LifeCLEF 2015 campaign allowed the use of external training data, we restricted ourselves to the provided LifeCLEF datasets, both because PCANet does not require large amounts of data to learn its weights and because external datasets would have increased the processing time of our system.

3.4 Effects of Parameter Selection

Parameters of the proposed PCANet-based plant identification system were selected experimentally through validation. To avoid a combinatorial explosion, we adjusted one parameter at a time until an optimum was found. For instance, we observed that performance dropped rapidly when the image patch size was increased or decreased from 7 × 7; the same held for the histogram block size of 20 × 10.

On the other hand, we observed that increasing the normalization size of the input images and/or the number of filters (especially in the first stage) improved performance. However, there was a trade-off between complexity and accuracy: increasing the input image size and/or the number of filters improved accuracy only slightly while also expanding the output feature vectors, which in turn drastically increased the training time. We therefore evaluated our system with the parameter values given in Section 2.2.

4 Summary and Discussions

In this work, we used a simple deep convolutional approach called PCANet to identify the plant species of the LifeCLEF 2015 plant identification dataset. Our best run showed a fair performance, with overall inverse rank scores of 0.153 and 0.162 in the image-based and observation-based tasks, respectively. The proposed system appears fast and effective for aligned images such as preprocessed scanned leaves.

Acknowledgments. This project is supported by the Scientific and Technological Research Council of Turkey (TUBITAK) under project number 113E499.

References

1. Goëau, H., Bonnet, P., Joly, A., Boujemaa, N., Barthelemy, D., Molino, J.F., Birnbaum, P., Mouysset, E., Picard, M.: The CLEF 2011 plant images classification task. In: CLEF (Notebook Papers/Labs/Workshop), Amsterdam (2011)
2. Goëau, H., Bonnet, P., Joly, A., Yahiaoui, I., Barthelemy, D., Boujemaa, N., Molino, J.F.: The ImageCLEF 2012 plant identification task. In: CLEF (Online Working Notes/Labs/Workshop), Rome (2012)
3. Goëau, H., Bonnet, P., Joly, A., Bakic, V., Barthelemy, D., Boujemaa, N., Molino, J.F.: The ImageCLEF 2013 plant identification task. In: CLEF (Working Notes), Valencia (2013)
4. Goëau, H., Joly, A., Bonnet, P., Selmi, S., Molino, J.F., Barthelemy, D., Boujemaa, N.: LifeCLEF plant identification task 2014. In: CLEF (Working Notes), Sheffield (2014) 598–615
5. Joly, A., Goëau, H., Spampinato, C., Bonnet, P., Vellinga, W.P., Planqué, R., Rauber, A., Palazzo, S., Fisher, B., Müller, H.: LifeCLEF 2015: multimedia life species identification challenges. In: CLEF 2015 Proceedings. Springer LNCS (2015)
6. Goëau, H., Joly, A., Bonnet, P.: LifeCLEF plant identification task 2015. In: CLEF (Working Notes), Toulouse (2015)
7. Chan, T.H., Jia, K., Gao, S., Lu, J., Zeng, Z., Ma, Y.: PCANet: A simple deep learning baseline for image classification? Computing Research Repository (CoRR) (2014) arXiv:1404.3606v2
8. Yanikoglu, B., Aptoula, E., Tirkaz, C.: Sabanci-Okan system at ImageClef 2011: Plant identification task. In: CLEF (Notebook Papers/Labs/Workshop) (2011)
9. Yanikoglu, B., Aptoula, E., Tirkaz, C.: Sabanci-Okan system at ImageClef 2012: Combining features and classifiers for plant identification. In: CLEF (Online Working Notes/Labs/Workshop) (2012)
10. Yanikoglu, B., Aptoula, E., Yildiran, S.T.: Sabanci-Okan system at ImageClef 2013 plant identification competition. In: CLEF (Working Notes) (2013)
11. Yanikoglu, B., Yildiran, S.T., Tirkaz, C., Aptoula, E.: Sabanci-Okan system at LifeCLEF 2014 plant identification competition. In: CLEF (Working Notes) (2014) 771–777
12. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11) (1998) 2278–2324
13. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Neural Information Processing Systems (2012) 1106–1114
14. Hanbury, A.: A 3D-polar coordinate colour representation well adapted to image analysis. In: Proceedings of the 13th Scandinavian Conference on Image Analysis (2003) 804–811
15. Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research 9 (2008) 1871–1874
16. Mladenov, V., Koprinkova-Hristova, P., Palm, G., Villa, A.E.P., Appollini, B., Kasabov, N.: Artificial neural networks and machine learning. In: Proceedings of the 23rd International Conference on Artificial Neural Networks (2013) 8–9
17. Müller, H., Clough, P., Deselaers, T., Caputo, B.: ImageCLEF: Experimental Evaluation in Visual Information Retrieval. Volume 32 of The Information Retrieval Series. Springer (2010)
18. Yanikoglu, B., Aptoula, E., Tirkaz, C.: Automatic plant identification from photographs. Machine Vision and Applications 25(6) (2014) 1369–1383