=Paper=
{{Paper
|id=Vol-1180/CLEF2014wn-Life-YanicogluEt2014
|storemode=property
|title=Sabanci-Okan System at LifeCLEF 2014 Plant Identification Competition
|pdfUrl=https://ceur-ws.org/Vol-1180/CLEF2014wn-Life-YanicogluEt2014.pdf
|volume=Vol-1180
|dblpUrl=https://dblp.org/rec/conf/clef/YanikogluYTA14
}}
==Sabanci-Okan System at LifeCLEF 2014 Plant Identification Competition==
Sabanci-Okan System at LifeCLEF 2014 Plant Identification Competition

Berin Yanikoglu¹, S. Tolga Yildiran¹, Caglar Tirkaz¹, and Erchan Aptoula²

¹ Sabanci University, Istanbul, Turkey
² Okan University, Istanbul, Turkey
{berrin, stolgay, caglart}@sabanciuniv.edu, erchan.aptoula@okan.edu.tr

Abstract. We describe our system in the 2014 LifeCLEF [1] Plant Identification Competition. The sub-system for the isolated leaf category (LeafScans) was essentially the same as last year's [2], while plant photographs in all the remaining categories were classified using either local descriptors or deep learning techniques. However, due to the large amount of data, the large number of classes, and a shortage of time, our system was not very successful in the plant photograph sub-categories, but we obtained better results on isolated leaf images. As announced by the organizers, we obtained an inverse rank score of 0.127 overall and 0.449 for isolated leaves.

1 Overview

The plant identification campaign within LifeCLEF 2014 was similar to those of previous years, but at a larger scale, with twice the number of classes and images as last year [3]. The dataset consists of isolated leaf images called LeafScans, comprising scanned or scan-like leaf photographs, and plant photographs in different categories (e.g. Flower, Fruit, Stem). In total, the dataset contained 47,815 images (11,335 pictures of isolated leaves and 36,480 plant photographs).

Our submission was very similar to that submitted in 2013 for the case of isolated leaves, while we started to build a new system for plant photographs using local descriptors and deep learning techniques. We split the task to share the work load: for the Flower, Fruit and Entire categories, we used dense-SIFT descriptors, while for the Branch and Leaf categories we used convolutional neural networks (CNN). The Stem category was recognized using globally extracted texture and color features, similar to LeafScans.

In both of these approaches, the main problem was the large number of classes, which resulted in long training times. As a result, we exploited the meta-data wherever applicable: namely, we split the Flower/Fruit categories according to flowering/fruit-bearing periods and trained a separate classifier for each time period. In this way, we aimed to reduce the number of classes handled by each classifier. In the other categories (Stem, Branch, Entire and Leaf), time information did not seem very useful and was not used; in fact, we did not use meta-data anywhere else in the system.

2 Preprocessing

Preprocessing stages were present only for the isolated leaf images. Specifically, we align the leaf's major axis with the vertical through principal component analysis, with additional correction coming from the location of the leaf petiole. Size normalization is then achieved by normalizing the leaf height to 600 pixels, preserving the aspect ratio. Orientation normalization done this way is quite successful; however, it is not error-proof. Moreover, errors in orientation normalization typically lead to recognition errors, because most of the features are sensitive to orientation. There was no preprocessing for plant photographs.
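As a concrete illustration of this preprocessing step, the sketch below shows one plausible implementation of the PCA-based orientation and size normalization in Python with NumPy and OpenCV. It is our reconstruction under stated assumptions, not the authors' code: the petiole-based correction is omitted, and the `normalize_leaf` name and binary-mask input are our own choices.

```python
import cv2
import numpy as np

def normalize_leaf(mask, target_height=600):
    """Align the leaf's major axis with the vertical via PCA, then rescale
    to a fixed height, preserving the aspect ratio. A minimal sketch:
    the paper's petiole-based orientation correction is omitted, and
    rotation is done in the original canvas (corners may be clipped).
    `mask` is assumed to be a uint8 binary image (leaf pixels > 0)."""
    ys, xs = np.nonzero(mask)                        # foreground coordinates
    pts = np.column_stack([xs, ys]).astype(np.float64)
    mean = pts.mean(axis=0)
    cov = np.cov((pts - mean).T)
    _, eigvecs = np.linalg.eigh(cov)                 # eigh: ascending order
    major = eigvecs[:, -1]                           # principal (major) axis
    # Angle of the major axis measured from the vertical (y) axis.
    angle = np.degrees(np.arctan2(major[0], major[1]))
    h, w = mask.shape
    rot = cv2.getRotationMatrix2D((float(mean[0]), float(mean[1])),
                                  -angle, 1.0)
    rotated = cv2.warpAffine(mask, rot, (w, h), flags=cv2.INTER_NEAREST)
    # Rescale so the leaf's bounding-box height becomes target_height.
    ys2, _ = np.nonzero(rotated)
    scale = target_height / float(ys2.max() - ys2.min() + 1)
    return cv2.resize(rotated, None, fx=scale, fy=scale,
                      interpolation=cv2.INTER_NEAREST)
```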
3 Features

3.1 Features for LeafScans

The descriptors used for characterizing the samples of the LeafScan category are identical to those used during the plant identification track of ImageCLEF 2013 [2]. In detail, they are as follows:

Basic Shape Statistics (BSS) provides contour-related information by computing basic statistical measures from the distance-to-centroid curve. In particular, once the image centroid is located, we compute the contour pixels' Euclidean distances to it, resulting in a numerical sequence. After sorting this sequence, we extract the following basic measures from it:

$\mathrm{BSS} = \{\text{maximum}, \text{minimum}, \text{median}, \text{variance}\}$   (1)

Area Width Factor (AWF) is computed on grayscale data and constitutes a slight variation of the leaf width factor introduced in Ref. [4]. Specifically, given an isolated leaf image, it is first divided into n strips, perpendicular to its major axis. For the final n-dimensional feature, we compute the volume (i.e. the sum of pixel values) of each strip ($\mathrm{Vol}_i$), normalized by the global volume ($\mathrm{Vol}$):

$\mathrm{AWF} = \{\mathrm{Vol}_i / \mathrm{Vol}\}_{1 \le i \le n}$   (2)

Regional Moments of Inertia (RMI) is relatively similar to AWF. It requires an identical image subdivision, differing only in the characterization of each strip: instead of using the sum of pixel values, each strip is described by means of the mean Euclidean distance between its centroid and contour pixels [5].

Angle Code Histogram (ACH) has been used in Ref. [6] for tree leaf classification. Given the binary segmentation mask, it consists in first subsampling the contour points, followed by computing the angles of successive point triplets. The final feature is formed by the normalized histogram of the computed angles.

Edge Background/Foreground Ratio Histogram is computed on the binary mask of its input and consists in calculating the ratio of background to foreground pixels in a subwindow centered on each edge pixel. The normalized histogram of these ratios constitutes the final feature vector.

Orientation Histogram (OH) is computed on grayscale data. After computing the orientation map, using an 11x11 edge detection operator to determine the dominant orientation at each pixel, the feature vector is computed as the normalized histogram of n bins of dominant orientations.

Circular Covariance Histogram (CCH) and Rotation Invariant Point Triplets (RIT) are both texture descriptors [7] based on the morphological covariance operator. They operate on grayscale images and focus on extracting periodicity patterns by means of morphological openings and closings with circular structuring elements.

Color Auto-correlogram (AC) was used for color description [8]. It was computed in the LSH color space after a non-uniform subquantization to 63 colors (7 levels for hue, 3 for saturation and 3 for luminance). The color auto-correlogram describes the spatial correlation of colors: it consists of a table where the entry (i, j) denotes the probability of encountering two pixels of color i at a distance of j pixels.

Saturation-weighted Hue Histogram (SWHH) was also used as a color descriptor, where the total value of each bin $W_\theta$, $\theta \in [0, 360]$, is calculated as:

$W_\theta = \sum_x S_x \, \delta_{\theta H_x}$   (3)

where $H_x$ and $S_x$ are the hue and saturation values at position x, and $\delta_{ij}$ is the Kronecker delta function [9]. As far as the color space is concerned, we used LSH [10], since it provides a saturation representation independent of luminance.

We used a single Support Vector Machine (SVM) classifier, as described in [2], to classify LeafScan images based on these features.
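To make the first of these descriptors concrete, here is a minimal sketch of BSS as we read Eq. (1), using OpenCV for contour extraction; the function name and the use of the leaf region's centroid are our own assumptions, not the authors' implementation.

```python
import cv2
import numpy as np

def basic_shape_statistics(mask):
    """BSS: statistics of the sorted contour-to-centroid distance
    sequence, per Eq. (1). `mask` is a uint8 binary leaf mask."""
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    contour = max(contours, key=cv2.contourArea).reshape(-1, 2)
    ys, xs = np.nonzero(mask)
    centroid = np.array([xs.mean(), ys.mean()])    # leaf region centroid
    dists = np.sort(np.linalg.norm(contour - centroid, axis=1))
    return np.array([dists.max(), dists.min(),
                     np.median(dists), dists.var()])
```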
3.2 Features for Plant Photos

We used local descriptors and deep learning techniques for the task of recognizing photographs of plants, where shape descriptors are not useful and global color and texture information is of limited use. In the Stem category only, we used texture and color information globally, as the task seemed somewhat easier than the others and due to a lack of time.

Local Descriptors - Dense SIFT: This approach was applied to the Flower, Fruit and Entire categories. Due to the large number of classes, which increases training times and decreases accuracy, we first divided each category into sub-categories using the date information from the meta-data. Then we trained a separate classifier for each sub-category (15 of them in total), in order to reduce the problem size and increase accuracy.

In the dense-SIFT approach, 16-by-16 patches are collected from the images and SIFT descriptors are calculated for each patch [11, 12]. These descriptors are then clustered using k-means clustering [13] to produce a visual word dictionary. We experimented with the size of this dictionary and selected 1,200 words, as it gave the best results on validation data. Once the visual word dictionary is selected, each image is described as a histogram of these visual words, using the Bag of Words model [14]. Finally, a linear Support Vector Machine (SVM) is trained to differentiate between the classes in each sub-category, using these histograms as features [15]. We used the Weka toolbox [16] for the implementation, with the Stochastic Dual Coordinate Ascent [17] solver as the optimization method. For parameter optimization, grid search was used, and C = 10 was selected as the cost value.
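The sketch below illustrates the dense-SIFT bag-of-words pipeline just described. The paper's implementation used the Weka toolbox with an SDCA solver; here scikit-learn's LinearSVC (a dual coordinate descent solver, with the paper's C = 10) and MiniBatchKMeans stand in, and all function and variable names are our own assumptions.

```python
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import LinearSVC

def dense_sift(gray, step=16, patch=16):
    """SIFT descriptors computed on a dense grid of 16x16 patches."""
    sift = cv2.SIFT_create()
    kps = [cv2.KeyPoint(float(x), float(y), float(patch))
           for y in range(patch // 2, gray.shape[0], step)
           for x in range(patch // 2, gray.shape[1], step)]
    _, desc = sift.compute(gray, kps)
    return desc

def bow_histogram(desc, kmeans, k=1200):
    """Quantize descriptors against the 1,200-word visual dictionary
    and return a normalized bag-of-words histogram."""
    words = kmeans.predict(desc)
    hist, _ = np.histogram(words, bins=np.arange(k + 1))
    return hist / max(hist.sum(), 1)

def train_subcategory(train_images, labels, k=1200):
    """Sketch of training for one sub-category: learn the dictionary on
    pooled descriptors (MiniBatchKMeans as a fast stand-in for exact
    k-means [13]), encode every image, and fit a linear SVM with C=10."""
    all_desc = np.vstack([dense_sift(im) for im in train_images])
    kmeans = MiniBatchKMeans(n_clusters=k, random_state=0).fit(all_desc)
    X = np.array([bow_histogram(dense_sift(im), kmeans, k)
                  for im in train_images])
    return kmeans, LinearSVC(C=10).fit(X, labels)
```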
Deep Learning - Convolutional Neural Network: We trained a convolutional neural network (CNN) for the Leaf and Branch categories, which we considered the most challenging sub-categories. The CNN we employed contains 8 layers, where the output of the last fully connected softmax layer produces a distribution over the class labels. The first and fourth layers are convolutional layers with 5x5 kernels; the second and fifth layers are local contrast normalization layers; the third and sixth layers are max-pooling layers; the seventh layer is a locally connected convolutional layer. The Rectified Linear Unit (ReLU) non-linearity is applied to the output of every convolutional and fully connected layer [18-20].

To process each color image, the largest square region of the image is cropped and down-scaled to 32x32x3. The first convolutional layer filters the 32x32x3 input image with 32 kernels of size 5x5x3; the second convolutional layer takes as input the (response-normalized and pooled) output of the first convolutional layer and filters it with 64 kernels of size 5x5x32; the third convolutional layer, a locally connected layer with unshared weights, has 64 kernels of size 3x3x64; the final layer is a fully connected soft-max layer.

To reduce over-fitting, we used two techniques: data augmentation and dropout. For data augmentation, we applied label-preserving transformations to the image data, such as reflections, small rotations and translations, whereas dropout was employed in the third convolutional layer, just before the soft-max layer.
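A PyTorch sketch of an architecture matching this description is given below. The paper does not name its framework, and PyTorch lacks a built-in locally connected (unshared-weight) layer, so the stand-ins are noted in the comments; this is an approximation, not the authors' network.

```python
import torch.nn as nn

class PlantCNN(nn.Module):
    """Approximation of the 8-layer CNN described above (input 3x32x32).
    LocalResponseNorm stands in for local contrast normalization, and a
    regular 3x3 convolution stands in for the locally connected layer 7."""
    def __init__(self, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, padding=2), nn.ReLU(),   # layer 1
            nn.LocalResponseNorm(5),                                 # layer 2
            nn.MaxPool2d(2),                                         # layer 3: 32 -> 16
            nn.Conv2d(32, 64, kernel_size=5, padding=2), nn.ReLU(),  # layer 4
            nn.LocalResponseNorm(5),                                 # layer 5
            nn.MaxPool2d(2),                                         # layer 6: 16 -> 8
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),  # layer 7 (stand-in)
            nn.Dropout(0.5),                                         # dropout before softmax
        )
        self.classifier = nn.Linear(64 * 8 * 8, n_classes)           # layer 8: FC softmax

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))  # logits; softmax applied in the loss
```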
Global Features - Stem Category: For the characterization of the samples of the Stem category, we employed a combination of texture and color descriptors obtained from the whole image. The reason was two-fold: a lack of time, and the impression that this sub-category was relatively simpler compared to the other sub-categories.

We used as descriptors those used with the isolated leaf images (i.e. OH, CCH, RIT), as well as the Morphological Covariance (MC). MC is the morphological equivalent of the usual covariance operator. It is based on successive erosions by means of point couples at various distances and orientations:

$MC(f) = \mathrm{Vol}\left(\varepsilon_{P_{2,v}}(f)\right) / \mathrm{Vol}(f)$   (4)

where $\varepsilon$ denotes the erosion operator, $P_{2,v}$ a point couple separated by a vector $v$, and $\mathrm{Vol}$ the image volume, i.e. the sum of pixel values. As such, it can capture the directionality as well as the periodicity of its input [7].

4 Results

For training classifiers and validating our methods, we split the provided training set into two (Train and Validation sets), separately for each sub-category. In doing the split of the available development data, we kept all images of an individual plant in either the Train or the Validation split, in order to reduce overfitting. We report the top-1 recognition accuracy of our system on this Validation data in the fourth column of Table 1, since the labels for the Test set are not yet available. We also report the inverse-rank scores obtained on the Test data, as announced by the competition organizers, in the fifth column. Note that the top-1 accuracies and the inverse-rank scores are not directly comparable, due to the data set change and the different metrics, and also because the official score uses user-based averaging, while the accuracy results are averaged over images.

Table 1. Accuracy in each sub-category, as measured during development over the validation split of the development data.

  Method      Content    # Train images/#Classes  # Validation images/#Classes  Accuracy (%)  Inverse-Rank
  Global      LeafScan    9,532/212                1,802/133                     69.70         0.449
  Global      Stem        2,526/368                  940/282                     25.15         0.089
  Dense-SIFT  Flower     11,169/484                1,993/464                     21.74         0.149
  Dense-SIFT  Fruit       3,033/374                  721/269                     19.12         0.118
  Dense-SIFT  Entire      5,348/490                1,008/476                      9.55         0.077
  CNN         Leaf        5,660/470                2,094/439                     20.68         0.066
  CNN         Branch      1,414/356                  573/270                     12.64         0.007

The way we split the training data resulted in a smaller number of classes in the Validation set, as seen in Table 1, but we did not want to split images of the same individual plant across the two sets (as some of them could be very similar). As a result, the validation performance was not fully informative, since some classes were not in the set; but this was considered preferable to the alternatives.

Our results are not very good, except for the LeafScan category, where the task is easier and we have obtained very good results in previous years [21, 2]. In fact, all of our inverse-rank scores are significantly lower compared to last year's results. This may be attributed to the more complex problem (with almost twice the number of classes) and also to the lack of time to fully polish our new methods.

Acknowledgements

This project is supported by the Scientific and Technological Research Council of Turkey (TUBITAK), under project number 113E499.

References

1. Joly, A., Müller, H., Goëau, H., Glotin, H., Spampinato, C., Rauber, A., Bonnet, P., Vellinga, W.P., Fisher, B.: LifeCLEF 2014: multimedia life species identification challenges. In: Proceedings of CLEF 2014 (2014)
2. Yanikoglu, B., Aptoula, E., Yildiran, S.T.: Sabanci-Okan system at ImageCLEF 2013. In: CLEF (Notebook Papers/Labs/Workshop) (2013)
3. Goëau, H., Joly, A., Bonnet, P., Molino, J.F., Barthélémy, D., Boujemaa, N.: LifeCLEF plant identification task 2014. In: CLEF Working Notes 2014 (2014)
4. Hossain, J., Amin, M.A.: Leaf shape identification based plant biometrics. In: International Conference on Computer and Information Technology, Dhaka, Bangladesh (2010) 458-463
5. Knight, D., Painter, J., Potter, M.: Automatic plant leaf classification for a mobile field guide (2010)
6. Wang, Z., Chi, Z., Feng, D.: Shape based leaf image retrieval. IEE Proceedings - Vision, Image and Signal Processing 150 (2003) 34-43
7. Aptoula, E.: Extending morphological covariance. Pattern Recognition 45 (2012) 4524-4535
8. Huang, J., Kumar, S.R., Mitra, M., Zhu, W.J., Zabih, R.: Image indexing using color correlograms. In: CVPR, San Juan, Puerto Rico (1997) 762-768
9. Hanbury, A.: Circular statistics applied to colour images. In: Computer Vision Winter Workshop, Valtice, Czech Republic (2003) 55-60
10. Aptoula, E., Lefèvre, S.: On the morphological processing of hue. Image and Vision Computing 27 (2009) 1394-1401
11. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60 (2004) 91-110
12. Liu, C., Yuen, J., Torralba, A., Sivic, J., Freeman, W.T.: SIFT flow: Dense correspondence across different scenes. In: Proceedings of the 10th European Conference on Computer Vision: Part III. ECCV '08, Berlin, Heidelberg, Springer-Verlag (2008) 28-42
13. Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu, A.Y.: An efficient k-means clustering algorithm: Analysis and implementation. IEEE Trans. Pattern Anal. Mach. Intell. 24 (2002) 881-892
14. Wallach, H.M.: Topic modeling: Beyond bag-of-words. In: Proceedings of the 23rd International Conference on Machine Learning. ICML '06, New York, NY, USA, ACM (2006) 977-984
15. Joachims, T.: Training linear SVMs in linear time. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD '06, New York, NY, USA, ACM (2006) 217-226
16. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. Newsl. 11 (2009) 10-18
17. Hsieh, C.J., Chang, K.W., Lin, C.J., Keerthi, S.S., Sundararajan, S.: A dual coordinate descent method for large-scale linear SVM. In: Proceedings of the 25th International Conference on Machine Learning. ICML '08, New York, NY, USA, ACM (2008) 408-415
18. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In Fürnkranz, J., Joachims, T., eds.: ICML, Omnipress (2010) 807-814
19. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In Bartlett, P.L., Pereira, F.C.N., Burges, C.J.C., Bottou, L., Weinberger, K.Q., eds.: NIPS (2012) 1106-1114
20. Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Improving neural networks by preventing co-adaptation of feature detectors. CoRR abs/1207.0580 (2012)
21. Yanikoglu, B., Aptoula, E., Tirkaz, C.: Sabanci-Okan system at ImageCLEF 2012: Combining features and classifiers for plant identification. In: CLEF (Notebook Papers/Labs/Workshop) (2012)