The CLEF 2011 plant images classification task

Hervé Goëau(1), Pierre Bonnet(2), Alexis Joly(1), Nozha Boujemaa(1), Daniel Barthelemy(3), Jean-François Molino(4), Philippe Birnbaum(5), Elise Mouysset(6), and Marie Picard(6)

(1) INRIA, IMEDIA team, France, name.surname@inria.fr, http://www-rocq.inria.fr/imedia/
(2) INRA, UMR AMAP, France, pierre.bonnet@cirad.fr, http://amap.cirad.fr/fr/index.php
(3) CIRAD, BIOS Direction, and INRA, UMR AMAP, F-34398, France, daniel.barthelemy@cirad.fr, http://amap.cirad.fr/fr/index.php
(4) IRD, UMR AMAP, France, jean-francois.molino@ird.fr, http://amap.cirad.fr/fr/index.php
(5) CIRAD, UMR AMAP, France, philippe.birnbaum@cirad.fr, http://amap.cirad.fr/fr/index.php
(6) Tela Botanica, France, name@tela-botanica.org, http://www.tela-botanica.org/

Abstract. The ImageCLEF plant identification task provides a testbed for the system-oriented evaluation of tree species identification based on leaf images. The aim is to investigate image retrieval approaches in the context of crowdsourced images of leaves collected in a collaborative manner. This paper presents an overview of the resources and assessments of the plant identification task at ImageCLEF 2011, summarizes the retrieval approaches employed by the participating groups, and provides an analysis of the main evaluation results.

Keywords: ImageCLEF, plant, leaves, images, collection, identification, classification, evaluation, benchmark

1 Introduction

Convergence of multidisciplinary research is increasingly considered the key to answering profound challenges of humanity related to health, biodiversity or sustainable energy. The integration of life sciences and computer sciences has a major role to play in managing and analyzing cross-disciplinary scientific data at a global scale. More specifically, building accurate knowledge of the identity, geographic distribution and uses of plants is essential if agricultural development is to be successful and biodiversity is to be conserved. Unfortunately, such basic information is often only partially available to professional stakeholders, teachers, scientists and citizens, and is often incomplete for the ecosystems that possess the highest plant diversity. A noticeable consequence, expressed as the taxonomic gap, is that identifying plant species is usually impossible for the general public, often a difficult task for professionals such as farmers or foresters, and sometimes hard even for botanists themselves. The only way to overcome this problem is to speed up the collection and integration of raw observation data, while simultaneously providing potential users with easy and efficient access to this botanical knowledge. In this context, content-based visual identification of plant images is considered one of the most promising solutions to help bridge the taxonomic gap. Evaluating recent advances of the IR community on this challenging task is therefore an important issue.

This paper presents the plant identification task that was organized within ImageCLEF 2011 (http://www.imageclef.org/2011) for the system-oriented evaluation of visual-based plant identification. This first-year pilot task focused more precisely on tree species identification based on leaf images. Leaves are far from being the only discriminant visual key between tree species, but they have the advantage of being easily observable and are the most studied plant organ in the computer vision community.
The task was organized as a classification task over 70 tree species, with visual content being the main available information. Additional information only included contextual metadata (author, date, locality name) and some EXIF data. Three types of image content were considered: leaf scans, leaf photographs with a uniform white background (referred to as scan-like pictures), and unconstrained leaf photographs acquired on trees with a natural background. The main originality of this data is that it was specifically built through a citizen science initiative conducted by Tela Botanica (http://www.tela-botanica.org/), a French social network of amateur and expert botanists. This makes the task closer to the conditions of a real-world application: (i) leaves of the same species come from distinct trees living in distinct areas, (ii) pictures and scans are taken by different users who might not use the same protocol to collect the leaves and/or acquire the images, (iii) pictures and scans are taken at different periods of the year.

2 Task resources

2.1 The Pl@ntLeaves dataset

Building effective computer vision and machine learning techniques is not the only side of the taxonomic gap problem. Speeding up the collection of raw observation data is clearly another crucial one. The most promising approach in that direction is to build real-world collaborative systems allowing any user to enrich the global visual botanical knowledge [9]. To build the evaluation data of the ImageCLEF plant identification task, we therefore set up a citizen science project around the identification of common woody species covering the Metropolitan French territory. This was done in collaboration with the Tela Botanica social network and with researchers specialized in computational botany.

Technically, images and associated tags were collected through a crowd-sourcing web application [9] and were all validated by expert botanists. Several cycles of such collaborative data collection and taxonomical validation occurred. Scans of leaves were first collected over two seasons, between July and September 2009 and between June and September 2010, thanks to the work of active contributors from the Tela Botanica social network. The idea of collecting only scans during this first period was to initialize the training data with limited noisy background and to focus on plant variability rather than on mixed plant and view-condition variability. This allowed the collection of 2228 scans covering 55 species. A public version of the application (http://combraille.cirad.fr:8080/demo plantscan/) was then opened in October 2010, and additional data were collected up to March 2011. The newly collected images were either scans, photographs with a uniform background (referred to as scan-like photos), or unconstrained photographs with a natural background. They involved 15 new species in addition to the previous set of 55 species. The Pl@ntLeaves dataset used within ImageCLEF finally contained 5436 images: 3070 scans, 897 scan-like photos and 1469 photographs. Figure 2 displays samples of these 3 image types for 4 distinct tree species. The full list of species is provided in Figure 1.

Fig. 1. List of tree species included in the Pl@ntLeaves dataset
Fig. 2. Illustration of the 3 image type categories for 4 species

2.2 Pl@ntLeaves metadata

Each image of the Pl@ntLeaves dataset is associated with the following metadata:

– Date: upload date of the image
– Type: acquisition type (scan, scan-like or photograph)
– Content: content type (single leaf, single dead leaf, or foliage, i.e. several leaves on the tree visible in the picture)
– Taxon: full taxon name (sub-regnum, regnum, class, division, order, family, genus, species)
– VernacularNames: French or English vernacular names
– Author: name of the author of the picture
– Organization: name of the organization of the author
– Locality: locality name (a district or a country division or a region)
– GPSLocality: GPS coordinates of the observation

These metadata are stored in independent XML files, one for each image. Figure 3 displays an example image with its associated XML data. Additional but partial metadata can be found in the images' EXIF headers, and might include the camera or scanner model, the image resolution and dimensions, the optical parameters, the white balance, the light measures, etc.

Fig. 3. An image of the Pl@ntLeaves dataset and its associated metadata
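As an illustration, such per-image metadata files can be loaded with a few lines of Python. This is a minimal sketch only: the tag names are assumed to mirror the field names listed above and should be checked against the actual Pl@ntLeaves XML files.

import xml.etree.ElementTree as ET

# Hypothetical tag names, mirroring the metadata list above;
# the actual tags in the Pl@ntLeaves XML files may differ.
FIELDS = ["Date", "Type", "Content", "Taxon", "VernacularNames",
          "Author", "Organization", "Locality", "GPSLocality"]

def read_metadata(xml_path):
    """Return the metadata of one image as a dict of field -> text."""
    root = ET.parse(xml_path).getroot()
    return {f: root.findtext(f) for f in FIELDS}

# Example usage (hypothetical file name):
# meta = read_metadata("1234.xml")
# print(meta["Taxon"], meta["Type"])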
2.3 Pl@ntLeaves variability

The main originality of Pl@ntLeaves compared to previous leaf datasets, such as the Swedish dataset [13] or the Smithsonian one [1], is that it was built in a collaborative manner through a citizen science initiative. This makes it closer to the conditions of a real-world application: (i) leaves of the same species come from distinct trees living in distinct areas, (ii) pictures and scans are taken by different users who might not use the same protocol to collect the leaves and/or acquire the images, (iii) pictures and scans are taken at different periods of the year. Intra-species visual variability and view-condition variability are therefore more pronounced, which makes the identification more realistic but more complex. Figures 4 to 9 provide illustrations of the intra-species visual variability over several criteria, including leaf color, global leaf shape, margin appearance, number and relative positions of leaflets, and number of lobes. On the other hand, Figure 10 illustrates the light reflection and shadow variations of scan-like photos. It shows that this acquisition protocol is actually very different from pure scans. Both share the property of a limited noisy background, but scan-like photos are much more complex due to the variability of lighting conditions (flash, sunny weather, etc.) and the unflatness of leaves. Finally, the variability of unconstrained photographs acquired on the tree with a natural background is a much more challenging issue, as illustrated in Figure 11.

Fig. 4. Color variation of Cotinus coggygria Scop. (Eurasian smoketree)
Fig. 5. Global shape variation of Corylus avellana L. (European Hazel)
Fig. 6. Leaf margin variation of Quercus ilex L. (Holm oak)
Fig. 7. Number of leaflets variation of Fraxinus angustifolia Vahl (Narrow-leafed Ash)
Fig. 8. Leaflets relative position variation of Vitex agnus-castus L. (Judas Tree)
Fig. 9. Number of lobes variation of Ficus carica L. (Common Fig)
Fig. 10. Light reflection and shadow variation of scan-like photos of Magnolia grandiflora (Southern Magnolia)
Fig. 11. Variability of unconstrained photographs of Acer platanoides (Norway Maple)

3 Task description

The task was evaluated as a supervised classification problem with tree species used as class labels.

3.1 Training and test data

A part of the Pl@ntLeaves dataset was provided as training data, whereas the remaining part was used later as test data. The training subset was built by randomly selecting 2/3 of the individual plants of each species (and not by randomly splitting the images themselves), so that pictures of leaves belonging to the same individual tree cannot be split across training and test data. This prevents identifying the species of a given tree thanks to its own leaves and makes the task more realistic: in a real-world application, it is indeed very unlikely that a user tries to identify a tree that is already present in the training data.
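A minimal sketch of such a plant-level split is given below, assuming each image record carries a species label and an individual-plant identifier (hypothetical field names, for illustration only):

import random
from collections import defaultdict

def split_by_plant(images, train_ratio=2/3, seed=0):
    """Split image records into train/test so that all images of one
    individual plant end up on the same side of the split."""
    # Group the individual plants of each species.
    plants_per_species = defaultdict(set)
    for img in images:  # img: dict with 'species' and 'plant_id' keys (assumed)
        plants_per_species[img["species"]].add(img["plant_id"])

    rng = random.Random(seed)
    train_plants = set()
    for species, plants in plants_per_species.items():
        plants = sorted(plants)
        rng.shuffle(plants)
        # Keep 2/3 of the individual plants of each species for training.
        train_plants.update(plants[: round(len(plants) * train_ratio)])

    train = [img for img in images if img["plant_id"] in train_plants]
    test = [img for img in images if img["plant_id"] not in train_plants]
    return train, test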
Detailed statistics on the composition of the training and test data are provided in Table 1.

Type        Subset  Nb of pictures  Nb of individual plants  Nb of contributors
Scan        Train   2349            151                      17
Scan        Test    721             55                       13
Scan-like   Train   717             51                       2
Scan-like   Test    180             13                       1
Photograph  Train   930             72                       2
Photograph  Test    539             33                       3
All         Train   3996            269                      17
All         Test    1440            99                       14

Table 1. Statistics of the composition of the training and test data

3.2 Task objective and evaluation metric

The goal of the task was to associate the correct tree species to each test image. Each participant was allowed to submit up to 3 runs built from different methods. As many species as desired could be associated to each test image, sorted by decreasing confidence score. Only the most confident species was however used in the primary evaluation metric described below. Providing an extended ranked list of species was nevertheless encouraged, in order to derive complementary statistics (e.g. recognition rate at other taxonomic levels, suggestion rate on top-k species, etc.).

The primary metric used to evaluate the submitted runs was a normalized classification rate evaluated on the first species returned for each test image. Each test image is attributed a score of 1 if the first returned species is correct and 0 if it is wrong. An average normalized score is then computed over all test images. A simple mean over all test images would indeed introduce some bias with regard to a real-world identification system. We remind the reader that the Pl@ntLeaves dataset was built in a collaborative manner, so that a few contributors might have provided many more pictures than the many other contributors who provided few. Since we want to evaluate the ability of a system to provide correct answers to all users, we rather measure the mean of the average classification rate per author. Furthermore, some authors sometimes provided many pictures of the same individual plant (to enrich the training data with less effort). Since we want to evaluate the ability of a system to provide the correct answer based on a single plant observation, we also decided to average the classification rate over each individual plant. Finally, our primary metric was defined as the following average classification score S:

S = \frac{1}{U} \sum_{u=1}^{U} \frac{1}{P_u} \sum_{p=1}^{P_u} \frac{1}{N_{u,p}} \sum_{n=1}^{N_{u,p}} s_{u,p,n}    (1)

where
U : number of users (who have at least one image in the test data)
P_u : number of individual plants observed by the u-th user
N_{u,p} : number of pictures taken from the p-th plant observed by the u-th user
s_{u,p,n} : classification score (1 or 0) for the n-th picture taken from the p-th plant observed by the u-th user

It is important to notice that, while making the task more realistic, the normalized classification score also makes it more difficult. Indeed, it works as if a bias was introduced between the statistics of the training data and those of the test data. This highlights the fact that bias-robust machine learning and computer vision methods should be preferred when training on such real-world collaborative data.
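A minimal sketch of this metric, assuming each test prediction is reduced to a (user, plant, correct-or-not) record:

from collections import defaultdict

def normalized_score(records):
    """Compute the score S of Eq. (1).

    `records` is an iterable of (user, plant_id, correct) tuples, one per
    test image, where `correct` is 1 if the first returned species is
    right and 0 otherwise (record layout assumed for illustration)."""
    per_plant = defaultdict(list)          # (user, plant) -> [0/1, ...]
    for user, plant, correct in records:
        per_plant[(user, plant)].append(correct)

    per_user = defaultdict(list)           # user -> [plant-level averages]
    for (user, plant), scores in per_plant.items():
        per_user[user].append(sum(scores) / len(scores))

    # Average over plants within each user, then over users.
    user_means = [sum(v) / len(v) for v in per_user.values()]
    return sum(user_means) / len(user_means)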
Finally, to isolate and evaluate the impact of the image acquisition type (scan, scan-like, photograph), a normalized classification score S was computed for each type separately. Participants were therefore allowed to train distinct classifiers, use different training subsets or use distinct methods for each data type.

4 Participants and techniques

A total of 8 groups submitted 20 runs, which is a successful participation rate for a first-year pilot task on a new topic. Participants were mainly academics, specialized in computer vision and multimedia information retrieval, coming from all around the world: Australia (1), Brazil (1), France (2), Romania (1), Spain (1), Turkey (1) and the UK (1). We list below the 8 participants and give a brief overview of the techniques they used in the plant identification task. We remind the reader that the ImageCLEF benchmark is a system-oriented evaluation and not a formal evaluation of the underlying methods. Readers interested in the scientific and technical details of any of these methods should refer to the CLEF 2011 working note of each participant.

IFSC (3 runs) [6] The two best runs of this participant (IFSC USP run2 & IFSC USP run1) are mainly based on a new shape boundary analysis method they introduced recently [3]. It builds on complex network theory [2]: a shape is modeled as a small-world complex network, and degree and joint-degree measurements in a dynamically evolving network are used to compose a set of shape descriptors. This method is claimed to be robust, noise tolerant, scale invariant and rotation invariant, and was shown to provide better performances than Fourier shape descriptors, curvature-based descriptors, Zernike moments and multiscale fractal dimensions.

LIRIS (4 runs) [7] This participant also used a classification scheme based on shape boundary analysis. The main originality, however, is that they used a model-driven approach for the segmentation and shape estimation. Their four runs differ in the parameters of the method.

UAIC (3 runs) This participant was the only one trying to benefit from the metadata associated with the images (location, date, author, etc.). They therefore submitted 3 runs to evaluate the contribution of metadata compared to using visual content only. Their first run (UAIC2011 Run01) is based on visual content only, the second one (UAIC2011 Run02) uses only metadata-based features in the classification process, and the third one uses both (UAIC2011 Run03).

SABANCI-OKAN (1 run) [16] The system consists of two separate subparts for: i) scan and pseudo-scan images and ii) photos. The features used for the scan categories are computed using basic color and shape descriptors, such as color moments and convexity, as well as more complex ones, such as Fourier descriptors of the contour and several morphological texture descriptors based on covariance extensions. Since some of these features are not meaningful for the photo category, a subset with color and texture features was used there. For training, support vector machines and classifier combination were used, relying only on image content and none of the metadata. All of the scan and pseudo-scan images were used for part i), while for part ii) all of the images were used. The system was fully automatic.
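As an illustration of this kind of pipeline (global descriptors fed to an SVM), here is a minimal sketch using scikit-learn; the feature extraction is reduced to a placeholder and is not the set of descriptors actually used by SABANCI-OKAN:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def extract_features(image):
    """Placeholder for global descriptors (color moments, contour
    Fourier descriptors, morphological texture, ...); illustrative only."""
    return np.resize(np.asarray(image, dtype=float).ravel(), 256)

def train_classifier(train_images, train_species):
    """Fit an SVM on global feature vectors, one per training image."""
    X = np.stack([extract_features(im) for im in train_images])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
    clf.fit(X, train_species)
    return clf

# Predictions sorted by decreasing confidence, as the task requires:
# probs = clf.predict_proba(X_test); ranked = np.argsort(-probs, axis=1)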
INRIA (2 runs) [10] This participant submitted two runs based on two radically different methods. Their second run (inria imedia plantnet run2) is based on a shape boundary feature, called DFH [15], that they introduced in 2006. Their first run (inria imedia plantnet run1) is more surprising for a supervised classification task on leaves, since it is based on local-feature matching with rigid geometrical models. Such a generalist method is usually dedicated to large-scale retrieval of rigid objects, and this is the only participant who used such an approach.

RMIT (2 runs) [11] RMIT mainly focused on comparing two distinct machine learning algorithms: instance-based learning, implemented in Weka as IB1 (a nearest-neighbor classifier) (RMIT run1), and a decision tree technique, implemented in Weka as J48 (RMIT run2). For both, all training data were used, without complementary data. The features used were GIFT 166-bin colour histograms.

DAEDALUS (1 run) [14] This participant used a generalist image retrieval framework based on SIFT features and a nearest-neighbor classifier.

KMIMMIS (4 runs) This participant also used a generalist image retrieval framework based on local features and a nearest-neighbor classifier. They compared different configurations: basic clustered SIFT with 1-NN label transfer (kmimmis run1 and kmimmis run4), simple edge and corner point detection with 1-NN vote label transfer (kmimmis run2), and simple edge and corner point detection with 10-NN vote label transfer (kmimmis run3).
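The k-NN vote label transfer used in several of these runs can be sketched as follows (the feature space is again a placeholder, not the participants' actual descriptors):

import numpy as np
from collections import Counter

def knn_label_transfer(query_feat, train_feats, train_labels, k=10):
    """Predict a species by majority vote among the k nearest
    training images in feature space (Euclidean distance)."""
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    # Species ranked by decreasing number of votes, as the task expects.
    return [species for species, _ in votes.most_common()]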
Over all runs, the most frequently used class of methods is shape boundary analysis: 8 runs among 20 are based on some boundary shape features. This is not surprising, since state-of-the-art methods addressing leaf-based identification in the literature are mostly based on leaf segmentation and shape boundary features [4, 12, 5, 15, 3]. On the other hand, it is good news that the majority of the runs were based on various other approaches, so that more relevant conclusions can be expected.

5 Results

5.1 Global analysis

Figures 12, 13 and 14 present the normalized classification scores of the 20 submitted runs for each of the three image types. Figure 15 presents the overall performances averaged over the 3 image types. Table 2 finally presents the same results with detailed numerical values.

Fig. 12. Normalized classification scores for scan images
Fig. 13. Normalized classification scores for scan-like photos
Fig. 14. Normalized classification scores for photographs
Fig. 15. Normalized classification scores averaged over all image types

A first global remark is that, as expected, performance degrades with the complexity of the acquisition type: scans are easier to identify than scan-like photos, and unconstrained photos are much more difficult. This can easily be seen in Figure 15, where the relative scores of each image type are highlighted by distinct colors.

A second global remark is that no method provides the best score for all image types. No run even belongs to the top-3 runs of all image types (as shown in Table 2). This is somewhat disappointing from the genericity point of view, but not surprising given the nature of the different image types. One could expect scans and scan-like photos to lead to similar conclusions, but this is actually not the case. The only runs that give quite stable and good performances over the three image types are the two runs of IFSC based on the complex-network shape boundary analysis method (IFSC USP run1 & IFSC USP run2). This justifies their excellent ranking when averaging the classification scores over the three image types. All other methods fail to provide as good results on the unconstrained photographs, as shown in Figure 14. This score gap between the IFSC runs and the others has however to be tempered by a bias introduced by the author normalization of the classification score. Indeed, their high score is mostly due to excellent performances on the images of one of the 3 contributors: all these images are very similar and less cluttered than the average unconstrained photo (they are actually all close-ups of Judas tree leaves). Still, this is the only method that managed to perform well on these images.

A third important remark is that shape boundary analysis methods do not provide the best results on the scan images, whereas they are usually considered state-of-the-art on such data. They all provide good classification scores, between 48% and 56%, but they are consistently outperformed by two more generic image retrieval approaches (as shown in Figure 12). The best score is achieved by INRIA's run using large-scale matching of local features with rigid geometrical models (inria imedia plantnet run1, 68% classification rate). This suggests that modeling leaves as part-based rigid and textured objects might be an interesting alternative to shape boundary approaches, which do not characterize margin details, veins or blade texture well. The second best score on scans is obtained by the run of SABANCI-OKAN (Sabanci-okan-run1), which uses a supervised classification approach based on support vector machines (SVM) and a combination of 3 global visual features. This suggests that combining shape boundary features with other color and texture features is also a promising direction.

Global conclusions on the scan-like photos are quite different (Figure 13). The best score is obtained by INRIA's run (inria imedia plantnet run2), purely based on a global shape boundary feature (DFH [15]). It is followed closely by the four runs of LIRIS, also based on boundary shape features but using a model-driven approach for the segmentation and the shape estimation. Then come the two runs of INRIA and SABANCI-OKAN that ranked first on the scan images (the first one based on rigid-object matching and the second one training combined features), and finally the shape boundary method of IFSC. One conclusion is that shape boundary methods appear to provide more stable results across scans and scan-like photos. On the other hand, the rigid-object matching method of INRIA degrades much more from scans to scan-like pictures. This can be explained by the fact that it is more discriminant regarding the leaf's morphology but less robust to light reflections and shadows. These lighting variations might also explain the degraded performances of the combined features used by SABANCI-OKAN.
Run id                      Participant    Scans   Scan-like  Photographs  Mean
IFSC USP run2               IFSC           0.562*  0.402      0.523*       0.496
inria imedia plantnet run1  INRIA          0.685*  0.464      0.197        0.449
IFSC USP run1               IFSC           0.411   0.430      0.503*       0.448
LIRIS run3                  LIRIS          0.546   0.513      0.251*       0.437
LIRIS run1                  LIRIS          0.539   0.543*     0.208        0.430
Sabanci-okan-run1           SABANCI-OKAN   0.682*  0.476      0.053        0.404
LIRIS run2                  LIRIS          0.530   0.508      0.169        0.403
LIRIS run4                  LIRIS          0.537   0.538*     0.121        0.399
inria imedia plantnet run2  INRIA          0.477   0.554*     0.090        0.374
IFSC USP run3               IFSC           0.356   0.187      0.116        0.220
kmimmis run4                KMIMMIS        0.384   0.066      0.101        0.184
kmimmis run1                KMIMMIS        0.384   0.066      0.040        0.163
UAIC2011 Run01              UAIC           0.199   0.059      0.209        0.156
kmimmis run3                KMIMMIS        0.284   0.011      0.060        0.118
UAIC2011 Run03              UAIC           0.092   0.163      0.046        0.100
kmimmis run2                KMIMMIS        0.098   0.028      0.102        0.076
RMIT run1                   RMIT           0.071   0.000      0.098        0.056
RMIT run2                   RMIT           0.061   0.032      0.043        0.045
daedalus run1               DAEDALUS       0.043   0.025      0.055        0.041
UAIC2011 Run02              UAIC           0.000   0.000      0.042        0.014

Table 2. Normalized classification scores for each run and each image type. The top 3 results per image type are marked with *.

5.2 About using metadata

Using metadata to help the identification, and more particularly geo-tags, is definitely something that has to be studied. We were therefore very enthusiastic to see the runs of UAIC, aimed at evaluating the potential contributions of the Pl@ntLeaves metadata. Unfortunately, the results clearly show that adding metadata degrades their identification performances; metadata alone even give a null classification success rate. Besides technical details that could probably slightly improve such metadata-based species filtering, this still suggests that the Pl@ntLeaves metadata might be intrinsically not very useful for identification purposes. The main reason is probably that the geographic spread of the data is limited (the French Mediterranean area), so that most species of the dataset might be identically and uniformly distributed over the covered area. Geo-tags would certainly be more useful at a global scale (continents, countries). At a local scale, however, the geographical distribution of plants is much more complex: it usually depends on localized environmental factors, such as sun exposure or proximity to water, which would require much more data to be modeled.

5.3 Performances per species

To evaluate which species are more difficult to identify than others, we averaged the performances over the runs of all participants for each species. It is however difficult to interpret the score variations precisely: they can be due to morphological variations, but also to different view conditions or to other statistical biases in the data, such as the number of training images. Figure 16 presents the obtained graph for the scan images only (in order to limit view-condition bias). The only global trend we have discovered so far is that simple leaves are on average easier to identify than compound leaves.

Fig. 16. Mean classification score per species averaged over all participant runs (scan test images only)
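These per-species statistics (and the per-morphology statistics below) are plain grouped averages. A minimal sketch, assuming a list of per-image, per-run outcomes annotated with the attribute of interest (illustrative field names):

from collections import defaultdict

def mean_score_by(outcomes, key):
    """Average 0/1 outcomes grouped by an attribute.

    `outcomes` is an iterable of dicts such as
    {"species": ..., "organization": ..., "correct": 0 or 1},
    one entry per test image and per run (layout assumed)."""
    grouped = defaultdict(list)
    for o in outcomes:
        grouped[o[key]].append(o["correct"])
    return {k: sum(v) / len(v) for k, v in grouped.items()}

# e.g. mean_score_by(outcomes, "species") or mean_score_by(outcomes, "organization")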
5.4 Performances per plant morphology

In botany, species are described and categorized by morphological features, frequently those of the leaves [8]. These features are very numerous and concern, for instance, the leaf organization, the margin type, the shapes, the venation, the presence of spines, etc. We attempt here to evaluate which kinds of morphological features make species identification more difficult, through three essential features: the leaf organization, the global shape (for simple leaves only) and the margin type. Figure 17 gives a detailed description of these features for each species involved in the test image dataset.

Fig. 17. Morphological features of the species used as test images: leaf organization, global shape (for simple leaves only), and margin type

To evaluate which leaf organization makes species identification more difficult, we averaged the performances over the runs of all participants for each kind of organization: simple or compound. Figure 18 presents the obtained scores for each kind of image. This graph confirms that species with simple leaves are on average easier. However, compound leaves can be subdivided into sub-categories describing how the leaflets are organized: pinnately compound (with leaflets arranged along the main axis, called the rachis) or palmately compound (with leaflets attached at one common basal point). Figure 19 presents the obtained scores and illustrates that species with palmately compound leaves can be easier than species with simple leaves, at least for the two tested species Aesculus hippocastanum and Vitex agnus-castus. Another interesting point is the difference in scores between scans and scan-like photos for pinnately compound leaves (there were no scan-like images of palmately compound leaves in the test image dataset). One explanation is that scan-like photos of compound leaves are often in relief, which introduces disturbing shadows and distorts the global shape.

Fig. 18. Mean classification score per leaf organization averaged over all participant runs
Fig. 19. Mean classification score per leaf organization averaged over all participant runs, with detailed scores for compound leaves

In the case of simple leaves, to evaluate which kind of shape makes species identification more difficult, we averaged the performances over the runs of all participants for each category of shape identified in the test image dataset: asymmetrical, elliptic, lanceolate, linear, lobate, obovate and orbicular. Figure 20 presents the obtained scores for each kind of shape and image type. Results show that species with an orbicular shape are easier to identify. This is confirmed in Figure 16, where three of the six orbicular species in the test image dataset (Acer campestre, Cercis siliquastrum, Corylus avellana) give good results (the best one being Corylus avellana).

Fig. 20. Mean classification score per simple leaf shape averaged over all participant runs

One last important morphological feature studied by botanists is the margin type. In this case, to evaluate which type of margin makes species identification more difficult, we averaged the performances over the runs of all participants for each category of margin identified in the test image dataset: untoothed, dentate, serrate and crenate. Figure 21 presents the obtained scores for each kind of margin and image type. Results show that the margin type weakly affects species identification, except maybe for species with a crenate margin.

Fig. 21. Mean classification score per margin type averaged over all participant runs

5.5 Performances per image

To qualitatively assess which kinds of images lead to good results and which ones make all methods fail, we sorted all test pictures by the number of runs in which they were correctly identified. The obtained ranking confirms that scan images are much easier for the identification: 99% of the top-100 ranked images are actually scans (with 11 to 17 successful runs). Figure 22 displays the 4 best identified images (with 17/20 successful runs). They are all very standard leaf images, similar to the ones found in book illustrations.

Fig. 22. The 4 best identified images (17/20 successful runs)
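A sketch of this per-image difficulty ranking, assuming for each run a set of the test images it identified correctly:

from collections import Counter

def rank_images_by_difficulty(run_results, test_images):
    """Sort test images from easiest (identified by most runs) to
    hardest. `run_results` is a list of sets, one per run, each
    containing the ids of the correctly identified test images."""
    n_correct = Counter()
    for correct_ids in run_results:
        n_correct.update(correct_ids)
    return sorted(test_images, key=lambda im: n_correct[im], reverse=True)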
On the other hand, 260 images were not identified by any run, with a majority of unconstrained photos (63 scans, 27 scan-like photos, 168 unconstrained photographs). The scans and scan-like photos belonging to this category of most difficult images are very interesting: as illustrated in Figure 23, which displays 8 of them, most correspond to outlier leaves with defects or unusual morphology (ill or torn leaves, missing leaflets, etc.).

Fig. 23. Top: 4 of the most difficult test scans (0/20 successful runs). Bottom: 4 of the most difficult scan-like test photos (0/20 successful runs)

6 Conclusions

This paper presented the overview and the results of the ImageCLEF 2011 plant identification testbed. A challenging collaborative dataset of tree leaf images was specifically built for this evaluation, and 8 participants coming from different countries submitted a total of 20 runs. A first conclusion is that identification performances are close to maturity when using scans or photos with a uniform background, but that unconstrained photos are still much more challenging; more data and evaluations are clearly required to progress on such data. Another important conclusion is that state-of-the-art methods based on shape boundary analysis were not the best ones on leaf scans: better performances were notably obtained with a local-feature matching technique usually dedicated to the large-scale retrieval of rigid objects. On the other hand, shape boundary analysis methods remain better on scan-like photos, due to their better robustness to light reflections and shadow effects. This suggests that combining shape boundary features with part-based rigid object models might be an interesting direction. Adding texture and color information also showed some improvements. On the contrary, using additional metadata such as geo-tags was not conclusive on the evaluated dataset, probably because the geographic spread of the data was limited.

Acknowledgement

This work was funded by the Agropolis Fondation through the project Pl@ntNet (http://www.plantnet-project.org/) and by the EU through the CHORUS+ Coordination Action (http://avmediasearch.eu/).

References

1. Agarwal, G., Belhumeur, P., Feiner, S., Jacobs, D., Kress, J.W., Ramamoorthi, R., Bourg, N., Dixit, N., Ling, H., Mahajan, D., Russell, R., Shirdhonkar, S., Sunkavalli, K., White, S.: First steps toward an electronic field guide for plants. Taxon 55, 597–610 (2006)
2. Albert, R.Z.: Statistical mechanics of complex networks. Ph.D. thesis, Notre Dame, IN, USA (2001)
3. Backes, A.R., Casanova, D., Bruno, O.M.: A complex network-based approach for boundary shape analysis. Pattern Recognition 42(1), 54–67 (2009)
4. Belhumeur, P., Chen, D., Feiner, S., Jacobs, D., Kress, W., Ling, H., Lopez, I., Ramamoorthi, R., Sheorey, S., White, S., Zhang, L.: Searching the world's herbaria: A system for visual identification of plant species. In: ECCV, pp. 116–129 (2008)
5. Bruno, O.M., de Oliveira Plotze, R., Falvo, M., de Castro, M.: Fractal dimension applied to plant identification. Information Sciences 178(12), 2722–2733 (2008)
6. Casanova, D., Florindo, J.B., Bruno, O.M.: IFSC/USP at ImageCLEF 2011: Plant identification task. In: Working notes of CLEF 2011 conference (2011)
7. Cerutti, G., Tougne, L., Mille, J., Vacavant, A., Coquin, D.: Guiding active contours for tree leaf segmentation and identification. In: Working notes of CLEF 2011 conference (2011)
8. Ellis, B., Daly, D.C., Hickey, L.J., Johnson, K.R., Mitchell, J.D., Wilf, P., Wing, S.L.: Manual of Leaf Architecture. Comstock Publishing Associates (2009)
9. Goëau, H., Joly, A., Selmi, S., Bonnet, P., Mouysset, E., Joyeux, L.: Visual-based plant species identification from crowdsourced data. In: Proceedings of ACM Multimedia 2011 (2011)
10. Goëau, H., Joly, A., Yahiaoui, I., Bonnet, P., Mouysset, E.: Participation of INRIA & Pl@ntNet to ImageCLEF 2011 plant images classification task. In: Working notes of CLEF 2011 conference (2011)
11. Hamid, R.A., Thom, J.A.: RMIT at ImageCLEF 2011 plant identification. In: Working notes of CLEF 2011 conference (2011)
12. Neto, J.C., Meyer, G.E., Jones, D.D., Samal, A.K.: Plant species identification using elliptic Fourier leaf shape analysis. Computers and Electronics in Agriculture 50(2), 121–134 (2006)
13. Söderkvist, O.J.O.: Computer Vision Classification of Leaves from Swedish Trees. Master's thesis, Linköping University, SE-581 83 Linköping, Sweden (September 2001), LiTH-ISY-EX-3132
14. Villena-Román, J., Lana-Serrano, S., González-Cristóbal, J.C.: DAEDALUS at ImageCLEF 2011 plant identification task: Using SIFT keypoints for object detection. In: Working notes of CLEF 2011 conference (2011)
15. Yahiaoui, I., Hervé, N., Boujemaa, N.: Shape-based image retrieval in botanical collections. In: Advances in Multimedia Information Processing - PCM 2006, vol. 4261, pp. 357–364 (2006)
16. Yanikoglu, B., Aptoula, E., Tirkaz, C.: Sabanci-Okan system at ImageCLEF 2011: Plant identification task. In: Working notes of CLEF 2011 conference (2011)