
Glasgow, UK, April

LifeCLEF: Multimedia Life Species Identification

Alexis Joly

alexis.joly@inria.fr 3

Hervé Glotin

glotin@univ-tln.fr 4

Pierre Bonnet

pierre.bonnet@cirad.fr 0

Henning Müller

Henning.Mueller@hevs.ch 2

Concetto Spampinato

cspampin@dieei.unict.it 5

Willem-Pier Vellinga

wp@xeno-canto.org 7

Hervé Goëau

herve.goeau@inria.fr 3

Andreas Rauber

rauber@ifs.tuwien.ac.at 6

Bob Fisher

rbf@inf.ed.ac.uk 1

0 CIRAD, France
1 Edinburgh Univ., UK
2 HES-SO, Switzerland
3 INRIA, France
4 IUF & Univ. de Toulon, France
5 University of Catania, Italy
6 Vienna Univ. of Tech., Austria
7 Xeno-canto foundation, The Netherlands

2014


Building accurate knowledge of the identity, the geographic distribution and the evolution of living species is essential for the sustainable development of humanity as well as for biodiversity conservation. In this context, using multimedia identification tools is considered one of the most promising solutions to help bridge the taxonomic gap. With the recent advances in digital devices/equipment, network bandwidth and information storage capacities, the production of multimedia big data has indeed become an easy task. In parallel, the emergence of citizen sciences and social networking tools has fostered the creation of large and structured communities of nature observers (e.g. eBird, Xeno-canto, Tela Botanica, etc.) that have started to produce outstanding collections of multimedia records. Unfortunately, the performance of state-of-the-art multimedia analysis techniques on such data is still not well understood and is far from meeting real-world requirements for identification tools. The LifeCLEF lab proposes to evaluate these challenges around 3 tasks related to multimedia information retrieval and fine-grained classification problems in 3 living worlds. Each task is based on large, real-world data and the measured challenges are defined in collaboration with biologists and environmental stakeholders in order to reflect realistic usage scenarios.

1. LifeCLEF LAB OVERVIEW

1.1 Motivations

Building accurate knowledge of the identity, the geographic distribution and the evolution of living species is essential for the sustainable development of humanity as well as for biodiversity conservation. Unfortunately, such basic information is often only partially available to professional stakeholders, teachers, scientists and citizens, and is more often incomplete for the ecosystems that possess the highest diversity, such as tropical regions. A noticeable cause and consequence of this sparse knowledge is that identifying living plants or animals is usually impossible for the general public, and often a difficult task for professionals such as farmers, fish farmers or foresters, and even for naturalists and specialists themselves. This taxonomic gap [32] was actually identified as one of the main ecological challenges to be solved during the United Nations Conference held in Rio in 1992.

In this context, using multimedia identification tools is considered one of the most promising solutions to help bridge the taxonomic gap [23, 11, 8, 31, 28, 1, 30, 18]. With the recent advances in digital devices, network bandwidth and information storage capacities, the production of multimedia data has indeed become an easy task. In parallel, the emergence of citizen sciences and social networking tools has fostered the creation of large and structured communities of nature observers (e.g. eBird1, Xeno-canto2, Tela Botanica3, etc.) that have started to produce outstanding collections of multimedia records. Unfortunately, the performance of state-of-the-art multimedia analysis techniques on such data is still not well understood and is far from meeting real-world requirements for identification tools [18]. Most existing studies or available tools typically identify a few tens or hundreds of species with moderate accuracy, whereas they should be scaled up by one, two or three orders of magnitude in terms of number of species (the total number of living species on earth is estimated to be around 10K for birds, 30K for fishes, 300K for plants and more than 1.2M for invertebrates [6]).

1.2 Evaluated Tasks

The LifeCLEF lab proposes to evaluate these challenges in continuity with the image-based plant identification task [19] that was run within the ImageCLEF lab during the last three years with an increasing number of participants. It however radically enlarges the evaluated challenge towards multimodal data by (i) considering birds and fish in addition to plants, (ii) considering audio and video content in addition to images, and (iii) scaling up the evaluation data to hundreds of thousands of life media records and thousands of living species. More concretely, the lab is organized around three tasks:

PlantCLEF: an image-based plant identification task
BirdCLEF: an audio-based bird identification task
FishCLEF: a video-based fish identification task

As described in more detail in the following sections, each task is based on big, real-world data and the measured challenges are defined in collaboration with biologists and environmental stakeholders so as to reflect realistic usage scenarios. For this pilot year, the three tasks are mainly concerned with species identification, i.e., helping users to retrieve the taxonomic name of an observed living plant or animal. Taxonomic names are actually the primary key used to organize living species and to access all available information about them, whether on the web, in herbariums, in scientific literature, books or magazines, etc. Identifying the taxon observed in a given multimedia record and aligning its name with a taxonomic reference is therefore a key step before any other indexing or information retrieval task. More focused or complex challenges (such as detecting species duplicates or ambiguous species) could be evaluated in coming years.

1 http://ebird.org/
2 http://www.xeno-canto.org/
3 http://www.tela-botanica.org/

The three tasks are primarily focused on content-based approaches (i.e. on the automatic analysis of the audio and visual signals) rather than on interactive information retrieval approaches involving textual or graphical morphological attributes. The content-based approach to life species identification has several advantages. First, it is intrinsically language-independent and solves many of the multilingual issues related to the use of classical text-based morphological keys, which are strongly language-dependent and understandable only by a few experts in the world. Furthermore, an expert in one region or one specific taxonomic group does not necessarily know the vocabulary dedicated to another group of living organisms. A content-based approach can therefore be generalized much more easily to new floras or faunas, contrary to knowledge-based approaches that require building complex models manually (ontologies with rich descriptions, graphical illustrations of morphological attributes, etc.). On the other hand, the LifeCLEF lab is inherently cross-modal through the presence of contextual and social data associated with the visual and audio content. This includes geo-tags or location names, time information, author names, collaborative ratings or comments, vernacular names (common names of plants or animals), organ or picture type tags, etc. The rules regarding the use of this meta-data in the evaluated identification methods will be specified in the description of each task. Overall, these rules are always designed so as to reflect real possible usage scenarios while offering the largest diversity in the affordable approaches.

1.3 Expected Outcomes

The main expected outcomes of the LifeCLEF evaluation campaign are the following:

- give a snapshot of the performance of state-of-the-art multimedia techniques towards building real-world life species identification systems
- provide large and original data sets of biological records, and thereby allow comparison of multimedia-based identification techniques
- boost research and innovation on this topic in the next few years and encourage multimedia researchers to work on trans-disciplinary challenges involving ecological and environmental data
- foster technological ports from one domain to another and exchanges between the different communities (information retrieval, computer vision, bioacoustics, machine learning, etc.)
- promote citizen science and nature observation as a way to describe, analyse and preserve biodiversity

2. TASK1: PlantCLEF

2.1 Context

Content-based image retrieval approaches are nowadays considered to be one of the most promising solutions to help bridge the botanical taxonomic gap, as discussed in [14] or [22] for instance. We therefore see an increasing interest in this trans-disciplinary challenge in the multimedia community (e.g. in [16, 9, 21, 24, 17, 4]). Beyond the raw identification performance achievable by state-of-the-art computer vision algorithms, the visual search approach offers much more efficient and interactive ways of browsing large floras than standard field guides or online web catalogs. Smartphone applications relying on such image-based identification services are particularly promising for setting up massive ecological monitoring systems, involving hundreds of thousands of contributors at a very low cost. The first noticeable progress in this direction was achieved by the US consortium at the origin of LeafSnap4. This popular iPhone application allows a fair identification of 185 common American plant species by simply shooting a cut leaf on a uniform background (see [22] for more details). A further step was achieved recently by the Pl@ntNet project [18], which released a cross-platform application (iPhone [13], Android5 and web6) allowing users (i) to query the system with pictures of plants in their natural environment and (ii) to contribute to the dataset thanks to a collaborative data validation workflow involving Tela Botanica7 (i.e. the largest botanical social network in Europe).

As promising as these applications are, their performance is however still far from the requirements of a real-world social-based ecological surveillance scenario. Allowing the mass of citizens to produce accurate plant observations requires equipping them with much more accurate identification tools. Measuring and boosting the performance of content-based identification tools is therefore crucial. This was precisely the goal of the ImageCLEF8 plant identification task organized since 2011 in the context of the worldwide evaluation forum CLEF9. In 2011, 2012 and 2013, respectively 8, 10 and 12 international research groups crossed the finish line of this large collaborative evaluation by benchmarking their image-based plant identification systems (see [14], [15] and [19] for more details). The data mobilised during these first 3 years can be consulted at the following URL10, and the geographic distribution of these botanical records is shown in Figure 1.

4 http://leafsnap.com/
5 https://play.google.com/store/apps/details?id=org.plantnet&hl=fr
6 http://identify.plantnet-project.org/
7 http://www.tela-botanica.org/
8 http://www.imageclef.org/
9 http://www.clef-initiative.eu/

Contrary to previous evaluations reported in the literature, the key objective was to build a realistic task closer to real-world conditions (different users, cameras, areas, periods of the year, individual plants, etc.). This was initially achieved through a citizen science initiative started 4 years ago in the context of the Pl@ntNet project in order to boost the image production of the Tela Botanica social network. The evaluation data was enriched each year with new contributions and progressively diversified with other input feeds (annotation and cleaning of older data, contributions made through the Pl@ntNet mobile applications). The plant task of LifeCLEF 2014 directly continues this effort. The main novelties compared to previous years are the following: (i) an explicit multi-image query scenario, (ii) the supply of user ratings on image quality in the meta-data, (iii) a new type of view called "Branch" in addition to the 6 previous ones, and (iv) more species (about 500, which is an important step towards covering the entire flora of a given region).

2.2 Dataset

More precisely, the PlantCLEF 2014 dataset is composed of 60,962 pictures belonging to 19,504 observations of 500 species of trees, herbs and ferns living in a European region centered around France. This data was collected by 1608 distinct contributors. Each picture belongs to one and only one of the 7 types of view reported in the meta-data (entire plant, fruit, leaf, flower, stem, branch, leaf scan) and is associated with a single plant observation identifier that links it with the other pictures of the same individual plant (observed the same day by the same person). It is noticeable that most image-based identification methods and evaluation data proposed in the past were based on leaf images (e.g. in [22, 5, 9] or in the more recent methods evaluated in [15]). Only a few of them focused on flower images, as in [25] or [3]. Leaves are far from being the only discriminant visual key between species but, due to their shape and size, they have the advantage of being easily observed, captured and described. More diverse parts of the plants however have to be considered for accurate identification. As an example, the 6 species depicted in Figure 2 share the same French common name of "laurier" even though they belong to different taxonomic groups (4 families, 6 genera).

10 http://publish.plantnet-project.org/project/plantclef

The main reason is that these shrubs, often used in hedges, have leaves of more or less the same size and elliptic shape. Identifying a laurel can be very difficult for a novice by just observing leaves, while it is indisputably easier with flowers. Beyond identification performance, the use of leaves alone also has some practical and botanical limitations. Leaves are not visible all year round for a large fraction of plant species. Deciduous species, distributed from temperate to tropical regions, cannot be identified from their leaves during some periods of the year. Leaves can be absent (i.e. leafless species), too young or too degraded (by pathogen or insect attacks) to be exploited efficiently. Moreover, the leaves of many species are intrinsically not informative enough or very difficult to capture (needles of pines, thin leaves of grasses, huge leaves of palms, ...).

Another original feature of the PlantCLEF dataset is that its social nature makes it closer to the conditions of a real-world identification scenario: (i) images of the same species come from distinct plants living in distinct areas, (ii) pictures are taken by different users who might not have used the same protocol to acquire the images, and (iii) pictures are taken at different periods of the year. Each image of the dataset is associated with contextual meta-data (author, date, locality name, plant id) and social data (user ratings on image quality, collaboratively validated taxon names, vernacular names) provided in a structured XML file. The GPS geo-localization and the device settings are available only for some of the images. Table 3 gives some examples of pictures with decreasing average user ratings for the different types of views. Note that the users of the specialized social network creating these ratings (Tela Botanica) are explicitly asked to rate the images according to their usefulness for plant identification and their accordance with the pre-defined acquisition protocol for each view type. This is not an aesthetic or general interest judgement as in most social image sharing sites.

2.3 Task Description

The task will be evaluated as a plant species retrieval task based on multi-image plant observation queries. The goal is to retrieve the correct plant species among the top results of a ranked list of species returned by the evaluated system. Contrary to previous plant identification benchmarks, queries are not defined as single images but as plant observations, meaning a set of one to several images depicting the same individual plant, observed by the same person on the same day. Each image of a query observation is associated with a single view type (entire plant, branch, leaf, fruit, flower, stem or leaf scan) and with contextual meta-data (date, location, author). Each participating group is allowed to submit up to 4 runs built from different methods. Semi-supervised and interactive approaches (particularly for segmenting parts of the plant from the background) are allowed but will be compared independently from fully automatic methods. Any human assistance in the processing of the test queries therefore has to be signaled in the meta-data of the submitted runs.

In practice, the whole PlantCLEF dataset is split into two parts, one for training (and/or indexing) and one for testing. The test set was built by randomly choosing 1/3 of the observations of each species, whereas the remaining observations were kept in the reference training set. The XML files containing the meta-data of the query images were purged so as to erase the taxon name (the ground truth), the vernacular name (common name of the plant) and the image quality ratings (which would not be available at query stage in a real-world mobile application). The meta-data of the observations in the training set is kept unaltered.
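The per-species split described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the input mapping, the fixed seed and the rounding of the 1/3 fraction are hypothetical choices, not details given by the organizers.

```python
import random
from collections import defaultdict


def split_observations(observations, test_fraction=1 / 3, seed=0):
    """Split observation ids into (train, test) sets, species by species.

    `observations` maps an observation id to its species name (a
    hypothetical input format chosen for this sketch).
    """
    by_species = defaultdict(list)
    for obs_id, species in observations.items():
        by_species[species].append(obs_id)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = set(), set()
    for species in sorted(by_species):
        obs_ids = sorted(by_species[species])
        rng.shuffle(obs_ids)
        # Assumption: the fraction is rounded to the nearest integer, so a
        # species with a single observation stays entirely in training.
        n_test = round(len(obs_ids) * test_fraction)
        test.update(obs_ids[:n_test])
        train.update(obs_ids[n_test:])
    return train, test
```

Because whole observations are assigned to one side or the other, all pictures of the same individual plant end up in the same set.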

The metric used to evaluate the submitted runs will be a score related to the rank of the correct species in the returned list. Each query observation will be assigned a score between 0 and 1 equal to the inverse of the rank of the correct species (equal to 1 if the correct species is ranked first, and decreasing quickly as the rank of the correct species increases). An average score will then be computed across all plant observation queries. A simple mean over all test plant observations would however introduce some bias. Indeed, the PlantCLEF dataset was built in a collaborative manner, so a few contributors might have provided many more observations and pictures than the many other contributors who provided few. Since we want to evaluate the ability of a system to provide the correct answers to all users, we rather measure the mean of the average classification rate per author. Finally, our primary metric is defined as the following average classification score S:

\[
S = \frac{1}{U} \sum_{u=1}^{U} \frac{1}{P_u} \sum_{p=1}^{P_u} \frac{1}{N_{u,p}} \, s_{u,p}
\]

where $U$ is the number of users, $P_u$ the number of individual plants observed by the $u$-th user, $N_{u,p}$ the number of pictures of the $p$-th plant observation of the $u$-th user, and $s_{u,p}$ the score between 0 and 1 equal to the inverse of the rank of the correct species.
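As a minimal sketch, the score S could be computed as below. The code follows the formula as printed, including the 1/N factor per observation; the input format, the function name and the score of 0 for a species missing from the returned list are assumptions made for illustration.

```python
from collections import defaultdict


def classification_score(queries):
    """Average score S: mean over users of each user's mean observation score.

    `queries` is an iterable of (user, rank, n_pictures) tuples, one per
    test observation, where `rank` is the 1-based rank of the correct
    species in the returned list (None if it was not returned; scoring
    that case as 0 is an assumption of this sketch).
    """
    per_user = defaultdict(list)
    for user, rank, n_pictures in queries:
        s = 0.0 if rank is None else 1.0 / rank  # inverse-rank score s_{u,p}
        per_user[user].append(s / n_pictures)    # (1 / N_{u,p}) * s_{u,p}
    # Average within each user first, then across users.
    user_means = [sum(v) / len(v) for v in per_user.values()]
    return sum(user_means) / len(user_means)
```

Averaging per user before averaging across users is what keeps a few very prolific contributors from dominating the final score.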

3. TASK2: BirdCLEF

3.1 Context

The bird and plant identification tasks share similar usage scenarios. The general public as well as professionals such as park rangers, ecology consultants and, of course, ornithologists themselves might be users of an automated bird identification system, typically in the context of wider initiatives related to ecological surveillance or biodiversity conservation. Using audio records rather than bird pictures is justified by current practices [8, 31, 30, 7]. Birds are not easy to photograph, as they are most of the time hidden, perched high in a tree or frightened by human presence, and they can fly very quickly, whereas audio calls and songs have proved to be easier to collect and very discriminant.

Only three noticeable previous initiatives on bird species identification based on songs or calls have taken place in the context of worldwide evaluation, all in 2013. The first one was the ICML4B bird challenge, held jointly with the International Conference on Machine Learning in Atlanta in June 2013. It was initiated by the SABIOD MASTODONS CNRS group11, the University of Toulon and the National Natural History Museum of Paris [12]. It included 35 species, and 76 participants submitted their 400 runs on the Kaggle interface. The second challenge was conducted by F. Briggs at the MLSP 2013 workshop in August 2013, with 15 species and 79 participants. The third challenge, and the biggest in 2013, was organised by the University of Toulon, SABIOD and Biotope, with 80 species from Provence, France. More than thirty teams participated, reaching 92% average AUC. The descriptions of the best ICML4B systems are given in the on-line book [2], including, for some of them, references to useful scripts.

In collaboration with the organizers of these previous challenges, BirdCLEF 2014 goes one step further by (i) increasing the number of species by almost an order of magnitude, (ii) working on real-world social data built from hundreds of recordists, and (iii) moving to a more usage-driven and system-oriented benchmark by allowing the use of meta-data and defining information retrieval oriented metrics. Overall, the task is expected to be much more difficult than previous benchmarks because of the higher confusion risk between the classes, the higher background noise and the higher diversity in the acquisition conditions (devices, recordists' practices, diversity of contexts, etc.). It will therefore probably produce substantially lower scores and offer a better progression margin towards building real-world generalist identification tools.

11 http://sabiod.org

3.2 Dataset

The training and test data of the bird task consists of audio recordings collected by Xeno-canto (XC)12. Xeno-canto is a web-based community of bird sound recordists worldwide with about 1500 active contributors who have already collected more than 150,000 recordings of about 9000 species. Nearly 500 species from Brazilian forests are used in the BirdCLEF dataset, representing the 500 species of that region with the highest number of recordings, totalling about 14,000 recordings produced by hundreds of users. Figure 4 illustrates the geographical distribution of the dataset samples.

To avoid any bias in the evaluation related to the audio devices used, each audio file has been normalized to a constant bandwidth of 44.1 kHz and coded over 16 bits in wav mono format (the right channel is selected by default). The conversion from the original Xeno-canto data set was done using ffmpeg, sox and matlab scripts. The optimized 16 Mel Filter Cepstrum Coefficients for bird identification (according to an extended benchmark [10]) have been computed with their first and second temporal derivatives on the whole set. They were used in the best systems run in the ICML4B and NIPS4B challenges.
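To illustrate what appending first and second temporal derivatives to cepstral frames means, here is a small pure-Python sketch. It is not the organizers' matlab code: the central-difference formula and the clamping at the sequence edges are assumptions, and the function names are hypothetical.

```python
def temporal_derivative(frames):
    """Central-difference derivative along time for a list of feature frames.

    Each frame is a list of floats; neighbours are clamped at both ends
    of the sequence (an edge-handling choice made for this sketch).
    """
    n = len(frames)
    deltas = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        deltas.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return deltas


def add_derivatives(frames):
    """Concatenate each frame with its first and second temporal derivatives."""
    d1 = temporal_derivative(frames)
    d2 = temporal_derivative(d1)
    return [f + a + b for f, a, b in zip(frames, d1, d2)]
```

With 16 coefficients per frame, this layout yields 48 values per frame.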

Audio records are associated with various meta-data including the species of the most active singing bird, the species of the other birds audible in the background, the type of sound (call, song, alarm, flight, etc.), the date and location of the observations (from which rich statistics on species distribution can be derived), some textual comments by the authors, multilingual common names and collaborative quality ratings. All of it was produced collaboratively by the Xeno-canto community.

3.3 Task Description

Participants are asked to determine the species of the most active singing bird in each query file. The background noise can be used like any other meta-data, but it is forbidden to correlate the test set of the challenge with the original annotated Xeno-canto database (or with any external content, as many copies are circulating on the web). More precisely, and similarly to the plant task, the whole BirdCLEF dataset has been split into two parts, one for training (and/or indexing) and one for testing. The test set was built by randomly choosing 1/3 of the observations of each species, whereas the remaining observations were kept in the reference training set. Recordings of the same species made by the same person on the same day are considered part of the same observation and cannot be split across the test and training sets. The XML files containing the meta-data of the query recordings were purged so as to erase the taxon name (the ground truth), the vernacular name (common name of the bird) and the collaborative quality ratings (which would not be available at query stage in a real-world mobile application). The meta-data of the recordings in the training set is kept unaltered.
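The grouping rule above (same species, same person, same day) can be sketched as a simple keyed grouping, after which the split is performed on whole groups. The record format and key names below are hypothetical and only serve the illustration.

```python
from collections import defaultdict


def group_into_observations(recordings):
    """Group recording ids by (species, author, date).

    `recordings` is an iterable of dicts with the hypothetical keys
    'id', 'species', 'author' and 'date'. Each resulting group is one
    observation and must go entirely to training or entirely to testing.
    """
    observations = defaultdict(list)
    for rec in recordings:
        key = (rec["species"], rec["author"], rec["date"])
        observations[key].append(rec["id"])
    return dict(observations)
```

Splitting on these groups rather than on individual recordings avoids near-duplicate recordings of the same bird appearing on both sides of the split.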

The groups participating in the task will be asked to produce up to 4 runs containing a ranked list of the most probable species for each query record of the test set. Each species will have to be associated with a normalized score in the range [0, 1] reflecting the likelihood that this species is singing in the sample. The primary metric used to compare the runs will be the Mean Average Precision averaged across all queries. Additionally, to allow easy comparisons with the previous Kaggle ICML4B and NIPS4B benchmarks, the area under the ROC curve (AUC) will be computed for each species and averaged over all species.
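For readers unfamiliar with these two measures, the sketch below shows one common way to compute them. It is not the official evaluation script: the function names and input formats are assumptions, each ranked list is assumed to contain no duplicate species, and the AUC uses the pairwise (rank-sum) formulation with ties counted as one half.

```python
def average_precision(ranked_species, relevant):
    """Average precision of one ranked list given the set of correct species."""
    hits, total = 0, 0.0
    for rank, species in enumerate(ranked_species, start=1):
        if species in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """`runs` is a list of (ranked_species, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def auc(scores, labels):
    """Area under the ROC curve for one species.

    `scores` are the likelihoods returned for that species over all
    queries and `labels` tell whether the species is actually present.
    Both classes must be represented.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

When a query has a single correct species, its average precision reduces to the inverse of the rank of that species.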

4. TASK3: FishCLEF

4.1 Context

Underwater video monitoring has been widely used in recent years for marine video surveillance, as opposed to human-manned photography or net-casting methods, since it does not influence fish behavior and provides a large amount of material at the same time. However, it is impractical for humans to manually analyze the massive quantity of video data generated daily, because it requires much time and concentration and is also error-prone. Automatic fish identification in videos is therefore of crucial importance in order to estimate fish existence and quantity [29, 28, 26]. Moreover, it would help marine biologists to understand the natural underwater environment, promote its preservation, and study the behaviors and interactions of the marine animals that are part of it. Beyond this, video-based fish species identification finds applications in many other contexts: from education (e.g. primary/high schools) to the entertainment industry (e.g. in aquariums).

To the best of our knowledge, this is the first worldwide initiative on automatic image and video based fish species identification.

4.2 Dataset

The underwater video dataset used for FishCLEF is derived from the Fish4Knowledge13 video repository, which contains about 700k 10-minute video clips taken over the past five years to monitor Taiwan coral reefs. The Taiwan area is particularly interesting for studying the marine ecosystem, as it holds one of the largest fish biodiversities in the world, with more than 3000 different fish species whose taxonomy is available online14. The dataset contains videos recorded from sunrise to sunset showing several phenomena, e.g. murky water, algae on the camera lens, etc., which make the fish identification task more complex. Each video has a resolution of 320x240 at 8 fps and comes with some additional meta-data including the date and location of the recordings. Figure 5 shows 4 snapshots from 4 cameras monitoring the coral reef at Taiwan's Kenting site and illustrates the complexity of automatic fish detection and recognition in real-life settings.

More specifically, the FishCLEF dataset consists of about 3000 videos with several thousand detected fish. The fish detections were obtained by processing these underwater videos with video analysis tools [27] and were then manually labeled using the system in [20].

4.3 Task Description

The dataset for the video-based fish identification task will be released in two stages: the participants will first have access to the training set and, a few months later, they will be provided with the testing set. The goal is to automatically detect fish and identify their species. The task comprises three sub-tasks: 1) identifying moving objects in videos by either background modeling or object detection methods, 2) detecting fish instances in video frames, and then 3) identifying the species (taken from a subset of the most frequently seen fish species) of the fish detected in video frames.

Participants may compete in a single subtask or in all of them. Although subtasks 2 and 3 are based on still images, participants are invited to exploit motion information extracted from the videos to support their strategies.

As scoring functions, the authors are asked to produce:

- ROC curves for sub-task one. In particular, the precision, recall and F-measure obtained when comparing, on a pixel basis, the ground truth binary masks and the output masks of the object detection methods are required;
- Recall for fish detection in still images as a function of bounding box overlap percentage: a detection is considered a true positive if the PASCAL score between it and the corresponding object in the ground truth is over 0.7;
- Average recall and recall per fish species for the fish recognition subtask.

13 www.fish4knowledge.eu
14 http://fishdb.sinica.edu.tw/
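The pixel-wise measures and the overlap criterion listed above can be sketched as follows. The PASCAL score is taken here to be the usual intersection-over-union ratio of two bounding boxes; the box format (continuous corner coordinates), the mask format (nested lists of 0/1) and the function names are assumptions made for this illustration.

```python
def pascal_overlap(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(detection, ground_truth, threshold=0.7):
    """A detection counts as correct when its overlap exceeds the threshold."""
    return pascal_overlap(detection, ground_truth) > threshold


def pixel_scores(truth_mask, output_mask):
    """Pixel-wise precision, recall and F-measure of two binary masks."""
    pairs = [(t, o)
             for t_row, o_row in zip(truth_mask, output_mask)
             for t, o in zip(t_row, o_row)]
    tp = sum(1 for t, o in pairs if t and o)
    fp = sum(1 for t, o in pairs if not t and o)
    fn = sum(1 for t, o in pairs if t and not o)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f
```

Sweeping the `threshold` argument over a range of values gives recall as a function of bounding box overlap.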

The participants in the above tasks will be asked to produce several runs containing a list of detected fish together with their species (only for subtask 3). For fish species identification, a ranked list of the most probable species (and the related likelihood values) must be provided for each detected fish.

5. SCHEDULE AND PERSPECTIVES

LifeCLEF 2014 registration opened in December 2013 and will close at the end of April 2014. At the time of writing, 61 research groups have already registered for at least one of the three tasks, and this number will continue growing. As in any evaluation campaign, many of the registered groups won't cross the finish line by submitting official runs, but this at least reflects their interest in LifeCLEF data and the related challenges. The schedule of the ongoing and remaining steps of the LifeCLEF 2014 campaign is the following:

ADDITIONAL AUTHORS

Robert Planque (Xeno-Canto, The Netherlands, email: r.planque@vu.nl)

[1] MAED '12: Proceedings of the 1st ACM International Workshop on Multimedia Analysis for Ecological Data, New York, NY, USA, 2012. ACM. 433127.

[2] Proc. of the first workshop on Machine Learning for Bioacoustics, 2013.

[3] Angelova, Zhu, Lin, Wong, and Shpecht. Development and deployment of a large-scale flower recognition mobile app, December 2012.

[4] Aptoula and Yanikoglu. Morphological features for leaf based plant recognition. In Proc. IEEE Int. Conf. Image Process., Melbourne, Australia, page 7, 2013.

[5] A. R. Backes, Casanova, and O. M. Bruno. Plant leaf identification based on volumetric fractal dimension. International Journal of Pattern Recognition and Artificial Intelligence, 23(6):1145-1160, 2009.

[6] J. E. M. Baillie, C. Hilton-Taylor, and S. N. Stuart. 2004 IUCN Red List of Threatened Species. A Global Species Assessment. IUCN, Gland, Switzerland and Cambridge, UK, 2004.

[7] Briggs, Lakshminarayanan, Neal, X. Z. Fern, Raich, S. J. Hadley, A. S. Hadley, and M. G. Betts. Acoustic classification of multiple simultaneous bird species: A multi-instance multi-label approach. The Journal of the Acoustical Society of America, 131:4640, 2012.

[8] Cai, Ee, Pham, Roe, and J. Zhang. Sensor network for the monitoring of ecosystem: Bird species recognition. In Intelligent Sensors, Sensor Networks and Information, 2007. ISSNIP 2007. 3rd International Conference on, pages 293-298, Dec 2007.

[9] Cerutti, Tougne, Vacavant, and Coquin. A parametric active polygon for leaf segmentation and shape estimation. In International Symposium on Visual Computing, pages 202-213, 2011.

[10] Dufour, Artieres, H. Glotin, and Giraudet. Clusterized mel filter cepstral coefficients and support vector machines for bird song identification. 2013.

[11] K. J. Gaston and M. A. O'Neill. Automated species identification: why not? Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 359(1444):655-667, 2004.

[12] Glotin and Sueur. Overview of the first international challenge on bird classification. 2013.

[13] Goëau, Bonnet, Joly, Bakic, Barbe, I. Yahiaoui, Selmi, Carre, Barthelemy, Boujemaa, et al. Pl@ntNet mobile app. In Proceedings of the 21st ACM international conference on Multimedia, pages 423-424. ACM, 2013.

[14] Goëau, Bonnet, Joly, Boujemaa, Barthelemy, J.-F. Molino, Birnbaum, E. Mouysset, and Picard. The ImageCLEF 2011 plant images classification task. In CLEF working notes, 2011.

[15] Goëau, Bonnet, Joly, Yahiaoui, Barthelemy, Boujemaa, and J.-F. Molino. The ImageCLEF 2012 plant identification task. In CLEF working notes, 2012.

[16] Goëau, A. Joly, Selmi, Bonnet, Mouysset, Joyeux, J.-F. Molino, Birnbaum, Barthelemy, and Boujemaa. Visual-based plant species identification from crowdsourced data. In ACM conference on Multimedia, pages 813–814, 2011.

[17] Hazra, Deb, Kundu, Hazra, et al. Shape oriented feature selection for tomato plant identification. International Journal of Computer Applications Technology and Research, 2(4):449, 2013.

[18] Joly, Goëau, Bonnet, Bakic, Barbe, Selmi, I. Yahiaoui, Carre, Mouysset, J.-F. Molino, Boujemaa, and Barthelemy. Interactive plant identification based on social image data. Ecological Informatics, 2013.

[19] Joly, H. Goëau, Bonnet, Bakic, J.-F. Molino, Barthelemy, and Boujemaa. The ImageCLEF plant identification task 2013. In International workshop on Multimedia analysis for ecological data, Barcelona, Spain, Oct. 2013.

[20] Kavasidis, Palazzo, Salvo, Giordano, and Spampinato. An innovative web-based collaborative platform for video annotation. Multimedia Tools and Applications, pages 1–20, 2013.

[21] Kebapci, Yanikoglu, and Unal. Plant image retrieval using color, shape and texture features. The Computer Journal, 54(9):1475–1490, 2011.

[22] Kumar, P. N. Belhumeur, Biswas, D. W. Jacobs, W. J. Kress, I. C. Lopez, and J. V. B. Soares. Leafsnap: A computer vision system for automatic plant species identification. In European Conference on Computer Vision, pages 502–516, 2012.

[23] D.-J. Lee, R. B. Schoenberger, D. Shiozawa, X. Xu, and P. Zhan. Contour matching for a fish recognition and migration-monitoring system. In Optics East, pages 37–48. International Society for Optics and Photonics, 2004.

[24] Mouine, I. Yahiaoui, and Verroust-Blondet. Advanced shape context for plant species identification using leaf image retrieval. In ACM International Conference on Multimedia Retrieval, pages 49:1–49:8, 2012.

[25] M.-E. Nilsback and A. Zisserman. Automated flower classification over a large number of classes. In Indian Conference on Computer Vision, Graphics and Image Processing, pages 722–729, 2008.

[26] M. R. Shortis, Ravanbakskh, Shaifat, E. S. Harvey, Mian, J. W. Seager, P. F. Culverhouse, D. E. Cline, and D. R. Edgington. A review of techniques for the identification and measurement of fish in underwater stereo-video image sequences. In SPIE Optical Metrology 2013, pages 87910G–87910G. International Society for Optics and Photonics, 2013.

[27] Spampinato, Beauxis-Aussalet, Palazzo, Beyan, Ossenbruggen, He, Boom, and Huang. A rule-based event detection system for real-life underwater domain. Machine Vision and Applications, 25(1):99–117, 2014.

[28] Spampinato, Y.-H. Chen-Burger, Nadarajan, and R. B. Fisher. Detecting, tracking and counting fish in low quality unconstrained underwater videos. In VISAPP (2), pages 514–519. Citeseer, 2008.

[29] Spampinato, Giordano, R. Di Salvo, Y.-H. J. Chen-Burger, R. B. Fisher, and G. Nadarajan. Automatic fish classification for underwater species behavior understanding. In Proceedings of ACM ARTEMIS 2010, pages 45–50. ACM, 2010.

[30] Towsey, Planitz, Nantes, Wimmer, and Roe. A toolbox for animal call recognition. Bioacoustics, 21(2):107–125, 2012.

[31] V. M. Trifa, A. N. Kirschel, C. E. Taylor, and E. E. Vallejo. Automated species recognition of antbirds in a Mexican rainforest using hidden Markov models. The Journal of the Acoustical Society of America, 123:2424, 2008.

[32] Q. D. Wheeler, P. H. Raven, and E. O. Wilson. Taxonomy: Impediment or expedient? Science, 303(5656):285, 2004.