LifeCLEF: Multimedia Life Species Identification

Alexis Joly, INRIA, France (alexis.joly@inria.fr)
Henning Müller, HES-SO, Switzerland (Henning.Mueller@hevs.ch)
Hervé Goëau, INRIA, France (herve.goeau@inria.fr)
Hervé Glotin, IUF & Univ. de Toulon, France (glotin@univ-tln.fr)
Concetto Spampinato, University of Catania, Italy (cspampin@dieei.unict.it)
Andreas Rauber, Vienna Univ. of Tech., Austria (rauber@ifs.tuwien.ac.at)
Pierre Bonnet, CIRAD, France (pierre.bonnet@cirad.fr)
Willem-Pier Vellinga, Xeno-canto foundation, The Netherlands (wp@xeno-canto.org)
Bob Fisher, Edinburgh Univ., UK (rbf@inf.ed.ac.uk)

Copyright © by the paper's authors. Copying permitted only for private and academic purposes.
In: S. Vrochidis, K. Karatzas, A. Karpinnen, A. Joly (eds.): Proceedings of the International Workshop on Environmental Multimedia Retrieval (EMR 2014), Glasgow, UK, April 1, 2014, published at http://ceur-ws.org

ABSTRACT
Building accurate knowledge of the identity, the geographic distribution and the evolution of living species is essential for a sustainable development of humanity as well as for biodiversity conservation. In this context, using multimedia identification tools is considered one of the most promising solutions to help bridge the taxonomic gap. With the recent advances in digital devices and equipment, network bandwidth and information storage capacities, the production of multimedia big data has indeed become an easy task. In parallel, the emergence of citizen science and social networking tools has fostered the creation of large and structured communities of nature observers (e.g. eBird, Xeno-canto, Tela Botanica) that have started to produce outstanding collections of multimedia records. Unfortunately, the performance of state-of-the-art multimedia analysis techniques on such data is still not well understood and is far from reaching the real world's requirements in terms of identification tools. The LifeCLEF lab proposes to evaluate these challenges around three tasks related to multimedia information retrieval and fine-grained classification problems in three living worlds. Each task is based on large and real-world data, and the measured challenges are defined in collaboration with biologists and environmental stakeholders in order to reflect realistic usage scenarios.

1. LifeCLEF LAB OVERVIEW

1.1 Motivations
Building accurate knowledge of the identity, the geographic distribution and the evolution of living species is essential for a sustainable development of humanity as well as for biodiversity conservation. Unfortunately, such basic information is often only partially available for professional stakeholders, teachers, scientists and citizens, and it is most often incomplete for the ecosystems that possess the highest diversity, such as tropical regions. A noticeable cause and consequence of this sparse knowledge is that identifying living plants or animals is usually impossible for the general public, and often a difficult task for professionals such as farmers, fish farmers or foresters, and even for naturalists and specialists themselves. This taxonomic gap [32] was actually identified as one of the main ecological challenges to be solved during the Rio United Nations Conference in 1992.

In this context, using multimedia identification tools is considered one of the most promising solutions to help bridge the taxonomic gap [23, 11, 8, 31, 28, 1, 30, 18]. With the recent advances in digital devices, network bandwidth and information storage capacities, the production of multimedia data has indeed become an easy task. In parallel, the emergence of citizen science and social networking tools has fostered the creation of large and structured communities of nature observers (e.g. eBird (http://ebird.org/), Xeno-canto (http://www.xeno-canto.org/), Tela Botanica (http://www.tela-botanica.org/)) that have started to produce outstanding collections of multimedia records. Unfortunately, the performance of state-of-the-art multimedia analysis techniques on such data is still not well understood and is far from reaching the real world's requirements in terms of identification tools [18]. Most existing studies or available tools typically identify a few tens or hundreds of species with moderate accuracy, whereas they should be scaled up by one, two or three orders of magnitude in terms of number of species (the total number of living species on earth is estimated to be around 10K for birds, 30K for fishes, 300K for plants and more than 1.2M for invertebrates [6]).

1.2 Evaluated Tasks
The LifeCLEF lab proposes to evaluate these challenges in the continuity of the image-based plant identification task [19] that was run within the ImageCLEF lab during the last three years with an increasing number of participants. It however radically enlarges the evaluated challenge towards multimodal data by (i) considering birds and fish in addition to plants, (ii) considering audio and video content in addition to images, and (iii) scaling up the evaluation data to hundreds of thousands of life media records and thousands of living species. More concretely, the lab is organized around three tasks:
• PlantCLEF: an image-based plant identification task
• BirdCLEF: an audio-based bird identification task
• FishCLEF: a video-based fish identification task

As described in more detail in the following sections, each task is based on big and real-world data, and the measured challenges are defined in collaboration with biologists and environmental stakeholders so as to reflect realistic usage scenarios. For this pilot year, the three tasks are mainly concerned with species identification, i.e., helping users to retrieve the taxonomic name of an observed living plant or animal. Taxonomic names are actually the primary key to organize life species and to access all available information about them, either on the web, or in herbariums, scientific literature, books or magazines, etc. Identifying the taxon observed in a given multimedia record and aligning its name with a taxonomic reference is therefore a key step before any other indexing or information retrieval task. More focused or complex challenges (such as detecting species duplicates or ambiguous species) could be evaluated in coming years.

The three tasks are primarily focused on content-based approaches (i.e. on the automatic analysis of the audio and visual signals) rather than on interactive information retrieval approaches involving textual or graphical morphological attributes. The content-based approach to life species identification has several advantages. It is first intrinsically language-independent and solves many of the multilingual issues related to the use of classical text-based morphological keys, which are strongly language-dependent and understandable only by a few experts in the world. Furthermore, an expert of one region or of a specific taxonomic group does not necessarily know the vocabulary dedicated to another group of living organisms. A content-based approach is also much more easily generalizable to new floras or faunas, contrary to knowledge-based approaches that require building complex models manually (ontologies with rich descriptions, graphical illustrations of morphological attributes, etc.). On the other hand, the LifeCLEF lab is inherently cross-modal through the presence of contextual and social data associated with the visual and audio content. This includes geo-tags or location names, time information, author names, collaborative ratings or comments, vernacular names (common names of plants or animals), organ or picture type tags, etc. The rules regarding the use of this meta-data in the evaluated identification methods are specified in the description of each task. Overall, these rules are always designed so as to reflect realistic usage scenarios while offering the largest diversity in the affordable approaches.

1.3 Expected Outcomes
The main expected outcomes of the LifeCLEF evaluation campaign are the following:

• give a snapshot of the performance of state-of-the-art multimedia techniques towards building real-world life species identification systems
• provide large and original data sets of biological records, and thereby allow the comparison of multimedia-based identification techniques
• boost research and innovation on this topic in the next few years and encourage multimedia researchers to work on trans-disciplinary challenges involving ecological and environmental data
• foster technological transfer from one domain to another and exchanges between the different communities (information retrieval, computer vision, bio-acoustics, machine learning, etc.)
• promote citizen science and nature observation as a way to describe, analyse and preserve biodiversity

2. TASK1: PlantCLEF

2.1 Context
Content-based image retrieval approaches are nowadays considered one of the most promising solutions to help bridge the botanical taxonomic gap, as discussed in [14] or [22] for instance. We therefore see an increasing interest in this trans-disciplinary challenge in the multimedia community (e.g. in [16, 9, 21, 24, 17, 4]). Beyond the raw identification performance achievable by state-of-the-art computer vision algorithms, the visual search approach offers much more efficient and interactive ways of browsing large floras than standard field guides or online web catalogs. Smartphone applications relying on such image-based identification services are particularly promising for setting up massive ecological monitoring systems involving hundreds of thousands of contributors at a very low cost. The first noticeable progress in this direction was achieved by the US consortium at the origin of LeafSnap (http://leafsnap.com/). This popular iPhone application allows a fair identification of 185 common American plant species by simply shooting a cut leaf on a uniform background (see [22] for more details). A step beyond was achieved recently by the Pl@ntNet project [18], which released a cross-platform application (iPhone [13], Android (https://play.google.com/store/apps/details?id=org.plantnet&hl=fr) and web (http://identify.plantnet-project.org/)) allowing users (i) to query the system with pictures of plants in their natural environment and (ii) to contribute to the dataset thanks to a collaborative data validation workflow involving Tela Botanica (http://www.tela-botanica.org/), the largest botanical social network in Europe.

As promising as these applications are, their performance is still far from the requirements of a real-world social-based ecological surveillance scenario. Allowing the mass of citizens to produce accurate plant observations requires equipping them with much more accurate identification tools. Measuring and boosting the performance of content-based identification tools is therefore crucial. This was precisely the goal of the ImageCLEF (http://www.imageclef.org/) plant identification task organized since 2011 in the context of the worldwide evaluation forum CLEF (http://www.clef-initiative.eu/).
In 2011, 2012 and 2013, respectively 8, 10 and 12 international research groups crossed the finish line of this large collaborative evaluation by benchmarking their image-based plant identification systems (see [14], [15] and [19] for more details). The data mobilised during these first 3 years can be consulted at http://publish.plantnet-project.org/project/plantclef, and the geographic distribution of these botanical records is shown in Figure 1.

Figure 1: Distribution map of the botanical records of the Plant task 2013.

Contrary to previous evaluations reported in the literature, the key objective was to build a realistic task closer to real-world conditions (different users, cameras, areas, periods of the year, individual plants, etc.). This was initially achieved through a citizen science initiative started 4 years ago in the context of the Pl@ntNet project in order to boost the image production of the Tela Botanica social network. The evaluation data was enriched each year with the new contributions and progressively diversified with other input feeds (annotation and cleaning of older data, contributions made through the Pl@ntNet mobile applications). The plant task of LifeCLEF 2014 is directly in the continuity of this effort. The main novelties compared to the previous years are the following: (i) an explicit multi-image query scenario, (ii) the supply of user ratings on image quality in the meta-data, (iii) a new type of view called "Branch" in addition to the 6 previous ones, and (iv) more species (about 500, which is an important step towards covering the entire flora of a given region).

2.2 Dataset
More precisely, the PlantCLEF 2014 dataset is composed of 60,962 pictures belonging to 19,504 observations of 500 species of trees, herbs and ferns living in a European region centered around France. This data was collected by 1,608 distinct contributors. Each picture belongs to one and only one of the 7 types of view reported in the meta-data (entire plant, fruit, leaf, flower, stem, branch, leaf scan) and is associated with a single plant observation identifier allowing to link it with the other pictures of the same individual plant (observed the same day by the same person).

It is noticeable that most image-based identification methods and evaluation data proposed in the past were based on leaf images (e.g. in [22, 5, 9] or in the more recent methods evaluated in [15]). Only a few of them focused on flower images, as in [25] or [3]. Leaves are far from being the only discriminant visual key between species but, due to their shape and size, they have the advantage of being easily observed, captured and described. More diverse parts of the plants however have to be considered for accurate identification. As an example, the 6 species depicted in Figure 2 share the same French common name of "laurier" even though they belong to different taxonomic groups (4 families, 6 genera). The main reason is that these shrubs, often used in hedges, share leaves with more or less the same-sized elliptic shape. Identifying a laurel can be very difficult for a novice by just observing the leaves, while it is indisputably easier with flowers. Beyond identification performance, the use of leaves alone also has some practical and botanical limitations. Leaves are not visible all year round for a large fraction of plant species. Deciduous species, distributed from temperate to tropical regions, cannot be identified from their leaves over part of the year. Leaves can be absent (i.e. leafless species), too young or too degraded (by pathogen or insect attacks) to be exploited efficiently. Moreover, the leaves of many species are intrinsically not informative enough or are very difficult to capture (needles of pines, thin leaves of grasses, huge leaves of palms, ...).

Figure 2: 6 plant species sharing the same common name for laurel in French while belonging to distinct species.

Another originality of the PlantCLEF dataset is that its social nature makes it closer to the conditions of a real-world identification scenario: (i) images of the same species come from distinct plants living in distinct areas, (ii) pictures are taken by different users that might not use the same protocol to acquire the images, and (iii) pictures are taken at different periods of the year. Each image of the dataset is associated with contextual meta-data (author, date, locality name, plant id) and social data (user ratings on image quality, collaboratively validated taxon names, vernacular names) provided in a structured XML file. The GPS geo-localization and the device settings are available only for some of the images. Figure 3 gives some examples of pictures with decreasing averaged user ratings for the different types of views. Note that the users of the specialized social network creating these ratings (Tela Botanica) are explicitly asked to rate the images according to their usefulness for plant identification and their accordance with the pre-defined acquisition protocol for each view type. This is not an aesthetic or general interest judgement as in most social image sharing sites.

Figure 3: Examples of PlantCLEF pictures with decreasing averaged user ratings for the different types of views.

2.3 Task Description
The task will be evaluated as a plant species retrieval task based on multi-image plant observation queries. The goal is to retrieve the correct plant species among the top results of a ranked list of species returned by the evaluated system. Contrary to previous plant identification benchmarks, queries are not defined as single images but as plant observations, meaning a set of one to several images depicting the same individual plant, observed by the same person, on the same day. Each image of a query observation is associated with a single view type (entire plant, branch, leaf, fruit, flower, stem or leaf scan) and with contextual meta-data (date, location, author). Each participating group is allowed to submit up to 4 runs built from different methods. Semi-supervised and interactive approaches (particularly for segmenting parts of the plant from the background) are allowed but will be compared independently from fully automatic methods. Any human assistance in the processing of the test queries therefore has to be signaled in the submitted run meta-data.

In practice, the whole PlantCLEF dataset is split in two parts, one for training (and/or indexing) and one for testing. The test set was built by randomly choosing 1/3 of the observations of each species, whereas the remaining observations were kept in the reference training set. The XML files containing the meta-data of the query images were purged so as to erase the taxon name (the ground truth), the vernacular name (common name of the plant) and the image quality ratings (which would not be available at query stage in a real-world mobile application). Meta-data of the observations in the training set are kept unaltered.
The metric used to evaluate the submitted runs will be a score related to the rank of the correct species in the returned list. Each query observation will be attributed a score between 0 and 1 equal to the inverse of the rank of the correct species (equal to 1 if the correct species is ranked first, and decreasing quickly as the rank of the correct species increases). An average score will then be computed across all plant observation queries. A simple mean over all test plant observations would however introduce some bias. Indeed, we remind that the PlantCLEF dataset was built in a collaborative manner, so that a few contributors might have provided many more observations and pictures than the many other contributors who provided only a few. Since we want to evaluate the ability of a system to provide the correct answers to all users, we rather measure the mean of the average classification rate per author. Finally, our primary metric is defined as the following average classification score S:

S = \frac{1}{U} \sum_{u=1}^{U} \frac{1}{P_u} \sum_{p=1}^{P_u} \frac{1}{N_{u,p}} s_{u,p}

where U is the number of users, P_u is the number of individual plants observed by the u-th user, N_{u,p} is the number of pictures of the p-th plant observation of the u-th user, and s_{u,p} is the score between 0 and 1 equal to the inverse of the rank of the correct species.
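As a concrete illustration, the following minimal Python sketch computes this score from per-observation results; the record fields (author, n_pictures, rank) are hypothetical placeholders for the information contained in the run files and the ground truth, not the official evaluation code.

from collections import defaultdict

def average_classification_score(results):
    """results: one record per test plant observation, e.g.
    {"author": "u42", "n_pictures": 3, "rank": 2}, where "rank" is the
    rank of the correct species in the submitted list (None if absent)."""
    per_author = defaultdict(list)
    for r in results:
        s = 1.0 / r["rank"] if r["rank"] else 0.0            # s_{u,p}: inverse rank of the correct species
        per_author[r["author"]].append(s / r["n_pictures"])  # 1/N_{u,p} factor of the formula
    # average over each author's P_u observations first, then over the U authors
    return sum(sum(v) / len(v) for v in per_author.values()) / len(per_author)

# toy example with two contributors
print(average_classification_score([
    {"author": "alice", "n_pictures": 2, "rank": 1},
    {"author": "alice", "n_pictures": 1, "rank": 3},
    {"author": "bob",   "n_pictures": 4, "rank": None},
]))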
                                                                       workshop, with 15 species, and 79 participants in August
observations of each species whereas the remaining observa-
                                                                       2013. The third challenge, and biggest in 2013, was organ-
tions were kept in the reference training set. The xml files
                                                                       ised by University of Toulon, SABIOD and Biotope, with 80
containing the meta-data of the query images were purged
                                                                       species from the Provence, France. More than thirty teams
so as to erase the taxon name (the ground truth), the ver-
                                                                       participated, reaching 92% of average AUC. The descrip-
nacular name (common name of the plant) and the image
                                                                       tion of the ICML4B best systems are given into the on-line
quality ratings (that would not be available at query stage
                                                                       book [2], including for some of them reference to some useful
in a real-world mobile application). Meta-data of the obser-
                                                                       scripts.
vations in the training set are kept unaltered.
                                                                       In collaboration with the organizers of these previous chal-
                                                                       lenges, BirdCLEF 2014 goes one step further by (i) signifi-
The metric used to evaluate the submitted runs will be a
                                                                       cantly increasing the species number by almost an order of
score related to the rank of the correct species in the re-
                                                                       magnitude (ii) working on real-world social data built from
turned list. Each query observation will be attributed with
                                                                       hundreds of recordists (iii) moving to a more usage-driven
a score between 0 and 1 reflecting equal to the inverse of the
                                                                       and system-oriented benchmark by allowing the use of meta-
rank of the correct species (equal to 1 if the correct species
                                                                       data and defining information retrieval oriented metrics. Over-
is the top-1 decreasing quickly while the rank of the correct
                                                                       11
species increases). An average score will then be computed                  http://sabiod.org


                                                                  10
3.2 Dataset
The training and test data of the bird task is composed of audio recordings collected by Xeno-canto (XC, http://www.xeno-canto.org/). Xeno-canto is a web-based community of bird sound recordists worldwide, with about 1,500 active contributors that have already collected more than 150,000 recordings of about 9,000 species. Nearly 500 species from Brazilian forests are used in the BirdCLEF dataset, representing the 500 species of that region with the highest number of recordings, totalling about 14,000 recordings produced by hundreds of users. Figure 4 illustrates the geographical distribution of the dataset samples.

Figure 4: Distribution of the Xeno-canto audio recordings, centered around the Brazil area.

To avoid any bias in the evaluation related to the audio devices used, each audio file has been normalized to a constant sampling rate of 44.1 kHz and coded over 16 bits in wav mono format (the right channel is selected by default). The conversion from the original Xeno-canto data set was done using ffmpeg, sox and matlab scripts. The 16 Mel Filter Cepstrum Coefficients optimized for bird identification (according to an extended benchmark [10]) have been computed, with their first and second temporal derivatives, on the whole set. They were used in the best systems run in the ICML4B and NIPS4B challenges.
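The organizers provide these features pre-computed (via matlab scripts); purely as an illustration of the kind of features involved, a comparable computation could be sketched with the librosa library as follows (librosa and its default framing parameters are an assumption here, not the organizers' toolchain):

import numpy as np
import librosa

def bird_features(wav_path):
    """Return a (48, n_frames) matrix: 16 MFCCs plus their first and
    second temporal derivatives, similar in spirit to the provided features."""
    y, sr = librosa.load(wav_path, sr=44100, mono=True)  # dataset files are 44.1 kHz mono wav
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=16)   # 16 Mel filter cepstrum coefficients
    d1 = librosa.feature.delta(mfcc)                     # first temporal derivative
    d2 = librosa.feature.delta(mfcc, order=2)            # second temporal derivative
    return np.vstack([mfcc, d1, d2])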
Audio records are associated with various meta-data, including the species of the most active singing bird, the species of the other birds audible in the background, the type of sound (call, song, alarm, flight, etc.), the date and location of the observations (from which rich statistics on species distribution can be derived), some textual comments of the authors, multilingual common names and collaborative quality ratings. All of them were produced collaboratively by the Xeno-canto community.

3.3 Task Description
Participants are asked to determine the species of the most active singing birds in each query file. The background noise can be used like any other meta-data, but it is forbidden to correlate the test set of the challenge with the original annotated Xeno-canto data base (or with any external content, as many such recordings are circulating on the web). More precisely, and similarly to the plant task, the whole BirdCLEF dataset has been split in two parts, one for training (and/or indexing) and one for testing. The test set was built by randomly choosing 1/3 of the observations of each species, whereas the remaining observations were kept in the reference training set. Recordings of the same species made by the same person on the same day are considered as being part of the same observation and cannot be split across the test and training sets. The XML files containing the meta-data of the query recordings were purged so as to erase the taxon name (the ground truth), the vernacular name (common name of the bird) and the collaborative quality ratings (which would not be available at query stage in a real-world mobile application). Meta-data of the recordings in the training set are kept unaltered.

The groups participating in the task will be asked to produce up to 4 runs containing a ranked list of the most probable species for each query record of the test set. Each species will have to be associated with a normalized score in the range [0, 1] reflecting the likelihood that this species is singing in the sample. The primary metric used to compare the runs will be the Mean Average Precision averaged across all queries. Additionally, to allow easy comparisons with the previous Kaggle ICML4B and NIPS4B benchmarks, the area under the ROC curve (AUC) will be computed for each species and averaged over all species.
4. TASK3: FishCLEF

4.1 Context
Underwater video monitoring has been widely used in recent years for marine surveillance, as opposed to human-operated photography or net-casting methods, since it does not influence fish behavior and provides a large amount of material at the same time. However, it is impractical for humans to manually analyze the massive quantity of video data generated daily, because it requires much time and concentration and is also error prone. Automatic fish identification in videos is therefore of crucial importance in order to estimate fish presence and quantity [29, 28, 26]. Moreover, it would help marine biologists to understand the natural underwater environment, promote its preservation, and study the behaviors and interactions between the marine animals that are part of it. Beyond this, video-based fish species identification finds applications in many other contexts: from education (e.g. primary/high schools) to the entertainment industry (e.g. in aquariums).

To the best of our knowledge, this is the first worldwide initiative on automatic image- and video-based fish species identification.

4.2 Dataset
The underwater video dataset used for FishCLEF is derived from the Fish4Knowledge (www.fish4knowledge.eu) video repository, which contains about 700k 10-minute video clips that were taken in the past five years to monitor Taiwan coral reefs. The Taiwan area is particularly interesting for studying the marine ecosystem, as it holds one of the largest fish biodiversities of the world, with more than 3,000 different fish species whose taxonomy is available at http://fishdb.sinica.edu.tw/. The dataset contains videos recorded from sunrise to sunset showing several phenomena, e.g. murky water, algae on camera lenses, etc., which makes the fish identification task more complex. Each video has a resolution of 320x240 at 8 fps and comes with additional metadata including the date and localization of the recordings. Figure 5 shows 4 snapshots from the 4 cameras monitoring the coral reef at Taiwan's Kenting site; it illustrates the complexity of automatic fish detection and recognition in real-life settings.

Figure 5: 4 snapshots from the 4 cameras monitoring Taiwan's Kenting site.

More specifically, the FishCLEF dataset consists of about 3,000 videos with several thousands of detected fish. The fish detections were obtained by processing the underwater videos with video analysis tools [27] and then manually labeled using the system in [20].

4.3 Task Description
The dataset for the video-based fish identification task will be released in two stages: the participants will first have access to the training set and, a few months later, they will be provided with the test set. The goal is to automatically detect fish and identify their species. The task comprises three sub-tasks: 1) identifying moving objects in videos by either background modeling or object detection methods, 2) detecting fish instances in video frames, and then 3) identifying the species (taken from a subset of the most commonly seen fish species) of the fish detected in video frames.

Participants can decide to compete for only one subtask or for all subtasks. Although sub-tasks 2 and 3 are based on still images, participants are invited to exploit motion information extracted from the videos to support their strategies.

As scoring functions, the participants are asked to produce:

• ROC curves for sub-task 1; in particular, precision, recall and F-measure measured when comparing, on a pixel basis, the ground truth binary masks and the output masks of the object detection methods are required;
• Recall for fish detection in still images as a function of the bounding box overlap percentage: a detection is considered a true positive if the PASCAL score between it and the corresponding object in the ground truth is over 0.7 (see the sketch after this list);
• Average recall and recall per fish species for the fish recognition subtask.
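The PASCAL score referred to above is the usual intersection-over-union criterion of the PASCAL VOC detection challenges; a minimal sketch, assuming boxes given as (x, y, width, height) tuples, is:

def pascal_score(box_a, box_b):
    """Intersection-over-union of two (x, y, w, h) bounding boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))   # width of the intersection
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))   # height of the intersection
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

# a detection is counted as a true positive when the score exceeds 0.7
is_true_positive = pascal_score((10, 10, 50, 40), (15, 12, 50, 40)) > 0.7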
For the above sub-tasks, participants will be asked to produce several runs containing the list of detected fish together with their species (the species only for sub-task 3). For fish species identification, a ranked list of the most probable species (and the related likelihood values) must be provided for each detected fish.

5. SCHEDULE AND PERSPECTIVES
LifeCLEF 2014 registration opened in December 2013 and will close at the end of April 2014. At the time of writing, 61 research groups have already registered for at least one of the three tasks, and this number will continue growing. As in any evaluation campaign, many of the registered groups won't cross the finish line by submitting official runs, but this reflects at least their interest in the LifeCLEF data and the related challenges. The schedule of the ongoing and remaining steps of the LifeCLEF 2014 campaign is the following:

31.01.2014: training data ready and shared
15.03.2014: test data ready and shared
01.05.2014: deadline for submission of runs
15.05.2014: release of raw results
15.06.2014: submission of working notes describing each participant's systems and runs
15.07.2014: overall task reports including results analysis and main findings
16.09.2014: 1-day LifeCLEF workshop at the CLEF 2014 Conference (Sheffield, UK)

The organisation of a new campaign in 2015, as well as its precise content, will depend on the outcomes of the 2014 edition and on the feedback received from the registered groups.

6. ADDITIONAL AUTHORS
Robert Planqué (Xeno-canto, The Netherlands, email: r.planque@vu.nl)

7. REFERENCES
[1] MAED '12: Proceedings of the 1st ACM International Workshop on Multimedia Analysis for Ecological Data, New York, NY, USA, 2012. ACM.
[2] Proc. of the First Workshop on Machine Learning for Bioacoustics, 2013.
[3] A. Angelova, S. Zhu, Y. Lin, J. Wong, and C. Shpecht. Development and deployment of a large-scale flower recognition mobile app, December 2012.
[4] E. Aptoula and B. Yanikoglu. Morphological features for leaf based plant recognition. In Proc. IEEE Int. Conf. Image Process., Melbourne, Australia, page 7, 2013.
[5] A. R. Backes, D. Casanova, and O. M. Bruno. Plant leaf identification based on volumetric fractal dimension. International Journal of Pattern Recognition and Artificial Intelligence, 23(6):1145–1160, 2009.
[6] J. E. M. Baillie, C. Hilton-Taylor, and S. Stuart. 2004 IUCN Red List of Threatened Species: A Global Species Assessment. IUCN, Gland, Switzerland and Cambridge, UK, 2004.
[7] F. Briggs, B. Lakshminarayanan, L. Neal, X. Z. Fern, R. Raich, S. J. Hadley, A. S. Hadley, and M. G. Betts. Acoustic classification of multiple simultaneous bird species: A multi-instance multi-label approach. The Journal of the Acoustical Society of America, 131:4640, 2012.
[8] J. Cai, D. Ee, B. Pham, P. Roe, and J. Zhang. Sensor network for the monitoring of ecosystem: Bird species recognition. In Intelligent Sensors, Sensor Networks and Information (ISSNIP 2007), 3rd International Conference on, pages 293–298, Dec 2007.
[9] G. Cerutti, L. Tougne, A. Vacavant, and D. Coquin. A parametric active polygon for leaf segmentation and shape estimation. In International Symposium on Visual Computing, pages 202–213, 2011.
[10] O. Dufour, T. Artieres, H. Glotin, and P. Giraudet. Clusterized mel filter cepstral coefficients and support vector machines for bird song identification. 2013.
[11] K. J. Gaston and M. A. O'Neill. Automated species identification: why not? Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 359(1444):655–667, 2004.
[12] H. Glotin and J. Sueur. Overview of the first international challenge on bird classification. 2013.
[13] H. Goëau, P. Bonnet, A. Joly, V. Bakić, J. Barbe, I. Yahiaoui, S. Selmi, J. Carré, D. Barthélémy, N. Boujemaa, et al. Pl@ntNet mobile app. In Proceedings of the 21st ACM International Conference on Multimedia, pages 423–424. ACM, 2013.
[14] H. Goëau, P. Bonnet, A. Joly, N. Boujemaa, D. Barthélémy, J.-F. Molino, P. Birnbaum, E. Mouysset, and M. Picard. The ImageCLEF 2011 plant images classification task. In CLEF working notes, 2011.
[15] H. Goëau, P. Bonnet, A. Joly, I. Yahiaoui, D. Barthelemy, N. Boujemaa, and J.-F. Molino. The ImageCLEF 2012 plant identification task. In CLEF working notes, 2012.
[16] H. Goëau, A. Joly, S. Selmi, P. Bonnet, E. Mouysset, L. Joyeux, J.-F. Molino, P. Birnbaum, D. Bathelemy, and N. Boujemaa. Visual-based plant species identification from crowdsourced data. In ACM Conference on Multimedia, pages 813–814, 2011.
[17] A. Hazra, K. Deb, S. Kundu, P. Hazra, et al. Shape oriented feature selection for tomato plant identification. International Journal of Computer Applications Technology and Research, 2(4):449, 2013.
[18] A. Joly, H. Goëau, P. Bonnet, V. Bakić, J. Barbe, S. Selmi, I. Yahiaoui, J. Carré, E. Mouysset, J.-F. Molino, N. Boujemaa, and D. Barthélémy. Interactive plant identification based on social image data. Ecological Informatics, 2013.
[19] A. Joly, H. Goëau, P. Bonnet, V. Bakic, J.-F. Molino, D. Barthélémy, and N. Boujemaa. The ImageCLEF plant identification task 2013. In International Workshop on Multimedia Analysis for Ecological Data, Barcelona, Spain, Oct. 2013.
[20] I. Kavasidis, S. Palazzo, R. Di Salvo, D. Giordano, and C. Spampinato. An innovative web-based collaborative platform for video annotation. Multimedia Tools and Applications, pages 1–20, 2013.
[21] H. Kebapci, B. Yanikoglu, and G. Unal. Plant image retrieval using color, shape and texture features. The Computer Journal, 54(9):1475–1490, 2011.
[22] N. Kumar, P. N. Belhumeur, A. Biswas, D. W. Jacobs, W. J. Kress, I. C. Lopez, and J. V. B. Soares. Leafsnap: A computer vision system for automatic plant species identification. In European Conference on Computer Vision, pages 502–516, 2012.
[23] D.-J. Lee, R. B. Schoenberger, D. Shiozawa, X. Xu, and P. Zhan. Contour matching for a fish recognition and migration-monitoring system. In Optics East, pages 37–48. International Society for Optics and Photonics, 2004.
[24] S. Mouine, I. Yahiaoui, and A. Verroust-Blondet. Advanced shape context for plant species identification using leaf image retrieval. In ACM International Conference on Multimedia Retrieval, pages 49:1–49:8, 2012.
[25] M.-E. Nilsback and A. Zisserman. Automated flower classification over a large number of classes. In Indian Conference on Computer Vision, Graphics and Image Processing, pages 722–729, 2008.
[26] M. R. Shortis, M. Ravanbakskh, F. Shaifat, E. S. Harvey, A. Mian, J. W. Seager, P. F. Culverhouse, D. E. Cline, and D. R. Edgington. A review of techniques for the identification and measurement of fish in underwater stereo-video image sequences. In SPIE Optical Metrology 2013, pages 87910G–87910G. International Society for Optics and Photonics, 2013.
[27] C. Spampinato, E. Beauxis-Aussalet, S. Palazzo, C. Beyan, J. Ossenbruggen, J. He, B. Boom, and X. Huang. A rule-based event detection system for real-life underwater domain. Machine Vision and Applications, 25(1):99–117, 2014.
[28] C. Spampinato, Y.-H. Chen-Burger, G. Nadarajan, and R. B. Fisher. Detecting, tracking and counting fish in low quality unconstrained underwater videos. In VISAPP (2), pages 514–519. Citeseer, 2008.
[29] C. Spampinato, D. Giordano, R. Di Salvo, Y.-H. J. Chen-Burger, R. B. Fisher, and G. Nadarajan. Automatic fish classification for underwater species behavior understanding. In Proceedings of ACM ARTEMIS 2010, pages 45–50. ACM, 2010.
[30] M. Towsey, B. Planitz, A. Nantes, J. Wimmer, and P. Roe. A toolbox for animal call recognition. Bioacoustics, 21(2):107–125, 2012.
[31] V. M. Trifa, A. N. Kirschel, C. E. Taylor, and E. E. Vallejo. Automated species recognition of antbirds in a Mexican rainforest using hidden Markov models. The Journal of the Acoustical Society of America, 123:2424, 2008.
[32] Q. D. Wheeler, P. H. Raven, and E. O. Wilson. Taxonomy: Impediment or expedient? Science, 303(5656):285, 2004.