=Paper=
{{Paper
|id=Vol-1180/CLEF2014wn-Life-KaramtiEt2014
|storemode=property
|title=MIRACL at ImageCLEF 2014: Plant Identification Task
|pdfUrl=https://ceur-ws.org/Vol-1180/CLEF2014wn-Life-KaramtiEt2014.pdf
|volume=Vol-1180
|dblpUrl=https://dblp.org/rec/conf/clef/KaramtiFTG14
}}
==MIRACL at ImageCLEF 2014: Plant Identification Task==
<pdf width="1500px">https://ceur-ws.org/Vol-1180/CLEF2014wn-Life-KaramtiEt2014.pdf</pdf>
<pre>
        MIRACL at LifeCLEF 2014: Multi-organ
          observation for Plant Identification

       Hanen Karamti, Sana Fakhfakh, Mohamed Tmar, Walid Mahdi, and
                               Faiez Gargouri

    MIRACL Laboratory, City ons Sfax, University of Sfax, B.P.3023 Sfax TUNISIA
                           karamti.hanen@gmail.com,
                            sanafakhfakh@yahoo.fr,
                         mohamed.tmar@isimsf.rnu.tn,
                          walid.mahdi@isimsf.rnu.tn,
                         faiez.gargouri@fsegs.rnu.tn


        Abstract. ImageCLEF 2014 includes a challenge on image-based plant
        identification. This article describes our first participation in the
        multi-image plant observation queries task of PlantCLEF 2014. The task
        is evaluated as a plant species retrieval task based on multi-image
        plant observation queries. The goal is to retrieve the correct plant
        species among the top results of a ranked list of species returned by
        the evaluated system.
        In this paper, we present two methods. Our first method is purely
        visual and entirely automatic, using only the image information. The
        total time spent preparing this submission was only about three
        weeks, and the results were accordingly fairly poor. Our second
        method identifies plant species from a combination of the textual and
        structural context of the image: we use the meta-data to explore the
        image characteristics, relying on a recent technique that exploits
        the structure of XML documents. The results of this method were also
        fairly poor.
        Although our results are not as promising as those of other
        participating groups, the conclusions reached can still guide our
        future work in this field.

        Keywords: plant observation, feature extraction, ImageCLEF, XML,
        image retrieval.


1     Introduction

Plant identification has become an important and challenging research area,
since it is estimated that approximately one half of the world's plant species
are still not cataloged. Among such unidentified species one may find, for
instance, a cure for a disease or a plant that contributes to the equilibrium
of the ecosystem around it.


    Despite the importance of studies on the description and categorization
of plants, this task remains difficult for a botanist, who is limited to a
specific amount of information about the plant. Furthermore, among the
information that may be collected, the most relevant for the botanist's
analysis are flowers and fruits. However, in most cases these elements can be
observed only during specific periods of the year. This complicates
identification, because an observation may not take place when these
characteristics are visible.
    A solution to this impasse is to identify a plant by observing its
different organs, as demonstrated in [1]. Indeed, botanists usually observe
several organs simultaneously, such as the leaves and the fruits or the
flowers, in order to disambiguate species that could be confused if only one
organ were observed. Moreover, if only one organ is observable, such as the
bark of a deciduous plant during winter, then several photos of this organ
taken from different points of view can be more informative than a single
point of view.
    Thus, image analysis based on computational tools is a worthwhile approach
to help the botanist, or even to provide by itself a reliable outcome for the
classification task. In this context, ImageCLEF, part of the Conference and
Labs of the Evaluation Forum (CLEF), hosts an annual competition on plant
species identification.
    This year, ImageCLEF organizes a new challenge dedicated to botanical
data, called PlantCLEF 2014, in which the species identification task is not
image-centered but observation-centered. The aim of the task is to produce a
list of relevant species for each plant observation of the test dataset, i.e.
one picture or a set of several pictures related to the same event: one person
photographing several detailed views of various organs of the same plant, on
the same day, with the same device and under the same lighting conditions.
The main novelties compared to previous years are the following: an explicit
multi-image query scenario; user ratings on image quality; a new type of view
called ’Branch’, in addition to the 6 other views (Scan, and photos of Flower,
Fruit, Stem, Leaf and Entire views); and more species (about 500 this year,
which is an important step towards covering the entire flora of a given
region).
    In this context, this paper describes the approach of our team (MIRACL)
to the LifeCLEF task on multi-image plant observation queries. Our focus was
on three main points: (i) content-based image retrieval, which requires a
thorough knowledge of the low-level features of the image; (ii) context-based
image retrieval, which relies on information extracted in the proximity of the
image, for example the title or caption of the image, a textual description,
etc.; and (iii) the combination of textual and visual features.
    Section 2 describes the materials and methods used. Section 3 presents the
results of our experiments. Finally, Section 4 presents our conclusions.


2     Material and Methods

2.1   Database


            Fig. 1. Sample images for the database of PlantCLEF 2014


    The task is based on the Pl@ntView dataset, which focuses on 500 herb,
tree and fern species centered on France (some plant observations are from
neighboring countries). This database (see Figure 1) is maintained by the
French project PlantNet (INRIA, CIRAD, Telabotanica) and the French CNRS
program MASTODONS.
It contains more than 60000 pictures, each belonging to one of the 7 types of
view reported in the meta-data, which are stored in an XML file (one per
image) with explicit tags [8]. For (1), (2), (3), (5), (6) and (7), the images
are taken directly from the trees. For (4), the images are collected using
flatbed scanners, oriented vertically along the main natural axis with the
petiole visible. These images do not have a uniform background.
Each image has an associated XML file that contains the following meta-data:

 – ObservationId: the ID of the plant observation, with which several
   pictures can be associated
 – FileName
 – MediaId
 – View Content: Branch, Entire, Flower, Fruit, Leaf, LeafScan, Stem
 – ClassId: the class number ID that must be used as ground-truth. It is a
   numerical taxonomical number used by Tela Botanica
 – Species: the species name (containing 3 parts: the Genus name, the Species
   name, and the author(s) who discovered or revised the name of the species)
 – Genus: the name of the Genus, one level above the Species in the
   taxonomical hierarchy used by Tela Botanica
 – Family: the name of the Family, two levels above the Species in the
   taxonomical hierarchy used by Tela Botanica
 – Date: (if available) the date when the plant was observed
 – Vote: the (rounded up) average of the user ratings on image quality
 – Locality: (if available) locality name, most of the time a town
 – Latitude & Longitude: (if available) the GPS coordinates of the
   observation in the EXIF metadata or, if no GPS information was found in
   the EXIF, the GPS coordinates of the locality where the plant was observed
   (only for the towns of metropolitan France)
 – Author: name of the author of the picture

And, if the image was included in a previous plant task:

 – Year: ImageCLEF2011, ImageCLEF2012, ImageCLEF2013 or PlantCLEF2014, the
   benchmark in which the image was integrated
 – IndividualPlantId2013: the plant observation ID used last year during the
   ImageCLEF2013 plant task
 – ImageID2013: the image id.jpg used in 2013
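
As an illustration, the meta-data file of one image can be read with a few
lines of Python. This is only a minimal sketch and not part of the system
described in this paper: it assumes the tag names of the sample file shown in
Figure 2, and the function name parse_metadata is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened sample following the tag names of Figure 2.
SAMPLE = """<Image>
 <ObservationId>18228</ObservationId>
 <FileName>6.jpg</FileName>
 <MediaId>6</MediaId>
 <Vote>4</Vote>
 <Content>Flower</Content>
 <ClassId>30269</ClassId>
 <Family>PAPAVERACEAE</Family>
 <Species>Papaver rhoeas L.</Species>
 <Genus>Papaver</Genus>
 <Latitude />
 <Longitude />
</Image>"""


def parse_metadata(xml_text):
    """Return a dict mapping each child tag of <Image> to its text.

    Empty elements such as <Latitude /> are mapped to an empty string.
    """
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


meta = parse_metadata(SAMPLE)
```

Optional fields such as Latitude and Longitude come back as empty strings, so
a caller has to check for missing values before using them.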

The full database is split into a training and a testing dataset. The training
dataset has 47815 images (1987 of ’Branch’, 6356 photographs of ’Entire’,
13164 of ’Flower’, 3753 of ’Fruit’, 7754 of ’Leaf’, 3466 of ’Stem’ and 11335
’scans’ and ’scan-like’ pictures of ’Leaf’).
The test dataset consists of 8163 plant-observation queries. These queries are
based on 13146 images (731 of ’Branch’, 2983 photographs of ’Entire’, 4559 of
’Flower’, 1184 of ’Fruit’, 2058 of ’Leaf’, 935 of ’Stem’ and 696 ’scans’ and
’scan-like’ pictures of ’Leaf’).


2.2   Methods

In this section, we will present our methods.


Feature Extraction The feature extraction method (run 1) supports two main
functionalities: data extraction processing and query processing. The data
extraction processing is responsible for extracting appropriate features from
the images and storing them in feature vectors. This process is usually
performed offline. A feature vector VI of an image I can be thought of as a
list of low-level features (C1 , C2 , ..., Cm ), where m is the number of
features.
We used three descriptors for feature extraction:


 – A Color Layout Descriptor (CLD) [3] is designed to capture the spatial
   distribution of color in an image. The feature extraction process consists
   of two parts: grid-based representative color selection, and discrete
   cosine transform with quantization. The image is divided into an 8 × 8
   grid. An 8 × 8 pixel image is created, with each pixel given the
   representative color of the corresponding grid area in the image. The
   8 × 8 matrix is transformed with the discrete cosine transform. Finally, a
   zigzag scan is performed on the matrix. The resulting matrices (one for
   each color component) make up the descriptor.
 – An Edge Histogram Descriptor (EHD) [2] is a texture descriptor proposed
   for MPEG-7 that expresses only the local edge distribution in the image.
 – A Scalable Color Descriptor (SCD) [4] is a color histogram generated by
   quantizing the colors of the image into 256 bins in the HSV color space,
   with 16 bins for hue and 4 bins each for saturation and value. Two images
   with similar color histograms are considered more similar to each other
   than two images with dissimilar histograms, which makes this descriptor
   useful when searching for similarities between images.
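
The 16 × 4 × 4 quantization described for the SCD can be sketched as follows.
This is a simplified illustration under stated assumptions, not the
implementation used for run 1: it only builds the 256-bin HSV histogram and
omits the further encoding steps of the MPEG-7 descriptor, and the function
name hsv_histogram and the pixel format (a list of RGB tuples) are
hypothetical.

```python
import colorsys


def hsv_histogram(pixels, h_bins=16, s_bins=4, v_bins=4):
    """Return a normalized HSV histogram with h_bins * s_bins * v_bins bins.

    `pixels` is an iterable of (r, g, b) tuples with components in 0..255.
    """
    hist = [0.0] * (h_bins * s_bins * v_bins)
    count = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Clamp so that a component equal to 1.0 falls into the last bin.
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1
        count += 1
    if count:
        # Normalize so that images of different sizes are comparable.
        hist = [x / count for x in hist]
    return hist


# Two uniform patches: pure red and pure green.
hist_red = hsv_histogram([(255, 0, 0)] * 4)
hist_green = hsv_histogram([(0, 255, 0)] * 4)
```

Each uniform patch puts all of its mass into a single bin, and the two patches
fall into different hue bins, so their histograms do not overlap.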
Each Ci combines the CLD, SCD and EHD components (CCLD , CSCD and CEHD ) of
feature i. The query processing, in turn, extracts a feature vector from a
query and applies a metric (the Euclidean distance, equation 1) to evaluate
the similarity between the query image and the database images [7].


\[
S_{visuel_I} = dist_{Euclidean}(V_I, V_{I_i}) = \sqrt{\sum_{i=1}^{m} (C_I - C_{I_i})^2} \tag{1}
\]

The similarity scores of the query results build a score vector. A score vector
SvisuelI of an image I can be thought of as a set of scores (Svisuel1 ,
Svisuel2 , ..., Svisueln ), where n is the number of database images.
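
The query processing step can be sketched in a few lines of Python. This is a
minimal sketch that assumes the feature vectors are plain lists of numbers of
equal length; the names euclidean_distance and visual_scores and the toy
database are hypothetical and only illustrate equation 1.

```python
import math


def euclidean_distance(v_query, v_image):
    """Euclidean distance between two feature vectors of equal length."""
    if len(v_query) != len(v_image):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v_query, v_image)))


def visual_scores(v_query, database):
    """Return (image_id, distance) pairs sorted by increasing distance.

    `database` maps an image id to its feature vector. A smaller distance
    means that the database image is closer to the query.
    """
    scores = [(image_id, euclidean_distance(v_query, vector))
              for image_id, vector in database.items()]
    return sorted(scores, key=lambda pair: pair[1])


database = {
    "img_a": [0.0, 0.0, 0.0],
    "img_b": [3.0, 4.0, 0.0],
    "img_c": [1.0, 0.0, 0.0],
}
ranking = visual_scores([0.0, 0.0, 0.0], database)
```

Note that this score is a distance, so the best match is the image with the
smallest value.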

System using Structure of XML document In this section, we are interested in
context-based image retrieval techniques, and more precisely in image
retrieval based on the textual and structural context in XML documents. The
image context is composed of all the textual information surrounding the
image. To retrieve the image presented in Figure 2, we can use the text
surrounding the image, such as the document title, image name, image caption,
etc.
     Other sources of evidence can also be used: visual descriptors,
information from the links around the image, and the structure of the XML
document. Indeed, we focus on XML documents that do not have a homogeneous
structure, which makes the structure a new source of evidence. The textual
context alone is insufficient most of the time. In this context, [5] say:
”Ignore the document structure is to ignore its semantics”. The idea is to
calculate the relevancy score of an image element based on information from
the textual and structural context, in order to answer a specific

                  <?xml version="1.0" encoding="UTF-8"?>
                  <Image>
                   <ObservationId>18228</ObservationId>
                   <FileName>6.jpg</FileName>
                   <MediaId>6</MediaId>
                   <Vote>4</Vote>
                   <Content>Flower</Content>
                   <caption> Flower Papaver </caption>
                   <ClassId>30269</ClassId>
                   <Family>PAPAVERACEAE</Family>
                   <Species>Papaver rhoeas L.</Species>
                   <Genus>Papaver</Genus>
                   <Author>liliane roubaudi</Author>
                   <Date>26/05/13</Date>
                   <Location />
                   <Latitude />
                   <Longitude />
                   <YearInCLEF>PlantCLEF2014</YearInCLEF>
                   <IndividualPlantId2013 />
                   <ImageID2013 />
                   <LearnTag>Train</LearnTag>
                  </Image>


                       Fig. 2. Example of an image element context.


information need of the user, expressed as a query composed of a set of
keywords, and to seek the most appropriate way to combine two sources of
evidence: text and structure. Our main idea is to use the structure to weight
each piece of textual information according to its position in the XML
document, favoring the textual information that gives the best possible
description of the image element.
    In run 2, we propose an automatic image retrieval method that takes the
structure into account as a source of evidence and considers its impact on
search performance. We present a new source of evidence dedicated to image
retrieval, based on the intuition that each textual node contains information
that semantically describes an image element, and that the contribution of
each text node to the score of an image element varies with its position in
the XML document. To compute the geometric distance, we first place the nodes
of each XML document in a Euclidean space in order to calculate the
coordinates of each node. Then, we compute the score of an image element
depending on its distance to each textual node [6]. Figure 3 presents our
indexing system based on the textual and structural context of the image.
    The result is represented by a vector SscoreI . This vector of an image I
can be thought of as a set of scores (Sscore1 , Sscore2 , ..., Sscoren ), where
n is the number of database images.
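
The general idea of a distance-weighted contextual score can be illustrated
with the sketch below. It is not the metric of [6], whose coordinate
assignment and weighting are not reproduced in this paper. Everything in it is
an assumption made for illustration: the coordinates (depth in the tree, index
among siblings), the choice of the FileName node as the image element, the
weight 1 / (1 + distance), and the function names.

```python
import math
import xml.etree.ElementTree as ET

# Shortened document following the tag names of Figure 2.
DOC = """<Image>
 <FileName>6.jpg</FileName>
 <Content>Flower</Content>
 <caption> Flower Papaver </caption>
 <Family>PAPAVERACEAE</Family>
 <Species>Papaver rhoeas L.</Species>
 <Genus>Papaver</Genus>
</Image>"""


def node_coordinates(root):
    """Assumed layout: x = depth in the tree, y = index among siblings."""
    placed = []

    def visit(node, depth, index):
        placed.append((node, float(depth), float(index)))
        for i, child in enumerate(node):
            visit(child, depth + 1, i)

    visit(root, 0, 0)
    return placed


def structural_score(xml_text, keywords, image_tag="FileName"):
    """Sum keyword matches of each text node, weighted by 1 / (1 + distance)
    between that node and the node assumed to hold the image."""
    root = ET.fromstring(xml_text)
    placed = node_coordinates(root)
    anchors = [(x, y) for node, x, y in placed if node.tag == image_tag]
    if not anchors:
        raise ValueError("no element with tag %r" % image_tag)
    ix, iy = anchors[0]
    wanted = {k.lower() for k in keywords}
    score = 0.0
    for node, x, y in placed:
        if node.tag == image_tag:
            continue
        words = (node.text or "").lower().split()
        matches = sum(1 for w in words if w in wanted)
        if matches:
            score += matches / (1.0 + math.hypot(x - ix, y - iy))
    return score


score_papaver = structural_score(DOC, ["papaver"])
score_flower = structural_score(DOC, ["flower"])
score_none = structural_score(DOC, ["rose"])
```

With these assumed coordinates, a keyword found in a node close to the image
element contributes more than the same keyword found in a distant node.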

Combination of Textual and Visual features From our two vectors Sscore
and Svisuel , we calculate a score that is a linear combination of the scores
of each modality:

\[
S_{score} = \begin{pmatrix} S_{score_1} \\ S_{score_2} \\ \vdots \\ S_{score_n} \end{pmatrix}, \quad
S_{visuel} = \begin{pmatrix} S_{visuel_1} \\ S_{visuel_2} \\ \vdots \\ S_{visuel_n} \end{pmatrix}, \quad
S = \begin{pmatrix} S_1 \\ S_2 \\ \vdots \\ S_n \end{pmatrix}
\]

\[
S_i = score(S_{visuel_i}, S_{score_i}) = \alpha S_{visuel_i} + (1 - \alpha) S_{score_i} \tag{2}
\]


   [Figure 3 (diagram). Elements shown: an XML document, NLP tools and the
   corpus; textual indexing (pretreatment, terms extraction, terms weighting);
   structural indexing (extraction of geometric descriptors, computation of
   the coordinates of the XML nodes); selection of descriptions for each
   node.]

      Fig. 3. Architecture of our indexing model (textual and structural context).


   This score (equation 2) is a weighted linear combination of two scores, one
derived from the visual features and one from the textual features.
The parameter α weights the amount of information conveyed by each
modality.
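
Equation 2 can be applied element-wise to the two score vectors, as in the
following sketch. The function name fuse_scores and the example values are
hypothetical; the sketch also assumes that α lies in [0, 1] and that both
score vectors are on comparable scales, neither of which is stated above.

```python
def fuse_scores(s_visuel, s_score, alpha):
    """Element-wise linear combination of two score vectors (equation 2)."""
    if len(s_visuel) != len(s_score):
        raise ValueError("score vectors must have the same length")
    if not 0.0 <= alpha <= 1.0:
        # Assumption of this sketch: alpha is a weight between 0 and 1.
        raise ValueError("alpha must be in [0, 1]")
    return [alpha * v + (1.0 - alpha) * t
            for v, t in zip(s_visuel, s_score)]


visual = [0.2, 0.8]
context = [0.6, 0.4]
fused = fuse_scores(visual, context, alpha=0.5)
only_visual = fuse_scores(visual, context, alpha=1.0)
only_context = fuse_scores(visual, context, alpha=0.0)
```

Setting α to 1 keeps only the visual scores and setting it to 0 keeps only the
contextual scores, so intermediate values trade one modality against the
other.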


3     Experiments and Results

3.1    Evaluation Metric

The evaluation metric S is based on the rank of the correct species in the
list of retrieved species, as follows [8]:

\[
S = \frac{1}{U} \sum_{u=1}^{U} \frac{1}{P_u} \sum_{p=1}^{P_u} \frac{1}{N_{u,p}} \sum_{n=1}^{N_{u,p}} S_{u,p,n} \tag{3}
\]

where U is the number of users,
Pu is the number of individual plants observed by the u-th user,
Nu,p is the number of pictures taken of the p-th plant observed by the u-th
user, and
Su,p,n is a score between 0 and 1 equal to the inverse of the rank of the
correct species (for the n-th picture taken of the p-th plant observed by the
u-th user).
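
A small worked example may help to read equation 3. The sketch below assumes
that the ranks of the correct species are available in a nested dictionary
(user, then plant, then one rank per picture); this data layout, the function
name plant_task_score and the example ranks are hypothetical.

```python
def plant_task_score(ranks):
    """Average of 1 / rank over pictures, then plants, then users.

    `ranks[u][p]` is the list of ranks (1 = best) of the correct species,
    one entry per picture of plant p observed by user u.
    """
    user_means = []
    for plants in ranks.values():
        plant_means = []
        for picture_ranks in plants.values():
            inverse = [1.0 / r for r in picture_ranks]
            plant_means.append(sum(inverse) / len(inverse))
        user_means.append(sum(plant_means) / len(plant_means))
    return sum(user_means) / len(user_means)


example = {
    "user1": {"plantA": [1, 2], "plantB": [4]},
    "user2": {"plantC": [1]},
}
# plantA: (1 + 0.5) / 2 = 0.75, plantB: 0.25, so user1 averages 0.5.
# user2 averages 1.0, and the final score is (0.5 + 1.0) / 2 = 0.75.
score = plant_task_score(example)
```

Because the averages are nested, a user who contributes many pictures does not
weigh more in the final score than a user who contributes a single one.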


3.2   The Results

The MIRACL team submitted three automatic runs to the multi-image plant
observation queries task; the details are as follows [9]. In the first run, we
extracted visual descriptors from the images, using texture, color and edge
models to generate the feature vectors. In the second run, we changed the
source of evidence to the general context of the image, that is, the textual
and structural information in the XML document.
In the third and last run, we fused the two runs above to form a combined
relevance score.


                  Table 1. Results for all runs in plant collection

               Run name       Retrieval type                      Score
               Miracl Run 1   Visual                              0.063
               Miracl Run 2   Context (Textual and Structural)    0.051
               Miracl Run 3   Visual + Context                    0.047


    The results are shown in Table 1. The first run achieves the best results
for all kinds of pictures; that is, the visual scheme benefits from combining
the advantages of different visual descriptors.
The run based on contextual descriptors ranks second of the three runs, and
the run combining both kinds of descriptors obtains the worst results, which
contradicts our expectation. We speculate that this is related to the small
number of training images for certain species, and to the fact that the
meta-data of the images are similar and relatively short, so that they do not
suffice to identify a species.


4     Conclusion

This work is our first attempt at the plant identification task. Although the
results are not very promising, we reached some conclusions and gained some
experience.
(1) Since some species have only a few images, the bias for these species may
be amplified, and combining several retrieval methods is a good choice.
(2) Many species share the same observation and the same class ID, which
caused problems when producing the runs. Each run contains between 10 and 200
species for each query.
    While our solution did not perform as well as the best ones in this
competition (the ideal score is 0.47), it is fast, accurate enough for
non-critical applications and has potential for improvement. We intend to
develop this solution further and plan a mobile phone implementation of it.


References
1. Goëau, H., Bonnet, P., Barbe, J., Bakic, V., Joly, A., Molino, J.-F.,
   Barthelemy, D., Boujemaa, N.: Multi-organ Plant Identification. In:
   Proceedings of the 1st ACM International Workshop on Multimedia Analysis
   for Ecological Data (MAED ’12), pp. 41–44 (2012)
2. Park, D.K., Jeon, Y.S., Won, C.S.: Efficient Use of Local Edge Histogram
   Descriptor. In: Proceedings of the 2000 ACM Workshops on Multimedia,
   pp. 51–54 (2000)
3. Kasutani, E., Yamada, A.: The MPEG-7 color layout descriptor: a compact
   image feature description for high-speed image/video segment retrieval.
   In: Proceedings of the 2001 International Conference on Image Processing,
   vol. 1, pp. 674–677 (2001)
4. Cieplinski, L.: MPEG-7 Color Descriptors and Their Applications. In:
   Proceedings of the 9th International Conference on Computer Analysis of
   Images and Patterns (CAIP ’01), pp. 11–20 (2001)
5. Schlieder, T., Meuss, H.: Querying and Ranking XML Documents. Journal of
   the American Society for Information Science and Technology, vol. 53,
   pp. 489–503 (2002)
6. Fakhfakh, S., Tmar, M., Mahdi, W.: A New Metric for Multimedia Retrieval in
   Structured Documents. In: 15th International Conference on Enterprise
   Information Systems, ICEIS 2013, pp. 240–247 (2013)
7. Karamti, H.: Vectorisation du modèle d’appariement pour la recherche
   d’images par le contenu. In: CORIA 2013, pp. 335–340 (2013)
8. Joly, A., Müller, H., Goëau, H., Glotin, H., Spampinato, C., Rauber, A.,
   Bonnet, P., Vellinga, W.-P., Fisher, B.: LifeCLEF 2014: multimedia life
   species identification challenges. In: Proceedings of CLEF 2014 (2014)
9. Goëau, H., Joly, A., Bonnet, P., Molino, J.-F., Barthélémy, D.,
   Boujemaa, N.: LifeCLEF Plant Identification Task 2014. In: CLEF working
   notes 2014 (2014)



</pre>