<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>MIRACL at LifeCLEF 2014: Multi-organ observation for Plant Identification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hanen Karamti</string-name>
          <email>karamti.hanen@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sana Fakhfakh</string-name>
          <email>sanafakhfakh@yahoo.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohamed Tmar</string-name>
          <email>mohamed.tmar@isimsf.rnu.tn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Walid Mahdi</string-name>
          <email>walid.mahdi@isimsf.rnu.tn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Faiez Gargouri</string-name>
          <email>faiez.gargouri@fsegs.rnu.tn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>MIRACL Laboratory, University of Sfax</institution>
          ,
          <addr-line>B.P.3023 Sfax</addr-line>
          <country country="TN">TUNISIA</country>
        </aff>
      </contrib-group>
      <fpage>747</fpage>
      <lpage>755</lpage>
      <abstract>
        <p>ImageCLEF 2014 includes a challenge dedicated to plant identification. This article describes our first participation in the multi-image plant observation query task of PlantCLEF 2014. The task is evaluated as a plant species retrieval task based on multi-image plant observation queries: the goal is to retrieve the correct plant species among the top results of a ranked list of species returned by the evaluated system. In this paper, we present two methods. Our first method is purely visual and entirely automatic, using only the image information. The total time spent preparing this submission was only about three weeks, and the results were accordingly fairly poor. The challenge of our second method is to identify plant species based on a combination of the textual and structural context of each image. Indeed, we use the meta-data in our system to explore image characteristics, relying on a recent technique for exploiting the structure of XML documents. These results were also fairly poor. Although our results are not as promising as those of other participating groups, the conclusions reached can still guide our future work in this field.</p>
      </abstract>
      <kwd-group>
        <kwd>plant observation</kwd>
        <kwd>feature extraction</kwd>
        <kwd>ImageClef</kwd>
        <kwd>XML</kwd>
        <kwd>image retrieval</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Plant identification has become an important and challenging research area,
since it is estimated that approximately half of the world's plant species are still
not cataloged. Among such unidentified species one may find, for instance, a
plant that heals a disease or one that contributes to the equilibrium of the
ecosystem around it.</p>
      <p>Despite the importance of studies related to the description and
categorization of plants, this task remains difficult for a botanist, who is limited
to a specific amount of information about the plant. Furthermore, among the
information that may be collected, the most relevant elements for botanical analysis
are flowers and fruits. However, in most cases these elements
can be observed only during specific periods of the year. This is a complicated issue,
because an observation may not be possible when these characteristics are
noticeable.</p>
      <p>A solution to this impasse is to identify a plant by observing its
different organs, as demonstrated in [1]. Indeed, botanists usually
observe several organs simultaneously, such as the leaves and the fruits or the flowers,
in order to disambiguate species that could be confused if only one organ were
observed. Moreover, if only one organ is observable, such as the bark of a
deciduous plant during winter when nothing else can be seen, then the observation of
this organ in several photos taken from different points of view can be more
informative than a single point of view.</p>
      <p>Thus, image analysis based on computational tools is a worthwhile approach
to help the botanist, or even to provide by itself a reliable outcome for the
classification task. In this context, ImageCLEF, part of the Conference and Labs of the
Evaluation Forum (CLEF), hosts an annual competition on plant species identification.</p>
      <p>This year, ImageCLEF organizes a new challenge dedicated to botanical
data (called PlantCLEF 2014), in which the species identification task is not
image-centered but observation-centered. The aim of the task is to produce a list
of relevant species for each plant observation in the test dataset, i.e. one or
several pictures related to the same event: the same person photographing
several detailed views of various organs on the same day, with the same device and
the same lighting conditions, while observing the same plant. The main novelties
compared to previous years are the following: an explicit multi-image query
scenario; user ratings on image quality; a new type of view called 'Branch'
in addition to the six other views (Scan and photos of Flower, Fruit, Stem, Leaf
and Entire views); and more species (about 500 this
year, which is an important step towards covering the entire flora of
a given region).</p>
      <p>In this context, this paper describes our (team MIRACL) approach
for the LifeCLEF task on multi-image plant observation queries. Our focus in
this endeavor was on three main points: (i) content-based image retrieval, which
requires a thorough knowledge of low-level image features; (ii) context-based image
retrieval, based on information extracted in the proximity of an image, for
example its title, caption or textual description; and (iii) the combination
of textual and visual features.</p>
      <p>Section 2 describes the materials and methods used. In Section
3, we describe the results of our experiments. Finally, some conclusions
are presented in Section 4.</p>
    </sec>
    <sec id="sec-2">
      <title>Material and Methods</title>
      <sec id="sec-2-1">
        <title>Database</title>
        <p>The task is based on the Pl@ntView dataset, which focuses on 500 herb,
tree and fern species centered on France (some plant observations are from
neighboring countries). This database (shown in Figure 1) is maintained by the
French project Pl@ntNet (INRIA, CIRAD, Tela Botanica) and the French CNRS
program MASTODONS.</p>
        <p>
          It contains more than 60,000 pictures, each belonging to one of the 7 types of
view reported in the meta-data, in an XML file (one per image) with explicit
tags [8]. For (
          <xref ref-type="bibr" rid="ref1">1</xref>
          ), (
          <xref ref-type="bibr" rid="ref2">2</xref>
          ), (
          <xref ref-type="bibr" rid="ref3">3</xref>
          ), (
          <xref ref-type="bibr" rid="ref5">5</xref>
          ), (
          <xref ref-type="bibr" rid="ref6">6</xref>
          ) and (
          <xref ref-type="bibr" rid="ref7">7</xref>
          ), the images are taken directly from the
plants. For (
          <xref ref-type="bibr" rid="ref4">4</xref>
          ), the images are oriented vertically along the main natural axis, with
the petiole visible, and were collected using flatbed scanners. The photographs, in
contrast, do not have a uniform background.
        </p>
        <p>Each image has an associated XML file that contains the following meta-data:
– ObservationId: the plant observation ID, with which several pictures can
be associated
– FileName
– MediaId
– View Content: Branch, Entire, Flower, Fruit, Leaf, LeafScan, Stem
– ClassId: the class number ID that must be used as ground-truth. It is a
numerical taxonomical number used by Tela Botanica
– Species: the species name (containing 3 parts: the Genus name, the Species
name, and the author(s) who discovered or revised the name of the species)
– Genus: the name of the Genus, one level above the Species in the
taxonomical hierarchy used by Tela Botanica
– Family: the name of the Family, two levels above the Species in the
taxonomical hierarchy used by Tela Botanica
– Date: (if available) the date when the plant was observed
– Vote: the (rounded up) average of the user ratings on image quality
– Locality: (if available) the locality name, most of the time a town
– Latitude &amp; Longitude: (if available) the GPS coordinates of the
observation in the EXIF metadata or, if no GPS information was found in the
EXIF, the GPS coordinates of the locality where the plant was observed
(only for the towns of metropolitan France)
– Author: the name of the author of the picture
And, if the image was included in a previous plant task:
– Year: ImageCLEF2011, ImageCLEF2012, ImageCLEF2013 or PlantCLEF2014,
the year when the image was integrated in the benchmark
– IndividualPlantId2013: the plant observation ID used last year during the
ImageCLEF2013 plant task
– ImageID2013: the image id.jpg used in 2013.</p>
        <p>The full database is split into training and test datasets. The training dataset has
47815 images (1987 of 'Branch', 6356 photographs of 'Entire', 13164 of 'Flower',
3753 of 'Fruit', 7754 of 'Leaf', 3466 of 'Stem', and 11335 'scans' and 'scan-like'
pictures of leaves).</p>
        <p>The test dataset consists of 8163 plant-observation-queries. These queries are
based on 13146 images (731 of 'Branch', 2983 photographs of 'Entire', 4559 of
'Flower', 1184 of 'Fruit', 2058 of 'Leaf', 935 of 'Stem', and 696 'scans' and 'scan-like'
pictures of leaves).</p>
      </sec>
      <sec id="sec-2-2">
        <title>Methods</title>
        <p>In this section, we will present our methods.</p>
        <p>Feature Extraction. Feature extraction (run 1) supports two main
functionalities: data extraction processing and query processing. The data
extraction processing is responsible for extracting appropriate features from images
and storing them in feature vectors; this process is usually performed
offline. A feature vector VI of an image I can be thought of as a list of low-level
features (C1, C2, ..., Cm), where m is the number of features.</p>
        <p>We used three descriptors for feature extraction:
– The Color Layout Descriptor (CLD) [3] is designed to capture the spatial
distribution of color in an image. The feature extraction process consists of two
parts: grid-based representative color selection, and a discrete cosine transform
with quantization. The image is divided into an 8 × 8 grid, and an 8 × 8 pixel
image is created in which each pixel is given the representative color of the
corresponding grid area of the image. The 8 × 8 matrix is transformed with
the discrete cosine transform and, finally, a zigzag scan is performed on the
matrix. The resulting matrices (one for each color component) make up the
descriptor.
– The Edge Histogram Descriptor (EHD) [2] is a texture descriptor proposed for
MPEG-7 that expresses only the local edge distribution in the image.
– For the Scalable Color Descriptor (SCD) [4], the histogram is generated by
quantizing the image colors into 256 bins in the HSV color space, with 16 bins for
hue and 4 bins each for saturation and value. This descriptor is useful when
searching for similarities between images.</p>
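        <p>As an illustration, the CLD pipeline described above (grid-based representative colors, 2-D discrete cosine transform, zigzag scan) can be sketched in plain Python. This is a minimal sketch, not the normative MPEG-7 implementation: using the per-cell average as the representative color is our assumption, and the real descriptor additionally quantizes the coefficients.</p>

```python
import math

def representative_colors(img, comp):
    """Average one color component over an 8x8 grid (the per-cell average is
    used here as the 'representative color'; an assumption, since MPEG-7
    permits other selection rules)."""
    h, w = len(img), len(img[0])
    grid = [[0.0] * 8 for _ in range(8)]
    for gy in range(8):
        for gx in range(8):
            vals = [img[y][x][comp]
                    for y in range(gy * h // 8, (gy + 1) * h // 8)
                    for x in range(gx * w // 8, (gx + 1) * w // 8)]
            grid[gy][gx] = sum(vals) / len(vals)
    return grid

def dct_8x8(block):
    """2-D DCT-II of an 8x8 block."""
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = sum(block[y][x]
                    * math.cos((2 * y + 1) * u * math.pi / 16)
                    * math.cos((2 * x + 1) * v * math.pi / 16)
                    for y in range(8) for x in range(8))
            cu = math.sqrt(0.5) if u == 0 else 1.0
            cv = math.sqrt(0.5) if v == 0 else 1.0
            out[u][v] = 0.25 * cu * cv * s
    return out

def zigzag(m):
    """Zigzag scan of an 8x8 matrix into a 64-element list."""
    order = sorted(((y, x) for y in range(8) for x in range(8)),
                   key=lambda p: (p[0] + p[1],
                                  p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [m[y][x] for y, x in order]

def cld_component(img, comp):
    """One color component of the CLD (coefficient quantization omitted)."""
    return zigzag(dct_8x8(representative_colors(img, comp)))
```

        <p>Running cld_component once per color component and concatenating the three coefficient lists yields the descriptor used in our feature vectors.</p>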
        <p>Ci represents the combination of CCLD, CSCD and CEHD for feature i. The
query processing, in turn, extracts a feature vector from the query and applies a
metric (the Euclidean distance, Equation 1) to evaluate the similarity between the
query image and the database images [7].</p>
        <p>
          Svisuel_I = dist_Euclidean(V_I, V_Ii) = sqrt( Σ_{i=1..m} (C_I − C_Ii)² )    (1)
        </p>
        <p>The similarity scores of the query results build a score vector. A score vector
Svisuel of an image I can be thought of as a set of scores (Svisuel_1, Svisuel_2, ...,
Svisuel_n), where n is the number of database images.</p>
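        <p>Equation 1 and the construction of the score vector amount to a few lines; the following is a minimal Python illustration, assuming the feature vectors have already been extracted.</p>

```python
import math

def euclidean_score(v_query, v_image):
    """Euclidean distance between two feature vectors (Equation 1);
    a smaller score means a more similar image."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v_query, v_image)))

def visual_score_vector(v_query, database_vectors):
    """Score vector Svisuel over the n images of the database."""
    return [euclidean_score(v_query, v) for v in database_vectors]
```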
        <p>System using the Structure of the XML Document. In this section, we are
interested in context-based image retrieval techniques, and more precisely in image
retrieval based on the textual and structural context in XML documents. The
context of an image is composed of all the textual information surrounding it. To retrieve
the image presented in Figure 2, we can use the text surrounding the image, such as the
document title, the image name, the image caption, etc.</p>
        <p>Other sources of evidence have been used besides visual descriptors, such as
information from links around the image and the structure of the XML document. Indeed, we
focus on XML documents that do not have a homogeneous structure, which makes the
structure a new source of evidence; the textual context alone is insufficient most
of the time. In this context, [5] state: "To ignore the document structure is to ignore
its semantics". The idea is to calculate the relevance score of an image element based
on information from its textual and structural context, in order to answer a specific
information need of a user, expressed as a query composed of a set of keywords,
and to seek the most appropriate manner of combining the two sources of evidence:
text and structure. Our main idea is to use the structure to weight each piece of
textual information depending on its position in the XML document, favoring the
textual information that gives the best possible description of the image element.</p>
        <p>In run 2, we propose an automatic image retrieval method
that takes the structure into account as a source of evidence, and we study its impact on
search performance. We present a new source of evidence dedicated to image
retrieval, based on the intuition that each textual node contains information that
semantically describes an image element, and that the contribution of each text node
to the score of an image element varies with its position in the XML document.
To compute the geometric distance, we initially place the nodes of each XML
document in a Euclidean space to obtain the coordinates of each node. Then
we compute the score of an image element depending on the distance to
each textual node [6]. Figure 3 presents our indexing system based on the textual
and structural context of images.</p>
        <p>The result is represented by a vector Sscore. This vector for an image I
can be thought of as a set of scores (Sscore_1, Sscore_2, ..., Sscore_n), where n is the
number of database images.</p>
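        <p>The exact coordinate scheme and weighting function of [6] are not reproduced here; the following is a hypothetical Python sketch that places each node at (depth, sibling rank) and weights a text node's contribution by the inverse of its geometric distance to the image element. Both the coordinate choice and the weighting function are illustrative assumptions, not the published metric.</p>

```python
import math
import xml.etree.ElementTree as ET

def node_coordinates(root):
    """Hypothetical placement of XML nodes in a Euclidean space:
    x = depth in the tree, y = rank among siblings (an assumption)."""
    coords = {root: (0, 0)}
    def walk(node, depth):
        for rank, child in enumerate(node):
            coords[child] = (depth, rank)
            walk(child, depth + 1)
    walk(root, 1)
    return coords

def structural_score(doc_xml, image_tag="image"):
    """Score an image element from its textual and structural context:
    nearby text nodes contribute more than distant ones."""
    root = ET.fromstring(doc_xml)
    coords = node_coordinates(root)
    image = root.find(".//" + image_tag)
    ix, iy = coords[image]
    score = 0.0
    for node, (x, y) in coords.items():
        text = (node.text or "").strip()
        if node is image or not text:
            continue
        dist = math.hypot(x - ix, y - iy)
        # weight word count by inverse geometric distance to the image node
        score += len(text.split()) / (1.0 + dist)
    return score
```

        <p>For a small document whose title node sits next to the image node, the title's words are weighted by their unit distance to the image; text farther away in the tree contributes proportionally less.</p>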
        <p>Combination of Textual and Visual Features. From our two vectors Sscore
and Svisuel, we calculate a score that is a linear combination of the scores of the
two modalities:</p>
        <p>
          Sscore = (Sscore_1, Sscore_2, ..., Sscore_n)
          Svisuel = (Svisuel_1, Svisuel_2, ..., Svisuel_n)
          S = (S_1, S_2, ..., S_n)
        </p>
        <p>
          S_i = score(Svisuel_i, Sscore_i) = α · Svisuel_i + (1 − α) · Sscore_i    (2)
        </p>
        <p>[Figure 3: Architecture of our indexing model (textual and structural context): pretreatment, extraction of geometric descriptors, term extraction and term weighing.]</p>
        <p>This score (Equation 2) is a weighted combination of the two vectors representing
the visual features and the textual features. The parameter α is used to
weight the amount of information conveyed by each modality.</p>
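        <p>The late fusion of Equation 2 amounts to a few lines; a minimal sketch, assuming the two score vectors are aligned on the same database ordering and comparably scaled.</p>

```python
def fuse_scores(s_visual, s_textual, alpha=0.5):
    """Linear late fusion (Equation 2): S_i = alpha * Svisuel_i
    + (1 - alpha) * Sscore_i, where alpha weights the visual modality."""
    if len(s_visual) != len(s_textual):
        raise ValueError("score vectors must have the same length")
    return [alpha * v + (1.0 - alpha) * t
            for v, t in zip(s_visual, s_textual)]
```

        <p>With alpha = 1 the system falls back to run 1 (purely visual); with alpha = 0 it falls back to run 2 (purely textual and structural).</p>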
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments and Results</title>
      <sec id="sec-4-1">
        <title>Evaluation Metric</title>
        <p>The evaluation metric S is related to the rank of the correct species in the
list of retrieved species, as follows [8]:</p>
        <p>
          S = (1/U) · Σ_{u=1..U} (1/P_u) · Σ_{p=1..P_u} (1/N_{u,p}) · Σ_{n=1..N_{u,p}} S_{u,p,n}    (3)
        </p>
        <p>where:
– U: the number of users;
– P_u: the number of individual plants observed by the u-th user;
– N_{u,p}: the number of pictures taken from the p-th plant observed by the u-th user;
– S_{u,p,n}: a score between 0 and 1, equal to the inverse of the rank of the correct
species, for the n-th picture taken from the p-th plant observed by the u-th user.</p>
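        <p>Equation 3 can be checked with a short script; a sketch that takes, for each user and plant, the list of ranks at which the correct species was returned.</p>

```python
def clef_score(ranks):
    """PlantCLEF evaluation score S (Equation 3). `ranks` maps
    user -> plant -> list of ranks of the correct species (rank 1 = top),
    so each picture contributes the inverse rank 1/r."""
    user_means = []
    for plants in ranks.values():
        plant_means = [sum(1.0 / r for r in pics) / len(pics)
                       for pics in plants.values()]
        user_means.append(sum(plant_means) / len(plant_means))
    return sum(user_means) / len(user_means)
```

        <p>For instance, a single user with one plant whose two pictures rank the correct species first and second yields S = (1 + 1/2) / 2 = 0.75.</p>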
      </sec>
      <sec id="sec-4-3">
        <title>The Results</title>
        <p>The MIRACL team submitted three automatic runs to the multi-image plant
observation query task; the details are as follows [9]. In the first run, we extracted
visual descriptors from the images, using texture, color and edge models to
generate the feature vectors. In the second run, we changed the source of evidence,
hoping to take the general context of the image into account through the textual and
structural information in the XML document.</p>
        <p>In the third run, the last one, we fused the classifiers of the two runs above
to form a new kind of degree of attribution.</p>
        <p>The results are shown in Table 1. From it we find that the first run
achieves the best results for all kinds of pictures. That is to say, the visual scheme
can usually reach a better result by combining the advantages of different visual
descriptors.</p>
        <p>The run using the contextual descriptors is in the middle of the three runs,
and the one using the combined descriptors gets the worst results, which
contradicts our expectations. We can speculate that this
phenomenon may be related to the number of training images for certain
species, which have very few images, and to the fact that the meta-data of the images
are similar and relatively short, which does not identify a species.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>
        This work is our first attempt at the plant identification task. Although the results are
not very promising, we still reached some conclusions and gained some experience.
(1) First, as there are some species that have only a few images, the bias for
these species may be emphasized, and combining several retrieval methods is a good
choice.
(2) Many species have the same observation and the same class ID. In fact, we
had a problem producing the runs: each run contains between 10 and 200 species
for each query.
      </p>
      <p>While our solution did not perform as well as the best ones in this competition
(the best score was 0.47), it is fast, accurate enough for non-critical applications,
and has potential for improvement. We intend to develop this solution
further and plan a mobile phone implementation of it.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><surname>Goëau</surname>, <given-names>H.</given-names></string-name>, Bonnet, P., Barbe, J., Bakic, V., Joly, A., Molino, J.-F., Barthélémy, D., Boujemaa, N.: <article-title>Multi-organ Plant Identification</article-title>.
          <source>In: Proceedings of the 1st ACM International Workshop on Multimedia Analysis for Ecological Data (MAED '12)</source>
          . pp.
          <fpage>41</fpage>
          -
          <lpage>44</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Park</surname>
          </string-name>
          , Dong Kwon and Jeon, Yoon Seok and Won, Chee Sun:
          <article-title>Efficient Use of Local Edge Histogram Descriptor</article-title>
          .
          <source>In :Proceedings of the 2000 ACM Workshops on Multimedia</source>
          . pp.
          <fpage>51</fpage>
          -
          <lpage>54</lpage>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Kasutani</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Yamada</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>The MPEG-7 color layout descriptor: a compact image feature description for high-speed image/video segment retrieval</article-title>
          .
          <source>In :Proceedings. 2001 International Conference on Image Processing</source>
          . pp.
          <fpage>674</fpage>
          -
          <lpage>677</lpage>
          , vol.
          <volume>1</volume>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name><surname>Cieplinski</surname>, <given-names>L.</given-names></string-name>: <article-title>MPEG-7 Color Descriptors and Their Applications</article-title>.
          <source>In :Proceedings of the 9th International Conference on Computer Analysis of Images and Patterns (CAIP '01)</source>
          . pp.
          <fpage>11</fpage>
          -
          <lpage>20</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Schlieder</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Querying and Ranking XML Documents</article-title>
          . In
          <source>: Journal of the American Society for Information Science and Technology</source>
          . pp.
          <fpage>489</fpage>
          -
          <lpage>503</lpage>
          , vol.
          <volume>53</volume>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Fakhfakh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tmar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name><surname>Mahdi</surname>, <given-names>W.</given-names></string-name>:
          <article-title>A New Metric for Multimedia Retrieval in Structured Documents</article-title>
          .
          <source>In:15th International Conference on Enterprise Information Systems, ICEIS 2013</source>
          , pp.
          <fpage>240</fpage>
          -
          <lpage>247</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Hanen</given-names>
            <surname>Karamti</surname>
          </string-name>
          :
          <article-title>Vectorisation du modèle d'appariement pour la recherche d'images par le contenu</article-title>
          .
          <source>In: CORIA</source>
          <year>2013</year>
          , pp.
          <fpage>335</fpage>
          -
          <lpage>340</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Joly</surname>
          </string-name>
          , Alexis and Müller, Henning and Goëau, Hervé and Glotin, Hervé and Spampinato, Concetto and Rauber, Andreas and Bonnet, Pierre and Vellinga, Willem-Pier and Fisher, Bob.
          <source>LifeCLEF</source>
          <year>2014</year>
          :
          <article-title>multimedia life species identification challenges</article-title>
          .
          <source>In: Proceedings of CLEF</source>
          <year>2014</year>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name><surname>Goëau</surname>, <given-names>H.</given-names></string-name>, Joly, A., Bonnet, P., Molino, J.-F., Barthélémy, D., Boujemaa, N.:
          <source>LifeCLEF Plant Identification Task</source>
          <year>2014</year>
          . In: CLEF working notes
          <year>2014</year>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>