<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Efficient instance-based fish species visual identification by global representation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pierre-Hugues Joalland</string-name>
          <email>joalland@univ-tln.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sébastien Paris</string-name>
          <email>sebastien.paris@lsis.org</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hervé Glotin</string-name>
          <email>glotin@univ-tln.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aix-Marseille Université</institution>
          ,
          <addr-line>CNRS, ENSAM, LSIS UMR 7296, 13397 Marseille</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institut Universitaire de France</institution>
          ,
          <addr-line>103 Bd St-Michel, 75005 Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Université de Toulon</institution>
          ,
          <addr-line>CNRS, LSIS UMR 7296, 83957 La Garde</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <fpage>785</fpage>
      <lpage>789</lpage>
      <abstract>
        <p>This paper presents the participation of the LSIS/DYNI team in the ImageCLEF 2014 Fish identification challenge. ImageCLEF's Fish identification task provides a testbed for the system-oriented evaluation of fish species identification based on still images. The goal is to investigate image retrieval approaches in the context of images extracted from collected videos. The LSIS/DYNI team submitted three runs and won the challenge with a fully automatic method, significantly outperforming the baseline for the image-based fish recognition category (both recall and precision of 0.99). Our approach is based on a computer vision framework involving local, highly discriminative visual descriptors, a sophisticated visual-patch encoder and large-scale supervised classification. The paper presents the three procedures employed and provides an analysis of the obtained evaluation results.</p>
      </abstract>
      <kwd-group>
        <kwd>ImageCLEF</kwd>
        <kwd>fish species identification</kwd>
        <kwd>underwater video monitoring</kwd>
        <kwd>images</kwd>
        <kwd>identification</kwd>
        <kwd>classification</kwd>
        <kwd>Fisher Vectors</kwd>
        <kwd>Local Ternary Patterns</kwd>
        <kwd>late fusion</kwd>
        <kwd>encoding/pooling</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        This paper presents the contribution of the LSIS/DYNI group to the LifeCLEF Fish
identification task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ][
        <xref ref-type="bibr" rid="ref2">2</xref>
        ][
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], organized within ImageCLEF 2014, on fish species recognition from still images containing
a single fish instance (Subtask 4). The challenge was organized as a classification task over 10
fish species, with visual content as the main available information. The images are extracted from
underwater fish videos with natural backgrounds (see Fig. 1). The LSIS/DYNI team submitted three
runs, all based on local feature extraction and large-scale supervised classification. Our automatic
methods won the challenge and significantly outperformed the baseline for the image-based fish
recognition task (both recall and precision of 0.99). The task was evaluated as a fish species
retrieval task.
      </p>
      <sec id="sec-1-1">
        <title>2.1 Training and Test data</title>
        <p>The image dataset was built from Fish4Knowledge (www.fish4knowledge.eu) videos used to
monitor coral reefs in Taiwan over the past five years. The videos were recorded from sunrise to
sunset and show several phenomena, e.g. murky water or algae on the camera lens, which make the
fish identification task more complex.</p>
        <p>Each video has a resolution of either 320x240 or 640x480, at 5 to 8 fps. Only the 10 main
species were considered.</p>
        <p>The training data comprises 9868 images. The ground truth consists of 10 directories (one
per species), each containing the images of that species. The test data comprises 6956 images whose
species is to be predicted.</p>
      </sec>
      <sec id="sec-1-2">
        <title>2.2 Task objective and evaluation metric</title>
        <p>The goal of the task was to retrieve the correct species among the 10 possible ones for each test
image.</p>
        <p>Each participant was allowed to submit up to 3 runs. Any number of species could be
associated with each test image, sorted by decreasing confidence score; however, we chose to submit
only the top-ranked species predicted by our system.</p>
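        <p>The evaluation metric is not detailed above. Since the results are reported as per-species recall and precision, the following Python sketch shows one plausible way to compute them when only the top-ranked species is submitted for each image. It assumes integer species labels and macro-averaging over species; the helper per_species_scores is hypothetical and is not the organisers' evaluation code.</p>

```python
import numpy as np


def per_species_scores(y_true, y_pred, n_species=10):
    """Per-species recall and precision from top-1 predictions (integer labels)."""
    cm = np.zeros((n_species, n_species), dtype=np.int64)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)   # confusion matrix
    tp = np.diag(cm)
    recall = tp / np.maximum(cm.sum(axis=1), 1)      # row sums: true images per species
    precision = tp / np.maximum(cm.sum(axis=0), 1)   # column sums: predictions per species
    return recall, precision


# Toy example with 3 species: one image of species 0 is mistaken for species 1.
recall, precision = per_species_scores([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], n_species=3)
print(recall, recall.mean())   # per-species recall and its average over species
print(precision)
```

        <p>Averaging over species rather than over images keeps rare species from being dominated by frequent ones.</p>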
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Description of the methods used</title>
      <p>
        For all submitted runs, we followed the same pipeline, combining unsupervised feature encoding with supervised classification [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ][
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], consisting of (i) local feature extraction, (ii) patch encoding, (iii) pooling with a spatial
pyramid for local analysis, and (iv) large-scale supervised classification with a linear SVM.
For all methods, the global representation is obtained by pooling over 1x1 plus 2x2 windows.
No image-specific pre-processing, such as illumination correction or background subtraction, was
performed. Posterior probabilities are obtained from the SVM outputs by linear regression, and late
fusion is performed by averaging the posterior probabilities.
      </p>
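      <p>The last two steps of this pipeline (turning SVM outputs into posterior probabilities by linear regression, then averaging posteriors for late fusion) can be sketched in Python/NumPy as follows. This is a minimal sketch under our own assumptions, not the code used for the submitted runs: the SVM scores are taken as given, the regression is an ordinary least-squares fit of one-hot labels on the scores plus a bias, and the clipping and renormalisation that turn regression outputs into proper distributions are assumptions. All function names are hypothetical, and the scores in the example are synthetic.</p>

```python
import numpy as np


def fit_linear_calibration(scores, labels, n_classes):
    """Least-squares map from SVM scores (N, C) to one-hot class targets (N, C)."""
    S = np.hstack([scores, np.ones((len(scores), 1))])   # append a bias column
    Y = np.eye(n_classes)[labels]
    W, *_ = np.linalg.lstsq(S, Y, rcond=None)
    return W


def to_posteriors(scores, W):
    """Apply the calibration, clip to positive values and renormalise rows to sum to 1."""
    S = np.hstack([scores, np.ones((len(scores), 1))])
    P = np.clip(S @ W, 1e-6, None)
    return P / P.sum(axis=1, keepdims=True)


def late_fusion(posteriors):
    """Average the posterior matrices of several methods; return top-1 labels and fused posteriors."""
    P = np.mean(np.stack(posteriors), axis=0)
    return P.argmax(axis=1), P


# Synthetic one-vs-rest scores of two hypothetical methods for 10 species.
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=500)
runs = [2.0 * np.eye(10)[labels] - 1.0 + rng.normal(scale=0.5, size=(500, 10)) for _ in range(2)]
posteriors = [to_posteriors(s, fit_linear_calibration(s, labels, 10)) for s in runs]
pred, fused = late_fusion(posteriors)
print((pred == labels).mean())   # fraction of correctly classified images
```

      <p>In practice the calibration would be fitted on held-out data rather than on the images it is applied to; the example reuses the same synthetic scores only for brevity.</p>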
      <sec id="sec-2-1">
        <title>3.1 Local Ternary Patterns (LTP) → LSIS DYNI run 1</title>
        <p>
          The first run uses a one-layer architecture based on LTP features [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], in which
the dictionary is fixed by the LTP framework and each local feature is linearly encoded with a single
dictionary element. We set the LTP threshold t to 10. The basic idea of LTP is to represent the
ternary code by concatenating two binary codes (Local Binary Patterns), one for its positive and one
for its negative part.
        </p>
        <p>Our method uses a multiscale version in which the block size ranges from 1 to 3 pixels
(3 scales in total). The final features are obtained by average pooling over the 1x1 + 2x2 spatial
pyramid (5 windows).</p>
        <p>The resulting feature size is therefore 256 x 2 codes x 3 scales x 5 windows = 7680.</p>
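        <p>A minimal Python/NumPy sketch of the run 1 descriptor, assuming that "block size" refers to the side s of square blocks whose mean grey levels are compared with the 8 neighbouring blocks (as in multi-block LBP), that the threshold t = 10 applies to raw 8-bit grey levels, and that average pooling of one-hot codes amounts to a normalised 256-bin histogram per window. The neighbour ordering and all function names are hypothetical. The output length reproduces the 256 x 2 x 3 x 5 = 7680 count given above.</p>

```python
import numpy as np

# Offsets of the 8 neighbouring blocks (in units of the block size), clockwise.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def box_mean(img, s):
    """Mean grey level of every s x s block (indexed by its top-left pixel), via an integral image."""
    c = np.pad(img, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    return (c[s:, s:] - c[:-s, s:] - c[s:, :-s] + c[:-s, :-s]) / (s * s)


def ltp_codes(img, s, t=10):
    """Upper and lower 8-bit LTP codes, comparing block means with threshold t."""
    m = box_mean(img.astype(np.float64), s)
    H, W = m.shape
    centre = m[s:H - s, s:W - s]
    upper = np.zeros(centre.shape, dtype=np.int64)
    lower = np.zeros(centre.shape, dtype=np.int64)
    for k, (dy, dx) in enumerate(NEIGHBOURS):
        nb = m[s + dy * s:H - s + dy * s, s + dx * s:W - s + dx * s]
        upper += (nb - centre > t) * 2 ** k   # ternary value +1 sets bit k of the upper code
        lower += (centre - nb > t) * 2 ** k   # ternary value -1 sets bit k of the lower code
    return upper, lower


def pyramid_histograms(codes, n_bins=256):
    """Average pooling of one-hot codes over a 1x1 + 2x2 pyramid (5 normalised histograms)."""
    h2, w2 = codes.shape[0] // 2, codes.shape[1] // 2
    windows = [codes, codes[:h2, :w2], codes[:h2, w2:], codes[h2:, :w2], codes[h2:, w2:]]
    return [np.bincount(w.ravel(), minlength=n_bins) / w.size for w in windows]


def multiscale_ltp(img, scales=(1, 2, 3), t=10):
    """Concatenate 256 bins x 2 codes x len(scales) x 5 windows."""
    feats = []
    for s in scales:
        for codes in ltp_codes(img, s, t):
            feats.extend(pyramid_histograms(codes))
    return np.concatenate(feats)


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64))   # stand-in for a grey-level fish image
x = multiscale_ltp(image)
print(x.shape)   # (7680,)
```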
      </sec>
      <sec id="sec-2-2">
        <title>3.2 Late fusion of LTP and improved FV → LSIS DYNI run 2</title>
        <p>
          The second run combines, by late fusion, the LTP features of run 1 with improved Fisher Vectors [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ],
to which the spatial component of the local features was added. As local features, we used densely
sampled SIFT descriptors, decorrelated by PCA into an 80-dimensional space.
        </p>
        <p>For each image, we first compute a 25x25 grid of SIFT descriptors on 24x24-pixel patches,
and repeat this at 3 scales. Fisher Vectors are computed over the same spatial pyramid as in 3.1,
using a Gaussian Mixture Model (GMM) with 16 Gaussians, and are derived from the means and variances
of the fitted GMM.</p>
        <p>The resulting feature size is therefore 2 x 80 x 16 x 3 scales x 5 windows = 38400.</p>
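        <p>The improved Fisher Vector encoding of [7] can be sketched in Python/NumPy as follows: soft-assign each descriptor to the Gaussians of a diagonal GMM, accumulate the normalised gradients with respect to the means and standard deviations, then apply power and L2 normalisation, once per pyramid window. This is an illustrative sketch, not the implementation used for run 2: the GMM parameters are placeholders (in practice they would be fitted by EM on training descriptors), the random descriptors stand in for PCA-reduced SIFT, the spatial augmentation of the descriptors is omitted, and the function names are hypothetical. One scale gives 2 x 80 x 16 x 5 = 12800 values.</p>

```python
import numpy as np


def fisher_vector(X, w, mu, sigma2):
    """Improved Fisher Vector of descriptors X (N, D) under a diagonal-covariance GMM.

    w: (K,) mixture weights, mu: (K, D) means, sigma2: (K, D) variances.
    Returns 2 * K * D values: mean gradients, then standard-deviation gradients.
    """
    N = X.shape[0]
    diff = (X[:, None, :] - mu[None]) / np.sqrt(sigma2)[None]          # (N, K, D)
    log_p = (np.log(w)[None]
             - 0.5 * np.log(2 * np.pi * sigma2).sum(axis=1)[None]
             - 0.5 * (diff ** 2).sum(axis=2))                            # (N, K)
    log_p -= log_p.max(axis=1, keepdims=True)                            # numerical stability
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)                            # soft assignments
    g_mu = np.einsum('nk,nkd->kd', gamma, diff) / (N * np.sqrt(w)[:, None])
    g_sd = np.einsum('nk,nkd->kd', gamma, diff ** 2 - 1) / (N * np.sqrt(2 * w)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_sd.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                               # power normalisation
    return fv / max(np.linalg.norm(fv), 1e-12)                           # L2 normalisation


def pyramid_fv(X, xy, gmm):
    """FVs over the 1x1 + 2x2 pyramid; xy holds patch centres in [0, 1)."""
    qx, qy = np.floor(2 * xy[:, 0]), np.floor(2 * xy[:, 1])
    masks = [np.ones(len(X), dtype=bool)]
    masks += [np.logical_and(qy == i, qx == j) for i in (0, 1) for j in (0, 1)]
    return np.concatenate([fisher_vector(X[m], *gmm) for m in masks])


rng = np.random.default_rng(0)
K, D = 16, 80
gmm = (np.full(K, 1.0 / K), rng.normal(size=(K, D)), np.ones((K, D)))   # placeholder GMM
g = (np.arange(25) + 0.5) / 25
xy = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)   # 25x25 grid of patch centres
X = rng.normal(size=(625, D))                               # stand-in for PCA-reduced SIFT
f = pyramid_fv(X, xy, gmm)
print(f.shape)   # (12800,) for one scale; 3 scales give 38400
```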
      </sec>
      <sec id="sec-2-3">
        <title>3.3 Late fusion of LTP + improved FV + Sparse Coding → LSIS DYNI run 3</title>
        <p>
          For the third run, we used densely sampled LTP patches (25x25 per image) as local features [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
As in 3.2, the 24x24-pixel patches are extracted on a 25x25 grid per image, at 3 scales, with the
same 5-window spatial pyramid.
        </p>
        <p>The dictionary is learned by sparse coding with a non-negativity constraint on both the
sparse codes and the dictionary elements. The final dictionary contains 1024 elements, and the codes
are aggregated by lp-norm pooling over the 1x1 + 2x2 spatial pyramid, with p = 3.</p>
        <p>The resulting feature size is therefore 1024 x 3 scales x 5 windows = 15360.</p>
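        <p>The encoding and pooling stages of run 3 can be sketched in Python/NumPy as below, assuming a dictionary that has already been learned. Non-negative sparse codes are computed with projected ISTA, which is our own choice of solver (the section above does not name one); the sparsity weight lam, the iteration count, the descriptor dimension d and all function names are hypothetical, and the dictionary learning step itself is not shown. Each of the 5 pyramid windows yields 1024 lp-pooled values with p = 3, i.e. 5120 values per scale.</p>

```python
import numpy as np


def nonneg_sparse_codes(X, D, lam=0.1, n_iter=200):
    """Non-negative sparse codes A (N, K) of descriptors X (N, d) over dictionary D (K, d).

    Minimises 0.5 * ||x - a D||^2 + lam * sum(a) subject to a >= 0 with projected ISTA.
    """
    L = np.linalg.norm(D, 2) ** 2                   # Lipschitz constant of the gradient
    A = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = (A @ D - X) @ D.T
        A = np.maximum(0.0, A - (grad + lam) / L)   # gradient step, then non-negative shrinkage
    return A


def lp_pool(A, p=3):
    """lp-norm pooling of non-negative codes A (N, K) over one window."""
    return np.sum(A ** p, axis=0) ** (1.0 / p)


def pyramid_lp_pool(A, xy, p=3):
    """Pool codes over the 1x1 + 2x2 pyramid; xy holds patch centres in [0, 1)."""
    qx, qy = np.floor(2 * xy[:, 0]), np.floor(2 * xy[:, 1])
    masks = [np.ones(len(A), dtype=bool)]
    masks += [np.logical_and(qy == i, qx == j) for i in (0, 1) for j in (0, 1)]
    return np.concatenate([lp_pool(A[m], p) for m in masks])


rng = np.random.default_rng(0)
K, d = 1024, 64                                       # d is an arbitrary descriptor size
D = rng.random((K, d))
D /= np.linalg.norm(D, axis=1, keepdims=True)         # unit-norm, non-negative atoms
g = (np.arange(25) + 0.5) / 25
xy = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)   # 25x25 grid of patch centres
X = rng.random((625, d))                              # stand-in for LTP patch descriptors
A = nonneg_sparse_codes(X, D)
z = pyramid_lp_pool(A, xy)
print(z.shape)   # (5120,) for one scale; 3 scales give 15360
```

        <p>With p = 3, lp pooling sits between average pooling (p = 1) and max pooling (p tending to infinity).</p>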
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <sec id="sec-3-1">
        <title>4.1 Baseline</title>
        <p>
          The baseline for this task is a fish species recognition system based on VLFeat [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>4.2 Recall score</title>
        <p>Our runs significantly outperformed the baseline, obtaining an average recall of 0.99 versus
0.91 for the baseline (see Fig. 2).</p>
      </sec>
      <sec id="sec-3-3">
        <title>4.3 Precision score</title>
        <p>A precision of 1 is obtained for almost all species, except for Chromis margaritifer,
Dascyllus reticulatus and Plectroglyphidodon dickii.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>Our methods significantly outperformed the baseline and ranked first in this first ImageCLEF
Fish identification challenge, which indicates that our framework is well suited to this easy
challenge.</p>
      <sec id="sec-4-1">
        <title>Acknowledgements</title>
        <p>This work is supported by the RAPID PHRASE project with Prolexia SA.</p>
        <p>This work is also supported by the SABIOD CNRS MI MASTODONS Big Data project on
automatic species identification, and will be extended to joint bioacoustic and visual
identification.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><surname>Boom</surname>, <given-names>B. J.</given-names></string-name>,
          <string-name><surname>He</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Palazzo</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Huang</surname>, <given-names>P. X.</given-names></string-name>,
          <string-name><surname>Chou</surname>, <given-names>H.-M.</given-names></string-name>,
          <string-name><surname>Lin</surname>, <given-names>F.-P.</given-names></string-name>,
          <string-name><surname>Spampinato</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Fisher</surname>, <given-names>R. B.</given-names></string-name>:
          <article-title>A research tool for long-term and continuous analysis of fish assemblage in coral-reefs using underwater camera footage</article-title>.
          <source>Ecological Informatics</source>
          (<year>2014</year>)
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name><surname>Joly</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Müller</surname>, <given-names>H.</given-names></string-name>,
          <string-name><surname>Goëau</surname>, <given-names>H.</given-names></string-name>,
          <string-name><surname>Glotin</surname>, <given-names>H.</given-names></string-name>,
          <string-name><surname>Spampinato</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Rauber</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Bonnet</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Vellinga</surname>, <given-names>W.P.</given-names></string-name>,
          <string-name><surname>Fisher</surname>, <given-names>B.</given-names></string-name>:
          <article-title>Multimedia life species identification challenges</article-title>.
          In: <source>LifeCLEF 2014 proceedings</source>
          (<year>2014</year>)
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name><surname>Spampinato</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Fisher</surname>, <given-names>B.</given-names></string-name>,
          <string-name><surname>Boom</surname>, <given-names>B. J.</given-names></string-name>:
          <article-title>LifeCLEF Fish Identification Task 2014</article-title>.
          In: <source>CLEF working notes 2014, FishCLEF 2014 proceedings</source>
          (<year>2014</year>)
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name><surname>Paris</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Halkias</surname>, <given-names>X.</given-names></string-name>,
          <string-name><surname>Glotin</surname>, <given-names>H.</given-names></string-name>:
          <article-title>Sparse coding for histograms of local binary patterns applied for image categorization: Toward a bag-of-scenes analysis</article-title>.
          In: <source>ICPR'12</source>
          (<year>2012</year>)
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name><surname>Paris</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Halkias</surname>, <given-names>X.</given-names></string-name>,
          <string-name><surname>Glotin</surname>, <given-names>H.</given-names></string-name>:
          <article-title>Participation of LSIS/DYNI to ImageCLEF 2012 - working keynotes</article-title>.
          In: <source>ImageCLEF'12</source>
          (<year>2012</year>)
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name><surname>Paris</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Halkias</surname>, <given-names>X.</given-names></string-name>,
          <string-name><surname>Glotin</surname>, <given-names>H.</given-names></string-name>:
          <article-title>Efficient Bag of Scenes Analysis Categorization</article-title>.
          In: <source>ICPRAM'13</source>
          (<year>2013</year>)
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name><surname>Sánchez</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Perronnin</surname>, <given-names>F.</given-names></string-name>,
          <string-name><surname>Mensink</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Verbeek</surname>, <given-names>J.</given-names></string-name>:
          <article-title>Image Classification with the Fisher Vector: Theory and Practice</article-title>.
          <source>International Journal of Computer Vision</source>
          <volume>105</volume>(<issue>3</issue>),
          <fpage>222</fpage>-<lpage>245</lpage>
          (<year>2013</year>)
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name><surname>Tan</surname>, <given-names>X.</given-names></string-name>,
          <string-name><surname>Triggs</surname>, <given-names>B.</given-names></string-name>:
          <article-title>Enhanced Local Texture Feature Sets for Face Recognition Under Difficult Lighting Conditions</article-title>.
          In: AMFG'07, 3rd
          <source>International Workshop on Analysis and Modelling of Faces and Gestures</source>,
          vol. <volume>4778</volume>,
          pp. <fpage>168</fpage>-<lpage>182</lpage>
          (<year>2007</year>)
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name><surname>Vedaldi</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Fulkerson</surname>, <given-names>B.</given-names></string-name>:
          <article-title>VLFeat: an open and portable library of computer vision algorithms</article-title>.
          In: <source>ACM International Conference on Multimedia</source>
          (<year>2010</year>)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>