<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>IPL at ImageCLEF 2017 Concept Detection Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Leonidas Valavanis</string-name>
          <email>valavanisleonidas@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Spyridon Stathopoulos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Information Processing Laboratory, Department of Informatics, Athens University of Economics and Business</institution>
          ,
          <addr-line>76 Patission Str, 10434, Athens</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we present the methods used by the IPL Group for the concept detection task of ImageCLEF 2017. A probabilistic k-nearest neighbor approach was used to automatically detect multiple concepts in medical images. The visual representation of images was based on the well-known bag-of-visual-words and bag-of-colors models. Detection performance was further enhanced by applying late fusion to the results obtained with different image representations. Our best run was ranked 2nd among the runs submitted under the same conditions.</p>
      </abstract>
      <kwd-group>
        <kwd>probabilistic k-nearest neighbors</kwd>
        <kwd>image annotation</kwd>
        <kwd>concept detection</kwd>
        <kwd>quad-tree bag of colors</kwd>
        <kwd>bag of visual words</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Automatic image annotation is an important and challenging task within the
field of computer vision, with applications in several domains. In the medical
domain it plays an important role in supporting image search, browsing and
organization for clinical diagnosis and treatment. Image retrieval based on semantic
information has many advantages and is more robust than retrieval that uses only low-level
visual features. When semantic information is absent, a typical method
to bridge the gap between low-level visual features and high-level semantics is
automatic image annotation. This is achieved by applying machine
learning techniques to learn a mapping from visual features to textual words. The
learned model is then used to assign semantic concepts to new, unseen images.</p>
      <p>
        The ImageCLEFcaption 2017 task [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], part of ImageCLEF 2017 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], consists
of two subtasks: concept detection and caption prediction. Our group participated
in the concept detection subtask, in which participating groups were asked
to develop systems that identify the presence of relevant biomedical concepts in
medical images.
      </p>
      <p>
        Details of this task can be found in the overview paper [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and on the web page
of the contest (<ext-link ext-link-type="uri" xlink:href="http://www.imageclef.org/2017/caption">http://www.imageclef.org/2017/caption</ext-link>). Our approach to concept detection is based on a Probabilistic
k-nearest neighbor (PKNN) algorithm and on two well-known models for image
representation: the Bag of Visual Words (BoVW) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and an improved version of the
bag of colors, the Quad-Tree Bag of Colors (QBoC) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. When the two representations are combined with late fusion, results are further
improved, and our best run ranked 2nd among the algorithms that
do not rely on external data sources.
      </p>
      <p>The following sections present the image representation methods and our
algorithm for concept detection. Finally, we report our results and conclude
with possible avenues for further research.</p>
    </sec>
    <sec id="sec-2">
      <title>Visual representation of images</title>
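      <p>Three different visual representation models were used in our experiments:
        <list list-type="order">
          <list-item><p>Localized compact features</p></list-item>
          <list-item><p>Bag of Visual Words (BoVW)</p></list-item>
          <list-item><p>Bag of Colors (BoC)</p></list-item>
        </list>
      </p>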
      <sec id="sec-2-1">
        <title>Localized compact features</title>
        <p>
          Compact visual descriptors have been used extensively in recent years to
efficiently represent the images of a dataset. For this task, two kinds of visual features
were extracted:
          <list list-type="order">
            <list-item><p>Color and Edge Directivity Descriptor (CEDD) [<xref ref-type="bibr" rid="ref1">1</xref>].</p></list-item>
            <list-item><p>Fuzzy Color and Texture Histogram (FCTH) [<xref ref-type="bibr" rid="ref2">2</xref>].</p></list-item>
          </list>
        </p>
        <p>However, in their original form, these descriptors are extracted globally from
an image. To include a degree of spatial information, features are
extracted over a 4x4 spatial grid. The image is first resized to 256 x 256 pixels
and then split into a 4x4 grid of non-overlapping blocks of 64 x 64 pixels each. The visual
features are extracted from each block and the resulting vectors are
concatenated to form a single feature vector. The final vector size is
4 × 4 × 144 = 2,304 for CEDD and 4 × 4 × 192 = 3,072 for FCTH.</p>
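        <p>The grid-based extraction described above can be sketched as follows. This is only an illustration under stated assumptions: the image is taken to be already resized to 256 x 256 and stored as rows of grayscale values, and intensity_histogram is a hypothetical 144-bin stand-in for a real CEDD extractor, which is not reimplemented here. All function names are ours.</p>
        <preformat>
```python
from typing import Callable, List

Image = List[List[int]]  # grayscale image stored as rows of 0..255 values


def split_into_grid(image: Image, grid: int = 4) -> List[Image]:
    """Split a square image into grid x grid non-overlapping blocks (row-major)."""
    size = len(image)
    if size % grid != 0 or any(len(row) != size for row in image):
        raise ValueError("expected a square image whose side is divisible by grid")
    step = size // grid
    blocks = []
    for by in range(grid):
        for bx in range(grid):
            rows = image[by * step:(by + 1) * step]
            blocks.append([row[bx * step:(bx + 1) * step] for row in rows])
    return blocks


def intensity_histogram(block: Image, bins: int = 144) -> List[float]:
    """Hypothetical stand-in descriptor: normalised intensity histogram."""
    hist = [0.0] * bins
    count = 0
    for row in block:
        for value in row:
            hist[min(value * bins // 256, bins - 1)] += 1.0
            count += 1
    return [h / count for h in hist]


def localized_descriptor(image: Image,
                         describe: Callable[[Image], List[float]] = intensity_histogram,
                         grid: int = 4) -> List[float]:
    """Concatenate the per-block descriptors into one feature vector."""
    vector: List[float] = []
    for block in split_into_grid(image, grid):
        vector.extend(describe(block))
    return vector


# Synthetic 256 x 256 image, assumed to be already resized.
image = [[(x + y) % 256 for x in range(256)] for y in range(256)]
vector = localized_descriptor(image)
print(len(vector))  # 16 blocks * 144 bins = 2304
```
        </preformat>
        <p>With a 192-bin block descriptor in place of the 144-bin one, the same concatenation would give the 3,072-dimensional vector size quoted for FCTH.</p>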
      </sec>
      <sec id="sec-2-2">
        <title>Dense SIFT features and BoVW</title>
        <p>Inspired by text retrieval, the Bag-of-Visual-Words (BoVW) approach has
shown promising results in the field of image retrieval and classification. Here
the BoVW model was implemented using the Dense SIFT visual descriptor.</p>
        <p>
          The Dense SIFT algorithm [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] is a variant of the SIFT algorithm that is
equivalent to extracting SIFT [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] from a dense grid of locations at a fixed scale
and orientation. The SIFT feature is invariant with respect to many common
image deformations, including position, scale, illumination, rotation, and affine
transformation. The number of features extracted from local interest points using
the Dense SIFT descriptor may vary, depending on the image. In order to obtain a
fixed number of feature dimensions, a visual codebook is created by clustering the
local interest points extracted from a number of sample images, using the k-means
clustering algorithm. After experimentation, the number of clusters was set to 4,096.
Each cluster (visual word) represents a different local pattern shared by
similar interest points. The histogram of an image is created by performing a
vector quantization that assigns each key-point to its closest cluster (visual
word).
        </p>
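        <p>The vector quantization step can be illustrated with a short sketch. It assumes that the codebook has already been learned (for example with k-means) and that descriptors are plain tuples of floats; the tiny two-dimensional codebook below is made up for the example and stands in for the 4,096 visual words of 128-dimensional SIFT descriptors. Function names are ours.</p>
        <preformat>
```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the closest cluster centre (visual word)."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(descriptor, codebook[i]))


def bovw_histogram(descriptors: Sequence[Vector],
                   codebook: Sequence[Vector]) -> List[int]:
    """Count how many descriptors of one image fall on each visual word."""
    hist = [0] * len(codebook)
    for descriptor in descriptors:
        hist[nearest_word(descriptor, codebook)] += 1
    return hist


# Toy codebook with three visual words and four descriptors from one image.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.2), (0.9, 1.2), (4.0, 6.0), (1.1, 0.8)]
print(bovw_histogram(descriptors, codebook))  # [1, 2, 1]
```
        </preformat>
        <p>Whatever the number of descriptors extracted from an image, the histogram always has one entry per visual word, which is what gives every image a vector of the same length.</p>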
      </sec>
      <sec id="sec-2-3">
        <title>Quad-Tree Bag-of-Colors Model (QBoC)</title>
        <p>
          The QBoC representation was successfully used for representing images in
previous works [
          <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
          ]. With the BoC model [12], a color vocabulary is learned from a
subset of the image collection. This vocabulary is used to extract the color
histogram of each image. The BoC model was used for the classification of biomedical
images in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], where it was shown to combine successfully with the
BoW-SIFT model through late fusion. As with the BoW model, the main
drawback of BoC is the lack of spatial information. Quad-tree
decomposition addresses this by subdividing an image into regions of homogeneous color. At each step a
region is split into four equal-sized squares, and the process continues until the regions are homogeneous or a
sub-region of 1 × 1 pixel is reached (see Figure 1b). In both models (BoVW and QBoC) the Term
Frequency-Inverse Document Frequency (TF-IDF) weights of the visual words were
calculated and the image vectors were normalized with the L1 norm.
        </p>
        <fig id="fig-1">
          <label>Figure 1</label>
          <caption><p>Panels (a) and (b); panel (b) illustrates the quad-tree decomposition of an image.</p></caption>
        </fig>
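        <p>A minimal sketch of the quad-tree decomposition is given below. It works on a single-channel image whose side is a power of two and treats a square as homogeneous when its values differ by at most a tolerance; both choices, the tolerance parameter and the function name are assumptions made for illustration, and the way the resulting regions feed the color histogram is not shown.</p>
        <preformat>
```python
def quadtree_regions(image, x=0, y=0, size=None, tolerance=0):
    """Return (x, y, size) squares that cover the image.

    A square is kept when its values differ by at most `tolerance`
    or when it is a single pixel; otherwise it is split into four
    equal quadrants. The image side must be a power of two.
    """
    if size is None:
        size = len(image)
    values = [image[row][col]
              for row in range(y, y + size)
              for col in range(x, x + size)]
    if size == 1 or tolerance >= max(values) - min(values):
        return [(x, y, size)]
    half = size // 2
    regions = []
    for dy in (0, half):
        for dx in (0, half):
            regions.extend(
                quadtree_regions(image, x + dx, y + dy, half, tolerance))
    return regions


# Three uniform quadrants and one mixed quadrant that splits into pixels.
image = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 4, 5],
    [3, 3, 6, 7],
]
for region in quadtree_regions(image):
    print(region)
```
        </preformat>
        <p>In this toy image the three uniform quadrants are kept as 2 x 2 regions, while the mixed bottom-right quadrant is split down to single pixels, giving seven regions in total.</p>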
        <p><bold>Probabilistic k-NN concept detection (PKNN)</bold></p>
        <p>In this section we briefly present our baseline algorithm for automatic concept
detection in medical images. The algorithm is divided into two main phases,
namely the visual retrieval step and the annotation step.</p>
        <p>In the visual retrieval phase, for a given test image, a sample of the k most
visually similar images from the training dataset is retrieved. Several experiments
on the validation set helped to determine the optimum value for k (k=100).</p>
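        <p>In the annotation step, the concepts associated with the k retrieved images
form the candidate concepts for a test image. The final assigned concepts are
determined by a probability score based on the occurrence of concepts in the
selected sample. For every distinct concept w present in the retrieved training
subset, we calculate ConceptScore(w) as:</p>
        <disp-formula id="eq-1">
          <label>(1)</label>
          <tex-math>\mathrm{ConceptScore}(w) = \sum_{j=1}^{k} P(j)\, P(w \mid j)</tex-math>
        </disp-formula>
        <p>where j is an image from the top-k results and P(w | j) is calculated by:</p>
        <disp-formula id="eq-2">
          <label>(2)</label>
          <tex-math>P(w \mid j) = \frac{\mathrm{count}(w, j)}{|W_j|}</tex-math>
        </disp-formula>
        <p>where count(w, j) is the number of times concept w is found in image j, and
|W_j| is the total number of concepts in image j. P(j) is considered uniform for
all images and is therefore ignored.</p>
        <p>When the scores obtained with several descriptors are fused, they are summed with the combSUM function:</p>
        <disp-formula id="eq-3">
          <label>(3)</label>
          <tex-math>\mathrm{combSUM}(w) = \sum_{d=1}^{D} \mathrm{ConceptScore}_d(w)</tex-math>
        </disp-formula>
        <p>where D is the number of descriptors to combine and ConceptScore_d(w) is the score of concept w obtained with descriptor d.</p>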
        <p>The top 6 concepts with the highest ConceptScore are selected for the test
image. This number is the average number of
concepts per image in the training set (5.58), rounded to the nearest integer.</p>
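        <p>The annotation step can be sketched in a few lines. The sketch assumes that the visual retrieval step has already returned the concept lists of the k nearest training images; the concept identifiers are invented for the example and the function names are ours. As in the text, the uniform P(j) is dropped.</p>
        <preformat>
```python
from collections import defaultdict
from typing import Dict, List, Sequence


def concept_scores(neighbour_concepts: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Sum P(w | j) = count(w, j) / |W_j| over the retrieved images j."""
    scores: Dict[str, float] = defaultdict(float)
    for concepts in neighbour_concepts:
        if not concepts:
            continue  # an image without concepts contributes nothing
        for concept in set(concepts):
            scores[concept] += concepts.count(concept) / len(concepts)
    return dict(scores)


def top_concepts(scores: Dict[str, float], n: int = 6) -> List[str]:
    """Return the n best concepts, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [concept for concept, _ in ranked[:n]]


# Concept lists of three retrieved images (hypothetical identifiers).
neighbours = [["C1", "C2"], ["C1"], ["C2", "C3", "C4"]]
scores = concept_scores(neighbours)
print(top_concepts(scores, n=2))  # ['C1', 'C2']
```
        </preformat>
        <p>Here C1 scores 1/2 + 1 = 1.5 and C2 scores 1/2 + 1/3, so they are the two concepts selected when n = 2; the runs described in this paper use k = 100 neighbours and keep six concepts.</p>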
      </sec>
      <sec id="sec-2-4">
        <title>Concept scoring with Random walks with Restart (RWR)</title>
        <p>As an alternative concept scoring method, a Random Walk with Restart (RWR)
algorithm [11] was tested. First, an adjacency matrix A of size c × c is constructed
from the set of the top k retrieved images, where c is the number of
distinct concepts in the retrieved images. Matrix A defines a graph whose
nodes correspond to concepts and whose edges connect concepts that are assigned to
the same image of the training set. Next, the RWR algorithm is applied to matrix
A, resulting in a vector r of size c × 1 whose entries indicate how probable each concept is
for the test image. As before, the top 6 concepts with the highest r(w) value are
selected as the concepts for the test image.</p>
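        <p>A generic random walk with restart over a concept graph can be sketched as below. The text does not state the restart probability, the restart vector or the number of iterations, so the values used here (0.15, a uniform restart vector, 100 power iterations) are assumptions chosen only to make the example concrete, as are the function names.</p>
        <preformat>
```python
from typing import List, Sequence


def column_normalise(adjacency: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale each column to sum to 1 (all-zero columns are left as zeros)."""
    n = len(adjacency)
    sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    return [[adjacency[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def random_walk_with_restart(adjacency: Sequence[Sequence[float]],
                             restart: Sequence[float],
                             restart_prob: float = 0.15,
                             iterations: int = 100) -> List[float]:
    """Iterate r = (1 - c) * W r + c * e, with W the normalised adjacency."""
    n = len(adjacency)
    walk = column_normalise(adjacency)
    r = list(restart)
    for _ in range(iterations):
        r = [(1.0 - restart_prob) * sum(walk[i][j] * r[j] for j in range(n))
             + restart_prob * restart[i]
             for i in range(n)]
    return r


# Concept 0 co-occurs with concepts 1 and 2; concepts 1 and 2 never co-occur.
adjacency = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
restart = [1.0 / 3.0] * 3  # uniform restart vector (assumption)
r = random_walk_with_restart(adjacency, restart)
best = max(range(len(r)), key=lambda i: r[i])
print(best)  # 0: the concept connected to both others scores highest
```
        </preformat>
        <p>The concepts would then be ranked by their entry in r and the six highest kept, in the same way as with ConceptScore.</p>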
      </sec>
      <sec id="sec-2-5">
        <title>Late fusion</title>
        <p>To improve results, late fusion was applied to the ranked lists of
concepts obtained for each test image. In late fusion, the ranked concept score lists produced with
different visual descriptors are combined. A new score is calculated for each concept w with the
combSUM function, which sums the ConceptScore values that w obtains under each of the D descriptors being combined.</p>
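        <p>The combSUM fusion can be written compactly. In this sketch each descriptor contributes a dictionary of concept scores; the descriptor names and score values are invented for illustration, and a concept missing from one list simply contributes nothing for that descriptor.</p>
        <preformat>
```python
from typing import Dict, Sequence


def comb_sum(score_lists: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Sum the per-descriptor scores of every concept (combSUM)."""
    fused: Dict[str, float] = {}
    for scores in score_lists:
        for concept, score in scores.items():
            fused[concept] = fused.get(concept, 0.0) + score
    return fused


# Hypothetical scores from two descriptors for one test image.
dsift_scores = {"C1": 1.5, "C2": 0.5}
qboc_scores = {"C2": 1.25, "C3": 0.25}

fused = comb_sum([dsift_scores, qboc_scores])
ranking = sorted(fused, key=lambda concept: (-fused[concept], concept))
print(ranking)  # ['C2', 'C1', 'C3']
```
        </preformat>
        <p>Concept C2 is ranked second by the first descriptor alone but moves to the top after fusion because both descriptors support it.</p>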
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments and Submitted Runs</title>
      <p>To determine our system's optimal parameters and best visual features, we
experimented with the validation set provided by the organizers. Trying different
visual descriptors and concept detection algorithms helped us decide on the
submitted runs. Table 1 presents some of the top results obtained from these
experiments on the validation set. The "PKNN" prefix in a run ID denotes
experiments using the probabilistic concept detection algorithm and "RWR"
those using the Random Walk with Restart algorithm. The "LFS" prefix denotes runs
using the late fusion method described in Section 3.2 (Late fusion). Table 2 presents the runs
submitted to CLEF and their corresponding results. The same prefixes are also
used to describe the individual runs.</p>
      <table-wrap>
        <table>
          <thead>
            <tr><th>Run ID</th></tr>
          </thead>
          <tbody>
            <tr><td>PKNN CEDD</td></tr>
            <tr><td>PKNN CEDD 4x4</td></tr>
            <tr><td>PKNN FCTH</td></tr>
            <tr><td>PKNN FCTH 4x4</td></tr>
            <tr><td>PKNN GBOC</td></tr>
            <tr><td>PKNN DSIFT</td></tr>
            <tr><td>RWR GBOC</td></tr>
            <tr><td>RWR DSIFT</td></tr>
            <tr><td>LFS PKNN (FCTH 4x4 DSIFT GBOC)</td></tr>
            <tr><td>LFS PKNN (CEDD 4x4 DSIFT GBOC)</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>In this report, we presented the image concept detection methods used by the
IPL Group for the medical concept detection subtask of ImageCLEF 2017. For
our runs, we used a simple Probabilistic k-Nearest Neighbor approach.
Experiments show that late fusion of BoVW and QBoC performs best and that
the image representation plays an important role in performance. The Random Walk
with Restart algorithm performed slightly worse; a more systematic study of this
method is currently underway. Our best
run was ranked 2nd among the algorithms that
do not rely on any external data sources. These results are encouraging and motivate
further research on improving the concept detection algorithm with additional
textual meta-data.</p>
      <ref-list>
        <ref id="ref11">
          <mixed-citation>11. Wang, C., Jing, F., Zhang, L., Zhang, H.J.: Image annotation refinement using
random walk with restarts. In: Proceedings of the 14th ACM International
Conference on Multimedia. pp. 647–650. MM '06, ACM, New York, NY, USA (2006),
http://doi.acm.org/10.1145/1180639.1180774</mixed-citation>
        </ref>
        <ref id="ref12">
          <mixed-citation>12. Wengert, C., Douze, M., Jégou, H.: Bag-of-colors for improved image search. In:
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale,
AZ, USA, November 28 - December 1, 2011. pp. 1437–1440 (2011),
http://doi.acm.org/10.1145/2072298.2072034</mixed-citation>
        </ref>
      </ref-list>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><surname>Chatzichristofis</surname>, <given-names>S.A.</given-names></string-name>,
          <string-name><surname>Boutalis</surname>, <given-names>Y.S.</given-names></string-name>:
          <article-title>CEDD: Color and edge directivity descriptor: A compact descriptor for image indexing and retrieval</article-title>.
          In: <source>ICVS</source>. pp. <fpage>312</fpage>–<lpage>322</lpage> (<year>2008</year>)
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name><surname>Chatzichristofis</surname>, <given-names>S.A.</given-names></string-name>,
          <string-name><surname>Boutalis</surname>, <given-names>Y.S.</given-names></string-name>:
          <article-title>FCTH: Fuzzy color and texture histogram - a low level feature for accurate image retrieval</article-title>.
          In: <source>WIAMIS</source>. pp. <fpage>191</fpage>–<lpage>196</lpage> (<year>2008</year>)
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name><surname>Eickhoff</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Schwall</surname>, <given-names>I.</given-names></string-name>,
          <string-name><surname>García Seco de Herrera</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Müller</surname>, <given-names>H.</given-names></string-name>:
          <article-title>Overview of ImageCLEFcaption 2017 - image caption prediction and concept detection for biomedical images</article-title>.
          In: <source>CLEF 2017 Labs Working Notes. CEUR Workshop Proceedings</source>,
          CEUR-WS.org &lt;http://ceur-ws.org&gt;, Dublin, Ireland (September 11-14, <year>2017</year>)
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name><surname>García Seco de Herrera</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Markonis</surname>, <given-names>D.</given-names></string-name>,
          <string-name><surname>Müller</surname>, <given-names>H.</given-names></string-name>:
          <article-title>Bag-of-colors for biomedical document image classification</article-title>.
          In: <source>Medical Content-Based Retrieval for Clinical Decision Support</source>,
          pp. <fpage>110</fpage>–<lpage>121</lpage>. Springer (<year>2013</year>)
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ionescu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , Muller, H.,
          <string-name>
            <surname>Villegas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arenas</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boato</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dang-Nguyen</surname>
            ,
            <given-names>D.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dicente Cid</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eickhoff</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia Seco de Herrera</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Islam</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kovalev</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liauchuk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piras</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riegler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwall</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Overview of ImageCLEF 2017: Information extraction from images</article-title>
          .
          In: <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. 8th International Conference of the CLEF Association, CLEF 2017</source>.
          Lecture Notes in Computer Science, vol.
          <volume>10456</volume>
          . Springer, Dublin, Ireland (September 11-14,
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>F.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perona</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>A Bayesian hierarchical model for learning natural scene categories</article-title>
          .
          In: <source>CVPR (2)</source>
          . pp.
          <fpage>524</fpage>–<lpage>531</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yuen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torralba</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>SIFT flow: Dense correspondence across scenes and its applications</article-title>
          .
          <source>IEEE Trans. Pattern Anal. Mach. Intell.</source>
          <volume>33</volume>
          (
          <issue>5</issue>
          ),
          <fpage>978</fpage>–<lpage>994</lpage>
          (May
          <year>2011</year>
          ), <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/TPAMI.2010.147">http://dx.doi.org/10.1109/TPAMI.2010.147</ext-link>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Lowe</surname>
            ,
            <given-names>D.G.</given-names>
          </string-name>
          :
          <article-title>Object recognition from local scale-invariant features</article-title>
          .
          In: <source>Proceedings of the International Conference on Computer Vision - Volume 2</source>
          . pp.
          <fpage>1150</fpage>–<lpage>1157</lpage>
          . ICCV '99, IEEE Computer Society, Washington, DC, USA (
          <year>1999</year>
          ), <ext-link ext-link-type="uri" xlink:href="http://dl.acm.org/citation.cfm?id=850924.851523">http://dl.acm.org/citation.cfm?id=850924.851523</ext-link>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Valavanis</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stathopoulos</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalamboukis</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>IPL at CLEF 2016 medical task</article-title>
          .
          In: <source>Working Notes of CLEF 2016 - Conference and Labs of the Evaluation forum</source>
          , Evora, Portugal, 5-8 September, 2016. pp.
          <fpage>413</fpage>–<lpage>420</lpage>
          (
          <year>2016</year>
          ), <ext-link ext-link-type="uri" xlink:href="http://ceur-ws.org/Vol-1609/16090413.pdf">http://ceur-ws.org/Vol-1609/16090413.pdf</ext-link>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Valavanis</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stathopoulos</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalamboukis</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Fusion of bag-of-words models for image classification in the medical domain</article-title>
          .
          In: <source>Advances in Information Retrieval - 39th European Conference on IR Research, ECIR 2017</source>
          , Aberdeen, UK, April 8-13, 2017, Proceedings. pp.
          <fpage>134</fpage>–<lpage>145</lpage>
          (
          <year>2017</year>
          ), <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1007/978-3-319-56608-5_11">https://doi.org/10.1007/978-3-319-56608-5_11</ext-link>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>