<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>IPL at imageCLEF 2018: A kNN-based Concept Detection Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Leonidas Valavanis</string-name>
          <email>valavanisleonidas@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Theodore Kalamboukis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Information Processing Laboratory, Department of Informatics, Athens University of Economics and Business</institution>
          ,
          <addr-line>76 Patission Str, 10434, Athens</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
<p>In this paper we present the methods and techniques used by the IPL Group for the Concept Detection subtask of the ImageCLEF 2018 Caption Task. To automatically predict multiple concepts in medical images, a k-NN based concept detection algorithm was used. The visual representation of images was based on the bag-of-visual-words and bag-of-colors models. Our best run was ranked 13th among 28 runs and achieved an F1 score of 0.0509.</p>
      </abstract>
      <kwd-group>
        <kwd>Bag of Visual Words</kwd>
        <kwd>Bag of Colors</kwd>
        <kwd>image annotation</kwd>
        <kwd>knn</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
<p>Visual Concept Detection, or Automated Image Annotation, of medical images
has become a very challenging task due to the ever-increasing number of images
in the medical domain, which makes automatic image annotation more and more important.
The task has gained a lot of attention through various challenges, with researchers
trying to automate the understanding of an image and its content and to provide
insights that could be beneficial for clinicians. Detecting image concepts
can be particularly useful in Content Based Image Retrieval (CBIR) because it
allows us to annotate images with semantically meaningful information, bridging
the gap between low-level visual features and high-level semantic information
and improving the effectiveness of CBIR. The main idea of image annotation
techniques is to automatically learn semantic concepts from a large number of
training samples and use them to label new images. As the amount of visual
information increases, so does the need for new methods to search it.</p>
      <p>
        This year our group participated only in the concept detection task. Details
of this task and the data can be found in the overview papers [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and the web
page of the contest (http://www.imageclef.org/2018/caption). In the next section we present
a detailed description of the modelling techniques. In Section 3, the data representation
and preprocessing are presented, as well as the results of our submitted runs. Finally,
Section 4 concludes our work.
      </p>
    </sec>
    <sec id="sec-2">
      <title>A k-NN based concept detection algorithm</title>
      <p>kNN is one of the simplest and yet most effective algorithms for classification problems.
Given an unknown item J (a test image), we calculate its distance from all
images in a training set and assign the item to its nearest category. The decision
depends on the number of neighbours taken into account when scoring the categories.
Choosing this number is difficult and is a weak point of the algorithm.</p>
      <p>In the following we give the definition of the annotation problem and a brief
description of the algorithm we used in our submitted runs.</p>
      <p>
        Let X = {x_1, x_2, ..., x_N} be a set of images and Y = {y_1, y_2, ..., y_l} the set of
concepts (labels or annotations). Consider a training set T = {(x_1, Y_1), ..., (x_N, Y_N)},
where Y_i ⊆ Y is the subset of concepts assigned to image x_i. For each concept y_i we
define its semantic group as the set T_i ⊆ T such that T_i = {x_k ∈ T : y_i ∈ Y_k}.
The annotation problem is then defined by the probability p(y_i|J). Given an
unannotated image J, the best label is
        <disp-formula id="eq1">
          <tex-math><![CDATA[y^{*} = \arg\max_{i}\, p(y_i \mid J) \tag{1}]]></tex-math>
        </disp-formula>
        Several relations have been used to define the probability p(y_i|J) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In the following it is defined by the score
        <disp-formula id="eq2">
          <tex-math><![CDATA[\mathrm{score}(y_k, J) = \sum_{x_j \in T_k} \mathrm{dist}(J, x_j) \tag{2}]]></tex-math>
        </disp-formula>
        where dist(·) can be any distance (L1, L2, etc.) between the visual representations
of the images J and x_j. If we sort the images inside T_k with the highest score
on top, we can take any number of images in the summation (2). Due to the
imbalance of the images between the semantic groups of the concepts, we
usually consider only a subset of the nearest neighbours in the summation
of equation (2). The scores in (2) are converted to probabilities using a soft-max
function with a decay parameter w, which nullifies the distances of most of the
images so that only a few of them, the nearest ones, contribute to the summation.
In our experiments the values of the parameters, such as w, were
estimated in experiments on the CLEF 2017 data set. In matrix form the algorithm
is written as the matrix multiplication
        <disp-formula id="eq3">
          <tex-math><![CDATA[\mathrm{score}(y_k, J) = J^{T} X_v^{T} Y \tag{3}]]></tex-math>
        </disp-formula>
        where X_v is an N × m_v matrix, with N the size of the training set and m_v the number of
visual features, and Y an N × numOfConcepts binary and very sparse matrix whose
entry in row i and column j is nonzero iff concept j is assigned to image i. For computational
efficiency, eq. (3) was implemented with a very fast Matlab function using the
cosine distance, which is equivalent to the Euclidean distance for vectors normalized to unit length.
      </p>
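      <p>For concreteness, the following is a minimal NumPy sketch of the scoring in eq. (3)
combined with the soft-max decay described above. It reflects our reading of the method
rather than the original Matlab implementation; the function name and the toy dimensions
are illustrative assumptions.</p>
      <preformat><![CDATA[
import numpy as np

def knn_concept_scores(J, Xv, Y, w=300.0):
    """Concept scores for a query image J, in the matrix form of eq. (3).

    J  : (mv,)   L2-normalized visual vector of the query image.
    Xv : (N, mv) L2-normalized visual vectors of the N training images.
    Y  : (N, l)  binary image-to-concept incidence matrix.
    w  : decay parameter of the soft-max; a large w drives the weight of
         all but the nearest training images towards zero.
    """
    # For unit-length vectors the dot product is the cosine similarity.
    sims = Xv @ J
    # Soft-max with decay w: only the nearest images keep non-negligible weight.
    weights = np.exp(w * (sims - sims.max()))
    weights /= weights.sum()
    # Accumulate neighbour weights per concept: score(y_k, J) for every k.
    return weights @ Y

# Toy usage: 5 training images, 8 visual features, 3 concepts.
rng = np.random.default_rng(0)
Xv = rng.random((5, 8))
Xv /= np.linalg.norm(Xv, axis=1, keepdims=True)
J = rng.random(8)
J /= np.linalg.norm(J)
Y = rng.integers(0, 2, size=(5, 3))
print(knn_concept_scores(J, Xv, Y))
]]></preformat>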
      <p>As already mentioned, due to the imbalance of the annotation
problem, the algorithm favours the concepts that occur with high frequency
in the training images. In several experiments with smaller data sets, plotting the
distribution of relevance of the concepts, sorted by their DF (Document
Frequency) values, against the distribution of retrieval, we observed that concepts with
low frequency in the training set are downgraded, while concepts with high frequency
are favoured by the algorithm. The algorithm was therefore modified by normalizing
the outcome of eq. (3) by the value DF(y_k)/avgDF. This last step improved
the performance of the algorithm significantly.</p>
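      <p>A sketch of this normalization step follows, under the assumption that normalizing
by DF(y_k)/avgDF means dividing each concept score by that ratio, which boosts rare
concepts and damps frequent ones:</p>
      <preformat><![CDATA[
import numpy as np

def df_normalize(scores, Y):
    """Normalize concept scores by DF(y_k)/avgDF to counter concept imbalance.

    scores : (l,)   output of the scoring step, one score per concept.
    Y      : (N, l) binary image-to-concept incidence matrix of the train set.
    """
    df = Y.sum(axis=0).astype(float)   # DF(y_k): training images containing concept k
    df[df == 0] = 1.0                  # guard against concepts absent from the train set
    avg_df = df.mean()
    # Dividing by DF/avgDF boosts rare concepts and damps frequent ones.
    return scores / (df / avg_df)
]]></preformat>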
    </sec>
    <sec id="sec-3">
      <title>Experimental Setup</title>
      <sec id="sec-3-1">
        <title>Image Visual Representation</title>
        <p>
          One important step in the process of concept detection is the visual
representation of images. Images are represented using two models, the Bag-of-visual
Words (BoVW) model and a generalized version of the Bag-of-Colors (QBoC)
model based on the quad tree decomposition of the image. The BoC model was
used for classification of biomedical images in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] and it was combined
successfully with the BoVW-SIFT model in a late fusion manner. In a similar vein, we
based our approach on the BoVW and QBoC models for the concept detection
of images. In this section we give a brief description of these descriptors.
Bag-of-visual Words (BoVW) The BoVW model has shown promising
results in the field of classification and image retrieval. The DenseSIFT visual
descriptor was used to implement the BoVW model in our runs. This process
includes the extraction of the SIFT keypoints [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] from a dense grid of locations
at a fixed scale and orientation of the images. The extracted interest points are
clustered, by k-means, to form a visual codebook of a predefined size. In our runs
the size of the codebook was 4,096. The final representation of an image is
created by performing a vector quantization which assigns each extracted key-point
of an image to its closest cluster in the codebook.
        </p>
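        <p>As an illustration, the following sketch wires up such a BoVW pipeline with OpenCV
and scikit-learn. The grid step and keypoint scale are illustrative choices not specified
in the text; only the codebook size of 4,096 comes from our setup.</p>
        <preformat><![CDATA[
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def dense_sift(gray, step=8, scale=8.0):
    """SIFT descriptors computed on a dense grid at a fixed scale/orientation."""
    sift = cv2.SIFT_create()
    keypoints = [cv2.KeyPoint(float(x), float(y), scale)
                 for y in range(step, gray.shape[0], step)
                 for x in range(step, gray.shape[1], step)]
    _, descriptors = sift.compute(gray, keypoints)
    return descriptors

def build_codebook(pooled_descriptors, k=4096):
    """Cluster descriptors pooled over the training images into k visual words."""
    return MiniBatchKMeans(n_clusters=k, random_state=0).fit(pooled_descriptors)

def bovw_histogram(descriptors, codebook):
    """Vector quantization: histogram of the nearest codebook word per keypoint."""
    words = codebook.predict(descriptors)
    return np.bincount(words, minlength=codebook.n_clusters).astype(float)
]]></preformat>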
      </sec>
      <sec id="sec-3-2">
        <title>Generalized Bag-of-Colors Model (QBoC)</title>
        <p>
          A generalized version of the BoC model was proposed in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. The approach introduces spatial information
in the representation of an image by finding homogeneous regions based on
some criterion. A common data structure which has been used for this purpose
is the quadtree. A quad-tree recursively divides a square region of an image into
four equal-size quadrants until a homogeneous quadrant is found or a stopping
criterion is met. This approach uses the simple BoC model [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] to form a
vocabulary or palette of colors, which is then used to extract the color histograms for
each image. Similar colors within a sub-region of the image are quantized into
the same color, i.e. the closest color (visual word) in the visual codebook. In
our runs we have used two different palettes, of size 100 and 200. These palettes
result in 1,500 and 3,000 total color features, depending on the number of levels
in the quad-tree.
        </p>
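        <p>A minimal sketch of the quad-tree decomposition follows. The homogeneity criterion
(standard deviation of pixel values against a tolerance) is an assumption, as the text
leaves the exact criterion open; the depth limit corresponds to the number of levels
mentioned above.</p>
        <preformat><![CDATA[
import numpy as np

def quadtree_regions(img, max_depth=3, tol=10.0):
    """Split an image into quadrants until a region is colour-homogeneous.

    Homogeneity is judged here by the standard deviation of the pixel
    values against tol. Returns (y0, y1, x0, x1) bounds of the homogeneous
    regions; a QBoC histogram is then built per region against the palette.
    """
    regions = []

    def split(y0, y1, x0, x1, depth):
        block = img[y0:y1, x0:x1]
        if block.size == 0:
            return
        if depth == max_depth or block.std() < tol:
            regions.append((y0, y1, x0, x1))
            return
        ym, xm = (y0 + y1) // 2, (x0 + x1) // 2
        for qy0, qy1, qx0, qx1 in [(y0, ym, x0, xm), (y0, ym, xm, x1),
                                   (ym, y1, x0, xm), (ym, y1, xm, x1)]:
            split(qy0, qy1, qx0, qx1, depth + 1)

    split(0, img.shape[0], 0, img.shape[1], 0)
    return regions
]]></preformat>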
      </sec>
      <sec id="sec-3-3">
        <title>Preprocessing and normalization</title>
        <p>The TF-IDF weights of visual words were calculated for each model separately
and the corresponding visual vectors were normalized using the Euclidean norm.
The similarities between test and train images were combined for both
representations of the images in a late fusion manner using weights w1=0.65 for the
DenseSIFT descriptor and w2=0.35 for the QBoC. The values of these
parameters were chosen based on our experiments on several other image collections.
Finally, another parameter of our algorithm is the decay parameter (w) of the
soft-max function. Several values of w were used in our runs, based on
experimentation with the CLEF 2017 data set. This year's data contain a set of 223,305
training images and a test set of 10,000 images, with 111,156 discrete concepts.
The training data are represented by two matrices, one of size 223,305 × 4,096 for
the dense SIFT representation and the other of size 223,305 × 3,000 for the QBoC
representation. The images in the test set are represented similarly. These data
demand more than 13 GB of memory. To overcome our memory limitations we
implemented a parallel k-NN algorithm, splitting the matrices into 10 blocks that
fit in RAM.</p>
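        <p>The following sketch shows the late fusion and the blocked similarity computation
under this memory constraint. The fusion weights w1=0.65 and w2=0.35 and the 10 blocks
come from the text; the function name and the in-memory block splitting are illustrative,
since in practice each block would be loaded from disk.</p>
        <preformat><![CDATA[
import numpy as np

def fused_similarities(J_sift, J_qboc, X_sift, X_qboc,
                       w1=0.65, w2=0.35, n_blocks=10):
    """Late-fuse the two TF-IDF, L2-normalized representations and compute
    the similarity of one test image to all training images block by block,
    so that only one block of each training matrix needs to sit in RAM.
    """
    N = X_sift.shape[0]
    sims = np.empty(N)
    for idx in np.array_split(np.arange(N), n_blocks):
        # In the real setting each block would be loaded from disk here.
        sims[idx] = (w1 * (X_sift[idx] @ J_sift) +
                     w2 * (X_qboc[idx] @ J_qboc))
    return sims
]]></preformat>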
      </sec>
      <sec id="sec-3-4">
        <title>Submitted Runs and Results</title>
        <p>To determine the algorithm’s optimal parameters we experimented with the
ImageCLEF 2017 caption task dataset. In this year’s contest we submitted eight
visual runs for the concept detection task. For all runs we used DenseSIFT with
4,096 clusters and QBoC with a palette of 200 colors. The parameter w is between 200
and 300. The parameter annot denotes the number of predicted concepts. The
results are presented in Table 1.</p>
        <table-wrap id="tab1">
          <label>Table 1.</label>
          <caption>
            <p>Submitted runs and their F1 scores.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Run ID</th>
                <th>F1 Score</th>
                <th>Annot Parameter</th>
              </tr>
            </thead>
            <tbody>
              <tr><td>DET_IPL_CLEF2018_w_300_gboc_200</td><td>0.0509</td><td>70</td></tr>
              <tr><td>DET_IPL_CLEF2018_w_300_gboc_200</td><td>0.0406</td><td>40</td></tr>
              <tr><td>DET_IPL_CLEF2018_w_300_gboc_200</td><td>0.0351</td><td>30</td></tr>
              <tr><td>DET_IPL_CLEF2018_w_200_gboc_200</td><td>0.0307</td><td>30</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The choice of the parameter w and the number of predicted concepts (annot) is a matter
for further investigation. There seems to be a trade-off between the values
of these parameters, which were set experimentally. A large value of w may lead
the model to over-fitting, while a large value of annot reduces the accuracy. Our
choice of annot was based on the observation that on average each image in the
train set contains 30 concepts.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>In this paper we presented the automated image annotation experiments
performed by the IPL Group for the concept detection task of the ImageCLEF 2018
Caption task. A k-NN based concept detection algorithm was used for the
automatic image annotation of medical images. A normalization step was proposed
for the scores in eq. (3), which significantly improved the performance of k-NN.
The results so far with our new k-NN approach are encouraging, and several new
directions have emerged which are currently under investigation.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>de Herrera</surname>
            ,
            <given-names>A.G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eickhoff</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Andrearczyk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Overview of the ImageCLEF 2018 caption prediction tasks</article-title>
          .
          <source>In: CLEF working notes</source>
          , CEUR. (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Verma</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jawahar</surname>
            ,
            <given-names>C.V.</given-names>
          </string-name>
          :
          <article-title>Image annotation using metric learning in semantic neighbourhoods</article-title>
          .
          <source>In: Computer Vision - ECCV 2012 - 12th European Conference on Computer Vision</source>
          , Florence, Italy, October 7-
          <issue>13</issue>
          ,
          <year>2012</year>
          , Proceedings, Part III. (
          <year>2012</year>
          )
          <fpage>836</fpage>
          -
          <lpage>849</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>de Herrera</surname>
            ,
            <given-names>A.G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Markonis</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Bag-of-colors for biomedical document image classification</article-title>
          .
          <source>In: Medical Content-Based Retrieval for Clinical Decision Support</source>
          . Springer (
          <year>2013</year>
          )
          <fpage>110</fpage>
          -
          <lpage>121</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Lowe</surname>
            ,
            <given-names>D.G.</given-names>
          </string-name>
          :
          <article-title>Distinctive image features from scale-invariant keypoints</article-title>
          .
          <source>International Journal of Computer Vision</source>
          <volume>60</volume>
          (
          <issue>2</issue>
          ) (
          <year>2004</year>
          )
          <fpage>91</fpage>
          -
          <lpage>110</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Valavanis</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stathopoulos</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalamboukis</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Fusion of bag-of-words models for image classification in the medical domain</article-title>
          .
          <source>In: Advances in Information Retrieval - 39th European Conference on IR Research</source>
          , ECIR
          <year>2017</year>
          ,
          <article-title>Aberdeen</article-title>
          , UK, April 8-
          <issue>13</issue>
          ,
          <year>2017</year>
          , Proceedings. (
          <year>2017</year>
          )
          <fpage>134</fpage>
          -
          <lpage>145</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Wengert</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Douze</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jégou</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Bag-of-colors for improved image search</article-title>
          .
          <source>In: Proceedings of the 19th International Conference on Multimedia</source>
          <year>2011</year>
          , Scottsdale, AZ, USA, November 28 - December 1,
          <year>2011</year>
          . (
          <year>2011</year>
          )
          <fpage>1437</fpage>
          -
          <lpage>1440</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name><surname>Ionescu</surname>, <given-names>B.</given-names></string-name>
          ,
          <string-name><surname>Müller</surname>, <given-names>H.</given-names></string-name>
          ,
          <string-name><surname>Villegas</surname>, <given-names>M.</given-names></string-name>
          ,
          <string-name><surname>de Herrera</surname>, <given-names>A.G.S.</given-names></string-name>
          ,
          <string-name><surname>Eickhoff</surname>, <given-names>C.</given-names></string-name>
          ,
          <string-name><surname>Andrearczyk</surname>, <given-names>V.</given-names></string-name>
          ,
          <string-name><surname>Dicente Cid</surname>, <given-names>Y.</given-names></string-name>
          ,
          <string-name><surname>Liauchuk</surname>, <given-names>V.</given-names></string-name>
          ,
          <string-name><surname>Kovalev</surname>, <given-names>V.</given-names></string-name>
          ,
          <string-name><surname>Hasan</surname>, <given-names>S.A.</given-names></string-name>
          ,
          <string-name><surname>Ling</surname>, <given-names>Y.</given-names></string-name>
          ,
          <string-name><surname>Farri</surname>, <given-names>O.</given-names></string-name>
          ,
          <string-name><surname>Liu</surname>, <given-names>J.</given-names></string-name>
          ,
          <string-name><surname>Lungren</surname>, <given-names>M.</given-names></string-name>
          ,
          <string-name><surname>Dang-Nguyen</surname>, <given-names>D.T.</given-names></string-name>
          ,
          <string-name><surname>Piras</surname>, <given-names>L.</given-names></string-name>
          ,
          <string-name><surname>Riegler</surname>, <given-names>M.</given-names></string-name>
          ,
          <string-name><surname>Zhou</surname>, <given-names>L.</given-names></string-name>
          ,
          <string-name><surname>Lux</surname>, <given-names>M.</given-names></string-name>
          ,
          <string-name><surname>Gurrin</surname>, <given-names>C.</given-names></string-name>
          :
          <article-title>Overview of ImageCLEF 2018: Challenges, datasets and evaluation</article-title>
          .
          <source>In: Proceedings of the Ninth International Conference of the CLEF Association (CLEF 2018)</source>
          , LNCS Lecture Notes in Computer Science, Springer, September 10-14,
          <year>2018</year>
          , Avignon, France
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>