<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multi-disciplinary modality classification for medical images</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Viktor Gal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Illes Solt</string-name>
          <email>solt@tmit.bme.hu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tom Gedeon</string-name>
          <email>tom.gedeon@anu.edu.au</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mike Nachtegael</string-name>
          <email>mike.nachtegael@ugent.be</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Applied Mathematics and Computer Science, Ghent University</institution>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics</institution>
          ,
          <country country="HU">Hungary</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>School of Computer Science, The Australian National University</institution>
          ,
          <country country="AU">Australia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Modality is a key facet in medical image retrieval, as a user is likely interested in only one of e.g. radiology images, flowcharts, and pathology photos. While assessing image modality is trivial for humans, reliable automatic methods are required to deal with large un-annotated image bases, such as figures taken from the millions of scientific publications. We present a multi-disciplinary approach to tackle the classification problem by combining image features, meta-data, textual and referential information. Our system achieved an accuracy of 96.86% in cross-validation on the ImageCLEF 2011 training dataset having 18 imbalanced modality classes, and an accuracy of 90.2% on the ImageCLEF 2010 dataset having 8 well-balanced modality classes. We evaluate the importance of the individual feature sets in detail, and provide an error analysis pointing at weaknesses of our method and obstacles in the classification task. For the benefit of the image classification community, we make the results of our feature extraction methods publicly available at http://categorizer.tmit.bme.hu/~illes/imageclef2011modality.</p>
      </abstract>
      <kwd-group>
        <kwd>image classification</kwd>
        <kwd>image feature extraction</kwd>
        <kwd>image modality</kwd>
        <kwd>text mining</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Imaging modality is an important aspect of the image for medical retrieval [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In
user studies, clinicians have indicated that modality is one of the most important
filters by which they would like to limit their search. However, the
modality is typically extracted from the caption and is often incorrect or
absent. Studies have shown that the modality can be extracted from the image
itself using visual features [
        <xref ref-type="bibr" rid="ref10 ref13 ref7">13, 10, 7</xref>
        ]. In this paper, we therefore propose to use
both visual and textual features for medical image representation, and to combine
the different features using a normalised kernel function in an SVM.
      </p>
      <p>
        The proposed algorithm is evaluated in the context of the ImageCLEF 2011
Modality Classification task [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which uses a dataset of 988+1024 images taken
from PubMed articles.
      </p>
      <p>The rest of this paper is organised as follows. In Section 2, we describe in
detail our experimental setting. In Section 3, we present and compare the different
runs we submitted. We discuss the submitted runs and the results in Section 4
and we conclude in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>Methods</title>
      <sec id="sec-2-1">
        <title>Evaluation setting</title>
        <p>In this section, we describe in detail our experimental setting.</p>
        <p>The ImageCLEF 2011 Modality Classification task used split validation,
measuring the accuracy of the systems. On the training dataset, we performed
stratified 10-fold cross-validation to evaluate feature sets and classifiers.</p>
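        <p>As an illustration of what stratification means here, the following minimal sketch deals the samples of each class round-robin over the folds, so that every fold roughly preserves the class proportions. It is not the Weka implementation used in our experiments; the function name stratified_folds and the toy label list are assumptions made for this example only.</p>
        <preformat>
```python
from collections import defaultdict


def stratified_folds(labels, k=10):
    """Assign each sample index to one of k folds, class by class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        # Deal the members of each class round-robin, continuing where the
        # previous class stopped, so that fold sizes stay balanced.
        for index in by_class[label]:
            folds[position % k].append(index)
            position += 1
    return folds


# Hypothetical, imbalanced toy label list (not the ImageCLEF data).
labels = ["CT"] * 20 + ["XR"] * 10 + ["DM"] * 3
folds = stratified_folds(labels, k=10)
```
        </preformat>
        <p>Each fold in turn serves as the held-out set while the remaining nine folds are used for training.</p>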
      </sec>
      <sec id="sec-2-2">
        <title>Feature extraction</title>
        <p>Caption text. Figures in scientific publications often have descriptive captions
that provide information on the modality of the image. "Contrast-enhanced
axial computed tomographic scan" and "HRCT showing extensive areas of
consolidation with air bronchogram" are examples of captions of images assigned to the
'CT' modality class. However, the caption may be missing or may not hint at
the modality, e.g. "E. coli that satisfy the similarity threshold values." As the
examples suggest, the linguistic constructs expressing modality can vary
considerably. Considering these remarks, we extract binary features from caption
texts as follows. We define a set of regular expressions to be matched against the
caption text; a match results in a feature value of 1. Regular expressions were initially
created for each word having a high information gain for any of the
modality classes and were later manually refined to capture linguistic variations (e.g.
f?MRI?) and multi-word phrases (e.g. error bars?).</p>
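        <p>The caption features can be sketched as follows. Only the two patterns f?MRI? and error bars? are taken from the text above; the other two patterns, the word-boundary anchoring and the case-insensitive matching are assumptions made for this illustration, not the actual pattern set.</p>
        <preformat>
```python
import re

# Hypothetical pattern list; only "f?MRI?" and "error bars?" come from the text.
PATTERNS = [r"f?MRI?", r"error bars?", r"computed tomograph\w*", r"HRCT"]
COMPILED = [re.compile(r"\b(?:" + p + r")\b", re.IGNORECASE) for p in PATTERNS]


def caption_features(caption):
    """Return one binary feature per pattern: 1 on a match, else 0."""
    return [1 if regex.search(caption) else 0 for regex in COMPILED]


ct_caption = "Contrast-enhanced axial computed tomographic scan"
no_hint = "E. coli that satisfy the similarity threshold values."
features_ct = caption_features(ct_caption)
features_none = caption_features(no_hint)
```
        </preformat>
        <p>A caption that does not hint at the modality, or a missing caption passed as an empty string, simply yields an all-zero vector.</p>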
        <p>MeSH terms. Scientific articles indexed by Medline/PubMed are tagged with
MeSH terms (Medical Subject Headings) by field experts. MeSH can be
seen as a thesaurus for the life sciences containing entries like 'Human', 'Liver
Neoplasms' and 'Magnetic Resonance Imaging'; entries can be further qualified
by e.g. 'methods' or 'pathology'. We hypothesise that an article's MeSH terms
and its figures' modality are correlated, and hence define features
corresponding to individual MeSH terms and qualifiers. A unique identifier for the article
(e.g. PMID or DOI) is required to retrieve its MeSH annotations; however, such
identifiers can be absent. As the number of MeSH terms, qualifiers and their
combinations far exceeds the number of modality labels, we perform feature
selection by keeping only those that are present for at least a predefined number
of articles in the training set.</p>
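        <p>A minimal sketch of this frequency-based selection is given below. The term strings, the "term/qualifier" notation and the threshold of two articles are assumptions chosen for the toy example; the threshold actually used is not restated here.</p>
        <preformat>
```python
from collections import Counter


def select_mesh_features(article_terms, min_articles):
    """Keep MeSH terms that occur in at least `min_articles` articles."""
    counts = Counter()
    for terms in article_terms:
        counts.update(set(terms))  # count each term once per article
    return sorted(t for t, n in counts.items() if n >= min_articles)


def mesh_vector(terms, vocabulary):
    """Binary indicator vector of one article over the selected terms."""
    present = set(terms)
    return [1 if term in present else 0 for term in vocabulary]


# Toy annotations; the threshold value 2 is an assumption.
train = [
    ["Human", "Magnetic Resonance Imaging"],
    ["Human", "Animals", "Liver Neoplasms/pathology"],
    ["Magnetic Resonance Imaging", "Liver Neoplasms/pathology", "Human"],
    [],  # article without PMID or DOI: no MeSH terms could be retrieved
]
vocabulary = select_mesh_features(train, min_articles=2)
example = mesh_vector(["Human", "Animals"], vocabulary)
```
        </preformat>
        <p>Articles whose identifier is absent contribute an empty term list and therefore an all-zero MeSH vector.</p>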
        <p>
          Colour histogram. Colour histograms have been successfully applied in content-based image retrieval
systems in the past; for a detailed review see [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. Based
on these studies, we chose a histogram in the HSV colour space, and
quantised the hue and the saturation to three levels each and the value to four levels.
        </p>
        <p>Based on this, we defined the feature vector <inline-formula><tex-math>f_{hist}</tex-math></inline-formula>, where each element of the
vector represents the normalised number of pixels in a given histogram bin.</p>
        <p>Mean of pixels. Through manually supervised error analysis on the training set,
we found that the images in the Graphic first-level group mainly have a
white background. Hence, we defined a simple feature <inline-formula><tex-math>f_{mean} = \overline{I_j}</tex-math></inline-formula> that
represents the mean value of the pixels in an image <inline-formula><tex-math>I_j</tex-math></inline-formula>. By simply thresholding
this value, one can identify the images that belong to the Graphic group
with very high accuracy.</p>
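        <p>The two image-level features above, the quantised HSV histogram and the pixel mean, can be sketched as follows. With three hue, three saturation and four value levels the histogram has 3 × 3 × 4 = 36 bins. The bin ordering, the averaging of the three colour channels into one intensity and the threshold of 0.8 are assumptions of this sketch, not values taken from our system.</p>
        <preformat>
```python
import colorsys


def quantise(value, levels):
    """Map a value in [0, 1] to an integer bin in 0..levels-1."""
    return min(int(value * levels), levels - 1)


def hsv_histogram(pixels, h_levels=3, s_levels=3, v_levels=4):
    """Normalised HSV histogram of an iterable of 8-bit (r, g, b) pixels."""
    hist = [0.0] * (h_levels * s_levels * v_levels)
    count = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        index = (quantise(h, h_levels) * s_levels
                 + quantise(s, s_levels)) * v_levels + quantise(v, v_levels)
        hist[index] += 1.0
        count += 1
    return [x / count for x in hist] if count else hist


def mean_pixel(pixels):
    """Mean intensity in [0, 1], averaging the three channels of each pixel."""
    total, count = 0.0, 0
    for r, g, b in pixels:
        total += (r + g + b) / (3 * 255.0)
        count += 1
    return total / count if count else 0.0


# Toy "image": three white pixels and one pure red pixel.
image = [(255, 255, 255)] * 3 + [(255, 0, 0)]
f_hist = hsv_histogram(image)
f_mean = mean_pixel(image)
looks_graphic = f_mean > 0.8  # hypothetical threshold, not from the paper
```
        </preformat>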
        <p>Axis recognition. The mean-of-pixels feature described above gave strong
support for recognising images in the Graphic top-level group, but as this group consists
of two sub-groups, Graphs and Drawing, a new feature was required to
differentiate the images belonging to one or the other category. By manually
observing the images in these two categories, one can easily point out the main
difference using a simple edge detector: the images belonging to the Graphs
category mainly consist of horizontal and vertical lines (i.e. the x and y axes of
a graph), whereas the images in the Drawing category are mostly diagrams, where
the orientation of the lines is random.</p>
        <p>Based on this idea, we defined the following feature. Let <inline-formula><tex-math>L_{I_j}</tex-math></inline-formula> be the set of
all detected lines and <inline-formula><tex-math>GL_{I_j}</tex-math></inline-formula> the set of good lines in an arbitrary image
<inline-formula><tex-math>I_j</tex-math></inline-formula>, where a line is a good line if its orientation is horizontal or vertical
and it lies inside a given margin from the picture's border. The latter condition
ensures that the borders of an image are not counted as good lines.</p>
        <p>Using these two sets, we defined the feature
          <disp-formula><label>(1)</label><tex-math>f_{lines}(I_j) = \frac{|GL_{I_j}|}{|L_{I_j}|}</tex-math></disp-formula>
        </p>
        <p>
          In order to detect the lines and their orientation in an image we used a simple
Hough transform [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
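        <p>Given the lines returned by a line detector, the ratio of good lines to all detected lines can be computed as in the following sketch. Representing each line as a segment (x1, y1, x2, y2), the angular tolerance of two degrees, the margin value and the convention of returning 0 for an image without detected lines are all assumptions made for illustration; the Hough transform itself is not reproduced here.</p>
        <preformat>
```python
import math


def is_good_line(line, width, height, margin, tol_deg=2.0):
    """True if the segment is axis-parallel and stays at least `margin`
    pixels away from the image border."""
    x1, y1, x2, y2 = line
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0
    if tol_deg >= min(angle, 180.0 - angle):  # horizontal
        return min(y1, y2) >= margin and height - margin >= max(y1, y2)
    if tol_deg >= abs(angle - 90.0):  # vertical
        return min(x1, x2) >= margin and width - margin >= max(x1, x2)
    return False


def f_lines(lines, width, height, margin):
    """Fraction of detected lines that are good lines."""
    if not lines:
        return 0.0  # assumed convention when no line was detected
    good = sum(1 for line in lines if is_good_line(line, width, height, margin))
    return good / len(lines)


# Toy 200 x 100 image: two axes, the top image border and one diagonal.
detected = [
    (20, 90, 180, 90),  # x axis
    (20, 10, 20, 90),   # y axis
    (0, 0, 200, 0),     # image border, excluded by the margin
    (30, 30, 90, 70),   # diagonal stroke
]
ratio = f_lines(detected, width=200, height=100, margin=5)
```
        </preformat>
        <p>A value close to 1 suggests a graph with axes, whereas diagrams with arbitrarily oriented lines yield low values.</p>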
        <p>Skin detection. The images in the Dermatology category were among the most
difficult to recognise. Not only was this the least represented category in the whole
training set, with only seven examples (see Table 1),
but its images are also simple photographs (of various skin abnormalities)
and thus have very similar characteristics to the images labelled as general photos.
Hence, most of the previously defined features failed to distinguish the images
in the Dermatology set from the others.</p>
        <p>
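          Using a simple skin detector algorithm [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], we defined a new feature for an image <inline-formula><tex-math>I_j</tex-math></inline-formula>:
          <disp-formula><label>(2)</label><tex-math>f_{skin}(I_j) = \overline{SD(I_j)}</tex-math></disp-formula>
where the function <inline-formula><tex-math>SD(\cdot)</tex-math></inline-formula> calculates the skin-segmented binary image of an input
image, and <inline-formula><tex-math>\overline{I_k}</tex-math></inline-formula>, as previously defined, is the mean value of image <inline-formula><tex-math>I_k</tex-math></inline-formula>.
        </p>
        <p>
          Meta-data. We determine whether image post-processing software was used
by analysing the meta-data stored in the EXIF section of JPEG files. For this, we analyse
the 'Comment' field to find mentions of commonly used image manipulation
software (e.g. Adobe Photoshop, MS Paint). We also extract from the EXIF
whether the image is stored as grey-scale only.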
        </p>
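        <p>The skin and meta-data features can be illustrated with the sketch below. The skin test is a simple chrominance threshold in the spirit of the cited skin-colour map; the RGB to CbCr conversion constants and the Cb and Cr ranges used here are assumptions of this sketch and may differ from the detector we actually used. Reading the EXIF section is left out: the comment string is assumed to have been extracted already, and the software name list is hypothetical.</p>
        <preformat>
```python
def is_skin(r, g, b):
    """Chrominance test on one 8-bit RGB pixel (assumed Cb/Cr ranges)."""
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return 127.0 >= cb >= 77.0 and 173.0 >= cr >= 133.0


def f_skin(pixels):
    """Mean of the binary skin mask, i.e. the fraction of skin pixels."""
    mask = [1 if is_skin(r, g, b) else 0 for r, g, b in pixels]
    return sum(mask) / len(mask) if mask else 0.0


SOFTWARE = ("photoshop", "ms paint")  # hypothetical name list


def edited_with_software(comment):
    """1 if the EXIF comment string mentions known editing software."""
    text = (comment or "").lower()
    return 1 if any(name in text for name in SOFTWARE) else 0


# Toy image: two skin-toned pixels, one white and one blue pixel.
toy = [(224, 172, 105), (255, 255, 255), (0, 0, 255), (224, 172, 105)]
skin_fraction = f_skin(toy)
flag = edited_with_software("Created with Adobe Photoshop 7.0")
```
        </preformat>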
        <p>
          Radiopaedia. Radiopaedia (http://radiopaedia.org) is a community wiki for
radiology images and patient cases. Images are tagged by users with the body system
depicted (e.g. Heart, Musculoskeletal) but, unfortunately for us, not with the
type of radiology method used to create the image. Leveraging the mutual
information between body systems and radiology methods, we derived features for
modality classification by taking the output probabilities of a classifier trained
to predict the body system shown in the image.
        </p>
        <p>
          Bag of visual words. State-of-the-art content-based image retrieval systems
have been significantly improved by the introduction of SIFT [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] features and the
bag-of-words image representation [
          <xref ref-type="bibr" rid="ref12 ref14 ref3 ref8">12, 8, 3, 14</xref>
          ].
        </p>
        <p>
          The bag-of-visual-words image representation is based on the bag-of-words
(BoW) model in natural language processing (NLP). BoW is a popular
method for representing documents in NLP. In this model, a document is simply
represented by the occurrence counts of the different words it contains. The idea
behind this is that documents on the same topic contain similar words with similar
numbers of occurrences (see LDA [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]).
        </p>
        <p>In the case of an image, the basic idea of the bag-of-words model is that a set
of local image patches is sampled using some method (e.g. densely or using a
key-point detector) and a vector of visual descriptors is evaluated on each patch
independently. In this paper, we used the well-known SIFT descriptor on each
patch. The SIFT descriptor computes a gradient orientation histogram within
the support region. For each of eight orientation planes, the gradient image is
sampled over a four-by-four grid of locations, resulting in an 8 × 4 × 4 = 128-dimensional
feature vector for each region. In order to make the descriptor less sensitive to
small changes in the position of the support region and to put more emphasis on
the gradients near the centre of the region, a Gaussian window function
is used to assign a weight to the magnitude of each sample point.</p>
        <p>After acquiring these SIFT features for all the images in the dataset, the
final step is to convert the vector-represented patches to "codewords" (analogous to
words in text documents), which also produces a "codebook" (analogous to a
word dictionary). A codeword can be considered a representative of several
similar patches. In our case, we performed k-means clustering over all the vectors.
Codewords are then defined as the centres of the learned clusters. Thus, each
patch in an image is mapped to a certain codeword through the clustering process,
and the image can be represented by the histogram of the codewords.</p>
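        <p>The codebook construction and the resulting image histogram can be sketched as follows. This is a toy illustration under stated assumptions: two-dimensional points stand in for the 128-dimensional SIFT descriptors, the k-means routine is a plain Lloyd iteration initialised with the first k vectors, and all function names are chosen for this example.</p>
        <preformat>
```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vector, centres):
    """Index of the centre closest to `vector`."""
    return min(range(len(centres)),
               key=lambda i: squared_distance(vector, centres[i]))


def kmeans(vectors, k, iterations=10):
    """Plain Lloyd iterations, initialised with the first k vectors."""
    centres = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centres)].append(v)
        for i, group in enumerate(groups):
            if group:  # keep the old centre if a cluster runs empty
                centres[i] = [sum(col) / len(group) for col in zip(*group)]
    return centres


def codeword_histogram(descriptors, codebook):
    """Count how many descriptors of one image fall on each codeword."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1
    return hist


# 2-D toy descriptors stand in for 128-D SIFT vectors.
all_descriptors = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 9.0), (1.0, 0.0)]
codebook = kmeans(all_descriptors, k=2)
image_hist = codeword_histogram([(0.5, 0.5), (9.0, 9.0), (0.2, 0.1)], codebook)
```
        </preformat>
        <p>The histogram length equals the codebook size, so every image is mapped to a vector of the same dimension regardless of how many patches it contains.</p>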
        <p>
          In our bag-of-visual-words model, we used the tf-idf weighting scheme [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ],
which has proven to be a very successful approach for image retrieval. The tf
part of the weighting scheme represents the number of features described by
a given visual word. The frequency of a visual word in the image provides useful
information about repeated structures and textures, while the idf part captures
the informativeness of visual words: visual words that appear in many different
images are less informative than those that appear rarely.
        </p>
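        <p>A minimal sketch of the weighting follows, assuming the common form in which the weight of visual word i in an image is (n_id / n_d) log(N / n_i): n_id is the count of word i in the image, n_d the total number of visual words in the image, N the number of images and n_i the number of images containing word i. The toy histograms are invented for the example.</p>
        <preformat>
```python
import math


def tf_idf(histograms):
    """Weight codeword histograms: t_i = (n_id / n_d) * log(N / n_i)."""
    n_images = len(histograms)
    n_words = len(histograms[0]) if histograms else 0
    # n_i: number of images in which visual word i occurs at least once.
    doc_freq = [sum(1 for h in histograms if h[i] > 0) for i in range(n_words)]
    weighted = []
    for h in histograms:
        total = sum(h)  # n_d: number of visual words in this image
        weighted.append([
            (h[i] / total) * math.log(n_images / doc_freq[i])
            if total and doc_freq[i] else 0.0
            for i in range(n_words)
        ])
    return weighted


# Toy codeword histograms of three images over a three-word codebook.
histograms = [[2, 1, 0], [1, 0, 3], [4, 0, 0]]
weights = tf_idf(histograms)
```
        </preformat>
        <p>In the toy data the first visual word occurs in every image, so its idf factor is log(1) = 0 and it carries no weight, which is the intended effect for uninformative words.</p>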
        <p>Other systems. The challenge organisers generously supplied participants with
the predictions of their in-house system. This classification was automatic for the test
set but, confusingly enough, the ground truth labels were used for the training set. In
order to exploit this valuable resource, we used it as an input to our classifier,
introducing artificial smoothing to avoid overfitting on this particular, otherwise
noise-free indicator variable. Also note that while split evaluation is sound in
this setting, the cross-validation evaluation of those two runs is flawed (being
over-optimistic) due to information leakage.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Classi cation</title>
        <p>
          Based on the numerical and binary features of the images obtained through
feature extraction, we perform vector space classification to predict the modality classes
of unseen images. Among the classification algorithms available in Weka [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], we
found the support vector machine SMO to have the best standalone
performance over the full feature space in cross-validation on the ImageCLEF 2011
training dataset. We used SMO with default settings for the rest of the experiments
unless stated otherwise.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>In this section, we provide the final results of the five submitted runs for the
modality classification task. Table 2 shows the percentage of correctly classified
images for the different feature set compositions. Comparing our best
submitted run with the best submitted run of the modality classification task,
one can see that there is only a very small (0.88%) difference between the two.</p>
      <p>The performance of the runs broken down by individual class is shown
in Table 3 and in Figure 1.
As can be seen in Figure 1, the systems perform well on higher-support classes,
while performance drops to zero for some rarer classes. This behaviour is
tolerated by the challenge's main evaluation metric, accuracy, in contrast to a
more pessimistic measure such as the F-measure. Table 2 shows which features were
used in the different runs. It is important to note that omitting the Caption text
features results in an accuracy loss of about ten percent; see the difference
between runs #3 and #4.</p>
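      <p>The difference between the two kinds of measure can be made concrete with a toy example. The label counts below are invented and are not our results; the macro-averaged F-measure is used here as one possible, more pessimistic alternative to accuracy.</p>
      <preformat>
```python
def accuracy(gold, predicted):
    return sum(1 for g, p in zip(gold, predicted) if g == p) / len(gold)


def f_measure(gold, predicted, label):
    """F1 of one class; 0.0 when the class is never predicted correctly."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, predicted) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, predicted) if g == label and p != label)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def macro_f_measure(gold, predicted):
    """Unweighted mean of the per-class F1 values."""
    labels = sorted(set(gold))
    return sum(f_measure(gold, predicted, l) for l in labels) / len(labels)


# Toy data: 18 'CT' images and 2 'DM' images, all predicted as 'CT'.
gold = ["CT"] * 18 + ["DM"] * 2
predicted = ["CT"] * 20
acc = accuracy(gold, predicted)
macro_f = macro_f_measure(gold, predicted)
```
      </preformat>
      <p>Here the accuracy is 0.9 although the rare class is never recognised, while the macro-averaged F-measure falls below 0.5 because the rare class contributes a score of zero.</p>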
      <p>Using MeSH and Radiopaedia features gained us about one percent in
accuracy.</p>
      <p>The in-house modality classifier of the challenge organisers proved to be
superior in predicting the 'Dermatology' class (Table 3); however, its inferior
performance on higher-support classes prevented it from being beneficial in
combination (Table 2).</p>
      <sec id="sec-3-1">
        <title>Other experiments</title>
        <p>Motivated by the grouping of modality labels by the challenge organisers, we
experimented with hierarchical classification. In particular, we applied a
hierarchical greedily ascending classifier scheme wrapping the baseline classifier. In this
scheme, classification is first performed on the hierarchy's uppermost level (here
groups), then the most probable hierarchy node is selected, where classification
continues recursively. For hierarchical classification, cross-validation results were
inferior to those obtained from the baseline (flat) classifier.</p>
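        <p>The greedy scheme can be sketched as follows. The hierarchy fragment, the group name 'Radiology', the leaf label 'XR' and the fixed scores are hypothetical; in the real setting the score of a node would come from the wrapped baseline classifier trained for that level.</p>
        <preformat>
```python
def leaf(name):
    return {"name": name, "children": []}


# Hypothetical two-level fragment of a modality hierarchy.
HIERARCHY = {"name": "root", "children": [
    {"name": "Graphic", "children": [leaf("Graphs"), leaf("Drawing")]},
    {"name": "Radiology", "children": [leaf("CT"), leaf("XR")]},
]}


def classify_hierarchically(node, score, image):
    """Greedily follow the most probable child until a leaf is reached."""
    while node["children"]:
        node = max(node["children"],
                   key=lambda child: score(child["name"], image))
    return node["name"]


# Toy scores standing in for per-level classifier probabilities.
TOY_SCORES = {"Graphic": 0.7, "Radiology": 0.3,
              "Graphs": 0.4, "Drawing": 0.6, "CT": 0.9, "XR": 0.1}


def toy_score(name, image):
    return TOY_SCORES[name]


label = classify_hierarchically(HIERARCHY, toy_score, image=None)
```
        </preformat>
        <p>Because the choice at each level is final, an error made at the group level cannot be corrected further down, which is one possible reason why such a scheme can fall behind a flat classifier.</p>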
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>In this paper, we proposed to extract different visual and textual features for
medical image representation, and to fuse the extracted visual and
textual features for modality classification. To extract visual features from
the images, we used state-of-the-art methods such as bag of visual words and
standard ones such as the colour histogram, and introduced some heuristic
representations of the images specialised for the ImageCLEF 2011 medical modality
classification task.</p>
      <p>With the feature extraction algorithms suggested in this paper and the SVM
classifier, we achieved second place in the ImageCLEF 2011 medical image
modality classification task.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>Viktor Gal was supported by Marie Curie Initial Training Networks (ITN) Ref.
238819 (FP7-PEOPLE-ITN-2008).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><given-names>David M.</given-names> <surname>Blei</surname></string-name>,
          <string-name><given-names>Andrew Y.</given-names> <surname>Ng</surname></string-name>, and
          <string-name><given-names>Michael I.</given-names> <surname>Jordan</surname></string-name>.
          <article-title>Latent Dirichlet allocation</article-title>.
          <source>J. Mach. Learn. Res.</source>,
          <volume>3</volume>:<fpage>993</fpage>–<lpage>1022</lpage>,
          <month>March</month> <year>2003</year>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name><given-names>D</given-names> <surname>Chai</surname></string-name> and
          <string-name><given-names>K N</given-names> <surname>Ngan</surname></string-name>.
          <article-title>Face segmentation using skin-color map in videophone applications</article-title>.
          <source>IEEE Transactions on Circuits and Systems for Video Technology</source>,
          <volume>9</volume>(<issue>4</issue>):<fpage>551</fpage>–<lpage>564</lpage>,
          <year>1999</year>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name><given-names>O</given-names> <surname>Chum</surname></string-name>,
          <string-name><given-names>J</given-names> <surname>Philbin</surname></string-name>,
          <string-name><given-names>J</given-names> <surname>Sivic</surname></string-name>, and
          <string-name><given-names>M</given-names> <surname>Isard</surname></string-name>.
          <article-title>Total Recall: Automatic query expansion with a generative feature model for object retrieval</article-title>.
          In <source>2007 IEEE 11th International Conference on Computer Vision</source>,
          pages <fpage>1</fpage>–<lpage>8</lpage>. IEEE,
          <month>October</month> <year>2007</year>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>RO</given-names>
            <surname>Duda</surname>
          </string-name>
          .
          <article-title>Use of the Hough transformation to detect lines and curves in pictures</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <year>1972</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name><given-names>Mark</given-names> <surname>Hall</surname></string-name>,
          <string-name><given-names>Eibe</given-names> <surname>Frank</surname></string-name>,
          <string-name><given-names>Geoffrey</given-names> <surname>Holmes</surname></string-name>,
          <string-name><given-names>Bernhard</given-names> <surname>Pfahringer</surname></string-name>,
          <string-name><given-names>Peter</given-names> <surname>Reutemann</surname></string-name>, and
          <string-name><given-names>Ian H.</given-names> <surname>Witten</surname></string-name>.
          <article-title>The WEKA data mining software: an update</article-title>.
          <source>SIGKDD Explorations</source>,
          <volume>11</volume>(<issue>1</issue>):<fpage>10</fpage>–<lpage>18</lpage>,
          <year>2009</year>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name><given-names>William R</given-names> <surname>Hersh</surname></string-name>,
          <string-name><given-names>Henning</given-names> <surname>Müller</surname></string-name>,
          <string-name><given-names>Jeffery R</given-names> <surname>Jensen</surname></string-name>,
          <string-name><given-names>Jianji</given-names> <surname>Yang</surname></string-name>,
          <string-name><given-names>Paul N</given-names> <surname>Gorman</surname></string-name>, and
          <string-name><given-names>Patrick</given-names> <surname>Ruch</surname></string-name>.
          <article-title>Advancing Biomedical Image Retrieval: Development and Analysis of a Test Collection</article-title>.
          <source>Journal of the American Medical Informatics Association</source>,
          <volume>13</volume>(<issue>5</issue>):<fpage>488</fpage>–<lpage>496</lpage>,
          <year>2006</year>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name><given-names>A</given-names> <surname>Jain</surname></string-name>.
          <article-title>Image retrieval using color and shape</article-title>.
          <source>Pattern Recognition</source>,
          <volume>29</volume>(<issue>8</issue>):<fpage>1233</fpage>–<lpage>1244</lpage>,
          <month>August</month> <year>1996</year>.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name><given-names>H.</given-names> <surname>Jegou</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Harzallah</surname></string-name>, and
          <string-name><given-names>C.</given-names> <surname>Schmid</surname></string-name>.
          <article-title>A contextual dissimilarity measure for accurate and efficient image search</article-title>.
          In <source>IEEE Conference on Computer Vision and Pattern Recognition (CVPR '07)</source>,
          pages <fpage>1</fpage>–<lpage>8</lpage>,
          <year>2007</year>.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
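          9.
          <string-name><given-names>Jayashree</given-names> <surname>Kalpathy-Cramer</surname></string-name>,
          <string-name><given-names>Henning</given-names> <surname>Müller</surname></string-name>,
          <string-name><given-names>Steven</given-names> <surname>Bedrick</surname></string-name>,
          <string-name><given-names>Ivan</given-names> <surname>Eggel</surname></string-name>,
          <string-name><given-names>Alba</given-names> <surname>Garcia Seco de Herrera</surname></string-name>, and
          <string-name><given-names>Theodora</given-names> <surname>Tsikrika</surname></string-name>.
          <article-title>The CLEF 2011 medical image retrieval and classification tasks</article-title>.
          In <source>CLEF 2011 working notes</source>, Amsterdam, The Netherlands,
          <year>2011</year>.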
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name><given-names>Abolfazl</given-names> <surname>Lakdashti</surname></string-name> and
          <string-name><given-names>M.</given-names> <surname>Moin</surname></string-name>.
          <article-title>A New Content-Based Image Retrieval Approach Based on Pattern Orientation Histogram</article-title>.
          In Andre Gagalowicz and Wilfried Philips, editors,
          <source>Computer Vision/Computer Graphics Collaboration Techniques</source>,
          pages <fpage>587</fpage>–<lpage>595</lpage>. Springer, Berlin, Heidelberg,
          <year>2007</year>.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name><given-names>David G.</given-names> <surname>Lowe</surname></string-name>.
          <article-title>Object recognition from local scale-invariant features</article-title>.
          In <source>Proceedings of the International Conference on Computer Vision</source>,
          Volume 2, ICCV '99, pages <fpage>1150</fpage>–, Washington, DC, USA,
          <year>1999</year>. IEEE Computer Society.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name><given-names>D</given-names> <surname>Nister</surname></string-name> and
          <string-name><given-names>H</given-names> <surname>Stewenius</surname></string-name>.
          <article-title>Scalable Recognition with a Vocabulary Tree</article-title>.
          In <source>2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition</source>,
          pages <fpage>2161</fpage>–<lpage>2168</lpage>,
          <year>2006</year>.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name><given-names>A</given-names> <surname>Pentland</surname></string-name>,
          <string-name><given-names>R W</given-names> <surname>Picard</surname></string-name>, and
          <string-name><given-names>S</given-names> <surname>Sclaroff</surname></string-name>.
          <article-title>Photobook: Content-based manipulation of image databases</article-title>.
          <source>International Journal of Computer Vision</source>,
          <volume>18</volume>(<issue>3</issue>):<fpage>233</fpage>–<lpage>254</lpage>,
          <year>1996</year>.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name><given-names>J.</given-names> <surname>Philbin</surname></string-name>,
          <string-name><given-names>O.</given-names> <surname>Chum</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Isard</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Sivic</surname></string-name>, and
          <string-name><given-names>A.</given-names> <surname>Zisserman</surname></string-name>.
          <article-title>Lost in quantization: Improving particular object retrieval in large scale image databases</article-title>.
          In <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>,
          <year>2008</year>.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name><given-names>Josef</given-names> <surname>Sivic</surname></string-name> and
          <string-name><given-names>Andrew</given-names> <surname>Zisserman</surname></string-name>.
          <article-title>Video Google: A Text Retrieval Approach to Object Matching in Videos</article-title>.
          In <source>9th IEEE International Conference on Computer Vision (ICCV 2003)</source>,
          pages <fpage>1470</fpage>–<lpage>1477</lpage>. IEEE Computer Society,
          <year>2003</year>.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>RC</given-names>
            <surname>Veltkamp</surname>
          </string-name>
          .
          <article-title>A survey of content-based image retrieval systems</article-title>
          .
          <source>Content-based image and video retrieval</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>