<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Detection of Visual Concepts and Annotation of Images using Predictive Clustering Trees</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ivica Dimitrovski</string-name>
          <email>ivicad@feit.ukim.edu.mk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dragi Kocev</string-name>
          <email>Dragi.Kocev@ijs.si</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Suzana Loskovska</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sašo Džeroski</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Faculty of Electrical Engineering and Information Technology, Karpoš bb</institution>
          ,
          <addr-line>1000 Skopje</addr-line>
          ,
          <country country="MK">Macedonia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Knowledge Technologies, Jožef Stefan Institute, Jamova cesta 39</institution>
          ,
          <addr-line>1000 Ljubljana</addr-line>
          ,
          <country country="SI">Slovenia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we present a multiple targets classification system for visual concept detection and image annotation. Multiple targets classification (MTC) is a variant of classification where an instance may belong to multiple classes at the same time. The system is composed of two parts: feature extraction and classification/annotation. The feature extraction part provides global and local descriptions of the images. These descriptions are then used to learn a classifier and to annotate an image with the corresponding concepts. To this end, we use predictive clustering trees (PCTs), which are capable of classifying an instance into multiple classes at once, thus exploiting the interactions that may occur among the different visual concepts (classes). Moreover, we construct ensembles (random forests) of PCTs to improve the predictive performance. We tested our system on the image database from the visual concept detection and annotation task of ImageCLEF 2010. The extensive experiments conducted on this benchmark database show that our system has very high predictive performance and can be easily scaled to a large number of images and visual concepts.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>An ever increasing amount of visual information is becoming available in digital
form in various digital archives. The value of the information obtained from
an image depends on how easily it can be found, retrieved, accessed, filtered
and managed. Therefore, tools for efficient archiving, browsing, searching and
annotation of images are a necessity.</p>
      <p>A straightforward approach, used in some existing information retrieval tools
for visual materials, is to manually annotate the images by keywords and then
to apply text-based query for retrieval. However, manual image annotation is
an expensive and time-consuming task, especially given the large and constantly
growing size of image databases.</p>
      <p>The image search provided by major search engines, such as Google, Bing,
Yahoo! and AltaVista, relies on textual or metadata descriptions of images found
on the web pages containing the images and the file names of the images. The
results from these search engines are very disappointing when the visual content
of the images is not mentioned, or properly reflected, in the associated text.</p>
      <p>
        A more sophisticated approach to image retrieval is automatic image
annotation: a computer system assigns metadata in the form of captions or keywords to
a digital image [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. These annotations reflect the visual concepts that are present
in the image. This approach begins with the extraction of feature vectors
(descriptions) from the images. A machine learning algorithm is then used to learn
a classifier, which will then classify/annotate new and unseen images.
      </p>
      <p>
        Most of the systems for detection of visual concepts learn a separate model
for each visual concept [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. However, the number of visual concepts can be large
and there can be mutual connections between the concepts that can be exploited.
Since an image may have different meanings or contain different concepts, multiple
targets classification (MTC) can be used for obtaining annotations (i.e., labels
for the multiple visual concepts present in the image) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The goal of MTC is to
assign to each image multiple labels, which are a subset of a previously defined
set of labels.
      </p>
      <p>
        In this paper, we present a system for detection of visual concepts and
annotation of images. For the annotation of the images, we propose to exploit the
interactions between the target visual concepts (inter-class relationships among
the image labels) by using predictive clustering trees (PCTs) for MTC. PCTs
are able to handle multiple target concepts, i.e., perform MTC. To improve the
predictive performance, we use ensembles (random forests) of PCTs for MTC.
For the extraction of features, we use several techniques that are recommended
as most suitable for the type of images at hand [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        We tested the proposed approaches on the image database from the visual
concept detection and annotation task part of ImageCLEF 2010 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The visual
concept detection and annotation task is a multiple labels (targets) classification
challenge. It aims at the automatic annotation of a large number of consumer
photos with multiple annotations. The concepts used in this annotation task
include, for example, abstract categories such as Family/Friends or Partylife,
the time of day (day, night, sunny, ...), Persons (no, single, small or big
group) and Quality (blurred, underexposed, ...).
      </p>
      <p>The remainder of this paper is organized as follows. Section 2 presents the
proposed large scale visual concept detection system. Section 3 explains the
experimental design. Section 4 reports the obtained results. Conclusions and a
summary are given in Section 5.
</p>
    </sec>
    <sec id="sec-2">
      <title>System for Detection of Visual Concepts</title>
      <sec id="sec-2-1">
        <title>Overall architecture</title>
        <p>
          Fig. 1 presents the architecture of the proposed system for visual concepts
detection and image annotation. The system is composed of a feature extraction
part and a classification/annotation part. We use two different sets of features
to describe the images: global and local features extracted from the image pixel
values. We employ different sampling strategies and different spatial pyramids
to extract the visual features (both global and local) [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>[Fig. 1. Architecture of the system: images (train/test) enter the feature extraction part (visual features, sampling strategy, spatial pyramid), a codebook transform produces the descriptors of the images, and the PCTs for MTC (classifiers) output the predictions/annotations.]</p>
        <p>
          As an output of the feature extraction part, we obtain several sets of
descriptors of the image content that can be used to learn a classifier to annotate the
images with the visual concepts. Tommasi et al. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] show that using various
visual features, each providing different information about the visual content of the
images, clearly outperforms single-feature approaches. Following these findings, in
our research we use a ‘high level’ feature fusion scheme.
        </p>
        <p>The high level fusion scheme (depicted in Fig. 2) is performed as follows.
First, we learn a classifier for each set of descriptors separately. The classifier
outputs the probabilities with which an image is annotated with the given visual
concepts. To obtain a final prediction, we combine the probabilities output from
the classifiers for the different descriptors by averaging them. Depending on
the domain, different weights can be used for the predictions of the different
descriptors.</p>
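As a sketch of this fusion step (the function and variable names here are illustrative, not from the paper), averaging the per-descriptor probability outputs could look as follows:

```python
# Hypothetical sketch of the 'high level' fusion scheme: each per-descriptor
# classifier outputs a probability per visual concept; the fused score is the
# (optionally weighted) average of those probabilities.

def fuse_predictions(per_descriptor_probs, weights=None):
    """per_descriptor_probs: one {concept: probability} dict per descriptor
    set. Returns the fused {concept: probability} dict."""
    n = len(per_descriptor_probs)
    if weights is None:
        weights = [1.0 / n] * n  # plain averaging, as described in the text
    fused = {}
    for probs, w in zip(per_descriptor_probs, weights):
        for concept, p in probs.items():
            fused[concept] = fused.get(concept, 0.0) + w * p
    return fused

sift_probs = {"Sky": 0.9, "Night": 0.2}
rgb_hist_probs = {"Sky": 0.7, "Night": 0.4}
print(fuse_predictions([sift_probs, rgb_hist_probs]))
```

Per-concept weights, as suggested later in the paper, would be passed via the `weights` argument.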
      </sec>
      <sec id="sec-2-2">
        <title>Multiple Targets Classification</title>
        <p>
          Following the recommendations from [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], we formally describe the machine
learning task that we consider here - multiple targets classification.
        </p>
        <p>We define the task of multiple targets prediction as follows:
Given:
– A description space X that consists of tuples of primitives (boolean, discrete
or continuous variables), i.e. ∀Xi ∈ X, Xi = (xi1 , xi2 , ..., xiD ), where D is
the size of a tuple (or number of descriptive variables),
– a target space Y , where each tuple consists of several variables that can be
either continuous or discrete, i.e., ∀Yi ∈ Y, Yi = (yi1 , yi2 , ..., yiT ), where T
is the size of a tuple (or number of target variables),
– a set of examples/instances E, where E = {(Xi, Yi)|Xi ∈ X, Yi ∈ Y, 1 ≤ i ≤</p>
        <p>N } and N is the number of examples of E (N = |E|), and
– a quality criterion q (which rewards models with high predictive accuracy
and low complexity).</p>
        <p>Find: a function f : X → Y such that f maximizes q. Here, the function f is
represented by decision trees, i.e., predictive clustering trees.</p>
        <p>If the tuples from Y (the target space) consist of continuous/numeric
variables then the task at hand is multiple targets regression. Likewise, if the tuples
from Y consist of discrete/nominal variables then the task is called multiple
targets classification.
</p>
      </sec>
      <sec id="sec-2-3">
        <title>Ensembles of PCTs for MTC</title>
        <p>
          In the PCT framework [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], a tree is viewed as a hierarchy of clusters: the top-node
corresponds to one cluster containing all data, which is recursively partitioned
into smaller clusters while moving down the tree.
        </p>
        <p>PCTs are constructed with a standard “top-down induction of decision trees”
(TDIDT) algorithm. The heuristic for selecting the tests is the reduction in
variance caused by partitioning the instances, where the variance V ar(S) is
defined by equation (1) below. Maximizing the variance reduction maximizes
cluster homogeneity and improves predictive performance.</p>
        <p>
          A leaf of a PCT is labeled with/predicts the prototype of the set of examples
belonging to it. With appropriate variance and prototype functions, PCTs can
handle different types of data, e.g., multiple targets [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], hierarchical multi-label
classification [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] or time series [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. A detailed description of the PCT framework
can be found in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. The PCT framework is implemented in the CLUS system,
which is available for download at http://www.cs.kuleuven.be/~dtai/clus.
        </p>
        <p>The prototype function returns a vector containing the probabilities that an
example belongs to a given class for each target attribute. These probabilities can
afterwards be used to calculate the majority class for each target attribute. The
variance function is computed as the sum of the Gini indices of the target variables:</p>
        <p>Var (E) = Σ_{i=1}^{T} Gini (E, yi)    (1)</p>
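A direct reading of Eq. (1) in code, under the assumption that each example carries a tuple of T discrete target values (the helper names are ours):

```python
# Multi-target variance as the sum of per-target Gini indices, following
# Eq. (1): Var(E) = sum over the T targets of Gini(E, y_i).

def gini_index(labels):
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def mtc_variance(target_tuples):
    """target_tuples: list of equal-length tuples (y_1, ..., y_T)."""
    T = len(target_tuples[0])
    return sum(gini_index([y[i] for y in target_tuples]) for i in range(T))

# Two discrete visual-concept targets over four images:
ys = [("Sky", "Day"), ("Sky", "Night"), ("NoSky", "Day"), ("Sky", "Day")]
print(mtc_variance(ys))  # → 0.75 (0.375 per target)
```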
        <p>
          For a detailed description of PCTs for MTC the reader is referred to [
          <xref ref-type="bibr" rid="ref1 ref4">1, 4</xref>
          ].
Next, we explain how PCTs are used in the context of an ensemble classifier,
as ensembles further improve the predictive performance of PCTs.
        </p>
        <p>
          Random Forests of PCTs To improve the predictive performance of PCTs,
we use ensemble methods. An ensemble classifier is a set of classifiers. Each
new example is classified by combining the predictions of each classifier from
the ensemble. These predictions can be combined by taking the average (for
regression tasks) or the majority vote (for classification tasks) [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. In our case,
the predictions in a leaf are the proportions of examples of different classes that
belong to it. We use averaging to combine the predictions of the different trees.
As for the base classifiers, a threshold should be specified to make a prediction.
        </p>
        <p>
          We use random forests as an ensemble learning technique. A random forest [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]
is an ensemble of trees, obtained both by bootstrap sampling, and by randomly
changing the feature set during learning. More precisely, at each node in the
decision tree, a random subset of the input attributes is taken, and the best
feature is selected from this subset (instead of the set of all attributes). The
number of attributes that are retained is given by a function f of the total
number of input attributes x (e.g., f (x) = x, f (x) = √x, f (x) = ⌊log2 x⌋ + 1).
        </p>
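The two sources of randomness described above can be sketched as follows (a toy illustration, not the CLUS code; the subset-size functions mirror the f(x) examples in the text):

```python
import math
import random

# Sketch of a random forest's randomization: a bootstrap replicate of the
# training set per tree, and a random attribute subset evaluated per node.

def subset_size(x, scheme):
    if scheme == "all":   # f(x) = x
        return x
    if scheme == "sqrt":  # f(x) = sqrt(x), rounded
        return round(math.sqrt(x))
    if scheme == "log":   # f(x) = floor(log2 x) + 1
        return int(math.log2(x)) + 1
    raise ValueError(scheme)

def bootstrap_sample(examples, rng):
    """Sample |examples| items with replacement."""
    return [rng.choice(examples) for _ in examples]

def random_attribute_subset(num_attrs, scheme, rng):
    """The attribute subset considered at one tree node."""
    return rng.sample(range(num_attrs), subset_size(num_attrs, scheme))

rng = random.Random(0)
print(subset_size(1024, "log"))  # → 11
print(len(random_attribute_subset(1024, "log", rng)))  # → 11
```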
      </sec>
      <sec id="sec-2-4">
        <title>Feature Extraction</title>
        <p>
          We use several commonly used techniques for feature extraction from
images. We employ two types of global image descriptors: gist features [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] and
a RGB color histogram, with 8 bins in each color channel for the RGB color
space.
        </p>
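The RGB color histogram mentioned above can be sketched as follows (a minimal pure-Python version; the joint 8x8x8 binning giving 512 bins is how we read the text):

```python
# Joint RGB histogram with 8 bins per channel (8*8*8 = 512 bins in total),
# normalized to sum to 1.

def rgb_histogram(pixels, bins=8):
    """pixels: iterable of (r, g, b) tuples with values in 0..255."""
    hist = [0] * (bins ** 3)
    step = 256 // bins  # 32 intensity values fall into each bin
    n = 0
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1
        n += 1
    return [h / n for h in hist]

h = rgb_histogram([(255, 0, 0), (250, 3, 1), (0, 0, 255)])
print(len(h))  # → 512; the two reddish pixels share one bin (2/3 of the mass)
```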
        <p>
          Local features include scale-invariant feature transforms (SIFT) extracted
densely on a multi-scale grid [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The dense sampling gives an equal weight to
all key-points, independent of their spatial location in the image. To overcome
this limitation, one can use spatial pyramids of 1x1, 2x2 and 1x3 regions [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
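The region layout of such a pyramid can be sketched as follows (illustrative only; whether the 1x3 grid splits horizontally or vertically is our assumption):

```python
# Spatial pyramid sketch: the image is divided into a 1x1, a 2x2 and a 1x3
# grid of regions, and a descriptor is computed per region, giving
# 1 + 4 + 3 = 8 sub-images per image.

def pyramid_regions(width, height, grids=((1, 1), (2, 2), (1, 3))):
    """Returns (x0, y0, x1, y1) boxes for every region of every grid."""
    regions = []
    for cols, rows in grids:
        for r in range(rows):
            for c in range(cols):
                regions.append((c * width // cols, r * height // rows,
                                (c + 1) * width // cols,
                                (r + 1) * height // rows))
    return regions

regions = pyramid_regions(640, 480)
print(len(regions))  # → 8
```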
        <p>
          We computed four different sets of SIFT descriptors over the following color
spaces: RGB, opponent, normalized opponent and gray. For each set of SIFT
descriptors, we use the codebook approach to avoid using all visual features of
an image [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
        <p>The generation of the codebook begins by randomly sampling 50 key-points
from each image and extracting SIFT descriptors in each key-point (i.e., each
key-point is described by a vector of numerical values). Then, to create the
codewords, we employ k-means clustering on the set of all key-points. We set
the number of clusters to 4000, thus we define a codebook with 4000 codewords (a
codeword corresponds to a single cluster and a codebook to the set of all clusters).
Afterwards, we assign the key-points to the discrete codewords predefined in the
codebook and obtain a histogram of the occurring visual features. This histogram
will contain 4000 bins, one for each codeword. To be independent of the total
number of key-points in an image, the histogram bins are normalized to sum up
to 1.</p>
        <p>
          The number of key-points and codewords (clusters) are user-defined
parameters of the system. The values used above (50 key-points and 4000 codewords)
are recommended for general images [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experimental Design</title>
      <sec id="sec-3-1">
        <title>Definition and Parameter Settings</title>
        <p>
          We evaluated our system on the image database from the visual concept
detection and annotation task part of ImageCLEF 2010. The image database consists
of training (8000) and test (10000) images. The images are labeled with 93 visual
concepts [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. A list of the visual concepts is presented in Table 2. The goal of
the task is to predict which of the visual concepts are present in each of the
testing images.
        </p>
        <p>We generated six sets of visual descriptors for the images: four sets of SIFT
descriptors (one detector, dense sampling, over four different color spaces) with
32000 bins for each set (8 sub-images, from the spatial pyramids: 1x1, 2x2 and
1x3, 4000 bins each). We also generated two sets of global descriptors (gist
features with 960 bins and RGB color histogram with 512 bins).</p>
        <p>The parameter values for the random forests were as follows: we used 100
base classifiers and the size of the feature subset was set to 10% of the number
of descriptive attributes.
</p>
      </sec>
      <sec id="sec-3-2">
        <title>Performance measures</title>
        <p>
          The evaluation of the results is done using three measures of performance
suggested by the organizers of the challenge: mean average precision (MAP),
F-measure and average ontology score (AOS) [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. The first score evaluates the
performance for each visual concept (concept-based evaluation), while the second
and the third evaluate the performance for each testing image (example-based
evaluation).
        </p>
        <p>The mean average precision is a widely used evaluation measure. For a given
target concept, the average precision can be calculated as the area under the
precision-recall curve for that target. Hence, it combines both precision and
recall into a single performance value. The average precision is calculated for
each visual concept separately and the obtained values are then averaged to
obtain the mean average precision.</p>
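As a sketch of this concept-based evaluation (a common non-interpolated AP variant that averages precision at the rank of each relevant image; the challenge may use a slightly different formulation):

```python
# Average precision (AP) for one concept from a ranked list of test images,
# and MAP as the mean of the per-concept AP values.

def average_precision(ranked_relevance):
    """ranked_relevance: 0/1 relevance flags, highest-scored image first."""
    hits, precisions = 0, []
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(rankings_per_concept):
    return (sum(average_precision(r) for r in rankings_per_concept)
            / len(rankings_per_concept))

print(average_precision([1, 0, 0, 1]))  # → 0.75
```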
        <p>The F-measure is also a widely used and well known measure. It is calculated
as the harmonic mean of precision and recall:</p>
        <p>F-measure = 2 · (Precision · Recall) / (Precision + Recall)    (2)</p>
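Eq. (2) as a one-line function, with a guard for the degenerate case:

```python
# F-measure: the harmonic mean of precision and recall, per Eq. (2).

def f_measure(precision, recall):
    if precision + recall == 0.0:
        return 0.0  # no true positives at all
    return 2.0 * precision * recall / (precision + recall)

print(f_measure(0.5, 1.0))  # a classifier with 50% precision and full recall
```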
        <p>
          The AOS measure calculates the misclassification cost for each missing or
wrongly annotated concept per image. The AOS score is based on structure
information (the distance between concepts in the provided ontology of concepts) and
relationships from the ontology, together with the agreement between annotators for a
concept, extended with a misclassification cost that incorporates the Flickr context
similarity costs map [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
        <p>
          Additionally, we report Equal Error Rate (EER) and Area Under the ROC
curve (AUC). However, we do not discuss these evaluation measures because
they were not used for the evaluation of the submitted runs by the organizers of
the competition [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. This was because precision/recall analysis gives a more intuitive and
sensitive evaluation than ROC analysis.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>Submitted runs</title>
        <p>We have submitted four different runs (see Table 1). We do not use the EXIF
metadata and the Flickr user tags provided for the photos. This means that all
our runs consider automatic annotation using only visual information.</p>
        <p>The runs can be divided using the following criteria: used descriptors and
rescaling of the outputs. We used two different sets of descriptors: only SIFT
(local descriptors) and SIFT combined with global descriptors (RGBHist and
Gist). Since the AOS measure uses a threshold of 0.5 to determine whether an image
is annotated with a concept, we linearly scale the probabilities to cope with the
skewed distribution of the visual concepts. The linear scaling can be done on low
level or high level. With the low level approach, we linearly scale the outputs
from each classifier (obtained from the separate descriptors) and then average
these values. For the high level approach, we linearly scale the averaged output
of the classifiers.</p>
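The paper does not spell out the linear scaling, so the following is an assumed piecewise-linear variant: a per-concept operating point t (e.g., chosen on the training set) is mapped to 0.5, so the fixed 0.5 cut-off of the AOS measure can still be applied to concepts with skewed score distributions:

```python
# Assumed piecewise-linear rescaling: probabilities at or below the operating
# point t are mapped into [0, 0.5], probabilities above it into (0.5, 1].

def rescale(p, t):
    """Map probability p in [0, 1] so that threshold t lands on 0.5."""
    if t <= 0.0:
        return 1.0 if p > 0.0 else 0.5
    if t >= 1.0 or p <= t:
        return 0.5 * p / t
    return 0.5 + 0.5 * (p - t) / (1.0 - t)

print(rescale(0.2, 0.2))  # → 0.5: a score at the operating point maps to 0.5
```

In the low-level run this would be applied to each per-descriptor classifier output before averaging; in the high-level run, to the averaged output.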
        <p>The runs can be summarized as follows:
– SIFT + RGBHist + Gist (HLScale): local descriptors (SIFT) and global
descriptors (RGBHist and Gist) with high level linear scaling.
– SIFT (HLScale): local descriptors (SIFT) with high level linear scaling.
– SIFT + RGBHist + Gist (LLScale): local descriptors (SIFT) and global
descriptors (RGBHist and Gist) with low level linear scaling.</p>
        <p>– SIFT (LLScale): local descriptors (SIFT) with low level linear scaling.
</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results and Discussion</title>
      <p>
        We obtain the highest average precision (AP) values for the following visual concepts: Neutral-Illumination, No-Visual-Season,
No-Blur, No-Persons, Outdoor, Sky, Day, Landscape-Nature, No-Visual-Time,
Clouds, Natural, Plants. We obtain lower AP values for the concepts that are less
represented in the training set of images (e.g., rain, horse, skateboard, graffiti...)
and the ‘difficult’ concepts (e.g., abstract, technical, boring). The agreement of
human annotators on the ‘difficult’ concepts is ∼ 75% [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>Further improvements can be expected if different weighting schemes are
used (to combine the predictions of the various descriptors). The weight of the
descriptors can be adapted for each visual concept separately. For instance, the
SIFT descriptors are invariant to color changes, and they do not predict well
concepts where illumination is important. Thus, the weight of the SIFT
descriptors in the combined predictions for those concepts should be decreased. We
should also find better descriptors for these concepts, such as estimates of the color
temperature and the overall light intensity.
      <p>Another approach is to tackle the problem with the skewed distribution of
concepts over the images. One approach could be the generation of virtual images
containing the under-represented visual concepts. These virtual images can be
obtained by re-scaling, translating, rotating or changing the brightness of the
images from the under-represented concepts.
</p>
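This virtual-image idea can be sketched as follows (an illustrative toy on a grayscale pixel grid; the specific transforms and names are ours):

```python
# Simple label-preserving transforms (mirroring, brightness shifts) that
# multiply the training examples for an under-represented concept.

def mirror(image):
    """Horizontal flip of a 2-D list of pixel intensities."""
    return [row[::-1] for row in image]

def brighten(image, delta):
    """Shift intensities by delta, clamped to the 0..255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in image]

def virtual_images(image):
    return [mirror(image),
            brighten(image, 30),
            brighten(image, -30),
            mirror(brighten(image, 30))]

img = [[10, 200], [0, 255]]
print(len(virtual_images(img)))  # → 4 extra training images from one original
```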
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>Multiple targets classification (MTC) problems are encountered increasingly
often in image annotation. However, flat classification machine learning approaches
are predominantly applied in this area. In this paper, we propose to exploit the
dependencies between the different target attributes by using ensembles of trees
for MTC. Our approach to MTC builds a single classifier that simultaneously
predicts all of the visual concepts present in the images at once. This means that
adding new visual concepts will only slightly decrease the computational efficiency,
while for the other approaches, which create a classifier for each visual
concept separately, it means learning an additional classifier.</p>
      <p>Applied on the image database from the visual concept detection and
annotation task part of ImageCLEF 2010, our approach was ranked fourth for the
example-based performance measures (Ontology Score with FCS and Average</p>
      <p>F-measure) and fifth for the concept-based evaluation (Mean Average Precision),
out of 17 competing groups.</p>
      <p>The system we presented is general. It can be easily extended with new
feature extraction methods, and it can thus be easily applied to other domains,
types of images and other classification schemes. In addition, it can handle
arbitrarily sized hierarchies organized as trees or directed acyclic graphs.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Blockeel</surname>
          </string-name>
          , H.,
          <string-name>
            <surname>De Raedt</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramon</surname>
          </string-name>
          , J.:
          <article-title>Top-down induction of clustering trees</article-title>
          .
          <source>In Proc. of the 15th ICML</source>
          ,
          <fpage>55</fpage>
          -
          <lpage>63</lpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Breiman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Random Forests</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>45</volume>
          ,
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Džeroski</surname>
          </string-name>
          , S.:
          <article-title>Towards a General Framework for Data Mining</article-title>
          .
          <source>In Proc. of the 5th KDID</source>
          , LNCS vol.
          <volume>4747</volume>
          ,
          <fpage>259</fpage>
          -
          <lpage>300</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Kocev</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vens</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Struyf</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , Džeroski, S.:
          <article-title>Ensembles of Multi-Objective Decision Trees</article-title>
          .
          <source>Proc. ECML</source>
          <year>2007</year>
          , LNAI vol.
          <volume>4701</volume>
          ,
          <fpage>624</fpage>
          -
          <lpage>631</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Lazebnik</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmid</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ponce</surname>
          </string-name>
          , J.:
          <article-title>Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories</article-title>
          .
          <source>CVPR</source>
          ,
          <fpage>2169</fpage>
          -
          <lpage>2178</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J. Z.</given-names>
          </string-name>
          :
          <article-title>Real-Time Computerized Annotation of Pictures</article-title>
          .
          <source>IEEE Trans. on Pattern Analysis and Machine Intelligence</source>
          ,
          <volume>30</volume>
          (
          <issue>6</issue>
          ),
          <fpage>985</fpage>
          -
          <lpage>1002</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Lowe</surname>
            ,
            <given-names>D. G.</given-names>
          </string-name>
          :
          <article-title>Distinctive Image Features from Scale-Invariant Keypoints</article-title>
          .
          <source>International Journal of Computer Vision</source>
          ,
          <volume>60</volume>
          (
          <issue>2</issue>
          ),
          <fpage>91</fpage>
          -
          <lpage>110</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Nowak</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dunker</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Overview of the CLEF 2009 Large-Scale Visual Concept Detection and Annotation Task</article-title>
          .
          <source>Multilingual Information Access Evaluation Vol. II Multimedia Experiments: 10th Workshop of the CLEF</source>
          <year>2009</year>
          , to appear in LNCS, Corfu, Greece (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Nowak</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lukashevich</surname>
          </string-name>
          , H.:
          <article-title>Multilabel classification evaluation using ontology information</article-title>
          , Workshop on IRMLeS, Heraklion, Greece (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. Visual Concept Detection and Annotation Task at ImageCLEF 2010: http://www.imageclef.org/2010/PhotoAnnotation</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Oliva</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torralba</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope</article-title>
          .
          <source>International Journal of Computer Vision</source>
          ,
          <volume>42</volume>
          (
          <issue>3</issue>
          ),
          <fpage>145</fpage>
          -
          <lpage>175</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Slavkov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gjorgjioski</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Struyf</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Džeroski</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Finding explained groups of time-course gene expression profiles with predictive clustering trees</article-title>
          .
          <source>Molecular BioSystems</source>
          ,
          <volume>6</volume>
          (
          <issue>4</issue>
          ),
          <fpage>729</fpage>
          -
          <lpage>740</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Van de Sande</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gevers</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Snoek</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>A comparison of color features for visual concept classification</article-title>
          .
          <source>CIVR</source>
          ,
          <fpage>141</fpage>
          -
          <lpage>150</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Tommasi</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Orabona</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caputo</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Discriminative cue integration for medical image annotation</article-title>
          .
          <source>Pattern Recognition Letters</source>
          ,
          <volume>29</volume>
          (
          <issue>15</issue>
          ),
          <fpage>1996</fpage>
          -
          <lpage>2002</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Vens</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Struyf</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schietgat</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Džeroski</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blockeel</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Decision trees for hierarchical multi-label classification</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>73</volume>
          (
          <issue>2</issue>
          ),
          <fpage>185</fpage>
          -
          <lpage>214</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>