<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>KDEVIR at ImageCLEF 2014 Scalable Concept Image Annotation Task: Ontology-based Automatic Image Annotation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ismat Ara Reshma</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Md Zia Ullah</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Masaki Aono</string-name>
<email>aono@tut.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering, Toyohashi University of Technology</institution>
          ,
          <addr-line>1-1 Hibarigaoka, Tempaku-Cho, Toyohashi, 441-8580, Aichi</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <fpage>386</fpage>
      <lpage>397</lpage>
      <abstract>
<p>In this paper, we describe our participation in the ImageCLEF 2014 Scalable Concept Image Annotation task. In this participation, we propose a novel approach to automatic image annotation that uses ontology at several steps of supervised learning. In this regard, we construct a tree-like ontology for each annotating concept of images, using WordNet and Wikipedia as primary sources of knowledge. The constructed ontologies are used throughout the proposed framework, including several phases of training and testing of one-vs-all SVM classifiers. Experimental results clearly demonstrate the effectiveness of the proposed framework.</p>
      </abstract>
      <kwd-group>
        <kwd>Concept Detection</kwd>
<kwd>Classification</kwd>
        <kwd>Image Annotation</kwd>
        <kwd>Ontology</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
<p>Due to the explosive growth of digital technologies, collections of images are increasing tremendously at every moment. The ever-growing size of image collections has created the need for image retrieval (IR) systems; however, the task of IR from a large volume of images is formidable, since binary stream data is often hard to decode, and we have very limited semantic contextual information about the image content.</p>
      <p>
To enable users to search images by semantic meaning, automatically annotating images with concepts or keywords using machine learning is a popular technique. During the last two decades, a large number of studies have been launched using state-of-the-art machine learning techniques [1-4] (e.g. SVMs, Logistic Regression). In such efforts, each image is most often assumed to have only one class label. However, this is not necessarily true for real-world applications, as an image might be associated with multiple semantic tags. Therefore, it is a practical and important problem to accurately assign multiple labels to one image. To alleviate the above problem, i.e. to annotate each image with multiple labels, a number of studies have been carried out; among them, adopting probabilistic tools such as Bayesian methods is popular [5-7]. More reviews can be found in [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]. However, the accuracy of such approaches depends on expensive human-labeled training data.
      </p>
      <p>Fortunately, some initiatives have been taken to reduce the reliance on manually labeled image data [10-13] by using cheaply gathered Web data. However, the "semantic gap" between low-level visual features and high-level semantics still remains, and accuracy has not improved remarkably.</p>
      <p>
In order to reduce the dependency on human-labeled image data, ImageCLEF [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] has been organizing the photo annotation and retrieval task for the last several years, where the training data is a large collection of Web images without ground-truth labels. Although the methods proposed in this task have shown encouraging performance on a large-scale dataset, unfortunately none of them utilizes the semantic relations among annotating concepts. In this paper, we describe the participation of KDEVIR in the ImageCLEF 2014 Scalable Concept Image Annotation Task [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], where we proposed a novel approach, ontology-based supervised learning, that exploits both low-level visual features and high-level semantic information of images during training and testing. The evaluation results reveal the effectiveness of the proposed framework.
      </p>
      <p>The rest of the paper is organized as follows: Section 2 describes the proposed framework. Section 3 describes our submitted runs for this task as well as comparison results with other participants' runs. Finally, concluding remarks and some future directions of our work are given in Section 4.</p>
    </sec>
    <sec id="sec-2">
      <title>Given Training Data: Large</title>
    </sec>
    <sec id="sec-3">
      <title>Scale Web Image Corpus</title>
      <sec id="sec-3-1">
        <title>Visual</title>
      </sec>
      <sec id="sec-3-2">
        <title>Features</title>
      </sec>
      <sec id="sec-3-3">
        <title>Metadata</title>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Processed Training Data</title>
    </sec>
    <sec id="sec-5">
      <title>Training</title>
      <sec id="sec-5-1">
        <title>Data for c1</title>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Training</title>
      <sec id="sec-6-1">
        <title>Data for c2</title>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Training</title>
      <sec id="sec-7-1">
        <title>Data for cN</title>
        <p>!"#$%&amp;'()"*#%+,#-(.%</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Features and/or</title>
    </sec>
    <sec id="sec-9">
      <title>Metadata</title>
      <sec id="sec-9-1">
        <title>Concepts</title>
        <p>{c1, c2,.. cN}</p>
        <sec id="sec-9-1-1">
          <title>Pre-processing of</title>
        </sec>
        <sec id="sec-9-1-2">
          <title>Training Data</title>
        </sec>
        <sec id="sec-9-1-3">
          <title>Constructing</title>
        </sec>
        <sec id="sec-9-1-4">
          <title>Ontology</title>
        </sec>
      </sec>
      <sec id="sec-9-2">
        <title>Constructed Ontology</title>
      </sec>
      <sec id="sec-9-3">
        <title>Ontology of c1 Ontology of c2 Ontology of cN</title>
        <sec id="sec-9-3-1">
          <title>Training Classifier</title>
        </sec>
      </sec>
      <sec id="sec-9-4">
        <title>Trained</title>
      </sec>
      <sec id="sec-9-5">
        <title>Models</title>
        <sec id="sec-9-5-1">
          <title>Generating</title>
        </sec>
        <sec id="sec-9-5-2">
          <title>Annotations</title>
        </sec>
      </sec>
      <sec id="sec-9-6">
        <title>Final Annotations</title>
        <p>Proposed Framework
In this section, we describe our method for annotating images with a list of
semantic concepts. We divide our method into four steps: 1) Constructing
Ontology, 2) Pre-processing of Training Data, 3) Training Classi er, and 4) Generating
Annotations. An overview of our proposed framework is depicted in Fig. 1.
2.1</p>
        <p>
Ontologies are structural frameworks for organizing information about the world or some part of it. In computer science and information science, an ontology is defined as an explicit, formal specification of a shared conceptualization [
          <xref ref-type="bibr" rid="ref16 ref17">16, 17</xref>
          ], and it formally represents knowledge as a set of concepts within a domain, together with the relationships between those concepts. To utilize these relationships in image annotation, we construct an ontology for each concept in a predefined list of concepts used to annotate images.
        </p>
        <p>
In the real world, an image might contain multiple objects (aka concepts) in a single frame, where the concepts are inter-related and naturally co-appear. We use these hypotheses to construct ontologies for concepts. In this regard, we utilize WordNet [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] and Wikipedia as primary sources of knowledge. However, WordNet and Wikipedia themselves have some limitations that make it difficult to construct an ontology from them alone. For example, WordNet considers only a very small number of conceptual relations and very few cross-POS (parts of speech) pointers among words; on the other hand, Wikipedia contains a wide range of semantic information but is not structured like WordNet and is prone to noise, since it is free for both expert and non-expert contributors to edit. Since both sources have limitations, during knowledge extraction we choose those parts of both sources that are less prone to noise and semantically more confident. Thus, we take advantage of both the structured representation of WordNet and the wide diversity of semantic relations in Wikipedia.
        </p>
        <p>
Let C be a set of concepts. We construct a tree-like [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] ontology for each concept c ∈ C. In order to build the ontologies, first of all, we select some types of relations, including: 1) taxonomical, Rt; 2) bionomical, Rb; 3) food habitual, Rfh; and 4) weak hierarchical, Rwh. The first and fourth types define relations among concepts of any kind, whereas the second and third types define relations among concepts that are biological living things. The relations were extracted empirically according to our observations on WordNet and hundreds of Wikipedia articles. According to semantic confidence, the order of the relation types is: Rt &gt; Rb &gt; Rfh &gt; Rwh. For each type of relation, we extract a set of relations as listed below:
- Rt = {inHypernymPathOf, superClassOf}
- Rb = {habitat, inhabit, liveIn, foundOn, foundIn, locateAt, nativeTo}
- Rfh = {liveOn, feedOn}
- Rwh = {kindOf, typeOf, representationOf, methodOf, appearedAt, appearedIn, ableToProduce}
Finally, we apply some "if-then" type inference rules to add an edge from a parent-concept to a child-concept by leveraging the above relations, as illustrated in Fig. 2 and sketched in the code below. In addition, for some concepts, especially adjectives (e.g. indoor, outdoor), which have neither much lexical information in WordNet nor any Wikipedia article, we manually determine the relations to other concepts.
        </p>
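        <p>To make the rule concrete, the following minimal Python sketch (illustrative only, not our implementation; the relation triples and their extraction from WordNet and Wikipedia are assumed) shows how the selected relations induce parent-to-child edges:</p>
        <preformat>
# Minimal sketch: if-then edge insertion from relation triples.
# Relation types, in decreasing order of semantic confidence: Rt > Rb > Rfh > Rwh.
R_T  = {"inHypernymPathOf", "superClassOf"}                      # taxonomical
R_B  = {"habitat", "inhabit", "liveIn", "foundOn", "foundIn",
        "locateAt", "nativeTo"}                                  # bionomical
R_FH = {"liveOn", "feedOn"}                                      # food habitual
R_WH = {"kindOf", "typeOf", "representationOf", "methodOf",
        "appearedAt", "appearedIn", "ableToProduce"}             # weak hierarchical

SELECTED = R_T | R_B | R_FH | R_WH

def add_edges(triples):
    """If (child, relation, parent) holds for a selected relation,
    add an edge from the parent-concept to the child-concept."""
    edges = {}
    for child, rel, parent in triples:
        if rel in SELECTED:
            edges.setdefault(parent, set()).add(child)
    return edges

# Toy facts (hand-written for illustration):
facts = [("highway", "kindOf", "road"), ("frog", "inhabit", "pond")]
print(add_edges(facts))   # {'road': {'highway'}, 'pond': {'frog'}}
        </preformat>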
        <p>
          Parent-concept !"!
$% &amp; '%!
!#! Child-concept
For a given list of concepts, we select the most weighted images for each concept
from the noisy training images by exploiting their metadata (details about
metadata are given in [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]) and pre-constructed concept ontologies. In this regards,
rst of all, we detect the nouns and adjectives from metadata using WordNet
followed by singularizing with Pling Stemmer1. Secondly, detected terms from
metadata: Web text (scofeat), keywords, and URLs are weighted by BM25 [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ],
mean reciprocal rank (MRR), and a constant weight,# 2 (0; 1) respectively,
which is followed by detecting concepts from the weighted sources on
appearance basis. Thus, we have three lists of possible weighted concepts from three
different sources of metadata for each image.
        </p>
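        <p>A minimal sketch of the per-source term weighting follows (illustrative only: the BM25 parameters, the corpus statistics, and the particular value of the constant URL weight are assumptions):</p>
        <preformat>
import math

def bm25(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):
    """Okapi BM25 weight [20] of a term in an image's Web text."""
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))

def mrr(rank):
    """Reciprocal-rank weight of a term in the keyword list."""
    return 1.0 / rank

THETA = 0.5          # constant weight for URL terms, some value in (0, 1)

# Three weighted term lists for one image (toy statistics):
web_text = {"road": bm25(tf=3, df=10, n_docs=1000, doc_len=50, avg_len=40)}
keywords = {"highway": mrr(1), "soil": mrr(3)}
urls     = {"road": THETA}
print(web_text, keywords, urls)
        </preformat>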
        <p>We take the inverted index of the image-wise weighted concepts, thus generating concept-wise weighted images. To aggregate the images for a concept from the three sources, we normalize the weights of the images and linearly combine the normalized BM25 (nBM25) weight, the normalized MRR (nMRR), and the constant weight ϑ to generate the final weight of each image. From the resultant aggregated list of images, the top-m images are primarily selected for each concept.</p>
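        <p>The inversion and aggregation can be sketched as follows (illustrative only; max-normalization per source and unit combination coefficients are assumptions, as only the linear combination itself is specified above):</p>
        <preformat>
from collections import defaultdict

def invert(per_image):
    """{image: {concept: weight}} -> {concept: {image: weight}}"""
    inv = defaultdict(dict)
    for img, cw in per_image.items():
        for concept, w in cw.items():
            inv[concept][img] = w
    return inv

def select_top_m(sources, m):
    """sources: one inverted index per metadata source; weights are
    max-normalized per source, summed, and the top-m images kept."""
    combined = defaultdict(lambda: defaultdict(float))
    for inv in sources:
        for concept, imgs in inv.items():
            peak = max(imgs.values())
            for img, w in imgs.items():
                combined[concept][img] += w / peak
    return {c: sorted(imgs, key=imgs.get, reverse=True)[:m]
            for c, imgs in combined.items()}

src = invert({"img1": {"road": 2.0}, "img2": {"road": 1.0}})
print(select_top_m([src], m=1))   # {'road': ['img1']}
        </preformat>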
        <p>Finally, in order to increase recall, we merge the primarily selected training images of each concept into its predecessor concepts of highest semantic confidence (i.e. predecessors connected by rt ∈ Rt) by leveraging our concept ontologies. Thus, we enhance the number of training images per concept as well as the number of annotated concepts per image.</p>
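        <p>A minimal sketch of this recall-enhancing merge, assuming the ontology is given as a mapping from each concept to its Rt-predecessors:</p>
        <preformat>
def merge_with_predecessors(selected, parents):
    """Copy each concept's images into all of its Rt-predecessors,
    walking up the tree-like ontology."""
    merged = {c: set(imgs) for c, imgs in selected.items()}
    for concept, imgs in selected.items():
        stack = list(parents.get(concept, ()))
        while stack:
            p = stack.pop()
            merged.setdefault(p, set()).update(imgs)
            stack.extend(parents.get(p, ()))
    return merged

# "highway" images also become training images for its predecessor "road":
print(merge_with_predecessors({"highway": {"img1"}}, {"highway": ["road"]}))
        </preformat>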
      </sec>
      <sec id="sec-2-3">
        <title>Training Classifier</title>
        <p>Image annotation is a multi-class, multi-label classification problem; current state-of-the-art classifiers are not able to solve this problem in their usual form. Towards this problem, we propose a novel technique of using ontologies during different phases of learning a classifier. In this regard, we choose Support Vector Machines (SVMs) as the classifier for their robustness of generalization. We subdivide the whole problem into several sub-problems according to the number of concepts, i.e. we train SVMs for each concept separately, since using such a large dataset at once is not rational in terms of memory and time.</p>
        <p>[Fig. 3. Example communities of inter-linked concepts, including highway, road, nighttime, soil, and unpaved.]</p>
        <p>Another problem is that, along with the various parameters, the classification accuracy of SVMs depends on the positive and negative examples used to train the classifier. It is obvious that if classifiers are trained with wrong examples, the predictions will be wrong. However, selecting appropriate training examples is formidable without any semantic clues. For example, if we train a classifier for "soil" without taking into account its semantic inter-links with other concepts, one might choose only the "soil" community of Fig. 3 as positive examples and the remainder as negative examples. However, this might result in a wrongly trained model, since semantically "unpaved" contains soil and hence should not be among the negative examples. To handle this issue, we use our pre-constructed concept ontologies. We randomly select, with replacement, n folds of positive image examples for each concept from its image list, and negative examples from the image lists of other concepts that are not its successors of strong or weak semantic confidence in its ontology.</p>
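        <p>A minimal sketch of this ontology-aware sampling (the successor sets are assumed precomputed from the concept ontologies; the fold size is arbitrary):</p>
        <preformat>
import random

def sample_fold(concept, image_lists, successors, size, rng):
    """One fold: positives drawn with replacement from the concept's own
    list; negatives only from concepts that are not its successors."""
    pos = rng.choices(image_lists[concept], k=size)
    banned = {concept} | successors.get(concept, set())
    pool = [img for c, imgs in image_lists.items()
            if c not in banned for img in imgs]
    neg = rng.choices(pool, k=size)
    return pos, neg

# "unpaved" is a successor of "soil", so its images are never negatives
# for the "soil" classifier:
lists = {"soil": ["s1", "s2"], "unpaved": ["u1"], "road": ["r1", "r2"]}
print(sample_fold("soil", lists, {"soil": {"unpaved"}}, 2, random.Random(0)))
        </preformat>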
        <p>
From the n folds of positive and negative examples, we train n probabilistic one-vs-all SVM models for each concept, where n ∈ [1, 10]. We use LIBSVM [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] to learn the SVM models. As the kernel, two hybrid kernels are plugged in, instead of the default choice of a linear or Gaussian kernel, since image classification is a nonlinear problem and the distribution of image data is unknown. We choose the histogram intersection kernel (HIK) [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] as the primary kernel, which is further used to generate the two hybrid kernels. The HIK is defined as:
$$k_{HI}\big(h^{(a)}, h^{(b)}\big) = \sum_{q=1}^{l} \min\big(h^{(a)}_q, h^{(b)}_q\big) \qquad (1)$$
where h^(a) and h^(b) are two normalized histograms of l bins; in the context of image data, two feature vectors of l dimensions.
        </p>
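        <p>A minimal NumPy sketch of Eq. (1), producing the Gram matrix that a precomputed-kernel SVM (e.g. LIBSVM's precomputed mode) can consume; the toy data and L1 normalization are for illustration:</p>
        <preformat>
import numpy as np

def hik_gram(X, Y):
    """HIK Gram matrix between rows of X (n, l) and rows of Y (m, l)."""
    K = np.empty((X.shape[0], Y.shape[0]))
    for i, x in enumerate(X):
        K[i] = np.minimum(x, Y).sum(axis=1)   # k_HI(x, y) for all y in Y
    return K

rng = np.random.default_rng(0)
X = rng.random((4, 8))
X /= X.sum(axis=1, keepdims=True)             # L1-normalized histograms
K = hik_gram(X, X)
assert np.allclose(K, K.T)                    # the HIK matrix is symmetric
        </preformat>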
        <p>One of the hybrid kernels is a convex combination of HIKs (CCHIK), generated from the low-level visual features of images, defined as:
$$K^{(0)} = \frac{1}{|F|} \sum_{s=1}^{|F|} K_{HI}(f_s) \qquad (2)$$
where K_HI(f_s) is a HIK matrix computed from feature vector type f_s ∈ F; F is the set of visual feature types (details about the visual features used are given in [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]); and |F| is the number of elements in F. In this task, |F| = 7.
        </p>
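        <p>Eq. (2) can be sketched as an equal-weight average of per-feature-type HIK matrices (the toy feature matrices below stand in for the seven feature types of the task):</p>
        <preformat>
import numpy as np

def hik_gram(X):
    K = np.empty((X.shape[0], X.shape[0]))
    for i, x in enumerate(X):
        K[i] = np.minimum(x, X).sum(axis=1)
    return K

def cchik(feature_matrices):
    """K(0): equal-weight average of one HIK matrix per feature type."""
    return sum(hik_gram(X) for X in feature_matrices) / len(feature_matrices)

rng = np.random.default_rng(0)
# Toy stand-ins for the |F| = 7 visual feature types of the task:
features = [rng.random((5, d)) for d in (16, 32, 8, 64, 24, 12, 40)]
K0 = cchik(features)
print(K0.shape)   # (5, 5)
        </preformat>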
        <p>
Another hybrid kernel is the context-dependent kernel (CDK) [
          <xref ref-type="bibr" rid="ref21 ref24">21, 24</xref>
          ], defined as:
        </p>
        <p>$$K^{(t+1)} = K^{(0)} + \gamma\, P K^{(t)} P^{\top} \qquad (3)$$
where K^(0) is the CCHIK kernel, P is the left stochastic adjacency matrix between images, with each entry proportional to the number of shared labels, and γ ≥ 0. Unlike the original CDK, here we consider semantic links that emerge from ontological information along with contextual links (as shown in Fig. 3). These kernels are plugged into the SVMs for training and testing.</p>
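        <p>A minimal sketch of the recursion in Eq. (3); the toy shared-label matrix, the value of γ, and the number of iterations are assumptions for illustration:</p>
        <preformat>
import numpy as np

def cdk(K0, P, gamma=0.1, iters=3):
    """Iterate K(t+1) = K(0) + gamma * P K(t) P^T starting from K(0)."""
    K = K0.copy()
    for _ in range(iters):
        K = K0 + gamma * (P @ K @ P.T)
    return K

rng = np.random.default_rng(0)
n = 5
shared = rng.integers(0, 3, size=(n, n)).astype(float)  # toy shared-label counts
P = shared / np.maximum(shared.sum(axis=0, keepdims=True), 1.0)  # left stochastic
K_context = cdk(np.eye(n), P)
print(K_context.shape)   # (5, 5)
        </preformat>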
      </sec>
      <sec id="sec-2-4">
        <title>Generating Annotations</title>
        <p>The trained models generated in the previous subsection are used to predict annotations. Given a test image with its visual features (of the same types as the training images' visual features) and URLs as metadata, the system finds the concepts in the URLs on an appearance basis, as was done for the URLs of the training images. The visual features and detected concepts are used to calculate the kernel values mentioned in the previous subsection, which are in turn used for predicting annotations. For the given test image, if the model of a particular concept responds positively, the image is considered as voted for by the current model, i.e. the corresponding concept is primarily selected for annotation. At the same time, track is kept of the predicted probability and the vote. This process is repeated for all learned models of all concepts. The concept-wise predicted probabilities and votes are accumulated over the n models. In the second-level selection, empirical thresholds on the accumulated probabilities and votes are used to select the more relevant annotations. In the third level, we take the top-k weighted concepts, and finally, the test image is annotated with the selected concepts along with their predecessor concepts in the concept ontologies.</p>
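        <p>The three selection levels can be sketched as follows (the thresholds and k are placeholders; in our framework these values are set empirically on the development set):</p>
        <preformat>
from collections import defaultdict

def annotate(model_outputs, parents, prob_th, vote_th, k):
    """model_outputs: {concept: [(voted, probability), ...]} over n models."""
    votes, probs = defaultdict(int), defaultdict(float)
    for concept, outs in model_outputs.items():
        for voted, p in outs:                 # accumulate over the n models
            votes[concept] += int(voted)
            probs[concept] += p
    # second level: empirical thresholds on accumulated votes/probabilities
    kept = [c for c in model_outputs
            if votes[c] >= vote_th and probs[c] >= prob_th]
    # third level: top-k concepts by accumulated probability
    kept = sorted(kept, key=probs.get, reverse=True)[:k]
    # final annotations include each kept concept's ontology predecessors
    final = set(kept)
    for c in kept:
        final.update(parents.get(c, ()))
    return final

outs = {"highway": [(True, 0.9), (True, 0.8)], "soil": [(False, 0.2)] * 2}
print(annotate(outs, {"highway": ["road"]}, prob_th=1.0, vote_th=2, k=3))
# {'highway', 'road'}
        </preformat>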
      </sec>
    </sec>
    <sec id="sec-3">
      <title>KDEVIR Runs and Comparative Results</title>
        <p>
We submitted a total of ten runs, which differ from each other in terms of: whether ontology is used (and, if so, which relation types of different semantic confidence are used during the final stage of generating annotations); the kernel used (e.g. CCHIK, CDK); the number of primarily selected training images, m; and the number of trained models per concept, n. The configurations of all runs are given in Table 1, where runs are arranged according to their original names to ease the flow of description. All the parameters used in our proposed framework were set empirically to obtain the optimal F-measure based on samples (MF-samples) of the corresponding run on the development set. Details about all the performance
measures are given in [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
      <p>In Figs. 4 and 5, comparisons of our runs (denoted KDEVIR-*) and other participants' runs are illustrated. They reveal the effectiveness of our proposed approach over the other participants' runs. Among our submitted runs, in Runs 1 and 7 we did not exploit semantic information from the ontology, in order to compare the effectiveness of our proposed ontology-based approach against an ontology-free, ordinary one-vs-all SVM setting with CCHIK and CDK, respectively. The comparison results show that the proposed approach tremendously outperforms the ordinary one-vs-all SVM setting.</p>
      <p>[Fig. 4. Comparison (official results published by the ImageCLEF 2014 organizers at http://www.imageclef.org/2014/annotation/results) of our runs (denoted KDEVIR-*) and other participants' runs on the test set. Acronyms stand for RUC: Renmin U. of China, DISA-MU: Masaryk U. in Czech Republic, MIL: Tokyo U., MindLab: National U. of Colombia, MLIA: Kyushu U. in Japan, IPL: Athens U. of Economics and Business, IMC-FU: Fudan U. in China, NII: National Institute of Informatics in Japan, FINKI: Ss. Cyril and Methodius U. in Macedonia, INAOE: National Institute of Astrophysics, Optics and Electronics in Mexico. (a) mean F-measures for samples (MF-samples), (b) mean F-measures for concepts (MF-concepts), and (c) mean average precision for samples (MAP-samples).]</p>
      <p>[Fig. 5. Comparison (official results published by the ImageCLEF 2014 organizers at http://www.imageclef.org/2014/annotation/results) of our runs (denoted KDEVIR-*) and other participants' runs on three different subsets of the test set in terms of mean F-measures for samples (MF-samples). Acronyms as in Fig. 4. (a) the subset of the test set seen during development, (b) the subset unseen during development, and (c) the subset containing both seen and unseen samples.]</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>In this paper, we described the participation of KDEVIR in the ImageCLEF 2014 Scalable Concept Image Annotation task, where we proposed a novel approach for annotating images using ontologies at several phases of supervised learning from large-scale noisy training data.</p>
      <p>The evaluation results reveal that our proposed approach achieved the best performance among the 58 submitted runs in terms of MF-samples and MF-concepts. Moreover, according to MAP-samples it produced comparable results, although we did not prioritize the annotated concepts that came from semantic relations (i.e. we assigned the weights of the originally predicted concepts to their corresponding semantically emerged concepts in the annotation of a particular image). In the future, we will consider fuzzy relations among concepts in the ontologies to facilitate more robust ranking of annotations, thus increasing MAP, and incorporate a distributed framework to ensure scalability.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgement</title>
      <p>This research was partially supported by the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Grant-in-Aid (B) 26280038.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Dumont</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maree</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wehenkel</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Geurts</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Fast multi-class image annotation with random windows and multiple output randomized trees</article-title>
          .
          <source>In: Proc. International Conference on Computer Vision Theory and Applications (VISAPP), Volume 2</source>
          (
          <year>2009</year>
          )
          <fpage>196</fpage>
          -
          <lpage>203</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Alham</surname>
            ,
            <given-names>N.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Parallelizing multiclass support vector machines for scalable image annotation</article-title>
          .
          <source>Neural Computing and Applications</source>
          <volume>24</volume>
          (
          <issue>2</issue>
          ) (
          <year>2014</year>
          )
          <fpage>367</fpage>
          -
          <lpage>381</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Qi</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Incorporating multiple svms for automatic image annotation</article-title>
          .
          <source>Pattern Recognition</source>
          <volume>40</volume>
          (
          <issue>2</issue>
          ) (
          <year>2007</year>
          )
          <fpage>728</fpage>
          -
          <lpage>741</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>S.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>J.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>S.K.</given-names>
          </string-name>
          :
          <article-title>Content-based image classification using a neural network</article-title>
          .
          <source>Pattern Recognition Letters</source>
          <volume>25</volume>
          (
          <issue>3</issue>
          ) (
          <year>2004</year>
          )
          <fpage>287</fpage>
          -
          <lpage>300</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Rui</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jin</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chua</surname>
            ,
            <given-names>T.S.:</given-names>
          </string-name>
          <article-title>A novel approach to auto image annotation based on pairwise constrained clustering and semi-naive Bayesian model</article-title>
          .
          <source>In: Multimedia Modelling Conference</source>
          ,
          <year>2005</year>
          .
          <article-title>MMM 2005</article-title>
          .
          <article-title>Proceedings of the 11th International</article-title>
          , IEEE (
          <year>2005</year>
          )
          <fpage>322</fpage>
          -
          <lpage>327</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fotouhi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Image content annotation using bayesian framework and complement components analysis</article-title>
          .
          <source>In: Image Processing</source>
          ,
          <year>2005</year>
          .
          <article-title>ICIP 2005</article-title>
          . IEEE International Conference on. Volume
          <volume>1</volume>
          ., IEEE (
          <year>2005</year>
          ) I-
          <fpage>1193</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Jeon</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lavrenko</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manmatha</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Automatic image annotation and retrieval using cross-media relevance models</article-title>
          .
          <source>In: Proceedings of the 26th Annual International ACM SIGIR Conference, ACM</source>
          (
          <year>2003</year>
          )
          <fpage>119</fpage>
          -
          <lpage>126</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Gong</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jia</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leung</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toshev</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ioffe</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Deep convolutional ranking for multilabel image annotation</article-title>
          .
          <source>arXiv preprint arXiv:1312.4894</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Islam</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>A review on automatic image annotation techniques</article-title>
          .
          <source>Pattern Recognition</source>
          <volume>45</volume>
          (
          <issue>1</issue>
          ) (
          <year>2012</year>
          )
          <fpage>346</fpage>
          -
          <lpage>362</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Cai</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>W.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wen</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          :
          <article-title>Hierarchical clustering of www image search results using visual, textual and link information</article-title>
          .
          <source>In: Proceedings of the 12th annual ACM international conference on Multimedia, ACM</source>
          (
          <year>2004</year>
          )
          <fpage>952</fpage>
          -
          <lpage>959</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>M.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weston</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Training highly multiclass classifiers</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>15</volume>
          (
          <year>2014</year>
          )
          <fpage>1</fpage>
          -
          <lpage>48</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Weston</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Usunier</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Large scale image annotation: learning to rank with joint word-image embeddings</article-title>
          .
          <source>Machine learning 81(1)</source>
          (
          <year>2010</year>
          )
          <fpage>21</fpage>
          -
          <lpage>35</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jing</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>W.Y.</given-names>
          </string-name>
          :
          <article-title>Annosearch: Image auto-annotation by search</article-title>
          .
          <source>In: Computer Vision and Pattern Recognition</source>
          ,
          <source>2006 IEEE Computer Society Conference on. Volume</source>
          <volume>2</volume>
          ., IEEE (
          <year>2006</year>
          )
          <fpage>1483</fpage>
          -
          <lpage>1490</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Caputo</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , Muller, H.,
          <string-name>
            <surname>Martinez-Gomez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Villegas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Acar</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patricia</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marvasti</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uskudarli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paredes</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cazorla</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia-Varea</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morell</surname>
          </string-name>
          , V.:
          <article-title>ImageCLEF 2014: Overview and analysis of the results</article-title>
          .
          <source>In: CLEF proceedings. Lecture Notes in Computer Science</source>
          . Springer Berlin Heidelberg (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Villegas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paredes</surname>
          </string-name>
          , R.:
          <article-title>Overview of the ImageCLEF 2014 Scalable Concept Image Annotation Task</article-title>
          . In:
          <article-title>CLEF 2014 Evaluation Labs</article-title>
          and Workshop, Online Working Notes. (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Gruber</surname>
            ,
            <given-names>T.R.</given-names>
          </string-name>
          :
          <article-title>Toward principles for the design of ontologies used for knowledge sharing?</article-title>
          .
          <source>International Journal of Human-Computer Studies 43(5)</source>
          (
          <year>1995</year>
          )
          <fpage>907</fpage>
          -
          <lpage>928</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Studer</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Benjamins</surname>
            ,
            <given-names>V.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fensel</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Knowledge engineering: principles and methods</article-title>
          .
          <source>Data &amp; knowledge engineering 25(1)</source>
          (
          <year>1998</year>
          )
          <fpage>161</fpage>
          -
          <lpage>197</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>G.A.</given-names>
          </string-name>
          :
          <article-title>WordNet: a lexical database for English</article-title>
          .
          <source>Communications of the ACM</source>
          <volume>38</volume>
          (
          <issue>11</issue>
          ) (
          <year>1995</year>
          )
          <fpage>39</fpage>
          -
          <lpage>41</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Wei</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gulla</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          :
          <article-title>Sentiment learning on product reviews via sentiment ontology tree</article-title>
          .
          <source>In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics</source>
          , Association for Computational Linguistics (
          <year>2010</year>
          )
          <fpage>404</fpage>
          -
          <lpage>413</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Robertson</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beaulieu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Willett</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Okapi at TREC-7: automatic ad hoc, filtering, VLC and interactive track</article-title>
          .
          <source>NIST Special Publication SP</source>
          (
          <year>1999</year>
          )
          <fpage>253</fpage>
          -
          <lpage>264</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Sahbi</surname>
          </string-name>
          , H.:
          <article-title>CNRS-Telecom ParisTech at ImageCLEF 2013 Scalable Concept Image Annotation Task: Winning annotations with context dependent SVMs</article-title>
          .
          <source>In: CLEF 2013 Evaluation Labs and Workshop, Online Working Notes. Valencia, Spain, September 23-26</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>C.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C.J.:</given-names>
          </string-name>
          <article-title>Libsvm: a library for support vector machines</article-title>
          .
          <source>ACM Transactions on Intelligent Systems and Technology (TIST) 2</source>
          (
          <issue>3</issue>
          ) (
          <year>2011</year>
          )
          <fpage>27</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Swain</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ballard</surname>
            ,
            <given-names>D.H.</given-names>
          </string-name>
          :
          <article-title>Color indexing</article-title>
          .
          <source>International journal of computer vision 7</source>
          (
          <issue>1</issue>
          ) (
          <year>1991</year>
          )
          <fpage>11</fpage>
          -
          <lpage>32</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Sahbi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Audibert</surname>
            ,
            <given-names>J.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keriven</surname>
          </string-name>
          , R.:
          <article-title>Context-dependent kernels for object classification</article-title>
          .
          <source>Pattern Analysis and Machine Intelligence</source>
          ,
          <source>IEEE Transactions on 33(4)</source>
          (
          <year>2011</year>
          )
          <fpage>699</fpage>
          -
          <lpage>708</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>