<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CNRS - TELECOM ParisTech at ImageCLEF 2013 Scalable Concept Image Annotation Task: Winning Annotations with Context Dependent SVMs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hichem SAHBI</string-name>
          <email>hichem.sahbi@telecom-paristech.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CNRS TELECOM ParisTech</institution>
          ,
          <addr-line>46 rue Barrault, 75013 Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2013</year>
      </pub-date>
      <abstract>
        <p>In this paper, we describe the participation of CNRS TELECOM ParisTech in the ImageCLEF 2013 Scalable Concept Image Annotation challenge. This edition promotes the use of many contextual cues attached to visual contents. Image collections are supplied with visual features as well as tags taken from different sources (web pages, etc.). Our framework is based on training support vector machines (SVMs) using a class of kernels referred to as context dependent. These kernels are designed by minimizing objective functions mixing visual features and their contextual cues resulting from surrounding tags. The results clearly corroborate the complementarity of tags and visual features and the effectiveness of these context dependent SVMs for image annotation.</p>
      </abstract>
      <kwd-group>
        <kwd>Context-Dependent Kernels</kwd>
        <kwd>Support Vector Machines</kwd>
        <kwd>Image Annotation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Conventionally, visual information search requires a preliminary step known as image annotation. The latter is a major challenge (see for instance [
        <xref ref-type="bibr" rid="ref14 ref17 ref23 ref24 ref31 ref33 ref5 ref8">14, 33, 31, 23, 24, 17, 5, 8</xref>
        ]) and consists in assigning a list of keywords (a.k.a. concepts) to a given visual content. These concepts may either correspond to physical entities (pedestrians, cars, etc.) or to high-level aspects resulting from the interaction of many entities in scenes (races, fights, etc.). In both cases, image annotation is challenging because of the ambiguity of assigning concepts to scenes, especially when the possible concepts are taken from a large vocabulary and when analyzing highly semantic contents.
      </p>
      <p>
        Existing annotation methods (see for instance [
        <xref ref-type="bibr" rid="ref17 ref5">5, 17</xref>
        ]) are usually content-based: they first model image observations using low level features (color, texture, shape, etc.), treat each concept as an independent class, and then train the corresponding concept-specific classifier to identify images belonging to that concept, using a variety of machine learning and inference techniques such as latent Dirichlet allocation [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], Markov models [
        <xref ref-type="bibr" rid="ref17 ref23">17, 23</xref>
        ], probabilistic latent semantic analysis [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] and support vector machines (SVMs) [
        <xref ref-type="bibr" rid="ref10 ref30">10, 30</xref>
        ]. These learning machines are used to model correspondences between concepts and low level features, and they make it possible to assign concepts to new images.
      </p>
      <p>
        The above annotation methods rely heavily on visual content for image annotation [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. Due to the semantic gap, they are unable to fully explore the semantic information inside images; this comes from the statistical inconsistency of low level features with respect to the learned concepts, and also from the complexity of scenes. Another class of annotation methods, referred to as context-based, has emerged that takes advantage of extra information (such as contextual cues in social networks [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]) in order to better capture the correlations between images and their semantic concepts. Early methods emerged for text documents in social networks [
        <xref ref-type="bibr" rid="ref41">41</xref>
        ], and recent work now handles visual content annotation in different contexts, such as the approach of [
        <xref ref-type="bibr" rid="ref11 ref18">18, 11</xref>
        ], which uses visual links as context in social networks in order to propagate image tags, and the method of [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ], which uses friendship connections and conditional random fields in order to improve the performance of photo annotation. Other works consider distances between tags using Flickr [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ], or contextual information taken from personal calendars [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], GPS locations [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], visual appearances [
        <xref ref-type="bibr" rid="ref19 ref4">4, 19</xref>
        ] and multiple cues [
        <xref ref-type="bibr" rid="ref40">40</xref>
        ] in order to improve annotation.
      </p>
      <p>
        In this paper, we describe the participation of "CNRS-TELECOM ParisTech" in the ImageCLEF 2013 Scalable Concept Image Annotation Task [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ]. Our proposed solution is based on the design of similarity functions that compare images using context-dependent kernels. The latter are designed using multiple visual features as well as the multiple contextual (text) sources of information provided in this task. When plugged into SVMs for image classification and annotation, these kernels turned out to be very effective.
      </p>
      <p>The rest of this paper is organized as follows: in Section 2, we describe the motivation and the proposed method at a glance. In Section 3, we describe our participation and the different runs submitted to this task, as well as our results and a comparison against other participants' runs. Finally, we conclude the paper in Section 4.</p>
    </sec>
    <sec id="sec-2">
      <title>Motivation and Proposed Method at a Glance</title>
      <p>
        Among the image annotation methods mentioned earlier, those based on machine learning, and particularly on kernel methods such as SVMs, are particularly successful, but their success remains highly dependent on the choice of kernels. The latter, defined as symmetric and positive semi-definite functions [
        <xref ref-type="bibr" rid="ref32 ref35">35, 32</xref>
        ], should assign large values to very similar contents and vice-versa. Usual kernels, either holistic [
        <xref ref-type="bibr" rid="ref16 ref22">16, 22</xref>
        ] or alignment-based [
        <xref ref-type="bibr" rid="ref1 ref12 ref13 ref20 ref25 ref3 ref37 ref6">12, 1, 13, 3, 20, 37, 6, 25</xref>
        ], consider similarities as decreasing functions of the distances between patterns, or as proportional to the quality of aligning primitives inside patterns. In both cases, kernels rely only on the intrinsic properties of patterns without taking into account their contextual cues. In this work, we are interested in the integration of context into kernels in order to further enhance their discrimination power for image annotation, while ensuring their positive definiteness and also their efficiency. The guiding principle relies on a basic assertion: kernels should not depend only on the intrinsic aspects of images (as images with the same semantic may have different visual and textual features), but also on different sources of knowledge, including context. The designed family of kernels takes high values not only when images share the same content but also when they share the same context. The context of an image is defined as the set of images sharing links (e.g., tags) and exhibiting better semantic descriptions, compared to both pure visual and tag-based descriptions. The issue of combining context and visual content for image annotation and search has been investigated in previous related work (see for instance [
        <xref ref-type="bibr" rid="ref27 ref28 ref29 ref30 ref39 ref4 ref40 ref9">9, 4, 40, 39, 29, 28, 30, 27</xref>
        ] and the work discussed earlier); the novel part of this work aims to integrate context (from the ImageCLEF 2013 collection) into kernel design for classification and annotation, and to plug these kernels into support vector machines in order to benefit from their well-established generalization power [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ].
      </p>
      <p>
        In this work, we use a novel class of kernels (referred to as explicit and context-dependent) for image annotation [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] (see also [
        <xref ref-type="bibr" rid="ref28 ref29">29, 28</xref>
        ]). An image database is modeled as a graph where nodes are pictures and edges correspond to the shared tagged links. The proposed kernel design method is based on the optimization of an objective function mixing a fidelity term, a context criterion and a regularization term. The fidelity term takes into account the visual content of images, so highly visually similar contents encourage high kernel values. The context criterion considers the local graph structure and allows us to further enhance the relevance of our designed kernel, by diffusing and restoring the similarity when pairs of images are also surrounded by highly similar images that should also, recursively, share the same context. The regularization term controls the smoothness of the learned kernel and makes it possible to obtain a closed-form solution. Solving this minimization problem results in a recursive similarity function (with an explicit kernel map) that converges to a positive semi-definite fixed point.
      </p>
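      <p>As a minimal numerical sketch of this recursion (the base Gram matrix, the context matrix and the context weight gamma below are invented toy data; the update rule is the CDK recursion as we reconstruct it from [27]), the following Python/NumPy snippet illustrates that the iterate stays symmetric and positive semi-definite when the initialization is a Gram matrix and P is column-stochastic:</p>
      <preformat>
```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic base (visual) Gram matrix K0: PSD by construction.
X = rng.normal(size=(5, 3))
K0 = X @ X.T

# Synthetic left-stochastic context matrix P (each column sums to 1).
A = rng.random((5, 5))
P = A / A.sum(axis=0, keepdims=True)

gamma = 0.1  # illustrative context weight, gamma nonnegative
K = K0.copy()
for _ in range(2):  # two iterations, as used in the runs
    K = K0 + gamma * (P @ K @ P.T)

# K remains symmetric and PSD: the congruence P K P.T preserves
# PSD-ness, and a nonnegative sum of PSD matrices is PSD.
eigvals = np.linalg.eigvalsh(K)
```
      </preformat>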
      <p>Again, our proposed method goes beyond the naive use of low level features and the usual context-free kernels (established as the standard baseline in image annotation) in order to design a family of kernels applicable to annotation and suitable for integrating the "contextual" information taken from tagged links in interconnected datasets. In the proposed context-dependent kernel, two images (even with different visual contents and even sharing different tags) will be declared as similar if they share the same visual context (see also Fig. 1). This is usually useful as tags in interconnected data may be noisy and misspelled. Furthermore, the intrinsic visual content of images might not always be relevant, especially for concepts exhibiting large variations of the underlying visual aspects.</p>
    </sec>
    <sec id="sec-3">
      <title>ImageCLEF 2013 Evaluation</title>
      <p>The targeted task is image annotation, also known as "concept detection": given a picture of a database, the goal is to predict which concepts (classes) are present in that picture.</p>
      <sec id="sec-3-1">
        <title>ImageCLEF 2013 Collection</title>
        <p>This year's annotation task concentrated on developing annotation algorithms that rely only on data obtained automatically from the web. A very large amount of images was gathered from the web by the organizers, and tags were obtained from the associated web pages. As tags are noisy (i.e., the degree of relationship between images and their surrounding tags varies greatly), we apply some preprocessing in order to assign tags to images.</p>
        <p>Dev set: this set is labeled and consists of 1,000 images belonging to 95 categories including "aerial", "bridges", "clouds", etc. A sample of images belonging to the dev set is shown in Fig. 2, top.</p>
        <p>Test set: as the objective of this year's task is to develop algorithms that can easily change or scale the list of concepts used for image annotation, an unlabeled test set was provided; it includes 2,000 images belonging to 116 categories, 21 of which are not available in the dev set and are considered as out-of-list concepts. These concepts include "bottle", "butterfly", "chair", etc. A sample of images belonging to the out-of-list concepts is shown in Fig. 2, bottom.</p>
        <p>Training set label generation: a larger set of 250,000 images was provided with meta-data but without labels. The meta-data associated with a given image includes the list of keywords used to retrieve that image on the web with different search engines.</p>
        <p>For a given concept (among the 116 concepts), we extract a training set by collecting, among the 250k images, those which include that concept in their meta-data. As the keywords associated with a given concept may appear in different forms, we applied some morphological expansions in order to increase the recall when searching for training images belonging to a given concept.</p>
        <p>Context matrix generation: we design a left stochastic adjacency matrix (denoted P) between images, with each entry proportional to the number of shared keywords in the meta-data of the underlying images. We use this adjacency matrix in order to build our context dependent kernels, as discussed in Section 3.3.</p>
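        <p>The construction of P can be sketched as follows (a minimal Python/NumPy illustration; the keyword sets are invented toy data, and the normalization convention is one plausible reading of "left stochastic"):</p>
        <preformat>
```python
import numpy as np

# Toy meta-data: one keyword set per image (invented for illustration).
keywords = [
    {"lake", "sunset", "water"},
    {"sunset", "sky"},
    {"car", "road"},
]

n = len(keywords)
A = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            # Entry proportional to the number of shared keywords.
            A[i, j] = len(keywords[i].intersection(keywords[j]))

# Column-normalize so that P is left stochastic (columns sum to 1);
# images sharing no keywords with others keep an all-zero column.
col_sums = A.sum(axis=0)
P = np.divide(A, col_sums, out=np.zeros_like(A), where=col_sums != 0)
```
        </preformat>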
      </sec>
      <sec id="sec-3-2">
        <title>ImageCLEF 2013 Visual Features</title>
        <p>We used only the visual features provided in this ImageCLEF task, including GIST, color histograms, SIFT, C-SIFT, RGB-SIFT and OPPONENT-SIFT. For all the SIFT-based descriptors, a bag-of-words representation is provided. Even though provided, the images themselves were not used to extract any extra features.</p>
      </sec>
      <sec id="sec-3-3">
        <title>CNRS-TELECOM ParisTech Runs and Comparison</title>
        <p>All our submitted runs (discussed below) are based on SVM training. Again, the goal is image annotation, also known as concept detection. For this purpose, we trained "one-versus-all" SVM classifiers for each concept; we use many random folds (taken from the training data) for multiple SVM trainings, and we use these SVMs to predict the concepts on the dev and test sets. We repeat this training process, for each concept, over different random folds from the training set, and we take the average scores of the underlying SVM classifiers. This makes classification results less sensitive to the sampling of the training set. For all the submitted runs (see runs 1-6 below), the only difference resides in the kernels used; we plug the latter into SVMs in order to achieve concept detection. Performances are evaluated using the mean F-measures (at concept and sample levels) as well as the mean average precisions; details about these measures are given on the ImageCLEF 2013 web page (http://imageclef.org/2013/photo/annotation).</p>
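        <p>The per-concept decision rule (average the per-fold SVM scores, then threshold) can be sketched as follows; the fold scores and the number of folds below are invented for illustration, and the threshold value 0.5 matches the value used in runs 1, 3 and 5:</p>
        <preformat>
```python
import numpy as np

tau = 0.5  # cut-off threshold (set to 1 in runs 2, 4 and 6)

# Hypothetical SVM scores for 4 test images over 3 random folds.
fold_scores = np.array([
    [0.9, 0.2, 0.8, -0.1],
    [0.8, 0.4, 0.7,  0.0],
    [1.0, 0.3, 0.9, -0.2],
])

# Average the classifiers' scores over folds, then threshold: an
# image is assigned the concept when its mean score reaches tau.
mean_scores = fold_scores.mean(axis=0)
assigned = mean_scores >= tau
```
        </preformat>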
        <p>Run 1: for this run, we build seven Gram matrices (based on the histogram intersection kernel) associated with the visual features mentioned earlier. Then, we linearly combine those matrices into a single one. Notice that this combination does not result from multiple kernel learning but is just a convex combination of kernels with uniform weights. We plug the resulting kernel into SVMs for training and testing. A given test image is assigned to a given concept iff the underlying SVM score reaches a threshold τ (with τ = 0.5 in practice).</p>
        <p>Run 2: the setting of this run is exactly the same as run 1, except that the cut-off threshold τ is set to 1.</p>
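        <p>This uniform-weight combination can be sketched as follows (Python/NumPy; the two random histogram matrices below are invented stand-ins for the seven provided bag-of-words descriptors):</p>
        <preformat>
```python
import numpy as np

def hist_intersection_gram(H):
    # K[i, j] = sum over d of min(H[i, d], H[j, d])
    # (histogram intersection kernel).
    return np.minimum(H[:, None, :], H[None, :, :]).sum(axis=2)

rng = np.random.default_rng(0)

# Invented bag-of-words histograms: 4 images, 2 feature types.
features = [rng.random((4, 6)), rng.random((4, 6))]

grams = [hist_intersection_gram(H) for H in features]

# Convex combination with uniform weights (no multiple kernel
# learning), yielding the single Gram matrix plugged into the SVMs.
K = sum(grams) / len(grams)
```
        </preformat>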
        <p>
          Run 3: the linear combination of kernel matrices (denoted K(0)) obtained in runs 1 and 2 is used as an initialization of the context dependent kernel (CDK), defined as K(t+1) = K(0) + γ P K(t) P′, with γ ≥ 0 (see [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]). The latter is computed iteratively (in two iterations) using the adjacency matrix P introduced earlier. Once designed, we plug the CDK into SVMs for training and testing. A given test image is again assigned to a given concept iff the underlying SVM score reaches a threshold τ (with τ = 0.5 in practice).
        </p>
        <p>Run 4: the setting of this run is exactly the same as run 3, except that the cut-off threshold τ is set to 1.</p>
        <p>Run 5: before computing the convex combination of kernels (as done in runs 3 and 4), we first evaluate, for each kernel matrix (associated with a given visual feature), its underlying CDK (K(t+1) = K(0) + γ P K(t) P′, with K(0) being the linear kernel matrix). Then, we apply the histogram intersection kernel to these CDKs and linearly combine the resulting kernels with uniform weights. Again, the number of iterations in the CDK is set to 2. Once designed, we plug the final kernel matrix into SVMs for training and testing. A given test image is again assigned to a given concept iff the underlying SVM score reaches a threshold τ (with τ = 0.5 in practice).</p>
        <p>Run 6: the setting of this run is exactly the same as run 5, except that the cut-off threshold τ is set to 1.</p>
        <p>The diagrams in Figs. 3, 4 and 5 show the mean F-measures and mean average precisions of our runs and their comparison with the different participants' runs. All these results make clear that our best runs (runs 6 and 4) outperform the others for almost all the evaluation measures.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>We discussed in this paper the participation of "CNRS-TELECOM ParisTech" in the ImageCLEF 2013 Scalable Concept Image Annotation Task. Our submissions include pure visual runs based on a linear combination of elementary histogram intersection kernels, as well as combined visual/textual runs that consider the context of images through context dependent kernels. The latter turned out to be the most effective and achieved the best performance among the 57 participants' runs.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work was supported in part by a grant from the Research Agency ANR
(Agence Nationale de la Recherche) under the MLVIS project and a grant from
DIGITEO under the RELIR project.</p>
      <p>[Figure: performances (see http://imageclef.org/2013/photo/annotation) of our runs (denoted TPT-*) and other participants' runs on the dev set. Acronyms stand for ISI: Tokyo U., UNIMORE: U. of Modena and Reggio Emilia, RUC: Renmin U. of China, UNEDUV: National U. of Distance Education in Spain, CEALIST: CEA, France, KDEVIR: Toyohashi U. of Technology in Japan, URJCyUNED: King Juan Carlos U. in Spain, MICC: Florence U. in Italy, SZTAKI: Hungarian Academy of Sciences, INAOE: National Institute of Astrophysics, Optics and Electronics in Mexico, THSSMPAM: Tsinghua U., Beijing, China, LMCHFUT: Hefei University of Technology, China. Top diagram: mean F-measures for samples; middle: mean F-measures for concepts; bottom: mean average precision.]</p>
      <p>[Figure: performances of our runs (denoted TPT-*) and other participants' runs on the test set; acronyms as in the previous figure. Top diagram: mean F-measures for samples; middle: mean F-measures for concepts; bottom: mean F-measures for concepts unseen in the dev set.]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>C.</given-names>
            <surname>Bahlmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Haasdonk</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Burkhardt</surname>
          </string-name>
          .
          <article-title>On-line handwriting recognition with support vector machines, a kernel approach</article-title>
          .
          <source>IWFHR</source>
          , pages
          <volume>49</volume>
          -
          <fpage>54</fpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>K.</given-names>
            <surname>Barnard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Duygululu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Forsyth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Blei</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Jordan</surname>
          </string-name>
          .
          <article-title>Matching words and pictures</article-title>
          .
          <source>The Journal of Machine Learning Research</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>S.</given-names>
            <surname>Boughorbel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tarel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Boujemaa</surname>
          </string-name>
          .
          <article-title>The intermediate matching kernel for image local features</article-title>
          .
          <source>IEEE International Joint Conference on Neural Networks</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>L.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Luo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <article-title>Annotating photo collection by label propagation according to multiple similarity cues</article-title>
          .
          <source>ACM Multimedia</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>G.</given-names>
            <surname>Carneiro</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Vasconcelos</surname>
          </string-name>
          .
          <article-title>Formulating semantic image annotation as a supervised learning problem</article-title>
          .
          <source>In: Proc. of CVPR</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>M.</given-names>
            <surname>Cuturi</surname>
          </string-name>
          .
          <article-title>Fast global alignment kernels</article-title>
          .
          <source>In Proceedings of the International Conference on Machine Learning</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>M.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>King</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Good</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Sarvas</surname>
          </string-name>
          .
          <article-title>From context to content: leveraging context to infer media metadata</article-title>
          .
          <source>In: Proceedings of 12th Annual ACM International Conference on Multimedia, MM</source>
          <year>2004</year>
          ,
          <article-title>Brave New Topics Session on From Context to Content: Leveraging Contextual Metadata to infer Multimedia Content in New York</article-title>
          , ACM Press,
          <fpage>188</fpage>
          -
          <lpage>195</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>P.</given-names>
            <surname>Duygulu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Barnard</surname>
          </string-name>
          , J. deFreitas, and
          <string-name>
            <given-names>D.</given-names>
            <surname>Forsyth</surname>
          </string-name>
          .
          <article-title>Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary</article-title>
          . In: Heyden,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Sparr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Nielsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Johansen</surname>
          </string-name>
          , P. (eds.)
          <article-title>ECCV 2002</article-title>
          .
          <article-title>LNCS</article-title>
          , vol.
          <volume>2353</volume>
          , pp.
          <fpage>97</fpage>
          -
          <lpage>112</lpage>
          . Springer, Heidelberg,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>A.</given-names>
            <surname>Gallagher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Neustaedter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Luo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>Image annotation using personal calendars as context</article-title>
          .
          <source>ACM Multimedia</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xue</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Jain</surname>
          </string-name>
          .
          <article-title>Automatic image annotation by incorporating feature hierarchy and boosting to scale up SVM classifiers</article-title>
          .
          <source>in Proc. of ACM MULTIMEDIA</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.-J.</given-names>
            <surname>Zha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          .
          <article-title>Visual-textual joint relevance learning for tag-based social image search</article-title>
          .
          <source>IEEE Trans. Image Processing</source>
          ,
          <volume>22</volume>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>T.</given-names>
            <surname>Gartner</surname>
          </string-name>
          .
          <article-title>A survey of kernels for structured data</article-title>
          .
          <source>Multi Relational Data Mining</source>
          ,
          <volume>5</volume>
          (
          <issue>1</issue>
          ):
          <fpage>49</fpage>
          -
          <lpage>58</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>K.</given-names>
            <surname>Grauman</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Darrell</surname>
          </string-name>
          .
          <article-title>The pyramid match kernel: Efficient learning with sets of features</article-title>
          .
          <source>Journal of Machine Learning Research (JMLR)</source>
          ,
          <volume>8</volume>
          :
          <fpage>725</fpage>
          -
          <lpage>760</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zemel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Carreira-Perpinan</surname>
          </string-name>
          .
          <article-title>Multiscale conditional random fields for image labeling</article-title>
          .
          <source>In CVPR</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>D.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gallagher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Luo</surname>
          </string-name>
          .
          <article-title>Inferring photographic location using geotagged web images</article-title>
          .
          <source>Multimedia Tools and Applications</source>
          ,
          <volume>56</volume>
          (
          <issue>1</issue>
          ):
          <fpage>131</fpage>
          -
          <lpage>153</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>R.</given-names>
            <surname>Kondor</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Jebara</surname>
          </string-name>
          .
          <article-title>A kernel between sets of vectors</article-title>
          .
          <source>In proceedings of the 20th International conference on Machine Learning</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>Automatic linguistic indexing of pictures by a statistical modeling approach</article-title>
          .
          <source>IEEE Trans. on PAMI</source>
          ,
          <volume>25</volume>
          (
          <issue>9</issue>
          ):
          <fpage>1075</fpage>
          -
          <lpage>1088</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Snoek</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Worring</surname>
          </string-name>
          .
          <article-title>Learning tag relevance by neighbor voting for social image retrieval</article-title>
          .
          <source>In MIR conference</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.-H.</given-names>
            <surname>Tsang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Luo</surname>
          </string-name>
          .
          <article-title>Textual query of personal photos facilitated by large-scale web data</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          ,
          <volume>33</volume>
          (
          <issue>5</issue>
          ):
          <fpage>1022</fpage>
          -
          <lpage>1036</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>S.</given-names>
            <surname>Lyu</surname>
          </string-name>
          .
          <article-title>Mercer kernels for object recognition with local features</article-title>
          .
          <source>In the proceedings of the IEEE Computer Vision and Pattern Recognition</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>F.</given-names>
            <surname>Monay</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Gatica-Perez</surname>
          </string-name>
          .
          <article-title>PLSA-based image auto-annotation: Constraining the latent space</article-title>
          .
          <source>in Proc. of ACM International Conference on Multimedia</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>P.</given-names>
            <surname>Moreno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ho</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Vasconcelos</surname>
          </string-name>
          .
          <article-title>A Kullback-Leibler divergence based kernel for SVM classification in multimedia applications</article-title>
          .
          <source>In Neural Information Processing Systems</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>G.</given-names>
            <surname>Moser</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Serpico</surname>
          </string-name>
          .
          <article-title>Combining support vector machines and Markov random fields in an integrated framework for contextual image classification</article-title>
          .
          <source>In TGRS</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <given-names>S.</given-names>
            <surname>Nowak</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Huiskes</surname>
          </string-name>
          .
          <article-title>New strategies for image annotation: Overview of the photo annotation task at ImageCLEF 2010</article-title>
          .
          <source>in The Working Notes of CLEF 2010</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <given-names>J.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ben-Hur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-P.</given-names>
            <surname>Vert</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. S.</given-names>
            <surname>Noble</surname>
          </string-name>
          .
          <article-title>A structural alignment kernel for protein structures</article-title>
          .
          <source>Bioinformatics</source>
          ,
          <volume>23</volume>
          (
          <issue>9</issue>
          ):
          <fpage>1090</fpage>
          -
          <lpage>1098</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <given-names>R.</given-names>
            <surname>Datta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>Image retrieval: Ideas, influences, and trends of the new age</article-title>
          .
          <source>ACM Computing Surveys</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <given-names>H.</given-names>
            <surname>Sahbi</surname>
          </string-name>
          .
          <article-title>Explicit context-aware kernel map learning for image annotation</article-title>
          .
          <source>The 9th International Conference on Computer Vision systems</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <given-names>H.</given-names>
            <surname>Sahbi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Audibert</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Keriven</surname>
          </string-name>
          .
          <article-title>Context-dependent kernels for object classification</article-title>
          .
          <source>In Pattern Analysis and Machine Intelligence (PAMI)</source>
          ,
          <volume>33</volume>
          (
          <issue>4</issue>
          ),
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <given-names>H.</given-names>
            <surname>Sahbi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Audibert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rabarisoa</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Keriven</surname>
          </string-name>
          .
          <article-title>Context-dependent kernel design for object matching and recognition</article-title>
          .
          <source>In the proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <given-names>H.</given-names>
            <surname>Sahbi</surname>
          </string-name>
          and
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>Context based support vector machines for interconnected image annotation (the saburo tsuji best regular paper award)</article-title>
          .
          <source>In the Asian Conference on Computer Vision (ACCV)</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <given-names>D.</given-names>
            <surname>Semenovich</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Sowmya</surname>
          </string-name>
          .
          <article-title>Geometry aware local kernels for object recognition</article-title>
          .
          <source>In ACCV</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <given-names>J.</given-names>
            <surname>Shawe-Taylor</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Cristianini</surname>
          </string-name>
          .
          <article-title>Support vector machines and other kernel-based learning methods</article-title>
          . Cambridge University Press,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <given-names>A.</given-names>
            <surname>Singhal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Luo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhu</surname>
          </string-name>
          .
          <article-title>Probabilistic spatial context models for scene content understanding</article-title>
          .
          <source>In CVPR</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <given-names>Z.</given-names>
            <surname>Stone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zickler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Darrell</surname>
          </string-name>
          .
          <article-title>Auto-tagging facebook: Social network context improves photo annotation</article-title>
          .
          <source>in IVW</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name>
            <given-names>V. N.</given-names>
            <surname>Vapnik</surname>
          </string-name>
          .
          <article-title>Statistical learning theory</article-title>
          . A Wiley-Interscience Publication
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          36.
          <string-name>
            <given-names>M.</given-names>
            <surname>Villegas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Paredes</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Thomee</surname>
          </string-name>
          .
          <article-title>Overview of the ImageCLEF 2013 scalable concept image annotation subtask</article-title>
          .
          <source>CLEF 2013 working notes</source>
          , Valencia, Spain,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          37.
          <string-name>
            <given-names>C.</given-names>
            <surname>Wallraven</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Caputo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Graf</surname>
          </string-name>
          .
          <article-title>Recognition with local features: the kernel recipe</article-title>
          .
          <source>ICCV</source>
          , pages
          <fpage>257</fpage>
          -
          <lpage>264</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          38.
          <string-name>
            <given-names>L.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ma</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>Flickr distance</article-title>
          .
          <source>In: Proc. of ACM MULTIMEDIA</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          39.
          <string-name>
            <given-names>L.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Geng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanjalic</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.-S.</given-names>
            <surname>Hua</surname>
          </string-name>
          .
          <article-title>A unified context model for web image retrieval</article-title>
          .
          <source>ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)</source>
          ,
          <volume>8</volume>
          (
          <issue>3</issue>
          ):
          <fpage>28</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          40.
          <string-name>
            <given-names>Y.-H.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-T.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-W.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-H.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hsu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>ContextSeer: Context search and recommendation at query time for shared consumer photos</article-title>
          .
          <source>ACM Multimedia</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          41.
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zha</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Giles</surname>
          </string-name>
          .
          <article-title>Exploring social annotations for information retrieval</article-title>
          .
          <source>in the WWW conference</source>
          , Beijing, China,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>