<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>MLIA at ImageCLEF 2014 Scalable Concept Image Annotation Challenge</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xing Xu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Atsushi Shimada</string-name>
          <email>atsushig@limu.ait.kyushu-u.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rin-ichiro Taniguchi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Advanced Information Technology, Kyushu University</institution>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <fpage>411</fpage>
      <lpage>420</lpage>
      <abstract>
        <p>In this paper, we propose a large-scale image annotation system for the ImageCLEF 2014 Scalable Concept Image Annotation task. This year's annotation task concentrated on developing annotation algorithms that rely only on data obtained automatically from the web. Since sophisticated SVM-based annotation techniques were widely applied in last year's task (ImageCLEF 2013), we also adopt SVM-based annotation techniques for this year's task and put our effort mainly into obtaining a more accurate concept assignment for the training images. More specifically, we propose a two-fold scheme to assign concepts to unlabeled training images: (1) a traditional process which stems the extracted web data of each training image on the textual side and assigns concepts based on the appearance of each concept; (2) an additional process which leverages the deep convolutional network toolbox Overfeat to predict labels (ImageNet nouns) for each training image on the visual side; the predicted tags are then mapped to ImageCLEF concepts based on WordNet synonym and hyponym semantic relations. Finally, the concepts allocated to each training image are generated by a fusion step over the two concept assignment processes. Experimental results show that the proposed concept assignment scheme is effective in improving the assignment results of traditional textual processing and in allocating reasonable concepts to training images. Consequently, with an efficient SVM solver based on stochastic gradient descent, our annotation system achieves competitive performance in the annotation task.</p>
      </abstract>
      <kwd-group>
        <kwd>imageclef</kwd>
        <kwd>image annotation</kwd>
        <kwd>social web data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In this year's ImageCLEF 2014 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], we participated in the Scalable Concept Image
Annotation challenge (http://www.imageclef.org/2014/annotation) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] which aimed at developing more scalable image
annotation systems. The goal of this challenge is to develop annotation systems that
for training rely only on unsupervised web data and other automatically
obtainable resources. In contrast to traditional image annotation evaluations with
labeled training data, this challenge requires work on more fronts, such as handling
noisy data, textual processing, multi-label annotation, and scalability to
unobserved labels.
      </p>
      <p>
        Since this year is the third edition of the annotation challenge, we can make several
observations regarding annotation methodology from the
overview reports [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] of the previous editions:
- The best-performing system, TPT [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], only used the provided visual features,
which indicates that the visual features provided by the organizers are
sufficient and that the additional features extracted by several teams might be
complementary.
- The top 3 teams (TPT, MIL [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and UNIMORE [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]) all utilized SVM-based
algorithms to learn a separate classifier for each concept, which was verified
to be superior to the K-nearest-neighbor (KNN) based annotation techniques
used by other groups, such as RUC [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], MICC [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
- The textual processing and concept assignment for training images were
significant, since they directly affected the learning accuracy of the concept
classifiers.
      </p>
      <p>The major difference of this year's challenge compared with previous
editions is the proportion of "scaled" concepts. In last year's challenge, there were
116 concepts in total (95 concepts for the development set and 21 more for the test set),
so the proportion of "scaled" concepts was 21/116 &#8776; 0.181. In contrast, this
year there are 207 concepts in total (107 concepts for the development set and 100
more for the test set), so the proportion of "scaled" concepts is 100/207 &#8776; 0.483. This
underlines the importance of an annotation system that is scalable and generalizes
well to new concepts.</p>
      <p>
        To develop a robust and scalable annotation system, we believe that one of
the intrinsic issues is to assign more appropriate concepts to training images.
Once we have collected more accurate (positive/negative) samples for each
concept, it is possible to improve the performance of the concept classifiers. Thus, for
the contest, we mainly focus on the issue of accurate concept assignment for
training images. Besides traditional textual information processing, such as
stopword removal and stemming, which has been widely applied in previous
editions, we also leverage the recently popular convolutional neural networks
(CNN) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] to allocate tags (1K WordNet nouns) to each training image from the
visual aspect. As the CNN-based method uses a deep neural network to
improve classification, we can rely on the tags predicted by Overfeat and map
them to concepts of the ImageCLEF vocabulary. A late fusion approach is then
used to decide the final concept assignment for each training image. Finally, we
train a linear SVM classifier for each concept (similarly for the development and test
sets) with the visual features provided by the organizers. To tackle the high-dimensional,
large volume of training data, we adopt the online learning strategy
of stochastic gradient descent (SGD). We finally obtain competitive annotation
performance in terms of the MAP-samples, MF-concepts and MF-samples measures
and rank 4th among all 11 groups on the overall measure.
      </p>
      <p>The rest of the paper is organized as follows. Section 2 presents the
architecture of the proposed annotation system, where we mainly discuss our concept
assignment scheme for training images. In Section 3, we describe our
experimental setup and report the evaluation results obtained on both the development
and the test sets. Section 4 concludes and outlines future directions of our work.</p>
    </sec>
      <sec id="sec-2-1">
        <title>Proposed annotation system</title>
        <p>
          The proposed annotation system is depicted in Figure 1. To assign more
appropriate concepts to training images, we conduct a two-fold scheme which explicitly
leverages the provided textual information semantically (Section 2.1) and the
training images visually (Section 2.2). Based on the reliably labeled training
images, we further learn SVM-based concept classifiers using the standard visual
features provided by the organizers. To tackle the high-dimensional features and
large volumes of data, we use an online learning method combined with the SGD
algorithm. The learned concept classifiers are then used to predict concepts for
images in the development and test sets. In the following subsections, we describe
the detailed procedure of each module of the diagram in Figure 1.
        </p>
        <p>
          The organizers of ImageCLEF 2014 provided several kinds of textual features
for the training images. Following the traditional text processing approach used
last year, we applied multiple filtering steps to the textual features in order to
process them efficiently. Regarding the "Stopword removal and
stemming" module in Figure 1, the detailed processing procedures are:
- "Stopword removal and stemming" is performed on the "scofeats" files,
where stopwords, misspelled words, words from languages other
than English, and the titles of the original web pages are extracted and parsed.
- We then match the remaining words against the list of
concepts of the development set based on WordNet 3.0 (http://wordnet.princeton.edu/). We extend the list of
concepts with their synonyms and examine whether the current word matches
a concept or one of its synonyms (a rough sketch of this matching step is given after this list).
- The Lucene [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] stemmer is applied if the word does not exactly match
the list of concepts.
        </p>
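        <p>As a rough illustration of this matching step, the sketch below (assuming Python with NLTK and its WordNet and stopword corpora installed) checks whether a word from the processed web text matches a concept or one of its WordNet synonyms, with a stemmed comparison as a fallback. The variables words and concepts are hypothetical stand-ins for one image's filtered text and the development concept list, and the Porter stemmer merely stands in for the Lucene stemmer used in our system.</p>
        <preformat>
# Sketch of the concept-matching step (assumes NLTK with WordNet data).
from nltk.corpus import stopwords, wordnet as wn
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
stop = set(stopwords.words('english'))

def synonym_set(concept):
    """The concept name plus its WordNet synonyms (lemma names)."""
    syns = {concept}
    for synset in wn.synsets(concept):
        syns.update(l.name().lower().replace('_', ' ') for l in synset.lemmas())
    return syns

def assign_concepts(words, concepts):
    """Return the concepts whose name or synonyms appear in the word list."""
    words = [w.lower() for w in words if w.lower() not in stop]
    stems = {stemmer.stem(w) for w in words}
    assigned = set()
    for concept in concepts:
        if synonym_set(concept).intersection(words) or stemmer.stem(concept) in stems:
            assigned.add(concept)
    return assigned
        </preformat>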
        <p>The output of the "Result filtering and refinement" module produces a candidate set of
concepts for each training image. Indeed, the processing approach in this
subsection could be considered a baseline, as it assigns many false negative and
false positive concepts to training images. Therefore, besides the textual features
of training images, it is reasonable to also consider the visual content of training
images. For example, for a training image depicting an "airplane" whose
textual features (web page, title, etc.) contain the words "airplane pilot hats", simply
applying the text processing approach would result in the concepts "airplane",
"person", and "hat" being assigned to the training image. However, if it is possible to
estimate the content of the image visually in advance, then the unrelated concepts
"person" and "hat" could be rejected for this training image. Thus, in the next
subsection, we introduce a context mapping method to predict tags
for training images in advance.</p>
        <sec id="sec-2-1-1">
          <title>Context mapping using CNN</title>
          <p>To estimate the content of training images visually, we take advantage of the
recently proposed toolbox Overfeat (http://cilvr.nyu.edu/doku.php?id=software:overfeat:start), an image recognizer and feature
extractor built around a deep convolutional neural network (CNN). We chose
this toolbox for two reasons: (1) it achieved competitive classification
results in the ImageNet 2013 contest (http://www.image-net.org/challenges/LSVRC/2013/results.php); (2) the Overfeat convolutional net was trained
on 1K WordNet nouns, which is consistent with the concept list of ImageCLEF.
Thus it is reasonable to predict tags for training images with Overfeat and
map the tags to ImageCLEF concepts using a context mapping rule. Regarding
the "Tag prediction with CNN" and "Context mapping" modules in Figure 1,
the detailed processing procedures are (a sketch of the mapping step follows this list):
- For a given training image, we directly use the Overfeat toolbox to predict
tags for it.
- For each tag predicted by Overfeat, we calculate its semantic
similarity to the concepts of the development set and map it to the most
similar concept.
          </p>
          <p>[Figure 2 (example of context mapping using CNN): candidate tags from the text processing stage, tags with confidence scores predicted by Overfeat, and the finally assigned concepts for a sample training image.]</p>
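          <p>The following is a minimal sketch of this mapping step, assuming the NLTK WordNet interface; the Overfeat call itself is not shown, so predicted_tags is only a hypothetical example of its output.</p>
          <preformat>
# Sketch of the context-mapping step: map each Overfeat tag to the most
# semantically similar ImageCLEF concept using WordNet path similarity.
# 'predicted_tags' and 'concepts' are hypothetical example inputs.
from nltk.corpus import wordnet as wn

def best_concept(tag, concepts):
    """Return the concept with the highest path similarity to the tag."""
    best, best_sim = None, 0.0
    for concept in concepts:
        for s1 in wn.synsets(tag, pos=wn.NOUN):
            for s2 in wn.synsets(concept, pos=wn.NOUN):
                sim = s1.path_similarity(s2) or 0.0
                if sim > best_sim:
                    best, best_sim = concept, sim
    return best

predicted_tags = ['airliner', 'warplane', 'wing']   # hypothetical Overfeat output
concepts = ['airplane', 'sky', 'vehicle', 'boat', 'person']
candidates = {best_concept(t, concepts) for t in predicted_tags}
candidates.discard(None)                            # tags with no noun synset
          </preformat>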
          <p>In Figure 2, we give an example of the context mapping using CNN. The tags
in the blue rectangle are obtained from the preceding text processing stage. The tags
(with confidence scores) in the green rectangle are predicted by Overfeat. For
the context mapping procedure (in practice, we use the path similarity measure
in the NLTK toolbox as the semantic measure), we obtain the candidate concept set
{sky, airplane, vehicle, boat} based on the tags in the green rectangle.</p>
          <p>The "Result filtering and refinement" module in Figure 1 fuses the
candidate concept sets from the textual processing approach and from the context mapping
with CNN. The textual processing approach produces many more concepts than the
context mapping with CNN, but its concept set is also coarser. Thus, for the fusion
strategy, we rely more on the concept set from the context mapping with CNN and preserve only
the concepts with high similarity scores from the textual processing
concept set. In Figure 2, the concepts in the red rectangle are the finally assigned
concepts for the training image, which are considered to be semantically related to
it.</p>
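          <p>A minimal sketch of this fusion rule is given below; the similarity threshold sim_threshold is an illustrative value rather than the one used in our system, while the cap of 4 concepts per image corresponds to the limit used in our experiments.</p>
          <preformat>
# Sketch of the "Result filtering and refinement" fusion: keep the CNN-derived
# concepts, add only high-similarity textual concepts, and cap the label count.
def fuse_concepts(cnn_concepts, text_concepts, text_scores,
                  sim_threshold=0.5, max_concepts=4):
    fused = list(cnn_concepts)
    extra = [c for c in text_concepts
             if c not in fused and text_scores.get(c, 0.0) >= sim_threshold]
    extra.sort(key=lambda c: text_scores[c], reverse=True)
    fused.extend(extra)
    return fused[:max_concepts]
          </preformat>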
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Experimental results</title>
        <sec id="sec-2-2-1">
          <title>Visual features</title>
          <p>
            Similar to the best-performing system TPT [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] in the ImageCLEF 2013 annotation task, we use
the visual features provided by the organizers, including GIST, Color Histogram,
SIFT, C-SIFT, RGB-SIFT and OPPONENT-SIFT. For all SIFT-based
descriptors, a bag-of-words (BoW) representation is provided. An early fusion is performed
by concatenating all the provided features (global color histogram, getlf, C-SIFT,
GIST, OPPONENT-SIFT, RGB-SIFT, SIFT), resulting in a 21,312-dimensional space.
The global features GIST and Color Histogram are normalized using the L2 norm, and
the SIFT-based features are normalized using the L1 norm.
          </p>
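          <p>A sketch of this early fusion is shown below, assuming NumPy; the per-image arrays gist, color_hist and sift_bows are hypothetical placeholders for the provided descriptors.</p>
          <preformat>
# Sketch of the early fusion: L2-normalise the global descriptors,
# L1-normalise the SIFT-based BoW histograms, then concatenate.
import numpy as np

def l2_normalize(x, eps=1e-12):
    return x / (np.linalg.norm(x) + eps)

def l1_normalize(x, eps=1e-12):
    return x / (np.abs(x).sum() + eps)

def fuse_features(gist, color_hist, sift_bows):
    """Concatenate normalised features into one long vector (about 21,312 dims)."""
    parts = [l2_normalize(gist), l2_normalize(color_hist)]
    parts += [l1_normalize(b) for b in sift_bows]   # C-SIFT, RGB-SIFT, SIFT, ...
    return np.concatenate(parts)
          </preformat>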
        </sec>
        <sec id="sec-2-2-2">
          <title>Evaluation measures</title>
          <p>Three standard measures are used to evaluate the runs: the mean F-measure over
the samples (MF-samples), the mean F-measure over
the concepts (MF-concepts), and the mean average precision over the samples
(MAP-samples). The MF is computed by analyzing both the samples (MF-samples)
and the concepts (MF-concepts), whereas the MAP is computed by analyzing the
samples.</p>
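          <p>The sketch below, assuming scikit-learn, shows roughly how such measures can be computed from binary ground-truth and prediction matrices and real-valued scores; it is only an approximation of the official evaluation tool, and the inputs are hypothetical.</p>
          <preformat>
# Rough sketch of the three measures, assuming scikit-learn and binary
# ground-truth / prediction matrices Y_true, Y_pred (n_images x n_concepts)
# plus real-valued classifier scores S.
from sklearn.metrics import f1_score, label_ranking_average_precision_score

def evaluate(Y_true, Y_pred, S):
    mf_samples  = f1_score(Y_true, Y_pred, average='samples')   # mean F over images
    mf_concepts = f1_score(Y_true, Y_pred, average='macro')     # mean F over concepts
    map_samples = label_ranking_average_precision_score(Y_true, S)
    return mf_samples, mf_concepts, map_samples
          </preformat>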
        </sec>
        <sec id="sec-2-2-3">
          <title>Training SVM classifiers for concepts</title>
          <p>
            Following the SVM-based annotation techniques which achieved the best
annotation performance last year [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ] [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ], we again trained a "one-versus-all" SVM
classifier for each concept. Popular SVM solvers such as SVMlight and LibSVM
are not feasible for training on large volumes of high-dimensional data,
since these batch methods need to pre-load the entire training set into memory to
compute the gradient in each iteration. Thus it is difficult to use these
SVM solvers directly. Given the configuration of our machine (an Intel Core i7
2600 CPU (3.4 GHz) and 16 GB RAM), we consider a better solution
based on the stochastic gradient descent (SGD) algorithm, which is more efficient for
training SVM classifiers on large-scale data. Different from the batch methods,
the SGD algorithm feeds training samples one by one to calculate the gradients
and update the model parameters. Although the SGD algorithm may need
more iterations to reach convergence, it requires much less memory,
which makes it more appropriate for large-scale training data and an online learning
setting.
          </p>
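          <p>For illustration, a single SGD update for a linear SVM with hinge loss and L2 regularisation can be sketched as follows; the regularisation constant lam and learning rate eta are hypothetical values, not the settings of our actual solver.</p>
          <preformat>
# Sketch of one SGD step for a linear SVM (hinge loss + L2 regularisation).
# w, b: current model; x: feature vector; y: label in {-1, +1}.
import numpy as np

def sgd_step(w, b, x, y, lam=1e-4, eta=0.01):
    """Return updated (w, b) after one stochastic gradient step."""
    margin = y * (np.dot(w, x) + b)
    loss_active = (1.0 - margin) > 0.0        # hinge term contributes only inside the margin
    grad_w = lam * w - (y * x if loss_active else 0.0)
    grad_b = -y if loss_active else 0.0
    return w - eta * grad_w, b - eta * grad_b
          </preformat>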
          <p>
            Following the advice in [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ], we randomize the training data and load the
data in chunks which fit in memory, then train the different classifiers on further
randomizations of the chunks, so that different epochs see the chunks in
different orders, which makes the learned classifiers more stable. We repeat this
training process over the training set 5 times to train an SVM classifier for each
concept of the development set and cross-validate the F-measure on the development
set. Then we select the parameters with the best performance on the development set to
further learn classifiers for the concepts of the test set. To predict concepts for images in
the development and test sets, we use the trained concept classifiers and obtain
decisions for each concept by thresholding the confidence score at zero.
          </p>
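          <p>A rough sketch of this chunk-wise training loop, assuming scikit-learn's SGDClassifier, is given below; iter_chunks is a hypothetical generator standing in for our chunk loader, and the hyper-parameters are illustrative.</p>
          <preformat>
# Sketch of the chunk-wise, multi-epoch training of one concept classifier.
# iter_chunks() is a hypothetical generator yielding (features, labels) blocks
# that fit in memory, re-shuffled on every call so each epoch sees a new order.
import numpy as np
from sklearn.linear_model import SGDClassifier

def train_concept_classifier(iter_chunks, n_epochs=5):
    clf = SGDClassifier(loss='hinge', alpha=1e-4)
    for epoch in range(n_epochs):
        for X, y in iter_chunks(shuffle=True):
            clf.partial_fit(X, y, classes=np.array([-1, 1]))
    return clf

# Prediction: a concept is assigned when the decision score exceeds zero, e.g.
#   assigned = clf.decision_function(X_test) > 0
          </preformat>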
        </sec>
        <sec id="sec-2-2-4">
          <title>Inside analysis of annotation results</title>
          <p>We first discuss the proposed two-fold concept assignment to training images and
evaluate its influence on the learning accuracy of the concept classifiers. We first
conduct experiments on the development set and then extend the two-fold scheme
to the test set. Here we consider three settings: (1) "Single-Fold A": the single-fold
scheme of traditional textual information processing (the "Stopword removal and
stemming" module in Figure 1); (2) "Single-Fold B": the single-fold scheme of the CNN-based
tag prediction process (the "Tag prediction with CNN" and "Context
mapping" modules); (3) "Two-Folds": the fusion of both "Single-Fold A"
and "Single-Fold B". We limited the maximum number of concepts assigned to
each training image to 4. Then we use the SVM classifiers learned from the
labeled training data to predict concepts for images in the development set (the top
5 ranked concepts are taken as the final predicted concepts). Table 1
shows the annotation performance of the three settings, and one of the baselines
provided by the organizers is also included for comparison. It can be observed
that all three settings consistently improve on the baseline. In
particular, the tags predicted by Overfeat are considerably accurate for the training images:
"Single-Fold B" outperforms the "Single-Fold A" setting of the traditional textual
information scheme, which implies that the tags are highly coherent with the concepts
in ImageCLEF. Moreover, when the two settings are fused into the proposed
"Two-Folds" setting, the result is further improved on all three measures.</p>
          <p>We then evaluate the effect of the "Result filtering and refinement" module.
In the experiment settings above, we restricted the number (denoted by K)
of concepts assigned to each training image to K = 4. It is reasonable that
the value of K could influence the learning accuracy of the concept classifiers, as
it directly determines the quality of the training samples for each concept. Thus,
we further vary the value of K (from 1 to 10) and explore the optimal
K for the concept assignment of training images. The annotation performance on the
development set with varying K is shown in Figure 3. It can be observed from
Figure 3 that: (1) both MF-concept and MF-sample peak
at K = 6, while MAP-sample peaks at K = 9; (2) the
MAP-sample is more sensitive to K, since the number of ground-truth concepts
per image in the development set ranges from 1 to 11 (3.52 on average). Based
on these observations, we finally choose K &#8712; {6, 7, 8, 9, 10} for our later submitted
runs on the test set.</p>
          <p>For the test set, we submitted ten runs (full results at http://www.imageclef.org/2014/annotation/results). Here we present our
best 5 runs together with the baselines provided by the organizers and the best runs of the
other groups. We can see from the overall results in Table 2 that all our
submitted runs exceed the best baseline result for the test set on
all measures. Looking at the overall participant results, our best runs rank
6th, 3rd and 5th by MF-sample, MF-concept and MAP-sample
respectively for the test set, and 4th by overall performance. This means
that our best runs are competitive with the other results.</p>
          <p>However, there is still a considerable gap between our best runs and the
top-ranked runs from the KDEVIR group. Although we are currently not able to examine
the details of their annotation technique, there is still room to
improve our annotation system itself in the following respects.</p>
          <p>[Table 2: overall results on the test set for the official baseline (oppsift) and the runs kdevir 09, MIL 03, MindLab 01, DISA-MU 04, RUC 05, IPL 09, IMC-FU 01, INAOE 05, NII 01, FINKI 01, and our runs MLIA 06-10; numeric scores omitted.]</p>
          <p>(1) In our current
system, we directly utilized the Overfeat toolbox for tag prediction of training
images; a more reasonable choice would be to generate CNN visual features
and use them directly to learn the concept classifiers. Indeed, several
teams such as MIL and MindLab used CNN visual features. (2) Currently,
the "Context mapping" module only considers mapping the tags from Overfeat
to ImageCLEF via their synonyms/hyponyms in WordNet, and the similarity
measure from the NLTK toolbox may not be precise enough to produce the correct mappings.
An alternative is to model a context-based similarity measure of tags
using Flickr image metadata, which is more effective at capturing the
semantic associations in a practical setting. (3) Our concept
modeling (the learning of SVM-based concept classifiers) is not elaborately optimized and
tuned, because of hardware limitations and resource consumption. Our system's
capability should improve if we could overcome these limitations.</p>
        </sec>
      </sec>
      <sec id="sec-3-1">
        <title>Conclusion</title>
        <p>In this paper, we presented the annotation system we developed for the
ImageCLEF 2014 Scalable Concept Image Annotation task. Our proposal
focuses on improving the accuracy of concept assignment for training images.
We proposed a two-fold concept assignment scheme which explicitly leverages the
provided textual information semantically (Section 2.1) and the training images
visually. To learn concept classifiers, we adopted the established SVM-based
model and used the SGD algorithm to deal with the large-scale setting of this
task. Experimental results show that our proposals for both visual and textual
information processing are necessary to build a competitive system. We
also discussed potential future directions to further improve the current system.</p>
      </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Caputo</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , Muller, H.,
          <string-name>
            <surname>Martinez-Gomez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Villegas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Acar</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patricia</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marvasti</surname>
            , N., Uskudarl ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paredes</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cazorla</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia-Varea</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morell</surname>
          </string-name>
          , V.:
          <article-title>ImageCLEF 2014: Overview and analysis of the results</article-title>
          .
          <source>In: CLEF proceedings. Lecture Notes in Computer Science</source>
          , Springer Berlin Heidelberg (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Grana</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Serra</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manfredi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cucchiara</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martoglia</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mandreoli</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          : Unimore at imageclef 2013:
          <article-title>Scalable concept image annotation</article-title>
          .
          <source>In: CLEF 2013 Evaluation Labs andWorkshop, OnlineWorking Notes</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Hatcher</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gospodnetic</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCandless</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>Lucene in action</article-title>
          .
          <source>Second Edition</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Hidaka</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gunji</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harada</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Mil at imageclef 2013: Scalable system for image annotation</article-title>
          .
          <source>In: CLEF 2013 Evaluation Labs andWorkshop, OnlineWorking Notes</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liao</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jin</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Renmin university of china at imageclef 2013 scalable concept image annotation</article-title>
          .
          <source>In: CLEF 2013 Evaluation Labs and Workshop</source>
          , Online Working Notes (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Sahbi</surname>
          </string-name>
          , H.:
          <article-title>Telecom paristech at imageclef 2013 scalable concept image annotation task: Winning annotations with context dependent svms</article-title>
          .
          <source>In: CLEF 2013 Evaluation Labs and Workshop</source>
          , Online Working Notes (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Sermanet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eigen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mathieu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fergus</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , LeCun, Y.:
          <article-title>Overfeat: Integrated recognition, localization and detection using convolutional networks</article-title>
          .
          <source>CoRR abs/1312</source>
          .6229 (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Uricchio</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bertini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ballan</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Del Bimbo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Micc at imageclef 2013 image annotation subtask</article-title>
          .
          <source>In: CLEF 2013 Evaluation Labs andWorkshop, Online Working Notes</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Villegas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paredes</surname>
          </string-name>
          , R.:
          <article-title>Overview of the imageclef 2012 scalable concept image annotation subtask</article-title>
          .
          <source>In: CLEF 2012 Evaluation Labs and Workshop</source>
          , Online Working Notes (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Villegas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paredes</surname>
          </string-name>
          , R.:
          <article-title>Overview of the ImageCLEF 2014 Scalable Concept Image Annotation Task</article-title>
          . In:
          <article-title>CLEF 2014 Evaluation Labs</article-title>
          and Workshop, Online Working Notes (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Villegas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paredes</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thomee</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Overview of the imageclef 2013 scalable concept image annotation subtask</article-title>
          .
          <source>In: CLEF 2013 Evaluation Labs and Workshop</source>
          , Online Working Notes. pp.
          <volume>1</volume>
          {
          <issue>19</issue>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>