<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the CLEF 2009 Large-Scale Visual Concept Detection and Annotation Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Stefanie Nowak</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Peter Dunker</string-name>
          <email>peter.dunker@ieee.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Semantic Audio-Visual Systems, Fraunhofer IDMT</institution>
          ,
          <addr-line>Ilmenau, Germany</addr-line>
        </aff>
      </contrib-group>
      <kwd-group kwd-group-type="ccs-categories">
        <kwd>H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and Retrieval; H.3.4 Systems and Software; H.3.7 Digital Libraries</kwd>
        <kwd>H.2 [Database Management]: H.2.4 Systems - Multimedia Databases</kwd>
      </kwd-group>
      <kwd-group kwd-group-type="author-keywords">
        <kwd>Image Classification and Annotation</kwd>
        <kwd>Knowledge Structures</kwd>
        <kwd>Evaluation</kwd>
        <kwd>Measurement</kwd>
        <kwd>Performance</kwd>
        <kwd>Experimentation</kwd>
        <kwd>Benchmark</kwd>
      </kwd-group>
      <abstract>
        <p>The large-scale visual concept detection and annotation task (LS-VCDT) in ImageCLEF 2009 aims at the detection of 53 concepts in consumer photos. These concepts are structured in an ontology which implies a hierarchical ordering and which can be utilized during training and classification of the photos. The dataset consists of 18,000 Flickr photos which were manually annotated with the 53 concepts; 5,000 photos were used for training and 13,000 for testing. Altogether 19 research groups participated and submitted 73 runs. Two evaluation paradigms were applied: evaluation per concept and evaluation per photo. The evaluation per concept was performed by calculating the Equal Error Rate (EER) and the Area Under the Curve (AUC). For the evaluation per photo, a recently proposed hierarchical measure was utilized that takes the hierarchy and the relations of the ontology into account and calculates a score per photo. For the concepts, an average AUC of 84% could be achieved, including concepts with an AUC of 95%. The classification performance for each photo ranged between 69% and 100%, with an average score of 90%.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Task Description, Database and Ontology</title>
      <p>
        The focus of LS-VCDT lies on the automatic detection and annotation of concepts in a large
consumer photo collection. It mainly poses two challenges:
1. Can image classifiers scale to the large amount of concepts and data?
2. Can an ontology (hierarchy and relations) help in large-scale annotation?
In this task, the MIR Flickr 25,000 image dataset [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] is utilized. This collection consists of 25,000
photos from Flickr under Creative Commons licenses. Most of them contain EXIF data, stored in a
separate text file. We used altogether 18,000 of these photos, annotated them manually with the
defined visual concepts and provided them to the participants.
      </p>
      <p>The training set consists of 5,000 and the test set of 13,000 images of the photo set. All images
have multiple annotations. Most annotations refer to holistic visual concepts and are annotated
at an image-based level. Altogether we provided the annotations for 53 concepts in RDF format
and as plain text files. The visual concepts are organized in a small ontology. Participants could
use the hierarchical order of the concepts and the relations between concepts for solving the
annotation task. The use of additional training data was not allowed, to ensure comparability
among the groups.</p>
      <p>The LS-VCDT is an extension of the former VCDT 2008 in terms of the amount of data
available and the number of concepts to be annotated. In 2008, the database was quite small, with
about 1,800 images for training, 1,000 images for testing and 17 concepts to be detected.</p>
      <p>4
10
19
28
11
20
29
5
14
23
12
21
13
22
30
31
32
6
15
24
33
37
38
39
40
41
42
7
25
16
34
43
8
17
26
35
44
46
47
48
49
50
51
52
The annotation process was realized in three steps. First the annotation of all photos was
performed by several annotators, second a validation step of these annotations was conducted and
third an agreement between different annotators for the same concepts and photos was calculated.</p>
      <p>The annotation of the 18,000 photos was performed by 43 persons from the Fraunhofer IDMT.
The number of photos annotated per person varied between 30 and 2,500 images.
All annotators were provided with a definition of the concepts and example images, with the goal
of enabling consistent annotation amongst the large number of persons. It was important that the
concepts be represented over the whole image. Some of the concepts exclude each other; others
can be depicted simultaneously. One example photo per concept is illustrated in Fig. 1, and a
complete list of all concepts can be found in Table 1. The frequency of each concept in the training
and test sets is also depicted.</p>
      <p>After this first annotation step, a validation of the annotations was performed. Due to the
number of people, the number of photos and the ambiguity of some image contents, the annotations
were not consistent throughout the database. Three persons performed a validation by screening,
for each concept X, a) the photos that were annotated with concept X and b) the photos that were
not annotated with concept X. In the first case they had to delete all annotations for concepts that
were not depicted in the photo and so were wrongly assigned. In the second case the goal was to
find the photos where an annotation for concept X was missing although the concept was visible.</p>
      <p>Additionally, a subset of 100 photos was annotated by 11 different persons. These annotations
are used to calculate an agreement between annotators for different concepts and photos. The
agreement on concepts is illustrated in Table 1. For each photo and each concept, the annotation
of the majority of annotators was regarded as correct, and the percentage of annotators that
annotated correctly is utilized as the agreement factor. This agreement is used in the Hierarchical Score
(HS) as a scaling factor (see Sec. 2.3). In case of a low agreement, the algorithm assumes that the
concept is ambiguous and therefore reduces the costs if the system wrongly assigns this concept.
Regrettably, it was not possible to have each photo annotated by two or three persons to obtain a
validation and an agreement on concepts over the whole set.</p>
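      <p>The majority-vote agreement factor described above can be sketched in a few lines. This is an illustrative sketch only; the vote counts in the example are hypothetical, not taken from the actual 100-photo subset.</p>
      <preformat>
```python
def agreement_factor(votes):
    """Agreement for one (photo, concept) pair: the majority decision is
    regarded as correct, and the fraction of annotators who agree with
    it is returned as the agreement factor (votes are booleans)."""
    yes = sum(votes)
    no = len(votes) - yes
    return max(yes, no) / len(votes)

# Hypothetical example: 8 of 11 annotators tagged the concept as present.
factor = agreement_factor([True] * 8 + [False] * 3)
```
      </preformat>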
      <sec id="sec-1-1">
        <title>Ontology</title>
        <p>In addition to the photos and their annotations, an ontology was provided in which all
concepts are structured. Fig. 2 shows a simple hierarchical organization of a part of the concepts.
The hierarchy allows one to make assumptions about the assignment of concepts to documents: e.g.,
if a photo is classified to contain trees, it also contains plants. Next to the is-a
relationships of the hierarchical organization of concepts, other relationships between concepts
determine possible label assignments. For example, the ontology restricts certain sub-nodes so that only
one of their concepts can be assigned at a time (disjoint concepts), or a special concept (like portrait)
postulates other concepts like persons or animals.</p>
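        <p>A minimal sketch of how such ontology knowledge can be applied as a post-processing step is shown below. The concept names and parent links are illustrative assumptions; the actual ontology was provided in RDF and covers all 53 concepts.</p>
        <preformat>
```python
# Illustrative fragment of an is-a hierarchy and a disjointness rule
# (assumed names, not the task's actual ontology file).
PARENT = {"Trees": "Plants", "Flowers": "Plants", "Plants": None}
DISJOINT = [{"Day", "Night"}]  # at most one concept per group may be assigned

def propagate(labels):
    """Close a label set under the is-a hierarchy: a photo classified
    to contain trees also contains plants."""
    closed = set(labels)
    for label in labels:
        parent = PARENT.get(label)
        while parent is not None:
            closed.add(parent)
            parent = PARENT.get(parent)
    return closed

def violates_disjointness(labels):
    """True if two mutually exclusive concepts are both assigned."""
    return any(len(set(labels) & group) > 1 for group in DISJOINT)
```
        </preformat>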
        <p>The ontology allows the participants to incorporate knowledge in their classification algorithms,
and to make assumptions about which concepts are probable in combination with certain labels.</p>
      </sec>
      <sec id="sec-1-2">
        <title>Evaluation Measures</title>
        <p>
          The evaluation of submissions to LS-VCDT considers two evaluation paradigms. We are interested
in the evaluation per concept and in the evaluation per photo. For the evaluation per concept,
the EER and the AUC of the ROC curves summarize the performance of the individual runs.
The EER is defined as the point where the false acceptance rate of a system is equal to the false
rejection rate. These scores were also used in the VCDT task 2008 and allow comparing the results
of the different groups on the overlapping concepts. The evaluation per photo is assessed with a
recently proposed hierarchical measure [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. It considers partial matches between system output
and ground truth and calculates misclassification costs for each missing or wrongly annotated
concept per image. The score is based on structure information (the distance between concepts in the
hierarchy), relationships from the ontology and the agreement between annotators for a concept.
The calculation of misclassification costs favours systems that annotate a photo with concepts close
to the correct ones over systems whose annotated concepts are far away in the hierarchy
from the correct concepts (e.g., for the single-label classification case depicted in Fig. 2, system
1 incurs lower misclassification costs than system 2).
        </p>
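        <p>The per-concept EER (the operating point where false acceptance and false rejection rates coincide) can be sketched as below by scanning candidate thresholds over the confidence scores. The scores and labels in the example are toy values, not actual run data.</p>
        <preformat>
```python
def equal_error_rate(scores, labels):
    """Estimate the EER of one concept detector:
    FAR = accepted negatives / negatives,
    FRR = rejected positives / positives;
    the EER is read off where the two rates are closest."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    best_gap, eer = 1.0, 1.0
    for t in sorted(set(scores)):
        far = sum(s >= t for s in neg) / len(neg)
        frr = sum(s < t for s in pos) / len(pos)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```
        </preformat>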
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Results</title>
      <p>19 participants submitted results to the LS-VCDT task in altogether 73 runs. The number of runs
was restricted to a maximum of 5 runs per group.</p>
      <p>In Table 2 the results for the evaluation per concept are illustrated. The team with the best
results (ISIS, University of Amsterdam) achieves an EER of 23% and an AUC of 84% on average for
their best run. One run with pseudo-random numbers was added by the organizers. In this case,
for each concept a random number between 0 and 1 was generated that denotes the confidence of
the annotation for the EER/AUC computation and that was rounded to 0 or 1 for the hierarchical
measure per photo. The random numbers achieve an EER and AUC of 50%.</p>
      <p>In Table 3 the results for each concept are summarized. On average the concepts could be
detected with an EER of 23% and an AUC of 84%. A large number of these concepts was
classified best by the ISIS group. It is apparent that the aesthetic concepts (Aesthetic_Impression,
Overall_Quality and Fancy) are classified worst (EER greater than 38% and AUC smaller
than 66%). This is not surprising due to the subjective nature of these concepts, which also
made the groundtruthing very difficult. The best classified concepts are Clouds (AUC: 96%),
Sunset-Sunrise (AUC: 95%), Sky (AUC: 95%) and Landscape-Nature (AUC: 94%).</p>
      <p>In Table 4 the results for the evaluation per photo are summarized. The classification
performance per photo ranges between 69% and 100% with an average of 90%. The best results in terms
of HS were achieved by the XRCE group with an 83% annotation score over all photos. It can be
seen from the table that the ranking of the groups differs from that for the EER/AUC. It seems
that some of the groups took the ontology information into account (at least in a post-processing
step) while others ignored it. Including the annotator agreements does not change the results
substantially: the scores are slightly worse because the measure is stricter, but the ranking of the
groups remains the same.</p>
      <p>This subsection gives a brief overview of the technologies submitted by the participants. Further
information about each approach can be found in the corresponding papers.</p>
      <p>
        The IAM group [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] focuses on visual terms, which are created from low-level features mainly
based on SIFT followed by a codebook quantization. For machine learning, the Cross Language
Latent Indexing method was applied, which maps the concept names and the visual terms into a
semantic space. The classification decision is made by estimating the smallest cosine distance
between concepts and visual terms. In the different runs, a successive expansion of the concept
hierarchy was tried, which resulted in no improvement.
      </p>
      <p>
        The algorithm of the TELECOM ParisTech group [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] was designed especially for the large-scale
scenario, which means low complexity of the processing and easy extension to a variety
of concepts, at the price of a decrease in precision. The algorithm utilizes global visual features and
text features generated out of the 53 visual concepts via PCA. A Canonical Correlation Analysis
is used to capture linear relationships between these different feature spaces.
      </p>
      <p>
        The UPMC/LIP6 group [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] utilizes a simple HSV histogram feature calculated in 3
horizontal segments and a linear kernel SVM for learning.
      </p>
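      <p>The three-band histogram feature can be sketched as follows. This is a hedged reconstruction: the bin count and per-histogram normalisation are assumptions, as the working notes do not specify them.</p>
      <preformat>
```python
def band_histograms(image, bands=3, bins=8):
    """Concatenate per-channel histograms over `bands` horizontal strips.
    `image` is a list of rows, each row a list of (h, s, v) tuples with
    values in [0, 1]; bin count and normalisation are illustrative
    assumptions."""
    n_rows = len(image)
    feature = []
    for b in range(bands):
        strip = image[b * n_rows // bands:(b + 1) * n_rows // bands]
        for channel in range(3):
            hist = [0] * bins
            count = 0
            for row in strip:
                for pixel in row:
                    hist[min(int(pixel[channel] * bins), bins - 1)] += 1
                    count += 1
            feature.extend(h / max(count, 1) for h in hist)
    return feature
```
      </preformat>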
      <p>
        The MRIM-LIG group [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] combines RGB histograms, SIFT and Gabor features. For the
learning phase, different SVM combinations are trained, and the best feature/SVM
setup for each concept, determined a priori, is used.
      </p>
      <p>
        The LSIS group [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] combines different features, e.g. HSV, edge, Gabor or profile entropy
features, and applies a visual dictionary with a visual-word approach.
      </p>
      <p>
        The AVEIR submissions [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] are from a joint group of the individual participants: Telecom
ParisTech, LSIS, MRIM-LIG and UPMC/LIP6. The AVEIR submissions seem to be equal to
the individual submissions. An efficient and reliable combination or fusion method based on a
carefulness index is discussed only theoretically in the working notes.
      </p>
      <p>
        The KameyamaLab group [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] proposed a system with joint global color and texture features
as well as local features based on saliency regions. Additionally, a gist of scene feature is used.
For the assignment of concept labels, a KNN classifier is applied.
      </p>
      <p>
        The ISIS group [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] applies a system that is based on four main steps.
      </p>
      <sec id="sec-2-1">
        <title>Submitted Approaches (continued)</title>
        <p>First, a sampling strategy is applied that combines a spatial pyramid approach and saliency points detection. Second,
SIFT features are extracted in different color spaces. To reduce the amount of visual features, a
codebook transformation is utilized in the third step and the frequency information of predefined
codewords is used as final feature. The final learning step is based on SVM with χ2 kernel. The
runs differ mainly in the number of SIFT features used and the codebook generation.</p>
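          <p>The codebook transformation in the third step can be sketched as follows; the toy codebook and descriptors are illustrative, and the codebook training itself (e.g. by clustering) is omitted.</p>
          <preformat>
```python
def quantize(descriptors, codebook):
    """Assign each local descriptor to its nearest codeword (squared
    Euclidean distance) and return the normalised codeword frequency
    histogram used as the final image feature."""
    hist = [0] * len(codebook)
    for desc in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda i: sum((a - b) ** 2
                                        for a, b in zip(desc, codebook[i])))
        hist[nearest] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]
```
          </preformat>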
        <p>
          The FIRST group [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] used SIFT features on different color channels and pyramid histograms
over color intensities. The SIFT features are combined via the bag-of-words approach. For
classification, an SVM was applied with an average kernel, sparse L1 MKL and non-sparse Lp MKL kernels.
        </p>
        <p>
          The XRCE group [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] uses a set of different features: a GMM image representation, Fisher
vectors, local RGB statistics and SIFT features. The local features are extracted on a multi-level
image grid. For classification, a Sparse Logistic Regression approach was applied. In the
post-processing, the hierarchical structure, disjoint concepts and related concepts were considered.
        </p>
        <p>
          The SZTAKI1 group [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] used SIFT features and a graph-based segmentation algorithm.
Based on the segments, color histograms, shape and DFT features are estimated. The SIFT
features are post-processed with a GMM, and a Fisher kernel is applied to the features derived
from the segmentation. For classification, a binary logistic regression approach is utilized. As
one of only a few groups, SZTAKI applied the connections between concepts in the provided
ontology, among other things, to estimate correlations of co-occurring concepts.
        </p>
        <p>
          The INRIA-LEAR group [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] utilizes a bag-of-features setup with global features, namely a
gist-of-scene descriptor and different color histograms computed in three horizontal regions of the
image, and local SIFT features quantized with k-means. In two runs a weighted nearest-neighbor
tag prediction method is applied, and in two runs an SVM for each concept is used. The fifth run
uses an SVM classifier trained for multi-class separation. The SVM runs performed better than the
tag prediction, whilst the tag prediction was ten times faster. No post-processing based on the ontology
rules was performed; therefore the results for the HS are worse than for the EER.
        </p>
        <p>
          The TIA-INAOE group [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] provided an algorithm based on global features, e.g. color and
edge histograms. As a baseline run, a KNN classifier is used, and the most often appearing concepts
among the top nearest-neighbor training images were assigned. A further label refinement process
concentrates on co-occurrence statistics of the disjoint concepts in the training set.
(Footnote 1: SZTAKI equals the bpacad submissions.)
        </p>
        <p>
          The CVIU I2R group [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] provides a system that utilizes various global and local features,
e.g. color and edge histograms, a color coherence vector, the census transform and different SIFT
features. Furthermore, a local region search algorithm is used to preselect a relevant
bounding box for each concept. In combination with an SVM with χ2 kernel, a feature selection
process is applied to choose the most relevant features for each concept. For the disjoint concepts,
the probabilities were manipulated in order to have a single concept above 0.5.
        </p>
        <p>
          The UAIC group [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] utilizes four modules. First, face detection software was used to
estimate the number of faces in images. Unfortunately, this module breaks the rules of the
LS-VCDT, because the face detector is based on a Viola-Jones detector which was trained with
data that was not provided in this task. The second module concentrates on clustering of training
images, whereby most concepts were set to a score of 0.5 if no decision could be made. The same
process was applied in an EXIF data processing module. The last module sets default values for
disjoint concepts depending on their occurrence in the training data.
        </p>
        <p>
          The MMIS submissions [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] utilize global color histogram, Tamura texture and Gabor
features. The selected features are estimated in nine subregions and concatenated into the overall feature
vector. A non-parametric density estimation is applied, and a baseline approach using global feature
weighting was submitted. The other four submissions use different parameter combinations for
mapping word correlations to semantic similarity. The runs differ in the source of the semantic similarity
space, which was estimated from the training data, Google Web search, WordNet or a Wikipedia
measure. The submission based on the training data achieved the best results.
        </p>
        <p>Summarizing the approaches, some conclusions can be drawn. The groups that used local features like
SIFT achieved better results than the groups relying solely on global features. Most groups that
investigated the concept hierarchy and analyzed, e.g., the correlations between the concepts, could
achieve better ranks when evaluated with the hierarchical measure than with the EER. The information
about computational performance is difficult to compare, because the reported figures range from
72 hours for the complete process to 1 second for training and testing. In the 2010 task, a more
detailed specification for this information is needed.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>This paper summarises the ImageCLEF 2009 LS-VCDT. Its aim was to automatically annotate
photos with 53 concepts in a multilabel scenario. An additionally provided ontology could be
used to enrich the classification systems. The results show that on average the task could be solved
reasonably well, with the best system achieving an AUC of 84% over all photos. Four other groups
achieved an AUC score equal to or above 80%. Evaluated on the concept basis, the concepts could be
annotated on average with an AUC of 84%. In terms of HS, the best system annotated all photos
with an average annotation rate of 83%. Three other systems were very close to these results
with 83%, 82% and 81%. Some of the groups used the ontology for post-processing or to learn
correlations of concepts. No participant integrated the ontology in a reasoning system and tried
to apply this system to the classification task. The large number of concepts and photos posed
no problem for the classification systems.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgment</title>
      <p>This work has been partly supported by grant No. 01MQ07017 of the German research program
THESEUS funded by the Ministry of Economics.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ah-Pine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Clinchant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Csurka</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <article-title>XRCE's Participation in ImageCLEF 2009</article-title>
          .
          <source>CLEF working notes 2009</source>
          , Corfu, Greece,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Binder</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Kawanabe</surname>
          </string-name>
          .
          <article-title>Fraunhofer FIRST's Submission to ImageCLEF2009 Photo Annotation Task: Non-sparse Multiple Kernel Learning</article-title>
          .
          <source>CLEF working notes 2009</source>
          , Corfu, Greece,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Daroczy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Petras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.A.</given-names>
            <surname>Benczur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Fekete</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nemeskey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Siklosi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Weiner</surname>
          </string-name>
          .
          <article-title>SZTAKI @ ImageCLEF 2009</article-title>
          .
          <source>CLEF working notes 2009</source>
          , Corfu, Greece,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Douze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Guillaumin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mensink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Schmid</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Verbeek</surname>
          </string-name>
          .
          <article-title>INRIA-LEAR's participation to ImageCLEF 2009</article-title>
          .
          <source>CLEF working notes 2009</source>
          , Corfu, Greece,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.J.</given-names>
            <surname>Escalante</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.A.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.A.</given-names>
            <surname>Hernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lopez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Montex</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Morales</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.E.</given-names>
            <surname>Sucar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Villasenor</surname>
          </string-name>
          .
          <article-title>TIA-INAOE's Participation at ImageCLEF 2009</article-title>
          .
          <source>CLEF working notes 2009</source>
          , Corfu, Greece,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Fakeri-Tabrizi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tollari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Denoyer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Gallinari</surname>
          </string-name>
          . UPMC/LIP6 at ImageCLEFannotation 2009:
          <article-title>Large Scale Visual Concept Detection and Annotation</article-title>
          .
          <source>CLEF working notes 2009</source>
          , Corfu, Greece,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ferecatu</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Sahbi</surname>
          </string-name>
          . TELECOM ParisTech at ImageClef 2009:
          <article-title>Large Scale Visual Concept Detection and Annotation Task</article-title>
          .
          <source>CLEF working notes 2009</source>
          , Corfu, Greece,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H.</given-names>
            <surname>Glotin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fakeri-Tabrizi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mulhem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ferecatu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tollari</surname>
          </string-name>
          , G. Quenot,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sahbi</surname>
          </string-name>
          , E. Dumont, and
          <string-name>
            <given-names>P.</given-names>
            <surname>Gallinari</surname>
          </string-name>
          .
          <article-title>Comparison of Various AVEIR Visual Concept Detectors with an Index of Carefulness</article-title>
          .
          <source>CLEF working notes 2009</source>
          , Corfu, Greece,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.S.</given-names>
            <surname>Hare</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.H.</given-names>
            <surname>Lewis</surname>
          </string-name>
          .
          <article-title>IAM@ImageCLEFPhotoAnnotation 2009: Naive application of a linear-algebraic semantic space</article-title>
          .
          <source>CLEF working notes 2009</source>
          , Corfu, Greece,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Mark J.</given-names>
            <surname>Huiskes</surname>
          </string-name>
          and
          <string-name>
            <given-names>Michael S.</given-names>
            <surname>Lew</surname>
          </string-name>
          .
          <article-title>The MIR Flickr Retrieval Evaluation</article-title>
          .
          In
          <source>MIR '08: Proceedings of the 2008 ACM International Conference on Multimedia Information Retrieval</source>
          , New York, NY, USA,
          <year>2008</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Iftene</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Vamanu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Croitoru</surname>
          </string-name>
          .
          <article-title>UAIC at ImageCLEF 2009 Photo Annotation Task</article-title>
          .
          <source>CLEF working notes 2009</source>
          , Corfu, Greece,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Llorente</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Little</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Rüger</surname>
          </string-name>
          .
          <article-title>MMIS at ImageCLEF 2009: Non-parametric Density Estimation Algorithms</article-title>
          .
          <source>CLEF working notes 2009</source>
          , Corfu, Greece,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>P.</given-names>
            <surname>Mulhem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J-P.</given-names>
            <surname>Chevallet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Quenot</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Al Batal</surname>
          </string-name>
          .
          <article-title>MRIM-LIG at ImageCLEF 2009: Photo Retrieval and Photo Annotation tasks</article-title>
          .
          <source>CLEF working notes 2009</source>
          , Corfu, Greece,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ngiam</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Goh</surname>
          </string-name>
          .
          <article-title>I2R ImageCLEF Photo Annotation 2009 Working Notes</article-title>
          .
          <source>CLEF working notes 2009</source>
          , Corfu, Greece,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Nowak</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Lukashevich</surname>
          </string-name>
          .
          <article-title>Multilabel Classification Evaluation using Ontology Information</article-title>
          .
          In
          <source>The 1st Workshop on Inductive Reasoning and Machine Learning on the Semantic Web - IRMLeS 2009, co-located with the 6th Annual European Semantic Web Conference (ESWC)</source>
          , Heraklion, Greece,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sarin</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Kameyama</surname>
          </string-name>
          .
          <article-title>Joint Contribution of Global and Local Features for Image Annotation</article-title>
          .
          <source>CLEF working notes 2009</source>
          , Corfu, Greece,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>K.E.A.</given-names>
            <surname>van de Sande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gevers</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.W.M.</given-names>
            <surname>Smeulders</surname>
          </string-name>
          .
          <article-title>The University of Amsterdam's Concept Detection System at ImageCLEF 2009</article-title>
          .
          <source>CLEF working notes 2009</source>
          , Corfu, Greece,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Z-Q.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Glotin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Dumont</surname>
          </string-name>
          .
          <article-title>LSIS Scale Photo Annotations: Discriminant Features SVM versus Visual Dictionary based on Image Frequency</article-title>
          .
          <source>CLEF working notes 2009</source>
          , Corfu, Greece,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>