<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the ImageCLEF 2014 Domain Adaptation Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Barbara Caputo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Novi Patricia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Idiap Research Institute</institution>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Rome La Sapienza</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <fpage>341</fpage>
      <lpage>347</lpage>
      <abstract>
        <p>This paper describes the first edition of the Domain Adaptation Task at ImageCLEF 2014. Domain adaptation refers to the challenge of leveraging knowledge acquired when learning to recognize given classes in one database in order to recognize the same classes in a different data collection. We describe the scientific motivations behind the task, the research challenge on which the 2014 edition focused, the data, the evaluation metric, and the results obtained by participants. After a discussion of the lessons learned during this first edition, we conclude with possible ideas for future editions of the task.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The amount of freely available annotated image collections has increased
dramatically over recent years, thanks to the diffusion of high-quality cameras and
to the introduction of new and cheap annotation tools such as Mechanical
Turk [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Attempts to leverage over and across such large data sources have proved
challenging. Indeed, tools like Google Goggles can reliably recognize a
limited set of object classes, like books or wine labels, but are not able to generalize
across generic objects like food items, clothing items and so on. Several authors
showed that, for a given task, training on one dataset (e.g. Pascal VOC 07) and
testing on another (e.g. ImageNet) produces very poor results, even though the set
of depicted object categories is the same [
        <xref ref-type="bibr" rid="ref10 ref12 ref13 ref6">6,10,12,13</xref>
        ]. In other words, existing
object categorization methods do not generalize well across databases.
      </p>
      <p>
        This problem is known in the literature as the domain adaptation challenge,
long studied in machine learning for speech and language processing [
        <xref ref-type="bibr" rid="ref1 ref5">1,5</xref>
        ]. A source
domain (S) usually contains a large amount of labeled images, while a target
domain (T) refers broadly to a dataset that is assumed to have different
characteristics from the source, and few or no labeled samples. Formally, two domains
differ when their probability distributions differ: P_S(x, y) ≠ P_T(x, y), where
x ∈ X indicates the generic image sample and y ∈ Y the corresponding class
label. Within this context, the across-dataset generalization problem stems from
an intrinsic difference between the underlying distributions of the data.
      </p>
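      <p>As an illustration, consider the following minimal Python sketch (our own, hypothetical example: synthetic Gaussian features stand in for real image descriptors). A classifier trained on a source domain loses much of its accuracy on a target domain whose distribution is shifted, even though the label set is unchanged:</p>
      <preformat>
# Minimal sketch of the dataset-shift effect: a classifier fit on source
# samples degrades on a target domain with P_T(x, y) differing from P_S(x, y).
# Synthetic Gaussians stand in for real image features (assumption).
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def make_domain(shift, n=200, dim=16):
    """Two classes; `shift` displaces the whole domain's distribution."""
    x0 = rng.normal(loc=0.0 + shift, scale=1.0, size=(n, dim))
    x1 = rng.normal(loc=2.0 + shift, scale=1.0, size=(n, dim))
    return np.vstack([x0, x1]), np.array([0] * n + [1] * n)

Xs, ys = make_domain(shift=0.0)   # source: plenty of labeled data
Xt, yt = make_domain(shift=1.5)   # target: same labels, shifted distribution

clf = LinearSVC().fit(Xs, ys)
print("accuracy on source:", clf.score(Xs, ys))   # near 1.0
print("accuracy on target:", clf.score(Xt, yt))   # much lower
      </preformat>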
      <p>
        Addressing this issue would have a tremendous impact on the generality and
adaptability of any vision-based annotation system. Current research in domain
adaptation focuses on a scenario where
- (a) the prior domain (source) consists of one or at most two databases;
- (b) the labels of the source and the target domain are the same; and
- (c) the number of annotated training samples for the target domain is limited.
The goal of the Domain Adaptation Task, initiated in 2014 under the
ImageCLEF umbrella [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], is to push the state of the art in domain adaptation towards
more realistic settings by relaxing these assumptions. Our ambition is to provide,
over the years, challenging problems and data collections that can stimulate
and support novel research in the field.
      </p>
      <p>In the rest of the paper we describe the 2014 Domain Adaptation Task
(section 2.1), the data and features provided to the participants (section 2.2), and
the evaluation metric adopted (section 2.3). Section 3 describes the results
obtained, while section 4 provides an in-depth discussion of those results and
identifies possible new directions for the 2015 edition of the task. Conclusions
are given in section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>The 2014 Domain Adaptation Task</title>
      <p>In this section we describe the Domain Adaptation Task proposed in the
ImageCLEF 2014 lab. We first outline the research challenge we aimed to address
(section 2.1). Then we describe the data collection and the features provided
to all participants (section 2.2), and the evaluation metric used
(section 2.3).</p>
      <sec id="sec-2-1">
        <title>The Research Challenge</title>
        <p>
          In the 2014 (first) edition of the Domain Adaptation Task, we focused
on the number of sources available to the system. Experimental settings
widely used in the community typically consider one source and one target [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ],
or at most two sources and one target [
          <xref ref-type="bibr" rid="ref11 ref6">6,11</xref>
          ]. This scenario is unrealistic: with
the wide abundance of annotated resources and data collections made
available to users, and with the fast progress being made in the image
annotation community, it is likely that systems will be able to access more and
more databases, and therefore to leverage over a much larger number of sources
than the two considered in the most challenging settings today.
        </p>
        <p>
          To push research towards more realistic scenarios, the 2014 edition of the
Domain Adaptation Task proposed an experimental setup with four sources,
built by exploiting existing available resources such as the ImageNet and
Caltech-256 [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] databases. Participants were thus requested
to build recognition systems for the target classes by leveraging over such source
knowledge. We considered a semi-supervised setting, i.e. a setting where the
target data for each class are limited but annotated. In the next section we
describe in detail the data used for the sources, the classes contained in both
the source and the target, and the target data provided to participants.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Data and Features</title>
        <p>Source and Target Data To define the source and target data, we considered
five publicly available databases:
- the Caltech-256 database, consisting of 256 object categories, with a total of
30,607 images;
- the ImageNet ILSVRC2012 database, organized according to the WordNet
hierarchy, with an average of 500 images per node;
- the PASCAL VOC2012 database, an image data set for object class
recognition with 20 object classes;
- the Bing database, containing all 256 categories from the Caltech-256
database, augmented with 300 web images per category collected through
textual search using Bing;
- and the SUN database, a scene understanding database that contains 899
categories and 130,519 images.</p>
        <p>We then selected twelve classes common to all the datasets listed above:
aeroplane, bike, bird, boat, bottle, bus, car, dog, horse, monitor, motorbike,
and people. Figure 1 illustrates sample images for each class in each of
the considered datasets. As sources, we considered 50 images representing the
classes listed above from the Caltech-256, ImageNet, PASCAL and Bing
databases. The 50 images were randomly selected from all those contained in each
data collection, for a total of 600 images per source. As target, we
used images taken from the SUN database for each class, randomly selecting 5
images per class for training and 50 images per class for validation; these data
were given to all participants. The test set consisted of 50 images for
each class, for a total of 600, manually collected by us using the class names as
textual queries with standard search engines.</p>
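        <p>The sampling protocol can be summarized by the following Python sketch (all names and the synthetic feature pool are illustrative assumptions; the real data were drawn from the databases above):</p>
        <preformat>
# Sketch of the task's sampling protocol: 12 classes, 50 images per class
# per source, 5 target training images per class. Names are illustrative.
import numpy as np

CLASSES = ["aeroplane", "bike", "bird", "boat", "bottle", "bus",
           "car", "dog", "horse", "monitor", "motorbike", "people"]
SOURCES = ["Caltech-256", "ImageNet", "PASCAL-VOC2012", "Bing"]

rng = np.random.default_rng(42)

def select_split(features, labels, per_class):
    """Randomly keep `per_class` samples of each class, as in the task setup."""
    keep = [i for c in range(len(CLASSES))
            for i in rng.choice(np.flatnonzero(labels == c),
                                size=per_class, replace=False)]
    keep = np.array(keep)
    return features[keep], labels[keep]

# Fake pool standing in for one database: 300 images per class, 1024-dim.
labels = np.repeat(np.arange(len(CLASSES)), 300)
features = rng.normal(size=(labels.size, 1024))

Xsrc, ysrc = select_split(features, labels, per_class=50)  # one source: 600
Xtr, ytr = select_split(features, labels, per_class=5)     # target train: 60
print(Xsrc.shape, Xtr.shape)
        </preformat>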
        <p>
          Features Instead of making the images directly available to participants,
we decided to release only pre-computed features, in order to keep the focus on
the learning aspects of the algorithms in this year's competition. Thus, we
represented every image with dense SIFT descriptors (PHOW features) at points
on a regular grid with spacing 128 pixels [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. At each grid point the descriptors
were computed over four patches with different radii, hence each point was
represented by four SIFT descriptors. The dense features were vector quantized
into 256 visual words using k-means clustering on a randomly chosen subset
of the Caltech-256 database. Finally, all images were converted to 2 × 2 spatial
histograms over the 256 visual words, resulting in a 1024-dimensional feature
vector. The software used for computing such features is available at www.vlfeat.org.
        </p>
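        <p>The following Python sketch approximates this pipeline using OpenCV and scikit-learn rather than the official VLFeat code (www.vlfeat.org); the grid step and patch sizes below are illustrative assumptions, not the official parameters:</p>
        <preformat>
# Approximate PHOW-style pipeline: dense multi-scale SIFT, 256-word codebook,
# 2 x 2 spatial histogram -> 4 * 256 = 1024 dimensions. Grid step and patch
# sizes are assumptions for illustration only.
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def dense_sift(gray, step=8, sizes=(4, 6, 8, 10)):
    """SIFT descriptors on a regular grid, one per patch size at each point."""
    sift = cv2.SIFT_create()
    kps = [cv2.KeyPoint(float(x), float(y), float(s))
           for y in range(step, gray.shape[0] - step, step)
           for x in range(step, gray.shape[1] - step, step)
           for s in sizes]
    kps, desc = sift.compute(gray, kps)  # descriptors at the given keypoints
    return kps, desc

def phow_histogram(gray, kmeans, words=256):
    """2 x 2 spatial histogram of visual words for one grayscale image."""
    kps, desc = dense_sift(gray)
    assign = kmeans.predict(desc.astype(np.float64))
    h, w = gray.shape
    hist = np.zeros((2, 2, words))
    for kp, a in zip(kps, assign):
        hist[int(kp.pt[1] >= h / 2), int(kp.pt[0] >= w / 2), a] += 1
    hist = hist.ravel()
    return hist / max(hist.sum(), 1.0)

# Codebook: k-means over descriptors sampled from a subset of Caltech-256, e.g.
# kmeans = MiniBatchKMeans(n_clusters=256).fit(sampled_descriptors)
# gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        </preformat>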
      </sec>
      <sec id="sec-2-3">
        <title>Evaluation Metrics</title>
        <p>We asked participants to provide the class name for each of the 600 test images
released. Results were compared with the ground truth, and a score was assigned
as follows:
- each correctly classified image received 1 point;
- each misclassified image received 0 points.</p>
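        <p>In Python, this scoring rule amounts to the following short sketch (function name hypothetical):</p>
        <preformat>
# Score = number of correctly classified test images (maximum 600).
def score(predictions, ground_truth):
    return sum(int(p == g) for p, g in zip(predictions, ground_truth))
        </preformat>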
        <p>Together with the validation data, we provided all participants with a Matlab
script for evaluating the performance of their algorithms before the official
submission, i.e. on the validation data. The script had been tested under Matlab
(ver 8.1.0.64) and Octave (ver 3.6.2).</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>
        While 19 groups registered for the domain adaptation task and received access to
the training and validation data, only 3 groups eventually submitted runs: the
XRCE group, the Hubert Curien Lab group and the Idiap group (organizers).
They submitted the following algorithms:
- The XRCE group submitted a set of runs based on several heterogeneous
domain adaptation methods, whose predictions were subsequently fused.
By combining the output of instance-based approaches and a metric-learning
one with a brute-force SVM prediction, they obtained a set of heterogeneous
classifiers, all producing class predictions for the target domain instances.
These were combined through different versions of majority voting in order
to improve the overall accuracy (a minimal fusion sketch is given after this
list).
- The Hubert Curien Lab group did not submit any working notes, nor did they
send any details about their algorithm. We are therefore not able to describe
it.
- The Idiap group submitted a baseline run using a recently introduced
learning-to-learn algorithm [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The approach considers source classifiers as
experts, and combines their confidence outputs with a high-level cue
integration scheme, as opposed to the mid-level one proposed in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The algorithm
is called High-level Learning to Learn (H-L2L). As our goal was not to
obtain the best possible performance but rather to provide an off-the-shelf
baseline against which to compare the results of the other participants, we did
not perform any parameter tuning.
      </p>
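      <p>As a reference for the fusion step mentioned above, the following Python sketch implements plain majority voting over heterogeneous classifiers; the actual weighting schemes used in the submitted runs are not reproduced here:</p>
      <preformat>
# Majority-vote fusion of several classifiers' predictions (a simplified
# stand-in for the ensemble strategies used in the submitted runs).
from collections import Counter

def majority_vote(per_classifier_predictions):
    """Each element is one classifier's length-N list of predicted labels."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*per_classifier_predictions)]

# e.g. fusing three heterogeneous classifiers over the 600 test images:
# fused = majority_vote([svm_preds, metric_preds, instance_preds])
      </preformat>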
    </sec>
    <sec id="sec-4">
      <title>Analysis and Discussion</title>
      <p>The clear success of the XRCE group, obtained by combining several domain
adaptation methods from the literature, seems to indicate that no single
current method can effectively address the problem of leveraging over
multiple sources. Ensemble methods, chosen by at least two teams, appear instead to
be a viable option in this setting, whether used to combine the outputs of various
domain adaptation algorithms or to combine the output confidences of several
sources.</p>
      <p>The choice to provide participants only with the features computed from
each image, and not the images themselves, forced groups to focus on the learning
aspects of the problem, but perhaps did not allow enough flexibility in
attacking it. We do not plan to repeat this choice in future editions
of the task.</p>
      <p>A last remark should be made on the limited participation in the task. Even
though only three groups eventually submitted runs, 19 groups expressed interest
and registered in order to access the training and validation data. We believe
that this indicates enough interest to push us to organize the task again
next year, also collecting feedback from the participating and registered groups
in order to identify possible problems in the current edition and to offer a more
engaging edition of the task in the future.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>
        The first edition of the Domain Adaptation Task, organized under the
ImageCLEF umbrella, focused on the problem of building a classifier in a target domain
while leveraging over four different sources. Nineteen groups registered for the
task, and eventually three groups submitted runs, with the XRCE group winning the
competition with an ensemble-learning-based method. For the 2015 edition of
the task, we plan to make the raw images available to participants, as opposed
to the pre-computed features released in 2014, so as to allow for a wider generality of
approaches. We will continue to propose data supporting the problem of
leveraging over multiple sources, possibly by augmenting the number of classes (which
was 12 in the 2014 edition), and/or by allowing for a partial overlap of classes
between sources and between sources and target, as proposed in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. In order to
significantly increase the number of participants in the task next year, we will
contact all groups that registered for the task and ask for their preferences among
these different options.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was partially supported by the Swiss National Science Foundation
project Situated Vision (SIVI).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Blitzer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McDonald</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pereira</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Domain adaptation with structural correspondence learning</article-title>
          .
          <source>In: Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bosch</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zisserman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Image classification using random forests and ferns</article-title>
          .
          <source>In: Proc. CVPR</source>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Buhrmester</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kwang</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gosling</surname>
            ,
            <given-names>S.D.</given-names>
          </string-name>
          :
          <article-title>Amazon's Mechanical Turk: a new source of inexpensive, yet high-quality, data?</article-title>
          <source>Perspectives on Psychological Science</source>
          <volume>6</volume>
          (
          <issue>1</issue>
          ), 3–5 (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Caputo</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , Muller, H.,
          <string-name>
            <surname>Martinez-Gomez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Villegas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Acar</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patricia</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marvasti</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uskudarlı</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paredes</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cazorla</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia-Varea</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morell</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>ImageCLEF 2014: Overview and analysis of the results</article-title>
          .
          <source>In: CLEF proceedings. Lecture Notes in Computer Science</source>
          , Springer Berlin Heidelberg (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Daume III</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Frustratingly easy domain adaptation</article-title>
          .
          <source>In: Association for Computational Linguistics Conference (ACL)</source>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Gong</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shi</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sha</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grauman</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Geodesic flow kernel for unsupervised domain adaptation</article-title>
          .
          <source>In: Proc. CVPR</source>
          (
          <year>2012</year>
          ). Extended version considering its additional material
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Griffin</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holub</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perona</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Caltech 256 object category dataset</article-title>
          .
          <source>Tech. Rep. UCB/CSD-04-1366</source>
          , California Institute of Technology (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Jie</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tommasi</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caputo</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Multiclass transfer learning from unconstrained priors</article-title>
          .
          <source>In: Proc. ICCV</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Patricia</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caputo</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Learning to learn, from transfer learning to domain adaptation: a unifying perspective</article-title>
          .
          <source>In: Proc. CVPR</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Saenko</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kulis</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fritz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darrell</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Adapting visual category models to new domains</article-title>
          .
          <source>In: Proc. ECCV</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Tommasi</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caputo</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Frustratingly easy NBNN domain adaptation</article-title>
          .
          <source>In: Proc. ICCV</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Tommasi</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quadrianto</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caputo</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lampert</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Beyond dataset bias: Multi-task unaligned shared knowledge transfer</article-title>
          .
          <source>In: Proc. ACCV</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Torralba</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Efros</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          :
          <article-title>Unbiased look at dataset bias</article-title>
          .
          <source>In: Proc. CVPR</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>