<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ECP: Evaluation Community Portal</article-title>
        <subtitle>A Portal for Evaluation and Collaboration in User Modelling and Personalisation Research</subtitle>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kevin Koidl</string-name>
          <email>Kevin.Koidl@scss.tcd.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Killian Levacher</string-name>
          <email>Killian.Levacher@scss.tcd.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Owen Conlan</string-name>
          <email>Owen.Conlan@scss.tcd.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ben Steichen</string-name>
          <email>bsteichen@scu.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ADAPT, KDEG, School of Computer Science &amp; Statistics, Trinity College</institution>
          ,
          <addr-line>Dublin</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Engineering, Santa Clara University</institution>
          ,
          <addr-line>Santa Clara, CA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Researchers conducting evaluations in the fields of User Modelling and Personalisation lack ongoing evaluation feedback from, and collaboration with, the wider research community. As a result, they receive little feedback on evaluation approaches, have limited insight into potentially reusable evaluation results, and cannot draw on shared evaluation tasks. This paper introduces a community portal, ECP (Evaluation Community Portal), which focuses specifically on evaluations within the UMAP (User Modelling, Adaptation and Personalisation) community.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1.   INTRODUCTION</title>
      <p>Researchers conducting evaluations in the fields of User
Modelling and Personalisation lack ongoing evaluation feedback
from, and collaboration with, the wider research community. As a
result, they receive little feedback on evaluation approaches, have
limited insight into potentially reusable evaluation results, and
lack shared evaluation tasks with which to compare different user
modelling and personalisation approaches.</p>
      <p>Other research areas have managed to overcome these barriers.
In Information Retrieval (IR), for example, the TREC and CLEF
initiatives run evaluation campaigns with shared evaluation tasks
and maintain community portals containing shared datasets.
Another example of a successful research community portal is the
well-known Call for Papers (CFP) wiki, an established portal for
finding information about upcoming conferences. Both examples
indicate that community portals within and across research
communities can help overcome the limitations caused by a lack of
central communication and outreach.</p>
      <p>This paper introduces a community portal, ECP (Evaluation
Community Portal), which focuses specifically on evaluations
within the UMAP (User Modelling, Adaptation and Personalisation)
community. It aims to serve as a place for creating and
discussing shared evaluation tasks, from design to results.
Furthermore, the portal seeks to provide access to result datasets,
so that researchers can build on other research and discuss
previously conducted work.</p>
      <p>The goal of this paper is to spark a discussion on how the
proposed portal would assist the UMAP research community and
which mechanisms would have to be put in place to create and
promote such a portal.</p>
    </sec>
    <sec id="sec-3">
      <title>2.   RELATED APPROACHES</title>
      <p>Despite a well-established User Modelling, Adaptation and
Personalisation (UMAP) community, many fundamental
evaluation challenges remain unsolved.</p>
      <p>
        Repeatedly obtaining a sufficiently large number of users to
evaluate prototypes is a recurring challenge that is familiar across
research institutions [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. To overcome this issue, many
researchers in the field of Human-Computer Interaction have
started using crowdsourcing platforms such as Amazon
Mechanical Turk<sup>1</sup> or Crowdflower<sup>2</sup> to perform usability studies.
Indeed, such platforms have been shown to be a good
substitute for general lab-based usability studies [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ][
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. However,
systems and experiments in the field of User
Modelling and Personalisation typically require prolonged user
exposure to, and interaction with, a system in order to i) build
accurate user models and ii) truly gauge the effectiveness of
personalisation techniques. This is often infeasible given the
typically short interaction paradigm and setup of crowdsourcing
platforms.
      </p>
      <p>
        Additionally, the ability to assess aggregated research results
over time is hampered by the fact that evaluations are mostly
carried out in isolation from each other and are usually not easily
reproducible or directly comparable [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ], which makes rigorous
comparative evaluation of individual systems difficult. For
example, while there has been
substantial work over the last two decades on the development of
novel adaptive and personalised e-learning systems, the various
research prototypes have generally not been compared to each
other through standardised evaluation campaigns.
      </p>
      <p>
        Within the wider research community, two
well-established community-based practices are worth pointing out.
The first is the Call For Papers (CFP) wiki [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], whose
main purpose is to allow researchers to advertise conference
venues, paper submission deadlines, etc. This community-driven
platform both i) centralises the outreach
needs of the community with respect to a shared goal (i.e.
attracting as many research submissions as possible) and ii)
invites individual researchers to contribute to the list of venues
available in each field.
      </p>
      <p><sup>1</sup> https://www.mturk.com</p>
      <p><sup>2</sup> http://www.crowdflower.com</p>
      <p>Considering the recurrent need for large numbers of users in
each UMAP evaluation, it is surprising that no equivalent
platform exists for evaluation within the
community. As of today, there is no central location in which to
advertise individual UMAP evaluation calls. Such calls are
mostly circulated through dedicated institution-wide or
field-specific research mailing lists<sup>3</sup>, to which one needs to subscribe.
As a result, the wider research community and the general public are
often unaware of these calls. An equivalent ECP wiki platform
would not only centralise and simplify the process of advertising
ongoing evaluations within each field of personalisation; it could
also contribute to the larger evaluation and analysis needs of the
community through the a posteriori publication of datasets,
evaluation metrics and results for each experiment.</p>
      <p>
        The second community-based practice of interest is
the CLEF [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and TREC [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] shared task initiatives. In
these tasks, separate systems are designed under a
common set of evaluation constraints (e.g. a common scenario,
dataset and metrics) and with a common set of users, so that the
proposed approaches can be compared. In addition to pooling
resources, which lets researchers
focus their efforts on developing their systems, this approach
embeds comparative evaluation as the core evaluation strategy.
Again, the UMAP community lacks such shared tasks, and
similar research prototype systems are therefore typically not
compared to each other through a rigorous process. An ECP
platform, as proposed above, could be augmented to form the
basis for similar tasks within the personalisation
community. Existing evaluation datasets and results published on
the platform could organically increase the number of independent
evaluations carried out on identical datasets, eventually
leading to dedicated shared evaluation tasks.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3.   PORTAL OVERVIEW</title>
      <p>Based on the discussion above, we propose a community-focused
portal that is inspired by the work done within CLEF and
builds on the simplicity of CFP. We propose the following key
features as a starting point for this community effort:</p>
      <list list-type="bullet">
        <list-item>
          <p>Ability to post calls for participation in evaluations. This
feature, which is similar to CFP, requires links to the
surveys and online systems where the evaluation can be
conducted.</p>
        </list-item>
        <list-item>
          <p>Ability to discuss approaches and findings in a forum
manner. This may include following evaluations and/or
discussions to receive notifications on status and outcome.</p>
        </list-item>
        <list-item>
          <p>Ability to upload and present data that can be shared and
used in other evaluations.</p>
        </list-item>
      </list>
      <p>ECP will require substantial community-driven effort to
ensure it remains useful and impactful. For this reason, the portal
has to be designed in an open and simple fashion, using
easy-to-implement and extensible platforms such as Content Management
Systems or wikis. Similar to other community efforts, the portal
does not require a central structure or organisation once the basic
portal is established. Its growth and success depend mostly on
researchers picking up tasks and extending the portal where needed.</p>
      <p><sup>3</sup> User modelling mailing list:
https://www.di.unito.it/listserver/subrequest/um; Adaptive
Hypermedia mailing list:
http://pegasus.tue.nl/mailman/listinfo/ah</p>
    </sec>
    <sec id="sec-5">
      <title>4.   CONCLUSION</title>
      <p>Based on the challenges of evaluation in User Modelling and
Personalisation, we proposed a community-driven portal called
ECP (Evaluation Community Portal). We discussed the overall
motivation for the portal and related projects that have been
successfully applied in other research communities, such as
Information Retrieval. We
furthermore gave a brief overview of the required high-level
features. We envisage that the main challenges related to ECP will
be bootstrapping the portal and gaining initial community
momentum. Like any community-led approach, it requires a
certain amount of traction to ensure it is widely used across
different research institutes. Furthermore, an initial task force
(community champions) to lead these efforts needs to be
identified; it should include more than one research institute
across more than one continent.</p>
    </sec>
    <sec id="sec-6">
      <title>5.   ACKNOWLEDGMENTS</title>
      <p>The ADAPT Centre for Digital Content Technology is funded
under the SFI Research Centres Programme (Grant 13/RC/2106)
and is co-funded under the European Regional Development Fund.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><surname>Paramythis</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Weibelzahl</surname>, <given-names>S.</given-names></string-name>, &amp;
          <string-name><surname>Masthoff</surname>, <given-names>J.</given-names></string-name>
          (<year>2010</year>).
          <article-title>Layered evaluation of interactive adaptive systems: framework and formative methods</article-title>.
          <source>User Modeling and User-Adapted Interaction</source>,
          <volume>20</volume>(<issue>5</issue>),
          <fpage>383</fpage>-<lpage>453</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><surname>Chen</surname>, <given-names>L.</given-names></string-name>, &amp;
          <string-name><surname>Pu</surname>, <given-names>P.</given-names></string-name>
          (<year>2012</year>).
          <article-title>Experiments on user experiences with recommender interfaces</article-title>.
          <source>Behaviour &amp; Information Technology</source>,
          <volume>33</volume>(January),
          <fpage>1</fpage>-<lpage>23</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><surname>Hernández del Olmo</surname>, <given-names>F.</given-names></string-name>, &amp;
          <string-name><surname>Gaudioso</surname>, <given-names>E.</given-names></string-name>
          (<year>2008</year>).
          <article-title>Evaluation of recommender systems: A new approach</article-title>.
          <source>Expert Systems with Applications</source>,
          <volume>35</volume>(<issue>3</issue>),
          <fpage>790</fpage>-<lpage>804</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4] Wiki Call For Papers: http://www.wikicfp.com/cfp/ Accessed on: 06/05/16.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] CLEF 2016: http://clef2016.clef-initiative.eu/ Accessed on: 06/05/16.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6] TREC 2016: http://trec.nist.gov/pubs/call2016.html Accessed on: 06/05/16.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><surname>Kittur</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Chi</surname>, <given-names>E. H.</given-names></string-name>, &amp;
          <string-name><surname>Suh</surname>, <given-names>B.</given-names></string-name>
          (<year>2008</year>).
          <article-title>Crowdsourcing user studies with Mechanical Turk</article-title>.
          In <source>Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2008)</source>, pp.
          <fpage>453</fpage>-<lpage>456</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name><surname>Komarov</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Reinecke</surname>, <given-names>K.</given-names></string-name>, &amp;
          <string-name><surname>Gajos</surname>, <given-names>K. Z.</given-names></string-name>
          (<year>2013</year>, April).
          <article-title>Crowdsourcing performance evaluations of user interfaces</article-title>.
          In <source>Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2013)</source>, pp.
          <fpage>207</fpage>-<lpage>21</lpage>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>