ECP: Evaluation Community Portal
A Portal for Evaluation and Collaboration in User Modelling and Personalisation Research

Kevin Koidl, Killian Levacher, Owen Conlan
ADAPT, KDEG, School of Computer Science & Statistics
Trinity College, Dublin, Ireland
{Kevin.Koidl, Killian.Levacher, Owen.Conlan}@scss.tcd.ie

Ben Steichen
Department of Computer Engineering
Santa Clara University
Santa Clara, CA, USA
bsteichen@scu.edu

ABSTRACT
Researchers conducting evaluations in the fields of User Modelling and Personalisation face the challenge of missing continuing evaluation feedback and collaboration with the overall research community. This results in limitations such as missing feedback on evaluation approaches, missing insight into other potentially usable evaluation results, and the lack of shared evaluation tasks. This paper introduces a community portal, ECP: Evaluation Community Portal, which is specifically focused on evaluations within the UMAP (User Modelling, Adaptation, and Personalisation) community.

CCS Concepts
•   General and reference~Cross-computing tools and techniques~Evaluation

Keywords
Evaluation; Shared Evaluation; User Modelling; Research Community Portals; Personalisation

1.   INTRODUCTION
Researchers conducting evaluations in the fields of User Modelling and Personalisation face the challenge of missing continuing evaluation feedback and collaboration within the overall research community. This results in limitations such as missing feedback on evaluation approaches, missing insight into other potentially usable evaluation results, and the lack of shared evaluation tasks with which to compare different user modelling and personalisation approaches.

Additionally, the ability to assess aggregated research results over time is hampered by the fact that evaluations are mostly carried out in isolation from each other and are usually not easily reproducible or directly comparable [2, 3], which limits the ability to produce rigorous comparative evaluations between the individual systems produced. For example, while there has been substantial work over the last two decades on novel adaptive and personalised e-learning systems, the various research prototypes have generally not been compared to each other through standardised evaluation campaigns.

This paper introduces a community portal, ECP: Evaluation Community Portal, which is specifically focused on evaluations within the UMAP community and aims to serve as a place for the creation and discussion of shared evaluation tasks, from design to results. Furthermore, the portal seeks to provide access to result data sets in order to expand on other research and discuss previously conducted work. The goal of this paper is to spark a discussion on how the proposed portal would assist the UMAP research community and what mechanisms would have to be put in place to create and promote such a portal.

2.   RELATED APPROACHES
Despite a well-established User Modelling, Adaptation and Personalisation (UMAP) community, many fundamental evaluation challenges still remain to be solved. Repeatedly obtaining a sufficiently large number of users to evaluate prototypes is a recurring theme, familiar across research institutions [1]. In order to overcome this issue, many researchers in the field of Human-Computer Interaction have started using crowdsourcing platforms such as Amazon Mechanical Turk (https://www.mturk.com) or Crowdflower (http://www.crowdflower.com) to perform usability studies. Indeed, the use of such platforms has been shown to be a good substitute for lab-based usability studies [7][8]. However, systems and experiments in the field of User Modelling and Personalisation typically require prolonged user exposure and interaction in order to i) build accurate user models and ii) truly gauge the effectiveness of personalisation techniques, which is often infeasible given the typically short interaction paradigm and setup of crowdsourcing platforms.
Other research areas, such as Information Retrieval (IR) through the TREC and CLEF initiatives, have managed to overcome these barriers by creating evaluation campaigns with shared evaluation tasks, as well as community portals containing shared datasets. Another example of a successful research community portal is the well-known CFP (Call for Papers) wiki, an established portal for finding information related to upcoming conferences. Both examples are a clear indicator that community portals within and across research communities can serve as a vehicle for overcoming limitations caused by a lack of central communication and outreach abilities.

Within the existing wider research community, two well-established community-based practices are worth pointing out. The first is the Call For Papers (CFP) wiki [4], whose main purpose is to allow researchers to advertise conference venues, paper submission deadlines, etc. This community-driven platform serves the purpose of both i) centralising the outreach needs of the community with respect to a shared goal (i.e. attracting as many research submissions as possible) and ii) inviting individual researchers to contribute to the list of venues available in each field.

Considering the recurrent need for large numbers of users in each UMAP evaluation, it is surprising that no equivalent platform exists for the purpose of evaluation within the community. As of today, there is no central location in which to advertise individual UMAP evaluation calls. Evaluation calls are mostly circulated through dedicated institution-wide or field-specific research mailing lists (e.g. the User Modelling mailing list, https://www.di.unito.it/listserver/subrequest/um, or the Adaptive Hypermedia mailing list, http://pegasus.tue.nl/mailman/listinfo/ah) to which one needs to subscribe. As a result, the wider research community and general public are often unaware of these calls. An equivalent ECP wiki platform would not only centralise and simplify the process of advertising ongoing evaluations within each field of personalisation, it could also contribute to the larger evaluation needs and analysis of the community through the a-posteriori publication of datasets, evaluation metrics and results for each experiment.

The second community-based practice of interest consists of the CLEF [5] and TREC [6] shared task initiatives. As part of these tasks, separate systems are designed within the context of a common set of evaluation constraints (e.g. a common scenario, dataset and metrics) and evaluated with the same users in order to compare each proposed approach. In addition to pooling resources, which lets researchers focus their efforts on developing their systems, this approach embeds comparative evaluation as the core evaluation strategy. Again, the UMAP community lacks such shared tasks, and therefore similar research prototype systems are typically not compared to each other through a rigorous process. An ECP platform, as proposed above, could be augmented to form the basis for creating similar tasks within the personalisation community. Existing evaluation datasets and results published on the platform could organically increase the number of independent evaluations carried out on identical datasets, eventually leading to dedicated shared evaluation tasks.
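To make the notion of a shared evaluation task more concrete, the following minimal sketch shows how such a task could be captured as a structured record on the portal. It is purely illustrative: the class, field names, metrics and URLs (SharedEvaluationTask, dataset_url, example.org, etc.) are assumptions made for this sketch and are not part of any existing ECP, CLEF or TREC specification.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class SharedEvaluationTask:
        """Illustrative record for a CLEF/TREC-style shared task hosted on the portal."""
        task_id: str            # short identifier for the task
        scenario: str           # common scenario all participating systems address
        dataset_url: str        # shared dataset every participant evaluates against
        metrics: List[str]      # agreed-upon metrics for comparative evaluation
        systems: List[str] = field(default_factory=list)  # registered research prototypes

        def register_system(self, name: str) -> None:
            """Register a research prototype to be evaluated on this task."""
            if name not in self.systems:
                self.systems.append(name)

    # Hypothetical task for comparing adaptive e-learning prototypes (placeholder names/URLs).
    task = SharedEvaluationTask(
        task_id="umap-elearning-task",
        scenario="Personalised course recommendation over one semester",
        dataset_url="https://example.org/datasets/elearning-logs",
        metrics=["precision@10", "learning gain", "user satisfaction"],
    )
    task.register_system("PrototypeA")
    task.register_system("PrototypeB")
    print(task.task_id, len(task.systems), task.metrics)

Making the constraints that all participants share (scenario, dataset, metrics) explicit in a single record is what enables the comparative evaluation strategy described above.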
3.   PORTAL OVERVIEW
Based on the discussion above, we propose a community-focused portal that is inspired by the work done within CLEF and builds on the simplicity of CFP. We propose the following key features as a starting point for this community effort, with a brief illustrative sketch after the list:

•   Ability to post calls for participation in evaluations. Similar to CFP, this feature requires linking to the surveys and online systems where the evaluation can be conducted.

•   Ability to discuss approaches and findings in a forum manner. This may include following evaluations and/or discussions in order to receive notifications on their status and outcome.

•   Ability to upload and present data that can be shared and used in other evaluations.
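As a minimal sketch of how these three features might be combined in a single portal entry, the following example models an evaluation call with a linked survey, forum-style discussion with followers, and a-posteriori dataset publication. All names, fields and URLs (EvaluationCall, survey_url, example.org, etc.) are illustrative assumptions rather than a description of an implemented system.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class EvaluationCall:
        """Illustrative portal entry for a call for participation in an evaluation."""
        title: str
        survey_url: str      # link to the survey or online system hosting the study
        status: str = "open"  # e.g. "open", "running", "closed"
        followers: List[str] = field(default_factory=list)        # researchers notified of updates
        discussion: List[str] = field(default_factory=list)       # forum-style comments
        shared_datasets: List[str] = field(default_factory=list)  # result data published a-posteriori

        def follow(self, researcher: str) -> None:
            """Subscribe a researcher to notifications on status and outcome."""
            if researcher not in self.followers:
                self.followers.append(researcher)

        def post_comment(self, researcher: str, text: str) -> None:
            """Add a forum-style comment on the evaluation approach or findings."""
            self.discussion.append(f"{researcher}: {text}")

        def publish_dataset(self, dataset_url: str) -> None:
            """Share a result dataset so it can be reused in other evaluations."""
            self.shared_datasets.append(dataset_url)
            self.status = "closed"

    # Example usage with placeholder names and URLs.
    call = EvaluationCall(
        title="User study on an adaptive news recommender",
        survey_url="https://example.org/surveys/adaptive-news",
    )
    call.follow("researcher@example.org")
    call.post_comment("researcher@example.org", "Will the pre-study questionnaire be shared?")
    call.publish_dataset("https://example.org/datasets/adaptive-news-results")

In practice such an entry could simply be a structured wiki page or CMS content type rather than code, which keeps the barrier to contribution low.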
ECP will require substantial community-driven effort to ensure it remains useful and impactful. For this reason, the portal has to be designed in an open and simple fashion, using easy-to-implement and extensible platforms such as Content Management Systems or wikis. Similar to other community efforts, the portal does not require a central structure or organisation once the basic portal is established. Its growth and success depend mostly on researchers picking up tasks and extending the portal where needed.

4.   CONCLUSION
Based on the challenges of evaluation in User Modelling and Personalisation, we propose a community-driven portal introduced as ECP (Evaluation Community Portal). We discussed the overall motivation for this topic and related initiatives successfully applied in other research communities, such as Information Retrieval. We furthermore introduced a brief overview of the required high-level features.

We envisage that the main challenges related to ECP will be in bootstrapping the portal and gaining initial community momentum. Like any community-led approach, it requires a certain amount of traction to ensure it is widely used across different research institutes. Furthermore, an initial task force (community champions) leading these efforts needs to be identified, which should include more than one research institute across more than one continent.

5.   ACKNOWLEDGMENTS
The ADAPT Centre for Digital Content Technology is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.

6.   REFERENCES
[1]   Paramythis, A., Weibelzahl, S., & Masthoff, J. (2010). Layered evaluation of interactive adaptive systems: framework and formative methods. User Modeling and User-Adapted Interaction, 20(5), 383–453.
[2]   Chen, L., & Pu, P. (2012). Experiments on user experiences with recommender interfaces. Behaviour & Information Technology, 33(January), 1–23.
[3]   Hernández del Olmo, F., & Gaudioso, E. (2008). Evaluation of recommender systems: A new approach. Expert Systems with Applications, 35(3), 790–804.
[4]   Wiki Call For Papers: http://www.wikicfp.com/cfp/ Accessed on: 06/05/16.
[5]   CLEF 2016: http://clef2016.clef-initiative.eu/ Accessed on: 06/05/16.
[6]   TREC 2016: http://trec.nist.gov/pubs/call2016.html Accessed on: 06/05/16.
[7]   Kittur, A., Chi, E. H., & Suh, B. (2008). Crowdsourcing user studies with Mechanical Turk. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2008), pp. 453–456.
[8]   Komarov, S., Reinecke, K., & Gajos, K. Z. (2013). Crowdsourcing performance evaluations of user interfaces. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2013), pp. 207–216.