ECP: Evaluation Community Portal
A Portal for Evaluation and Collaboration in User Modelling and Personalisation Research

Kevin Koidl, Killian Levacher, Owen Conlan
ADAPT, KDEG, School of Computer Science & Statistics
Trinity College, Dublin, Ireland
{Kevin.Koidl, Killian.Levacher, Owen.Conlan}@scss.tcd.ie

Ben Steichen
Department of Computer Engineering
Santa Clara University
Santa Clara, CA, USA
bsteichen@scu.edu

ABSTRACT
Researchers conducting evaluations in the fields of User Modelling and Personalisation face the challenge of missing continuing evaluation feedback and collaboration with the overall research community. This results in limitations such as missing feedback on evaluation approaches, missing insight into other potentially usable evaluation results, and the lack of shared evaluation tasks. This paper introduces a community portal, ECP: Evaluation Community Portal, which is specifically focused on evaluations within the UMAP (User Modelling, Adaptation, and Personalisation) community.

CCS Concepts
•   General and reference~Cross-computing tools and techniques~Evaluation

Keywords
Evaluation; Shared Evaluation; User Modelling; Research Community Portals; Personalisation

1.   INTRODUCTION
Researchers conducting evaluations in the fields of User Modelling and Personalisation face the challenge of missing continuing evaluation feedback and collaboration within the overall research community. This results in limitations such as missing feedback on evaluation approaches, missing insight into other potentially usable evaluation results, and the lack of shared evaluation tasks with which to compare different user modelling and personalisation approaches.

Additionally, the ability to assess aggregated research results over time is hampered by the fact that evaluations are mostly carried out in isolation from each other and are usually not easily reproducible or directly comparable [2, 3], which limits the ability to produce rigorous comparative evaluations between the individual systems produced. For example, while there has been substantial work over the last two decades on novel adaptive and personalised e-learning systems, the various research prototypes have generally not been compared to each other through standardised evaluation campaigns.

This paper introduces a community portal, ECP: Evaluation Community Portal, which is specifically focused on evaluations within the UMAP community and aims to serve as a place for the creation and discussion of shared evaluation tasks, from design to results. Furthermore, the portal seeks to provide access to result data sets in order to expand on other research and discuss previously conducted work. The goal of this paper is to spark a discussion on how the proposed portal would assist the UMAP research community and what mechanisms would have to be put in place to create and promote such a portal.

2.   RELATED APPROACHES
Despite a well-established User Modelling, Adaptation and Personalisation (UMAP) community, many fundamental evaluation challenges still remain to be solved. Repeatedly obtaining a sufficiently large number of users to evaluate prototypes is a recurring theme, familiar across research institutions [1]. In order to overcome this issue, many researchers in the field of Human-Computer Interaction have started using crowdsourcing platforms such as Amazon Mechanical Turk (https://www.mturk.com) or Crowdflower (http://www.crowdflower.com) to perform usability studies. Indeed, the use of such platforms has been shown to be a good substitute for lab-based usability studies [7][8]. However, systems and experiments in the field of User Modelling and Personalisation typically require prolonged user exposure and interaction in order to i) build accurate user models and ii) truly gauge the effectiveness of personalisation techniques, which is often infeasible given the typically short interaction paradigm and setup of crowdsourcing platforms.
Other research areas, such as Information Retrieval (IR) through the TREC and CLEF initiatives, have managed to overcome these barriers by creating evaluation campaigns with shared evaluation tasks, as well as community portals containing shared datasets. Another example of a successful research community portal is the well-known CFP (Call for Papers) wiki, an established portal for finding information related to upcoming conferences. Both examples are a clear indicator that community portals within and across research communities can serve as a vehicle for overcoming limitations caused by a lack of central communication and outreach abilities.

Within the existing wider research community, two well-established community-based practices are worth pointing out. The first is the Call For Papers (CFP) wiki [4], whose main purpose is to allow researchers to advertise conference venues, paper submission deadlines, etc. This community-driven platform serves the purpose of both i) centralising the outreach needs of the community with respect to a shared goal (i.e. attracting as many research submissions as possible) and ii) inviting individual researchers to contribute to the list of venues available in each field.

Considering the recurrent need for large numbers of users in each UMAP evaluation, it is surprising that no equivalent platform exists for the purpose of evaluation within the community. As of today, there is no central location in which to advertise individual UMAP evaluation calls. Evaluation calls are mostly circulated through dedicated institution-wide or field-specific research mailing lists (e.g. the User Modelling mailing list, https://www.di.unito.it/listserver/subrequest/um, or the Adaptive Hypermedia mailing list, http://pegasus.tue.nl/mailman/listinfo/ah) to which one needs to subscribe. As a result, the wider research community and general public are often unaware of these calls. An equivalent ECP wiki platform would not only centralise and simplify the process of advertising ongoing evaluations within each field of personalisation, it could also contribute to the larger evaluation needs and analysis of the community through the a-posteriori publication of datasets, evaluation metrics and results for each experiment.

The second community-based practice of interest consists of the CLEF [5] and TREC [6] shared task initiatives. As part of these tasks, separate systems are designed within the context of a common set of evaluation constraints (e.g. a common scenario, dataset and metrics) and evaluated with the same users in order to compare each proposed approach. In addition to pooling resources, which lets researchers focus their efforts on developing their systems, this approach embeds comparative evaluation as the core evaluation strategy. Again, the UMAP community lacks such shared tasks, and therefore similar research prototype systems are typically not compared to each other through a rigorous process. An ECP platform, as proposed above, could be augmented to form the basis for creating similar tasks within the personalisation community. Existing evaluation datasets and results published on the platform could organically increase the number of independent evaluations carried out on identical datasets, eventually leading to dedicated shared evaluation tasks.
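To make the notion of a shared evaluation task more concrete, the following minimal sketch shows how such a task could be captured as a structured record on the portal. It is purely illustrative: the class, field names, metrics and URLs (SharedEvaluationTask, dataset_url, example.org, etc.) are assumptions made for this sketch and are not part of any existing ECP, CLEF or TREC specification.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class SharedEvaluationTask:
        """Illustrative record for a CLEF/TREC-style shared task hosted on the portal."""
        task_id: str            # short identifier for the task
        scenario: str           # common scenario all participating systems address
        dataset_url: str        # shared dataset every participant evaluates against
        metrics: List[str]      # agreed-upon metrics for comparative evaluation
        systems: List[str] = field(default_factory=list)  # registered research prototypes

        def register_system(self, name: str) -> None:
            """Register a research prototype to be evaluated on this task."""
            if name not in self.systems:
                self.systems.append(name)

    # Hypothetical task for comparing adaptive e-learning prototypes (placeholder names/URLs).
    task = SharedEvaluationTask(
        task_id="umap-elearning-task",
        scenario="Personalised course recommendation over one semester",
        dataset_url="https://example.org/datasets/elearning-logs",
        metrics=["precision@10", "learning gain", "user satisfaction"],
    )
    task.register_system("PrototypeA")
    task.register_system("PrototypeB")
    print(task.task_id, len(task.systems), task.metrics)

Making the constraints that all participants share (scenario, dataset, metrics) explicit in a single record is what enables the comparative evaluation strategy described above.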
3.   PORTAL OVERVIEW
Based on the discussion above, we propose a community-focused portal that is inspired by the work done within CLEF and builds on the simplicity of CFP. We propose the following key features as a starting point for this community effort, with a brief illustrative sketch after the list:

•   Ability to post calls for participation in evaluations. Similar to CFP, this feature requires linking to the surveys and online systems where the evaluation can be conducted.

•   Ability to discuss approaches and findings in a forum manner. This may include following evaluations and/or discussions in order to receive notifications on their status and outcome.

•   Ability to upload and present data that can be shared and used in other evaluations.
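As a minimal sketch of how these three features might be combined in a single portal entry, the following example models an evaluation call with a linked survey, forum-style discussion with followers, and a-posteriori dataset publication. All names, fields and URLs (EvaluationCall, survey_url, example.org, etc.) are illustrative assumptions rather than a description of an implemented system.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class EvaluationCall:
        """Illustrative portal entry for a call for participation in an evaluation."""
        title: str
        survey_url: str      # link to the survey or online system hosting the study
        status: str = "open"  # e.g. "open", "running", "closed"
        followers: List[str] = field(default_factory=list)        # researchers notified of updates
        discussion: List[str] = field(default_factory=list)       # forum-style comments
        shared_datasets: List[str] = field(default_factory=list)  # result data published a-posteriori

        def follow(self, researcher: str) -> None:
            """Subscribe a researcher to notifications on status and outcome."""
            if researcher not in self.followers:
                self.followers.append(researcher)

        def post_comment(self, researcher: str, text: str) -> None:
            """Add a forum-style comment on the evaluation approach or findings."""
            self.discussion.append(f"{researcher}: {text}")

        def publish_dataset(self, dataset_url: str) -> None:
            """Share a result dataset so it can be reused in other evaluations."""
            self.shared_datasets.append(dataset_url)
            self.status = "closed"

    # Example usage with placeholder names and URLs.
    call = EvaluationCall(
        title="User study on an adaptive news recommender",
        survey_url="https://example.org/surveys/adaptive-news",
    )
    call.follow("researcher@example.org")
    call.post_comment("researcher@example.org", "Will the pre-study questionnaire be shared?")
    call.publish_dataset("https://example.org/datasets/adaptive-news-results")

In practice such an entry could simply be a structured wiki page or CMS content type rather than code, which keeps the barrier to contribution low.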
ECP will require substantial community-driven effort to ensure it remains useful and impactful. For this reason, the portal has to be designed in an open and simple fashion, using easy-to-implement and extensible platforms such as Content Management Systems or wikis. Similar to other community efforts, the portal does not require a central structure or organisation once the basic portal is established. Its growth and success depend mostly on researchers picking up tasks and extending the portal where needed.

4.   CONCLUSION
Based on the challenges of evaluation in User Modelling and Personalisation, we propose a community-driven portal introduced as ECP (Evaluation Community Portal). We discussed the overall motivation for this topic and related initiatives successfully applied in other research communities, such as Information Retrieval. We furthermore introduced a brief overview of the required high-level features.

We envisage that the main challenges related to ECP will be in bootstrapping the portal and gaining initial community momentum. Like any community-led approach, it requires a certain amount of traction to ensure it is widely used across different research institutes. Furthermore, an initial task force (community champions) leading these efforts needs to be identified, which should include more than one research institute across more than one continent.

5.   ACKNOWLEDGMENTS
The ADAPT Centre for Digital Content Technology is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.

6.   REFERENCES
[1]   Paramythis, A., Weibelzahl, S., & Masthoff, J. (2010). Layered evaluation of interactive adaptive systems: framework and formative methods. User Modeling and User-Adapted Interaction, 20(5), 383–453.
[2]   Chen, L., & Pu, P. (2012). Experiments on user experiences with recommender interfaces. Behaviour & Information Technology, 33(January), 1–23.
[3]   Hernández del Olmo, F., & Gaudioso, E. (2008). Evaluation of recommender systems: A new approach. Expert Systems with Applications, 35(3), 790–804.
[4]   Wiki Call For Papers: http://www.wikicfp.com/cfp/ Accessed on: 06/05/16.
[5]   CLEF 2016: http://clef2016.clef-initiative.eu/ Accessed on: 06/05/16.
[6]   TREC 2016: http://trec.nist.gov/pubs/call2016.html Accessed on: 06/05/16.
[7]   Kittur, A., Chi, E. H., & Suh, B. (2008). Crowdsourcing user studies with Mechanical Turk. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2008), pp. 453–456.
[8]   Komarov, S., Reinecke, K., & Gajos, K. Z. (2013). Crowdsourcing performance evaluations of user interfaces. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2013), pp. 207–216.