Collaborative environment of the PROMISE infrastructure: an "ELEGantt" approach

Marco Angelini (Sapienza University of Rome, Italy) angelini@dis.uniroma1.it
Claudio Bartolini (HP Labs, USA) claudio.bartolini@hp.com
Gregorio Convertino (Xerox Research Centre Europe) convertino@xrce.xerox.com
Guido Granato (Sapienza University of Rome, Italy) granato@dis.uniroma1.it
Preben Hansen (SICS, Sweden) preben@sics.se
Giuseppe Santucci (Sapienza University of Rome, Italy) santucci@dis.uniroma1.it

Presented at EuroHCIR2012. Copyright (c) 2012 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

ABSTRACT
This paper focuses on developing lightweight tools for knowledge sharing and collaboration by communities of practice operating in the field of information retrieval. The paper contributes a motivating scenario, a characterization of these communities, a list of requirements for collaboration, and a system design proposed as a proof-of-concept implementation that is being evaluated.

1. INTRODUCTION
This paper focuses on the problem of supporting knowledge sharing and collaboration in communities of practice that operate in the field of information retrieval (IR). These communities include developers, researchers, and stakeholders who periodically collect and use scientific data produced by the experimental evaluation of IR systems. Specifically, the communities considered include those involved in three IR domains: Patent, Cultural Heritage, and Radiology.
The research context of the work reported in this paper is the PROMISE NoE. This project aims at advancing the current tools for IR communities to perform experimental evaluation of complex multimedia and multilingual information systems. The ultimate goal of the project is to develop a unified infrastructure for the community to efficiently collect and reuse data, knowledge, tools, methodologies, and communities of end users. In this context, providing adequate support for collaboration is crucial. Hence the specific goal of the work reported in this paper: designing and evaluating lightweight support for knowledge sharing and collaboration.
Currently, the following problems result from the lack of suitable collaboration tools:
1) Greater effort is required by individual members, who contribute as volunteers, for sharing knowledge and collaborating. In the long term, this discourages broader participation.
2) Poor reuse of content and process information across the multiple instantiations of similar experimental evaluation processes. Over time, this leads to inefficient processes: e.g., content is always recreated from scratch, successful processes (best practices) cannot be reused, and novices cannot be easily trained based on shared experience.
3) The overall community cannot easily reflect on (and thus re-engineer) its own workflow around specific TRECs.

2. MOTIVATING SCENARIO
The starting point of our analysis is a typical IR evaluation campaign (lab). In a typical scenario, Adam (a lab organizer) is preparing an IR experiment and evaluation task and spends time and resources coordinating, communicating, and assembling people and resources in order to proceed with the overall evaluation task, e.g., recruiting the people who will be responsible for the different evaluation tasks. Communication and sharing of information may differ within and across sub-tasks. Furthermore, they may differ between labs without any awareness among actors of the similarities and differences in the evaluation task processes. Thus, it is important to identify the stages in the evaluation task process as well as how collaborative and information sharing activities are manifested.

3. CHARACTERIZING IR COMMUNITIES
The CLEF experimental platform involves a series of CLEF Labs and one or more tracks within each Lab. Each Lab, as well as each track, involves a certain set of tasks that can be considered as a task process or workflow. In order to define and describe these tasks, we have investigated the lab and track organizers of a CLEF experiment, how they performed their work, and what steps they went through during their work. Furthermore, we have extracted requirements specifically for collaborative information handling and information sharing activities [3, 5].
An evaluation campaign is an activity intended to support IR researchers by providing a large test collection and uniform scoring procedures. An evaluation campaign is organized within an evaluation framework like TREC or CLEF and can involve different domains (cultural heritage, patent, radiology, and so on). Within an evaluation campaign there are many tracks, such as multimedia, multilingual, text, music, images, etc. A track can be organized differently according to a specific domain and includes, in turn, several tasks. A task is used to define the structure of the experiment, specifying a set of documents, a set of topics, and a relevance assessment. For each task, the set of documents can be structured by defining, for example, a title, keywords, images, and so on. A topic represents an information need. Documents can be assessed as being relevant or not (or more or less relevant) for a given information need (topic).
Some of the most common tasks that we observed as part of a typical evaluation campaign include the following: submission, preparation, track definition, topic creation, data set, relevance assessment, summarizing, and finalizing.

3.1 Observed roles in IR Communities
Within an evaluation campaign many people are involved in different tasks, such as organizing, creating topics, managing collections, handling participants and submissions, choosing measures, and running the final evaluations.
The set of actors involved in PROMISE activities is not homogeneous and depends on the domain taken into account. Looking at three domains (patent, medical, and cultural heritage), we defined the following actors:
- organizers: people in charge of preparing a campaign; it is possible to distinguish domain organizers and track organizers;
- participants: people who run their algorithm(s) according to the actual tasks;
- relevance assessors: people who make the relevance assessment;
- topic creators: people who define topics for a given task;
- site administrators (e.g., system administrators);
- other researchers;
- annotators: people who annotate resources to highlight some hidden information.
Each of these actors is described along with a set of activities or tasks. Moreover, a user can have more than one role.

3.2 Observed tools of IR Communities
The collaborative work may be performed in a non-structured manner, using basic tools for collaboration in an ad hoc fashion. The following are the most commonly used tools for collaboration:
- E-mail: the most common way to organize the work and spread information;
- Face-to-face meetings: useful for discussing problems more effectively and solving them; they are complicated if people do not work in the same building;
- Video conference tools (e.g., Skype): used instead of face-to-face communication;
- Shared workspaces (e.g., shared document editor, desktop sharing): useful for sharing documents; however, an issue may arise when many people work on the same document.

4. COLLABORATION REQUIREMENTS
As mentioned in the introduction, a more suitable collaborative tool is needed to help researchers accomplish their tasks. To realize it, we have to overcome some limitations. The first one is the impossibility of defining a common detailed workflow, due to the presence of different domains, each of them with specific needs. This makes it difficult to realize a completely specified collaborative environment. Despite this limit, it is possible to identify some common needs, such as: communication with other actors, access to the data of previous campaigns, sharing of the task flow of the current evaluation campaign, and sharing a workspace with the actors involved in the same tasks.
Another aspect that characterizes the work of people involved in a lab is the alternation between individual and collaborative work, which is at odds with a too rigid environment. The basic idea of our system is to improve the actual tools used in the IR community without defining the collaborative environment in a too rigid way. Our purposes in this paper are: 1) collecting the tools actually used (e.g., Skype, Google Docs, etc.) in a structured environment; 2) making available to users other collaborative tools (polls, news); 3) T@GZ: a social software system for organizational information sharing.

4.1 Requirements by role
In order to get information on the actual tasks users performed, we carried out a requirements elicitation through a number of questionnaires to track organizers within the three predefined domains of Patent, Cultural Heritage, and Radiology in the CLEF platform. Working through these questions we identified: a) the different roles of actors involved in the CLEF experiments, b) requirements for collaborative information handling and information sharing activities, and c) links between roles and collaborative events.
Furthermore, we describe each task stage in terms of the subtasks involved (Figure 1):
- Submission task: preparing a lab proposition including sub-tracks; acceptance of the lab or not.
- Preparation task: preparation of a CLEF lab flyer including details of each lab; obtaining databases; preparing a copyright agreement; preparation of the web page.
- Track definition task: definition of broad tasks; start of registration for the participants.
- Topic creation task: preparation of detailed topics for each task; release of topics; checks of the copyright forms.
- Data set: data access for all registered participants with signed copyright forms for their tracks.
- Relevance judgment task: preparation of the judgement system; finding qualified judges; submission of all runs by participants; pooling of the runs to create documents to judge; judgement of the documents for relevance; evaluation of all submitted runs.
- Summarizing task: release of ground truth; submission of participants' papers with results and technical descriptions; analysis of the results and submission of the overview paper.
- Finalizing task: CLEF workshop and labs with discussion; feedback on CLEF; preparation of the next year of CLEF; distribution of responsibilities.

Figure 1: Task stages of organizing an experimental IR track in CLEF.

4.2 Implicit requirements
Support the community in managing the process:
- Tasks and roles. Various roles are involved in the communities' work on experimental evaluation of IR systems. Multiple tracks and tasks are part of a process occurring in an evaluation experiment such as CLEF. Within a track, some of the tasks are interdependent. Specific roles perform specific tasks. One member can play multiple roles. The set of roles and tasks changes across different IR domains.
- Assume both individual and collaborative work. Many tasks are conducted individually, and the individual work is interleaved with collaborative work. Support both individual and collaborative work, and faster transitions between them.
- Only some of the steps in the workflow are fully specified before or at the outset of the process; others are defined during the process.
Needs related to existing work tools:
- As for other communities of knowledge workers [1], the members of these communities use email as their primary communication tool. It would be helpful for any new tool for knowledge sharing and collaboration to build on the central role of email.
- Clear added value. The user should be required to log onto a new system only if such a system supports additional functions that are useful and are not already supported in email or other general-purpose media already in use (e.g., VoIP).
- Difficult tracking and reuse. Perhaps a useful role can be played in facilitating a smoother integration across the multiple existing tools.
Needs related to new collaborative functions that might support collaboration:
- Groups. Group creation is tied to the task.
- Polls. The polls component should be integrated with the other components (this is an example of a function not available in email).
- Collaborative workspace. It is desirable for the members of the community to have easy access to the shared resources and typical steps, which, ideally, should all be made available in one place.
- Process visualization. We observed visible differences in the different instantiations of the workflows elicited from different sub-communities. Process visualization allows people to become aware of the differences and similarities in the way sub-communities go about performing the same process.

5. CONTEXT AND RELATED WORK
Recent research has pointed to the importance of investigating and supporting collaboration in the field of Information Retrieval (IR). For example, [4] reviews the studies of collaboration relative to this field and concludes that the IR field needs to better understand and improve the systems that support both direct and indirect collaboration during information tasks. This is supported by studies in various IR settings. [6] investigated patent engineers, a specific community conducting IR processes, and found that they were involved in various collaborative activities. Overall, existing studies have observed that collaboration is indeed endemic to the broader activities of individuals who perform information seeking, searching, and retrieval tasks. However, with our work we aim to address two specific limitations of the existing literature. First, the prior studies of collaborative practices in IR have focused on describing the practices of teams or groups of users in different settings and user communities (e.g., academia, industry, medicine, patent offices; see [4]) or have developed new tools to support these practices [8], but have not yet systematically investigated how to support collaboration at the level of a large community of practice. Second, while there has been research on how to support communities of practice of professionals (e.g., [10]) or scientists (e.g., collaboratories; see [6]), we focus specifically on how to support lightweight knowledge sharing and collaboration in the community of IR researchers and professionals who develop and evaluate IR tools. This is a community with unique needs for collaboration and types of workflows: recurrently, specific sub-communities of volunteers need to agree on, build, and refine evaluation campaigns for testing IR systems.
A key distinctive property of the IR communities is that their workflows cannot be fully specified a priori. That is, if we consider a continuum from highly specified to highly unspecified processes, then we could classify the instances of workflows of the IR communities as intermediate cases along this continuum. [2], who named this continuum the Specificity Frontier, observed a gap between two existing approaches for supporting collaboration: most collaborative systems have focused on either automating fixed work processes (e.g., Enterprise Resource Planning tools) or simply supporting communication in ad hoc processes (e.g., email). To adequately support the collaboration in IR communities, we need to bridge the gap between these two approaches using lightweight tools that are compatible with semi-structured workflows. While the specific instances of these workflows share several of the tasks and roles, they will inevitably vary across IR domains (and data types), evaluation campaigns, and over time (because the process is refined by the IR community in a collaborative manner as it is repeated over the years). Interestingly, recent research on communities of professionals pointed to the same need for collaborative tools that are able to support flexible realizations of the processes rather than forcing the community into hard-coded processes [9]. Articulating further the design requirements for supporting semi-structured workflows, [2] divides the specificity frontier into four sub-spectra: providing context for enactment, monitoring constraints about the task, providing/planning options to reach a goal, and guiding through a given script. Building on this classification, in this paper we focus on providing support to the IR communities in the first two sub-spectra.

Figure 2: Specificity Frontier.

6. DESIGN AND ARCHITECTURE
To fulfill the requirements described in Section 4, we devised the architecture shown in Figure 3. It refers to the classical organization of CLEF experiments, which is arranged in terms of different domains (e.g., Medical, Patents, etc.). For each domain, one or more tracks are available. As described in Section 3, the organization of a track is a complex collaborative process and encompasses several tasks that exhibit some precedence relationships. The task flow of a track is formalized using an Extended Light livE Gantt (EleGantt) and is shown to the user within the CUI, acting as the main entry point for collaborative activities (Figure 4). A suitable administrative interface allows for adapting the task flow of a track to procedural changes. According to [2] (Figure 2), EleGantt is in charge of providing one part of the process context, i.e., a structured to-do list. The second part of the context, i.e., a shared common space, is provided by T@GZ [7].
EleGantt is an extension of the traditional Gantt chart. It allows for:
- attaching a rich set of meta-data to tasks with the goal of supporting collaborative activities: involved people, involved roles, associated tags, the kind of collaboration activities needed to accomplish the task, and the list of other processes that share the same activity;
- expressing the non-overlapping constraint between tasks that must be executed in sequence;
- specifying temporal uncertainties (e.g., minimum and maximum duration of an activity) and degrees of freedom for milestones and deliverable releases.
Moreover, the EleGantt visualization is both a visualization of the task flow and an interactive interface that allows for exploring and accessing the information associated with the task flow, like roles, people, similarities with other task flows, etc.

Figure 3: The architecture of the Promise collaborative system.

T@GZ is a social software system for organizational information sharing. In T@GZ the user shares content by simply sending an email message with the content to be shared and addressing the message to one or more topic-specific keywords. For example, one might use the address bizdev@share.X.com to refer to information related to the "business development" topic (see Figure 4, top left). Thus, the content of that email is 'tagged' by the keyword 'bizdev'. Any mail may have multiple tags attached in this manner, in the 'To' or 'CC' fields, using any client. While enabling easy publishing and re-finding of this information, the system does not induce people to send additional emails other than those they are already sharing. Focusing now on the implementation of T@GZ in the whole system: using a set of predefined tags (i.e., the tags associated with the EleGantt's tasks), T@GZ provides a means for indexing the emails that are exchanged among the organizers of the tracks, including links to smart attachments. The workflow engine is aware of the EleGantts and, using time information and inspecting the KB, sends different kinds of notifications through email (e.g., a deadline is approaching, it is time to move to the next step, etc.) to the people involved in the organization of the tracks.
The Collaborative User Interface (CUI) is the Web-based access point to all the collaborative activities (see Figure 4). It is basically split into two subcomponents. The first one allows for managing personal user collaborative information (left part of the picture), e.g., messages, polls, etc. The second one refers to the whole process and allows for exploring it using EleGantt, discovering people, roles, and tags, and browsing the whole set of tagged emails. Moreover, the CUI contains a set of tabs that allows for accessing the collaborative tools that have been specified in the EleGantt. In order to provide the user with a unique access point to the process resources, the CUI sends tagged emails to T@GZ containing a link to the collaborative resources (e.g., a link to a Dropbox folder or to a Google Docs document).

Figure 4: The Promise Collaborative User Interface (CUI).

7. CONCLUSION AND FUTURE WORK
As a result of our investigation, we identified some general challenges, or open issues, for this domain: 1) find a good balance between the need for flexibility, to fit various partially-defined processes, and the need for enough specification to allow automation; 2) identify the set of predefined tags (some common and some domain specific); 3) semi-automate the tagging process (e.g., with an intelligent assistant).

Acknowledgements
The work in this paper has been supported by the PROMISE NoE project (contract n. 258191) as a part of the 7th Framework Programme of the European Commission (FP7/2007-2013).

8. REFERENCES
[1] V. Bellotti, N. Ducheneaut, M. Howard, I. Smith, and R. Grinter. Quality versus quantity: e-mail-centric task management and its relation with overload. Human-Computer Interaction, 20(1), June 2005.
[2] A. Bernstein. How can cooperative work tools support dynamic group process? Bridging the specificity frontier. In ACM Conference on Computer Supported Cooperative Work (CSCW '00), 2000.
[3] M. Croce, E. Di Reto, G. Granato, P. Hansen, A. Sabetta, G. Santucci, and F. Veltri. Collaborative User Interface Requirements. PROMISE deliverable D5.1, 2011.
[4] J. Foster. Collaborative information seeking and retrieval. In Annual Review of Information Science and Technology, Volume 40, pages 329-356, 2006.
[5] P. Hansen, G. L. Granato, and G. Santucci. Collecting and Assessing Collaborative Requirements. In Collaborative Information Seeking (CIS 2011) Workshop, 2011.
[6] P. Hansen and K. Järvelin. Collaborative information retrieval in an information-intensive domain. Information Processing and Management (IPM), 2005.
[7] P. M. Joshi, C. Bartolini, and S. Graupner. T@gz: intuitive and effortless categorization and sharing of email conversations. In A. Mille and F. L. Gandon et al., editors, WWW (Companion Volume), page 365. ACM, 2012.
[8] M. Morris and E. Horvitz. SearchTogether: an interface for collaborative web search. In 20th Annual ACM Symposium on User Interface Software and Technology (UIST '07), 2007.
[9] H. R. Motahari-Nezhad, C. Bartolini, S. Graupner, S. Singhal, and S. Spence. IT Support Conversation Manager: A Conversation-Centered Approach and Tool for Managing Best Practice IT Processes. Hewlett Packard Laboratories, Palo Alto, USA, 2010.
[10] E. Wenger, R. McDermott, and W. Snyder. Cultivating Communities of Practice. Harvard Business Press, 2002.