Collaborative environment of the PROMISE infrastructure: an "ELEGantt" approach

Marco Angelini (Sapienza University of Rome, Italy) angelini@dis.uniroma1.it
Claudio Bartolini (HP Labs, USA) claudio.bartolini@hp.com
Gregorio Convertino (Xerox Research Centre Europe) convertino@xrce.xerox.com
Guido Granato (Sapienza University of Rome, Italy) granato@dis.uniroma1.it
Preben Hansen (SICS, Sweden) preben@sics.se
Giuseppe Santucci (Sapienza University of Rome, Italy) santucci@dis.uniroma1.it

Presented at EuroHCIR2012. Copyright (c) 2012 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

ABSTRACT
This paper focuses on developing lightweight tools for knowledge sharing and collaboration by communities of practice operating in the field of information retrieval. The paper contributes a motivating scenario, a characterization of these communities, a list of requirements for collaboration, and a system design proposed as a proof-of-concept implementation that is being evaluated.

1. INTRODUCTION
This paper focuses on the problem of supporting knowledge sharing and collaboration in communities of practice that operate in the field of information retrieval (IR). These communities include developers, researchers, and stakeholders who periodically collect and use scientific data produced by the experimental evaluation of IR systems. Specifically, the communities considered include those involved in three IR domains: Patent, Cultural Heritage, and Radiology.
The research context of the work reported in this paper is the PROMISE NoE. This project aims at advancing the current tools for IR communities to perform experimental evaluation of complex multimedia and multilingual information systems. The ultimate goal of the project is to develop a unified infrastructure for the community to efficiently collect and reuse data, knowledge, tools, methodologies, and communities of end users. In this context, providing adequate support for collaboration is crucial. Hence the specific goal of the work reported in this paper: designing and evaluating lightweight support for knowledge sharing and collaboration.
Currently, the following problems result from the lack of suitable collaboration tools:
1) Greater effort is required by individual members, who contribute as volunteers, for sharing knowledge and collaborating. In the long term, this discourages broader participation.
2) Poor reuse of content and process information across the multiple instantiations of similar experimental evaluation processes. Over time, this leads to inefficient processes: e.g., content is always recreated from scratch, successful processes (best practices) cannot be reused, and novices cannot be easily trained based on shared experience.
3) The overall community cannot easily reflect on (and thus re-engineer) its own workflow around specific TRECs.

2. MOTIVATING SCENARIO
The starting point of our analysis is a typical IR evaluation campaign (lab). In a typical scenario, Adam (a lab organizer) is preparing an IR experiment and evaluation task and spends time and resources coordinating, communicating, and assembling people and resources in order to proceed with the overall evaluation task, e.g., recruiting the people who will be responsible for the different evaluation tasks. Communication and sharing of information may differ within and across sub-tasks. Furthermore, they may differ between labs without any awareness among actors of the similarities and differences in the evaluation task processes. Thus, it is important to identify the stages in the evaluation task process as well as how collaborative and information sharing activities are manifested.

3. CHARACTERIZING IR COMMUNITIES
The CLEF experimental platform involves a series of CLEF Labs and one or more tracks within each Lab. Each Lab, as well as each track, involves a certain set of tasks that can be considered as a task process or workflow. In order to define and describe these tasks, we have investigated the lab and track organizers of a CLEF experiment, how they performed their work, and what steps they went through during their work. Furthermore, we have extracted requirements specifically for collaborative information handling and information sharing activities [3, 5].
An evaluation campaign is an activity intended to support IR researchers by providing a large test collection and uniform scoring procedures. An evaluation campaign is organized within an evaluation framework like TREC or CLEF and can involve different domains (cultural heritage, patent, radiology, and so on). Within an evaluation campaign there are many tracks, such as multimedia, multilingual, text, music, images, etc. A track can be organized differently according to a specific domain and includes, in turn, several tasks. A task is used to define the structure of the experiment, specifying a set of documents, a set of topics, and a relevance assessment. For each task, the set of documents can be structured by defining, for example, a title, keywords, images, and so on. A topic represents an information need. Documents can be assessed as being relevant or not (or more or less relevant) for a given information need (topic).
Some of the most common tasks that we observed as part of a typical evaluation campaign include the following: submission, preparation, track definition, topic creation, data set, relevance assessment, summarizing, and finalizing.

3.1 Observed roles in IR Communities
Within an evaluation campaign many people are involved in different tasks, such as organizing, creating topics, managing collections, handling participants and submissions, choosing measures, and running the final evaluations.
The set of actors involved in PROMISE activities is not homogeneous and depends on the domain taken into account. Looking at three domains (patent, medical, and cultural heritage), we defined the following actors:
- organizers: people in charge of preparing a campaign; it is possible to distinguish domain organizers and track organizers;
- participants: people who run their algorithm(s) according to the actual tasks;
- relevance assessors: people who make the relevance assessment;
- topic creators: people who define topics for a given task;
- site administrators (e.g., system administrators);
- other researchers;
- annotators: people who annotate resources to highlight some hidden information.
Each of these actors is described along with a set of activities or tasks. Moreover, a user can have more than one role.

3.2 Observed tools of IR Communities
The collaborative work may be performed in a non-structured manner, using basic tools for collaboration in an ad hoc fashion. The following are the most commonly used tools for collaboration:
- E-mail: the most common way to organize the work and spread information;
- Face-to-face meetings: useful for discussing problems more effectively and solving them; they are complicated if people do not work in the same building;
- Video conference tools (e.g., Skype): used instead of face-to-face communication;
- Shared workspaces (e.g., shared document editor, desktop sharing): useful for sharing documents; however, an issue may arise when many people work on the same document.

4. COLLABORATION REQUIREMENTS
As mentioned in the introduction, a more suitable collaborative tool is needed to help researchers accomplish their tasks. To realize it, we have to overcome some limitations. The first one is the impossibility of defining a common detailed workflow, due to the presence of different domains, each of them with specific needs. This makes it difficult to realize a completely specified collaborative environment. Despite this limit, it is possible to identify some common needs, such as: communication with other actors, access to the data of previous campaigns, sharing of the task flow of the current evaluation campaign, and sharing a workspace with the actors involved in the same tasks.
Another aspect that characterizes the work of people involved in a lab is the alternation between individual and collaborative work, which is at odds with a too rigid environment. The basic idea of our system is to improve the actual tools used in the IR community without defining the collaborative environment in a too rigid way. Our purposes in this paper are: 1) collecting the tools actually used (e.g., Skype, Google Docs, etc.) in a structured environment; 2) making available to users other collaborative tools (polls, news); 3) T@GZ: a social software system for organizational information sharing.

4.1 Requirements by role
In order to get information on the actual tasks users performed, we carried out a requirements elicitation through a number of questionnaires to track organizers within the three predefined domains of Patent, Cultural Heritage, and Radiology in the CLEF platform. Working through these questions we identified: a) the different roles of actors involved in the CLEF experiments, b) requirements for collaborative information handling and information sharing activities, and c) links between roles and collaborative events.
Furthermore, we describe each task stage in terms of the subtasks involved (Figure 1):
- Submission task: preparing a lab proposition including sub-tracks; acceptance of the lab or not.
- Preparation task: preparation of a CLEF lab flyer including details of each lab; obtaining databases; preparing a copyright agreement; preparation of the web page.
- Track definition task: definition of broad tasks; start of registration for the participants.
- Topic creation task: preparation of detailed topics for each task; release of topics; checks of the copyright forms.
- Data set: data access for all registered participants with signed copyright forms for their tracks.
- Relevance judgment task: preparation of the judgement system; finding qualified judges; submission of all runs by participants; pooling of the runs to create documents to judge; judgement of the documents for relevance; evaluation of all submitted runs.
- Summarizing task: release of ground truth; submission of participants' papers with results and technical descriptions; analysis of the results and submission of the overview paper.
- Finalizing task: CLEF workshop and labs with discussion; feedback on CLEF; preparation of the next year of CLEF; distribution of responsibilities.

Figure 1: Task stages of organizing an experimental IR track in CLEF.

4.2 Implicit requirements
Support the community in managing the process:
- Tasks and roles. Various roles are involved in the communities' work on experimental evaluation of IR systems. Multiple tracks and tasks are part of a process occurring in an evaluation experiment such as CLEF. Within a track, some of the tasks are interdependent. Specific roles perform specific tasks. One member can play multiple roles. The set of roles and tasks changes across different IR domains.
- Assume both individual and collaborative work. Many tasks are conducted individually, and the individual work is interleaved with collaborative work. Support both individual and collaborative work, and faster transitions between them.
- Only some of the steps in the workflow are fully specified before or at the outset of the process; others are defined during the process.
Needs related to existing work tools:
- As for other communities of knowledge workers [1], the members of these communities use email as their primary communication tool. It would be helpful for any new tool for knowledge sharing and collaboration to build on the central role of email.
- Clear added value. The user should be required to log onto a new system only if such a system supports additional functions that are useful and are not already supported in email or other general-purpose media already in use (e.g., VoIP).
- Difficult tracking and reuse. Perhaps a useful role can be played in facilitating a smoother integration across the multiple existing tools.
Needs related to new collaborative functions that might support collaboration:
- Groups. Group creation is tied to the task.
- Polls. The polls component should be integrated with the other components (this is an example of a function not available in email).
- Collaborative workspace. It is desirable for the members of the community to have easy access to the shared resources and typical steps, which, ideally, should all be made available in one place.
- Process visualization. We observed visible differences in the different instantiations of the workflows elicited from different sub-communities. Process visualization allows people to become aware of the differences and similarities in the way sub-communities go about performing the same process.

5. CONTEXT AND RELATED WORK
Recent research has pointed to the importance of investigating and supporting collaboration in the field of Information Retrieval (IR). For example, [4] reviews the studies of collaboration relative to this field and concludes that the IR field needs to better understand and improve the systems that support both direct and indirect collaboration during information tasks. This is supported by studies in various IR settings. [6] investigated patent engineers, a specific community conducting IR processes, and found that they were involved in various collaborative activities. Overall, existing studies have observed that collaboration is indeed endemic to the broader activities of individuals who perform information seeking, searching, and retrieval tasks. However, with our work we aim to address two specific limitations of the existing literature. First, the prior studies of collaborative practices in IR have focused on describing the practices of teams or groups of users in different settings and user communities (e.g., academia, industry, medicine, patent offices; see [4]) or have developed new tools to support these practices [8], but have not yet systematically investigated how to support collaboration at the level of a large community of practice. Second, while there has been research on how to support communities of practice of professionals (e.g., [10]) or scientists (e.g., collaboratories; see [6]), we focus specifically on how to support lightweight knowledge sharing and collaboration in the community of IR researchers and professionals who develop and evaluate IR tools. This is a community with unique needs for collaboration and types of workflows: recurrently, specific sub-communities of volunteers need to agree on, build, and refine evaluation campaigns for testing IR systems.
A key distinctive property of the IR communities is that their workflows cannot be fully specified a priori. That is, if we consider a continuum from highly specified to highly unspecified processes, then we could classify the instances of workflows of the IR communities as intermediate cases along this continuum. [2], who named this continuum the Specificity Frontier, observed a gap between two existing approaches for supporting collaboration: most collaborative systems have focused on either automating fixed work processes (e.g., Enterprise Resource Planning tools) or simply supporting communication in ad hoc processes (e.g., email). To adequately support the collaboration in IR communities, we need to bridge the gap between these two approaches using lightweight tools that are compatible with semi-structured workflows. While the specific instances of these workflows share several of the tasks and roles, they will inevitably vary across IR domains (and data types), evaluation campaigns, and over time (because the process is refined by the IR community in a collaborative manner as it is repeated over the years). Interestingly, recent research on communities of professionals pointed to the same need for collaborative tools that are able to support flexible realizations of the processes rather than forcing the community into hard-coded processes [9]. Articulating further the design requirements for supporting semi-structured workflows, [2] divides the specificity frontier into four sub-spectra: providing context for enactment, monitoring constraints about the task, providing/planning options to reach a goal, and guiding through a given script. Building on this classification, in this paper we focus on providing support to the IR communities in the first two sub-spectra.

Figure 2: Specificity Frontier.

6. DESIGN AND ARCHITECTURE
To fulfill the requirements described in Section 4, we devised the architecture shown in Figure 3. It refers to the classical organization of CLEF experiments, which is arranged in terms of different domains (e.g., Medical, Patents, etc.). For each domain, one or more tracks are available. As described in Section 3, the organization of a track is a complex collaborative process and encompasses several tasks that exhibit some precedence relationships. The task flow of a track is formalized using an Extended Light livE Gantt (EleGantt) and is shown to the user within the CUI, acting as the main entry point for collaborative activities (Figure 4). A suitable administrative interface allows for adapting the task flow of a track to procedural changes. According to [2] (Figure 2), EleGantt is in charge of providing one part of the process context, i.e., a structured to-do list. The second part of the context, i.e., a shared common space, is provided by T@GZ [7].
EleGantt is an extension of the traditional Gantt chart. It allows for:
- attaching a rich set of meta-data to tasks with the goal of supporting collaborative activities: involved people, involved roles, associated tags, the kind of collaboration activities needed to accomplish the task, and the list of other processes that share the same activity;
- expressing the non-overlapping constraint between tasks that must be executed in sequence;
- specifying temporal uncertainties (e.g., minimum and maximum duration of an activity) and degrees of freedom for milestones and deliverable releases.
Moreover, the EleGantt visualization is both a visualization of the task flow and an interactive interface that allows for exploring and accessing the information associated with the task flow, like roles, people, similarities with other task flows, etc.

Figure 3: The architecture of the Promise collaborative system.

T@GZ is a social software system for organizational information sharing. In T@GZ the user shares content by simply sending an email message with the content to be shared and addressing the message to one or more topic-specific keywords. For example, one might use the address bizdev@share.X.com to refer to information related to the "business development" topic (see Figure 4, top left). Thus, the content of that email is 'tagged' by the keyword 'bizdev'. Any mail may have multiple tags attached in this manner, in the 'To' or 'CC' fields, using any client. While enabling easy publishing and re-finding of this information, the system does not induce people to send additional emails other than those they are already sharing. Focusing now on the implementation of T@GZ in the whole system: using a set of predefined tags (i.e., the tags associated with the EleGantt's tasks), T@GZ provides a means for indexing the emails that are exchanged among the organizers of the tracks, including links to smart attachments. The workflow engine is aware of the EleGantts and, using time information and inspecting the KB, sends different kinds of notifications through email (e.g., a deadline is approaching, it is time to move to the next step, etc.) to the people involved in the organization of the tracks.
The Collaborative User Interface (CUI) is the Web-based access point to all the collaborative activities (see Figure 4). It is basically split into two subcomponents. The first one allows for managing personal user collaborative information (left part of the picture), e.g., messages, polls, etc. The second one refers to the whole process and allows for exploring it using EleGantt, discovering people, roles, and tags, and browsing the whole set of tagged emails. Moreover, the CUI contains a set of tabs that allows for accessing the collaborative tools that have been specified in the EleGantt. In order to provide the user with a unique access point to the process resources, the CUI sends tagged emails to T@GZ containing a link to the collaborative resources (e.g., a link to a Dropbox folder or to a Google Docs document).

Figure 4: The Promise Collaborative User Interface (CUI).

7. CONCLUSION AND FUTURE WORK
As a result of our investigation, we identified some general challenges, or open issues, for this domain: 1) find a good balance between the need for flexibility, to fit various partially-defined processes, and the need for enough specification to allow automation; 2) identify the set of predefined tags (some common and some domain specific); 3) semi-automate the tagging process (e.g., with an intelligent assistant).

Acknowledgements
The work in this paper has been supported by the PROMISE NoE project (contract n. 258191) as a part of the 7th Framework Programme of the European Commission (FP7/2007-2013).

8. REFERENCES
[1] V. Bellotti, N. Ducheneaut, M. Howard, I. Smith, and R. Grinter. Quality versus quantity: e-mail-centric task management and its relation with overload. Human-Computer Interaction, 20(1), June 2005.
[2] A. Bernstein. How can cooperative work tools support dynamic group process? Bridging the specificity frontier. In ACM Conference on Computer Supported Cooperative Work (CSCW '00), 2000.
[3] M. Croce, E. Di Reto, G. Granato, P. Hansen, A. Sabetta, G. Santucci, and F. Veltri. Collaborative User Interface Requirements. PROMISE deliverable D5.1, 2011.
[4] J. Foster. Collaborative information seeking and retrieval. In Annual Review of Information Science and Technology, Volume 40, pages 329-356, 2006.
[5] P. Hansen, G. L. Granato, and G. Santucci. Collecting and Assessing Collaborative Requirements. In Collaborative Information Seeking (CIS 2011) Workshop, 2011.
[6] P. Hansen and K. Järvelin. Collaborative information retrieval in an information-intensive domain. Information Processing and Management (IPM), 2005.
[7] P. M. Joshi, C. Bartolini, and S. Graupner. T@gz: intuitive and effortless categorization and sharing of email conversations. In A. Mille and F. L. Gandon et al., editors, WWW (Companion Volume), page 365. ACM, 2012.
[8] M. Morris and E. Horvitz. SearchTogether: an interface for collaborative web search. In 20th Annual ACM Symposium on User Interface Software and Technology (UIST '07), 2007.
[9] H. R. Motahari-Nezhad, C. Bartolini, S. Graupner, S. Singhal, and S. Spence. IT Support Conversation Manager: A Conversation-Centered Approach and Tool for Managing Best Practice IT Processes. Hewlett Packard Laboratories, Palo Alto, USA, 2010.
[10] E. Wenger, R. McDermott, and W. Snyder. Cultivating Communities of Practice. Harvard Business Press, 2002.