=Paper= {{Paper |id=None |storemode=property |title=Supplying Collaborative Source-code Retrieval Tools to Software Developers |pdfUrl=https://ceur-ws.org/Vol-763/paper2.pdf |volume=Vol-763 |dblpUrl=https://dblp.org/rec/conf/eurohcir/Fernandez-LunaHC11 }} ==Supplying Collaborative Source-code Retrieval Tools to Software Developers== https://ceur-ws.org/Vol-763/paper2.pdf

Supplying Collaborative Source-code Retrieval Tools
to Software Developers

Juan M. Fernández-Luna
Departamento de Ciencias de la Computación e Inteligencia Artificial, CITIC-UGR
Universidad de Granada, 18071 Granada, Spain
jmfluna@decsai.ugr.es

Juan F. Huete
Departamento de Ciencias de la Computación e Inteligencia Artificial, CITIC-UGR
Universidad de Granada, 18071 Granada, Spain
jhg@decsai.ugr.es

Julio C. Rodríguez-Cano
Centro de Desarrollo Territorial Holguín, Universidad de las Ciencias Informáticas
80100 Holguín, Cuba
jcrcano@uci.cu

ABSTRACT
Collaborative information retrieval (CIR) and search-driven software development (SDD) are both emerging research fields. CIR arose in response to the problem of satisfying the shared information needs of groups of users who collaborate explicitly. SDD explores source-code retrieval as an essential activity of the software development process. Building on recent contributions in CIR and SDD, this paper introduces a plug-in for the NetBeans IDE that enables remote teams of developers to use collaborative source-code retrieval tools. We also report experimental results confirming that combined CIR and SDD techniques yield better search results than individual strategies.

Categories and Subject Descriptors
H.5.3 [Information Interfaces and Presentation (e.g., HCI)]: Group and Organization Interfaces; H.3.3 [Information Storage and Retrieval]: Search Process.

General Terms
Design, Human Factors.

Keywords
Collaborative Information Seeking and Retrieval, Search-driven Software Development, Multi-user Search Interface.

Copyright © 2011 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by the editors of EuroHCIR2011.
EuroHCIR '11 Newcastle, UK

1. INTRODUCTION
"Collaboration" seems to be the buzzword this year, just like "knowledge management" was last year.
– David Coleman

In the last few years, information retrieval (IR) systems have become critical tools for software developers. Today we can use vertical IR systems delivered as integrated development environment (IDE) extensions for source-code retrieval, such as Strathcona [5], CodeConjurer [6], and CodeGenie [1]. From the perspective of a development team, however, these tools support only individual interaction.

One reason why existing IR systems do not adequately support collaboration is the lack of good models and methods describing users' behavior during collaborative tasks. To address this issue, the community has adopted CIR as an emerging research field. CIR is responsible for establishing techniques to satisfy the shared information needs of group members. It starts by extending the IR process with knowledge about the queries, the context, and the explicit collaboration habits of group members. The CIR community identifies four fundamental features in this multidisciplinary field that can enhance the value of collaborative search tools: user intent transition, awareness, division of labor, and sharing of knowledge [2].

In addition, SDD is a new research area. It is motivated by the observation that software developers spend most of their time searching for the information they need to solve the tasks at hand. We identified SDD as a very promising field for exploiting collaborative IR features. For this reason, we use the phrase collaborative SDD to refer to the application of collaborative IR techniques in the SDD process [3].

Some IDEs are known to incorporate tools that support developers' collaboration practices, but they do not emphasize source-code retrieval. In this sense, the objective of this paper is to present the results of a comparison between traditional SDD and collaborative SDD. In both search scenarios we use the NetBeans IDE plug-in COSME (COllaborative Search MEeting) with the appropriate configurations. COSME endows the NetBeans IDE with traditional and collaborative source-code retrieval tools.

This paper is organized as follows. Section 2 presents a brief overview of related work and places our research in context. We then describe our software tool and method, explaining the different aspects of our experimental evaluation. Finally, we discuss the results and present some concluding remarks.

2. RELATED WORK
There is a small body of work that investigates methods for joining collaborative information retrieval and search-driven software development. On the one hand, some researchers have identified search scenarios in which IR systems need to be extended with collaborative capabilities. For example, in the Web context, SearchTogether [8] is a system that enables remote users to collaborate synchronously or asynchronously when searching the Web. It supports collaboration through several mechanisms for group awareness, division of labor, and persistence. On the other hand, the SDD community has presented various prototypes and systems. For example, Sourcerer [1] is an infrastructure for large-scale indexing and analysis of open source code. Sourcerer crawls the Internet for Java code in a variety of locations, such as open source repositories, public web sites, and version control systems.

CIR systems can be applied in several domains, such as travel planning, organizing social events, working on a homework assignment, or medical environments, among many others. We identified software development as another possible application field, where there is ample evidence of collaboration among programmers on a development task. For example, concurrent editing of models and processes requires synchronous collaboration between architects and developers who cannot be physically present at a common location [7].

However, current SDD systems do not support explicit collaboration among developers with shared technical information needs. Such developers frequently look for additional documentation on an API (Application Programming Interface), read posts by people having the same problem, search the company's site for help with the API, or look for source code examples in which other people successfully used the API. Fortunately, in the last few years some researchers have realized that collaboration is an important feature. It should be analyzed in detail so that it can be integrated into operational IR systems, upgrading them to CIR systems.

To address these situations, in this work we propose the COSME plug-in [4]. It contributes to current SDD by providing explicit support for teams of developers, enabling them to collaborate on both the process and the results of a search. COSME provides collaborative search functions for exploring and managing source-code repositories and technical documentation in the software development context.

To support these CIR techniques, COSME provides the following collaborative services in the context of SDD:

• The embedded chat tool enables direct communication among developers.

• Relevant search results can be shared through explicit recommendation mechanisms.

• Another important feature is automatic division of labor. An effective division-of-labor policy splits the search task across team developers, thereby avoiding considerable duplication of effort.

• Through awareness mechanisms, all developers are always informed about the team's activities, which saves effort. Awareness is also a valuable learning mechanism. It helps less experienced developers see the syntax used by their teammates, which can inspire them to reformulate their own queries.

• All search results can be annotated, either for personal use (for example, as a summary) or in the team context, for discussion threads and ratings.

3. THE COSME PLUG-IN
To support software developers with shared technical information needs, we implemented the COSME front-end as a NetBeans IDE plug-in. The principal technologies used to implement it include the CIRLab framework [2], the NetBeans IDE platform, Java as the programming language, and AMENITIES (A MEthodology for aNalysis and desIgn of cooperaTIve systEmS) as the software engineering methodology. COSME is designed to enable explicit remote collaboration, either synchronous or asynchronous, among teams of developers with shared technical needs. The following subsection outlines COSME.

3.1 Current Features
Figure 1 is a screenshot showing various features of our COSME plug-in. We refer to the circled numbers in the following text.

1. Search Control Panel: It consists of three collapsible panels: (a) configuration, where developers can select the search options and engines used for the search tasks; (b) filters, which show the user's field of interest according to the collection contents; and (c) collection type, which specifies the type of the items in the search results.

2. Search Results Window: Search results are classified according to three different sources. (d) Results can be obtained through the division-of-labor techniques introduced by the chairman of the collaborative search session (CoSS). A CoSS is a group of end users working together to satisfy their shared information needs, and each CoSS can have only one developer in the role of chairman. (e) Results can also come from explicit recommendations made by members of the CoSS. (f) Finally, results can be obtained by individual search.

3. Item Viewer: It shows the full item content in different formats, e.g., PDF, plain text, and Java source-code files. All item formats are displayed to the developers within the NetBeans IDE.

4. CoSS Portal: Developers can use the chat tool embedded in the CoSS Portal to negotiate the creation of a collaborative search session or to join any active CoSS. For each CoSS, the chairman can establish the integrity criteria, membership policy, and division-of-labor principles.

4. EXPERIMENTAL EVALUATION
In this section we show how collaborative features applied to SDD improve on traditional operation without them. Let $A_{TSDD}$ and $A_{CSDD}$ denote the retrieval performance obtained with Traditional SDD (TSDD) and Collaborative SDD (CSDD), respectively. Our null hypothesis is $H_0: A_{TSDD} \geq A_{CSDD}$. Our alternative hypothesis $H_1$ is that collaborative work helps to improve retrieval performance in an SDD task: $A_{TSDD} < A_{CSDD}$.

To evaluate our proposal we compare 10 group interactions in two kinds of search scenarios (SS) on SDD: $SS_{2k+1}$ and $SS_{2(k+1)}$, with $k \in \{0, \dots, 4\}$. $SS_{2k+1}$ represents a team of developers that uses a conventional IR system. These developers have no access to division of labor, sharing of knowledge, or awareness techniques (traditional SDD, or TSDD). $SS_{2(k+1)}$ represents a team of developers that uses a CIR system (CSDD). Thus, 5 teams worked in a TSDD context (those with odd subindexes) and the other 5 in a CSDD context (even subindexes). In both search scenarios we used COSME, configured appropriately for each setting.
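The TSDD teams above worked without division of labor, which COSME provides to CSDD teams through a policy set by the CoSS chairman. The paper does not say which policy COSME applies. The following is only a minimal sketch that assumes a round-robin split of one shared ranked result list among CoSS members. The function `split_round_robin` and all sample data are hypothetical and are not part of COSME or CIRLab.

```python
from typing import Dict, List, Sequence


def split_round_robin(ranked_docs: Sequence[str],
                      members: Sequence[str]) -> Dict[str, List[str]]:
    """Hypothetical division-of-labor policy (not COSME's documented one):
    deal a shared ranked result list out to CoSS members in turn, so that
    no document is reviewed by more than one member."""
    if not members:
        raise ValueError("a collaborative search session needs at least one member")
    assignment: Dict[str, List[str]] = {member: [] for member in members}
    for rank, doc in enumerate(ranked_docs):
        # The document at position `rank` goes to member number rank mod |members|.
        assignment[members[rank % len(members)]].append(doc)
    return assignment


if __name__ == "__main__":
    team = ["ana", "luis", "marta"]  # hypothetical CoSS members
    results = ["JTable.java", "JTree.java", "JXTable.java",
               "JideTabbedPane.java", "swing-tutorial.pdf"]
    print(split_round_robin(results, team))
    # {'ana': ['JTable.java', 'JideTabbedPane.java'], 'luis': ['JTree.java', 'swing-tutorial.pdf'], 'marta': ['JXTable.java']}
```

Round-robin gives every member a share of the top of the ranking. A real policy might instead partition by collection, for example one API per member. That alternative is also an assumption.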
Figure 1: Screenshot of NetBeans IDE with COSME plug-in installed

The search scenario was a common task given to a group of developers without a Java background: select the most relevant classes for managing GUI (Graphical User Interface) components using several Java APIs. These APIs comprise a total of 2,420 files: Jidesoft (634), OpenSwing (434), SwingX (732), and Swing (620). We focused on these APIs because they are directly related to the context of the experiment. They are not complete, however, because we considered only their most relevant packages.

For evaluation purposes, we created our own test collection. A group of 10 experts proposed a set of 100 topics strongly related to the objective of the experiment, and the corresponding queries were submitted to each of the following search engines: Lucene, Minion, Indri, and Terrier. A document pool was obtained by rank fusion. The experts, grouped in pairs, then determined the relevant documents for each topic.

In collaborative SDD it is very important to analyze the interaction among group members. Therefore, unlike in the evaluation of a traditional SDD system, we cannot fix the queries, and each participating group could freely formulate its queries to the search engine. To compare team results, the search engine identified the queries formulated by team members that were most similar to those formulated by the experts. If the system found enough similarity, and the matching queries occurred in all the groups, these queries were considered to deal with the same topic and were selected for group comparison. The similarity between queries is calculated by Equation 1. A user query ($q_u$) and an expert query ($q_e$) are considered to be the same if their similarity is within a given threshold. A new query $q_u'$ is obtained by applying the Porter stemmer algorithm to the terms of $q_u$. Analogously, we obtain $q_e'$.

$$\mathrm{sim}(q_u, q_e) = \theta = \frac{|q_u' \cap q_e'|}{|q_u' \cup q_e'|} \qquad (1)$$

In Equation 1, $\theta$ is a value between 0 and 1. For this experiment we assumed that an expert relevance judgement for $q_u$ exists only if $\exists\,\theta \in \left[\frac{N-1}{N}, 1\right]$, where $N = |q_u' \cup q_e'|$. For each $q_e$, we selected the relevance judgements corresponding to the maximum $\theta$.

To measure the effectiveness of the $SS_{TSDD}$ and $SS_{CSDD}$ scenarios, we used as dependent measures the metrics proposed by Pickens et al. [9]. These are selected precision ($P_s$, the fraction of documents judged relevant by the developer that were marked relevant in the ground truth) and selected recall ($R_s$, the fraction of the ground-truth relevant documents that the developers selected). To summarize effectiveness in a single number, we use the $F1_s$ measure, the harmonic mean of $P_s$ and $R_s$.

$F1_s$ was computed from the documents that each team selected for each common topic. For the statistical analysis of the results we used the non-parametric Wilcoxon test (all against all), with the Monte Carlo method at a 99% confidence level and 10,000 samples. The resulting significance values (Sig.) are shown in Table 1.

We observed significant differences in several pairwise comparisons between TSDD and CSDD groups. In those cases the $F1_s$ values of the CSDD groups are better than those of the TSDD groups. We therefore conclude that teams supported by collaborative tools obtain better results. Table 1 also shows that, apart from $SS_5$, each $SS_{TSDD}$ has at least one $SS_{CSDD}$ whose $F1_s$ values differ significantly from its own. With these results we accept $H_1$, because $A_{TSDD} < A_{CSDD}$.
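The query-matching step behind Equation 1 can be sketched as follows. The sketch assumes that user and expert queries have already been reduced to sets of Porter-stemmed terms. The stemming itself is not shown; a library stemmer such as NLTK's `PorterStemmer` could supply it. The paper does not report the threshold value, so `threshold` is a free parameter here. The names `jaccard` and `best_user_query` and the sample queries are illustrative only.

```python
from typing import Dict, Optional, Set, Tuple


def jaccard(qu_terms: Set[str], qe_terms: Set[str]) -> float:
    """Equation 1: |qu' ∩ qe'| / |qu' ∪ qe'| on stemmed term sets, in [0, 1]."""
    union = qu_terms | qe_terms
    if not union:
        return 0.0  # two empty queries carry no evidence of similarity
    return len(qu_terms & qe_terms) / len(union)


def best_user_query(qe_terms: Set[str],
                    user_queries: Dict[str, Set[str]],
                    threshold: float) -> Tuple[Optional[str], float]:
    """Return the user query most similar to one expert query, or (None, 0.0)
    if no user query reaches the (assumed) similarity threshold."""
    best_id: Optional[str] = None
    best_sim = 0.0
    for query_id, terms in user_queries.items():
        sim = jaccard(terms, qe_terms)
        # Keep only the maximum similarity for this expert query.
        if sim >= threshold and (best_id is None or sim > best_sim):
            best_id, best_sim = query_id, sim
    return best_id, best_sim


if __name__ == "__main__":
    expert = {"swing", "tabl", "model"}  # hypothetical stemmed expert query
    team_queries = {
        "u1": {"swing", "tabl"},   # 2 shared terms out of 3 distinct terms
        "u2": {"tree", "render"},  # no shared terms
    }
    print(best_user_query(expert, team_queries, threshold=0.5))
    # ('u1', 0.6666666666666666)
```

Ties keep the first user query encountered. A stricter variant could require the match to occur in every group before the topic is used for comparison, as the text describes.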
F1s    SS1     SS2     SS3     SS4     SS5     SS6     SS7     SS8     SS9
SS2    0.062
SS3    0.180   0.051
SS4    0.022†  0.212   0.038†
SS5    0.272   0.069   0.152   0.054
SS6    0.045†  0.201   0.080   0.290   0.056
SS7    0.215   0.031†  0.340   0.090   0.206   0.042†
SS8    0.053   0.131   0.061   0.190   0.072   0.158   0.070
SS9    0.243   0.072   0.201   0.029†  0.344   0.068   0.238   0.042†
SS10   0.065   0.098   0.041†  0.290   0.072   0.235   0.045†  0.132   0.058
†: significant difference (0.01 ≤ Sig < 0.05)
‡: highly significant difference (Sig < 0.01)

Table 1: Wilcoxon Test Results.
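The evaluation behind Table 1 can be approximated in a few lines. The sketch assumes that two teams' per-topic $F1_s$ scores are paired by common topic, so it uses the Wilcoxon signed-rank statistic. It computes a two-sided Monte Carlo p-value by random sign flips and adds a 99% normal-approximation interval for that estimate, mirroring the 10,000 samples and 99% level given in the text. It is not necessarily how the authors' statistics software computed the table. The names `selected_f1`, `average_ranks`, and `wilcoxon_mc` and the sample scores are hypothetical.

```python
import math
import random
from typing import List, Sequence, Set, Tuple


def selected_f1(selected: Set[str], relevant: Set[str]) -> float:
    """F1_s: harmonic mean of selected precision P_s and selected recall R_s."""
    hits = len(selected & relevant)
    p_s = hits / len(selected) if selected else 0.0
    r_s = hits / len(relevant) if relevant else 0.0
    return 2 * p_s * r_s / (p_s + r_s) if p_s + r_s > 0 else 0.0


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_mc(x: Sequence[float], y: Sequence[float], n_samples: int = 10_000,
                seed: int = 0, z: float = 2.576) -> Tuple[float, Tuple[float, float]]:
    """Two-sided Monte Carlo p-value of the Wilcoxon signed-rank test for
    paired scores x and y, plus a normal-approximation interval for it."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    if not diffs:
        return 1.0, (1.0, 1.0)
    # Round |d| so that float noise does not break genuine ties.
    ranks = average_ranks([round(abs(d), 12) for d in diffs])
    center = sum(ranks) / 2  # expected W+ under H0
    observed = abs(sum(r for r, d in zip(ranks, diffs) if d > 0) - center)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_samples):
        # Under H0 each rank is positive or negative with probability 1/2.
        w_plus = sum(r for r in ranks if rng.random() < 0.5)
        if abs(w_plus - center) >= observed - 1e-9:
            extreme += 1
    p = extreme / n_samples
    half = z * math.sqrt(p * (1 - p) / n_samples)  # z = 2.576 gives a 99% interval
    return p, (max(0.0, p - half), min(1.0, p + half))


if __name__ == "__main__":
    # Hypothetical per-topic F1_s scores for one TSDD and one CSDD team.
    tsdd = [0.20, 0.35, 0.10, 0.40, 0.25, 0.30, 0.15, 0.22, 0.18, 0.28]
    csdd = [0.31, 0.42, 0.25, 0.39, 0.37, 0.44, 0.29, 0.35, 0.26, 0.41]
    # P_s = 1/2 and R_s = 2/3, so F1_s = 4/7 (about 0.571).
    print(selected_f1({"a", "b", "c", "d"}, {"a", "b", "e"}))
    print(wilcoxon_mc(tsdd, csdd))
```

Calling `wilcoxon_mc` for every pair of teams gives the "all against all" layout of Table 1. The values printed for the made-up scores above are not the paper's results.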

5. CONCLUSIONS AND FUTURE WORK
Collaboration in SDD is only now being recognized as an important research area. In some cases conventional search engines can handle collaborative SDD. Even so, we need to understand how the collaborative nature of source-code retrieval affects the requirements on search algorithms. Research in this direction needs to adopt the theories and methodologies of SDD and CIR and supplement them with new constructs as appropriate. In this work we presented COSME, a collaborative SDD tool that helps team developers find better sources than traditional SDD strategies do. We also presented an experimental evaluation that confirms our hypothesis.

Our ongoing work focuses on the COSME back-end. It poses fundamental research challenges, and it also provides new opportunities for group members to collaborate in new ways:

(i) Profile Analysis. We aim to analyze user-generated data using techniques from the study of collaborative virtual environments and recommender systems. With the results, our goal is to provide better personalized search results, support users while searching, and recommend relevant, trustworthy collaborators to them.

(ii) P2P/hybrid-network Retrieval. Because of scalability and privacy issues, we favor a distributed environment. It would provide a P2P (peer-to-peer) retrieval feature, based on a hybrid architecture, to store the user-generated data and collections (CASPER: CollAborative Search in PEer-to-peer netwoRks). The main challenge in this respect is to ensure reliable and efficient data analysis.

6. ACKNOWLEDGMENTS
This work has been partially supported by the Spanish research programme Consolider Ingenio 2010: MIPRCV (CSD2007-00018), the Spanish MICIN project TIN2008-06566-C04-01, and the Andalusian Consejería de Innovación, Ciencia y Empresa project TIC-04526. We would also like to thank Carmen Torres for support and discussions, as well as all of our experiment participants.

7. REFERENCES
[1] S. Bajracharya, J. Ossher, and C. Lopes. Sourcerer: An internet-scale software repository. In SUITE '09: Proceedings of the 2009 ICSE Workshop on Search-Driven Development-Users, Infrastructure, Tools and Evaluation, pages 1–4, Washington, DC, USA, 2009. IEEE Computer Society.
[2] J. M. Fernández-Luna, J. F. Huete, R. Pérez-Vázquez, and J. C. Rodríguez-Cano. CIRLab: A groupware framework for collaborative information retrieval research. Information Processing and Management, 44(1):256–273, 2009.
[3] J. M. Fernández-Luna, J. F. Huete, R. Pérez-Vázquez, and J. C. Rodríguez-Cano. Improving search-driven development with collaborative information retrieval techniques. In HCIR '09: IIIrd Workshop on Human-Computer Interaction and Information Retrieval, Washington DC, USA, 2009.
[4] J. M. Fernández-Luna, J. F. Huete, R. Pérez-Vázquez, and J. C. Rodríguez-Cano. COSME: A NetBeans IDE plugin as a team-centric alternative for search driven software development. In Group 2010: Ist Workshop on Collaborative Information Seeking, Florida, USA, 2010.
[5] R. Holmes. Do developers search for source code examples using multiple facts? In SUITE 2009: First International Workshop on Search-Driven Development Users, Infrastructure, Tools and Evaluation, Vancouver, Canada, 2009.
[6] W. Janjic. Lowering the barrier to reuse through test-driven search. In SUITE 2009: First International Workshop on Search-Driven Development Users, Infrastructure, Tools and Evaluation, Vancouver, Canada, 2009.
[7] M. Jiménez, M. Piattini, and A. Vizcaíno. Challenges and improvements in distributed software development: A systematic review. 2009.
[8] M. R. Morris and E. Horvitz. SearchTogether: An interface for collaborative web search. In UIST '07: Proceedings of the 20th annual ACM symposium on User interface software and technology, pages 3–12, New York, NY, USA, 2007. ACM.
[9] J. Pickens, G. Golovchinsky, C. Shah, P. Qvarfordt, and M. Back. Algorithmic mediation for collaborative exploratory search. In SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pages 315–322, New York, NY, USA, 2008. ACM.