=Paper= {{Paper |id=None |storemode=property |title=Supplying Collaborative Source-code Retrieval Tools to Software Developers |pdfUrl=https://ceur-ws.org/Vol-763/paper2.pdf |volume=Vol-763 |dblpUrl=https://dblp.org/rec/conf/eurohcir/Fernandez-LunaHC11 }} ==Supplying Collaborative Source-code Retrieval Tools to Software Developers== https://ceur-ws.org/Vol-763/paper2.pdf
       Supplying Collaborative Source-code Retrieval Tools
                     to Software Developers

        Juan M. Fernández-Luna                          Juan F. Huete                       Julio C. Rodríguez-Cano
        Departamento de Ciencias de             Departamento de Ciencias de                Centro de Desarrollo Territorial
        la Computación e Inteligencia           la Computación e Inteligencia               Holguín. Universidad de las
            Artificial, CITIC-UGR.                  Artificial, CITIC-UGR.                  Ciencias Informáticas, 80100
           Universidad de Granada,                 Universidad de Granada,                         Holguín, Cuba
            18071 Granada, Spain                    18071 Granada, Spain                           jcrcano@uci.cu
         jmfluna@decsai.ugr.es                      jhg@decsai.ugr.es

ABSTRACT
Collaborative information retrieval (CIR) and search-driven software development (SDD)
are both emerging research fields; the first arose in response to the problem of
satisfying the shared information needs of groups of users who collaborate explicitly,
and the second explores source-code retrieval as an essential activity during the
software development process. Taking advantage of recent contributions in CIR and SDD,
in this paper we introduce a plug-in that can be added to the NetBeans IDE to enable
remote teams of developers to use collaborative source-code retrieval tools. We also
include experimental results to confirm that CIR&SDD techniques give better search
results than individual strategies.

Categories and Subject Descriptors
H.5.3 [Information Interfaces and Presentation (e.g., HCI)]: Group and Organization
Interfaces; H.3.3 [Information Storage and Retrieval]: Search Process.

General Terms
Design, Human Factors.

Keywords
Collaborative Information Seeking and Retrieval, Search-driven Software Development,
Multi-user Search Interface.

Copyright © 2011 for the individual papers by the papers' authors. Copying permitted
only for private and academic purposes. This volume is published and copyrighted by the
editors of EuroHCIR2011.
EuroHCIR '11 Newcastle, UK

1. INTRODUCTION

               "Collaboration" seems to be the buzzword this year,
                 just like "knowledge management" was last year.
                                                    – David Coleman

   In the last few years, Information Retrieval (IR) systems have become critical tools
for software developers. Today we can use vertical IR systems implemented as integrated
development environment (IDE) extensions for source-code retrieval, such as Strathcona
[5], CodeConjurer [6], and CodeGenie [1], but these only allow individual interaction
from the team developers' perspective.
   One of the reasons that existing IR systems do not adequately support collaboration
is that there are no good models and methods that describe users' behavior during
collaborative tasks. To address this issue, the community has adopted CIR as an emerging
research field in charge of establishing techniques to satisfy the shared information
needs of group members, starting from the extension of the IR process with knowledge
about the queries, the context, and the explicit collaboration habits among group
members. The CIR community identifies four fundamental features in this
multidisciplinary field that can enhance the value of collaborative search tools: user
intent transition, awareness, division of labor, and sharing of knowledge [2].
   In addition, SDD is a new research area motivated by the observation that software
developers spend most of their time searching for pertinent information that they need
to solve the tasks at hand. We identified the SDD context as a very interesting field
where collaborative IR features could be greatly exploited. For this reason we use the
phrase collaborative SDD to refer to the application of different collaborative IR
techniques in the SDD process [3].
   It is known that some IDEs incorporate tools that support developers' collaboration
practices, but without emphasis on source-code retrieval. In this sense, the objective
of this paper is to present the results of the comparison of traditional SDD and
collaborative SDD. In both search scenarios, we use the NetBeans IDE plug-in COSME
(COllaborative Search MEeting) with the appropriate configurations. COSME endows the
NetBeans IDE with traditional and collaborative source-code retrieval tools.
   This paper is organized as follows: the next section presents a brief overview of
related work and places our research in context. Then we describe our software tool and
method, explaining the different aspects of our experimental evaluation. Finally, we
discuss the results and present some concluding remarks.

2. RELATED WORK
   There is a small body of work that investigates methods to join collaborative
information retrieval and search-driven software development. On the one hand, some
researchers have identified different search scenarios where it is necessary to extend
IR systems with collaborative capabilities. For example, in the Web context,
SearchTogether [8] is a system which enables remote users to synchronously or
asynchronously collaborate when searching the Web. It supports
collaboration with several mechanisms of group awareness, division of labor, and
persistence. On the other hand, the SDD community presents different prototypes and
systems. For example, Sourcerer [1] is an infrastructure for large-scale indexing and
analysis of open source code. Sourcerer crawls the Internet looking for Java code from a
variety of locations, such as open source repositories, public web sites, and version
control systems.
   CIR systems can be applied in several domains, such as travel planning, organizing
social events, working on a homework assignment, or medical environments, among many
others. We identified software development as another possible application field where
much evidence of collaboration among programmers on a development task can be found.
For example, concurrent editing of models and processes requires synchronous
collaboration between architects and developers who cannot be physically present at a
common location [7].
   However, current SDD systems do not support explicit collaboration among developers
with shared technical information needs, who frequently look for additional
documentation on the API (Application Programming Interface), read posts from people
having the same problem, search the company's site for help with the API, or look for
source code examples where other people successfully used the API. Fortunately, in the
last few years, some researchers have realized that collaboration is an important
feature, which should be analyzed in detail in order to be integrated with operational
IR systems, upgrading them to CIR systems.
   As an approach to these situations, we propose in this work the COSME plug-in [4].
Its contribution to current SDD is explicit support for teams of developers, enabling
developers to collaborate on both the process and results of a search. COSME provides
collaborative search functions for exploring and managing source-code repositories and
documents about technical information in the software development context.
   In order to support such CIR techniques, COSME provides some collaborative services
in the context of SDD:

   • The embedded chat tool enables direct communication among different developers.

   • Relevant search results can be shared with the explicit recommender mechanisms.

   • Another important feature is the automatic division of labor. By implementing an
     effective division of labor policy the search task can be split across team
     developers, thereby avoiding considerable duplication of effort.

   • Through awareness mechanisms all developers are always informed about the team
     activities to save effort. Awareness is a valuable learning mechanism that helps
     the less experienced developers to view the syntax used by their teammates, which
     can inspire them to reformulate their queries.

   • All search results can be annotated, either for personal use, like a summary, or in
     the team context, for discussion threads and ratings.

3. THE COSME PLUG-IN
   To support software developers with shared technical information needs we implemented
the COSME front-end as a NetBeans IDE plug-in. The principal technologies that we used
to implement it include the CIRLab framework [2], the NetBeans IDE platform, Java as
programming language, and AMENITIES (A MEthodology for aNalysis and desIgn of
cooperaTIve systEmS) as software engineering methodology. COSME is designed to enable
either synchronous or asynchronous, but explicit, remote collaboration among teams of
developers with shared technical needs. In the following section we outline COSME.

3.1 Current Features
   Figure 1 is a screenshot showing various features of our COSME plug-in. We refer to
the circled numbers in the following text.
   1. Search Control Panel: It consists of three collapsible panels: (a) configuration,
where developers can select the search options and engines to accomplish the search
tasks; (b) filters, which show the user's fields of interest according to the collection
contents; and (c) collection type, which permits specifying the type of the search
result items.
   2. Search Results Window: The search results can be classified according to three
different source-code locations: (d) results can be obtained as a consequence of
division of labor techniques introduced by the collaborative search session (CoSS)
chairman. A CoSS is a group of end-users working together to satisfy their shared
information needs. A CoSS can have only one developer in the role of chairman; (e) or by
explicit recommendations made by group members of their CoSS; (f) finally, search
results can also be obtained by individual search.
   3. Item Viewer: It shows the full item content in different formats, e.g. PDF, plain
text, and Java source-code files. All item formats are shown to the developers within
the NetBeans IDE.
   4. CoSS Portal: Developers can use the chat tool embedded in the CoSS Portal to
negotiate the creation of a collaborative search session or to join any active CoSS. For
each CoSS, the chairman can establish the integrity criteria, membership policy, and
division of labor principles.

4. EXPERIMENTAL EVALUATION
   In this section we show how collaborative features applied to SDD improve on
traditional operation without them. If we consider the null hypothesis ($H_0$) that
$A_{TSDD} \geq A_{CSDD}$, our alternative hypothesis ($H_1$) is that collaborative work
should help to improve the retrieval performance in an SDD task:
$A_{TSDD} < A_{CSDD}$, where TSDD stands for Traditional SDD and CSDD for Collaborative
SDD. To evaluate our proposal we compare 10 group interactions in two different kinds of
search scenarios (SS) on SDD, $SS_{2k+1}$ and $SS_{2(k+1)}$, $k \in \{0, \ldots, 4\}$.
$SS_{2k+1}$ represents a team of developers that uses a conventional IR system, which
means that developers do not have access to techniques of division of labor, sharing of
knowledge, or awareness (traditional SDD, TSDD), while $SS_{2(k+1)}$ represents a team
of developers that uses a CIR system. Thus, 5 teams worked in a TSDD context (those with
odd subindexes) and the other 5 with CSDD (even subindexes). In both search scenarios,
we used COSME with the appropriate configuration for each setting.
                       Figure 1: Screenshot of NetBeans IDE with COSME plug-in installed


   The search scenario was a common task proposed to a group of developers without Java
background: select the most relevant classes to manage GUI (Graphical User Interface)
components using different Java APIs with a total of 2420 files. Specifically, Jidesoft
(634), OpenSwing (434), SwingX (732) and Swing (620). We have focused on these APIs
because they are directly related to the context of the experiment, although they are
not complete: we have only considered their most relevant API packages for the
experiment.
   For evaluation purposes, we created our own test collection: a group of 10 experts
proposed a set of 100 topics strongly related to the objective of the experiment; their
corresponding queries were then submitted to each of the following search engines:
Lucene, Minion, Indri and Terrier. A document pool was obtained by rank fusion and later
the experts, grouped in pairs, determined the relevant documents for each topic.
   In collaborative SDD, it is very important to analyze the interaction among group
members; therefore, unlike the evaluation of a traditional SDD system, we cannot fix the
queries. Each participating group could freely formulate their queries to the search
engine. In order to compare team results, the search engine identified the most similar
queries formulated by the members of the teams with respect to those formulated by
experts. If the system found enough similarity and if they occur in all the groups, then
these queries are considered to deal with the same topic and are selected for group
comparison purposes. The similarity measure between queries is calculated by Equation 1.
A user query ($q_u$) and an expert query ($q_e$) are considered to be the same if they
are within a given similarity threshold. A new query $q_u'$ is obtained by applying the
Porter stemmer algorithm to the terms of $q_u$, and analogously, we obtain $q_e'$.

$$\mathrm{sim}(q_u, q_e) = \frac{|q_u' \cap q_e'|}{|q_u' \cup q_e'|} \qquad (1)$$

   In Equation 1, $\mathrm{sim}(q_u, q_e)$ is a value between 0 and 1. For this
experiment we assumed that there exists an expert's relevance judgement for $q_u$ only
if there exists a similarity value $\mathrm{sim}(q_u, q_e) \geq \frac{(N+1)/2}{N}$,
where $N = |q_u' \cup q_e'|$, selecting the relevance judgements that correspond to the
maximum similarity for each $q_e$.
   In order to measure the effectiveness of the described $SS_{TSDD}$ and $SS_{CSDD}$
scenarios, we considered as evaluation measures the metrics proposed by Pickens et al.
in [9], i.e. selected precision ($P_s$, the fraction of documents judged relevant by the
developer that were marked relevant in the ground truth) and selected recall ($R_s$) as
their dependent measures. To summarize effectiveness in a single number we use the
$F1_s$ measure.
   According to the documents that each team selected for each common topic, the $F1_s$
measure was computed. To carry out the statistical analysis of the results, we used the
non-parametric Wilcoxon test (all against all). The Monte Carlo method was used,
adjusted with 99% confidence intervals and 10000 samples. The resulting significance
values (Sig.) appear in Table 1.
   We observed significant differences between TSDD and CSDD groups, considered two by
two. As the $F1_s$ values for CSDD groups are better than those computed for TSDD groups
in those cases, we can conclude that when teams work supported by collaborative tools,
they obtain better results. From Table 1, we can see that, apart from $SS_5$, each
$SS_{TSDD}$ has at least one $SS_{CSDD}$ with significantly different values of $F1_s$.
With these results we accept $H_1$, because $A_{TSDD} < A_{CSDD}$.
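The query-matching step built on Equation 1 can be illustrated with a short sketch. It is not the COSME implementation: the function names are hypothetical, and `crude_stem` is only a stand-in for the Porter stemmer so that the example runs without external libraries. The threshold follows the condition stated in the text, which for term sets reduces to requiring that the intersection holds at least (N+1)/2 of the N terms in the union.

```python
import re


def crude_stem(term):
    """Stand-in for the Porter stemmer (assumption): strips a few suffixes."""
    for suffix in ("ing", "ed", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def stemmed_terms(query, stem=crude_stem):
    """Lower-case the query, split it into terms, and stem each term."""
    return {stem(t) for t in re.findall(r"[a-z0-9]+", query.lower())}


def sim(q_u, q_e, stem=crude_stem):
    """Equation 1: size of the intersection over size of the union."""
    u, e = stemmed_terms(q_u, stem), stemmed_terms(q_e, stem)
    union = u | e
    return len(u & e) / len(union) if union else 0.0


def same_topic(q_u, q_e, stem=crude_stem):
    """True when sim >= ((N + 1) / 2) / N, with N the size of the union."""
    u, e = stemmed_terms(q_u, stem), stemmed_terms(q_e, stem)
    n = len(u | e)
    # Multiplying both sides by 2N avoids floating-point comparison.
    return n > 0 and 2 * len(u & e) >= n + 1


print(sim("swing button", "swing table sorting"))       # one shared term out of four
print(same_topic("table sorting", "swing table sort"))  # two shared terms out of three
```

Passing a real Porter stemmer as the `stem` argument would bring the sketch closer to the procedure described in the paper; the set arithmetic stays the same.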
                                        F1s
          SS1      SS2      SS3      SS4      SS5      SS6      SS7      SS8      SS9
   SS2    0.062
   SS3    0.180    0.051
   SS4    0.022†   0.212    0.038†
   SS5    0.272    0.069    0.152    0.054
   SS6    0.045†   0.201    0.080    0.290    0.056
   SS7    0.215    0.031†   0.340    0.090    0.206    0.042†
   SS8    0.053    0.131    0.061    0.190    0.072    0.158    0.070
   SS9    0.243    0.072    0.201    0.029†   0.344    0.068    0.238    0.042†
   SS10   0.065    0.098    0.041†   0.290    0.072    0.235    0.045†   0.132    0.058

   †: significant difference (0.01 ≤ Sig < 0.05)
   ‡: highly significant difference (Sig < 0.01)


                                             Table 1: Wilcoxon Test Results.
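The kind of computation behind a single cell of Table 1 can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the paper does not name the statistical package it used, all function names are hypothetical, selected recall is assumed to be the fraction of ground-truth relevant documents that the team selected, and the two teams are assumed to be paired by common topic. The Monte Carlo step draws random sign assignments for the ranked differences and reports a normal-approximation 99% confidence interval for the estimated significance.

```python
import random
from math import sqrt


def selected_scores(selected, relevant):
    """Return (Ps, Rs, F1s) for one team on one topic."""
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)
    ps = hits / len(selected) if selected else 0.0
    rs = hits / len(relevant) if relevant else 0.0  # assumed definition of Rs
    f1 = 2 * ps * rs / (ps + rs) if ps + rs > 0 else 0.0
    return ps, rs, f1


def average_ranks(values):
    """Rank values starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def monte_carlo_wilcoxon(x, y, samples=10000, seed=0):
    """Two-sided Monte Carlo significance for the Wilcoxon signed-rank test.

    x and y hold the per-topic F1s values of two teams, in the same topic order.
    Returns (p, (low, high)) where the pair is a 99% confidence interval for p.
    """
    diffs = [b - a for a, b in zip(x, y) if b != a]  # zero differences are dropped
    if not diffs:
        return 1.0, (1.0, 1.0)
    ranks = average_ranks([abs(d) for d in diffs])
    centre = sum(ranks) / 2  # expected sum of positive ranks under H0
    observed = abs(sum(r for r, d in zip(ranks, diffs) if d > 0) - centre)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(samples):
        # Under H0 each difference is positive or negative with equal chance.
        w_plus = sum(r for r in ranks if rng.random() < 0.5)
        if abs(w_plus - centre) >= observed:
            extreme += 1
    p = extreme / samples
    half = 2.576 * sqrt(p * (1 - p) / samples)  # 99% normal-approximation band
    return p, (max(0.0, p - half), min(1.0, p + half))
```

A usage sketch: compute `selected_scores` for every common topic and team, collect the third component into one list per team, and call `monte_carlo_wilcoxon` on each pair of lists to fill the lower triangle of a table like Table 1.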


5. CONCLUSIONS AND FUTURE WORK
   Collaboration in SDD is just being recognized as an important research area. While in
some cases collaborative SDD can be handled by conventional search engines, we need to
understand how the collaborative nature of source-code retrieval affects the
requirements on search algorithms. Research in this direction needs to adopt the
theories and methodologies of SDD and CIR, and supplement them with new constructs as
appropriate. In this work we present COSME as a collaborative SDD tool that helps team
developers to find better sources than searching with traditional SDD strategies, as
well as an experimental approach that confirms our hypotheses.
   Our ongoing work focuses on the COSME back-end, which poses fundamental research
challenges as well as provides new opportunities to let group members collaborate in new
ways:
   (i) Profile Analysis. We aim to analyze the user-generated data using various
techniques from the study of different collaborative virtual environments and
recommender systems. With the results, our goal is to provide better personalized search
results, support the users while searching, and recommend relevant, trustworthy
collaborators to users.
   (ii) P2P/hybrid-network Retrieval. Due to scalability and privacy issues we favor a
distributed environment by means of a P2P (peer-to-peer) retrieval feature based on a
hybrid architecture to store the user-generated data and collections (CASPER:
CollAborative Search in PEer-to-peer netwoRks). The main challenge in this respect is to
ensure reliable and efficient data analysis.

6. ACKNOWLEDGMENTS
   This work has been partially supported by the Spanish research programme Consolider
Ingenio 2010: MIPRCV (CSD2007-00018), the Spanish MICIN project TIN2008-06566-C04-01 and
the Andalusian Consejería de Innovación, Ciencia y Empresa project TIC-04526. We would
also like to thank Carmen Torres for support and discussions, and all of our experiment
participants.

7. REFERENCES
[1] S. Bajracharya, J. Ossher, and C. Lopes. Sourcerer: An internet-scale software
    repository. In SUITE '09: Proceedings of the 2009 ICSE Workshop on Search-Driven
    Development-Users, Infrastructure, Tools and Evaluation, pages 1–4, Washington, DC,
    USA, 2009. IEEE Computer Society.
[2] J. M. Fernández-Luna, J. F. Huete, R. Pérez-Vázquez, and J. C. Rodríguez-Cano.
    Cirlab: A groupware framework for collaborative information retrieval research.
    Information Processing and Management, 44(1):256–273, 2009.
[3] J. M. Fernández-Luna, J. F. Huete, R. Pérez-Vázquez, and J. C. Rodríguez-Cano.
    Improving search–driven development with collaborative information retrieval
    techniques. In HCIR '09: IIIrd Workshop on Human–Computer Interaction and
    Information Retrieval, Washington DC, USA, 2009.
[4] J. M. Fernández-Luna, J. F. Huete, R. Pérez-Vázquez, and J. C. Rodríguez-Cano.
    Cosme: A netbeans ide plugin as a team–centric alternative for search driven
    software development. In Group 2010: Ist Workshop on Collaborative Information
    Seeking, Florida, USA, 2010.
[5] R. Holmes. Do developers search for source code examples using multiple facts? In
    SUITE 2009: First International Workshop on Search-Driven Development Users,
    Infrastructure, Tools and Evaluation, Vancouver, Canada, 2009.
[6] W. Janjic. Lowering the barrier to reuse through test-driven search. In SUITE 2009:
    First International Workshop on Search-Driven Development Users, Infrastructure,
    Tools and Evaluation, Vancouver, Canada, 2009.
[7] M. Jiménez, M. Piattini, and A. Vizcaíno. Challenges and improvements in distributed
    software development: A systematic review. 2009.
[8] M. R. Morris and E. Horvitz. Searchtogether: an interface for collaborative web
    search. In UIST '07: Proceedings of the 20th annual ACM symposium on User interface
    software and technology, pages 3–12, New York, NY, USA, 2007. ACM.
[9] J. Pickens, G. Golovchinsky, C. Shah, P. Qvarfordt, and M. Back. Algorithmic
    mediation for collaborative exploratory search. In SIGIR '08: Proceedings of the
    31st annual international ACM SIGIR conference on Research and development in
    information retrieval, pages 315–322, New York, NY, USA, 2008. ACM.