Application Domains in the Research Papers at
                   BENEVOL: A Retrospective

                 Andrea Capiluppi                                                  Nemitari Ajienka
             Dept of Computer Science                                         Dept of Computer Science
     Brunel University London, United Kingdom                            Edge Hill University, United Kingdom

                                          Bilyaminu Auwal Romo
                                Dept of Engineering and Digital Technologies
                                   Coventry University, United Kingdom


Abstract

Research on empirical software engineering has increasingly used the data that is made available in online repositories, specifically Free/Libre/Open Source Software (FLOSS) projects. The latest trend among researchers is to gather “as much data as possible” to (i) prevent bias in the representation of a small sample, (ii) work with a sample as close as possible to the population itself, and (iii) showcase the performance of existing or new tools in treating vast amounts of data.

The effects of harvesting enormous amounts of data have been only marginally considered so far: data could be corrupted; repositories could be forked; and developer identities could be duplicated. In this paper we posit that there is a fundamental flaw in harvesting large amounts of data, and in generalising the conclusions: the application domain, or context, of the analysed systems must be the primary factor for the cluster sampling of FLOSS projects.

This paper presents two contributions: first, we analyse a collection of 100 BENEVOL papers, showing whether (and how much) FLOSS data has been harvested, and how many times the authors flagged an issue in their different application domains. Second, we discuss the implications of using ‘application domain’ as the clustering factor in the sampling of FLOSS data, and the generalisations within and outside the clusters.

Index terms: FLOSS, application domains, BENEVOL papers

Copyright © by the paper’s authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
In: D. Di Nucci, C. De Roover (eds.): Proceedings of the 18th Belgium-Netherlands Software Evolution Workshop, Brussels, Belgium, 28-11-2019, published at http://ceur-ws.org

1     Introduction

The use of open, available data has been a welcome accelerator in the software engineering research field. Data on the processes and products available via an Open Source approach has led to an increasingly large number of workshops, conferences, papers and research attempts to describe the phenomenon. Researchers gathered initially in 2001 around the Open Source Software (OSS) workshops, held annually in co-location with the ICSE series of conferences. Before the OSS workshop evolved into the OSS conference in 2005, the BENEVOL community started to group together researchers from the Software Evolution domain. Its initial focus was ‘(...) to bring researchers to identify and discuss important principles, problems, techniques and results related to software evolution research and practice’1.

    1 https://smartcare.be/events/benevol-04-workshop

While the goal of a few BENEVOL papers has been to achieve the generality of the results [1], the domain, context and uniqueness of a software system have not been considered very often by empirical software engineering research. As in the example reported in [2], the extensive study of all JSON parsers available would
find similarities between them or common patterns. That type of study would focus on one particular language (JSON), one specific domain (parsers) and inevitably draw limited conclusions. On the other hand, considering the “parsers” domain (but without focusing on one single language) would show the common characteristics of developing that type of system, irrespective of the language.

The underlying vision of this paper is to open a proper debate on the importance of context for any software system, and the uniqueness of its application domain. This position paper stems from the work of several prominent researchers who called on the community to ‘go deeper, not wider’ (Michael Godfrey at MSR 2017) and to consider ‘mining the mind, minding the mine’ [3]. We posit that past empirical investigations using FLOSS systems have been mostly blind to these aspects (i.e., context and domain), establishing similarities between vastly different systems if they shared a common pattern in one measured attribute. Using an extreme example, one could establish a similarity between the coupling of a ‘hammock’ and a ‘bridge’ due to the fact that both are held at the sides.

The purpose of this paper is to share some findings about a selection of papers discussed during the last few years of BENEVOL workshops. The focus is specifically on BENEVOL papers that have used FLOSS data. The context of our analysis is the diversity of FLOSS projects under study, and how that was reflected by researchers in their findings. Some 100 papers are analysed in terms of whether FLOSS projects are used, how many, and whether considerations of application domains have been used to inform the sampling of FLOSS projects, or the validity of the conclusions. We assume that domains are relevant as a fundamental construct for any empirical software engineering research [4].

2     Related Work

The vast literature on FLOSS systems of the last 10 years has been made possible in part by a series of guidelines on how to perform quantitative, empirical analysis on FLOSS processes and products. When SourceForge2 was considered the de-facto FLOSS forge, a well-received research paper shared more than one insight on the most common mistakes to avoid when mining data and results from the projects hosted there [5]. Among other more technical issues of mining this specific forge, this paper actively warned against an inaccurate ‘screening’ of projects into samples: reducing a population to, say, ‘FLOSS projects with more than 7 developers’ would inevitably reduce the variables available for the analysis, since the ‘number of developers’ variable could no longer be used as a dependent or independent variable for any model or analysis.

    2 https://sourceforge.net/

The acknowledgement of GitHub as the newly established central focus for FLOSS development generated a similar requirement, in terms of shared guidelines to avoid common mining mistakes [6]. Unlike [5], the 2014 paper mostly focused on the technical aspects of GitHub, and how the collected metrics could skew the results, due to the inner workings of the Git toolset, and the different approach to FLOSS development observed on GitHub (forking, non-software development, inactivity of projects). Neither [5] nor [6] warned about the variability of FLOSS projects, the importance of their context, or the uniqueness of their domains.

Outside of the FLOSS literature, the diversity and context of software systems have received some attention in the past [4, 7]. The phrase “large scale” has been frequently used in empirical software engineering research to denote the magnitude of the analyzed case study or studied software sample. Nevertheless, Nagappan et al. argue that analyzing a high number of projects is not always necessary [2]. What is even more important is the selection of the projects studied.

Interesting patterns valuable to researchers and practitioners are often identified in domain-based analysis of software projects. Results from one domain might not be applicable in another. As such, it is important for results to be representative.

Software categorization or domain clustering has gained importance over the years. For example, the knowledge of software trends in a particular domain can assist developers in the search for domain-specific reusable components [8]. Tian et al. [9] proposed a technique based on Latent Dirichlet Allocation for automatic software categorization in open-source software repositories.

According to Haefliger et al. [10], “domain analyses, documentation, and quality standards enhance the ability to reuse software components”. However, our survey of past BENEVOL papers that have analyzed OSS projects demonstrates that software domains have not been considered in most of the past software engineering studies.

3     A Survey of BENEVOL Papers: 2012 to 2018

In order to show how FLOSS data has been used and analysed by the BENEVOL community, we report here an investigation of the research papers that appeared in the last 5 years of the BENEVOL event. Overall, 101 papers have been considered in this study: we share the raw data in the spreadsheet at https://tinyurl.com/y69wkadr.
  Each paper was read by one of the co-authors, and summarised along the following points:

  • Use of FLOSS systems (yes/no): first, we checked whether FLOSS projects are used in the paper at all. This served as an indicator of the pervasiveness of FLOSS projects in the literature produced by BENEVOL papers.

  • Number of FLOSS systems used: second, we trawled through the paper, annotating where the authors mentioned how many FLOSS systems were used. In the case of full papers, the abstract, introduction, methodology and conclusion were read for that purpose.

  • Analysis of application domains: third, we considered the methodology, results and conclusion of each paper, along with the threats to validity, looking for considerations of application domains. We checked if the authors considered this attribute in the sampling of FLOSS projects, whether they limited their results against this axis, or whether it was considered a specific threat to validity. This attribute was coded as either {yes, no}.

Figure 1: Papers using FLOSS (above) and use of FLOSS and non-FLOSS projects (below) in BENEVOL (between 2012 and 2018)
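
The three coding points listed above can be pictured as one small record per paper, which is then tabulated per year. The following Python sketch is only an illustration of that bookkeeping under stated assumptions: the names `PaperRecord` and `summarise_by_year` are hypothetical, the column layout of the shared spreadsheet is not assumed, and the example records are made up rather than taken from the survey.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class PaperRecord:
    """One surveyed paper, coded along the three points (hypothetical layout)."""
    year: int
    uses_floss: bool               # point 1: FLOSS systems used (yes/no)
    floss_systems: Optional[int]   # point 2: systems reported; None if not stated
    considers_domains: bool        # point 3: application domains considered (yes/no)


def summarise_by_year(records):
    """Return, per year, paper counts and the median number of FLOSS systems."""
    by_year = defaultdict(list)
    for record in records:
        by_year[record.year].append(record)

    summary = {}
    for year, papers in sorted(by_year.items()):
        # Only papers that use FLOSS and state a number contribute to the median.
        sizes = [p.floss_systems for p in papers
                 if p.uses_floss and p.floss_systems is not None]
        summary[year] = {
            "papers": len(papers),
            "floss_papers": sum(p.uses_floss for p in papers),
            "domain_papers": sum(p.considers_domains for p in papers),
            "median_systems": median(sizes) if sizes else None,
        }
    return summary


# Made-up records for illustration only; they are NOT the survey data.
EXAMPLE_RECORDS = [
    PaperRecord(2012, True, 1, True),
    PaperRecord(2012, False, None, False),
    PaperRecord(2018, True, 10, False),
    PaperRecord(2018, True, 1000, False),
]

if __name__ == "__main__":
    for year, row in summarise_by_year(EXAMPLE_RECORDS).items():
        print(year, row)
```

Keeping the number of systems optional reflects the fact that some papers do not state a sample size, so any per-year figure derived this way is a lower bound.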
   The contributions to the 2016 edition of BENEVOL are not available online, so they had to be excluded from our analysis. The spreadsheet with the categorisation of the papers has been made available for inspection under the following link: https://tinyurl.com/y69wkadr.

3.1   BENEVOL use of FLOSS Systems

In this section we provide the first point of our analysis: ‘how many BENEVOL papers have used FLOSS systems in their analyses?’. As visible in the two plots of Figure 1, researchers (and accepted BENEVOL papers) have steadily used FLOSS systems for their papers. The first plot shows the absolute numbers of accepted BENEVOL submissions that use one or more FLOSS projects.

The bottom plot of Figure 1 shows the ratio of FLOSS and non-FLOSS papers in the BENEVOL sample of papers. It is becoming increasingly common to use one or more commercial software systems, or a combination of FLOSS and non-FLOSS projects.

3.2   Number of FLOSS systems used in BENEVOL

In this section we report on the number of FLOSS systems evaluated by BENEVOL papers. For this purpose, we analysed the methodology description, or the empirical approach, of each paper to determine how many FLOSS systems were reported in the study. Figure 2 displays the cumulative number of FLOSS systems used in BENEVOL papers, per year. The median number of systems has increased from one analysed OSS system in 2012 to 1,127 systems in 2018.

Figure 2: Cumulative number of FLOSS projects per year

The exponential growth in the number of FLOSS systems used by BENEVOL papers has been accelerated by many factors: (i) availability of open forges (FreshMeat, SourceForge, Savannah, Apache FSF, GitHub and many others); (ii) common, shared toolsets to perform the analyses; (iii) guidelines on how to effectively use forges.

Below we give a summary of findings to assess the trends observed in the number of FLOSS systems analysed by the BENEVOL papers.

Growth of sample sizes
The trend that we observed throughout the subsequent years of the BENEVOL contributions is, fundamentally, summarised as ‘the more the better’. Authors have started to include larger and larger FLOSS samples in their papers. We can assume that this pattern has been followed in order to achieve the generality of a paper’s findings. At the last edition of available BENEVOL contributions (BENEVOL 2018), over one million FLOSS systems were considered for investigation, jointly by the accepted papers.

Uncertainty on sample sizes
Several BENEVOL papers use ecosystems [11], or umbrella projects [12], as their case studies, whereas other papers either take a subset of those super-projects, or explicitly declare the number of subprojects (e.g., Scala [13] or Python [14] projects) that they analysed. This means that our final figures are mostly lower bounds of the actual number of FLOSS systems being used by the BENEVOL community.

Ecosystems vs time of analysis
Several BENEVOL papers have used umbrella projects (for example, Gnome). In most cases we considered them as single FLOSS systems: depending on the time of the analysis, these larger projects can contain a variable number of sub-projects. This makes it difficult to define the status of the super-project, in terms of the number of its sub-projects, as well as their domains. This also makes it difficult to replicate those studies, as well as their results and conclusions.

Sampling and Pruning
Throughout the editions of BENEVOL, sampling of a few systems (see for instance [15] or [16]) has given way to whole-forge analyses. Also, there seems to be a general view that ‘pruning’ a sample is a good idea for removing outliers, or for promoting quality. This has an effect on the sample studied, and on how well it represents the population as a whole.

3.3   Application domains and FLOSS projects

The third analysis was based on the application domains of the systems considered in the empirical study. For all the papers (not only for those using FLOSS projects), we tried to establish whether the authors considered the results, findings or discussion as constrained by the type of system (e.g., its domain). This included checking whether the threats to external validity (if any) limited the conclusions to the domain(s) under investigation.

We grouped the papers into two categories (and plotted them accordingly per year):

    1. papers that directly considered application domains as drivers in the variability of the results (stack “YES” in Figure 3);

    2. papers that did not consider application domains as drivers (stack “NO” in Figure 3).

The results of this analysis are shown in Figure 3: a ratio (%) is used to separate the papers in the two categories. It is clear from the visualisation that BENEVOL papers do not generally acknowledge the variability of results as driven by the domains of the systems involved. Earlier papers (especially from the 2012 batch) have a good coverage of domains in the evaluation of the results, but this is not reflected in the later editions of BENEVOL.

Figure 3: Application domains in papers using FLOSS projects

As visible in Figure 3, the majority of findings on FLOSS, as reported by BENEVOL papers, do not mention application domains. In some cases, researchers have acknowledged the variability of the results [17, 18], and hinted that other factors could play a role in such variability. We considered it a “limited” acknowledgment of the relevance of the application domain when authors mentioned the diversity of the systems under study.

4     BENEVOL: FLOSS and Domains

The bird's-eye view on the type of BENEVOL contributions (Sections 3.1, 3.2 and 3.3 above) reveals some interesting trends when dealing with FLOSS projects. Below we discuss in more detail whether FLOSS systems were analysed (“YES” or “NO”), and whether
domains were considered in the analysis (“YES” or “NO”).

4.1   FLOSS: YES, Domains: YES

So far in the BENEVOL series, few papers explicitly addressed the importance of domains when analysing systems, or when discussing findings. An interesting perspective is given in [19], since it considers a very specific type of system, the ‘cross-ecosystem packages’. These systems are likely to show similar characteristics since they are supposed to act as vectors to and from the overarching system.

In [20], the authors stress the importance of domain analysis when creating a theoretical and practical framework that supports the development and the evolution of adaptive data-intensive software systems for ubiquitous environments. They focus on data, and in particular on the problem of finding the most suitable portion of data that has to be provided by the application in the context of ‘self-adaptive systems’.

Likewise, in the 11th edition of BENEVOL (2012), [21] examined the impact and role of social media on software development. The authors argued that “social media is poised to bring about a paradigm shift in software engineering research”, particularly in the OSS community.

In the 2014 edition, only one BENEVOL study focusing on OSS projects implicitly highlighted the need to investigate projects from various domains [22]. The authors studied an OSS project called DrJava and implicitly mentioned domains but did not investigate multiple projects clustered into several domains. According to the authors, “we chose an IDE since they contain elements of multiple domains”. The IDE project was taken from the Qualitas Corpus and it consists of 3000 revisions since 2000, and the system grew from 30K SLOC in 2003 to 200K SLOC in 2013.

We concluded that application domains are not well represented or studied in the papers that use FLOSS data.

4.2   FLOSS: YES, Domains: NO

The vast majority of BENEVOL contributions based on FLOSS systems do not consider domains as one of the factors to take into consideration. An interesting example of this approach is given in [1], where the authors posit that ‘... (to) gather as much as possible should be the aim of empirical software engineering’.

More generally, the approach of researchers is to focus on specific languages or source code models (see for instance the paper in [23], focused on all available meta-models from GitHub), hence representing convenience sampling. For example, in the study on control flow, Landman et al. [24] focused on the Sourcerer Corpus, which contains 18K (13K non-empty) Java projects. In an empirical analysis of the maintainability of CRAN packages, Claes et al. [25] presented early results on analysing the dependencies of the CRAN R packages repository.

We concluded that most of the papers studied from the BENEVOL series do not consider the application domains as an important factor for software analysis or evolution.

4.3   FLOSS: NO, Domains: YES

A few of the papers that we analysed are not based on FLOSS systems, but more generally on commercial or in-house software. In a few cases, we observed that the authors actually limited the conclusions of their case studies to the one domain that was investigated.

As examples, we noted a paper based on a banking system [26], and one focused on the specific features of a home-automation system [27]. Both these papers clearly acknowledged the limitations given by the chosen application domains that their systems are based on. In other cases, the authors specifically focused on one domain (for example, GIS systems [28], or the larger business domain [29]).

In general, the BENEVOL papers using non-OSS software as their case studies do not use the domains to aggregate results. Nonetheless, a few BENEVOL contributions have shown a clear pathway towards not generalising the findings to all domains.

5     Conclusion

This paper analysed how open source software has been used by the BENEVOL contributions between 2012 and 2018. We showed the increasing number of BENEVOL contributions that used FLOSS projects for their analyses.

Although the majority of contributions do not acknowledge the importance of domains when discussing the findings, there is an increasing number of papers that limit the results, or the data sampling, to specific domains. We believe that one of the major challenges for empirical software engineering is to better understand the role of domains, especially in the evolution of software systems. We propose that papers that empirically analyse software systems acknowledge this challenge as a ‘threat to domain validity’.
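
As a closing illustration of domain-aware sampling, the Python sketch below draws a fixed number of projects from every application domain, so that findings can be reported per domain rather than for one undifferentiated pool. It is a minimal sketch under stated assumptions, not a procedure taken from the surveyed papers: the function `sample_per_domain`, the project names and the domain labels are all hypothetical, and the technique shown is stratified sampling within domains, which is one possible reading of using the domain as the clustering factor.

```python
import random
from collections import defaultdict


def sample_per_domain(projects, per_domain, seed=0):
    """Draw up to `per_domain` projects from every application domain.

    `projects` is an iterable of (name, domain) pairs.
    """
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    by_domain = defaultdict(list)
    for name, domain in projects:
        by_domain[domain].append(name)

    sample = {}
    for domain, names in sorted(by_domain.items()):
        # Small domains contribute everything they have.
        k = min(per_domain, len(names))
        sample[domain] = sorted(rng.sample(names, k))
    return sample


# Hypothetical project names and domain labels, for illustration only.
EXAMPLE_PROJECTS = [
    ("parser-a", "parsers"), ("parser-b", "parsers"), ("parser-c", "parsers"),
    ("ide-a", "IDEs"), ("ide-b", "IDEs"),
    ("gis-a", "GIS"),
]

if __name__ == "__main__":
    print(sample_per_domain(EXAMPLE_PROJECTS, per_domain=2))
```

Reporting results per key of the returned dictionary keeps each conclusion tied to the domain it was drawn from; a domain with very few projects (here the hypothetical "GIS") is visible as such instead of being diluted in a larger sample.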


References

 [1] Antoine Pietri and Stefano Zacchiroli. Towards universal software evolution analysis. In BENEVOL, pages 6–10, 2018.

 [2] Meiyappan Nagappan, Thomas Zimmermann, and Christian Bird. Diversity in software engineering research. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, pages 466–476. ACM, 2013.

 [3] A. J. Ko. Mining the mind, minding the mine: grand challenges in comprehension and mining. In Andy Zaidman, Yasutaka Kamei, and Emily Hill, editors, Proceedings of the 15th International Conference on Mining Software Repositories, MSR 2018, Gothenburg, Sweden, May 28-29, 2018, page 118. ACM, 2018.

 [4] Steve Easterbrook, Janice Singer, Margaret-Anne Storey, and Daniela Damian. Selecting empirical methods for software engineering research. In Guide to advanced empirical software engineering, pages 285–311. Springer, 2008.

 [5] James Howison and Kevin Crowston. The perils and pitfalls of mining SourceForge. In Proceedings of the International Workshop on Mining Software Repositories (MSR 2004). Citeseer, 2004.

 [6] Eirini Kalliamvakou, Georgios Gousios, Kelly Blincoe, Leif Singer, Daniel M German, and Daniela Damian. The promises and perils of mining GitHub. In Proceedings of the 11th working conference on mining software repositories, pages 92–101. ACM, 2014.

 [7] Carmine Vassallo, Sebastiano Panichella, Fabio Palomba, Sebastian Proksch, Andy Zaidman, and Harald C Gall. Context is king: The developer perspective on the usage of static analysis tools. In 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 38–49. IEEE, 2018.

 [8] Yunwen Ye and Gerhard Fischer. Reuse-conducive development environments. Automated Software Engineering, 12(2):199–235, 2005.

 [9] Kai Tian, Meghan Revelle, and Denys Poshyvanyk. Using latent Dirichlet allocation for automatic categorization of software. In 6th IEEE International Working Conference on Mining Software Repositories, 2009. MSR’09., pages 163–166. IEEE, 2009.

[10] Stefan Haefliger, Georg Von Krogh, and Sebastian Spaeth. Code reuse in open source software. Management Science, 54(1):180–193, 2008.

[11] Tom Mens, Bram Adams, and Josianne Marsan. Towards an interdisciplinary, socio-technical analysis of software ecosystem health. arXiv preprint arXiv:1711.04532, 2017.

[12] Maëlick Claes. Applying biological evolution to software ecosystems a case study with Gnome.

[13] Yunior Pacheco, Jonas De Bleser, Tim Molderez, Dario Di Nucci, Wolfgang De Meuter, and Coen De Roover. Mining extension point patterns in Scala. In BENEVOL, pages 16–20, 2018.

[14] José Javier Merchante and Gregorio Robles. From Python to pythonic: Searching for Python idioms in GitHub.

[15] Ward Muylaert and Coen De Roover. Untangling source code changes using program slicing. In BENEVOL, pages 36–38, 2017.

[16] Jie Tan, Mircea Lungu, and Paris Avgeriou. Towards studying the evolution of technical debt in the Python projects from the Apache software ecosystem. In BENEVOL, pages 43–45, 2018.

[17] Zeeger Lubsen, Andy Zaidman, and Martin Pinzger. Using association rules to study the co-evolution of production & test code. In Mining Software Repositories, 2009. MSR’09. 6th IEEE International Working Conference on, pages 151–154. IEEE, 2009.

[18] Christian Rodríguez-Bustos and Jairo Aponte. How distributed version control systems impact open source software projects. In Mining Software Repositories (MSR), 2012 9th IEEE Working Conference on, pages 36–39. IEEE, 2012.

[19] Eleni Constantinou, Alexandre Decan, and Tom Mens. Breaking the borders: an investigation of cross-ecosystem software packages. arXiv preprint arXiv:1812.04868, 2018.

[20] Marco Mori and Anthony Cleve. A framework to support the development and evolution of self-adaptive data-intensive systems. In 11th edition of the BElgian-NEtherlands software eVOLution symposium (BENEVOL 2012), 01 2012.

[21] Maëlick Claes. Applying biological evolution to software ecosystems a case study with Gnome. In 11th edition of the BElgian-NEtherlands software eVOLution symposium (BENEVOL 2012), 01 2012.

[22] Davy Landman, Alexander Serebrenik, and Jurgen Vinju. The relationship between CC and SLOC: a preliminary analysis on its evolution. In Benevol 2014 (Seminar on Software Evolution in Belgium and the Netherlands, Amsterdam, The Netherlands, November 27-28, 2014), pages 29–30. Centrum voor Wiskunde en Informatica, 2014.

[23] Önder Babur, Loek Cleophas, and Mark van den Brand. Metamodel clone detection with SAMOS. BENEVOL, 2018.

[24] Davy Landman, Alexander Serebrenik, and Jurgen Vinju. Control flow in the wild a first look at 13k Java projects. BENEVOL 2013, page 35, 2013.

[25] Maëlick Claes, Tom Mens, and Philippe Grosjean. Towards an empirical analysis of the maintainability of CRAN packages. BENEVOL 2013, page 42.

[26] Elvan Kula, Ayushi Rastogi, Hennie Huijgens, and Arie van Deursen. Characterizing rapid releases in a large banking company: A case study. In BENEVOL, pages 56–60, 2018.

[27] Tim Molderez, Coen De Roover, and Wolfgang De Meuter. Towards a domain-specific language for automated network management. In BENEVOL, pages 39–43, 2017.

[28] Cosmin Tomozei, Iulian Furdu, and Simona-Elena Vârlan. GIS SDKs dynamics echoed by social requirements transformations. In BENEVOL, pages 22–25, 2017.

[29] Gururaj Maddodi and Slinger Jansen. Responsive software architecture patterns for workload variations: A case-study in a CQRS-based enterprise application. In BENEVOL, page 30, 2017.

