Application Domains in the Research Papers at BENEVOL: A Retrospective

Andrea Capiluppi
Dept of Computer Science
Brunel University London, United Kingdom

Nemitari Ajienka
Dept of Computer Science
Edge Hill University, United Kingdom

Bilyaminu Auwal Romo
Dept of Engineering and Digital Technologies
Coventry University, United Kingdom

Abstract

Research on empirical software engineering has increasingly used the data that is made available in online repositories, specifically Free/Libre/Open Source Software projects (FLOSS). The latest trend among researchers is to gather “as much data as possible” to (i) prevent bias in the representation of a small sample, (ii) work with a sample as close as possible to the population itself, and (iii) showcase the performance of existing or new tools in treating vast amounts of data.

The effects of harvesting enormous amounts of data have been only marginally considered so far: data could be corrupted; repositories could be forked; and developer identities could be duplicated. In this paper we posit that there is a fundamental flaw in harvesting large amounts of data, and in generalising the conclusions: the application domain, or context, of the analysed systems must be the primary factor for the cluster sampling of FLOSS projects.

This paper presents two contributions: first, we analyse a collection of 100 BENEVOL papers, showing whether (and how much) FLOSS data has been harvested, and how many times the authors flagged an issue in their different application domains. Second, we discuss the implications of using ‘application domain’ as the clustering factor in the sampling of FLOSS data, and the generalisations within and outside the clusters.

Index terms: FLOSS, application domains, BENEVOL papers

Copyright © by the paper’s authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
In: D. Di Nucci, C. De Roover (eds.): Proceedings of the 18th Belgium-Netherlands Software Evolution Workshop, Brussels, Belgium, 28-11-2019, published at http://ceur-ws.org

1 Introduction

The use of open, available data has been a welcome accelerator in the software engineering research field. Data on the processes and products available via an Open Source approach has led to an increasingly large number of workshops, conferences, papers and research attempts to describe the phenomenon. Researchers gathered initially in 2001 around the Open Source Software (OSS) workshops, held annually in co-location with the ICSE series of conferences. Before the OSS workshop turned into the OSS conference in 2005, the BENEVOL community started to group together researchers from the Software Evolution domain. Its initial focus was ‘(...) to bring researchers to identify and discuss important principles, problems, techniques and results related to software evolution research and practice’¹.

¹ https://smartcare.be/events/benevol-04-workshop

While the goal of a few BENEVOL papers has been to achieve the generality of the results [1], the domain, context and uniqueness of a software system have not been considered very often by empirical software engineering research. As in the example reported in [2], the extensive study of all JSON parsers available would find similarities between them or common patterns. That type of study would focus on one particular language (JSON), one specific domain (parsers) and inevitably draw limited conclusions. On the other hand, considering the “parsers” domain (but without focusing on one single language) would show the common characteristics of developing that type of system, irrespective of the language.

The underlying vision of this paper is to open a proper debate on the importance of context for any software system, and the uniqueness of its application domain. This position paper stems from the work of several prominent researchers who called on the community to ‘go deeper, not wider’ (Michael Godfrey at MSR 2017) and for ‘minding the mine, mining the mind’ [3]. We posit that past empirical investigations using FLOSS systems have been mostly blind to these aspects (i.e., context and domain), establishing similarities between vastly different systems if they shared a common pattern in one measured attribute. Using an extreme example, one could establish a similarity between the coupling of a ‘hammock’ and a ‘bridge’ due to the fact that both are held at the sides.

The purpose of this paper is to share some findings about a selection of papers discussed during the last few years of BENEVOL workshops. The focus is specifically on BENEVOL papers that have used FLOSS data. The context of our analysis is the diversity of FLOSS projects under study, and how that was reflected by researchers in their findings. Some 100 papers are analysed in terms of whether FLOSS projects are used, how many, and whether considerations of application domains have been used to inform the sampling of FLOSS projects, or the validity of the conclusions. We assume that domains are relevant as a fundamental construct for any empirical software engineering research [4].

2 Related Work

The vast literature on FLOSS systems of the last 10 years has been made possible in part by a series of guidelines on how to perform quantitative, empirical analysis on FLOSS processes and products. When SourceForge² was considered the de-facto FLOSS forge, a well-received research paper shared more than one insight on the most common mistakes to avoid when mining data and results from the projects hosted there [5]. Among other more technical issues of mining this specific forge, this paper actively warned against an inaccurate ‘screening’ of projects into samples: reducing a population to, say, ‘FLOSS projects with more than 7 developers’ would inevitably reduce the variables for the analysis: the ‘number of developers’ variable could no longer be used as a dependent or independent variable for any model or analysis.

² https://sourceforge.net/

The acknowledgement of GitHub as the newly established central focus for FLOSS development generated a similar need for shared guidelines to avoid common mining mistakes [6]. Unlike [5], the 2014 paper mostly focused on the technical aspects of GitHub, and how the collected metrics could skew the results, due to the inner workings of the Git toolset, and the different approach to FLOSS development observed on GitHub (forking, non-software development, inactivity of projects). Neither [5] nor [6] warned about the variability of FLOSS projects, the importance of their context, or the uniqueness of their domains.

Outside of the FLOSS literature, the diversity and context of software systems have received some attention in the past [4, 7]. The phrase “large scale” has been frequently used in empirical software engineering research to denote the magnitude of the analyzed case study or studied software sample. Nevertheless, Nagappan et al. argue that analyzing a high number of projects is not always necessary [2]. What is even more important is the selection of the projects studied.

Interesting patterns valuable to researchers and practitioners are often identified in domain-based analysis of software projects. Results from one domain might not be applicable in another. As such, it is important for results to be representative.

Software categorization or domain clustering has gained importance over the years. For example, the knowledge of software trends in a particular domain can assist developers in the search for domain-specific reusable components [8]. Tian et al. [9] proposed a technique based on Latent Dirichlet Allocation for automatic software categorization in open-source software repositories.

According to Haefliger et al. [10], “domain analyses, documentation, and quality standards enhance the ability to reuse software components”. However, our survey of past BENEVOL papers that have analyzed OSS projects demonstrates that software domains have not been considered in most of the past software engineering studies.

3 A Survey of BENEVOL Papers: 2012 to 2018

In order to show how FLOSS data has been used and analysed by the BENEVOL community, we report here an investigation of the research papers that appeared in the last 5 years of the BENEVOL event. In total, 101 papers have been considered in this study: we share the raw data in the spreadsheet at https://tinyurl.com/y69wkadr. Each paper was read by one of the co-authors, and summarised along the following points:

• Use of FLOSS systems (yes/no): first, we checked whether FLOSS projects are used in the paper at all. This served as an indicator of the pervasiveness of FLOSS projects in the literature produced by the BENEVOL community.

• Number of FLOSS systems used: second, we trawled through the paper, annotating where the authors mentioned how many FLOSS systems were used. In the case of full papers, the abstract, introduction, methodology and conclusion were read for that purpose.

• Analysis of application domains: third, we considered the methodology, results and conclusion of each paper, along with the threats to validity, looking for considerations of application domains. We checked if the authors considered this attribute in the sampling of FLOSS projects, whether they limited their results against this axis, or whether it was considered a specific threat to validity. This attribute was coded as either {yes, no}.
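
As an illustration only, the three attributes coded for each paper can be aggregated per BENEVOL edition along the lines of the sketch below. The record layout, the field names and the sample rows are assumptions made for this sketch: they do not reproduce the shared spreadsheet, and the domain ratio is computed here over all papers of an edition, which is one possible reading of the coding scheme.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class CodedPaper:
    """One row of a coding sheet (hypothetical field names)."""
    year: int
    uses_floss: bool                 # attribute 1: yes/no
    n_floss_systems: Optional[int]   # attribute 2: None when not stated
    considers_domains: bool          # attribute 3: yes/no


def summarise(papers):
    """Aggregate the three coded attributes per edition (year)."""
    by_year = defaultdict(list)
    for paper in papers:
        by_year[paper.year].append(paper)

    summary = {}
    for year in sorted(by_year):
        rows = by_year[year]
        floss = [p for p in rows if p.uses_floss]
        # Only papers that state a sample size contribute to the median.
        sizes = [p.n_floss_systems for p in floss
                 if p.n_floss_systems is not None]
        summary[year] = {
            "papers": len(rows),
            "floss_ratio": len(floss) / len(rows),
            "median_systems": median(sizes) if sizes else None,
            "domain_ratio": sum(p.considers_domains for p in rows) / len(rows),
        }
    return summary


# Made-up rows for demonstration, NOT taken from the authors' data.
sample = [
    CodedPaper(2012, True, 1, True),
    CodedPaper(2012, False, None, False),
    CodedPaper(2018, True, 1000, False),
    CodedPaper(2018, True, 30, False),
]
result = summarise(sample)
```

Keeping the sample size optional reflects the fact that some papers do not declare how many systems they analysed, so any per-year median built this way is computed only over the papers that do.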

The contributions to the 2016 edition of BENEVOL are not available online, so they had to be excluded from our analysis. The spreadsheet with the categorisation of the papers has been made available for inspection under the following link: https://tinyurl.com/y69wkadr.

3.1 BENEVOL use of FLOSS Systems

In this section we provide the first point of our analysis: ‘how many BENEVOL papers have used FLOSS systems in their analyses?’. As visible in the two plots of Figure 1, researchers (and accepted BENEVOL papers) have steadily used FLOSS systems for their papers. The first plot shows the absolute numbers of accepted BENEVOL submissions that use one or more FLOSS projects.

The bottom plot of Figure 1 shows the ratio of FLOSS and non-FLOSS papers in the BENEVOL sample of papers. It is getting increasingly common to use one or more commercial software systems, or a combination of FLOSS and non-FLOSS projects.

Figure 1: Papers using FLOSS (above) and use of FLOSS and non-FLOSS projects (below) in BENEVOL (between 2012 and 2018)

3.2 Number of FLOSS systems used in BENEVOL

In this section we report on the number of FLOSS systems evaluated by BENEVOL papers. For this purpose, we analysed the methodology description, or the empirical approach, of each paper to determine how many FLOSS systems were reported in the study. Figure 2 displays the cumulative number of FLOSS systems used in BENEVOL papers, per year. The median number of systems has increased from one analysed OSS system in 2012 to 1,127 systems in 2018.

Figure 2: Cumulative number of FLOSS projects per year

The exponential growth in the number of FLOSS systems used by BENEVOL papers has been accelerated by many factors: (i) availability of open forges (FreshMeat, SourceForge, Savannah, Apache FSF, GitHub and many others); (ii) common, shared toolsets to perform the analyses; (iii) guidelines on how to effectively use forges.

Below we give a summary of findings to assess the trends observed in the number of FLOSS systems analysed by the BENEVOL papers.

Growth of sample sizes

The trend that we observed throughout the successive years of the BENEVOL contributions can fundamentally be summarised as ‘the more the better’. Authors have started to include larger and larger FLOSS samples in their papers. We can assume that this pattern has been followed in order to achieve the generality of a paper’s findings. At the last edition of available BENEVOL contributions (BENEVOL 2018), over one million FLOSS systems were considered for investigation, jointly by the accepted papers.

Uncertainty on sample sizes

Several BENEVOL papers use ecosystems [11], or umbrella projects [12], as their case studies, whereas other papers either take a subset of those super-projects, or explicitly declare the number of subprojects (e.g., Scala [13] or Python [14] projects) that they analysed. This means that our final figures are mostly lower bounds of the actual number of FLOSS systems being used by the BENEVOL community.

Ecosystems vs time of analysis

Several BENEVOL papers have used umbrella projects (for example, Gnome). In most cases we considered them as single FLOSS systems: depending on the time of the analysis, these larger projects can contain a variable number of sub-projects. This makes it difficult to define the status of the super-project, in terms of the number of its sub-projects, as well as their domains. This also makes it difficult to replicate those studies, as well as their results and conclusions.

Sampling and Pruning

Throughout the editions of BENEVOL, sampling of a few systems (see for instance [15] or [16]) has given way to whole-forge analyses. Also, there seems to be a general view that ‘pruning’ a sample is a good idea for removing outliers, or for promoting quality. This has an effect on the sample studied, and on how well it represents the population as a whole.

3.3 Application domains and FLOSS projects

The third analysis was based on the application domains of the systems considered in the empirical study. For all the papers (not only for those using FLOSS projects), we tried to establish whether the authors considered the results, findings or discussion as constrained by the type of system (e.g., its domain). This included checking how the threats to external validity (if any were addressed) limited the conclusions to the domain(s) under investigation.

We grouped the papers into two categories (and plotted them accordingly per year):

1. papers that directly considered application domains as drivers in the variability of the results (stack “YES” in Figure 3);

2. papers that did not consider application domains as drivers (stack “NO” in Figure 3).

The results of this analysis are shown in Figure 3: a ratio (%) is used to separate the papers in the two categories. It is clear from the visualisation that BENEVOL papers do not generally acknowledge the variability of results as driven by the domains of the systems involved. Earlier papers (especially from the 2012 batch) have a good coverage of domains in the evaluation of the results, but this is not reflected in the later editions of BENEVOL.

Figure 3: Application domains in papers using FLOSS projects

As visible in Figure 3, the majority of findings on FLOSS, as reported by BENEVOL papers, do not mention application domains. In some cases, researchers have acknowledged the variability of the results [17, 18], and hinted that other factors could play a role in such variability. We considered it a “limited” acknowledgement of the relevance of the application domain when authors mentioned the diversity of the systems under study.

4 BENEVOL: FLOSS and Domains

The bird's-eye view on the type of BENEVOL contributions (Sections 3.1, 3.2 and 3.3 above) reveals some interesting trends when dealing with FLOSS projects. Below we discuss in more detail whether FLOSS systems were analysed (“YES” or “NO”), and whether domains were considered in the analysis (“YES” or “NO”).

4.1 FLOSS: YES, Domains: YES

So far in the BENEVOL series, few papers explicitly addressed the importance of domains when analysing systems, or when discussing findings. An interesting perspective is given in [19], since it considers a very specific type of system, the ‘cross-system packages’. These systems are likely to show similar characteristics since they are supposed to act as vectors to and from the overarching system.

In [20], the authors draw on application domains and stress the importance of domain analysis when creating a theoretical and practical framework that supports the development and the evolution of adaptive data-intensive software systems for ubiquitous environments. They focus on data, and in particular on the problem of finding the most suitable portion of data that has to be provided by the application in the context of a ‘self-adaptive system’.

Likewise, in the 11th edition of BENEVOL (2012), [21] examined the impact and role of social media on software development. The authors argued that “social media is poised to bring about a paradigm shift in software engineering research”, particularly in the OSS community.

In the 2014 edition, only one BENEVOL study focusing on OSS projects implicitly highlighted the need to investigate projects from various domains [22]. The authors studied an OSS project called DrJava and implicitly mentioned domains, but did not investigate multiple projects clustered into several domains. According to the authors, “we chose an IDE since they contain elements of multiple domains. The IDE project was taken from the Qualitas Corpus and it consists of 3000 revisions since 2000 and the system grew from 30K SLOC in 2003 to 200K SLOC in 2013”.

We concluded that application domains are not well represented or studied in the papers that use FLOSS data.

4.2 FLOSS: YES, Domains: NO

The vast majority of BENEVOL contributions based on FLOSS systems do not consider domains as one of the factors to take into consideration. An interesting example of this approach is given in [1], where the authors posit that ‘... (to) gather as much as possible should be the aim of empirical software engineering’.

More generally, the approach of researchers is to focus on specific languages or source code models (see for instance the paper in [23], focused on all available meta-models from GitHub), hence representing convenience sampling. For example, in the study on control flow, Landman et al. [24] focused on the Sourcerer Corpus, which contains 18K (13K non-empty) Java projects. In an empirical analysis of the maintainability of CRAN packages, Claes et al. [25] presented early results on analysing the dependencies of the CRAN R packages repository.

We concluded that most of the papers studied from the BENEVOL series do not consider the application domains as an important factor for software analysis or evolution.

4.3 FLOSS: NO, Domains: YES

A few of the papers that we analysed are not based on FLOSS systems, but more generally on commercial or in-house software. In a few cases, we observed that the authors actually considered the limitations of their case studies to the one domain that was investigated.

As examples, we noted a paper based on a banking system [26], and one focused on the specific features of a home-automation system [27]. Both these papers clearly acknowledged the limitations given by the chosen application domains that their systems are based on. In other cases, the authors specifically focused on one domain (for example, GIS systems [28], or the larger business domain [29]).

In general, the BENEVOL papers using non-OSS software as their case studies do not use the domains to aggregate results. Nonetheless, a few BENEVOL contributions have shown a clear pathway towards not generalising the findings to all domains.

5 Conclusion

This paper analysed how open source software has been used by the BENEVOL contributions between 2012 and 2018. We showed the increasing number of BENEVOL contributions that used FLOSS projects for their analyses.

Although the majority of contributions do not acknowledge the importance of domains when discussing the findings, there is an increasing number of papers that limit the results, or the data sampling, to specific domains. We believe that one of the major challenges for empirical software engineering is to better understand the role of domains, especially in the evolution of software systems. We propose that papers that empirically analyse software systems acknowledge this challenge in a ‘threat to domain validity’.

References

[1] Antoine Pietri and Stefano Zacchiroli. Towards universal software evolution analysis. In BENEVOL, pages 6–10, 2018.

[2] Meiyappan Nagappan, Thomas Zimmermann, and Christian Bird. Diversity in software engineering research. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, pages 466–476. ACM, 2013.

[3] A. J. Ko. Mining the mind, minding the mine: grand challenges in comprehension and mining. In Andy Zaidman, Yasutaka Kamei, and Emily Hill, editors, Proceedings of the 15th International Conference on Mining Software Repositories, MSR 2018, Gothenburg, Sweden, May 28-29, 2018, page 118. ACM, 2018.

[4] Steve Easterbrook, Janice Singer, Margaret-Anne Storey, and Daniela Damian. Selecting empirical methods for software engineering research. In Guide to advanced empirical software engineering, pages 285–311. Springer, 2008.

[5] James Howison and Kevin Crowston. The perils and pitfalls of mining SourceForge. In Proceedings of the International Workshop on Mining Software Repositories (MSR 2004). Citeseer, 2004.

[6] Eirini Kalliamvakou, Georgios Gousios, Kelly Blincoe, Leif Singer, Daniel M German, and Daniela Damian. The promises and perils of mining GitHub. In Proceedings of the 11th working conference on mining software repositories, pages 92–101. ACM, 2014.

[7] Carmine Vassallo, Sebastiano Panichella, Fabio Palomba, Sebastian Proksch, Andy Zaidman, and Harald C Gall. Context is king: The developer perspective on the usage of static analysis tools. In 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 38–49. IEEE, 2018.

[8] Yunwen Ye and Gerhard Fischer. Reuse-conducive development environments. Automated Software Engineering, 12(2):199–235, 2005.

[9] Kai Tian, Meghan Revelle, and Denys Poshyvanyk. Using latent Dirichlet allocation for automatic categorization of software. In 6th IEEE International Working Conference on Mining Software Repositories, 2009. MSR’09., pages 163–166. IEEE, 2009.

[10] Stefan Haefliger, Georg Von Krogh, and Sebastian Spaeth. Code reuse in open source software. Management Science, 54(1):180–193, 2008.

[11] Tom Mens, Bram Adams, and Josianne Marsan. Towards an interdisciplinary, socio-technical analysis of software ecosystem health. arXiv preprint arXiv:1711.04532, 2017.

[12] Maëlick Claes. Applying biological evolution to software ecosystems a case study with GNOME.

[13] Yunior Pacheco, Jonas De Bleser, Tim Molderez, Dario Di Nucci, Wolfgang De Meuter, and Coen De Roover. Mining extension point patterns in Scala. In BENEVOL, pages 16–20, 2018.

[14] José Javier Merchante and Gregorio Robles. From Python to pythonic: Searching for Python idioms in GitHub.

[15] Ward Muylaert and Coen De Roover. Untangling source code changes using program slicing. In BENEVOL, pages 36–38, 2017.

[16] Jie Tan, Mircea Lungu, and Paris Avgeriou. Towards studying the evolution of technical debt in the Python projects from the Apache software ecosystem. In BENEVOL, pages 43–45, 2018.

[17] Zeeger Lubsen, Andy Zaidman, and Martin Pinzger. Using association rules to study the co-evolution of production & test code. In Mining Software Repositories, 2009. MSR’09. 6th IEEE International Working Conference on, pages 151–154. IEEE, 2009.

[18] Christian Rodríguez-Bustos and Jairo Aponte. How distributed version control systems impact open source software projects. In Mining Software Repositories (MSR), 2012 9th IEEE Working Conference on, pages 36–39. IEEE, 2012.

[19] Eleni Constantinou, Alexandre Decan, and Tom Mens. Breaking the borders: an investigation of cross-ecosystem software packages. arXiv preprint arXiv:1812.04868, 2018.

[20] Marco Mori and Anthony Cleve. A framework to support the development and evolution of self-adaptive data-intensive systems. In 11th edition of the BElgian-NEtherlands software eVOLution symposium (BENEVOL 2012), 01 2012.

[21] Maëlick Claes. Applying biological evolution to software ecosystems a case study with GNOME. In 11th edition of the BElgian-NEtherlands software eVOLution symposium (BENEVOL 2012), 01 2012.

[22] Davy Landman, Alexander Serebrenik, and Jurgen Vinju. The relationship between CC and SLOC: a preliminary analysis on its evolution. In Benevol 2014 (Seminar on Software Evolution in Belgium and the Netherlands, Amsterdam, The Netherlands, November 27-28, 2014), pages 29–30. Centrum voor Wiskunde en Informatica, 2014.

[23] Önder Babur, Loek Cleophas, and Mark van den Brand. Metamodel clone detection with SAMOS. BENEVOL, 2018.

[24] Davy Landman, Alexander Serebrenik, and Jurgen Vinju. Control flow in the wild a first look at 13k Java projects. BENEVOL 2013, page 35, 2013.

[25] Maëlick Claes, Tom Mens, and Philippe Grosjean. Towards an empirical analysis of the maintainability of CRAN packages. BENEVOL 2013, page 42.

[26] Elvan Kula, Ayushi Rastogi, Hennie Huijgens, and Arie van Deursen. Characterizing rapid releases in a large banking company: A case study. In BENEVOL, pages 56–60, 2018.

[27] Tim Molderez, Coen De Roover, and Wolfgang De Meuter. Towards a domain-specific language for automated network management. In BENEVOL, pages 39–43, 2017.

[28] Cosmin Tomozei, Iulian Furdu, and Simona-Elena Vârlan. GIS SDKs dynamics echoed by social requirements transformations. In BENEVOL, pages 22–25, 2017.

[29] Gururaj Maddodi and Slinger Jansen. Responsive software architecture patterns for workload variations: A case-study in a CQRS-based enterprise application. In BENEVOL, page 30, 2017.