=Paper=
{{Paper
|id=Vol-2047/BENEVOL_2017_paper_2
|storemode=property
|title=Towards an Interdisciplinary, Socio-technical Analysis of Software Ecosystems Health
|pdfUrl=https://ceur-ws.org/Vol-2047/BENEVOL_2017_paper_2.pdf
|volume=Vol-2047
|authors=Tom Mens,Bram Adams,Josianne Marsan
|dblpUrl=https://dblp.org/rec/conf/benevol/MensAM17
}}
==Towards an Interdisciplinary, Socio-technical Analysis of Software Ecosystems Health==
Towards an interdisciplinary, socio-technical analysis of software ecosystem health

Tom Mens, Software Engineering Lab, University of Mons, Mons, Belgium, tom.mens@umons.ac.be

Bram Adams, MCIS Lab, Polytechnique Montreal, Montréal, Canada, bram.adams@polymtl.ca

Josianne Marsan, Faculty of Business Administration, Laval University, Québec, Canada, josianne.marsan@sio.ulaval.ca

Abstract—This extended abstract presents the research goals and preliminary research results of the interdisciplinary research project SECOHealth, an ongoing collaboration between research teams of Polytechnique Montreal (Canada), the University of Mons (Belgium) and Laval University (Canada). SECOHealth aims to contribute to research and practice in software engineering by delivering a validated interdisciplinary scientific methodology and a catalog of guidelines and recommendation tools for improving software ecosystem health.

Index Terms—software, ecosystem, evolution, health, recommendation, prediction, survival, sustainability, resilience, socio-technical

I. INTRODUCTION

The two-year interuniversity SECOHealth project¹ started on October 1, 2017. The three authors of this extended abstract are its Principal Investigators. SECOHealth aims to contribute to research and practice in software engineering by delivering an interdisciplinary scientific methodology and a catalog of guidelines and recommendation tools for improving software ecosystem (SECO) health. These will enable key ecosystem actors to better monitor and control SECO health and equip them with corrective and preventive measures to ensure their SECO’s survival and sustainability. The interdisciplinary methodology used in our project will also guide other researchers in interdisciplinary projects involving open source communities or SECOs.

SECOs are large collections of interacting and interdependent software projects that share a common technological platform and that are maintained by large online communities of contributors. They pervade every aspect of human life including entertainment, health, economy, industry, politics, education and science. Commercial SECOs such as mobile app stores or the Internet-of-Things have taken over our daily lives by storm, to the extent that the functioning of our modern digitally-enabled society would be severely impacted if SECOs were to degrade in stability or even cease to exist.

Yet, despite the strategic importance of ensuring the overall well-being of SECOs, their health is still poorly understood, as SECOs are subject to constant evolution, due to an increasing pace of events (e.g., technological or environmental changes). What makes this especially challenging is that SECOs do not have a centralised management for overseeing the ecosystem’s health and survival. Instead, maintainers of SECO components need to understand and make decisions about the socio-technical impact of important events affecting SECO health and recommend corrective actions (e.g., improving SECO quality and its attractiveness to key actors). Unfortunately, there is little support and there are few best practices to enable SECO maintainers to perform these tasks.

II. ABOUT SOFTWARE ECOSYSTEM HEALTH

From a biological point of view, health can be defined as “the extent to which an organism’s vital systems are performing normally at any given time” [1]. This definition can be transposed to SECO health [2] by considering a SECO as a living organism, whose constituent software projects are the vital technical systems that need to perform “normally” in order to have a healthy ecosystem, and whose community is healthy if all community members are performing normally.

SECO health problems can be very diverse in nature, and can have many different causes. For example, in March 2016, the npm ecosystem experienced the problem of a package getting unpublished, causing several thousands of transitively dependent packages to break. The underlying cause was a typical case of rage quitting, where the owner of the package decided to remove all of their packages². Another documented example of rage quitting relates to “toxic” communication styles of open source communities, such as the one of the Linux Kernel community, causing a prominent developer to quit³; or the case where a central contributor to the bug handling community of Gentoo Linux unexpectedly left, causing a major disruption in the community’s activity [3]. From a more technical point of view, typical examples of health problems are packages containing bugs or security vulnerabilities, causing potential problems in packages depending on them. The impact of the problem grows as the number of transitive dependencies on a problematic package grows.

¹ www.secohealth.org

² blog.npmjs.org/post/141577284765/kik-left-pad-and-npm

³ https://slashdot.org/story/15/10/05/2031247/linux-kernel-dev-sarah-sharp-quits-citing-brutal-communications-style

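The closing observation of Section II, that the impact of a problematic package grows with the number of packages transitively depending on it, can be illustrated with a small sketch. This is not the tooling used in the project; it is a minimal breadth-first traversal over reversed dependency edges, and the function name `transitive_dependents` as well as the toy package names are assumptions made purely for illustration.

```python
from collections import defaultdict, deque


def transitive_dependents(dependencies, package):
    """Return every package that directly or transitively depends on `package`.

    `dependencies` maps a package name to the list of packages it depends on.
    """
    # Invert the edges: for each package, record which packages depend on it.
    reverse = defaultdict(set)
    for pkg, deps in dependencies.items():
        for dep in deps:
            reverse[dep].add(pkg)

    # Breadth-first traversal over the reverse dependencies.
    affected = set()
    queue = deque([package])
    while queue:
        current = queue.popleft()
        for dependent in reverse[current]:
            # Skip already visited packages and the start package (cycles).
            if dependent not in affected and dependent != package:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical toy ecosystem; the package names are invented for illustration.
toy_ecosystem = {
    "pad": [],
    "fmt": ["pad"],
    "cli": ["fmt"],
    "web": ["cli", "log"],
    "log": [],
}

print(sorted(transitive_dependents(toy_ecosystem, "pad")))
# → ['cli', 'fmt', 'web']
```

In this toy graph, removing or breaking `pad` would affect three packages even though only `fmt` depends on it directly, which is the propagation effect described above.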
III. PROJECT GOALS

SECOHealth aims at providing a scientific methodology and disciplined set of techniques to understand and control the health of software ecosystems. We adopt a socio-technical perspective since the technical and social layers of SECOs are strongly interwoven [4]. Our project aims to:

• define a conceptual model of SECO health;

• explore analogies from other scientific disciplines such as ecology and toxicology;

• determine indicators capable of measuring the different aspects of SECO health;

• determine events that affect the health of a SECO and its constituent projects;

• empirically validate these health indicators and events, both qualitatively and quantitatively;

• build and evaluate models to predict the impact of a given event on SECO health;

• build and evaluate a socio-technical dependency model to understand how health problems propagate throughout a SECO;

• propose a catalog of guidelines and recommendations for supporting SECO health.

Joining our complementary strengths in theory-driven and data-driven investigation, we will follow a mixed-methods approach [5], combining bottom-up data mining and top-down interview/survey-based research, as well as combining state-of-the-art quantitative and qualitative analysis techniques emanating from different scientific disciplines.

Under the approval of Research Ethics Committees from the participating universities, we conducted face-to-face interviews at the European Open Source Summit of the Linux Foundation (Prague, October 2017) with 17 SECO practitioners. The interviews followed the guidelines of Patton [6], with the goal of understanding what SECO health means for practitioners, which indicators they use themselves or could use given the right data, and which events have impacted SECO health in the past.

We will operationalise the SECO health indicators into concrete metrics, and perform SECO data mining to measure and evaluate the identified health indicators. We will build and empirically validate prediction models of how SECOs will react to events, by relying on historical data from version control systems, code review and bug repositories, mailing lists and developer fora.

Based on the recent research on SECO and community health [2], [7]–[9], we will consider three high-level characteristics of health: technical (i.e., concerning technical software artefacts), social (i.e., concerning contributor communities and the relations between their members) and phenomenological (i.e., concerning external/internal events and their manifestation). Technical health characteristics include traditional software quality metrics, software dependency structure, software growth rate, size and frequency of software updates, bug fixes, security vulnerabilities, obsolete or deprecated components, and so on. Social characteristics include responsiveness of contributors (e.g., mean time to respond to a question, mean time to fix a bug), social network structure and its evolution (e.g., turnover rate), contributor activity and productivity, and the quality of interaction between all human stakeholders. Phenomenological health characteristics include the amount of company involvement (i.e., paid contributors), market share, presence of competing products, and so on.

With respect to the social health problem of developer turnover, we conducted an empirical study on the npm and RubyGems ecosystems. Using the statistical technique of survival analysis we identified which social or technical factors in a SECO coincide with a higher or lower probability of developer abandonment [10].

Concerning technical health, we carried out a quantitative empirical analysis of the evolution of package dependency networks for seven package distributions of varying size and age [11]. We proposed metrics to capture the growth, changeability, reusability and fragility of these dependency networks. We observed that the dependency networks tend to grow over time, while a minority of packages are responsible for most of the package updates. The majority of packages depend on other packages, but only a small proportion of packages accounts for most of the reverse dependencies. We observed a high proportion of “fragile” packages due to a high and increasing number of transitive dependencies.

IV. INTERDISCIPLINARY RESEARCH

SECOHealth will view SECOs as ecological ecosystems comprised of a population of living organisms (interdependent software projects and their interacting communities of contributors), and will produce health indicators and prediction models by drawing inspiration from well-known principles and theories from other disciplines, such as the notion of biodiversity in ecology [12], or the notion of toxicity in toxicology [13], [14].

SECO health needs to be studied at different levels of granularity since the health of the SECO as a whole depends on the health of its social and technical components, and vice versa. At a micro-level of analysis (i.e., within and between individual projects of a SECO), we will explore the impact of toxicity, arguing that certain behaviour and interactions in the SECO community can be toxic to not only the individual software projects, but even to the SECO as a whole, and hence can jeopardize its health and sustainability. Examples of possible toxic social behaviour may consist of deviant or aggressive behaviours, for example in the form of flame wars as a reaction to bad quality code contributions [13]. One promising way to assess such toxicity is by measuring social debt [15], i.e., social interactions between SECO members that have been strained due to time pressure or lack of attention, and at some point might blow up and cause friction within the community of developers involved in a software project.

At a macro-level, we will study how health problems of SECO components evolve and propagate to others. Among other things, we will test the principle of biodiversity by analysing to what extent the SECO’s resilience decreases when its
diversity decreases. By resilience we refer to the ecosystem’s capacity to resist disturbances, or to recover quickly from a perturbation. Diversity will be analysed according to a variety of factors (e.g., geographical, activity-related, time-related, gender-related, artefact-related). Some of these factors have been shown to have positive effects on software health. For example, gender diversity has been shown to have a positive effect on the productivity of GitHub teams [16].

V. RELATED WORK AND RELATED PROJECTS

SECOHealth can be considered as a successor project of the ECOS project (Ecological Studies of Open Source Software Ecosystems) [17] that was carried out from 2012 till 2017. As part of that project, we carried out multiple empirical studies of the evolution of open source software ecosystems (e.g., Gnome [18], CRAN [19]–[21], Debian [22], npm and RubyGems [23]–[25]).

The SECOHealth members are actively involved in the CHAOSS (Community Health Analytics of Open Source Software) initiative of the Linux Foundation, in particular its Metrics Committee. Its goal is to define, implement and assess metrics for open source community health and sustainability. While CHAOSS’ initial focus is on metrics at the level of individual software projects, SECOHealth will focus on ecosystem-wide health metrics. To operationalise our metrics, we aim to use Bitergia’s open source GrimoireLab tool chain for software development analytics⁴, which is one of the tool suites actively supported by CHAOSS.

ACKNOWLEDGMENT

This research is carried out in the context of FRQ-FNRS collaborative research project R.60.04.18.F “SECOHealth” and FNRS Research Credit J.0023.16 “Analysis of Software Project Survival”. We thank our external project collaborators Kevin Carillo (Toulouse University, France), Damian Tamburri (Politecnico di Milano, Italy), Gregorio Robles (Universidad Rey Juan Carlos, Spain) and Bogdan Negoita (HEC Montréal, Canada) for actively participating in this project.

REFERENCES

[1] X. Wang and S. Lantzy, “A systematic examination of member turnover and online community health,” in Int’l Conf. Information Systems (ICIS), 2011.

[2] S. Jansen, “Measuring the health of open source software ecosystems: Beyond the scope of project health,” Information and Software Technology, vol. 56, no. 11, pp. 1508–1519, 2014, special issue on Software Ecosystems.

[3] M. S. Zanetti, I. Scholtes, C. J. Tessone, and F. Schweitzer, “The rise and fall of a central contributor: Dynamics of social organization and performance in the Gentoo community,” in International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE), 2013, pp. 49–56.

[4] T. Mens, “An ecosystemic and socio-technical view on software maintenance and evolution,” in Int’l Conf. Software Maintenance and Evolution (ICSME), 2016, pp. 1–8.

[5] R. B. Johnson and A. J. Onwuegbuzie, “Mixed methods research: A research paradigm whose time has come,” Educational Researcher, vol. 33, no. 7, pp. 14–26, 2004.

[6] M. Q. Patton, Qualitative research and evaluation methods: integrating theory and practice. SAGE Publications, 2015.

[7] F. Fotrousi, S. A. Fricker, M. Fiedler, and F. Le-Gall, “KPIs for software ecosystems: A systematic mapping study,” in Int’l Conf. Software Business (ICSOB). Springer, 2014, pp. 194–211.

[8] J. Y. Monteith, J. D. McGregor, and J. E. Ingram, “Proposed metrics on ecosystem health,” in Int’l Workshop on Software-defined Ecosystems (BigSystem). ACM, 2014, pp. 33–36.

[9] K. Manikas and D. Kontogiorgos, “Characterizing software activity: The influence of software to ecosystem health,” in Int’l Workshop on Software Ecosystems (IWSECO), European Conf. Software Architecture Workshops, 2015.

[10] E. Constantinou and T. Mens, “An empirical comparison of developer retention in the RubyGems and npm software ecosystems,” Innovations in Systems and Software Engineering, vol. 13, no. 2-3, pp. 101–115, 2017.

[11] A. Decan, T. Mens, and P. Grosjean, “An empirical comparison of dependency network evolution in seven software packaging ecosystems,” Empirical Software Engineering, 2018.

[12] T. Mens, M. Claes, P. Grosjean, and A. Serebrenik, Studying Evolving Software Ecosystems based on Ecological Models. Springer, 2014, pp. 297–326.

[13] K. Carillo and J. Marsan, “‘The dose makes the poison’ - exploring the toxicity phenomenon in online communities,” in Int’l Conf. Information Systems (ICIS), 2016.

[14] K. Carillo, J. Marsan, and B. Negoita, “Exploring the biosphere – towards a conceptualization of peer production communities as living organisms,” in SIGOPEN Pre-ICIS 2017 Workshop on Open Phenomena, Association for Information Systems (AIS), Seoul, South Korea, 2017.

[15] D. A. Tamburri, P. Kruchten, P. Lago, and H. v. Vliet, “Social debt in software engineering: insights from industry,” Journal of Internet Services and Applications, vol. 6, no. 1, May 2015.

[16] B. Vasilescu, D. Posnett, B. Ray, M. G. J. van den Brand, A. Serebrenik, P. T. Devanbu, and V. Filkov, “Gender and tenure diversity in GitHub teams,” in Int’l Conf. Human Factors in Computing Systems (CHI). ACM, 2015, pp. 3789–3798.

[17] T. Mens, M. Claes, and P. Grosjean, “ECOS: Ecological studies of open source software ecosystems,” in IEEE Int’l Conf. Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE), 2014, pp. 403–406.

[18] B. Vasilescu, A. Serebrenik, M. Goeminne, and T. Mens, “On the variation and specialisation of workload: A case study of the Gnome ecosystem community,” Empirical Software Engineering, vol. 19, pp. 955–1008, 2014.

[19] D. M. Germán, B. Adams, and A. E. Hassan, “The evolution of the R software ecosystem,” 2013, pp. 243–252.

[20] M. Claes, T. Mens, and P. Grosjean, “On the maintainability of CRAN packages,” in IEEE Int’l Conf. Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE), 2014, pp. 308–312.

[21] A. Decan, T. Mens, M. Claes, and P. Grosjean, “When GitHub meets CRAN: An analysis of inter-repository package dependency problems,” in Int’l Conf. Software Analysis, Evolution, and Reengineering (SANER), 2016.

[22] M. Claes, T. Mens, R. D. Cosmo, and J. Vouillon, “A historical analysis of Debian package incompatibilities,” in Working Conf. Mining Software Repositories (MSR), 2015, pp. 212–223.

[23] E. Constantinou and T. Mens, “Social and technical evolution of software ecosystems: A case study of Rails,” in Int’l Workshop on Software Ecosystems (IWSECO), European Conf. Software Architecture Workshops, 2016.

[24] A. Decan, T. Mens, and M. Claes, “An empirical comparison of dependency issues in OSS packaging ecosystems,” in Int’l Conf. Software Analysis, Evolution and Reengineering (SANER), 2017, pp. 2–12.

[25] E. Constantinou and T. Mens, “Socio-technical evolution of the Ruby ecosystem in GitHub,” in Int’l Conf. Software Analysis, Evolution and Reengineering (SANER), 2017, pp. 34–44.

⁴ http://grimoirelab.github.io