=Paper= {{Paper |id=Vol-2047/BENEVOL_2017_paper_2 |storemode=property |title=Towards an Interdisciplinary, Socio-technical Analysis of Software Ecosystems Health |pdfUrl=https://ceur-ws.org/Vol-2047/BENEVOL_2017_paper_2.pdf |volume=Vol-2047 |authors=Tom Mens,Bram Adams,Josianne Marsan |dblpUrl=https://dblp.org/rec/conf/benevol/MensAM17 }} ==Towards an Interdisciplinary, Socio-technical Analysis of Software Ecosystems Health== https://ceur-ws.org/Vol-2047/BENEVOL_2017_paper_2.pdf
Towards an interdisciplinary, socio-technical analysis of software ecosystem health

Tom Mens
Software Engineering Lab, University of Mons
Mons, Belgium
tom.mens@umons.ac.be

Bram Adams
MCIS Lab, Polytechnique Montreal
Montréal, Canada
bram.adams@polymtl.ca

Josianne Marsan
Faculty of Business Administration, Laval University
Québec, Canada
josianne.marsan@sio.ulaval.ca

Abstract—This extended abstract presents the research goals and preliminary research
results of the interdisciplinary research project SECOHealth, an ongoing collaboration
between research teams of Polytechnique Montreal (Canada), the University of Mons
(Belgium) and Laval University (Canada). SECOHealth aims to contribute to research and
practice in software engineering by delivering a validated interdisciplinary scientific
methodology and a catalog of guidelines and recommendation tools for improving software
ecosystem health.

Index Terms—software, ecosystem, evolution, health, recommendation, prediction,
survival, sustainability, resilience, socio-technical

I. INTRODUCTION

The two-year interuniversity SECOHealth project¹ started on October 1, 2017. The three
authors of this extended abstract are its Principal Investigators. SECOHealth aims to
contribute to research and practice in software engineering by delivering an
interdisciplinary scientific methodology and a catalog of guidelines and recommendation
tools for improving software ecosystem (SECO) health. These will enable key ecosystem
actors to better monitor and control SECO health, and will equip them with corrective
and preventive measures to ensure their SECO’s survival and sustainability. The
interdisciplinary methodology used in our project will also guide other researchers in
interdisciplinary projects involving open source communities or SECOs.

SECOs are large collections of interacting and interdependent software projects that
share a common technological platform and that are maintained by large online
communities of contributors. They pervade every aspect of human life including
entertainment, health, economy, industry, politics, education and science. Commercial
SECOs such as mobile app stores or the Internet-of-Things have taken over our daily
lives by storm, to the extent that the functioning of our modern digitally-enabled
society would be severely impacted if SECOs degrade in stability or even cease to exist.

Yet, despite the strategic importance of ensuring the overall well-being of SECOs, their
health is still poorly understood, as SECOs are subject to constant evolution, due to an
increasing pace of events (e.g., technological or environmental changes). What makes
this especially challenging is that SECOs do not have a centralised management for
overseeing the ecosystem’s health and survival. Instead, maintainers of SECO components
need to understand and make decisions about the socio-technical impact of important
events affecting SECO health and recommend corrective actions (e.g., improving SECO
quality and its attractiveness to key actors). Unfortunately, there is little support
and there are few best practices to enable SECO maintainers to perform these tasks.

II. ABOUT SOFTWARE ECOSYSTEM HEALTH

From a biological point of view, health can be defined as “the extent to which an
organism’s vital systems are performing normally at any given time” [1]. This definition
can be transposed to SECO health [2] by considering a SECO as a living organism, whose
constituent software projects are the vital technical systems that need to perform
“normally” in order to have a healthy ecosystem, and whose community is healthy if all
community members are performing normally.

SECO health problems can be very diverse in nature, and can have many different causes.
For example, in March 2016, a package was unpublished from the npm ecosystem, causing
several thousand transitively dependent packages to break. The underlying cause was a
typical case of rage quitting, where the owner of the package decided to remove all of
their packages². Another documented example of rage quitting relates to “toxic”
communication styles of open source communities, such as the one of the Linux Kernel
community, causing a prominent developer to quit³; or the case where a central
contributor to the bug handling community of Gentoo Linux unexpectedly left, causing a
major disruption in the community’s activity [3]. From a more technical point of view,
typical examples of health problems are packages containing bugs or security
vulnerabilities, causing potential problems in the packages that depend on them. The
impact of the problem grows as the number of transitive dependencies on a problematic
package grows.

¹ www.secohealth.org
² blog.npmjs.org/post/141577284765/kik-left-pad-and-npm
³ https://slashdot.org/story/15/10/05/2031247/linux-kernel-dev-sarah-sharp-quits-citing-brutal-communications-style

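As a minimal sketch of why the impact of a problematic package grows with the number of
transitive dependencies on it (Section II), the following Python fragment computes every
package that would be affected if a single package broke or was unpublished. The toy
dependency data and the function name transitive_dependents are assumptions made for
illustration only; they are not taken from the npm registry or from the SECOHealth
tooling.

```python
from collections import defaultdict, deque


def transitive_dependents(dependencies, package):
    """Return every package that depends on `package`, directly or transitively.

    `dependencies` maps a package name to the list of packages it depends on.
    """
    # Invert the "depends on" relation so we can walk from a package
    # to the packages that rely on it.
    reverse = defaultdict(set)
    for pkg, deps in dependencies.items():
        for dep in deps:
            reverse[dep].add(pkg)

    # Breadth-first traversal over the reverse dependency edges.
    affected = set()
    queue = deque([package])
    while queue:
        current = queue.popleft()
        for dependent in reverse[current]:
            if dependent != package and dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical toy ecosystem (illustrative names, not real registry data).
toy_ecosystem = {
    "app": ["web-framework", "logger"],
    "web-framework": ["string-utils"],
    "cli-tool": ["string-utils"],
    "string-utils": ["pad"],
    "logger": [],
    "pad": [],
}

impact = transitive_dependents(toy_ecosystem, "pad")
print(sorted(impact))
```

In this toy data only one package depends on "pad" directly, yet four packages are
affected once transitive dependencies are followed, which is the propagation effect
described above.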
III. PROJECT GOALS

SECOHealth aims at providing a scientific methodology and a disciplined set of
techniques to understand and control the health of software ecosystems. We adopt a
socio-technical perspective since the technical and social layers of SECOs are strongly
interwoven [4]. Our project aims to:
   • define a conceptual model of SECO health;
   • explore analogies from other scientific disciplines such as ecology and toxicology;
   • determine indicators capable of measuring the different aspects of SECO health;
   • determine events that affect the health of a SECO and its constituent projects;
   • empirically validate these health indicators and events, both qualitatively and
     quantitatively;
   • build and evaluate models to predict the impact of a given event on SECO health;
   • build and evaluate a socio-technical dependency model to understand how health
     problems propagate throughout a SECO;
   • propose a catalog of guidelines and recommendations for supporting SECO health.

Joining our complementary strengths in theory-driven and data-driven investigation, we
will follow a mixed-methods approach [5], combining bottom-up data mining and top-down
interview/survey-based research, as well as combining state-of-the-art quantitative and
qualitative analysis techniques emanating from different scientific disciplines.

With the approval of the Research Ethics Committees of the participating universities,
we conducted face-to-face interviews at the European Open Source Summit of the Linux
Foundation (Prague, October 2017) with 17 SECO practitioners. The interviews followed
the guidelines of Patton [6], with the goal of understanding what SECO health means for
practitioners, what indicators they use themselves or could use given the right data,
and which events have impacted SECO health in the past.

We will operationalise the SECO health indicators into concrete metrics, and perform
SECO data mining to measure and evaluate the identified health indicators. We will build
and empirically validate prediction models of how SECOs will react to events, by relying
on historical data from version control systems, code review and bug repositories,
mailing lists and developer fora.

Based on the recent research on SECO and community health [2], [7]–[9], we will consider
three high-level characteristics of health: technical (i.e., concerning technical
software artefacts), social (i.e., concerning contributor communities and the relations
between their members) and phenomenological (i.e., concerning external/internal events
and their manifestation). Technical health characteristics include traditional software
quality metrics, software dependency structure, software growth rate, size and frequency
of software updates, bug fixes, security vulnerabilities, obsolete or deprecated
components, and so on. Social characteristics include responsiveness of contributors
(e.g., mean time to respond to a question, mean time to fix a bug), social network
structure and its evolution (e.g., turnover rate), contributor activity and
productivity, and the quality of interaction between all human stakeholders.
Phenomenological health characteristics include the amount of company involvement (i.e.,
paid contributors), market share, presence of competing products, and so on.

With respect to the social health problem of developer turnover, we conducted an
empirical study on the npm and RubyGems ecosystems. Using the statistical technique of
survival analysis, we identified which social or technical factors in a SECO coincide
with a higher or lower probability of developer abandonment [10].

Concerning technical health, we carried out a quantitative empirical analysis of the
evolution of package dependency networks for seven package distributions of varying size
and age [11]. We proposed metrics to capture the growth, changeability, reusability and
fragility of these dependency networks. We observed that the dependency networks tend to
grow over time, while a minority of packages are responsible for most of the package
updates. The majority of packages depend on other packages, but only a small proportion
of packages accounts for most of the reverse dependencies. We observed a high proportion
of “fragile” packages due to a high and increasing number of transitive dependencies.

IV. INTERDISCIPLINARY RESEARCH

SECOHealth will view SECOs as ecological ecosystems comprised of a population of living
organisms (interdependent software projects and their interacting communities of
contributors), and will produce health indicators and prediction models by drawing
inspiration from well-known principles and theories from other disciplines, such as the
notion of biodiversity in ecology [12], or the notion of toxicity in toxicology [13],
[14].

SECO health needs to be studied at different levels of granularity since the health of
the SECO as a whole depends on the health of its social and technical components, and
vice versa. At a micro-level of analysis (i.e., within and between individual projects
of a SECO), we will explore the impact of toxicity, arguing that certain behaviour and
interactions in the SECO community can be toxic not only to the individual software
projects, but even to the SECO as a whole, and hence can jeopardize its health and
sustainability. Examples of possible toxic social behaviour include deviant or
aggressive behaviours, for example in the form of flame wars as a reaction to bad
quality code contributions [13]. One promising way to assess such toxicity is by
measuring social debt [15], i.e., social interactions between SECO members that have
been strained due to time pressure or lack of attention, and at some point might blow up
and cause friction within the community of developers involved in a software project.

At a macro-level, we will study how health problems of SECO components evolve and
propagate to others. Among others, we will test the principle of biodiversity by
analysing to what extent the SECO’s resilience decreases when its
diversity decreases. By resilience we refer to the ecosystem’s capacity to resist
disturbances, or to recover quickly from a perturbation. Diversity will be analysed
according to a variety of factors (e.g., geographical, activity-related, time-related,
gender-related, artefact-related). Some of these factors have been shown to have
positive effects on software health. For example, gender diversity has been shown to
have a positive effect on the productivity of GitHub teams [16].

V. RELATED WORK AND RELATED PROJECTS

SECOHealth can be considered as a successor project of the ECOS project (Ecological
Studies of Open Source Software Ecosystems) [17] that was carried out from 2012 to 2017.
As part of that project, we carried out multiple empirical studies of the evolution of
open source software ecosystems (e.g., Gnome [18], CRAN [19]–[21], Debian [22], npm and
RubyGems [23]–[25]).

The SECOHealth members are actively involved in the CHAOSS (Community Health Analytics
of Open Source Software) initiative of the Linux Foundation (Metrics Committee). Its
goal is to define, implement and assess metrics for open source community health and
sustainability. While CHAOSS’ initial focus is on metrics at the level of individual
software projects, SECOHealth will focus on ecosystem-wide health metrics. To
operationalise our metrics, we aim to use Bitergia’s open source GrimoireLab tool chain
for software development analytics⁴, which is one of the tool suites actively supported
by CHAOSS.

ACKNOWLEDGMENT

This research is carried out in the context of FRQ-FNRS collaborative research project
R.60.04.18.F “SECOHealth” and FNRS Research Credit J.0023.16 “Analysis of Software
Project Survival”. We thank our external project collaborators Kevin Carillo (Toulouse
University, France), Damian Tamburri (Politecnico di Milano, Italy), Gregorio Robles
(Universidad Rey Juan Carlos, Spain) and Bogdan Negoita (HEC Montréal, Canada) for
actively participating in this project.

REFERENCES

 [1] X. Wang and S. Lantzy, “A systematic examination of member turnover and online
     community health,” in Int’l Conf. Information Systems (ICIS), 2011.
 [2] S. Jansen, “Measuring the health of open source software ecosystems: Beyond the
     scope of project health,” Information and Software Technology, vol. 56, no. 11,
     pp. 1508–1519, 2014, special issue on Software Ecosystems.
 [3] M. S. Zanetti, I. Scholtes, C. J. Tessone, and F. Schweitzer, “The rise and fall of
     a central contributor: Dynamics of social organization and performance in the
     Gentoo community,” in International Workshop on Cooperative and Human Aspects of
     Software Engineering (CHASE), 2013, pp. 49–56.
 [4] T. Mens, “An ecosystemic and socio-technical view on software maintenance and
     evolution,” in Int’l Conf. Software Maintenance and Evolution (ICSME), 2016,
     pp. 1–8.
 [5] R. B. Johnson and A. J. Onwuegbuzie, “Mixed methods research: A research paradigm
     whose time has come,” Educational Researcher, vol. 33, no. 7, pp. 14–26, 2004.
 [6] M. Q. Patton, Qualitative research and evaluation methods: integrating theory and
     practice. SAGE Publications, 2015.
 [7] F. Fotrousi, S. A. Fricker, M. Fiedler, and F. Le-Gall, “KPIs for software
     ecosystems: A systematic mapping study,” in Int’l Conf. Software Business (ICSOB).
     Springer, 2014, pp. 194–211.
 [8] J. Y. Monteith, J. D. McGregor, and J. E. Ingram, “Proposed metrics on ecosystem
     health,” in Int’l Workshop on Software-defined Ecosystems (BigSystem). ACM, 2014,
     pp. 33–36.
 [9] K. Manikas and D. Kontogiorgos, “Characterizing software activity: The influence of
     software to ecosystem health,” in Int’l Workshop on Software Ecosystems (IWSECO),
     European Conf. Software Architecture Workshops, 2015.
[10] E. Constantinou and T. Mens, “An empirical comparison of developer retention in the
     RubyGems and npm software ecosystems,” Innovations in Systems and Software
     Engineering, vol. 13, no. 2-3, pp. 101–115, 2017.
[11] A. Decan, T. Mens, and P. Grosjean, “An empirical comparison of dependency network
     evolution in seven software packaging ecosystems,” Empirical Software Engineering,
     2018.
[12] T. Mens, M. Claes, P. Grosjean, and A. Serebrenik, Studying Evolving Software
     Ecosystems based on Ecological Models. Springer, 2014, pp. 297–326.
[13] K. Carillo and J. Marsan, ““The dose makes the poison” - exploring the toxicity
     phenomenon in online communities,” in Int’l Conf. Information Systems (ICIS), 2016.
[14] K. Carillo, J. Marsan, and B. Negoita, “Exploring the biosphere – towards a
     conceptualization of peer production communities as living organisms,” in SIGOPEN
     Pre-ICIS 2017 Workshop on Open Phenomena, Association for Information Systems
     (AIS), Seoul, South Korea, 2017.
[15] D. A. Tamburri, P. Kruchten, P. Lago, and H. v. Vliet, “Social debt in software
     engineering: insights from industry,” Journal of Internet Services and
     Applications, vol. 6, no. 1, May 2015.
[16] B. Vasilescu, D. Posnett, B. Ray, M. G. J. van den Brand, A. Serebrenik,
     P. T. Devanbu, and V. Filkov, “Gender and tenure diversity in GitHub teams,” in
     Int’l Conf. Human Factors in Computing Systems (CHI). ACM, 2015, pp. 3789–3798.
[17] T. Mens, M. Claes, and P. Grosjean, “ECOS: Ecological studies of open source
     software ecosystems,” in IEEE Int’l Conf. Software Maintenance, Reengineering, and
     Reverse Engineering (CSMR-WCRE), 2014, pp. 403–406.
[18] B. Vasilescu, A. Serebrenik, M. Goeminne, and T. Mens, “On the variation and
     specialisation of workload: A case study of the Gnome ecosystem community,”
     Empirical Software Engineering, vol. 19, pp. 955–1008, 2014.
[19] D. M. Germán, B. Adams, and A. E. Hassan, “The evolution of the R software
     ecosystem,” 2013, pp. 243–252.
[20] M. Claes, T. Mens, and P. Grosjean, “On the maintainability of CRAN packages,” in
     IEEE Int’l Conf. Software Maintenance, Reengineering, and Reverse Engineering
     (CSMR-WCRE), 2014, pp. 308–312.
[21] A. Decan, T. Mens, M. Claes, and P. Grosjean, “When GitHub meets CRAN: An analysis
     of inter-repository package dependency problems,” in Int’l Conf. Software Analysis,
     Evolution, and Reengineering (SANER), 2016.
[22] M. Claes, T. Mens, R. D. Cosmo, and J. Vouillon, “A historical analysis of Debian
     package incompatibilities,” in Working Conf. Mining Software Repositories (MSR),
     2015, pp. 212–223.
[23] E. Constantinou and T. Mens, “Social and technical evolution of software
     ecosystems: A case study of Rails,” in Int’l Workshop on Software Ecosystems
     (IWSECO), European Conf. Software Architecture Workshops, 2016.
[24] A. Decan, T. Mens, and M. Claes, “An empirical comparison of dependency issues in
     OSS packaging ecosystems,” in Int’l Conf. Software Analysis, Evolution and
     Reengineering (SANER), 2017, pp. 2–12.
[25] E. Constantinou and T. Mens, “Socio-technical evolution of the Ruby ecosystem in
     GitHub,” in Int’l Conf. Software Analysis, Evolution and Reengineering (SANER),
     2017, pp. 34–44.

⁴ http://grimoirelab.github.io