I Want to Invest in An Open Source Project: Shall I Hire Insiders or Outsiders?

Vahid Etemadi¹, Jesús M. González-Barahona² and Gregorio Robles²
¹ Shiraz University of Technology, Shiraz, Iran
² GSyC/LibreSoft, Universidad Rey Juan Carlos, Madrid, Spain

Abstract
Many companies and institutions want to contribute to Open Source projects in order to ensure their maintenance and evolution. One way of doing so is to contract developers to work on the project. In this situation, a frequent question is whether it is better to hire developers who have already collaborated with the project (i.e., insiders, usually volunteers who are active in the project) or external developers (outsiders, with no previous relation to the project). An analogy to the logic behind the Insider/Outsider separation can be found in the well-known onion model, where insiders are referred to as core developers. The goal of this paper is to simulate and compare the performance (in terms of time and cost) of a company hiring a number of developers in two different scenarios, where the hired developers are a) insiders, and b) outsiders. To do so, we use a method that has been used before in the research literature to simulate the behavior of software projects and determine the best strategy. Developers are assigned bugs to work on in every round, a round being either a given timespan (e.g., three months) or the reception of a given number of bugs (e.g., 50 bugs). During each round, the Non-dominated Sorting Genetic Algorithm II (NSGA-II) is used to evaluate the candidate assignments in terms of time and cost. With the current settings, the results show no significant difference between the two scenarios in terms of the two performance metrics considered. Based on the results so far, it seems we cannot favor one scenario over the other on the basis of time and cost.
We believe further statistical analysis of the obtained data could strengthen our research in the future.

Keywords
OSS team, Core developers, contributors, digital twins

BENEVOL’21: The 20th Belgium-Netherlands Software Evolution Workshop, December 07–08, 2021, ’s-Hertogenbosch (virtual), NL
Email: v.etemadi@sutech.ac.ir (V. Etemadi); jgb@gsyc.urjc.es (J. M. González-Barahona); grex@gsyc.urjc.es (G. Robles)
Web: https://vahidetemadi.github.io/ (V. Etemadi); http://gsyc.es/~jgb (J. M. González-Barahona); http://gsyc.urjc.es/~grex (G. Robles)
ORCID: 0000-0003-1188-5708 (V. Etemadi); 0000-0001-9682-460X (J. M. González-Barahona); 0000-0002-1442-6761 (G. Robles)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.

1. Introduction

Free/Open Source Software (FOSS) has undergone a major transformation since its early days. It started as a movement of communities of volunteers, but over the last 20 years it has been gaining attention from the software industry [1]. Nowadays, it is not infrequent to see professional developers hired by companies and institutions interested in driving a FOSS project forward.

Companies and institutions have different ways to collaborate with and contribute to FOSS projects. One of them is to hire developers that devote time to the project as part of their professional tasks. This is especially interesting in those projects that are already mature and have difficulties attracting new collaborators. These projects have frequently become an essential part of an infrastructure (e.g., OpenSSL or Log4J), but are not seen as attractive by volunteers, probably because the tasks to be done are mainly of a perfective or corrective nature and the technologies are seen as old.
A possibility for making the project sustainable is to hire volunteers already working on the project (insiders), securing their contribution by making them professionals. This has the advantage that they are already familiar with the inner functioning, the process, the tools, the code, and the people. Another possibility is to hire external developers (outsiders) and assign them to work on the project. Although these developers have to get to know many aspects of the project, including its source code, this option has the benefit that the community grows, at least in number of participants, if we assume that volunteer developers won’t leave the project.

In this paper, we devise and evaluate these two scenarios, and find out whether one outperforms the other. To do so, we take data from two FOSS projects that have previously been used in the research literature to compare bug assignment strategies. By using a simulation, we would like to see what the results are when a set of already existing developers devote more time to the project (i.e., they were volunteers and have been hired to work now full-time on the project) or when several new developers are included in the project (i.e., external developers are hired to work full-time on the project).

Our contribution is an exploratory study on the performance evaluation of different perspectives in creating bug-fixing teams, in terms of time and cost, over several successive rounds. We rely on several research publications that formulate bug-fixing task assignment as a two-objective problem and take advantage of NSGA-II to solve it [2, 3]. We use a real dataset that goes through a simulated context to assign the tasks to developers. The model is then applied to the software maintenance and evolution phase of software projects, as this has been reported to be the phase where it is harder to attract volunteers to the project [4].
In particular, the model takes the change requests and supports the maintainers in narrowing down their plans for evolving the software.

This paper is structured as follows: Section 2 offers a short explanation of the concepts targeted in this report. Next, we formulate the research questions and briefly describe their goals in Section 3. We offer some insight into the simulation process in Section 4. Section 5 goes over the steps followed to operationalize our idea. We provide the results obtained so far in Section 6. Finally, Section 7 discusses the results and the potential impact of our research.

2. Related Work

Although FOSS projects are self-organized (some authors even argue that they follow stigmergic principles as found for instance in ant colonies [5]), developers in FOSS projects can in general be categorized as peripheral and core developers following an onion structure [6]. Core developers are usually those who have a key role in the project, in contrast to peripheral ones, whose contributions are minor. These two groups of developers are supposed to differ in the time they commit and devote to the project. In recent times, many core developers are full-time paid employees of companies and other institutions, while peripheral contributors remain volunteers [7]. Even though volunteers do not devote much time to the project, their overall contribution is very valuable, and projects strive to have a healthy community that attracts newcomers. In addition, it should be noted that this picture is not static, but evolves over time; it has been observed that there are generations of core developers that usually last three to five years [8]. It is also known that the time from the first contribution to becoming a core developer is about 24 months on average for volunteers, but only 6 months for a paid developer [9].
A debatable but interesting aspect of these models concerns the triplet of events of joining, staying in, and abandoning projects (i.e., attraction, retention and turnover). Each of these events, which can take many forms, impacts the project success metrics [10]. This research work focuses on developer attraction, which affects the composition of the team. It could be hypothesized that following a particular type of attraction paradigm leads to different consequences. These consequences could reveal themselves in the project performance metrics. Among them, time and cost are well-known problem-specific metrics which are assumed to play an important role in practitioners’ decisions to switch to a particular team structure.

To measure these two metrics, we need to look at the activities that consume time and need resources. Software maintenance and evolution is known to be the most costly part of a software project, and bug-fixing tasks are a key part of maintaining software. Thus, we think that to be able to evaluate different strategies in team composition we need to evaluate the time and the cost of bug-fixing tasks.

3. Research Questions

In this section, we present the two RQs that we would like to answer in this research. Our aim is i) to compare the two scenarios in terms of their performance in every single round of maintenance, and ii) to offer a round-wise time and cost comparison for the two options. In detail:

• RQ1: In terms of the final Pareto-fronts, how does each scenario perform in comparison to the other one with respect to the objectives (time and cost)? The motivation behind this RQ is to allow readers to observe and compare the final solutions provided by simulating each scenario. Each Pareto-front includes the non-dominated assignments in terms of time and cost. Within such a set, no assignment strongly dominates another one (i.e., none is strictly lower, and not merely equal, in the objectives).
The goal then is to present sample Pareto-fronts (coming from one iteration of the runs), in terms of time and cost, for every single round.

• RQ2: From an all-in-one round-wise representation, what are the best solutions offered by each scenario in terms of the minimum time? For this RQ, a round-wise representation of the minimum time over successive rounds is required. All-in-one means combining all rounds together to come up with a single comparison. This helps to avoid bias due to any particular condition of one or several rounds. The main difference between RQ1 and RQ2 is that here we focus only on the solution with minimum time, while for the overall view we consider both time and cost. In the future, we would like to include other criteria, such as minimum cost.

[Figure 1: eight scatter plots, panels (a) Round #1 to (h) Round #8; x-axis: Time (in h); y-axis: Cost (in $); series: Insiders Sc. and Outsiders Sc.]
Figure 1: Pareto-fronts over rounds for Insiders and Outsiders developer participation as the core. Each scatter plot represents the non-dominated solutions in the objective space in terms of the time and the cost (JDT project).

4. The Assignment Model

So far, we have chosen time and cost as the two substantially relevant metrics for the evaluation. These are computed for every single candidate assignment. This means that we need a computational search viewpoint to find the optimized assignment(s). We use a search-based software engineering (SBSE) [11] approach for bug-fixing task assignments, as has already been done in several previous works [3, 2]. In this model, the problem is defined as a two-objective problem whose fitness function for an assignment S is defined as:

$$\mathit{Fitness}(S) = F(\mathit{Time}, \mathit{Cost}) \tag{1}$$

In [3] and [2], the authors explain how these two objectives conflict with each other. The formal model in Eq. 1 is applied to every candidate assignment. The time variable in this equation is computed based on i) the estimated effort required to have the task done and ii) the productivity of the assigned developer. For further information, we refer to Section 3.2 in [3]. Then, since we assume that we have access to the hourly wage of each developer (we know that it could differ from culture to culture), we simply multiply the time by the wage to obtain how much that developer should be paid. This process takes place for all the candidate assignments. This is the core of an SBSE approach, in this case applied to the assignment of tasks.

Task:       T1   T1   T2   T2   T3   T3   T3   T4
Subtask:    T11  T12  T21  T22  T31  T32  T33  T41
Developer:  D1   D2   D3   D3   D1   D3   D4   D2

Figure 2: Representation of assigning tasks (bugs) to developers. Indices in this vector are the tasks sorted by their prerequisite order, taking code dependencies into consideration.
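The time and cost computation described for Eq. 1 can be illustrated with a minimal sketch. It assumes that the time of a subtask is its estimated effort divided by the productivity of the assigned developer, and that subtasks are processed one after another; both are simplifications made here for illustration (the scheduling-aware model is the one described in [3]). The names `Developer` and `evaluate`, as well as all numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Developer:
    name: str
    productivity: float  # assumption: effort units completed per hour
    hourly_wage: float   # assumption: in $ per hour (0 for volunteers)


def evaluate(assignment, efforts, developers):
    """Return (time_in_hours, cost_in_dollars) for one candidate assignment.

    assignment[i] is the name of the developer fixing subtask i and
    efforts[i] is the estimated effort of that subtask.
    Simplification: subtasks are sequential, so total time is a plain sum.
    """
    total_time = 0.0
    total_cost = 0.0
    for dev_name, effort in zip(assignment, efforts):
        dev = developers[dev_name]
        hours = effort / dev.productivity       # time = effort / productivity
        total_time += hours
        total_cost += hours * dev.hourly_wage   # cost = time x wage
    return total_time, total_cost


# Hypothetical developers: D1 is fast but expensive, D2 slow but cheap.
developers = {
    "D1": Developer("D1", productivity=2.0, hourly_wage=50.0),
    "D2": Developer("D2", productivity=1.0, hourly_wage=20.0),
}
efforts = [8.0, 4.0]

fast = evaluate(["D1", "D1"], efforts, developers)   # (6.0, 300.0)
cheap = evaluate(["D2", "D2"], efforts, developers)  # (12.0, 240.0)
```

In this toy case the faster assignment is also the more expensive one, which is the kind of conflict between the two objectives that motivates a multi-objective search.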
We believe an SBSE paradigm for bug-fixing task assignment is a robust solution that offers advantages over other approaches (e.g., it avoids developers becoming overloaded with many bugs). For example, there are similar approaches, belonging to the categories of information retrieval and machine learning, whose mission is to find the best assignee. These two types of approaches take advantage of popular techniques; however, they are usually expected to underperform if the project has to handle concurrent fixing tasks and only a small number of fixers are available.

In our model, we specifically rely on NSGA-II [12] to explore the search space. NSGA-II belongs to the family of Multi-Objective Evolutionary Algorithms (MOEA), and follows the principles of a Genetic Algorithm (GA). NSGA-II is well suited for problems with more than one objective (as in our case, with time and cost). Typically, during each iteration of the algorithm, the following steps are taken:

1. Create a population of candidate assignments. When creating each assignment, core developers have a 100% chance of being nominated for fixing bugs, compared to volunteer developers, who only have a 10% chance. This models core developers as full-time developers (with a 40-hour week), while volunteers devote on average 10 times less (i.e., only 4 hours per week [13]).
2. Initiate the evaluation process to measure the objectives.
3. The generated solutions are ranked with a method of choice, and a selection process is started.
4. Genetic operators are applied to the selected solutions to create the next generation of candidate solutions.
5. If the maximum Number of Fitness Evaluations (NFE) is met, the algorithm stops and the non-dominated solutions are extracted as the Pareto-optimal solutions.

Since we are using a genetic algorithm for evaluation, every assignment takes the shape shown in Fig. 2.
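Steps 1 and 5 of the list above can be sketched in a few lines. This is not an NSGA-II implementation: ranking, crowding distance, crossover and mutation (steps 3 and 4) are left out, and the way the 100% and 10% nomination chances are applied per subtask is an assumption made for illustration. The function names and developer labels are hypothetical.

```python
import random


def random_assignment(n_subtasks, core, volunteers, rng, volunteer_chance=0.1):
    """Build one candidate: a list holding one developer per subtask.

    Core developers are always eligible (100% chance); each volunteer is
    eligible for a given subtask only with probability volunteer_chance.
    """
    assignment = []
    for _ in range(n_subtasks):
        eligible = list(core)
        eligible += [v for v in volunteers if rng.random() < volunteer_chance]
        assignment.append(rng.choice(eligible))
    return assignment


def dominates(a, b):
    """True if a is no worse than b in every objective and better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def non_dominated(points):
    """Keep the (time, cost) points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
candidate = random_assignment(8, ["D1", "D2"], ["V1", "V2"], rng)

# Hypothetical (time, cost) values for four evaluated candidates.
front = non_dominated([(6.0, 300.0), (12.0, 240.0), (12.0, 300.0), (8.0, 260.0)])
# front == [(6.0, 300.0), (12.0, 240.0), (8.0, 260.0)]
```

The point (12.0, 300.0) is dropped because (6.0, 300.0) is faster at the same cost; the remaining three points trade time against cost and therefore all stay in the set.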
In Fig. 2, the set of bugs $T = \{T_1, T_2, T_3, T_4\}$ is broken down into associated subtasks, which are then assigned to a developer set $D = \{D_1, D_2, D_3, D_4\}$. Fig. 2 is only an example assignment, which can become bigger (or smaller) in terms of the number of tasks and available developers. To sum up, our model is able to take a fixed number of bugs and developers and offer a Pareto-optimal set of the best assignments as the fixing plan.

Table 1: Stats of the datasets, including two Eclipse components, JDT and Platform.

Dataset    # of Bugs Included   # of Packages   Time Interval
JDT        240                  12              2004 - 2006
Platform   240                  12              2002 - 2004

5. Operationalization

To answer the RQs, we performed several experiments with data obtained from a real context. We have therefore followed simulation-based experimentation [14] for this purpose. Following a simulation-based study might introduce some limitations. All the experiments rely on a real dataset (including bug features and developer properties), as provided by [2]. We offer the implementation of the scenarios in the replication package, under the DeveloperAttraction directory (footnote 1).

Given our research questions, two scenarios were devised. We call these scenarios Insiders, where no external developers are added to the project, and Outsiders, where additional external developers are assumed to participate in fixing efforts. These two scenarios share similarities, as well as some differences. In the next subsections, we elaborate on the similarities and differences in the implementation.

5.1. Dataset

To feed our model, we borrow a preprocessed dataset of bugs (including the packages that should be modified and the effort required to fix a bug) and developers from the former study by Karim et al. [2]. The dataset contains a bug set from JDT and one from Platform, two components of the Eclipse project.
We consider a project that has a total of 240 bugs; however, this is expected to be extended to more projects and a larger number of bugs in future work. Table 1 offers a short description of this dataset. The bugs, the relevant assignees, and other features are available in the Bugzilla issue tracking system as well.

Each scenario has access to a developer pool of core and peripheral developers. It is assumed that the projects have 20 developers before hiring new ones. Both scenarios have 5 core developers and engage 5 more (who are assumed to be a) already involved volunteers or b) external professionals, depending on the scenario). So, each scenario will have 10 core developers in total. However, the total number of developers will remain 20 (if already involved volunteers are hired) or become 25 (if, in addition to the 20 original developers, 5 new developers are hired).

5.2. Shared Features

For each scenario, a round-wise bug-fixing procedure is considered. During each round, a fixed number of bugs are assumed to be assigned to the members of the team. In terms of team composition, each scenario has the same number of core developers. We assume core developers’ expertise is the same in both scenarios. Peripheral (or volunteer) developers devote less time to the project, and are expected to contribute to it for free.

Footnote 1: Available at https://github.com/vahidetemadi/SCM_TA/

[Figure 3: two line plots over Round # (1 to 8); series: Insiders Sc. and Outsiders Sc.; panel (a) y-axis: Time (in h); panel (b) y-axis: Cost (in $).]
Figure 3: (a) Solution with minimum time, and (b) Associated costs for the selected solutions of each round (JDT project)

[Figure 4: two plots over Round # (1 to 8); panel (a) y-axis: p-value of Time; panel (b) y-axis: p-value of Cost.]
Figure 4: p-value for (a) Solution with minimum time, and (b) Associated costs for the selected solutions of each round (JDT project)

5.3. Differences

The difference between the scenarios is in the number of volunteer developers, as in the Insiders scenario some of them have become core developers by being hired to work full-time on the project. Hence, the Outsiders scenario has a higher number of volunteers (free contributors). With the current settings, the additional contributors are only 20% of the total number of developers in the Outsiders scenario (5 developers).

[Figure 5: two line plots over Round # (1 to 8); series: Insiders Sc. and Outsiders Sc.; panel (a) y-axis: Time (in h); panel (b) y-axis: Cost (in $).]
Figure 5: (a) Solution with minimum time, and (b) Associated costs for the selected solutions of each round (Platform project)

[Figure 6: two plots over Round # (1 to 8); panel (a) y-axis: p-value of Time; panel (b) y-axis: p-value of Cost.]
Figure 6: p-value for (a) Solution with minimum time, and (b) Associated costs for the selected solutions of each round (Platform project)

6. Preliminary Results

In this section, we provide the answers to our RQs.

6.1. Answer to RQ1

The preliminary results for the eight rounds are shown in Fig. 1 (the number of rounds is a function of the number of bugs in our settings, so it is not considered a limitation). The scatter plots visualize the time and cost of the final solutions, extracted from the Pareto-fronts. These solutions are non-dominated, meaning none of them strongly dominates another in terms of time and cost. These results show a very close performance in terms of the objectives for the two scenarios. It is interesting to notice that the minor differences in performance repeat themselves over all the rounds.

6.2. Answer to RQ2

As discussed earlier, the experiment is run for each round, resulting in a set of non-dominated solutions that are available to be selected.
From this non-dominated set, one solution needs to be selected by the owner. A possible criterion for selecting a solution is minimum time. So, let us select, for each round and each scenario, the solution with minimum time as the best one, and compare them. Similar to what was presented as the answer to RQ1, we provide a full-round-wise representation of the selected solutions. Full-round-wise means taking the solution with minimum time at the end of each round and comparing the two scenarios. For those solutions with minimum time, the round-wise representation of the cost is offered too. Fig. 3 illustrates the solutions with minimum time and the associated costs for the JDT project. Likewise, Fig. 5 offers the results for the Platform project.

In addition to the answers to the RQs, we would like to clarify how these results are aligned with the motivation of our work. Each sub-figure in Fig. 1 shows what practitioners could expect from their investment in the Insiders and Outsiders scenarios. Imagine a project received 30 bugs to be assigned, and the financial costs are to be taken into account. Round #1 in Fig. 1 (a) compares the best solutions obtained from our model in terms of time and cost. If we compare the solutions in their objective space point by point, no significant dominance can be inferred. If we do the same exercise for the 8 rounds of assignment (captioned as Rounds #1 to #8), we find a similar situation. In fact, we believe a round-wise assignment view allows practitioners to generalize their conclusions while looking at the whole process. The purpose of Fig. 3 is exactly to offer this whole view. The difference we have found is marginal, both in terms of cost and time, regardless of which assignment is selected.

To limit the effect of randomness in our results, we decided to report the p-value and effect size of the obtained results too.
Being able to statistically enrich the comparison gives practitioners the chance to better judge the results. To this end, we stated the null hypothesis that “there is no significant difference between the two scenarios in terms of the time and the cost”. Then, we computed the p-value of the time and the cost for these two scenarios, and ended up with the results shown in Fig. 4. As the results suggest, we cannot reject our null hypothesis, which states that there is no significant difference between the two scenarios in terms of time and cost. In other words, hiring Insiders or Outsiders would not make a big difference for the project.

7. Discussion and Impact

Based on our preliminary results, project stakeholders will be able to compare the performance of the two scenarios and understand that there is little reason to favor one over the other, at least if we only consider time and cost. We are aware that such a conclusion requires a stronger and broader experiment. Other factors that might have an impact on the scenarios have not been taken into account. For instance, we have not considered the problems that newcomers have when joining a project [15], although previous research has shown that the joining process for professional developers is relatively fast [9].

Relying on simulation-based research always introduces some threats and limitations. Finding a perfect reflection of the context in this regard is a tough task. However, we followed some principles to bridge this gap. For instance, all the input datasets are from real FOSS projects. Re-running the simulated process, as we did in our implementation, also helps to deal with the randomness that might exist in the simulation. Another limitation is related to the generalization of the conclusions, which we plan to mitigate by including other projects. More statistical tests should also be used to offer better insights, as we have done in a previous work [3].
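The per-round comparison behind RQ2 (select the minimum-time solution, then test whether the two scenarios differ) can be illustrated with a small sketch. The text does not name the statistical test that was used, so the permutation test on the difference of means below is a stand-in chosen for illustration, not the procedure behind Fig. 4 or Fig. 6. All names and all numbers are hypothetical placeholders, not results of the study.

```python
import random


def min_time_solution(front):
    """Pick the (time, cost) point with the smallest time from a front."""
    return min(front, key=lambda point: point[0])


def permutation_p_value(xs, ys, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the absolute difference of means."""
    rng = random.Random(seed)
    observed = abs(sum(xs) / len(xs) - sum(ys) / len(ys))
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        left, right = pooled[:len(xs)], pooled[len(xs):]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Placeholder minimum times (in h) from repeated runs of one round.
insiders_times = [2310.0, 2290.0, 2350.0, 2330.0, 2305.0]
outsiders_times = [2320.0, 2300.0, 2340.0, 2315.0, 2335.0]

p_time = permutation_p_value(insiders_times, outsiders_times)
no_difference_detected = p_time >= 0.05  # assumed significance level
```

A large p-value means only that the data give no evidence of a difference between the scenarios for that round; an effect size, which the text also mentions, would have to be computed separately.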
Our recommendation to offer a simulated implementation of the real maintenance phase of FOSS projects could be considered an application of digital twins to FOSS projects [16]. So far, we have focused on mirroring the attraction events and the bug-fixing process in their twins. However, this could be extended to other activities in maintenance and evolution as well. This twin could sit next to the real project and assist the community in taking decisions regarding team arrangement and where to allocate resources more efficiently. In our case, both scenarios are a sort of digital model that simplifies the real process of hiring developers for FOSS projects.

8. Threats to Validity

In this section, we briefly list the threats to validity and our strategies to mitigate them.

8.1. Construct validity

In this research, we relied on theoretical and practical techniques that have already been proven reliable. For instance, there might be threats in terms of using an SBSE paradigm for modeling bug-fixing task assignment. However, our idea is not new and has already been discussed in former studies [3, 17, 2].

8.2. External validity

Our research results might be questioned in terms of their applicability to other, similar projects. At this stage, we only focus on two specific projects, and this raises the concern of not being able to reach a general conclusion. We accept that this is a big concern, and we are planning to include more projects to alleviate it. In this research, we only focused on bug-fixing tasks. That might raise the question of whether our findings are specific to a particular category of post-release issues (bug-fixing tasks). However, we hypothesize that the results could be the same for all kinds of change requests, since the task assignment engine treats all kinds of tasks the same, except when they are labeled with a particular priority.

8.3. Internal validity

We presented two RQs which are aligned with the purpose of the study. To answer these questions, we followed the standard way of performing an empirical analysis. We relied on verified techniques (see [3, 2, 18]) to offer a digital model of the environment, reran the assignment algorithm several times, and tried to offer a statistical test in an effort to avoid potential biases.

8.4. Conclusion validity

We tried to be very careful about the final results, which depend on the flow of activities in a planned analysis. The validity of the results makes it possible to draw a clear conclusion. We believe the flow of the current analysis, which is enriched with previous relevant studies, is likely to avoid potential threats. A key concern regarding both scenarios arises when the project might be threatened by the possibility of developer turnover or heterogeneity. In our former study [17], we proposed a solution to alleviate the long-term effects of developer churn, which happens for many reasons.

9. Conclusion

In this research, we tried to compare two usual scenarios (from the perspective of a company or an institution wanting to invest in a FOSS project) for hiring new developers to expand the core developer team. To achieve this, we were assisted by a digital model that focuses on time and cost as two quantitative criteria for evaluating the outcomes of each scenario. The results, obtained after a statistical analysis, show that there is no significant difference between hiring Insiders or Outsiders in terms of time and cost. However, as future work, we plan to include more projects and perform further analysis to compare the investment options. Moreover, something that might seem to be missing in the current study is the bus factor as a criterion that software communities could rely on for making decisions.
Thus, in future work, we aim at including the bus factor and other qualitative criteria to offer a broader and clearer perspective of all possible solutions.

Acknowledgment

The second and third authors acknowledge the support of the Government of Spain through project “BugBirth” (RTI2018-101963-B-100). We also appreciate the insightful comments from the BENEVOL 2021 reviewers. Their comments have been very constructive in enhancing our paper.

References

[1] G. Robles, I. Steinmacher, P. Adams, C. Treude, Twenty years of open source software: From skepticism to mainstream, IEEE Software 36 (2019) 12–15.
[2] M. R. Karim, G. Ruhe, M. M. Rahman, V. Garousi, T. Zimmermann, An empirical investigation of single-objective and multiobjective evolutionary algorithms for developer’s assignment to bugs, Journal of Software: Evolution and Process 28 (2016) 1025–1060.
[3] V. Etemadi, O. Bushehrian, R. Akbari, G. Robles, A scheduling-driven approach to efficiently assign bug fixing tasks to developers, Journal of Systems and Software 178 (2021) 110967.
[4] D. M. German, The GNOME project: a case study of open source, global software development, Software Process: Improvement and Practice 8 (2003) 201–215.
[5] G. Robles, J. J. Merelo, J. M. Gonzalez-Barahona, Self-organized development in libre software: a model based on the stigmergy concept, ProSim’05 16 (2005).
[6] K. Crowston, J. Howison, The social structure of free and open source software development, First Monday (2005).
[7] Y. Zhang, M. Zhou, A. Mockus, Z. Jin, Companies’ participation in OSS development-an empirical study of OpenStack, IEEE Transactions on Software Engineering (2019).
[8] G. Robles, J. M. Gonzalez-Barahona, I. Herraiz, Evolution of the core team of developers in libre software projects, in: 2009 6th IEEE International Working Conference on Mining Software Repositories, IEEE, 2009, pp. 167–170.
[9] I. Herraiz, G. Robles, J. J. Amor, T. Romera, J. M. González Barahona, The processes of joining in global distributed software projects, in: Proceedings of the 2006 International Workshop on Global Software Development for the Practitioner, 2006, pp. 27–33.
[10] J. Tian, Software quality engineering: testing, quality assurance, and quantifiable improvement, John Wiley & Sons, 2005.
[11] M. Harman, B. F. Jones, Search-based software engineering, Information and Software Technology 43 (2001) 833–839.
[12] K. Deb, S. Agrawal, A. Pratap, T. Meyarivan, A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II, in: International Conference on Parallel Problem Solving from Nature, Springer, 2000, pp. 849–858.
[13] R. A. Ghosh, R. Glott, B. Krieger, G. Robles, Free/libre and open source software: Survey and study, 2002.
[14] S. Easterbrook, J. Singer, M.-A. Storey, D. Damian, Selecting empirical methods for software engineering research, in: Guide to Advanced Empirical Software Engineering, Springer, 2008, pp. 285–311.
[15] I. Steinmacher, I. S. Wiese, T. Conte, M. A. Gerosa, D. Redmiles, The hard life of open source software project newcomers, in: Proceedings of the 7th International Workshop on Cooperative and Human Aspects of Software Engineering, 2014, pp. 72–78.
[16] J. Ahlgren, K. Bojarczuk, S. Drossopoulou, I. Dvortsova, J. George, N. Gucevska, M. Harman, M. Lomeli, S. M. Lucas, E. Meijer, et al., Facebook’s cyber–cyber and cyber–physical digital twins, in: Evaluation and Assessment in Software Engineering, 2021, pp. 1–9.
[17] V. Etemadi, O. Bushehrian, G. Robles, Task assignment to counter the effect of developer turnover in software maintenance: A knowledge diffusion model, Information and Software Technology (2021) 106786.
[18] F. Sarro, F. Ferrucci, M. Harman, A. Manna, J. Ren, Adaptive multi-objective evolutionary algorithms for overtime planning in software projects, IEEE Transactions on Software Engineering 43 (2017) 898–917.