Analysis of a Clone-and-Own Industrial Automation System: An Exploratory Study

Nick Lodewijks
University of Amsterdam, The Netherlands
nicklodewijks@gmail.com

Copyright © by the paper's authors. Copying permitted for private and academic purposes. Proceedings of the Seminar Series on Advanced Techniques and Tools for Software Evolution SATToSE 2017 (sattose.org). 07-09 June 2017, Madrid, Spain.

Abstract

In industry, the development of similar products is often addressed by cloning and modifying existing artifacts. This so-called clone-and-own approach is often considered to be a bad practice but is perceived as a favorable and natural software reuse approach by many practitioners. Unfortunately, current literature lacks quantitative information about the positive and negative effects of clone-and-own. In this paper, we present the results of our exploratory analysis of an industry system developed using the clone-and-own approach. We found that products from the same product family can vary significantly in change activity over time, divergence from their origin and synchronization activity. We will further investigate these factors to develop quantitative measures for the assessment of clone-and-own benefits and drawbacks.

1 Introduction

Cloning is often considered to be a practice harmful to the quality of source code, and potentially a cause of maintainability problems (Kapser and Godfrey, 2006; Thummalapenta et al., 2010). Yet, in industry, the development of similar products is often addressed by cloning and modifying existing artifacts. This so-called clone-and-own approach is perceived as a favorable and natural software reuse approach by many practitioners, mainly because of its simplicity and availability (Dubinsky et al., 2013).

While the general belief is that clone-and-own is a bad and unsustainable development technique, it has been used successfully for the development of the MES-Toolbox, a large (±1 million lines of Java code) proprietary factory automation system. Over the past 17 years, for each new customer, an existing system was cloned and modified in any possible way to add, modify or remove functionality. With over 70 implementations of the system running worldwide, the company now seeks to reduce maintenance overhead. Unfortunately, the decision on how to move forward from a successful clone-and-own approach is not straightforward.

Over the past decade, several tools and techniques for dealing with cloned product variants have been proposed. Some of them advocate the elimination of all clones by merging the variants into a single platform, and others propose to maintain multiple variants as-is (Rubin, Czarnecki, and Chechik, 2013). Which approach works best for a given situation depends on the domain and context of that situation. In some cases eliminating all clones and adopting an integrated platform is neither possible nor beneficial (Antkiewicz et al., 2014). Eliminating clones will increase coupling, and changing shared code may require re-testing of all systems that use it (Dubinsky et al., 2013). If the success of the product highly depends on the benefits of clone-and-own, then its merits should be considered before moving away to a different approach.

The main objective of our study is to explore the evolution of MES-Toolbox systems and to gain insight into how clone-and-own may have affected ongoing project development and maintenance. In this paper, we show how version control system metadata, source-code differencing, and visualization techniques can be used to identify clone-and-own related points of interest in the evolution of a product family.

2 Subject System

The system studied in this work is the MES-Toolbox, a 17-year-old proprietary Java-based factory automation system developed by ENGIE. The main purpose of the system is the automation of batch and continuous production processes. It can visualize, control and register every step of an entire production process, from the intake of raw material (unloading from trucks, ships, bags, pallets, containers), preparation (dosing, weighing, heating), processing (pressing, grinding, mixing) and storage to the distribution of end products to customers. Depending on what customers require for their production process, the system performs article and recipe management, quality registration, production planning, tracking and tracing of materials used in production, stock control, shift registration and production performance analysis, and communicates with ERP systems. To monitor and control physical production equipment (e.g., conveyors, mixers, weighers, buttons, lights), the MES-Toolbox communicates with Programmable Logic Controllers (PLCs) that perform the actual low-level control of these physical devices.

Over the past 17 years, the system has grown to contain more than 6500 Java files, with a total of approximately 1 million lines of Java code. While the design of the system has a modular structure and aims to separate common code from customer implementation code as much as possible, it is a monolithic application. Nearly all source code is contained in a single project, which is developed, built and versioned as a whole. Internally, this project is called the Standard project, as it is used as a basis for all new projects. This project, which can be considered the main platform of the product family, contains a constantly growing set of reusable core components and ready-to-use standard solutions.

Within the organization there is a clear distinction between platform development and application development, a distinction often found in a Software Ecosystem (SECO) (Lettner, Angerer, Grünbacher, et al., 2014). A small team of five developers is responsible for the overall design, development, and maintenance of the system. The founder and writer of the first line of code of this system is still part of this team. The work of this team is focused on maintenance of the core platform, development of complex customer-specific features, standardization of functionality, development of product configuration tools, and support for application engineers.

Even though the system is highly configurable, cloning is used to address the specificity and high degree of variation of customer requirements often found in the domain of industrial automation (Schrock, Fay, and Jager, 2015). For every new factory, a clone of the codebase of the latest platform release is realized by creating a branch with the Subversion version control system. The clone is then configured and changed in any possible way by application engineers to add, modify or remove functionality. Each clone corresponds to the automation system of a factory somewhere around the world for some specific customer. Each clone is a variant of the base platform. We refer to the collection of all MES-Toolbox variants as the MES-Toolbox product family.

Between clones there exists a varying degree of commonality, and there is often no clear relation between the clones. Clones developed for the same customer might have more in common than clones developed for different customers. For example, a company may require all its production facilities to have identical branding and communication interfaces with third-party systems. However, even clones that appear to be unrelated in terms of end-user requirements may still have some forms of commonality, such as the graphical user interface components or the configuration framework that is used.

3 Research Questions

Dubinsky et al. (2013) observed that the independence provided by clone-and-own is one of the major reasons for considering cloning as an efficient reuse mechanism. Developers can make any change required to satisfy customer requirements, without affecting other clones. They do not have to collaborate with teams working on other systems that may have different priorities or scheduling constraints. These characteristics of clone-and-own have to be considered when new change mechanisms are introduced, since different techniques may not provide the same degree of independence. But how much independence is needed, and how can it be measured? In this section, we describe three research questions that we will use to explore independence-related characteristics of the MES-Toolbox product family.

RQ1: Do MES-Toolbox systems change in parallel? When cloning is used to develop systems independently of each other, developers can decide when to change the codebase of each individual system. The development of each system can follow its own release and development schedule that is based on available resources and requirements for the system. The development of complex systems with many customer-specific modifications may require and allow for months of continuous, frequent change, while relatively straightforward and simple systems might have strict deadlines and require only a few changes within the first weeks.

To explore whether MES-Toolbox systems may have benefited from this time aspect of independence, we want to gain a rough understanding of the degree of parallel change in the MES-Toolbox product family. We hypothesize that a schedule-driven need for independence may lead to a lack of parallel development, whereas some relation between systems (e.g., systems developed for the same customer) or a collaboration-overhead-driven need for independence may lead to parallel change. For the purpose of this exploratory study, we do not yet use a strict definition of parallel change. Instead, we are interested in any form of seemingly parallel change. Do systems change in parallel every week, month or year? Do many systems change at roughly the same time, or is this only the case for specific systems?

RQ2: How much do MES-Toolbox systems diverge from their origin? Clone-and-own allows developers to add, remove or modify files without affecting their origin. These changes will inherently cause systems to diverge from their origins; they are no longer identical. As the product family grows it often becomes increasingly hard to keep an overview of the available functionality (Stanciulescu, Schulze, and Wąsowski, 2015; Berger et al., 2014; Duc et al., 2014). We hypothesize that the degree of divergence can be used to quantify the complexity caused by cloning. Therefore, we are interested to see how this property of the MES-Toolbox product family has evolved over time.

A developer of the MES-Toolbox platform stated that diverged Java files often make it difficult to propagate changes, but expected that the Java codebase would not significantly diverge for most of the 7.2 and 7.2.1 systems. Many of these systems are considered relatively simple, and hardly require any customer-specific modification of the codebase.

RQ3: Have all MES-Toolbox systems been synchronized with their origin? Cloning is said to increase maintenance overhead because changes to one clone may have to be propagated to all clones. Studies have shown, however, that change propagation is not always performed (Stanciulescu, Schulze, and Wąsowski, 2015), which suggests that cloning does not necessarily increase maintenance overhead due to change propagation.

In the organization we study, changes are manually propagated at the discretion of the teams developing the systems. Application engineers stated that they periodically merge changes from the platform release to customer systems, but only while these are still under active development. Because some systems are developed relatively fast, we expect that some systems retrieve only very few changes from their origin, thus arguably not causing much maintenance overhead. Consequently, techniques that purely reduce repetitive tasks would have limited effect on maintenance overhead caused by synchronization for these systems. In the MES-Toolbox product family, synchronization between products and their origin can occur in both directions. Bugs are often found and fixed on a product, after which the change is propagated to the platform project. From there, the change can be propagated to all the other products derived from that platform version.

4 Research Methodology

To explore the evolution of MES-Toolbox systems, we built a tool that retrieves changes to each system from the Subversion (SVN) repository, performs source-code differencing and exports the relevant information to a CSV file for further analysis in R. Our tool is embedded in a modified version of JMeld (https://sourceforge.net/projects/jmeld/), an open source differencing tool written in Java.

First, we make a local copy of the SVN repository with the command svnadmin hotcopy, and verify its integrity with svnadmin verify in the analysis environment. This local repository is used for all data collection to ensure that the data source does not change during subsequent analysis.

4.1 Selecting MES-Toolbox Systems

We extract all systems present in the local copy of the repository by scanning the output of svn ls for paths in the form of projecten/.*/trunk/$ (the command used is: svn ls -R {svnRepo} | egrep "projecten/.*/trunk/$"). We then manually validated these paths and documented for each system the platform version it was branched from, the name of the project, an anonymised name, the repository path, and any unusual properties of the system that we have to consider during analysis. For example, development of some systems was discontinued and the systems were never put into production. We excluded these systems from the analysis. Finally, we noted whether the system was directly branched from the platform, or from another branch (its nesting depth).

There are currently four platform versions: 7.1, 7.2, 7.2.1 and 7.3. The first version of the platform (7.1) was released on 7 March 2012 and was followed relatively quickly by the next release (7.2) on 18 December 2012. Version 7.2.1 of the platform was released on 7 October 2014, and version 7.3 on 14 December 2016. Figure 1 shows the distribution of versions among systems in the version repository. Twenty-one systems pre-date the first platform release. There are five 7.1 systems, fifteen 7.2 systems, eleven 7.2.1 systems and eight 7.3 systems. For this study, we mainly focus on 7.2 and 7.2.1 systems, as these have all been put into production, and are derived from a comparable base platform within the last five years. The main difference between these versions is the internationalization of all text visible to the end-user. There are no significant differences in terms of architecture or functionality.

[Figure 1 (bar chart). x-axis: Platform Version; y-axis: Number of Systems. Pre 7.1: 21 (35%); 7.1: 5 (8.3%); 7.2: 15 (25%); 7.2.1: 11 (18.3%); 7.3: 8 (13.3%).]
Figure 1: Distribution of System Versions

We refer to the codebase of a specific platform version in the form of PL-VERSION; for example, we use PL7.2 to refer to version 7.2 of the platform. The internal name of the system can contain the name of the customer, and the location of the production facility. Since this information is subject to confidentiality, we manually defined an anonymised name for each system in the form of P-NUMBER. In this paper, we often refer to this name as Pn, which can be read as project n or product n.

4.2 Mining Commit Metadata

For each system we selected, we extract the version history using a bash script. This bash script uses the svn log command (svn log --xml --stop-on-copy -v > -log.xml) to export the version history in XML format. For each system we collect all revisions from its change history, and extract the relevant change metrics. We used the definition and format of the change metrics dataset published by Yamashita et al. (2017) as a basis for our data set.

We extract the revision number, author and date of each revision. Next, from the output of svn diff (svn diff -x -U0 -c {revisionNumber} {repositoryPath}), we determine the full path of the files that were changed, the type of change (added, deleted or modified), and calculate for each file how many lines were changed, added or deleted. From the full path of the files, we extract the file name and file extension. Note that we use svn diff to determine which files were changed, and not svn log. The reason for this is that when a directory is deleted, the output of svn log only contains the name of the directory, and does not contain the names of the files contained in the directory.

4.3 Detecting Parallel Change

To determine whether systems change in parallel, we are interested in the time aspect of change at system-level granularity. We decided to use a visualization which allows us to gain insight into whether (a) systems change in parallel and (b) systems change continuously, periodically or at arbitrary moments in time, and (c) to identify variance between systems.

For this visualization we chose systems as the first dimension and time as the second dimension. To prevent overplotting, we group data-points by week or month. By grouping data, we will not be able to distinguish between systems that changed many times within a period and systems that changed only once. To mitigate this effect we introduce an additional dimension, the number of commits (proportional to the radius of the dot). This leads us to the view shown in Figure 2.

[Figure 2 (dot plot, "Change Activity Over Time"). y-axis: System (System 1 to System 4); x-axis: Date (week), weeks 01 to 05; dot size: Commits (1 to 5).]
Figure 2: Example visualization for change activity. All systems exhibit different change activities, with a varying degree of parallel change.

The vertical axis represents the systems, and the horizontal axis the time of the changes. Each dot represents a point in time when a system was changed. The radius of the dot is proportional to the number of commits that occurred. In this example, we group the data-points by week. Continuous change activity will give rise to a sequence of horizontally aligned dots. Changing a system twenty times a week will result in a thicker horizontal dot pattern compared to changing a system only once a week. In Figure 2 we observe that system 1 was under continuous maintenance, as it was changed every week. System 2 was changed every other week, which appears to be more periodical but due to the week-based granularity may still be considered continuous to some extent. The change activity for systems 3 and 4 is continuous for the first three weeks, but declining for system 3 and increasing for system 4. Finally, we see that systems 1, 2 and 3 all changed in the first week, but system 3 has been modified more frequently.

4.4 Measuring Divergence

To measure how much systems have diverged from their origin, we developed a tool that calculates how much the difference between each system and its origin has changed over time. We do so by calculating the differences for each system, for each file, at every revision that changed either the system or its origin. We perform these measurements on a local copy of the actual codebase of the systems. For the platforms and each system, we locally replay their change history by sequentially updating the local working copy with svn update. After each update of a platform codebase, we re-calculate the differences with code-differencing on all systems that have been derived from the platform. Similarly, after each update of the codebase of a system, we re-calculate the differences between the system and its origin. This technique is computationally intensive but does allow us to explore how much each revision has affected divergence.

We measure differences at line-level granularity (number of lines different) with the Java implementation of GNU diff (http://www.bmsi.com/java/#diff). By using a line-level granularity instead of a file-level granularity (number of files different), we are able to aggregate the line-level measurements to file-level granularity and report on both levels. We define the difference in number of lines as diff. During analysis, we keep track of how much the difference has increased or decreased compared to the previous revision, the diffDelta.

We illustrate the divergence calculation on an artificial example in Table 1. In this example, PL7.2.1 is the origin of system P17. First, we update the local copy of the codebase of PL7.2.1 to revision 1 and calculate the differences between PL7.2.1 and P17. We see that in revision 1, Main.java was modified on PL7.2.1, causing a difference of five lines. Next, we update P17 to revision 2 and re-calculate the differences. We see that Main.java was changed, reducing the difference by five lines. This pattern of increasing and decreasing divergence is typically caused by change propagation, in this case when revision 1 is merged to system P17 in revision 2. As the measurements continue, we see that Main.java was modified two more times on P17, increasing the difference by ten lines in revision 3 and five lines in revision 4. Finally, in revision 5 the difference was reduced by fifteen lines by a change on PL7.2.1.

revision  system   file       diffDelta  diff
1         PL7.2.1  Main.java          5     5
2         P17      Main.java         -5     0
3         P17      Main.java         10    10
4         P17      Main.java          5    15
5         PL7.2.1  Main.java        -15     0

Table 1: Example data of divergence over time calculation.

4.5 Detecting Synchronization

Systems often retrieve changes from their origin, or contribute changes to their origin, by merging the revision from the system to its origin or vice versa. Subversion automatically registers the merged revision(s) and the origin of the merge in a so-called svn:mergeinfo property attached to files and directories (http://svnbook.red-bean.com/en/1.7/svn.branchmerge.basicmerging.html). We classify each revision commit as MERGE or NON_MERGE by scanning the output of svn diff for an occurrence of svn:mergeinfo.

Unfortunately, we cannot blindly trust the validity of Subversion properties. Subversion properties can be changed by hand, developers might forget to commit the changes to properties, or they could manually copy changes between systems without using the merging system. We aim to mitigate these issues by taking into account whether revisions have caused convergence or divergence. We expect that most changes to systems will cause them to diverge from their origin and that merging these changes to their origin will cause them to converge. Similarly, we expect that changes to the origin of systems will cause them to diverge, and merging the change to the systems will cause them to converge. We manually validate a large sample of data to ensure this is a reliable technique to detect synchronization.

5 Results and Analysis

In this section, we present the results of our exploratory analysis.

5.1 Parallel Change

RQ1. Do MES-Toolbox systems change in parallel? Figure 3 shows the change activity of the PL7.2 and PL7.2.1 platforms, and all systems derived from these platforms that were included in our study. We see that many systems appear to be modified almost continuously, even years after the first change was made, for example systems P1 and P3. Systems P4, P8 and P18 also appear to be changed continuously, but to a lesser extent than the first group. The change activity for these systems appears less dense and contains more periods of inactivity. The longest period of inactivity for these systems is approximately four months (124 days, 14 March 2014 to 16 July 2014) for system P4.

[Figure 3 (dot plot, "Change Activity Over Time"). y-axis: System, grouped by platform version (7.2: PL-7.2 and P-1 to P-15; 7.2.1: PL-7.2.1 and P-16 to P-26); x-axis: Date, 2013 to 2017; dot size: Commits (20, 40, 60, 80).]
Figure 3: The change activity of PL7.2 and PL7.2.1 systems.

The majority of the systems show an initial burst of activity at the beginning of the project, followed by a varying amount of activity afterward. This seems similar to the change frequency of Keba, an industrial automation ecosystem studied by Lettner, Angerer, Grünbacher, et al. (2014). In the Keba ecosystem, the change frequency reportedly largely depends on customer requirements, and most changes happen within the first three to four weeks in a customer project. In our case, for many systems, most change activity does appear to occur in the first period of the project, but this period is much longer (2-4 months). Manual examination of some of the changes that occurred after this initial period suggests that they are often (critical) bug-fixes or minor changes requested by the customer. For example, P5 was changed on 31 July 2015 after being inactive for almost a year (311 days). Manual analysis of this change shows that it was triggered by a customer request after the physical production line was modified.

The change activity of MES-Toolbox systems seems consistent with observations by Lettner, Angerer, Grünbacher, et al. (2014), who stressed the importance of platform quality characteristics like stability and backward compatibility, and long-term platform evolution in the domain of industrial automation. The oldest system we analyzed was 11 years old, and still changed continuously. Some systems were inactive for years before becoming active again due to new customer demands. This is not necessarily the case for other systems developed with clone-and-own. Stanciulescu, Schulze, and Wąsowski (2015) found forks in the Marlin ecosystem, an open source firmware for 3D printers, to be characterized by a short maintenance lifetime (101 days on average).

With regard to whether and to what extent MES-Toolbox systems have been changed in parallel, we clearly see that multiple MES-Toolbox systems are changed at roughly the same time. However, the degree of parallel change is not the same for all systems, nor is it constant over time.
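As a rough illustration of how such a degree of parallel change could be quantified from the weekly grouping described in Section 4.3, the following Python sketch counts, per ISO week, which systems received commits. It is not the tool used in this study; the function names and the sample commit data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from datetime import date


def weekly_activity(commits):
    """Map each system to {(iso_year, iso_week): number_of_commits}."""
    activity = defaultdict(lambda: defaultdict(int))
    for system, day in commits:
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        activity[system][(iso[0], iso[1])] += 1
    return activity


def systems_changed_per_week(activity):
    """Map each week to the sorted list of systems changed in that week."""
    per_week = defaultdict(list)
    for system, weeks in activity.items():
        for week in weeks:
            per_week[week].append(system)
    return {week: sorted(systems) for week, systems in per_week.items()}


# Hypothetical commit metadata: (anonymised system name, commit date).
commits = [
    ("P-4", date(2013, 3, 4)),
    ("P-4", date(2013, 3, 6)),
    ("P-5", date(2013, 3, 7)),
    ("P-4", date(2013, 7, 15)),
]

activity = weekly_activity(commits)
parallel = systems_changed_per_week(activity)

for week in sorted(parallel):
    print(week, parallel[week])
```

Weeks in which the list holds more than one system are candidates for seemingly parallel change. The per-system commit counts in `activity` correspond to the dot radius in the change activity plots.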
Many systems appear to be changed in parallel initially until the development of one system is done and they no longer change in parallel. For example, if we look at systems P4 and P5, we see a major reduction in change activity of system P5 after June 2013, but the development of system P4 continues. This type of pattern is what we would expect to see due to a schedule-driven need for independence.

Furthermore, we observe at least two vertically aligned dot patterns. These patterns occur if multiple systems are changed at roughly the same time, while many of those systems did not change before or after that time. Manual inspection of these patterns shows that both instances were critical bug fixes, manually merged to most systems on the same day, regardless of the development schedule of the systems. The fact that we do not see many of these vertical line patterns suggests that mass-synchronization of many systems at once does not happen often in the MES-Toolbox product family.

5.2 Divergence

RQ2. How much do MES-Toolbox systems diverge from their origin? In this research question, we calculate how much MES-Toolbox systems are different from their origin, and explore how this property has changed over time. Figure 4 shows the divergence measurements over time, in terms of percentage of files and number of lines.

[Figure 4 (two grids of line charts, one panel per system, P-1 to P-12). Left grid y-axis: Number of Lines Diverged (.java files), 0 to 150000; right grid y-axis: % of Java Files Diverged, 0 to 20; x-axis: Date, 2013 to 2017.]
Figure 4: Divergence over time for a subset of PL7.2 and PL7.2.1 systems in percentage of files and number of lines.

It can be seen clearly that while divergence tends to increase over time, there is variance both in the degree of divergence and in the rate of divergence. In the first year of the history of systems P1, P2, P3, and P7, the proportion of diverged Java files appears to be highly volatile compared to the other systems. This can also be seen in divergence in number of lines, but is less clear.

In terms of percentage of Java files, all systems at some point in time diverged between 7% and 22.5% from their origin. This suggests that all systems, even those that do not frequently change, can diverge significantly. In terms of number of diverged lines, most systems did not exceed 50,000 lines (<5%), and only two systems diverged more than 75,000 lines.

Overall, we see that divergence measured in percentage of Java files can be significantly different from divergence measured in terms of number of lines. In 2014 the number of diverged lines for system P6 rapidly increased from less than 25,000 lines to more than 140,000 lines. We do not see this growth in the file-based measurement. Manual analysis of this anomaly shows that a developer deleted a module from the codebase which was not required for the project but was causing merge-conflicts.

Even though the codebase of many systems reportedly hardly required any customer-specific modifications, they still diverged significantly. For these systems, this divergence was not caused by changes to the
systems, but by the lack of synchronization of changes 7 Synchronization Change Activity Contributed to Origin Retrieved from Origin P−1 |●●●●●●● ● ●●●●●● ● ● ● ● ● ●● ● ● ● ● ● |●●●●●●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ●● ● ● P−2 |● ● ● | ●● ● ●● ● ● ● ● ● ● P−3 | ● ● ● ●●● ● ● ● ●● | ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ●● ● P−4 | ● ●● ● ● ● | ● ● ● ● ●● ● P−5 | | ● ● ● ● P−6 | ● ● ● ● | ● ● ●● ● ● ● P−7 | ● ● ● ● ● ●● ● ● |● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● 7.2 P−8 | ● ●● ● | ● ● ● ● ● ●● ● ● P−9 | | ●● ● Commits P−10 | ● ● ● | ● ● ● ● ● ● 1 System ● 2 P−11 | ● ● ● | ● ● ●● ● ● 5 P−13 | | ● ● ● ● ● 10 P−15 | | ● ● ● ● ● ● ● ● ● 20 P−16 | ● | ● ●● ● ● ● ● ● ● ● ● ● P−17 | | ● ● ● ● ● ● P−18 | ● ● ● ● | ● ● ● ● ● ● P−20 | ● | ● ● ● ● ● ● ● ● 7.2.1 P−21 | ● ● ● | ● ● ● ● ● ● P−23 | | ● ●● P−25 | ● | ● P−26 | | ● 2013 2014 2015 2016 2017 2013 2014 2015 2016 2017 Date Figure 5: Synchronizing Changes of PL7.2 and PL7.2.1 systems. from their origin to the system. This is a form of inde- raises the question; how do we distinguish between pendent evolution, a pattern of commits where clones these types of divergence, and how do they affect anal- diverge throughout the studied time-interval. How- ysis tools and techniques? Analyzing differences be- ever, some clones were eventually synchronized which tween variants is the primary activity performed when is a form of late propagation, a pattern of commits migrated to a more structured software product line where clones diverge, and later in time converge again approach. Based on these differences, variants can be after changes are propagated (Schmorleiz and Lammel, merged into a single variant or points where variation 2016). is needed can be identified. In the context of varia- Thummalapenta et al. (2010) studied clone evolu- tion analysis, differences caused by late propagation tion patterns for cloning in-the-small, and confirmed are not necessarily relevant. 
the possibility of late propagation being misclassified as independent evolution. However, they found that 5.3 Synchronization late propagation patterns always took place in much less time than their total time interval of observations, RQ3. Have all MES-Toolbox systems been synchro- thus concluded that such misclassification would occur nized with their origin? only rarely. Our data suggest that cloning in-the-large To detect whether a change to either PL7.2.1 or PL7.2 may be much more susceptible to misclassification, as was contributed by one of the systems, we identify in our case the systems are often synchronized at ar- revisions that caused at least one system to converge bitrary points in time. System P8 did not retrieve any one line. In the combined history of PL7.2.1 and PL7.2 , new changes from its origin for almost a year, after there were 501 revisions for which this was the case. which a bulk of changes were propagated at once, re- We manually inspected these revisions and found that ducing the proportion of diverged Java files from 7.5% 372 revisions (74%) were correctly classified as changes to less than 4%. contributed by the converging system(s). Out of these The maintenance overhead caused by divergence 372 revisions, 17 revisions did not have merge-info. due to late propagation is arguably different from di- A detection strategy solely based on the presence of vergence due to customer-specific modifications. This merge-info would have missed these revisions. 8 To detect whether systems retrieved changes from viding quantitative data to support our findings and their origin, we identify for each system, all revisions collaboration with an external supervisor. that have merge-info, and caused at least one Java External Validity Development practices in other file to converge with the origin of the system. 
The organizations that use clone-and-own might have dif- change history of system P4 contained 18 revisions ferent effects on the evolution of the system, which with merge-info, of which 14 caused convergence. Out may lead to different observations. However, some of of these 14 revisions, 12 (85%) were correctly classified our findings are consistent with those of other, inde- as changes retrieved from its origin. pendent studies. Figure 5 shows the synchronizing changes over time. In our analysis of synchronizing changes, we looked We see that all systems retrieved changes from their at the number of synchronizing commits. The number origin at least once, and most but not all systems con- of commits can be affected by the behavior of individ- tributed changes to their origin. This is different from ual developers. Developers can choose to merge each Marlin forks, as Stanciulescu, Schulze, and Wa̧sowski individual revision, or merge a large number revisions (2015) found that 15% of all forks, and 34% of all ac- at once. The first style clearly results in a higher num- tive forks synchronized at least once with the main ber of commits compared to the latter, but arguably Marlin repository. requires more effort too. While all systems retrieve changes from their origin, some do so significantly more frequent than others. 7 Related Work Systems P1 and P3 retrieved changes from their origin Clone Evolution Patterns respectively 202 and 89 times. Furthermore, we see that the period of time between subsequent synchro- Thummalapenta et al. (2010) proposed an approach nizations can be relatively long. For example, system for the identification of the evolution of cloned code P6 retrieved changes from its origin on 22 July 2013, fragments over time and categorized the evolution pat- and 8 months later on 24 March 2014. 
This is con- terns as (a) Consistent Evolution, (b) Late Propaga- sistent with the results in the previous section, where tion, (c) Delayed Propagation, and (d) Independent we identified long time-interval late propagation in the Evolution. In our study, we used these patterns to visualization of divergence over time. characterize some of the change patterns we observed Finally, we observe at least two instances of verti- in the evolution of the product family. For example, cally aligned dots. These patterns can be caused by Delayed Propagation was used as a strategy to vali- multiple systems retrieving changes from their origin date the correctness of changes on some variants, be- roughly at the same time. Manual inspection of these fore propagating them to all variants. Independent patterns shows that both instances were critical bug Evolution was used to keep the variant as-is after the fixes, manually merged to most systems on the same project had been commissioned and the testing phase day. The fact that we do not see many of these ver- had already finished. tical line patterns suggests that mass-synchronization Similar characteristics were found by Stanciulescu, of many systems at once does not happen often in the Schulze, and Wa̧sowski (2015) in a study on the ad- MES-Toolbox product family. vantages and disadvantages of forking using the case of Marlin, an open source firmware for 3D printers. 6 Threats to Validity They found that important bug-fixes were not propa- gated and functionality was sometimes developed more Internal Validity During our study, the MES- than once. Intuitively you may consider these findings Toolbox product family continued to change. To pre- to be bad practices and drawbacks of clone-and-own. vent this change from affecting our results, we obtained However, there are situations where this may be de- a local copy of the repository. 
This local copy of the sirable, as the authors found that “Once the firmware repository was used throughout the study. is configured and running on the printer, new changes We used the merge-info property to determine are not desired”. whether a commit was a merge. Since this prop- In an environment where the potential cost of an erty can be incorrect, we additionally checked whether error can be significant, systems are changed as lit- commits caused systems to converge. We cross- tle as possible when maintained (Cordy, 2003). In checked the precision of this technique by manually a clone-an-own based system, this characteristic can inspecting revisions, and achieved a good precision. be detected by looking for patterns like Independent While the experience of the author as a developer Evolution, the lack of synchronization with the ori- of the system may provide a detailed interpretation gin, or redundant code. This is in line with some of of fine-grained changes, this can cause some bias. We the cloning patterns described by Kapser and Godfrey aimed to reduce this threat as much as possible by pro- (2006). They argued that code duplication can also 9 have benefits, and described the pros and cons in a degree of variation in the implementation of crosscut- catalog of cloning patterns used in real-world systems. ting concerns, we expect that this may also affect the extent to which changes are propagated, and how the Software Ecosystem Characteristics code-bases diverge. Lettner, Angerer, Grünbacher, et al. (2014) studied Marin, Moonen, and Deursen (2005) propose a clas- the relevance of characteristics of Software Ecosystems sification system for crosscutting concerns in terms of in the domain of industrial automation and found some sorts, where a sort is a description based on a num- additional characteristics that according to them are ber of distinctive properties. 
A sort we expect to find of particular importance in the industrial automation often in this case study is Entangled Roles. In Object domain. For example, platform quality characteristics Oriented terminology this sort is defined as Implement like stability and backward compatibility, and long- a method with (entangled) functionality that belongs term platform evolution seemed to be essential to the to a different concern than the main concern of that success of the studied system. One of the reasons for method. A characteristic of clone-and-own is that it this conclusion was that “application engineer B re- allows application engineers to make these kinds of ported that he had to update a ten-year-old version of fine-grained changes quickly. For example, a customer the platform software because an important customer wants to be notified when stock levels exceed a certain had decided to leave out several platform releases and value. If there is no such monitoring system in place, then requested a new feature. This led to significant then the fastest solution can be to add this function- difficulties in merging the old software version with the ality to a method that deals in some way with stock- new functionality.”. Developers of the system we study control. Implementation of a generic solution may ex- have reported similar issues with upgrading customer ceed the level of expertise of the application engineer, systems to a new release. and waiting for a platform engineer to develop the so- In a later study by Lettner, Angerer, Prähofer, et al. lution may take too much time. (2014), the change characteristics and software evo- lution challenges of the same ecosystem were inves- Figueiredo et al. (2009) describe 13 patterns of tigated. The software change taxonomy of Buckley crosscutting concerns identified in three case studies, et al. (2005) was used to describe qualitatively when, one of which was a software product line. 
The authors where, and how changes were made in different parts found that some patterns consistently emerged in sit- of the system and what was affected by changes. The uations with the frequent use of inheritance. They authors found that the ecosystem is subject to both found that this was often the case in product lines continuous and periodic evolution. The core platform because “Program families rely extensively on the use is continuously changed to include new features and of abstract classes and interfaces in order to imple- bug-fixes, while those changes are only periodically ment variabilities. The inappropriate modularization released to platform users. The granularity of these of such crosscutting concerns might lead to future in- changes is reportedly primarily coarse for customer stabilities in the design of the varying modules” requirements, and fine for bug fixes. Propagation of changes is done by hand, and change impact analysis Detection of crosscutting concerns is called aspect is performed manually, based on expert knowledge. mining. Various aspect mining techniques have been The system we study is in the same domain and proposed (Kellens, Mens, and Tonella, 2007; Tourwé seems to be developed similarly. Our study is different and Mens, 2004; Ceccato et al., 2006). For exam- in a sense that we support our findings with visual ple, fan-in analysis looks for crosscutting functional- representations of the evolution of the system. For ity by detecting methods that are explicitly invoked example, we know that in this case changes are also from many methods scattered throughout the code propagated by hand, so we developed a technique to (Marin, Deursen, and Moonen, 2007). History-based show how frequent this is actually done in the MES- concern mining techniques analyze change-history to Toolbox product family. detect which program entities change together fre- quently (Breu and Zimmermann, 2006; Adams, Jiang, Crosscutting Concerns and Hassan, 2010). 
Hashimoto and Mori (2012) devel- oped a tool that improves history-based concern min- A possible area of interest in the analysis of clone- ing by combining it with fine-grained change analysis and-own evolution is the presence and development of based on abstract syntax tree differencing. crosscutting concerns in the system. A crosscutting concern is a feature whose implementation is spread In future work, we intend to use these tools and across many modules (Marin, Deursen, and Moonen, techniques to gain a deeper understanding of the 2007). If product variants, or clones, exhibit a high change and divergence patterns we found. 10 Clone-and-Own in Product Line Engineering related points of interest. First, we explored whether MES-Toolbox systems have changed in parallel. Next, Dubinsky et al. (2013) studied the processes and per- we investigated how much the codebase of the systems ceived advantages and disadvantages of the clone-and- diverged from their origin, and to what extent this own approach of six industrial software product lines. changed over time. Finally, we studied the synchro- They show that cloning is perceived as a favorable and nization activity between systems and their origins. natural reuse approach by the majority of practition- We observed that many MES-Toolbox systems are ers in the studied companies, mainly because of its changed roughly at the same time, but that the degree simplicity and availability. They found that practi- of parallel change is not the same for all systems, nor tioners lack the awareness and knowledge about forms is it constant over time. Many systems appear to be of reuse, and many alternative approaches fail to con- changed in parallel initially until the development of vince them that they yield better results. one system is done and they no longer change in par- Rubin, Czarnecki, and Chechik (2013) proposed a allel. 
This is consistent with a schedule-driven need framework to organize knowledge related to the devel- for independence. We further observed a schedule- opment, maintenance and merge-refactoring of prod- independent cause for parallel change, which was the uct lines realized via cloning. This framework is a step need to propagate critical bug fixes to many systems towards a recommender system that can assist users in on the same day. This form of mass-synchronization selecting tools and techniques that are useful in their appeared to have occurred only twice in the history of situation. the systems we analyzed. Hetrick, Krueger, and Moore (2006) report on the With regard to divergence, we found that all MES- experience of a structured, incremental transition from Toolbox systems we analyzed, including those which a clone-and-own approach to software product line reportedly hardly required any customer-specific mod- practices. They show that it is possible to make this ifications, diverged significantly from their origin. In transition without a significant upfront investment and terms of the proportion of Java files, all systems di- disruption of the ongoing production schedules. The verged between 7% and 22.5% from their origin. In authors indicate that the file branch factor gradually terms of diverged number lines, most systems did not reduced during the transition, to a point where all exceed 50.000 lines (<5%), and only two systems di- branches from product line core assets were completely verged more than 75.000 lines. We identified one case eliminated. This metric is defined as the average num- where the divergence measured in percentage of Java ber of branched files per product, normalized by the files was significantly different from divergence mea- number of products. Our study shows that the num- sured in terms of number of lines. 
ber of branched files per product can vary significantly During our analysis of divergence over time, we were between systems and over time. Hence, care has to able to identify points in time when systems were syn- be taken when using the average. Furthermore, we chronized with their origin. Our analysis of synchro- found that products with a similar percentage of files nizing changes confirms these findings, and we found diverged can vary significantly in terms of total num- that all systems we analyzed retrieved changes from ber of lines diverged. their origin at least once, but not all systems con- Antkiewicz et al. (2014) propose an incremental and tributed changes back to their origin. minimally invasive strategy for adoption of product- Overall, these results show that products from the line engineering. The strategy is called virtual plat- same product family can vary significantly in terms of form, and should allow organizations to obtain incre- change activity over time, divergence from their ori- mental benefits from incremental changes to the de- gin and synchronization activity. It is important to velopment approach. By studying the development keep this in mind when studying product families re- practices of our industry case, we gain insight into an alized via clone-and-own, as these variations may play industry context and the needs of practitioners. This an important role in reducing maintenance overhead. may serve as input for recommender systems, require- In future work, we will further investigate these factors ments for the virtual platform, and can be helpful to to develop quantitative measures for the assessment of practitioners, researchers and tool developers. clone-and-own benefits and drawbacks. 8 Conclusion Acknowledgements In this work, we presented the results of our ex- We thank prof. dr. J.J. 
Vinju, the reviewers and other ploratory analysis of an industry product family de- participants of the SATToSE 2017 seminar for their veloped using a clone-and-own approach. The goal of helpful input on related literature and the direction of this analysis was to gain insight into how the prod- this study. uct family has evolved, and to identify clone-and-own 11 References Kellens, A., K. Mens, and P. Tonella (2007). “A Survey of Automated Code-Level Aspect Mining Techniques”. Adams, B., Z. M. Jiang, and A. E. Hassan (2010). “Iden- In: Transactions on Aspect-Oriented Software Develop- tifying Crosscutting Concerns Using Historical Code ment IV. Berlin, Heidelberg: Springer Berlin Heidelberg, Changes”. In: Proceedings of the 32nd ACM/IEEE In- pp. 143–162. ternational Conference on Software Engineering - ICSE Lettner, D., F. Angerer, P. Grünbacher, et al. (2014). “Soft- ’10. Vol. 1. ACM, pp. 305–314. ware Evolution in an Industrial Automation Ecosystem: Antkiewicz, M. et al. (2014). “Flexible Product Line En- An Exploratory Study”. In: Software Engineering and gineering with a Virtual Platform”. In: Companion Advanced Applications (SEAA), 2014 40th EUROMI- Proceedings of the 36th International Conference on CRO Conference on. IEEE, pp. 336–343. Software Engineering - ICSE Companion 2014. ACM, Lettner, D., F. Angerer, H. Prähofer, et al. (2014). “A Case pp. 532–535. Study on Software Ecosystem Characteristics in Indus- Berger, T. et al. (2014). “Three Cases of Feature-Based trial Automation Software”. In: Proceedings of the 2014 Variability Modeling in Industry”. In: Lecture Notes in International Conference on Software and System Pro- Computer Science (including subseries Lecture Notes in cess - ICSSP 2014. ACM, pp. 40–49. Artificial Intelligence and Lecture Notes in Bioinformat- Marin, M., A. van Deursen, and L. Moonen (2007). “Iden- ics). Vol. 8767. Springer, pp. 302–319. tifying Crosscutting Concerns Using Fan-In Analysis”. Breu, S. and T. Zimmermann (2006). 
“Mining Aspects In: ACM Transactions on Software Engineering and from Version History”. In: Automated Software Engi- Methodology (TOSEM) 17.1, pp. 1–37. neering, 2006. ASE’06. 21st IEEE/ACM International Marin, M., L. Moonen, and A. van Deursen (2005). Conference on. IEEE, pp. 221–230. “A Classification of Crosscutting Concerns”. In: 21st Buckley, J. et al. (2005). “Towards a Taxonomy of Software IEEE International Conference on Software Mainte- Change”. In: Journal of Software Maintenance and Evo- nance (ICSM’05). IEEE, pp. 673–676. lution: Research and Practice 17.5, pp. 309–332. Rubin, J., K. Czarnecki, and M. Chechik (2013). “Manag- Ceccato, M. et al. (2006). “Applying and Combining Three ing Cloned Variants: A Framework and Experience”. In: Different Aspect Mining Techniques”. In: Software Qual- Proceedings of the 17th International Software Product ity Journal 14.3, pp. 209–231. Line Conference - SPLC ’13. ACM, p. 101. Cordy, J. R. (2003). “Comprehending Reality - Practi- Schmorleiz, T. and R. Lammel (2016). “Similarity manage- cal Barriers to Industrial Adoption of Software Mainte- ment of ’cloned and owned’ variants”. In: Proceedings of nance Automation”. In: Program Comprehension, 2003. the 31st Annual ACM Symposium on Applied Comput- 11th IEEE International Workshop on. IEEE, pp. 196– ing - SAC ’16. New York, New York, USA: ACM Press, 205. pp. 1466–1471. Dubinsky, Y. et al. (2013). “An Exploratory Study of Schrock, S., A. Fay, and T. Jager (2015). “Systematic inter- Cloning in Industrial Software Product Lines”. In: Pro- disciplinary reuse within the engineering of automated ceedings of the European Conference on Software Main- plants”. In: Systems Conference (SysCon), 2015 9th An- tenance and Reengineering, CSMR, pp. 25–34. nual IEEE International, pp. 508–515. Duc, A. N. et al. (2014). “Forking and coordination in Stanciulescu, S., S. Schulze, and A. Wa̧sowski (2015). multi-platform development: a case study”. 
In: Proceed- “Forked and Integrated Variants in an Open-Source ings of the 8th ACM/IEEE International Symposium Firmware Project”. In: 2015 IEEE International Con- on Empirical Software Engineering and Measurement - ference on Software Maintenance and Evolution (IC- ESEM ’14. New York, New York, USA: ACM Press, SME). IEEE, pp. 151–160. pp. 1–10. Thummalapenta, S. et al. (2010). “An Empirical Study on Figueiredo, E. et al. (2009). “Crosscutting Patterns and the Maintenance of Source Code Clones”. In: Empirical Design Stability: An Exploratory Analysis”. In: IEEE Software Engineering 15.1, pp. 1–34. International Conference on Program Comprehension, Tourwé, T. and K. Mens (2004). “Mining Aspectual Views pp. 138–147. using Formal Concept Analysis”. In: Source Code Analy- Hashimoto, M. and A. Mori (2012). “Enhancing History- sis and Manipulation, Fourth IEEE International Work- Based Concern Mining with Fine-Grained Change Anal- shop on. IEEE Comput. Soc, pp. 97–106. ysis”. In: 2012 16th European Conference on Software Yamashita, A. et al. (2017). “Software Evolution and Maintenance and Reengineering. IEEE, pp. 75–84. Quality Data from Controlled, Multiple, Industrial Hetrick, W. A., C. W. Krueger, and J. G. Moore (2006). Case Studies”. In: Proceedings of the 14th International “Incremental Return on Incremental Investment: En- Conference on Mining Software Repositories. IEEE, genio’s Transition to Software Product Line Practice”. pp. 507–510. In: International Conference on Object-Oriented Pro- gramming, Systems, Languages and Applications. ACM, pp. 798–804. Kapser, C. and M. Godfrey (2006). “"Cloning Considered Harmful" Considered Harmful”. In: 2006 13th Working Conference on Reverse Engineering. IEEE, pp. 19–28. 12
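As a supplementary illustration of the two divergence measurements discussed in Section 5.2 (percentage of Java files diverged and number of lines diverged), the following Python sketch computes both for a single snapshot of a system and its origin. It is not the tooling used in this study. The function names, the in-memory snapshot representation (a mapping from file path to a list of source lines), the choice to count every added or removed line of a line-based diff as a diverged line, and the use of the union of Java files on both sides as the denominator are all assumptions made for this example.

```python
import difflib


def diverged_lines(origin_lines, system_lines):
    """Count added plus removed lines between two versions of one file.

    Assumption: a "diverged line" is any line outside the matching blocks
    of a line-based diff. The study may define this differently.
    """
    matcher = difflib.SequenceMatcher(a=origin_lines, b=system_lines, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changed += (i2 - i1) + (j2 - j1)
    return changed


def divergence(origin, system):
    """Return (percentage of Java files diverged, total diverged lines).

    origin, system: dicts mapping file path -> list of source lines
    (a hypothetical snapshot format). A file missing on one side is
    treated as an empty file on that side.
    """
    java_files = {p for p in set(origin) | set(system) if p.endswith(".java")}
    diverged_files = 0
    total_lines = 0
    for path in java_files:
        n = diverged_lines(origin.get(path, []), system.get(path, []))
        if n > 0:
            diverged_files += 1
            total_lines += n
    pct = 100.0 * diverged_files / len(java_files) if java_files else 0.0
    return pct, total_lines


# Toy snapshots: one modified file, one unchanged file, one file added in the clone.
origin = {"A.java": ["a", "b", "c"], "B.java": ["x"], "README.md": ["r"]}
system = {"A.java": ["a", "B", "c"], "B.java": ["x"], "C.java": ["n1", "n2"]}

pct, lines = divergence(origin, system)
print(f"{pct:.1f}% of Java files diverged, {lines} lines")
```

Under these assumptions, a single large deletion or addition increases the line-based measure far more than the file-based one, which is one way the two measures can disagree. Evaluating the function on a series of dated snapshots would yield one point per date, comparable in spirit to the curves in Figure 4.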