8th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2020)

Augmenting Window Contents with Transfer Learning for Effort Estimation

Sousuke Amasaki
Okayama Prefectural University, 111 Kuboki, Soja, 719-1197, Japan

Abstract
BACKGROUND: Some studies showed that filtering out old completed projects with a window was effective for preparing the training dataset of an effort estimation model. Other studies showed that selecting completed projects similar to a target project was also effective. Applying the similarity-based selection after the windowing approach failed to synthesize their effects. A shortage of similar projects in the windowed pool was a potential cause of the failure. AIMS: To examine whether augmenting the window pool is effective for improving estimation accuracy. METHOD: The moving windows approach was used to prepare a window pool. The similarity-based selection was applied to augment the pool. The selection assumes that projects in the pool form a set of virtual target projects. Old projects outside the pool were assumed to form a set of cross-company projects to be selected from. An empirical study with a single-company ISBSG dataset was conducted to evaluate the effect. RESULTS: A positive synergistic effect was observed. The augmented window could synthesize the windowing approach and the similarity-based selection. It could also be combined with the similarity-based selection without performance degradation. CONCLUSIONS: Practitioners should consider adding projects similar to recently completed projects when effort estimation is based on historical data.

Keywords: effort estimation, moving windows, augmenting windows

1. Introduction

The success of software projects relies on many factors. The accuracy of software effort estimation is a seriously influential factor at early project phases. Overestimation and underestimation have caused serious consequences for decades. Researchers have studied data-driven software effort estimation models, while expert judgment is still the primary choice in practice. The accuracy of software effort estimation models is considered insufficient by not a few managers.

Software effort estimation models are affected by the adequacy of historical data from past projects. For instance, an organization's productivity is neither stationary nor monotonic, due to changes in the environment and in the organization itself. Inaccurate effort estimation models would be obtained with historical data that do not reflect the present productivity. A key to accurate software effort estimation is to prepare historical data that reflect the characteristics of the target project to be estimated.

A past study [1] examined two filtering techniques, namely, chronological filtering and relevancy filtering. The chronological filtering [2] removes too-old project data. The relevancy filtering [3] removes project data dissimilar in the metrics used for estimation. The study found that the combination of those techniques might be worse than their independent application.

The negative synergistic effect can be explained in at least two ways. First, the relevancy filtering was applied after the chronological filtering. The chronological filtering does not consider feature variables and may select a subset that does not hold enough projects similar to the target project. It would be better to augment the subset with old but resembling projects using the relevancy filtering. Second, the simple average and median were used as effort estimation models, as discussed in [1]. These simple models use only the effort variable for estimation and are insensitive to changes in the distribution of feature variables after the relevancy filtering.

This paper proposes an augmented chronological filtering based on the chronological filtering and the relevancy filtering. Its effects were investigated with a software effort estimation model that uses feature variables, in addition to the simple average and median models.
The augmented filtering was also evaluated as an alternative chronological filtering in the past combination method. The following questions were asked:

RQ1: Does augmenting moving windows with a relevancy filtering affect the estimation accuracy?

RQ2: Does using the augmentation as a chronological filtering affect the estimation accuracy of the past combination method?

(QuASoQ 2020: 8th International Workshop on Quantitative Approaches to Software Quality, December 01, 2020, Singapore. Email: amasaki@cse.oka-pu.ac.jp (S. Amasaki). ORCID: 0000-0001-8763-3457 (S. Amasaki). © 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.)

2. Related Work

2.1. Chronological Filtering

Although research on software effort estimation models has a long history, relatively few studies have taken the chronological order of projects into consideration. Therefore, chronological filtering has not been studied as well as other topics in effort estimation.

To our knowledge, Kitchenham et al. [4] were the first to suggest the use of chronological filtering. They built four linear regression models with four subsets, each of which comprised projects from a different range of time. As the coefficients of the models differed from each other, they allowed older project data to be dropped. Lokan and Mendes [2] were the first to study the effect of using moving windows in detail. They used linear regression (LR) models and a single-company dataset from the ISBSG repository. Training sets were defined as the 𝑁 most recently completed projects. They found that the use of a window could affect accuracy significantly; predictive accuracy was better with larger windows, and some window sizes were particularly effective. Amasaki and Lokan also investigated the effect of using moving windows with Estimation by Analogy [5] and CART [6]. They found that moving windows could improve estimation accuracy, but the effect differed from that with LR. Recent studies showed that the effect and its extent could be affected by windowing policies [7] and software organizations [8]. Lokan and Mendes [7] investigated the effect on accuracy of using moving windows of various time durations to form the training sets on which effort estimates are based. They also showed that the use of windows based on duration could affect the accuracy of estimates, but to a lesser extent than windows based on a fixed number of projects [8].

2.2. Relevancy Filtering

Relevancy filtering is a type of transfer learning approach. While many filtering approaches have been proposed for cross-project defect prediction (e.g., [9]), few studies on cross-company effort estimation have evaluated the effects of relevancy filtering approaches.

Turhan and Mendes [3] applied a so-called NN-filter [10] to cross-company effort estimation of web projects. They showed that an estimation model based on raw cross-company data was worse than one based on within-company data but was improved to a comparable level by using the NN-filter. Kocaguneli et al. [11, 12, 13] also introduced a transfer learning approach called TEAK for improving cross-company effort estimation. They applied it to transfer old projects to a new project and found that TEAK was effective not only for cross-company effort estimation but also for cross-time effort estimation [14].

NN-filter is based on a nearest neighbor algorithm. In that sense, a study by Amasaki and Lokan [5] can be considered an evaluation of the combination of the relevancy filtering and the chronological filtering. In that study, the combination worked well to improve estimation accuracy for a narrow range of window sizes. While that study used a wrapper approach for feature selection and a logarithmic transformation in addition to the nearest neighbor algorithm, our study aims to explore the effects of the combination without such complicating factors. For that purpose, we adopted two simple estimation techniques that were not adopted in [5], described in the next section.

3. Methodology

3.1. Effort Estimation Techniques

In [1], the average and median were used as software effort estimation models. The average was adopted because it uses the whole training set and is sensitive to the distribution of effort values in the training set. The median was adopted because it is robust to the distribution and contrasts with the average. These models estimate effort without adjustments based on the feature variables of projects.

To examine the difference made by using feature variables in software effort estimation, we also adopted Lasso [15] for our experiment. Lasso is a kind of penalized linear regression model. Past studies on the chronological filtering used Lasso and showed that the chronological filtering was effective with it. Our experiment used LassoLarsIC of the scikit-learn library.

3.2. Chronological Filtering

This study adopted fixed-size moving windows [2] and fixed-duration moving windows [8]. The fixed-size moving windows select the latest 𝑁 finished projects as the training set. The fixed-duration moving windows select the latest projects finished within 𝑁 months. As 𝑁 influences the effectiveness of moving windows, we explored various values, as did past studies.
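The two windowing policies can be sketched as follows. This is a minimal illustration, not the paper's implementation; the record fields (start, end) and the 30.44-day month approximation are assumptions.

```python
from datetime import date, timedelta

def fixed_size_window(history, target_start, n):
    """Fixed-size policy: the latest n projects finished before the target started."""
    done = [p for p in history if p["end"] < target_start]
    done.sort(key=lambda p: p["end"], reverse=True)
    return done[:n]

def fixed_duration_window(history, target_start, months):
    """Fixed-duration policy: projects whose whole life cycle fell within
    the last `months` months before the target started (month ~ 30.44 days)."""
    horizon = target_start - timedelta(days=round(30.44 * months))
    return [p for p in history
            if p["end"] < target_start and p["start"] >= horizon]

projects = [
    {"id": 1, "start": date(2001, 1, 1), "end": date(2001, 6, 1)},
    {"id": 2, "start": date(2001, 3, 1), "end": date(2001, 9, 1)},
    {"id": 3, "start": date(2002, 1, 1), "end": date(2002, 4, 1)},
]

train = fixed_size_window(projects, date(2002, 7, 1), n=2)  # projects 3 and 2
```

Note that both policies depend only on dates, not on feature variables, which is exactly why a window may lack projects similar to the target.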
3.3. Relevancy Filtering

This study used a nearest neighbor algorithm as the relevancy filtering approach. It is also called NN-filter [10]. The procedure of NN-filter is as follows:

1. Select the 𝑘 closest instances of history data to each instance of target project data in terms of unweighted Euclidean distance.
2. Combine the selected instances without duplication.

Note that each feature of the project data was normalized with min-max normalization before the distance calculation.

As a synergistic effect could only be observed with effective filtering, the relevancy filtering had to be configured to be effective. For the average and median effort estimation models, we fixed 𝑘 = 3, the smallest number for which the average and median estimations give distinct efforts. For Lasso, we fixed 𝑘 = 10, half of the minimum of the window sizes we explored. In general, if NN-filter works well, increasing 𝑘 would lead to worse estimation. Hence, these values may not be the best but were expected to be more reasonable than larger values of 𝑘.
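The NN-filter steps above can be sketched as follows: min-max normalization over the pooled rows, then the 𝑘 nearest history rows per target row, unioned without duplicates. The flat numeric feature layout is an assumption for illustration.

```python
import math

def minmax_normalize(rows):
    """Scale each feature column of the pooled rows to [0, 1]."""
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]

def nn_filter(history, targets, k):
    """For each target row, pick its k nearest history rows by unweighted
    Euclidean distance; combine the picks without duplication."""
    pooled = minmax_normalize(history + targets)
    h, t = pooled[:len(history)], pooled[len(history):]
    picked = set()
    for row in t:
        ranked = sorted(range(len(h)), key=lambda i: math.dist(row, h[i]))
        picked.update(ranked[:k])
    return [history[i] for i in sorted(picked)]
```

With 𝑘 = 3 (average/median) or 𝑘 = 10 (Lasso), as configured in this study, the filter returns at most 𝑘 distinct neighbors per target.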
3.4. Augmentation

The augmentation adds old projects selected by the relevancy filtering to the subset obtained by the chronological filtering, as follows:

1. Recently completed projects are selected with the moving windows approach.
2. NN-filter is applied to select projects from the remaining old projects. The most similar project to each of the recently completed projects is selected. The set of selected projects has no duplicates.
3. The selected projects and the recently completed projects are combined.
4. The combined projects are used to train a software effort estimation model.

Note that NN-filter here uses the effort variable in addition to the feature variables. As the efforts of past projects are known, it is possible to use the effort variable in the augmentation process.

The augmentation shares the same assumption as the chronological filtering: that recently completed projects resemble the target project to be estimated. The projects selected by NN-filter are also expected to behave as if they were as fresh as the recently completed projects. Therefore, the selected projects are considered to keep their similarity to the target project.

3.5. Combination

The combination of the chronological filtering and the relevancy filtering was investigated in [1]. They were combined as follows:

1. Recently completed projects are selected with the moving windows approach. The remaining old projects are discarded.
2. NN-filter is applied to select projects from the recently completed projects. The selected projects resemble the target project to be estimated.
3. The selected projects are used to train a software effort estimation model.

In [1], this combination method was found to be less effective than each of the filtering methods alone with the mean and median models.

The augmentation can be considered a variation of the moving windows approach, while it is also a way to combine moving windows and NN-filter. In this paper, this combination was therefore also examined using a subset obtained by the augmentation. As the augmentation yields more projects, NN-filter might bring better neighbors from an augmented subset.
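The augmentation and the combination differ only in which pool NN-filter draws from. A sketch of the augmentation step (one most similar old project per windowed project, pooled without duplicates); the pre-normalized rows of the form [feature1, feature2, effort] are an assumption for illustration:

```python
import math

def nearest_index(row, pool):
    """Index of the pool row closest to `row` (unweighted Euclidean distance)."""
    return min(range(len(pool)), key=lambda i: math.dist(row, pool[i]))

def augment_window(recent, old):
    """Add, for each recent (windowed) project, its single most similar old
    project. The effort variable may be part of the rows, since the efforts
    of past projects are known. A project picked twice is added only once."""
    if not old:
        return list(recent)
    picked = {nearest_index(row, old) for row in recent}
    return list(recent) + [old[i] for i in sorted(picked)]

recent = [[0.1, 0.2, 0.10], [0.2, 0.1, 0.12]]   # [feature1, feature2, effort]
old    = [[0.15, 0.15, 0.11], [0.9, 0.9, 0.50]]
train = augment_window(recent, old)  # both recent rows plus old[0], once
```

Treating the windowed projects as virtual targets, as here, is what lets the old projects play the role of a cross-company pool.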
3.6. Experiment Procedure

As the chronological filtering relies on time proximity, our experiment needs to assume a situation in which a development organization must respond to continuously arriving new projects. The size of the window influences where our experiment starts. As in the past studies, our experiment with a specific window size was conducted as follows:

1. Sort all projects by starting date.
2. For a given window size 𝑁, find the earliest project 𝑝0 for which at least 𝑁 + 1 projects were completed prior to the start of 𝑝0. (Projects from 𝑝0 onwards are the ones whose training set is affected by using a window, so they form the set of evaluation projects for this window size. For example, with a window of 20 projects, at least 21 projects must have finished for the window to differ from the growing portfolio.)
3. For every project 𝑝𝑖 in chronological sequence, starting from 𝑝0, form a training set using moving windows and the growing portfolio (all completed projects).
   - For no filtering, the training set is all projects that finished before 𝑝𝑖 started.
   - For fixed-size moving windows, the training set is the 𝑁 most recent projects that finished before 𝑝𝑖 started. If multiple projects finished on the same date, all of them are included.
   - For fixed-duration moving windows, the training set is the most recent projects whose whole life cycle fell within a window of 𝐷 months prior to the start of 𝑝𝑖.
4. Estimate the effort of the target project based on past project data.
   - For no filtering, the training set from the previous step is used.
   - For relevancy filtering, a subset selected by the nearest neighbor algorithm from the training set is used.
   - For the augmentation method, the training set is augmented with projects chosen by NN-filter from those not selected in the previous step.
5. Evaluate the estimation results.

This study used the single-company subset of the ISBSG dataset that was analyzed in [2, 7, 8, 5, 6, 16]. Table 1 shows summary statistics. We explored window sizes from 20 to 120 projects for the size-based moving windows and from 12 to 84 months for the duration-based moving windows, as in the past study [17]. No filtering, called the growing portfolio in past studies, was used as the baseline for comparing the filtering methods.

Table 1: Summary statistics for ratio-scaled variables in data from a single ISBSG organization

Variable | Min  | Mean  | Median | Max    | StDev
Size     | 10   | 496   | 266    | 6294   | 699
Effort   | 62   | 4553  | 2408   | 57749  | 6212
PDR      | 0.53 | 16.47 | 8.75   | 387.10 | 31.42
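The rolling procedure in steps 1 through 5 can be sketched as a single loop. This is a simplification, not the paper's code: the mean-effort estimator stands in for the estimation models, and the record fields start/end/effort are assumptions.

```python
def rolling_evaluation(projects, window_fn):
    """For each target in chronological order, train on the window of
    previously finished projects and record the absolute error of a
    mean-effort estimate (steps 3-5 above, simplified)."""
    errors = []
    for target in projects:  # projects sorted by starting date (step 1)
        train = window_fn(projects, target["start"])
        if not train:        # before p0 there is nothing to train on
            continue
        estimate = sum(p["effort"] for p in train) / len(train)
        errors.append(abs(estimate - target["effort"]))
    return errors

# Growing portfolio baseline: every project finished before the target started.
growing = lambda hist, start: [p for p in hist if p["end"] < start]
```

Swapping `window_fn` between the growing portfolio, a windowing policy, and an augmented window is what produces the paired error series compared in Section 4.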
3.7. Performance Measures

The accuracy statistics that we used to evaluate the effort estimation models are based on the difference between estimated effort and actual effort. We used the Mean Absolute Error (MAE), which is widely used to evaluate the accuracy of effort estimation models, as it is an unbiased measure that favours neither under- nor over-estimates.

We concentrate first on the statistical significance of the differences in accuracy that arise from using the filtering approaches. To test for statistically significant differences between accuracy measures, we use the two-sided Wilcoxon signed-rank test (the wilcoxon function of the scipy package for Python) and set the statistical significance level at 𝛼 = 0.05. The setting of this study is a typical multiple-testing situation, and the p-values of the tests must be controlled. Bonferroni correction is a popular method for this purpose. However, this simple correction results in a lack of statistical power, especially for effects that are not large. We thus controlled the false discovery rate (FDR) of the multiple testing [18] with the multipletests function of the statsmodels package in Python. The FDR is the expected proportion of falsely rejected null hypotheses among the rejected null hypotheses.
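The paper tests paired absolute errors with scipy's wilcoxon and then controls the FDR via statsmodels' multipletests. The step-up logic behind such FDR control can be sketched in pure Python; this is a plain Benjamini-Hochberg sketch for illustration, not the exact correction method used in the paper.

```python
def fdr_reject(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up: reject the k smallest p-values, where k
    is the largest rank r with p_(r) <= r/m * alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            k = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k:
            reject[i] = True
    return reject

# e.g. p-values from two-sided Wilcoxon signed-rank tests on paired
# absolute errors, one test per window size:
flags = fdr_reject([0.01, 0.04, 0.03, 0.5])  # only the 0.01 test survives
```

Unlike Bonferroni, the per-test threshold grows with the rank of the p-value, which preserves power for the moderate effects this study looks for.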
4. Results and Discussion

4.1. Comparisons between Moving Windows and Augmentation

Figure 1 has six plots showing the difference in mean absolute error against window size for the fixed-size moving windows (baseline) and the augmentation with it. The x-axis of each plot is the size of the window, and the y-axis is the accuracy measure value with the moving windows minus that with the growing approach at the given x-value. The moving windows and the augmentation with it are advantageous where the line is below 0. Circle points mark a statistically significant difference, with the moving windows or the augmentation with it being better than the growing portfolio; at these points, the corresponding FDR-controlled p-value was below 𝛼 = 0.05.

Figure 1: The difference in mean absolute error against moving windows (growing portfolio vs. fixed-size MW and Augmentation). [Plots omitted; panels: (a) MW (average), (b) MW (median), (c) MW (lasso), (d) Augmentation (average), (e) Augmentation (median), (f) Augmentation (lasso).]

Figure 1 revealed the effects of using the fixed-size moving windows and the augmentation, compared to always using the growing portfolio, as follows:

- With average effort estimation, statistically significant differences were found for almost all window sizes. The augmentation did not bring clear changes except for small window sizes, where additional statistically significant differences were found.
- With median effort estimation, no statistically significant difference was found for any window size. The augmentation improved the performance slightly for smaller window sizes but worsened it slightly for larger window sizes. The effects never caused a statistically significant difference.
- With lasso, statistically significant differences were found when the window size was between 85 and 95 or more than 110. The augmentation made the advantages at other window sizes statistically significant; the significant differences at larger window sizes disappeared instead. Note that lasso was more accurate than the other models even when used with the growing portfolio.

These observations suggest that the augmentation could bring a positive synergistic effect on estimation accuracy when applied to fixed-size windows with average or lasso.

Figure 2 plots the same comparisons but using the fixed-duration moving windows. In the figure, square points mark a statistically significant difference, with the fixed-duration moving windows being worse than the growing portfolio.

Figure 2: The difference in mean absolute error against moving windows (growing portfolio vs. fixed-duration MW and Augmentation). [Plots omitted; panels: (a) MW (average), (b) MW (median), (c) MW (lasso), (d) Augmentation (average), (e) Augmentation (median), (f) Augmentation (lasso).]

These plots revealed the effects of the fixed-duration moving windows and the augmentation with it, compared to always using the growing portfolio, as follows:

- With average effort estimation, the effective window range was between 20 months and less than 30 months. The growing portfolio became advantageous for more than 53 months. The augmentation extended the advantageous range to more than 40 months, and the growing portfolio was no longer advantageous for larger window sizes.
- With median effort estimation, the effective window range was more than 60 months. Disadvantageous window sizes were between 55 and 60 months. The augmentation made the statistically significant differences disappear.
- With lasso, there was no significant difference and no clear advantage or disadvantage. The augmentation made no statistically significant difference, while the gap narrowed slightly.

These observations suggest that the augmentation could improve estimation accuracy when applied to fixed-duration windows with average effort estimation.

The answer to RQ1 is yes: augmenting moving windows with a relevancy filtering was useful. At the least, it did not cause an apparent negative synergistic effect, and it sometimes produced positive synergistic effects.
4.2. Evaluation of the Combination of Augmented MW and NN-filter

The combination of the augmented moving windows and the NN-filter was evaluated under the same conditions. The number of neighbors was set to 3 for the average and median effort estimation models and to 10 for lasso, as described in Section 3.3.

Figure 3 has six plots showing the difference in mean absolute error against fixed-size window sizes using the NN-filter and using the combination of the augmented windows and the NN-filter.

Figure 3: The difference in mean absolute error against moving windows (growing portfolio vs. fixed-size MW + Augmentation + NN-filter). [Plots omitted; panels: (a) NN (average), (b) NN (median), (c) NN (lasso), (d) AG + NN (average), (e) AG + NN (median), (f) AG + NN (lasso).]

These plots revealed the effects of using the NN-filter and the augmented moving windows with it, compared to always using the growing portfolio, as follows:

- With average effort estimation, the NN-filter made statistically significant differences for almost all window sizes. Combining the augmented moving windows with the NN-filter made no clear change except for small window sizes, where the differences shrank from about -30% to -20%. The significance of the differences was retained, though.
- With median effort estimation, the NN-filter made no clear change, while it caused positive effects, as depicted by the line running below the zero line for a wide window range. Combining the augmented moving windows with the NN-filter made no clear change except for small window sizes. No statistically significant difference appeared.
- With lasso, the NN-filter made no statistically significant change, though it worsened the performance. Combining the augmented moving windows with the NN-filter mitigated the degradation. Note that the augmentation alone made a significant improvement, as shown in Fig. 1(f); the NN-filter canceled that improvement.

In [1], the combination of the moving windows and NN-filter caused a negative synergistic effect. For example, less than half of the window sizes achieved an improvement of -30% or more when mean effort estimation was applied. With the augmentation, an improvement of -30% or more was achieved for more than half of the range, as shown in Fig. 3(d). Therefore, these observations suggest that the augmented moving windows did not reproduce the negative synergistic effect caused by the combination of fixed-size moving windows and NN-filter.

Figure 4 plots the same comparison but using the fixed-duration moving windows.

Figure 4: The difference in mean absolute error against moving windows (growing portfolio vs. fixed-duration MW + Augmentation + NN-filter). [Plots omitted; panels: (a) NN (average), (b) NN (median), (c) NN (lasso), (d) AG + NN (average), (e) AG + NN (median), (f) AG + NN (lasso).]

These plots revealed the effects of using the NN-filter and the augmented moving windows with it, compared to always using the growing portfolio, as follows:

- With average effort estimation, the NN-filter made statistically significant differences for almost all window sizes. Combining the augmented moving windows with the NN-filter made no clear change.
- With median effort estimation, the NN-filter made no clear change, while it caused positive effects, as depicted by the line running below the zero line for a wide window range. Combining the augmented moving windows with the NN-filter made no clear change.
- With lasso, the NN-filter made no statistically significant change, though it worsened the performance. Combining the augmented moving windows with the NN-filter mitigated the degradation by the NN-filter.

Therefore, these observations suggest that the augmentation did not result in the negative synergistic effect caused by the combination of fixed-duration moving windows and NN-filter. Rather, the degradation by the NN-filter could be mitigated.

The answer to RQ2 is as follows: the combination of the augmented chronological filtering and the relevancy filtering did not bring a negative synergistic effect except for small window sizes. Rather, the negative effect caused by the relevancy filtering was mitigated by the augmentation.
5. Conclusion

We explored the effects of the augmentation and its combination with a relevancy filtering for effort estimation. We confirmed that the augmentation is a useful way to bring out a positive synergistic effect of the chronological filtering and the relevancy filtering. Combining the augmented windows with the relevancy filtering, as in [1], also diminished the negative synergistic effect caused by the combination of the moving windows and NN-filter found in a past study. We thus conclude that the augmentation can be a good way to combine the two filtering approaches and also a good extension of the moving windows, one that can be safely combined with the relevancy filtering.

Further investigation considering other transfer learning approaches is future work. As the NN-filter used for augmentation is a type of transfer learning, it is interesting to examine the effects of other approaches, such as [14], for augmentation. Some transfer learning approaches for cross-project defect prediction [19] can also be applied. The threat to external validity can be mitigated with additional project data.

Acknowledgments

This work was partially supported by JSPS KAKENHI Grant #18K11246.

References

[1] S. Amasaki, Exploring Preference of Chronological and Relevancy Filtering in Effort Estimation, in: Proc. of Profes 2019, Springer, 2019, pp. 247–262.
[2] C. Lokan, E. Mendes, Applying moving windows to software effort estimation, in: Proc. of ESEM 2009, 2009, pp. 111–122.
[3] B. Turhan, E. Mendes, A Comparison of Cross- Versus Single-Company Effort Prediction Models for Web Projects, in: Proc. of SEAA, IEEE, 2014, pp. 285–292.
[4] B. Kitchenham, S. Lawrence Pfleeger, B. McColl, S. Eagan, An empirical study of maintenance and development estimation accuracy, The Journal of Systems & Software 64 (2002) 57–77.
[5] S. Amasaki, C. Lokan, The Effects of Moving Windows to Software Estimation: Comparative Study on Linear Regression and Estimation by Analogy, in: Proc. of IWSM-MENSURA 2012, IEEE, 2012, pp. 23–32.
[6] S. Amasaki, C. Lokan, The Effect of Moving Windows on Software Effort Estimation: Comparative Study with CART, in: Proc. of IWESEP 2014, IEEE, 2014, pp. 1–6.
[7] C. Lokan, E. Mendes, Investigating the Use of Duration-Based Moving Windows to Improve Software Effort Prediction, in: Proc. of APSEC 2012, 2012, pp. 818–827.
[8] C. Lokan, E. Mendes, Investigating the use of duration-based moving windows to improve software effort prediction: A replicated study, Inf. Softw. Technol. 56 (2014) 1063–1075.
[9] S. Herbold, CrossPare: A tool for benchmarking cross-project defect predictions, in: Proc. of 30th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW), IEEE, 2016, pp. 90–95.
[10] B. Turhan, T. Menzies, A. B. Bener, J. Di Stefano, On the relative value of cross-company and within-company data for defect prediction, Empirical Software Engineering 14 (2009) 540–578.
[11] E. Kocaguneli, T. Menzies, How to Find Relevant Data for Effort Estimation?, in: Proc. of ESEM, IEEE, 2011, pp. 255–264.
[12] E. Kocaguneli, G. Gay, T. Menzies, Y. Yang, J. W. Keung, When to use data from other projects for effort estimation, in: Proc. of ASE, ACM, 2010, pp. 321–324.
[13] E. Kocaguneli, T. Menzies, A. B. Bener, J. W. Keung, Exploiting the Essential Assumptions of Analogy-Based Effort Estimation, IEEE Transactions on Software Engineering 38 (2012) 425–438.
[14] E. Kocaguneli, T. Menzies, E. Mendes, Transfer learning in effort estimation, Empirical Software Engineering 20 (2015) 813–843.
[15] R. Tibshirani, Regression shrinkage and selection via the lasso, J. Roy. Statist. Soc. Ser. B (1996) 267–288.
[16] S. Amasaki, C. Lokan, Evaluation of Moving Window Policies with CART, in: Proc. of IWESEP 2016, IEEE, 2016, pp. 24–29.
[17] S. Amasaki, C. Lokan, A Replication of Comparative Study of Moving Windows on Linear Regression and Estimation by Analogy, in: Proc. of PROMISE, ACM Press, 2015, pp. 1–10.
[18] Y. Benjamini, D. Yekutieli, The control of the false discovery rate in multiple testing under dependency, Annals of Statistics 29 (2001) 1165–1188.
[19] S. Herbold, Training data selection for cross-project defect prediction, in: Proc. of PROMISE '13, ACM, 2013, pp. 6:1–6:10.