=Paper= {{Paper |id=Vol-2767/paper02 |storemode=property |title=Augmenting Window Contents with Transfer Learning for Effort Estimation |pdfUrl=https://ceur-ws.org/Vol-2767/01-QuASoQ-2020.pdf |volume=Vol-2767 |authors=Sousuke Amasaki |dblpUrl=https://dblp.org/rec/conf/apsec/Amasaki20 }} ==Augmenting Window Contents with Transfer Learning for Effort Estimation== https://ceur-ws.org/Vol-2767/01-QuASoQ-2020.pdf
                                                  8th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2020)




Augmenting Window Contents with Transfer Learning
for Effort Estimation
Sousuke Amasaki
Okayama Prefectural University, 111 Kuboki, Soja, 719-1197, Japan



Abstract
BACKGROUND: Some studies showed that filtering out old completed projects with a window was effective for preparing the training dataset of an effort estimation model. Other studies showed that selecting completed projects similar to a target project was also effective. Applying the similarity-based selection after the windowing approach failed to synthesize their effects. A shortage of similar projects in the windowed pool was a potential cause of the failure. AIMS: To examine whether augmenting the window pool is effective in improving the estimation accuracy. METHOD: The moving windows approach was used to prepare a window pool. The similarity-based selection was applied to augment the pool. The selection assumes that projects in the pool form a set of virtual target projects. Old projects outside the pool were assumed to form a set of cross-company projects to be selected. An empirical study with single-company ISBSG data was conducted to evaluate the effect. RESULTS: A positive synergistic effect was observed. The augmented window could synthesize the windowing approach and the similarity-based selection. It could also be combined with the similarity-based selection without performance degradation. CONCLUSIONS: Practitioners should consider adding projects similar to recently completed projects when effort estimation is based on historical data.

                                          Keywords
                                          effort estimation, moving windows, augmenting windows


1. Introduction

The success of software projects relies on many factors. The accuracy of software effort estimation is a seriously influential factor in early project phases. Overestimation and underestimation have caused serious consequences for decades. Researchers have studied data-driven software effort estimation models, although expert judgment is still the primary choice in practice. Not a few managers consider the accuracy of software effort estimation models insufficient.
   Software effort estimation models are affected by the adequacy of historical data from past projects. For instance, an organization's productivity is neither stationary nor monotonic due to changes in the environment and in the organization itself. Historical data that do not reflect the present productivity would yield inaccurate effort estimation models. A key to accurate software effort estimation is thus to prepare historical data that reflect the characteristics of the target project to be estimated.
   A past study [1] examined two filtering techniques, namely, chronological filtering and relevancy filtering. Chronological filtering [2] removes project data that are too old. Relevancy filtering [3] removes project data dissimilar to the target project regarding the metrics used for estimation. The study found that the combination of those techniques might be worse than their independent application.
   The negative synergistic effect can be explained in at least two ways. First, the relevancy filtering was applied after the chronological filtering. Chronological filtering ignores feature variables and may select a subset that does not hold enough projects similar to a target project. It would be better to augment the subset with old but similar projects using the relevancy filtering. Second, the simple average and median were used as effort estimation models, as discussed in [1]. These simple models use only the effort variable for estimation and are insensitive to changes in the distribution of feature variables after the relevancy filtering.
   This paper proposes an augmented chronological filtering based on the chronological filtering and the relevancy filtering. Its effects were investigated with a software effort estimation model that uses feature variables, in addition to the simple average and median models. The augmented filtering was also evaluated as an alternative chronological filtering in the past combination method. The following questions were asked:

RQ1: Does augmenting moving windows with a relevancy filtering affect the estimation accuracy?

RQ2: Does using the augmentation as a chronological filtering affect the estimation accuracy of the past combination method?

QuASoQ 2020: 8th International Workshop on Quantitative Approaches to Software Quality, December 01, 2020, Singapore
email: amasaki@cse.oka-pu.ac.jp (S. Amasaki)
orcid: 0000-0001-8763-3457 (S. Amasaki)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)








2. Related Work

2.1. Chronological Filtering

Although research on software effort estimation models has a long history, relatively few studies have taken into consideration the chronological order of projects. Therefore, chronological filtering has not been studied as well as other topics in effort estimation.
   To our knowledge, Kitchenham et al. [4] were the first to suggest the use of chronological filtering. They built four linear regression models with four subsets, each of which comprised projects from different ranges of time duration. As the coefficients of the models differed from each other, they allowed older project data to be dropped. Lokan and Mendes [2] were the first to study the effect of using moving windows in detail. They used linear regression (LR) models and a single-company dataset from the ISBSG repository. Training sets were defined to be the N most recently completed projects. They found that the use of a window could affect accuracy significantly; predictive accuracy was better with larger windows; and some window sizes were particularly effective. Amasaki and Lokan also investigated the effect of using moving windows with Estimation by Analogy [5] and CART [6]. They found that moving windows could improve the estimation accuracy, but the effect was different from that with LR.
   Recent studies showed that the effect and its extent could be affected by windowing policies [7] and software organizations [8]. Lokan and Mendes [7] investigated the effect on accuracy of using moving windows covering various ranges of time duration to form training sets on which to base effort estimates. They also showed that the use of windows based on duration could affect the accuracy of estimates, but to a lesser extent than windows based on a fixed number of projects [8].

2.2. Relevancy Filtering

Relevancy filtering is a type of transfer learning approach. While many filtering approaches have been proposed for cross-project defect prediction (e.g., [9]), only a few studies on cross-company effort estimation have evaluated the effects of relevancy filtering approaches.
   Turhan and Mendes [3] applied a so-called NN-filter [10] to cross-company effort estimation of web projects. They showed that an estimation model based on raw cross-company data was worse than one based on within-company data but became comparable by using the NN-filter. Kocaguneli et al. [11, 12, 13] also introduced a transfer learning approach called TEAK for improving cross-company effort estimation. They applied it to transfer old projects to a new project and found that TEAK was effective not only for cross-company effort estimation but also for cross-time effort estimation [14].
   The NN-filter is based on a nearest neighbor algorithm. In that sense, a study by Amasaki and Lokan [5] can be considered an evaluation study of the combination of the relevancy filtering and the chronological filtering. In that study, the combination worked well to improve estimation accuracy for a narrow range of window sizes. While that study used a wrapper approach for feature selection and logarithmic transformation in addition to the nearest neighbor algorithm, our study aims to explore the effects of the combination without such complicating factors. For that purpose, we adopted two simple estimation techniques that were not adopted in [5], described in the next section.

3. Methodology

3.1. Effort Estimation Techniques

In [1], the average and the median were used as software effort estimation models. The average was adopted because it uses the whole training set and is sensitive to the distribution of effort values in the training set. The median was adopted because it is robust to the distribution and contrasts with the average. These models estimate effort without adjustments based on feature variables of projects.
   To examine the difference made by using feature variables in software effort estimation, we also adopted Lasso [15] for our experiment. Lasso is a kind of penalized linear regression model. Past studies on chronological filtering used Lasso and showed that chronological filtering was effective with it. Our experiment used LassoLarsIC of the scikit-learn library.

3.2. Chronological Filtering

This study adopted fixed-size moving windows [2] and fixed-duration moving windows [8]. The fixed-size moving windows select the latest N finished projects as a training set. The fixed-duration moving windows select the latest projects finished within N months. As N influences the effectiveness of moving windows, we explored various values, as past studies did.

3.3. Relevancy Filtering

This study used a nearest neighbor algorithm as a relevancy filtering approach. It is also called the NN-filter [10].
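The two baseline techniques of Section 3.1 use only the effort variable. As a minimal sketch (not the paper's code; the list-of-efforts layout is an assumption for illustration):

```python
import statistics

def average_estimate(training_efforts):
    # Mean of the efforts in the training set; uses the whole set and
    # is sensitive to the distribution of effort values.
    return sum(training_efforts) / len(training_efforts)

def median_estimate(training_efforts):
    # Median of the efforts; robust to skewed effort distributions,
    # contrasting with the average.
    return statistics.median(training_efforts)

# With a skewed training set the two estimators diverge:
efforts = [100, 200, 600]
# average_estimate(efforts) -> 300.0, median_estimate(efforts) -> 200
```

For the feature-based model, the paper instead fits LassoLarsIC from scikit-learn, which picks the regularization strength with an information criterion rather than cross-validation.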


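The two windowing policies of Section 3.2 can be sketched as follows; the per-project fields (`start`, `end`, `id`) and the day-based duration are illustrative assumptions, not the paper's implementation:

```python
from datetime import date

def fixed_size_window(projects, target_start, n):
    # Fixed-size policy: the n most recently finished projects that
    # completed before the target project started.
    finished = [p for p in projects if p["end"] < target_start]
    finished.sort(key=lambda p: p["end"])
    return finished[-n:]

def fixed_duration_window(projects, target_start, duration_days):
    # Fixed-duration policy: projects whose whole life cycle fell within
    # the window immediately before the target start. (The paper measures
    # the window in months; days are used here to keep the sketch simple.)
    earliest = date.fromordinal(target_start.toordinal() - duration_days)
    return [p for p in projects
            if p["end"] < target_start and p["start"] >= earliest]

projects = [
    {"id": 1, "start": date(2020, 1, 1), "end": date(2020, 2, 1)},
    {"id": 2, "start": date(2020, 3, 1), "end": date(2020, 4, 1)},
    {"id": 3, "start": date(2020, 5, 1), "end": date(2020, 6, 1)},
]
window = fixed_size_window(projects, date(2020, 7, 1), 2)  # projects 2 and 3
```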






The procedure of the NN-filter is as follows:

   1. Select the k closest instances of historical data to each instance of target project data in terms of unweighted Euclidean distance.
   2. Combine the selected instances without duplication.

Note that each feature of project data was normalized with min-max normalization before the distance calculation.
   As the synergistic effect could only be observed with effective filtering, the relevancy filtering had to be configured to be effective. For the average and median effort estimation models, we roughly fixed k = 3, the smallest number for which the average and median estimations give distinct efforts. For lasso, we roughly fixed k = 10, half of the minimum of the window sizes we explored. In general, increasing k would lead to worse estimation if the NN-filter works well. Hence, these values may not be the best but were expected to be more reasonable than larger values of k.

3.4. Augmentation

The augmentation adds old projects selected by the relevancy filtering into a subset obtained by the chronological filtering, as follows:

   1. Recently completed projects are selected with the moving windows approach.
   2. The NN-filter is applied to select projects from the remaining old projects. The project most similar to each of the recently completed projects is selected. The set of selected projects has no duplicates.
   3. The selected projects and the recently completed projects are combined.
   4. The combined projects are used to train a software effort estimation model.

Note that the NN-filter uses the effort variable in addition to the feature variables. As the efforts of past projects are known, it is possible to use the effort variable in the augmentation process.
   The augmentation shares with the chronological filtering the assumption that recently completed projects resemble the target project to be estimated. The results of the NN-filter are also expected to be as fresh as the recently completed projects. Therefore, the selected projects are considered to keep the similarity to the target project.

3.5. Combination

The combination of the chronological filtering and the relevancy filtering was investigated in [1]. They were combined as follows:

   1. Recently completed projects are selected with the moving windows approach. The remaining old projects are discarded.
   2. The NN-filter is applied to select projects from the recently completed projects. The selected projects resemble the target project to be estimated.
   3. The selected projects are used to train a software effort estimation model.

The combination method was found to be less effective than each of the filtering methods in [1] with the mean and median models.
   The augmentation can be considered a variation of the moving windows approach, while it is also a way to combine moving windows and the NN-filter. In this paper, this combination was also examined using a subset obtained by the augmentation. As the augmented subset has more projects, the NN-filter might bring better neighbors from it.

3.6. Experiment Procedure

As the chronological filtering relies on time proximity, our experiment needs to assume a situation in which a development organization responds to continuously arriving new projects. The size of the windows influences where our experiment starts. As in the past studies, our experiment with a specific window size was conducted as follows:

   1. Sort all projects by starting date.
   2. For a given window size N, find the earliest project p0 for which at least N + 1 projects were completed prior to the start of p0. (Projects from p0 onwards are the ones whose training set is affected by using a window, so they form the set of evaluation projects for this window size. For example, with a window of 20 projects, at least 21 projects must have finished for the window to differ from the growing portfolio.)
   3. For every project pi in chronological sequence, starting from p0, form a training set using moving windows and the growing portfolio (all completed projects).
      • For no filtering, the training set is all projects that finished before pi started.
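The NN-filter steps of Section 3.3, and their use for the augmentation of Section 3.4, can be sketched as follows; feature vectors are assumed to be min-max normalized already, and the plain-list layout is illustrative:

```python
import math

def nn_filter(targets, pool, k):
    # For each target vector, take the indices of its k nearest pool
    # vectors by unweighted Euclidean distance, then union them so the
    # selection contains no duplicates.
    selected = set()
    for t in targets:
        nearest = sorted(range(len(pool)), key=lambda i: math.dist(t, pool[i]))
        selected.update(nearest[:k])
    return selected

def augment_window(window, older, k=1):
    # Augmentation: the windowed projects act as virtual targets, and
    # each pulls its nearest neighbor from the older, discarded projects.
    picked = nn_filter(window, older, k)
    return window + [older[i] for i in sorted(picked)]

window = [[0.0, 0.0], [1.0, 1.0]]             # recently completed projects
older = [[0.1, 0.0], [0.9, 1.0], [5.0, 5.0]]  # projects outside the window
augmented = augment_window(window, older)     # adds the two nearby old projects
```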


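The search for p0 in step 2 of the experiment procedure can be sketched as follows (the integer dates and field names are illustrative assumptions):

```python
def evaluation_set(projects, window_size):
    # Evaluation projects are p0 and everything after it, where p0 is the
    # earliest project with at least window_size + 1 projects completed
    # before it started. The completed-project count is nondecreasing in
    # start date, so a simple scan suffices.
    ordered = sorted(projects, key=lambda p: p["start"])
    evaluable = []
    for p in ordered:
        completed = [q for q in ordered if q["end"] < p["start"]]
        if len(completed) >= window_size + 1:
            evaluable.append(p["id"])
    return evaluable

# Five back-to-back projects: with a window of 2 projects, evaluation
# starts once 3 predecessors have finished.
projects = [{"id": i, "start": 10 * i, "end": 10 * i + 5} for i in range(1, 6)]
# evaluation_set(projects, 2) -> [4, 5]
```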






Table 1                                                         We concentrate first on the statistical significance of
Summary statistics for ratio-scaled variables in data from   differences in accuracy that arise from using the filter-
single ISBSG organization                                    ing approaches. To test for statistically significant dif-
                                                             ferences between accuracy measures, we use the two-
  Variable Min Mean Median                 Max StDev
                                                             sided Wilcoxon signed-rank test (wilcoxon function
  Size          10     496        266      6294     699      of the scipy package for Python) and set the statisti-
  Effort        62    4553       2408    57749     6212      cal significance level at 𝛼 = 0.05. The setting of this
  PDR         0.53   16.47        8.75 387.10      31.42     study is a typical multiple testing, and the p-values of
                                                             the tests must be controlled. Bonferroni correction is a
                                                             popular method for this purpose. However, the adop-
          β€’ For fixed-size moving windows, the train- tion of this simple correction results in the lack of sta-
            ing set is the 𝑁 most recent projects that tistical power, especially for not large effects. We thus
            finished before 𝑝𝑖 started. If multiple projects controlled the false discovery rate (FDR) of multiple
            finished on the same date, all of them are testing [18] with the β€œmultipletests” function of
            included.                                        the statsmodels package in Python. FDR is a ratio
          β€’ For fixed-duration, the training set is the of the number of falsely rejected null hypotheses to
            most recent projects whose whole life cy- the number of rejected null hypotheses.
            cle had fallen within a window of 𝐷 months
            prior to the start of 𝑝𝑖 .
                                                                 4. Results and Discussion
    4. Estimate an effort of a target project based on
       past project data.                                   4.1. Comparisons between Moving
          β€’ For no filtering, the training set from the           Windows and Augmentation
             previous step is used.
                                                            Figure 1 has 6 plots showing the difference in mean
          β€’ For relevancy filtering, a subset selected by absolute error against window sizes using the fixed-
             a nearest neighbor from the training set is size moving windows (baseline) and the augmentation
             used.                                          with it. The x-axis of each figure is the size of the
          β€’ For the augment method, an augmented set window, and the y-axis is the subtraction of the accu-
             of the training set with the projects not se- racy measure value with the growing approach from
             lected in the previous step is used.           that with the moving windows at the given x-value.
    5. Evaluate the estimation results.                     The moving windows and the augmentation with it
                                                            were advantageous where the line is below 0. Circle
   This study used the single-company subset of the points mean a statistically significant difference, with
ISBSG dataset that was analyzed in [2, 7, 8, 5, 6, 16]. Ta- the moving windows or the augmentation with it, be-
ble 1 shows summary statistics. We explored window ing better than the growing portfolio. At these points,
sizes from 20 to 120 projects for the size-based moving the corresponding FDR-controlled p-value was below
windows and from 12 to 84 months for the duration- 𝛼 = 0.05.
based moving windows as well as the past study [17].          Figure. 1 revealed the effect of using the fixed-size
No filtering, called the growing portfolio in past stud- moving windows and the augmentation, compared to
ies, was used as a baseline for comparing the filtering always using the growing portfolio as follows:
methods.
                                                                β€’ With average effort estimation, statistically sig-
                                                                  nificant differences were found for almost all win-
3.7. Performance Measures
                                                                  dow sizes. The augmentation did not bring clear
The accuracy statistics that we used to evaluate the              changes except for small window sizes, where
effort estimation models are based on the difference              additional statistically significant differences were
between estimated effort and actual effort. We used               found.
Mean Absolute Error (MAE), which is widely used to
evaluate the accuracy of effort estimation models, as           β€’ With median effort estimation, no statistically
it is an unbiased measure that favours neither under-             significant difference was found for all window
nor over-estimates.                                               sizes. The augmentation improved the perfor-
                                                                  mance a bit for smaller window sizes but wors-
                                                                  ened it a bit for larger window sizes. The ef-




                                                             7
                                               8th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2020)




                                          60                                                                                          60                                                                                         60

                                          50                                                                                          50                                                                                         50

                                          40                                                                                          40                                                                                         40




             Differences in mean AE(%)




                                                                                                         Differences in mean AE(%)




                                                                                                                                                                                                    Differences in mean AE(%)
                                          30                                                                                          30                                                                                         30

                                          20                                                                                          20                                                                                         20

                                          10                                                                                          10                                                                                         10

                                           0                                                                                           0                                                                                          0

                                         βˆ’10                                                                                         βˆ’10                                                                                        βˆ’10

                                         βˆ’20                                                                                         βˆ’20                                                                                        βˆ’20

                                         βˆ’30                                                                                         βˆ’30                                                                                        βˆ’30
                                               20    40         60             80            100   120                                     20   40         60             80            100   120                                     20       40         60             80            100   120
                                                          Window Size (number of projects)                                                           Window Size (number of projects)                                                               Window Size (number of projects)



                                                    (a) MW (average)                                                                            (b) MW (median)                                                                                 (c) MW (lasso)

                                          60                                                                                          60                                                                                         60

[Figure 1, panels (d)–(f): Augmentation (average), Augmentation (median), Augmentation (lasso). Each panel plots differences in mean AE (%) against window size (number of projects).]

Figure 1: The difference in mean absolute error against moving windows (growing portfolio vs. fixed-size MW and Augmentation)
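For readers unfamiliar with the two windowing policies compared in these figures, a minimal sketch may help. The field names (`finish`, `id`) and the 30.44-days-per-month approximation are illustrative assumptions, not the paper's implementation:

```python
from datetime import date, timedelta

def fixed_size_window(history, target_start, n):
    """Keep the n most recently completed projects finished before the target starts."""
    done = [p for p in history if p["finish"] < target_start]
    done.sort(key=lambda p: p["finish"], reverse=True)
    return done[:n]

def fixed_duration_window(history, target_start, months):
    """Keep projects completed within the last `months` calendar months
    (approximated here as 30.44 days per month)."""
    cutoff = target_start - timedelta(days=round(30.44 * months))
    return [p for p in history if cutoff <= p["finish"] < target_start]

# Toy history: each record is a completed project with a finish date.
history = [
    {"id": 1, "finish": date(2019, 1, 15)},
    {"id": 2, "finish": date(2019, 9, 1)},
    {"id": 3, "finish": date(2020, 3, 10)},
]
recent = fixed_size_window(history, date(2020, 6, 1), n=2)
```

The growing portfolio corresponds to omitting the cutoff entirely: every project completed before the target start is retained as training data.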



      fects never caused a statistically significant difference.

    • With lasso, statistically significant differences were found when the window size is between 85 and 95 or more than 110. The augmentation made the advantages at other window sizes statistically significant; the significant differences at larger window sizes disappeared instead. Note that lasso was more accurate than the others even when used with the growing portfolio.

These observations suggested that the augmentation could bring a positive synergistic effect on the estimation accuracy when applied to fixed-size windows with average or lasso.
   Figure 2 plotted the same comparisons but using the fixed-duration moving windows. In the figure, square points mark a statistically significant difference, with the fixed-duration moving windows being worse than the growing portfolio. These figures revealed the effects of the fixed-duration moving windows and the augmentation with it, compared to always using the growing portfolio, as follows:

    • With average effort estimation, the effective window range was between 20 months and less than 30 months. The growing portfolio became advantageous for more than 53 months. The augmentation extended the advantageous range to more than 40 months. The growing portfolio was no longer advantageous for larger window sizes.

    • With median effort estimation, the effective window range was more than 60 months. Disadvantageous window sizes were between 55 and 60 months. The augmentation made the statistically significant differences disappear.

    • With lasso, there was no significant difference; there was no clear advantage nor disadvantage. The augmentation made no statistically significant difference, though the gap narrowed slightly.

These observations suggested that the augmentation could improve the estimation accuracy when applied to fixed-duration windows with average effort estimation.
   The answer to RQ1 is yes: augmenting moving windows with a relevancy filtering was useful. At the least, it did not cause an apparent negative synergistic effect, and it sometimes made positive synergistic effects.

4.2. Evaluation of Combination of Augmented MW and NN-filter

The combination of the augmented moving windows and the NN-filter was evaluated under the same situations. The number of neighbors was set to 3 for av-








[Figure 2, panels (a)–(c): MW (average), MW (median), MW (lasso); panels (d)–(f): Augmentation (average), Augmentation (median), Augmentation (lasso). Each panel plots differences in mean AE (%) against window size (calendar months).]

Figure 2: The difference in mean absolute error against moving windows (growing portfolio vs. fixed-duration MW and Augmentation)
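The augmentation itself, as described in the study design (in-window projects act as virtual targets; older, out-of-window projects act as cross-company candidates), can be sketched as follows. The Euclidean distance, the feature set, and k = 1 are illustrative assumptions; the paper's actual similarity measure and selection count are defined in its method section, not restated here:

```python
import math

def distance(a, b, keys):
    """Euclidean distance over the chosen (already normalized) features."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))

def augment_window(window, older, keys, k=1):
    """Treat each in-window project as a virtual target and pull in its k
    most similar older, out-of-window projects ('cross-company' candidates)."""
    picked = {}
    for target in window:
        for cand in sorted(older, key=lambda p: distance(p, target, keys))[:k]:
            picked[cand["id"]] = cand  # de-duplicate repeated selections
    return window + list(picked.values())

window = [{"id": 10, "size": 100.0}]
older = [{"id": 1, "size": 90.0}, {"id": 2, "size": 300.0}]
pool = augment_window(window, older, keys=["size"], k=1)
```

The de-duplication step matters: several virtual targets may select the same old project, which should enter the augmented pool only once.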


[Figure 3, panels (a)–(c): NN (average), NN (median), NN (lasso); panels (d)–(f): AG + NN (average), AG + NN (median), AG + NN (lasso). Each panel plots differences in mean AE (%) against window size (number of projects).]

Figure 3: The difference in mean absolute error against moving windows (growing portfolio vs. fixed-size MW + Augmentation + NN-filter)



erage and median effort estimation models and 10 for lasso models, as described in Section 3.3.
   Figure 3 has six plots showing the difference in mean absolute error against fixed-size window sizes, using the NN-filter alone and using the combination of the augmented windows and the NN-filter. These figures revealed the effects of using the NN-filter and the augmented moving windows with it, compared to always using the growing portfolio, as follows:

    • With average effort estimation, the NN-filter made statistically significant differences for almost all window sizes. Combining the augmented moving windows with the NN-filter made no clear change except for small window sizes, where the








[Figure 4, panels (a)–(c): NN (average), NN (median), NN (lasso); panels (d)–(f): AG + NN (average), AG + NN (median), AG + NN (lasso). Each panel plots differences in mean AE (%) against window size (calendar months).]

Figure 4: The difference in mean absolute error against moving windows (growing portfolio vs. fixed-duration MW + Augmentation + NN-filter)
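The quantity plotted on the vertical axes of all four figures is the difference in mean absolute error relative to the growing portfolio. Its exact formula is not restated in this excerpt; one plausible reading, inferred from the figures' convention that negative values favor the windowed approach, is:

```python
def mean_ae(actuals, estimates):
    """Mean absolute error between actual and estimated efforts."""
    return sum(abs(a - e) for a, e in zip(actuals, estimates)) / len(actuals)

def diff_mean_ae_pct(actuals, est_windowed, est_baseline):
    """Percentage difference in mean AE versus the growing-portfolio baseline.
    Negative values mean the windowed approach produced lower error."""
    base = mean_ae(actuals, est_baseline)
    return 100.0 * (mean_ae(actuals, est_windowed) - base) / base

# Toy example: the windowed model halves the baseline's mean AE.
d = diff_mean_ae_pct([100.0, 200.0], [110.0, 190.0], [120.0, 180.0])
```

Under this reading, the "−30% or more" improvements discussed below correspond to the windowed configuration cutting mean AE by at least 30% relative to the baseline.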



       differences got smaller from about -30% to -20%. tic effect caused by the combination of fixed-size mov-
       The significance of the differences was retained, ing windows and NN-filter.
       though.                                                Figure 4 plotted the same comparison but using the
                                                          fixed-duration moving windows. These figures revealed
    β€’ With median effort estimation, the NN-filter made the effects of using the NN-filter and the augmented
       no clear change while it caused positive effects moving windows with it, compared to always using
       as depicted by the line running below the zero the growing portfolio as follows:
       line for a wide window range. Combining the
       augmented moving windows with the NN-filter              β€’ With average effort estimation, NN-filter made
made no clear change except for small window sizes. No statistically significant difference appeared.

   • With lasso, the NN-filter made no statistically significant change, though it worsened the performance. Combining the augmented moving windows with the NN-filter mitigated the degradation. Note that the augmentation yielded a significant improvement, as shown in Fig. 1(f); the NN-filter canceled that improvement.

In [1], the combination of the moving windows and the NN-filter caused a negative synergistic effect. For example, fewer than half of the window sizes achieved an improvement of -30% or more when mean effort estimation was applied. With the augmentation, an improvement of -30% or more was obtained for more than half of the range, as shown in Fig. 3(d). Therefore, these observations suggested that the augmented moving windows did not result in a negative synergistic effect.

statistically significant differences for almost all window sizes. Combining the augmented moving windows with the NN-filter made no clear change.

   • With median effort estimation, the NN-filter made no clear change, while it caused positive effects, as depicted by the line running below the zero line for a wide range of window sizes. Combining the augmented moving windows with the NN-filter made no clear change.

   • With lasso, the NN-filter made no statistically significant change, though it worsened the performance. Combining the augmented moving windows with the NN-filter mitigated the degradation caused by the NN-filter.

Therefore, these observations suggested that the augmentation did not result in a negative synergistic effect from the combination of fixed-size moving windows and the NN-filter. Rather, the degradation caused by the NN-filter could be mitigated.
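To make the procedure behind these observations concrete, the following is a minimal sketch, not the paper's actual implementation, of combining a fixed-size moving window with NN-filter-style augmentation. The tuple representation of a project, the Euclidean distance, and k = 1 are illustrative assumptions only.

```python
# Minimal sketch (illustrative assumptions): a project is represented
# as a tuple (feature_vector, effort, finish_date).

import math


def euclidean(a, b):
    # Distance between two feature vectors of equal length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def moving_window(projects, window_size):
    # Fixed-size moving window: keep only the `window_size` most
    # recently finished projects as the training pool.
    recent = sorted(projects, key=lambda p: p[2], reverse=True)
    return recent[:window_size]


def nn_augment(window, older, k=1):
    # Augmentation: treat each windowed project as a virtual target and
    # pull its k nearest older (outside-the-window) projects, ranked by
    # feature distance, back into the training pool without duplicates.
    selected = []
    for target in window:
        ranked = sorted(older, key=lambda p: euclidean(p[0], target[0]))
        for p in ranked[:k]:
            if p not in selected:
                selected.append(p)
    return window + selected
```

Under this sketch, a small windowed pool can regain similar historical projects that pure chronological filtering would have discarded, which is the shortage the augmentation is meant to address.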




   The answer to RQ2 is as follows: The combination of the augmented chronological filtering and the relevancy filtering did not bring a negative synergistic effect except for small window sizes. Rather, the negative effect caused by the relevancy filtering was mitigated by the augmentation.


5. Conclusion

We explored the effects of the augmentation and of its combination with relevancy filtering for effort estimation. We confirmed that the augmentation was a useful way to bring about a positive synergistic effect between the chronological filtering and the relevancy filtering. Combining the augmented windows with the relevancy filtering, as in [1], also diminished the negative synergistic effect caused by the combination of the moving windows and the NN-filter found in a past study. We thus concluded that the augmentation can be a good way to combine the two filtering approaches and also a good extension of the moving windows that can be safely combined with the relevancy filtering.
   Further investigation considering other transfer learning approaches is future work. As the NN-filter used for augmentation is a type of transfer learning, it would be interesting to examine the effects of other approaches, such as [14], for augmentation. Some transfer learning approaches for cross-project defect prediction [19] can also be applied. The threat to external validity can be mitigated with additional project data.


Acknowledgments

This work was partially supported by JSPS KAKENHI Grant #18K11246.


References

 [1] S. Amasaki, Exploring Preference of Chronological and Relevancy Filtering in Effort Estimation, in: Proc. of Profes 2019, Springer, 2019, pp. 247–262.
 [2] C. Lokan, E. Mendes, Applying moving windows to software effort estimation, in: Proc. of ESEM 2009, 2009, pp. 111–122.
 [3] B. Turhan, E. Mendes, A Comparison of Cross- Versus Single-Company Effort Prediction Models for Web Projects, in: Proc. of SEAA, IEEE, 2014, pp. 285–292.
 [4] B. Kitchenham, S. Lawrence Pfleeger, B. McColl, S. Eagan, An empirical study of maintenance and development estimation accuracy, The Journal of Systems & Software 64 (2002) 57–77.
 [5] S. Amasaki, C. Lokan, The Effects of Moving Windows to Software Estimation: Comparative Study on Linear Regression and Estimation by Analogy, in: Proc. of IWSM-MENSURA 2012, IEEE, 2012, pp. 23–32.
 [6] S. Amasaki, C. Lokan, The Effect of Moving Windows on Software Effort Estimation: Comparative Study with CART, in: Proc. of IWESEP 2014, IEEE, 2014, pp. 1–6.
 [7] C. Lokan, E. Mendes, Investigating the Use of Duration-Based Moving Windows to Improve Software Effort Prediction, in: Proc. of APSEC 2012, 2012, pp. 818–827.
 [8] C. Lokan, E. Mendes, Investigating the use of duration-based moving windows to improve software effort prediction: A replicated study, Inf. Softw. Technol. 56 (2014) 1063–1075.
 [9] S. Herbold, CrossPare: A tool for benchmarking cross-project defect predictions, in: Proc. of 30th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW), IEEE, 2016, pp. 90–95.
[10] B. Turhan, T. Menzies, A. B. Bener, J. Di Stefano, On the relative value of cross-company and within-company data for defect prediction, Empirical Software Engineering 14 (2009) 540–578.
[11] E. Kocaguneli, T. Menzies, How to Find Relevant Data for Effort Estimation?, in: Proc. of ESEM, IEEE, 2011, pp. 255–264.
[12] E. Kocaguneli, G. Gay, T. Menzies, Y. Yang, J. W. Keung, When to use data from other projects for effort estimation, in: Proc. of ASE, ACM, 2010, pp. 321–324.
[13] E. Kocaguneli, T. Menzies, A. B. Bener, J. W. Keung, Exploiting the Essential Assumptions of Analogy-Based Effort Estimation, IEEE Transactions on Software Engineering 38 (2012) 425–438.
[14] E. Kocaguneli, T. Menzies, E. Mendes, Transfer learning in effort estimation, Empirical Software Engineering 20 (2015) 813–843.
[15] R. Tibshirani, Regression shrinkage and selection via the lasso, J. Roy. Statist. Soc. Ser. B (1996) 267–288.
[16] S. Amasaki, C. Lokan, Evaluation of Moving Window Policies with CART, in: Proc. of IWESEP 2016, IEEE, 2016, pp. 24–29.
[17] S. Amasaki, C. Lokan, A Replication of Comparative Study of Moving Windows on Linear Regression and Estimation by Analogy, in: Proc. of PROMISE, ACM Press, 2015, pp. 1–10.
[18] Y. Benjamini, D. Yekutieli, The control of the false discovery rate in multiple testing under dependency, Annals of Statistics 29 (2001) 1165–1188.
[19] S. Herbold, Training data selection for cross-project defect prediction, in: Proc. of PROMISE '13, ACM, 2013, pp. 6:1–6:10.