<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Empirical Study of Fault Introduction Focusing on the Similarity among Local Variable Names</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hirohisa Aman</string-name>
          <email>aman@ehime-u.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tomoyuki Yokogawa</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sousuke Amasaki</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Minoru Kawahara</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Information Technology, Ehime University</institution>
          ,
          <addr-line>Matsuyama, Ehime</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Faculty of Computer Sc. &amp; Systems Eng., Okayama Prefectural University</institution>
          ,
          <addr-line>Soja, Okayama</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>3</fpage>
      <lpage>11</lpage>
      <abstract>
        <p>Well-chosen variable names play significant roles in program comprehension and in high-quality software development and maintenance. However, even when all variables have easy-to-understand names, attention must also be paid to the similarity among those names, because a highly similar pair of variable names may decrease the readability of code and might cause a fault. To analyze the relationship between variable-name similarity and the risk of introducing faults, this paper collects variable data from ten Java open source development projects and conducts an empirical study on the following three research questions: (1) Does the distribution of similarity differ among software development projects? (2) What is the appropriate threshold of similarity? (3) How can the threshold of similarity contribute to fault-prone method prediction? The empirical results show the following findings: (1) the distribution of similarity is nearly identical regardless of the project; (2) programmers should avoid giving similar names to different variables to prevent fault introduction, and the threshold of Levenshtein similarity is 0.35; (3) by classifying Java methods with the above threshold, the risk of overlooking fault-introducing events in the fault-prone prediction model is effectively reduced: the recall of the prediction model improves by 12.8% on average.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>
        Code changes form essential parts of software development
and maintenance, i.e., they drive software evolution. On the
other hand, a code change also carries the risk of introducing
a new fault [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Because a code change is a human intellectual
activity, the risk of fault introduction becomes higher
when the program is more complex and harder to understand.
Hence, the readability of code plays a significant role in
successful development and maintenance [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. By making the
code readable, the programmer can review the code more
clearly and deeply, and may quickly detect a potential fault
if it exists. Moreover, readable code is easy-to-understand for
many other programmers as well, and they can review and
maintain the source code smoothly. One of the most critical
matters for producing readable code is the proper naming of
the variables [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Indeed, one report states that 24%
of code review feedback comments were related to variable naming
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Well-chosen names of variables can be useful clues to the
understanding of what the program does [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and can lead to the
advantages of a low maintenance cost [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. On the other hand,
we can readily decrease the readability and understandability
of programs by selecting meaningless names for the variables.
      </p>
      <p>
        There have been various studies on a better naming of
variables in the past. For example, Lawrie et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and
Scanniello et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] reported empirical results that fully spelled
English words or their abbreviated forms are better for naming
variables in terms of program comprehension and effective
fault detection. Binkley et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] showed that camel-case-style
naming is useful for enhancing understandability.
      </p>
      <p>
        Although the naming of variables has been studied, we
should consider not only the naming of an individual variable
but also the relationship among the names of variables. If
two variables have highly similar names to each other, they
may be easily confused. For example, suppose a method
(or a function) has some local variables including lineIndex
and lineIndent. Although each of these two names looks
easy-to-understand, the two may easily be confused [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
When a programmer uses a sophisticated editor which can
automatically complete or suggest the name of a variable,
the programmer might accidentally select a similar but wrong
variable [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. When we use two or more variables, we should
give more distinguishable names to the variables while keeping
the meaningfulness of the names.
      </p>
      <p>In this paper, we report our quantitative study on the risk
of introducing faults by focusing on the similarity among the
names of the local variables. The key contributions of this
paper are as follows:</p>
      <p>We present a quantitative guideline on local variable
naming to prevent fault introduction: when a programmer
declares two or more local variables in a Java method,
their names should be dissimilar to each other; the
threshold of Levenshtein similarity between variable
names is 0.35.
We empirically show that classifying Java methods by the
above threshold can be helpful in fault introduction
prediction; in particular, it effectively reduces the risk of
overlooking fault introductions: the classification based
on the similarity of local variable names improves the
recall, the precision, and the F value of the random
forest-based fault introduction prediction model by
12.8%, 2.45%, and 4.2% on average, respectively.</p>
      <p>The remainder of this paper is organized as follows: In
Section II, we describe the related work regarding variable names
and present our research motivation and research questions
(RQs). Then, in Section III, we report the empirical work that
we conducted on the RQs. Finally, we give our conclusion and
future work in Section IV.</p>
    </sec>
    <sec id="sec-2">
      <title>II. RELATED WORK AND MOTIVATION</title>
      <p>In this section, we briefly explain the related work focusing
on the names of local variables. Then, we describe our research
motivation and put our research questions (RQs) to clarify our
goal in this paper.</p>
      <sec id="sec-2-1">
        <title>A. Naming Variables</title>
        <p>There have been several empirical studies regarding better
naming of variables in the past.</p>
        <p>
          Lawrie et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] conducted an empirical study focusing on
the comprehensibility of variables from the perspective of the
naming form. They prepared three variations of a variable
name: (a) a fully spelled English word, (b) an abbreviated
form of the word, and (c) a single character, and then got 128
programmers to compare their ease of comprehension. The
empirical results showed that the trend of comprehensibility
is “(a) ≥ (b) &gt; (c),” but there is no statistically significant
difference between (a) and (b). That is, a fully spelled English
word or a well-chosen abbreviated word would be a better
name for a variable than a single-character name. Scanniello
et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] also performed an empirical study regarding the
relationships of the variable naming with the program
comprehension and the fault detection/fix. The empirical results
involving 100 programmers showed a trend similar to the one
reported by Lawrie et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
        <p>
          When a variable has a more complex role, programmers
tend to use a longer and more descriptive name. The
major ways of such naming are the concatenations of two or
more words (or abbreviated words) by the camel case (for
example, indexOfArray) or the snake case (for example,
index_of_array) [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
        <p>Although the above previous studies are remarkable
empirical work to discuss the appropriate naming of variables,
their main focus is on the naming of an individual variable.
In other words, the relationship between names has not been
well-discussed. When we see two or more variables whose
names are similar to each other, we might get confused
by the similar names even if each variable has an
easy-to-understand name.</p>
      </sec>
      <sec id="sec-2-2">
        <title>B. Similar Names of Variables</title>
        <p>
          As mentioned above, when there are two or more local
variables and the names are highly similar to each other,
the code comprehensibility and the readability may be low
even if each of the names is easy-to-understand. Binkley et
al. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] raised a concern about the risk of selecting the wrong variable
when the name is long. Indeed, many programmers may use an
advanced code editor like Atom or an integrated development
environment like Eclipse, which can automatically complete
the name of the variable or line up available variables [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
Then, a programmer might overlook a wrong completion or
choose the wrong candidate (see Fig. 1).
        </p>
        <p>
          Tashima et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] analyzed the issue of similar variable
names from the perspective of the fault-proneness. In work
by Tashima et al., they measured the similarity between two
names by the Levenshtein distance [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. We briefly describe
their approach below. The Levenshtein distance between two
names is defined to be the minimum number of operations
required to change one name to the other name. The available
operations are the following three ones:
to add one character,
to delete one character, and
to replace one character by another character.
        </p>
        <p>For example, we can produce lineIndent from lineIndex
through the following two operations: 1) replace x (in
lineIndex) by n, and 2) add t to the end of it. That is, the Levenshtein
distance between the above two names is two. The smaller
the Levenshtein distance, the higher the similarity between
the names. However, the length of the name also has an
impact on the similarity assessment. For example, although
the Levenshtein distance between file and pipe is also two,
the similarity of (file, pipe) does not look the same as
the similarity of (lineIndex, lineIndent). Hence, Tashima et
al. used the following normalized Levenshtein distance (NLD)
between two names, s1 and s2, in their work:</p>
        <p>NLD(s1, s2) = LD(s1, s2) / max{ℓ(s1), ℓ(s2)},   (1)</p>
        <p>where LD(s1, s2) is the Levenshtein distance between s1 and
s2, and ℓ(si) is the length (character count) of si (for i = 1, 2).</p>
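        <p>The distance computation and the normalization of Eq. (1) can be sketched in Java (the language of the studied projects); the class and method names below (NameSimilarity, levenshtein, nld) are our own illustrative choices, not identifiers from the authors' tooling.</p>
        <preformat><![CDATA[
```java
// Sketch of the Levenshtein distance and the normalized
// Levenshtein distance (NLD) of Eq. (1); identifiers are
// illustrative assumptions, not the authors' implementation.
public class NameSimilarity {

    // Classic dynamic-programming Levenshtein distance:
    // the minimum number of single-character additions, deletions,
    // and replacements turning s1 into s2.
    public static int levenshtein(String s1, String s2) {
        int n = s1.length(), m = s2.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i;
        for (int j = 0; j <= m; j++) d[0][j] = j;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = (s1.charAt(i - 1) == s2.charAt(j - 1)) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,   // delete
                                            d[i][j - 1] + 1),  // add
                                   d[i - 1][j - 1] + cost);    // replace
            }
        }
        return d[n][m];
    }

    // NLD(s1, s2) = LD(s1, s2) / max{len(s1), len(s2)}  -- Eq. (1)
    public static double nld(String s1, String s2) {
        return (double) levenshtein(s1, s2)
                / Math.max(s1.length(), s2.length());
    }
}
```
]]></preformat>
        <p>For the examples above, levenshtein("lineIndex", "lineIndent") and levenshtein("file", "pipe") are both 2, while nld gives 2/10 = 0.2 for the former pair and 2/4 = 0.5 for the latter, reflecting the length effect just discussed.</p>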
        <p>
          Tashima et al. conducted an empirical study using Java
programs and showed that more fault fixes tend to occur in
Java methods which have a pair of local variables with highly
similar names [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Although their study is a useful previous
work which drew attention to the similarity of the variable
names, there are the following two problems to be solved.
1) The fault introduction events were missed: The previous
work used a snapshot when a fault was fixed. It is better
to focus on the fault introduction events rather than the
fault fix events for discussing the relationship between
the similarity of variable names and the fault-proneness.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2) The threshold of the similarity was not discussed: The</title>
        <p>previous work did not discuss the threshold of the
similarity. To construct the guideline about the similarity, we
need to find an appropriate threshold value of similarity.
These two problems are our research motivations in this paper.</p>
      </sec>
      <sec id="sec-2-4">
        <title>C. Research Questions</title>
        <p>To address the two problems mentioned in Section II-B, we
consider the following three research questions (RQs). In this
study, we use the following Levenshtein similarity (LS),</p>
        <p>LS(s1, s2) = 1 − NLD(s1, s2),   (2)</p>
        <p>as our similarity measure because NLD is an inverse measure
of similarity.</p>
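        <p>Since NLD is an inverse measure, Eq. (2) is a one-line complement of it. A minimal self-contained sketch follows; the names LevSim, ld, and ls are our own assumptions.</p>
        <preformat><![CDATA[
```java
// Minimal sketch of the Levenshtein similarity LS of Eq. (2);
// identifiers are illustrative, not from the paper's tooling.
public class LevSim {

    // Two-row dynamic-programming Levenshtein distance.
    static int ld(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // LS(s1, s2) = 1 - NLD(s1, s2)  -- Eq. (2); higher means more similar.
    public static double ls(String s1, String s2) {
        double nld = (double) ld(s1, s2) / Math.max(s1.length(), s2.length());
        return 1.0 - nld;
    }
}
```
]]></preformat>
        <p>With this sketch, ls("lineIndex", "lineIndent") is 0.8 and ls("file", "pipe") is 0.5, so higher values indeed indicate more similar (and hence more confusable) names.</p>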
      </sec>
      <sec id="sec-2-5">
        <title>RQ1: Does the distribution of similarity differ among software development projects?</title>
        <p>Because different projects may have different
developers, the trends of naming variables might also
differ among projects. To discuss an appropriate
threshold value of the similarity between variable
names, we first need to check whether the difference
of projects affects the distribution of similarity or
not. If there is a significant variation in the
distributions, we have to study the appropriate threshold
value for each project separately. Otherwise, we can
discuss the standard threshold value, which can be
commonly used for all projects in this study.</p>
      </sec>
      <sec id="sec-2-6">
        <title>RQ2: What is the appropriate threshold of similarity?</title>
        <p>
          If there are two or more local variables whose names
are highly similar to each other in a Java method,
they are confusable and may adversely affect the
code quality of the method. Although Tashima et al.
[
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] empirically showed that the presence of such a
confusing pair of local variables in a Java method
is related to the fault-proneness of the method, they
did not discuss the appropriate threshold value of the
similarity between variable names. Thus, we explore
the appropriate threshold in this study and present
the result as a guideline about the variable naming.
        </p>
      </sec>
      <sec id="sec-2-7">
        <title>RQ3: How can the threshold of similarity contribute to the fault-prone method prediction?</title>
        <p>Once we obtain an appropriate threshold value of
the similarity between variable names through the
empirical study regarding RQ2, then we can classify
Java methods by the threshold. Thus, we examine
the usefulness of the classification by the
threshold of similarity. More specifically, we build
fault-prone method prediction models with and without
using the above threshold and compare the prediction
performance values between them to evaluate the
usefulness of the classification.</p>
        <p>We conduct an empirical study on the above three RQs in
the following section.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>III. EMPIRICAL STUDY</title>
      <p>In this section, we report the empirical study that we
conducted to address the three RQs mentioned above. First, we
describe our aim and data source in Section III-A and explain
the procedure of our data collection and analysis in Section
III-B. Then, we show our results in Section III-C and give
our discussions about the results in Section III-D. Finally, we
describe our threats to validity in Section III-E.</p>
      <sec id="sec-3-1">
        <title>A. Aim and Data Source</title>
        <p>This study aims to tackle the above three RQs through
an empirical data analysis. To this end, we collect
fine-grained method-level data of code changes from open source
development projects and analyze the risk of introducing faults
in terms of the similarity among the names of local variables.</p>
        <p>We used ten open source development projects as our data
source (see Table I). The main reasons why we use these
projects are as follows:
1) The code repository is Git;
2) The primary development language is Java;
3) The issue (bug) tracking system is Apache JIRA;
4) The developers specify the corresponding issue IDs in the
commit messages when they commit fixed source files
into the code repository.</p>
        <p>
          The reasons 1) and 2) aim to perform a lightweight method
data collection. In this study, we collect the change history
of methods (functions) from a project. To carry out our data
collection effectively, we utilize a Git-based fine-grained code
repository, Historage [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] which manages the source
code at the method level rather than the file level; Git
repositories are convertible to Historage repositories. Because the
supported language of the Historage repository is Java, we focus
only on Java projects in this study.
        </p>
        <p>
          The reasons 2), 3), and 4) are requirements for collecting
fault introduction data. To detect fault introduction events, we
use the well-known SZZ algorithm [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]–[
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] and one of its
implementations, SZZ Unleashed [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. (The studied projects are hosted at
https://{beam,flink,hbase,hive,kafka,rocketmq,storm,zeppelin,zookeeper}.apache.org/
and http://groovy-lang.org/.) The SZZ algorithm
links a reported issue (bug) with the corresponding issue-fix
commits. The SZZ Unleashed is designed to collect issue data
from Apache JIRA and to make the above links to commits
on Git. Because the issue-commit linking is based on whether
or not the issue ID appears in the commit message, the above
reason 4) is a requirement in this study.
        </p>
        <p>We performed a project search on GitHub with the search
keyword “org:apache,” and selected the ten most popular
projects (see Table I) satisfying the above requirements.</p>
      </sec>
      <sec id="sec-3-2">
        <title>B. Procedure</title>
        <p>We collect data of Java methods and their local variables
from each of the studied projects and analyze the collected
data in the following procedure.</p>
      </sec>
      <sec id="sec-3-3">
        <title>1) Fine-grained repository construction:</title>
        <p>
          Because local variables belong to a Java method, we
need to collect data of local variables from each method.
Moreover, to capture the fault introduction events in a
method, we have to examine the code change history
at the method level rather than the file level. Since the
Git repositories of the studied projects maintain the code
change history at file-level, we convert the repositories
into finer-grained method-level repositories—the
Historage repositories [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]—by using the conversion tool,
Kenja [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>2) Fault introduction commit detection:</title>
        <p>
          According to the SZZ algorithm [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]–[
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], we detect the
commits in which the code changes introduced faults;
We use the SZZ Unleashed [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], an implementation of
the SZZ algorithm. In the above detection, we exclude the
following kinds of methods because they are not related to
any fault introduction: Java methods for testing, demos,
and document generation.
        </p>
      </sec>
      <sec id="sec-3-5">
        <title>3) Data collection of the fault-proneness of methods and the names of local variables:</title>
        <p>If a method experiences a fault introduction, we define
the method to be faulty. For a faulty method, we focus
on the commit in which the fault introduction occurred
first and extract the corresponding revision of the method.
Then, for each pair of local variables (including formal
parameters) in the selected revision, we compute the
similarity between the names of variables. To link a computed
similarity with the method in our analysis, we adopt the
highest similarity in a method as the representative value
of the method.</p>
        <p>When a method is not faulty, we single out the latest
revision of the method as a sample for our data analysis
and compute the similarity of local variables as well.
We henceforth denote the highest similarity between local
variable names in method m by HLS(m).</p>
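        <p>The representative value HLS(m) described above can be sketched as the maximum pairwise Levenshtein similarity over a method's local variable names; the class and method names (HlsCalc, hls) are illustrative assumptions, not the authors' implementation.</p>
        <preformat><![CDATA[
```java
import java.util.List;

// Sketch of HLS(m): the highest Levenshtein similarity among all
// pairs of a method's local variable names (formal parameters
// would be included in the list as well). Names are illustrative.
public class HlsCalc {

    // Dynamic-programming Levenshtein distance.
    static int ld(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        return d[a.length()][b.length()];
    }

    // Levenshtein similarity: LS = 1 - LD / max(length).
    static double ls(String a, String b) {
        return 1.0 - (double) ld(a, b) / Math.max(a.length(), b.length());
    }

    // HLS: maximum LS over every unordered pair of names.
    public static double hls(List<String> names) {
        double best = 0.0;
        for (int i = 0; i < names.size(); i++)
            for (int j = i + 1; j < names.size(); j++)
                best = Math.max(best, ls(names.get(i), names.get(j)));
        return best;
    }
}
```
]]></preformat>
        <p>For a method with local variables lineIndex, lineIndent, and count, hls returns 0.8, the similarity of the most confusable pair.</p>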
      </sec>
      <sec id="sec-3-6">
        <title>4) Data analysis for answering RQ1 (Does the distribution of similarity differ among software development projects?):</title>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>The popularity is evaluated by the “stars” score on GitHub.</title>
      <p>
        (Kenja is available at https://github.com/niyaton/kenja.)
We compare the distributions of the computed HLS(·) values across
the studied projects by using the summary statistics—the
minimum, the first quartile (25th percentile), the median,
the mean, the third quartile (75th percentile), and the
maximum—and box plots. Moreover, we randomly
select 140 samples (methods) from each project, i.e.,
1,400 (= 140 × 10) samples in total, and perform the
Kruskal-Wallis test [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] at the significance level 0.05 with
the following hypotheses.
      </p>
      <p>Null hypothesis: There is no difference in the median
similarity across the studied projects.</p>
      <p>Alternative hypothesis: At least one project’s median
similarity is different from the others.</p>
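      <p>The test statistic used here can be sketched as follows; this is a minimal version without the tie-correction factor, the class name KruskalWallis is our own, and obtaining the p-value from the chi-squared distribution is left to a statistics library such as R.</p>
      <preformat><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the Kruskal-Wallis H statistic (no tie
// correction); class and method names are illustrative.
public class KruskalWallis {

    // H = 12 / (N(N+1)) * sum_g n_g * (rbar_g - (N+1)/2)^2,
    // where rbar_g is the mean rank of group g in the pooled sample.
    public static double h(List<double[]> groups) {
        List<double[]> pooled = new ArrayList<>(); // {value, group index}
        for (int g = 0; g < groups.size(); g++)
            for (double v : groups.get(g)) pooled.add(new double[]{v, g});
        pooled.sort((a, b) -> Double.compare(a[0], b[0]));
        int n = pooled.size();

        // Assign average ranks to tied values.
        double[] rankSum = new double[groups.size()];
        int i = 0;
        while (i < n) {
            int j = i;
            while (j < n && pooled.get(j)[0] == pooled.get(i)[0]) j++;
            double avgRank = (i + 1 + j) / 2.0; // mean of 1-based ranks i+1..j
            for (int k = i; k < j; k++) rankSum[(int) pooled.get(k)[1]] += avgRank;
            i = j;
        }

        double sum = 0.0;
        for (int g = 0; g < groups.size(); g++) {
            double ng = groups.get(g).length;
            double meanRank = rankSum[g] / ng;
            sum += ng * (meanRank - (n + 1) / 2.0) * (meanRank - (n + 1) / 2.0);
        }
        return 12.0 / (n * (n + 1.0)) * sum;
    }
}
```
]]></preformat>
      <p>For two completely separated groups {1, 2, 3} and {4, 5, 6}, h returns 27/7 ≈ 3.857; the p-value would then come from the chi-squared distribution with (number of groups − 1) degrees of freedom.</p>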
      <sec id="sec-4-1">
        <title>5) Data analysis for answering RQ2 (What is the appropriate threshold of similarity?):</title>
        <p>Let M be the set of all Java methods to be analyzed.
We consider the following two subsets of M, which are
divided by a threshold θ:</p>
        <p>ML(θ) = { m ∈ M | HLS(m) ≤ θ },</p>
        <p>MH(θ) = { m ∈ M | HLS(m) &gt; θ }.
Now we explore the appropriate threshold of similarity
(θ) as follows.</p>
        <p>First, for a given θ, we compute the following faulty method
rates (FMRL(θ) and FMRH(θ)) in ML(θ) and MH(θ),
respectively:</p>
        <p>FMRx(θ) = |{ m ∈ Mx(θ) | m is faulty }| / |Mx(θ)|,   (3)</p>
        <p>where x ∈ {L, H}.</p>
        <p>Then, we evaluate the effect of the classification by θ,
using the following odds ratio OR(θ):</p>
        <p>OR(θ) = [ FMRH(θ) / {1 − FMRH(θ)} ] / [ FMRL(θ) / {1 − FMRL(θ)} ].   (4)</p>
        <p>A higher OR(θ) means that the classification by θ
is more effective in terms of fault-prone method
detection. We adopt the threshold θ such that OR(θ) has
the highest value as the appropriate threshold.</p>
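        <p>Equations (3) and (4) can be sketched together; FaultyOdds, fmr, and oddsRatio are our own illustrative names, and the sketch assumes both faulty method rates are strictly between 0 and 1 so that the odds are defined.</p>
        <preformat><![CDATA[
```java
// Sketch of the faulty method rate (Eq. (3)) and the odds
// ratio (Eq. (4)) for a threshold theta; names are illustrative.
public class FaultyOdds {

    // Faulty method rate within one side of the split: `high`
    // selects MH(theta) (HLS > theta) vs ML(theta) (HLS <= theta).
    static double fmr(double[] hls, boolean[] faulty, double theta, boolean high) {
        int total = 0, faultyCount = 0;
        for (int i = 0; i < hls.length; i++) {
            boolean inSide = high ? hls[i] > theta : hls[i] <= theta;
            if (inSide) {
                total++;
                if (faulty[i]) faultyCount++;
            }
        }
        return (double) faultyCount / total;
    }

    // OR(theta) = [FMR_H / (1 - FMR_H)] / [FMR_L / (1 - FMR_L)]  -- Eq. (4)
    public static double oddsRatio(double[] hls, boolean[] faulty, double theta) {
        double fmrH = fmr(hls, faulty, theta, true);
        double fmrL = fmr(hls, faulty, theta, false);
        return (fmrH / (1.0 - fmrH)) / (fmrL / (1.0 - fmrL));
    }
}
```
]]></preformat>
        <p>For instance, with six methods whose HLS values are {0.2, 0.3, 0.3, 0.4, 0.5, 0.6} and faulty flags chosen so that FMR_L(0.35) = 1/3 and FMR_H(0.35) = 2/3, the odds ratio is 2 / 0.5 = 4.</p>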
      </sec>
      <sec id="sec-4-2">
        <title>6) Data analysis for answering RQ3 (How can the threshold of similarity contribute to the fault-prone method prediction?):</title>
        <p>
          To examine how the classification by θ can contribute
to the fault-prone method prediction, we compare the
prediction performances of the prediction models with
and without using the similarity-based classification.
Although various mathematical models have been studied
for the fault-prone method prediction in the past, we use
the random forest in this study because it has been widely
known as one of the most promising models [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ].
6a) Examination by random forests without using the
similarity-based classification: First, we build a random
forest for predicting whether a method is faulty or not,
(We decided the sample size “140” by a simulation using random numbers,
where the significance level is 0.05 and the power of the test is 0.8.)
        </p>
        <p>(We use the randomForest function provided by the randomForest package
of R version 3.6.1, with its default settings.)</p>
        <p>
          by using only the fundamental code metrics: the lines of
code (LOC) [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], the cyclomatic complexity (CC) [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ],
and the number of local variables in a method.
        </p>
        <p>Then, we compute the following performance values: the
recall, the precision, and the F value (the harmonic mean
of the recall and the precision). These values form the
baselines of evaluation in our study.</p>
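        <p>The three baseline measures follow directly from confusion-matrix counts; the class PredMetrics and its methods are our own illustrative names.</p>
        <preformat><![CDATA[
```java
// Recall, precision, and F value from confusion-matrix counts;
// class and method names are illustrative assumptions.
public class PredMetrics {

    // recall = TP / (TP + FN): share of faulty methods actually caught.
    public static double recall(int tp, int fn) {
        return (double) tp / (tp + fn);
    }

    // precision = TP / (TP + FP): share of "faulty" predictions that are right.
    public static double precision(int tp, int fp) {
        return (double) tp / (tp + fp);
    }

    // F value: harmonic mean of recall and precision.
    public static double fValue(int tp, int fp, int fn) {
        double r = recall(tp, fn), p = precision(tp, fp);
        return 2.0 * r * p / (r + p);
    }
}
```
]]></preformat>
        <p>For example, 30 true positives, 10 false positives, and 20 false negatives give a recall of 0.6, a precision of 0.75, and an F value of 2/3.</p>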
        <p>
          During the prediction model construction, we have to
pay attention to the imbalance between positive samples
and negative samples: we have fewer faulty
methods than non-faulty ones in our data sets. Such
an imbalance of data often leads to a poor prediction
model. To overcome this issue of imbalanced data, we use
the SMOTE algorithm [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ], which oversamples the minority
by generating similar artificial data and undersamples
the majority to balance the data set. By using such a
balanced data set, we construct the random forest for
predicting faulty methods. Because the behavior of the
SMOTE algorithm depends on random numbers, we
repeat the following two steps 100 times:
6a-1) construct the prediction model, and
6a-2) compute the performance values.
        </p>
        <p>Then, we adopt the mean performance values as the
baselines (see Fig. 2).</p>
      </sec>
      <sec id="sec-4-3">
        <title>6b) Examination by random forests with using the</title>
        <p>similarity-based classification: Next, by obtained at
the step 5, we divide the set of all methods M into
ML( ) and MH ( ). Then, we obtain the mean
performance values of the prediction models by repeating
the following three steps 100 times (see Fig. 3):
6b-1) construct the prediction models for ML( ) and</p>
        <p>MH(θ), respectively,
6b-2) integrate the prediction results produced by the
two models, and
(We use the SMOTE function provided by the DMwR package of R version
3.6.1; to balance the ratio of faulty methods and non-faulty ones as
“fifty-fifty” in our data set, we increase the samples of faulty methods
10-fold through oversampling and randomly choose (undersample) the same
number of non-faulty methods.)</p>
        <p>6b-3) compute the performance values of the prediction.</p>
      </sec>
      <sec id="sec-4-4">
        <title>6c) Comparison of the prediction performances: Finally,</title>
        <p>we evaluate the effect of the similarity-based
classification by comparing the mean performance values obtained
in the step 6a and 6b.</p>
      </sec>
      <sec id="sec-4-5">
        <title>C. Results</title>
        <p>By performing steps 1)–3) described in Section III-B,
we collected 86,104 methods (including constructors) from
the studied ten projects and detected fault introductions by
the SZZ algorithm. Notice that the above methods are the
ones that have two or more local variables (including formal
parameters) because this study focuses on the similarity among
the names of local variables. Table II presents the number of
methods and the number of faulty ones for each project; for example,
in the Beam project, 63 out of 7,491 methods are faulty.</p>
      <p>We show the results of steps 4)–6) below, which
correspond to RQ1, RQ2, and RQ3, respectively. The set
of our empirical data is available at
http://se.cite.ehime-u.ac.jp/data/QuASoQ2019/.</p>
      </sec>
      <sec id="sec-4-6">
        <title>Results regarding RQ1 (Does the distribution of similarity differ among software development projects?)</title>
        <p>Table III presents the summary statistics of the similarity by
the studied projects, and Fig. 4 shows the box plots of them.
From the table and figure, the distributions of similarity in
projects seem to be close to each other.</p>
        <p>Moreover, the p-value of the Kruskal-Wallis test was 0.3387 (&gt;
0.05), so we cannot reject the null hypothesis “There is no
difference in the median similarity across the studied projects.”
That is, there does not seem to be a significant difference in
the central tendency of similarity distribution among projects.</p>
        <p>From the above results, our answer to RQ1 (Does the
distribution of similarity differ among software development
projects?) is: The distribution of similarity is nearly identical
regardless of the project.</p>
      </sec>
      <sec id="sec-4-7">
        <title>Results regarding RQ2 (What is the appropriate threshold of similarity?)</title>
        <p>In the data analysis regarding RQ1, we have seen that the
distribution of similarity is nearly identical regardless of the
project. Hence, we seek an appropriate threshold of similarity
between the names of local variables without distinction of
the project.</p>
        <p>From Table III and Fig. 4, the range of “relatively high
similarity” seems to be between 0.3 and 0.6. Thus, by changing
the threshold θ from 0.3 to 0.6 at intervals of 0.01, we divided
the set of all methods M into ML(θ) and MH(θ), obtained the
faulty method rates FMRL(θ) and FMRH(θ), and computed
the odds ratio of them, OR(θ), as the measure of effect. We
show the change of the odds ratio over θ in Fig. 5.</p>
        <p>In the figure, the odds ratio becomes the highest around
θ = 0.35–0.4, and it drops around θ = 0.5. Table IV presents
a part of the computed odds ratios (OR), the faulty method
rates (FMRs), and the ratio between FMRs; in the table, we
highlight the highest odds ratio and the highest ratio between
FMRs in boldface with a mark.</p>
        <p>In Table IV, six thresholds (θ = 0.34, 0.35, 0.36, 0.37, 0.40,
and 0.41) showed the highest odds ratio, i.e., the highest effect
of classification by that threshold for detecting fault-prone
methods. Because the ratio between faulty method rates also
takes the highest value at θ = 0.35 within these six thresholds,
we consider it the appropriate threshold: θ = 0.35.</p>
        <p>From the above results, our answer to RQ2 (What is
the appropriate threshold of similarity?) is: The appropriate
threshold of similarity is around 0.35.</p>
      </sec>
      <sec id="sec-4-8">
        <title>Results regarding RQ3 (How can the threshold of similarity contribute to the fault-prone method prediction?)</title>
        <p>Table V shows the mean performance values of the random
forest models constructed in steps 6a and 6b and their
improvements obtained by using the similarity-based
classification. In the table, the “Baseline random forest (RF)” column
and the “RF + similarity classification” column correspond to the
mean performance values of the random forests constructed
in step 6a (Fig. 2) and step 6b (Fig. 3), respectively, and
the “Improvement” column presents the improvement rate of the
latter value over the former value:
(latter value − former value) / (former value) × 100 (%). (5)
For example, for the recall in the Beam project, the former value is
0.4814 and the latter one is 0.6208, so the improvement rate is
computed as: (0.6208 − 0.4814)/0.4814 × 100 ≃ +28.95 (%).</p>
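        <p>Equation (5) and the Beam example can be replayed with a few lines (the function name is ours, for illustration):</p>

```python
def improvement_rate(former, latter):
    """Improvement rate of Eq. (5): relative change in percent."""
    return (latter - former) / former * 100.0

# Beam project recall: baseline 0.4814, with similarity classification 0.6208
rate = improvement_rate(0.4814, 0.6208)
print(f"{rate:+.2f} %")  # prints +28.96 %, which the text truncates to +28.95
```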
        <p>In the table, the recall value, the precision value, and the F
value improve by 12.8%, 2.45%, and 4.2%, respectively, on average
over all projects. Moreover, every project shows an improvement
in the recall value as well. On the other hand, the precision values
decrease in six out of ten projects. Because of the reduction of
the precision values in those projects, the F values also get lower
in three out of ten projects.</p>
        <p>As a result, the classification using the above threshold
(θ = 0.35) always works for improving the recall of the
fault-prone method prediction. That is, the similarity-based
classification contributes to reducing the risk of overlooking
fault-introducing events. Even though some projects showed
decreases in the precision and the F values, these measures
also improved on average.</p>
        <p>From the above results, our answer to RQ3 (How can the
threshold of similarity contribute to the fault-prone method
prediction?) is: The threshold contributes to reducing the
risk of overlooking faults in the fault-prone method prediction.</p>
        <p>Moreover, although the precision value might decrease
in some cases, the performance of prediction tends to
improve: the recall by +12.8%, the precision by +2.45%, and the
F value by +4.2%.</p>
      </sec>
      <sec id="sec-4-10">
        <title>D. Discussions</title>
        <p>To answer RQ1, from each Java method in the
studied projects, we collected the highest Levenshtein similarity
(HLS) among the names of local variables in the method and
compared the distribution of HLS values across the projects.
As a result, the distributions seem to be close to each other.
Moreover, there is no significant difference in the median
similarity among the projects. Because the highest similarity tends to
be less than 0.5 in many Java methods, we can say that
programmers would avoid having a pair of highly similar
(confusing) variable names within the same method regardless
of the project. To see this trend intuitively, we show samples of
variable pairs appearing in the studied projects in Table VI. If
the similarity between two names is greater than 0.5, the
names tend to become harder to distinguish from each other because
half or more of their characters are duplicated. In particular, pairs
of variable names whose similarity is around 0.8–0.9 may lead
to an erroneous selection, as such a pair is typically a word and its
plural form, or a pair of almost identical names whose only
difference is a number (for example, 1 or 2). It is better to
avoid using such highly similar pairs
to prevent human errors during the programming activity,
and we should support this by providing a quantitative guideline
and an automated checking tool.</p>
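        <p>Such an automated check could be sketched as follows, assuming similarity is normalized as 1 − d(a, b)/max(|a|, |b|), where d is the Levenshtein distance (the function names are ours, for illustration):</p>

```python
from itertools import combinations

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[len(b)]

def similarity(a, b):
    # Assumed normalization: 1 - distance / length of the longer name.
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

def warn_similar_pairs(names, threshold=0.35):
    """Report local-variable name pairs whose similarity exceeds threshold."""
    return [(a, b, round(similarity(a, b), 2))
            for a, b in combinations(names, 2)
            if similarity(a, b) > threshold]

# "index1"/"index2" differ only in the trailing digit and are flagged.
print(warn_similar_pairs(["index1", "index2", "total"]))
# [('index1', 'index2', 0.83)]
```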
        <p>Next, to answer RQ2, we experimented with the
similarity-based classification of Java methods and evaluated the effect
of classification by the odds ratio in terms of the faulty method
rate, while changing the threshold θ. As a result, the odds ratio
showed the highest value around 0.35–0.4, and we found that
the appropriate threshold is 0.35. As we have seen in Table VI,
pairs of variable names whose similarity is around 0.35–0.4
may have one or more common words in their compound names.
These pairs would not be especially confusing. However, if
they become a more similar pair, i.e., the similarity gets
larger than 0.5, the risk of a human error such as a
mix-up of the variables would also increase. Hence, we consider
θ = 0.35 would be a reasonable threshold for warning about
confusing pairs of variable names.</p>
        <p>Finally, to answer RQ3, we prepared two different
random forest-based prediction models, with and without
the similarity-based classification, and we evaluated how the
prediction model using the similarity (Fig. 3) outperforms
the baseline model (Fig. 2), which does not use the similarity.
As a result, the mean prediction performance values showed
the following improvements: the recall improved by +12.8%,
the precision by +2.45%, and the F value by +4.2%.
Notably, the recall improved in all of the ten projects.
Thus, we consider that the similarity-based classification
helps prevent faults from being overlooked more effectively.
Although the projects Kafka and RocketMQ showed relatively
small improvements (+3.6% and +0.05%) in Table V, the
following two factors would be significant reasons: their recall
values were already at relatively high levels in the baseline
model, and these projects have the fewest faulty methods in the
data set (Table II). On the other hand, the precision values
decreased from the baseline in some projects. In general, if
we predict more objects as positive (i.e., faulty) to prevent
overlooking the true-positive objects, the precision value tends
to decrease. In our data set, because of the small number of
faulty methods (Table II), we cannot avoid a drop in
precision when we increase the number of positive
predictions. Nonetheless, the top four projects in terms
of the faulty rate (Groovy, HBase, Zeppelin, and Zookeeper)
showed improvements in both the recall and the precision.
Therefore, the similarity-based classification tends to help
improve the performance of the prediction model.</p>
        <p>In the experiment for answering RQ3, we used only the
optimal threshold (θ = 0.35) obtained in the previous experiment
regarding RQ2 because we aim to examine the effect of θ
on the fault-prone method prediction model. Although other
threshold values might also have similar effects, a detailed
further investigation using various threshold values is left for
future work.</p>
      </sec>
      <sec id="sec-4-11">
        <title>E. Threats to Validity</title>
        <p>We discuss the threats to validity that can affect our results.</p>
        <p>Conclusion Validity: The naming of variables may depend
on the developers’ experience and preferences, so the
heterogeneity of the projects might have an impact on our empirical
results. Because there was no significant difference in the
median similarity among the studied projects, we performed
our study using a single threshold common to all projects.
However, it may be better to collect more data from more
projects and to analyze the effect of the threshold by
project domain. That is an important part of our future work.</p>
        <p>Internal Validity: We quantitatively analyzed the
relationship between the fault-introducing risk in a Java method and
the similarity among the names of local variables in the
method. However, the observed fault-introducing events may
not always be related to the local variables of interest. In other
words, there might be a fault in a part of the code that is independent
of the variables, and this is a threat to the internal validity. We
need a finer-grained code analysis to link a fault-introducing
event with the local variables in the future.</p>
        <p>
          Construct Validity: Although we measured the similarity
between variable names by using the Levenshtein distance,
it is not the only way of evaluating the similarity. There
are other edit distance metrics such as the longest common
subsequence distance [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] and the Hamming distance [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ],
and the use of a different metric may lead to a different
evaluation of similarity. Moreover, we can consider not only
edit distance but also semantic distance by using the Doc2Vec
[
          <xref ref-type="bibr" rid="ref29">29</xref>
          ] method. A further analysis using other similarity metrics
is an important part of our future work.
        </p>
        <p>To evaluate the effect of the similarity-based classification,
we constructed the random forest together with the SMOTE
algorithm. The selection of parameters in the model
construction may affect the results: We specified the parameters of
oversampling and undersampling in the SMOTE function so
that the ratio between faulty methods and non-faulty methods
becomes fifty-fifty, and used the default parameters in the
randomForest function. Although a different parameter setting
may produce a different result, we did not do any special tuning
to avoid yet another threat to validity, i.e., the issue of
selecting an appropriate setting.</p>
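        <p>The balancing step can be illustrated with a minimal SMOTE-style sketch; this hand-rolled interpolation is not the SMOTE function used in the study and only mirrors its core idea, and the array shapes and names below are toy assumptions:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def smote_like(X_min, n_new):
    """Minimal SMOTE-style oversampling: create n_new synthetic minority
    samples by interpolating between a random minority sample and its
    nearest minority neighbor."""
    synthetic = []
    for _ in range(n_new):
        i = int(rng.integers(len(X_min)))
        dist = np.linalg.norm(X_min - X_min[i], axis=1)
        dist[i] = np.inf                    # exclude the sample itself
        j = int(np.argmin(dist))            # nearest minority neighbor
        gap = rng.random()                  # random point on the segment
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

# Toy feature vectors: 3 faulty (minority) vs. 9 non-faulty methods.
X_faulty = rng.normal(size=(3, 4))
X_clean = rng.normal(size=(9, 4))

# Oversample the minority class until the two classes are fifty-fifty,
# as was done before training the random forest.
X_new = smote_like(X_faulty, n_new=len(X_clean) - len(X_faulty))
X_faulty_balanced = np.vstack([X_faulty, X_new])
```

        <p>The balanced set would then be fed to a random-forest learner; the study used the randomForest function with its default parameters.</p>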
        <p>External Validity: The threat to external validity is that our
data set consists of ten Java open source software projects.
Although we collected data from various projects, our results
might not generalize to all software products. Moreover,
differences in the programming
language may affect the trend of variable naming. Further data
collection and analysis would be needed to mitigate this threat.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>IV. CONCLUSION</title>
      <p>In this paper, we focused on the similarity among the names
of local variables in a Java method. Because the presence
of a highly similar, i.e., “confusing” pair of local variables
may decrease the code readability and cause a human error,
we quantitatively analyzed such a risk by using the data of
fault-introducing events. Through an empirical study using the
data of 86,104 Java methods collected from ten open source
projects, we obtained the following findings:</p>
      <p>To reduce the risk of introducing faults into a Java
method, programmers should avoid giving similar names
to different variables. The threshold of Levenshtein
similarity is 0.35.</p>
      <p>The classification of Java methods by the above
threshold works for reducing the risk of overlooking
fault-introducing events in the fault-prone prediction model;
the recall of the prediction model improves by 12.8%
on average.</p>
      <p>When a programmer develops a Java method having two or
more local variables, it is better to make the names of local
variables dissimilar to each other. If there are highly similar
names, they can be confusing and may decrease the readability,
and consequently, may raise the risk of introducing faults. The
above findings can form a quantitative guideline.</p>
      <p>Our future work includes: 1) further analyzing the impacts
of differences in project domain and programming
language by collecting more project data; 2) developing a
tool or plugin that alerts developers to the presence of
similar variable names, based on our empirical results; and
3) examining the similarity of variable names by using other
metrics, including other edit distance metrics and semantic metrics.</p>
    </sec>
    <sec id="sec-6">
      <title>ACKNOWLEDGMENT</title>
      <p>This work was supported by JSPS KAKENHI #18K11246.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <source>Applied Software Measurement: Global Analysis of Productivity and Quality</source>
          , 3rd ed. New York:
          <string-name>
            <surname>McGraw-Hill</surname>
          </string-name>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Boswell</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Foucher</surname>
          </string-name>
          , The Art of Readable Code:
          <article-title>Simple and Practical Techniques for Writing Better Code</article-title>
          . Sebastopol, CA: Oreilly &amp; Associates,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Deissenboeck</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Pizka</surname>
          </string-name>
          , “Concise and consistent naming,” Softw. Quality J., vol.
          <volume>14</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>261</fpage>
          -
          <lpage>282</lpage>
          , Sept.
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Allamanis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. T.</given-names>
            <surname>Barr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bird</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Sutton</surname>
          </string-name>
          , “
          <article-title>Learning natural coding conventions,”</article-title>
          <source>in Proc. 22nd ACM SIGSOFT Int. Symp. Foundations of Softw</source>
          . Eng.,
          <string-name>
            <surname>Nov</surname>
          </string-name>
          .
          <year>2014</year>
          , pp.
          <fpage>281</fpage>
          -
          <lpage>293</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Corazza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. D.</given-names>
            <surname>Martino</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Maggio</surname>
          </string-name>
          , “
          <article-title>Linsen: An efficient approach to split identifiers and expand abbreviations,”</article-title>
          <source>in Proc. 28th IEEE Int. Conf. Softw. Maintenance, Sept</source>
          .
          <year>2012</year>
          , pp.
          <fpage>233</fpage>
          -
          <lpage>242</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Pigoski</surname>
          </string-name>
          , Practical Software Maintenance:
          <article-title>Best Practices for Managing Your Software Investment</article-title>
          , 1st ed. N.J.: Wiley,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Lawrie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Morrell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Feild</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Binkley</surname>
          </string-name>
          , “
          <article-title>Effective identifier names for comprehension and memory,” Innovations in Syst</article-title>
          . &amp;
          <string-name>
            <surname>Softw</surname>
          </string-name>
          . Eng., vol.
          <volume>3</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>303</fpage>
          -
          <lpage>318</lpage>
          , Dec.
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G.</given-names>
            <surname>Scanniello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Risi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Tramontana</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Romano</surname>
          </string-name>
          , “
          <article-title>Fixing faults in c and java source code: Abbreviated vs. full-word identifier names</article-title>
          ,
          <source>” ACM Trans. Softw. Eng. Methodol.</source>
          , vol.
          <volume>26</volume>
          , no.
          <issue>2</issue>
          , pp.
          <volume>6</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          :
          <fpage>43</fpage>
          ,
          <string-name>
            <surname>Jul</surname>
          </string-name>
          .
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Binkley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lawrie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. I.</given-names>
            <surname>Maletic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Morrell</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Sharif</surname>
          </string-name>
          , “
          <article-title>The impact of identifier style on effort and comprehension</article-title>
          ,” Empir. Softw. Eng., vol.
          <volume>18</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>219</fpage>
          -
          <lpage>276</lpage>
          , Apr.
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>K.</given-names>
            <surname>Tashima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Aman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Amasaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yokogawa</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Kawahara</surname>
          </string-name>
          , “
          <article-title>Fault-prone java method analysis focusing on pair of local variables with confusing names,”</article-title>
          <source>in Proc. 44th Euromicro Conf. Softw</source>
          . Eng. &amp;
          <string-name>
            <surname>Advanced</surname>
            <given-names>App.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aug</surname>
          </string-name>
          .
          <year>2018</year>
          , pp.
          <fpage>154</fpage>
          -
          <lpage>158</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Binkley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lawrie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Maex</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Morrell</surname>
          </string-name>
          , “
          <article-title>Identifier length and limited programmer memory</article-title>
          ,
          <source>” Sc. Comp. Programming</source>
          , vol.
          <volume>74</volume>
          , no.
          <issue>7</issue>
          , pp.
          <fpage>430</fpage>
          -
          <lpage>445</lpage>
          , May
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>G. C.</given-names>
            <surname>Murphy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kersten</surname>
          </string-name>
          , and L. Findlater, “
          <article-title>How are Java software developers using the Eclipse IDE?” IEEE Softw.</article-title>
          , vol.
          <volume>23</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>76</fpage>
          -
          <lpage>83</lpage>
          ,
          <year>July 2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>D.</given-names>
            <surname>Gusfield</surname>
          </string-name>
          , Algorithms on Strings,
          <source>Trees, and Sequences: Computer Science and Computational Biology</source>
          . Cambridge Univ. Press,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>H.</given-names>
            <surname>Hata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Mizuno</surname>
          </string-name>
          , and T. Kikuno, “Historage:
          <article-title>Fine-grained version control system for java,”</article-title>
          <source>in Proc. 12th Int. Workshop Principles Softw. Evolution &amp; 7th Annual ERCIM Workshop Softw. Evolution, Sep</source>
          .
          <year>2011</year>
          , pp.
          <fpage>96</fpage>
          -
          <lpage>100</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15] --, “
          <article-title>Bug prediction based on fine-grained module histories,”</article-title>
          <source>in Proc. 34th Int. Conf. Softw</source>
          . Eng.,
          <string-name>
            <surname>Jun</surname>
          </string-name>
          .
          <year>2012</year>
          , pp.
          <fpage>200</fpage>
          -
          <lpage>210</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Śliwerski</surname>
          </string-name>
          , T. Zimmermann, and
          <string-name>
            <given-names>A.</given-names>
            <surname>Zeller</surname>
          </string-name>
          , “
          <article-title>When do changes induce fixes?</article-title>
          ”
          <source>in Proc. 2005 Int. Workshop Mining</source>
          Softw. Repositories, May
          <year>2005</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>C.</given-names>
            <surname>Williams</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Spacco</surname>
          </string-name>
          , “
          <article-title>SZZ revisited: Verifying when changes induce fixes,”</article-title>
          <source>in Proc. Workshop on Defects in Large Softw. Systems</source>
          , Jul.
          <year>2008</year>
          , pp.
          <fpage>32</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D. A.</given-names>
            da
            <surname>Costa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>McIntosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Shang</surname>
          </string-name>
          , U. Kulesza,
          <string-name>
            <given-names>R.</given-names>
            <surname>Coelho</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A. E.</given-names>
            <surname>Hassan</surname>
          </string-name>
          , “
          <article-title>A framework for evaluating the results of the szz approach for identifying bug-introducing changes</article-title>
          ,
          <source>” IEEE Trans. Softw</source>
          . Eng., vol.
          <volume>43</volume>
          , no.
          <issue>7</issue>
          , pp.
          <fpage>641</fpage>
          -
          <lpage>657</lpage>
          ,
          <year>July 2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>M.</given-names>
            <surname>Borg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. C.</given-names>
            <surname>Svensson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Berg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Hansson</surname>
          </string-name>
          , “SZZ Unleashed:
          <article-title>An open implementation of the SZZ algorithm - featuring example usage in a study of just-in-time bug prediction for the jenkins project</article-title>
          ,” arXiv:1903.01742,
          <string-name>
            <surname>Mar</surname>
          </string-name>
          .
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>K.</given-names>
            <surname>Fujiwara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hata</surname>
          </string-name>
          , E. Makihara,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fujihara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Nakayama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Iida</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          K. ichi Matsumoto, “
          <article-title>Kataribe: A hosting service of historage repositories,”</article-title>
          <source>in Proc. 11th Working Conf. Mining Softw</source>
          . Repositories, May
          <year>2014</year>
          , pp.
          <fpage>380</fpage>
          -
          <lpage>383</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>W. W.</given-names>
            <surname>Daniel</surname>
          </string-name>
          , Applied Nonparametric Statistics, 2nd ed. Boston, MA: Cengage Learning,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>T. K. Ho</surname>
          </string-name>
          , “
          <article-title>The random subspace method for constructing decision forests,”</article-title>
          <source>IEEE Trans. Pattern Analysis &amp; Machine Intelligence</source>
          , vol.
          <volume>20</volume>
          , no.
          <issue>8</issue>
          , pp.
          <fpage>832</fpage>
          -
          <lpage>844</lpage>
          , Aug.
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lessmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Baesens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mues</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Pietsch</surname>
          </string-name>
          , “
          <article-title>Benchmarking classification models for software defect prediction: A proposed framework and novel findings,”</article-title>
          <source>IEEE Trans. Softw</source>
          . Eng., vol.
          <volume>34</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>485</fpage>
          -
          <lpage>496</lpage>
          ,
          <year>July 2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>N. E.</given-names>
            <surname>Fenton</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Ohlsson</surname>
          </string-name>
          , “
          <article-title>Quantitative analysis of faults and failures in a complex software system,”</article-title>
          <source>IEEE Trans. Softw. Eng.</source>
          , vol.
          <volume>26</volume>
          , no.
          <issue>8</issue>
          , pp.
          <fpage>797</fpage>
          -
          <lpage>814</lpage>
          , Aug.
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>T. J.</given-names>
            <surname>McCabe</surname>
          </string-name>
          , “
          <article-title>A complexity measure,”</article-title>
          <source>IEEE Trans. Softw. Eng.</source>
          , vol. SE-
          <volume>2</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>308</fpage>
          -
          <lpage>320</lpage>
          , Dec.
          <year>1976</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>N. V.</given-names>
            <surname>Chawla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. W.</given-names>
            <surname>Bowyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. O.</given-names>
            <surname>Hall</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. P.</given-names>
            <surname>Kegelmeyer</surname>
          </string-name>
          , “
          <article-title>SMOTE: Synthetic minority over-sampling technique,”</article-title>
          <source>J. Artificial Intelligence Research</source>
          , vol.
          <volume>16</volume>
          , pp.
          <fpage>321</fpage>
          -
          <lpage>357</lpage>
          , Jun.
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>L.</given-names>
            <surname>Bergroth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hakonen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Raita</surname>
          </string-name>
          , “
          <article-title>A survey of longest common subsequence algorithms,”</article-title>
          in
          <source>Proc. 7th Int. Symp. String Processing &amp; Inf. Retrieval</source>
          , Sep.
          <year>2000</year>
          , pp.
          <fpage>39</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>D. J.</given-names>
            <surname>MacKay</surname>
          </string-name>
          ,
          <source>Information Theory, Inference, and Learning Algorithms</source>
          . Cambridge: Cambridge University Press,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Lau</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Baldwin</surname>
          </string-name>
          , “
          <article-title>An empirical evaluation of doc2vec with practical insights into document embedding generation,”</article-title>
          in
          <source>Proc. 1st Workshop on Representation Learning for NLP</source>
          , Aug.
          <year>2016</year>
          , pp.
          <fpage>78</fpage>
          -
          <lpage>86</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>