<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>International Workshop on Quantitative Approaches to Software Quality</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Software Defect Prediction based on JavaBERT and CNN-BiLSTM</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kun Cheng</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shingo Takada</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Grad. School of Science and Technology, Keio University, Yokohama</institution>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>04</volume>
      <issue>2023</issue>
      <fpage>51</fpage>
      <lpage>59</lpage>
      <abstract>
        <p>Software defects can lead to severe issues in software systems, such as software errors, security vulnerabilities, and decreased software performance. Early prediction of software defects can prevent these problems, reduce development costs, and enhance system reliability. However, existing methods often focus on manually crafted code features and overlook the rich semantic and contextual information in program code. In this paper, we propose a novel approach that integrates JavaBERT-based embeddings with a CNN-BiLSTM model for software defect prediction. Our model considers code context and captures code patterns and dependencies throughout the code, thereby improving prediction performance. We incorporate Optuna to find optimal hyperparameters. We conducted experiments on the PROMISE dataset, which demonstrated that our approach outperforms baseline models, particularly in leveraging code semantics to enhance defect prediction performance.</p>
      </abstract>
      <kwd-group>
<kwd>Software defect prediction</kwd>
        <kwd>JavaBERT</kwd>
        <kwd>CNN</kwd>
        <kwd>BiLSTM</kwd>
        <kwd>Optuna</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>Optuna automatically executes the above combination of</title>
        <p>
          JavaBERT and CNN-BiLSTM multiple times, and outputs
Researchers have explored various models for feature ex- the best hyperparameter values through these executions.
traction in software defect prediction, from traditional ma- Then we retrain the model in another version of the code
chine learning to deep learning. Initially, Support Vector based on the obtained hyperparameters and test the model
Machines (SVM), as employed by Elish et al.[
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], gained performance.
prominence for identifying defective modules using static
code metrics. However, it struggled to uncover deep
semantics within the source code. Deep Belief Networks 3.1. Embedding with JavaBERT
(DBN), introduced by Wang et al.[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], aimed to extract BERT (Bidirectional Encoder Representations from
more complex features from code through unsupervised Transformers)[
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] is a language model widely employed
learning. Yet, its limited depth posed challenges in reveal- in natural language processing (NLP) tasks. Unlike
coning intricate relationships within the source code. Con- ventional embeddings, BERT excels at capturing
intrivolutional Neural Networks (CNNs) were used by Li et cate contextual associations. Traditional methods like
al.[
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] to predict software defects by analyzing structural Word2Vec[
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] and GloVe[
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] generate static contextual
correlations between code tokens. While proficient in representations, whereas BERT, utilizing multi-layer
bidicapturing local patterns, CNNs faced challenges in captur- rectional transformers, enables tokens to gather
informaing longer-range connections. Wang et al.[
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] introduced tion from both preceding and succeeding tokens.
an RNN (Recurrent Neural Network)-based model for In our approach, we leverage a pretrained BERT model,
predicting software reliability. Deng et al.[
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and Liang JavaBERT[
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], fine-tuned for Java code. JavaBERT has
et al.[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] expanded Long Short-Term Memory (LSTM) been trained on a dataset of 2,998,345 Java files from
models in software defect prediction, capturing temporal GitHub open source projects. JavaBERT’s transformer
arpatterns in code sequences. However, a single LSTM can chitecture dynamically adapts token embeddings based on
only capture one direction temporal pattern in the code the entire input sequence, enhancing representation depth
sequence. Bidirectional LSTM (BiLSTM) models with and capturing code token interdependencies. The
Javattention mechanisms emerged. Wang et al.[
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] introduced aBERT embeddings, denoted as JavaBERT, are computed
a gated hierarchical BiLSTM model. Uddin et al.[
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] by applying the model’s encoder to tokenized Java code.
combined BiLSTM with attention and BERT-based em- For a sequence of code tokens  = {1, 2, . . . , },
beddings. JavaBERT embeddings are computed as:
        </p>
        <p>In short, SVM has difficulty discovering the deep
semantics of the source code, DBN has limited depth so it
is difficult to understand the complex relationships in the JavaBERT = EncoderJavaBERT(1, 2, . . . , )
source code, CNN has difficulty capturing long-distance
correlations, and RNN and LSTM can only capture a sin- Models typically cannot process code text sequences
gle temporal pattern. BiLSTM may have challenges in directly. Through JavaBERT, we embed code text into a
capturing local patterns. continuous vector space, using these vectors as inputs to</p>
        <p>To solve these problems, we combine the advantages the model, making it easier for the model to compute and
of CNN in detecting local patterns with the advantages understand the code.
of BiLSTM in processing sequences, allowing for
comprehensive code inspection. We further incorporate Jav- 3.2. Feature Extraction using
aBERT to dynamically adjust token embeddings based CNN-BiLSTM
on the entire input sequence, thereby deepening the
representation and capturing interdependencies among code
tokens.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed Methodology</title>
      <sec id="sec-3-1">
        <title>We combine Convolutional Neural Networks (CNN) and</title>
        <p>Bidirectional Long Short-Term Memory networks
(BiLSTM) to extract features. This is the key part of our
approach, where after extracting features with CNN, it is
refined with the sequential capabilities of BiLSTM.</p>
      </sec>
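        <p>As an illustration, the following is a minimal sketch of this embedding step using the Hugging Face transformers library. The checkpoint name "CAUKiel/JavaBERT" and the truncation settings are our assumptions, since the paper does not specify them.</p>
        <preformat>
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed checkpoint name for the JavaBERT model [14]; not specified in the paper.
tokenizer = AutoTokenizer.from_pretrained("CAUKiel/JavaBERT")
encoder = AutoModel.from_pretrained("CAUKiel/JavaBERT")

code = "public int add(int a, int b) { return a + b; }"

# Tokenize the Java source and run the encoder; the last hidden state holds
# one contextual vector per token, i.e. E = Encoder_JavaBERT(t_1, ..., t_n).
inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = encoder(**inputs)
embeddings = outputs.last_hidden_state  # shape: (1, seq_len, hidden_size)
        </preformat>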
      <sec id="sec-3-2">
        <title>Our software defect prediction method consists of several</title>
        <p>
          key steps, all aimed at improving prediction performance. 3.2.1. Feature Extraction with CNN
As shown in Figure 1, we first use JavaBERT to convert Utilizing Convolutional Neural Networks (CNN)[
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] for
the code into vector representations. Next, we employ feature extraction involves sliding a small window, known
the CNN-BiLSTM model for feature extraction, focusing as a filter, over various parts of the code. This filter
examon local patterns and context. We also incorporate sta- ines a small segment of the code at a time, calculating a
tistical features to fully utilize all available information. value at each sliding position to create a "feature map."
∑︁ ∑︁ [ + ,  + ] · [, ] +
        </p>
        <p>)︃
where [, ] is the input at position (, ), [, ]
represents the kernel at position (, ),  is the bias, and 
signifies the activation function.
3.2.2. Refinement of Features with BiLSTM
The</p>
        <p>Bidirectional</p>
        <p>Long</p>
        <p>Short-Term</p>
      </sec>
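          <p>A minimal PyTorch sketch of this convolution over the JavaBERT embeddings follows; the layer sizes are illustrative assumptions, not the paper's final configuration.</p>
          <preformat>
import torch
import torch.nn as nn

embeddings = torch.randn(1, 512, 768)   # (batch, seq_len, hidden_size) from JavaBERT
conv = nn.Conv1d(in_channels=768, out_channels=128, kernel_size=5, padding=2)

# Conv1d expects (batch, channels, seq_len); each filter slides along the
# token sequence and emits one value per position, producing a feature map.
feature_map = torch.relu(conv(embeddings.transpose(1, 2)))  # (1, 128, 512)
          </preformat>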
      <sec id="sec-3-3">
        <title>Memory</title>
        <p>
          (BiLSTM)[
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] layer enhances the features extracted by
the Convolutional Neural Networks (CNN). What sets
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>BiLSTM apart is its capability to capture both short-term and long-term dependencies within the code, which perfectly complements the local feature extraction carried out by CNN.</title>
      </sec>
      <sec id="sec-3-5">
        <title>The forward and backward computations in BiLSTM can be unified into a single mathematical representation:</title>
        <p>ℎ = BiLSTM(, ℎ−1 , ℎ+1)</p>
      </sec>
      <sec id="sec-3-6">
        <title>In this equation, ℎ represents the hidden state at time</title>
        <p>STM) model. It is computed based on the input  at the
current time step, the previous hidden state ℎ−1 , and
terns and connections over time, amplifying the feature
representation. In summary, we refine the feature maps
obtained from CNN using BiLSTM to achieve a
comprehensive code representation. This fusion of capturing
local patterns and accounting for temporal dependencies
improves software defect prediction performance.</p>
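          <p>Continuing the sketch above, a bidirectional LSTM consumes the CNN feature maps as a sequence; again, the sizes are illustrative assumptions.</p>
          <preformat>
import torch.nn as nn

bilstm = nn.LSTM(input_size=128, hidden_size=256,
                 batch_first=True, bidirectional=True)

# For every position t, the BiLSTM emits a hidden state h_t that combines
# the forward direction (via h_{t-1}) and the backward direction (via h_{t+1}).
seq = feature_map.transpose(1, 2)    # back to (batch, seq_len, channels)
hidden_states, _ = bilstm(seq)       # (1, 512, 2 * 256)
          </preformat>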
        <sec id="sec-3-6-1">
          <title>3.3. Integration with Statistical Features</title>
          <p>Our methodology integrates the refined BiLSTM outputs
with statistical features (such as shown in Table 2)
extracted from dataset. This step concatenates the vectors
obtained from the BiLSTM and the vectors of
statistical features obtained from the dataset into longer vectors,
making full use of the description information of the code.</p>
        </sec>
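        <p>A sketch of this fusion step, continuing the example above; the mean pooling and the 20-dimensional statistical vector standing in for the Table 2 metrics are assumptions.</p>
        <preformat>
import torch

code_vector = hidden_states.mean(dim=1)   # (1, 512): pooled BiLSTM output
stat_features = torch.randn(1, 20)        # per-file metrics from Table 2
fused = torch.cat([code_vector, stat_features], dim=1)  # (1, 532)
        </preformat>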
        <sec id="sec-3-6-2">
          <title>3.4. Hyperparameter Optimization by</title>
        </sec>
        <sec id="sec-3-6-3">
          <title>Optuna</title>
        </sec>
      </sec>
      <sec id="sec-3-7">
        <title>Optuna, a powerful hyperparameter optimization frame</title>
        <p>
          work developed by Akiba et al.[
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], plays a vital role in
our approach by automating hyperparameter tuning for the
CNN-BiLSTM model. There are similar frameworks such
as Ray Tune, etc., but Optuna is more lightweight and
mator (TPE) algorithm to efficiently explore and exploit
the hyperparameter space, enhancing the performance of
our Software Defect Prediction task.
step  in the Bidirectional Long Short-Term Memory (BiL- easier to use. It employs the Tree-structured Parzen
Esti
        </p>
        <sec id="sec-3-7-1">
          <title>4.2. Dataset and Data Preprocessing</title>
          <p>In this section, we will discuss a crucial step in our
methodology: determining optimal hyperparameters by
leveraging shared features among different versions of the
same project. Usually, code with similar version numbers
exhibits a high degree of similarity. By harnessing these
inherent similarities, we attempt to find hyperparameters
that can generalize across various versions, ultimately
enhancing model performance.</p>
          <p>Using the Ant project as an example, our aim is to
demonstrate the transferability of hyperparameters
obtained from training on one version (e.g., 1.5) to another
(e.g., 1.6). This transferability is valid as both versions
originate from the same project, sharing similar code
structures and functionalities. This enables the
hyperparameters obtained from one version to serve as a foundation for
other versions within the same project, thereby solidifying
our model configuration.</p>
          <p>We start by selecting version pairs, using the Ant
project as an illustration. Here, we designate version
1.5 for training and version 1.6 for testing. Next, we
deifne the performance metric to optimize, such as the F1
score. Subsequently, Optuna conducts multiple
experiments, traversing various hyperparameter combinations
and evaluating their performance on the designated
testing dataset. Through these iterative experimentation and
evaluation stages, Optuna determines the hyperparameter
set that maximizes the chosen performance metric.</p>
          <p>This process can be represented as:</p>
        </sec>
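        <p>A hedged sketch of this search with the Optuna API follows; the parameter names, value ranges, and the train_and_eval_f1 helper are illustrative assumptions rather than the paper's exact Table 3 search space.</p>
        <preformat>
import optuna

def objective(trial):
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True),
        "batch_size": trial.suggest_categorical("batch_size", [32, 64, 128]),
        "cnn_filters": trial.suggest_categorical("cnn_filters", [64, 128, 256]),
        "lstm_hidden": trial.suggest_categorical("lstm_hidden", [128, 256, 512]),
    }
    # Hypothetical helper: trains on Ant 1.5 and returns the F1 score on Ant 1.6.
    return train_and_eval_f1(train="ant-1.5", test="ant-1.6", **params)

study = optuna.create_study(direction="maximize")  # TPE sampler is Optuna's default
study.optimize(objective, n_trials=30)
best_params = study.best_params  # θ*, reused for later version pairs
        </preformat>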
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Setup</title>
      <sec id="sec-4-1">
        <title>4.1. Research Questions</title>
        <sec id="sec-4-1-1">
          <title>Our experiment addresses the following research questions (RQ) : RQ1: How does the performance of our CNN-BiLSTM model compare against baseline models?</title>
        </sec>
      </sec>
      <sec id="sec-4-2">
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Experimental Settings</title>
        <p>For each project listed in Table 1, we selected the two smallest version numbers to serve as versions Y and Y+1 for Optuna's hyperparameter optimization process. The search space for the hyperparameters was specified as shown in Table 3. The number of trials for each project was set to 30. After completing these experiments, each project produced a set of hyperparameters that allow the model to output the highest F1 score, along with a model trained on these parameters using version Y. These hyperparameters were then applied to train new models on version Y+1 for each project. Then the model trained on version Y and the model trained on version Y+1 were evaluated against the code of version Y+2. We conducted each evaluation test three times and calculated the mean to obtain the experimental result.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Baseline Models</title>
        <p>We compare our proposed approach against the following baseline models:</p>
        <list list-type="bullet">
          <list-item>
            <p>Support Vector Machine (SVM): a classic and widely adopted machine learning algorithm that excels in both linear and non-linear classification tasks and is known for its effectiveness in handling high-dimensional data.</p>
          </list-item>
        </list>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results and Discussion</title>
      <sec id="sec-5-1">
        <title>In this section, we present the results of our study and discuss their implications, addressing the research questions (RQ) that guide our investigation.</title>
        <sec id="sec-5-1-1">
          <title>5.1. Impact of JavaBERT-based</title>
        </sec>
        <sec id="sec-5-1-2">
          <title>Embeddings with CNN-BiLSTM</title>
        </sec>
        <sec id="sec-5-1-3">
          <title>Model</title>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>To address RQ1, we assessed the performance of our</title>
        <p>model in comparison to baseline models. Table 4 presents
a detailed performance comparison between our
CNNBiLSTM model and the baseline models concerning
precision, recall, and F1-score. For instance, "ant_1.5_1.6"
represents the experimental results obtained by using
version 1.5 of Ant as the training dataset and version 1.6
as the test dataset. The results demonstrate a consistent
outperformance of our model across all metrics. Figure
2 complements the table by providing a visual
representation of the F1 scores, where the x-axis represents pairs
of software versions used for training and testing (e.g.,
ant_1.5_1.6), and the y-axis represents the corresponding
F1 values obtained during testing. This figure shows that
the F1 of our model is higher than the base model most of
the time.</p>
        <sec id="sec-5-2-1">
          <title>5.2. Model Performance Variability</title>
        </sec>
        <sec id="sec-5-2-2">
          <title>Across PROMISE Projects and</title>
        </sec>
        <sec id="sec-5-2-3">
          <title>Versions</title>
        </sec>
        <sec id="sec-5-2-4">
          <title>4.4. Baseline Models</title>
        </sec>
      </sec>
      <sec id="sec-5-3">
        <title>We compare our proposed approach against the following</title>
        <p>baseline models:
To address RQ2, Figure 3 presents the F1 scores of our
model across different projects and their respective
versions in the PROMISE dataset. In this figure, the x-axis
• Support Vector Machine (SVM): SVM, a classic represents pairs of software versions used for training and
and widely adopted machine learning algorithm, testing (e.g., ant_1.5_1.6), while the y-axis represents the
excels in both linear and non-linear classification corresponding F1 values obtained during testing. When
tasks and is known for its effectiveness in handling we examined the model’s performance across different
high-dimensional data. projects and its various versions, we observed certain
noteworthy patterns. Specifically, within the same project,
suggests that the predictive performance of our model
tends to vary when applied to certain projects. Although
we cannot pinpoint the exact reasons behind these changes
at this time, we speculate that they may have been
influenced by a variety of factors, including project-specific
characteristics, code complexity, and domain-related
differences.
such as Lucene, POI, and Xalan, our models show a high
degree of performance consistency across different
versions. This shows that our model is able to predict
results consistently when dealing with different versions
of certain projects. This consistency can be partially
attributed to the higher code similarity found between ver- Figure 3: F1 Score Across PROMISE Projects
sions within the same project, making it easier for models
to capture shared features and patterns.</p>
        <p>There are some differences between versions of Ant
and Synapse, these differences are relatively minor. In
contrast, projects such as Camel and JEdit show more
performance fluctuations, even within the same project. This</p>
        <sec id="sec-5-3-1">
          <title>5.3. The impact of hyperparameters on the performance of CNN-BiLSTM model</title>
        <p>To address RQ3, we study in this section the impact of hyperparameters on the performance of the CNN-BiLSTM model for code defect prediction. Initially, we set the hyperparameters to the following values: the number of epochs is 10, the batch size is 64, the learning rate is 1e-4, the number of CNN filters is 128, the number of BiLSTM hidden units is 256, and the CNN filter size is 5. After that, we fixed the other hyperparameters and gradually adjusted one parameter manually, either the number of CNN filters or the number of BiLSTM hidden units, to observe changes in model performance.</p>
        <p>Figure 4 and Figure 5 show our experimental results: Figure 4 plots the effect of the number of CNN filters on the F1 score, and Figure 5 the effect of the number of BiLSTM hidden units; in both, the x-axis is the varied hyperparameter value and the y-axis shows the F1 score. We can see that the model performance fluctuates greatly when a single parameter changes. For example, the smaller the number of CNN filters, the better the performance of the model. In Figure 5, the F1 score drops after the number of BiLSTM hidden units reaches 16, but the model performs better and tends to be stable after 256. Exploring the impact of each hyperparameter individually would be a time-consuming task, and it is difficult to predict how the model will behave when these hyperparameters are combined. So we used Optuna, which continually searches for hyperparameters that make the model perform better, based on its search algorithm.</p>
        <p>Figures 6 and 7 show the F1 score (y-axis) for a certain number of trials (x-axis). Specifically, Figure 6 is a scatter plot representing the F1 score obtained in each trial; e.g., when the trial number is 5, the F1 score shown is the value for the fifth trial. Figure 7 represents the best model performance achieved by the search up to the current trial; so, in Figure 7, when the trial number is 5, the F1 score is the best F1 score from the first to the fifth trial. We can observe that through continuous repetition and search, Optuna gradually finds better results. The entire process is automated, which greatly simplifies our hyperparameter tuning process.</p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Threats to Validity</title>
        <p>In our research, we have identified and addressed several potential threats to the validity of our findings.</p>
        <p>The implementation of our Python experimental code for processing source code text and building models poses a potential threat due to the possibility of bugs. To mitigate this, we leveraged mature third-party libraries (such as javalang and PyTorch) and conducted thorough code inspections. Additionally, we applied random oversampling during data preprocessing, which could introduce bias; future work will explore alternative methods to handle class imbalance and assess their impact on results. Moreover, the use of Optuna for hyperparameter optimization introduces potential variability in results due to different search spaces and numbers of trials. To reduce these threats, we plan to conduct more extensive searches and explore larger search spaces.</p>
          <p>Our choice of a subset of projects from the PROMISE
dataset due to time constraints may impact the
generalizability of our findings, as the results may not generalize
well to other projects. To address this, we intend to include
a broader range of projects in future research.</p>
          <p>We evaluated our models using a limited set of
performance metrics, specifically precision, recall, and F1
measure. To reduce these threats, we will consider
incorporating additional metrics such as AUC-ROC and
MCC, among others, to provide a more comprehensive
assessment of model performance.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Work</title>
      <sec id="sec-6-1">
        <title>In this research, we have introduced a novel approach</title>
        <p>that leverages JavaBERT-based embeddings with a
CNNBiLSTM model for software defect prediction. Our
approach harnesses semantic and contextual information in
program code to enhance prediction accuracy. Through
comprehensive experiments on the PROMISE dataset,
we have demonstrated the superiority of our model over
baseline models based on precision, recall, and F1-score
metrics.</p>
      <p>Although our study improves the performance of software defect prediction compared to baseline models, much future work remains. In addition to what we discussed in the "Threats to Validity" section, we can also train BERT models on different languages to adapt our method to other programming languages.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Omri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sinz</surname>
          </string-name>
          ,
          <article-title>Deep learning for software defect prediction: A survey</article-title>
          ,
          <source>in: Proceedings of the IEEE/ACM 42nd international conference on software engineering workshops</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>209</fpage>
          -
          <lpage>214</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>A survey of software defects research based on deep learning</article-title>
          ,
          <source>in: 2023 6th International Conference on Information Systems and Computer Networks (ISCON)</source>
          , IEEE,
          <year>2023</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K. O.</given-names>
            <surname>Elish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. O.</given-names>
            <surname>Elish</surname>
          </string-name>
          ,
          <article-title>Predicting defect-prone software modules using support vector machines</article-title>
          ,
          <source>Journal of Systems and Software</source>
          <volume>81</volume>
          (
          <year>2008</year>
          )
          <fpage>649</fpage>
          -
          <lpage>660</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <article-title>Automatically learning semantic features for defect prediction</article-title>
          ,
          <source>in: Proceedings of the 38th International Conference on Software Engineering</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>297</fpage>
          -
          <lpage>308</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Lyu</surname>
          </string-name>
          ,
          <article-title>Software defect prediction via convolutional neural network</article-title>
          ,
          <source>in: 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS)</source>
          , IEEE,
          <year>2017</year>
          , pp.
          <fpage>318</fpage>
          -
          <lpage>328</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Software reliability prediction using a deep learning model based on the RNN encoder-decoder</article-title>
          ,
          <source>Reliability Engineering &amp; System Safety</source>
          <volume>170</volume>
          (
          <year>2018</year>
          )
          <fpage>73</fpage>
          -
          <lpage>82</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <article-title>Software defect prediction via LSTM</article-title>
          ,
          <source>IET software 14</source>
          (
          <year>2020</year>
          )
          <fpage>443</fpage>
          -
          <lpage>450</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <article-title>Seml: A semantic LSTM model for software defect prediction</article-title>
          ,
          <source>IEEE Access 7</source>
          (
          <year>2019</year>
          )
          <fpage>83812</fpage>
          -
          <lpage>83824</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Software defect prediction based on gated hierarchical LSTMs</article-title>
          ,
          <source>IEEE Transactions on Reliability</source>
          <volume>70</volume>
          (
          <year>2021</year>
          )
          <fpage>711</fpage>
          -
          <lpage>727</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Uddin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kefalas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Zada</surname>
          </string-name>
          ,
          <article-title>Software defect prediction employing BiLSTM and BERT-based semantic feature</article-title>
          ,
          <source>Soft Computing</source>
          <volume>26</volume>
          (
          <year>2022</year>
          )
          <fpage>7877</fpage>
          -
          <lpage>7891</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          , arXiv preprint arXiv:1810.04805 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Corrado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>26</volume>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pennington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <article-title>GloVe: Global vectors for word representation</article-title>
          ,
          <source>in: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>N. T.</given-names>
            <surname>De Sousa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hasselbring</surname>
          </string-name>
          ,
          <article-title>JavaBERT: Training a transformer-based model for the Java programming language</article-title>
          ,
          <source>in: 2021 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>90</fpage>
          -
          <lpage>95</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>K.</given-names>
            <surname>Fukushima</surname>
          </string-name>
          ,
          <article-title>Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position</article-title>
          ,
          <source>Biological cybernetics 36</source>
          (
          <year>1980</year>
          )
          <fpage>193</fpage>
          -
          <lpage>202</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schuster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. K.</given-names>
            <surname>Paliwal</surname>
          </string-name>
          ,
          <article-title>Bidirectional recurrent neural networks</article-title>
          ,
          <source>IEEE transactions on Signal Processing</source>
          <volume>45</volume>
          (
          <year>1997</year>
          )
          <fpage>2673</fpage>
          -
          <lpage>2681</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>T.</given-names>
            <surname>Akiba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yanase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ohta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Koyama</surname>
          </string-name>
          ,
          <article-title>Optuna: A next-generation hyperparameter optimization framework</article-title>
          ,
          <source>in: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery &amp; data mining</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>2623</fpage>
          -
          <lpage>2631</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.</given-names>
            <surname>Sayyad Shirabad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Menzies</surname>
          </string-name>
          ,
          <source>The PROMISE Repository of Software Engineering Databases., School of Information Technology and Engineering</source>
          , University of Ottawa, Canada,
          <year>2005</year>
          . URL: http://promise.site.uottawa.ca/SERepository.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>C.</given-names>
            <surname>Thunes</surname>
          </string-name>
          ,
          <source>javalang: pure Python Java parser and tools</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>