<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Issue report classification using a multimodal deep learning technique</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Changwon Kwak</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Seonah Lee</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of AI Convergence Engineering, Gyeongsang National University</institution>
          ,
          <addr-line>Jinju</addr-line>
          ,
          <country country="KR">South Korea</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Aerospace and Software Engineering, Gyeongsang National University</institution>
          ,
          <addr-line>Jinju</addr-line>
          ,
          <country country="KR">South Korea</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Issue reports are useful resources for developing open-source software and continuously maintaining software products. However, it is not easy to systematically classify issue reports that accumulate at hundreds of cases a day. To this end, researchers have studied how to classify issue reports automatically. However, these approaches are limited to applying text-oriented classification methods. In this paper, we apply a multi-modal model-based classification method, which has shown great performance improvements in many fields. We use the images attached to an issue report to improve the performance of issue report classification. To evaluate our approach, we conduct an experiment where we compare the performance of a text-based single-modal model and that of a text- and image-based multi-modal model. The experimental results show that the multi-modal method yields a 2.1% higher classification f1-score than the single-modal method. Based on the experimental results, we will continue our further exploration of the multi-modal model by considering the characteristics of the issue report and various heterogeneous outputs.</p>
      </abstract>
      <kwd-group>
        <kwd>Multimodal deep learning</kwd>
        <kwd>Classification</kwd>
        <kwd>Issue reports</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Today, when developing and continuously
maintaining open-source software, open-source
contributors use issue management systems as a
way to quickly reflect users' complaints about and
requested improvements to the software systems.
Stakeholders report bugs, functional
improvements, and other requests they find while
using the software as issues. Developers refer to
the issue reports to discuss and improve the
software. In active open-source
projects, these issue reports are generated and
accumulated at a rate of hundreds per day. In such
a situation, it is not easy to systematically classify
and manage issues.</p>
      <p>
        Researchers have proposed automatically
classifying issue reports to manage them more
systematically [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref6 ref7">1,2,3,4,6,7</xref>
        ]. Recently, researchers
began to adopt deep learning techniques to
classify issue reports. For instance, Cho et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
used CNN and RNN deep learning techniques to
classify issue reports. However, existing
approaches use only text data, such as the titles and
body contents of issue reports, as inputs for training
their models. They do not exploit the various other
kinds of information that issue reports include.
      </p>
      <p>
        Meanwhile, in the area of deep learning
techniques, multi-modal deep learning models
using two or more modalities have shown
significant performance improvement in many
fields [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13 ref9">9,10,11,12,13</xref>
        ]. This suggests that we could
achieve better performance by overcoming the
limitation of using only single-modal data.
      </p>
      <p>
        We observed that issue reports often contain
relevant images. We, therefore, decided to apply
a multi-modal model-based classification method
to classify issue reports. Our proposed method
classifies issue reports by combining the
representation of text data and image data of issue
reports based on the method of Antol, Stanislaw,
et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. We also conducted an experiment to see
whether our approach could achieve higher
performance. To evaluate our multi-modal
model-based approach, we compare its performance
with that of the CNN-based single-modal
model of Cho et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
For this, we collected the issue reports of Vscode,
a major project on GitHub. We finally collected
17,500 issue reports with one or more images. To
resolve data imbalance issues, we downsampled
issue reports and used 8,500 issue reports. As a
result of the experiment, our approach showed an
f1-score improvement of about 2.1% compared to the
classification model of the existing method [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>The paper is organized as follows. Section 2
introduces related works. Section 3 explains the
experimental setup. Section 4 presents the
experimental results. Section 5 discusses the
experimental results and Section 6 concludes.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>The studies related to ours are those that
classified the issue reports of open-source projects
and those that applied multi-modal deep
learning.</p>
      <p>
        First, there are attempts to conduct the binary
classification of issue reports into bugs/non-bugs.
For example, Pandey et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] extracted a
summary from an issue report and classified the
issue report as a bug/non-bug using Naive Bayes
and SVM. In addition, Zhu et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] used kNN to
determine whether the existing label is correct,
and classified an issue report as bug/non-bug
using an attention-based bi-directional LSTM. Turning
to multi-class classification, Kallis et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
used FastText to classify an issue report into Bug,
Enhancement, or Question. Kochhar et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
classified issue reports into 13 categories
including BUG, using SVM. Also, Fazayeli et al.
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] tried to classify issue reports into five
categories: unclear, question, up for grabs, bug,
and others, and used the SMO machine learning
algorithm. Recently, Cho et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] proposed a
method of classifying issue reports into features
of the software using a user manual with CNN and
RNN (i.e. LSTM) deep learning techniques. In
this paper, we conduct a comparative experiment
with the CNN model of Cho et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] as the
baseline.
      </p>
      <p>
Researchers have widely used multi-modal deep
learning models in the fields of Action
Recognition, Image Generation, Image
Captioning, and Visual Question Answering
(VQA). Antol, Stanislaw, et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] showed the good
performance of a multi-modal model on the VQA
task using a model with two channels, image and
text (question). Antol, Stanislaw, et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] used
VGGNet for the image channel and LSTM for the text
channel to embed each modality. Their proposed
approach combines the features through
element-wise multiplication to transform the data into a
common space before making a classification. Although
there are more effective methods such as MUTAN
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], MCB [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and MLB [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] as data combining
methods, this work uses element-wise
multiplication from Antol, Stanislaw, et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] to
reduce model operations and simplify
implementation. That is, we experiment with
whether a multi-modal deep learning model
extracting text and image data from issue reports
can improve the performance of issue
classification, and we report the results.
      </p>
      <sec id="sec-2-1">
        <title>3. Experimental set-up</title>
      </sec>
      <sec id="sec-2-2">
        <title>3.1. Dataset</title>
        <p>We used the open-source project Vscode, a
major project on GitHub, in our experiment.
Among the issue reports of Vscode, we collected
the issue reports with more than one image that
were labeled ‘bug’ or ‘feature’. The total number of
collected reports was 17,500, and for each report we
used the first image, which is most closely related
to the issue among the attached images. Figure 1
shows the ratio of the collected issue data per label.
Finally, we used about 8,500 issue reports after
downsampling, both to resolve the imbalance
between the numbers of reports per label and to
speed up the experiment by reducing the model
size. We kept all of the ‘feature’ label data, which
are relatively few, and for the ‘bug’ label we used
an appropriate amount of the latest data. In total,
we used all ‘feature’ label data and the latest 4,500
‘bug’ label data. A sketch of this downsampling
step follows below.</p>
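        <p>As an illustration of the downsampling step (a minimal sketch, not code from the paper), the following assumes the collected reports sit in a pandas DataFrame; the 'label' and 'created_at' column names are assumptions.</p>
        <preformat>
# A minimal sketch of the downsampling step, assuming pandas and that the
# collected issue reports carry 'label' and 'created_at' fields (assumed names).
import pandas as pd

def downsample(issues: pd.DataFrame, bug_count: int = 4500) -> pd.DataFrame:
    """Keep all 'feature' issues and the latest bug_count 'bug' issues."""
    features = issues[issues["label"] == "feature"]
    bugs = (
        issues[issues["label"] == "bug"]
        .sort_values("created_at", ascending=False)  # newest first
        .head(bug_count)
    )
    # Concatenate and shuffle so training batches mix both labels.
    return pd.concat([features, bugs]).sample(frac=1.0, random_state=42)

# balanced = downsample(collected_issues)  # roughly 8,500 reports in total
        </preformat>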
      </sec>
      <sec id="sec-2-3">
        <title>3.2. Method</title>
        <p>We propose a model that classifies issue reports
using a multi-modal model. As shown in Figure 2, the
model takes the text (title) data and image data of issue
reports as inputs. The model classifies the issue
report into “bug” or “feature.” The model passes
the image data and text data through a CNN-based
channel, respectively, to extract representation
vectors. These features are combined through a
point-wise multiplication operation to express
them in a common space. After that, the model
performs a softmax operation and finally outputs a
classification of the issue report. A sketch of this
architecture follows below.</p>
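        <p>The following is a minimal sketch of this two-channel architecture, assuming PyTorch; the layer sizes and the exact text-encoder design are illustrative assumptions, since the description above specifies only CNN-based channels, element-wise multiplication, and a softmax output.</p>
        <preformat>
import torch
import torch.nn as nn

class MultiModalIssueClassifier(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 128, fused_dim: int = 256):
        super().__init__()
        # Text channel: embed the title tokens, then a 1-D CNN with max pooling.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.text_cnn = nn.Sequential(
            nn.Conv1d(embed_dim, fused_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        # Image channel: a small 2-D CNN over the attached screenshot.
        self.image_cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, fused_dim), nn.ReLU(),
        )
        self.classifier = nn.Linear(fused_dim, 2)  # "bug" vs. "feature"

    def forward(self, tokens, image):
        # tokens: (batch, seq_len) token ids; image: (batch, 3, H, W) pixels.
        text = self.text_cnn(self.embed(tokens).transpose(1, 2)).squeeze(-1)
        img = self.image_cnn(image)
        fused = text * img  # element-wise (point-wise) multiplication
        return torch.softmax(self.classifier(fused), dim=-1)
        </preformat>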
      </sec>
      <sec id="sec-2-4">
        <title>3.3. Measurements</title>
        <p>The metrics used for measuring classification
performance were precision, recall, and f1-score.
The calculation for each metric was conducted
using the equations below.</p>
        <p>For each class c<sub>i</sub>, the per-class metrics are:</p>
        <disp-formula><tex-math>\mathrm{Precision}_i = \frac{TP_i}{TP_i + FP_i} \qquad (1)</tex-math></disp-formula>
        <disp-formula><tex-math>\mathrm{Recall}_i = \frac{TP_i}{TP_i + FN_i} \qquad (2)</tex-math></disp-formula>
        <disp-formula><tex-math>F1_i = 2 \cdot \frac{\mathrm{Precision}_i \cdot \mathrm{Recall}_i}{\mathrm{Precision}_i + \mathrm{Recall}_i} \qquad (3)</tex-math></disp-formula>
        <p>The overall metrics are weighted averages over the classes, where w<sub>i</sub> = n<sub>i</sub>/N is the proportion of issue reports belonging to class c<sub>i</sub>:</p>
        <disp-formula><tex-math>\mathrm{Precision} = \sum_{i=0}^{n} w_i \cdot \mathrm{Precision}_i \qquad (4)</tex-math></disp-formula>
        <disp-formula><tex-math>\mathrm{Recall} = \sum_{i=0}^{n} w_i \cdot \mathrm{Recall}_i \qquad (5)</tex-math></disp-formula>
        <disp-formula><tex-math>F1 = \sum_{i=0}^{n} w_i \cdot F1_i \qquad (6)</tex-math></disp-formula>
        <p>In the above equations, TP<sub>i</sub> represents the number of issue reports that the model predicted as class c<sub>i</sub> and that belonged to c<sub>i</sub>. FP<sub>i</sub> represents the number of issue reports that the model predicted as c<sub>i</sub> but that did not belong to c<sub>i</sub>. TN<sub>i</sub> denotes the number of issue reports that the model predicted as not c<sub>i</sub> and that did not belong to c<sub>i</sub>, and FN<sub>i</sub> denotes the number of issue reports that the model predicted as not c<sub>i</sub> but that belonged to c<sub>i</sub>.</p>
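        <p>In practice these weighted averages can be computed directly; the following minimal sketch assumes scikit-learn, with illustrative label lists.</p>
        <preformat>
# A minimal sketch of the weighted-average evaluation, assuming scikit-learn.
from sklearn.metrics import precision_recall_fscore_support

y_true = ["bug", "feature", "bug", "bug", "feature"]      # gold labels (example)
y_pred = ["bug", "feature", "feature", "bug", "feature"]  # model output (example)

# average="weighted" computes each metric per class and then averages the
# per-class values weighted by class frequency, as in Eqs. (4)-(6).
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted"
)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
        </preformat>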
      </sec>
      <sec id="sec-2-5">
        <title>4. Experimental results</title>
        <p>We measured precision, recall, and f1-score; the
metric for each class is calculated, and a weighted
average is taken according to the class frequency.
The results of the proposed model, the multi-modal
model, showed 73.432% precision, 73.460% recall,
and 73.423% f1-score. The CNN model, which is a
single-modal model, showed 71.356% precision,
71.386% recall, and 71.315% f1-score. As a result,
the proposed classification model performed
better than the existing classification model.</p>
        <p>However, it is difficult to regard this as a
meaningful improvement because the
performance gain is small. To determine whether
the image data used in the experiment are suitable
for the classification task, we constructed a
single-modal model using only the image data and
measured its classification accuracy. As a result,
the image-only single-modal model showed 53.373%
precision, 54.250% recall, and 53.467% f1-score,
which suggests that the image data alone do not
help with the classification task.</p>
      </sec>
      <sec id="sec-2-6">
        <title>5. Discussion</title>
        <p>Existing issue report classification studies
are limited to applying text-oriented single-modal
classification methods to the title and
body content. In addition to the text modality, there
are data of other modalities in issue reports. In
particular, we conducted this study based on the
fact that images exist in many issue reports.
However, the results of our proposed model did
not improve the performance as expected.
Therefore, we conducted an additional
experiment and checked the accuracy of a deep
learning model that only uses the images of issue
reports. The classification accuracy of the deep
learning model using only images is around a
53.5% f1-score. This means that the image data
used in this study are not a primary factor in
classification performance. Even so, the
information that images carry can be helpful for the
classification task. In fact, it is easy for
developers to understand an issue report when
they see its text and image data
together. Most of the images in the bodies of the
issue reports are parts of code captured in the
development environment. Compared to using the
source code directly, it seems difficult for a model to
understand the meaning of data in image form.
Therefore, it is quite difficult to distinguish the
differences between the issue reports labeled “bug”
and the issue reports labeled “feature”. We therefore
conjecture that if we recognize the source code in
the images, the recovered source information will
be able to help our classification.</p>
        <p>Nonetheless, based on these experimental
results, we were able to confirm the effect of
applying a multi-modal model to issue reports.
Therefore, we will continue our further
exploration of the multi-modal model, which
takes into account the characteristics of the issue
report and various heterogeneous outputs. First,
most of the images attached to issue reports
contain code and text. Therefore, if we extract the
code and text from the image and use them for
classification, we expect better performance than
the existing method that uses the raw image; a
sketch of this extraction step follows below. Next,
since users can attach code to the issue report, we
can use the code as another modality. Since the
code is a source that is directly related to the
software issue, it is highly valuable. Therefore, we
can try to improve the performance by combining
the code with the existing text data in a
multi-modal model.</p>
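        <p>As a sketch of that first direction (an assumption about future work, not an implemented part of this study), text in an attached screenshot could be recovered with an off-the-shelf OCR tool such as pytesseract and fed into the existing text channel.</p>
        <preformat>
# A minimal OCR sketch, assuming the pytesseract wrapper and the
# Tesseract binary are installed; the file path is hypothetical.
from PIL import Image
import pytesseract

def extract_text_from_screenshot(path: str) -> str:
    """OCR an attached image so its code/text can serve as an extra modality."""
    return pytesseract.image_to_string(Image.open(path))

# snippet = extract_text_from_screenshot("issue_screenshot.png")
# The recovered text could then be embedded by the existing text channel.
        </preformat>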
      </sec>
      <sec id="sec-2-7">
        <title>6. Conclusion</title>
        <p>We have proposed a method for classifying
issue reports based on a multi-modal deep
learning model using text data (title) and image
data (body) of the issue report. Experimental
results show that the classification model of the
proposed method has an f1-score improvement of
about 2.1% over the existing classification model,
and that the multi-modal deep learning model
contributes positively to improving the performance
of the classification task.</p>
        <p>We infer that these results come from the fact
that the model utilizes various kinds of information
from the issue report. When users write an issue
report, they often describe the issue by
attaching images, videos, code snippets, and so on,
in addition to the title and body content in text
format. Such data are actually very helpful for
humans to understand the issue. Therefore, we infer
that the model could better represent the issue report
when also using images that are directly related to the
content rather than just the text of the title or body,
resulting in better classification performance. In
the future, we will explore and advance strategies
for utilizing image data in issue reports, and we
will create a multi-modal model that uses more of
the heterogeneous components of issue reports
for more accurate issue classification.</p>
      </sec>
      <sec id="sec-2-8">
        <title>7. References</title>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>N.</given-names>
            <surname>Pandey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hudait</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.K.</given-names>
            <surname>Sanyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sen</surname>
          </string-name>
          ,
          <article-title>Automated classification of issue reports from a software issue tracker</article-title>
          .
          <source>Progress in Intelligent Computing Techniques: Theory, Practice, and Applications</source>
          , Springer, Singapore, pp.
          <fpage>423</fpage>
          -
          <lpage>430</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pei</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , “
          <article-title>A Bug or a Suggestion? An Automatic Way to Label Issues</article-title>
          ,” arXiv preprint arXiv:1909.00934,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Panichella</surname>
          </string-name>
          ,
          <article-title>A systematic comparison of search algorithms for topic modelling-a study on duplicate bug report identification</article-title>
          ,
          <source>in: International Symposium on Search Based Software Engineering</source>
          , Springer, Cham, pp.
          <fpage>11</fpage>
          -
          <lpage>26</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>Automatic classification of non-functional requirements from augmented app user reviews</article-title>
          ,
          <source>in: Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering</source>
          , pp.
          <fpage>344</fpage>
          -
          <lpage>353</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kallis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Di</given-names>
            <surname>Sorbo</surname>
          </string-name>
          , G. Canfora, S. Panichella,
          <article-title>Ticket Tagger: machine learning driven issue classification</article-title>
          ,
          <source>in: 2019 IEEE International Conference on Software Maintenance and Evolution(ICSME)</source>
          , IEEE, pp.
          <fpage>406</fpage>
          -
          <lpage>409</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Requirements traceability technologies and technology transfer decision support: A systematic review</article-title>
          ,
          <source>Journal of Systems and Software</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>White</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Krinke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <article-title>Establishing multilevel test-to-code traceability links</article-title>
          ,
          <source>in: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <article-title>Classifying issue reports according to feature descriptions in a user manual based on a deep learning model</article-title>
          .
          <source>Information and Software Technology</source>
          ,
          <volume>142</volume>
          :
          <fpage>106743</fpage>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Antol</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitchell</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Batra</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zitnick</surname>
            ,
            <given-names>C. L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Parikh</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <article-title>Vqa: Visual question answering</article-title>
          .
          <source>In: Proceedings of the IEEE international conference on computer vision</source>
          .
          <year>2015</year>
          . p.
          <fpage>2425</fpage>
          -
          <lpage>2433</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Kafle</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kanan</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <article-title>Visual question answering: Datasets, algorithms, and future challenges</article-title>
          .
          <source>Computer Vision and Image Understanding</source>
          ,
          <year>2017</year>
          ,
          <volume>163</volume>
          :
          <fpage>3</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Fukui</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>D. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rohrbach</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darrell</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Rohrbach</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Multimodal compact bilinear pooling for visual question answering and visual grounding</article-title>
          .
          <source>arXiv preprint arXiv:1606.01847</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>J. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>On</surname>
            ,
            <given-names>K. W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lim</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ha</surname>
            ,
            <given-names>J. W.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>B. T.</given-names>
          </string-name>
          <article-title>Hadamard product for low-rank bilinear pooling</article-title>
          .
          <source>arXiv preprint arXiv:1610.04325</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Ben-Younes</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cadene</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cord</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Thome</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <article-title>Mutan: Multimodal tucker fusion for visual question answering</article-title>
          .
          <source>In: Proceedings of the IEEE international conference on computer vision</source>
          .
          <year>2017</year>
          . p.
          <fpage>2612</fpage>
          -
          <lpage>2620</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>