<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the IRSE track at FIRE 2022: Information Retrieval in Software Engineering</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Srijoni Majumdar</string-name>
          <email>majumdar.srijoni@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ayan Bandyopadhyay</string-name>
          <email>bandyopadhyay.ayan@gmail.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Samiran Chattopadhyay</string-name>
          <email>samiran.chattopadhyay@jadavpuruniversity.in</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Partha Pratim Das</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul D Clough</string-name>
          <email>p.d.clough@sheffield.ac.uk</email>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Prasenjit Majumder</string-name>
          <email>prasenjit.majumder@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff5">
          <institution>TCG CREST</institution>
          ,
          <addr-line>West-Bengal</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff0">
          <label>0</label>
          <institution>DA-IICT Gandhinagar</institution>
          ,
          <addr-line>Gujarat</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IIT Kharagpur</institution>
          ,
          <addr-line>West-Bengal</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Jadavpur University</institution>
          ,
          <addr-line>West-Bengal</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Sheffield University</institution>
          ,
          <addr-line>Sheffield</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>TPXimpact London</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Code comments increase the readability of the surrounding code if they highlight concepts that are not evident from the source code itself. Hence, evaluating the quality of code comments is important for de-cluttering large code bases by removing comments that are not useful. The Information Retrieval in Software Engineering (IRSE) track aims to develop solutions for automated evaluation of code comments. The track comprises a binary classification task to classify comments as useful or not useful. The dataset consists of 9048 pairs of code comments and surrounding code snippets extracted from open-source C-based projects on GitHub. Overall, 34 experiments were submitted by 11 teams from various universities and software companies. The submissions were evaluated quantitatively using the F1-Score and qualitatively based on the type of features developed, the supervised learning model used, and the corresponding hyper-parameters. The best-performing submissions mostly employed transformer architectures coupled with a software-development-related embedding space.</p>
      </abstract>
      <kwd-group>
        <kwd>BERT</kwd>
        <kwd>GPT-2</kwd>
        <kwd>Stanford POS Tagging</kwd>
        <kwd>neural networks</kwd>
        <kwd>abstract syntax tree</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Assessing comment quality can help to de-clutter code bases and subsequently improve code
maintainability. Comments can significantly help to read and comprehend code if they are
consistent and informative. Comment analysis approaches have mainly focused on detecting
inconsistent comments [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] but not appreciably on the quality and relevance of the information
contained in a comment. A poorly written or superfluous comment duplicating the information
evident from source code identifiers can hinder the readability of code, even though it may be
consistent [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
      </p>
      <p>
        Several approaches have been proposed to classify comments based on explicit syntactic
information, such as the presence of specific tags (e.g., @param, @deprecated, etc.), words,
and symbols; or implicit details, such as the type of associated code construct, length of the
comment, parts of speech (POS) and dependency relations of comment words or the cosine
similarity of vector representation of words in code-comment snippets [
        <xref ref-type="bibr" rid="ref5 ref6 ref7">5, 6, 7</xref>
        ].
      </p>
      <p>These approaches do not target comment quality evaluation based on the interpretation of
the information contained in comments. The syntactic methods used for comment classification
need to be augmented so as to extract the semantics of a comment in order to develop an overall
quality assessment model.</p>
      <p>
        Further, the perception of quality in terms of the ’usefulness’ of the information contained in
comments is relative and hence is perceived differently depending on the context. Bosu et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
assessed code review comments (logged in a separate tool) in terms of their
utility in helping developers write better code, through a detailed survey at Microsoft. A similar
quality assessment model for source code comments, identifying the types of comment that
help with standard maintenance tasks, is important but largely missing.
      </p>
      <p>
        Majumdar et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] proposed a comment quality evaluation framework wherein comments
were assessed as ’useful’, ’partially useful’, and ’not useful’ based on whether they increase the
readability of the surrounding code snippets. The authors analyse comments for concepts that
aid in code comprehension and also the redundancies or inconsistencies of these concepts with
the related code constructs in a machine learning framework for an overall assessment. The
concepts are derived through exploratory studies with developers across 7 companies and from
a larger community using crowd-sourcing.
      </p>
      <p>
        The IRSE track of FIRE 2022 extends the work in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and empirically investigates comment
quality with a larger set of machine learning solvers and features. The track aims to
automate program comprehension tasks and subsequently reduce code maintenance overhead.
In its first edition, the IRSE track is based on a task that classifies comments into
two classes: ’useful’ and ’not useful’. A ’useful’ comment (see Table 1) contains relevant
concepts that are not evident from the surrounding code design, and thus increases the
comprehensibility of the code. The track evaluates the suitability of analysing comment quality
using various vector space representations of code and comment pairs along with standard
textual features and code comment correlation links. A total of 34 experiments were submitted by 11
teams.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Several approaches exist that attempt to assess the quality of comments by detecting
inconsistencies with source code or by classifying comments based on syntactic properties.</p>
      <p>
        Tan et al. [
        <xref ref-type="bibr" rid="ref1 ref9">1, 9</xref>
        ] use the sequence of occurrence of words (from an enumerated set) in a
comment and the surrounding code to develop rules for detecting inconsistent comments
related to memory errors.
      </p>
      <p>
        Ying et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] undertake an empirical study to derive the attributes of the various categories
of task comments in Java code used for developer communication. Storey et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] presented
a detailed study to understand how task comments are interpreted in larger projects during
the different phases of the software lifecycle.
      </p>
      <p>
        Comment quality evaluation: Steidl et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] propose a comment quality detection method
by comparing the similarity of words in code-comment pairs using the Levenshtein distance and
length of comments to filter out trivial and non-informative comments. Rahman et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] detect
useful and non-useful code review comments (logged in review portals) based on attributes
identified from a survey conducted with developers of Microsoft [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. They use textual features
(Table 2) and train decision tree and naive Bayes classifiers on a set of 1,200 review comments
for automated quality assessment. Recent work in the Declutter Challenge of DocGen2
by Liu et al. [13] detects ’not useful’ comments using textual and structural features (Table 2)
in a machine learning framework. Majumdar et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] proposed a framework to evaluate comments based on
concepts that are relevant for code comprehension. They developed textual and code correlation
features using a knowledge graph for semantic interpretation of information contained in
comments (Table 2).
      </p>
      <p>
        The available approaches mostly aim to evaluate the quality of comments by mining
for irrelevant words and phrases, coupled with repetition of the surrounding constructs.
The context in which quality is defined is essential: for example, Rahman et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] assess
comments based on attributes limited to code review comments only. Similarly, Majumdar et
al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] analysed comments by mining concepts that are relevant to code comprehension and
can aid in software maintenance tasks. The IRSE track extends the approach proposed in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to
explore various vector space models and features for binary classification and evaluation of
comments.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. IRSE Track Overview and Data Set</title>
      <p>The following section outlines the task descriptions and the characteristics of the dataset.</p>
      <sec id="sec-3-1">
        <title>3.1. Task Description</title>
        <p>Comment Classification: A binary classification task to classify source code comments as Useful
or Not Useful for a given comment and associated code pair as input.</p>
        <p>Input: A code comment with surrounding code snippet (written in C)</p>
        <p>Output: A label (Useful or Not Useful) that characterises whether the comment helps developers
comprehend the associated code</p>
        <p>Therefore, in this classification task, the output is based on whether the information contained
in the comment is relevant and would help to comprehend the surrounding code, i.e., it is useful.</p>
        <p>Useful: Comments have sufficient software development concepts → Comment is Relevant,
and these concepts are mostly absent from the surrounding code → Comment is not Redundant,
hence the comment is Useful</p>
        <p>Not Useful: Comments have sufficient software development concepts → Comment is
Relevant, and these concepts are mostly present in the surrounding code → Comment is Redundant,
hence the comment is Not Useful</p>
        <p>It may also be the case that comments do not contain sufficient software development concepts
→ Comment is Not Relevant, hence the comment is Not Useful.</p>
        <p>It is left to the participants to decide on the threshold value for how many concepts retrieved
make a comment relevant or how many matches with surrounding code make a comment
redundant.</p>
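        <p>As a rough illustration only, the decision rule described above can be sketched in Python. The concept vocabulary (CONCEPT_TERMS), the thresholds (min_concepts, max_overlap), and the function names are all hypothetical choices made for this sketch; the track does not prescribe any of them, and participants were free to choose their own.</p>
        <preformat><![CDATA[
```python
import re

# Hypothetical vocabulary of software development concepts. This is an
# assumption for illustration; the track does not define such a list.
CONCEPT_TERMS = {"matrix", "hash", "buffer", "allocate", "shutdown", "exception", "version"}


def tokens(text):
    """Return the set of lowercase alphabetic tokens (snake_case is split)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def classify(comment, code, min_concepts=1, max_overlap=0.5):
    """Apply the Relevant / Redundant rule with assumed thresholds."""
    concepts = tokens(comment) & CONCEPT_TERMS
    if len(concepts) < min_concepts:
        return "Not Useful"  # too few concepts: not relevant
    # Fraction of the comment's concepts that also occur in the code.
    overlap = len(concepts & tokens(code)) / len(concepts)
    if overlap >= max_overlap:
        return "Not Useful"  # relevant, but redundant with the code
    return "Useful"  # relevant and not redundant
```
]]></preformat>
        <p>In this sketch, raising max_overlap makes the rule more tolerant of comments that restate terms already present in the code, while raising min_concepts demands more concept terms before a comment counts as relevant.</p>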
        <sec id="sec-3-1-1">
          <title>3.1.1. What is a Relevant Code Comment?</title>
          <p>The notion of relevant comments refers to those that developers perceive as important in
comprehending the associated or surrounding lines of code. These concepts are related to the
outline of the algorithm, data-structure descriptions, mapping to user interface details, possible
exceptions, version details, etc. In the examples below, the comments highlight useful details
about the input data to the function, which are not evident from the associated code itself.</p>
          <preformat># works on a two dimensional data matrix (each of size 8) generated from the light rider bot module
int* flood_fill(self, position, visited) {...}</preformat>
          <preformat>/* uses png_calloc defined in pngriv.h*/
PNG_FUNCTION(png_voidp,PNGAPI
png_calloc,(png_const_structrp
png_ptr, png_alloc_size_t size),PNG_ALLOCATED)
{ }</preformat>
          <p>Therefore, a relevant comment provides more information for the surrounding code and
subsequently aids better comprehension that can improve software maintenance.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. Relevant, but Redundant</title>
          <p>However, in the example below, even though the comment contains relevant information, that
information is already available in the associated code, rendering the comment redundant.</p>
          <preformat>// PHP Shutdown method to destroy the global php hash map, using zend hash api’s
PHP_MSHUTDOWN_FUNCTION(hash) { ... zend_hash_destroy(&amp;php_hashtable); . }</preformat>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Dataset</title>
        <p>We select 5 projects from GitHub and use the modified random sampling approach of Cochran [15]
to sample source files with equal probability and hence provide an unbiased representation
of the population (C code files with comments). We gather a total of 318 files with 20,206
comments.</p>
        <p>Ground Truth Generation: For every comment, a label (Useful or Not Useful) has been
generated by a team of 14 annotators. Every comment has been annotated by 2 annotators, with a
kappa (κ) value of 0.734 (Cohen’s metric [16]). The annotation process has been supervised
through weekly meetings, brainstorming sessions, and peer review. Out of the total 16,000
comments, 2,285 comments were annotated by every individual annotator. A total of 156
man-hours were required to complete the annotation process.</p>
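        <p>For readers unfamiliar with the agreement metric, Cohen’s kappa for two annotators can be computed as in the following minimal sketch. The function name and the toy labels are illustrative assumptions; this is not the script used to obtain the reported agreement value.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy example with made-up labels (U = Useful, N = Not Useful).
annotator_1 = ["U", "U", "N", "N"]
annotator_2 = ["U", "N", "N", "N"]
kappa = cohen_kappa(annotator_1, annotator_2)
```
]]></preformat>
        <p>Kappa corrects the raw agreement rate for the agreement expected by chance, which matters for a two-label scheme where chance agreement is high.</p>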
        <p>For the IRSE track, we use a set of 9048 comments (from GitHub) with comment text,
surrounding code snippets, and a label that specifies whether the comment is useful or not. Sample
data is characterised in Table 1.</p>
        <list list-type="bullet">
          <list-item>
            <p>The development dataset contains 8048 rows of comment text, surrounding code snippets,
and labels (Useful and Not useful).</p>
          </list-item>
          <list-item>
            <p>The test dataset contains 1,000 rows of comment text, surrounding code snippets, and
labels (Useful and Not useful).</p>
          </list-item>
        </list>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Participation and Evaluation</title>
      <p>In its first edition, IRSE 2022 received a total of 34 experiments from 11 teams. As this track is
related to software maintenance, we received participation from companies like Amazon and
Mentor Graphics along with several research labs of educational institutes.</p>
      <p>The various teams with the details of their submissions are characterised in Table 3.</p>
      <p>Evaluation Procedure: Participants were asked to submit predicted labels (’useful’ or ’not
useful’) for every data point in the test set of 1000 comments. Our script compared these
predictions against the annotated (gold) labels to compute the precision, recall, and
macro-averaged F1-Score.</p>
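      <p>The macro-averaged metrics can be computed along the following lines. This is a minimal sketch under the usual definition of macro averaging (per-class scores averaged with equal weight); the function name and the toy labels are assumptions, and it is not the organisers’ evaluation script.</p>
      <preformat><![CDATA[
```python
def macro_scores(gold, predicted, labels=("useful", "not useful")):
    """Return macro-averaged (precision, recall, F1) over the given labels."""
    per_class = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class.append((precision, recall, f1))
    # Macro average: every class contributes equally, whatever its size.
    n = len(labels)
    return tuple(sum(scores[i] for scores in per_class) / n for i in range(3))


# Toy example with made-up labels.
gold = ["useful", "useful", "not useful", "not useful"]
pred = ["useful", "not useful", "not useful", "not useful"]
macro_p, macro_r, macro_f1 = macro_scores(gold, pred)
```
]]></preformat>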
      <p>Features: Apart from evaluating the prediction metrics, we analysed the types of features the
teams used to build their machine learning pipelines. The teams performed routine
pre-processing and retained only the significant words or letters for both the code and
comment pairs. Some teams also used morphological features of a comment,
such as its length, ratio of significant words, parts-of-speech characteristics, or occurrence of
words from an enumerated set, as textual features. To correlate code and comment and detect
redundancies, the teams mostly used grep-like string matching to find similar words.</p>
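      <p>To make these feature types concrete, the sketch below derives three of them (comment length, ratio of significant words, and a string-match overlap with the code) for a single comment and code pair. The stop-word list, the feature names, and the tokenisation are assumptions chosen for illustration and do not reproduce any particular team’s pipeline.</p>
      <preformat><![CDATA[
```python
import re

# Small illustrative stop-word list (an assumption; teams used their own).
STOP_WORDS = {"a", "an", "the", "to", "of", "in", "on", "is", "and", "for", "this", "that"}


def words(text):
    """Lowercase alphabetic tokens; identifiers like foo_bar are split."""
    return re.findall(r"[a-z]+", text.lower())


def comment_features(comment, code):
    """Compute simple textual and code-correlation features for one pair."""
    comment_words = words(comment)
    significant = [w for w in comment_words if w not in STOP_WORDS]
    code_words = set(words(code))
    # String-match correlation: significant comment words found in the code.
    matched = [w for w in significant if w in code_words]
    return {
        "length": len(comment_words),
        "significant_ratio": len(significant) / len(comment_words) if comment_words else 0.0,
        "code_overlap": len(matched) / len(significant) if significant else 0.0,
    }


features = comment_features("increment the counter", "counter = counter + 1;")
```
]]></preformat>
      <p>A high code_overlap value suggests that the comment mostly repeats words already visible in the code, which is the kind of redundancy the string-matching features aim to capture.</p>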
      <p>Vector Space Representations: Code and comments differ in semantic granularity, and a
vector space representation unifies them. The participants used various embeddings to
generate vectors for the words, such as one-hot encoding, tf-idf,
word2vec, or context-aware models like ELMo and BERT. Each of the employed embedding models is
trained or fine-tuned using software development corpora.</p>
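      <p>As a simple illustration of projecting comments and code into a shared vector space, the sketch below builds tf-idf vectors and compares them by cosine similarity. It covers only the tf-idf option, uses a smoothed idf chosen for this example, and all names are assumptions; the pre-trained models mentioned above (word2vec, ELMo, BERT) would replace tfidf_vectors with learned embeddings.</p>
      <preformat><![CDATA[
```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphabetic tokens; identifiers like foo_bar are split."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse tf-idf vector (a dict) per document."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(toks)) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    "destroy the hash map",              # comment
    "zend_hash_destroy(php_hashtable)",  # its code
    "read the input file",               # unrelated comment
]
vecs = tfidf_vectors(docs)
similarity_to_code = cosine(vecs[0], vecs[1])
similarity_to_other = cosine(vecs[0], vecs[2])
```
]]></preformat>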
    </sec>
    <sec id="sec-5">
      <title>5. Results and Analysis</title>
      <p>The F1-Scores have been analysed based on the machine learning models used, the features and
pre-trained embeddings for projection to vector space.</p>
      <p>The dataset provided was balanced and had 4015 useful comments and 4033 not useful
comments. The textual features used by the teams were mostly related to mining specific words
and determining significant words. Similarly, almost all teams used string matching to locate
overlapping words between code and comment.</p>
      <p>Significant differences were observed in terms of the pre-trained embeddings and the
machine learning models used, both of which contributed to improvements in the F1-Score.</p>
      <p>Machine Learning Architectures: The best F1-Score was obtained using the GPT architecture,
although it is resource-intensive and can be afforded by software companies. The other machine
learning models commonly used are recurrent neural networks, support vector machines,
random forests, and logistic regression with textual and correlation features. The F1-Scores
obtained using recurrent neural networks and support vector machines are comparable to those
obtained from BERT. This is because the dataset is balanced and also due to the use of various
textual features apart from numerical vectors (as features).</p>
      <p>
        Pre-Trained Embeddings: Both context-aware and context-independent embeddings, either
trained from scratch or fine-tuned with software development concepts, have been used.
Results are better with the recently released CodeBERT pre-trained embeddings [17], which
are trained with masked language modeling on natural language and programming language pairs
from software projects of different domains. Comparable results have been obtained
by using CodeELMo [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] which trains ELMo from scratch using software development corpora
from books, journals, and code repositories.
      </p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>The IRSE track in its first edition empirically investigates various approaches in a
machine-learning framework for automated comment quality evaluation. The comments are evaluated
based on whether they contain information that can aid in understanding the surrounding code.
A total of 11 teams participated and submitted 34 experiments that used various types of machine
learning models, embedding spaces, and features. The best F1-Score of 90.8 was reported by
experiments conducted using GPT-2 architecture with textual and numerical features from
CodeBERT vector space embeddings, to classify comments as ’useful’ and ’not useful’.</p>
      <p>[13] M. Liu, Y. Yang, X. Peng, C. Wang, C. Zhao, X. Wang, S. Xing, Learning based and
context aware non-informative comment detection, International Conference on Software
Maintenance and Evolution (ICSME), IEEE, 2020, pp. 866–867.</p>
      <p>[14] M. Alver, N. Batada, Jabref: Cross-platform citation and reference management software,
Open Source, 2003. https://github.com/JabRef/jabref, Last Accessed: December 12, 2020.</p>
      <p>[15] J. Kotrlik, C. Higgins, Organizational research: Determining appropriate sample size in
survey research, Information technology, learning, and performance journal 19 (2001) 43.</p>
      <p>[16] N. Gisev, et al., Interrater agreement and interrater reliability: key concepts, approaches,
and applications, Research in Social and Administrative Pharmacy 9 (2013) 330–338.</p>
      <p>[17] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional
transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yuan</surname>
          </string-name>
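          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Krishna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>icomment: Bugs or bad comments?</article-title>
          ,
          <source>Association for Computing Machinery's Special Interest Group on Operating Systems Review (SIGOPS)</source>
          , ACM,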
          <year>2007</year>
          , pp.
          <fpage>145</fpage>
          -
          <lpage>158</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>I. K.</given-names>
            <surname>Ratol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. P.</given-names>
            <surname>Robillard</surname>
          </string-name>
          ,
          <article-title>Detecting fragile comments</article-title>
          ,
          <source>International Conference on Automated Software Engineering (ASE)</source>
          , IEEE,
          <year>2017</year>
          , pp.
          <fpage>112</fpage>
          -
          <lpage>122</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Freitas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>da Cruz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. R.</given-names>
            <surname>Henriques</surname>
          </string-name>
          ,
          <article-title>A comment analysis approach for program comprehension</article-title>
          ,
          <source>Annual Software Engineering Workshop</source>
          (SEW), IEEE,
          <year>2012</year>
          , pp.
          <fpage>11</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bansal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. D.</given-names>
            <surname>Clough</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Datta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <article-title>Automated evaluation of comments to aid software maintenance</article-title>
          ,
          <source>Journal of Software: Evolution and Process</source>
          <volume>34</volume>
          (
          <year>2022</year>
          )
          <elocation-id>e2463</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L.</given-names>
            <surname>Pascarella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bacchelli</surname>
          </string-name>
          ,
          <article-title>Classifying code comments in java open-source software systems</article-title>
          ,
          <source>International Conference on Mining Software Repositories (MSR)</source>
          , IEEE,
          <year>2017</year>
          , pp.
          <fpage>227</fpage>
          -
          <lpage>237</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Haouari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sahraoui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Langlais</surname>
          </string-name>
          ,
          <article-title>How good is your comment? a study of comments in java programs</article-title>
          ,
          <source>International Symposium on Empirical Software Engineering and Measurement (ESEM)</source>
          , IEEE,
          <year>2011</year>
          , pp.
          <fpage>137</fpage>
          -
          <lpage>146</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Steidl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hummel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Juergens</surname>
          </string-name>
          ,
          <article-title>Quality analysis of source code comments</article-title>
          ,
          <source>International Conference on Program Comprehension (ICPC)</source>
          , IEEE,
          <year>2013</year>
          , pp.
          <fpage>83</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bosu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Greiler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bird</surname>
          </string-name>
          ,
          <article-title>Characteristics of useful code reviews: An empirical study at microsoft</article-title>
          ,
          <source>Working Conference on Mining Software Repositories, IEEE</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>146</fpage>
          -
          <lpage>156</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Hotcomments: how to make program comments more useful?</article-title>
          ,
          <source>Conference on Programming language design and implementation (SIGPLAN)</source>
          , ACM,
          <year>2007</year>
          , pp.
          <fpage>20</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A. T.</given-names>
            <surname>Ying</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Wright</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Abrams</surname>
          </string-name>
          ,
          <article-title>Source code that talks: an exploration of eclipse task comments and their implication to repository mining</article-title>
          ,
          <source>ACM SIGSOFT software engineering notes, ACM</source>
          <volume>30</volume>
          (
          <year>2005</year>
          )
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.-A.</given-names>
            <surname>Storey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ryall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. I.</given-names>
            <surname>Bull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Myers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Singer</surname>
          </string-name>
          ,
          <article-title>Todo or to bug</article-title>
          ,
          <source>International Conference on Software Engineering (ICSE)</source>
          , IEEE,
          <year>2008</year>
          , pp.
          <fpage>251</fpage>
          -
          <lpage>260</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
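          <string-name>
            <given-names>M. M.</given-names>
            <surname>Rahman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. K.</given-names>
            <surname>Roy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. G.</given-names>
            <surname>Kula</surname>
          </string-name>
          ,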
          <article-title>Predicting usefulness of code review comments using textual features and developer experience</article-title>
          ,
          <source>International Conference on Mining Software Repositories (MSR)</source>
          , IEEE,
          <year>2017</year>
          , pp.
          <fpage>215</fpage>
          -
          <lpage>226</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>