<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>But Do Commit Messages Matter? An Empirical Association Analysis with Technical Debt</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>But Do Commit Messages Matter?</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>An empirical analysis is conducted to investigate the association between the content of commit messages and technical debt. The analysis is based on 33 open-source Apache Java projects. Structural Topic Modelling, a recently developed text mining technique, is employed for the analysis. The results show that certain content of commit messages, such as empty messages, is potentially associated with technical debt.</p>
      </abstract>
      <kwd-group>
        <kwd>commit messages</kwd>
        <kwd>technical debt</kwd>
        <kwd>text mining</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <sec id="sec-1-1">
        <title>Commit Messages</title>
        <p>
          Writing commit messages plays an important role in software development, because
it records, or documents, the changes in natural language as the project
progresses. It is believed that well-written commit messages provide useful
information to other developers throughout the life-cycle and eventually contribute to
enhancing software quality ([
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] and [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]). It is almost common
sense that a good team-oriented developer needs not only excellent coding
skills but also the ability to write informative commit messages.
        </p>
        <p>
          However, to the best of the author's knowledge, there has not been an empirical
study based on real-world projects that challenges or confirms the above-mentioned
"common sense". Besides, many related issues have not been properly
discussed. For instance, do empty messages influence software quality? Or,
what kind of content in commit messages is really helpful to software
quality? The purpose of this research is to provide developers with suggestions,
based on empirical analysis, for writing commit messages.
        </p>
        <p>
          Technical Debt (TD) [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] is a metaphor describing the issues (including coding
issues and documentation issues) generated in the development process that cause
potential problems and will have to be solved by "paying back" extra effort in
the future. For example, code smells (e.g. nested or overly complicated code structure)
are one type of TD issue that can potentially result in difficulties in maintenance
([
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] and [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]).
        </p>
        <p>
          Instead of analyzing the software manually, e.g. through code reviews, to
find potential TD issues, one can use one of several available Technical Debt measurement
tools, such as Better Code Hub<sup>1</sup>, Coverity Scan<sup>2</sup> and SonarQube<sup>3</sup>.
Among these tools, SonarQube is one of the most commonly used
in both industry and the research community [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. One effective function of
SonarQube is that it automatically detects TD issues in the software with
rule-based algorithms.
        </p>
        <p>
          TD has attracted considerable attention in the research community (e.g. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ],
[
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] and [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]). Recently, an open-access dataset [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] has been proposed, which
provides more opportunities for TD-related data analysis. One recent work [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]
has provided a basic analysis, including the diffuseness and distribution of TD in the
collected projects, based on the above-mentioned dataset.
        </p>
      </sec>
      <sec id="sec-1-2">
        <title>Topic Modelling</title>
        <p>
          When it comes to text analysis, one common approach is topic modelling ([
          <xref ref-type="bibr" rid="ref2">2</xref>
          ],
[
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] and [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]). A topic model is a probabilistic model in which a topic is defined
as a distribution over words. It is assumed that, while writing,
the author first draws a topic label from a "topic urn" and then draws a word
from the "word urn" corresponding to that topic.
        </p>
        <p>Based on the above-mentioned assumption, topics occur in each document
with different strength (prevalence), so that some documents may be
composed mostly of a particular topic whereas others are a mixture of several
topics.</p>
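        <p>The "urn" assumption can be illustrated with a minimal sketch. The topics, words and probabilities below are made up for illustration and are not taken from the analysis; the function name generate_document is likewise hypothetical.</p>
        <preformat>
```python
import random


def generate_document(topic_prevalence, topic_word_dists, n_words, rng):
    """Draw words by first drawing a topic, then a word from that topic."""
    topics = list(topic_prevalence)
    weights = [topic_prevalence[t] for t in topics]
    words = []
    for _ in range(n_words):
        # Step 1: draw a topic label from the document's "topic urn".
        topic = rng.choices(topics, weights=weights, k=1)[0]
        # Step 2: draw a word from the "word urn" of that topic.
        vocab = list(topic_word_dists[topic])
        probs = [topic_word_dists[topic][w] for w in vocab]
        words.append(rng.choices(vocab, weights=probs, k=1)[0])
    return words


# Illustrative (made-up) topics and prevalence, not results from the paper.
topic_word_dists = {
    "bugfix": {"fix": 0.6, "bug": 0.3, "test": 0.1},
    "release": {"build": 0.5, "version": 0.3, "tag": 0.2},
}
topic_prevalence = {"bugfix": 0.8, "release": 0.2}

doc = generate_document(topic_prevalence, topic_word_dists, 10, random.Random(0))
print(doc)
```
        </preformat>
        <p>With a prevalence of 0.8 for the first topic, most generated words tend to come from that topic, which mirrors a document "composed mostly of a particular topic".</p>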
        <p>
          Thus, topic modelling is capable of not only capturing the underlying topic
content but also modelling the topic prevalence among a set of documents. This
research employs STM [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], a recently developed topic model, to explore the
relationship between commit messages and potential TD issues.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Related Works</title>
      <p>
        There have been research works applying topic modelling to analyze commit
messages. For example, Hindle et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] analyzed commit comments with
LDA. Their work performed the analysis on the repositories of three database
systems: PostgreSQL, MaxDB and Firebird. They found that the commit
messages in the three database systems emphasize different concepts. In
PostgreSQL, the most prevalent topic is related to the external developers Dal
Zotto and Dan McGuirk; in MaxDB, the most prevalent topic is related to
build system files; whereas in Firebird, the most prevalent topic is related
to commonly used terms in commit messages such as "added", "fixed", and
"updated".
      </p>
      <p>
        <sup>1</sup> Better Code Hub, https://bettercodehub.com;
        <sup>2</sup> Coverity Scan, https://scan.coverity.com;
        <sup>3</sup> SonarQube, https://www.sonarqube.org
      </p>
      <p>
        Hu et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] studied commit messages using Dynamic Topic Modeling
(DTM) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] to discover the underlying topics and their evolution over
time. Their work performed case studies on jEdit and PostgreSQL, two
well-known open-source software systems, containing 12116 and 15990 commit
messages respectively. The analysis yielded some interesting topics, such as the
"Fixing Bug and Error" and "GUI" topics in the jEdit case and the "Bug Fixing" and
"Building and Configuration" topics in the PostgreSQL case.
      </p>
      <p>However, although the above-mentioned works have used topic modelling
techniques, they have not investigated the relationships between the extracted topics
and software quality measurements, especially TD, that might be
associated with the commit messages.</p>
      <p>
        Another group of works ([
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]) focuses on sentiment analysis
of commit logs. Among those works, the most relevant one is that of
Islam et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], which investigates bug-related commits and the sentiment of the
corresponding commit messages. They analyzed more than 24000 commit
messages and found that both bug-introducing and bug-fixing commit messages
have higher positive emotional scores.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Methodology</title>
      <sec id="sec-3-1">
        <title>Accumulated Debt Computation</title>
        <p>
          In this research, the Technical Debt Dataset [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] is used. In the dataset, the
variable "reliabilityRemediationEffort" is generated by SonarQube.
SonarQube automatically analyzes the changes to the current software and estimates
the time that will be needed in the future to solve, or "remediate", the detected TD issues.
If SonarQube detects that some TD issues that existed before the commit have been
solved, the value of "reliabilityRemediationEffort" decreases; on the other
hand, the value of "reliabilityRemediationEffort" increases if SonarQube
detects extra issues after the commit.
        </p>
        <p>Using the "reliabilityRemediationEffort" variable, the "debt change" of each
commit is obtained by calculating the difference between the
"reliabilityRemediationEffort" values before and after the commit. Moreover, the "accumulated debt"
that a committer has contributed to a project can be obtained by summing up all
the debt change values of that committer in the project.</p>
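        <p>The computation of the debt change and the accumulated debt can be sketched as follows. This is a minimal illustration only: the record keys "project", "committer" and "effort" are hypothetical stand-ins for the dataset's columns, the commits are assumed to be in chronological order, and the effort before the first commit of a project is assumed to be 0.</p>
        <preformat>
```python
from collections import defaultdict


def debt_changes(commits):
    """Return (project, committer, debt_change) for each commit.

    `commits` is a chronologically ordered list of dicts with the
    hypothetical keys "project", "committer" and "effort" (the remediation
    effort reported after the commit).
    """
    previous = {}  # last seen effort per project
    changes = []
    for c in commits:
        # Assumption: the effort before the first commit of a project is 0.
        before = previous.get(c["project"], 0)
        changes.append((c["project"], c["committer"], c["effort"] - before))
        previous[c["project"]] = c["effort"]
    return changes


def accumulated_debt(commits):
    """Sum the debt changes per (project, committer) pair."""
    totals = defaultdict(int)
    for project, committer, change in debt_changes(commits):
        totals[(project, committer)] += change
    return dict(totals)


# Made-up example data.
commits = [
    {"project": "p1", "committer": "alice", "effort": 10},
    {"project": "p1", "committer": "bob", "effort": 25},
    {"project": "p1", "committer": "alice", "effort": 5},
]
totals = accumulated_debt(commits)
print(totals)
```
        </preformat>
        <p>In this made-up example, the second commit raises the effort from 10 to 25 (a debt change of 15), and the third lowers it from 25 to 5 (a debt change of -20).</p>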
      </sec>
      <sec id="sec-3-2">
        <title>Data Processing</title>
        <p>Empty commit messages can potentially influence software quality, and
they also reflect the developers' style of writing commit messages. To evaluate
this impact and take it into account in the topic model, if a commit message
is empty (575 out of 128375 commit messages in total), the message is labeled as
"emptymessage". The token "emptymessage" is then treated as a special vocabulary item.
All the commit messages written by a certain committer in a specific project
are gathered into one document. There are 1083 documents in total. To focus on the
contribution of a specific developer to a project, the same developer in two
different projects is considered as two different developers.</p>
        <p>The text is transformed into lower case and extra white space is removed.
Note that, to better preserve the writing habits reflected in the commit
messages, the text does not undergo some standard processing steps such as
stemming or lemmatizing.</p>
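        <p>The processing steps described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the original pipeline: the input tuples and the function names normalize and build_documents are hypothetical, and a message that consists only of white space is treated as empty.</p>
        <preformat>
```python
from collections import defaultdict


def normalize(message):
    """Lower-case, collapse extra white space, and label empty messages."""
    text = " ".join(message.lower().split())
    # Assumption: a message with no visible characters counts as empty.
    return text if text else "emptymessage"


def build_documents(commits):
    """Gather all messages of one committer in one project into a document."""
    grouped = defaultdict(list)
    for project, committer, message in commits:
        grouped[(project, committer)].append(normalize(message))
    return {key: " ".join(parts) for key, parts in grouped.items()}


# Made-up example data: the same developer in two projects gives two documents.
commits = [
    ("project_a", "dev_1", "Fixed   NPE in\tparser"),
    ("project_a", "dev_1", ""),
    ("project_b", "dev_1", "Add Javadoc"),
]
documents = build_documents(commits)
print(documents)
```
        </preformat>
        <p>No stemming or lemmatizing is applied, so "fixed" stays distinct from "fix", in line with the choice described above.</p>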
        <p>The previously mentioned "accumulated debt" is used to indicate whether the commit messages contribute to TD.
Documents having a positive "accumulated debt" are categorized as "debt contributed",
and otherwise as "no debt contributed".</p>
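        <p>The categorization rule amounts to a sign check on the accumulated debt. A minimal sketch, with a hypothetical function name and made-up values:</p>
        <preformat>
```python
def label_document(accumulated_debt):
    """Label a document by the sign of its accumulated debt."""
    if accumulated_debt > 0:
        return "debt contributed"
    # Zero and negative values both fall into the second group.
    return "no debt contributed"


# Made-up accumulated debt values for three documents.
labels = [label_document(value) for value in (15, 0, -10)]
print(labels)
```
        </preformat>
        <p>Note that under this rule a document with an accumulated debt of exactly zero is grouped together with documents that reduced the debt.</p>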
      </sec>
      <sec id="sec-3-3">
        <title>Text Mining</title>
        <p>
          The Structural Topic Model (STM) [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] is used to analyze the relationship
between commit message topics and TD issues. STM is a more advanced
model in that, compared with topic models such as LDA and DTM, it takes
document-level covariates into account. That is, instead of analyzing the topics
and their relationships to covariates in a two-stage manner, STM provides an
integrated solution to the problem. STM has been widely used in different
research disciplines, such as policy research [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and climate change [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>
          In the model selection process, for each topic number from 6 to 20, 10 randomly
initialized models are built with a randomly selected 50% of the training documents.
The topic number with the highest average held-out likelihood on
the 50% of testing documents is selected. After deciding the topic number,
50 models are built and the one with the best semantic coherence [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] over topics
is chosen as the final model.
        </p>
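        <p>The first stage of the selection procedure can be sketched as a simple search loop. This is a sketch under stated assumptions and does not fit a real topic model: heldout_likelihood is a hypothetical callable standing in for fitting one randomly initialized model and scoring it on held-out data, and fake_likelihood is a made-up stand-in that is used only to make the example runnable.</p>
        <preformat>
```python
import random
from statistics import mean


def select_topic_number(heldout_likelihood, candidates, runs, seed=0):
    """Return the K with the highest mean held-out likelihood over `runs` fits.

    `heldout_likelihood(k, run_seed)` is a hypothetical callable that fits one
    randomly initialized model with k topics and returns its held-out
    likelihood.
    """
    rng = random.Random(seed)
    averages = {}
    for k in candidates:
        scores = [heldout_likelihood(k, rng.randrange(10**6)) for _ in range(runs)]
        averages[k] = mean(scores)
    best_k = max(averages, key=averages.get)
    return best_k, averages


def fake_likelihood(k, run_seed):
    # Stand-in for a real model fit: peaks at k = 18 plus small seeded noise.
    noise = random.Random(run_seed).uniform(-0.01, 0.01)
    return -abs(k - 18) * 0.1 + noise


# Topic numbers 6 to 20 with 10 runs each, as in the procedure described above.
best_k, averages = select_topic_number(fake_likelihood, range(6, 21), runs=10)
print(best_k)
```
        </preformat>
        <p>The second stage (building 50 models with the chosen topic number and keeping the one with the best semantic coherence) follows the same pattern, with the maximum taken over models instead of topic numbers.</p>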
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Result</title>
      <p>Among the 1083 documents (developer-project pairs), 868 are "no debt
contributed" and 215 are "debt contributed". The proportion of "debt
contributed" documents is around 20%.</p>
      <p>The result of model selection is shown in Figure 1. The x-axis (K) represents
the number of topics and the corresponding held-out likelihood is shown on the
y-axis. Since K = 18 reaches the highest held-out likelihood, it is selected.</p>
      <p>Top words of the 18 extracted topics can be found in Table 1. Apparently,
common words such as "the", "and", "fix" and "for" can be found in most of
the topics. The relationship between topic prevalence and TD can be observed
in Figure 2. Topics 5, 6, 8 and 9 are associated with not adding TD to the projects,
whereas Topics 1, 3, 13, 15 and 16 are associated with adding TD to the projects.</p>
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <p>
        The prevalence of common words in the extracted topics is similar to the results
of the previously mentioned related works ([
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]). Another reason may be
that stop words are not removed in the data processing steps. However,
the differences between topics are still observable.
      </p>
      <p>When focusing on the non-debt-contributing Topics 5, 6, 8 and 9, one
observation worth noting is that detail-oriented messages seem to be associated
with lower TD. For example, Topic 5 has the words "version" and "common", and Topic 6
has the words "line", "code" and "here". Besides, they both have the word "this".
It is likely that Topic 8 is related to the final stage of software development
("build", "file", "tag" and "now"), so the TD has to be removed. Topic
9 contains only common words; the reason why it reduces TD needs further
analysis. It may be that it neither adds nor reduces TD, and is therefore categorized
into the "no debt contributed" group.</p>
      <p>Most of the debt-contributing Topics 1, 3, 13, 15 and 16 include the term "fix".
This may be because some code smells or TD issues are generated by over-editing
the code. Topic 13, which contains the terms "submitted", "reviewed" and "obtained", could
be related to commits in certain stages of software development that
generate TD issues, and its term "emptymessage" potentially shows that empty
commit messages could result in TD issues. Topics 3 and 16 could be developer- or
project-related.</p>
      <p>The discovered topics related to TD issues are to some extent explainable; however,
more detail-oriented analysis is still required to draw a more
comprehensive picture.</p>
      <p>Another limitation of this research is that the type of project and the type
of the changed files of a commit are not taken into consideration. In theory, the
difficulty and complexity of the project can affect the TD. Besides, the type of changed file
should affect the content of commit messages, e.g. changing ".html" files leads
to more web-development-related content whereas changing ".README" files is
more likely to generate user-guideline messages. One practical difficulty in
this analysis task is that some commits involve more than one changed file,
e.g. the changed files contain both ".README" files and ".html" files. Therefore the
effect of the file type is not taken into consideration. The effect of the above-mentioned
factors can be taken into account in future work.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>This research conducted an empirical text analysis of the relationship between
commit messages and TD issues. The analysis uses a sophisticated topic modelling
technique to analyze a collection comprising 128375 commit messages and the
corresponding estimated TD measurements across 33 different real-world open-source
Java projects. The findings show that:
        <list list-type="order">
          <list-item><p>Some topics extracted from the commit messages are potentially associated with TD.</p></list-item>
          <list-item><p>In general, writing detail-oriented commit messages has a negative association with TD.</p></list-item>
          <list-item><p>Empty commit messages have a positive association with TD.</p></list-item>
        </list>
      </p>
      <p>However, the details and mechanisms behind the discovered
associations still require further investigation.</p>
      <table-wrap>
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Topic Number</th><th>Top Words</th></tr>
          </thead>
          <tbody>
            <tr><td>Topic 1</td><td>the, and, for, block, fixed, from, use, bug, fixing</td></tr>
            <tr><td>Topic 2</td><td>fix, add, for, remove, and, update, bug, use, support</td></tr>
            <tr><td>Topic 3</td><td>alexantonenko, for, via, contributed, and, fix, shwethags, the, atlas</td></tr>
            <tr><td>Topic 4</td><td>the, for, closes, and, this, add, mavenreleaseplugin, prepare, from</td></tr>
            <tr><td>Topic 5</td><td>the, and, for, from, build, version, adding, commons, this</td></tr>
            <tr><td>Topic 6</td><td>cvs, this, the, then, from, here, and, has, line</td></tr>
            <tr><td>Topic 7</td><td>the, for, jaimin, and, add, from, fix, ncole, task</td></tr>
            <tr><td>Topic 8</td><td>the, for, and, added, tag, build, file, ant, now</td></tr>
            <tr><td>Topic 9</td><td>add, javadoc, for, update, and, fix, from, the, use</td></tr>
            <tr><td>Topic 10</td><td>the, and, for, added, fixed, from, test, with, connection, not, and, wizard</td></tr>
            <tr><td>Topic 11</td><td>aonishuk, for, not, atkach, ababiichuk, onechiporenko, page, make, and</td></tr>
            <tr><td>Topic 12</td><td>not, the, with, use, was, from, samples, make, and</td></tr>
            <tr><td>Topic 13</td><td>from, submitted, reviewed, obtained, the, for, added, and, emptymessage</td></tr>
            <tr><td>Topic 14</td><td>yusaku, via, srimanth, for, the, and, not, should, page</td></tr>
            <tr><td>Topic 15</td><td>the, added, that, for, and, patch, test, fixed, with</td></tr>
            <tr><td>Topic 16</td><td>via, for, ambari, swagle, not, the, upgrade, dsen, and</td></tr>
            <tr><td>Topic 17</td><td>the, dlysnichenko, akovalenko, for, and, from, created, jluniya, moe</td></tr>
            <tr><td>Topic 18</td><td>added, the, for, fixed, updated, removed, and, javadoc, test</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Allman</surname>
          </string-name>
          , E.:
          <article-title>Managing technical debt</article-title>
          .
          <source>Commun. ACM</source>
          <volume>55</volume>
          (
          <issue>5</issue>
          ),
          <fpage>50</fpage>
          –
          <lpage>55</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Blei</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lafferty</surname>
          </string-name>
          , J.D.:
          <article-title>Dynamic topic models</article-title>
          .
          <source>In: Proceedings of the 23rd international conference on Machine learning</source>
          . pp.
          <fpage>113</fpage>
          –
          <lpage>120</lpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Blei</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jordan</surname>
            ,
            <given-names>M.I.</given-names>
          </string-name>
          :
          <article-title>Latent dirichlet allocation</article-title>
          .
          <source>Journal of machine Learning research 3(Jan)</source>
          ,
          <fpage>993</fpage>
          –
          <lpage>1022</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>D'Ambros</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bacchelli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lanza</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>On the impact of design flaws on software defects</article-title>
          .
          <source>In: 2010 10th International Conference on Quality Software</source>
          . pp.
          <fpage>23</fpage>
          –
          <lpage>31</lpage>
          .
          <string-name>
            <surname>IEEE</surname>
          </string-name>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Farrell</surname>
          </string-name>
          , J.:
          <article-title>Corporate funding and ideological polarization about climate change</article-title>
          .
          <source>Proceedings of the National Academy of Sciences</source>
          <volume>113</volume>
          (
          <issue>1</issue>
          ),
          <fpage>92</fpage>
          –
          <lpage>97</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Fowler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Refactoring: Improving the design of existing code</article-title>
          .
          <source>In: 11th European Conference</source>
          . Jyvaskyla,
          <source>Finland</source>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Gilardi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shipan</surname>
            ,
            <given-names>C.R.</given-names>
          </string-name>
          , Wuest,
          <string-name>
            <surname>B.</surname>
          </string-name>
          :
          <article-title>The diffusion of policy perceptions: Evidence from a structural topic model</article-title>
          . University of Zurich and University of Michigan (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Guzman</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Azocar</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Sentiment analysis of commit comments in github: an empirical study</article-title>
          .
          <source>In: Proceedings of the 11th Working Conference on Mining Software Repositories</source>
          . pp.
          <fpage>352</fpage>
          –
          <lpage>355</lpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Hindle</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Godfrey</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holt</surname>
            ,
            <given-names>R.C.</given-names>
          </string-name>
          :
          <article-title>What's hot and what's not: Windowed developer topic analysis</article-title>
          .
          <source>In: 2009 IEEE International Conference on Software Maintenance</source>
          . pp.
          <fpage>339</fpage>
          –
          <lpage>348</lpage>
          .
          <string-name>
            <surname>IEEE</surname>
          </string-name>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Modeling the evolution of development topics using dynamic topic models</article-title>
          .
          <source>In: 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER)</source>
          . pp.
          <fpage>3</fpage>
          –
          <lpage>12</lpage>
          .
          <string-name>
            <surname>IEEE</surname>
          </string-name>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Humphrey</surname>
            ,
            <given-names>W.S.:</given-names>
          </string-name>
          <article-title>A discipline for software engineering</article-title>
          .
          <source>Addison-Wesley Longman Publishing Co., Inc</source>
          . (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Islam</surname>
            ,
            <given-names>M.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zibran</surname>
            ,
            <given-names>M.F.</given-names>
          </string-name>
          :
          <article-title>Sentiment analysis of software bug related commit messages</article-title>
          .
          <source>Network</source>
          <volume>740</volume>
          ,
          <issue>740</issue>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Lenarduzzi</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , , Saarimaki, N.,
          <string-name>
            <surname>Taibi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>The technical debt dataset</article-title>
          .
          <source>In: The Fifteenth International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE19) (Sept</source>
          <year>2019</year>
          ). https://doi.org/10.1145/3345629.3345630
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Lenarduzzi</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sillitti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taibi</surname>
            ,
            <given-names>D.:</given-names>
          </string-name>
          <article-title>A survey on code analysis tools for software maintenance prediction</article-title>
          .
          <source>In: International Conference in Software Engineering for Defence Applications</source>
          . pp.
          <fpage>165</fpage>
          –
          <lpage>175</lpage>
          . Springer (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shatnawi</surname>
            ,
            <given-names>R.:</given-names>
          </string-name>
          <article-title>An empirical study of the bad smells and class error probability in the post-release object-oriented system evolution</article-title>
          .
          <source>Journal of systems and software 80(7)</source>
          ,
          <fpage>1120</fpage>
          –
          <lpage>1128</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Lozano</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wermelinger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Assessing the effect of clones on changeability</article-title>
          .
          <source>In: 2008 IEEE International Conference on Software Maintenance</source>
          . pp.
          <fpage>227</fpage>
          –
          <lpage>236</lpage>
          .
          <string-name>
            <surname>IEEE</surname>
          </string-name>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Mimno</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wallach</surname>
            ,
            <given-names>H.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Talley</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leenders</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCallum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Optimizing semantic coherence in topic models</article-title>
          .
          <source>In: Proceedings of the conference on empirical methods in natural language processing</source>
          . pp.
          <fpage>262</fpage>
          –
          <lpage>272</lpage>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name> (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Roberts</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stewart</surname>
            ,
            <given-names>B.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Airoldi</surname>
            ,
            <given-names>E.M.</given-names>
          </string-name>
          :
          <article-title>A model of text for experimentation in the social sciences</article-title>
          .
          <source>Journal of the American Statistical Association</source>
          <volume>111</volume>
          (
          <issue>515</issue>
          ),
          <volume>988</volume>
          –
          <fpage>1003</fpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19. Saarimaki, N.,
          <string-name>
            <surname>Lenarduzzi</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taibi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>On the diffuseness of code technical debt in Java projects of the Apache ecosystem</article-title>
          .
          <source>In: Proceedings of the Second International Conference on Technical Debt</source>
          . pp.
          <volume>98</volume>
          –
          <fpage>107</fpage>
          . IEEE Press (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Sinha</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lazar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sharif</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Analyzing developer sentiment in commit logs</article-title>
          .
          <source>In: Proceedings of the 13th International Conference on Mining Software Repositories</source>
          . pp.
          <volume>520</volume>
          –
          <fpage>523</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Sjøberg</surname>
            ,
            <given-names>D.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yamashita</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anda</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mockus</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dybå</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Quantifying the effect of code smells on maintenance effort</article-title>
          .
          <source>IEEE Transactions on Software Engineering</source>
          <volume>39</volume>
          (
          <issue>8</issue>
          ),
          <volume>1144</volume>
          –
          <fpage>1156</fpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Van Kleek</surname>
            ,
            <given-names>M.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Panovich</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vargas</surname>
            ,
            <given-names>G.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karger</surname>
            ,
            <given-names>D.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schraefel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Note to self: examining personal information keeping in a lightweight note-taking tool</article-title>
          .
          <source>In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems</source>
          . pp.
          <volume>1477</volume>
          –
          <fpage>1480</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>