=Paper= {{Paper |id=Vol-2605/16 |storemode=property |title=On the Effect of Discussions on Pull Request Decisions |pdfUrl=https://ceur-ws.org/Vol-2605/16.pdf |volume=Vol-2605 |authors=Mehdi Golzadeh,Alexandre Decan,Tom Mens |dblpUrl=https://dblp.org/rec/conf/benevol/GolzadehDM19 }} ==On the Effect of Discussions on Pull Request Decisions== https://ceur-ws.org/Vol-2605/16.pdf
    On the Effect of Discussions on Pull Request Decisions

                            Mehdi Golzadeh, Alexandre Decan, Tom Mens
                             Software Engineering Lab, University of Mons
                                             Mons, Belgium
                       {mehdi.golzadeh, alexandre.decan, tom.mens}@umons.ac.be



                                                                    social phenomenon [1, 2]. GitHub embraces this social
                                                                    nature by extending the traditional git workflow with
                       Abstract                                     collaboration mechanisms such as pull requests (PR)
                                                                    and commenting. The pull-based development pro-
    Open-source software relies on contributions                    cess [3] constitutes the primary means for integrating
    from different types of contributors. Online                    code from thousands of developers. It allows devel-
    collaborative development platforms, such as                    opers to participate in many projects without having
    GitHub, usually provide explicit support for                    direct commit access. The primary advantage of a PR
    these contributions through the mechanism of                    is the decoupling of the development effort from the
    pull requests, allowing project members and                     decision to merge the result to the project’s codebase.
    external contributors to discuss and evaluate                   It helps developers to avoid frequent merge conflicts
    the submitted code. These discussions can                       with other contributors.
    play an important role in the decision-making
                                                                       Through a built-in commenting mechanism, project
    process leading to the acceptance or rejection
                                                                    integrators can review the code submitted in a PR, and
    of a pull request. We empirically examine in
                                                                    ask contributors to improve their code, add documen-
    this paper 183K pull requests and their dis-
                                                                    tation and tests before deciding to integrate it [4, 5].
    cussions, for almost 4.8K GitHub repositories
                                                                    Therefore, the history of commenting activity on a PR
    for the Cargo ecosystem. We investigate the
                                                                    (including all pull request comments and pull request
    prevalence of such discussions, their partici-
                                                                    review comments) provides a valuable source of infor-
    pants and their size in terms of messages and
                                                                    mation. It enables analysis of who was involved in the
    durations, and study how these aspects relate
                                                                    discussion about a PR (e.g. the PR creator, project
    to pull request decisions.
                                                                    integrators, or other contributors). The discussions
   Index terms— collaborative development, pull re-                 that take place between the author of the PR and the
quests, discussions, software repository mining, empir-             project integrators may play a key role in the ultimate
ical analysis1                                                      decision to merge the PR into the code base, if the con-
                                                                    cerns raised by the project integrators were properly
                                                                    addressed or discussed carefully by the PR author.
1    Introduction
                                                                       While many studies have focused on the importance
Today’s open source software development is increas-                of having successful PRs [6–9], there is much less re-
ingly relying on third-party contributors. Developers               search on understanding the effect of the presence of
contribute to different projects on online distributed              discussions on the decision to accept or reject a PR.
development platforms like GitHub. The collaborative                Our research aims to empirically study the relation
nature of software development makes it an inherently               between the PR commenting history and the final PR
                                                                    decision. As preliminary steps, we focus in this paper
Copyright © by the paper’s authors. Use permitted under Cre-
ative Commons License Attribution 4.0 International (CC BY          on three research questions:
4.0).                                                                  RQ1 How prevalent are discussions in PRs? helps
In: D. Di Nucci, C. De Roover (eds.): Proceedings of the 18th       us to determine whether the research goal is worth-
Belgium-Netherlands Software Evolution Workshop, Brussels,          while to pursue: if there is only a limited number
Belgium, 28-11-2019, published at http://ceur-ws.org
   1 This research is supported by the joint FNRS / FWO             of PRs with discussions, then we will not be able to
Excellence of Science project SECO-ASSIST and FNRS PDR              draw statistically significant conclusions on their re-
T.0017.18.                                                          lation with PR decisions. We show that most PRs




have at least a few comments and a few participants             effect of organization and developer profiles on the PR
involved in their discussions, and that the presence of         decision [7].
a discussion is related to the decision. In RQ2 Who
is involved in PR discussions? we identify and group            3   Methodology
participants based on their role in a PR. We report
about their combined presence in discussions and ex-            To carry out our empirical investigation, we need a
hibit a relation between a PR decision and the partic-          dataset containing a large number of repositories and
ipants that are involved in its discussion. Finally, in         PRs. The dataset should exclude git repositories that
RQ3 How long are discussions? we measure discussion             have been created merely for experimental or personal
length in terms of time and of number of comments               reasons, or that only show sporadic traces of activity
and show how they relate to a PR decision.                      and contributions [28]. Registries of reusable software
   The remainder of this paper is organized as follows.         packages (e.g., npm for JavaScript, Cargo for Rust,
Section 2 provides the necessary background of studies          or PyPI for Python) are good candidates to find such
related to PRs and comments. Section 3 presents the             repositories, as they typically host thousands of active
data extraction and methodology. Section 4 presents             software projects, and as one can expect most of them
the preliminary results for the above research ques-            to have an associated git repository.
tions. Section 5 discusses the threats to validity of our           We selected the Cargo package registry for the Rust
study. Section 6 summarises the main findings and               programming language, because it contains tens of
outlines future work.                                           thousands of projects, and a large majority of them
                                                                (nearly 85%) are being developed on GitHub. As both
                                                                Cargo and Rust are quite recent (Rust was introduced
2   Background                                                  in 2011), they contain a large number of repositories,
Distributed software development on shared online               even after filtering out those that are inactive in terms
GitHub repositories very frequently follows a                   of contributions and discussions related to these
pull-based development process [3–5]. Any contributor can       contributions.
create forks of a repository, update them locally by                We relied on the libraries.io data dump to extract the
contributing code changes and, whenever ready, re-              metadata for more than 15K Cargo packages [29]. We
quest to have these changes merged back into the main           filtered out 1,571 packages that did not have any as-
branch by submitting a PR [10]. This pull-based soft-           sociated git repository and 413 packages whose repos-
ware development model offers a distributed collabo-            itory is not hosted on GitHub. Not all git reposito-
ration mechanism that allows developers to contribute           ries were still available at the time we extracted the
code in a way that makes code changes trackable                 data, and our final list of repositories is composed of
and reviewable by version control systems. This                 9,954 candidates. For each of these repositories, we
review mechanism has the additional effect of increasing        used the GitHub API to retrieve its complete list of PRs
awareness of all changes and allows the developer com-          and, for each PR, all related comments and PR review
munity to form an opinion about the proposed changes            comments. We found that 5,210 repositories did not
and the ultimate merge decision [11]. Many empiri-              have any PRs, hence only 4,744 repositories were re-
cal studies have targeted pull requests from different          tained for further analysis, accounting for more than
points of view, including evaluation of PRs through             188K PRs.
discussion [6], factors influencing acceptance or                   As our goal is to study the relation between
rejection [8, 9, 12, 13] and predicting potential future        discussions and PR decisions, we decided to remove all PRs
contributors [14].                                              for which no decision was (yet) taken. Such PRs
   Moreover, there are studies that analyze the                 represent a small fraction of our dataset (around 2.6%). Our
content of PRs to recommend core members to review,             final dataset contains more than 183K PRs, submitted
analyze, evaluate and integrate PRs [15–19], recommend          by 13,623 contributors and accounting for nearly 1M
PRs with high priority [20], study the effect of ge-            comments.
ographical location of contributors on evaluation of                For each PR in this dataset, we have access to its
PRs [21], and gender bias in PR acceptance or re-               creation date, its decision date, its decision, the per-
jection [22]. Some studies targeted code reviews to             son that made that decision, the author of the PR,
study the reasons and impact of confusion in code               and all the comments that were made, including PR
reviews [23], linguistic aspects of code review com-            review comments. It is important to note that the very
ments [24], the impact of continuous integration on             first comment visible in a PR corresponds to the PR
code reviews [25], the challenges faced by code change          description, and is not considered as a PR comment
authors and reviewers [26], how developers perceive             in this paper, following the distinction also made by
code review quality [27], how presence of bots and the          GitHub. For each comment, we retrieved its creation




date and its owner. We distinguish between four cat-                                      comment (has comments), at least two participants
egories of owners:                                                                        (has participants) and at least one comment exchange
                                                                                          (has exchange). Fig. 2 reports on these proportions.
           1. author corresponds to the contributor submitting                            Note that by definition a comment exchange implies at
              the PR;                                                                     least 2 participants, hence we have has exchange ⇒
           2. integrator refers to the person having accepted or                          has participants ⇒ has comments.
              rejected a previous PR in the same project;
           3. decider refers to the integrator who accepted or
              rejected the PR currently under consideration;
              and                                                                         [Figure 2 plot: proportion of PRs (y-axis, 0.0 to 1.0)
           4. other corresponds to any other participant (e.g.,                           for Accepted and Rejected PRs (x-axis); series: has
              users, bots, external contributors).                                        comments, has participants, has exchange]
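
The four categories of comment owners listed above can be sketched as a small classification function. This is only an illustration: the PullRequest record, its field names and the precedence order (author first, then decider, then integrator) are assumptions, since the paper does not say how a person matching several categories is handled.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """Hypothetical minimal PR record; the field names are illustrative."""
    author: str
    decider: str
    commenters: list = field(default_factory=list)


def classify(pr, commenter, past_deciders):
    """Return the category of one commenter for one PR.

    past_deciders is the set of users who accepted or rejected an earlier
    PR in the same project (the "integrators" of the paper).
    The precedence author > decider > integrator > other is an assumption.
    """
    if commenter == pr.author:
        return "author"
    if commenter == pr.decider:
        return "decider"
    if commenter in past_deciders:
        return "integrator"
    return "other"


pr = PullRequest(author="alice", decider="bob",
                 commenters=["alice", "bob", "carol", "dave"])
roles = [classify(pr, c, past_deciders={"bob", "carol"}) for c in pr.commenters]
print(roles)
```

Checking the decider before the integrator reflects the definition that a decider is the integrator who decided the PR under consideration.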
4                     Research Results
RQ1 How prevalent are discussions in PRs?                                                 Figure 2: Proportion of accepted and rejected PRs
                                                                                          w.r.t. the presence of comments and participants.
With this first research question, we aim to get in-
sights into the prevalence of discussions in PRs. For                                        While we observe that a majority of PRs (regard-
each PR in the dataset, we computed its number of                                         less of their decision) have comments, proportionally
comments, its number of distinct participants and its                                     more PRs have comments for rejected PRs (72.5%)
number of comment exchanges between one of the inte-                                      than for accepted ones (62.4%). Similar observations
grators and the author, i.e., the number of times there                                   can be made for the other criteria, suggesting a re-
is one comment from an integrator followed by an an-                                      lation between PR acceptance and the presence of a
swer from the PR author. Fig. 1 shows the proportion                                      comment/participant.
of PRs having at least a given number of comments,
participants, and comment exchanges.                                                      RQ2 Who is involved in PR discussions?
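
The notion of a comment exchange used in RQ1 (one comment from an integrator followed by an answer from the PR author) can be sketched as follows. The function name and inputs are hypothetical, and two points are assumptions rather than statements from the paper: the author's answer does not have to be the immediately following comment, and several integrator comments in a row count as a single pending request.

```python
def count_exchanges(comment_owners, author, integrators):
    """Count integrator-comment / author-answer pairs in one discussion.

    comment_owners: usernames of the comment owners in chronological order.
    author: username of the PR author.
    integrators: set of usernames treated as integrators (decider included).
    """
    exchanges = 0
    awaiting_reply = False
    for owner in comment_owners:
        if owner == author:
            # An author comment closes at most one pending exchange.
            if awaiting_reply:
                exchanges += 1
                awaiting_reply = False
        elif owner in integrators:
            awaiting_reply = True
    return exchanges


owners = ["bob", "alice", "carol", "bob", "alice", "alice"]
print(count_exchanges(owners, author="alice", integrators={"bob"}))
```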

                                                                                          This research question focuses on the participants that
                                                                                          are involved in PR discussions. We distinguish between
                                                                                          four categories of participants, as explained in
                                                                                          Section 3. For each PR, each participant involved in
                                                                                          the discussion was classified as author, integrator,
                                                                                          decider or other. Fig. 3 shows the proportion of PR
[Figure 1 plot: proportion of PRs (y-axis, 0.0 to 1.0)                                    discussions as a function of the presence of categories
against min. number of comments, participants or                                          of participants.
exchanges (x-axis, 0 to 24); series: comments,                                               We observe that the author of a PR is involved in
participants, comment exchanges]                                                          most discussions (64%=6+12+3+3+3+4+20+13), as is
Figure 1: Proportion of PRs having at least a given                                       the case for deciders (62%=11+9+20+12+3+4+1+2)
number of comments, participants or comment                                               and integrators (57%=6+9+1+1+3+4+20+13).
exchanges.                                                                                Other participants are involved in only 23%
   We observe that while 48.8% of all PRs have at least                                   (=2+1+4+3+3+3+1+6) of the discussions. We
two comments and 42.4% of all PRs have at least two
participants, only 31.9% of them have comment ex-
changes. We also observe that all curves exhibit power
law behaviour: the proportion of PRs is exponentially
decreasing as the required number of comments,
participants or exchanges increases. For instance, around
80% of all PRs have fewer than 8 comments, 3 participants
and 2 comment exchanges.
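
The curves discussed here report, for each threshold n, the proportion of PRs having at least n comments, participants or exchanges. A minimal sketch of that computation is given below; the function name and the sample values are made up for illustration and are not taken from the dataset.

```python
def proportion_at_least(values, thresholds):
    """For each threshold n, return the share of values that are >= n."""
    if not values:
        raise ValueError("values must not be empty")
    total = len(values)
    return {n: sum(1 for v in values if v >= n) / total for n in thresholds}


# Hypothetical number of comments for eight PRs.
comments_per_pr = [0, 0, 1, 2, 5, 9, 0, 3]
print(proportion_at_least(comments_per_pr, thresholds=[0, 1, 2]))
```

The same function applies unchanged to the number of participants or comment exchanges per PR.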
   Since the presence of comments, participants
and/or comment exchanges could affect the acceptance
or rejection of a PR, we computed the proportion of                                       Figure 3: Proportion of PR discussions w.r.t. the pres-
accepted (resp. rejected) PRs that have at least one                                      ence of participants.




observe that the most frequent combinations of participants
involve the author and some integrator/decider.
For instance, the pair composed of author/integrator
is the most frequent one (40%=13+20+4+3) followed
by the pair author/decider (39%=20+12+4+3).
24% (=20+4) of the discussions involve the author,
an integrator and the decider. 29% (=6+6+11+6) of                              [Figure 5 plot: duration in days (y-axis, 0 to 500)
all cases involve a single participant only.                                   against number of comments (x-axis, 0 to 140);
    Similar to what was done for RQ1, we grouped PRs                           series: Accepted, Rejected]
according to their decision, and we computed the pro-
portion of PRs with respect to the presence of partic-                         Figure 5: Scatter plot and density plots of discussion
ipants of each category. Fig. 4 reports on these pro-                          duration and number of comments.
portions.
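
The percentages quoted for RQ2 can be cross-checked with a short script. The per-combination shares below are not read from Fig. 3; they are a reconstruction inferred from the sums given in the text (for example 64%=6+12+3+3+3+4+20+13 for the author), so they should be treated as an assumption. The letters A, I, D and O stand for author, integrator, decider and other.

```python
# Share (in % of PR discussions) of each exact combination of participant
# categories, reconstructed from the sums quoted in the text (assumption).
REGIONS = {
    frozenset("A"): 6, frozenset("I"): 6, frozenset("D"): 11, frozenset("O"): 6,
    frozenset("AI"): 13, frozenset("AD"): 12, frozenset("AO"): 3,
    frozenset("ID"): 9, frozenset("IO"): 1, frozenset("DO"): 2,
    frozenset("AID"): 20, frozenset("AIO"): 3, frozenset("ADO"): 3,
    frozenset("IDO"): 1, frozenset("AIDO"): 4,
}


def share_involving(categories):
    """Share of discussions in which all given categories are present."""
    wanted = frozenset(categories)
    return sum(pct for combo, pct in REGIONS.items() if wanted <= combo)


def share_single_participant():
    """Share of discussions involving exactly one category."""
    return sum(pct for combo, pct in REGIONS.items() if len(combo) == 1)


for label in ["A", "D", "I", "O", "AI", "AD", "AID"]:
    print(label, share_involving(label))
print("single", share_single_participant())
```

With this reconstruction the 15 combinations add up to 100%, and the pair totals reproduce the 40% and 39% reported for the two most frequent pairs.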
                                                                               number of comments and the duration. We statistically
                                                                               compared these distributions by means of
                                                                               Mann-Whitney-U tests. The null hypothesis was rejected in
                                                                               both cases (p < 0.001), indicating a statistically
                                                                               significant difference between these distributions.
                                                                               However, we found this difference to be negligible (Cliff’s
                                                                               delta |d| = 0.025) for the number of comments [30,31],
                                                                               and small (|d| = 0.219) for the duration of these
[Figure 4 plot: proportion of PRs (y-axis, 0.0 to 1.0)                         discussions, indicating a higher duration in rejected PRs
for Accepted and Rejected PRs (x-axis); series:                                than in accepted ones. For instance, the median duration
discussion with author, integrator, decider, other]                            is 1.69 days for rejected PRs and 0.6 days for accepted
Figure 4: Proportion of PRs w.r.t. participants,                               ones.
grouped by PR decision.                                                           The two regression lines superposed on the scatter
   We observe some interesting differences between ac-                         plot reflect the average time between comments (i.e.,
cepted and rejected PRs mainly based on the presence                           the ratio between duration and comments). We com-
of authors and integrators. 51.4% of rejected PRs in-                          puted this ratio for all considered discussions, and we
volve the author of that PR and 49.6% involve an in-                           statistically compared their distributions for accepted
tegrator, while for accepted PRs only 39.1% involve                            and rejected PRs using a Mann-Whitney-U test. We
the author and 34.3% involve an integrator. While in-                          found a statistically significant difference between the
tegrators are proportionally more involved in rejected                         two distributions (p < 0.001) and a small effect size
than accepted PRs, the opposite is true when it comes                          (|d| = 0.258), indicating a lower ratio (i.e., a more intense
to the decider of a PR: a decider is involved in 42.6%                         discussion) in accepted PRs than in rejected PRs. For instance,
of accepted PRs but “only” in 22.0% of the rejected                            the median average time between comments is 0.08 days for
ones. Finally, when considering all other participants                         accepted PRs, and 0.26 days for rejected PRs.
there is only a slight difference between accepted PRs
(14.4%) and rejected PRs (17.4%).                                              5                     Threats to Validity
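
The effect sizes reported for RQ3 are Cliff's delta values. A minimal pure-Python sketch of this statistic is given below. The magnitude thresholds (0.147, 0.33, 0.474) are the ones commonly used in the literature; that the authors applied exactly these thresholds is an assumption, although the labels they report (negligible for 0.025, small for 0.219 and 0.258) agree with them. The sample durations are invented for illustration.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs (x, y)."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def magnitude(delta):
    """Interpret |delta| with commonly used thresholds (assumed here)."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"


# Hypothetical discussion durations (in days).
accepted = [0.1, 0.4, 0.6, 1.0]
rejected = [0.5, 1.7, 2.0, 3.5]
delta = cliffs_delta(accepted, rejected)
print(delta, magnitude(delta))
```

The pairwise loop is quadratic in the sample sizes, which is fine for a sketch but slow for hundreds of thousands of PRs. For the significance test itself, SciPy provides scipy.stats.mannwhitneyu.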
                                                                               Since our analyses are based on data from git reposi-
RQ3 How long are discussions?
                                                                               tories on GitHub, our results may be exposed to the
The last research question focuses on the length of dis-                       usual threats related to mining data from GitHub such
cussions in terms of number of comments and time be-                           as “a large portion of repositories are not for soft-
tween the first and last comment. We computed these                            ware development” and “two thirds of projects are per-
two characteristics for discussions having at least 2                          sonal” [28]. However, given that our dataset is com-
comments. These account for 49% of all PRs consid-                             posed of git repositories related to Cargo projects, it is
ered so far. The results are reported in Fig. 5, combin-                       unlikely to be affected by such threats. On the other
ing a scatter plot and two density plots (one for each                         hand, the selection bias induced by our dataset be-
considered characteristic).                                                    ing exclusively based on repositories related to Cargo
   We observe from the density plots that most discus-                         projects is a threat to external validity [32], since the
sions have a few comments and last for a short period                          results and conclusions cannot be generalized outside
of time. For instance, the median number of com-                               the scope of this study.
ments is 5 and the median duration is 0.7 days. We                                The main threat to construct validity is that “most
observe from the scatter plot a difference between dis-                        pull requests appear as non-merged even if they are
cussions in accepted and rejected PRs, both for the                            actually merged” [28], potentially leading to an over-




estimation of the number of rejected PRs to the detri-              This paper is part of a broader study and our inten-
ment of accepted ones. Fully addressing this threat              tion is to gain a deeper understanding of the dynamics
is not possible, but we could rely on heuristics to de-          and patterns of discussions in pull requests, and their
tect whether PR commits are actually part of the main            impact on PR decisions. Our goal is to provide tech-
branch. Such heuristics are likely to change the figures         niques and tools to allow the community to perform
reported in this paper, but are unlikely to affect the           better. Reducing the time to make decisions for pull
findings we obtained. Indeed, even if some PRs were              requests can help the community to encourage better
wrongly identified as non-merged (=rejected), we al-             contributions by reducing the time required to reject
ready exhibited differences in PR discussions between            contributions of insufficient quality or relevance, and
accepted and rejected PRs.                                       by reducing the time to review and accept positive con-
    Another threat to construct validity stems from the          tributions. Moreover, based on the insights obtained
presence of bots and contributors with multiple iden-            during this study we aim to develop techniques to in-
tities. We mitigated the problem of multiple identi-             crease the productivity of contributions in terms of
ties by relying on GitHub usernames to identify con-             code quality and contribution time.
tributors instead of the “author” field values. We did
not consider the presence of bots in this work. This             References
may have led to an overestimation of the number of
comments and participants, but our findings should                [1] Laura A. Dabbish, H. Colleen Stuart, Jason Tsay,
not be significantly affected, assuming that bots rep-                and James D. Herbsleb. Social coding in GitHub:
resent only a fraction of the considered comments. In                 transparency and collaboration in an open soft-
our future work, we will study heuristics to detect bot               ware repository. In Int’l Conf. Computer Sup-
comments in order to take them into account in our                    ported Cooperative Work, pages 1277–1286, 2012.
analyses.                                                         [2] Tom Mens, Marcelo Cataldo, and Daniela
    Finally, the lack of distinction between the different            Damian. The social developer: The future of soft-
types of comments in our dataset represents a threat                  ware development. IEEE Software, 36, January–
to internal validity. Not all comments are equal, but                 February 2019.
have been treated as such in this work. We did not
differentiate based on the size or content of the com-            [3] Georgios Gousios, Martin Pinzger, and Arie van
ments. Similarly, we did not distinguish between PR                   Deursen. An exploratory study of the pull-based
comments and PR review comments, even if they do                      software development model. In International
not serve the same purpose. Making such distinctions                  Conference on Software Engineering, pages 345–
can potentially lead to different results, and will be                355. ACM, 2014.
explored in future work to gain additional insights.
6    Conclusion
In this preliminary research, we empirically studied 183K PRs and their discussions, accounting for around 1M comments. We showed that discussions are prevalent in PRs and there are proportionally more comments, participants and comment exchanges for rejected PRs than for accepted ones. We identified and grouped participants based on their role in a PR, and showed that a majority of discussions involved the author, the decider or one of the integrators. We showed that the presence of these participants is related to PR decisions.

Finally, we considered discussion length in terms of duration and number of comments. We observed that most discussions have only a few comments and do not last for long. While we have not found large differences between accepted and rejected PRs based on their number of comments, we found that discussions in rejected PRs are longer, and that discussions in accepted PRs are more intense.

[4] G. Gousios, A. Zaidman, M. Storey, and Arie van Deursen. Work practices and challenges in pull-based development: The integrator’s perspective. In International Conference on Software Engineering, volume 1, pages 358–368. IEEE, May 2015.

[5] Georgios Gousios, Margaret-Anne Storey, and Alberto Bacchelli. Work practices and challenges in pull-based development: The contributor’s perspective. In International Conference on Software Engineering, pages 285–296. ACM, 2016.

[6] Jason Tsay, Laura Dabbish, and James Herbsleb. Let’s talk about it: Evaluating contributions through discussion in GitHub. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2014, pages 144–154, New York, NY, USA, 2014. ACM.

[7] Olga Baysal, Oleksii Kononenko, Reid Holmes, and Michael W. Godfrey. Investigating technical and non-technical factors influencing modern code review. Empirical Software Engineering, 21(3):932–959, Jun 2016.

[8] Mohammad Masudur Rahman and Chanchal K. Roy. An insight into the pull requests of GitHub. In Working Conference on Mining Software Repositories, pages 364–367. ACM, 2014.

[9] Di Chen, Kathryn T. Stolee, and Tim Menzies. Replication can improve prior results: A GitHub study of pull request acceptance. In Proceedings of the 27th International Conference on Program Comprehension, ICPC ’19, pages 179–190, Piscataway, NJ, USA, 2019. IEEE Press.

[10] Y. Yu, H. Wang, V. Filkov, P. Devanbu, and B. Vasilescu. Wait for it: Determinants of pull request evaluation latency on GitHub. In Working Conference on Mining Software Repositories, pages 367–371, May 2015.

[11] Jason Tsay, Laura Dabbish, and James Herbsleb. Influence of social and technical factors for evaluating contribution in GitHub. In International Conference on Software Engineering, pages 356–366. ACM, 2014.

[12] Igor Steinmacher, Gustavo Pinto, Igor Scaliante Wiese, and Marco A. Gerosa. Almost there: A study on quasi-contributors in open source software projects. In Proceedings of the 40th International Conference on Software Engineering, ICSE ’18, pages 256–266, New York, NY, USA, 2018. ACM.

[13] M. Wessel, I. Steinmacher, I. Wiese, and M. A. Gerosa. Should I stale or should I close? An analysis of a bot that closes abandoned issues and pull requests. In 2019 IEEE/ACM 1st International Workshop on Bots in Software Engineering (BotSE), pages 38–42, May 2019.

[14] Damien Legay, Alexandre Decan, and Tom Mens. On the impact of pull request decisions on future contributions. arXiv e-prints, page arXiv:1812.06269, Dec 2018.

[15] Y. Yu, H. Wang, G. Yin, and C. X. Ling. Reviewer recommender of pull-requests in GitHub. In International Conference on Software Maintenance and Evolution, pages 609–612. IEEE, Sep. 2014.

[16] Y. Yu, H. Wang, G. Yin, and C. X. Ling. Who should review this pull-request: Reviewer recommendation to expedite crowd collaboration. In Asia-Pacific Software Engineering Conference, volume 1, pages 335–342, Dec 2014.

[17] Manoel Limeira de Lima Júnior, Daricélio Moreira Soares, Alexandre Plastino, and Leonardo Murta. Developers assignment for analyzing pull requests. In ACM Symposium on Applied Computing, pages 1567–1572. ACM, 2015.

[18] Jing Jiang, J.-H. He, and X.-Y. Chen. CoreDevRec: Automatic core member recommendation for contribution evaluation. Journal of Computer Science and Technology, 30:998–1016, Sep 2015.

[19] Manoel Limeira de Lima Júnior, Daricélio Moreira Soares, Alexandre Plastino, and Leonardo Murta. Automatic assignment of integrators to pull requests: The importance of selecting appropriate attributes. Journal of Systems and Software, 144:181–196, 2018.

[20] E. v. d. Veen, G. Gousios, and A. Zaidman. Automatically prioritizing pull requests. In Working Conference on Mining Software Repositories, pages 357–361. IEEE, May 2015.

[21] Ayushi Rastogi, Nachiappan Nagappan, Georgios Gousios, and André van der Hoek. Relationship between geographical location and evaluation of developer contributions in GitHub. In International Symposium on Empirical Software Engineering and Measurement. ACM, 2018.

[22] Josh Terrell, Andrew Kofink, Justin Middleton, Clarissa Rainear, Emerson Murphy-Hill, and Chris Parnin. Gender bias in open source: Pull request acceptance of women versus men. Jan 2016.

[23] Felipe Ebert, Fernando Castor, Nicole Novielli, and Alexander Serebrenik. Confusion in code reviews: Reasons, impacts, and coping strategies. pages 49–60, Feb 2019.

[24] Vasiliki Efstathiou and Diomidis Spinellis. Code review comments: Language matters. CoRR, abs/1803.02205, 2018.

[25] M. M. Rahman and C. K. Roy. Impact of continuous integration on code reviews. In 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pages 499–502, May 2017.

[26] L. MacLeod, M. Greiler, M. Storey, C. Bird, and J. Czerwonka. Code reviewing in the trenches: Challenges and best practices. IEEE Software, 35(4):34–42, July 2018.




[27] O. Kononenko, O. Baysal, and M. W. Godfrey. Code review quality: How developers see it. In 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE), pages 1028–1038, May 2016.

[28] Eirini Kalliamvakou, Georgios Gousios, Kelly Blincoe, Leif Singer, Daniel M. German, and Daniela Damian. The promises and perils of mining GitHub. In Int’l Conf. Mining Software Repositories, pages 92–101. ACM, 2014.

[29] Jeremy Katz. Libraries.io open source repository and dependency metadata (version 1.4.0) [data set]. http://doi.org/10.5281/zenodo.2536573, 2018.

[30] N. Cliff. Dominance statistics: Ordinal analyses to answer ordinal questions. Psychological Bulletin, 114(3):494–509, 1993.

[31] Jeanine Romano, Jeffrey D Kromrey, Jesse Coraggio, Jeff Skowronek, and Linda Devine. Exploring methods for evaluating group differences on the NSSE and other surveys: Are the t-test and Cohen’s d indices the most appropriate choices? In Annual Meeting of the Southern Association for Institutional Research, 2006.

[32] C. Wohlin, P. Runeson, M. Höst, M. C. Ohlsson, B. Regnell, and A. Wesslén. Experimentation in Software Engineering - An Introduction. Kluwer, 2000.



