=Paper= {{Paper |id=Vol-2605/9 |storemode=property |title=On the Usage of Badges in Open Source Packages on GitHub |pdfUrl=https://ceur-ws.org/Vol-2605/9.pdf |volume=Vol-2605 |authors=Damien Legay,Alexandre Decan,Tom Mens |dblpUrl=https://dblp.org/rec/conf/benevol/LegayDM19 }} ==On the Usage of Badges in Open Source Packages on GitHub== https://ceur-ws.org/Vol-2605/9.pdf
    On the Usage of Badges in Open Source Packages on
                         GitHub

                             Damien Legay, Alexandre Decan, Tom Mens
                            Software Engineering Lab, University of Mons
                                             Mons, Belgium
                        {damien.legay, alexandre.decan, tom.mens}@umons.ac.be



Copyright © by the paper’s authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
In: D. Di Nucci, C. De Roover (eds.): Proceedings of the 18th Belgium-Netherlands Software Evolution Workshop, Brussels, Belgium, 28-11-2019, published at http://ceur-ws.org

Abstract

Continuously attracting contributors is key to the health of open source software projects. The appearance of badges in online collaborative development platforms affords maintainers the opportunity to advertise the quality of their project to potential contributors. In this preliminary research, we analyse 14,592 GitHub package repositories for Cargo and 203,029 repositories for Packagist. We measure how prevalent badges are in those repositories, which badges are used, when and how they are introduced, and which combinations of badges co-occur. We find that the most widespread badges convey static information or relay information about the build status of a project. Those badges are typically added early in projects and prior to or at the same time as other badges.

1    Introduction

The rise of distributed collaborative development platforms, such as GitHub, BitBucket and GitLab, allowed thousands of people to remotely work together on the same projects. These platforms provide additional features on top of their underlying version control system to further support distributed collaborative development. Examples of such features are issue tracking, code review, integration with external tools, etc. These features are usually provided through a centralised web-based graphical interface that acts as a portal to showcase a project. Given the variety and quantity of information that can be communicated through such interfaces, it is not surprising that project maintainers have sought a simpler, faster and more concise way to communicate essential information or advertise specific aspects of a software project.

Badges are small images conveying one specific piece of information to the reader at a glance. We found evidence of their use in GitHub dating back to 2011. They typically appear at the top of a project’s README file, which is displayed by GitHub on the project repository homepage. Badges can advertise various aspects of a project, e.g., its license, the code coverage of its test suite, the adopted code style, etc. Some badges act as an incentive to maintain excellence in the qualities they display, lest a bad signal be sent to the project’s users and potential contributors. For instance, a code coverage badge creates an incentive to maintain a high code coverage, as otherwise, potential contributors can easily see that the project is poorly tested and, therefore, prone to have hard-to-detect bugs. Similarly, a badge that measures the quality of the code provides an incentive to maintain a high code quality, so as not to display on the project’s homepage that the code base is of poor quality.

Trockman et al. [1] observed a relation between the presence of some badges and specific aspects of the software development process. They found that the presence of dependency management badges correlates with fresher dependencies and the presence of code coverage badges correlates with larger test suites in a project and more tests in pull requests. Contrary to their expectations, they found that the presence of badges dedicated to offering user support was related to a higher issue resolution time.

As contributors in open source projects are volunteers, they typically have little time to devote to the




projects they contribute to, so they must select such projects with care as they can ill afford to contribute to projects of low quality, which are harder to contribute to and more likely to fail [2–4]. For their part, project integrators need to maintain an influx of contributions in order to keep their projects evolving and growing, as it has long been known that software performing real-world activities must continually grow and continually change to adapt to the environment it evolves in [5]. These laws also apply to open source software [6, 7].

It is known that the number of stars, the time taken to merge pull requests and the number of programming languages are the factors most likely to attract contributors [8]. However, little is known about the impact of badges on contributor attraction.

Our overall research goal is to investigate the relation between the presence of badges and the influx of new contributors and new contributions in a project, as well as the health of projects. It is known, for instance, that the use of continuous integration tools helps catch bugs more efficiently and integrate pull requests faster [9, 10] but not whether badges advertising the use of these tools have any impact on quality. This paper focuses on preliminary but mandatory steps towards this goal by addressing the following research questions. RQ0: How prevalent are badges? This question will help us determine whether the research goal is worthwhile to pursue: if badges are sparsely used, results about the impact of their adoption on contributions may not be statistically significant. Since badges can convey a wide variety of information whose impact on potential contributions may vary, we examine RQ1: What are the most frequent badge categories? As many badges can be used simultaneously in a project, we analyse RQ2: How frequent are combinations of badge categories? The answer to this question will determine whether the effect of badges of one category can be dissociated from those of another category: if two badge categories are always found together and introduced simultaneously, differentiating their impact will be impossible. Finally, we enquire into RQ3: When are badges introduced? This elucidates whether badges are introduced late enough in a project to compare project characteristics prior to and after the adoption of the badge.

2    Methodology

To conduct this study, we need a large dataset of candidate repositories hosted on online collaborative development platforms. As we are interested in studying the effect of badges on contributions, the dataset should exclude git repositories created merely for experimental or personal reasons, or that only show sporadic traces of commit activity [11]. Ideally, it should include a wide range of projects serving different purposes and exhibiting a wide variation in longevity and size.

Online package registries of open source libraries for popular software programming languages constitute good candidates, since they contain a lot of projects, many of them being publicly available on GitHub. We arbitrarily selected two such package registries because we knew from previous work that many of their packages have an associated git repository publicly available on GitHub. These registries are Cargo for Rust libraries, and Packagist for PHP libraries.

We collected a list of 15,625 packages on Cargo and 216,613 packages on Packagist using their respective official API. We downloaded package metadata for those packages and extracted the link to their associated git repositories. We filtered out packages (1) without an associated git repository; (2) whose git repository is no longer available; and (3) corresponding to “spam” packages (footnote 1). Since GitHub hosts nearly all the git repositories of remaining packages (> 94%), and since it is far easier to deal with only a single collaborative development platform, we excluded repositories that were not available on GitHub. The final dataset contains 14,592 package repositories for Cargo and 203,029 package repositories for Packagist.

Footnote 1: We manually identified more than 200 packages on Packagist that are not related to software projects, e.g., iphonex-giveaway, captain-marvel-pelicula-completa-uncut, etc. These spam packages are usually quickly removed from Packagist by the maintainers of the registry.

We cloned all these repositories in July 2019 to extract badge-related historical information. To identify badges, we focused on images contained in the projects’ README files. We extracted all images from these README files, taking into account the various supported markup languages (HTML, Markdown and ReST). We manually identified those corresponding to badges following the iterative approach proposed by Trockman et al. [1]. Then, we used git log to analyse the history of these README files to pinpoint the introduction date of each badge.

3    Results

RQ0: How prevalent are badges?

With this first question, we aim to determine the extent of badge usage in project repositories. We identified for each of the two considered datasets the badges used by their projects. We found 21,884 instances of 109 distinct badges in Cargo projects, and 239,529 instances of 366 distinct badges in Packagist. While there are more badges than projects in both datasets, that does not necessarily imply that all projects use badges. Figure 1 shows the evolution of the number




and proportion of projects using at least one badge in Cargo and Packagist. For Cargo, we do not report before 2014 as this only concerns 90 projects out of which only 19 had a badge.

[Figure 1 (plot): x-axis: 2011 to 2019; left axis: proportion of projects with badges (0.0 to 1.0); right axis: number of projects (0 to 60000); series: cargo, packagist]
Figure 1: Proportion and number of projects using badges

While there are far more projects with badges in Packagist than in Cargo, we observe a markedly higher badge penetration within Cargo (peaking at 57%) than within Packagist (the highest observed proportion is 36%). For both datasets, the adoption rate eventually reaches a plateau: even though the number of projects using badges keeps increasing more than linearly, it does not outpace the rate of creation of new projects. In both datasets, the most frequent badge is the one reporting the build status of the Travis continuous integration tool (30% and 20% of all badges used, respectively).

RQ1: What are the most frequent badge categories?

RQ0 revealed that badges are widely used. However, not all projects use the same badges, and projects may use badges for a variety of different purposes. Furthermore, many badges fulfil a similar role (e.g., several badges can be used to relay the build status of a project, based on different providers such as Travis and Appveyor).

Therefore, we grouped these badges into 7 categories, following the approach of Trockman et al. [1]: (1) build status (Build) badges signal whether the latest build of a project succeeded or not, e.g., passed all tests; (2) dependency management (DepMgr) badges inform about dependency freshness, e.g., whether dependencies are up-to-date or not; (3) popularity (Pop) badges provide characteristics related to the popularity of a project, e.g., number of downloads; (4) quality assurance (QA) badges report on aspects related to code quality, e.g., based on the output of some linters; (5) support badges provide links to chats and user forums; (6) information (Info) badges communicate various types of information independent of any tool, e.g., the project’s license, version and authors or a link to the project’s documentation or website; and (7) other badges correspond to any badge that does not fit within the previous categories, e.g., donation links.

Table 1: Number (#) and proportion (%) of badge occurrences per badge category for Cargo and Packagist.

category   first occ.    Cargo #   Cargo %   Packagist #   Packagist %
Info       2014-01-18     11,000       50%        69,016           29%
Build      2011-11-12      8,130       37%        53,478           22%
QA         2012-08-31        964        4%        56,167           23%
Pop        2013-06-01        606        3%        33,581           14%
DepMgr     2013-03-21        534        2%        18,936            8%
Support    2014-01-30        464        2%         3,631            2%
Other      2011-05-24        181        0%         4,556            2%

Table 1 reports, for each of these categories, the date of first identification, the number of occurrences, and the proportion of badges belonging to each category relative to all badge occurrences in the dataset. While Cargo projects tend to use more badges than Packagist (on average 1.50 badges per project for Cargo, 1.18 for Packagist), there is less diversity in the badges they use. The starkest contrast is in the usage of QA badges, which constitute 23% of the badges found in Packagist projects, but only 4% of the badges in Cargo projects. This is partially explained by the popularity of the Scrutinizer tool in Packagist (18,196 badges are associated with it), which inspects the quality of PHP, Python and Ruby code, but not Rust code. Even other maintainability analysis tools that do support Rust, such as Codeclimate, remain rarely used in Cargo (4 badges found) while they are frequent for Packagist (4,289 badges). The rest of this paper will focus on the categories that account for at least 10% of the badges in at least one of the datasets.

RQ2: How frequent are combinations of badge categories?

Since a project can make use of several badges at once, this research question aims to quantify co-occurring badges and to identify which combinations of badges are most frequent. Co-occurring badges are clearly not an exception in either dataset: we found that 76% of projects with at least one badge in Cargo have two or more badges at once (77% in Packagist). On average, a project with badges makes use of 2.68 distinct badges in Cargo, and of 3.59 distinct badges in Packagist. If we group badges by category, we have on average 1.94 badge categories in Cargo and 2.89 in Packagist. Badge categories are counted as co-occurring in a project whenever at least one badge of each category is present. Figure 2 shows a Venn diagram of co-occurring badge categories in both datasets.

In Cargo, the most frequent combination by far is the one containing Build and Info badges,




[Figure 2 (Venn diagrams): one panel for Cargo, one panel for Packagist]
Figure 2: Combinations of badge categories used in Cargo and Packagist projects.

both of them occurring more frequently together (62.4%=48.3+4.4+8.6+1.1) than apart (23.9%=21.9+1.7+0.3 and 13.6%=12.3+1.2+0.1, respectively). We also observe that the lesser-used badges are rarely found alone; they tend to be paired up with a Build or an Info badge. In Packagist, too, Build and Info badges are more frequently found together (42%=19.4+9.8+8.2+4.6) than apart (37.4%=19.7+14.4+1+2.3 and 17.4%=10+2+0.7+4.7, respectively). We also observe that nearly one out of five projects with badges in Packagist (19.4%) uses the four considered badge categories at once. In Packagist, badges are less frequently found in isolation. For instance, the proportion of isolated badge categories in Packagist is 27.5% (=19.7+1.3+1.8+4.7) while this proportion reaches 34.2% in Cargo (=21.9+12.3). In both datasets, we found that Build is the most frequent isolated badge category by far (21.9% in Cargo, 19.7% in Packagist).

Table 2: Proportion of co-occurring badge categories that were adopted simultaneously in Cargo and Packagist projects. Each cell gives the Cargo value followed by the Packagist value.

          QA           Pop          Info
Build     54% / 38%    64% / 71%    65% / 71%
QA                     51% / 49%    50% / 43%
Pop                                 80% / 92%

Since we found a non-trivial number of co-occurring badge categories, in a second step, we examine how frequently badges belonging to different categories were added on the same calendar day in a project. Whenever a category is represented by several badges within the same project, the introduction date of the oldest badge is considered. Table 2 shows the proportion of co-occurring badge categories that were introduced simultaneously. We observe for Cargo that most co-occurring badge categories are adopted simultaneously. For Packagist, it mainly depends on the considered combination. For instance, a large majority of the combinations involving Build+Pop, Build+Info or Pop+Info corresponds to badges introduced on the same day. On the other hand, we observe that 62% (=100-38) of the combinations with Build+QA are not adopted simultaneously but in subsequent events.

RQ3: When are badges introduced?

With RQ2 we found that most co-occurring badge categories correspond to badge instances introduced on the same day in a project. We now focus on when those badges were introduced in a project. For each badge category, we computed the proportion of projects with at least one badge making use of a badge of this category. Figure 3 shows the evolution of these proportions for both datasets. For Cargo, we observe a fast and large-scale adoption of Build badges: the proportion of projects using such badges went from 2% (September 2013) to 77% (August 2014). A similar observation can be made for Info badges to a lesser extent, going from 2% (January 2015) to 59% two years later. The situation is different for Packagist, where many projects already existed before badges were available. Indeed, around 5% of projects using badges in Packagist were created before the availability of such badges. This proportion is only 2% for Cargo, which is not surprising given that Rust appeared in 2010. The adoption of badges in Packagist is therefore more gradual, going from practically 0% (June 2013) to 35-40% in two years, with the notable exception of Build badges, whose adoption occurred much faster. By August 2012, these badges had been adopted by 60% of the projects, a probable consequence of the introduction of Travis-CI in March 2011.

[Figure 3 (plots): one panel for cargo, one panel for packagist; x-axis: 2012 to 2019; y-axis: proportion of projects (0.0 to 1.0); series: Build, Pop, QA, Info]
Figure 3: Evolution of the proportion of projects, grouped by badge category.

For each project and badge category, we measured the elapsed time before the first introduction of a badge of a given category in a given project. Since some projects predate the availability of (services advertised by) badges, we measure this time with respect




to the date of the first opportunity to introduce those badges. So, if the project was created prior to the first occurrence of a category in our dataset, then the date of this first occurrence is used as a baseline. Otherwise, we relied on the creation date of the project as a baseline.

Table 3: Elapsed time before the first instance of a badge category in a project (Cargo and Packagist). Each cell gives the Cargo value followed by the Packagist value.

                   in days                  proportionally
category           median     mean          median      mean
Info               6 / 3      119 / 155     7% / 3%     26% / 31%
Build              3 / 2      89 / 156      3% / 2%     21% / 30%
QA                 10 / 5     157 / 163     11% / 5%    29% / 61%
Pop                4 / 3      107 / 170     4% / 3%     25% / 24%
All categories     6 / 4      112 / 166     5% / 4%     25% / 39%

Table 3 reports on the median and mean of these durations, aggregated by badge category. The left part of the table expresses these durations in days since the date of first opportunity, while the right part expresses them proportionally to the opportunity window (i.e., time between the date of the first opportunity and the last known commit of a project). The huge difference between median and mean values suggests skewed distributions: while a majority of badges are quickly introduced in projects, there are some outliers taking a while to introduce badges. This is especially visible in Packagist: its median values are lower than the ones for Cargo but its mean values are much higher. We also observe that quality assurance badges were typically added much later in Cargo projects (the median is 10 days vs. 5 for Packagist). In both datasets, Build badges are introduced earlier than other badges.

4    Conclusion

We carried out an empirical analysis of the usage of badges in GitHub repositories, with the ultimate goal of determining their impact on contributions to open source projects. As a preliminary step, we sought to determine whether badges were widely used in projects for two popular programming language library registries: Packagist and Cargo.

We found that they are used in more than a third of Packagist projects and more than half of Cargo projects, and that more and more projects tend to use them. Still, badge adoption rates lag behind the rate of appearance of new projects. Then, we categorised badges into seven categories, according to the type of information being relayed by each badge, and measured

usually found together and that the other categories of badges were rarely found without a corresponding Build or Info badge. We also found that co-occurring badges were frequently adopted simultaneously. We next examined the temporal aspect of badge adoption and found that the adoption rates of badge categories were either increasing or stable. We also showed that badges were usually added early on, within the first 5% of a project’s lifetime, but a significant number of projects still adopt badges much later. The results we obtained are in line with those of Trockman et al. [1] for npm.

As future work, we intend to investigate the impact of badge adoption on contributions. In doing so, an aspect to take into account will be the comparative effort required to maintain some badges over others. We will also quantify the phenomenon of badge removal and determine why it occurs and what its impact on contributions is.

References

[1] Asher Trockman, Shurui Zhou, Christian Kästner, and Bogdan Vasilescu. Adding sparkle to social coding: An empirical study of repository badges in the npm ecosystem. In Proceedings of the 40th International Conference on Software Engineering, ICSE ’18, pages 511–522, New York, NY, USA, 2018. ACM.

[2] D. Riehle, P. Riemer, C. Kolassa, and M. Schmidt. Paid vs. volunteer work in open source. In 2014 47th Hawaii International Conference on System Sciences, pages 3286–3295, Jan 2014.

[3] Israr Qureshi and Yulin Fang. Socialization in open source software projects: A growth mixture modeling approach. Organizational Research Methods, 14(1):208–238, 2011.

[4] Jailton Coelho and Marco Tulio Valente. Why modern open source projects fail. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, pages 186–196. ACM, 2017.

[5] M. M. Lehman, J. F. Ramil, P. D. Wernick, D. E. Perry, and W. M. Turski. Metrics and laws of software evolution - the nineties view. In Proceedings Fourth International Software Metrics Symposium, pages 20–32, Nov 1997.
the relative prevalence of each category. We observed
that Packagist projects use a more diverse set of badges            [6] Taranjeet Kaur, Nisha Ratti, and Parminder
than Cargo, the latter mostly sticking to Build and                     Kaur. Applicability of Lehman laws on open
Info badges. We examined the frequency at which the                     source evolution: a case study. International
most common categories co-occurred within the same                      Journal of Computer Applications, 93(18):0975–
projects, finding that Build badges and Info badges are                 8887, 2014.




[7] Michael W. Godfrey and Qiang Tu. Evolution in open source
    software: a case study. In Proceedings 2000 International
    Conference on Software Maintenance, pages 131–142, Oct 2000.

[8] Felipe Fronchetti, Igor Wiese, Gustavo Pinto, and Igor
    Steinmacher. What attracts newcomers to onboard on OSS projects?
    tl;dr: Popularity. In Francis Bordeleau, Alberto Sillitti, Paulo
    Meirelles, and Valentina Lenarduzzi, editors, Open Source Systems,
    pages 91–103. Springer, 2019.

[9] Michael Hilton, Timothy Tunnell, Kai Huang, Darko Marinov, and
    Danny Dig. Usage, costs, and benefits of continuous integration in
    open-source projects. In Proceedings of the 31st IEEE/ACM
    International Conference on Automated Software Engineering, pages
    426–437. ACM, 2016.

[10] Bogdan Vasilescu, Yue Yu, Huaimin Wang, Premkumar Devanbu, and
     Vladimir Filkov. Quality and productivity outcomes relating to
     continuous integration in GitHub. In Proceedings of the 2015 10th
     Joint Meeting on Foundations of Software Engineering, pages
     805–816. ACM, 2015.

[11] Eirini Kalliamvakou, Georgios Gousios, Kelly Blincoe, Leif
     Singer, Daniel M. German, and Daniela Damian. The promises and
     perils of mining GitHub. In Working Conference on Mining Software
     Repositories (MSR), pages 92–101, New York, NY, USA, 2014. ACM.



