<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards automated open source assessment - An empirical study</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sai Pranav Koyyada</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Denim Deshmukh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Deepika Badampudi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vida Ahmadi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Muhammad Usman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Blekinge Institute of Technology</institution>
          ,
          <country country="SE">Sweden</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>City Network International AB</institution>
          ,
          <country country="SE">Sweden</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Open source software (OSS) assessment has become important given the increased adoption of OSS in commercial product development. Researchers have proposed many OSS assessment models; however, little is known about the industrial relevance of these models. In this study, we propose an automated tool based on the OSS assessment attributes identified together with a European cloud provider company. We analyzed 51 repositories to observe patterns in maintenance activities over their lifetime (from inception to the latest release). Based on the analysis, we propose a novel approach for evaluating the maturity of an OSS project. Finally, we assessed the usefulness of our automated solution in a pilot study.</p>
      </abstract>
      <kwd-group>
        <kwd>OSS assessment automation</kwd>
        <kwd>commit classification</kwd>
        <kwd>software maturity</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <p>
          Software companies increasingly adopt Open Source
Software (OSS), which has become part of the
mainstream practice in software engineering [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. However,
the selection of OSS is still challenging [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Practitioners at our case company, a European cloud
provider, reported similar challenges. They described
gathering information from multiple sources and tools
as a complex and time-consuming activity. The case
company identified the need to automate the OSS
assessment to reduce the selection effort.
        </p>
        <p>
          The automation of the OSS assessment requires the
identification of the attributes practitioners consider
for OSS selection. In the last two decades, many OSS
quality models have been proposed to assist the OSS
selection. Lenarduzzi et al. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] identified discrepancies
in the information provided in the evaluation models
and the practitioners’ information needs. In addition,
little is known about how relevant these models are in
practice as they have not been validated extensively [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
        <p>
          The OSS quality assessment models suggest many
evaluation attributes. Maintenance is one of the most
considered quality attributes in the OSS assessment
models proposed in the previous studies [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Li et al.
[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] conducted a survey to understand the attributes
practitioners consider important in OSS selection.
        </p>
        <p>
          Practitioners mentioned maintenance as an important
attribute; however, they did not mention metrics for
assessing maintenance [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], and metrics are needed to
automate quality assessment. Practitioners did
mention metrics for assessing software maturity, such as
the number of forks, number of releases, and number of
commits [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Li et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] suggested that, while
some practitioners consider the number of commits
as a metric to evaluate software maturity, evaluating
the prevalence of commits over time and the types of
commits may be more useful.
        </p>
        <p>
          Levin and Yehudai [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] proposed a model to classify
commits based on the maintenance activities.
However, their motivation for classification was to improve
planning and resource allocation for maintenance. As
indicated by Li et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], the prevalence of commits over
time and the types of commits could be a good measure
of maturity. However, they did not find any portals
that effectively provide community-related factors to
automate OSS project assessment [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Therefore, it is
interesting to investigate how commit classification
based on maintenance activities can help automate OSS
assessment.
        </p>
        <p>[Figure 1: Iterations in the solution development and validation. Focus group and problem inputs: attributes identification; identifying dependencies and modules for developing the tool for automation; improvement suggestions from the case company. Design: list of attributes to automate; Automated tool 1.0; Automated tool 2.0. Validation: internal validation of the completeness of the list of attributes; internal validation of Automated tool 1.0; internal and external validation, involving the case company and external stakeholders.]</p>
        <p>[...] maintenance activities carried out in OSS projects. We analyzed 51 OSS projects, including frameworks, APIs, libraries, databases, and applications, to collect attributes that can help facilitate the OSS assessment. Finally, we conducted a pilot qualitative study to understand practitioners’ opinions on the usefulness of the commit classification based on maintenance activities and commonly considered selection attributes in the OSS assessment.</p>
      </sec>
      <sec id="sec-1-2">
        <title>2. Research methodology</title>
        <p>Our goal is to answer the following research questions:</p>
        <p>RQ1: What OSS attributes are considered important to automate in the case company?</p>
        <p>RQ2: How can commit classification be used to facilitate OSS assessment?</p>
        <p>RQ3: How do practitioners perceive the usefulness of our automated solution?</p>
        <p>We used the design science method [9] to answer the above research questions. Figure 1 depicts the iterations in the solution development and validation. The steps carried out in each iteration are described as follows.</p>
        <p>Iteration 1: In this iteration, our goal was to identify the attributes that the case company considers important for automating OSS assessments. We used focus groups where key stakeholders from the case company and the authors discussed the different attributes. The input to the focus group was the case company’s checklist for OSS assessment and the attributes frequently reported in the literature. The goal of the focus group was to identify attributes that can be automated to improve the efficiency and effectiveness of the OSS assessment. The outcome of the focus group was the list of attributes to automate. The first two authors were employed at the case company, providing them easy access to the developers involved in the OSS assessment. For internal validation, the first two authors discussed the completeness of the attributes with three developers. Overall, the developers agreed with the list of attributes.</p>
        <p>Iteration 2: We identified the different sources from which to retrieve the required attributes. We used various API endpoints, such as the GitHub REST API and the Stack Exchange REST API, Python libraries such as pydriller, the OWASP Dependency Checker, and other OSS solutions to gather the relevant information. We built an automated solution that can be triggered with a single command and returns all results from the assessment in JSON format, which we refer to as Automated tool 1.0 in Figure 1. We discussed the tool in a focus group with the security expert and three developers. Our solution used the generated JSON files to present the assessment results. The focus group participants from the case company requested a feature that pooled all the results on one page.</p>
        <p>Iteration 3: The problem identification for this iteration was the input from the validation in Iteration 2. Therefore, we created a module that shows the results from the various JSON files generated by our automated solution on one page, which we refer to as Automated tool 2.0 in Figure 1. The source code, the modules used in the automated solution, and the documentation of the tool usage are provided online (footnote 1). We verified the functioning of our automated tool by assessing 51 different OSS projects. Our automated solution generated the expected results. We presented the automated solution through a technical demo at the case company. During the technical demo, we assessed two OSS projects: one framework and one application. The results took under a minute to generate.</p>
        <p>Finally, we interviewed 10 developers from three different companies (including the case company) to validate the completeness of the assessment attributes and the usefulness of the automated tool.</p>
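        <p>As an illustration of the pooling step requested by the focus group (all assessment results on one page), the following minimal sketch merges several JSON result files into one combined report. The file layout (one JSON file per assessment module, with the file name used as the section name) and the function name pool_results are assumptions made for this example; they are not taken from the tool’s source code.</p>
        <preformat>
```python
import json
from pathlib import Path


def pool_results(result_dir: str) -> dict:
    """Merge every *.json file in result_dir into one dictionary.

    Hypothetical layout: each file holds the JSON output of one
    assessment module, and the file stem is used as the section name.
    """
    pooled = {}
    for path in sorted(Path(result_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            pooled[path.stem] = json.load(handle)
    return pooled


if __name__ == "__main__":
    import tempfile

    # Build two made-up result files to demonstrate the merge.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "repository.json").write_text(json.dumps({"stars": 1200}))
        Path(tmp, "security.json").write_text(json.dumps({"vulnerabilities": 0}))
        print(json.dumps(pool_results(tmp), indent=2))
```
        </preformat>
        <p>A single pooled dictionary of this kind could then be rendered by any one-page view; the values in the demonstration are invented.</p>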
      </sec>
      <sec id="sec-1-3">
        <title>Footnote 1: https://github.com/SaipranavK/oss-recon</title>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Results</title>
      <sec id="sec-2-1">
        <p>This section presents our results from the focus group study, the analysis of commit classification, and a preliminary validation based on practitioners’ perceptions of our automated solution.</p>
        <sec id="sec-2-1-1">
          <title>3.1. OSS assessment attributes</title>
          <p>Identifying the attributes required in the OSS assessment is important for automating the process. We discussed the attributes, the required metrics, and the information needed to automate the OSS assessment in the focus group. The input to the focus group was the collection of attributes and metrics frequently considered in the literature and by the case company. We selected the attributes and metrics based on their importance and the ease of understanding perceived by the case company. The case company preferred descriptive representations over numerical, metrics-based representations. Therefore, the case company did not want information on traditional metrics such as code complexity, coupling, and cohesion. This section presents the attributes for assessing the OSS projects. Table 1 contains the attributes considered important for assessing OSS projects by the case company. We also present the metrics and information gathered to assess the attributes (see the second column in Table 1).</p>
          <p>Repository information: The company starts assessing the OSS project by reviewing general repository information (see details on repository information in Table 1).</p>
          <p>Repository activeness: In addition to the generic information, the company reviews the repository’s activeness by reviewing the average time it takes to release a new version, the number of open issues, and the list of active or recent releases of the OSS project. In addition to the company’s attributes, we added age, last updated date, commit activities (the number of deletions and additions), and commit classification (the number of corrective, adaptive, and perfective activities).</p>
          <p>Security: The security expert at the case company identified security as an important criterion for adopting OSS projects.</p>
          <p>Community interest: In addition, the company reviews the support availability by considering the number of questions and answers posted with tags associated with the OSS project on StackOverflow.</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>3.2. Commit classification based on maintenance activities to evaluate OSS projects</title>
          <p>We propose using commit classification, among other metrics, to evaluate OSS projects. Each new release of an OSS has additions or deletions compared to the previous version. These additions and deletions are changes that introduce new features, fix bugs, or extend the support of the OSS. The changes implemented in a release are called maintenance activities. The Software Engineering Body of Knowledge (SWEBOK) and IEEE 14764 categorize software maintenance activities as corrective, adaptive, perfective, and preventative. Preventative and corrective activities are corrections, given their purpose to fix latent and operational faults. Perfective and adaptive activities are modifications to improve the software.</p>
          <p>
            We visualized the commit activeness based on the maintenance activities from the inception to the latest release of the OSS project. We used the approach proposed by Levin et al. [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] to classify the maintenance activities. Figure 2 shows the commit classification of an example OSS across its different releases. Each release has a distribution of three activities: corrective in red, perfective in yellow, and adaptive in blue. Preventative activities are proactive activities, and they are comparatively difficult to capture for classification. Therefore, our study did not include classification based on preventative activities.
          </p>
        </sec>
        <sec id="sec-2-1-3">
          <title>3.2.1. Commit maturity: A novel perspective on commits classification</title>
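          <p>Commit maturity builds on per-release counts of classified commits. The sketch below shows one way such counts could be tabulated, assuming every commit has already been labeled as corrective, perfective, or adaptive by some classifier. The input format (pairs of release tag and label) and the function name activity_counts are hypothetical and chosen only for illustration.</p>
          <preformat>
```python
from collections import Counter

ACTIVITIES = ("corrective", "perfective", "adaptive")


def activity_counts(labeled_commits):
    """Count commits per maintenance activity for each release.

    labeled_commits: iterable of (release, activity) pairs in release
    order (hypothetical input format). Returns a dict that maps each
    release to a dict with one count per activity.
    """
    per_release = {}
    for release, activity in labeled_commits:
        if activity not in ACTIVITIES:
            raise ValueError(f"unknown activity: {activity}")
        per_release.setdefault(release, Counter())[activity] += 1
    # Fill in zeros so that every release reports all three activities.
    return {
        release: {name: counts[name] for name in ACTIVITIES}
        for release, counts in per_release.items()
    }


if __name__ == "__main__":
    # Made-up commits for two releases.
    commits = [
        ("0.1", "corrective"),
        ("0.1", "perfective"),
        ("0.1", "corrective"),
        ("0.2", "adaptive"),
        ("0.2", "perfective"),
    ]
    for release, counts in activity_counts(commits).items():
        print(release, counts)
```
          </preformat>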
        </sec>
      </sec>
      <sec id="sec-2-2">
        <p>We analyzed 51 OSS repositories, including frameworks,
APIs, libraries, databases, and applications. The
visualizations of the commit classification for the 51 open source
repositories used in the analysis are provided online (footnote 2). We
observed the maintenance activities of the repositories
from their inception to the latest release. The 51 analyzed
OSS projects included very popular repositories with more than
10000 stars, popular repositories with over 5000 stars,
and some growing OSS projects with over 500 stars on GitHub.</p>
        <p>We observed that adaptive activity is the least performed
activity in each release for all the OSS repositories.
Corrective and perfective activities have the most variance,
i.e., the number of corrective activities may be higher
than perfective in some commits and lower in others.</p>
        <p>When a certain maintenance activity’s frequency goes down, and another maintenance activity’s frequency goes up, we call it a crossover. We counted the number of crossovers for each combination of the maintenance activities and mapped them to the type of the OSS. We noticed that all OSS except the ones of type frameworks had a similar number of crossovers for each combination of maintenance activities.</p>
        <p>The crossover between Adaptive and Corrective activities is computed as:</p>
        <disp-formula>
          <label>(1)</label>
          <tex-math>(A_{i-1} &gt; C_{i-1}) \cdot (A_i &lt; C_i)</tex-math>
        </disp-formula>
        <p>where (A_i) is the list of Adaptive activities and (C_i) is the list of Corrective activities for each version of the software. Similarly, we can calculate three types of intersections. The number of intersections between each pair of activities, i.e., Adaptive, Corrective, and Perfective, defines commit maturity.</p>
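        <p>Equation (1) can be read as an indicator that equals 1 when one activity count is above another in the previous release and below it in the current release. The following sketch counts such crossovers for each pair of activities. Summing the indicator over all consecutive releases and counting a crossover in either direction for a pair are assumptions made for this illustration, as are the function names; the per-release counts in the example are invented.</p>
        <preformat>
```python
from itertools import combinations


def directed_crossovers(first, second):
    """Count releases where `first` drops from above `second` to below it.

    Implements the indicator (F[i-1] > S[i-1]) * (S[i] > F[i]),
    summed over consecutive releases (the summation is an assumption).
    """
    return sum(
        1
        for i in range(1, len(first))
        if first[i - 1] > second[i - 1] and second[i] > first[i]
    )


def commit_maturity(counts):
    """Return crossover counts for every pair of activities.

    counts: dict mapping an activity name to its per-release counts
    (hypothetical input format; all lists must have equal length).
    A crossover in either direction is counted for the pair.
    """
    result = {}
    for a, b in combinations(sorted(counts), 2):
        if len(counts[a]) != len(counts[b]):
            raise ValueError("per-release lists must have equal length")
        result[(a, b)] = (
            directed_crossovers(counts[a], counts[b])
            + directed_crossovers(counts[b], counts[a])
        )
    return result


if __name__ == "__main__":
    # Invented per-release counts for five releases.
    example = {
        "adaptive": [1, 2, 1, 3, 2],
        "corrective": [5, 3, 6, 2, 7],
        "perfective": [4, 6, 2, 5, 3],
    }
    for pair, n in commit_maturity(example).items():
        print(pair, n)
```
        </preformat>
        <p>In this invented example, the corrective and perfective counts cross at every transition between releases, while the adaptive counts never cross the perfective counts. Per-pair totals of this kind are what would be compared with the number of releases.</p>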
        <p>Lehman’s law [10] suggests that a system should continuously change to remain useful. Change can be measured by the counts of corrective, adaptive, and perfective requests over a certain period [11]. Any software project should have good features with minimum bugs and extensive support. Whenever a crossover happens, it indicates that the focus of the community shifts from one maintenance activity to another to make sustainable progress. If the community only focuses on adding new features, then there may be many bugs, making the OSS unusable. The same applies when the community is only resolving bugs in the OSS. It means that the OSS has many bugs to be addressed and does not introduce new features to improve its value. The balance between the maintenance activities should be maintained. Frameworks are unique in this regard because of the scale of features and platforms they support. As we saw from our analysis, it may not be possible to maintain the balance between the maintenance activities for frameworks. We hypothesize that the total number of crossovers of each combination of maintenance activities should ideally be similar to the total number of releases.</p>
        <p>Commit maturity is the number of times each maintenance activity crosses other maintenance activities over the project life cycle. It allows practitioners to see if the OSS project maintains the balance between the maintenance activities. Figure 3 shows the commit maturity and the number of crossovers for each crossover pair. Each dot represents an occurrence of a crossover in a release. In this example, in release 1.0, the corrective activities decreased, and the perfective activities increased, resulting in a crossover. The flask project has a total crossover count of 21 out of 23 releases. Based on our hypothesis, it is a good maturity indication, and the popularity of the flask project is a testimony to it.</p>
        <p>The case company requested all the information needed to evaluate the OSS repository on one page. Figure 4 provides a screenshot of our solution. The left panel includes information on the repository, while the middle panel provides information on the metrics to assess security, support, and legal requirements. The middle panel also includes metrics to evaluate repository activeness: age, last updated date, the average time to release, and the number of issues. The commit activity, commit classification, and commit maturity are represented in graphical format. Finally, the community’s interest (stars, forks, and watchers) is presented in the top right corner.</p>
        <sec id="sec-2-2-1">
          <title>3.3. Practitioners’ perception of the automated OSS assessment and commit activeness</title>
          <p>Completeness of the evaluation attributes: We aimed to evaluate if the practitioners found our automated solution useful for automatically assessing OSS. All the participants agreed that the tool could help in the OSS assessments. Some of the participants wanted to see more attributes. For example, one of the interviewees mentioned "I want to gather more information like its compatibility with different operating systems and the tutorials sources.". Since we designed our solution primarily for use in the case company, it is not surprising that interviewees wished for additional attributes. One solution could be to create a configurable solution where the stakeholders can select the important attributes in the assessment.</p>
          <p>Ease of understanding the information on OSS assessment attributes: We explained the attributes to the interviewees, particularly attributes such as commit maturity. We asked the interviewees if the attributes we used were easy to understand. All the participants agreed that our attributes were easy to understand. The interviewees added that "The attributes were easy to understand. It was simply like a GitHub page but with more information.". In addition, the interviewees were positive about the new attributes such as commit maturity: "The commit evolution and maturity was something new but were still very easy to understand along with other attributes.".</p>
          <p>Commit classification and maturity: We asked the interviewees if commit classification and commit maturity are good visualizations to support the OSS adoption decision. All the interviewees agreed that the visualization was useful once we explained the attributes to them. One of the interviewees mentioned "Managers would love such a visualization because it not only is simple to understand but also will help a non-technical person easily comment on the OSS community.".</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Conclusion and future work</title>
      <sec id="sec-3-1">
        <p>We reported initial findings from an empirical study to support the practitioners in the OSS assessment process. With the help of the practitioners in the case company, we first identified the attributes that could be automated in performing the OSS assessment. Our tool automatically collects and presents the data about the identified attributes in one place to facilitate the practitioners in performing the OSS assessment. We also investigated how commit classification based on different maintenance activities can be used in OSS assessment. We introduced the use of commit maturity to see if an OSS project is balanced in feature enhancements and bug fixes or overly focused on only one type of maintenance activity (e.g., only fixing bugs in case of corrective maintenance). We used our tool to analyze 51 OSS projects. We also shared the results with the company practitioners, who found our tool helpful in performing the OSS assessments. In the future, we plan to study commit maturity as a metric to assess OSS maintainability through extensive validation and application and standardize it for wider adoption. Additionally, we will continue to enhance our automated OSS assessment tool by improving the range of supported attributes and desired metric outputs. We also wish to employ repository mining techniques to identify and correlate community activities with the OSS engagement and growth trends to comment on its popularity and support. Another interesting direction of research would be correlating maintenance activities for an OSS with its traditional maintainability metrics, which can help evaluate the relationship between maintainability and maintenance activities, if any, and thus result in more branching paths of research in software metrics and the maintainability domain.</p>
        <p>Acknowledgment: The Knowledge Foundation supports this work through the OSIR project (reference number 20190081) at Blekinge Institute of Technology, Sweden.</p>
        <p>References
        </p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><given-names>G.</given-names> <surname>Robles</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Steinmacher</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Adams</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Treude</surname></string-name>,
          <article-title>Twenty years of open source software: From skepticism to mainstream</article-title>,
          <source>IEEE Software</source>
          <volume>36</volume>
          (<year>2019</year>)
          <fpage>12</fpage>-<lpage>15</lpage>.
          doi:<pub-id pub-id-type="doi">10.1109/MS.2019.2933672</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><given-names>V.</given-names> <surname>Lenarduzzi</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Taibi</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Tosi</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Lavazza</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Morasca</surname></string-name>,
          <article-title>Open source software evaluation, selection, and adoption: a systematic literature review</article-title>,
          <source>in: 2020 46th Euromicro Conference on Software Engineering and Advanced Applications (SEAA)</source>,
          IEEE,
          <year>2020</year>, pp.
          <fpage>437</fpage>-<lpage>444</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><given-names>N.</given-names> <surname>Yılmaz</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Kolukısa Tarhan</surname></string-name>,
          <article-title>Quality evaluation models or frameworks for open source software: A systematic literature review</article-title>,
          <source>Journal of Software: Evolution and Process</source>
          (<year>2022</year>)
          <elocation-id>e2458</elocation-id>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>X.</given-names> <surname>Li</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Moreschini</surname></string-name>,
          <string-name><given-names>Z.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Taibi</surname></string-name>,
          <article-title>Exploring factors and metrics to select open source software components for integration: An empirical study</article-title>,
          <source>Journal of Systems and Software</source>
          <volume>188</volume>
          (<year>2022</year>)
          <fpage>111255</fpage>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><given-names>S.</given-names> <surname>Levin</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Yehudai</surname></string-name>,
          <article-title>Boosting automatic commit classification into maintenance activities by utilizing source code changes</article-title>,
          <source>in: Proceedings of the 13th International Conference on Predictive Models and Data Analytics in Software Engineering</source>,
          <year>2017</year>, pp.
          <fpage>97</fpage>-<lpage>106</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><given-names>A.</given-names> <surname>Hindle</surname></string-name>,
          <string-name><given-names>D. M.</given-names> <surname>German</surname></string-name>,
          <string-name><given-names>M. W.</given-names> <surname>Godfrey</surname></string-name>,
          <string-name><given-names>R. C.</given-names> <surname>Holt</surname></string-name>,
          <article-title>Automatic classification of large changes into maintenance categories</article-title>,
          <source>in: 2009 IEEE 17th International Conference on Program Comprehension</source>,
          IEEE,
          <year>2009</year>, pp.
          <fpage>30</fpage>-<lpage>39</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><given-names>S.</given-names> <surname>Gharbi</surname></string-name>,
          <string-name><given-names>M. W.</given-names> <surname>Mkaouer</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Jenhani</surname></string-name>,
          <string-name><given-names>M. B.</given-names> <surname>Messaoud</surname></string-name>,
          <article-title>On the classification of software change messages using multi-label active learning</article-title>,
          <source>in: Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing</source>,
          <year>2019</year>, pp.
          <fpage>1760</fpage>-<lpage>1767</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name><given-names>L.</given-names> <surname>Ghadhab</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Jenhani</surname></string-name>,
          <string-name><given-names>M. W.</given-names> <surname>Mkaouer</surname></string-name>,
          <string-name><given-names>M. B.</given-names> <surname>Messaoud</surname></string-name>,
          <article-title>Augmenting commit classification by using fine-grained source code changes and a pre-trained deep neural language model</article-title>,
          <source>Information and Software Technology</source>
          <volume>135</volume>
          (<year>2021</year>)
          <fpage>106566</fpage>.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name><given-names>R. J.</given-names> <surname>Wieringa</surname></string-name>,
          <source>Design science methodology for information systems and software engineering</source>,
          Springer,
          <year>2014</year>.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name><given-names>M. M.</given-names> <surname>Lehman</surname></string-name>,
          <article-title>Programs, life cycles, and laws of software evolution</article-title>,
          <source>Proceedings of the IEEE</source>
          <volume>68</volume>
          (<year>1980</year>)
          <fpage>1060</fpage>-<lpage>1076</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name><given-names>E. J.</given-names> <surname>Barry</surname></string-name>,
          <string-name><given-names>C. F.</given-names> <surname>Kemerer</surname></string-name>,
          <string-name><given-names>S. A.</given-names> <surname>Slaughter</surname></string-name>,
          <article-title>How software process automation affects software evolution: a longitudinal empirical analysis</article-title>,
          <source>Journal of Software Maintenance and Evolution: Research and Practice</source>
          <volume>19</volume>
          (<year>2007</year>)
          <fpage>1</fpage>-<lpage>31</lpage>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>