<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Dataset of Open-Source Safety-Critical Software</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rafaila Galanopoulou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Diomidis Spinellis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Management Science and Technology, Athens University of Economics and Business</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Patission 76</institution>
          ,
          <addr-line>Athens, 10434</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We describe the method used to create a dataset of open-source safety-critical software, such as that used for autonomous cars, healthcare, and autonomous aviation, through a systematic and rigorous selection process. The dataset can be used for empirical studies regarding the quality assessment of safety-critical software, its dependencies, and its development process, as well as comparative studies considering software from other domains.</p>
      </abstract>
      <kwd-group>
        <kwd>open-source</kwd>
        <kwd>safety-critical</kwd>
        <kwd>dataset</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>https://www.dmst.aueb.gr/dds/ (D. Spinellis)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
    </sec>
    <sec id="sec-2">
      <title>2. Software Selection Method</title>
      <p>Our search is based on the methodology guidelines proposed by Kitchenham and Charters [6].
The software selection method consists of three steps:
• candidate system identification based on Google queries, GitHub tag queries, and the results
of other studies,
• project repository identification and exclusion based on selection criteria, and
• filtering based on software purpose and characteristics.</p>
      <p>To retrieve relevant SCS open-source projects, we ran the following search queries by hand.
The provided search queries are notional; the disjunctions were performed by manually
combining the results of individual queries.</p>
      <p>OSS: (“open source system” OR “github project” OR “git repository” OR “open source project”
OR “open source hardware”). We used this query in combination with the ones detailed in the
following paragraphs, which target specific SCS domains, applications, and standards. By using
“github” and “git repository” as search keywords, the OSS query finds projects that are likely to
have a Git repository, which can then be queried through an API to further narrow down the
selected projects. We added the “open source hardware” term based on the assumption that an
open-source hardware system might use OSS as well.</p>
      <p>SCS Domains: OSS AND (“Infrastructure” OR “Medicine” OR “Nuclear engineering” OR
“Recreation” OR “Transport” OR “Railway” OR “Automotive” OR “Aviation” OR “Space flight”). This query,
which is associated with broad SCS domains, identified 15 projects in total (through a Google search and
a GitHub tag search). We complemented these results for the domain of health applications by
adding five projects derived from the data associated with a recent related study [7].</p>
      <p>SCS Automotive Applications: OSS AND (“Airbag systems” OR “Braking systems” OR “Seat
belts” OR “Power Steering systems” OR “Advanced driver-assistance systems” OR “Electronic throttle
control” OR “Battery management system for hybrids and electric vehicles” OR “Electric park brake”
OR “Shift by wire systems” OR “Drive by wire systems” OR “Park by wire”). We derived the terms
of this query from Pimentel’s book chapter [8].</p>
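      <p>As an illustration only, the following Python sketch shows one way a notional query of the form (a OR b) AND (c OR d) could be expanded into individual queries whose results are then merged. The function names <monospace>expand_queries</monospace> and <monospace>merge_results</monospace>, the abridged domain term list, and the exact query string format are assumptions made for this example; the searches described here were run and combined by hand.</p>
      <preformat><![CDATA[
```python
from itertools import product

# Terms copied from the OSS query above.
OSS_TERMS = [
    "open source system",
    "github project",
    "git repository",
    "open source project",
    "open source hardware",
]
# Abridged subset of the SCS Domains terms, for brevity.
DOMAIN_TERMS = ["Medicine", "Railway", "Automotive", "Aviation"]


def expand_queries(left_terms, right_terms):
    """Expand (a OR b) AND (c OR d) into one query string per term pair."""
    return [f'"{left}" AND "{right}"'
            for left, right in product(left_terms, right_terms)]


def merge_results(result_lists):
    """Union per-query result lists, dropping duplicates, keeping first-seen order."""
    merged = []
    seen = set()
    for results in result_lists:
        for project in results:
            if project not in seen:
                seen.add(project)
                merged.append(project)
    return merged


queries = expand_queries(OSS_TERMS, DOMAIN_TERMS)
print(len(queries))  # 5 OSS terms x 4 domain terms = 20 individual queries
print(queries[0])    # "open source system" AND "Medicine"
```
]]></preformat>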
      <p>SCS Medical Applications: OSS AND (“Heart-lung machines” OR “Ventilators” OR “Insulin
pumps” OR “Life critical monitors” OR “Infusion pumps” OR “Robotic surgery”). The terms of this
query were derived from the studies by Hamilton [9] and Alemzadeh et al. [10].</p>
      <p>SCS Infrastructure Applications: OSS AND (“Circuit breaker” OR “fire systems” OR
“electrical and hydraulic systems” OR “buildings infrastructure” OR “burner control systems”). The query
keywords were derived from studies of civil infrastructure and of systems used for
emergencies in buildings (e.g. fire) [11, 12].</p>
      <p>The three domain-specific queries identified 96 projects in total.</p>
      <p>SCS Standards: OSS AND (“DO-178C” OR “MISRA” OR “MISRA-C” OR “IEC 61580” OR “IEC
880” OR “ISO 9000”). The standard names were derived from presentations, associated
with the examined SCS domains, at the Conference on Digital Avionics Systems [13].
This query identified 4 projects in total.</p>
      <p>For practical reasons, in each step we excluded projects lacking a description or having a
non-English description (e.g. Japanese, Portuguese).</p>
      <p>The second step of our selection method was based on a project’s repository characteristics.
Here we excluded projects without a Git repository (GitHub or GitLab), inactive projects (lacking
a commit in the last eight months), and unpopular projects (having fewer than 70 GitHub stars).
Through these criteria we rejected 57 projects, leaving us with 63.</p>
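      <p>The repository criteria above can be sketched as a simple filter. This is a minimal sketch under stated assumptions, not the procedure used to build the dataset: the <monospace>Candidate</monospace> record, the <monospace>rejection_reason</monospace> function, and the calendar-month interpretation of “eight months” are hypothetical choices made for the example, and the repository metadata is assumed to have been collected beforehand.</p>
      <preformat><![CDATA[
```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional

MIN_STARS = 70                       # popularity threshold stated in the text
MAX_IDLE_MONTHS = 8                  # activity window stated in the text
ACCEPTED_HOSTS = {"github", "gitlab"}


@dataclass
class Candidate:
    """Hypothetical record of the metadata needed by the criteria."""
    name: str
    host: Optional[str]              # None when no Git repository was found
    stars: int
    last_commit: Optional[date]


def months_before(day: date, months: int) -> date:
    """Return the date `months` calendar months before `day` (day clamped)."""
    total = day.year * 12 + (day.month - 1) - months
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(day.day, last_day))


def rejection_reason(candidate: Candidate, today: date) -> Optional[str]:
    """Return why a candidate is rejected, or None if it passes all criteria."""
    if candidate.host is None or candidate.host.lower() not in ACCEPTED_HOSTS:
        return "no GitHub/GitLab repository"
    cutoff = months_before(today, MAX_IDLE_MONTHS)
    if candidate.last_commit is None or candidate.last_commit < cutoff:
        return "inactive"
    if candidate.stars < MIN_STARS:
        return "unpopular"
    return None


if __name__ == "__main__":
    sample = Candidate("example-project", "github", 120, date(2021, 3, 1))
    print(rejection_reason(sample, today=date(2021, 6, 30)))  # None (passes)
```
]]></preformat>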
      <p>In our last step we applied an exclusion criterion based on whether the project served a
safety-critical role. For example, in the medical category, we excluded projects associated with a
hospital’s enterprise resource planning (ERP) services or the keeping of patient records. We did
this by running the following Google search query for each project and studying the results to
identify the project’s purpose: project-name AND (“applications” OR “in open source hardware”
OR “safety critical systems”). Through this criterion we rejected 21 projects, leaving us with 42.</p>
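      <p>The project counts reported in this section can be reconciled with a short calculation. The sketch below assumes that the result sets of the individual queries do not overlap; the total of 120 candidates is derived from the reported figures rather than stated in the text, and the variable names are illustrative.</p>
      <preformat><![CDATA[
```python
# Counts reported in this section; the per-stage totals are derived here.
identified = {
    "SCS domains (Google and GitHub tag search)": 15,
    "health projects from the related study [7]": 5,
    "domain-specific application queries": 96,
    "SCS standards query": 4,
}
rejected_on_repository_criteria = 57
rejected_on_purpose = 21

candidates = sum(identified.values())
after_repository_step = candidates - rejected_on_repository_criteria
after_purpose_step = after_repository_step - rejected_on_purpose

print(candidates, after_repository_step, after_purpose_step)  # 120 63 42
```
]]></preformat>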
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <p>The selected projects are listed in Table 1. For each project we list its name, application field,
popularity (in awarded stars), size (in thousands of lines of code), implementation language(s),
and repository location. The whole dataset and replication package are available online.</p>
      <p>The largest group of projects is in the automotive (AM) sector (12 out of 42), followed by ten projects in
avionics (AV), six in medicine (MED), three in spaceflight (SP), two in nuclear engineering (NE),
and one in recreation (RE).</p>
      <p>Additionally, we found that 34 unique programming languages are used by the selected 42
projects. The most popular are C++ (used in 29 projects), C (25), Python (24), and the Unix shell
(15).</p>
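      <p>Language usage figures of this kind can be obtained by counting, for each language, the number of projects that use it. The following sketch uses toy data that is not taken from the dataset; the project names and the <monospace>language_usage</monospace> function are hypothetical.</p>
      <preformat><![CDATA[
```python
from collections import Counter

# Toy data for illustration only; NOT taken from the dataset.
project_languages = {
    "project-a": ["C++", "Python", "Shell"],
    "project-b": ["C", "C++"],
    "project-c": ["Python", "C", "C++"],
}


def language_usage(projects):
    """Count, per language, the number of projects that use it."""
    counts = Counter()
    for languages in projects.values():
        # A set ensures each language is counted once per project.
        counts.update(sorted(set(languages)))
    return counts


usage = language_usage(project_languages)
print(len(usage))            # 4 unique languages in the toy data
print(usage.most_common(1))  # [('C++', 3)]
```
]]></preformat>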
    </sec>
    <sec id="sec-4">
      <title>4. Dataset Limitations</title>
      <p>The presented dataset suffers from some limitations, which could be lifted in the future. First, the
threshold and inclusion criterion of 70 stars we used is arbitrary. Ideally it should be replaced by
objective criteria, based e.g. on attributes that characterize engineered projects [14]. Second, the
study has excluded projects hosted on platforms other than GitHub and GitLab. This was done
because we are aiming to evaluate the dataset against community metrics that can be gathered
from these platforms, such as the number of forked repositories, open and closed issues, and
contributed pull requests. Nevertheless, if one does not care about these metrics, then more
repository hosting platforms should be considered. Third, the dataset does not incorporate SCS
testing software and components, which can be vital for satisfying safety requirements [15].</p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion and Future Work</title>
      <p>The development of SCS as OSS did not prove to be a widespread industrial practice, especially
for avionics. We noticed that some of the field’s leaders (BAE System Avionics, Furuno, Airbus
Helicopters, GE Aviation, Telefunken) maintain GitHub organizations, but evidently prefer to
keep their repositories private.</p>
      <p>The low number of SCS OSS projects we found may be associated with onerous
requirements and diminished incentives. Many SCS application domains, in common with accessibility
OSS [16], may require either specialized peripherals (such as LIDAR or ECG sensors) or powerful
hardware. In addition, SCS software may be based on specialized, inflexible, and non-free
development tools [17]. These requirements can hinder OSS developer participation. Furthermore,
due to its specialized nature and regulatory requirements, SCS deployment may require strong
organizational backing, thus further limiting and discouraging OSS volunteer participation.</p>
      <p>
        Notably, none of the systems identified in the earlier study [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] passed our inclusion criteria.
Most were excluded due to lack of activity or popularity in terms of GitHub stars. This is a
concern regarding the long-term viability and maintenance of OSS SCSs.
      </p>
      <p>
        The presented dataset can be used to evaluate the quality of OSS SCSs. This can be done
e.g. based on the defect prediction model proposed by Rahman and Devanbu [18], which
outlines relevant process and code metrics. It is important to study both facets, because the
most successful OSS projects are those featuring not only a high-quality code base, but also
a thriving user community [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Results can then be compared with other OSS endeavors that
have similar engineered project characteristics [14]. Furthermore, results can be qualitatively
evaluated based on practices advocated by Wilson et al. [19, 20].
      </p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work has received funding from the European Union’s Horizon 2020 research and
innovation programme under grant agreement No 825328.</p>
    </sec>
    <sec id="sec-7">
      <title>References</title>
      <p>[5] E. P. Jharko, The methodology of software quality assurance for safety-critical systems,
in: 2015 International Siberian Conference on Control and Communications (SIBCON),
2015, pp. 1–5. doi:10.1109/SIBCON.2015.7147057.</p>
      <p>[6] B. Kitchenham, S. Charters, Guidelines for Performing Systematic Literature Reviews in
Software Engineering, Technical Report EBSE-2007-01, School of Computer Science and
Mathematics, Keele University, 2007.</p>
      <p>[7] G. Minohara, C. Rocha, J. Costa, P. Meirelles, Benefits and challenges of open-source
health informatics systems: A systematic search of projects and literature review, JMIR
Preprints (2020). doi:10.2196/preprints.18489.</p>
      <p>[8] J. Pimentel, SECTION 1: INTRODUCTION TO SAFETY-CRITICAL AUTOMOTIVE
SYSTEMS, 2006, pp. 1–1. doi:10.4271/PT-103.</p>
      <p>[9] D. K. Hamilton, Chapter 7 - design for critical care, in: A. Sethumadhavan, F.
Sasangohar (Eds.), Design for Health, Academic Press, 2020, pp. 129–145. URL:
https://www.sciencedirect.com/science/article/pii/B9780128164273000075.
doi:10.1016/B978-0-12-816427-3.00007-5.</p>
      <p>[10] H. Alemzadeh, R. K. Iyer, Z. Kalbarczyk, J. Raman, Analysis of safety-critical computer
failures in medical devices, IEEE Security &amp; Privacy 11 (2013) 14–26.</p>
      <p>[11] R. G. Little, Toward more robust infrastructure: observations on improving the resilience
and reliability of critical systems, in: 36th Annual Hawaii International Conference on
System Sciences, 2003. Proceedings of the, IEEE, 2003, pp. 9–pp.</p>
      <p>[12] J. Bosch, P. Molin, Software architecture design: evaluation and transformation, in:
Proceedings ECBS’99. IEEE Conference and Workshop on Engineering of Computer-Based
Systems, IEEE, 1999, pp. 4–10.</p>
      <p>[13] N. Metayer, A. Paz, G. El Boussaidi, Modelling do-178c assurance needs: A design
assurance level-sensitive dsl, in: 2019 IEEE International Symposium on Software Reliability
Engineering Workshops (ISSREW), 2019, pp. 338–345. doi:10.1109/ISSREW.2019.00094.</p>
      <p>[14] N. Munaiah, S. Kroh, C. Cabrey, M. Nagappan, Curating GitHub for engineered
software projects, Empirical Software Engineering 22 (2017) 3219–3253.
doi:10.1007/s10664-017-9512-6.</p>
      <p>[15] K. Amarendra, A. V. Rao, Safety critical systems analysis, Global journal of computer
science and technology (2011).</p>
      <p>[16] M. Heron, V. Hanson, I. Ricketts, Open source and accessibility: Advantages and limitations,
Journal of Interaction Science 1 (2013). doi:10.1186/2194-0827-1-2.</p>
      <p>[17] S. Suomalainen, Kunnanmäki, Open-source components in safety critical systems, 2004.</p>
      <p>[18] F. Rahman, P. Devanbu, How, and why, process metrics are better, in: Proceedings of
the 2013 International Conference on Software Engineering, ICSE ’13, IEEE Press, 2013,
pp. 432–441.</p>
      <p>[19] G. Wilson, D. A. Aruliah, C. T. Brown, N. P. Chue Hong, M. Davis, R. T. Guy, S. H. D.
Haddock, K. D. Huff, I. M. Mitchell, M. D. Plumbley, B. Waugh, E. P. White, P. Wilson, Best
practices for scientific computing, PLOS Biology 12 (2014) 1–7. URL:
https://doi.org/10.1371/journal.pbio.1001745. doi:10.1371/journal.pbio.1001745.</p>
      <p>[20] G. Wilson, J. Bryan, K. Cranston, J. Kitzes, L. Nederbragt, T. K. Teal, Good enough practices
in scientific computing, PLOS Computational Biology 13 (2017) 1–20.
doi:10.1371/journal.pcbi.1005510.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Knight</surname>
          </string-name>
          ,
          <article-title>Safety critical systems: challenges and directions</article-title>
          ,
          <source>in: Proceedings of the 24th International Conference on Software Engineering, ICSE 2002</source>
          ,
          <year>2002</year>
          , pp.
          <fpage>547</fpage>
          -
          <lpage>550</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Androutsellis-Theotokis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spinellis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kechagia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gousios</surname>
          </string-name>
          ,
          <article-title>Open source software: A survey from 10,000 feet</article-title>
          ,
          <source>Foundations and Trends in Technology, Information and Operations Management</source>
          <volume>4</volume>
          (
          <year>2011</year>
          )
          <fpage>187</fpage>
          -
          <lpage>347</lpage>
          .
          doi:10.1561/0200000026.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Sulaman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Oručević-Alagić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Borg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wnuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Höst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>de la Vara</surname>
          </string-name>
          ,
          <article-title>Development of safety-critical software systems using open source software - a systematic map</article-title>
          ,
          <source>in: 2014 40th EUROMICRO Conference on Software Engineering and Advanced Applications</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>17</fpage>
          -
          <lpage>24</lpage>
          .
          doi:10.1109/SEAA.2014.25.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Margan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Čandrlić</surname>
          </string-name>
          ,
          <article-title>The success of open source software: A review</article-title>
          ,
          <source>in: 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO)</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>1463</fpage>
          -
          <lpage>1468</lpage>
          .
          doi:10.1109/MIPRO.2015.7160503.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>