<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>https://aksw.org/GustavoPublio (G. C. Publio); https://labra.weso.es/ (J. E. L. Gayo)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Ontolo-CI: Continuous Data Validation With ShEx</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gustavo Correa Publio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jose Emilio Labra Gayo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guillermo Facundo Colunga</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pablo Menendéz</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AKSW Research Group, University of Leipzig</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>WESO Research Group, University of Oviedo</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>The amount of public linked data published on the Web has grown steadily in recent years. To keep this continuously growing base of datasets consistent, data validation is a necessity for data publishers and maintainers. Two shape-based languages are mainly used for such validation of ontologies: ShEx and SHACL. The former is regarded as a concise, formal modeling approach, while the latter is a W3C recommendation for data validation. SHACL already has tools available to perform validation on the fly, but ShEx still lacks this feature. To reduce this gap, this work presents Ontolo-CI: a tool for automated data validation that accepts ShEx shapes as input and allows users to validate their data on the fly through a CI/CD approach, using GitHub Actions.</p>
      </abstract>
      <kwd-group>
        <kwd>ShEx</kwd>
        <kwd>data validation</kwd>
        <kwd>ontology validation</kwd>
        <kwd>continuous integration</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        For this purpose, SHACL has SHARK [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], a SHACL-based ontology validation framework.
ShEx, however, still lacks a continuous integration and continuous delivery (CI/CD) approach
for validating ontologies on the fly. To address this gap, this paper presents
Ontolo-CI, a CI/CD tool based on GitHub Actions that automatically runs tests and validates RDF
data with ShEx for a given GitHub repository.
      </p>
      <p>The paper is organized as follows: Section 2 describes related work, Section 3 describes
the features, architecture and general implementation of the Ontolo-CI tool, and
Section 4 presents our preliminary conclusions and possible future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Recent works have tried to tackle the challenge of data validation in collaborative
environments with different approaches and technologies. However, to our knowledge, none of
them yet provides ShEx validation with CI/CD capabilities.</p>
      <p>
        ROBOT [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which focuses on the development of biological (OBO) ontologies, is a
command-line tool that can be integrated into custom CI/CD environments and can
run validations at the logical level (i.e., checks for incoherence). It can also run validations based on
SPARQL queries, but lacks the capabilities of a validation language such as ShEx or SHACL.
      </p>
      <p>
        OnToology [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] occupies a similar position: it can be integrated with GitHub
repositories (although technically not directly into CI/CD environments) and, among
other features, triggers the evaluation of the new ontology against the OOPS! pitfalls [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. However, it lacks
the possibility of using a validation language with custom test cases.
      </p>
      <p>
        Another approach, the eXtreme Design (XD) methodology, has TESTaLOD [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], a tool
designed to support the testing team of XD projects. Although it can read files from
a Git repository, it cannot be automatically integrated into Git environments, and it can only
validate tests written with the TestCase OWL meta model [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        Finally, SHARK [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is the closest to our proposal. It uses Travis-CI to run
predefined or custom SHACL tests over an uploaded ontology file, and publishes the ontology
in a GitHub repository. As a web service, it can be integrated into a CI/CD environment
such as GitHub Actions, although prior configuration would be necessary.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Implementation</title>
      <p>Ontolo-CI is a Docker-based system that integrates with GitHub to provide a continuous
integration system for ontologies. To achieve that, it uses Shape Expressions and test
instances for data validation.</p>
      <p>The tool focuses on continuous integration for ontologies and is inspired by
Travis-CI and other continuous integration systems. It enables authors who use
GitHub as their version control system to add it as a check on pull requests or pushes to different
branches via GitHub Actions.</p>
      <p>Fig. 1 shows an abstraction of Ontolo-CI. As can be seen, it can be deployed
as a Docker container on any machine, where it listens for GitHub webhooks. Whenever a
webhook from GitHub arrives, it immediately schedules a build. After the build is finished,
Ontolo-CI notifies GitHub and publishes the data on its web page. Ontolo-CI is available at
https://github.com/weso/ontolo-ci.</p>
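      <p>The webhook-to-build step described above can be illustrated with a minimal Python sketch. It is not the tool's actual code (Ontolo-CI itself is written mainly in Java): the BuildRequest record and the parse_webhook function are hypothetical names, and the decision to ignore tag pushes and pull request actions other than opened, synchronize and reopened is an assumption made for illustration. The payload fields used (ref, after, repository.full_name, pull_request.head) follow GitHub's documented webhook payloads.</p>
      <preformat>
```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BuildRequest:
    """Hypothetical record handed from the listener to the scheduler."""
    repository: str
    branch: str
    event: str
    commit: str


def parse_webhook(event: str, payload: dict) -> Optional[BuildRequest]:
    """Turn a GitHub webhook into a build request, or None if it is ignored.

    `event` is the value of the X-GitHub-Event header sent with the webhook.
    """
    repository = payload.get("repository", {}).get("full_name")
    if repository is None:
        return None
    if event == "push":
        # Push payloads carry the ref as "refs/heads/NAME" and the new head SHA in "after".
        ref = payload.get("ref", "")
        prefix = "refs/heads/"
        if not ref.startswith(prefix):
            return None  # assumption: pushes that are not branch pushes (e.g. tags) are skipped
        return BuildRequest(repository, ref[len(prefix):], event, payload["after"])
    if event == "pull_request":
        # Assumption: only react when a pull request is opened or receives new commits.
        if payload.get("action") not in {"opened", "synchronize", "reopened"}:
            return None
        head = payload["pull_request"]["head"]
        return BuildRequest(repository, head["ref"], event, head["sha"])
    return None
```
      </preformat>
      <p>A listener built along these lines would pass every non-None result on to the scheduler and silently drop all other deliveries.</p>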
      <sec id="sec-3-1">
        <title>3.1. Architecture &amp; Technical details</title>
        <p>Ontolo-CI is implemented mainly in Java and runs in Docker containers, so that it is easy to
execute regardless of the user’s platform. The web module is written in JavaScript
using React.js components. Finally, a MongoDB NoSQL database instance
stores the results of each run, as can be seen in fig. 2.</p>
        <p>To achieve scalability, each module works independently in a microservices
architecture, while together they provide the system’s functionality. The modules are:
• Listener: Receives notifications from GitHub when a pull
request is started or when commits are pushed, and notifies the scheduler about the new
build to perform.
• Hub: Acts as a GitHub API client. It allows the system to collect files from
GitHub and also to report the status of the builds.
• Scheduler: Receives builds to schedule from the listener, then creates a
worker with the build and schedules its execution.
• Worker: Each worker contains a build to execute. A build is a set of tests to execute
over an ontology. A worker only knows how to execute tests when told and whom to notify when
finished.
• Database: When a worker finishes a build, the results of the build are stored in
a database. The fields currently stored are: repository, branch, event and
result. The result stores not only the outcomes of the test cases, but also the execution time
and other metrics.
• API: Provides an access layer for third-party services that need to explore the
data from an Ontolo-CI instance. It is also used by the web service. At present it only allows reading
data.
• Web: Provides the interface that displays to the user the results of all scheduled
executions.</p>
        <p>Project pages of the technologies mentioned above: Docker, https://www.docker.com/; React.js, https://reactjs.org/; MongoDB, https://www.mongodb.com/.</p>
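        <p>The interplay between scheduler, worker and database can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the actual Java implementation: the class names Build, Worker and Scheduler, the in-memory list that stands in for the database, the first-in-first-out execution order and the keys inside the result entry are all hypothetical. Only the four top-level stored fields (repository, branch, event, result) come from the description above.</p>
        <preformat>
```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List


@dataclass
class Build:
    """A set of tests to execute for one repository event (hypothetical structure)."""
    repository: str
    branch: str
    event: str
    tests: List[Callable[[], bool]]  # each test returns True when it passes


class Worker:
    """Runs the tests of one build and reports to a callback when finished."""

    def __init__(self, build: Build, notify: Callable[[Dict], None]):
        self.build = build
        self.notify = notify

    def run(self) -> None:
        start = time.perf_counter()
        outcomes = [test() for test in self.build.tests]
        # The record mirrors the stored fields: repository, branch, event, result.
        self.notify({
            "repository": self.build.repository,
            "branch": self.build.branch,
            "event": self.build.event,
            "result": {
                "passed": all(outcomes),
                "test_outcomes": outcomes,
                "execution_seconds": time.perf_counter() - start,
            },
        })


class Scheduler:
    """Wraps each incoming build in a worker and executes workers in FIFO order."""

    def __init__(self, store: List[Dict]):
        self.store = store  # an in-memory list standing in for the database module
        self.pending: Deque[Worker] = deque()

    def schedule(self, build: Build) -> None:
        self.pending.append(Worker(build, self.store.append))

    def run_pending(self) -> None:
        while self.pending:
            self.pending.popleft().run()
```
        </preformat>
        <p>Because a worker only receives its build and a notification callback, it needs no knowledge of the scheduler or of the storage backend, which matches the separation of responsibilities described for the modules.</p>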
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Features</title>
        <p>For every run, the following features present the results to the user:
• GitHub Check Runs: Ontolo-CI uses GitHub Check Runs to provide detailed
feedback on commits. Every time a push or a pull request is made, Ontolo-CI creates and
updates a check run with the status of the validation process.
• GitHub Details View: Once the validation has finished, a more detailed view can be
seen in the details section of the check run.
• Website Dashboard View: The dashboard view on the website shows all the builds
that have been executed on the Ontolo-CI instance. It includes information such as the
name of the repository the build comes from, the owner of the repository, the commit
message and ID, the branch name and number, and the date and time of the execution.
• Website Build-specific View: The user can get the details of a run in a build-specific view
for each build. This view shows all the test cases that make up the build, and users
can see the details of the validation process for any test. The detailed view of a test
case shows the actual and the expected validation result for each node with its shape. It is also
possible to see the full shape map result of the test case.</p>
        <p>More details and illustrations of the above features can be found on the Ontolo-CI website
(https://www.weso.es/ontolo-ci/). The GitHub Check Runs API is documented at
https://docs.github.com/en/rest/checks/runs.</p>
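        <p>As a rough illustration of how the check-run feedback could be derived from a test case, the following Python sketch compares an expected shape map result with the actual one and builds the JSON body of a completed check run. The function names, the dictionary representation of a shape map result and the wording of the summary are assumptions made for this example and are not taken from the Ontolo-CI code base. The payload keys (name, head_sha, status, conclusion, output) are those of GitHub's REST endpoint for creating check runs.</p>
        <preformat>
```python
from typing import Dict, List, Tuple

# Assumed representation: a shape map result maps (node, shape) pairs
# to the status "conformant" or "nonconformant".
ShapeMapResult = Dict[Tuple[str, str], str]


def failed_associations(expected: ShapeMapResult, actual: ShapeMapResult) -> List[Tuple[str, str]]:
    """Return the (node, shape) pairs whose validation status differs from the expected one."""
    return sorted(pair for pair, status in expected.items() if actual.get(pair) != status)


def check_run_payload(name: str, head_sha: str, failures: List[Tuple[str, str]]) -> dict:
    """Build the JSON body for creating a completed GitHub check run."""
    lines = [f"- node {node} against shape {shape}" for node, shape in failures]
    return {
        "name": name,
        "head_sha": head_sha,
        "status": "completed",
        # A completed check run must state its conclusion.
        "conclusion": "failure" if failures else "success",
        "output": {
            "title": "ShEx validation",
            "summary": f"{len(failures)} unexpected validation result(s)",
            "text": "\n".join(lines),
        },
    }
```
        </preformat>
        <p>Note that a test case in this sketch passes when the actual status equals the expected status, so a node that is expected to be nonconformant and indeed fails validation counts as a success.</p>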
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and future work</title>
      <p>In this work, we introduced Ontolo-CI, a tool capable of validating ontologies through ShEx
on the fly within a GitHub environment. By using GitHub Actions, the tool
produces checks and reports for every change in the repository, making sure that new
changes do not introduce inconsistencies in the data according to the defined ShEx shape
tests. We plan to extend the tool to support validation with SHACL shapes and to broaden its
feature set, transforming it into a more comprehensive ontology validation framework.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>Gustavo Correa Publio acknowledges Schwarz IT KG for sponsoring his
participation in the conference.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J. E. L.</given-names>
            <surname>Gayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Prud'hommeaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Boneva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kontokostas</surname>
          </string-name>
          ,
          <article-title>Validating RDF data</article-title>
          ,
          <source>Synthesis Lectures on Semantic Web: Theory and Technology</source>
          <volume>7</volume>
          (
          <year>2017</year>
          )
          <fpage>1</fpage>
          -
          <lpage>328</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Prud'hommeaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E. Labra</given-names>
            <surname>Gayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Solbrig</surname>
          </string-name>
          ,
          <article-title>Shape expressions: an RDF validation and transformation language</article-title>
          ,
          <source>in: Proceedings of the 10th International Conference on Semantic Systems</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>32</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Knublauch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kontokostas</surname>
          </string-name>
          ,
          <article-title>Shapes Constraint Language (SHACL)</article-title>
          , W3C Recommendation,
          <year>2017</year>
          . URL: https://www.w3.org/TR/2017/REC-shacl-20170720/.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Parvizi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mellish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Z.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>van Deemter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Stevens</surname>
          </string-name>
          ,
          <article-title>Towards competency question-driven ontology authoring</article-title>
          ,
          <source>in: European Semantic Web Conference</source>
          , Springer,
          <year>2014</year>
          , pp.
          <fpage>752</fpage>
          -
          <lpage>767</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G. C.</given-names>
            <surname>Publio</surname>
          </string-name>
          ,
          <article-title>Shark: A test-driven framework for design and evolution of ontologies</article-title>
          ,
          <source>in: European Semantic Web Conference</source>
          , Springer,
          <year>2018</year>
          , pp.
          <fpage>314</fpage>
          -
          <lpage>324</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Jackson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Balhoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Douglass</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. L.</given-names>
            <surname>Harris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Mungall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Overton</surname>
          </string-name>
          ,
          <article-title>ROBOT: a tool for automating ontology workflows</article-title>
          ,
          <source>BMC Bioinformatics</source>
          <volume>20</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Alobaid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Garijo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Poveda-Villalón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Santana-Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fernández-Izquierdo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Corcho</surname>
          </string-name>
          ,
          <article-title>Automating ontology engineering support activities with OnToology</article-title>
          ,
          <source>Journal of Web Semantics</source>
          <volume>57</volume>
          (
          <year>2019</year>
          )
          <fpage>100472</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Poveda-Villalón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gómez-Pérez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Suárez-Figueroa</surname>
          </string-name>
          ,
          <article-title>OOPS! (OntOlogy Pitfall Scanner!): An on-line tool for ontology evaluation</article-title>
          ,
          <source>International Journal on Semantic Web and Information Systems (IJSWIS)</source>
          <volume>10</volume>
          (
          <year>2014</year>
          )
          <fpage>7</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>V. A.</given-names>
            <surname>Carriero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mariani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Nuzzolese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Pasqual</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Presutti</surname>
          </string-name>
          ,
          <article-title>Agile knowledge graph testing with TESTaLOD</article-title>
          ,
          <source>in: ISWC (Satellites)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>221</fpage>
          -
          <lpage>224</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>E.</given-names>
            <surname>Blomqvist</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hammar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Presutti</surname>
          </string-name>
          ,
          <article-title>Engineering ontologies with patterns - the eXtreme Design methodology</article-title>
          ,
          <source>Ontology Engineering with Ontology Design Patterns</source>
          (
          <year>2016</year>
          )
          <fpage>23</fpage>
          -
          <lpage>50</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>