<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>OntoFlow: A user-friendly Ontology Development Workflow</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gordian Dziwis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lisa Wenige</string-name>
          <email>wenige@infai.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lars-Peter Meyer</string-name>
          <email>lpmeyer@infai.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Martin</string-name>
          <email>martin@infai.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute for Applied Informatics</institution>
          ,
          <addr-line>Goerdelerring 9, 04109 Leipzig</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <abstract>
        <p>For many years, the development of widely applicable and high-quality ontologies has been an ongoing research topic. Among the various challenges, the lack of integrated development environments for nontechnical domain experts has been one of the most pressing research issues. But while the participation of domain experts is vital for the applicability of ontologies, there are hardly any software tools available that facilitate their active engagement. We present a solution that addresses this research gap by automating the ontology development process with the help of a workflow engine. We define a pipeline that facilitates ontology implementation, serialization, documentation and testing within the scope of a seamless automatic routine that can be easily set up by the ontology engineer and triggered by a non-technical domain expert. Thus, the processing pipeline takes care of most of the operations that usually have to be carried out by an ontology or software engineer. We demonstrate the applicability of the approach by developing an ontology with OntoFlow and validating its functioning with a large-scale ontology dataset from Linked Open Vocabularies (LOV).</p>
      </abstract>
      <kwd-group>
        <kwd>Ontology</kwd>
        <kwd>Workflow</kwd>
        <kwd>Integrated Development Environment</kwd>
        <kwd>Quality Assurance</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        With the proliferation of knowledge graphs, ontology development has become a central part
of data integration activities. Due to the increasing efforts around FAIR data and research
data infrastructures worldwide [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], collaborative and decentralized processes around ontology
development are on the rise.
      </p>
      <p>
        However, available software tools fall short of the multi-layered requirements of ontology
development which comprises labor-intensive tasks such as ontology modeling [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], serialization,
validation and documentation [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Moreover, these tasks often have to be carried out
collaboratively and in a decentralized manner as several (often geographically dispersed) groups of
people are involved in the ontology development process, thus making it a multi-stakeholder
endeavor [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In addition, many of the available tools are difficult to handle for domain
experts without an IT or Semantic Web background [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. While these experts have an
extraordinarily high level of expertise in their field (e.g., in life sciences, physics or cultural heritage),
they might not be as familiar with the methods and tools commonly applied in software and
knowledge graph engineering.
      </p>
      <p>
        This situation is exacerbated by the fact that there exists no standard tool stack for ontology
development in the Semantic Web community. Although some tools, such as Protégé [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], are
considerably widespread, they only cover partial aspects of the engineering process such as
ontology modeling or editing and cannot be used in the sense of a fully-fledged integrated
development environment (IDE). On top of that, many software tools for RDF file processing
can only be used as command-line utilities or provide limited usability in terms
of a graphical user interface (GUI). However, virtualization and continuous integration (CI)
techniques from software engineering provide novel opportunities for ontology development [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
and can also help to ease participation of domain experts.
      </p>
      <p>
        With OntoFlow we propose a solution that bundles several necessary operations of ontology
engineering in a single automated workflow by adopting best practices from CI pipelines [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
Thus, we are able to integrate ontology modeling, serialization, testing and documentation in a
unified process which makes time-consuming, repetitive and error-prone manual work mostly
superfluous.
      </p>
      <p>The main contributions of the paper are as follows:
• Survey of the state-of-the-art of software-aided ontology engineering (Sect. 2).
• Conceptualization and implementation of the software tool OntoFlow that supports
ontology development through automatic pipelines to help users develop
ontologies faster and easier (Sect. 3).
• Demonstration of the feasibility of OntoFlow by evaluating its performance for all
ontologies listed in the Linked Open Vocabularies (LOV) repository (Sect. 4).
• Summary of the most important findings and outline of future research directions for
improving ontology development with OntoFlow (Sect. 5).</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Although some existing software applications already support collaborative ontology
development processes and also automate them in parts, there is a lack of approaches as to how
these processes can be linked in the sense of a CI pipeline while at the same time ensuring that
laymen can actively participate in ontology engineering through making changes and
triggering updates [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Important impetus for designing fully-fledged pipelines for software-aided
ontology development comes from the following sub-disciplines.
      </p>
      <sec id="sec-2-1">
        <title>2.1. DevOps for Data Collections</title>
        <p>
          Continuous development and integration strategies have become an indispensable part of
modern software engineering. They largely consist of clean-up operations, compilation of
executables, application of automated tests, and the deployment of the finished application,
including generation of appropriate documentation if necessary. Part of these processes is the
constant checking of any updates/new versions, as well as the triggering of troubleshooting
activities if problems occur [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. This ensures that software solutions are always up-to-date, that
applications meet predefined quality standards and rely on stable software artifacts. Likewise,
in the wake of ever-growing data volumes and increased relevance of data-driven applications,
effective mechanisms to control the quality-assured publication of data products are required
for IT operations. Processes that follow CI principles are equally needed for efficient production
of data artifacts. The knowledge graph community has been one of the first to adopt DevOps
best practices for datasets since it heavily relies on high-quality data schemas and reproducible
workflows for data conversion, integration and fusion. In this line of research, several authors
have proposed automated data pipelines that take care of data transformation, apply quality
assurance operations and automatic procedures for data publication and effective description
of data artifacts [
          <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13 ref14 ref15 ref16 ref17">10, 11, 12, 13, 14, 15, 16, 17</xref>
          ]. CI mechanisms for data collection involve
operations such as crawling, linking, or data transformation. These processes are usually
executed automatically and are not interrupted by user interactions and manual intervention.
If people work with software tools in this context, they are mostly data scientists or software
engineers.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Ontology Editors</title>
        <p>This modus operandi differs from the determining factors in ontology development. The creation
of an ontology usually involves several experts with diverse backgrounds and different levels of
technical expertise. Therefore, an effective ontology development environment/pipeline has to
foster collaboration as well as provide a graphical editor to effectively support development
processes. By this means, even domain experts with no IT expertise can actively take part in
the creation of ontologies.</p>
        <p>
          Ontology editors such as OntoEdit [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], OntoSeer [19], Protégé [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], Vocol [20] or WebVOWL [21]
are already established software tools for ontology development and visualization. Additionally,
the Semantic Application Design Language (SADL, https://github.com/SemanticApplicationDesignLanguage/sadl) offers an English-like language for semantic
modeling. SemML uses a template-based approach to help domain experts create ontologies [22].
Other applications in this area focus more on aspects of collaborative and version-based storage
of ontologies so that changes can be managed decentrally and tracked over time. Software tools
such as OnToology [23] or the QuitStore [24] provide solutions for these kinds of requirements.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Ontology Documentation &amp; Quality Assurance</title>
        <p>Tools such as OOPS! focus more on the aspect of quality assurance [25], while general-purpose
RDF data testing tools such as RDFUnit [26] or pySHACL [27] can equally be applied for
ontology testing. Just as important as quality assurance of ontologies is documentation for
end-users. Software applications that automatically generate ontology documentation are
WIDOCO [28], LODE [29] and pyLODE (https://github.com/RDFLib/pyLODE). They create an HTML representation from ontologies
in standard RDF serialization formats. WIDOCO even goes beyond the purely automatic
creation of ontology documentation by enabling metadata enrichment or ontology testing.</p>
        <p>However, it does not provide an easy-to-use interface for domain experts. Moreover, it is a
Java-based monolithic software, which makes modifications and extensions with tools from
other technology stacks (e.g., Python-based software applications) difficult.</p>
        <p>The ROBOT framework offers many operations needed for ontology manipulation in the
development process but lacks workflow capabilities. The framework’s documentation refers to
the rather technically oriented Make tool (https://www.gnu.org/software/make/) for more complex needs.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Ontology Development Workflow</title>
      <sec id="sec-3-1">
        <title>3.1. Requirements</title>
        <p>The goal of OntoFlow is the best possible optimization and automation of ontology development
processes, which currently involve a great number of labor-intensive tasks such
as modeling, serialization, updating, testing and documentation. Since such
development processes are carried out by several stakeholders, the workflow environment
should foster a collaborative and decentralized way of working.</p>
        <p>Since the development of ontologies often involves domain experts who have only limited
expertise in the field of software and data engineering, OntoFlow should provide a GUI for
editing the ontology while automating the most common tasks (e.g., bash scripting and Git
interaction) in the background. Due to the limited IT experience of some involved stakeholders,
it is also vital that workflows can be triggered without in-depth technical understanding of
software development or semantic technologies. Reducing the amount of necessary skills for
the domain expert is critical, because ontology development is generally a one-time job for
them. A reusable ontology workflow setup directed at domain experts helps to avoid costs for
learning the involved technologies and setting up a development and hosting infrastructure
(e.g., running a web server for publishing the ontology and documentation), which, apart from
working time, can incur further costs for licenses and hardware. Because there is no established
standard methodology for ontology development, and it happens in diverse environments,
OntoFlow must be flexible and facilitate easy modifications to accommodate different needs.
Table 1 gives an overview of the requirements detailed in the above sections.</p>
        <p>[Figure 1: Overview of the OntoFlow workflow. Inputs from the user (ontology diagrams and OWL files) are converted to OWL and merged; the merged ontology is then documented, serialized to RDF, validated, and versioned with diff detection, yielding the Ontology Artifact Package consisting of RDF serializations, documentation, validation reports and a semantic diff.]</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Workflow Structure</title>
        <p>The first process is the transformation of the ontology diagrams to OWL files and merging
those with other OWL files (if any) provided by the user. The resulting ontology
is the input for the following processing steps, carried out in parallel: an HTML documentation
is generated, describing the ontology’s metadata, classes and properties; the ontology is
serialized into multiple RDF formats; during validation, the ontology is tested against
previously defined SHACL shapes. If needed, OntoFlow can be adapted so that the publication
of the ontology fails whenever certain SHACL constraints are violated.</p>
        <p>[Figure 2: Component diagram of OntoFlow. A commit to the repository triggers a GitLab CI runner, which executes the Nextflow workflow inside the OntoFlow container; the individual processes run in their own application containers, and draw.io diagrams, SHACL shapes and OWL files are pulled from the repository.]</p>
        <p>OntoFlow picks up
the ‘owl:priorVersion‘ property and detects semantic differences between versions. This gives
an overview of which classes changed in the current version compared to the previous one.
The serializations, documentation and validation reports, as well as the semantic differences,
compose the Ontology Artifact Package.</p>
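        <p>The merge and diff-detection steps described above can be sketched in a few lines of Python. This is an illustrative model only, not OntoFlow’s actual code (which delegates these steps to Jena Riot and Bubastis): an ontology version is represented as a set of subject–predicate–object triples, merging is a set union, and a naive semantic diff is the pair of set differences between two versions. All triple values shown are hypothetical.</p>

```python
# Illustrative sketch only (OntoFlow delegates these steps to Jena Riot
# and Bubastis): an ontology is modeled as a set of triples.
def merge(*graphs):
    """Merge ontology fragments (sets of triples) into one ontology."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged

def semantic_diff(old, new):
    """Naive diff between two ontology versions: added and removed triples."""
    return {"added": new - old, "removed": old - new}

# Hypothetical fragments, loosely inspired by the copper use case.
part_a = {("ex:Copper", "rdf:type", "owl:Class")}
part_b = {("ex:Alloy", "rdf:type", "owl:Class"),
          ("ex:Alloy", "rdfs:subClassOf", "ex:Copper")}

v1 = merge(part_a, part_b)
v2 = v1 | {("ex:Ore", "rdf:type", "owl:Class")}   # a later version adds a class
changes = semantic_diff(v1, v2)
```

        <p>A real implementation would additionally classify the reported changes (e.g., class additions versus hierarchy changes), as Bubastis does.</p>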
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Implementation and Architecture</title>
        <p>The architectural model of OntoFlow is a pipeline consisting of processes where each operation
takes inputs, transforms them and outputs the results. Process inputs and outputs are
interconnected. In the component diagram (Fig. 2) this is represented with the beige box labeled
"Nextflow Diagram". Operations are carried out by different application programs with a
command-line interface. All applications are containerized, and the container images are
defined by a configuration file. A workflow engine manages container life cycles, executes the
applications and pipes the inputs and outputs between the processes. The pipeline operations
with their executing containers and commands, as well as inputs and outputs, are defined in a
workflow file. Each application container is based on an image for which a configuration file
exists. There is a container for OntoFlow itself, which has all the necessary dependencies for
running the workflow engine. The images are hosted in a container registry. A continuous
integration script builds the images and pushes them to the registry. Image definitions, as well as ontology
files, are managed inside Git repositories to enable version control.</p>
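        <p>To make the interconnection of process inputs and outputs concrete, the following sketch encodes a simplified workflow file as a Python data structure. The step names, image names and file names are hypothetical; OntoFlow itself defines the pipeline in Nextflow’s own DSL. A process depends on another whenever one of its inputs is an output of that process:</p>

```python
# Hypothetical, simplified encoding of a workflow file: each process names
# its container image, command inputs and outputs (OntoFlow uses Nextflow).
WORKFLOW = [
    {"name": "convert",  "image": "chowlk",  "inputs": ["diagram.xml"],                 "outputs": ["part.owl"]},
    {"name": "merge",    "image": "jena",    "inputs": ["part.owl", "extra.owl"],       "outputs": ["ontology.ttl"]},
    {"name": "validate", "image": "pyshacl", "inputs": ["ontology.ttl", "shapes.ttl"],  "outputs": ["report.txt"]},
    {"name": "document", "image": "pylode",  "inputs": ["ontology.ttl"],                "outputs": ["index.html"]},
]

def downstream_of(step_name):
    """Processes whose inputs include an output of the given step."""
    outs = set()
    for step in WORKFLOW:
        if step["name"] == step_name:
            outs = set(step["outputs"])
    return [s["name"] for s in WORKFLOW if outs & set(s["inputs"])]
```

        <p>The workflow engine derives the execution order from exactly this kind of input/output wiring, so no explicit ordering has to be maintained by hand.</p>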
        <p>OntoFlow provides a continuous integration script, which is triggered whenever a change in
the ontology repository occurs. When executed, the OntoFlow container is started with the
mounted ontology repository. Inside the container, the engine initializes the workflow with the
files from the ontology repository and OntoFlow’s validation shapes as inputs. Each process
step is then executed in its defined application container. The final outputs are deployed to
their target environment.</p>
        <p>The pipeline was implemented as a Nextflow [30] workflow and Docker was chosen as the
container engine. For each tool used in a process step, a Docker image was chosen or created.
The following list describes the software tools used in OntoFlow:
• Chowlk Converter (https://github.com/oeg-upm/Chowlk) and draw.io (https://app.diagrams.net/): Convert the ontology diagrams to OWL files.
Chowlk [31] provides a library of OWL diagram elements for draw.io, which is an open-source
diagram software.
• pySHACL: Validates the ontology against SHACL shapes.
• Bubastis (https://github.com/EBISPOT/bubastis): Creates a semantic diff in comparison to an older ontology version.
• Jena Riot (https://jena.apache.org/documentation/io/): Serializes and merges RDF files.
• sparql-integrate (https://github.com/SmartDataAnalytics/RdfProcessingToolkit): Extracts and manipulates data in RDF files.
• pyLODE: Creates the ontology documentation in HTML.</p>
        <p>OntoFlow can be run on any Linux system with Java and Docker installed, which facilitates
easy modifications and extensions. OntoFlow is integrated into the GitLab ecosystem, thus
boosting the options for automation. GitLab’s container registry hosts the tool images, and its
continuous integration infrastructure is utilized for running OntoFlow. The Ontology Artifact
Package is hosted as a GitLab page. With the schema files managed in a GitLab repository, it is
possible to edit the ontology collaboratively, because draw.io can use a GitLab repository as its
storage backend. This also works when multiple persons are editing the same diagram and
provides the option for collaborative editing without the user needing to explicitly interact with
Git. Chowlk diagrams are draw.io diagrams of an ontology or parts of an ontology. They are
defined with elements from the Chowlk Ontology Visual Notation library and exported to an XML
file. A line-wise diff comes for free through the usage of a Git repository. For documentation
purposes, the semantic diff provided by Bubastis [32] gives a more useful overview of the
changes between ontology versions. Bubastis analyzes and reports on the five major types of
ontology changes.</p>
        <p>Before an ontology diff can be created, the serialized file and location of the previous ontology
version have to be determined. To achieve this, we run a custom SPARQL query against the input
RDF file with sparql-integrate and output the URI locating the prior ontology version.
We curated a set of tools that fulfill our functional requirements, packaged each in a Docker
container, wired them together with Nextflow and integrated the resulting workflow with the
GitLab ecosystem. The OntoFlow code and documentation are available on GitLab
(https://gitlab.com/infai/ontoflow), so that anyone interested can use the software and see how the tool works.</p>
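        <p>The lookup of the previous version can be illustrated as follows. This is a simplified stand-in for the actual SPARQL query run with sparql-integrate; the in-memory graph and all IRIs are schematic:</p>

```python
# Simplified stand-in for the SPARQL query that OntoFlow runs with
# sparql-integrate; the graph is a set of (subject, predicate, object) triples.
def prior_version_uri(graph, ontology_iri):
    """Return the URI of the previous ontology version, if declared."""
    for s, p, o in graph:
        if s == ontology_iri and p == "owl:priorVersion":
            return o
    return None

# Hypothetical input graph declaring a prior version.
graph = {
    ("ex:CopperOntology", "rdf:type", "owl:Ontology"),
    ("ex:CopperOntology", "owl:priorVersion", "ex:CopperOntology-1.0"),
}
prev = prior_version_uri(graph, "ex:CopperOntology")
```

        <p>If no ‘owl:priorVersion‘ statement is present, no semantic diff is computed.</p>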
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation</title>
      <p>We verified the proper execution of the workflow in Section 4.1 by testing it on a large-scale
benchmark dataset of publicly available ontologies, in order to make valid statements
about the scalability and applicability of our approach for a wide range of domains. To
validate our approach, we evaluated in Section 4.2 to what extent the requirements formulated
in Section 3 are fulfilled, with regard to the implementation and the real-world use of OntoFlow in
the KupferDigital project.</p>
      <sec id="sec-4-1">
        <title>4.1. Benchmark with LOV Ontologies</title>
        <p>We obtained the benchmark dataset from the ontology repository Linked Open Vocabularies
(https://lov.linkeddata.es/dataset/lov/sparql) and downloaded the corresponding ontology files. We decided to use LOV as a reference point
for benchmark generation as it is one of the most comprehensive collections of ontologies and
vocabularies throughout the Semantic Web [33].</p>
        <p>In total, we extracted 799 files from the LOV registry
(https://gitlab.com/infai/ontoflow-benchmark/-/tree/main/ontologies). We counted the number of triples in
the ontologies and processed each ontology with OntoFlow. Since LOV ontologies are already
available in a common RDF format and no ontology diagrams are provided for them, we skipped
the processing step of ontology serialization from the workflow for these vocabularies. We
logged the processing time and additional information on whether the workflow completed
successfully. OntoFlow was able to process all of the 799 ontologies and produced
their output files (i.e., Ontology Artifact Packages). We also randomly selected 25 ontologies
and checked their output files for plausibility and consistency after the workflow had been
executed. OntoFlow produced syntactically correct and plausible ontology files, error reports
and documentation for each one of these ontologies.</p>
        <p>Because the architectural model is a pipeline with no conditional branching and the inputs
and outputs can only go one way through the workflow steps, we are confident that OntoFlow
also processed the other ontologies successfully and thus provides a stable environment for the
generation, serialization, validation, post-processing and publication of ontologies.
The successful execution of the workflow for numerous ontologies demonstrates that OntoFlow
is a valuable software tool for automating ontology development processes for a wide range of
application domains.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Requirements Evaluation</title>
        <p>OntoFlow’s implementation was tested during the materials science project KupferDigital,
which intends to develop a platform for the digitalization of materials research and the metal
processing industry. Several experts and institutes are currently involved in the development of
a set of ontologies, which semantically represent the work involved in the development of
copper materials, from ore extraction and alloy development to production and recycling of the
material. Each sub-ontology can consist of several components that can each be modeled with
Chowlk and subsequently processed, merged and documented with OntoFlow. The advantage of
the approach is that domain experts can work independently as well as collaboratively on the
separate ontology components, as the corresponding files are stored in a GitLab repository
which serves as a storage backend for draw.io. At the moment, numerous domain experts are
working simultaneously on 14 different sub-ontologies from the copper domain. For each
change made to a sub-ontology, OntoFlow is triggered and a new version of the ontology
artifact package is created. Beta versions of those ontologies developed with OntoFlow are
publicly available (https://gitlab.com/kupferdigital/wiki/-/wikis/Ontologies#subontologies).</p>
        <p>The following list summarizes the OntoFlow functionalities with regard to the previously
identified requirements, in light of the insights that were gained during the use case, our
implementation and the benchmark.</p>
        <p>RQ1.1 Ontology Serialization: OntoFlow can output any RDF serialization supported by
Apache Riot.</p>
        <p>RQ1.2 Ontology Validation: The ontology is validated against SHACL shapes and the result
is added to the output.</p>
        <p>RQ1.3 Ontology Post-processing: OntoFlow picks up a prior version of the ontology and
outputs a semantic diff. Currently, OntoFlow is not able to add the metadata about versioning
during a release automatically.</p>
        <p>RQ1.4 Ontology Documentation: OntoFlow generates HTML documentation with
pyLODE and is also able to add diagrams to illustrate the modeling of ontology parts. The
semantic diff, serializations and validation results are not yet integrated into the documentation.
We are waiting for a rewrite of pyLODE, which will make it easier to modify the HTML
templates.</p>
        <p>RQ1.5 Ontology Publication: When set up as a continuous integration job, OntoFlow
completely automates the publishing process. There are two limitations. First, when published on
GitLab pages, the URI always follows the pattern http(s)://groupname.example.io/projectname. For a
custom URI, a DNS record that redirects to the GitLab page URI has to be set up. Secondly,
GitLab pages do not support content negotiation. With content negotiation, an HTTP client
can specify which type (or types) of content it would prefer to receive when it attempts to
dereference a URI. For example, while a browser accessing the ontology’s URI would like to
receive the ontology’s documentation, a script would prefer to receive a machine-readable
serialization.</p>
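        <p>The missing content negotiation can be sketched as follows: a server dereferencing the ontology URI would inspect the HTTP Accept header and choose a representation accordingly. The file names and the set of supported media types below are hypothetical:</p>

```python
# Hypothetical sketch of content negotiation for an ontology URI:
# pick a representation of the ontology based on the HTTP Accept header.
def negotiate(accept_header):
    """Map an Accept header to a representation of the ontology."""
    if "text/html" in accept_header:
        return "documentation.html"   # browsers get the documentation
    if "text/turtle" in accept_header:
        return "ontology.ttl"         # scripts get a machine-readable form
    return "ontology.rdf"             # fallback serialization

browser = negotiate("text/html,application/xhtml+xml")
script = negotiate("text/turtle")
```

        <p>GitLab pages serve every request with the same static file regardless of the Accept header, which is why this behavior cannot currently be offered out of the box.</p>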
        <p>RQ2.1 GUI support: The ontology can be built in draw.io with diagram elements from the
Chowlk shape library, which is extensively documented (https://chowlk.linkeddata.es/notation.html). Because the Chowlk diagram depicts
OWL and not a general RDF graph, it is not possible to model all aspects of the ontology with
a diagram. Metadata like provenance, examples for classes or comments have to be defined
in an additional RDF file. The domain experts liked the visual definition approach, due to its
similarity to object-oriented modeling approaches. But while this reduces the barrier of entry for
domain experts, this familiarity can also be deceptive, and we see a risk of badly designed diagrams.
The domain experts gave us the feedback that OntoFlow’s output from the process steps
is quite technical and difficult to use. At the moment, it is only accessible by scanning the CI
pipeline’s job log.</p>
        <p>RQ2.2 Easy execution: When run as a GitLab CI job, OntoFlow starts automatically upon
saving the draw.io XML. Setting up this automated workflow is done by copying a CI
configuration file to the ontology repository. Running OntoFlow locally only requires Java, Docker and
Nextflow installations to be present on the host machine. OntoFlow can then be triggered by a
CLI command. With the help of the documentation, the domain experts were able to set up the
CI job on their own. They did not try to run OntoFlow locally, because their IT department
requires them to use Windows as their operating system. This was also an issue for an industry partner, who additionally had
policies against using external repositories and could not execute OntoFlow at all.</p>
        <p>RQ2.3 Fast execution: Benchmark evaluations on the LOV dataset revealed that OntoFlow’s
execution time scales linearly with the number of triples. Fig. 3 shows the relationship between
the triple count in an ontology and OntoFlow’s processing time. It took OntoFlow at least 17
seconds to process an ontology, and the maximum workflow duration was 314 seconds for an
ontology with 103,098 triples. A linear regression model with a coefficient of determination of 0.91
is a good fit for the relationship between triple count and execution time. Workflows were
run on a 4-core laptop with a solid-state disk. The zoomed graph in Fig. 3 shows that for
most ontologies, durations are clustered around 25 s. Execution time could be further reduced
by optimizing the architectural setup, but the need did not arise during the development of the
copper ontology.</p>
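        <p>The reported scaling behaviour can be turned into a back-of-the-envelope estimate. Assuming the 17-second minimum is fixed overhead and using only the longest published run (314 s for 103,098 triples) to derive a per-triple cost, a rough linear model of the processing time is:</p>

```python
# Back-of-the-envelope linear model derived only from the two published
# data points; the regression in the paper was fitted on the full dataset.
FIXED_OVERHEAD_S = 17.0              # minimum observed workflow duration
MAX_DURATION_S, MAX_TRIPLES = 314.0, 103_098

PER_TRIPLE_S = (MAX_DURATION_S - FIXED_OVERHEAD_S) / MAX_TRIPLES

def estimated_duration_s(n_triples):
    """Rough estimate of OntoFlow's processing time in seconds."""
    return FIXED_OVERHEAD_S + PER_TRIPLE_S * n_triples
```

        <p>This yields roughly 0.003 s per triple, i.e. about 46 s for a 10,000-triple ontology, while ontologies of a few thousand triples land near the observed 25 s cluster.</p>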
        <p>RQ2.4 Easy modification: The main component of OntoFlow is the Nextflow script that
describes the workflow. With 215 lines of code, it is very short, and it follows a simple logic
of shell commands which exchange files as their inputs and outputs. Anyone with
basic command-line skills can modify this script (e.g., by modifying or adding workflow steps)
to suit their needs. OntoFlow’s only dependencies are Java, Nextflow and Docker, so starting
development only requires installing those and cloning the OntoFlow repository.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion &amp; Future Work</title>
      <p>We have presented the OntoFlow approach that automates ontology development processes.
By transferring CI techniques from software engineering to the domain of ontology and data
management, we are able to perform previously complex and error-prone ontology engineering
tasks faster and without the excessive labor that was previously required. This is made possible
by the use of virtualized containers that take care of different steps of ontology processing. The
containers are intelligently linked to reduce the need for manual interventions. In this way,
the development process consisting of ontology modeling, editing, serialization, validation,
and documentation can be realized faster and easier for the end-user. In addition, the use
of GUI-based open-source modeling software and its integration with a tool for automated
shape-to-RDF conversion enables participation of domain experts who have little to no Semantic
Web expertise. Simultaneous collaborative engagement of different stakeholders in multiple
geographically dispersed locations is enabled through Git integration.</p>
      <p>OntoFlow workflows are executable for any type of ontology and can be set up by configuring
a GitLab repository with ontology files and a CI configuration. The containerized bundling of
different components implements a modular structure that ensures easy maintainability and
reusability. We have tested the correct operation of the approach for the use case of the copper
ontology. The artifacts automatically generated by the workflow can be viewed publicly (see
Sect. 4).</p>
      <p>In addition, we have carried out a large-scale evaluation on an ontology benchmark dataset
from Linked Open Vocabularies to test the scalability, feasibility and applicability of the approach
for a wide range of ontologies. The evaluation has shown that OntoFlow is easily executable for
the several hundred ontologies listed in LOV, as the workflow completed successfully for every
example in the dataset. In addition, performance analysis has shown that OntoFlow’s execution
time scales linearly with the number of triples. Encouragingly, processing times
for small to medium-sized ontologies (&lt; 10,000 triples) are in the range of seconds. These times
could probably be reduced even further by bundling all tools into a single container, since in a
public GitLab pipeline each image is pulled anew for every execution.</p>
      <p>To make the pipeline output more accessible to domain experts, we have started rendering
SHACL validation results with the help of a Jekyll RDF (https://github.com/AKSW/jekyll-rdf)
template and will include more artifacts in the rendered documentation.</p>
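      <p>The rendering step can be illustrated with a minimal sketch; the report field names and the example violation below are hypothetical, not OntoFlow&#8217;s actual template:</p>
      <preformat>
# Sketch: turn SHACL-style validation results into an HTML fragment for the
# generated documentation. Field names and values are illustrative only.
from html import escape

def render_report(results):
    """Render a list of SHACL-style violation dicts as an HTML table."""
    rows = "".join(
        "&lt;tr&gt;&lt;td&gt;{}&lt;/td&gt;&lt;td&gt;{}&lt;/td&gt;&lt;td&gt;{}&lt;/td&gt;&lt;/tr&gt;".format(
            escape(r["focus_node"]), escape(r["constraint"]), escape(r["message"])
        )
        for r in results
    )
    return (
        "&lt;table&gt;&lt;tr&gt;&lt;th&gt;Focus node&lt;/th&gt;&lt;th&gt;Constraint&lt;/th&gt;&lt;th&gt;Message&lt;/th&gt;&lt;/tr&gt;"
        + rows
        + "&lt;/table&gt;"
    )

# Hypothetical violation, e.g. from the copper ontology use case.
report = [
    {"focus_node": "ex:CopperAlloy", "constraint": "sh:minCount",
     "message": "Missing rdfs:label"},
]
html_fragment = render_report(report)
      </preformat>
      <p>In the actual workflow, such a fragment would be produced by the Jekyll RDF template rather than hand-written Python.</p>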
      <p>We did not expect that the ability to draw arbitrary diagrams with draw.io would also prove useful.
Domain experts can start by using draw.io to express the shared conceptualization with
free-form diagrams in an informal, intuitive way and then refine it into a “formal specification of
a shared conceptualization” [34]. We will investigate whether it is possible to integrate more aspects of
the informal conceptualization phase of the ontology development process, like a glossary or
competency questions, into OntoFlow. To support the formal specification, we will extend
OntoFlow with a reasoner and evaluate whether the reasoner’s output can be integrated in a
meaningful way. We also aim to extend the git diff-based version tracking for ontologies so that
major changes (e.g., in the class hierarchy) are automatically recorded in the documentation.
Another line of future research is the conceptualization and implementation of effective and
modularized workflows for automatic ontology-based data transformation, post-processing and
linking tasks.</p>
      <p>[19] P. Bhattacharyya, R. Mutharaju, OntoSeer: A tool to ease the ontology development
process, in: 8th ACM IKDD CODS and 26th COMAD, 2021, pp. 428–428.
[20] L. Halilaj, N. Petersen, I. Grangel-González, C. Lange, S. Auer, G. Coskun, S. Lohmann,
VoCol: An integrated environment to support version-controlled vocabulary development,
in: European Knowledge Acquisition Workshop, Springer, 2016, pp. 303–319.
[21] S. Lohmann, V. Link, E. Marbach, S. Negru, WebVOWL: Web-based visualization of
ontologies, in: International Conference on Knowledge Engineering and Knowledge
Management, Springer, 2014, pp. 154–158.
[22] B. Zhou, Y. Svetashova, A. Gusmao, A. Soylu, G. Cheng, R. Mikut, A. Waaler, E. Kharlamov,
SemML: Facilitating development of ML models for condition monitoring with semantics,
Journal of Web Semantics 71 (2021) 100664.
[23] A. Alobaid, D. Garijo, M. Poveda-Villalón, I. Santana-Perez, O. Corcho, OnToology, a tool
for collaborative development of ontologies, in: ICBO, 2015.
[24] N. Arndt, P. Naumann, N. Radtke, M. Martin, E. Marx, Decentralized collaborative
knowledge management using Git, Journal of Web Semantics 54 (2019) 29–47.
[25] M. Poveda-Villalón, A. Gómez-Pérez, M. C. Suárez-Figueroa, OOPS! (OntOlogy Pitfall
Scanner!): An on-line tool for ontology evaluation, International Journal on Semantic Web
and Information Systems (IJSWIS) 10 (2014) 7–34.
[26] D. Kontokostas, P. Westphal, S. Auer, S. Hellmann, J. Lehmann, R. Cornelissen, A. Zaveri,
Test-driven evaluation of linked data quality, in: Proceedings of the 23rd International
Conference on World Wide Web, 2014, pp. 747–758.
[27] A. Sommer, N. Car, pySHACL, 2021. doi:10.5281/zenodo.5503838.
[28] D. Garijo, WIDOCO: a wizard for documenting ontologies, in: International Semantic Web
Conference, Springer, 2017, pp. 94–102.</p>
      <p>
[29] S. Peroni, D. Shotton, F. Vitali, The Live OWL Documentation Environment: a tool for
the automatic generation of ontology documentation, in: International Conference on
Knowledge Engineering and Knowledge Management, Springer, 2012, pp. 398–412.
[30] P. D. Tommaso, M. Chatzou, E. W. Floden, P. P. Barja, E. Palumbo, C. Notredame, Nextflow
enables reproducible computational workflows, Nature Biotechnology 35 (2017) 316–319.
doi:10.1038/nbt.3820.
[31] S. Chávez-Feria, R. García-Castro, M. Poveda-Villalón, Converting UML-based
ontology conceptualizations to OWL with Chowlk, in: The Semantic Web: ESWC
2021 Satellite Events, Springer International Publishing, 2021, pp. 44–48.
doi:10.1007/978-3-030-80418-3_8.
[32] J. Malone, E. Holloway, T. Adamusiak, M. Kapushesky, J. Zheng, N. Kolesnikov, A. Zhukova,
A. Brazma, H. Parkinson, Modeling sample variables with an Experimental Factor Ontology,
Bioinformatics 26 (2010) 1112–1118. doi:10.1093/bioinformatics/btq099.
[33] P.-Y. Vandenbussche, G. A. Atemezing, M. Poveda-Villalón, B. Vatant, Linked Open
Vocabularies (LOV): a gateway to reusable semantic vocabularies on the web, Semantic
Web 8 (2017) 437–452.
[34] W. N. Borst, Construction of engineering ontologies for knowledge sharing and reuse,
1997.</p>
      <p>This work has been funded by the German Federal Ministry of Education and Research under
grant numbers 13XP5119F and 13XP5116B and by the German Federal Ministry for Digital and
Transport under grant number 19FS2001A.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Wilkinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. J.</given-names>
            <surname>Aalbersberg</surname>
          </string-name>
          , G. Appleton,
          <string-name>
            <given-names>M.</given-names>
            <surname>Axton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Baak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Blomberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-W.</given-names>
            <surname>Boiten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. B. da Silva</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. E.</given-names>
            <surname>Bourne</surname>
          </string-name>
          , et al.,
          <article-title>The FAIR guiding principles for scientific data management and stewardship</article-title>
          ,
          <source>Scientific data 3</source>
          (
          <year>2016</year>
          )
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N. F.</given-names>
            <surname>Noy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          , et al.,
          <article-title>Ontology development 101: A guide to creating your first ontology</article-title>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Subhashini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Akilandeswari</surname>
          </string-name>
          ,
          <article-title>A survey on ontology construction methodologies</article-title>
          ,
          <source>International Journal of Enterprise Computing and Business Systems</source>
          <volume>1</volume>
          (
          <year>2011</year>
          )
          <fpage>60</fpage>
          -
          <lpage>72</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sure</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Staab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Studer</surname>
          </string-name>
          , Ontology engineering methodology, in: Handbook on ontologies, Springer,
          <year>2009</year>
          , pp.
          <fpage>135</fpage>
          -
          <lpage>152</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Tudorache</surname>
          </string-name>
          , Ontology engineering: Current state, challenges, and future directions,
          <source>Semantic Web</source>
          <volume>11</volume>
          (
          <year>2020</year>
          )
          <fpage>125</fpage>
          -
          <lpage>138</lpage>
          . doi:10.3233/SW-190382.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Gennari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Musen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. W.</given-names>
            <surname>Fergerson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. E.</given-names>
            <surname>Grosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Crubézy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Eriksson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. F.</given-names>
            <surname>Noy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. W.</given-names>
            <surname>Tu</surname>
          </string-name>
          ,
          <article-title>The evolution of protégé: an environment for knowledge-based systems development</article-title>
          ,
          <source>International Journal of Human-computer studies 58</source>
          (
          <year>2003</year>
          )
          <fpage>89</fpage>
          -
          <lpage>123</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fowler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Foemmel</surname>
          </string-name>
          , Continuous integration (
          <year>2006</year>
          ). URL: https://www.martinfowler.com/articles/continuousIntegration.html.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Humble</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Farley</surname>
          </string-name>
          ,
          <article-title>Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation</article-title>
          , Addison-Wesley Signature Series (Fowler), Pearson ITP,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>Mungall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Carbon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ireland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <article-title>Continuous integration of open biological ontology libraries</article-title>
          ,
          <source>in: The 15th Annual Bio-Ontologies Meeting</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Cirulli</surname>
          </string-name>
          ,
          <article-title>Continuous integration for XML and RDF data</article-title>
          , XML LONDON (
          <year>2015</year>
          )
          <fpage>52</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Klímek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Skoda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nečaský</surname>
          </string-name>
          ,
          <article-title>Requirements on linked data consumption platform</article-title>
          ,
          <source>in: LDOW@WWW</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kucera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chlapek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Klímek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nečaský</surname>
          </string-name>
          ,
          <article-title>Methodologies and best practices for open data publication</article-title>
          ,
          <source>in: DATESO</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>52</fpage>
          -
          <lpage>64</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R.</given-names>
            <surname>Meissner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Junghanns</surname>
          </string-name>
          ,
          <article-title>Using DevOps principles to continuously monitor RDF data quality</article-title>
          ,
          <source>in: Proceedings of the 12th International Conference on Semantic Systems</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>189</fpage>
          -
          <lpage>192</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Rojas Meléndez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Van de Vyvere</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gevaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Taelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Colpaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          ,
          <article-title>A preliminary open data publishing strategy for live data in flanders</article-title>
          ,
          <source>in: Companion Proceedings of the The Web Conference</source>
          <year>2018</year>
          ,
          <year>2018</year>
          , pp.
          <fpage>1847</fpage>
          -
          <lpage>1853</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Roman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dimitrov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Nikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Putlier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sukhobok</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Elvesaeter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Berre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Simov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Petkov</surname>
          </string-name>
          ,
          <article-title>DataGraft: Simplifying open data publishing</article-title>
          ,
          <source>in: European Semantic Web Conference</source>
          , Springer,
          <year>2016</year>
          , pp.
          <fpage>101</fpage>
          -
          <lpage>106</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>C.</given-names>
            <surname>Stadler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wenige</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tramp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Junghanns</surname>
          </string-name>
          ,
          <article-title>RDF-based deployment pipelining for efficient dataset release management</article-title>
          .,
          <source>in: SEMANTICS Posters&amp;Demos</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Brümmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Baron</surname>
          </string-name>
          , I. Ermilov,
          <string-name>
            <given-names>M.</given-names>
            <surname>Freudenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kontokostas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hellmann</surname>
          </string-name>
          ,
          <article-title>DataID: Towards semantically rich metadata for complex datasets</article-title>
          ,
          <source>in: Proceedings of the 10th International Conference on Semantic Systems</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>84</fpage>
          -
          <lpage>91</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sure</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Erdmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Angele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Staab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Studer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wenke</surname>
          </string-name>
          , OntoEdit: Collaborative ontology development for the semantic web, in: International Semantic Web Conference, Springer, 2002, pp. 221–235.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>