<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Enabling combined software and data engineering: the ALIGNED suite of ontologies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Monika Solanki</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bojan Bozic</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Markus Freudenberg</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dimitris Kontokostas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rob Brennan</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christian Dirschl</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AKSW/KILT, University of Leipzig</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, University of Oxford</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>KDEG, School of Computer Science and Statistics, Trinity College Dublin</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Wolters Kluwer</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Effective, collaborative integration of software and big data engineering for Web-scale systems is now a crucial technical and economic challenge. This requires new combined data and software engineering processes and tools. Semantic metadata standards and linked data principles provide a technical grounding for such integrated systems, given an appropriate model of the domain. In this paper we introduce the ALIGNED suite of ontologies, specifically designed to model the information exchange needs of combined software and data engineering. The models have been deployed to enable: tool-chain integration, such as the exchange of data quality reports; cross-domain communication, such as interlinked data and software unit testing; mediation of the system design process through the capture of design intents; and the provision of context for model-driven software engineering processes. These ontologies are deployed in web-scale, data-intensive system development environments in both the commercial and academic domains. We exemplify the usage of the suite on a complex collaborative software and data engineering scenario from the legal information system domain.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>This paper has been accepted in the ISWC 2016 Resources track.</p>
      <p>Recent years have seen a significant increase in the demand for data-intensive applications.
However, our engineering techniques for building data-intensive systems are both
immature and often partitioned into separate software engineering and data engineering
processes, tasks or teams. The expressivity of semantic models makes them useful
both for addressing data quality [2] and for applying model-driven approaches to
software engineering. Semantic data, in the form of enterprise linked data, is
also useful for describing, fusing and managing the combined data and software
engineering lifecycles to increase productivity, agility and system quality.</p>
      <p>In this paper, we present a suite of ontologies, developed within the ALIGNED
project, that aims to align the divergent data engineering and software engineering
processes. The key aim of the ALIGNED ontology suite is to support
the generation of combined software and data engineering processes and tools
for improved productivity, agility and quality. The suite contains linked data
ontologies/vocabularies designed to: (1) support semantics-based model-driven
software engineering, by documenting additional system context and constraints
for RDF-based data or knowledge models in the form of design intents, software
lifecycle specifications and data lifecycle specifications; (2) support data
quality engineering techniques, by documenting data curation tasks, roles, datasets,
workflows and data quality reports at each data lifecycle stage in a
data-intensive system; and (3) support the development of tools for unified views of
software and data engineering processes and for software/data test case
interlinking, by providing the basis for enterprise linked data describing software and
data engineering activities (tasks), agents (actors) and entities (artefacts) based
on the W3C provenance ontology, PROV-O (http://www.w3.org/ns/prov-o).</p>
      <p>This ontology suite has been deployed for validation and incremental
improvement in the ALIGNED project on four large-scale, data-intensive systems
engineering use cases: the Seshat Global History Databank, which is compiling
linked data time series relating to all human societies over the past 12,000 years;
JURION (https://www.jurion.de/), a legal information platform developed by Wolters Kluwer Germany;
PoolParty (https://www.poolparty.biz/), a semantic technology middleware developed by the Semantic Web
Company; and the DBpedia+ (http://wiki.dbpedia.org/) data quality and release processes.</p>
      <p>The paper is structured as follows: Section 2 presents an overview of the
ALIGNED suite and briefly describes the core ontologies in the suite. Section 3
presents an evaluation of the ontologies in the suite. Finally, Section 4 presents
conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>Overview of the ALIGNED suite</title>
      <p>Figure 1 illustrates the ALIGNED suite of ontologies, split into the provenance,
generic and domain-specific layers. As the figure shows, strong
emphasis has been placed on reusing existing, well-known and standardised
specifications where available. At the top layer, the W3C provenance standard (PROV-O) forms
the baseline for all our specifications, and every model in the suite extends it in some way.
The split of the ALIGNED ontology suite into a generic layer and a domain-specific
extensions layer allows rapid evolution of domain-specific extensions for
the ALIGNED use cases/trial environments (JURION, Seshat, DBpedia,
PoolParty), built on a stable set of core concepts modelled in the generic layer. As
the project progresses, these extensions will be evaluated and incorporated into
the generic layer if they prove valuable or more widely applicable than a single
domain. Within the project, the suite of ontologies is known as the "ALIGNED
metamodel" because of its links with software engineering practices.</p>
      <p>Fig. 1. The ALIGNED Suite of Ontologies</p>
    </sec>
    <sec id="sec-3">
      <title>Evaluation</title>
      <p>Combining data and software engineering processes to increase productivity and
agility is a challenge faced by several organisations aiming to exploit the
benefits of big data. Ontologies and vocabularies developed in accordance with
competency questions, objective criteria and ontology engineering principles can
provide useful support to data scientists and software engineers undertaking this
challenge. In this paper we have proposed the ALIGNED suite of ontologies, which
provides semantic models of design intents, domain-specific datasets, software
engineering processes, quality heuristics and error handling mechanisms. The
suite helps to enable interoperability between data and software engineering tools
and to alleviate some of the complexities involved. We have exemplified the usage of
the suite on a real-world use case from the legal domain and evaluated it against the
desired criteria. As ontologies from the suite are now at various stages of adoption
by the ALIGNED use cases, the next step is their empirical evaluation.</p>
      <p>The evaluation criteria below follow the ISWC 2016 Resources Track review instructions (https://figshare.com/articles/ISWC2016_Resources_Track_Review_Instructions/2016852).</p>
      <table-wrap>
        <table>
          <thead>
            <tr>
              <th>Criterion</th>
              <th>Evaluation</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td colspan="2">Generic criteria</td>
            </tr>
            <tr>
              <td>Value addition</td>
              <td>(1) The ontologies add data and software engineering specific metadata to the process and enrich the information about process-specific procedures within data and software engineering for a tool, which in turn can use this context-dependent information for automation and automatic generation purposes. (2) DLO is used to provide details about the data engineering process, and SLO details about the software engineering process. (3) RVO helps to produce information about reasoning errors in the knowledge base, while DIO enables the mining of design intents from requirements specifications as well as the generation of unified governance reports by integrating requirements and design issues.</td>
            </tr>
            <tr>
              <td>Reuse</td>
              <td>(1) Potential reuse across a wider community of content producers, owners of large amounts of data, data managers, and engineers of new related ontologies and vocabularies. (2) Software development model designers and developers of human societies datasets (e.g. the Seshat Global History Databank). (3) The metamodels are easy to reuse and are published on the Web together with detailed documentation. The top-level models are general and can be applied to all data and software engineering models. Furthermore, the models are extensible and can be specialised by domain ontologies for specific software and data engineering platforms.</td>
            </tr>
            <tr>
              <td>Design and technical quality</td>
              <td>All ontologies have been designed as OWL DL ontologies, in accordance with ontology engineering principles. Axiomatisations in the ontologies have been defined based on the competency questions identified during requirements scoping.</td>
            </tr>
            <tr>
              <td>Sustainability</td>
              <td>All ontologies are deployed in public GitHub repositories. Long-term sustainability has been assured by the ontology engineers involved in the design.</td>
            </tr>
            <tr>
              <td colspan="2">Specific criteria</td>
            </tr>
            <tr>
              <td>Design suitability</td>
              <td>Individual ontologies in the suite have been developed in close association with the requirements emerging from the corresponding potential exploiting applications. Thus, they are closely suited to the tasks for which they have been designed.</td>
            </tr>
            <tr>
              <td>Design elegance and quality</td>
              <td>Axiomatisations in the ontologies have been developed following Gruber's principles [1] of clarity, coherence, extendibility, minimal encoding bias and minimal ontological commitment.</td>
            </tr>
            <tr>
              <td>Logical correctness</td>
              <td>The ontologies have been verified using DL reasoners for satisfiability, incoherence and inconsistency. Specifically, DIO has been checked for inconsistencies against the instance data in the governance triple store.</td>
            </tr>
            <tr>
              <td>External resources reuse</td>
              <td>External ontologies such as PROV-O and SKOS have been extensively reused.</td>
            </tr>
            <tr>
              <td>Documentation</td>
              <td>The ALIGNED public deliverables and publications [3, 4] include detailed descriptions of the models. The ontologies have been documented using rdfs:label and rdfs:comment, and HTML documentation has been enabled via the LODE service. All ontologies have been graphically illustrated.</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgement</title>
      <p>This research has received funding from the European Union's Horizon 2020
research and innovation programme under grant agreement No 644055, the
ALIGNED project (www.aligned-project.eu).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>T. R.</given-names>
            <surname>Gruber</surname>
          </string-name>
          .
          <article-title>Toward principles for the design of ontologies used for knowledge sharing</article-title>
          .
          <source>Int. J. Hum.-Comput. Stud.</source>
          ,
          <volume>43</volume>
          (
          <issue>5-6</issue>
          ):
          <fpage>907</fpage>
          –
          <lpage>928</lpage>
          , Dec.
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>D.</given-names>
            <surname>Kontokostas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brümmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hellmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Ioannidis</surname>
          </string-name>
          .
          <article-title>NLP data cleansing based on linguistic ontology constraints</article-title>
          . In
          <source>ESWC</source>
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>D.</given-names>
            <surname>Kontokostas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dirschl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Eck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Leuthold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Hellmann</surname>
          </string-name>
          .
          <article-title>Semantically enhanced quality assurance in the JURION business use case</article-title>
          . In
          <source>ESWC</source>
          <year>2016</year>
          (to appear).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>M.</given-names>
            <surname>Solanki</surname>
          </string-name>
          .
          <article-title>DIO: A pattern for capturing the intents underlying designs</article-title>
          . In
          <source>Proceedings of the 6th Workshop on Ontology and Semantic Web Patterns (WOP 2015)</source>
          , volume
          <volume>1461</volume>
          of CEUR Workshop Proceedings. CEUR-WS.org,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>