<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Achim Reiz</string-name>
          <email>achim.reiz@uni-rostock.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robert Schlücker</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kurt Sandkuhl</string-name>
          <email>kurt.sandkuhl@uni-rostock.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ontology</institution>
          ,
          <addr-line>Anonymizer, OOPS, NEOntometrics, OWL, RDF</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Rostock University</institution>
          ,
          <addr-line>18051 Rostock</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Standard Vocabulary: BRICK</institution>
          ,
          <addr-line>CSVW, DC, DCAT, DCMITYPE, DCTERMS, DCAM, DOAP, FOAF, ODRL2, ORG, OWL, PROF, PROV, QB, RDF, RDFS, SDO, SH, SKOS, SOSA, SSN, TIME, VANN, VOID, WGS, XSD, SWRL</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <fpage>20</fpage>
      <lpage>22</lpage>
      <abstract>
        <p>The ontologies developed in enterprises often store sensitive information on products, processes, and the overall business. Companies are (understandably) reluctant to share them with persons outside their organization. For some use cases, though, it is not necessarily the data but the structure that is of interest. This paper presents “OntoAnon”, an anonymizer for ontologies. The Python-based application removes sensitive information such as class and property names as well as annotation contents, while preserving the ontology structure and the formalisms used. This allows knowledge engineers to use tools like OOPS and NEOntometrics without compromising sensitive data. Further, it allows researchers to collect non-sensitive but valuable data on enterprise ontologies, e.g., to study evolutionary processes.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Ontologies often contain sensitive data on business processes, products, or persons. The need to
safeguard sensitive internal data within ontologies has emerged as a critical challenge. At the same time,
online tools can aid the development of ontologies. Their recommendations are not necessarily based
on content but can also rely on structure, like graph properties or axiom usage. Examples are
NEOntometrics [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and OOPS [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Due to security guidelines, however, such tools often cannot be used, as the
ontologies would have to be shared with an untrusted third party.
      </p>
      <p>
        As a result, knowledge engineers miss out on potentially helpful tools. Likewise, as researchers
and developers of these tools, we miss out on valuable data. As the developers of NEOntometrics, the
authors of this paper experienced this firsthand: while gathering data on open-source development
processes is relatively easy and has already enabled novel discoveries regarding ontology evolution [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
access to internal enterprise development processes is often denied. However, investigating
these internal data has considerable research potential. For example, software evolution research
showed discrepancies between open- and closed-source software developments [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Similar
discrepancies in ontology evolution are plausible: while open-source data showed no signs of
stereotypical development processes, this finding has not yet been verified for internal enterprise data.
Tackling these research questions does not require access to the internal ontology content; the structural
properties are sufficient.
      </p>
      <p>To date, however, no application allows sharing structurally accurate, anonymized ontologies.
We address this gap with OntoAnon, an anonymization tool for ontologies that maintains structural
integrity. OntoAnon removes the textual information in an ontology, such as element names and
annotation contents, while preserving its structural attributes. It runs locally, has a simple graphical
user interface (GUI), and creates a text file for tracing the anonymized terms back to the original ones. Thus, OntoAnon allows
knowledge engineers to use tools like NEOntometrics or OOPS without sharing the actual data. This also
enables the developers of these tools to collect further valuable information on ontologies that would normally
not be shared.</p>
      <p>2023 Copyright for this paper by its authors.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Data anonymization and privacy-preserving data publishing already have a broad research
foundation with many available approaches and techniques [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. With the rise of knowledge graphs in
various domains, the specifics of sharing graph data without compromising confidentiality have received
significant research attention:
      </p>
      <p>
        Delanaux et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] remove sensitive information with SPARQL-based anonymization policies. They
define queries for the information that must either be preserved or removed; these queries are the basis of the
anonymization algorithm. Thouvenot et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] developed an anonymization technique based on
grouping and anatomization, which alters the relations between critical data points. Hoang et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
developed a time-aware algorithm for evolving graphs that handles insert, update, delete, and re-insert
operations. The same authors also proposed a cluster-based k-ad approach that adds edges to mask users
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>However, most of today’s applications view anonymization from the viewpoint of data, not
structure. They are concerned with guarding privacy and less with sharing the most accurate view of
the ontology structure. Furthermore, current approaches often lack a simple, easy-to-use
implementation.</p>
      <p>OntoAnon focuses on the easy sharing of ontology structure. It suppresses and substitutes all
custom-created vocabulary while preserving standardized vocabularies such as those of the W3C. Unlike other approaches
that target the removal of identity information, OntoAnon leaves none of the original content readable after anonymization, while
the graph itself keeps its complete integrity.</p>
    </sec>
    <sec id="sec-3">
      <title>3. OntoAnon</title>
      <p>OntoAnon is a standalone Python application. It is open source and available on GitHub (https://github.com/Uni-Rostock-Win/OntoAnon). The
application can be started in three ways: by downloading and running the bundled executable (currently only
available for Windows); by installing the package via pip (https://pypi.org/project/OntoAnon/) and calling “OntoAnon” from the terminal;
or by running the source code directly, preferably in a virtual environment created with the software pipenv from the included Pipfile,
which installs the only external requirement, the package rdflib (http://rdflib.readthedocs.io/).</p>
      <p>After startup, a GUI appears and asks for the input parameters required for the ontology
anonymization (cf. Figure 1). The first field, Ontology File, is the location of the ontology to be
anonymized. Identify Format infers the serialization format, and Anonymized File is the path of the resulting
anonymized ontology. Dictionary File is the path of the translation file. Namespaces allows
the selection of vocabularies that are to be preserved. By default, standardized vocabularies such as
the ontology languages RDF(S) and OWL and widely used vocabularies like FOAF, PROV, or DC
are not anonymized; the default list comprises BRICK, CSVW, DC, DCAT, DCMITYPE, DCTERMS, DCAM, DOAP, FOAF, ODRL2, ORG, OWL, PROF, PROV, QB, RDF, RDFS, SDO, SH, SKOS, SOSA, SSN, TIME, VANN, VOID, WGS, XSD, and SWRL.
Users can also deselect namespaces from this list or add further ones.</p>
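      <p>The input handling described above can be pictured with a short Python sketch. It is a hypothetical illustration, not OntoAnon’s code: the names <monospace>identify_format</monospace>, <monospace>select_namespaces</monospace>, and <monospace>is_preserved</monospace> are invented for this example, the suffix table is an assumption (rdflib itself offers a comparable helper, <monospace>rdflib.util.guess_format</monospace>), and only a subset of the default vocabularies is listed, each with its published namespace IRI.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import os
from typing import Dict, Iterable, Optional, Tuple

# Assumed suffix table (rdflib-style format names); not taken from OntoAnon.
SUFFIX_FORMAT: Dict[str, str] = {
    ".owl": "xml",
    ".rdf": "xml",
    ".ttl": "turtle",
    ".nt": "nt",
    ".n3": "n3",
    ".jsonld": "json-ld",
}

# Subset of the default preserved vocabularies: prefix -> namespace IRI.
DEFAULT_PRESERVED: Dict[str, str] = {
    "RDF": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "RDFS": "http://www.w3.org/2000/01/rdf-schema#",
    "OWL": "http://www.w3.org/2002/07/owl#",
    "XSD": "http://www.w3.org/2001/XMLSchema#",
    "FOAF": "http://xmlns.com/foaf/0.1/",
    "PROV": "http://www.w3.org/ns/prov#",
    "DC": "http://purl.org/dc/elements/1.1/",
    "DCTERMS": "http://purl.org/dc/terms/",
    "SKOS": "http://www.w3.org/2004/02/skos/core#",
}


def identify_format(path: str) -> Optional[str]:
    """Infer the serialization from the file suffix (None if unknown)."""
    return SUFFIX_FORMAT.get(os.path.splitext(path)[1].lower())


def select_namespaces(
    deselect: Iterable[str] = (), add: Optional[Dict[str, str]] = None
) -> Tuple[str, ...]:
    """Apply the user's deselections and additions to the default list."""
    removed = set(deselect)
    chosen = {p: iri for p, iri in DEFAULT_PRESERVED.items() if p not in removed}
    chosen.update(add or {})
    return tuple(chosen.values())


def is_preserved(iri: str, preserved: Tuple[str, ...]) -> bool:
    """An IRI stays readable if it lies in one of the preserved namespaces."""
    return iri.startswith(preserved)


if __name__ == "__main__":
    print(identify_format("swo.owl"))  # xml
    custom = select_namespaces(
        deselect=["FOAF"], add={"OBO": "http://purl.obolibrary.org/obo/"}
    )
    print(is_preserved("http://purl.obolibrary.org/obo/OBI_0000304", custom))  # True
    print(is_preserved("http://xmlns.com/foaf/0.1/Person", custom))  # False
```
]]></preformat>
      <p>Matching by namespace prefix keeps the check cheap, since each IRI needs only one string comparison against the selected namespaces.</p>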
      <p>Clicking Anonymize starts the process. The application loads the graph, iterates through
all triples, and replaces all non-standardized vocabulary. In this way, the output graph has the same shape,
the same individuals, and the exact same usage of the respective attributes, without containing the actual data. The
anonymization process is depicted in Figure 2.</p>
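      <p>The triple-by-triple replacement can be illustrated with a dependency-free Python sketch. It is a hypothetical model, not OntoAnon’s implementation: triples are plain tuples instead of an rdflib graph, only four core namespaces are preserved, and the naming scheme (<monospace>NamespaceN/SubjectM</monospace> under <monospace>http://anonymurl.anon/</monospace>) is modeled on the translation excerpt accompanying Figure 4, with a single running counter as an assumption. Blank nodes and preserved IRIs pass through unchanged; literal values are replaced while their language tags and datatypes are kept.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple, Union

ANON_BASE = "http://anonymurl.anon/"  # base IRI as it appears in the excerpt

# Core vocabularies that stay readable (a subset of the defaults, for brevity).
PRESERVED: Tuple[str, ...] = (
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "http://www.w3.org/2000/01/rdf-schema#",
    "http://www.w3.org/2002/07/owl#",
    "http://www.w3.org/2001/XMLSchema#",
)


@dataclass(frozen=True)
class Literal:
    """Minimal stand-in for an RDF literal."""

    value: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


Term = Union[str, Literal]  # str: IRI or blank node ("_:..."); Literal: literal
Triple = Tuple[Term, Term, Term]


def split_iri(iri: str) -> Tuple[str, str]:
    """Split an IRI into namespace and local name at the last '#' or '/'."""
    cut = max(iri.rfind("#"), iri.rfind("/")) + 1
    return iri[:cut], iri[cut:]


class Anonymizer:
    """Hypothetical anonymizer: every triple is kept, only its content is renamed."""

    def __init__(self, preserved: Tuple[str, ...] = PRESERVED) -> None:
        self.preserved = preserved
        self.namespaces: Dict[str, str] = {}  # original namespace -> anonymous one
        self.translation: Dict[str, str] = {}  # original IRI -> anonymous IRI
        self.literals: Dict[str, str] = {}  # original literal value -> placeholder

    def _iri(self, iri: str) -> str:
        if iri.startswith("_:") or iri.startswith(self.preserved):
            return iri  # blank nodes and standard vocabulary stay untouched
        if iri not in self.translation:
            ns, _ = split_iri(iri)
            if ns not in self.namespaces:
                self.namespaces[ns] = f"{ANON_BASE}Namespace{len(self.namespaces) + 1}/"
            number = len(self.translation) + 1  # one running counter (assumption)
            self.translation[iri] = f"{self.namespaces[ns]}Subject{number}"
        return self.translation[iri]

    def _literal(self, lit: Literal) -> Literal:
        if lit.value not in self.literals:
            self.literals[lit.value] = f"Literal{len(self.literals) + 1}"
        # Language tag and datatype are kept, so their usage remains visible.
        return Literal(self.literals[lit.value], lit.lang, lit.datatype)

    def term(self, t: Term) -> Term:
        return self._literal(t) if isinstance(t, Literal) else self._iri(t)

    def anonymize(self, triples: Iterable[Triple]) -> List[Triple]:
        """Rewrite every triple term by term; the graph shape stays the same."""
        return [(self.term(s), self.term(p), self.term(o)) for s, p, o in triples]


if __name__ == "__main__":
    OBO = "http://purl.obolibrary.org/obo/"
    OWL = "http://www.w3.org/2002/07/owl#"
    RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
    RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
    triples = [
        (OBO + "OBI_0000304", RDF_TYPE, OWL + "ObjectProperty"),
        # hypothetical label text standing in for sensitive content
        (OBO + "OBI_0000304", RDFS_LABEL, Literal("confidential label", lang="en")),
    ]
    anonymizer = Anonymizer()
    for triple in anonymizer.anonymize(triples):
        print(triple)
    print(anonymizer.translation)
```
]]></preformat>
      <p>Because every occurrence of an IRI maps to the same replacement, the rewritten triples form a graph with the same shape as the input. One open design question in this sketch is typed literals: keeping an <monospace>xsd:integer</monospace> datatype while replacing the value with a placeholder yields an invalid lexical form, so a real implementation needs its own rule for such cases.</p>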
      <p>After the translation process, the labels, URIs, and literals no longer contain any meaningful information.
As an example, Figure 3 shows the object property hierarchy and the object property attributes of
OBI_0000304 of the Software Ontology (SWO) loaded in Protégé before and after the
anonymization process.</p>
      <p>The corresponding translation file shows how the namespaces, URIs, and literals have been renamed,
allowing the user to trace the anonymized terms back to the original ones. This makes it possible to use
structure-based tools like OOPS without sharing critical information and then to apply the results to the original
ontology.</p>
      <p>An example of such a use case is given in Figure 4. It shows an excerpt of an
OOPS analysis of the SWO, run on the anonymized file. OOPS identified a pitfall for Subject97. Using the translation file, this
structural finding can be traced back to the original term OBI_0000304. The
example shows that the structural recommendations and results of a tool like OOPS can be used without
sharing the actual content.</p>
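      <p>Tracing a finding back, as in the Subject97 example, amounts to a reverse lookup in the translation file. The Python sketch below assumes one “original =&gt; anonymized” pair per line, mirroring the translation excerpt of Figure 4; the exact file layout and the helper names <monospace>write_translation</monospace>, <monospace>parse_reverse</monospace>, and <monospace>backtrace</monospace> are hypothetical and not taken from OntoAnon.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, Optional


def write_translation(mapping: Dict[str, str]) -> str:
    """Serialize 'original => anonymized' pairs, one per line (assumed layout)."""
    return "\n".join(f"{orig} => {anon}" for orig, anon in mapping.items())


def parse_reverse(text: str) -> Dict[str, str]:
    """Map each anonymized term in a translation text back to its original."""
    reverse: Dict[str, str] = {}
    for line in text.splitlines():
        if " => " not in line:
            continue  # skip blank or unrelated lines
        orig, anon = line.split(" => ", 1)
        reverse[anon.strip()] = orig.strip()
    return reverse


def backtrace(term: str, reverse: Dict[str, str]) -> Optional[str]:
    """Resolve a full anonymized IRI or a bare local name such as 'Subject97'."""
    if term in reverse:
        return reverse[term]
    hits = [orig for anon, orig in reverse.items() if anon.rsplit("/", 1)[-1] == term]
    return hits[0] if len(hits) == 1 else None  # None if unknown or ambiguous


if __name__ == "__main__":
    # The two mappings from the translation excerpt in Figure 4.
    excerpt = {
        "http://purl.obolibrary.org/obo/OBI_0000304": "http://anonymurl.anon/Namespace8/Subject97",
        "http://purl.obolibrary.org/obo/NCBITaxon_9606": "http://anonymurl.anon/Namespace8/Subject605",
    }
    reverse = parse_reverse(write_translation(excerpt))
    print(backtrace("Subject97", reverse))  # http://purl.obolibrary.org/obo/OBI_0000304
```
]]></preformat>
      <p>Matching on the whole local name rather than a prefix avoids confusing, for example, Subject9 with Subject97; an ambiguous bare name returns nothing instead of a guess.</p>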
      <p>Figure 4, panel “Excerpt Translation”:</p>
      <preformat>http://purl.obolibrary.org/obo/OBI_0000304 =&gt; http://anonymurl.anon/Namespace8/Subject97
http://purl.obolibrary.org/obo/NCBITaxon_9606 =&gt; http://anonymurl.anon/Namespace8/Subject605</preformat>
      <p>Figure 4, panel “Ontology Pitfall Scanner (OOPS)”.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Validation</title>
      <p>
        To check whether the anonymization yields valid results, we anonymized five ontologies of varying sizes from public
repositories and checked whether the anonymized versions had the same structural
attributes as the original files using the NEOntometrics web service [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>The numerical comparison of the structural properties showed no differences: OntoAnon
anonymized the given ontologies reliably. However, the execution time for larger ontologies is
considerable. Anonymizing the largest tested ontology, FoodOn, with 318,105 axioms, took 252 seconds. The test
machine, however, was a mid-sized business notebook (Lenovo ThinkPad L390 Yoga, 16 GB RAM, Intel Core i7-8565U) not explicitly prepared for running
performance tests. The input data and the test results are available online at https://github.com/Uni-Rostock-Win/ontoAnon-Testdata.</p>
      <p>Ontology repositories referenced in this paper: https://github.com/EnvironmentOntology/biorealm, https://github.com/obophenotype/bio-attribute-ontology, https://github.com/allysonlister/swo, https://github.com/ukparliament/Ontology, and https://github.com/FoodOntology/foodon.</p>
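      <p>As a rough, self-contained illustration of this kind of check, the Python sketch below compares a simple structural fingerprint of an original and an anonymized graph: triple count, node count, in- and out-degree sequences, and triple patterns over the standard vocabulary. It is not NEOntometrics and computes none of its metrics; the function name <monospace>fingerprint</monospace> and the example IRIs under <monospace>http://example.org/</monospace> are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]

# Vocabularies that anonymization must leave untouched (subset, for brevity).
STANDARD = (
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "http://www.w3.org/2000/01/rdf-schema#",
    "http://www.w3.org/2002/07/owl#",
    "http://www.w3.org/2001/XMLSchema#",
)


def fingerprint(triples: Iterable[Triple]) -> Dict[str, object]:
    """Structural summary that is invariant under consistent renaming."""
    triples = list(triples)

    def shape(term: str) -> str:
        # Standard terms stay visible; custom terms collapse to a wildcard.
        return term if term.startswith(STANDARD) else "*"

    out_degree = Counter(s for s, _, _ in triples)
    in_degree = Counter(o for _, _, o in triples)
    return {
        "triples": len(triples),
        "nodes": len(set(out_degree) | set(in_degree)),
        "patterns": Counter((shape(s), shape(p), shape(o)) for s, p, o in triples),
        "out_degrees": sorted(out_degree.values()),
        "in_degrees": sorted(in_degree.values()),
    }


if __name__ == "__main__":
    OWL = "http://www.w3.org/2002/07/owl#"
    RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
    SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
    EX = "http://example.org/"  # hypothetical enterprise namespace
    original = [
        (EX + "Engine", RDF_TYPE, OWL + "Class"),
        (EX + "TurboEngine", RDF_TYPE, OWL + "Class"),
        (EX + "TurboEngine", SUBCLASS, EX + "Engine"),
    ]
    rename = {
        EX + "Engine": "http://anonymurl.anon/Namespace1/Subject1",
        EX + "TurboEngine": "http://anonymurl.anon/Namespace1/Subject2",
    }
    anonymized = [tuple(rename.get(t, t) for t in triple) for triple in original]
    print(fingerprint(original) == fingerprint(anonymized))  # True
    print(fingerprint(original) == fingerprint(anonymized[:2]))  # False
```
]]></preformat>
      <p>Equal fingerprints are necessary but not sufficient for structural equivalence; a full check would require a graph isomorphism test or a richer metric set such as the one NEOntometrics provides.</p>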
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>Ontologies formally describe a domain for both computers and humans. Thus, they often
contain sensitive information on products, processes, and business rules. This content can hinder the
sharing of ontologies, even when the interest lies more in the structural representation than in the
content. The proposed OntoAnon software addresses this concern. It is a small, locally run,
Python-based application that removes the ontology content while preserving the structural integrity. The
anonymized ontology allows the use of web-based software like NEOntometrics or OOPS without
sacrificing data security. In this way, the authors hope that OntoAnon eases the collaboration between
researchers and practitioners and enables more empirical insights into the structural development and
properties of enterprise ontologies.</p>
      <p>While the presented application anonymizes a single ontology, a future version could
anonymize a whole Git repository to enable the study of the evolutionary development processes of
enterprise ontologies and repositories. Furthermore, while the application already allows users to customize the selection
and deselection of namespaces, a more granular selection, for example by subclass or by TBox and
ABox, could further support sharing and usage scenarios.</p>
    </sec>
    <sec id="sec-6">
      <title>6. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Reiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sandkuhl</surname>
          </string-name>
          ,
          <article-title>NEOntometrics: A Flexible and Scalable Software for Calculating Ontology Metrics</article-title>
          , in:
          <source>Proceedings of Poster and Demo Track and Workshop Track of the 18th International Conference on Semantic Systems co-located with 18th International Conference on Semantic Systems (SEMANTiCS 2022)</source>
          , CEUR-WS, Vienna,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Poveda-Villalón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gómez-Pérez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.C.</given-names>
            <surname>Suárez-Figueroa</surname>
          </string-name>
          ,
          <article-title>OOPS! (OntOlogy Pitfall Scanner!)</article-title>
          ,
          <source>International Journal on Semantic Web and Information Systems</source>
          <volume>10</volume>
          (
          <year>2014</year>
          )
          <fpage>7</fpage>
          -
          <lpage>34</lpage>
          . https://doi.org/10.4018/ijswis.2014040102.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Reiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sandkuhl</surname>
          </string-name>
          ,
          <article-title>Debunking the Stereotypical Ontology Development Process</article-title>
          , in:
          <source>Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management</source>
          , SCITEPRESS - Science and Technology Publications, Valletta, Malta,
          <year>2022</year>
          , pp.
          <fpage>82</fpage>
          -
          <lpage>91</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>I.</given-names>
            <surname>Herraiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Robles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.M.</given-names>
            <surname>Gonzalez-Barahona</surname>
          </string-name>
          ,
          <article-title>The evolution of the laws of software evolution</article-title>
          ,
          <source>ACM Comput. Surv.</source>
          <volume>46</volume>
          (
          <year>2013</year>
          )
          <fpage>1</fpage>
          -
          <lpage>28</lpage>
          . https://doi.org/10.1145/2543581.2543595.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>V.</given-names>
            <surname>Torra</surname>
          </string-name>
          , Guide to Data Privacy, Springer International Publishing, Cham,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Delanaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bonifati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-C.</given-names>
            <surname>Rousset</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Thion</surname>
          </string-name>
          ,
          <article-title>Query-Based Linked Data Anonymization</article-title>
          , in: D. Vrandečić, K. Bontcheva, M.C. Suárez-Figueroa, V. Presutti, I. Celino, M. Sabou, L.-A. Kaffee, E. Simperl (Eds.),
          <source>The Semantic Web - ISWC 2018</source>
          , Springer International Publishing, Cham,
          <year>2018</year>
          , pp.
          <fpage>530</fpage>
          -
          <lpage>546</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Thouvenot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Curé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Calvez</surname>
          </string-name>
          ,
          <article-title>Knowledge Graph Anonymization using Semantic Anatomization</article-title>
          , in:
          <source>2020 IEEE International Conference on Big Data (Big Data)</source>
          , IEEE, Atlanta, GA, USA,
          <year>2020</year>
          , pp.
          <fpage>4065</fpage>
          -
          <lpage>4074</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.-T.</given-names>
            <surname>Hoang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Carminati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ferrari</surname>
          </string-name>
          ,
          <article-title>Time-Aware Anonymization of Knowledge Graphs</article-title>
          ,
          <source>ACM Trans. Priv. Secur.</source>
          (
          <year>2022</year>
          ). https://doi.org/10.1145/3563694.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.-T.</given-names>
            <surname>Hoang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Carminati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ferrari</surname>
          </string-name>
          ,
          <article-title>Cluster-Based Anonymization of Knowledge Graphs</article-title>
          , in: M. Conti, J. Zhou, E. Casalicchio, A. Spognardi (Eds.),
          <source>Applied Cryptography and Network Security</source>
          , Springer International Publishing, Cham,
          <year>2020</year>
          , pp.
          <fpage>104</fpage>
          -
          <lpage>123</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>