<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Creating Dynamically Evolving Ontologies: A Use Case from the Labour Market Domain</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Maaike H.T. de Boer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roos M. Bakker</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maaike Burghoorn</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>TNO</institution>
          ,
          <addr-line>Anna van Buerenplein 1, 2595DA, The Hague</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universiteit Leiden</institution>
          ,
          <addr-line>Reuvensplaats 3, Leiden, 2311BE</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The world is changing, which means that formal representations of (parts of) the world should change with it. In this paper, we explore to what extent the updating of ontologies or taxonomies can be automated using Hybrid AI. We use Natural Language Processing (NLP) methods to automatically recognize and integrate new concepts and alternative labels into an ontology. The labour market domain serves as a use case, as new jobs and skills have to be added on a regular basis. Our experiments show that, on our dataset, 1) language-based methods seem to outperform a string-based method, but no clear difference between language-based methods can be observed; 2) it is easier to map skills within one ontology (to alternative labels / synonyms) than between different sources; 3) no clear difference in performance is visible yet between mapping with synonyms / more relevant text and mapping without. This means that we can certainly take steps towards automation in the field of ontology evolution, but we are not there yet. In future work we plan to further experiment with at least the integration of skills (3), and to create a human-in-the-loop system to validate our work and combine the strengths of humans and machines.</p>
      </abstract>
      <kwd-group>
        <kwd>Ontology Mapping</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Ontology Evolution</kwd>
        <kwd>Transformer</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Ontologies are ‘a set of concepts and categories in a subject area or domain that shows their
properties and the relations between them.’1 They can be seen as a representation of (part of)
the world. The world is, however, ever changing: new concepts are formulated, older concepts
are forgotten, and new relations and properties are created. For example, in the fall of 2022 words
such as ‘shrinkflation’, ‘bachelorx party’ and ‘pawternity leave’ were added to the dictionary.2 One of the
domains in which ontologies or taxonomies often have to handle new concepts and relations is
the labour market, as new jobs and skills are created frequently.</p>
      <p>These new versions of ontologies or taxonomies are currently often created manually, which is
very labour intensive. In this paper we explore to what extent the updating of ontologies
or taxonomies can be automated using Hybrid AI - a combination of learned knowledge and
engineered knowledge. We specifically do not aim for full automation, but we foresee that part
of the process can be automated using ontology mapping techniques, while a human-in-the-loop
will still be necessary for verification.</p>
      <p>In the next section, we discuss related work on ontology mapping and ontology evolution.
Section 3 discusses our approach and outlines our experimental setup and the results. The last
section concludes this paper and provides an outlook to future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Ontology Mapping</title>
        <p>
          The comparison between ontologies is often called ontology mapping or ontology matching.
Related work in this field can be divided into element-level and structure-level mapping [
          <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
          ].
Element-level mapping considers each element in an ontology
independently of the other elements in the ontology. Structure-level mapping matches
whole ontologies, or groups of concepts with groups of concepts. As our work
does not involve matching whole ontologies or groups of concepts,
we focus on element-level mapping. Within element-level mapping, we focus on
two approaches that use Natural Language Processing: string-based and language-based
approaches [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. String-based approaches use only the characters of the words to create a
mapping, whereas language-based approaches also use other linguistic information, such as
lemmas or morphology.
        </p>
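        <p>As an illustration of the string-based end of this spectrum, the edit distance underlying such approaches can be computed with a short dynamic-programming routine. The normalised similarity below is our own illustrative choice, not necessarily the exact formulation used in the cited systems:</p>
        <preformat>
```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def string_similarity(a: str, b: str) -> float:
    """Normalised string similarity in [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))
```
        </preformat>
        <p>A string-based method sees no similarity between ‘car’ and ‘automobile’ beyond shared characters, which is exactly the limitation language-based approaches address.</p>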
        <p>
          In recent years, language-based approaches have developed towards
more semantic methods [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. One of these approaches is using word embeddings to compute
similarity between concepts [
          <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
          ]. These word embeddings have also been combined in a
hybrid way with human knowledge (hand-crafted features), such as in OntoEmma [6]. Other
approaches include deep learning approaches such as DeepAlignment [7] and Transformer
models [8, 9].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Ontology Evolution</title>
        <p>Most research presented in recent papers focuses on the creation of ontologies. However,
when ontologies are used in real-world applications or adopted in companies, it is important
to keep the ontology up to date. This field is called ontology evolution [10, 11, 12]. According
to Zablith et al. [11] ontology evolution can be split into five phases: a detection phase, a
change suggestion phase, a change validation phase, an evolution impact phase, and a change
management phase. The detection phase is focused on the detection of a need for change, also
named change capturing [13] or information discovery [14]. The change suggestion phase
suggests possible changes to the ontology. The change validation phase validates the changes
proposed in the suggestion phase. The evolution impact phase assesses the impact of the
changes, often on an application level. The change management phase is a continuous task that
records and versions the changes in the ontology.</p>
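        <p>Read as a process, these phases form an ordered pipeline. The sketch below merely fixes that order for illustration; the names are our own rendering of the survey's terminology, not an implementation from [11]:</p>
        <preformat>
```python
from enum import Enum, auto

class EvolutionPhase(Enum):
    DETECTION = auto()          # detect a need for change (change capturing)
    CHANGE_SUGGESTION = auto()  # propose candidate changes to the ontology
    CHANGE_VALIDATION = auto()  # accept or reject the proposed changes
    EVOLUTION_IMPACT = auto()   # assess impact, often on application level
    CHANGE_MANAGEMENT = auto()  # record and version the applied changes

# Phases in the order described by Zablith et al. [11].
PIPELINE = list(EvolutionPhase)
```
        </preformat>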
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments</title>
      <p>In our experiments we explore to what extent automation towards a new version of an ontology
is possible. We conduct three experiments, of which experiments 1 and 2 focus on
change suggestion - given new information, does the ontology need adaptation? - and experiment
3 combines change suggestion and validation - where should the new information go, and what
does this mean in terms of performance? [11]. We explain the data, methods, evaluation and
results per experiment. All experiments focus on the domain of the labour market, in
which we expect that skills are most often added or adjusted in an ontology.</p>
      <sec id="sec-3-1">
        <title>3.1. Experiment 1: Recognize similar skills (Dutch)</title>
        <p>The goal of this experiment is to recognize similar skills. If we are able to recognize skills
from source documents (such as vacancies or other skill ontologies) that are similar to skills in our
target ontology or taxonomy, these could automatically be suggested as synonyms or other relevant
links to existing skills.</p>
        <p>Dataset The Dutch CompetentNL3 is used as the source ontology, and the translated version of
the skills in the ESCO ontology4 is used as the target. Both contain 20,000+ skills (such as use foreign
language); in the experiments we compare a random sample of 10% of the data, as a full 20,000
by 20,000 skill comparison would take too much time. Our ground truth is an externally created
mapping between CompetentNL and ESCO, built using existing crosswalks as well
as manual validation.</p>
        <p>Method We use the following state-of-the-art methods mentioned in Section 2.1:
• LVS [15]: Levenshtein distance, the baseline (string-based);
• SpaCy (NL) [16]: the Dutch word vector model trained on a large news corpus
(nl_core_news_lg) (language-based);
• BERTje (NL) [17]: a Dutch Transformer model trained on several large text corpora,
including books and Wikipedia (language-based);
• XLNet (EN) [18]: an English Transformer model that is claimed to outperform BERT on 20
tasks (language-based).</p>
        <p>For each method, we calculate for each skill its similarity to every other skill. The skills are
compared and ranked according to the (cosine) similarity score.</p>
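        <p>This pairwise ranking step can be sketched as follows. The toy two-dimensional vectors stand in for the actual model embeddings (spaCy, BERTje, XLNet); the helper names and example skills are our own, chosen for illustration:</p>
        <preformat>
```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_candidates(source_vec, target_vecs):
    """Rank target skills by descending cosine similarity to a source skill."""
    scored = [(name, cosine(source_vec, vec)) for name, vec in target_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy embeddings; in the experiments these come from the language models.
targets = {"identify assets": [0.9, 0.1], "use CAD tools": [0.1, 0.9]}
ranking = rank_candidates([0.8, 0.2], targets)
```
        </preformat>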
        <p>The hypothesis is that the language-based methods outperform the string-based method.</p>
        <sec id="sec-3-1-1">
          <title>Evaluation</title>
          <p>Four different metrics are used to evaluate the performance of the various methods:
• Accuracy: if the top (all) skill(s) is (are) the same as the ground truth skill(s), assign value
1.0; otherwise 0.0;
• Top 5 accuracy: the number of ground truth skills that appear in the top 5 divided by the
number of ground truth skills;
• MAP: Mean Average Precision; the mean area under the precision-recall curve. This takes
into account the position of the ground truth skills in the ranking;
• DCG: Discounted Cumulative Gain; the graded relevance scale of skills that evaluates the
gain. Similar to MAP, the position of the ground truth skills in the ranking is used.
3https://www.werk.nl/arbeidsmarktinformatie/skills/competentnl-standaard-voor-skills-in-nederland
4https://ec.europa.eu/esco
Results Figure 1 shows the results of recognizing similar skills in Dutch. Overall, the results are
not as good as expected. On all metrics, SpaCy is the top performer and LVS is the worst. This
confirms our hypothesis, as LVS only uses the information in the string, whereas SpaCy and the
other models also use linguistic information. We chose one example to show the top 1 result
of the various methods (translated to English):
Skill: check availability of military resources;
Ground Truth Skill: identify and monitor physical assets
LVS: estimate resources needed; BERT_nl: control financial and economic resources and activities;
XLNet: use computer aided design and drawing tools; SpaCy_nl: control operational activities.</p>
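          <p>As a rough sketch of how two of these ranking metrics can be computed for a single ranked list (binary relevance assumed; the toy skills and helper names are our own, and the exact formulations in the experiments may differ):</p>
          <preformat>
```python
import math

def average_precision(ranked, relevant):
    """Mean of the precision values at each position where a relevant skill appears."""
    hits, precisions = 0, []
    for i, skill in enumerate(ranked, 1):
        if skill in relevant:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(relevant) if relevant else 0.0

def dcg(ranked, relevant):
    """Discounted cumulative gain with binary relevance and a log2 discount."""
    return sum(1.0 / math.log2(i + 1)
               for i, skill in enumerate(ranked, 1) if skill in relevant)

# Toy ranking: relevant skills at positions 1 and 3.
ranked = ["administer data", "use model data", "manage data lifecycle"]
truth = {"administer data", "manage data lifecycle"}
```
          </preformat>
          <p>MAP is then the mean of <code>average_precision</code> over all source skills.</p>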
          <p>These results show that the ground truth skill is often not very close to the original skill. The
reason for this is that the Dutch CompetentNL is often more specific than the (translated
to Dutch) ESCO skill set, and different choices in skills are made. This makes it very hard to
create an automatic mapping. Also, the current metrics only consider one ground truth skill. This
motivated us to create experiment 2, within one ontology and with multiple ground truth skills.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Experiment 2: Recognize similar skills (English)</title>
        <p>Experiment 2 has the same goal as experiment 1, but is conducted on English data and considers
data from only one source (the taxonomy / ontology). The mapping is created from an original
skill to an alternative label.</p>
        <p>Dataset We zoom in on a part of the English version of the ESCO ontology. Based on the
existing occupations and their skills, two datasets are created:
• Data Scientist (DS): a total of 47 skills, each with zero or more alternative labels;
• Systems Analysts (SA): a total of 19 occupations (incl. DS) and 231 skills, each with zero or
more alternative labels.</p>
        <p>The alternative labels are used as the ground truth for the skills.</p>
        <p>The hypothesis is that performance in experiment 2 is higher than in experiment 1, as it
might be easier to map within one source of information (and English models are potentially
also better than Dutch models, as far more training data is available for English).</p>
        <p>Method The English versions of the language-based methods of experiment 1 are used, as well
as a Transformer model specific to this domain:
• SpaCy (EN) [16]: the English version of SpaCy (en_core_web_lg);
• BERT (EN) [19]: the English counterpart of BERTje;
• JobBERT (EN) [20]: an English Transformer model trained on vacancy texts;
• XLNet (EN) [18]: the same as in experiment 1.</p>
        <p>Similar to experiment 1, all (alternative) labels for all skills are compared and ranked according
to the (cosine) similarity score.</p>
        <sec id="sec-3-2-1">
          <title>Evaluation</title>
          <p>One evaluation metric is added to the four of experiment 1:
• Top 1 accuracy: if (one of) the ground truth skill(s) is the top skill, assign value 1.0;
otherwise 0.0.
Results Figure 2 shows the results for the experiment on the DS dataset (a) and the SA dataset
(b), respectively. For each dataset, we again include one example, this time with the top 3 results.
(DS) Skill: manage data;
Ground Truth Skills: data resource management, operate data quality tools, manage data lifecycle,
data administration, administer data
• BERT_en: 1. administer data, 2. use model data, 3. use data bases.
• JobBERT: 1. administer data, 2. prepare data, 3. verify data.
• XLNet: 1. manage data models, 2. data, 3. administer data.
• SpaCy_en: 1. manage data lifecycle, 2. manage data models, 3. data.</p>
          <p>(SA) Skill: signal processing;
Ground Truth Skills: digital signal processing, analogic transmission digital transmission, DSP
• BERT_en: 1. digital signal processing, 2. data processing, 3. image acquisition.
• JobBERT: 1. digital signal processing, 2. data processing, 3. analogic transmission digital
transmission.
• XLNet and SpaCy_en: 1. digital signal processing, 2. data processing, 3. processing of data.</p>
          <p>The first observation is that the performance is much higher than in experiment 1. In
the examples we can see that the alternative labels, which are the ground truth skills, are much
closer in meaning than the ground truth skills in experiment 1. A second observation is
that the performance of the different methods is quite close to each other (the baseline LVS is
not included). This is also visible in the examples. A third observation is that performance -
especially accuracy - for the Systems Analyst is slightly lower than for Data Scientist. This
could be explained by the larger number of skills.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Experiment 3: Integrate / Map skills (English)</title>
        <p>The goal of this experiment is the integration and validation of the new synonyms in the
ontology. The difference in performance is calculated when new synonyms are added to the
comparison. We compare four settings that could be used to compare a skill or skill set (incl.
alternative labels) to other skills in the same occupation group: 1) nosyn:
just the new skill; 2) syn: the skill + all alternative labels; 3) random_alt: the skill + 1 random
alternative label of that skill; 4) wrong_alt: the skill + 2 random alternative labels from another
skill.</p>
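        <p>A minimal sketch of how these four input variants can be constructed for one skill; the concatenation-by-space format mirrors the example strings shown below, but the helper names and the use of a fixed random seed are our own assumptions:</p>
        <preformat>
```python
import random

def build_variants(skill, alt_labels, other_alt_labels, rng=random.Random(0)):
    """Build the four comparison inputs for one skill.

    alt_labels: alternative labels of this skill (at least one);
    other_alt_labels: alternative labels from a different skill (at least two).
    """
    return {
        "nosyn": skill,                                           # 1) skill only
        "syn": " ".join([skill] + alt_labels),                    # 2) skill + all alt labels
        "random_alt": " ".join([skill, rng.choice(alt_labels)]),  # 3) skill + 1 random alt label
        "wrong_alt": " ".join([skill] + rng.sample(other_alt_labels, 2)),  # 4) skill + 2 wrong labels
    }

variants = build_variants(
    "manage data",
    ["administer data", "manage data lifecycle"],
    ["develop own practices continuously", "Language Integrated Query"],
)
```
        </preformat>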
        <p>The hypothesis is that wrong_alt has the worst performance, as wrong information is
added and it should thus be least close to other skills from the same group, followed by random_alt,
nosyn and syn. More and complete information will probably improve mapping performance.</p>
        <p>Dataset The same dataset as in experiment 2 is used, but the skill mappings are manually
created.</p>
        <sec id="sec-3-3-1">
          <title>Method</title>
          <p>The same four methods as in experiment 2 are used.</p>
          <p>Evaluation The same evaluation as in experiment 2 is used, but only results for MAP are shown.</p>
          <p>Results Figure 3 shows the results of the integration of new skills, on the MAP metric and
the DS dataset only, due to space limitations. Other metrics show a similar trend.</p>
          <p>We use the DS example from the previous section, where the skill for nosyn is the
same as mentioned before (manage data), and syn is that skill plus all ground truth skills mentioned.
Random_alt is manage data manage data lifecycle and wrong_alt is manage data develop own
practices continuously Language Integrated Query. The mappings are, however, no longer on the
synonyms, as those are used directly in the experiment, but on other skills in the
same dataset. We show the result of the mapping of random_alt below:
Random_alt Skill: manage data manage data lifecycle;
Ground Truth Skills: manage findable accessible interoperable and reusable data ensure corrected
data storing
• BERT_en: 1. create data models manage data models, 2. manage ICT data architecture define
enterprise data architecture, 3. design database in the cloud design cloud data architecture.
• JobBERT: 1. manage ICT data architecture define enterprise data architecture, 2. design
database in the cloud design cloud data architecture, 3. implement data quality processes
verify data.
• XLNet: 1. create data models manage data models, 2. unstructured data data analytics, 3.
implement data quality processes verify data.
• SpaCy_en: 1. implement data quality processes verify data, 2. establish data processes
develop data processes, 3. create data models manage data models.</p>
          <p>The results show that if no synonyms (nosyn) or correct alternative labels (syn) are used,
the SpaCy model performs best. In case of one wrong alternative label, XLNet outperforms
SpaCy, which means that XLNet is less affected by this wrong alternative label. If more than one
wrong alternative label is added, performance drops for all methods, as would be expected.
JobBERT - trained on relevant data - does outperform the general BERT model, but not XLNet
and SpaCy in this experiment.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion &amp; Future Work</title>
      <p>In this paper we performed experiments to find out to what extent automation towards a new
version of an ontology is possible. We focus on one Dutch and two English
datasets within the labour market domain. Our experiments show that exp 1) language-based
methods outperform a string-based method, but no clear difference between language-based
methods can be observed; exp 2) it is easier to map skills within one ontology (to alternative labels
/ synonyms) than between different sources; exp 3) no clear difference in performance
between mapping with synonyms / more relevant text and mapping without is visible yet.</p>
      <p>Our results motivate the deployment of automated concept mapping to support the evolution
of ontologies. In this work we used Hybrid AI - the combination of learned knowledge and human
knowledge - and we foresee that hybrid intelligence - the combination of human interaction
with a machine - is necessary in the first step towards automation. In future work, we want to
verify the strengths and weaknesses of both, and create an interface to pose suggestions for
new skills and to work on a new version of an ontology in a user-friendly way.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>We would like to thank the internal TNO program on AI (APPL.AI) for their financial support,
as well as the partners of the Skills Matching project. Furthermore, we would like to thank
Quirine Smit for helping with the data analysis, and Jok Tang and Stephan Raaijmakers for
internally reviewing this paper.</p>
    </sec>
    <sec id="sec-6">
      <title>References</title>
      <p>[6] L. L. Wang, C. Bhagavatula, M. Neumann, K. Lo, C. Wilhelm, W. Ammar, Ontology
alignment in the biomedical domain using entity definitions and context, arXiv preprint
arXiv:1806.07976 (2018).</p>
      <p>[7] P. Kolyvakis, A. Kalousis, D. Kiritsis, DeepAlignment: Unsupervised ontology matching
with refined word vectors, in: Proceedings of the 2018 Conference of the North American
Chapter of the Association for Computational Linguistics: Human Language Technologies,
Volume 1 (Long Papers), 2018, pp. 787–798.</p>
      <p>[8] S. Neutel, M. H. T. de Boer, Towards automatic ontology alignment using BERT, in: AAAI
Spring Symposium: Combining Machine Learning with Knowledge Engineering, 2021.</p>
      <p>[9] Y. He, J. Chen, D. Antonyrajah, I. Horrocks, BERTMap: A BERT-based Ontology Alignment
System, arXiv preprint arXiv:2112.02682 (2021).</p>
      <p>[10] L. Stojanovic, B. Motik, Ontology Evolution within Ontology Editors, in: EON, 2002, pp.
53–62.</p>
      <p>[11] F. Zablith, G. Antoniou, M. d’Aquin, G. Flouris, H. Kondylakis, E. Motta, D. Plexousakis,
M. Sabou, Ontology evolution: a process-centric survey, The Knowledge Engineering
Review 30 (2015) 45–75.</p>
      <p>[12] F. Osborne, E. Motta, Pragmatic ontology evolution: reconciling user requirements and
application performance, in: International Semantic Web Conference, Springer, 2018, pp.
495–512.</p>
      <p>[13] L. Stojanovic, Methods and tools for ontology evolution (2004).</p>
      <p>[14] F. Zablith, Evolva: A comprehensive approach to ontology evolution, in: European
Semantic Web Conference, Springer, 2009, pp. 944–948.</p>
      <p>[15] C. Room, Levenshtein distance, Algorithms 12 (2019) 32.</p>
      <p>[16] M. Honnibal, I. Montani, spaCy 2: Natural language understanding with Bloom embeddings,
convolutional neural networks and incremental parsing, To appear 7 (2017) 411–420.</p>
      <p>[17] W. de Vries, A. van Cranenburgh, A. Bisazza, T. Caselli, G. van Noord, M. Nissim, BERTje:
A Dutch BERT model, arXiv preprint arXiv:1912.09582 (2019).</p>
      <p>[18] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, Q. V. Le, XLNet: Generalized
autoregressive pretraining for language understanding, Advances in Neural Information
Processing Systems 32 (2019).</p>
      <p>[19] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional
transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).</p>
      <p>[20] J.-J. Decorte, J. Van Hautte, T. Demeester, C. Develder, JobBERT: Understanding job titles
through skills, arXiv preprint arXiv:2109.09605 (2021).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Rahm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          ,
          <article-title>A survey of approaches to automatic schema matching</article-title>
          ,
          <source>the VLDB Journal</source>
          <volume>10</volume>
          (
          <year>2001</year>
          )
          <fpage>334</fpage>
          -
          <lpage>350</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shvaiko</surname>
          </string-name>
          , et al.,
          <article-title>Ontology matching</article-title>
          , volume
          <volume>18</volume>
          , Springer,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>I.</given-names>
            <surname>Harrow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Balakrishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Jimenez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jupp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lomax</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Reed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Romacker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Senger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Splendiani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wilson</surname>
          </string-name>
          , et al.,
          <article-title>Ontology mapping for semantically enabled applications</article-title>
          ,
          <source>Drug discovery today 24</source>
          (
          <year>2019</year>
          )
          <fpage>2068</fpage>
          -
          <lpage>2075</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Lv</surname>
          </string-name>
          ,
          <article-title>Ontology matching with word embeddings, in: Chinese computational linguistics and natural language processing based on naturally annotated big data</article-title>
          , Springer,
          <year>2014</year>
          , pp.
          <fpage>34</fpage>
          -
          <lpage>45</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Tounsi Dhouib</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Faron Zucker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Tettamanzi</surname>
          </string-name>
          ,
          <article-title>An ontology alignment approach combining word embedding and the radius measure</article-title>
          ,
          <source>in: International Conference on Semantic Systems</source>
          , Springer, Cham,
          <year>2019</year>
          , pp.
          <fpage>191</fpage>
          -
          <lpage>197</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>