<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Active Annotation Support System for Regulatory Documents</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andreas Korger</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Joachim Baumeister</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Würzburg</institution>
          ,
          <addr-line>Am Hubland, D-97074 Würzburg</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Manual document annotation is a resource-intensive task. The costs of annotation can be lowered by pre-processing the available corpus and by actively supporting annotating users during the annotation process. To integrate different components into a coherent active annotation support system, the XML Metadata Interchange (XMI) standard can be used to exchange objects on the basis of a meta-metadata model. Further, to integrate an existing knowledge graph into an annotation support system, the RDF query language SPARQL can be used as an interface to analyze existing documents and declare new knowledge. In this manner, the presented efforts contribute to structuring and standardizing the process of manual knowledge acquisition from regulatory documents.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge Management</kwd>
        <kwd>Document Annotation</kwd>
        <kwd>Meta-Meta Data Models</kwd>
        <kwd>SPARQL</kwd>
        <kwd>Ontology Population</kwd>
        <kwd>Natural Language Processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        A core task in the field of knowledge management is to provide insight into documents related
to the current problem situation. For example, a technician servicing industrial machines
needs quick access to the appropriate documentation. For internal knowledge
management, large companies often provide their employees with access to regulatory compliance
documents. This access is depicted conceptually on the left side of Figure 1. In these
documents, the textual parts are usually annotated with (semantic) metadata in order to enable
quick and problem-oriented access to information. This semantic metadata is in turn used
by semantic search engines and navigation interfaces to provide quick and context-oriented
access [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Whereas for some new documents the authoring process can be extended by adding
metadata, legacy documents always require the metadata to be attached afterwards.
Despite the progress in natural language processing, information extraction, and ontology
population, metadata is often attached to document passages manually in order
to achieve high annotation quality. However, the manual annotation of documents is a
cumbersome and costly task.
      </p>
      <p>Available frameworks for general textual annotation lack active annotation support. In this
work, we propose a semantic approach to actively annotate documents by integrating a domain-specific
ontology into the annotation task. The domain-specific ontology represents prior
domain knowledge and is used to integrate new knowledge collected in the active annotation
process. The right side of Figure 1 shows a conceptual view of this approach, which reduces
the effort and improves the quality of annotation compared to fully manual approaches.</p>
      <p>[Figure 1: Conceptual view; panels: Documents, Semantic Access, Active Annotation Support System, Domain Knowledge]</p>
      <p>The rest of the paper is organized as follows: In Section 2, we describe the components of the
proposed framework as well as their interaction. In Section 3, we present a case study using the
system for the annotation of regulatory documents for nuclear safety. Section 4 concludes the paper
with a summary, related work, and future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Components of an Active Annotation Support System</title>
      <p>In the following, we use an explanatory example from the domain of
fire safety. A fire is an incident for which appropriate measures need to be taken. A fire
extinguisher is such a measure, and so is a fire blanket. Such incidents and measures are what
semantic annotation aims to discover in unknown documents. An active annotation
support system (AASS) suggests such annotations to the user together with recommendations of likely
annotation features. It proposes automatically discovered annotations, so-called pre-annotations,
on the basis of machine learning and natural language processing techniques. Such a system
is called active if the choice the user makes for the current annotation is incorporated into
the next annotation recommendation. In this context, the system can also decide which textual
passages are presented to the user for annotation in order to optimize overall performance. The
annotations are made on the basis of a semantic model of regulatory knowledge represented in
an ontology. The semantic model is populated with instances created during the annotation
step, which together form a knowledge graph of applied regulatory knowledge.</p>
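      <p>The “active” behavior described above can be illustrated with a small Python sketch. It assumes a simple frequency-based strategy, which is not necessarily the one used in the AASS: candidate features for a phrase come from the ontology, and every feature the user confirms moves up in the next recommendation. The class FeatureRecommender and the feature names piri:Measure and piri:Equipment are hypothetical.</p>
      <preformat>
```python
from collections import Counter


class FeatureRecommender:
    """Ranks annotation features for a phrase and learns from every user decision."""

    def __init__(self, prior):
        # prior: phrase -> candidate features, e.g. taken from the domain ontology
        self.prior = {p.lower(): list(fs) for p, fs in prior.items()}
        # phrase -> Counter of the features the user actually chose
        self.feedback = {}

    def recommend(self, phrase, k=3):
        key = phrase.lower()
        counts = self.feedback.get(key, Counter())
        # ontology candidates first, then features only known from user feedback
        candidates = list(dict.fromkeys(self.prior.get(key, []) + list(counts)))
        # stable sort: most often chosen features first, ties keep the original order
        return sorted(candidates, key=lambda f: -counts[f])[:k]

    def record_choice(self, phrase, feature):
        # the "active" step: the current choice changes the next recommendation
        self.feedback.setdefault(phrase.lower(), Counter())[feature] += 1


rec = FeatureRecommender({"fire extinguisher": ["piri:Measure", "piri:Equipment"]})
print(rec.recommend("fire extinguisher"))  # ['piri:Measure', 'piri:Equipment']
rec.record_choice("Fire extinguisher", "piri:Equipment")
print(rec.recommend("fire extinguisher"))  # ['piri:Equipment', 'piri:Measure']
```
      </preformat>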
      <p>In this paper, an AASS for the knowledge management of regulatory documents is introduced,
and the major components necessary for its implementation are identified and explained in detail.
We also explain the workflow that brings the different components of the architecture together in a performant
and consistent manner. Basically, two processing environments are in use.
The first is the annotation frontend, in which the user works and creates manual annotations. The second is
the backend, which performs background work such as file handling, provides NLP functionality, and
manages consistency. A graphical view of the workflow and the incorporated components is
presented in Figure 2; the components shown in the figure are explained in more detail in the
following sections.</p>
      <sec id="sec-2-1">
        <title>2.1. Natural Language Processing Components</title>
        <p>
          The handling of natural language text contained in regulatory documents requires appropriate
techniques, namely to extract relevant entities and their relations [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Processing natural language text requires a supporting NLP container that provides
functionality such as text offset handling, tokenization, string matching, and rule matching. Each document
is encapsulated in such a container on the backend side, shown as component (4) in Figure 2.
The container provides an interface to the semantic knowledge contained in the
regulatory domain knowledge ontology (1). It applies natural language processing steps to
give access to the document at the level of tokens and entities. Rule matching steps are applied
to identify relevant known and unknown entities together with their relations.
        </p>
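        <p>A minimal sketch of the string matching step, assuming ontology labels are matched token by token against the document text. It uses only the Python standard library instead of spaCy, and the concept pirinu:extinguishment is hypothetical; the point is that every pre-annotation keeps character offsets into the original text.</p>
        <preformat>
```python
import re

TOKEN = re.compile(r"\w+")


def tokenize(text):
    # (lower-cased token, start char, end char): offsets let annotations point back into the text
    return [(m.group().lower(), m.start(), m.end()) for m in TOKEN.finditer(text)]


def pre_annotate(text, labels):
    """Match ontology labels token by token; return pre-annotations with character offsets."""
    tokens = tokenize(text)
    words = [t[0] for t in tokens]
    found = []
    for label, concept in labels.items():
        pattern = [w.lower() for w in TOKEN.findall(label)]
        if not pattern:  # an empty label would match everywhere
            continue
        n = len(pattern)
        for i in range(len(words) - n + 1):
            if words[i:i + n] == pattern:
                begin, end = tokens[i][1], tokens[i + n - 1][2]
                found.append({"begin": begin, "end": end,
                              "covered": text[begin:end], "concept": concept})
    return sorted(found, key=lambda a: (a["begin"], a["end"]))


text = "Detecting and extinguishing quickly those fires which do start"
labels = {"extinguishing": "pirinu:extinguishment",
          "fires which do start": "pirinu:startingFireIncident"}
for a in pre_annotate(text, labels):
    print(a["begin"], a["end"], a["covered"], a["concept"])
# 14 27 extinguishing pirinu:extinguishment
# 42 62 fires which do start pirinu:startingFireIncident
```
        </preformat>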
        <p>The document container is created from the document content (3). From the
existing ontology structure (1), together with the annotations already saved in the
corresponding knowledge graph (2), a hierarchical type system is derived and implemented
in the annotation environment. For this purpose, the information has to be transferred into a format
that can be consumed and displayed by the frontend tools. This type system holds the supporting
information for the user: it represents the options a user has for annotating
entities and their relations. The functionality of the annotation environment is used
to retrieve feedback within the active learning step. Since the presented architecture combines the
best of diverse data models, it is important to guarantee the consistency of the data. For practical
application, the documents need to be uniquely addressable, and different annotated
versions of the same document need to be identifiable. Furthermore, the annotated metadata
needs to be manageable; for instance, it is important to know which annotating component
created which annotation. Therefore, a systematic model of unique identifiers is maintained.
Additionally, meta information about the provenance of the data is stored in the knowledge graph.
The document container itself maintains its own identifiers and ensures at
least their inherent (syntactic) correctness when transferring them to the frontend.</p>
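        <p>The identifier model itself is not specified here; the following sketch merely illustrates one possible scheme, assuming that an annotation is identified by document, document version, annotating component, offsets, and feature. The base IRI http://example.org/piri/annotation/ and the class AnnotationKey are hypothetical.</p>
        <preformat>
```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotationKey:
    doc_id: str     # stable identifier of the source document
    version: int    # version of the annotated document
    annotator: str  # annotating component, e.g. "user:expert1" or "nlp:matcher"
    begin: int
    end: int
    feature: str

    def iri(self, base="http://example.org/piri/annotation/"):
        """Deterministic IRI: the same annotation always receives the same identifier."""
        key = "|".join(str(v) for v in (self.doc_id, self.version, self.annotator,
                                        self.begin, self.end, self.feature))
        return base + hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


a = AnnotationKey("Pub1091-web", 2, "user:expert1", 14, 27, "measureRoot")
b = AnnotationKey("Pub1091-web", 2, "nlp:matcher", 14, 27, "measureRoot")
print(a.iri() == AnnotationKey("Pub1091-web", 2, "user:expert1", 14, 27, "measureRoot").iri())  # True
print(a.iri() == b.iri())  # False: a different annotating component yields a different IRI
```
        </preformat>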
        <p>This container, in turn, needs to communicate with the annotation tool, which requires a standard
for the interchange of metadata. The pre-annotations (5) created by the NLP
container are transferred to the document container, which forwards them to the frontend. There they
are displayed, for instance, in the same manner as manual annotations and can be accepted or
adjusted by the user. The user might want to change the beginning or end of an automatic
annotation, change the annotated features, or delete the whole annotation. The user's annotations
are communicated to the backend in the same manner as they are passed to the frontend.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Semantic Technology Components</title>
        <p>Semantic components are modularized and stacked with the intention of reusing and adapting
them for new regulatory domains. The foundation is an ontology (1) that represents the basic
regulatory knowledge model. This model is instantiated, and the instances are aggregated in different
knowledge graphs (2). Semantic technologies are used for three different purposes:</p>
        <p>
          First, prior and learned domain knowledge is organized in the knowledge graph on the
basis of the knowledge organization model. For this task, we use the SKOS (Simple Knowledge
Organization System) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] scheme and build a domain-specific semantic model on top of it. The
domain knowledge consists of entities that are described with labels and descriptions, as well
as proprietary relations between them. These are the entities and relations the user wants to
discover and annotate in the available document corpus.
        </p>
        <p>
          Second, a set of extraction patterns is matched against the corpus to identify unknown entities and
relations. All discovered information units are saved in the knowledge graph together with
their annotations. An annotation made in a document on the basis of the domain knowledge
signifies, for instance, that a certain entity occurs at this position in the document. These
automatic annotations are complemented by the manual annotations of the user. A large corpus
quickly entails a large number of annotations. This data has to be queryable so that humans
can assess the quality of the annotations and make use of the discovered knowledge.
Additionally, storing the data in such a structured manner allows for
semantic search in the annotated data [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
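        <p>As an illustration of an extraction pattern, the following hypothetical sketch proposes gerunds in sentences that mention a known incident as candidate measures. The cue table, the regular expressions, and the function extract_candidates are assumptions; actual patterns may differ and would typically operate on spaCy tokens rather than raw strings.</p>
        <preformat>
```python
import re

# hypothetical cues: a known incident label, and gerunds as candidate measures
INCIDENT_CUES = {"pirinu:fireIncident": re.compile(r"\bfires?\b", re.IGNORECASE)}
GERUND = re.compile(r"\b([A-Za-z]+ing)\b")


def extract_candidates(sentence):
    """Propose (measure, relation, incident) triples for sentences mentioning a known incident."""
    triples = []
    for incident, cue in INCIDENT_CUES.items():
        if not cue.search(sentence):
            continue
        for m in GERUND.finditer(sentence):
            # skip gerunds that only complete a phrase, as in "fires from starting"
            if sentence[max(0, m.start() - 5):m.start()].lower() == "from ":
                continue
            triples.append((m.group(1).lower(), "isReactiveMeasureTo", incident))
    return triples


print(extract_candidates("Preventing fires from starting"))
# [('preventing', 'isReactiveMeasureTo', 'pirinu:fireIncident')]
```
        </preformat>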
        <p>
          Third, provenance information is stored in the knowledge graph to organize, for instance,
different authors, different documents, and experiments. We chose the Provenance Ontology
(PROV-O) as a base on which to build a domain-specific system [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
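        <p>A sketch of how the provenance of a single annotation could be expressed with PROV-O terms (prov:Entity, prov:Agent, prov:Activity, prov:wasAttributedTo, prov:wasGeneratedBy, prov:wasDerivedFrom, prov:wasAssociatedWith). Resource names such as pirinu:annotation42 and pirinu:expert1 are hypothetical, and the domain-specific extensions mentioned above are not modeled.</p>
        <preformat>
```python
def provenance_triples(annotation, agent, activity, document):
    """Describe who created an annotation, in which activity, and from which document."""
    return [
        (annotation, "rdf:type", "prov:Entity"),
        (annotation, "prov:wasAttributedTo", agent),
        (annotation, "prov:wasGeneratedBy", activity),
        (annotation, "prov:wasDerivedFrom", document),
        (agent, "rdf:type", "prov:Agent"),
        (activity, "rdf:type", "prov:Activity"),
        (activity, "prov:wasAssociatedWith", agent),
    ]


def to_turtle(triples):
    # prefixed names only; a complete export would also declare the @prefix lines
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(to_turtle(provenance_triples("pirinu:annotation42", "pirinu:expert1",
                                   "pirinu:annotationSession1",
                                   "pirinu:regulatoryDocumentNuclearSafetyPub1091-web")))
```
        </preformat>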
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Textual Similarity Assessment</title>
        <p>
          What distinguishes the annotation of regulatory documents from the annotation of general
documents? We see the difference in the availability of a semantic model encoding prior
knowledge, founded on several previous case studies. Further, this provides the basis for constructing a
domain-independent NLP engine that exploits textual phenomena occurring especially in
regulatory texts. The combination of both facilitates the annotation of unknown regulatory texts.
For instance, having taxonomies of regulatory domain-specific entities available allows for
recommending annotation features based on generalization and specialization in the neighborhood
defined by the topology of the taxonomy [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Knowing the domain-specific relations and the textual indicators in regulatory language
that point to them simplifies the discovery of specific entities.
        </p>
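        <p>The recommendation by generalization and specialization can be sketched as a neighborhood lookup in a taxonomy. The sketch assumes a tree in which each concept has at most one broader concept; the excerpt of nuclear safety incidents and its edges are assumed for illustration.</p>
        <preformat>
```python
def neighborhood(concept, broader):
    """Candidate features around a concept; broader maps each concept to its single parent."""
    parent = broader.get(concept)
    children = sorted(c for c, p in broader.items() if p == concept)
    siblings = sorted(c for c, p in broader.items() if parent and p == parent and c != concept)
    return {"generalization": [parent] if parent else [],
            "specialization": children,
            "siblings": siblings}


# assumed excerpt of an incident taxonomy (child -> broader concept)
BROADER = {
    "pirinu:fireIncident": "pirinu:incidentRootNuclearSafety",
    "pirinu:severeAccident": "pirinu:incidentRootNuclearSafety",
    "pirinu:startingFireIncident": "pirinu:fireIncident",
    "pirinu:spreadingFireIncident": "pirinu:fireIncident",
}
print(neighborhood("pirinu:startingFireIncident", BROADER))
# {'generalization': ['pirinu:fireIncident'], 'specialization': [], 'siblings': ['pirinu:spreadingFireIncident']}
```
        </preformat>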
        <p>Supported by regulatory semantic prior knowledge, the “active” component of the
annotation system can also exploit the user's annotations efficiently. For instance, if the user
has annotated a relation but not its entities, the recommendation of fitting entities becomes
possible due to domain knowledge. Take the fire safety example “... detecting
and extinguishing quickly those fires which do start ...”. We already know that a fire extinguisher
is a measure against fires; this can be exploited by textual similarity assessment to identify that
the verb extinguishing itself is also a measure. Additionally, extinguishing is connected with
detecting via the conjunction and, which allows the inference that detecting is also a measure
and should be annotated as one. The adverb quickly can be added to create the more specific
entities detecting quickly and extinguishing quickly. These would be stored in the knowledge
graph in nominalized form as the measures quick detection and quick extinguishment, as well
as the more general entities detection and extinguishment.</p>
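        <p>The coordination and adverb inference from this example can be sketched as follows, assuming that part-of-speech tags are already available (e.g. from spaCy) and that the similarity step has already marked extinguishing as a measure. The function infer_measures is hypothetical, and the nominalization into quick detection and quick extinguishment is left out.</p>
        <preformat>
```python
def infer_measures(tokens, known):
    """tokens: (word, pos) pairs with part-of-speech tags assumed to come from a tagger."""
    words = [w.lower() for w, _ in tokens]
    pos = [p for _, p in tokens]
    general, specific = set(), set()
    for i in range(1, len(words) - 1):
        left, right = i - 1, i + 1
        if words[i] != "and" or not (pos[left] == pos[right] == "VERB"):
            continue
        # coordination: a verb joined by "and" to a known measure is proposed as a measure too
        if words[left] in known or words[right] in known:
            general.update((words[left], words[right]))
            # an adverb right after the coordination modifies both conjuncts
            if len(words) > right + 1 and pos[right + 1] == "ADV":
                adverb = words[right + 1]
                specific.update((f"{words[left]} {adverb}", f"{words[right]} {adverb}"))
    return general, specific


tokens = [("detecting", "VERB"), ("and", "CCONJ"), ("extinguishing", "VERB"),
          ("quickly", "ADV"), ("those", "DET"), ("fires", "NOUN")]
general, specific = infer_measures(tokens, known={"extinguishing"})
print(sorted(general), sorted(specific))
# ['detecting', 'extinguishing'] ['detecting quickly', 'extinguishing quickly']
```
        </preformat>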
        <p>
          One additional aspect of having domain knowledge available is that specific similarity
measures can be created. This is done either on the basis of a language model like BERT or with a
feature-based similarity measure [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. In the case of the language model approach, a
pre-trained model is used. A model can also be trained on the specific textual data if sufficient
data is available. These similarity measures are then used to identify unknown textual passages that
are similar to known textual passages that have already been classified. In the following section,
we describe an implementation of all components and report a case study in the domain of
nuclear safety.
        </p>
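        <p>A minimal sketch of both kinds of similarity measures: cosine similarity on embedding vectors, as used with language model representations, and Jaccard overlap of word sets as a very simple feature-based measure. The threshold of 0.3 and the helper most_similar are assumptions for illustration.</p>
        <preformat>
```python
import math


def cosine(u, v):
    """Cosine similarity of two embedding vectors (e.g. from a BERT-style encoder)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Feature-based similarity: overlap of the lower-cased word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa.union(sb)
    return len(sa.intersection(sb)) / len(union) if union else 0.0


def most_similar(passage, classified, sim=jaccard, threshold=0.3):
    """Label of the most similar already classified passage, or None if nothing is similar enough."""
    best = max(classified, key=lambda item: sim(passage, item[0]), default=None)
    if best is None or threshold > sim(passage, best[0]):
        return None
    return best[1]


classified = [("manual fire fighting", "measure"), ("fire spreading", "incident")]
print(most_similar("automatic fire fighting", classified))  # measure
```
        </preformat>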
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Case Study - Active Learning in the Domain of Nuclear Safety Regulations</title>
    </sec>
    <sec id="sec-4">
      <p>
        Worldwide, several governmental and non-governmental authorities provide regulatory
compliance documents to assure nuclear safety. Their scope ranges from recommendations for
the safe operation of nuclear power plants to the safe execution of x-ray radiography. These
documents are a valuable source of knowledge. To access this knowledge in a systematic
manner, metadata needs to be available within the documents. This metadata has to be
created by manual or automatic textual annotation. Information about how the regulatory
metadata is structured is called meta-metadata. In parallel with the annotation process,
this knowledge about the structure of the metadata is improved. To evaluate and improve the
approach of active annotation support explained above, a case study in the domain of nuclear
safety was carried out. The framework was used to annotate selected documents of a
corpus of 143 regulatory documents provided by the International Atomic Energy Agency
(IAEA) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>A task in this context is the annotation of potential incidents together with the corresponding safety
measures recommended in a document. For instance, the phrase “manual fire fighting” is
annotated as a measure to react to a discovered “fire”. Subsequently, it is desirable to provide this
information in the next annotation step. This can be done either by annotating all occurrences
of the phrase “manual fire fighting” or “on the fly” by offering the available information to
the domain expert as a likely annotation to choose from. In this manner, the knowledge is
continuously improved with every step of manual annotation. The overall view of the
tools and techniques used to implement the whole system is depicted in Figure 3.</p>
      <p>[Figure 3: Active Annotation Support System; components: Annotated Document Corpus, UIMA Document Container (3), NLP Container with spaCy (4), exchange via XMI]</p>
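      <p>Annotating all further occurrences of a manually annotated phrase can be sketched as below. The function propagate and its status field are hypothetical; in the described workflow, such suggestions would still be shown to the domain expert for confirmation.</p>
      <preformat>
```python
import re


def propagate(text, phrase, feature, annotated_spans):
    """Suggest the feature for every further occurrence of a manually annotated phrase."""
    taken = set(annotated_spans)  # (begin, end) spans that already carry an annotation
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    return [{"begin": m.start(), "end": m.end(), "feature": feature, "status": "suggested"}
            for m in pattern.finditer(text) if (m.start(), m.end()) not in taken]


text = "Manual fire fighting is needed. Plans for manual fire fighting exist."
for s in propagate(text, "manual fire fighting", "measure", [(0, 20)]):
    print(s["begin"], s["end"], text[s["begin"]:s["end"]])  # 42 62 manual fire fighting
```
      </preformat>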
      <sec id="sec-4-1">
        <title>3.1. The UIMA CAS Object and its Serialized Representation</title>
        <p>
          Apache UIMA (Unstructured Information Management Architecture) is a framework for the
management of unstructured information with the goal of structuring it into a processing
pipeline of annotation steps [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. In this setup, UIMA is used as the document container in the
backend, depicted as component (3). UIMA provides functionality to hold the textual
content of the document and the information about entities and their relations, and it is capable of
transferring this information into a serialized data format. For each document, a so-called CAS
(Common Analysis Structure) object is created with access to the type system schema necessary
for serialization and for communication with the frontend (5).
        </p>
        <p>
          The CAS object maintains the correctness of the indices and, accordingly, of the
serialized format. The XMI standard can be used for the serialized exchange of data objects on
the basis of meta-metadata models [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. The content of an exemplary XMI file is shown in Listing 1.
The file holds the document content with its MIME type in the SOFA (Subject of Analysis) string,
as well as two entities with the attributes measureRoot and incidentRoot
and the relation between them with the label isReactiveMeasureTo.</p>
        <p>Listing 1: A simplified XMI definition for a document on fire safety in nuclear power plants
showing namespaces, entities, and relations.
        </p>
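        <p>The following sketch builds a simplified XMI document comparable to the one described for Listing 1, using Python's xml.etree.ElementTree. The xmi and cas namespaces and the Sofa and View elements follow the layout of UIMA's XMI serialization as far as we know it; the piri type namespace, the Entity and Relation type names, and their attributes are assumptions. In practice, a CAS library would take care of the serialization; the offset check mirrors the index correctness the CAS object maintains.</p>
        <preformat>
```python
import xml.etree.ElementTree as ET

XMI = "http://www.omg.org/XMI"
CAS = "http:///uima/cas.ecore"
PIRI = "http:///de/piri/types.ecore"  # hypothetical namespace for the custom types


def q(ns, name):
    return f"{{{ns}}}{name}"  # ElementTree's {namespace}name notation


def to_xmi(text, entities, relations):
    """entities: (xmi_id, begin, end, feature); relations: (xmi_id, governor_id, dependent_id, label)."""
    for prefix, uri in (("xmi", XMI), ("cas", CAS), ("piri", PIRI)):
        ET.register_namespace(prefix, uri)
    root = ET.Element(q(XMI, "XMI"), {q(XMI, "version"): "2.0"})
    ET.SubElement(root, q(CAS, "NULL"), {q(XMI, "id"): "0"})
    ET.SubElement(root, q(CAS, "Sofa"), {q(XMI, "id"): "1", "sofaNum": "1",
                                         "sofaID": "_InitialView", "mimeType": "text",
                                         "sofaString": text})
    members = []
    for xid, begin, end, feature in entities:
        # reject offsets that do not point into the document text
        if begin > end or 0 > begin or end > len(text):
            raise ValueError(f"invalid offsets {begin}-{end} for entity {xid}")
        ET.SubElement(root, q(PIRI, "Entity"), {q(XMI, "id"): xid, "sofa": "1",
                                                "begin": str(begin), "end": str(end),
                                                "feature": feature})
        members.append(xid)
    for xid, governor, dependent, label in relations:
        ET.SubElement(root, q(PIRI, "Relation"), {q(XMI, "id"): xid, "sofa": "1",
                                                  "Governor": governor, "Dependent": dependent,
                                                  "label": label})
        members.append(xid)
    ET.SubElement(root, q(CAS, "View"), {"sofa": "1", "members": " ".join(members)})
    return ET.tostring(root, encoding="unicode")


text = "Detecting and extinguishing quickly those fires which do start"
print(to_xmi(text, [("2", 14, 27, "measureRoot"), ("3", 42, 62, "incidentRoot")],
             [("4", "2", "3", "isReactiveMeasureTo")]))
```
        </preformat>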
        <p>
          The attribute names are extracted from the regulatory knowledge graph (2) and are specific
to nuclear safety. Here, it is important to maintain consistency across the whole workflow in order to
match UIMA, NLP, and knowledge graph entities. This task would exceed the capabilities
of the document and NLP container functionality, but it can be conveniently fulfilled with the
help of a knowledge organization system like SKOS. To manually maintain the ontology
and the associated knowledge graphs, an editor helps to assure correctness and supports
manual ontology population. The tool (1) we chose for this task is KnowWE (Knowledge Wiki
Environment) [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Natural Language Processing Components</title>
        <p>
          A Python library (pyCAS) [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] allows integrating the document container built on UIMA
with the NLP container (4). The main part of the NLP functionality is handled by the
Python-based NLP framework spaCy [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. The framework provides tokenization,
part-of-speech tagging, lemmatization, and rule matching on the basis of pre-trained language models available
in different languages. For similarity assessment of entities and textual passages, the spaCy
library is enriched by a pre-trained BERT model [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Similarity assessment is a core task of the
workflow: entity labels retrieved from the knowledge graph need to be matched against the
natural document text. In this process, basic text processing steps regarding, for instance,
spelling, word stems, and synonyms [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] are applied. Further characteristics of language, such as the
part-of-speech classification, are exploited to discover sentences with potential relations. Figure 4 sketches the
surface of the annotation tool. Existing annotations are highlighted in gray,
pre-annotations in light gray. Component (1) shows the type system proposing a
selection of available entities fitting the textual passage “fires which do start” (3) in hierarchical
order derived from the knowledge graph (2). Component (4) shows how relations between entities are
displayed: it links two entities with the relation type isReactiveMeasureTo.
        </p>
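        <p>The matching of knowledge graph labels against text with regard to spelling, word stems, and synonyms can be approximated as follows. The sketch deliberately uses naive suffix stripping and a hypothetical synonym table; with spaCy, the lemma of each token (Token.lemma_) would be the more robust choice.</p>
        <preformat>
```python
import re

# hypothetical synonym table; such variants would normally come from the knowledge graph
SYNONYMS = {"blaze": "fire", "conflagration": "fire"}
SUFFIXES = ("ing", "s")


def normalize(word):
    """Naive normalization: lower-case, map synonyms, strip one common suffix."""
    w = SYNONYMS.get(word.lower(), word.lower())
    for suffix in SUFFIXES:
        # keep at least three characters and do not cut words such as "loss"
        if w.endswith(suffix) and not w.endswith("ss") and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def labels_match(label, phrase):
    """True if an ontology label and a text phrase agree after normalization."""
    return ([normalize(t) for t in re.findall(r"\w+", label)]
            == [normalize(t) for t in re.findall(r"\w+", phrase)])


print(labels_match("starting fire", "Starting fires"))  # True
print(labels_match("fire", "blaze"))                    # True
```
        </preformat>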
      </sec>
      <sec id="sec-4-3">
        <title>3.3. Querying Regulatory Knowledge</title>
        <p>
          A benefit of the automatic and manual annotation efforts is that the annotated documents can
be accessed via queries, allowing, e.g., for semantic search on them. SPARQL is a query language
to retrieve and manipulate information contained in RDF knowledge graphs. Relevant features
are extracted with a query and then stored in an appropriate data structure provided by the
programming language (array, list, object, matrix, etc.). When the NLP processing is done, the
results are transferred into the needed representation (CAS, XMI, RDF).
        </p>
        <p>[Figure 4: Sketch of the annotation tool (frontend and backend) with the type system (1), the knowledge graph (2), the annotated passage “fires which do start” (3) with the feature incidentRoot, and a relation isReactiveMeasureTo (4), on an excerpt stating the three principal fire safety objectives: (1) Preventing fires from starting; (2) Detecting and extinguishing quickly those fires which do start, thus limiting the damage; and (3) Preventing the spread of those fires which have not been extinguished, thus minimizing their effects on essential plant functions.]</p>
        <p>Figure 5 illustrates an excerpt of a taxonomy. It orders incidents that are relevant in the domain of nuclear safety,
from the most general root incident pirinu:incidentRootNuclearSafety to more specific incidents,
connected via the property piri:broader. The property signifies that its object is the more
general entity.</p>
        <p>[Figure 5: Excerpt of the nuclear safety incident taxonomy, linked by broader: pirinu:incidentRootNuclearSafety; pirinu:beyondDesignBasisAccident, pirinu:severeAccident, pirinu:omissionIncident, pirinu:organizationalFailings, pirinu:fireIncident; pirinu:aircraftCrashIncident, pirinu:startingFireIncident, pirinu:spreadingFireIncident]</p>
        <p>The taxonomy shown in Figure 5 can be accessed via the SPARQL statement depicted in
Listing 2. The core ontology definitions are aggregated under the namespace piri. The knowledge
graph for the domain of nuclear safety is separated with the proprietary namespace pirinu,
which stands for “piri nuclear”. The plus sign following piri:broader denotes a one-or-more property path:
all entities that are transitively related to pirinu:incidentRootNuclearSafety via piri:broader are retrieved.</p>
        <preformat>SELECT ?x ?yLabel ?z
WHERE {
  ?x ?y ?z .
  ?x piri:broader+ pirinu:incidentRootNuclearSafety .
  FILTER (?y = piri:broader)
  FILTER (?z != ?x)
  BIND (SUBSTR(STR(?y), 33) AS ?yLabel)
}</preformat>
        <p>Listing 2: A SPARQL statement that selects all entities transitively related to
pirinu:incidentRootNuclearSafety by the property piri:broader.</p>
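        <p>To make the semantics of the property path piri:broader+ concrete, the following sketch evaluates the same transitive closure over a Python dictionary. The taxonomy excerpt is an assumption: Figure 5 lists the concepts, but the exact edges used here are chosen for illustration only.</p>
        <preformat>
```python
from collections import deque


def transitive_narrower(root, broader):
    """Concepts x with x piri:broader+ root; broader maps a concept to its set of broader concepts."""
    narrower = {}
    for child, parents in broader.items():
        for parent in parents:
            narrower.setdefault(parent, set()).add(child)
    found, queue = set(), deque([root])
    while queue:
        for child in narrower.get(queue.popleft(), ()):
            if child not in found:  # visiting each concept once also stops on cycles
                found.add(child)
                queue.append(child)
    return found


# assumed excerpt of the incident taxonomy (edges chosen for illustration)
BROADER = {
    "pirinu:fireIncident": {"pirinu:incidentRootNuclearSafety"},
    "pirinu:severeAccident": {"pirinu:incidentRootNuclearSafety"},
    "pirinu:startingFireIncident": {"pirinu:fireIncident"},
    "pirinu:spreadingFireIncident": {"pirinu:fireIncident"},
}
print(sorted(transitive_narrower("pirinu:incidentRootNuclearSafety", BROADER)))
# ['pirinu:fireIncident', 'pirinu:severeAccident', 'pirinu:spreadingFireIncident', 'pirinu:startingFireIncident']
```
        </preformat>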
        <p>Usually, a large number of annotations is created by an automatic annotation process. To
manually browse these annotations and assess their quality, additional user support is necessary. The present
corpus of nuclear safety documents consists of more than 10,000 pages of natural text. When fully annotated,
this quickly results in millions of annotations, which exceeds human perception capabilities.
Hence, the annotated knowledge needs to be aggregated and evaluated in a way that complies with
user requirements. Therefore, KnowWE is used to present the annotated data with
a variety of options. Automatic annotations can be structured into a tabular format, which
can be adapted to the current context. Graphical data visualization is beneficial for
displaying relations between annotated entities, as depicted in Figure 5.</p>
        <p>
          Listing 3 shows an example of how to extract all annotations that are relevant to a specific
user problem. The user wants to know about all entities and their relations that were manually
annotated in a specific document on fire safety. The concept scheme piri:IAEAGS unites all
annotations that were done manually and approved by human review as gold standard
annotations [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. The query retrieves all information units with their phrases, annotated features,
text offsets, IDs, and, if available, relations that have the current entity as an argument. The
query is written and executed in the KnowWE environment on a wiki page. The result is
presented in tabular form with additional filtering functionality. Users
can comfortably browse manual and automatic annotations and edit specific entities in the
annotation environment.
        </p>
        <preformat>SELECT DISTINCT ?iu ?id ?offsetfrom ?offsetto ?phrase ?annotation
       (GROUP_CONCAT(?relatedPhrase; SEPARATOR=",") AS ?relations)
WHERE {
  ?iu rdf:type piri:InformationUnit .
  ?iu piri:broader pirinu:regulatoryDocumentNuclearSafetyPub1091-web .
  ?iu piri:hasPhrase ?phrase .
  ?iu piri:hasOffsetFrom ?offsetfrom .
  ?iu piri:hasOffsetTo ?offsetto .
  ?iu piri:hasID ?id .
  ?iu piri:hasAnnotation ?annotation .
  ?iu piri:inScheme pirinu:annotationSchemeGoldAnnotationIAEA .
  OPTIONAL {
    ?iu piri:hasMeasure ?relatedMeasure .
    ?relatedMeasure piri:hasPhrase ?relatedPhrase .
  }
}
GROUP BY ?iu ?id ?phrase ?offsetfrom ?offsetto ?annotation
ORDER BY ?id</preformat>
        <p>Listing 3: A SPARQL statement that selects all entities and their relations of the nuclear gold
annotation scheme and presents them in an aggregated way.</p>
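        <p>The aggregation performed by GROUP BY and GROUP_CONCAT in Listing 3 can be mimicked on flat result rows, e.g. before displaying them as a table. This is a sketch with hypothetical rows; note that SPARQL does not guarantee the order of the concatenated values.</p>
        <preformat>
```python
def group_concat(rows, keys, value, alias, separator=","):
    """Emulate GROUP BY keys with GROUP_CONCAT(value; SEPARATOR=separator) AS alias."""
    groups = {}
    for row in rows:
        bucket = groups.setdefault(tuple(row[k] for k in keys), [])
        if row.get(value) is not None:  # OPTIONAL may leave the variable unbound
            bucket.append(row[value])
    return [dict(zip(keys, key), **{alias: separator.join(values)})
            for key, values in groups.items()]


rows = [
    {"iu": "iu1", "id": "1", "phrase": "fire", "relatedPhrase": "manual fire fighting"},
    {"iu": "iu1", "id": "1", "phrase": "fire", "relatedPhrase": "fire extinguisher"},
    {"iu": "iu2", "id": "2", "phrase": "manual fire fighting", "relatedPhrase": None},
]
result = group_concat(rows, ["iu", "id", "phrase"], "relatedPhrase", "relations")
for r in sorted(result, key=lambda row: row["id"]):  # ORDER BY ?id
    print(r)
```
        </preformat>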
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Conclusions</title>
      <p>This work presented a concept for actively supporting users in the task of manual document
annotation, with the aim of improving the performance and consistency of the annotation work. The necessary
components and their interaction in a cyclic process were described. A focus was set on the
modularity of the approach, which allows individual modules to be exchanged to adapt the
approach to other domains and new technologies. A case study on nuclear safety
documents showed the practical application of the architecture, and the functionality of the specific tools in use was
presented. Using the system in a live working setup on real-world
documents served as an evaluation that revealed shortcomings, whose refinement led to the present
level of quality.</p>
      <sec id="sec-5-1">
        <title>4.1. Related Work</title>
        <p>
          The XMI standard is used in a variety of scenarios. Using it for the modularization of an active
annotation support system and for the integration of a domain ontology of regulatory knowledge is
a new approach. Software design shows parallels to the model-based annotation of documents,
as reusing existing components saves resources in the same way that programming code can
be reused and adapted to similar needs. This aspect was elaborated using the
XMI standard by Di Felice et al. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. Interesting work on how to consistently explore XMI
files for the generation of test cases was presented by Achimugu et al. [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. Bučko et al. [16]
presented work towards automating model-driven system architecture to support system
architects by an ontology model encoding manual transformation guidelines. Nasiri
et al. [17] take an approach similar to the present one on user stories for software modeling and
extract class diagrams into an XMI file with natural language processing steps. Wardhana et
al. [18] use the XMI standard as a bridge to automatically transform a Systems Modeling Language diagram
into an ontology, replacing the costly and error-prone manual transformation by
system engineers. Work on transforming business process models that encode domain knowledge
into a UML model via decision learning support and the XMI standard, in order to assist software
engineering, was presented by Mythily et al. [19]. The authors have also published previous work
that gives more insight into the domain of nuclear safety and the corresponding knowledge
management [20]. Additional information, together with the components of the architecture,
can be found on the PIRI website [21] and on GitHub [22].
        </p>
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Future Work</title>
        <p>Currently, the integration of knowledge graphs into the proprietary UIMA type system lacks
automation and standardization. Such automation would make the overall architecture more seamless
and also improve consistency. Furthermore, it would reduce the effort needed to, for instance,
integrate any SKOS-based knowledge graph. Besides that, an extension of the CAS object to
hold partial knowledge extracted by a SPARQL query in a standardized way would facilitate
diverse processing steps.</p>
        <p>[16] B. Bučko, K. Zábovská, M. Zábovský, Ontology as a modeling tool within model driven
architecture abstraction, in: 2019 42nd International Convention on Information and
Communication Technology, Electronics and Microelectronics (MIPRO), 2019, pp. 1525–1530.</p>
        <p>[17] S. Nasiri, Y. Rhazali, M. Lahmer, A. Adadi, From user stories to UML diagrams driven by
ontological and production model, International Journal of Advanced Computer Science
and Applications 12 (2021).</p>
        <p>[18] H. Wardhana, A. Ashari, A. Sari, Transformation of SysML requirement diagram into
OWL ontologies, International Journal of Advanced Computer Science and Applications
11 (2020).</p>
        <p>[19] M. Mythily, S. Saha, S. Selvam, I. T. J. Swamidason, BPM supported model generation by
contemplating key elements of information security, Automated Software Engineering 29
(2022).</p>
        <p>[20] A. Korger, J. Baumeister, Case-based generation of regulatory documents and their
semantic relatedness, in: K. Arai, S. Kapoor, R. Bhatia (Eds.), Future of Information and
Communication Conference San Francisco, volume 1130 of Advances in Information and
Communication, Springer, 2020, pp. 91–110.</p>
        <p>[21] A. Korger, J. Baumeister, Piri ontology, 2022. URL: https://www.piri-safety.com.</p>
        <p>[22] A. Korger, J. Baumeister, 2022. URL: https://github.com/regdoc/piri.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Guha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>McCool</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>Semantic search</article-title>
          , in: Twelfth International World Wide Web Conference (WWW
          <year>2003</year>
          ),
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Martin</surname>
          </string-name>
          , Speech and
          <string-name>
            <given-names>Language</given-names>
            <surname>Processing</surname>
          </string-name>
          :
          <article-title>An Introduction to Natural Language Processing</article-title>
          , Computational Linguistics, and Speech Recognition, 1st ed.,
          <string-name>
            <surname>Prentice Hall</surname>
            <given-names>PTR</given-names>
          </string-name>
          , USA,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <article-title>[3] W3C, SKOS Simple Knowledge Organization System Reference</article-title>
          : http://www.w3.org/TR/ skos-reference,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <issue>W3C</issue>
          ,
          <string-name>
            <surname>PROV-O: The</surname>
            <given-names>PROV</given-names>
          </string-name>
          Ontology: http://www.w3.org/TR/prov-o,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bergmann</surname>
          </string-name>
          , Experience Management, Springer, Berlin, Heidelberg,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          , BERT:
          <article-title>Pre-training of Deep Bidirectional Transformers for Language Understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics</article-title>
          , Stroudsburg, PA, USA,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . arXiv:
          <year>1810</year>
          .04805v1.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>International</given-names>
            <surname>Atomic Energy Agency</surname>
          </string-name>
          ,
          <year>2022</year>
          . URL: https://www.iaea.org.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ferrucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lally</surname>
          </string-name>
          ,
          <article-title>Accelerating corporate research in the development, application and deployment of human language technologies</article-title>
          ,
          <source>in: SEALTS '03: Proceedings of the HLTNAACL 2003 workshop on Software engineering and architecture of language technology systems, Association for Computational Linguistics</source>
          , Morristown, NJ, USA,
          <year>2003</year>
          , pp.
          <fpage>67</fpage>
          -
          <lpage>74</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Weiss</surname>
          </string-name>
          , XML Metadata Interchange,
          <string-name>
            <surname>Springer</surname>
            <given-names>US</given-names>
          </string-name>
          , Boston, MA,
          <year>2009</year>
          , pp.
          <fpage>3597</fpage>
          -
          <lpage>3597</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Baumeister</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Reutelshoefer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Puppe</surname>
          </string-name>
          ,
          <article-title>KnowWE: A semantic wiki for knowledge engineering</article-title>
          ,
          <source>Applied Intelligence</source>
          <volume>35</volume>
          (
          <year>2011</year>
          )
          <fpage>323</fpage>
          -
          <lpage>344</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Zehe</surname>
          </string-name>
          ,
          <article-title>Python implementation of the apache UIMA CAS data structure (</article-title>
          <year>2020</year>
          ). URL: https://gitlab2.informatik.uni-wuerzburg.de/alz20ij/PyUIMA.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Honnibal</surname>
          </string-name>
          , I. Montani,
          <string-name>
            <given-names>S. Van</given-names>
            <surname>Landeghem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Boyd</surname>
          </string-name>
          , spaCy: Industrial-strength
          <source>Natural Language Processing in Python, Zenodo</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>L.</given-names>
            <surname>Wißler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Almashraee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Monett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Paschke</surname>
          </string-name>
          ,
          <article-title>The gold standard in corpus annotation</article-title>
          ,
          <source>in: IEEE GSC Passau</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>P.</given-names>
            <surname>Di Felice</surname>
          </string-name>
          , G. Paolone,
          <string-name>
            <given-names>R.</given-names>
            <surname>Paesani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Marinelli</surname>
          </string-name>
          ,
          <article-title>Design and Implementation of a Metadata Repository about UML Class Diagrams. A Software Tool Supporting the Automatic Feeding of the Repository</article-title>
          ,
          <source>Electronics</source>
          <volume>11</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>O.</given-names>
            <surname>Achimugu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Achimugu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Nwufoh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Husssein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kolapo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Olufemi</surname>
          </string-name>
          ,
          <article-title>An improved approach for generating test cases during model-based testing using tree traversal algorithm</article-title>
          ,
          <source>Journal of Software Engineering and Applications</source>
          <volume>14</volume>
          (
          <year>2021</year>
          )
          <fpage>257</fpage>
          -
          <lpage>265</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>