<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Graph-based Discovery of Ontology Change Patterns</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Muhammad Javed</string-name>
          <email>mjaved1@computing.dcu.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yalemisew M. Abgaz</string-name>
          <email>yabgaz2@computing.dcu.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claus Pahl</string-name>
          <email>cpahl3@computing.dcu.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centre for Next Generation Localisation (CNGL), School of Computing, Dublin City University</institution>
          ,
          <addr-line>Dublin 9</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Ontologies can support a variety of purposes, ranging from capturing conceptual knowledge to the organisation of digital content and information. However, information systems are always subject to change and ontology change management can pose challenges. We investigate ontology change representation and discovery of change patterns. Ontology changes are formalised as graph-based change logs. We use attributed graphs, which are typed over a generic graph with node and edge attribution. We analyse ontology change logs, represented as graphs, and identify frequent change sequences. Such sequences are applied as a reference in order to discover reusable, often domain-specific and usagedriven change patterns. We describe the pattern discovery algorithms and measure their performance using experimental results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Ontologies can support tasks ranging from capturing the conceptual knowledge
to the organisation of digital content and other information artefacts. However,
information systems are always subject to change and ontology change
management can pose challenges. Ontologies become essential for knowledge sharing
activities in dynamic information systems, in areas such as bioinformatics,
educational technology systems, e-learning, indexing and retrieval.</p>
      <p>
        The dynamic nature of knowledge requires ontologies to change over time.
The reason for change in associated content can be change in the domain, the
specification, or the conceptualization [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. A change in an ontology may
originate from a domain knowledge expert, a user of the ontology or a change in
the application area [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. While some generic ontologies (like upper ontologies)
evolve at a slower pace, we have been working with non-public ontologies used
to annotate content in large-scale information systems, for instance an
ontologyannotated help system for a large content management architecture. In this
context, changes happen on a daily basis, triggered by changes in software, its
technical or domain environment. Systematic ontology change becomes here a
necessity to enable controlled, accountable and predictable evolution.
      </p>
      <p>
        The operationalisation of changes is a vital part of ontology evolution.
Elementary, composite and complex change operators and detection of higher level
changes have been suggested in past [
        <xref ref-type="bibr" rid="ref3 ref4 ref5 ref6">3–6</xref>
        ]. For semantically enhanced
information systems, a coherent representation of ontology change is essential. We
present a graph-based approach for ontology change representation. We can
identify recurring change sequences from an ontology change log that captures
changes at an operational level. Graphs are used to formalise change. As we are
interested in changes which are applied on a particular ontology entity in
different sessions, we use a gap-constrained pattern discovery approach. Some central
features of our approach are:
– Fine-granular ontology change representation (RDF triples) helps sharing
semantics of the data and representing intent and scope of change explicitly.
– An operational representation of the ontology changes using a graph-based
formalisation. Ontology change log graphs enable efficient search, analysis
and categorisation of ontology changes.
– Discovery of recurring change pattern from an ontology change log, which
provides an opportunity to define reusable domain-specific change patterns
that can be implemented in existing knowledge management systems.
      </p>
      <p>The paper is structured as follows. We discuss ontology change
representation in Section 2. Section 3 introduces pattern discovery concepts. A detailed
description of change patterns discovery algorithms is given in Section 4.
Experimental results and an illustration of the technique are given in Section 5.
Related work is discussed in Section 6 and we end with some conclusions.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Ontology Change Representation</title>
      <p>We studied the evolution of ontologies empirically in order to determine a
finegranular ontology change representation and to identify reusable, usually
domainspecific change patterns which cannot be identified by simply exploring or
querying changes. As case studies, the domains university administration and software
application were looked at. The former represents an organisation involving
people, organisational units and processes. The latter, based on our work with an
industrial partner, takes a content-centric perspective on a software system. The
application system is a content management and archiving system there.</p>
      <p>To record of applied changes, we use ontology change logs. They provide
operational as well as analytical support in the evolution process. Change logs
are also used to keep the evolution process transparent and centrally manageable.
It captures all the changes applied to entities of the ontology.</p>
      <p>
        A graph-based formalisation is an operational representation for the ontology
changes. Graphs enable efficient search and analysis and can also communicate
information visually. We followed the idea of attributed graphs. Graphs with
node and edge attribution are typed over an attribute type graph (ATG) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
Attributed graphs ensure that all edges and nodes of a graph are typed over the
ATG and each node is either a source or target, connected by an edge (Figure
DateTime
      </p>
      <p>String</p>
      <p>User
timeStamp hasID hasCreator</p>
      <p>Integer
order
Operation hasOperation</p>
      <p>Atomic</p>
      <p>Composite
Domain-Specific</p>
      <p>Change hasParameter</p>
      <p>No. of Parameters
hasElement Integer
Element</p>
      <p>Parameters
Entity</p>
      <p>Integer</p>
      <p>String</p>
      <p>Concept Object Prop. Data Prop. Individual
hasAxiom hasRestriction hasEntity
Axiom Restriction</p>
      <p>Entity</p>
      <p>Cardinality
Individual
Axiom</p>
      <p>Data Prop.</p>
      <p>Axiom</p>
      <p>Object Prop. Class Axiom</p>
      <p>Axiom</p>
      <p>Graph Node
Attribute Node - Metadata
Attribute Node - Change data
Graph Edge
Node Attribute Edge
Edge Attribute Edge</p>
      <p>An attributed graph (AG) typed over an ATG is given in Figure 2. The types
defined on the nodes can be given as t(Add) = Operation, t(classAssertion)
= ClassAxiom, t(J ohn) = Individual and t(P hD Student) = Concept. The
attributed graph actually represents a single ontology change operation where
graph node n(g) is the central part of AG. Such graph nodes are linked to each
other using graph edges e(g) to represent a complete ontology change log graph.
3</p>
    </sec>
    <sec id="sec-3">
      <title>Pattern Discovery in Ontology Change Logs</title>
      <p>
        Good patterns always arise from practical experience [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Change patterns,
created in a collaborative environment, provide guidelines to ontology change
man
      </p>
      <p>Tue: 24/03/2011:
12:03:24
129973...</p>
      <p>
        MJaved
timeStamp
agement and has important implications for ontology-development tools [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
Identifying recurring changes define reusable, often domain-specific change
patterns. The aim is the discovery of usage-driven change patterns in order to
support the pattern-based ontology evolution.
      </p>
      <p>
        In order to conceptualise the changes, we constructed a static metadata
ontology by looking into concrete structure of OWL-DL syntax-based domain
ontologies. The metadata ontology represents different categories of ontology changes
based on our layered change operator framework [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], types of ontology elements
(such as concept, axioms, restriction etc.) and other concepts such as change,
users, timestamp etc. Each instance of the change log is of type Change, available
in the metadata ontology. We used an RDF triple-store to record the change log,
domain ontologies and static metadata ontology. Our ontology editing framework
(OnE) offers a graph API, which is used for generating graphs from change log,
based on SPARQL queries. We captured more than 500 ontology changes from
the university ontology change log (&gt;5000 log triples).
      </p>
      <p>The change log graph allows us to identify and classify frequent changes
that occur in ontologies over a period of time. Initially, we analyzed the change
log graph manually and observed that combinations of change operations occur
repeatedly during the evolution of ontologies. We identified these as frequent
recurring change patterns that can be reused.</p>
      <p>While patterns are sometimes used in their exact form, often more flexibility
is needed. Users often use different orderings of change operations to perform
the same (semantically equivalent) change at different times. To capture
semantically equivalent, but operationally different patterns, we introduce a metric
that captures a node gap between two adjacent graph nodes in a sequence called
sequence gap or n-distance. It refers to the distance between two adjacent graph
nodes in a change log graph. This will help us to more flexible pattern notions.
We merge different types of patterns into two basic subdivisions, i.e. ordered
change patterns (OP) and unordered change patterns (UP). The instances of
ordered change patterns comprise change operations in exact same sequential order
from change log graph. Thus, such complete (OCP) or partial (OPP) patterns
may have only a positive node distance value, starting from zero to user given
threshold (x). The instances of unordered change patterns comprise change
operation which may or may not be in exact same sequential order from change
log graph. These complete (UCP) or partial (UPP) patterns may have a node
distance that ranges from a negative node distance value (−x) to a positive node
distance value (x). Completeness means that all pattern nodes are used in the
concrete graph; partiality refers to a subset of nodes. For the remainder, we
focus on complete change patterns, but we discuss the relevance of partial change
patterns in our conclusions.</p>
      <p>We considered identifying recurring sequenced change operations from a
change log as a problem of recognition of frequent pattern in a graph. We
describe two key metrics by introducing the following definitions. First, the pattern
support of a pattern p is the number of occurrences of such pattern in the change
log graph G. Pattern support is denoted by sup(p). The minimum number of
occurrences required for a sequence s in change log graph G to qualify as a change
pattern p is the minimum pattern support, denoted by min sup(p). Second, the
pattern length of a pattern p is the number of change operations in it, denoted
by len(p). The minimum length required for a sequence s in a change log graph
G to qualify as a member of a candidate pattern set is the minimum pattern
length, denoted by min len(p).</p>
      <p>A set of candidate pattern sequences CP, to be considered as a change pattern
P, must meet the following criteria:
– The length of the each candidate pattern sequence cp must be equal to
or greater than the threshold value set by the minimum pattern length
len(cp) ≥ min len(p) : cp ∈ CP .
– The support for a candidate pattern cp in a change log graph G must
be above the threshold value of minimum pattern support of a pattern
sup(cp) ≥ min sup(p) : cp.</p>
      <p>Each candidate pattern sequence cp must also reach a consistent ontology state.
There may be internal graph nodes in cp, reaching non-consistent states,
however, at the end of the sequence the ontology must be back in a consistent state.
4</p>
    </sec>
    <sec id="sec-4">
      <title>Pattern Discovery Algorithms</title>
      <p>The discovery of change patterns is operationalised in the form of discovery
algorithms. The section is divided into two parts, i.e. algorithms for searching
ordered complete change patterns (OCP) and algorithms for searching unordered
complete change patterns (UCP). The inputs to the pattern discovery algorithms
comprise the graph G representing change log triples, the minimum pattern
support min sup, the minimum pattern length min len and the maximum
nodedistance x. Before we describe each algorithm, we introduce some concepts.
– Target entity, primary/auxiliary context of change: The target entity is the
ontology entity to which the change is applied; primary/auxiliary context
refers to entities which are directly/indirectly affected by such a change.
– Candidate node (cn): A candidate node cn is a graph node selected at the
start of the node iteration process (discussed later). Each graph node will
act as a candidate node cn in one iteration each of the algorithm.
– Candidate sequence (cs): The candidate sequence cs is the context-aware set
of graph nodes starting from particular candidate node cn.
– Discovered node (dn): The discovered node dn is a graph node that matches
the candidate node cn (in a particular iteration) in terms of its operation,
element and type of context. DN refers to the set of discovered nodes.
– Discovered sequence (ds): ds is the context-aware set of graph nodes
starting from a discovered node dn that matches candidate sequence cs (in an
iteration). DS refers to the set of discovered node sequences.</p>
      <p>OCP Discovery Algorithm. To discover ordered complete change patterns
(OCP), the identified sequences are of same length and contain change operations
in the exact same chronological order. The basic idea of the algorithm is to
iterate over the graph nodes, generate the candidate sequence starting from
particular graph node and search the similar sequences within the graph G. The
OCP algorithm is defined in algorithms 4.1 – 4.2. The algorithm iterates over
each graph node and selects it as a candidate node (cnk), where k refers to
the identification key of the node. If the candidate node cnk contains only one
parameter, the parameter is considered as target entity. If node cnk contains
two parameters, the first parameter is considered as target entity and the second
parameter is considered as the primary context of the candidate node. Once
the candidate node and its target entity are captured, an iterative process of
expansion of candidate node cnk to its adjacent nodes cnk++ starts and continues
until more extensions are not possible (i.e. adjacent nodes do not share the same
target entity). If the target entity of the adjacent node is matched with the
target entity of the candidate node, it is taken as the next node of the candidate
sequence cs. If the target entity does not match, an iterative process will start to
find the next node whose target entity matches the target entity of the candidate
node. The iteration continues based on the user threshold x, i.e. the allowed gap
between two adjacent graph nodes of a pattern (n-distance).</p>
      <p>Algorithm 4.1 Ordered Complete Change Pattern Discovery Algorithm
10: go back to step 1.
11: end if
12: while (DS.size ≥ min sup) do
13: for each discovered sequence ds in DS do
14: t ← getT argetEntity(ds)
15: Expand(dnj, x)
16: M atch(dnj++, cnk++, t)
17: if (Expanded &amp;&amp; Matched) then
18: ds ← dnj++
19: else
20: break while loop.
21: end if
22: end for
23: if (ds.size &lt; min len) then
24: discard ds from DS
25: end if
26: end while
27: max ← get Maximum length of sequences such that (max ≥ min sup)
28: for each sequence ds in DS do
29: if (ds.size &lt; max) then
30: discard ds
31: else
32: trimSequence(ds, max)
33: end if
34: end for
35: Pdomain specific ← (ds + cs)
36: S ← Pdomain specific
37: end for</p>
      <p>Once the candidate sequence is constructed and is above the threshold value
of the minimum pattern length, the next step is to search for the matching nodes
(i.e. discovered nodes dn) of the same type as candidate node cnk. If the number
of discovered nodes is above the threshold value (minimum pattern support),
the next step is to expand the discovered nodes and match them to parallel
candidate nodes. Each discovered node is expanded one after another. Similar
to the expansion of candidate nodes, the identification of the next node of a
discovered sequence ds is an iterative process (depending on x).</p>
      <sec id="sec-4-1">
        <title>Algorithm 4.2 Method:GenerateCandidateSequence()</title>
        <p>Input: Graph (G), Maximum n-distance (x), Graph Node (n)
Output: Candidate Sequence (cs)
1: k = 0
2: cnk ← n
3: cs ← cnk
4: context = true
5: while (context) do
6: Expand(cnk, x)
7: if (Exanded) then
8: cs ← cnk++
9: else
10: context = f alse
11: end if
12: end while
13: return cs</p>
        <p>The expansion of a discovered node dn stops if either more extension of that
node is not possible or the expansion has reached to the size of the candidate
sequence (i.e. the length of ds is equal to the length of cs). At the end of the
expansion of a discovered sequence, if the length of an expanded discovered
sequence is less than the threshold value of the minimum pattern length, it must
be discarded from the set of discovered sequences.</p>
        <p>Once the expansion of discovered nodes is finished, in order to identify the
change patterns of greater size, the next step is to find the maximum length of the
sequences (max) such that the value of max is greater than or equal to threshold
value of the minimum pattern length and the number of identified sequences
is greater than or equal to the threshold value of minimum pattern support.
Sequences whose length is less than the value max are discarded from the set of
discovered sequences. Those discovered sequences whose length is greater than
max are truncated to size max.</p>
        <p>As a last step, the candidate sequence along with the discovered sequences is
saved as a domain-specific change pattern in the result list S and the algorithm
goes back to step 1 and selects the next graph node as a candidate node.
UCP Discovery Algorithm. A collection of change operations is not always
executed in same chronological order, even if the result is the same. As then
the change operations in a sequence can be reordered, the aim is to discover
unordered complete change patterns by modifying the node search space in each
iteration. The pseudocode of the UCP algorithm is given in algorithms 4.3 – 4.4.</p>
        <p>Like OCP, UCP iterates over each graph node and selects it as a candidate
node (cnk). An iteration process is used to construct a candidate sequence cs by
expanding candidate node cnk to its subsequent context-matching nodes cnk++.
The next step is to identify the discovered nodes dn and adding them as the first
member of the discovered sequence set DS. There are two main differences in the
expansion of discovered sequences in UCP and OCP. Firstly, the search space in
which the mapping graph node will searched and, secondly, the introduction of
an unidentified-nodes list (ul) which records the unidentified nodes of a candidate
sequence.</p>
        <p>Algorithm 4.3 Unordered Complete Change Pattern Discovery Algorithm
Input: Graph (G), Minimum Pattern Support (min sup), Minimum Pattern Length
(min len), Maximum n-distance (x)
Output: Set of Domain-Specific Change Patterns (S)
1: for i = 0 to NG.size do
2: k = 0
3: cs ← GenerateCandidateSequence(n(gi))
4: if (cs.size &lt; min len) then
5: go back to step 1.
6: end if
7: DN ← DiscoverM atchingN odes(cnk)
8: DS ← DN
9: if (DS.size &lt; min sup) then
10: go back to step 1.
11: end if
12: while (DS.size ≥ min sup) do
13: for each discovered sequence ds in DS do
14: t ← getT argetEntity(ds)
15: setSearchSpace(ds)
16: a ← searchInSpace(ds, cnk++, t)
17: if (found) then
18: ds ← a
19: ascendSequence(ds)
20: setSearchSpace(ds)
21: if (!ul.isEmpty()) then
22: nodeFound = true
23: while (!ul.isEmpty() &amp;&amp; nodeF ound) do
24: nodeF ound ← searchU nidentif iedN odes(ul, ds)
25: ascendSequence(ds)
26: setSearchSpace(ds)
27: end while
28: end if
29: else
30: ul ← cnk++
31: end if
32: end for
33: if (ds.size &lt; min len) then
34: discard ds from DS
35: end if
36: end while
37: for each discovered sequence ds in DS do
38: if (ds.size &lt; cs.size) then
39: discard ds from DS
40: end if
41: end for
42: Pdomain specific ← (ds + cs)
43: S ← Pdomain specific
44: end for
Before the expansion process on any discovered node starts, the search space
(i.e. range of graph nodes in which node will be searched) has to be set. It is
described using two integer variables start range (rs) and end range (re), where
rs and re represent the node ids of the start and end graph nodes of search space.
The search space can be given as rs = min(id) − x − 1 and re = max(id) + x + 1.</p>
        <p>Values min(id) and max(id) are the minimum and maximum id values of
the graph nodes in the discovered sequence ds in a particular iteration. New
values of rs and re are calculated at the start of each iteration of the discovered
node expansion process. For example, given the gap constraint (x = 1) and
a discovered sequence ds that contains two graph nodes ds = {n9, n11} in a
particular iteration, then the search space (in which the next discovered node
will be searched) is n7 − n13. As the algorithm scans the whole graph only once
(i.e. in step 7 of algorithm 4.3 to get the discovered node set) and narrows the
search space later, the search space defining technique improves the performance
of the algorithm.</p>
        <p>The unidentified nodes list (ul) records all candidate nodes that are not
matched in the ds expansion process. If a new node is added to a discovered
sequence, the sequence will be converted into ascending form (based on their id
values) and the search space is reset. If there is no match and ds is not expanded,
the respective candidate node is added to ul. Once the discovered sequence ds is
expanded, an iteration process is applied on ul to search the unidentified nodes
in the updated search space. If an unidentified candidate node is found and
matched (to a discovered node) in the updated search space, the node is added
to the discovered sequence and removed from the unidentified node list. Based
on the modified discovered sequence, the values of rs and re are recalculated.</p>
        <p>At the end of the expansion of a discovered sequence, if the length of an
expanded discovered sequence is less than the minimum pattern length threshold,
it must be discarded from the set of discovered sequences. Then, all discovered
sequences whose length is less than the length of a candidate sequence are
discarded. As a last step, the candidate sequence along with discovered sequences
are saved as a change pattern in the result list S and the algorithm goes back
to step 1 and selects the next graph node as a candidate node.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Algorithm 4.4 Method:setSearchSpace()</title>
        <p>Input: Graph (G), Discovered Sequence (ds), Maximum n-distance (x)
Output: Updated search space (rs - re)
1: n1 ← getF irstN odeOf Sequence(ds)
2: n2 ← getLastN odeOf Sequence(ds)
3: rs = n1.getN odeID() − x − 1
4: if (rs ≤ 0) then
5: rs = 1
6: end if
7: re = n2.getN odeID() + x + 1
8: if (re &gt; G.size) then
9: re = G.size
10: end if
5</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experimental Results and Illustration</title>
      <p>When ontologies are large and in a continuous process of change, our pattern
discovery algorithms can automatically detect change patterns. Such patterns
are based on operations that have been used frequently. This reduces the effort
required in terms of time and consistency management.</p>
      <p>All experiments with change pattern discovery algorithms have been done on
a 3.0 GHz Intel Core 2 Duo CPU with 3.25 GB of RAM, running MS Windows
XP. We utilized our algorithms to discover the domain-specific change patterns in
ontology change log graphs. Given a fixed user input value for minimum pattern
length and minimum pattern support, we executed the algorithms, varied the
node–distance value and evaluated their results. Two examples from discovered
change pattern sequences, one from each level, i.e. ABox-based change patterns
and TBox-based change patterns, are given in Tables 1 and 2.
5.1</p>
      <p>
        Pattern Discovery Example
Elsewhere [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ], we presented pattern-based ontology change operators and
motivated the benefits of pattern-based change management where patterns are
usually domain-specific compositions of change operators. Our work here can be
utilised to determine these patterns and make them available for reuse.
– The key concern is the identification of frequent change patterns from change
logs. In the first place, these are frequent operator combinations and can
result in generic patterns. However, our observation is that many of these
are domain-specific, as the example below will illustrate.
– This can be extended to identify semantically equivalent changes in the form
of a pattern. For instance, a reordering of semantically equivalent operations
needs to be recognised by the algorithms.
      </p>
      <p>Change pattern discovery shall now be illustrated using a simple example. The
example in Table 1 is the ABox-based change pattern from the university
ontology, representing the registration procedure of a new PhD student to the
department. First, the student has been registered as a PhD student of a
particular department. Then, a student Id, email Id and a supervisor (which is a
faculty member of the university) is assigned to the student. At the end, the
student is added as a member of a particular research group of the university.
We captured such change patterns and stored them in the ontology evolution
framework for their reuse. Hence, whenever a new PhD student has to be
registered, a stored change pattern can be applied as a single transaction (ensuring
cross-ontology integrity constraints to be met).
Id Change Operations
17 Add individual &lt;TargetEntity i&gt;
18 Add classAssertion &lt;TargetEntity i&gt; &lt;Univ:PhD Student&gt;
19 Add objectPropertyAssertion &lt;TargetEntity i&gt; &lt;Univ:isStudentOf&gt; &lt;Univ:Department i&gt;
20 Add dataPropertyAssertion &lt;TargetEntity i&gt; &lt;Univ:StudentID&gt; &lt;xsd:int&gt;
21 Add dataPropertyAssertion &lt;TargetEntity i&gt; &lt;Univ:EmailID&gt; &lt;xsd:string&gt;
22 Add objectPropertyAssertion &lt;TargetEntity i&gt; &lt;Univ:hasSupervisor&gt; &lt;Univ:Faculty i&gt;
23 Add objectPropertyAssertion &lt;TargetEntity i&gt; &lt;Univ:MemberOf&gt; &lt;Univ:ResearchGroup i&gt;</p>
      <p>The example in Table 2 is a TBox-based change pattern from software
application ontology, representing the introduction of a new software activity. First,
a new concept (TargetEntity c1 ) has been added as a subclass of concept
Software:Activity. Later, to perform this activity, a new procedure has been added
as a subclass of concept Software:Procedure in the help infrastructure section of
the ontology. Finally, the activity and the procedure to perform such activity
are linked to each other using an object property Software:hasProcedure.
In this section, we examine the practical benefits of the discovered change
patterns and lessons learnt in existing real world scenario. Later, we compare the
two types of algorithms and outline the known limitations.</p>
      <p>
        Tools Support and Practical Benefits: The core benefits of discovery of
ontology change patterns are change impact determination and cost reduction
(in terms of time and consistency management). The possible applications of our
change pattern discovery algorithms range from supporting the change tracking
tools, identification of user’s behavioural dependency &amp; classification of users
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], change request recommendations, to the analysis of change patterns and
discovery of causal dependencies.
      </p>
      <p>Tool Support for Change Tracking: One of the key benefits of our change
patterns discovery approach is its integration with an existing ontology change
tracking toolkit (such as Prot´eg´e, Neon etc.). We incorporated the change
capturing and pattern discovery algorithms (as a plugin) in OnE, our Ontology
Editing framework. We executed the change pattern discovery algorithms on
the recorded ontology changes and discovered change patterns. Users can choose
a suitable change patterns from the discovered change pattern list and store
them in their user profile. Later, whenever users load that particular ontology,
they get the list of stored change patterns in their profile and can be applied in
a form of transactions.</p>
      <p>Change Request Recommendation: The identified change patterns can also
be used for change request recommendations. For example, whenever a user
adds a new PhD student in the university ontology, based on the identified PhD
Student Registration change pattern, it can be recommended to the user to add
student id, email id of the student and assign a supervisor to him (using object
property hasSupervisor ). Similarly in software application domain, whenever a
user deletes certain activity from the domain, deletion of the relevant help files
can also be recommended to the user.</p>
      <p>Discovery of Causal Dependencies: Discovered change patterns benefited us
in terms of identification of correlations and the causal dependencies (which
can be integrity constraints) across the ontology taxonomy. An example of a
captured causal dependency in university ontology is the introduction of a new
course. Whenever a new course is introduced, new subjects (and subject codes)
have to be introduced, course-related books have to be provided by the library,
or new vacancies might have to be advertised. This is a domain-specific pattern
for organisations where the introduction of a new product or service entails
for instance further personnel changes. Causal dependencies across the ontology
subsumption hierarchy can easily be overlooked. Pattern discovery can identify
successful changes based on these dependencies. These are association rules that
capture non-obvious relationships.</p>
      <p>Lesson learnt from Real-World Scenario: A number of lessons were learnt
by running our algorithms on a real-world software application ontology. We
compared different versions of the software application (a document archiving
system) and analysed how software application evolves from one version to
another. Based on pattern discovery mechanism, the application is represented in
a form of layered framework, consisting of explanatory, functional and system
implementation layers. The implementation layer consists of the key components
of the software application. The functional layer represents the management
activities of each software component. The explanatory layer contains the software
application help files. These help files provide details about software component
and the content management activities. We ran our change pattern discovery
algorithms and identified the changes at each layer of software application. At the
implementation layer, the most frequent patterns includes addition, deletion,
splitting and merging of different software components etc. At the functional
layer, addition or deletion of archiving activities and gui items are the most
common actions. The change patterns at the explanatory layer are generally linked
to the changes at implementation and functional layers. For example, whenever
pattern add new component is implemented, the relevant description files has
been added (add new description files) at explanatory layer. Similarly, whenever
a new activity has been added at functional layer, the relevant procedural files
has been added (add new procedural files) at explanatory layer.</p>
      <p>Comparison between OCP and UCP: We conducted a performance study
on our case study datasets and looked at their effectiveness for pattern
discovery. OCP is efficient in terms of time consumption due to the permissibility of
only positive node distances (x), i.e. the iteration process for the search of the
next adjacent sequence node only operates in forward direction of the change log
graph. Unordered change operations make the UCP algorithm complex as UCP
needs to i) keep record of all change operations of the sequence (even if they are
not identified), ii) recalculate the search space in each iteration, iii) search the
next sequence node not only in the search space of the graph but also in the
unidentified list of change nodes and iv) converting a sequence to ascending form
in each iteration. However, UCP is more efficient in terms of numbers of
discovered patterns. Similarly, in terms of the size of maximal patterns, UCP discovers
patterns of greater size than OCP. The detailed comparison table between two
algorithms is available on our web1.</p>
      <p>Limitations: One of our algorithms limitations is that it cannot be applied
on the change parameters which are represented as complex expressions. Our
algorithm considers all parameters as atomic classes, properties or individuals.
Secondly, our algorithms used an assumption, i.e. the target entity is always
the first parameter of any ontology change operations. This assumption does
not suit, for example in case of inverse properties, such as hasSupervisor and
isSupervisorOf. In our future work, deep comparison of ontology change
operations will be made in order to identify the target entity (context) in relation to
identified change operations of a sequence.
6</p>
    </sec>
    <sec id="sec-6">
      <title>Related Work</title>
      <p>
        The mining of sequential patterns was first proposed by Agrawal and Srikant
in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Since then, many sequential pattern mining algorithms often based on
specific domains [
        <xref ref-type="bibr" rid="ref13 ref14 ref15 ref16 ref17">13–17</xref>
        ] have been suggested.
      </p>
      <p>
        In the domain of DNA or protein sequences, BLAST [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] is one of the most
famous algorithms. Given a query sequence (candidate sequence), it searches for
a match from the databases. In contrast, we focus on mining of change sequences
(patterns) from an ontology change database. In [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], the author proposed the
MCPaS algorithm to answer the problems of mining complex patterns with gap
requirements. Similar to our approach, it allows pattern generation and growing
to be conducted step by step using gap-constrained pattern search.
      </p>
      <p>
        Several algorithms focus on graph-based pattern discovery [
        <xref ref-type="bibr" rid="ref18 ref19 ref20 ref21">18–21</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ],
the author propose an apriori-based algorithm, called AGM, to discover frequent
substructures. In [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], the authors proposed the gSpan (graph-based
Substructure pattern mining) algorithm for mining frequent closed graphs and adopted a
depth-first search strategy. In constrast to our work, their focus is on discovering
frequent graph substructures without candidate sequence generation. A
chemical compound dataset is compared with results of the FSG [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] algorithm. The
performance study shows that the gSpan outperforms FSG algorithm and is
capable of mining large frequent subgraphs. The Fast Frequent Subgraph Mining
(FFSM) algorithm [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] is an algorithm for graph-based pattern discovery. FFSM
can be applied to protein structures to derive structural patterns. Their approach
facilitates families of proteins demonstrating similar function to be analyzed for
structural similarity. Compared with gSpan [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and FSG [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] algorithms using
various support thresholds, FFSM is an order of magnitude faster. gSpan is more
suitable for small graphs (with no more than 200 edges).
1 www.computing.dcu.ie/~mjaved/phdwork.html
      </p>
      <p>
        We adopted ideas from sequential pattern discovery approaches in other
domains, such as sequence-independent structure pattern [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], gap-constraint [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]
for setting cutoffs in terms of node matching mechanism. Yet, discovery of change
patterns from ontology change logs is relatively different from sequential pattern
discovery in other contexts (such as the biomedical domain). Recently, few
researchers have focused on detection of higher level generic ontology changes [
        <xref ref-type="bibr" rid="ref5 ref6">5,
6</xref>
        ]. In contrast to their work, our approach is to discover the change patterns and
is based on context-aware, semantic matching of different graph sequences. This
requires the identification of equivalency between unordered change sequences.
7
      </p>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>Ontologies will continue to evolve. Scalability beyond manual evolution and
change support is required for both stand-alone ontologies that live over very
long periods of time and also ontologies used in conjunction with other
information systems that themselves can trigger change. Supporting this process by
automated management of change and its analysis is a solution.</p>
      <p>
        The presented work here continues our previous research [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ] by adding
operational ontology change representation and pattern discovery algorithms.
In this paper, we proposed a graph-based formalism for ontology change
representation using attributed graphs, which are typed over a generic attribute
type graph. We studied the change log graphs empirically and observed that
the users perform similar groups of change operations multiple times as a
combination of single atomic change operations. The contribution of our work are
the algorithms for the discovery of domain-specific change patterns to support
pattern-based ontology evolution. On the basis of our empirical study, we
identified two fundamental types of change patterns i.e., Ordered Change Patterns
(OP) and Unordered Change Patterns (UP).
      </p>
      <p>The experimental results signify that the solution is valid and adequate to
efficiently handle pattern-based ontology evolution. Our approach benefits in
different ways. First, the discovered patterns are suitable to provide
patternlevel ontology change support. Second, as the discovered patterns are based on
the actual frequent changes applied by the users, patterns assist in a structured
pattern-based evolution process. Third, domain-specific change patterns can be
shared among other ontologies that have similar conceptualizations.</p>
      <p>The next step would clearly be to take the discovered patterns and try to
identify them in other change logs in order to analyse the use of patterns. This
would require our discovery algorithms to be revised accordingly. We have
mentioned the partial patterns types OPP and UPP in section 3. Partial patterns
usage is more a concern for this pattern identification context (and was not dealt
with here for that reason).</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgement</title>
      <p>This research is supported by the Science Foundation Ireland (Grant 07/CE/I1142)
as part of the Centre for Next Generation Localisation at Dublin City University.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Noy</surname>
            ,
            <given-names>N.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Ontology evolution: Not the same as schema evolution</article-title>
          .
          <source>In: Journal of Knowledge and Information Systems</source>
          . Volume
          <volume>6</volume>
          (
          <issue>4</issue>
          ). (
          <year>2004</year>
          )
          <fpage>328</fpage>
          -
          <lpage>440</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Liang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alani</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shadbolt</surname>
          </string-name>
          , N.:
          <article-title>Ontology change management in prot´eg´e</article-title>
          .
          <source>In: Proceedings of AKT DTA Colloquium</source>
          , Milton Keynes, UK. (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Stojanovic</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Methods and tools for ontology evolution</article-title>
          .
          <source>PhD thesis</source>
          , University of Karlsruhe (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Palma</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haase</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corcho</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez-Perez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Change representation for owl 2 ontologies</article-title>
          .
          <source>In: Proceedings of the sixth international workshop on OWL: Experiences and Directions (OWLED)</source>
          .
          <article-title>(</article-title>
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Papavassiliou</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flouris</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fundulaki</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kotzinos</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Christophides</surname>
          </string-name>
          , V.:
          <article-title>On detecting high-level changes in RDF/S KBs</article-title>
          . In: 8th
          <source>International Semantic Web Conference</source>
          . Volume
          <volume>5823</volume>
          of LNCS., Springer (
          <year>2009</year>
          )
          <fpage>473</fpage>
          -
          <lpage>488</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Groner</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Staab</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Categorization and recognition of ontology refactoring pattern</article-title>
          .
          <source>Technical Report 09/2010</source>
          ,
          <string-name>
            <surname>Institut</surname>
            <given-names>WeST</given-names>
          </string-name>
          , Univ. Koblenz-Landau (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Ehrig</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prange</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taentzer</surname>
          </string-name>
          , G.:
          <article-title>Fundamental theory for typed attributed graph transformation</article-title>
          .
          <source>In: Intl. Conf. on Graph Transformation</source>
          . (
          <year>2004</year>
          )
          <fpage>161</fpage>
          -
          <lpage>177</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Schmidt</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fayad</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Johnson</surname>
          </string-name>
          , R.:
          <article-title>Software patterns</article-title>
          .
          <source>In: Communications of the ACM, Special Issue on Patterns &amp; Pattern Lang</source>
          . Volume
          <volume>39</volume>
          (
          <issue>10</issue>
          ). (
          <year>1996</year>
          )
          <fpage>37</fpage>
          -
          <lpage>39</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Falconer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tudorache</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Noy</surname>
            ,
            <given-names>N.F.</given-names>
          </string-name>
          :
          <article-title>An analysis of collaborative patterns in large-scale ontology development projects</article-title>
          .
          <source>In: Proceedings of the sixth international conference on Knowledge capture. K-CAP '11</source>
          (
          <year>2011</year>
          )
          <fpage>25</fpage>
          -
          <lpage>32</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Javed</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abgaz</surname>
            ,
            <given-names>Y.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pahl</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>A pattern-based framework of change operators for ontology evolution</article-title>
          .
          <source>In: On the Move to Meaningful Internet Systems: OTM Workshops</source>
          . Volume
          <volume>5872</volume>
          of LNCS., Springer (
          <year>2009</year>
          )
          <fpage>544</fpage>
          -
          <lpage>553</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Javed</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abgaz</surname>
            ,
            <given-names>Y.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pahl</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>A layered framework for pattern-based ontology evolution</article-title>
          .
          <source>In: 3rd International Workshop Ontology-Driven Information System Engineering (ODISE)</source>
          , London, UK. (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srikant</surname>
          </string-name>
          , R.:
          <article-title>Mining sequential patterns</article-title>
          .
          <source>In: Proc. of the Int. Conf. on Data Engineering. IEEE Computer Society</source>
          . (
          <year>1995</year>
          )
          <fpage>3</fpage>
          -
          <lpage>14</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Efficiently mining closed subsequences with gap constraints</article-title>
          .
          <source>In: Proc. SIAM Int. Conf. on Data Mining (SDM'08)</source>
          , USA. (
          <year>2008</year>
          )
          <fpage>13</fpage>
          -
          <lpage>322</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Plantevit</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laurent</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laurent</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teisseire</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Choong</surname>
            ,
            <given-names>Y.W.</given-names>
          </string-name>
          :
          <article-title>Mining multidimensional and multilevel sequential patterns</article-title>
          .
          <source>In: ACM Transactions on Knowledge Discovery from Data, Article 4. Volume</source>
          <volume>4</volume>
          (
          <issue>1</issue>
          ). (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Stefanowski</surname>
          </string-name>
          , J.:
          <article-title>Algorithms for context based sequential pattern mining</article-title>
          .
          <source>In: Fundamenta Informaticae</source>
          . Volume
          <volume>76</volume>
          (
          <issue>4</issue>
          ). (
          <year>2007</year>
          )
          <fpage>495</fpage>
          -
          <lpage>510</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Mining complex patterns across sequences with gap requirements</article-title>
          .
          <source>In: 20th International Joint Conference on Artificial Intelligence</source>
          . (
          <year>2007</year>
          )
          <fpage>2934</fpage>
          -
          <lpage>2940</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Altschul</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gish</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Myers</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lipman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Basic local alignment search tool</article-title>
          . In
          <source>: Journal of Molecular Biology</source>
          . Volume
          <volume>215</volume>
          (
          <issue>3</issue>
          ). (
          <year>1990</year>
          )
          <fpage>403</fpage>
          -
          <lpage>410</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Inokuchi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Washio</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motoda</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>An apriori-based algorithm for mining frequent substructures from graph data</article-title>
          .
          <source>In: Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery</source>
          . (
          <year>2000</year>
          )
          <fpage>13</fpage>
          -
          <lpage>23</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Yan</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          , Han,
          <string-name>
            <surname>J</surname>
          </string-name>
          .:
          <article-title>gspan: Graph-based substructure pattern mining</article-title>
          .
          <source>In: IEEE International Conference on Data Mining</source>
          . (
          <year>2002</year>
          )
          <fpage>721</fpage>
          -
          <lpage>724</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Kuramochi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karypis</surname>
          </string-name>
          , G.:
          <article-title>Frequent subgraph discovery</article-title>
          .
          <source>In: 1st IEEE Conference on Data Mining</source>
          . (
          <year>2001</year>
          )
          <fpage>313</fpage>
          -
          <lpage>320</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Huan</surname>
          </string-name>
          , J.:
          <article-title>Graph Based Pattern Discovery in Protein Structures</article-title>
          .
          <source>PhD thesis</source>
          , Department of Computer Science, University of North Carolina (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>