<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Applying Graph Partitioning Techniques to Modularize Large Ontologies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ana Carolina Garcia</string-name>
          <email>carolina.acgg@gmail.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Letícia Tiveron</string-name>
          <email>leticiativeronbt@gmail.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claudia Justel</string-name>
          <email>cjustel@ime.eb.br</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria Cláudia Cavalcanti</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Praça General Tibúrcio</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Praia Vermelha - Urca</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rio de Janeiro</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>RJ - Brazil</string-name>
        </contrib>
      </contrib-group>
      <fpage>72</fpage>
      <lpage>83</lpage>
      <abstract>
        <p>Nowadays, it is difficult to reuse ontologies, especially those that cover a large domain. It is in this context that ontology modularization can be useful. The goal of this work is to investigate graph partitioning techniques and their application on the modularization of large ontologies, typically, biomedical ontologies. Such investigation may be divided in two steps: (i) how to convert an ontology, represented in OWL or RDF languages into a graph; (ii) which partitioning algorithm would be suitable. More specifically, this work focus is on how to preserve certain ontology properties/relationships in the generated modules. Therefore, a single way of graph conversion was adopted and user-defined edge weights were taken into account. Five graph partitioning algorithms were used for the present investigation, but just three of them were used to verify their behavior in face of edge weight variations. A case study was conducted using a toyontology on the pizza domain, and showed preliminary but interesting results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The Open Biological and Biomedical Ontologies (OBO) Foundry5 and the NCBO
BioPortal provide together more than 300 ontologies. How can a biomedical
annotator deal with such a variety of ontologies? In addition, it is even more difficult
to reuse them taking into account that typical biomedical ontologies have more than
five hundred terms.</p>
      <p>
        It is in this context that ontology modularization can be useful. However, in order
to put modularization into practice greatly depend on the goals that are pursued, and
thus, the splitting of ontologies into smaller modules has to follow some criteria
[
        <xref ref-type="bibr" rid="ref7">Parent and Spaccapietra, 2009</xref>
        ]. It is worth noting that ontologies are semantic-based
structures, in which each class and each property have different meanings. Such
differences should be taken into account in the process of modularization, i.e., certain
classes and/or properties (relationships) may be more relevant than others in the
generated modules. For instance, for a user of the Gene Ontology (
        <xref ref-type="bibr" rid="ref2">GO) [GO
Consortium, 2000</xref>
        ] it may be more important to generate modules where part-of
relationships between selected nodes are all included. Therefore, a good criterion for
generating reusable ontology modules is to rank properties, so that they are not
discarded during modularization.
      </p>
      <p>
        The task of modularizing large ontologies, typically, biomedical ontologies, can be
divided in two steps: (i) how to convert an ontology, represented in OWL or RDF
languages into a graph; and (ii) which partitioning algorithm would be suitable. With
respect to (i), there are different ways of doing such conversion, as stated in [
        <xref ref-type="bibr" rid="ref1">Coskun
et al., 2011</xref>
        ], but either one should represent property ranking values in the graph.
With respect to (ii), there are many graph partitioning algorithms, such as, Spin Glass,
Fast Greedy, Walktrap, Leading Eigenvector and Edge Betweenness. However, just
some of them take into account edge weights: Spin Glass, Walktrap and Fast Greedy.
In this context, a question still remains: which of these algorithms would be more
suitable for the ontology property-aware modularization problem?
      </p>
      <p>
        There are previous works that evaluate partitioning algorithms applied to
ontologies [
        <xref ref-type="bibr" rid="ref1">Coskun et al., 2011</xref>
        ][
        <xref ref-type="bibr" rid="ref11">Oh and Yeom, 2012</xref>
        ], but they did not take into
account edge weight variations.
      </p>
      <p>This paper investigates the behavior of graph partitioning algorithms with respect
to edge weight variations, and describes a case study that shows some initial but
interesting results. In order to conduct this study it was necessary to adapt and use
existing tools. For (i), the PATO6 tool was used because it includes an ontology-graph
conversion mechanism that adds weights to two types of properties (is-a and other
domain properties). It had to be adapted in order to assign a different weight to each
distinct domain property. Then, for (ii), the iGraph7 tool was chosen as it includes
implementations of three edge weight aware partitioning algorithms.</p>
      <p>The case study was carried out using a toy-ontology on the pizza domain.
Although the modularization targets are large ontologies, it would be very difficult to
analyze and evaluate the resulting modules for these ontologies. Therefore, the idea of
using a small ontology was to facilitate the analysis of the modularization results.
Moreover, the choice for a common knowledge domain, such as pizza, was to avoid
the need for biologists or domain specific specialists in the initial tests. The pizza case
study showed that the variation of properties weights led (according to the user needs)
to different partitioning results. It was noted that some algorithms kept most of the
6 http://web.informatik.uni-mannheim.de/anne/Modularization/pato.html
7 http://igraph.sourceforge.net/
highest weight properties within the modules and did not use them as cut edges
(between the modules).</p>
      <p>The rest of this paper is organized as follows. The following section describes the
biomedical scenario, which motivates this investigation. The third section describes
briefly some of the main graph partitioning techniques. The fourth section describes
the experiment and discusses its results. The fifth section concludes the paper,
pointing to future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Biomedical Ontologies</title>
      <p>
        Nowadays there are many ontologies on the biomedical domain that can be found at
The Open Biological and Biomedical Ontologies (OBO) Foundry and at the NCBO
BioPortal. The OBO Foundry is maintained by a group of researchers that establish a
set of principles for ontology development with the goal of creating a suite of
orthogonal interoperable reference ontologies in the biomedical domain [
        <xref ref-type="bibr" rid="ref9">Smith et al.
2007</xref>
        ]. Besides being a repository, the OBO plays the role of a reference organism,
which reviews and certifies a set of ontologies on the biomedical domain. At the time
of the writing of this paper, there were around 113 ontologies, from which only 8
were considered OBO ontologies, while the other 105 were still candidate ontologies.
The GO (Gene Ontology) is one of the most popular among OBO ontologies.
      </p>
      <p>
        The NCBO BioPortal [
        <xref ref-type="bibr" rid="ref6">Noy et al. 2009</xref>
        ] is an ontology repository that feeds a text
annotation service. At the time of this writing there were 306 ontologies. After a
quick analysis about their size, it was possible to conclude that approximately 3%
have more then 100,000 classes; 8% have more than 10,000 classes; and more then
50% have more then 500 classes. Then, it is fair to say that in the biomedical domain,
ontologies are typically of medium and large size. How can a biomedical
(ontologydriven) annotator deal with such a variety of large ontologies?
      </p>
      <p>This scenario motivates us on the investigation of alternative solutions for reducing
the complexity of ontology reuse. The next section summarizes some of the graph
partitioning techniques that could be useful to facilitate their reuse.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Graph Partitioning Techniques</title>
      <p>The graph partitioning problem has been used to model problems of different areas.
The growth and rapid evolution of real networks, created from technological and
social networks, resulted in an increasing volume of data sets. These data can be used
to extract information of the network elements. A network consists of a combination
of elements and relationships between pairs of elements. Given this definition, we can
construct an associated graph G = (V,E), in which the vertices (v ∈ V) represent the
network elements and the edges (e ∈ E) represent some kind of relationship between
the elements.</p>
      <p>The techniques for solving the graph partitioning problem try to divide the set V
(the vertices of G) in clusters (also communities or partitions) that optimize a certain
criterion. For instance, each cluster must have edges between internal vertices with
high weight and edges between different clusters with low weight. Following this
optimization criterion, the graph partitioning problem is NP hard and can be formally
defined as:
Given a graph G=(V,E), find p subsets V1, V2, ... Vp such that:
ii.
iii.</p>
      <p>⋃
=
∩
= ∅, for any
≠ .</p>
      <p>W(i) and W represent the sums of the weights of the edges between vertices
inside the sets Vi and V, respectively.</p>
      <p>The sum of the weights of the edges that connect the vertices into two subsets
Vi and Vj, for all pair i,j must be minimal.</p>
      <p>There are several approximate algorithms to solve the graph partitioning. In this
paper we consider the partitioning algorithms implemented in the library iGraph. This
library provides an implementation of 5 (five) different algorithms for graph
partitioning: Edge Betweenness Community, Walktrap Community, Fast Greedy
Community, Spin Glass Community and Leading Eigenvector Community algorithms.</p>
      <p>
        The Edge Betweenness Community is a divisive algorithm. It removes recursively
edges of the graph until determine communities [
        <xref ref-type="bibr" rid="ref1">Coskun et al., 2011</xref>
        ]. From a
nonweighted graph G=(V, E) the betweenness of an edge e ∈ E, is the number of shortest
paths that connect any two vertices v1 and v2 of V passing through the edge e. Note
that there may be more than one shortest path between two vertices. In this case, if
there are k shortest paths between the vertices v1 and v2 ∈ E, then each one will have a
weight 1/k to calculate the edge's betweenness of these paths [
        <xref ref-type="bibr" rid="ref8">Schaeffer, 2007</xref>
        ].
      </p>
      <p>This algorithm computes the values of the betweenness of each edge. And is based
on the following observation: edges with higher value of betweenness must be
connecting vertices of two different partitions, i.e., they are not inner edges in a
partition. Then the algorithm divides the graph into clusters, removing one by one the
edges with the highest value of betweenness. If more than one edge has the highest
value, one is chosen randomly. After each removal, the betweenness is recalculated
for each edge. This process is repeated until a stop criterion.</p>
      <p>
        The Walktrap Community is an algorithm based on the following statement:
"random walks in a graph tend to get trapped in dense parts of the graph,
corresponding to the communities" [
        <xref ref-type="bibr" rid="ref1">Coskun et al., 2011</xref>
        ]. That is, by drawing a
random path between two nodes, the nodes that belong to the path are more likely to
belong to the same community. It is a hierarchical agglomerative algorithm, because
the communities are built step by step through the union of vertices to form
communities. Initially the algorithm treats all vertices of the graph as communities of
a single node. Then, at each step, two communities are joined, until the stopping
criterion.
      </p>
      <p>
        The Fast Greedy Community is an algorithm widely used to determine
communities for non-directed and sparse graphs G=(V,E) [
        <xref ref-type="bibr" rid="ref4">Eom et al., 2009</xref>
        ]. This
algorithm is based on the concept of modularity of a partition C={C1, …Cp},
Q(C)=Σ1≤i≤ p aii – ai2 , where aii represents the edges of the graph inside the set Ci and
ai the number of edges with one endpoint in the set Ci The Fast GreedyCommunity
algorithm maximizes the value of the modularity in a greedy fashion.
      </p>
      <p>Initially, the algorithm considers each vertex of the graph as a unitary community.
Then, the algorithm finds the pair of communities Cp and Cq having the maximum
value ΔQij = eij –eji – 2ai aj = 2(eij – ai aj), where eij is the number of edges between
Ci and Cj. Next, the algorithm combines these two communities Cp and Cq in only one
community. The process is repeated while ΔQ is a positive number. During the
process of union of two communities into one, the algorithm updates the values
corresponding to neighboring communities (internal and external edges of the
communities affected by the union of Cp and Cq).</p>
      <p>The Spin Glass Community is an algorithm based in a thermodynamics technique
to model the graph-partitioning problem. The meta-heuristic Simulated Annealing is
used to solve the minimization energy problem considering in this model. In this
context, the spin states consider the vertices of the graph and the structure of the
partition in the graph is interpreted as the spin configuration that minimizes the
energy of the Spin Glass.</p>
      <p>
        The Leading Eigenvector Community algorithm uses the concept of modularity in
a different way to perform the partitioning. In this case, the algorithm finds the
eigenvector corresponding to the most positive eigenvalue of a modularity matrix,
defined from values ΔQij and divide the network into two groups, according to the
signs of this vector elements [
        <xref ref-type="bibr" rid="ref5">Newman, 2006</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Case Study</title>
      <p>
        As stated in [
        <xref ref-type="bibr" rid="ref7">Parent and Spaccapietra, 2009</xref>
        ], each module is expected to show a
similar unit of purpose, gluing together those elements that participate on a given
goal. A module should make sense to ontology engineers seeking to (re)use them
[
        <xref ref-type="bibr" rid="ref3">Grau et al., 2006</xref>
        ]. For instance, a module should represent an agreed
conceptualization of a sub-domain of the domain of the ontology.
      </p>
      <p>
        Evaluating ontology modules is not an easy task. A recent related work [
        <xref ref-type="bibr" rid="ref11">Oh and
Yeom, 2012</xref>
        ] proposes a new evaluation framework for selecting an appropriate
ontology modularization tool. In their work, modularization tools were evaluated as
use cases according to the proposed framework, which takes into account three
aspects: tool performance, data performance, and usability. Data performance
includes verifying the cohesion of a module, which means asking if a module contains
plenty of concepts, relationships, and axioms. However, although different
partitioning methods were compared, the property ranking was not used as a criterion
for modularization evaluation.
      </p>
      <p>
        Another related work [
        <xref ref-type="bibr" rid="ref10">Stuckenschmidt and Klein, 2004</xref>
        ] identifies ontology
partitions depending on property weights. These weights are calculated based on
graph dependencies (e.g. subclass, domain and range restrictions). However, a method
purely based on the structure of the ontology may not be able to capture semantics of
such properties.
      </p>
      <p>The present work adopts a user-based, but probably not scalable approach.
According to the user preference (weights assigned to object properties), a set of
modules is generated. A module makes sense if it is cohesive, meaning if it includes
the user-preferred properties, and the related concepts. Based on this criterion it was
possible to evaluate the generated modules.</p>
      <p>In order to verify the behavior of the selected five algorithms taking into account
weighted ontology properties, a case study scenario was prepared: support tools were
configured and/or adapted to execute such algorithms, having as input a chosen
ontology. The following subsections detail the case study scenario and the subsequent
executions of the algorithms, discussing the cohesion of the resulting modules.</p>
    </sec>
    <sec id="sec-5">
      <title>4.1 Scenario Preparation</title>
      <p>The modularization of a given ontology was divided into two tasks: the first one
consists in representing a given ontology as a graph, and the second consists in
partitioning this graph.</p>
      <p>
        With respect to the first task, although it is a non-trivial task, it was not in the
scope of this work to focus on this problem. There are different and richer ways of
converting an OWL/RDF ontology into a graph representation [
        <xref ref-type="bibr" rid="ref1">Coskun et al., 2011</xref>
        ],
but in the context of this work it was performed as follows: given an OWL file,
converts it into a graph G=(V,E), where each OWL class or RDF resource
corresponds to a vertex of V, and each OWL/RDF object property corresponds to an
edge of E. In this type of conversion, usually there is no distinction between the
edges, and therefore the different relationship types are lost. As we stated before,
since ontologies are semantic-based structures and have different domain properties
(object properties), the edge-weight variation is meaningful to their modularization.
      </p>
      <p>The literature pointed to some tools to perform the first task. The Jena8 java library
and the PATO tool were alternatives. Their outputs are graphs in graphML and Pajek
formats, respectively. Although Jena allows three distinct representations for the
graph, PATO was more suitable as it allows assigning weight values to graph edges.</p>
      <p>The PATO tool could be useful for the second task as well, since it performs the
complete modularization of an ontology. However, this tool is poorly documented and
it was not possible to identify the algorithm behind its partitioning algorithm. On the
other hand, there were two known partitioning C++ libraries available: SNAP9 and
iGraph. SNAP provides the implementation of two different partition algorithms,
edge betweenness and fast greedy, while iGraph provides three others besides those
two: spin glass, walktrap and Leading Eigenvector. Furthermore, iGraph admits Pajek
input format, allowing its use in conjunction with the PATO tool. Therefore, iGraph
was chosen for its coverage and compatibility.</p>
      <p>The PATO tool had to be adapted in two ways. First, it was removed all its
unnecessary functionalities for our goal. Second, since PATO’s original code only
differentiates the subclass relationship and the domain properties, assigning the same
weight to all the domain properties, it was necessary to adapt the code in order to
allow assigning distinct weights to different domain properties.
8 http://jena.sourceforge.net
9 http://snap.stanford.edu/snap/</p>
      <p>In addition to iGraph and PATO, the graphic interface of the R-tool10 from iGraph
was used in order to visualize the results. Also, it was developed a Java procedure
whose input is the output of the iGraph library, in text format (.txt). For a partitioning
that generates k communities, this routine returns k files (communities) in Pajek
format.</p>
      <p>To facilitate the analysis of the edge weight variation it was necessary to choose a
domain, small but sufficient, ontology example. As most of the biomedical ontologies
are large, they were initially discarded. Furthermore, it would be difficult to extract a
reduced size module of it to work with. Therefore, work with a toy-ontology on a
common knowledge domain would be a wise choice. A well-known example of toy
ontology on the pizza domain (used in knowledge representation tutorials and
courses) was chosen. However due to limitations of the PATO tool on dealing with
OWL format, an RDF smaller version of the pizza ontology11 was adapted and used.</p>
    </sec>
    <sec id="sec-6">
      <title>4.2 Partitioning Results</title>
      <p>The output graph obtained with PATO from the Pizza ontology is shown in Figure 1.
This figure was generated with the aid of R-tool and Power Point. The Pizza ontology
used has three types of properties: "isa", "consists_of" and "consumes" and for this
study it was planned 5 (five) combinations of weight distribution for the edges, as
described in the Table 2. From each choice of the edge weights combinations, 5
weighted graphs corresponding to the Pizza ontology were created.
10 http://www.R-project.org/
11 http://www.heiko-stoermer.net/teaching/2006-models-and-techniques-of-knowledge-representation</p>
      <p>Each of the five resulting graphs was partitioned by Fast Greed, Spin Glass and
Walktrap algorithms, which allows weighted graphs as input. The other two
algorithms, Leading Eigenvector and Edge Betweeness, were executed only to the
graph with constant weights (graph 0).</p>
      <p>For graph 0, the Spin Glass and Fast Greedy algorithms obtained the same result
(the same 3 communities). The Walktrap and Edge Betweenness algorithms obtained
the same result (the same 3 communities). But the difference between the partitions
obtained in these two cases is the allocation of vertices 0 and 12 (Non-Vegetarian
Pizza and Non-Vegetarian Ingredient). Figure 2 shows the output by the 5 algorithms
for graph 0. In this case, constant weights for the edges, the algorithms seem to have
similar or slightly different behavior. However, the communities obtained with Fast
Greedy and Spin Glass algorithms seem to be better.</p>
      <p>7</p>
      <sec id="sec-6-1">
        <title>ISA C4ONSUMES</title>
        <p>3 13
CONSUMES ISA CONSISTS_OF
14 ISA
ISA 15 CONSUMES</p>
        <sec id="sec-6-1-1">
          <title>ISA CO1N2SISTISSA_OF 10ISA ISA 11</title>
          <p>0 5 ISA
ISA ISA CONSISTS_OIFSA
1 ISA 6
9</p>
          <p>ISA 15 CONSUMES</p>
        </sec>
        <sec id="sec-6-1-2">
          <title>7 ISA CO1N2SISTISSA_OF 10ISA ISA 11</title>
          <p>0 5 ISA
C3ONSIUSMAES 4CIOSNASUM1CEO3SNSISTISS_AOFISA CON1SISTSI_SOAIFSA6
14 ISA</p>
          <p>CONSISTS_OF 2
8 ISA
9</p>
          <p>Edge
Betweeness (3)</p>
          <p>ISA 15 CONSUMES
7 12</p>
          <p>ISA CONSISTS_OF ISA</p>
          <p>0</p>
        </sec>
      </sec>
      <sec id="sec-6-2">
        <title>ISA C4ONSUMES ISA</title>
        <p>3 13</p>
      </sec>
      <sec id="sec-6-3">
        <title>CONSUME1S4 ISA CONSISTS_OFISA</title>
        <p>CONSISTS_OF 2
ISA
10</p>
        <p>ISA
5
CONSISTS_OF
ISA ISA
1 ISA 6</p>
        <p>ISA 11
ISA
9</p>
        <p>ISA 15 CONSUMES</p>
        <sec id="sec-6-3-1">
          <title>7 ISA CO1N2SISTISSA_OF 10ISA ISA 11</title>
          <p>0 5 ISA
C3ONSIUSMAES 4CIOSNASUM1CEO3SNSISTISS_AOFISA CON1SISTSI_SOAIFSA6
14 ISA
9
CONSISTS_OF 2</p>
          <p>ISA
8
Spin Glass (3)
Leading
Eigenvector (4)</p>
          <p>Walktrap (3)</p>
          <p>ISA 15 CONSUMES
7 12</p>
          <p>ISA CONSISTS_OF ISA</p>
          <p>0</p>
        </sec>
      </sec>
      <sec id="sec-6-4">
        <title>ISA C4ONSUMES ISA</title>
        <p>13</p>
        <p>ISA CONSISTS_OF
14 ISA</p>
        <p>CONSISTS_OF 2</p>
        <p>ISA</p>
        <p>In what follows, we will discuss the content of each community generated with
the algorithms. Table 3 shows the vertices allocated in each of the 3 communities by
Fast Greedy and Spin Glass algorithms for graph 0. Both algorithms included
vegan/vegetarians on one community and non-vegetarians in another. Furthermore, a
third community of generic nodes, which did not fit in either first two communities,
was generated. Note that vertex 7 is not allocated in the generic nodes community.
However, vertex 7 is not adjacent to any other vertex in the third community, and
therefore the obtained result makes sense.</p>
        <p>For the graph 0, Edge betweenness and Walktrap algorithm obtain 3 communities.
In this case, vertices 0 and 12 (Non-Vegetarian Ingredient and Non-Vegetarian Pizza)
belong to the partition correspondent to general classes (Non-Vegetarian Customer).
Both algorithms focus on paths (one is deterministic and the other is probabilistic),
and this approach is different from the first two algorithms analyzed. However, more
tests should be performed in order to obtain general conclusions.</p>
        <p>It is also worth noting that, different from the other algorithms, the Leading
Eigenvector algorithm generated an extra partition with different types of customer
(Vegan and Vegetarian Customer and Customer itself). Vertex 15 (Non-Vegetarian
Customer), which seemed to be semantically closer to this community, was allocated
in another related community, which includes Non-Vegetarian Pizza and
NonVegetarian Ingredient.</p>
        <p>With respect to the other graphs (graphs 1-4), Fast Greedy algorithm showed the
best performance in terms of vertex clustering (communities). Fast Greedy’s
partitioning results are shown in Figure 3, while Figures 4 and 5 show the partitioning
results for the Spin Glass and Walktrap algorithms, respectively.</p>
        <p>PROPERTY WEIGHT
---------------------------CCISOOANNSSUISMTESSOF 112
PROPERTY WEIGHT
---------------------------CCISOOANNSSUISMTESSOF 211
7
ISA ISA</p>
        <p>4</p>
        <p>ISA
3
CONSUMES</p>
        <p>15 CONSUMES</p>
        <p>Graph 3, which assigns the highest weight for the "consists_of" property, is the one
for which Fast Greedy showed the best partitioning. This partitioning seems to make
more sense than the one generated by graph 0 (analyzed previously). The four
generated communities may be easily described, as follows: (i) Consumers (3,4,7,15);
(ii) Non-vegetarians (12,0); (iii) Generic classes (1,5,6,9,10,11); (iv) Vegan and
vegetarians (2,8,13,l4). Note that vertex 7 (Customer) now belongs to a community
that includes the other customers. The partitioning for graph 4 is equal to graph 0. In
the other graphs (1 and 2) there are some out of place vertices. For instance, vertex 0
(non-vegetarian ingredient) is separated from vertex 12 (non-vegetarian pizza).</p>
        <p>The partitioning results for Spin Glass algorithm do not show much variation.
Graphs 2-4 partitioning results are equal to graph 0 results. Graph 1, which assigns
the highest weight for the "isa" property, is the only one whose results are different,
but interesting. Note that they are similar to graph 3 results of the Fast Greedy
algorithm, with the difference that communities (ii) and (iii) are merged.</p>
        <p>Similarly, partitioning results for Walktrap algorithm show variation only for
graph 1, and one of its communities (0,1 and 6) do not make much sense, joining
together vertices with not much in common.</p>
        <p>Another important analysis is if the generated modules attend the property ranking
criterion, i.e., if the module includes the properties and concepts according to the user
priority assignment. As stated before, the idea is to prioritize one type of property, in
order to maintain them in the resulting partition. Figure 3 shows that the partitioning
results for graph 1, which assigns highest weight value to the "isa" property, shows
that most of the edges that represent this property are “inside” the community, i.e.,
they are not between two different communities (cut edges). Similarly, partitioning
results for graph 2, which assigns highest weight value to the “consumes” property,
shows that all the edges that represent the “consumes” property are inside the
communities. In other words, the vertices joined by edges of greater weight tended to
remain in the same community, and these edges were then maintained in some
partition. Fast Greedy algorithm also showed the best results, varying according to
the different configurations of edge values. Table 4 summarizes its results. Note that
when the “consumes” property is prioritized, it does not appear as a cut edge (0) in
the graph. The same occurs with the “consists_of” property.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>5. Conclusion</title>
      <p>This paper described an experiment that showed some initial but interesting results on
how partitioning algorithms behave for ontology modularization, with focus on edge
weight variations. In the context of ontologies, it makes sense to identify priorities for
ontology object properties before partitioning. This can lead to more useful ontology
modules from the user point of view.</p>
      <p>
        The focus of this work was on the data performance evaluation, one of the aspects
of the evaluation framework proposed in [
        <xref ref-type="bibr" rid="ref11">Oh and Yeom, 2012</xref>
        ]. More specifically, the
focus was on the cohesion of the resulting modules. Five graph partitioning
algorithms were executed for the graph representation of a toy-ontology on Pizza
domain, but only three of them allowed the generation of weighted graphs. Among
these three, Fast Greedy algorithm had the best preliminary results, showing
“sensibility” with respect to the domain property ranking. However, further tests
should be executed with larger and different ontologies. The suggested assumptions
stated in this work can be refuted or confirmed by future work.
PROPERTY WEIGHT
---------------------------CONSISTS OF 1
CONSUMES 2
ISA 1
PROPERTY WEIGHT
---------------------------CONSISTS OF 1
CONSUMES 1
      </p>
      <p>ISA 2
PROPERTY WEIGHT
---------------------------CONSISTS OF 1
CONSUMES 2
ISA 1
PROPERTY WEIGHT
---------------------------CONSISTS OF 1
CONSUMES 1
ISA 2
7
ISA ISA</p>
      <p>4</p>
      <p>ISA
3</p>
      <p>ISA
ISA
ISA</p>
      <p>ISA
ISA
ISA
ISA
ISA
ISA</p>
      <p>ISA
ISA
ISA
5
1
5
1
5
1
5
1
ISA
ISA</p>
      <p>ISA
ISA
ISA</p>
      <p>6
10
ISA
ISA</p>
      <p>ISA
ISA
ISA</p>
      <p>6
10
ISA
ISA</p>
      <p>ISA
ISA
ISA</p>
      <p>6
10
ISA
ISA</p>
      <p>ISA
ISA
ISA
6
11
11
11
11
9
2
1
9
9
2
1
9
Acknowledgements
The authors would like to thank CNPq (309307/2009-0; 305516/2010-8
486157/2011-3; PIBIC scholarships) and FAPERJ (E-26/111.147/2011) for partially
funding their research projects.
3
4
3
4
7
7
7
7</p>
      <p>ISA
3
CONSUMES</p>
      <p>ISA
3
CONSUMES</p>
      <p>ISA
3
CONSUMES</p>
      <p>ISA
3
CONSUMES
ISA
4
ISA
ISA
4
ISA
ISA
4
ISA
ISA
4
ISA
10
ISA
10
ISA
10
ISA</p>
      <p>ISA
ISA
ISA</p>
      <p>6
ISA
ISA
ISA</p>
      <p>6
ISA
ISA
ISA</p>
      <p>6
ISA
ISA
ISA
6
11
11
11
11
9
9
9
9
ISA
PROPERTY WEIGHT
---------------------------CONSISTS OF 2
CONSUMES 1
ISA 1
ISA
PROPERTY WEIGHT
---------------------------CONSISTS OF 2
CONSUMES 2
ISA 1
ISA
PROPERTY WEIGHT
---------------------------CONSISTS OF 2
CONSUMES 1
ISA 1
ISA
PROPERTY WEIGHT
---------------------------CONSISTS OF 2
CONSUMES 2
ISA 1</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Coskun</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ; Rothe,
          <string-name>
            <given-names>M.</given-names>
            ;
            <surname>Teymourian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ;
            <surname>Paschke</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>“Applying Community Detection Algorithms on Ontologies for Identifying Concept Groups”</article-title>
          .
          <source>Proc. of the Fifth International Workshop on Modular Ontologies</source>
          , (WoMO
          <year>2011</year>
          ), p.
          <fpage>12</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>GO</given-names>
            <surname>Consortium.</surname>
          </string-name>
          (
          <year>2000</year>
          ) “
          <article-title>Gene ontology: Tool for the Unification of Biology”</article-title>
          .
          <source>Nat. Genet</source>
          .,
          <volume>25</volume>
          (
          <issue>1</issue>
          ), p.
          <fpage>25</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parsia</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sirin</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalyanpur</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2006</year>
          )
          <article-title>“Modularity and web ontologies”</article-title>
          .
          <source>In: Proc. of the 10th Int. Conf. on Principles of Knowledge Representation and Reasoning (KR</source>
          <year>2006</year>
          ), p.
          <fpage>198</fpage>
          -
          <lpage>209</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Eom</surname>
          </string-name>
          , Young-Ho; Choi, Yoonchan; Jeong, Hawoong; Kwak, Haewoon; Moon,
          <string-name>
            <surname>Sue</surname>
          </string-name>
          (
          <year>2009</year>
          )
          <article-title>“Mining Communities in Networks: A Solution for Consistency and Its Evaluation”</article-title>
          .
          <source>In: Proc. of the Internet Measurement Conf. (IMC</source>
          <year>2009</year>
          ), p.
          <fpage>301</fpage>
          -
          <lpage>314</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Newman</surname>
            ,
            <given-names>M. E. J.</given-names>
          </string-name>
          (
          <year>2006</year>
          ) “
          <article-title>Finding community structure in networks using the eigenvectors of matrices”</article-title>
          .
          <source>Physical Review E</source>
          <volume>74</volume>
          (
          <issue>3</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Noy</surname>
            ,
            <given-names>N. F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shah</surname>
            ,
            <given-names>N.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Whetzel</surname>
            ,
            <given-names>P.L.</given-names>
          </string-name>
          et al. (
          <year>2009</year>
          )
          <article-title>“BioPortal: Ontologies and Integrated Data Resources at the Click of a Mouse”</article-title>
          .
          <source>Nucleic Acids Research</source>
          <volume>37</volume>
          (
          <string-name>
            <surname>Web-Server-Issue</surname>
            <given-names>)</given-names>
          </string-name>
          , p.
          <fpage>170</fpage>
          -
          <lpage>173</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Parent</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Spaccapietra</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2009</year>
          )
          <article-title>“An Overview of Modularity”</article-title>
          . In: Stuckenschmidt,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Parent</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ;
            <surname>Spaccapietra</surname>
          </string-name>
          ,
          <string-name>
            <surname>S</surname>
          </string-name>
          . (Eds)
          <article-title>Modular Ontologies: Concepts, Theories and Techniques for Knowledge Modularization</article-title>
          .
          <source>Lecture Notes on Computer Science 5445</source>
          , Springer, p.
          <fpage>5</fpage>
          -
          <lpage>23</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Schaeffer</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          (
          <year>2007</year>
          )
          <article-title>“Graph Clustering”</article-title>
          .
          <source>Computer Science Review</source>
          <volume>1</volume>
          (
          <issue>1</issue>
          ), p.
          <fpage>27</fpage>
          -
          <lpage>64</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ashburner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosse</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , et al. (
          <year>2007</year>
          ) “
          <article-title>The OBO Foundry: Coordinated Evolution of Ontologies to Support Biomedical Data Integration”</article-title>
          .
          <source>Nature Biotechnology</source>
          <volume>25</volume>
          , p.
          <fpage>1251</fpage>
          -
          <lpage>1255</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Stuckenschmidt</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2004</year>
          ).
          <article-title>Structure-based partitioning of large concept hierarchies</article-title>
          .
          <source>In: Proc. Int. Semantic Web Conference (ISWC).</source>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Oh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Yeom</surname>
            ,
            <given-names>H.Y.</given-names>
          </string-name>
          (
          <year>2012</year>
          )
          <article-title>“A comprehensive framework for the evaluation of ontology modularization”</article-title>
          .
          <source>Journal Expert Systems with Applications</source>
          <volume>39</volume>
          (
          <issue>10</issue>
          ), p.
          <fpage>8547</fpage>
          -
          <lpage>8556</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>