<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ontology Alignment in the Cloud</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jurgen Bock</string-name>
          <email>bock@fzi.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Lenk</string-name>
          <email>lenk@fzi.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carsten Danschel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>FZI Research Center for Information Technology</institution>
          ,
          <addr-line>Karlsruhe</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The problem of ontology alignment is prominent for applications that operate on integrated semantic data. With ontologies becoming numerous and increasingly large in size, scalability is an important issue for alignment tools. This work introduces a novel approach for computing ontology alignments using cloud infrastructures. An alignment algorithm based on particle swarm optimisation is deployed on a cloud infrastructure, taking advantage of its ability to harness parallel computation resources. The deployment is done with a focus on parallel e ciency, taking into account both communication latency and computational inhomogeneity among parallel execution units. Complementing previous experiments showing the e ectiveness of the alignment algorithm, this paper contributes an experiment executed \in the cloud", which demonstrates the scalability of the approach by aligning two large ontologies from the biomedical domain.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Ontology alignment is a problem prominent in semantic applications, the
semantic web, and the linked data web. It is based on the observation that ontologies1
are heterogeneous models, often representing a similar or equal domain of
interest, and hence have a certain overlap. Accordingly an ontology alignment
is de ned as a set of correspondences between ontological entities, i.e. classes,
properties, and individuals, of two ontologies.</p>
      <p>
        Two examples of where ontology alignment plays an important role are the
linked data web [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and the biomedical domain [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. These examples also
demonstrate the necessity for alignment tools to be able to deal with very large
ontologies, which currently constitutes a problem for most alignment algorithms.
Furthermore, it can be observed that ontologies evolve gradually, i.e. changes
typically occur by adding / removing / modifying single entities while the largest
part of the ontology remains unchanged. Thus, due to the typically slow changes,
alignments do not need to be recomputed completely each time, but rather
incrementally maintained and adjusted according to the evolving ontologies.
Furthermore, in typical information systems that utilise alignments, the refresh period
1 There is some disagreement among semantic web and linked data web communities
about whether data sources in the linked data web can be called ontologies. In this
paper the term ontology is used to cover all types of semantic data sources.
for alignments is not time-critical, i.e. alignments do not need to be recomputed
ad-hoc, but can be adjusted o ine, e.g. in nightly alignment jobs.
      </p>
      <p>
        This paper addresses the scalability problem of ontology alignment by
utilising cloud computing as a massively parallel computation infrastructure. The
algorithm to be deployed in the cloud is an alignment algorithm based on particle
swarm optimisation (PSO) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. A PSO-based approach to the ontology alignment
problem has several characteristics that meet the aforementioned observations:
1. The algorithm provides a meta-heuristic, which is independent of the
objective function to be optimised. Hence it is straightforward to exchange or
adapt the objective function according to the alignment scenario at hand.
2. The algorithm works incrementally, which has two bene cial e ects: Firstly,
the algorithm can be interrupted at any time providing the best alignment
that has been discovered so far. Secondly, the algorithm can be provided with
an initial (partial) alignment as a start con guration to be re ned. This in
particular allows for alignment evolution of gradually changing ontologies.
3. The algorithm is inherently parallelisable allowing for e cient execution on
parallel computing infrastructures.
      </p>
      <p>The rst two issues are inherent to the application of the PSO meta-heuristic.
However, the usability of the approach depends on an e cient deployment on
parallel computing infrastructures. Computation and adjustment of alignments
on an irregular basis usually does not justify the acquisition of large parallel
hardware infrastructures.</p>
      <p>
        Cloud computing infrastructures are scalable and can be used for data
intensive parallelisable jobs, as it has successfully been shown e.g. in the eld of
bio-informatics [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Recently, cloud computing has also been recognised as a
promising technique to provide scalability for web data processing [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. The use
of cloud infrastructures provides the possibility to access a large number of
computing resources in a convenient way without the need of operating one's own,
expensive data centre. Many o erings by cloud providers such as Amazon Web
ServicesTM (AWS)2 are based on a pay-per-use pricing model with an hourly
rate. This allows for exible use of computation resources, since accessing large
amounts of computation power for a short period of time comes at the same
costs as using just a few resources for a long period of time.
      </p>
      <p>The remainder of this paper is structured as follows. In Sect. 2 relevant
background knowledge and related work is presented. Section 3 introduces the
approach of ontology alignment by PSO and demonstrates its parallel scalability.
Justi ed by these insights, Sect. 4 discusses the design and implementation of
a cloud-based deployment of the formerly introduced approach. Experimental
results are presented in Sect. 5 demonstrating both the e ectiveness by referring
to previous benchmarks, and the scalability by using a real-world biomedical
alignment scenario. Section 6 summarises the contributions and provides an
outlook on future work.</p>
      <sec id="sec-1-1">
        <title>2 http://aws.amazon.com</title>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Foundations</title>
      <p>This section introduces ontology alignment, particle swarm optimisation (PSO),
and cloud computing in more details.
2.1</p>
      <sec id="sec-2-1">
        <title>Ontology Alignment</title>
        <p>
          An ontology alignment is de ned as a set of correspondences between ontological
entities [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], i.e. classes, properties, and individuals, of two ontologies. In this
approach, an entity can correspond to at most one entity of the other ontology,
which denotes what Euzenat and Shvaiko call an ?:? alignment [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Ontology
alignment detection can be perceived as an optimisation problem [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], where the
task is to nd the optimal alignment of two ontologies w.r.t. the criteria denoted
in terms of an evaluation function. These criteria can be based on several basic
matching techniques [10, Chap. 4] or other measures, respecting domain speci c
and modelling language speci c characteristics of the ontologies to be aligned.
        </p>
        <p>
          A representative overview of the state-of-the-art in ontology alignment is
given by Euzenat and Shvaiko [10, Chap. 6] and by the yearly Ontology
Alignment Evaluation Initiative (OAEI) [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Most systems focus on the improvement
of alignment quality, whereas the scalability problem has attracted interest only
recently. As the OAEI participation shows, only few systems are capable of
dealing with large ontologies [9, Sect. 10].
2.2
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Particle Swarm Optimisation (PSO)</title>
        <p>
          PSO is a biologically and socioculturally-inspired meta-heuristic [
          <xref ref-type="bibr" rid="ref12 ref13 ref8">12, 13, 8</xref>
          ],
originally proposed in 1995 by Kennedy and Eberhart [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. It has become a major
research eld since then.
        </p>
        <p>PSO algorithms use a population of particles to nd the optimal parameter
con guration with respect to one or more objective functions. Each particle
represents a candidate solution. A particle's tness is evaluated using objective
function(s). During initialisation, each particle in the swarm is assigned a random
position in the parameter space. A PSO algorithm runs iteratively, where in each
iteration, each particle adjusts its position in the parameter space by adding a
velocity vector to its current position. These particle updates can be performed in
parallel. The velocity vector for a particle is determined by the particle's previous
velocity (inertia), the best position that has been visited by any neighbour of
the particle so far (social component ), and the best position, the particle itself
has visited so far (cognitive component ). Regarding the social component, several
models of de ning particle neighbourhoods have been analysed. Such models are
called social network structures or topologies. A PSO algorithm is called gBest
(global best) PSO, if the neighbourhood of a particle consists of the whole swarm.
Thus the gBest PSO is a special case of the lBest (local best) PSO, where the
neighbourhood of a particle comprises only a subset of the swarm [8, Chap. 12].</p>
        <p>It is important to note, that the particle swarm search heuristic works without
any knowledge about the objective function(s). Hence an objective function is
replaceable and adjustable to arbitrary optimisation problems.
2.3</p>
      </sec>
      <sec id="sec-2-3">
        <title>Cloud Computing</title>
        <p>
          Cloud computing is a new paradigm that has been evolving over the last few
years. Most de nitions of cloud computing have in common that cloud computing
o erings can be categorised using the \Everything as a Service" (XaaS) model. A
more detailed view of this model is the \Cloud Computing Stack" [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. According
to the XaaS model the three main service classes are \Software as a Service"
(SaaS), \Platform as a Service" (PaaS), and \Infrastructure as a Service" (IaaS).
While SaaS o erings usually provide an interface directly to the end user by
providing a Web service interface or a graphical user interface (GUI) the PaaS
and IaaS o erings can be used by software architects to build new SaaS services
on top of them. PaaS o erings usually provide a platform where the software
developer can deploy the new services [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
        <p>
          The IaaS o erings at the \Basic Infrastructure Services" level, give the
developer full control over the servers that are running his software. At this level
the user can deploy new machines using Web service technologies. This o ers
the power and exibility to work on a machine level without running one's own
data center and so having convenient access to new resources. O erings such as
Amazon EC2 give users the opportunity to automatically deploy hundreds of
virtual machines within minutes. Thus, it is possible to build highly scalable
applications that can scale up and down in short periods. One famous example of
a successful cloud o ering is Animoto [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. The Animoto application transforms
music les and photos into small video slide shows. After o ering their service to
users on a social network the demand of virtual machines went from about 40 to
3500 [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. This scalability was only possible by designing the Animoto software
as a distributed algorithm deployed on Amazon EC2.
        </p>
        <p>
          Another interesting feature of cloud o erings is the typical pay-as-you-go
pricing model. This means that users only pay for the resources they are really
using. Having such a pricing model it makes no di erence if one single server
is running for 10 hours or if 10 servers are running for just one hour. The New
York Times used this pricing scheme when they built their \TimesMaschine".
By using Amazon EC2 they were able to convert their whole archive (4 TB of
scanned TIFF les), spanning the years 1851-1922, to web documents within
24 hours and total costs of US$ 890 [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
        <p>
          In the context of semantic technologies and the semantic web, cloud
computing technologies have been successfully applied, mainly for RDF storage [
          <xref ref-type="bibr" rid="ref18 ref19 ref22">22, 19,
18</xref>
          ], querying [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], and materialisation of RDF/OWL knowledge bases [
          <xref ref-type="bibr" rid="ref23 ref24">24, 23</xref>
          ].
3
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Ontology Alignment by Particle Swarm Optimisation</title>
      <p>
        Considering ontology alignment as an optimisation problem, a novel discrete
PSO algorithm (DPSO) has been developed to nd an optimal alignment of two
ontologies [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The algorithm has been implemented under the name MapPSO3
and is based on a DPSO algorithm by Correa et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which implements e cient
attribute selection in data mining classi cation tasks.
      </p>
      <sec id="sec-3-1">
        <title>3 http://mappso.sourceforge.net</title>
        <p>3.1</p>
        <sec id="sec-3-1-1">
          <title>Algorithm</title>
          <p>
            In the case of ontology alignment, each particle represents a valid candidate
alignment. With a swarm of particles being initialised randomly, in each
iteration, each particle evaluates the alignment it represents via an objective function
comprising various basic matching techniques [10, Chap. 4] as well as the size of
an alignment. Typically there is only a partial overlap of ontologies, so the
alignment algorithm strives for nding the largest alignment, i.e. maximum number of
correspondences, of high quality. Base matchers can evaluate a correspondence
according to di erent characteristics, such as lexical or linguistic similarity of
entity labels or comments. Moreover, correspondences do not necessarily have
to be evaluated in isolation, e.g. a correspondence of two classes can get a better
evaluation if a correspondence of their superclasses is also part of the alignment
represented by this particle. As discussed in Sect. 2.2 the objective function and
thus the base matchers are replaceable and hence can be adapted to particular
alignment scenarios. For instance in biomedical ontologies speci c OBO4 [
            <xref ref-type="bibr" rid="ref21">21</xref>
            ]
annotations can be exploited to evaluate the similarity between entities.
          </p>
          <p>
            Similar to the previous work by Correa et al. [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] and the original binary PSO
algorithm by Kennedy and Eberhart [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ], particle movements are determined
by proportional likelihoods, since the traditional notion of velocity vectors is not
applicable for binary search spaces, as it is the case for ontology alignment. The
likelihood for a correspondence to be contained in a particle in the subsequent
iteration is increased, if the correspondence exists in the global best (social
component) or personal best (cognitive component) particle con guration.
          </p>
          <p>This update mechanism allows for a guided convergence of the swarm to a
global optimum, which represents the best alignment with respect to the basic
matching techniques speci ed as objective function. Moreover, the size of each
particle varies during the execution of the algorithm in uenced by the size of
the global best alignment. This causes the algorithm to search for the optimum
size of the alignment, i.e. partial overlap, as well.
3.2</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>Parallel E ciency</title>
          <p>A detailed theoretical discussion of the correlation between population size and
number of iterations with respect to the swarm convergence goes beyond the
scope of this paper. Figure 1 illustrates an empirical analysis of the convergence
towards the optimum5 when aligning small ontologies of the OAEI benchmark
dataset. The gure shows clearly that a larger number of particles results in faster
convergence by reducing the number of required iterations and thus reducing
wall-clock runtime. Where it takes about 225 iterations for a swarm of 8 particles
to reach an alignment of high quality, the same result is achieved in only 90
iterations by a swarm of 64 particles. Using only 2 particles does not reach an
equally good result within 300 iterations.
4 Open Biomedical Ontologies
5 Note that the optimum depends on the chosen base matchers and thus not necessarily
has to be 0.</p>
          <p>0.8
0.7
t 0.6
n
e
lignm 0.5
A
t
s
e
lba 0.4
b
o
lf
g
tyo 0.3
li
a
u
Q 0.2
0.1
To obtain the required scalability, the PSO-based alignment algorithm has been
ported to Amazon Web ServicesTM (AWS), the IaaS cloud service of Amazon R .
AWS is one of the biggest IaaS providers with a large and active community,
and is representative for any other IaaS provider in the context of this work.</p>
          <p>The deployment has been realised using a server-worker pattern. Several
workers evaluate and update several local particles each, while they are managed
by a central server. Each worker determines the local best alignment among
the results of its particles and sends it to the server. The server determines the
global best alignment, then broadcasts it and synchronises the workers. Exchange
of information between server and workers is realised by the Amazon Simple
Queue Service (SQS)6 or via TCP/IP. The exchange of concrete particle states
is realised by the Amazon Simple Storage Service (S3)7 for reasons of scalability,
reliability, and parallel access.
Deploying the algorithm on virtual machines connected by a network bears two
main challenges. Firstly, the communication latency is much higher when
communicating via network than via main memory. Hence nding and broadcasting
the global best particle leads to a higher communication overhead, which slows
down the algorithm and creates unwarranted costs.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>6 https://aws.amazon.com/sqs/ 7 https://s3.amazonaws.com/</title>
        <p>Secondly, the computation times of workers in a particular iteration di er.
This di erence occurs mainly for two reasons: The unpredictable performance
of the virtual environment, and the varying particle sizes. The performance of
the virtual machine depends on its mapping to real hardware. For example a
\noisy" neighbour, i.e. another program sharing the same real hardware via
a di erent virtual machine, can slow down network communication. In turn, a
virtual machine can utilise more computing power if there are no or only inactive
neighbours. This results in unbalanced performance of the virtual machines. The
random initialisation of particles and their di erent sizes add to this discrepancy
in computation time. A small particle needs less computation time than a big
particle, therefore a worker with smaller particles will require less runtime per
iteration. The random initialisation of particles with di erent sizes is necessary
to search for the optimal alignment size and thus be able to identify partial
overlaps of ontologies. This computation time discrepancy causes fast workers
to idle while waiting for the slower workers and thus decreases parallel e ciency.
4.2</p>
        <sec id="sec-3-2-1">
          <title>Increasing Parallel E ciency</title>
          <p>The challenges of deploying the algorithm to a cloud-based infrastructre
identied have been addressed and solutions are proposed as follows.</p>
          <p>Addressing Latency. Amazon SQS was used as a means for communication
between server and workers. Amazon advertises SQS as reliable, scalable, and
simple. However, it turned out that SQS has high latency. For reducing the
network latency, direct communication has been implemented using the TCP/IP
protocol. Apart from reduced latency, this also resulted in a reduction of the
additional communication overhead caused by multiple workers.</p>
          <p>The latency is reduced further by only sending particle states when necessary,
i.e. when a better particle state has been found. To achieve this a worker sends
the tness value of its local best particle to the server and writes the particle
state itself into the S3 database. The server then broadcasts the global best
tness and the according database location to all worker instances.</p>
          <p>The number of read operations is minimal, since in the PSO topology used,
each particle must know about the global best. In case the global best does not
change, no worker has to read a particle state. The (unlikely) worst case in terms
of communication latency happens if every worker nds a new best alignment
and thus writes its particle state before it has to read a new one found by another
worker in the same iteration.</p>
          <p>Addressing Runtime Discrepancy. The e ect that workers require di erent
runtimes for particle tness computation can be minimised by particle pooling,
i.e. having each worker instance computing multiple particles. Using few particles
per worker results in high runtime variance between workers caused by di erent
particle sizes and the resulting uneven workload. Using more particles per worker,
i.e. the workload for each worker being the sum of workloads contributed by each
of its particles, averages the runtime because a rather uniform distribution of
small and big particles per worker is expected. Using a multi-particle approach
also increases parallel e ciency due to the increased overall runtime required to
evaluate and update several particles. By increasing the runtime the proportion
of time used for communication decreases and thus parallel e ciency is increased.</p>
          <p>Using asynchronous particle updates is another way to compensate the
runtime discrepancy. When using synchronous particle updates every worker has to
wait for the slowest worker and thus is wasting computation time. In the
asynchronous communication mode workers keep on evaluating and updating their
particles until they nd a particle state that is better than the global best they
know about. They send this particle state to the server and continue. The server
broadcasts the new particle state if it is better than the global best known by
the server. Preventing workers to idle drastically increases parallel e ciency.</p>
          <p>Introducing an asynchronous particle update strategy has an e ect on the
underlying particle swarm meta-heuristic. Having not all particles exchanging
information at the same time step has a similar e ect than changing the social
network structure of the PSO from a star topology to a cluster topology8 [8,
Chap. 12], which results in fast communication between particles on the same
worker compared to the communication between workers themselves. A clustered
(lBest ) topology in general results in slower convergence but better robustness
compared to the star (gBest ) topology [8, Chap. 12].</p>
          <p>
            Furthermore, a loss of information can be observed compared to the
synchronous update mechanism. This is due to the fact, that particles compute
iterations with possibly outdated information about the global best, which might
be available on a worker that is still computing another particle. This problem
has been analysed by Lewis et al. [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ]. Their investigations revealed that the
information loss can be compensated by the increased number of computations
that are possible due to the reduced waiting time.
          </p>
          <p>The bene cial e ect on the runtime for an alignment is re ected in Fig. 2,
showing runtime behaviour for two medium sized ontologies for both synchronous
and asynchronous particle updates. As expected, using an asynchronous
communication mode does not have any e ect if only a single worker is used. However,
for 16 workers that need to communicate their local best results in each iteration,
a clear runtime improvement of almost 50 % can be observed.</p>
          <p>
            One further e ect resulting from the asynchronous particle updates is that
workers hosting mainly small particles can complete an iteration more quickly
than those hosting mainly large particles. Therefore small particles tend to
compute more iterations and thus in uence the swarm stronger than it would be the
case in the synchronous mode. This is bene cial to the overall runtime of the
algorithm, because the average size of particles is smaller and therefore iterations
are faster.
8 While in a star topology, every particle shares information with every other particle,
the cluster topology allows only groups of particles to communicate with each other,
while the groups themselves exchange information only via dedicated particles, which
are part of two clusters.
Since for large ontologies there are no reference alignments available, e ectiveness
and scalability of the approach need to be evaluated separately. While it can
be shown for small benchmark ontologies, that the algorithm produces correct
results using a problem speci c parameter con guration, a separate experiment
has been conducted evaluating scalability using a cloud infrastructure.
Previous evaluations of the MapPSO system at the Ontology Alignment
Evaluation Initiative (OAEI) campaigns 2008 and 2009 [
            <xref ref-type="bibr" rid="ref3 ref5">3, 5</xref>
            ] have shown the e
ectiveness of the approach. In particular with respect to relaxed precision and recall
metrics [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] the OAEI 2009 organisers report that \[. . . ] MapPSO has signi
cantly better symmetric precision and recall than classical precision and recall,
to the point that it is at the level of the best systems." [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ]. The reason for this
di erence to classical metrics is most likely the rather generic parameter con
guration that has been used for all benchmark tests, without adaptation to the
single alignment scenarios. This generic parameter con guration is a restriction
that is of minor importance in real-world alignment scenarios.
In order to demonstrate the scalability of this approach, an experiment was
conducted aligning two large ontologies from the biomedical domain. The chosen
ontologies were the Gene Ontology (GO)9 with 31,650 classes, and the Medical
Subject Headings (MESH) Ontology10 with 15,343 classes, both converted to
the OWL format. Speci c base matchers were used that take advantage of the
class annotations resulting from the conversion from OBO to OWL.
          </p>
          <p>In the experiment a population of 128 particles was used distributed on 16
Amazon EC2 instances (workers) and thus utilising particle pooling by
computing 8 particles per worker. The EC2 instances were of the type \High-CPU
Extra Large Instance, 7 GB of memory, 20 EC2 Compute Units (8 virtual cores
with 2.5 EC2 Compute Units each), [. . . ] 64-bit platform, [where] one EC2
Compute Unit (ECU) provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007
Opteron or 2007 Xeon processor"11. The setup was chosen in a way, that the
number of particles on each instance matches the number of virtual cores, thus
enabling e cient multi-threading of particle computation.</p>
          <p>The experiment was run for 3.5 h, where due to the asynchronous particle
updates, workers computed between 50 and 75 iterations. The di erence in the
number of iterations illustrates the di erent computational e orts of each worker.
Workers that achieved a lower number of iterations than other workers at the
same time, are most likely computing smaller particles on average. This
illustrates the runtime discrepancy addressed in Sect. 4.1 and the increased parallel
e ciency gained by using an asynchronous approach. By using a synchronous
approach the slowest particle constitutes an upper bound and thus a maximum
of 50 iterations could have been computed in 3,5 h. The asynchronous approach,
however, computes an average of 62.875 iterations, which is an increase of 25 %.</p>
          <p>Since there is no reference alignment available for GO and MESH as there
is none for any other large ontologies, no statement about the result quality can
be made. The experiment, however, demonstrates the scalability of the proposed
approach, operating on large ontologies and converging to an alignment.
6</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>The problem of large-scale ontology alignment detection has been addressed by
a PSO-based approach. PSO has the characteristics of (i ) being independent of
its objective function(s) and thus easily adaptable to various ontology alignment
scenarios, (ii ) being incremental, i.e. able to re ne an alignment when input
ontologies evolve, and (iii ) being inherently parallelisable. The latter aspect has
been exploited in this paper by deploying the algorithm in the AWS cloud. This
deployment is making use of the emerging cloud computing paradigm, which</p>
      <sec id="sec-4-1">
        <title>9 http://www.geneontology.org/ 10 http://www.nlm.nih.gov/mesh/meshhome.html 11 http://aws.amazon.com/ec2/#instance, accessed 2010/06/19</title>
        <p>provides an on-demand infrastructure for dynamic computational needs, as it is
the case for alignment re nements of gradually evolving ontologies.</p>
        <p>When utilising parallel computation infrastructures on a pay-per-use basis,
such as cloud o erings, it is crucial to maximise parallel e ciency. Due to the
variable particle sizes used in this approach, di erent computation times for each
particle in each iteration can be observed. Thus particle pooling, i.e. multiple
particles per cloud instance, as well as asynchronous particle updates have been
introduced to the algorithm in order to reduce the time wasted by idling particles.</p>
        <p>Previous experiments in the course of OAEI participations have shown the
e ectiveness of the PSO-based approach. Complementing these results, the
scalability via parallelisation has been shown for two large biomedical ontologies.</p>
        <p>Having realised the successful deployment of an ontology alignment algorithm
in the cloud using an IaaS provider, it is a straightforward extension to provide
ontology alignment itself as a web service (SaaS) based on a dynamic and
scalable infrastructure. This enables large-scale ontology alignment for every user
without bothering about hardware requirements. Together with the anytime
behaviour of the PSO-based algorithm, business models comprising the dimensions
computation time, price, and alignment quality can be created.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgement</title>
      <p>The presented research was partially funded by the German Federal Ministry of
Economics (BMWi) under the project Theseus (number 01MQ07019).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Animoto</given-names>
            <surname>Productions</surname>
          </string-name>
          <article-title>: animoto - the end of slideshows:</article-title>
          . online, http://www. animoto.com, last seen:
          <year>2010</year>
          /06/16
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heath</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berners-Lee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Linked Data { The Story So Far</article-title>
          .
          <source>International Journal on Semantic Web and Information Systems</source>
          <volume>5</volume>
          (
          <issue>3</issue>
          ),
          <volume>1</volume>
          {
          <fpage>22</fpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bock</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hettenhausen</surname>
          </string-name>
          , J.:
          <article-title>MapPSO Results for OAEI 2008</article-title>
          .
          <source>In: Proceedings of the 3rd International Workshop on Ontology Matching (OM-2008)</source>
          . vol.
          <volume>431</volume>
          . CEUR Workshop Proceedings, http://ceur-ws.
          <source>org</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bock</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hettenhausen</surname>
          </string-name>
          , J.:
          <article-title>Discrete Particle Swarm Optimisation for Ontology Alignment</article-title>
          .
          <source>Information Sciences (Special Issue on Swarm Intellligence and Applications)</source>
          (
          <year>2010</year>
          ), article in press
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Bock</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , Liu,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Hettenhausen</surname>
          </string-name>
          , J.:
          <article-title>MapPSO Results for OAEI 2009</article-title>
          .
          <source>In: Proceedings of the 4th International Workshop on Ontology Matching (OM-2009)</source>
          . vol.
          <volume>551</volume>
          , pp.
          <volume>193</volume>
          {
          <fpage>199</fpage>
          . CEUR Workshop Proceedings, http://ceur-ws.
          <source>org</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Correa</surname>
            ,
            <given-names>E.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freitas</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Johnson</surname>
          </string-name>
          , C.G.:
          <article-title>A New Discrete Particle Swarm Algorithm Applied to Attribute Selection in a Bioinformatics Data Set</article-title>
          .
          <source>In: Proceedings of the 8th Genetic and Evolutionary Computation Conference (GECCO-2006)</source>
          . pp.
          <volume>35</volume>
          {
          <fpage>42</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Ehrig</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Euzenat</surname>
          </string-name>
          , J.:
          <article-title>Relaxed Precision and Recall for Ontology Matching</article-title>
          .
          <source>In: Proceedings of the K-CAP 2005 Workshop on Integrating Ontologies</source>
          . vol.
          <volume>156</volume>
          , pp.
          <volume>25</volume>
          {
          <fpage>32</fpage>
          . CEUR Workshop Proceedings, http://ceur-ws.
          <source>org</source>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Engelbrecht</surname>
            ,
            <given-names>A.P.</given-names>
          </string-name>
          : Fundamentals of Computational Swarm Intelligence. Wiley (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Euzenat</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferrara</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hollink</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Isaac</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joslyn</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malaise</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meilicke</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nikolov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pane</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sabou</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , Schar e, F.,
          <string-name>
            <surname>Shvaiko</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spiliopoulos</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stuckenschmidt</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Svab-Zamazal</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Svatek</surname>
          </string-name>
          , V.,
          <string-name>
            <surname>dos Santos</surname>
          </string-name>
          , C.T.,
          <string-name>
            <surname>Vouros</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Results of the Ontology Alignment Evaluation Initiative 2009</article-title>
          .
          <source>In: Proceedings of the 4th International Workshop on Ontology Matching (OM-2009)</source>
          . vol.
          <volume>551</volume>
          . CEUR Workshop Proceedings, http://ceur-ws.
          <source>org</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Euzenat</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shvaiko</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          : Ontology Matching. Springer (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Hilley</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Cloud Computing: A Taxonomy of Platform and Infrastructure-level O erings</article-title>
          .
          <source>Tech. rep.</source>
          , College of Computing, Georgia Institute of Technology (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Kennedy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eberhart</surname>
            ,
            <given-names>R.C.</given-names>
          </string-name>
          :
          <article-title>Particle Swarm Optimization</article-title>
          .
          <source>In: Proceedings of IEEE International Conference on Neural Networks</source>
          . vol.
          <volume>4</volume>
          , pp.
          <year>1942</year>
          {
          <year>1948</year>
          . IEEE Computer Society (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Kennedy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eberhart</surname>
            ,
            <given-names>R.C.</given-names>
          </string-name>
          : Swarm Intelligence. Morgan Kaufmann (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Kennedy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eberhart</surname>
          </string-name>
          , R.C.
          <article-title>: A Discrete Binary Version of the Particle Swarm Algorithm</article-title>
          .
          <source>In: Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics</source>
          . vol.
          <volume>5</volume>
          , pp.
          <volume>4104</volume>
          {
          <fpage>4108</fpage>
          . IEEE Computer Society (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Killalea</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <source>Building Scalable Web Services. Queue</source>
          <volume>6</volume>
          (
          <issue>6</issue>
          ),
          <volume>10</volume>
          {
          <fpage>13</fpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Lenk</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klems</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nimis</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tai</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sandholm</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>What's inside the Cloud? An architectural map of the Cloud landscape</article-title>
          .
          <source>In: Proceedings of the ICSE Workshop on Software Engineering Challenges of Cloud Computing (CLOUD)</source>
          . pp.
          <volume>23</volume>
          {
          <fpage>31</fpage>
          . IEEE Computer Society (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mostaghim</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scriven</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Asynchronous Multi-Objective Optimisation in Unreliable Distributed Environments</article-title>
          ,
          <source>Studies in Computational Intelligence</source>
          , vol.
          <volume>210</volume>
          , pp.
          <volume>51</volume>
          {
          <fpage>78</fpage>
          . Springer (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Mika</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tummarello</surname>
          </string-name>
          , G.:
          <article-title>Web Semantics in the Clouds</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          <volume>23</volume>
          (
          <issue>5</issue>
          ),
          <volume>82</volume>
          {
          <fpage>87</fpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Newman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hunter</surname>
          </string-name>
          , J.:
          <article-title>Scalable Semantics { the Silver Lining of Cloud Computing</article-title>
          .
          <source>In: Proceedings of the IEEE Fourth International Conference on eScience</source>
          . pp.
          <volume>111</volume>
          {
          <fpage>118</fpage>
          . IEEE Computer Society (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Qiu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ekanayake</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beason</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gunarathne</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barga</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gannon</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Cloud Technologies for Bioinformatics Applications</article-title>
          . In
          <source>: Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS)</source>
          . pp.
          <volume>1</volume>
          {
          <fpage>10</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ashburner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosse</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bard</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bug</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ceusters</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goldberg</surname>
            ,
            <given-names>L.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eilbeck</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ireland</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mungall</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leontis</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rocca-Serra</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruttenberg</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sansone</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scheuermann</surname>
            ,
            <given-names>R.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shah</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Whetzel</surname>
            ,
            <given-names>P.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>The OBO Foundry: Coordinated Evolution of Ontologies to Support Biomedical Data Integration</article-title>
          .
          <source>Nature Biotechnology</source>
          <volume>25</volume>
          (
          <issue>11</issue>
          ),
          <volume>1251</volume>
          {
          <fpage>1255</fpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zacharias</surname>
            ,
            <given-names>V.:</given-names>
          </string-name>
          <article-title>RDF On Cloud Number Nine</article-title>
          .
          <source>In: Proceedings of the 4th Workshop on New Forms of Reasoning for the Semantic Web: Scalable &amp; Dynamic</source>
          . pp.
          <volume>11</volume>
          {
          <fpage>23</fpage>
          . CEUR Workshop Proceedings, http://ceur-ws.
          <source>org</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Urbani</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kotoulas</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maassen</surname>
          </string-name>
          , J., van
          <string-name>
            <surname>Harmelen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bal</surname>
          </string-name>
          , H.:
          <article-title>OWL Reasoning with WebPIE: Calculating the Closure of 100 Billion Triples</article-title>
          .
          <source>In: Proceedings of the 7th Extended Semantic Web Conference (ESWC)</source>
          .
          <source>LNCS</source>
          , vol.
          <volume>6088</volume>
          , pp.
          <volume>213</volume>
          {
          <fpage>227</fpage>
          . Springer (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Urbani</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kotoulas</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oren</surname>
          </string-name>
          , E.,
          <string-name>
            <surname>van Harmelen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Scalable Distributed Reasoning using MapReduce</article-title>
          .
          <source>In: Proceedings of the 8th International Semantic Web Conference (ISWC)</source>
          .
          <source>LNCS</source>
          , vol.
          <volume>5823</volume>
          , pp.
          <volume>634</volume>
          {
          <fpage>649</fpage>
          . Springer (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>