<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SYNTHESIS: Results for the Ontology Alignment Evaluation Initiative (OAEI) 2013</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Antonis Koukourikos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>George Vouros</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vangelis Karkaletsis</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Digital Systems, University of Piraeus</institution>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Informatics &amp; Telecommunications</institution>
          ,
          <addr-line>NCSR “Demokritos”</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The paper presents the SYNTHESIS platform, a system for automatic ontology alignment. The system supports the model-based synthesis of different individual matching methods under a cooperative framework. The configuration tested on the datasets provided by the OAEI 2013 tracks incorporates four matching methods. The paper briefly describes the system, presents the results obtained on the various OAEI 2013 campaign tracks, and discusses the system's strengths and weaknesses, as well as future work that will address the observed issues.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>Presentation of the System</title>
      <sec id="sec-2-1">
        <title>Specific Matching Techniques Used</title>
        <p>This section briefly describes the method that SYNTHESIS employs for synthesizing
different matching methods. It also describes the individual matching methods
incorporated in the configuration of the system that participated in the
OAEI 2013 campaign.</p>
        <p>
          Synthesis. The design and initial implementation of the synthesis method
are described in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. In that work, the synthesis of different matching methods is treated
as a coordination problem that aims to maximize the welfare of the interacting entities
(agents). In this setting, each agent corresponds to a specific ontology element and to
an individual matching method. Each agent is responsible for deciding on a
correspondence between its element and an element of the target ontology, in coordination
with the other agents, so as to preserve the semantics of the specifications. An agent is
characterized by (a) its state and (b) its utility function. The state ranges over the set of
those elements in the target ontology that the agent’s matching method assesses to
correspond to the agent’s element. A specific assignment to the state variable represents
the agent’s decision on a specific correspondence. However, the utility of an agent for a
specific correspondence depends on the states of its neighboring agents. Specifically, the
utility of an agent takes into account structural constraints derived from subsumption
relations among classes in the source ontology. These constraints represent dependencies
between agents’ decisions and must be satisfied for the computed correspondences to
preserve the semantics of the ontological specifications and to ensure the coherence
of the alignment.
        </p>
        <p>The neighbors of an agent A are those agents that correspond to the same
ontology element as A but to different matching methods, as well as those agents that
correspond to ontology elements subsumed by the ontology element of A.</p>
        <p>
          Agents are organized in graphs, on which they run the max-sum algorithm [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] to
compute a joint set of correspondences (i.e. an alignment) that maximizes the sum of
their utilities.
        </p>
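        <p>As a minimal illustration of this objective, the following Python sketch enumerates every joint assignment for a toy set of agents and keeps the one with the highest summed utility. It is not the max-sum algorithm itself, which reaches such assignments through message passing between agents; the candidate scores, the agreement bonus (0.5), the coherence penalty (1.0) and all class and method names are assumptions made for this example only.</p>
        <preformat>
```python
from itertools import product

# Hypothetical toy input: for each agent (source element, method), the
# candidate target classes and the similarity its method assigns to them.
candidates = {
    ("Paper", "string"): {"Article": 0.9, "Document": 0.4},
    ("Paper", "vsm"): {"Article": 0.6, "Document": 0.7},
    ("Publication", "string"): {"Document": 0.8, "Article": 0.5},
}
source_subsumptions = {("Paper", "Publication")}  # Paper is subsumed by Publication
target_subsumptions = {("Article", "Document")}   # Article is subsumed by Document


def consistent(child_target, parent_target):
    """True if the chosen targets preserve a source subsumption."""
    return (child_target == parent_target
            or (child_target, parent_target) in target_subsumptions)


def total_utility(assignment):
    """Sum of agent utilities: own similarity plus neighbour-dependent terms."""
    total = 0.0
    for (element, method), target in assignment.items():
        total += candidates[(element, method)][target]
        for (other, other_method), other_target in assignment.items():
            if other == element and other_method != method:
                # Agents for the same element should agree across methods.
                total += 0.5 if other_target == target else -0.5
            elif (other, element) in source_subsumptions:
                # "other" is subsumed by "element": penalise incoherent pairs.
                if not consistent(other_target, target):
                    total -= 1.0
    return total


def best_alignment():
    """Exhaustively search all joint states (feasible only for toy inputs)."""
    agents = list(candidates)
    best, best_score = None, float("-inf")
    for states in product(*(candidates[agent] for agent in agents)):
        assignment = dict(zip(agents, states))
        score = total_utility(assignment)
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score
```
        </preformat>
        <p>In this toy setting the VSM agent for Paper prefers Document when considered alone, but the agreement and coherence terms lead the joint optimum to map Paper to Article and Publication to Document.</p>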
        <p>SYNTHESIS is a generic platform that can be configured to incorporate any
number of individual matching methods.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Methods incorporated in the current version of the system</title>
        <p>The configuration of SYNTHESIS that was used for the OAEI 2013 campaign incorporates
four matching methods, most of them fairly standard. These are described below.</p>
        <p>
          COCLU. This is a string matching technique. It is realized by a partition-based
clustering algorithm, which divides the examined data (strings in our case) into clusters and
searches over the created clusters using a greedy heuristic [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. The clusters are
represented as Huffman trees, incrementally constructed as the algorithm generates and
updates the clusters by processing one string at a time. The decision to add a newly
encountered string to a given cluster is based on a score function, defined as the
difference between the summed length of the coded string tokens that are members of the
cluster and the corresponding length of the tokens in the cluster after the examined string
is added. The implementation incorporated into SYNTHESIS exploits and
compares the local names, labels and comments of the examined classes.
        </p>
        <p>
          VSM. This is a Vector Space Model-based method [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] that computes the similarity
between two documents. For mapping tasks, the pseudo-documents to be
compared are constructed as follows: each document corresponds to a class or property and
comprises the words in the vicinity of that element, i.e. all words found in (a) the local name,
label and comments of the class; (b) the local name, label and comments of each of the
class’s properties; and (c) the lexical information of its related classes, as defined in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. The
produced documents are represented as vectors of weighted index words, where each weight
is the number of occurrences of the word in the document. Cosine similarity is applied to
measure the similarity between two vectors.
        </p>
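        <p>The vector comparison used by VSM can be sketched in a few lines of Python. The sketch assumes raw occurrence counts as weights, as stated above; the tokenization rule and the two example pseudo-documents are hypothetical and are not taken from the SYNTHESIS implementation.</p>
        <preformat>
```python
import math
import re
from collections import Counter


def term_vector(text):
    """Map each lower-cased word of a pseudo-document to its occurrence count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between two term-frequency vectors."""
    vec_a, vec_b = term_vector(doc_a), term_vector(doc_b)
    dot = sum(count * vec_b[word] for word, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty pseudo-document is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical pseudo-documents built from names, labels and comments.
paper = "Paper a written paper submitted by an author"
article = "Article a paper written by an author"
similarity = cosine_similarity(paper, article)
```
        </preformat>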
        <p>
          CSR. The CSR method [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] computes subsumption relationships between pairs of
classes belonging to two distinct ontologies. The method treats the mapping problem as a
classification task, exploiting class properties and lexical information derived from
labels, comments, properties and instances of the compared classes. Each pair of
classes is represented as a feature vector whose length equals the number of
distinct features of the ontologies. The classifier is trained using information from both
ontologies, considering each ontology in isolation.
        </p>
        <p>
          LDM Alignment. This new method was conceived as part of a Linked Data management
system that uses unstructured textual information from the Web, in the form of
extracted relation triples, to perform various processes across the whole
spectrum of managing and maintaining Linked Data repositories, such as Ontology
Alignment and Enrichment, Repository Population, Linkage to external repositories, and
Content and Link Validation [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The method performs web searches using lexical
information from the local names, labels and instances of the compared classes. The web
documents returned by these searches are pre-processed to derive their
textual information, and relation tuples are extracted from each document. The sets of
relation tuples associated with each class are then compared to assess the similarity
of the classes.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>Adaptations Made for the Evaluation</title>
        <p>After some preliminary runs of the system on the datasets provided by the OAEI
campaign, it became evident that the main weakness of the system was its
inability to handle large ontologies (in terms of the number of elements in the
ontology). This is due to the current implementation of the generic synthesis process and
to the complexity of the methods incorporated in SYNTHESIS.</p>
        <p>In order to produce a system of acceptable efficiency, we introduced a dynamic
method allocation component in SYNTHESIS. The component performs a shallow
analysis of the input ontologies in terms of their size and structure. After several
runs with different method combinations on the campaign datasets, the following
allocation strategy was adopted: the CSR and LDM methods were excluded when the
source ontology included more than 300 classes and properties. Furthermore, CSR was
excluded if the examined ontologies were relatively flat, that is, if the hierarchy of
classes was not deeper than three subsumption levels.</p>
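        <p>The allocation strategy can be written down as a small rule set. The Python sketch below is one possible reading of it, not the SYNTHESIS code: the OntologyProfile structure and the function name are hypothetical, and it assumes that the flatness rule applies only when both examined ontologies are at most three levels deep.</p>
        <preformat>
```python
from dataclasses import dataclass

ALL_METHODS = ("COCLU", "VSM", "CSR", "LDM")
MAX_ELEMENTS_FOR_HEAVY_METHODS = 300  # classes plus properties in the source ontology
MAX_DEPTH_CONSIDERED_FLAT = 3         # subsumption levels


@dataclass
class OntologyProfile:
    """Outcome of the shallow analysis (hypothetical structure)."""
    num_classes: int
    num_properties: int
    hierarchy_depth: int


def allocate_methods(source, target):
    """Return the matching methods to run for a pair of ontologies."""
    selected = list(ALL_METHODS)
    source_size = source.num_classes + source.num_properties
    if source_size > MAX_ELEMENTS_FOR_HEAVY_METHODS:
        # Large source ontology: drop the two most expensive methods.
        selected = [m for m in selected if m not in ("CSR", "LDM")]
    # Assumption: "flat" means neither hierarchy exceeds three levels.
    flat = not any(o.hierarchy_depth > MAX_DEPTH_CONSIDERED_FLAT
                   for o in (source, target))
    if flat and "CSR" in selected:
        selected.remove("CSR")
    return selected
```
        </preformat>
        <p>For example, a source ontology with 250 classes and 100 properties exceeds the size threshold, so only COCLU and VSM would be run under these rules.</p>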
        <p>While the motivation for introducing this component was to obtain
meaningful results for as many OAEI tracks as possible, we aim to expand on the idea of
dynamically invoking different sets of mapping methods, depending on the specific alignment
task at hand. To this end, the method allocation component can become more elaborate
and analytic, selecting a specific configuration of mapping methods from a
much larger pool, so that the system has reasonable execution times while
preserving its performance in terms of precision and recall.</p>
      </sec>
      <sec id="sec-2-4">
        <title>Link to the System and Parameters File</title>
        <p>http://users.iit.demokritos.gr/~kukurik/SYNTHESIS.zip</p>
      </sec>
      <sec id="sec-2-5">
        <title>Link to the set of provided alignments</title>
        <p>http://users.iit.demokritos.gr/~kukurik/results.zip</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>The subsections that follow provide an overview and a brief analysis of the results
achieved by SYNTHESIS in the various tracks included in the OAEI 2013 Campaign.
SYNTHESIS was packaged and executed following the setup defined by the SEALS
platform and using the provided SEALS client executable JAR.</p>
      <sec id="sec-3-1">
        <title>Benchmark</title>
        <p>The following table summarizes the results obtained for the benchmark track, and
specifically the bibliography test set.</p>
        <p>We furthermore obtained results for the finance test set, as it was provided via the
SEALS platform. These results are summarized below:</p>
        <p>Average Runtime: 5217 msec</p>
        <p>Average Runtime: 974454 msec</p>
        <p>Finance Dataset, H-mean Precision: 0.504</p>
        <p>H-mean Recall: 0.603</p>
        <p>H-mean Recall: 0.605</p>
      </sec>
      <sec id="sec-3-2">
        <title>Anatomy</title>
        <p>SYNTHESIS was not able to finish its execution within a reasonable timeframe for
this dataset.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Conference</title>
        <p>The following table summarizes the results obtained for the conference dataset of
the 2013 campaign, as obtained via the SEALS client. The aggregate
results are as follows:</p>
        <table-wrap>
          <table>
            <thead>
              <tr>
                <th>Conference Dataset</th>
                <th>Value</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>H-mean Precision</td>
                <td>0.799</td>
              </tr>
              <tr>
                <td>H-mean Recall</td>
                <td>0.484</td>
              </tr>
              <tr>
                <td>Average Runtime</td>
                <td>5245 msec</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-3-4">
        <title>Multifarm</title>
        <p>The current version of SYNTHESIS does not directly address the mapping of
ontologies expressed in different languages. However, because the synthesis
approach matches ontologies in a way that respects their hierarchical structure, the
obtained results show a fairly acceptable precision. The following table summarizes the
results reported for this track.</p>
        <table-wrap>
          <table>
            <thead>
              <tr>
                <th />
                <th>Different Ontologies</th>
                <th>Same Ontologies</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>Precision</td>
                <td>0.30</td>
                <td>0.25</td>
              </tr>
              <tr>
                <td>Recall</td>
                <td>0.03</td>
                <td>0.03</td>
              </tr>
              <tr>
                <td>F-measure</td>
                <td>0.05</td>
                <td>0.04</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
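        <p>For reference, the F-measure conventionally used in such evaluations is the harmonic mean of precision and recall. The short Python sketch below assumes this balanced form and recomputes it from the first precision and recall pair reported for this track (0.30 and 0.03); since the reported figures are rounded to two decimals, recomputed values may differ from reported ones in the last digit.</p>
        <preformat>
```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for this track (rounded to two decimals).
print(round(f_measure(0.30, 0.03), 2))  # prints 0.05
```
        </preformat>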
      </sec>
      <sec id="sec-3-5">
        <title>Library</title>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>General Comments</title>
      <sec id="sec-4-1">
        <title>Comments on the results</title>
        <p>As evidenced by the obtained results, the main advantages of SYNTHESIS can be
summarized as follows:</p>
        <list list-type="bullet">
          <list-item>
            <p>SYNTHESIS manages to balance precision and recall across different
datasets, even with fairly simple matching methods running for many pairs of
ontologies.</p>
          </list-item>
          <list-item>
            <p>When adequate lexical information is available, i.e. when class names and
comments are not suppressed, SYNTHESIS is able to exploit it and produce very good
results.</p>
          </list-item>
          <list-item>
            <p>The constraints taken into account by the agents enable SYNTHESIS to compute
coherent alignments.</p>
          </list-item>
        </list>
        <p>In contrast, the main drawbacks of SYNTHESIS are:</p>
        <list list-type="bullet">
          <list-item>
            <p>The generic synthesis approach implemented in SYNTHESIS does not scale well
with respect to ontology size. While its runtime for small and medium-sized
ontologies is quite satisfactory, the system requires significantly more execution time
when dealing with large or very large ontologies.</p>
          </list-item>
          <list-item>
            <p>Scalability is also significantly affected by the performance of the individual
matching methods incorporated in the OAEI 2013 system configuration.</p>
          </list-item>
          <list-item>
            <p>The current configuration of SYNTHESIS is sensitive to the lack of adequate lexical
information for the ontology elements. In the test cases where information such as local
class names and labels was suppressed, the results were significantly worse. This is
due to the inclusion of mainly lexical matching methods in the current
configuration of the system.</p>
          </list-item>
        </list>
      </sec>
      <sec id="sec-4-2">
        <title>Discussions on the ways to improve the current system</title>
        <p>The drawbacks of the current configuration of SYNTHESIS point directly to the main
aspects that can be improved in the future. More specifically, the main problem in
various tracks of the campaign was that SYNTHESIS was not able to complete its
execution within an acceptable timeframe. This motivates us to examine different
scalability techniques and incorporate them in the system. The actions to improve scalability
can address the performance of the individual methods used, as well as the actual
process of synthesizing the different methods in SYNTHESIS.</p>
        <p>Another important step towards improving SYNTHESIS is to design and incorporate
a more elaborate method for choosing individual mapping methods. This is an
improvement in itself, but it is also a prerequisite for introducing additional methods
in SYNTHESIS and using the ones most appropriate for a specific alignment task.</p>
        <p>The ultimate goal is to incorporate methods that exploit the different types of
information available (lexical, semantic, structural) in various settings (e.g. ontologies in
different languages), by performing a pre-processing step that detects the characteristics
of an alignment task and selects the most appropriate methods for constructing the agents
that will take part in the synthesis process.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>Participation in OAEI 2013 has provided significant input for the evaluation
and evolution of our system. The main finding was the system’s inability to handle
large ontologies, which will be the focus of the immediate next steps of our
research. The more detailed feedback provided by the organizers of each track was also
of particular importance, as it provided further insight into the functionality and the
requirements of an alignment system.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>P.</given-names>
            <surname>Shvaiko</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          , “
          <article-title>Ontology Matching: State of the Art and Future Challenges</article-title>
          ”,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>158</fpage>
          -
          <lpage>176</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>V.</given-names>
            <surname>Spiliopoulos</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.A.</given-names>
            <surname>Vouros</surname>
          </string-name>
          , “
          <article-title>Synthesizing Ontology Alignment Methods Using the Max-Sum Algorithm</article-title>
          ”,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          , vol.
          <volume>24</volume>
          (
          <issue>5</issue>
          ), pp.
          <fpage>940</fpage>
          -
          <lpage>951</lpage>
          , May,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>A.</given-names>
            <surname>Farinelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rogers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Petcu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.R.</given-names>
            <surname>Jennings</surname>
          </string-name>
          , “
          <article-title>Decentralised coordination of low-power embedded devices using the max-sum algorithm</article-title>
          ”, in
          <source>Proc. of the 7th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2008)</source>
          , Estoril, Portugal,
          <year>2008</year>
          .
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>K.</given-names>
            <surname>Kotis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Valarakos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.A.</given-names>
            <surname>Vouros</surname>
          </string-name>
          , “
          <article-title>AUTOMS: Automating Ontology Mapping through Synthesis of Methods</article-title>
          ”, in
          <source>Proceedings of the OAEI (Ontology Alignment Evaluation Initiative) 2006 contest</source>
          , Ontology Matching International Workshop, Athens, Georgia, USA,
          <year>2006</year>
          .
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>V.</given-names>
            <surname>Spiliopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.G.</given-names>
            <surname>Valarakos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.A.</given-names>
            <surname>Vouros</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Karkaletsis</surname>
          </string-name>
          , “
          <article-title>SEMA: Results for the ontology alignment contest OAEI 2007</article-title>
          ”, in
          <source>OAEI (Ontology Alignment Evaluation Initiative) 2007 contest</source>
          , Ontology Matching International Workshop, Busan, Korea,
          <year>2007</year>
          .
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>V.</given-names>
            <surname>Spiliopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.A.</given-names>
            <surname>Vouros</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Karkaletsis</surname>
          </string-name>
          , “
          <article-title>On the discovery of subsumption relations for the alignment of ontologies</article-title>
          ”,
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          , vol.
          <volume>8</volume>
          (
          <issue>1</issue>
          ), pp.
          <fpage>69</fpage>
          -
          <lpage>88</lpage>
          , March
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>A.</given-names>
            <surname>Koukourikos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Karkaletsis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.A.</given-names>
            <surname>Vouros</surname>
          </string-name>
          , “
          <article-title>Exploiting unstructured web information for managing linked data spaces</article-title>
          ”, in
          <source>Proceedings of the 17th Panhellenic Conference on Informatics (PCI '13)</source>
          , Thessaloniki, Greece, September
          <year>2013</year>
          .
      </ref>
    </ref-list>
  </back>
</article>