<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Monolingual and Cross-lingual Ontology Matching with CIDER-CL: evaluation report for OAEI 2013</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jorge Gracia</string-name>
          <email>jgracia@fi.upm.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kartik Asooja</string-name>
          <email>kartik.asooja@deri.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Digital Enterprise Research Institute, National University of Ireland</institution>
          ,
          <addr-line>Galway</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ontology Engineering Group, Universidad Politécnica de Madrid</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>CIDER-CL is the evolution of CIDER, a schema-based ontology alignment system. Its algorithm compares each pair of ontology entities by analysing their similarity at different levels of their ontological context (linguistic description, superterms, subterms, related terms, etc.). These elementary similarities are then combined by means of artificial neural networks. In its current version, CIDER-CL uses SoftTFIDF for monolingual comparisons and Cross-Lingual Explicit Semantic Analysis for comparisons between entities documented in different natural languages. In this paper we briefly describe CIDER-CL and comment on its results in the Ontology Alignment Evaluation Initiative 2013 campaign (OAEI'13).</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>State, purpose, general statement</title>
      <p>
        According to the high-level classification given in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], our method is a schema-based system (as opposed to instance-based or mixed systems), because it relies mostly on schema-level input information for performing ontology matching. CIDER-CL can operate in two modes: (i) as an ontology aligner, taking two ontologies as input and giving their alignment as output, and (ii) as a similarity service, taking two ontology entities as input and giving the similarity value between them as output. In the first case, the input to CIDER-CL is two OWL ontologies and a threshold value, and the output is an RDF file expressed in the Alignment format (http://alignapi.gforge.inria.fr/format.html), although it can easily be translated into other formats such as EDOAL (http://alignapi.gforge.inria.fr/edoal.html).
      </p>
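<p>The two operation modes described above can be sketched as follows. This is an illustrative outline only: CIDER-CL itself is a Java tool built on the Alignment API, and the class and method names below are hypothetical.</p>

```python
# Illustrative sketch of the two operation modes described above.
# The names (Aligner, similarity_service, align) are hypothetical;
# CIDER-CL itself is implemented in Java on top of the Alignment API.
class Aligner:
    def __init__(self, similarity, threshold):
        self.similarity = similarity  # callable: (entity1, entity2) -> float in [0, 1]
        self.threshold = threshold

    def similarity_service(self, e1, e2):
        # Mode (ii): similarity value between two ontology entities.
        return self.similarity(e1, e2)

    def align(self, entities1, entities2):
        # Mode (i): every cross-ontology pair scored at or above the threshold.
        return [(a, b, s)
                for a in entities1 for b in entities2
                if (s := self.similarity(a, b)) >= self.threshold]
```

<p>For example, with a trivial case-insensitive equality similarity and threshold 0.5, aligning ["Person", "Paper"] against ["person", "Review"] yields the single correspondence ("Person", "person", 1.0).</p>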
      <p>The type of alignment that CIDER-CL obtains is semantic equivalence. In its current implementation the following languages are covered: English (EN), Spanish (ES), German (DE), and Dutch (NL).</p>
    </sec>
    <sec id="sec-3">
      <title>Specific techniques used</title>
      <p>
        In this section we briefly introduce the monolingual and cross-lingual metrics used by
CIDER-CL, as well as the overall architecture of the ontology aligner.
SoftTFIDF. SoftTFIDF [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is a hybrid string similarity measure that combines TF-IDF,
a token-based similarity widely used in information retrieval [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], with an edit-based
similarity such as Jaro-Winkler [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] (although any other could be used instead).
      </p>
      <p>
        Typically, string comparisons to compute TF-IDF weights are based on exact matching (after some normalisation or tokenisation step). The idea of SoftTFIDF is to use an edit distance instead, to support a higher degree of variation between the terms. In particular, we use Jaro-Winkler similarity with a 0.9 threshold, above which two strings are considered equal. The SoftTFIDF measure has proven to be very effective when comparing short strings [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In our case, the corpus used by SoftTFIDF is dynamically created from the lexical information of the two compared ontologies (extracting their labels, comments, and URI fragments).
      </p>
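<p>To make the measure concrete, the following is a minimal Python sketch of SoftTFIDF (not the SecondString implementation actually used by CIDER-CL, and with a simple IDF smoothing for unseen tokens added as an assumption): TF-IDF weights are computed over a token corpus, and the tokens of the two strings are matched softly via Jaro-Winkler using the 0.9 threshold mentioned above.</p>

```python
import math
from collections import Counter

def jaro(s, t):
    """Jaro similarity between two strings."""
    if s == t:
        return 1.0
    ls, lt = len(s), len(t)
    window = max(ls, lt) // 2 - 1
    s_hit, t_hit = [False] * ls, [False] * lt
    matches = 0
    for i, c in enumerate(s):
        for j in range(max(0, i - window), min(lt, i + window + 1)):
            if not t_hit[j] and t[j] == c:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    k, transpositions = 0, 0
    for i in range(ls):
        if s_hit[i]:
            while not t_hit[k]:
                k += 1
            if s[i] != t[k]:
                transpositions += 1
            k += 1
    transpositions //= 2
    return (matches / ls + matches / lt + (matches - transpositions) / matches) / 3

def jaro_winkler(s, t, p=0.1, max_prefix=4):
    """Jaro similarity boosted by the length of the common prefix."""
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s, t):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

class SoftTFIDF:
    """Simplified SoftTFIDF: a cosine-style TF-IDF score where tokens are
    matched softly (Jaro-Winkler >= threshold) instead of exactly."""
    def __init__(self, corpus, threshold=0.9):
        self.threshold = threshold
        n = len(corpus)
        df = Counter()
        for doc in corpus:
            df.update(set(doc))
        self.idf = {w: math.log(n / df[w]) for w in df}
        self.oov_idf = math.log(n)  # assumed smoothing for unseen tokens

    def _weights(self, tokens):
        tf = Counter(tokens)
        w = {u: (1 + math.log(tf[u])) * self.idf.get(u, self.oov_idf) for u in tf}
        norm = math.sqrt(sum(v * v for v in w.values())) or 1.0
        return {u: v / norm for u, v in w.items()}

    def score(self, s_tokens, t_tokens):
        ws, wt = self._weights(s_tokens), self._weights(t_tokens)
        total = 0.0
        for u in ws:
            best, best_sim = None, self.threshold
            for v in wt:
                d = jaro_winkler(u, v)
                if d >= best_sim:
                    best, best_sim = v, d
            if best is not None:
                total += ws[u] * wt[best] * best_sim
        return total
```

<p>With this sketch, a misspelled token such as "conferense" still contributes to the score against "conference", because their Jaro-Winkler similarity exceeds the 0.9 threshold, whereas a hard TF-IDF match would score it as zero.</p>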
      <p>
        CL-ESA. For cross-lingual ontology matching we propose the use of CL-ESA [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], a cross-lingual extension of an approach called Explicit Semantic Analysis (ESA) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. ESA allows comparing two texts semantically with the help of explicitly defined concepts. This method uses the co-occurrence information of the words from the textual definitions of the concepts, using, for instance, Wikipedia articles. In short, ESA extends a simple bag-of-words model to a bag-of-concepts model. Some reports [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] have demonstrated the good behaviour of CL-ESA for certain tasks such as cross-lingual information retrieval.
      </p>
      <p>To compare two texts in different languages semantically, Wikipedia-based CL-ESA represents the two texts as vectors in a vector space that has the Wikipedia titles (articles) as dimensions, each vector in its own language-specific Wikipedia. The magnitude of each title/dimension is the associativity weight of the text to that title. To quantify this associativity, the textual content of the Wikipedia article is utilized. This weight can be calculated using different methods, for instance, the TF-IDF score.</p>
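<p>The comparison itself reduces to a cosine between two sparse concept vectors. Below is a minimal sketch with made-up weights; in the real system the vectors come from language-specific Wikipedia indexes, with dimensions identified cross-lingually (e.g., by the DBpedia URIs described below).</p>

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse ESA vectors, given as
    dicts mapping a concept identifier to an associativity weight."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical vectors for an English and a German label; only the
# shared, language-independent dimensions contribute to the score.
vec_en = {"dbpedia:Conference": 0.9, "dbpedia:Academic_paper": 0.4}
vec_de = {"dbpedia:Conference": 0.8, "dbpedia:University": 0.3}
```

<p>Here the two texts never share any surface words; they are compared purely through the concepts they are associated with.</p>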
      <sec id="sec-3-1">
        <title>3 http://alignapi.gforge.inria.fr/format.html 4 http://alignapi.gforge.inria.fr/edoal.html</title>
        <p>
          For implementing CL-ESA, we followed an information retrieval-based approach, creating a Lucene inverted index of the Wikipedia extended abstracts that exist in all the considered languages, i.e., EN, ES, NL, and DE. To create the weighted vector of concepts, the term is searched over the index of the respective language to retrieve the top associated Wikipedia concepts, and the Lucene ranking scores are taken as the associativity weights of the concepts to the term. We used DBpedia URIs [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] as the pivot between cross-lingual Wikipedia spaces, to identify a Wikipedia concept regardless of the language.
        </p>
        <p>Scheme of the Aligner. Briefly explained, the alignment process is as follows (see Figure 1):
1. First, the ontological context of each ontology term is extracted. This process is enriched by applying a lightweight inference mechanism (typically transitive inference, although RDFS or more complex rules can also be applied, at the cost of processing time), in order to add more semantic information that is not explicit in the asserted ontologies.
2. Second, similarities are computed between different parts of the ontological context. In particular, ten different features are considered: labels, comments, equivalent terms, subterms, superterms, direct subterms, and direct superterms (both for classes and properties), plus properties, direct properties, and related classes (for classes) or domains, direct domains, and ranges (for properties).
3. Third, the different similarities are combined within an ANN to provide a final similarity degree. CIDER-CL uses four different neural networks (in particular, multilayer perceptrons) for computing monolingual and cross-lingual similarities between classes and properties, respectively.
4. Finally, a matrix (M in Figure 1) with all similarities is obtained. The final alignment (A) is then extracted from it, finding the highest-rated one-to-one relationships among terms and filtering out the ones below the given threshold.</p>
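<p>The extraction step can be illustrated with a small sketch: a greedy best-first selection of one-to-one correspondences from the similarity matrix, discarding pairs below the threshold. This is an assumption-laden illustration; the actual extraction procedure of CIDER-CL may differ in its details.</p>

```python
def extract_alignment(sim, threshold):
    """Greedy extraction of one-to-one correspondences from a similarity
    matrix given as {(term1, term2): similarity}."""
    ranked = sorted(sim.items(), key=lambda kv: kv[1], reverse=True)
    used1, used2, alignment = set(), set(), []
    for (t1, t2), s in ranked:
        if s < threshold:
            break  # entries are sorted, so all remaining pairs are below threshold
        if t1 not in used1 and t2 not in used2:
            alignment.append((t1, t2, s))
            used1.add(t1)
            used2.add(t2)
    return alignment
```

<p>Each source and target term is used at most once, so a term's second-best candidate is kept only if its best candidate was already claimed by a higher-rated pair.</p>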
        <p>Implementation. Some datasets used in OAEI campaigns are open, with the reference alignments available for download. We used part of such data to train our system. In particular, we chose a subset of the OAEI’11 benchmark track to train our neural networks for the monolingual case. We used the whole dataset except cases 202 and 248-266, which present a total absence or randomization of labels and comments (their variations, 248-2, 248-4, etc., were not excluded, however). The reference alignments of the conference track, which are also open, were added to the training data set as well.</p>
        <p>The use of the benchmark track for adjusting the ANNs is motivated by the fact that it covers many possible situations and variations well, such as the presence or absence of certain ingredients (labels, comments, etc.) or the effect of aligning at different granularity levels (flattened/expanded hierarchies). Further, we also added data from the conference track to include training data coming from “real world” ontologies.</p>
        <p>For the cross-lingual case, we trained the neural networks with a subset of the ontologies of the OAEI’13 Multifarm track (in EN, ES, DE, and NL): cmt, conference, confOf, and sigkdd. Comparisons were run among the different ontologies in the different languages, excluding comparisons between the same ontologies. Due to the slow performance of CL-ESA, we decided to perform an attribute selection analysis to discover which features have more predictive power. As a result, we limited the system to compute these features for classes: labels, subterms, direct superterms, direct subterms, and properties; while for properties they were limited to labels, subterms, and ranges.</p>
        <p>
          CIDER-CL has been developed in Java, extending the Alignment API [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. To create and manipulate neural networks we use the Weka data mining framework (http://www.cs.waikato.ac.nz/ml/weka/). For SoftTFIDF we use SecondString (http://secondstring.sourceforge.net/), and for CL-ESA we use the implementation developed by the Monnet project (http://www.monnet-project.eu/), which is available on GitHub as open source (https://github.com/kasooja/clesa).
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Adaptations made for the evaluation</title>
      <p>The weights and the configuration of the neural networks remained constant for all the tests and tracks of OAEI’13, as did the threshold. In particular, we selected a threshold of 0.0025. The intention of such a small value was to promote recall over precision (while filtering out some extremely low values). Later filtering can therefore be applied to perform a threshold analysis, as the organisers of some OAEI tracks do (e.g., the conference track).</p>
      <p>Some minor technical adaptations were needed to integrate the system into the Seals platform, such as solving compatibility issues with the libraries used by the Seals wrapper.</p>
    </sec>
    <sec id="sec-5">
      <title>Link to the system and parameters file</title>
      <p>The version of CIDER-CL used for this evaluation (v1.1) was uploaded to the Seals
platform: http://www.seals-project.eu/ . More information can be found at CIDER-CL’s
website http://www.oeg-upm.net/files/cider-cl .</p>
      <sec id="sec-5-1">
        <title>6 http://www.cs.waikato.ac.nz/ml/weka/ 7 http://secondstring.sourceforge.net/ 8 http://www.monnet-project.eu/ 9 https://github.com/kasooja/clesa</title>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Link to the set of provided alignments (in align format)</title>
      <p>The resultant alignments will be provided by the Seals platform:
http://www.seals-project.eu/</p>
      <sec id="sec-6-1">
        <title>Results</title>
        <p>For the OAEI’13 campaign, CIDER-CL participated in all the Seals-based tracks (http://oaei.ontologymatching.org/2013/seals-eval.html). In the following, we report the results of CIDER-CL for the benchmark, conference, anatomy, and multifarm tracks. For the other tracks, the system was either not fit for the type of evaluation (e.g., the interactive track) or could not complete the task (e.g., library). Details about the test ontologies, the evaluation process, and the complete results for all tracks can be found at the OAEI’13 website (http://oaei.ontologymatching.org/2013).</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Benchmark</title>
      <p>This year, a blind test set was generated based on a seed ontology of the bibliographic domain. Out of the 21 systems participating in this track, CIDER-CL was among the three best systems in terms of F-measure. In particular, the obtained results were:</p>
      <p>Precision(P)=0.85, Recall(R)=0.67 and F-Measure(F)=0.75</p>
      <p>This compares to F=0.41 for edna, a simple edit distance-based baseline. In addition, confidence-weighted measures were also computed for those systems that provided a confidence value. In almost all cases the results were worse, as was the case for CIDER-CL: P=0.84, R=0.55, and F=0.66</p>
      <p>The time spent in the evaluation was also measured. CIDER-CL took 844 19 seconds, which was slower than most of the systems (the median value was 173 sec), although still far from the slowest one (10241 347 sec).</p>
    </sec>
    <sec id="sec-8">
      <title>Conference</title>
      <p>In this track, several ontologies from the conference domain were matched, resulting in 21 alignments. In this case the organisers explored different thresholds and selected the best achievable results. This test is not blind: the participants had the reference alignments at their disposal before the evaluation phase.</p>
      <p>Two reference alignments were used in this track: the original reference alignment
(ra1) and its transitive closure (ra2). Two baselines (edna and string equivalence) were
computed for comparison. Notice that the results for CIDER-CL in this track are merely
illustrative and should not be taken as a proper test, due to the fact that part of the
training data of its neural networks came from the conference track reference alignments
(i.e., training and test data coincide partially).</p>
      <p>Out of the 25 systems participating in this track (some of them were variations of the same system), CIDER-CL's performance was close to the average. The results were:
test ra1 (original): P = 0.75, R = 0.47, and F = 0.58 with threshold = 0.14
test ra2 (entailed): P = 0.72, R = 0.44, and F = 0.55 with threshold = 0.08
CIDER-CL was in the group of systems that performed better than the two baselines for ra2, and between the two baselines for ra1. The results for ra1 illustrate an improvement with respect to the results obtained by its previous version (CIDER v0.4) for the same test at OAEI’11 (F=0.53). The runtime was also registered: CIDER-CL took less than 10 minutes to compute the 21 alignments. The other systems ranged from 1 minute to more than 40 minutes.
(Footnotes: 10. http://oaei.ontologymatching.org/2013/seals-eval.html; 11. http://oaei.ontologymatching.org/2013)</p>
    </sec>
    <sec id="sec-9">
      <title>Anatomy</title>
      <p>This year, the current version of CIDER-CL completed the task and gave results for the first time. In previous editions of OAEI, CIDER timed out and did not finish the task, due to the large size of the involved ontologies. The results are:</p>
      <p>P = 0.65, R = 0.73, F = 0.69, R+ = 0.31</p>
      <p>These results are below the average of the overall results (F-measure ranging from 0.41 to 0.94, with a median value of 0.81). An “extended recall” (R+) was also computed, that is, the proportion of detected non-trivial correspondences (those that do not have the same normalized label). For this metric CIDER-CL behaved better than the median value (0.23). In terms of running time, CIDER-CL was the third slowest system (12308 sec) in this track, after discarding those that timed out.</p>
    </sec>
    <sec id="sec-10">
      <title>Multifarm</title>
      <p>This track is based on the alignment of ontologies in nine different languages: EN, DE, ES, NL, CZ, RU, PT, FR, and CN. All pairs of languages (36 pairs) were considered in the evaluation. A total of 900 matching tasks were performed. There were 21 participants in this track, 7 of them implementing specific cross-lingual modules, as was the case for CIDER-CL.</p>
      <p>The organisers divided the results into two types: comparisons between different ontologies (type i) and comparisons between the same ontologies (type ii). The result summary published by the organisers aggregates the individual results for all the language pairs. In the case of CIDER-CL this hampers direct comparisons with other systems, because CIDER-CL only covers a subset of languages (EN, DE, ES, NL), and the alignments not produced in other languages penalised the overall results. For this reason we have filtered the language-specific results to consider only that subset of languages. The averaged results for CIDER-CL are:
type i (different ontologies): P = 0.16, R = 0.19, F = 0.17
type ii (same ontologies): P = 0.82, R = 0.16, F = 0.26</p>
      <p>For type ii, CIDER-CL obtained the 4th best result overall in terms of F-measure and the 3rd best result among the systems implementing specific cross-lingual techniques (the results for such systems ranged from F = 0.12 to F = 0.44 for the referred subset of languages). On the other hand, for type i CIDER-CL was in 8th position out of the 21 participants, although in last place among the systems implementing cross-lingual techniques (the F-measure of the other techniques ranged from 0.17 to 0.35).</p>
      <sec id="sec-10-1">
        <title>General comments</title>
        <p>The following subsections contain some remarks and comments about the obtained results and the evaluation process.</p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>Comments on the results</title>
      <p>CIDER-CL obtained good results for the benchmark track (third place out of 21 participants). This shows that our system performs well in domains for which it could be trained with available reference data, and also that SoftTFIDF is suitable for ontology matching. In contrast, the results for the anatomy track were relatively poor. This shows that creating a general-purpose aligner based on our technique is not straightforward. Adding more training data from other domains would help to solve this.</p>
      <p>The results from the multifarm track are rather modest, but the fact that even the best systems scored low illustrates the difficulty of the problem. We consider the use of CL-ESA promising for cross-lingual matching, but it will require more study and adaptation to achieve better results.</p>
    </sec>
    <sec id="sec-12">
      <title>Discussions on the way to improve the proposed system</title>
      <p>More reference alignments from “real world” ontologies will be used in the future for training the ANNs, in order to cover more domains and different types of ontologies. Regarding cross-lingual matching, there is still room for further improving the use of CL-ESA to that end. We also plan to combine this novel technique with others such as machine translation.</p>
      <p>Response time in CIDER-CL is still an issue and has to be further improved. In fact, CIDER-CL works well with small and medium-sized ontologies but not with large ones. Partitioning and other related techniques will be explored in order to solve this.</p>
    </sec>
    <sec id="sec-13">
      <title>Comments on the OAEI 2013 test cases</title>
      <p>The variety of tracks and the improvements introduced over the years make the campaign very useful for testing the performance of ontology aligners and analysing their strengths and weaknesses. Nevertheless, we miss blind test cases in more tracks, which would allow a fairer comparison between systems.</p>
      <sec id="sec-13-1">
        <title>Conclusion</title>
        <p>CIDER-CL is a schema-based alignment system that compares the ontological context of each pair of terms in the aligned ontologies. Several elementary comparisons are computed and combined by means of artificial neural networks. Monolingual and cross-lingual metrics are used in the matching.</p>
        <p>We have presented here some results of the participation of CIDER-CL in the OAEI’13 campaign. The results vary depending on the track, from the good results in the benchmark track to the relatively limited behaviour in anatomy, for instance. We confirmed that the proposed technique, based on ANNs, is suitable in conjunction with the SoftTFIDF metric for monolingual ontology matching. The use of the CL-ESA metric for cross-lingual matching is promising but requires more study.</p>
        <p>Acknowledgments. This work is supported by the Spanish national project BabeLData
(TIN2010-17550) and the Spanish Ministry of Economy and Competitiveness within
the Juan de la Cierva program.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kobilarov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Becker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Hellmann</surname>
          </string-name>
          .
          <article-title>DBpedia - a crystallization point for the web of data</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          ,
          <volume>7</volume>
          (
          <issue>3</issue>
          ):
          <fpage>154</fpage>
          -
          <lpage>165</lpage>
          , Sept.
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Schultz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sizov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sorg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Staab</surname>
          </string-name>
          .
          <article-title>Explicit versus latent concept models for cross-language information retrieval</article-title>
          .
          <source>In Proceedings of the 21st International Joint Conference on Artificial Intelligence, IJCAI'09</source>
          , pages
          <fpage>1513</fpage>
          -
          <lpage>1518</lpage>
          , San Francisco, CA, USA,
          <year>2009</year>
          . Morgan Kaufmann Publishers Inc.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>W. W.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ravikumar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Fienberg</surname>
          </string-name>
          .
          <article-title>A comparison of string distance metrics for name-matching tasks</article-title>
          .
          <source>In Proc. Workshop on Information Integration on the Web (IIWeb-03) @ IJCAI-03</source>
          , Acapulco, Mexico, pages
          <fpage>73</fpage>
          -
          <lpage>78</lpage>
          , Aug.
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          .
          <article-title>An API for ontology alignment</article-title>
          .
          <source>In 3rd International Semantic Web Conference (ISWC'04)</source>
          ,
          <source>Hiroshima (Japan)</source>
          . Springer, November
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Shvaiko</surname>
          </string-name>
          . Ontology matching. Springer-Verlag,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>E.</given-names>
            <surname>Gabrilovich</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Markovitch</surname>
          </string-name>
          .
          <article-title>Computing semantic relatedness using wikipedia-based explicit semantic analysis</article-title>
          .
          <source>In Proceedings of the 20th International Joint Conference on Artificial Intelligence</source>
          , pages
          <fpage>1606</fpage>
          -
          <lpage>1611</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>J.</given-names>
            <surname>Gracia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bernad</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Mena</surname>
          </string-name>
          .
          <article-title>Ontology matching with CIDER: Evaluation report for OAEI 2011</article-title>
          .
          <source>In Proc. of 6th Ontology Matching Workshop (OM'11)</source>
          , at 10th
          <source>International Semantic Web Conference (ISWC'11)</source>
          ,
          <source>Bonn (Germany)</source>
          , volume
          <volume>814</volume>
          .
          <source>CEUR-WS</source>
          , Oct.
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>V. V.</given-names>
            <surname>Raghavan</surname>
          </string-name>
          and
          <string-name>
            <given-names>M. S. K.</given-names>
            <surname>Wong</surname>
          </string-name>
          .
          <article-title>A critical analysis of vector space model for information retrieval</article-title>
          .
          <source>Journal of the American Society for Information Science</source>
          ,
          <volume>37</volume>
          (
          <issue>5</issue>
          ):
          <fpage>279</fpage>
          -
          <lpage>287</lpage>
          ,
          <year>1986</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>M.</given-names>
            <surname>Smith</surname>
          </string-name>
          .
          <article-title>Neural Networks for Statistical Modeling</article-title>
          . John Wiley &amp; Sons, Inc., New York, NY, USA,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>P.</given-names>
            <surname>Sorg</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          .
          <article-title>Exploiting wikipedia for cross-lingual and multilingual information retrieval</article-title>
          .
          <source>Data Knowl. Eng.</source>
          ,
          <volume>74</volume>
          :
          <fpage>26</fpage>
          -
          <lpage>45</lpage>
          , Apr.
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>W. E.</given-names>
            <surname>Winkler</surname>
          </string-name>
          .
          <article-title>String comparator metrics and enhanced decision rules in the Fellegi-Sunter model of record linkage</article-title>
          .
          <source>In Proceedings of the Section on Survey Research</source>
          , pages
          <fpage>354</fpage>
          -
          <lpage>359</lpage>
          ,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>