=Paper= {{Paper |id=None |storemode=property |title=Multilingual Ontology Matching based on Wiktionary Data Accessible via SPARQL Endpoint |pdfUrl=https://ceur-ws.org/Vol-803/paper1.pdf |volume=Vol-803 |dblpUrl=https://dblp.org/rec/conf/rcdl/LinK11 }} ==Multilingual Ontology Matching based on Wiktionary Data Accessible via SPARQL Endpoint== https://ceur-ws.org/Vol-803/paper1.pdf
Multilingual Ontology Matching based on Wiktionary Data Accessible via SPARQL Endpoint

© Feiyu Lin
Jönköping University, Sweden
feiyu.lin@jth.hj.se

© Andrew Krizhanovsky
Institution of the Russian Academy of Sciences St.Petersburg Institute for Informatics and Automation RAS
andrew dot krizhanovsky@gmail.com


Abstract

    Interoperability is a feature required by the Semantic Web. It is provided by ontology matching methods and algorithms. However, ontologies are now presented not only in English but in other languages as well. Automatic translation is therefore important for obtaining correct matching pairs in multilingual ontology matching. Translation into many languages could be based on the Google Translate API, the Wiktionary database, etc. Considering the number of languages covered, the presence of manually crafted translations and the size of the dictionary, the most promising resource is Wiktionary, a collaborative project working on the same principles as Wikipedia. We developed a parser of Wiktionary and designed a machine-readable dictionary. The data of the machine-readable Wiktionary are stored in a relational database, but with the help of D2R server the database is presented as an RDF store. Thus, lexicographic information (definitions, translations, synonyms) can be obtained from a web service using SPARQL requests. In the case study, the problem is multilingual ontology matching based on Wiktionary data accessible via a SPARQL endpoint. Ontology matching results obtained using Wiktionary were compared with results based on the Google Translate API.

1 Introduction

Ontology matching is the process of finding correspondences between ontologies to allow them to interoperate. There are different methods, algorithms and systems designed for ontology matching [9].
    A relatively new direction is concerned with the alignment of ontologies presented in different languages, i.e. multilingual ontology matching [8], [24]. There are different strategies for multilingual ontology matching [24]: (1) the indirect alignment strategy based on the composition of alignments, and (2) direct matching between two ontologies, i.e., without intermediary ontologies and with the help of external resources (translations). The latter strategy is used in this work.
    The Ontology Alignment Evaluation Initiative (OAEI)1 was launched in 2004 with the goal of estimating and comparing different techniques and systems for ontology alignment. OAEI provides several multilingual datasets (ontologies and reference alignments), which were used in this work to evaluate the ontology matching system.
    This work presents a multilingual ontology matching platform. COMS (Context-based Ontology Matching System) [15] implements multilingual ontology matching based on the Google Translate API, the data of the English Wiktionary, and SPARQL technology.
    Wiktionary (www.wiktionary.org) is a multilingual and multifunctional dictionary. It contains not only word definitions, semantically related words (synonyms, hypernyms, etc.) and translations, but also pronunciations (phonetic transcriptions, audio files), hyphenations, etymologies, quotations, parallel texts (quotations with translations), and figures (which illustrate the meanings of words).
    Wiktionary is popular since it is freely available and contains a huge database of words with translations into many languages. The salient properties of Wiktionary are its multilinguality, its size, and its speed of evolution. It is difficult to compare other dictionaries with Wiktionary, since the data quickly become outdated. For example, PanDictionary was compared with Wiktionary data obtained in 2008, when Wiktionary had 403 413 translations [19]. Two years later, in 2010, the English Wiktionary contained more than twice as many translations (964 019).2 So Wiktionary is permanently growing, both in the number of entries and in the scope of languages. The English Wiktionary now contains entries in about 770 different languages. Wiktionary data are used:
    • in machine translation between Dutch and Afrikaans [21];
    • in the text parsing system NULEX, where some Wiktionary data (verb tense) were integrated with WordNet and VerbNet [18];
    • in speech recognition and speech synthesis, as a basis for rapid pronunciation dictionary creation [10].
    The Resource Description Framework (RDF) is a data model for representing information about World Wide Web resources. SPARQL [1] is a query language for this data model, standardized by the World Wide Web Consortium. SPARQL is now supported by most RDF triple stores.
    With the help of D2R server [4], the data extracted from Wiktionary are presented in the form of an RDF store, so lexicographic information extracted from Wiktionary is accessible via SPARQL requests. In the case study, the problem is multilingual ontology matching based on Wiktionary data accessible via a SPARQL endpoint.
    The next section describes the system architecture, consisting of the ontology matching system, the Wiktionary relational database, D2R server and a SPARQL client. Section 3 presents multilingual ontology matching experiments based on Wiktionary and the Google Translate API. The discussion concludes the paper.

1 See http://oaei.ontologymatching.org
2 See http://en.wiktionary.org/wiki/User:AKA_MBG/Statistics:Translations

Proceedings of the 13th All-Russian Scientific Conference “Digital Libraries: Advanced Methods and Technologies, Digital Collections” (RCDL’2011), Voronezh, Russia, 2011.

2 System architecture
This section describes the developed platform. Its key components are the Wiktionary relational database, the COMS ontology matching system [15], and D2R server, which provides access to the machine-readable Wiktionary via a SPARQL endpoint.

2.1 Machine-readable Wiktionary
There is an approach where data are extracted from different types of wiki sites for further processing and semantic search [21]. In that approach, special services were developed that export structured data into RDF/XML format. These services were designed and tailored for specific wiki engines (MediaWiki, DokuWiki).
    Our work had the more modest goal of extracting data from only one type of wiki site (Wiktionary), and moreover, from only one Wiktionary language edition (English). The important fact is that Wiktionary entries have a well-defined structure. However, this structure is specified not at the level of MediaWiki, but at the level of the texts of Wiktionary entries. Taking the structure of Wiktionary entries into account yields much more interesting information than just “structured data in RDF/XML format”. The following data were extracted from the English and Russian Wiktionaries: definitions, thesaurus and translations. An example of data extracted from the English Wiktionary entry “beautiful” is presented in Fig. 1: the first meaning, semantic relations (synonyms and antonyms) and translations related to the first meaning.
    The developed Wiktionary parser (wikt_parser) is one of several tools that parse Wiktionary data. Other tools include the Zawilinski parser (Polish words in the English Wiktionary) [14] and JWKTL (the English and German versions of Wiktionary)3. Our parser wikt_parser differs in two respects:
    1. It requires the XML dump to be loaded into a MySQL database first;
    2. It transforms the Wiktionary database into a machine-readable dictionary and saves it as a smaller database (MySQL or SQLite) for later use.
    The parser source code and the database of the machine-readable Wiktionary are available at the project site.4
    The automatic extraction and transformation of Wiktionary data are explained in [13]. The Wiktionary database used in the experiments and an example of Wiktionary-based translations are described in section “3.1 Wiktionary Database and SPARQL queries”.

Fig. 1. An example of data extracted from the Wiktionary entry “beautiful”: meaning (or definition), semantic relations (synonyms and antonyms) and translations of the first meaning.

3 See http://www.ukp.tu-darmstadt.de/software/jwktl/
4 See http://code.google.com/p/wikokit/

2.2 COMS
COMS (Context-based Ontology Matching System) consists of two parts: automatic ontology matching and context-based ontology matching [15]. Multilingual ontology matching focuses on automatic matching. Currently, COMS just finds the corresponding elements and presents the result as
“elementA = elementB similarity measure (float)”. Super-, sub- and inverse relationships are not included.
    The process of multilingual ontology matching involves two steps. First, COMS translates the entities of the source ontology into the language of the target ontology. Then it automatically applies the monolingual matching strategies described below. Fig. 2 shows the automatic matching strategy and evaluation for two ontologies. Jena (http://jena.sourceforge.net) is used to parse ontology elements.

    Fig. 2. An automatic ontology matching strategy and evaluation

2.2.1 String Matching Strategy
Different string matching algorithms can be used here. A good survey [3] covers string similarity methods for calculating string distance, from edit distance (e.g. Levenshtein distance, Monge-Elkan distance, Jaro-Winkler distance) to token-based distance functions (e.g. Jaccard similarity, TF-IDF or cosine similarity, Jensen-Shannon distance).
    We use the Jaro-Winkler distance [25] and the Smith-Waterman algorithm [23], as implemented in SimMetrics5 and SecondString, as our string matching methods. The threshold for the Jaro-Winkler score is 0.9. The Smith-Waterman algorithm helps find similar regions in two strings.

2.2.2 Structure Matching Strategy
The following structure matching strategies are implemented:
    1. If two of the three elements (subject, predicate and object) of triples from two ontologies are the same, the third element is assumed to be the same. For example, if two relations have the same domain and range, the relations are assumed to be the same. In future work, this will be extended to compare the common triples in the hierarchy.
    2. If the subclasses of two classes are the same, these two classes are assumed to be the same. In future work, this will be extended to compare the common classes in the hierarchy.
    3. Expanding tree method [17]. An ontology is expanded as a tree, and weights are set in the tree to calculate ontology concept similarity. Different levels are given different weights depending on the depth of the compared classes. The first-level concepts, which get weight 3, are the class’s subclasses and each relationship for which the class is the domain or range. The second-level concepts, which get weight 2, are the subclasses of the first-level concepts and the ranges of their relationships. Similarly, the third-level concepts, with weight 1, are obtained from the second-level concepts. We treat ontology matching as asymmetric. For example, a small ontology may perfectly match some parts of a large ontology; then the similarity between the small ontology and the large ontology is 1.0, but not vice versa.
       The similarity between two concepts is computed as:

       $$ sim(x, y) = \frac{\sum w_{\text{matched-concepts}}}{\sum w_{x_i}} $$

2.2.3 Lexical Matching Strategy
One of our ontology matching strategies uses WordNet (version 3.0). WordNet6 is based on psycholinguistic theories of word meaning and models not only word-meaning associations but also meaning-meaning associations [7]. WordNet consists of a set of synsets. Synsets have different semantic relationships such as synonymy (similar) and antonymy (opposite), hypernymy (superconcept)/hyponymy (subconcept) (also called the Is-A hierarchy / taxonomy), meronymy (part-of) and holonymy (has-a). The paper [16] provides an overview of how to apply WordNet in ontology matching. In COMS, we use WordNet as the lexical dictionary.

5 SimMetrics and SecondString are Java-based open-source packages used for string matching.
6 http://wordnet.princeton.edu
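Returning to the structure rules of Section 2.2.2, rule 1 and the expanding tree similarity of item 3 can be sketched in Python as follows. This is an illustration under stated assumptions, not COMS code: triples are written as hypothetical (domain, property, range) tuples, `same` is plain case-insensitive equality standing in for the string and lexical matchers, and `expanding_tree_sim` assumes the expanded tree of x has already been collected as a map from concept to level (1 to 3). The weights 3, 2 and 1 come from the text; dividing by the weights on x's side only is what makes the score asymmetric. The example triples follow the “articles” property discussed in Section 3.3.

```python
def same(a, b):
    """Stand-in for COMS's element matchers: case-insensitive equality."""
    return a.lower() == b.lower()


def infer_from_triples(triples1, triples2, equal=same):
    """Rule 1: if two positions of (domain, property, range) triples correspond,
    assume the elements in the remaining position correspond as well."""
    inferred = set()
    for d1, p1, r1 in triples1:
        for d2, p2, r2 in triples2:
            if equal(p1, p2) and equal(r1, r2) and not equal(d1, d2):
                inferred.add((d1, d2))  # same property and range -> domains align
            if equal(d1, d2) and equal(r1, r2) and not equal(p1, p2):
                inferred.add((p1, p2))  # same domain and range -> properties align
            if equal(d1, d2) and equal(p1, p2) and not equal(r1, r2):
                inferred.add((r1, r2))  # same domain and property -> ranges align
    return inferred


LEVEL_WEIGHT = {1: 3, 2: 2, 3: 1}  # weights of the three expanding-tree levels


def expanding_tree_sim(x_levels, matched):
    """sim(x, y): weight of x's tree concepts matched in y over the total weight of x's tree.

    x_levels maps each concept in the expanded tree of x to its level (1-3);
    matched is the set of those concepts that have a counterpart in y's tree.
    """
    total = sum(LEVEL_WEIGHT[level] for level in x_levels.values())
    hit = sum(LEVEL_WEIGHT[level] for c, level in x_levels.items() if c in matched)
    return hit / total if total else 0.0


# "articles" in test 206 (domain translated by Google) versus test 101.
print(infer_from_triples([("Review", "articles", "Article")],
                         [("Journal", "articles", "Article")]))  # {('Review', 'Journal')}
# Hypothetical tree of x: two first-level, one second-level, one third-level concept.
print(expanding_tree_sim({"a": 1, "b": 1, "c": 2, "d": 3}, {"a", "c"}))  # 5/9, about 0.556
```

Swapping the arguments of `expanding_tree_sim` (using y's tree as the denominator) generally gives a different value, which mirrors the small-ontology/large-ontology example in the text.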




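The Jiang-Conrath measure that COMS uses for the lexical strategy (its formula is given below) needs the lowest common subsumer (lcs) of two concepts and the information content IC(c) = -log p(c) of each concept. The Python sketch below illustrates both steps on the two hypernym chains of the “school” senses in Fig. 3. It is not the Java WordNet::Similarity code used in COMS: the keys such as "school#1" are shorthand introduced here, and the probabilities in PROB are made-up placeholders rather than corpus estimates, so its scores are not comparable with the value reported for school and institution.

```python
import math

# Child -> parent map transcribing the two hypernym chains of Fig. 3.
HYPERNYM = {
    "school#1": "educational institution",
    "educational institution": "institution",
    "institution": "organization",
    "organization": "social group",
    "social group": "group",
    "group": "abstraction",
    "abstraction": "entity",
    "school#2": "building",
    "building": "structure",
    "structure": "artifact",
    "artifact": "whole",
    "whole": "object",
    "object": "physical entity",
    "physical entity": "entity",
}

# Made-up concept probabilities p(c); an ancestor is never less probable than its descendants.
PROB = {"school#1": 0.0005, "institution": 0.002, "school#2": 0.001, "entity": 1.0}


def ancestors(concept):
    """Return the concept followed by all of its hypernyms up to the root."""
    chain = [concept]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def lcs(c1, c2):
    """Lowest common subsumer: the first ancestor of c1 that also subsumes c2."""
    subsumers_of_c2 = set(ancestors(c2))
    for concept in ancestors(c1):
        if concept in subsumers_of_c2:
            return concept
    raise ValueError(f"{c1!r} and {c2!r} share no subsumer")


def ic(concept):
    """Information content IC(c) = -log p(c)."""
    return -math.log(PROB[concept])


def jcn(c1, c2):
    """Jiang-Conrath similarity 1 / (IC(c1) + IC(c2) - 2 * IC(lcs)).

    A zero distance (identical concepts) is treated here as infinite similarity.
    """
    distance = ic(c1) + ic(c2) - 2 * ic(lcs(c1, c2))
    return math.inf if distance == 0 else 1.0 / distance


print(lcs("school#1", "institution"))            # institution
print(lcs("school#1", "school#2"))               # entity
print(round(jcn("school#1", "institution"), 3))  # 0.721 (= 1 / ln 4 with these probabilities)
```

Because school (sense 1) lies below institution, the lcs is institution itself and the distance reduces to IC(school) - IC(institution); with corpus-based IC values, the threshold 1.0 from the text would then be applied to the resulting score.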
    WordNet-Similarity7 implements several WordNet-based similarity measures in a Perl package. Java WordNet::Similarity8 is a Java implementation of WordNet::Similarity. The Jiang-Conrath measure [11] is chosen, with threshold 1.0, to find corresponding classes in ontology matching. The Jiang-Conrath measure is derived from the edge-based notion by adding information content as a decision factor:

    $$ jcn = \frac{1}{IC(synset_1) + IC(synset_2) - 2 \cdot IC(lcs)} $$

where lcs is the lowest common subsumer (the most specific common superconcept) of synset1 and synset2, and IC is the information content of a synset.
    For example, the noun school has seven senses in WordNet; the hypernym relations of the first two senses are (fragment):
    Sense 1
    school -- (an educational institution; "the school was founded in 1900")
         => educational institution -- (an institution dedicated to education)
            => institution, establishment -- (an organization founded and united for a specific purpose)
              => organization, organisation -- (a group of people who work together)
                 => social group -- (people sharing some social relation)
                   => group, grouping -- (any number of entities (members) considered as a unit)
                      => abstraction, abstract entity -- (a general concept formed by extracting common features from specific examples)
                       => entity -- (that which is perceived or known or inferred to have its own distinct existence (living or nonliving))
    Sense 2
    school, schoolhouse -- (a building where young people receive education; "the school was built in 1932"; "he walked to school every morning")
         => building, edifice -- (a structure that has a roof and walls and stands more or less permanently in one place; "there was a three-story building on the corner"; "it was an imposing edifice").

    Fig. 3 shows the fragment of noun senses with school and institution in the WordNet taxonomy. If school is used in Onto1 and institution is used in Onto2, school is a subconcept of institution in sense 1 of WordNet. After we apply the Jiang-Conrath measure, the similarity between school and institution is 1.25, which is greater than the threshold 1.0.

    Fig. 3. The fragment of noun senses with school and institution in WordNet taxonomy (figure; its two chains are: entity → abstraction, abstract entity → group, grouping → social group → organization, organisation → institution, establishment → educational institution → school (sense 1); entity → physical entity → object, physical object → whole, unit → artifact, artefact → structure, construction → building, edifice → school (sense 2))

    The following subsection describes how to access the machine-readable Wiktionary database via SPARQL queries.

2.3 SPARQL and D2RQ platform
D2R server uses RDF and SPARQL to provide access to a relational database [4]. The system takes SPARQL queries from the web and rewrites them into SQL queries via a specially prepared file (the D2RQ mapping file).
    The ontology matching system obtains translations from the machine-readable Wiktionary with the help of D2R server (Fig. 4).
    The D2RQ mapping file has to be created only once. After that, the relational database can be accessed via SPARQL: the D2RQ platform automatically translates SPARQL queries into SQL on the fly. Therefore, there is no need to replicate the database into an RDF store.

    Fig. 4. Architecture of the platform integrating the ontology matching system with the machine-readable Wiktionary accessible via SPARQL queries

7 http://www.d.umn.edu/~tpederse/similarity.html
8 http://www.cogs.susx.ac.uk/users/drh21/
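Because D2R server exposes the Wiktionary database as an ordinary SPARQL endpoint, any HTTP client can query it. The Python sketch below is a stand-in for the Java client described next, not its actual code. It fills the entry and language code into a query with the same graph pattern as Table 1, sends it as a GET request with a `query` parameter (the SPARQL protocol), and reads the standard SPARQL JSON results format. ENDPOINT and WIKPA_NS are placeholders to be replaced with the address of a concrete D2R installation and the namespace its mapping binds to `wikpa:`, and the server is assumed to be able to return JSON results.

```python
import json
import urllib.parse
import urllib.request

# Placeholders (assumptions): take both values from the actual D2R installation.
ENDPOINT = "http://localhost:2020/sparql"
WIKPA_NS = "http://example.org/wikpa/"


def sparql_literal(text):
    """Quote a Python string as a SPARQL string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_translation_query(word, lang_code="en", limit=7):
    """Return a query with the graph pattern of Table 1 for any entry and language."""
    return f"""PREFIX wikpa: <{WIKPA_NS}>
SELECT ?langCode ?langName ?translationWord
WHERE {{
  ?lang wikpa:lang_code {sparql_literal(lang_code)}; wikpa:lang_id ?langId.
  ?page wikpa:page_page_title {sparql_literal(word)}; wikpa:page_id ?pageId.
  ?lang_pos wikpa:lang_pos_page_id ?pageId; wikpa:lang_pos_lang_id ?langId;
            wikpa:lang_pos_id ?langPosId.
  ?meaning wikpa:meaning_id ?meaningId; wikpa:meaning_lang_pos_id ?langPosId.
  ?translation wikpa:translation_id ?translationId;
               wikpa:translation_lang_pos_id ?langPosId;
               wikpa:translation_meaning_id ?meaningId.
  ?langSource wikpa:lang_code ?langCode; wikpa:lang_name ?langName;
              wikpa:lang_id ?langIdSource.
  ?translation_entry wikpa:translation_entry_id ?translationEntryId;
                     wikpa:translation_entry_translation_id ?translationId;
                     wikpa:translation_entry_lang_id ?langIdSource;
                     wikpa:translation_entry_wiki_text_id ?wikiTextIdTrans.
  ?wiki_text wikpa:wiki_text_id ?wikiTextIdTrans;
             wikpa:wiki_text_text ?translationWord.
}} LIMIT {int(limit)}"""


def parse_results(payload):
    """Turn a SPARQL JSON results document into tuples ordered like head.vars."""
    names = payload["head"]["vars"]
    return [tuple(row[n]["value"] if n in row else None for n in names)
            for row in payload["results"]["bindings"]]


def fetch_translations(word, lang_code="en"):
    """Send the query to the endpoint (needs a running server) and parse the answer."""
    params = urllib.parse.urlencode({"query": build_translation_query(word, lang_code)})
    request = urllib.request.Request(
        ENDPOINT + "?" + params,
        headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(request) as response:
        return parse_results(json.load(response))


# Offline usage: parse a results document holding two of the rows shown in Table 2.
sample = {
    "head": {"vars": ["langCode", "langName", "translationWord"]},
    "results": {"bindings": [
        {"langCode": {"type": "literal", "value": "fr"},
         "langName": {"type": "literal", "value": "French"},
         "translationWord": {"type": "literal", "value": "pleuvoir à verse"}},
        {"langCode": {"type": "literal", "value": "ru"},
         "langName": {"type": "literal", "value": "Russian"},
         "translationWord": {"type": "literal", "value": "лить как из ведра"}},
    ]},
}
print(parse_results(sample))
```

Keeping query construction, transport and result parsing in separate functions lets the parsing be checked without a server, while `fetch_translations("rain cats and dogs")` would issue the request of Table 1 against a live endpoint.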
    A simple Wiktionary SPARQL client was written in Java (as part of the COMS ontology matching system). It can obtain a list of translations from the source language to the target language using Wiktionary data.

3 Experiments
The experiments are based on one benchmark track of OAEI. The reference ontology “test 101” is in English. This reference ontology contains 33 named classes, 24 object properties, 40 data properties, 56 named individuals and 20 anonymous individuals. The “test 206” of the benchmark contains one ontology in French. Therefore, the reference ontology (in English) is matched to the French ontology.
    In the French ontology of “test 206”, most words are presented in their canonical form (lemma). Only a few words are presented in non-canonical form, e.g. the French words “articles”, “auteurs”, “éditeurs”, “réalisateurs”, “pages”, “chapitres”, “communications”. Different word forms are recognized by the Google Translate system, but they are not taken into account by the translation system based on the machine-readable Wiktionary.
    Thus, our system first translates labels from English to French by using the multilingual English Wiktionary, before applying monolingual matching procedures.

3.1 Wiktionary Database and SPARQL queries
The dump of the English Wiktionary (as of October 30, 2010) was the source data for our experiments. The created database of the machine-readable Wiktionary contains:
    - 1 731 784 entries in total;
    - 269 405 English entries;
    - 154 990 French entries;
    - 964 019 translations in total;
    - 50 617 translations from English to French.
This database was used for translation in the ontology matching system and was accessed via SPARQL queries. Most SPARQL queries are simple and short [2]. However, this is not the case here (Table 1).
    Table 1 contains an example of a SPARQL request for the machine-readable Wiktionary. The input data for this request are (i) a language code (with value “en”, i.e. English), and (ii) a Wiktionary entry (“rain cats and dogs”). Different colors of the rows in Table 1 mark different parts of the request; each part corresponds to one table in the database.
    The result of this request is the list of translations of the English phrase “rain cats and dogs” into all languages presented in Wiktionary. Part of the answer is presented in Table 2. Several other SPARQL queries to Wiktionary are presented on the wiki page of the project.9

Table 1. Sample SPARQL query for the machine-readable Wiktionary

    # The wikpa: prefix is defined by the D2RQ mapping of the Wiktionary database;
    # a PREFIX declaration for it is needed before running the query.
    SELECT ?langCode ?langName ?translationWord
    WHERE {
      ?lang wikpa:lang_code "en";
            wikpa:lang_id ?langId.
      ?page wikpa:page_page_title "rain cats and dogs";
            wikpa:page_id ?pageId.
      ?lang_pos
            wikpa:lang_pos_page_id ?pageId;
            wikpa:lang_pos_lang_id ?langId;
            wikpa:lang_pos_id ?langPosId.
      ?meaning
            wikpa:meaning_id ?meaningId;
            wikpa:meaning_lang_pos_id ?langPosId.
      ?translation
            wikpa:translation_id ?translationId;
            wikpa:translation_lang_pos_id ?langPosId;
            wikpa:translation_meaning_id ?meaningId.
      ?langSource
            wikpa:lang_code ?langCode;
            wikpa:lang_name ?langName;
            wikpa:lang_id ?langIdSource.
      ?translation_entry
            wikpa:translation_entry_id ?translationEntryId;
            wikpa:translation_entry_translation_id ?translationId;
            wikpa:translation_entry_lang_id ?langIdSource;
            wikpa:translation_entry_wiki_text_id ?wikiTextIdTrans.
      ?wiki_text
            wikpa:wiki_text_id ?wikiTextIdTrans;
            wikpa:wiki_text_text ?translationWord.
    } LIMIT 7

Table 2. SPARQL result: translations of the phrase “rain cats and dogs”

    ?langCode  ?langName  ?translationWord
    cmn        Mandarin   傾盆大雨
    cs         Czech      lít jako z konve
    fr         French     pleuvoir des cordes
    fr         French     pleuvoir à verse
    fr         French     pleuvoir des hallebardes
    ru         Russian    лить как из ведра
    sv         Swedish    ösregna

3.2 Translation Implementation
Ontology labels are often concatenated, e.g. "dateDePublication", "IntervalleDePages", "Extrait-Compilation". The Google Translate system can recognize such a label and translate it directly, whereas the machine-readable Wiktionary cannot handle a concatenated label. To translate such labels properly, they are split into the sequence of their constituent words. For example, "dateDePublication" is split into “date De Publication”.
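The splitting step, and the later comparison of translations with labels of the other ontology, can be sketched in Python as follows. The helpers `split_label`, `jaro_winkler` and `best_match` are hypothetical and not COMS code (COMS relies on the SimMetrics and SecondString implementations). The Jaro-Winkler variant is the common one with prefix scale 0.1 and at most four prefix characters, the 0.9 threshold is the one given in Section 2.2.1, the candidate translations for “Film” and “Université” are taken from Table 3, and the target labels are a small hypothetical subset of “test 101”.

```python
def split_label(label):
    """Split a concatenated label at lower-to-upper case changes and at separators."""
    words, current = [], ""
    for ch in label:
        if not ch.isalnum():                 # '-', '_' or ' ' ends the current word
            if current:
                words.append(current)
            current = ""
        elif ch.isupper() and current and current[-1].islower():
            words.append(current)            # camelCase boundary, e.g. "date|De"
            current = ch
        else:
            current += ch
    if current:
        words.append(current)
    return words


def jaro(s1, s2):
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1, used2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the number of matched characters that appear in a different order.
    order1 = [c for c, u in zip(s1, used1) if u]
    order2 = [c for c, u in zip(s2, used2) if u]
    transpositions = sum(a != b for a, b in zip(order1, order2)) / 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1, max_prefix=4):
    """Jaro score boosted by the length of the common prefix (up to max_prefix chars)."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


def normalize(label):
    """Lower-case and concatenate the words of a label: 'MotionPicture' -> 'motionpicture'."""
    return "".join(word.lower() for word in split_label(label))


def best_match(source_label, translations, target_labels, threshold=0.9):
    """Best (target, score) pair at or above threshold, or None.

    With no translation available the source label itself is compared (the "isbn" case).
    """
    candidates = translations or [source_label]
    best = None
    for candidate in candidates:
        for target in target_labels:
            score = jaro_winkler(normalize(candidate), normalize(target))
            if score >= threshold and (best is None or score > best[1]):
                best = (target, score)
    return best


targets = ["MotionPicture", "School", "Book", "isbn"]  # hypothetical subset of test 101 labels
print(" ".join(split_label("dateDePublication")))     # date De Publication
print(best_match("Film", ["movie", "film", "cinema", "flick", "motion picture"], targets))
print(best_match("isbn", [], targets))                 # ('isbn', 1.0)
```

Normalizing both sides to lower-case concatenated words lets “motion picture” and “MotionPicture” compare as identical strings before the Jaro-Winkler threshold is applied.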
9 See http://code.google.com/p/wikokit/wiki/d2rqMappingSPARQL

    In the reference alignment, each correspondence consists of one element from “test 101” (in English), one element from “test 206” (in French), and their similarity value. Before the ontology matching strategies are applied, the total number of correct translations (the original French word and its translation, compared with the reference alignment) is 44 with the English Wiktionary and 60 with Google.
    The English Wiktionary yields fewer correct translations because Google returns the word itself as the translation if the word is not in its dictionary, e.g., “isbn”, “url”, “lccn”, etc., whereas the English Wiktionary provides no translation in this case (see Table 3, “isbn” example). On the other hand, the Google Translate API provides a translation of only one meaning of a word, while the English Wiktionary provides translations of multiple meanings (if the word has several); for example, “Université” is translated only to “University” by Google, but to “university; school” by the English Wiktionary (see Table 3). Google is good at translating concatenated words; for example, “nomCourt” is translated to “Shortname” directly (see Table 3).

Table 3. Example of translations by Google and Wiktionary

  Source French word  Translation by  Translation by Wiktionary                   Correspondence
  (test 206)          Google          (list of words)                             (test 101)
  Film                Film            movie; film; cinema; flick; motion picture  MotionPicture
  Référence           Reference                                                   Reference
  ExtraitLivre        BookExcerpt                                                 InBook
  Partie              Party           part; subset; partially                     Part
  Livre               Paper           book; pound                                 Book
  Conférence          Conference      lecture                                     Conference
  Compilation         Compilation                                                 Collection
  Université          University      university; school                          School
  isbn                isbn                                                        isbn
  Clé                 Key             key; radical; clef                          key
  nomCourt            Shortname       noun; name; short; court                    shortName
  dateDePub           Publication     date; of; to; by; 's;                       firstPublis

3.3 Precision and Recall
After we get the translation of the French ontology, COMS automatically applies the monolingual matching strategies described in Section 2.2. If there is no translation of a word, the original ontology element is used for string matching, for example, “isbn” in the Wiktionary case. Although COMS can get the separate meanings of a concatenated word, it does not support different combinations of their translations. For example, “nomCourt” is translated to “noun; name; short; court” while the correct translation is “shortName” (see Table 3), and COMS cannot arrive at “shortName”. “Film” is interpreted as “movie; film; cinema; flick; motion picture”, and COMS recognizes that it corresponds to “MotionPicture”.
    Other matching strategies, such as the WordNet-based one, are also applied; e.g., the similarity of “school” and “institution” is 1.25 (see Section 2.2.3). The structure matching strategy is applied as well. For example, in “test 206” the object property “articles” has the domain “Revue” (translated as “Review” by Google) and the range “Article”. In “test 101” the object property “articles” has the domain “Journal” and the range “Article”. Since “articles” is similar to “articles” and “Article” is similar to “Article”, “Revue” and “Journal” are considered similar by the structure similarity rules, even though “Review” and “Journal” have no string similarity. The final alignment result is based on the matching strategies presented in Section 2.2.
    OAEI proposes different evaluation measures, e.g., compliance and performance measures. The compliance measures include Precision, Recall, Fallout, F-measure, Overall, etc. Following [6], precision and recall are defined as follows.
    Definition (Precision). Given a reference alignment R, the precision of some alignment A is given by

    $$ P(A, R) = \frac{|R \cap A|}{|A|} $$

    It measures the fraction of the returned correspondences that are correct, and is a valid measure for ex post evaluations.
    Definition (Recall). Given a reference alignment R, the recall of some alignment A is given by

    $$ R(A, R) = \frac{|R \cap A|}{|R|} $$

    The provided reference alignment has 97 elements, i.e. |R| = 97. The retrieved alignment based on the English Wiktionary has 54 elements, i.e. |A| = 54, and the intersection |R ∩ A| = 53
                                                                     Precision is (see table 4):
-lication Date        in order to;          hed
                                                                                   R ∩ A 53
                      publication;                                   P( A, R) =          =    = 0.98
                      disclosure                                                    | A|   54
chapitres Chapters                          Chapters                Recall is (see table 4):
                                                                                   R ∩ A 53
éditeur     Editor       editor               Editor                 R( A, R) =         =    = 0.55
                                                                                    |R|   97
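The precision and recall computations reduce to simple ratios of set sizes. The following Python sketch reproduces them from the counts reported in this section: |R| = 97 for the reference alignment; |A| = 54 and |R ∩ A| = 53 for the Wiktionary-based alignment; |A| = 61 and |R ∩ A| = 60 for the Google-based alignment. The function names are illustrative only and are not part of COMS.

```python
def precision(found: int, correct: int) -> float:
    """P(A, R) = |R ∩ A| / |A|: share of returned correspondences that are correct."""
    return correct / found


def recall(reference: int, correct: int) -> float:
    """R(A, R) = |R ∩ A| / |R|: share of reference correspondences that were found."""
    return correct / reference


REFERENCE_SIZE = 97  # |R|, size of the provided reference alignment

# (|A|, |R ∩ A|) for each translation source, as reported in the text
runs = {
    "Wiktionary": (54, 53),
    "Google": (61, 60),
}

results = {}
for source, (found, correct) in runs.items():
    # Round to two decimals, as in Table 4
    p = round(precision(found, correct), 2)
    r = round(recall(REFERENCE_SIZE, correct), 2)
    results[source] = (p, r)
    print(f"{source:<10} P = {p:.2f}  R = {r:.2f}")
```

With the same reference alignment, the 7 additional correct correspondences obtained via Google raise recall from 0.55 to 0.62, while precision stays at 0.98 after rounding.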


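The recognition of “MotionPicture” from the Wiktionary translation list of “Film” (Table 3) can be illustrated by normalizing both the translation candidates and the target ontology labels before comparing them. The Python sketch below is a simplified assumption, not the COMS implementation: it only lower-cases labels, removes spaces, underscores and hyphens, and tests for exact equality, whereas COMS also relies on string-similarity and WordNet-based measures. The helpers `normalize` and `match_translations` are hypothetical.

```python
import re


def normalize(label: str) -> str:
    """Lower-case a label and drop separators, so that 'motion picture',
    'motion_picture' and 'MotionPicture' all become 'motionpicture'."""
    return re.sub(r"[\s_\-]+", "", label).lower()


def match_translations(candidates: list[str],
                       target_labels: list[str]) -> list[tuple[str, str]]:
    """Return (candidate, target label) pairs whose normalized forms are equal.
    Simplified stand-in for a real label matcher (assumption)."""
    index = {normalize(label): label for label in target_labels}
    return [(c, index[normalize(c)]) for c in candidates if normalize(c) in index]


# Wiktionary translations of the French label "Film" (Table 3)
film_translations = ["movie", "film", "cinema", "flick", "motion picture"]
# A few labels of the Test 101 ontology (Table 3)
test101_labels = ["MotionPicture", "Reference", "InBook", "Part", "Book"]

print(match_translations(film_translations, test101_labels))
print(match_translations(["book", "pound"], test101_labels))  # translations of "Livre"
```

The same normalization does not help for “dateDePublication”, whose Wiktionary translation list contains only separate words and no counterpart of “firstPublished”.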


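The structure-based reasoning that makes “Revue” and “Journal” similar can be sketched as a single rule over object properties: if two properties have similar names and similar ranges, their domains are proposed as a match, even when the domain labels share no string similarity. The Python sketch below illustrates that rule under a simplifying assumption: case-insensitive equality stands in for the similarity measures used by COMS. The `ObjectProperty` type and both functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectProperty:
    name: str
    domain: str
    range_: str  # trailing underscore avoids shadowing the built-in range()


def similar(a: str, b: str) -> bool:
    """Stand-in similarity: case-insensitive equality (an assumption)."""
    return a.lower() == b.lower()


def infer_domain_matches(source: list[ObjectProperty],
                         target: list[ObjectProperty]) -> set[tuple[str, str]]:
    """Propose (source domain, target domain) pairs for properties whose
    names and ranges are similar, regardless of the domain labels themselves."""
    matches = set()
    for s in source:
        for t in target:
            if similar(s.name, t.name) and similar(s.range_, t.range_):
                matches.add((s.domain, t.domain))
    return matches


# Property "articles" in Test 206 (French) and Test 101 (English), as in the text
test206 = [ObjectProperty("articles", domain="Revue", range_="Article")]
test101 = [ObjectProperty("articles", domain="Journal", range_="Article")]

print(infer_domain_matches(test206, test101))
```

Applied alone with a loose similarity test, such a rule could propose spurious domain pairs, so it is presumably meant to complement the translation-based and WordNet-based scores rather than replace them.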
    The alignment retrieved by COMS using the Google translation API has 61 elements, which means |A| = 61, and the intersection has |R ∩ A| = 60 elements.
    Precision is (see Table 4):
$$P(A,R) = \frac{|R \cap A|}{|A|} = \frac{60}{61} \approx 0.98$$
    Recall is (see Table 4):
$$R(A,R) = \frac{|R \cap A|}{|R|} = \frac{60}{97} \approx 0.62$$

  Table 4. Precision and Recall Comparison between Wiktionary and Google

                    Precision    Recall
      Wiktionary    0.98         0.55
      Google        0.98         0.62

4 Discussion and conclusion
During the course of these investigations, the following problems were solved:
     • the possibility of using translations extracted from Wiktionary via a SPARQL endpoint was successfully verified;
     • different translation mechanisms (manually crafted Wiktionary translations and statistics-based Google translations) were applied and compared in ontology matching.
The use of SPARQL has pros and cons.
    The merit of the approach (and of the SPARQL language in general) is that modifying a SPARQL request is simpler than working with SQL requests. This subjective point of view could be explained by the fact that one SPARQL request replaces many SQL requests (see Table 1). This makes it easier to go deep into details, since there is only one step between the formulation of a question and the result, i.e. there are no intermediate SQL requests.
    The use of SPARQL has caveats and limitations, though. A server which provides a SPARQL endpoint could easily be broken or overloaded by a poor, heavy or erroneous request [20]. An effective safeguard is to restrict the unbounded space of possible SPARQL requests to a strictly defined set of functions in a web service. Nevertheless, a SPARQL endpoint remains necessary for developers and experimenters in order to check hypotheses and to create complex queries quickly. But such work should be carried out in a test mode, i.e. the service can stop or fail, and it should be possible to restart it.
    In order to improve the results, the following problems should be solved:
    1) Several Wiktionaries should be integrated into one machine-readable dictionary, since different Wiktionaries contain both overlapping and unique data (see the analysis of English Wiktionary and Russian Wiktionary in [12]).
    2) Ontology context information should be taken into account (by integrating the translation process and the mapping activity [8], or by using information about the domain of the ontology [5]).
    3) In the present experiment, (1) French words are translated into English, and (2) monolingual matching procedures based on English WordNet are applied. One idea is to use the free French WordNet¹⁰ as an additional resource for matching two ontologies in English and French.
The development of the Wiktionary parser will be continued in future work, aiming at the extraction of quotations and Wiktionary context labels.

¹⁰ See http://alpage.inria.fr/~sagot/wolf-en.html

References
[1] M. Arenas and J. Perez. Querying Semantic Web Data with SPARQL. 30th ACM Symposium on Principles of Database Systems (PODS), Athens, Greece, June 2011. http://web.ing.puc.cl/~jperez/papers/pods11b.pdf
[2] M. Arias, J. D. Fernández, M. A. Martínez-Prieto, P. Fuente. An Empirical Study of Real-World SPARQL Queries. In: 20th International World Wide Web Conference, Hyderabad, India, March 28th, 2011. http://arxiv.org/abs/1103.5043
[3] W. W. Cohen, P. Ravikumar, S. E. Fienberg. A comparison of string distance metrics for name-matching tasks, 2003.
[4] R. Cyganiak and C. Bizer. D2R Server: A Semantic Web Front-end to Existing Relational Databases. Berliner XML Tage, Berlin, Germany, September 2006. http://richard.cyganiak.de/2008/papers/d2r-server-bxmlt2006.pdf
[5] M. Espinoza, A. Gomez-Perez, and E. Mena. Enriching an ontology with multilingual information. In: Proc. of 5th European Semantic Web Conference (ESWC'08), Tenerife, Spain, June 2008. http://sid.cps.unizar.es/PUBLICATIONS/POSTSCRIPTS/eswc08-localization.pdf
[6] J. Euzenat, R. G. Castro, and M. Ehrig. D2.2.2: Specification of a benchmarking methodology for alignment techniques. Technical report, NoE Knowledge Web project deliverable, 2004.
[7] R. Ferrer-i-Cancho. The structure of syntactic dependency networks: insights from recent advances in network theory. In: L. V., A. G. (Eds.), Problems of quantitative linguistics, 2005, 60-75.
[8] B. Fu, R. Brennan, D. O'Sullivan. Cross-Lingual Ontology Mapping and Its Use on the Multilingual Semantic Web. In: 1st Workshop on the Multilingual Semantic Web, at the 19th International World Wide Web Conference (WWW 2010), Raleigh, USA, April 27th, 2010, CEUR Vol. 571, 2010, 13-20. http://www.tara.tcd.ie/jspui/handle/2262/39188




[9] M. Gollapalli and X. Li. Survey on Data Linkage and Ontology Matching Techniques. Submitted for VLDB, 2011.
[10] Qingyue He. Automatic Pronunciation Dictionary Generation from Wiktionary and Wikipedia. Thesis. Karlsruhe Institute of Technology, 2009. http://csl.anthropomatik.kit.edu/index.php?id=25
[11] J. J. Jiang and D. W. Conrath. Semantic similarity based on corpus statistics and lexical taxonomy. In: International Conference Research on Computational Linguistics, Taiwan, 1997.
[12] A. Krizhanovsky. The comparison of Wiktionary thesauri transformed into the machine-readable format. 2010. http://arxiv.org/abs/1006.5040
[13] A. Krizhanovsky. Transformation of Wiktionary entry structure into tables and relations in a relational database schema. 2010. http://arxiv.org/abs/1011.1368
[14] Z. Kurmas. Zawilinski: a library for studying grammar in Wiktionary. In: Proceedings of the 6th International Symposium on Wikis and Open Collaboration, Gdansk, Poland, July 07-09, 2010. http://www.cis.gvsu.edu/~kurmasz/Software/#Zawilinski
[15] Feiyu Lin, J. Butters, K. Sandkuhl, F. Ciravegna. Context-based Ontology Matching: Concept and Application Cases. In: 10th IEEE International Conference on Computer and Information Technology (CIT), Bradford, UK, June 29-July 1, 2010.
[16] Feiyu Lin and Kurt Sandkuhl. A survey of exploiting WordNet in ontology matching. In: IFIP AI, 2008, 341-350.
[17] Feiyu Lin and Kurt Sandkuhl. A new expanding tree ontology matching method. On the Move to Meaningful Internet Systems 2007: OTM 2007 Workshops. Lecture Notes in Computer Science, 2007, Volume 4806/2007, 1329-1337.
[18] C. McFate and K. Forbus. NULEX: An Open-License Broad Coverage Lexicon. (accepted). In: The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, Oregon, USA, June 19-24, 2011. http://www.aclweb.org/anthology/P/P11/P11-2063.pdf
[19] Mausam, S. Soderland, O. Etzioni, D. S. Weld, K. Reiter, M. Skinner, M. Sammer, J. Bilmes. Panlingual Lexical Translation via Probabilistic Inference. Artificial Intelligence Journal (AIJ), Volume 174, Issues 9-10, 2010, 619-637. http://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/viewFile/1688/2281
[20] Lourens van der Meij, Antoine Isaac, Claus Zinn. A Web-based Repository Service for Vocabularies and Alignments in the Cultural Heritage Domain. Proceedings of the 7th Extended Semantic Web Conference (ESWC 2010), Heraklion, Greece, 30 May - 3 June 2010. http://www.few.vu.nl/~aisaac/papers/STITCH-Repository-ESWC10.pdf
[21] F. Orlandi and A. Passant. Semantic Search on Heterogeneous Wiki Systems. In: Proceedings of the 6th International Symposium on Wikis and Open Collaboration (WikiSym10), ACM, 2010, 1-10. http://www.deri.ie/about/team/member/fabrizio_orlandi/
[22] P. Otte and F. M. Tyers. Rapid rule-based machine translation between Dutch and Afrikaans. In: 16th Annual Conference of the European Association of Machine Translation, EAMT11, 2011. http://xixona.dlsi.ua.es/~fran/publications/eamt2011a.pdf
[23] T. F. Smith, M. S. Waterman. Identification of common molecular subsequences. Journal of Molecular Biology, Volume 147 (1981) 195-197.
[24] C. Trojahn, P. Quaresma, R. Vieira. An API for Multi-lingual Ontology Matching. In: Proceedings of LREC, 2010. http://disi.unitn.it/~p2p/RelatedWork/Matching/Trojahn_691_Paper.pdf
[25] W. E. Winkler. The State of Record Linkage and Current Research Problems. Technical Report, 1999.

♣ Part of this work was financed by the Foundation (The Swedish Institute), project CoReLib. Some parts of the research were carried out under projects funded by grants #09-07-00066, #09-07-00436 and #11-01-00251 of the Russian Foundation for Basic Research, and by a project of the research program “Intelligent information technologies, mathematical modelling, system analysis and automation” of the Russian Academy of Sciences. This work is partly funded by the project DEON (Development and Evolution of Ontologies in Networked Organizations) based on a grant from STINT (The Swedish Foundation for International Cooperation in Research and Higher Education); grant IG 2008-2013.



