<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multilingual Dictionary Linking and Aggregation: Quality from Consistency</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kun Ji</string-name>
          <email>kun.ji@helsinki.fi</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shanshan Wang</string-name>
          <email>shanshan.wang@helsinki.fi</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lauri Carlson</string-name>
          <email>lauri.carlson@helsinki.fi</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Helsinki, Department of Modern Languages</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The growth of Web-accessible dictionaries and term data has led to a proliferation of platforms distributing the same lexical resources in different combinations and packagings. Finding the right word or translation is like finding a needle in a haystack. The quantity of the data is undercut by the doubtful quality of the resources. Our aim is to cut down the quantity and raise the quality by matching and aggregating entries within and across dictionaries. In this exploratory paper, our goal is to see how far we can get by using information extracted from multiple dictionaries themselves. Our hypothesis is that the more limited quantity of data in dictionaries is compensated by their richer structure and more concentrated information content. We hope to take advantage of the structure of dictionaries by basing quality criteria and measures on linguistic and terminological considerations. The plan of campaign is to derive quality criteria to recognise well-constructed dictionary entries from a model dictionary, and then attempt to convert the criteria into language-independent frequency-based measures. As a model dictionary we use the Princeton WordNet. The measures derived from it are tested against data extracted from BabelNet.</p>
      </abstract>
      <kwd-group>
        <kwd>Information extraction</kwd>
        <kwd>Quality checking</kwd>
        <kwd>Aggregation</kwd>
        <kwd>Merging</kwd>
        <kwd>Linked data</kwd>
        <kwd>Edit distance</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Interactive Web and crowdsourcing have produced easily accessible lexical
resources of unprecedented size. Lexical resources such as WordNet or Wiktionary
are complemented by encyclopaedic data collections such as Wikipedia and
Wikidata. The quantity of the data brings along problems of quality, such as
errors, duplication and unclear provenance. Automatic methods including
machine learning techniques are called on to manage the wealth, but may also
contribute to the disorder.</p>
      <p>Typical dictionary data categories, or fields, differ in availability, unambiguity
and information potential. These three aspects often vary inversely: word labels
are abundant and simple, but polysemous; semantic relations are unambiguous
and informative, but scarce; subject field classifications and glosses have great
information potential that is hard to make precise. We must combine different
properties and vary our methods according to type.</p>
      <p>
        This position paper introduces our line of research (cf . [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]) that tries to
develop language-independent, linguistically motivated distributional methods for
quality checking and aggregating such linguistic linked data. We first illustrate
our approach with a selection of the kind of quality criteria we have in mind. As
an example of such a measure, we describe a simple distance measure which is
a variant of Levenshtein edit distance [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The measure is tested against labels,
subject fields, and glosses extracted from the multilingual dictionary BabelNet [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>The experiments indicate that a flat edit distance measure is less suited for
longer pieces of text. We are working on a more sophisticated language model
that takes into account the linguistic structure of glosses.</p>
      <p>The rest of the paper is structured as follows: Section 2 discusses related
work. Section 3 investigates candidate indicators and properties for quality checking
and aggregation of multiple dictionary data. Section 4 compares these properties
and describes the frequency-based distance measure. Section 5 describes our
implementation progress and evaluation. Section 6 discusses the results of our
work. Section 7 presents our conclusions and future work plan.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Ide and Veronis [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] argued that dictionaries contain too little information for
extracting knowledge bases from them. That is not our task: basically, we just want
dictionaries with fewer errors and duplicates. It remains true that dictionary
checking may benefit from external information sources. The hard part is to
ensure that these sources do not introduce more noise than they help suppress.
      </p>
      <p>
        Navigli and Ponzetto compile BabelNet [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] using word sense disambiguation
and machine translation as external sources. WordNet and Wikipedia are linked
by a mapping between WordNet senses and Wikipage titles. Missing translations
are collected from Wikipedia inter-language links and by machine translating
occurrences of the labels within sense-tagged corpora. They report 82 percent
mapping accuracy. We want to locate and fix the remaining errors.
      </p>
      <p>
        Eckard et al [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] match a French dictionary with a machine-translated French
WordNet looking for hypernym relations, using manually prepared regex
patterns to parse dictionary definitions.
      </p>
      <p>
        Semantic relatedness (SR), more generally, measures how much two (strings
of) words or concepts are related, counting all kinds of relations between them.
Zhang et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] present a hybrid SR method that generates a connection graph
between labels using WordNet semantic relations and Wikipedia contexts and
measures semantic relatedness by the density of the graph between two labels.
Semantic similarity can be indirectly measured by semantic relatedness. It may
thus bring useful evidence for our task, which is dictionary alignment and
aggregation. Token-level edit distance is known as WER (word error rate) in speech
recognition and machine translation research [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Quality Checking for Dictionary Merging</title>
      <p>Dictionaries are a good case of both the availability of and the need for matching and
aggregating Web-accessible data. There is a legion of mono- and multilingual
dictionaries, glossaries, thesauri and other vocabulary collections on the Web,
some but not all in RDF, some public and collectively maintained, many
commercial but openly accessible for querying. This multiplicity is also an
encumbrance. In language technology, one of the most common feature requests from
human translators is for ways to simplify the search for equivalents in the host of
available sources.</p>
      <p>Besides explicit URLs, dictionaries abound in implicit internal and
cross-dictionary links, created by shared labels (words, collocations), subject field
classifications, glosses, grammar and other properties. By aggregating dictionary
entries, we also implicitly address the problems of (i) identifying valid such links,
(ii) discarding misleading and duplicated links, and (iii) making useful links
explicit.</p>
      <sec id="sec-3-1">
        <title>Terminology</title>
        <p>To begin with, we define here some of the key concepts of our dictionary ontology.</p>
        <p>By a label we mean a language-identified baseform (lemma), represented in
RDF as "base"@lang. A monolingual or multilingual dictionary minimally
generates a cover (set of possibly overlapping subsets) of the labels. The cover
represents the neighbourhoods generated by a synonym or equivalent relation.
The members of the cover are called synsets, or equivalent sets (eqsets) if the
dictionary is multilingual. An Eqset is a multilingual synset. Separated by
language codes, eqsets form disjoint unions of synsets. Synsets/eqsets can be seen
to represent concepts or meanings.</p>
        <p>A sense is a pairing of a label and a synset that the label is a member of. The
more synsets a label belongs to, the vaguer it is. A label is n-way polysemous
if it belongs to n synsets. Dually, the more members a synset has, the wider its
meaning is. Special language labels are less polysemous than general language
labels. Ideally, terms should be monosemous (per subject field). Polysemy is
said to happen between subject fields, which supplies another consistency test.
To estimate if a label is a term, one may check the size of its synset. To check
if a term has been translated by a general language label, compare the sizes of
their synsets.</p>
        <p>Subject field headings show which domain or subject field a specific term
belongs to. When the same label appears in different meanings, subject field
classifications are used to distinguish the meanings. A gloss is the definition or
explanation associated with a label; it conveys the meaning of the concept
directly and so helps check whether two concepts are the same. Hypernyms and other
semantic relations serve the same purpose, as do part of speech and other grammar
categories.</p>
        <p>Synsets and eqsets are the target units that we reconstruct in our dictionary
alignment. Labels, subject fields, glosses and other indicators are the properties from
which our distance measure is to be inferred. In this paper, we restrict our attention
to labels, subject fields, and glosses.</p>
        <p>Synsets/eqsets may overlap because labels may belong to many synsets, by
way of vagueness (synonymy is not exact, synset boundaries are negotiable) or
polysemy (a label may belong to different but semantically related synsets). A
synset (eqset) is thus a syntactic representative of a meaning. The relation
"synonymy" or "equivalence" means "in the same neighbourhood". It is an
equivalence relation within the synset, but not transitive through shared members of
overlapping synsets.</p>
        <p>Hence overlapping synsets cannot be merged in general. Even if consistent,
the conjoined synset may be narrower than the originals, and the inferred
equivalences have less application than its premises. This is why WordNet translations
cannot be just merged into the synsets that they translate. We need to find
criteria for when synsets are safe to merge and what is the risk.</p>
        <p>Translation equivalences are more informative than monolingual synsets
because of mismatches between languages. Ambiguous words in one language may
have unambiguous equivalents in another. Synonyms and hypernyms that are
lexical in one language might be phrasal in another. This is particularly true
of direct translations of WordNet into another language, as lexical gaps in the
other language are often filled by phrasal definitions or paraphrases.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Merging Dictionaries</title>
        <p>To match two dictionaries, we may pool together the equivalent sets from two
dictionaries (with some markings to tell where they came from) and test if the
combined dictionary satisfies various quality criteria. In theory, we merge two
dictionaries by merging best matched entries and apply the quality criteria to
the result. In practice, merging and checking may happen interleaved.</p>
        <p>A multilingual dictionary may list bigger or smaller translation equivalences
(multilingual synsets) depending on the precision and completeness of the
translations. An optimal translation equivalence is to be reconstructed from many
such smaller translation equivalences in the different sources.</p>
        <p>To find whether two translation equivalences (say, from different sources) can be
merged, we may try to unify them by merging them and assuming some
equivalences (e.g. based on label identity). The merge creates many new equivalences.
Some of the new equivalences may be explicitly present or attested in data, some
not.</p>
        <p>We may consider a binary translation pair like "bank"@en - "pankki"@fi
as a base case of an eqset. In general, translation is not symmetric. By a
translation norm, a translation should not add information (change a true text into a
false text), but the opposite is not required: a translation may lose information
in the source to satisfy other desiderata of the translation brief. The relation
'x entails y' is a partial order (transitive and antisymmetric). Symmetry can
be restored by narrowing the context (e.g. with a subject field heading), or by
giving up transitivity: the weaker notion 'y may translate x' is a symmetric
non-transitive similarity relation. For constructing larger eqsets, the narrowing
solution is preferable, so we should prefer binary translation pairs whose
symmetry is attested in the data.</p>
        <p>If we deconstruct WordNet synsets/eqsets into a binary relation of pairwise
synonymies/equivalences between word senses, they form an equivalence relation
whose quotient sets are the synsets/eqsets. When such equivalence pairs are all
attested, it is easy to reconstruct synsets from them by just forming the partition
of the set into strong components (strongly connected graphs, cliques). Each such
component is a synset/eqset.</p>
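        <p>Assuming fully attested pairwise equivalences, the reconstruction just described can be sketched as a connected-components computation (a minimal Python illustration; the labels and pairs are invented, and when every pair is attested the components coincide with the cliques):

```python
from collections import defaultdict

def reconstruct_synsets(pairs):
    """Partition labels into synsets: connected components of the
    attested pairwise-equivalence graph, via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)  # union the two components

    groups = defaultdict(set)
    for label in parent:
        groups[find(label)].add(label)
    return list(groups.values())

# Invented attested equivalences between language-tagged labels
pairs = [('"man"@en', '"human"@en'),
         ('"human"@en', '"ihminen"@fi'),
         ('"bank"@en', '"pankki"@fi')]
synsets = reconstruct_synsets(pairs)  # two eqsets, of sizes 3 and 2
```
</p>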
        <p>
          The clique test is at the strict end of a scale of attestedness. It construes
synsets out of strongly connected components of the binary equivalence relation.
When the evidence for synsets is less complete, we may weaken the tests, with
increased risk. Assuming transitivity and antisymmetry (the translation norm
that translations are no narrower than original), we may check for equivalence by
looking for cycles ([
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]). Say we have three dictionaries, en-fi, en-sv, fi-sv.
We may merge en-fi and fi-sv and check the result against sv-en.
        </p>
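        <p>The cycle check can be illustrated as follows: compose the en-fi and fi-sv pairs and test whether the third dictionary closes each chain back to the starting label (a hedged sketch; the dictionary contents are invented):

```python
def check_cycles(en_fi, fi_sv, sv_en):
    """Compose en-fi-sv translation chains and split them into those
    confirmed by a closing sv-en pair and those left unconfirmed."""
    sv_en = set(sv_en)
    confirmed, unconfirmed = [], []
    for en, fi in en_fi:
        for fi2, sv in fi_sv:
            if fi == fi2:
                chain = (en, fi, sv)
                (confirmed if (sv, en) in sv_en else unconfirmed).append(chain)
    return confirmed, unconfirmed

en_fi = [("bank", "pankki"), ("shore", "ranta")]
fi_sv = [("pankki", "bank"), ("ranta", "strand")]
sv_en = [("bank", "bank")]  # no closing pair for the shore chain
confirmed, unconfirmed = check_cycles(en_fi, fi_sv, sv_en)
```
</p>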
        <p>Given multiple sources, there are two dimensions of degree of attestedness:
number of distinct attested equivalences and number of duplicate attestations
from di erent sources. Using such counts, we may construct quantitative variants
of the consistency tests.</p>
      </sec>
      <sec id="sec-3-3">
        <title>WordNet as Gold Standard</title>
        <p>We test our criteria and measures on WordNet by deconstructing WordNet
synsets down to senses or labels and then seeing to what extent we are able
to reconstruct the synsets from them. The deconstruction of WordNet synsets
can be done at word sense level or down to label level. Assume we are given the
following synsets.
1. Word senses. Sense deconstruction for the synsets produces three senses:
[ sense 1; label 'man'@en; property1 'Noun'; property2 'human being' ] .
[ sense 2; label 'human'@en; property1 'Noun'; property2 'human being' ] .
[ sense 3; label 'man'@en; property1 'Noun'; property2 'adult male' ] .</p>
        <p>Deconstructing synsets to word senses that inherit properties from the synsets,
can we reconstruct the synsets by merging the senses? The answer is
trivially yes if we retain the synset id or other key properties of synsets (like
gloss). This exercise is more relevant when word senses come from di erent
dictionaries.
2. Word labels Label deconstruction produces two labels:
[ label 'man@en'; property1 'Noun'; property2 'human being','adult male' ] .
[ label 'human@en' ; property1 'Noun', property2 'human being' ]
Distributing properties inherited from synset all the way to labels, can we
reconstruct senses and synsets by splitting the labels, without knowing how the
properties were clustered in the senses?</p>
        <p>In (1) the senses keep properties together, whereas in (2) we lose the
information regarding which properties go with which sense. In (2) there will be
many more items to merge. In the example above, the three senses cannot be
reconstructed from the label deconstruction since the ambiguity of 'man' has
been lost. The three senses can be merged back to two synsets from sense
deconstruction because the properties of the two senses agree. For other similar
or more complicated cases, we try reconstruction at varied granularity (word
sense, label sense, or combined) to find an optimal merging solution.</p>
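        <p>The sense-level merge of case (1) can be sketched as grouping senses whose inherited properties agree (a simplified illustration; the records mirror the example above, and the agreement rule is our own shorthand):

```python
def merge_senses(senses):
    """Merge word senses into synsets: senses whose non-label
    properties (here POS and gloss) agree fall into one synset."""
    synsets = {}
    for sense in senses:
        key = (sense["pos"], sense["gloss"])
        synsets.setdefault(key, set()).add(sense["label"])
    return synsets

senses = [
    {"label": "man@en",   "pos": "Noun", "gloss": "human being"},
    {"label": "human@en", "pos": "Noun", "gloss": "human being"},
    {"label": "man@en",   "pos": "Noun", "gloss": "adult male"},
]
synsets = merge_senses(senses)  # two synsets, as in the example
```
</p>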
        <p>In the general situation of combining different dictionaries, what get merged
are just such "senses", "terms" or "entries", which combine one or more labels
plus some other properties. The risk is in merging labels and/or properties
belonging to incorrect or repeated senses. The merger can lose information. Errors
and duplicates may arise. The task is to aggregate the entries to obtain the
most likely and meaningful synsets. Starting from a set of partial descriptions of
shared meaning, we try to merge the descriptions into manageable clusters. We
next look at some statistics on the different indicators in WordNet.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Comparing Properties</title>
      <p>In the previous section, we presented criteria that may be used in aggregating
synsets and eqsets. Our criteria depend on notions of identity or sufficient
similarity among labels and other dictionary fields/properties, which is the topic of
this section.</p>
      <p>We have not yet touched the problem of matching similar but not identical
properties. For other properties besides labels, such as glosses, matching is not
straightforward. In the general case, we want to deal with graded measures.</p>
      <p>The problem of matching two properties is not independent of matching
the whole entries. Matching property contents is an argument for matching the
entries, and vice versa. We set aside this complication for now.</p>
      <sec id="sec-4-1">
        <title>Sharing of Labels between Synsets</title>
        <p>The English WordNet RDF has about 100K synsets and 200K senses. It includes
translations in 21 languages. The most complete one is Finnish (300K), with
Malay, Japanese, Indonesian, and French next (over 100K each).</p>
        <p>To appreciate our chances in the reconstruction, we studied how far labels
alone go in measuring the similarity of synsets. Less than 0.1 percent of
hypernymous synset pairs in the English WordNet share one or more labels. About 4
percent of hypernymous eqset pairs share labels in at least one language. This
is another indication that translating the English WordNet creates redundant
distinctions for the target languages. For random synset pairs, the
corresponding percentages are one or two orders of magnitude smaller. So label sharing is a good,
though rare, indicator of synset similarity.</p>
        <p>Listing 1 shows some of the closest WordNet synsets measured in shared
labels and translation respectively. The first column is the number of shared
labels, the next two columns are the synset ids, followed by a sample shared
label and glosses for the two synsets.</p>
        <p>Listing 1 Closest WordNet Synsets Measured in Shared Labels and Translation
10 wn31:107741018-n wn31:112599160-n "mung bean"@eng 'mung seed' 'mung plant'
6 wn31:107137720-n wn31:107407761-n "scream"@eng 'cry' 'noise resembling cry'
6 wn31:104647089-n wn31:104717403-n "severity"@eng 'excessive sternness' 'hard to endure'
129 wn31:200825727-v wn31:200826456-v "admonish"@eng 'take to task' 'censure severely'
94 wn31:400046739-r wn31:400473918-r "extremely"@eng 'extreme degree' 'extraordinary degree'
81 wn31:200346415-v wn31:201654152-v "start"@eng 'take first step' 'get off the ground'</p>
      </sec>
      <sec id="sec-4-2">
        <title>String Relationships in Hypernyms</title>
        <p>A fraction of hyponymy relations are recognisable from their syntactic makeup
as phrasal species terms, each composed of a hypernym denoting the genus and
modifiers specifying differentia, for example skilled workman &lt; workman . In
the English WordNet, about one quarter of hypernym relations have this form,
mostly phrasal verbs and special field terms. Another fraction are suffixal (in
English, typically compounds), like workman &lt; man . When all of the above
types are included, 22 percent of hypernym relations contain at least one English
substring relationship. Apparently, substring relationships are a useful indicator,
but not strong enough alone.</p>
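        <p>The substring test can be implemented directly; the positive pairs are the examples from the text and the negative case is invented:

```python
def substring_hypernymy(hyponym, hypernym):
    """Heuristic: flag a hyponym-hypernym label pair whose hypernym
    occurs as a substring of the hyponym, covering phrasal species
    terms ('skilled workman' contains 'workman') and suffixal
    compounds ('workman' contains 'man')."""
    return hypernym in hyponym and hyponym != hypernym

pairs = [("skilled workman", "workman"),  # phrasal species term
         ("workman", "man"),              # suffixal compound
         ("dog", "animal")]               # no substring relation
flags = [substring_hypernymy(a, b) for a, b in pairs]
```
</p>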
      </sec>
      <sec id="sec-4-3">
        <title>Distance Measure</title>
        <p>To have a quantitative measure of the distance between similar labels and other
dictionary fields/properties, we implemented a language-independent character
frequency-based edit distance measure. The same measure is designed to be
applicable to subject field labels and glosses, possibly with different additional
information sources and parameter settings.</p>
        <p>
          Our distance measure is a two-level frequency weighted Levenshtein (edit)
distance measure [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. It is designed to be language-independent as far as feasible,
using only information available in the dictionary itself. With this desideratum
in mind, the measure derives edit costs (weights) from character and token
frequencies extracted from the input data or imported from external sources.
        </p>
        <p>Character-based Distance Measure for Comparing Labels.
We first calculate Levenshtein edit distances between tokens, with edit costs
weighted by frequencies of characters per string position. Specifically, character
cost grows with the variety (number of different characters) per position and the
information value (inverse frequency) of the character at the position.</p>
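        <p>One way to realise such a position-sensitive, frequency-weighted edit distance is sketched below; the concrete cost formula is illustrative, not the exact weighting of our implementation, and the token sample is invented:

```python
import math
from collections import Counter, defaultdict

def position_frequencies(tokens):
    """Character counts per string position over a token sample."""
    freq = defaultdict(Counter)
    for tok in tokens:
        for i, ch in enumerate(tok):
            freq[i][ch] += 1
    return freq

def char_cost(freq, pos, ch):
    """Cost grows with the variety at a position and with the
    information value (inverse frequency) of the character there."""
    counter = freq.get(pos)
    if not counter or counter[ch] == 0:
        return 1.0
    variety = len(counter)                          # distinct chars here
    inv_freq = sum(counter.values()) / counter[ch]  # rarity of ch here
    return math.log(1.0 + variety * inv_freq)

def weighted_levenshtein(s, t, freq):
    """Levenshtein distance with position/frequency-weighted costs."""
    d = [[0.0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(1, len(s) + 1):
        d[i][0] = d[i - 1][0] + char_cost(freq, i - 1, s[i - 1])
    for j in range(1, len(t) + 1):
        d[0][j] = d[0][j - 1] + char_cost(freq, j - 1, t[j - 1])
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            sub = (0.0 if s[i - 1] == t[j - 1] else
                   max(char_cost(freq, i - 1, s[i - 1]),
                       char_cost(freq, j - 1, t[j - 1])))
            d[i][j] = min(d[i - 1][j] + char_cost(freq, i - 1, s[i - 1]),
                          d[i][j - 1] + char_cost(freq, j - 1, t[j - 1]),
                          d[i - 1][j - 1] + sub)
    return d[len(s)][len(t)]

freq = position_frequencies(["function", "functional", "feature", "features"])
```

Under this weighting, an inflectional variant ends up closer to its lemma than an unrelated token of similar length.</p>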
        <p>As expected for English, early positions have more variation, while mid vowels
and dental consonants predominate at endings.
0: v=1 w=2 o=12 c=6 h=3 r=1 b=1 f=4 l=4 k=2 ?=1 s=6 g=1 i=5 a=20 e=10 d=2 p=4 n=1 t=12
1: s=5 g=1 i=8 p=1 t=1 n=16 e=8 a=10 y=1 c=1 o=6 w=1 b=1 f=2 x=4 h=13 u=1 r=9
2: l=2 f=1 r=4 c=1 o=4 y=1 m=2 v=6 t=9 n=10 p=1 d=3 e=3 a=8 i=6 g=1 s=7
3: r=1 a=2 e=9 l=3 t=10 n=2 d=2 f=1 g=2 w=1 i=11 s=4 c=5 o=1 m=4
4: g=3 i=6 s=2 a=1 e=4 n=5 t=9 p=2 o=3 -=1 m=1 u=1 r=4 h=2 l=2 b=1
5: g=4 i=2 s=1 t=1 n=3 d=2 p=1 e=5 a=2 y=6 o=2 c=3 w=1 v=1 l=2 b=2 f=1 r=2
6: e=5 a=1 r=2 p=1 n=3 t=3 l=3 v=1 i=2 c=1 o=2 y=1
7: t=1 n=3 l=1 d=3 e=3 a=1 c=3 y=1 i=1 s=2
8: e=3 g=2 d=1 t=1 n=1
9: e=1 a=1 g=1 n=1
10: t=1 i=1
11: n=1 l=1
12: e=1 y=1
13: d=1</p>
        <p>The tokens are normalised to types using token distances and token
frequencies as guides. The assumption in this reduction is that the dictionary or lemma
form is close in character distance to its inflections and derivations and no less
frequent than them. If a synonym dictionary is supplied, it is used in the
tokenisation, preferring types that occur in the dictionary. Abbreviations and
multiword phrases are also tokenised with the dictionary if supplied.</p>
        <p>The token distances obtained on the first level are used as token costs in
another Levenshtein round that compares multi-token strings (terms, glosses,
definitions etc.). This round uses a similar logic to the previous one, using
position-sensitive type frequencies to weight edit costs. (The built-in
assumption is that key terms occur early in glosses.) Besides the usual edit operations
(addition, deletion, substitution), gloss distance adds permutation (by lowering
the cost of substitutions of low-frequency terms if they are offset by an opposite
substitution elsewhere).</p>
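        <p>The second level can be sketched as Levenshtein over token sequences, with substitution costs taken from the first-level token distances; the cost table below is invented, and the permutation operation is omitted for brevity:

```python
def token_levenshtein(a, b, token_dist):
    """Second-level Levenshtein over token sequences; substitution
    cost is the first-level token distance (1.0 when unknown)."""
    def sub_cost(x, y):
        if x == y:
            return 0.0
        return token_dist.get((x, y), token_dist.get((y, x), 1.0))

    rows, cols = len(a) + 1, len(b) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = float(i)
    for j in range(1, cols):
        d[0][j] = float(j)
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(d[i - 1][j] + 1.0,  # token deletion
                          d[i][j - 1] + 1.0,  # token insertion
                          d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]))
    return d[rows - 1][cols - 1]

# Invented first-level distance between near-identical tokens
token_dist = {("feathers", "feather"): 0.1}
g1 = "soft immature feathers".split()
g2 = "soft feather".split()
dist = token_levenshtein(g1, g2, token_dist)  # one deletion + cheap substitution
```
</p>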
        <p>To give more weight to low-frequency terms, the character-based token edit
distances are scaled by token frequencies so that long edit distances to
low-frequency, high-information tokens (terms) are stretched exponentially at the
high-frequency end and short distances are correspondingly shrunk at the
opposite end. Under this metric, the short-end edit distances manage to single out
inflectional and derivational relations between significant keywords.
0.025269 221.000000 reciprocating reciprocal
0.024468 214.000000 functional function
0.023897 209.000000 experiencing experience
0.022639 198.000000 features feature
0.016808 147.000000 characteristic characterized
0.015550 136.000000 organisms organism
0.013835 121.000000 interacting interaction
0.012920 113.000000 accomplishment accomplishing
0.008804 77.000000 substances substance
0.003544 31.000000 independently independent
</p>
        <p>Synonym-enhanced Distance Measure for Comparing Glosses.
The character-based distance measure fails to capture similarities between glosses
that use unrelated but synonymous words. To remove this limitation, we import
semantic relatedness information from the dictionary itself. We did this here by
lifting WordNet synset and hypernym relations to a semantic relatedness relation
between labels. This construction is lossy in three ways: (1) the further apart two
synsets are in the hypernym hierarchy, the more loosely they are considered related;
(2) the larger the synset, the vaguer its meaning (in general): special language
concepts tend to have fewer synonyms than vaguer or context-dependent general
language meanings; (3) precision falls with sense count: a polysemous label is
a less sure indication of meaning than a monosemous one. We generate a fuzzy
set of 1.3M semantically related pairs of labels from the English WordNet, weighted
by the above counts so that more precise synonymies have more bearing than
fuzzier ones.</p>
        <p>In sum, the synonym-enhanced gloss distance may correctly predict semantic
distances between differently worded definitions of the same thing on the one
hand, and definitions pertaining to different concepts on the other hand. Table
1 shows the gloss distance for the terms 'REMOVAL' and 'Removal':
- wn31:100021914-n "any substance such as a chemical element or inorganic
compound that can be taken in by a green plant and used in organic
synthesis"
- REMOVAL "The formal expulsion or deportation of a non-citizen from the
United States when the non-citizen has been found removable for violating
the immigration laws. A person can be removed for overstaying a visa or for
breaking laws including immigration laws"
- Removal "The expulsion of an alien from the United States based on grounds
of either inadmissibility or deportability"</p>
        <p>Table 2 is a truncated Levenshtein distance matrix for the glosses
REMOVAL-Removal. A star marks a substitution, a plus an addition and a minus a deletion. The
minimum edit path can be traced by following pluses down, minuses to the right, and
stars diagonally down right.</p>
        <p>To evaluate our distance measure, we extracted the first 4130 synsets from
BabelNet using the Java API for the BabelNet 3.6.1 Lucene index download.
We tested the character-frequency based distance measure on the first 1000
English-language BabelNet labels in our extract, listing for each label its nearest
neighbour in the set according to the measure. The synonym dictionary was not
used here. 177 pairs were identical. 58 percent of the near neighbours came from
the same synset. This may be compared to the probability of a random pair of
labels coming from the same synset in our data (0.007).
"protein folding" ~ "folding"
"Misfolded protein" ~ "Misfolded"
"Misfoldings" ~ "Misfolding"
"Incorrect protein folding" ~ "Incorrect folding"
"Singleblinding" ~ "Singleblind"
"Dryopithecini" ~ "Dryopithecidae"
"Double-entries" ~ "Double-entry"</p>
        <p>The list deteriorates towards the end. This can be helped by adding a
confidence index and threshold to cut off the weakest cases. Another class of false
positives are near ties like "fathering"/"feathering". They may be captured using a
dictionary.</p>
      </sec>
      <sec id="sec-4-4">
        <title>Subject Field Labels</title>
        <p>To test the distance measure on subject field labels, we extracted 9890 BabelNet
categories in English from our data and listed the nearest matching category
label pairs in that set. In this run, we used the WordNet-based synonym dictionary.
An excerpt from both ends of the listing:
"Philosophy" "Epistemology"
"Polytheism" "Religion"
"Behaviorism" "Psychology"
...
"Knights Grand Cross of the Order of Merit of the Italian Republic"
~ "Grand Cross of the Order of Civil Merit"
"Central Committee of the Communist Party of the Soviet Union members"
~ "Heads of the Communist Party of the Soviet Union"
"People executed by the Bourbon dynasty of the Kingdom of France"
~ "Peers of France"</p>
        <p>The result may be evaluated again by comparing the proportion of matches
in our listing which classify the same synset (0.14) to the probability of a pair
of category labels chosen at random to classify the same synset in our data
(0.0004).
</p>
      </sec>
      <sec id="sec-4-5">
        <title>Glosses</title>
        <p>The distance measure was tested on 1000 BabelNet glosses extracted from our
data, with the nearest matching gloss pairs listed in that set. In this
run, we used the WordNet-based synonym dictionary. A few examples of the
pairs judged nearest in the listing:
"Jane Seymour Fonda is an Academy Award-winning American actress,
model, writer, producer and political activist."
~ "American actress and activist"
"The down of birds is a layer of fine feathers found under the tougher exterior feathers."
~ "Soft, immature feathers."
"A dog sled is a sled pulled by one or more sled dogs used to travel over ice and through snow."
~ "A sled, pulled by dogs over ice and snow."
"Dreams are successions of images, ideas, emotions, and sensations that
occur involuntarily in the mind during certain stages of sleep."
~ "A series of mental images and emotions occurring during sleep"
"Duty is a term loosely applied to any action which is regarded as morally incumbent,
apart from personal likes and dislikes or any external compulsion."
~ "Nose."</p>
        <p>The result may be evaluated again by comparing the proportion of matches
in our listing which classify the same synset (0.18) to the probability of a pair
of glosses belonging to the same synset in our data (0.002). There were just 14
identical pairs this time.
</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <p>As the last example above shows, there is room for improvement here. The
measure is sensitive to length, while the lengths of glosses may vary considerably. The
effect of unequal length may be damped by truncating glosses (say, to the length
of the shorter one), or by dropping high-frequency tokens (articles, prepositions,
auxiliaries). Quality aside, the Levenshtein measure is resource-intensive on long
glosses.</p>
      <p>Obviously, edit distance is too unstructured for long glosses; we must
improve the language model. We are currently working on a frequency-driven parser
to compare glosses not as flat strings but as binary tree (dependency) structures,
so as to cut down on pairwise comparisons of low-information tokens. Only the
n-best edges from the parser are compared using the Levenshtein distance
measure. Our parser is a frequency-weighted chart parser using binary (dependency)
grammar rules extracted from dictionary data.</p>
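      <p>The damping ideas just mentioned, dropping high-frequency tokens and truncating to the shorter gloss, can be sketched as follows. The stopword set here is a small hypothetical stand-in for corpus-derived frequency thresholds, and the function names are ours:</p>
      <preformat>
```python
# Hypothetical high-frequency function words; in practice these would be
# selected by frequency counts over the dictionary data itself.
STOP = {"a", "an", "the", "of", "in", "on", "by", "to", "is", "are", "and", "or"}

def prune(gloss):
    """Keep only lower-cased content tokens of a gloss."""
    tokens = gloss.lower().replace(",", " ").replace(".", " ").split()
    return [t for t in tokens if t not in STOP]

def truncate_pair(t1, t2):
    """Damp the length effect: compare only the first k tokens,
    where k is the length of the shorter pruned gloss."""
    k = min(len(t1), len(t2))
    return t1[:k], t2[:k]

# Example: prune a long gloss down to its content words before comparison.
print(prune("The down of birds is a layer of fine feathers."))
```
      </preformat>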
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>To summarise, this paper presents quality criteria based on WordNet to merge
linked lexical resources and to detect duplicates and errors in them. A distance
measure to compare linguistic strings was described and tested on WordNet and
BabelNet.</p>
      <p>The first-round tests suggested how to improve the measure for longer glosses.
That done, we may proceed with the WordNet deconstruction/reconstruction
exercise to test our approach.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Carlson</surname>
          </string-name>
          ,
          <year>2016</year>
          .
          <article-title>Linguistic Linked Open Data as a Source for Terminology - Quantity versus Quality</article-title>
          .
          <source>Proceedings of NordTerm 2015</source>
          (to appear).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Levenshtein</surname>
            ,
            <given-names>Vladimir I.</given-names>
          </string-name>
          (
          <year>February 1966</year>
          ).
          <article-title>"Binary codes capable of correcting deletions, insertions, and reversals"</article-title>
          .
          <source>Soviet Physics Doklady</source>
          .
          <volume>10</volume>
          (
          <issue>8</issue>
          ):
          <fpage>707</fpage>
          -
          <lpage>710</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Ponzetto</surname>
          </string-name>
          .
          <article-title>BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>193</volume>
          , Elsevier,
          <year>2012</year>
          , pp.
          <fpage>217</fpage>
          -
          <lpage>250</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>N.</given-names>
            <surname>Ide</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Veronis</surname>
          </string-name>
          .
          <article-title>Extracting knowledge-bases from machine-readable dictionaries: Have we wasted our time?</article-title>
          <source>In Proc KB&amp;KB93 Workshop</source>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Emmanuel</given-names>
            <surname>Eckard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Lucie</given-names>
            <surname>Barque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Alexis</given-names>
            <surname>Nasr</surname>
          </string-name>
          and
          <string-name>
            <given-names>Benoit</given-names>
            <surname>Sagot</surname>
          </string-name>
          .
          <article-title>Dictionary-Ontology Cross-Enrichment: Using TLFi and WOLF to enrich one another</article-title>
          .
          <source>COLING Workshop on Cognitive Aspects of the Lexicon</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Z</given-names>
            <surname>Zhang</surname>
          </string-name>
          , AL Gentile,
          <string-name>
            <surname>F Ciravegna</surname>
          </string-name>
          <year>2011</year>
          .
          <article-title>Harnessing di erent knowledge sources to measure semantic relatedness under a uniform model</article-title>
          .
          <source>in Proceedings of EMNLP</source>
          <year>2011</year>
          . http://www.aclweb.org/anthology/D11-1092.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>A</given-names>
            <surname>Marzal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E</given-names>
            <surname>Vidal</surname>
          </string-name>
          .
          <article-title>Computation of normalized edit distance and applications</article-title>
          .
          <source>IEEE transactions on pattern analysis and machine intelligence</source>
          <volume>15</volume>
          /9,
          <string-name>
            <surname>September</surname>
          </string-name>
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Navigli</surname>
          </string-name>
          ,
          <year>Roberto 2009</year>
          .
          <article-title>Using cycles and quasi-cycles to disambiguate dictionary glosses</article-title>
          .
          <source>Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics</source>
          , pp.
          <volume>594</volume>
          {
          <issue>602</issue>
          ,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computational Linguistics,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. Princeton University. "About WordNet."
          <source>WordNet</source>
          . Princeton University,
          <year>2010</year>
          . &lt;http://wordnet.princeton.edu&gt;.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>