=Paper=
{{Paper
|id=Vol-1899/OntoLex_2017_paper_3
|storemode=property
|title=OntoLex and Onomasiological Ordering: Supporting Topical Thesauri
|pdfUrl=https://ceur-ws.org/Vol-1899/OntoLex_2017_paper_3.pdf
|volume=Vol-1899
|authors=Sander Stolk
|dblpUrl=https://dblp.org/rec/conf/ldk/Stolk17
}}
==OntoLex and Onomasiological Ordering: Supporting Topical Thesauri==
Sander Stolk
Leiden University, Leiden, The Netherlands,
s.s.stolk@umail.leidenuniv.nl
Abstract. The OntoLex vocabulary has been designed to capture
lexicons and to add their lexicographical knowledge to ontologies in the
Semantic Web. Although the specification of the vocabulary posits that
OntoLex allows lexicons to be ordered onomasiologically, it does so for
a very specific kind of onomasiological ordering only. As a consequence,
the vocabulary is currently insufficient for capturing a large proportion
of the existing topical thesauri. This paper demonstrates the current
expressivity and this shortcoming of OntoLex through two case studies:
The Historical Thesaurus of the Oxford English Dictionary and The
Scots Thesaurus. In order for OntoLex to offer full support for topical
thesauri and their ordering principles, this paper proposes the addition
of a single property to the vocabulary: ontolex:isSenseIn.
Keywords: OntoLex, Lemon, onomasiological ordering, thesaurus
1 Introduction
The Lexicon Model for Ontologies vocabulary has been designed to capture
lexicons and to add their lexicographical knowledge to ontologies in the Semantic
Web [1]. The vocabulary has seen a number of updates, and was published as
a W3C vocabulary by the OntoLex community group in May 2016 [2]. This
version, henceforth OntoLex, has since been picked up by a number of bodies,
including the Global WordNet Association, to represent and link existing lexical
resources on the Semantic Web [3].
The specification of OntoLex puts forward a manner in which “lexicons can be
ordered onomasiologically, that is by meanings rather than by lemmas” [2]. For
publishers of topical thesauri, this is good news indeed. Such support is essential
for these lexicographical works, which order their words by meaning instead of
from a to z as is common in typical dictionaries. Yet the OntoLex vocabulary
supports a very specific kind of onomasiological ordering only. As a consequence,
the vocabulary is currently insufficient for capturing the knowledge from a large
proportion of the existing topical thesauri. The current paper demonstrates this
shortcoming of OntoLex and proposes a way forward for the vocabulary.
2 Methodology
In order to provide insight into the current support of OntoLex for the
onomasiological ordering of topical thesauri, this paper will present two case
studies. The first is based on the Historical Thesaurus of the Oxford English
Dictionary [4]; the second on The Scots Thesaurus [5]. Both lexicographical
works employ an onomasiological ordering for their lexicon. The
first-mentioned thesaurus is considered to be a distinctive one and contains sets
of synonyms. The second is not distinctive but cumulative and refrains from
indicating synonymy [6].
This paper expresses samples from both thesauri in the OntoLex vocabulary.
The manner in which OntoLex is applied is in line with the specification of the
vocabulary [2] and the approach outlined by the Global WordNet Association
[3]. This approach has been adopted by several projects, amongst which the
Open Dutch Wordnet [7]. Namespaces relevant for this paper are provided in
Listing 1. The RDF snippets in subsequent listings are specified in the Turtle
RDF syntax [8]. Sample data from the case studies correspond with resources
between angle brackets in the RDF snippets (that is to say, their namespace
is left unspecified for the present purpose).
Listing 1. Namespaces
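@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix wn: <http://globalwordnet.github.io/schemas/wn#> .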
3 Case study Historical Thesaurus of the OED
The first case study presented here is that of the Historical Thesaurus of the
Oxford English Dictionary (HTOED). HTOED captures the English lexis that
has existed throughout its 1300-year history, from Old English up to Modern
English. This topical thesaurus groups together lexical items that are considered
near-synonymous and provides insight into their use in time and place. HTOED
was first published in print in 2009 [4] and in the following year also electronically
[9].
Figure 1 depicts a sample from HTOED. This sample contains six categories
from the topical system of the thesaurus (here represented by circles), which are
organized in a hierarchy. A category that is displayed lower than another category
to which it is connected by means of a line is subordinate to that connected
category. On the right, a number of lexical senses are displayed (some of which
are obsolete, conveyed by a dagger sign). These senses are considered synonyms,
or rather, near-synonyms, in HTOED and are members of the “Freedom/liberty”
category.
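[Figure 1: hierarchy of six HTOED categories, from top to bottom: Society; Communication, Authority; Lack of subjection; Permission, Freedom/liberty. Senses listed as synonyms under “Freedom/liberty”:]
freedom, n. (in sense 3)
† freeship, n. (in sense 2)
† franchise, n. (in sense 1a)
liberty, n. (in sense 1b of homonym 1)
...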
Fig. 1. Example HTOED content based on [9]
Expressing categories of the topical system of HTOED in OntoLex is
relatively straightforward. Each HTOED category corresponds with a lexical
concept in OntoLex. The latter is defined as a “mental abstraction, concept or
unit of thought that can be lexicalized by a given collection of senses” [2]. This
definition appears highly applicable to categories from topical thesauri. As
lexical concepts are asserted to be specializations of SKOS concepts, it is
possible to capture the hierarchy between categories using the
broader/narrower relations from SKOS [10]. Listing 2 contains the RDF for
expressing one of the HTOED categories in OntoLex, “Freedom/liberty”, and
the relation to its superordinate category “Lack of subjection”.
Listing 2. HTOED category “Freedom/liberty” expressed in OntoLex
a ontolex:LexicalConcept ;
skos:prefLabel "Freedom/liberty"@en ;
skos:broader .
The OntoLex vocabulary also contains terminology to express lexical senses
and the lexical entries to which they belong. In order to state that a given
lexical sense from HTOED belongs to one of its categories, the property
ontolex:isLexicalizedSenseOf can be used. This property relates a lexical
sense to a lexical concept, stating that it “lexicalizes” that concept. According
to the section on Lexical Nets in the OntoLex specification, lexical senses that
lexicalize the same concept are considered synonymous [2]. In other words, the
relation of synonymy is not explicitly asserted in OntoLex, but can be inferred
from the use of the ontolex:isLexicalizedSenseOf property. The resulting
RDF for the sense of freedom from the HTOED sample and its relation to the
“Freedom/liberty” category is provided in Listing 3.
Listing 3. HTOED sense of freedom expressed in OntoLex
a ontolex:LexicalSense ;
skos:prefLabel "freedom n. (sense 3)"@en ;
ontolex:isSenseOf ;
ontolex:isLexicalizedSenseOf .
a ontolex:LexicalEntry ;
skos:prefLabel "freedom, n."@en ;
wn:partOfSpeech wn:noun .
As shown, capturing the onomasiological ordering of the HTOED lexicon
presents no issues with the OntoLex vocabulary. The vocabulary enables one to
express categories and their hierarchy, lexical senses and their relation to a lexical
entry, and the relation between the senses from HTOED and the categories to
which they belong.
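The inference of synonymy described above can be sketched in a few lines of Python. This is a minimal illustration only, not part of the paper's method: triples are modelled as plain tuples, and the identifiers for senses and categories (such as sense/freedom_3) are hypothetical stand-ins, since the namespace of the sample data is left unspecified.

```python
from collections import defaultdict

LEXICALIZES = "ontolex:isLexicalizedSenseOf"

# Hypothetical identifiers for the senses and the category in the HTOED sample.
triples = [
    ("sense/freedom_3", LEXICALIZES, "category/freedom_liberty"),
    ("sense/freeship_2", LEXICALIZES, "category/freedom_liberty"),
    ("sense/franchise_1a", LEXICALIZES, "category/freedom_liberty"),
    ("sense/liberty_1_1b", LEXICALIZES, "category/freedom_liberty"),
]


def synonym_sets(triples):
    """Group senses by the lexical concept they lexicalize."""
    by_concept = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == LEXICALIZES:
            by_concept[obj].add(subject)
    return dict(by_concept)


def are_synonyms(triples, sense_a, sense_b):
    """Treat two distinct senses as synonyms if they lexicalize the same concept."""
    return sense_a != sense_b and any(
        sense_a in senses and sense_b in senses
        for senses in synonym_sets(triples).values()
    )
```

Under these assumptions, are_synonyms(triples, "sense/freedom_3", "sense/liberty_1_1b") holds without any explicit synonymy triple being asserted.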
4 Case study The Scots Thesaurus
The second case study in this paper concerns The Scots Thesaurus (ScT) [5]. ScT
captures the Lowland Scots lexis available throughout history, from its twelfth-
century beginnings to the present. This thesaurus, published in 1990, categorizes
its lexical items but does not indicate synonymy. Figure 2 depicts the sample
taken from ScT, encompassing five categories and four lexical senses.
[Figure 2: hierarchy of five ScT categories, from top to bottom: Farming; Farmers, Crops; Ploughing, Sowing. Senses listed under “Sowing”:]
† blander (in sense ‘disperse scantily’)
happer (in sense ‘a basket or container’)
heuch (in sense ‘earth up plants in drills’)
miss (in sense ‘fail to germinate or grow’)
...
Fig. 2. Example ScT content
Expressing categories from ScT is possible in a manner identical to that
used for HTOED. The result for the “Sowing” category from ScT, including its
relation to the superordinate category “Crops”, is provided in Listing 4.
Listing 4. ScT category “Sowing” expressed in OntoLex
a ontolex:LexicalConcept ;
skos:prefLabel "Sowing"@en ;
skos:broader .
As for the lexical senses from ScT, these too can be expressed in OntoLex
in a manner comparable to that used for HTOED. There is, however, a notable
difference. The property ontolex:isLexicalizedSenseOf is unsuitable for
relating the senses of ScT to the categories to which they belong. The lexical
senses in ScT are not necessarily lexicalizations of the category in question.
Moreover, senses that belong to the same category are not necessarily
considered synonymous. In fact, they rarely are. Cases in point are the senses
of happer and miss from the sample. Both of these senses are members of the
category “Sowing”, and indeed belong to that semantic domain, but can hardly
be said to be synonymous or even to lexicalize the category.
What is missing, then, from the OntoLex vocabulary is terminology to
express a looser manner of onomasiological ordering with categories than
ontolex:isLexicalizedSenseOf does. The RDF snippet in Listing 5 contains
the desired situation, where a tentative property isSenseIn is coined (see
the line containing :isSenseIn) to express the relation between the sense of blander and the
category to which it belongs.
Listing 5. ScT sense of blander expressed in OntoLex
a ontolex:LexicalSense ;
skos:prefLabel "blander"@sco ;
skos:definition "disperse scantily"@en ;
ontolex:isSenseOf ;
:isSenseIn .
a ontolex:LexicalEntry ;
skos:prefLabel "blander, v."@sco ;
wn:partOfSpeech wn:verb .
In short, OntoLex itself does not yet provide terminology to onomasiologically
order the lexicographical content of ScT – and of other thesauri like it.
5 Discussion
The two case studies have shown that OntoLex is not yet expressive enough to
indicate the relation between senses and categories for all topical thesauri. In fact,
the lack of a property like the tentative isSenseIn does not just affect conveying
content from ScT and the great many existing cumulative thesauri like it. It also
affects expressing these very relations found in thesauri such as HTOED. After
all, senses in HTOED are not just lexicalizations of a category, they are also
members of a number of categories. To illustrate, the assertion that the HTOED
sense of freedom is a lexicalization of the category “Freedom/liberty” entails that
this sense is a member of not just that category but also of its superordinate
categories (see Listing 6).
Listing 6. HTOED sense of freedom and its relation to the categories of HTOED
a ontolex:LexicalSense ;
ontolex:isLexicalizedSenseOf ;
:isSenseIn ,
,
,
.
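The derivation of the memberships in Listing 6 can be sketched as follows. This is an illustrative reading only, not a rule defined by OntoLex or by the suggested property: it follows skos:broader links upward from the lexicalized category and emits one tentative :isSenseIn triple per category encountered. The identifiers are hypothetical, and the links above “Lack of subjection” are an assumption read from Figure 1.

```python
BROADER = "skos:broader"
LEXICALIZES = "ontolex:isLexicalizedSenseOf"
SENSE_IN = ":isSenseIn"  # tentative property coined in the paper

# Hypothetical identifiers; the two upper broader links are assumed from Fig. 1.
triples = {
    ("category/freedom_liberty", BROADER, "category/lack_of_subjection"),
    ("category/lack_of_subjection", BROADER, "category/authority"),
    ("category/authority", BROADER, "category/society"),
    ("sense/freedom_3", LEXICALIZES, "category/freedom_liberty"),
}


def with_ancestors(category, triples):
    """Return the category itself plus every category reachable via skos:broader."""
    seen = set()
    frontier = [category]
    while frontier:
        current = frontier.pop()
        if current in seen:
            continue
        seen.add(current)
        frontier.extend(o for s, p, o in triples if s == current and p == BROADER)
    return seen


def membership_triples(sense, triples):
    """Derive :isSenseIn triples for a sense from the concepts it lexicalizes."""
    derived = set()
    for s, p, o in triples:
        if s == sense and p == LEXICALIZES:
            derived.update((sense, SENSE_IN, c) for c in with_ancestors(o, triples))
    return derived
```

For the sample data, membership_triples("sense/freedom_3", triples) yields four triples, one per category in the chain from “Freedom/liberty” up to “Society”, matching the four objects of :isSenseIn in Listing 6.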
In order to truly express how senses are categorized according to topical
systems in thesauri, then, additional terminology is required beyond what
OntoLex currently offers. Properties from other vocabularies that might fill the
gap, such as the subject property from Dublin Core Terms [11], tend to be
too generic to be able to infer further knowledge from topical systems of
thesauri. Moreover, the relation between such properties and
ontolex:isLexicalizedSenseOf is not evident. As such, the required
terminology is best captured in an update of the OntoLex vocabulary itself.
The small addition of a single property such as isSenseIn (see Listing 7),
then, and asserting its connection to the existing OntoLex property (see
Listing 8) would enable onomasiological ordering of lexicons in topical thesauri
of all varieties – distinctive or cumulative, and regardless of whether synonymy
is indicated between senses.
Listing 7. Suggested OntoLex property isSenseIn
ontolex:isSenseIn a owl:ObjectProperty ;
rdfs:label "is sense in"@en ;
rdfs:comment "This property relates a lexical sense to a
concept that captures its meaning to some
extent (that is, partially or even fully)."@en ;
rdfs:domain ontolex:LexicalSense ;
rdfs:range ontolex:LexicalConcept .
Listing 8. Connection between existing OntoLex property and the suggested one
ontolex:isLexicalizedSenseOf
rdfs:subPropertyOf ontolex:isSenseIn .
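The practical effect of Listing 8 can be illustrated with a small sketch of the RDFS sub-property entailment rule (rdfs7): every triple that uses a sub-property also holds for its super-property, but not the other way round. The code below is a simplified stand-in for an RDFS reasoner, and the sense and category identifiers are hypothetical.

```python
SUBPROPERTY_OF = "rdfs:subPropertyOf"
LEXICALIZES = "ontolex:isLexicalizedSenseOf"
SENSE_IN = "ontolex:isSenseIn"  # property suggested in the paper


def apply_subproperty_rule(triples):
    """Apply rdfs7 until no new triples appear: (s p o), (p subPropertyOf q) => (s q o)."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        pairs = {(s, o) for s, p, o in inferred if p == SUBPROPERTY_OF}
        for s, p, o in list(inferred):
            for sub, sup in pairs:
                if p == sub and (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))
                    changed = True
    return inferred


# Hypothetical sample: one HTOED sense (lexicalization) and one ScT sense (membership only).
asserted = {
    (LEXICALIZES, SUBPROPERTY_OF, SENSE_IN),
    ("sense/freedom_3", LEXICALIZES, "category/freedom_liberty"),
    ("sense/blander", SENSE_IN, "category/sowing"),
}
entailed = apply_subproperty_rule(asserted)
```

In this sketch the HTOED sense gains an ontolex:isSenseIn triple, whereas the ScT sense of blander is not promoted to a lexicalization of “Sowing”, so no synonymy would be inferred for it.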
6 Conclusion
This paper has shown, by means of two case studies, to what extent the
OntoLex vocabulary currently supports relating lexical senses to the concepts
that facilitate an onomasiological ordering. Such an ordering is (by their very
definition) used in lexicographical works known as topical thesauri. As it
stands, the OntoLex vocabulary offers some support for those thesauri
considered to be distinctive and that capture synonymy. Such thesauri ensure
that lexical senses displayed at a certain category do not just belong to that
category, but also express (or lexicalize) that category. Those thesauri that do
not have that same level of specificity, but merely use their categories to
organize lexical senses into semantic domains, are not yet supported by the
terminology in OntoLex.
The small addition of a single property, as suggested in this paper, would
have a big impact on the expressivity of OntoLex. The onomasiological ordering
of both distinctive and cumulative thesauri – regardless of whether these thesauri
indicate synonymy – could then properly be conveyed on the Semantic Web. As
a result, the variety of lexicographical resources that sit comfortably in OntoLex
would not be limited to dictionaries and lexical nets, as is presently the case, but
would also include thesauri. Increased support in OntoLex for onomasiological
ordering, then, would allow all these resources to truly shine on the Web. In short,
ordering by meaning through the new ontolex:isSenseIn is both meaningful
and sensible.
References
1. McCrae, J., Aguado-de Cea, G., Buitelaar, P., Cimiano, P., Declerck, T.,
Gómez-Pérez, A., Gracia, J., Hollink, L., Montiel-Ponsoda, E., Spohr, D., Wunner, T.:
Interchanging lexical resources on the Semantic Web. Language Resources and
Evaluation 46(4), 701–719 (2012)
2. Lexicon Model for Ontologies: Community report, 10 May 2016 (2016). URL
http://www.w3.org/2016/05/ontolex/
3. Global WordNet Association: Global Wordnet formats. URL
http://globalwordnet.github.io/schemas/
4. Kay, C., Roberts, J., Samuels, M., Wotherspoon, I. (eds.): Historical thesaurus of
the Oxford English Dictionary: with additional material from “A thesaurus of Old
English”. Oxford University Press, Oxford (2009)
5. Macleod, I., Cairns, P., Macafee, C., Martin, R. (eds.): The Scots thesaurus.
Aberdeen University Press, Aberdeen (1990)
6. Kay, C., Alexander, M.: Diachronic and synchronic thesauruses. In: P. Durkin
(ed.) The Oxford handbook of lexicography, pp. 367–380. Oxford University Press,
Oxford (2016)
7. Postma, M., van Miltenburg, E., Segers, R., Schoen, A., Vossen, P.: Open Dutch
WordNet. In: Proceedings of the Eighth Global Wordnet Conference. Bucharest,
Romania (2016)
8. Beckett, D., Berners-Lee, T., Prud’hommeaux, E., Carothers, G.: RDF 1.1 Turtle:
W3C recommendation 25 February 2014 (2014). URL
http://www.w3.org/TR/turtle/
9. Historical thesaurus of the Oxford English Dictionary (2010). URL
http://oed.com/thesaurus
10. SKOS Simple Knowledge Organization System reference: W3C recommendation
18 August 2009 (2009). URL http://www.w3.org/TR/skos-reference/
11. DCMI metadata terms (2012). URL http://purl.org/dc/terms/