=Paper=
{{Paper
|id=Vol-1899/OntoLex_2017_paper_7
|storemode=property
|title=LexInfo as a Model for Creating Ontology-Based Dictionary of Russian Grammatical Forms
|pdfUrl=https://ceur-ws.org/Vol-1899/OntoLex_2017_paper_7.pdf
|volume=Vol-1899
|authors=Ksenia Balysheva,Elena Kartashova,Konstantin Kondratiev,Aleksey Mikheev
|dblpUrl=https://dblp.org/rec/conf/ldk/BalyshevaKKM17
}}
==LexInfo as a Model for Creating Ontology-Based Dictionary of Russian Grammatical Forms==
OntoLex as a Model for Creating the Ontology-Based
Dictionary of Russian Grammatical Forms
Ksenia Balysheva1* (0000-0002-9894-9606), Elena Kartashova2 (0000-0001-9393-9436),
Konstantin Kondratiev3 (0000-0002-7817-6642) and Aleksey Mikheev4 (0000-0003-1119-6654)
1 Mari State University, Yoshkar-Ola, Russia, qsuaka@mail.ru
2 Mari State University, Yoshkar-Ola, Russia, elena.karta77@mail.ru
3 Telephone Systems Ltd, Yoshkar-Ola, Russia, kk@digt.ru
4 Mari State University, Yoshkar-Ola, Russia, scurra.42@yandex.ru
Abstract. This article describes the possibilities of using OntoLex as a model for
creating an ontology of morpho-syntactic properties of the Russian language.
For this purpose we analysed the morpho-syntactic properties of Russian given in
LexInfo and then extended it with grammatical categories that are not represented
or not correctly defined in LexInfo. The introduced supplements
and adjustments enable LexInfo to represent morpho-syntactic properties of the
Russian language more completely and allow it to be used for creating the
Ontology-Based Dictionary of Russian Grammatical Forms (OntoRuGrammaForm). The
resulting ontology-based dictionary helps to detect grammatical forms of widely
used Russian words.
Keywords: OntoLex, LexInfo, Ontology, Morpho-syntactic properties,
Ontology-Based Dictionary of Russian Grammatical Forms (OntoRuGrammaForm).
1 Introduction
The ontological approach to the representation of natural language properties is currently
being developed in computational linguistics, mainly in natural language
processing research. On the Semantic Web there are various ontology-based lexical and
semantic datasets, e.g. WordNet [8], FrameNet [2], BabelNet [12], RussNet [1], RuThes
[9], RuWordNet [10], YARN [3].
The Semantic Web also hosts ontological models representing linguistic Linked
Data that describe morphological features of languages, including Russian, to some
extent, e.g. OLiA [4], lemon [11], LexInfo [6]. Representing features of a natural
language as ontologies on the Semantic Web makes it easier to implement the idea
of Linked Data, which has led to the emergence of the Linguistic Linked Open
Data (LLOD) cloud (footnote 1), a cross-domain knowledge base comprising structured
information extracted from Wikipedia infoboxes, the World Atlas of Language Structures
(WALS) (footnote 2) and lexical resources such as Wiktionary (footnote 3), WordNet,
FrameNet [7] and BabelNet. The advantages of Linked Data for linguistics include
representational adequacy, structural and conceptual interoperability, and data federation [5].
The idea of connecting words with concepts, including at the morpho-syntactic level,
which makes it possible to clarify meaning, e.g. of polysemous and homonymous
words, is implemented in LexInfo. In this project we used LexInfo as the most
complete ontology based on the RDF model for labeling the Ontology-Based Dictionary
of Russian Grammatical Forms because of its evident advantages: separation and
independence of the ontological and linguistic levels; structuring of linguistic
information; the ability to specify the meaning of linguistic constructions with respect to
arbitrary ontologies, etc. [6]. In LexInfo the data is serialized in RDF/XML, while in
OntoRuGrammaForm the data is serialized in HDT. Like RDF/XML, HDT is a format
for RDF, but it keeps datasets compressed.
The goal of this project is to create an ontology-based dictionary that represents
morpho-syntactic properties of the Russian language. To achieve this goal we set and
consecutively completed the following tasks: 1) analysing the grammatical classes and
properties of Russian given in LexInfo; 2) collating the composition of grammatical
classes and properties in LexInfo with Russian grammar books and dictionaries; 3)
supplementing LexInfo with missing and refined Russian grammatical categories;
4) translating labels into Russian and supplying LexInfo and OntoLex elements with
Russian commentaries; 5) creating the Ontology-Based Dictionary of Russian
Grammatical Forms.
Both LexInfo and OntoLex were used to create the Ontology-Based Dictionary of
Russian Grammatical Forms. Grammatical categories of words were determined with
LexInfo, while entities/concepts in a dictionary entry were related with OntoLex.
2 Supplementing the LexInfo Model with Russian Grammatical Categories
LexInfo is a universal multipurpose model for representing morpho-syntactic properties
of highly inflected languages that have genetic and typological resemblances at
the level of common affixes, roots, and regular phonetic correspondences of sounds.
In general, morpho-syntactic properties of Russian can be represented in LexInfo.
Nevertheless, our analysis of its structure showed that these properties
are not fully represented. This prompted us to adjust the properties
listed in LexInfo in accordance with the current state of the grammar of the Russian
literary language.
1 http://linguistics.okfn.org/llod
2 http://wals.info
3 https://en.wiktionary.org/wiki
The analysis of the list of Russian grammatical properties in LexInfo and its collation
with the data of academic grammar books [14, 15] led to the following observations:
1) some grammatical categories of Russian are not represented and do not have special
nominations in LexInfo;
2) some grammatical categories are not placed into the correct grammatical
classes/properties;
3) some grammatical categories are supplied with inaccurate Russian translations.
The analysis of LexInfo showed that nominations of some Russian grammatical
categories should be introduced (see Table 1):
(1) In LexInfo the individual participle is put into the class VerbFormMood. In
our view, it should also belong to the class PartOfSpeech. So, we introduced
the new class ParticiplePOS, into which the individual participle is placed.
(2) To the class ParticiplePOS we added the new individual shortParticiple.
The distinction between a short participle and a participle is essential for the
system of the Russian language as these two forms have different inflections
and different syntactic functions.
(3) In LexInfo there is no individual gerund. We believe it should be added to
identify the adverbial participle (the Russian gerund) as a part of speech in
Russian. We introduced the new class GerundPOS, into which the individual
gerund is put, and we also stated that the individual gerund belongs to the
class VerbFormMood.
(4) We added the individuals singulariaTantum, pluraliaTantum, fixedNumber
to the existing class Number.
(5) We added the new class Finiteness with two individuals, finite and nonFinite,
to the class MorphosyntacticProperty.
(6) We introduced the class Reflexivity with two individuals, reflexive and
nonReflexive, into the class MorphosyntacticProperty.
(7) The individual impersonalVerb is added to the class VerbPOS.
(8) The individual shortAdjective is added to the class AdjectivePOS.
(9) The individual relativeAdjective is added to the class AdjectivePOS.
(10) The individual collectiveNumeral is added to the class NumeralPOS.
Supplementing the grammatical categories of the Russian language in LexInfo
also involves eliminating inaccuracies in the placement of grammatical categories into
classes (see Table 1):
(1) In LexInfo comparative is an individual of the class Degree. In our view, it
is also an individual of the class AdjectivePOS.
(2) In LexInfo the individual infinitive belongs to the class VerbFormMood. In
our view, it also belongs to the class VerbPOS.
(3) In LexInfo the individual ordinalAdjective belongs to the class AdjectivePOS.
According to the grammatical properties of Russian this individual also
belongs to the class NumeralPOS.
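The class memberships proposed in the two lists above can be written down as plain data, which makes it easy to see which individuals end up in more than one class. The following is only a minimal sketch in JavaScript: the structure and the function names typeTriples and individualsOf are illustrative assumptions, not part of LexInfo or of the authors' code, and only local names are used because the text does not say which namespace the new terms would live in.

```javascript
// Class membership of individuals after the proposed changes.
// Local names only; the namespace of the new terms is not stated in the text.
const membership = {
  participle: ["VerbFormMood", "ParticiplePOS"],
  shortParticiple: ["VerbFormMood", "ParticiplePOS"],
  gerund: ["VerbFormMood", "GerundPOS"],
  singulariaTantum: ["Number"],
  pluraliaTantum: ["Number"],
  fixedNumber: ["Number"],
  finite: ["Finiteness"],
  nonFinite: ["Finiteness"],
  reflexive: ["Reflexivity"],
  nonReflexive: ["Reflexivity"],
  impersonalVerb: ["VerbPOS"],
  shortAdjective: ["AdjectivePOS"],
  relativeAdjective: ["AdjectivePOS"],
  collectiveNumeral: ["NumeralPOS"],
  comparative: ["Degree", "AdjectivePOS"],
  infinitive: ["VerbFormMood", "VerbPOS"],
  ordinalAdjective: ["AdjectivePOS", "NumeralPOS"],
};

// Expand the table into (individual, rdf:type, class) triples.
function typeTriples(table) {
  return Object.entries(table).flatMap(([individual, classes]) =>
    classes.map((cls) => [individual, "rdf:type", cls])
  );
}

// List the individuals that belong to a given class.
function individualsOf(table, cls) {
  return Object.keys(table).filter((name) => table[name].includes(cls));
}

const triples = typeTriples(membership);
console.log(triples.length); // 23 type statements in total
console.log(individualsOf(membership, "NumeralPOS"));
```

Keeping the memberships in one table means that a double classification such as ordinalAdjective (adjective and numeral) is a single entry with two classes rather than two unrelated statements.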
Another important supplement to the grammatical properties of Russian in LexInfo is
adjusting the translations of class and individual labels into Russian. Some examples of
this type of supplement are given below:
(1) The term gerundive, which is put into the class VerbFormMood, is not accurately
translated into Russian. In Latin the gerundive is a verbal adjective
while the gerund is a verbal noun both in Latin and in English. In Russian
the grammatical category of a gerund does not exist. We suggested introducing
the individual gerundPOS to label the adverbial participle (the Russian
gerund) as a part of speech.
(2) In Russian there exist cardinal numerals and ordinal numerals. In LexInfo
the Russian labels for the individuals cardinalNumeral and ordinalNumeral
from the class NumeralPOS are confused and should be interchanged.
(3) In LexInfo the class Finiteness from the class MorphosyntacticProperty is labeled
inaccurately in Russian. Our suggestion is to supply the grammatical
category of finiteness as well as the class Finiteness with the Russian label
spryagaemost. As the English conjugation and the Russian spryagaemost are
quasi-synonyms, we find the LexInfo label Finiteness appropriate to indicate
the ability of Russian verbs to conjugate.
{| class="wikitable"
|+ Table 1. Suggested supplements to LexInfo for representing grammatical categories of Russian.
! № !! Individual !! Class !! Commentary on supplements
|-
| 1 || participle || VerbFormMood & ParticiplePOS || The individual participle belongs to the class VerbFormMood. The new class ParticiplePOS is added. The individual participle should belong to the class ParticiplePOS and to the class VerbFormMood.
|-
| 2 || shortParticiple || VerbFormMood & ParticiplePOS || The new individual shortParticiple is added to the class ParticiplePOS. It should belong to both classes, VerbFormMood and ParticiplePOS.
|-
| 3 || gerund || VerbFormMood & GerundPOS || The new individual gerund is added to two classes: the existing class VerbFormMood and the new class GerundPOS.
|-
| 4 || singulariaTantum || Number || The new individual singulariaTantum is added to the existing class Number.
|-
| 5 || pluraliaTantum || Number || The new individual pluraliaTantum is added to the existing class Number.
|-
| 6 || fixedNumber || Number || The new individual fixedNumber is added to the existing class Number.
|-
| 7 || finite || Finiteness || The new individual finite and the class Finiteness are added.
|-
| 8 || nonFinite || Finiteness || The new individual nonFinite and the class Finiteness are added.
|-
| 9 || reflexive || Reflexivity || The new individual reflexive and the class Reflexivity are added.
|-
| 10 || nonReflexive || Reflexivity || The new individual nonReflexive and the class Reflexivity are added.
|-
| 11 || impersonalVerb || VerbPOS || The new individual impersonalVerb is added to the existing class VerbPOS.
|-
| 12 || shortAdjective || AdjectivePOS || The new individual shortAdjective is added to the existing class AdjectivePOS.
|-
| 13 || relativeAdjective || AdjectivePOS || The new individual relativeAdjective is added to the existing class AdjectivePOS.
|-
| 14 || collectiveNumeral || NumeralPOS || The new individual collectiveNumeral is added to the existing class NumeralPOS.
|-
| 15 || comparative || Degree & AdjectivePOS || The existing individual comparative belongs to the class Degree. It should also belong to AdjectivePOS.
|-
| 16 || infinitive || VerbFormMood & VerbPOS || The existing individual infinitive belongs to VerbFormMood. It should also belong to VerbPOS.
|-
| 17 || ordinalAdjective || AdjectivePOS & NumeralPOS || The existing individual ordinalAdjective belongs to AdjectivePOS. It should also belong to NumeralPOS.
|}
3 The Ontology-Based Dictionary of Russian Grammatical Forms (OntoRuGrammaForm)
In any subject area the connection of words with concepts in the form of an ontology
should be based on a morpho-syntactic level. This idea proved fruitful for
the creation of OntoRuGrammaForm. The completed experimental work made it possible
to connect words with concepts by implementing morpho-syntactic properties of the
Russian language.
3.1 Description of OntoRuGrammaForm
With the additions and adjustments introduced into LexInfo, it became possible to
represent morpho-syntactic properties of Russian more completely and accurately in
the Ontology-Based Dictionary (OntoRuGrammaForm). The ontology is aimed at
revealing grammatical forms of Russian words in general use.
The Ontology-Based Dictionary of Russian Grammatical Forms (OntoRuGrammaForm)
contains 389,226 lemmas and 5,097,173 word forms. It is available for public
use at http://ldf.kloud.one/ontorugrammaform. The experience of creating the dictionary
can be used for educational purposes, e.g. teaching Russian and testing
knowledge of Russian.
3.2 Technical Implementation and Publication of OntoRuGrammaForm on the Web
The Open Corpora (footnote 4), the open corpus of the Russian language, was used as a source
for OntoRuGrammaForm. The Open Corpora is compiled by volunteers using web
texts and is available in XML and plaintext formats. The Open Corpora XML schema
can be viewed at http://opencorpora.org/export/dict/dict.opcorpora.xsd.
The software component of the dictionary is written in JavaScript (NodeJS), as
we aim to create and select the components for working with ontologies
on this particular technology stack. For convenience, we divided the technical
implementation into three blocks:
1) automatic conversion of the Open Corpora labels into the OntoLex labels;
2) the backend, for which we used Linked Data Fragments (footnote 5);
3) the client part, which is under development.
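A Linked Data Fragments backend is typically queried by triple pattern. The sketch below builds such a request URL for the published dataset address. It assumes that the endpoint accepts the conventional subject, predicate and object query parameters used by common Triple Pattern Fragments servers; this is not confirmed by the paper, and a conforming client would normally discover the parameter names from the hypermedia controls in the server's response. The function name fragmentUrl is illustrative.

```javascript
// Build a triple-pattern request URL. The parameter names "subject",
// "predicate" and "object" are an assumption about the server configuration.
function fragmentUrl(base, pattern) {
  const url = new URL(base);
  for (const key of ["subject", "predicate", "object"]) {
    // Omitted positions act as wildcards, so only bound terms are sent.
    if (pattern[key]) url.searchParams.set(key, pattern[key]);
  }
  return url.toString();
}

// Ask for all forms whose written representation is "ёж" (tagged as Russian).
const requestUrl = fragmentUrl("http://ldf.kloud.one/ontorugrammaform", {
  predicate: "http://www.w3.org/ns/lemon/ontolex#writtenRep",
  object: '"ёж"@ru',
});
console.log(requestUrl);
```

URLSearchParams percent-encodes the Cyrillic literal and the quotation marks, so the pattern can be passed as ordinary strings.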
The automatic conversion of the Open Corpora labels into the OntoLex labels is a
1:1 mapping. The project of label conversion is available at
https://github.com/cnstntn-kndrtv/opencorpora2ontolex.
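A 1:1 label conversion of this kind amounts to a lookup table from source tags to property/value pairs. The following minimal sketch illustrates the idea; the table is a hypothetical excerpt that pairs a few Open Corpora grammemes with the LexInfo terms used in the example entry of this paper, and it is not taken from the project repository. The names GRAMMEME_MAP and convertTags are likewise illustrative.

```javascript
// Hypothetical excerpt of a 1:1 table: Open Corpora grammeme -> [property, value].
const GRAMMEME_MAP = new Map([
  ["NOUN", ["lexinfo:partOfSpeech", "lexinfo:noun"]],
  ["anim", ["lexinfo:animacy", "lexinfo:animate"]],
  ["masc", ["lexinfo:gender", "lexinfo:masculine"]],
  ["sing", ["lexinfo:number", "lexinfo:singular"]],
  ["nomn", ["lexinfo:case", "lexinfo:nominativeCase"]],
  ["gent", ["lexinfo:case", "lexinfo:genitiveCase"]],
  ["datv", ["lexinfo:case", "lexinfo:dativeCase"]],
]);

// Convert a list of source tags; tags without a mapping are reported, not dropped.
function convertTags(tags) {
  const properties = [];
  const unmapped = [];
  for (const tag of tags) {
    const target = GRAMMEME_MAP.get(tag);
    if (target) {
      properties.push({ predicate: target[0], object: target[1] });
    } else {
      unmapped.push(tag);
    }
  }
  return { properties, unmapped };
}

const lemmaProps = convertTags(["NOUN", "anim", "masc"]);
const formProps = convertTags(["sing", "gent"]);
console.log(lemmaProps.properties);
console.log(formProps.properties);
```

Collecting unmapped tags separately makes gaps in the table visible during conversion instead of silently losing grammatical information.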
The structure of OntoRuGrammaForm conforms to the Lexicon Model for Ontologies,
as given in the Morpho-Syntactic Description section of the Community Report (footnote 6).
As an example we use the Russian polysemous word ‘ёж’ (‘yozh’) – ‘hedgehog’ [13]: 1) a
small animal whose body is covered with sharp needle-like spines; 2) a defensive
barrier of crossed girders. As we do not take meanings into account in our dictionary,
these are two different words, each having its own set of morphological forms.
The description of the word, lemma, and word form relations of the word ‘ёж’
(‘yozh’) – ‘hedgehog’ in its first meaning is given below in the Turtle format.
4 http://opencorpora.org
5 http://linkeddatafragments.org
6 https://www.w3.org/2016/05/ontolex/#morphosyntactic-description
# Prefix declarations. The default namespace is a placeholder (the dataset's
# actual base IRI is not given here).
@prefix : <http://example.org/ontorugrammaform#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .

# :1_yozh ёж
:1_yozh a ontolex:Word ;
    ontolex:canonicalForm :1_yozh:lemma ;
    ontolex:otherForm :1_yozh:form1_yozh ,
                      :1_yozh:form2_ezha ,
                      :1_yozh:form3_ezhu .

# :1_yozh ёж Lemma
:1_yozh:lemma
    ontolex:writtenRep "ёж"@ru ;
    lexinfo:partOfSpeech lexinfo:noun ;
    lexinfo:animacy lexinfo:animate ;
    lexinfo:gender lexinfo:masculine .

# :1_yozh ёж Forms
:1_yozh:form1_yozh
    ontolex:writtenRep "ёж"@ru ;
    lexinfo:number lexinfo:singular ;
    lexinfo:case lexinfo:nominativeCase .
:1_yozh:form2_ezha
    ontolex:writtenRep "ежа"@ru ;
    lexinfo:number lexinfo:singular ;
    lexinfo:case lexinfo:genitiveCase .
:1_yozh:form3_ezhu
    ontolex:writtenRep "ежу"@ru ;
    lexinfo:number lexinfo:singular ;
    lexinfo:case lexinfo:dativeCase .
Fig. 1 shows the description of the word ‘ёж’ (‘yozh’) – ‘hedgehog’ in the first
meaning, its lemma and three forms out of twelve.
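To illustrate how an entry of this shape supports detecting the grammatical form of a word, the sketch below mirrors the three forms of ‘ёж’ from the description above as a plain JavaScript object and looks up a written form in it. This is an in-memory illustration under stated assumptions only: the object layout and the function name analyse are hypothetical and do not reflect how OntoRuGrammaForm is stored or queried.

```javascript
// In-memory mirror of the example entry: lemma properties plus three forms.
const entry = {
  lemma: { writtenRep: "ёж", partOfSpeech: "noun", animacy: "animate", gender: "masculine" },
  forms: [
    { writtenRep: "ёж", number: "singular", case: "nominativeCase" },
    { writtenRep: "ежа", number: "singular", case: "genitiveCase" },
    { writtenRep: "ежу", number: "singular", case: "dativeCase" },
  ],
};

// Return every analysis of a word form: the form's own properties combined
// with the properties inherited from its lemma.
function analyse(entries, wordForm) {
  const results = [];
  for (const e of entries) {
    for (const form of e.forms) {
      if (form.writtenRep === wordForm) {
        results.push({ ...e.lemma, ...form, lemma: e.lemma.writtenRep });
      }
    }
  }
  return results;
}

const analyses = analyse([entry], "ежа");
console.log(analyses);
```

Because the two meanings of ‘ёж’ are treated as separate words, a lookup over the full dictionary may return several analyses for one written form, which is why the function returns a list.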
The visualisation shown in Fig. 1 is implemented with a tool that is currently under
development. This tool makes it possible to run federated queries over ontologies
and present query results in different forms. This kind of visualisation was
developed specifically for such data types. It is convenient for representing all
relations as definite groups rather than as scattered vertices of a graph. This visualisation
was named Terrapin (based on the name “diamondback terrapin”) due to its resemblance
to the Turtle format.
Fig. 1. Visualisation of relations between the morphological forms of the word ‘ёж’ (‘yozh’) –
‘hedgehog’.
4 Conclusion and Future Work
As a result of our research, we supplemented and adjusted LexInfo for the adequate
description of morpho-syntactic properties of the Russian language. These supplements
and adjustments are proposed as an extension to LexInfo for Russian. The supplemented
and adjusted grammatical properties of Russian in LexInfo made it possible
to create the Ontology-Based Dictionary of Russian Grammatical Forms
(OntoRuGrammaForm), which is aimed at revealing grammatical forms of widely used
Russian words. Further work will involve modeling the syntactic structure of sentences
with LexInfo to create a system for connecting natural language with concepts in
ontologies. We also plan to create client applications for querying OntoRuGrammaForm.
Acknowledgements
The authors are grateful to Telephone Systems Ltd for support and technical assistance
as part of the kloud.one project.
References
1. Azarowa, I.: RussNet as a Computer Lexicon for Russian. In: Proceedings of the
Intelligent Information systems IIS-2008, pp. 341–350 (2008).
2. Baker, C., Fillmore, C., Lowe, J.: The Berkeley FrameNet Project. In: Proceedings of
COLING '98, the 17th International Conference on Computational Linguistics, vol. 1, pp.
86–90 (1998).
3. Braslavski, P., Ustalov, D., Mukhin, M.: A Spinning Wheel for Yarn: User Interface for a
Crowdsourced Thesaurus. In: Proceedings of EACL, pp. 101–104. Gothenburg, Sweden
(2014).
4. Chiarcos, C.: An ontology of linguistic annotations. In: LDV Forum, pp. 1–136 (2008).
5. Chiarcos, C., McCrae, J., Cimiano, Ph., Fellbaum, Ch.: Towards open data for linguistics:
Linguistic linked data. In: New Trends of Research in Ontologies and Lexical Resources,
Springer (2013).
6. Cimiano, P., McCrae, J., Buitelaar, P., Sintek, M.: LexInfo: A declarative model for the
lexicon-ontology interface. In: Web Semantics: Science, Services and Agents on the World
Wide Web, pp. 29–51 (2011).
7. Cimiano, Ph., Unger, Ch., McCrae, J.: Ontology-based Interpretation of Natural Language
(2014).
8. Fellbaum, C.: A Semantic network of English verbs. In: WordNet. An electronic lexical
database, pp. 153–178 (1998).
9. Loukachevitch, N., Dobrov, B.: RuThes Linguistic Ontology vs. Russian Wordnets. In:
Proceedings of the Seventh Global WordNet Conference (GWC 2014), pp. 154–162 (2014).
10. Loukachevitch, N.V., Lashevich, G., Gerasimova, A.A., Ivanov, V.V., Dobrov, B.V.:
Creating Russian WordNet by Conversion. In: Proceedings of Computational Linguistics and
Intellectual Technologies. International Conference "Dialog 2016", pp. 423–433 (2016).
11. McCrae, J., Spohr, D., Cimiano, P.: Linking lexical resources and ontologies on the
semantic web with lemon. In: The semantic web: research and applications, pp. 245–259
(2011).
12. Navigli, R., Ponzetto, S.: BabelNet: building a very large multilingual semantic network.
In: Proceedings of the 48th Annual Meeting of the Association for Computational
Linguistics, pp. 216–225 (2010).
13. Ozhegov, S.I.: Dictionary of the Russian language (in Russian). Moscow (1983).
14. Shvedova, N.Yu., Arutyunova, N.D., Bondarko, A.V., Ivanov, V.V., Lopatin, V.V.,
Uluhanov, I.S., Philin, Ph.P.: Russian grammar (in Russian). Vol. 1: Phonetics. Phonology.
Stress. Intonation. Morphological derivation. Morphology. Nauka, Moscow (1980).
15. Zaliznyak, A.A.: Grammatical dictionary of the Russian language (in Russian). Ast-press,
Moscow (2008).