<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Digital Toolkit to Develop Research Potential of Explanatory Dictionary (Case of Spanish Language Dictionary)</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>National Technical University “Kharkiv Polytechnic Institute”</institution>
          ,
          <addr-line>Kyrpychova str. 2, Kharkiv, 61002</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ukrainian Lingua-Information Fund NAS of Ukraine</institution>
          ,
          <addr-line>Holosiivskyi av. 3, Kyiv, 03039</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Linguistic corpora are now recognized as one of the most effective tools for linguistic research in the digital environment. However, dictionaries that actively use corpus technologies for their creation and updating remain underestimated with regard to their research potential. Fundamental explanatory dictionaries of national languages are of primary interest to linguists. Dictionaries of this kind give a complete, well-structured and multi-aspect description of language units, are built on linguistic theories, and represent all the linguistic information necessary not only for understanding the meanings of language units in various contexts but also for using them correctly. The present paper describes a software toolkit project for extracting linguistic information from dictionary text. The authors share the experience gained while creating this research tool and show its advantages for professional linguists. The software is being developed for the Spanish dictionary “Diccionario de la lengua española. 23ª edición” (DLE 23). The entry texts have been taken from the DLE 23 online version (www.dle.rae.es). The dictionary gives a detailed description of the morphological, stylistic, prosodic, syntactic and combinatorial features of Spanish lexical units. The headword list also includes morphemes, phrases of various types, acronyms and abbreviations. The project involves the creation of a virtual lexicographic laboratory (VLL DLE 23) intended for linguistic research on the basis of the DLE 23 text. The theoretical framework of the project consists of the theory of lexicographic systems and the theory of semantic states. Examples of applying the current version of the VLL as a tool for linguistic research are given.</p>
      </abstract>
      <kwd-group>
        <kwd>Computer lexicography</kwd>
        <kwd>linguistic information extraction</kwd>
        <kwd>virtual lexicographic laboratory</kwd>
        <kwd>explanatory dictionary</kwd>
        <kwd>digital environment</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>One of the current tasks of modern lexicography is to find ways of using the rich
potential of the digital environment to satisfy, in a timely manner, the information needs of advanced
users and lexicographers. Up-to-date dictionary making relies on digital linguistic technologies, first of
all on corpus technologies (Corpus Query Systems, CQS) and on digital systems for compiling and
updating dictionaries (Dictionary Writing Systems, DWS). It should also be noted that the
dictionary-making process involves IT specialists who support and develop digital technologies in
linguistics, which is a new challenge for lexicography. Despite major advances in digital technologies,
the lexicographic landscape remains largely heterogeneous. This applies both to the formats of
lexicographic data representation and to the standards for working with them [7].</p>
      <p>Our interest is focused primarily on comprehensive explanatory dictionaries of national languages.
The use of CQS and DWS technologies allows non-stop work, i.e. the dictionary-making process is always
in progress and has no completion stage (as in the case of the Oxford English Dictionary). However, despite
the availability of advanced user interfaces, their capabilities for searching, analysing and generalizing
linguistic information are still limited, especially for professional linguists. Traditionally, dictionary
authors have developed not only the structure and content of the entries but also the search
capabilities of the dictionary. As a result, the problem of extracting linguistic information for
further use by experts in their research is still unresolved. Therefore, the goal of our research
is to develop an interface scheme for conducting linguistic research on the basis of
explanatory dictionary text and to construct an effective toolkit that implements this scheme.
It is encouraging that, unlike with paper dictionaries, this is a feasible task for digital lexicographic
text [1, 2, 3, 6, 7, 8].</p>
      <p>For the purposes of our research we have selected the Spanish language dictionary
“Diccionario de la lengua española. 23ª edición” (DLE 23 for short), published by the
Real Academia Española (Royal Spanish Academy). DLE 23 is the most comprehensive and
representative explanatory dictionary of the Spanish language. The 23rd edition was published in
October 2014. A year later DLE 23 was made available on CD-ROM and then online at
www.dle.rae.es. The Academy is now working on a 24th edition, which is expected to be digital
only [5].</p>
    </sec>
    <sec id="sec-2">
      <title>2. Spanish language dictionary</title>
      <p>DLE 23 gives a detailed description of the morphological, stylistic, prosodic, syntactic
and combinatorial features of Spanish lexical units. The headword list also includes morphemes,
phrases of various types, acronyms and abbreviations. The entries contain multi-aspect information
that supports not only understanding the meaning of a headword in different contexts but also its correct
use in communication. The main factor that determined our choice of dictionary is the availability
of its text in electronic form, in HTML format, which guarantees that the text matches the paper
version and is free of the orthographic errors typical of OCR. Moreover, the
tags make it possible to identify the information elements of a dictionary entry. A prototype
version of VLL DLE 23, currently accessible at https://services.ulif.org.ua:44359/, considerably
extends the research potential of DLE 23.</p>
    </sec>
    <sec id="sec-3">
      <title>2.1. General characteristics</title>
      <p>A formal model of a lexicographic system cannot be built without a large and
comprehensive dictionary as its basis. In our case this basis is the Spanish language dictionary
“Diccionario de la lengua española 23ª ed. Edición del tricentenario” (DLE 23) by the Royal
Spanish Academy. This dictionary is a fundamental work covering vocabulary widely used
both in Spain and in Latin America. Besides lexical meanings, DLE 23 also provides detailed
information on the grammar, syntax and usage of the words that make up the headword list.</p>
      <p>The headword list of DLE 23 comprises more than 93,000 units representing the morphological,
lexical and syntactic levels of the Spanish language. The total number of definitions is 195,439.
Compared with the previous edition [4], DLE 23 has:
        <list list-type="bullet">
          <list-item><p>21,466 meanings corresponding to different domains,</p></list-item>
          <list-item><p>18,712 meanings peculiar to Latin America,</p></list-item>
          <list-item><p>435 meanings related to usage in Spain,</p></list-item>
          <list-item><p>333 foreign words not adapted to Spanish,</p></list-item>
          <list-item><p>1,637 verbs together with their conjugation models.</p></list-item>
        </list>
      </p>
    </sec>
    <sec id="sec-4">
      <title>2.2. Interface of the online version of DLE 23</title>
      <p>The current online version of DLE 23 is intended as a reference on word semantics and
unfortunately has very limited research potential. The interface consists of a list of filters, a search
box and a “Search” button (consultar). It allows the user to work only with the
headword list, through a few filters: “word form” (por palabras), “lemma” (lema), “contains”
(contiene), “exactly” (exacta), “begins with” (empieza por), and “ends with” (termina en).</p>
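      <p>The string filters of the online interface can be pictured with a minimal Python sketch. It is an illustration only, not the implementation behind the DLE 23 site: the function search, the FILTERS table and the small headword sample (words cited as examples elsewhere in this paper) are assumptions introduced here, and only the four string filters are modelled.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Callable, Dict, List

# Hypothetical mapping from the filter names shown in the online interface
# to simple string predicates of the form (word, query) -> bool.
FILTERS: Dict[str, Callable[[str, str], bool]] = {
    "contiene": lambda word, query: query in word,
    "exacta": lambda word, query: word == query,
    "empieza por": lambda word, query: word.startswith(query),
    "termina en": lambda word, query: word.endswith(query),
}


def search(headwords: List[str], query: str, mode: str) -> List[str]:
    """Return the headwords that satisfy the chosen filter, in list order."""
    predicate = FILTERS[mode]  # raises KeyError for an unknown filter name
    return [word for word in headwords if predicate(word, query)]


# Small sample of headwords that the paper cites as examples.
HEADWORDS = ["agua", "leche", "pan", "yerba", "cama", "ojo", "perro"]

if __name__ == "__main__":
    print(search(HEADWORDS, "p", "empieza por"))
    print(search(HEADWORDS, "a", "termina en"))
```
]]></preformat>
      <p>Filters of this kind operate on the headword list alone, so they cannot reach the information stored inside the entries.</p>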
      <p>Linguistic research requires access to the entire text of the dictionary as well as to its separate
elements. This in turn requires a theoretical basis for identifying, describing and representing the relevant
linguistic data from the DLE 23 text.</p>
    </sec>
    <sec id="sec-5">
      <title>2.3. Lexicographic analysis</title>
      <p>Each fragment of the entry text corresponds to a certain type of linguistic information and can be
identified by its format of representation. This format may be well defined or not defined at all. Let us
consider how information is represented in the different parts of a DLE 23 entry:
headword, headword variants, etymology, morphology, orthography, set of definitions and
encyclopedic note.</p>
    </sec>
    <sec id="sec-6">
      <title>2.3.1. Entry information elements</title>
      <p>In the paper version the elements follow a linear order, and special characters are used to separate them
in the text array. In the online version each element is placed on a separate text line and is highlighted not
only with a special marker but also with color, as shown in Table 1.</p>
    </sec>
    <sec id="sec-7">
      <title>2.3.2. Linguistic information overview</title>
      <p>An entry can be headed not only by a word but also by word-forming elements such as prefixes and
suffixes, as well as by idiomatic and non-idiomatic collocations. This entry element contains the
following linguistic information about the headword:
        <list list-type="bullet">
          <list-item><p>Headword structure: morpheme (-acro, andro-), word (leche, pan, yerba) or collocation
(agua mineromedicinal, como agua para chocolate);</p></list-item>
          <list-item><p>Headword type: Spanish word (cama, ojo, perro), foreign word (amateur, ballet),
abbreviation (ADSL, ONG), acronym (hidrosol, laser);</p></list-item>
          <list-item><p>Homonymy (abalear1, abalear2).</p></list-item>
        </list>
      </p>
      <p>Headword variants are given for all lexical words, such as nouns, adjectives, adverbs and
verbs, including passive participles, and sometimes for grammar words such as articles and
interjections. Some variants are provided with further details, namely:
        <list list-type="bullet">
          <list-item><p>Geographical area, if the use of the variant is limited to a particular country or countries;</p></list-item>
          <list-item><p>Definition number, if the headword variant relates only to a particular lexical meaning (as
shown in Table 2);</p></list-item>
          <list-item><p>Chronological status, indicating that the use of the headword variant is archaic.</p></list-item>
        </list>
      </p>
      <p>The format of this entry part is as follows: (1) headword variant; (2) additional information.
Examples of headword variant descriptions are given in Table 2.</p>
      <p>As the table shows, the word sustancia (substance) has the variant substancia, and the absence of
additional information means that the headword and the variant are fully interchangeable. The same
applies to the word jiennense (from the city of Jaén), which is interchangeable with jienense and
giennense. In some cases there are usage limits. For example, the use of en hora buena
(congratulations!) is limited to the lexical meanings described in definitions 2-3. In the fourth example the
label “p. us.” (from Spanish poco usado) shows that chavola (cabin) is an archaic variant of the
headword chabola. The fifth example shows the geographical and usage limits for the variant
hierbatero: in meaning 2 it is used only in Colombia, Ecuador, Mexico and Peru, and in meaning 4 only
in Chile.</p>
      <p>The etymological part of the entry gives brief information about the origin of the headword and
has the following format: (1) the source language; (2) the etymon; and (3) additional
information, which may include semantic changes in the etymons, structural changes, and the
time from which the word began to be used in Spanish. Examples of the content of the etymological part
are given in Table 3.</p>
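      <p>Table 3. Etymological information, “Additional information” column:
        <list list-type="simple">
          <list-item><p>-//y este del lat. frons, frontis</p></list-item>
          <list-item><p>zygōtós 'uncido, unido', der. de ζυγοῦν zygoûn 'uncir, unir'</p></list-item>
          <list-item><p>y este der. del lat. ubīque 'en todas partes'</p></list-item>
          <list-item><p>1857-1894, físico alemán</p></list-item>
          <list-item><p>y este de Bikini, nombre de un atolón de las Islas Marshall, con infl. de bi- ‘bi-’, por alus. a las dos piezas</p></list-item>
        </list>
      </p>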
      <p>Etymological information can be concise (1), i.e. indicate only the language of origin and the etymon,
or more detailed (2-4). For example, ubicuidad comes from the Late Latin word ubiquĭtas, and the latter
is derived from Latin ubīque “everywhere”. If the word comes from a proper or geographical
name (5-6), the information can be of an encyclopedic type. In the case of bikini, the etymology says that the
word is of English origin and comes from the geographical name Bikini, an atoll of the Marshall Islands,
with the influence of the morpheme bi-, meaning “composed of two parts”.</p>
      <p>The next part of a DLE 23 entry gives information about morphological features, such as regular
and irregular superlative forms of adjectives and adverbs, references to
conjugation patterns for regular and irregular verbs, and irregular passive participles of
individual verbs. Examples of morphological information are given in Table 4.</p>
      <p>This part of the entry has neither a special identification marker nor a defined format for
representing linguistic information, so it can be identified only by its position in the sequence of entry
elements. In any case, morphological characteristics follow the etymology.</p>
      <p>Orthographic information is provided only for headwords whose spelling (with a capital or
small letter, with or without an accent) can significantly change their lexical meaning. This entry
element may include the following information: the spelling feature and the number of the
lexical meaning in the dictionary to which this feature applies (see Table 5).</p>
      <table-wrap id="tbl-5">
        <label>Table 5</label>
        <table>
          <thead>
            <tr><th>Orthographic features</th><th>Lexical meaning to which the feature is applied</th></tr>
          </thead>
          <tbody>
            <tr><td>Escr. con may. inicial</td><td>en acep. 2</td></tr>
            <tr><td>Escr. con acento</td><td>en acep. 3</td></tr>
            <tr><td>Puede escribirse con acento</td><td>en acep. 8</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>For example, the Spanish word inmaculada can have different meanings depending on its initial
letter. It means “perfect, faultless” with a small initial letter and “Mary, mother of Jesus” with a capital
letter.</p>
      <p>The set of definitions interprets the headword using definitions of different
types (standard, contextual, explanatory, by synonym and others) and may consist of one
or more definitions. Each definition is composed of: 1) an introductory part, 2) the definition text, 3) usage
examples, 4) additional comments on lemma usage and 5) an encyclopedic note. The introductory part
introduces a definition using keywords corresponding to its type. For example, “Dicho de”,
“En” and “Entre” are the keywords for a contextual definition, and “U.” for an explanatory definition.</p>
      <p>There is no introductory part for standard, synonymous and other definitions. The definition text
can be a sentence, a phrase or, as in the case of a definition by synonym, a single word. Usage examples
are a complementary means of explaining the lexical meaning and show the headword used in collocations
or in a sentence. The usage examples are followed by comments that denote additional grammar
and usage peculiarities the headword may have in the given lexical meaning. Let us give
examples of the content of lexical meaning descriptions for the headword agua (water).</p>
      <p>Table 6. Lexical meaning descriptions for the headword agua (fragments). Definition text:
“Líquido que se obtiene […]”; “lluvia (‖ acción de llover)”; “lágrimas (‖ gotas de la glándula lagrimal)”;
“para avisar de la presencia de cualquier tipo de autoridad”. Usage examples: “Agua de azahar, de cebada,
de limón”; “Se le llenaron los ojos de agua”. Comments: “U. t. en pl. con el mismo significado que en sing.”</p>
      <p>The last part of a definition is the encyclopedic note, which is provided for headwords denoting
concepts from natural sciences such as chemistry, physics and mathematics. This note is a non-verbal
way of representing a concept. For example, if the headword denotes a chemical substance or
element, the corresponding formula is shown in parentheses at the end of the definition. For
mathematical or physical quantities and linguistic signs, their symbolic designations are
given. The encyclopedic note in DLE 23 is of two types: 1) “Fórm”, a chemical formula, and
2) “Símb”, a symbolic designation of physical or mathematical quantities. The content of the
encyclopedic note for the headwords agua, hercio, kilobyte and número pi is shown in Table 7.</p>
      <p>As can be seen from the above, every element of the dictionary entry contains multi-aspect
information about a Spanish language unit. Describing a language as an established system is
characteristic of fundamental dictionaries, especially explanatory ones. It means that these dictionaries,
as stated by Prof. V. A. Shyrokov, carry a huge number of implicitly given relationships of the language
system that cannot be revealed using traditional methods. In this regard, there is a need for a
special software tool that reveals these relationships from the text of the dictionary. While
working with the tool, the user’s request may vary from an elementary reference about a specific
word to generalized grammatical and semantic information related to entire classes of language
units, as well as to various relationships developing and functioning in the language system. Elaborating
such a software tool implies selecting an appropriate theoretical framework. As such, we use the
theory of lexicographic systems and the theory of semantic states by V. A. Shyrokov, the main
provisions of which are outlined in [9].</p>
    </sec>
    <sec id="sec-8">
      <title>3. Method</title>
      <p>Developing an effective tool for extracting linguistic information from explanatory dictionary
text requires an appropriate theoretical framework. As such we have selected the theory of lexicographic
systems and the theory of semantic states by Prof. Shyrokov [9].</p>
      <p>According to the theory of lexicographic systems, an explanatory dictionary (like any other
dictionary) is regarded as a lexicographic system (L-system). An L-system, in turn, is an
information system in which one or several lexicographic effects are induced. The main relations in
this system are the relations “subject – object” and “form – content”. Any L-system is defined by the
following components:
        <list list-type="bullet">
          <list-item><p>D is a fragment of reality, which is the object of lexicographic description;</p></list-item>
          <list-item><p>S is a subject that makes the lexicographic description of D (in our case, we associate it with the
authors of the dictionary);</p></list-item>
          <list-item><p>Q is a lexicographic effect observed by the subject S in D and transformed into a set of
elementary information units I<sup>Q</sup>(D) (in our case, we interpret this component as the set of linguistic
units composing the dictionary headword list);</p></list-item>
          <list-item><p>V(I<sup>Q</sup>(D)) is the set of descriptions of I<sup>Q</sup>(D), with S: I<sup>Q</sup>(D) → V(I<sup>Q</sup>(D)).</p></list-item>
        </list>
      </p>
      <p>In view of the above, the following statement holds for any headword x:</p>
      <disp-formula id="eq-1">
        <label>(1)</label>
        <tex-math><![CDATA[I^{Q}(D) = \{x\}; \quad \forall x \in I^{Q}(D) \;\; S : x \rightarrow V(x); \quad \bigcup_{x \in I^{Q}(D)} V(x) = V\bigl(I^{Q}(D)\bigr)]]></tex-math>
      </disp-formula>
      <p>where V(x) is the text of the dictionary describing a headword x. Hence V(I<sup>Q</sup>(D))
is the collection of all dictionary entries. On the set of descriptions V(I<sup>Q</sup>(D)) and, in particular, on each
V(x), two structures can be defined: β and σ[β]. They are the carriers of the linguistic facts and
regularities in the lexicographic system. Here β is the set of “very simple” structural elements of
the dictionary (words, abbreviations, labels, notes, figures, elements of grammar and
vocabulary description, etc.). This can be formulated in the following way. For each x ∈ I<sup>Q</sup>(D), a set of
structural elements β(x) that compose V(x) is determined according to the following principles:
        <list list-type="order">
          <list-item><p>x ∈ β(x);</p></list-item>
          <list-item><p>Any fragment of the dictionary entry V(x) can be built from the elements β(x);</p></list-item>
          <list-item><p>The principle of forming the elements β(x) is to be common for all V(x), i.e. for all x ∈ I<sup>Q</sup>(D).</p></list-item>
        </list>
      </p>
      <p>The importance of these principles for forming β-structures in
lexicography should be stressed. Principle 2 is in effect a requirement for the universality of the dictionary
metalanguage: any linguistic fact fixed in a particular dictionary must be reflected in its metalanguage.
Principle 3 implies that all linguistic facts and phenomena of the same type must have a unified
representation in the lexicographic description. These principles provide objective prerequisites for a
formalized definition of the process of linguistic achievement using a lexicographic system.</p>
      <p>In their turn, the β elements combine into lexicographic structures σ[β], each corresponding to the
description of a linguistic phenomenon attributed to a headword. Thus the whole lexicographic
description of the headwords is defined by the pair (β, σ[β]). Each dictionary entry of DLE 23 is
assigned a basic structure (Fig. 2).</p>
      <p>Let us give examples of σ[β] that form the lexicographic description of the headword
agua. The text of the dictionary element is given in a format that preserves the font markup used in
the online version of the dictionary (Fig. 3).</p>
      <p>Based on the text analysis of online version of DLE 23 entries, we distinguish the following
parameters for the left part L0: RR (lemma forms), DUPL (regional variant), ETYM (etymology),
MORPHO (inflection), ORTHO (orthography) and UNCRT (undefined parameter). Each parameter is
represented in our model as a text string.</p>
      <p>The right part P0 is composed of the elements of lexical meaning descriptions. The polysemy of
the headword is determined by the number of these descriptions. Each description may include
several structural elements, namely MNGN (definition No), REM (set of labels), DEF (definition),
ED (encyclopedic note), COM (comment), and IL (illustration).</p>
      <p>The REM text line of a DLE 23 entry can be subdivided into smaller fragments, each of them
containing a label of a specific type: REM-GR (grammar); REM-US (usage); REM-ST (stylistics);
REM-DOM (domain); and REM-REG (geographic region). As a rule, the lexical meaning in the entry
text is described by the structural element DEF. The comments (COM) relate to the
definition. Each definition and each comment can be accompanied by its own illustrations (IL). The
structure of the interpretation may include several DEF, COM and IL elements. The splitting of the text into
structural elements for the headword agua (water) is shown in Table 8. As an example we have
taken lexical meaning descriptions 1, 2, 7 and 15.</p>
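      <p>The entry structure described above can be sketched as a small Python data model. This is an illustration under stated assumptions, not the data model of VLL DLE 23: the class and field names (Entry, Sense, add_label, polysemy, build_sample_entry) are introduced here, the left-part parameters are kept as plain text strings as in the description above, and the sample values for agua reuse fragments quoted in this paper with a numbering and pairing chosen only for the example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Label subtypes of the REM element, as listed in the text.
REM_TYPES = ("REM-GR", "REM-US", "REM-ST", "REM-DOM", "REM-REG")


@dataclass
class Sense:
    """One lexical meaning description from the right part P0."""
    mngn: int                                                 # MNGN: definition number
    rem: Dict[str, List[str]] = field(default_factory=dict)   # REM: labels by subtype
    definitions: List[str] = field(default_factory=list)      # DEF
    comments: List[str] = field(default_factory=list)         # COM
    illustrations: List[str] = field(default_factory=list)    # IL
    ed: Optional[str] = None                                  # ED: encyclopedic note

    def add_label(self, rem_type: str, label: str) -> None:
        """Store a label under one of the five REM subtypes."""
        if rem_type not in REM_TYPES:
            raise ValueError(f"unknown label type: {rem_type}")
        self.rem.setdefault(rem_type, []).append(label)


@dataclass
class Entry:
    """A dictionary entry: left part L0 (text strings) plus senses (P0)."""
    headword: str
    rr: str = ""      # RR: lemma forms
    dupl: str = ""    # DUPL: regional variant
    etym: str = ""    # ETYM: etymology
    morpho: str = ""  # MORPHO: inflection
    ortho: str = ""   # ORTHO: orthography
    uncrt: str = ""   # UNCRT: undefined parameter
    senses: List[Sense] = field(default_factory=list)

    def polysemy(self) -> int:
        """Polysemy is measured by the number of meaning descriptions."""
        return len(self.senses)


def build_sample_entry() -> Entry:
    # Illustrative values only: numbering and pairing are assumptions.
    entry = Entry(headword="agua")
    first = Sense(mngn=1,
                  definitions=["Líquido que se obtiene […]"],
                  illustrations=["Agua de azahar, de cebada, de limón"])
    first.add_label("REM-GR", "f.")
    second = Sense(mngn=2,
                   definitions=["lluvia (‖ acción de llover)"],
                   comments=["U. t. en pl. con el mismo significado que en sing."])
    second.add_label("REM-GR", "f.")
    entry.senses.extend([first, second])
    return entry
```
]]></preformat>
      <p>Keeping the REM labels in a mapping keyed by subtype makes it straightforward to select senses by grammar, usage, style, domain or region independently.</p>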
      <sec id="sec-8-1">
        <p>According to the theory of semantic states, any linguistic unit, when used in a context, adopts a
certain semantic state which represents a sum of grammatical and lexical meanings. In our case, we
consider the dictionary as a collection of semantic states of the headwords, the features of which are
fixed by the elements P(x).</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>4. Results and discussions</title>
    </sec>
    <sec id="sec-10">
      <title>4.1. Interface of VLL DLE 23 and its research toolkit</title>
      <p>As shown in Fig. 4, the interface of the VLL DLE 23 laboratory consists of four elements: (1) a
top menu bar containing tools for working with the headword list and the text of DLE 23; (2) a
headword panel designed for searching for words and navigating the headword list; (3) a text box that displays
dictionary entries (the format corresponds to the original online version of DLE 23); and (4) a text box
for viewing the HTML text of dictionary entries. The top menu bar includes two tools: “Selection” and
“Statistics”. The first contains a group of parameters for making a sample of dictionary entries
by the linguistic features of their headwords (type, structure of the headword, homonymy, number of
lexical meanings, etc.). The second generates statistics for a specific sample of entries or for the
entire dictionary.</p>
      <p>With the Selection tool it is possible to form an inventory of the Spanish vocabulary reflected in DLE
23. The two tools can be used to select and quantify:
        <list list-type="bullet">
          <list-item><p>Cognate words, including homonymous cognate words;</p></list-item>
          <list-item><p>Spanish vocabulary elements by their origin;</p></list-item>
          <list-item><p>Words having a specific suffix or prefix, as well as words consisting of a root;</p></list-item>
          <list-item><p>Language units belonging to a certain linguistic level, for example morphemes, lexemes,
phrases;</p></list-item>
          <list-item><p>Units of other types, such as abbreviations and acronyms.</p></list-item>
        </list>
      </p>
      <p>The VLL DLE 23 tools allow users to select entries whose headwords
have common grammatical, lexical and other features reflected in the text of the dictionary
entries. These properties are expressed in the definition and other elements of the dictionary entry
by certain keywords and expressions. In particular, the following linguistic properties of the
headwords can be identified from the text of the dictionary entries:
        <list list-type="bullet">
          <list-item><p>Participation of morphemes in forming words that express a particular lexical meaning;</p></list-item>
          <list-item><p>The path by which the headword came into Spanish from another language
(directly or through an intermediary language);</p></list-item>
          <list-item><p>Lexical meaning development of headwords of foreign or native origin;</p></list-item>
          <list-item><p>Semantic structure of the headwords, including diminutives, augmentatives, etc.;</p></list-item>
          <list-item><p>Availability or absence of Spanish equivalents for words coming from foreign languages;</p></list-item>
          <list-item><p>The etymology of headwords with different languages of origin;</p></list-item>
          <list-item><p>Ability of words of both native and foreign origin to form collocations, for example “Noun +
Adjective” and other types (adjectival, verbal, prepositional, etc.);</p></list-item>
          <list-item><p>Words belonging to a particular semantic field headed by a broader word.</p></list-item>
        </list>
      </p>
    </sec>
    <sec id="sec-11">
      <title>4.2. Examples of VLL DLE 23 application</title>
      <p>The current version of VLL DLE 23, developed by the authors of this article, is
intended for making an inventory of language units and for conducting linguistic research with
statistical calculations. Let us consider some examples of applying the VLL.</p>
    </sec>
    <sec id="sec-12">
      <title>4.2.1. Formation of lexicographic types</title>
      <p>This function consists in selecting dictionary entries whose headwords can be attributed
to a certain class by their common linguistic properties. With VLL DLE 23, classes of Spanish
words united by common linguistic (grammatical, semantic, usage) properties can be
visualized. Such classes of words described in the dictionary are called lexicographic types. Let us
form, for example, a lexicographic type composed of the verbs whose conjugation is similar to
that of the verb agradecer (to thank). The verbs in question are conjugated using the set of inflections
{-zco, -ces, -ce, -cemos, -céis, -cen, etc.}. Figure 5 shows the result of VLL DLE 23 selecting
the verbs that represent this lexicographic type. On the left is the list of verbs included in the
lexicographic type. At the top of the figure is the dictionary entry of a verb. Below is the
“Statistics” window, showing that the lexicographic type formed includes 218 verbs, of which 5 are
homonymous. In a similar way the user can obtain any other lexicographic type on the basis of various
linguistic properties. For example, we can obtain the verbs denoting movement from one point to another.
In this case, the lexicographic type will cover words such as abordonar (to walk leaning on a stick),
amblar (to amble, to stroll), caminar (to walk), callejear (to wander), correr (to run), etc.</p>
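      <p>The selection of a lexicographic type by conjugation model can be sketched in Python as follows. This is a hedged illustration, not the VLL DLE 23 code: it assumes that the morphological note of an entry refers to its model verb with wording of the form “Conjug. c. agradecer.”, which the paper does not confirm, and the names CONJ_REF, conjugation_model, lexicographic_type and the five-entry SAMPLE are introduced here for the example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re
from typing import Dict, List, Optional

# Assumed wording of the morphological note that points to a model verb,
# e.g. "Conjug. c. agradecer." (an assumption, not confirmed by the paper).
CONJ_REF = re.compile(r"Conjug\.\s+c\.\s+(\w+)")


def conjugation_model(morpho: str) -> Optional[str]:
    """Return the model verb named in a morphological note, if any."""
    match = CONJ_REF.search(morpho)
    return match.group(1) if match else None


def lexicographic_type(entries: Dict[str, str], model: str) -> List[str]:
    """Headwords whose morphological note refers to the given model verb."""
    return sorted(
        headword
        for headword, morpho in entries.items()
        if conjugation_model(morpho) == model
    )


# Toy sample: headword -> text of the morphological note (hypothetical).
SAMPLE = {
    "conocer": "Conjug. c. agradecer.",
    "crecer": "Conjug. c. agradecer.",
    "merecer": "Conjug. c. agradecer.",
    "medir": "Conjug. c. pedir.",
    "caminar": "",
}

if __name__ == "__main__":
    verbs = lexicographic_type(SAMPLE, "agradecer")
    print(len(verbs), verbs)
```
]]></preformat>
      <p>Counting the selected headwords gives the kind of figure reported in the “Statistics” window for the full dictionary.</p>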
    </sec>
    <sec id="sec-13">
      <title>4.2.2. Researching language regularities</title>
      <p>One example of linguistic research that can be conducted by means of VLL DLE 23 is the study of how
verbal nouns denoting an action and the result of this action are formed. Such words are described in
DLE 23 using the definition pattern “Acción y efecto de + verb”. This definition pattern serves as the search
query. The results obtained are shown in Fig. 6.</p>
      <p>On the basis of these results the researcher can draw certain conclusions regarding the use of
suffixes to form nouns of this kind, e.g.:
        <list list-type="bullet">
          <list-item><p>-ada if the noun is derived from verbs denoting blows or similar actions: bofetada (slap),
puñalada (stab), etc.;</p></list-item>
          <list-item><p>-azo if the noun is derived from verbs denoting blows with something: botellazo (blow
with a bottle), culatazo (blow with a rifle butt), etc.;</p></list-item>
          <list-item><p>-ido if the noun is derived from verbs denoting sounds or noises: chillido (scream),
ladrido (barking), etc.;</p></list-item>
          <list-item><p>-ón if the noun is derived from verbs denoting energetic or quick actions: empujón (push),
resbalón (slip), etc.</p></list-item>
        </list>
      </p>
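      <p>The query by definition pattern and the subsequent grouping by suffix can be sketched in Python. The sketch is illustrative and does not reproduce the VLL DLE 23 implementation: the function names and the regular expression are assumptions, and the sample definitions are toy strings written to follow the pattern “Acción y efecto de + verb”, not quotations from DLE 23.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Definition pattern used as the search query; the verb is captured.
DEFINITION_PATTERN = re.compile(r"^Acción y efecto de\s+(\w+)")

# Suffixes discussed in the text.
SUFFIXES = ("ada", "azo", "ido", "ón")


def find_action_nouns(entries: Dict[str, str]) -> Dict[str, str]:
    """Map each matching headword to the verb named in its definition."""
    found = {}
    for headword, definition in entries.items():
        match = DEFINITION_PATTERN.match(definition)
        if match:
            found[headword] = match.group(1)
    return found


def group_by_suffix(headwords: Iterable[str]) -> Dict[str, List[str]]:
    """Group headwords by the first listed suffix they end with."""
    groups = defaultdict(list)
    for headword in sorted(headwords):
        for suffix in SUFFIXES:
            if headword.endswith(suffix):
                groups[suffix].append(headword)
                break
    return dict(groups)


# Toy definitions written for illustration, not quoted from DLE 23.
SAMPLE = {
    "empujón": "Acción y efecto de empujar.",
    "resbalón": "Acción y efecto de resbalar.",
    "ladrido": "Acción y efecto de ladrar.",
    "chillido": "Acción y efecto de chillar.",
    "botellazo": "Golpe dado con una botella.",
}

if __name__ == "__main__":
    nouns = find_action_nouns(SAMPLE)
    print(group_by_suffix(nouns))
```
]]></preformat>
      <p>Only the entries whose definition starts with the pattern are kept, so the last toy entry is excluded from the grouping.</p>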
      <p>This information can be used not only for linguistic research, but also for the preparation of
teaching materials on Spanish grammar.</p>
    </sec>
    <sec id="sec-14">
      <title>4.2.3. Statistics generation</title>
      <p>In addition to linguistic research, VLL DLE 23 is designed to generate statistics, both for the
entire dictionary and for a separate sample. Suppose, for example, that we need to count how many words in
Spanish have different forms for the masculine and feminine gender. The result is shown in
Fig. 7. The statistics obtained are as follows: 19,011 headwords, of which
        <list list-type="bullet">
          <list-item><p>840 are homonyms;</p></list-item>
          <list-item><p>111 are morphemes;</p></list-item>
          <list-item><p>2,257 form collocations;</p></list-item>
          <list-item><p>16,754 do not form collocations.</p></list-item>
        </list>
      </p>
    </sec>
    <sec id="sec-15">
      <title>5. Conclusions and future works</title>
      <p>The virtual lexicographic laboratory developed so far allows a user to analyze
the text of the explanatory Spanish dictionary and to perform on its basis:
        <list list-type="bullet">
          <list-item><p>An inventory of the headwords satisfying specified parameters (native word, foreign
word; morpheme, abbreviation, word, collocation, etc.);</p></list-item>
          <list-item><p>Extraction of linguistic characteristics of headwords from the text, which makes it possible to
identify regularities of the Spanish language that are present in the dictionary in implicit
form;</p></list-item>
          <list-item><p>Statistical studies that show the frequency of the linguistic phenomena considered (for
example, the ratio of native and borrowed vocabulary).</p></list-item>
        </list>
      </p>
      <p>In future, VLL DLE 23 will be provided with an expanded toolkit for working
separately with each dictionary entry element, determining not only its presence or absence but also
its specific content.</p>
    </sec>
    <sec id="sec-16">
      <title>6. References</title>
      <p>[1] A. Wills, E. Jóhannsson, Reengineering an Online Historical Dictionary for Readers of Specific
Texts, in: I. Kosem, T. Z. Kuhn (Eds.), Electronic lexicography in the 21st century: Smart
lexicography, Proceedings of eLex 2019 conference, Sintra, Portugal, 2019, pp. 116–129.</p>
      <p>[2] M. Alipour, B. Robichaud, M.-C. L’Homme, Towards an Electronic Specialized Dictionary for
Learners, in: I. Kosem, M. Jakubíček, J. Kallas, S. Krek (Eds.), Electronic lexicography in the
21st century: linking lexical data in the digital age, Proceedings of eLex 2015 conference,
Herstmonceux Castle, United Kingdom, 2015, pp. 51–69.</p>
      <p>[3] R. Lew, Online dictionary skills, in: I. Kosem, J. Kallas (Eds.), Electronic lexicography in the 21st
century: thinking outside the paper, Proceedings of eLex 2013 conference, Tallinn,
Estonia, 2013, pp. 16–31.</p>
      <p>[4] Sobre la 23.ª edición del Diccionario de la lengua española, 2014. URL:
https://www.rae.es/sites/default/files/Cifras_23.a_edicion_del_Diccionario.pdf</p>
      <p>[5] El nuevo diccionario académico será digital y más panhispánico, 2017. URL:
https://www.rae.es/noticias/el-nuevo-diccionario-academico-sera-digital-y-mas-panhispanico.</p>
      <p>[6] T. Roth, Going Online with a German Collocations Dictionary, in: I. Kosem, J. Kallas (Eds.),
Electronic lexicography in the 21st century: thinking outside the paper, Proceedings of eLex 2013
conference, Tallinn, Estonia, 2013, pp. 152–163.</p>
      <p>[7] D. Deksne, I. Skadiņa, A. Vasiļjevs, The modern electronic dictionary that always provides an
answer, in: I. Kosem, J. Kallas (Eds.), Electronic lexicography in the 21st century: thinking
outside the paper, Proceedings of eLex 2013 conference, Tallinn, Estonia, 2013, pp. 421–434.</p>
      <p>[8] V. Apresjan, N. Mikulin, Dictionary as an Instrument of Linguistic Research, in: Proceedings of
the XVII EURALEX International Congress: Lexicography and Linguistic Diversity, Tbilisi,
Tbilisi State University, 2016, pp. 224–231.</p>
      <p>[9] V. Shyrokov, Computer lexicography, Kyiv, 2011.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>