<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Developing Linguistic Research Tools for Virtual Lexicographic Laboratory of the Spanish Language Explanatory Dictionary</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>n Kuprii</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nunu Akopi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Technical University “Kharkiv Polytechnic Institute”</institution>
          ,
          <addr-line>Kharkiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This article addresses the problem of creating linguistic tools for the virtual lexicographic laboratory of the Spanish explanatory dictionary (DLE 23). The goal of the research is to consider issues related to the development of such tools. To achieve this goal, the dictionary was analyzed to determine how it represents linguistic facts, as well as its structure and metalanguage. On the basis of this analysis and the theory of lexicographic systems, a formal model of DLE 23 was developed, and its main components and their relationships were determined so that they can be accessed through the linguistic tools. The range of research activities that can be performed with these tools is outlined.</p>
      </abstract>
      <kwd-group>
        <kwd>Computer Lexicography</kwd>
        <kwd>Virtual Lexicographic Laboratory</kwd>
        <kwd>Digital Environments</kwd>
        <kwd>Electronic Dictionaries</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The Explanatory Dictionary of the Spanish Language (hereinafter DLE 23; Diccionario
de la lengua española. Edición del tricentenario (23ª edición)), like other large
explanatory lexicons, contains profound systemic regularities of the language which, though
almost hidden from the reader, play an important role in revealing the linguistically
informative potential of the language.</p>
      <p>In general, the properties of a language as a system, as well as their presentation
in comprehensive explanatory dictionaries, have been the subject of many studies
(see, for example, [1; 2; 3; 4; 5]). In our opinion, such properties are best represented
in monolingual explanatory dictionaries. This article focuses on large, mostly
multivolume lexicons that contain the major part of the national vocabulary and
phraseology and are characterized by a detailed description of the lexical-grammatical and
lexical-semantic systems of the language. Owing to the large volume, elaborate
structure and completeness of their lexicographic description, such dictionaries carry a
huge number of implicit linguistic, cognitive, logical and other relationships
(mostly uncontrolled), which makes such an extensive lexicographic system a kind of
“thing-in-itself” [6].</p>
      <p>This raises the question of developing a methodology and technology for
creating such lexicographic objects, as well as the question of studying the variety of effects
that operate in them explicitly or implicitly. From the outset, we are talking about
the methods of computational linguistics because, as noted in the book “Computer
lexicography” [6], it is physically impossible to perform such studies with
traditional methods. Therefore, the first task is to create digital analogues of
the corresponding traditional lexicographic works, or to convert them into digital form, and then
to make explicit the systemic linguistic regularities underlying them.</p>
    </sec>
    <sec id="sec-2">
      <title>Related works</title>
      <p>The Ukrainian Lingua-Information Fund (Kyiv), where the studies underlying this
work were conducted, developed a universal theoretical basis focused on the
construction of lexicographic objects of almost unlimited size and complexity and on
the in-depth study of language systems. This basis is the theory of lexicographic systems.
Using this theory, the Fund designed and elaborated linguistic software tools for research
based on Ukrainian, Russian and Turkish explanatory dictionaries [7; 8; 9], as well as tools for
etymological studies of the Ukrainian language [10].</p>
      <p>Using these tools (and partly even before they were created, but with
approaches based on the theory of lexicographic systems), a number of studies were
conducted that obtained a series of fundamentally new linguistic results for the
Ukrainian language. Among these results are the study and establishment
of the formal structure of the headword rows of Ukrainian verbs and nouns [9;
10; 11; 12].</p>
      <p>However, the Fund's developments, successfully applied to the Ukrainian,
Russian and Turkish languages, cannot be used automatically for Spanish
for the following reasons:
• typological features of the Spanish language, among them part-of-speech
variation, the dependence of lexical meaning on grammatical meaning, and the possibility
for a word to acquire a lexical meaning when it functions in a specific grammatical
meaning;
• peculiarities of the metalanguage and entry structure of DLE 23 used to describe
grammatical and lexical properties of Spanish language units.</p>
    </sec>
    <sec id="sec-3">
      <title>Methodology</title>
      <p>The method of formal modeling of lexicographic objects such as DLE 23 is based on the
theory of lexicographic systems developed by Prof. Volodymyr Shyrokov.
According to this theory, one of the most important relationships
is “subject – object”. In this context, the subject (denoted by the
symbol S) can be a person or a group of persons (lexicographers, linguists, experts,
etc.), and the object is a set of elementary information units (EIU), denoted IQ(D). In
other words, by processing the EIUs in its mental apparatus, the subject acquires a
set of their descriptions V(IQ(D)). This can be represented formally as follows.
For each elementary unit x ∈ IQ(D) there is a description V(x), represented as a
dictionary entry. In turn, V(x) is an element of V(IQ(D)). Therefore, V(IQ(D)),
that is, DLE 23, is composed of the set of entries V(x):</p>
      <p>
        $$ V: I_Q(D) \to V(I_Q(D)) \tag{1} $$
        $$ V(I_Q(D)) = \bigcup_{x \in I_Q(D)} V(x) \tag{2} $$
The description V(x) of an elementary information unit x is represented as a text, that is, a
certain sequence of characters called an A-word. The characters of A-words
belong to a finite alphabet A = {a1, …, an}. The alphabet of DLE 23 covers:
1) the Spanish alphabet (A ... Z, a ... z), including characters with diacritics (ñ, ç; á, é, í, ó, ú, ü);
2) Greek alphabet characters (α ... ω); 3) the international Latin alphabet; 4) dictionary
metalanguage symbols; 5) punctuation marks, including paired marks (¡!, ¿?);
6) font patterns.
      </p>
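      <p>As an illustration of equations (1) and (2) and of the notion of an A-word, the following minimal Python sketch models V as a mapping from headwords x to entry texts V(x) and checks that each entry is a word over an alphabet A. It is a sketch under stated assumptions, not the laboratory's implementation: the character set ALPHABET_A is a hypothetical, deliberately incomplete approximation of the DLE 23 alphabet (font patterns are ignored), and the entry texts are invented placeholders rather than DLE 23 content.</p>
      <preformat>
```python
import string

# Hypothetical, incomplete approximation of the DLE 23 alphabet A (font patterns ignored).
SPANISH = string.ascii_letters + "ñÑçÇáéíóúüÁÉÍÓÚÜ"
GREEK = "αβγδεζηθικλμνξοπρστυφχψω"
PUNCTUATION = " .,;:()[]-'\"¡!¿?"
METALANGUAGE = "●○|"  # hypothetical subset of metalanguage symbols
ALPHABET_A = frozenset(SPANISH + GREEK + PUNCTUATION + string.digits + METALANGUAGE)


def is_a_word(text: str, alphabet: frozenset = ALPHABET_A) -> bool:
    """Return True if every character of `text` belongs to the alphabet A."""
    return all(ch in alphabet for ch in text)


# Equation (1): V maps each headword x in I_Q(D) to its entry V(x).
# The entry texts are invented placeholders, not quotations from DLE 23.
V = {
    "cómico": "cómico. adj. Definición de ejemplo.",
    "vaca": "vaca. f. Definición de ejemplo.",
}


def dictionary_text(entries: dict) -> set:
    """Equation (2): V(I_Q(D)) as the union of the entries V(x) over all x."""
    result = set()
    for x in entries:
        result = result.union({entries[x]})
    return result


print(all(is_a_word(v) for v in V.values()))  # True
print(len(dictionary_text(V)))  # 2
```
      </preformat>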
      <p>
        Within each A-word, A-subwords formed of characters of the alphabet A can be
distinguished. Let us denote the set of A-subwords by B[V(x)]. In the case of
DLE 23, the A-subwords are: 1) headword; 2) headword row; 3) headword
variants; 4) etymology; 5) irregular forms; 6) orthography; 7) definition block. Together
they compose the set B[V(x)]:
$$ B[V(x)] \equiv \{\beta_i(x) \mid i = 1, 2, \dots, n\} \tag{3} $$
Here βi(x) is an A-subword and i is the index number of the A-subword in a
DLE 23 entry. Note that the identification of βi(x)-elements is based on the
peculiarities of the dictionary metalanguage, which clearly establishes the rules for
representing each element of the dictionary entry. Thus, we have defined
the first lexicographic structure induced within the set of descriptions V(IQ(D)):
$$ \beta(V(I_Q(D))) \equiv \beta = \{\beta_i,\ i = 1, 2, \dots\} \tag{4} $$
Furthermore, β(x) denotes the set of structural elements of the entry V(x) devoted
to the selected headword x:
$$ \beta(x) = \{\beta_i(x),\ i = 1, 2, \dots\} \tag{5} $$
In turn, the β(x)-structure may be subdivided into smaller elements by the action of an
operator σ:
$$ \sigma: \beta \to \sigma[\beta] \tag{6} $$
      </p>
      <p>
        The application of the method based on the theory of lexicographic systems to the
formal modeling of DLE 23 is described in the next section. However, the principles of
revealing β(x)-structures and the respective σ[β(x)] elements, as well as of building a
formal model of DLE 23, cannot be worked out without a preliminary study
of the metalanguage and entry structure of the dictionary.
      </p>
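      <p>The set of A-subwords, the structure β and the operator σ can be sketched in Python under simple assumptions: an entry β(x) is stored as a dictionary from the index i to the A-subword βi(x), numbered as in the list of seven A-subwords above, and σ is parameterized by a splitting rule supplied by the caller. The names beta and sigma, the splitting rule and the toy entry are hypothetical and serve only to show how the structures relate; they do not reproduce the authors' software.</p>
      <preformat>
```python
from typing import Callable

# Index numbers i of the A-subwords beta_i(x) of a DLE 23 entry, as listed in the text.
A_SUBWORDS = {
    1: "headword",
    2: "headword row",
    3: "headword variants",
    4: "etymology",
    5: "irregular forms",
    6: "orthography",
    7: "definition block",
}


def beta(entries: dict) -> set:
    """Collect the elements beta_i(x) of all entries as (x, i, text) triples."""
    return {(x, i, text) for x, entry in entries.items() for i, text in entry.items()}


def sigma(split: Callable[[str], list], element: str) -> list:
    """Apply a (hypothetical) splitting rule to one beta-element, yielding smaller elements."""
    return [part.strip() for part in split(element) if part.strip()]


# Toy entry beta(x) for x = "cómico"; the texts are invented placeholders.
entries = {
    "cómico": {
        1: "cómico",
        4: "Etimología de ejemplo",
        7: "adj. Definición uno. || 2. adj. Definición dos.",
    }
}

for x, i, text in sorted(beta(entries)):
    print(x, A_SUBWORDS[i], "->", text)

# Splitting the definition block on "||" as one possible sigma:
print(sigma(lambda s: s.split("||"), entries["cómico"][7]))
# ['adj. Definición uno.', '2. adj. Definición dos.']
```
      </preformat>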
    </sec>
    <sec id="sec-4">
      <title>Lexicographic system of DLE 23</title>
      <sec id="sec-4-1">
        <title>Entry structure and metalanguage peculiarities</title>
        <p>The entry is subdivided into the left and right parts. The left part consists of the
headword, the headword row and an information block. The latter contains headword
variants, etymological information, orthography and inflectional properties. The right
part explains the meaning of the headword; its main and obligatory element is the
definition. For lemmas with part-of-speech variation, the definitions are
grouped according to part of speech (grammatical category). An example of a
dictionary entry is shown in Figure 1. Let us analyze in detail how the
linguistic properties of Spanish words are represented in the entry.
The information zones corresponding to β5(x) and β6(x) have no metalanguage marker of their
own and can be identified in the entry only by their position in the
sequence of information zones. For example, β5(x) always follows the etymology. The
definition block, in turn, can be decomposed into β7(x)GRAM (the
part-of-speech note and/or grammatical category), β7(x)PRAGM (a group of notes indicating the
pragmatic use of the headword, e.g. domain, geographic area, social dialect, etc.) and
β7(x)SEM (the definitions corresponding to β7(x)GRAM and β7(x)PRAGM). To group the
definitions according to each β7(x)GRAM, DLE 23 uses the following metalanguage means:
• a black circle “●” to mark the group of definitions belonging to a part of speech
of the headword;
• a white circle “○” to mark the group of definitions belonging to a grammatical
category of the headword;
• vertical parallel bars “||” to separate definitions within a group marked with a black
or white circle.</p>
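        <p>The grouping conventions just listed can be illustrated with a small parser for a plain-text rendering of the definition block β7(x). This is a sketch under assumptions that DLE 23 itself does not state: each group is assumed to begin with a black or white circle followed by a one-token grammatical label (a leading unmarked group is treated as the first part of speech), and the sample block is invented. Real entries, with multi-word labels and pragmatic notes, would need a richer grammar.</p>
        <preformat>
```python
import re

BLACK_CIRCLE = "●"  # opens a group of definitions for a part of speech
WHITE_CIRCLE = "○"  # opens a group of definitions for a grammar category
SEPARATOR = "||"    # separates definitions within one group


def parse_definition_block(block: str) -> list:
    """Split a definition block into groups of definitions (hypothetical layout)."""
    groups = []
    marker = None  # the leading unmarked group is treated as a part-of-speech group
    # A capturing group makes re.split keep the circle markers in the output.
    for piece in re.split(f"([{BLACK_CIRCLE}{WHITE_CIRCLE}])", block):
        if piece in (BLACK_CIRCLE, WHITE_CIRCLE):
            marker = piece
            continue
        text = piece.strip()
        if not text:
            continue
        label, _, body = text.partition(" ")  # assumes a one-token label, e.g. "m."
        groups.append({
            "kind": "grammar category" if marker == WHITE_CIRCLE else "part of speech",
            "label": label,
            "definitions": [d.strip() for d in body.split(SEPARATOR) if d.strip()],
        })
    return groups


sample = "adj. Definición uno. || Definición dos. ● m. Definición tres. ○ prnl. Definición cuatro."
for group in parse_definition_block(sample):
    print(group["kind"], group["label"], group["definitions"])
# part of speech adj. ['Definición uno.', 'Definición dos.']
# part of speech m. ['Definición tres.']
# grammar category prnl. ['Definición cuatro.']
```
        </preformat>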
      </sec>
      <sec id="sec-4-2">
        <title>Model of the lexicographic system of DLE 23</title>
        <p>In the structure of dictionary entries, we distinguish the set of register (head) lexical
units W = {x}, which serve as identifiers of the corresponding dictionary entries
V(x). The DLE 23 register includes words, morphemes, certain phrases and
abbreviations. Including morphemes, phrases and abbreviations as headwords is
uncommon in most explanatory dictionaries. For convenience, all language units
that act as headwords will be called headwords.</p>
        <p>
          In the structure of each dictionary entry V(x) there is a “left part” L(x), which
consists of certain headword parameters, and a “right part” P(x), in which the
lexicographic representation of the semantics of the headword x is given. In
DLE 23 we distinguish two types of language units: lexical-level units and
collocations (which include the headword). It is therefore natural to present the structure of
the dictionary entry V(x) as a union of descriptions (dictionary
entries) of structural units of both types:
$$ V(x) \equiv V^{Lex}(x) \cup \Big[ \bigcup_{i=1}^{n(x)} \bigcup_{j=1}^{m(i)} V_{ij}^{Fras}(x) \Big] \tag{7} $$
Here VLex(x) is the lexicographic description of the headword x; VijFras(x) is the
description of the j-th phrase of the i-th type; m(i) is the number of phrases of the i-th type; and n(x)
is the number of phrase types in the dictionary entry V(x). Each lexicographic
description VLex(x) and ViFras(x) corresponds to the basic structure shown in Figure 2.
        </p>
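        <p>The decomposition of V(x) into a lexical description and phrase descriptions grouped by type can be mirrored by a small data structure. In this Python sketch (class and method names are hypothetical), phrases maps the phrase type i to the list of descriptions VijFras(x), j = 1, …, m(i), so that n(x) and m(i) become simple counts and the union on the right-hand side is a set union. For illustration only, type 1 is assumed to hold “noun + adjective”-type collocations and type 2 all other collocations, anticipating the division into VCOL’ and VCOL’’ introduced later in this section; the stored strings stand in for full descriptions.</p>
        <preformat>
```python
from dataclasses import dataclass, field


@dataclass
class EntryDescription:
    """V(x) viewed as V^Lex(x) plus phrase descriptions V_ij^Fras(x) grouped by type i."""
    lex: str  # V^Lex(x)
    phrases: dict = field(default_factory=dict)  # i -> [V_i1^Fras(x), ..., V_im(i)^Fras(x)]

    def n(self) -> int:
        """n(x): the number of phrase types present in the entry."""
        return len(self.phrases)

    def m(self, i: int) -> int:
        """m(i): the number of phrases of type i (0 if the type is absent)."""
        return len(self.phrases.get(i, []))

    def descriptions(self) -> set:
        """The union of V^Lex(x) with all V_ij^Fras(x)."""
        result = {self.lex}
        for i in self.phrases:
            result = result.union(self.phrases[i])
        return result


# Assumed numbering: 1 = "noun + adjective"-type collocations, 2 = other collocations.
entry = EntryDescription(lex="cómico", phrases={1: ["cómico de la legua"], 2: ["ponerse cómico"]})
print(entry.n(), entry.m(1), entry.m(3))  # 2 1 0
print(sorted(entry.descriptions()))  # ['cómico', 'cómico de la legua', 'ponerse cómico']
```
        </preformat>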
        <p>In the case of V = VLex(x), the headword of the dictionary entry with its
corresponding parameters (hereinafter called the parameters of the headword) acts as L0.
For ViFras(x), L0 is the phrase in the form in which it is registered in the dictionary, plus the parameters of the
head unit. The structure of the right part P0 is identical for lexical units and phrases.
Arrows hereinafter indicate inclusion relations. Textual analysis of the dictionary
entries showed that the structures V(x) are almost identical for collocations and
headwords.</p>
        <p>
          To build a formal model of the lexicographic structure of DLE 23 that takes into
account the features mentioned above, we relied on the theory of L-systems by
Volodymyr A. Shyrokov [5], according to which any dictionary can be represented as:
$$ D = \{ I(D),\ V(I(D)),\ \beta,\ \sigma[\beta],\ \mathrm{Red}[V(I(D))] \} \tag{8} $$
Here D is DLE 23; I(D) = {xi} is the set of headwords; V(I(D)) = {V(xi)} is the set of
lexicographic descriptions, namely dictionary entries; β is the set of structures within
V(I(D)) identified during analysis of the dictionary text; σ[β] is a single structure generated
by the operator σ on β; the restriction of the operator σ to V(xi) generates the
dictionary entry σ[xi]; Red[V(I(D))] is a recursive reduction mechanism that detects finer
structural elements of the dictionary. In turn, the set of lexicographic
descriptions of each unit xi ∈ I(D) can be decomposed into several subsets (Figure 3).
Equation (8) is hereinafter considered the model of the lexicographic system of DLE 23.
        </p>
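        <p>The components of model (8) can be gathered into one object. The Python sketch below does so under loudly stated assumptions: the theory does not prescribe these data types, the class and function names are hypothetical, and Red is approximated by one simple interpretation, namely a recursive application of an ordered list of splitting rules that turns an element into a tree of ever finer elements. The toy entry is invented.</p>
        <preformat>
```python
from dataclasses import dataclass
from typing import Callable

Splitter = Callable[[str], list]


def red(element: str, splitters: list[Splitter]) -> dict:
    """Hypothetical stand-in for Red: split with the first rule, then reduce
    each part recursively with the remaining rules."""
    if not splitters:
        return {"text": element, "parts": []}
    first, rest = splitters[0], splitters[1:]
    parts = [p.strip() for p in first(element) if p.strip()]
    return {"text": element, "parts": [red(p, rest) for p in parts]}


@dataclass
class LexicographicSystem:
    """Components of model (8) for a dictionary D (assumed representation)."""
    entries: dict  # V(I(D)): headword x_i -> entry text V(x_i)
    beta: dict     # beta: headword x_i -> {index j: beta_j(x_i)}

    @property
    def headwords(self) -> set:
        """I(D) = {x_i}."""
        return set(self.entries)

    def sigma(self, x: str, j: int, splitters: list[Splitter]) -> dict:
        """sigma restricted to one entry: reduce beta_j(x) into finer elements."""
        return red(self.beta[x][j], splitters)


system = LexicographicSystem(
    entries={"cómico": "toy entry text"},
    beta={"cómico": {7: "adj. Definición uno. || adj. Definición dos."}},
)
# Level 1 splits definitions on "||"; level 2 separates the label from the definition text.
tree = system.sigma("cómico", 7, [lambda s: s.split("||"), lambda s: s.split(" ", 1)])
print([leaf["text"] for part in tree["parts"] for leaf in part["parts"]])
# ['adj.', 'Definición uno.', 'adj.', 'Definición dos.']
```
        </preformat>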
        <p>By VW(xi) and VCOL(xi) we denote the sets of descriptions of the headword and of derived
collocations, respectively; VWL(xi) and VWR(xi) correspond to the “left” and “right”
parts. Further, the set VCOL(xi) is divided into two subsets: a) VCOL’(xi) for
collocations of the “noun + adjective” type; b) VCOL’’(xi) for collocations of other types, e.g.
verbal, adverbial, prepositional, etc.</p>
        <p>An example of the content of the β-structures for VW(cómico), VCOL’(cómico de la legua)
and VCOL’’(ponerse cómico), based on the entry text (Fig. 1), is shown in Table 2.</p>
        <p>As stated above, the structures βj(xi) can be split into smaller objects σ[βj(xi)]
generated by the σ-operator. In DLE 23 there are σ[βj(xi)] objects corresponding to:
1. a certain information element of the entry, e.g. only the etymology, headword variants,
definition block, etc. (σ0);
2. the distribution of linguistic information in the entry, e.g. the distribution of lexical meanings
by grammatical classes or categories of a headword (σ1);
3. information provided by the dictionary metalanguage, e.g. definition types (σ2),
origin, etymon language, number of definitions, etc.;
4. linguistic information concerning the interaction of different language levels, e.g.
morphology – semantics (σ3).</p>
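        <p>Two of the σ-objects listed above can be computed in a few lines once a definition block has been parsed. The Python sketch below assumes a hypothetical representation of β7(x) as a list of (grammatical label, definition) pairs; σ1 is then the distribution of definitions over grammatical classes or categories, and the number of definitions (one of the metalanguage facts in item 3) is the length of the list. The sample block is invented.</p>
        <preformat>
```python
from collections import Counter

# beta_7(x) as (grammatical label, definition) pairs; the sample texts are invented.
block = [
    ("adj.", "Definición uno."),
    ("adj.", "Definición dos."),
    ("m.", "Definición tres."),
]


def sigma1(pairs: list) -> Counter:
    """sigma_1: how the definitions are distributed over grammatical labels."""
    return Counter(label for label, _ in pairs)


def number_of_definitions(pairs: list) -> int:
    """The number of definitions in the block."""
    return len(pairs)


print(sigma1(block))                 # Counter({'adj.': 2, 'm.': 1})
print(number_of_definitions(block))  # 3
```
        </preformat>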
        <p>The full list of σ[βj(xi)] objects resulting from the application of the σ-operator to βj(xi)
structures is given in Table 3.
To support linguistic research on the basis of DLE 23, the interface of the virtual
lexicographic laboratory provides:
• access to the subsystems VLEX(x), VCOL’(xi) and VCOL’’(xi) of the Spanish language
dictionary (vocabulary and collocations);
• selection of σ[βj(xi)] structures within the subsystems to obtain linguistic facts of any
kind (for example, the etymology and semantics of a Spanish word);
• logical operations on σ[βj(xi)] structures to reveal groups of Spanish language
units with common linguistic properties.</p>
        <p>By applying the logical operations “AND”, “OR” and “NOT” to different σ[βj(xi)] objects
through the interface, it is possible to obtain a sample of Spanish language units with
common linguistic characteristics. As an example, consider a sample of
words whose suffix denotes a process and its result. In this case, the set of
σ[βj(xi)] criteria is: σ0[β1(xi)] = «aje» AND σ2[β1(xi)] = «m.» AND σ3[β1(xi), β7(xi)]
= «Acción y efecto de + X». As a result, we obtain words with the suffixes -ado
(lavado, peinado), -aje (etiquetaje, embalaje) and -ión (cubrición, gestión).</p>
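        <p>The selection just described behaves like set algebra over headwords: each σ-criterion picks out the set of headwords satisfying it, AND is intersection, OR is union, and NOT is the complement with respect to all headwords. The Python sketch below shows this under assumptions: the index, its keys (suffix, pos, formula) and the helper names are hypothetical, and the four entries with their values are invented for illustration; they are not query results from the laboratory.</p>
        <preformat>
```python
# Hypothetical index of sigma-values per headword; the data are invented for illustration.
INDEX = {
    "embalaje": {"suffix": "aje", "pos": "m.", "formula": "Acción y efecto de + X"},
    "etiquetaje": {"suffix": "aje", "pos": "m.", "formula": "Acción y efecto de + X"},
    "paisaje": {"suffix": "aje", "pos": "m.", "formula": "otra"},
    "gestión": {"suffix": "ión", "pos": "f.", "formula": "Acción y efecto de + X"},
}
ALL_HEADWORDS = set(INDEX)


def where(key: str, value: str) -> set:
    """Headwords whose sigma-structure `key` has exactly the given value."""
    return {x for x, props in INDEX.items() if props.get(key) == value}


def q_and(*sets: set) -> set:
    """AND: headwords satisfying every criterion (set intersection)."""
    return set.intersection(*sets)


def q_or(*sets: set) -> set:
    """OR: headwords satisfying at least one criterion (set union)."""
    return set.union(*sets)


def q_not(s: set) -> set:
    """NOT: headwords outside the given set (complement within the index)."""
    return ALL_HEADWORDS.difference(s)


sample = q_and(where("suffix", "aje"), where("pos", "m."), where("formula", "Acción y efecto de + X"))
print(sorted(sample))                      # ['embalaje', 'etiquetaje']
print(sorted(q_not(where("pos", "m."))))  # ['gestión']
```
        </preformat>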
        <p>The linguistic tools for research on the Spanish language are grouped
in tabs devoted to each DLE 23 information element, for example
“Headword”, “Headword variants”, “Etymology”, etc. Each tab corresponds to a specific
σ[βj(xi)]-structure and contains checkboxes for setting σ-links (“Origin”, “Homonymy”,
“Headword type”, etc.).</p>
        <p>In our opinion, the interface features described above distinguish the virtual
lexicographic laboratory from other electronic (online) lexicographic resources
(www.dle.rae.es, www.oed.com, dictionary.cambridge.org, etc.).</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>One of the main tasks of modern computer lexicography is updating and maintaining
fundamental lexicons (large paper explanatory dictionaries) in a digital environment.
Computer lexicography successfully solves this task by combining many years of
traditional lexicographic experience with the latest computer technologies. Virtual
lexicographic laboratories (VLLs) are the result of this combination, i.e. systems that
support both working with dictionary material and conducting a series of linguistic
studies.</p>
      <p>The virtual lexicographic laboratory provides users with linguistic tools that offer a
wide range of opportunities for studying grammatical, semantic, pragmatic and
other features of Spanish linguistic units. Unlike digital dictionaries and dictionary
writing systems, the VLL offers a software interface for:
• access administration: user authorization and identification; adding and removing
users; access control (read-only, or reading and editing of the dictionary);
• lexicographic work: creation of a number of derivative dictionaries on the basis of the
explanatory dictionary; representation of dictionary entries in any format;
• research work: research at a particular language level presented in the explanatory
dictionary (grammar, including derivation; lexis, including semantics; pragmatics), and
research at the junction of language levels: grammar and semantics, word formation
and semantics, semantics and pragmatics, etc.</p>
      <p>Unabridged monolingual dictionaries in digital format, DLE 23 among them, prove to be
powerful research environments that facilitate navigation and access to
their structural elements and the integration of language facts in one object. This can be
achieved by formalizing the structure of the dictionary text in the form of β-structures and
σ-links.</p>
      <p>In the future, we plan to develop the theory of lexicographic systems by Prof.
Volodymyr Shyrokov further in order to create a virtual lexicographic laboratory for a
dictionary of Spanish inflection. To build this laboratory, it is necessary to work
out the principles of formal modeling of inflection, specifically for the Spanish
language.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Saussure</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          : Cours de linguistique générale, Paris (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Mulder</surname>
            ,
            <given-names>J. W. F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hervey</surname>
            ,
            <given-names>S. G. J.</given-names>
          </string-name>
          :
          <article-title>Language as a System of Systems</article-title>
          . In: La Linguistique.
          <volume>11</volume>
          (
          <issue>2</issue>
          ), pp.
          <fpage>3</fpage>
          -
          <lpage>22</lpage>
          (
          <year>1975</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Trubetzkoy</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          : Principles of phonology, Berkeley, University of California Press (
          <year>1969</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Mel'čuk</surname>
            ,
            <given-names>I. A.</given-names>
          </string-name>
          :
          <article-title>Explanatory combinatorial dictionary</article-title>
          . In: Open Problems in Linguistics and Lexicography. Polimetrica, Monza, pp.
          <fpage>222</fpage>
          -
          <lpage>355</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Hjelmslev</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Prolegómenos a una teoría del lenguaje</article-title>
          , Madrid (
          <year>1971</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Shyrokov</surname>
          </string-name>
          , V.: Computer lexicography,
          <source>Kyiv</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. Ukrainian language dictionary in 20 volumes: Rusanivsky V. (ed.),
          <source>Kyiv</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Ozhegov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Tolkovyiy slovar russkogo yazyika</article-title>
          , Moscow (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Shyrokov</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shyrokov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Zastosuvannia formalizmu nechitkykh mnozhyn dlia vyznachennia hramatychnykh staniv turetskykh sliv</article-title>
          .
          <source>In: Movoznavstvo</source>
          , pp.
          <fpage>51</fpage>
          -
          <lpage>56</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <article-title>Etymological dictionary of the Ukrainian language in 7 volumes</article-title>
          : Melnychuk (ed.),
          <source>Kyiv</source>
          (
          <year>1989</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Pogribna</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chumak</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shyrokov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shevchenko</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Linhvistychna klasyfikatsiia ukrainskoho imennyka u svitli teorii leksykohrafichnykh system</article-title>
          .
          <source>In: Movoznavstvo</source>
          , pp.
          <fpage>62</fpage>
          -
          <lpage>82</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Rabulets</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sukharina</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shyrokov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yakymenko</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Diieslovo v leksykohrafichnii systemi</article-title>
          ,
          <source>Kyiv</source>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>