<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Lviv Polytechnic National University</institution>
          ,
          <addr-line>12 Stepana Bandera Street, Lviv, 79013</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Technical University “Kharkiv Polytechnic Institute”</institution>
          ,
          <addr-line>Kyrpychova str. 2, Kharkiv, 61002</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Ukrainian Lingua-Information Fund of NAS of Ukraine</institution>
          ,
          <addr-line>3, Holosiivskyi avenue, Kyiv, 03039</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Vasyl Lytvyn</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The paper describes the main methodological and technological solutions for dictionary parsing developed by the authors at the Ukrainian Lingua-Information Fund (ULIF). The research is based on the digital text of the Dictionary of the Spanish Language (DLE 23). Explanatory dictionaries of national languages, as the most complex multi-parameter lexicographic systems, are of the greatest interest: they provide the most complete lexicographic description of a language, are created by leading specialists (linguists and IT engineers), and can use modern digital technologies to the fullest extent. However, the problem of extracting linguistic information for further use (especially for computerized text processing) has not yet been solved. The research therefore focuses on a parsing technology for developing a virtual lexicographic laboratory on the basis of the dictionary text parsed from the online version of DLE 23.</p>
      </abstract>
      <kwd-group>
        <kwd>digital lexicography</kwd>
        <kwd>lexicographic system</kwd>
        <kwd>lexicographic data model</kwd>
        <kwd>lexicographic data</kwd>
        <kwd>data analysis</kwd>
        <kwd>database</kwd>
        <kwd>user interface</kwd>
        <kwd>lexicographic data formats</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        One of the main tasks of modern lexicography is to use the potential of the digital environment to
meet the information needs of today's advanced users and lexicographers. Currently, both dictionary
compilation and dictionary logistics rely on digital technologies, primarily corpus query systems
(CQS) and dictionary writing systems (DWS), i.e. digital systems for compiling and updating
dictionaries. A new challenge for lexicography is the direct participation of IT
specialists at all stages of the creation and use of lexicographic products [
        <xref ref-type="bibr" rid="ref1 ref10 ref2 ref3 ref4 ref9">1–4, 9, 10</xref>
        ].
      </p>
      <sec id="sec-1-1">
        <p>Despite significant advances in modern digital lexicography, the question of standards for representing lexicographic data that are maximally adapted to the digital environment remains open. The same applies to standards for working with these data.</p>
        <p>
          Explanatory dictionaries of national languages are of the greatest interest here, as they
provide the most complete lexicographic description of a language, are created by leading specialists
and can use modern technologies to the fullest extent [
          <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
          ]. CQS and DWS make continuous work possible: the dictionary is created and edited without
interruption, and the user can access lexicographic information as it stands at the current stage of
the lexicographic process (Oxford English Dictionary [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], Dictionary of the Ukrainian Language in 20
volumes [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]). However, although their user interfaces are more advanced than those of printed versions,
their ability to search, analyze and summarize linguistic information, primarily for
professionals, is still limited. Authors traditionally design not only the structure and content of
dictionary entries, but also the search capabilities of the dictionary. As a result, the problem of
extracting linguistic information for further use by experts in their research has not yet been
solved. Therefore, the goals of our research are 1) to develop interface schemes for linguistic
research based on the entries of an explanatory dictionary; 2) to design and implement a database
to support lexicographic data; and 3) to build an effective research
toolkit. Unlike paper dictionaries, this is a feasible task for digital dictionaries [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Recent developments in dictionary parsing</title>
      <sec id="sec-2-1">
        <title>2.1. Parsing and data analysis in general terms</title>
        <p>Parsing is the process of converting structured text into a particular data structure. Any
format suited to the information contained in the source text can serve as the target data structure.
In natural language processing the term “parsing” was first used for syntactic analysis and was later
extended to related kinds of analysis; Google Translate still renders “parsing” as syntax analysis.
Today the technique is used wherever large volumes of data must be processed that would be
difficult, if not impossible, to handle manually. Parsing is typically used to organize, process,
store, and extract data from websites. Its primary goal is to make data available in a
machine-readable format, since most data is published in human-readable form. Depending on
particular needs and goals, parsing technologies can be applied in a wide range of ways. Parsing is
therefore becoming increasingly important for businesses as the digital economy grows, and data
analytics is an increasingly significant component of research and organizational management.
Although data comes from a variety of sources, the Web remains the largest repository. As big data
analytics, artificial intelligence, and machine learning develop, businesses are using more complex
methods to retrieve information from the internet.</p>
        <p>Parsing is carried out by parsers, a term that normally refers to software applications.
Parsers, sometimes known as “bots”, are designed to browse websites, download pertinent pages, and
extract information that can be used for different purposes. By automating this process, bots can
retrieve vast amounts of data in a short time, which is a clear benefit given that data is updated
frequently. Websites hold all kinds of data, including text, photos, and videos. Parsing has many
uses, particularly in data analysis. Market research firms use parsers to extract data from internet
forums or social media in order to analyze consumer sentiment. Others research competitors by
extracting data from vendor websites such as Amazon or eBay. Google analyzes, ranks, and indexes
web content via parsing. Contact parsing is another common practice: businesses collect contact
details from the internet and use them for marketing. There are few limits on the applications of
parsing; mostly, everything comes down to goals and creativity.</p>
        <p>Nevertheless, parsing has a negative side: data, including bank account information and other
personal information, can be parsed for the purposes of fraud and intellectual property theft.
Although the precise approach may differ depending on the program or tools used, all parsers
follow the same three basic steps:</p>
        <p>Step 1: Sending an HTTP request to the server. The parser's first action is to send an HTTP
request to the target website.</p>
        <p>Step 2: Extracting and parsing the code from the website. Once the bot has access to the
website, it can read and extract the site's HTML or XML code, which defines the structure of the
site's content. The parser then parses the code, that is, breaks it down into its component parts,
so that it can recognize and extract predefined elements or objects. These
could be identifiers, tags, classes, ratings, particular sentences, or other data.</p>
        <p>Step 3: Locally storing the pertinent data. After receiving, extracting, and parsing the HTML
or XML, the parser saves the pertinent information locally. Typically, the data is saved in a
structured format, such as .csv or .xls.</p>
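        <p>The three steps above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not the software described in this paper: the function names, the user-agent string and the choice of hyperlinks as the “predefined elements” are assumptions made for the example, and a literal page stands in for the result of a real HTTP request.</p>
        <preformat><![CDATA[
```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch(url: str) -> str:
    """Step 1: send an HTTP request and return the response body as text."""
    request = Request(url, headers={"User-Agent": "example-parser/0.1"})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


class LinkExtractor(HTMLParser):
    """Step 2: break the HTML into parts and keep predefined elements (links)."""

    def __init__(self) -> None:
        super().__init__()
        self.rows = []        # collected (href, link text) pairs
        self._href = None     # href of the <a> element currently open, if any
        self._text = []       # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href", "")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.rows.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html: str):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


def to_csv(rows) -> str:
    """Step 3: serialise the extracted rows in a structured format (CSV)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["href", "text"])
    writer.writerows(rows)
    return buffer.getvalue()


# Offline demonstration: a literal page stands in for the result of fetch(url).
SAMPLE_PAGE = '<ul><li><a href="/a">alpha</a></li><li><a href="/b">beta</a></li></ul>'
LINKS = extract_links(SAMPLE_PAGE)
CSV_TEXT = to_csv(LINKS)
```
]]></preformat>
        <p>In a real run, the text returned by fetch would be passed to extract_links, and the CSV text would be written to a local file.</p>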
        <p>Once these steps are complete, the data can be used as planned. In practice, however, the
process is carried out many times rather than all at once, because a number of problems require
attention. For instance, a website may crash if poorly designed parsers send an excessive number of
HTTP requests. The restrictions on what bots may and may not do also vary from website to
website.</p>
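        <p>One common way for a bot to respect such restrictions is to read the site's robots.txt rules and to pause between requests. The sketch below assumes this approach; it is not taken from the software described in this paper. The rules text, the agent name and the URLs are invented for the example, and the fetch and sleep functions are passed in so that the logic can be tried without network access.</p>
        <preformat><![CDATA[
```python
import time
from urllib.robotparser import RobotFileParser

# Invented robots.txt content, used instead of downloading a real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def polite_fetch_plan(rules, agent, urls, default_delay=1.0):
    """Return (url, delay in seconds) for every URL the rules allow."""
    delay = rules.crawl_delay(agent) or default_delay
    return [(url, float(delay)) for url in urls if rules.can_fetch(agent, url)]


def crawl(plan, fetch, sleep=time.sleep):
    """Fetch the planned URLs one by one, waiting between consecutive requests."""
    results = {}
    for index, (url, delay) in enumerate(plan):
        if index:  # no pause is needed before the first request
            sleep(delay)
        results[url] = fetch(url)
    return results


PLAN = polite_fetch_plan(
    build_rules(ROBOTS_TXT),
    "example-parser",
    [
        "https://example.org/entry/a",
        "https://example.org/private/x",
        "https://example.org/entry/b",
    ],
)
```
]]></preformat>
        <p>Here the disallowed URL is dropped from the plan, and the two remaining requests are separated by the delay announced in the rules.</p>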
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Parsing techniques earlier used in ULIF dictionary-making practice</title>
        <p>The first-order objects for parsing are monolingual explanatory dictionaries, which describe a
general-purpose language in its entirety, and specialized dictionaries, which describe
special-purpose languages. Because parsing is a rather resource-intensive process, the list of objects
should be restricted to reputable dictionaries that most accurately reflect the range of language
units. Specialized dictionaries may not be very extensive, but they can be viewed as an extension of
the primary general language dictionary: they add new meanings, expand the list of language
units, and use representations of parameters that differ from the standard one.</p>
        <p>Parsing a printed dictionary text. This was the first technology developed at ULIF. It was
used to convert the printed Dictionary of the Ukrainian Language (SUM 20) into database
format during the creation of a virtual lexicographic laboratory designed to publish and update the
dictionary. Parsing with this technology includes the following steps:
1. Analyzing the printed text and constructing a conceptual model of the lexicographic system.
2. Scanning and recognizing the text and reproducing it in a digital word processor (with the
recovery of structural element markers in linear text).
3. Constructing a database schema based on the conceptual model.
4. Converting the marked-up text into a database.
5. Building a computer-based toolkit to provide access to the structural elements of dictionary
entries.
6. Developing a computer toolkit to build sub-dictionaries on the basis of the main dictionary
text.</p>
        <p>Parsing a dictionary in PDF format is an improved variant of the first technology and is
applied to dictionaries in the publishing system format (PDF). This parsing technology
was applied to the Dictionary of Ukrainian Biological Terminology. The main stages are:
1. Converting dictionary text into word processor format (.doc).
2. Building a conceptual model of the lexicographic system.
3. Text verification (recovering structural element markers in linear text), using basic HTML
markers for marking up structural elements.
4. Building an XML schema based on a conceptual lexicographic model.
5. Converting the marked-up text into a structured XML file.
6. Building a database schema based on the XML structure.
7. Converting the XML file into a database.
8. Building a website offering a predefined interface.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Parsing technique for DLE 23</title>
      <p>
        Our interest in parsing the Dictionary of the Spanish Language (DLE 23) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is motivated by the
following factors: 1) the international status of the Spanish language; 2) the scientific status
of the dictionary; 3) its distinctive school of lexicography. Another, and perhaps the most important,
reason for choosing this dictionary is the availability of a digital version in HTML5
format, which guarantees the authenticity of the dictionary text and a transparent structure. DLE 23
is a basic dictionary that includes words frequently used in Latin America and Spain. Every
dictionary entry includes the lexical meanings of language units and an in-depth explanation of their
grammatical, syntactic, and pragmatic characteristics. It should be mentioned that the
dictionary's philosophy is derived from the ideas of the renowned Spanish lexicographer J. Casares [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
This idea holds that a dictionary is a tool that gives the user the resources they need to locate the
pertinent words and phrases they might need during communication, rather than a collection of
entries sorted alphabetically [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Our goal, however, is to develop techniques for parsing the entire
text of the digital version of the dictionary and to present the results for use not only
by advanced users who treat the dictionary as a reference and information system, but also by
specialists in the field of computer-assisted text processing.
      </p>
      <p>In the context of our research, the explanatory dictionary of a national language is considered
a comprehensive source of information for linguistic research. Owing to their large volume, complex
structure and completeness of lexicographic description, such dictionaries carry a huge number of
implicit linguistic, cognitive, logical and other relations that are difficult or almost impossible to
study by traditional methods. A digital dictionary must therefore provide access to any
structural element of a dictionary entry and the ability to select a set of entries that meet the user's
research interests. In other words, the interface of a digital dictionary should support searching
not only by headword, as most digital dictionaries do, but also by the other structural elements of
an entry.</p>
      <p>The algorithm of the parsing procedure, first created and used for DLE 23, is as follows:
1. Scanning the printed text of the dictionary, recognizing and generating a verified list of
headwords of the dictionary entries.
2. Developing a program (bot) that reads dictionary entries in HTML5 format from the
dictionary website.
3. Building a conceptual model of the lexicographic system (based on the structure of the
dictionary entry as represented on the dictionary website).
4. Establishing the correspondence of HTML5 markers to the structural elements of the
conceptual model (a special software package was developed for this purpose).
5. Building the schema of an XML file.
6. Converting linear text in HTML markup to an XML file.
7. Designing a database.
8. Building software tools to provide access to the structural elements.</p>
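      <p>Steps 4 and 6 of this algorithm can be illustrated with a short Python sketch. It is not the software package mentioned in step 4: the correspondence table, the element names ENTRY and SENSE, and the reading of the class names (f, n2, j, j2, n_acep, d) are assumptions inferred from the entry markup shown in Example 3 of Section 4, and only the first sense of that entry is used as input.</p>
      <preformat><![CDATA[
```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical correspondence table (step 4): (tag, class) -> model element.
MARKER_TO_ELEMENT = {
    ("header", "f"): "RR",        # headword
    ("p", "n2"): "ETYM",          # etymology
    ("span", "n_acep"): "MNGN",   # definition number
    ("abbr", "d"): "REM-GR",      # grammatical label
}
SENSE_CLASSES = {"j", "j2"}       # paragraphs assumed to open a new sense


class EntryConverter(HTMLParser):
    """Step 6: turn the linear HTML text of one entry into an XML tree."""

    def __init__(self):
        super().__init__()
        self.entry = ET.Element("ENTRY")
        self._sense = None        # SENSE element being filled, if any
        self._target = None       # mapped element currently receiving text
        self._target_tag = None   # HTML tag that opened the target
        self._def_parts = []      # unmarked sense text, collected for DEF

    def handle_starttag(self, tag, attrs):
        css = dict(attrs).get("class") or ""
        if tag == "p" and css in SENSE_CLASSES:
            self._sense = ET.SubElement(self.entry, "SENSE")
            self._def_parts = []
            return
        name = MARKER_TO_ELEMENT.get((tag, css))
        if name is not None:
            parent = self._sense if self._sense is not None else self.entry
            self._target = ET.SubElement(parent, name)
            self._target.text = ""
            self._target_tag = tag

    def handle_data(self, data):
        if self._target is not None:
            self._target.text += data
        elif self._sense is not None:
            self._def_parts.append(data)

    def handle_endtag(self, tag):
        if self._target is not None and tag == self._target_tag:
            self._target.text = " ".join(self._target.text.split())
            self._target = None
        elif tag == "p" and self._sense is not None:
            definition = ET.SubElement(self._sense, "DEF")
            definition.text = " ".join("".join(self._def_parts).split())
            self._sense = None


SAMPLE = (
    '<article id="088zJNJ">'
    '<header class="f">abombar<sup>2</sup></header>'
    '<p class="n2">De <em>bomba</em></p>'
    '<p class="j" id="04CwhyP"><span class="n_acep">1. </span>'
    '<abbr class="d" title="verbo transitivo">tr.</abbr> '
    '<mark data-id="BrtRK35">Dar</mark> <mark data-id="IEvo12v|IFIVvz0">forma</mark> '
    '<mark data-id="AgxvK91">convexa</mark>.</p>'
    '</article>'
)

converter = EntryConverter()
converter.feed(SAMPLE)
converter.close()
XML_TEXT = ET.tostring(converter.entry, encoding="unicode")
```
]]></preformat>
      <p>Keeping the correspondence in a table rather than in the parsing logic means that a newly discovered HTML marker can be handled by adding one row, which is convenient while the conceptual model is still being refined.</p>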
      <p>In its dictionary-making practice, ULIF distinguishes between a formal model, which is a
conceptual representation of a lexicographic system, and an encoding scheme or database that
implements it. The form and content of lexical information are considered in the abstract, regardless
of the conditions and limitations placed on their ultimate representation. This is
particularly important since the possible representations vary from one application to another:
dictionaries may be encoded not only for publication in print or electronic
form, but also to create computational lexicons for use in natural language processing applications.
Therefore, it is currently preferable to use the XML format to represent the conceptual model of a
lexicographic system, which can later be transformed into many alternative formats.</p>
      <p>XML (Extensible Markup Language) is a standard recommended by the World Wide Web Consortium
(W3C) for building markup languages for hierarchically structured data exchanged between
different applications, in particular via the Internet. It is a simplified subset of the SGML markup
language (SGML was used to mark up the text of the Oxford English Dictionary when it was
digitized). An XML document consists of text characters and is human-readable. The format is
flexible enough to be used in a variety of industries. In other words, the standard defines
a meta-language from which specific, subject-oriented data markup languages are derived by
imposing constraints on the structure and content of documents.</p>
      <p>At the beginning of its lexicographic activity, ULIF used a technology that converted files in .rtf
markup (publisher format) directly into a database whose schema was based on a lexicographic data
model. The parsers for the Dictionary of the Spanish Language and the Dictionary of Ukrainian
Biological Terminology use the XML format instead, which makes it possible to abandon the format
imposed by the printed representation of the data and to capture the structure of the conceptual
model effectively. HTML (Hypertext Markup Language), now in its HTML5 version, was originally used
for visual text markup; today it is comparable to XML in its capabilities. XML can therefore be
effectively converted into HTML for visual representation on the Web, where HTML serves as
the primary format for presenting information. PDF is a publishing format supported by all
publishing platforms; it is an electronic version of a typeset text
prepared for printing. Regardless of the type of data representation, every lexicographic system
needs a document describing its conceptual data model, and a document in .doc, .docx, or .pdf
format should serve as the standard representation of this model.</p>
      <p>Lexicographic systems are well-structured linguistic data represented in the communicative
formats of the digital environment. Although their traditional function as reference and
information systems remains relevant, natural language processing (NLP) tasks, above all semantic
ones, are becoming more and more important. Dictionary interfaces were developed in the “paper
environment” and focus on providing access to a particular dictionary entry through a specific form
of a linguistic unit, usually a headword. This strategy has mostly been carried over into digital
reproductions, so that any deeper processing of linguistic information rests solely on the user's
own linguistic competence. Because the structure of a dictionary entry can be adequately
represented in the digital environment, research on the linguistic data of a lexicographic system
can be brought to the level of algorithms: the user gains access to arrays of linguistic data selected
by a given set of parameters rather than to individual entries. Parsing dictionaries and storing
the information in the communication formats of the digital environment opens up great possibilities
for the integration of lexicographic systems.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Experiment</title>
      <p>The interface of the online version of DLE 23 takes into account the advantages that the
digital environment offers the user. First of all, this concerns the visualization of dictionary
entries, which is shown in the examples below.</p>
      <p>Example 1. Dictionary entry visualization in a printed format.</p>
      <p>abombar2 De bomba. 1. tr. Dar forma convexa. ○ intr. 2. Dar a la bomba. ○ prnl. 3. Dicho de una
cosa: Tomar forma convexa.</p>
      <sec id="sec-4-1">
        <title>Example 2. Dictionary entry visualization in online version.</title>
        <p>abombar2
De bomba.
1. tr. Dar forma convexa.
2. intr. Dar a la bomba.
3. prnl. Dicho de una cosa: Tomar forma convexa.</p>
        <p>Example 3. Dictionary entry visualization in HTML (only part of this text is visible on the
screen; the rest is hidden text that controls the visualization of the dictionary
entry).</p>
        <preformat>&lt;article id="088zJNJ"&gt;
  &lt;header title="Definición de abombar" class="f"&gt;abombar&lt;sup&gt;2&lt;/sup&gt;&lt;/header&gt;
  &lt;a class="e2" title="Conjugar el verbo abombar2" href="#conjugacioncbD9rJq"&gt;&lt;/a&gt;
  &lt;p class="n2"&gt;De &lt;em&gt;bomba&lt;/em&gt;&lt;/p&gt;
  &lt;p class="j" id="04CwhyP"&gt;&lt;span class="n_acep"&gt;1. &lt;/span&gt;&lt;abbr class="d" title="verbo transitivo"&gt;tr.&lt;/abbr&gt;
    &lt;mark data-id="BrtRK35"&gt;Dar&lt;/mark&gt;
    &lt;mark data-id="IEvo12v|IFIVvz0"&gt;forma&lt;/mark&gt;
    &lt;mark data-id="AgxvK91"&gt;convexa&lt;/mark&gt;.&lt;/p&gt;
  &lt;p class="j2" id="04E67HH"&gt;&lt;span class="n_acep"&gt;2. &lt;/span&gt;&lt;abbr class="d" title="verbo intransitivo"&gt;intr.&lt;/abbr&gt;
    &lt;mark data-id="BrtRK35"&gt;Dar&lt;/mark&gt;
    &lt;mark data-id="002rZ9U|003Ov93"&gt;a&lt;/mark&gt;
    &lt;mark data-id="ESraxkH|MiZ5vEt|NWnohQu"&gt;la&lt;/mark&gt;
    &lt;mark data-id="5pINrRS|5prGcPu"&gt;bomba&lt;/mark&gt;.&lt;/p&gt;
  &lt;p class="j2" id="04EVDwI"&gt;&lt;span class="n_acep"&gt;3. &lt;/span&gt;&lt;abbr class="d" title="verbo pronominal"&gt;prnl.&lt;/abbr&gt;
    &lt;mark data-id="BxLriBU|DgXmXNM"&gt;Dicho&lt;/mark&gt;
    &lt;mark data-id="BtDkacL|BtFYznp"&gt;de&lt;/mark&gt;
    &lt;mark data-id="b67JJSq|b6hEWeB|b6iKApr"&gt;una&lt;/mark&gt;
    &lt;mark data-id="B3yTydM|B4tWyfU"&gt;cosa&lt;/mark&gt;:
    &lt;mark data-id="ZzcN8W0"&gt;Tomar&lt;/mark&gt;
    &lt;mark data-id="IEvo12v|IFIVvz0"&gt;forma&lt;/mark&gt;
    &lt;mark data-id="AgxvK91"&gt;convexa&lt;/mark&gt;.&lt;/p&gt;
&lt;/article&gt;</preformat>
        <p>As these examples show, the transparency of the dictionary entry structure for the
user is achieved through the complexity of the hidden text. Based on the analysis of the texts of the online
versions of DLE 23 entries, we identified the following parameters of the left part: RR (lemma forms),
DUPL (regional variant), ETYM (etymology), MORPHO (inflection), ORTHO (spelling), and UNCRT
(undefined parameter). Each parameter is represented in our model as a text string. The right part
consists of elements describing the lexical meaning. The polysemy of a headword is determined by
the number of these descriptions. Each description may include several structural elements, namely:
MNGN (definition number), REM (set of marks), DEF (definition), ED (encyclopedic reference), COM
(comment), and IL (illustration). The REM text string can be split into smaller fragments, each of
which contains a label of a specific type: REM-GR (grammar); REM-US (usage); REM-ST (style);
REM-DOM (domain); REM-REG (geographic region). As a rule, a lexical meaning in the input text is
described by a DEF structural element, which may be accompanied by additional comments (COM). Each
definition and comment can be accompanied by its own illustrations (IL). A single description
may include several DEFs, COMs, and ILs. An example of the decomposition of the entry text for the
headword abombar is shown in Table 1.</p>
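        <p>The model described above can be written down as a pair of data classes. The following Python sketch is only one possible rendering under stated assumptions: the class and field names are chosen for the example, the label lookup table is hypothetical and covers only the three grammatical labels that occur in the abombar entry, and the values are taken from Examples 1 to 3.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sense:
    """One description of a lexical meaning (an element of the right part)."""
    mngn: str                                               # MNGN: definition number
    rem: str = ""                                           # REM: raw string of marks
    definitions: List[str] = field(default_factory=list)    # DEF
    comments: List[str] = field(default_factory=list)       # COM
    illustrations: List[str] = field(default_factory=list)  # IL
    ed: Optional[str] = None                                # ED: encyclopedic reference


@dataclass
class Entry:
    """A dictionary entry: left-part parameters as text strings, plus the senses."""
    rr: str           # RR: lemma forms
    dupl: str = ""    # DUPL: regional variant
    etym: str = ""    # ETYM: etymology
    morpho: str = ""  # MORPHO: inflection
    ortho: str = ""   # ORTHO: spelling
    uncrt: str = ""   # UNCRT: undefined parameter
    senses: List[Sense] = field(default_factory=list)

    @property
    def polysemy(self) -> int:
        """Polysemy is measured by the number of meaning descriptions."""
        return len(self.senses)


# Hypothetical lookup table: label text -> label type.
LABEL_TYPES = {"tr.": "REM-GR", "intr.": "REM-GR", "prnl.": "REM-GR"}


def split_rem(rem: str) -> list:
    """Split a REM string into (type, label) pairs; unknown labels get type None."""
    return [(LABEL_TYPES.get(label), label) for label in rem.split()]


ABOMBAR = Entry(
    rr="abombar2",
    etym="De bomba.",
    senses=[
        Sense("1.", "tr.", ["Dar forma convexa."]),
        Sense("2.", "intr.", ["Dar a la bomba."]),
        Sense("3.", "prnl.", ["Dicho de una cosa: Tomar forma convexa."]),
    ],
)
```
]]></preformat>
        <p>With this representation, ABOMBAR.polysemy gives the number of meanings of the headword, and split_rem applied to the REM string of a sense yields its typed labels.</p>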
        <p>After XML-mapping of DLE 23 dictionary entries by the above-described principle, the next step
was to create a lexicographic database. In our experience, relational databases have proven to be
inefficient for lexicographic systems. In the case of relational databases, data is stored implicitly as a
set of several tables and relationships between them. Working with individual tables as a single
object requires the creation of a powerful software infrastructure. In addition, the evolutionary
potential of such a digital object is limited by the opacity of the database.</p>
        <p>Because dictionary entries are the fundamental components of a lexicographic system and have a
precisely specified structure, it makes sense to represent them as classes in an object-oriented
programming language and to process, edit, and store them explicitly as objects. Document-oriented
(NoSQL) databases offer this capability. The primary unit stored
and processed in databases of this kind is a document (object) with a
precisely defined structure, in our case a dictionary entry. For our project, the primary benefit of
NoSQL databases is their capacity to store lexicographic objects explicitly without altering their
internal structure. This allows direct access to every component of the lexicographic object and
makes the object considerably easier to edit and extend. The following criteria guided the selection
of a particular NoSQL database: 1) simplicity of use; 2) support for transactions; 3) support for
parallelism; 4) free availability for scientific use. Consequently, the LiteDB database
(http://www.litedb.org/) was chosen. It is a free, comparatively simple embedded document database
for .NET with an API similar to that of MongoDB. Since LiteDB is
distributed as a single library file (dll) and a single configuration file (xml), rather than as a whole
software package, it also has the benefit of being easy to install and connect.</p>
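        <p>The idea of storing an entry as one document and reaching directly into its components can be shown with a small in-memory stand-in. The sketch below is written in Python and deliberately does not use or imitate the LiteDB API; the class DocumentCollection and its two methods are invented for the illustration, and the field names follow the parameters of the model described above.</p>
        <preformat><![CDATA[
```python
import json


class DocumentCollection:
    """In-memory stand-in for a collection in a document-oriented database."""

    def __init__(self):
        self._documents = {}
        self._next_id = 1

    def insert(self, document: dict) -> int:
        """Store a copy of the whole entry as one document and return its id."""
        doc_id = self._next_id
        self._next_id += 1
        # A JSON round trip copies the nested structure and checks that it
        # can be serialised without flattening it into tables.
        self._documents[doc_id] = json.loads(json.dumps(document))
        return doc_id

    def find(self, predicate):
        """Return every stored document for which predicate(document) is true."""
        return [doc for doc in self._documents.values() if predicate(doc)]


ENTRIES = DocumentCollection()
ENTRIES.insert({
    "RR": "abombar2",
    "ETYM": "De bomba.",
    "senses": [
        {"MNGN": "1.", "REM": "tr.", "DEF": ["Dar forma convexa."]},
        {"MNGN": "2.", "REM": "intr.", "DEF": ["Dar a la bomba."]},
        {"MNGN": "3.", "REM": "prnl.", "DEF": ["Dicho de una cosa: Tomar forma convexa."]},
    ],
})

# Direct access to a nested component: entries that have a pronominal sense.
PRONOMINAL = ENTRIES.find(
    lambda doc: any(sense["REM"] == "prnl." for sense in doc["senses"])
)
```
]]></preformat>
        <p>The query addresses the senses inside the stored object itself, whereas a relational design would first have to reassemble the entry from several tables.</p>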
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>To construct a prototype version of the virtual lexicographic laboratory (VLL), the dictionary
entries were additionally parameterized. Every headword is assigned a set of parameters: 1) headword
variations; 2) headword structure; 3) headword type; 4) homonymy; 5) number of meanings; and
6) number of word combinations. Every parameter was determined from the HTML text. Searching for
entries by combinations of these parameters helps to clarify the structure of a dictionary entry. The
final task was to create an online application for interacting with the VLL DLE database. The
application was developed on the .NET Core 2.1 platform. To make
modifying interface elements easier, a collection of HTML, CSS, and Bootstrap JavaScript templates
was used.</p>
      <p>The main window (Fig. 1) consists of the following interface elements: a panel, a headword search
panel, a headword list window, a window for the formatted text of a dictionary entry, and a window
for plain text with HTML markers. The search panel displays all the headwords of the dictionary that
satisfy the selection conditions, in alphabetical order. The list of headwords is divided into
groups of 150 items. The user can use the “Forward” and “Back” buttons or enter a page number
in the text field to go to the corresponding part of the list. The formatted dictionary entry is
displayed in the same way as in the original online version of the dictionary. The HTML text of the
dictionary entry is used for full-text search (the search string may contain HTML markers and
metalanguage elements). The DLE 23 VLL interface provides the following modes of working with the
dictionary: a) headword list; b) dictionary entries; c) full-text search.</p>
      <p>Headword list mode offers search filters: the user can enter a string of characters with which
the headword starts or ends, or choose a headword by clicking on it in the headword list.
Search filters help the user find a word more quickly when its spelling is uncertain.
A virtual keyboard makes it easier to enter diacritical characters. The headword
list comprises all word forms of the headwords (this is relevant only for nouns and adjectives). Unlike
the original DLE 23, the VLL DLE 23 list is supplemented with feminine forms and regional variants.
Regardless of the operating mode, the number of dictionary entries is always
displayed. The current version works with 106323 entries.</p>
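      <p>The headword filters and the division of the list into groups of 150 items amount to simple list operations. The Python sketch below illustrates them under stated assumptions: the function names are invented, the short word list is a sample and not the VLL headword list, and the list is assumed to be sorted already.</p>
      <preformat><![CDATA[
```python
PAGE_SIZE = 150  # the headword list is shown in groups of 150 items


def filter_headwords(headwords, text, mode="starts"):
    """Keep headwords that start (mode="starts") or end (mode="ends") with text."""
    if mode == "starts":
        return [word for word in headwords if word.startswith(text)]
    if mode == "ends":
        return [word for word in headwords if word.endswith(text)]
    raise ValueError("mode must be 'starts' or 'ends'")


def page_count(total, size=PAGE_SIZE):
    """Number of pages needed for total items (one page even for an empty list)."""
    return max(1, (total + size - 1) // size)


def get_page(headwords, number, size=PAGE_SIZE):
    """Return page `number` (1-based), as entering a page number would."""
    if not 1 <= number <= page_count(len(headwords), size):
        raise ValueError("page number out of range")
    start = (number - 1) * size
    return headwords[start:start + size]


# Small sample list, already in alphabetical order.
HEADWORDS = ["abombar", "abonar", "bomba", "cosa", "dar", "forma", "tomar"]
STARTS_WITH_ABO = filter_headwords(HEADWORDS, "abo")
ENDS_WITH_AR = filter_headwords(HEADWORDS, "ar", mode="ends")
```
]]></preformat>
      <p>The “Forward” and “Back” buttons then correspond to calling get_page with the current page number increased or decreased by one.</p>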
      <p>The dictionary entry mode in the current version of VLL DLE 23 is intended for selecting
dictionary entries whose structural elements meet given parameters. The mode is
activated by clicking the “Selection” menu, after which a dialog box appears. The dialog box has two
tabs: “Headword parameters” and “Explanatory part parameters” (Fig. 2).</p>
      <p>Samples in the VLL DLE 23 can be considered as sub-dictionaries. The available tools allow you
to create the following dictionaries:</p>
      <p>The full-text search mode is effective when dictionary entries need to be selected by certain
meta-language elements of DLE 23: various labels, symbols that make up additional comments (“U.
t. c. s.”), meta-language markers (“Orth.”, “Conjug. c.”), etc. In addition, the search string
entered in the text field can contain both the text of a dictionary entry and HTML code elements
(&lt;abbr title="Usado solo en sentido figurativo"&gt;). In the current version of the VLL, full-text
search combined with the selection tool is a very powerful instrument for linguistic research.
Figure 3 shows an example of a selection of entries whose headwords are mostly used in a figurative
meaning.</p>
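      <p>Full-text search over the HTML text of entries, optionally restricted to a previously made selection, can be sketched as follows. The Python code is an illustration only: the function name and the mapping from headword to HTML text are assumptions, the first sample is a fragment of the abombar entry, and the second sample entry is invented so that the search has something to reject.</p>
      <preformat><![CDATA[
```python
def full_text_search(entries, query, selection=None):
    """Return headwords whose HTML text contains query.

    entries maps a headword to the HTML text of its entry; selection, if
    given, is a set of headwords produced by the selection tool and
    restricts the search to those entries.
    """
    return [
        headword
        for headword, html in entries.items()
        if query in html and (selection is None or headword in selection)
    ]


ENTRIES = {
    # Fragment of the abombar entry (third sense).
    "abombar2": '<p class="j2"><span class="n_acep">3. </span>'
                '<abbr class="d" title="verbo pronominal">prnl.</abbr> '
                'Dicho de una cosa: Tomar forma convexa.</p>',
    # Invented entry, present only for contrast.
    "ejemplo": '<p class="j"><span class="n_acep">1. </span>'
               '<abbr class="d" title="nombre masculino">m.</abbr> '
               'Texto inventado.</p>',
}

BY_MARKUP = full_text_search(ENTRIES, 'title="verbo pronominal"')
BY_LABEL = full_text_search(ENTRIES, "prnl.")
RESTRICTED = full_text_search(ENTRIES, "n_acep", selection={"ejemplo"})
```
]]></preformat>
      <p>Because the query is matched against the stored HTML rather than the visible text, both a label such as “prnl.” and a fragment of markup can serve as search strings.</p>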
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>In today’s world, information is becoming one of the most valuable resources, and this
includes linguistic information. The most authoritative sources of such information are
explanatory dictionaries of national languages. The linguistic paradigm shift makes these
dictionaries priority sources for extracting linguistic knowledge. Traditional formats of
lexicographic data representation are not suitable for deep analysis. In addition, the problem of data
visualization, that is, presenting the results of data analysis in an understandable and visual
way, remains open.</p>
      <p>An explanatory dictionary is one of the most complex lexicographic products. Our research is
based on a lexicographic data model, which performs the following functions: it sets the algorithm for
parsing the text of a dictionary entry, and it is used to form both the XML schema and the database
schema. We believe that the presentation of lexicographic information in XML and document-oriented
database formats will be effective for the further development of data analysis. Such a database is
the basis for the VLL, which in our case serves as a tool for data analysis and visualization. Today,
the VLL performs the following functions:
1. Inventory of headwords that meet the set parameters (specific word, foreign word,
morpheme, abbreviation, phrase, homonymy, polysemy).
2. Study of the linguistic features of headwords in the text of a dictionary entry, which makes
it possible to identify regularities of the Spanish language that are implicit in the dictionary.
3. Statistical studies that demonstrate the frequencies of the studied linguistic phenomena (for
example, the ratio of native to borrowed vocabulary).</p>
      <p>Based on these studies, the user can draw conclusions about the lexico-semantic,
etymological, grammatical and pragmatic features of Spanish language units. We plan to expand
the toolkit to provide access to any structural element of a dictionary entry by various parameters
and to support the output of lexicographic data in XML format.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Kelleher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Tierney</surname>
          </string-name>
          ,
          <source>Data Science</source>
          , The MIT Press, Cambridge, MA,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Cox</surname>
          </string-name>
          ,
          <article-title>The Model Thinker: What You Need to Know to Make Data Work for You by Scott E. Page</article-title>
          , Basic Books, New York, NY,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Mitkov</surname>
          </string-name>
          ,
          <source>The Oxford Handbook of Computational Linguistics</source>
          , 2nd ed., Oxford Handbooks, Oxford,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>V.</given-names>
            <surname>Shyrokov</surname>
          </string-name>
          ,
          <source>Language. Information. System</source>
          , Akademperiodyka, Kyiv,
          <year>2021</year>
          . doi:10.15407/academperiodyka.451.160.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Casares</surname>
          </string-name>
          ,
          <source>Nuevo concepto del diccionario</source>
          , Editorial CSIC, Madrid,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <article-title>Diccionario de la lengua española</article-title>
          . URL: https://dle.rae.es/.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <source>Oxford English Dictionary (OED)</source>
          . URL: https://www.oed.com/.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <source>Dictionary of Ukrainian in 20 volumes (SUM-20), Volumes 1-15</source>
          , Ukrainian Lingua-Information Fund of NAS of Ukraine, Kyiv. URL: https://sum20ua.com/.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Trap-Jensen</surname>
          </string-name>
          ,
          <article-title>Lexicography between NLP and linguistics: aspects of theory and practice</article-title>
          , in: J. Čibej, V. Gorjanc, I. Kosem, S. Krek (Eds.),
          <source>Lexicography in Global Contexts, Proceedings of the 18th EURALEX International Congress</source>
          , 17-21 July, Ljubljana,
          <year>2018</year>
          , pp.
          <fpage>25</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F. Z.</given-names>
            <surname>Belkadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Esbai</surname>
          </string-name>
          ,
          <article-title>A Model-Driven Engineering: From Relational Database to Document-oriented Database in Big Data Context</article-title>
          , in:
          <source>Proceedings of the 16th International Conference on Software Technologies, ICSOFT</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>653</fpage>
          -
          <lpage>659</lpage>
          . doi:10.5220/0010604906530659.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>