<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Software Environment for Multi-aspect Study of Lexical Characteristics of Text</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Elena Sidorova</string-name>
          <email>lsidorova@iis.nsk.su</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Irina Akhmadeeva</string-name>
          <email>i.r.akhmadeeva@iis.nsk.su</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>A.P. Ershov Institute of Informatics Systems, Siberian Branch of the Russian Academy of Sciences</institution>
          ,
          <addr-line>Acad. Lavrentjev avenue 6, 630090 Novosibirsk</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <fpage>306</fpage>
      <lpage>315</lpage>
      <abstract>
<p>A software environment for the multi-aspect study of the lexical characteristics of text is presented. The proposed environment provides tools for automatically building a dictionary from a text corpus of interest. The created toolkit is focused on lexical units acting as markers and indicators of higher-level objects. The environment allows various text analysis tasks to be solved because it integrates tools for conducting language research and supports customization of vocabularies to a problem area. The toolkit includes interfaces for developing vocabularies and a system of features. To study the contexts in which terms are used, concordance construction tools are provided; concordances allow the researcher to test hypotheses about the functionality of a particular lexical unit. To describe more complex constructions to be extracted, the user can apply search patterns written in a user-friendly language. These patterns allow the development of lexicographic resources containing not only traditional vocabularies and stable inseparable lexical phrases, but also language constructs with a more complex structure.</p>
      </abstract>
      <kwd-group>
        <kwd>domain vocabulary</kwd>
        <kwd>terminology</kwd>
        <kwd>concordance</kwd>
        <kwd>search pattern</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
<p>Plain text, being both a source of information and one of the most important means of communication, needs to be studied thoroughly. This is necessary both for evaluating the “quality” of what has been written and for automatic text processing, along with supporting information retrieval services. Studying language phenomena and modeling the text understanding processes taking place at different language levels are the focus of contemporary research in computational linguistics.</p>
      <p>
        To address these problems, it is usual to apply various kinds of knowledge written in a formalized form: widely known thesauri such as WordNet and RusNet, explanatory combinatorial dictionaries, annotated text corpora (for example, the Russian National Corpus, www.ruscorpora.ru), and other resources. Serving as an instrument for describing a subject vocabulary, a thesaurus allows us to characterize terms and their connections from the point of view of the peculiarities of their use in the subject domain [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Another way of studying linguistic phenomena is to use text corpora. A text corpus is both the source and the tool of multi-aspect lexicographic work [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Specialized methods, such as frequency analysis of a vocabulary over the corpus or the construction of concordances on various grounds, can help automate the work of experts on researching formal structures, the initial filling of dictionaries, and the construction of linguistic models on the basis of an annotated corpus. Despite the widely demanded functionality, there are no known analogues of a specialized set of customizable components that integrate lexicographic research methods for Russian and provide semantic markup of terms, statistical analysis, and construction of concordances. For other languages, similar functionality is provided by platforms such as GATE (https://gate.ac.uk) and the CLARIN portal (www.clarin.eu), which collect components developed by various research groups for different languages and propose a method of integrating components into a processing chain.
      </p>
      <p>
        A literature overview [
        <xref ref-type="bibr" rid="ref3 ref4 ref5">3–5</xref>
        ] shows that researchers faced with the task of extracting terminology from a large text collection usually combine linguistic and statistical methods. To extract lists of candidate terms that satisfy specified linguistic conditions, search patterns describing classes of language expressions are used. Depending on the type of language information taken into account, the patterns used in various works are divided into grammatical, lexico-grammatical [
        <xref ref-type="bibr" rid="ref3 ref5">3, 5</xref>
        ] and lexico-syntactic [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ] patterns. Extraction of candidate terms is accompanied by the calculation of statistics and weights for filtering and sorting the result list. The list of candidates includes not only special concepts established in the field, but also numerous general scientific, peripheral, and author-specific terms that, as shown in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], are characterized by a high degree of variation of the language form. In this situation, an expert assessment stage is needed, at which the ranked lists are presented to an expert who selects the true terms.
      </p>
      <p>This paper describes various supporting tools for the corpus-based study of the lexical characteristics of text. Combining the proposed tools allowed us to develop an environment for creating problem-oriented vocabularies and to provide the end user with various possibilities for studying language phenomena.</p>
    </sec>
    <sec id="sec-2">
      <title>Requirements for the Text Research Support Environment</title>
      <p>The development of linguistic models and the creation of resources of sufficient quality
require scrupulous manual labor, supported by software tools. The software
environment should provide the expert with various tools to create necessary knowledge bases
and carry out case studies.</p>
      <p>We formulate the requirements for the system for multi-aspect study of lexical
characteristics of text as follows:
1. The system should be able to automatically fill vocabularies based on text
corpora;
2. The system user should be able to customize and add various attributes for
vocabulary terms;
3. The system should be able to carry out lexical analysis (segmenting a text, and
extracting terms that are presented in the vocabulary);
4. The system should keep statistical and combinatorial properties of language
phenomena found in texts;
5. The system should be able to build a concordance of terms and provide the user
with corresponding visualization tools.</p>
      <p>We developed a system that includes the following basic research tools (Fig. 1): an interface for developing a dictionary and creating a group of features, tools for automatically generating the lexical content of a dictionary from a corpus of texts and calculating the quantitative characteristics of the found terms, and concordance construction tools for studying the contexts of lexical units.</p>
      <p>[Fig. 1. Architecture of the environment: methods of corpora research (concordance creation, accumulation of statistics, identification of characteristics, calculation of frequency characteristics), text analysis stages (segmentation, morphological analysis, multiword term extraction, search pattern generation), and the knowledge base (lexico-semantic dictionary, morphological model, rules of agreement, system of features).]</p>
      <p>The considered lexicographic knowledge model includes three main components. The dictionary defines the lexical model of the sublanguage under consideration, which is determined by the problem area. The grammar provides search and retrieval of lexical units from texts. The set of user-defined, pragmatically oriented features supports the recording of observations and is focused on further support of automated text processing methods.</p>
      <p>A representative problem-oriented corpus of texts lies at the basis of the research. The main tools providing research support are the following:
• searching for examples of the use of vocabulary terms;
• building a variety of contexts (concordances);
• calculating frequencies, co-occurrences, distributions, etc.</p>
      <sec id="sec-2-1">
        <title>Lexical Model</title>
        <p>In our approach, the dictionary entry contains all information necessary either for extracting terms from text or for supporting the subsequent stages of text analysis.</p>
        <p>A problem-oriented dictionary is a volume of vocabulary organized according to a
semantic (thematic/genre/etc.) principle, considering a certain set of basic formal
relationships. Formally, the dictionary is defined as a system of the form:</p>
        <p>V = {W, P, M, G, S, F<sub>w</sub>, F<sub>p</sub>},
where W is a set of lexemes, each lexeme being mapped to the entire set of its lexical forms; P is a set of multiword terms, each defined as a pair &lt;N-gram, structure type&gt;. The N-gram specifies a sequence of lexemes, and the structure type defines the head of the phrase and the rules for matching N-gram elements.</p>
        <p>M is a morphological model of language. It defines morphological classes and
features.</p>
        <p>G is a set of agreement rules used to extract multiword terms.
S is a problem-oriented set of features with which terms can be marked.</p>
        <p>F<sub>w</sub>: W → 2<sup>M×S</sup> and F<sub>p</sub>: P → 2<sup>G×S</sup> are functions that map terms to sets of features.</p>
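<p>The model above can be sketched as a handful of data structures. The following Python fragment is a minimal, hypothetical illustration; all class and field names are our own and do not reflect the environment's actual implementation.</p>

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class MultiwordTerm:
    """An element of P: a pair <N-gram, structure type>."""
    ngram: tuple          # sequence of lexeme lemmas, e.g. ("analog", "sensor")
    structure_type: str   # names the head and the agreement rule, e.g. "A+N"

@dataclass
class Dictionary:
    """A toy rendering of V = {W, P, M, G, S, Fw, Fp} (M, G omitted for brevity)."""
    lexemes: dict = field(default_factory=dict)   # W: lemma -> set of lexical forms
    phrases: set = field(default_factory=set)     # P: multiword terms
    features: dict = field(default_factory=dict)  # Fw/Fp: term -> set of feature labels from S

    def add_lexeme(self, lemma, forms):
        self.lexemes[lemma] = set(forms)

    def add_phrase(self, ngram, structure_type):
        self.phrases.add(MultiwordTerm(tuple(ngram), structure_type))

    def mark(self, term, *labels):
        self.features.setdefault(term, set()).update(labels)

v = Dictionary()
v.add_lexeme("sensor", {"sensor", "sensors"})
v.add_phrase(("analog", "sensor"), "A+N")
v.mark("sensor", "domain:engineering")
```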
        <p>The morphological representation the system provides is designed so that it can be customized for the specific problem the user is working on. The user can define his or her own set of features and classes and integrate them into the basic morphological representation. A morphological class is defined by a part of speech, a set of lexical features (for example, animacy or gender for nouns), and a type of paradigm. Changing a class is rarely needed; it becomes necessary, for example, when using additional specialized dictionaries of terms (dictionaries of names or geographical locations) or when words of another language must be included in the dictionary.</p>
        <p>The description of morphological information includes the following concepts:
morphological attribute, class, part of speech, and type of paradigm.</p>
        <p>A morphological attribute is described by its name Ni and the set of its values Xi: &lt;Ni, Xi&gt; (for example, &lt;Gender, {masculine, feminine, neuter}&gt;). Part of speech is also an attribute, but since it must always be present, a separate entity was created for it. Attributes within each class are divided into derivational ones, inherent to all forms of a lexeme of the class, and inflectional ones, distinguishing the forms of one lexeme.</p>
        <p>The paradigm type determines the paradigm's length and matches each element of the paradigm with a set of attribute values (for example, for a “simple” adjective it is a triple &lt;case, number, gender&gt;). Such elements are strictly ordered, which makes it possible to use a compact tree-like representation whose vertices are subsets of the attribute values &lt;Ai, Xi&gt;. A pair of functions f: n → X<sub>i1</sub> × … × X<sub>ik</sub> and g: X<sub>i1</sub> × … × X<sub>ik</sub> → n converts an inflectional paradigm index to a set of attribute values and vice versa. Each lexeme is assigned a paradigm from the paradigm table, and each paradigm is assigned a paradigm type describing its structure.</p>
        <p>The morphological class includes a part of speech, a set of derivational lexical features xij ∈ Xi (for example, animacy or gender for nouns), and a paradigm type describing the attributes of word forms.</p>
        <p>Another important feature of the system is support for multiword terms (phrases) formed by shallow syntactic analysis based on a fixed set of rules. Most multiword terms include from two to four words and are formed using rules of the following types:
• A+N (“аналоговый датчик” which means “analog sensor” in Russian) –
agreement of a noun and an adjective;
• N+Ngent (“автор учебника” which means “textbook author” in Russian) –
agreement of a noun and a noun in the genitive case;
• A+A+N (“новая информационная технология” – “new information
technology”);
• N+Agent+Ngent (“обработка естественного языка” – “natural language
processing”);
• A+N+Ngent (“локальная степень вершины” – “local degree of a vertex”),
• N+Ngent+Ngent (“компонента связности графа” – “connected component of a
graph”) etc.</p>
        <p>There are also terms with a more complex structure, for example, with dependent
prepositional groups:
• N+PREP+N (“резервуар с жидкостью” – “reservoir with liquid”, “рассуждение
по умолчанию” – “default reasoning”);
• N+PREP+N+N (“поиск в пространстве состояний” – “search in the state
space”);
• N+PREP+A+N (“автомат с переменной структурой” – “variable-structure
automata”) etc.</p>
        <p>The system has its own multiword term extraction component for Russian, which, given a set of words and their grammatical characteristics, checks agreement in accordance with one of the syntactic models and synthesizes the normal form of the multiword term. A multiword vocabulary term is uniquely identified by a triple &lt;normal form, rule, lexical structure&gt;. Such a term has a syntactic head (a single-word term), and its grammatical features are formed on the basis of the grammatical features of the head.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Features of Terms</title>
        <p>Depending on the problem being worked out, terms in the dictionary can be supplied
with features of various types: statistical (for solving classification problems), genre
(for text genre analysis), semantic (for semantic analysis), formal (for identifying
markers of certain structures), etc.</p>
        <p>Statistical features keep frequency information: when a text is processed, the statistics of all terms occurring in it are updated. To perform text classification, we need a training corpus, i.e. a corpus annotated with a predefined set of interrelated topics. For each term, the dictionary records how many times it occurred in the training corpus (the absolute frequency) and the number of texts in which it occurred (the text frequency), as well as the list of topics in which the term was found, with absolute and text frequencies for each topic. Some parameters (relative frequency, tf*idf, weight) are calculated dynamically.</p>
        <p>The set of features with which the user marks up dictionary terms depends on the task being addressed, so it is completely user-defined and problem-oriented. To encode various information about a term (semantic, genre, stylistic, etc.), the following facilities are provided.
• Class. A term can belong to one of the classes. A class hierarchy allows the user to assign a term to a certain level of the hierarchy, more general or more specific, inheriting properties from upper classes.
• Attribute. Attributes are used to represent the lexical meaning of a term. By combining a word's semantic attribute values, we can, to a certain extent, model the component semantic structure of the word. The main components of the semantic structure of a term can be considered as thesaurus descriptors.
• Alternative feature sets allow term ambiguity to be expressed.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Working with Text Corpora</title>
      <p>The developed environment consists of vocabulary components and processors that, on the one hand, allow dictionaries to be automatically created, filled, and edited and, on the other hand, use those dictionaries in lexical text analysis. Among its most important features are user support tools such as term sorting, term filtering, text coverage visualization, a concordance constructor, etc.</p>
      <sec id="sec-3-1">
        <title>Corpus-based Vocabulary Learning</title>
        <p>The terminology extraction process consists of the following steps: a) text tokenization; b) lexical and morphological analysis (lemmatization, extraction of lexical and grammatical features, normalization); c) extraction of phrases that “look like” terms (term-likeness being based on predefined grammatical models); d) updating the statistics of the found terms.</p>
        <p>Following are the modules that are used for dictionary construction.</p>
        <p>The morphological analysis is carried out on the basis of the Dialing module (www.aot.ru), which contains a dictionary of general Russian terms. This module supports dictionary lookup of words along with their grammatical features and normal forms. It also provides an additional feature called the predictor, which, for any word that is not in the dictionary, makes assumptions about its part of speech, normal form, and other features. The predictor can make up to three assumptions for a single term.</p>
        <p>The multiword term extractor is applied to recognize phrases in accordance with a
fixed set of grammatical rules. The main objective of the module is to identify the most
important term-forming syntactic groups, most of which are nominal groups or are
based on them.</p>
        <p>By using the aforementioned modules to process a text corpus, we end up with a resulting dictionary and frequency statistics of its terms. If special features were marked in the corpus, the corresponding terms are treated as having those features, and statistics are also kept with regard to the features.</p>
        <p>The proposed environment thereby provides tools and features allowing a draft dictionary to be built from scratch, automatically, from a text corpus of interest. Further research can then be carried out on the basis of such a dictionary.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Concordance</title>
        <p>A concordance is the traditional way of studying a corpus of texts. It contains a
complete index of terms that share context with the selected one. The sizes of contexts may
vary. Concordances allow the researcher to test his or her hypothesis about the
functionality of a particular lexical unit. It could be said that a concordance connects
dictionary terms with the text corpus, and serves as a linguistic markup at the
morphological and shallow syntactic level.</p>
        <p>The concordance construction tool implemented in the environment works with text files. The user can customize the size of the text fragment viewed as the context of a term entry (Fig. 2). The example concordance in Fig. 2, built for the word “например” (“for example” in Russian), includes 144 occurrences from the text corpus and shows how the context can be expanded word by word, or how one or more paragraphs around the selected term entry can be brought into view. In this example, the research purpose was to test a hypothesis about the use of this term in the argument from expert opinion.</p>
        <p>
          In general, this kind of research allows the user to identify more complex language constructs that ensure the precision and recall of the information extraction process, and to identify additional features based on them. To describe the constructions to be extracted, the user can apply search patterns based on regular expressions and supported by a user-friendly language [
          <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>Search Patterns</title>
        <p>In our studies, we have been using different types of patterns and tools that support
automatic text processing. In each case the toolkit was chosen based on a problem area
and methods used to solve the target problem.</p>
        <p>
          For example, in the project targeting the problem of filtering out prohibited content [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], in addition to being marked with thematic, genre, and lexico-semantic features obtained from the vocabulary, texts were processed with special patterns, each of which described constructions specific to a particular Internet genre [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. These patterns significantly improved the accuracy of the genre classification.
        </p>
        <p>Taking a closer look, a pattern for detecting a block containing personal information on a website can be represented as follows:</p>
        <p>_profile:[ “личный кабинет”][“профиль”][“аккаунт”][“о себе"][“личный
профиль”]</p>
        <p>//_profile: [“personal account”] [“profile”] [“account”] [“about me”]
[“personal profile”]</p>
        <p>Profile Description / Contacts: [&lt;_ profile, all_h&gt;]</p>
        <p>In this case, the _profile pattern is defined by a set of alternative terms. If any of these terms appears as part of a header at any level (as indicated in the second pattern), we can classify a text block as one containing user profile information. Patterns defined like the one in the example belong to logical-combinatorial lexical patterns.</p>
        <p>
          In another project, our goal was to extract information from technical documentation. We built a glossary of terms with semantic subject-oriented markup and applied search patterns to extract parametric information, which is often represented by numerical and symbolic notations and abbreviations. The patterns used are defined as follows:
class: ‘Object ACS’, template: ‘ACS TP’, type: ‘base’
[АСУ] = АСУ{ТП}; автоматизированн{…} систем{…} управления
[ACS] = ACS{TP}; automatic control {...} system
Patterns of this type are called lexical-semantic patterns [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
        <p>Finally, in one more project we target the philosophical problem of argument analysis. We built a dictionary of markers of argumentative structures on the basis of an annotated text corpus. Applying patterns allows us to represent domain-specific constructions, which may consist of more than one part, separated by gaps.</p>
        <p>DSC = [begin: DS, w / &lt;speech&gt; &lt;Verb, past | present&gt;, Expert &lt;N, им&gt;, end: ES]
quote_l = [“|«]
quote_r = [”|»]
DS = [begin: quote_l, end: quote_r] // direct speech</p>
        <p>Thus, in the experiment on the extraction of arguments from expert opinion, the search accuracy using patterns was 86.5%. Based on the above, we can conclude that using our search patterns allows us to develop lexicographic resources containing not only traditional vocabularies and stable inseparable lexical phrases, but also language constructs with a more complex structure.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>This paper has described approaches and methods for the development of lexicographic resources and for conducting studies on text corpora in order to ensure the completeness and reliability of the models being developed. The created toolkit is focused on lexical units acting as markers and indicators of higher-level objects (semantic, pragmatic, structural-genre, logical-argumentative, etc.).</p>
      <p>The considered software environment integrates the basic tools required to conduct research on the lexical characteristics of text, ensuring a full cycle of the expert's work. The environment offers wide possibilities for tuning parameters, ranging from grammatical categories, lexico-semantic characteristics, and classification parameters to specific search patterns that drive the search for contexts and the construction of concordances. Practical use of this software in various research projects has shown its usability, the relevance of its functionality, and its adaptability to different tasks.</p>
      <p>Consequently, the distinctive features of the system are:
• the possibility of multipurpose use in various text analysis tasks, such as text classification, information extraction, lexicographic research of a text corpus, genre analysis, etc.;
• the integration of various tools for conducting language research within the same environment, with customization of vocabularies to a problem area: concordances, corpus-based statistical study, support for semantic markup of lexical units, and a rich set of search and filtering tools.</p>
      <p>The environment supports a rich lexical model that integrates various models of representation of lexical units and language constructs:
1. an expandable and customizable morphological model (in contrast to the well-known morphological analyzers aot, pymorphy, mystem, etc.);
2. grammar models for Russian phrase extraction, with the possibility of using them selectively;
3. search patterns that integrate semantic, grammatical, lexical, and symbolic representations based on logical operations.</p>
      <p>Further improvements of the system may lie in developing corpus-based research tools, such as constructing concordances for the joint occurrence of terms or using conditions on the presence/absence of feature sets in search queries. It is also planned to enhance the reusability of research results by storing the data in standard XML-based formats (TEI, OWL).
The work was carried out with the financial support of the Russian Foundation for Basic Research (grants 19-07-00762, 17-07-01600, and 18-00-01376 (18-00-00889)).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Loukachevitch</surname>
            ,
            <given-names>N.V.</given-names>
          </string-name>
          :
          <article-title>Thesauri in information retrieval tasks</article-title>
          .
          <source>MSU Publ., Moscow</source>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Sinclair</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <source>Corpus, Concordance, Collocation</source>
          . Oxford University Press, Oxford (
          <year>1991</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Zakharov</surname>
            ,
            <given-names>V.P.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Khokhlova</surname>
            <given-names>M.V.</given-names>
          </string-name>
          :
          <article-title>Automatic extracting of terminological phrases</article-title>
          .
          <source>Structural and Applied linguistics</source>
          <volume>10</volume>
          ,
          <fpage>182</fpage>
          -
          <lpage>200</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bolshakova</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loukachevitch</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Nokel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Topic models can improve domain term extraction</article-title>
          .
          <source>In: International conference on Information Retrieval ECIR-2013</source>
          , pp.
          <fpage>684</fpage>
          -
          <lpage>687</lpage>
          . Springer Verlag, (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Mitrofanova</surname>
            ,
            <given-names>O.A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Zaharov</surname>
            <given-names>V.P.</given-names>
          </string-name>
          :
          <article-title>Automatic extracting terminological phrases</article-title>
          .
          <source>In: Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialog-2009”</source>
          , pp.
          <fpage>321</fpage>
          -
          <lpage>328</lpage>
          . Moscow (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Sokirko</surname>
            ,
            <given-names>A.V.</given-names>
          </string-name>
          :
          <article-title>Morphological modules on the site www.aot.ru</article-title>
          .
          <source>In: Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialog-2004”</source>
          , pp.
          <fpage>559</fpage>
          -
          <lpage>564</lpage>
          . Nauka Publ, Moscow (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Bol'shakova</surname>
            ,
            <given-names>E.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baeva</surname>
            ,
            <given-names>N.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bordachenkova</surname>
            ,
            <given-names>E.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vasil'eva</surname>
            ,
            <given-names>N.E.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Morozov</surname>
            <given-names>S.S.:</given-names>
          </string-name>
          <article-title>Lexicosyntactic patterns for automatic text processing</article-title>
          .
          <source>In: Proc. Int. Conf. Dialogue</source>
          <year>2007</year>
          , pp.
          <fpage>70</fpage>
          -
          <lpage>75</lpage>
          . Moscow (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Rabchevsky</surname>
            ,
            <given-names>E.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bulatova</surname>
            ,
            <given-names>G.I.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Sharafutdinov</surname>
            ,
            <given-names>I.M.</given-names>
          </string-name>
          :
          <article-title>Application of lexical-syntactic patterns to the automation of ontology building process</article-title>
          .
          <source>In: Proc. 10th All-Rus. Conf. RCDL'2008 Electronic Libraries: Perspective Methods, Technologies, Electronic Collections</source>
          , pp.
          <fpage>103</fpage>
          -
          <lpage>106</lpage>
          .
          Dubna
          (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Sidorova</surname>
            ,
            <given-names>E.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kononenko</surname>
            ,
            <given-names>I.S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Zagorulko</surname>
            ,
            <given-names>Yu.A.</given-names>
          </string-name>
          :
          <article-title>An approach to filtering prohibited content on the web</article-title>
          .
          <source>In: CEUR Workshop Proceedings</source>
          , vol.
          <volume>2022</volume>
          , pp.
          <fpage>64</fpage>
          -
          <lpage>71</lpage>
          . CEUR-WS.org (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Sidorova</surname>
            ,
            <given-names>E.A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Kononenko</surname>
            ,
            <given-names>I.S.</given-names>
          </string-name>
          :
          <article-title>Genre aspects of websites classification</article-title>
          .
          <source>Software Engineering</source>
          <volume>8</volume>
          ,
          <fpage>32</fpage>
          -
          <lpage>40</lpage>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Sidorova</surname>
            ,
            <given-names>E.A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Timofeev</surname>
            ,
            <given-names>P.S.:</given-names>
          </string-name>
          <article-title>Lexico-semantic templates as a tool for declarative description of language constructs in linguistic text analysis</article-title>
          .
          <source>System Informatics</source>
          <volume>13</volume>
          ,
          <fpage>35</fpage>
          -
          <lpage>48</lpage>
          (
          <year>2018</year>
          ) DOI: 10.31144/si.2307-6410.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Bol'shakova</surname>
            ,
            <given-names>E.I.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Ivanov</surname>
            ,
            <given-names>K.M.:</given-names>
          </string-name>
          <article-title>Term extraction for constructing subject index of educational scientific text</article-title>
          .
          <source>In: Sixteenth Russian Conference on Artificial Intelligence RCAI-2018, vol. 1</source>
          , pp.
          <fpage>253</fpage>
          -
          <lpage>261</lpage>
          . Moscow (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>