<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Logico-Linguistic Model of Ukrainian Text</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nataliia Darchuk</string-name>
          <email>NataliaDarchuk@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kryvyi Sergii</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Victor Sorokin</string-name>
          <email>victor.sorokin@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Taras Shevchenko National University</institution>
          ,
          <addr-line>14, Taras Shevchenko Boulevard, Kyiv, 01601</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The purpose of the article is to develop a methodology for analyzing Ukrainian-language text with two components, linguistic and logical, both of which rest on the formal apparatus of linguistic and logical-model analysis. An example of a formal apparatus for representing procedural knowledge is the computer grammar AGAT, an integral computer model of the Ukrainian language in which the ontological aspect works in a complementary mode to the epistemological aspect. Its model, an active text analysis machine, hierarchically solves all the necessary tasks much as a human linguist would, but it does so according to the rules of computer grammar, which consists of two sections according to the objects of description, morphology and syntax, with semantics as the final stage of automatic text analysis.</p>
      </abstract>
      <kwd-group>
        <kwd>Ontograph</kwd>
        <kwd>logical-linguistic analysis</kwd>
        <kwd>dependency graph</kwd>
        <kwd>syntactic-semantic relations</kwd>
        <kwd>descriptive logic</kwd>
        <kwd>area of interpretation</kwd>
        <kwd>concept</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Knowledge acquisition from natural language texts is one of the prevalent challenges in artificial
intelligence. A natural language text is the object of computational linguistics research and the subject of
language and speech modeling. Based on the formal apparatus of linguistic data analysis, it is closely
connected to logic, psychology, mathematics, artificial intelligence, and cybernetics. Every computational
model related to the analysis of natural language texts denotes a generation and processing of declarative
and procedural knowledge [1, p.75]. Analyzing such knowledge requires describing denotational and
operational semantics to answer the following questions: a) what is generated or calculated? b) in which
way is it generated or calculated?</p>
      <p>Natural language texts should be analyzed in two stages: a linguistic (syntactic
and semantic) stage and a formal, logical modeling stage. A semantic module must be present in both
stages. The stages are closely connected: the more accurate the results of the first stage are, the better
their translation into the formal logical language is.</p>
      <p>In this paper, we aim to develop a methodology for analyzing Ukrainian-language texts using two
components, linguistic and logical, both of which are based on the formal apparatus of linguistic and logical
modeling analysis. The computational grammar AGAT, an integral computational model of Ukrainian, is
an example of a formal apparatus for representing procedural knowledge. In AGAT, the ontological and the
gnoseological aspects complement each other, functioning together. The model, an active text analysis
automaton, hierarchically solves all necessary tasks analogously to a human linguist, but does so according
to the rules of computational grammar. The grammar comprises two parts, morphology and syntax, with
semantics as the final stage of text analysis.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>
        For the creation of linguistic modules for natural language text analysis, two main approaches are
currently used: the first is based on rules [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and the second is the engineering approach called "machine
learning" [3; 4]. The first approach is linguistic, as it represents linguistic information in formal rules,
sometimes embedded in the program code or in a specially created formal language. The rules are
formulated by linguists themselves. Within the machine learning approach, the source of linguistic
information is not the rules, but the selected texts of the problem domain. Among the methods used in this
approach are supervised, unsupervised and bootstrapping learning. Supervised learning is most commonly
used when building a mathematical and software model of a machine classifier that can recognize different
classes of text units (words, word combinations, etc.) or texts themselves [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Learning generalizes regularities inherent in natural language texts from the training sample, so
knowledge is declarative in the first approach (rules) and procedural in the second (machine learning).
      </p>
      <p>Both methods have advantages and disadvantages. Creating rules is a laborious process, but it is deeply
linguistic and accounts even for particular complex cases, which are quite numerous in diverse texts. Rules
are declarative, easy to understand, and easy to modify depending on the results of the module's work.
Machine learning does not require creating rules manually, which shortens the development time of
systems, but classifiers are opaque and hard to interpret linguistically. For this reason, the AGAT grammar is
chosen as the basis of the system for automatic processing of Ukrainian text.</p>
      <p>
        A logical approach to the analysis of natural language texts is considered in many works, which can be
divided into the following directions: a) search for coreferences in the text [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]; b) construction of specialized
parsers for highlighting the semantic properties of the text [7; 8]; c) a direction that partially includes
directions a) and b) and is oriented towards obtaining knowledge from the text [9; 10]; d) the formal-logical and
ontological direction [11; 12]; and e) the direction of transformational analysis of texts [13; 14].
      </p>
      <sec id="sec-2-1">
        <title>The scientific novelty, theoretical and practical value of the results</title>
        <p>Natural language processing is one of the main tasks of computer science today. This is largely due to the
desire of humankind to overcome language barriers and to the dozens of practical tasks involved, such as
automatic translation, abstracting and annotation, real-time speech recognition (including
natural language commands), automatic search, question answering, detecting and
correcting grammatical errors, building natural language dialogue systems, text coherence checking, and
sentiment analysis. Any development in this field both deepens theoretical linguistic knowledge and solves
practical tasks, as these are mostly linguistic: part-of-speech tagging, lemmatization,
building dependency trees, coreference resolution, named entity recognition, establishing structural and
semantic incompleteness of sentences, and detecting connections and relationships between language units. The
scientific novelty of this research lies in combining knowledge obtained from natural language
texts with the powerful apparatus of mathematical logic, which allows the representation, analysis, and
extraction of knowledge from unstructured natural language texts. In Ukraine, there are no similar studies.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <p>The linguistic analysis modules use methods of structural linguistics: distributive analysis,
constituency and dependency tree construction, and component analysis. The automatic morphological analysis
module uses the distributive method, the automatic syntactic analysis module relies on constituency and
dependency trees, and the automatic semantic analysis module uses component analysis.</p>
      <p>The logical component uses the results of formal grammar analysis. Full automation of the
logic-modeling stage encounters the problem of choosing a formal logical language in which the knowledge obtained
at the first stage is represented and studied; this choice depends on the complexity of the input text T. The problem
is usually solved by choosing a first-order predicate language for working with
knowledge, as it is expressive enough and has well-developed algorithmic tools. This choice is further
supported by the fact that higher-order logical languages make the analysis process highly complex
and have insufficiently developed tools for logical inference.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Experiment</title>
      <p>Automation of linguistic research is associated with the creation of systems for automatic processing of
written Ukrainian text. The stages of computer language analysis are:
● tokenization: segmentation of letter sequences into words and sentences;
● morphological analysis: assignment of part-of-speech and categorical grammatical information;
● syntactic analysis: automatic construction of sentence dependency trees, the result of
which is a labeled subordination tree in which each pair of words is assigned a type of
syntactic connection and a syntactic-semantic relation, determined by the morphological expression
of the head (the "owner");
● semantic analysis: determining the meanings of individual sentences or their parts;
● logic-model analysis, i.e. translation of the input text into the language of mathematical logic
in order to identify contradictions and illogicalities in the expression of meaning and to make it
possible to obtain information relevant to a query from it.</p>
      <p>Each analyzed text is a separate file in XML format, which contains morphological information about
all the word forms of the text (the lemma and its set of grammatical features), as well as the syntactic
structure of each sentence in the form of a dependency graph. All branches of the tree are labeled with
the names of syntactic relations (coordination, subordination, conjunction) and of semantic-syntactic relations (6
of them: subject, direct object, indirect object, attributive, adverbial, and completive; and 6 conjunctive ones:
identical-conjunctive, contrasting-conjunctive, comparative-conjunctive, explanatory-conjunctive,
joining-conjunctive, and separating-conjunctive). The analysis modules use a morphological dictionary
containing 200,000 lemmas and a syntagm grammar, which includes hundreds of rules. The
syntactic-semantic annotation is built automatically, and its results must then be corrected.</p>
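      <p>The annotation described above can be pictured as a small in-memory data structure. The following is a minimal sketch only: the class names, field names, and grammatical tags are hypothetical and do not reproduce the actual XML schema, which is not shown here. Only the six non-conjunctive relation names are taken from the text.</p>
      <preformat>
```python
from dataclasses import dataclass
from enum import Enum


class SemSynRelation(Enum):
    # The six non-conjunctive semantic-syntactic relations named in the text.
    SUBJECT = "subject"
    DIRECT_OBJECT = "direct object"
    INDIRECT_OBJECT = "indirect object"
    ATTRIBUTIVE = "attributive"
    ADVERBIAL = "adverbial"
    COMPLETIVE = "completive"


@dataclass(frozen=True)
class WordForm:
    form: str        # surface form as it occurs in the text
    lemma: str       # dictionary form
    features: tuple  # grammatical features; tag names are hypothetical


@dataclass(frozen=True)
class Edge:
    head: int        # index of the governing word form in the sentence
    dependent: int   # index of the subordinate word form
    syntactic: str   # coordination, subordination, or conjunction
    semantic: SemSynRelation


# Hand-written fragment of the first example sentence (illustrative only).
words = [
    WordForm("відповідних", "відповідний", ("ADJ", "gen", "pl")),
    WordForm("каналів", "канал", ("NOUN", "gen", "pl")),
]
edges = [Edge(head=1, dependent=0, syntactic="subordination",
              semantic=SemSynRelation.ATTRIBUTIVE)]

for e in edges:
    print(words[e.head].lemma, "->", words[e.dependent].lemma, e.semantic.value)
```
</preformat>
      <p>Storing edges by word index rather than by word form keeps repeated words in one sentence distinct.</p>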
      <p>The following figure illustrates the stages of text analysis.</p>
      <p>Stage 1. [...] forms for text T and division of the set L = L1, L2, L3, L4 into classes (parts of speech and categorical
grammatical characteristics). Lexical analysis identifies deverbatives,
deadjectives, and denominatives.</p>
      <p>Stage 2. Construction of a set of objects D, based on the results of automatic syntactic analysis of text
T and the results of stage 1. At this stage, multi-word terms, anaphoric links, etc. are
identified.</p>
      <p>Stage 3. Comparison of the set of objects D with the data of the information and search thesaurus.</p>
      <p>Stage 4. Construction of an ontograph based on the set of objects D (construction of relations RL) using
classes L = L1, L2, L3, L4. The ontograph of the text is built from the sentence ontographs by
applying conjunction and simplification rules.</p>
      <p>Based on the stages of logical-linguistic analysis presented in the figure above, the following algorithm
can be presented:</p>
      <sec id="sec-4-1">
        <title>LOGICAL-LINGUISTIC ANALYSIS OF TEXT (T)</title>
        <p>Input: Initial text T.</p>
        <sec id="sec-4-1-1">
          <title>Output: Results of queries to knowledge base of the text T.</title>
          <p>Method:
Algorithm start
1. Enter the initial text T;
2. Carry out syntactic-semantic analysis of T;
3. Based on the results of the analysis of T, construct a table (i) of codes of classes of text T;
4. Based on the table (i) of codes, construct a universe B for text T;
5. Give an interpretation of the universe using an information-search thesaurus.
6. Carry out logical analysis of the universe.</p>
          <p>6.1. Check the obtained facts for inconsistency.
6.2. If the facts are not inconsistent, then enter them into the knowledge base and generate answers
to queries to this knowledge base.</p>
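          <p>The control flow of steps 6.1 and 6.2 can be sketched as follows. This is a toy illustration under strong assumptions: facts are encoded as hypothetical (subject, relation, object, polarity) tuples, inconsistency is reduced to a fact co-occurring with its own negation, and steps 1 to 5 are replaced by a stand-in function. It is not the consistency check used by the system.</p>
          <preformat>
```python
def negate(fact):
    # A fact is a (subject, relation, object, polarity) tuple; flip polarity.
    s, r, o, positive = fact
    return (s, r, o, not positive)


def is_consistent(facts):
    # Step 6.1 (toy version): inconsistent if a fact and its negation co-occur.
    fact_set = set(facts)
    return not any(negate(f) in fact_set for f in fact_set)


def logical_linguistic_analysis(text, extract_facts, knowledge_base):
    # Steps 1-5 are delegated to extract_facts, a stand-in for the
    # syntactic-semantic analysis, code table, universe, and interpretation.
    facts = extract_facts(text)
    # Step 6.2: only consistent facts enter the knowledge base.
    if is_consistent(facts):
        knowledge_base.update(facts)
        return True
    return False


def toy_extract(text):
    # Hypothetical extractor returning one hand-written fact for any input.
    return [("канал розподілу", "впливати", "маркетингова програма", True)]


kb = set()
accepted = logical_linguistic_analysis("...", toy_extract, kb)
print(accepted, len(kb))
```
</preformat>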
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>Algorithm end</title>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>The analysis of a scientific text on marketing is presented. The text "Marketing Distribution
Policy" is 4142 tokens long and contains 204 sentences. We illustrate all stages of the automatic text analysis
using one short and one long sentence as examples.</p>
      <p>1) Головним у маркетинговій політиці розподілу є формування відповідних каналів .
2) Важливість цього питання визначається такими обставинами : вибраний канал розподілу
справляє принциповий вплив майже на всю маркетингову програму підприємства;
формування каналу розподілу передбачає укладення тривалих комерційних угод з його
суб'єктами , які потім дуже важко змінити , нехай навіть вони й будуть помилковими ;
між суб'єктами каналів часто виникають конфлікти , які погано відбиваються на
результатах збутової діяльності підприємства ; користувач каналами розподілу
(продуцент товарів ) часто тією чи іншою мірою втрачає безпосередній контроль над
ринком збуту.</p>
      <p>The table in the figure contains sentences with morphological annotations, as well as information about
the part-of-speech and categorical features of words. The table has three columns: the first column is the
head member of each binary phrase, the second column is the subordinate member of the phrase, and the
third column is the syntactic information about the type of phrase. This allows the program
to compile an alphabetical frequency dictionary of the word combinations in the text once the analysis is complete.
Alphabetical frequency dictionaries have been built for specific lexemes and classes of words. Among the
most frequent nouns are "канал" (103), "розподіл" (89), "товар" (58), "споживач" (49), "посередник"
(35), "ринок" (32), "підприємство" (28), "товаровиробник" (25), "рівень" (24), "продукція" (22) and
so on.</p>
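      <p>A frequency dictionary of this kind can be compiled from a lemmatized text in a few lines. The sketch below assumes a short hypothetical lemma list in place of the real text, so the counts are not those reported above.</p>
      <preformat>
```python
from collections import Counter

# Hypothetical lemma sequence standing in for the lemmatized text.
lemmas = ["канал", "розподіл", "товар", "канал", "розподіл", "канал"]

freq = Counter(lemmas)

# Alphabetical frequency dictionary: entries sorted by lemma.
# Python sorts by code point, which only approximates Ukrainian collation.
alphabetical = sorted(freq.items())

# Frequency-ranked list: most frequent first, ties broken alphabetically.
ranked = sorted(freq.items(), key=lambda item: (-item[1], item[0]))

for lemma, count in ranked:
    print(f"{lemma} ({count})")
```
</preformat>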
      <p>Figure 3 demonstrates a graphical representation of a dependency tree created by automatically inverting
the dependency table.</p>
      <p>The dependency tree consists of nodes and edges, where nodes represent the words, and edges illustrate
the relations between head words and dependents of a phrase. Aside from that, additional information on
types of relations between nodes is given. This makes it possible to describe the configuration, form, and
outer parameters of the sentence. However, this is not enough to present the structure of the sentence. The
information about the type of relations between the constituents of the phrase and semantic-syntactic
relations is automatically applied to the set of tree edges. This helps with analyzing complex correlations
between semantics and its formal representation, as the text is parsed automatically based on the formal
features of its units. Thus, automatic syntactic analysis of the sentence is performed on two levels: 1) for each
phrase, the program determines its syntactic type based on the morphological features of its head; 2)
the type of syntactic relation is determined for each edge of the graph.</p>
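      <p>The step from a three-column dependency table to a tree can be sketched as follows. The rows are written by hand for the first example sentence and are an assumption for illustration, not output of the AGAT parser. Word forms serve as node identifiers here, which would collide if a word occurred twice in one sentence.</p>
      <preformat>
```python
from collections import defaultdict

# Hand-written rows (head, dependent, relation); illustrative only.
rows = [
    ("є", "формування", "subject"),
    ("є", "головним", "completive"),
    ("головним", "політиці", "adverbial"),
    ("політиці", "маркетинговій", "attributive"),
    ("політиці", "розподілу", "attributive"),
    ("формування", "каналів", "object"),
    ("каналів", "відповідних", "attributive"),
]


def build_tree(table):
    # Group dependents under their heads and find the single unattached head.
    children = defaultdict(list)
    dependents = set()
    for head, dependent, relation in table:
        children[head].append((dependent, relation))
        dependents.add(dependent)
    roots = [head for head in children if head not in dependents]
    if len(roots) != 1:
        raise ValueError("expected exactly one root")
    return roots[0], dict(children)


def show(node, children, depth=0, relation="root"):
    # Print the tree with two spaces of indentation per level.
    print("  " * depth + f"{node} [{relation}]")
    for dependent, rel in children.get(node, []):
        show(dependent, children, depth + 1, rel)


root, tree = build_tree(rows)
show(root, tree)
```
</preformat>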
      <p>In addition to relationships between words in a sentence, we observe another, more important type of
ordered relationship: relationships between groups of words, or word combinations. Representing them
requires a formal structure of another type, the constituent structure. Analyzing the
sentence in Figures 3 and 4, we can intuitively divide it into segments that form a hierarchical structure, in
which some segments share a common part, that is, one segment is included in another. The sentence is automatically
divided into such segments.</p>
      <p>If any two constituents that share a common part are nested, one falling completely within the other, the system of constituents
is considered the formal model of the sentence. The constituent "є головним у маркетинговій політиці
розподілу" is joined to the dominant one, "є формування відповідних каналів", by a subordinate
link, because the predicative pair "є формування" is the constructive center of the sentence, and "є
головним…" expands the predicate group. Studies have shown that a constituent may include not only
individual words but also nested "complex" constituents, for example, an adverbial clause. Therefore, the
list of constituents demonstrates that the syntactic units of the sentence are not only individual words but
also whole word combinations or groups: [[Головним [у] [маркетинговій [політиці]] розподілу] // [[є]
формування] [відповідних каналів]]. Note that constituents cannot overlap, but can nest:
if a word or a group of words is simultaneously a part of two or more constituents, one of them
completely contains the other. Next, it is determined which sets of such structural units
(constituents) belong to the same grammatical class. The constituent structure, represented in the form of a
labeled tree, illustrates this. Together, the constituents form the system of
sentence components.</p>
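      <p>The rule that constituents may nest but not overlap is easy to check mechanically. In the sketch below, constituents are encoded as half-open token spans, an assumed encoding chosen for illustration. The spans follow the bracketing given above.</p>
      <preformat>
```python
from itertools import combinations


def compatible(a, b):
    # Spans are half-open (start, end) token index pairs.
    (a_start, a_end), (b_start, b_end) = a, b
    disjoint = b_start >= a_end or a_start >= b_end
    a_contains_b = b_start >= a_start and a_end >= b_end
    b_contains_a = a_start >= b_start and b_end >= a_end
    return disjoint or a_contains_b or b_contains_a


def is_constituent_system(spans):
    # Every pair must be either disjoint or nested.
    return all(compatible(a, b) for a, b in combinations(spans, 2))


# Token indices: 0 Головним, 1 у, 2 маркетинговій, 3 політиці, 4 розподілу,
# 5 є, 6 формування, 7 відповідних, 8 каналів
spans = [(0, 9), (0, 5), (2, 4), (5, 7), (7, 9)]
print(is_constituent_system(spans))             # nested and disjoint spans only
print(is_constituent_system(spans + [(4, 6)]))  # (4, 6) straddles (0, 5)
```
</preformat>
      <p>The second call fails the check because the span (4, 6) crosses the boundary between the two top-level groups.</p>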
      <p>Considering the content side of sentence constituents, I. R. Vykhovanets notes that researchers in semantics
often associate the semantic organization of a sentence with its formal organization, using the
concept of semantic sentence structure and qualifying it as the meaning of the sentence presented in a
generalized form, taking into account those elements of meaning that are outlined by the sentence's form [15,
p. 121]. The objective content of the sentence is best reflected in the concept of a proposition. This is a
stable core, a constant of the sentence, which reflects the structure of the described situation. The linguist
notes that the structure of the proposition is determined by the predicate. The predicate indicates the nature
of the situation (in our case, it is the root of the tree) and the corresponding places for objects, the
participants of the situation. These are the actants, or arguments, represented by the subject and
predicate groups, whose quality and functions are determined by the predicate. The semantic nature of
the predicate alone determines the number and roles of the actants. Thus, two aspects of study are relevant: the
semantics of predicate words and the semantic roles of actants. The first aspect, the semantics of predicate
words, is already formalized as semantic domains of verb classes and predicative adverbs, to each of which a semantic
class number is assigned, but semantic roles require more research.</p>
      <p>Automatically obtained constituents can be considered the raw input for forming n-ary predicates, as
the primary way of expressing the content of the sentence. The semantics of a predicate is determined by the
semantic class to which the predicate belongs. Therefore, we can determine the ways of expressing
propositions (propositions of movement, speech, sound, mental sphere, emotions, etc.), the number of
actants, etc. Using large corpus material, subcorpora of constituents associated with certain propositions
can be formed, from which one can identify the constituents that are most frequently used for a specific
proposition.</p>
      <p>The aforementioned sentence is simple, extended, and declarative. Let us analyze its content. It is about
the marketing distribution policy, but the deverbal noun "розподілу" with the meaning of "placement"
requires clarification: distribution of what? (see The Dictionary of the Ukrainian Language: in 11 volumes).
The subject group "формування відповідних каналів" is also incomplete, since the word "канали" is used
in a figurative sense as "means and ways of achieving something." Therefore, from a semantic and
pragmatic point of view, this sentence is poorly constructed.</p>
      <p>The semantic-logical model of the aforementioned sentence is a predicative-argumentative
structure of the following type (see Table 1).</p>
      <p>Table 1. Predicative-argumentative structure sample</p>
      <p>[...] of expressing all members of the predicate structure are lexemes of content parts of speech (nouns,
verbs, adjectives). Function words (prepositions, conjunctions, particles) are not represented.</p>
      <p>The predicate pair (ФОРМУВАННЯ КАНАЛІВ) – action (БУТИ ГОЛОВНИМ) = argument 1
(МАРКЕТИНГОВА ПОЛІТИКА) + argument 2 (ПОЛІТИКА РОЗПОДІЛУ)</p>
      <p>Sentence 2</p>
      <p>Важливість цього питання визначається такими обставинами : вибраний канал розподілу
справляє принциповий вплив майже на всю маркетингову програму підприємства; формування
каналу розподілу передбачає укладення тривалих комерційних угод з його суб'єктами , які потім
дуже важко змінити , нехай навіть вони й будуть помилковими ; між суб'єктами каналів часто
виникають конфлікти , які погано відбиваються на результатах збутової діяльності підприємства
; користувач каналами розподілу (продуцент товарів ) часто тією чи іншою мірою втрачає
безпосередній контроль над ринком збуту.</p>
      <p>This sentence follows the first one in the text. It is a multi-clause sentence with a compound-complex
structure, consisting of 70 words and eight predicative parts. Figures 5a-5e show
fragments of the dependency graph, which is constructed automatically based on the types of syntactic and
semantic-syntactic relations. The graph allows us to simplify the sentence, identify coreference links, restore
the entities, and prepare for logical analysis.</p>
      <p>The predicate pair (ВАЖЛИВІСТЬ (ФОРМУВАННЯ КАНАЛІВ)) – action (ВИЗНАЧАЮТЬСЯ) =
argument 1 (ОБСТАВИНАМИ)</p>
      <sec id="sec-5-1">
        <title>From the second predicative part (Figure 7), two propositions are extracted:</title>
        <p>"канал справляє вплив на програму підприємства" and "справляє вплив на програму маркетингову".</p>
        <p>The predicate pair (КАНАЛ РОЗПОДІЛУ) – action (СПРАВЛЯЄ ВПЛИВ) = argument 1 =
(ПРОГРАМУ ПІДПРИЄМСТВА) + argument 2 = (ПРОГРАМУ МАРКЕТИНГОВУ)</p>
        <p>From the third predicative part, three such propositions are extracted:
"формування каналу розподілу передбачає"; "передбачає укладення тривалих комерційних угод з
його суб'єктами"; "які потім дуже важко змінити".</p>
        <p>The predicate pair (ФОРМУВАННЯ КАНАЛУ РОЗПОДІЛУ) – action (ПЕРЕДБАЧАЄ
УКЛАДЕННЯ) = argument 1 = (ТРИВАЛИХ КОМЕРЦІЙНИХ УГОД) + argument 2 = (ЙОГО
СУБ’ЄКТАМИ) + argument 3 = predicative pair (0) – action (ВАЖКО ЗАМІНИТИ); argument 4 = (ЯКІ)</p>
        <p>After the predicative pair is constructed, the antecedents of the possessive pronoun "його" and of the
conjunctive word "які" remain unclear. It is hard to tell whether the author means
"угоди, які важко потім замінити" or "суб’єкти комерційних угод". Such logical errors affect
the final result of the linguistic and logical analysis of a scientific text.</p>
        <p>From the fourth predicative part, нехай навіть вони й будуть помилковими, one proposition is extracted:
The predicate pair (ВОНИ) – action (НЕХАЙ БУДУТЬ ПОМИЛКОВИМИ). It is difficult to
determine the antecedent of the pronoun "вони".</p>
        <p>From the fifth predicative part між суб'єктами каналів часто виникають конфлікти, які погано
відбиваються на результатах збутової діяльності підприємства, such propositions are extracted:</p>
        <p>The predicate pair (КОНФЛІКТИ) – action (ВИНИКАЮТЬ); argument 1 = (СУБ’ЄКТ КАНАЛІВ) =
the predicate pair (ЯКІ) – action (ВІДБИВАЮТЬСЯ); argument 2 (РЕЗУЛЬТАТ ДІЯЛЬНОСТІ) +
argument 3 (ЗБУТОВОЇ ДІЯЛЬНОСТІ) + argument 4 (ДІЯЛЬНОСТІ ПІДПРИЄМСТВА)</p>
        <p>From the sixth predicative part користувач каналами розподілу (продуцент товарів) часто тією
чи іншою мірою втрачає безпосередній контроль над ринком збуту, such propositions are extracted:</p>
        <p>The predicate pair (КОРИСТУВАЧ КАНАЛАМИ РОЗПОДІЛУ) – action (ВТРАЧАЄ КОНТРОЛЬ);
argument 1 = (КОРИСТУВАЧ= ПРОДУЦЕНТ ТОВАРІВ) + argument 2 (КОНТРОЛЬ НАД РИНКОМ
ЗБУТУ)</p>
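        <p>The predicate pairs listed above share one shape: a predicative head, an action, and an ordered list of arguments. A minimal sketch of that shape as a data structure is given below. The class and field names are hypothetical, and the example values are copied from the second predicative part.</p>
        <preformat>
```python
from dataclasses import dataclass, field


@dataclass
class Proposition:
    predicate_pair: str   # head of the predicative pair
    action: str           # predicate (action) label
    arguments: list = field(default_factory=list)

    def render(self):
        # Reproduce the linear notation used in the text.
        parts = [f"The predicate pair ({self.predicate_pair}) – action ({self.action})"]
        args = " + ".join(
            f"argument {i} ({arg})" for i, arg in enumerate(self.arguments, start=1)
        )
        if args:
            parts.append(args)
        return " = ".join(parts)


p = Proposition(
    predicate_pair="КАНАЛ РОЗПОДІЛУ",
    action="СПРАВЛЯЄ ВПЛИВ",
    arguments=["ПРОГРАМУ ПІДПРИЄМСТВА", "ПРОГРАМУ МАРКЕТИНГОВУ"],
)
print(p.render())
```
</preformat>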
        <p>This method of formulating a predicative structure makes it possible to identify the elements of text that can be
disregarded as insignificant from the point of view of meaning, such as: (вплив) майже (на); (які) потім
дуже (важко…); нехай вони будуть помилковими; Часто (виникають); (які) погано
(відбиваються); Часто тією чи іншою мірою. The text is simplified by removing
particles and qualitative adverbs, which are specified by a list. In contrast, coreferential links, which
restore the antecedents in the text, are important both for conveying the text's meaning and as a
linking element between the sentences of the text.</p>
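        <p>List-based simplification of this kind amounts to filtering tokens against a fixed set. The sketch below assumes a small removal list assembled from the examples above. The actual list used by the system is not given in the text.</p>
        <preformat>
```python
# Hypothetical removal list, assembled from the examples in the text.
REMOVABLE = {"майже", "потім", "дуже", "часто", "погано", "навіть"}


def simplify(tokens):
    # Drop tokens whose lowercase form is on the removal list.
    return [t for t in tokens if t.lower() not in REMOVABLE]


clause = "які потім дуже важко змінити".split()
print(" ".join(simplify(clause)))
```
</preformat>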
        <p>An important part of linguistic processing is the dictionary component: the thesaurus of terms of the
subject area (SA) "Marketing" of the information-search type, where each term is represented in a network
whose nodes are terms and whose arcs are relationships between terms (http://www.mova.info/thes_nl.aspx).
Automatic comparison of the predicate structures obtained from the text with the
terminological network of the thesaurus is the basis for building an ontograph for the above sentences (see
Figure 8).</p>
        <p>[Figure 8. Ontograph of the analyzed sentences. Node labels: Distribution channel; Marketing program (маркетингова програма підприємства); Making deals; Sales activity (збутова діяльність підприємства); Marketing policy; Conflict between subjects (конфлікт між суб’єктами). Edges are labeled with relations R.]</p>
        <sec id="sec-5-1-1">
          <title>R1 – predicate «бути головним», R2 – predicate «впливати», R3 – predicate «передбачати», R4 – predicate «провокувати», R5 – predicate «бути результатом».</title>
          <p>This concludes the linguistic syntactic-semantic analysis and leads to logical analysis.</p>
        </sec>
        <sec id="sec-5-1-2">
          <title>Logical analysis of the results of linguistic text processing</title>
          <p>The purpose of logical analysis is to verify the consistency and logical compatibility of facts. To
perform logical analysis of the results of linguistic syntactic-semantic analysis, it is proposed to use the
tools of descriptive logics and ontologies, because descriptive logics have a reliable algorithmic basis, and
the use of ontologies is a direct way to build a knowledge base. The knowledge base accumulates
knowledge obtained from the original text, but this knowledge must be consistent (compatible) in a logical
sense. The results of linguistic analysis of the input text, presented in the form of an ontograph of this text,
can be checked for inconsistency. To do this, it is necessary to interpret the objects of the universe and
determine the formal logical language in which the checks of the properties of the acquired knowledge and
the generation of consequences that arise from this knowledge will be performed. Currently, the most
suitable descriptive logic for performing such tasks is ALC-logic and some of its extensions. For this logic,
algorithms for generating consequences and checking the consistency of a set of knowledge represented by
formulas of this logic have been developed. Now, let us consider the formal definitions.</p>
          <p>The creation of ontology-like systems is based on the concept of ontology.</p>
          <p>Definition 1. An ontology is defined as an ordered triple</p>
          <p>O = (X, R, F),
where X is a finite set of concepts, R is a finite set of binary (semantic) relations defined on X, and
F is an interpretation function that maps the elements of X and R to a domain D, such that F : X ∪ R → D.
For example, X (concepts) = {(МАРКЕТИНГОВА) ПОЛІТИКА, (МАРКЕТИНГОВА) ПРОГРАМА,
РОЗПОДІЛ ТОВАРУ, ЗБУТОВА ДІЯЛЬНІСТЬ, КОМЕРЦІЙНА УГОДА, СУБ’ЄКТ КАНАЛУ
РОЗПОДІЛУ, ПІДПРИЄМСТВО, etc.}; R (relations, or roles) = {R1 – «бути головним», R2 –
«впливати», R3 – «передбачати», R4 – «провокувати», R5 – «бути результатом»}.</p>
          <p>If A = ‘канал розподілу’ and B = ‘маркетингова політика розподілу’, then the formula A R1 B means:
‘канал розподілу’ is the main element in ‘маркетингова політика розподілу’.</p>
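          <p>Definition 1 can be mirrored by a small data structure in which relations are stored extensionally as sets of concept pairs. This is a sketch under assumptions: the concept and relation names follow the example above, the pairs stored under each relation are illustrative, and the interpretation function F is not modeled separately.</p>
          <preformat>
```python
# Concepts X and relation names follow the example in the text;
# the concrete pairs stored under each relation are illustrative only.
X = {
    "канал розподілу",
    "маркетингова політика розподілу",
    "маркетингова програма",
}

R = {
    "R1": {("канал розподілу", "маркетингова політика розподілу")},  # бути головним
    "R2": {("канал розподілу", "маркетингова програма")},            # впливати
}


def holds(a, relation, b):
    # True if the formula "a relation b" is recorded in the ontology.
    if a not in X or b not in X:
        raise KeyError("both arguments must be concepts from X")
    return (a, b) in R.get(relation, set())


A = "канал розподілу"
B = "маркетингова політика розподілу"
print(holds(A, "R1", B))
print(holds(B, "R1", A))
```
</preformat>
          <p>Because the relations are ordered pairs, A R1 B and B R1 A are distinct formulas, and only the first is recorded here.</p>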
          <p>When constructing an ontology, the subject area (SA) to which the concepts from X and the
relations from R pertain is specified. In this case, it is Economics (marketing). The specification of the SA is necessary
for defining the interpretation F. The relationship of F with the SA may introduce additional corrections
to the definition of F. These corrections are described by the axioms A of the given SA and the
restrictions Rc, which have the form of additional definitions (clarifications, limitations on possible values,
etc.) and properties from the interpretation area D of the given SA. Thus, we arrive at a refined definition
of the ontology for a specific SA.</p>
          <p>Definition 2. A refined ontology is defined as an ordered quadruple O = (X, R, F, A(D, Rc)), where X
is a finite set of concepts (terms), R is a finite set of binary (semantic) relations defined on X, F is an
interpretation function on a domain D of elements from X and R, and A(D, Rc) are additional constraints
{Rc} that are described by axioms A on the domain D.</p>
          <p>
            The difference between definitions 1 and 2 lies in the following [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ].
          </p>
          <p>a) The set of concepts X (in this case, terms) in definition 1 is oriented towards the problem (Economics)
to be solved, while in definition 2 this set is specified (the subject area is "Marketing") and should be as
complete as possible for the given SA and should be constructed using automated means (from dictionaries
and texts).</p>
          <p>b) The set R in definition 1 is established by experts in the relevant subject area, while in definition 2 it
should be executed on the set D, built using automated means and verified for consistency by the logical
deduction system.</p>
          <p>c) The interpretation function F in definition 1 is chosen by the user according to their professional
competence, own or reference information, and in definition 2 this function is formed based on general
sources of text information such as encyclopedias, dictionaries, results of syntactic and semantic analysis,
etc. For example, for the authors of this work, using definition 1 is sufficient from the perspective of their
competence.</p>
          <p>d) The set of axioms A in definition 2 describes additional specific definitions of concepts from D and
limitations on the interpretation of Rc for a given SA.</p>
          <p>Therefore, it is necessary to define the subject area, because the same concepts may have
different meanings in different SAs. The input to the system consists of texts related to the given SA (for now only in Ukrainian, although
further development of the system is planned to include other languages, in particular English).</p>
          <p>
            The interpretation area of concepts and ontological relationships is represented by a set of concepts X
of text T, on which the terminology of the SA is built, in which these concepts and a set of semantic
relationships R between concepts are interpreted. For the set X, the interpretation area XF is divided into
classes (for example, proper/common names, names of individuals, abstract/concrete names, expertise, etc.)
[
            <xref ref-type="bibr" rid="ref10">10</xref>
            ]. This division uses the results of syntactic-semantic analysis, which groups concepts into classes by their
types. The same analysis also finds the syntactic dependencies between sentence members and represents them
as an acyclic graph. Each syntactic dependency links sentence members through a particular relationship and
therefore carries semantic information, which is used to detect semantic features and potential
semantic links between lexical units. The detection of semantic features does not follow fixed rules (none
exist); it depends on the goal of the analysis, on the researchers, and on the developers' skills.
          </p>
          <p>
            The attributive language AL is the basis of description logics (DL). AL contains a set of atomic
concepts CN and a set of atomic roles RN (binary relationships on CN) [
            <xref ref-type="bibr" rid="ref10 ref11">10,11</xref>
            ]. More complex concepts
and relationships are built using constructors.
          </p>
          <p>The semantics of concepts and relationships is defined set-theoretically, and the following concept
constructors are used: union of concepts, existential quantifier, numerical restriction, and negation of any
concept. The concept language is a fragment of the first-order predicate language.</p>
          <p>Extending the AL language with some subset of constructors gives a specific description logic. If
we add to AL the negation constructor (C for complement), called the complement of a
concept, we get the ALC logic. This logic forms the core of the entire family of description logics.</p>
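          <p>The set-theoretic reading of the constructors can be illustrated on a finite interpretation. The sketch below is an assumption-laden illustration rather than the formal semantics given in the cited handbook: concepts are encoded as nested tuples, and the names extension, Legal, Illegal, uses, and the three-element domain are hypothetical.</p>
          <preformat>
```python
def extension(concept, domain, atoms, roles):
    """Return the set of domain elements that belong to an ALC concept.

    A concept is a nested tuple: ("atom", name), ("not", C), ("or", C, D),
    ("and", C, D), ("some", role, C) or ("all", role, C).
    """
    kind = concept[0]
    if kind == "atom":
        return set(atoms.get(concept[1], set()))
    if kind == "not":
        # Complement with respect to the whole domain.
        return set(domain) - extension(concept[1], domain, atoms, roles)
    if kind == "or":
        left = extension(concept[1], domain, atoms, roles)
        return left.union(extension(concept[2], domain, atoms, roles))
    if kind == "and":
        left = extension(concept[1], domain, atoms, roles)
        return left.intersection(extension(concept[2], domain, atoms, roles))
    if kind in ("some", "all"):
        pairs = roles.get(concept[1], set())
        filler = extension(concept[2], domain, atoms, roles)
        members = set()
        for x in domain:
            successors = {b for a, b in pairs if a == x}
            if kind == "some" and successors.intersection(filler):
                members.add(x)      # at least one role successor is in the filler
            elif kind == "all" and successors.issubset(filler):
                members.add(x)      # every role successor is in the filler
        return members
    raise ValueError("unknown constructor: " + str(kind))


# Hypothetical finite interpretation.
DOMAIN = {"a", "b", "c"}
ATOMS = {"Legal": {"a"}, "Illegal": {"b"}}
ROLES = {"uses": {("c", "a")}}
CHANNEL = ("or", ("atom", "Legal"), ("atom", "Illegal"))
```
          </preformat>
          <p>With this interpretation, the complement ("not", CHANNEL) contains exactly the elements of the domain that belong to neither Legal nor Illegal, which is the negation constructor that distinguishes ALC from AL.</p>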
          <p>
            The formal description of the syntax and semantics of the ALC logic can be found in [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ], so we do
not introduce these notions here, but instead return to our example, specifically to Figure 8. This figure
represents the results of the linguistic analysis of the input text T. The ontograph in Figure 8
accumulates the set of concepts C and the set of relations R.
          </p>
          <p>Indeed, in this case we have:
C = {C1 = канал-розподілу, C2 = маркет-політ.-розподілу, C3 = марк.-прогр.підпр,
C4 = уклад-комер-угод, C5 = конфліктн-ситуац, C6 = збут-діял-підпр, …},
R = {R1 (C1, C2), R2 (C1, C3), R3 (C1, C4), R4 (C1, C5), R5 (C3, C6), R5 (C4, C6), R5 (C5, C6)}.</p>
          <p>These sets may not actually be atomic. To ensure that these are indeed atomic concepts and roles, we
need an interpretation of these sets. The interpretation will determine which concepts are atomic and which
are derived from atomic concepts, and the same will be done for roles.</p>
          <p>Therefore, the information presented in Figure 8 is a high-level, partially interpreted ontology
template. After the subject area is clarified and the template is interpreted, it becomes an ontology on
which logical analysis is performed, and after this analysis an ontological knowledge base is constructed.
How does this happen? Let's consider our example.</p>
          <p>Example 1. Let's consider a given SA and the interpretation of concepts that appear in the text T about
the marketing policy of a company:</p>
          <p>Objects = {C1 = канал-розподілу, C11 = легальний, C12 = нелегальний, C6 =
контроль-над-ринком-збуту, C5 = розв’язання-конфл-ситуацій, C7 = розподіл-товарів, C8 = контрабанда, C9 = наркотрафік,
C2 = маркет-політика, C3 = програма-маркет, C4 = комерц-угода, C21 = підприємство, C22 = товари, C23
= користувач-каналами, C24 = суб'єкти}</p>
          <p>Let the terminology of the given SA be:</p>
          <p>Канал ≡ легальний ⨆ нелегальний,
Легальний ≡ тривала-комерц-угода ⨆ нетривала-комерц-угода ⨆ контроль-над-ринком-збуту
⨆ розподілу-товарів ⨆ розв’язання-конфл-ситуацій,
Нелегальний ≡ контрабанда ⨆ наркотрафік ⨆ розв’язання-конфл-ситуацій,
Програма-маркет ≡ програма-підприємства ⨆ тривала-комерц-угода ⨆ нетривала-комерц-угода
⨆ збутова-діяльність-підпр,
Підприємство ≡ виготовлення-лікарських-засобів ⨆ виготовлення-косметики,
Товари ≡ {серцеві, діабетичні, ортопедичні} ⨆ {шампуні, гелі, креми}.</p>
          <p>From this terminology, the following set of atomic concepts can be extracted:</p>
          <p>CN = {C3 = програма-підприємства, C4 = комерц-угод, C41 = нетр-комерц-угод, C6 =
збутова-діяльність-підпр, C8 = контрабанда, C9 = наркотрафік, C5 = розв’яз-конфл. ситуацій, C9 = серцеві, C10 =
діабетичні, C11 = ортопедичні, C12 = шампуні, C13 = гелі, C14 = креми}.</p>
          <p>The ontograph from Figure 8 gives us the set of roles RN (binary relations) of concepts:
DN = {R1 (C1, C2), R2 (C1, C3), R3 (C1, C4), R4 (C1, C5), R5 (C3, C6), R5 (C4, C6), R5 (C5, C6)}.</p>
          <p>It is clear that the set DN can be expanded with additional relations, for example, one can add the relation
R1 (C1, C9).</p>
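          <p>One way to read the extraction of atomic concepts from a terminology is sketched below. This is an illustrative assumption, not the procedure used by the authors: it treats as atomic every name that occurs on a right-hand side but is never defined on the left, so its result need not coincide with the set CN listed above. The dictionary TBOX transcribes four of the definitions of Example 1 with names lower-cased; the function names are hypothetical, and the terminology is assumed to be acyclic.</p>
          <preformat>
```python
# Part of the terminology of Example 1: each defined concept maps to the
# names joined by the union constructor on its right-hand side.
TBOX = {
    "канал": ["легальний", "нелегальний"],
    "легальний": [
        "тривала-комерц-угода",
        "нетривала-комерц-угода",
        "контроль-над-ринком-збуту",
        "розподілу-товарів",
        "розв’язання-конфл-ситуацій",
    ],
    "нелегальний": [
        "контрабанда",
        "наркотрафік",
        "розв’язання-конфл-ситуацій",
    ],
    "програма-маркет": [
        "програма-підприємства",
        "тривала-комерц-угода",
        "нетривала-комерц-угода",
        "збутова-діяльність-підпр",
    ],
}


def atomic_concepts(tbox):
    """Names used on some right-hand side that are never defined on the left."""
    used = {name for parts in tbox.values() for name in parts}
    return used - set(tbox)


def unfold(name, tbox):
    """Expand a defined concept into the atomic concepts it is built from.

    Assumes the terminology is acyclic; a cyclic one would recurse forever.
    """
    if name not in tbox:
        return {name}
    atoms = set()
    for part in tbox[name]:
        atoms.update(unfold(part, tbox))
    return atoms


ATOMIC = atomic_concepts(TBOX)
CHANNEL_ATOMS = unfold("канал", TBOX)
```
          </preformat>
          <p>Here unfold("канал", TBOX) collects the atomic concepts reachable through both the legal and the illegal branch, which is the sense in which an interpretation separates atomic concepts from derived ones.</p>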
        </sec>
        <sec id="sec-5-1-3">
          <title>End of example 1.</title>
          <p>The terminology allows us to record general knowledge about concepts and roles, but it is
also necessary to record knowledge about specific objects, or individuals. For example, we need to
know to which concepts they belong and how they are connected to each other. This knowledge is stored in the
part of the knowledge base called the system of facts about individuals, or ABox. For this purpose,
in addition to the set of atomic concepts CN and the set of atomic roles RN, a finite set IN of individual
names is introduced.</p>
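          <p>A minimal sketch of such a system of facts is given below, assuming that an ABox holds concept assertions C(a) and role assertions R(a, b) over individual names. The class, its methods, the sample terminology SAMPLE_TBOX, and the individual names are hypothetical and serve only as an illustration.</p>
          <preformat>
```python
from dataclasses import dataclass, field


@dataclass
class ABox:
    """Facts about individuals: concept assertions C(a) and role assertions R(a, b)."""
    concept_assertions: set = field(default_factory=set)  # pairs (concept, individual)
    role_assertions: set = field(default_factory=set)     # triples (role, subject, object)

    def individuals(self):
        """IN: every individual name mentioned in some assertion."""
        names = {a for _, a in self.concept_assertions}
        for _, a, b in self.role_assertions:
            names.update((a, b))
        return names

    def instances(self, concept, tbox):
        """Individuals asserted in the concept or in any concept it is a union of."""
        found = {a for c, a in self.concept_assertions if c == concept}
        for part in tbox.get(concept, ()):
            found.update(self.instances(part, tbox))
        return found


# Hypothetical terminology fragment and facts.
SAMPLE_TBOX = {"канал": ("легальний", "нелегальний")}
abox = ABox(
    concept_assertions={("легальний", "канал-1"), ("нелегальний", "канал-2")},
    role_assertions={("R1", "канал-1", "політика-1")},
)
```
          </preformat>
          <p>In this sketch, abox.instances("канал", SAMPLE_TBOX) returns both channels, because each is asserted to belong to one of the concepts whose union defines "канал".</p>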
          <p>For example, if we return to the initial text (sentences):
1) Головним у маркетинговій політиці розподілу є формування відповідних каналів.
2) Важливість цього питання визначається такими обставинами: вибраний канал розподілу
справляє принциповий вплив майже на всю маркетингову програму підприємства; формування
каналу розподілу передбачає укладення тривалих комерційних угод з його суб'єктами, які
потім дуже важко змінити, нехай навіть вони й будуть помилковими; між суб'єктами каналів
часто виникають конфлікти, які погано відбиваються на результатах збутової діяльності
підприємства; користувач каналами розподілу (продуцент товарів) часто тією чи іншою
мірою втрачає безпосередній контроль над ринком збуту.</p>
          <p>we see that these concepts are not clearly defined. In particular, the concept terms "політика розподілу" and
"канал розподілу" are not formulated clearly. Interpretation and
terminology compensate for this only partly. The meaning of the word "розподіл" is limited and requires
clarification:
1) In the phrase "Важливість цього питання", the noun "питання" does not correlate with the previous
sentence, which neither uses this word nor ends with a question mark.
2) The term "обставини" covers several concepts, including "канал розподілу товарів",
"процес формування каналу" and "укладення угод", and the information it conveys is vague.
3) The fact presented in "користувач каналами розподілу часто тією чи іншою мірою втрачає
безпосередній контроль над ринком збуту" is unclear and not necessarily true.
4) When forming channels for distributing its own products, the enterprise has to find answers to three
questions.</p>
          <p>Despite the absence of clearly defined concepts and their semantic meanings, logical analysis can be
performed on a partially interpreted ontology. This is because the tools of logical-mathematical analysis
process not words and their compatibility, but the compatibility of concepts represented by the codes of
the corresponding concepts. This explains the arrows in Figure 1 that are labeled "Refinement
and Specification" and relate to both semantic and logical refinement. Logical refinement may be required
by the terminology (a terminology is contradictory if its axioms are contradictory) and by the facts. While
terminological contradictions are resolved at the syntactic level, contradictions among factual axioms must be resolved by the logic
that is used in the analysis.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Discussion</title>
      <p>
        Two obvious criteria for any approach to a specific task are its potential for automation and its
efficiency. The performance of the algorithms used in linguistic analysis is as follows: the
morphological module processes 1000 word forms in 1.2 seconds and the syntactic-semantic module in 10-20
seconds. The algorithms for checking the consistency of a knowledge base and working
with ontologies in the ALC language (steps 4-6 of the aforementioned algorithm) solve a problem that is
PSPACE-hard [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Such activities are supported by tools such as OWL and Protégé based on
the ALC description logic and its extensions, which are specifically designed for the creation of ontologies
and knowledge bases [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Full automation of knowledge base construction currently seems problematic,
as certain details must be clarified and added by experts in the subject area.
      </p>
      <p>This opinion is held by the majority of developers of ontological knowledge bases.</p>
      <p>This last assessment leads to a certain skepticism among practitioners, in particular
linguists, programmers, and knowledge base administrators. The response to such criticism and skepticism is
the described approach itself and the possibility of implementing it in practice. Full
automation at this stage appears somewhat problematic, as certain details must be clarified by an expert in
the field. The expansion of the ontology and the ontological knowledge base, and the clarification of its
concepts for a specific subject area, should be done by an expert or experts in this
subject area.</p>
    </sec>
    <sec id="sec-7">
      <title>6. Conclusions</title>
      <p>
        Based on the above, the following conclusions can be drawn. The combination of linguistic
(semantic-syntactic) analysis with the logical-modeling and ontological paradigm allows us to assert that the process of
acquiring knowledge and deriving consequences from it can be significantly automated. The
significance of the proposed method lies in its prospects for development as both linguistic and logical
analysis of the input text. This requires a method for the automatic construction of an
information-retrieval thesaurus for a given subject. Therefore, the task is to develop as many thesauruses as possible
for different fields of science and technology [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Projecting a thesaurus onto a specific text on a
given subject will help create a semantic network of the text; the combination of syntactic-semantic
relations with logical thesauruses will then be the starting point for applying the logical-modeling method.
      </p>
    </sec>
    <sec id="sec-8">
      <title>7. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>O.L.</given-names>
            <surname>Semotyuk</surname>
          </string-name>
          , Modern technologies of linguistic research, Lviv,
          <year>2011</year>
          , pp.
          <fpage>151</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.P.</given-names>
            <surname>Darchuk</surname>
          </string-name>
          ,
          <article-title>Computer annotation of Ukrainian text: results and prospects</article-title>
          .
          <source>Kyiv: Education of Ukraine</source>
          ,
          <year>2013</year>
          , 543 p.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.V.</given-names>
            <surname>Lande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.Yu.</given-names>
            <surname>Subach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.Ya.</given-names>
            <surname>Gladun</surname>
          </string-name>
          ,
          <article-title>Processing of extremely large data sets (Big Data): tutorial</article-title>
          , Kyiv: KPI named after Igor Sikorskyi, Polytechnic Publishing House,
          <year>2021</year>
          .
          ISBN 978-966-2344-83-7.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. S.</given-names>
            <surname>Leung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Leung</surname>
          </string-name>
          ,
          <article-title>SUNNYNLP at SemEval-2018 Task 10: A Support-Vector-Machine-Based Method for Detecting Semantic Difference using Taxonomy and Word Embedding Features</article-title>
          .
          <source>Proceedings of The 12th International Workshop on Semantic Evaluation</source>
          ,
          <year>2018</year>
          , P.
          <fpage>741</fpage>
          -
          <lpage>746</lpage>
          . http://doi.org/10.18653/v1/S18-1118
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.L.</given-names>
            <surname>Kryvyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.P.</given-names>
            <surname>Darchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.I.</given-names>
            <surname>Provotar</surname>
          </string-name>
          ,
          <article-title>Ontology-like systems for the analysis of natural language texts</article-title>
          .
          <source>Problems of Programming</source>
          ,
          <year>2018</year>
          , No. 2-3, P.
          <fpage>132</fpage>
          -
          <lpage>139</lpage>
          (proceedings of the international conference “UKRPROG-2018”). DOI: 10.15407/pp2018.02.132
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <article-title>Revealing the Myth of Higher-Order Inference in Coreference Resolution</article-title>
          ,
          <source>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <year>2020</year>
          , P.
          <fpage>8527</fpage>
          -
          <lpage>8533</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Mrini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Dernoncourt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Nakashole</surname>
          </string-name>
          ,
          <article-title>Rethinking Self-Attention: An Interpretable Self-Attentive Encoder-Decoder Parser</article-title>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>W.</given-names>
            <surname>Che</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Towards Better UD Parsing: Deep Contextualized Word Embeddings, Ensemble, and Treebank Concatenation</article-title>
          .
          <source>Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies</source>
          . - Association for Computational Linguistics,
          <year>2019</year>
          , P.
          <fpage>55</fpage>
          -
          <lpage>64</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>Span Model for Open Information Extraction on Accurate Corpus</article-title>
          .
          <source>Proceedings of the AAAI Conference on Artificial Intelligence</source>
          ,
          <volume>34</volume>
          (
          <issue>05</issue>
          ),
          <year>2020</year>
          , P.
          <fpage>9523</fpage>
          -
          <lpage>9530</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H.</given-names>
            <surname>Hoherchak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Darchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kryvyi</surname>
          </string-name>
          ,
          <article-title>Representation, analysis, and extraction of knowledge from unstructured natural language text</article-title>
          ,
          <source>Cybernetics and Systems Analysis</source>
          ,
          <year>2021</year>
          , Volume
          <volume>57</volume>
          , No.
          <issue>3</issue>
          , P.
          <fpage>164</fpage>
          -
          <lpage>183</lpage>
          . https://doi.org/10.1007/s10559-021-00373-7
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>F.</given-names>
            <surname>Baader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.L.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          et al.,
          <source>The Description Logic Handbook</source>
          , Cambridge University Press,
          <year>2007</year>
          , pp
          <fpage>601</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>H.</given-names>
            <surname>Hoherchak</surname>
          </string-name>
          ,
          <article-title>Knowledge Based and Description Logics Applications to Natural Language Texts Analysis</article-title>
          ,
          <source>Proceedings of the 12th International Scientific and Practical Conference of Programming (UkrPROG 2020)</source>
          ,
          <year>2021</year>
          , Volume 2866, P.
          <fpage>259</fpage>
          -
          <lpage>269</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>D.</given-names>
            <surname>Rothman</surname>
          </string-name>
          ,
          <source>Transformers for Natural Language Processing (2nd edition)</source>
          , Packt Publishing
          ,
          <year>2021</year>
          , pp
          <fpage>384</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>
          ,
          <source>NAACL-HLT</source>
          ,
          <year>2019</year>
          , P.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>I.R.</given-names>
            <surname>Vyhovanets</surname>
          </string-name>
          ,
          <article-title>Grammar of the Ukrainian language</article-title>
          . Syntax, Kyiv: Lybid,
          <year>1993</year>
          , pp
          <fpage>365</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <source>OWL Full, OWL DL and OWL Lite</source>
          . http://www.w3.org/TR/owlquade/#Sublanguage-def.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>