<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>International Journal of Computing</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5815/ijitcs.2013.10.06</article-id>
      <title-group>
        <article-title>Analysis of Scientific Texts Metrics for Ontology Concepts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Viktor Hryhorovych</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lviv Polytechnic National University</institution>
          ,
          <addr-line>S. Bandera street, 12, Lviv, 79013</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>18</volume>
      <fpage>22</fpage>
      <lpage>23</lpage>
      <abstract>
        <p>Semantic analysis of textual information remains a relevant problem. It arises in tasks such as automated filtering, classification, and clustering of text documents, automatic abstracting of a given text, automated evaluation of answers to open test tasks, and automatic construction of a semantic network for a given text. All such tasks reduce to quantifying the elements of a text document and the relationships between them. This paper proposes a method of semantic analysis based on an inverse-additive metric, which takes into account the semantic distance between the terms of the ontology in the text document being analyzed. This metric correctly handles cases where there are several paths in the oriented graph of the ontology from one concept node to another. Semantic analysis of scientific documents is considered, as such texts have a clear structure. The concepts of the semantic distance between the terms of a scientific text and of the semantic weight of a scientific text are introduced. The semantic weight of individual fragments of a scientific text is used to solve the problem of automatic abstracting.</p>
      </abstract>
      <kwd-group>
        <kwd>semantic analysis</kwd>
        <kwd>semantic metrics</kwd>
        <kwd>ontology</kwd>
        <kwd>semantic distance</kwd>
        <kwd>semantic weight</kwd>
        <kwd>automatic abstracting</kwd>
        <kwd>text analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>At the same time, implementing the proposed approach requires overcoming certain difficulties
associated with critical nodes on the paths from one concept node to another in the oriented graph of
the ontology.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>Both works on philology and scientific articles in the field of information technology and
computational linguistics are devoted to the semantic analysis of texts.</p>
      <p>The first category includes works [1-2]. In [1] literary terms are investigated. This is a philological
study that analyzes terms related to the theory of literature, its history, processes and dramatic works.
Work [2] is a philological study that combines approaches that consider the text as a set of
communicative blocks, zones of compression and scattering, noun chains, and thematic progression.</p>
      <p>The following works concern the development and application of information technology for the
semantic analysis of texts. In [3], a method of latent semantic analysis is described, which assumes
that words close in meaning will occur in similar fragments of text. A matrix containing the number
of words per document is considered (rows represent unique words, and columns represent each
document). The documents are compared based on the cosine of the angle between two vectors (or the
scalar product between the normalizations of two vectors) formed by the corresponding two columns.
Values close to 1 represent very similar documents, while values close to 0 represent very different
documents. The work [4] describes the results of three experiments that demonstrate the use of
methods of latent semantic analysis for the study of texts: the correspondence of annotations to
annotated texts; characterization of essay quality and measurement of text coherence. In [5] the
research of scientific texts on the basis of the constructed annotated corpus of 8736 citations is
described. Six different machine learning algorithms were used, with preliminary normalization of the
data for noise removal. Work [6] describes intelligent information systems for semantic analysis, semantic
interpretation, and understanding of data, designed to support data management processes. These
processes are performed using linguistic techniques and semantic interpretation of the analyzed sets of
information/data during description and interpretation. Methods of semantic
interpretation allow extracting information from the sets of analyzed data. This improves
decision-making and the entire data and information management process. In [7] one of the
approaches and methods of semantic analysis is considered - the approach based on vocabulary. It
consists in calculating the orientation of sentiments of the whole document or set of sentences based
on the semantic orientation of vocabulary. In [8], a set-theoretic approach is proposed to describe the
double relation “M is a model of system S”. In [9] new generation systems are studied – cognitive
systems. Their feature is the semantic analysis of data. Cognitive information systems, their
definitions, discussion of perception models, classification of cognitive information systems, and
presentation of decision-making methods are discussed. Particular attention is paid to the
decision-making process in cognitive systems. In [10] the method of automated identification of metaphors in
the semantically annotated corpus of texts is described. Work [11] is the first in its field. It describes
the author's method of using NLP and semantic analysis of texts for mapping supply chains. In [12]
the developed method of semantic analysis for processing Ukrainian-language texts is described,
which allows analyzing Ukrainian-language content using the method of latent-semantic analysis (see
[13] and [14]) and morphoanalyzer Pymorphy2. The NER model was used to expand the semantic
capabilities of the developed system. [15] describes the results of studies of phenomenology and the
concept of unambiguity in linguistics when comparing the same aspects of accuracy. The theory of
semantic states and the apparatus of hyperchains in lexicographic structures were used, which made it
possible to formalize the semantics of language constructions and to distinguish between the concepts
of accuracy and unambiguity. The concept of accuracy is introduced as a definition of all lexical
meanings, semantic states, and their superpositions in which the analyzed token can function. [16]
describes the transformation of words into vectors of real numbers (word embeddings, see [17] and
[18]). Previously, no vector research had been conducted using the Word2vec technique to create a
Ukrainian word corpus. The open-source library Gensim was used to
implement machine learning with Word2vec methods in Python and to calculate the cosine similarity of
the obtained vectors. The study examines which vectors are obtained from the Ukrainian corpus and how
word vectors are grouped and associated according to the morphological features of Ukrainian
language suffixes. [19] describes the use of generative grammars in linguistic
modeling. To automate the study and synthesis of natural language texts, sentence syntax analysis is
used. The main differences in the grammatical and phonetic structure of English and Ukrainian
languages are analyzed. The optimal method of automatic processing of the text set of Ukrainian
language content in relation to essential keywords and identification of content categories, analysis of
syntax, and semantics of the text is determined. Article [20] defines the specification language for
high-level testing scenarios for testing critical systems based on the built-in Uppaal Timed Automata
model. The scalability of the method is demonstrated by the example of satellite software testing.</p>
      <p>[21] describes the web system developed by the authors to visualize the structure of ontology
data. The creation of a system for visualizing subject-area ontologies is described in detail on the
example of ontology models. The system provides tools for the dynamic display of different
types of ontographs in accordance with the established visualization criteria, which allows their use in
information systems for the operational management of objects. An example of
visualizing the concept of "computer network attack" is given. [22] describes the developed system of
production environment management, which simplifies the process of analyzing a large amount of
information from different sources. [23] describes the results of a study of the process of forming a
semantic core for a web resource. The study expands the concept of the semantic network based on
four components: URI, ontology, data, and semantic language. This concept is implemented in the
work with the help of the semantic core. The core is formed on the principle of annotation, using an
algorithm based on the semantic network and Data Mining methods. Thus, an
alternative implementation of Semantic Web components is proposed. An RDF schema is used to
represent the semantic core. The software is implemented in JavaScript using Node.js.
[24] describes an approach based on the fact that ontology is a mechanism for obtaining information
on the Internet in a more structured way using the semantic network. The focus is on choosing the
presentation of documents suitable for creating user-profiles and supporting the content-based search
process. The semantic web solves this problem, makes data understandable by machines in the form
of an ontology, and the multi-agent extracts useful knowledge hidden in this data and makes it
available. [25] describes the ontological decision support system in automated military control
systems. The core of the system is an ontology that combines three levels: 1) an ontology focused on
the domain, subject area - contains the concept of taxonomy, relationships, instances of classes, and
different types of constraints - axioms. Axioms establish semantic rules for the system of relations; 2)
task-oriented ontology - describes the solution of specific problems, contains knowledge of the
specifications of structures (databases) and methods of data processing; 3) ontology of the upper level
- describes the categories - the concept of the upper level. Examples are physical, functional, and
behavioral concepts and attitudes that relate to general scientific concepts. A decision support system
has been developed - a prototype of an automated control system for the Land Forces of the Armed
Forces of Ukraine according to the standards of NATO member countries. In [26] the authors describe
their unique technology of organizing data warehouses on the basis of consolidated data from
libraries, archives, and museums. The technology is based on multidimensional data analysis and
building a data hypercube. This technology is of interest because, in combination with the semantic
analysis of textual information, it can simplify and increase the efficiency of the social and
communication environment in general and of the information processing technologies in it. [27] describes
the developed system of automated compilation and formation of digests of electronic publications in
the media, the selection of critical content from one or more documents, and the formation of concise
reports based on them. The system monitors information, receives large amounts of data, analyzes,
organizes data using an automatic header, collects information, indexes material and stores it in a
database, solves thematic filtering, and generates digests automatically. [28] describes the developed
unified methodology for processing information resources in e-content commerce systems. A
formalized method of content analysis is used, which allows you to fully automate the process that
occurs when an author adds a new article. The method identifies articles whose topics are similar to
those viewed by the user.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methods 3.1. Inverse-additive metric for ontology concepts</title>
      <p>The inverse-additive metric [29] allows calculating the distance between ontology concepts in the
case when there are several paths from one concept to another. Consider the representation of
ontology concepts and the relationships between them in the form of an oriented graph. Then each
concept will correspond to a certain node. If the ontology is organized in the form of an explanatory
dictionary, then each term is a keyword and its interpretation; and the text of the interpretation
contains keywords - references to other terms. This is the reason for the existence of several paths
from one node to another in the oriented graph of the ontology.</p>
      <p>Define the distance R(A, B) between the concepts A and B as follows:
<disp-formula id="eq-1"><label>(1)</label><tex-math>\frac{1}{R(A, B)} = \sum_{i=1}^{n} \frac{1}{N_i},</tex-math></disp-formula>
where N<sub>i</sub> is the number of transitions from concept A to concept B on the i-th path, i = 1, …, n,
and n is the number of different paths that can be taken on the oriented graph of a particular
ontology from concept A to concept B.</p>
      <p>If there is a single path between concepts A and B, then the distance between them is equal to the
number of transitions from one concept to another. The more paths there are between concepts, the
smaller the distance will be, that is, the semantically closer the corresponding terms will be.</p>
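      <p>As a small worked check of formula (1), assume (for illustration only) that there are two paths from A to B, one with a single transition and one with two transitions; the same values are used for N1 and N2 in the first example of Section 6.1:
<disp-formula><tex-math>\frac{1}{R(A, B)} = \frac{1}{1} + \frac{1}{2} = \frac{3}{2}, \qquad R(A, B) = \frac{2}{3}.</tex-math></disp-formula>
The resulting distance is smaller than the length of the shortest path, which reflects the additional connection between the two concepts.</p>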
      <p>It is proved that this definition satisfies the axioms of a metric. Note that a pair of
complementary symmetric relationships must be introduced for the axiom of symmetry. For example,
for an explanatory dictionary ontology, it is the pair of "uses-of" and "used-in" relationships, which allows
the symmetry axiom to be met for the proposed metric in the following interpretation:
<disp-formula><tex-math>R_{\text{uses-of}}(A, B) = R_{\text{used-in}}(B, A).</tex-math></disp-formula></p>
      <p>3.2. Semantic distance between terms of a scientific text</p>
      <p>Scientific texts have a clear hierarchical structure, as shown in Fig. 1. Here:</p>
      <p>Level 1 - the name of the document (root node);</p>
      <p>Level 2 - authors, keywords, abstract, sections, list of sources used. Information content of this
level: list of authors, list of keywords, the text of the abstract, titles of sections, names of used
sources;</p>
      <p>Level 3 - sections. Information content of this level: the names of sections. Other levels are
possible.</p>
      <p>Level N - sentences. Information content of level N: words that are part of one sentence.</p>
      <p>Consider the problem of calculating the semantic distance between two terms A and B from the
ontology, which are in some text. That is, the ontology contains these terms as concepts A and B. The
distance R(A, B) between these concepts in the ontology is determined by the formula (1).</p>
      <p>The distance between two terms in a scientific text should be defined to take into account the
following: 1) the repeated occurrence of each of these terms in the scientific text; 2) the hierarchical
structure of the document; 3) the presence of many paths from each occurrence of the term A to each
occurrence of the term B; 4) the distance between the corresponding concepts of these terms in the
ontology. By analogy with formula (1), the inverse distances over all pairs of occurrences are summed:
<disp-formula><tex-math>\frac{1}{R_T(A, B)} = \sum_{i=1}^{n_A} \sum_{k=1}^{n_B} \frac{1}{R_T(A_i, B_k)},</tex-math></disp-formula>
where</p>
      <p>R<sub>T</sub>(A, B) is the semantic distance between the terms A and B from the ontology in the scientific
text;
n<sub>A</sub> is the number of occurrences of the term A in the scientific text;
n<sub>B</sub> is the number of occurrences of the term B in the scientific text;
R<sub>T</sub>(A<sub>i</sub>, B<sub>k</sub>) is the semantic distance between A<sub>i</sub>, the i-th instance of the term A, and B<sub>k</sub>, the k-th
instance of the term B from the ontology in the scientific text.</p>
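      <p>As a rough numerical illustration (the numbers are assumptions, not data from the experiment), let the ontology distance be L = 1, and let the distance between two occurrences be L increased by 0, 2, or 4 when they lie in the same sentence, in different sentences of one paragraph, or in different paragraphs, as in the implementation of Section 4.2. Suppose the term A occurs once and the term B twice, once in the same sentence as A and once in another paragraph. Writing R<sub>T</sub> for the distance in the text and summing the inverse distances over both pairs of occurrences gives
<disp-formula><tex-math>\frac{1}{R_T(A, B)} = \frac{1}{1 + 0} + \frac{1}{1 + 4} = \frac{6}{5}, \qquad R_T(A, B) = \frac{5}{6}.</tex-math></disp-formula>
With only the first pair the distance would be 1, so the additional occurrence of B reduces the distance between the terms.</p>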
      <p>It is necessary to consider in what places of the text there are the specified terms.</p>
      <p>Consider the example shown in Fig. 2. Suppose that in the ontology used to evaluate a scientific
text, the distance between concepts A and B is L.</p>
      <p>[Fig. 2: example of occurrences of the terms A and B in a scientific text]</p>
      <p>[Fig. 1: hierarchical structure of a scientific text, with the levels: paper title; authors, keywords, abstract, titles of sections, reference list; titles of subsections; sentences; words in the same sentence]</p>
      <sec id="sec-3-5">
        <title>Words in the same sentence</title>
        <p>Thus, the more occurrences of ontology terms in a scientific text, the smaller the semantic distance
between them.</p>
        <p>3.3. Semantic weight and semantic size of a scientific text</p>
        <p>For the semantic analysis of scientific texts, it is necessary to define the concept of semantic
weight. The semantic weight SW of a scientific text relative to a certain ontology is the sum of the
inverse distances between all terms of the ontology in the scientific text:
<disp-formula id="eq-5"><label>(5)</label><tex-math>SW = \sum_{A, B} \frac{1}{R_T(A, B)},</tex-math></disp-formula>
where R<sub>T</sub>(A, B) is the semantic distance between the terms A and B from the ontology in a scientific text (4).</p>
        <p>Thus, the more terms there are in a scientific text (or its fragment) and the smaller the semantic
distance between them - the greater the semantic weight of this text (fragment).</p>
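        <p>For illustration, assume a fragment that contains three ontology terms A, B, and C whose pairwise semantic distances in the text are 1/2, 1, and 2 (hypothetical values). Summing the inverse distances, as in the definition of the semantic weight, gives
<disp-formula><tex-math>SW = \frac{1}{1/2} + \frac{1}{1} + \frac{1}{2} = 2 + 1 + \frac{1}{2} = \frac{7}{2}.</tex-math></disp-formula>
If all three distances were doubled, the weight would drop to 7/4, so closely related terms contribute most to the weight of a fragment.</p>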
        <p>The inverse value can be considered as the semantic size of a scientific text (then the semantic size
will have the same dimension as the semantic distance).</p>
        <p>3.4. Automatic abstracting of a scientific text based on the semantic weight of sentences</p>
        <p>Semantic analysis of scientific texts based on semantic weight (5) will solve the problem of
automatic abstracting based on the selection of sentences (paragraphs) with the highest semantic
weight. To do this, first define the sentence s<sub>max</sub> with the largest semantic weight:
<disp-formula id="eq-6"><label>(6)</label><tex-math>s_{\max} = \arg\max_{s} \{ SW(s) \}, \qquad SW(s) = \sum_{A_i \in s,\; B_k \in s} \frac{1}{R_T(A_i, B_k)},</tex-math></disp-formula>
where R<sub>T</sub>(A<sub>i</sub>, B<sub>k</sub>) is the semantic distance between the terms A<sub>i</sub> and B<sub>k</sub> from the ontology in the
sentence s of the scientific text (4), and SW(s) is the semantic weight of the sentence s.</p>
        <p>Next, we determine the degree of compression δ in automatic abstracting: the result S<sub>δ</sub> will
include those sentences whose semantic weight differs from the maximum by no more than δ:
<disp-formula id="eq-7"><label>(7)</label><tex-math>S_{\delta} = \{ s \mid SW(s) \geq (1 - \delta) \, SW(s_{\max}) \},</tex-math></disp-formula>
where S<sub>δ</sub> is the set of sentences that form the result of automatic abstracting, and
SW(s<sub>max</sub>) is the maximum semantic weight of a sentence in the scientific text.</p>
        <p>Thus, automatic abstracting will be implemented as a selection of sentences and compression of
the scientific text by a factor of 1/δ relative to the initial number of sentences, based on their semantic
weight.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiment 4.1. Calculation of the distance between the concepts of ontology</title>
      <p>The algorithm for calculating the distance between two concepts of ontology is based on the
calculation of the maximum flow between two given nodes of an oriented graph (Ford-Fulkerson
method [30]).</p>
      <p>Consider the ontology of computer science terms. The fragment of the corresponding owl file has
the form shown in Fig. 3. Parsing the owl ontology file will reveal the connections between the
concepts.</p>
      <p>The Computer Science Ontology is an explanatory dictionary that contains a set of terms, each
term being a &lt;keyword, definition&gt; pair. The Individuals section of the ontology is of primary
interest because it contains the definitions of terms. The definition of each
term-concept begins with the tag owl:NamedIndividual; its identifier is contained in the rdf:about
attribute. A subsection that begins with the keyword tag contains the keyword of the term, and a
subsection that begins with the definition tag contains its definition. Concepts related to the current
term by the "uses-of" relationship are listed in subsections that correspond to UsesOf tags, and their
identifiers are the value of the rdf:resource attribute. Concepts related to the current term by the
"used-in" relationship are listed in the subsections that correspond to the UsedIn tags, and their
identifiers are also the value of the rdf:resource attribute. Thus, the "uses-of" relationship between
the terms of the ontology can be schematically depicted as follows (Fig. 4):</p>
      <p>[Fig. 4: owl:NamedIndividual elements #A, #B, #C, and #D with their UsesOf references: #A uses #B and #C; #B uses #D; #C uses #D, #E, and #F; #D uses #E and #F]</p>
      <p>As you can see, the ontology owl file represents related terms as adjacency lists of the vertices of the
oriented graph (Fig. 5):</p>
      <p>[Fig. 5: adjacency lists A: B, C; B: D; C: D, E, F; D: E, F, and the corresponding oriented graph on the nodes A, B, C, D, E, F]</p>
      <p>To calculate the distance between two term-concepts of the ontology, one must find all the paths from
one concept to another. To do this, we solve the flow problem in the oriented ontology graph from
the source node corresponding to the first concept to the sink node corresponding to the second
concept.</p>
      <p>To simplify the calculations, we discard all paths longer than 4 transitions. Such paths
are not essential for calculating the semantic size of the text.</p>
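      <p>The path search with the four-transition limit can be sketched in memory as follows. This is only an illustration, not the Neo4j-based implementation described in this section: the graph is a hypothetical "uses-of" graph modeled on the schematic example of Fig. 4, and the names InverseAdditiveDemo, CollectPathLengths, and Distance are assumptions introduced for the sketch. It enumerates simple paths by depth-first search and applies formula (1) to their lengths.</p>
      <preformat>using System;
using System.Collections.Generic;

static class InverseAdditiveDemo
{
    // Hypothetical "uses-of" graph; the edges are an assumption made for this sketch.
    static readonly Dictionary&lt;string, string[]&gt; Graph = new Dictionary&lt;string, string[]&gt;
    {
        ["A"] = new[] { "B", "C" },
        ["B"] = new[] { "D" },
        ["C"] = new[] { "D", "E", "F" },
        ["D"] = new[] { "E", "F" },
    };

    // Collects the lengths of all simple paths from 'node' to 'target'
    // that use at most 'maxTransitions' transitions (depth-first search).
    static void CollectPathLengths(string node, string target, int depth,
        int maxTransitions, HashSet&lt;string&gt; visited, List&lt;int&gt; lengths)
    {
        if (node == target) { lengths.Add(depth); return; }
        if (depth == maxTransitions) return;
        if (!Graph.TryGetValue(node, out var next)) return;
        visited.Add(node);
        foreach (var n in next)
        {
            if (!visited.Contains(n))
                CollectPathLengths(n, target, depth + 1, maxTransitions, visited, lengths);
        }
        visited.Remove(node);
    }

    // Formula (1): 1 / R = sum over all paths of 1 / N_i.
    static decimal Distance(string from, string to, int maxTransitions = 4)
    {
        if (from == to) return 0m;
        var lengths = new List&lt;int&gt;();
        CollectPathLengths(from, to, 0, maxTransitions, new HashSet&lt;string&gt;(), lengths);
        if (lengths.Count == 0)
            throw new InvalidOperationException("No path within the transition limit.");
        var inverse = 0m;
        foreach (var n in lengths) inverse += 1m / n;
        return 1m / inverse;
    }

    static void Main()
    {
        // Paths A-C-F (2), A-B-D-F (3), A-C-D-F (3): 1/R = 1/2 + 1/3 + 1/3 = 7/6,
        // so the printed distance is approximately 0.857 (6/7).
        Console.WriteLine(Distance("A", "F"));
    }
}</preformat>
      <p>In this sketch the lengths are accumulated in decimal arithmetic, so a path of length 2 contributes exactly 0.5 rather than being truncated by integer division.</p>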
      <p>The algorithm is implemented in C#; a Neo4j database is used to store intermediate
results.</p>
      <p>The SDIO (Semantic Distance In Ontology) function gets the IDs of two terms:</p>
      <preformat>public async Task&lt;decimal&gt; SDIO(long from, long to)</preformat>
      <p>We need to get the number of paths from one term to another in the ontology for each number
of transitions (from one to four):</p>
      <preformat>// declaration assumed here: key is the number of transitions,
// value is the number of paths with that many transitions
var dictionary = new Dictionary&lt;int, long&gt;();
for (int i = 0; i &lt; 4; i++)
{
    var countResult = await session.RunAsync(
        _transactions.GetCountOfTransitions(from, to, i)
    );
    var record = await countResult.SingleAsync();
    var count = (long)(record.Values.Single().Value);
    dictionary.Add(i + 1, count);
}</preformat>
      <p>We write the results to the dictionary, with the number of transitions as the key.
After obtaining the number of paths between the terms, we calculate their distance in the ontology:</p>
      <preformat>var N = 0m;
foreach (var item in dictionary.Keys)
{
    // cast to decimal so that the division is not an integer division
    N += (decimal)dictionary[item] / item;
}
var L = 1m / N;</preformat>
      <p>L is the result of the calculations and the return value of the function.</p>
      <p>For optimization, the results of calculating the distance between a pair of terms can be saved,
because the distance between the terms of the ontology does not change often (only when the
ontology is edited). At the beginning of the algorithm, we check whether the distance has already been
calculated for this pair of terms; if so, we return the stored result without re-execution:</p>
      <preformat>var sdio = SDIOTerms
    .Where(v =&gt; (v.From == from &amp;&amp; v.To == to)
        || (v.From == to &amp;&amp; v.To == from))
    .FirstOrDefault();
if (sdio != null)
{
    return sdio.L;
}
// ... after the distance L has been calculated, it is stored:
SDIOTerms.Add(new SDIOTerm
{
    L = L,
    To = to,
    From = from
});</preformat>
      <p>4.2. Calculating the distance between terms in a scientific text</p>
      <p>The calculation of the distance between two terms from the ontology in the scientific text will be
based on the calculation of the sum of pairwise distances between each occurrence of these terms in
the scientific text.</p>
      <p>The SDIT (Semantic Distance In the Text) function gets the two terms between which the distance
is to be found and a structured text (an object in which paragraphs and sentences are clearly
separated, together with the terms used in them). Terms are passed by their IDs.</p>
      <preformat>public async Task&lt;decimal&gt; SDIT(long from, long to, StructuredText text)</preformat>
      <p>Text structuring is implemented using regular expressions.</p>
      <p>We obtain the semantic distance between these terms in the ontology:</p>
      <preformat>var L = await SDIO(from, to);</preformat>
      <p>We record the results in a dictionary whose keys are the numbers of
transitions between terms in the text:</p>
      <preformat>var Ns = new Dictionary&lt;long, long&gt;()
{
    { 0, 0 }, // pairs of occurrences in the same sentence
    { 2, 0 }, // pairs in different sentences of one paragraph
    { 4, 0 }, // pairs in different paragraphs
};</preformat>
      <p>Implementation of the calculation of the number of transitions between terms in the text:</p>
      <preformat>var textParagraphs = new List&lt;StructuredParagraph&gt;();
textParagraphs.AddRange(text.Paragraphs);
foreach (var paragraph in text.Paragraphs)
{
    var paragraphSentences = new List&lt;StructuredSentence&gt;();
    paragraphSentences.AddRange(paragraph.Sentences);
    foreach (var sentence in paragraph.Sentences)
    {
        if (sentence.Terms.Any(v =&gt; v.Id == from)
            &amp;&amp; sentence.Terms.Any(v =&gt; v.Id == to))
        {
            Ns[0] += sentence.Terms.Count(v =&gt; v.Id == from)
                * sentence.Terms.Count(v =&gt; v.Id == to);
        }
        // pair the current sentence only with the sentences not yet processed
        paragraphSentences.Remove(sentence);
        var sentencesTerms = paragraphSentences
            .SelectMany(v =&gt; v.Terms);
        Ns[2] += sentence.Terms.Count(v =&gt; v.Id == from)
            * sentencesTerms.Count(v =&gt; v.Id == to)
            + sentence.Terms.Count(v =&gt; v.Id == to)
            * sentencesTerms.Count(v =&gt; v.Id == from);
    }
    // pair the current paragraph only with the paragraphs not yet processed
    textParagraphs.Remove(paragraph);
    var paragraphsTerms = textParagraphs
        .SelectMany(v =&gt; v.Terms);
    Ns[4] += paragraph.Terms.Count(v =&gt; v.Id == from)
        * paragraphsTerms.Count(v =&gt; v.Id == to)
        + paragraph.Terms.Count(v =&gt; v.Id == to)
        * paragraphsTerms.Count(v =&gt; v.Id == from);
}</preformat>
      <p>Now let us calculate the semantic distance:</p>
      <preformat>// sum of the inverse distances over all pairs of occurrences
var inverse = ((Ns[0] * 1m) / (L + 0m))
    + ((Ns[2] * 1m) / (L + 2m))
    + ((Ns[4] * 1m) / (L + 4m));
// for an inverse-additive metric the distance is the reciprocal of this sum
var Re = 1m / inverse;</preformat>
      <p>Re is the result of the function: the semantic distance between the terms in the text.</p>
      <p>4.3. Calculation of the semantic weight of a scientific text in relation to ontology</p>
      <p>Semantic weight is the sum of the inverse distances between all ontology terms in a scientific text.
The SWOT (Semantic Weight Of the Text) function gets a structured text:</p>
      <preformat>public async Task&lt;decimal&gt; SWOT(StructuredText text)</preformat>
      <p>The implementation is quite simple, as all the necessary algorithms have already been implemented.
It remains to find the semantic distances between each pair of terms in the text and sum up
the inverse values.</p>
      <p>First, we get all the terms:</p>
      <preformat>var terms = text.Terms.Distinct(new TermComparer()).ToList();
var otherTerms = terms.ToList();

public class TermComparer : IEqualityComparer&lt;Term&gt;
{
    public bool Equals([AllowNull] Term x, [AllowNull] Term y)
    {
        // two nulls are equal; a null never equals a term
        if (x is null || y is null) return ReferenceEquals(x, y);
        return x.Id == y.Id;
    }

    public int GetHashCode([DisallowNull] Term obj)
    {
        // hash by Id so that terms with equal Ids are treated as duplicates
        return obj.Id.GetHashCode();
    }
}</preformat>
      <p>Implementation of the semantic weight calculation algorithm:</p>
      <preformat>var SW = 0m;
foreach (var from in terms)
{
    // each unordered pair of terms is processed once
    otherTerms.Remove(from);
    foreach (var to in otherTerms)
    {
        SW += 1m / (await SDIT(from.Id, to.Id, text));
    }
}
return SW;</preformat>
      <p>4.4. Automatic abstracting of scientific texts</p>
      <p>These algorithms can be used to solve the problem of automatic abstracting of texts. To do this,
select fragments of text (sentences, paragraphs) with the greatest semantic weight.</p>
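      <p>The selection step can be sketched as follows, assuming that the semantic weight of every sentence has already been computed and that a sentence is kept when its weight differs from the maximum by no more than the chosen degree of compression (the rule of Section 3.4). The names AbstractingDemo and SelectSentences, the parameter delta, and the sample weights are hypothetical and serve only as an illustration.</p>
      <preformat>using System;
using System.Collections.Generic;
using System.Linq;

static class AbstractingDemo
{
    // Keeps the sentences that satisfy SW(s) &gt;= (1 - delta) * SW(s_max),
    // preserving the original order of the sentences.
    static List&lt;string&gt; SelectSentences(
        IReadOnlyList&lt;(string Text, decimal Weight)&gt; sentences, decimal delta)
    {
        if (sentences.Count == 0) return new List&lt;string&gt;();
        var maxWeight = sentences.Max(s =&gt; s.Weight);
        var threshold = (1m - delta) * maxWeight;
        return sentences
            .Where(s =&gt; s.Weight &gt;= threshold)
            .Select(s =&gt; s.Text)
            .ToList();
    }

    static void Main()
    {
        // Hypothetical weights; in the program they would be semantic weights of sentences.
        var sentences = new List&lt;(string Text, decimal Weight)&gt;
        {
            ("Sentence 1", 10m),
            ("Sentence 2", 4m),
            ("Sentence 3", 8m),
        };
        // threshold = (1 - 0.25) * 10 = 7.5, so sentences 1 and 3 are kept
        foreach (var s in SelectSentences(sentences, 0.25m))
            Console.WriteLine(s);
    }
}</preformat>
      <p>A smaller value of delta keeps fewer sentences; with delta equal to 0 only the sentences of maximum weight remain.</p>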
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>The developed program was verified on the example of an ontology formed on the basis of an
explanatory dictionary of computer science [31]. Since the program has a Ukrainian-language
interface, English translations of the labels on the controls are provided as notes.</p>
      <p>The text from Wikipedia, the article "Computer programming" (Fig. 6) was used for testing:</p>
      <p>[Fig. 6: program window with the tested text; control labels: "Text analysis", "Text", "Show terms", "Paragraph analysis", "Sentence analysis"]</p>
      <sec id="sec-5-6">
        <title>Text analysis</title>
        <p>The result, 30394.102, is a very large number. This means that the text indeed belongs to the field
of "Computer Science" (Fig. 7).</p>
        <p>[Fig. 7: result window; labels: "Text", "Paragraphs", "Sentences"]</p>
      </sec>
      <sec id="sec-5-10">
        <title>The semantic size of the text in relation to the ontology of computer science</title>
        <p>The results of the analysis of paragraphs and sentences are shown in Fig. 8 and Fig. 9.</p>
      </sec>
      <sec id="sec-5-11">
        <title>The semantic size of the paragraph in relation to the ontology of computer science</title>
      </sec>
      <sec id="sec-5-13">
        <title>The semantic size of the sentence in relation to the ontology of computer science</title>
      </sec>
      <sec id="sec-5-17">
        <title>The semantic size of the sentence in relation to the ontology of computer science</title>
        <p>For comparison, we test a text that is not related to computer science – an article about coffee (Fig.
10).</p>
        <p>The result is 1.999, a very small number, which means that the analyzed text does not belong to
the field of the ontology used.</p>
      </sec>
      <sec id="sec-5-18">
        <title>The semantic size of the text in relation to the ontology of computer science</title>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussions</title>
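      <p>The implementation of these algorithms should take into account cases where the ontology graph
contains "critical nodes", which are the intersections of different paths from one term-concept to
another.</p>
      <p>6.1. Calculation of the distance between the concepts of ontology in the presence of critical nodes</p>
      <p>Consider the case where there is a "crossroads" of paths leading from one concept of ontology to
another, that is, a "crossroads" of paths between nodes of the oriented ontology graph.</p>
      <p>Assume that the distance between nodes A and E along the path that passes through node B is N1;
the distance between A and E through nodes C and D is equal to N2; the distance between E and I
through nodes F, G is equal to N3; the distance between E and I through H is N4 (Fig. 11).
Here, node E is critical because it is the "crossroads" of the A-B-E-H-I and A-C-D-E-F-G-I paths.</p>
      <p>[Fig. 11: oriented graph with the paths A-B-E (N1), A-C-D-E (N2), E-F-G-I (N3), and E-H-I (N4)]</p>
      <p>If the critical node E is taken into account, the distance is calculated in two stages:
<disp-formula id="eq-9"><label>(9)</label><tex-math>R(A, I) = R(A, E) + R(E, I), \qquad \frac{1}{R(A, E)} = \frac{1}{N_1} + \frac{1}{N_2}, \qquad \frac{1}{R(E, I)} = \frac{1}{N_3} + \frac{1}{N_4}.</tex-math></disp-formula>
If the critical node is ignored and formula (1) is applied to the four paths from A to I, then
<disp-formula id="eq-10"><label>(10)</label><tex-math>\frac{1}{R(A, I)} = \frac{1}{N_1 + N_3} + \frac{1}{N_1 + N_4} + \frac{1}{N_2 + N_3} + \frac{1}{N_2 + N_4}.</tex-math></disp-formula></p>
      <p>Consider the case when N1=1, N2=2, N3=2, N4=1.</p>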
      <p>According to formula (9), R(A, I) = R(A, E) + R(E, I), where 1/R(A, E) = 1 + 1/2 = 3/2, so
R(A, E) = 2/3 = R(E, I). Hence R(A, I) = 4/3.</p>
      <p>According to formula (10), 1/R(A, I) = 1/2 + 2/3 + 1/4 = 6/12 + 8/12 + 3/12 = 17/12, so
R(A, I) = 12/17 &lt; 4/3: the values calculated by formulas (9) and (10) differ.</p>
      <p>For the case shown in Fig. 3, N1=2, N2=3, N3=3, N4=2.</p>
      <p>According to formula (9), 1/R(A, E) = 1/2 + 1/3 = 5/6 and 1/R(E, I) = 1/3 + 1/2 = 5/6, so
R(A, E) = R(E, I) = 6/5. Hence R(A, I) = 12/5 = 60/25.</p>
      <p>According to formula (10), 1/R(A, I) = 1/4 + 2/5 + 1/6 = 15/60 + 24/60 + 10/60 = 49/60, so
R(A, I) = 60/49 &lt; 60/25: again the values do not match.</p>
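      <p>The arithmetic of formulas (9) and (10) can be checked with exact fractions. The following is a minimal sketch in Python, not the author's implementation: the function names are illustrative, and the inverse-additive combination of parallel paths (the reciprocal of the distance is the sum of the reciprocals of the path lengths) is assumed from the worked example above.</p>
      <preformat>
```python
from fractions import Fraction


def inverse_additive(lengths):
    """Combine parallel path lengths N_i so that 1/R equals the sum of 1/N_i."""
    return 1 / sum(Fraction(1, n) for n in lengths)


def distance_via_critical_node(n1, n2, n3, n4):
    """Formula (9): add the distances of the two segments that meet at E."""
    return inverse_additive([n1, n2]) + inverse_additive([n3, n4])


def distance_ignoring_critical_node(n1, n2, n3, n4):
    """Formula (10): combine the four end-to-end paths from A to I directly."""
    return inverse_additive([n1 + n3, n1 + n4, n2 + n3, n2 + n4])


# First case from the text: N1=1, N2=2, N3=2, N4=1.
r9 = distance_via_critical_node(1, 2, 2, 1)
r10 = distance_ignoring_critical_node(1, 2, 2, 1)
print(r9, r10)  # 4/3 12/17
```
</preformat>
      <p>Exact rational arithmetic is used here so that the two results can be compared without rounding error; the same functions accept any positive integer path lengths.</p>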
      <p>Thus, when critical "crossroads" nodes are present, formula (1) cannot be used to calculate
the distance between the concepts of the ontology in a way that ignores such nodes. An
algorithm for detecting "crossroads" nodes is therefore needed.</p>
    </sec>
    <sec id="sec-7">
      <title>6.2. Detection of critical nodes</title>
      <p>A critical node is a node that by itself forms a minimum cut of the graph, i.e., the
minimum cut has size 1. The problem of finding a minimum cut of a graph is dual to
the maximum-flow problem [30].</p>
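      <p>For a small ontology graph, such nodes can also be found by brute force. The sketch below is an illustration under stated assumptions, not the max-flow method of [30]: instead of computing a minimum cut, it removes each intermediate node in turn and uses breadth-first search to check whether the target concept is still reachable. The adjacency-dictionary representation and the names reachable and critical_nodes are hypothetical.</p>
      <preformat>
```python
from collections import deque


def reachable(graph, source, target, removed=None):
    """Breadth-first search from source to target that never enters `removed`."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for successor in graph.get(node, ()):
            if successor != removed and successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return False


def critical_nodes(graph, source, target):
    """Intermediate nodes whose removal cuts every path from source to target."""
    if not reachable(graph, source, target):
        return []  # no path at all, so no node can be a "crossroads"
    nodes = set(graph) | {n for successors in graph.values() for n in successors}
    return sorted(
        n for n in nodes - {source, target}
        if not reachable(graph, source, target, removed=n)
    )


# Oriented graph with the two paths A-B-E-H-I and A-C-D-E-F-G-I.
example = {
    "A": ["B", "C"], "B": ["E"], "C": ["D"], "D": ["E"],
    "E": ["F", "H"], "F": ["G"], "G": ["I"], "H": ["I"],
}
print(critical_nodes(example, "A", "I"))  # ['E']
```
</preformat>
      <p>The sketch runs one search per node, which is adequate for small graphs; for large ontologies a flow-based or dominator-based algorithm would likely be preferable.</p>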
    </sec>
    <sec id="sec-8">
      <title>7. Conclusions</title>
      <p>This paper proposes a method of semantic analysis based on an inverse-additive metric that takes
into account the semantic distance between the terms of the ontology in the text document being
analyzed. This metric correctly handles cases where the oriented graph of the ontology contains
several paths from one concept node to another.</p>
      <p>Semantic analysis of scientific documents is considered, as such texts have a clear structure. The
concepts of the semantic distance between the terms of a scientific text and of the semantic weight of a
scientific text are introduced. The semantic weight of individual fragments of a scientific text can be
used to solve the problem of automatic abstracting.</p>
      <p>Some difficulties in implementing the proposed approach are also discussed, namely those caused by
critical nodes on the paths from one concept node to another in the oriented ontology graph.</p>
    </sec>
    <sec id="sec-9">
      <title>8. References</title>
      <p>[2] N. Panasenko, Semantic Structure of Literary Text, Zeszyty Naukowe Uniwersytetu Rzeszowskiego. Seria Filologiczna. Studia Anglica Resoviensia 10 (2021) 38–50.</p>
      <p>[3] S. T. Dumais, Latent Semantic Analysis, Annual Review of Information Science and Technology 38 (2005) 188–230. doi:10.1002/aris.1440380105.</p>
      <p>[4] P. Foltz, Latent Semantic Analysis for Text-Based Research, Behavior Research Methods, Instruments, &amp; Computers 28 (2) (1996) 197–202. doi:10.3758/BF03204765.</p>
      <p>[5] H. Raza, M. Faizan, A. Hamza, A. Mushtaq, N. Akhtar, Scientific Text Sentiment Analysis using Machine Learning Techniques, International Journal of Advanced Computer Science and Applications (IJACSA) 10 (12) (2019). URL: http://dx.doi.org/10.14569/IJACSA.2019.0101222.</p>
      <p>[6] L. Ogiela, Intelligent Cognitive Information Systems in Management Applications, Cognitive Information Systems in Management Sciences (2017) 79–122. doi:10.1016/B978-0-12-803803-1.00006-9.</p>
      <p>[7] N. Gupta, R. Agrawal, Application and Techniques of Opinion Mining, Hybrid Computational Intelligence (2020) 1–23. doi:10.1016/B978-0-12-818699-2.00001-9.</p>
      <p>[8] W. Hodges, Functional Modelling and Mathematical Models: A Semantic Analysis, Philosophy of Technology and Engineering Sciences (2009) 665–692. doi:10.1016/B978-0-444-51667-1.50029-X.</p>
      <p>[9] L. Ogiela, M. R. Ogiela, Cognitive Information Systems, Advances in Cognitive Information Systems, Cognitive Systems Monographs, vol. 17, Springer, Berlin, Heidelberg, 2012, pp. 51–60. URL: https://doi.org/10.1007/978-3-642-25246-4_3.</p>
      <p>[10] O. Levchenko, O. Tyshchenko, M. Dilai, Automated Identification of Metaphors in Annotated Corpus (Based on Substance Terms), in: Proceedings of the 5th International conference on computational linguistics and intelligent systems (COLINS 2021), Vol. I: main conference, Kharkiv, Ukraine, April 22-23, 2021, pp. 16–31.</p>
      <p>[11] H. Schöpper, W. Kersten, Using Natural Language Processing for Supply Chain Mapping: A Systematic Review of Current Approaches, in: Proceedings of the 5th International conference on computational linguistics and intelligent systems (COLINS 2021), Vol. I: main conference, Kharkiv, Ukraine, April 22-23, 2021, pp. 71–86.</p>
      <p>[12] N. Kunanets, Y. Oliinyk, D. Myhal, K. Shunevych, A. Rzheuskyi, Y. Shcherbyna, Enhanced LSA Method with Ukraine Language Support, in: Proceedings of the 5th International conference on computational linguistics and intelligent systems (COLINS 2021), Vol. I: main conference, Kharkiv, Ukraine, April 22-23, 2021, pp. 129–140.</p>
      <p>[13] K. Jindal, R. Aron, A systematic study of sentiment analysis for social media data, Materials Today (2021). URL: https://www.sciencedirect.com/science/article/pii/S2214785321000705.</p>
      <p>[14] B. Ozyurt, M. Ali Akcayol, A new topic modeling based approach for aspect extraction in aspect based sentiment analysis: SS-LDA, Expert Systems with Applications. URL: https://www.sciencedirect.com/science/article/pii/S0957417420309519.</p>
      <p>[15] V. Shyrokov, “Accuracy” vs “Unambiguity” in Linguistics, in: Proceedings of the 5th International conference on computational linguistics and intelligent systems (COLINS 2021), Vol. I: main conference, Kharkiv, Ukraine, April 22-23, 2021, pp. 1–5.</p>
      <p>[16] L. Savytska, N. Vnukova, I. Bezugla, V. Pyvovarov, M. Turgut Sübay, Using Word2vec Technique to Determine Semantic and Morphologic Similarity in Embedded Words of the Ukrainian Language, in: Proceedings of the 5th International conference on computational linguistics and intelligent systems (COLINS 2021), Vol. I: main conference, Kharkiv, Ukraine, April 22-23, 2021, pp. 235–248.</p>
      <p>[17] T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient Estimation of Word Representations in Vector Space, in: Proceedings of Workshop at ICLR 2013, Computation and Language, Scottsdale, Arizona, USA, 2013. arXiv:1301.3781v3.</p>
      <p>[18] R. Lebret, R. Collobert, Word Embeddings through Hellinger PCA, in: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, Gothenburg, Sweden, 2014, pp. 482–490. doi:10.3115/v1/E14-1051.</p>
      <p>[19] V. Vysotska, S. Holoshchuk, R. Holoshchuk, A Comparative Analysis for English and Ukrainian Texts Processing Based on Semantics and Syntax Approach, in: Proceedings of the 5th</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Saidova</surname>
          </string-name>
          ,
          <article-title>Semantic Analysis of Literary Terms by Literary Types in “The Concise Oxford Dictionary of Literature Terms”</article-title>
          , Philology Matters: Vol.
          <year>2021</year>
          ,
          <source>Iss. 1, Article</source>
          <volume>11</volume>
          (
          <year>2021</year>
          )
          <fpage>118</fpage>
          -
          <lpage>138</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>doi:10</source>
          .36078/987654486.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>