<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Determining the Directions of Links in Undirected Networks of Terms</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute for Information Recording of NAS of Ukraine</institution>
          ,
          <addr-line>Kyiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>National Technical University “Igor Sikorsky Kyiv Polytechnic Institute”</institution>
          ,
          <addr-line>Kyiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Scientific Research Institute for Informatics and Law of National Academy of Legal Sciences of Ukraine</institution>
          ,
          <addr-line>Kyiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <fpage>132</fpage>
      <lpage>145</lpage>
      <abstract>
        <p>This paper examines approaches for constructing networks of terms as an ontological model of a subject domain. In particular, new approaches and rules are proposed and studied for determining the syntactic and semantic links between terms in a text and the directions of these links between nodes in undirected networks of terms constructed from the terms of a thematic text corpus. One of the methods for creating terminological ontologies, the algorithm for building thematic networks of natural hierarchies of terms based on the analysis of text corpora, is also considered and used to build a directed network of words and phrases (separate unigrams, bigrams and trigrams). The well-known fairy tale “The story of Little Red Riding Hood” is used as an example to demonstrate the accuracy of the proposed rules. The Python programming language and selected functions of its NLTK module (Natural Language Toolkit, an open-source library) are used to implement the proposed and considered approaches and methods. The resulting directed networks of terms were visualized with Gephi, a software package for modelling and visualizing graphs. The proposed approach can be used for automatically creating terminological ontologies of subject domains with the participation of experts. The research results can also be used to create personal search interfaces for users of information retrieval systems and in database navigation systems, helping users of such systems to search for relevant information more easily.</p>
      </abstract>
      <kwd-group>
        <kwd>Subject Domain</kwd>
        <kwd>Terminological Ontology</kwd>
        <kwd>Network of Terms</kwd>
        <kwd>Horizontal Visibility Graph</kwd>
        <kwd>Network of Natural Hierarchies of Terms</kwd>
        <kwd>Syntax and Semantic Links</kwd>
        <kwd>Undirected Network</kwd>
        <kwd>Directed Network</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Formulation of the Problem</title>
      <p>methodologies and techniques of computerized text processing and analysis.
Modern software is increasingly in need of ready-made solutions to improve its systems.</p>
      <p>
        It is very important to formalize the knowledge of a subject domain while
studying it. The process of representing, formally naming and defining the
categories, properties and relations between the concepts, data and entities
is known as ontology modeling of the subject domain. A network of terms can be
considered a model of a subject domain. In such a network, nodes correspond
to individual words and phrases in the text, and edges correspond to the links between them.
Creating an ontology is usually very complex and resource-intensive, and
it remains an unsolved scientific and practical problem [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. A separate step
in this formalization is to identify the basic objects. When building networks of
terms, this step includes creating dictionaries, thesauruses and subject dictionaries
of terms based on the text corpus. The effective selection of individual
terms from the text corpus, and the automation of such selection, is still an open
and important problem [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ].
      </p>
      <p>Due to the complexity of natural language, determining the syntactic and
semantic links between nodes that correspond to the terms in the text, and
determining the directions of these links, is an equally complex and open problem of
conceptualization.</p>
      <p>The purpose of this work is to propose and present new approaches for determining
the directions of links between nodes in undirected networks of terms built from words
and phrases (separate unigrams, bigrams and trigrams) of a thematic text corpus.</p>
    </sec>
    <sec id="sec-2">
      <title>Method for Building Undirected Networks of Terms</title>
      <p>
        There are several approaches for transforming texts into a network of terms and
different ways to interpret nodes and connections [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]. This leads to different kinds of
representation of these networks [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>In this work, the compactified horizontal visibility graph (CHVG) algorithm is
used to create terminological ontologies of subject domains from key terms (separate
unigrams, bigrams and trigrams).</p>
      <sec id="sec-2-1">
        <title>Compactified Horizontal Visibility Graph (CHVG) Algorithm</title>
        <p>
          The horizontal visibility graph (HVG) algorithm [
          <xref ref-type="bibr" rid="ref7 ref8 ref9">7, 8, 9</xref>
          ] is a modification of the general
visibility graph algorithm [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <p>
          In the work [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], the following steps are proposed to build undirected networks of terms
using the HVG algorithm. In the first stage, a sequence of nodes is marked on the
horizontal axis, each node corresponding to a term in the order in which the terms occur
in the text; the weights, numerical estimates xi intended to reflect how
important a term is to a document in a collection or corpus, are marked on the vertical
axis. In the second stage, the horizontal visibility graph is built.
        </p>
        <p>Two nodes ti and tj, corresponding to the elements xi and xj of the time series, are
connected in the HVG if and only if xk &lt; min(xi, xj) for all tk such that ti &lt; tk &lt; tj.</p>
        <p>In the third stage, the network obtained in the previous stages is compactified:
the nodes that correspond to the same term are combined into a single node. The
resulting undirected network of terms is called the compactified horizontal visibility graph
(CHVG) (see fig. 1).</p>
        <p>
          Fig. 1. Stages of building the compactified horizontal visibility graph [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
        <p>Thus, the CHVG algorithm allows building an undirected network of terms
when numerical values are assigned to the individual words or phrases (separate
unigrams, bigrams and trigrams) of a thematic text corpus.</p>
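        <p>The three stages described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function names, the toy terms and the weights are invented for illustration, and dropping the self-loops that appear when equal terms are merged is an assumption, since the text does not say how they are handled.</p>
        <preformat>
```python
from itertools import combinations


def horizontal_visibility_edges(weights):
    """Return index pairs (i, j) that see each other horizontally.

    Positions i and j (j after i) are linked when every weight strictly
    between them is smaller than min(weights[i], weights[j]).
    """
    edges = []
    for i, j in combinations(range(len(weights)), 2):
        bar = min(weights[i], weights[j])
        if all(bar > weights[k] for k in range(i + 1, j)):
            edges.append((i, j))
    return edges


def compactify(terms, edges):
    """Merge positions that carry the same term into a single node."""
    merged = set()
    for i, j in edges:
        # Assumption: a link between two occurrences of one term is dropped.
        if terms[i] != terms[j]:
            merged.add(frozenset((terms[i], terms[j])))
    return merged


# Toy term sequence with made-up weights (illustrative values only).
terms = ["wolf", "red", "hood", "wolf", "grandmother"]
weights = [3.0, 1.0, 2.0, 3.0, 2.5]

hvg = horizontal_visibility_edges(weights)
chvg = compactify(terms, hvg)
```
        </preformat>
        <p>In this toy sequence the two occurrences of “wolf” see each other over the lower weights between them, and after compactification they become one node linked to “red”, “hood” and “grandmother”.</p>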
      </sec>
      <sec id="sec-2-2">
        <title>Text Corpora Pre-processing</title>
        <p>The languages we speak and write are made up of many words, often derived from one
another.</p>
        <p>A language in which words change form as their use in speech changes is called an
inflected language. The inflected forms of a word share a common root
form.</p>
        <p>In this section, we briefly describe the main steps of processing text documents:
tokenization, part-of-speech tagging, lemmatization, stop-word removal, stemming
and term weighting.</p>
        <sec id="sec-2-2-1">
          <title>Tokenization and lemmatization</title>
          <p>For preliminary lexical analysis, the text is broken up into individual words
(tokens); this step is called tokenization.</p>
          <p>Lemmatization usually refers to doing things properly with the use of vocabulary and
morphological analysis of words, normally aiming to remove inflectional endings only
and to return the base or dictionary form of a word, which is known as the lemma. A
lemma (plural lemmas or lemmata) is the canonical form, dictionary form, or citation
form of a set of words.</p>
          <p>For example, "runs", "running", "ran" are all forms of the word "run", therefore "run"
is the lemma of all these words. Because lemmatization returns an actual word of the
language, it is used where it is necessary to get valid words.</p>
          <p>In this work, the WordNet lemmatizer (WordNetLemmatizer) provided by the Python
NLTK library was used to lemmatize the tokens. It uses the WordNet database to look up
the lemmas of words.</p>
          <p>
            Tokenization and lemmatization are usually the initial stages of word processing
because they allow you to work with a word as a single entity, knowing its context [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ].
          </p>
        </sec>
        <sec id="sec-2-2-2">
          <title>Part-of-Speech Tagging</title>
          <p>POS tagging is one of the first steps in computer text analysis.</p>
          <p>
            Before lemmatization, it is necessary to provide the context in which a word is to be
lemmatized, that is, its part of speech (POS) [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ].
          </p>
          <p>In corpus linguistics, part-of-speech tagging (POS tagging or PoS tagging or POST),
also called grammatical tagging or word-category disambiguation, is the process of
marking up a word in a text (corpus) as corresponding to a particular part of speech,
based on both its definition and its context—i.e., its relationship with adjacent and
related words in a phrase, sentence, or paragraph. A simplified form of this is commonly
taught to school-age children, in the identification of words as nouns, verbs, adjectives,
adverbs, etc.</p>
          <p>
            In general, PoS tagging algorithms are divided into two distinct groups: rule-based
and stochastic. E. Brill's tagging method [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ], which uses rule-based algorithms, is the
first and most widely used method of tagging English-language texts.
          </p>
          <p>“Part of Speech tagging” is one of the more powerful aspects of the NLTK module
in the Python programming language. Basically, the goal of a POS tagger is to assign
linguistic (mostly grammatical) information to sub-sentential units - tokens.</p>
        </sec>
        <sec id="sec-2-2-3">
          <title>Stop Words Removal</title>
          <p>After pre-processing the text documents and extracting the key terms, it is
proposed in this study to remove stop words, that is, informationally unimportant words
with no semantic strength, as well as bigrams containing at least one stop word and
trigrams that start or end with a stop word. In general, stop words are words that carry
too little meaning to be useful in search queries. They are usually filtered out of search
queries because they return a vast amount of unnecessary information. Mostly they are
words that are commonly used in the English language, such as “as”, “the”, “be” and
“are”.</p>
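          <p>The filtering rule for unigrams, bigrams and trigrams can be sketched as follows. This is a minimal illustration rather than the code used in this work: the stop-word list is a tiny invented sample, and the function name keep_ngram and the candidate n-grams are hypothetical.</p>
          <preformat>
```python
# Tiny illustrative stop-word list; a real run would use a full dictionary.
STOP_WORDS = {"as", "the", "be", "are", "of", "to"}


def keep_ngram(ngram, stop_words=STOP_WORDS):
    """Decide whether an n-gram (tuple of 1 to 3 tokens) survives filtering."""
    if len(ngram) == 3:
        # A trigram may keep a stop word in the middle position only.
        return ngram[0] not in stop_words and ngram[-1] not in stop_words
    # Unigrams and bigrams are dropped if any token is a stop word.
    return all(token not in stop_words for token in ngram)


candidates = [
    ("wolf",),
    ("the",),
    ("red", "hood"),
    ("the", "wolf"),
    ("house", "of", "grandmother"),
    ("the", "big", "wolf"),
]
kept = [ngram for ngram in candidates if keep_ngram(ngram)]
```
          </preformat>
          <p>Here the trigram “house of grandmother” is kept because its stop word is in the middle, while “the big wolf” is removed because it starts with one.</p>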
          <p>The stop dictionary used in this work was based on different stop dictionaries, which
are available at:
https://code.google.com/archive/p/stop-words/downloads/;
http://www.textfixer.com/tutorials/common-english-words.php.</p>
          <p>It should be noted that each programming language or library provides its own list
of stop words. In this work, the “SnowballStemmer” (a stemmer implemented in the Python
NLTK library, the Natural Language Toolkit) was also used to ignore stop words.</p>
          <p>Also, the formed stop dictionary was expanded by adding other stop words that were
identified by experts within the considered subject domain.</p>
        </sec>
        <sec id="sec-2-2-4">
          <title>Stemming</title>
          <p>
            After the stages described above, stemming is applied to merge words that share a
common root into a single term. Stemming is the
process of reducing inflected words to their root forms, mapping a group of
words to the same stem even if the stem itself is not a valid word in the language [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ].
Stemming usually refers to a crude heuristic process that chops off the ends of words
and often includes the removal of derivational affixes that are used with a word. So
words having the same stem will have a similar meaning. The results of stemming are
similar to determining the root of the word, but its algorithms are based on other
principles [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ]. That is why, after stemming (processing with stemmer), the word may be
different from its morphological root.
          </p>
          <p>The goal of both stemming and lemmatization is to reduce inflectional forms and
sometimes derivationally related forms of a word to a common base form.</p>
          <p>However, the two processes differ in that stemming most commonly collapses
derivationally related words, whereas lemmatization commonly only collapses the different
inflectional forms of a lemma.</p>
          <p>If confronted with the token "saw", stemming might return just "s", whereas
lemmatization would attempt to return either "see" or "saw", depending on whether the
token was used as a verb or a noun.</p>
          <p>To avoid the confusion described above, in this work the lemmatization process
precedes the stemming process.</p>
          <p>
            Several stemming algorithms can be distinguished in terms of performance,
accuracy, and how stemming problems are overcome [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ].
          </p>
          <p>
            The most common algorithm for stemming English, and one that has repeatedly been
shown to be empirically very effective, is Porter's algorithm [
            <xref ref-type="bibr" rid="ref18 ref19">18, 19</xref>
            ]. In this work, the
“PorterStemmer” stemmer implemented in the Python NLTK (Natural Language Toolkit)
library was used. This stemmer is known for its simplicity and speed. As a result of its
use, words with the same stem are treated as having a similar meaning.
          </p>
          <p>The pre-processing stages described above normalize the text corpus.</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>Weighting and extraction of the key terms</title>
        <p>
          After the pre-processing stages, the key terms are weighted and extracted.
To form a time series, a function that maps each term to a number is needed; this study
uses a modification of the classic statistical weight indicator TF-IDF (TF is Term
Frequency, IDF is Inverse Document Frequency) [
          <xref ref-type="bibr" rid="ref20 ref21">20, 21</xref>
          ] – GTF (Global Term
Frequency) [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] as a weight value of terms.
        </p>
        <p>This approach assigns a high statistical importance score to elements of the text
that are informationally important in the global context.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Rules for Determining the Directions of Links</title>
      <p>As was mentioned above, the determination of the directions of links is a complex and
open problem of ontology creation. Below, we consider several new approaches for
determining the directions of links between nodes in undirected networks of terms built
from words and phrases (separate unigrams, bigrams and trigrams) of a thematic text
corpus.</p>
      <p>Let G be the undirected network of terms built according to the rules described
above: G := (V, T), where V is the set of nodes and T is the set of unordered pairs of nodes
from V that correspond to the causal links between the nodes.</p>
      <p>It is supposed that, for all ti, tj such that (ti, tj) ∈ T, a causal link exists in the
direction from the node ti to the node tj if:</p>
      <p>
        1. the numerical value of the node ti, measured by a) degree [
        <xref ref-type="bibr" rid="ref23 ref24">23, 24</xref>
        ], b) HITS
score [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] or c) PageRank score [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], is higher than the corresponding
value of the node tj;
      </p>
      <p>2. within the sentence, the term to which the node ti corresponds precedes the term
to which the node tj corresponds;</p>
      <p>3. the term to which the node ti corresponds is shorter than the term to which the
node tj corresponds.</p>
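      <p>As an illustration, the first rule (in its node-degree variant) and the second rule can be sketched in Python as below. This is a simplified sketch under stated assumptions, not the implementation used in this work: the edges and the token sequence are invented toy data, links with equal degree are left undirected, and the position of a term is taken to be its first occurrence in the sentence.</p>
      <preformat>
```python
from collections import Counter


def orient_by_degree(edges):
    """Rule 1 with node degree: point each link from the higher-degree node."""
    degree = Counter(node for edge in edges for node in edge)
    directed = []
    for a, b in edges:
        if degree[a] == degree[b]:
            continue  # assumption: a tie gives no direction, so skip the link
        directed.append((a, b) if degree[a] > degree[b] else (b, a))
    return directed


def orient_by_precedence(edges, sentence):
    """Rule 2: point each link from the term that occurs first in the sentence."""
    first_pos = {}
    for pos, term in enumerate(sentence):
        first_pos.setdefault(term, pos)  # assumption: use the first occurrence
    directed = []
    for a, b in edges:
        if a in first_pos and b in first_pos:
            directed.append((a, b) if first_pos[b] > first_pos[a] else (b, a))
    return directed


# Invented toy data: undirected links and one pre-processed sentence.
edges = [
    ("wolf", "grandmother"),
    ("grandmother", "red"),
    ("wolf", "red"),
    ("grandmother", "hood"),
    ("red", "hood"),
]
sentence = ["wolf", "eat", "grandmother", "red", "hood"]

by_degree = orient_by_degree(edges)
by_precedence = orient_by_precedence(edges, sentence)
```
      </preformat>
      <p>On this toy input the two rules orient the link between “wolf” and “grandmother” in opposite directions, which shows how the choice of rule changes the resulting directed network.</p>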
      <p>
        One of the methods for creating terminological ontologies – the algorithm for
building the thematic networks of natural hierarchies of terms based on analysis of texts
corpora – is used to build the directed network of words and phrases (separate unigrams,
bigrams and trigrams) according to the third rule. The work [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] notes that the
algorithm for building networks of natural hierarchies of terms involves building
a compactified horizontal visibility graph and determining the directions of links
between the key terms according to the following rule: a word is a part of a two-term
phrase or a three-term phrase, and a two-term phrase is a part of a three-term phrase.
      </p>
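      <p>The part-of rule for natural hierarchies of terms can be sketched as follows. This is an illustrative sketch only: terms are represented as tuples of tokens, a shorter term is assumed to be part of a longer one when its tokens occur contiguously inside it, and the function names and sample links are hypothetical.</p>
      <preformat>
```python
def is_part_of(part, whole):
    """True if the token tuple `part` occurs contiguously inside `whole`."""
    n = len(part)
    return len(whole) > n and any(
        whole[start:start + n] == part
        for start in range(len(whole) - n + 1)
    )


def orient_by_containment(edges):
    """Point each link from the shorter term to the longer term containing it."""
    directed = []
    for a, b in edges:
        if is_part_of(a, b):
            directed.append((a, b))
        elif is_part_of(b, a):
            directed.append((b, a))
        # Links between unrelated terms get no direction from this rule.
    return directed


# Invented undirected links between unigrams, bigrams and trigrams.
edges = [
    (("riding", "hood"), ("hood",)),
    (("red",), ("red", "riding", "hood")),
    (("red", "riding", "hood"), ("riding", "hood")),
    (("wolf",), ("hood",)),
]
hierarchy = orient_by_containment(edges)
```
      </preformat>
      <p>The link between “wolf” and “hood” stays undirected here, because neither term is a part of the other.</p>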
    </sec>
    <sec id="sec-4">
      <title>Results of the Study of the Proposed Approaches</title>
      <p>The proposed approaches for determining the directions of links in undirected networks
of terms were tested on an English-language text, namely the
well-known fairy tale “The story of Little Red Riding Hood”.</p>
      <p>Using the method described above, the text was pre-processed and the key terms
(separate unigrams, bigrams and trigrams) were extracted (Tables 1, 2 and
3).</p>
      <p>The following results were obtained after building the directed network according to
the first rule for different measures of network nodes (degree: fig. 2; HITS: fig. 3;
PageRank: fig. 4). The resulting directed networks of terms were visualized with
Gephi (https://gephi.org), a software package for modeling and visualizing
graphs.</p>
      <p>Fig. 2. Fragment of the directed network built according to the first rule for node
degree.</p>
      <p>After analyzing the obtained results, it was found that the directed network
built according to the second rule reflects the directions of links that
exist between the terms in the considered text more precisely than the network built
according to the first rule. The network of natural hierarchies of terms has its own
peculiarities and advantages, so it is difficult to compare it with the networks built
according to the first two rules. Taking into account the naturalness of the links
determined in such a network, we can speak of their syntactic adequacy.</p>
      <p>Considering, for example, the directions of links determined for key terms, we can
see that according to the first rule the links between “wolf”, “grandmother” and “red” are
as follows (see fig. 2, 3, 4): for the degree, HITS and PageRank, “grandmother”
influences “wolf” and “red”, and “red” influences “wolf”. This does not
correspond to the real directions of links that exist in the text in terms of content
analysis. According to the second rule, in contrast, “wolf” influences “grandmother”
and “grandmother” influences “red”, which corresponds to the content of the
considered text.</p>
      <p>Of the first two rules, the rule that directs a link from ti to tj when, within the
sentence, the term corresponding to the node ti precedes the term corresponding to the
node tj (where (ti, tj) ∈ T) is the more informative. This is because, according to
experts, the links determined by this rule correspond more precisely to the content of
the considered text.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>After studying the proposed rules for determining the directions of links in undirected
networks of terms, it was found that the second rule is more informative and, according to
experts, reflects more precisely the directions of links that correspond to the content of
the considered text. An undirected network of terms was built on the example of an
English-language text, the well-known fairy tale “The story of Little Red Riding Hood”.
Using the proposed rules for determining the directions of links, directed
networks of terms were obtained from the undirected networks. According to experts, the
informative content of the network links built according to the second proposed rule is
higher than for the other two rules. Taking into account the naturalness of the links
determined in the network of natural hierarchies of terms, we can speak of their
syntactic adequacy.</p>
      <p>The directed networks of words and phrases built according to the proposed
approach can be used for automatically creating terminological ontologies of subject
domains with the participation of experts. The research results can also be used to create
personal search interfaces for users of information retrieval systems and in database
navigation systems, helping users of such systems to search for relevant information
more easily.</p>
      <p>Since improving the accuracy of determining the directions of links between
nodes in undirected networks of words and phrases remains a relevant task, we plan to
continue working in this direction, developing new approaches and modifying existing ones.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Lande</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Snarsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Approach to Creation of Terminological Ontologies</article-title>
          .
          <source>Design ontology 2</source>
          (
          <issue>12</issue>
          ), pp.
          <fpage>83</fpage>
          -
          <lpage>91</lpage>
          , (
          <year>2014</year>
          ).
          <article-title>(in Russian)</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Lukashevich</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dobrov</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chuiko</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Selection of Word Combinations for Automatic Word Processing System Dictionary</article-title>
          .
          <source>Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference «Dialogue-2008»</source>
          , pp.
          <fpage>339</fpage>
          -
          <lpage>344</lpage>
          . Moscow (
          <year>2008</year>
          ).
          <article-title>(in Russian)</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Filippovich</surname>
          </string-name>
          , Yu.,
          <string-name>
            <surname>Prokhorov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Semantics of Information Technologies: Experiments of Dictionary-thesaurus Description</article-title>
          . Moscow State University of Printing Arts, Moscow (
          <year>2002</year>
          ).
          <article-title>(in Russian)</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ferrer-i-Cancho</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Solé</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <source>The Small World of Human Language. in Proc. of the Royal Society of London</source>
          , pp.
          <fpage>2261</fpage>
          -
          <lpage>2265</lpage>
          . London (
          <year>2001</year>
          ). doi: 10.1098/rspb.2001.1800.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Caldeira</surname>
            ,
            <given-names>S. M. G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petit Lobao</surname>
            ,
            <given-names>T. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Andrade</surname>
            ,
            <given-names>R. F. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neme</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Miranda</surname>
            ,
            <given-names>J. G. V.</given-names>
          </string-name>
          :
          <article-title>The network of concepts in written texts</article-title>
          .
          <source>The European Physical Journal B-Condensed Matter and Complex Systems</source>
          <volume>49</volume>
          (
          <issue>4</issue>
          ),
          <fpage>523</fpage>
          -
          <lpage>529</lpage>
          (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Ferrer-i-Cancho</surname>
            ,
            <given-names>R. F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Solé</surname>
            ,
            <given-names>R. V.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Köhler</surname>
            ,
            <given-names>R.:</given-names>
          </string-name>
          <article-title>Patterns in syntactic dependency networks</article-title>
          .
          <source>Physical Review E</source>
          <volume>69</volume>
          (
          <issue>5</issue>
          ), (
          <year>2004</year>
          ). doi: 10.1103/PhysRevE.69.051915
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Luque</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lacasa</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ballesteros</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Luque</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Horizontal visibility graphs: Exact results for random time series</article-title>
          . Physical Review E,
          <volume>80</volume>
          (
          <issue>4</issue>
          ), (
          <year>2009</year>
          ). doi: 10.1103/PhysRevE.80.046103.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Gutin</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mansour</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Severini</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>A characterization of horizontal visibility graphs and combinatorics on words</article-title>
          .
          <source>Physica A: Statistical Mechanics and its Applications</source>
          ,
          <volume>390</volume>
          (
          <issue>12</issue>
          ),
          <fpage>2421</fpage>
          -
          <lpage>2428</lpage>
          (
          <year>2011</year>
          ). doi: 10.1016/j.physa.2011.02.031.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Bezsudnov</surname>
            ,
            <given-names>I. V.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Snarskii</surname>
            ,
            <given-names>A. A.</given-names>
          </string-name>
          :
          <article-title>From the time series to the complex networks: The parametric natural visibility graph</article-title>
          .
          <source>Physica A: Statistical Mechanics and its Applications</source>
          ,
          <volume>414</volume>
          ,
          <fpage>53</fpage>
          -
          <lpage>60</lpage>
          (
          <year>2014</year>
          ). doi: 10.1016/j.physa.2014.07.002.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lacasa</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luque</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ballesteros</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luque</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Nuno</surname>
            ,
            <given-names>J. C.</given-names>
          </string-name>
          :
          <article-title>From time series to complex networks: The visibility graph</article-title>
          .
          <source>Proceedings of the National Academy of Sciences</source>
          ,
          <volume>105</volume>
          (
          <issue>13</issue>
          ),
          <fpage>4972</fpage>
          -
          <lpage>4975</lpage>
          (
          <year>2008</year>
          ). doi: 10.1073/pnas.0709247105
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Lande</surname>
            ,
            <given-names>D. V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Snarskii</surname>
            ,
            <given-names>A. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yagunova</surname>
            ,
            <given-names>E. V.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Pronoza</surname>
            ,
            <given-names>E. V.</given-names>
          </string-name>
          :
          <article-title>The use of horizontal visibility graphs to identify the words that define the informational structure of a text</article-title>
          .
          In:
          <source>2013 12th Mexican International Conference on Artificial Intelligence</source>
          , pp.
          <fpage>209</fpage>
          -
          <lpage>215</lpage>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C. D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raghavan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Schütze</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>An Introduction to Information Retrieval</article-title>
          . Cambridge University Press, pp.
          <fpage>22</fpage>
          -
          <lpage>36</lpage>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Schmid</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Probabilistic Part-of-Speech Tagging Using Decision Trees</article-title>
          .
          In:
          <source>Proceedings of International Conference on New Methods in Language Processing</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          . Manchester, UK (
          <year>1994</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Brill</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>A simple rule-based part of speech tagger</article-title>
          .
          In:
          <source>Proceedings of the third conference on Applied natural language processing (ANLC '92)</source>
          . Association for Computational Linguistics
          , pp.
          <fpage>152</fpage>
          -
          <lpage>155</lpage>
          . Stroudsburg, PA, USA (
          <year>1992</year>
          ). doi: 10.3115/974499.974526
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Jongejan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Dalianis</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Automatic training of lemmatization rules that handle morphological changes in pre-, in- and suffixes alike</article-title>
          .
          In:
          <source>Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing</source>
          , pp.
          <fpage>145</fpage>
          -
          <lpage>153</lpage>
          . Association for Computational Linguistics, Singapore
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Lovins</surname>
            ,
            <given-names>J. B.</given-names>
          </string-name>
          :
          <article-title>Development of a stemming algorithm</article-title>
          .
          <source>Mech. Translat. &amp; Comp. Linguistics</source>
          <volume>11</volume>
          (
          <issue>1-2</issue>
          ),
          <fpage>22</fpage>
          -
          <lpage>31</lpage>
          (
          <year>1968</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Baeza-Yates</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Ribeiro-Neto</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Modern information retrieval</article-title>
          . New York: ACM Press; Harlow, England: Addison-Wesley
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Porter</surname>
            ,
            <given-names>M. F.</given-names>
          </string-name>
          :
          <article-title>An algorithm for suffix stripping</article-title>
          .
          <source>Program</source>
          <volume>14</volume>
          (
          <issue>3</issue>
          ),
          <fpage>130</fpage>
          -
          <lpage>137</lpage>
          (
          <year>1980</year>
          ). doi: 10.1108/eb046814
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Willett</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>The Porter stemming algorithm: then and now</article-title>
          .
          <source>Program</source>
          <volume>40</volume>
          (
          <issue>3</issue>
          ),
          <fpage>219</fpage>
          -
          <lpage>223</lpage>
          (
          <year>2006</year>
          ). doi: 10.1108/00330330610681295.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Salton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Buckley</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Term-weighting approaches in automatic text retrieval</article-title>
          .
          <source>Information processing &amp; management</source>
          ,
          <volume>24</volume>
          (
          <issue>5</issue>
          ),
          <fpage>513</fpage>
          -
          <lpage>523</lpage>
          (
          <year>1988</year>
          ). doi: 10.1016/0306-4573(88)90021-0
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Rajaraman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Ullman</surname>
            ,
            <given-names>J. D.</given-names>
          </string-name>
          :
          <article-title>Mining of massive datasets</article-title>
          . Cambridge University Press (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Lande</surname>
            ,
            <given-names>D.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dmytrenko</surname>
            ,
            <given-names>O.O.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Snarskii</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          :
          <article-title>Transformation texts into the complex network with applying visibility graphs algorithms</article-title>
          .
          In:
          <source>Selected Papers of the XVIII International Scientific and Practical Conference on Information Technologies and Security (ITS 2018)</source>
          . CEUR Workshop Proceedings (ceur-ws.org), vol.
          <volume>2318</volume>
          , urn:nbn:de:0074-2318-4, pp.
          <fpage>95</fpage>
          -
          <lpage>106</lpage>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Bondy</surname>
            ,
            <given-names>J. A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Murty</surname>
            ,
            <given-names>U. S. R.</given-names>
          </string-name>
          :
          <article-title>Graph theory with applications</article-title>
          . vol.
          <volume>290</volume>
          .
          Macmillan
          , London (
          <year>1976</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Godsil</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Royle</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <source>Algebraic Graph Theory. Graduate Texts in Mathematics 207</source>
          . Springer, New York (
          <year>2001</year>
          ). doi: 10.1007/978-1-4613-0163-9
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Kleinberg</surname>
            ,
            <given-names>J. M.</given-names>
          </string-name>
          :
          <article-title>Authoritative sources in a hyperlinked environment</article-title>
          .
          In:
          <source>Proceedings of ACM-SIAM Symposium on Discrete Algorithms</source>
          ,
          <volume>46</volume>
          (
          <issue>5</issue>
          ), pp.
          <fpage>604</fpage>
          -
          <lpage>632</lpage>
          (
          <year>1998</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Brin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Page</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>The anatomy of a large-scale hypertextual web search engine</article-title>
          .
          <source>Computer networks and ISDN systems</source>
          ,
          <volume>30</volume>
          (
          <issue>1-7</issue>
          ),
          <fpage>107</fpage>
          -
          <lpage>117</lpage>
          (
          <year>1998</year>
          ). doi: 10.1016/S0169-7552(98)00110-X
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Lande</surname>
            ,
            <given-names>D.V.</given-names>
          </string-name>
          :
          <article-title>Building of networks of natural hierarchies of terms based on analysis of texts corpora</article-title>
          .
          <source>arXiv preprint arXiv:1405.6068</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>