<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Network Approach for Visualizing the Evolution of the Research of Cross-lingual Semantic Similarity</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Aida Kh. Khakimova ANO «Scientific and Research Center for Information in Physics and Technique»</institution>
          ,
          <addr-line>Nizhny Novgorod</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents a bibliometric study of publications on the topic “Cross-lingual Semantic Similarity” available in the Dimensions database. Visualization of scientific networks showed fragmentation of research and limited interaction among organizations. Leading countries, organizations and authors are highlighted. Overlay visualization allowed us to assess trends in the citation of authors. The expansion of the geography of research is shown. For international cooperation, uniform semantic approaches to describing the concepts of critical infrastructure, incidents, resources and services related to their maintenance and protection are important. The approaches described can be applied to the visualization and modeling of technological development in the modern digital world. Semantic similarity is a longstanding problem in natural language processing (NLP). The semantic similarity between two words or concepts represents their semantic proximity (or semantic distance). It plays an important role in information retrieval, information extraction, text mining, web mining and many other applications.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Linguistic similarities have been studied by researchers
from different fields using numerous statistical, linguistic
and neuroscientific approaches.</p>
      <p>The semantic properties of languages are usually
evaluated using word embeddings, which project a
vocabulary onto a vector space of a given
number of dimensions in which the semantic relations between
words are preserved.</p>
      <p>
        In artificial intelligence and cognitive science,
semantic similarity has been used for various scientific
assessments and measurements, as well as for deciphering
complex interfaces of sense conceptualization [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>Theoretically, semantic similarity refers to the idea of
commonality in characteristics between words or
concepts in a language. Although it is a relational property
between concepts or senses, it can also be
defined as a measure of the conceptual similarity
between two words, sentences, paragraphs, documents, or
even two pieces of text.</p>
      <p>
        Recently, there has been a growing interest in finding
semantically similar words in different languages based
on comparable data easily accessible from the Internet
(for example, Wikipedia, news) [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ].
      </p>
      <p>
        According to Hotho et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], text mining can be
defined, like data mining, as the application of
algorithms and methods from the fields of machine learning
and statistics to texts in order to find useful
patterns. The texts must first be pre-processed, after which data mining algorithms
can be applied to the extracted data.
      </p>
      <p>
        Text analysis in big data analytics is becoming a
powerful tool for processing unstructured text data and
analyzing it to extract new knowledge and identify
meaningful patterns and correlations hidden in the data.
Text mining refers to the automatic or semi-automatic extraction of
previously unknown information and implicit patterns
from large volumes of unstructured text data,
such as natural language texts [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Tech Mining refers to the application of text mining
methods to technical documentation. When applied to
patent analysis, it is called “patent mining”. Tech
Mining (TM) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] uses text mining software to exploit
scientific and technical information resources. Tech
Mining is used to inform technology management.
It combines an understanding of technological
innovation processes with software tools for obtaining
vital scientific and technical knowledge.
      </p>
      <p>
        Although many applications have employed
similarity functions to compute the semantic similarity
between terms, most traditional approaches solve
the problem by using dictionaries such as WordNet. The
main problem is that many terms (e.g., abbreviations,
acronyms, brand names, etc.) are not covered by these
kinds of dictionaries [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. As a result, semantic similarity
measures based on this type of resource cannot
be used directly in these cases.
      </p>
      <p>Tech Mining is the application of text mining tools to
scientific and technical information resources. The
ever-growing volume of scientific results reflects a boom in
technological innovation, but it also complicates efforts to
obtain useful and concise information for solving
problems. This difficulty extends to tech mining,
where the development of methods compatible with big
data is an urgent task.</p>
      <p>In current patent analysis, numerous patent
documents use different words to describe the same event,
which leads to semantic inconsistency, while polysemy arises from
the many meanings that may exist for a single word. To
solve this problem, document analysis often requires
combining synonyms into the same semantic dimension.
On the other hand, different words can be used to describe
the same events.</p>
      <p>
        Methods for measuring the semantic similarity of
texts are necessary for advancing
information retrieval, data mining and text analysis. Such
methods help to avoid patent infringement when
developing technological capabilities to achieve
future competitive advantages [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>The growing popularity of data science is also
affecting high-tech industries. However, because their
core competencies usually lie elsewhere (in the creation of
cyber-physical systems rather than, for example, machine
learning algorithms or data mining), adopting
data science can be more cumbersome than
expected for domain specialists such as system
engineers or architects.</p>
      <p>
        In recent years, scientists have been developing semantic
search engines to help subject matter experts
use data science. One example, Semantic Snake Charmer
(SSC) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], is a search engine based on domain knowledge.
SSC includes a natural language processing module that
can convert relevant documentation into several types of
semantic graphs.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>
        An accurate assessment of the actual similarity
between documents is fundamental for many automatic
text analysis applications, such as thesaurus generation
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], machine translation [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], question answering [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ],
information retrieval [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], and automatic summarization.
      </p>
      <p>
        Semantic space is an attempt to model the
characteristics of human semantic memory, guided
by the principle that words with similar meanings
occur in similar linguistic contexts. A semantic
space is a vector space that captures meaning
quantitatively in terms of co-occurrence
statistics, where words (or concepts) are represented as
vectors in a high-dimensional space [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. As a result, the
similarity of the meanings of words can be quantified by
measuring their distance in a high-dimensional vector
space.
      </p>
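      <p>As a minimal illustration of the distance-based view described above, one common choice is the cosine of the angle between two n-dimensional word vectors u and v. This is an assumption made here for illustration, since the text does not name a specific measure:</p>
      <disp-formula>
        <tex-math>\operatorname{sim}(u, v) = \cos\theta = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert} = \frac{\sum_{i=1}^{n} u_i v_i}{\sqrt{\sum_{i=1}^{n} u_i^{2}} \, \sqrt{\sum_{i=1}^{n} v_i^{2}}}</tex-math>
      </disp-formula>
      <p>Under this assumption, values close to 1 correspond to vectors that point in nearly the same direction and are read as high semantic similarity, while values close to 0 correspond to nearly orthogonal vectors.</p>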
      <p>
        Latent semantic analysis (LSA) is based on the assumption
that words that have similar meanings tend to occur in
similar texts [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>Knowledge-based methods are limited by their
vocabulary, which covers words commonly
used in general English literature and is often unsuitable
for specific domains.</p>
      <p>
        The vector space model is classically used to evaluate
the semantic similarity between two documents. Terms
are represented in this semantic space as vectors called
word embeddings. The possibilities of determining textual
similarity from vector representations of terms in a
semantic space, in which the proximity of vectors can be
interpreted as semantic similarity, are investigated in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>
        The LSA method has an advantage over most modern
information retrieval methods because it can
measure the similarity of two texts that use completely
different words. However, it faces morphological
problems in the correct identification of terms, as well as
more fundamental problems with homonymy, polysemy
and synonymy. Techniques that depend on large
corpora (e.g., LSA) tend to overestimate the similarity of relatively unrelated
sentences or relatively related sentences.
LSA overestimates the similarity score of compared pairs
of sentences [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. A study that assessed the similarity
between patent documents and scientific publications in
the field of biotechnology using LSA showed that
in this case dimensionality reduction cut off
valuable information [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>Semantic spaces can be constructed using either the
additive model or the multiplicative model. Neither the additive
nor the multiplicative approach to constructing a semantic
space takes into account the order of the
components (i.e., words or phrases). Traditional clustering
algorithms usually rely on the bag-of-words (BOW)
approach, and the obvious drawback of BOW is that it
ignores the semantic relationships between words.</p>
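      <p>The order-insensitivity noted above can be seen in a minimal sketch. The helper name bag_of_words and the two example sentences are hypothetical and chosen only for illustration; the sketch counts whitespace-separated tokens and is not the representation used by any particular clustering tool.</p>
      <preformat>
```python
from collections import Counter


def bag_of_words(sentence):
    """Return word counts for a sentence, discarding word order."""
    return Counter(sentence.lower().split())


# Two sentences with opposite meanings but identical word counts.
first = bag_of_words("the dog bites the man")
second = bag_of_words("the man bites the dog")

# The bag-of-words representations are indistinguishable.
same_representation = first == second
print(same_representation)
```
      </preformat>
      <p>Because both sentences map to the same counts, any similarity measure computed on these representations cannot tell them apart.</p>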
      <p>
        Researchers extended distributional semantic models (DSM) to include the
compositional structure of language and called these
models compositional DSM (CDSM). CDSM models
assume that the meaning of a word can be interpreted from
its context, and the meaning of a sentence can be obtained
from the composition of its parts [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Central to CDSM is
compositionality, that is, the meaning of a complex
expression is determined by the meanings of its
component expressions and the rules for combining them.
      </p>
      <p>
        Assessing semantic similarity between concepts is a
key tool for improving the understanding of texts. The structured
knowledge provided by ontologies is widely used to
evaluate similarity. However, in many areas several
ontologies that model the same concepts in different ways
are available. Criteria for choosing
ontologies for assessing semantic similarity are described in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
      <p>
        A measure for calculating the similarity between
sentences or between documents using an ontology has been
proposed. The similarity is evaluated using the concept
vector of the document (or sentence), formed by finding the
links between the ontology terms and the content of the
document (or sentence) [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
      <p>
        The vector space model is used to identify potentially
useful services and evaluate web services [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. Methods
for information extraction and automatic semantic textual
similarity assessment have been used for electronic health
record (EHR) systems [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
      </p>
      <p>Similarity measures are used to select a
context-sensitive application that matches the current context of
the user. Personalization of services is directly related to
the user's preferences and reflects contextual
information from the user's environment.</p>
      <p>
        A semantic similarity measure is a tool for assessing
the similarity between instances of the context, which
makes it possible to select services according to their
relevance to a given request, profile and user preferences.
With this approach, the context is considered as a set of
spatio-temporal information
about the user, together with the user's preferences and interests,
which is used as a factor in ranking services by
relevance [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ].
      </p>
      <p>
        The data sets of the semantic textual similarity (STS) shared tasks have been widely
used to study sentence-level similarity and
semantic representations [
        <xref ref-type="bibr" rid="ref25 ref26 ref27">25-27</xref>
        ].
      </p>
      <p>
        The CL-WES method [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] is based on the cosine
similarity of distributed representations of sentences,
which are obtained as a weighted sum of the word
vectors in a sentence. In the first stage,
the Spanish sentence is translated into English using
Google Translate (i.e., both sentences are expressed in the
same language); the two sentences are then compared.
      </p>
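      <p>The comparison step described above can be sketched as follows. This is only an illustration under stated assumptions: the word vectors and weights are made-up toy values, the names VECTORS, WEIGHTS, sentence_vector and cosine are hypothetical, and the translation stage is omitted, so both inputs are assumed to be already in English. It is not the implementation of the CL-WES method itself.</p>
      <preformat>
```python
import math

# Hypothetical 3-dimensional word vectors and weights, for illustration only.
VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "sleeps": [0.1, 0.9, 0.1],
    "runs": [0.0, 0.8, 0.3],
}
WEIGHTS = {"cat": 1.0, "dog": 1.0, "sleeps": 0.5, "runs": 0.5}


def sentence_vector(tokens):
    """Weighted sum of the word vectors of the known tokens."""
    total = [0.0, 0.0, 0.0]
    for token in tokens:
        if token in VECTORS:
            weight = WEIGHTS.get(token, 1.0)
            total = [t + weight * v for t, v in zip(total, VECTORS[token])]
    return total


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Both sentences are assumed to be in the same language already.
score = cosine(sentence_vector(["cat", "sleeps"]), sentence_vector(["dog", "runs"]))
print(round(score, 3))
```
      </preformat>
      <p>Tokens without a vector are skipped in this sketch; a real system would need a policy for out-of-vocabulary words and a principled weighting scheme.</p>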
      <p>
        The similarity score of the cross-lingual pairs in
English and Spanish was calculated as the average of the
corresponding language ratings in the monolingual data
sets [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]. The study covered five languages [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ]:
English, German, Italian, Spanish and Farsi.
      </p>
      <p>
        The skip-gram model has become one of the most
popular models for learning word representations in NLP [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ].
      </p>
      <p>Cross-lingual assessment of semantic textual
similarity is an important step in the detection and
evaluation of cross-lingual plagiarism, yet research in this
area is scarce.</p>
      <p>A comparable corpus consists of documents in two or
more languages or varieties that are not translations of
each other but deal with similar topics. Comparable
corpora are, by definition, multilingual and cross-lingual
collections of text. The Internet can be used as a huge
resource of multilingual texts.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Materials and methods</title>
      <p>To search for publications, we used the Dimensions database
(https://app.dimensions.ai/), which provides individual users with
open access to more than 95 million publication records
and related metrics. The search
query was “cross-lingual semantic similarity”.
A total of 2050 articles were found.</p>
      <p>VOSviewer (https://www.vosviewer.com/) was used
to visualize scientific networks. VOSviewer uses a distance-based
approach to visualizing bibliometric networks. In a
bibliometric network, there are often large differences
between nodes in the number of edges they have to other
nodes.</p>
      <p>
        Popular nodes, for example those representing highly cited
publications or highly prolific researchers, may have
several orders of magnitude more connections than their
less popular counterparts. When analyzing bibliometric
networks, these differences between
nodes are usually normalized. By default, VOSviewer applies
association strength normalization [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ].
      </p>
      <p>A total of 2050 articles by 2825 authors from 64 countries were
found. The dynamics of publications is shown in Fig.
1. The trend line is exponential; the coefficient of
determination (R<sup>2</sup>), also called the approximation
confidence value, is 0.6648. The earliest publications date back
to the 1980s, but research has been
growing since the beginning of the 21st century.</p>
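      <p>For readers who wish to reproduce this kind of trend analysis, the sketch below fits an exponential curve and reports a coefficient of determination. It assumes, without confirmation from the text, that the trend line is an ordinary least-squares fit on the logarithm of the yearly counts, as spreadsheet trend lines commonly are. The function name fit_exponential is hypothetical, and the yearly counts are made-up values, not the data behind Fig. 1.</p>
      <preformat>
```python
import math


def fit_exponential(years, counts):
    """Fit counts ~ a * exp(b * year) by least squares on log(counts).

    Returns (a, b, r2), where r2 is computed on the log scale.
    All counts must be positive.
    """
    logs = [math.log(c) for c in counts]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    b = sxy / sxx
    log_a = mean_y - b * mean_x
    ss_res = sum((y - (log_a + b * x)) ** 2 for x, y in zip(years, logs))
    ss_tot = sum((y - mean_y) ** 2 for y in logs)
    r2 = 1.0 - ss_res / ss_tot
    return math.exp(log_a), b, r2


# Made-up yearly publication counts (years counted from an arbitrary start).
years = [0, 1, 2, 3, 4, 5]
counts = [3, 4, 9, 14, 30, 41]
a, b, r2 = fit_exponential(years, counts)
print(a, b, r2)
```
      </preformat>
      <p>A coefficient of determination well below 1 on real data would indicate that the exponential curve captures the overall growth but leaves considerable year-to-year variation unexplained.</p>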
      <p>We examined the collaboration network of organizations
(Fig. 3). The minimum number of
articles per organization was set to five; of 684 organizations, 64
met this threshold and were grouped into 6 clusters. Fig. 3 shows
that only a small number of universities interact. The
largest cluster included 11 European universities and
organizations: Dublin City University; Fondazione Bruno
Kessler; German Research Center for Artificial
Intelligence; National University of Distance Education;
Trinity College Dublin; University of Alicante; University
of Edinburgh; University of Sheffield; University of The
Basque Country; University of Trento; University of
Wolverhampton.</p>
      <p>We examined the co-authorship network by country; the
minimum number of articles per author was set to
five. Of the 64 countries (2825 authors), 35 are linked in
five clusters (Fig. 4). The two largest clusters each included 9
countries. The first cluster included Austria,
Canada, China, India, Iran, Japan, the Netherlands,
Taiwan, and the USA. The second cluster included
Belgium, Finland, France, Greece, Slovenia,
Spain, Switzerland, Tunisia, and Great Britain.</p>
      <p>In recent years, the citation index has become the main measure
of the value of both a scientist and an institution, so we
examined citation networks.</p>
      <p>
        We examined the citation network for documents; the
minimum number of publications per author was set
to ten. Out of 2050, 298 authors were identified in 14
clusters (Fig. 5).
The most cited author is Navigli, Roberto (759
citations) [
        <xref ref-type="bibr" rid="ref29 ref30">29, 30</xref>
        ]. Rosso,
Paolo (239) and Moens, Marie-Francine (216) also have more than 200 citations [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>VOSviewer also supports overlay visualizations. In an
overlay visualization, the color of a node indicates a specific
property of the node, for example, the year of publication.
We presented the author citation network as an overlay
visualization to assess citation trends (Fig. 6). The
figure shows that R. Navigli is the founder of research in this
area.</p>
      <p>A citation network by authors was built. The minimum
number of publications per author was set to five.
Out of 2050, 16 authors were identified in 2 clusters (Fig.
7). Mittal Namitali (2017), Rettinger Achim, Gipp Bela,
Li Juanzi and Zhan Lei (2016) are the most recent of the
most cited authors.</p>
      <p>The geographical aspects of citation were also considered.
A citation network for countries was built with a
minimum of five publications; 34 countries were
identified (Fig. 8). The geography of research
is expanding: in recent years, Brazil, the Czech Republic,
Iran, Egypt and Tunisia have joined the research.</p>
    </sec>
    <sec id="sec-4">
      <title>5. Conclusions</title>
      <p>A bibliometric study of publications on the topic
“Cross-lingual Semantic Similarity”, available in the
Dimensions database, was carried out. In recent years,
there has been a significant increase in research.
The largest cluster of collaborating organizations
included 11 European universities and organizations from
Ireland, Italy, Germany, Spain, Scotland, and Great
Britain.</p>
      <p>Visualization of the co-authorship network by country
showed that 35 countries interact in research and are
connected in five clusters. The two largest clusters each
included 9 countries; the leading countries in them
were the USA and China, and
Great Britain and Spain, respectively.</p>
      <p>
        The visualization of the citation network revealed 298
of the most cited authors out of 2050. The most cited
author is Navigli, Roberto (759 citations). Rosso, Paolo (239) and Moens,
Marie-Francine (216) also have more than 200 citations [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        Overlay visualization made it possible to evaluate the
citation trends of the authors; it turned out that the most
cited author, Navigli, Roberto, is also the founder of
research in this field [
        <xref ref-type="bibr" rid="ref29 ref30">29, 30</xref>
        ].
      </p>
      <p>The most recent of the most cited authors are Mittal Namitali
(2017), Rettinger Achim, Gipp Bela, Li Juanzi
and Zhan Lei (2016).</p>
      <p>Consideration of the geographical aspects of citation
showed an expansion of the geography of research: in
recent years, Brazil, the Czech Republic, Iran, Egypt and
Tunisia have joined the research.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgment</title>
      <p>The reported study was funded by RFBR according to
the research projects № 18-07-00225, 18-07-00909,
18-07-01111 and 20-04-60185.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Pandit</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sengupta</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Naskar</surname>
            ,
            <given-names>S.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dash</surname>
            ,
            <given-names>N.S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Sardar</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Improving Semantic Similarity with Cross-Lingual Resources: A Study in Bangla - A Low Resourced Language</article-title>
          . Informatics,
          <volume>6</volume>
          , 19; doi:10.3390/informatics6020019
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Vulic</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Smet</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Moens</surname>
            ,
            <given-names>M.-F.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Identifying word translations from comparable corpora using latent topic models</article-title>
          .
          <source>In Proceedings of ACL</source>
          , pages
          <fpage>479</fpage>
          -
          <lpage>484</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Prochasson</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Fung</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Rare word translation extraction from aligned comparable documents</article-title>
          .
          <source>In Proceedings of ACL</source>
          , pages
          <fpage>1327</fpage>
          -
          <lpage>1335</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Hotho</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nürnberger</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Paaß</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2005</year>
          ).
          <article-title>A brief survey of text mining</article-title>
          .
          <source>In Ldv Forum</source>
          , Vol.
          <volume>20</volume>
          (
          <issue>1</issue>
          ), p.
          <fpage>19</fpage>
          -
          <lpage>62</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Hassani</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beneki</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Unger</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mazinani</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Yeganegi</surname>
            ,
            <given-names>M.R.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>Text Mining in Big Data Analytics</article-title>
          .
          <source>Big Data Cogn. Comput</source>
          .
          <year>2020</year>
          ,
          <volume>4</volume>
          , 1; doi:10.3390/bdcc4010001.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Porter</surname>
            ,
            <given-names>A. L.</given-names>
          </string-name>
          (
          <year>2005</year>
          ).
          <source>Tech Mining. Competitive Intelligence Magazine</source>
          .
          <volume>8</volume>
          (
          <issue>1</issue>
          ):
          <fpage>30</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Ali</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alfayez</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Alquhayz</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Semantic Similarity Measures Between Words: A Brief Survey</article-title>
          .
          <source>Sci. Int. (Lahore)</source>
          ,
          <volume>30</volume>
          (
          <issue>6</issue>
          ),
          <fpage>907</fpage>
          -
          <lpage>914</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chi</surname>
            ,
            <given-names>Y. C.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Hsin</surname>
            ,
            <given-names>P. L.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Constructing Patent Maps Using Text Mining to Sustainably Detect Potential Technological Opportunities</article-title>
          . Sustainability,
          <volume>10</volume>
          , 3729; doi:10.3390/su10103729.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Grappiolo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Gerwen</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verhoosel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Somers</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>The Semantic Snake Charmer Search Engine: A Tool to Facilitate Data Science in High-tech Industry Domains</article-title>
          .
          <source>In Proceedings of the 2019 Conference on Human Information Interaction and Retrieval (CHIIR '19)</source>
          .
          <publisher-name>Association for Computing Machinery</publisher-name>
          , New York, NY, USA,
          <fpage>355</fpage>
          -
          <lpage>359</lpage>
          . DOI:https://doi.org/10.1145/3295750.3298915.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Jarmasz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Szpakowicz</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2003</year>
          ).
          <article-title>Roget's Thesaurus and Semantic Similarity</article-title>
          .
          <source>Recent Adv. Nat. Lang. Process. III Sel. Pap. from RANLP</source>
          , vol.
          <volume>111</volume>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Islam</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Inkpen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Unsupervised Near-Synonym Choice using the Google Web 1T</article-title>
          .
          <source>ACM Trans. Knowl. Discov. Data</source>
          , vol. V, no.
          <source>June</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>O'Shea</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bandar</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crockett</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>McLean</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>A Comparative Study of Two Short Text Semantic Similarity Measures</article-title>
          .
          <source>In Agent and Multi-Agent Systems: Technologies and Applications</source>
          , vol.
          <volume>4953</volume>
          ,
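          <string-name>
            <given-names>N.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Jo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Howlett</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Jain</surname>
          </string-name>
          , Eds.
          <publisher-name>Springer Berlin Heidelberg</publisher-name>
          , pp.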
          <fpage>172</fpage>
          -
          <lpage>181</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Semantic matching in search</article-title>
          .
          <source>Foundations and Trends in Information Retrieval</source>
          ,
          <volume>7</volume>
          (
          <issue>5</issue>
          ):
          <fpage>343</fpage>
          -
          <lpage>469</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Mitchell</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Lapata</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Composition in distributional models of semantics</article-title>
          .
          <source>Cognitive Science</source>
          ,
          <volume>34</volume>
          (
          <issue>8</issue>
          ),
          <fpage>1388</fpage>
          -
          <lpage>1429</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Latent topic modelling of word co-occurrence information for spoken document retrieval</article-title>
          .
          <source>In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP</source>
          <year>2009</year>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>3961</fpage>
          -
          <lpage>3964</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Kenter</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Rijke</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Short Text Similarity with Word Embeddings</article-title>
          .
          <source>CIKM '15 Proceedings of the 24th ACM International on Conference on Information and Knowledge Management October 19-23</source>
          , Melbourne, Australia, pp.
          <fpage>1411</fpage>
          -
          <lpage>1420</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Atoum</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Efficient Hybrid Semantic Text Similarity using Wordnet and a Corpus</article-title>
          .
          <source>International Journal of Advanced Computer Science and Applications (IJACSA)</source>
          , Vol.
          <volume>7</volume>
          , No.
          <issue>9</issue>
          , pp.
          <fpage>124</fpage>
          -
          <lpage>130</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Magerman</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Looy</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baesens</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Debackere</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Assessment of Latent Semantic Analysis (LSA) text mining algorithms for large scale mapping of patent and scientific publication documents</article-title>
          . Department of Managerial Economics, Strategy and Innovation (MSI),
          <month>October</month>
          , 77 p.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Marelli</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bentivogli</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baroni</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernardi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Menini</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Zamparelli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>SemEval-2014 Task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment</article-title>
          .
          <source>SemEval-2014</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Batet</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Sánchez</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Ontology Selection for Semantic Similarity Assessment</article-title>
          .
          <source>ICAART</source>
          <year>2015</year>
          , Lisbon, Portugal, vol.
          <volume>2</volume>
          . https://www.researchgate.net/publication/283877653
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Assessing Text Semantic Similarity Using Ontology</article-title>
          .
          <source>Journal of Software</source>
          , vol.
          <volume>9</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>490</fpage>
          -
          <lpage>497</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Maheswari</surname>
            ,
            <given-names>J.U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karpagam</surname>
            ,
            <given-names>G.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Indhumathy</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Comparison of Web Service Similarity Assessment Methods</article-title>
          .
          <source>International Journal of Computer Applications</source>
          (
          <issn>0975-8887</issn>
          ), vol.
          <volume>98</volume>
          , no.
          <issue>22</issue>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Moen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Distributional Semantic Models for Clinical Text Applied to Health Record Summarization</article-title>
          . Thesis for the Degree of Philosophiae Doctor. Trondheim,
          <month>May</month>
          .
          <publisher-name>NTNU (Norwegian University of Science and Technology, Faculty of Information Technology)</publisher-name>
          , 93 p.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Guessoum</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miraoui</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tadj</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Survey of Semantic Similarity Measures in Pervasive Computing</article-title>
          .
          <source>International Journal on Smart Sensing and Intelligent Systems</source>
          , vol.
          <volume>8</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>125</fpage>
          -
          <lpage>158</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Arora</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>A simple but tough-to-beat baseline for sentence embeddings</article-title>
          .
          <source>In Proceedings of ICLR</source>
          <year>2017</year>
          . https://openreview.net/pdf?id=SyK00v5xx.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <surname>Conneau</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiela</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwenk</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barrault</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Bordes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Supervised learning of universal sentence representations from natural language inference data</article-title>
          .
          <source>CoRR</source>
          , abs/1705.02364. http://arxiv.org/abs/1705.02364.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <surname>Pagliardini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Jaggi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features</article-title>
          . arXiv https://arxiv.org/pdf/1703.02507.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <surname>Ferrero</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Besacier</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwab</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Agnes</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Using Word Embedding for Cross-Language Plagiarism Detection</article-title>
          .
          <source>In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017)</source>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          , Valencia, Spain, volume
          <volume>2</volume>
          , pages
          <fpage>415</fpage>
          -
          <lpage>421</lpage>
          . http://aclweb.org/anthology/E/E17/E17-2066.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <surname>Camacho-Collados</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Navigli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Find the word that does not belong: A framework for an intrinsic evaluation of word vector representations</article-title>
          .
          <source>In Proceedings of the ACL Workshop on Evaluating Vector Space Representations for NLP</source>
          . Berlin, Germany, pages
          <fpage>43</fpage>
          -
          <lpage>50</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <surname>Camacho-Collados</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pilehvar</surname>
            ,
            <given-names>M. T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Collier</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Navigli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>SemEval-2017 Task 2: Multilingual and cross-lingual semantic word similarity</article-title>
          .
          <source>In Proceedings of SemEval</source>
          . Vancouver, Canada.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>arXiv preprint arXiv:1301.3781</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <surname>Van Eck</surname>
            ,
            <given-names>N.J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Waltman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>How to normalize cooccurrence data? An analysis of some well-known similarity measures</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          ,
          <volume>60</volume>
          (
          <issue>8</issue>
          ),
          <fpage>1635</fpage>
          -
          <lpage>1651</lpage>
          . Khakimova Aida Kh., PhD, docent, Kama Institute (Naberezhnye Chelny, Russia), ANO «Scientific and Research Center for Information in Physics and Technique» (Nizhny Novgorod, Russia), E-mail: aida_khatif@mail.ru
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>