<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Image Tag Core Generation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>N. Sharonova</string-name>
          <email>nvsharonova@ukr.net</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Technical University “Kharkiv Polytechnic Institute”</institution>
          ,
          <addr-line>Kharkiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we explore the task of tag aggregation, i.e., merging tag meanings, for video and images. Building on our previous research, we merge the tag meanings of video files and present the results of our experiments with word2vec and clustering algorithms. We use the auto-tagging program from the Imagga company to generate tags. As data, we use 5 videos that were split into shots for further processing. Our experiments showed that clustering algorithms such as k-means and affinity propagation are not suitable for aggregating tag meanings. We therefore used the word2vec model from the spaCy software library and combined its similarity score with the score from the auto-tagging program. The results are not excellent, but they are better than those of the clustering algorithms: we obtained Fmean = 0.62. For a more detailed analysis of this task, we need to create a dataset with human annotations; it would allow a more precise evaluation of the Fmean measure of our approach.</p>
      </abstract>
      <kwd-group>
        <kwd>Image Description</kwd>
        <kwd>Video Description</kwd>
        <kwd>Image Tags</kwd>
        <kwd>Video Tags</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Aggregation of Video Tags</kwd>
        <kwd>Aggregation of Image Tags Meanings</kwd>
        <kwd>Word2vec</kwd>
        <kwd>Clustering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The variety of content on the Internet is enormous and grows drastically. Nowadays we can discover a large amount of video content and various images provided by social networks, professional stock image marketplaces, scientific communities, and other sources. This abundance of video content and images raises interest in the tasks of automatic text generation from images or video series. Popular tasks include creating subtitles, as well as generating a sentence or phrase from certain visual or image information. In this context, image processing and video processing are very close to each other and can use similar approaches, because a video can be divided into shots, where each shot represents an image.</p>
      <p>Generating text from images is an important topic in artificial intelligence, associated with pattern recognition, computer vision, and natural language processing. From the point of view of natural language processing, tasks such as image tagging, selecting keywords, evaluating the weight and relevance of keywords, and generating sentences and texts are of considerable interest.</p>
      <p>
        One of our goals is the construction of a system that optimizes the number of tags describing video resources without any loss of sense. We started our research by analyzing systems that generate descriptions for video and images and explored the main problems of this task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In our previous work [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], we concentrated on the problem of aggregating keywords into a single description of an object. Multimedia collections integrate electronic text, graphics, images, sound, and video. Objects are usually annotated with tags that characterize or describe them, or that refer to categories in certain classifications. These tags help to distinguish the objects and often form folksonomies: user-generated categories for organizing digital content. In [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], we showed how the preprocessing stage for optimizing the keyword sets of video fragments works, using NLP techniques and lexical resources for tag aggregation.
      </p>
      <p>The main purpose of this paper is to investigate the key factors that influence the similarity of the keywords that describe an image or video shot. To achieve our goal, we experiment with tag core creation based on an auto-tagging program, semantic word distance, and clustering algorithms.</p>
      <p>The paper is organized as follows: Section 2 discusses related work, similarity metrics for the aggregation of word meanings, and the similarity measures and algorithms applied in our experiments. The results and evaluation using different metrics and algorithms are reported in Section 3. Finally, in Section 4 we briefly sketch future work and present the conclusion.</p>
    </sec>
    <sec id="sec-2">
      <title>Background and Related Work</title>
      <p>
        Recent years have been characterized by intensive research on creating descriptions and keywords or tags for images and videos. Both large companies, such as Google and Microsoft, and smaller ones working in specific areas, for example, Clarifai (clarifai.com) or Imagga (imagga.com), are engaged in this task. Important prerequisites have been created in this area, and preliminary studies are being conducted, which explains such intensive development; for example, special image collections were created (e.g. ImageNet, Microsoft COCO, etc.). All this has allowed Google Brain researchers to automatically create captions that can accurately describe images. The authors of [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] provide a number of successful examples of the operation of this algorithm. Microsoft also has excellent results in this area.
      </p>
      <p>
        The task of evaluating word similarity is important in the semantic processing of image-related texts. Based on the state of the art, we found that researchers study this problem from two perspectives. The first is the problem of generating text from an image [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The second is the problem of generating images from natural language [
        <xref ref-type="bibr" rid="ref4 ref5 ref6 ref7">4-7</xref>
        ]. Analysis of publications shows the relevance of both problem statements [
        <xref ref-type="bibr" rid="ref3 ref6 ref7">3, 6, 7</xref>
        ]. Many authors note that existing approaches generate text descriptions from a sequence of images automatically. However, such sentence construction is based on rough concatenation of texts, which leads to the problem of generating semantically incoherent content. We can conclude that image-to-text generation is still an unsolved problem.
      </p>
      <p>
        Another task that many researchers are working on is generating an image based on a piece of text. Despite some progress in this area, a number of issues are still open. The authors of [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] study the current state of the art of such models. They note that recent progress has been made using Generative Adversarial Networks (GANs). Generative adversarial networks, driven by simple textual descriptions of images, are capable of generating realistic-looking images [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. However, current methods still struggle to generate images based on complex image captions from a heterogeneous domain. In addition, quantitatively evaluating these text-to-image synthesis models is a real challenge, because most assessment metrics only evaluate image quality and do not evaluate the correspondence between the image and its caption. The authors of [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] propose an approach to this issue based on a new evaluation metric.
      </p>
      <p>
        Several papers study particular semantic similarity evaluation metrics [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]. Semantic similarity between word pairs has become the most common evaluation benchmark for word embeddings [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ]. A large amount of research on semantic textual similarity is focused on creating modern embeddings. Paper [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] shows that including semantic information in a similarity measure improves its efficiency and provides human-interpretable results for further analysis. The authors of [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] note that little attention has been paid to similarity measures themselves; cosine similarity is used in the majority of cases. Paper [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] illustrates that for all common word vectors, cosine similarity is essentially equivalent to the Pearson correlation coefficient, which provides some justification for its use. Paper [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] reports experiments with a rank-based metric for word embeddings, which performs comparably to the vector cosine measure; the researchers suggest that rank-based measures can improve clustering quality. The analysis shows that many authors note the shortcomings of the cosine measure in problems of assessing the similarity of words and texts.
      </p>
      <p>
        The study of the state of the art shows that vector models have succeeded in tasks of semantic proximity. Lacking standardized evaluation methods for vector representations of words, the NLP community relies on word similarity tasks. Paper [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] notes that recent methods perform well in capturing semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque; the authors analyze and make explicit the model properties needed for such regularities to emerge in word vectors. Paper [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] presents several problems associated with the evaluation of word vectors on word similarity datasets and summarizes existing solutions. The study suggests that the use of word similarity tasks for the evaluation of word vectors is not sustainable and calls for further research on evaluation methods [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The authors of [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] evaluate a large number of word embedding models for language processing applications; based on six word embedding models, they provide experimental results and estimate their performance. Paper [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] is devoted to neural language models for word embeddings that capture rich linguistic and conceptual information. In paper [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], an unsupervised method to generate Word2Sense word embeddings is considered; the authors conclude that on computational NLP tasks, Word2Sense embeddings compare well with other word embeddings generated by unsupervised methods. As a result of the literature review, we established the impact of word-embedding-based methods on word similarity evaluation. The most popular similarity metric in semantic models is the vector cosine. Compared to Euclidean distance, the cosine measure is normalized and robust to the scaling effect. However, its limitation is that it does not take into account that some dimensions might be more relevant for the semantic content. This leads to the necessity of using and studying alternative metrics.
      </p>
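      <p>The normalization property mentioned above can be illustrated with a few lines of Python (toy vectors, not real embeddings): scaling a vector does not change its cosine similarity, while the Euclidean distance does change.</p>

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

v = [1.0, 2.0, 3.0]
w = [2.0, 4.0, 6.0]  # same direction, twice the length
print(cosine_similarity(v, w))   # ≈ 1.0: cosine ignores scaling
print(euclidean_distance(v, w))  # grows with the scale difference
```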
      <p>In our work, we try to apply different clustering algorithms based on different metrics. We also use word2vec, one of the most popular techniques, for the problem of unifying image tag meanings. Below we describe our experiments and results for image tag aggregation using all these methods.</p>
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <sec id="sec-3-1">
        <title>Data Set Description</title>
        <p>For our experiments we used five film fragments: Batmobile, FC Barcelona, Hunger Games, Meghan Trainor, and Remi Gaillard. All these films were divided into shots; the structure of these files is shown in Table 1. We obtained sets of tags for all video shots using the auto-tagging program from the Imagga company (https://imagga.com/).</p>
        <p>After removing all duplicate tags, we obtain the sets of tags shown in Fig. 1. At this stage we delete only repeated words, without any pre-processing or semantic analysis.</p>
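        <p>The duplicate-removal step is a plain, order-preserving deduplication; a minimal sketch (the tag names here are illustrative, not taken from the real data):</p>

```python
def remove_duplicates(tags):
    """Drop repeated tags, keeping the first occurrence and the original order."""
    return list(dict.fromkeys(tags))

shot_tags = ["light", "vehicle", "light", "night", "vehicle", "city"]
print(remove_duplicates(shot_tags))  # ['light', 'vehicle', 'night', 'city']
```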
        <p>
          Our task was to create a core of tags for each video without loss of sense. Initially, we experimented with clustering algorithms. In our case we do not know the number of clusters in advance, so we need a clustering algorithm that takes this feature into account. For our experiments we use the affinity propagation algorithm, a clustering algorithm based on the concept of "message passing" between data points [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. Unlike clustering algorithms such as k-means, affinity propagation does not require the number of clusters to be determined or estimated before running the algorithm. Affinity propagation finds "exemplars": members of the input set that are representative of clusters [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
        <p>To define the similarity measure, we use the Levenshtein distance. It is a string metric for measuring the difference between two sequences: informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into the other.</p>
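        <p>A minimal Python sketch of this distance (the standard dynamic-programming formulation); the pairwise distances, negated into similarities, can then be fed to an affinity propagation implementation such as scikit-learn's <monospace>AffinityPropagation(affinity='precomputed')</monospace>:</p>

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b (dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("bird", "blind"))      # 2: such close strings get clustered together
```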
        <p>Table 3 shows some centroids and the tags clustered with them; we show only five centroids for each film. Table 4 indicates the final number of core tags and examples of core tags.</p>
        <p>Table 3 (excerpt). Clusterization results: a centroid and the tags clustered with it.
light: bright, light, night;
vehicle: convertible, device, mechanical, office, recycle, vehicle;
colour: club, collar, color, colorful, colour, computer, ecology;
partners: happiness, letters, partner, partners, partnership, patriotism;
30s: 20s, 30, 30s, 3d, 40s;
friends: field, friends, friendship, greenhouse;
curtain: cartoon, currency, curtain, fountain, portrait, urban;
celebrate: beverage, celebrate, celebration, corporate;
cloud: child, close, closeup, clothes, cloud, crowd, tagcloud;
active: active, activity, attractive, autumn, fantasy;
broom: bathroom, broom, brush, room;
print: drink, grain, parquet, plant, pretty, print, spring, think;
health: adult, health, healthcare, healthy;
businesswoman: businessman, businesspeople, businesswoman;
blind: basin, bird, blind, blonde, lines, smiling;
person: expression, person, season, yellow;
fashion: family, fashion, fashionable, passion.</p>
        <p>Further clusters for the films Batmobile, FC Barcelona, Hunger Games, Meghan Trainor, and Remi Gaillard include:
cap, coat, fit, hair, happy, hat, head, hot, lab, shape, two;
health, healthcare, healthy, heart, vitality;
hairpiece, happiness, timepiece;
automobile, couple, mobile, model, movable;
ibizan, minibus, minivan;
dane, doberman, german, human, lab, lawn, man, men, tan;
bend, giant, hand, hands, island, plant, sand, sandbar;
barrier, retriever, tennis, terrier.</p>
        <p>Table 4. Final tags (total number of core tags / examples of core tags):
54 / light, vehicle, digital, people, backdrop, decoration, style, paper, man, businessman, auto, traffic, sport, automotive, adult, hand, friends, relationship, finance, partners, cart, etc.;
58 / ball, metal, celebrate, advertise, cloud, fare, association, packet, contestant, soccer, tree, outdoor, grass, pole, active, ring, cuisine, eating, team, friends, boy, looking, place, fence, sit, etc.;
76 / black, space, flower, decoration, element, style, water, cereal, agriculture, crop, country, health, sun, land, cloud, old, grunge, glass, ice, advertise, package, ornament, association, print, businesswomen, etc.;
65 / light, color, person, hat, health, hands, sensual, dress, child, style, internet, boy, suit, gold, relaxing, cream, eating, water, clothing, girl, spring, active, dance, desire, eating, house, etc.;
110 / mobile, trailer, tow, tree, man, mountain, horizon, natural, water, card, dog, holiday, active, sport, children, sun, animal, sea, terrier, swimming, enjoyment, health, romantic, rest, destination, etc.</p>
        <p>As Table 3 shows, the results are not very good: the algorithm merges such words as retriever and tennis, or bird and blonde. Therefore, this algorithm is not appropriate for our task.</p>
        <p>For a proper quality evaluation, however, we need a gold standard to compare our results against. Unfortunately, we do not have a dataset with correct core tags for each image in our collection. However, we can take one image and have humans evaluate its tags, selecting only the most important ones for that image. An image with its initial and human tags is presented in Table 5.</p>
        <p>This example confirms the clustering results: only 3 of 17 words, namely outdoor, fashion, and face, match the opinion of the experts. To improve the clustering results, we also represented the tags with the word2vec model and clustered them using the Euclidean metric; those results were also quite poor.</p>
        <p>As a result of the experiments, we decided not to use clustering and instead proposed our own algorithm for combining tags by meaning. The description of the algorithm and the results are given below.</p>
        <p>Table 5. Initial tags from the auto-tagging program:
tourist 52.69%, person 48.47%, traveler 31.54%, pedestrian 31.50%, attractive 30.69%, people 30.41%, adult 30.39%, street 28.71%, pretty 26.57%, cute 25.67%, smile 25.25%, outdoor 24.92%, business 23.29%, city 23.02%, building 22.73%, urban 22.57%, happy 21.12%, fashion 20.25%, lifestyle 19.78%, women 18.96%, man 18.68%, lady 18.58%, professional 18.05%, … (total 102 tags).
Human tags (core of tags): tourist, person, attractive, street, outdoor, business, fashion, women, student, face, bag, walking, communication (total 13 tags).
Tags from the clusterization algorithm: pretty, outdoor, fashion, man, face, model, walking, businessman, coat, education, style, architecture, successful, university, phone, shopping, travel (total 17 tags).</p>
      </sec>
      <sec id="sec-3-2">
        <title>Experiments with Similarity Measure</title>
        <p>As the basis for determining the similarity of words, our algorithm uses the word2vec model, which in our view is the most effective approach for this purpose. The whole algorithm is shown in Fig. 2: input tags → preprocessing → word2vec tag similarity → tag comparison, taking into account the score from the auto-tagging program → creation of the tag core.</p>
        <p>The proposed algorithm is not complicated, but it takes into account the semantic similarity of words computed with word2vec and the weights that the tags receive from the auto-tagging program.</p>
        <p>We take the word2vec model from the spaCy software library and compare each tag with the others on the list. If a tag does not have strong links with other words, we delete it; otherwise, we keep the tag, together with its high score, in the final set.</p>
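        <p>A minimal sketch of this filtering step, with the similarity function passed in as a parameter (in our pipeline it would be spaCy's token similarity; here a hypothetical toy lookup stands in for it, and the combination with the auto-tagging score is simplified to ranking):</p>

```python
def build_tag_core(tags, similarity, threshold=0.8):
    """Keep a tag only if it has a strong semantic link to some other tag.

    tags: dict mapping tag -> auto-tagging score;
    similarity: function (tag, tag) -> float in [0, 1].
    """
    core = {}
    for tag, score in tags.items():
        linked = [other for other in tags
                  if other != tag and similarity(tag, other) >= threshold]
        if linked:  # the tag is semantically supported by at least one other tag
            core[tag] = score
    # rank the surviving tags by their auto-tagging score
    return sorted(core, key=core.get, reverse=True)

# Hypothetical toy similarity table, for illustration only.
TOY_SIM = {frozenset(("sport", "playing")): 0.9,
           frozenset(("sport", "team")): 0.85}

def toy_similarity(a, b):
    return TOY_SIM.get(frozenset((a, b)), 0.1)

tags = {"sport": 0.45, "playing": 0.30, "team": 0.35, "cereal": 0.20}
print(build_tag_core(tags, toy_similarity))  # ['sport', 'team', 'playing']
```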
        <p>The results of these experiments are presented in Fig. 3. For further experiments, we took a similarity value of more than 0.8; this threshold was selected based on an analysis of the received tags and their number. As Fig. 3 shows, we obtain a fairly short list of tags when the similarity value is above 0.8. The top of the tag lists with a similarity measure above 0.8 is presented in Table 6; for the Batmobile film it is: black, men, success, sport, hand, color, women, playing, hand, smiling, black. Other top tags include: work, money, tasty, smiling, clothing, clothing, working, child, interior, dinner, cereal. Some of these tags could be filtered further: for example, we can use a part-of-speech tagger to determine the part of speech and keep only nouns in the core tag set. Also, tags such as playing and sport could be merged into one concept.</p>
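        <p>The noun-only filtering idea can be sketched as follows; the part-of-speech lookup here is a hypothetical hand-labeled table, whereas a real pipeline would call a POS tagger (e.g. from spaCy or NLTK):</p>

```python
# Hypothetical hand-labeled POS lookup, for illustration only.
POS = {"playing": "VERB", "sport": "NOUN", "smiling": "VERB",
       "interior": "NOUN", "dinner": "NOUN", "tasty": "ADJ"}

def keep_nouns(tags):
    """Filter a tag list down to nouns only, preserving order."""
    return [t for t in tags if POS.get(t) == "NOUN"]

print(keep_nouns(["playing", "sport", "smiling", "interior", "dinner", "tasty"]))
# ['sport', 'interior', 'dinner']
```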
        <p>For the evaluation, we used the METEOR metric. Unigram precision is calculated as P = m / w_t, where m is the number of unigrams in the candidate tag core that are also found in the human tag core, and w_t is the number of unigrams in the list produced by our algorithm. Unigram recall is computed as R = m / w_h, where m is as above and w_h is the number of unigrams in the human tag core. Precision and recall are combined using the harmonic mean, with recall weighted 9 times more than precision: Fmean = 10 P R / (R + 9 P).</p>
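        <p>The METEOR-style computation described above can be checked with a short Python function (toy tag lists, not the real data from Table 5):</p>

```python
def meteor_fmean(candidate, human):
    """Unigram precision, recall, and METEOR-style Fmean (recall weighted 9:1)."""
    m = len(set(candidate) & set(human))  # matched unigrams
    p = m / len(candidate)                # precision: matches / candidate size
    r = m / len(human)                    # recall: matches / human-core size
    fmean = 10 * p * r / (r + 9 * p)
    return p, r, fmean

candidate = ["tourist", "street", "outdoor", "cereal"]
human = ["tourist", "street", "outdoor", "fashion", "face"]
p, r, f = meteor_fmean(candidate, human)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.75 0.6 0.61
```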
        <p>For the example in Table 5, we obtained P = 0.58, R = 0.54, and Fmean = 0.62. The final tag core for this example is "smile, outdoor, business, women, work, student, face, bag, one, success, fashion, education".</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>In this paper, we presented a method for the unification of image tag meanings and showed that clustering algorithms are not effective for this task. We demonstrated how word2vec works for the aggregation of keyword sets for video fragments, using the score from an auto-tagging program, and presented statistical information about our experiments and results. The experiments showed that our approach to tag core creation still needs to be improved. For a qualitative analysis of the proposed approach, it is necessary to create a "gold" collection with sets of tags from users and then evaluate the accuracy of the proposed method.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Kanishcheva</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sharonova</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Image and Video Tag Aggregation</article-title>
          .
          <source>In: Supplementary Proceedings of the Seventh International Conference on Analysis of Images, Social Networks and Texts (AIST</source>
          <year>2018</year>
          ), pp.
          <fpage>161</fpage>
          -
          <lpage>172</lpage>
          . Moscow, Russia (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toshev</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erhan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge</article-title>
          .
          <source>In: IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          ,
          <volume>39</volume>
          ,
          <issue>4</issue>
          , pp.
          <fpage>652</fpage>
          -
          <lpage>663</lpage>
          . (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wei</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shan</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Keep it Consistent: Topic-Aware Storytelling from an Image Stream via Iterative Multi-agent Communication</article-title>
          . arXiv preprint arXiv:
          <year>1911</year>
          .
          <fpage>04192</fpage>
          . (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bodnar</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Text to Image Synthesis Using Generative Adversarial Networks</article-title>
          . arXiv preprint arXiv:
          <year>1805</year>
          .
          <fpage>00676</fpage>
          . (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Hinz</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heinrich</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wermter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Semantic Object Accuracy for Generative Text-to-Image Synthesis</article-title>
          . arXiv preprint arXiv:
          <year>1910</year>
          .
          <fpage>13321</fpage>
          . (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Agnese</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Herrera</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tao</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>A Survey and Taxonomy of Adversarial Neural Networks for Text-to-Image Synthesis</article-title>
          . arXiv preprint arXiv:
          <year>1910</year>
          .
          <fpage>09399</fpage>
          . (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Sommer</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iosifidis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Text-to-image synthesis method evaluation based on visual patterns</article-title>
          . (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Sitikhu</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pahi</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thapa</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shakya</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>A Comparison of Semantic Similarity Methods for Maximum Human Interpretability</article-title>
          . arXiv preprint arXiv:
          <year>1910</year>
          .
          <fpage>09129</fpage>
          . (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Zhelezniak</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Savkov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hammerla</surname>
            ,
            <given-names>N. Y.</given-names>
          </string-name>
          :
          <article-title>Correlation Coefficients and Semantic Textual Similarity</article-title>
          . arXiv preprint arXiv:
          <year>1905</year>
          .
          <fpage>07790</fpage>
          . (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Santus</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chersoni</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>A rank-based similarity metric for word embeddings</article-title>
          . arXiv preprint arXiv:
          <year>1805</year>
          .
          <fpage>01923</fpage>
          . (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Soler</surname>
            ,
            <given-names>A. G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Apidianaki</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Allauzen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Word Usage Similarity Estimation with Sentence Representations and Automatic Substitutes</article-title>
          .
          <source>arXiv preprint arXiv:1905.08377</source>
          . (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C. D.</given-names>
          </string-name>
          :
          <article-title>GloVe: Global Vectors for Word Representation</article-title>
          , https://nlp.stanford.edu/pubs/glove.pdf (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Faruqui</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsvetkov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rastogi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dyer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Problems With Evaluation of Word Embeddings Using Word Similarity Tasks</article-title>
          , pp.
          <fpage>30</fpage>
          -
          <lpage>35</lpage>
          . (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Evaluating word embedding models: Methods and experimental results</article-title>
          .
          <source>APSIPA Transactions on Signal and Information Processing</source>
          ,
          <volume>8</volume>
          ,
          <issue>E19</issue>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Hill</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jean</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Devin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Embedding word similarity with neural machine translation</article-title>
          .
          <source>arXiv preprint arXiv:1412.6448</source>
          . (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Panigrahi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simhadri</surname>
            ,
            <given-names>H. V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhattacharyya</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Word2Sense: Sparse Interpretable Word Embeddings</article-title>
          .
          <source>In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          , pp.
          <fpage>5692</fpage>
          -
          <lpage>5705</lpage>
          . (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Frey</surname>
            ,
            <given-names>B. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dueck</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Clustering by passing messages between data points</article-title>
          .
          <source>Science</source>
          <volume>315</volume>
          (
          <issue>5814</issue>
          ),
          <fpage>972</fpage>
          -
          <lpage>976</lpage>
          (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>