<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Triplet losses-based matrix factorization for robust recommendations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Flavio Giobergia</string-name>
          <email>flavio.giobergia@polito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Control and Computer Engineering</institution>
          ,
          <addr-line>Politecnico di Torino, Turin</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Much like other learning-based models, recommender systems can be affected by biases in the training data. While typical evaluation metrics (e.g. hit rate) are not concerned with them, some categories of final users are heavily affected by these biases. In this work, we propose using multiple triplet loss terms to extract meaningful and robust representations of users and items. We empirically evaluate the soundness of such representations through several “bias-aware” evaluation metrics, as well as in terms of stability to changes in the training set and agreement of the predictions variance w.r.t. that of each user.</p>
      </abstract>
      <kwd-group>
        <kwd>recommender systems</kwd>
        <kwd>matrix factorization</kwd>
        <kwd>contrastive learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Recommender systems are a fundamental part of almost
any experience of online users. The possibility of
recommending options tailored to each individual user is
one of the key contributors to the success of many
companies and services. The metrics that are commonly
used in the literature to evaluate these models (e.g. hit rate)
are typically only concerned with the overall quality of
the model, regardless of its behavior
on particular partitions of the data. As a result,
recommender systems typically learn the preferences of the
“majority”, which in turn implies a poorer quality of
recommendations for users and items that belong to the long tail
of the distribution. In an effort to steer the research
focus toward this problem, the EvalRS challenge [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] has been proposed.
This challenge, based on the RecList framework [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
proposes a recommendation problem with a multi-faceted
evaluation, where the quality of any solution is not only
evaluated in terms of overall performance, but also based
on the results obtained on various partitions of users and
items. In this paper, we present a possible recommender
system that addresses the problem proposed by EvalRS.
The solution is based on matrix factorization, with
an objective function that aligns users and items in
the same embedding space. The matrices are learned by
minimizing a loss function that includes multiple triplet
loss terms. Differently from what is typically done (i.e.
aligning an anchor user to a positive and a negative item),
in this work we propose additionally using triplet terms
for users and items separately.
      </p>
      <p>
        The full extent of the challenge is described in detail in
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In short, the goal of the challenge, held as EvalRS at CIKM 2022, is to recommend
songs to users.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>In this section we present the proposed methodology,
highlighting the main aspects of interest. No data
preprocessing has been applied to the original data, although
some approaches have been attempted (see Section 4).
The proposed methodology, as explained below, allows
ranking all items based on estimated compatibility with
any given user. We produce the final list of 
recommendations by stochastically selecting items from the ordered
list of songs, weighting each song with the inverse of its
position in the list.</p>
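      <p>The stochastic selection step described above can be illustrated with a minimal sketch. It assumes that the ranked list is already available, that items are drawn without replacement, and that the weight of the item at position r is 1/r; the function name <monospace>sample_recommendations</monospace> and its parameters are hypothetical and are not taken from the original implementation.</p>
      <preformat preformat-type="code">
```python
import numpy as np


def sample_recommendations(ranked_items, k, seed=None):
    """Draw k distinct items from a ranked list, weighting each by 1 / rank."""
    rng = np.random.default_rng(seed)
    # Positions start at 1 so that the top-ranked item has weight 1.
    ranks = np.arange(1, len(ranked_items) + 1)
    weights = 1.0 / ranks
    probs = weights / weights.sum()
    # Sampling without replacement is an assumption of this sketch.
    idx = rng.choice(len(ranked_items), size=k, replace=False, p=probs)
    return [ranked_items[i] for i in idx]


recommended = sample_recommendations(list(range(50)), k=10, seed=0)
```
      </preformat>
      <p>Items near the top of the list are selected more often, but lower-ranked items retain a non-zero probability of being recommended.</p>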
      <sec id="sec-2-1">
        <title>2.1. Loss definition</title>
        <p>
          Matrix factorization techniques have long been known
to achieve high performance in various recommendation
challenges [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. This approach consists in aligning
vector representations for two separate entities, users and
items (songs, in this case). This alignment task is a
recurring one: a commonly adopted approach to solving this
problem is through the optimization of a triplet loss [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>A triplet loss is a loss that requires identifying an
anchor point, as well as a positive and a negative point, i.e.
points that should either lie close to (positive) or far from
(negative) the anchor point.</p>
        <p>Users and songs can thus be projected to a common
embedding space in a way that users are placed close to
songs they like and away from songs they do not like.</p>
        <sec id="sec-2-1-1">
          <p>Footnote 1: https://github.com/fgiobergia/CIKM-evalRS-2022</p>
          <p>
            This can be done by choosing a user as the anchor, and
two songs as the positive and negative points. A
reasonable choice for the positive song is one that has been
listened to by the user. The choice for the negative song
is trickier. Random songs, or songs not listened to by the
user, are possible choices. However, more sophisticated
strategies can be adopted to choose negative points that
are difficult for the model to separate from the anchor.
These are called hard negatives and have been shown in
the literature to be beneficial to the training of models [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ].
          </p>
          <p>We decided to use a simple policy for the selection of
a negative song: a negative song for user <inline-formula><tex-math>u</tex-math></inline-formula> is extracted
from the pool of songs that have been listened to by one of
the nearest neighbors of <inline-formula><tex-math>u</tex-math></inline-formula> and have not been listened to by
<inline-formula><tex-math>u</tex-math></inline-formula>. By doing so, we aim to reduce the extent to which the
model relies on other users’ preferences to make a
recommendation. The neighbors of a user are obtained
by comparing the similarity between the embedding
representations of all users. Due to the computational cost of
this operation, it is only performed at the beginning of
each training epoch.</p>
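          <p>A minimal sketch of this negative selection policy is given below. It assumes cosine similarity between non-zero user vectors and a dense similarity matrix (which is what makes the step costly for many users); the helper names <monospace>nearest_neighbors</monospace> and <monospace>sample_negative</monospace>, and the toy data, are hypothetical.</p>
          <preformat preformat-type="code">
```python
import numpy as np


def nearest_neighbors(user_vecs, n_neighbors):
    """Indices of the n_neighbors most cosine-similar users for each user."""
    normed = user_vecs / np.linalg.norm(user_vecs, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # a user is not its own neighbor
    return np.argsort(-sim, axis=1)[:, :n_neighbors]


def sample_negative(user, listened, neighbors, rng):
    """Pick a song listened to by a neighbor of `user` but not by `user`.

    `listened` maps each user index to the set of song ids they listened to.
    Returns None when the neighbors offer no candidate song.
    """
    pool = set()
    for v in neighbors[user]:
        pool |= listened[int(v)]
    pool -= listened[user]
    if not pool:
        return None
    return int(rng.choice(sorted(pool)))


# Toy data: user 0 is closest to user 1, who listened to song 12.
rng = np.random.default_rng(0)
vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
listened = {0: {10, 11}, 1: {11, 12}, 2: {13}}
nbrs = nearest_neighbors(vecs, n_neighbors=1)
negative = sample_negative(0, listened, nbrs, rng)
```
          </preformat>
          <p>Recomputing <monospace>nearest_neighbors</monospace> only once per epoch, as described above, keeps the quadratic similarity computation off the per-batch path.</p>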
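          <p>We can thus define the triplets
(<inline-formula><tex-math>u_a</tex-math></inline-formula>, <inline-formula><tex-math>s_p</tex-math></inline-formula>, <inline-formula><tex-math>s_n</tex-math></inline-formula>) to be used
for the definition of a triplet loss. Here, <inline-formula><tex-math>u_a</tex-math></inline-formula> is used to
represent the vector for the anchor user, whereas <inline-formula><tex-math>s_p</tex-math></inline-formula> and
<inline-formula><tex-math>s_n</tex-math></inline-formula> represent the vectors for the positive and negative
songs, respectively.</p>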
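          <p>
            Similar approaches where users are aligned to songs
they did or did not like are Bayesian Personalized
Ranking (BPR) [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ], where negative songs are sampled
randomly, and WARP [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ], where negative items are sampled
so as to be “hard” (based on their proximity to the anchor
relative to the positive item). To improve the robustness of
the representations built, we are additionally interested
in aligning similar songs and similar users. To this end,
we introduce two additional triplet terms to the loss
function, one that is based on (<inline-formula><tex-math>s_a</tex-math></inline-formula>, <inline-formula><tex-math>s_p</tex-math></inline-formula>, <inline-formula><tex-math>s_n</tex-math></inline-formula>) and one on
(<inline-formula><tex-math>u_a</tex-math></inline-formula>, <inline-formula><tex-math>u_p</tex-math></inline-formula>, <inline-formula><tex-math>u_n</tex-math></inline-formula>).
Based on the previously defined concepts, we choose <inline-formula><tex-math>s_a</tex-math></inline-formula>
as a song listened to by <inline-formula><tex-math>u_a</tex-math></inline-formula>, and <inline-formula><tex-math>u_p</tex-math></inline-formula> and <inline-formula><tex-math>u_n</tex-math></inline-formula> as users who
respectively listened to <inline-formula><tex-math>s_p</tex-math></inline-formula> and <inline-formula><tex-math>s_n</tex-math></inline-formula>. Other alternatives have
been considered, but were ultimately not selected due to
a higher computational cost.
          </p>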
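          <p>We define the final loss as a weighted sum of the user-song, song-song and user-user triplet terms:
            <disp-formula id="eq1">
              <label>(1)</label>
              <tex-math>\mathcal{L} = \sum_i w_i \Big( \max\{ d(u_a, s_p) - d(u_a, s_n) + m_0,\, 0 \} + \lambda_1 \max\{ d(s_a, s_p) - d(s_a, s_n) + m_1,\, 0 \} + \lambda_2 \max\{ d(u_a, u_p) - d(u_a, u_n) + m_2,\, 0 \} \Big)</tex-math>
            </disp-formula>
          </p>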
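          <p>Where <inline-formula><tex-math>d(\cdot)</tex-math></inline-formula> is a distance function between any pair of
vectors. In this work, the cosine distance is used. <inline-formula><tex-math>m_0</tex-math></inline-formula>, <inline-formula><tex-math>m_1</tex-math></inline-formula> and <inline-formula><tex-math>m_2</tex-math></inline-formula> are the
margins enforced between positive and negative pairs. In
this work, since all elements are projected to a common
embedding, we used <inline-formula><tex-math>m_0 = m_1 = m_2</tex-math></inline-formula>. Finally, <inline-formula><tex-math>w_i</tex-math></inline-formula> is a
weight that is assigned to each entry, which is discussed
in Subsection 2.2.</p>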
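          <p>To make the structure of the loss concrete, the following is a minimal NumPy sketch of its forward computation: three hinge terms (user-song, song-song, user-user) over cosine distances, combined per entry and scaled by the entry weight. It is an illustration under stated assumptions rather than the implementation used in this work: the coefficient names <monospace>lam_song</monospace> and <monospace>lam_user</monospace>, the function names, and the use of a single shared margin are choices made for this sketch, and no gradient computation is shown.</p>
          <preformat preformat-type="code">
```python
import numpy as np


def cosine_distance(a, b):
    """Row-wise cosine distance between two (n, d) arrays of non-zero vectors."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return 1.0 - num / den


def triplet_term(anchor, positive, negative, margin):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    gap = cosine_distance(anchor, positive) - cosine_distance(anchor, negative)
    return np.maximum(gap + margin, 0.0)


def total_loss(u_a, u_p, u_n, s_a, s_p, s_n, w, margin, lam_song, lam_user):
    """Weighted sum over entries of the three triplet terms."""
    user_song = triplet_term(u_a, s_p, s_n, margin)
    song_song = triplet_term(s_a, s_p, s_n, margin)
    user_user = triplet_term(u_a, u_p, u_n, margin)
    per_entry = user_song + lam_song * song_song + lam_user * user_user
    return float(np.sum(w * per_entry))
```
          </preformat>
          <p>Each term is zero as soon as the negative is farther from the anchor than the positive by at least the margin, so only violating triplets contribute to the loss.</p>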
          <p>[Figure: effect of the user-song, song-song and user-user loss terms on the embedding vectors learned (user<sub>anc</sub>, user<sub>pos</sub>, user<sub>neg</sub>, song<sub>anc</sub>, song<sub>pos</sub>, song<sub>neg</sub>). Arrow directions represent whether elements
are pulled towards or pushed away from the anchors.]</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Popularity weight</title>
        <p>
          To make the minority entities more relevant, we adopted
a weighting scheme that modulates the previously
described loss so as to weigh rows more if they belong to
“rarer” entities and less if they belong to common ones. In accordance
with [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], we identified five factors to be taken into account.
        </p>
        <p>Based on these, a coefficient has been defined for each
entry in the training set. The final weight is given by
a weighted average of these coefficients. The following
is a list of factors, along with the way the respective
coefficients have been computed (logarithms are used for
factors that follow a power law distribution). All
coefficients are normalized to sum to 1 across the respective
population.</p>
        <list list-type="bullet">
          <list-item>
            <p>Gender: in accordance with the original dataset, a relevance coefficient is provided for the categories male, female, and undisclosed (see footnote 2). The coefficient is proportional to the inverse of the occurrences of each gender in the list of known users.</p>
          </list-item>
          <list-item>
            <p>Country: the coefficient related to the country is calculated as the inverse of the logarithm of the number of occurrences of the specific country of the users in the training set.</p>
          </list-item>
          <list-item>
            <p>Artist popularity: a proxy for the popularity of an artist is obtained by the number of times songs by that artist have been played in the training set. The inverse logarithm of this quantity is used as the coefficient.</p>
          </list-item>
          <list-item>
            <p>Song popularity: a proxy for the popularity of a song is provided by the number of times that song has been played in the training set. The inverse logarithm of this quantity is used as the coefficient.</p>
          </list-item>
          <list-item>
            <p>User activity: the overall activity of a user can be quantified in terms of the number of songs that they have listened to across the training set. The inverse logarithm of this quantity is used as the coefficient.</p>
          </list-item>
        </list>
        <p>Footnote 2: This simplified perspective on gender does not reflect that of the […]</p>
        <p>The weighted sum of the above-mentioned coefficients
constitutes the weight <inline-formula><tex-math>w_i</tex-math></inline-formula> in Equation 1. The weights used
for each coefficient have been searched as a part of the
tuning of the model and are presented in Section 3.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Model initialization</title>
        <p>
          The initial values assigned to the users’ and items’
vectors greatly affect the entire learning process. A good
initialization can make the convergence process faster
and/or allow reaching a better minimum. We used
initial vectors for users and items based on an adaptation
of the word2vec algorithm [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. We built a corpus of sentences, one for each known song, composed of the users
who listened to that song, artists and albums, all in the
form token-type=token-value (e.g. song=1234). We then
trained word2vec to learn representations for all of the
tokens involved. We used as initial vectors the vectors
obtained for the user and song tokens.
        </p>
        <p>Word2vec places tokens close in the embedding space
based on their adoption in similar contexts. For this reason,
based on the definition of sentences, this approach
already brings close users with similar tastes (in terms
of songs, artists, albums), as well as similar songs (in
terms of users that listened to them, artists that produced
them, albums they are found in).</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Model consistency</title>
        <p>As we will discuss in Section 3, we empirically observed
that the proposed solution presents high variance in the
performance obtained across the various folds. While
this is not directly measured as a part of the core metrics
of EvalRS, we still believe it is important to account for
this aspect in a well-rounded evaluation.</p>
        <p>We therefore introduce the consistency metric, which
quantifies the variance of the model when tested across
multiple folds, or datasets. A higher variance in
performance would be associated with a lower consistency
(or higher inconsistency). For a single metric, the
consistency could be defined as the variance of the metric across
the folds. However, when multiple metrics are involved
(as is the case with this competition), a normalization step
should be introduced. We thus instead use the coefficient
of variation, defined as the standard deviation divided by
the mean value, to quantify the inconsistency of a model
with respect to a metric <inline-formula><tex-math>m</tex-math></inline-formula>. We compute the consistency
for a metric as 1 - inconsistency. The overall consistency
is therefore computed as the mean consistency across all
metrics:
          <disp-formula id="eq2">
            <label>(2)</label>
            <tex-math>C = \frac{1}{|M|} \sum_{m \in M} \left( 1 - \frac{\sigma_m}{|\mu_m|} \right)</tex-math>
          </disp-formula>
        </p>
        <p>Where <inline-formula><tex-math>M</tex-math></inline-formula> represents the set of all metrics used, while
<inline-formula><tex-math>\mu_m</tex-math></inline-formula> and <inline-formula><tex-math>\sigma_m</tex-math></inline-formula> are the arithmetic mean and standard
deviation computed over all the folds, for a metric <inline-formula><tex-math>m</tex-math></inline-formula>. We
use the absolute value of the mean to make the results
comparable regardless of sign. Alternatively, the ratio
<inline-formula><tex-math>\sigma_m^2 / \mu_m^2</tex-math></inline-formula> could be used to assign a lower penalty in case
of small deviations. The maximum possible consistency,
1, would be assigned to a model that presents the same
exact performance across all folds for all metrics. Section
3 reports the consistency, measured in these terms, for
the proposed solution.</p>
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Variance agreement</title>
        <p>Different users may have different interests in terms of
variety. In the “music” context a user may, for example,
listen to songs from very few authors, whereas others
may be more interested in a wider variety of artists. A
similar concept may be applied to other contexts (e.g.
in terms of brand loyalty for products). It is therefore
desirable that a recommender system should provide a
wider variety of recommendations for users that are
inclined to them, and vice versa. We introduce the concept
of variance agreement w.r.t. a variable, which quantifies
how the variance in recommendations correlates to each
user’s interest in variance, as dictated by their previous
interactions, in terms of the variable of interest. In this
context, we use the artists that produced songs as the
variable of interest.</p>
        <p>We quantify the variance of a set of songs as the Gini
impurity over that set, where each song is mapped to the
respective artist. We can thus assign an impurity to any
given user, <inline-formula><tex-math>I_u</tex-math></inline-formula>, as the impurity within the set of songs
they listened to in the training set. For that same user,
we can define the model’s impurity, <inline-formula><tex-math>\hat{I}_u</tex-math></inline-formula>, as the impurity
of the set of <inline-formula><tex-math>k</tex-math></inline-formula> songs recommended by the model for that
user.</p>
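        <p>As a small illustration of the impurity used in this subsection, the sketch below computes the Gini impurity of a set of songs after mapping each song to its artist. The function names and the <monospace>song_to_artist</monospace> mapping are hypothetical; the same function would be applied to a user's listening history and to the list of songs recommended for that user.</p>
        <preformat preformat-type="code">
```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity, 1 - sum_k p_k^2, of a non-empty sequence of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def song_set_impurity(song_ids, song_to_artist):
    """Impurity of a set of songs, each mapped to its artist."""
    return gini_impurity([song_to_artist[s] for s in song_ids])


# Toy example: four songs split evenly between two artists.
song_to_artist = {1: "artist_x", 2: "artist_x", 3: "artist_y", 4: "artist_y"}
impurity = song_set_impurity([1, 2, 3, 4], song_to_artist)
```
        </preformat>
        <p>An impurity of 0 corresponds to a single artist, while values closer to 1 correspond to songs spread over many artists.</p>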
        <p>If <inline-formula><tex-math>I_u</tex-math></inline-formula> is low, the user listens to a limited set of artists
(if 0, the user has only listened to one artist in all of their
interactions). Similarly, if <inline-formula><tex-math>\hat{I}_u</tex-math></inline-formula> is low, the model is
recommending songs from a limited set of artists.</p>
        <p>To measure the agreement between the users’ and the model’s
variance, we compute the Pearson correlation on the
paired data <inline-formula><tex-math>[(I_u, \hat{I}_u) \mid u \in U]</tex-math></inline-formula>, with <inline-formula><tex-math>U</tex-math></inline-formula> being the set of all
users. Note that this measure does not capture whether
the artists the user listens to are the same ones being
recommended: that information may be quantified by
other metrics concerned with the accuracy of models,
rather than their suitability over a heterogeneous set of
users.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental results</title>
      <p>
        In this section we present the results obtained in terms
of the main metrics identified by [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], as well as some
additional considerations on the proposed solution.
      </p>
      <p>The model has been trained and fine-tuned to identify
well-performing values for the main hyperparameters.
The best configuration of parameters found is reported
in Table 1.</p>
      <p>[Table 1: best configuration of hyperparameters found. Recoverable parameter labels: <inline-formula><tex-math>m_0 = m_1 = m_2</tex-math></inline-formula>, two coefficients with subscripts 1 and 2, and further symbols that are not legible. Recoverable values, in source order: 128, 2.5, 2.5, 0.25, 5, 100, 104, 105, 104.]</p>
      <p>[…] solution that outperforms all others across all metrics. This
is due to the multi-faceted nature of the overall score
function.</p>
      <p>Despite the efforts made toward reducing the effect
of the dataset imbalances on the final model, we still
observed that the performance of the model is not always
consistent. In other words, there is a relatively high
variance in the performance across the various folds.
The consistency metric
provides a very interesting perspective on the strengths
and weaknesses of the proposed solution. In particular,
the model is highly inconsistent for some of the
fairness-oriented metrics, as highlighted by the low consistency
obtained for track popularity and gender. While this does
not necessarily imply poor performance, it is a symptom
that the model may be susceptible to fluctuations in
performance as the dataset used for training is changed.
Other metrics, such as the behavioral and the “standard”
ones, show instead a more consistent behavior.</p>
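      <p>The consistency metric of Section 2.4 (one minus the coefficient of variation, averaged over metrics) can be sketched as follows. This is an illustration under assumptions: the population standard deviation is used, since the text does not specify the estimator, every metric is assumed to have a non-zero mean across folds, and the metric names in the example are placeholders rather than results from this work.</p>
      <preformat preformat-type="code">
```python
import numpy as np


def consistency(scores_by_metric):
    """Mean over metrics of 1 - std / |mean|, computed across folds."""
    values = []
    for fold_scores in scores_by_metric.values():
        scores = np.asarray(fold_scores, dtype=float)
        # Coefficient of variation, using the absolute value of the mean.
        inconsistency = scores.std() / abs(scores.mean())
        values.append(1.0 - inconsistency)
    return float(np.mean(values))


# Placeholder per-fold scores for two hypothetical metrics.
example = {"metric_a": [0.2, 0.2, 0.2], "metric_b": [1.0, 3.0]}
overall = consistency(example)
```
      </preformat>
      <p>A metric that is identical on every fold contributes a consistency of 1, while a metric whose standard deviation approaches the magnitude of its mean contributes a value close to 0.</p>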
      <p>We additionally evaluated the proposed methodology
in terms of variance agreement for the “artist” variable.
We achieved an agreement of 0.2479, whereas a random
model would achieve ≈ 0. This indicates that the model
does take into account, to some extent, the individual
user’s variance preference. However, there is room for
improvement in these terms.</p>
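      <p>For reference, the variance agreement is the Pearson correlation between two aligned lists of impurities, one computed on each user's history and one on the model's recommendations for the same user. A minimal sketch, assuming that both lists have already been computed and are ordered by user (the function name and the toy values are hypothetical):</p>
      <preformat preformat-type="code">
```python
import numpy as np


def variance_agreement(user_impurities, model_impurities):
    """Pearson correlation between per-user and per-model impurities."""
    x = np.asarray(user_impurities, dtype=float)
    y = np.asarray(model_impurities, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


# Toy values only: the model's variety grows linearly with the users' variety.
agreement = variance_agreement([0.0, 0.4, 0.8], [0.1, 0.3, 0.5])
```
      </preformat>
      <p>Values close to 1 indicate that users with more varied histories receive more varied recommendations; values close to 0 indicate no linear relationship between the two.</p>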
      <sec id="sec-3-1">
        <title>3.1. Ablation study</title>
        <p>To understand the effect of the various choices made,
we introduce an ablation study where we remove some
portions of the proposed methodology. In particular,
we study the situations where (1) no user-user loss is
considered, (2) no item-item loss is considered, (3) a random
initialization is used instead of word2vec and (4) all training
records are weighted the same, regardless of their rarity.
Table 4 reports the overall score for all situations. From this we can observe that
all proposed approaches bring a benefit to the overall
result, with the removal of the additional loss terms having
the largest impact. It should be noted, however, that
this ablation study has been carried out using the
hyperparameters that produced the best performance for the
“full” approach. As such, the ablated performance may be
affected by a lack of hyperparameter fine-tuning, thus
possibly resulting in a lower score.</p>
        <p>[Table 4: Ablation study of the proposed solution. Performance is reported in terms of the overall score adopted for the competition.]</p>
        <p>Footnote 3: A score of -100 has been assigned to solutions that did not reach a
hit rate of 0.015.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Failed approaches</title>
      <disp-quote>
        <p>I have not failed. I’ve just found 10,000 ways that won’t work.</p>
        <attrib>Thomas A. Edison</attrib>
      </disp-quote>
      <p>In this section we describe some attempts that have
been made, but that have not brought any improvement
in terms of performance.</p>
      <p>Entity resolution: the list of known songs contains
some duplicates. We tried using a naive entity resolution
approach (songs with matching artists and matching
titles are considered to be the same song). Since this
problem affected only a small fraction (a few percent)
of songs, the ER step did not produce any significant
improvement and has thus been discarded.</p>
      <p>Dataset resampling: we attempted to resample the
training set with a weighting scheme similar to the one
already used to weigh each training sample based on its
uniqueness. Worse performance was observed as a
result of this approach: it can be argued that the reason
for this is that the resampling outright removes some
samples from the training set (points that are never
sampled), whereas using a weight for each row makes sure
that all rows are seen during training.</p>
      <p>Data augmentation: to increase the breadth of the data
available, we tried to synthesize new user-song
interactions, to be then used for training. In particular, we first
quantified the proclivity of users to listen to a limited
number of artists, by means of the Gini impurity (the
more homogeneous the choice of artists, the lower the
Gini index). We can then sample users based on this
factor, and add user-song relationships, where songs are
chosen to belong to the most “likely artists” (i.e. the
artists that are more commonly listened to by each sampled
user).</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>In this paper we presented a possible solution to the
EvalRS challenge. The solution uses matrix factorization
based on multiple triplet losses combined together to
align users and songs in the same space. A weighting
scheme has been introduced to assign more importance
to uncommon users/items, thus improving the quality
of the model in terms of fairness. By introducing the
consistency metric, we show some of the main weaknesses
of the proposed approach: namely, the fact that it is not
consistent w.r.t. some metrics. We consider this to be one
of the main problems to be addressed. We additionally
covered some of the failed attempts made, in the hope
that others will either not make the same mistakes, or
figure out how to improve upon them.</p>
      <sec id="sec-5-1">
        <title>Acknowledgments</title>
        <p>This work has been supported by the DataBase and Data
Mining Group and the SmartData center at Politecnico
di Torino.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Tagliabue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bianchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schnabel</surname>
          </string-name>
          , G. Attanasio,
          <string-name>
            <given-names>C.</given-names>
            <surname>Greco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. d. S. P.</given-names>
            <surname>Moreira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Chia</surname>
          </string-name>
          ,
          <article-title>EvalRS: a rounded evaluation of recommender systems</article-title>
          ,
          <year>2022</year>
          . URL: https://arxiv.org/abs/2207.05772. doi:
          <pub-id pub-id-type="doi">10.48550/ARXIV.2207.05772</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Chia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tagliabue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bianchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ko</surname>
          </string-name>
          ,
          <article-title>Beyond NDCG: behavioral testing of recommender systems with RecList</article-title>
          ,
          <source>in: Companion Proceedings of the Web Conference 2022</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>99</fpage>
          -
          <lpage>104</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schedl</surname>
          </string-name>
          ,
          <article-title>The LFM-1b dataset for music retrieval and recommendation</article-title>
          ,
          <source>in: Proceedings of the 2016 ACM on international conference on multimedia retrieval</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>103</fpage>
          -
          <lpage>110</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Koren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Volinsky</surname>
          </string-name>
          ,
          <article-title>Matrix factorization techniques for recommender systems</article-title>
          ,
          <source>Computer</source>
          <volume>42</volume>
          (
          <year>2009</year>
          )
          <fpage>30</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>F.</given-names>
            <surname>Schroff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kalenichenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Philbin</surname>
          </string-name>
          ,
          <article-title>FaceNet: A unified embedding for face recognition and clustering</article-title>
          ,
          <source>in: Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>815</fpage>
          -
          <lpage>823</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Xuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Stylianou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pless</surname>
          </string-name>
          ,
          <article-title>Hard negative examples are hard, but useful</article-title>
          ,
          <source>in: European Conference on Computer Vision</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>126</fpage>
          -
          <lpage>142</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rendle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Freudenthaler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Gantner</surname>
          </string-name>
          , L. Schmidt-Thieme,
          <article-title>BPR: Bayesian personalized ranking from implicit feedback</article-title>
          ,
          <source>arXiv preprint arXiv:1205.2618</source>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Weston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Usunier</surname>
          </string-name>
          ,
          <article-title>Large scale image annotation: learning to rank with joint word-image embeddings</article-title>
          ,
          <source>Machine Learning</source>
          <volume>81</volume>
          (
          <year>2010</year>
          )
          <fpage>21</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , I. Sutskever,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Corrado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>26</volume>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>