<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Interpretable Matrix Factorization with Stochasticity Constrained Nonnegative DEDICOM</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rafet Sifa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cesar Ojeda</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kostadin Cvejoski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christian Bauckhage</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fraunhofer IAIS</institution>
          ,
          <addr-line>Sankt Augustin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Bonn</institution>
          ,
          <addr-line>Bonn</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <abstract>
<p>Decomposition into Directed Components (DEDICOM) is a special matrix factorization technique that factorizes a given asymmetric similarity matrix into a combination of a loading matrix, describing the latent structures in the data, and an asymmetric affinity matrix, encoding the relationships between the found latent structures. Finding DEDICOM factors can be cast as a matrix norm minimization problem that requires alternating least squares updates to find appropriate factors. Yet, due to the way DEDICOM reconstructs the data, unconstrained factors might yield results that are difficult to interpret. In this paper we derive a projection-free gradient descent based alternating least squares algorithm to calculate constrained DEDICOM factors. Our algorithm constrains the loading matrix to be column stochastic and the affinity matrix to be nonnegative for more interpretable low rank representations. Additionally, unlike most of the available approximate solutions for finding the loading matrix, our approach takes all occurrences of the loading matrix into account to assure convergence. We evaluate our algorithm on a behavioral dataset containing pairwise asymmetric associations between a variety of game titles from an online platform.</p>
      </abstract>
      <kwd-group>
        <kwd>Unsupervised Learning</kwd>
        <kwd>Matrix Factorization</kwd>
        <kwd>Constrained Optimization</kwd>
        <kwd>Behavior Analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
<p>
        Matrix and tensor factorization methods have been widely used in data
science applications, mostly for understanding hidden patterns in datasets and
for learning representations for a variety of prediction tasks and recommender
systems [
        <xref ref-type="bibr" rid="ref1 ref10 ref11 ref13 ref16 ref17 ref5">1, 5, 10, 11, 13, 16, 17</xref>
        ]. Decomposition into Directed Components
(DEDICOM) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] is a matrix and tensor factorization technique and a compact way of
finding low rank representations from asymmetric similarity data. The method
has found applications in various data science problems, including the analysis of
temporal graphs [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], first-order customer migration [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], natural language
processing [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], spatio-temporal representation learning [
        <xref ref-type="bibr" rid="ref16 ref2">2, 16</xref>
        ] and link prediction in
relational and knowledge graphs [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], where the addressed problems are
primarily about analyzing asymmetric relationships between predefined
entities.
      </p>
      <p>Formally, given an asymmetric similarity matrix S ∈ R^(n×n) encoding
pairwise asymmetric relations (i.e. s_ij ≠ s_ji) among n objects, two-way DEDICOM
finds a loading matrix A ∈ R^(n×k) and a square affinity matrix R ∈ R^(k×k) for
representing S as
S ≈ A R A^T. (1)</p>
      <p>More precisely, the DEDICOM approximation of the asymmetric association
between elements i and j (i.e. s_ij) can be formulated in column vector notation as
s_ij ≈ a_i:^T R a_j: = Σ_{b=1}^{k} a_ib r_b:^T a_j: = Σ_{b=1}^{k} Σ_{c=1}^{k} a_ib r_bc a_jc, (2)
where a_i: and a_j: represent the ith and jth rows of A respectively and r_b:
represents the bth row of R. In this representation A contains the latent factors,
where the columns describe the hidden structures in S, and the relations among
these structures are encoded by the asymmetric affinity matrix R. A pictorial
illustration of the two-way DEDICOM factorization is shown in Fig. 1.</p>
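<p>As a small numerical illustration of (1) and (2) (our sketch, not from the paper; the sizes n and k are arbitrary), the three-factor reconstruction and its elementwise form can be checked with NumPy:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 3                      # n objects, k latent factors
A = rng.random((n, k))           # loading matrix A (n x k)
R = rng.random((k, k))           # asymmetric affinity matrix R (k x k)

# Three-factor DEDICOM reconstruction of the similarity matrix, eq. (1)
S_hat = A @ R @ A.T

# Elementwise form, eq. (2): s_ij = sum_b sum_c a_ib * r_bc * a_jc
i, j = 1, 4
s_ij = sum(A[i, b] * R[b, c] * A[j, c] for b in range(k) for c in range(k))

assert np.isclose(S_hat[i, j], s_ij)
# The reconstruction is asymmetric whenever R is asymmetric.
```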
      <p>
        Considering the three-factor approximations of the similarity values as in
(1) and (2), the interpretation (especially with respect to scale) of the hidden
structures from given factor matrices A and R might yield misleading results
in the presence of negative values [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. That is, the interpretation of the
results is limited to only considering nonnegative affinities in R or the positive or
negative loadings of the corresponding points in (2). Additionally, when
analyzing nonnegative data matrices (such as those containing probabilities or counts),
it is usually beneficial to consider nonnegative factor matrices, which can be
seen as condensed (or compressed) representations usable for
informed decision making and representation learning [
        <xref ref-type="bibr" rid="ref1 ref15 ref16 ref8">1, 8, 15, 16</xref>
        ].
      </p>
      <p>
        In this work we address the issue of interpretability of the DEDICOM
factors by introducing a converging algorithm as an alternative to the previously
proposed methods in [
        <xref ref-type="bibr" rid="ref1 ref13 ref15 ref16">1, 13, 15, 16</xref>
        ]. Our method constrains the loading matrix A
to contain column stochastic vectors and (similar to [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ]) the affinity matrix
R to be nonnegative. Formally, this amounts to factorizing a given asymmetric
similarity matrix S as in (1) while enforcing
r_ij ≥ 0 ∀ {i, j} (3)
and
a_cb ≥ 0 ∧ Σ_{q=1}^{n} a_qb = 1 ∀ {c, b}. (4)
      </p>
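<p>The constraints (3) and (4) amount to two small elementwise checks (a sketch of ours; the helper name is hypothetical): R must be elementwise nonnegative, and every column of A must be a probability vector:</p>

```python
import numpy as np

def satisfies_constraints(A, R, tol=1e-8):
    """Check eq. (3): R nonnegative, and eq. (4): A column stochastic."""
    r_nonneg = np.all(R >= 0)                        # r_ij >= 0 for all i, j
    a_nonneg = np.all(A >= 0)                        # a_cb >= 0 for all c, b
    cols_sum_to_one = np.allclose(A.sum(axis=0), 1.0, atol=tol)
    return bool(r_nonneg and a_nonneg and cols_sum_to_one)

A = np.array([[0.2, 0.5],
              [0.8, 0.5]])        # each column sums to 1
R = np.array([[0.1, 0.9],
              [0.3, 0.0]])        # nonnegative and asymmetric

assert satisfies_constraints(A, R)
assert not satisfies_constraints(A - 0.3, R)   # negative entries violate (4)
```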
      <p>
        Compared to the additive-scaling based representation introduced in [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ],
constraining the columns of A to contain probabilities has the direct advantage
that each element of A can be interpreted as a representativeness value for the
particular loading it belongs to. That is, the value of a_ib represents how much the
ith element in the dataset contributes to the bth latent factor. This is indeed
parallel to the idea of Archetypal Analysis [
        <xref ref-type="bibr" rid="ref17 ref3 ref6">3, 6, 17</xref>
        ], where the mixture matrices
determine how much each data point contributes to constructing the archetypes
and how much the archetypes contribute to reconstructing each data point.
      </p>
      <p>In the following we will first describe the general alternating least squares
optimization framework and give an overview of the algorithms to find appropriate
constrained and unconstrained DEDICOM factors. After that we will introduce
our algorithm by first studying our problem setting and deriving the algorithm
step by step. This will be followed by a case study covering a real world
application, where we analyze game ownership patterns from data containing
asymmetric associations between games using the factors extracted by running
our algorithm. Finally, we conclude our paper with a summary and some
directions for future work.</p>
      <p>Algorithms for Finding DEDICOM Factors</p>
      <p>Finding appropriate DEDICOM factors for matrix decomposition can be cast as
a matrix norm minimization problem with the objective
E(A, R) = ‖S − A R A^T‖², (5)
which is minimized with respect to A and R. Non-convex minimization problems
of this kind usually follow an iterative alternating least squares (ALS) procedure,
where at each iteration we optimize over a selected factor treating the other
factors as fixed. It is important to note that the minimized objective function in
(5) is convex in R for fixed A but not convex in A for fixed R, which leads
to optimal and sub-optimal approximate solutions when respectively updating
R and A.</p>
      <p>
        Starting with defining update rules for the affinity matrix R, we note that
with fixed A, minimization of (5) is a matrix regression problem [
        <xref ref-type="bibr" rid="ref1 ref13 ref15">1, 13, 15</xref>
        ] with the
global optimum solution
R ← A† S (A†)^T, (6)
where A† indicates the Moore-Penrose pseudoinverse of A. Furthermore, if A is
constrained to be column orthogonal (i.e. A^T A = I_k, where I_k is the k × k identity
matrix), the update for R simplifies to R ← A^T S A [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Here the updates are
not constrained to fall into a particular domain and might result in
affinity matrices containing negative elements.
      </p>
      <p>
        Constraining R to contain only nonnegative values when minimizing (5) can be
cast as a nonnegative least squares problem by vectorizing the arguments of (5) as
E(R) = ‖vec(S) − vec(A R A^T)‖², (7)
where the vec operator vectorizes (or flattens) a matrix B ∈ R^(m×n) as
vec(B) = [b_11, …, b_m1, …, b_1n, …, b_mn]^T. (8)
It is worth mentioning that, since vectorizing (5) as shown in (7) does not affect
the value of the norm [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], we can make use of the identity vec(A R A^T) = (A ⊗ A) vec(R),
where ⊗ represents the Kronecker product, to rewrite (7) as
E(R) = ‖vec(S) − (A ⊗ A) vec(R)‖². (9)
As noted in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], the expression in
(9) can indeed be mapped to a constrained linear regression problem [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] with a
converging result (through an active set algorithm) to define an update that sets
R to the reshaped solution of this nonnegative least squares problem. (10)
      </p>
      <p>
        For the loading matrix A, one class of approximate ALS updates stacks the
similarity matrix and its transpose as
[S  S^T] = [A R A^T  A R^T A^T] = A [R A^T  R^T A^T]
and solves for A keeping the occurrences of A^T fixed [
        <xref ref-type="bibr" rid="ref1 ref13">1, 13</xref>
        ], which yields an update for A. (11)
      </p>
      <p>
        An alternative ALS update for the loading matrix A is based on directly
minimizing (5) by first order gradient based optimization [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ],
which in each update moves the current solution for A in the opposite direction
of the gradient of (5). To derive the gradient matrix we start by expressing (5) in terms
of the trace operator as
      </p>
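<p>Both R updates from this overview can be sketched in a few lines (our own illustration; we use SciPy's active-set `nnls` solver for the nonnegativity constrained case of (9) and (10)):</p>

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)
n, k = 8, 3
A = rng.random((n, k))
S = rng.random((n, n))

# Unconstrained global optimum, eq. (6): R = pinv(A) S pinv(A)^T
Ap = np.linalg.pinv(A)
R_free = Ap @ S @ Ap.T

# Nonnegative R via the vectorized problem (9): ||vec(S) - (A kron A) vec(R)||
# nnls solves the nonnegativity constrained least squares problem of (10).
K = np.kron(A, A)                             # (n*n) x (k*k) design matrix
r, _ = nnls(K, S.flatten(order="F"))          # vec() stacks columns
R_nn = r.reshape((k, k), order="F")

assert np.all(R_nn >= 0)
```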
      <p>E(A, R) = tr[S^T S − S^T A R A^T − A R^T A^T S + A R^T A^T A R A^T]
= tr[S^T S − 2 S^T A R A^T + A R^T A^T A R A^T]
= tr[S^T S] − 2 tr[S^T A R A^T] + tr[A R^T A^T A R A^T]. (13)</p>
      <p>Since traces are linear mappings and the term tr[S^T S] does not depend on A,
minimizing E(A, R) in (13) with respect to A is equivalent to minimizing
E(A) = E1(A) + E2(A), (14)
given that
E1(A) = −2 tr[S^T A R A^T] (15)
and
E2(A) = tr[A R^T A^T A R A^T]. (16)</p>
      <p>Considering the orthogonality constraint on A, the term in (16) becomes
independent of A. That is, since traces are invariant under cyclic permutation, we
can reformulate (16) as
E2(A) = tr[A R^T A^T A R A^T] = tr[R^T A^T A R A^T A] = tr[R^T R], (17)
which allows for only considering the minimization of the error term in (15) [
        <xref ref-type="bibr" rid="ref15 ref16">15,
16</xref>
        ]. Consequently, we can define a gradient descent based update by considering
the gradient matrix
∂E1(A)/∂A = −2 (S^T A R + S A R^T), (18)
which with a learning rate η_A allows us to define an update for A as
A ← A + 2 η_A (S^T A R + S A R^T). (19)</p>
    </sec>
    <sec id="sec-3">
      <title>Stochasticity Constrained Nonnegative DEDICOM</title>
      <p>
        As a final step, in order to assure the orthogonality constraint on A, the
updated A is projected by considering its QR decomposition [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ] A = QT, where Q ∈
R^(n×k) is column orthogonal and T ∈ R^(k×k) is invertible upper triangular, and following
the update A ← Q. This concludes our overview of the methods to come up
with appropriate factors. For more detailed information about the alternating
least squares solutions for DEDICOM factorization for matrices and tensors we
refer the reader to [
        <xref ref-type="bibr" rid="ref1 ref13 ref15 ref16">1, 13, 15, 16</xref>
        ].
      </p>
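<p>One full iteration of this orthogonality-constrained ALS scheme from the overview can be sketched as follows (our illustration; the learning rate value and iteration count are arbitrary choices, not from the paper):</p>

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, lr = 8, 3, 1e-3
S = rng.random((n, n))
A, _ = np.linalg.qr(rng.random((n, k)))    # start column orthogonal
R = A.T @ S @ A                            # R update for orthogonal A

for _ in range(200):
    # gradient step, eqs. (18)-(19): A := A + 2*lr*(S^T A R + S A R^T)
    A = A + 2 * lr * (S.T @ A @ R + S @ A @ R.T)
    Q, T = np.linalg.qr(A)                 # A = QT with Q column orthogonal
    A = Q                                  # projection step: A := Q
    R = A.T @ S @ A                        # regression update for R

err = np.linalg.norm(S - A @ R @ A.T)      # reconstruction error of eq. (5)
assert np.allclose(A.T @ A, np.eye(k))     # orthogonality is maintained
```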
      <p>
        In this section we present a new ALS algorithm that specifically considers
DEDICOM factorization with column stochastic loadings and nonnegative
affinities for interpretability of the resulting factors. To begin with, as noted in [
        <xref ref-type="bibr" rid="ref15 ref16">15,16</xref>
        ],
we can assure nonnegative affinities R by solving the nonnegative least squares
problem introduced in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] as in (10).
      </p>
      <p>Next, considering the theoretical properties of constraining the columns of
matrix A as in (4), we note that each column resides in the standard simplex Δ^(n−1),
which is the convex hull of the standard basis vectors of R^n. Formally, given that
δ_ij represents the Kronecker delta and V = {v_1, v_2, …, v_n | v_i = [δ_i1, …, δ_in]^T}
is the set of the standard basis vectors of R^n, the standard simplex Δ^(n−1) is
defined as the compact set
Δ^(n−1) = { Σ_{i=1}^{n} α_i v_i | Σ_{i=1}^{n} α_i = 1 ∧ α_i ≥ 0 ∀ i ∈ [1, …, n] }. (20)
This indicates that finding an appropriate column stochastic matrix A
minimizing the objective function in (5) can be reduced to a constrained
optimization problem over the convex compact simplex Δ^(n−1).</p>
      <p>
        Constrained optimization problems of this kind can be tackled using
the Frank-Wolfe algorithm [
        <xref ref-type="bibr" rid="ref7 ref9">7, 9</xref>
        ], which is also known as the conditional gradient
method [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The main idea behind the Frank-Wolfe algorithm is to minimize a
differentiable convex function over its domain, which forms a compact convex set,
using iterative first-order Taylor approximations until achieving convergence.
Formally, given a differentiable convex function f : S → R over a compact
convex set S, the Frank-Wolfe algorithm aims to solve
min_{x ∈ S} f(x) (21)
by iteratively solving for
s_t = argmin_{s ∈ S} s^T ∇f(x_t), (22)
where ∇f(x_t) is the gradient of the optimized function f evaluated at the current
solution x_t. Following that, at each iteration t, the algorithm estimates the new
solution by evaluating
x_{t+1} = x_t + η_t (s_t − x_t), (23)
where η_t ∈ (0, 1] is a learning rate that is usually selected by performing a
line search on (23) or set to decrease monotonically as a function of t [
        <xref ref-type="bibr" rid="ref7 ref9">7, 9</xref>
        ].
Additionally, one particular advantage of this optimization method is that it
allows for an ε-approximation for a given maximum number of iterations t_max [
        <xref ref-type="bibr" rid="ref7 ref9">7,9</xref>
        ].
Considering our special case, since the set of vertices of the standard simplex is
equivalent to the set of standard basis vectors V of R^n [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], at each iteration the Frank-Wolfe
algorithm moves the current solution in the direction of one of the standard
basis vectors until achieving convergence.
      </p>
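<p>For intuition, the generic iteration (21)-(23) specialized to the standard simplex looks as follows (our sketch on a toy quadratic; since the vertices are the standard basis vectors, the linear subproblem (22) reduces to an argmin over gradient entries):</p>

```python
import numpy as np

def frank_wolfe_simplex(grad_f, x0, t_max=500):
    """Minimize a differentiable convex f over the standard simplex.

    The simplex vertices are the standard basis vectors, so the linear
    minimization (22) is solved by picking the smallest gradient entry.
    """
    x = x0.copy()
    for t in range(t_max):
        g = grad_f(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0            # best vertex, eq. (22)
        eta = 2.0 / (t + 2.0)            # monotonically decreasing step
        x = x + eta * (s - x)            # eq. (23), stays in the simplex
    return x

# toy problem: minimize ||x - y||^2 over the simplex (y lies inside it)
y = np.array([0.2, 0.3, 0.5])
x0 = np.array([1.0, 0.0, 0.0])
x = frank_wolfe_simplex(lambda x: 2 * (x - y), x0)

assert np.allclose(x.sum(), 1.0) and np.all(x >= 0)
assert 0.2 > np.linalg.norm(x - y)       # close to the optimum y
```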
      <p>Following that, since the stochasticity constraint does not allow for a
simplification as in (17), we require the full derivation of the gradient matrix ∂E(A)/∂A
from (13) so as to adapt the Frank-Wolfe algorithm to find appropriate
stochastic columns for A. To this end we start by considering E2(A) from (16) in terms
of a scalar summation as
tr[A R^T A^T A R A^T] = Σ_u Σ_v Σ_w Σ_x Σ_y Σ_z a_uv r_wv a_xw a_xy r_yz a_uz (24)
and consider its partial derivative with respect to an arbitrary element a_ij of A,
which, due to the linearity of differentiation, can be taken summand by summand. (25)
The expression in (25) can be further simplified by the product rule expansion,
and evaluating the derivatives results in
Σ_w Σ_x Σ_y Σ_z r_wj a_xw a_xy r_yz a_iz +
Σ_u Σ_v Σ_y Σ_z a_uv r_jv a_iy r_yz a_uz +
Σ_u Σ_v Σ_w Σ_z a_uv r_wv a_iw r_jz a_uz +
Σ_v Σ_w Σ_x Σ_y a_iv r_wv a_xw a_xy r_yj. (27)
Finally, reformulating the four summations in (27) in terms of matrix
multiplications allows us to define the gradient matrix ∂E2(A)/∂A as
∂E2(A)/∂A = 2 (A R^T A^T A R + A R A^T A R^T). (29)</p>
      <p>Combining the results from (18) and (29), the full gradient matrix of the
minimized error norm with respect to A is calculated as
∂E(A)/∂A = 2 (A R^T A^T A R + A R A^T A R^T − S^T A R − S A R^T). (30)</p>
      <p>Alg. 1: A Frank-Wolfe based projection-free ALS algorithm to find
nonnegative DEDICOM factors with column stochastic loading matrix A
and nonnegative affinity matrix R. At each iteration of the algorithm, we
alternate between finding an optimal column stochastic A keeping R fixed
and finding a nonnegative matrix R keeping A fixed. Note that the learning
rate here is set to decrease monotonically.</p>
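<p>The closed form gradient above can be checked numerically against central finite differences (our own sanity check, not part of the paper):</p>

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 5, 2
S = rng.random((n, n))
A = rng.random((n, k))
R = rng.random((k, k))

def E(A):
    # E(A) = -2 tr(S^T A R A^T) + tr(A R^T A^T A R A^T), eqs. (15)-(16)
    return (-2 * np.trace(S.T @ A @ R @ A.T)
            + np.trace(A @ R.T @ A.T @ A @ R @ A.T))

# closed form gradient of E(A), eq. (30)
G = 2 * (A @ R.T @ A.T @ A @ R + A @ R @ A.T @ A @ R.T
         - S.T @ A @ R - S @ A @ R.T)

# central finite differences, entry by entry
eps = 1e-6
G_fd = np.zeros_like(A)
for i in range(n):
    for j in range(k):
        P = np.zeros_like(A)
        P[i, j] = eps
        G_fd[i, j] = (E(A + P) - E(A - P)) / (2 * eps)

assert np.allclose(G, G_fd, atol=1e-4)
```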
    </sec>
    <sec id="sec-5">
      <title>A Business Intelligence Application: Analyzing Asymmetric Game Ownership Associations in an Online Gaming Platform</title>
      <p>The expression in (30) allows us to define a column-wise Frank-Wolfe algorithm to
come up with an optimal column stochastic loading matrix A minimizing (5). We list the
essential steps of our adaptation of the projection-free Frank-Wolfe algorithm in Alg. 1.
In a nutshell, our algorithm starts by randomly initializing (valid) matrices A and
R, and continues by alternating between Frank-Wolfe optimization updates
to come up with an optimal column stochastic A and alternating least squares
updates for a nonnegative R until a predefined stopping condition is met.</p>
      <p>[Fig. 2: the residual sum of squares of the reconstruction over 30 ALS iterations.]</p>
      <p>The stopping conditions are usually selected based on reaching a maximum number
of alternating least squares updates (not to be confused with the maximum iteration
count t_max of the Frank-Wolfe updates for the columns of A in Alg. 1), on observing
only minor subsequent changes in the minimized objective (in our case the matrix norm
in (5)), or on a combination of both conditions.</p>
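<p>A minimal end-to-end sketch of the alternating scheme of Alg. 1, under our reading of it (Frank-Wolfe passes over the columns of A using the gradient (30), NNLS updates for R; the iteration counts and step sizes here are our own choices):</p>

```python
import numpy as np
from scipy.optimize import nnls

def grad_A(S, A, R):
    # full gradient of the error norm with respect to A, eq. (30)
    return 2 * (A @ R.T @ A.T @ A @ R + A @ R @ A.T @ A @ R.T
                - S.T @ A @ R - S @ A @ R.T)

def constrained_dedicom(S, k, als_iters=30, t_max=50, seed=0):
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    A = rng.random((n, k))
    A /= A.sum(axis=0)                     # valid column stochastic start
    R = rng.random((k, k))                 # valid nonnegative start
    for _ in range(als_iters):
        # Frank-Wolfe updates for the columns of A (R kept fixed)
        for t in range(t_max):
            G = grad_A(S, A, R)
            for b in range(k):
                s = np.zeros(n)
                s[np.argmin(G[:, b])] = 1.0           # simplex vertex step
                A[:, b] += (2.0 / (t + 2.0)) * (s - A[:, b])
        # nonnegative least squares update for R (A kept fixed), eqs. (9)-(10)
        r, _ = nnls(np.kron(A, A), S.flatten(order="F"))
        R = r.reshape((k, k), order="F")
    return A, R

S = np.random.default_rng(4).random((10, 10))
A, R = constrained_dedicom(S, k=3)
assert np.allclose(A.sum(axis=0), 1.0) and np.all(A >= 0) and np.all(R >= 0)
```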
      <p>
        In order to evaluate our algorithm, we consider a use case from game
analytics [
        <xref ref-type="bibr" rid="ref14 ref16">14, 16</xref>
        ], where we analyze the asymmetric relationships among various types
of games. To this end we consider a dataset from an online gaming platform called
Steam, which hosts thousands of games of various genres for a large player base
whose size ranges from 9 to 15 million (daily active) players
(http://store.steampowered.com/stats/). We used the dataset from [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], which contains game ownership information of more than six
million users about 3007 titles.
      </p>
      <p>Representing the ownership information as a bipartite matrix Y ∈ {0, 1}^(m×n)
indicating the ownership information of m players for n games, we construct
the asymmetric association among games in terms of their pairwise empirical
conditional probabilities. To this end, we define the directional similarity from
game i to game j as
s_ij = |{c | y_ci ≠ 0} ∩ {b | y_bj ≠ 0}| / |{c | y_ci ≠ 0}|, (31)
where | · | indicates set cardinality and y_pq indicates whether the pth player owns the
qth game. It is important to note that (31) is inherently asymmetric and a
special case of the so-called Tversky index [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] with prototype and variant weights
of respectively 1 and 0 (or vice versa). Since the number of computations
required to obtain the similarity matrix S scales as O(m n²), we parallelized the
procedure of computing the similarity values in (31) by utilizing a hybrid
parallelization technique that simultaneously exploits both distributed and shared
memory architectures in order to speed up the computation.
      </p>
      <p>[Tbl. 1: games with the highest loadings per column of A. a1 (TF 2/FPS): TF 2, CS: Source and Red Orchestra: Ostfront 41-45; a2 (Indie/Platformer): WSS, Ethan and Oozi: Earth Adventure; a3 (Flagships/AAA): Left 4 Dead 2, Dota 2, Skyrim, HL 2, Garry's Mod.]</p>
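<p>The directional similarity (31) can be computed for all game pairs at once from the binary ownership matrix (a vectorized sketch of ours; the paper's hybrid parallel implementation differs):</p>

```python
import numpy as np

def ownership_similarity(Y):
    """Empirical conditional probabilities s_ij = P(owns j | owns i), eq. (31).

    Y is a binary m x n player-by-game ownership matrix.
    """
    co = Y.T.astype(float) @ Y          # co[i, j] = |owners of i AND owners of j|
    owners = np.diag(co).copy()         # |owners of game i|
    owners[owners == 0] = 1.0           # guard against games nobody owns
    return co / owners[:, None]         # divide row i by |owners of i|

Y = np.array([[1, 1, 0],
              [1, 0, 0],
              [1, 1, 1],
              [0, 1, 0],
              [0, 1, 0]])
S = ownership_similarity(Y)
assert np.isclose(S[0, 1], 2 / 3)       # 2 of the 3 owners of game 0 own game 1
assert np.isclose(S[1, 0], 1 / 2)       # but only 2 of the 4 owners of game 1 own game 0
```

The asymmetry of S (here S[0, 1] ≠ S[1, 0]) is exactly what DEDICOM is designed to model.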
      <p>
        Having obtained the asymmetric similarity matrix with the empirical
conditional probability values in S, we factorize it using our algorithm and present
the results for k = 3. To explicitly ignore modeling the self-loops, at each
alternating least squares iteration we replaced the diagonal of S with the diagonal
of its current reconstruction [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In Fig. 2 we show the residual sum of squares,
which decreases monotonically and converges to a minimum after 30 iterations.
Analyzing the resulting factors, we observe that the first two columns (or
modes) a1 and a2 of A are sparse, whereas the last column is dense in terms of
non-zero probability values, and the most dominant games vary from column
to column. In Tbl. 1 we present the games with high loadings in their
corresponding columns and observe that mostly the free-to-play flagship game Team
Fortress 2 (TF 2), followed by the FPS games CS: Source and Red Orchestra:
Ostfront 41-45, contributes to a1. On the other hand, the indie-platformer games
Wooden Sen'SeY (WSS), Ethan (Meteor Hunter) and Oozi: Earth Adventure
contribute mostly to a2. Finally, the mostly dense a3 has very high loadings
for all of the flagship games of the analyzed platform and other AAA games
including Left 4 Dead 2, Dota 2, Skyrim, Half-Life 2 (HL 2) and Garry's Mod.
      </p>
      <p>
        Analyzing the resulting row-normalized affinity matrix in Fig. 3, which encodes the
asymmetric relations between the modes, the first row indicates stronger
tendencies of TF 2 players towards the indie-platformer mode than towards the AAA
games, which relates to the fact that TF 2 is mostly a singular game that people
primarily prefer on the Steam platform [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. On the other hand, in line with the results
from [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], the opposite directional associations, namely from the flagships to
TF 2, are high, which is related to the fact that the majority of the players playing
one or more of the flagship games also prefer TF 2 [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Finally, analyzing the
second mode, similar to the results in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], the self association is the highest
association we observe (indicating players preferring mostly to stay in the same
genre); this is followed by high associations to TF 2 and to the mode related
to the flagship and AAA games.
      </p>
      <p>[Fig. 3: row-normalized affinity matrix between the modes TF2/FPS, Indie/Platformer and Flagships/AAA.]</p>
      <p>
        It is worth mentioning that, parallel to the results in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], by increasing the
number of modes k for the loading matrix A, we observed more detailed
partitions of the modeled patterns. Running our algorithm with k = 2, we
observed a higher reconstruction error; the indie-platformer mode a2
remained one of the factors, whereas the games contributing to a3
merged with those contributing to mode a1. On the other hand, considering the
resulting factors when we ran our algorithm with k = 4, we obtained a slight
improvement in the reconstruction error and observed that the modes
a1 and a2 remained the same, whereas mode a3 split into two different
modes, of which one contains the same games as a3 and the new one contains
additional indie and platformer games (reducing the probability on the games of
a3) such as Finding Teddy, Gentlemen, Gravi and Face Noir.
      </p>
      <p>Conclusion and Future Work</p>
      <p>In this work we studied the DEDICOM model to factorize asymmetric similarity
matrices into a combination of a low rank loading matrix and an affinity
matrix. We gave an overview of the theoretical details and the algorithms
to come up with appropriate factors. In essence, the affinity matrix R
indicates different directional structures that are determined by the columns of
the loading matrix A. Having defined our objective function as the matrix
norm of the difference between the factorized asymmetric similarity matrix and
its DEDICOM reconstruction, we derived the gradient matrix of this function
with respect to the loading matrix A and presented a variant of the Frank-Wolfe
algorithm to obtain interpretable column stochastic loadings and nonnegative
affinities. Running our algorithm on a dataset containing asymmetric pairwise
relationships between more than 3000 games, we found interesting patterns
indicating directional preferences among games.</p>
      <p>
        Our future work involves analyzing the performance of our model for different
tasks including link prediction and representation learning [
        <xref ref-type="bibr" rid="ref13 ref16">13, 16</xref>
        ]. Considering
the tensor extensions in [
        <xref ref-type="bibr" rid="ref13 ref16">13, 16</xref>
        ], our future work also involves extending the
proposed model to tensor factorizations, which can allow us to analyze asymmetric
similarity matrices from different sources. In the context of business intelligence
and game analytics, this might allow us to analyze and compare, for instance,
preferences of different countries or player groups.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bader</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harshman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kolda</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Temporal Analysis of Semantic Graphs using ASALSAN</article-title>
          .
          <source>In: Proc. of IEEE ICDM</source>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bauckhage</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sifa</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Drachen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thurau</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hadiji</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Beyond Heatmaps: Spatio-Temporal Clustering using Behavior-Based Partitioning of Game Levels</article-title>
          .
          <source>In: Proc. of IEEE CIG</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bauckhage</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>k-Means Clustering via the Frank-Wolfe Algorithm</article-title>
          .
          <source>In: Proc. of KDML-LWDA</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chew</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bader</surname>
            ,
            <given-names>B.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rozovskaya</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Using DEDICOM for Completely Unsupervised Part-Of-Speech Tagging</article-title>
          .
          <source>In: Proc. Workshop on Unsupervised and Minimally Supervised Learning of Lexical Semantics</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cremonesi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koren</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Turrin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Performance of Recommender Algorithms on Top-N Recommendation Tasks</article-title>
          .
          <source>In: Proc. ACM Recsys</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Cutler</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Breiman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Archetypal Analysis</article-title>
          .
          <source>Technometrics</source>
          <volume>36</volume>
          (
          <issue>4</issue>
          ),
          <fpage>338</fpage>
          -
          <lpage>347</lpage>
          (
          <year>1994</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolfe</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>An Algorithm for Quadratic Programming</article-title>
          .
          <source>Naval Research Logistics</source>
          <volume>3</volume>
          (
          <issue>1-2</issue>
          ) (
          <year>1956</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Harshman</surname>
            ,
            <given-names>R.A.</given-names>
          </string-name>
          :
          <article-title>Models for Analysis of Asymmetrical Relationships among N Objects or Stimuli</article-title>
          .
          <source>In: Proc. Joint Meeting of the Psychometric Society and the Society for Mathematical Psychology</source>
          (
          <year>1978</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Jaggi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization</article-title>
          .
          <source>In: Proc. of ICML</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kantor</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rokach</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ricci</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shapira</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <source>Recommender Systems Handbook</source>
          . Springer (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Koren</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bell</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Volinsky</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Matrix Factorization Techniques for Recommender Systems</article-title>
          .
          <source>Computer</source>
          <volume>42</volume>
          (
          <issue>8</issue>
          ) (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Lawson</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hanson</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          :
          <article-title>Solving Least Squares Problems</article-title>
          . SIAM (
          <year>1974</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Nickel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tresp</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kriegel</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>A Three-way Model for Collective Learning on Multi-relational Data</article-title>
          .
          <source>In: Proc. of ICML</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Sifa</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Drachen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bauckhage</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Large-Scale Cross-Game Player Behavior Analysis on Steam</article-title>
          .
          <source>In: Proc. of AAAI AIIDE</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Sifa</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ojeda</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bauckhage</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>User Churn Migration Analysis with DEDICOM</article-title>
          .
          <source>In: Proc. of ACM RecSys</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Sifa</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srikanth</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Drachen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ojeda</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bauckhage</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Predicting Retention in Sandbox Games with Tensor Factorization-based Representation Learning</article-title>
          .
          <source>In: Proc. of IEEE CIG</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Sifa</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bauckhage</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Drachen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Archetypal Game Recommender Systems</article-title>
          .
          <source>In: Proc. of KDML-LWA</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Tversky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Features of Similarity</article-title>
          .
          <source>Psychological Review</source>
          <volume>84</volume>
          (
          <issue>4</issue>
          ) (
          <year>1977</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>