<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Tensor Co-clustering: a Parameter-less Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Elena Battaglia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ruggero G. Pensa</string-name>
          <email>ruggero.pensag@unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Turin, Dept. of Computer Science</institution>
          ,
          <addr-line>Turin</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Tensor co-clustering has been proven useful in many applications, due to its ability to cope with high-dimensional data and sparsity. However, setting up a co-clustering algorithm properly requires the specification of the desired number of clusters for each mode as an input parameter. To face this issue, we propose a tensor co-clustering algorithm that does not require the number of desired co-clusters as input, as it optimizes an objective function based on a measure of association across discrete random variables that is not affected by their cardinality. The effectiveness of our algorithm is shown on real-world datasets, also in comparison with state-of-the-art co-clustering methods.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Tensors are widely used mathematical objects that well represent complex
information such as gene expression data, social networks, heterogeneous information
networks, time-evolving data, behavioral patterns, and multi-lingual text
corpora. In general, every n-ary relation can be easily represented as a tensor. From
the algebraic point of view, in fact, they can be seen as multidimensional
generalizations of matrices and, as such, can be processed with mathematical and
computational methods that generalize those usually employed to analyze data
matrices, e.g., non-negative factorization, singular value decomposition, itemset
and association rule mining, clustering and co-clustering.</p>
      <p>Clustering, in particular, is by far one of the most popular unsupervised
machine learning techniques since it allows analysts to obtain an overview of the
intrinsic similarity structures of the data with relatively little background
knowledge about them. However, with the availability of high-dimensional
heterogeneous data, co-clustering has gained popularity, since it provides a simultaneous
partitioning of each mode. Despite its proven usefulness, the correct application
of tensor co-clustering is limited by the fact that it requires the specification of
a congruent number of clusters for each mode while, in realistic analysis
scenarios, the actual number of clusters is unknown. Furthermore, matrix/tensor
(co-)clustering is often based on a preliminary tensor factorization step or on
latent block models [
        <xref ref-type="bibr" rid="ref13 ref14 ref3">14, 3, 13</xref>
        ] that, in their turn, require further input parameters
(e.g., the number of latent factors/blocks within each mode). As a consequence,
it is practically impossible to explore all combinations of parameter values in order
to identify the best clustering results.</p>
      <p>Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). This volume is published and copyrighted by its editors. SEBD 2020, June 21-24, 2020, Villasimius, Italy.</p>
      <p>[Figure 1: overall tensor co-clustering schema, from the input tensor to the contingency tensor and the Goodman-Kruskal's τ association measures.]</p>
      <p>
        The main reason for this problem is that most clustering algorithms (and
tensor factorization approaches) optimize objective functions that strongly depend
on the number of clusters (or factors). Hence, two solutions with two different
numbers of clusters cannot be compared directly. Although this considerably
reduces the size of the search space, it prevents the discovery of a
better partitioning once a wrong number of clusters is selected. In this paper, we
address this limitation by proposing a new tensor co-clustering algorithm that
optimizes an objective function that can be viewed as an n-mode extension of an
association measure called Goodman-Kruskal's τ [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], whose local optima do not
depend on the number of clusters. Compared with state-of-the-art techniques
that require the desired number of clusters in each mode as input parameters,
our approach achieves similar or better results on several real-world datasets.
      </p>
      <p>
        The algorithm presented in this paper has been first introduced in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. An
interested reader could refer to it for further theoretical and experimental details.
      </p>
    </sec>
    <sec id="sec-2">
      <title>An association measure for tensor co-clustering</title>
      <p>
        The objective function optimized by our tensor co-clustering algorithm is an
association measure, called Goodman and Kruskal's τ [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], that evaluates the
dependence between two discrete variables and has been used to evaluate the
quality of 2-way co-clustering [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Goodman and Kruskal's τ estimates the
strength of the link between two discrete variables X and Y according to the
proportional reduction of the error in predicting one of them knowing the other:
τ_{X|Y} = (e_X − E[e_{X|Y}]) / e_X, where e_X is the error in predicting X (estimated as the
probability that two different observations from the marginal distribution of X
fall in different categories) and E[e_{X|Y}] is the expected value of the conditional
error taken with respect to the distribution of Y. Conversely, the proportional
reduction of the error in predicting Y when X is known is τ_{Y|X} = (e_Y − E[e_{Y|X}]) / e_Y.
      </p>
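      <p>As an illustration of the two-variable measure above (a minimal sketch of our own, not the implementation used in the paper), τ_{X|Y} can be computed directly from a non-negative contingency matrix with X on the rows and Y on the columns:

```python
import numpy as np

def goodman_kruskal_tau(counts):
    """tau_{X|Y}: proportional reduction of the error in predicting X (rows)
    of a contingency table when Y (columns) is known."""
    p = counts / counts.sum()               # joint distribution
    px = p.sum(axis=1)                      # marginal of X
    py = p.sum(axis=0)                      # marginal of Y
    e_x = 1.0 - np.sum(px ** 2)             # error in predicting X alone
    cols = py > 0                           # skip empty columns of Y
    e_x_given_y = 1.0 - np.sum(p[:, cols] ** 2 / py[cols])  # E[e_{X|Y}]
    return (e_x - e_x_given_y) / e_x
```

The value lies in [0, 1]: it is 0 when X and Y are independent and 1 when Y determines X exactly.</p>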
      <p>In order to use this measure for the evaluation of a tensor co-clustering, we
need to extend it so that it can evaluate the association of n distinct discrete
variables X_1, …, X_n. Reasoning as in the two-dimensional case, we can define
the proportional reduction of the error in predicting X_i when all the (X_j)_{j≠i} are known as
τ_{X_i} = τ_{X_i|(X_j)_{j≠i}} = (e_{X_i} − E[e_{X_i|(X_j)_{j≠i}}]) / e_{X_i}
for all i ≤ n. When n = 2, the measure coincides with Goodman-Kruskal's τ.</p>
      <p>We will now see how to use function τ to evaluate a tensor co-clustering.
Let X ∈ ℝ_+^{m_1 × ⋯ × m_n} be a tensor with n modes and non-negative values. Let us
denote with x_{k_1⋯k_n} the generic element of X, where k_i = 1, …, m_i for each mode
i = 1, …, n. A co-clustering P of X is a collection of n partitions {P_i}_{i=1,…,n},
where P_i = ∪_{j=1}^{c_i} C_j^i is a partition of the elements on the i-th mode of X in c_i
groups, with c_i ≤ m_i for each i = 1, …, n. Each co-clustering P can be associated
to a tensor T^P ∈ ℝ_+^{c_1 × ⋯ × c_n}, whose generic element is the sum of all entries of X
in the same co-cluster. We can look at T^P as the n-modal contingency table that
empirically estimates the joint distribution of n discrete variables X_1, …, X_n,
where each X_i takes values in {C_1^i, …, C_{c_i}^i}. From this contingency table, we can
derive the marginal probabilities of each variable X_i (i.e., the probability that a
generic element of mode i falls in a particular cluster on that mode) and we can
use these probabilities to compute the Goodman and Kruskal's τ functions: in this
way we associate to each co-clustering P over X a vector τ^P = (τ^P_{X_1}, …, τ^P_{X_n})
that can be used to evaluate the quality of the co-clustering. The overall
co-clustering schema is depicted in Figure 1.</p>
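      <p>The construction of T^P from a co-clustering can be sketched as follows; this is our own illustration (the function name and the encoding of each partition as a vector of cluster labels are assumptions, not the authors' code):

```python
import numpy as np

def contingency_tensor(X, partitions):
    """Aggregate tensor X into the contingency tensor T^P of a co-clustering.
    `partitions[i]` maps each index on mode i to its cluster label (0-based)."""
    shape = tuple(int(p.max()) + 1 for p in partitions)
    T = np.zeros(shape)
    # full index grids, one per mode, each with the same shape as X
    idx = np.meshgrid(*[np.arange(m) for m in X.shape], indexing="ij")
    # map each cell of X to its co-cluster and accumulate
    np.add.at(T, tuple(p[ix] for p, ix in zip(partitions, idx)), X)
    return T
```

The τ functions described above can then be computed from the marginals of T.</p>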
    </sec>
    <sec id="sec-3">
      <title>A stochastic local search approach to co-clustering</title>
      <p>Our co-clustering approach can be formulated as a multi-objective optimization
problem: given a tensor X with n modes and dimension m_i on mode i, an optimal
co-clustering P for X is one that is not dominated by any other co-clustering (i.e.,
there does not exist any other co-clustering Q with τ^Q_{X_j} ≥ τ^P_{X_j} for all j = 1, …, n).</p>
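      <p>The dominance check behind this definition can be written compactly; `dominates` is a hypothetical helper of ours, and it adds the strict-improvement requirement usual in Pareto optimality:

```python
import numpy as np

def dominates(tau_q, tau_p):
    """True if co-clustering Q dominates P: tau is at least as good on
    every mode and strictly better on at least one."""
    tau_q, tau_p = np.asarray(tau_q), np.asarray(tau_p)
    return bool(np.all(tau_q >= tau_p) and np.any(tau_q > tau_p))
```

A co-clustering P is then Pareto-optimal when no Q dominating it exists.</p>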
      <p>Since we do not fix the number of clusters, the space of possible solutions is
huge (for example, given a very small tensor of dimension 10 × 10 × 10, the number
of possible partitions is 1.56 × 10^16): it is clear that a systematic exploration of
all possible solutions is not feasible for a generic tensor X. For this reason we
need to find a heuristic that allows us to reach a "good" partition of X, i.e., a
partition P with high values of τ^P_{X_k} for all modes k. With this aim, we propose
a stochastic local search approach to solve the maximization problem.</p>
      <p>Algorithm 1: TCC(X, N_iter)</p>
      <p>Input: A m_1 × ⋯ × m_n tensor X, N_iter</p>
      <p>Result: P_1, …, P_n
1  Initialize P_1, …, P_n with discrete partitions;
2  i ← 0;
3  iter_without_moves ← 0;
4  while i ≤ N_iter and iter_without_moves &lt; max_{j=1,…,n}(m_j) do
5      for k = 1 to n do
6          if iter_without_moves &lt; t then
7              Randomly choose C_b^k in P_k and x in C_b^k;
8          else
9              x ← next(k) // the element following the one selected at iteration i − 1 on mode k;
10             C_b^k ← cluster of x;
11         end
12         for C_j^k in P_k ∪ {∅} do
13             Q_k^j ← (P_k \ {C_b^k, C_j^k}) ∪ {C_b^k \ {x}, C_j^k ∪ {x}};
14             Q^j ← (P_1, P_2, …, P_{k−1}, Q_k^j, P_{k+1}, …, P_n);
15             Compute the contingency tensor T^j associated to Q^j and τ^{Q^j};
16         end
17         e ← SelectBestPartition(k, b, (τ^{Q^j})_{j=1,…,|P_k ∪ ∅|});
18         P_k ← Q_k^e;
19         if e == b then
20             iter_without_moves ← iter_without_moves + 1;
21         else
22             iter_without_moves ← 0;
23         end
24     end
25     i ← i + 1;
26 end</p>
      <p>
        Algorithm 1 provides the general sketch of our tensor co-clustering algorithm,
called TCC. It repeatedly considers modes one by one, sequentially, and it tries
to improve the quality of the co-clustering by moving one single element from
its original cluster C_b^k to another cluster on the same mode, C_e^k, which most
improves the quality of the partition, according to a criterion chosen to measure
the quality of the partition. When all the n modes have been considered, the
i-th iteration of the algorithm is concluded. If all objects have been tried but
no move is possible, the algorithm ends. At the end of each iteration, one of the
following possible moves has been done on mode k:
– an object x has been moved from cluster C_b^k to a pre-existing cluster C_e^k: in
this case the final number of clusters on mode k remains the same (let's call
it c_k) if C_b^k is non-empty after the move. If C_b^k is empty after the move, it
will be deleted and the final number of clusters will be c_k − 1;
– an object x has been moved from cluster C_b^k to a new cluster C_e^k = ∅: the
final number of clusters on mode k will be c_k + 1 (the useless case when x is
moved from C_b^k = {x} to C_e^k = ∅ is not considered);
– no move has been performed and the number of clusters remains c_k.
Thus, during the iterative process, the updating procedure is able to increase or
decrease the number of clusters at any time. This is due to the fact that, contrary
to other measures, such as the loss in mutual information [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], the τ measure has an
upper limit which does not depend on the number of co-clusters and thus enables
the comparison of co-clustering solutions of different cardinalities.
      </p>
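      <p>Since each step moves a single element on one mode, the contingency tensor of a tentative partition need not be rebuilt from scratch: the contribution of the moved element can be transferred between two slices. A minimal sketch under our own naming (`apply_move`, `contrib`), not taken from the paper:

```python
import numpy as np

def apply_move(T, contrib, k, b, e):
    """Update contingency tensor T after moving one element on mode k
    from cluster b to cluster e. `contrib` is the moved element's slice of
    the original tensor, already aggregated by the cluster labels of the
    other modes (shape = T.shape without axis k)."""
    idx_b = [slice(None)] * T.ndim
    idx_b[k] = b
    idx_e = [slice(None)] * T.ndim
    idx_e[k] = e
    T[tuple(idx_b)] -= contrib   # remove the element from cluster b
    T[tuple(idx_e)] += contrib   # add it to cluster e
    return T
```

Transferring `contrib` this way yields the same tensor as re-aggregating the whole input.</p>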
      <p>As mentioned above, the choice of the best cluster on mode k in which
the selected element x should be moved depends on the way in which we
decide to measure the increase in the quality of the tensor partition. We can
define different measures, corresponding to different ways to implement function
SelectBestPartition in Algorithm 1. According to our experiments, the one with
the best performances in terms of speed of convergence and quality of the identified
co-clusters is the following. Suppose we want to move an object on mode k:
we consider only those moves that improve (or at least do not worsen) τ_{X_k} and,
among them, we choose the one with the greatest value of avg(τ) = (1/n) Σ_{j=1}^{n} τ_{X_j}.</p>
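      <p>An illustrative reading of this strategy in code (our own sketch; the tie-breaking rule that prefers keeping x in its current cluster is an assumption):

```python
import numpy as np

def select_best_partition(k, b, tau_candidates):
    """Among candidate moves (one tau vector per target cluster on mode k),
    keep those that do not worsen tau on mode k with respect to the
    no-move candidate b, then pick the one with the highest average tau."""
    taus = np.asarray(tau_candidates)           # shape: (n_candidates, n_modes)
    current = taus[b, k]                        # tau_{X_k} if x stays where it is
    admissible = np.where(taus[:, k] >= current)[0]
    e = admissible[np.argmax(taus[admissible].mean(axis=1))]
    # prefer no move when it is just as good as the best admissible move
    if np.isclose(taus[e].mean(), taus[b].mean()):
        return b
    return e
```

Returning b when no admissible move improves the average leaves the partition unchanged, which is what increments the no-move counter in Algorithm 1.</p>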
      <p>It can be proven that algorithm TCC with this selection strategy converges to
a Pareto local optimum.
</p>
    </sec>
    <sec id="sec-4">
      <title>Experiments and discussion</title>
      <p>
        In this section, we test the effectiveness of our co-clustering algorithm through
experiments on the following three real-world datasets: the "four-area" DBLP
dataset1; the "hetrec2011-movielens-2k" dataset2 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], from which we extract two
different tensors, MovieLens1 and MovieLens2; the Yelp dataset3, from which we
extract two tensors, YelpTOR and YelpPGH. To assess the quality of the
clustering performances, we consider two measures commonly used in the clustering
literature: normalized mutual information (NMI) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and adjusted Rand index
(ARI) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
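      <p>Both measures are available in standard libraries; for instance, with scikit-learn (assuming it is installed) they can be computed on a toy labelling as follows:

```python
# Standard implementations of NMI and ARI from scikit-learn.
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

true_labels = [0, 0, 1, 1, 2, 2]    # ground-truth classes (toy example)
found_labels = [1, 1, 0, 0, 2, 2]   # cluster ids from some co-clustering

nmi = normalized_mutual_info_score(true_labels, found_labels)
ari = adjusted_rand_score(true_labels, found_labels)
# both measures are invariant to label permutations
```

Here both scores equal 1 (up to floating point), since the two partitions coincide up to a relabelling.</p>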
      <p>
        We compare our results with those of other state-of-the-art tensor co-clustering
algorithms. nnCP is the non-negative CP decomposition [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and can be used to
co-cluster a tensor, as done by [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], by assigning each element in each mode to the
cluster corresponding to the latent factor with highest value. nnTucker is the
non-negative Tucker decomposition [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. nnCP+kmeans and nnT+kmeans
combine CP (or Tucker) decomposition with a post-processing phase in which
k-means is applied on each of the latent factor matrices, similarly as what has
been done by [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. SparseCP consists of a CP decomposition with
nonnegative sparse latent factors [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Finally, TBM performs tensor co-clustering
1 http://web.cs.ucla.edu/~yzsun/data/DBLP_four_area.zip
2 https://grouplens.org/datasets/hetrec-2011/
3 https://www.yelp.com/dataset
via the Tensor Block Model [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The last four methods require as input
parameters the number of clusters on each mode: we set these numbers equal to the true
number of classes on the three modes of the tensor. When further parameters
are needed, we follow the instructions suggested in the original papers for setting
them. Algorithms nnCP, nnTucker and their variant with k-means are applied
trying different ranks of the decomposition: we report the best result obtained.
In this way we are giving a big advantage to our competitors: we choose the
rank of the decomposition and the number of clusters by looking at the actual
number of categories, which are unknown in standard unsupervised settings.
Despite this, as shown in Table 1, TCC outperforms the other algorithms on all
datasets but one (DBLP) and has comparable results on another (YelpTOR).
In these cases, non-negative Tucker decomposition (with the number of latent
factors set to the correct number of embedded clusters) achieves the best results
and non-negative CP decomposition obtains results that are comparable with
those of TCC. However, when we modify, even slightly, the number of latent
factors (see Figure 2), the results immediately become worse than those of TCC.
      </p>
      <p>The number of clusters identified by TCC is usually close to the correct
number of embedded clusters: on average, 5 instead of 4 for DBLP, 5 instead
of 3 for MovieLens1, the correct number 3 for MovieLens2, 5 instead of 3 for
YelpPGH. Only YelpTOR presents a number of clusters (13) that is far from
the correct number of classes (3). However, more than 85% of the objects
are classified in 3 large clusters, while the remaining objects form very small
clusters: we consider these objects as candidate outliers. The same behaviour is
even more pronounced in DBLP, where four clusters contain 99.9% of the
objects and only 2 objects stay in the "extra cluster".</p>
      <p>Lastly, we provide some insights about the quality of the clusters identified
by our algorithm. To this purpose, we choose a co-clustering of the MovieLens1
dataset. This dataset contains 181 movies and all the tags assigned by the users
to each movie. We construct a (215 × 181 × 142)-dimensional tensor, where the
three modes represent users, movies and tags. The movies are labelled in three
categories (Animation, Horror and Documentary). Algorithm TCC identifies
five clusters of movies, instead of the three categories we consider as labels. The
tag clouds in Figure 3 illustrate the 30 movies with more tags for each cluster
(text size depends on the actual number of tags): it can be easily observed that
the first cluster concerns animated movies for children, mainly Disney and Pixar
movies; the second one is a little cluster containing animated movies realized
with the claymation technique (mainly movies from the Wallace and Gromit saga or
other films by the same director); the third cluster is still a subset of the
animated movies, but it contains anime and animated films from Japan. The fourth
cluster is composed mainly of horror movies and the last one contains only
documentaries. On the tag mode, our algorithm finds thirteen clusters. Six of them
contain more than 90% of the total tags and only 10 uninformative tags are
partitioned in other 7 very small clusters, and could be considered as outliers.
There is a one-to-one correspondence between four clusters of movies (Cartoons,
Anime, Wallace&amp;Gromit and Documentary) and four of the tag clusters; cluster
Horror, instead, can be put in relation with two different tag clusters, the first
containing names of directors, actors or characters of popular horror movies, the
second composed of adjectives typically used to describe disturbing films.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>Our experimental validation has shown that our approach is able to identify
meaningful clusters. Moreover, it outperforms state-of-the-art methods for most
datasets. Even when our algorithm is not the best one, we have found that
the competitors cannot work properly without specifying a correct number of
clusters for each mode of the tensor. As future work, we will design a specific
algorithm for sparse tensors with the aim of reducing the overall computational
complexity of the approach. Finally, we will further investigate the ability of our
method to identify candidate outliers as small clusters in the data.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Battaglia</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pensa</surname>
          </string-name>
          , R.G.:
          <article-title>Parameter-less tensor co-clustering</article-title>
          .
          <source>In: Proceedings of DS 2019</source>
          . pp.
          <volume>205</volume>
          –
          <issue>219</issue>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cantador</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brusilovsky</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , Kuflik, T.:
          <source>2nd Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011)</source>
          .
          <source>In: Proc. RecSys</source>
          <year>2011</year>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wei</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          , Han,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          :
          <article-title>Robust face clustering via tensor decomposition</article-title>
          .
          <source>IEEE Trans. Cybernetics</source>
          <volume>45</volume>
          (
          <issue>11</issue>
          ),
          <volume>2546</volume>
          –
          <fpage>2557</fpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dhillon</surname>
            ,
            <given-names>I.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mallela</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Modha</surname>
            ,
            <given-names>D.S.</given-names>
          </string-name>
          :
          <article-title>Information-theoretic co-clustering</article-title>
          .
          <source>In: Proc. ACM SIGKDD</source>
          <year>2003</year>
          . pp.
          <volume>89</volume>
          –
          <issue>98</issue>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Goodman</surname>
            ,
            <given-names>L.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kruskal</surname>
            ,
            <given-names>W.H.</given-names>
          </string-name>
          :
          <article-title>Measures of association for cross classifications</article-title>
          .
          <source>Journal of the American Statistical Association</source>
          <volume>49</volume>
          ,
          <issue>732</issue>
          –
          <fpage>764</fpage>
          (
          <year>1954</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Harshman</surname>
            ,
            <given-names>R.A.</given-names>
          </string-name>
          :
          <article-title>Foundations of the PARAFAC procedure: models and conditions for an "explanatory" multimodal factor analysis</article-title>
          .
          <source>UCLA Working Papers in Phonetics 16</source>
          ,
          <issue>1</issue>
          –
          <fpage>84</fpage>
          (
          <year>1970</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ding</surname>
            ,
            <given-names>C.H.Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Simultaneous tensor subspace selection and clustering: the equivalence of high order svd and k-means clustering</article-title>
          .
          <source>In: Proc. ACM SIGKDD</source>
          <year>2008</year>
          . pp.
          <volume>327</volume>
          –
          <issue>335</issue>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hubert</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arabie</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Comparing partitions</article-title>
          .
          <source>J. Classif</source>
          .
          <volume>2</volume>
          (
          <issue>1</issue>
          ),
          <volume>193</volume>
          –
          <fpage>218</fpage>
          (
          <year>1985</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Papalexakis</surname>
            ,
            <given-names>E.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sidiropoulos</surname>
            ,
            <given-names>N.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bro</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>From K-means to higher-way coclustering: Multilinear decomposition with sparse latent factors</article-title>
          .
          <source>IEEE Trans. Signal Processing</source>
          <volume>61</volume>
          (
          <issue>2</issue>
          ),
          <volume>493</volume>
          –
          <fpage>506</fpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Robardet</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feschet</surname>
          </string-name>
          , F.:
          <article-title>Efficient local search in conceptual clustering</article-title>
          .
          <source>In: Proceedings of DS 2001</source>
          . pp.
          <volume>323</volume>
          –
          <issue>335</issue>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Strehl</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghosh</surname>
          </string-name>
          , J.:
          <article-title>Cluster ensembles { A knowledge reuse framework for combining multiple partitions</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>3</volume>
          ,
          <issue>583</issue>
          –
          <fpage>617</fpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Tucker</surname>
            ,
            <given-names>L.R.:</given-names>
          </string-name>
          <article-title>Some mathematical notes on three-mode factor analysis</article-title>
          .
          <source>Psychometrika</source>
          <volume>31</volume>
          ,
          <issue>279</issue>
          –
          <fpage>311</fpage>
          (
          <year>1966</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zeng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Multiway clustering via tensor block models</article-title>
          .
          <source>In: Proc. of NeurIPS 2019</source>
          . pp.
          <volume>713</volume>
          –
          <issue>723</issue>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zong</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Web co-clustering of usage network using tensor decomposition</article-title>
          .
          <source>In: Proceedings of ECBS 2009</source>
          . pp.
          <volume>311</volume>
          –
          <issue>314</issue>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>