<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Mining Interesting Outlier Subgraphs in Attributed Graphs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ahmad Mel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tijl De Bie</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AIDA, IDLab, Ghent University</institution>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A common task in local pattern mining on attributed graphs revolves around detecting subgraphs with predefined similarity measures on the attributes of the nodes. In this work we focus on mining cohesive subgraphs with attribute values that stand out when compared to the rest of the graph. We tackle a more specific problem: given a vertex-attributed graph, we aim to detect a cohesive set of connected vertices , such that for a maximal set of the attributes, at least one of the vertices in the subgraph shows an abnormal value. Such patterns are very relevant in the biological field, where anomaly detection is used to identify parts of a corrupt gene network. We develop a pattern syntax and an interestingness score to mine such subgraphs, and implement it using a branch-and-bound algorithm. Furthermore, we design an experimental setup to qualitatively assess the results, discuss our findings, limitations and future improvements.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Subgraph Mining</kwd>
        <kwd>Subjective Interestingness</kwd>
        <kwd>Attributed Graphs</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The graph mining field saw a rise in interest in recent years, benefiting from the surge in the
amount of network data available from the ever-growing data revolution, from social media to
biological protein-protein networks.</p>
      <p>
        For example, in the field of cancer biology, a cancer driver gene is a gene whose mutations
cause or facilitate cellular cancer initiation and progression [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], and thus exploratory pathway
and network analysis is becoming an integral tool for cancer driver gene discovery[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and
associating infrequently mutated genes as cancer genes based on their interaction with recurrently
mutated genes [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
      </p>
      <p>
        As more fields are employing graph data, the use of rich graphs for exploratory data mining
became more popular. While classical graphs solely rely on node topology, rich graphs add
another layer of data in the form of node and/or edge characteristics[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Thus, data mining
on node-attributed graphs (referred to as attributed graphs throughout the rest of the paper)
combines additional information and ofers more insights for enhanced graph mining tasks[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        One of those tasks is graph clustering, or community detection, where the goal is to detect
a subgraph (typically a densely connected subset of nodes and the edges between them) of
an attributed graph [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ], where the nodes have similar (according to some metric) attribute
values [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        Similarly, subgraph anomaly detection on attributed graphs aims to discover outlier
substructures within such graphs that are rare with respect to their node attribute values, deviating
significantly from the majority of the other nodes in the graph[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>Both approaches fall under community detection, while the former seeks subgraphs sharing
similarities with their attribute values, and the latter finds subgraphs with difering attribute
values with respect to the rest of the graph.</p>
      <p>
        In this paper, we introduce a new intuitive interpretable method for exploratory subgraph
mining in node attributed graphs, that finds easily describable cohesive set of nodes with
attribute values that stand out compared to the rest of graph. We propose and employ a
principled interestingness measure to rank the discovered patterns, by maximizing the amount
of information contained in a pattern while reducing its size. Recent work [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] on this topic
focused on discovering interesting cohesive patterns where for a subset of the attributes, every
node in the subgraph has exceptional attribute value compared to the rest of the graph. Our
work generalizes this work by defining a pattern syntax such that for a subset of the attributes
at least one node in the subgraph has exceptional value. Our novel method aligns with the
application of pathway analysis for gene discovery, where the aim is to find oncogenic pathways
(subgraphs) of interacting genes (nodes) and encoded patient mutation data (attributes), such
that the size of the pathway is minimal while maximizing the patient mutation on the genes.
      </p>
      <p>This approach overcomes many limitations including the need to define exceptions for the
detected patterns, where some nodes in the subgraph do not match the described pattern. We
detail the formulation in section 2. An implementation of the search and rank approach is
detailed in section 3. In section 4 we design an experimental study using real world datasets
from attributed gene interaction networks to validate our method qualitatively. Finally, we
conclude our paper by discussing the results and limitations of our work while setting up future
improvement in section 5.</p>
      <p>Our main contributions can be summarized as follows:
• We introduce a novel interpretable pattern syntax for attributed subgraph mining
• We implement an interestingness measure to rank the described patterns using a heuristic
branch-and-bound search algorithm
• We design an experimental study to validate our method qualitatively on a real world
dataset</p>
    </sec>
    <sec id="sec-2">
      <title>2. Pattern Syntax</title>
      <p>In this section, we formalize the problem statement and the proposed pattern syntax, and define
the interestingness measure used to rank the patterns.</p>
      <sec id="sec-2-1">
        <title>2.1. Preliminaries</title>
        <p>Let  = (, , ˆ) be a vertex-attributed undirected graph.  is a set of  vertices.  ⊆  ×  ,
a set of  edges, With ˆ we denote a set of  numerical attributes. They are formalized as
functions ˆ() ∈ Dom, denoting the value of attribute ˆ ∈ ˆ for node  ∈  . Here, Dom
denotes the set of possible values to which the attribute values ˆ() belong. We use hats over ˆ
and ˆ to emphasize that we are talking about their empirically observed values here. Whenever
we consider them to be random variables, the hats will be omitted.</p>
        <p>We define a cohesive subgraph with anomalous attributes (CSAA) as a tuple (, ) for which
the following holds:
•  ⊆  is a set of vertices in the graph.
•  ⊆ { (, ) |  ∈ ,  ∈ Dom} is a set of so-called characteristics.</p>
        <p>• ∀(, ) ∈ , ∃ ∈  : () ≥</p>
        <p>We refer to the set of vertices  as the CSAA cover of the CSAA, while  represents the
(exceptional) characteristics of those vertices, restricting the attribute domains of the vertices
within the cover.</p>
        <p>We also define () as the neighborhood of range  of a vertex , as the set of vertices
whose shortest path to  is at most :</p>
        <p>() = { ∈  | (, ) ≤ }.</p>
        <p>While the CSAA pattern syntax has to the best of our knowledge not been proposed before,
it is relevant in concrete applications. E.g., in the biological pathway network analysis
problem mentioned in the introduction, where genes represent the nodes, and mutation data the
characteristics, a CSAA would define a pathway (subgraph) of genes (nodes) such that at for
a maximal subset of mutation characteristics (attributes), there exists at least one gene in the
pathway that fits those mutation criteria.</p>
        <p>Problem Statement Following the notations defined above, given a vertex-attributed graph
, our goal is to find a tuple (, ), such that for every attribute in S, at least one vertex  in  ,
should exceed a specified threshold .</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Interestingness Measure</title>
        <p>Now that we have described the pattern syntax, the next step is to define the ranking mechanism
used to quantify the interestingness of the patterns. This is important, as generally speaking
there will be many CSAA’s in any given attributed network. Thus, in this section we cover our
interestingness measure and discuss its formulation in detail, relying on the FORSIED framework,
that leverages the background knowledge of the user about the data and the complexity of the
pattern to quantify its interestingness using information theory. The framework aims provide
as much knowledge to the user about the data, with as little cost as possible, while relying on
the user’s prior belief. More specifically, our subjective interestingness measure SI is defined as
the ratio of the information content IC (background knowledge) over the description length DL
(complexity) of the pattern. Such formulation intuitively leads to a pattern that maximizes the
information content while remaining easily describable.</p>
        <p>Our approach takes into account both the topology of the graph and its embedded data which
makes it more versatile than other anomaly detection techniques.</p>
        <p>SI(, ) =</p>
        <p>IC(, )
DL(, )
.</p>
        <sec id="sec-2-2-1">
          <title>2.2.1. Information Content</title>
          <p>Subjective measures account for both the data and its user, thus relying on the background
knowledge that the user has about the data. Such prior beliefs are modeled as equality constraints
assuming that the user knows general statistics about the data.</p>
          <p>In this context, we define information content IC of the pattern as the self-information or
surprisal, a measure of unexpectedness of the pattern. Meaning, the least expected patterns are
the most informative, and have the highest information content.</p>
          <p>We also consider only attributed graphs with non-negative integer values for the attributes.
where the attribute value domains are binary (1 or 0).</p>
          <p>In other words ( :</p>
          <p>→ N, ∀ ∈ ). In our experiments we will even consider the situation</p>
          <p>We consider also that the user knows a priori general statistics about the graph, more
specifically the averages over the attributes and vertices. This leads to prior beliefs formalized
as equality constraints on the attribute values  on all the vertices, more specifically, one
constraint modeled after the total sum of the attributes for each vertex, and the other after the
total sum of all the vertices for each attribute in the graph, this is denoted below:
∑︁ Pr()
∑︁ Pr()


︃(
︃(
∑︁ ()
∈
∑︁ ()
∈
)︃
)︃
= ∑︁ ˆ(),
= ∑︁ ˆ(),
^∈^
∈
∀ ∈ ,
∀ ∈ .</p>
          <p>(1)
(2)
(3)
(4)
as background distribution, and we formalize the information content of a pattern IC(, )
as the negative logarithm of the probability that the pattern is present under the background
distribution.</p>
          <p>IC(, ) = − log(Pr(, )).</p>
          <p>And following our problem statement we can define Pr (, ) as</p>
          <p>Pr(, ) = Pr(∀(, ) ∈ , ∃ ∈  : () ≥ ).</p>
          <p>
            Furthermore, as demonstrated in [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ], the maximum entropy problem is a convex optimization
problem, of which the solution is a product of independent Geometric distributions, one for
each vertex attribute-value (), of the form
          </p>
          <p>Pr(() = ) = (1 − ), with  ∈ N and  = 1 − ( + ),
where  is the success probability, and   and   the Lagrange multipliers corresponding to
the two constraint types.</p>
          <p>Thus, eq. (5) can be written as:
Pr(∀(, ) ∈ , ∃ ∈  : () ≥
) =</p>
          <p>Pr(∃ ∈  : () ≥</p>
          <p>).</p>
          <p>∏︁
(,)∈
The factors in this product can be written as follows:</p>
          <p>Pr(∃ ∈  : () ≥ ) = 1 − Pr(∀ ∈  : () &lt; )
∏︁ (1 − (1 − ) ).</p>
          <p>Furthermore,</p>
          <p>Pr(∀(, ) ∈ , ∃ ∈  : () ≥
) =</p>
          <p>∏︁
(,)∈
(1 −
∏︁ (1 − (1 − ) )).
∈
Putting things together, eq. (4) would lead to the following Information Content (IC):
IC(, ) =</p>
          <p>∑︁
(,)∈
(− log(1 −
∏︁ (1 − (1 − ) ))).
∈</p>
        </sec>
        <sec id="sec-2-2-2">
          <title>2.2.2. Description Length</title>
          <p>The description length DL represents the encoding length of the pattern. Ideally, we aim to find
patterns that are the most informative with minimal description cost. Our approach, provides a
more intuitive interpretation of the descriptional complexity of a pattern, such that, a higher
complexity would negatively impact the total interestingness of a pattern, as the description
length DL functions as a regulator.</p>
          <p>As mentioned earlier in section 2.1, we describe the vertex set  in the pattern as the
intersection of a set of neighborhoods (), with  ∈  . More formally, let us define the set
of all neighborhoods  = {() |  ∈  ∧  ∈ N ∧  ≤ } (with  the maximum radius 
considered), and let  ( ) = {() ∈  |  ⊆ ()} be the subset of neighborhoods that
contain  . The length of a description of the set  as the intersection of all neighborhoods in a
subset  ⊆  ( )</p>
          <p>DL(,  ) = log(| |) + || · log(| |).</p>
          <p>The first term accounts for encoding of the number of neighborhoods log(| |), and the second
term for the description of which neighborhoods are involved (|| log(| |)).</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Interesting Outlier Subgraph Detection</title>
      <p>Now that we have defined a CSAA pattern, we will introduce our method in this section,
and subsequently our algorithm1 that searches for patterns and rank them according to the
subjective interesting score defined in section 2.</p>
      <p>The main idea is that the ranking is based on the subjective interestingness score SI moderated
by the background information and the size of the intersection. The bigger the size of the pattern,
the higher the description length DL the lower the total SI. On the other hand, the higher the
information content IC, the higher the total SI. So our algorithm tries to optimize over those
two variables, to maximize SI. Again, our aim is to find patterns such that for a maximal subset
of the attributes, there exists at least one vertex in the pattern that contains anomalous value
compared to the rest of the graph. Applying a brute force search strategy is ineficient, thus we
rely on a beam search approach for the breadth-first-search branch-and-bound algorithm.</p>
      <sec id="sec-3-1">
        <title>3.1. Beam search</title>
        <p>
          Traversing a search tree, branch-and-bound methods keep track of the best solution found so
far and its associated value, and then it computes an upper bound on the highest objective
value that can possibly be achieved when exploring new parts of the search space. If such upper
bound is worse than the current best solution then this portion of the search space cannot
contain the optimal solution, and is pruned. Beam search [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] operates similarly, in each level of
the search tree, beam search expands towards the  most promising solutions, and discards the
rest, where the  is an integer called the beam width. By varying  we can control the search
strategy, from a greedy search for  = 1, mirroring the classic branch-and-bound approach, to
a larger limit to perform a complete search approach, which in most cases is intractable. One of
the main challenge is to define a suitable upper bound for the objective function to efectively
prune irrelevant branches.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Interesting Outlier Subgraph Miner: MIOS</title>
        <p>Our aim is to explore CSAA patterns in attributed graphs and rank them eficiently. To do so, we
employ a beam search branch-and-bound algorithm. Given an attributed graph , maximum
radius  for the ego graph of a given vertex, and  representing the width of the beam search, the
initial step is to enumerate all the neighborhoods () (ie. ego graph of vertex  of radius )
for every vertex  in the graph , using the function G.AllNeighborhoods(d). This would
1Our algorithm code is available on the following github link: https://github.com/aida-ugent/mios
constitute our initial search space candidate-patterns. We initialize BestScore= ∅ to
keep track of the SI values of the most promising  solutions so far. Then for every enumerated
neighborhood (), we generate a pattern object 0. The function G.getNeighborhoods
generate candidate neighborhoods  that intersects with 0 to produce a new pattern  with
the interestingness score ., and an upper bound on the interestingness score that we consider
as an optimistic bound . = * .</p>
        <p>The upper bound on the interestingness score is computed such that the intersection of the
set of neighborhoods . of this pattern  with an additional neighborhood would produce a
pattern  * with its cover  * . containing a single vertex with maximal information content.
Thus | * .| = |.| + 1, and  * . is the same in this case for all the patterns formed by
the diferent vertices in . . Consequently, the upper bound on  * . would depend solely on
the vertex with the maximal IC  * .. Thus the upper bound on SI would be:
* = max∀∈ {({}, * )} .</p>
        <p>(|| + 2) · log(| |)</p>
        <p>This pattern is the best we can hope for as its description length is minimal, and infomation
content is maximal for all the childrens node formed with the additional intersection. New
candidate patterns are generated for next iteration and the algorithm is repeated until the
candidates list is empty. To summarize, the algorithm uses the neighborhoods as search space,
and iteratively adds more neighborhoods to the intersection forming new patterns, thus reducing
the number of vertices in the newly formed patterns, to maximize the interestingness of the
patterns resulting from the intersection of these neighborhoods. The pseudocode for our
algorithm is provided in algorithm 1.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Results</title>
      <p>In this section we define our experimental setup, by describing our dataset, presenting our
qualitative findings of the application of our method on oncological pathway discovery and an
assessment of the quality of the patterns.</p>
      <sec id="sec-4-1">
        <title>4.1. Dataset</title>
        <p>
          To test our framework, we utilized a dataset consisting of gene interaction network [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], and
patient mutation sample data for each gene in the network. The network consists of 185 genes,
and 463 patient samples. We additionally use the PAM50 [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] dataset for validation of the results.
PAM50 is a 50-gene signature that classifies breast cancer into five molecular intrinsic subtypes:
Luminal A, Luminal B, human epidermal growth factor receptor 2 (HER2)-enriched, Basal-like
and Normal-like. Each of the five molecular subtypes vary by their biological properties and
prognoses.
        </p>
        <p>The aim is to find a pathway (subgraph) of genes (vertices) such that for a maximal subset of
samples (attributes), there exist at least one gene in the pathway that is mutated (anomalous). To
assess the abnormality of the gene, we combine data from 463 patients, representing mutation,
copy number variation and gene expression data, such that the attribute on the nodes are binary:
1 corresponding to an abnormal gene, and 0 to normal gene.The dataset is illustrated in fig. 1.</p>
        <p>Algorithm 1: MIOS(, , )</p>
        <p>Input:
: Vertex-Attributed Graph Object,
: Maximum Radius of Neighborhoods,
: Width of the Beam Search
Output:</p>
        <p>top-k-patterns: Top-k anomalous pattern objects
BestScore = ∅
candidate-patterns ← G.AllNeighborhoods()
while candidate-patterns ̸= ∅ do
next-candidate-patterns= ∅
for 0 in candidate-patterns do
 ← G.getNeighborhoods(0., 0. )
for  in  do
if 0. ∪  ̸= 0. and 0. ∩  ̸= ∅ then
 ← Pattern(0. ∪ , 0. ∩ )
if  .upperbound &gt; BestScore then</p>
        <p>BestScore ←  .si
next-candidate-patterns ← addPattern( )
top-k-patterns ← addPattern( )
candidate-patterns ← FilterPatterns(, next-candidate-patterns)
top-k-patterns ← FilterPatterns(, top-k-patterns)
return top-k-patterns</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Results and discussion</title>
        <p>Using MIOS we were able to detect the most relevant genes in the top-50 patterns. We also
perform post-processing of the results to remove redundant patterns that are produced by
diferent descriptions (intersection of diferent genes ego networks). In fig. 2 we visualize the
top ranking pattern, a subgraph of two vertices {TP53, FOXA1} formed by the intersection of
two neighborhoods N(MYB)1, the ego graph of radius 1 around vertex MYB and N(MDM2)1 the
ego graph of radius 1 centered around node MDM2. TP53 is commonly associated with many
forms of cancer. While interpreting the pattern quality is challenging and out of the scope of
this work, we can nonetheless validate the results by assessing the classification of the resulted
patterns and their association with a specific molecular subtype: (LumA, LumB, HER2, Basal,
Normal). The goal is to evaluate the detected patterns with gene set enrichment analysis using
Fisher Exact Test.</p>
        <p>Enrichment analysis of biological pathways is a statistical method that is used to identify
pathways that are enriched in a gene list more than would be expected by chance. It is a
commonly used method to identify classes of genes that are over-represented in a large set of
genes, and may have an association with disease phenotypes. In table 1 we list an example of a
contingency table of a pattern for a specific subtype. Subtype represents the counts of samples
not associated with the subtype. Characteristics represents the counts of samples not contained
in the characteristics of the patterns.</p>
        <p>Fisher exact-test is used to determine the probabilities of observing the various joint values
within a contingency table under the null hypothesis assumption that the marginal values are
ifxed and that there exist no association between the categorical values. We take the a priori
stance that the categories are independent. Consequently, we calculate the probability that this
contingency table with joint values would occur under the null hypothesis. A small probability
is interpreted as a discrepancy between the data and the null hypothesis of no association
between variables.</p>
        <p>Thus, we can couple the subgroup information to recover certain type of molecular subtype
groups.Performing enrichment analysis (fisher exact test, 2-sided and 1-sided) on the top 50
resulting patterns, resulting in fig. 3, we found that the majority of the p-values are below
0.05 for the fisher tests for HER2 and Basal (both being overrepresented), and LumA (being
underrepresented),</p>
        <p>This implies that the majority of our patterns exhibit statistically significant imbalance for
HER2, Basal and LumA, rendering the classification significant.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and future work</title>
      <p>In this paper we presented MIOS, a novel method for interpretable attributed graph mining
for interesting outlier subgraph detection. The goal is to search and rank cohesive sets of
nodes with an intuitive explainable description, and anomalous attribute values compared
to the rest of the graph. We formulated a pattern syntax and implemented an algorithm to
mine such patterns, we then validated our method with real world applications in pathway
mining with preliminary findings showing promising qualitative results. Furthermore, we
performed additional analysis on the results to assess the ranked patterns using fisher exact
tests. The main limitation of our work is the incompletness of the search algorithm, which
would be tackled in future work by implementing better heuristic approaches, using constraint
satisfaction programming to mitigate the pattern explosion problem. Finally, applications with
additional datasets and frameworks with a focus on pathway mining will be another main focus
for improvement.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The research leading to these results has received funding from the European Research Council
under the European Union’s Seventh Framework Programme (FP7/2007-2013) (ERC Grant
Agreement no. 615517), and under the European Union’s Horizon 2020 research and innovation
programme (ERC Grant Agreement no. 963924), from the Flemish Government under the
“Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” programme, and from the
FWO (project no. G0F9816N, 3G042220).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Torkamani</surname>
          </string-name>
          , G. Verkhivker,
          <string-name>
            <given-names>N. J.</given-names>
            <surname>Schork</surname>
          </string-name>
          ,
          <article-title>Cancer driver mutations in protein kinase genes</article-title>
          ,
          <source>Cancer letters 281</source>
          (
          <year>2009</year>
          )
          <fpage>117</fpage>
          -
          <lpage>127</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.</given-names>
            <surname>Horn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Lawrence</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. R.</given-names>
            <surname>Chouinard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shrestha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Worstell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Shea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ilic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kamburov</surname>
          </string-name>
          , et al.,
          <article-title>Netsig: network-based discovery from cancer genomes</article-title>
          ,
          <source>Nature methods 15</source>
          (
          <year>2018</year>
          )
          <fpage>61</fpage>
          -
          <lpage>66</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Creixell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Reimand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Haider</surname>
          </string-name>
          , G. Wu,
          <string-name>
            <given-names>T.</given-names>
            <surname>Shibata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vazquez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mustonen</surname>
          </string-name>
          , A. GonzalezPerez, J. Pearson,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sander</surname>
          </string-name>
          , et al.,
          <article-title>Pathway and network analysis of cancer genomes</article-title>
          ,
          <source>Nature methods 12</source>
          (
          <year>2015</year>
          )
          <fpage>615</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Reyna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Haan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Paczkowska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. P.</given-names>
            <surname>Verbeke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vazquez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kahraman</surname>
          </string-name>
          , S. PulidoTamayo, J.
          <string-name>
            <surname>Barenboim</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Wadi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Dhingra</surname>
          </string-name>
          , et al.,
          <article-title>Pathway and network analysis of more than 2500 whole cancer genomes</article-title>
          ,
          <source>Nature communications 11</source>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-G.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Community detection in multi-layer graphs: A survey</article-title>
          ,
          <source>ACM SIGMOD Record</source>
          <volume>44</volume>
          (
          <year>2015</year>
          )
          <fpage>37</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bothorel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Cruz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Magnani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Micenkova</surname>
          </string-name>
          ,
          <article-title>Clustering attributed graphs: models, measures and methods</article-title>
          ,
          <source>Network Science</source>
          <volume>3</volume>
          (
          <year>2015</year>
          )
          <fpage>408</fpage>
          -
          <lpage>444</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Atzmueller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Günnemann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zimmermann</surname>
          </string-name>
          ,
          <article-title>Mining communities and their descriptions on attributed graphs: a survey</article-title>
          ,
          <source>Data Mining and Knowledge Discovery</source>
          <volume>35</volume>
          (
          <year>2021</year>
          )
          <fpage>661</fpage>
          -
          <lpage>687</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bendimerad</surname>
          </string-name>
          ,
          <article-title>Mining useful patterns in attributed graphs</article-title>
          ,
          <source>Ph.D. thesis</source>
          , Université de Lyon,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Chunaev</surname>
          </string-name>
          ,
          <article-title>Community detection in node-attributed social networks: a survey</article-title>
          ,
          <source>Computer Science Review</source>
          <volume>37</volume>
          (
          <year>2020</year>
          )
          <fpage>100286</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L.</given-names>
            <surname>Akoglu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Tong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Koutra</surname>
          </string-name>
          ,
          <article-title>Graph based anomaly detection and description: a survey, Data mining and knowledge discovery 29 (</article-title>
          <year>2015</year>
          )
          <fpage>626</fpage>
          -
          <lpage>688</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bendimerad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lijfijt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Plantevit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Robardet</surname>
          </string-name>
          , T. De Bie,
          <article-title>Sias-miner: mining subjectively interesting attributed subgraphs</article-title>
          ,
          <source>Data Mining and Knowledge Discovery</source>
          <volume>34</volume>
          (
          <year>2020</year>
          )
          <fpage>355</fpage>
          -
          <lpage>393</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>T. De</surname>
            <given-names>Bie</given-names>
          </string-name>
          ,
          <article-title>Maximum entropy models and subjective interestingness: an application to tiles in binary databases</article-title>
          ,
          <source>Data Mining and Knowledge Discovery</source>
          <volume>23</volume>
          (
          <year>2011</year>
          )
          <fpage>407</fpage>
          -
          <lpage>446</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bisiani</surname>
          </string-name>
          , Beam search,
          <source>Encyclopedia of Artficial Intelligence</source>
          (
          <year>1992</year>
          )
          <fpage>1467</fpage>
          -
          <lpage>1468</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gillespie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Jassal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Stephan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Milacic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Rothfels</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Senf-Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Griss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sevilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Matthews</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gong</surname>
          </string-name>
          , et al.,
          <source>The reactome pathway knowledgebase</source>
          <year>2022</year>
          ,
          <source>Nucleic acids research</source>
          <volume>50</volume>
          (
          <year>2022</year>
          )
          <fpage>D687</fpage>
          -
          <lpage>D692</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Parker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mullins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Cheang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Leung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Voduc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Vickery</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Davies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Fauron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Hu</surname>
          </string-name>
          , et al.,
          <article-title>Supervised risk predictor of breast cancer based on intrinsic subtypes</article-title>
          ,
          <source>Journal of clinical oncology 27</source>
          (
          <year>2009</year>
          )
          <fpage>1160</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>