Finding NeMo: Fishing in banking networks using network
                                motifs
                   Xavier Fontes                                             David Aparício                               Maria Inês Silva
                      Feedzai                                                   Feedzai                                          Feedzai
                  Porto, Portugal                                           Porto, Portugal                                Lisbon, Portugal
            xavier.fontes@feedzai.com                                 david.aparicio@feedzai.com                        ines.silva@feedzai.com

                 Beatriz Malveiro                                       João Tiago Ascensão                                 Pedro Bizarro
                    Feedzai                                                     Feedzai                                         Feedzai
                Lisbon, Portugal                                            Lisbon, Portugal                               Lisbon, Portugal
         beatriz.malveiro@feedzai.com                                 joao.ascensao@feedzai.com                       pedro.bizarro@feedzai.com

ABSTRACT                                                                                  extract insights about how legitimate and fraudulent transactions
Banking fraud causes billion-dollar losses for banks worldwide.                           "behave" and aid both fraud detection systems and fraud analysts.
In fraud detection, graphs help understand complex transaction                               To the best of our knowledge, this work is the first to find and
patterns and discovering new fraud schemes. This work explores                            explore heterogeneous network motifs in a real-world banking
graph patterns in a real-world transaction dataset by extracting                          setting. Notably, we have two main contributions from this work:
and analyzing its network motifs. Since banking graphs are hetero-                              • We propose a randomization process (i.e., a null model) ade-
geneous, we focus on heterogeneous network motifs. Additionally,                                  quate to banking datasets, a vital component for the defini-
we propose a novel network randomization process that generates                                   tion of network motifs used to provide the baseline frequen-
valid banking graphs. From our exploratory analysis, we conclude                                  cies of each subgraph (detailed in Section 2.2.1).
that network motifs extract insightful and interpretable patterns.                              • We extract network motifs from graphs built from card trans-
                                                                                                  action data and review them thoroughly. We include an anal-
Reference Format:                                                                                 ysis of how the motif significance evolves as more random
Xavier Fontes, David Aparício, Maria Inês Silva, Beatriz Malveiro, João                           networks are used (detailed in Section 3).
Tiago Ascensão, and Pedro Bizarro. Finding NeMo: Fishing in banking
networks using network motifs. In the 2nd Workshop on Search,                                The remainder of this paper is organized as follows. Section 2
Exploration, and Analysis in Heterogeneous Datastores (SEA Data 2021).                    presents our method and discusses the key components necessary
                                                                                          for our analysis. We describe the data and present results in Sec-
                                                                                          tion 3. Usage scenarios are proposed in Section 4. We put forward
1    INTRODUCTION                                                                         our main takeaways and offer directions for future work in Sec-
Payment information theft renders online transactions susceptible                         tion 5.
to fraud. Once detected, fraud entails a reimbursement from the
cardholder’s bank, leading to monetary costs to financial institu-                        2     METHOD
tions and customer friction.                                                              In this section we present our methodological choices. First, we
   Fraud detection thus requires a deep understanding of the un-                          discuss how we build networks from banking datasets (Section 2.1).
derlying fraud patterns, and graphs offer an intuitive way to vi-                         Then, we introduce heterogeneous network motifs and describe
sualize these patterns. One can further leverage graph mining in                          how to compute and identify these motifs in banking datasets,
transaction data to understand fraud schemes in a wide range of                           with a special focus on the null model and the measure of motif
applications. Hajdu and Krész [1] developed a methodology to iden-                        significance. (Section 2.2)
tify cycles in transaction networks as a means to detect fraudulent
expenses. Micale et al. [2] retrieved the most frequent patterns in a
                                                                                          2.1     Graph representation
relationship network of people involved in the Panama papers to
identify the most relevant money laundering structures.                                   Banking datasets usually consist of transactions between entities,
   In this work, we build networks from a real-world banking                              such as people, merchants, and businesses. They can include differ-
dataset of card purchases and apply a widely used graph mining                            ent types of transactions such as card payments or bank transfers.
tool – heterogeneous network motifs [7, 10]. Analyzing recurring                             From a banking dataset, we can build two graph representations,
patterns in real banking networks sets a foundation for understand-                       namely (a) entity graphs and (b) transaction graphs, as illustrated
ing how fraud materializes in transaction data. From there, one can                       in Figure 1.

Copyright © 2021 for the individual papers by the papers’ authors. Copyright © 2021
                                                                                          2.1.1 Entity graph. In an entity graph, 𝐺 = (𝑉 , 𝐸), nodes 𝑉 rep-
for the volume as a collection by its editors. This volume and its papers are published   resent entities, such as merchants or clients, and edges 𝐸 connect
under the Creative Commons License Attribution 4.0 International (CC BY 4.0).             entities with at least one shared transaction. This way of repre-
Published in the Proceedings of the 2nd Workshop on Search, Exploration, and Anal-
ysis in Heterogeneous Datastores, co-located with VLDB 2021 (August 16-20, 2021,          senting banking datasets is helpful to highlight suspicious entity
Copenhagen, Denmark) on CEUR-WS.org.                                                      behavior.
                                            Transaction       Client           Merchant     Fraud?

                        Entity                   T1            C1                M1           No                Transaction
                        Graph                    T2            C1                M1           No                Graph
                           (a)                   T3            C2                M2          Yes                (b)
                                                 T4            C1                M1           No

                       C1              C2

                                                                                   T1                 T2               T3               T4


                       M1              M2


                   Client                Merchant                                                  Legitimate         Fraud

                   Legitimate            Fraud                                                     Client             Merchant


                                  Figure 1: Example of (a) an entity graph and (b) a transaction graph.


    Consider, for example, the entity graph in Figure 1 (a). There,              entities, i..e, 𝜇 (𝑢, 𝑣) ∈ 𝑀, |𝑀 | = 2𝑘 − 1. In Figure 1, 𝑘 = 2, hence
we represent C1 and M1 as two connected nodes of different types,                there are 3 possible labels for edges, 𝑀 = 3.
i.e., {𝐶1, 𝑀1} ∈ 𝑉 , (𝐶1, 𝑀1) ∈ 𝐸, and 𝜙 (𝐶1) ≠ 𝜙 (𝑀1), where                       Transaction can be used to investigate patterns between trans-
𝜙 (𝑢) is the type of node 𝑢. Since we connect all 𝑘 entities in a                actions. Additionally, machine learning classification models can
transaction, they form a 𝑘-node clique in the graph. When the same               benefit from receiving topological information (including centrality
two entities are parties to multiple transactions (e.g., a person makes          measures or node embeddings) extracted from transaction graphs
several purchases at a retail store), we aggregate the transaction               [5].
information into a single edge (𝑢, 𝑣) ∈ 𝐸.
    Therefore, an entity graph is undirected and heterogeneous. The
node label 𝜙 (𝑢) corresponds to the entity’s type (i.e., client, card,           2.2      Heterogeneous network motifs
merchant, or terminal). The edge label 𝜇 (𝑢, 𝑣) is binary, indicating            A network motif is a subgraph that appears more frequently than
whether there is at least one fraudulent transaction involving the               expected. The concept of appearing more frequently than expected
two entities.                                                                    commonly relies on building a large set of randomized networks
                                                                                 R𝐺 that are similar to the original network [4]. In the literature,
2.1.2 Transaction graph. In a transaction graph, 𝐺 = (𝑉 , 𝐸), nodes              authors typically use either 100 or 1000 randomized networks [9].
𝑉 represent individual transactions, and edges 𝐸 connect transac-                   Suppose the frequency of a given subgraph 𝑚𝑖 in the original
tions that share entities.                                                       network is, according to some significance measure (Section 2.2.2),
    In the transaction graph from Figure 1 (b), transactions T1 and              much higher than the (average) frequency on similar randomized
T2 are represented as two nodes connected by an edge indicating                  networks, i.e., 𝑓 𝑟𝑒𝑞(𝑚𝑖 , 𝐺) >> 𝑎𝑣𝑒𝑟𝑎𝑔𝑒_𝑓 𝑟𝑒𝑞(𝑚𝑖 , R𝐺 ). In that case,
that the same client made both, i.e., {𝑇 1,𝑇 2} ∈ 𝑉 and (𝑇 1,𝑇 2) ∈ 𝐸.           subgraph 𝑚𝑖 is a network motif. Similarly, if the subgraph’s fre-
Since connecting all transactions with common entities would result              quency is much lower than the (average) frequency on similar ran-
in very dense graphs, we only connect transactions that occurred                 domized networks, that subgraph is an anti-motif. In this work, we
within a time window, e.g., transactions made by a client in less than           are interested in both motifs and anti-motifs.
24 hours. Moreover, we use edge direction to encode the temporal                    Motif discovery involves computing the frequency of a given set
sequence of transactions, with edges connecting older transactions               of subgraphs and entails subgraph counting [6]. Subgraph counting
to more recent ones, i.e., if (𝑢, 𝑣) ∈ 𝐸, then 𝑣 is more recent than 𝑢.          receives as input a graph G and a list of non-isomorphic subgraph
If two transactions occur in the same timestamp, we add a bidirec-               types (e.g., all possible unique subgraphs with four nodes). Then,
tional edge between the two nodes, i.e., {(𝑢, 𝑣), (𝑣, 𝑢)} ∈ 𝐸.                   it outputs the frequencies of each subgraph type in G, e.g., the
    Thus, a transaction graph is directed and heterogeneous. The                 frequencies of 4-node cliques, 4-node chains, or 4-node stars.
node label is binary (the transaction is fraudulent or legitimate),                 In this work, we use g-tries for subgraph counting since they are
i.e., 𝜙 (𝑥) ∈ {𝑓 𝑟𝑎𝑢𝑑, 𝑙𝑒𝑔𝑖𝑡 }. The edge label is one of 2𝑘 − 1 possible         a general framework able to count subgraphs of arbitrary size in
labels, all the combinations of sharing one or more of 𝑘 different               heterogenous graphs [7, 8]. Other approaches are faster for counting
                                                                           2
                                                                                                              (d) Heterogeneous
          (a) Homogeneous           (b) Heterogeneous Nodes             (c) Heterogeneous Edges
                                                                                                               Nodes and Edges

                   Figure 2: Different 3-node cliques. In our case, we use both heterogeneous nodes and edges.


specific subgraphs, but, as far as we know, g-tries are the only            build the randomized networks from the randomized tabular data.
method that support directed and heterogenous graphs [6].                   Thus, the randomization procedure works as follows:
   Since our graphs are heterogeneous, we need network motifs
                                                                               (1) Shuffle all the fraud labels. This step maintains the fraud
that consider node and edge heterogeneity. Concretely, we need to
                                                                                   rate of the original dataset but randomizes its attribution to
extract heterogeneous motifs and anti-motifs.
                                                                                   different transactions.
   As an illustrative example, consider a 3-node clique where nodes
                                                                               (2) Iterate over each entity type and, for each entity type, we
and edges are homogeneous (Figure 2 (a)) and the following hetero-
                                                                                   randomly chose 𝜌 ∗ 𝑚 pairs of values to be swapped. Here,
geneous graph settings: (i) if nodes can be of two different types,
                                                                                   𝜌 ∈ [0, 1] controls how much we want to randomize the
there are four possible 3-node cliques (Figure 2 (b)), (ii) if edges
                                                                                   original network, and 𝑚 is the number of transactions.
can be of two different types, there are four possible 3-node cliques
                                                                               (3) Build the resulting graph from the randomized data, as de-
(Figure 2 (c)). Thus, in our work, disregarding node or edge labels
                                                                                   scribed in Sections 2.1.1 and 2.1.2.
undermines the necessary differentiation of topological structures.
Heterogeneous motif discovery is more informative than traditional             This network randomization strategy guarantees semantically
homogeneous motif discovery and more complex to extract and                 valid banking networks with a random topology.
analyze.
                                                                            2.2.2 Motif significance measures. After doing subgraph counting
   In banking fraud analysis, heterogeneous network motifs are
                                                                            on both the original network and the randomized networks, we use
more helpful than homogeneous motifs. For instance, knowing that
                                                                            a motif significance measure to evaluate which subgraphs are over-
"clients connected to many different cards are more likely to be
                                                                            and under-represented. Several measures have been proposed [11],
fraudulent" is arguably more informative than just knowing that
                                                                            but here we focus on two of the most well-established: the z-score
"dense subgraphs can be indicative of fraud".
                                                                            and the ratio.
2.2.1 Network randomization. Computing the expected frequency                  The z-score of a subgraph, 𝑧𝑖 , is computed as follows:
of a given subgraph requires randomizing the original network
so that randomized networks are similar to the original. However,                                           𝑓 𝑜 − 𝜇𝑅
defining network similarity is non-trivial and task-dependent. In                                      𝑧𝑖 = 𝑖 𝑅 𝑖                             (1)
practice, the following two approaches are common:                                                             𝜎𝑖
     • Initialize the randomized graph as a copy of the original              Where,
       graph and iteratively swap random pairs of edges [9]. This
                                                                                • 𝑓𝑖𝑜 is the frequency of subgraph 𝑖 in the original network.
       method preserves vertices’ in- and out-degrees.
     • Initialize the randomized graph with the same nodes as the               • 𝜇𝑖𝑅 and 𝜎𝑖𝑅 are the mean and standard deviation of the fre-
       original graph and incrementally add edges with probabili-                 quencies of subgraph 𝑖 in the randomized networks, respec-
       ties based on each nodes’ degree in the original graph [4].                tively.
       This method provides an approximation of the vertices’ in-             We compute the ratio of a subgraph, 𝑟𝑖 , as follows:
       and out-degrees.
   However, these strategies are unsuitable to the banking fraud                                               𝑓𝑜
domain as they fundamentally change the semantic structure of                                             𝑟𝑖 = 𝑖𝑅                             (2)
                                                                                                               𝜇𝑖
banking graphs. For instance, in an entity graph, entities of the
same type are never directly connected by an edge. However, adding              Our goal is to find the subgraphs with the highest z-scores/ratio
or removing edges indiscriminately (while preserving each node’s            (i.e., the motifs) and subgraphs with the lowest z-scores/ratio (i.e.,
degree) will eventually lead to connecting nodes of the same type           the anti-motifs).
and thus compromise the validity of the randomized network. Since               In our experiments (Section 3), we show the ratio of the sub-
these strategies do not take node and edge labels into account, we          graphs since it is more interpretable than the z-score: if 𝑟𝑖 = 100,
need to follow a different approach for network randomization.              the subgraph appears 100 times more often in the original network
   Instead of randomizing the original graph directly, we apply             than in the randomized networks. We complement our analysis by
a randomization procedure directly on the tabular data and then             reporting 𝑓𝑖𝑜 , 𝜇𝑖𝑅 , and 𝜎𝑖𝑅 .
                                                                        3
3 RESULTS                                                                         Ratio
                                                                              10,000
3.1 Data overview
Table 1 outlines the parameters used to generate the entity graph
                                                                                                                                              Motifs
and the transaction graph and provides summary statistics on the               1,000
two graphs.
    In the entity graph, we consider four node types (i.e., client,
                                                                                 100
merchant, terminal, and card) and two edge types (i.e., fraudulent
or legitimate).
    In the transaction graph, we consider two node types (i.e., fraud-            10
ulent and legitimate) and three edge types (i.e., only client, only
merchant, or both client and merchant). The rationale for choosing
these two entities lies in the close relationship between clients and              1
cards and, similarly, merchants and terminals. In other words, since
most clients use few cards and most merchants have few terminals,
                                                                                 0.1
we simplify the graph by dropping the card and terminal entities.                                                                        Anti-Motifs
    Additionally, we found it necessary to constrain the considered
time window due to computational constraints on subgraph count-                 0.01
ing.                                                                                   0          20          40           60           80        100
    Both graphs have hundreds of thousands of edges and many                                           Number of Random Networks
different connected components. The fraud rate in the entity graph
is the number of fraudulent edges, while, in the transaction graph,          Figure 3: Entity graph’s ratio evolution, highlighting the top-
the fraud rate is the number of fraudulent nodes. As a result, the           6 motifs in dark blue and the top-4 anti-motifs in light blue.
fraud rates differ depending on the graph type.

3.2    Motif analysis                                                        have significantly lower ratio than the others (𝑟𝑖 ≈ 0.01). We con-
In this subsection, we start with the entity graph results and then          sider the first to be the motifs, and the last to be the anti-motifs,
show the results of the transaction graph.                                   shown in Figure 4 (a) and Figure 4 (b), respectively.
   We analyze which subgraphs are motifs and anti-motifs based                  From Figure 4 we notice that none of the subgraphs is a 3-node
on the ratio (Equation 2): we consider subgraphs that have a con-            clique. This result might seem counter-intuitive as all transactions
siderably higher ratio than the others to be motifs, and subgraphs           form a 4-node clique. A possible explanation is that some merchants
with the lowest ratios to be anti-motifs.                                    are hub nodes and thus induce chain-like subgraphs. Another gen-
   We focus on subgraphs of size three due to the higher computa-            eral observation is that all edges in the motifs are fraudulent, which
tional cost of larger subgraphs.                                             is not true for the anti-motifs. Domain knowledge indicates that
                                                                             fraudulent transactions tend to be closely connected.
3.2.1 Entity graph. From Figure 3 we can see that, for most cases,              The first three motifs from Figure 4 (a) have a client at its center,
the ratio of all subgraphs stabilizes relatively quickly at ≈ 40 ran-        connected to either (i) two terminals, (ii) one terminal and a mer-
dom networks. We observe that a few subgraphs have significantly             chant, or (iii) two merchants. Subgraphs (i) and (iii) tell us that the
higher ratio than the others (𝑟𝑖 > 100). Moreover, a few subgraphs           client made two fraudulent transactions at different merchants and
                                                                             terminals, respectively. These two subgraphs are motifs because
                                                                             fraud is sometimes recurring for fraudulent clients, and this infor-
                                                                             mation is lost in the randomized networks since they reshuffle fraud
Table 1: Parameters used for graph generation and general                    labels. Subgraph (ii) tells us a similar story but notice that, since the
statistics (NA means Not Applicable).                                        merchant and the terminal are not connected, the merchant and the
                                                                             terminal are from different transactions. The last three motifs are
                                                                             equivalent to the first three but with the card at the center of the
                             entity graph      transaction graph
                                                                             subgraph. Since clients can use multiple cards, the original network
       period              2 days, 5 hours         1 day, 21 hours           counts are lower for subgraphs with the card at the center of the
       lookback period          NA                     6 hours               chain than the counts of the subgraphs with the client at the center
       random networks                       100                             of the chain. We also note that, in practice, it might be interesting
       𝜌                                     0.8                             to investigate all cards used by the client.
       nodes                    51,428                 20,209                   The first three anti-motifs from Figure 4 (b) have a merchant
       edges                   104,833                214,054                at its center. All three anti-motifs convey the same information:
       node types                  4                      2                  merchants typically either have only legitimate transactions or only
       edge types                  2                      3                  fraud transactions, i.e., fraud tends to cluster around the same risky
       conn. components         5,796                  9,877
                                                                             merchants. In the randomized networks, since we randomly swap
       fraud rate              1:1000                  1:500
                                                                             edges, these relations are lost. Finally, the last anti-motif tells us
                                                                         4
                                                                                      Ratio
                                                                                100,000

                                                                                                                                               Motifs
                                                                                   10,000


           700                    600                     550                       1,000
      14 / 0.02 ± 0.14       24 / 0.04 ± 0.28        11 / 0.02 ± 0.14


                                                                                     100


                                                                                      10


           700                    600                     550                          1
      12 / 0.04 ± 0.20       20 / 0.08 ± 0.39        9 / 0.04 ± 0.20
                                                                                                                                          Anti-Motifs

                               (a) Motifs                                             0.1
                                                                                            0       20          40           60          80        100
                                                                                                         Number of Random Networks


                                                                               Figure 5: Transaction graph’s ratio evolution, highlighting
                                                                               the top-4 motifs in dark blue and the top-1 anti-motif in
                                                                               light blue.
           0.02                   0.02                    0.02
      39 / 2335 ± 1011      39 / 2335 ± 1011        39 / 2324 ± 1020

                                                                                  Three of the four top motifs are a 3-node clique, which happens
                                                                               when three transactions that share a merchant or client occur within
                                    Client              Merchant               the same six-hour window. Once again, the randomization of the
                                                                               entities breaks such patterns.
                                    Card                Terminal                  It is important to note that some motifs contain transactions
                                    Legitimate                                 processed in the same timestamp (represented with bi-directional
                                                        Fraud
           0.03                                                                edges). This pattern can happen in the same merchant when the
      704 / 22554 ± 17                                                         merchant is processing transactions in a batch or has multiple
                                                                               terminals.
                            (b) Anti-Motifs
                                                                                  The only anti-motif is a sequence of three transactions in the
                                                                               same merchant, where the middle transaction is fraudulent, and the
Figure 4: Entity graph motifs and anti-motifs. For each sub-                   other two are legitimate. This pattern is less frequent in the original
graph 𝑖, we show its ratio 𝑟𝑖 (first line) and 𝑓𝑖𝑜 / 𝜇𝑖𝑅 ±𝜎𝑖𝑅 (second          network than in the randomized networks since fraud transactions
line).                                                                         typically occur together in reality, and the randomized networks
                                                                               break this pattern.

                                                                               4     USAGE SCENARIOS
that it is not common for clients to use multiple cards in legitimate
transactions. Indeed, clients that use multiple cards are typically            The motif analysis can be a tool to characterize banking datasets. Be-
associated with fraudulent activity.                                           yond summary statistics, the list of motifs and anti-motifs surfaces
                                                                               underlying fraud patterns.
3.2.2 Transaction graph. Figure 5 shows the evolution of the ratio                Fraud experts, who may not be knowledgeable in data science or
for each 3-node subgraph in the transaction graph. After 80 random             statistics, often use graph visualization for data exploration. Motif
networks, the ratios seem to stabilize, and we can clearly distinguish         analysis serves as a visual summary characterization of a dataset
five subgraphs that stand out, namely, the top-4 subgraphs and the             for fraud detection.
bottom-1 subgraph. These subgraphs are presented in Figure 6.                     As an illustrative example, let us consider a fraud expert at a
   The common feature in the four motifs is at least two transactions          bank. Their role entails designing new rules to prevent fraud and
that share the same client and merchant. We may lose this pattern              reviewing unlabelled transactions. The expert can review the char-
during the randomization since we swap merchants and clients                   acteristic patterns surfaced by motif analysis to tailor the fraud de-
independently. However, it is still interesting to observe a significant       tection system in place. Over time, fraudsters design new schemes
number of patterns where the same client makes two or three                    to evade detection. Periodic analysis of motifs surfaces upcoming
transactions in the same merchant in less than six hours.                      fraud schemes and overall behavioral trends.
                                                                           5
                                                                                 As future work, one can investigate whether different banking
                                                                              datasets have similar motifs (and anti-motifs) and if those patterns
                                                                              are different in merchant datasets. This research would follow the
                                                                              findings by Milo et al. [3] where they report that networks with
                                                                              similar contexts have similar subgraph patterns. One can also ex-
                                                                              tend the analysis to larger motif sizes, different temporal windows,
                     12,564                     8,357                         and the inclusion of transaction amounts or fraud labels as graph
                1759 / 0.14 ± 0.99        2507 / 0.30 ± 0.54                  properties.

                                                                              REFERENCES
                                                                               [1] László Hajdu and Miklós Krész. 2020. Temporal network analytics for fraud
                                                                                   detection in the banking sector. In ADBIS, TPDL and EDA 2020 Common Workshops.
                                                                                   Springer, 145–157.
                                                                               [2] Giovanni Micale, Alfredo Pulvirenti, Alfredo Ferro, Rosalba Giugno, and Dennis
                                                                                   Shasha. 2019. Fast methods for finding significant motifs on labelled multi-
                                                                                   relational networks. Journal of Complex Networks 7, 6 (2019), 817–837.
                      2,223                     1,567                          [3] Ron Milo, Shalev Itzkovitz, Nadav Kashtan, Reuven Levitt, Shai Shen-Orr, Inbal
                 289 / 0.13 ± 0.34        1254 / 0.80 ± 6.94                       Ayzenshtat, Michal Sheffer, and Uri Alon. 2004. Superfamilies of evolved and
                                                                                   designed networks. Science 303, 5663 (2004), 1538–1542.
                                                                               [4] Ron Milo, Shai Shen-Orr, Shalev Itzkovitz, N Kashtan, Dmitri Chklovskii, and
                                 (a) Motifs
                                                                                   Uri Alon. 2002. Network Motifs: Simple Building Blocks of Complex Networks.
                                                                                   Science (New York, N.Y.) 298 (11 2002), 824–7.
                                                                               [5] Catarina Oliveira, João Torres, Maria Inês Silva, David Aparício, João Tiago
                                                Legitimate                         Ascensão, and Pedro Bizarro. 2021. GuiltyWalker: Distance to illicit nodes in the
                                                                                   Bitcoin network. arXiv:2102.05373 [cs.LG]
                                                Fraud                          [6] Pedro Ribeiro, Pedro Paredes, Miguel EP Silva, David Aparicio, and Fernando
                                                                                   Silva. 2019. A survey on subgraph counting: concepts, algorithms and applications
                                                Client                             to network motifs and graphlets. arXiv preprint arXiv:1910.13011 (2019).
                                                                               [7] Pedro Ribeiro and Fernando Silva. 2014. Discovering colored network motifs. In
                       0.29                     Merchant
                                                                                   Complex Networks V. Springer, 107–118.
               3315 / 11493 ± 7582                                             [8] Pedro Ribeiro and Fernando Silva. 2014. G-Tries: a data structure for storing and
                                                                                   finding subgraphs. Data Mining and Knowledge Discovery 28, 2 (2014), 337–377.
                                                                               [9] Pedro Ribeiro, Fernando Silva, and Marcus Kaiser. 2009. Strategies for network
                              (b) Anti-Motifs                                      motifs discovery. In 2009 Fifth IEEE International Conference on e-Science. IEEE,
                                                                                   80–87.
                                                                              [10] Ryan A Rossi, Nesreen K Ahmed, Aldo Carranza, David Arbour, Anup Rao,
Figure 6: Transaction graph motifs and anti-motifs. For each                       Sungchul Kim, and Eunyee Koh. 2019. Heterogeneous network motifs. arXiv
                                                                                   preprint arXiv:1901.10026 (2019).
subgraph 𝑖, we show its ratio 𝑟𝑖 (fist line) and 𝑓𝑖𝑜 / 𝜇𝑖𝑅 ± 𝜎𝑖𝑅              [11] F. Xia, H. Wei, S. Yu, D. Zhang, and B. Xu. 2019. A Survey of Measures for
(second line).                                                                     Network Motifs. IEEE Access 7 (2019), 106576–106587.


   When reviewing an unlabelled transaction, the expert can com-
pare the respective subgraph with known motifs. This context can
give insight into which pattern, fraudulent or legitimate, might
occur.
   On the other hand, by extracting the most relevant transaction
patterns and uncovering new fraud schemes, motif analysis can
complement common fraud detection systems. These insights can
be used to improve both machine learning models and rule-based
systems.

5   CONCLUSIONS
We explore motifs and anti-motifs in the context of banking fraud
using two graph representations, namely entity graphs and trans-
action graphs. We propose a novel randomization method that
operates directly on tabular data. This way, we overcome the limi-
tations of current network randomization methods in the context
of banking fraud.
   Moreover, we extract heterogeneous network motifs that convey
more information than traditional network motifs and find they
offer interpretable results. Insights extracted from motif analysis can
be used to aid fraud analyst investigate specific cases and improve
fraud detection systems by uncovering new fraud schemes.
                                                                          6