=Paper=
{{Paper
|id=Vol-2644/paper40
|storemode=property
|title=Rule-based Anti-Money Laundering in Financial Intelligence Units: Experience and Vision
|pdfUrl=https://ceur-ws.org/Vol-2644/paper40.pdf
|volume=Vol-2644
|authors=Luigi Bellomarini,Eleonora Laurenza,Emanuel Sallinger
|dblpUrl=https://dblp.org/rec/conf/ruleml/BellomariniLS20
}}
==Rule-based Anti-Money Laundering in Financial Intelligence Units: Experience and Vision==
<pdf width="1500px">https://ceur-ws.org/Vol-2644/paper40.pdf</pdf>
<pre>
      Rule-based Anti-Money Laundering in Financial
        Intelligence Units: Experience and Vision∗

            Luigi Bellomarini1 , Eleonora Laurenza1,2 , and Emanuel Sallinger3
                                       1 Banca d’Italia
                             2 Financial Intelligence Unit for Italy
                              3 TU Wien & University of Oxford


        Abstract. Money laundering is a major threat to the good functioning of finan-
        cial systems. Despite huge technological investments, with machine learning at
        the heart of the Fintech revolution, we are still lacking explainable solutions in
        fighting money laundering, especially for Financial Intelligence Units (FIUs).
        This paper is based on the joint committment of the Fintech community and
        academia in applying state-of-the-art rule-based reasoning to counteract money
        laundering. We report a visionary position about the application of logic-based
        Knowledge Graphs and reasoning with languages in the Datalog+/- family in the
        anti-money laundering (AML) domain. After motivating the impact and the im-
        portance of an explainable rule-based solution, we pin down the core AML prob-
        lems in the form of high-level decision tasks. We envision that the FIU knowledge
        is modeled as the ground truth of a KG, so that AML tasks are formulated and
        carried out as reasoning tasks, addressing specific quality desiderata. We provide
        technical zoom and concrete exemplification of the approach with a real money
        laundering case. We discuss relevant research and technological challenges.

        Keywords: Knowledge Graphs · Reasoning · Anti-money laundering · Fintech.

1    Introduction
This paper is based on our experience with the Financial Intelligence Unit (FIU) for
Italy, one of the most relevant in the world, and the University of Oxford in applying
state-of-art rule-based reasoning to counteract money laundering. Recent guidelines of
the Financial Action Task Force [18] make us aware that the AML and FIU commu-
nities are permeable to and in strong need for such declarative and fully explainable
approaches. Yet, it is our experience that no systematic approach, methodology or sys-
tem has been developed at any FIU or shared in the literature. This paper contributes a
visionary opinion based on a real-world experience, with the ambitious goal of steering
processes and systems at FIUs towards a rule-based direction, for explainable, efficient
and ethical AML decisions.
    Money laundering is the process of making illegally-gained money proceeds, that
is dirty money, appear legal and clean [21], by obfuscating the origin of the money.
Anti-money laundering is a global challenge, which involves financial and non-financial
intermediaries of the private sector (banks, trusts, money transfers, casinos, securities
    ∗ The views and opinions expressed in this paper are those of the authors and do
not necessarily reflect the official policy or position of the Italian FIU or Banca d’Italia.
Copyright c 2020 for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).
 dealers, real estate brokers, public notaries, dealers , etc.), Financial Intelligence Units
 and law enforcement agencies. Our interest here is in FIUs, the government agencies
 that act as intermediaries between the private sector and law enforcement agencies.
 FIUs collect and analyze raw transactional data, namely, Suspicious Transaction Re-
 ports (STRs), filed by the intermediaries and conduct financial intelligence investiga-
 tion to produce detailed, inspectable and logically consequential inquiry reports of the
 money laundering cases as well as justifiable hypotheses on the underlying crime (tech-
 nically the predicate offences) for legal follow-up.
     The current outlook on worldwide money laundering is disconcerting: The Interna-
 tional Monetary Fund [28], estimates the amount of laundered money to be between 1.7
 and 4.5 trillion USD. Meanwhile, according to Refinitiv [33], over the last year there has
 been an increase of over 50% in AI-based Fintech investment in improvements in con-
 trasting financial crime. Machine learning (ML) approaches are gaining ground [11]:
 novel supervised [34] and unsupervised [30] techniques are being proposed, with a re-
 cent focus on graph deep learning [37], natural language processing applications [26]
 and social network analysis [13]. Despite these efforts, the global progress in fighting
 financial crime is insufficient to address the uprising emergency [3]. In fact, AI ven-
 dors [36] and ML literature [11,14,31] have been concentrating on “compliance use
 cases” to support financial intermediaries (e.g., to accommodate the Fifth AML Di-
 rective [16]), while the entire AML community is looking at the AI technology in its
 growing demand for viable solutions able to fulfill the requirements of the FIU business.
     FIU use cases are far from the compliance ones and involve several core quality
 desiderata: 1. Complex reasoning tasks on sophisticated money laundering cases in-
 volving a network of many heterogeneous entities should be possible; 2. Fully explain-
 able elicitation of the specific financial fraudulent schemes is desired in order to sustain
 legal follow-ups and avoid unethical or unjustified conclusions; 3. A careful balance of
 inductive approaches and the extensive domain experience must be stricken; 4. Solu-
 tions should be scalable to support very high volumes. None of the above requirements
 are currently met by any existing framework or set of techniques for AML [11].
 Contribution. In this paper, we leverage our experience with the Italian FIU and our
 work on state-of-art Knowledge Representation and Reasoning (KRR) languages such
 as Warded Datalog± [5], a language of the Datalog± family. Datalog± extends Datalog
 with existential rules, while introducing syntactic restrictions to guarantee decidability
 and tractability of the reasoning task. We have successfully applied Warded Datalog± as
 a core language for reasoning on Knowledge Graphs (KGs) [5], a data model for which
 applied reasoning use cases are growing [19]. We suggest that the favourable features
 of such languages and the availability of engineered systems such as VADALOG [7],
 enable reasoning on complex cases of general interest to FIUs and provide a visionary
 position motivating the application of a rule-based approach in order to fulfill the above
 desiderata. In particular, the main contributions of this work are:

– A novel comprehensive formulation of AML use cases as high-level decision tasks,
  with unprecedented attention to the aspects of relevance for FIUs.
– An introduction of our vision on a rule-based reasoning approach to AML. In par-
  ticular, we envision that the enterprise knowledge available to FIUs and intermediaries
  is modeled as the ground truth of an AML KG. Domain experience from financial an-
  alysts as well as machine learning models should be represented as reasoning rules
  over the KG, so that AML tasks are formulated and carried out as reasoning tasks.
– A technical zoom and concrete exemplification of the benefits of a rule-based ap-
  proach to AML thanks to an anonymized and stylized Italian money laundering case.
– In the light of the described desiderata, a deep discussion of the most relevant research
  and technological challenges in a reasoning approach to AML: data wrangling, com-
  pliance checks, analytics, detection of seen and unseen laundering patterns, handling
  uncertainty, and taking ethical decisions.
 Overview. Section 2 includes the use cases and our vision. Section 3 analyzes a real-
 world case from the Italian FIU. Section 4 focuses on challenges and opportunities for a
 rule-based approach to AML. Relevant related work is extensively discussed throughout
 the paper and summed up in Section 5. In Section 6 we draw our conclusions.

 2     A Rule-based Vision on AML
 We start in Section 2.1 by getting to the core of the complex set of AML-related pro-
 cesses a FIU carries out. We capture them in the form of business use cases, presented
 as straightforward research problems. In Section 2.2, we introduce a visionary posi-
 tion and model business use cases as reasoning tasks over a comprehensive rule-based
 Knowledge Graph for AML.

 2.1   AML Use Cases: Getting to the Core

 AML is the task of preventing the process of money laundering as a whole, in all its
 phases. It is frequently presented only through the lens of regulatory compliance, as the
 burden falls primarily on intermediaries, responsible for meeting Know Your Customer
 (KYC) standards posed by regulations or conducting Customer Due Diligence to avoid
 the related high penalties. On the other hand, FIUs, at the heart of the money-laundering
 contrast system, conduct many intertwined activities, culminating in financial intelli-
 gence analysis and disseminating the results to the judicial authority. At the core of
 such activities, we see the following two high-level decision tasks:
1. Suspicion assessment. Decide whether a financial transaction or a case meets a
   definition of “suspicious”.
2. Suspicious offence determination. Decide whether a suspicious predicate offence
   underlies a transaction or case.
 In both the tasks we need to distinguish between transaction and case. While the former
 is a money/asset transfer together with its related attributes (e.g., transaction amount,
 KYC profiles of the involved subjects), cases are sets of cohesive transactions along
 with their context, which may include: The network of involved subjects, assets, compa-
 nies, shareholding structures, etc. The definition of suspicious may either derive from a
 national regulatory body or be independently adopted by the FIU. It can be either based
 on certain criteria or on probabilistic ones. A detailed example of case is in Section 3.
      Task (2) complements Task (1) in that it makes the explainability requirement ex-
 plicit: Besides knowing the fully detailed cause of suspiciousness, we want to decide on
 possible underlying predicate offences. More advanced characterizing tasks of practical
 utility for the FIUs can be formulated as derived tasks extending the previous ones:
- Suspiciousness scoring. Produce a heuristic measure (e.g., a score) of the level of
  suspiciousness of a transaction or a case (see Task (1)).
- Suspicion classification. Classify a case or transaction by its underlying predicate
  offences or produce a heuristic measure (e.g., a score) of the level of confidence that
  a specific predicate offence underlies a case or transaction (see Task (2)).
- Risk-driven optimization. Perform a risk-based scheduling or planning of the over-
  all AML activity to optimize an enterprise-level (FIU or intermediary) objective
  function based on suspiciousness (see Task (1)) or predicate offence (see Task (2)).
  For example, process first more suspicious cases or higher-impact crimes.
- Reconstruction. Produce an explanation about the suspiciousness of a transaction
  or case (see Task (1)) or for the presence of a given predicate offence (see Task (2)).


2.2   Towards Rule-based Knowledge Graphs for AML

In order to lay out our reasoning framework for AML, let us point out the basic notions
of Knowledge Graphs and Knowledge Representation and Reasoning (KRR) languages.
    A KG can be defined as a semi-structured data model characterized by three com-
ponents: 1. a ground extensional component (EDB), with constructs, namely facts, to
represent data in terms of a graph or a generalization thereof; 2. an intensional compo-
nent (IDB), with reasoning rules over the facts of the ground extensional component;
3. a derived extensional component, produced in the reasoning process, which applies
rules on ground facts [7]. In this work, we refer to logic-based KGs, i.e., where the
intensional component is defined with a logic-based KRR language. We envisage the
adoption of VADALOG, a reasoning language revolving around Warded Datalog± , a
language of the Datalog± family that extends Datalog with existential quantification.
Warded Datalog± captures full Datalog [10,27], so it supports full recursion, essential
to navigate graph structures; at the same time it allows ontological reasoning, being able
to express SPARQL queries under set semantics and the entailment regime for OWL 2
QL while guaranteeing scalability thanks to PTIME data complexity for the reasoning
task [24]. VADALOG supports additional features of practical utility: a form of aggre-
gation [35], stratified negation, Boolean conditions, mathematical expressions, proba-
bilistic reasoning, embedded functions and arbitrary machine learning models [7].
    The IDBs are defined in terms of existential rules, i.e. first-order sentence of the
form: ∀¯   𝑥∀¯
             𝑦 (𝜑(¯
                  𝑥 , 𝑦¯) → ∃¯𝑧 𝜓(¯   𝑥 , ¯𝑧)), where 𝜑 (the body) and 𝜓 (the head) are conjunc-
tions of atoms with constants and variables. Heads can also equate variables. We write
𝜑(¯𝑥 , 𝑦¯) → ∃¯𝑧 𝜓(¯ 𝑥 , ¯𝑧 ) and replace ∧ with comma to denote conjunction of atoms. The
semantics of such a rule is intuitively as follows: if there is a fact 𝜑(¯𝑡 , ¯𝑡 0) in the derived
extensional component (denoted as Σ(𝐷) and which initially coincides with the EDB),
then there exists a tuple ¯𝑡 00 such that the facts 𝜓(¯𝑡 , ¯𝑡 00) are also in Σ(𝐷). Roughly, the
reasoning process is the generation of Σ(𝐷) to answer specific queries.
A KG for AML. We are now ready to apply the above definitions to our domain. We
see the set of AML tasks of interest to a FIU as reasoning over an encompassing KG,
modeling all the relevant domain objects and interconnections including: transactions,
enterprise data stores, compliance rules, suspicious patterns, etc. In particular, transac-
tion data and Suspicious Transaction Reports (STRs) should be represented as EDBs
as well as data from enterprise knowledge systems. For intermediaries, they include
                  Fig. 1: An AML KG, with the main tasks and challenges.

transaction, enterprise and KYC data; EDBs derive from: STRs, business, person, as-
set, real estate and invoice registers, facts from social networks and media, newspapers
or follow-up feedback from law enforcement authorities. IDBs should be used to repre-
sent and operationalize official regulations — in a “RegTech” approach — and encode
custom criteria, including money laundering patterns (e.g., circular wire transfers, pyra-
midal control structures), domain rules and suspicious behaviours.
AML Intensional Knowledge and Reasoning. We believe that most of the money
laundering patterns, suspicious behaviours and financial business rules can be described
with a KRR language like VADALOG, supporting full recursion, ontological reasoning,
probabilistic reasoning, and machine learning models. We envision that some reasoning
rules are designed by financial analysts and domain engineers, while others are learnt
from data, e.g., with statistical relational learning approaches [32].
     Crafted rules embody valuable domain knowledge that cannot be induced from the
data (e.g., a compliance regulation, the internals of a money laundering pattern, a com-
plex domain rule) or is well known to the analysts (e.g., money laundering patterns,
tactics for financial trail obfuscation) and inducing it from data would result in lower
accuracy and explainability. The following two VADALOG rules, for instance, are part
of the intensional component of an invoice KG and encode (a simplified version of) the
domain knowledge to detect off-the-books slush funds, set up via false invoices:
                                 Transaction(𝑥, 𝑦, 𝑜) → ∃𝑧 Invoice(𝑧, 𝑦, 𝑥, 𝑜),       (1)
          Invoice(𝑧, 𝑦, 𝑥, 𝑜), ¬Transaction(𝑥, 𝑦, 𝑜) → PotentialSlushFund(𝑦)          (2)
    Normally, whenever there is a transaction from a subject 𝑥 to 𝑦 for a product/service
𝑜, then 𝑦 issues an invoice 𝑧 to 𝑥 for 𝑜 (Rule (1)). Yet, if we have an invoice for 𝑜, but
no transaction from 𝑥 to 𝑦 exists, likely 𝑦 is creating a slush fund (Rule (2)). This is
an elementary case, representing a typical domain knowledge snippet. We shall see a
full-fledged intensional component for a real AML case in Section 3.
    Modern KRR languages allow for general and robust rules that capture both seen
and unseen money laundering patterns. When rules embed parametric machine learning
models, such parameters can in turn be inferred from data. On the other hand, learn-
ing rules from data can sensitively expedite ordinary rule design process. The learning
bus in Figure 1 denotes such a hybrid deductive/inductive approach. AML tasks de-
scribed in Section 2.1 should be modeled as reasoning tasks on the AML KG and used
to develop value added services or APIs to support the various business processes of
intermediaries and FIUs. In particular, we envisage the development of KG-enhanced
AML applications that rely on the outcome of Tasks (1-2) for crucial decisions. For
example, tools for STR workflow processing at intermediaries should query the AML
KG for Task (1) in order to finalize transactions and decide, according to regulatory or
custom criteria, whether to file STRs to their local FIUs. Symmetrically, FIUs, upon
receiving STRs from intermediaries, should trigger Task (1) to perform the follow-up
assessments. Task (2) should be used by FIUs when specific cases need to be pursued,
for autonomous investigations or instances issued by the enforcement authorities.

3   Reasoning on a Real Money Laundering Case
In the daily investigation duties of a FIU, deciding on the suspiciousness of an STR is
a prominent activity: The Italian FIU handles ∼ 100𝐾 such cases per year, for which
timely and explainable decisions are mandatory [20]. This causes Task (1) to emerge in
terms of practical utility, with the need to develop effective decision heuristics. Hence,
suspiciousness scoring is of strategic relevance among derived sub-tasks in Section 2.1.
Let us now see how it can be carried out with a rule-based approach by analyzing a real
pattern of money laundering cases, anonymized and stylized.
Description of the Data. The exten-
sional component of the KG is in Fig-
ure 2. Solid black edges and nodes rep-
resent the EDB. In this case 14 compa-
nies, 4 financial intermediaries, and 4 in-
dividuals are involved. Depending on the
case, the reasoning process may need to
explore millions of entities in a KG with
∼ 22M nodes and ∼ 25.5M links.
Description of the Case. Dashed edges
and nodes in Figure 2 exemplify the de-
                                                     Fig. 2: An example of AML KG.
rived extensional component. The case is
triggered by an STR 𝑠 (the red edge in
Figure 2), reporting a loan instance from an individual 𝑥 having a criminal record (the
red node), to Acme Bank. Our goal is to score and explain suspiciousness of 𝑠). We
start from the search for a typical laundering pattern: A person 𝑥 who is issuing a loan
request to a bank 𝑏 of which he/she is the ultimate beneficial owner (UBO), may intend
to launder unclean money via the bank.
    A UBO is a person who ultimately, i.e., directly or via his/her undertakings, owns
or controls a given entity (Acme Bank, in our case) on whose behalf a transaction (the
loan) is being conducted [18]. In our case, if 𝑥 were a UBO of Acme Bank (𝑏), 𝑥 might
be requesting a fake loan to a bank he/she exerts control upon, with the likely intent to
justify money illegally gained. What typically happens is that perpetrators also try to
conceal their ultimate ownership. As we shall see, KG reasoning allows to go beyond
the literal recognition of the pattern and detects the suspicious behaviour in a general
sense, overcoming ownership concealment.
The following set of VADALOG rules (explained in detail in the next paragraphs) encode
the intensional component representing the domain we have just described:
                             𝑝 :: Person(𝑖1 , 𝑓11 , . . . , 𝑓𝑛1 ), Person(𝑖 2 , 𝑓12 , . . . , 𝑓𝑛2 ),
                      𝑝 = #𝑠𝑖𝑚( 𝑓11 , . . . , 𝑓𝑛1 , 𝑓12 , . . . , 𝑓𝑛2 ) → Spouses(𝑖1 , 𝑖2 ).            (1)
                                     Person(𝑖, . . .) → ∃ 𝑓 Family( 𝑓 ), In(𝑖, 𝑓 ).                     (2)
                              Spouses(𝑖1 , 𝑖2 ), In(𝑖 1 , 𝑓1 ), In(𝑖2 , 𝑓2 ) → 𝑓1 = 𝑓2 .                (3)
                                                                           Control(𝑥, 𝑥).               (4)
                                      Own(𝑥, 𝑦, 𝑤), 𝑤 > 0.5 → Control(𝑥, 𝑦).                            (5)
          Control(𝑥, 𝑧), Own(𝑧, 𝑦, 𝑤), msum(𝑤, h𝑧i) > 0.5 → Control(𝑥, 𝑦).                              (6)
                                                 IsCeoAt(𝑥, 𝑦) → Own(𝑥, 𝑦, 1.0).                        (7)
            Own(𝑥, 𝑧, 𝑤 1 ), Own(𝑧, 𝑦, 𝑤 2 ) → Own(𝑥, 𝑦, msum(𝑤 1 · 𝑤 2 , h𝑧i)).                        (8)
                 Own(𝑥, 𝑦, 𝑤), In(𝑥, 𝑓 ), 𝑗 = msum(𝑤, h𝑥i) → Own( 𝑓 , 𝑦, 𝑗).                            (9)
         𝑤 :: STR(𝑠, 𝑏, 𝑥), Loan(𝑠), In(𝑥, 𝑓 ), Control( 𝑓 , 𝑏) → Suspicious(𝑠).                       (10)

Control. Represented in Rule (10), the suspiciousness degree of 𝑠 depends on the prob-
ability of 𝑥 controlling (Control) 𝑏. To this end, Rules (4-6) define control with a
broadly accepted formulation, also present in logic programming contexts [10]:
A company (or a person/family) 𝑥 controls a company 𝑦, if: (i) 𝑥 directly owns more than
50% of 𝑦 (Rule (5)); or, (ii) 𝑥 controls a set of companies that jointly (i.e., summing the
share amounts), and possibly together with 𝑥, own more than 50% of 𝑦 (Rule (6)).
Moreover, Rule (7) extends the notion of control with the simplifying assumption that
the CEO of a company has full control over it. Rule (8) accumulates direct and indirect
ownerships that 𝑥 exerts on 𝑦, along all possible ownership paths. In our case, individual
𝑥 does not control Acme Bank: 𝑥 is actually concealing control through his/her family.
Family Relationships. Let us consider Rules (1-3). Rule (2) states that every individual
belongs to a family, his/her own. Rule (3) merges families 𝑓1 and 𝑓2 whenever they
contain two spouses, 𝑖 1 and 𝑖 2 . The overall effect is clustering the person space.
     Family relationships are detected by Rule (1). It contains a specialized machine
learning model for link prediction (denoted by the #sim embedded function). It takes
as input the features of 𝑖 1 and 𝑖 2 and returns a score 𝑝 measuring how likely 𝑖1 and
𝑖 2 are spouses. Rule (1) produces Spouses facts with a probability depending on 𝑝.
In our case, let 𝑃1 (in Figure 2) be the suspicious individual’s partner. The family also
contains 𝑃2 , 𝑃3 and potentially more people. Knowing the family members, we can de-
termine the overall relationship of 𝑓 with Acme Bank. To this aim, Rules (9) aggregates
ownership amounts originating from different family members.
Overall Pattern. 𝑃2 directly owns 0.34 of My Bank and 𝑃1 indirectly owns 0.21 =
1 × 0.93 × 0.23 of My Bank (by Rule (8)). In total, 𝑓 controls My Bank owning 0.55 of
the shares. My Bank, in turn, controls Acme Bank holding with 0.52 of the shares via a
pyramidal shareholding structure, probably set up to obfuscate the connection between
the two companies. In conclusion, family 𝑓 controls Acme Bank.
    Now we have all the ingredients to apply Rule (10). While 𝑥 is not literally the UBO
of Acme Bank, we have that his/her family as a whole is. Therefore we conclude there
is actual suspicion 𝑥 is perpetrating money laundering by justifying unclean money with
a fake loan from 𝑏, a bank he/she controls (and therefore can force to issue the loan).
The overall confidence in this conclusion depends on the certainty 𝑝 in the existence of
the personal relationship — the output of a link prediction model — as well as on the
intrinsic reliability 𝑤 of the money laundering pattern.
Discussion of Real and Artificial Cases, and Performance. We tested this kind of
tasks in a VADALOG system instance running on a memory-optimized virtual machine
with 16 cores and 64 GB RAM (Intel Xeon architecture). The evaluation of one single
case in the real Italian company KG requires ∼ 420 seconds, averaged over 100 runs.
Elapsed times are compatible with production use of the solution. A massive execution
on the real KG, in search of all the cases respecting the same laundering pattern success-
fully individuated 1365 cases having a suspiciousness score of at least 0.8, in 1600 sec-
onds. Evaluating accuracy of the specific suspiciousness scoring model adopted in this
example is out of the scope of our discussion. Assessing accuracy of STR scores is con-
troversial and usually built as a comparison against the human analyst’s performance.
Nevertheless, as common in unsupervised settings, automated decisions often improve
human performance, finding new unseen patterns or defusing others. This makes accu-
racy evaluation even more challenging and matter of dedicated studies.
    The core complexity element affecting reasoning times lies in the number of “own-
ership paths” connecting companies and individuals. In order to assess the scalability
of our approach, we developed a graph generator able to build networks with the same
topological characteristics as the real KG: It adopts a variant of the Barabási–Albert
model [2] for directed scale-free networks, with parameters fitted from the real KG. In
the generation of the graph, the attachment likelihood tuning has been used to generate
instances of increasing density, while keeping the other characteristics stable. Quantita-
tive features have been generated by sampling from appropriate distributions, also fitted
from the real KG (e.g., a Beta distribution for company shares). With this tooling, we
have built 8 artificial cases showing that ownership paths can be evaluated in less than
20 seconds for graphs with 1M nodes and density similar to the real one. For graphs
much denser than the real-world financial networks, elapsed time grows to ∼ 3000
seconds for the same number of nodes. Execution time is polynomially affected by the
number of nodes, e.g., with 4600 seconds for a ∼ 28M nodes graph (much larger than
the real one) having realistic density. In conclusion, the solution is highly scalable and
ready to support even multi-national KGs, e.g., at European level, which are comparable
in number of nodes yet have dramatically lower density than our synthetic cases.

4    Challenges and Opportunities for Rule-Based AML
Let us now extend our discussion and deal with what we consider the most relevant
research, technological challenges and opportunities for a rule-based view of AML.
Distributed Reasoning. End-to-end AML processes involve the cooperation of differ-
ent FIUs and intermediaries (e.g., within the European FIU.net4 ). Processes should be
designed as distributed reasoning tasks: At FIU level, an overall scheduling of AML
activities based on STR and predicate offence scores could be performed, so as to con-
centrate computational resources only (or first) on the most relevant cases. FIUs and in-
termediaries (and law enforcement agencies in the longer term) would cooperate in the

    4 https://ec.europa.eu/home-affairs/e-library/glossary/fiunet en
execution of Tasks (1-2) to converge to AML decisions with a global/local perspective.
In particular, reasoning tasks should span multiple KGs, deployed at and maintained
by the different actors. Reasoning should be performed as coordinated sub-tasks, each
local to its respective data shard. For example: an intermediary could trigger suspicion
assessment before issuing a transaction; then, this process, would in turn activate the
respective suspicion assessment and suspicious offence determination on the FIU side.
Trust, Information Sharing and Data Integration. Distributed reasoning would sus-
tain trust thanks to a form of reasoning locality with shared processes and intermediary-
/FIU-level EDBs. This would overcome, for example, the need for the FIUs to retrieve
and store transaction data from the intermediaries or to share privileged intelligence in-
formation with one another. On the other hand, the approach would encourage actors to
share IDBs, e.g., in the form of AML criteria or patterns, so fostering standardization
and reproducibility of checks. Moreover KGs should contribute to the construction of
an integrated information asset for intermediaries and FIUs, to standardize compliance
checks and contrast the proliferation of diverse implementations of the various tasks.
Data Wrangling: Building the EDBs. Data wrangling is the construction of the EDBs
from the various sources involved in AML such as transaction data, enterprise stores,
business registers, KYC profiles, and requires the solution to non-trivial data manage-
ment problems such as entity resolution [12], data fusion [15], natural language pro-
cessing algorithms for unstructured sources [26], etc. These tasks can be specified in
the form of reasoning rules in VADALOG, a very effective lingua franca for data wran-
gling [22,29], with a a solid history in the data management community [17,23].
Computing Analytics. For tactical and strategic planning of AML activities, the avail-
ability of aggregate analytics is fundamental. VADALOG benefits from recent exten-
sions to recursive logic formalisms, namely monotonic aggregations in Datalog that
support queries and reasoning about the number of distinct variable assignments satis-
fying specific goals and conjunction of goals [35]. It is our experience that typical AML
aggregate indicators can be expressed by such formalisms, which support scalable im-
plementations with limited memory footprint.
Finding seen and unseen Money Laundering Patterns. Graph database technology
lacks sufficient expressive power to capture many known money laundering patterns
as, e.g., RPQ-based languages [9] do not handle full recursion; also, as they do not
support ontological reasoning, patterns can be matched only on a case-by-case basis,
with an unaffordable proliferation of queries. KGs represent a very good opportunity to
industrialize pattern detection, which should be modeled in the form of reasoning rules
of the intensional component. VADALOG is a good choice to represent any pattern,
thanks to high expressive power, scalability and full explainability.
    In order to keep up with the evolving pace of financial crime, we need to rely on
self-adapting reasoning rules, robust and able to generalize unseen patterns as well.
VADALOG copes with these requirements. First, the support for intensional predicates,
existential quantification and, more generally, full ontological reasoning enables auto-
matic interaction of seemingly unrelated domain areas to detect unforeseen illicit situa-
tions. Second, the possibility to embed machine learning models allows to detect fuzzy
patterns and limit the proliferation of special cases. Reasoning can also be used to im-
plement scalable and explainable clustering to group the entities (e.g., the suspicious
transactions) according to their features or topological role in their network. Vice versa,
reasoning can be operated inside clusters calculated with standard techniques (e.g., k-
means) to perform fine-grained comparisons. Another promising technique for unseen
patterns is link analysis [11], consisting in establishing connections between entities
(individuals, transactions, etc.) and using them to assess suspiciousness.
Usability and Effectiveness. VADALOG embodies many usability characteristics [24]:
plainess, as it is based on the basic Datalog syntax; simplicity, as facts can be seen as
database tuples; modularity, as rules do not rely on any form of compilation dependency
or procedural ordering. Practical adoption in AML shows VADALOG to be a very com-
pact formalism, able to encode in tens of rules, thousands of lines of procedural code.
The semantics, highly based on first-order logic, is intuitive also for non-IT domain
experts and the use of recursion is easily grasped as a form of inductive definitions.
We observed high appreciation for the presence of inductive definitions involving exis-
tentials and aggregations, traditionally absent in domain-specific languages of financial
and statistical realms. Although the development of complex cases may require the sup-
port of “VADALOG engineers”, the language sparks users’ interest in understanding core
business details, with reduced IT costs, especially w.r.t. non-declarative technology.
Scalability. VADALOG offers a transparent and safe approach to scalability. When only
core features are used (we are in the Warded Datalog± core), the language guarantees
polynomial complexity, suitable to the majority of AML use cases. For ultra-scale sce-
narios, VADALOG trades minor syntactic restrictions (rules are in the Piecewise Linear
Warded Datalog± [8] fragment) for high parallelizability. The adoption of the full range
of advanced features, may come at the cost of higher complexity.
Handling Uncertainty. Mutually connected levels of uncertainty are involved in AML
reasoning to define the similarity of patterns, the suspiciousness level of a transaction or
the likelihood of a predicate offence, etc. VADALOG can handle uncertainty primarily
with probabilistic reasoning [6]: Weights can be used to specify importance of rules;
specifically, they are parameters of log-linear models defining the marginal probabil-
ity of the resulting facts. Embedded machine learning models, e.g., for graph embed-
dings [25], are also supported and can provide input facts to or receive training data
from the rules (see learning bus in Figure 1) in hybrid deductive/inductive reasoning.
Taking Ethical Decisions. Dealing with AML in a global perspective poses non-negli-
gible ethical and legal issues in that one individual’s subnetwork should be inspected
only as a consequence of explicit suspicion. By contrast, some techniques (e.g., social
network analysis [13,34]) move from emerging statistical evidence and then focus on
specific subjects as a consequence. This behaviour can be considered intrusive mis-
conduct for a FIU and rarely produces arguable evidence in judicial follow-ups. AML
approaches based on reasoning on KGs overcome these difficulties by operating in a
query driven fashion [7]: The traversal of the KG is driven by specific goal of the AML
task and the portion of the graph that is actually analyzed is the minimal expansion of
the suspicious individual/bank/transaction vicinity.

5   Related Work

Logic-based KGs and VADALOG are fully covered in [5,7]; recent works already delve
into practical applications [1] and KG architecture [4]. An interesting recent collection
of KG use cases, under diverse perspectives, is provided by Toma et al. [19].
    The most comprehensive survey on AI approaches to AML can be found in Chen et
et al. [11], where they focus on machine learning techniques, including supervised [34]
and unsupervised [30] algorithms, graph deep learning [37] and NLP [26]. Most of
machine learning approaches proposed in the AML literature suffer from some limi-
tations [11]: classification exhibits affordable performance only for medium datasets
and suffer from slow training; approaches based on fuzzy logic capture specific statis-
tical models and require intensive human work for rule writing [11]; neural approaches
suffer from low explainability, which makes them hardly applicable for a FIU. Such
patterns need to be carefully balanced with inductively learned patterns.
    Finally, the related field is social network analysis on money laundering data has
the primary objective of computing emerging statistical properties that are specific mea-
sures referring to the network as a whole [13,34].

6    Conclusion
Money laundering is a significant financial risk for intermediaries and a severe threat for
economies, governments and ultimately for personal wealth and freedom. The potpourri
of rising AI techniques for AML too often neglect domain experience and fail to address
requirements of relevance to FIUs. With this paper, we wish to share our experience in
making tangible actions to combat money laundering with AI: We aim to systematize
and stimulate the research debate by offering a unifying framework for explainable
AML. Logic-based KGs are the right means towards a holistic approach, balancing the
power of the inductive techniques, with the brilliancy of top-level financial analysis.
Acknowledgements. This work was supported by EPSRC programme grant EP/M025268/1, the
EU H2020 grant 809965, and the Vienna Science and Technology (WWTF) grant VRG18-013.

References
 1. Atzeni, P., Bellomarini, L., Iezzi, M., Sallinger, E., Vlad, A.: Weaving enterprise knowledge
    graphs: The case of company ownership graphs. In: EDBT (2020)
 2. Barabasi, A.L., Albert, R.: Emergence of scaling in random networks. Science (New York,
    N.Y.) 286, 509–12 (11 1999)
 3. Basel Institute on Governance: AML Index 2018. https://bit.ly/2Yd3ray (2019), [Online;
    accessed 17-Jan-2020]
 4. Bellomarini, L., Fakhoury, D., Gottlob, G., Sallinger, E.: Knowledge graphs and enterprise
    AI: the promise of an enabling technology. In: ICDE. pp. 26–37. IEEE (2019)
 5. Bellomarini, L., Gottlob, G., Pieris, A., Sallinger, E.: Swift logic for big data and knowledge
    graphs. In: IJCAI (2017)
 6. Bellomarini, L., Laurenza, E., Sallinger, E., Sherkhonov, E.: Reasoning under Uncertainty in
    Knowledge Graphs (to appear). In: RuleML+RR (2020)
 7. Bellomarini, L., Sallinger, E., Gottlob, G.: The vadalog system: Datalog-based reasoning for
    knowledge graphs. PVLDB 11(9), 975–987 (2018)
 8. Berger, G., Gottlob, G., Pieris, A., Sallinger, E.: The space-efficient core of vadalog. In:
    PODS. pp. 270–284. ACM (2019)
 9. Bonifati, A., Fletcher, G.H.L., Voigt, H., Yakovets, N.: Querying Graphs. Synthesis Lectures
    on Data Management, Morgan & Claypool (2018)
10. Ceri, S., Gottlob, G., Tanca, L.: Logic programming and databases. Springer (2012)
11. Chen, Z., Khoa, L.D.V., Teoh, E.N., Nazir, A., Karuppiah, E.K., Lam, K.S.: Machine learning
    techniques for AML solutions in STR detection: a review. K. Inf. Syst. 57(2), 245–285 (2018)
12. Christen, P.: Data Matching Entity Resolution, and Duplicate Detection. Springer (2012)
13. Colladon, A.F., Remondi, E.: Using social network analysis to prevent money laundering.
    Expert Syst. Appl. 67, 49–58 (2017)
14. Dias, L.F.C., Parreiras, F.S.: Comparing data mining techniques for anti-money laundering.
    In: SBSI. pp. 73:1–73:8. ACM (2019)
15. Dong, X.L., Naumann, F.: Data fusion - resolving data conflicts for integration. PVLDB 2(2),
    1654–1655 (2009)
16. European Parliament: Directive (eu) 2018/843 of the european parliament and of the council.
    https://bit.ly/3gWZYFX (2018), [Online; accessed 17-Jan-2020]
17. Fagin, R., Kolaitis, P., Miller, R., Popa, L.: Data exchange: Semantics and query answering.
    In: ICDT (2003)
18. FATF: Transparency and Beneficial Ownership. https://bit.ly/2UklJWj (2016), [Online; ac-
    cessed 17-Jan-2020]
19. Fensel, D., Simsek, U., Angele, K., Huaman, E., Kärle, E., Panasiuk, O., Toma, I., Wahler,
    J.U..A.: Knowledge Graphs - Methodology, Tools and Selected Use Cases. Springer (2020)
20. Financial Intelligence Unit for Italy: Rapporto annuale 2018. https://bit.ly/3fwYH6W
    (2019), [Online; accessed 19-Jun-2020]
21. Fincen: History of AML Laws. https://bit.ly/2AMlNau (2015), [Online; accessed 17-Jan-
    2020]
22. Furche, T., Gottlob, G., Neumayr, B., Sallinger, E.: Data wrangling for big data: Towards a
    lingua franca for data wrangling. In: AMW (2016)
23. Golshan, B., Halevy, A.Y., Mihaila, G.A., Tan, W.: Data integration: After the teenage years.
    In: PODS. pp. 101–106. ACM (2017)
24. Gottlob, G., Pieris, A.: Beyond SPARQL under OWL 2 QL entailment regime: Rules to the
    rescue. In: IJCAI. pp. 2999–3007 (2015)
25. Grover, A., Leskovec, J.: node2vec: Scalable feature learning for networks. In: KDD (2016)
26. Han, J., Barman, U., Hayes, J., Du, J., Burgin, E., Wan, D.: Nextgen AML: distributed deep
    learning based language technologies to augment aml investigation. In: ACL (2018)
27. Huang, S.S., Green, T.J., Loo, B.T.: Datalog and emerging applications. In: SIGMOD (2011)
28. International Monetary Fund: World economic outlook, april 2019. https://bit.ly/3cKyuzL
    (2019), [Online; accessed 17-Jan-2020]
29. Konstantinou, N., Abel, E., Bellomarini, L., Bogatu, A., Civili, C., Irfanie, E., Koehler, M.,
    Mazilu, L., Sallinger, E., Fernandes, A., Gottlob, G., Keane, J.A., Paton, N.W.: VADA: an
    architecture for end user informed data preparation. J. Big Data 6, 74 (2019)
30. Larik, A.S., Haider, S.: Clustering based anomalous transaction reporting. In: WCIT. vol. 3,
    pp. 606–610. Elsevier (2011)
31. Oeben, M., Goudsmit, J., Marchiori, E.: Prerequisites and ai challenges for model-based
    anti-money laundering. In: IJCAI 2019 Workshop (2019)
32. Raedt, L.D., Kersting, K., Natarajan, S., Poole, D.: Statistical Relational Artificial Intelli-
    gence: Logic, Probability, and Computation. Synthesis Lectures on AI and machine learning,
    Morgan & Claypool (2016)
33. Refinitiv: Innovation and fight against financial crime. https://www.refinitiv.com/en (2019),
    [Online; accessed 17-Jan-2020]
34. Savage, D., Wang, Q., Zhang, X., Chou, P., Yu, X.: Detection of money laundering groups:
    Supervised learning on small networks. AAAI Workshops (2017)
35. Shkapsky, A., Yang, M., Zaniolo, C.: Optimizing recursive queries with monotonic aggre-
    gates in deals. In: ICDE. pp. 867–878 (2015)
36. Weber, M., Chen, J., Suzumura, T., Pareja, A., Ma, T., Kanezashi, H., Kaler, T., Schardl,
    C.E.L..T.B.: Scalable graph learning for aml: A first look. CoRR abs/1812.00076 (2018)
37. Weber, M., Domeniconi, G., Chen, J., Weidele, D.K.I., Bellei, C., Robinson, T., Leiserson,
    C.E.: Anti-money laundering in bitcoin: Experimenting with graph convolutional networks
    for financial forensics. CoRR abs/1908.02591 (2019)

</pre>