Application of the SDL Library to Reveal Legal Sanctions
 for Crime Perpetrators in Selected Economic Crimes:
   Fraudulent Disbursement and Money Laundering

               Jaroslaw Bak, Maciej Falkowski and Czeslaw Jedrzejek
                     Institute of Control and Information Engineering,
                             Poznan University of Technology,
                   M. Sklodowskiej-Curie Sqr. 5, 60-965 Poznan, Poland
                            {firstname.lastname}@put.poznan.pl


      Abstract. As a part of the PPBW (Polish Platform for Homeland Security), we
      have developed tools that help investigators and prosecutors to conduct
      investigations of financial crimes and manage the data gathered. The most
      important task is to transform data into knowledge that allows for classification
      of illegal activities and assignment of sanctions based on the roles of people in
      companies. In this demo we present an application of the Semantic Data
      Library (SDL) to the problem of gathering, managing, querying and interpreting
      data relevant to building the evidence necessary for an indictment. The SDL
      uses and integrates a rule engine, a relational database, an ontology and a set of
      rules specific to a given crime typology. This combination allows querying and
      inferring a crime scheme and identifies possible charges; in particular, it
      discovers crime activities and roles (of particular types of owners, managers,
      directors and chairs) using concepts, appropriate relations and rules. We present
      results achieved with SDL and an ontology, called the ‘minimal model,’ of a
      fraudulent disbursement committed by management, accompanied by money
      laundering. Prospects for future development of the SDL tool are also presented.

      Keywords. SDL library, reasoning, minimal model ontology, financial crime,
      penal code


1    Introduction

     Economic crimes, such as VAT fraud, are among the most costly and hardest to
trace. The level of complicity and the number of proxy and scam companies involved
make it difficult to identify crime schemes. Another issue is to properly formulate an
indictment, based on evidence, that identifies in particular the roles and activities of
the members of a crime group. In previous work [1, 2] we presented a model of a
fraudulent disbursement crime, a subset of asset misappropriation crimes. In a 2009
survey [3], asset misappropriations constituted two-thirds of all economic crimes.
These are often accompanied by money laundering schemes.


      In this demo we apply our approach to fraudulent disbursement combined with
money laundering [4]. The demo is based on the accompanying work [2]. The
sophistication of our model lies less in the ontology than in an appropriate set of rules.
In the demo we present an application of the rule-based system, originally called AFIZ
(Analyzer of Facts and Relations), to crime scheme analysis. Our system comprises
the SDL library [5], a relational database, an ontology and a set of rules. The
relational database contains data gathered during investigation and relevant to the
potential charges. The ontology provides concepts (classes) and relations to which
relational data is mapped and a hierarchy of concepts and relations. Rules express
dependencies and domain knowledge that allows querying about crime members’
activities from a legal point of view. In the demo we do not stress completeness but
simplicity. Therefore, it is based on a single fraudulent disbursement typology (the
Hydra case) and core data related to this case. The rest of the data simulate variants of
this crime by stochastically changing the values of the most important attributes. For
some of these parameters no crime occurs, but generally various possible variants of
persons’ criminal activities are selected. The system goal is to detect a crime
occurrence and bring proper charges based on people’s activities. Despite handling a
restricted class of crime, options of the crime case exhibit richness that is sufficient
from a legal point of view. As seen from the demo, instantiating the ontology allows
querying of various aspects of the crime.
      Surprisingly, the system performs better than an average prosecutor on charge
assignment. We will elaborate on the details of this statement in future work. This demo
aims to show that the system is practical and can be of great help if extended to a
larger set of crimes. Since the analytic capability of institutions such as the Police or
the Prosecutor’s Office in Poland is limited, the system has to hide the complexity of
data structures and reasoning engines and expose friendly interfaces.
      The paper is organized as follows. Section 2 presents the SDL architecture and
functionalities. Section 3 describes a Hydra case which exemplifies an important type
of fraudulent disbursement scheme and the Hydra-case-like simulated input data
generation. In Section 4 we execute selected queries according to the minimal model
ontology. Conclusions and future work are presented in Section 5.


2    SDL Architecture and Functionalities

     SDL integrates ontologies, relational data and rules that represent domain
knowledge. The architecture of this system is presented in Figure 1. The central part,
which gathers input from other system elements and processes rules, consists of two
Jess engines [10] used for forward and backward chaining.
     The set of functionalities allows the SDL library to answer queries to the
relational database using ontology and rules. The tool uses hybrid reasoning (forward
and backward) for query execution. The backward method is responsible for
gathering data from the relational database and the forward chaining is used to answer
a given query. The Minimal Model ontology that conceptualizes financial crimes
(presented in [2] and on the demo page) is expressed in OWL-DL [6].

                   Fig. 1. The architecture of the Semantic Data Library.

Before it can be used by the Jess engine, it has to be transformed into a set of rules. We adopt an
approach that first calculates hierarchies of concepts and relations and then transforms
these hierarchies into a set of rules. Part of the ontological knowledge is lost during
this process, but what is retained is sufficient for our purposes, and after adding rules
the model remains decidable. A few chosen OWL properties are transformed into Jess
rules as well, e.g., owl:SymmetricProperty. We use Pellet [7] to calculate all the
ontological dependencies and the taxonomy. In the next step, the Pellet output (the
computed ontology) is transformed into a set of Jess-format rules, which is done by an
SDL-API function.
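
The transformation of computed hierarchies into rules can be sketched as follows. This is a minimal illustration in Python, not the SDL-API: the function names, the "triple" fact template and the superclass name Manager are assumptions made for the example, while MiddleLevelManager, Director and inComplicityWith are names used in this paper. The generated text follows the general shape of a Jess defrule, but it has not been checked against the rules that SDL actually produces.

```python
# Illustrative sketch only: the "triple" template, the rule names and the
# superclass "Manager" are assumptions, not the actual SDL-API output.

def subclass_rule(sub, sup):
    """Render one subclass edge of the computed taxonomy as a Jess-style rule."""
    return (
        f"(defrule subclass-{sub}-{sup}\n"
        f'  (triple (predicate "rdf:type") (subject ?x) (object "{sub}"))\n'
        f"  =>\n"
        f'  (assert (triple (predicate "rdf:type") (subject ?x) (object "{sup}"))))'
    )


def symmetric_rule(prop):
    """Render an owl:SymmetricProperty declaration as a Jess-style rule."""
    return (
        f"(defrule symmetric-{prop}\n"
        f'  (triple (predicate "{prop}") (subject ?x) (object ?y))\n'
        f"  =>\n"
        f'  (assert (triple (predicate "{prop}") (subject ?y) (object ?x))))'
    )


def hierarchy_to_rules(subclass_edges, symmetric_props):
    """Turn a taxonomy (list of (sub, super) pairs) and a list of symmetric
    properties into a flat list of rule strings."""
    rules = [subclass_rule(sub, sup) for sub, sup in subclass_edges]
    rules += [symmetric_rule(prop) for prop in symmetric_props]
    return rules


# Hypothetical fragment of a taxonomy, as a reasoner might report it.
edges = [("MiddleLevelManager", "Manager"), ("Director", "Manager")]
rules = hierarchy_to_rules(edges, ["inComplicityWith"])
print("\n\n".join(rules))
```

Each subclass edge becomes one rule that propagates class membership upward, which is why only the hierarchy, and not every OWL axiom, survives the translation.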
     Another set of rules is the domain knowledge set, which is expressed in the SWRL
language [8]. SWRL extends the expressivity of OWL, supporting the use of ontology
axioms in rules. We also use SWRL built-ins (SWRLB) [9] to extend SWRL with additional
functions. We use these rules to infer new facts in the knowledge base; in our case,
facts about connections between persons, documents, money transfers and legal sanctions.
These two sets (the rules derived from the ontology and the domain knowledge rules)
constitute the deduction rules.
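
How deduction rules add new facts to a knowledge base can be sketched with a naive forward-chaining loop. The sketch below is in Python rather than Jess or SWRL, and the rule itself is hypothetical: the class FalsifiedDocument, the property signed and the rule body are invented for illustration; only the property inComplicityWith is taken from the minimal model.

```python
from itertools import permutations

# Facts are (subject, predicate, object) triples.


def rule_complicity(facts):
    """Hypothetical domain rule: two different persons who signed the same
    falsified document are in complicity with each other."""
    falsified = {s for (s, p, o) in facts
                 if p == "rdf:type" and o == "FalsifiedDocument"}
    signers = {}
    for s, p, o in facts:
        if p == "signed" and o in falsified:
            signers.setdefault(o, set()).add(s)
    derived = set()
    for people in signers.values():
        for a, b in permutations(people, 2):
            derived.add((a, "inComplicityWith", b))
    return derived


def forward_chain(facts, rules):
    """Apply all rules repeatedly until no new fact is derived (fixpoint)."""
    known = set(facts)
    while True:
        new = set()
        for rule in rules:
            new |= rule(known) - known
        if not new:
            return known
        known |= new


facts = {
    ("d1", "rdf:type", "FalsifiedDocument"),
    ("p1", "signed", "d1"),
    ("p2", "signed", "d1"),
    ("p3", "signed", "d2"),  # d2 is not known to be falsified
}
closure = forward_chain(facts, [rule_complicity])
for triple in sorted(closure - facts):
    print(triple)
```

A production engine such as Jess avoids re-evaluating every rule on every pass; the loop above only shows what "inferring new facts" means.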
     There is one more set of rules – production rules; these are used to map between
ontology axioms (properties and classes) and data stored in a relational database.
These are defined in Jess-specific language [10] and their creation is supported by the
SDL-GUI. They map some of the axioms (“essential” axioms) to appropriate SQL
queries. “Essential” means that the instance of this axiom cannot be obtained from the
taxonomy or rules, only directly from a database. Usually, “essential” axioms are
lowest level taxonomy axioms.
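
The idea of mapping "essential" axioms to SQL queries can be sketched as follows. In SDL the mappings are Jess rules created with the SDL-GUI; the Python sketch below only mirrors the idea. The table employees, its columns and the property names worksFor and hasPosition are hypothetical and do not come from the demo database.

```python
import sqlite3

# Hypothetical mapping table: each "essential" property is tied to one SQL
# query that returns (subject, object) pairs for that property.
MAPPINGS = {
    "worksFor": "SELECT person_id, company_id FROM employees ORDER BY person_id",
    "hasPosition": "SELECT person_id, position FROM employees ORDER BY person_id",
}


def fetch_triples(conn, predicate):
    """Run the SQL query mapped to `predicate` and return the rows as triples."""
    sql = MAPPINGS[predicate]
    return [(str(s), predicate, str(o)) for s, o in conn.execute(sql)]


# Small in-memory database standing in for the case database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (person_id INTEGER, company_id INTEGER, position TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, 10, "Director"), (2, 10, "MiddleLevelManager")],
)

works_for = fetch_triples(conn, "worksFor")
positions = fetch_triples(conn, "hasPosition")
print(works_for)
print(positions)
```

Facts for non-essential axioms are never fetched this way; they follow from the taxonomy or the deduction rules once the essential facts are loaded.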


3     The Hydra Case

3.1     The Hydra case as an example of a fraudulent disbursement scheme

    Our so-called minimal model ontology is based on a detailed
analysis of descriptions, indictments and sentences of around 10 criminal cases. The
cleanest case of fraudulent disbursement is the so-called Hydra case.
    In the Hydra case, the Chief Executive Officer (CEO) of company A (Hydra)
subcontracted construction work. The work was then consecutively subcontracted
through a chain of phony companies B, C, and D (Hermes, Dex, Mobex). Each
company received a commission for money laundering and falsified documents
stating that the contracted work had been done. In fact, company A itself did what
was identified as “subcontracted construction work”.
     At the end of the chain, the owner of a single-person company D attempted to
withdraw cash, and there was a suspicion that this cash would reach the management
of company A “under the table”. The crime scheme of the Hydra case is presented on
the demo description site (http://150.254.41.181/).
     A definition of the minimal model in application to financial crimes, expressed in
OWL language using the editor Protégé 4.0, is presented in [11]. This ontology has a
modular structure and contains the modules listed in the accompanying paper [2]. The
ontology is added to the demo material.
     It is important to correctly model the sequence of activities in the company
structure that lead to decisions and transactions. We illustrate this with the example of
a three-level structure of authorization (this is easy to generalize to more levels, but
the intent is to keep it compatible with the Hydra case). The chain of activities is the
following: in the Hydra case, acceptance of construction work done by B at a given
site is first signed by a manager in A responsible for work supervision at this site
(MiddleLevelManager); this is followed by the signature of a higher-level manager,
a Director of the company responsible for supervision of all sites. A Director may be
authorized to accept invoices and order a payment; technically this is fraud and, in
the case of Hydra, was done by a written authorization on the back of the invoice. The
role of the Principal (the top level of authorization, which, however, might not have
been exercised) was analyzed in detail in [1], where we modeled all possible options
of the Principal’s behavior.
     The Principal might not have known that the work had not been done. However,
he was the one who signed the subcontracting contract and thus could be
implicated. Had the Principal of company A been the person who, on the basis of the
work acceptance document, ordered the payment from A to B upon issuance of an
invoice by B, he would be directly implicated.

3.2    Generation of the Hydra-case-like simulated input data

       For a practical demonstration of the minimal model ontology, we need data
stored in a relational database. We implemented a generation tool in Java that
generates a relational database, taking the size of a case as parameters: the
number of companies and the number of documents (invoices, work approval documents
and money turnovers). The Hydra case generator generates data concerning:
• Information about employees and their position in a company
• Invoices with all obligatory elements (payer, seller, product, etc.)
• Work approval documents (or the lack of them)
• Signatures on documents
• Goods and services
• Companies and their legal form
• Money turnovers: money transfers, payments and withdrawals
• Legal articles (name, ID and content)
• Information about illicit personal gains and damages to companies (with values)
• Other facts, like who knows about what (Person knowsAbout document); these
  data result from testimonies. These facts are in the form of RDF triples (the table
  contains three columns: subject, predicate, object)

The seed of the generator is constant, so as the number of documents grows
(with the number of companies fixed), the query results (i.e., the cases found
criminal) from a bigger database contain all the results from a smaller database
(and possibly some extra results). The tool generates the Hydra case in four variants
[2] and also generates some obscuring data (other documents, invoices, etc.). Every
generated element has its own identification number. In this manner the data are
connected and internally coherent.
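
The effect of the constant seed can be illustrated with a small sketch. It is written in Python, whereas the actual generator is a Java tool, and the seed value, the field names and the amount range are all hypothetical. The point is only that, with a fixed seed and a fixed number of companies, the documents of a smaller database form a prefix of those of a larger one.

```python
import random

SEED = 2010  # hypothetical constant; the real generator's seed is not given


def generate_documents(n_documents, n_companies=20, seed=SEED):
    """Generate `n_documents` simulated invoices with a fixed random seed.

    Every document consumes the same number of random draws, so a run with
    more documents reproduces the shorter run and then continues.
    """
    rng = random.Random(seed)
    documents = []
    for doc_id in range(1, n_documents + 1):
        documents.append({
            "id": doc_id,                          # unique identification number
            "seller": rng.randrange(n_companies),  # company issuing the invoice
            "payer": rng.randrange(n_companies),   # company paying the invoice
            "amount": rng.randint(1_000, 100_000), # hypothetical value range
        })
    return documents


small = generate_documents(20)
large = generate_documents(100)
print(large[:20] == small)
```

Changing the number of companies would change the draws and break the prefix property, which matches the condition stated above that the number of companies is fixed.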


4    Queries and Query Execution

     To illustrate what questions could be posed to the system, we present the
relevant part of one count of charges in the Polish Penal Code:

“Article 296.
    § 1. Whoever, while under an obligation resulting from provisions of law, a
    decision of a competent authority or a contract to manage the property or
    business of a natural or legal person, or an organizational unit which is not a legal
    person, by exceeding powers granted to him or by failing to perform his duties,
    causes it to suffer considerable material damage, shall be subject to the penalty…
    § 2. If the perpetrator of the offence specified in § 1 acts in order to gain a
    material benefit…
    § 3. If the perpetrator of the offence specified in § 1 or 2 causes significant
    material damage of great extent…
    § 4. If the perpetrator of the offence specified in § 1 or 3 acts unintentionally…”

     A specific and simplifying feature of the Hydra case is that all persons,
possibly including the CEO, knew that they were committing a crime; therefore, the
model does not have concepts and data explaining their intentions. It is extremely
difficult to model and answer predicates like “exceed powers granted to him” (which
could depend, for example, on taking an excessive risk) or “fail to perform his
duties”. A judge has to answer these questions, which pertain to the attributes of a
given crime. At this stage our model does not contain such soft crime attributes. It
contains facts and hard concepts, such as “cause a company to suffer considerable
material damage”. The system is not prepared to answer directly all questions a judge
might ask. However, we can ask questions such as: what are all counts of criminal
activity of a person X, or who are all the persons subject to a count of charge C?
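
To make the role of hard attributes concrete, the following sketch selects paragraphs of Article 296 from boolean facts only. It is a simplified illustration in Python, not the rule set of the minimal model: the class Conduct and its field names are hypothetical, the soft attributes (“exceeding powers”, “failing to perform duties”) are deliberately left out, and § 4 is omitted because intention is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Conduct:
    """Hard attributes of one person's conduct (hypothetical field names)."""
    manages_property: bool        # obliged to manage the property or business
    considerable_damage: bool     # caused considerable material damage
    for_material_benefit: bool    # acted in order to gain a material benefit
    damage_of_great_extent: bool  # damage reached "great extent"


def article_296_paragraphs(conduct: Conduct) -> List[str]:
    """Return candidate paragraphs of Art. 296 supported by the hard facts.

    Simplification: soft attributes and intention (§ 4) are not evaluated.
    """
    if not (conduct.manages_property and conduct.considerable_damage):
        return []
    paragraphs = ["Art. 296 § 1"]
    if conduct.for_material_benefit:
        paragraphs.append("Art. 296 § 2")
    if conduct.damage_of_great_extent:
        paragraphs.append("Art. 296 § 3")
    return paragraphs


director = Conduct(True, True, True, False)
clerk = Conduct(False, True, True, True)
print(article_296_paragraphs(director))
print(article_296_paragraphs(clerk))
```

The output lists candidates only; deciding whether the soft attributes are satisfied remains with the judge, as discussed above.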
     At present, the system does not reason on a partial set of facts. Rather, it assumes
that the set of facts is complete and allows querying for elements of crime attributes.
Users cannot pose their own queries; use is restricted to a set of preselected
queries.


     We have prepared five queries to test different aspects of the query answering
mechanism. Queries were executed with the use of the hybrid reasoning process
(forward and backward chaining). Graphical representation of the queries is presented
on the demo description site.
     The first query contains only variables (without any values) and exploits
hierarchy rules. The second query contains variables and values; it exploits
ontological rules (for the symmetric property inComplicityWith). The third query
contains only variables and exploits hierarchy rules. The fourth and fifth queries
contain variables and values, and exploit various characteristics of the knowledge
base as coded by rules. The last two queries are computationally demanding: the
property fallsUnder needs almost all rules to be fired, because it requires evidence
of why a person falls under a given article. Note that a rule can fire more
than once (if appropriate facts exist in the Jess working memory).
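
The difference between a query with only variables and a query with variables and values can be sketched as pattern matching over triples. The sketch is in Python and works on an already materialized list of facts, so it does not reproduce the hybrid reasoning of SDL. The person IDs and article identifiers are invented; inComplicityWith and fallsUnder are the properties named above.

```python
def is_variable(term):
    """Terms starting with '?' are variables; everything else is a value."""
    return isinstance(term, str) and term.startswith("?")


def unify(pattern, fact, bindings):
    """Extend `bindings` so that `pattern` matches `fact`, or return None."""
    extended = dict(bindings)
    for p, f in zip(pattern, fact):
        if is_variable(p):
            if extended.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return extended


def answer(query, facts):
    """Return all variable bindings that satisfy every pattern in `query`."""
    solutions = [{}]
    for pattern in query:
        next_solutions = []
        for bindings in solutions:
            for fact in facts:
                extended = unify(pattern, fact, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Hypothetical facts; numbers stand for anonymized person IDs.
FACTS = [
    ("101", "inComplicityWith", "102"),
    ("102", "inComplicityWith", "101"),
    ("101", "fallsUnder", "Art296_1"),
    ("102", "fallsUnder", "Art296_1"),
    ("103", "fallsUnder", "Art284"),
]

only_variables = answer([("?person", "fallsUnder", "?article")], FACTS)
with_value = answer(
    [("?person", "inComplicityWith", "101"), ("?person", "fallsUnder", "?article")],
    FACTS,
)
print(only_variables)
print(with_value)
```

A query with a value prunes candidate bindings at the first pattern, whereas a query with only variables has to consider every matching fact.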
     The demo queries were executed on three databases, generated according to
the methodology presented in Section 3.2. These databases differ in the number of
generated documents, monetary values, turnovers, etc. The numbers of companies and
employees are the same in every database (20 companies and 240 people). The generated
databases contain the following numbers of documents (and money turnovers): 20,
100, 200. The mapping between the relational database and the ontology consists of 57
mapping rules, and the set of domain knowledge rules contains 291 rules (42 SWRL
rules from the ontology, instead of 41 as in [2]).
      All queries exploit mapping rules (because these are responsible for gathering
data from the relational databases). For anonymity, the numbers in responses to
queries are IDs of objects (persons, companies); if needed, they can be transformed
into real names. Queries without any values sometimes need more time to execute
because SDL has to check all possible variable bindings from the database. Results of
the executed queries are presented in Table 1. The SDL demo is available at
http://150.254.41.181/. “Rules fired” is the number of rules fired in the backward
chaining engine during the reasoning process. Queries were executed on a computer
with the following parameters: Core2Duo 2 GHz, 2 GB RAM; the Java heap space was set
to 1024 MB.
Table 1. Results of the query execution ("> 10 min": the 10-minute time limit was
exceeded, so no results or rule counts are available, marked "-")

Query     Measure                 Database 20   Database 100   Database 200
Query 1   Time [ms]                       781           1328           1922
          Results [number]                 54            474           1036
          Rules fired [number]             74            441            796
Query 2   Time [ms]                      2734          37141         163968
          Results [number]                  1              1              1
          Rules fired [number]           1076          36260         225381
Query 3   Time [ms]                      2875          36344         183047
          Results [number]                 18            322           1004
          Rules fired [number]           1367          38457         232583
Query 4   Time [ms]                      5437         128719       > 10 min
          Results [number]                  1              1              -
          Rules fired [number]           2040          57091              -
Query 5   Time [ms]                      9312       > 10 min       > 10 min
          Results [number]                  1              -              -
          Rules fired [number]           2540              -              -
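
The scaling behavior in Table 1 can be examined with a short calculation. The Python sketch below copies the times and rule counts of the completed runs from the table and derives two quantities that are not reported in the paper: the growth factor of the execution time between consecutive databases (the number of documents grows 5 times from Database 20 to Database 100, and 2 times from Database 100 to Database 200) and the time per fired rule. The helper names are ours.

```python
# Values copied from Table 1; runs that exceeded the time limit are omitted.
TIMES_MS = {
    "Query 1": [781, 1328, 1922],
    "Query 2": [2734, 37141, 163968],
    "Query 3": [2875, 36344, 183047],
    "Query 4": [5437, 128719],
    "Query 5": [9312],
}
RULES_FIRED = {
    "Query 1": [74, 441, 796],
    "Query 2": [1076, 36260, 225381],
    "Query 3": [1367, 38457, 232583],
    "Query 4": [2040, 57091],
    "Query 5": [2540],
}


def growth_factors(values):
    """Ratio of each value to the previous one (empty for a single value)."""
    return [later / earlier for earlier, later in zip(values, values[1:])]


def ms_per_rule(query):
    """Execution time divided by the number of fired rules, per database."""
    return [t / r for t, r in zip(TIMES_MS[query], RULES_FIRED[query])]


for query in TIMES_MS:
    factors = ", ".join(f"{g:.2f}x" for g in growth_factors(TIMES_MS[query]))
    per_rule = ", ".join(f"{v:.2f}" for v in ms_per_rule(query))
    print(f"{query}: time growth [{factors}]  ms per fired rule [{per_rule}]")
```

Comparing the growth factors with the 5-fold and 2-fold growth in documents shows which queries scale worse than linearly with database size.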


As we can see, simple queries (1, 2, 3) are executed efficiently (considering
how many rules were fired). For more complex reasoning (queries 4 and 5), further
optimization is needed. It is worth noting that it takes 4 minutes for Pellet to classify
our Minimal Model ontology with the instantiation of the Hydra case. Our tool is more
efficient for simplified reasoning that does not use the full expressivity of the OWL-DL
language (we mostly use hierarchies of concepts and relations computed by Pellet, and
SWRL rules). More sophisticated reasoning with the Jess engine is available using the
OWL Meta-model from OWL2Jess [13].
      We also compared our results with the Jess engine using forward and backward
chaining separately. Appropriate scripts were loaded into two Jess engines. Facts (the
same as in the databases) were loaded from files. For a given database, the execution
times of the individual queries were approximately equal, because the reasoning
process is the main part of query execution; differences between queries did not
exceed 500 ms.
      In the forward chaining engine, each query took around 250 ms on data from the
first database (loading the facts took 375 ms) and 16 s on data from the second
database (15 s to load the facts). While loading data from the third database, the
Java heap space limit was reached (in both engines), so the queries could not be
executed. In the backward chaining engine, the times were 325 ms per query and
485 ms to load the facts for the first database, and 20 s per query and 183 s to load
the facts for the second. For small databases, it is clearly better to store data
(facts) in the engines’ working memory, but for bigger databases a scalability problem
occurs. Further investigation and modification of our hybrid reasoning
method are necessary to obtain performance comparable to forward chaining in
the Jess engine. At demo time we demonstrate the result of a method that avoids
backward chaining and improves performance to a level comparable to forward
chaining in the Jess engine.


5    Conclusions and Future Work

We demonstrated that the SDL library can be used to solve practical problems with
formally defined semantics (in our case the minimal model ontology). We realize that
our tools need extensions and optimization.
      In the next version of the SDL tool, we intend to apply a new conflict resolution
strategy that combines the “depth” and “breadth” strategies. With this
strategy, SQL queries (data retrieval) will be executed in a more
efficient way, which will make our approach more scalable and useful. Another useful
feature we are going to implement is the reasoning path, which can be presented to
the user to explain what evidence proves that a person falls under a concrete
legal article. Analysis of justifications [14] may give important information on
ontology incompleteness, which often arises when a model is extended. We also
want to generate rules for more OWL constructs than owl:SymmetricProperty.
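
One possible way to record a reasoning path is to store, for every derived fact, the rule and the premises that produced it. The Python sketch below illustrates this idea only; it is not the planned SDL feature. The rule, the class FalsifiedDocument, the property signed and the identifier Art271 are hypothetical (Art. 271 PC is mentioned below in connection with falsifying documents).

```python
def rule_falsified(facts):
    """Hypothetical rule: a person who signed a falsified document falls
    under Art. 271. Yields (conclusion, premises) pairs."""
    for (person, predicate, doc) in facts:
        type_fact = (doc, "rdf:type", "FalsifiedDocument")
        if predicate == "signed" and type_fact in facts:
            yield (person, "fallsUnder", "Art271"), [(person, "signed", doc), type_fact]


def chain_with_justifications(facts, rules):
    """Forward chaining that remembers why each new fact was derived.

    `rules` is a list of (name, function) pairs; returns (facts, why), where
    `why` maps a derived fact to (rule name, premises)."""
    known = set(facts)
    why = {}
    changed = True
    while changed:
        changed = False
        for name, rule in rules:
            # list() finishes the rule's pass before `known` is modified
            for conclusion, premises in list(rule(known)):
                if conclusion not in known:
                    known.add(conclusion)
                    why[conclusion] = (name, tuple(premises))
                    changed = True
    return known, why


def explain(fact, why, depth=0):
    """Return the reasoning path for `fact` as indented text lines."""
    indent = "  " * depth
    if fact not in why:
        return [f"{indent}{fact} (stored fact)"]
    name, premises = why[fact]
    lines = [f"{indent}{fact} <- rule {name}"]
    for premise in premises:
        lines.extend(explain(premise, why, depth + 1))
    return lines


facts = {("p1", "signed", "d1"), ("d1", "rdf:type", "FalsifiedDocument")}
known, why = chain_with_justifications(facts, [("signed-falsified", rule_falsified)])
path = explain(("p1", "fallsUnder", "Art271"), why)
print("\n".join(path))
```

The premises stored for each conclusion are exactly the evidence a user would need to see in order to check why a person falls under a given article.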
      The demo serves several purposes. It is evidence of the growing power of
rule-based systems. It can already be employed by prosecutors and judges for training
purposes. Moreover, upon a crime typology extension, it can be used as an expert
system. Our plan is to commercialize such a tool under the name “Anti-Fraud Future”.
     Extensions in some directions are relatively easy. To account for fraudulent
disbursement committed by non-management workers, one has to consider the possible
schemes [15]. These can involve either stealing money, e.g., from the register, which
falls under Art. 284 PC, or falsifying documents (Art. 271 PC jointly with Art. 284 PC).
     Acknowledgement. This work was supported by the Polish Ministry of Science
and Higher Education, Polish Technological Security Platform grant
0014/R/2/T00/06/02 in the first stage of the work and by PUT DS 45-083/10 grant in
the later stage.


References

  1.    Bak, J., Jedrzejek, C.: Application of an Ontology-based Model to a Selected Fraudulent
        Disbursement Economic Crime. In: Casanovas, P., Pagallo, U., Ajani, G., Sartor, G. (eds.)
        AI Approaches to the Complexity of Legal Systems. LNAI, vol. 6237 (2010)
  2.    Bak, J., Jedrzejek, C., Falkowski, M.: Application of an Ontology-based and Rule-based Model
        to Selected Economic Crimes – Fraudulent Disbursement and Money Laundering. RuleML 2010
        Conference, LNCS, vol. 6403
  3.    PricewaterhouseCoopers: Global economic crime survey 2009 (2009),
        http://www.pwc.com/gx/en/economic-crime-survey/download-economic-crime-people-culture-controls.jhtml
  4.    Financial Action Task Force (FATF), http://www.fatf-gafi.org
  5.    Bak, J., Jedrzejek, C., Falkowski, M.: Usage of the Jess engine, rules and ontology to query a
        relational database. In: Governatori, G., Hall, J., Paschke, A. (eds.) RuleML 2009. LNCS, vol.
        5858, pp. 216–230. Springer, Heidelberg (2009)
  6.    McGuinness, D., van Harmelen, F.: OWL Web Ontology Language Overview. W3C
        Recommendation (10 February 2004), http://www.w3.org/TR/owl-features/
  7.    Pellet Reasoner, http://clarkparsia.com/pellet/
  8.    Horrocks, I., Patel-Schneider, P.F., Boley, H., Tabet, S., Grosof, B., Dean, M.: SWRL: A Semantic
        Web Rule Language Combining OWL and RuleML. W3C Member Submission (21 May 2004),
        http://www.w3.org/Submission/SWRL/
  9.    SWRL Built-ins, http://www.w3.org/Submission/2004/SUBM-SWRL-20040521/
  10.   Jess (Java Expert System Shell), http://jessrules.com/
  11.   Jedrzejek, C., Cybulka, J.: Minimal Model of financial crimes (in Polish). Definitions in OWL,
        Technical report PPBW 07/2009, extended 09/2009,
        http://www.man.poznan.pl/~jolac/MinimalModel/MinimalModel.owl; the present version is
        available as open source at http://150.254.41.181/SDL_demo_ontology.zip
  12.   OWL API 3.0, http://owlapi.sourceforge.net/
  13.   OWL Meta-model, http://www.ag-nbi.de/research/owltrans/owlmt.clp
  14.   Horridge, M., Parsia, B., Sattler, U.: Laconic and Precise Justifications in OWL. In:
        Proceedings of the 7th International Semantic Web Conference (ISWC). Springer (Oct. 2008)
  15.   Jedrzejek, C., Bak, J.: Application of an Ontology-Based Model to Wide-Class Fraudulent
        Disbursement Economic Crimes. Jurix 2010, submitted

