<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Identifying Relevant Patterns in a Large Graph of Open Data: A Semantic Exploration of the Panama Papers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Antoine Vion</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aix-Marseille University</institution>
          ,
          <addr-line>CNRS, LEST Aix-en-Provence</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <fpage>311</fpage>
      <lpage>332</lpage>
      <abstract>
        <p>The following essay examines the Panama Papers from a sociological perspective, using a method based on analytic induction through graph mining. By way of an introduction, it provides a general overview of the chosen approach. A series of case studies concerning tax evasion practices as detailed in the Panama Papers help to elucidate the kind of query processing that might accompany the practical implementation of the proposed method, before the final section gives a brief summary and touches upon a methodological issue that has yet to be resolved.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Identifying relevant patterns in a large graph of open data is a complex,
exploratory process – a statement that finds ample support in the following
essay, which examines the Panama Papers from a sociological perspective,
employing a method based on analytic induction through graph mining.
From a user’s perspective, data visualization is a crucial step on the path to
data‐driven discoveries
        <xref ref-type="bibr" rid="ref21">(Riche, 2015)</xref>
        . In the field of visual analytics, graphic
visual interfaces tease knowledge out of the data and reveal ways in which this
data can be tested, refined, and shared
        <xref ref-type="bibr" rid="ref18">(Pike et al., 2009)</xref>
        .
      </p>
      <p>
        In an effort to better understand how the process of analytic induction
through graph mining might function in practice, this essay explores open
data knowledge graphs (KGs) with the help of DERIVO’s SemSpect tool
        <xref ref-type="bibr" rid="ref12">(Liebig et al., 2017)</xref>
        . Running on a Neo4j graph database, SemSpect
allows researchers to explore KGs in an intuitive manner. Moreover, the tool’s
usage of the GraphScale system
        <xref ref-type="bibr" rid="ref12">(Liebig et al., 2017)</xref>
        permits the addition
of an abstraction layer to the KG, which enables fast reasoning and
high-performance querying
        <xref ref-type="bibr" rid="ref6">(Glimm et al., 2014)</xref>
        . With SemSpect, scholars can
embark on a data-driven exploration of the KG in question with a mere click
of their mouse – no blind queries are required. In contrast to other systems,
SemSpect’s aggregated representation allows users to investigate highly
complex KGs, making it possible to understand their structure without having
to engage with the details of the queries applied. At the same time, the tool
empowers researchers to group parts of the KG into clusters and categories,
and to re-use them in new queries, which in turn allows the model of the KG
to be progressively refined.
      </p>
      <p>In the following sections of this paper, I will provide a general overview of
analytic induction (Section 2); discuss the kind of query processing that was
used to examine a series of case studies on tax evasion practices, as revealed in
the Panama Papers (Section 3); present a selection of data-driven discoveries,
which will illustrate the benefits of this kind of query processing (Section
4); and present an outlook on further research based on analytic induction
(Section 5).
</p>
    </sec>
    <sec id="sec-2">
      <title>Applying the Method of Analytic Induction to Tax Evasion Patterns</title>
      <p>
        Analytic induction
        <xref ref-type="bibr" rid="ref25">(Znaniecki, 1934)</xref>
        is a well-established research strategy
in the social sciences. A researcher begins by studying a small number of cases
of a particular phenomenon with the intention of finding a set of common
denominators. The information gathered is used to draw up a hypothesis,
which is then tested on additional cases
        <xref ref-type="bibr" rid="ref23">(Robinson, 1951)</xref>
        . If any of the new
cases do not verify the hypothesis, either the hypothesis is reformulated to
match the features of all the cases studied so far, or the original definition
of the type of phenomenon to be explained is altered on the grounds that
it does not represent a causally homogeneous category
        <xref ref-type="bibr" rid="ref11">(Lewis-Beck et al.,
2003)</xref>
        . Further cases are investigated until no more irregularities appear.
      </p>
      <p>Table 1: Analytical induction and graph mining, compared in terms of case building, added value, and frequency. Entries: building a singular complexion of properties; looking for interesting relations; a pattern in the graph (a fixed number of objects, required characteristics of each object, required relations between the objects); deep understanding of the data while building hypotheses; interesting patterns do not have lots of occurrences but are frequent enough to capture general features.</p>
      <p>
        The method of analytic induction highlights a number of complex
challenges associated with graph mining and semantic analysis (Table 1). First
and foremost, it underscores the importance of refining and developing the
categories that are initially used to define a particular social phenomenon;
computationally speaking, this corresponds to ontology refinement and
iterative semantic processing. Even more significant, perhaps, is the fact that the
primary goal of analytic induction is to identify cases
        <xref ref-type="bibr" rid="ref19">(Ragin and Becker, 1992)</xref>
        that can be used for comparative research: when translated to the field of
query processing in web data, such an approach requires a whole new set of
techniques, the case-based search for similarities being a case in point
        <xref ref-type="bibr" rid="ref17">(Mottin et al., 2019)</xref>
        .
      </p>
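      <p>The case-by-case procedure described at the beginning of this section can be paraphrased as a small Python sketch. It is a deliberate simplification and not the procedure implemented in SemSpect or GraphScale: a case is reduced to a set of feature labels, the hypothesis is the set of features shared by all cases retained so far, and a case that shares nothing with the hypothesis is set aside, which stands in for redefining the phenomenon. The function name and the feature labels are invented for illustration.</p>
      <preformat>
```python
def analytic_induction(initial_cases, further_cases):
    """Refine a hypothesis (a set of shared features) case by case.

    Assumes at least one initial case; every case is an iterable of labels.
    """
    # Common denominators of the first few cases form the first hypothesis.
    hypothesis = set.intersection(*map(set, initial_cases))
    excluded = []  # cases dropped by redefining the phenomenon
    for case in further_cases:
        features = set(case)
        if hypothesis.issubset(features):
            continue  # the case verifies the current hypothesis
        reformulated = hypothesis.intersection(features)
        if reformulated:
            hypothesis = reformulated  # reformulate to fit all cases so far
        else:
            excluded.append(case)  # treat the case as a different phenomenon
    return hypothesis, excluded


# Invented feature labels, for illustration only.
initial = [
    {"shared address", "nominee director", "tax haven"},
    {"shared address", "nominee director", "shell name"},
]
further = [
    {"shared address", "tax haven"},
    {"onshore", "audited"},
]
hypothesis, excluded = analytic_induction(initial, further)
```
      </preformat>
      <p>In this toy run the hypothesis shrinks to the single shared feature, and the last case is set aside because it has nothing in common with the others. In practice the decision between reformulating the hypothesis and redefining the phenomenon is a matter of judgement and not of set arithmetic.</p>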
    </sec>
    <sec id="sec-3">
      <title>Accounting Heterotopias: Retrieving Relevant Patterns as a Methodological Challenge</title>
      <p>Using the method outlined in the previous section, we examined the open
data package commonly referred to as the Panama Papers with a view
towards gathering any relevant information that could help to shed light on
the tax evasion techniques that were being used.</p>
      <p>
        Typically, the tax optimization practices of large companies are overseen
by external treasury management providers and include methods such as
over-invoicing; transferring expenses to subsidiaries located in tax havens;
‘forgetting’ sub-subsidiaries in the accounting consolidation; organizing
white sales; and laundering money on a large scale through the purchase of
raw materials, and on a small scale through cheques or prepaid card systems.
All of these practices function as “black holes of power”
        <xref ref-type="bibr" rid="ref10">(Lascoumes and
Lorrain, 2007)</xref>
        insofar as intermediation is protected by the
institutionalization of secrecy in certain jurisdictions. The Lux Leaks scandal, in which
Luxembourg’s tax rulings were shown to have provided an unfair advantage
to over 340 companies worldwide, is a clear example of the high price paid by
whistleblowers from major audit firms who dare to reveal their companies’
schemes to the general public. In 2016, the European Directive on the
Protection of Trade Secrets further enhanced the secrecy of accounting data. As
a result, it has become very difficult to provide proof of offshoring through
accounting – while the annual reports of major corporations are a matter of
public record via their consolidated statements, those of offshore
subsidiaries are no longer available.
      </p>
      <p>
        In the absence of open transaction data, the information provided by the
Panama Papers is predominantly topological. Suzanne
        <xref ref-type="bibr" rid="ref22">Roberts (1994)</xref>
        has
shown how the geography of offshore financial flows contributes to the
construction of fictitious spaces. In order to understand this phenomenon, it is
necessary to keep in mind that any accounting entry, whatever its final form,
assumes as a basic principle that the value recorded in one book is related to
a duplicate value in another book. Normally, this practice of double-entry
bookkeeping can serve as a basis for reconstructing transactions and, by
extension, the networks of economic agents who have recorded the
transactions in question. However, the various methods of fiscal optimization now
in widespread use combine to create what could be called an accounting
heterotopia; a term inspired by Michel Foucault’s concept of spaces that
suspend, neutralize, or reverse their relationships to other locations
        <xref ref-type="bibr" rid="ref5">(Foucault,
1984)</xref>
        . In effect, tax havens offer a means for constructing an accounting
heterotopia by permitting the registration of fictitious duplicates referring
to accounts drawn up in places where they escape any jurisdictional control.
What the Panama Papers allow us to do is to track down these duplicates.
      </p>
      <sec id="sec-3-1">
        <title>Offshoring Tricks</title>
        <p>Historically, law firms and corporate service providers such as Mossack
Fonseca, who were heavily implicated in the Panama Papers scandal, have used
certain strategies to help companies play with the spatiality of jurisdictions
in their relations with supervisory institutions. What these strategies have in
common is that they are not codified, but rather based on tacit knowledge.
We have therefore examined graph data from the Panama Papers with a view
towards patterns that are characteristic of such implicit practices.</p>
        <p>
          The online databases in question match the definition of what is
commonly called Open Data: “A piece of data is open if anyone is free to use,
reuse, and redistribute it – subject only, at most, to the requirement to
attribute and/or share-alike.” The progressive aggregation of such databases
means that they become increasingly linked
          <xref ref-type="bibr" rid="ref9">(Kitchin, 2014)</xref>
          in a process
of data massification. The main technical problem associated with such a
mass of data is the concomitant proliferation of duplicates, since the data
is not subject to semantic cleansing. If, for example, a company name has
been entered into the Evasion Professionals database as [Name] Limited and
[Name] Ltd, the same company will appear in two separate instances.
        </p>
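        <p>The name-level duplication problem described above can be made concrete with a minimal Python sketch. The table of legal-form variants and the sample names are illustrative assumptions and are not taken from the database; the point is only that mapping spelling variants of a legal form onto one canonical token lets [Name] Limited and [Name] Ltd collapse into a single key.</p>
        <preformat>
```python
import re
from collections import defaultdict

# Hypothetical mapping of legal-form variants to one canonical token.
LEGAL_FORMS = {
    "limited": "ltd",
    "ltd": "ltd",
    "incorporated": "inc",
    "inc": "inc",
    "corporation": "corp",
    "corp": "corp",
}


def normalize_company_name(name: str) -> str:
    """Lower-case, strip punctuation, and canonicalize legal-form tokens."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(LEGAL_FORMS.get(token, token) for token in tokens)


def group_duplicates(names):
    """Group raw spellings that share the same normalized key."""
    groups = defaultdict(list)
    for raw in names:
        groups[normalize_company_name(raw)].append(raw)
    return dict(groups)


# Invented sample records, for illustration only.
sample = ["Example Holdings Limited", "EXAMPLE HOLDINGS LTD.", "Other Trading Inc"]
duplicates = group_duplicates(sample)
```
        </preformat>
        <p>Any key that collects more than one raw spelling is a candidate duplicate. A rule of this kind only handles punctuation, capitalization, and known suffix variants; misspelled names would still appear as separate instances.</p>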
        <p>The Evasion Professionals database is an aggregate of the Offshore Leaks
database. The architecture of the database consists of officers (operators
identified as set-up operators), intermediaries (intermediaries working on
the files), entities (natural or legal persons on whose behalf the file is
processed), and addresses (addresses linked to the three types of predefined
entities); companies are classified according to operational criteria (active,
inactive, dissolved, relocated, redomiciled, etc.). The content available online
does not, however, give access to all of the extracted data: in order to protect
themselves from cumbersome and costly legal proceedings, the International
Consortium of Investigative Journalists (ICIJ) and the participating hackers
have left out information concerning the amounts and dates of transactions.</p>
        <p>
          The absence of dates means that fine temporal processing is not available,
whether it be genealogical, archaeological, or sequential. As a result, the
attempt to represent the complex social temporalities involved in the various
transactions recorded in the data produces a synchronic fiction. In fact,
computation without dates makes it almost impossible to carry out a pragmatic
analysis of the operations of record capture, circulation, and manipulation,
or any form of matching of editing sequences forming the ‘careers’ of
clients
          <xref ref-type="bibr" rid="ref1">(Abbott, 1999)</xref>
          , which raises the question of the usefulness of
uploading data with such obvious technical and legal limitations.
        </p>
        <p>In the case of the Panama Papers, the ordering of the data corresponds
neither to the logic of entirely raw and unstructured data, nor to
semantically consistent object classes or ontologies.<sup>2</sup> It does, however, appear
semi-structured for the purposes of statistical analysis, whether it be simple
descriptive statistics or network statistics, and the ICIJ website provides
graph-based network analysis software that appears to be intended for
precisely that purpose: the examples made available to users online show
networks of co-affiliation located at the same tax address, and the various
affiliations of operators identified as fraudsters.</p>
        <p><sup>2</sup> In computing, ontologies are structured sets of terms and concepts that are used to
define the meaning of a given information field, either via the metadata of a namespace or
the elements defined by a stabilized knowledge domain.</p>
        <p>In combination, the openness of the available data, its sheer volume, and
the kind of investigative tools that are provided on the ICIJ’s website
exert considerable influence on how the mechanisms of tax evasion can be
examined.</p>
      </sec>
      <sec id="sec-3-2">
        <title>The Limitations of ICIJ’s Collaborative Exploration Tools</title>
        <p>Sociologists investigating the phenomenon of tax evasion face a number of
methodological challenges. When conducting exploratory research of open
data, two types of problems immediately become apparent: the first
concerns the analytical limitations of the tools that are made available to the
researcher, as in the case of the ones provided on the ICIJ website; the second
has to do with the tools used for alternative research strategies.</p>
        <p>The ICIJ gives interested users the option of downloading network
processing software with which they can generate graphs comparable to those
created by information services companies to visualize the affiliations of
company directors or other persons of interest. However, the ICIJ’s choice to
focus on what is commonly referred to as self-centered networks or affiliation
networks has some rather unfortunate consequences. Generally speaking,
working with self-centered networks involves a simple query based on filters
and dictionaries of proper names (celebrities, companies, etc.), which makes
it possible to quickly retrieve information concerning the participation of a
given person or company. In the case of Mossack Fonseca, the immediate
targets of ICIJ’s efforts were the British and Icelandic prime ministers, David
Cameron and Sigmundur Davíð Gunnlaugsson, both of whom resigned in
the wake of the Panama Papers scandal.<sup>3</sup></p>
        <p>By conjuring up visible networks, the ICIJ succeeded in arousing media
interest in what are ultimately invisible practices. However, by choosing to
focus on the relationship between a particular personality and the
intermediaries of an optimization firm like Mossack Fonseca, the ICIJ participated
in a logic of shaming the latter’s clients, as opposed to conducting an
in-depth analysis of the firm’s practices. Self-centered or affiliation networks
are simply not conducive to exposing the creative bookkeeping techniques
of such firms, which often duplicate, conceal, or anonymize company names
– in the final analysis, assessing the phenomenon of tax evasion based on
dyadic relationships, such as those between officers and intermediaries, or
entities and officers and/or intermediaries, is an inadequate response to the
challenges at hand.</p>
        <p><sup>3</sup> The news that Cameron held shares in an offshore company together with his father
broke in the middle of the Brexit campaign, while Gunnlaugsson’s involvement in offshore
dealings came to light shortly after he had signed a military agreement with the US.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Visual Analytics</title>
        <p>
          One way to address the problems outlined above is to draw on software
solutions from the domain of visual analytics, which can be defined as a science
of analytical reasoning
          <xref ref-type="bibr" rid="ref20">(Ribarsky et al., 2009)</xref>
          supported by interactive visual
interfaces that help to overcome problems of data size and complexity
          <xref ref-type="bibr" rid="ref3">(Dill
et al., 2012)</xref>
          . Visual analytics is a multi-disciplinary field of research that
leverages recent findings in areas such as visualization, data mining, data
management, data fusion, statistics, and cognitive science
          <xref ref-type="bibr" rid="ref8">(Kielman et al., 2009)</xref>
          . As
pointed out in the introduction, the ability to visualize data is an essential
prerequisite for data-driven discoveries. In the course of our own
investigation into the Panama Papers, we used GraphScale and SemSpect software
developed by DERIVO – two excellent tools that make intuitive graph
exploration possible. The main advantage of GraphScale and SemSpect lies in
the fact that they exploit a level of ontological abstraction that is
automatically constructed from a semantic graph by means of graph mining in order
to enable complex and traceable queries.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Three Key Skills: Occulting, Partitioning, Porting</title>
      <p>By proceeding in an inductive way from basic data to the detailed analysis
of complex files, our systematic exploration of the database enabled us to
identify three recurring tax evasion techniques in the Panama Papers.</p>
      <sec id="sec-4-1">
        <title>Occulting</title>
        <p>As discussed in the previous section, the specific nature of the data contained
in the Panama Papers demands a suitable approach to processing it, and great
care must be taken in regard to the inferences that are being drawn.</p>
        <p>After conducting multiple tests, it became clear that the only reliable
method for engaging with the data at hand – incomplete as it is – was to
reconstruct the chains of tax packages starting with their associated addresses.
The registered companies are letterbox companies that have been set up for
the very purpose of bypassing legal restrictions, and they typically carry more
or less creative names. But if one manages to move past this ruse, it becomes
possible to reconstruct chains of letterboxes, to follow circuits of evasion,
and to expose the accounting heterotopias that are made possible by the
geographical location of the financial vehicle corporations (FVCs) involved in
the scheme.</p>
        <p>With this goal in mind, the query logic of searching for co-affiliations of
companies at specific addresses appears far more promising than the analysis
of egocentric networks. Indeed, the mass of data uploaded by the ICIJ is
rendered practically useless if it is not reconfigured under an ontological
database model which allows the user to define object classes and inferences
according to comparable semantic properties or certain logical properties.</p>
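        <p>The contrast between the two query logics can be illustrated with a small Python sketch. The records, the field layout, and the function names below are invented for illustration and reproduce neither the ICIJ schema nor the SemSpect interface. An ego-centred query starts from one name and returns what is directly attached to it; a co-affiliation query starts from the addresses and returns every set of companies registered at the same one.</p>
        <preformat>
```python
from collections import defaultdict

# Invented (entity, address) registrations, for illustration only.
REGISTRATIONS = [
    ("Company A", "Suite 1, Example House"),
    ("Company B", "Suite 1, Example House"),
    ("Company C", "Suite 2, Example House"),
    ("Company D", "Suite 1, Example House"),
]


def ego_query(name, registrations):
    """Ego-centred logic: return the addresses of one named entity."""
    return sorted({addr for entity, addr in registrations if entity == name})


def co_affiliations(registrations, min_size=2):
    """Address-centred logic: companies sharing a registration address."""
    by_address = defaultdict(set)
    for entity, addr in registrations:
        by_address[addr].add(entity)
    return {
        addr: sorted(entities)
        for addr, entities in by_address.items()
        if len(entities) >= min_size
    }


shared = co_affiliations(REGISTRATIONS)
```
        </preformat>
        <p>The ego-centred query requires the analyst to know a name in advance, whereas the address-centred grouping surfaces sets of companies that nobody thought to ask about.</p>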
        <p>The need for such an ontological database model is easily demonstrated.
For example, the ICIJ site provides a graphical representation of the
relationships that can be reconstructed based on the identification of the various
companies registered to the same hotel suite in the Seychelles (Figure 1).</p>
        <p>Here we see how an unmodeled database can be used to trace the
relationships between the hotel’s managers and the companies Green Apple
System Limited/Ltd and GWT Systems Limited by relying on nothing more
than an address (Suite 102, Aarti Chambers, Mont Fleuri, Victoria, Mahé,
Seychelles).</p>
        <p>As Table 2 shows, the details of this address are subject to considerable
variation in the database, with changes falling under three broad
categories: punctuation, capitalization, and (deliberate) spelling errors. Without
a semantic approximation model, one would have to multiply the queries
to search for other possible occurrences of the same suite under different
names. However, thanks to the modeling we developed in cooperation with
DERIVO, we were able to identify not one but twelve occurrences of the
same suite (Table 3). Moreover, we found other suites with multiple
occurrences at the same residence, as well as in other locations throughout the
Indian Ocean (Seychelles, Mauritius, etc.).</p>
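        <p>The three kinds of variation named above (punctuation, capitalization, spelling) can be handled, in a rough way, by normalizing each address and then comparing the normalized strings for similarity. The Python sketch below is an assumption-laden stand-in for the semantic approximation model developed with DERIVO, whose internals are not described here: it uses the standard library’s <monospace>difflib.SequenceMatcher</monospace>, an arbitrary threshold of 0.9, and invented spelling variants of the Seychelles address.</p>
        <preformat>
```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize_address(address: str) -> str:
    """Remove accents, punctuation, and case differences."""
    decomposed = unicodedata.normalize("NFKD", address)
    ascii_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(re.findall(r"[a-z0-9]+", ascii_only.lower()))


def cluster_addresses(addresses, threshold=0.9):
    """Greedy clustering: attach each address to the first similar cluster."""
    clusters = []  # list of (normalized representative, [raw spellings])
    for raw in addresses:
        key = normalize_address(raw)
        for representative, members in clusters:
            if SequenceMatcher(None, representative, key).ratio() >= threshold:
                members.append(raw)
                break
        else:
            clusters.append((key, [raw]))
    return [members for _, members in clusters]


# The first spelling is quoted in the text; the variants are invented.
variants = [
    "Suite 102, Aarti Chambers, Mont Fleuri, Victoria, Mahé, Seychelles",
    "SUITE 102 AARTI CHAMBERS; MONT FLEURI; VICTORIA; MAHE; SEYCHELLES",
    "Suite 102, Aarti Chambres, Mont Fleuri, Victoria, Mahe, Seychelles",
    "Suite 15, Example Building, Port Louis, Mauritius",
]
clusters = cluster_addresses(variants)
```
        </preformat>
        <p>With these inputs the three spellings of Suite 102 fall into one cluster and the unrelated address into another. The threshold matters: set too low, neighbouring suites in the same building would be merged, which is exactly the distinction the analysis needs to preserve.</p>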
        <p>
          Since the data is not time-stamped, it is difficult to infer whether or not
the relationships between all of these companies were simultaneous – it
appears possible, and indeed likely, that we are dealing with a series of successive
domiciliations. But in any case, what becomes evident here is an important
mechanism of tax evasion, namely the use of concealment procedures within
protected databases in accordance with Hervé Falciani’s
          <xref ref-type="bibr" rid="ref4">(Falciani, 2015)</xref>
          description of the IT strategies employed by HSBC to disperse data within the
organization for reasons of secrecy. This insight became possible because our
modeling allowed us not only to detect simple dyadic relationships, but to
establish sets of companies linked to the same registration address.
        </p>
        <p>
          Were we to use ICIJ’s approach to analyze the same data, we would find
that a total of eighty-one addresses remain after the subtraction of four
duplicates, all of which appear as separate entities. What becomes obvious here
is that while ICIJ managed to extract a large amount of information
concerning the mechanisms of tax evasion (and, as we have seen, generated
impressively large numbers in the process), the knowledge it produced is actually
much more superficial than it appears. In other words, it is quite clear that
the cognitive strategy employed to convert information extracted from the
data into knowledge
          <xref ref-type="bibr" rid="ref2">(Boisot and Canals, 2004)</xref>
          was not based on a firm grasp
of the nature of that data, even though this understanding is absolutely
essential. So-called ‘open’ data is by no means immune to such problems: if
anything, the very term itself may create a dangerous illusion of
accessibility, when in fact the mere availability of information is meaningless in the
absence of a viable analytical strategy.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Partitioning</title>
        <p>As the previous example has shown, the uncovering of tax avoidance
structures entails identifying the location nodes of FVCs and using these nodes
to reconstruct corporate networks based on criteria such as company
owners, intermediaries, and assembly operators. Instead of relying on the
software made available online by ICIJ with all of its limitations, I would argue
that it is much more expedient to investigate evasion circuits on the basis of
relational chains.</p>
        <p>[Table residue: columns “Lettering of the address”, “No. of companies”, and “No. of doublets”; surviving values: 67, 3, 1.]</p>
        <p>
          The analysis of networks in terms of relational chains, first proposed by
Stanley
          <xref ref-type="bibr" rid="ref14">Milgram (1967)</xref>
          , was picked up by Mark
          <xref ref-type="bibr" rid="ref7">Granovetter (2018)</xref>
          in his
study on employment in the US, which examines the chains of contact that
need to be mobilized in order to gain access to information on career
opportunities and to request assistance with the application process.
        </p>
        <p>
          Let us consider an example of such a chain: the employer knows A, who
knows B, who knows C, who knows the potential recruit; B passes the latter’s
name along to A together with his assessment of C’s qualification to give a
recommendation, and the likelihood of his giving a frank opinion; A in turn
passes this information on to the employer, along with his own assessment
of B. Of course, chains of this kind become more and more impractical the
longer they are – even the three-step recommendation chain from our
example has a slightly implausible ring to it. But Milgram’s work holds out
hope that short chains could in fact be the norm: his investigation showed
that randomly chosen pairs of Americans (one from Massachusetts and the
other from Nebraska) need a mere six to eight links on average in order to
connect them with one another
          <xref ref-type="bibr" rid="ref7">(Granovetter, 2018, 137)</xref>
          .
        </p>
        <p>In the case of the Panama Papers, the relational chains are not always
chains of operators in the strictest sense of the word, since it is not
uncommon for the final recipient to be the initial customer at the end of a perfectly
circular array of intermediaries. This poses several problems if we are to
conduct a true network analysis.</p>
        <p>The first obstacle is gaining access to the governance structures of
companies residing in tax havens, as managers often enter account lines in shell
companies that are nothing more than mailboxes hosted by intermediaries.
This raises the question of which sources can be used to trace these financial
flows – after all, the concerned parties have no incentive whatsoever to
cooperate. And what of the various other challenges associated with modeling
these complex and sometimes flat-out paradoxical relationships? Can one be
connected to a fictitious double of oneself via an intermediary one has never
met? And if so, to what extent is that double actually fictitious? Are there
no formal obligations to be fulfilled (signature, face-to-face procedures, etc.),
no practical precautions to be taken to facilitate eventual liquidation?</p>
        <p>For the purposes of our investigation, the lack of complete accounting
data means that carefully planned qualitative explorations are necessary to
understand the structure of the relational chains being examined. One
particularly instructive case involves a number of companies linked to Portcullis,
an assembly operator that is registered at 113 different addresses in 9
countries (Table 4).</p>
        <p>These addresses are in turn linked to 1,735 officers, i.e. professional
assembly operators (Table 6). The majority of these officers are located in tax
havens, but some are based in European countries that have not traditionally
fulfilled such a role, including Italy. The two Italian officers are registered at
18 different addresses in their home country and are linked to an additional
22 officers, one of whom is a legal entity called Sharecorp. When we proceed
to examine the number of beneficiaries of this company, we discover no fewer
than 1,610 further entities, distributed geographically over several tax havens
(Table 5).</p>
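        <p>The step-by-step expansion described here, from an assembly operator to its addresses, from the addresses to officers, and from officers to further entities, corresponds to a breadth-first traversal of the graph. The following Python sketch shows the idea on an invented toy graph; the node names, the link table, and the function name are placeholders and do not reproduce the Portcullis data or the queries run in SemSpect.</p>
        <preformat>
```python
from collections import deque

# Invented toy graph; node names and links are placeholders, not ICIJ data.
LINKS = {
    "operator:P": ["address:1", "address:2"],
    "address:1": ["officer:a", "officer:b"],
    "address:2": ["officer:b"],
    "officer:a": ["entity:x"],
    "officer:b": ["entity:y", "entity:z", "operator:P"],
}


def relational_chain(start, links, max_hops):
    """Breadth-first expansion: nodes first reached at each hop from start."""
    seen = {start}
    layers = []
    frontier = deque([start])
    for _ in range(max_hops):
        next_frontier = deque()
        layer = []
        while frontier:
            node = frontier.popleft()
            for neighbour in links.get(node, []):
                if neighbour not in seen:  # skip nodes already visited
                    seen.add(neighbour)
                    layer.append(neighbour)
                    next_frontier.append(neighbour)
        if not layer:
            break  # nothing new to reach
        layers.append(sorted(layer))
        frontier = next_frontier
    return layers


layers = relational_chain("operator:P", LINKS, max_hops=3)
```
        </preformat>
        <p>Each layer lists the nodes first reached at that distance from the starting operator. The set of visited nodes keeps the traversal from looping when a chain closes back on its origin, as in the circular arrays of intermediaries mentioned earlier, and the hop limit is what stops the regress in practice.</p>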
        <p>We can thus regress to infinity. What we have uncovered here, then, is
an intricate partitioning system that organizes transnational address sharing
in the form of concentric circles; an arrangement that is difficult if not
impossible to penetrate for all those who do not have access to the database of
participating law firms, most of which are based in the British Virgin Islands.
Graph-based visualization, however, can help to facilitate the search for an
original pattern. Beginning with what appears to be a kind of statistical
anomaly, it becomes possible to identify forms of organization that would go
unnoticed by classical detection algorithms – spotting their relevance is not
so much a matter of statistical knowledge of graph structures, but a question
of familiarity with the object of inquiry.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Porting</title>
        <p>Our analysis revealed two characteristic traits of the assemblies in question:
high-intensity backrest turnover and pronounced diversity of the carrying
vehicles. The use of shipping companies and shipping addresses seems to be
especially favored (Table 6).</p>
        <p>The available data is testament to the massive scale on which asset
management companies have been incorporated (72,720 in the Panama Papers and
a further 4,260 in the Offshore Leaks database). Figure 2 shows the
distribution of these companies by country.</p>
        <p>Of course, a chart based on the organizing principle of companies per
country has significant limitations, as it does not reflect the volume of
assets carried. Nevertheless, it does show that the extensive use of shipping
companies is favored by Western companies and their traditional tax havens,
whereas it is much less prevalent in Asia (with the notable exception of Hong
Kong, which has traditionally served as the financial interface between major
Western banks and Asian companies).</p>
        <p>The 76,980 companies in question own a total of 35,345 entities, of which
31,549 were set up by Mossack Fonseca, and the rest by Portcullis. The status
profile of these entities shown in Table 7 illustrates the intensity of turnover.</p>
        <p>[Table residue: a Country column listing British Virgin Islands, China, Cook Islands, Hong Kong, Malaysia, Samoa, Seychelles, Singapore, and Taiwan; the associated counts are not recoverable.]</p>
        <p>
          With companies registered as an international business company (IBC) in
a tax haven, the situation is straightforward: they are considered to be classic
instruments of tax evasion. Other vehicles, such as the porting of addresses
for consulting firms, blur the line between tax evasion and money laundering.
Indeed, the scheme of using a multitude of ‘shipping addresses’ is very much
in line with the findings of Fabian
          <xref ref-type="bibr" rid="ref24">Teichmann (2017)</xref>
          , who conducted an
interview survey with agents mentioned in the Panama Papers. In his study
on money laundering methods, Teichmann demonstrated that, in addition
to more ‘traditional’ methods (buying gold, jewellery, rough diamonds,
antiques, and paintings; arranging cash deals or deals with foreign exchange
offices; etc.), organizational arrangements such as real estate projects,
over-invoicing by consulting firms, certain types of mergers and acquisitions, and
banking transactions (particularly in Dubai) play an important role.
        </p>
        <p>To cite just one example, a relational chain based on the addresses
connected to a specific porting address alerted us to complex financial dealings
between Russia, Dubai, and the United Kingdom. After researching
identities and companies, we found that one of the individuals (subsequently
referred to as operator X) involved in the scheme was already being
prosecuted in the United Kingdom and Moldova for money laundering on a scale
of several billion euros.
        </p>
        <p>[Tables (column headings: Country; Status; Profile; Geographical location of the entities carried) and Figure 3.]</p>
        <p>Yet the connected addresses revealed arrangements that went well beyond
the indictments brought in this case: in the Mossack database, operator X
appears as the manager of 103 companies located in ten different countries:
82 in the British Virgin Islands, 70 in Russia, 9 in Cyprus, 7 in Samoa, 6
in Ukraine, 4 in the United Kingdom, 3 in the US, 2 in the Seychelles, 1 in
Hong Kong, and 1 in the United Arab Emirates.</p>
        <p>Operator X as an individual is domiciled in a series of addresses in Dubai
that are artificially distinguished according to the method described above.
Some of these addresses are shared by the anonymous directors of
Cyprus-domiciled carrier companies. After querying the operators who share the
address of the apartment in Dubai, it became apparent that one of them, a
British national (Y), is one of several British shareholders in six companies
domiciled in Russia and Cyprus, none of which is owned or managed by
X. However, when we explored X’s own activities in Cyprus, we found that
X manages other companies in the country. In turn, these companies are all
mediated by a Russian consultancy firm registered at an address in Cyprus –
and one of the directors of this firm is Y, the person who shares X’s address
in Dubai (Figure 3).</p>
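        <p>The kind of address-based query described in this case can be sketched in a few lines of Python. The example is purely illustrative: the records, the labels, and the helper functions co_residents and companies_of are hypothetical and reproduce neither the ICIJ data model nor the SemSpect interface. They merely show how operators sharing an address, and the companies attached to them, might be retrieved.</p>
        <preformat>
```python
from collections import defaultdict

# Hypothetical toy records; none of these values come from the ICIJ database.
REGISTERED_AT = [
    ("X", "Apartment 1, Dubai"),
    ("Y", "Apartment 1, Dubai"),
    ("Z", "Office 9, Limassol"),
]
ROLES = [
    ("X", "manager", "Holding A (BVI)"),
    ("Y", "director", "Consultancy C (Cyprus)"),
    ("Y", "shareholder", "Company R (Russia)"),
    ("Z", "director", "Company Q (Cyprus)"),
]


def co_residents(officer, registered_at):
    """Return the other officers registered at any address used by `officer`."""
    by_address = defaultdict(set)
    for person, address in registered_at:
        by_address[address].add(person)
    shared = set()
    for residents in by_address.values():
        if officer in residents:
            shared |= residents - {officer}
    return shared


def companies_of(officers, roles):
    """Map each given officer to the sorted (role, company) pairs they hold."""
    result = {person: [] for person in officers}
    for person, role, company in roles:
        if person in result:
            result[person].append((role, company))
    return {person: sorted(pairs) for person, pairs in result.items()}


# Step 1: who shares an address with X? Step 2: what do they control?
neighbours = co_residents("X", REGISTERED_AT)
print(neighbours)
print(companies_of(neighbours, ROLES))
```
        </preformat>
        <p>In a graph database the same two steps would typically be expressed as a single path query; the dictionary-based version above is only meant to make the logic of the relational chain explicit.</p>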
        <p>The chain thus operates as follows: the British partners in Russia, linked
to the Dubai address via their business partner Y, manage the business
relations of Russian and foreign companies established in Russia with other
companies in which Y is involved via a consultancy company registered in
Cyprus. The management of Cypriot companies from Dubai in turn
enables the establishment of holding companies that carry out transactions via
Mossack Fonseca in the British Virgin Islands.</p>
        <p>What makes this chain so interesting is that it is connected, via its main
operator, to a large-scale Russian money laundering network. In this case,
19 British companies have already been prosecuted for money laundering to
the tune of 20 billion pounds. The schemes described above may be legal,
and the presumption of innocence must be respected, but their discovery
illustrates the fragility of the boundaries between legally permissible financial
optimization and money laundering – some consulting firms such as
Mossack Fonseca have certainly been complicit in large-scale money laundering
operations, as the recently leaked FinCEN files show.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Summary and Outlook</title>
      <p>The questions and findings that have arisen over the course of our project
are indicative of a much broader need for tools that enable the investigative
exploration of open data in the humanities and social sciences (HSS) – a
technological infrastructure we refer to as an Investigation Support System (ISS).
We believe that two key building blocks are needed to establish such an ISS:
visual analytics and knowledge graph data mining.</p>
      <p>Visual analytics integrates new theoretical approaches and computational
tools with innovative interactive techniques and visual representations to
enable human‐information discourse. In our case, where visual analytics is
applied to knowledge graphs or semantic graphs constituted from open data,
DERIVO’s GraphScale and SemSpect tools have proven to be ideally suited
to the task at hand.</p>
      <p>
        When it comes to knowledge graph mining, relational data mining
(RDM) is an especially promising approach. Unlike traditional data
mining algorithms, which look for patterns in a single table (propositional
patterns), RDM algorithms look for patterns among multiple tables (relational
patterns). Davide Mottin’s groundbreaking work in the field allows
researchers to apply the sociological method of analytic induction to substantial
bodies of open data. By constructing exemplar patterns
        <xref ref-type="bibr" rid="ref15">(Mottin et al., 2014)</xref>
        ,
examining their properties
        <xref ref-type="bibr" rid="ref16">(Mottin et al., 2017)</xref>
        , and systematizing this kind
of exploration
        <xref ref-type="bibr" rid="ref17">(Mottin et al., 2019)</xref>
        , scholars are able to expand their queries
intuitively, which produces a more informative (full) query that can retrieve
more detailed and relevant answers
        <xref ref-type="bibr" rid="ref13">(Lissandrini et al., 2020)</xref>
        .
      </p>
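      <p>A deliberately simplified sketch may help to convey the idea of querying by example. It is not the algorithm of Mottin et al.: the toy graph, the notion of an edge signature, and the function names signature and similar_to_exemplar are assumptions made for illustration only. The user supplies one known relation, and the code returns the other relations in the graph that share its typed structure.</p>
      <preformat>
```python
# Hypothetical typed knowledge graph: node types plus labelled edges.
NODE_TYPE = {
    "X": "Officer",
    "Y": "Officer",
    "Holding A": "Entity",
    "Consultancy C": "Entity",
    "Apartment 1": "Address",
}
EDGES = [
    ("X", "manages", "Holding A"),
    ("Y", "directs", "Consultancy C"),
    ("Y", "manages", "Consultancy C"),
    ("X", "registered_at", "Apartment 1"),
]


def signature(edge, node_type):
    """Abstract an edge to (source type, relation, target type)."""
    source, relation, target = edge
    return (node_type[source], relation, node_type[target])


def similar_to_exemplar(exemplar, edges, node_type):
    """Return the edges that share the exemplar's signature, excluding the exemplar."""
    wanted = signature(exemplar, node_type)
    return [
        edge for edge in edges
        if edge != exemplar and signature(edge, node_type) == wanted
    ]


# The exemplar is a relation the investigator already knows about.
matches = similar_to_exemplar(("X", "manages", "Holding A"), EDGES, NODE_TYPE)
print(matches)
```
      </preformat>
      <p>Real exemplar queries operate on multi-edge subgraphs and rank their answers; the single-edge signature used here is the smallest case in which the principle remains visible.</p>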
      <p>And yet, one significant methodological challenge remains. Graph
pattern mining aims at identifying structures that appear frequently in large
graphs, under the assumption that frequency signifies importance. Several
measures of frequency have been proposed that respect the apriori property,
which is essential for an efficient search of the patterns. This property states
that the number of appearances of a pattern in a graph cannot be larger than
the frequency of any of its sub-patterns. In real life, however, there are many
graphs with weighted nodes and/or edges, in which case it would clearly be
sensible for the importance (score) of a pattern to be determined not only by
the number of its appearances, but also by the weights on the nodes/edges
of those appearances.</p>
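      <p>The tension between frequency and weight can be made concrete with a small sketch. It rests on several assumptions that are ours rather than the cited authors': the toy graph and its weights are invented, patterns are restricted to typed paths, frequency is measured by minimum image-based support (one measure known to respect the property), and the weighted score is simply the mean edge weight of the appearances.</p>
      <preformat>
```python
# Hypothetical typed graph; edge weights might stand for transaction volumes.
NODE_TYPE = {
    "o1": "Officer", "o2": "Officer",
    "e1": "Entity", "e2": "Entity",
    "a1": "Address",
}
EDGES = {
    ("o1", "e1"): 5.0,
    ("o2", "e1"): 1.0,
    ("o2", "e2"): 1.0,
    ("e1", "a1"): 2.0,
}


def embeddings(pattern):
    """Enumerate node tuples matching a path pattern given as a list of node types."""
    paths = [(node,) for node, kind in NODE_TYPE.items() if kind == pattern[0]]
    for wanted in pattern[1:]:
        paths = [
            path + (v,)
            for path in paths
            for (u, v) in EDGES
            if u == path[-1] and NODE_TYPE[v] == wanted and v not in path
        ]
    return paths


def mni_support(pattern):
    """Minimum image-based support: the smallest number of distinct graph
    nodes matched at any single position of the pattern."""
    found = embeddings(pattern)
    if not found:
        return 0
    return min(len({path[i] for path in found}) for i in range(len(pattern)))


def mean_edge_weight(pattern):
    """Illustrative weighted score: average edge weight over all appearances."""
    found = embeddings(pattern)
    if not found:
        return 0.0
    per_appearance = [
        sum(EDGES[(p[i], p[i + 1])] for i in range(len(p) - 1)) / (len(p) - 1)
        for p in found
    ]
    return sum(per_appearance) / len(per_appearance)


SUB = ["Officer", "Entity"]
SUPER = ["Officer", "Entity", "Address"]
print(mni_support(SUB), mni_support(SUPER))
print(mean_edge_weight(SUB), mean_edge_weight(SUPER))
```
      </preformat>
      <p>In this toy graph the support falls from 2 to 1 as the pattern grows, as the property requires, whereas the mean edge weight rises from roughly 2.33 to 2.5. A weighted score of this naive kind therefore cannot be used to prune the search in the same way as a frequency measure.</p>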
      <p>Removing this obstacle will require both a rigorous methodology of
frequency scoring, which Mottin et al. are in the process of developing, and
suitable methods for case-building, which GraphScale and SemSpect already
permit. In light of this, SemSpect’s integration with Neo4j may well prove to
have been the decisive step towards the large-scale application of analytic
induction to open data.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>The author would like to thank Thorsten Liebig, co-founder and CEO of
DERIVO, for modeling the initial semantic graph of the ICIJ’s online
database, and Vincent Vialard, senior engineer at DERIVO and co-presenter of
this paper at the 2019 Graph Technologies conference, for his collaboration
and astute assessment of the research process.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Abbott</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>1999</year>
          ).
          <article-title>What Do Cases Do? Some Notes on Activity in Sociological Analysis</article-title>
          . In Ragin, C. and Becker, H., editors,
          <source>What Is a Case? Exploring the Foundations of Social Inquiry</source>
          , pages
          <fpage>53</fpage>
          -
          <lpage>82</lpage>
          . Cambridge University Press, Cambridge, NY.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Boisot</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Canals</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2004</year>
          ).
          <article-title>Data, Information and Knowledge: Have We Got It Right?</article-title>
          .
          <source>Journal of Evolutionary Economics</source>
          ,
          <volume>14</volume>
          (
          <issue>1</issue>
          ):
          <fpage>43</fpage>
          -
          <lpage>67</lpage>
          , DOI: 10.1007/s00191-003-0181-9.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Dill</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Earnshaw</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kasik</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vince</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al., editors (
          <year>2012</year>
          ).
          <source>Expanding the Frontiers of Visual Analytics and Visualization</source>
          . Springer, London, DOI: 10.1007/978-1-4471-2804-5.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Falciani</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <source>Séisme sur la planète finance: au coeur du scandale HSBC</source>
          . La Découverte, Paris.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Foucault</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>1984</year>
          ).
          <article-title>Dits et écrits, Des espaces autres</article-title>
          . Mouvement, Continuité,
          <volume>5</volume>
          :
          <fpage>46</fpage>
          -
          <lpage>49</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Glimm</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoilos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , et al. (
          <year>2014</year>
          ).
          <article-title>HermiT: an OWL 2 reasoner</article-title>
          .
          <source>Journal of Automated Reasoning</source>
          ,
          <volume>53</volume>
          (
          <issue>3</issue>
          ):
          <fpage>245</fpage>
          -
          <lpage>269</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Granovetter</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <source>Getting a Job: A Study of Contacts and Careers</source>
          . University of Chicago Press, Chicago, IL.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Kielman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thomas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>May</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Foundations and Frontiers in Visual Analytics</article-title>
          .
          <source>Information Visualization</source>
          ,
          <volume>8</volume>
          (
          <issue>4</issue>
          ):
          <fpage>239</fpage>
          -
          <lpage>246</lpage>
          , DOI: 10.1057/ivs.2009.25.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Kitchin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <source>The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences</source>
          . Sage, London.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Lascoumes</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Lorrain</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2007</year>
          ).
          <article-title>Trous noirs du pouvoir. Les intermédiaires de l'action publique. Introduction</article-title>
          .
          <source>Sociologie du travail</source>
          ,
          <volume>49</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          , DOI: 10.4000/sdt.20509.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Lewis-Beck</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bryman</surname>
            ,
            <given-names>A. E.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Liao</surname>
            ,
            <given-names>T. F.</given-names>
          </string-name>
          (
          <year>2003</year>
          ).
          <source>The Sage Encyclopedia of Social Science Research Methods</source>
          . Sage, London.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Liebig</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vialard</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Opitz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Connecting the Dots in Million-Nodes Knowledge Graphs With Semspect</article-title>
          . In Nikitina, N., Song, D., Fokoue, A., and Haase, P., editors,
          <source>Proceedings of the ISWC 2017 Posters &amp; Demonstrations and Industry Tracks</source>
          , volume
          <volume>1963</volume>
          of CEUR Workshop Proceedings. http://ceur-ws.org/Vol-1963/.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Lissandrini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mottin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palpanas</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Velegrakis</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>Graph-Query Suggestions for Knowledge Graph Exploration</article-title>
          . In
          <source>WWW '20: Proceedings of The Web Conference 2020</source>
          , pages
          <fpage>2549</fpage>
          -
          <lpage>2555</lpage>
          , New York, NY. Association for Computing Machinery, DOI: 10.1145/3366423.3380005.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Milgram</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>1967</year>
          ).
          <article-title>The Small World Problem</article-title>
          .
          <source>Psychology Today</source>
          ,
          <volume>2</volume>
          (
          <issue>1</issue>
          ):
          <fpage>60</fpage>
          -
          <lpage>67</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Mottin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lissandrini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velegrakis</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Palpanas</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Exemplar Queries: Give Me an Example of What You Need</article-title>
          .
          <source>Proceedings of the VLDB Endowment</source>
          ,
          <volume>7</volume>
          (
          <issue>5</issue>
          ):
          <fpage>365</fpage>
          -
          <lpage>376</lpage>
          , DOI: 10.14778/2732269.2732273.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Mottin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lissandrini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velegrakis</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Palpanas</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>New trends on exploratory methods for data analytics</article-title>
          .
          <source>Proceedings of the VLDB Endowment</source>
          ,
          <volume>10</volume>
          (
          <issue>12</issue>
          ):
          <fpage>1977</fpage>
          -
          <lpage>1980</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Mottin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lissandrini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velegrakis</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Palpanas</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Exploring the Data Wilderness through Examples</article-title>
          . In
          <source>Proceedings of the 2019 International Conference on Management of Data</source>
          , pages
          <fpage>2031</fpage>
          -
          <lpage>2035</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>Pike</surname>
            ,
            <given-names>W. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stasko</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>O'Connell</surname>
            ,
            <given-names>T. A.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>The Science of Interaction</article-title>
          .
          <source>Information Visualization</source>
          ,
          <volume>8</volume>
          (
          <issue>4</issue>
          ):
          <fpage>263</fpage>
          -
          <lpage>274</lpage>
          , DOI: 10.1057/ivs.2009.22.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Ragin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          and Becker, H., editors (
          <year>1992</year>
          ).
          <source>What Is a Case? Exploring the Foundations of Social Inquiry</source>
          . Cambridge University Press, Cambridge, NY.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Ribarsky</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fisher</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Pottenger</surname>
            ,
            <given-names>W. M.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Science of Analytical Reasoning</article-title>
          .
          <source>Information Visualization</source>
          ,
          <volume>8</volume>
          (
          <issue>4</issue>
          ):
          <fpage>254</fpage>
          -
          <lpage>262</lpage>
          , DOI: 10.1057/ivs.2009.28.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Riche</surname>
            ,
            <given-names>N. H.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Data-Driven Discoveries: Pushing Visualization Research Further</article-title>
          .
          <source>IEEE Computer Graphics and Applications</source>
          ,
          <volume>35</volume>
          (
          <issue>3</issue>
          ):
          <fpage>42</fpage>
          -
          <lpage>43</lpage>
          , DOI: 10.1109/MCG.2015.54.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>Roberts</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>1994</year>
          ).
          <article-title>Fictitious Capital, Fictitious Spaces: The Geography of Offshore Financial Flows</article-title>
          . In Corbridge, S., Thrift, N., and Martin, R., editors,
          <source>Money, Power and Space</source>
          , pages
          <fpage>91</fpage>
          -
          <lpage>115</lpage>
          . Blackwell, Oxford.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <surname>Robinson</surname>
            ,
            <given-names>W. S.</given-names>
          </string-name>
          (
          <year>1951</year>
          ).
          <article-title>The Logical Structure of Analytic Induction</article-title>
          .
          <source>American Sociological Review</source>
          ,
          <volume>16</volume>
          (
          <issue>6</issue>
          ):
          <fpage>812</fpage>
          -
          <lpage>818</lpage>
          , DOI: 10.2307/2087508.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <surname>Teichmann</surname>
            ,
            <given-names>F. M. J.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Twelve Methods of Money Laundering</article-title>
          .
          <source>Journal of Money Laundering Control</source>
          ,
          <volume>20</volume>
          (
          <issue>2</issue>
          ):
          <fpage>130</fpage>
          -
          <lpage>137</lpage>
          , DOI: 10.1108/JMLC-05-2016-0018.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <surname>Znaniecki</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (
          <year>1934</year>
          ).
          <source>The Method of Sociology</source>
          . Rinehart &amp; Company, Inc., New York.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>