A Framework for Occupational Fraud Detection by
               Social Network Analysis

                            Sanni Lookman, Selmin Nurcan

       Centre de Recherche en Informatique, Université Paris I, Panthéon-Sorbonne
    lookman.sanni@malix.univ-paris1.fr, nurcan@univ-paris1.fr


      Abstract. This paper explores issues related to occupational fraud detection.
      We observe over the past years, a broad use of network research across social
      and physical sciences including but not limited to social sharing and filtering,
      recommendation systems, marketing and customer intelligence, counter intelli-
      gence and law enforcement. However, the rate of social network analysis adop-
      tion in organizations by control professionals or even by academics for insider
      fraud detection purpose is still very low. This paper introduces the OFD – Oc-
      cupational Fraud Detection framework, based on formal social network analysis
      and semantic reasoning principles by taking a design science research perspec-
      tive.


      Keywords: Design science, ontology, data mining, fraud detection, social net-
      work analysis, internal control, governance, risk, compliance.


1     Introduction

   Frauds partly draw from human beings imaginative nature. Over the years, fraud-
ster’s attack methodologies have evolved from an opportunistic approach to some
more sophisticated and traceless deception schemes and that, in a constantly yet au-
tomatizing but complexifying business environment. In recent years, several unethical
behaviors within organizations have received significant attention. Celebrated cases
range from financial scandals (Pechiney - 1988, Elf - 1994, Enron - 2001, Kerviel
2008) to data theft (WINDOWS 8 Beta - 2012, Korea Credit Bureau - 2014, SONY -
2014) and have proven that fraud is likely to happen at any level of an organization.
The Association of Certified Fraud Examiners, in his 2014 report to the nations on
occupational fraud and abuse [ACFE, 2014], estimates a global loss of 5% of reve-
nues to fraud (3.7 trillion dollars if applied to the 2013 Gross World Product). They
additionally reported that fraud cases were mostly uncovered by tips or chance (40%).
That is an anonymous fraud hotline would even anticipate a lot of fraud damage and
yet, knowledge discovery and data mining techniques are teeming.
   Detection innovations include automated rules, watch lists matching, supervised
and unsupervised classification, data fusion and link analysis. Such techniques have
received increased industry specific interests for external frauds (i.e. committed by
people outside of the organization) detection. Those would include cybercrimes by
computer or network intrusion, credit card, insurance, telecommunication and credit
application frauds [Phua et al., 2004], [Yufeng et al., 2004], [Cox et al., 1997],
[Wheeler et al., 2000]. In the meantime, internal or occupational fraud, defined by the
ACFE as the use of one’s occupation for personal enrichment through the deliberate
misuse or misapplication of the employing organization’s resources or assets, has
proved to be more prevalent than external fraud. PriceWaterhouseCoopers’ 2014
Global Economic Crime Survey reports in France an average of 56% of internal fraud
[PWC, 2014].
   This paper elicits problems faced by investigators in the process of occupational
fraud detection and comes up with a solution which contributes to solving these prob-
lems. Following the design science research paradigm [Wieringa, 2009], formal social
network analysis and semantic modeling concepts have been reused to suggest a new
perspective on the architecture of an effective fraud detection system.
   The remainder of this paper is organized as follows. Section 2 introduces proper-
ties, formal analysis of social networks and motivation for their use to address fraud
detection issues. Section 3 then demonstrates the design of the OFD framework and
the validation of its design within a context of fraud detection from journal entries. In
the last section, related works, concluding remarks and their implication for further
research on social network analysis were taken up.


2      Social Networks

2.1    What is Social Network Analysis (SNA)?

   A social network is a concept referring to a structure made of social actors sharing
interests, activities, etc… Joseph Moreno is cited by most research papers on the topic
of social network analysis as being the first to introduce methods and tools for a for-
mal analysis. In the 1930s, he was the first one to use all four properties that charac-
terizes SNA at the same time in a study aiming at explaining a spate of girls’ runa-
ways: (1) the intuition that links among social actors are important. (2) It is based on
data that record social relations that link actors. (3)It draws heavily on graphic image-
ry to reveal and display the patterning of those links. (4) It develops mathematical and
computational models to describe and explain those patterns [Freeman, 2011]. Basi-
cally, SNA aims at understanding relationships between the network participants, by
means of mapping and measuring. SNA has received increased attention from organ-
izations seeking to understand connection between patterns of interactions. It applies
to a wide range of business problems including collaboration in workplaces, team
building in post merger configuration, employee’s engagement measurement, online
reputation, customer intelligence, business strategy, disease contagion, counter terror-
ism, etc. It was a SNA which led US military to the capture of Saddam Hussein in
December 2003 [PSU, 2007]. The tool Inflow for example is credited with contribu-
tion to the analysis of terrorist networks surrounding the September 11 th events and
contact tracing for HIV transmission in a state prison [INFLOW, 2010].
2.2    Formal Social Network Analysis
   Whether used for infectious disease spread modeling, professional relations analy-
sis, concentration of resources or power identification, SNA would follow two differ-
ent approaches. Researchers distinguish between egocentric and socio-centric analysis
of networks [Chung et al., 2006]. In the former type of analysis, the focus is made on
local structure of networks, i.e. the network around a given node while the latter con-
siders the network as a whole, looking at interactions patterns and the overall network
structure by quantifying relationships between people. This distinction would impact
the SNA process during data collection and graph visualization.
   SNA provides both a visual and a mathematical analysis of relationships between
the entities participating to the network. From the visual perspective, social networks
are represented as “sociogram” [Scott et al., 2011] or graphs showing actors as nodes
that are tied by one or many types of interdependency (values, ideas, visions, sex,
friendship, kinship, collaboration, trade, antagonism, etc…). From a mathematical
perspective, the social relations datasets translate into a matrix, underlining the visual-
ized graph. This perspective serves at uncovering the graph’s theoretic properties
(e.g.: number of edges, number of vertices, degree, multiplexity, centrality, density,
closeness, betweenness, etc…), supported by metrics computed from the matrix, that
help characterizing and even querying the network at hand.


2.3    Why Social Network Analysis for Addressing Insider Fraud Detection?

   Social Network Analysis can bring value to occupational fraud detection in at least
three ways. First, nowadays organizations are networked (staff, management, custom-
ers, suppliers, etc…) and fraud can originate from any part of the network. SNA
brings the ability to analyze behaviors and reveal hidden connections that would have
not been seen in raw text format.
   Secondly, the dynamic nature of fraud makes detection challenging for the tradi-
tional rule based algorithms. Fraudsters are constantly adapting to circumvent the
existing controls and any new pattern would not be covered by such static algorithms.
As people excel at detecting patterns and their judgment when reviewing anomalous
activities or transactions very valuable, we believe combining this human ability to
computer’s capability to iteratively and tirelessly search for defined instances would
improve the overall detection process.
   Thirdly, SNA can help saving time during manual investigation, which is a neces-
sary step for validating any potential fraud case uncovered by a tool. Traditional com-
puter-aided audit tools are transaction oriented [ACL, 1987], output rows of incrimi-
nated transactions without the view of other related transactions performed by the
same entities and thus make the manual investigation process labor intensive. With
SNA the involved entities and their overall activities is readily available in a graph
view for fraud examiners, who in turn are able to quickly visualize false positives and
can focus on more risky cases.
3      The Occupational Fraud Detection Framework - OFD

3.1    Framework design


                         Fig. 1. Overview of the OFD framework

     The OFD framework as envisioned in this paper, starts with the selection of a
process well integrated with IT, from which historical facts (e.g.: sales, journal en-
tries, purchases, etc…) can be extracted from. OFD is built around four main compo-
nents:
 The ontology designer, which is the specification component. As fraud examiners
     are barely proficient in data sciences to maintain robust systems their own and
     fraud schemes always evolving, the need of a layer of semantic in fraud detection
     systems architecture is mission critical. Actor types, interaction types and their
     respective characteristics are to be specified via this component. The way in
     which they are represented in the final sociogram (shape, color, etc…) is essential
     and to be defined here as well. Compliance related rules, conflicting interactions,
     antecedence or association rules are also regarded. The idea behind this compo-
     nent is its complete flexibility so as to enable a fraud examiner to not only control
      the rationale behind how fraudulent cases are uncovered, but also the display of
      the different visualizations available.
     The risk assessment engine is made of a data parser, for ensuring proper integra-
      tion of raw data collected and a semantic reasoning system. The reasoner is
      meant to infer logical consequences from the rules specified in the ontology de-
      signer. It would analyze the parsed data with both socio-centric and ego-centric
      perspectives. At one hand, ego-centric analysis will highlight individual interac-
      tions which violate the set of rules specified, while at the other hand, socio-
      centric analysis will enable the identification of internal control deficiencies (e.g.:
      no segregation of duties) and the detection of fraud not pertaining to a specific
      transaction, or entity (e.g.: conflicts of interest, management frauds, etc…).
     The reporting component, like what exist today in the industry, would report on
      cases of violation of the specified rules, by outputting rows of potentially fraudu-
      lent transactions.
     The visualization component with its set of actionable sociograms includes a
      multidimensional social network view, showing several interaction types in the
      same network, what goes along with the socio-centric perspective mentioned ear-
      lier. Drill down and rollup capabilities would help zooming into transactions per-
      taining to a specific interaction type, or a specific actor of the network (ego-
      centric analysis). On reduced set of interactions, the time dimension would also
      be viewable. This component is critical to the overall detection process as
      through it, fraud examiners would uncover new unforeseen patterns to be speci-
      fied in the ontology editor, thus paving the way for a continuously improving
      fraud detection engine.


3.2     OFD framework evaluation by early prototyping

     Before jumping to the development of a generic, sound and theoretically ground-
ed tool for supporting the framework introduced earlier, a review of the design has
been performed. The aim of such evaluation was threefold:
a. Assess the extent to which graphs can fairly and faithfully represent the diversity
     of interactions happening between actors of the same or different organizations.
b. Measure the expressiveness of a social network in terms of red flagging of fraud-
     ulent interactions or transactions.
c. Gain insights on the perceived complexity by fraud examiners in the use of such
     visualizations to support fraud detection.
     To this end, we ran a case study using accounting journal entries extractions as
input for two different organizations of different size. The case study was conducted
in collaboration with a population of internal auditors, who have been surveyed on
various multidimensional social networks generated from the accounting journal en-
tries (actionable visualization component). The number of auditors involved in the
evaluation cannot be revealed in the presence of non disclosure agreement with the
cooperating organization. The R project and the network analysis package “IGRAPH
0.7.1” [Csárdi et al., 2006] were used for scripting raw data parsing, business and
design logic. Figure 2 illustrates the overall multidimensional network for one of the
entities studied.


         Fig. 1. Global multidimensional social network of accounting journal entries

Each edge in the graph above corresponds to a type of interaction happening between
an employee (orange nodes) and a third party (other nodes - customer or supplier in
this case). Red edges correspond to outgoing payments, orange ones being purchase
invoices, etc… Different drills down or subsets of what is shown in figure 2 have
been submitted to the auditors, like the one in figure 3, illustrating supplier only relat-
ed interactions for the same entity as above.
   The key takeaways from this evaluation exercise are as follows:

 Not all journal entries involve a third party (customer, supplier, etc…), what could
  be perceived as a threat to the validity of our social network oriented approach.
  Fortunately, such entries (depreciation, amortization, miscellaneous incomes,
  etc…) are usually subjected to rules which can be reasoned by the risk assessment
  engine and atypical entries solely highlighted in the reporting engine.
 The manipulation of the proposed visualizations is not that intuitive for auditors,
  even with a help document attached. Training should not be neglected as 20% of
  the surveyed auditors perceived the visualization as being too complex, embedding
  too much information at once. They actually did not provide any further answer to
  the questionnaire.
 The remaining participants’ high level observations or socio-centric conclusions
  were identical (e.g. non effectiveness of segregation of duties), denoting the good
  expressiveness of graphs for serving such purpose.
 At the other hand, the ego centric findings were diverse and varied from an auditor
  to another one, but not contradictory. The variability in the red flags of interest
  might be explained by the difference in the past experiences of each one of the au-
    ditors. They tend to focus their testing procedures on the types of anomalies they
    expect to come across (what is quite aligned with traditional rule based static de-
    tection algorithms); the visualization can help then going beyond that, by expand-
    ing the range of possibilities and suggesting further investigation axes.


      Fig. 2. Drill of the overall social network down to suppliers only related interactions


4       Conclusion and future works
        To the best of our knowledge, only few research papers tackled issues in occu-
pational fraud and even fewer integrated visual analytic concepts to their approach.
Those last include unsupervised approaches like graph pattern matching techniques
[Eberle et al, 2011], with strong focus on structural anomalies identification, but un-
fortunately forgoing real world business specificities and rules, what leads to a high
rate of false positives and complex maintenance by end users. Other approaches like
[Luell, 2010] or [Argyriou et al, 2013], rely on innovative but tailor made visualiza-
tions which cannot be applied to other business processes. The framework presented
in this paper extends existing data mining techniques used for occupational fraud
detection by offering not only visualizations to be used by auditors to uncover new
fraud patterns, but also semantic reasoning capabilities for integrating those new pat-
terns to the fraud detection engine. The targeted architecture is then scalable and ex-
tensible provided the only maintenance of specified ontologies. Our assessment of the
serviceability of the sociograms on accounting journal entries delivered promising
results and future directions for this research will be towards the design and the eval-
uation of a full prototype for supporting the framework. The generic nature of the
framework presented herein and its network oriented approach also open perspectives
for investigation beyond the scope of occupational fraud detection. Cyber criminality
in an environment where information systems are more and more interoperable may
also be investigated following a likely approach.


References
   [ACFE, 2014] ACFE 2014 Report to the nations on occupational fraud and abuse
    http://www.acfe.com/uploadedFiles/ACFE_Website/Content/documents/2004RttN.pdf
   [ACL, 1987] www.acl.com
   [Argyriou et al, 2013] Evmorfia N. Argyriou, Aikaterini A. Sotiraki, Antonios Symvonis.
    Occupational Fraud Detection Through Visualization, In Proc. of the 11th IEEE Intelli-
    gence and Security Informatics (ISI 2013), pages 4-7, 2013.
   [Chung et al., 2006] Kenneth K Chung, Liquat Hossain, Joseph Davis. Exploring
    sociocentric and egocentric approaches for social network analysis. KMAP 2005: Second
    International Conference on Knowledge Management in Asia Pacific (pp. 1-8). New Zea-
    land: Victoria University of Wellington.
   [Cox et al., 1997] Kenneth C. Cox, Stephen G. Eick, Graham J. Wills, Ronald J. Brachman.
    Visual data mining: Recognizing telephone calling fraud. Data Mining and Knowledge
    Discovery 1997, Volume 1, Issue 2, pp 225-231.
   [Eberle et al, 2011] William Eberle - PhD, Jeffrey Graves. Insider Threat Detection Using a
    Graph-Based Approach, Journal of Applied Security Research, 6:32–81, 2011
   [Freeman, 2011] Linton C. Freeman. The development of social network analysis - with an
    emphasis on recent events. The SAGE Handbook of Social Network Analysis, SAGE Pub-
    lications Ltd.
   [Csárdi et al., 2006] Gábor Csárdi, Tamás Nepusz: The igraph software package for com-
    plex network research. InterJournal Complex Systems, 1695, 2006.
   [INFLOW, 2010] http://www.orgnet.com/cases.html
   [Luell, 2010] “Employee fraud detection under real world conditions,” Ph.D. dissertation,
    2010. [Online]. Available: http://www.zora.uzh.ch/44863/
   [Phua et al., 2004] Clifton Phua, Vincent Lee, Kate Smith & Ross Gayler. A comprehensive
    Survey of Data Mining-based Fraud Detection Research. Arxiv preprint arXiv: 1009.6119.
   [PSU, 2007] https://courseware.e-education.psu.edu/courses/bootcamp/lo09/08.html
   [PWC, 2014] PriceWaterCoopers. 2014 Global economic crime survey. La fraude continue
    à être une vraie menace pour les entreprises.
   [Scott et al., 2011] John Scott, Peter J. Carrington. The SAGE handbook of social network
    analysis. SAGE Publications Ltd.
   [Wheeler et al, 2000] Richard Wheeler, Stuart Aitken. Multiple algorithms for fraud detec-
    tion. Knowledge-Based Systems 13(3): 93-99.
   [Wieringa, 2009] Roel Wieringa. Design science as nested problem solving. Proceedings of
    the 4th International Conference on Design Science Research in Information Systems and
    Technology (DESRIST '09), Philadelphia, Pennsylvania, USA.
   [Yufeng et al., 2004] Yufeng Kou, Chang-Tien Lu, Sirirat Sirwongwattana, Yo-Ping
    Huang. Survey of fraud detection techniques. Proceedings of the 2004 IEEE. International
    Conference of Networking, Sensing & Control. Taipei, Taiwan, March 21-23, 2004.