Integrating Graph and Machine Learning for Fraud Detection Use Case

                                       Uri Lapidot1 and Jay Yu2,3
                 1Risk and Fraud Platform, Intuit, Petah Tikva, Israel, uri_lapidot@intuit.com
                      2Technology Futures, Intuit, San Diego, USA, jay_yu@intuit.com
                3Product and Innovation, TigerGraph, San Diego, USA, jay.yu@tigergraph.com


Abstract
The Risk and Fraud platform team at Intuit relies heavily on graph-based technologies to prevent fraud
at scale. One of the challenges we were facing was how to expand the limited capabilities of the
traditional ML approach to leverage rich semantics of accounts connected as a graph. In this paper, we
will share our approach to integrate graph and machine learning together in an end-to-end risk and
fraud platform, including practical solutions to overcome limitations in temporal support and adoption
by ML Data Scientists.

Keywords: Fraud Detection, Graph Database, Machine Learning, Artificial Intelligence


1. Introduction

Payment fraud prevention is one of Intuit’s top priorities in supporting the lifeline of our 6M small
businesses globally. As fraudsters devise increasingly sophisticated attacks that redirect money
flow by setting up fraudulent merchant accounts and faking business transaction activity, we find that
relying on traditional machine learning features alone is not sufficient to detect and stop fraudulent
activity. In this paper, we share our journey of building a graph-based risk and fraud system for fraud
detection, investigation and management, the insights we gained from building such a system, the
challenges we encountered, and practical solutions to overcome them.

2. Graph-based Features vs. Traditional ML Features

Traditional ML-based features in fraud detection use cases are usually drawn from relational datasets
associated with user accounts and interactions. These features can be classified into the following
categories:

    1.   Aggregations: sums or counts over feature data columns. E.g., count of transactions from the
         same device in X days
    2.   Ratios: proportion of fraudulent to legitimate transactions. E.g., count of bad transactions
         from the same device, divided by the count of legit transactions from the same device in X days
    3.   Raw: direct feature value comparison. E.g., geo-location mismatch between IP and Zip code
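The three categories above can be illustrated with a minimal sketch. The record layout, the field names (`device`, `ts`, `is_fraud`, `ip_zip`, `account_zip`) and the helper functions are assumptions made for illustration only; they are not the schema or code used in the platform.

```python
from datetime import datetime, timedelta

# Hypothetical transaction records; all field names are illustrative assumptions.
TRANSACTIONS = [
    {"device": "d1", "ts": datetime(2021, 3, 1), "is_fraud": False,
     "ip_zip": "92101", "account_zip": "92101"},
    {"device": "d1", "ts": datetime(2021, 3, 3), "is_fraud": True,
     "ip_zip": "10001", "account_zip": "92101"},
    {"device": "d1", "ts": datetime(2021, 3, 5), "is_fraud": False,
     "ip_zip": "92101", "account_zip": "92101"},
    {"device": "d1", "ts": datetime(2021, 1, 1), "is_fraud": False,
     "ip_zip": "92101", "account_zip": "92101"},
    {"device": "d2", "ts": datetime(2021, 3, 4), "is_fraud": False,
     "ip_zip": "94043", "account_zip": "94043"},
]


def device_features(transactions, device, now, window_days):
    """Aggregation and ratio features for one device over the last X days."""
    cutoff = now - timedelta(days=window_days)
    recent = [
        t for t in transactions
        if t["device"] == device and cutoff <= t["ts"] <= now
    ]
    bad = sum(1 for t in recent if t["is_fraud"])
    legit = len(recent) - bad
    return {
        # Aggregation: count of transactions from the same device in X days.
        "txn_count": len(recent),
        # Ratio: bad over legit transactions; undefined when there are no legit ones.
        "bad_to_legit_ratio": bad / legit if legit else None,
    }


def geo_mismatch(txn):
    """Raw feature: does the IP-derived zip code differ from the account zip code?"""
    return txn["ip_zip"] != txn["account_zip"]


features = device_features(TRANSACTIONS, "d1", datetime(2021, 3, 7), window_days=7)
```

With this toy data, device `d1` has three transactions inside the 7-day window, one of them fraudulent, so `features` holds a count of 3 and a ratio of 0.5. Note that each feature looks at one account or device in isolation.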

Graph-based features add the new dimension of connectivity between any two user accounts with
various degrees of hops on one or multiple paths. These linkages, connected over time, offer much more
intuitive, context-rich, and explainable insights that can be leveraged by machine learning models
directly to greatly increase the accuracy of the algorithms. Below is a simple example of how entities
are connected via multiple hops in the fraud graph.
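A minimal sketch of such multi-hop connectivity follows. The toy graph, the node identifiers and the helper names are hypothetical, and counting every edge traversal as one hop is an assumption made for this illustration.

```python
from collections import deque

# Hypothetical undirected fraud graph: accounts linked through shared entities.
EDGES = [
    ("account:A", "device:d1"),
    ("device:d1", "account:B"),
    ("account:B", "email:x@example.com"),
    ("email:x@example.com", "account:C"),
    ("account:D", "device:d9"),
]


def build_adjacency(edges):
    """Turn an edge list into an undirected adjacency map."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def hop_distance(adj, source, target):
    """Breadth-first search: fewest hops from source to target, or None if unconnected."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


ADJ = build_adjacency(EDGES)
a_to_c = hop_distance(ADJ, "account:A", "account:C")  # via shared device, then shared email
a_to_d = hop_distance(ADJ, "account:A", "account:D")  # no path in this toy graph
```

Here accounts A and C never interact directly, yet they are four hops apart through a shared device and a shared email address, a link that per-account relational features would not expose.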


Copyright © 2021 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0)
3. System Design, Challenges and Solutions

The end-to-end fraud detection and investigation platform is built with a graph database serving as the
centerpiece. By modeling and storing in the graph database up-to-date user/merchant account info and
their connected paths through contact and online access, we are able to generate on-demand graph-
based features to enrich our link analysis ML pipelines and models for Data Scientists. At the same
time, the underlying graph database is used to simplify and streamline fraud investigation and
management via intuitive graph visualization.


During the implementation of the end-to-end system, we encountered two major challenges:
    ● How to capture the evolving changes of the fraud and risk graph?
    ● How to allow data scientists to query graph data directly without having to learn a new query
        language?

To overcome the above challenges, we designed and implemented the following solutions:

    ●    Add a time dimension to all nodes/edges and adopt a hybrid strategy that connects the latest
         snapshot data in the graph with the historical data in relational databases.
    ●    Support the de facto service API standard (GraphQL) as a query language to simplify adoption.
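One way to picture the hybrid temporal strategy is sketched below. This is an assumption-laden illustration, not the platform's design: a dictionary stands in for the graph database holding the latest snapshot, an in-memory SQLite table stands in for the relational history store, and the names `GRAPH_NODES`, `node_history` and `status_as_of` are hypothetical.

```python
import sqlite3

# Latest snapshot "in the graph" (a dict stands in for the graph database).
# Each node carries a time dimension: when its current state took effect.
GRAPH_NODES = {
    "account:A": {"status": "closed", "updated_at": "2021-03-10"},
}


def open_history():
    """Create a stand-in relational store holding superseded node states."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE node_history "
        "(node_id TEXT, status TEXT, valid_from TEXT, valid_to TEXT)"
    )
    db.execute(
        "INSERT INTO node_history VALUES (?, ?, ?, ?)",
        ("account:A", "open", "2020-06-01", "2021-03-10"),
    )
    return db


def status_as_of(db, node_id, as_of):
    """Return a node's status at an ISO date, or None if it did not exist yet.

    The graph snapshot answers recent dates; older dates fall back to history.
    ISO-8601 date strings compare correctly as plain strings.
    """
    node = GRAPH_NODES.get(node_id)
    if node is not None and as_of >= node["updated_at"]:
        return node["status"]
    row = db.execute(
        "SELECT status FROM node_history "
        "WHERE node_id = ? AND valid_from <= ? AND ? < valid_to",
        (node_id, as_of, as_of),
    ).fetchone()
    return row[0] if row else None


history = open_history()
current = status_as_of(history, "account:A", "2021-03-15")  # served by the snapshot
earlier = status_as_of(history, "account:A", "2021-01-01")  # served by the history table
```

In this sketch the snapshot reports the account as closed, while the history table shows it was still open in January 2021, so a feature computed for an older event can use the state that was valid at that time.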

4. Integrating Graph Features into ML Models

Our graph of business accounts, linked to each other via interactions through shared devices, emails or
direct financial transactions, is an optimal representation of entities and relationships, defined
by human experts to reflect the real world naturally and intuitively. In addition to
applying graph algorithms to perform unsupervised machine learning directly on graph data without a
separate ML pipeline, we explored practical ways to leverage deep insights in the graph to greatly
enhance our fraud detection machine learning models.
One such insight is the “number of linked closed accounts (related to fraud) in 6 hops”. This
graph-based feature gives an intuitive, deeper understanding of the level of risk for the account under
review. When graph-based features like this are combined with regular non-graph-based features and
fed into a supervised learning process, the resulting model automatically combines the human domain
knowledge encoded in the graph with the statistical power of machine learning, which dramatically
increases the effectiveness of the resulting model.
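A feature such as the number of linked closed accounts in 6 hops could be computed along the following lines. This is a minimal sketch under stated assumptions: the toy adjacency map, the set of closed accounts, the function names and the non-graph feature values are all hypothetical, and each edge traversal is counted as one hop.

```python
from collections import deque

# Hypothetical graph: accounts chained through shared devices and emails.
ADJACENCY = {
    "account:A": {"device:d1"},
    "device:d1": {"account:A", "account:B"},
    "account:B": {"device:d1", "email:e1"},
    "email:e1": {"account:B", "account:C"},
    "account:C": {"email:e1", "device:d2"},
    "device:d2": {"account:C", "account:E"},
    "account:E": {"device:d2", "device:d3"},
    "device:d3": {"account:E", "account:F"},
    "account:F": {"device:d3"},
}

# Hypothetical set of accounts closed for fraud-related reasons.
CLOSED_FOR_FRAUD = {"account:B", "account:E", "account:F"}


def closed_accounts_within(adj, closed, start, max_hops):
    """Count closed accounts reachable from start in at most max_hops hops."""
    seen = {start}
    queue = deque([(start, 0)])
    count = 0
    while queue:
        node, dist = queue.popleft()
        if node != start and node in closed:
            count += 1
        if dist == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return count


def build_feature_row(non_graph_features, adj, closed, account, max_hops=6):
    """Merge regular features with the graph feature into one training row."""
    row = dict(non_graph_features)
    row["closed_accounts_in_hops"] = closed_accounts_within(adj, closed, account, max_hops)
    return row


row = build_feature_row(
    {"txn_count_7d": 3, "geo_mismatch": 1},  # illustrative non-graph features
    ADJACENCY, CLOSED_FOR_FRAUD, "account:A",
)
```

In this toy graph, accounts B and E lie within 6 hops of account A while F is 8 hops away, so the graph feature is 2. The merged row can then be passed to any supervised learner alongside the non-graph columns.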


5. Results and Summary

By taking a graph-based approach that integrates seamlessly with machine learning, we were able to
improve recall by 50% and precision by 50% for the fraud prediction ML model. In addition, one
graph feature became the second most important feature in our fraud detection model.


This end-to-end risk and fraud platform, built upon the integration of graph and ML, proved to be a huge
success in production, becoming the backbone of the fight against payment fraud across Intuit’s
fast-growing small business payment, capital, and cash capabilities.