<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Continuous Query Engine to Detect Anomalous ATM Transactions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fernando Martín-Canfrán</string-name>
          <email>fernando.martin.canfran@estudiantat.upc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Benedí</string-name>
          <email>daniel.benedi@estudiantat.upc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amalia Duch</string-name>
          <email>duch@cs.upc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Edelmira Pasarella</string-name>
          <email>edelmira@cs.upc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universitat Politècnica de Catalunya, Barcelona Tech</institution>
          ,
          <addr-line>Barcelona</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
        <p>Nowadays data are in motion, change continuously and are -possibly- unbounded implying data sources that are also constantly evolving. From the data persistence point of view this reality breaks the usual paradigm of having dynamic but stable data sources. This, together with the increasing number applications based on data streams for taking critical decisions in real time, raises the need for re-thinking both the data and the query models to fit these new requirements. Therefore, under these circumstances, it seems reasonable that a suitable data model is a continuously evolving data graph. In this work we tackled the problem of querying continuously evolving data graphs in the context of ATM transactions, in particular anomalous ones. Under this context, evaluating continuous queries corresponds to recognize patterns usually associated with anomalous behaviors in the volatile subgraph of ATM transactions. We propose an evaluation process based on the dynamic pipeline computational model, a stream processing technique allowing the emission of alerts as soon as anomalous patterns are identified. Stream based Bank applications that monitor ATM transactions are direct beneficiaries of our proposal since they can continuously query data graphs to get “fresh” data as are produced, avoiding the computational overhead of having to discard non-valid data.</p>
      </abstract>
      <kwd-group>
        <kwd>transactions</kwd>
        <kwd>continuous query evaluation</kwd>
        <kwd>property graphs</kwd>
        <kwd>dynamic pipeline approach</kwd>
        <kwd>stream processing</kwd>
        <kwd>ATM</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Although from a classical point of view databases are thought of for persistent data, nowadays
this perspective is changing since data are in motion, continuously changing and (possibly)
unbounded. So, the following questions arise: (i) What is the proper data model? and (ii) What
is the proper query model?
Regarding the data model, the new nature of data requires a de facto new database paradigm
-continuously evolving databases- where data can be both stable and volatile. Even though
evolving databases can be implemented according to any approach, graph databases seem
especially well suited here [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. Indeed, the natural way to process evolving graphs as streams
of edges gives insights on how to proceed in order to maintain dynamic graph databases. Hence,
we consider that a suitable data model is a continuously evolving data graph, a graph having
persistent (stable) as well as non persistent (volatile) relations. Stable relations correspond
to edges occurring in standard graph databases while volatile relations are edges arriving in
(a) Part of a schema of a PG
(b) Pattern of anomalous transactions
data streams during a set time interval. Once this time interval is over, the relations are not
longer valid so that there is no need to store them in the (stable) graph database. However,
when required -as for further legal or auditing purposes- timestamped occurrences of volatile
relations can be kept in a log file. Volatile relations induce subgraphs that exist only while the
relations are still valid. Without loss of generality, in this work we consider property graphs
(PG) [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ] as the basic reference data model. As an example, Figure 1a depicts part of a schema
of a PG database where stable relations correspond to the data that a bank typically gathers
on its issued cards, ATMs (Automated Teller Machines) network, etc. Volatile relations model
the interaction between cards and ATM entities. Concerning the query model, fixed queries
evaluated over data streams are known as continuous queries [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ]. Thus, instead of classical
query evaluation processes we envision incremental/progressive query evaluation processes. A
query on a PG database can be seen as a PG graph pattern with constraints over some of its
properties. Evaluating such a query consists on identify if there is a subgraph of the database
that matches the given pattern and satisfies its constraints. The problem of progressively
identify and enumerate bitriangles (i.e. a specific graph pattern) in bipartite evolving graphs
using the Dynamic Pipeline Approach [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] have been successfully solved by Royo-Sales [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. We
claim that the problem of evaluating continuous queries over continuously evolving PGs belongs
to the same family of problems and hence, we propose to address it using the same stream
processing approach. However, in this case, in addition to identify the query pattern, the
constraint satisfaction over properties must be checked also. Figure 1b shows a constrained
graph pattern corresponding to a continuous query. In this work, as a proof of concept, we
tackled the problem of evaluating continuous queries corresponding to anomalous patterns
of ATM transactions against a continuously evolving PG representing a bank database. To
be concrete, the anomalous patterns of ATM transactions are identified in the volatile (PG)
subgraph of the considered database. The evaluation process is based on the dynamic pipeline
computational model and emits answers (alarms) as soon as anomalous patterns are identified.
Additionally, a log of all the volatile relations of the PG is maintained. Figure 2 illustrates a
possible anomalous situation associated to this query.
      </p>
      <p>
        Related work. Recently, in Rost et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] authors formalize an extension to the query
language Cypher that allows for evaluating continuous queries over property graph streams
according to a specified frequency, during a defined time interval. The main diferences of their
approach with our proposal are: first, the data model because they merge incoming PGs to the
persistent database and, second, we evaluate continuous queries as new edges are added to the
volatile part of the property graph (i.e. as the PG evolves) by processing (identifying) patterns
(subgraphs). To our knowledge there is no other work approaching the problem of modeling
and implementing a continuous query engine for detecting anomalous ATM transactions. The
related work that we have found is mainly based on using machine learning, big data and data
visualization approaches[
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13 ref14">10, 11, 12, 13, 14</xref>
        ].
      </p>
      <p>Contribution. The main contribution of this work is to provide, using a stream processing
approach, a general technique for addressing the problem of continuous query evaluation
against an evolving graph database by decomposing the datagraph into volatile and stable
well defined subgraphs. Among the advantages of using the dynamic pipeline computational
model are its parallel/concurrent nature and its suitability for developing real-time systems that
emit results as they are computed, in a progressive way. In regards to detecting abnormal or
suspicious ATM transaction, to our knowledge, most of the work addressing this topic provide
a delayed detection based on predictions given by ML systems. Also, it is frequent the classical
treatment of the problem by consulting log files because of the complaint of customers when
detecting by themselves some weird movement in their accounts. This involves annoying
processes for customers in order to have their money back. The idea is that, in presence of
some weird finding in an ATM transaction, banks have a tool able to either ask card holders for
authorizations or to take any other fraud preventing action at real-time.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Challenges for achieving a real implementation</title>
      <p>Defining and implementing a continuous query engine requires to address many diferent
problems, each of diferent nature. In addition, the proof of concept we intend to provide has
itself its own complications.</p>
      <sec id="sec-3-1">
        <title>2.1. Defining anomalous patterns of transactions</title>
        <p>
          It is not trivial to establish what is and in which circumstances a transaction can be considered
anomalous. Based on a work that have addressed this characterization [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] we intend to find a
proper characterization and then define the graph patterns associated to these anomalies. The
exact topology of an anomaly will depend on its own nature. Figure 1b depicts an example
characterizing a possible card cloning, among many other possibilities. For instance, using a
(stolen) card many times over a period of time at diferent ATMs to withdraw small amounts.
In this latter case, there will arrive to the evolving PG many volatile (interaction) edges having
the same card vertex and diferent ATM vertices. There could also be patterns related with
frequent/very high expenses; transactions located in an ATM out of the threshold distance of
the usual/registered address of the card holder and so on. Moreover, definition of patterns can
be beyond ATM transactions by considering online card transactions.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>2.2. Modeling and implementing the continuous query engine</title>
        <p>
          To define a proper architecture for a continuous query engine is one of the most challenging
activities of our work. Among other tasks, this comprises: to define a graph-based query
language expressive enough to allow for capturing the diferent patterns representing
anomalous queries; to establish the algorithms for identifying the patterns associated to anomalous
queries; to choose and manage the right windowing approach and other features related to
distributed query-evaluation; to deal with the evaluation of many continuous queries
simultaneously; to evaluate the suitability of the implementation language, tools and the proper
system configuration. Figure 3 depicts a preliminary architecture for a continuous query engine
for detecting anomalous ATM transactions, DPATM. We propose a solution that follows the
Dynamic Computational Model [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Briefly speaking, in this approach, stages are processes
that execute tasks concurrently/in-parallel. The multiset underlying the input data stream is
partitioned [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] and distributed along filters according to a grouping relationship, usually based
on filters’ parameters. Each filter applies the same function to its block of data (stored as its
state). Accordingly, the DPATM algorithm is specified as follows: During a pre-defined time
interval window, when an interaction e (together its properties’ values) arrives to the DPATM,
the stage Sr register it into a standard transactional log file. Then, Sr passes e to the next stage.
If there exists a filter parameterized with the value of the property number of the Card vertex
that is incident to e, this filter keeps e in its state. In this way, filters’ states store subgraphs
induced by the edges in the volatile subgraph. Notice that these sets of edges in each filter
correspond to blocks of the (multiset) input data stream. Otherwise, the filter passes e to the
next stage. The task/function of each filter is to decide if there is a match with (some of) the
continuous query pattern(s) evaluated by the engine DPATM by means of the graph that it stores
and the information retrieved from the stable PG to identify patterns and solve constraints.
This is, indeed, the way to evaluate continuous queries. In case of matching a pattern, filters
emit an alert reporting the finding. Hence, answers are the detected anomalies and they are
emitted as they are obtained in filters. When answers arrive to Sk, this stage post-process and
output them. In addition, Sk maintains an answer log file. The fact that an interaction arrives
to GF means that there were not previous interactions having the same value of Card property
number and thus, a new filter parameterized with this new value is spawned. When the time
interval window is over, the DPATM is, in some sense, reset according to the given window
policy. Note that the window policy must take into account stored data that might be valid in
between two windows and handle the transition properly.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Discussion on Ongoing Work and Outlook</title>
      <p>The main luck stone, in order to be able to provide the proof of concept that we propose,
is to have a proper dataset. Given the confidential and private nature of bank data, it has
been impossible to find a real dataset for our experiments. In this regard, we are currently
constructing a synthetic property graph bank dataset based on the Wisabi Bank Dataset1. In
fact, we consider that our synthetic dataset will be an important contribution of this work.</p>
      <p>A prototype implementation of the DPATM for detecting the anomalous ATM transaction
pattern presented in Figure 1b has been developed using Go language. For testing this
implementation2 we have constructed a small stable bank PG using Neo4j graph database management
system. We plan to extend the prototype including window management and multiple
continuous query evaluation. Once we had generated a big enough stable bank property graph, to be
able to conduct experiments to assess our proposal, we need to tackle two important issues: (i)
to find appropriate baselines to compare our solution and (ii) to establish what are the proper
statistic frequency distributions for including anomalous ATM transactions in the input streams
of volatile relations.
1https://www.kaggle.com/datasets/obinnaiheanachor/wisabi-bank-dataset
2https://github.com/FCanfran/ATM-DP
This work has been supported by MCIN/AEI/10.13039/501100011033 under grant
PID2020112581GB-C21.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Angles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <article-title>Survey of graph database models</article-title>
          ,
          <source>ACM Computing Surveys (CSUR) 40</source>
          (
          <year>2008</year>
          )
          <fpage>1</fpage>
          -
          <lpage>39</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          <string-name>
            <surname>Kaliyar</surname>
          </string-name>
          ,
          <article-title>Graph databases: A survey</article-title>
          , in: International Conference on Computing, Communication &amp; Automation,
          <string-name>
            <surname>IEEE</surname>
          </string-name>
          ,
          <year>2015</year>
          , pp.
          <fpage>785</fpage>
          -
          <lpage>790</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Angles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Arenas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Barceló</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Reutter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Vrgoč</surname>
          </string-name>
          ,
          <article-title>Foundations of modern query languages for graph databases</article-title>
          ,
          <source>ACM Computing Surveys (CSUR) 50</source>
          (
          <year>2017</year>
          )
          <fpage>1</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Angles</surname>
          </string-name>
          ,
          <article-title>The property graph database model</article-title>
          ,
          <source>in: CEUR Workshop Proceedings, CEURWS.org (AMW2018)</source>
          , volume
          <volume>2100</volume>
          ,
          <year>2018</year>
          . URL: https://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>2100</volume>
          /paper26.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Babu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Widom</surname>
          </string-name>
          ,
          <article-title>Continuous queries over data streams</article-title>
          ,
          <source>ACM Sigmod Record</source>
          <volume>30</volume>
          (
          <year>2001</year>
          )
          <fpage>109</fpage>
          -
          <lpage>120</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zaniolo</surname>
          </string-name>
          ,
          <article-title>Logical foundations of continuous query languages for data streams</article-title>
          ,
          <source>in: International Datalog 2.0 Workshop</source>
          , Springer,
          <year>2012</year>
          , pp.
          <fpage>177</fpage>
          -
          <lpage>189</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Pasarella</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.-E. Vidal</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Zoltan</surname>
            ,
            <given-names>J. P.</given-names>
          </string-name>
          <string-name>
            <surname>Sales</surname>
          </string-name>
          ,
          <article-title>A computational framework based on the dynamic pipeline approach</article-title>
          ,
          <source>Journal of Logical and Algebraic Methods in Programming</source>
          <volume>139</volume>
          (
          <year>2024</year>
          )
          <fpage>100966</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Royo-Sales</surname>
          </string-name>
          ,
          <article-title>An algorithm for incrementally enumerating bitriangles in large bipartite networks</article-title>
          (
          <source>master thesis)</source>
          ,
          <year>2021</year>
          . URL: http://hdl.handle.net/2117/361615.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>Rost</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tommasini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bonifati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. Della</given-names>
            <surname>Valle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Rahm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. W.</given-names>
            <surname>Hare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Plantikow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Selmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Voigt</surname>
          </string-name>
          , Seraph:
          <article-title>Continuous queries on property graph streams</article-title>
          ,
          <source>in: EDBT</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>234</fpage>
          -
          <lpage>247</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Mahmood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Islam</surname>
          </string-name>
          ,
          <article-title>A survey of anomaly detection techniques in ifnancial domain</article-title>
          ,
          <source>Future Generation Computer Systems</source>
          <volume>55</volume>
          (
          <year>2016</year>
          )
          <fpage>278</fpage>
          -
          <lpage>288</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Heryadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Wulandhari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. S.</given-names>
            <surname>Abbas</surname>
          </string-name>
          , et al.,
          <article-title>Recognizing debit card fraud transaction using chaid and k-nearest neighbor: Indonesian bank case</article-title>
          ,
          <source>in: 2016 11th International Conference on Knowledge, Information and Creativity Support Systems (KICSS)</source>
          , IEEE,
          <year>2016</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>K.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Best</surname>
          </string-name>
          ,
          <article-title>Anti-money laundering: Using data visualization to identify suspicious activity</article-title>
          ,
          <source>International Journal of Accounting Information Systems</source>
          <volume>34</volume>
          (
          <year>2019</year>
          )
          <fpage>100418</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>M. M. Rahman</surname>
            ,
            <given-names>A. R.</given-names>
          </string-name>
          <string-name>
            <surname>Saha</surname>
          </string-name>
          , et al.,
          <article-title>A comparative study and performance analysis of atm card fraud detection techniques</article-title>
          ,
          <source>Journal of Information Security</source>
          <volume>10</volume>
          (
          <year>2019</year>
          )
          <fpage>188</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. S.</given-names>
            <surname>Obaid</surname>
          </string-name>
          ,
          <article-title>Detection of fraud in banking transactions using big data clustering technique customer behavior indicators</article-title>
          ,
          <source>Journal of applied research on industrial engineering 9</source>
          (
          <year>2022</year>
          )
          <fpage>264</fpage>
          -
          <lpage>273</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>F.</given-names>
            <surname>Magdalena-Laorden</surname>
          </string-name>
          ,
          <article-title>Artificial intelligence and ontology applied to credit card fraud detection (bachelor thesis</article-title>
          ),
          <year>2021</year>
          . URL: https://oa.upm.es/69050/.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Bender</surname>
          </string-name>
          , Partitions of multisets,
          <source>Discrete Mathematics 9</source>
          (
          <year>1974</year>
          )
          <fpage>301</fpage>
          -
          <lpage>311</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>