<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Data Preparation and Enrichment Algorithm for Fraud Detection System</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Viktor Sahaidak</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kamila Storchak</string-name>
          <email>kpstorchak@ukr.net</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Viktoriia Zhebka</string-name>
          <email>v.zhebka@duikt.edu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vladimir Bilavka</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrii Bondarchuk</string-name>
          <email>a.bondarchuk@kubg.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Borys Grinchenko Kyiv Metropolitan University</institution>
          ,
          <addr-line>18/2, Bulvarno-Kudriavska St., 04053, Kyiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>State University of Information and Communication Technologies</institution>
          ,
          <addr-line>Solomyanska Street 7, 03110, Kyiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <fpage>198</fpage>
      <lpage>206</lpage>
      <abstract>
        <p>Following paper provides small overview of recent research results on fraud detection solutions, test environment, conditions that showed a lack of focus and testing on standardized data formats and following information delivery time. In paper data preparation and enrichment algorithm scheme, related tables, configurations, required fields for mapping between session detail record from VoLTE platform and billing data was described. Developed algorithm approbation results were provided and compared between case with only standardized data format (NRTRDE, TAP3) and case that combines NRTRDE, TAP3 data with enriched xDR prepared by algorithm. Conclusions were made, that following algorithm improved fraudulent traffic detection time, allows to upload and enrich data directly in Fraud detection system database, proved that data delivery time is vital KPI and should be considered during Fraud management system development phase and approbation before implementation on carrier network.</p>
      </abstract>
      <kwd-group>
        <kwd>Information technologies</kwd>
        <kwd>VoLTE</kwd>
        <kwd>Big data</kwd>
        <kwd>network security</kwd>
        <kwd>telecommunication fraud</kwd>
        <kwd>KPI</kwd>
        <kwd>ETL</kwd>
        <kwd>1</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        for transaction delivery time to partner network. This delivery time can vary not more than 4 hours
or 8 hours (in case of outages on partner network) for NRTRDE and not more than 30 days or not
more than 40 days (in case of outages on partner network) for TAP3 defined by GSMA [
        <xref ref-type="bibr" rid="ref7 ref8">8,9</xref>
        ]. In order
to improve detection, data preparation and enrichment algorithm was developed and tested on
environment approximate to real network.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Test environment and data flow description</title>
      <p>In order to simulate NRTRDE and TAP3 files preparation, Oracle Billing system was used. This
system has embedded feature to generate such files. For data transfer from one carrier network to
another carrier DCH providers are used of some other near real time file sharing service.</p>
      <p>
        Test environment (figure 1) consist of two OCS for carrier 1 and 2, Rhino VoLTE TAS Application
server, Rhino CDR ETL server and Fraud detection system (FMS). All elements are connected to one
and same NAS to deliver, prepare and upload files to FMS. OCS of carrier 1 provides price file to
calculate cost of subscriber session in Rhino Raw. OCS of carrier 2 generates NRTRDE and TAP3
files with time delays defined in their standards [
        <xref ref-type="bibr" rid="ref7 ref8">8,9</xref>
        ], delivers to NAS and FMS collects following
data from NAS. Rhino VoLTE TAS generate Rhino Raw in AVP format every 5 minutes, ETL server
convert Rhino Raw from AVP to .csv format (Rhino csv) and with help of Oracle ODI collect Rhino
Raw and Price file, calculate and enrich session cost, convert session time from UNIX to readable
format, load result to FMS DB.
      </p>
      <p>OOaCnrSadocRlfeecvBaerilrnliiuenreg2 OOaCnrSadocRfleecvaBerilrnliiuenreg1 sRehrviAneporpoVlifoccaLaTtirEorinTeArS1 bFaraseudd odnetOecraticolen RsyDsBteMmS
TAP3, NRTRDE</p>
      <p>Price</p>
      <p>Rhino Raw</p>
      <p>Price, Rhino csv
Rhino CDR</p>
      <p>ETL server
Price, Raw Rhino</p>
      <p>TAP3, NRTRDE</p>
      <p>NAS</p>
    </sec>
    <sec id="sec-3">
      <title>3. ETL server file preparation and DB structure</title>
      <p>Price files exported from OCS contain 8 fields (refer to table 1) for recognition of service name and
type, TADIG, MCC, MNC, IOI, minimum price and currency. Fields from Rhino CDR
OC-ServiceType, OC-Call-Type, OC-Access-Network-MCC-MNC, OC-Visited-Network-MCC-MNC,
OC-IMSIMCC-MNC, Inter-Operator-Identifier are used from Rhino csv to correlate Price file and session.</p>
      <p>
        To prepare Rhino csv file, ETL server have bash code configured to run periodically through cron
jobs. Rhine raw file is delivered by VoLTE in AVP (attribute value-pair) format and one AVP can
contain more than one child AVP. Rhino VoLTE AVP fields are similar to ETSI standard [
        <xref ref-type="bibr" rid="ref9">10</xref>
        ] but also
providing additional self-developed AVPs.
      </p>
      <p>Oracle ODI logical flow (figure 2) perform next actions:
1. Files are loaded from directories «/media/sf_fraud_detection/Price/in»
«/media/sf_fraud_detection/Rhino/in» to temporary tables C$_0EXP, C$_1CDR;
and
•
•
•
•
•</p>
      <p>Values from temporary tables are transformed into expressions (EXPRESSION_EXP,
EXPRESSION_CDR) for join operations (JOIN_MOC, JOIN_MTC);
Data from tables are joined by expression for JOIN_MTC
CDR.Service_name=EXP.Service_name and EXP.Service_type=CDR.Service_type and
CDR.Destination_network=EXP.IOI and JOIN_MOC
;
After join operation was done, results are loaded in table fraud.ur_ims_records. In this table
(to_date('1970-01-01
00:00:00','YYYY-MM-DD HH24:MI:SS') +
numtodsinterval((CDR.Start_call_time/1000),'SECOND')) , price for provided service is
calculated by - , session
(CDR.End_call_time-CDR.Start_call_time)/60000
After record is loaded in table, system periodically checks it and loads new CDRs in
application memory, where these records will be processed by fraud detection rules;
After ODI loaded records in table, original files that are located in
«/media/sf_fraud_detection/Rhino/in» moved automatically to directory
«/media/sf_fraud_detection/Rhino/done»;</p>
      <p>On physical level of described ODI scheme for data load from files to temporary table Loading
Knowledge Module (LKM) SQL to SQL (Built-in) Global is used. It is built-in ODI module for common
operations with relational databases.</p>
      <p>After records were loaded in database, Fraud detection software starts to process it and each
record goes through next lifecycle of tables located in RDBMS:
fraud.ur_ims_records&gt;fraud.pos_prealerts-&gt;total_alerted_cdrs/fraud.ims_alerted_cdrs-&gt;fraud.alerts-&gt;fraud.cases
This tables perform next functions:
fraud.ur_ims_records this table is used to store Rhino CDRs on FMS DB;
fraud.pos_prealerts this table is used by FMS to create temporary fraud alerts. It stores all
types of alerts that became complete or were discarded as duplicates;
fraud.ims_alerted_cdrs this table stores the CDRs that were used to create a fraud alert. By
default, the records in total_alerted_cdrs duplicate the information in fraud.ur_ims_records,
but will also provide the time when record was added into the table after alert was created
and alert ID related to record.
fraud.total_alerted_cdrs - stores information about all CDRs that were used to create an alert,
regardless of the source type (it combines CDRs from all types of data sources).
fraud.alerts stores information about alerts. This table is common for all data source types
and provides information about the alert creation time, number of CDR records, rule</p>
      <p>Mapping between table fraud.ur_ims_records in FMS, Rhino raw and price file, described in table
2. Depending on service type in record only one of two fields can be filled:
•
•</p>
      <p>Originated_imei can be filled in Rhino raw if service type is MOC</p>
      <p>Destination_imei can be filled in Rhino raw if service type is MTC.</p>
      <p>Fraud detection in FMS consists of Detection, Correlation and Case management phases (figure
3). Each phase uses next lookup tables:
•
•
•
•
•
•
fraud.sources stores information about data sources and their unique identifiers.
fraud.rule_sources stores information about rules belonging to data sources defined in
fraud.sources.
fraud.rules - this table stores information about rules created by users (unique identifier of
the rule, name of conditions used to detect suspicious traffic, and boundary conditions, which
include the monitoring period, number of suspicious sessions required to create a fraud case,
etc).
fraud.conditions this table contain information about conditions created by users to track
a certain type of traffic. Conditions can consist of any fields of the ur_*_records tables and
lists that use Boolean algebra conditions to combine or exclude a certain condition.
fraud.hotlist - stores information about existing lists created by users. It contains information
such as the list creation time, username, list name, and its unique identifier.
fraud.hotlist_values - this table stores the values of suspicious objects or identifiers of
unscrupulous carriers. Each value contains the time it was created and the time until which
this value must exist in the list to which it belongs.
Detection</p>
      <p>NRTRDE and TAP3 files are decoded by built-in ASN1_decoder of Fraud management system to
csv format and loaded in appropriate tables:
• fraud.ur_nrtrde_records this table is used to store decoded CDRs from NRTRDE on FMS</p>
      <p>DB;
• fraud.ur_ims_records this table is used to store decoded CDRs from TAP3 on FMS DB;
Processing flow for NRTRDE and TAP3 records by Fraud detection software is same:
•
fraud.ur_nrtrde_records-&gt;fraud.pos_prealerts</p>
      <p>&gt;total_alerted_cdrs/fraud.nrtrde_alerted_cdrs-&gt;fraud.alerts-&gt;fraud.cases;
•
fraud.ur_tap3_records-&gt;fraud.pos_prealerts-&gt;total_alerted_cdrs/fraud.tap3_alerted_cdrs&gt;fraud.alerts-&gt;fraud.cases;</p>
    </sec>
    <sec id="sec-4">
      <title>4. Rules for testing, number of records</title>
      <p>To detect fraudulent traffic several same detection rules were created for each type of data source
(refer to table 3). Following rules are targeted to detect long and small duration sessions that are
commonly recognized as fraudulent behavior.</p>
      <p>Traffic definition for monitoring
MOC, (terminating network
SWE01 or terminating network
bea.net), calling number is not
present in whitelist
MOC, (terminating network
SWE01 or terminating network
bea.net), calling number is not
present in whitelist
MOC, (terminating network
SWE01 or terminating network
bea.net), calling number is not
present in whitelist
MOC, (terminating network
SWE01 or terminating network
bea.net), calling number is present
in whitelist
MTC, (originating network
SWE01 or originating network
bea.net), called number is not
present in whitelist
MTC, (originating network
SWE01 or originating network
bea.net), called number is not
present in whitelist
MTC, (originating network
SWE01 or originating network
bea.net), called number is not
present in whitelist
MTC, (originating network
SWE01 or originating network
bea.net), called number is not
present in whitelist
duration&gt;=30 min
and duration&lt;=1800
min
duration&gt;20 sec
duration&gt;20 sec and
duration&lt;30 sec
duration&gt;20 sec and
duration&lt;30 sec
duration&gt;=30 min
and duration&lt;=1800
min
duration&gt;20 sec
duration&gt;20 sec and
duration&lt;30 sec
duration&gt;20 sec and
duration&lt;30 sec
count &gt;=3 calls
based on
originated
IMSI/MSISDN
count &gt;=10 calls
based on
originated
IMSI/MSISDN
sum duration
&gt;= 1 hours
based on
originated
IMSI/MSISDN
sum duration
&gt;= 1 hours
based on
terminated
IMSI/MSISDN
count &gt;=3 calls
based on
originated
IMSI/MSISDN
count &gt;=10 calls
based on
originated
IMSI/MSISDN
sum duration
&gt;=1 hours based
on originated
IMSI/MSISDN
sum duration
&gt;= 1 hours
based on
terminated
IMSI/MSISDN</p>
      <p>For traffic generation environment have 980 subscribers (490 virtual numbers for each carrier).
Number of sessions in network can vary from 7 to 8 thousand within 24 hours. Table 4 contain
number of files and session records generated within 1 day for each source type (Rhino CDR,
NRTRDE, TAP3).</p>
    </sec>
    <sec id="sec-5">
      <title>5. Algorithm approbation results</title>
      <p>
        The formulas defined in [
        <xref ref-type="bibr" rid="ref11">12</xref>
        ] will be used to calculate algorithm efficiency. For each record
recognized as fraud by FMS,   will be calculated. For better scaling each result is divided into
intervals of 10 minutes and weighted average value of   is calculated for NRTRE and TAP3.
      </p>
      <p>Table 5 contain summary for fraud detection based on NRTRDE and TAP3 files detected within
24 hours.</p>
      <p>fraudulent records for NRTRDE and TAP3 source types</p>
      <p>In percentage distribution 31% of alerts generated based on NRTRDE files were uploaded in
system within one hour, while 36% of alerts generated based on TAP3 files were uploaded in system
within 24 hours.</p>
      <p>Table 6 provide combined statistics with Rhino for NRTRDE and TAP3 data sources. In percentage
distribution 94% of alerts created based on combination of NRTRDE and Rhino CDR files were
uploaded in system within one hour, while 95% of alerts created based on combination of TAP3 and
Rhino CDR files that were uploaded in system within one hour.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>Comparing weighted average value of   for NRTRDE and TAP3, can be noticed that for
NRTRDE latency was reduced by 3,7 times and for TAP3 it was reduced by 14 times (table 7).
Approbation results of developed algorithm showed improvement in fraud detection speed, that
decrease reaction time from security on fraudulent traffic. Algorithm allows to directly upload,
transform time and calculate transaction cost of detail record in FMS database. Results proved that
data delivery time have significant impact on fraud detection systems and should be considered
during development and maintenance in test environment before system delivery on actual carrier
network.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V. A.</given-names>
            <surname>Lucas</surname>
          </string-name>
          ,
          <article-title>Authenticated caller ID plus regulatory changes, Document Presented to the FTC Robocall Challenge</article-title>
          , May
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] SoK: Fraud in Telephony Networks / M. Sahin et al.
          <source>2017 IEEE European Symposium on Security and Privacy (EuroS&amp;P)</source>
          , Paris, 26 28
          <string-name>
            <surname>April</surname>
          </string-name>
          <year>2017</year>
          .
          <year>2017</year>
          . doi:
          <volume>10</volume>
          .1109/eurosp.
          <year>2017</year>
          .
          <volume>40</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Macia-Fernandez</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia-Teodoro</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diaz-Verdejo</surname>
            <given-names>J</given-names>
          </string-name>
          .
          <article-title>Fraud in roaming scenarios: an overview</article-title>
          .
          <source>IEEE Wireless Communications</source>
          .
          <year>2009</year>
          . Vol.
          <volume>16</volume>
          , no. 6. P.
          <volume>88</volume>
          94. doi:
          <volume>10</volume>
          .1109/mwc.
          <year>2009</year>
          .
          <volume>5361183</volume>
          . 20,
          <string-name>
            <surname>Q4</surname>
          </string-name>
          <year>2022</year>
          . doi:
          <volume>10</volume>
          .31673/
          <fpage>2412</fpage>
          -
          <lpage>9070</lpage>
          .
          <year>2022</year>
          .
          <volume>061720</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Telecom</given-names>
            <surname>Fraud Detection via</surname>
          </string-name>
          Hawkes-enhanced Sequence Model / Y. Jiang et al.
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          .
          <year>2022</year>
          . P. 1. doi:
          <volume>10</volume>
          .1109/tkde.
          <year>2022</year>
          .
          <volume>3150803</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Djizanne</given-names>
            <surname>Toukem</surname>
          </string-name>
          <string-name>
            <surname>Joel</surname>
          </string-name>
          , Fotsa Jounda Boris, Manfouo Kennedy Armel, Tchouatcha Deumaga Michel, Yuhala Peterson Jr.
          <article-title>Dohbila des fraudeurs à la Simbox au Cameroun</article-title>
          , Department OF Computer engineering, National advanced school of engineering, University OF Yaounde I, Année academique 2017-
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Kehelwala</given-names>
            <surname>Gamaralalage Dasun Chamara Kehelwala</surname>
          </string-name>
          ,
          <article-title>REAL-TIME FRAUD DETECTION IN TELECOMMUNICATION NETWORK USING CALL PATTERN ANALYSIS, Dissertation submitted in partial fulfillment of the requirements for the degree Master of Science</article-title>
          , Department of Computer Science and Engineering, University of Moratuwa, Sri Lanka,
          <year>December 2017</year>
          . URL: https://dl.lib.uom.lk/items/d5f9f86a-2ab3
          <string-name>
            <surname>-</surname>
          </string-name>
          40d9
          <string-name>
            <surname>-</surname>
          </string-name>
          bfce-5dc203ae31fe.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>GSM</given-names>
            <surname>Association</surname>
          </string-name>
          , Official Document TD.
          <fpage>104</fpage>
          -
          <article-title>Use of TAP for the Single IMSI Wholesale Billing Interface, Version 1</article-title>
          .1,
          <issue>15</issue>
          <year>May 2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>GSM</given-names>
            <surname>Association</surname>
          </string-name>
          , Official Document TD.
          <fpage>106</fpage>
          -
          <article-title>Use of NRTRDE for the Single IMSI Fraud Interface</article-title>
          ,
          <source>Version 1.0</source>
          ,
          <issue>27</issue>
          <year>November 2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <source>[10] ETSI TS 132 299 V17.1</source>
          .
          <issue>0</issue>
          (
          <issue>2024</issue>
          -05),
          <article-title>Digital cellular telecommunications system (Phase 2+) (GSM); Universal Mobile Telecommunications System (UMTS); LTE; 5G; Telecommunication management; Charging management; Diameter charging applications (3GPP</article-title>
          <source>TS 32.299 version 17.1.0 Release</source>
          <volume>17</volume>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Richard</surname>
            <given-names>Johnson</given-names>
          </string-name>
          ,
          <article-title>Oracle Data Integrator Essentials: Definitive Reference for Developers and Engineers</article-title>
          , HiTeX Press, June 20,
          <year>2025</year>
          . ASIN:
          <fpage>B0FDYS4V1T</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Viktor</surname>
            <given-names>Sahaidak</given-names>
          </string-name>
          , «
          <article-title>Overview of fraud detection systems and performance KPI development»</article-title>
          . -283,
          <year>Q1 2024</year>
          . doi:
          <volume>10</volume>
          .28925/
          <fpage>2663</fpage>
          -
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>