<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Machine Learning Methods for Detecting Fraud in Online Marketplaces</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Raoul Dekou</string-name>
          <email>rdekou@team.mobile.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sabljic Savo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simon Kufeld</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Diana Francesca</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ricardo Kawase</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Codecentric AG</institution>
          ,
          <addr-line>Hochstraße 11, 42697, Solingen</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>In: RWTH Aachen, CEUR-WS: Proceedings of The 2021 International Workshop on Privacy, Security, and Trust in Computational Intelligence</institution>
          ,
          <addr-line>Gold Coast, Queensland</addr-line>
          ,
          <country country="AU">Australia</country>
          ,
          <addr-line>01-11-2021, published at</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Inovex GmbH</institution>
          ,
          <addr-line>Ludwig-Erhard-Allee 6, 76131, Karlsruhe</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Mobile.de</institution>
          ,
          <addr-line>Marktplatz 1, Europarc Dreilinden, 14532, Berlin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p />
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Abstract</title>
      <p>Connecting buyers and sellers in a safe and secure environment is one of the biggest challenges in online marketplaces. Probabilistic models built upon user-item databases address the challenge, but often encounter issues such as a lack of stability and robustness. These issues are magnified in fraud scenarios, where datasets are highly imbalanced and noisy and malicious users deliberately adapt their behaviour to avoid detection. In this context, we leveraged the power of the existing open-source machine learning libraries H2O and Catboost and designed a pipeline to collect, process and predict the likelihood that a private seller's listing data is fraudulent. We found that a stacked ensemble model provides the best performance (F1=0.73) when compared to other models commonly used in the field. Further, our models are benchmarked on a public Kaggle dataset, the TalkingData AdTracking Fraud Detection Challenge, where we compared them to other studies and highlighted their generalizability and effectiveness at handling online fraud.</p>
      <p>Copyright © by the paper's authors. Use permitted under
Creative Commons License Attribution 4.0 International (CC BY
4.0).</p>
    </sec>
    <sec id="sec-2">
      <title>Introduction</title>
      <p>
        As reported in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], retail e-commerce sales worldwide accounted for 1.86 trillion USD in 2016 and are expected to rise to 4.48 trillion USD in 2021. In the meantime, a recent report on fraud-attack trends in the first quarter of 2021 confirmed the shift of attacks towards retail websites and estimated that 25% of this traffic is malicious. Such an increase in activity has put considerable pressure on marketplaces, which need to ensure the reliability and security of their services while inspiring trust in buyers.
      </p>
      <p>Unfortunately, the success of online marketplaces attracts unwanted attention from malicious users who try to abuse the platforms for personal monetary gain. mobile.de does not control transactions between buyers and sellers; it is a "matchmaking" platform that bridges the gap between the two sets of entities. Once a user with malicious intent creates an account, he or she also creates an attractive vehicle listing (the goal is to get as many leads as possible). To achieve this, fraudsters take a series of lead-boosting steps. They upload listings of high-demand vehicles to the platform and set very low yet plausible prices for the vehicles. Since every aspect of the listing looks legitimate (the website, the seller and the vehicle), buyers lower their guard and contact the fraudster. Through a series of interactions, the fraudster is able to convince the buyer (now a victim) to send a pre-payment money transfer, usually as a "reservation" fee. Once this happens, and the damage is done, the victims realize their mistake, contact mobile.de's Customer Service and report the case. Very few cases reach this point; however, the total monthly loss can soar to thousands of Euros. (The report on fraud-attack trends for Q1 2021 is available at https://securityboulevard.com/2021/07/top-industry-specific-fraud-attack-trends-from-q1-2021/, accessed July 2021.)</p>
      <p>Satisfied customers (buyers and sellers) are the foundation of a valuable and successful marketplace. Thus, providing a secure environment and a safe experience to our customers is a top priority at mobile.de, and it is the motivation of this work, which aims at preventing and detecting fraudulent activity. To achieve our goals, we tackled the fraud detection problem by leveraging user-generated data and building machine learning models which are able to identify fraudulent activities. It is also essential to design robust models of high precision which generalise well. This paper describes our approach to mitigating fraudulent activity by fraudsters posing as private sellers. Our contribution is twofold. First, we describe a production pipeline to collect, process and score sellers' listings using the open-source machine learning libraries Catboost and H2O. We briefly highlight how to efficiently use these libraries to pre-select relevant candidate models and tune their hyper-parameters. Second, we demonstrate that our approach could potentially inspire other use cases by verifying our detection methods on a sample of a large dataset publicly available at Kaggle.com.</p>
      <p>The remainder of this paper is structured as follows. In Section 2, we discuss existing work in the field. In Section 3, we provide a deeper understanding of the problem and formalize it. In Sections 4 and 5, we describe our methodology to tackle the problem. Section 6 contains our results, followed by the conclusion and prospects.</p>
    </sec>
    <sec id="sec-3">
      <title>Related Work</title>
      <p>
        Techniques used to detect fraud can be divided into two groups: expertise-based and data-driven. In the first, experts use their knowledge to build a set of rules that are tested and refined to filter out fraudulent activities. However, contrary to machine learning solutions, traditional expert techniques sometimes lack the ability to model non-trivial online connections [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. The second set of techniques, data-driven, i.e. machine learning solutions, overcomes this issue but yields different challenges. While the increase of activity in marketplaces generates massive datasets which require model scalability, the low occurrence of fraudulent events produces imbalanced datasets. Maintaining both a high precision and a high recall is often a challenge, and many models produce significant misclassification errors [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] which result in genuine customers being flagged as fraudulent. Finally, there is also the need for dynamic solutions, given that fraudsters adapt their behaviour to the point where they are able to bypass detection by machine learning models.
      </p>
      <p>
        Literature offers various examples of machine learning methods applied to fraud detection. Najem and Kadeem [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] provide, in their recent survey of fraud detection techniques in e-commerce, a broad view of the performance of several models on various datasets. The survey highlights that Random Forest (RF) is the most used and usually the most accurate method. Though Naive Bayes algorithms are easy to implement, they are limited compared to decision trees when it comes to modelling non-linear problems. This information was taken into consideration when selecting candidate models for our pipeline, which consists essentially of decision-tree ensembles (RF, Xgboost and Catboost). For instance, Kanei et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] trained a Random Forest model to detect fraudulent ad requests. In their study, they demonstrated that the model-robustness challenge can be addressed by means of features which cannot be controlled by fraudsters, such as network statistics from clients and publishers. This setup allowed them to improve their recall rate by 10%. Renjith [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] described a pipeline using a Support Vector Machine (SVM) to detect fraudulent sellers in an online marketplace. The author specifically pointed out that a cold-start problem may arise for new users when using predictive models with seller or transaction information as features. In our approach, the cold-start effect was mitigated by removing these types of features. Gupta et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] benchmarked ensemble models on a publicly available Kaggle dataset for predicting the likelihood that a click on a mobile-phone advertisement is fraudulent. They tested two configurations: traditional and Big Data. In the traditional configuration, they combined different sampling techniques (SMOTE, stratified sampling, etc.) to reduce the data size and handle the imbalanced training set. This dataset, which has been widely used in previous studies [
        <xref ref-type="bibr" rid="ref14 ref22 ref8">8, 14, 22</xref>
        ], is employed in our study, and the results from Gupta et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] are used as our baseline. In our work, we applied the same preprocessing techniques and compared our results to their best model, the Two-Class Decision Forest (https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-module-reference/two-class-decision-forest, accessed July 2021), which has an F1 score of 0.944. Using a sample of the same dataset, Minastireanu and Mesnita [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] trained a Lightgbm model to detect fraudulent clicks and reported an accuracy of 98%. The authors specifically described how feature engineering on the original feature set (click time, device, channel, etc.) and K-fold cross-validation are combined to achieve high performance. Besides, by testing their model on a large data sample (18 million user clicks), they demonstrated the robustness of the boosting machine for this case study. In the same context, Mohammed et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] investigated the scalability of Random Forest, Balanced Bagging Ensemble and Gaussian Naive Bayes on massive and highly imbalanced credit-card fraud datasets. They found that random undersampling is effective at handling imbalanced datasets and, combined with RF, is suitable for real-time applications on large datasets. In their study, the Random Forest model provided the highest recall, of 91%. Rajora et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] benchmarked the performance of various machine learning algorithms on a credit-card transaction dataset with 31 attributes. They used random undersampling to address the data imbalance and Principal Component Analysis (PCA) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] as a dimensionality-reduction technique. On top of the PCA features, a time feature corresponding to the delay since the first transaction is part of the training set. Furthermore, the authors illustrated how the inclusion of this feature can impact performance: RF provided better performance without the time feature, while Gradient Boosting Regression Tree performance was constant. Meng et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] also used a real-world credit-card transaction dataset and combined Xgboost with sampling techniques to achieve strong performance. The SMOTE technique increased the recall from 0.8062 to 0.9 and the AUC from 0.9795 to 0.9853. Mohammed et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] reported that Neural Networks tend to overfit on fraud datasets and struggle to handle imbalanced data. Nevertheless, as illustrated by Adewumi and Akinyelu [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] in their survey, such techniques are also commonly used for credit-card fraud detection. Najem and Kadeem [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] pointed out that hybrid methods, which combine several methods to build a robust learner, provide better performance than individual learners. For example, Wang et al. [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] built a hybrid mixed model consisting of Xgboost and Logistic Regression (LR) and benchmarked it against common baseline models such as Xgboost, RF, SVM, Naive Bayes and Logistic Regression on the German Credit dataset published by UCI (https://archive.ics.uci.edu/ml/index.php, accessed July 2021). In the hybrid model, an effective feature combination was obtained by using Xgboost leaf nodes as features for the LR model. This setup provided an AUC of 0.8321, far beyond the 0.7321 obtained with LR, the best individual model. Other studies such as [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]
and [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] use meta-learning techniques to enhance the performance on credit-card fraud datasets. However, combining the outputs of different classifiers to build a model reduces the classification speed [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] which might be an issue on big datasets.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Problem statement</title>
      <p>mobile.de supports two different types of sellers, namely dealers and private sellers. Dealers are registered dealerships in Germany and neighbouring countries who are paying customers of mobile.de; these are professional sellers who make a living out of buying and selling vehicles. Private sellers are regular citizens who own a vehicle and use a classifieds market to sell it (they are not registered as a business). Internally at mobile.de, a private seller is labelled an FSBO (For Sale By Owner), and for the rest of this paper we will refer to private sellers with the same terminology. Although there are several malicious activities which can be classified as fraud, such as account takeover, falsification of documents, etc., our objective in this study is focused on a single type of user (FSBOs) that creates fraudulent (fake) listings. Our pipeline overview is depicted in Figure 1. When a listing is created (or updated), our machine learning models generate a fraud-probability prediction and, in case the result is above a certain threshold, the listing is manually evaluated by a Customer Service (CS) agent, who reviews the content of the listing and assigns a rating (ground truth). In addition to listings flagged by our ML models, Customer Service agents extend their reviewing process to listings which might have received users' complaints. Eventually, one way or another, every fraudulent listing is flagged in our dataset, the vast majority before damage is done, and in very few cases via reports from scam victims. The main classification task is binary in the sense that the target variable to predict has two possible outcomes: OK or FRAUD. The goal is to detect when a vehicle listing is (or becomes) fraudulent. This can happen at insertion time (version 1 of the listing) or at any later time due to a modification of the data.</p>
    </sec>
    <sec id="sec-5">
      <title>Datasets</title>
      <p>In this study, we used two different datasets to train and test our machine learning models: the mobile.de in-house dataset and a tailored sample of the TalkingData AdTracking Fraud Detection Challenge dataset obtained from the machine learning competition platform Kaggle.</p>
      <p>At mobile.de, FRAUD cases (positive cases) are far less frequent than OK cases, leading to a highly imbalanced dataset. The in-house dataset consists of 27 categorical variables and 10 continuous ones. To maintain the confidentiality of our data points, and to eliminate the risk of giving any clues that could reveal how to bypass our fraud detection models, we refrain from disclosing the exact names of the attributes and features.</p>
      <p>The public dataset comes from China's largest independent big-data service platform, which covers 70% of active mobile devices in the country and handles 3 billion clicks per day, of which 90% are potentially fraudulent. Contrary to the mobile.de case, here click fraud is the most frequent class (the negative class); it occurs when a person or an automated bot posing as a legitimate user clicks on an app ad without downloading the app afterwards. The raw dataset contains 200 million clicks over a four-day period. It includes 7 data fields (IP, app, device, OS, channel, click time, attributed time) and a binary target to predict (is attributed). The target variable is imbalanced, with 99.8% of cases negative.</p>
      <p>Tables 1 and 2 summarize the preprocessing steps applied to the mobile.de and TalkingData datasets respectively. For our in-house dataset, the testing set corresponds to samples recorded during the 7 days prior to the day the model was trained. The training set corresponds to the 28 days of data prior to the start date of the testing set. This time-based split was done to prevent the model from learning from future observations. In order to reduce the imbalance and increase the performance, we applied random undersampling and kept 10% of the majority class in the training set. This resulted in around 200,000 training samples and 240,000 testing ones. We kept raw missing entries within the sets; H2O and Catboost models handled them as separate categories (see https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/gbm-faq/missing_values.html and https://catboost.ai/docs/concepts/algorithm-missing-values-processing.html, accessed 16 July 2021).</p>
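      <p>The time-based split and undersampling steps described above can be sketched as follows. This is a minimal pandas illustration; the column names created_at and label are placeholders rather than the actual in-house schema.</p>
      <preformat><![CDATA[
```python
import pandas as pd

def time_split_undersample(df, split_date, train_days=28, test_days=7,
                           majority_frac=0.10, seed=42):
    """Split listings by time (train strictly before test) and randomly
    undersample the majority (OK) class in the training set only."""
    split = pd.Timestamp(split_date)
    test = df[(df["created_at"] >= split) &
              (df["created_at"] < split + pd.Timedelta(days=test_days))]
    train = df[(df["created_at"] < split) &
               (df["created_at"] >= split - pd.Timedelta(days=train_days))]
    # keep only a fraction of the majority class, but every fraud case
    majority = train[train["label"] == "OK"].sample(frac=majority_frac,
                                                    random_state=seed)
    minority = train[train["label"] == "FRAUD"]
    train = pd.concat([majority, minority]).sample(frac=1, random_state=seed)
    return train, test

# tiny synthetic example: one listing every 2 hours, 1 fraud in 40
df = pd.DataFrame({
    "created_at": pd.date_range("2021-05-01", periods=400, freq="2H"),
    "label": ["FRAUD" if i % 40 == 0 else "OK" for i in range(400)],
})
train, test = time_split_undersample(df, "2021-06-01")
```
]]></preformat>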
      <p>
        For the Kaggle dataset, we borrowed the preprocessing steps from [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and engineered two additional features: the click hour of the day and the day of the week. First, we reduced the data size by randomly sampling 15% of unique IP addresses and retaining a stratified sample of 8% of the remaining set. To handle the imbalance, we applied the Synthetic Minority Oversampling Technique (SMOTE) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] with 5 neighbours and oversampled the positive class up to 11%. We then applied a stratified split, keeping 70% of the set for training. The final set has 1,706,481 training samples and 731,349 testing ones, without any missing values.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Training Machine Learning Models</title>
      <p>In this section, we briefly summarize the theoretical concepts behind the models used in our study, provide an overview of the machine learning libraries in which the models are implemented and, finally, describe the hyper-parameter tuning steps and our performance metrics.</p>
      <p>
        As stated in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], Random Forest is an ensemble machine learning algorithm consisting of a collection of decision trees, each built from random samples. In each tree, thresholds are applied to the input features to maximize information gain while minimizing an impurity function (e.g. cross-entropy, mean squared error). The final score is given by the average of the scores of all trees. Besides, RF provides maximum-depth and minimum-sample-split parameters to prevent decision trees from overfitting on the training set.
      </p>
      <p>
        Xgboost [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is another ensemble method, belonging to the large family of boosting algorithms. In general, boosting models combine shallow decision trees (also called weak learners), each built sequentially using the errors of previous trees, to reduce bias and variance at the same time. Xgboost in particular is an advanced implementation of gradient boosting which includes additional features such as parallel processing and regularization techniques for handling overfitting.
      </p>
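      <p>The sequential fitting of weak learners to the errors of previous trees can be illustrated in a few lines of NumPy. The sketch below implements plain gradient boosting for squared loss with one-split regression stumps; it is a didactic toy, not Xgboost itself:</p>
      <preformat><![CDATA[
```python
import numpy as np

def fit_stump(x, r):
    """Best single-split regression stump on 1-D inputs x for targets r."""
    best = None
    for t in np.unique(x):
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        pred = np.where(x <= t, left.mean(), right.mean())
        err = ((r - pred) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, left.mean(), right.mean())
    _, t, lv, rv = best
    return lambda z: np.where(z <= t, lv, rv)

def boost(x, y, n_rounds=50, lr=0.1):
    """Each stump is fitted to the residuals (the negative gradient of the
    squared loss) of the current ensemble, then added with shrinkage lr."""
    pred = np.full_like(y, y.mean(), dtype=float)
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)
        pred += lr * stump(x)
        stumps.append(stump)
    return lambda z: y.mean() + lr * sum(s(z) for s in stumps)

# fit a noise-free sine curve; the error should drop well below the variance
x = np.linspace(0, 6, 100)
y = np.sin(x)
model = boost(x, y)
mse = ((model(x) - y) ** 2).mean()
```
]]></preformat>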
      <p>
        Introduced in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], Catboost is a boosting model designed to handle and process categorical data efficiently. By default, the Catboost implementation uses one-hot encoding on categorical variables, except for those with high cardinality. In such cases, ordered target statistics [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] are used to maximize information gain. Contrary to other machine learning techniques, which require preprocessing steps to convert categorical data into numbers, Catboost requires only the indices of the categorical features [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
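      <p>An ordered target statistic replaces a category with the average target of the examples that precede the current row in a random permutation and share its category, which avoids leaking the row's own target value. A simplified sketch of the idea follows; Catboost's internal implementation additionally uses a prior weight and several permutations.</p>
      <preformat><![CDATA[
```python
import numpy as np

def ordered_target_stats(cats, y, prior=0.5, seed=0):
    """Encode a categorical column with ordered target statistics: each
    row sees only the target values of earlier rows (in a random
    permutation) that share its category, smoothed by a prior."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(cats))
    sums, counts = {}, {}
    enc = np.empty(len(cats))
    for i in perm:                      # rows processed in permutation order
        c = cats[i]
        s, n = sums.get(c, 0.0), counts.get(c, 0)
        enc[i] = (s + prior) / (n + 1)  # uses only previously seen rows
        sums[c] = s + y[i]
        counts[c] = n + 1
    return enc

cats = np.array(["a", "b", "a", "a", "b"])
y = np.array([1, 0, 1, 0, 1])
enc = ordered_target_stats(cats, y)
```
]]></preformat>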
      <p>
        Meta-learning techniques aim at combining the output of several base learners to improve prediction accuracy and utilize the strengths of one learner to complement the weaknesses of others [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. In this study,
we used H2O AutoML [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] to build a stacked ensemble. AutoML provides a simple wrapper function optimized for training and combining a large number of models in a short amount of time. This module evaluates single machine learning models (GBM, Xgboost, RF, Extremely Randomized Trees, Artificial Neural Networks and Generalised Linear Models) and their stacked ensembles on validation sets using relevant metrics (e.g. AUC, logloss). The best performing model is then retained for deployment.
      </p>
      <p>
        H2O is an open-source distributed software library for machine learning and deep learning applications. Its frame and cluster abstractions make it easy to process tabular data of various types in a distributed fashion. The H2O platform supports various interfaces including R, Python and Java, making it easier to complete analytic workflows [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In our case, we used the H2O Python interface to train and optimize Distributed Random Forest (DRF), Xgboost and AutoML models. The trained models are saved in MOJO (Model Object Optimized) format, which is later embedded in a Java environment for real-time predictions.
      </p>
      <p>The Catboost library is another high-performance open-source framework for gradient boosting on decision trees. Like H2O, the Catboost library supports Python, R and Java interfaces. For this study, we combined Catboost's Python and Java interfaces for model training and deployment. The parameter optimization described in this section is limited to our in-house dataset: because of TalkingData's large sample size (1,706,481 entries), carrying out extensive hyper-parameter tuning is daunting. Therefore, for this dataset, we applied a full parameter optimization only for the Catboost model and kept similar parameters for its H2O counterparts.</p>
      <p>For H2O, 3-, 5- and 10-fold Cross-Validation (CV) provided the best performance for RF, AutoML and Xgboost respectively. These models' hyper-parameters are depicted in Table 3. However, on the public dataset, we set the maximum number of models to 10 and the number of folds to 3 to circumvent memory limitations for AutoML.</p>
      <p>
        For Catboost, the Python library Hyperopt (https://github.com/hyperopt/hyperopt, accessed 16 July 2021) was used for hyper-parameter optimization. Hyperopt provides custom functions for hyper-parameter search: each parameter value is retrieved from a list of candidates drawn from a specific "quantized" continuous distribution such as qloguniform and quniform (see Table 4). Besides, models are trained for 500 iterations, using 3-fold CV, the logarithmic loss function and the Area Under the Receiver Operating Characteristic Curve (AUC) as the evaluation metric. In an imbalanced classification task, the positive class denotes the less frequent value of the target and the negative class is its complement. When scoring a model, an optimal solution can be derived from the confusion matrix [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. True Positives (TP) and True Negatives (TN) occur when the output of the model matches the ground-truth label on the positive and negative classes respectively. Conversely, False Positives (FP) and False Negatives (FN) occur when the model's predictions mismatch the true labels. To convert model probabilities into classes, we chose the threshold that maximizes the F1 score on the testing set. The F1 score is the harmonic mean of precision and recall and evaluates the accuracy of the model at predicting the positive class. Another popular evaluation metric is the Area Under the Receiver Operating Characteristic Curve. Contrary to the previous metrics, it is used to assess the ability of a classifier to distinguish between classes independently of any selected threshold.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Results</title>
      <p>
        In order to retain candidate models for our evaluation, we first benchmarked a large pool of machine learning models. For this purpose, H2O AutoML objects provide a leaderboard() method which ranks the models trained to build the stacked ensemble on a chosen dataset and metric. These models are optimised with AutoML's predefined random grid-parameter searches, which are different from the production hyper-parameter tuning described in the previous section. Table 5 summarizes the AUC obtained on our in-house test dataset, limited to the best algorithm of each family (GBM, Xgboost, RF, Extremely Randomized Trees, Artificial Neural Networks and Generalised Linear Models). Tree-based models outperform Artificial Neural Networks and Generalised Linear Models; they are well suited to complex non-linear problems [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. In particular, GBM and Xgboost yield the best AUC of 0.982, followed by Random Forest with an AUC of 0.9790. Besides, Najem and Kadeem's [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] survey on fraud detection techniques in e-commerce showed that RF has the highest frequency of use and is the best performing method across various use cases. Based on these observations, we initially retained AutoML, Xgboost and RF for our benchmark. The Catboost model, which is not part of H2O, was benchmarked separately and added later for comparison.
      </p>
      <p>
        Tables 6 and 7 illustrate the performance metrics obtained from the different models on the mobile.de and TalkingData datasets respectively. On the first one, AutoML's best model (a stacked ensemble) yields an F1 score of 0.73, which is higher than the 0.71 obtained with Xgboost and Catboost and the 0.68 obtained with Random Forest. It has been reported in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] that stacked ensemble models usually outperform the individual models (Xgboost, Random Forest, etc.) used in an AutoML run, in accordance with our findings. On the TalkingData dataset, the Catboost model yields the best performance with an F1 score of 0.988. The Catboost model is designed to process heterogeneous data with categorical variables efficiently [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. The feature cardinality is highlighted in Table 8. One-hot encoding on the one hand, and ordered target statistics applied to variables of high cardinality on the other, have a significant impact on the model performance. Catboost also provides a get_feature_importance() method which gives the contribution of each feature to the ensemble model. The output of this method is summarized in Figure 2: the app id for marketing and the IP address of the click are the most important features.
      </p>
      <p>
        In order to assess the generalizability of our
modelling approach at detecting fraud, we compared our
models with the work of Gupta et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Their best model, the Two-Class Decision Forest classifier, provides a precision of 0.992 and a recall of 0.902, corresponding to an F1 score of 0.9442. All the models used in our experiment outperform their results in terms of F1 (see Table 7). In particular, our best model Catboost
demonstrates a comparable precision and a better
recall. Relying on the F1 score alone to compare our models would be problematic, since in the TalkingData context the positive class corresponds to the non-fraudulent clicks. In the TalkingData AdTracking Fraud Detection Challenge, Kaggle competitors' machine learning models were evaluated based on AUC. Using this metric, our Catboost model yields an AUC of 0.9994, compared to 0.997 from Gupta et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
    </sec>
    <sec id="sec-8">
      <title>Conclusions</title>
      <p>We presented a case study describing the application of ensemble methods to detect fraud in a large-scale online marketplace (mobile.de). The business value of such an investigation is twofold: first, to enable a trustworthy customer experience and enhance customer satisfaction; second, to reduce the Customer Service operational cost of resolving fraudulent cases.</p>
      <p>
        To achieve our goals, we designed a machine learning pipeline based on sellers' listing data and optimized it to address common challenges in fighting fraud (fraudsters' adaptability, dataset imbalance, high false-positive rate, etc.). The main contribution of this study is a pipeline using open-source data science libraries to collect, process and score sellers' listings in order to efficiently detect fraud. Our best model, AutoML, provided an F1 score of 0.73, outperforming Catboost, Xgboost and Random Forest. These models were later tested on the TalkingData public dataset from the Kaggle competition platform, showed great robustness at detecting fraud, and outperformed previously proposed models. The best model on this set, Catboost, provides an F1 score of 0.9888, significantly higher than the value of 0.9442 reported in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        With regard to the prospects of this study, we will
first explore dimensionality reduction techniques [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]
and encoding methods in order to improve the
performance of the classifiers. Second, we will leverage
the power of Big Data tools (e.g., Spark) to train
and optimize the models on larger samples of data.
In addition, we aim to investigate different
meta-learning techniques combining Catboost and
H2O models to build robust classifiers and further
prevent fraud on our website.
      </p>
      <p>Furthermore, in our future work we will tackle the
problem of detecting fraud "as soon as possible". It
is crucial that fraudulent listings are detected before
they reach the audience. To this end, we plan to
include further features such as buyers' and sellers' user
activity. Finally, we would like to highlight that the
work presented in this paper is currently in production,
protecting buyers and sellers at mobile.de; for that
reason we refrain from disclosing further technical details
that could help malicious users bypass our detection
system.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgements</title>
      <p>We would like to thank the Customer Service team
at mobile.de for their countless hours of manual work
in detecting fraud, and for providing us the ground
truth to start our work. We would also like to thank
members of TnS and Data teams at mobile.de who
have directly and indirectly been involved in this work,
with special thanks to Moritz Aschoff and Matthias
Radtke.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Abdi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>L. J.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Principal component analysis</article-title>
          .
          <source>Wiley interdisciplinary reviews: computational statistics</source>
          ,
          <volume>2</volume>
          (
          <issue>4</issue>
          ):
          <volume>433</volume>
          –
          <fpage>459</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Adewumi</surname>
            ,
            <given-names>A. O.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Akinyelu</surname>
            ,
            <given-names>A. A.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>A survey of machine-learning and nature-inspired based credit card fraud detection techniques</article-title>
          .
          <source>International Journal of System Assurance Engineering and Management</source>
          ,
          <volume>8</volume>
          (
          <issue>2</issue>
          ):
          <volume>937</volume>
          –
          <fpage>953</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Aiello</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Click</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roark</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rehak</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Stetsenko</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Machine learning with python and h2o</article-title>
          . Edited by Lanford, J. Published by
          <source>H2O.ai</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Breiman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2001</year>
          ).
          <article-title>Random forests</article-title>
          .
          <source>Machine learning</source>
          ,
          <volume>45</volume>
          (
          <issue>1</issue>
          ):5{
          <fpage>32</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Chawla</surname>
            ,
            <given-names>N. V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bowyer</surname>
            ,
            <given-names>K. W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>L. O.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Kegelmeyer</surname>
            ,
            <given-names>W. P.</given-names>
          </string-name>
          (
          <year>2002</year>
          ).
          <article-title>SMOTE: synthetic minority over-sampling technique</article-title>
          .
          <source>Journal of artificial intelligence research</source>
          ,
          <volume>16</volume>
          :
          <fpage>321</fpage>
          –
          <fpage>357</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Guestrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Xgboost: A scalable tree boosting system</article-title>
          .
          <source>In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining</source>
          , pages
          <volume>785</volume>
          –
          <fpage>794</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Ghori</surname>
            ,
            <given-names>K. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abbasi</surname>
            ,
            <given-names>R. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Awais</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Imran</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ullah</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Szathmary</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Performance analysis of different types of machine learning classifiers for nontechnical loss detection</article-title>
          .
          <source>IEEE Access</source>
          ,
          <volume>8</volume>
          :
          <fpage>16033</fpage>
          –
          <fpage>16048</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boldina</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Woo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Predicting fraud of ad click using traditional and spark ml</article-title>
          .
          <source>In KSII The 14th Asia Pacific International Conference on Information Science and Technology (APICIST)</source>
          , pages
          <fpage>24</fpage>
          –
          <fpage>28</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Hossin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Sulaiman</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>A review on evaluation metrics for data classification evaluations</article-title>
          .
          <source>International Journal of Data Mining &amp; Knowledge Management Process</source>
          ,
          <volume>5</volume>
          (
          <issue>2</issue>
          ):
          <fpage>1</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Kanei</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chiba</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hato</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yoshioka</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matsumoto</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Akiyama</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>Detecting and understanding online advertising fraud in the wild</article-title>
          .
          <source>IEICE Transactions on Information and Systems</source>
          ,
          <volume>103</volume>
          (
          <issue>7</issue>
          ):
          <volume>1512</volume>
          –
          <fpage>1523</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>LeDell</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Poirier</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>H2O AutoML: Scalable automatic machine learning</article-title>
          .
          <source>7th ICML Workshop on Automated Machine Learning (AutoML).</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S.-J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ahn</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>K. M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Ahn</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Trust and distrust in e-commerce</article-title>
          .
          <source>Sustainability</source>
          ,
          <volume>10</volume>
          (
          <issue>4</issue>
          ):
          <fpage>1015</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Meng</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>A case study in credit fraud detection with smote and xgboost</article-title>
          .
          <source>In Journal of Physics: Conference Series</source>
          , volume
          <volume>1601</volume>
          , page 052016. IOP Publishing.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Minastireanu</surname>
            ,
            <given-names>E.-A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Mesnita</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Light gbm machine learning algorithm to online click fraud detection</article-title>
          .
          <source>J. Inform. Assur. Cybersecur</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Mohammed</surname>
            ,
            <given-names>R. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wong</surname>
            ,
            <given-names>K.-W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shiratuddin</surname>
            ,
            <given-names>M. F.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Scalable machine learning techniques for highly imbalanced credit card fraud detection: a comparative study</article-title>
          .
          <source>In Pacific Rim International Conference on Artificial Intelligence</source>
          , pages
          <fpage>237</fpage>
          –
          <fpage>246</fpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Najem</surname>
            ,
            <given-names>S. M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Kadeem</surname>
            ,
            <given-names>S. M.</given-names>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>A survey on fraud detection techniques in e-commerce.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Prokhorenkova</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gusev</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vorobev</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dorogush</surname>
            ,
            <given-names>A. V.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Gulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Catboost: unbiased boosting with categorical features</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <volume>6638</volume>
          –
          <fpage>6648</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Pun</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Lawryshyn</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Improving credit card fraud detection using a meta-classification strategy</article-title>
          .
          <source>International Journal of Computer Applications</source>
          ,
          <volume>56</volume>
          (
          <issue>10</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Rajora</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>D.-L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jha</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bharill</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patel</surname>
            ,
            <given-names>O. P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Puthal</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Prasad</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>A comparative study of machine learning techniques for credit card fraud detection based on time variance</article-title>
          .
          <source>In 2018 IEEE Symposium Series on Computational Intelligence (SSCI)</source>
          , pages
          <fpage>1958</fpage>
          –
          <fpage>1963</fpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Renjith</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Detection of fraudulent sellers in online marketplaces using support vector machine approach</article-title>
          . arXiv preprint arXiv:1805
          .00464.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Suganya</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Kamalra</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Meta classification technique for improving credit card fraud detection</article-title>
          .
          <source>International Journal of Scientific and Technical Advancements</source>
          ,
          <volume>2</volume>
          (
          <issue>1</issue>
          ):
          <volume>101</volume>
          –
          <fpage>105</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Thejas</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dheeshjith</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iyengar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sunitha</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Badrinath</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>A hybrid and effective learning approach for click fraud detection</article-title>
          .
          <source>Machine Learning with Applications</source>
          ,
          <volume>3</volume>
          :
          <fpage>100016</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Ji</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Credit fraud risk detection based on xgboost-lr hybrid model</article-title>
          .
          <source>In Proc. Int. Conf. Electron. Bus.</source>
          , volume
          <volume>2</volume>
          , pages
          <fpage>336</fpage>
          –
          <fpage>343</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>A model based on convolutional neural network for online transaction fraud detection</article-title>
          .
          <source>Security and Communication Networks</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>