Regular Pattern and Anomaly Detection on Corporate Transaction Time Series

Francesca Soro, Marco Mellia
Politecnico di Torino, Torino, Italy
francesca.soro@polito.it, marco.mellia@polito.it

Nicolò Russo
CEE CIB Innovation, UniCredit S.p.A., Vienna, Austria
Nicolo.Russo2@unicredit.eu

This work has been supported by the SmartData@PoliTO center on Big Data and Data Science and was developed in collaboration with UniCredit S.p.A. Zweigniederlassung, Wien.
Copyright © 2020 for this paper by its author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2020 Joint Conference (March 30-April 2, 2020, Copenhagen, Denmark) on CEUR-WS.org. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

ABSTRACT

Business applications make extensive use of time series analysis for the most diverse tasks. By analyzing the development of a phenomenon over time we gain useful insights for stock market forecasting, analyze the risk related to investments, understand the behavior of a company on the market, and so on. More specifically, in a corporate investment banking environment, analyzing the transaction history of a customer over the years is crucial to establish a fruitful relationship and adapt to its behavioural changes. In this environment we recognize three macro-categories of phenomena of interest: cyclic events, sudden and significant changes in trend, and isolated anomalous points. In this paper we present a framework to automatically spot these behaviors by means of simple yet effective machine learning techniques. We observe that cyclic behaviors and sudden changes can be easily targeted by means of adaptive threshold algorithms, while unsupervised machine learning techniques are the most reliable in detecting isolated anomalies. We design and test our algorithms on actual transactions collected over the past two years from more than 2,000 customers of UniCredit Bank, showing the efficiency of our solution. This work is designed to serve as a decision-aid tool for corporate investment banking employees, to facilitate the inspection of years of transactions and ease the visualization of interesting events in the customer history.

1 INTRODUCTION

Time series are commonly defined as sequences of data points indexed at regularly spaced points in time. Such a representation makes them particularly suitable to describe a large number of everyday and real-world phenomena that vary over time. In the business field, for instance, they are extensively used to, among many other applications, evaluate risk, forecast future behaviors, predict stock price changes, or detect anomalies in different types of transactions. In this work we focus on the latter application: we exploit, fine tune, evaluate and integrate into a comprehensive framework some well-known anomaly detection techniques to spot both typical and unusual behaviors in corporate banking transaction data [21]. Apart from raising awareness in the presence of possible fraudulent events, detecting anomalies in this environment may raise the attention on a customer undergoing changes in ownership or management, choosing to operate in new markets, supplying new customers or adopting new suppliers, moving parts of its business relationships to a new bank, reducing or enlarging the number of employees, etc. Bearing in mind these possibilities, we need to distinguish among three kinds of behaviors of interest:

• Cyclic phenomena: e.g., repeated peaks in the number of payments, representing salary or supplier payments;
• Continuously increasing or decreasing trends in the incoming/outgoing amounts and counterparts, hinting at underlying policy changes;
• Single isolated anomalies that need further investigation.

In this paper we present different techniques to target each of these cases, spanning from simpler heuristics to a combination of Machine Learning (ML) models, to serve as a tool to compare the algorithms' outcomes and advertise the most meaningful cases. The final aim of this work is to compare existing algorithms and techniques to detect anomalies in banking transaction data, and to evaluate their performance, balancing the simplicity of the solution with its reliability. It should be clear that the final objective of the framework is not to substitute human supervision, but to serve as an instrument aiding the relationship managers to judge corporate clients' behaviour and raise the attention on unusual movements. We apply these techniques to a very large dataset of actual banking data. Given the impossibility of disclosing actual banking data, it is not easy, to the best of our knowledge, to find a comprehensive work that provides an insight on the effectiveness of ML techniques in this environment and on anomalies showing different characteristics. As such, we are among the first to provide a practical solution in this area.

This paper is organized as follows: in Section 2 we discuss similar use cases or applications already present in the literature. In Section 3 we describe the raw dataset and the preprocessing steps applied before feeding data to the algorithms. In Section 4 we provide a theoretical overview of the basic concepts underlying the applied methodologies. Section 5 contains a discussion and evaluation of the resulting outputs. Section 6 concludes the paper.
2 RELATED WORK

Financial modelling and business use cases make extensive use of time series analysis techniques. The authors of [22] enumerate some of the most relevant applications: interest rates, growth rate of the gross domestic product, inflation, index of consumer confidence, unemployment rate, trade imbalance, corporate earnings, book-to-market ratio, etc. The authors of [6] present a visualization tool that aggregates transactions recorded by Bank of America, to serve as an aid in spotting the first signs of money laundering activities. [11] reports a survey of the most used machine learning techniques in stock forecasting, providing some guidelines on how to build an extensive set of input features. Another common field of application is the evaluation of investment risk. This aspect is addressed in [24], where the authors focus on noise prediction to give information about its influence on trading behavior. The authors of [21] propose the usage of a regression classifier operating on the tensor representation of High-Frequency Trading data. The analysis of such transactions and the modelling of their waiting times is also addressed by the usage of random variables in [18]. Time series can be enriched with other quantitative data: the work in [27] exploits a wider set of company-related financial indices, cash flow information and industry-specific variables to perform bankruptcy prediction. The above mentioned solutions are developed to model single specific cases, and often fail in capturing the similarities among phenomena. Moreover, detecting anomalies by using time series prediction outcomes implies a very strong assumption, i.e., that the data used to train the model do not contain anomalies. Since this is not always the case in a real world scenario, unsupervised algorithms can represent a suitable solution to group together and spot similar patterns, isolating the most interesting anomalous ones. The work in [13] provides a useful partition of the time series clustering approaches: raw-data-based, feature-based and model-based (i.e., respectively giving in input to the model the raw dataset, some features extracted from it, or modelling the coefficients or the residuals). The same work summarizes some of the most common clustering evaluation metrics. The authors of [12] provide an example of model-based clustering of time series, which models residuals by means of a Gaussian distribution. Many of the mentioned solutions are very specifically tailored to the problem they target, and are hence difficult to generalize to a wider set of use cases. In this paper we propose a comparative analysis of common off-the-shelf algorithms for anomaly detection applied to a large dataset of actual corporate investment banking transactions. The final aim is to provide techniques to tackle this problem efficiently and from different perspectives, according to the operator's needs, sticking to easy-to-implement and easy-to-understand algorithms commonly known by data science practitioners. The output of our analysis aims at facilitating the business relationship between the bank and the customer, allowing the relationship manager employees, who are not ML experts, to know in advance and adapt to eventual changes in the customer's activities.
3 DATASET

The case study reported in this paper takes advantage of a large dataset recording years of corporate customers' payment transactions. We define a transaction as a single payment directed to or operated by a given customer entity. The original dataset consists of more than 50,000,000 transactions, characterized by 35 different fields. All the transactions are reported from the point of view of the customer: it is either a beneficiary, i.e., a payment recipient, or a source, i.e., it is carrying the payment out. For this, we take into account the direction of the transaction and separate "incoming" from "outgoing" transactions. In total, we count about 500,000 customers, many of which recorded only a small number of transactions and are hence not so relevant. To filter them, we keep only those overcoming a monthly threshold T = 100 of minimum incoming or outgoing payments, depending on the targeted application. Given the transactions involving each client, we group them by time interval, computing (i) the sum of the amounts, and (ii) the number of unique counterparts. We set a different time granularity (i.e., daily, weekly, monthly, quarterly) according to the required task and the needs of the analysts.
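As a minimal sketch of this aggregation step, the snippet below shows how the per-customer grouping could be implemented with Pandas. The column names (customer_id, date, amount, counterpart_id, direction) are hypothetical placeholders, since the actual 35-field schema cannot be disclosed.

```python
import pandas as pd

def aggregate_customer(transactions: pd.DataFrame, customer_id, direction="outgoing", freq="D"):
    """Aggregate one customer's transactions at the requested time granularity.

    Hypothetical columns: customer_id, date, amount, counterpart_id, direction.
    Returns, per period, the sum of the amounts and the number of unique counterparts.
    """
    sel = transactions[(transactions["customer_id"] == customer_id)
                       & (transactions["direction"] == direction)].copy()
    sel["date"] = pd.to_datetime(sel["date"])
    grouped = (sel.set_index("date")
                  .resample(freq)                      # "D", "W", "M" or "Q"
                  .agg({"amount": "sum", "counterpart_id": "nunique"})
                  .rename(columns={"counterpart_id": "n_counterparts"}))
    return grouped.fillna(0)
```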
To provide an overview of the data, we report an example of the periodic payment phenomena we are looking for in Figure 1. In this case we are able to distinguish the payments of salaries from those directed to suppliers thanks to an algorithm internally developed by the company, called Company or Physical Person, or CoPP. By means of a Random Forest algorithm, it labels the transaction recipients based on a set of input features and flags, such as the presence of company-related stop words (e.g., GmbH, S.r.l., S.p.A., etc.), the number of characters in the name, the volume of the transactions or the presence of a numerical value in the name of the beneficiary. In output, it classifies the payment transactions into those directed to suppliers and those directed to employees, respectively in blue and green in Figure 1, with an average accuracy of 96.3%. As is visible in the plot, salaries constitute the largest number of transactions, and look periodical. Supplier payment transactions tend to be more spread over the considered time period and generally lower in number.

[Figure 1: Output of CoPP algorithm classification. Number of distinct beneficiaries per day (2017/10 to 2019/07), split into salaries and suppliers.]

If we take a look at the distributions of the outgoing amounts per month, reported in Figure 2 for the same customer, we can see that, despite being fewer in number, the transactions to suppliers (top plot) constitute the largest part of the overall outgoing amount, outperforming the salary payments (bottom plot) by two orders of magnitude. Moreover, from the spikes in Figure 2(b), we could understand that the customer in this example may pay the 13th salary to its employees.

[Figure 2: Amount distribution towards physical persons and companies. (a) Company amount distribution; (b) physical person amount distribution.]

Figure 3 reports, on the other hand, an example of a client showing a significantly decreasing trend. Such behavior should raise the attention of the relationship manager and lead to the investigation of the business relationships and events that originated this outcome.

[Figure 3: Example of client showing a decreasing trend in the incoming counterparts count.]

To serve as input to the proposed supervised and unsupervised models we further enrich the time series data to build a complete set of features useful for inspection. We report all the features and a brief description in Table 1. Please note that for confidentiality and privacy reasons we omit some of the details regarding the original dataset, and we cannot report complete examples.

Table 1: Lags dataset

Field            Description
y                Target variable
date             Date, daily granularity
y_t-D            Value of y at day d-D (D = {1,..,6})
monthly_avg      Average value of y per month
weekly_avg       Average value of y per week
is_<dayofweek>   Flag, 1 if date is <dayofweek>
prev_W_weeks     Value of y at week w-W (W = {1,..,3})
prev_M_months    Value of y at month m-M (M = {1,..,3})
quarter_avg      Average value of y in the quarter
neigh_4_days     Average value of y at +/- 2 days
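The following sketch shows how the lag-based features of Table 1 could be derived with Pandas from a daily target series. The helper name and a few details (e.g., approximating a month with 30 days and centring the ±2-day window on the current day) are our assumptions, not the original implementation.

```python
import pandas as pd

def build_lag_features(daily_y: pd.Series) -> pd.DataFrame:
    """Derive the Table 1 features from a daily target series y (DatetimeIndex)."""
    df = pd.DataFrame({"y": daily_y})
    for d in range(1, 7):                                        # y_t-D, D = 1..6
        df[f"y_t-{d}"] = df["y"].shift(d)
    df["monthly_avg"] = df["y"].groupby(df.index.to_period("M")).transform("mean")
    df["weekly_avg"] = df["y"].groupby(df.index.to_period("W")).transform("mean")
    df["quarter_avg"] = df["y"].groupby(df.index.to_period("Q")).transform("mean")
    for w in range(1, 4):                                        # value W weeks before
        df[f"prev_{w}_weeks"] = df["y"].shift(7 * w)
    for m in range(1, 4):                                        # value M months before (30-day months assumed)
        df[f"prev_{m}_months"] = df["y"].shift(30 * m)
    df["neigh_4_days"] = df["y"].rolling(5, center=True).mean()  # average over a +/- 2 day window
    for dow in range(7):                                         # is_<dayofweek> flags (0 = Monday)
        df[f"is_{dow}"] = (df.index.dayofweek == dow).astype(int)
    return df.dropna()
```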
4 OBJECTIVES AND METHODOLOGIES

In this section we provide a brief description of the adopted algorithms. The heuristic techniques in Section 4.1 allow us to recognize customers showing periodically repeated phenomena or sudden trend changes. Then we compare a set of supervised and unsupervised techniques to spot isolated single-point anomalies. We assume the reader is familiar with ML algorithms.

4.1 Periodicity and trend detection

We use heuristic techniques to detect two types of clients: the ones who periodically pay salaries or suppliers, and the ones who show a steep trend (either ascending or descending). As previously said, spotting these kinds of customers is relevant as it allows the bank to highlight changes, e.g., in the company business operations or in the personnel composition. For these applications, we use the per-customer time series as input.

4.1.1 Salaries and suppliers payments detection. For this case study, we focus only on the number of unique counterparts towards which the customer performs transactions. An (almost) regular pattern in the count of such operations is a clear sign that the customer is paying employee salaries, or suppliers. Normally such transactions appear as visible periodical spikes whose height does not show significant changes over time. This characteristic makes such behavior easy to spot by means of a simple adaptive threshold algorithm. Given the existing time series, we compute the maximum number of counterparts registered every year, namely max_y_counterpart. We then define the threshold τ = 0.8 · max_y_counterpart, and test every row against it. We get as output 1 if the number of counterparts for that day overcomes the threshold, 0 otherwise. To decide that a customer shows a cyclic behavior, we check that the threshold is overcome at least once per month for at least 85% of the considered months (i.e., 20 months out of 24). Although not reported here due to space limitations, the choice of parameters results robust, and the suggested values have been selected as best candidates.
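A minimal sketch of this adaptive threshold heuristic follows, assuming the daily count of distinct counterparts is available as a Pandas series; function and parameter names are ours.

```python
import pandas as pd

def is_periodic_payer(daily_counterparts: pd.Series, ratio=0.8, min_fraction=0.85):
    """Flag a customer as a periodic (salary/supplier) payer, Section 4.1.1 heuristic.

    daily_counterparts: daily count of distinct counterparts, indexed by date.
    The threshold tau is adapted per year as ratio * yearly maximum; the customer is
    flagged if at least one day exceeds tau in at least min_fraction of the months.
    """
    s = daily_counterparts.sort_index()
    tau = s.groupby(s.index.year).transform("max") * ratio   # tau = 0.8 * max_y_counterpart
    above = (s > tau).astype(int)                            # 1 if the day overcomes the threshold
    monthly_hit = above.groupby(s.index.to_period("M")).max()
    return monthly_hit.mean() >= min_fraction                # e.g. 20 months out of 24
```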
4.1.2 Trend detection. We run the trend detection procedure on the count of both outgoing and incoming counterparts. Since a daily aggregation is not suited to detect a trend change, we aggregate the data considering their average per quarter of the year, as typically done in the economic field. We then compute the variations affecting the same quarters across all the available years (namely δQ1, δQ2, δQ3, δQ4). As a following step, we further aggregate the counterpart counts by month, and we then fit to such time series a simple univariate linear regression model:

    y_i = α + β · x_i + ϵ_i    (1)

We take into account the value of β and the p-value. The former gives information on the slope of the regression line, the latter on the correlation of the target with the given regressor. We only consider as statistically significant those customers having p-value ≤ 0.05. We then define two subsets of customers: those having a relevant increase in the trend for at least a quarter (i.e., δQi ≥ 30%), and those showing a relevant decrease (i.e., δQi ≤ −30%). This combination of thresholds set on the δQi and on the p-value allows us to filter customers with clear trends. At the end of this stage, we notify the relationship manager with a list of customers to carefully monitor.
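A possible implementation of this heuristic is sketched below, assuming the δQ values are computed as year-over-year variations of the quarterly averages and the monthly series is fitted with an ordinary least squares line; the use of scipy.stats.linregress is our choice, as the paper does not specify the library.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

def trend_label(daily_counterparts: pd.Series, delta_thr=0.30, p_thr=0.05):
    """Return 'increasing', 'decreasing' or None according to the Section 4.1.2 heuristic."""
    # Year-over-year variation of the same quarter (delta_Q1 .. delta_Q4).
    quarterly = daily_counterparts.resample("Q").mean()
    delta_q = quarterly.pct_change(periods=4).dropna()
    # Univariate linear regression y_i = alpha + beta * x_i on the monthly aggregation (Eq. 1).
    monthly = daily_counterparts.resample("M").mean().dropna()
    fit = linregress(np.arange(len(monthly)), monthly.values)
    if fit.pvalue > p_thr:                                   # keep statistically significant slopes only
        return None
    if fit.slope > 0 and (delta_q >= delta_thr).any():
        return "increasing"
    if fit.slope < 0 and (delta_q <= -delta_thr).any():
        return "decreasing"
    return None
```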
We chose the best k according • Criterion 1 (multiplicative): to each score. If the algorithm never identifies any anomalous cluster, it returns 0. The advantage of DBScan is that it does not KO : if (y > τ ŷ) ∧ (| y − ŷ |>= σmin )  require to define the number of clusters a priori and it isolates OK : else noise points without using the aforementioned threshold p. We automatize the choice of the parameters ϵ and minPoints by first • Criterion 2 (additive): evaluating the distribution of the nearest neighbour distances.  KO : if (y − ŷ) > kσr oll inд Once we define the average most common value of distances as OK : else our ϵ, we calculate the distribution of the number of points ac- cording to this ϵ − neiдhbourhood, and we choose our minPoints • Criterion 3 (multiplicative positive): value in the same way. We finally run the algorithm with the selected parameters, and we consider anomalous all the points KO : if (y > ŷ) & Criterion 1  marked as noise. When using Hierarchical Clustering we again OK : else face the problem of defining the correct number of clusters. This depends on the setting of the cutoff level on the obtained output • Criterion 4 (additive absolute): dendrogram. We limit the set of reasonable cutoff levels up to KO : if | y − ŷ |>= kσr oll inд C = 5, and we evaluate the best C by considering the number  OK : else of anomalous clusters according to the threshold p combined with the best scores. If we are unable to identify small anomalous Where τ and k are multiplicative thresholds we manually tune, clusters, we yield 0 as a result. Isolation Forest algorithm shows σmin is the minimum monthly standard deviation of the label as its main strength the fact that it does not require an a-priori variable, and σr oll inд is the rolling standard deviation of the label definition of the number of clusters, but only on the selection of computed month by month. a reasonable number of base trees, as required by some of the For the regression, we consider state of the art algorithms. We previously described supervised algorithms. The main underly- briefly report the chosen configurations below. We manually ing idea of this technique is that anomalies are often isolated tune all the algorithms parameter, and we hereby report only points whose identification requires just a few partitions of the the resulting best configurations for the sake of space. For the feature space to separate them from the more concentrated sets Support Vector Regressor ([25], [4]) we exploit three different of points. We do not rely on the output of a single tree, but on kernel functions: the linear, the polynomial and the RBF one. the average output generated by a set of N = 100 trees. All kernels require a regularization parameter C = 100, an ϵ = 0.1, while the polynomial kernel takes also deдree = 3. The 5 CASE STUDY AND RESULTS Stochastic Gradient Descent Regressor [10] exploits the standard In this section we proceed with the description and comment concept of stochastic gradient descent to fit linear regression of the results obtained with the different automatic detection models. We choose to fix the number of maximum iterations techniques illustrated in Section 4. Please note that, for data to I = 100, 000, 000, and a stopping criterion on the validation confidentiality reasons we report an indicative range for all the score improvement tol = e −10 . We then exploit a set of Decision- numeric results. 
4.2.2 Supervised techniques. All the following techniques use as input the dataset described in Table 1, properly standardized. Similarly as before, we consider well-accepted ML algorithms that we train to predict the next value ŷ. We run hyperparameter selection, and compare the prediction ŷ with the actual value y. Differently from ARIMA models, we do not have a standard way to compute confidence intervals, thus we rely on domain-knowledge-driven heuristics to flag outliers. In detail, we define a set of threshold-based control criteria to label a point as anomalous: the anomaly is advertised if at least one criterion is triggered. We list all the criteria below:

• Criterion 1 (multiplicative): KO if (y > τ · ŷ) ∧ (|y − ŷ| ≥ σ_min), OK otherwise
• Criterion 2 (additive): KO if (y − ŷ) > k · σ_rolling, OK otherwise
• Criterion 3 (multiplicative positive): KO if (y > ŷ) ∧ Criterion 1, OK otherwise
• Criterion 4 (additive absolute): KO if |y − ŷ| ≥ k · σ_rolling, OK otherwise

where τ and k are multiplicative thresholds we manually tune, σ_min is the minimum monthly standard deviation of the label variable, and σ_rolling is the rolling standard deviation of the label computed month by month.
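A compact sketch of the four criteria, assuming the actual and predicted values are aligned Pandas series with a daily DatetimeIndex; the values of τ and k shown here are placeholders, since the tuned values are not disclosed.

```python
import pandas as pd

def flag_supervised_anomalies(y: pd.Series, y_hat: pd.Series, tau=1.5, k=3.0):
    """Apply the four threshold criteria of Section 4.2.2 (KO if any criterion triggers).

    tau and k are the manually tuned multipliers (placeholder values here); sigma_min is
    the minimum monthly standard deviation of the target, sigma_rolling its month-by-month
    standard deviation.
    """
    months = y.index.to_period("M")
    sigma_rolling = y.groupby(months).transform("std")       # rolling std, computed month by month
    sigma_min = y.groupby(months).std().min()
    err = y - y_hat
    c1 = (y > tau * y_hat) & (err.abs() >= sigma_min)        # Criterion 1 (multiplicative)
    c2 = err > k * sigma_rolling                             # Criterion 2 (additive)
    c3 = (y > y_hat) & c1                                    # Criterion 3 (multiplicative positive)
    c4 = err.abs() >= k * sigma_rolling                      # Criterion 4 (additive absolute)
    return c1 | c2 | c3 | c4
```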
For the regression, we consider state-of-the-art algorithms. We briefly report the chosen configurations below. We manually tune all the algorithms' parameters, and we hereby report only the resulting best configurations for the sake of space. For the Support Vector Regressor ([25], [4]) we exploit three different kernel functions: the linear, the polynomial and the RBF one. All kernels require a regularization parameter C = 100 and an ϵ = 0.1, while the polynomial kernel also takes degree = 3. The Stochastic Gradient Descent Regressor [10] exploits the standard concept of stochastic gradient descent to fit linear regression models. We choose to fix the maximum number of iterations to I = 100,000,000, and a stopping criterion on the validation score improvement tol = 1e-10. We then exploit a set of Decision-Tree-based regressors: we first instantiate a simple Decision Tree, which we then use as a building block for an AdaBoost regressor [20] and a Random Forest regressor [1]. For the former we specify that we want to terminate the boosting at n_estimators = 300; for the latter we require a number of trees N = 100 and we set max_depth = 20.

4.2.3 Unsupervised techniques. All the implemented unsupervised techniques require the specification of different parameters and of an outlier identification criterion. The former is algorithm-specific: we combine three cluster quality measures, the Silhouette [19], the Davies-Bouldin index [3] and the Calinski-Harabasz index [2], to choose the best clustering configuration; more details for each algorithm are given below. The latter is generic and based on the definition of a threshold p that defines the maximum number of points a cluster should contain to be labeled as anomalous. We found p = 5 to provide the best results for our case study.

We consider four clustering algorithms: k-Means [15], DBScan [5], Hierarchical clustering [16] and Isolation Forest [14]. More in detail, for k-Means we are required to specify the parameter k, whose evaluation is commonly pointed out as a critical aspect of the algorithm itself ([8], [26]). We target this problem by restricting the possible range of k to a reasonable set of values defined according to our domain knowledge: we let k range from 2 to 7. For each k we compute the number of anomalous clusters (i.e., the clusters containing N_j ≤ p points), and among those, we identify the most common number of anomalous clusters by taking the mode of such values. We choose the best k according to each score; a sketch of this procedure is reported at the end of this subsection. If the algorithm never identifies any anomalous cluster, it returns 0. The advantage of DBScan is that it does not require defining the number of clusters a priori and it isolates noise points without using the aforementioned threshold p. We automatize the choice of the parameters ϵ and minPoints by first evaluating the distribution of the nearest-neighbour distances. Once we define the most common value of such distances as our ϵ, we calculate the distribution of the number of points falling in this ϵ-neighbourhood, and we choose our minPoints value in the same way. We finally run the algorithm with the selected parameters, and we consider anomalous all the points marked as noise. When using Hierarchical Clustering we again face the problem of defining the correct number of clusters. This depends on the setting of the cutoff level on the obtained output dendrogram. We limit the set of reasonable cutoff levels up to C = 5, and we evaluate the best C by considering the number of anomalous clusters according to the threshold p combined with the best scores. If we are unable to identify small anomalous clusters, we yield 0 as a result. The Isolation Forest algorithm shows as its main strength the fact that it does not require an a-priori definition of the number of clusters, but only the selection of a reasonable number of base trees, as required by some of the previously described supervised algorithms. The main underlying idea of this technique is that anomalies are often isolated points whose identification requires just a few partitions of the feature space to separate them from the more concentrated sets of points. We do not rely on the output of a single tree, but on the average output generated by a set of N = 100 trees.
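As an illustration of the generic small-cluster rule, the sketch below applies it to k-Means with scikit-learn; for brevity it selects k with the silhouette score only, whereas the paper combines three quality measures, and p = 5 follows the text.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def kmeans_anomalies(X, k_range=range(2, 8), p=5):
    """Cluster X for each k in [2, 7], keep the best configuration (here: silhouette only),
    and flag the points falling in clusters with at most p members as anomalous."""
    best_score, best_labels = -np.inf, None
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        score = silhouette_score(X, labels)                  # Davies-Bouldin / Calinski-Harabasz analogous
        if score > best_score:
            best_score, best_labels = score, labels
    sizes = np.bincount(best_labels)
    return sizes[best_labels] <= p                           # True for points in small ("anomalous") clusters
```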
5 CASE STUDY AND RESULTS

In this section we proceed with the description and discussion of the results obtained with the different automatic detection techniques illustrated in Section 4. Please note that, for data confidentiality reasons, we report an indicative range for all the numeric results. We use Python 3.7 (https://www.python.org/downloads/release/python-370/) as a programming language, together with the Pandas (https://pandas.pydata.org/pandas-docs/stable/index.html) and scikit-learn (https://scikit-learn.org/stable/) libraries. All the examples use as input the transactions of a subset of 2,000 customers.

5.1 Periodicity and trend detection

5.1.1 Salaries and suppliers payment detection. As already reported in Section 4.1.1, we define as salaries and supplier payments those outgoing transactions showing regular spikes over time. The height of the spikes is generally almost constant, but it may be subject to changes from time to time. The output of the adaptive threshold heuristic points out that slightly less than 25% of the customers under analysis perform periodical payment transactions. We manually verified about 100 cases and found no evident sign of misclassification. For instance, we report the output for two customers, namely CLI1 and CLI2, in Figure 4. In the two figures we can clearly see how a large number of transactions are concentrated in certain days of the month, and repeated periodically. CLI1, in Figure 4(a), shows an increase in the number of distinct transactions after the beginning of 2019: this may suggest, for instance, a change in the relationships with the suppliers, or in the composition of the workforce with newly hired employees. Our algorithm automatically adapts the threshold τ, helping the relationship manager to detect the change. For instance, we can suppose that the company is growing or trying to enlarge its business. On the other hand, CLI2 in Figure 4(b) shows an almost regular pattern over the analyzed time period, with a variation of ±3 counterparts over the years. Identified peaks are consistent with personal payments as identified by the CoPP algorithm. The output of the salaries detection procedure is meant to be read together with CoPP, to allow the relationship manager to have a clear overview of the customer's business choices and structure. Notice that our solution does not require a labelled dataset, whose creation is an often time-consuming process.

[Figure 4: Output of the salary detection process. (a) Counterparts and adaptive threshold for CLI1; (b) counterparts and adaptive threshold for CLI2.]

5.1.2 Trend detection. For the sake of space, we discuss the output of the heuristic reported in Section 4.1.2 for the first and last quarter variations, namely δQ1 and δQ4. We consider incoming and outgoing counterparts separately. Out of the 2,000 clients originally in scope, 20% show an increasing trend in the incoming counterparts and 7.5% show a decreasing trend. Considering outgoing counterparts, 20% of the customers show an increasing trend and 6% show a decreasing trend. Figure 5 reports some examples of the automatically detected behaviors for 3 different customers. As visible, all of them show a clear decreasing trend, correctly identified by the algorithm. Also in this case, we manually verified the results, showing no errors in the classification.

[Figure 5: Customers with relevant changes in trend for the outgoing counterparts count (CLI1, CLI2, CLI3).]

In the case of trend detection, we also present to the relationship manager a set of aggregated statistics on the quarter variations. Figure 6 shows, for instance, the δQ1 (in red) and δQ4 (in blue) distributions per turnover bucket. In the figure we observe a positive growth from one year to the following in most of the buckets. The exception to be highlighted is the case of small companies (i.e., the ones having turnover between 0 and 500,000 RON), which show a very significant growth in Q1. The relationship manager should pay attention to the bucket 500k - 1MLN, showing a significant decrease.

[Figure 6: Delta Q1 and Q4 distribution for outgoing counterparts per turnover bucket.]

5.2 Isolated anomalous points

For the sake of space, in this section we discuss the results for a subset of 30 clients whose time series passed the stationarity test. We run all of the considered algorithms using 80% of the original dataset for the training phase, and the remaining 20% for testing (i.e., roughly the last three months of transactions). Since we do not know if there are anomalies or not, we evaluate the reliability of the predictors through the insertion of artificial anomalies. In particular, we add anomalies to the testing set by extracting a random subsample of L = 10 events from such set (i.e., about 10% of the instances of the testing set). We iterate over the whole time series adding one anomaly at a time. We want the anomaly to be clearly out of the standard range of the time series, therefore we randomly choose an entry e, and we modify it as:

    e* = e · N(7, 2.5) + k · max(ts)    (2)

where N is a normal distribution with mean µ = 7 and standard deviation σ = 2.5, and k is a multiplicative coefficient we set equal to 2.
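A sketch of the anomaly injection of Eq. (2) follows; the paper inserts one anomaly at a time and re-evaluates, while for compactness the snippet modifies all the selected entries at once.

```python
import numpy as np

def inject_anomalies(ts, n_anomalies=10, mu=7.0, sigma=2.5, k=2.0, seed=0):
    """Insert artificial anomalies following Eq. (2): e* = e * N(mu, sigma) + k * max(ts)."""
    rng = np.random.default_rng(seed)
    ts = np.asarray(ts, dtype=float).copy()
    idx = rng.choice(len(ts), size=n_anomalies, replace=False)   # random positions in the test window
    ts[idx] = ts[idx] * rng.normal(mu, sigma, size=n_anomalies) + k * ts.max()
    return ts, idx
```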
We now evaluate the obtained outputs. Recall that a point is considered anomalous by ARIMA if it falls outside the confidence interval boundaries; by the supervised models if it falls outside the defined threshold boundaries; and by the unsupervised models if it belongs to a small cluster or it is recognized as a noise point. We should further point out that we retrain every model from scratch for each client. Tables 2 and 3 report the performance metrics for all the algorithms. Such metrics allow us to surely consider unreliable the Support-Vector-based algorithms and the Stochastic Gradient Descent Regressor, as they are not able to recognize the true anomalies (small recall), and they wrongly tend to mark a very large set of points as anomalies (small precision). This second problem is common to all supervised approaches, since all of them present a very low precision. In practice, the "noisy" time series does not allow the regressor to correctly predict the next value, which too often results as an outlier (raising false alarms). The unsupervised models show instead a better average behavior, coming from the fact that they tend to advertise anomalies only if their classification is very sure. This leads them to be more precise: a simple k-means indeed identifies 80% of the anomalies (recall), with a precision of 77%. These results are consistent also for different values of (µ, σ, k), not reported here for the sake of brevity.

Table 2: ARIMA and supervised algorithms scores

           ARIMA   ADA    DT     RF     SGDR   Lin    Poly   RBF
Accuracy   0.75    0.88   0.89   0.73   0.51   0.61   0.28   0.54
Precision  0.06    0.19   0.18   0.14   0.08   0.11   0.01   0.03
Recall     1       0.89   0.89   0.89   0.4    0.58   0.79   0.88

Table 3: Unsupervised algorithms scores

           DBScan   Agglomerative   Isolation   K-means
Accuracy   0.99     0.98            0.97        0.99
Precision  0.52     0.51            0.19        0.77
Recall     0.63     0.62            0.48        0.8

6 CONCLUSIONS

In this paper we presented a case study on anomaly detection in corporate investment banking transaction data. The anomalies have been divided into three different categories, according to their general characteristics, and targeted with the most appropriate set of techniques, spanning from simple adaptive threshold heuristics to several types of machine learning algorithms. We demonstrated that phenomena such as salaries and periodic supplier payments can be reliably spotted by means of an adaptive threshold algorithm, while a standard linear regression comes in handy when major changes in trend need to be detected. We further provided a comparative analysis of the performance of well-known machine learning algorithms in spotting isolated anomalies, whose results make us lean towards the usage of unsupervised algorithms. All the provided results are presented in a way that can serve as a decision-aid tool for the bank employees that need easy-to-read and easy-to-understand results when dealing with corporate customers.

REFERENCES
[1] Leo Breiman. 2001. Random forests. Machine Learning 45, 1 (2001), 5–32.
[2] Tadeusz Caliński and Jerzy Harabasz. 1974. A dendrite method for cluster analysis. Communications in Statistics - Theory and Methods 3, 1 (1974), 1–27.
[3] D. L. Davies and D. W. Bouldin. 1979. A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence 2 (1979), 224–227.
[4] Harris Drucker et al. 1997. Support vector regression machines. In Advances in Neural Information Processing Systems. 155–161.
[5] Martin Ester et al. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, Vol. 96. 226–231.
[6] Chang et al. 2007. WireVis: Visualization of categorical, time-varying data from financial transactions. In 2007 IEEE Symposium on Visual Analytics Science and Technology. IEEE, 155–162.
[7] Durdu Ömer Faruk. 2010. A hybrid neural network and ARIMA model for water quality time series prediction. Engineering Applications of Artificial Intelligence 23, 4 (2010), 586–594.
[8] Greg Hamerly and Charles Elkan. 2004. Learning the k in k-means. In Advances in Neural Information Processing Systems. 281–288.
[9] Karin Kandananond. 2019. Electricity demand forecasting in buildings based on ARIMA and ARX models. In Proceedings of the 8th International Conference on Informatics, Environment, Energy and Applications. ACM, 268–271.
[10] Jack Kiefer et al. 1952. Stochastic estimation of the maximum of a regression function. The Annals of Mathematical Statistics 23, 3 (1952), 462–466.
[11] Bjoern Krollner et al. 2010. Financial time series forecasting with machine learning techniques: a survey. In ESANN.
[12] Mahesh Kumar et al. 2002. Clustering seasonality patterns in the presence of errors. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 557–563.
[13] T. Warren Liao. 2005. Clustering of time series data - a survey. Pattern Recognition 38, 11 (2005), 1857–1874.
[14] Fei Tony Liu et al. 2008. Isolation forest. In 2008 Eighth IEEE International Conference on Data Mining. IEEE, 413–422.
[15] James MacQueen et al. 1967. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1. Oakland, CA, USA, 281–297.
[16] Frank Nielsen. 2016. Hierarchical clustering. In Introduction to HPC with MPI for Data Science. Springer, 195–211.
[17] Ping-Feng Pai and Chih-Sheng Lin. 2005. A hybrid ARIMA and support vector machines model in stock price forecasting. Omega 33, 6 (2005), 497–505.
[18] Marco Raberto et al. 2002. Waiting-times and returns in high-frequency financial data: an empirical study. Physica A: Statistical Mechanics and its Applications 314, 1-4 (2002), 749–755.
[19] Peter J. Rousseeuw. 1987. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics 20 (1987), 53–65.
[20] R. E. Schapire. 2013. Explaining AdaBoost. In Empirical Inference. Springer, 37–52.
[21] Dat Thanh Tran et al. 2017. Tensor representation in high-frequency financial data for price change prediction. In 2017 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE, 1–7.
[22] Ruey S. Tsay. 2014. Financial time series. Wiley StatsRef: Statistics Reference Online (2014), 1–23.
[23] Fang-Mei Tseng et al. 2001. Fuzzy ARIMA model for forecasting the foreign exchange market. Fuzzy Sets and Systems 118, 1 (2001), 9–19.
[24] Tony Van Gestel et al. 2001. Financial time series prediction using least squares support vector machines within the evidence framework. IEEE Transactions on Neural Networks 12, 4 (2001), 809–821.
[25] Vladimir Vapnik et al. 1997. Support vector method for function approximation, regression estimation and signal processing. In Advances in Neural Information Processing Systems. 281–287.
[26] Kiri Wagstaff et al. 2001. Constrained k-means clustering with background knowledge. In ICML, Vol. 1. 577–584.
[27] Qi Yu et al. 2014. Bankruptcy prediction using extreme learning machine and financial expertise. Neurocomputing 128 (2014), 296–302.