Classification of e-commerce customers based on Data Science techniques

Olena Piskunova and Rostyslav Klochko [0000-0003-2690-2785]

Kyiv National Economic University named after Vadym Hetman, Kyiv, Ukraine
EPiskunova@kneu.edu.ua, rostislav.klochko@gmail.com

Abstract. Currently, most organizations are trying to build data-driven strategies, investing heavily in developing their own intelligent decision-making systems. At the same time, there are many small online retailers who would like to implement business intelligence systems but still lack the necessary knowledge and expertise. The article provides an example of using Data Science techniques to classify online store customers by their purchasing activity. The analysis of different approaches allowed us to propose a two-stage solution to this problem. First, we segmented our e-commerce customers by RFM metrics using the k-means method, applying algorithms for the automated selection of the number of clusters and of the initial group centers. Six groups of clients were identified: cluster 1, lost clients; cluster 2, new wholesale buyers; cluster 3, customers the company may soon lose; cluster 4, active retail buyers; cluster 5, new retail customers; cluster 6, active wholesale buyers. In the second stage, a customer classification system was built with the help of machine learning algorithms. The second stage is needed to take into account the constant updating of the client base and the accumulation of new information. Tenfold cross-validation was performed to avoid overfitting the models. The analysis of the calculations for five classification methods allowed us to give preference to the random forest method. All analysis and calculations in this study were performed in the R programming language using the RStudio environment.

Keywords: clusterization, classification, RFM model, e-commerce, machine learning, data science.
1 Introduction

Retail is one of the fastest-growing sectors of the Ukrainian economy. Today it is almost the only market in Ukraine that has much in common with perfect competition: thousands of players in the segment sell millions of different products, and most of them are small and medium-sized businesses. Year by year it becomes more and more difficult to win the loyalty of new customers and retain the loyalty of regular ones. Therefore, the ability to offer an individual approach to each client will in the coming years become a necessary condition for successful business activity.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

In view of the rapid digitalization of the economy, e-commerce is becoming one of the most important areas of retail activity. Although Ukraine is far behind the global pace of e-commerce market development, in recent years Ukrainian online sales have grown even faster than in Europe. Nowadays, e-commerce companies have to take clients' wants and needs into account in the decision-making process to meet the requirements of today's economy. At the same time, daily contact with thousands of customers makes it difficult to consider each of them individually. This problem can be addressed by a clear segmentation of the client base, built on mathematical modeling methods.

Thus, modeling consumer behavior is a pressing problem whose solution will not only improve the efficiency of e-commerce but also contribute to the development of the whole economy and to a better fulfillment of consumers' needs. In particular, an important task of e-marketing is to classify online store customers by the level of their purchasing activity.
The peculiarities of this task are the large amount of available data and their constant updating and accumulation, which call for Data Science techniques, including machine learning methods. The goal of this work is to classify online store customers by the level of their purchasing activity based on Data Science techniques, including machine learning methods.

2 Literature review

Most research by Ukrainian scientists is based on an analysis of the client base supported only by a personal understanding of the process [1]. In addition, the overwhelming majority of scientific works rely on socio-demographic statistics of an individual company or of the whole country [2]; e-commerce customer activity data are hardly investigated. Recently, the first publications with examples of the application of machine learning methods in marketing have started to appear in the Ukrainian scientific space [3], but most of them do not take full advantage of these technologies. For example, when cluster analysis methods are used, the number of clusters is selected based on the authors' own expert judgment [3]. As the analysis of Ukrainian scientific works shows, machine learning algorithms, the RFM model, and automated decision-making are hardly used in them, and even where these technologies are applied, their use is quite limited; for example, the RFM model is used, but the segmentation is performed manually [4, 5].

Foreign scientific literature contains many studies that reveal the peculiarities of using intelligent systems in marketing [6]. Most papers describe the complete process of building an automated customer analysis system, which includes calculating RFM activity metrics, clustering customers with machine learning methods (e.g., k-means), and developing an individual approach for each segment [7, 8, 9]. However, the methodology for selecting the number of clusters into which the input data should be divided is hardly addressed.
As a rule, only one method is used, the "average silhouette width", which usually does not allow the problem to be solved correctly [10]. It is advisable to decide on the required number of clusters based on the values of 26 additional criteria that can be obtained using the NbClust data analysis package in RStudio [11]. Also, the question of the further efficiency of the built algorithms is almost never addressed. Most of the research in this area details the methodology for clustering an existing customer base but does not take into account that new clients arrive every day and that current clients tend to change their behavior over time. It is appropriate to consider clustering as only the first stage of data analysis, which allows us to understand which customer groups are active, while it is also important to have a system for automatically assigning a segment to customers. In scientific research, there are two approaches to solving this problem: fuzzy logic methods [12] and classification algorithms [13]. The analysis of the different approaches leads us to prefer the classification model.

3 Proposed methodology and experiments

The approach proposed in the paper is implemented in two stages: the first stage involves customer segmentation by cluster analysis methods; the second stage involves the development of a client classification algorithm that allows the segments of current clients to be continuously updated and a segment to be assigned to new customers.

This research aims to reduce the human impact on strategic decision-making. Therefore, particular attention is paid to the accuracy and relevance of the proposed methods and algorithms. The number of clusters is selected based on 26 different criteria and indices. For the classification task, five different models with tenfold cross-validation were applied, after which the most accurate and appropriate algorithm was chosen for implementation.
Note that all calculations are performed using the RStudio software and the R programming language.

3.1 Execution of RFM analysis

The study was performed on a sample of data from one of the online stores [14]. The data include 1,067,371 transactions of purchase and return of goods during the period from 01.09.2009 to 09.12.2011. The database contains the following fields: Invoice, a unique operation code; StockCode, a unique product code; Description, the name of the product; Quantity, the quantity of purchased/returned products; InvoiceDate, the date of the operation; Price, the price of the goods; Customer ID, a unique customer code; Country, the country of the operation.

The first task to be addressed in the research process is the selection of criteria for evaluating the level of customer purchasing activity. We take the classic approach to measuring purchasing activity, the RFM model (Recency, Frequency, Monetary) [15]. Recency for each individual customer is calculated as the difference between the current date in the database and the date of the customer's last purchase; in our case, the metric is measured in days. Frequency for each individual customer is determined by the number of transactions performed by the client during their client life. Monetary for each individual customer is defined as the total revenue from all the customer's transactions during their client life; in our case, the metric is measured in dollars.

At this stage of the research, these activity metrics were calculated for each customer in the sample. After that, the characteristics of the statistical distributions of Recency, Frequency, and Monetary were calculated, namely: the average, minimum and maximum values of the indicators, as well as the 1st, 2nd and 3rd quartiles. The values of these characteristics are shown in Fig. 1.

Fig. 1. Distribution of purchasing activity metrics

As can be seen, the average customer of our online store made their last purchase 202 days ago.
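The RFM metrics defined above can be sketched as follows. This is an illustrative pure-Python sketch, not the authors' R code; the transaction fields mirror the dataset columns (Customer ID, InvoiceDate, Quantity, Price), and the toy values and the `reference_date` are assumptions.

```python
from datetime import date

# Toy transaction log with the dataset's fields (illustrative values only).
transactions = [
    {"customer": "C1", "date": date(2011, 11, 1), "quantity": 2, "price": 10.0},
    {"customer": "C1", "date": date(2011, 12, 1), "quantity": 1, "price": 5.0},
    {"customer": "C2", "date": date(2011, 6, 1), "quantity": 3, "price": 7.0},
]

def rfm(transactions, reference_date):
    """Compute Recency (days since last purchase), Frequency (number of
    transactions) and Monetary (total revenue) for each customer."""
    acc = {}
    for t in transactions:
        m = acc.setdefault(t["customer"], {"last": t["date"], "freq": 0, "money": 0.0})
        m["last"] = max(m["last"], t["date"])   # date of the most recent purchase
        m["freq"] += 1                          # one transaction per record
        m["money"] += t["quantity"] * t["price"]
    return {
        c: {"recency": (reference_date - m["last"]).days,
            "frequency": m["freq"],
            "monetary": m["money"]}
        for c, m in acc.items()
    }

metrics = rfm(transactions, reference_date=date(2011, 12, 9))
```

For customer C1 this yields a recency of 8 days, a frequency of 2 and a monetary value of 25.0; in the paper the same three metrics are computed in R over the full transaction table.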
On average, a customer makes 6 purchases during their client life while spending $2,720. Further, we use these indicators as the main metrics.

3.2 Customer segmentation

Clustering an online store's customer base is an unsupervised learning task: the algorithms do not receive any clues as to the desired result but rather derive it from the data. Unsupervised learning techniques are commonly used at the beginning of a study; the main result of applying these algorithms is finding certain patterns in the available data and characterizing their structure.

The most efficient and simple algorithm for cluster analysis is k-means. This method is very common in economic research, but its practical application to clustering e-commerce customers involves some difficulties.

Firstly, the final results are sensitive to the initial random selection of group centers. To solve this problem, a procedure involving multiple executions of the algorithm with different random assignments of the initial centroids was applied. The iteration with the minimum value of WCSS is selected as the final clustering option. The Within Cluster Sum of Squares (WCSS) measures the squared distances of all the points within a cluster to the cluster centroid [16]:

WCSS = \sum_{l=1}^{k} D(C_l), (1)

where D(C_l) is the within-cluster sum of squared distances of cluster C_l and k is the number of clusters. The sum of squared Euclidean distances from the points of cluster C_l to its centroid is calculated by the formula:

D(C_l) = \sum_{i=1}^{n_l} d^2(x_i, c_l), (2)

where n_l is the number of points in cluster C_l and c_l is the center of mass (centroid) of cluster C_l [17].

The second problem is the need to specify a fixed number of clusters for partitioning in advance, which is certainly not always chosen optimally. Therefore, one of the main tasks of cluster analysis is to select the optimal value of k. There are several approaches to the solution:

- the quantity is determined by business needs.
This approach is commonly used when a proven customer classification system already exists in the enterprise segment; an example would be the distribution of customers by their purchasing activity level (Low, Below Average, Medium, High);
- the quantity is selected using machine learning algorithms. This approach is used when the decision-maker has no understanding of the typology of their clients; machine learning algorithms help to select customer classes based on the similarity of their behavior;
- a mixture of the first and second approaches. Most commonly, the decision is made based both on business understanding and on the results of mathematical modeling.

The basic machine learning methods that help to solve the problem of choosing the number of clusters are the "elbow" and "average silhouette" methods. The elbow method explores how the within-cluster variance (1) changes with an increasing number of groups k. Combining all n observations into one group, we have the largest intra-cluster variance, which decreases to 0 as k → n [16].

Another popular method of assessing the quality of the model is the "average silhouette width". The value of the silhouette shows how similar an object is to its own cluster compared to other clusters. Suppose that the data have been clustered into k clusters. For a point x_i in cluster C_I, let

a(i) = \frac{1}{|C_I| - 1} \sum_{j \in C_I, j \neq i} d(i, j), (3)

where a(i) is the average distance from x_i to the other objects in the cluster C_I, d(i, j) is the distance between objects i and j, and |C_I| is the number of objects in cluster C_I. We can interpret a(i) as a measure of how well x_i is assigned to its cluster (the smaller the value, the better the assignment). Then we determine the average dissimilarity of the point x_i to some cluster C_J as the average distance from x_i to all points of C_J (where C_J ≠ C_I). For each data point x_i, we now define

b(i) = \min_{J \neq I} \frac{1}{|C_J|} \sum_{j \in C_J} d(i, j), (4)

where b(i) is the smallest average distance from x_i to all points of any other cluster, of which x_i is not a member.
The cluster with this smallest mean dissimilarity is considered the "neighboring cluster" of x_i, since it is the next-best cluster for that point. Now let us define the silhouette of a single data point x_i:

s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))}, (5)

where a(i) is the average distance from x_i to the other objects in its cluster and b(i) is the smallest average distance to all points of any other cluster. The silhouette varies from -1 to +1, where a high value indicates that the object is well matched to its own cluster. If most objects have high values, then the clustering configuration is appropriate [18].

By formula (1), the level of variance explained by clustering was determined for each number of clusters from 1 to 10 (Fig. 2).

Fig. 2. Graphical implementation of the elbow method

In Fig. 2 we need to identify the breaking point where the drop starts to slow. Points 3, 6 and 8 look most like the breaking point, but decisions made on one approach alone are in most cases not accurate. Therefore, the next step is a silhouette check. Using formulas (3)-(5), we calculate the value of the silhouette for each candidate number of clusters (from 1 to 10). A graphical representation of the calculation results is shown in Fig. 3.

Fig. 3. Graphical implementation of the "average silhouette" method

With this method, we look for the highest value of the indicator. As can be seen in Fig. 3, the optimal number of clusters is 2 (with 6 in second place).

The NbClust analytical package, which allows 26 additional criteria to be calculated, was used to refine the results.

Table 1. Characterization of indices in the NbClust analytical package [11]

Full name | Short name | Selection criterion
Hubert index, Hubert and Arabie (1985) | Hubert | Graphical method
Dindex, Lebart et al. (2000) | Dindex | Graphical method
KL index, Krzanowski and Lai (1988) | KL | Maximum value of the index
CH index, Calinski and Harabasz (1974) | CH | Maximum value of the index
Hartigan index, Hartigan (1975) | Hartigan | Maximum difference between hierarchy levels of the index
Cubic Clustering Criterion (CCC), Sarle (1983) | CCC | Maximum value of the index
Scott index, Scott and Symons (1971) | Scott | Maximum difference between hierarchy levels of the index
Marriot index, Marriot (1971) | Marriot | Maximum value of second differences between levels of the index
TraceCovW index, Milligan and Cooper (1985) | TrCovW | Maximum difference between hierarchy levels of the index
TraceW index, Milligan and Cooper (1985) | TraceW | Maximum value of absolute second differences between levels of the index
Friedman index, Friedman and Rubin (1967) | Friedman | Maximum difference between hierarchy levels of the index
Silhouette index, Kaufman and Rousseeuw (1990) | Silhouette | Maximum value of the index
Ratkowsky index, Ratkowsky and Lance (1978) | Ratkowsky | Maximum value of the index
Ball index, Ball and Hall (1965) | Ball | Maximum difference between hierarchy levels of the index
PtBiserial index, examined by Milligan (1980, 1981) | Ptbiserial | Maximum value of the index
Dunn index, Dunn (1974) | Dunn | Maximum value of the index
Rubin index, Friedman and Rubin (1967) | Rubin | Minimum value of second differences between levels of the index
C-index, Hubert and Levin (1976) | Cindex | Minimum value of the index
DB index, Davies and Bouldin (1979) | DB | Minimum value of the index
Duda index, Duda and Hart (1973) | Duda | Smallest number of clusters such that index > criticalValue
Pseudot2 index, Duda and Hart (1973) (hierarchical only) | Pseudot2 | Smallest number of clusters such that index < criticalValue
Beale index, Beale (1969) | Beale | Number of clusters such that critical value of the index >= alpha
Frey index, Frey and Van Groenewoud (1972) | Frey | The cluster level before that at which the index value < 1.00
Mcclain index, McClain and Rao (1975) | McClain | Minimum value of the index
SDindex, Halkidi et al. (2000) | SDindex | Minimum value of the index
SDbw index, Halkidi et al. (2001) | SDbw | Minimum value of the index

The results of the calculations for each of the indices of Table 1 are presented in Table 2.

Table 2. The values of additional quality criteria for dividing objects into clusters, for different numbers of clusters

Index | 2 | 3 | 4 | 5 | 6 | 7
KL | 0.721 | 54.5 | 0.037 | 2.63 | 0.963 | 0.148
CH | 1645 | 2755 | 2525 | 2857 | 2894 | 2565
Hartigan | 2835 | 930 | 1439 | 862 | 220 | 3241
CCC | -34.5 | -15.7 | -17.7 | -0.845 | 5.32 | -1.10
Scott | 5085 | 8886 | 13805 | 16936 | 19114 | 20893
Marriot | 6.7E+10 | 6.5E+10 | 3.9E+10 | 3.0E+10 | 2.7E+10 | 2.5E+10
TrCovW | 1.08E+6 | 4.04E+6 | 2.99E+6 | 2.03E+6 | 1.73E+6 | 1.68E+6
TraceW | 9933 | 6102 | 5060 | 3837 | 3221 | 3072
Friedman | 2.29 | 4.16 | 10.1 | 12.7 | 14.5 | 19.4
Rubin | 1.36 | 2.22 | 2.68 | 3.53 | 4.21 | 4.41
Cindex | 0.0231 | 0.0215 | 0.0174 | 0.0135 | 0.0122 | 0.0116
DB | 1.512 | 0.971 | 0.981 | 0.899 | 0.982 | 0.956
Silhouette | 0.460 | 0.481 | 0.455 | 0.508 | 0.518 | 0.475
Duda | 0.454 | 0.974 | 1.03 | 0.852 | 1.01 | 1.16
Pseudot2 | 2651 | 72.3 | -53.6 | 312 | -20.5 | -241
Beale | 2.05 | 0.045 | -0.045 | 0.277 | -0.017 | -0.197
Ratkowsky | 0.308 | 0.427 | 0.392 | 0.377 | 0.355 | 0.331
Ball | 4967 | 2034 | 1265 | 767 | 537 | 439
Ptbiserial | 0.251 | 0.325 | 0.327 | 0.375 | 0.374 | 0.352
Frey | -0.902 | 0.732 | -0.205 | 0.515 | 2.215 | 0.468
McClain | 0.541 | 0.451 | 0.645 | 0.556 | 0.574 | 0.669
Dunn | 4.0E-04 | 3.0E-04 | 1.0E-04 | 2.0E-04 | 2.0E-04 | 2.0E-04
Hubert | 1.0E-04 | 1.0E-04 | 2.0E-04 | 2.0E-04 | 2.0E-04 | 2.0E-04
SDindex | 4.51 | 53.8 | 41.4 | 46.4 | 74.3 | 65.1
Dindex | 0.799 | 0.741 | 0.575 | 0.498 | 0.466 | 0.430
SDbw | 2.12 | 14.6 | 11.2 | 12.4 | 19.6 | 17.0

Fig. 4 presents the number of criteria that supported each number of clusters. As we can see, 3 clusters performed best (eight criteria selected this number); the next best options are 2 and 6 clusters. We can immediately discard option 2, as it would not bring any value in further calculations. Therefore, the main options are 3 and 6 clusters. The next step is the practical implementation of the k-means method and validation of the results against business logic.
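The restart procedure described above, running k-means several times with random initial centroids and keeping the partition with the lowest WCSS from formula (1), can be sketched as follows. This is an illustrative pure-Python sketch on toy 2-D points, not the authors' R code (in R, the `nstart` argument of `kmeans` provides the same behavior); the toy points are assumptions.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(pts):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(xs) / len(pts) for xs in zip(*pts))

def kmeans(points, k, iters=50, restarts=20, seed=0):
    """k-means with several random restarts; keeps the partition with minimal WCSS."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        centers = rng.sample(points, k)          # random initial centroids
        clusters = [[] for _ in range(k)]
        for _ in range(iters):
            # Assignment step: each point joins its nearest centroid.
            clusters = [[] for _ in range(k)]
            for p in points:
                j = min(range(k), key=lambda j: dist2(p, centers[j]))
                clusters[j].append(p)
            # Update step: move each centroid to the mean of its cluster.
            centers = [mean(c) if c else centers[j] for j, c in enumerate(clusters)]
        # WCSS of this restart, per formula (1).
        wcss = sum(dist2(p, centers[j]) for j in range(k) for p in clusters[j])
        if best is None or wcss < best[0]:
            best = (wcss, centers, clusters)
    return best

# Two well-separated toy groups, loosely mimicking retail vs. wholesale RFM profiles.
points = [(1, 1), (1.2, 0.8), (0.9, 1.1), (8, 8), (8.1, 7.9), (7.8, 8.2)]
wcss, centers, clusters = kmeans(points, k=2)
```

Running the sketch over k = 1..10 and plotting the resulting WCSS values would reproduce the elbow curve of Fig. 2; the restarts make the result insensitive to the initial random centroids.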
Fig. 4. The number of criteria that support the corresponding number of clusters

The clients were divided into 3 and 6 clusters. Tables 3 and 4 show the average values of the RFM metrics for 3 and 6 clusters, respectively. The analysis of Table 3 allows the following interpretation of the clusters:

- Cluster 1. Customers who, on average, make small purchases every 2 months.
- Cluster 2. Wholesale buyers who, on average, purchase a large number of goods once a month for a considerable amount.
- Cluster 3. Retail customers who make a purchase on average once a year.

Table 3. Results of division into 3 clusters

Cluster | Recency | Frequency | Monetary
1 | 58 | 7.17 | 2 775
2 | 30 | 84.1 | 78 231
3 | 291 | 2.13 | 704

Table 4. Results of division into 6 clusters

Cluster | Recency | Frequency | Monetary | Number of clients
1 | 424 | 1.54 | 498 | 723
2 | 26 | 57.8 | 46 400 | 34
3 | 218 | 2.71 | 889 | 1 615
4 | 32 | 19.8 | 8 528 | 345
5 | 54 | 4.65 | 1 603 | 1 794
6 | 33 | 145 | 135 430 | 7

As can be seen from Table 4, the resulting clusters characterize the following types of clients:

- Cluster 1. Lost clients: made fewer than 2 purchases on average, the last of which was over a year ago.
- Cluster 2. New wholesale buyers: high average check and activity, but only a short time has passed since the first purchase. Efforts must be made to increase these customers' loyalty to the business.
- Cluster 3. Customers whom we may soon lose: they showed typical activity, but a long time has passed since the last purchase. We should pay attention to these customers and try to persuade them to transact more frequently.
- Cluster 4. Active retail buyers: high activity, purchases over a long period, average check. The most valuable and loyal type of customer for the business.
- Cluster 5. New retail customers: high activity, but over a short period, average check. Efforts should be made to turn them into regular customers.
- Cluster 6. Active wholesale buyers: high average check and activity, purchases over a long period.
The most profitable type of customers for the business.

Given the business logic, it was decided that the division into 6 customer groups is more acceptable and better characterizes the current situation of the online store. The number of clients in each cluster is shown in Table 4.

3.3 Classification of online store customers based on machine learning methods

The next step after segmenting the customer base is to build classification models for distributing e-commerce customers among these segments. Classification is the task of dividing a set of observations or objects, by the values of certain attributes, into a priori given groups called classes; within each of these groups, objects are considered to be similar to each other [19].

The most common machine learning methods for classification are linear discriminant analysis (LDA), the support vector machine (SVM), classification and regression trees (CART), k-nearest neighbors (KNN), and random forests (RF).

Discriminant analysis is a kind of multidimensional data analysis designed to solve pattern recognition problems. It is used to decide which factors divide ("discriminate") certain data sets (so-called "groups").

The support vector machine (SVM) is a family of similar supervised learning algorithms used for classification and regression analysis tasks. A feature of the support vector method is the constant reduction of the empirical classification error together with the maximization of the margin, so this method is also known as the maximum-margin classification method [20].

A decision tree develops solutions with the help of a tree model. The algorithm splits the sample into two or more homogeneous sets (branches) based on the most significant differentiators among the input variables. To select a differentiator (predictor), the algorithm considers all the features and makes a binary partition.
It then selects the lowest-cost option (the highest precision) and repeats recursively until the data in all branches are successfully partitioned (or the maximum depth is reached). The Classification and Regression Tree (CART) is one implementation of the decision tree. The root and internal nodes of classification and regression trees are the branching nodes; the terminal nodes are leaves. Each branching node represents one input variable (x) and a splitting point on that variable; leaf nodes represent the output variable (y). To predict with the model, one traverses the splits of the tree until reaching a leaf node and outputs the value stored in it.

Random Forest (RF) is an ensemble model that builds several trees and classifies objects on a "voting" basis: an object belongs to the class that receives the majority of votes from all the trees. The algorithm trains several decision trees on different subsamples of the data and aggregates them to improve prediction accuracy.

The k-nearest neighbors (KNN) algorithm assumes that objects are divided into different classes such that they can be classified based on their similarity, with the distance between objects serving as the measure of similarity. KNN does not need a training phase: it classifies data points directly, based on the majority vote of their neighbors; an object is assigned the class that is most common among its k nearest neighbors [21].

The data set was divided into training and test samples (75% and 25%, respectively). As a result, the training sample includes data on 3,388 clients, and the test sample on 1,130 clients. Accuracy, the ratio of correctly classified customers to their total number, was used to evaluate the quality of the models. The clients were classified by the 5 methods presented above (LDA, CART, KNN, SVM, RF).
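The tenfold cross-validation and the Accuracy metric used below can be sketched as follows. This is an illustrative pure-Python sketch, not the authors' R code; the fold count, seed and toy labels are assumptions, and the model fitting step is deliberately omitted.

```python
import random

def kfold_indices(n, k=10, seed=42):
    """Split indices 0..n-1 into k disjoint folds; each fold serves once as the test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def accuracy(y_true, y_pred):
    """Share of correctly classified customers (the quality metric used in the paper)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

folds = kfold_indices(n=100, k=10)

# In each of the 10 rounds, 9 folds form the training sample and the
# held-out fold is scored; fitting and scoring a model would go here.
for test_fold in folds:
    train_idx = [i for f in folds if f is not test_fold for i in f]
    assert len(train_idx) == 100 - len(test_fold)
```

Averaging the 10 per-fold Accuracy values gives distribution summaries of the kind reported in Table 5 and shows whether the results depend on one particular train/test split.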
Tenfold cross-validation was applied during the implementation of the customer classification algorithm. It is necessary to test whether the simulation results depend on a particular dataset. A tenfold test involves splitting the sample into ten randomly selected sets (test and training samples) and testing the model built on them. Table 5 shows the characteristics of the Accuracy distributions (minimum, maximum and average values, as well as the 1st, 2nd and 3rd quartiles) obtained on the training sample for each method.

Table 5. Accuracy distribution values (training sample)

Method | Min | 1st Qu | Median | Mean | 3rd Qu | Max
LDA | 0.95 | 0.97 | 0.97 | 0.97 | 0.97 | 0.98
CART | 0.90 | 0.91 | 0.91 | 0.92 | 0.94 | 0.97
KNN | 0.90 | 0.93 | 0.94 | 0.93 | 0.94 | 0.95
SVM | 0.98 | 0.98 | 0.99 | 0.99 | 0.99 | 0.99
RF | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 1.00

As can be seen from Table 5, the RF model showed the smallest error on the training sample (mean Accuracy 0.99). We then check the built random forest model on the test sample; the results of the distribution of customers by class are presented in Table 6. On the test sample, this algorithm showed an accuracy of 99%, so RF was chosen to implement the classification process for the entire data sample (Table 7). Each cluster characterizes a specific group of customers that are similar in purchasing activity, while clients from different clusters differ significantly. A graphical representation of the differences in the level of purchasing activity between clusters is shown in Figs. 5-7.

Table 6. Random forest classification results (rows: forecast; columns: real segment)

Forecast \ Real | 1 | 2 | 3 | 4 | 5 | 6
1 | 183 | 0 | 0 | 0 | 0 | 0
2 | 0 | 3 | 0 | 0 | 0 | 0
3 | 0 | 0 | 412 | 0 | 2 | 0
4 | 0 | 1 | 1 | 85 | 0 | 0
5 | 0 | 1 | 0 | 0 | 440 | 0
6 | 0 | 0 | 0 | 0 | 0 | 2

Table 7. Division of all existing customers by the random forest method

Cluster | Name | Number of clients
1 | Lost clients | 723
2 | New wholesalers | 32
3 | Almost lost | 1 616
4 | Active retail | 347
5 | New retail | 1 793
6 | Active wholesalers | 7

Fig. 5. Boxplot of the Recency metric by client type

Fig. 6. Boxplot of the Frequency metric by client type

Fig. 7. Boxplot of the Monetary metric by client type

4 Conclusion

The paper deals with the task of classifying online store customers by their purchasing activity based on Data Science techniques, including machine learning methods. The analysis of different approaches allowed us to propose a two-stage solution to this problem.

First, the customers of the online store were segmented by RFM indicators with the k-means method, using algorithms for the automated selection of the number of clusters and of the initial centroids. Six customer groups were found: cluster 1, lost clients, who made fewer than 2 purchases on average, the last of which was over a year ago; cluster 2, new wholesale buyers, with a high average check and activity but only a short time since the first purchase; cluster 3, customers whom we may soon lose; cluster 4, active retail buyers, with high activity, purchases over a long period and an average check; cluster 5, new retail customers, with high activity but over a short period and an average check; cluster 6, active wholesale buyers, with a high average check and activity and purchases over a long period.

The second stage, in which the classification of customers is carried out directly, is needed to take into account the constant updating of the client base and the accumulation of new information. The analysis of the calculations for the 5 classification methods allowed us to give preference to the random forest method.

References

1. Pursky, O., Masokha, D.: Method of building a network of storefronts of online stores based on MVC architecture. Business Inform, no. 10, pp. 319-324 (2017). (in Ukrainian)
2. Kondruk, N.: Using a longitudinal measure of similarity in clustering problems. Radio Electronics, Computer Science, Control, no. 3 (46), pp. 98-105 (2018). doi: 10.15588/1607-3274-2018-3-11 (in Ukrainian)
3. Roskladka, N., Roskladka, A., Dzigman, O.:
Cluster analysis of the client database of enterprises of the service industry. Economy and Management of the National Economy. International Economic Relations, no. 2 (35), pp. 151-159 (2019). (in Ukrainian)
4. Matsuka, V., Balabanyts, A.: Marketing technology of forming consumer loyalty in the tourist services market. Bulletin of the Mariupol State University, Series: Economics, issue 14, pp. 177-187 (2017). (in Ukrainian)
5. Shulgina, L.: Methodical instructions on the application of analysis and quality assessment of tourist services. Business Inform, no. 3 (482), pp. 180-185 (2018). (in Ukrainian)
6. Kamthania, D., Pahwa, A., Madhavan, S.: Market Segmentation Analysis and Visualization Using K-Mode Clustering Algorithm for E-Commerce Business. Journal of Computing and Information Technology 26, 57-68 (2018). doi: 10.20532/cit.2018.1003863
7. Chen, D., Sain, S., Guo, K.: Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining. Journal of Database Marketing & Customer Strategy Management 19, 197-208 (2012). https://doi.org/10.1057/dbm.2012.17
8. Dogan, O., Ayçin, E., Bulut, Z.: Customer Segmentation by Using RFM Model and Clustering Methods: A Case Study in Retail Industry. International Journal of Contemporary Economics and Administrative Sciences 8, 1-19 (2018).
9. Ait Daoud, R.: Customer Segmentation Model in E-commerce Using Clustering Techniques and LRFM Model: The Case of Online Stores in Morocco. International Journal of Computer, Electrical, Automation, Control and Information Engineering 9, 1795-1805 (2015).
10. Anitha, P., Patil, M.M.: RFM model for customer purchase behavior using K-Means algorithm. Journal of King Saud University - Computer and Information Sciences. https://doi.org/10.1016/j.jksuci.2019.12.011
11. Charrad, M., Ghazzali, N., Boiteau, V., Niknafs, A.: An examination of indices for determining the number of clusters: NbClust Package (2013).
12. Ansari, A., Riasi, A.: Customer Clustering Using a Combination of Fuzzy C-Means and Genetic Algorithms. International Journal of Business and Management 11(7), 59 (2016). doi: 10.5539/ijbm.v11n7p59
13. Mathivanan, N.M.N., Md. Ghani, N.A., Mohd Janor, R.: Improving classification accuracy using clustering technique. Bulletin of Electrical Engineering and Informatics 7, 465-470 (2018). doi: 10.11591/eei.v7i3.1272
14. Online Retail II Data Set, https://archive.ics.uci.edu/ml/datasets/Online+Retail+II
15. Wei, J.-T., Lin, S.-Y., Wu, H.-H.: A review of the application of RFM model. African Journal of Business Management 4, 4199-4206 (2010).
16. Shitikov, V., Mastitsky, S.: Classification, regression, and Data Mining algorithms using R. https://ranalytics.github.io/data-mining/101-Partitioning-Algos.html (in Russian)
17. Bertagnolli, N.: Elbow Method and Finding the Right Number of Clusters. http://www.nbertagnolli.com/jekyll/update/2015/12/10/Elbow.html
18. Lengyel, A., Botta-Dukat, Z.: Silhouette width using generalized mean - a flexible method for assessing clustering efficiency (2018). doi: 10.1101/434100
19. Lavrenyuk, M., Novikov, O.: An overview of machine learning methods for the classification of large volumes of satellite data. Systems Research and Information Technologies, no. 1, pp. 52-71 (2018). (in Ukrainian)
20. Soofi, A., Awan, A.: Classification Techniques in Machine Learning: Applications and Issues. Journal of Basic & Applied Sciences 13, 459-465 (2017). doi: 10.6000/1927-5129.2017.13.76
21. Akinsola, J.E.T.: Supervised Machine Learning Algorithms: Classification and Comparison. International Journal of Computer Trends and Technology (IJCTT) 48, 128-138 (2017). doi: 10.14445/22312803/IJCTT-V48P126